Hi Simon,

After reading what you wrote, I fully agree with you. Your distinction between citing a citation and citing the original work is a useful one. This would happen naturally if you focused on identifying the algorithm and understanding its limitations, as I recommended. But the way you have put it is more nuanced and sensible.
Actually, regarding thin wrappers, nothing irritates me more than a programmer saying "we have just implemented feature X in system Y" when what they mean is they "wrapped function W from external library Z" and called it an "implementation" of feature X. [This issue grates on me a lot. I recently saw the most absurd example of it. Someone claimed they had "implemented" a C REPL. It turned out to be a Python program which took the C code you entered and passed it to GCC via the command line. It was completely broken and didn't even preserve state. Now, it turns out that a group at CERN has actually been working on a real C++ REPL called Cling, for analysing data from the LHC (something like 25 petabytes of serialised C++ objects a year after filtering). I could imagine some miscreant wrapping the whole Cling project in Python and claiming they have implemented a C++ REPL. At least the Sage library has many new and interesting implementations of original algorithms which build on the functionality it "wraps".]

As for citation, unfortunately professional mathematicians rarely have time to learn much about programming, and so you often find less than adequate details when they cite computer software. I've been at a Pure Maths conference for a week and a half and I've seen computers mentioned twice. The first time no credit was given at all (though it was almost certainly Pari). The second time the entire credit was "SAGE", although the speaker took care to acknowledge the help of John Cremona in performing the computation. In that case I believe the computation was possibly done entirely in Pari via Sage. It looked to be a computation of class numbers and unit groups of some number fields, although I am not 100% sure of this. The speaker was delighted that "computers can do these things". Of course computers can't do anything. Better to cite the person who did the implementation.
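Incidentally, a naive "C REPL" of the kind I'm mocking above might look something like the following. This is a hypothetical sketch of my own in Python, not the actual program I saw, and the function names are made up; the point is that each input is compiled as a fresh standalone program, so nothing can survive between inputs:

```python
# Hypothetical reconstruction of a naive "C REPL": each entered statement is
# wrapped in a fresh main() and handed to GCC via the command line. Because
# every snippet is compiled and run as a brand-new program, no state
# (variables, functions) can survive from one input to the next.
import os
import subprocess
import tempfile

def wrap_snippet(snippet):
    """Embed one entered statement in a complete, throwaway C program."""
    return '#include <stdio.h>\nint main(void) { %s return 0; }\n' % snippet

def run_snippet(snippet):
    """Compile and run a single snippet; state from earlier calls is lost."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, 'snippet.c')
        exe = os.path.join(d, 'snippet')
        with open(src, 'w') as f:
            f.write(wrap_snippet(snippet))
        subprocess.check_call(['gcc', src, '-o', exe])  # shell out to GCC
        return subprocess.run([exe], capture_output=True, text=True).stdout
```

A variable declared in one call to run_snippet simply does not exist in the next one, which is exactly the brokenness described: this is a wrapper around GCC, not an implementation of a REPL.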
And did that speaker realise that the computation was subject to the GRH, or worse? They may be excused for not going into great detail if the computation was as mundane as I thought it was. It seems to fall into the category of "by a well known result...". The situation we really need to be worried about is where some special algorithm is used which makes a given computation possible where it otherwise was not. If my computation relied critically on multiplying two large polynomials of degree ten billion, I might want to mention that I used flint via Sage, in case the reader haplessly tries to do this in Pari or some system that only supports polynomials of length up to 2^30.

Of course, properly citing Pari and the algorithm used, and checking under what conditions the result is meaningful, etc., assumes that these things are well documented and accessible. So a positive step the Pari developers could take is to make it easy to read the source code for a given function in Pari, much as Sage has done for Sage library code, and to document the algorithms used, where they are published and what their restrictions are. A lot of Pari code is especially difficult because it uses messy stack allocation all over the place. This is not thread safe, and it is difficult to read and contribute to. If I wanted to make Pari really easy to contribute to and understand, I'd probably use garbage collection for all but the core arithmetic routines. I'd still make the core threadsafe though. I believe flint2 proves that this is possible in an efficient manner without messy stack allocation routines. The Pari developers might also like to consider the people who wrote the Sage wrapper for Pari as contributing to Pari and credit them as such. It is in matters like these that Sage has taken a massive first step towards making Computational Number Theory a respectable sport.
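To make the scale concrete, here is a small plain-Python illustration of my own (it has nothing to do with flint's actual implementation, which uses asymptotically fast algorithms rather than the schoolbook method below): multiplying coefficient lists the naive way, plus a back-of-envelope check of why a 2^30 length cap rules out the degree-ten-billion case outright.

```python
# Schoolbook multiplication of polynomials given as coefficient lists,
# lowest degree first. This is the O(n^2) method a fast library like flint
# specifically avoids at large sizes.
def mul_naive(a, b):
    """Return the coefficient list of the product of polynomials a and b."""
    res = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            res[i + j] += ai * bj
    return res

# (1 + x) * (1 + x) = 1 + 2x + x^2
print(mul_naive([1, 1], [1, 1]))  # [1, 2, 1]

# Back-of-envelope check: two degree-d inputs give a degree-2d product, so
# at d = ten billion even one input, let alone the product, needs a
# coefficient vector far beyond a 2^30 length limit.
d = 10**10
print(d + 1 > 2**30)          # True: a single input already exceeds the cap
print(2 * d + 1 > 2**30)      # True: the product is twice as far past it
```

The arithmetic is the whole point: knowing which backend actually did the multiplication is what tells the reader whether the computation is reproducible in their system at all.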
Whilst it doesn't address all the issues I raised in my earlier post, it has taken the field from being a mass of poorly documented, broken, non-portable, incompatible code lying around on people's hard drives or locked up in inscrutable closed implementations to being a unified distribution which meets some minimum standards. The code in the Sage library is open and easily accessible. Every function has a docstring and doctests (alright, that's a lie, but it's almost true). Library code is subjected to some kind of peer review. References to papers are provided. And moreover, Sage builds on multiple platforms and is tested regularly, so it is more useful for some people. These are all very important first steps in bringing the field into good repute. None of these things seems to have hurt the popularity of Sage.

Having said that, I personally find it very difficult to trace through which algorithm is used in what regime in Sage. I once made some comments about which algorithms were used in Sage in a certain regime and someone lambasted me for getting it wrong. I subsequently carefully checked it through, following the rabbit warren of function calls across multiple interface boundaries through many files separated across vast reaches of the Sage directory tree, and I am convinced I had the details substantially right. Of course, if we just relied on Sage documentation to tell us what algorithm is being used in which package, we'd get it wrong much of the time, because that relies on the documentation having been maintained correctly. But at least Sage makes a step in the right direction. I've had similar issues trying to trace through the linbox package to figure out which algorithm is used.

By the way, nothing I've said should be taken to mean I am not grateful for the efforts of people like Burcin Erocal, Robert Bradshaw, Martin Albrecht (to name a few) who have written Sage wrapper code for flint. This is important if Sage is to be successful.
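The docstring-and-references point can be sketched in a few lines. This is a made-up placeholder of my own, not real Pari or Sage code; it just shows the kind of convention I mean, where the algorithm, its restrictions and its published source travel with the function so a citation can be traced without spelunking the source tree:

```python
# Sketch of a documentation convention: record, per function, the algorithm
# used, where it is published, and its restrictions. The function below is a
# hypothetical placeholder, and the return value is meaningless.
def class_number_stub(D):
    """Return a placeholder "class number" for discriminant D.

    ALGORITHM: (hypothetical) a subexponential class group method.

    RESTRICTIONS: correctness conditional on the GRH.

    REFERENCE: see the published description of the algorithm used.
    """
    return 1  # placeholder value only

# The metadata is readable by both humans and tools via the ordinary docstring:
print('ALGORITHM' in class_number_stub.__doc__)  # True
print('GRH' in class_number_stub.__doc__)        # True
```

Nothing here is clever; the point is that if every wrapped function carried this, "which algorithm did I actually use, and under what hypotheses" would not require a trek through the directory tree.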
I am only agreeing with the notion that wrapper code does not merit citation per se. Nothing is stopping someone adding an acknowledgement for this work if it enables something to be done that could not formerly be done. But it should be clear that five minutes spent wrapping a library function is substantially different to spending a day or a month or even a year implementing that function.

Bill.

On Jul 26, 1:02 am, Simon King <simon.k...@uni-jena.de> wrote:
> Hi Bill,
>
> let me comment on the "how to cite individual Sage components" topic.
>
> On 26 Jul., 00:18, Bill Hart <goodwillh...@googlemail.com> wrote:
>
> > * Citing a Number Theory package you know nothing about in published
> > research is about as dubious as citing a paper you have never read.
> > Unless you are prepared to read the code and understand the algorithm
> > Pari uses, what right do you have to cite it? However, when writing
> > papers, if you use a published result which makes use of some other
> > published results, you usually cite the result you directly used, not
> > the many other results that backed it up. In fact it is considered
> > poor form to cite papers you know nothing about.
>
> You draw an analogy between "using software A that in fact uses
> software B behind the scenes" and "using a theorem of paper A that in
> fact uses a theorem from paper B". You conclude: One would only cite
> paper A, and it would actually be wrong to cite paper B if one doesn't
> know it; hence, one should (by analogy) cite software A, but not
> software B that one doesn't know but that does the actual work.
>
> I believe you need to differentiate a bit more. Assume that the
> theorem that you use appears in paper A just as a citation from paper
> B. What would you do in your paper? Would you plainly write "by Thm
> 3.2 in [A]", even though the theorem is not proved in A? Or would you
> try to look it up in paper B and write "by Thm 7.8 in [B] (see also
> Thm 3.2 in [A])"?
>
> The analogy of "paper A cites the theorem from paper B" in computer
> algebra is "software A just uses a thin wrapper around software B".
> That's to say: Assume that you have in your Sage session an object X
> that in fact is a group living in GAP. Now, you do X.SylowSubgroup(2).
> Sage's main contribution to that computation is to send the string
> "SylowSubgroup(%s, 2)"%X.name() to GAP and to wait for an answer.
>
> So, would you say "Sage has computed the Sylow subgroup"? Wouldn't it
> be more honest to say "GAP has computed the Sylow subgroup"?
>
> There are, of course, more complicated cases. For example, I wrote an
> (optional) spkg that computes modular cohomology rings of finite
> groups.
> * It makes intense use of GAP and Singular, but the code base is
> mainly original Cython code. However, the computations performed by
> GAP and Singular are essential steps in the algorithm. Would you say
> that it suffices to cite Sage for my spkg? I can tell you that some
> Singular developers would be VERY upset if I wouldn't properly credit
> Singular! And I agree with them.
> * In one line of an auxiliary method, I factor a multivariate
> polynomial over a finite field. The factorisation is of no importance
> for the final result, but it is part of a heuristic to speed up one
> step of the algorithm. Would you say that I should try to find out
> what component of Sage is responsible for the factorisation, and cite
> it? I think that would go too far.
>
> I don't know much about Pari. But if it does the actual work and Sage
> only provides the interface, then I think it is clear that Pari must
> be cited.
>
> Cheers,
> Simon

--
To post to this group, send an email to sage-devel@googlegroups.com
To unsubscribe from this group, send an email to sage-devel+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sage-devel
URL: http://www.sagemath.org