Hi Simon,

After reading what you wrote, I fully agree with you. Your distinction
between citing a citation and citing the original work is a useful
one. This would happen naturally if you focused on identifying the
algorithm and understanding its limitations, as I recommended. But
the way you have put it is more nuanced and sensible.

Actually, regarding thin wrappers, nothing irritates me more than a
programmer saying "we have just implemented feature X in system Y"
when what they mean is that they wrapped function W from external
library Z and are calling that an "implementation" of feature X.

[This issue actually grates on me a lot. I recently saw the most
absurd example of this. Someone claimed they had "implemented" a C
REPL. It turned out to be a Python program which took the C code you
entered and passed it to GCC via the command line. It was completely
broken and didn't even preserve state between inputs (I sketch the kind
of thing I mean just after this aside). Now, it turned out
that a group at CERN has actually been working on a real C++ REPL
called Cling, for analysing data from the LHC (something like 25
petabytes of serialised C++ objects a year after filtering). I could
imagine some miscreant wrapping the whole Cling project in Python and
claiming they have implemented a C++ REPL. At least the Sage library
has many new and interesting implementations of original algorithms
which build on the functionality it "wraps".]
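
Here, then, is a minimal sketch of the kind of "C REPL" I mean (not the
actual program I saw, just the flavour of it): every snippet is wrapped
in a fresh main(), handed to gcc on the command line and run as a new
process, so nothing survives from one input to the next.

    import os
    import shutil
    import subprocess
    import tempfile

    def run_c_snippet(snippet):
        # Wrap the snippet in main(), compile it with gcc, run the binary.
        source = '#include <stdio.h>\nint main(void) { %s return 0; }\n' % snippet
        tmp = tempfile.mkdtemp()
        try:
            src = os.path.join(tmp, 'snippet.c')
            exe = os.path.join(tmp, 'snippet')
            with open(src, 'w') as f:
                f.write(source)
            subprocess.check_call(['gcc', src, '-o', exe])
            return subprocess.check_output([exe]).decode()
        finally:
            shutil.rmtree(tmp)

    # Every call starts from scratch, so "int x = 41" is forgotten immediately.
    print(run_c_snippet('int x = 41; printf("%d\\n", x + 1);'))

Wrapping gcc like this is fine as a toy, but calling it an
"implementation" of a REPL is exactly the abuse of language I'm
complaining about.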

As for citation, unfortunately professional mathematicians rarely have
time to learn anything about programming and so often you find less
than adequate details when they cite computer software. I've been at a
Pure Maths conference for a week and a half and I've seen computers
mentioned twice. The first time no credit was given at all (though it
was almost certainly Pari). The second time the entire credit was
"SAGE", although the speaker took care to acknowledge the help of John
Cremona in helping them perform the computation. In that case I
believe the computation was possibly done entirely in Pari via Sage.
It looked to be computation of class numbers and unit groups of some
number fields, although I am not 100% sure of this. The speaker was
delighted that "computers can do these things". Of course computers
can't do anything. Better to cite the person who did the
implementation. And did that speaker realise that the computation was
subject to the GRH or worse? They may be excused for not going into
great detail if the computation was as mundane as I thought it was. It
seems to fall into the category of "by a well known result...".
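
To be concrete about the GRH point, here is roughly what such a
computation looks like from Sage (a sketch from memory; the exact
keywords may differ between versions). The quick answer comes back from
Pari's bnfinit and is only guaranteed correct under GRH; one has to ask
explicitly for the result to be certified.

    sage: K = QuadraticField(-23)
    sage: K.class_number(proof=False)   # fast, but conditional on GRH
    3
    sage: K.class_number(proof=True)    # asks Pari to certify the result
    3

If a speaker doesn't know which of these two they effectively ran, they
don't really know what they proved.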

The situation we really need to be worried about is where some special
algorithm is used which makes a given computation possible where it
otherwise would not be. If my computation relied critically on
multiplying two large polynomials of degree ten billion, I might want
to mention that
I used flint via Sage, in case the reader haplessly tries to do this
in Pari or some system that only supports polynomials of length up to
2^30.
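
The sort of check I have in mind costs thirty seconds in a Sage session
(again a sketch from memory; the printed class name may vary between
versions):

    sage: R.<x> = PolynomialRing(ZZ)
    sage: f = R.random_element(degree=1000)
    sage: type(f)  # expect something naming polynomial_integer_dense_flint

That is, flint is doing the actual arithmetic, and that is the detail
worth knowing before you cite anything.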

Of course properly citing Pari and the algorithm used and checking
under what conditions the result is meaningful, etc, assumes that
these things are well documented and accessible. So, a positive step
the Pari developers could take is to make it easy to read the source
code for a given function in Pari, much as Sage has done for its own
library code, and to document which algorithms are used, where they are
published and what their restrictions are.
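
By way of comparison, this is the sort of thing Sage already makes
trivial from the command line (sketching from memory):

    sage: gcd?     # the docstring, with references if the author gave any
    sage: gcd??    # the actual Sage library source for the function

Something equivalent for gp, pointing at the C source and the relevant
paper for each function, would go a long way.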

A lot of Pari code is especially difficult to follow because it uses
messy stack allocation all over the place. This is not thread safe, and
it makes the code hard to read and contribute to. If I wanted to make
Pari really
easy to contribute to and understand, I'd probably use garbage
collection for all but the core arithmetic routines. I'd still make
the core threadsafe though. I believe flint2 proves that this is
possible in an efficient manner without messy stack allocation
routines.

The Pari developers might also like to consider the people who wrote
the Sage wrapper for Pari as contributors to Pari, and credit them as
such.

It is in matters like these that Sage has taken a massive first step
towards making Computational Number Theory a respectable sport. Whilst
it doesn't address all the issues I raised in my earlier post, it has
taken the field from being a mass of poorly documented, broken, non-
portable, incompatible code lying around on people's hard drives or
locked up in inscrutable closed implementations to being a unified
distribution which meets some minimum standards. The code in the Sage
library is open and easily accessible. Every function has a docstring
and doctests (alright, that's a lie, but it's almost true). Library
code is subjected to some kind of peer review. References to papers
are provided. Moreover, Sage builds on multiple platforms and is
tested regularly, so it is actually usable by more people. These are all
very important first steps in bringing the field into good repute.
None of these things seems to have hurt the popularity of Sage.
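
For anyone who hasn't seen it, the convention in the Sage library looks
roughly like this (a made-up function, purely to show the shape of it):
the EXAMPLES block is not just documentation, it is run as a doctest
every time the test suite runs.

    def fibonacci_mod(n, m):
        """
        Return the n-th Fibonacci number reduced modulo m.

        ALGORITHM: naive iteration; fine for small n.

        EXAMPLES::

            sage: fibonacci_mod(10, 7)
            6
        """
        a, b = 0, 1
        for _ in range(n):
            a, b = b, (a + b) % m
        return a

Nothing stops other packages adopting exactly the same discipline.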

Having said that, I personally find it very difficult to trace through
which algorithm is used in what regime in Sage. I once made some
comments about which algorithms were used in Sage in a certain regime
and someone lambasted me for getting it wrong. I subsequently
carefully checked it through, following the rabbit warren of function
calls across multiple interface boundaries through many files
scattered across vast reaches of the Sage directory tree, and I am
convinced I had the details substantially right. Of course, if we just
relied on Sage documentation to tell us what algorithm is being used
in which package, we'd get it wrong much of the time, because that
relies on the documentation having been maintained correctly. But at
least Sage makes a step in the right direction. I've had similar
issues trying to trace through the linbox package to figure out which
algorithm is used.

By the way, nothing I've said should be taken to mean I am not
grateful for the efforts of people like Burcin Erocal, Robert
Bradshaw, Martin Albrecht (to name a few) who have written Sage
wrapper code for flint. This is important if Sage is to be successful.
I am only agreeing with the notion that wrapper code does not merit
citation per se. Nothing is stopping someone adding an acknowledgement
for this work if it enables something to be done that could not
formerly be done. But it should be clear that five minutes spent
wrapping a library function is substantially different to spending a
day or a month or even a year implementing that function.
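
Just to illustrate the difference in scale, "five minutes spent
wrapping a library function" can literally look like this (a throwaway
example using ctypes and the C maths library, nothing to do with flint
itself; assumes a Unix-like system):

    import ctypes
    import ctypes.util

    # All the real work happens in libm; this merely forwards the call.
    libm = ctypes.CDLL(ctypes.util.find_library('m'))
    libm.cbrt.restype = ctypes.c_double
    libm.cbrt.argtypes = [ctypes.c_double]

    def cube_root(x):
        return libm.cbrt(float(x))

    print(cube_root(27.0))   # 3.0

Useful, certainly, and worth an acknowledgement if it unlocks
something, but not the same kind of contribution as writing cbrt in the
first place.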

Bill.

On Jul 26, 1:02 am, Simon King <simon.k...@uni-jena.de> wrote:
> Hi Bill,
>
> let me comment on the "how to cite individual Sage components" topic.
>
> On 26 Jul., 00:18, Bill Hart <goodwillh...@googlemail.com> wrote:
>
> > * Citing a Number Theory package you know nothing about in published
> > research is about as dubious as citing a paper you have never read.
> > Unless you are prepared to read the code and understand the algorithm
> > Pari uses, what right do you have to cite it? However, when writing
> > papers, if you use a published result which makes use of some other
> > published results, you usually cite the result you directly used, not
> > the many other results that backed it up. In fact it is considered
> > poor form to cite papers you know nothing about.
>
> You draw an analogy between "using software A that in fact uses
> software B behind the scenes" and "using a theorem of paper A that in
> fact uses a theorem from paper B". You conclude: One would only cite
> paper A, and it would actually be wrong to cite paper B if one doesn't
> know it; hence, one should (by analogy) cite software A, but not
> software B that one doesn't know but that does the actual work.
>
> I believe you need to differentiate a bit more. Assume that the
> theorem that you use appears in paper A just as a citation from paper
> B. What would you do in your paper? Would you plainly write "by Thm
> 3.2 in [A]", even though the theorem is not proved in A? Or would you
> try to look it up in paper B and write "by Thm 7.8 in [B] (see also
> Thm 3.2 in [A])"?
>
> The analogy of "paper A cites the theorem from paper B" in computer
> algebra is "software A just uses a thin wrapper around software B".
> That's to say: Assume that you have in your Sage session an object X
> that in fact is a group living in GAP. Now, you do X.SylowSubgroup(2).
> Sage's main contribution to that computation is to send the string
> "SylowSubgroup(%s, 2)"%X.name() to GAP and to wait for an answer.
>
> So, would you say "Sage has computed the Sylow subgroup"? Wouldn't it
> be more honest to say "GAP has computed the Sylow subgroup"?
>
> There are, of course, more complicated cases. For example, I wrote an
> (optional) spkg that computes modular cohomology rings of finite
> groups.
>  * It makes intense use of GAP and Singular, but the code base is
> mainly original Cython code. However, the computations performed by
> GAP and Singular are essential steps in the algorithm. Would you say
> that it suffices to cite Sage for my spkg? I can tell you that some
> Singular developers would be VERY upset if I didn't properly credit
> Singular! And I agree with them.
>  * In one line of an auxiliary method, I factor a multivariate
> polynomial over a finite field. The factorisation is of no importance
> for the final result, but it is part of a heuristic to speed up one
> step of the algorithm. Would you say that I should try to find out
> what component of Sage is responsible for the factorisation, and cite
> it? I think that would go too far.
>
> I don't know much about Pari. But if it does the actual work and Sage
> only provides the interface, then I think it is clear that Pari must
> be cited.
>
> Cheers,
> Simon
