Re: [sage-devel] Re: Sage & Pari

2011-07-27 Thread daly


> On Jul 27, 11:58 am, John Cremona  wrote:
> > My innocent posting to sage-devel on this subject two days ago seems
> > to have done something to increase the number of postings this month,
> > even if a lot of the discussion has been much wider than the original
> > title (Sage & Pari) suggests!

Well, credit is "the coin of the realm" in open source
so it is important to make sure that those who work get
recognized ("paid"). This is especially sensitive in an
academic environment. If you spend a year writing software
but don't publish then the tenure committee won't see any
of that work.

I managed to widen ("hijack") this thread to the larger
topic of computational math and literate programming.
Apologies for that.

Tim Daly





[sage-devel] Re: Sage & Pari

2011-07-27 Thread Bill Hart
I cited Pari in a non-peer reviewed article today.

Bill.

On Jul 27, 11:58 am, John Cremona  wrote:
> My innocent posting to sage-devel on this subject two days ago seems
> to have done something to increase the number of postings this month,
> even if a lot of the discussion has been much wider than the original
> title (Sage & Pari) suggests!
>
> John



Re: [sage-devel] Re: Sage & Pari

2011-07-27 Thread John Cremona
My innocent posting to sage-devel on this subject two days ago seems
to have done something to increase the number of postings this month,
even if a lot of the discussion has been much wider than the original
title (Sage & Pari) suggests!

John



[sage-devel] Re: Sage & Pari

2011-07-27 Thread Bill Hart
Of course if your code is so dependent on CPU cycles then it is a good
candidate for assembly optimisation. After superoptimisation you might
hit the theoretical retirement rate of mu-ops, in which case you can
actually say something about how fast it will run. Integer arithmetic
is one such case.

In cases where you are entirely cache dependent you may be able to set
a cutoff based entirely on the sizes of the caches on the machine.

But yes, in general it is an understatement. You are right.
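
To make the cache-based cutoff idea concrete, here is a minimal sketch
in C. This is not flint code; the 64 KB L1 size and the bytes-per-term
estimate are assumptions purely for illustration.

    #include <stddef.h>
    #include <stdio.h>

    /* Assumed values for illustration only; a real build system would
       query the CPU for the actual cache size. */
    #define L1_DATA_CACHE_BYTES (64 * 1024)
    #define BYTES_PER_TERM      sizeof(unsigned long)

    /* Use the cache-friendly basecase while the working set fits in L1,
       switch to the asymptotically faster algorithm beyond that. */
    static size_t cutoff_from_cache(void)
    {
        return L1_DATA_CACHE_BYTES / BYTES_PER_TERM;
    }

    int main(void)
    {
        printf("cutoff = %zu terms\n", cutoff_from_cache());
        return 0;
    }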

Bill.

On Jul 27, 11:10 am, Volker Braun  wrote:
> On Wednesday, July 27, 2011 10:33:57 AM UTC+1, Bill Hart wrote:
>
> > It's rarely possible to fully
> > explain a cutoff without reference to the architecture, e.g.
> > instruction latency, caching, etc
>
> That's an understatement. The current state of CPUs makes it factually
> impossible to precisely quantify the execution time of a particular piece of
> code without running it. For starters, there are multiple cache levels of
> varying size. The compiler will reorder your C/C++ program when generating
> x86 (say) assembly, and the CPU will again reorder the x86 instructions when it
> decodes them into mu-ops. And thanks to speculative execution sometimes
> mu-ops are effectively processed in parallel (depending on their order). For
> all practical purposes it's impossible to say how many clock cycles are used
> to evaluate some C/C++ code.



[sage-devel] Re: Sage & Pari

2011-07-27 Thread Volker Braun
On Wednesday, July 27, 2011 10:33:57 AM UTC+1, Bill Hart wrote:
>
> It's rarely possible to fully
> explain a cutoff without reference to the architecture, e.g.
> instruction latency, caching, etc


That's an understatement. The current state of CPUs makes it factually
impossible to precisely quantify the execution time of a particular piece of
code without running it. For starters, there are multiple cache levels of
varying size. The compiler will reorder your C/C++ program when generating
x86 (say) assembly, and the CPU will again reorder the x86 instructions when it
decodes them into mu-ops. And thanks to speculative execution sometimes
mu-ops are effectively processed in parallel (depending on their order). For
all practical purposes it's impossible to say how many clock cycles are used
to evaluate some C/C++ code.
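
In practice the only way to answer "how many cycles?" is to measure. A
hedged sketch, assuming GCC or Clang on x86; the loop is just a stand-in
for whatever code is under test:

    #include <stdio.h>
    #include <x86intrin.h>

    static volatile unsigned long sink;   /* keeps the loop from being optimised away */

    int main(void)
    {
        unsigned long long start = __rdtsc();

        unsigned long s = 0;
        for (unsigned long i = 0; i < 1000000UL; i++)   /* stand-in workload */
            s += i * i;
        sink = s;

        unsigned long long cycles = __rdtsc() - start;
        printf("approx. %llu reference cycles (repeat runs will differ)\n", cycles);
        return 0;
    }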





[sage-devel] Re: Sage & Pari

2011-07-27 Thread Bill Hart

Hi Tim,

Let me first say that I am a little embarrassed that you downloaded
Flint as a possible example of literate code. Reading my original post
I did seem to give that impression. Actually I was merely mentioning
it as an example of a documented system, not necessarily literate as
such.

I didn't personally implement the Bell numbers code. The 7000 is
almost certainly a tuning cutoff which works well for "Fredrik's
laptop", which probably explains why it is not documented. At this
point there is no general tuning framework in flint. When we do
implement that, this number will simply be autogenerated for the
architecture on which Flint is built. It's rarely possible to fully
explain a cutoff without reference to the architecture, e.g.
instruction latency, caching, etc. But sometimes the cutoffs defy
explanation even when this information is available. It is useful when
a formula is found which can be used with only a few inputs such as
cache size and relative latencies, as tuning can then often be done
without running timing experiments at build time, which is time-consuming.
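
As an illustration of what such autogenerated tuning might look like
(this is not flint's tuning framework; the two variants and the macro
name are invented stand-ins), one can time both algorithms at increasing
sizes and emit the first size where the asymptotically faster one wins:

    #include <stdio.h>
    #include <time.h>

    static volatile unsigned long sink;

    static void variant_basecase(unsigned long n)   /* cheap per step */
    {
        unsigned long s = 0;
        for (unsigned long i = 0; i < n; i++) s += i;
        sink = s;
    }

    static void variant_fast(unsigned long n)       /* higher constant, better scaling */
    {
        unsigned long s = 0;
        for (unsigned long i = 0; i < n / 4 + 256; i++) s += i * i;
        sink = s;
    }

    static double time_it(void (*f)(unsigned long), unsigned long n)
    {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        f(n);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    }

    int main(void)
    {
        /* The first n where the fast variant wins becomes the generated
           cutoff, which a build system would write into a header. */
        for (unsigned long n = 64; n <= 1000000UL; n *= 2) {
            if (time_it(variant_fast, n) < time_it(variant_basecase, n)) {
                printf("#define EXAMPLE_CUTOFF %lu\n", n);
                return 0;
            }
        }
        printf("/* no crossover found in tested range */\n");
        return 0;
    }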

I applaud your efforts with Clojure and look forward to seeing the
results. This is an important language which has received a lot of
attention.

In order to answer your other questions I will need to be at my
laptop, which I will not be for about a day. I'll try to reply in more
detail then.

I think it would be very interesting to have a Flint example of
literate programming. I'd be happy to provide some details (or at
least ask Fredrik if he might help).

Again with regard to the Sage library I wasn't directly claiming it
was literate in the sense you mean it. Merely that it is sometimes a
reasonable approximation. I'd be happy to look for some good examples
of this, though many of the Python files will do. It probably only
takes a few minutes to find an example of Sage best practice.

Bill.



On Jul 27, 7:32 am, daly  wrote:
> > > Again I have to credit Sebastian Pancratz and Fredrik Johansson here
> > > for raising the standard in flint. I thought I had been producing
> > > beautiful code until Sebastian started rewriting some of it for
> > > me. :-)
>
> I downloaded Flint and looked at the source code and documentation.
> First, I applaud you on the PDF. It looks very nice and it includes
> quite a bit of information. I especially liked the hyperlinks to
> Wolfram's site and the fact that you included a bibliography.
> The inline mathematics is very useful (a TeX advantage).
>
> As an experiment I tried to understand "Bell Numbers", of which I
> know nothing. Since the algorithm for "bell_number_vec_recursive"
> is small I can see what is being computed. However, I also see that
> "bell_number_vec" contains a magic number 7000 as the defining
> value to split the computation. One advantage of a literate form
> of the documentation would be the obvious need to explain this
> number. The pdf simply says that it "Automatically switches..."
> but not when or why.
>
> The hard part of a fully literate document is the need to explain
> the special "tricks" that make an algorithm fast. Mathematically
> they might not matter but computational math hinges on the details.
> Many an algorithm in Sage has been carefully tuned to get speed
> and it is this careful tuning that makes the algorithm worthwhile.
> It also makes it obscure. But this tuning includes crucial knowledge
> that cannot be reverse-engineered and is only known to the author.
> Eventually the author dies, as has happened with several Axiom authors.
>
> If a future maintainer needed to port Flint to run on the ARM
> or the Apple5 or the GPU/CPU/APU setup what are the critical things
> to optimize? Is 7000 machine independent? Knowing these details is
> the difference between a classroom algorithm and a world class one.
> If I write Bell Numbers in Axiom, does 7000 matter?
>
> The reason to favor a fully-embedded literate program is that you
> can't ignore or hide any implementation detail. The magic number
> 7000 would cry out for an explanation. Keeping the code in one
> file and the documentation in another file makes it trivial to
> skip details. A peer review of a literate program would certainly
> notice that this was skipped.
>
> I am tempted to make a literate paper about bell numbers that
> is built on your code but I'd obviously make a fool of myself
> in front of a crowd of number theorists :-) Nevertheless,
> a small "pamphlet" on the implementation of Bell Numbers would
> make a good starting example to critique. I would be willing
> to provide the tools and techniques if you'd be willing to
> provide the explanations. Feel free to decline.
>
> Tim Daly



Re: [sage-devel] Re: Sage & Pari

2011-07-26 Thread daly


> > Again I have to credit Sebastian Pancratz and Fredrik Johansson here
> > for raising the standard in flint. I thought I had been producing
> > beautiful code until Sebastian started rewriting some of it for
> > me. :-)

I downloaded Flint and looked at the source code and documentation.
First, I applaud you on the PDF. It looks very nice and it includes
quite a bit of information. I especially liked the hyperlinks to
Wolfram's site and the fact that you included a bibliography.
The inline mathematics is very useful (a TeX advantage).

As an experiment I tried to understand "Bell Numbers", of which I
know nothing. Since the algorithm for "bell_number_vec_recursive"
is small I can see what is being computed. However, I also see that
"bell_number_vec" contains a magic number 7000 as the defining
value to split the computation. One advantage of a literate form
of the documentation would be the obvious need to explain this
number. The pdf simply says that it "Automatically switches..."
but not when or why.
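
For what it's worth, one standard way to compute a vector of Bell
numbers is the Bell (Aitken) triangle; whether bell_number_vec_recursive
uses exactly this recurrence is an assumption on my part. A sketch in C
(flint works with arbitrary-precision integers, while this illustration
uses unsigned long and overflows around n = 25):

    #include <stdio.h>

    /* Fill bell[0..n-1] with B(0) .. B(n-1) using the Bell triangle:
       each row starts with the last entry of the previous row, and every
       other entry is the sum of its left neighbour and the entry above it.
       Limited to n <= 26 here by the fixed buffers and word-size overflow. */
    static void bell_number_vec_sketch(unsigned long *bell, int n)
    {
        unsigned long row[32], prev[32];
        prev[0] = 1;
        bell[0] = 1;
        for (int i = 1; i < n; i++) {
            row[0] = prev[i - 1];
            for (int k = 1; k <= i; k++)
                row[k] = row[k - 1] + prev[k - 1];
            bell[i] = row[0];
            for (int k = 0; k <= i; k++)
                prev[k] = row[k];
        }
    }

    int main(void)
    {
        unsigned long bell[10];
        bell_number_vec_sketch(bell, 10);
        for (int i = 0; i < 10; i++)
            printf("B(%d) = %lu\n", i, bell[i]);
        return 0;
    }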

The hard part of a fully literate document is the need to explain
the special "tricks" that make an algorithm fast. Mathematically
they might not matter but computational math hinges on the details.
Many an algorithm in Sage has been carefully tuned to get speed
and it is this careful tuning that makes the algorithm worthwhile.
It also makes it obscure. But this tuning includes crucial knowledge
that cannot be reverse-engineered and is only known to the author.
Eventually the author dies, as has happened with several Axiom authors.

If a future maintainer needed to port Flint to run on the ARM
or the Apple5 or the GPU/CPU/APU setup what are the critical things
to optimize? Is 7000 machine independent? Knowing these details is
the difference between a classroom algorithm and a world class one.
If I write Bell Numbers in Axiom, does 7000 matter?




The reason to favor a fully-embedded literate program is that you
can't ignore or hide any implementation detail. The magic number
7000 would cry out for an explanation. Keeping the code in one
file and the documentation in another file makes it trivial to 
skip details. A peer review of a literate program would certainly
notice that this was skipped.

I am tempted to make a literate paper about bell numbers that
is built on your code but I'd obviously make a fool of myself
in front of a crowd of number theorists :-) Nevertheless,
a small "pamphlet" on the implementation of Bell Numbers would
make a good starting example to critique. I would be willing
to provide the tools and techniques if you'd be willing to 
provide the explanations. Feel free to decline.

Tim Daly




Re: [sage-devel] Re: Sage & Pari

2011-07-26 Thread daly
...[snip]...
> > In regular mathematics it is standard practice to fully document, that
> > is, show the complete proof of your findings. In computational maths
> > we do not do this. There is no particular reason why we don't except
> > that it is not the expected behavior.
> 
> Unfortunately this is rarely true these days. In regular mathematics
> it is now very common for there to be large leaps between steps.
> Usually the more advanced the mathematician the less detail that is
> expected. Graduate students spend a lot of time filling it in.
> 
> But your point regarding computational mathematics is spot on. It is
> very common to find code completely undocumented in any way. There is
> a substantial disparity in attitudes here.

What concerns me is the long run. We are at the very beginning of a
new science of computational mathematics. The early systems are the
"Newton's notebooks" of this science. Most of them are locked up in
companies (MMA, Maple, Derive, Macsyma, etc.). But companies die and
take their software with them, for example, Macsyma. (For company
deaths see the last few minutes of:
http://www.ted.com/talks/geoffrey_west_the_surprising_math_of_cities_and_corporations.html)

If we are to have a firm foundation for computational mathematics it
is necessary that we begin to build the citation trail of algorithms
available in a public, published way. Why are we wasting time and
effort reinventing algorithms? Why can't I find executable algorithms
along with their theory? Why can't I attend a talk, download the
literate paper, and execute the algorithm while the talk is ongoing?

> 
> When I began my massive rewrite of flint from scratch I decided that
> each function would have a complete description and justification of
> the algorithm (if not well-known) in an associated text file, with
> references and proofs if necessary. Sebastian Pancratz came along and
> saw these text files and decided to write a parser that automatically
> turned them into pdf documentation!

Excellent! Although I'd much rather follow Knuth and have a literate
document, I find it outstanding that you have created these. Can you
point to a URL as an example?
 
> 
> >
> > We could raise the bar of expected behavior by having algorithms fully
> > explained in the source code along with citations of theory sources. We
> > could include a section on limitations, boundary conditions, examples,
> > assumptions, and test cases. This is not as challenging as it sounds
> > if it is done by the original author. It is extremely hard to recover
> > or discover this information after the fact.
> 
> To a good extent this is done in most of the Sage library. I agree
> this should be the expected standard.
I am unaware of this, which is clearly my fault. 
Can you point to an example you consider literate?

> 
> Again I have to credit Sebastian Pancratz and Fredrik Johansson here
> for raising the standard in flint. I thought I had been producing
> beautiful code until Sebastian started rewriting some of it for
> me. :-)
> 
> >
> > Knuth proposed the idea of "literate programs" which combine the
> > actual source code with the human language text as a single document.
> > The document re-arranges the source code for ease of explanation and
> > includes the usual paper sections on background, theory, along with
> > a bibliography citing prior (literate) work. (See Queinnec, Christian
> > ``Lisp in Small Pieces'' ISBN 0-521-56247-3 for a literate example)
> 
> My favourite example of this is Jonesforth. I highly recommend anyone
> who has not read this program to do so immediately. It is one of the
> most beautiful documents you can obtain for free. And you get yourself
> a Forth implementation at no extra cost.
All page references to Jonesforth seem to time out. 

I am writing a similar document for the Clojure language. 
The literate document contains a full implementation of Clojure.
Unpacking the document creates a Clojure REPL and a PDF of the
documentation (well, as far as it has gotten at this point).
source: http://daly.literatesoftware.com/clojure.pamphlet
pdf   : http://daly.literatesoftware.com/clojure.pdf

> 
> I don't personally do literate programming in the sense you intend it,
> with rearranged code so that it reads more naturally. But for very
> complex and touchy pieces of critical code I sometimes add formal
> justifications for every line of code. A good example of this is the
> bitpacking code in flint.
> 
> >
> > We could take small steps in this direction, possibly by hosting a
> > "literate program" track at conferences or workshops to encourage the
> > more literate among us to show examples and build up the technology.
> 
> I could certainly imagine a talk being constructed around an actual
> program. I might try this one day and see how it goes.
I would like to see Sage have some small step in this direction but
I have no idea where to start. I know next to nothing about number
theory so I couldn't begin to rev

[sage-devel] Re: Sage & Pari

2011-07-26 Thread Bill Hart


On Jul 26, 7:02 pm, daly  wrote:
> On Tue, 2011-07-26 at 08:30 -0700, Bill Hart wrote:



> To paraphrase your above comment, I believe that we need to raise the
> standards of computational mathematics further toward being "a
> respectable sport".
>
> Consider your observation:
>    "Of course properly citing Pari and the algorithm used and checking
>     under what conditions the result is meaningful, etc, assumes that
>     these things are well documented and accessible. So, a positive step
>     the Pari developers could take is to make it easy to read the source
>     code for a given function in Pari, much as Sage has done for Sage
>     library code, and to document the algorithms used and where they are
>     published and what their restrictions are."
>
> In regular mathematics it is standard practice to fully document, that
> is, show the complete proof of your findings. In computational maths
> we do not do this. There is no particular reason why we don't except
> that it is not the expected behavior.

Unfortunately this is rarely true these days. In regular mathematics
it is now very common for there to be large leaps between steps.
Usually the more advanced the mathematician the less detail that is
expected. Graduate students spend a lot of time filling it in.

But your point regarding computational mathematics is spot on. It is
very common to find code completely undocumented in any way. There is
a substantial disparity in attitudes here.

When I began my massive rewrite of flint from scratch I decided that
each function would have a complete description and justification of
the algorithm (if not well-known) in an associated text file, with
references and proofs if necessary. Sebastian Pancratz came along and
saw these text files and decided to write a parser that automatically
turned them into pdf documentation!

>
> We could raise the bar of expected behavior by having algorithms fully
> explained in the source code along with citations of theory sources. We
> could include a section on limitations, boundary conditions, examples,
> assumptions, and test cases. This is not as challenging as it sounds
> if it is done by the original author. It is extremely hard to recover
> or discover this information after the fact.

To a good extent this is done in most of the Sage library. I agree
this should be the expected standard.

Again I have to credit Sebastian Pancratz and Fredrik Johansson here
for raising the standard in flint. I thought I had been producing
beautiful code until Sebastian started rewriting some of it for
me. :-)

>
> Knuth proposed the idea of "literate programs" which combine the
> actual source code with the human language text as a single document.
> The document re-arranges the source code for ease of explanation and
> includes the usual paper sections on background, theory, along with
> a bibliography citing prior (literate) work. (See Queinnec, Christian
> ``Lisp in Small Pieces'' ISBN 0-521-56247-3 for a literate example)

My favourite example of this is Jonesforth. I highly recommend anyone
who has not read this program to do so immediately. It is one of the
most beautiful documents you can obtain for free. And you get yourself
a Forth implementation at no extra cost.

I don't personally do literate programming in the sense you intend it,
with rearranged code so that it reads more naturally. But for very
complex and touchy pieces of critical code I sometimes add formal
justifications for every line of code. A good example of this is the
bitpacking code in flint.

>
> We could take small steps in this direction, possibly by hosting a
> "literate program" track at conferences or workshops to encourage the
> more literate among us to show examples and build up the technology.

I could certainly imagine a talk being constructed around an actual
program. I might try this one day and see how it goes.

>
> Raising the standards for published mathematical software will help
> us all in the long run. We can look forward to a time when we can
> read, understand, and cite particular Pari and Sage algorithms which
> are their actual implementations in literate form.
>

It's not the only important step that needs to be taken though.

My epiphany regarding automated program checking came when I
serendipitously encountered a fantastic book on Prolog on the exact
same day that I sat down to implement Damas-Hindley-Milner type
inference for an interpreter I have been writing. I marvel that a 20
line implementation of the unification algorithm saves me so much
hassle. After this little epiphany I have not had any problems seeing
the future. Type inference and unification have been around for ages,
but I feel like someone who has been in a time machine and returned
and who has to sit around and watch the inevitable development of the
compiler technology that I have already seen. Progress feels like a
slow motion replay.
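
For readers who haven't met it: a hedged sketch of first-order
unification, the core step in Damas-Hindley-Milner inference. This is my
illustration in C, not Bill's interpreter; in Prolog the same thing
really is only a handful of lines.

    #include <stdio.h>
    #include <string.h>

    /* Terms are either variables (possibly bound) or functors with arguments. */
    typedef struct term {
        const char *name;       /* functor or variable name */
        int is_var;             /* 1 = variable, 0 = functor */
        struct term *binding;   /* for variables: what they are bound to */
        int arity;
        struct term *args[4];   /* small fixed arity keeps the sketch short */
    } term;

    /* Follow variable bindings to the representative term. */
    static term *walk(term *t)
    {
        while (t->is_var && t->binding)
            t = t->binding;
        return t;
    }

    /* Occurs check: does variable v occur inside term t? */
    static int occurs(term *v, term *t)
    {
        t = walk(t);
        if (t == v) return 1;
        if (!t->is_var)
            for (int i = 0; i < t->arity; i++)
                if (occurs(v, t->args[i])) return 1;
        return 0;
    }

    /* Unify two terms, binding variables as needed; returns 1 on success. */
    static int unify(term *a, term *b)
    {
        a = walk(a); b = walk(b);
        if (a == b) return 1;
        if (a->is_var) {
            if (occurs(a, b)) return 0;
            a->binding = b;
            return 1;
        }
        if (b->is_var) return unify(b, a);
        if (strcmp(a->name, b->name) != 0 || a->arity != b->arity) return 0;
        for (int i = 0; i < a->arity; i++)
            if (!unify(a->args[i], b->args[i])) return 0;
        return 1;
    }

    int main(void)
    {
        /* Unify f(X, b) with f(a, Y): expect X := a and Y := b. */
        term X = { "X", 1, NULL, 0, {0} }, Y = { "Y", 1, NULL, 0, {0} };
        term A = { "a", 0, NULL, 0, {0} }, B = { "b", 0, NULL, 0, {0} };
        term f1 = { "f", 0, NULL, 2, { &X, &B } };
        term f2 = { "f", 0, NULL, 2, { &A, &Y } };
        int ok = unify(&f1, &f2);
        printf("unified: %d, X -> %s, Y -> %s\n", ok, walk(&X)->name, walk(&Y)->name);
        return 0;
    }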

Bill.


[sage-devel] Re: Sage & Pari

2011-07-26 Thread Jason Grout

On 7/26/11 11:03 AM, William Stein wrote:

(Sorry if anybody's favorite Sage component is missed in the attached
diagram -- these were just the notes for my talk, and the diagram I
actually drew was by hand with chalk and had ... below some spots to
emphasize incompleteness.)


Nice; I think it's funny how Jmol is pictured in the center, but not
mentioned :).  Matplotlib should also probably have been in the Python
list, as it's a well-used component of Sage.


But all in all, this is a great diagram!

Jason




Re: [sage-devel] Re: Sage & Pari

2011-07-26 Thread daly
On Tue, 2011-07-26 at 08:30 -0700, Bill Hart wrote:
> Hi Simon,
> 
...[snip]...

> 
> Of course properly citing Pari and the algorithm used and checking
> under what conditions the result is meaningful, etc, assumes that
> these things are well documented and accessible. So, a positive step
> the Pari developers could take is to make it easy to read the source
> code for a given function in Pari, much as Sage has done for Sage
> library code, and to document the algorithms used and where they are
> published and what their restrictions are.
> 
> A lot of Pari code is especially difficult because it uses messy stack
> allocation all over the place. This is not thread safe and it is
> difficult to read and contribute to. If I wanted to make Pari really
> easy to contribute to and understand, I'd probably use garbage
> collection for all but the core arithmetic routines. I'd still make
> the core threadsafe though. I believe flint2 proves that this is
> possible in an efficient manner without messy stack allocation
> routines.
> 
> The Pari developers might also like to consider people who wrote the
> Sage wrapper for Pari as contributing to Pari and credit them as
> such.
> 
> It is in matters like these that Sage has taken a massive first step
> towards making Computational Number Theory a respectable sport. Whilst
> it doesn't address all the issues I raised in my earlier post, it has
> taken the field from being a mass of poorly documented, broken, non-
> portable, incompatible code lying around on people's hard drives or
> locked up in inscrutable closed implementations to being a unified
> distribution which meets some minimum standards. The code in the Sage
> library is open and easily accessible. Every function has a docstring
> and doctests (alright, that's a lie, but it's almost true). Library
> code is subjected to some kind of peer review. References to papers
> are provided. And moreover, Sage builds on multiple platforms and is
> tested regularly so it is more useful for some people. These are all
> very important first steps in bringing the field into good repute.
> None of these things seems to have hurt the popularity of Sage.
> 
> Having said that, I personally find it very difficult to trace through
> which algorithm is used in what regime in Sage. I once made some
> comments about which algorithms were used in Sage in a certain regime
> and someone lambasted me for getting it wrong. I subsequently
> carefully checked it through, following the rabbit warren of function
> calls across multiple interface boundaries through many files
> separated across vast reaches of the Sage directory tree and I am
> convinced I had the details substantially right. Of course if we just
> relied on Sage documentation to tell us what algorithm is being used
> in which package, we'd get it wrong much of the time, because that
> relies on the documentation having been maintained correctly. But at
> least Sage makes a step in the right direction. I've had similar
> issues trying to trace through the linbox package to figure out which
> algorithm is used.

...[snip]...

To paraphrase your above comment, I believe that we need to raise the
standards of computational mathematics further toward being "a 
respectable sport". 

Consider your observation:
   "Of course properly citing Pari and the algorithm used and checking
    under what conditions the result is meaningful, etc, assumes that
    these things are well documented and accessible. So, a positive step
    the Pari developers could take is to make it easy to read the source
    code for a given function in Pari, much as Sage has done for Sage
    library code, and to document the algorithms used and where they are
    published and what their restrictions are."

In regular mathematics it is standard practice to fully document, that
is, show the complete proof of your findings. In computational maths
we do not do this. There is no particular reason why we don't except
that it is not the expected behavior.

We could raise the bar of expected behavior by having algorithms fully
explained in the source code along with citations of theory sources. We
could include a section on limitations, boundary conditions, examples,
assumptions, and test cases. This is not as challenging as it sounds
if it is done by the original author. It is extremely hard to recover
or discover this information after the fact.

Knuth proposed the idea of "literate programs" which combine the
actual source code with the human language text as a single document.
The document re-arranges the source code for ease of explanation and
includes the usual paper sections on background, theory, along with
a bibliography citing prior (literate) work. (See Queinnec, Christian
``Lisp in Small Pieces'' ISBN 0-521-56247-3 for a literate example)

We could take small steps in this direction, possibly by hosting a
"literate program" track at conferences or workshops to encourage the
more literate among us to show examples and build up the technology.

[sage-devel] Re: Sage & Pari

2011-07-26 Thread Bill Hart


On Jul 26, 5:59 pm, William Stein  wrote:

> When doing such investigations, we care greatly that the data we get is
> correct.  We just don't care so much that the program producing it be
> *proven* correct in a formal and technically rigorous sense.

I understand that you qualified that with "such investigations" in
reference to my example of investigating conjectures and gathering
numerical evidence etc. And I agree with you in that context.

However, I wish to emphasise the point I am making that the issues of
correct data and rigorous programming methods are very much related.
We are currently doing this ad hoc. It will in future be largely
automated. And where it is not, it will be part of producing code to
publication standard.

I only wish to emphasise the point so that in a few decades' time you
will see how absolutely certain I was of that fact at this moment.

That is to take nothing away from the important work of FoCM and
similar ventures to improve attitudes. That is important too and I
believe very relevant to Pari.

Bill.



Re: [sage-devel] Re: Sage & Pari

2011-07-26 Thread William Stein
On Tue, Jul 26, 2011 at 9:00 AM, Bill Hart wrote:
>
> Regarding rigor, mathematics itself went through a phase where
> informal arguments were displaced with formal ones. Likewise, informal
> computer programs will eventually give way to formally verified ones,
> and this will naturally be embraced by mathematicians. You are right
> that this will not directly have an effect on Number Theory
> computations as they are currently done because people will continue
> to use computers as a tool to investigate conjectures and collect
> numerical evidence and so on. They care little about whether the
> program is correct.
>

When doing such investigations, we care greatly that the data we get is
correct.  We just don't care so much that the program producing it be
*proven* correct in a formal and technically rigorous sense.

> When people begin justifying their programs in their papers, (which
> will of course also be electronic in the future), then the field will
> become more academically sound and eventually the problems it
> currently has will become a thing of the past. At the moment the field
> (if you can even call it that) has a serious image problem. People
> speak about working on computation in whispers as though it is sinful.
>

There are other things happening in the world of mathematics that are
helping with this image problem.  For example, I was just at FoCM 2011 [1]
and did not get this impression about computation, and there was a very wide
range of mathematics represented (everything from numerical PDEs to
Computational Topology to Number Theory).   And there were certainly some
well-respected mathematicians among the participants (e.g., Stephen Smale,
Gunnar Carlsson, etc.).   The people who build communities (major conference
series / proceedings volumes) like the FoCM organizers are helping.

[1] http://www.damtp.cam.ac.uk/user/na/FoCM11/workshops.html


> This is possibly something you don't encounter as a computer
> scientist.
>
> Bill.
>



-- 
William Stein
Professor of Mathematics
University of Washington
http://wstein.org



[sage-devel] Re: Sage & Pari

2011-07-26 Thread Bill Hart


On Jul 26, 1:51 am, rjf  wrote:

> I hope that Bill was hoping to stir up some dissenting opinions.

Yes, naturally.

However, it looks to me that you have simply recorded the way things
currently are. People will do whatever they feel like. That's true by
definition.

Regarding rigor, mathematics itself went through a phase where
informal arguments were displaced with formal ones. Likewise, informal
computer programs will eventually give way to formally verified ones,
and this will naturally be embraced by mathematicians. You are right
that this will not directly have an effect on Number Theory
computations as they are currently done because people will continue
to use computers as a tool to investigate conjectures and collect
numerical evidence and so on. They care little about whether the
program is correct.

I'm also not talking about program proving in the sense that people
usually mean it. I simply mean that compiler technology will give more
confidence in implementations by virtue of the fact that they will
check things currently left unchecked. This will obviously be picked
up by serious mathematicians, namely the ones who are currently
growing up with computers and who come to care about rigor, and will
be made a formal part of doing mathematics with computers. I don't
envision some centralised authority imposing conditions on published
work. It will simply become de rigueur in mathematics. Researchers
will not risk submitting papers that may be rejected on the grounds of
inadequate formal justification of their code for the standard of
journal they are submitting to.

When people begin justifying their programs in their papers, (which
will of course also be electronic in the future), then the field will
become more academically sound and eventually the problems it
currently has will become a thing of the past. At the moment the field
(if you can even call it that) has a serious image problem. People
speak about working on computation in whispers as though it is sinful.
This is possibly something you don't encounter as a computer
scientist.

Bill.



[sage-devel] Re: Sage & Pari

2011-07-26 Thread Bill Hart
Hi Simon,

After reading what you wrote, I fully agree with you. Your distinction
between citing a citation and citing the original work is a useful
one. This would happen naturally if you focused on identifying the
algorithm and understanding its limitations, as I recommended. But
the way you have put it is more nuanced and sensible.

Actually, regarding thin wrappers, nothing irritates me more than a
programmer saying "we have just implemented feature X in system Y"
when what they mean is they "wrapped function W from external library
Z" and called it an "implementation" of feature X.

[This issue actually grates on me a lot. I recently saw the most
absurd example of this. Someone claimed they had "implemented" a C
REPL. It turned out to be a Python program which took the C code you
entered and passed it to GCC via the command line. It was completely
and totally broken and didn't even preserve state. Now, it turned out
that a group at CERN has actually been working on a real C++ REPL
called Cling, for analysing data from the LHC (something like 25
petabytes of serialised C++ objects a year after filtering). I could
imagine some miscreant wrapping the whole Cling project in Python and
claiming they have implemented a C++ REPL. At least the Sage library
has many new and interesting implementations of original algorithms
which build on the functionality it "wraps".]

As for citation, unfortunately professional mathematicians rarely have
time to learn anything about programming and so often you find less
than adequate details when they cite computer software. I've been at a
Pure Maths conference for a week and a half and I've seen computers
mentioned twice. The first time no credit was given at all (though it
was almost certainly Pari). The second time the entire credit was
"SAGE", although the speaker took care to acknowledge the help of John
Cremona in helping them perform the computation. In that case I
believe the computation was possibly done entirely in Pari via Sage.
It looked to be computation of class numbers and unit groups of some
number fields, although I am not 100% sure of this. The speaker was
delighted that "computers can do these things". Of course computers
can't do anything. Better to cite the person who did the
implementation. And did that speaker realise that the computation was
subject to the GRH or worse? They may be excused for not going into
great detail if the computation was as mundane as I thought it was. It
seems to fall into the category of "by a well known result...".

The situation we really need to be worried about is where some special
algorithm is used which makes a given computation possible where it
was not otherwise. If my computation relied critically on multiplying two
large polynomials of degree ten billion, I might want to mention that
I used flint via Sage, in case the reader haplessly tries to do this
in Pari or some system that only supports polynomials of length up to
2^30.

Of course properly citing Pari and the algorithm used and checking
under what conditions the result is meaningful, etc, assumes that
these things are well documented and accessible. So, a positive step
the Pari developers could take is to make it easy to read the source
code for a given function in Pari, much as Sage has done for Sage
library code, and to document the algorithms used and where they are
published and what their restrictions are.

A lot of Pari code is especially difficult because it uses messy stack
allocation all over the place. This is not thread safe and it is
difficult to read and contribute to. If I wanted to make Pari really
easy to contribute to and understand, I'd probably use garbage
collection for all but the core arithmetic routines. I'd still make
the core threadsafe though. I believe flint2 proves that this is
possible in an efficient manner without messy stack allocation
routines.
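
To make the thread-safety point concrete, here is a hedged illustration
(not Pari's actual allocator) of why a single global arithmetic stack is
hard to share between threads, versus per-object heap allocation for the
non-critical routines:

    #include <stdio.h>
    #include <stdlib.h>

    #define STACK_BYTES (1 << 20)

    static unsigned char stack_mem[STACK_BYTES];
    static unsigned char *stack_top = stack_mem;   /* shared mutable state */

    /* Fast bump allocation: one pointer increment, but a data race if two
       threads call it at the same time. */
    static void *stack_alloc(size_t bytes)
    {
        void *p = stack_top;
        stack_top += bytes;
        return p;
    }

    /* The alternative suggested above for non-critical routines: let each
       object own its memory (malloc/free or a garbage collector), so there
       is no shared stack pointer to fight over. */
    static void *heap_alloc(size_t bytes)
    {
        return malloc(bytes);
    }

    int main(void)
    {
        void *a = stack_alloc(128);
        void *b = heap_alloc(128);
        printf("stack temp at %p, heap object at %p\n", a, b);
        free(b);
        return 0;
    }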

The Pari developers might also like to consider people who wrote the
Sage wrapper for Pari as contributing to Pari and credit them as
such.

It is in matters like these that Sage has taken a massive first step
towards making Computational Number Theory a respectable sport. Whilst
it doesn't address all the issues I raised in my earlier post, it has
taken the field from being a mass of poorly documented, broken, non-
portable, incompatible code lying around on people's hard drives or
locked up in inscrutable closed implementations to being a unified
distribution which meets some minimum standards. The code in the Sage
library is open and easily accessible. Every function has a docstring
and doctests (alright, that's a lie, but it's almost true). Library
code is subjected to some kind of peer review. References to papers
are provided. And moreover, Sage builds on multiple platforms and is
tested regularly so it is more useful for some people. These are all
very important first steps in bringing the field into good repute.
None of these things seems to have hurt the popularity of Sage.

Hav

[sage-devel] Re: Sage & Pari

2011-07-25 Thread rjf
There seem to be two issues here.
1. It would be nice for Pari to get credit when Pari is used, even if
the user gave credit to Sage.
and
2. (Bill Hart's assertion) that citing an algorithm that you haven't
studied is like citing a paper you haven't read.

Regarding item 1.  If you accept code under GPL, you are restricted in
your further distribution of the code.  I am unaware of requirements
that you cite the code if you publish a result of that code.  If that
were the case, we would have to cite everything from Python, Cython,
GCC, to Linux, or whatever layers of software were used. Maybe
hardware too (which processors etc.)

  (It is sometimes useful to know such details,
especially when comparing times; one hopes that numerical results in
number theory are not affected by the version of GCC, but that might
happen.)

Thus someone using Sage need not even mention Sage, though it would be
courteous to do so.  At least some of the early use of Macsyma by
distinguished MIT math professors (usually through some surrogate
lower-form-of-life:)  ) was entirely hidden from view.  Presumably all
use of Maxima from Sage is hidden from the view of the user.

It would be courteous to credit the programmers who wrote the programs
that computed the results, but this is hardly ever known. Who really
wrote the Python system? Linux? etc.

Giving credit where credit is due is hardly required of anyone.
Indeed, taking credit where credit is NOT due occasionally happens.
(Wolfram, in a blurb on their new CDF format, incidentally claims
Wolfram was the major force behind MathML...)

If Pari were distributed with a condition that persons using it would
have to acknowledge it,
then it would presumably not be part of Sage.

If Pari has to show usage statistics, maybe some Sage server could be
instrumented to show this.
Otherwise, all one can do, it seems, is request number-theorists who
use Sage to consider somehow whether they are computing using Pari,
and cite it too.  But they might not know.

As for item 2, I disagree with Bill.

I think that saying your result was computed by program X, and that
other people (including the referees) can reproduce your results by
using program X is perfectly valid.  You might strengthen your article
by saying that you believe the results to be correct for some reason,
or even provide a proof of some sort. Your paper might still be right
even if the program is riddled with bugs;
your paper might be wrong and the program correct.  The referees might
accept the paper and it might be wrong. Or they might reject it even
if it is right.

  Citing a paper rather than a computational result is not necessarily
stronger, as evidenced by the fact that many papers have been shown to
be wrong.

It seems to me that the reality is: an author takes primary
responsibility for the correctness of a paper submitted to a journal.
Whatever tools, or references, are used to bolster the confidence of
the correctness are potentially valid.  I think we are largely past
the idea that a paper should be rejected on the grounds that a proof
was (merely) a computation and not confirmed by humans. But maybe I'm
naive on that score.

Also in response to Bill's view, it seems to me dubious that advances
in proving program correctness or in formulation of standards will
have much effect on number theory calculations.

I hope that Bill was hoping to stir up some dissenting opinions.
RJF



[sage-devel] Re: Sage & Pari

2011-07-25 Thread Simon King
Hi Bill,

let me comment on the "how to cite individual Sage components" topic.

On 26 Jul., 00:18, Bill Hart  wrote:
> * Citing a Number Theory package you know nothing about in published
> research is about as dubious as citing a paper you have never read.
> Unless you are prepared to read the code and understand the algorithm
> Pari uses, what right do you have to cite it? However, when writing
> papers, if you use a published result which makes use of some other
> published results, you usually cite the result you directly used, not
> the many other results that backed it up. In fact it is considered
> poor form to cite papers you know nothing about.

You draw an analogy between "using software A that in fact uses
software B behind the scenes" and "using a theorem of paper A that in
fact uses a theorem from paper B". You conclude: One would only cite
paper A, and it would actually be wrong to cite paper B if one doesn't
know it; hence, one should (by analogy) cite software A, but not
software B that one doesn't know but that does the actual work.

I believe you need to differentiate a bit more. Assume that the
theorem that you use appears in paper A just as a citation from paper
B. What would you do in your paper? Would you plainly write "by Thm
3.2 in [A]", even though the theorem is not proved in A? Or would you
try to look it up in paper B and write "by Thm 7.8 in [B] (see also
Thm 3.2 in [A])"?

The analogy of "paper A cites the theorem from paper B" in computer
algebra is "software A just uses a thin wrapper around software B".
That's to say: Assume that you have in your Sage session an object X
that in fact is a group living in GAP. Now, you do X.SylowSubgroup(2).
Sage's main contribution to that computation is to send the string
"SylowSubgroup(%s, 2)"%X.name() to GAP and to wait for an answer.

So, would you say "Sage has computed the  Sylow subgroup"? Wouldn't it
be more honest to say "GAP has computed the Sylow subgroup"?

There are, of course, more complicated cases. For example, I wrote an
(optional) spkg that computes modular cohomology rings of finite
groups.
 * It makes intense use of GAP and Singular, but the code base is
mainly original Cython code. However, the computations performed by
GAP and Singular are essential steps in the algorithm. Would you say
that it suffices to cite Sage for my spkg? I can tell you that some
Singular developers would be VERY upset if I wouldn't properly credit
Singular! And I agree with them.
 * In one line of an auxiliary method, I factor a multivariate
polynomial over a finite field. The factorisation is of no importance
for the final result, but it is part of a heuristic to speed up one
step of the algorithm. Would you say that I should try to find out
what component of Sage is responsible for the factorisation, and cite
it? I think that would go too far.

I don't know much about Pari. But if it does the actual work and Sage
only provides the interface, then I think it is clear that Pari must
be cited.

Cheers,
Simon



[sage-devel] Re: Sage & Pari

2011-07-25 Thread Bill Hart
Can we please be more precise and scientific when discussing this
important issue. I find it confusing to try to think about the many
issues raised without some precision.

Some trivialities first. Please take care to distinguish:

* finding a bug in Pari and contributing a bug report
* finding a bug in Pari and contributing a fix to Pari
* finding missing functionality in Pari and implementing it in the
Sage library
* finding missing functionality in Pari and implementing it in C and
contributing it to Pari

Can we also distinguish "blame" from "reasons things are the way they
are". Are we criticising the choice of language (C), or are we simply
giving a reason why people prefer contributing to Sage? Blame implies
someone did something wrong. And if we are to "blame" someone for
there only being two developers, please state the reasons. It is
circular logic to state that no one wants to contribute to Pari
because there are only two developers. It is therefore a logical
fallacy to argue that someone should be blamed for this.

Can we also distinguish between the fact that Pari has a developers
list (I am subscribed and see posts every few days, the most recent
being today) and the opinion that some functionality of the list is
considered suboptimal.

Also it is a matter of fact that there have been code contributions to
Pari from the Sage group. Jason Moxham worked for 3 months on porting
Pari to MSVC with a grant obtained from MSR by William Stein and
myself.

Some other issues require further discussion in my opinion.

* I think it is simply an illusion that there are hordes of skilled C
programmers out there who wish to write Open Source, Number Theory
code. I think, basically, the intersection of these two specialties is
almost empty these days. And where it is not, these people are working
on their own projects or are already Pari developers. I think it is
misguided for the Pari developers to expect Sage developers to
contribute to Pari. The best way to improve the state of affairs is to
train people, i.e. run undergrad classes, do summer projects, offer
Computational Number Theory classes, teach mathematics students to
write C, hire people to do PhDs and postdocs, run workshops on
contributing to Pari, etc.

* It is perfectly logical that the number of people citing Sage is
increasing. The breadth of coverage in Sage of all areas of
mathematics is increasing all the time. And the number of users of
Sage is (or at least was) increasing. But this is not necessarily
anything to do with Pari.

* It is not at all clear to me that the number of people using Pari
via Sage for publishable Computational Number Theory research is
actually increasing. It may be that the number of real users of Pari
is actually shrinking. I do not claim that this is the case. I am
merely making the observation that I have not seen any quantifiable
data to substantiate a claim in either direction (this may be a
statement of ignorance on my part).

* Citing a Number Theory package you know nothing about in published
research is about as dubious as citing a paper you have never read.
Unless you are prepared to read the code and understand the algorithm
Pari uses, what right do you have to cite it? However, when writing
papers, if you use a published result which makes use of some other
published results, you usually cite the result you directly used, not
the many other results that backed it up. In fact it is considered
poor form to cite papers you know nothing about. Yet here we are
asking people to cite Pari when they know nothing about it and in fact
used Sage, which took care to use Pari in the way that it was intended
(which was checked by referees right!?). The people who need to cite
Pari are the people who interface directly with it in Sage. After all,
they are implementing new algorithms which they are publishing, and
they are carefully checking that Pari is being used correctly
according to the published limits of the algorithms and the documented
interface, right? If we apply the logic regarding citation of Pari to
all software, then where does it stop? If someone writes a package on
top of Sage, and another on top of that, should they all cite Pari if
they use Number Theory functionality? Obviously not! And yes, I infer
that it is the policy of the Sage developers to cite Pari when using
number theoretical code that makes use of Pari.

* It is very important to carefully check that you are using computer
software carefully, according to the documented interface (it is
documented, right!?) and that it does what you think it does. This is
even more relevant for computer algebra than for papers, because
papers -- especially ones in good journals -- are carefully peer
reviewed and often relatively widely read, whereas computer software
is usually not reviewed critically according to agreed standards and
almost never read. In fact, computer software is not held to any
academic standard. There is no rigorous formal basis fo