[sage-devel] Re: slightly OT: new M4RI library

2008-05-17 Thread Bill Hart

Here's another idea which should speed things up a bit.

For 10000x10000 we currently use k = 6. Instead of this, we could use
k = 5 and make two Gray tables simultaneously. These will still fit in
cache.

Instead of doing 6 bits at a time, we can then do 10 bits at a time.
We'd load the appropriate line from the first Gray table, then the
appropriate one from the second and xor them, then xor with the output
matrix. This should decrease the number of loads and stores
considerably. Moreover, the SSE instructions will then be much more
efficient as the ratio of arithmetic instructions to loads and stores
is higher.
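
A minimal sketch of the two-table combine described above, not M4RI's actual code. The names build_gray_table and combine_two_tables, the word typedef, and the convention that bit b of a table index selects row b of the k rows of B are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t word;

/* Fill T (2^k rows of `wide` words) with every XOR combination of the k rows
 * stored in B. Indices are visited in Gray code order, so each new entry
 * costs a single row XOR. Bit b of a table index selects row b of B
 * (an assumed convention). */
void build_gray_table(word *T, const word *B, unsigned k, size_t wide)
{
    for (size_t i = 0; i < wide; i++)
        T[i] = 0;                                  /* index 0: the empty sum */
    unsigned prev = 0;
    for (unsigned g = 1; g < (1u << k); g++) {
        unsigned cur = g ^ (g >> 1);               /* g-th Gray code */
        unsigned bit = 0;
        while (!((g >> bit) & 1u))
            bit++;                                 /* the bit that flipped */
        for (size_t i = 0; i < wide; i++)
            T[(size_t)cur * wide + i] =
                T[(size_t)prev * wide + i] ^ B[(size_t)bit * wide + i];
        prev = cur;
    }
}

/* XOR one row from each of two tables into the destination row c.
 * x0 and x1 are the two k-bit indices read from the current row of A. */
void combine_two_tables(word *c, const word *T0, unsigned x0,
                        const word *T1, unsigned x1, size_t wide)
{
    const word *t0 = T0 + (size_t)x0 * wide;
    const word *t1 = T1 + (size_t)x1 * wide;
    for (size_t i = 0; i < wide; i++)
        c[i] ^= t0[i] ^ t1[i];   /* 3 loads, 2 XORs, 1 store per word */
}
```

With a single table, each word of the output row costs 2 loads, 1 XOR and 1 store for every k bits of A; here 2k bits are handled for 3 loads, 2 XORs and 1 store.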

Of course one could also do 16 bits at a time, by doing 4 tables, but
I think this might actually get slower again since you've only
increased the amount of work done by 60%, but you've had a 30 %
increase in instructions.

Bill.

On 17 May, 17:45, Bill Hart [EMAIL PROTECTED] wrote:
 Martin,

 The test code still passes if you change RADIX to 128. I've no idea
 how it passes, but it does. Shame the results are not correct, because
 this speeds the code up by a factor of 2.

 I notice that in the SSE code, you check to see if alignment can be
 achieved, otherwise it doesn't use SSE. But this introduces an
 unpredictable branch. Also, where there are three operands, you can't
 use SSE2 because the likelihood of all three being aligned is too
 small.

 I think a better idea would be to explicitly force all matrices and
 all rows to be 128 bit aligned if the matrices are wide enough to
 benefit from SSE2. Then the combine function can always use SSE2 and
 there will be no need to check for alignment.
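
A rough sketch of what forcing alignment could look like, under stated assumptions: the helper names padded_row_words, alloc_aligned_rows and combine_aligned are hypothetical, rows are padded to an even number of 64-bit words, and C11 aligned_alloc is available. The SSE2 path is only compiled where the compiler defines __SSE2__; otherwise a plain word loop is used.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

typedef uint64_t word;

/* Words per row, rounded up to an even count so that every row spans a whole
 * number of 128-bit blocks. */
size_t padded_row_words(size_t ncols)
{
    size_t w = (ncols + 63) / 64;
    return (w + 1) & ~(size_t)1;
}

/* Allocate nrows zeroed rows in one 16-byte-aligned block. The row stride is
 * a multiple of 16 bytes, so every row start is aligned as well.
 * Release with free(). */
word *alloc_aligned_rows(size_t nrows, size_t ncols)
{
    size_t bytes = nrows * padded_row_words(ncols) * sizeof(word);
    word *p = aligned_alloc(16, bytes);   /* bytes is a multiple of 16 */
    if (p)
        memset(p, 0, bytes);
    return p;
}

/* c = a ^ b over `wide` words. All three pointers must be 16-byte aligned
 * and `wide` must be even, so no alignment check is needed here. */
void combine_aligned(word *c, const word *a, const word *b, size_t wide)
{
#ifdef __SSE2__
    for (size_t i = 0; i < wide; i += 2) {
        __m128i x = _mm_load_si128((const __m128i *)(a + i));
        __m128i y = _mm_load_si128((const __m128i *)(b + i));
        _mm_store_si128((__m128i *)(c + i), _mm_xor_si128(x, y));
    }
#else
    for (size_t i = 0; i < wide; i++)
        c[i] = a[i] ^ b[i];
#endif
}
```

The cost of this layout is at most one padding word per row, in exchange for removing the alignment branch from the inner loop.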

 I experimented with interleaving MMX and GPR XORs, but this doesn't
 speed anything up. There are more instructions emitted and the time
 stays about the same. The only way interleaving the MMX and GPR code
 would speed things up is if there was more computation going on in the
 registers and less memory loading and storing, I think.

 Bill.

 On 17 May, 15:45, Bill Hart [EMAIL PROTECTED] wrote:

  Hi Martin,

  Here is another 10% improvement. In the loop at the bottom of
  mzd_combine you can explicitly unroll by a factor of 8:

      /* Process eight words per iteration, then finish the remainder.
         Forming end - 8 assumes wide >= 8. */
      word * end = b1_ptr + wide;
      register word * end8 = end - 8;
      while (b1_ptr < end8)
      {
           *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
           *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
           *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
           *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
           *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
           *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
           *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
           *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
      }
      /* Tail loop: the remaining (at most eight) words. */
      while (b1_ptr < end)
      {
           *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
      }

  I did this in combination with changing the crossover for 10000x10000
  from 3600 to 7200.

  Bill.

  On 17 May, 09:40, Martin Albrecht [EMAIL PROTECTED]
  wrote:

   On Saturday 17 May 2008, Bill Hart wrote:

In going from 5000x5000 to 10000x10000 Magma's time increases by a
factor of less than 4. That is impossible. Strassen will never help us
there. They must be doing something else. Probably something clever.

Bill.

    I was stuck there too yesterday. Maybe only at 10000x10000 the pipeline
    gets fully utilised?

   Martin

   PS: If we run out of ideas we can simply go for parallelism; that should
   help on sage.math ;-)

   --
   name: Martin Albrecht
   _pgp: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
   _www:http://www.informatik.uni-bremen.de/~malb
   _jab: [EMAIL PROTECTED]
--~--~-~--~~~---~--~~
To post to this group, send email to sage-devel@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/sage-devel
URLs: http://www.sagemath.org
-~--~~~~--~~--~--~---



[sage-devel] Re: slightly OT: new M4RI library

2008-05-17 Thread Martin Albrecht

On Saturday 17 May 2008, Bill Hart wrote:
 Martin,

 The test code still passes if you change RADIX to 128. I've no idea
 how it passes, but it does. Shame the results are not correct, because
 this speeds the code up by a factor of 2.

Since all routines use RADIX and I only check whether their results match,
they are all wrong in the same way and the error goes undetected. I should
add a test with known answers, I suppose.
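
A small sketch of what such a known-answer test might look like. It is not taken from the M4RI test suite: gf2_mul_ref and gf2_known_answer_ok are hypothetical names, and the reference product stores one byte per entry so that it cannot share a RADIX-related bug with the packed routines.

```c
#include <stddef.h>

/* Reference product C = A * B over GF(2) with one byte per entry, so it does
 * not depend on RADIX or on any word packing. All matrices are n x n,
 * stored row-major. */
void gf2_mul_ref(size_t n, const unsigned char *A, const unsigned char *B,
                 unsigned char *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            unsigned char s = 0;
            for (size_t k = 0; k < n; k++)
                s ^= (unsigned char)(A[i * n + k] & B[k * n + j]);
            C[i * n + j] = s;
        }
}

/* Hand-computed case: [[1,1],[0,1]] * [[1,0],[1,1]] = [[0,1],[1,1]].
 * Returns 1 if the routine reproduces it, 0 otherwise. */
int gf2_known_answer_ok(void)
{
    static const unsigned char A[4] = {1, 1, 0, 1};
    static const unsigned char B[4] = {1, 0, 1, 1};
    static const unsigned char want[4] = {0, 1, 1, 1};
    unsigned char C[4];
    gf2_mul_ref(2, A, B, C);
    for (size_t i = 0; i < 4; i++)
        if (C[i] != want[i])
            return 0;
    return 1;
}
```

The same pattern extends to the packed routines: unpack their result and compare it entry by entry against either a fixed answer or this slow reference.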

 I notice that in the SSE code, you check to see if alignment can be
 achieved, otherwise it doesn't use SSE. But this introduces an
 unpredictable branch. Also, where there are three operands, you can't
 use SSE2 because the likelihood of all three being aligned is too
 small.

 I think a better idea would be to explicitly force all matrices and
 all rows to be 128 bit aligned if the matrices are wide enough to
 benefit from SSE2. Then the combine function can always use SSE2 and
 there will be no need to check for alignment.

I'll try that.

 I experimented with interleaving MMX and GPR XORs, but this doesn't
 speed anything up. There are more instructions emitted and the time
 stays about the same. The only way interleaving the MMX and GPR code
 would speed things up is if there was more computation going on in the
 registers and less memory loading and storing, I think.

I came to the same conclusion (but my code might not have been as good as
yours). I improved other areas of the code (e.g. use naive multiplication
rather than M4RM if B->ncols < RADIX, since it is faster, etc.). I can forward
you my newest tarball (but the speed improvements aren't really noticeable
yet).
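
To illustrate why a narrow B favours the naive approach, here is a sketch under assumptions: gf2_mul_narrow is a hypothetical name, rows are packed into 64-bit words with bit j of a row standing for column j, and B has at most 64 columns so that each of its rows is a single word. This is not the M4RI routine itself.

```c
#include <stddef.h>
#include <stdint.h>

/* C = A * B over GF(2) for a B with at most 64 columns.
 * A: a_rows rows of a_words 64-bit words each; bit j of a row is column j.
 * B: b_rows single-word rows, with b_rows <= 64 * a_words.
 * C: a_rows single-word rows. */
void gf2_mul_narrow(const uint64_t *A, size_t a_rows, size_t a_words,
                    const uint64_t *B, size_t b_rows, uint64_t *C)
{
    for (size_t i = 0; i < a_rows; i++) {
        uint64_t acc = 0;
        for (size_t j = 0; j < b_rows; j++) {
            uint64_t w = A[i * a_words + j / 64];
            if ((w >> (j % 64)) & 1u)
                acc ^= B[j];   /* row j of B contributes to row i of C */
        }
        C[i] = acc;
    }
}
```

Each row of C is a single word here, so there is no long row for a lookup table to amortise over, which is consistent with the observation that table setup does not pay off in this case.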

Martin





[sage-devel] Re: slightly OT: new M4RI library

2008-05-17 Thread Martin Albrecht

On Saturday 17 May 2008, Martin Albrecht wrote:
  I think a better idea would be to explicitly force all matrices and
  all rows to be 128 bit aligned if the matrices are wide enough to
  benefit from SSE2. Then the combine function can always use SSE2 and
  there will be no need to check for alignment.

 That doesn't seem to make a noticeable difference for me (on C2D). However,
 I realised that the multiplications are faster where the target matrix is a
 real matrix rather than a window (which has bad data locality). Copying
 everything over does not seem like a good idea, but it at least indicates an
 area for improvement.

Okay, if I only copy when we crossover to M4RM then the memory overhead is 
constant (~ cutoff^2) and the performance still improves.
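
A sketch of the copy step, with assumptions labeled: window_t, window_to_contiguous and contiguous_to_window are hypothetical names, and the window is assumed to start on a word boundary inside a parent matrix whose rows are `stride` words apart.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint64_t word;

/* A window views nrows x wide words inside a parent matrix whose rows are
 * `stride` words apart (stride >= wide). */
typedef struct {
    word *first;    /* first word of the window's first row */
    size_t nrows;
    size_t wide;
    size_t stride;
} window_t;

/* Copy the window into a freshly allocated contiguous buffer.
 * Returns NULL on allocation failure; release with free(). */
word *window_to_contiguous(const window_t *w)
{
    word *buf = malloc(w->nrows * w->wide * sizeof(word));
    if (!buf)
        return NULL;
    for (size_t r = 0; r < w->nrows; r++)
        memcpy(buf + r * w->wide, w->first + r * w->stride,
               w->wide * sizeof(word));
    return buf;
}

/* Write a contiguous buffer back into the window. */
void contiguous_to_window(const window_t *w, const word *buf)
{
    for (size_t r = 0; r < w->nrows; r++)
        memcpy(w->first + r * w->stride, buf + r * w->wide,
               w->wide * sizeof(word));
}
```

Because the copy is only made for blocks at or below the cutoff, the temporary buffer never grows beyond roughly cutoff^2 bits, whatever the size of the full matrix.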

Old: 64-bit Debian/GNU Linux, 2.33 GHz Core2Duo (times in seconds)
Matrix Dimension   Magma 2.14-13 (64-bit)   M4RI-20080517 (64-bit)
10,000 x 10,000     2.920                    3.610
16,384 x 16,384    11.140                   12.120
20,000 x 20,000    20.370                   24.390
32,000 x 32,000    74.290                   94.910

New: 64-bit Debian/GNU Linux, 2.33 GHz Core2Duo (times in seconds)
Matrix Dimension   Magma 2.14-13 (64-bit)   M4RI-20080517 (64-bit)
10,000 x 10,000     2.920                    2.990
16,384 x 16,384    11.140                   11.750
20,000 x 20,000    20.370                   21.180
32,000 x 32,000    74.290                   86.570

On Opteron things don't look this way, but I think sage.math is pretty heavily 
used right now such that my benchmarks there are not very telling.

Martin




[sage-devel] Re: slightly OT: new M4RI library

2008-05-17 Thread Bill Hart

That's looking good. Would you like me to run it on an unburdened
Opteron to see how it goes? If you like you can send me a tarball and
I'll try it out.

I think our best bet for a significant improvement now is the idea of
using two Gray tables of half the size simultaneously. I also realised
it possibly improves the cache performance for the A matrix too.

I was casually wondering whether Magma might use a highly optimised
Winograd's algorithm instead of the naive algorithm. But over GF2 I
think it probably actually takes longer, since it basically replaces
n^2 full length scalar multiplies by n^2 half length ones and 2*n^2
half row additions, plus a pile of other overhead.
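
For reference, and assuming that "Winograd's algorithm" here means Winograd's inner-product identity (the thread does not say which variant is meant), the identity for vectors of even length n is:

```latex
\sum_{j=1}^{n} x_j y_j
  = \sum_{j=1}^{n/2} (x_{2j-1} + y_{2j})(x_{2j} + y_{2j-1})
  - \sum_{j=1}^{n/2} x_{2j-1} x_{2j}
  - \sum_{j=1}^{n/2} y_{2j-1} y_{2j}
```

The last two sums each depend on one operand only, so they can be computed once per row of A and once per column of B. Over GF(2) subtraction is the same as addition, and the multiplications saved are single-bit ANDs, which is why the extra additions are unlikely to pay for themselves.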

Bill.




[sage-devel] Re: slightly OT: new M4RI library

2008-05-17 Thread Bill Hart

Yet another idea.

Suppose we do not combine entire rows in the Gray table, but only half
rows. Once half a row is bigger than a single cache line (512 bits on
the Opteron) we may as well work with half rows. This allows us to
work with twice as many rows at once in the Gray tables (each of half
the size). This means that we are dealing with twice as many bits from
rows of A as usual and twice as many rows of B as usual, but we need
to do it all again for the second half of the rows. This means we get
twice the work done in the same amount of cache space.

Combined with the idea of using two Gray tables of 2^5 combinations of
rows instead of a single table of 2^6 combinations of rows, this would
equate to dealing with 20 bits of each row of A at a time and 20 rows
of B at a time.

With this scheme, there would then be 4 arithmetic operations in SSE
registers, 5 loads and 1 store, when combining rows from Gray tables,
instead of about 6.6 loads, 3.3 stores and 3.3 arithmetic operations,
changing the ratio of loads and stores to arithmetic ops from 3.0 to 1.5.
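
The operation counts can be reproduced with a small model. This is a sketch under the assumption that each pass loads one row per table plus the destination, performs one XOR per table, and stores the destination once; op_count, count_ops and mem_per_xor are names invented for this illustration.

```c
/* Memory and XOR operations per output word needed to fold `bits` bits of a
 * row of A into the result, when each pass XORs `tables` table rows (each
 * indexed by k bits) into the destination. */
typedef struct {
    double loads;
    double stores;
    double xors;
} op_count;

op_count count_ops(double bits, double k, double tables)
{
    double passes = bits / (k * tables);
    op_count c;
    c.loads  = passes * (tables + 1.0);  /* table rows plus the destination */
    c.stores = passes;                   /* destination written once per pass */
    c.xors   = passes * tables;
    return c;
}

/* Loads and stores per arithmetic operation. */
double mem_per_xor(op_count c)
{
    return (c.loads + c.stores) / c.xors;
}
```

Under this model, count_ops(20, 6, 1) gives about 6.67 loads, 3.33 stores and 3.33 XORs, a ratio of 3.0 memory operations per XOR, while count_ops(20, 5, 4) gives 5 loads, 1 store and 4 XORs, a ratio of 1.5.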

This is another example where copying the data (the half rows) out and
reordering it so it has better locality, would probably make a big
difference. That sort of thing always works exceptionally well on AMD
chips.

Bill.




[sage-devel] Re: slightly OT: new M4RI library

2008-05-17 Thread Bill Hart

Martin,

Here's a really unusual thing. Perhaps you can confirm this. I get a
20% improvement if I add:

if (x)
{
}

in the three obvious places in the function _mzd_mul_m4rm_impl. This
stops it calling mzd_combine on the zero row.

But I don't understand why this works. The time should be only 1.5%
better since k = 6 and there are 2^k rows in the table, only one of
which is zero.
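
The guard and the expected saving can be sketched as follows. This is an illustration only: combine_unless_zero and zero_index_fraction are hypothetical names, and the real change wraps the existing combine calls in _mzd_mul_m4rm_impl rather than introducing a new function.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t word;

/* XOR table row x into c, skipping index 0, whose row is all zero.
 * Returns 1 if a row was combined and 0 if the call was skipped. */
int combine_unless_zero(word *c, const word *T, unsigned x, size_t wide)
{
    if (!x)
        return 0;
    const word *t = T + (size_t)x * wide;
    for (size_t i = 0; i < wide; i++)
        c[i] ^= t[i];
    return 1;
}

/* Fraction of uniformly random k-bit indices that are zero: 1 / 2^k. */
double zero_index_fraction(unsigned k)
{
    return 1.0 / (double)(1u << k);
}
```

For k = 6 the fraction is 1/64, about 1.56%, which is the figure the measured 20% is being compared against.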

Could it be that your coinflip function is not quite random?

Anyway, I'm down to 3.40s for 10000x10000 with this change. Test
functions still pass.

Bill.





[sage-devel] Re: slightly OT: new M4RI library

2008-05-17 Thread Bill Hart

I suppose that this might be due to the ends of rows all being zero as
they aren't a multiple of 64 bits long. But I checked for 16384x16384
and we are nearly down to the speed of Magma there too. I just don't
get it. The coinflip has to be broken I think.

Bill.


[sage-devel] Re: slightly OT: new M4RI library

2008-05-17 Thread Bill Hart

I checked the coinflip and it is definitely fine. There is no greater
probability of 6 zeroes in a row than there ought to be. So the
speedup I just reported is quite a mystery.
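
One way to run this kind of check, sketched with a stand-in generator: the code below uses SplitMix64 rather than M4RI's coinflip (whose implementation is not shown in this thread), and count_zero_chunks is a hypothetical helper. It counts how many 6-bit chunks of the bit stream are all zero; a fair source should give about 1 in 64.

```c
#include <stddef.h>
#include <stdint.h>

/* SplitMix64 step, used here only as a stand-in bit source. */
static uint64_t splitmix64(uint64_t *state)
{
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Draw n_words 64-bit words, cut each into ten non-overlapping 6-bit chunks
 * (the top four bits are ignored), and count the chunks that are all zero. */
unsigned long count_zero_chunks(uint64_t seed, size_t n_words)
{
    unsigned long zeros = 0;
    for (size_t i = 0; i < n_words; i++) {
        uint64_t w = splitmix64(&seed);
        for (int c = 0; c < 10; c++) {
            if ((w & 0x3F) == 0)
                zeros++;
            w >>= 6;
        }
    }
    return zeros;
}
```

With 100000 words there are one million chunks, so the expected count is 1000000/64 = 15625; a count far from that would point at the bit source.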

Bill.

[sage-devel] Re: slightly OT: new M4RI library

2008-05-17 Thread Bill Hart

Woot!!

On 17 May, 23:46, Martin Albrecht [EMAIL PROTECTED]
wrote:
  Old: 64-bit Debian/GNU Linux, 2.33Ghz Core2Duo
  Matrix Dimension   Magma 2.14-13 (64-bit)  M4RI-20080517 (64-bit)
  10,000 x 10,000    2.920                           3.610
  16,384 x 16,384    11.140                          12.120
  20,000 x 20,000    20.370                          24.390
  32,000 x 32,000    74.290                          94.910

  New: 64-bit Debian/GNU Linux, 2.33Ghz Core2Duo
  Matrix Dimension   Magma 2.14-13 (64-bit)  M4RI-20080517 (64-bit)
  10,000 x 10,000    2.920                           2.990
  16,384 x 16,384    11.140                          11.750
  20,000 x 20,000    20.370                          21.180
  32,000 x 32,000    74.290                          86.570

 If you take this + Bill's idea:



  For 10000x10000 we currently use k = 6. Instead of this, we could use
  k = 5 and make two Gray tables simultaneously. These will still fit in
  cache.

  Instead of doing 6 bits at a time, we can then do 10 bits at a time.
  We'd load the appropriate line from the first Gray table, then the
  appropriate one from the second and xor them, then xor with the output
  matrix. This should decrease the number of loads and stores
  considerably. Moreover, the SSE instructions will then be much more
  efficient as the ratio of arithmetic instructions to loads and stores
  is higher.

  Of course one could also do 16 bits at a time, by doing 4 tables, but
  I think this might actually get slower again since you've only
  increased the amount of work done by 60%, but you've had a 30 %
  increase in instructions.

 You get (on the C2D):

 sage: B = random_matrix(GF(2), 3.2*10^4, 3.2*10^4)
 sage: A = random_matrix(GF(2), 3.2*10^4, 3.2*10^4)
 sage: time C= A._multiply_strassen(B,cutoff=2^11)
 CPU times: user 75.82 s, sys: 0.22 s, total: 76.04 s
 Wall time: 76.31

 sage: A = random_matrix(GF(2), 2*10^4, 2*10^4)
 sage: B = random_matrix(GF(2), 2*10^4, 2*10^4)
 sage: time C= A._multiply_strassen(B,cutoff=2^11)
 CPU times: user 19.14 s, sys: 0.09 s, total: 19.24 s
 Wall time: 19.29

 sage: B = random_matrix(GF(2), 2^14, 2^14)
 sage: A = random_matrix(GF(2), 2^14, 2^14)
 sage: time C= A._multiply_strassen(B,cutoff=2^11)
 CPU times: user 10.62 s, sys: 0.05 s, total: 10.67 s
 Wall time: 10.70

 sage: B = random_matrix(GF(2), 10^4, 10^4)
 sage: A = random_matrix(GF(2), 10^4, 10^4)
 sage: time C= A._multiply_strassen(B,cutoff=2^11)
 CPU times: user 2.73 s, sys: 0.02 s, total: 2.75 s
 Wall time: 2.76

 i.e. the speed of my current Magma install on the same computer (mind you,
 this one might not be optimised for the C2D but for the Opteron, I don't
 know). The times above don't have SSE2 yet. I guess documenting Bill's tricks

  - process the rows of A in blocks
  - use two rather than one Gray code table

 well is in order since now M4RM looks quite different from the original
 algorithm. I'll do that tomorrow.
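
For readers following along, the two-table trick can be sketched in plain Python. This is only an illustration under stated assumptions, not the M4RI code: rows are stored as Python integers used as bit sets (bit j is column j), and the names gray_table, m4rm_two_tables and naive_mul are made up for this sketch. Each table holds every XOR combination of k rows of B, filled in Gray code order so that each entry costs one XOR. Two tables are consulted per row of A, so 2*k bits of A are consumed for each update of the output row.

```python
def gray_table(rows):
    """Return T with T[s] = XOR of rows[t] for every bit t set in s.

    Entries are filled in Gray code order, so each new entry is the
    previous one XORed with exactly one row.
    """
    table = [0] * (1 << len(rows))
    prev = 0
    for i in range(1, len(table)):
        gray = i ^ (i >> 1)                       # i-th Gray code word
        flipped = (gray ^ prev).bit_length() - 1  # the single bit that changed
        table[gray] = table[prev] ^ rows[flipped]
        prev = gray
    return table


def m4rm_two_tables(a_rows, b_rows, k=5):
    """Multiply A by B over GF(2), reading 2*k bits of each row of A per pass.

    a_rows[i] is row i of A as an integer (bit j = column j, with no bits
    set at or beyond len(b_rows)); b_rows[j] is row j of B, same encoding.
    """
    mask = (1 << k) - 1
    c_rows = [0] * len(a_rows)
    for start in range(0, len(b_rows), 2 * k):
        t0 = gray_table(b_rows[start:start + k])
        t1 = gray_table(b_rows[start + k:start + 2 * k])
        for i, a in enumerate(a_rows):
            bits = a >> start
            # Two table lookups are combined, then the output row is updated once.
            c_rows[i] ^= t0[bits & mask] ^ t1[(bits >> k) & mask]
    return c_rows


def naive_mul(a_rows, b_rows):
    """Reference product: XOR together the rows of B selected by each row of A."""
    c_rows = []
    for a in a_rows:
        acc = 0
        for j, b in enumerate(b_rows):
            if (a >> j) & 1:
                acc ^= b
        c_rows.append(acc)
    return c_rows
```

Comparing m4rm_two_tables against naive_mul on small inputs is a cheap sanity check. The sketch says nothing about cache behaviour or SSE2, which is where the real gains discussed in this thread come from.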

 Again, thanks Bill!
 Martin

 --
 name: Martin Albrecht
 _pgp:http://pgp.mit.edu:11371/pks/lookup?op=getsearch=0x8EF0DC99
 _www:http://www.informatik.uni-bremen.de/~malb
 _jab: [EMAIL PROTECTED]
--~--~-~--~~~---~--~~
To post to this group, send email to sage-devel@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/sage-devel
URLs: http://www.sagemath.org
-~--~~~~--~~--~--~---



[sage-devel] Re: slightly OT: new M4RI library

2008-05-17 Thread Martin Albrecht

 I suppose that this might be due to the ends of rows all being zero as
 they aren't a multiple of 64 bits long. But I checked for 16384x16384
 and we are nearly down to the speed of Magma there too. I just don't
 get it. The coinflip has to be broken I think.

If one uses M4RI with the new patch from within Sage, another PRBG is used, but
coinflip should be fine. I don't see these speedups (but I have two Gray code
tables, and this calls for more ifs).

Hi, I think we might consider merging our two forks again? Or do you also have
the two Gray code tables? Are your timings on the Opteron? Because then
things look really good, since mine are on the C2D.

Exciting times,
Martin

-- 
name: Martin Albrecht
_pgp: http://pgp.mit.edu:11371/pks/lookup?op=getsearch=0x8EF0DC99
_www: http://www.informatik.uni-bremen.de/~malb
_jab: [EMAIL PROTECTED]





[sage-devel] Does the '.spkg' format just cause more problems than it solves?

2008-05-17 Thread Dr. David Kirkby

Sage is distributed as a lot of files ending in .spkg, which are
basically tar files compressed with bzip2.

I myself think it would be better if instead of using these files,
Sage was simply distributed as a set of source files. I see several
problems with these files.

* Since they are compressed tar files, if you really want to use it,
why was the extension .tar.bz2 not used? At least a user would be able
to work out the file format.

* Distributing binary files like this makes it difficult to use CVS or
similar.

* There probably is an easier way (if so tell me), but the way I am
trying to build Sage on Solaris I find I'm constantly recreating a .spkg
file. For example, I've noticed a possible problem with /spkg/build/
singular-3-0-4-2-20080405.p1/src/Singular/flexer.sh. Is there a better
way to apply a fix than to make the change, then tar up the directory,
then compress it, then overwrite the old .spkg file? I assume there is
a better way.

Sage is pretty unique in the way all these packages are distributed in
source form. I'm not convinced this uniqueness is a good thing, but
perhaps I am wrong.

Dave



[sage-devel] Re: slightly OT: new M4RI library

2008-05-17 Thread Bill Hart

I don't have the two Gray code tables, so it would be good to get your
version. Also my code is currently a mess, so it would be good to
clean it up by merging with a cleaner version (yours). Tomorrow I'll
check carefully what I've changed and try and merge the ideas if there
are any you don't have which definitely improve performance on the
Opteron.

The speedups I am seeing from the ifs are possibly a feature of the
Opteron cache algorithms. It is very sensitive when things just begin
to fall out of cache, as they certainly are here. Not combining with
the zero row just nudges things closer in to the cache boundary since
it never has to read that row.

I have checked and the speedups are quite reproducible, and they
definitely come from the ifs, though I am now using a crossover with
Strassen of 7200!!

Bill.

On 18 May, 00:12, Martin Albrecht [EMAIL PROTECTED]
wrote:
  I suppose that this might be due to the ends of rows all being zero as
  they aren't a multiple of 64 bits long. But I checked for 16384x16384
  and we are nearly down to the speed of Magma there too. I just don't
  get it. The coinflip has to be broken I think.

 If one uses M4RI with the new patch from within Sage another PRBG is used, but
 coinflip should be fine. Don't see these speedups (but I have two Gray code
 tables and this warrants for more if's)

 Hi, I think we might consider merging our two forks again? Or do you also have
 the two Gray code tables? Are your timings on the Opteron? Because then
 things look really good since mine are on the C2D.

 Exciting times,
 Martin

 --
 name: Martin Albrecht
 _pgp:http://pgp.mit.edu:11371/pks/lookup?op=getsearch=0x8EF0DC99
 _www:http://www.informatik.uni-bremen.de/~malb
 _jab: [EMAIL PROTECTED]



[sage-devel] Re: slightly OT: new M4RI library

2008-05-17 Thread Bill Hart

P.S: yes all my times are on a 2.8Ghz Opteron. Cpuinfo says:

[EMAIL PROTECTED]:~/m4ri-20080514/testsuite> cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 65
model name      : Dual-Core AMD Opteron(tm) Processor 2220
stepping        : 3
cpu MHz         : 1000.000
cache size      : 1024 KB
<snip>
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext
fxsr_opt rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm
extapic cr8_legacy
<snip>
The 1000.000 there refers to the FSB.

Bill.

On 18 May, 00:12, Martin Albrecht [EMAIL PROTECTED]
wrote:
  I suppose that this might be due to the ends of rows all being zero as
  they aren't a multiple of 64 bits long. But I checked for 16384x16384
  and we are nearly down to the speed of Magma there too. I just don't
  get it. The coinflip has to be broken I think.

 If one uses M4RI with the new patch from within Sage another PRBG is used, but
 coinflip should be fine. Don't see these speedups (but I have two Gray code
 tables and this warrants for more if's)

 Hi, I think we might consider merging our two forks again? Or do you also have
 the two Gray code tables? Are your timings on the Opteron? Because then
 things look really good since mine are on the C2D.

 Exciting times,
 Martin

 --
 name: Martin Albrecht
 _pgp:http://pgp.mit.edu:11371/pks/lookup?op=getsearch=0x8EF0DC99
 _www:http://www.informatik.uni-bremen.de/~malb
 _jab: [EMAIL PROTECTED]



[sage-devel] Re: slightly OT: new M4RI library

2008-05-17 Thread Martin Albrecht

On Sunday 18 May 2008, Bill Hart wrote:
 I don't have the two Gray code tables, so it would be good to get your
 version. Also my code is currently a mess, so it would be good to
 clean it up by merging with a cleaner version (yours). Tomorrow I'll
 check carefully what I've changed and try and merge the ideas if there
 are any you don't have which definitely improve performance on the
 Opteron.

 The speedups I am seeing from the ifs are possibly a feature of the
 Opteron cache algorithms. It is very sensitive when things just begin
 to fall out of cache, as they certainly are here. Not combining with
 the zero row just nudges things closer in to the cache boundary since
 it never has to read that row.

 I have checked and the speedups are quite reproducible, and they
 definitely come from the ifs, though I am now using a crossover with
 Strassen of 7200!!

I'm using a crossover of 2048 here, so maybe our improvements are orthogonal?
Even more puzzling, I'd expect that my crossover should be bigger than yours.
(On a side note: my code changes how the crossover is used. Your
version: 'size < cutoff'; my version: '|cutoff - size| is minimal', which
should give actual cutoffs closer to the desired values.)
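
To make the difference between the two rules concrete, here is a rough Python sketch. It is an assumption-laden toy, not the M4RI code: it reads the first rule as 'size < cutoff', pretends that Strassen simply halves the dimension at each level, ignores any word-alignment adjustments, and uses made-up function names.

```python
def base_size_threshold(n, cutoff):
    """Halve n until the size first drops below cutoff ('size < cutoff')."""
    if cutoff < 1:
        raise ValueError("cutoff must be at least 1")
    size = n
    while size >= cutoff:
        size //= 2
    return size


def base_size_closest(n, cutoff):
    """Halve n for as long as halving moves the size closer to cutoff
    ('|cutoff - size| is minimal')."""
    size = n
    while abs(cutoff - size // 2) < abs(cutoff - size):
        size //= 2
    return size
```

Under these assumptions, n = 20000 with cutoff 2048 bottoms out at 1250 with the threshold rule but at 2500 with the closest-size rule, and n = 16384 gives 1024 versus 2048. So the same nominal cutoff can mean base cases that differ by a factor of two, which may be part of why the two forks prefer such different cutoff values.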

My version is here:

   http://sage.math.washington.edu/home/malb/spkgs/libm4ri-20080516.p1.spkg

(this needs an updated patch for Sage)

and here:

   http://sage.math.washington.edu/home/malb/m4ri-20080516.tar.gz

(which is the raw source). Those don't have SSE2 yet but it doesn't seem to 
make that much of a difference anyway. I'll add that back before doing an 
official release. However, unfortunately I'll probably have limited/no time 
tomorrow to commit.

Martin

PS: To give at least some indication that my code still does the right thing, 
a 'known answer' test:

sage: A = random_matrix(GF(2), 10^3, 10^3)
sage: B = random_matrix(GF(2), 10^3, 10^3)
sage: (A*B)._magma_() == A._magma_() * B._magma_()
True
sage: (A._multiply_strassen(B,cutoff=256))._magma_() == A._magma_() * B._magma_()
True


-- 
name: Martin Albrecht
_pgp: http://pgp.mit.edu:11371/pks/lookup?op=getsearch=0x8EF0DC99
_www: http://www.informatik.uni-bremen.de/~malb
_jab: [EMAIL PROTECTED]





[sage-devel] Re: Does the '.spkg' format just cause more problems than it solves?

2008-05-17 Thread Francois



On May 18, 11:22 am, Dr. David Kirkby [EMAIL PROTECTED]
wrote:
 Sage is distributed as a lot of files ending in .spkg, which are
 basically tar files compressed with bzip2.

 I myself think it would be better if instead of using these files,
 Sage was simply distributed as a set of source files. I see several
 problems with these files.

 * Since they are compressed tar files, if you really want to use it,
 why was the extension .tar.bz2 not used? At least a user would be able
 to work out the file format.

 * Distributing binary files like this makes it difficult to use CVS or
 similar.

 * There probably is an easier way (if so tell me), but the way I am
 trying to build sage on Solaris I find I'm constantly recreating a .spkg
 file. For example, I've noticed a possible problem with /spkg/build/
 singular-3-0-4-2-20080405.p1/src/Singular/flexer.sh Is there a better
 way to apply a fix than to make the change, then tar up the directory,
 then compress it, then overwrite the old .spkg file? I assume there is
 a better way.

 Sage is pretty unique in the way all these packages are distributed in
 source form. I'm not convinced this uniqueness is a good thing, but
 perhaps I am wrong.

On my Linux desktop, ark (the KDE front end to compression programs)
identifies them easily as tar.bz2 but asks for confirmation.
As for applying a patch, when I was working with a mainly monolithic
Sage on Gentoo that's pretty much what I was doing. I even wrote a set
of commands to automate it as much as possible. I don't know whether cpio
can handle tar.bz2; if it can, that would probably be a better way.

Francois



[sage-devel] Re: slightly OT: new M4RI library

2008-05-17 Thread Bill Hart

Here are the times I get with the different cutoffs.

Matrix Dimension   Magma     M4RI:7200   M4RI:2048
10000x10000        2.940s    3.442s      4.132s
16384x16384        9.250s    11.47s      11.80s
20000x20000        16.57s    19.3s       26.05s
32000x32000        59.05s    71.9s       71.8s

So it seems when there is not an exact cut, the higher cutoff is
substantially better. Don't know why that is.

Tomorrow I'll see if there is anything I have that speeds up your
code. I'm hopeful we'll be within about 5% on the Opteron by then. The
other ideas I outlined above should push us 10-15% ahead of Magma if
we end up implementing them, I think. Of course one can go too crazy
with optimisation.
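
For a sense of how far the timings above are from the hoped-for 5%, the gaps can be worked out with a few lines of Python. The numbers are the ones posted in this message; the matrix sizes attached to them (10000, 16384, 20000 and 32000, as in the earlier tables in this thread) are an assumption, and the helper names are made up.

```python
# Timings in seconds: (Magma, M4RI with cutoff 7200, M4RI with cutoff 2048).
# The sizes used as keys are assumed, see the note above.
TIMINGS = {
    10000: (2.940, 3.442, 4.132),
    16384: (9.250, 11.47, 11.80),
    20000: (16.57, 19.3, 26.05),
    32000: (59.05, 71.9, 71.8),
}


def percent_slower(reference, measured):
    """How much slower 'measured' is than 'reference', in percent."""
    return (measured / reference - 1.0) * 100.0


def gaps(timings):
    """Map each size to (gap at cutoff 7200, gap at cutoff 2048) relative to Magma."""
    return {
        n: (percent_slower(magma, c7200), percent_slower(magma, c2048))
        for n, (magma, c7200, c2048) in timings.items()
    }


if __name__ == "__main__":
    for n, (g7200, g2048) in gaps(TIMINGS).items():
        print(f"{n}x{n}: cutoff 7200 is {g7200:.1f}% slower, "
              f"cutoff 2048 is {g2048:.1f}% slower than Magma")
```

On these figures the 20000 case is roughly 16% behind Magma with cutoff 7200 and roughly 57% behind with cutoff 2048, while at 32000 the two cutoffs are nearly indistinguishable.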

Bill.

On 18 May, 00:40, Martin Albrecht [EMAIL PROTECTED]
wrote:
 On Sunday 18 May 2008, Bill Hart wrote:



  I don't have the two Gray code tables, so it would be good to get your
  version. Also my code is currently a mess, so it would be good to
  clean it up by merging with a cleaner version (yours). Tomorrow I'll
  check carefully what I've changed and try and merge the ideas if there
  are any you don't have which definitely improve performance on the
  Opteron.

  The speedups I am seeing from the ifs are possibly a feature of the
  Opteron cache algorithms. It is very sensitive when things just begin
  to fall out of cache, as they certainly are here. Not combining with
  the zero row just nudges things closer in to the cache boundary since
  it never has to read that row.

  I have checked and the speedups are quite reproducible, and they
  definitely come from the ifs, though I am now using a crossover with
  Strassen of 7200!!

 I'm using a crossover of 2048 here, so maybe our improvements are orthogonal?
 Even more puzzling, I'd expect that my crossover should be bigger than yours.
 (On a side note: my code changes how the crossover is used. Your
 version: 'size < cutoff'; my version: '|cutoff - size| is minimal', which
 should give actual cutoffs closer to the desired values.)

 My version is here:

    http://sage.math.washington.edu/home/malb/spkgs/libm4ri-20080516.p1.spkg

 (this needs an updated patch for Sage)

 and here:

    http://sage.math.washington.edu/home/malb/m4ri-20080516.tar.gz

 (which is the raw source). Those don't have SSE2 yet but it doesn't seem to
 make that much of a difference anyway. I'll add that back before doing an
 official release. However, unfortunately I'll probably have limited/no time
 tomorrow to commit.

 Martin

 PS: To give at least some indication that my code still does the right thing,
 a 'known answer' test:

 sage: A = random_matrix(GF(2), 10^3, 10^3)
 sage: B = random_matrix(GF(2), 10^3, 10^3)
 sage: (A*B)._magma_() == A._magma_() * B._magma_()
 True
 sage: (A._multiply_strassen(B,cutoff=256))._magma_() == A._magma_() * B._magma_()
 True

 --
 name: Martin Albrecht
 _pgp:http://pgp.mit.edu:11371/pks/lookup?op=getsearch=0x8EF0DC99
 _www:http://www.informatik.uni-bremen.de/~malb
 _jab: [EMAIL PROTECTED]



[sage-devel] Re: slightly OT: new M4RI library

2008-05-17 Thread David Harvey


On May 17, 2008, at 8:38 PM, Bill Hart wrote:

 Of course one can go too crazy with optimisation.

No surely that never happens around here.

david





[sage-devel] sed problem with Singular/flexer.sh on Solaris

2008-05-17 Thread Dr. David Kirkby

I'm getting a build problem on Solaris 10 (SPARC). The script
Singular/flexer.sh assumes 'flex' is present, then tries to extract the
version number. This line:

TV=`echo $VERSION|sed -e "s/^[0-9]*\.[0-9]*\\.//"`

is creating an error on Solaris.

I think it tries to get the last part of the version, i.e. the Z of
version X.Y.Z.

Anyway, assuming that is what is needed, rewriting the line to:

TV=`echo $VERSION|sed -e "s/^[0-9]*\.[0-9]*\.//"`

solves the problem.

Here is part of the original, without the proposed change.

VERSION=`flex --version |sed -e "s/^.*version //"|sed -e "s/^flex //"`
LV=`echo $VERSION|sed -e "s/\.[0-9]*\.[0-9]*\$//"`
MIDV=`echo $VERSION|sed -e "s/^[0-9]*\.//"|sed -e "s/\.[0-9]*\$//"`
TV=`echo $VERSION|sed -e "s/^[0-9]*\.[0-9]*\\.//"`
#echo $LV $MIDV $TV
#goodversion=
if [ $LV -lt 2 ];
then goodversion=true
fi
if [ $LV -eq 2 ];
then
  if [ $MIDV -lt 5 ];
  then goodversion=true
  fi
  if [ $MIDV -eq 5 ];
  then
    if [ $TV -le 4 ];
    then goodversion=true;
    fi
  fi
fi
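
As a cross-check of what the fragment above appears to compute, here is a small Python sketch. It is not part of flexer.sh, the function names are made up, and it assumes the version string always contains an X.Y.Z triple. What goodversion is used for later in the script is not shown here, so the sketch only reproduces the comparison itself.

```python
import re


def parse_flex_version(text):
    """Return (X, Y, Z) from text such as 'flex version 2.5.4' or 'flex 2.5.35'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no X.Y.Z version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_good_version(text):
    """Mirror the nested shell tests: X < 2, or X == 2 and Y < 5,
    or X == 2 and Y == 5 and Z <= 4."""
    # Tuple comparison is lexicographic, which is exactly that cascade.
    return parse_flex_version(text) <= (2, 5, 4)
```

So the three sed calls are extracting X, Y and Z, and the if cascade sets goodversion when the version is at most 2.5.4.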




[sage-devel] Re: sed problem with Singular/flexer.sh on Solaris

2008-05-17 Thread mabshoff

On May 18, 2:45 am, Dr. David Kirkby [EMAIL PROTECTED]
wrote:

Hi David,

 I'm getting a build problem on Solaris 10 (SPARC) The script Singular/
 flexer.sh assumes 'flex' is present,

The need to have flex installed has been fixed in 3.0.2.alpha0.

 then tries to extract the version
 number. This line:

 TV=`echo $VERSION|sed -e s/^[0-9]*\.[0-9]*\\.//`

 is creating an error on Solaris.

 I think it tries to get the last part of the version - i.e. the Z of
 version X.Y.Z.

 Anyway, assuming that is what is needed, re-writing the line to:

 TV=`echo $VERSION|sed -e s/^[0-9]*\.[0-9]*\.//`

This should go upstream. malb?

 solves the problem.

 Here is part of the original, without the proposed change.

SNIP

While we are talking about libSingular: the part of spkg-install that
creates the Singular script is buggy since it depends on GNU's tail to
work. I haven't fixed that yet, but I plan to get it done in 3.0.2 or
3.0.3.

Cheers,

Michael



[sage-devel] Re: slightly OT: new M4RI library

2008-05-17 Thread Bill Hart



On 18 May, 00:40, Martin Albrecht [EMAIL PROTECTED]
wrote:
 My version is here:

    http://sage.math.washington.edu/home/malb/spkgs/libm4ri-20080516.p1.spkg

 (this needs an updated patch for Sage)

 and here:

    http://sage.math.washington.edu/home/malb/m4ri-20080516.tar.gz

 (which is the raw source).

This pure C version seems to be the old version, before you made
either of the two big changes.

Bill.



[sage-devel] Re: Does the '.spkg' format just cause more problems than it solves?

2008-05-17 Thread mabshoff

On May 18, 1:22 am, Dr. David Kirkby [EMAIL PROTECTED]
wrote:

Hi David,

 Sage is distributed as a lot of files ending in .spkg, which are
 basically tar files compressed with bzip2.

 I myself think it would be better if instead of using these files,
 Sage was simply distributed as a set of source files. I see several
 problems with these files.

 * Since they are compressed tar files, if you really want to use it,
 why was the extension .tar.bz2 not used? At least a user would be able
 to work out the file format.

Not all spkgs are compressed, although the only uncompressed one in the
default distribution is Fortran.spkg. Spkgs can also contain binaries or
databases, so they aren't always sources. How to work with spkgs is
documented in the developer's manual.

 * Distributing binary files like this makes it difficult to use CVS or
 similar.

We don't track spkgs in any RCS. That is mostly due to their size. And
the src directory in an spkg is supposed to be vanilla upstream as
documented in SPKG.txt, so if it ever gets corrupted we can just nuke
it and replace it with a vanilla tarball. We have copies of all old
spkgs around, so when we need to go back and find something older, that
isn't a problem.

 * There probably is an easier way (if so tell me), but the way I am
 trying to build sage on Solaris I find I'm constantly recreating a .spkg
 file. For example, I've noticed a possible problem with /spkg/build/
 singular-3-0-4-2-20080405.p1/src/Singular/flexer.sh Is there a better
 way to apply a fix than to make the change, then tar up the directory,
 then compress it, then overwrite the old .spkg file? I assume there is
 a better way.

sage -pkg foo/bad does all the dirty work for you. But once you
source local/bin/sage-env you can just run ./spkg-install inside the
spkg's directory. Once you have made all the changes, apply them
to a fresh spkg so that you do not add all the binary crap left over
from the build. And since we ship vanilla sources, you need to add a
patched file to the patches directory and copy it over, since we do
not use patch. The reason for not using patch is simply that patch is
often absent or broken on many systems. The fact that patch on Solaris
by default doesn't understand unified diffs makes my blood boil each
time I run into that problem. I know there is gpatch, but that doesn't
really solve the problem.
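
For anyone curious what the unpack/repack round trip amounts to, here is a minimal Python sketch using only the standard tarfile module. It is an illustration under assumptions, not what sage -pkg does internally: it treats an spkg as a tar archive (bzip2-compressed or not) holding one top-level directory, and the function names are made up.

```python
import tarfile
from pathlib import Path


def unpack_spkg(spkg_path, dest_dir):
    """Extract an spkg into dest_dir; 'r:*' also accepts uncompressed tar files."""
    with tarfile.open(spkg_path, "r:*") as tar:
        tar.extractall(dest_dir)


def pack_spkg(package_dir, spkg_path):
    """Write package_dir (e.g. foo-1.0/) into a bzip2-compressed tar at spkg_path."""
    package_dir = Path(package_dir)
    with tarfile.open(spkg_path, "w:bz2") as tar:
        # Store paths relative to the package directory's own name.
        tar.add(package_dir, arcname=package_dir.name)
```

A typical use would be to unpack into a scratch directory, drop the fixed file into the patches directory, and pack again from a clean tree so that no build leftovers end up in the archive.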

 Sage is pretty unique in the way all these packages are distributed in
 source form. I'm not convinced this uniqueness is a good thing, but
 perhaps I am wrong.

I am sure you are :). We are aiming for wide build support, and that
includes Windows via Cygwin [which William and I are working on to
make it supported again in 3.0.x or 3.x] and then MSVC. The spkg
format is KISS and I doubt any proposed change will make life less
complicated. So far no proposed change has made it in, since the
current spkg format works, warts and all.

 Dave

Cheers,

Michael



[sage-devel] Re: Does the '.spkg' format just cause more problems than it solves?

2008-05-17 Thread Dr. David Kirkby

On May 18, 1:14 am, Francois [EMAIL PROTECTED] wrote:

  Sage is pretty unique in the way all these packages are distributed in
  source form. I'm not convinced this uniqueness is a good thing, but
  perhaps I am wrong.

 On my Linux desktop, ark (the KDE front end to compression programs)
 identifies them easily as tar.bz2 but asks for confirmation.
 As for applying a patch, when I was working with a mainly monolithic
 Sage on Gentoo that's pretty much what I was doing. I even wrote a set
 of commands to automate it as much as possible. I don't know whether cpio
 can handle tar.bz2; if it can, that would probably be a better way.

 Francois

I guess one could make a script to simplify the process of making
changes to the .spkg files somewhat. But the fact that the source is
distributed in large compressed files means that even the simplest
change will need a new .spkg file to be made. If someone else has made
that change, and you want to use it, you have to download a
large .spkg file.

In contrast, if the actual source files could be checked out from a
CVS (or whatever) repository, then it would be much easier to keep up
to date. A change of a few bytes in one file will only need a few
bytes to be downloaded, not a multi-megabyte .spkg file.





[sage-devel] Re: slightly OT: new M4RI library

2008-05-17 Thread Bill Hart

I managed to get the modified version from the spkg. Nice code!!

Unfortunately it is not as fast on my Opteron. So more work tomorrow
for me to try and get it down to the same times as I have with my
version.

Here are the times, all on my Opteron. Note your C2D version was
optimal at a cutoff of 2048, not 7200 as for my code. Now I am worried
that maybe my code is actually broken somehow and still passing the
test code. I'll carefully make the changes to your code tomorrow to
see if that is the case.

Matrix Dimension   Magma     C2D-M4RI:2048   AMD-M4RI:7200   AMD-M4RI:2048
10000x10000        2.940s    3.13s           3.442s          4.132s
16384x16384        9.250s    12.96s          11.47s          11.80s
20000x20000        16.57s    22.43s          19.3s           26.05s
32000x32000        59.05s    90.20s          71.9s           71.8s

Bill.

On 18 May, 01:58, Bill Hart [EMAIL PROTECTED] wrote:
 On 18 May, 00:40, Martin Albrecht [EMAIL PROTECTED]
 wrote:

  My version is here:

     http://sage.math.washington.edu/home/malb/spkgs/libm4ri-20080516.p1.spkg

  (this needs an updated patch for Sage)

  and here:

     http://sage.math.washington.edu/home/malb/m4ri-20080516.tar.gz

  (which is the raw source).

 This pure C version seems to be the old version, before you made
 either of the two big changes.

 Bill.



[sage-devel] Re: Does the '.spkg' format just cause more problems than it solves?

2008-05-17 Thread mabshoff

On May 18, 3:13 am, Dr. David Kirkby [EMAIL PROTECTED]
wrote:
SNIP

Hi David,

 I guess one could make a script to simplify the process of making
 changes to the .spkg files somewhat. But the fact the source is
 distributed in large compressed files means that even the simplest
 change will need a new .spkg file to be made. If someone else has made
 that change, and you want to use it, you have to download a
 large .spkg file.

Sure, but few people actually make changes to spkgs, and the vast
majority of changes are to the Sage library, which one can pull
directly.

 In contrast, if the actual source files could be checked out from a
 CVS (or whatever) repository, then it would be much easier to keep up
 to date. A change of a few bytes in one file will only need a few
 bytes to be downloaded, not a multi-megabyte .spkg file.

But that requires massive infrastructure, and sticking 150 MB+ of
compressed sources into some RCS isn't a walk in the park. Bandwidth
is cheap and plentiful; CPU cycles to get the same number of updates
from an RCS, not so much. And we do not assume a working RCS to
compile Sage from scratch since that would be a technical hurdle many
people cannot cross. It is all about KISS ;)

Cheers,

Michael



[sage-devel] Fwd: SAGE 3.0.1 doc errata, mostly w.r.t. tutorial (doc dated 2007.10.28)

2008-05-17 Thread William Stein

Hi Sage-Devel,

Can somebody volunteer to make a trac ticket and put all the following
fixes into sage?

William


-- Forwarded message --
From: Johann Tonsing [EMAIL PROTECTED]
Date: Sat, May 17, 2008 at 4:17 PM
Subject: SAGE 3.0.1 doc errata, mostly w.r.t. tutorial (doc dated 2007.10.28)
To: [EMAIL PROTECTED]


William,
Thanks for your efforts.  Having used SAGE for a few hours I am duly impressed.
Herewith a few minor corrections to the tutorial + other documentation.

help:
a.
The variable DATA contains the directory with data files that you
upload into the worksheet. For example, to open a file in that
directory, do open(DIR+'filename').
DIR => DATA
b. Inconsistent capitalisation: "Split and join cells" should perhaps
be "Split and Join Cells", also "DATA variable" should be "DATA
Variable", etc.

Tutorial:
1. doc/live/tut/node9.html:
Note: You should not type the triple dots ... above; they are just to
emphasize that the code is indented.
The dots do not appear in the HTML live notebook version.  Ideally the
tools would provide a way to omit some text in this version.  If
impossible write something like:
Note: The interactive interpreter may display three dots (...) to
indicate that code is indented - these don't need to be entered.
---
2. doc/live/tut/node20.html:
Type p.show(axes=false) to see this without any ases.
ases => axes
---

3. doc/live/tut/node13.html
Next lets do some arithmetic.
suggest rather:
Next, let's do some arithmetic.
---
4. doc/live/tut/node23.html
equatoins => equations
---
5. doc/live/tut/node24.html
pari and maxima => PARI and Maxima
The Sage notebook version displays
vellip#vdots;
in the table (whether viewed in Opera for Mac version +- 9.5 or Safari
3.1).  Just vellip; might have worked.  The static version exhibits
the same problem.
---
6. doc/live/tut/node40.html
gap.console(): You are completely using another program, e.g., Gap/Magma/GP
a. Add . after GP.
b. Suggest replacing entire sentence with:
gap.console(): This opens the GAP console, i.e. transfers control to GAP.
as the phrase completely using is unclear.
c. Suggest replacing occurrences of Gap that refer to the GAP
software with GAP --- this applies to this page and possibly other
documentation files, perhaps search for the word Gap.
---
7. doc/live/tut/node56.html
Please note that you cannot do a stats = prun -r A*A for some internal reason.
replace with
Note: entering stats = prun -r A*A displays a syntax error message
because prun is an IPython shell command, not a regular function.
---
8.
See SAGE_ROOT/examples/pyrex/factorial.spyx for an example of a
compiled implementation of the factorial function that directly uses
the GMP C library
Replace
examples/pyrex
with
examples/programming/sagex
as the file seems to have moved.  Please confirm the exact directory
name to list here as there are two instances of factorial.spyx shipped
with v3.0.1.
---
9. doc/live/tut/node46.html
Suggest replacing
In particular, attach has the side effect of (auto-reload), very handy
when debugging code, while load does not.
with
The attach command automatically reloads a file whenever it changes,
which is handy when debugging code, whereas the load command only
loads a file once.
as (auto-reload) is not defined anywhere (is this a LISP command?)
and therefore might not make sense.
---
10. doc/static/ref/node57.html
Heierarchy => Hierarchy
---
11. doc/static/ref/node58.html
DD NOT => DO NOT
(IMHO ***NOT*** can be replaced with NOT - shouting (capital
letters) should surely be enough?  Alternatively render the NOT in
bold.)
---
12. doc/static/ref/node18.html
SAGE includes the Moin Moin Wiki interactive web page system standard.
just omit  standard
-or-
standard = as standard
-or-
standard = by default
---
13. doc/static/ref/module-sage.plot.plot.html
We combine together  = We combine

Intuitive usage and completeness notes
The following are not really errors, just potential sources of confusion.
1. doc/live/tut/node10.html
When reading
the complex numbers CC (which uses I (or i), as usual, for the square
root of −1).
I tried
(1+2*I) in CC
(1+2*i) in CC
and was surprised to receive False.  Eventually I figured out one has
to define I or i first.
Suggest rather
the complex numbers CC (which uses I (or i) for the square root of −1
-- just enter I=CC.0 to define I).
---
2. doc/live/tut/node10.html
It might have been handy if
optional_packages()
and
install_package('database_gap-4.4.10')
were already executable blocks.
---
3. doc/live/tut/node10.html
I installed database_gap.  The actual output of
K.galois_group()
and
K.class_group()
when evaluated were not what was pre-computed + stored in the
tutorial.  I was using sage 3.0.1 and database_gap 4.4.10.
---
4. doc/live/tut/node13.html
(The object MPolynomialRing(GF(5),3,z) is the same as the object
MPolynomialRing(GF(5),3,z).)
Oh, so x == x.  Ah.  Huh?
---
5. 

[sage-devel] Re: Fwd: SAGE 3.0.1 doc errata, mostly w.r.t. tutorial (doc dated 2007.10.28)

2008-05-17 Thread mabshoff



On May 18, 4:24 am, William Stein [EMAIL PROTECTED] wrote:
 Hi Sage-Devel,

 Can somebody volunteer to make a trac ticket and put all the following
 fixes into sage?
SNIP
 13. When I executed
 /Applications/sage/local/bin/maxima
 the following was displayed:
 dyld: Library not loaded:
 /Users/clarita/Desktop/sage-2.8.11/local/lib/libreadline.5.2.dylib
   Referenced from:
 /Applications/sage/local/lib/maxima/5.13.0/binary-clisp/lisp.run
   Reason: image not found
 Trace/BPT trap
 I have installed libreadline.5.2.dylib in the standard MacPorts
 location (/opt/local/lib/libreadline.5.2.dylib) - would it somehow be
 possible to build maxima to look for shared libraries in the MacPorts
 standard location?
 (I was able to resolve this by entering
 export DYLD_LIBRARY_PATH=/opt/local/lib
 ./maxima
 but others might not know that they should do that.)

This is not a bug on our end. We ship a libreadline.5.2.dylib in
$SAGE_LOCAL/lib, so the solution is to source $SAGE_ROOT/local/bin/
sage-env before randomly starting applications ;)
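
For anyone launching these binaries from a script rather than a shell, the effect on the dynamic loader can be sketched in Python. This is only an illustration: it assumes the relevant part of sage-env is putting $SAGE_LOCAL/lib on DYLD_LIBRARY_PATH (sage-env sets other variables as well), and the helper name env_with_library_path is hypothetical, not part of Sage.

```python
import os


def env_with_library_path(base_env, lib_dir, var="DYLD_LIBRARY_PATH"):
    """Return a copy of base_env with lib_dir prepended to the path list in var.

    Hypothetical helper for illustration only; it is not part of Sage.
    """
    env = dict(base_env)
    existing = env.get(var, "")
    # Prepend so the libraries shipped with Sage are found before any others.
    env[var] = lib_dir + os.pathsep + existing if existing else lib_dir
    return env


if __name__ == "__main__":
    # Assumed install location, taken from the report quoted above.
    sage_root = os.environ.get("SAGE_ROOT", "/Applications/sage")
    env = env_with_library_path(os.environ, os.path.join(sage_root, "local", "lib"))
    print(env["DYLD_LIBRARY_PATH"])
    # Maxima could then be started with this environment, e.g.:
    #   subprocess.call([os.path.join(sage_root, "local", "bin", "maxima")], env=env)
```

Sourcing sage-env (or going through the sage script) remains the supported route; the sketch only shows why the library is found once the path is set.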

Cheers,

Michael

SNIP
--~--~-~--~~~---~--~~
To post to this group, send email to sage-devel@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/sage-devel
URLs: http://www.sagemath.org
-~--~~~~--~~--~--~---



[sage-devel] Re: Fwd: SAGE 3.0.1 doc errata, mostly w.r.t. tutorial (doc dated 2007.10.28)

2008-05-17 Thread William Stein

On Sat, May 17, 2008 at 7:29 PM, mabshoff [EMAIL PROTECTED] wrote:



 On May 18, 4:24 am, William Stein [EMAIL PROTECTED] wrote:
 Hi Sage-Devel,

 Can somebody volunteer to make a trac ticket and put all the following
 fixes into sage?
 SNIP
 13. When I executed
 /Applications/sage/local/bin/maxima
 the following was displayed:

Instead, do (from Terminal):

   cd /Applications/sage
   ./sage -maxima

 -- William

 dyld: Library not loaded:
 /Users/clarita/Desktop/sage-2.8.11/local/lib/libreadline.5.2.dylib
   Referenced from:
 /Applications/sage/local/lib/maxima/5.13.0/binary-clisp/lisp.run
   Reason: image not found
 Trace/BPT trap
 I have installed libreadline.5.2.dylib in the standard MacPorts
 location (/opt/local/lib/libreadline.5.2.dylib) - would it somehow be
 possible to build maxima to look for shared libraries in the MacPorts
 standard location?
 (I was able to resolve this by entering
 export DYLD_LIBRARY_PATH=/opt/local/lib
 ./maxima
 but others might not know that they should do that.)

 This is not a bug on our end. We ship a libreadline.5.2.dylib in
 $SAGE_LOCAL/lib, so the solution is to source $SAGE_ROOT/local/bin/
 sage-env before randomly starting applications ;)

 Cheers,

 Michael

 SNIP
 




-- 
William Stein
Associate Professor of Mathematics
University of Washington
http://wstein.org




[sage-devel] sage/numpy fan mail

2008-05-17 Thread William Stein

Hi Sage-Devel,

Here is a fan-mail blog post about Sage, the distribution, from:
http://www.funjackals.com/blog/?p=274

Sage Makes Me Happier Than Seems Reasonable
Posted May 14, 2008

I've known about the Python-for-mathematics software stack SAGE for a
while now, but I hadn't played with it until today. What happened? I
read Vincent Noel's very good blog entry on replacing Matlab with
Python. What I had missed about SAGE is that it's got a fully
self-contained build environment in the source distribution. Building
the full stack, from ATLAS and BLAS through Python 2.5 and Numpy 1.0.3
is as simple as issuing a single make command. Really.

Why is this exciting? Because I've recently been through the horror of
trying to build Numpy and its dependencies from source on our new
server. That should be easy, right? Well, it's been a real pain in the
neck. Mostly because I've had all kinds of problems getting the right
versions of libraries linked in to Numpy. There are the system
libraries. And the libraries that came with the Absoft Fortran 95
compiler. And the versions that I built in my account. And I can never
seem to get the arguments to setup.py quite right. But now I don't
have to. I can't wait to get to work tomorrow!




[sage-devel] Re: SAGE + OpenOffice PyUno /OOMath

2008-05-17 Thread William Stein

On Mon, May 12, 2008 at 5:52 AM, Kutoma Ltd [EMAIL PROTECTED] wrote:

 Hallo to all,

 Does anyone use OpenOffice and Sage in combination, as in the links
 below, which describe the interface to Python and the use of the
 equation editor?



 http://documentation.openoffice.org/manuals/oooauthors/MathObjects.pdf


 http://wiki.services.openoffice.org/wiki/PyUNO_bridge


 Any feedback welcome


 Best regards


 Gottfried


Sorry that nobody responded to your question.  This probably suggests
that indeed nobody on sage-devel actually uses Sage that way.
It should be possible though.  Are you asking because you want to,
and just want to know about pitfalls?  Or?

If you do try linking Sage and OpenOffice using PyUno, could you
please post about what happens, even if it doesn't work?
I've definitely _wondered_ about such linking for a long time, but
never personally found time to investigate.

William




[sage-devel] Re: Fwd: SAGE 3.0.1 doc errata, mostly w.r.t. tutorial (doc dated 2007.10.28)

2008-05-17 Thread John H Palmieri

I'll address some of the issues with tut.tex.  See

http://trac.sagemath.org/sage_trac/ticket/3208

for patches.  (Some of these things have already been taken care of
there, and I'll post patches to others soon.)

Some of the things I'm not dealing with: I don't have Opera on my Mac,
and I'm not going to install it any time soon.  I don't see any of the
garbled things that are reported here, so I can't do much about it.
I'm also not dealing with the mare's nest that is the reference
manual.

  John


On May 17, 7:24 pm, William Stein [EMAIL PROTECTED] wrote:
 Hi Sage-Devel,

 Can somebody volunteer to make a trac ticket and put all the following
 fixes into sage?

 William

 Tutorial:
 1. doc/live/tut/node9.html:
 Note: You should not type the triple dots ... above; they are just to
 emphasize that the code is indented.
 The dots do not appear in the HTML live notebook version.  Ideally the
 tools would provide a way to omit some text in this version.

This can be done.

 If impossible write something like:
 Note: The interactive interpreter may display three dots (...) to
 indicate that code is indented - these don't need to be entered.
 ---
 2. doc/live/tut/node20.html:
 Type p.show(axes=false) to see this without any ases.
 ases = axes

This seems to have been changed already, at least in the copy of
tut.tex that I have.

 ---

 3. doc/live/tut/node13.html
 Next lets do some arithmetic.
 suggest rather:
 Next, let's do some arithmetic.

Same here.

 ---
 4. doc/live/tut/node23.html
 equatoins = equations

Same here

 ---
 5. doc/live/tut/node24.html
 pari and maxima = PARI and Maxima
 The Sage notebook version displays
 vellip#vdots;
 in the table (whether viewed in Opera for Mac version +- 9.5 or Safari
 3.1).  Just vellip; might have worked.  The static version exhibits
 the same problem.

I don't see this problem.  I see PARI and Maxima in node28.html, but
no dots anywhere near there.

 ---
 6. doc/live/tut/node40.html
 gap.console(): You are completely using another program, e.g., Gap/Magma/GP
 a. Add . after GP.
 b. Suggest replacing entire sentence with:
 gap.console(): This opens the GAP console, i.e. transfers control to GAP.
 as the phrase completely using is unclear.

Okay.

 c. Suggest replacing occurrences of Gap that refer to the GAP
 software with GAP --- this applies to this page and possibly other
 documentation files, perhaps search for the word Gap.

I don't see Gap anywhere in tut.tex.

 ---
 7. doc/live/tut/node56.html
 Please note that you cannot do a stats = prun -r A*A for some internal reason.
 replace with
 Note: entering stats = prun -r A*A displays a syntax error message
 because prun is an IPython shell command, not a regular function.

Okay
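
If the goal is just to get hold of a stats object from ordinary code, the standard library profiler can be called directly. A minimal sketch, assuming plain Python rather than the Sage/IPython prompt; the helper name profile_call is made up for illustration and is not something Sage provides:

```python
import cProfile
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, pstats.Stats).

    Hypothetical helper for illustration; not part of Sage or IPython.
    """
    profiler = cProfile.Profile()
    # runcall profiles exactly one call and passes its return value through.
    result = profiler.runcall(func, *args, **kwargs)
    return result, pstats.Stats(profiler)


def sum_of_squares(n):
    """Small workload so there is something to profile."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    value, stats = profile_call(sum_of_squares, 1000)
    # Show the five most expensive entries by cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
```

This works because profile_call is a regular function returning a value, whereas prun is a shell command and so cannot sit on the right-hand side of an assignment.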

 ---
 8.
 See SAGE_ROOT/examples/pyrex/factorial.spyx for an example of a
 compiled implementation of the factorial function that directly uses
 the GMP C library
 Replace
 examples/pyrex
 with
 examples/programming/sagex
 as the file seems to have moved.  Please confirm the exact directory
 name to list here as there are two instances of factorial.spyx shipped
 with v3.0.1.

Okay (fixed in earlier patch).

 ---
 9. doc/live/tut/node46.html
 Suggest replacing
 In particular, attach has the side effect of (auto-reload), very handy
 when debugging code, while load does not.
 with
 The attach command automatically reloads a file whenever it changes,
 which is handy when debugging code, whereas the load command only
 loads a file once.
 as (auto-reload) is not defined anywhere (is this a LISP command?)
 and therefore might not make sense.

I see "In particular, {\em attach} has the side effect of
auto-reloading, very handy when debugging code, while {\em load} does
not."  Anyway, I can change it; the suggested wording seems a bit
better to me.
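
To make the difference concrete, here is a minimal sketch of attach-style reloading in plain Python. It is not how Sage implements attach; the class name AttachedFile and the modification-time check are assumptions made purely for illustration.

```python
import os


class AttachedFile:
    """Re-run a Python file whenever its modification time changes.

    Hypothetical illustration of attach-style reloading; not Sage's code.
    """

    def __init__(self, path, namespace):
        self.path = path
        self.namespace = namespace
        self.mtime = None  # nothing has been loaded yet

    def load(self):
        """Run the file once, unconditionally (what a plain load does)."""
        with open(self.path) as f:
            source = f.read()
        exec(compile(source, self.path, "exec"), self.namespace)
        # Remember when the version we just ran was last modified.
        self.mtime = os.stat(self.path).st_mtime

    def reload_if_changed(self):
        """Run the file again only if it was modified since the last run."""
        if os.stat(self.path).st_mtime != self.mtime:
            self.load()
            return True
        return False
```

Calling load() alone corresponds to loading a file once; an attach-like prompt would additionally call reload_if_changed() before evaluating each new input.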


 ---
 10. doc/static/ref/node57.html
 Heierarchy = Hierarchy
 ---
 11. doc/static/ref/node58.html
 DD NOT = DO NOT
 (IMHO ***NOT*** can be replaced with NOT - shouting (capital
 letters) should surely be enough?  Alternatively render the NOT in
 bold.)
 ---
 12. doc/static/ref/node18.html
 SAGE includes the Moin Moin Wiki interactive web page system standard.
 just omit  standard
 -or-
 standard = as standard
 -or-
 standard = by default
 ---
 13. doc/static/ref/module-sage.plot.plot.html
 We combine together  = We combine
 
 Intuitive usage and completeness notes
 The following are not really errors, just potential sources of confusion.
 1. doc/live/tut/node10.html
 When reading
 the complex numbers CC (which uses I (or i), as usual, for the square
 root of −1).
 I tried
 (1+2*I) in CC
 (1+2*i) in CC
 and was surprised to receive False.  Eventually I figured out one has
 to define I or i first.
 Suggest rather
 the complex numbers CC (which uses I (or i) for the square root of −1
 -- just enter I=CC.0 to define I).

Dealt with in an earlier patch.

 ---
 2. doc/live/tut/node10.html
 It might have been handy if
 optional_packages()
 and