[sage-devel] Re: slightly OT: new M4RI library
Here's another idea which should speed things up a bit.

For 10000x10000 we currently use k = 6. Instead of this, we could use k = 5 and make two Gray tables simultaneously. This will still fit in cache. Instead of doing 6 bits at a time, we can then do 10 bits at a time. We'd load the appropriate line from the first Gray table, then the appropriate one from the second and xor them, then xor with the output matrix. This should decrease the number of loads and stores considerably. Moreover, the SSE instructions will then be much more efficient, as the ratio of arithmetic instructions to loads and stores is higher.

Of course one could also do 16 bits at a time, by doing 4 tables, but I think this might actually get slower again, since you've only increased the amount of work done by 60%, but you've had a 30% increase in instructions.

Bill.

On 17 May, 17:45, Bill Hart [EMAIL PROTECTED] wrote:
> Martin,
>
> The test code still passes if you change RADIX to 128. I've no idea how it passes, but it does. Shame the results are not correct, because this speeds the code up by a factor of 2.
>
> I notice that in the SSE code, you check to see if alignment can be achieved, otherwise it doesn't use SSE. But this introduces an unpredictable branch. Also, where there are three operands, you can't use SSE2 because the likelihood of all three being aligned is too small.
>
> I think a better idea would be to explicitly force all matrices and all rows to be 128 bit aligned if the matrices are wide enough to benefit from SSE2. Then the combine function can always use SSE2 and there will be no need to check for alignment.
>
> I experimented with interleaving MMX and GPR XORs, but this doesn't speed anything up. There are more instructions emitted and the time stays about the same. The only way interleaving the MMX and GPR code would speed things up is if there was more computation going on in the registers and less memory loading and storing, I think.
>
> Bill.
>
> On 17 May, 15:45, Bill Hart [EMAIL PROTECTED] wrote:
>> Hi Martin,
>>
>> Here is another 10% improvement. In the loop at the bottom of mzd_combine you can explicitly unroll by a factor of 8:
>>
>>     word * end = b1_ptr + wide;
>>     register word * end8 = end - 8;
>>     while (b1_ptr < end8) {
>>         *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
>>         *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
>>         *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
>>         *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
>>         *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
>>         *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
>>         *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
>>         *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
>>     }
>>     while (b1_ptr < end) {
>>         *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
>>     }
>>
>> I did this in combination with changing the crossover for 10000x10000 from 3600 to 7200.
>>
>> Bill.
>>
>> On 17 May, 09:40, Martin Albrecht [EMAIL PROTECTED] wrote:
>>> On Saturday 17 May 2008, Bill Hart wrote:
>>>> In going from 5000x5000 to 10000x10000 Magma's time increases by a factor of less than 4. That is impossible. Strassen will never help us there. They must be doing something else. Probably something clever.
>>>>
>>>> Bill.
>>>
>>> I was stuck there too yesterday. Maybe only at 10000x10000 the pipeline gets fully utilised?
>>>
>>> Martin
>>>
>>> PS: If we run out of ideas we can simply go for parallelism, that should help on sage.math ;-)
>>>
>>> --
>>> name: Martin Albrecht
>>> _pgp: http://pgp.mit.edu:11371/pks/lookup?op=getsearch=0x8EF0DC99
>>> _www: http://www.informatik.uni-bremen.de/~malb
>>> _jab: [EMAIL PROTECTED]

--~--~-~--~~~---~--~~
To post to this group, send email to sage-devel@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/sage-devel
URLs: http://www.sagemath.org
-~--~~~~--~~--~--~---
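The two-table lookup proposed at the top of this message can be sketched roughly as follows. This is only an illustration under stated assumptions, not M4RI's code: `build_table` and `combine_two_tables` are hypothetical names, rows are plain arrays of 64-bit words, and the tables are filled in plain index order rather than Gray code order (the resulting table contents are the same).

```c
#include <stddef.h>
#include <stdint.h>

enum { K = 5, TABLE_ROWS = 1 << K };

/* Hypothetical helper: table[i] becomes the XOR of those rows b[j]
   (j < K) for which bit j of i is set. Each row is `wide` words. */
static void build_table(uint64_t *table, const uint64_t *b, size_t wide)
{
    for (size_t w = 0; w < wide; w++)
        table[w] = 0;                         /* line 0 is the zero row */
    for (size_t i = 1; i < TABLE_ROWS; i++) {
        size_t low = i & (~i + 1);            /* lowest set bit of i */
        size_t j = 0;
        while (((size_t)1 << j) != low)
            j++;                              /* j = index of that bit */
        const uint64_t *prev = table + (i ^ low) * wide;
        const uint64_t *row = b + j * wide;
        uint64_t *dst = table + i * wide;
        for (size_t w = 0; w < wide; w++)
            dst[w] = prev[w] ^ row[w];
    }
}

/* Add 10 bits' worth of B to one output row c: the low 5 bits of bits10
   select a line of t0, the high 5 bits a line of t1. Per word this costs
   three loads, two XORs and one store. */
static void combine_two_tables(uint64_t *c, const uint64_t *t0,
                               const uint64_t *t1, unsigned bits10,
                               size_t wide)
{
    const uint64_t *r0 = t0 + (size_t)(bits10 & 31u) * wide;
    const uint64_t *r1 = t1 + (size_t)((bits10 >> 5) & 31u) * wide;
    for (size_t w = 0; w < wide; w++)
        c[w] ^= r0[w] ^ r1[w];
}
```

With a single k = 6 table the same output row would be loaded and stored once per 6 bits of A; here it is loaded and stored once per 10 bits, which is where the saving in memory traffic would come from.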
[sage-devel] Re: slightly OT: new M4RI library
On Saturday 17 May 2008, Bill Hart wrote:
> Martin,
>
> The test code still passes if you change RADIX to 128. I've no idea how it passes, but it does. Shame the results are not correct, because this speeds the code up by a factor of 2.

Since all routines use the RADIX and I only check whether their results match, they are all wrong in the same way and it isn't detected. I should add a test with known answers, I suppose.

> I notice that in the SSE code, you check to see if alignment can be achieved, otherwise it doesn't use SSE. But this introduces an unpredictable branch. Also, where there are three operands, you can't use SSE2 because the likelihood of all three being aligned is too small.
>
> I think a better idea would be to explicitly force all matrices and all rows to be 128 bit aligned if the matrices are wide enough to benefit from SSE2. Then the combine function can always use SSE2 and there will be no need to check for alignment.

I'll try that.

> I experimented with interleaving MMX and GPR XORs, but this doesn't speed anything up. There are more instructions emitted and the time stays about the same. The only way interleaving the MMX and GPR code would speed things up is if there was more computation going on in the registers and less memory loading and storing, I think.

I came to the same conclusion (but my code might not have been as good as yours).

I improved other areas of the code (e.g. use naive multiplication rather than M4RM if B->ncols < RADIX, since it is faster, etc.). I can forward you my newest tarball (but the speed improvements aren't really noticeable yet).

Martin
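The suggestion to force every row onto a 128-bit boundary, so that the combine step never has to test alignment, might look something like the sketch below. It assumes C11 `aligned_alloc` and the SSE2 intrinsics from `<emmintrin.h>`; the names `padded_width`, `alloc_rows` and `combine_sse2` are made up for illustration and are not M4RI functions.

```c
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Round a row width given in 64-bit words up to an even number of words,
   so that every row occupies a whole number of 128-bit blocks. */
static size_t padded_width(size_t wide)
{
    return (wide + 1) & ~(size_t)1;
}

/* Allocate nrows zeroed rows. The base pointer is 16-byte aligned and the
   padded row length is a multiple of 16 bytes, so every row is aligned. */
static uint64_t *alloc_rows(size_t nrows, size_t wide)
{
    size_t bytes = nrows * padded_width(wide) * sizeof(uint64_t);
    uint64_t *p = aligned_alloc(16, bytes);
    if (p)
        memset(p, 0, bytes);
    return p;
}

/* dst = a ^ b over `padded` words (an even count). All three pointers are
   assumed to be 16-byte aligned, so aligned loads and stores are used
   unconditionally and no run-time alignment branch is needed. */
static void combine_sse2(uint64_t *dst, const uint64_t *a,
                         const uint64_t *b, size_t padded)
{
    for (size_t w = 0; w < padded; w += 2) {
        __m128i x = _mm_load_si128((const __m128i *)(a + w));
        __m128i y = _mm_load_si128((const __m128i *)(b + w));
        _mm_store_si128((__m128i *)(dst + w), _mm_xor_si128(x, y));
    }
}
```

The price is up to one wasted word per row, which only matters for narrow matrices; that is presumably why the proposal restricts it to matrices wide enough to benefit from SSE2.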
[sage-devel] Re: slightly OT: new M4RI library
On Saturday 17 May 2008, Martin Albrecht wrote:
>> I think a better idea would be to explicitly force all matrices and all rows to be 128 bit aligned if the matrices are wide enough to benefit from SSE2. Then the combine function can always use SSE2 and there will be no need to check for alignment.
>
> That doesn't seem to make a noticeable difference for me (on C2D). However, I realised that the multiplications where the target matrix is a real matrix rather than a window (which has bad data locality) are faster. Copying everything over does not seem like a good idea, but it at least indicates an area for improvements.

Okay, if I only copy when we cross over to M4RM then the memory overhead is constant (~ cutoff^2) and the performance still improves.

Old: 64-bit Debian/GNU Linux, 2.33GHz Core2Duo (times in seconds)

Matrix Dimension    Magma 2.14-13 (64-bit)    M4RI-20080517 (64-bit)
10,000 x 10,000                      2.920                     3.610
16,384 x 16,384                     11.140                    12.120
20,000 x 20,000                     20.370                    24.390
32,000 x 32,000                     74.290                    94.910

New: 64-bit Debian/GNU Linux, 2.33GHz Core2Duo (times in seconds)

Matrix Dimension    Magma 2.14-13 (64-bit)    M4RI-20080517 (64-bit)
10,000 x 10,000                      2.920                     2.990
16,384 x 16,384                     11.140                    11.750
20,000 x 20,000                     20.370                    21.180
32,000 x 32,000                     74.290                    86.570

On Opteron things don't look this way, but I think sage.math is pretty heavily used right now, such that my benchmarks there are not very telling.

Martin
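The copy-at-crossover idea in this message could be sketched as below: a window shares storage with a larger parent matrix, so its rows are far apart in memory, and copying it into a dense buffer before the M4RM step restores locality. The `window_t` layout and both function names are hypothetical and chosen only for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical view of a submatrix: the rows live inside a larger parent,
   so consecutive rows are `stride` words apart, not `wide` words. */
typedef struct {
    uint64_t *data;   /* first word of the window's first row */
    size_t nrows;     /* number of rows in the window */
    size_t wide;      /* words per window row */
    size_t stride;    /* words per parent row */
} window_t;

/* Copy the window into a freshly allocated contiguous buffer. */
static uint64_t *window_copy(const window_t *w)
{
    uint64_t *dense = malloc(w->nrows * w->wide * sizeof *dense);
    if (!dense)
        return NULL;
    for (size_t r = 0; r < w->nrows; r++)
        memcpy(dense + r * w->wide, w->data + r * w->stride,
               w->wide * sizeof *dense);
    return dense;
}

/* Write a dense result back into the window's storage in the parent. */
static void window_copy_back(const window_t *w, const uint64_t *dense)
{
    for (size_t r = 0; r < w->nrows; r++)
        memcpy(w->data + r * w->stride, dense + r * w->wide,
               w->wide * sizeof *dense);
}
```

If the copy is made only once the recursion has reached the cutoff size, the temporary buffer never exceeds roughly cutoff^2 bits, which matches the constant overhead mentioned above.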
[sage-devel] Re: slightly OT: new M4RI library
That's looking good. Would you like me to run it on an unburdened Opteron to see how it goes? If you like you can send me a tarball and I'll try it out.

I think our best bet for a significant improvement now is the idea of using two Gray tables of half the size simultaneously. I also realised it possibly improves the cache performance for the A matrix too.

I was casually wondering whether Magma might use a highly optimised Winograd's algorithm instead of the naive algorithm. But over GF(2) I think it probably actually takes longer, since it basically replaces n^2 full length scalar multiplies by n^2 half length ones and 2*n^2 half row additions, plus a pile of other overhead.

Bill.
[sage-devel] Re: slightly OT: new M4RI library
Yet another idea.

Suppose we do not combine entire rows in the Gray table, but only half rows. Once half a row is bigger than a single cache line (512 bits on the Opteron) we may as well work with half rows. This allows us to work with twice as many rows at once in the Gray tables (each of half the size). This means that we are dealing with twice as many bits from rows of A as usual and twice as many rows of B as usual, but we need to do it all again for the second half of the rows. This means we get twice the work done in the same amount of cache space.

Combined with the idea of using two Gray tables of 2^5 combinations of rows instead of a single table of 2^6 combinations of rows, this would equate to dealing with 20 bits of each row of A at a time and 20 rows of B at a time.

With this scheme, there would then be 4 arithmetic operations in SSE registers, 5 loads and 1 store, when combining rows from Gray tables, instead of about 6.6 loads, 3.3 stores and 3.3 arithmetic operations, changing the ratio of load/stores to arithmetic ops from 2.7 to 1.5.

This is another example where copying the data (the half rows) out and reordering it so it has better locality would probably make a big difference. That sort of thing always works exceptionally well on AMD chips.

Bill.
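For reference, the operation counts quoted in this message can be reproduced as follows. This is one reading of the counts, assuming that each table lookup costs one load of the table line, that the output row is loaded and stored once per pass, and that every XOR counts as one arithmetic operation.

```latex
% Single table, k = 6: covering 20 bits of A takes 20/6 passes,
% each pass being 2 loads (table line, output row), 1 XOR and 1 store.
\[
\text{passes} = \tfrac{20}{6} \approx 3.3, \qquad
\text{loads} \approx 6.7, \quad \text{stores} \approx 3.3, \quad \text{XORs} \approx 3.3
\]
\[
\frac{\text{loads} + \text{stores}}{\text{XORs}} \approx \frac{6.7 + 3.3}{3.3} \approx 3.0
\]
% Four 5-bit lookups per pass (two tables on each of two half-row sets):
% 4 table loads + 1 output load, 4 XORs, 1 store.
\[
\frac{\text{loads} + \text{stores}}{\text{XORs}} = \frac{5 + 1}{4} = 1.5
\]
```

Under these assumptions the starting ratio comes out nearer 3.0 than the 2.7 quoted above; either way the proposed scheme roughly halves the memory operations per XOR.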
[sage-devel] Re: slightly OT: new M4RI library
Martin,

Here's a really unusual thing. Perhaps you can confirm this. I get a 20% improvement if I add:

if (x) { }

in the three obvious places in the function _mzd_mul_m4rm_impl. This stops it calling mzd_combine on the zero row.

But I don't understand why this works. The time should be only 1.5% better, since k = 6 and there are 2^k rows in the table, only one of which is zero.

Could it be that your coinflip function is not quite random?

Anyway, I'm down to 3.40s for 10000x10000 with this change. Test functions still pass.

Bill.
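The guard described in this message amounts to skipping the row combination whenever the k bits read from A are all zero, because table line 0 is the zero row and adding it changes nothing. A minimal sketch, assuming a table stored as consecutive rows of 64-bit words and a hypothetical function name:

```c
#include <stddef.h>
#include <stdint.h>

/* XOR table line x into the output row c unless x is zero.
   Returns 1 if a pass over the row was made, 0 if it was skipped. */
static int combine_unless_zero(uint64_t *c, const uint64_t *table,
                               unsigned x, size_t wide)
{
    if (!x)
        return 0;            /* line 0 is all zero: no loads, no stores */
    const uint64_t *t = table + (size_t)x * wide;
    for (size_t w = 0; w < wide; w++)
        c[w] ^= t[w];
    return 1;
}
```

For uniformly random input a k = 6 index is zero with probability 2^-6, about 1.6%, so the expected saving in arithmetic alone is small, which is the puzzle raised here.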
[sage-devel] Re: slightly OT: new M4RI library
I suppose that this might be due to the ends of rows all being zero as they aren't a multiple of 64 bits long. But I checked for 16384x16384 and we are nearly down to the speed of Magma there too.

I just don't get it. The coinflip has to be broken I think.

Bill.
[sage-devel] Re: slightly OT: new M4RI library
I checked the coinflip and it is definitely fine. There is no greater probability of 6 zeroes in a row than there ought to be.

So the speedup I just reported is quite a mystery.

Bill.
[sage-devel] Re: slightly OT: new M4RI library
Woot!!

On 17 May, 23:46, Martin Albrecht [EMAIL PROTECTED] wrote:
> Old: 64-bit Debian/GNU Linux, 2.33GHz Core2Duo (times in seconds)
>
> Matrix Dimension    Magma 2.14-13 (64-bit)    M4RI-20080517 (64-bit)
> 10,000 x 10,000                      2.920                     3.610
> 16,384 x 16,384                     11.140                    12.120
> 20,000 x 20,000                     20.370                    24.390
> 32,000 x 32,000                     74.290                    94.910
>
> New: 64-bit Debian/GNU Linux, 2.33GHz Core2Duo (times in seconds)
>
> Matrix Dimension    Magma 2.14-13 (64-bit)    M4RI-20080517 (64-bit)
> 10,000 x 10,000                      2.920                     2.990
> 16,384 x 16,384                     11.140                    11.750
> 20,000 x 20,000                     20.370                    21.180
> 32,000 x 32,000                     74.290                    86.570
>
> If you take this + Bill's idea:
>
>> For 10000x10000 we currently use k = 6. Instead of this, we could use k = 5 and make two Gray tables simultaneously. This will still fit in cache. Instead of doing 6 bits at a time, we can then do 10 bits at a time. We'd load the appropriate line from the first Gray table, then the appropriate one from the second and xor them, then xor with the output matrix. This should decrease the number of loads and stores considerably. Moreover, the SSE instructions will then be much more efficient, as the ratio of arithmetic instructions to loads and stores is higher.
>>
>> Of course one could also do 16 bits at a time, by doing 4 tables, but I think this might actually get slower again, since you've only increased the amount of work done by 60%, but you've had a 30% increase in instructions.
>
> You get (on the C2D):
>
> sage: B = random_matrix(GF(2), 3.2*10^4, 3.2*10^4)
> sage: A = random_matrix(GF(2), 3.2*10^4, 3.2*10^4)
> sage: time C= A._multiply_strassen(B,cutoff=2^11)
> CPU times: user 75.82 s, sys: 0.22 s, total: 76.04 s
> Wall time: 76.31
>
> sage: A = random_matrix(GF(2), 2*10^4, 2*10^4)
> sage: B = random_matrix(GF(2), 2*10^4, 2*10^4)
> sage: time C= A._multiply_strassen(B,cutoff=2^11)
> CPU times: user 19.14 s, sys: 0.09 s, total: 19.24 s
> Wall time: 19.29
>
> sage: B = random_matrix(GF(2), 2^14, 2^14)
> sage: A = random_matrix(GF(2), 2^14, 2^14)
> sage: time C= A._multiply_strassen(B,cutoff=2^11)
> CPU times: user 10.62 s, sys: 0.05 s, total: 10.67 s
> Wall time: 10.70
>
> sage: B = random_matrix(GF(2), 10^4, 10^4)
> sage: A = random_matrix(GF(2), 10^4, 10^4)
> sage: time C= A._multiply_strassen(B,cutoff=2^11)
> CPU times: user 2.73 s, sys: 0.02 s, total: 2.75 s
> Wall time: 2.76
>
> i.e. the speed of my current Magma install on the same computer (mind you, this one might not be optimised for the C2D but for the Opteron, I don't know). The times above don't have SSE2 yet.
>
> I guess documenting Bill's tricks
>
> - process the rows of A in blocks
> - use two rather than one Gray code table
>
> well is in order, since now M4RM looks quite different from the original algorithm. I'll do that tomorrow.
>
> Again, thanks Bill!
>
> Martin
[sage-devel] Re: slightly OT: new M4RI library
> I suppose that this might be due to the ends of rows all being zero as they aren't a multiple of 64 bits long. But I checked for 16384x16384 and we are nearly down to the speed of Magma there too. I just don't get it. The coinflip has to be broken I think.

If one uses M4RI with the new patch from within Sage, another PRBG is used, but coinflip should be fine. I don't see these speedups (but I have two Gray code tables, and this warrants more ifs).

Hi,

I think we might consider merging our two forks again? Or do you also have the two Gray code tables? Are your timings on the Opteron? Because then things look really good, since mine are on the C2D.

Exciting times,
Martin
[sage-devel] Does the '.spkg' format just cause more problems than it solves?
Sage is distributed as a lot of files ending in .spkg, which are basically tar files compressed with bzip2. I myself think it would be better if, instead of using these files, Sage was simply distributed as a set of source files. I see several problems with these files.

* Since they are compressed tar files, if you really want to use them, why was the extension .tar.bz2 not used? At least a user would be able to work out the file format.

* Distributing binary files like this makes it difficult to use CVS or similar.

* There probably is an easier way (if so tell me), but the way I am trying to build Sage on Solaris I find I'm constantly recreating a .spkg file. For example, I've noticed a possible problem with

/spkg/build/singular-3-0-4-2-20080405.p1/src/Singular/flexer.sh

Is there a better way to apply a fix than to make the change, then tar up the directory, then compress it, then overwrite the old .spkg file? I assume there is a better way.

Sage is pretty unique in the way all these packages are distributed in source form. I'm not convinced this uniqueness is a good thing, but perhaps I am wrong.

Dave
[sage-devel] Re: slightly OT: new M4RI library
I don't have the two Gray code tables, so it would be good to get your version. Also my code is currently a mess, so it would be good to clean it up by merging with a cleaner version (yours). Tomorrow I'll check carefully what I've changed and try to merge in any ideas you don't have which definitely improve performance on the Opteron.

The speedups I am seeing from the ifs are possibly a feature of the Opteron cache algorithms. It is very sensitive when things just begin to fall out of cache, as they certainly are here. Not combining with the zero row just nudges things closer in to the cache boundary since it never has to read that row. I have checked and the speedups are quite reproducible, and they definitely come from the ifs, though I am now using a crossover with Strassen of 7200!!

Bill.

On 18 May, 00:12, Martin Albrecht [EMAIL PROTECTED] wrote:
> I suppose that this might be due to the ends of rows all being zero as they aren't a multiple of 64 bits long. But I checked for 16384x16384 and we are nearly down to the speed of Magma there too. I just don't get it. The coinflip has to be broken I think. If one uses M4RI with the new patch from within Sage another PRBG is used, but coinflip should be fine. Don't see these speedups (but I have two Gray code tables and this warrants for more if's) Hi, I think we might consider merging our two forks again? Or do you also have the two Gray code tables? Are your timings on the Opteron? Because then things look really good since mine are on the C2D.
>
> Exciting times,
> Martin
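For readers following the thread, here is a minimal Python sketch of what "not combining with the zero row" means in a table-lookup (Method of the Four Russians style) multiplication over GF(2). It is only an illustration of the logic, not M4RI code: rows are stored as Python integers, there is no Gray code ordering, and the names build_table, multiply and skip_zero are hypothetical. It says nothing about the cache effects Bill describes; it only shows that the skip leaves the result unchanged while avoiding some row reads.

```python
# Illustrative sketch only: rows are Python ints used as bit vectors over GF(2).
# build_table, multiply and skip_zero are hypothetical names, not M4RI API.

def build_table(block):
    """Return all 2^k XOR combinations of the k rows in block.

    table[x] is the XOR of block[i] for every bit i set in x, so table[0] == 0.
    """
    table = [0]
    for row in block:
        # Doubling step: every existing entry, with the new row XORed in.
        table += [entry ^ row for entry in table]
    return table


def multiply(a_rows, b_rows, k, skip_zero=True):
    """Multiply A by B over GF(2), processing the rows of B k at a time.

    Bit j of a_rows[i] is the entry A[i][j].
    Returns (rows of C, number of row XORs performed).
    """
    c_rows = [0] * len(a_rows)
    xors = 0
    for start in range(0, len(b_rows), k):
        block = b_rows[start:start + k]
        table = build_table(block)
        mask = (1 << len(block)) - 1
        for i, a in enumerate(a_rows):
            x = (a >> start) & mask  # the k bits of A's row that select a table entry
            if skip_zero and x == 0:
                continue  # table[0] is the zero row: nothing to read or XOR
            c_rows[i] ^= table[x]
            xors += 1
    return c_rows, xors


a_rows = [0b0001, 0b0100, 0b0000, 0b1111]
b_rows = [0b1010, 0b0110, 0b0001, 0b1000]
with_skip, n_skip = multiply(a_rows, b_rows, k=2, skip_zero=True)
without_skip, n_all = multiply(a_rows, b_rows, k=2, skip_zero=False)
print(with_skip == without_skip, n_skip, n_all)
```

For random dense matrices the zero index only comes up about once in 2^k lookups, so any benefit on real hardware would have to come from memory behaviour, as suggested above, rather than from the saved XORs themselves.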
[sage-devel] Re: slightly OT: new M4RI library
P.S: yes all my times are on a 2.8GHz Opteron. Cpuinfo says:

[EMAIL PROTECTED]:~/m4ri-20080514/testsuite cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 65
model name : Dual-Core AMD Opteron(tm) Processor 2220
stepping : 3
cpu MHz : 1000.000
cache size : 1024 KB
snip
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
snip

The 1000.000 there refers to the FSB.

Bill.

On 18 May, 00:12, Martin Albrecht [EMAIL PROTECTED] wrote:
> I suppose that this might be due to the ends of rows all being zero as they aren't a multiple of 64 bits long. But I checked for 16384x16384 and we are nearly down to the speed of Magma there too. I just don't get it. The coinflip has to be broken I think. If one uses M4RI with the new patch from within Sage another PRBG is used, but coinflip should be fine. Don't see these speedups (but I have two Gray code tables and this warrants for more if's) Hi, I think we might consider merging our two forks again? Or do you also have the two Gray code tables? Are your timings on the Opteron? Because then things look really good since mine are on the C2D.
>
> Exciting times,
> Martin
[sage-devel] Re: slightly OT: new M4RI library
On Sunday 18 May 2008, Bill Hart wrote:
> I don't have the two Gray code tables, so it would be good to get your version. Also my code is currently a mess, so it would be good to clean it up by merging with a cleaner version (yours). Tomorrow I'll check carefully what I've changed and try and merge the ideas if there are any you don't have which definitely improve performance on the Opteron.
>
> The speedups I am seeing from the ifs are possibly a feature of the Opteron cache algorithms. It is very sensitive when things just begin to fall out of cache, as they certainly are here. Not combining with the zero row just nudges things closer in to the cache boundary since it never has to read that row. I have checked and the speedups are quite reproducible, and they definitely come from the ifs, though I am now using a crossover with Strassen of 7200!!

I'm using a crossover of 2048 here, so maybe our improvements are orthogonal? Even more puzzling, I'd expect that my crossover should be bigger than yours. (On a side note: my code changes how the crossover is used. Your version: 'size cutoff'; my version: '|cutoff - size| is minimal', which should give actual cutoffs closer to the desired values.)

My version is here:

http://sage.math.washington.edu/home/malb/spkgs/libm4ri-20080516.p1.spkg

(this needs an updated patch for Sage) and here:

http://sage.math.washington.edu/home/malb/m4ri-20080516.tar.gz

(which is the raw source). Those don't have SSE2 yet but it doesn't seem to make that much of a difference anyway. I'll add that back before doing an official release. However, unfortunately I'll probably have limited/no time tomorrow to commit.
Martin

PS: To give at least some indication that my code still does the right thing, a 'known answer' test:

sage: A = random_matrix(GF(2), 10^3, 10^3)
sage: B = random_matrix(GF(2), 10^3, 10^3)
sage: (A*B)._magma_() == A._magma_() * B._magma_()
True
sage: (A._multiply_strassen(B,cutoff=256))._magma_() == A._magma_() * B._magma_()
True

--
name: Martin Albrecht
_pgp: http://pgp.mit.edu:11371/pks/lookup?op=getsearch=0x8EF0DC99
_www: http://www.informatik.uni-bremen.de/~malb
_jab: [EMAIL PROTECTED]
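The two ways of using the crossover mentioned in Martin's side note can be compared with a small Python sketch. This is a guess at the intent, not the M4RI implementation: the first rule is read here as "keep halving while the size is at least the cutoff" (the comparison sign in 'size cutoff' did not survive in the archive, so that reading is an assumption), the second as "among n, n/2, n/4, ... pick the size closest to the cutoff". The function names are hypothetical, and real Strassen code also has to deal with odd and word-aligned sizes, which is ignored here.

```python
# Hypothetical illustration of two Strassen stopping rules; not M4RI code.

def base_size_threshold(n, cutoff):
    """Halve n until it drops below cutoff (assumed reading of 'size < cutoff')."""
    size = n
    while size >= cutoff:
        size //= 2
    return size


def base_size_nearest(n, cutoff):
    """Among n, n/2, n/4, ... return the size minimising |cutoff - size|."""
    best = n
    size = n
    while size > 1:
        size //= 2
        if abs(cutoff - size) < abs(cutoff - best):
            best = size
    return best


for n in (10000, 16384):
    print(n, base_size_threshold(n, 2048), base_size_nearest(n, 2048))
```

Under these assumptions a 10000x10000 product with cutoff 2048 bottoms out at blocks of 1250 with the first rule and 2500 with the second, which is the sense in which the second rule lands closer to the requested value.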
[sage-devel] Re: Does the '.spkg' format just cause more problems than it solves?
On May 18, 11:22 am, Dr. David Kirkby [EMAIL PROTECTED] wrote:
> Sage is distributed as a lot of files ending in .spkg, which are basically tar files compressed with bzip2. I myself think it would be better if, instead of using these files, Sage was simply distributed as a set of source files. I see several problems with these files.
>
> * Since they are compressed tar files, if you really want to use them, why was the extension .tar.bz2 not used? At least a user would be able to work out the file format.
> * Distributing binary files like this makes it difficult to use CVS or similar.
> * There probably is an easier way (if so, tell me), but the way I am trying to build Sage on Solaris I find I'm constantly recreating a .spkg file. For example, I've noticed a possible problem with /spkg/build/singular-3-0-4-2-20080405.p1/src/Singular/flexer.sh. Is there a better way to apply a fix than to make the change, then tar up the directory, then compress it, then overwrite the old .spkg file? I assume there is a better way.
>
> Sage is pretty unique in the way all these packages are distributed in source form. I'm not convinced this uniqueness is a good thing, but perhaps I am wrong.

On my Linux desktop, ark (the KDE front end to compression programs) identifies them easily as tar.bz2 but asks for confirmation. As for applying patches, when I was working with a mainly monolithic Sage on Gentoo that's pretty much what I was doing. I even wrote a set of commands to automate it as much as possible. I don't know if cpio would handle tar.bz2, in which case it probably would be a better way.

Francois
[sage-devel] Re: slightly OT: new M4RI library
Here are the times I get with the different cutoffs.

               Magma     M4RI:7200   M4RI:2048
10000x10000:   2.940s    3.442s      4.132s
16384x16384:   9.250s    11.47s      11.80s
20000x20000:   16.57s    19.3s       26.05s
32000x32000:   59.05s    71.9s       71.8s

So it seems when there is not an exact cut, the higher cutoff is substantially better. Don't know why that is.

Tomorrow I'll see if there is anything I have that speeds up your code. I'm hopeful we'll be within about 5% on the Opteron by then. The other ideas I outlined above should push us 10-15% ahead of Magma if we end up implementing them, I think.

Of course one can go too crazy with optimisation.

Bill.

On 18 May, 00:40, Martin Albrecht [EMAIL PROTECTED] wrote:
> On Sunday 18 May 2008, Bill Hart wrote:
> > I don't have the two Gray code tables, so it would be good to get your version. Also my code is currently a mess, so it would be good to clean it up by merging with a cleaner version (yours). Tomorrow I'll check carefully what I've changed and try and merge the ideas if there are any you don't have which definitely improve performance on the Opteron.
> >
> > The speedups I am seeing from the ifs are possibly a feature of the Opteron cache algorithms. It is very sensitive when things just begin to fall out of cache, as they certainly are here. Not combining with the zero row just nudges things closer in to the cache boundary since it never has to read that row. I have checked and the speedups are quite reproducible, and they definitely come from the ifs, though I am now using a crossover with Strassen of 7200!!
>
> I'm using a crossover of 2048 here, so maybe our improvements are orthogonal? Even more puzzling, I'd expect that my crossover should be bigger than yours. (On a side note: my code changes how the crossover is used. Your version: 'size cutoff'; my version: '|cutoff - size| is minimal', which should give actual cutoffs closer to the desired values.)
>
> My version is here:
> http://sage.math.washington.edu/home/malb/spkgs/libm4ri-20080516.p1.spkg
> (this needs an updated patch for Sage) and here:
> http://sage.math.washington.edu/home/malb/m4ri-20080516.tar.gz
> (which is the raw source). Those don't have SSE2 yet but it doesn't seem to make that much of a difference anyway. I'll add that back before doing an official release. However, unfortunately I'll probably have limited/no time tomorrow to commit.
>
> Martin
>
> PS: To give at least some indication that my code still does the right thing, a 'known answer' test:
> sage: A = random_matrix(GF(2), 10^3, 10^3)
> sage: B = random_matrix(GF(2), 10^3, 10^3)
> sage: (A*B)._magma_() == A._magma_() * B._magma_()
> True
> sage: (A._multiply_strassen(B,cutoff=256))._magma_() == A._magma_() * B._magma_()
> True
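To put the gap to Magma in these timings into numbers, the ratios can be computed directly. The short Python sketch below only uses the figures quoted in Bill's message, listed in the order the rows appear there; the variable names are made up for the example.

```python
# Timings in seconds, in the order the rows appear in Bill's message:
# (Magma, M4RI with cutoff 7200, M4RI with cutoff 2048).
timings = [
    (2.940, 3.442, 4.132),
    (9.250, 11.47, 11.80),
    (16.57, 19.3, 26.05),
    (59.05, 71.9, 71.8),
]

# For each row: how many times slower than Magma each M4RI variant is.
ratios = [(m7200 / magma, m2048 / magma) for magma, m7200, m2048 in timings]

for r7200, r2048 in ratios:
    print(f"cutoff 7200: {r7200:.2f}x Magma, cutoff 2048: {r2048:.2f}x Magma")
```

With cutoff 7200 the ratio stays between roughly 1.16 and 1.24 across the four sizes, which gives a sense of how much ground the hoped-for "within about 5%" would still have to cover.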
[sage-devel] Re: slightly OT: new M4RI library
On May 17, 2008, at 8:38 PM, Bill Hart wrote:
> Of course one can go too crazy with optimisation.

No surely that never happens around here.

david
[sage-devel] sed problem with Singular/flexer.sh on Solaris
I'm getting a build problem on Solaris 10 (SPARC). The script Singular/flexer.sh assumes 'flex' is present, then tries to extract the version number. This line:

TV=`echo $VERSION|sed -e s/^[0-9]*\.[0-9]*\\.//`

is creating an error on Solaris. I think it tries to get the last part of the version, i.e. the Z of version X.Y.Z. Anyway, assuming that is what is needed, rewriting the line to:

TV=`echo $VERSION|sed -e s/^[0-9]*\.[0-9]*\.//`

solves the problem.

Here is part of the original, without the proposed change.

VERSION=`flex --version |sed -e s/^.*version //|sed -e s/^flex //`
LV=`echo $VERSION|sed -e s/\.[0-9]*\.[0-9]*\$//`
MIDV=`echo $VERSION|sed -e s/^[0-9]*\.//|sed -e s/\.[0-9]*\$//`
TV=`echo $VERSION|sed -e s/^[0-9]*\.[0-9]*\\.//`
#echo $LV $MIDV $TV
#goodversion=
if [ $LV -lt 2 ]; then
  goodversion=true
fi
if [ $LV -eq 2 ]; then
  if [ $MIDV -lt 5 ]; then
    goodversion=true
  fi
  if [ $MIDV -eq 5 ]; then
    if [ $TV -le 4 ]; then
      goodversion=true;
    fi
  fi
fi
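As a cross-check of what the excerpt is meant to compute, the same decision can be written in a few lines of Python. This is a different technique from the sed pipeline and is not proposed as a replacement for flexer.sh; it only restates the nested tests, which set goodversion for any flex version X.Y.Z up to and including 2.5.4. The function names are hypothetical, and the sketch assumes the version string is three plain integers separated by dots.

```python
# Hypothetical restatement of the version test in the flexer.sh excerpt.
# Assumes a purely numeric "X.Y.Z" string; anything else raises ValueError.

def parse_version(text):
    """Split 'X.Y.Z' into a tuple of three integers (LV, MIDV, TV in the script)."""
    major, minor, patch = (int(part) for part in text.strip().split("."))
    return major, minor, patch


def is_good_flex_version(text):
    """True when the version is at most 2.5.4, mirroring the nested shell tests."""
    # Tuple comparison is lexicographic: first X, then Y, then Z.
    return parse_version(text) <= (2, 5, 4)


for version in ("2.5.4", "2.5.35", "2.4.99", "2.6.0"):
    print(version, is_good_flex_version(version))
```

Comparing the three components as one tuple avoids extracting each part with a separate regular expression, which is where the shell quoting differences between platforms come into play.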
[sage-devel] Re: sed problem with Singular/flexer.sh on Solaris
On May 18, 2:45 am, Dr. David Kirkby [EMAIL PROTECTED] wrote:

Hi David,

> I'm getting a build problem on Solaris 10 (SPARC). The script Singular/flexer.sh assumes 'flex' is present,

The need to have flex installed has been fixed in 3.0.2.alpha0.

> then tries to extract the version number. This line:
>
> TV=`echo $VERSION|sed -e s/^[0-9]*\.[0-9]*\\.//`
>
> is creating an error on Solaris. I think it tries to get the last part of the version, i.e. the Z of version X.Y.Z. Anyway, assuming that is what is needed, rewriting the line to:
>
> TV=`echo $VERSION|sed -e s/^[0-9]*\.[0-9]*\.//`

This should go upstream. malb?

> solves the problem.
>
> Here is part of the original, without the proposed change.

SNIP

While we are talking about libSingular: the part of spkg-install that creates the Singular script is buggy since it depends on GNU's tail to work. I haven't fixed that yet, but I plan to get it done in 3.0.2 or 3.0.3.

Cheers,
Michael
[sage-devel] Re: slightly OT: new M4RI library
On 18 May, 00:40, Martin Albrecht [EMAIL PROTECTED] wrote:
> My version is here:
> http://sage.math.washington.edu/home/malb/spkgs/libm4ri-20080516.p1.spkg
> (this needs an updated patch for Sage) and here:
> http://sage.math.washington.edu/home/malb/m4ri-20080516.tar.gz
> (which is the raw source).

This pure C version seems to be the old version, before you made either of the two big changes.

Bill.
[sage-devel] Re: Does the '.spkg' format just cause more problems than it solves?
On May 18, 1:22 am, Dr. David Kirkby [EMAIL PROTECTED] wrote:

Hi David,

> Sage is distributed as a lot of files ending in .spkg, which are basically tar files compressed with bzip2. I myself think it would be better if, instead of using these files, Sage was simply distributed as a set of source files. I see several problems with these files.
>
> * Since they are compressed tar files, if you really want to use them, why was the extension .tar.bz2 not used? At least a user would be able to work out the file format.

Not all spkgs are compressed. The only exception to the rule that is in the default distribution is the Fortran.spkg. Spkgs can also contain binaries or databases, so it isn't always sources. How to work with spkgs is documented in the developer's manual.

> * Distributing binary files like this makes it difficult to use CVS or similar.

We don't track spkgs in any RCS. That is mostly due to their size. And the src directory in an spkg is supposed to be vanilla upstream as documented in SPKG.txt, so if it ever gets corrupted we can just nuke it and replace it with a vanilla tarball. We have copies of all old spkgs around, so when we need to go back and find something older that isn't a problem.

> * There probably is an easier way (if so, tell me), but the way I am trying to build Sage on Solaris I find I'm constantly recreating a .spkg file. For example, I've noticed a possible problem with /spkg/build/singular-3-0-4-2-20080405.p1/src/Singular/flexer.sh. Is there a better way to apply a fix than to make the change, then tar up the directory, then compress it, then overwrite the old .spkg file? I assume there is a better way.

sage -pkg foo/bad does all the dirty work for you. But once you source local/bin/sage-env you can just run ./spkg-install inside the spkg's directory. Once you have all the changes done, apply them to a fresh spkg so that you do not add all the binary crap left over from the build. And since we do ship vanilla sources you need to add a patched file into the patches directory and copy it over, since we do not use patch. The reason for not using patch is simply that patch is often absent or broken on many systems. The fact that patch on Solaris by default doesn't understand unified diffs makes my blood boil each time I run into that problem. I know there is gpatch, but that doesn't really solve the problem.

> Sage is pretty unique in the way all these packages are distributed in source form. I'm not convinced this uniqueness is a good thing, but perhaps I am wrong.

I am sure you are :). We are aiming for wide build support and that includes Windows via Cygwin [which William and I are working on to make it supported again in 3.0.x or 3.x] and then MSVC. The spkg format is KISS and I doubt any proposed change will make life more complicated. So far *any* proposed change didn't make it since the current spkg format works with warts and all.

> Dave

Cheers,
Michael
[sage-devel] Re: Does the '.spkg' format just cause more problems than it solves?
On May 18, 1:14 am, Francois [EMAIL PROTECTED] wrote:
> > Sage is pretty unique in the way all these packages are distributed in source form. I'm not convinced this uniqueness is a good thing, but perhaps I am wrong.
>
> On my Linux desktop, ark (the KDE front end to compression programs) identifies them easily as tar.bz2 but asks for confirmation. As for applying patches, when I was working with a mainly monolithic Sage on Gentoo that's pretty much what I was doing. I even wrote a set of commands to automate it as much as possible. I don't know if cpio would handle tar.bz2, in which case it probably would be a better way.
>
> Francois

I guess one could make a script to simplify the process of making changes to the .spkg files somewhat. But the fact that the source is distributed in large compressed files means that even the simplest change will need a new .spkg file to be made. If someone else has made that change, and you want to use it, you have to download a large .spkg file.

In contrast, if the actual source files could be checked out from a CVS (or whatever) repository, then it would be much easier to keep up to date. A change of a few bytes in one file will only need a few bytes to be downloaded, not a multi-megabyte .spkg file.
[sage-devel] Re: slightly OT: new M4RI library
I managed to get the modified version from the spkg. Nice code!!

Unfortunately it is not as fast on my Opteron. So more work tomorrow for me to try and get it down to the same times as I have with my version.

Here are the times, all on my Opteron. Note your CTD version was optimal at a cutoff of 2048, not 7200 as for my code. Now I am worried that maybe my code is actually broken somehow and still passing the test code. I'll carefully make the changes to your code tomorrow to see if that is the case.

               Magma     CTD-M4RI:2048   AMD-M4RI:7200   AMD-M4RI:2048
10000x10000:   2.940s    3.13s           3.442s          4.132s
16384x16384:   9.250s    12.96s          11.47s          11.80s
20000x20000:   16.57s    22.43s          19.3s           26.05s
32000x32000:   59.05s    90.20s          71.9s           71.8s

Bill.

On 18 May, 01:58, Bill Hart [EMAIL PROTECTED] wrote:
> On 18 May, 00:40, Martin Albrecht [EMAIL PROTECTED] wrote:
> > My version is here:
> > http://sage.math.washington.edu/home/malb/spkgs/libm4ri-20080516.p1.spkg
> > (this needs an updated patch for Sage) and here:
> > http://sage.math.washington.edu/home/malb/m4ri-20080516.tar.gz
> > (which is the raw source).
>
> This pure C version seems to be the old version, before you made either of the two big changes.
>
> Bill.
[sage-devel] Re: Does the '.spkg' format just cause more problems than it solves?
On May 18, 3:13 am, Dr. David Kirkby [EMAIL PROTECTED] wrote:

SNIP

Hi David,

> I guess one could make a script to simplify the process of making changes to the .spkg files somewhat. But the fact that the source is distributed in large compressed files means that even the simplest change will need a new .spkg file to be made. If someone else has made that change, and you want to use it, you have to download a large .spkg file.

Sure, but few people actually make changes to spkgs and the vast majority of changes are to the Sage library, which one can pull directly.

> In contrast, if the actual source files could be checked out from a CVS (or whatever) repository, then it would be much easier to keep up to date. A change of a few bytes in one file will only need a few bytes to be downloaded, not a multi-megabyte .spkg file.

But that requires massive infrastructure, and sticking 150 MB+ of compressed sources into some RCS isn't a walk in the park. Bandwidth is cheap and plentiful; CPU cycles to get the same number of updates from an RCS: not so much. And we do not assume a working RCS to compile Sage from scratch since that would be a technical hurdle many people cannot cross. It is all about KISS ;)

Cheers,
Michael
[sage-devel] Fwd: SAGE 3.0.1 doc errata, mostly w.r.t. tutorial (doc dated 2007.10.28)
Hi Sage-Devel,

Can somebody volunteer to make a trac ticket and put all the following fixes into sage?

William

-- Forwarded message --
From: Johann Tonsing [EMAIL PROTECTED]
Date: Sat, May 17, 2008 at 4:17 PM
Subject: SAGE 3.0.1 doc errata, mostly w.r.t. tutorial (doc dated 2007.10.28)
To: [EMAIL PROTECTED]

William,

Thanks for your efforts. Having used SAGE for a few hours I am duly impressed. Herewith a few minor corrections to the tutorial + other documentation.

help:

a. The variable DATA contains the directory with data files that you upload into the worksheet. For example, to open a file in that directory, do open(DIR+'filename').
DIR = DATA

b. Inconsistent capitalisation: Split and join cells should perhaps be Split and Join Cells, also DATA variable should be DATA Variable, etc.

Tutorial:

1. doc/live/tut/node9.html:
Note: You should not type the triple dots ... above; they are just to emphasize that the code is indented.
The dots do not appear in the HTML live notebook version. Ideally the tools would provide a way to omit some text in this version. If impossible write something like:
Note: The interactive interpreter may display three dots (...) to indicate that code is indented - these don't need to be entered.

2. doc/live/tut/node20.html:
Type p.show(axes=false) to see this without any ases.
ases = axes

3. doc/live/tut/node13.html
Next lets do some arithmetic.
suggest rather: Next, let's do some arithmetic.

4. doc/live/tut/node23.html
equatoins = equations

5. doc/live/tut/node24.html
pari and maxima = PARI and Maxima
The Sage notebook version displays vellip#vdots; in the table (whether viewed in Opera for Mac version +- 9.5 or Safari 3.1). Just vellip; might have worked. The static version exhibits the same problem.

6. doc/live/tut/node40.html
gap.console(): You are completely using another program, e.g., Gap/Magma/GP
a. Add . after GP.
b. Suggest replacing entire sentence with: gap.console(): This opens the GAP console, i.e. transfers control to GAP. as the phrase completely using is unclear.
c. Suggest replacing occurrences of Gap that refer to the GAP software with GAP --- this applies to this page and possibly other documentation files, perhaps search for the word Gap.

7. doc/live/tut/node56.html
Please note that you cannot do a stats = prun -r A*A for some internal reason.
replace with
Note: entering stats = prun -r A*A displays a syntax error message because prun is an IPython shell command, not a regular function.

8. See SAGE_ROOT/examples/pyrex/factorial.spyx for an example of a compiled implementation of the factorial function that directly uses the GMP C library
Replace examples/pyrex with examples/programming/sagex as the file seems to have moved. Please confirm the exact directory name to list here as there are two instances of factorial.spyx shipped with v3.0.1.

9. doc/live/tut/node46.html
Suggest replacing
In particular, attach has the side effect of (auto-reload), very handy when debugging code, while load does not.
with
The attach command automatically reloads a file whenever it changes, which is handy when debugging code, whereas the load command only loads a file once.
as (auto-reload) is not defined anywhere (is this a LISP command?) and therefore might not make sense.

10. doc/static/ref/node57.html
Heierarchy = Hierarchy

11. doc/static/ref/node58.html
DD NOT = DO NOT
(IMHO ***NOT*** can be replaced with NOT - shouting (capital letters) should surely be enough? Alternatively render the NOT in bold.)

12. doc/static/ref/node18.html
SAGE includes the Moin Moin Wiki interactive web page system standard.
just omit standard -or- standard = as standard -or- standard = by default

13. doc/static/ref/module-sage.plot.plot.html
We combine together = We combine

Intuitive usage and completeness notes

The following are not really errors, just potential sources of confusion.

1. doc/live/tut/node10.html
When reading the complex numbers CC (which uses I (or i), as usual, for the square root of −1).
I tried
(1+2*I) in CC
(1+2*i) in CC
and was surprised to receive False. Eventually I figured out one has to define I or i first. Suggest rather
the complex numbers CC (which uses I (or i) for the square root of −1 -- just enter I=CC.0 to define I).

2. doc/live/tut/node10.html
It might have been handy if optional_packages() and install_package('database_gap-4.4.10') were already executable blocks.

3. doc/live/tut/node10.html
I installed database_gap. The actual output of K.galois_group() and K.class_group() when evaluated were not what was pre-computed + stored in the tutorial. I was using sage 3.0.1 and database_gap 4.4.10.

4. doc/live/tut/node13.html
(The object MPolynomialRing(GF(5),3,z) is the same as the object MPolynomialRing(GF(5),3,z).)
Oh, so x == x. Ah. Huh?

5.
[sage-devel] Re: Fwd: SAGE 3.0.1 doc errata, mostly w.r.t. tutorial (doc dated 2007.10.28)
On May 18, 4:24 am, William Stein [EMAIL PROTECTED] wrote:
> Hi Sage-Devel,
>
> Can somebody volunteer to make a trac ticket and put all the following fixes into sage?

SNIP

> 13. When I executed /Applications/sage/local/bin/maxima the following was displayed:
>
> dyld: Library not loaded: /Users/clarita/Desktop/sage-2.8.11/local/lib/libreadline.5.2.dylib
> Referenced from: /Applications/sage/local/lib/maxima/5.13.0/binary-clisp/lisp.run
> Reason: image not found
> Trace/BPT trap
>
> I have installed libreadline.5.2.dylib in the standard MacPorts location (/opt/local/lib/libreadline.5.2.dylib) - would it somehow be possible to build maxima to look for shared libraries in the MacPorts standard location? (I was able to resolve this by entering export DYLD_LIBRARY_PATH=/opt/local/lib ./maxima but others might not know that they should do that.)

This is not a bug on our end. We ship a libreadline.5.2.dylib in $SAGE_LOCAL/lib, so the solution is to source $SAGE_ROOT/local/bin/sage-env before randomly starting applications ;)

Cheers,
Michael

SNIP
[sage-devel] Re: Fwd: SAGE 3.0.1 doc errata, mostly w.r.t. tutorial (doc dated 2007.10.28)
On Sat, May 17, 2008 at 7:29 PM, mabshoff [EMAIL PROTECTED] wrote:
> On May 18, 4:24 am, William Stein [EMAIL PROTECTED] wrote:
> > Hi Sage-Devel,
> >
> > Can somebody volunteer to make a trac ticket and put all the following fixes into sage?
>
> SNIP
>
> > 13. When I executed /Applications/sage/local/bin/maxima the following was displayed:

Instead, do (from Terminal):

cd /Applications/sage
./sage -maxima

-- William

> > dyld: Library not loaded: /Users/clarita/Desktop/sage-2.8.11/local/lib/libreadline.5.2.dylib
> > Referenced from: /Applications/sage/local/lib/maxima/5.13.0/binary-clisp/lisp.run
> > Reason: image not found
> > Trace/BPT trap
> >
> > I have installed libreadline.5.2.dylib in the standard MacPorts location (/opt/local/lib/libreadline.5.2.dylib) - would it somehow be possible to build maxima to look for shared libraries in the MacPorts standard location? (I was able to resolve this by entering export DYLD_LIBRARY_PATH=/opt/local/lib ./maxima but others might not know that they should do that.)
>
> This is not a bug on our end. We ship a libreadline.5.2.dylib in $SAGE_LOCAL/lib, so the solution is to source $SAGE_ROOT/local/bin/sage-env before randomly starting applications ;)
>
> Cheers,
> Michael
>
> SNIP

--
William Stein
Associate Professor of Mathematics
University of Washington
http://wstein.org
[sage-devel] sage/numpy fan mail
Hi Sage-Devel,

Here is a Sage fanmail blogpost about Sage the distribution from here: http://www.funjackals.com/blog/?p=274

Sage Makes Me Happier Than Seems Reasonable
Posted May 14, 2008

I've known about the Python-for-mathematics software stack SAGE for a while now, but I hadn't played with it until today. What happened? I read Vincent Noel's very good blog entry on replacing Matlab with Python. What I had missed about SAGE is that it's got a fully self-contained build environment in the source distribution. Building the full stack, from ATLAS and BLAS through Python 2.5 and Numpy 1.0.3, is as simple as issuing a single make command. Really.

Why is this exciting? Because I've recently been through the horror of trying to build Numpy and its dependencies from source on our new server. That should be easy, right? Well, it's been a real pain in the neck. Mostly because I've had all kinds of problems getting the right versions of libraries linked in to Numpy. There are the system libraries. And the libraries that came with the Absoft Fortran 95 compiler. And the versions that I built in my account. And I can never seem to get the arguments to setup.py quite right.

But now I don't have to. I can't wait to get to work tomorrow!
[sage-devel] Re: SAGE + OpenOffice PyUno /OOMath
On Mon, May 12, 2008 at 5:52 AM, Kutoma Ltd [EMAIL PROTECTED] wrote:

Hello all,

Does anyone use OpenOffice and Sage in combination, as in the links below describing the interface to Python and the use of the equation editor?

http://documentation.openoffice.org/manuals/oooauthors/MathObjects.pdf
http://wiki.services.openoffice.org/wiki/PyUNO_bridge

Any feedback welcome.

Best regards,
Gottfried

Sorry that nobody responded to your question. This probably suggests that nobody on sage-devel actually uses Sage that way. It should be possible, though. Are you asking because you want to, and just want to know about pitfalls? Or?

If you do try linking Sage and OpenOffice using PyUNO, could you please post about what happens? If it doesn't work, post, etc. I've definitely _wondered_ about such linking for a long time, but never personally found time to investigate.

William
[sage-devel] Re: Fwd: SAGE 3.0.1 doc errata, mostly w.r.t. tutorial (doc dated 2007.10.28)
I'll address some of the issues with tut.tex. See http://trac.sagemath.org/sage_trac/ticket/3208 for patches. (Some of these things have already been taken care of there, and I'll post patches to others soon.)

Some of the things I'm not dealing with: I don't have Opera on my Mac, and I'm not going to install it any time soon. I don't see any of the garbled things that are reported here, so I can't do much about it. I'm also not dealing with the mare's nest that is the reference manual.

John

On May 17, 7:24 pm, William Stein [EMAIL PROTECTED] wrote:

Hi Sage-Devel,

Can somebody volunteer to make a trac ticket and put all the following fixes into sage?

William

Tutorial:

1. doc/live/tut/node9.html:
Note: You should not type the triple dots ... above; they are just to emphasize that the code is indented.
The dots do not appear in the HTML live notebook version. Ideally the tools would provide a way to omit some text in this version.
This can be done.
If impossible write something like:
Note: The interactive interpreter may display three dots (...) to indicate that code is indented - these don't need to be entered.

---
2. doc/live/tut/node20.html:
Type p.show(axes=false) to see this without any ases.
ases = axes

This seems to have been changed already, at least in the copy of tut.tex that I have.

---
3. doc/live/tut/node13.html
Next lets do some arithmetic.
suggest rather:
Next, let's do some arithmetic.

Same here.

---
4. doc/live/tut/node23.html
equatoins = equations

Same here.

---
5. doc/live/tut/node24.html
pari and maxima = PARI and Maxima
The Sage notebook version displays vellip#vdots; in the table (whether viewed in Opera for Mac version +- 9.5 or Safari 3.1). Just vellip; might have worked. The static version exhibits the same problem.

I don't see this problem. I see PARI and Maxima in node28.html, but no dots anywhere near there.

---
6. doc/live/tut/node40.html
gap.console(): You are completely using another program, e.g., Gap/Magma/GP

a. Add . after GP.

b. Suggest replacing entire sentence with:
gap.console(): This opens the GAP console, i.e. transfers control to GAP.
as the phrase completely using is unclear.

Okay.

c. Suggest replacing occurrences of Gap that refer to the GAP software with GAP --- this applies to this page and possibly other documentation files, perhaps search for the word Gap.

I don't see Gap anywhere in tut.tex.

---
7. doc/live/tut/node56.html
Please note that you cannot do a stats = prun -r A*A for some internal reason.
replace with
Note: entering stats = prun -r A*A displays a syntax error message because prun is an IPython shell command, not a regular function.

Okay.

---
8. See SAGE_ROOT/examples/pyrex/factorial.spyx for an example of a compiled implementation of the factorial function that directly uses the GMP C library
Replace examples/pyrex with examples/programming/sagex as the file seems to have moved. Please confirm the exact directory name to list here as there are two instances of factorial.spyx shipped with v3.0.1.

Okay (fixed in earlier patch).

---
9. doc/live/tut/node46.html
Suggest replacing
In particular, attach has the side effect of (auto-reload), very handy when debugging code, while load does not.
with
The attach command automatically reloads a file whenever it changes, which is handy when debugging code, whereas the load command only loads a file once.
as (auto-reload) is not defined anywhere (is this a LISP command?) and therefore might not make sense.

I see
In particular, {\em attach} has the side effect of auto-reloading, very handy when debugging code, while {\em load} does not.
Anyway, I can change it; the suggested wording seems a bit better to me.

---
10. doc/static/ref/node57.html
Heierarchy = Hierarchy

---
11. doc/static/ref/node58.html
DD NOT = DO NOT
(IMHO ***NOT*** can be replaced with NOT - shouting (capital letters) should surely be enough? Alternatively render the NOT in bold.)

---
12. doc/static/ref/node18.html
SAGE includes the Moin Moin Wiki interactive web page system standard.
just omit standard -or- standard = as standard -or- standard = by default

---
13. doc/static/ref/module-sage.plot.plot.html
We combine together = We combine

Intuitive usage and completeness notes

The following are not really errors, just potential sources of confusion.

1. doc/live/tut/node10.html
When reading
the complex numbers CC (which uses I (or i), as usual, for the square root of −1).
I tried
(1+2*I) in CC
(1+2*i) in CC
and was surprised to receive False. Eventually I figured out one has to define I or i first. Suggest rather
the complex numbers CC (which uses I (or i) for the square root of −1 -- just enter I=CC.0 to define I).

Dealt with in an earlier patch.

---
2. doc/live/tut/node10.html
It might have been handy if optional_packages() and