[sage-devel] NTL 10.3.0

Victor Shoup Fri, 18 Nov 2016 11:00:53 -0800

Go to http://shoup.net/ntl for full details.
Below is the latest entry in the change log.


I've also updated the NTL/FLINT comparison document:
http://shoup.net/ntl/benchmarks.pdf
The only significant change is to the ZZ_pX factoring benchmarks
(the new code in NTL speeds up mat_ZZ_p multiplication and
indirectly CompMod and CanZass).

======= Changes in 10.3.0 =======

 * Implementation of a multi-modular strategy for matrix multiplication
    over ZZ_p. Here are some benchmarks that compare the new strategy to
    the old (naive) one. Here, we consider n-by-n matrices modulo an
    n-bit prime.
      o n=128, speedup=6.8x
      o n=256, speedup=8.6x
      o n=512, speedup=9.3x
      o n=1024, speedup=18x
      o n=2048, speedup=37x 

    I also compared NTL's new mat_ZZ_p multiplication to FLINT's
    fmpz_mat multiplication. The latter also uses a multi-modular
    strategy. For n=128,256,512,1024 as above, NTL's code is between
    2.7 and 3.1 times faster than FLINT's (and we did not count the time
    it would take to reduce the entries mod p for the FLINT code).

    Part of this may be due to the AVX-enhanced small-prime matrix
    multiplication code used by NTL, and part may be due to better CRT
    code.

  * I also instrumented both the plain and multi-modular matrix
    multiplication for ZZ_p so that they are both "thread boosted"
    (i.e., will automatcally exploit multiple cores when possible).

  * As an initial application of this faster matrix multiplication, I
    implemented a new version of Brent/Kung modular composition for
    ZZ_pX, which is now between 2 and 5 times faster than the old one
    (depending on parameters). This is done with a new class called
    ZZ_pXNewArgument, which supersedes ZZ_pXArgument (retained for
    compatibility). This also makes the CanZass factoring algorithm for
    ZZ_pX faster (sometimes by a factor of almost 2, depending on
    parameters). It also makes CanZass more memory intensive (but while
    the overall memory usage increases, the memory access pattern is
    fairly cache friendly).

  * I would like to see if this faster matrix multiplication can be used
    to get faster linear algebra (determinants, inverses, more general
    Gaussian elimination) via reduction to matrix multiplication. If
    anyone wants to volunteer to work on this, please let me know.
    Presumably, the FLINT nmod_mat code could be repurposed for this. I
    won't have time to work on this for a few months, but would be glad
    to discuss ideas with anyone who wanted to do the work. Note that
    the "plain" versions of these routines also need some work.

  * I also added move methods to the Vec and Mat classes, and made a
    slight tweak to the Lazy class. 

-- 
You received this message because you are subscribed to the Google Groups 
"sage-devel" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sage-devel+unsubscr...@googlegroups.com.
To post to this group, send email to sage-devel@googlegroups.com.
Visit this group at https://groups.google.com/group/sage-devel.
For more options, visit https://groups.google.com/d/optout.

[sage-devel] NTL 10.3.0

Reply via email to