Hi, the following might be interesting for some people around here. As time permits I will offer an optional spkg.
Cheers,

Michael

-------- Original Message --------
Subject: [atlas-devel] ATLAS 3.9.0 & LAPACK
Date: Fri, 18 Jul 2008 06:19:10 -0500
From: Clint Whaley <[EMAIL PROTECTED]>
Reply-To: List for developer discussion, NOT SUPPORT. <math-atlas-[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]

Guys,

It's been a long time coming, but I have finally heaved out 3.9.0. The main reason for this long delay is that I did a major rewrite of ATLAS for additional rank-K performance, which timings showed was a big win **until I fixed the performance bug that mandated 3.8.2**. After that, I found I had written thousands of lines of code for nothing, so I had to yank the code back out :(

However, 3.9.0 is finally out, and it has several key features that I hope will make it worth the wait.

There are much improved DGEMM kernels for Core2Duo64 and K10h64 architectures. These kernels (particularly K10h) can still be improved, and I haven't yet ported them to single precision or 32 bits. However, this should provide some relief on the Core2Duo, where ATLAS was taking a savage beat-down from Goto and MKL blas. ATLAS still trails Goto, but it is not quite the same excoriating humiliation now (at least for double). The key to the Core2Duo64 was doing 2D blocking, which I had tested but apparently messed up before. Thanks to Yevgen Voronenko of CMU/SPIRAL, who gave me a code fragment to work from (see ATLAS/doc/AtlasCredits.txt for details).

The main focus of 3.9.0 has been in improving ATLAS's LAPACK support. The first of these is that you no longer have to install LAPACK separately from ATLAS. If you have LAPACK 3.1.1 untarred somewhere, you can use the flag '-Ss lasrc /path/to/lapack3.1.1/SRC', and ATLAS will automatically build it during the ATLAS build, with no need for the flag/make.inc headaches that we have in the 3.8 series. You can also provide '--with-netlib-lapack-tarfile=/path/to/tarfile' and ATLAS will extract the tarfile for you in the ATLAS directory, and build it from there.
If you have more than one install, you can save space by using the -Ss flag, since all ATLAS installs then share one copy of the LAPACK source; for that reason I recommend the first method.

The second big lapack push for this release is that I've started to support a new C API for lapack, which I hope to eventually expand to all of LAPACK. For most of the routines, it calls the F77 LAPACK, but for ATLAS native routines (like LU/Cholesky) it calls ATLAS's faster routines instead. The name is the f77 name, in lower case, with a "C_" prepended. Thus DGETRF is C_dgetrf. Character arguments (Uplo, Trans, etc) are replaced by CBLAS enum types, and all (non-complex) scalars are passed by value. This API supports only column major arrays (it mostly calls the F77/netlib lapack, which is column-major only).

Routines that take workspace in F77 don't in the C_ equivalents, as the wrapper auto-queries LAPACK and allocates. However, if you want to allocate the work yourself, the routine taking workspace usually exists with the name C_rout_wrk (it may not exist if ATLAS supports the routine natively), and you can test whether it exists by doing:

   #ifdef ATL_C2F<rout>_wrk__     (e.g., ATL_C2Fdgels_wrk__)

This API is currently supported for the following LAPACK routines:

ATLAS native routines:
   xPOSV xGESV xPOTRF xGETRF xPOTRS xPOTRI xLAUMM
C2F wrappers:
   xGELS xGELQF xGERQF xGEQLF xGEQRF

Obviously, you need to build the full lapack library (and thus need a functional F77 compiler) to use these routines. You can find more info in the following files found in ATLAS/include:

   C_lapack.h         # main header file you must include to use the C_lapack API
   clapack.h          # header for ATLAS's native lapack
   atlas_C2Flapack.h  # header for C to F77 wrapper functions.

I would like to get some feedback on this new API. I use macros to select between native & C2F files to save some calling overhead. Is this real bad news for people? Will it make your life easier to have a full C API supported out-of-the-box for ATLAS?
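[Editorial sketch, not part of Clint's message.] The naming rule and the workspace feature test described above could look roughly like the following in C. The helper names c_lapack_name and have_dgels_wrk are hypothetical and are not ATLAS functions; only the "C_" + lower-case rule and the ATL_C2Fdgels_wrk__ macro come from the message. The sketch deliberately does not include C_lapack.h, so that it compiles without an ATLAS install; in a real build you would include that header first, and the macro would then be defined whenever the workspace variant exists.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an ATLAS function): derive the C_lapack name
 * from an F77 LAPACK routine name, e.g. "DGETRF" -> "C_dgetrf".
 * out must hold at least strlen(f77) + 3 bytes ("C_" + name + NUL). */
void c_lapack_name(const char *f77, char *out, size_t outlen)
{
    size_t n = strlen(f77);
    size_t i;

    assert(outlen >= n + 3);
    out[0] = 'C';
    out[1] = '_';
    for (i = 0; i < n; i++)
        out[i + 2] = (char)tolower((unsigned char)f77[i]);
    out[n + 2] = '\0';
}

/* Hypothetical helper: report whether the workspace-taking variant of the
 * DGELS wrapper is available, using the macro test from the message.
 * Without C_lapack.h included, the macro is undefined and this returns 0. */
int have_dgels_wrk(void)
{
#ifdef ATL_C2Fdgels_wrk__
    return 1;   /* the _wrk variant exists: caller may supply workspace */
#else
    return 0;   /* fall back to the auto-allocating C_ routine */
#endif
}
```

The same #ifdef pattern would apply to any other <rout>; the actual argument lists of the C_ routines are not given in the message, so they are left out here and should be taken from C_lapack.h.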
If there is a demand for this API, I can fill it out fairly rapidly (with some help from you guys for testing); if there's not, I will populate it only as needed for internal ATLAS stuff. So, speak up if you are interested!

Finally, the last lapack deal is that ATLAS can now tune some of the lapack routines that it doesn't natively support by empirically tuning LAPACK's blocking factor to both the platform and problem size. Right now, ATLAS autotunes only the QR factorization routines mentioned above. Initial timings show improvements ranging from 5-25% (as much as 75% for small problems on Itanium!). Core2Duo64SSE3 has arch defaults with QR pretuned.

By default, ATLAS does not tune LAPACK. To enable it, you pass '-Si latune 1' to configure. You will only want to do this if QR (or one of the many LAPACK routines that call it) is important to you: my present BFI lapack tuning framework adds roughly 3 hours to a *fast* machine's install!

Cheers,
Clint

**************************************************************************
** R. Clint Whaley, PhD ** Assist Prof, UTSA ** www.cs.utsa.edu/~whaley **
**************************************************************************