Regarding Forth, check out Jonesforth. Truly one of the most beautiful pieces of code on the web.
Bill.

On 22 June 2010 23:45, Antony Vennard <antony.venn...@gmail.com> wrote:
> Just a quick check in from me.
>
> I am indeed still working on this. Work has been suspended slightly because I set up a company and am just finishing with a client project (gotta fund myself somehow). However, the upside is I've written a python script to automate quite a bit of the process of writing a django app (because nothing's worse than writing the same thing over and over again, inserting different variable names). So fingers crossed it shouldn't take me too much longer. Next few weeks, I promise, and something more will be working.
>
> Now, I'm a programmer, not a website designer, which means whenever I go near CSS I produce ugly websites no matter how hard I try. So if anyone has a taste for design or wants to explore their creative side, say hi now. Or, if you're a django whiz, say hi now too. I'm not fussy.
>
> Disclaimer: the following will be bsd-licensed and/or relate to bsdnt.
>
> In other news - I've been trying to work out how to parallelise operations in bsdnt. It seems I'm trying to solve a very hard problem, however, in parallelising the core functions, which are already very efficient when implemented from scratch in assembly and performing at a level comparable to existing bignum libraries. Bill suggested that what might actually be better is to write vector functions, where one passes an array of arbitrary-precision numbers to an operation function and it does all of them at once.
>
> This is actually relatively trivial to do.
> Based on mpir (any contribution I make is bsd-licensed or gplv2, take your pick) one could do this:
>
> int mpz_mul_vec(mpz_ptr* w, mpz_srcptr* u, mpz_srcptr* v, const int dim)
> {
>     int i;
>
>     #pragma omp parallel for shared(w, u, v)
>     for (i = 0; i < dim; i++)
>         mpz_mul(w[i], u[i], v[i]);
>
>     return 0;
> }
>
> Basically openmp would then break this down into pthread_create() calls (try an ldd trace on any program not linking to pthreads and you'll find pthreads linked in...) or CreateThread() on Windows. Compile with -fopenmp on gcc/clang and /openmp on msvc.
>
> Disclaimer - not tested (yet).
>
> The same pattern works for every function you want to parallelise in "vector" format - add, sub, whatever.
>
> You could go further and detect the number of cpus present / number of threads requested, divide dim by that, and use two loops, so that you stick to the "optimum" number of threads per cpu rather than going all-out parallel and getting nowhere because the cpu is constantly switching between threads. Here's a rough sketch of how it might be done (disclaimer - I wrote this in my email client - it can probably be done better):
>
> int mpz_mul_vec(mpz_ptr* w, mpz_srcptr* u, mpz_srcptr* v, const int dim)
> {
>     int nt = omp_get_max_threads();  /* needs <omp.h> */
>     int start, i;
>
>     /* process the vector in slices of nt elements, one per thread */
>     for (start = 0; start < dim; start += nt)
>     {
>         int end = (start + nt < dim) ? start + nt : dim;
>
>         #pragma omp parallel for shared(w, u, v)
>         for (i = start; i < end; i++)
>             mpz_mul(w[i], u[i], v[i]);
>     }
>
>     return 0;
> }
>
> Disclaimer: definitely not tested yet.
>
> Now, the next stage is to get things going on the gpu.
> For that, Forth was mentioned somewhere, so I'm currently slowly - in whatever spare moments I can't face using for web dev - writing one in x86-64 linux assembly, so I can then port the ideas to arb or whatever nvidia's asm lang is. Either way, I'm learning a heck of a lot, so...
>
> ...that's me for now.
>
> Antony
>
> On 06/22/2010 12:30 AM, Bill Hart wrote:
>> Hi all,
>>
>> this is the first of three posts, primary audience sage-devel, especially those attending Sage Days in Leiden.
>>
>> As many of you know, I recently stepped down from the MPIR project. The main reason was that I need to focus on FLINT until about 1 Oct 2010, then need to transition to 80% research, 20% coding (my colleagues at Warwick will be aghast that this is not already the ratio - but they probably guessed as much). Now that I have stepped down from MPIR, I can divulge all the "secret" development plans I had, in the hope that some other people might get inspired. That is what this email is about - potential MPIR projects that *anyone* could work on (assuming they code in C), even research-level projects!!
>>
>> Firstly, in my mind, MPIR is not going away just because I stopped working on it regularly. Since the last release I managed, a major MPIR version has been released, 2.1.1, which includes support for MSVC 2010 from Dr. Brian Gladman, proper xgcd normalisation (actually done by me) and a number of other fixes by Jason Moxham. Minh Nguyen managed the release, and did a stellar job!
>>
>> Brian Gladman has always said MPIR *is* him, me and Jason, so without me it's just him and Jason. But Jason recently posted to mpir-devel with plans for the next release. So the project is not stopping.
>>
>> As everyone is aware, we switched to LGPL v3+ so that code could be exchanged with GMP devels, and indeed much code from GMP 5.x.y went into MPIR 2.0.0.
>> Even if MPIR turns into GMP with proper MSVC Windows support, it'll still be a success in my book. But a *lot* more is possible. Here are some of the "secret" plans I had for the future:
>>
>> Multiplication:
>>
>> * We currently don't have Toom 6 (GMP does). Someone could pull this code from GMP and put it in MPIR. It is largely written by Marco Bodrato, using a new "mixing" idea which we had not thought of in MPIR, so it is very fast and well coded. Marco Bodrato and Alberto Zanoni are the leading experts on Toom-related stuff.
>>
>> * We do have Toom 8 (also largely by Marco), but what should come next? Perhaps Toom 12? Perhaps a general Toom 2n? This is research mathematics - currently such a thing doesn't exist, and you could write a paper on it. Do please talk to Marco, who is very friendly and approachable and probably has lots of good ideas about what is possible. There's an open question as to what the actual asymptotics of a general Toom 2n are. The best known result is due to Knuth in TAOCP.
>>
>> * FFT - I have written a completely new FFT (I did this last Christmas). Apart from not having the sqrt2 trick implemented, it is actually almost ready to be put into MPIR. It should be *much* better than the FFT currently there, and I even have some new ideas which I could discuss with anyone who would like to do even better. I will be giving a talk at the upcoming Sage Days in Leiden on the FFT. If you get inspired, talk to me and I'll explain my new ideas. I will *not* have time to complete this FFT, but I wrote extensive notes on what needs to be done to finish it, and there's a thousand lines of code or so ready for someone to work on if they so please. This new code is currently not used by anything! This is a project absolutely *anyone* could work on.
>>
>> Division
>>
>> * Anyone who's checked will notice MPIR's division code is actually slower than GMP's.
>> This is fully intentional. It is because I decided we should use David Harvey's divide-and-conquer code (just a quick note - he disclaims responsibility for bugs in our code - we used his pre-implementation and adapted it for MPIR, and he bears no responsibility for the ways in which we massacred it). Now theoretically, divide-and-conquer code based on middle product should be faster than that based on straight multiplication (as currently used by GMP, say). But this relies on as much work being put into middle-product code as into ordinary multiplication code, which is currently not the case. Specifically, David Harvey invented a fantastic integer middle-product algorithm (which was completely new). We've got an implementation largely by him (some by some of the GMP devels) which has assembly code for a basecase, then there is the equivalent of karatsuba multiplication for middle product (it's called toom42_mulmid). Actually, you can "invert" any ordinary multiplication algorithm and turn it into an algorithm for middle product. This uses a type of inversion called Tellegen's principle (I can provide references, though they are also contained in David Harvey's paper on his website). Now, the next logical step is to implement Toom63_mulmid. This is an inverted Toom 3 algorithm. In fact I have derived the correct identity for doing this (it took me ages, as I did it by hand ;-)). Warning: David said he once gave this a go and it wasn't faster. However, he still left the door open for it to be successful, because without proper assembly code for some of the components it simply *can't* compete with the Toom 3 multiplication in MPIR. But done right, it could in theory speed up division in MPIR by a *factor of 2*!!! Again, you might even get a small paper out of this work!
>>
>> * For asymptotically fast division, we use a precomputed inverse, i.e.
>> to get f/g we first compute 1/g using Newton iteration, then we essentially multiply by f (that's a slight simplification, but not much). Now this can be made *much* faster too. Basically, Newton iteration can be done in something like 5/3*M(n), where M(n) is the time for multiplication (it's unknown whether 4/3 is possible for integer inversion). With FFT precaching you can combine this with the multiplication and get the whole division down to probably something like 2*M(n) (might be slightly higher or lower - I haven't looked it up for a while). We are probably currently around 3*M(n) or something of that order. Obviously this depends on the new FFT being put in, but in the end we can speed up MPIR's asymptotically fast division (the concept for which I implemented, reusing some of the divide-and-conquer code from GMP for "boilerplate" stuff). In fact, it can be sped up a *lot*. Really a lot!
>>
>> GCD/XGCD
>>
>> * The current GCD/XGCD code in MPIR could do with a *lot* of cleaning up and general optimisation. GMP's is currently faster. It's all based on Niels Moller's implementation of his ngcd algorithm, which is the best known, but I did it messily in MPIR.
>>
>> * I also have an idea for a new GCD algorithm for very small integers (a few limbs and less). I've implemented it for single limbs and it is way faster than what GMP uses. But it can be extended. I'd be happy to explain this to anyone who'd like to work on it. Again, there *might* be some kind of paper in it. Maybe. Not sure though. (Just because it's faster doesn't always get you a paper.)
>>
>> Jacobi Symbol
>>
>> * Richard Brent and Paul Zimmermann recently developed a new algorithm for computing the Jacobi symbol asymptotically fast. They've implemented it, and I think a future GMP will have an implementation. It's subquadratic, with the same complexity as Schoenhage's continued fraction algorithm.
>> It is not too hard to find a copy of the slides Paul and Richard will use to announce it at an upcoming meeting. There's also a paper floating around somewhere.... But this algorithm is based on Stehle and Zimmermann's half-gcd algorithm. It is not at all clear to me why one couldn't derive a similar algorithm based on Niels Moller's ngcd algorithm (if Paul is around, perhaps he can comment) - at least his slides didn't rule this out. *If* such a thing would work, I believe it should be faster. Again, you *might* get a paper out of this. Actually, I don't see why not, so long as a comparable implementation beats Paul and Richard's implementation of their algorithm. As GMP will have the Brent-Zimmermann algorithm, it seems logical that MPIR could have the other.
>>
>> By the way, I want to draw attention to a new algorithm of Niels Moller and Torbjorn Granlund for "Improved division by invariant integers". MPIR already uses this algorithm, as it was implemented in GMP and we now use that code for basecase division in MPIR. It's an interesting improvement and, amazingly, if I understand correctly, they actually beat the chip for 2 by 1 limb division!! (flint2 uses this new algorithm extensively in the new nmod_poly module.)
>>
>> Powm
>>
>> * MPIR is woefully slow at this. I vaguely recall some new REDD algorithm mentioned on the GMP website for this. I have no idea what this is about, or if it is even implemented, but a *pure guess* is that it might be some generalised REDC algorithm, perhaps based on a similar trick to the divide-by-3 algorithm that David Harvey and I (re)invented some years ago when working on FLINT. I reckon the likelihood is high (based on the name) that this is what it is, in which case, let's put it in MPIR. If not, well then there's something to study....
>>
>> BSDNT
>>
>> * Many, many people have supported the idea of writing a BSD-licensed bignum library.
>> Most recently Ondrej Certik wrote to me *strongly* supporting the idea. This is complex at best, because such a library must be so different from GMP as to leave *absolutely no question whatsoever* that it is independent, and thus must be implemented completely from scratch. One idea I had was to write all the low-level routines in LLVM instead of assembly, then build suitable optimisations into LLVM so that nice assembly is automatically produced. This is entirely feasible and would mean only one set of low-level routines would ever need to be written, rather than a different set for each platform (LLVM supports *lots* of platforms, even the playstation SPU!!). However, LLVM doesn't support carry propagation well (it is written with the C front end in mind, and C doesn't need to support carry propagation). So I'm still umming and aahing about whether this might be more work than just writing lots of assembly code from scratch. At this point I don't know. And yes, it matters: it makes at least a factor of 2 difference in timings, which we can't just "live with" for a bignum library.
>>
>> * Antony Vennard has been working hard on a website for the bsdnt project. Bits and pieces of it function already (it's high tech). If anyone knows Django and would love to help Antony, let us know - lots is done but more needs to be done. I don't see a project as going anywhere without a website....
>>
>> * I wrote some code for multiplication which is relatively quick up to a few hundred limbs, and I also wrote some basecase division, but I want to change how the latter works because I have a better way of doing it now. All this code is in very few lines, but it still relies on assembly code in MPIR to be efficient. We obviously can't have a BSD-licensed library relying on an LGPL library. Doesn't make sense...
>>
>> * I just want to emphasise: bsdnt has *nothing* to do with MPIR.
>> It's a totally separate project, and it came about because lots of people kept telling us they really wanted a BSD-licensed bignum library. Its main goal is clean, simple code which anyone can read. Each function will be only a few lines. Performance will come second (although there is a goal to at least be reasonable in comparison with MPIR - say within a factor of 2 in the first year and within 20% in the second - the clock hasn't started yet ;-)). Performance is *very* difficult to achieve if you specifically do *not* want to replicate any portion of GMP/MPIR, so there's no intention to ever beat MPIR. Actually, I don't think that is even possible. They represent the state of the art by some orders of magnitude.
>>
>> Don't forget, I'll be around at Sage Days Leiden for anyone who is going to be there. But also feel free to contact me on list here about anything I've discussed.
>>
>> Two more posts to go....
>>
>> Bill.
>
> --
> You received this message because you are subscribed to the Google Groups "mpir-devel" group.
> To post to this group, send email to mpir-de...@googlegroups.com.
> To unsubscribe from this group, send email to mpir-devel+unsubscr...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/mpir-devel?hl=en.