Ssssh. I'm trying to take over Tim's cluster. peter
On 8/16/07, Jack C <[EMAIL PROTECTED]> wrote: > > Peter, > > You can always install MPI on a single host to test your code, even if you > don't have a cluster. You can even run multple threads on that single host, > it just won;t have the speedup that multiple hosts would. > > -Jack Carrozzo > > On 8/16/07, Peter St. John <[EMAIL PROTECTED]> wrote: > > > > Tim, > > I thought about this some. I don't have a cluster, but I can program in > > C; you have a cluster, but don't want to program too much. It's possible we > > can help each other. > > > > A toy application I thought of is finding numbers that can written as a > > sum of two cubes in two different ways (there's a famous story about > > Ramanujan recognizing the four-digit number of a taxicab as "the smallest > > number than can be expressed as a sum of two cubes in two distinct ways"). > > > > So I had this plan. I went to wiki and got the source of an MPI "hello > > world" application, http://en.wikipedia.org/wiki/Message_Passing_Interface > > > > > > Then I wrote a C program to take a range of numbers, say 1000 to 1999, > > and find the numbers in that interval that have the taxicab quality (it > > turns out that's what such numbers are called now!). > > Since I don't have the MPI library or a cluster I can't test the > > integrated thing, so at the moment all I have is the wiki example (which > > just fills a buffer with "I am node #<whatever>" for later reporting), which > > presumably is an OK example of MPI, and my own code (which just does > > arithmetic) which I tested myself. So if you wanted, you could try and > > integrate these, and see if you can find bigger reults by compiling for 64 > > bit arithmetic. > > > > My program wastes a huge amount of redundant calculation; it's only > > virtue is to split the memory required among nodes. So it computes x^3 + y^3 > > for silly huge ranges of x and y, at each node, but only stores results for > > comparisons in limited ranges which can be split up among nodes. > > > > I was disappointed at first because I only got 1729 (which was > > Ramanujan's number, the smallest), and 4,104: > > > > C:\PeteStJohn\Experiments\Ramanujan\Debug>ramanujan > > Deubg: i = 0 > > DEBUG: HIT: > > 01729 = 12^3 + 1^3, > > 01729 = 10^3 + 9^3 > > DEBUG: HIT: > > 04104 = 16^3 + 2^3, > > 04104 = 15^3 + 9^3 > > Deubg: i = 1000 > > DEBUG: i = 1588 is too big. > > Findrama made 2 hits > > Winner 4104: > > = 16 + 2 > > = 15 + 9 > > > > but I checked, putting those two numbers, 1729 and 4104, into the Online > > Encyclopedia of Integer Sequences > > http://www.research.att.com/~njas/sequences/?q=1729%2C+4104&language=english&go=Search > > , and that got me the "Taxi Cab Sequence" and the next value is 13,832. > > My program is naive, using an exhaustive and redundant search, with only 32 > > bit, so I get stuck when 1588 is too big becaues it's cube is bigger than > > about 4 billion, the extent of a 32 bit int. Also I didn't take the time > > even to accommodate the signum bit. > > > > I suggest you look at the wiki link. If you want to hack that code > > (which could be confusing because the address of something in this node's > > memory may not make sense when passed to some other node, but the MPI > > library makes some accommodation) let me know and I'll send you my C > > program, to call from inside that Wiki program. > > > > You might start by just running the wiki example, and see if it works > > and what you have to do to make it work. If that seems like fun then we can > > try doing math with it. > > > > I think I'll integrate them myself, but I wouldn't be able to test it > > directly. If you think this toy project is the kind of thing you could use > > let me know. Maybe some day you will run my genetic algorithm app, which is > > not a toy, for me, and I won't need a cluster :-) > > > > Well, here. It's not that big a program so I'll just paste it in. > > Serious software engineers may avert their eyes. > > > > > > // FindRama > > // Toy parallelizable math application > > // Pete St.John 8/2007 > > // > > > > #include <stdio.h> > > #include <stdlib.h> > > #include <malloc.h> > > > > > > #define BIGGEST 4000000000 > > // sloppy way to deal with 32-bit limitation > > > > struct pair > > { > > long x; > > long y; > > }; > > // a pair of integers; the sum of their cubes matters to us. > > > > struct winner > > { > > long win; > > struct pair first; > > struct pair second; > > }; > > // any pair you want to keep and report, e.g. the largest summand found > > in the range. > > > > int findrama(int myid, long z1, long z2, struct winner *winnerp); > > // findrama finds numbers which are sums of cubes in two distince ways, > > // e.g. 1729 = 12^3 + 1^3 but also = 10^3 + 9^3 > > // "myid" is the hypothetical node number of the MPI instance that would > > invoke this > > // z1 and z2 are the range of numbers to search, e.g. 1729 might be > > found > > // between 1000 and 19999; > > // winner is the "best" result found, which in this example is the > > largest. > > > > > > int main(int argc, char **argv) > > { > > // I'm not doing much in this main. We want to use the wiki MPI example > > // main to call "findrama" with a range of numbers, perhaps based on > > myid; e.g. > > // node 1 does z = 1 to 1999, node 2 does 2000 to 2999, etc, based on > > the number > > // of nodes you have and the largest arithmetic you can do. > > > > int myid; > > > > int hits; > > > > struct winner mywinner; > > > > myid = 0; > > > > hits = findrama(myid, 1, 10000, &mywinner); > > > > printf("Findrama made %d hits\n", hits); > > > > if(hits < 0) > > { > > printf("Error, probably malloc error.\n"); > > return 0; > > } > > > > printf("Winner %d:\n", mywinner.win); > > printf("= %d, %d\n", mywinner.first.x, mywinner.first.y); > > printf("= %d, %d\n", mywinner.second.x, mywinner.second.y); > > > > return 0; > > } > > > > int findrama(int myid, long z1, long z2, struct winner *winnerp) > > { > > struct pair *p; > > long range; > > long i, j, k; > > int hits; > > long bigz, z; > > long xi, xj; > > long xsum; > > > > //struct pair newpair; > > struct winner mywinner; > > > > bigz = 0; > > > > hits = 0; > > > > range = (z2 - z1); > > p = malloc( range * sizeof(struct pair)); > > // > > // malloc should return a pointer to a range of allocated memory > > // large enough for "range" pairs, and pair takes up space for two long > > integers. > > // > > > > if(!p) > > { > > fprintf(stderr, "Error: unable to malloc at myid = %d\n", myid); > > return -1; > > } > > // in the zth offset we will store the pair x, y st x^3 + y^3 = z^3 > > // for z in the range from z1 <= z < z2 > > > > // zero out > > for(i=0; i < range; i++) > > { > > p[i].x = 0; > > p[i].y = 0; > > > > } > > > > //z2cube = z2 * z2 * z2; > > > > for(i = 0; i < z2; i++) > > { > > xi = i*i*i; > > > > if(xi > BIGGEST) > > { > > printf("DEBUG: i = %d is too big.\n", i); > > break; > > } > > > > // debug > > if(i%1000 == 0) > > { > > printf("Deubg: i = %d\n", i); > > } > > > > for(j = 0; j <= i; j++) > > { > > xj = j*j*j; > > xsum = xi + xj; > > > > if(xsum > z2) > > { > > // we've exceeded our range > > break; > > } > > > > for(k = 0; k< range; k++) > > { > > z = z1 + k; > > > > if(z == xsum) > > { > > if(p[z].x == 0 && p[z].y == 0) > > { > > // found this sum of cubes for the first time > > p[z].x = i; > > p[z].y = j; > > } > > else > > { > > // we have discovered a number that > > // can be written as a sum of two cubes > > // in two distince ways. > > hits++; > > > > printf("DEBUG: HIT:\n %05d = %d^3 + %d^3,\n %05d = %d^3 + %d^3\n", > > z, i, j, z, p[z].x, p[z].y > > ); > > mywinner.win = z; > > mywinner.first.x = i; > > mywinner.first.y = j; > > mywinner.second.x = p[z].x; > > mywinner.second.y = p[z].y; > > } > > } > > } > > } > > } > > winnerp->win = mywinner.win; > > winnerp->first.x = mywinner.first.x; > > winnerp->first.y = mywinner.first.y; > > winnerp->second.x = mywinner.second.x; > > winnerp->second.y = mywinner.second.y; > > > > > > > > return hits; > > } > > > > Peter > > > > > > > > > > On 8/16/07, Robert G. Brown <[EMAIL PROTECTED] > wrote: > > > > > > On Tue, 14 Aug 2007, Tim Simon wrote: > > > > > > > I would like to know if there is any open source software that will > > > > run on a beowulf, which will do something like find prime numbers, > > > or > > > > something simillar. I cant afford a commercial program. > > > > > > > > I am guessing that there is not "one size fits all" type thing - > > > what > > > > fits your beowulf wont fit mine. Is there anything that can be > > > easily > > > > made to fit? I have searched all over the net, and all i can find > > > are > > > > educational or high research type applications. I just would like > > > > something reasonably simple, (I thought prime numbers but hey, > > > > anything is good). > > > > > > Writing your own is good. Hunting for primes is actually something > > > that > > > will parallelize decently simply because the Sieve of Erasthones for a > > > > > > proposed number N only requires that you try all the primes found up > > > to > > > N/2. Therefore a collection of nodes can contain and share all the > > > primes found up to N-1, and distribute the testing of N, N+2, N+4, > > > N+6... N+M (obviously skipping all even numbers) as long as N+M < 2N. > > > > > > For a small number M of nodes, if you precompute and load all the > > > primes > > > less than M -- trivially done -- then you're in business. > > > > > > You will have a few longer term problems. A single processor can > > > sieve > > > all the unsigned int primes from 0-(2^32-1) out fairly quickly anyway. > > > To go beyond this, you'll need to use symbolic division instead of > > > binary computer arithmetic, I think. You'll also run into > > > storage/memory problems as you start to get a significant number of > > > primes in your running table. > > > > > > But it is a fun problem, for sure. Go for it. > > > > > > Beyond that, there are a number of programs people use to play with or > > > demonstrate a beowulf's speedup. The "funnest" is any of the > > > rubberbandable Mandelbrot set exploration tools parallelized on top of > > > MPI or PVM -- makes nifty graphics. It isn't quite as cool as it used > > > to be because Moore's Law has overwhelmed the computational difficulty > > > > > > of the task. When I started in this business, it might have taken > > > minutes to compute a rubberbanding down into the MS, depending on > > > where > > > one was (and how deep one had to go on all those pixels to terminate). > > > Parallelize that on sixty or so nodes and it drops DRAMATICALLY to > > > seconds. > > > > > > Now CPUs are so fast that a SINGLE CPU can rubberband down in a second > > > or two, and networks haven't kept pace so that computing the pixels in > > > a > > > single thread might actually be faster than splitting them. So > > > speedup > > > is a bit less dramatic, but still very cool. > > > > > > rgb > > > > > > > I am running Red Hat FC4. > > > > > > Hmmm, might at least TRY 6 or 7. > > > > > > > > > > > > > > > > > > Tim > > > > _______________________________________________ > > > > Beowulf mailing list, [email protected] > > > > To change your subscription (digest mode or unsubscribe) visit > > > http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > > > > > -- > > > Robert G. Brown http://www.phy.duke.edu/~rgb/ > > > Duke University Dept. of Physics, Box 90305 > > > Durham, N.C. 27708-0305 > > > Phone: 1-919-660-2567 Fax: 919-660-2525 email:[EMAIL PROTECTED] > > > > > > > > > _______________________________________________ > > > Beowulf mailing list, [email protected] > > > To change your subscription (digest mode or unsubscribe) visit > > > http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > > > > > > > _______________________________________________ > > Beowulf mailing list, [email protected] > > To change your subscription (digest mode or unsubscribe) visit > > http://www.beowulf.org/mailman/listinfo/beowulf > > > > >
_______________________________________________ Beowulf mailing list, [email protected] To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
