[Boston.pm] combinations
I am fairly new to Perl and haven't approached a script this complex or a computation this intensive, so I would certainly appreciate any advice.

I have successfully created a hash of arrays equivalent to a 122 x 6152 matrix that I want to run in 'pairwise combinations' and execute the 'sum of the difference squares' for each combination. In other words:

  rows: y1...y122
  columns: x1...x6152

so...

  comb(y1,y2): {( y1[x1] - y2[x1] ) ^2 + ( y1[x2] - y2[x2] ) ^2 + ... + ( y1[x122] - y2[x122] ) ^2};
  comb(y1,y3): {( y1[x1] - y3[x1] ) ^2 + ( y1[x2] - y3[x2] ) ^2 + ... + ( y1[x122] - y3[x122] ) ^2};
  .
  .
  .
  comb(y1,y6152)
  comb(y2,y3)
  .
  .
  comb(y2,y6152)
  comb(y3,y4)
  .
  .
  etc.

This is going to be very large. According to the combinations formula (nCk, n=6152, k=2), the output will be a hash (with, for example, 'y1y2' key and 'd^2' value) of about 19 million records.

I think my next step is to create a combinations formula, but I'm having problems doing so.

Thank you in advance,
David

___
Boston-pm mailing list
[EMAIL PROTECTED]
http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] damian talk to boston.pm in sept.
Who is this Smelto guy anyway?

On Tue, 29 Jul 2003, Uri Guttman wrote:

> i will get the url of his free talks and we will do the usual round of
> voting for your favorites.

*Please* can we *finally* have the Perligata talk? :)

> so vote early and often for your favorite. if a new set of talks is
> listed, use that instead.

Early often?

  % cat ~/bin/perligata
  #!/bin/sh
  echo yet another vote for perligata | \
    mail -s 'damian talk vote' [EMAIL PROTECTED]
  % crontab -l | grep perligata
  00,15,30,45 * * * * /Users/cdevers/bin/perligata

...on second thought... Smelto should run this.

*ahem*

--
Chris Devers [EMAIL PROTECTED]
http://devers.homeip.net:8080/

POM, n. \pronounced P-O-M or pom (esp. Australian)\ [Acronym for Phase Of the Moon.] Chiefly, as POM-dependent, flaky, unreliable. See also PHASE.
  -- from _The Computer Contradictionary_, Stan Kelly-Bootle, 1995
Re: [Boston.pm] combinations
Date: Mon, 4 Aug 2003 13:53:52 -0700 (PDT)
From: David Byrne [EMAIL PROTECTED]

> I am fairly new to Perl and haven't approached a scipt this complex or
> computation this intensive. So I would certainly appreciate any advice.
> I have successfully created a hash of arrays equivalent to a 122 x 6152
> matrix that I want to run in 'pairwise combinations' and execute the
> 'sum of the difference squares' for each combination. In other words:
> rows: y1...y122 columns: x1...x6152

This is a single large matrix? Sparse or dense?

If sparse, a hash of hashes is probably the memory-efficient way to store it:

  $matrix{y32}{x53} = value for row 32, column 53;

If dense, you could use an array of arrays:

  $matrix[32][53] = value for row 32, column 53;

Or you could investigate PDL (Piddle, Perl Data Language).

> so... comb(y1,y2): {( y1[x1] - y2[x1] ) ^2 + ( y1[x2] - y2[x2] ) ^2 +
> ... + ( y1[x122] - y2[x122] ) ^2};

You've reversed x and y compared to above.

  # array of arrays version
  for my $i (1..6152) {
      for my $j ($i+1 .. 6152) {
          $comb[$i][$j] = 0;
          $comb[$i][$j] += ($matrix[$i][$_] - $matrix[$j][$_]) ** 2
              for (1..122);
      }
  }

> This is going to be very large. According to the combinations formula
> (nCk, n=6152, k=2), the output will be a hash (with, for example,
> 'y1y2' key and 'd^2' value) of about 19 million records.

Yes. PDL is more memory efficient. Or just run it on a machine that has lots of RAM+swap. Or use various techniques to move most of the storage out of memory into files or a database. (Simplest example: instead of creating a $comb AoA above, just create a $comb scalar each round, then write it out:

  print "comb of rows $i and $j is $comb\n";

)

--kag
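To make the "scalar each round, then write it out" suggestion concrete, here is a minimal sketch, not the poster's actual script. It assumes a dense, 0-indexed array of arrays with one row per item to be compared; the helper names sum_sq_diff and each_pair_distance and the tiny 3 x 3 matrix are made up for illustration. Each unordered pair is visited once, so 6152 items give 6152 * 6151 / 2 = 18,920,476 pairs, which matches the "about 19 million" estimate.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sum of squared differences between two equal-length rows (array refs).
sub sum_sq_diff {
    my ($row_a, $row_b) = @_;
    my $sum = 0;
    $sum += ($row_a->[$_] - $row_b->[$_]) ** 2 for 0 .. $#{$row_a};
    return $sum;
}

# Visit each unordered pair of rows exactly once and pass the result to a
# callback, so only the current value is held in memory.
sub each_pair_distance {
    my ($matrix, $callback) = @_;
    for my $i (0 .. $#{$matrix} - 1) {
        for my $j ($i + 1 .. $#{$matrix}) {
            $callback->($i, $j, sum_sq_diff($matrix->[$i], $matrix->[$j]));
        }
    }
    return;
}

# Tiny made-up matrix standing in for the real data (assumption).
my @matrix = ([1, 2, 3], [4, 6, 3], [1, 2, 5]);

# Stream each result to STDOUT instead of storing it.
each_pair_distance(\@matrix, sub {
    my ($i, $j, $d2) = @_;
    print "rows $i and $j: $d2\n";
});
```

Redirecting the output to a file (or replacing the print with a database insert) keeps memory use at roughly the size of the input matrix, whatever the number of pairs.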
[Boston.pm] Postal address De-duping
Hey, all. We do lots of (snail) mailings, and we're looking for a fast, customizable de-duping solution. We're currently taking a look at doubletake from http://peoplesmith.com/, which is not too expensive, but I was thinking there might be some perl stuff out there, given perl's text-processing powers.
Re: [Boston.pm] Postal address De-duping
On Monday, August 4, 2003, at 05:12 PM, Joel Gwynn wrote:

> Hey, all. We do lots of (snail) mailings, and we're looking for a fast,
> customizable de-duping solution. We're currently taking a look at
> doubletake from http://peoplesmith.com/, which is not too expensive,
> but I was thinking there might be some perl stuff out there, given
> perl's text-processing powers.

There's a wee script I wrote for TPJ a while back that scrapes the U.S. Postal Service's address canonicalizer. The script is on tpj.com; look under Archives for the article called "Five Quick Hacks".

The canonicalizer (well, they call it a zip code locator or something like that) will transform variants on the same address into the One True Address that the USPS recognizes, so de-duping then becomes a matter of simple string matching. Won't help you for foreign addresses, obviously.

-Jon
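The "canonicalize, then simple string matching" idea can be sketched roughly as below. This is not the TPJ script and does not contact the USPS: canonicalize here is a hypothetical local stand-in that only uppercases, strips some punctuation, collapses whitespace, and shortens a few street-type words, and the sample addresses are made up. A real canonicalizer would take its place; the hash-based de-duping step stays the same.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a real address canonicalizer (assumption):
# a handful of abbreviations, nowhere near the full USPS rules.
my %ABBREV = (STREET => 'ST', AVENUE => 'AVE', ROAD => 'RD', APARTMENT => 'APT');

sub canonicalize {
    my ($address) = @_;
    my $canon = uc $address;
    $canon =~ s/[.,#]//g;      # drop common punctuation
    $canon =~ s/\s+/ /g;       # collapse runs of whitespace
    $canon =~ s/^ | $//g;      # trim leading and trailing space
    $canon = join ' ', map { $ABBREV{$_} // $_ } split / /, $canon;
    return $canon;
}

# Keep the first address seen for each canonical form.
sub dedupe {
    my @addresses = @_;
    my %seen;
    return grep { !$seen{ canonicalize($_) }++ } @addresses;
}

# Made-up sample data: the first two are variants of the same address.
my @unique = dedupe(
    '12 Main Street, Boston MA',
    '12 main st.  Boston, MA',
    '40 Elm Road, Salem MA',
);
print "$_\n" for @unique;
```

Because the de-duping is a single hash lookup per address, the cost is dominated by whatever the canonicalizer does, which matters if it is a network scrape rather than a local function.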