Hello Offer,

Offer Kaye wrote, On 12/19/2010 10:25 AM:
> On Thu, Dec 16, 2010 at 7:32 AM, Assaf Gordon wrote:
> [...] But it's too interesting a subject to pass up :)

I'd be glad to discuss it all day long ;)

But it's more of a philosophical discussion, as my script is available here:
https://github.com/agordon/filo/blob/scripts/src/scripts/sort-header

>> sorting huge files (bigger than available RAM),
> 
> As long as you're not trying to slurp the file into memory inside your
> Perl program, why is this a problem for Perl?
> 
> use Tie::File;
> tie my @array, 'Tie::File', 'large_unsorted_file' or die "blah";
> print "$_\n" for sort @array;

I tried your script on a 2GB file: perl took 1m30s to load the entire file
into memory (using 4.3GB), then kept running for another 5m before I killed it.
sort (single-threaded) loads the same file in 23s (using 2.9GB), then starts
writing output to STDOUT in 1m.

Sorry, but perl loses on this one (BTW, this was all done in memory).

> There's probably already a solution for most sort related options,
> somewhere on CPAN. If there isn't, Perl's sort-subs give you the
> ability to, I think, cover pretty much any situation. 

My goal (and perhaps I forgot to mention it) was to have a drop-in 
replacement/wrapper for the command line GNU sort, 
supporting most of its options and arguments.
I don't want to sort from inside a perl program, I want to sort from shell 
scripts.
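
To make the idea concrete, here is a minimal sketch of such a wrapper. It is
not the sort-header script linked above: the function name sort_skip_header is
made up for illustration, and it assumes a regular file with exactly one
header line. The header is printed as-is, and everything after it goes to
sort with whatever options were given.

```shell
#!/bin/sh
# Hypothetical helper for illustration; NOT the sort-header script itself.
# Usage: sort_skip_header FILE [SORT_OPTIONS...]
# Assumes FILE is a regular file whose first line is the only header line.
sort_skip_header() {
    file=$1
    shift
    # Emit the header line unchanged.
    head -n 1 -- "$file"
    # Sort the remaining lines, forwarding all other arguments to sort.
    tail -n +2 -- "$file" | sort "$@"
}
```

From a shell script it would be called like sort itself, with the file name
first, e.g. `sort_skip_header data.csv -t, -k2,2nr`.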

Writing a Perl program that supports all of GNU sort's options (using whatever
modules are out there) sounds like reinventing the wheel.
Moreover, I want this script to be easy to use wherever GNU sort is
available, so requiring Perl modules is not in my best interest.

> Not Perl related, but the question "How do I run a multithreaded sort
> on a huge text file while skipping the first few header lines?" might
> get you some interesting answers over at Stack Overflow. 

I wrote my own answer, see the script above.
And as I've mentioned, sooner or later a patch will be accepted that will add 
this feature to sort, rendering my script obsolete.

> Who knows, maybe you'll end up with a Java based solution ;)

Surely you're joking, Mr. Kaye ;)

-Assaf
_______________________________________________
Perl mailing list
[email protected]
http://mail.perl.org.il/mailman/listinfo/perl
