On Thu, 21 Apr 2022 07:12:07 -0700
[email protected] wrote:
> OP maybe need the streaming IO for reading files.
Which is what they were already doing - they used:
while (<HD>) {
    ...
}
Which, under the hood, uses readline to read a line at a time.
(where "HD" is their global filehandle - a lexical filehandle would
have been better, but it makes no difference here)
You can use B::Deparse to see that the above deparses to a use of
readline:
[davidp@columbia:~]$ cat tmp/readline
#!/usr/bin/env perl
while (<STDIN>) {
    print "Line: $_\n";
}
[davidp@columbia:~]$ perl -MO=Deparse tmp/readline
while (defined($_ = readline STDIN)) {
    print "Line: $_\n";
}
tmp/readline syntax OK
So, they're already reading line-wise; it seems they're just
running into memory usage issues from holding a hash of 80+ million
values, which is not super surprising on a reasonably low-memory box.
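To get a feel for the overhead, here's a minimal sketch (assuming the
Devel::Size module from CPAN is available - it's not in core; the key
names are made up) that measures what a counting hash actually costs,
since each key/value pair takes considerably more memory than the raw
string bytes would suggest:

#!/usr/bin/env perl
use strict;
use warnings;
use Devel::Size qw(total_size);   # CPAN module, not core

# Build a counting hash with a million made-up keys and measure
# how much memory the structure as a whole consumes.
my %count;
$count{"key$_"}++ for 1 .. 1_000_000;

printf "1M keys: ~%.1f MB\n", total_size(\%count) / (1024 * 1024);
# Scale that by ~80x for a hash of 80+ million values.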
Personally, I'd go one of two ways:
* just throw some more RAM at it - these days, RAM is far cheaper than
programmer time spent trying to squeeze the data into the fewest bytes,
especially if it's a quick "Get It Done" solution
* hand it off to a tool made for the job - import the data into SQLite
or some other DB engine and let it do what it's designed for, as it's
likely to be far more efficient than a hand-rolled Perl solution.
(They already proved that Apache Spark can handle it on the same
hardware.) A rough sketch of that follows below.
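Here's a minimal sketch of the SQLite route using DBI and DBD::SQLite.
Note the database filename, table and column names are invented for
illustration, not taken from the OP's data:

#!/usr/bin/env perl
use strict;
use warnings;
use DBI;

# Hypothetical database/table names, for illustration only.
my $dbh = DBI->connect("dbi:SQLite:dbname=counts.db", "", "",
    { RaiseError => 1, AutoCommit => 0 });

$dbh->do(
    "CREATE TABLE IF NOT EXISTS counts (key TEXT PRIMARY KEY, n INTEGER)"
);

# Upsert: insert a new key, or bump its count if it already exists.
# (ON CONFLICT needs SQLite >= 3.24, which current DBD::SQLite bundles.)
my $sth = $dbh->prepare(
    "INSERT INTO counts (key, n) VALUES (?, 1)
     ON CONFLICT(key) DO UPDATE SET n = n + 1"
);

while (my $line = <STDIN>) {
    chomp $line;
    $sth->execute($line);
}

# One big transaction is far faster than committing per row.
$dbh->commit;

That way SQLite does the aggregation on disk instead of Perl holding
80+ million hash entries in RAM.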
--
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
http://learn.perl.org/