On Thu, 21 Apr 2022 07:12:07 -0700
al...@coakmail.com wrote:

> OP maybe need the streaming IO for reading files.

Which is what they were already doing - they used:

    while (<HD>) {
        ...
    }

Which, under the hood, uses readline to read a line at a time.

(where "HD" is their global filehandle - a lexical filehandle would
have been better, but makes no difference here)
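
For the record, a lexical filehandle version would look something like
this (a minimal sketch; "input.txt" is just a placeholder filename):

    open my $fh, '<', 'input.txt'
        or die "Failed to open input.txt: $!";
    while (my $line = <$fh>) {
        ...  # process $line here
    }
    close $fh;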


You can use B::Deparse to see that the above deparses to a use of
readline:


  [davidp@columbia:~]$ cat tmp/readline
  #!/usr/bin/env perl
  
  while (<STDIN>) {
      print "Line: $_\n";
  }
  
  [davidp@columbia:~]$ perl -MO=Deparse tmp/readline
  while (defined($_ = readline STDIN)) {
      print "Line: $_\n";
  }
  tmp/readline syntax OK


So they're already reading line-wise; it seems they're just running
into memory usage issues from holding a hash of 80+ million values,
which is not super surprising on a reasonably low-memory box.

Personally, I'd go one of two ways:

* just throw more RAM at it - these days RAM is far cheaper than the
  programmer time spent squeezing the data into the fewest possible
  bytes, especially if it's a quick "Get It Done" solution

* hand it off to a tool made for the job - import the data into SQLite
  or some other DB engine and let it do what it's designed for, as it's
  likely to be far more efficient than a hand-rolled Perl solution; see
  the sketch below. (They already proved that Apache Spark can handle
  it on the same hardware.)
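
If they go the SQLite route, a minimal import sketch might look like
the following (assuming DBD::SQLite is installed; the database name,
table name, and one-value-per-line input format are illustrative
assumptions, not the OP's actual data):

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use DBI;

    # Connect (creates counts.db if it doesn't already exist)
    my $dbh = DBI->connect('dbi:SQLite:dbname=counts.db', '', '',
        { RaiseError => 1, AutoCommit => 0 });

    $dbh->do('CREATE TABLE IF NOT EXISTS items (value TEXT)');

    my $sth = $dbh->prepare('INSERT INTO items (value) VALUES (?)');
    while (my $line = <STDIN>) {
        chomp $line;
        $sth->execute($line);
    }

    # Commit once at the end - a single big transaction is vastly
    # faster than SQLite's default of one transaction per INSERT
    $dbh->commit;

    # Then let the DB engine do the heavy lifting, e.g.
    #   SELECT value, COUNT(*) AS n FROM items GROUP BY value;
    $dbh->disconnect;

The single transaction is the important bit: in autocommit mode SQLite
wraps every INSERT in its own transaction, which would be painfully
slow for 80+ million rows.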


-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/

