Even more investigation reveals that I was jumping to conclusions. It turns out that on a Debian 7 Wheezy machine with Perl 5.14, a simple script to dump the MARC for all of our 2.7 million non-deleted bib records uses almost 12GB of RAM. It looks like this was always an issue, and I was just running the scripts on hardware with more RAM.
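For anyone who wants to experiment, a server-side cursor should keep DBD::Pg from holding the whole result set in memory at once. This is only a rough sketch, not tested against production: the connection parameters and the biblio.record_entry query assume a stock Evergreen schema.

```perl
#!/usr/bin/perl
# Sketch: stream bib records through a PostgreSQL cursor so DBD::Pg
# only ever holds one batch of rows in memory at a time. Connection
# parameters and the biblio.record_entry query are assumptions based
# on a stock Evergreen database.
use strict;
use warnings;
use DBI;

# Cursors live inside a transaction, so AutoCommit must be off.
my $dbh = DBI->connect('dbi:Pg:dbname=evergreen', 'evergreen', '',
                       { AutoCommit => 0, RaiseError => 1 });

$dbh->do(q{
    DECLARE bib_csr CURSOR FOR
    SELECT id, marc FROM biblio.record_entry WHERE NOT deleted
});

while (1) {
    # FETCH pulls only 1000 rows per round trip.
    my $rows = $dbh->selectall_arrayref('FETCH 1000 FROM bib_csr');
    last unless @$rows;
    for my $row (@$rows) {
        my ($id, $marc) = @$row;
        # ... convert $marc with MARC::Record and write it out ...
    }
}

$dbh->do('CLOSE bib_csr');
$dbh->commit;
$dbh->disconnect;
```

If the memory growth disappears with this approach, that would point squarely at the driver buffering rather than at MARC::Record or the output layer.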
I should still build a VM with a more recent Debian or Ubuntu release with enough RAM for comparison. Sorry for all of the noise.

On 04/05/2017 10:30 AM, Jason Stephenson wrote:
> More investigation points the finger at Perl DBI and/or DBD::Pg. They
> are now apparently caching all of the results before returning anything.
> My reading of the documentation seems to imply that this was always the
> case; however, I was able to dump 2.7 million records with Perl 5.14 on a
> server with 8GB of RAM without running out of memory. With Perl 5.18+
> this appears to no longer be possible, YMMV.
>
> It looks like a comprehensive fix could be found in teaching DBD::Pg to
> use row caching:
>
> https://rt.cpan.org/Public/Bug/Display.html?id=93266
>
> On 03/10/2017 11:11 AM, Jason Stephenson wrote:
>> Hi, all.
>>
>> NOTE: This is not https://bugs.launchpad.net/evergreen/+bug/1671845. It
>> may be related or have a similar cause, but the experience/symptoms are
>> completely different.
>>
>> At this point, consider this a heads-up, as well as a problem
>> description that I don't yet think I have enough information to file as
>> a bug report. It is also a request for anyone who wants to double-check
>> this report and to help with debugging.
>>
>> I've noticed some bizarre behavior with DBI, MARC::Record, and writing
>> to a file with Perl versions 5.20 and 5.22. (These versions of Perl ship
>> with Debian 8 Jessie and Ubuntu 16.04 Xenial Xerus, respectively.)
>>
>> I have a script (https://github.com/Dyrcona/boopsie) that I use to make
>> a weekly extract of records to send to Boopsie, Inc. on behalf of our
>> member libraries that use their app.
>>
>> What I have seen is that when run with the aforementioned versions of
>> Perl, the program consumes all of the RAM on the server and gets killed
>> by the OOM killer. No output ever reaches the file. This suggests to me
>> that the problem occurs in the main loop, while extracting and converting
>> the MARCXML from the database, though it could be the Perl output buffer
>> run amok.
>>
>> The main loop of my program is similar to, though less complicated than,
>> that of marc_export. I tried marc_export to see if it would have the
>> same problem. When extracting my whole database, it does:
>>
>> marc_export -a -e UTF-8 > all.mrc
>>
>> It also crashes if fed the output of an equivalent psql query to extract
>> all of the record ids, or if a file of all record ids is piped into
>> marc_export. It makes no difference if the output format is USMARC or
>> MARCXML.
>>
>> I can split this up into batches of 50,000 or so records (quite possibly
>> more) and all is well. I figured this out by dumping records for a
>> branch with around 51,000 items, and that worked. My whole database has
>> just over 2.7 million non-deleted bib records.
>>
>> This worked with Perl 5.14 on Debian 7 Wheezy and on Ubuntu 14.04
>> Trusty Tahr.
>>
>> I hope to run marc_export with the Perl debugger to figure out the exact
>> cause. Until this is fixed, I'm using a workaround in my scripts of
>> dumping MARCXML batches, converting them to USMARC, and putting them
>> into one file with yaz-marcdump. This seems to work in light of the Lp
>> bug mentioned in the NOTE.
>>
>> Any and all information, contradictory or otherwise, from those using
>> Debian 8 or Ubuntu 16.04 is most welcome.
>>
>> Jason
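P.S. For anyone who wants to script the batching workaround described above, here is a rough sketch of the shape mine takes. Treat the file names, the psql invocation, and the exact marc_export flags as assumptions; adjust them for your site before using this for anything real.

```perl
#!/usr/bin/perl
# Sketch of the batching workaround: feed record ids to marc_export in
# 50,000-record chunks as MARCXML, then stitch the chunks into a single
# USMARC file with yaz-marcdump. File names, the psql invocation, and
# the marc_export flags are assumptions; adapt them to your setup.
use strict;
use warnings;

# Grab all non-deleted bib ids (query assumes an Evergreen schema).
my @ids = `psql -A -t -c "SELECT id FROM biblio.record_entry WHERE NOT deleted"`;
chomp @ids;

my $batch_size = 50_000;
my $n = 0;
while (my @chunk = splice(@ids, 0, $batch_size)) {
    my $xml = sprintf('batch_%03d.xml', $n++);
    # Pipe this chunk's ids into marc_export, writing one MARCXML batch.
    open my $pipe, '|-', "marc_export -e UTF-8 -f XML > $xml"
        or die "Cannot run marc_export: $!";
    print $pipe "$_\n" for @chunk;
    close $pipe;
    # Convert the MARCXML batch to USMARC and append it to the final file.
    system("yaz-marcdump -i marcxml -o marc $xml >> all.mrc") == 0
        or die "yaz-marcdump failed on $xml";
}
```

Since each marc_export run only ever sees one batch, the runaway memory growth never gets a chance to build up.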