We could use a cursor to keep the data on the server side and fetch it a record at a time. Here's an attempt:
http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/miker/marc_export_by_cursor

Thoughts?

--
Mike Rylander
 | President
 | Equinox Open Library Initiative
 | phone: 1-877-OPEN-ILS (673-6457)
 | email: mi...@equinoxinitiative.org
 | web: http://equinoxinitiative.org

On Wed, Apr 19, 2017 at 10:01 AM, Jason Stephenson <ja...@sigio.com> wrote:
> Even more investigation reveals that I was jumping to conclusions.
>
> Turns out that on a Debian 7 Wheezy machine with Perl 5.14, a simple
> script to dump the MARC for all of our 2.7 million, non-deleted bib
> records uses almost 12GB of RAM. Looks like this was always an issue,
> and I was just running the scripts on hardware with more RAM.
>
> I should still build a VM with a more recent Debian or Ubuntu release
> with enough RAM for comparison.
>
> Sorry for all of the noise.
>
> On 04/05/2017 10:30 AM, Jason Stephenson wrote:
>> More investigation points the finger at Perl DBI and/or DBD::Pg. They
>> are now apparently caching all of the results before returning anything.
>> My reading of the documentation seems to imply that this was always the
>> case; however, I was able to dump 2.7 million records with Perl 5.14 on a
>> server with 8GB of RAM without running out of memory. With Perl 5.18+
>> this appears to no longer be possible, YMMV.
>>
>> It looks like a comprehensive fix could be found in teaching DBD::Pg to
>> use row caching:
>>
>> https://rt.cpan.org/Public/Bug/Display.html?id=93266
>>
>> On 03/10/2017 11:11 AM, Jason Stephenson wrote:
>>> Hi, all.
>>>
>>> NOTE: This is not https://bugs.launchpad.net/evergreen/+bug/1671845. It
>>> may be related or have a similar cause, but the experience/symptoms are
>>> completely different.
>>>
>>> At this point, consider this a heads-up, as well as a problem
>>> description that I don't yet think I have enough information to file as
>>> a bug report.
>>> It is also a request for anyone who wants to double check
>>> this report and to help with debugging.
>>>
>>> I've noticed some bizarre behavior with DBI, MARC::Record, and writing
>>> to a file with Perl versions 5.20 and 5.22. (These versions of Perl ship
>>> with Debian 8 Jessie and Ubuntu 16.04 Xenial Xerus, respectively.)
>>>
>>> I have a script (https://github.com/Dyrcona/boopsie) that I use to make
>>> a weekly extract of records to send to Boopsie, Inc. on behalf of our
>>> member libraries that use their app.
>>>
>>> What I have seen is that when run with the aforementioned versions of
>>> Perl, the program consumes all of the RAM on the server and gets killed
>>> by OOM killer. No output ever reaches the file. This suggests to me that
>>> the problem occurs in the main loop with extracting and converting the
>>> MARCXML from the database, though it could be the Perl output buffer run
>>> amok.
>>>
>>> The main loop of my program is similar to, though less complicated than,
>>> that of marc_export. I tried marc_export to see if it would have the
>>> same problem. When extracting my whole database, it does:
>>>
>>> marc_export -a -e UTF-8 > all.mrc
>>>
>>> It also crashes if fed the output of an equivalent psql query to extract
>>> all of the record ids, or if a file of all record ids is piped into
>>> marc_export. It makes no difference if the output format is USMARC or
>>> MARCXML.
>>>
>>> I can split this up into batches of 50,000 or so records (quite possibly
>>> more) and all is well. I figured this out by dumping records for a
>>> branch with around 51,000 items and that worked. My whole database has
>>> just over 2.7 million, non-deleted bib records.
>>>
>>> This worked on Perl version 5.14 on Debian 7 Wheezy and on Ubuntu 14.04
>>> Trusty Tahr.
>>>
>>> I hope to run marc_export with the Perl debugger to figure out the exact
>>> cause.
>>> Until this is fixed, I'm using a workaround in my scripts of
>>> dumping MARCXML batches and converting them to USMARC and putting them
>>> into 1 file with yaz-marcdump. This seems to work in light of the
>>> Launchpad bug mentioned in the NOTE.
>>>
>>> Any and all information, contradictory or otherwise, from those using
>>> Debian 8 or Ubuntu 16.04 is most welcome.
>>>
>>> Jason
>>>
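
For anyone following along, here is a rough sketch of the server-side cursor idea from the top of this message. It is not the code in the branch linked above (that is Perl, and may well do it differently); it is a Python illustration that assumes a DB-API connection to PostgreSQL, for example from psycopg2, with a transaction already open. The names `stream_rows` and `marc_export_cur` and the batch size are made up for the example. The point is only that DECLARE leaves the result set on the server and each FETCH brings over a bounded number of rows, so client memory stays roughly proportional to the batch size rather than to the 2.7 million records.

```python
from typing import Any, Iterator, Sequence


def stream_rows(conn: Any, query: str, batch_size: int = 1000,
                cursor_name: str = "marc_export_cur") -> Iterator[Sequence[Any]]:
    """Yield the rows of `query` one at a time through a server-side cursor.

    Assumptions (not taken from the thread): `conn` is a DB-API connection
    to PostgreSQL inside an open transaction, since a cursor declared
    without WITH HOLD only lives as long as its transaction. `query` and
    `cursor_name` are trusted strings, not user input.
    """
    cur = conn.cursor()
    try:
        # The result set is kept on the server; no rows are transferred yet.
        cur.execute(f"DECLARE {cursor_name} NO SCROLL CURSOR FOR {query}")
        while True:
            # Bring at most batch_size rows into client memory.
            cur.execute(f"FETCH FORWARD {batch_size} FROM {cursor_name}")
            rows = cur.fetchall()
            if not rows:
                break  # cursor exhausted
            yield from rows
        cur.execute(f"CLOSE {cursor_name}")
    finally:
        # Runs even if the caller stops iterating early; in that case the
        # server-side cursor goes away when the transaction ends.
        cur.close()
```

A caller would loop over something like `stream_rows(conn, "SELECT id, marc FROM biblio.record_entry WHERE NOT deleted")` and convert and write each record as it arrives; that query is only illustrative. A smaller batch size means more round trips, a larger one means more memory per fetch, so the value is something to tune rather than a recommendation.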