We could use a cursor to keep the data on the server side and fetch it
a record at a time.  Here's an attempt:
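
The attempt itself is not reproduced in this message. As a rough sketch of the general idea only, and not the code referred to above, a server-side cursor could be driven from Perl DBI along these lines. The helper name stream_rows, the cursor name bib_cursor, and the default batch size of 1000 are all made up for illustration. It assumes a DBD::Pg handle connected with AutoCommit and RaiseError both on. A batch size of 1 gives the record-at-a-time behaviour described above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from marc_export): stream the rows of $query
# through a server-side cursor so that at most $batch rows are held in
# client memory at any one time.
#   $dbh      - connected DBI handle (assumed: DBD::Pg, AutoCommit and
#               RaiseError both on)
#   $query    - trusted SELECT statement; it is interpolated as-is
#   $callback - code ref called once per row with an array reference
#   $batch    - rows per FETCH (assumed default: 1000)
# Returns the number of rows handed to the callback.
sub stream_rows {
    my ($dbh, $query, $callback, $batch) = @_;
    $batch = int($batch || 1000);
    my $count = 0;

    # A cursor declared without WITH HOLD only exists inside a transaction.
    $dbh->begin_work;
    $dbh->do("DECLARE bib_cursor NO SCROLL CURSOR FOR $query");

    while (1) {
        # Only this batch is transferred to the client.
        my $rows = $dbh->selectall_arrayref("FETCH $batch FROM bib_cursor");
        last unless @$rows;
        for my $row (@$rows) {
            $callback->($row);
            $count++;
        }
    }

    $dbh->do('CLOSE bib_cursor');
    $dbh->commit;
    return $count;
}

# Possible use (connection details and query are placeholders):
#   my $dbh = DBI->connect('dbi:Pg:dbname=evergreen', $user, $pass,
#                          { AutoCommit => 1, RaiseError => 1 });
#   stream_rows($dbh, 'SELECT id, marc FROM biblio.record_entry WHERE NOT deleted',
#               sub { my ($row) = @_; print $row->[1], "\n" }, 1);
```

Fetching in modest batches rather than strictly one row per round trip is a trade-off between memory use and network chatter. Either way the full result set stays on the database server.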



Mike Rylander
 | President
 | Equinox Open Library Initiative
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  mi...@equinoxinitiative.org
 | web:  http://equinoxinitiative.org

On Wed, Apr 19, 2017 at 10:01 AM, Jason Stephenson <ja...@sigio.com> wrote:
> Even more investigation reveals that I was jumping to conclusions.
> Turns out that on a Debian 7 Wheezy machine with Perl 5.14, a simple
> script to dump the MARC for all of our 2.7 million, non-deleted bib
> records uses almost 12GB of RAM. Looks like this was always an issue,
> and I was just running the scripts on hardware with more RAM.
> I should still build a VM with a more recent Debian or Ubuntu release
> with enough RAM for comparison.
> Sorry for all of the noise.
> On 04/05/2017 10:30 AM, Jason Stephenson wrote:
>> More investigation points the finger at Perl DBI and/or DBD::Pg. They
>> are now apparently caching all of the results before returning anything.
>> My reading of the documentation seems to imply that this was always the
>> case; however, I was able to dump 2.7 million records with Perl 5.14 on a
>> server with 8GB of RAM without running out of memory. With Perl 5.18+
>> this appears to no longer be possible, YMMV.
>> It looks like a comprehensive fix could be found in teaching DBD::Pg to
>> use row caching:
>> https://rt.cpan.org/Public/Bug/Display.html?id=93266
>> On 03/10/2017 11:11 AM, Jason Stephenson wrote:
>>> Hi, all.
>>> NOTE: This is not https://bugs.launchpad.net/evergreen/+bug/1671845. It
>>> may be related or have a similar cause, but the experience/symptoms are
>>> completely different.
>>> At this point, consider this a heads-up, as well as a problem
>>> description that I don't yet think I have enough information to file as
>>> a bug report. It is also a request for anyone who wants to double check
>>> this report and to help with debugging.
>>> I've noticed some bizarre behavior with DBI, MARC::Record, and writing
>>> to a file with Perl version 5.20 and 5.22. (These versions of Perl ship
>>> with Debian 8 Jessie and Ubuntu 16.04 Xenial Xerus, respectively.)
>>> I have a script (https://github.com/Dyrcona/boopsie) that I use to make
>>> a weekly extract of records to send to Boopsie, Inc. on behalf of our
>>> member libraries that use their app.
>>> What I have seen is that when run with the aforementioned versions of
>>> Perl, the program consumes all of the RAM on the server and gets killed
>>> by the OOM killer. No output ever reaches the file. This suggests to me that
>>> the problem occurs in the main loop with extracting and converting the
>>> MARCXML from the database, though it could be the Perl output buffer run
>>> amok.
>>> The main loop of my program is similar to, though less complicated than,
>>> that of marc_export. I tried marc_export to see if it would have the
>>> same problem. When extracting my whole database, it does:
>>> marc_export -a -e UTF-8 > all.mrc
>>> It also crashes if fed the output of an equivalent psql query to extract
>>> all of the record ids, or if a file of all record ids is piped into
>>> marc_export. It makes no difference if the output format is USMARC or
>>> MARCXML.
>>> I can split this up into batches of 50,000 or so records (quite possibly
>>> more) and all is well. I figured this out by dumping records for a
>>> branch with around 51,000 items and that worked. My whole database has
>>> just over 2.7 million, non-deleted bib records.
>>> This worked on Perl version 5.14 on Debian 7 Wheezy and on Ubuntu 14.04
>>> Trusty Tahr.
>>> I hope to run marc_export with the Perl debugger to figure out the exact
>>> cause. Until this is fixed, I'm using a workaround in my scripts of
>>> dumping MARCXML batches and converting them to USMARC and putting them
>>> into one file with yaz-marcdump. This seems to work in light of the LP bug
>>> mentioned in the NOTE.
>>> Any and all information, contradictory or otherwise, from those using
>>> Debian 8 or Ubuntu 16.04 is most welcome.
>>> Jason
