Even more investigation reveals that I was jumping to conclusions.

Turns out that on a Debian 7 Wheezy machine with Perl 5.14, a simple
script to dump the MARC for all of our 2.7 million non-deleted bib
records uses almost 12GB of RAM. It looks like this was always an
issue, and I was just running the scripts on hardware with more RAM.
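
For the record, the script in question boils down to something like
this (a from-memory sketch, not the exact code: it assumes Evergreen's
biblio.record_entry table and omits error handling):

#!/usr/bin/perl
# Minimal sketch: stream MARCXML out of the database and write
# binary MARC to stdout. Untested as written.
use strict;
use warnings;
use DBI;
use MARC::Record;
use MARC::File::XML (BinaryEncoding => 'UTF-8');

my $dbh = DBI->connect('dbi:Pg:dbname=evergreen', 'evergreen', '');
my $sth = $dbh->prepare(
    'SELECT marc FROM biblio.record_entry WHERE NOT deleted'
);
$sth->execute();

binmode(STDOUT, ':raw');
while (my ($xml) = $sth->fetchrow_array()) {
    my $record = MARC::Record->new_from_xml($xml, 'UTF-8');
    print $record->as_usmarc();
}
$dbh->disconnect();

Even though only one record at a time is ever held in a Perl variable,
the whole result set still ends up in memory.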

I should still build a VM with a more recent Debian or Ubuntu release
and enough RAM for comparison.

Sorry for all of the noise.

On 04/05/2017 10:30 AM, Jason Stephenson wrote:
> More investigation points the finger at Perl DBI and/or DBD::Pg. They
> are now apparently caching all of the results before returning anything.
> My reading of the documentation seems to imply that this was always the
> case; however, I was able to dump 2.7 million records with Perl 5.14 on a
> server with 8GB of RAM without running out of memory. With Perl 5.18+
> this no longer appears to be possible, YMMV.
> 
> It looks like a comprehensive fix could be found in teaching DBD::Pg to
> use row caching:
> 
> https://rt.cpan.org/Public/Bug/Display.html?id=93266
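> 
> In the meantime, a server-side cursor keeps the driver from
> materializing the whole result set on the client. Roughly (an
> untested sketch; DECLARE/FETCH is plain PostgreSQL SQL rather than a
> DBD::Pg feature, and the batch size is arbitrary):
> 
> $dbh->do('DECLARE bib_csr CURSOR WITH HOLD FOR ' .
>          'SELECT marc FROM biblio.record_entry WHERE NOT deleted');
> my $fetch = $dbh->prepare('FETCH 10000 FROM bib_csr');
> while (1) {
>     $fetch->execute();
>     last if $fetch->rows == 0;  # cursor exhausted
>     while (my ($xml) = $fetch->fetchrow_array()) {
>         # process one record's MARCXML here ...
>     }
> }
> $dbh->do('CLOSE bib_csr');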
> 
> 
> 
> On 03/10/2017 11:11 AM, Jason Stephenson wrote:
>> Hi, all.
>>
>> NOTE: This is not https://bugs.launchpad.net/evergreen/+bug/1671845. It
>> may be related or have a similar cause, but the experience/symptoms are
>> completely different.
>>
>> At this point, consider this a heads-up, as well as a problem
>> description that I don't yet think I have enough information to file as
>> a bug report. It is also a request for anyone who wants to double-check
>> this report and to help with debugging.
>>
>> I've noticed some bizarre behavior with DBI, MARC::Record, and writing
>> to a file with Perl versions 5.20 and 5.22. (These versions of Perl ship
>> with Debian 8 Jessie and Ubuntu 16.04 Xenial Xerus, respectively.)
>>
>> I have a script (https://github.com/Dyrcona/boopsie) that I use to make
>> a weekly extract of records to send to Boopsie, Inc. on behalf of our
>> member libraries that use their app.
>>
>> What I have seen is that, when run with the aforementioned versions of
>> Perl, the program consumes all of the RAM on the server and gets killed
>> by the OOM killer. No output ever reaches the file. This suggests to me
>> that the problem occurs in the main loop while extracting records and
>> converting the MARCXML from the database, though it could be the Perl
>> output buffer running amok.
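>>
>> (One cheap test for the output buffer theory: turn on autoflush for
>> the output handle and watch memory. A sketch, with a placeholder file
>> name:
>>
>> use IO::Handle;
>> open(my $out, '>', 'extract.mrc') or die "extract.mrc: $!";
>> $out->autoflush(1);  # flush after every print
>>
>> If memory still climbs with autoflush on, the buffer isn't the
>> culprit.)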
>>
>> The main loop of my program is similar to, though less complicated
>> than, that of marc_export. I tried marc_export to see if it would have
>> the same problem. When extracting my whole database, it does:
>>
>> marc_export -a -e UTF-8 > all.mrc
>>
>> It also crashes if fed the output of an equivalent psql query to extract
>> all of the record ids, or if a file of all record ids is piped into
>> marc_export. It makes no difference if the output format is USMARC or
>> MARCXML.
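>>
>> The psql variant looks something like this (an approximation, not a
>> transcript; marc_export reads record ids from standard input):
>>
>> psql -U evergreen -t -A \
>>   -c 'SELECT id FROM biblio.record_entry WHERE NOT deleted' \
>>   | marc_export -e UTF-8 > all.mrc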
>>
>> I can split this up into batches of 50,000 or so records (quite possibly
>> more) and all is well. I figured this out by dumping records for a
>> branch with around 51,000 items, and that worked. My whole database has
>> just over 2.7 million non-deleted bib records.
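>>
>> The batching can be as simple as splitting the id list and looping
>> over the pieces (a sketch, not my exact commands):
>>
>> psql -U evergreen -t -A \
>>   -c 'SELECT id FROM biblio.record_entry WHERE NOT deleted' \
>>   | split -l 50000 - ids.
>> for f in ids.*; do
>>     marc_export -e UTF-8 < "$f" >> all.mrc
>> done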
>>
>> This worked on Perl version 5.14 on Debian 7 Wheezy and on Ubuntu 14.04
>> Trusty Tahr.
>>
>> I hope to run marc_export with the Perl debugger to figure out the exact
>> cause. Until this is fixed, I'm using a workaround in my scripts:
>> dumping MARCXML in batches, converting them to USMARC, and concatenating
>> them into one file with yaz-marcdump. This seems to work in light of the
>> Lp bug mentioned in the NOTE.
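>>
>> The conversion step is one yaz-marcdump call per batch, appended to a
>> single output file, along these lines (file names are placeholders):
>>
>> yaz-marcdump -i marcxml -o marc -f UTF-8 -t UTF-8 batch0001.xml >> all.mrc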
>>
>> Any and all information, contradictory or otherwise, from those using
>> Debian 8 or Ubuntu 16.04 is most welcome.
>>
>> Jason
>>
