https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=37020

--- Comment #19 from Leo Stoyanov <leo.stoya...@bywatersolutions.com> ---
Created attachment 176921
  -->
https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=176921&action=edit
Bug 37020: [alternate] Changed script to use XML::Twig to reduce memory usage.

In the original script, MARC::Batch->new('XML', ...) held on to large chunks
of parsed data for MARCXML files, so memory usage stayed excessive even after
the script tried to free its references. One hacky approach was to manually
clear the batch’s internal structures, but Data::Dumper showed that
MARC::Batch wasn’t storing the data in those fields, so the hack did nothing
about the underlying caching. By contrast, XML::Twig can stream XML elements
one at a time, calling a handler for each and then discarding the parsed
chunk.
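
For reference, here is a minimal sketch of that streaming pattern. It is not
the code from the attachment: build_record_twig and the $on_record callback
are hypothetical names used only for illustration, and it assumes the records
appear as unprefixed <record> elements (a prefixed name such as marc:record
would need a matching handler key).

```perl
use strict;
use warnings;
use XML::Twig;

# Build a twig that hands each <record> to $on_record and then discards it.
# $on_record is a hypothetical callback standing in for the per-record
# import work; it is not a name taken from the patch.
sub build_record_twig {
    my ($on_record) = @_;
    return XML::Twig->new(
        twig_handlers => {
            # Called once a <record> element has been completely parsed.
            record => sub {
                my ( $twig, $elt ) = @_;
                # Pass the serialized element to the caller.
                $on_record->( $elt->sprint );
                # Drop everything parsed so far so memory does not grow
                # with the size of the file.
                $twig->purge;
            },
        },
    );
}

# Usage: count records in a small inline document without keeping them.
my $count = 0;
my $twig  = build_record_twig( sub { $count++ } );
$twig->parse(
    '<collection><record><leader>a</leader></record>'
  . '<record><leader>b</leader></record></collection>'
);
print "$count records\n";
```

For a file on disk, the same twig could be run with
$twig->parsefile($path) instead of parse(). The callback receives one
record's XML at a time, so it could, for instance, turn that string into a
MARC record before the chunk is purged.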

Note that there is no guarantee this implementation works for non-XML files
as it stands (although in theory it should). The focus is on the XML::Twig
implementation as a reference solution. Overall, batching through MARC::Batch
seems to be what is eating up memory.

To test:
1. Run perl misc/migration_tools/bulkmarcimport.pl -m=MARCXML -b -d -v
--commit=1000 --file=file_path_here on a large MARCXML/XML file (for example, 2
GB or greater).
2. On whatever machine or container it is run, the script will likely trigger
an out-of-memory error and crash the environment.
3. Apply the patch, run "restart_all", and repeat step 1. The script should
use much less memory when importing records from MARCXML/XML files.

