https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913
Magnus Enger <mag...@libriotech.no> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #176701|0                           |1
        is obsolete|                            |

--- Comment #4 from Magnus Enger <mag...@libriotech.no> ---
Created attachment 176704
  --> https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=176704&action=edit
Bug 38913: (bug 38416 follow-up) Elasticsearch indexing explodes with oversized records

After Bug 38416, Elasticsearch indexing explodes with oversized records,
especially with UTF-8 encoded data.

In Koha::SearchEngine::Elasticsearch::marc_records_to_documents the following
snippet was introduced:

    my $usmarc_record = $record->as_usmarc();
    my $decoded_usmarc_record = MARC::Record->new_from_usmarc($usmarc_record);

If $record is oversized (> 99999 bytes), that is fine for the MARC::Record
object but not for $record->as_usmarc. The ISO 2709 string it produces is not
correct and therefore cannot be converted back into a MARC::Record object by
new_from_usmarc. In this case the result can look like:

    UTF-8 "\x85" does not map to Unicode at /usr/share/perl5/MARC/File/Encode.pm line 35.

Since this is done without any eval / try, the whole reindex procedure (for
instance rebuild_elasticsearch.pl) is randomly interrupted with no
explanation.

Test plan:
==========
The failure is hard to reproduce, but the explanation above, together with the
discussion in Bug 38416 (from 2024-12-15), justifies the added eval.

1. Have a standard KTD installation with Elasticsearch.
2. Use the provided test record and add it to Koha with:
   ./misc/migration_tools/bulkmarcimport.pl -b -file test.xml -m=MARCXML
   (have patience). During the load process you should see a message like:
   UTF-8 "\xC4" does not map to Unicode at /usr/share/perl5/MARC/File/Encode.pm line 35.
3. The record should get biblionumber 439. Check in the librarian interface at
   http://<your_address>:8081/cgi-bin/koha/catalogue/detail.pl?biblionumber=439
   that the record has been imported.
   However, you should not be able to search for this record.
4. Try to reindex with:
   ./misc/search_tools/rebuild_elasticsearch.pl -b -bn 439
   You should get a message like:
   UTF-8 "\xC4" does not map to Unicode at /usr/share/perl5/MARC/File/Encode.pm line 35.
   Again, no search results.
5. Apply the patch; restart_all.
6. Repeat the reindex with:
   ./misc/search_tools/rebuild_elasticsearch.pl -b -bn 439
   There should be no warning now and you should be able to find the record.

Signed-off-by: Magnus Enger <mag...@libriotech.no>

Followed the test plan. Works as advertised.
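
For readers unfamiliar with the size limit mentioned in the commit message, here is a minimal sketch of where the 99999-byte ceiling comes from. ISO 2709 stores the total record length in the first five characters of the leader as decimal digits, so anything larger cannot be written correctly. This is illustrative Python, not Koha code, and it does not use MARC::Record. The names fits_in_iso2709, leader_length_field and safe_round_trip are hypothetical. The try/except only mimics the idea of guarding the Perl round trip with eval so that one oversized record does not abort the whole run; it is not the actual patch.

```python
# Sketch only: ISO 2709 keeps the total record length in the first five
# characters of the leader, written as zero-padded decimal digits.
MAX_ISO2709_LENGTH = 99999  # largest value that fits in five digits


def fits_in_iso2709(record_length: int) -> bool:
    """Return True if the length can be written in the 5-digit leader slot."""
    return 0 <= record_length <= MAX_ISO2709_LENGTH


def leader_length_field(record_length: int) -> str:
    """Format the length as the leader would hold it, or raise if it cannot fit."""
    if not fits_in_iso2709(record_length):
        raise ValueError(
            f"record length {record_length} does not fit in 5 digits"
        )
    return f"{record_length:05d}"


def safe_round_trip(record_length: int):
    """Hypothetical guard: return the leader field, or None if it cannot be built."""
    try:
        return leader_length_field(record_length)
    except ValueError as err:
        # Analogous to wrapping the Perl calls in eval: report and carry on
        # with the next record instead of stopping the whole reindex.
        print(f"skipping round trip: {err}")
        return None


if __name__ == "__main__":
    for length in (1234, 99999, 100000):
        print(length, "->", safe_round_trip(length))
```

The guard returns None for the oversized case instead of raising, which mirrors the intent described above: the caller can fall back to the in-memory record and continue indexing.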