https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

Magnus Enger <mag...@libriotech.no> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #176701|0                           |1
        is obsolete|                            |

--- Comment #4 from Magnus Enger <mag...@libriotech.no> ---
Created attachment 176704
  -->
https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=176704&action=edit
Bug 38913: (bug 38416 follow-up) Elasticsearch indexing explodes with oversized
records

After Bug 38416, Elasticsearch indexing explodes with oversized
records, especially with UTF-8 encoded data.

In Koha::SearchEngine::Elasticsearch::marc_records_to_documents the
following snippet was introduced:

my $usmarc_record = $record->as_usmarc();
my $decoded_usmarc_record = MARC::Record->new_from_usmarc($usmarc_record);

If $record is oversized (> 99999 bytes), that is fine for the
MARC::Record object itself, but not for $record->as_usmarc. The ISO 2709
string it produces is invalid and hence cannot be properly converted
back to a MARC::Record object by new_from_usmarc.

The result in this case can be an error like:

UTF-8 "\x85" does not map to Unicode at /usr/share/perl5/MARC/File/Encode.pm line 35.
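
The 99999-byte limit follows from the ISO 2709 leader, which stores the
record length in positions 00-04 as five decimal digits. The sketch below
only illustrates that arithmetic; the helper names are hypothetical and it
does not reproduce what MARC::Record does internally.

```perl
use strict;
use warnings;

# ISO 2709 leader positions 00-04: record length as five decimal digits.
use constant MAX_ISO2709_LENGTH => 99_999;

# Hypothetical helper: true if a record of $length bytes fits in the leader.
sub fits_in_iso2709 {
    my ($length) = @_;
    return $length <= MAX_ISO2709_LENGTH ? 1 : 0;
}

# Hypothetical helper: zero-pad the length to the five digits the leader
# expects. For lengths above 99999 the result is wider than five characters,
# so it can no longer fit the fixed-width field.
sub leader_length_field {
    my ($length) = @_;
    return sprintf( '%05d', $length );
}
```

A record of 1234 bytes yields the field "01234"; a record of 120000 bytes
yields a six-character string, which cannot fit the five-character field.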

Since the conversion is done without any eval / try, the whole reindex
procedure (for instance rebuild_elasticsearch.pl) is interrupted,
seemingly at random, with no explanation.
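
A minimal sketch of guarding such a conversion with eval is shown below.
It is not the actual patch: try_roundtrip and its arguments are
hypothetical, and the fallback here (warn and return nothing) is just one
possible choice.

```perl
use strict;
use warnings;

# Hypothetical helper, not the code from the patch: run a conversion
# callback and return nothing instead of dying, so the caller can decide
# how to handle the record and the reindex loop keeps going.
sub try_roundtrip {
    my ( $convert, $label ) = @_;
    my $result;
    my $ok = eval {
        $result = $convert->();
        1;    # make sure a successful eval returns a true value
    };
    if ( !$ok ) {
        my $error = $@ || 'unknown error';
        warn "USMARC round trip failed for $label: $error";
        return;
    }
    return $result;
}

# Usage sketch inside Koha, where $record is a MARC::Record object:
#   my $decoded_usmarc_record = try_roundtrip(
#       sub { MARC::Record->new_from_usmarc( $record->as_usmarc() ) },
#       'biblionumber 439',
#   );
```

The "1;" at the end of the eval block makes a successful run distinguishable
from a failure even if $@ happens to be empty.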

Test plan:
==========
Hard to reproduce. However, the explanation above, together with the
discussion in Bug 38416 (from 2024-12-15), justifies the need for the
added eval.

1. Have a standard KTD installation with Elasticsearch.
2. Use the provided test record - add it to Koha with
   ./misc/migration_tools/bulkmarcimport.pl -b -file test.xml -m=MARCXML
   (have patience).
   During the load process you should see a message like:
   UTF-8 "\xC4" does not map to Unicode at /usr/share/perl5/MARC/File/Encode.pm line 35.
3. The record should get biblionumber 439. Check in the librarian interface at
   http://<your_address>:8081/cgi-bin/koha/catalogue/detail.pl?biblionumber=439
   that the record has been imported.
   However, you should not be able to find this record with a search.
4. Try to reindex with:
   ./misc/search_tools/rebuild_elasticsearch.pl -b -bn 439
   You should get a message like:
   UTF-8 "\xC4" does not map to Unicode at /usr/share/perl5/MARC/File/Encode.pm line 35.
   Again, no search results.
5. Apply the patch; restart_all.
6. Repeat reindex with:
   ./misc/search_tools/rebuild_elasticsearch.pl -b -bn 439
   There should be no warning now and you should be able to find the record.

Signed-off-by: Magnus Enger <mag...@libriotech.no>
Followed the test plan. Works as advertised.
