https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38101

            Bug ID: 38101
           Summary: ES skips records with huge fields
 Change sponsored?: ---
           Product: Koha
           Version: unspecified
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: P5 - low
         Component: Searching - Elasticsearch
          Assignee: [email protected]
          Reporter: [email protected]
        QA Contact: [email protected]

I saw a case in the wild where a staff member copied and pasted some legal
text from a PDF into a 500 field. After that, the record could no longer be
found using ES.

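The reason is fairly simple: ES has a maximum size it will accept for a single
term in a phrase index. As the error below shows, a term in "note.raw" may be
at most 32766 bytes long in UTF-8, and the pasted note exceeds that.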
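
A minimal sketch of that limit, in Python, for illustration only. It is not
Koha code: the function name is hypothetical, and the only value taken from
this report is the 32766-byte limit quoted in the error message. Note that the
limit counts UTF-8 bytes, not characters, so text with accented characters
reaches it sooner.

```python
# Limit quoted in the Elasticsearch error message in this report.
MAX_TERM_BYTES = 32766


def exceeds_term_limit(value: str, limit: int = MAX_TERM_BYTES) -> bool:
    """Return True if the UTF-8 encoding of value is longer than limit bytes.

    Hypothetical helper: it only mirrors the size check described in the
    error message, it does not talk to Elasticsearch.
    """
    return len(value.encode("utf-8")) > limit


if __name__ == "__main__":
    short_note = "A normal 500$a note."
    long_note = "x" * 32771  # same byte count as reported in the error
    print(exceeds_term_limit(short_note))
    print(exceeds_term_limit(long_note))
```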

To reproduce:
1. Have KTD running with ES:
   $ ktd --proxy --es7 up -d
2. Perform a search
3. Open the first result for editing
4. Find a cool Wiki page with lots of paragraphs
5. Copy all of the paragraphs and paste them into a 500$a field of the record.
6. Repeat step 2
=> FAIL: The record is not found
7. Reindex manually:
   $ ktd --shell
  k$ perl misc/search_tools/rebuild_elasticsearch.pl --biblios --where "biblionumber=3" -v -v
=> FAIL: You get something like:
```
[22229] Committing final records...
One or more ElasticSearch errors occurred when indexing documents at
/kohadevbox/koha/Koha/SearchEngine/Elasticsearch/Indexer.pm line 148.
[22229] There were errors during indexing
Record #3 Document contains at least one immense term in field="note.raw"
(whose UTF8 encoding is longer than the max length 32766), all of which were
skipped.  Please correct the analyzer to not produce such terms.  The prefix of
the first immense term is: '[10, 109, 117, 115, 116, 97, 102, 97, 32, 102, 117,
101, 32, 101, 108, 32, 115, 101, 103, 117, 110, 100, 111, 32, 104, 105, 106,
111, 32, 100]...', original message: bytes can be at most 32766 in length; got
32771 (illegal_argument_exception) : max_bytes_length_exceeded_exception (bytes
can be at most 32766 in length; got 32771)
[22229] Total 1 records indexed
```
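
The "prefix of the first immense term" in the error is a list of byte values.
Decoding it helps to identify which pasted text is the offending one. A small
Python sketch, assuming the bytes are UTF-8 as the message states (the function
name is hypothetical; the byte list is copied from the log above):

```python
# Byte values copied from the error message above.
PREFIX = [10, 109, 117, 115, 116, 97, 102, 97, 32, 102, 117,
          101, 32, 101, 108, 32, 115, 101, 103, 117, 110, 100, 111, 32, 104,
          105, 106, 111, 32, 100]


def decode_term_prefix(byte_values):
    """Turn the byte list from the error message back into text.

    errors="replace" is used because the prefix may be cut in the middle
    of a multi-byte character.
    """
    return bytes(byte_values).decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(repr(decode_term_prefix(PREFIX)))
    # prints '\nmustafa fue el segundo hijo d'
```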
