https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=19893
            Bug ID: 19893
           Summary: Alternative optimized indexing for Elasticsearch
 Change sponsored?: ---
           Product: Koha
           Version: master
          Hardware: All
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P5 - low
         Component: Searching - Elasticsearch
          Assignee: koha-bugs@lists.koha-community.org
          Reporter: glask...@gmail.com

At our library, perhaps owing to a larger-than-average number of biblios, a full re-index takes an unacceptably long time to complete (> 24h). We also had an issue with indexing becoming increasingly slow as new mappings are added.

After some profiling with NYTProf it became clear that most of this overhead is in Catmandu::Store::ElasticSearch and Catmandu::MARC. After giving it some thought, the simplest way to resolve this seemed to be to replace these libraries with Koha-specific code, since the functionality they provide is not that hard to re-implement more efficiently. Due to the complexity of Catmandu, optimizing these libraries would most likely be more challenging (and some parts cannot be optimized at all because of limitations in the architecture of Catmandu/Fix).

Main benefits include:

1) Increased indexing performance (about twice as fast overall, six times as fast when comparing only the time spent in update_index()), due to more efficient JSON conversion and fewer Elasticsearch requests.

2) With Catmandu, indexing speed decreases as more mappings are added. With the alternative algorithm, indexing speed stays more or less constant no matter how many mappings you add.

3) Negligible indexing start-up time. For example, we have an issue with the book drop machine, where each return takes a couple of seconds because of the Catmandu start-up overhead.

4) More transparent code and less complexity compared with Catmandu.
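To illustrate what "fewer Elasticsearch requests" in benefit 1 can mean in practice, here is a minimal sketch of batching documents into Elasticsearch bulk request bodies. It is not taken from the patch, and it is written in Python rather than Perl purely for illustration. The index name `koha_biblios`, the batch size and the sample documents are assumptions made up for the example.

```python
import json


def build_bulk_body(docs, index_name):
    """Serialize (doc_id, source) pairs into one bulk request body.

    The bulk format is newline-delimited JSON: an action line followed
    by the document source line, with a trailing newline at the end.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical sample data: five tiny documents, sent two per request.
records = [(str(n), {"title": f"Record {n}"}) for n in range(1, 6)]
bodies = [build_bulk_body(batch, "koha_biblios") for batch in batches(records, 2)]
```

With a batch size of two, the five sample documents end up in three request bodies instead of five separate index requests. A real indexer would use a much larger batch size.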
With this patch the largest bottleneck is instead MARC::Record::as_xml_record. Using MARC21 as the serialization format would probably be a lot faster, but I still chose MARCXML because of the binary format's length limitation (which could be exceeded by records with many items). I will probably look into faster MARCXML serialization options in the future to address this.

I also attach profiling results with and without the patch applied.
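For context on the binary format length limitation: the binary MARC (ISO 2709) leader stores the total record length in five decimal digits, so a record cannot be longer than 99999 bytes. The following sketch shows roughly how item count relates to that limit. The byte sizes used (2000 bytes for the bibliographic part, 250 bytes per item including its directory entry) are made-up assumptions, not measurements from Koha, and the function names are hypothetical.

```python
# Leader positions 00-04 hold the record length as five decimal digits.
MAX_ISO2709_LENGTH = 99_999


def fits_in_iso2709(base_record_bytes, item_field_bytes, item_count):
    """Return True if the estimated record size fits in binary MARC."""
    total = base_record_bytes + item_field_bytes * item_count
    return total <= MAX_ISO2709_LENGTH


def max_items(base_record_bytes, item_field_bytes):
    """Largest item count whose estimated record size still fits."""
    return (MAX_ISO2709_LENGTH - base_record_bytes) // item_field_bytes


# Assumed sizes for illustration only.
limit = max_items(2000, 250)
```

Under these assumed sizes the limit is reached at a few hundred items, which is why a record with many items is safer to serialize as MARCXML.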