http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=15541
--- Comment #1 from David Cook <dc...@prosentient.com.au> ---
Here are my latest findings:

Input: "http://libris.kb.se/resource/bib/219553"
C4::Matcher::_normalize() = "HTTPLIBRISKBSERESOURCEBIB219553"
Zebra CHR = "http libris kb se resource bib 219553"
Zebra ICU = "http libriskbse resource bib 219553"

It seems to me that the smartest thing to do is NOT to normalize with C4::Matcher::_normalize(), because we're probably going to get it wrong, as we have above.

Zebra indexes "http://libris.kb.se/resource/bib/219553" as "http libris kb se resource bib 219553" (CHR Phrase), as "http libriskbse resource bib 219553" (ICU Phrase), or as "http://libris.kb.se/resource/bib/219553" (URL, which is a Charmap when using either CHR or ICU).

If we query Zebra with "http://libris.kb.se/resource/bib/219553", it will normalize the query the same way it normalized that value when it originally indexed it, and we'll get a match.

Of course, we can't necessarily stop using C4::Matcher::_normalize(), as it's the default behaviour. Many people may be relying on _normalize() without even knowing it, even if it's potentially working badly.

What I think I want to do is create a new normalizer which does nothing, and call it "none" or "raw". That way, I'm passing Zebra the same thing it has seen before, it will normalize it in exactly the same way, and the likelihood of an accurate match increases considerably.
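
To make the proposal concrete, here is a rough sketch of a selectable normalizer with a do-nothing "raw" option. It is written in Python purely for illustration and is not Koha's Perl code. The function names, the registry, and the exact punctuation set are assumptions; the punctuation set was chosen only so that the legacy-style output matches the C4::Matcher::_normalize() value quoted above.

```python
import re

# Punctuation stripped by the legacy-style normalizer in this sketch.
# This set is an assumption, picked so the result matches the value
# reported for C4::Matcher::_normalize() in the comment above.
_PUNCTUATION = re.compile(r"[.;:,\[\]()/'\"]")


def normalize_legacy(value: str) -> str:
    """Upper-case, drop punctuation, then trim and collapse whitespace."""
    value = _PUNCTUATION.sub("", value.upper())
    return re.sub(r"\s+", " ", value).strip()


def normalize_raw(value: str) -> str:
    """Do nothing, so the search engine applies its own normalization."""
    return value


# Hypothetical registry; the keys "legacy" and "raw" are illustrative names.
NORMALIZERS = {"legacy": normalize_legacy, "raw": normalize_raw}


def build_match_term(value: str, norm: str = "legacy") -> str:
    """Return the term that would be placed in the matching query."""
    return NORMALIZERS[norm](value)


url = "http://libris.kb.se/resource/bib/219553"
print(build_match_term(url))         # HTTPLIBRISKBSERESOURCEBIB219553
print(build_match_term(url, "raw"))  # http://libris.kb.se/resource/bib/219553
```

Keeping "legacy" as the default leaves existing matching rules untouched, while a rule that opts into "raw" sends Zebra the same string it saw at index time.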