http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=15555
--- Comment #3 from David Cook <dc...@prosentient.com.au> ---
NOTE: The output I've provided in the above comments has come from a Koha using CHR indexing. ICU indexing has different output, but the same behaviour.

Here's what I see with "format xml" and "elements zebra::index":

<index name="Identifier-other" type="w" seq="21"></index>
<index name="Identifier-other" type="w" seq="1"></index>
<index name="Identifier-other" type="w" seq="22"></index>
<index name="Identifier-other" type="w" seq="23"></index>
<index name="Identifier-other" type="w" seq="24"></index>
<index name="Identifier-other" type="w" seq="25"></index>
<index name="Identifier-other" type="p" seq="21"></index>
<index name="Identifier-other" type="p" seq="22"></index>
<index name="Identifier-other" type="p" seq="23"></index>
<index name="Identifier-other" type="p" seq="24"></index>
<index name="Identifier-other" type="p" seq="25"></index>
<index name="Identifier-other" type="u" seq="26">http://libris.kb.se/resource/bib/219553</index>

Here's what I see with "format xml" and "elements index":

<z:index name="Identifier-other:w Identifier-other:p">http://libris.kb.se/resource/bib/219553</z:index>
<z:index name="Identifier-other:u">http://libris.kb.se/resource/bib/219553</z:index>

However, this output is misleading. That's basically just the output of "xsltproc biblio-zebra-indexdefs.xsl <record>". It's pre-normalization and thus essentially meaningless.

-----------

Only advanced users will look at yaz-client though.

That being said, there are functional differences between ICU and CHR.

For instance, the following query will work in CHR but NOT in ICU:

id-other,phr=http libris kb se resource bib 219553

Conversely, the following query will work in ICU but not in CHR:

id-other,phr=http libriskbse resource bib 219553

That's because we've configured tokenization and normalization to work differently between the two schemes. Fun, right?
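To make the difference concrete, here is a rough Python sketch of why each of the two phrase queries above only matches under one scheme. It is a toy model, not Zebra's or ICU's actual code: the two tokenizers (chr_like_tokens and icu_like_tokens) and the phrase_matches helper are hypothetical names, and their rules are assumptions inferred purely from the two queries above, not read from the CHR or ICU configuration files.

```python
import re

URL = "http://libris.kb.se/resource/bib/219553"


def chr_like_tokens(text):
    """Toy CHR-style rule (assumption): any non-alphanumeric character separates tokens."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def icu_like_tokens(text):
    """Toy ICU-style rule (assumption): split on whitespace, slashes and colons,
    then strip the punctuation that remains inside each token."""
    pieces = re.split(r"[\s/:]+", text.lower())
    cleaned = (re.sub(r"[^0-9a-z]", "", p) for p in pieces)
    return [t for t in cleaned if t]


def phrase_matches(query, indexed_tokens, tokenize):
    """A phrase matches if the query's tokens appear as a contiguous run in the index."""
    q = tokenize(query)
    n = len(q)
    return n > 0 and any(
        indexed_tokens[i:i + n] == q for i in range(len(indexed_tokens) - n + 1)
    )


if __name__ == "__main__":
    for name, tokenize in (("CHR-like", chr_like_tokens), ("ICU-like", icu_like_tokens)):
        indexed = tokenize(URL)
        print(name, "indexed tokens:", indexed)
        for query in (
            "http libris kb se resource bib 219553",
            "http libriskbse resource bib 219553",
            URL,
        ):
            print("   ", repr(query), "->", phrase_matches(query, indexed, tokenize))
```

In this toy model the unmodified URL is the only query that matches under both tokenizers, because it goes through the same rules at query time as it did at index time.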

ICU uses the "l" tokenize rule from http://www.indexdata.com/yaz/doc/yaz-icu.html. That means it tokenizes based on slashes, spaces, and maybe some other characters I haven't discovered yet.

You can verify that with the following commands:

echo "THIS IS A TEST" | yaz-icu -x -c ./etc/zebradb/etc/phrases-icu.xml
echo "THIS/IS/A/TEST" | yaz-icu -x -c ./etc/zebradb/etc/phrases-icu.xml

Indeed, check out the following yaz-client output when using ICU:

Z> f id-other,phr=http://libris.kb.se/resource/bib/219553
Sent searchRequest.
Received SearchResponse.
Search was a success.
Number of hits: 12, setno 17
SearchResult-1: term=http cnt=12, term=libriskbse cnt=12, term=resource cnt=12, term=bib cnt=12, term=219553 cnt=12
records returned: 0
Elapsed: 0.001458

You can see the URL has been broken into 5 terms/tokens with ICU, while CHR does the following:

Z> f id-other,phr=http://libris.kb.se/resource/bib/219553
Sent searchRequest.
Received SearchResponse.
Search was a success.
Number of hits: 12, setno 13
SearchResult-1: term=http cnt=12, term=libris cnt=12, term=kb cnt=12, term=se cnt=12, term=resource cnt=12, term=bib cnt=12, term=219553 cnt=12
records returned: 0
Elapsed: 0.001119

We actually have 7 terms/tokens in the case of CHR!

And that means that we don't want to try to outsmart Zebra by pre-normalizing our queries. We want to query Zebra with the exact same data that it indexed, because that way the normalization will be the same! Science!