https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=9729
--- Comment #10 from David Cook <dc...@prosentient.com.au> ---
Oh I've had some fun playing with ICU...

chain.xml:

    <icu_chain locale="">
      <tokenize rule="l"/>
      <transliterate rule="[:Punctuation:] } [:WhiteSpace:] > ''"/>
      <transform rule="[:WhiteSpace:] Remove "/>
      <display/>
      <casemap rule="l"/>
    </icu_chain>

echo -n '.NET. test' | yaz-icu -c chain.xml
1 1 '.net'' '.NET''
2 1 'test' 'test'

--

Here we tokenize on line breaks (i.e. at spaces), and then we apply our transliterate and transform rules as per http://userguide.icu-project.org/transforms/general.

With the transliterate rule, we can use the following syntax:

"before_context { text_to_replace } after_context > completed_result | result_to_revisit ;"

So here the "text_to_replace" is [:Punctuation:], the "after_context" is [:WhiteSpace:], and the completed result is the punctuation transliterated into nothing. So we trim the "." from the end of NET, but we don't trim the "." from the start.

Of course, that doesn't really work in practice, because it misses so many other scenarios:

echo -n 'Was that a good idea?' | yaz-icu -c chain.xml
1 1 'was' 'Was'
2 1 'that' 'that'
3 1 'a' 'a'
4 1 'good' 'good'
5 1 'idea?' 'idea?'

I'm not really sure how to solve this problem in an efficient way. We could just map "C#", "C++", and ".NET" to "csharp", "cplusplus", and "dotnet", but that's not a very scalable or comprehensive solution for all Koha users.
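For anyone who wants to poke at the behaviour without yaz-icu installed, here is a rough Python sketch of what the chain above appears to do. It is only an approximation under stated assumptions: it splits on runs of whitespace instead of using ICU's real line-break rules, it uses Python's own notion of whitespace and the Unicode "P" general categories in place of [:WhiteSpace:] and [:Punctuation:], and the names emulate_chain and is_punct are made up for illustration. It does make the gap visible, though: a punctuation character is only dropped when whitespace follows it inside the token, so the "?" at the very end of the input is left alone.

```python
import re
import unicodedata


def is_punct(ch):
    # Stand-in for ICU's [:Punctuation:]: any Unicode general category P*.
    return unicodedata.category(ch).startswith("P")


def emulate_chain(text):
    """Approximate the chain: tokenize, transliterate, transform, display, casemap."""
    results = []
    # Assumption: each token keeps its trailing whitespace, roughly like
    # a line-break tokenizer that breaks after spaces.
    for token in re.findall(r"\S+\s*", text):
        kept = []
        for i, ch in enumerate(token):
            followed_by_space = i + 1 < len(token) and token[i + 1].isspace()
            # "[:Punctuation:] } [:WhiteSpace:] > ''": drop punctuation only
            # when the next character is whitespace.
            if is_punct(ch) and followed_by_space:
                continue
            kept.append(ch)
        # "[:WhiteSpace:] Remove": strip whitespace, giving the display form.
        display = "".join(c for c in kept if not c.isspace())
        # casemap rule "l": lowercase for the normalized form.
        results.append((display.lower(), display))
    return results


if __name__ == "__main__":
    for sample in (".NET. test", "Was that a good idea?"):
        for normalized, display in emulate_chain(sample):
            print(repr(normalized), repr(display))
```

Each pair is (normalized, display), in the same order as the two quoted columns in the yaz-icu output.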