[Koha-bugs] [Bug 13064] Indexing problem with ICU on control characters
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=13064

Tomás Cohen Arazi changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|Passed QA                   |Pushed to Master
                 CC|                            |tomasco...@gmail.com

--- Comment #4 from Tomás Cohen Arazi ---
Patch pushed to master.
Thanks Fridolin!

--
You are receiving this mail because:
You are watching all bug changes.
___
Koha-bugs mailing list
Koha-bugs@lists.koha-community.org
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-bugs
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/
[Koha-bugs] [Bug 13064] Indexing problem with ICU on control characters
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=13064

Kyle M Hall changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #32296 |0                           |1
        is obsolete|                            |

--- Comment #3 from Kyle M Hall ---
Created attachment 32676
  --> http://bugs.koha-community.org/bugzilla3/attachment.cgi?id=32676&action=edit
[PASSED QA] Bug 13064 - Indexing problem with ICU on control characters

The ICU configuration files contain a rule that removes control characters.
This rule is applied before tokenization.
The problem is that the "[:Control:]" character class includes line feed,
carriage return and tab.
See http://www.regular-expressions.info/posixbrackets.html.
So when several lines are indexed, the last word of a line is joined with the
first word of the next line. Those words are then not searchable.

For example :
First line
Second line
This will become "First lineSecond line", tokenized as "First", "lineSecond"
and "line".

Test plan :
- Use ICU in Zebra configuration
- Choose an indexed field, like 300$a
- Create a new record
- Enter several lines in the chosen field, like :
First line
Second line
- Index this record
=> Without patch the search on "Second" does not return the record
=> With patch the search on "Second" returns the record
- Same tests with tab and carriage return instead of line feed

Signed-off-by: Chris Cormack
Signed-off-by: Kyle M Hall
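
The word-joining effect described in the commit message above can be reproduced outside Zebra. The following is only a rough sketch, not the patch itself: it assumes that Python's Unicode general category "Cc" is a fair stand-in for ICU's "[:Control:]" class, and that splitting on whitespace approximates the tokenization step. The function names are hypothetical, and the second variant (keeping tab, line feed and carriage return out of the removal set) is one possible remedy shown for illustration; the archived messages do not show what the actual patch changes.

```python
import unicodedata

# Control characters that also act as word separators in multi-line fields.
SEPARATOR_CONTROLS = {"\t", "\n", "\r"}


def strip_all_controls(text: str) -> str:
    """Remove every character of general category Cc, including tab, LF and CR."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cc")


def strip_controls_keep_separators(text: str) -> str:
    """Remove Cc characters except tab, LF and CR (assumed remedy, for illustration)."""
    return "".join(
        ch
        for ch in text
        if ch in SEPARATOR_CONTROLS or unicodedata.category(ch) != "Cc"
    )


def tokenize(text: str) -> list:
    """Whitespace tokenization, standing in for the ICU tokenize step."""
    return text.split()


field_value = "First line\nSecond line"

# Removing all control characters before tokenizing glues the words together.
print(tokenize(strip_all_controls(field_value)))
# -> ['First', 'lineSecond', 'line']

# Keeping the separators lets the tokenizer see the line break.
print(tokenize(strip_controls_keep_separators(field_value)))
# -> ['First', 'line', 'Second', 'line']
```

With the first variant a search on "Second" has no matching token, which corresponds to the "without patch" result in the test plan; the same happens when the lines are separated by a tab or a carriage return.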
[Koha-bugs] [Bug 13064] Indexing problem with ICU on control characters
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=13064

Kyle M Hall changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|Signed Off                  |Passed QA
[Koha-bugs] [Bug 13064] Indexing problem with ICU on control characters
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=13064

Chris Cormack changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #32136 |0                           |1
        is obsolete|                            |

--- Comment #2 from Chris Cormack ---
Created attachment 32296
  --> http://bugs.koha-community.org/bugzilla3/attachment.cgi?id=32296&action=edit
Bug 13064 - Indexing problem with ICU on control characters

The ICU configuration files contain a rule that removes control characters.
This rule is applied before tokenization.
The problem is that the "[:Control:]" character class includes line feed,
carriage return and tab.
See http://www.regular-expressions.info/posixbrackets.html.
So when several lines are indexed, the last word of a line is joined with the
first word of the next line. Those words are then not searchable.

For example :
First line
Second line
This will become "First lineSecond line", tokenized as "First", "lineSecond"
and "line".

Test plan :
- Use ICU in Zebra configuration
- Choose an indexed field, like 300$a
- Create a new record
- Enter several lines in the chosen field, like :
First line
Second line
- Index this record
=> Without patch the search on "Second" does not return the record
=> With patch the search on "Second" returns the record
- Same tests with tab and carriage return instead of line feed

Signed-off-by: Chris Cormack
[Koha-bugs] [Bug 13064] Indexing problem with ICU on control characters
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=13064

Chris Cormack changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|Needs Signoff               |Signed Off
[Koha-bugs] [Bug 13064] Indexing problem with ICU on control characters
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=13064

--- Comment #1 from Fridolin SOMERS ---
Created attachment 32136
  --> http://bugs.koha-community.org/bugzilla3/attachment.cgi?id=32136&action=edit
Bug 13064 - Indexing problem with ICU on control characters

The ICU configuration files contain a rule that removes control characters.
This rule is applied before tokenization.
The problem is that the "[:Control:]" character class includes line feed,
carriage return and tab.
See http://www.regular-expressions.info/posixbrackets.html.
So when several lines are indexed, the last word of a line is joined with the
first word of the next line. Those words are then not searchable.

For example :
First line
Second line
This will become "First lineSecond line", tokenized as "First", "lineSecond"
and "line".

Test plan :
- Use ICU in Zebra configuration
- Choose an indexed field, like 300$a
- Create a new record
- Enter several lines in the chosen field, like :
First line
Second line
- Index this record
=> Without patch the search on "Second" does not return the record
=> With patch the search on "Second" returns the record
- Same tests with tab and carriage return instead of line feed
[Koha-bugs] [Bug 13064] Indexing problem with ICU on control characters
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=13064

Fridolin SOMERS changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |Needs Signoff
   Patch complexity|---                         |Trivial patch
[Koha-bugs] [Bug 13064] Indexing problem with ICU on control characters
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=13064

Fridolin SOMERS changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|gmcha...@gmail.com          |fridolyn.som...@biblibre.com