[Koha-bugs] [Bug 13064] Indexing problem with ICU on control characters

2014-11-14 Thread bugzilla-daemon
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=13064

Tomás Cohen Arazi  changed:

   What|Removed |Added

 Status|Passed QA   |Pushed to Master
 CC||tomasco...@gmail.com

--- Comment #4 from Tomás Cohen Arazi  ---
Patch pushed to master.

Thanks Fridolin!

-- 
You are receiving this mail because:
You are watching all bug changes.
___
Koha-bugs mailing list
Koha-bugs@lists.koha-community.org
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-bugs
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/

[Koha-bugs] [Bug 13064] Indexing problem with ICU on control characters

2014-10-24 Thread bugzilla-daemon
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=13064

Kyle M Hall  changed:

   What|Removed |Added

  Attachment #32296|0   |1
is obsolete||

--- Comment #3 from Kyle M Hall  ---
Created attachment 32676
  -->
http://bugs.koha-community.org/bugzilla3/attachment.cgi?id=32676&action=edit
[PASSED QA] Bug 13064 - Indexing problem with ICU on control characters

The ICU configuration files contains a rule to remove control characters :
  
This rule is before tokenization.

The problem is that "[:Control:]" regex contains line feed, carriage return and
tab. See http://www.regular-expressions.info/posixbrackets.html.
So when several lines are indexed, last word of line is joined with first line
of next line. Thoses words are then not searchable.

For example :
  First line
  Second line
This will become "First lineSecond line", tokenized as "First", "lineSecond"
and "line".

Test plan :
- Use ICU in Zebra configuration
- Choose an indexed field, like 300$a
- Create a new record
- Enter several lines in choosen field, like :
  First line
  Second line
- Index this record
=> Without patch the search on "Second" does not return the record
=> With patch the search on "Second" returns the record
- Same tests with tab and carriage return instead of line feed

Signed-off-by: Chris Cormack 

Signed-off-by: Kyle M Hall 

-- 
You are receiving this mail because:
You are watching all bug changes.
___
Koha-bugs mailing list
Koha-bugs@lists.koha-community.org
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-bugs
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/


[Koha-bugs] [Bug 13064] Indexing problem with ICU on control characters

2014-10-24 Thread bugzilla-daemon
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=13064

Kyle M Hall  changed:

   What|Removed |Added

 Status|Signed Off  |Passed QA

-- 
You are receiving this mail because:
You are watching all bug changes.
___
Koha-bugs mailing list
Koha-bugs@lists.koha-community.org
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-bugs
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/


[Koha-bugs] [Bug 13064] Indexing problem with ICU on control characters

2014-10-14 Thread bugzilla-daemon
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=13064

Chris Cormack  changed:

   What|Removed |Added

  Attachment #32136|0   |1
is obsolete||

--- Comment #2 from Chris Cormack  ---
Created attachment 32296
  -->
http://bugs.koha-community.org/bugzilla3/attachment.cgi?id=32296&action=edit
Bug 13064 - Indexing problem with ICU on control characters

The ICU configuration files contains a rule to remove control characters :
  
This rule is before tokenization.

The problem is that "[:Control:]" regex contains line feed, carriage return and
tab. See http://www.regular-expressions.info/posixbrackets.html.
So when several lines are indexed, last word of line is joined with first line
of next line. Thoses words are then not searchable.

For example :
  First line
  Second line
This will become "First lineSecond line", tokenized as "First", "lineSecond"
and "line".

Test plan :
- Use ICU in Zebra configuration
- Choose an indexed field, like 300$a
- Create a new record
- Enter several lines in choosen field, like :
  First line
  Second line
- Index this record
=> Without patch the search on "Second" does not return the record
=> With patch the search on "Second" returns the record
- Same tests with tab and carriage return instead of line feed

Signed-off-by: Chris Cormack 

-- 
You are receiving this mail because:
You are watching all bug changes.
___
Koha-bugs mailing list
Koha-bugs@lists.koha-community.org
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-bugs
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/


[Koha-bugs] [Bug 13064] Indexing problem with ICU on control characters

2014-10-14 Thread bugzilla-daemon
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=13064

Chris Cormack  changed:

   What|Removed |Added

 Status|Needs Signoff   |Signed Off

-- 
You are receiving this mail because:
You are watching all bug changes.
___
Koha-bugs mailing list
Koha-bugs@lists.koha-community.org
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-bugs
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/


[Koha-bugs] [Bug 13064] Indexing problem with ICU on control characters

2014-10-10 Thread bugzilla-daemon
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=13064

--- Comment #1 from Fridolin SOMERS  ---
Created attachment 32136
  -->
http://bugs.koha-community.org/bugzilla3/attachment.cgi?id=32136&action=edit
Bug 13064 - Indexing problem with ICU on control characters

The ICU configuration files contains a rule to remove control characters :
  
This rule is before tokenization.

The problem is that "[:Control:]" regex contains line feed, carriage return and
tab. See http://www.regular-expressions.info/posixbrackets.html.
So when several lines are indexed, last word of line is joined with first line
of next line. Thoses words are then not searchable.

For example :
  First line
  Second line
This will become "First lineSecond line", tokenized as "First", "lineSecond"
and "line".

Test plan :
- Use ICU in Zebra configuration
- Choose an indexed field, like 300$a
- Create a new record
- Enter several lines in choosen field, like :
  First line
  Second line
- Index this record
=> Without patch the search on "Second" does not return the record
=> With patch the search on "Second" returns the record
- Same tests with tab and carriage return instead of line feed

-- 
You are receiving this mail because:
You are watching all bug changes.
___
Koha-bugs mailing list
Koha-bugs@lists.koha-community.org
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-bugs
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/


[Koha-bugs] [Bug 13064] Indexing problem with ICU on control characters

2014-10-10 Thread bugzilla-daemon
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=13064

Fridolin SOMERS  changed:

   What|Removed |Added

 Status|NEW |Needs Signoff
   Patch complexity|--- |Trivial patch

-- 
You are receiving this mail because:
You are watching all bug changes.
___
Koha-bugs mailing list
Koha-bugs@lists.koha-community.org
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-bugs
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/


[Koha-bugs] [Bug 13064] Indexing problem with ICU on control characters

2014-10-10 Thread bugzilla-daemon
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=13064

Fridolin SOMERS  changed:

   What|Removed |Added

   Assignee|gmcha...@gmail.com  |fridolyn.som...@biblibre.co
   ||m

-- 
You are receiving this mail because:
You are watching all bug changes.
___
Koha-bugs mailing list
Koha-bugs@lists.koha-community.org
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-bugs
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/