[jira] Updated: (LUCENE-1224) NGramTokenFilter creates bad TokenStream

2008-03-16 Thread Hiroaki Kawai (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Hiroaki Kawai updated LUCENE-1224:
--

Attachment: LUCENE-1224.patch

Patch updated with unit test.

LUCENE-1225 makes this problem easier to understand. This patch also covers
the token filter issues, which are more complicated.

NGramTokenFilter creates bad TokenStream

Key: LUCENE-1224
URL: https://issues.apache.org/jira/browse/LUCENE-1224
Project: Lucene - Java
Issue Type: Bug
Components: contrib/*
Reporter: Hiroaki Kawai
Assignee: Grant Ingersoll
Priority: Critical
Attachments: LUCENE-1224.patch, NGramTokenFilter.patch,
NGramTokenFilter.patch

With the current trunk NGramTokenFilter(min=2,max=4), I index the string
abcdef, but I can't query it with abc. If I query with ab, I get a hit.
The reason is that NGramTokenFilter generates a badly ordered TokenStream.
Queries depend on the Token order in the TokenStream: how stemming or a
phrase should be analyzed is based on that order (Token.positionIncrement).
With the current filter, the query string abc is tokenized to: ab bc abc
meaning a query for a string that has ab bc abc in this order.
The expected filter would generate: ab abc(positionIncrement=0) bc
meaning a query for a string that has (ab|abc) bc in this order.
I'd like to submit a patch for this issue. :-)
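
The two orderings described above can be reproduced with a small standalone
sketch. It does not use the Lucene API and is not the attached patch: the
class name NGramOrderSketch and the methods sizeMajor and positionMajor are
hypothetical names chosen for illustration, and the "size-major" loop is only
an assumption about how the reported order (ab bc abc) could arise.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: plain strings stand in for Lucene Tokens.
class NGramOrderSketch {

    // Reported order: all grams of length min, then min+1, and so on.
    static List<String> sizeMajor(String s, int min, int max) {
        List<String> out = new ArrayList<>();
        for (int n = min; n <= max; n++) {
            for (int start = 0; start + n <= s.length(); start++) {
                out.add(s.substring(start, start + n));
            }
        }
        return out;
    }

    // Expected order: all grams starting at offset 0, then offset 1, and so on.
    // Every gram after the first at a given offset is marked with
    // positionIncrement=0, i.e. it shares the position of the first gram.
    static List<String> positionMajor(String s, int min, int max) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < s.length(); start++) {
            boolean first = true;
            for (int n = min; n <= max && start + n <= s.length(); n++) {
                String gram = s.substring(start, start + n);
                out.add(first ? gram : gram + "(positionIncrement=0)");
                first = false;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sizeMajor("abc", 2, 4));     // [ab, bc, abc]
        System.out.println(positionMajor("abc", 2, 4)); // [ab, abc(positionIncrement=0), bc]
    }
}
```

Both methods emit the same set of grams; only the order and the position
marking differ, which is what determines whether abc is read as the phrase
"ab bc abc" or as "(ab|abc) bc".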

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Updated: (LUCENE-1224) NGramTokenFilter creates bad TokenStream

2008-03-13 Thread Hiroaki Kawai (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hiroaki Kawai updated LUCENE-1224:
--

Attachment: NGramTokenFilter.patch

Modified to set a right start/end offset value in Token properties.

 NGramTokenFilter creates bad TokenStream
 

 Key: LUCENE-1224
 URL: https://issues.apache.org/jira/browse/LUCENE-1224
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/*
Reporter: Hiroaki Kawai
Priority: Critical
 Attachments: NGramTokenFilter.patch, NGramTokenFilter.patch



[jira] Updated: (LUCENE-1224) NGramTokenFilter creates bad TokenStream

2008-03-12 Thread Hiroaki Kawai (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hiroaki Kawai updated LUCENE-1224:
--

Attachment: NGramTokenFilter.patch

 NGramTokenFilter creates bad TokenStream
 

 Key: LUCENE-1224
 URL: https://issues.apache.org/jira/browse/LUCENE-1224
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/*
Reporter: Hiroaki Kawai
Priority: Critical
 Attachments: NGramTokenFilter.patch


