[ https://issues.apache.org/jira/browse/LUCENE-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596637#action_12596637 ]

Otis Gospodnetic commented on LUCENE-1227:
------------------------------------------

Thanks for the test and for addressing this!

Could you add some examples for NO_OPTIMIZE and QUERY_OPTIMIZE?  I can't tell 
from looking at the patch what those are about.  Also, note how existing 
variables use camelCaseLikeThis.  It would be good to stick to the same pattern 
(instead of bufflen, buffpos, etc.), as well as to the existing style (e.g. a 
space between "if" and the open paren, spaces around == and =, etc.).
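
For illustration only (the names below are made up, not taken from the patch), 
this is the sort of naming and whitespace style I mean:

  class NGramStyleExample {
    private int bufferLength;    // camelCase, rather than bufflen
    private int bufferPosition;  // camelCase, rather than buffpos

    boolean bufferExhausted() {
      // a space between "if" and the paren, spaces around "==" and "="
      if (bufferPosition == bufferLength) {
        bufferPosition = 0;
        return true;
      }
      return false;
    }
  }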

I'll commit as soon as you make these changes, assuming you can make them.  
Thank you.


> NGramTokenizer to handle more than 1024 chars
> ---------------------------------------------
>
>                 Key: LUCENE-1227
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1227
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>            Reporter: Hiroaki Kawai
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: LUCENE-1227.patch, NGramTokenizer.patch, 
> NGramTokenizer.patch
>
>
> The current NGramTokenizer can't handle a character stream longer than 1024 
> characters. This is too short for non-whitespace-separated languages.
> I created a patch for this issue.
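> As a rough sketch (not the attached patch itself), the idea is to keep 
> reading from the Reader in a loop instead of doing a single fixed-size 
> read, e.g.:
>
>   import java.io.IOException;
>   import java.io.Reader;
>
>   // Rough sketch only: read the whole character stream instead of a
>   // single 1024-char read, so inputs longer than 1024 are not truncated.
>   class ReadWholeStream {
>     static String readAll(Reader input) throws IOException {
>       char[] buffer = new char[1024];
>       StringBuilder sb = new StringBuilder();
>       int n;
>       while ((n = input.read(buffer)) != -1) {
>         sb.append(buffer, 0, n);
>       }
>       return sb.toString();
>     }
>   }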

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

