[ 
https://issues.apache.org/jira/browse/LUCENE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429025#comment-13429025
 ] 

Uwe Schindler commented on LUCENE-4216:
---------------------------------------

It is also much more performant, as your code repeatedly creates regex 
matchers and copies the token chars to new Strings, instead of working 
directly on the CharTermAttribute (which extends CharSequence, so regexes 
can be applied to it directly).
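
A minimal sketch of the difference, using only java.util.regex. The class 
name, the method names and the digit pattern are hypothetical and are not 
taken from the attached ArabicTokenizer.java; a StringBuilder stands in for 
the CharTermAttribute, since both implement CharSequence.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class TermRegexSketch {
    // Hypothetical pattern, for illustration only: one or more digits.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // A single Matcher, re-targeted with reset() for every term.
    // Matcher is not thread-safe, so keep one per filter instance.
    private final Matcher matcher = DIGITS.matcher("");

    /** Wasteful variant: copies the term to a String and allocates a Matcher per call. */
    public static boolean isNumberCopying(CharSequence term) {
        return DIGITS.matcher(term.toString()).matches();
    }

    /** Cheaper variant: matches against the CharSequence itself, reusing the Matcher. */
    public boolean isNumber(CharSequence term) {
        return matcher.reset(term).matches();
    }
}
```

Inside a TokenFilter, the CharTermAttribute instance would be passed as the 
term argument in place of the StringBuilder, so no per-token String is 
created.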
                
> Token X exceeds length of provided text sized X
> -----------------------------------------------
>
>                 Key: LUCENE-4216
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4216
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/highlighter
>    Affects Versions: 4.0-ALPHA
>         Environment: Windows 7, jdk1.6.0_27
>            Reporter: Ibrahim
>         Attachments: ArabicTokenizer.java, myApp.zip
>
>
> I'm facing this exception:
> org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token رأيكم 
> exceeds length of provided text sized 170
>       at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:233)
>       at classes.myApp$16$1.run(myApp.java:1508)
> I could not find anything wrong in my code after I started migrating from 
> Lucene 3.6 to 4.0. I found similar issues involving HTMLStripCharFilter, 
> e.g. LUCENE-3690 and LUCENE-2208, but none involving SimpleHTMLFormatter, 
> so I'm raising this here to see whether there is really a bug or whether 
> something is wrong in my code with v4. The code that I'm using:
> final Highlighter highlighter = new Highlighter(
>     new SimpleHTMLFormatter("<font color=red>", "</font>"),
>     new QueryScorer(query));
> .......
> final TokenStream tokenStream = TokenSources.getAnyTokenStream(
>     defaultSearcher.getIndexReader(), j, "Line", analyzer);
> final TextFragment[] frag = highlighter.getBestTextFragments(
>     tokenStream, doc.get("Line"), false, 10);
> Please note that this works fine with v3.6.



