[ 
https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029416#comment-13029416
 ] 

Robert Muir commented on LUCENE-3071:
-------------------------------------

bq. Can you help me with the purpose of finalOffset? Or can I simply skip it in 
my tests (they are working if I skip it)?

The finalOffset is supposed to be the offset at the end of the entire input, 
not just the end of the last token. This is what keeps offsets correct on 
multivalued fields.

Example multivalued field "foo" with two values:
"bar " <-- this one ends with a space
"baz"

With a whitespace tokenizer, value 1 will have a single token "bar" with 
startOffset=0, endOffset=3. But finalOffset needs to be 4 (essentially, however 
many chars you read in from the Reader).

This way, the offsets will then accumulate correctly for "baz".
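
To make the accumulation concrete, here is a minimal plain-Java sketch. It does 
not use Lucene's classes: FinalOffsetSketch, Token and tokenize are hypothetical 
names made up for illustration, and the sketch ignores any extra gap an 
analyzer may add between values. It only shows the difference between advancing 
the running base by the number of chars read and advancing it by the last 
token's endOffset.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a hand-rolled whitespace splitter, not Lucene's API.
class FinalOffsetSketch {

    /** A token with offsets relative to the start of the first field value. */
    static final class Token {
        final String term;
        final int startOffset;
        final int endOffset;

        Token(String term, int startOffset, int endOffset) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        @Override
        public String toString() {
            return term + "[" + startOffset + "," + endOffset + "]";
        }
    }

    /**
     * Splits every value on whitespace and shifts its offsets by a running base.
     * If useFinalOffset is true, the base advances by the number of chars read
     * from the value; otherwise it advances by the last token's end offset.
     */
    static List<Token> tokenize(String[] values, boolean useFinalOffset) {
        List<Token> tokens = new ArrayList<>();
        int base = 0;
        for (String value : values) {
            int n = value.length();
            int lastEnd = 0;
            int i = 0;
            while (i < n) {
                // Skip whitespace before the next token.
                while (i < n && Character.isWhitespace(value.charAt(i))) {
                    i++;
                }
                int start = i;
                // Consume the token itself.
                while (i < n && !Character.isWhitespace(value.charAt(i))) {
                    i++;
                }
                if (i > start) {
                    tokens.add(new Token(value.substring(start, i), base + start, base + i));
                    lastEnd = i;
                }
            }
            base += useFinalOffset ? n : lastEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[] foo = {"bar ", "baz"};
        System.out.println(tokenize(foo, true));   // base advanced by chars read
        System.out.println(tokenize(foo, false));  // base advanced by last endOffset
    }
}
```

With the final offset, "baz" starts at 4, right after the trailing space of the 
first value. Without it, "baz" starts at 3 and the space is lost from the 
accounting.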


> PathHierarchyTokenizer adaptation for urls: splits reversed
> -----------------------------------------------------------
>
>                 Key: LUCENE-3071
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3071
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/analyzers
>            Reporter: Olivier Favre
>            Priority: Minor
>         Attachments: LUCENE-3071.patch, ant.log.tar.bz2
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> {{PathHierarchyTokenizer}} should be usable to split urls in a "reversed" 
> way (useful for faceted search against urls):
> {{www.site.com}} -> {{www.site.com, site.com, com}}
> Moreover, it should be able to skip a given number of first (or last, if 
> reversed) tokens:
> {{/usr/share/doc/somesoftware/INTERESTING/PART}}
> Should give with 4 tokens skipped:
> {{INTERESTING}}
> {{INTERESTING/PART}}
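
The two behaviours requested in the description above can be sketched in plain 
Java as follows. This is not the attached patch and not Lucene's tokenizer: 
HierarchySplitSketch and split are hypothetical names, and the sketch assumes 
that empty components (such as the one before a leading delimiter) are dropped, 
which is what makes the output match the examples in the description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustration only: operates on whole strings, not on a Reader or TokenStream.
class HierarchySplitSketch {

    /**
     * Forward mode skips the first `skip` components and emits growing prefixes.
     * Reversed mode skips the last `skip` components and emits shrinking suffixes.
     */
    static List<String> split(String input, char delimiter, boolean reversed, int skip) {
        String delim = String.valueOf(delimiter);

        // Assumption: empty components are ignored.
        List<String> parts = new ArrayList<>();
        for (String part : input.split(Pattern.quote(delim))) {
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }

        List<String> out = new ArrayList<>();
        if (reversed) {
            int to = Math.max(parts.size() - skip, 0);
            List<String> kept = parts.subList(0, to);
            for (int start = 0; start < kept.size(); start++) {
                out.add(String.join(delim, kept.subList(start, kept.size())));
            }
        } else {
            int from = Math.min(skip, parts.size());
            List<String> kept = parts.subList(from, parts.size());
            for (int end = 1; end <= kept.size(); end++) {
                out.add(String.join(delim, kept.subList(0, end)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(split("www.site.com", '.', true, 0));
        System.out.println(split("/usr/share/doc/somesoftware/INTERESTING/PART", '/', false, 4));
    }
}
```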
