Hi Kendall,
"Position" and "Offset" are often confused in Lucene ;)
Lucene uses offset to track what you described: a character (not byte)
offset into a text file, or more generally into an indexed string.
Lucene uses position to track the Nth token: position 0 is the first token,
position 1 is the second token, etc. Since tokens are usually longer than
one character, offsets grow faster than positions. Tokens also need not
form a purely linear sequence: they can form a graph structure when
multi-token synonyms are applied.
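
To make the difference concrete, here is a rough plain-Java sketch. It is
not Lucene's analysis API: the PositionOffsetSketch class, its Token type,
and the whitespace-only tokenization are all made up for illustration, and
a real analyzer does much more (lowercasing, stemming, synonyms, and so on).

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a hypothetical whitespace tokenizer, not Lucene's API.
class PositionOffsetSketch {

    // Hypothetical token: the term, its position (Nth token), and the
    // [startOffset, endOffset) character range it covers in the input.
    static final class Token {
        final String term;
        final int position;
        final int startOffset;
        final int endOffset;

        Token(String term, int position, int startOffset, int endOffset) {
            this.term = term;
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    // Splits on whitespace; positions count tokens, offsets count characters.
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int position = 0;
        int i = 0;
        while (i < text.length()) {
            // Skip whitespace between tokens.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume the token's characters.
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                tokens.add(new Token(text.substring(start, i), position++, start, i));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("searching with python")) {
            System.out.println(t.position + " " + t.term
                    + " [" + t.startOffset + "," + t.endOffset + ")");
        }
    }
}
```

For "searching with python" the positions are 0, 1, 2 while the start
offsets are 0, 10, 15, which is the "offsets grow faster" effect.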
Lucene can index both of these, and you can turn each one on or off
individually if you want.
Finally, you might be interested in Lucene's highlighter module, which
contains tooling for hit highlighting. It solves the "final inch" problem
of showing your users precisely which words/excerpts matched inside each
matched hit. Here's an example
<https://jirasearch.mikemccandless.com/search.py?chg=new&text=python&a1=&a2=&page=0&searcher=24390&sort=recentlyUpdated&format=list&id=jvmz29ec86du&dd=project%3ALucene&newText=python>
(searching Lucene's issues for the word "python").
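
As a rough idea of why character offsets matter for highlighting, here is a
small plain-Java sketch. It is not the highlighter module's API: the
OffsetHighlightSketch class and its highlight method are hypothetical, and
it assumes the match offsets are already known, sorted, and non-overlapping.

```java
// Illustration only: hypothetical helper, not Lucene's highlighter API.
class OffsetHighlightSketch {

    // Wraps each [start, end) character range in <b>...</b>.
    // Assumes the ranges are sorted by start and do not overlap.
    static String highlight(String text, int[][] matchOffsets) {
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (int[] range : matchOffsets) {
            int start = range[0];
            int end = range[1];
            out.append(text, last, start);   // text before the match
            out.append("<b>");
            out.append(text, start, end);    // the matched characters
            out.append("</b>");
            last = end;
        }
        out.append(text, last, text.length()); // trailing text
        return out.toString();
    }

    public static void main(String[] args) {
        // "python" covers characters [11, 17) of this string.
        System.out.println(highlight("Lucene has python bindings", new int[][] {{11, 17}}));
    }
}
```

Positions alone would only say that the third token matched; the offsets
are what let you cut the original string at the right characters.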
Mike McCandless
http://blog.mikemccandless.com
On Fri, Jul 22, 2022 at 12:51 AM Mikhail Khludnev <[email protected]> wrote:
> Hello, Kendall.
>
> You can read about Token Position Increments at
>
> https://lucene.apache.org/core/9_2_0/core/org/apache/lucene/analysis/package-summary.html#package.description
> Usually position is the ordinal of a word and offset is the ordinal of a
> character. Modeling entries via positions is boilerplate, I suppose.
> Nowadays we usually denormalize by copying values from the children into a
> single parent document. Alternatively, here are more relational options:
>
> https://lucene.apache.org/core/9_2_0/join/org/apache/lucene/search/join/package-summary.html
>
>
> On Fri, Jul 22, 2022 at 7:02 AM Kendall Shaw <[email protected]>
> wrote:
>
> > Hi,
> >
> > I'm trying to figure out if I should be learning to use Lucene. I
> > imagine wanting to provide a user with a way to search for something and
> > present that found thing, in some way. If what is ultimately searched is
> > text files, then position would be an offset into the text file, I
> > think. But, that seems like a pretty unlikely scenario.
> >
> > If I have stored structured data into a database of some sort, does
> > Lucene provide some way to associate a position with an entry in a
> > database? Or is that left to the programmer to implement, outside of
> > Lucene?
> >
> > Kendall
> >
> >
>
> --
> Sincerely yours
> Mikhail Khludnev
>