[ 
https://issues.apache.org/jira/browse/LUCENE-5627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978757#comment-13978757
 ] 

Paul Elschot commented on LUCENE-5627:
--------------------------------------

This adds a module called "label" as a prototype for index-time positional 
joins by labeled text fragments.

This provides a 1 : 0..n positional join.
It generalizes FieldMaskingSpanQuery, which provides a 1 : 1 positional join.

At indexing time labeled text fragments for a document are analysed from a 
TokenStream.

In package org.apache.lucene.analysis.label, such a stream of labeled 
fragments is split into a label stream and into pairs of streams, one pair per 
fragment stream: one for the fragments and one for the fragment positions.
A fragment is a series of tokens, possibly empty.
The fragments in each fragment stream are contiguous; the labels and the other 
fragment streams have no influence on their positions.
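
The split described above can be pictured with a small, self-contained sketch.
It does not use the classes from the patch: the names LabeledFragment, Split
and split() are hypothetical, plain strings stand in for tokens, and only a
single fragment stream is shown. It only illustrates how contiguous fragment
positions could be assigned independently of the labels.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustration only: none of these names come from the LUCENE-5627 patch.
class LabeledFragmentSplitSketch {

    /** Hypothetical input unit: a label with its (possibly empty) series of tokens. */
    static final class LabeledFragment {
        final String label;
        final List<String> tokens;

        LabeledFragment(String label, List<String> tokens) {
            this.label = label;
            this.tokens = tokens;
        }
    }

    /** Hypothetical output: one label stream plus one fragment/positions pair. */
    static final class Split {
        final List<String> labels = new ArrayList<>();
        final List<String> fragmentTokens = new ArrayList<>();
        // One {start, end} pair per label; end is exclusive, start == end means empty.
        final List<int[]> fragmentPositions = new ArrayList<>();
    }

    static Split split(List<LabeledFragment> input) {
        Split out = new Split();
        int next = 0; // next free token position in the fragment stream
        for (LabeledFragment lf : input) {
            out.labels.add(lf.label);
            int start = next;
            out.fragmentTokens.addAll(lf.tokens);
            next += lf.tokens.size(); // labels do not consume fragment positions
            out.fragmentPositions.add(new int[] {start, next});
        }
        return out;
    }

    public static void main(String[] args) {
        Split s = split(Arrays.asList(
            new LabeledFragment("title", Arrays.asList("positional", "joins")),
            new LabeledFragment("note", Collections.<String>emptyList()),
            new LabeledFragment("body", Arrays.asList("labeled", "text", "fragments"))));
        for (int i = 0; i < s.labels.size(); i++) {
            System.out.println(s.labels.get(i) + " -> "
                + Arrays.toString(s.fragmentPositions.get(i)));
        }
    }
}
```

In this sketch the empty "note" fragment still gets a position pair, so every
label keeps exactly one entry even when it joins to zero tokens.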

The output streams can be used to provide documents with different fields per 
stream.
It is up to the user to associate the output streams with fields in documents 
to be indexed for search.

Labels and fragments are represented at query time by Spans.
Querying labeled fragments with positional joins is supported in package 
org.apache.lucene.search.spans.label.

This implementation uses EliasFanoBytes (LUCENE-5524) to compress a payload 
with start/end positions.
These have a value index, which allows for fast fragment to label associations.
Currently these have no position index, so label to fragment associations will 
be somewhat slower.
Since payloads need to be loaded completely during searches, I don't expect 
high performance for larger payloads; that is why this is a prototype.
All code javadocs are marked experimental.
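
To make the two directions of association concrete, here is a minimal sketch
that is not taken from the patch. It assumes the fragment end positions are
available as a plain sorted int array, which stands in for the compressed
payload, and the names labelOf and fragmentOf are hypothetical. Going from a
token position to its label is a search by value; going from a label to its
fragment is a lookup by index.

```java
import java.util.Arrays;

// Illustration only: a plain array replaces the Elias-Fano encoded payload.
class FragmentLabelLookupSketch {

    /**
     * Fragment to label: index of the fragment containing tokenPosition,
     * or -1 if there is none. fragmentEnds holds exclusive end positions
     * in non-decreasing order; equal neighbours mark an empty fragment.
     */
    static int labelOf(int[] fragmentEnds, int tokenPosition) {
        if (tokenPosition < 0) {
            return -1;
        }
        int lo = 0;
        int hi = fragmentEnds.length; // search the half-open range [lo, hi)
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (fragmentEnds[mid] > tokenPosition) {
                hi = mid; // first end above the position is at mid or earlier
            } else {
                lo = mid + 1;
            }
        }
        return lo < fragmentEnds.length ? lo : -1;
    }

    /** Label to fragment: {start, end} of the fragment for the given label index. */
    static int[] fragmentOf(int[] fragmentEnds, int label) {
        int start = label == 0 ? 0 : fragmentEnds[label - 1];
        return new int[] {start, fragmentEnds[label]};
    }

    public static void main(String[] args) {
        int[] ends = {2, 2, 5}; // fragments [0,2), [2,2) (empty) and [2,5)
        System.out.println(labelOf(ends, 3));
        System.out.println(Arrays.toString(fragmentOf(ends, 1)));
    }
}
```

With an array both lookups are cheap. In a compressed encoding that has no
position index, the lookup by index would presumably have to decode
sequentially, which matches the slower label to fragment direction mentioned
above.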


> Positional joins
> ----------------
>
>                 Key: LUCENE-5627
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5627
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Paul Elschot
>            Priority: Minor
>
> Prototype of analysis and search for labeled fragments


