[ https://issues.apache.org/jira/browse/LUCENE-5627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978757#comment-13978757 ]
Paul Elschot commented on LUCENE-5627:
--------------------------------------

This adds a module called "label" as a prototype for index-time positional joins by labeled text fragments. It provides a 1 : 0..n positional join, generalizing FieldMaskingSpanQuery, which provides a 1 : 1 positional join.

At indexing time, labeled text fragments for a document are analysed from a TokenStream. In package org.apache.lucene.analysis.label such a labeled fragments stream is split into a label stream and into pairs of streams for fragments and fragment positions. A fragment is a series of tokens, possibly empty. The fragments in each fragment stream are contiguous; the labels and the other fragment streams have no influence on their positions.

The output streams can be used to provide documents with different fields per stream. It is up to the user to associate the output streams with fields in documents to be indexed for search.

Labels and fragments are represented at query time by Spans. Querying labeled fragments with positional joins is supported in package org.apache.lucene.search.spans.label.

This implementation uses EliasFanoBytes (LUCENE-5524) to compress a payload with start/end positions. These have a value index, which allows for fast fragment to label associations. Currently these have no position index, so label to fragment associations will be somewhat slower.

Since payloads need to be loaded completely during searches, I don't expect high performance for larger payloads, which is why this is a prototype. All code javadocs are marked experimental.
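As a rough illustration of the 1 : 0..n join described above, here is a minimal sketch in plain Java. It is not the code in the patch and does not use any Lucene classes: the class name PositionalJoinSketch and all of its methods are hypothetical, it handles a single fragment stream only, and it stores fragment end positions in a plain list where the patch compresses start/end positions with EliasFanoBytes. Each label position maps to a contiguous, possibly empty range of token positions in the fragment stream, and a binary search over the end positions stands in for the fragment to label lookup.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not the API of the LUCENE-5627 patch. */
class PositionalJoinSketch {
    // Label stream: one label per label position.
    final List<String> labels = new ArrayList<>();
    // Fragment stream: tokens of all fragments, concatenated so fragments are contiguous.
    final List<String> fragmentTokens = new ArrayList<>();
    // ends.get(i) is the exclusive end position of fragment i in fragmentTokens.
    // The list is non-decreasing; equal neighbours mean an empty fragment.
    final List<Integer> ends = new ArrayList<>();

    /** Adds one labeled fragment; the fragment may have zero tokens. */
    void add(String label, String... tokens) {
        labels.add(label);
        for (String token : tokens) {
            fragmentTokens.add(token);
        }
        ends.add(fragmentTokens.size());
    }

    /** Label to fragment: inclusive start position of the fragment for a label position. */
    int fragmentStart(int labelPos) {
        return labelPos == 0 ? 0 : ends.get(labelPos - 1);
    }

    /** Label to fragment: exclusive end position of the fragment for a label position. */
    int fragmentEnd(int labelPos) {
        return ends.get(labelPos);
    }

    /**
     * Fragment to label: returns the position of the label whose fragment
     * contains tokenPos, or -1 if tokenPos is outside the fragment stream.
     * Finds the first fragment whose end position is greater than tokenPos.
     */
    int labelPositionOf(int tokenPos) {
        if (tokenPos < 0 || tokenPos >= fragmentTokens.size()) {
            return -1;
        }
        int lo = 0;
        int hi = ends.size() - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (ends.get(mid) > tokenPos) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        PositionalJoinSketch sketch = new PositionalJoinSketch();
        sketch.add("title", "positional", "joins");
        sketch.add("note");                               // empty fragment
        sketch.add("body", "labeled", "text", "fragments");
        int labelPos = sketch.labelPositionOf(3);         // token "text"
        System.out.println(sketch.labels.get(labelPos));
    }
}
```

In this sketch both directions of the join are cheap because the end positions are held in an uncompressed list. The asymmetry mentioned above (fast fragment to label, slower label to fragment) comes from the compressed encoding in the patch and is not reproduced here.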
> Positional joins
> ----------------
>
>                 Key: LUCENE-5627
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5627
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Paul Elschot
>            Priority: Minor
>
> Prototype of analysis and search for labeled fragments