Tim Allison created LUCENE-5317:
-----------------------------------

             Summary: [PATCH] Concordance capability
                 Key: LUCENE-5317
                 URL: https://issues.apache.org/jira/browse/LUCENE-5317
             Project: Lucene - Core
          Issue Type: New Feature
          Components: core/search
    Affects Versions: 4.5
            Reporter: Tim Allison
             Fix For: 4.6
         Attachments: concordance_v1.patch.gz

This patch enables a Lucene-powered concordance search capability.

Concordances are extremely useful for linguists, lawyers and other analysts
performing analytic search, as opposed to traditional snippeting/document
retrieval tasks. By "analytic search," I mean that the user wants to browse
every occurrence of a term (or at least the top n) in a subset of documents
and see the words before and after each one.
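
As a rough illustration of what a concordance (keyword-in-context) line is,
here is a minimal standalone sketch in plain Java (16+, for records). It does
not use Lucene and is not the patch's code. It assumes the document is already
split into tokens. The names KwicSketch, Window, concordance, width and maxHits
are hypothetical, and maxHits plays the role of the "top n" limit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KwicSketch {
    // One concordance line: the words before the target, the target, the words after.
    record Window(String before, String target, String after) {}

    // Collect up to maxHits windows, one per occurrence of target in tokens,
    // with up to `width` tokens of context on each side.
    static List<Window> concordance(List<String> tokens, String target, int width, int maxHits) {
        List<Window> hits = new ArrayList<>();
        for (int i = 0; i < tokens.size() && hits.size() < maxHits; i++) {
            if (!tokens.get(i).equals(target)) continue;
            int from = Math.max(0, i - width);
            int to = Math.min(tokens.size(), i + 1 + width);
            hits.add(new Window(
                String.join(" ", tokens.subList(from, i)),
                tokens.get(i),
                String.join(" ", tokens.subList(i + 1, to))));
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the court held that the contract was void".split(" "));
        // Right-align the left context so the targets line up, as in a classic KWIC display.
        for (Window w : concordance(tokens, "the", 2, 10)) {
            System.out.printf("%25s [%s] %s%n", w.before(), w.target(), w.after());
        }
    }
}
```

Aligning every hit on the target is what makes a concordance browsable: the
eye can scan the column of targets and read the surrounding context at a glance.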

Concordance technology is far simpler and less interesting than IR relevance 
models/methods, but it can be extremely useful for some use cases.

Traditional concordance sort orders are available: sort on the words before
the target, on the words after the target, on the target and then the words
before it, and on the target and then the words after it.
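
The four sort orders can be sketched as Java comparators. This is a
standalone illustration, not the patch's implementation. It assumes, as is
conventional in KWIC tools, that "words before" are compared starting from the
word nearest the target and moving outward, while "words after" are compared
in reading order. ConcordanceSortSketch, Hit and the comparator names are
hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class ConcordanceSortSketch {
    // A hit: left context (in document order), the target, and right context.
    record Hit(List<String> before, String target, List<String> after) {}

    // Compare two word lists word by word; on a shared prefix the shorter list sorts first.
    static int compareWords(List<String> a, List<String> b) {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = a.get(i).compareTo(b.get(i));
            if (c != 0) return c;
        }
        return Integer.compare(a.size(), b.size());
    }

    // Assumption: left context is read outward from the target (nearest word first).
    static List<String> outward(List<String> before) {
        List<String> r = new ArrayList<>(before);
        Collections.reverse(r);
        return r;
    }

    static final Comparator<Hit> PRE = (x, y) -> compareWords(outward(x.before()), outward(y.before()));
    static final Comparator<Hit> POST = (x, y) -> compareWords(x.after(), y.after());
    static final Comparator<Hit> TARGET_PRE = Comparator.comparing(Hit::target).thenComparing(PRE);
    static final Comparator<Hit> TARGET_POST = Comparator.comparing(Hit::target).thenComparing(POST);

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>(List.of(
            new Hit(List.of("signed", "the"), "contract", List.of("was", "void")),
            new Hit(List.of("breached", "a"), "contract", List.of("in", "court")),
            new Hit(List.of("the"), "agreement", List.of("was", "signed"))));
        hits.sort(TARGET_PRE);
        for (Hit h : hits) {
            System.out.println(String.join(" ", h.before()) + " [" + h.target() + "] "
                + String.join(" ", h.after()));
        }
    }
}
```

Sorting on the nearest left word first groups hits by the word that directly
precedes the target (for example all "a contract" hits before all "the
contract" hits), which is usually what an analyst scanning for collocates wants.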

Under the hood, this runs SpanQuery's getSpans() and then reanalyzes the
text to obtain character offsets. There is plenty of room for optimization
and refactoring.
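
The reanalysis step can be sketched without Lucene. A span match reports token
positions (in Lucene 4.x, Spans.start() and the exclusive Spans.end()), so the
text has to be tokenized again to recover each token's character offsets, the
information Lucene exposes through OffsetAttribute on a TokenStream. In the
sketch below, a hypothetical regex tokenizer stands in for the real Analyzer,
and every token advances the position by exactly 1. Real analyzers can use
larger position increments (e.g. after removed stopwords), which this ignores.
ReanalyzeSketch, reanalyze and charRange are illustrative names, not the
patch's.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReanalyzeSketch {
    // Character offsets of one token, analogous to Lucene's OffsetAttribute.
    record Token(int startOffset, int endOffset) {}

    // Hypothetical analyzer: each maximal run of letters/digits is one token,
    // and the list index is the token's position (position increment of 1).
    static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    static List<Token> reanalyze(String text) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            tokens.add(new Token(m.start(), m.end()));
        }
        return tokens;
    }

    // Map a span of token positions [spanStart, spanEnd), end exclusive as in
    // Spans.end(), to the character range [start, end) covering those tokens.
    static int[] charRange(List<Token> tokens, int spanStart, int spanEnd) {
        return new int[] {tokens.get(spanStart).startOffset(), tokens.get(spanEnd - 1).endOffset()};
    }

    public static void main(String[] args) {
        String text = "The court held that the contract was void.";
        List<Token> tokens = reanalyze(text);
        // Suppose a span query matched positions 4 and 5 ("the contract").
        int[] r = charRange(tokens, 4, 6);
        System.out.println(text.substring(0, r[0]) + "[" + text.substring(r[0], r[1]) + "]"
            + text.substring(r[1]));
    }
}
```

Running main prints the sentence with the matched span bracketed:
"The court held that [the contract] was void." Once the character range is
known, the words before and after can be cut from the original text with
its punctuation and spacing intact.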

Many thanks to my colleague, Jason Robinson, for input on the design of this 
patch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
