Tim Allison created LUCENE-5317:
-----------------------------------
Summary: [PATCH] Concordance capability
Key: LUCENE-5317
URL: https://issues.apache.org/jira/browse/LUCENE-5317
Project: Lucene - Core
Issue Type: New Feature
Components: core/search
Affects Versions: 4.5
Reporter: Tim Allison
Fix For: 4.6
Attachments: concordance_v1.patch.gz
This patch enables a Lucene-powered concordance search capability.
Concordances are extremely useful for linguists, lawyers and other analysts
who perform analytic search rather than traditional snippeting or
document-retrieval tasks.
By "analytic search," I mean that the user wants to browse every occurrence
of a term (or at least the top n) in a subset of documents and see the words
before and after each one.
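
To make the idea of a concordance window concrete, here is a minimal sketch
in plain Java. It is not taken from the attached patch: the class
ConcordanceWindowSketch, the whitespace tokenization and the parameters
width and topN are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of concordance windows; not code from the patch. */
final class ConcordanceWindowSketch {

    /** One hit: the words before the target, the target itself, the words after. */
    static final class Window {
        final String pre;
        final String target;
        final String post;

        Window(String pre, String target, String post) {
            this.pre = pre;
            this.target = target;
            this.post = post;
        }

        @Override
        public String toString() {
            return pre + " [" + target + "] " + post;
        }
    }

    /**
     * Returns up to topN windows of `width` words on each side of `term`.
     * Assumption: whitespace tokenization and case-insensitive matching
     * stand in for a real analyzer.
     */
    static List<Window> windows(String text, String term, int width, int topN) {
        List<Window> hits = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return hits;
        }
        String[] tokens = trimmed.split("\\s+");
        for (int i = 0; i < tokens.length && hits.size() < topN; i++) {
            if (!tokens[i].equalsIgnoreCase(term)) {
                continue;
            }
            // Clamp the window to the bounds of the document.
            int from = Math.max(0, i - width);
            int to = Math.min(tokens.length, i + 1 + width);
            String pre = String.join(" ", Arrays.copyOfRange(tokens, from, i));
            String post = String.join(" ", Arrays.copyOfRange(tokens, i + 1, to));
            hits.add(new Window(pre, tokens[i], post));
        }
        return hits;
    }
}
```

Calling ConcordanceWindowSketch.windows(text, "fox", 2, 10) returns one
Window per occurrence of "fox", each with at most two words of context on
either side, stopping after ten hits.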
Concordance technology is far simpler and less interesting than IR relevance
models/methods, but it can be extremely useful for some use cases.
Traditional concordance sort orders are available (sort on words before the
target, words after, target then words before and target then words after).
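
The four sort orders can be sketched as comparators over a simple hit
record. This is an illustration only, not the patch's implementation: the
class ConcordanceSortSketch and its Hit type are hypothetical, and the
sketch assumes that a "words before" sort compares the word nearest the
target first, which is one common concordance convention.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Hypothetical illustration of concordance sort orders; not code from the patch. */
final class ConcordanceSortSketch {

    /** One hit, with context already split into words. */
    static final class Hit {
        final List<String> pre;
        final String target;
        final List<String> post;

        Hit(String pre, String target, String post) {
            this.pre = words(pre);
            this.target = target;
            this.post = words(post);
        }

        private static List<String> words(String s) {
            String t = s.trim();
            return t.isEmpty()
                ? Collections.<String>emptyList()
                : Arrays.asList(t.split("\\s+"));
        }
    }

    /** Assumption: pre-context is read outward from the target, so reverse it. */
    static String preKey(Hit h) {
        List<String> reversed = new ArrayList<>(h.pre);
        Collections.reverse(reversed);
        return String.join(" ", reversed).toLowerCase(Locale.ROOT);
    }

    static String postKey(Hit h) {
        return String.join(" ", h.post).toLowerCase(Locale.ROOT);
    }

    static String targetKey(Hit h) {
        return h.target.toLowerCase(Locale.ROOT);
    }

    // Sort on the words before the target.
    static final Comparator<Hit> PRE =
        Comparator.comparing(ConcordanceSortSketch::preKey);
    // Sort on the words after the target.
    static final Comparator<Hit> POST =
        Comparator.comparing(ConcordanceSortSketch::postKey);
    // Sort on the target, then on the words before it.
    static final Comparator<Hit> TARGET_PRE =
        Comparator.comparing(ConcordanceSortSketch::targetKey).thenComparing(PRE);
    // Sort on the target, then on the words after it.
    static final Comparator<Hit> TARGET_POST =
        Comparator.comparing(ConcordanceSortSketch::targetKey).thenComparing(POST);
}
```

A list of hits can then be ordered with, for example,
hits.sort(ConcordanceSortSketch.TARGET_POST). Comparing joined, lowercased
strings is a simplification; a real implementation would probably need
locale-aware collation.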
Under the hood, the patch runs SpanQuery's getSpans() and then reanalyzes the
text to obtain character offsets. There is plenty of room for optimization
and refactoring.
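
The reanalysis step can be illustrated without any Lucene classes. The
sketch below is an assumption-laden stand-in, not the patch's code: a span
is modeled as a pair of token positions (start inclusive, end exclusive),
and a whitespace regex plays the role of the analyzer. OffsetRecoverySketch
and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of recovering character offsets; not code from the patch. */
final class OffsetRecoverySketch {

    // Assumption: runs of non-whitespace stand in for an analyzer's tokens.
    private static final Pattern TOKEN = Pattern.compile("\\S+");

    /** Re-tokenizes the text and records {startChar, endChar} per token position. */
    static List<int[]> tokenOffsets(String text) {
        List<int[]> offsets = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            offsets.add(new int[] {m.start(), m.end()});
        }
        return offsets;
    }

    /**
     * Maps token positions [spanStart, spanEnd) to a character range
     * {startChar, endChar} in the original text.
     */
    static int[] charRange(String text, int spanStart, int spanEnd) {
        List<int[]> offsets = tokenOffsets(text);
        if (spanStart < 0 || spanEnd <= spanStart || spanEnd > offsets.size()) {
            throw new IllegalArgumentException("span out of range");
        }
        int startChar = offsets.get(spanStart)[0];
        int endChar = offsets.get(spanEnd - 1)[1];
        return new int[] {startChar, endChar};
    }
}
```

With the returned range, text.substring(range[0], range[1]) yields the
matched text, and the same offset table can be used to cut the surrounding
context. Re-tokenizing on every hit is the obvious cost here, which fits the
remark about room for optimization.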
Many thanks to my colleague, Jason Robinson, for input on the design of this
patch.