Take a look at LUCENE-5317 [1] and LUCENE-5318 [2].
They're available on my GitHub site [3], and I've pushed them to Maven Central
[4].
LUCENE-5318 is crazily useful as a term/phrase recommender system.
I haven't documented either very well yet. I'll try to add documentation to my
GitHub site tomorrow.
Let me know if you have any questions.
Cheers,
Tim
[1] https://issues.apache.org/jira/browse/LUCENE-5317
[2] https://issues.apache.org/jira/browse/LUCENE-5318
[3] https://github.com/tballison/lucene-addons (both 5317 and 5318 are under
the "5317 project")
[4] https://mvnrepository.com/artifact/org.tallison.lucene/lucene-5317/6.2-0.1
-----Original Message-----
From: José Tomás Atria [mailto:[email protected]]
Sent: Monday, September 19, 2016 3:32 PM
To: [email protected]
Subject: Cooccurrence matrices
Hello All,
I'm trying to use Lucene in order to create a sliding window cooccurrence
matrix. I've found some old discussion threads on this list that provide some
pointers, but most of those are for really old Lucene versions, or rely on
components that are no longer available.
So far, I have tried walking over every document, collecting their term vectors,
and then counting cooccurrences based on each term vector's per-document index,
but this seems a little inefficient to me (not to mention that it requires
term vectors), and I was wondering if anyone here has another idea of how to
extract cooccurrence counts from a Lucene index.
Just to be clear: what I need is to collect cooccurrence counts for all terms
within a (possibly asymmetric) sliding window around a focal term, for each term
in an index.
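
To make the counting itself concrete, here is a minimal sketch in plain Java. It
assumes the tokens of one document are already available in position order as an
in-memory list; it does not touch a Lucene index, and the class name
CooccurrenceSketch and the parameters left and right are only illustrative.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: counts cooccurrences over an
// in-memory token sequence that is assumed to be in position order.
class CooccurrenceSketch {

    // For each focal token, count the tokens found up to `left` positions
    // before it and up to `right` positions after it. The window is
    // asymmetric whenever left != right.
    static Map<String, Map<String, Integer>> count(List<String> tokens, int left, int right) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            String focal = tokens.get(i);
            // Clamp the window to the bounds of the document.
            int start = Math.max(0, i - left);
            int end = Math.min(tokens.size() - 1, i + right);
            for (int j = start; j <= end; j++) {
                if (j == i) {
                    continue; // skip the focal position itself
                }
                counts.computeIfAbsent(focal, k -> new HashMap<>())
                      .merge(tokens.get(j), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Calling CooccurrenceSketch.count(tokens, 1, 2) would count one token to the left
and two to the right of each focal term; summing the returned maps over all
documents would give the corpus-wide matrix.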
Any ideas would be greatly appreciated. Thanks!
jta
--
sent from a phone. please excuse terseness and tpyos.
enviado desde un teléfono. por favor disculpe la parquedad y los erroers.