Or just remove the generics, right?
On Sep 3, 2008, at 5:09 PM, Karl Wettin (JIRA) wrote:
[ https://issues.apache.org/jira/browse/LUCENE-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628132#action_12628132 ]
Karl Wettin commented on LUCENE-1320:
-------------------------------------
OK. Either remove it or place it in some alternative contrib module?
The first choice is obviously the easiest.
ShingleMatrixFilter, a three dimensional permutating shingle filter
-------------------------------------------------------------------
Key: LUCENE-1320
URL: https://issues.apache.org/jira/browse/LUCENE-1320
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/analyzers
Affects Versions: 2.3.2
Reporter: Karl Wettin
Assignee: Karl Wettin
Priority: Blocker
Fix For: 2.4
Attachments: LUCENE-1320.txt, LUCENE-1320.txt, LUCENE-1320.txt
Backed by a column focused matrix that creates all permutations of
shingle tokens in three dimensions. I.e. it handles multi token
synonyms.
Could, for instance, in some cases be used to replace 0-slop phrase
queries with something speedier.
{code:java}
Token[][][]{
{{hello}, {greetings, and, salutations}},
{{world}, {earth}, {tellus}}
}
{code}
passes the following test with 2-3 grams:
{code:java}
assertNext(ts, "hello_world");
assertNext(ts, "greetings_and");
assertNext(ts, "greetings_and_salutations");
assertNext(ts, "and_salutations");
assertNext(ts, "and_salutations_world");
assertNext(ts, "salutations_world");
assertNext(ts, "hello_earth");
assertNext(ts, "and_salutations_earth");
assertNext(ts, "salutations_earth");
assertNext(ts, "hello_tellus");
assertNext(ts, "and_salutations_tellus");
assertNext(ts, "salutations_tellus");
{code}
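For readers who want to see how the expected tokens above come about, here is a minimal sketch, not the code in the attached patch. It assumes the matrix is indexed as column, row, token, that the first column advances fastest when stepping through the permutations, and that a shingle already emitted is not emitted again; under those assumptions it reproduces the twelve shingles listed in the test. The class name ShingleMatrixSketch and the method shingles are hypothetical, and plain strings stand in for Token objects, so offsets, position increments and payloads are ignored.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; names and structure are assumptions,
// not the ShingleMatrixFilter from the patch.
public class ShingleMatrixSketch {

    /**
     * matrix[column][row][token]. Returns the unique shingles of
     * minSize..maxSize tokens, in the order they are first produced.
     */
    static List<String> shingles(String[][][] matrix, int minSize, int maxSize) {
        Set<String> seen = new LinkedHashSet<>(); // keeps first-seen order, drops repeats
        int[] rowIndex = new int[matrix.length];  // currently selected row in each column

        while (true) {
            // Flatten the selected row of every column into one token sequence.
            List<String> tokens = new ArrayList<>();
            for (int col = 0; col < matrix.length; col++) {
                for (String token : matrix[col][rowIndex[col]]) {
                    tokens.add(token);
                }
            }

            // Emit every n-gram of the allowed sizes from each start position.
            for (int start = 0; start < tokens.size(); start++) {
                for (int size = minSize;
                     size <= maxSize && start + size <= tokens.size();
                     size++) {
                    seen.add(String.join("_", tokens.subList(start, start + size)));
                }
            }

            // Step to the next permutation like an odometer, first column fastest.
            int col = 0;
            while (col < matrix.length && ++rowIndex[col] == matrix[col].length) {
                rowIndex[col] = 0;
                col++;
            }
            if (col == matrix.length) {
                break; // every combination of rows has been visited
            }
        }
        return new ArrayList<>(seen);
    }
}
```

Calling shingles(matrix, 2, 3) with the two-column matrix shown earlier yields the shingles in the same order as the assertNext calls. Unlike this sketch, which holds the whole matrix in memory, the filter itself builds the matrix incrementally from a token stream.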
Contains tests of varying complexity that demonstrate offsets,
position increments (posincr), payload boost calculation and
construction of a matrix from a token stream.
The matrix attempts to hog as little memory as possible by seeking
no more than maximumShingleSize columns forward in the stream and
clearing up unused resources (columns and unique token sets). Can
still be optimized quite a bit though.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--------------------------
Grant Ingersoll
http://www.lucidimagination.com
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ