[
https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062383#comment-13062383
]
Michael McCandless commented on LUCENE-3233:
--------------------------------------------
bq. But the lookup on the original is still faster, right?
That was before we optimized FST for this use case.
Now, from the testing above, it looks like we are faster when syns actually
match; if no syns match the two are around the same speed.
Separately: shouldn't the default text_en field type ship with no syns at all?
E.g., we could keep a synonyms.txt but comment out all the rules in it.
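For illustration only, a synonyms.txt with every rule commented out might look
roughly like the fragment below. The rules shown are hypothetical examples, not
necessarily the contents of the shipped file; the only assumption about the
format is that lines starting with # are ignored.

```
# synonyms.txt: every rule is commented out, so the filter matches nothing.
# Uncomment or add rules to enable synonyms for this field type.
#
#GB,gib,gigabyte,gigabytes
#Television, Televisions, TV, TVs
#pixima => pixma
```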
I don't think we should keep the old one around, i.e., we should [eventually]
replace it with the new one.
> HuperDuperSynonymsFilter™
> -------------------------
>
> Key: LUCENE-3233
> URL: https://issues.apache.org/jira/browse/LUCENE-3233
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Robert Muir
> Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch,
> LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch,
> LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch,
> LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, synonyms.zip
>
>
> The current SynonymFilter uses a lot of RAM and CPU, especially at build
> time.
> I think yesterday I heard about "huge synonyms files" three times.
> So, I think we should use an FST-based structure, sharing the inputs and
> outputs.
> And we should be more efficient with the TokenStream API, e.g. using
> captureState()/restoreState() instead of cloneAttributes()
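
To make the "sharing the inputs" idea in the description concrete, here is a
minimal sketch in plain Java. It is not Lucene's FST and not the code in the
patch: it uses a simple trie, which shares only common prefixes of the input
token sequences (an FST can also share suffixes and outputs). The class name
SynonymTrie and its methods are hypothetical, and for simplicity it replaces
matched tokens rather than stacking synonyms next to the originals.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical prefix-sharing synonym table; a trie stand-in, not Lucene's FST. */
class SynonymTrie {
    private static final class Node {
        final Map<String, Node> next = new HashMap<>();
        List<String> output; // non-null only where a complete rule ends
    }

    private final Node root = new Node();
    private int nodeCount = 1; // counts the root

    /** Adds a rule mapping an input token sequence to its replacement tokens. */
    void add(List<String> input, List<String> output) {
        Node node = root;
        for (String token : input) {
            Node child = node.next.get(token);
            if (child == null) {
                // Only allocate a node when no earlier rule shares this prefix.
                child = new Node();
                node.next.put(token, child);
                nodeCount++;
            }
            node = child;
        }
        node.output = output;
    }

    /** Number of trie nodes, including the root. */
    int nodeCount() {
        return nodeCount;
    }

    /** Rewrites a token list, preferring the longest rule starting at each position. */
    List<String> apply(List<String> tokens) {
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            Node node = root;
            List<String> best = null;
            int bestEnd = i;
            for (int j = i; j < tokens.size(); j++) {
                node = node.next.get(tokens.get(j));
                if (node == null) {
                    break; // no rule continues with this token
                }
                if (node.output != null) {
                    best = node.output; // remember the longest match so far
                    bestEnd = j + 1;
                }
            }
            if (best == null) {
                result.add(tokens.get(i)); // no rule starts here; pass through
                i++;
            } else {
                result.addAll(best);
                i = bestEnd;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SynonymTrie trie = new SynonymTrie();
        trie.add(List.of("domain", "name", "system"), List.of("dns"));
        trie.add(List.of("domain", "name", "service"), List.of("dns"));
        System.out.println(trie.apply(List.of("the", "domain", "name", "system", "works")));
        System.out.println(trie.nodeCount());
    }
}
```

In this example the two rules share the "domain name" prefix, so the table
holds five nodes (including the root) instead of the seven that two
independent chains would need. With huge synonyms files that kind of sharing
is where the memory savings would be expected to come from.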
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira