AW: fuzzy/case insensitive AnalyzingSuggester )

2015-01-24 Thread Clemens Wyss DEV
I am back on this topic ;)

> Case- and diacritics insensitivity is supported out-of-the-box by the
> analyzing suggesters, including the FuzzySuggester.
> The logic is in the Analyzer.
So how do I force case-insensitivity?
I tried
...
<str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FuzzyLookupFactory</str>
<str name="ignoreCase">true</str>
...
or
...
<str name="lookupImpl">org.apache.solr.spelling.suggest.fst.AnalyzingLookupFactory</str>
<str name="ignoreCase">true</str>
...
to no avail.

-Original Message-
From: Oliver Christ [mailto:ochr...@ebsco.com] 
Sent: Friday, June 20, 2014 15:52
To: java-user@lucene.apache.org
Subject: RE: fuzzy/case insensitive AnalyzingSuggester )

Hi Clemens,

I haven't yet built a suggester which combines all three, and am not aware of 
one. I'd love to have one though ;-)

Case- and diacritics insensitivity is supported out-of-the-box by the analyzing 
suggesters, including the FuzzySuggester. The logic is in the Analyzer.
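
To illustrate why the normalization belongs in the analyzer rather than in the suggester, here is a minimal plain-Java sketch. It is not Lucene's implementation: the class `FoldingSuggesterSketch` and its `analyze` method are hypothetical stand-ins for an analyzer chain that lowercases and folds diacritics. The same function is applied to the entries at build time and to the user input at lookup time, so both sides meet on the same key while the original surface form is what gets returned. As far as I know, in Solr the analyzing lookup factories pick up their analyzer from a field type via the `suggestAnalyzerFieldType` parameter, so that field type would be the place for lowercasing and folding filters.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for an analyzing suggester (assumption: NOT Lucene code).
class FoldingSuggesterSketch {
    // Analyzed key -> surface forms that share this key.
    private final TreeMap<String, List<String>> index = new TreeMap<>();

    // Hypothetical "analyzer": strips combining marks, then lowercases.
    static String analyze(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    // Build time: index the analyzed form, remember the surface form.
    void add(String surfaceForm) {
        index.computeIfAbsent(analyze(surfaceForm), k -> new ArrayList<>())
             .add(surfaceForm);
    }

    // Lookup time: analyze the input the same way, then do a prefix scan.
    List<String> lookup(String userInput) {
        String prefix = analyze(userInput);
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : index.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // keys are sorted, so no later key can match
            }
            result.addAll(e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        FoldingSuggesterSketch s = new FoldingSuggesterSketch();
        s.add("Z\u00fcrich"); // "Zuerich" spelled with u-umlaut
        s.add("Zug");
        // Input differs in case and lacks the umlaut, but still matches.
        System.out.println(s.lookup("zur"));
    }
}
```

The suggester itself never compares raw strings; swapping in a different `analyze` changes the matching behavior without touching the lookup code.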

I haven't yet tried out AnalyzingInfixSuggester, and haven't investigated 
whether it's possible to combine that with FuzzySuggester (which also is an 
analyzing suggester).

Due to memory constraints, we build infix suggesters by adding each relevant 
substring, but use WFST suggesters with payloads as the base, to reduce RAM 
load at runtime. We call the analyzer in the dictionary iterator. At search 
time, we look up the surface form (completion) in a secondary index using the 
payload as a key (and for deduping).
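
The substring-plus-payload approach described above can be sketched roughly as follows. This is a plain-Java illustration under stated assumptions, not the setup actually used: a `TreeMap` stands in for the WFST prefix automaton, a `HashMap` stands in for the secondary index, "relevant substrings" are assumed to be suffixes starting at word boundaries, and the analyzer is reduced to lowercasing. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class InfixViaSuffixesSketch {
    // Stand-in for the prefix automaton: analyzed substring -> payload ids.
    private final TreeMap<String, Set<Integer>> prefixIndex = new TreeMap<>();
    // Stand-in for the secondary index: payload id -> surface form.
    private final Map<Integer, String> surfaceForms = new HashMap<>();

    void add(int payload, String surfaceForm) {
        surfaceForms.put(payload, surfaceForm);
        String analyzed = surfaceForm.toLowerCase(Locale.ROOT);
        // Add every suffix that starts at a word boundary (assumed policy).
        for (int i = 0; i < analyzed.length(); i++) {
            if (i == 0 || analyzed.charAt(i - 1) == ' ') {
                prefixIndex
                    .computeIfAbsent(analyzed.substring(i), k -> new LinkedHashSet<>())
                    .add(payload);
            }
        }
    }

    List<String> lookup(String input) {
        String prefix = input.toLowerCase(Locale.ROOT);
        // The payload is the dedup key: several substrings of one entry
        // may match, but the entry should be suggested only once.
        Set<Integer> payloads = new LinkedHashSet<>();
        for (Map.Entry<String, Set<Integer>> e : prefixIndex.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // sorted keys: nothing further can match
            }
            payloads.addAll(e.getValue());
        }
        // Resolve payloads to surface forms via the secondary index.
        List<String> result = new ArrayList<>();
        for (int id : payloads) {
            result.add(surfaceForms.get(id));
        }
        return result;
    }

    public static void main(String[] args) {
        InfixViaSuffixesSketch s = new InfixViaSuffixesSketch();
        s.add(1, "New York Times");
        s.add(2, "York Minster");
        System.out.println(s.lookup("york"));
    }
}
```

Note that the prefix structure only ever returns payload ids, which mirrors the point that the surface form stored in the automaton is never used in this setup.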

If FuzzySuggester supports payloads (haven't checked), you could get an infix 
suggester using the same approach. That will lead to large automata, and as 
you'd have to look up the completion in a secondary index, you'd never use the 
surface form returned by the automaton itself, so it's a waste of space. WFSTs 
are more space-efficient but don't support payloads (if I remember correctly) 
and there's no fuzzy WFST suggester either :(

Generally, we found it beneficial to not combine all functionality in a single 
suggester, but use separate automata in a cascaded model. We first look up 
completions in the prefix non-fuzzy suggester. Based on several criteria, we 
may then consult the infix suggester, and if needed, the fuzzy suggester. The 
rationale is that we don't want high-ranking fuzzy or infix hits to fill up the 
completion list while there are good (but less popular) prefix hits. Having 
control over which suggester is used when, and how its specific suggestions are 
merged into the final result list, helps improve the user experience, at 
least for our use cases.
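
A cascade of this kind might look roughly like the sketch below. It is a simplified illustration, not the actual implementation: each suggester is modeled as a plain `Function<String, List<String>>`, and the "several criteria" for consulting the next stage are reduced to a single assumed rule (the result list is not yet full). The class name `CascadedSuggestSketch` and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

class CascadedSuggestSketch {
    // Consults the stages in order (e.g. prefix, infix, fuzzy) and stops
    // as soon as the list is full. Earlier stages always rank first.
    static List<String> suggest(String input, int limit,
                                List<Function<String, List<String>>> stages) {
        Set<String> merged = new LinkedHashSet<>(); // keeps order, drops repeats
        for (Function<String, List<String>> stage : stages) {
            if (merged.size() >= limit) {
                break; // assumed criterion: enough hits, skip later stages
            }
            for (String suggestion : stage.apply(input)) {
                if (merged.size() >= limit) {
                    break;
                }
                merged.add(suggestion);
            }
        }
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        // Canned results standing in for real suggesters.
        Function<String, List<String>> prefix = in -> Arrays.asList("lucene");
        Function<String, List<String>> infix = in -> Arrays.asList("apache lucene", "lucene");
        Function<String, List<String>> fuzzy = in -> Arrays.asList("lucerne");
        System.out.println(suggest("luc", 3, Arrays.asList(prefix, infix, fuzzy)));
    }
}
```

Because merging happens outside the suggesters, a popular fuzzy or infix hit cannot push a prefix hit out of the list; a real system would likely use richer criteria than list size to decide when to fall through.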

Cheers, Oli

-Original Message-
From: Clemens Wyss DEV [mailto:clemens...@mysign.ch] 
Sent: Friday, June 20, 2014 6:47 AM
To: java-user@lucene.apache.org
Subject: AW: fuzzy/case insensitive AnalyzingSuggester )

Sorry for re-asking. 
Has anyone implemented an AnalyzingSuggester which 
- is fuzzy
- is case insensitive (or must/should this be implemented by the analyzer?)
- does infix search
[- has a small memory footprint]

-Original Message-
From: Clemens Wyss DEV [mailto:clemens...@mysign.ch] 
Sent: Friday, June 13, 2014 14:53
To: java-user@lucene.apache.org
Subject: fuzzy/case insensitive AnalyzingSuggester )

Looking for an AnalyzingSuggester which supports
- fuzziness
- case insensitivity
- small (in-memory) footprint (*)

(*) Just tried to hand my big IndexReader (see other post "[lucene 4.6] NPE 
when calling IndexReader#openIfChanged") into JaspellLookup. Got an OOM.
Is there any (Jaspell)Lookup implementation that can handle really big indexes 
(by swapping out part of the lookup table)?


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




AW: fuzzy/case insensitive AnalyzingSuggester )

2014-06-22 Thread Clemens Wyss DEV
Oli, 
thanks for your valuable input!

> Generally, we found it beneficial to not combine all functionality in a
> single suggester
Makes perfect sense, but it doesn't help keep the RAM load low ;) unless you go 
with WFSTs. 

What we have done so far is build a term index from the terms of the 
corresponding (data) index, i.e. an index always comes paired with its 
corresponding term index.




AW: fuzzy/case insensitive AnalyzingSuggester )

2014-06-20 Thread Clemens Wyss DEV
Sorry for re-asking. 
Has anyone implemented an AnalyzingSuggester which 
- is fuzzy
- is case insensitive (or must/should this be implemented by the analyzer?)
- does infix search
[- has a small memory footprint]


