Thank you Mike,
I have thought about that solution myself, but the problem with this
approach is that the terms still need to be modified before building the
dictionary that is fed to the spell checker.
Also, the similarity scores which are used to determine the spell
suggestions are affected by the prefixes. So this solution is probably
not a good idea.
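To illustrate the point about similarity scores, here is a minimal sketch in plain Java (no Lucene classes). It assumes a normalized edit-distance score of the form 1 - distance / max(length), which as far as I know has the same shape as the default string distance used by the spellchecker; the class name, the "docs::" prefix and the example words are made up for the illustration.

```java
final class PrefixSimilarityDemo {

    // Classic dynamic-programming Levenshtein edit distance.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed score shape: 1 - distance / length of the longer string.
    static float similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0f;
        }
        return 1.0f - (float) editDistance(a, b) / maxLen;
    }

    public static void main(String[] args) {
        // Same single-character typo, with and without a namespace prefix.
        System.out.println(similarity("lucine", "lucene"));
        System.out.println(similarity("docs::lucine", "docs::lucene"));
    }
}
```

The edit distance is 1 in both cases, but the prefixed pair gets a higher score because the shared prefix makes the strings longer. A fixed accuracy threshold would therefore behave differently depending on the length of the namespace name.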
On 11/23/2011 08:04 PM, Michael Sokolov wrote:
Could you simply index every term with a namespace prefix like:
Q::term
where Q is the namespace and term the term?
Then when you do spell corrections, submit each candidate term with
the namespace prefix prepended
-Mike
On 11/23/2011 9:28 AM, E. van Chastelet wrote:
I currently have an idea to get it done, but it's not a nice solution.
If we have an index Q with all documents for all namespaces, we first
extract the list of all terms that appear for the field namespace in
Q (this field indicates the namespace of the document).
Then, for each namespace n in the terms list:
- Get all docs from Q that match +namespace:n
- Construct a temporary index from these docs
- Use this temporary index to construct the dictionary, which the
SpellChecker can use as input.
- Call indexDictionary on SpellChecker to create spellcheck index
for current namespace.
- Delete temporary index
We now have separate spell check indexes for each namespace.
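The intended outcome of the steps above can be sketched in plain Java. This is not Lucene code: the Doc class is a hypothetical stand-in for a document in Q (its namespace value plus its already analyzed terms), and the grouping is done in one pass instead of via a temporary index per namespace. In the real setup each resulting term set would still have to be turned into a dictionary and passed to indexDictionary.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

final class PerNamespaceDictionarySketch {

    // Hypothetical stand-in for a document in the combined index Q.
    static final class Doc {
        final String namespace;
        final List<String> terms;

        Doc(String namespace, String... terms) {
            this.namespace = namespace;
            this.terms = Arrays.asList(terms);
        }
    }

    // Groups the terms of all documents by namespace: one term set
    // (one future spellcheck dictionary) per namespace value.
    static Map<String, SortedSet<String>> buildDictionaries(List<Doc> docs) {
        Map<String, SortedSet<String>> byNamespace = new TreeMap<>();
        for (Doc doc : docs) {
            byNamespace
                .computeIfAbsent(doc.namespace, ns -> new TreeSet<>())
                .addAll(doc.terms);
        }
        return byNamespace;
    }

    public static void main(String[] args) {
        List<Doc> q = Arrays.asList(
            new Doc("wiki", "lucene", "index"),
            new Doc("shop", "invoice", "index"),
            new Doc("wiki", "analyzer"));
        System.out.println(buildDictionaries(q));
    }
}
```

A term such as "index" that occurs in several namespaces ends up in each of the corresponding term sets, which is what separate spell check indexes would give as well.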
Any suggestions for a cleaner solution?
Regards,
Elmer van Chastelet
On 11/10/2011 01:16 PM, E. van Chastelet wrote:
Hi all,
In our project we would like to have the ability to get search results
scoped to one 'namespace' (as we call it). This can easily be
achieved by using a filter or just an additional must-clause.
For the spellchecker (and our autocompletion, which is a modified
spellchecker), the story seems different. The spell checker index is
created using a LuceneDictionary, which has an IndexReader as source.
We would like to get (spellcheck/autocomplete) suggestions that are
scoped to one namespace (i.e. field 'namespace' should have a
particular value).
With a single source index containing docs for all namespaces, it
does not seem possible to create a spellcheck index for each namespace
the ordinary way.
Q1: Is there a way to construct a LuceneDictionary from a subset of
a single source index (all terms where namespace = %value%) ?
Another, maybe better solution is to customize the spellchecker by
adding an additional namespace field to the spellchecker index. At
query-time, an additional must-clause is added, scoping the
suggestions to one (or more) namespace(s). The advantage of this is
to have a singleton spellchecker (or at least the index reader) for
all namespaces. This also means fewer open files for our application
(imagine if there are over 1000 namespaces).
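A rough sketch of this idea in plain Java, not using the Lucene API: each entry of the (single) suggestion store remembers the namespaces in which the word occurs, and the lookup skips entries that do not carry the requested namespace, which plays the role of the additional must-clause. All names here are hypothetical, and a plain edit-distance cutoff stands in for the real spellchecker scoring.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class NamespaceScopedSuggester {

    // word -> namespaces in which the word occurs (the extra "namespace field").
    private final Map<String, Set<String>> entries = new HashMap<>();

    void add(String word, String namespace) {
        entries.computeIfAbsent(word, w -> new HashSet<>()).add(namespace);
    }

    // Returns words within maxDistance edits of input, restricted to one namespace.
    List<String> suggest(String input, String namespace, int maxDistance) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : entries.entrySet()) {
            if (!entry.getValue().contains(namespace)) {
                continue; // stand-in for the additional must-clause
            }
            if (editDistance(input, entry.getKey()) <= maxDistance) {
                result.add(entry.getKey());
            }
        }
        Collections.sort(result); // deterministic order for the example
        return result;
    }

    // Classic dynamic-programming Levenshtein edit distance.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

With this layout the words themselves are stored unmodified, so the scoring sees the same strings as in an unscoped spellchecker; only the set of candidates is narrowed.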
Q2: Will there be a significant penalty (say more than 50% slower)
for the additional must-clause at query time?
Q3: Or can you think of a better solution for this problem? :)
How we currently do it: we currently use Lucene 3.1 with Hibernate
Search and we actually already have auto completion and spell
checking scoped to one namespace. This is currently achieved by
using index sharding, so each namespace has its own index and
reader, and another for spell check and auto completion.
Unfortunately there are some downsides to this:
- Our faceting engine has no good support for multiple indexes, so
faceting only works on a single namespace
- Needs administration for mapping namespace identifier (String) to
index number (integer)
- The number of shards (and thus namespaces) is currently
hardcoded. At this moment it is set to 100, which means Hibernate
Search opens 100 index readers/writers, while only n<100 are in
use, and therefore:
- Many open file descriptors
- Hard limit on number of namespaces
Therefore it seems better to switch back to having a single index
for all namespaces.
Thanks!
Regards,
Elmer van Chastelet
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org