RE: Spellchecking and suggesting part numbers

2014-11-03 Thread Lochschmied, Alexander
Thanks James, this did help a lot.

Is it possible to make DirectSolrSpellChecker prefer suggestions that share 
the longest possible run of leading characters with the search term?

Alexander





Re: Spellchecking and suggesting part numbers

2014-09-24 Thread Jorge Luis Betancourt Gonzalez
I’ve done something similar to this using EdgeNGram rather than the 
spellchecker component. I don’t know whether this fits your requirements.

The relevant portion of my fieldType config:



 class="solr.SpellCheckComponent">
>   
>   solr.IndexBasedSpellChecker
>   ./spellchecker
>   did_you_mean_part
>   
>   
>startup="lazy">
>   
>   did_you_mean_part
>   on
>   
>   
>   spellcheck_part
>   
>   
> 
> 
>positionIncrementGap="100">
>   
>class="solr.PatternReplaceCharFilterFactory" pattern="[\s]+" replacement=""/>
>   
>   
>minGramSize="1" maxGramSize="20" side="front"/>
>class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   
>   
>class="solr.PatternReplaceCharFilterFactory" pattern="[\s]+" replacement=""/>
>   
>   
>minGramSize="1" maxGramSize="20" side="front"/>
>   
>   
> 
> Can we tweak the setup such that we should get more relevant part numbers?
> 
> Thanks,
> Alexander
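
To illustrate why front edge n-grams favour the left side of a part number, here is a minimal Python sketch. It only mimics what an EdgeNGram filter with minGramSize=1, maxGramSize=20 and side=front would emit for a single token; the function names are made up for illustration and are not part of Solr.

```python
def front_edge_ngrams(term, min_gram=1, max_gram=20):
    """Return the leading substrings of term, shortest first."""
    upper = min(len(term), max_gram)
    return [term[:n] for n in range(min_gram, upper + 1)]


def shared_grams(a, b):
    """Count the front edge n-grams that two terms have in common."""
    return len(set(front_edge_ngrams(a)) & set(front_edge_ngrams(b)))


query = "ABCD1234"
for candidate in ("ABCE1234", "ABCD1244"):
    # A candidate that agrees on more leading characters shares more grams.
    print(candidate, shared_grams(query, candidate))
```

With the part numbers from the original question, ABCD1244 shares six grams with ABCD1234 while ABCE1234 shares only three, so a match on such a field would tend to rank the former higher.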



RE: Spellchecking and suggesting part numbers

2014-09-24 Thread Dyer, James
Alexander,

You could use a higher value for spellcheck.count, maybe 20 or so, then in your 
application pick out the suggestions that make changes on the right side.

Another option is to use DirectSolrSpellChecker (usually a better choice 
anyhow) and set the "minPrefix" field.  This will require up to n characters on 
the left side to match before it will make suggestions.  Taking a quick look at 
the code, it seems to me it won't try and correct anything in this prefix 
region also.  So perhaps you can set this to 2-4 (default=1).  See 
http://lucene.apache.org/core/4_10_0/suggest/org/apache/lucene/search/spell/DirectSpellChecker.html#setMinPrefix%28int%29
 .
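
The first option (ask for more suggestions, then filter in the application) could look roughly like the Python sketch below. It assumes the application already holds the list of suggestions returned by Solr; rank_by_prefix is a hypothetical helper, not a Solr API.

```python
from os.path import commonprefix  # character-wise common prefix of strings


def rank_by_prefix(term, suggestions):
    """Order suggestions by how many leading characters they share with term."""
    def prefix_len(candidate):
        return len(commonprefix([term, candidate]))
    # sorted() is stable, so ties keep the order the spellchecker returned.
    return sorted(suggestions, key=lambda c: -prefix_len(c))


ranked = rank_by_prefix("ABCD1234", ["ABCE1234", "ABCD1244", "XBCD1234"])
print(ranked[0])
```

For the example in the original question this puts ABCD1244 ahead of ABCE1234, because it only changes characters towards the right.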

James Dyer
Ingram Content Group
(615) 213-4311






Spellchecking and suggesting part numbers

2014-09-24 Thread Lochschmied, Alexander
Hello Solr Users,

we are trying to get suggestions for part numbers using the spellchecker.

Problem scenario:

ABCD1234 // This is the search term
ABCE1234 // This is what we get from spellchecker
ABCD1244 // This is what we would like to get from spellchecker

Characters towards the left of our part numbers are more relevant.


The setup is:



solr.IndexBasedSpellChecker
./spellchecker
did_you_mean_part

did_you_mean_part
on

spellcheck_part

Can we tweak the setup such that we should get more relevant part numbers?

Thanks,
Alexander


RE: Spellchecking suggestions won't collate

2014-08-20 Thread Corey Gerhardt
I'm working with business names, which are sometimes even people's names, such 
as "Wardell F E B Dr".  I suspect I need to change my logic so that it does not 
rely on spellchecking so much, as you suggest.

Thanks.

Corey



RE: Spellchecking suggestions won't collate

2014-08-20 Thread Dyer, James
Because "my" is the 7th suggestion down the list, it is going to need more than 
30 tries to figure out the one that can give some hits.  You can increase 
"maxCollationTries" if you're willing to endure the performance penalty of 
trying so many replacement queries.  This case actually highlights why 
DirectSpellChecker by default doesn't even bother with short words like this.

Rather than letting the spellchecker check words this small, perhaps you could 
just scan the user's input and make any words shorter than 4 characters 
optional?  Or even just use an mm below 100% (65%?).  I realize this will give 
you a small loss of precision, but the recall will be better and you'll have to 
rely less on spellcheck.
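
A minimal Python sketch of the "scan the user's input" idea follows. It assumes plain whitespace-separated input without any query syntax in it, and relies on the usual "+" prefix for required clauses; how the query parser then combines this with mm depends on your Solr version and configuration, so treat it as a starting point only. The function name is made up.

```python
def require_long_words(user_input, min_len=4):
    """Prefix words of at least min_len characters with '+'; shorter ones stay optional."""
    words = user_input.split()
    return " ".join("+" + w if len(w) >= min_len else w for w in words)


print(require_long_words("Mi Next Promo"))  # Mi +Next +Promo
```

Here the two-letter word "Mi" is left optional, so a document containing "Next" and "Promo" can still match without the spellchecker having to correct "Mi" at all.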

James Dyer
Ingram Content Group
(615) 213-4311




Spellchecking suggestions won't collate

2014-08-15 Thread Corey Gerhardt
It must be Friday. I can't figure out why there is no collation value:

{
  "responseHeader":{
    "status":0,
    "QTime":31,
    "params":{
      "spellcheck":"on",
      "spellcheck.collateParam.qf":"BUS_BUSINESS_NAME",
      "spellcheck.maxResultsForSuggest":"5",
      "spellcheck.maxCollations":"3",
      "spellcheck.maxCollationTries":"30",
      "qf":"BUS_BUSINESS_NAME_PHRASE",
      "q.alt":"*:*",
      "spellcheck.collate":"true",
      "spellcheck.onlyMorePopular":"false",
      "defType":"edismax",
      "debugQuery":"true",
      "echoParams":"all",
      "spellcheck.count":"10",
      "spellcheck.alternativeTermCount":"10",
      "indent":"true",
      "q":"Mi Next Promo",
      "wt":"json"}},
  "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
  },
  "spellcheck":{
    "suggestions":[
      "mi",{
        "numFound":10,
        "startOffset":0,
        "endOffset":2,
        "suggestion":["mr",
          "mp",
          "mid",
          "mix",
          "mb",
          "mj",
          "my",
          "md",
          "mc",
          "ma"]},
      "next",{
        "numFound":3,
        "startOffset":3,
        "endOffset":7,
        "suggestion":["nest",
          "news",
          "neil"]},
      "promo",{
        "numFound":4,
        "startOffset":8,
        "endOffset":13,
        "suggestion":["photo",
          "prime",
          "pronto",
          "prof"]}]},

The actual business name is "My Next Promo" which I'm hoping would be the 
collation value.

Thanks,

Corey


Re: permissive mm value and efficient spellchecking

2014-05-16 Thread elisabeth benoit
ok, thanks a lot, I'll check that out.




permissive mm value and efficient spellchecking

2014-05-14 Thread elisabeth benoit
Hello,

I'm using Solr 4.2.1.

I use a very permissive value for mm, to be able to find results even if the
request contains irrelevant words.

At the same time, I'd like to be able to do some efficient spellchecking
with DirectSolrSpellChecker.

So for instance, if a user searches for "rue de Chraonne Paris", where
Chraonne is misspelled, because of my permissive mm value I get more than
100 000 results containing the words "rue" and "Paris" ("de" is a stopword),
which are very frequent terms in my index, but no spellcheck correction for
Chraonne. If I set mm=3, then I get the expected spellcheck correction
value: "rue de Charonne Paris".

Is there a way to achieve my two goals in a single solr request?

Thanks,
Elisabeth


RE: permissive mm value and efficient spellchecking

2014-05-14 Thread Markus Jelsma
Elisabeth, I think you are looking for SOLR-3211, which introduced 
spellcheck.collateParam.* to override e.g. dismax settings.

Markus
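
As a rough sketch of what such a request could look like, the Python snippet below builds a query string that keeps a permissive mm for the main query and overrides it with a strict one for the collation test queries. The concrete values ("1", "10", "100%") and the /select path are assumptions for illustration, and other spellcheck settings may also be needed before suggestions show up.

```python
from urllib.parse import urlencode

params = {
    "q": "rue de Chraonne Paris",
    "defType": "edismax",
    "mm": "1",                             # permissive value, assumed for illustration
    "spellcheck": "true",
    "spellcheck.collate": "true",
    "spellcheck.maxCollationTries": "10",  # assumed value
    "spellcheck.collateParam.mm": "100%",  # used only when collations are test-run
}

query_string = urlencode(params)
print("/select?" + query_string)
```

The point of the design is that the user-facing query stays forgiving, while a collation only counts as valid if it still returns hits with every term required.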
 


RE: Spellchecking - looking for general advice

2014-05-03 Thread Susheel Kumar
Got it.  Are you also considering stemming and phonetic matching here?  For 
example, phonetic matching may catch some of the restaurant variations, stemming 
may reduce recruiter and recruited to their base words, and spellcheck would 
then be the catch-all for the remaining cases.

This e-mail message may contain confidential or legally privileged information 
and is intended only for the use of the intended recipient(s). Any unauthorized 
disclosure, dissemination, distribution, copying or the taking of any action in 
reliance on the information herein is prohibited. E-mails are not secure and 
cannot be guaranteed to be error free as they can be intercepted, amended, or 
contain viruses. Anyone who communicates with us by e-mail is deemed to have 
accepted these risks. The Digital Group is not responsible for errors or 
omissions in this message and denies any responsibility for any damage arising 
from the use of e-mail. Any opinion defamatory or deemed to be defamatory or  
any material which could be reasonably branded to be a species of plagiarism 
and other statements contained in this message and any attachment are solely 
those of the author and do not necessarily represent those of the company.


Re: Spellchecking - looking for general advice

2014-05-03 Thread Maciej Dziardziel
Hi

I've set it to 2, but a Python implementation of Levenshtein says it's 3
for restraunt -> restaurant.
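
For reference, a self-contained version of that check is sketched below. It is the textbook dynamic-programming Levenshtein distance (insert, delete, substitute, no transpositions), not the code Lucene uses internally. As far as I know, maxEdits in DirectSolrSpellChecker cannot be set higher than 2, so a distance of 3 is out of its reach.

```python
def levenshtein(a, b):
    """Plain Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


print(levenshtein("restraunt", "restaurant"))  # 3
```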


-- 
Maciej Dziardziel
fied...@gmail.com


RE: Spellchecking - looking for general advice

2014-05-03 Thread Susheel Kumar
What value have you set for maxEdits? It should catch the restaurant example 
with edit distance set to 2.

Thanks,
Susheel



Spellchecking - looking for general advice

2014-05-02 Thread Maciej Dziardziel
Hi

I was looking at spellcheck (Direct and FileBased) and testing what they can do.
Direct works fine most of the time, but I'd like to find a solution for
a few corner cases:

1) Having "recruted" and "recruiter" in the index, "recruter" should
suggest the latter.
Obviously the distance to the former is smaller, so it may be
completely arbitrary,
and perhaps must be handled on the application side rather than in Solr.
2) "restraunt" doesn't suggest "restaurant" - I assume the distance
is too big for that.

Those are a few examples of queries that spellcheck gets (according to
my requirements) wrong.
For now I am just looking at possible solutions, and I need to come
up with an initial concept
to have something to show to users and get more feedback, likely with
more cases
to correct.

I'd like to know if there are some tweaks to the spellcheck component I
could make (or perhaps other ways of doing this with Solr),
or am I forced to hardcode a list of all such corrections that go beyond
what spellcheck can do?

One solution I am considering is to put a list of those special cases
into FileBasedSpellChecker (it seems to be more relaxed, and handles
the restraunt case well) and fall back to Direct if this yields no
results... though I am not sure yet how well that would work in
practice
if the list of misspelled words grew beyond the few I have now. It
most likely wouldn't scale.
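
On the application side, the "special cases first, then fall back" idea could be sketched in Python as below. Everything here is hypothetical: OVERRIDES stands for the hand-maintained list, and fake_direct_spellchecker is a stand-in for whatever call fetches suggestions from Solr.

```python
def suggest(term, overrides, fallback):
    """Return a hand-maintained correction if one exists, else ask the fallback."""
    key = term.lower()
    if key in overrides:
        return [overrides[key]]
    return fallback(term)


# Hypothetical data: the two corner cases mentioned above.
OVERRIDES = {"recruter": "recruiter", "restraunt": "restaurant"}


def fake_direct_spellchecker(term):
    """Stand-in for a Solr spellcheck request; always returns no suggestions."""
    return []


print(suggest("restraunt", OVERRIDES, fake_direct_spellchecker))  # ['restaurant']
```

A plain dictionary lookup stays cheap as the list grows; the part that does not scale is maintaining the list by hand.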

Another possibility would be to analyze the list of queries our users run
that yield few results and check whether there is a spellchecked
version that improves that... but that seems to require a human to
review the corrections.

Yet another thing I was thinking about would be to pull the terms into a
separate spellchecker (like aspell) and see if it does a better job or
is more tweakable.

That's a bit of an open-ended problem, so any advice is welcome.

--
Maciej Dziardziel
fied...@gmail.com


RE: Spellchecking problem

2013-12-20 Thread Dyer, James
Gastone,

You may, at least while developing, specify 
"spellcheck.collateExtendedResults=true" so you can see for sure it has 
verified how many hits each collation would return.

But my guess is that your "mm" parameter makes pretty much anything return some 
hits.  You might want to specify "spellcheck.collateParam.mm=100%" or something 
like that to restrict collations to only those queries that return hits if all 
the terms were required.

See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collateParam.XX .

James Dyer
Ingram Content Group
(615) 213-4311




2013/12/20 Dyer, James 

> If you are using "spellcheck.maxCollationTries" with a value greater than 0
> the *collation* section of your spellcheck response will give query
> corrections that are proven to produce hits.  Possibly you were looking at
> the first section where it gives individual word suggestions?  Or maybe one
> of your query parameters is misspelled (check case and that you have
> "spellcheck." in front of all of them)?  If you can't figure it out,
> provide us the entire query string you're using, the spellcheck response
> you get back and also the relevant portions of solrconfig.xml.
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
>
> -Original Message-
> From: Gastone Penzo [mailto:gastone.pe...@gmail.com]
> Sent: Friday, December 20, 2013 7:43 AM
> To: solr-user@lucene.apache.org
> Subject: Spellchecking problem
>
> Hello,
>
> I have a problem with spellchecking.
> I use Solr to index e-commerce products (DVDs, CDs, books, etc.).
> The collation is only one, but in the index there is the field: typology (of
> product).
> When I build the spellchecking indexes, they are built together.
> How can I get suggestions for only one typology?
>
> I read that if I use spellcheck.collate=true and maxCollationTries > 0,
> Solr evaluates every suggestion with the fq parameter of the query. In my query
> I have for example fq=typology:book
> but it doesn't work. Why?
>
> I also tried collationparameter.fq=typology:book
> with the same result.
>
> I use Solr 4.3.
> Thank you
>
>
> --
> *Gastone Penzo*
>
>


-- 
*Gastone Penzo*



Re: Spellchecking problem

2013-12-20 Thread Gastone Penzo
Thank you for your answer.

this is the querystring

http://seshat:9000/solr/browse/?q=otto+maialotto&fq=shelf:GIO&qf=ean^0
title^0.0035 authors^0 publisher^0 series^0 contributors^0 characters^0
manufacturer^0 actors^0 directors^0 tags^0 category_label^0 &pf=ean^0
title^0.0035 authors^0 publisher^0 series^0 contributors^0 characters^0
manufacturer^0 actors^0 directors^0 tags^0
category_label^0&spellcheck=true&spellcheck.collate=true&spellcheck.maxCollationTries=10&spellcheck.q=otto+il+maialotto&mm=2%3C-1+5%3C80%25&

shelf is the field that represents the product typology, and GIO is the
typology (games).

The problem is the collation.
The result it gives (otto il polpo) is the name of a product from another
typology (Book).
Why?

This is the result:




5
0
17
0


otto il polpo
2


gigetto il maialetto  vol.0
2


sotto il mare  vol.0
2


sotto il mare
2


otto il rinoceronte
2



true
(otto il polpo)



This is the configuration:


textSpell


  default
  spellcheckdef
  spellchecker
  on
  false
  true
  6
  true
  .001


  

Thanks








-- 
*Gastone Penzo*


RE: Spellchecking problem

2013-12-20 Thread Dyer, James
If you are using "spellcheck.maxCollationTries" with a value greater than 0, the 
*collation* section of your spellcheck response will give query corrections 
that are proven to produce hits.  Possibly you were looking at the first 
section, where it gives individual word suggestions?  Or maybe one of your query 
parameters is misspelled (check case and that you have "spellcheck." in front 
of all of them)?  If you can't figure it out, provide us the entire query 
string you're using, the spellcheck response you get back, and also the relevant 
portions of solrconfig.xml.
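
A small sketch of the distinction described above, for anyone post-processing the response in application code. It assumes the response was requested with wt=json and that the "suggestions" section arrives as a flat name/value list in which per-word suggestions and "collation" entries are interleaved; that layout differs between Solr versions and json.nl settings, so treat it as an assumption. extract_collations and the sample values are hypothetical, not part of any Solr client.

```python
def extract_collations(spellcheck):
    """Return only the collation strings from a flat name/value list.

    Assumed layout (hypothetical sample, not a guaranteed Solr format):
    ["<word>", {...word suggestions...}, "collation", "<query>", ...]
    """
    items = spellcheck.get("suggestions", [])
    collations = []
    # Walk the list as (name, value) pairs.
    for name, value in zip(items[0::2], items[1::2]):
        if name == "collation" and isinstance(value, str):
            collations.append(value)
    return collations


# Hypothetical response fragment, shaped after the values seen in this thread.
sample = {
    "suggestions": [
        "otto il maialotto",
        {"numFound": 5, "startOffset": 0, "endOffset": 17,
         "suggestion": [{"word": "otto il polpo", "freq": 2}]},
        "collation", "(otto il polpo)",
    ]
}

print(extract_collations(sample))
```

Only the entries named "collation" are candidates that may have been checked against the index; the per-word entries are not.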

James Dyer
Ingram Content Group
(615) 213-4311





Spellchecking problem

2013-12-20 Thread Gastone Penzo
Hello,

I have a problem with spellchecking.
I use Solr to index e-commerce products (DVDs, CDs, books, etc.).
There is only one collection, but the index has a field: typology (of
product).
When I build the spellchecking indexes, they are built together.
How can I get suggestions for only one typology?

I read that if I use spellcheck.collate=true and set
spellcheck.maxCollationTries > 0, Solr evaluates every suggestion with the
fq parameter of the query. In my query I have, for example, fq=typology:book,
but it doesn't work. Why?

I also tried collationparameter.fq=typology:book,
with the same result.

I use Solr 4.3.
Thank you


-- 
*Gastone Penzo*


Re: Spellchecking

2013-09-23 Thread Gastone Penzo
Thank you!!






-- 
*Gastone Penzo*


RE: Spellchecking

2013-09-20 Thread Dyer, James
If you're using "spellcheck.collate" you can also set 
"spellcheck.maxCollationTries" to validate each collation against the index 
before suggesting it.  This validation takes into account any "fq" parameters 
on your query, so if your original query has "fq=Product:Book", then the 
collations returned will all be vetted by internally running the query with 
that filter applied.

If for some reason your main query does not have "fq=Product:Book", but you 
want it considered when collations are being built, you can include 
"spellcheck.collateParam.fq=Product:Book".

See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate and 
following sections.
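
A minimal sketch of how the parameters named above could be assembled into a request URL. The parameter names come from this message; the host, handler path, query text and the helper name build_spellcheck_url are placeholders made up for illustration.

```python
from urllib.parse import urlencode


def build_spellcheck_url(base_url, q, fq=None, collate_fq=None, max_tries=10):
    """Build a select URL that asks for collations vetted against the index."""
    params = [
        ("q", q),
        ("spellcheck", "true"),
        ("spellcheck.collate", "true"),
        # A value greater than 0 makes Solr test each collation before returning it.
        ("spellcheck.maxCollationTries", str(max_tries)),
    ]
    if fq:
        # A filter on the main query is taken into account during validation.
        params.append(("fq", fq))
    if collate_fq:
        # Filter applied only while collations are being tested.
        params.append(("spellcheck.collateParam.fq", collate_fq))
    return base_url + "?" + urlencode(params)


# Hypothetical host and query.
url = build_spellcheck_url(
    "http://localhost:8983/solr/select",
    "otto maialotto",
    collate_fq="Product:Book",
)
print(url)
```

Passing fq restricts both the search and the collation check; passing only collate_fq leaves the main query unfiltered and restricts just the check.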

James Dyer
Ingram Content Group
(615) 213-4311





Spellchecking

2013-09-20 Thread Gastone Penzo
Hi,
I'd like to know whether it is possible to get suggestions from only part of
the index.
For example:

An e-commerce site:
there are many product typologies (book, DVD, CD...).

If I search within books, I want suggestions only for book products, not CDs,
but the spellchecking indexes are all together.

Is it possible to divide the indexes or to get suggestions for only one typology?

Thanks

-- 
Gastone


Re: Synonym Expansion in Spellchecking Field Solr 4.3.1

2013-08-15 Thread Brendan Grainger
Hi All,

I didn't have the lucene-solr source compiling cleanly in Eclipse
initially, so I created a very quick Maven project to demonstrate this issue:

https://github.com/rainkinz/solr_spellcheck_index_out_of_bounds.git

Having said that I just got everything set up in eclipse, so I can create a
test case if this is actually an issue and not something weird with my
configuration.

Thanks
Brendan






-- 
Brendan Grainger
www.kuripai.com


Re: Synonym Expansion in Spellchecking Field Solr 4.3.1

2013-08-15 Thread Brendan Grainger
Further to this. If I change:

tpms,service tire monitor,tire monitor,tire pressure monitor,tire pressure
monitoring system,tpm,low tire warning,tire pressure monitor system

to

service tire monitor,tire monitor,tire pressure monitor,tire pressure
monitoring system,tpm,low tire warning,tire pressure monitor system,tpms

I don't get a crash. I tried it with some other fields too. e.g.:

asdm,airbag system diagnostic module => crash

airbag system diagnostic module,asdm => no crash

Thanks
Brendan



On Thu, Aug 15, 2013 at 1:37 PM, Brendan Grainger <
brendan.grain...@gmail.com> wrote:

> Hi All,
>
> I've been debugging an issue where the query 'tpms' would make the
> spellchecker throw the following exception:
>
> 21021 [qtp91486057-17] ERROR org.apache.solr.servlet.SolrDispatchFilter  –
> null:java.lang.StringIndexOutOfBoundsException: String index out of range:
> -1
>  at
> java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:789)
> at java.lang.StringBuilder.replace(StringBuilder.java:266)
>  at
> org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:190)
> at
> org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:75)
>
>
> I have the following synonyms defined for tpms:
>
> tpms,service tire monitor,tire monitor,tire pressure monitor,tire pressure
> monitoring system,tpm,low tire warning,tire pressure monitor system
>
> Note that if you query any of the other synonyms there is no issue, only
> tpms.
>
> Looking at my field definition for my spellchecker I realized I am doing
> query time synonym expansion:
>
>  positionIncrementGap="100" omitNorms="true">
>   
> 
>  ignoreCase="true"
> words="lang/stopwords_en.txt"
> enablePositionIncrements="true"
> />
> 
> 
>   
>   
> 
>  ignoreCase="true" expand="true"/>
>  ignoreCase="true"
> words="lang/stopwords_en.txt"
> enablePositionIncrements="true"
> />
> 
> 
>   
> 
>
> I copied this field definition from:
> http://wiki.apache.org/solr/SpellCheckingAnalysis. As the issue seemed
> related to synonyms I removed the SynonymFilterFactory and everything
> works.
>
> I'm going to try to create a reproducible test case for the crash, but
> right now I'm wondering what I lose by not having synonym expansion when
> spell checking?
>
> Thanks
> Brendan
>
>
>


-- 
Brendan Grainger
www.kuripai.com


Synonym Expansion in Spellchecking Field Solr 4.3.1

2013-08-15 Thread Brendan Grainger
Hi All,

I've been debugging an issue where the query 'tpms' would make the
spellchecker throw the following exception:

21021 [qtp91486057-17] ERROR org.apache.solr.servlet.SolrDispatchFilter  –
null:java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:789)
    at java.lang.StringBuilder.replace(StringBuilder.java:266)
    at org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:190)
    at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:75)


I have the following synonyms defined for tpms:

tpms,service tire monitor,tire monitor,tire pressure monitor,tire pressure
monitoring system,tpm,low tire warning,tire pressure monitor system

Note that if you query any of the other synonyms there is no issue, only
tpms.

Looking at my field definition for my spellchecker I realized I am doing
query time synonym expansion:


  




  
  





  


I copied this field definition from:
http://wiki.apache.org/solr/SpellCheckingAnalysis. As the issue seemed
related to synonyms I removed the SynonymFilterFactory and everything
works.

I'm going to try to create a reproducible test case for the crash, but
right now I'm wondering what I lose by not having synonym expansion when
spell checking?

Thanks
Brendan


Two questions on spellchecking

2012-08-06 Thread Uwe Reh

Hi,

even though I have read a lot, none of my spellchecker configurations works 
really well. I have reached a dead end. Maybe someone can help me solve my 
challenges.


- How can I get case sensitive suggestions, independent of the given 
case in the query?


- How to configure a 'did you mean' spellchecking, as discussed in 
https://issues.apache.org/jira/browse/SOLR-2585 (Context-Sensitive 
Spelling Suggestions & Collations)



I'm using following environment:
- Solr 4.0-alpha (downloaded 25. June)
- Java 7
- schema.xml

 
 


 
  

> ...

  

- solrconfig.xml (suggester)

   
  
 all
 true
 suggester
 true
 false
 20
  
  
 suggester
  
   
   
  
 suggester
 org.apache.solr.spelling.suggest.Suggester
 org.apache.solr.spelling.suggest.tst.TSTLookup
 suggest
  
   

- solrconfig.xml (spellcheck)

  
  
 all
 10
 allfields
 true
 false
 20
  
  
 spellcheck
  
   

>

  textSpell
  
 default
 suggest
 solr.DirectSolrSpellChecker
 internal
 0.1
 2
 1
 5
 1
 0.1
 0.001
  
   


*Suggester problem*
With this configuration the suggester is case-insensitive, but the 
hints are all lower case.

Example: .../hint?q=da&wt=xml&spellcheck=true&spellcheck.build=true



status 0, QTime 173
params: true, all, true, suggester, 20, false, true, da, xml, true
command: build
suggestions for "da" (numFound 20, startOffset 0, endOffset 2):
  dat-marktspiegel spezial
  data structures with c++ using stl
  data warehouse
  datan, ingeborg
  datenbanken mit delphi
  datenverschlüsselung
  dauner, gabriele
  dautermann, margit
  david copperfield
  david, horst
  david, leo
  david, nicholas
  davis, charles t.
  davis, edward l
  davis, leslie dorfman
  davis, stanley m.
  davor kommt noch
  davydova, irina n.
  dawidowski, bernd
  dayan, daniel
correctlySpelled: false


Using just solr.StrField as the field type, the suggestions keep their 
original capitalization, but I get no suggestions if the query starts 
with a lower-case character.
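
To make the goal concrete, here is a rough application-side sketch of the wanted behaviour: match on a lowercased key, but return the entry in its original capitalization. This is not a Solr configuration and does not use any Solr API; the class name and the sample entries (apart from "David Copperfield") are made up for illustration.

```python
from bisect import bisect_left


class CaseInsensitivePrefixIndex:
    """Prefix lookup that ignores case in the query but keeps original casing."""

    def __init__(self, entries):
        # Sort by lowercased key; keep the original string next to it.
        self._pairs = sorted((entry.lower(), entry) for entry in entries)
        self._keys = [key for key, _ in self._pairs]

    def suggest(self, prefix, limit=20):
        needle = prefix.lower()
        # Binary search for the first key that is >= the lowercased prefix.
        start = bisect_left(self._keys, needle)
        results = []
        for key, original in self._pairs[start:]:
            if len(results) >= limit or not key.startswith(needle):
                break
            results.append(original)
        return results


# Hypothetical sample entries.
index = CaseInsensitivePrefixIndex(
    ["David Copperfield", "Data Warehouse", "Dayan, Daniel", "Zebra"]
)
print(index.suggest("da"))
print(index.suggest("DAV"))
```

The same idea carries over to an index: one analyzed (lowercased) value to match against and one untouched value to display.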


*Spelling problem*
One of the indexed entries in the field 'suggest' is "David Copperfield" 
and I want this string as alternative suggestion to the query "David 
opperfield".

Example .../select?q="david+opperfield"&rows=0&wt=xml&spellcheck=true



015allfieldsalltrue20false0true"david opperfield"xml0false


.../select?q=david+opperfield&rows=0&wt=xml&spellcheck=true
--> true

=?8-)
Uwe

Btw. Is there a DirectSolrSuggester corresponding to DirectSolrSpellChecker?




Re: Solr spellchecking fails on sharded query

2012-06-22 Thread fabio curti
Hi,
It seems that suggestions with shards work fine if I set up the select RH as
follows (  instead of  ):

  

 
   explicit
   10
 

   
spellcheck
   



Now suggestion is populated!

Fabio

Re: Solr spellchecking fails on sharded query

2012-06-22 Thread fabio curti
I did as you suggested and enabled the "spellcheck" component in the select RH.

  
 
   explicit
   10
 
   
spellcheck
   


Response contains error 500


500
29

file
true

fc:8900/solr/commenti,fc:7500/solr/commenti,fc:8584/solr/commenti,fc:7574/solr/commenti

piza
piza




java.lang.NullPointerException
    at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:819)
    at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:626)
    at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:605)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:307)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1566)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
    at org.eclipse.jetty.server.Server.handle(Server.java:351)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:890)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:944)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:634)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
    at java.lang.Thread.run(Thread.java:679)

500



Fabio



RE: Solr spellchecking fails on sharded query

2012-06-22 Thread Markus Jelsma
Hi,

The spellcheck component must be enabled in your default request handler; 
otherwise your suggestions list will be empty.

Cheers,

 
 


Re: Solr spellchecking fails on sharded query

2012-06-22 Thread fabio curti
Hi,
I tried the Solr shards configuration (SolrCloud) and the request settings
suggested in
http://wiki.apache.org/solr/SpellCheckComponent#Distributed_Search_Support for
spellchecking with shards.
The suggestion list is empty, as Eric said.

Any idea?

Fabio



Re: Solr spellchecking fails on sharded query

2012-06-19 Thread fabio curti
Hi,
I found this article about your issue:

http://wiki.apache.org/solr/SpellCheckComponent#Distributed_Search_Support
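
As far as I understand that wiki section, a distributed spellcheck request sent to a handler other than the default one needs two extra parameters: shards, and shards.qt naming the handler the shards should use. A rough sketch of building such a URL follows. The "/spell" handler name and the helper name are assumptions for illustration; the shard addresses and the query are the ones that appear elsewhere in this thread.

```python
from urllib.parse import urlencode


def build_sharded_spellcheck_url(core_url, handler, q, shards):
    """Build a spellcheck request that is fanned out to several shards."""
    params = [
        ("q", q),
        ("spellcheck", "true"),
        # Comma-separated list of shard addresses (no scheme prefix).
        ("shards", ",".join(shards)),
        # Handler that each shard should use for its sub-request.
        ("shards.qt", handler),
    ]
    return core_url.rstrip("/") + handler + "?" + urlencode(params)


# "/spell" is a hypothetical handler name.
url = build_sharded_spellcheck_url(
    "http://fc:8900/solr/commenti",
    "/spell",
    "piza",
    ["fc:8900/solr/commenti", "fc:7500/solr/commenti"],
)
print(url)
```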

Fabio



Solr spellchecking fails on sharded query

2012-06-19 Thread Eric Wilson
I have a Solr application that is distributed into 11 shards, using Solr
version 4.0.0.2011.07.26.16.34.16

In the solrconfig.xml for each shard, I have configured a spellcheck
component:



  textSpell

  

cn_spell

company_name_spell

0.0001

true

./spellchecker_cn_spell

  



I have built the dictionary for each shard, and verified that each shard
will return suggestions for misspellings. Moreover, it is evident that a
different dictionary is being used for the various shards.

The problem comes when I submit a sharded query. In that case the result
comes back with the following:


  


In other words, the list of words for which there are suggestions is empty.

Is there a trick to sharded spellchecking? I appreciate any suggestions.

Eric


spellchecking in nutch solr

2011-09-01 Thread alxsss


Hello,
I have tried to implement an index-based spellchecker in nutch-solr by adding a 
spell field to schema.xml and making it a copy of the content field. However, 
this doubled the size of the data folder, and the spell field, as a copy of the 
content field, appears in the XML feed, which is not necessary. Is it possible to 
implement the spellchecker without these issues?

Thanks.
Alex.
 


SolR : Spellchecking & Autocomplete

2011-08-11 Thread vsham
Hello,

I posted on the Lucene Forums, and someone told me to e-mail it here.

Instead of writing out my question again here, I am taking the liberty of linking to my post. 
It's about Solr, autocompletion, spellchecking and case sensitivity.

http://lucene.472066.n3.nabble.com/SolR-Spellchecking-amp-Autocomplete-td3243107.html

Thanks for all,

Valentin


Re: Problem with spellchecking, dont want multiple request to SOLR

2011-07-07 Thread roySolr
What should the query look like?

I can't define 2 spellcheckers in one query. I want something like this:

Search: Soccerclub (what) Manchester (where)

select/?q=socerclub
macnchester&spellcheck=true&spellcheck.dictionary=spell_what&spellcheck.dictionary=spell_where&spell_what=socerclub&spell_where=macnchester

Now I have 2 spellcheckers in my request handler, but I can't set them correctly
in my query.
My config looks like this:


spellcheck1
spellcheck2



 
spell_what
spell_search1
true
spellchecker1



 
spell_where
spell_search2
true
spellchecker2






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-spellchecking-dont-want-multiple-request-to-SOLR-tp2988167p3147545.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problem with spellchecking, dont want multiple request to SOLR

2011-05-30 Thread Jan Høydahl
Hi,

Define two searchComponents with different names. Then refer to both in the
components list of your Search Request Handler config.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 27. mai 2011, at 10.01, roySolr wrote:

> mm ok. I configure 2 spellcheckers:
> 
> 
>
>   spell_what
>   spell_what
>   true
>   spellchecker_what
>   
>   
>   spell_where
>   spell_where
>   true
>   spellchecker_where
>   
> 
> 
> How can i enable it in my search request handler and search both in one
> request?
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Problem-with-spellchecking-dont-want-multiple-request-to-SOLR-tp2988167p2992076.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Problem with spellchecking, dont want multiple request to SOLR

2011-05-27 Thread roySolr
mm ok. I configure 2 spellcheckers:


 
spell_what
spell_what
true
spellchecker_what


spell_where
spell_where
true
spellchecker_where



How can i enable it in my search request handler and search both in one
request?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-spellchecking-dont-want-multiple-request-to-SOLR-tp2988167p2992076.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problem with spellchecking, dont want multiple request to SOLR

2011-05-26 Thread Jan Høydahl
Yep, it's possible. Setup two spellcheckers, one named "spellwhat" and one 
named "spellwhere" and enable both on your searchRequestHandler.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 26. mai 2011, at 12.04, roySolr wrote:

> Hello,
> 
> First i will explain my situation. I have a 2 fields on my website: What and
> Where. 
> When a user search i want spellcheck on both fields. Now i have 2
> dictionaries, one for
> what and one for where. I want to search with one request and spellcheck
> both fields. Is
> it possible and how?
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Problem-with-spellchecking-dont-want-multiple-request-to-SOLR-tp2988167p2988167.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Problem with spellchecking, dont want multiple request to SOLR

2011-05-26 Thread roySolr
Hello,

First I will explain my situation. I have 2 fields on my website: What and
Where. When a user searches, I want to spellcheck both fields. I now have 2
dictionaries, one for what and one for where. I want to search with one request
and spellcheck both fields. Is it possible, and how?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-spellchecking-dont-want-multiple-request-to-SOLR-tp2988167p2988167.html
Sent from the Solr - User mailing list archive at Nabble.com.


auto-completion with suggester and spellchecking

2011-05-24 Thread Jean-Claude Dauphin
Hi,
I would like to get suggestions that include spelling corrections in
case there are typing mistakes in the typed characters. I found a similar
post but with no answer:
http://lucene.472066.n3.nabble.com/Solr-suggester-and-spell-checker-td2326907.html


And I have some questions about how to use Solr suggester component for
autocompletion and spellchecking at the same time.

1) Can Solr use the same spellcheck dictionary (that is based upon the
main index) for autocompletion and spellchecking?

2) In solrconfig.xml, should I configure a "suggest" search Component AND a
"spellcheck" component? OR a single search component would be sufficient?
any example of configuration would be appreciated.

3) Which parameters should be used in the query?
The following query:

http://localhost:8983/solr/position/suggest?q=ing&qt=/suggest&onlyMorePopular=true
returns no suggestions when there is a typing mistake.

Thank you in advance for your time

Best wishes


-- 
Jean-Claude Dauphin

jc.daup...@gmail.com
jc.daup...@afus.unesco.org

http://kenai.com/projects/j-isis/
http://www.unesco.org/isis/
http://www.unesco.org/idams/
http://www.greenstone.org


Re: Spellchecking in the Chinese Lanugage

2011-04-12 Thread alexw
Thanks Otis and Luke.

Yes it does make sense to spellcheck phrases in Chinese. Looks like the
default Solr spellCheck component is already doing some kind of NGram-ing.
When examining the spellCheck index, I did see gram1, gram2, gram3, gram4...
The problem is no Chinese terms were indexed into the spellChecker index,
only English terms.

Regards,

Alex

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spellchecking-in-the-Chinese-Lanugage-tp2812726p2813149.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Spellchecking in the Chinese Lanugage

2011-04-12 Thread Luke Lu
It doesn't make sense to spell check individual character-sized words,
but it makes a lot of sense for phrases. Due to the pervasive use of pinyin
input methods, it's very easy to write phrases that are totally wrong
semantically but "sound" correct. N-grams should work as long as the
characters aren't mangled.
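
As a rough illustration of the n-gram point, here is a minimal Python sketch
that splits a phrase into character n-grams by code point. The function name
char_ngrams and the sample phrase are made up for illustration, and this is
not how Solr's spellchecker builds its index internally; it only shows that
Chinese characters survive n-gramming as long as the text is handled as
characters rather than bytes.

```python
def char_ngrams(text, n):
    """Return all contiguous character n-grams of text (by Unicode code point)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


if __name__ == "__main__":
    phrase = "拼写检查"  # hypothetical sample phrase ("spell check")
    for n in (1, 2, 3):
        print(n, char_ngrams(phrase, n))
```

If the grams coming out of an analysis chain do not look like this for
Chinese input, the mangling is probably happening before the n-gram step.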

On Tue, Apr 12, 2011 at 12:47 PM, Otis Gospodnetic
 wrote:
> Hi,
>
> Does spellchecking in Chinese actually make sense?  I once asked a native
> Chinese speaker about that and the person told me it didn't really make sense.
> Anyhow, with n-grams, I don't think this could technically work even if it 
> made
> sense for Chinese, could it?
>
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> - Original Message 
>> From: alexw 
>> To: solr-user@lucene.apache.org
>> Sent: Tue, April 12, 2011 3:07:48 PM
>> Subject: Spellchecking in the Chinese Lanugage
>>
>> Hi,
>>
>> I have been trying to get spellcheck to work in the Chinese language.  So far
>> I have not had any luck. Can someone shed some light here as a general  guide
>> line in terms of what need to happen?
>>
>> I am using the CJKAnalyzer  in the text field type and searching works fine,
>> but spelling does not work.  Here are the things I have tried:
>>
>> 1. Put CJKAnalyzer in the "textSpell"  field type.
>> 2. Set the characterEncoding param to "utf-8" in the spellcheck  search
>> component.
>> 3. Using Luke, I can see the Chinese characters in the  "spell" field in the
>> main index.
>> 4. After building the spelling index, I  don't see Chinese characters in the
>> "spellchecker" index, only terms in  English.
>> 5. Tried adding the NGramFilterFactory to the CJKAnalyzer with no  luck
>> either.
>>
>> Thanks!
>>
>>
>> --
>> View this message in context:
>>http://lucene.472066.n3.nabble.com/Spellchecking-in-the-Chinese-Lanugage-tp2812726p2812726.html
>>
>> Sent  from the Solr - User mailing list archive at Nabble.com.
>>
>


Re: Spellchecking in the Chinese Lanugage

2011-04-12 Thread Otis Gospodnetic
Hi,

Does spellchecking in Chinese actually make sense?  I once asked a native 
Chinese speaker about that and the person told me it didn't really make sense.
Anyhow, with n-grams, I don't think this could technically work even if it made 
sense for Chinese, could it?

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: alexw 
> To: solr-user@lucene.apache.org
> Sent: Tue, April 12, 2011 3:07:48 PM
> Subject: Spellchecking in the Chinese Lanugage
> 
> Hi,
> 
> I have been trying to get spellcheck to work in the Chinese language.  So far
> I have not had any luck. Can someone shed some light here as a general  guide
> line in terms of what need to happen?
> 
> I am using the CJKAnalyzer  in the text field type and searching works fine,
> but spelling does not work.  Here are the things I have tried:
> 
> 1. Put CJKAnalyzer in the "textSpell"  field type.
> 2. Set the characterEncoding param to "utf-8" in the spellcheck  search
> component.
> 3. Using Luke, I can see the Chinese characters in the  "spell" field in the
> main index.
> 4. After building the spelling index, I  don't see Chinese characters in the
> "spellchecker" index, only terms in  English.
> 5. Tried adding the NGramFilterFactory to the CJKAnalyzer with no  luck
> either.
> 
> Thanks!
> 
> 
> --
> View this message in context: 
>http://lucene.472066.n3.nabble.com/Spellchecking-in-the-Chinese-Lanugage-tp2812726p2812726.html
>
> Sent  from the Solr - User mailing list archive at Nabble.com.
> 


Spellchecking in the Chinese Lanugage

2011-04-12 Thread alexw
Hi,

I have been trying to get spellcheck to work in the Chinese language. So far
I have not had any luck. Can someone shed some light here as a general guide
line in terms of what need to happen?

I am using the CJKAnalyzer in the text field type and searching works fine,
but spelling does not work. Here are the things I have tried:

1. Put CJKAnalyzer in the "textSpell" field type.
2. Set the characterEncoding param to "utf-8" in the spellcheck search
component.
3. Using Luke, I can see the Chinese characters in the "spell" field in the
main index.
4. After building the spelling index, I don't see Chinese characters in the
"spellchecker" index, only terms in English.
5. Tried adding the NGramFilterFactory to the CJKAnalyzer with no luck
either.

Thanks!


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spellchecking-in-the-Chinese-Lanugage-tp2812726p2812726.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Spellchecking Escaped Queries

2011-04-04 Thread Colin Vipurs
Apologies for the duplicate post.  I'm having Evolution problems


> Thanks Chris, 
> 
> The field used for indexing and spellcheck is the same and is
> configured like this:..
> 
> 
>  class="solr.TextField" >
>
>   
>   ignoreCase="true" expand="true"/>
>  
> pattern="^([^!]+)\!([^!]+)$"
>   replacement="$1i$2"
>   replace="all"/> 
>   generateWordParts="1" generateNumberParts="1" catenateWords="1" 
> catenateNumbers="0" catenateAll="1" splitOnCaseChange="1" 
> preserveOriginal="1"/>
>  
>
> 
> 
> 
> I use the pattern replace filter to swap all instances of "!" within a
> word to "i".  I know this part is working correctly as performing a
> search works correctly.
> 
> The spellcheck is initialized like this:
> 
> 
> 
>title
>
>   default
>   searchfield
>   ./spellchecker
>   false
>
> 
> 
> And is attached to as a component to my search handler.
> 
> Thanks,
> 
> Colin
> 
> 
> > : I'm having an issue performing a spellcheck on some information and
> > : search of the archive isn't helping.
> > 
> > For this type of quesiton, there's not much feedback anyone can offer w/o 
> > knowing exactly what analyzers you have configured for hte various 
> > fieldtypes (both the field you index/search and the fieldtype used for 
> > spellchecking)
> > 
> > it's also fairly critical to know how you have the spellcheck component 
> > configured.
> > 
> > off the cuff: i'd guess that maybe WordDelimiterFilter is being used in a 
> > wonky way given your usecase -- but like i said: would need to see the 
> > configs to make a guess.
> > 
> > 
> > -Hoss
> > 
> > __
> > This email has been scanned by the MessageLabs Email Security System.
> > For more information please visit http://www.messagelabs.com/email 
> > __
> 
> 
> -- 
> 
> 
> Colin Vipurs
> Server Team Lead
> 
> Shazam Entertainment Ltd   
> 26-28 Hammersmith Grove, London W6 7HA
> m:   +44 (0)  000 000   t: +44 (0) 20 8742 6820
> w:www.shazam.com
> 
> Please consider the environment before printing this document
> 
> This e-mail and its contents are strictly private and confidential. It
> must not be disclosed, distributed or copied without our prior
> consent. If you have received this transmission in error, please
> notify Shazam Entertainment immediately on: +44 (0) 020 8742 6820 and
> then delete it from your system. Please note that the information
> contained herein shall additionally constitute Confidential
> Information for the purposes of any NDA between the recipient/s and
> Shazam Entertainment. Shazam Entertainment Limited is incorporated in
> England and Wales under company number 3998831 and its registered
> office is at 26-28 Hammersmith Grove, London W6 7HA. 
> 
> 
> 
> 


-- 


Colin Vipurs
Server Team Lead

Shazam Entertainment Ltd   
26-28 Hammersmith Grove, London W6 7HA
m:   +44 (0)  000 000   t: +44 (0) 20 8742 6820
w:www.shazam.com


Re: Spellchecking Escaped Queries

2011-04-04 Thread Colin Vipurs
Thanks Chris, 

The field used for indexing and spellcheck is the same and is configured
like this:..



   
  
 
 
  
 
 
   



I use the pattern replace filter to swap all instances of "!" within a
word to "i".  I know this part is working correctly as performing a
search works correctly.
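
For what it's worth, the pattern quoted elsewhere in this thread
(^([^!]+)\!([^!]+)$ with replacement $1i$2) can be tried outside Solr. The
sketch below uses Python's re module rather than Solr's pattern replace
filter, so it only approximates the filter's behaviour, and the helper name
bang_to_i is made up. It suggests the pattern rewrites a token only when the
token contains exactly one "!" with characters on both sides, which is
narrower than "all instances".

```python
import re

# Pattern and replacement as quoted in this thread ($1i$2 becomes \1i\2 in Python).
BANG = re.compile(r"^([^!]+)!([^!]+)$")


def bang_to_i(token):
    """Replace a single interior '!' with 'i'; leave other tokens unchanged."""
    return BANG.sub(r"\1i\2", token)


if __name__ == "__main__":
    for token in ("p!nk", "r!nk", "pink", "a!b!c", "!nk"):
        print(token, "->", bang_to_i(token))
```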

The spellcheck is initialized like this:



   title
   
  default
  searchfield
  ./spellchecker
  false
   



This is attached as a component to my search handler and spellchecking
is done inline with the queries.

Thanks,

Colin



> : I'm having an issue performing a spellcheck on some information and
> : search of the archive isn't helping.
> 
> For this type of quesiton, there's not much feedback anyone can offer w/o 
> knowing exactly what analyzers you have configured for hte various 
> fieldtypes (both the field you index/search and the fieldtype used for 
> spellchecking)
> 
> it's also fairly critical to know how you have the spellcheck component 
> configured.
> 
> off the cuff: i'd guess that maybe WordDelimiterFilter is being used in a 
> wonky way given your usecase -- but like i said: would need to see the 
> configs to make a guess.
> 
> 
> -Hoss
> 


-- 


Colin Vipurs
Server Team Lead

Shazam Entertainment Ltd   
26-28 Hammersmith Grove, London W6 7HA
m:   +44 (0)  000 000   t: +44 (0) 20 8742 6820
w:www.shazam.com


Re: Spellchecking Escaped Queries

2011-04-04 Thread Colin Vipurs
Thanks Chris, 

The field used for indexing and spellcheck is the same and is configured
like this:..



   
  
 
 
  
 
 
   



I use the pattern replace filter to swap all instances of "!" within a
word to "i".  I know this part is working correctly as performing a
search works correctly.

The spellcheck is initialized like this:



   title
   
  default
  searchfield
  ./spellchecker
  false
   


And it is attached as a component to my search handler.

Thanks,

Colin


> : I'm having an issue performing a spellcheck on some information and
> : search of the archive isn't helping.
> 
> For this type of quesiton, there's not much feedback anyone can offer w/o 
> knowing exactly what analyzers you have configured for hte various 
> fieldtypes (both the field you index/search and the fieldtype used for 
> spellchecking)
> 
> it's also fairly critical to know how you have the spellcheck component 
> configured.
> 
> off the cuff: i'd guess that maybe WordDelimiterFilter is being used in a 
> wonky way given your usecase -- but like i said: would need to see the 
> configs to make a guess.
> 
> 
> -Hoss
> 


-- 


Colin Vipurs
Server Team Lead

Shazam Entertainment Ltd   
26-28 Hammersmith Grove, London W6 7HA
m:   +44 (0)  000 000   t: +44 (0) 20 8742 6820
w:www.shazam.com


Re: Spellchecking Escaped Queries

2011-04-02 Thread Chris Hostetter

: I'm having an issue performing a spellcheck on some information and
: search of the archive isn't helping.

For this type of question, there's not much feedback anyone can offer w/o 
knowing exactly what analyzers you have configured for the various 
fieldtypes (both the field you index/search and the fieldtype used for 
spellchecking)

it's also fairly critical to know how you have the spellcheck component 
configured.

off the cuff: i'd guess that maybe WordDelimiterFilter is being used in a 
wonky way given your usecase -- but like i said: would need to see the 
configs to make a guess.


-Hoss


Spellchecking Escaped Queries

2011-03-21 Thread Colin Vipurs
I'm having an issue performing a spellcheck on some information and
search of the archive isn't helping.

I'm indexing the word "p!nk" (yes, that's a bang in there), and have a
replacement filter setup so that the ! becomes i.  Looking at the
analyzer the right thing is happening with both the indexer and query
mapping to "pink".  When I switch on spelling suggestions I get a
suggestion of "p!pink" which just seems odd.

When I make a request for something like "rink", I get the correct
suggestion of "pink", but asking for "r!nk", I get a suggestion of "r!
pink".  It seems like the spellcheck component isn't quite doing the
right thing somewhere.

I'm running 1.4.1 with the
https://issues.apache.org/jira/browse/SOLR-1553 patch applied for the
edismax query parser.

Thanks,

Colin
-- 


Colin Vipurs
Server Team Lead

Shazam Entertainment Ltd   
26-28 Hammersmith Grove, London W6 7HA
m:   +44 (0)  000 000   t: +44 (0) 20 8742 6820
w:www.shazam.com


Re: Spellchecking with some misspelled words in index source

2011-02-15 Thread Grijesh

You have to correct the misspelled terms in your content for this to work
properly, because the spell checker will find the term in its dictionary and
assume it is correct.

The spell checker returns suggestions only when a word is not found in its
dictionary.

-
Thanx:
Grijesh
http://lucidimagination.com
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Spellchecking-with-some-misspelled-words-in-index-source-tp2505722p2507110.html
Sent from the Solr - User mailing list archive at Nabble.com.


Spellchecking with some misspelled words in index source

2011-02-15 Thread Tanner Postert
I'm building my spellcheck index from my content and it seems to be working,
but my problem is that there are a few misspelled words in my content.  For
example: the word Sheriff is misspelled as Sherrif in my content a
couple dozen times (but spelled correctly a couple thousand times). The
results of the spellcheck at first glance indicate that the word is spelled
correctly because it is found in the spellcheck dictionary and has valid
search results. Adding a spellcheck.onlyMorePopular=true to the query
results in the spellcheck returning additional suggestions, but none of them
are for the correct spelling of the word:




sherriff   10
sherri     2319
sherril    155
sherif     19
sherric    4




Is this just a strange glitch in my spellcheck dictionary based on my
content? What is strange is that sending sherriff (another misspelling that
has results in the index) to the spellchecker returns the correct spelling
as the top result.
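
To make the frequency side of this concrete, here is a small Python sketch of
a "keep only the more frequent candidates" filter. It is not a model of what
Solr does internally. The counts for sherriff, sherri, sherril, sherif and
sherric are taken from the response above; 24 for sherrif and 2000 for
sheriff are stand-ins for "a couple dozen" and "a couple thousand", and the
function name more_popular is made up.

```python
def more_popular(term, freqs):
    """Return candidate words more frequent than term, most frequent first.

    freqs maps each word (already judged close enough to term) to its
    frequency in the index.
    """
    base = freqs.get(term, 0)
    better = [w for w in freqs if w != term and freqs[w] > base]
    return sorted(better, key=lambda w: freqs[w], reverse=True)


if __name__ == "__main__":
    freqs = {
        "sherrif": 24,     # assumed: "a couple dozen"
        "sheriff": 2000,   # assumed: "a couple thousand"
        "sherriff": 10,
        "sherri": 2319,
        "sherril": 155,
        "sherif": 19,
        "sherric": 4,
    }
    print(more_popular("sherrif", freqs))
```

Under these assumed counts the correct spelling does survive the filter, but
a more frequent unrelated word (sherri) still sorts ahead of it, so frequency
alone does not settle the ranking.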


RE: spellchecking even the key is true....

2011-01-17 Thread Dyer, James
Add spellcheck.onlyMorePopular=true to your query and I think it'll do what you 
want.  See 
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.onlyMorePopular for 
more info.

One caveat is if you use spellcheck.collate, this will likely result in 
useless, nonsensical collations most of the time.
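
A small sketch of building such a request, in Python. The host, port and
handler path are copied from the example URL in this thread; the function
name spellcheck_url is made up, only the standard library is used, and
nothing is sent over the network.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8080/solr/spellcheck"  # from the example URL in this thread


def spellcheck_url(query, count=5, only_more_popular=True):
    """Build a spellcheck request URL with spellcheck.onlyMorePopular set."""
    params = [
        ("q", query),
        ("spellcheck", "true"),
        ("spellcheck.count", str(count)),
        ("spellcheck.onlyMorePopular", "true" if only_more_popular else "false"),
    ]
    return BASE + "?" + urlencode(params)


if __name__ == "__main__":
    print(spellcheck_url("java"))
```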

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: satya swaroop [mailto:satya.yada...@gmail.com] 
Sent: Monday, January 17, 2011 10:32 AM
To: solr-user@lucene.apache.org
Subject: spellchecking even the key is true

Hi All,
can we get the spellchecking results even when the keyword is true.
As for spellchecking will give only to the wrong keywords, cant we get
similar and near words of the keyword though the spellcheck.q is true..
as an example
http://localhost:8080/solr/spellcheck?q=java&spellcheck=true&spellcheck.count=5
the result will be

1)-

-






can we get the result as
2)

-


javax
javac
javabean
javascript



NOTE:: all the keywords in the 2nd result is are in index...

Regards,
satya


spellchecking even the key is true....

2011-01-17 Thread satya swaroop
Hi All,
Can we get spellchecking results even when the keyword is spelled correctly?
Spellchecking only gives suggestions for misspelled keywords; can't we get
similar and nearby words for the keyword even though the spellcheck.q is
correct? For example:
http://localhost:8080/solr/spellcheck?q=java&spellcheck=true&spellcheck.count=5
the result will be

1)-

-






can we get the result as
2)

-


javax
javac
javabean
javascript



NOTE: all the keywords in the 2nd result are in the index.

Regards,
satya


Re: Spellchecking and frequency

2010-07-28 Thread Jonathan Rochkind



> I therefore wrote an implementation of SolrSpellChecker that wraps jazzy,
> the java aspell library. I also extended the SpellCheckComponent to take the
> matrix of suggested words and query the corpus to find the first combination
> of suggestions which returned a match. This works well for my use case,
> where term frequency is irrelevant to spelling or scoring.


This is interesting to me. I also have not been that happy with standard 
solr spellcheck. 

In addition to possibly filing a JIRA for a future fix to Solr itself, 
another option would be to make your 'alternate' SpellCheck 
component available as a separate .jar, so anyone could use it just by 
installing it and specifying it in their solrconfig.xml.  I would encourage 
you to consider that, not as a replacement for suggesting a patch to 
Solr itself, but so people can use your improved spellchecker 
immediately, without waiting for possible Solr patches.


Jonathan



Re: Spellchecking and frequency

2010-07-28 Thread dan sutton
Hi Mark,

Thanks for that info, it looks very interesting; it would be great to see your
code. Out of interest, did you use the dictionary and the phonetic file? Did
you see better results with both?

Regarding the secondary part, checking the corpus for matching suggestions:
would another way to do this be to have an event listener listen for commits
and build the dictionary of matching corpus words that way? Then you would
avoid the performance hit at query time.

Cheers,
Dan

On Tue, Jul 27, 2010 at 7:04 PM, Mark Holland wrote:

> Hi,
>
> I found the suggestions returned from the standard solr spellcheck not to
> be
> that relevant. By contrast, aspell, given the same dictionary and mispelled
> words, gives much more accurate suggestions.
>
> I therefore wrote an implementation of SolrSpellChecker that wraps jazzy,
> the java aspell library. I also extended the SpellCheckComponent to take
> the
> matrix of suggested words and query the corpus to find the first
> combination
> of suggestions which returned a match. This works well for my use case,
> where term frequency is irrelevant to spelling or scoring.
>
> I'd like to publish the code in case someone finds it useful (although it's
> a bit crude at the moment and will need a decent tidy up). Would it be
> appropriate to open up a Jira issue for this?
>
> Cheers,
> ~mark
>
> On 27 July 2010 09:33, dan sutton  wrote:
>
> > Hi,
> >
> > I've recently been looking into Spellchecking in solr, and was struck by
> > how
> > limited the usefulness of the tool was.
> >
> > Like most corpora , ours contains lots of different spelling mistakes for
> > the same word, so the 'spellcheck.onlyMorePopular' is not really that
> > useful
> > unless you click on it numerous times.
> >
> > I was thinking that since most of the time people spell words correctly
> why
> > was there no other frequency parameter that could enter into the score?
> > i.e.
> > something like:
> >
> > spell_score ~ edit_dist * freq
> >
> > I'm sure others have come across this issue and was wonding what
> > steps/algorithms they have used to overcome these limitations?
> >
> > Cheers,
> > Dan
> >
>


Re: Spellchecking and frequency

2010-07-27 Thread Erick Erickson
"Yonik's Law of Patches" reads: "A half-baked patch in Jira, with no
documentation, no tests and no backwards compatibility is better than no
patch at all."

It'd be perfectly appropriate, IMO, for you to post an outline of what your
enhancements do over on the SOLR dev list and get a reaction from the folks
over there as to whether it should be a Jira or not... see
solr-...@lucene.apache.org

Best
Erick

On Tue, Jul 27, 2010 at 2:04 PM, Mark Holland wrote:

> Hi,
>
> I found the suggestions returned from the standard solr spellcheck not to
> be
> that relevant. By contrast, aspell, given the same dictionary and mispelled
> words, gives much more accurate suggestions.
>
> I therefore wrote an implementation of SolrSpellChecker that wraps jazzy,
> the java aspell library. I also extended the SpellCheckComponent to take
> the
> matrix of suggested words and query the corpus to find the first
> combination
> of suggestions which returned a match. This works well for my use case,
> where term frequency is irrelevant to spelling or scoring.
>
> I'd like to publish the code in case someone finds it useful (although it's
> a bit crude at the moment and will need a decent tidy up). Would it be
> appropriate to open up a Jira issue for this?
>
> Cheers,
> ~mark
>
> On 27 July 2010 09:33, dan sutton  wrote:
>
> > Hi,
> >
> > I've recently been looking into Spellchecking in solr, and was struck by
> > how
> > limited the usefulness of the tool was.
> >
> > Like most corpora , ours contains lots of different spelling mistakes for
> > the same word, so the 'spellcheck.onlyMorePopular' is not really that
> > useful
> > unless you click on it numerous times.
> >
> > I was thinking that since most of the time people spell words correctly
> why
> > was there no other frequency parameter that could enter into the score?
> > i.e.
> > something like:
> >
> > spell_score ~ edit_dist * freq
> >
> > I'm sure others have come across this issue and was wonding what
> > steps/algorithms they have used to overcome these limitations?
> >
> > Cheers,
> > Dan
> >
>


RE: Spellchecking and frequency

2010-07-27 Thread Dyer, James
Mark,

I'd like to see your code if you open a JIRA for this.  I recently
opened SOLR-2010 with a patch that does something similar to the second
part only of what you describe (find combinations that actually return a
match).  But I'm not sure if my approach is the best one so I would like
to see yours to compare.

James Dyer
E-Commerce Systems
Ingram Book Company
(615) 213-4311

-Original Message-
From: Mark Holland [mailto:mark.holl...@zoopla.co.uk] 
Sent: Tuesday, July 27, 2010 1:04 PM
To: solr-user@lucene.apache.org
Subject: Re: Spellchecking and frequency

Hi,

I found the suggestions returned from the standard solr spellcheck not
to be
that relevant. By contrast, aspell, given the same dictionary and
mispelled
words, gives much more accurate suggestions.

I therefore wrote an implementation of SolrSpellChecker that wraps
jazzy,
the java aspell library. I also extended the SpellCheckComponent to take
the
matrix of suggested words and query the corpus to find the first
combination
of suggestions which returned a match. This works well for my use case,
where term frequency is irrelevant to spelling or scoring.

I'd like to publish the code in case someone finds it useful (although
it's
a bit crude at the moment and will need a decent tidy up). Would it be
appropriate to open up a Jira issue for this?

Cheers,
~mark

On 27 July 2010 09:33, dan sutton  wrote:

> Hi,
>
> I've recently been looking into Spellchecking in solr, and was struck by
> how limited the usefulness of the tool was.
>
> Like most corpora, ours contains lots of different spelling mistakes for
> the same word, so the 'spellcheck.onlyMorePopular' is not really that
> useful unless you click on it numerous times.
>
> I was thinking that since most of the time people spell words correctly why
> was there no other frequency parameter that could enter into the score?
> i.e. something like:
>
> spell_score ~ edit_dist * freq
>
> I'm sure others have come across this issue and was wondering what
> steps/algorithms they have used to overcome these limitations?
>
> Cheers,
> Dan
>


Re: Spellchecking and frequency

2010-07-27 Thread Mark Holland
Hi,

I found the suggestions returned from the standard solr spellcheck not to be
that relevant. By contrast, aspell, given the same dictionary and misspelled
words, gives much more accurate suggestions.

I therefore wrote an implementation of SolrSpellChecker that wraps jazzy,
the java aspell library. I also extended the SpellCheckComponent to take the
matrix of suggested words and query the corpus to find the first combination
of suggestions which returned a match. This works well for my use case,
where term frequency is irrelevant to spelling or scoring.

I'd like to publish the code in case someone finds it useful (although it's
a bit crude at the moment and will need a decent tidy up). Would it be
appropriate to open up a Jira issue for this?

Cheers,
~mark
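
The "first combination of suggestions which returned a match" step described above could be sketched roughly as follows. This is only an illustration in Python, not the code from the message: the function name and the `has_hits` callback (standing in for a query against the corpus) are assumptions.

```python
from itertools import product


def first_matching_combination(suggestions, has_hits):
    """Return the first combination of per-word suggestions that matches.

    suggestions: one list of candidate words per query word, best first.
    has_hits:    hypothetical callback; takes a tuple of words and returns
                 True if a query for those words finds at least one document.
    """
    # product() walks the suggestion matrix with the last word varying fastest.
    for combo in product(*suggestions):
        if has_hits(combo):
            return combo
    return None  # no combination matched anything in the corpus
```

With an in-memory stand-in for the corpus, `first_matching_combination([["stale", "state"], ["democrat", "democracy"]], has_hits)` returns the first pair for which every word occurs together in some document.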

On 27 July 2010 09:33, dan sutton  wrote:

> Hi,
>
> I've recently been looking into Spellchecking in solr, and was struck by
> how limited the usefulness of the tool was.
>
> Like most corpora, ours contains lots of different spelling mistakes for
> the same word, so the 'spellcheck.onlyMorePopular' is not really that
> useful unless you click on it numerous times.
>
> I was thinking that since most of the time people spell words correctly why
> was there no other frequency parameter that could enter into the score?
> i.e. something like:
>
> spell_score ~ edit_dist * freq
>
> I'm sure others have come across this issue and was wondering what
> steps/algorithms they have used to overcome these limitations?
>
> Cheers,
> Dan
>


Spellchecking and frequency

2010-07-27 Thread dan sutton
Hi,

I've recently been looking into spellchecking in Solr, and was struck by how
limited the usefulness of the tool was.

Like most corpora, ours contains lots of different spelling mistakes for
the same word, so 'spellcheck.onlyMorePopular' is not really that useful
unless you click on it numerous times.

I was thinking that, since most of the time people spell words correctly, why
is there no other frequency parameter that could enter into the score? I.e.
something like:

spell_score ~ edit_dist * freq

I'm sure others have come across this issue, and I was wondering what
steps/algorithms they have used to overcome these limitations.

Cheers,
Dan
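
One way to read the spell_score idea above is sketched below in Python. It rests on assumptions that are not in the message: the edit distance is turned into a similarity between 0 and 1 (so that closer candidates score higher), and the frequency is log-damped so that very common terms do not swamp everything. The function names are made up for illustration.

```python
import math


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def spell_score(word: str, candidate: str, freq: int) -> float:
    """Similarity in [0, 1] times log-damped frequency (assumed form)."""
    longest = max(len(word), len(candidate), 1)
    similarity = 1.0 - edit_distance(word, candidate) / longest
    return similarity * math.log1p(freq)


def rank(word, term_freqs):
    """Sort candidate terms best first; term_freqs maps term -> frequency."""
    return sorted(term_freqs,
                  key=lambda t: spell_score(word, t, term_freqs[t]),
                  reverse=True)
```

With this scoring, a rare misspelling that is one edit away loses to a frequent correct term that is also one edit away, which is the behaviour the message asks for.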


Re: PECL and Spellchecking

2010-05-05 Thread Israel Ekpo
Hi Peter

A full list of spell check parameters is available here:

http://wiki.apache.org/solr/SpellCheckComponent

With the PECL extension, there is currently no special method that handles
the spell check component, so you would have to use the SolrParams::set() or
SolrParams::setParam() method available from the SolrQuery class (a child of
the SolrParams class).

Below is the code snippet:

// Options for the Solr client (name => value pairs); see
// SolrClient::__construct(). The values here are placeholders.
$options = array('hostname' => 'localhost', 'port' => 8983);

$client = new SolrClient($options);

// Path of the search servlet to query, here the 'spell' handler
$spellcheck_component_name = 'spell';

$client->setServlet(SolrClient::SEARCH_SERVLET_TYPE,
    $spellcheck_component_name);

$q = new SolrQuery();

// General form (set() is an alias of setParam()):
//   $q->set($param_name, $param_value);
//   $q->setParam($param_name, $param_value);

$q->set('spellcheck', 'true');
$q->set('spellcheck.q', 'pecl');
$q->set('spellcheck.build', 'true');

$response = $client->query($q);

That should do it.

I hope this helps.

On Wed, May 5, 2010 at 4:56 AM, Peter Gabriel  wrote:

> Hi there,
>
> I'm working with the solr-pecl extension and am wondering how to permanently
> activate spellchecking.
> I couldn't find a command in the pecl library to activate it from the
> client, like $solrQuery->enableFacet(true) for facets.
>
> Or is it possible to keep spellchecking permanently activated via solrconfig,
> without using the "&spellcheck=true" parameter?
>
> Would be nice if someone could help me.
>
> Thx and greetings,
> Peter
>



-- 
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


PECL and Spellchecking

2010-05-05 Thread Peter Gabriel
Hi there,

I'm working with the solr-pecl extension and am wondering how to permanently
activate spellchecking.
I couldn't find a command in the pecl library to activate it from the client,
like $solrQuery->enableFacet(true) for facets.

Or is it possible to keep spellchecking permanently activated via solrconfig,
without using the "&spellcheck=true" parameter?

Would be nice if someone could help me.

Thx and greetings,
Peter


Re: SpellChecking

2010-05-04 Thread Jan Kammer

Hi,

Thanks, that is exactly what I forgot. Now it works fine. :-)

On 03.05.2010 16:50, Michael Kuhlmann wrote:

> On 03.05.2010 16:43, Jan Kammer wrote:
>> Hi,
>>
>> It worked fine with a normal field. There must be something wrong with
>> copyfield, or why does dataimporthandler add/update no more documents?
>
> Did you define your destination field as multivalue?
>
> -Michael




Re: SpellChecking

2010-05-03 Thread Michael Kuhlmann
On 03.05.2010 16:43, Jan Kammer wrote:
> Hi,
> 
> It worked fine with a normal field. There must be something wrong with
> copyfield, or why does dataimporthandler add/update no more documents?

Did you define your destination field as multivalue?

-Michael


Re: SpellChecking

2010-05-03 Thread Jan Kammer

Hi,

I build the index with ...&spellcheck.build=true
It worked fine with a normal field. There must be something wrong with
copyfield, or why does dataimporthandler add/update no more documents?


Can somebody paste the code for copyfield with many fields?

Greetz, Jan



On 03.05.2010 16:36, Villemos, Gert wrote:

We are using copy fields for 40+ fields to do spelling, and it works
fine.

Are you sure that you actually build the spell index before you try to
do spelling? You need to either configure Solr to build the spell index on
commit, or manually issue a spell index build request.

Regards,
Gert.





-Original Message-
From: Jan Kammer [mailto:jan.kam...@mni.fh-giessen.de]
Sent: Monday, 3 May 2010 16:26
To: solr-user@lucene.apache.org
Subject: Re: SpellChecking

Hi,

if I define one of my normal fields from schema.xml in solrconfig.xml
for spellchecking all works fine:

...

That didn't work, because nothing was in "spell" after that.

Next try was to copy each field in a line to "spell":



...
This does work up to 3 documents; if I define more, the count of failed
documents in dataimporthandler gets higher and higher the more I copy
into "spell".
16444

So my question is whether this is the right way to use the spellchecker with
many fields, or is there another "better" way...

thanks.

greetz, Jan

On 03.05.2010 16:08, Erick Erickson wrote:

It would help a lot to see your actual config file, and if you provided a
bit more detail about what failure looks like

Best
Erick

On Mon, May 3, 2010 at 9:43 AM, Jan Kammer wrote:

Hi there,

I want to enable spellchecking, but i got many fields.

I tried around with copyfield to copy all with "*" in one field, but that
didnt work.
Next try was to copy some fields specified each by name in one field named
"spell", but that worked only for 2 or 3 fields, but not for 10 or more...

My question is, what the best practice is to enable spellchecking on many
fields.

thanks.

greetz, Jan


Please help Logica to respect the environment by not printing this email  / 
Pour contribuer comme Logica au respect de l'environnement, merci de ne pas 
imprimer ce mail /  Bitte drucken Sie diese Nachricht nicht aus und helfen Sie 
so Logica dabei, die Umwelt zu schützen. /  Por favor ajude a Logica a 
respeitar o ambiente nao imprimindo este correio electronico.



This e-mail and any attachment is for authorised use by the intended 
recipient(s) only. It may contain proprietary material, confidential 
information and/or be subject to legal privilege. It should not be copied, 
disclosed to, retained or used by, any other party. If you are not an intended 
recipient then please promptly delete this e-mail and any attachment and all 
copies and inform the sender. Thank you.


   




RE: SpellChecking

2010-05-03 Thread Villemos, Gert
We are using copy fields for 40+ fields to do spelling, and it works
fine.

Are you sure that you actually build the spell index before you try to
do spelling? You need to either configure Solr to build the spell index on
commit, or manually issue a spell index build request.

Regards,
Gert.
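
A manual spell index build request, as mentioned above, might be issued along the following lines. This is a sketch in Python that only constructs the URL. The host, port and the "spell" handler name are assumptions about the local setup; the spellcheck and spellcheck.build parameters are the ones used elsewhere in this thread.

```python
from urllib.parse import urlencode


def spellcheck_build_url(base_url: str, handler: str = "spell") -> str:
    """Build a request URL that asks Solr to (re)build the spell index.

    base_url and handler are placeholders for the local installation,
    e.g. "http://localhost:8983/solr" and a handler registered as "spell".
    """
    params = {
        "q": "*:*",                  # any query will do; the build is a side effect
        "rows": "0",                 # no documents needed in the response
        "spellcheck": "true",
        "spellcheck.build": "true",  # triggers the dictionary build
    }
    return f"{base_url.rstrip('/')}/{handler}?{urlencode(params)}"
```

Fetching the resulting URL once after indexing (for example with urllib.request.urlopen) should be enough; it does not need to be sent with every search.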





-Original Message-
From: Jan Kammer [mailto:jan.kam...@mni.fh-giessen.de] 
Sent: Monday, 3 May 2010 16:26
To: solr-user@lucene.apache.org
Subject: Re: SpellChecking

Hi,

if I define one of my normal fields from schema.xml in solrconfig.xml 
for spellchecking all works fine:

...

That didn't work, because nothing was in "spell" after that.

Next try was to copy each field in a line to "spell":



...
This does work up to 3 documents; if I define more, the count of failed
documents in dataimporthandler gets higher and higher the more I copy
into "spell".
16444

So my question is whether this is the right way to use the spellchecker with
many fields, or is there another "better" way...

thanks.

greetz, Jan

On 03.05.2010 16:08, Erick Erickson wrote:
> It would help a lot to see your actual config file, and if you provided a
> bit more detail about what failure looks like
>
> Best
> Erick
>
> On Mon, May 3, 2010 at 9:43 AM, Jan Kammer wrote:
>
>> Hi there,
>>
>> I want to enable spellchecking, but i got many fields.
>>
>> I tried around with copyfield to copy all with "*" in one field, but that
>> didnt work.
>> Next try was to copy some fields specified each by name in one field named
>> "spell", but that worked only for 2 or 3 fields, but not for 10 or more...
>>
>> My question is, what the best practice is to enable spellchecking on many
>> fields.
>>
>> thanks.
>>
>> greetz, Jan
>>
>







Re: SpellChecking

2010-05-03 Thread Jan Kammer

Hi,

if I define one of my normal fields from schema.xml in solrconfig.xml 
for spellchecking all works fine:


...



That didn't work, because nothing was in "spell" after that.

Next try was to copy each field in a line to "spell":



...
This does work up to 3 documents; if I define more, the count of failed
documents in dataimporthandler gets higher and higher the more I copy
into "spell".

16444

So my question is whether this is the right way to use the spellchecker with
many fields, or is there another "better" way...


thanks.

greetz, Jan

On 03.05.2010 16:08, Erick Erickson wrote:

It would help a lot to see your actual config file, and if you provided a
bit more
detail about what failure looks like

Best
Erick

On Mon, May 3, 2010 at 9:43 AM, Jan Kammer wrote:

   

Hi there,

I want to enable spellchecking, but i got many fields.

I tried around with copyfield to copy all with "*" in one field, but that
didnt work.
Next try was to copy some fields specified each by name in one field named
"spell", but that worked only for 2 or 3 fields, but not for 10 or more...

My question is, what the best practice is to enable spellchecking on many
fields.

thanks.

greetz, Jan

 
   




Re: SpellChecking

2010-05-03 Thread Erick Erickson
It would help a lot to see your actual config file, and if you provided a
bit more detail about what the failure looks like.

Best
Erick

On Mon, May 3, 2010 at 9:43 AM, Jan Kammer wrote:

> Hi there,
>
> I want to enable spellchecking, but i got many fields.
>
> I tried around with copyfield to copy all with "*" in one field, but that
> didnt work.
> Next try was to copy some fields specified each by name in one field named
> "spell", but that worked only for 2 or 3 fields, but not for 10 or more...
>
> My question is, what the best practice is to enable spellchecking on many
> fields.
>
> thanks.
>
> greetz, Jan
>


SpellChecking

2010-05-03 Thread Jan Kammer

Hi there,

I want to enable spellchecking, but i got many fields.

I tried around with copyfield to copy all with "*" in one field, but 
that didnt work.
Next try was to copy some fields specified each by name in one field 
named "spell", but that worked only for 2 or 3 fields, but not for 10 or 
more...


My question is, what the best practice is to enable spellchecking on 
many fields.


thanks.

greetz, Jan


Re: Spellchecking - Is there a way to do this?

2009-12-17 Thread Lance Norskog
Another thing you might check into is stemming. The Porter stemmer
included in Solr is "aggressive", meaning that it will tend to do
weird things with misspellings. A different stemmer called KStem,
available from www.lucidimagination.com/Downloads, is less
aggressive. Porter turns "changes" and "changing" into "chang",
while KStem does not go this far.

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem

On Thu, Dec 17, 2009 at 12:59 PM, Lance Norskog  wrote:
> Character-based NGrams are a good tool for this problem. MLT is a
> document-wide numerical analysis.
>
> If the common types of OCR mistakes are different than what NGrams
> create, you might tune the ngram generator. For example, swapping
> letters might not happen very often. Single- and multi-word errors
> must happen a lot.
>
> If you do a facet query on your indexed terms, you will get a lot of
> facets with only one appearance in the index. These are often
> misspellings. It is possible to automate pulling these and creating a
> matching set of synonyms for words that appear in the spelling index.
>
> On Tue, Dec 15, 2009 at 12:57 PM, Chris Hostetter
>  wrote:
>>
>> : My first problem appears because I need suggestions inclusive when the
>> : expression has returned results. It's seems that only appear
>> : suggestions when there are no results. Is there a way to do so?
>>
>> can you give us an example of what your queries look like?  with the
>> example configs, i can get matches, as well as suggestions...
>>
>>
>> http://localhost:8983/solr/spell?q=ide&spellcheck=true
>>
>> : The second question is: For the purposes that I've mentioned, is the
>> : best way to use spellchecker or mlt component? Or some other (as a
>> : fuzzy query)?
>>
>> there's no clear cut answer to that -- i don't remember anyone else ever
>> asking about anything particularly similar to what you're doing, so i
>> don't know that there is any precedent for a "best" way to go about it.
>>
>>
>>
>> -Hoss
>>
>>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>



-- 
Lance Norskog
goks...@gmail.com


Re: Spellchecking - Is there a way to do this?

2009-12-17 Thread Lance Norskog
Character-based NGrams are a good tool for this problem. MLT is a
document-wide numerical analysis.

If the common types of OCR mistakes are different than what NGrams
create, you might tune the ngram generator. For example, swapping
letters might not happen very often. Single- and multi-word errors
must happen a lot.

If you do a facet query on your indexed terms, you will get a lot of
facets with only one appearance in the index. These are often
misspellings. It is possible to automate pulling these and creating a
matching set of synonyms for words that appear in the spelling index.

On Tue, Dec 15, 2009 at 12:57 PM, Chris Hostetter
 wrote:
>
> : My first problem appears because I need suggestions inclusive when the
> : expression has returned results. It's seems that only appear
> : suggestions when there are no results. Is there a way to do so?
>
> can you give us an example of what your queries look like?  with the
> example configs, i can get matches, as well as suggestions...
>
>
> http://localhost:8983/solr/spell?q=ide&spellcheck=true
>
> : The second question is: For the purposes that I've mentioned, is the
> : best way to use spellchecker or mlt component? Or some other (as a
> : fuzzy query)?
>
> there's no clear cut answer to that -- i don't remember anyone else ever
> asking about anything particularly similar to what you're doing, so i
> don't know that there is any precedent for a "best" way to go about it.
>
>
>
> -Hoss
>
>



-- 
Lance Norskog
goks...@gmail.com
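
The idea above of pulling terms that appear only once and pairing them with likely correct spellings could be sketched like this. It is a rough Python illustration, not something from the thread: the bigram Jaccard similarity, the thresholds and the function names are all assumptions, and the term counts would come from a facet query on the indexed field.

```python
def char_ngrams(term: str, n: int = 2) -> set:
    """Character n-grams of a term, padded so word edges count too."""
    padded = f"_{term}_"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Jaccard overlap of the two terms' character n-gram sets."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    return len(ga & gb) / len(ga | gb)


def synonym_candidates(term_counts, min_common=5, threshold=0.5):
    """Map each single-occurrence term to its closest frequent term.

    term_counts: term -> number of occurrences (e.g. from facet counts).
    Only pairs whose similarity reaches the threshold are returned.
    """
    rare = [t for t, c in term_counts.items() if c == 1]
    common = [t for t, c in term_counts.items() if c >= min_common]
    pairs = {}
    for r in rare:
        best = max(common, key=lambda c: ngram_similarity(r, c), default=None)
        if best is not None and ngram_similarity(r, best) >= threshold:
            pairs[r] = best
    return pairs
```

The resulting pairs would still need a human look before being written into a synonyms file, since a term that appears once is not always a misspelling.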


Re: Spellchecking - Is there a way to do this?

2009-12-15 Thread Chris Hostetter

: My first problem appears because I need suggestions inclusive when the
: expression has returned results. It's seems that only appear
: suggestions when there are no results. Is there a way to do so?

can you give us an example of what your queries look like?  with the 
example configs, i can get matches, as well as suggestions...


http://localhost:8983/solr/spell?q=ide&spellcheck=true

: The second question is: For the purposes that I've mentioned, is the
: best way to use spellchecker or mlt component? Or some other (as a
: fuzzy query)?

there's no clear cut answer to that -- i don't remember anyone else ever 
asking about anything particularly similar to what you're doing, so i 
don't know that there is any precedent for a "best" way to go about it.



-Hoss



Re: Re: Solr Cell and Spellchecking.

2009-12-09 Thread boyleme

I just resolved the issue (fresh coffee == good) ! 

In my schema, I had added:



but missed the copyField definition. Adding these:




and a restart and everything is working properly.

Thanks for the reply and for LucidImagination -- the only reason I have been 
able to get Solr integrated into our ruby app.

-Mike



Re: Solr Cell and Spellchecking.

2009-12-09 Thread Grant Ingersoll
What's your schema and your config look like for the various relevant pieces?

On Dec 8, 2009, at 8:04 PM, Michael Boyle wrote:

> Following Erik Hatcher's post about using SolrCell and acts_as_solr { 
> http://www.lucidimagination.com/blog/2009/02/17/acts_as_solr_cell/ }, I have 
> been able to index a rich document stream and retrieve its id. No worries.
> 
> However, I have the SpellCheckComponent set up to build on commit 
> (buildOnCommit=true). Alas, the rich document text is not being added to the 
> spellchecker dictionary.
> 
> Is there something special I need to do within solrconfig.xml or within 
> the acts_as_solr ruby classes?
> 
> - thanks in advance for any ideas -
> 
> Mike Boyle

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using 
Solr/Lucene:
http://www.lucidimagination.com/search



Solr Cell and Spellchecking.

2009-12-08 Thread Michael Boyle
Following Erik Hatcher's post about using SolrCell and acts_as_solr { 
http://www.lucidimagination.com/blog/2009/02/17/acts_as_solr_cell/ }, I 
have been able to index a rich document stream and retrieve its id. No 
worries.


However, I have the SpellCheckComponent set up to build on commit 
(buildOnCommit=true). Alas, the rich document text is not being added to 
the spellchecker dictionary.


Is there something special I need to do within solrconfig.xml or 
within the acts_as_solr ruby classes?


- thanks in advance for any ideas -

Mike Boyle


Spellchecking - Is there a way to do this?

2009-12-06 Thread Germán Biozzoli
Hello everybody

1. I have tons of digitized text containing the usual errors from the OCR
process.
2. I have indexed it with Solr and it is working OK.
3. I have added an index-based spellchecker for words and phrases, hoping to
offer "suspicious" variants of the query or related query expressions, so
that users can find documents that contain the original expression with
OCR errors (e.g. the user searches for "state and democracy" and the
interface offers "stete and demcraci" as an alternate query expression).

My first problem is that I need suggestions even when the expression has
returned results. It seems that suggestions only appear when there are no
results. Is there a way to do this?

The second question is: for the purposes I've mentioned, is it best to use
the spellchecker or the MLT component? Or something else (such as a fuzzy
query)?

Thanks a lot
German


Re: spellchecking multiple fields?

2008-07-16 Thread Ryan McKinley
and the caveat that all fields would need to be declared in the  
solrconfig.xml (or get used for both fields)


this could work...  would also need to augment the response with the  
name of the dictionary, or assert that something will be written all  
the time (so you could know the 2nd  would be for  
the 2nd configured dictionary).



On Jul 16, 2008, at 8:06 AM, Grant Ingersoll wrote:


Another thought that might work:

Declare two separate components, one for each field and then  
implement a QueryConverter that takes in the field and only extracts  
the tokens for the field or choice.


This is a definite workaround, but I think it might work.  Hmm,  
except we only have one QueryConverter


-Grant

On Jul 15, 2008, at 8:56 PM, Ryan McKinley wrote:

I have a use case where I want to spellcheck the input query across  
multiple fields:

Did you mean: location = washington
vs
Did you mean: person = washington

The current parameter / response structure for the spellcheck  
component does not support this kind of thing.  Any thoughts on how/ 
if the component should handle this?  Perhaps it could be in a  
requestHandler where the params are passed in as json?


spelling={ dictionary="location",  
onlyMorePopular=true}&spelling={ dictionary="person",  
onlyMorePopular=false }


Thoughts?
ryan







Re: spellchecking multiple fields?

2008-07-16 Thread Grant Ingersoll

Another thought that might work:

Declare two separate components, one for each field and then implement  
a QueryConverter that takes in the field and only extracts the tokens  
for the field or choice.


This is a definite workaround, but I think it might work.  Hmm, except  
we only have one QueryConverter


-Grant

On Jul 15, 2008, at 8:56 PM, Ryan McKinley wrote:

I have a use case where I want to spellcheck the input query across  
multiple fields:

Did you mean: location = washington
 vs
Did you mean: person = washington

The current parameter / response structure for the spellcheck  
component does not support this kind of thing.  Any thoughts on how/ 
if the component should handle this?  Perhaps it could be in a  
requestHandler where the params are passed in as json?


spelling={ dictionary="location",  
onlyMorePopular=true}&spelling={ dictionary="person",  
onlyMorePopular=false }


Thoughts?
ryan





Re: spellchecking multiple fields?

2008-07-15 Thread Shalin Shekhar Mangar
One way would be to create a copyField containing both the fields and use it
as the dictionary's source.

If you do want to keep separate dictionaries for both the fields then I
guess we can introduce per-dictionary overridable parameters like the
per-field overridden facet parameters. That would be cleaner than json
params. What do you think?

On Wed, Jul 16, 2008 at 6:26 AM, Ryan McKinley <[EMAIL PROTECTED]> wrote:

> I have a use case where I want to spellcheck the input query across
> multiple fields:
>  Did you mean: location = washington
>  vs
>  Did you mean: person = washington
>
> The current parameter / response structure for the spellcheck component
> does not support this kind of thing.  Any thoughts on how/if the component
> should handle this?  Perhaps it could be in a requestHandler where the
> params are passed in as json?
>
>  spelling={ dictionary="location", onlyMorePopular=true}&spelling={
> dictionary="person", onlyMorePopular=false }
>
> Thoughts?
> ryan
>



-- 
Regards,
Shalin Shekhar Mangar.
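
To make the per-dictionary override idea above a little more concrete, here is a small Python sketch of how such parameters could be resolved. The key scheme "spellcheck.<dictionary>.<param>" is purely hypothetical and invented for this illustration; it is not an existing Solr parameter syntax.

```python
def resolve_param(params, dictionary, name, default=None):
    """Look up a per-dictionary override first, then the global value.

    params:     request parameters as a plain dict.
    dictionary: dictionary name, e.g. "location" or "person".
    name:       parameter name without prefix, e.g. "onlyMorePopular".
    """
    override_key = f"spellcheck.{dictionary}.{name}"  # hypothetical scheme
    global_key = f"spellcheck.{name}"
    if override_key in params:
        return params[override_key]
    return params.get(global_key, default)
```

A request carrying spellcheck.onlyMorePopular=false plus spellcheck.location.onlyMorePopular=true would then give true for the "location" dictionary and false for "person", without needing JSON-valued parameters.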


spellchecking multiple fields?

2008-07-15 Thread Ryan McKinley
I have a use case where I want to spellcheck the input query across  
multiple fields:

 Did you mean: location = washington
  vs
 Did you mean: person = washington

The current parameter / response structure for the spellcheck  
component does not support this kind of thing.  Any thoughts on how/if  
the component should handle this?  Perhaps it could be in a  
requestHandler where the params are passed in as json?


 spelling={ dictionary="location",  
onlyMorePopular=true}&spelling={ dictionary="person",  
onlyMorePopular=false }


Thoughts?
ryan


Re: Integrated Spellchecking

2008-02-20 Thread Doug Steigerwald

Posted our patches if anyone wants to take a look: 
https://issues.apache.org/jira/browse/SOLR-433

Small change to core.RunExecutableListener and all the changes to the shell 
scripts.

All these scripts seem to run fine on RHEL-3 and RHEL-5.1 servers.

doug

Doug Steigerwald wrote:

Sure.  I'll try to post it today or tomorrow.

Doug Steigerwald
Software Developer
McClatchy Interactive
[EMAIL PROTECTED]
919.861.1287

Otis Gospodnetic wrote:

Hey Doug,

You have multicore/spellcheck replication going already?  We have been 
working on the replication for multicore.  Sounds like we are 
replicating each others work.  When will you be able to attach your 
stuff to JIRA issue? https://issues.apache.org/jira/browse/SOLR-433
 
Thanks,

Otis

--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 

From: Doug Steigerwald <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Friday, February 15, 2008 12:45:08 PM
Subject: Re: Integrated Spellchecking

That unfortunately got pushed aside to work on some of our higher 
priority solr work since we already had it working one way.


Hoping to revisit this after we push to production and start working 
on new features and share what I've done for this and 
multicore/spellcheck replication (which we have working quite well in 
QA right now).


Doug Steigerwald
Software Developer
McClatchy Interactive
[EMAIL PROTECTED]
919.861.1287


oleg_gnatovskiy wrote:


dsteiger wrote:
I've got a couple search components for automatic spell correction 
that

I've been working on.

I've converted most of the SpellCheckerRequestHandler to a search
component (hopefully will throw a patch out soon for this).  Then 
another search component that will do auto

correction for a query if the search returns zero results.

We're hoping to see some performance improvements out of handling 
this in

Solr instead of our Rails service.

doug


Ryan McKinley wrote:

Yes -- this is what search components are for!

Depending on where you put it in the chain, it could only return 
spell checked results if there are too few results (or the top 
score is below some threshold)


ryan


Grant Ingersoll wrote:
Is it feasible to submit a query to any of the various handlers 
and have it bring back results and spelling suggestions all in 
one response?  Is this something the query components piece would 
handle, assuming one exists for the spell checker?


Thanks,
Grant



So have you succeeded in implementing this patch? I'd definitely 
like to use

this functionality as a search suggestion.




Re: Integrated Spellchecking

2008-02-20 Thread Doug Steigerwald
Allocating some time to this next week.  Need to try and remember what issues I was having when I 
stopped working on it.


doug

Matthew Runo wrote:
I'd have to agree with this. I'd probably be able to put a bit of work 
into it as well, as it's something we'd use for sure if it were available.


Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833

On Feb 18, 2008, at 6:09 AM, Grant Ingersoll wrote:


Hey Doug,

If you have permission to donate, perhaps you can just post the patch 
anyway and state that it isn't quite ready to go.  This is something I 
could use too, and so may have some cycles to work on it.  I hate to 
replicate the work if you already have something that is more or less 
working.  A half baked patch is better than no patch.


-Grant


On Feb 15, 2008, at 12:45 PM, Doug Steigerwald wrote:

That unfortunately got pushed aside to work on some of our higher 
priority solr work since we already had it working one way.


Hoping to revisit this after we push to production and start working 
on new features and share what I've done for this and 
multicore/spellcheck replication (which we have working quite well in 
QA right now).


Doug Steigerwald
Software Developer
McClatchy Interactive
[EMAIL PROTECTED]
919.861.1287


oleg_gnatovskiy wrote:

dsteiger wrote:
I've got a couple search components for automatic spell correction 
that

I've been working on.

I've converted most of the SpellCheckerRequestHandler to a search
component (hopefully will throw a patch out soon for this).  Then 
another search component that will do auto

correction for a query if the search returns zero results.

We're hoping to see some performance improvements out of handling 
this in

Solr instead of our Rails service.

doug


Ryan McKinley wrote:

Yes -- this is what search components are for!

Depending on where you put it in the chain, it could only return 
spell checked results if there are too few results (or the top 
score is below some threshold)


ryan


Grant Ingersoll wrote:
Is it feasible to submit a query to any of the various handlers 
and have it bring back results and spelling suggestions all in 
one response?  Is this something the query components piece would 
handle, assuming one exists for the spell checker?


Thanks,
Grant



So have you succeeded in implementing this patch? I'd definitely 
like to use

this functionality as a search suggestion.





Re: Integrated Spellchecking

2008-02-20 Thread Doug Steigerwald

Sure.  I'll try to post it today or tomorrow.

Doug Steigerwald
Software Developer
McClatchy Interactive
[EMAIL PROTECTED]
919.861.1287

Otis Gospodnetic wrote:

Hey Doug,

You have multicore/spellcheck replication going already?  We have been working 
on the replication for multicore.  Sounds like we are replicating each 
others work.  When will you be able to attach your stuff to JIRA issue? 
https://issues.apache.org/jira/browse/SOLR-433
 
Thanks,

Otis

--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 

From: Doug Steigerwald <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Friday, February 15, 2008 12:45:08 PM
Subject: Re: Integrated Spellchecking

That unfortunately got pushed aside to work on some of our higher priority solr 
work since we 
already had it working one way.


Hoping to revisit this after we push to production and start working on new 
features and share what 
I've done for this and multicore/spellcheck replication (which we have working 
quite well in QA 
right now).


Doug Steigerwald
Software Developer
McClatchy Interactive
[EMAIL PROTECTED]
919.861.1287


oleg_gnatovskiy wrote:


dsteiger wrote:

I've got a couple search components for automatic spell correction that
I've been working on.

I've converted most of the SpellCheckerRequestHandler to a search
component (hopefully will throw a patch out soon for this).  Then another
search component that will do auto correction for a query if the search
returns zero results.

We're hoping to see some performance improvements out of handling this in
Solr instead of our Rails service.


doug


Ryan McKinley wrote:

Yes -- this is what search components are for!

Depending on where you put it in the chain, it could only return spell 
checked results if there are too few results (or the top score is below 
some threshold)


ryan


Grant Ingersoll wrote:
Is it feasible to submit a query to any of the various handlers and 
have it bring back results and spelling suggestions all in one 
response?  Is this something the query components piece would handle, 
assuming one exists for the spell checker?


Thanks,
Grant



So have you succeeded in implementing this patch? I'd definitely like to use
this functionality as a search suggestion.




Re: Integrated Spellchecking

2008-02-19 Thread Otis Gospodnetic
Hey Doug,

You have multicore/spellcheck replication going already?  We have been working 
on the replication for multicore.  Sounds like we are replicating each 
other's work.  When will you be able to attach your stuff to the JIRA issue? 
https://issues.apache.org/jira/browse/SOLR-433
 
Thanks,
Otis

--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
> From: Doug Steigerwald <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Friday, February 15, 2008 12:45:08 PM
> Subject: Re: Integrated Spellchecking
> 
> That unfortunately got pushed aside to work on some of our higher priority 
> solr work since we already had it working one way.
> 
> Hoping to revisit this after we push to production and start working on new 
> features and share what I've done for this and multicore/spellcheck 
> replication (which we have working quite well in QA right now).
> 
> Doug Steigerwald
> Software Developer
> McClatchy Interactive
> [EMAIL PROTECTED]
> 919.861.1287
> 
> 
> oleg_gnatovskiy wrote:
> > 
> > 
> > dsteiger wrote:
> >> I've got a couple search components for automatic spell correction that
> >> I've been working on.
> >>
> >> I've converted most of the SpellCheckerRequestHandler to a search
> >> component (hopefully will throw a patch out soon for this).  Then another
> >> search component that will do auto correction for a query if the search
> >> returns zero results.
> >>
> >> We're hoping to see some performance improvements out of handling this in
> >> Solr instead of our Rails service.
> >>
> >> doug
> >>
> >>
> >> Ryan McKinley wrote:
> >>> Yes -- this is what search components are for!
> >>>
> >>> Depending on where you put it in the chain, it could only return spell 
> >>> checked results if there are too few results (or the top score is below 
> >>> some threshold)
> >>>
> >>> ryan
> >>>
> >>>
> >>> Grant Ingersoll wrote:
> >>>> Is it feasible to submit a query to any of the various handlers and 
> >>>> have it bring back results and spelling suggestions all in one 
> >>>> response?  Is this something the query components piece would handle, 
> >>>> assuming one exists for the spell checker?
> >>>>
> >>>> Thanks,
> >>>> Grant
> >>>>
> >>
> > 
> > 
> > So have you succeeded in implementing this patch? I'd definitely like to use
> > this functionality as a search suggestion.
> 




Re: Integrated Spellchecking

2008-02-19 Thread Matthew Runo
I'd have to agree with this. I'd probably be able to put a bit of work  
into it as well, as it's something we'd use for sure if it were  
available.


Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833

On Feb 18, 2008, at 6:09 AM, Grant Ingersoll wrote:


Hey Doug,

If you have permission to donate, perhaps you can just post the  
patch anyway and state that it isn't quite ready to go.  This is  
something I could use too, and so may have some cycles to work on  
it.  I hate to replicate the work if you already have something that  
is more or less working.  A half baked patch is better than no patch.


-Grant


On Feb 15, 2008, at 12:45 PM, Doug Steigerwald wrote:

That unfortunately got pushed aside to work on some of our higher  
priority solr work since we already had it working one way.


Hoping to revisit this after we push to production and start  
working on new features and share what I've done for this and  
multicore/spellcheck replication (which we have working quite well  
in QA right now).


Doug Steigerwald
Software Developer
McClatchy Interactive
[EMAIL PROTECTED]
919.861.1287


oleg_gnatovskiy wrote:

dsteiger wrote:
I've got a couple search components for automatic spell correction that
I've been working on.

I've converted most of the SpellCheckerRequestHandler to a search
component (hopefully will throw a patch out soon for this).  Then
another search component that will do auto correction for a query if
the search returns zero results.

We're hoping to see some performance improvements out of handling this
in Solr instead of our Rails service.

doug


Ryan McKinley wrote:

Yes -- this is what search components are for!

Depending on where you put it in the chain, it could only return  
spell checked results if there are too few results (or the top  
score is below some threshold)


ryan


Grant Ingersoll wrote:
Is it feasible to submit a query to any of the various handlers  
and have it bring back results and spelling suggestions all in  
one response?  Is this something the query components piece  
would handle, assuming one exists for the spell checker?


Thanks,
Grant



So have you succeeded in implementing this patch? I'd definitely like to use
this functionality as a search suggestion.







Re: Integrated Spellchecking

2008-02-18 Thread Grant Ingersoll

Hey Doug,

If you have permission to donate, perhaps you can just post the patch  
anyway and state that it isn't quite ready to go.  This is something I  
could use too, and so may have some cycles to work on it.  I hate to  
replicate the work if you already have something that is more or less  
working.  A half baked patch is better than no patch.


-Grant


On Feb 15, 2008, at 12:45 PM, Doug Steigerwald wrote:

That unfortunately got pushed aside to work on some of our higher  
priority solr work since we already had it working one way.


Hoping to revisit this after we push to production and start working  
on new features and share what I've done for this and  
multicore/spellcheck replication (which we have working quite well in  
QA right now).


Doug Steigerwald
Software Developer
McClatchy Interactive
[EMAIL PROTECTED]
919.861.1287


oleg_gnatovskiy wrote:

dsteiger wrote:
I've got a couple search components for automatic spell correction that
I've been working on.

I've converted most of the SpellCheckerRequestHandler to a search
component (hopefully will throw a patch out soon for this).  Then
another search component that will do auto correction for a query if
the search returns zero results.

We're hoping to see some performance improvements out of handling this
in Solr instead of our Rails service.

doug


Ryan McKinley wrote:

Yes -- this is what search components are for!

Depending on where you put it in the chain, it could only return  
spell checked results if there are too few results (or the top  
score is below some threshold)


ryan


Grant Ingersoll wrote:
Is it feasible to submit a query to any of the various handlers  
and have it bring back results and spelling suggestions all in  
one response?  Is this something the query components piece  
would handle, assuming one exists for the spell checker?


Thanks,
Grant



So have you succeeded in implementing this patch? I'd definitely like to use
this functionality as a search suggestion.





Re: Integrated Spellchecking

2008-02-15 Thread Doug Steigerwald
That unfortunately got pushed aside to work on some of our higher priority solr work since we 
already had it working one way.


Hoping to revisit this after we push to production and start working on new features and share what 
I've done for this and multicore/spellcheck replication (which we have working quite well in QA 
right now).


Doug Steigerwald
Software Developer
McClatchy Interactive
[EMAIL PROTECTED]
919.861.1287


oleg_gnatovskiy wrote:



dsteiger wrote:

I've got a couple search components for automatic spell correction that
I've been working on.

I've converted most of the SpellCheckerRequestHandler to a search
component (hopefully will throw a patch out soon for this).  Then another
search component that will do auto correction for a query if the search
returns zero results.

We're hoping to see some performance improvements out of handling this in
Solr instead of our Rails service.


doug


Ryan McKinley wrote:

Yes -- this is what search components are for!

Depending on where you put it in the chain, it could only return spell 
checked results if there are too few results (or the top score is below 
some threshold)


ryan


Grant Ingersoll wrote:
Is it feasible to submit a query to any of the various handlers and 
have it bring back results and spelling suggestions all in one 
response?  Is this something the query components piece would handle, 
assuming one exists for the spell checker?


Thanks,
Grant






So have you succeeded in implementing this patch? I'd definitely like to use
this functionality as a search suggestion.


Re: Integrated Spellchecking

2008-02-15 Thread oleg_gnatovskiy



dsteiger wrote:
> 
> I've got a couple search components for automatic spell correction that
> I've been working on.
> 
> I've converted most of the SpellCheckerRequestHandler to a search
> component (hopefully will throw a patch out soon for this).  Then another
> search component that will do auto correction for a query if the search
> returns zero results.
> 
> We're hoping to see some performance improvements out of handling this in
> Solr instead of our Rails service.
> 
> doug
> 
> 
> Ryan McKinley wrote:
>> Yes -- this is what search components are for!
>> 
>> Depending on where you put it in the chain, it could only return spell 
>> checked results if there are too few results (or the top score is below 
>> some threshold)
>> 
>> ryan
>> 
>> 
>> Grant Ingersoll wrote:
>>> Is it feasible to submit a query to any of the various handlers and 
>>> have it bring back results and spelling suggestions all in one 
>>> response?  Is this something the query components piece would handle, 
>>> assuming one exists for the spell checker?
>>>
>>> Thanks,
>>> Grant
>>>
> 
> 


So have you succeeded in implementing this patch? I'd definitely like to use
this functionality as a search suggestion.
-- 
View this message in context: 
http://www.nabble.com/Integrated-Spellchecking-tp14930232p15504125.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Integrated Spellchecking

2008-01-17 Thread Grant Ingersoll


On Jan 17, 2008, at 3:01 PM, Doug Steigerwald wrote:

I've got a couple search components for automatic spell correction  
that I've been working on.


I've converted most of the SpellCheckerRequestHandler to a search  
component (hopefully will throw a patch out soon for this).  Then  
another search component that will do auto correction for a query if  
the search returns zero results.


If you need somebody to test, throw it up on a JIRA, as I would be  
happy to test.


-Grant


Re: Integrated Spellchecking

2008-01-17 Thread Doug Steigerwald

I've got a couple search components for automatic spell correction that I've 
been working on.

I've converted most of the SpellCheckerRequestHandler to a search component (hopefully will throw a 
patch out soon for this).  Then another search component that will do auto correction for a query if 
the search returns zero results.
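
For readers following along, the zero-results auto correction described above might look roughly like the sketch below. It is plain Python over an in-memory list, not the actual search component or patch; `DOCS`, `search`, `correct` and `search_with_autocorrect` are hypothetical names, and `difflib.get_close_matches` is only a stand-in for the Lucene spell checker.

```python
import difflib

# Hypothetical in-memory "index": document id -> text.
DOCS = {
    1: "solr spellchecker component",
    2: "lucene index replication",
    3: "search suggestion handler",
}

# Vocabulary the stand-in spell checker draws suggestions from.
VOCAB = sorted({word for text in DOCS.values() for word in text.split()})


def search(query):
    """Return ids of documents containing every term of the query."""
    terms = query.split()
    return [doc_id for doc_id, text in DOCS.items()
            if all(term in text.split() for term in terms)]


def correct(query):
    """Replace each unknown term with its closest vocabulary word, if any."""
    corrected = []
    for term in query.split():
        if term in VOCAB:
            corrected.append(term)
            continue
        close = difflib.get_close_matches(term, VOCAB, n=1)
        corrected.append(close[0] if close else term)
    return " ".join(corrected)


def search_with_autocorrect(query):
    """Run the query; retry once with a corrected query on zero results."""
    hits = search(query)
    if hits:
        return {"query": query, "hits": hits, "corrected": False}
    retry = correct(query)
    if retry == query:
        return {"query": query, "hits": [], "corrected": False}
    return {"query": retry, "hits": search(retry), "corrected": True}
```

The retry happens at most once, so a query that still matches nothing after correction simply comes back empty.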


We're hoping to see some performance improvements out of handling this in Solr instead of our Rails 
service.


doug


Ryan McKinley wrote:

Yes -- this is what search components are for!

Depending on where you put it in the chain, it could only return spell 
checked results if there are too few results (or the top score is below 
some threshold)


ryan


Grant Ingersoll wrote:
Is it feasible to submit a query to any of the various handlers and 
have it bring back results and spelling suggestions all in one 
response?  Is this something the query components piece would handle, 
assuming one exists for the spell checker?


Thanks,
Grant



Re: Integrated Spellchecking

2008-01-17 Thread Yonik Seeley
On Jan 17, 2008 2:33 PM, Ryan McKinley <[EMAIL PROTECTED]> wrote:
> Yes -- this is what search components are for!
>
> Depending on where you put it in the chain, it could only return spell
> checked results if there are too few results (or the top score is below
> some threshold)

Score thresholds are tricky in Lucene since scores across different
queries aren't that meaningful.
But a number-of-results threshold sounds like it might be a good idea.

Perhaps there could even be options to
- test if the suggestion actually matches any documents
- replace the original query with the suggestion before running the query
- add an additional DocList to the response for documents matching the
suggestion
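
The three options above can be sketched as follows. This is only an illustration in plain Python, not Solr code; `DOCS`, `search`, `respond` and the `replace` / `extra_list` flags are made-up names, and a sorted list of ids stands in for a DocList.

```python
# Hypothetical in-memory index: document id -> text.
DOCS = {1: "solr", 2: "solr lucene", 3: "nutch"}


def search(term):
    """Return sorted ids of documents containing the term."""
    return sorted(i for i, text in DOCS.items() if term in text.split())


def respond(query, suggestion, replace=False, extra_list=False):
    """Build a response for `query` given one candidate `suggestion`."""
    suggestion_hits = search(suggestion)
    # Option 1: discard suggestions that match nothing in the index.
    if not suggestion_hits:
        return {"q": query, "docs": search(query)}
    # Option 2: run the suggestion in place of the original query.
    if replace:
        return {"q": suggestion, "docs": suggestion_hits, "original": query}
    response = {"q": query, "docs": search(query), "suggestion": suggestion}
    # Option 3: return the suggestion's matches as a second document list.
    if extra_list:
        response["suggestion_docs"] = suggestion_hits
    return response
```

In this sketch the suggestion is always checked against the index first, so the second and third options never return a suggestion with no hits.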


Thinking a little more on the threshold idea, it seems to have some issues.

One problem:
  In general, you want spell suggestions to be corpus wide... so you
might be under a threshold just because the query is heavily filtered
(restrictive fqs) and the suggestion may not match anything under
those restrictions.  Getting the DocSet of the query only to check the
number of hits adds expense to the request.

But
- if not sorting by score, the cache would re-use the query DocSet
instead of going to the Lucene index
- one could add a call to Solr to retrieve the number of hits in the
base query, before filtering (but that could limit or complicate
future optimizations that move some of the filters into the base
query...)
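
A toy illustration of the filtering problem, in plain Python with made-up data (`DOCS`, `MIN_HITS` and the function names are assumptions, not Solr API): a correctly spelled term falls under the threshold once a restrictive filter is applied, while the count from the base query does not.

```python
# Hypothetical corpus; "type" plays the role of a field used in an fq.
DOCS = [
    {"id": 1, "text": "solr", "type": "wiki"},
    {"id": 2, "text": "solr", "type": "wiki"},
    {"id": 3, "text": "solr", "type": "mail"},
]
MIN_HITS = 2  # assumed threshold


def base_hits(term):
    """Documents matching the query before any filter is applied."""
    return [d for d in DOCS if term in d["text"].split()]


def filtered_hits(term, doc_type):
    """Documents matching the query and a restrictive filter."""
    return [d for d in base_hits(term) if d["type"] == doc_type]


def needs_suggestion(term, doc_type, use_base_count):
    """Decide whether to ask for spelling suggestions."""
    hits = base_hits(term) if use_base_count else filtered_hits(term, doc_type)
    return len(hits) < MIN_HITS
```

With the filter `type == "mail"`, the filtered count for "solr" is 1 and trips the threshold, while the base count of 3 does not.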

Another issue is how big the spelling index is: if it's big enough,
best practice might be to have a separate spelling index that the
front-end client hits concurrently with the main index.  This also
sort of applies to distributed search (one may want a single separate
spelling index that isn't distributed).
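
On the client side, hitting the two indexes concurrently could look something like this sketch. The two query functions are stubs standing in for HTTP requests to the main and spelling indexes; their names and return shapes are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def query_main_index(q):
    """Stand-in for an HTTP request to the main index."""
    return {"q": q, "docs": []}


def query_spelling_index(q):
    """Stand-in for an HTTP request to the separate spelling index."""
    return {"q": q, "suggestions": ["solr"] if q == "slor" else []}


def search(q):
    """Issue both requests at once and merge the two responses."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        main = pool.submit(query_main_index, q)
        spelling = pool.submit(query_spelling_index, q)
        return {"docs": main.result()["docs"],
                "suggestions": spelling.result()["suggestions"]}
```

Because both requests are in flight at the same time, the client waits roughly as long as the slower of the two rather than the sum.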

-Yonik


Re: Integrated Spellchecking

2008-01-17 Thread Ryan McKinley

Yes -- this is what search components are for!

Depending on where you put it in the chain, it could return spell-checked 
results only if there are too few results (or the top score is below 
some threshold).
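
As a rough sketch of the idea (plain Python, not the Solr search component API; `MIN_HITS`, `SUGGESTIONS` and the component functions are hypothetical), a spellcheck step placed after the query step can look at how many documents were found and stay silent when there are enough:

```python
MIN_HITS = 2  # assumed threshold; below this, a suggestion is attached

# Hypothetical corpus and a fixed table standing in for a spell checker.
DOCS = ["apache solr", "apache lucene", "solr search server"]
SUGGESTIONS = {"slor": "solr", "lucen": "lucene"}


def query_component(request, response):
    """Fill in the matching documents for a single-term query."""
    term = request["q"]
    response["docs"] = [d for d in DOCS if term in d.split()]


def spellcheck_component(request, response):
    """Attach a suggestion only when the query component found too few hits."""
    if len(response.get("docs", [])) >= MIN_HITS:
        return
    suggestion = SUGGESTIONS.get(request["q"])
    if suggestion is not None:
        response["spellcheck"] = suggestion


def handle(request, chain):
    """Run each component in order over a shared response."""
    response = {}
    for component in chain:
        component(request, response)
    return response


CHAIN = [query_component, spellcheck_component]
```

If the spellcheck step ran before the query step, the hit count would not be available yet and it would suggest for every request with a known correction.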


ryan


Grant Ingersoll wrote:
Is it feasible to submit a query to any of the various handlers and have 
it bring back results and spelling suggestions all in one response?  Is 
this something the query components piece would handle, assuming one 
exists for the spell checker?


Thanks,
Grant





Integrated Spellchecking

2008-01-17 Thread Grant Ingersoll
Is it feasible to submit a query to any of the various handlers and  
have it bring back results and spelling suggestions all in one  
response?  Is this something the query components piece would handle,  
assuming one exists for the spell checker?


Thanks,
Grant