Re: NPE during spell checking when result collapsing is activated and local parameters are used

2019-11-18 Thread Stefan Walter
I created https://issues.apache.org/jira/browse/SOLR-13944 (sorry, it took
a while). Thanks again!


On 15 November 2019 at 16:10:55, Tomás Fernández Löbbe (
tomasflo...@gmail.com) wrote:

> Would you create a Jira issue anyway, to fix the fact that it throws an NPE
> instead of a bad request error?



Re: NPE during spell checking when result collapsing is activated and local parameters are used

2019-11-15 Thread Tomás Fernández Löbbe
Would you create a Jira issue anyway, to fix the fact that it throws an NPE
instead of a bad request error?

On Fri, Nov 15, 2019 at 2:31 AM Stefan Walter  wrote:

> Indeed, you are right. Interestingly, it generally worked with the two {!...}
> blocks in the filter query, apart from the problem with the collations, of
> course. That's why I never questioned it...
>
> Thank you!
> Stefan


Re: NPE during spell checking when result collapsing is activated and local parameters are used

2019-11-15 Thread Stefan Walter
Indeed, you are right. Interestingly, it generally worked with the two {!...}
blocks in the filter query, apart from the problem with the collations, of
course. That's why I never questioned it...

Thank you!
Stefan


On 15 November 2019 at 00:01:52, Tomás Fernández Löbbe (
tomasflo...@gmail.com) wrote:

> I believe your syntax is incorrect. I believe local params must all be
> included between the same {!...}, and "{!" can only be at the beginning.
>
> Have you tried:
>
> ={!collapse tag=collapser field=productId sort='merchantOrder asc,
> price asc, id asc'}





Re: NPE during spell checking when result collapsing is activated and local parameters are used

2019-11-14 Thread Tomás Fernández Löbbe
I believe your syntax is incorrect. I believe local params must all be
included between the same {!...}, and "{!" can only be at the beginning.

Have you tried:

={!collapse tag=collapser field=productId sort='merchantOrder asc,
price asc, id asc'}
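A small sketch, not from the thread, of the suggested filter written as a single `fq` value: all local params share one `{!...}` block with the parser name first, so the value starts with `{!collapse` and is therefore caught by the SOLR-8807 collation check quoted in the original message. The class and method names are hypothetical; only `java.net.URLEncoder` is a real API here.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class CollapseFq {
    // One {!...} block: parser name first, then tag, field and sort as local params.
    static final String FQ = "{!collapse tag=collapser field=productId "
            + "sort='merchantOrder asc, price asc, id asc'}";

    // URL-encodes the value so braces, quotes, commas and spaces survive in a query string.
    static String fqParam() {
        return "fq=" + URLEncoder.encode(FQ, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The collation code from SOLR-8807 drops filters that start with "{!collapse".
        System.out.println(FQ.startsWith("{!collapse")); // true
        System.out.println(fqParam());
    }
}
```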



On Thu, Nov 14, 2019 at 4:54 AM Stefan Walter  wrote:

> Hi!
>
> I have an issue with Solr 7.3.1 in the spell checking component:


NPE during spell checking when result collapsing is activated and local parameters are used

2019-11-14 Thread Stefan Walter
Hi!

I have an issue with Solr 7.3.1 in the spell checking component:

java.lang.NullPointerException at
org.apache.solr.search.CollapsingQParserPlugin$OrdFieldValueCollector.finish(CollapsingQParserPlugin.java:1021)
at
org.apache.solr.search.CollapsingQParserPlugin$OrdFieldValueCollector.finish(CollapsingQParserPlugin.java:1081)
at
org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:230)
at
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1602)
at
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1419)
at
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:584)
...

I have found an issue that addresses a similar problem:
https://issues.apache.org/jira/browse/SOLR-8807

The fix introduced with that issue seems to miss our situation,
though. The relevant part of the query is this:

={!tag=collapser}{!collapse field=productId sort='merchantOrder asc,
price asc, id asc'}

When I remove the local parameter {!tag=collapser}, the collation works
fine. Looking at the diff of the commit for the issue mentioned above, it
seems that the "startsWith" check could be the problem:

+// Collate testing does not support the Collapse QParser (See SOLR-8807)
+params.remove("expand");
+String[] filters = params.getParams(CommonParams.FQ);
+if (filters != null) {
+  List<String> filtersToApply = new ArrayList<>(filters.length);
+  for (String fq : filters) {
+    if (!fq.startsWith("{!collapse")) {
+      filtersToApply.add(fq);
+    }
+  }
+  params.set("fq", filtersToApply.toArray(new String[filtersToApply.size()]));
+}

Can someone confirm this? If so, I would open a bug ticket, since the code
is unchanged in the latest version.
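A hypothetical sketch of a more tolerant check than `startsWith("{!collapse")`, assuming the goal is only to recognize a collapse filter when one or more `{!...}` blocks lead the fq value, as in `{!tag=collapser}{!collapse ...}`. This is not the change made in Solr for SOLR-13944 (that change is not shown in this thread), and the simple regex does not handle nested braces or a `}` inside quoted local-param values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CollapseFilterCheck {
    // One or more local-params blocks "{!...}" at the very start of the filter value.
    private static final Pattern LEADING_BLOCKS = Pattern.compile("^(?:\\{![^}]*\\})+");
    // Names the collapse parser: the "{!collapse ..." shorthand or an explicit type=collapse.
    private static final Pattern COLLAPSE_PARSER =
            Pattern.compile("\\{!\\s*collapse\\b|(?:\\{!|\\s)type\\s*=\\s*collapse\\b");

    static boolean isCollapseFilter(String fq) {
        Matcher m = LEADING_BLOCKS.matcher(fq.trim());
        return m.find() && COLLAPSE_PARSER.matcher(m.group()).find();
    }

    public static void main(String[] args) {
        String[] filters = {
            "{!tag=collapser}{!collapse field=productId sort='merchantOrder asc, price asc, id asc'}",
            "{!collapse tag=collapser field=productId}",
            "inStock:true" // hypothetical ordinary filter
        };
        List<String> filtersToApply = new ArrayList<>();
        for (String fq : filters) {
            // The SOLR-8807 check (!fq.startsWith("{!collapse")) would keep filters[0].
            if (!isCollapseFilter(fq)) {
                filtersToApply.add(fq);
            }
        }
        System.out.println(filtersToApply); // [inStock:true]
    }
}
```

The word boundary after `collapse` keeps a tag such as `collapser` from being mistaken for the parser name.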

Thanks,
Stefan


Re: spell checking on query

2016-11-14 Thread Emir Arnautovic

Hi Midas,

You can use Solr's spellcheck component: 
https://cwiki.apache.org/confluence/display/solr/Spell+Checking


Emir


On 14.11.2016 08:37, Midas A wrote:

How can we do query-time spell checking with Solr?



--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



spell checking on query

2016-11-13 Thread Midas A
How can we do query-time spell checking with Solr?


Spell checking: What is left to the programmer?

2015-09-15 Thread Mark Fenbers

Greetings!

My Java app, using SolrJ, now successfully does searches. I've used the 
web interface to do full-text indexing, and my app adds each new entry 
to this index.


But now I want to use SolrJ to also do spell checking.  I have read 
several documents on this and examined a couple of Java examples, but 
one main question still persists.  Let's first assume that I have my 
configuration XML files set up correctly and I can spell-check a word 
through the web interface using something like 
.../spell?q=missspelled=on.  Assume also that the end user 
has typed in a paragraph and is about to submit the text.  In the 
current implementation of my software, using the SolrJ API, the text will 
get parsed into words and the words will be added to the search index.  
For spell-checking, however, I am puzzled.


Is it up to me, the programmer, to parse the text into individual words, 
determine which words are misspelled, and then run the query on each 
misspelled word to get a list of suggestions for it?  
Or does Solr itself parse the text string into words and run a query on 
every word, thus indicating which words are misspelled by a non-empty 
list of suggestions?  Or is there a third option I haven't thought of 
(like spell-check as I type)?


I'm just trying to picture the behavior in my head so I know what 
programming approach to take.  Thanks for the help!
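A minimal sketch, not from the thread: as I understand the Solr Reference Guide's description of `spellcheck.q`, the spellcheck component splits the text into tokens itself (whitespace tokenization for `spellcheck.q`, `SpellingQueryConverter` for `q`) and returns suggestions for the tokens it considers misspelled, so the client can send the whole paragraph in one request. The core URL is hypothetical; `/spell` is the handler path from the message above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SpellcheckRequestSketch {
    // Hypothetical core URL; "/spell" is the handler path used in the message.
    static final String BASE = "http://localhost:8983/solr/mycore/spell";

    // One request for the whole paragraph; Solr tokenizes spellcheck.q on its side,
    // so the client does not split the text into words first.
    static String buildUrl(String paragraph) {
        return BASE
                + "?spellcheck=true"
                + "&spellcheck.q=" + URLEncoder.encode(paragraph, StandardCharsets.UTF_8)
                + "&spellcheck.count=5"              // max suggestions per term
                + "&spellcheck.extendedResults=true"
                + "&rows=0";                         // suggestions only, no documents
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("Teh quick brown fox jumpd over the dog"));
    }
}
```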


Mark


Re: Spell checking the synonym list?

2015-07-10 Thread Ryan Yacyshyn
Thanks both!

James, I like that approach. I'll give it a try. I forgot to mention that I was
only using query-time synonyms, but it shouldn't be a problem in my case to add
synonyms at index time.

Ryan



On Thu, 9 Jul 2015 at 22:07 Dyer, James james.d...@ingramcontent.com
wrote:

 Ryan,

 If you use index-time synonyms on the spellcheck field, this will give you
 what you want.

 For instance, if the document has lawyer and you index both terms
 lawyer,attorney, then the spellchecker will see that atorney is 1
 edit away from an indexed term and will suggest attorney.

 You'll need to have the same synonyms set up against the query field, but
 you have the option of making these query-time synonyms if you prefer.

 James Dyer
 Ingram Content Group




RE: Spell checking the synonym list?

2015-07-09 Thread Dyer, James
Ryan,

If you use index-time synonyms on the spellcheck field, this will give you what 
you want.

For instance, if the document has lawyer and you index both terms 
lawyer,attorney, then the spellchecker will see that atorney is 1 edit 
away from an indexed term and will suggest attorney. 

You'll need to have the same synonyms set up against the query field, but you 
have the option of making these query-time synonyms if you prefer.
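The "1 edit away" reasoning can be checked with a plain Levenshtein distance. This is an illustrative sketch only; Lucene's DirectSpellChecker uses its own internal distance and candidate enumeration, which may differ in details (for example, in how transpositions are counted).

```java
public class EditDistance {
    // Classic Levenshtein distance (insertion, deletion and substitution each cost 1),
    // computed with two rolling rows of the dynamic-programming table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // "atorney" -> "attorney" needs one inserted 't'.
        System.out.println(levenshtein("atorney", "attorney")); // 1
    }
}
```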

James Dyer
Ingram Content Group

-Original Message-
From: Ryan Yacyshyn [mailto:ryan.yacys...@gmail.com] 
Sent: Thursday, July 09, 2015 2:28 AM
To: solr-user@lucene.apache.org
Subject: Spell checking the synonym list?



Spell checking the synonym list?

2015-07-09 Thread Ryan Yacyshyn
Hi all,

I'm wondering if it's possible to have spell checking performed on terms in
the synonym list?

For example, let's say I have documents with the word lawyer in them and
I add lawyer, attorney in the synonyms.txt file. Then a query is made for
the word atorney. Is there any way to provide spell checking on this?

Thanks,
Ryan


RE: Spell checking the synonym list?

2015-07-09 Thread Reitzel, Charles
One of the uses of synonyms is to replace a mis-spelled query term with a 
correctly spelled value.

The two-sided synonym file format allows you to control which values survive 
into the actual query.

lawyer, attorney, ambulance chaser, atorney, lowyor => lawyer, attorney
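A simplified, hypothetical illustration of an explicit-mapping rule like the one above (Solr's synonyms.txt uses `=>` for this): any token on the left-hand side is replaced by all terms on the right-hand side. Unlike Solr's synonym filters, this single-token lookup does not match multi-word entries such as "ambulance chaser" across tokens.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ExplicitSynonymSketch {
    // Parses one rule "lhs1, lhs2 => rhs1, rhs2" into lhs-term -> replacement terms.
    static Map<String, List<String>> parseRule(String rule) {
        String[] sides = rule.split("=>");
        List<String> rhs = new ArrayList<>();
        for (String term : sides[1].split(",")) {
            rhs.add(term.trim());
        }
        Map<String, List<String>> map = new HashMap<>();
        for (String term : sides[0].split(",")) {
            map.put(term.trim(), rhs);
        }
        return map;
    }

    // Replaces each mapped token by its right-hand terms; unmapped tokens pass through.
    static List<String> expand(List<String> tokens, Map<String, List<String>> map) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.addAll(map.getOrDefault(token, List.of(token)));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> rule = parseRule(
                "lawyer, attorney, ambulance chaser, atorney, lowyor => lawyer, attorney");
        System.out.println(expand(List.of("atorney"), rule)); // [lawyer, attorney]
    }
}
```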

I am not aware, however, of any integration between synonym processing and a 
spellcheck dictionary. It would make sense, though. But I think additional metadata 
would be required, per dictionary entry, to govern synonym processing. Thus, 
building the dictionary would not be a transparent/automatic process.

https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-SynonymFilter


-Original Message-
From: Ryan Yacyshyn [mailto:ryan.yacys...@gmail.com] 
Sent: Thursday, July 09, 2015 3:28 AM
To: solr-user@lucene.apache.org
Subject: Spell checking the synonym list?


*
This e-mail may contain confidential or privileged information.
If you are not the intended recipient, please notify the sender immediately and 
then delete it.

TIAA-CREF
*


How make Searching fast in spell checking

2015-02-24 Thread Nitin Solanki
Hello all,
I have 49 GB of indexed data and I am doing spell checking on it. I have
applied ShingleFilter on both the index and query side, and I am taking 25
suggestions for each word in the query, without using collations.
When I search for a phrase of 5-6 words (e.g. "barack obama is president of
America"), it takes 2 to 3 seconds to process, while searching for a single
term (e.g. "barack") takes only 0.23 seconds, which is good.
Why is phrase checking taking so long? Am I doing something wrong? Any help
on this?


Odd extra character duplicates in spell checking

2014-04-15 Thread Ed Smiley
Hi,
I am going to keep this question short, so I don’t overwhelm you with 
technical details until the end.
I suspect that some folks may be seeing this issue even without the particular 
configuration we are using.

What our problem is:

  1.  Correctly spelled words are reported as misspelled, and the suggestions are 
the original, correctly spelled word with a single oddball character appended, 
returned as multiple suggestions.
  2.  Incorrectly spelled words return the correct spelling suggestions, but with 
a single oddball character appended, again as multiple suggestions.
  3.  We’re seeing this in Solr 4.5.x and 4.7.x.

Example:

Each returned correction is the word with a single extra character appended (its Unicode code point shown in square brackets).

correction=attitude[2d]
correction=attitude[2f]
correction=attitude[2026]

Spurious characters:

  *   Unicode Character 'HYPHEN-MINUS' (U+002D)
  *   Unicode Character 'SOLIDUS' (U+002F)
  *   Unicode Character 'HORIZONTAL ELLIPSIS' (U+2026)

Anybody see anything like this?  Anybody fix something like this?

Thanks!
—Ed


OK, here are the gory details:


What we are doing:
We have developed an application that returns “did you mean” spelling 
alternatives for a specific (presumably misspelled) word.
We’re using the vocabulary of indexed pages of a specified book as the source 
of the alternatives, so this is not a general dictionary spell check; we 
return only matching alternatives.
So when I say “correctly spelled” I mean they are words found on at least one 
page.  We are using the collations, so that we restrict ourselves to those 
pages in one book.
We are having to check for and “fix up” these faulty results.  That’s not a 
robust or desirable solution.
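A hypothetical sketch of the kind of client-side "fix-up" described above, not ProQuest's actual code: drop any suggestion whose final code point is neither a letter nor a digit, which removes the U+002D, U+002F and U+2026 variants. It only hides the symptom and does not explain why the spellchecker produces them.

```java
import java.util.ArrayList;
import java.util.List;

public class TrailingCharFilter {
    // Keeps only suggestions whose last code point is a letter or digit.
    static List<String> dropTrailingPunctuation(List<String> suggestions) {
        List<String> kept = new ArrayList<>();
        for (String s : suggestions) {
            if (s.isEmpty()) {
                continue;
            }
            int last = s.codePointBefore(s.length());
            if (Character.isLetterOrDigit(last)) {
                kept.add(s);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> raw = List.of("attitudes", "attitude-", "attitude/", "attitude\u2026", "attituden");
        System.out.println(dropTrailingPunctuation(raw)); // [attitudes, attituden]
    }
}
```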

We are using SolrJ to get the collations:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.common.params.SpellingParams;
…
  private static final String DID_YOU_MEAN_REQUEST_HANDLER = "/spell";
…
SolrQuery query = new SolrQuery(q);
query.set("spellcheck", true);
query.set(SpellingParams.SPELLCHECK_COUNT, 10);
query.set(SpellingParams.SPELLCHECK_COLLATE, true);
query.set(SpellingParams.SPELLCHECK_COLLATE_EXTENDED_RESULTS, true);
query.set("wt", "json");
// Send the request to /spell, and to /spell on each shard as well.
query.setRequestHandler(DID_YOU_MEAN_REQUEST_HANDLER);
query.set("shards.qt", DID_YOU_MEAN_REQUEST_HANDLER);
query.set("shards.tolerant", true);
// etc.

but we can reproduce the behavior without SolrJ, with the same collations/
misspellingsAndCorrections shown below, e.g.:
solr/pg1/spell?q=+doc-id:(810500)+AND+attitudex&spellcheck=true&spellcheck.count=10&spellcheck.collate=true&spellcheck.collateExtendedResults=true&wt=json&qt=%2Fspell&shards.qt=%2Fspell&shards.tolerant=true


{"responseHeader":{"status":0,"QTime":60},
 "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]},
 "spellcheck":{
   "suggestions":[
     "attitudex",{"numFound":6,"startOffset":21,"endOffset":30,"origFreq":0,
       "suggestion":[{"word":"attitudes","freq":362486},{"word":"attitu dex","freq":4819},
         {"word":"atti tudex","freq":3254},{"word":"attit udex","freq":159},
         {"word":"attitude-","freq":1080},{"word":"attituden","freq":261}]},
     "correctlySpelled",false,
     "collation",["collationQuery"," doc-id:(810500) AND attitude-","hits",2,
       "misspellingsAndCorrections",["attitudex","attitude-"]],
     "collation",["collationQuery"," doc-id:(810500) AND attitude/","hits",2,
       "misspellingsAndCorrections",["attitudex","attitude/"]],
     "collation",["collationQuery"," doc-id:(810500) AND attitude…","hits",2,
       "misspellingsAndCorrections",["attitudex","attitude…"]]]}}

The configuration is:

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="df">text</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.alternativeTermCount">5</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">10</str>
    <str name="spellcheck.maxCollations">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>


<lst name="spellchecker">
  <str name="name">wordbreak</str>
  <str name="classname">solr.WordBreakSolrSpellChecker</str>
  <str name="field">text</str>
  <str name="combineWords">true</str>
  <str name="breakWords">true</str>
  <int name="maxChanges">25</int>
  <int name="minBreakLength">3</int>
</lst>

<lst name="spellchecker">
  <str name="name">default</str>
  <str name="field">text</str>
  <str name="classname">solr.DirectSolrSpellChecker</str>
  <str name="distanceMeasure">internal</str>
  <float name="accuracy">0.2</float>
  <int name="maxEdits">2</int>
  <int name="minPrefix">1</int>
  <int name="maxInspections">25</int>
  <int name="minQueryLength">4</int>
  <float name="maxQueryFrequency">1</float>
</lst>

--

Ed Smiley, Senior Software Architect, eBooks
ProQuest | 161 E Evelyn Ave|
Mountain View, CA 94041 | USA |
+1 650 475 8700 extension 3772

Search using the result returned from the spell checking component

2012-11-19 Thread Roni
Hi,

I've successfully configured the spell check component and it works well.

I couldn't find an answer to my question so any help would be much
appreciated: 

Can I send a single request to Solr and make it so that, if any part of the
query was misspelled, the search would be performed using the first
spell suggestion returned?

I want to make only one request, i.e. submit the query only once, if that is
possible.

For example: if a user searched for jaca, then the search would be
performed only once, for java.

Thanks in advance for any answer or a link to a relevant resource (I
couldn't find any).

  



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Search-using-the-result-returned-from-the-spell-checking-component-tp4021135.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Search using the result returned from the spell checking component

2012-11-19 Thread Dyer, James
What you want isn't supported.  You will always need to issue that second 
request.  This would be a nice feature to add, though.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Roni [mailto:r...@socialarray.com] 
Sent: Monday, November 19, 2012 12:54 PM
To: solr-user@lucene.apache.org
Subject: Search using the result returned from the spell checking component





RE: Search using the result returned from the spell checking component

2012-11-19 Thread Roni
Thank you.

I was wondering: what if I make a first request and ask it to return only
1 result? Will it still return the spell suggestions while avoiding the
overhead of returning all relevant results?

Then I could make a second request to get all the results I need.

Would that work?





Re: Search using the result returned from the spell checking component

2012-11-19 Thread Walter Underwood
You can even request zero rows. That will still return the number of matches.  
--wunder
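A sketch of the two-request pattern discussed in this thread, assuming a hypothetical core URL whose `/select` handler has the spellcheck component attached: the first request uses `rows=0`, so only `numFound` and the spellcheck section come back, and the second re-runs the search with the collation taken from the first response. Parsing the response and choosing the collation are left out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class TwoStepSpellSearch {
    // Hypothetical core URL; assumes /select has the spellcheck component attached.
    static final String SELECT = "http://localhost:8983/solr/mycore/select";

    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Request 1: rows=0 still returns numFound and the spellcheck section, but no documents.
    static String firstRequest(String userQuery) {
        return SELECT + "?q=" + enc(userQuery)
                + "&rows=0&spellcheck=true&spellcheck.collate=true";
    }

    // Request 2: re-run the search with the collation read from the first response.
    static String secondRequest(String collation, int rows) {
        return SELECT + "?q=" + enc(collation) + "&rows=" + rows;
    }

    public static void main(String[] args) {
        System.out.println(firstRequest("jaca"));
        // "java" stands in for the collation a real first response would contain.
        System.out.println(secondRequest("java", 10));
    }
}
```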

On Nov 19, 2012, at 11:12 AM, Roni wrote:

 Thank you.
 
 I was wondering: what if I make a first request and ask it to return only
 1 result? Will it still return the spell suggestions while avoiding the
 overhead of returning all relevant results?
 
 Then I could make a second request to get all the results I need.
 
 Would that work?





Re: Search using the result returned from the spell checking component

2012-11-19 Thread Roni
And performance-wise: is asking for 0 rows the same as asking for 100 rows?

On Mon, Nov 19, 2012 at 9:22 PM, Walter Underwood [via Lucene] 
ml-node+s472066n4021143...@n3.nabble.com wrote:

 You can even request zero rows. That will still return the number of
 matches.  --wunder


spell checking and filtering in the same query

2012-02-09 Thread Mark Swinson
Background:

I have a Solr index containing foodtypes, chefs, and courses. This is
an initial setup to test my configuration.


Here is the problem I'm trying to solve :

- When I query for a misspelt foodtype 'x' and filter by chef 'c', I should
get a suggested list of foodtypes prepared by chef 'c'.


ok:

I've managed to set up a spellcheck component so I can make the
following query:

/suggest?q=ban&spellcheck.dictionary=foodtypes

This gets me the results
'banana bread'
'banoffee pie'

How can I modify this query and the solr configuration to allow me to
filter by another field?

I'm aware that the fq parameter does not work with the SpellCheck
component.
Is there any way of passing the results of the first query to a filter
query? I've seen various posts on this topic, but no solutions. The best
suggestion was to have the client make a second request, which is
something I do not want to do.

Is it possible to write a SearchComponent or SearchHandler that chains
results?


Thanks for any help.


Mark








http://www.bbc.co.uk/
This e-mail (and any attachments) is confidential and may contain personal 
views which are not the views of the BBC unless specifically stated.
If you have received it in error, please delete it from your system.
Do not use, copy or disclose the information in any way nor act in reliance on 
it and notify the sender immediately.
Please note that the BBC monitors e-mails sent or received.
Further communication will signify your consent to this.



RE: spell checking and filtering in the same query

2012-02-09 Thread Dyer, James
Mark,

I'm not as familiar with the Suggester, but with normal spellcheck, if you set 
spellcheck.maxCollationTries to something greater than 0, it will check the 
collations against the index.  This checking includes any fq params you had.  So 
in this sense the SpellCheckComponent does work with fq.
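A conceptual model, not SpellCheckComponent's code, of why fq matters once spellcheck.maxCollationTries is greater than 0: each candidate collation is re-run with the same filter, and only candidates that still return hits survive. The `Doc` type, the chef field and the exact-match test are hypothetical simplifications of Mark's foodtype/chef index; with maxCollationTries set to 0, Solr does not run this check at all, which the sketch does not model.

```java
import java.util.ArrayList;
import java.util.List;

public class CollationCheckSketch {
    // Hypothetical document shape for the foodtype/chef test index.
    static final class Doc {
        final String foodtype;
        final String chef;

        Doc(String foodtype, String chef) {
            this.foodtype = foodtype;
            this.chef = chef;
        }
    }

    // Tries up to maxTries candidates in order; a candidate is kept only if it still
    // matches at least one document when the same chef filter is applied.
    static List<String> checkedCollations(List<String> candidates, List<Doc> index,
                                          String chef, int maxTries) {
        List<String> kept = new ArrayList<>();
        int tries = 0;
        for (String candidate : candidates) {
            if (tries++ >= maxTries) {
                break;
            }
            boolean hasHits = index.stream()
                    .anyMatch(d -> d.chef.equals(chef) && d.foodtype.equals(candidate));
            if (hasHits) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Doc> index = List.of(new Doc("banana bread", "alice"), new Doc("banoffee pie", "bob"));
        List<String> candidates = List.of("banana bread", "banoffee pie");
        System.out.println(checkedCollations(candidates, index, "bob", 5)); // [banoffee pie]
    }
}
```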

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Mark Swinson [mailto:mark.swin...@bbc.co.uk] 
Sent: Thursday, February 09, 2012 7:38 AM
To: solr-user@lucene.apache.org
Subject: spell checking and filtering in the same query

Background:

I have a Solr index containing foodtypes, chefs, and courses. This is
an initial setup to test my configuration.


Here is the problem I'm trying to solve:

- When I query for a misspelt foodtype 'x' and filter by chef 'c', I should
get a suggested list of foodtypes prepared by chef 'c'.


ok:

I've managed to set up a spellcheck component so I can make the
following query:

/suggest?q=ban&spellcheck.dictionary=foodtypes

This returns the results:
'banana bread'
'banoffee pie'

How can I modify this query and the Solr configuration to allow me to
filter by another field?

I'm aware that the fq parameter does not work with the SpellCheck
component.
Is there any way of passing the results of the first query to a filter
query? I've seen various posts on this topic, but no solutions. The best
suggestion was to have the client make a second request, which is
something I do not want to do.

Is it possible to write a SearchComponent or SearchHandler that chains
results?


Thanks for any help.


Mark











RE: Spell Checking a multi word phrase

2011-01-17 Thread Dyer, James
Camden,

You may also want to be aware that there is a new feature added to Spell 
Check's collate functionality that will guarantee the collations will return 
hits.  It also is able to return more than one collation and tell you how many 
hits each one would result in if re-queried.  This might do the same thing 
you're trying to do using shingles, but with more accuracy and less work.

For info, look at spellcheck.collate, spellcheck.maxCollations, 
spellcheck.maxCollationTries and spellcheck.collateExtendedResults on the 
component's wiki page: 
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate

This feature is committed to 3.x and 4.x and is available as a patch for 1.4.1 
(here:  https://issues.apache.org/jira/browse/SOLR-2010).
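With spellcheck.collateExtendedResults=true, each returned collation also reports how many hits it would produce if re-queried. A minimal client-side sketch of using that count, assuming the response has already been parsed into a list of dicts with `collationQuery` and `hits` keys (check the exact response layout for your Solr version and response writer). The sample data is made up for illustration.

```python
def best_collation(collations, min_hits=1):
    """Return the collationQuery with the most hits, or None if none qualify.

    Each item is assumed to be a dict with "collationQuery" and "hits" keys.
    """
    candidates = [c for c in collations if c.get("hits", 0) >= min_hits]
    if not candidates:
        return None
    # max() returns the first of several equal maxima, so Solr's own ordering
    # breaks ties.
    return max(candidates, key=lambda c: c["hits"])["collationQuery"]


# Made-up sample input, for illustration only.
sample = [
    {"collationQuery": "united stats", "hits": 3},
    {"collationQuery": "united states", "hits": 120},
]
print(best_collation(sample))  # united states
```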

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Camden Daily [mailto:cam...@jaunter.com] 
Sent: Monday, January 17, 2011 1:01 PM
To: solr-user@lucene.apache.org
Subject: Spell Checking a multi word phrase

Hello all,

I'm pretty new to Solr, and trying to set up a spell checker that can handle
entire phrases.  My goal would be to have something that could offer a
suggestion of "united states" for a query of "untied stats".

I have a very large index, and I've worked a bit with creating shingles for
the spelling index.  The problem I'm running into now is that the
SpellCheckComponent is always tokenizing the query that I pass to it.

For example, a query like this:
http://localhost:8080/solr/spell?q=untied\stats&spellcheck=true&debugQuery=on

The debug information shows me that the parsed query is:
PhraseQuery(text:"untied stats")

But I receive the spelling suggestions for "untied" and "stats" separately.
From what I understand, this is not a case where I would want to collate; I
simply want the entire phrase treated as one token.

I found the following post after much searching that suggests setting up a
custom QueryConverter:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200810.mbox/%3c1224516331.3820.119.ca...@localhost.localdomain.tld%3E

Does anyone know if that would be required?  I had hoped to avoid Java code
entirely with Solr (I haven't used Java in a very long time), but if I do
need to set up the 'MultiWordSpellingQueryConvert' class, would anyone be
able to give me some tips of exactly how I would add that functionality to
Solr?

Relevant configs below:

solrconfig.xml:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spellShingle</str>
    <str name="spellcheckIndexDir">./spellShingle</str>
    <str name="queryAnalyzerFieldType">textSpellShingle</str>
    <str name="buildOnOptimize">true</str>
  </lst>
</searchComponent>

schema.xml:

<fieldType name="textSpellShingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

(I had thought setting the KeywordTokenizer for the query analyzer would
keep it from being tokenized, but it doesn't seem to make any difference)

-Camden Daily
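The index-time half of the textSpellShingle type above can be illustrated in Python: ShingleFilterFactory with maxShingleSize="2" and outputUnigrams="true" emits each word plus each adjacent two-word pair, so the spelling dictionary built from spellShingle can contain entries such as "united states". The function below is only an approximation of that token stream; it ignores how stopwords removed by StopFilterFactory affect shingling.

```python
def shingles(tokens, max_size=2, output_unigrams=True, sep=" "):
    """Approximate ShingleFilter output for an already-analyzed token list.

    At each position, emit the single token (if output_unigrams) followed by
    the shingles of size 2..max_size that start at that position.
    """
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])
        for size in range(2, max_size + 1):
            if i + size <= len(tokens):
                out.append(sep.join(tokens[i:i + size]))
    return out


# Lowercasing stands in for LowerCaseFilterFactory.
print(shingles("United States".lower().split()))
# ['united', 'united states', 'states']
```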


Re: Spell Checking a multi word phrase

2011-01-17 Thread Camden Daily
James,

Thank you, but I'm not sure that will work for my needs.  I'm very
interested in contextual spell checking.  Take for example the author
"stephenie meyer".  "stephenie" is a far less popular spelling than
"stephanie", but in this context it's the correct option.  I feel like
shingles with an untokenized query string would be able to catch this, but
I can't find too many examples of people attempting this.
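Camden's argument can be illustrated with a toy calculation: word by word, the more frequent spelling wins, but compared as two-word shingles, the neighbouring word decides. The frequencies below are invented for illustration, and a real spellchecker also weighs edit distance, which this sketch leaves out.

```python
# Invented frequencies, standing in for counts from a shingled spelling field.
FREQ = {
    "stephanie": 900,
    "stephenie": 40,
    "stephanie meyer": 2,
    "stephenie meyer": 35,
}


def most_frequent(candidates, freq):
    """Pick the candidate seen most often; unseen candidates count as 0."""
    return max(candidates, key=lambda c: freq.get(c, 0))


# Word by word, the more common first name wins...
print(most_frequent(["stephanie", "stephenie"], FREQ))              # stephanie
# ...but compared as two-word shingles, the surname supplies the context.
print(most_frequent(["stephanie meyer", "stephenie meyer"], FREQ))  # stephenie meyer
```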

On Mon, Jan 17, 2011 at 2:19 PM, Dyer, James <james.d...@ingrambook.com> wrote:

 Camden,

 You may also want to be aware that there is a new feature added to Spell
 Check's collate functionality that will guarantee the collations will
 return hits.  It also is able to return more than one collation and tell you
 how many hits each one would result in if re-queried.  This might do the
 same thing you're trying to do using shingles, but with more accuracy and
 less work.

 For info, look at spellcheck.collate, spellcheck.maxCollations,
 spellcheck.maxCollationTries and spellcheck.collateExtendedResults on the
 component's wiki page:
 http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate

 This feature is committed to 3.x and 4.x and is available as a patch for
 1.4.1 (here:  https://issues.apache.org/jira/browse/SOLR-2010).

 James Dyer
 E-Commerce Systems
 Ingram Content Group
 (615) 213-4311





RE: Spell Checking a multi word phrase

2011-01-17 Thread Dyer, James
Camden,

Have you seen Smiley & Pugh's Solr book?  They describe something very similar to 
what you're trying to do on p. 180ff.  The difference seems to be that they use a 
field that only has a couple of terms, so they don't bother with shingles.  The 
book makes a big point about using spellcheck.q in this case in order to get 
the analysis right.  I'm not sure if this is the solution, but I thought I'd 
mention it.  I never tried spell checking this way because it seemed very 
limited and possibly quite expensive.
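A minimal sketch of a request that uses spellcheck.q, built client-side in Python: q stays the ordinary phrase search, while spellcheck.q carries only the user's words. The assumption, consistent with how this thread ended but worth verifying against the SpellCheckComponent documentation for your Solr version, is that text passed in spellcheck.q is analyzed with the spellchecker's queryAnalyzerFieldType (here the KeywordTokenizer-based textSpellShingle) instead of being split into separate words first. Host, port and the /spell path are taken from Camden's example URL.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def phrase_spellcheck_url(base, user_text):
    """Search for the phrase, but spellcheck the raw text via spellcheck.q."""
    params = {
        # The search itself: a phrase query built from the user's words.
        "q": '"%s"' % user_text,
        "spellcheck": "true",
        # Only the user's words: no field names, quotes, or other syntax.
        "spellcheck.q": user_text,
    }
    return base + "?" + urlencode(params)


url = phrase_spellcheck_url("http://localhost:8080/solr/spell", "untied stats")
print(parse_qs(urlsplit(url).query)["spellcheck.q"])  # ['untied stats']
```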

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Camden Daily [mailto:cam...@jaunter.com] 
Sent: Monday, January 17, 2011 1:41 PM
To: solr-user@lucene.apache.org
Subject: Re: Spell Checking a multi word phrase

James,

Thank you, but I'm not sure that will work for my needs.  I'm very
interested in contextual spell checking.  Take for example the author
"stephenie meyer".  "stephenie" is a far less popular spelling than
"stephanie", but in this context it's the correct option.  I feel like
shingles with an untokenized query string would be able to catch this, but
I can't find too many examples of people attempting this.




Re: Spell Checking a multi word phrase

2011-01-17 Thread Camden Daily
James,

Thanks, the spellcheck.q was exactly what I needed to be using!

-Camden

On Mon, Jan 17, 2011 at 3:54 PM, Dyer, James <james.d...@ingrambook.com> wrote:

 Camden,

 Have you seen Smiley & Pugh's Solr book?  They describe something very
 similar to what you're trying to do on p. 180ff.  The difference seems to be
 that they use a field that only has a couple of terms, so they don't bother with
 shingles.  The book makes a big point about using spellcheck.q in this
 case in order to get the analysis right.  I'm not sure if this is the
 solution, but I thought I'd mention it.  I never tried spell checking this
 way because it seemed very limited and possibly quite expensive.

 James Dyer
 E-Commerce Systems
 Ingram Content Group
 (615) 213-4311



Re: Spell checking question from a Solr novice

2010-11-29 Thread Bill Dueber
On Mon, Oct 18, 2010 at 5:24 PM, Jason Blackerby <jblacke...@gmail.com> wrote:

 If you know the misspellings you could prevent them from being added to the
 dictionary with a StopFilterFactory like so:



Or, you know, correct the data :-)

-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library


Spell checking question from a Solr novice

2010-10-18 Thread Xin Li
Hi, 

I am looking for a quick solution to improve a search engine's spell checking 
performance. I was wondering if anyone has tried to integrate the Google SpellCheck API 
with the Solr search engine (if that's possible). Google spellcheck came to mind 
for two reasons. First, it is costly to clean up the data to be used as a 
spell check baseline. Second, Google probably has the most complete set of 
misspelled search terms. That's why I would like to know if it is a feasible 
way to go.

Thanks,
Xin
This electronic mail message contains information that (a) is or 
may be CONFIDENTIAL, PROPRIETARY IN NATURE, OR OTHERWISE 
PROTECTED 
BY LAW FROM DISCLOSURE, and (b) is intended only for the use of 
the
addressee(s) named herein.  If you are not an intended recipient, 
please contact the sender immediately and take the steps 
necessary 
to delete the message completely from your computer system.

Not Intended as a Substitute for a Writing: Notwithstanding the 
Uniform Electronic Transaction Act or any other law of similar 
effect, absent an express statement to the contrary, this e-mail 
message, its contents, and any attachments hereto are not 
intended 
to represent an offer or acceptance to enter into a contract and 
are not otherwise intended to bind this sender, 
barnesandnoble.com 
llc, barnesandnoble.com inc. or any other person or entity.


RE: Spell checking question from a Solr novice

2010-10-18 Thread Xin Li
Oops, never mind. I just read the Google API policy: a limit of 1000 queries per 
day, and for non-commercial use only.



-Original Message-
From: Xin Li 
Sent: Monday, October 18, 2010 3:43 PM
To: solr-user@lucene.apache.org
Subject: Spell checking question from a Solr novice

Hi, 

I am looking for a quick solution to improve a search engine's spell checking 
performance. I was wondering if anyone has tried to integrate the Google SpellCheck API 
with the Solr search engine (if that's possible). Google spellcheck came to mind 
for two reasons. First, it is costly to clean up the data to be used as a 
spell check baseline. Second, Google probably has the most complete set of 
misspelled search terms. That's why I would like to know if it is a feasible 
way to go.

Thanks,
Xin


Re: Spell checking question from a Solr novice

2010-10-18 Thread Jonathan Rochkind
In general, the benefit of the built-in Solr spellcheck is that it can 
use a dictionary based on your actual index.


If you want to use some external API, you certainly can, in your actual 
client app -- but it doesn't really need to involve Solr at all anymore, 
does it?  Is there any benefit I'm not thinking of to doing that on the 
Solr side, instead of just in your client app?


I think Yahoo (and maybe Microsoft?) have similar APIs with more 
generous ToSs, but I haven't looked in a while.


Xin Li wrote:
Oops, never mind. I just read the Google API policy: a limit of 1000 queries per day, and for non-commercial use only.






Re: Spell checking question from a Solr novice

2010-10-18 Thread Pradeep Singh
I think a spellchecker based on your index has clear advantages. You can
spellcheck words specific to your domain which may not be available in an
outside dictionary. You can always dump the word list from WordNet to get a
starter English dictionary.

But then it also means that misspelled words from your domain can become
suggested corrections. Hmm ... you'll need a way to prune out such
words. Even so, a dictionary based on your own domain is the way to go.
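One way to do the pruning Pradeep mentions is a document-frequency cutoff: a term that appears in only a tiny fraction of documents is more likely to be a typo. A minimal sketch with a hypothetical threshold and made-up counts; Solr's index-based spellcheckers have a thresholdTokenFrequency setting with a similar purpose, but check your version's documentation for its exact semantics.

```python
def prune_dictionary(doc_freq, num_docs, threshold=0.001):
    """Keep terms found in at least `threshold` (a fraction) of all documents.

    doc_freq maps each term to the number of documents containing it.
    """
    min_docs = threshold * num_docs
    return {term for term, df in doc_freq.items() if df >= min_docs}


# Made-up counts for a 10,000-document index: the cutoff is 10 documents.
counts = {"banana": 250, "bananna": 2, "banoffee": 12}
print(sorted(prune_dictionary(counts, 10_000)))  # ['banana', 'banoffee']
```

The cutoff is a trade-off: a rare but legitimate domain term is dropped along with the typos, so the threshold needs tuning per index.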

On Mon, Oct 18, 2010 at 1:55 PM, Jonathan Rochkind <rochk...@jhu.edu> wrote:

 In general, the benefit of the built-in Solr spellcheck is that it can use
 a dictionary based on your actual index.

 If you want to use some external API, you certainly can, in your actual
 client app -- but it doesn't really need to involve Solr at all anymore,
 does it?  Is there any benefit I'm not thinking of to doing that on the Solr
 side, instead of just in your client app?

 I think Yahoo (and maybe Microsoft?) have similar APIs with more generous
 ToSs, but I haven't looked in a while.







Re: Spell checking question from a Solr novice

2010-10-18 Thread Jason Blackerby
If you know the misspellings you could prevent them from being added to the
dictionary with a StopFilterFactory like so:

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="misspelled_words.txt"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/>
    <filter class="solr.LengthFilterFactory" min="2" max="50"/>
  </analyzer>
</fieldType>

where misspelled_words.txt contains the misspellings.
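A rough Python emulation of what Jason's textSpell chain does to text before it reaches the spelling dictionary. Splitting on whitespace stands in for StandardTokenizerFactory (an assumption: the real tokenizer also splits on most punctuation), and the misspelling list is passed in directly rather than read from misspelled_words.txt.

```python
import re


def spell_terms(text, misspelled, min_len=2, max_len=50):
    """Approximate the textSpell analyzer chain for one piece of text.

    Whitespace splitting stands in for StandardTokenizerFactory (an
    assumption: the real tokenizer also splits on most punctuation).
    """
    stop = {w.lower() for w in misspelled}    # ignoreCase="true"
    terms = []
    for tok in text.split():
        tok = tok.lower()                     # LowerCaseFilterFactory
        if tok in stop:                       # StopFilterFactory
            continue
        tok = re.sub(r"[^a-z]", "", tok)      # PatternReplaceFilterFactory, replace="all"
        if min_len <= len(tok) <= max_len:    # LengthFilterFactory min="2" max="50"
            terms.append(tok)
    return terms


# "bananna" plays the role of a known misspelling listed in misspelled_words.txt.
print(spell_terms("Banana bananna BREAD x 2010 pie!", ["bananna"]))
# ['banana', 'bread', 'pie']
```

Tokens left empty by the pattern replacement (such as "2010") or too short (such as "x") are dropped by the length filter, which is why the LengthFilterFactory step matters in this chain.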

On Mon, Oct 18, 2010 at 5:14 PM, Pradeep Singh <pksing...@gmail.com> wrote:

 I think a spellchecker based on your index has clear advantages. You can
 spellcheck words specific to your domain which may not be available in an
 outside dictionary. You can always dump the word list from WordNet to get a
 starter English dictionary.

 But then it also means that misspelled words from your domain can become
 suggested corrections. Hmm ... you'll need a way to prune out such
 words. Even so, a dictionary based on your own domain is the way to go.

 On Mon, Oct 18, 2010 at 1:55 PM, Jonathan Rochkind rochk...@jhu.edu
 wrote:

  In general, the benefit of the built-in Solr spellcheck is that it can
 use
  a dictionary based on your actual index.
 
  If you want to use some external API, you certainly can, in your actual
  client app -- but it doesn't really need to involve Solr at all anymore,
  does it?  Is there any benefit I'm not thinking of to doing that on the
 solr
  side, instead of just in your client app?
 
  I think Yahoo (and maybe Microsoft?) have similar APIs with more generous
  ToSs, but I haven't looked in a while.
 
 
  Xin Li wrote:
 
  Oops, never mind. Just read Google API policy. 1000 queries per day
 limit
   for non-commercial use only.
 
 
  -Original Message-
  From: Xin Li Sent: Monday, October 18, 2010 3:43 PM
  To: solr-user@lucene.apache.org
  Subject: Spell checking question from a Solr novice
 
  Hi,
  I am looking for a quick solution to improve a search engine's spell
  checking performance. I was wondering if anyone has tried to integrate the
  Google SpellCheck API with the Solr search engine (if possible). Google
  spellcheck came to mind for two reasons. First, it is costly to clean up
  the data to be used as a spell check baseline. Second, Google probably has
  the most complete set of misspelled search terms. That's why I would like
  to know whether this is a feasible way to go.
 
  Thanks,
  Xin
  This electronic mail message contains information that (a) is or may be
  CONFIDENTIAL, PROPRIETARY IN NATURE, OR OTHERWISE PROTECTED BY LAW FROM
  DISCLOSURE, and (b) is intended only for the use of the
  addressee(s) named herein.  If you are not an intended recipient, please
  contact the sender immediately and take the steps necessary to delete
 the
  message completely from your computer system.
 
  Not Intended as a Substitute for a Writing: Notwithstanding the Uniform
  Electronic Transaction Act or any other law of similar effect, absent an
  express statement to the contrary, this e-mail message, its contents,
 and
  any attachments hereto are not intended to represent an offer or
 acceptance
  to enter into a contract and are not otherwise intended to bind this
 sender,
  barnesandnoble.com llc, barnesandnoble.com inc. or any other person or
  entity.
 
 
 



Re: Spell checking question from a Solr novice

2010-10-18 Thread Ezequiel Calderara
You can check new words against a dictionary and keep the misspelled ones in
the file, as Jason described...

What Pradeep said is true: it is always better to have suggestions related to
your index than suggestions with no results...
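Checking index terms against a dictionary could look roughly like the sketch below. `suspect_terms` and `stopword_file` are hypothetical helpers, and the reference word list is assumed to come from elsewhere (for example a WordNet dump). Domain words such as "solr" are flagged as well. That is Pradeep's point: the output is a list of candidates for manual review, not a finished exclusion file.

```python
def suspect_terms(index_terms, reference_words):
    """Return index terms missing from a reference word list, sorted for review."""
    known = {w.lower() for w in reference_words}
    return sorted({t.lower() for t in index_terms} - known)

def stopword_file(terms):
    """Render terms as a stopword-style file: one term per line, '#' for comments."""
    header = "# candidate misspellings; review by hand before use\n"
    return header + "".join(term + "\n" for term in terms)

candidates = suspect_terms(["java", "jawa", "Solr", "beginnning"], {"java", "beginning"})
print(candidates)  # ['beginnning', 'jawa', 'solr']
```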


On Mon, Oct 18, 2010 at 6:24 PM, Jason Blackerby jblacke...@gmail.com wrote:

 If you know the misspellings you could prevent them from being added to the
 dictionary with a StopFilterFactory like so:

 <fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true" words="misspelled_words.txt"/>
     <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/>
     <filter class="solr.LengthFilterFactory" min="2" max="50"/>
   </analyzer>
 </fieldType>

 where misspelled_words.txt contains the misspellings.





-- 
__
Ezequiel.

Http://www.ironicnet.com


Re: Spell checking question from a Solr novice

2010-10-18 Thread Dennis Gearon
The first question to ask is: will it work for you?

The SECOND question is: do you want Google to know what's in your data?

Dennis Gearon

Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better idea to learn from others’ mistakes, so you do not have to make them 
yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.


--- On Mon, 10/18/10, Xin Li x...@book.com wrote:

 From: Xin Li x...@book.com
 Subject: Spell checking question from a Solr novice
 To: solr-user@lucene.apache.org
 Date: Monday, October 18, 2010, 12:43 PM
 Hi, 
 
 I am looking for a quick solution to improve a search
 engine's spell checking performance. I was wondering if
 anyone has tried to integrate the Google SpellCheck API with
 the Solr search engine (if possible). Google spellcheck came to
 mind for two reasons. First, it is costly to clean up
 the data to be used as a spell check baseline. Second,
 Google probably has the most complete set of misspelled
 search terms. That's why I would like to know whether this is a
 feasible way to go.
 
 Thanks,
 Xin



Spell checking and keyword tokenizer

2010-09-14 Thread Glen Stampoultzis
Hi,

I'm trying to spell check a whole field using a lowercasing keyword
tokenizer [1].

For example, if I query for "furntree gully", I'm hoping to get back
"ferntree gully" as a suggestion. Unfortunately, the spell checker
seems to be treating this as two tokens and returning suggestions
for both. The query [2] and result [3] are below. In this case "furntree"
does end up with "ferntree gully" as a suggestion; however, it
also gives "bulla" as a suggestion for "gully" (go figure :-) ).

Any suggestions?

Regards,

Glen


[1] -

<fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

[2] -

Query

q=locality_lc%3A%22furntree+gully%22&spellcheck=true&spellcheck.build=true&spellcheck.reload=true&spellcheck.accuracy=0.5&spellcheck.dictionary=locality_spellchecker&spellcheck.collate=true&fl=street_name%2Clocality%2Cstate

[3] -

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">379</int>
    <lst name="params">
      <str name="spellcheck">true</str>
      <str name="fl">street_name,locality,state</str>
      <str name="spellcheck.accuracy">0.5</str>
      <str name="q">locality_lc:&quot;furntree gully&quot;</str>
      <str name="spellcheck.dictionary">locality_spellchecker</str>
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.reload">true</str>
      <str name="spellcheck.build">true</str>
    </lst>
  </lst>
  <str name="command">build</str>
  <result name="response" numFound="0" start="0"/>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="furntree">
        <int name="numFound">1</int>
        <int name="startOffset">13</int>
        <int name="endOffset">21</int>
        <arr name="suggestion">
          <str>ferntree gully</str>
        </arr>
      </lst>
      <lst name="gully">
        <int name="numFound">1</int>
        <int name="startOffset">22</int>
        <int name="endOffset">27</int>
        <arr name="suggestion">
          <str>bulla</str>
        </arr>
      </lst>
      <str name="collation">locality_lc:&quot;ferntree gully bulla&quot;</str>
    </lst>
  </lst>
</response>
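Reading the suggestions block programmatically makes the problem easy to see: the spellchecker reported "furntree" and "gully" as separate tokens, and the collation then appended "bulla". The sketch below parses a trimmed copy of the response above (offsets omitted) with Python's xml.etree.ElementTree. `read_suggestions` is a hypothetical helper, not a Solr client API. An empty `<lst name="suggestions"/>` yields an empty dict.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response shown above.
RESPONSE = """<response>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="furntree">
        <int name="numFound">1</int>
        <arr name="suggestion"><str>ferntree gully</str></arr>
      </lst>
      <lst name="gully">
        <int name="numFound">1</int>
        <arr name="suggestion"><str>bulla</str></arr>
      </lst>
      <str name="collation">locality_lc:"ferntree gully bulla"</str>
    </lst>
  </lst>
</response>"""

def read_suggestions(xml_text):
    """Map each checked token to its suggestions; also return the collation, if any."""
    root = ET.fromstring(xml_text)
    block = root.find("lst[@name='spellcheck']/lst[@name='suggestions']")
    if block is None:
        return {}, None
    per_token = {
        entry.get("name"): [s.text for s in entry.findall("arr[@name='suggestion']/str")]
        for entry in block.findall("lst")
    }
    return per_token, block.findtext("str[@name='collation']")

tokens, collation = read_suggestions(RESPONSE)
print(tokens)     # {'furntree': ['ferntree gully'], 'gully': ['bulla']}
print(collation)  # locality_lc:"ferntree gully bulla"
```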


Re: Spell checking and keyword tokenizer

2010-09-14 Thread Glen Stampoultzis
Never mind this one... With a bit more research I discovered I can use
spellcheck.q to provide the correct suggestion.
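As a sketch of what such a request might look like, the parameters below keep the fielded phrase query in `q` and pass the bare phrase, without field markup, in `spellcheck.q`. The helper name is hypothetical. Exactly how Solr tokenizes `spellcheck.q` depends on the spellchecker's analyzer configuration, so this shows the parameter layout only, not a guaranteed outcome.

```python
from urllib.parse import urlencode

def build_spellcheck_params(field, phrase, dictionary):
    """Build a query string that spellchecks the whole phrase via spellcheck.q."""
    params = [
        ("q", f'{field}:"{phrase}"'),        # the actual fielded search
        ("spellcheck", "true"),
        ("spellcheck.dictionary", dictionary),
        ("spellcheck.collate", "true"),
        ("spellcheck.q", phrase),            # plain phrase, no field markup
    ]
    return urlencode(params)

print(build_spellcheck_params("locality_lc", "furntree gully", "locality_spellchecker"))
```

`urlencode` inserts the `&` separators and percent-encodes the colon and quotes, which avoids hand-assembling query strings like the one in the original question.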




spell checking problem

2010-07-29 Thread satya swaroop
Hi all,
  I need some help with spell checking. I configured my solrconfig.xml and
schema.xml by following the user mailing list, and here is the configuration
I made:

My schema.xml:

<fieldType name="spellText" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<field name="spell" type="spellText" indexed="true" stored="true" multiValued="true"/>

<copyField source="*" dest="spell"/>



My solrconfig.xml:
--
<requestHandler name="spellchecker" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">spellText</str>

  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">name</str>   <!-- the default field in solrconfig; if I change it to the spell field, the dictionary is not created -->
    <str name="spellcheckIndexDir">./spell</str>
    <str name="buildOnCommit">true</str>
  </lst>

  <!-- a spellchecker that uses a different distance measure -->
  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="field">spell</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
    <str name="spellcheckIndexDir">./spellcheckerjaro</str>
  </lst>
</searchComponent>




1) For the default dictionary, the index is created, but if I type "jawa"
the suggestions it gives are "data" and "sata", while the expected suggestion
is "java". I have nearly 20 Java documents indexed.
2) Another problem: if I build the jarowinkler dictionary, which uses the
spell field, the dictionary is not created and I only see
segments.gen and segments_1 in its directory.
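By plain edit distance, "java" is one substitution away from "jawa", while "data" and "sata" are two. If "java" were in the spelling dictionary, it should normally rank ahead of them. One plausible reading, which is an assumption not confirmed in the thread, is that "java" never made it into the dictionary built from the `name` field. The sketch computes classic Levenshtein distance plus a simple normalization (1 - distance / longer length). The normalization is chosen here for illustration and is not claimed to be Lucene's exact formula.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def similarity(a, b):
    """Normalize distance to [0, 1]; 1.0 means identical."""
    return 1.0 - levenshtein(a, b) / max(len(a), len(b), 1)

for candidate in ["java", "data", "sata"]:
    print(candidate, levenshtein("jawa", candidate), similarity("jawa", candidate))
# java 1 0.75
# data 2 0.5
# sata 2 0.5
```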


regards,
satya


spell checking....

2010-07-26 Thread satya swaroop
Hi all,
I am new to Solr and was able to index documents by following the Solr
wiki. Now I am trying to add spell checking. I followed the SpellCheckComponent
page on the wiki but am not getting any suggested spellings. I first built the
index with spellcheck.build=true,...

Here is an example:

http://localhost:8080/solr/spell?q=javs&spellcheck=true&spellcheck.collate=true

<response>
...
</result>

<lst name="spellcheck">
  <lst name="suggestions"/>
</lst>
</response>


Here the response should actually suggest "java", but it didn't.

Can anyone guide me on this?
I am using Solr 1.4 with Tomcat on Ubuntu.





Regards,
swarup


Re: spell checking....

2010-07-26 Thread satya swaroop
This is in solrconfig.xml:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="field">spell</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="accuracy">0.7</str>
    <str name="buildOnCommit">true</str>
    <str name="buildOnOptimize">true</str>
  </lst>

  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="field">lowerfilt</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="buildOnCommit">true</str>
    <str name="buildOnOptimize">true</str>
  </lst>

  <str name="queryAnalyzerFieldType">textSpell</str>
</searchComponent>

<!--
  The SpellingQueryConverter to convert raw (CommonParams.Q) queries into
  tokens. Uses a simple regular expression to strip off field markup, boosts,
  ranges, etc., but it is not guaranteed to match an exact parse from the
  query parser.

  Optional, defaults to solr.SpellingQueryConverter
-->
<queryConverter name="queryConverter" class="org.apache.solr.spelling.SpellingQueryConverter"/>


I added the following to the standard request handler:

requestHandler name=standard class=solr.SearchHandler default=true
!-- default values for query parameters --
 lst name=defaults
   str name=echoParamsexplicit/str
   !--
   int name=rows10/int
   str name=fl*/str
   str name=version2.1/str
!-- Optional, must match spell checker's name as defined above,
defaults to default --
  str name=spellcheck.dictionarydefault/str
  !-- omp = Only More Popular --
  str name=spellcheck.onlyMorePopularfalse/str
  !-- exr = Extended Results --
  str name=spellcheck.extendedResultsfalse/str
  !--  The number of suggestions to return --
  str name=spellcheck.count1/str
/lst
 arr name=last-components
  strspellcheck/str
/arr

  /requestHandler
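In the searchComponent above, the `default` and `jarowinkler` spellcheckers both set spellcheckIndexDir to ./spellchecker. Two dictionaries sharing one directory would likely overwrite each other when built, so this may be worth ruling out. The sketch below uses a hypothetical helper (not part of Solr) that parses a trimmed copy of that fragment with Python's xml.etree.ElementTree and reports directories used by more than one spellchecker.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Trimmed copy of the posted searchComponent: only the keys this check needs.
CONFIG = """<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="field">lowerfilt</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
  </lst>
</searchComponent>"""

def shared_index_dirs(component_xml):
    """Return {index_dir: [spellchecker names]} for directories used more than once."""
    root = ET.fromstring(component_xml)
    by_dir = defaultdict(list)
    for checker in root.findall("lst[@name='spellchecker']"):
        name = checker.findtext("str[@name='name']")
        index_dir = checker.findtext("str[@name='spellcheckIndexDir']")
        if index_dir is not None:
            by_dir[index_dir].append(name)
    return {d: names for d, names in by_dir.items() if len(names) > 1}

print(shared_index_dirs(CONFIG))  # {'./spellchecker': ['default', 'jarowinkler']}
```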


Spell checking not working

2009-09-17 Thread Villemos, Gert
I'm trying to set up a spell checker but am failing miserably. I would like
to have a spell check based on actual values injected into the index
from other fields. The configuration is shown below.
 
After indexing and running a query with 'spellcheck.build=true', I can
see that the spellcheck index files are updated, i.e. data is being
injected. I can also see that the injected documents have 'spell'
fields, such as 'spell=closed'. I would therefore expect a search
for 'clo' to return these as suggestions.
 
But I have tried the queries:
[url]/solr/select?qt=huginn&q=clo&spellcheck=true
[url]/solr/select?qt=huginn&q=clo*&spellcheck=true
[url]/solr/select?qt=huginn&spell:clo&spellcheck=true
[url]/solr/select?qt=huginn&spell:clo*&spellcheck=true
 
with no effect. I do not get any hits back. What am I doing wrong?
 
Cheers,
Gert.
 
 
 
 
 
-- SCHEMA.XML
-
 
fieldType name=testSpell class=solr.TextField
positionIncrementGap=100
analyzer
tokenizer class=solr.StandardTokenizerFactory/
filter class=solr.LowerCaseFilterFactory/
filter class=solr.RemoveDuplicatesTokenFilterFactory/
/analyzer
/types
 
 
fields
...
field name=ieStatus type=text indexed=true stored=true
multiValued=true/ 
field name=spell type=textSpell indexed=true stored=true
multiValued=true/ 
/fields
 
copyField source=ieStatus dest=spell/
 
 
 
 
 
-- SOLRCONFIG.XML
---
 
requestHanlder name=huginn class=solr.SearchHandler default=true
lst name=defaults
... [setup as dismax handler]]
lst
 
arr name=last-components
strspellcheck/str
/arr
/requestHandler
 
 
searchComponent name=spellcheck class=solr.SpellCheckComponent
 
str name=queryAnalyzerFieldTypetextSpell/str
 
lst name=spellchecker
str name=namedefault/str
str name=fieldspell/str
str name=spellCheckIndexDir./spellcheck/default/str
str name=accuracy0.5/str
/lst
/searchComponent
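In the schema as posted, the field type is declared as `testSpell`, while both the `spell` field and queryAnalyzerFieldType refer to `textSpell`. That mismatch may be worth ruling out first. The sketch below uses a hypothetical checker (not a Solr tool) that parses a trimmed, well-formed stand-in for the posted schema and lists referenced types that are never declared. The `text` type is assumed to be declared elsewhere in the real schema.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the posted schema.
SCHEMA = """<schema>
  <types>
    <fieldType name="text" class="solr.TextField"/>
    <fieldType name="testSpell" class="solr.TextField"/>
  </types>
  <fields>
    <field name="ieStatus" type="text"/>
    <field name="spell" type="textSpell"/>
  </fields>
</schema>"""

def undeclared_types(schema_xml, extra_refs=()):
    """Return type names referenced by fields (or extra_refs) but never declared."""
    root = ET.fromstring(schema_xml)
    declared = {ft.get("name") for ft in root.iter("fieldType")}
    referenced = {f.get("type") for f in root.iter("field")} | set(extra_refs)
    return sorted(referenced - declared)

# queryAnalyzerFieldType from solrconfig.xml is passed in as an extra reference.
print(undeclared_types(SCHEMA, extra_refs=["textSpell"]))  # ['textSpell']
```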
 
 
 


Please help Logica to respect the environment by not printing this email.



This e-mail and any attachment is for authorised use by the intended 
recipient(s) only. It may contain proprietary material, confidential 
information and/or be subject to legal privilege. It should not be copied, 
disclosed to, retained or used by, any other party. If you are not an intended 
recipient then please promptly delete this e-mail and any attachment and all 
copies and inform the sender. Thank you.



Re: Spell checking: Is there a way to exclude words known to be wrong?

2009-07-14 Thread Erik Hatcher
Use the stopwords feature with a custom mispeled_words.txt and a  
StopFilterFactory on the spell check field ;)


Erik


On Jul 13, 2009, at 8:27 PM, Jay Hill wrote:


We're building a spell index from a field in our main index with the
following configuration:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

This works great and rebuilds the spelling index on commits as expected.
However, we know there are misspellings in the spell field of our main
index. We could remove these from the spelling index using Luke; however,
they will be added again on commits. What we need is something similar to
how the protwords.txt file is used, so that when we notice misspelled words
such as "beginnning" being pulled from our main index, we can add them to
an exclusion file and they are not added to the spelling index again.

Any tricks to make this possible?

-Jay




Re: Spell checking: Is there a way to exclude words known to be wrong?

2009-07-14 Thread Shalin Shekhar Mangar
On Tue, Jul 14, 2009 at 6:37 PM, Erik Hatcher e...@ehatchersolutions.com wrote:

 Use the stopwords feature with a custom mispeled_words.txt and a
 StopFilterFactory on the spell check field ;)


Very cool! :)

-- 
Regards,
Shalin Shekhar Mangar.


Spell checking: Is there a way to exclude words known to be wrong?

2009-07-13 Thread Jay Hill
We're building a spell index from a field in our main index with the
following configuration:
  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">textSpell</str>
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">spell</str>
      <str name="spellcheckIndexDir">./spellchecker</str>
      <str name="buildOnCommit">true</str>
    </lst>
  </searchComponent>

This works great and rebuilds the spelling index on commits as expected.
However, we know there are misspellings in the spell field of our main
index. We could remove these from the spelling index using Luke; however,
they will be added again on commits. What we need is something similar to
how the protwords.txt file is used, so that when we notice misspelled words
such as "beginnning" being pulled from our main index, we can add them to
an exclusion file and they are not added to the spelling index again.

Any tricks to make this possible?

-Jay


Re: Spell checking: Is there a way to exclude words known to be wrong?

2009-07-13 Thread Mark Miller
I don't think there is a way currently, but it might make a nice patch. Or
you could just implement a custom SolrSpellChecker - both
FileBasedSpellChecker and IndexBasedSpellChecker are actually like maybe 50
lines of code or less. It would be fairly quick to just plug a custom
version in as a plugin.
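Mark's suggestion is a Java plugin implementing Solr's SolrSpellChecker. The Python toy below is not that API. It only illustrates the idea: build the word list from index terms, skip an exclusion list, and then suggest. The class name is hypothetical, and `difflib.get_close_matches` (a SequenceMatcher ratio) stands in for Lucene's string distances.

```python
import difflib

class ExcludingSpellChecker:
    """Toy index-based speller that skips known-bad terms when building its word list."""

    def __init__(self, index_terms, excluded):
        bad = {w.lower() for w in excluded}
        self.words = sorted({t.lower() for t in index_terms} - bad)

    def suggest(self, word, count=5, cutoff=0.6):
        # difflib's similarity ratio stands in for a Lucene string distance here.
        return difflib.get_close_matches(word.lower(), self.words, n=count, cutoff=cutoff)

checker = ExcludingSpellChecker(["beginning", "beginnning", "begin", "search"], {"beginnning"})
print(checker.suggest("begining"))  # ['beginning', 'begin']
```

Because the exclusion is applied when the word list is built, a rebuild on commit cannot reintroduce "beginnning". That is the behavior Jay asked for.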

-- 
- Mark

http://www.lucidimagination.com

On Mon, Jul 13, 2009 at 8:27 PM, Jay Hill jayallenh...@gmail.com wrote:

 We're building a spell index from a field in our main index with the
 following configuration:
  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">textSpell</str>
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">spell</str>
      <str name="spellcheckIndexDir">./spellchecker</str>
      <str name="buildOnCommit">true</str>
    </lst>
  </searchComponent>

 This works great and rebuilds the spelling index on commits as expected.
 However, we know there are misspellings in the spell field of our main
 index. We could remove these from the spelling index using Luke; however,
 they will be added again on commits. What we need is something similar to
 how the protwords.txt file is used, so that when we notice misspelled words
 such as "beginnning" being pulled from our main index, we can add them to
 an exclusion file and they are not added to the spelling index again.

 Any tricks to make this possible?

 -Jay



Re: spell checking

2009-06-05 Thread Michael Ludwig

Walter Underwood wrote:

query suggest --wunder


That's very good.

On the other hand, I noticed how the term spellcheck is spread
all over the place, and that would be a massive renaming orgy.
An explanation at the appropriate place in the documentation is
less invasive. I added two sentences to the Introduction of:

http://wiki.apache.org/solr/SpellCheckComponent

Michael Ludwig


Re: spell checking

2009-06-05 Thread Shalin Shekhar Mangar
On Thu, Jun 4, 2009 at 7:26 PM, Walter Underwood wunderw...@netflix.com wrote:

 query suggest --wunder


How about DidYouMeanComponent?

-- 
Regards,
Shalin Shekhar Mangar.


Re: spell checking

2009-06-04 Thread Michael Ludwig

Yao Ge wrote:


Maybe we should call this "alternative search terms" or
"suggested search terms" instead of spell checking. It is
misleading, as there is no right or wrong in spelling; there
are only popular (term frequency?) alternatives.


I had exactly the same difficulty in understanding the concept
because of the name given to the feature, which usually denotes
just what it says, i.e. a spellchecker, which is driven by an
authoritative dictionary and a set of rules, as integrated in
word processors, in order to ensure orthography.

What we have here is quite different from a spellchecker.

IMHO, a name conveying the actual meaning, along the lines of
"suggest", would make more sense.

Michael Ludwig


Re: spell checking

2009-06-04 Thread Walter Underwood
query suggest --wunder

On 6/4/09 1:25 AM, Michael Ludwig m...@as-guides.com wrote:

 Yao Ge wrote:
 
 Maybe we should call this "alternative search terms" or
 "suggested search terms" instead of spell checking. It is
 misleading, as there is no right or wrong in spelling; there
 are only popular (term frequency?) alternatives.
 
 I had exactly the same difficulty in understanding the concept
 because of the name given to the feature, which usually denotes
 just what it says, i.e. a spellchecker, which is driven by an
 authoritative dictionary and a set of rules, as integrated in
 word processors, in order to ensure orthography.
 
 What we have here is quite different from a spellchecker.
 
 IMHO, a name conveying the actual meaning, along the lines of
 "suggest", would make more sense.
 
 Michael Ludwig



spell checking

2009-06-02 Thread Yao Ge

Can someone provide a tutorial-like introduction on how to get
spell checking working in Solr? It appears many steps are required before the
spell-checking functions can be used. It also appears that a dictionary (a
list of correctly spelled words) is required to set up the spell checker. Can
anyone validate my impression?

Thanks.
-- 
View this message in context: 
http://www.nabble.com/spell-checking-tp23835427p23835427.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: spell checking

2009-06-02 Thread Grant Ingersoll

Have you gone through: http://wiki.apache.org/solr/SpellCheckComponent


On Jun 2, 2009, at 8:50 AM, Yao Ge wrote:



Can someone provide a tutorial-like introduction on how to get
spell checking working in Solr? It appears many steps are required
before the spell-checking functions can be used. It also appears that a
dictionary (a list of correctly spelled words) is required to set up the
spell checker. Can anyone validate my impression?

Thanks.



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: spell checking

2009-06-02 Thread Yao Ge

Yes, I did. I was not able to grasp the concept of making spell checking
work.
For example, the wiki page says a spell check index needs to be built, but it
does not say how to do it. Does Solr build the index out of thin air? Is the
index built from the main index? Or is it built from a dictionary or
word list?

Please help.


Grant Ingersoll-6 wrote:
 
 Have you gone through: http://wiki.apache.org/solr/SpellCheckComponent
 
 

-- 
View this message in context: 
http://www.nabble.com/spell-checking-tp23835427p23840843.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: spell checking

2009-06-02 Thread Otis Gospodnetic

Hello,

This is how you build the SC index:
http://wiki.apache.org/solr/SpellCheckComponent#head-78f5afcf43df544832809abc68dd36b98152670c

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Yao Ge yao...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, June 2, 2009 5:03:24 PM
 Subject: Re: spell checking
 
 
 Yes, I did. I was not able to grasp the concept of making spell checking
 work.
 For example, the wiki page says a spell check index needs to be built, but it
 does not say how to do it. Does Solr build the index out of thin air? Is the
 index built from the main index? Or is it built from a dictionary or
 word list?
 
 Please help.
 
 



Re: spell checking

2009-06-02 Thread Jeff Newburn
The spell checking dictionary should be built on startup when spell checking
is enabled in the system.

First, we defined the component in solrconfig.xml. Notice how it has
buildOnCommit set to true, which tells it to rebuild the dictionary on each
commit.

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="classname">solr.IndexBasedSpellChecker</str>
      <str name="field">field</str>
      <str name="spellcheckIndexDir">./spellchecker1</str>
      <str name="accuracy">0.5</str>
      <str name="buildOnCommit">true</str>
    </lst>
    <lst name="spellchecker">
      <str name="name">jarowinkler</str>
      <str name="field">field</str>
      <!-- Use a different Distance Measure -->
      <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
      <str name="spellcheckIndexDir">./spellchecker2</str>
      <str name="accuracy">0.5</str>
      <str name="buildOnCommit">true</str>
    </lst>
  </searchComponent>
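The second spellchecker in this config swaps in org.apache.lucene.search.spell.JaroWinklerDistance as its distance measure. For intuition about what that measure rewards, here is a Python sketch of the textbook Jaro-Winkler similarity (prefix scale 0.1, shared prefix capped at 4 characters). It is written from the standard definition, not from Lucene's source, so details such as when the prefix bonus applies may differ and its scores need not match Solr's.

```python
def jaro(s1, s2):
    """Textbook Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only "match" if they are equal and close enough together.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == c:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = half_transpositions = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1, max_prefix=4):
    """Jaro similarity boosted for a shared prefix of up to max_prefix chars."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1 - j)


print(round(jaro_winkler("martha", "marhta"), 3))
print(round(jaro_winkler("diabtes", "diabetes"), 3))
```

Compared with plain edit distance, the prefix bonus favours candidates that share their first few letters with the query term, which is one reason two spellcheckers over the same field can rank suggestions differently.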

Second, we added the component to the dismax handler:

  <arr name="last-components">
    <str>spellcheck</str>
  </arr>

This seems to work for us. Hope it helps.

-- 
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562


 From: Yao Ge yao...@gmail.com
 Reply-To: solr-user@lucene.apache.org
 Date: Tue, 2 Jun 2009 14:03:24 -0700 (PDT)
 To: solr-user@lucene.apache.org
 Subject: Re: spell checking
 
 
 Yes. I did. I was not able to grasp the concept of making spell checking
 work.
 For example, the wiki page says a spell check index needs to be built, but it
 does not say how to do it. Does Solr build the index out of thin air? Is the
 index built from the main index? Or is it built from a dictionary or word
 list?
 
 Please help.
 
 
 



Re: spell checking

2009-06-02 Thread Yao Ge

Sorry for not being able to get my point across.

I know the syntax that triggers an index build for spell checking. I actually
ran the command and saw some additional files created in the data\spellchecker1
directory. What I don't understand is what is in there, as I cannot get Solr
to make spelling suggestions using the query structure documented in the
wiki.

Can anyone tell me what happens when the default spell check index is
built? In my case, I used copyField to copy a couple of text fields into a
field called spell. These fields contain the original text; they are the ones
with typos that I need to run spell check on. But how can this original data
be used as a basis for spell checking? How does Solr know which words are
correctly spelled?

   <field name="tech_comment" type="text" indexed="true" stored="true" multiValued="true"/>
   <field name="cust_comment" type="text" indexed="true" stored="true" multiValued="true"/>
   ...
   <field name="spell" type="textSpell" indexed="true" stored="true" multiValued="true"/>
   ...
   <copyField source="tech_comment" dest="spell"/>
   <copyField source="cust_comment" dest="spell"/>
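To make the question concrete: an index-based spell checker's dictionary is, conceptually, just the set of distinct terms that end up in the spell field. The Python sketch below mimics that with plain dicts. The sample documents are made up, the lowercase-and-split tokenizer is a simplified stand-in for whatever the textSpell field type actually does, and nothing here is Solr API.

```python
from collections import Counter

# copyField rules from the schema above: (source field, destination field).
COPY_FIELDS = [("tech_comment", "spell"), ("cust_comment", "spell")]


def spell_field_values(doc, dest="spell"):
    """Collect the values copyField would put into `dest` for one document."""
    values = []
    for source, target in COPY_FIELDS:
        if target == dest:
            values.extend(doc.get(source, []))
    return values


def build_dictionary(docs):
    """Count every term that ends up in the spell field across all documents.

    Assumed analysis: lowercase, then split on whitespace. A real textSpell
    field type defines its own tokenizer and filters.
    """
    counts = Counter()
    for doc in docs:
        for value in spell_field_values(doc):
            counts.update(value.lower().split())
    return counts


# Hypothetical documents; multiValued fields are modelled as lists.
docs = [
    {"tech_comment": ["Replaced the pump"], "cust_comment": ["Pump was noisy"]},
    {"tech_comment": ["Replaced pmup seal"]},
]
dictionary = build_dictionary(docs)
print(dictionary.most_common(3))
```

Note that the typo pmup lands in the dictionary as well, just with a lower count than pump; the dictionary itself has no independent notion of correct spelling.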



Yao Ge wrote:
 
 Can someone help by providing a tutorial-like introduction on how to get
 spell checking to work in Solr? It appears many steps are required before
 the spell-checking functions can be used. It also appears that a dictionary
 (a list of correctly spelled words) is required to set up the spell
 checker. Can anyone validate my impression?
 
 Thanks.
 

-- 
View this message in context: 
http://www.nabble.com/spell-checking-tp23835427p23841373.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: spell checking

2009-06-02 Thread Otis Gospodnetic

Hello,

In short, the assumption behind this type of spell checker (SC) is that the
text in the main index is (mostly) correctly spelled. When the SC finds query
terms that are close in spelling to words indexed in the SC, it offers
spelling suggestions/corrections using those presumably correctly spelled
terms (there are other parameters that control the exact behaviour, but this
is the idea).

Solr (actually Lucene's spellchecker, which Solr uses under the hood) turns
the input text (values from those fields you copy to the spell field) into
so-called n-grams. You can see that if you open up the SC index with
something like Luke. Please see
http://wiki.apache.org/jakarta-lucene/SpellChecker .
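The idea can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Lucene's code: each dictionary word is indexed by its character 3-grams, candidates are words sharing at least one 3-gram with the query term, and candidates are scored with a normalized Levenshtein similarity (1 minus edit distance divided by the longer word's length). Candidates scoring below an accuracy threshold are dropped, and the rest are ranked by score, then by term frequency. Lucene's SpellChecker uses several gram sizes and a configurable distance measure, so its exact results will differ; the term frequencies below are made up.

```python
from collections import defaultdict


def ngrams(word, n=3):
    """Return the set of character n-grams of `word` (the word itself if short)."""
    if len(word) <= n:
        return {word}
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def edit_distance(a, b):
    """Classic Levenshtein distance (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similarity(a, b):
    """1.0 for identical words, lower as the edit distance grows."""
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


class NgramSpellChecker:
    """Toy spell checker: n-gram lookup for candidates, edit distance to rank."""

    def __init__(self, term_freqs, accuracy=0.5):
        self.freqs = dict(term_freqs)       # term -> frequency in the index
        self.accuracy = accuracy            # minimum similarity to suggest
        self.gram_index = defaultdict(set)  # n-gram -> terms containing it
        for term in self.freqs:
            for gram in ngrams(term):
                self.gram_index[gram].add(term)

    def suggest(self, word, count=5):
        candidates = set()
        for gram in ngrams(word):
            candidates |= self.gram_index.get(gram, set())
        scored = [(similarity(word, c), self.freqs[c], c)
                  for c in candidates if c != word]
        scored = [s for s in scored if s[0] >= self.accuracy]
        # Best score first; ties broken by higher term frequency.
        scored.sort(key=lambda s: (-s[0], -s[1], s[2]))
        return [c for _, _, c in scored[:count]]


checker = NgramSpellChecker({"diabetes": 120, "diabetic": 40, "diet": 75, "dialect": 3})
print(checker.suggest("diabtes"))
```

Here diet never becomes a candidate because it shares no 3-gram with diabtes, and dialect is a candidate but falls below the 0.5 threshold; the n-gram step keeps the expensive distance computation limited to a small candidate set.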

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Yao Ge yao...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, June 2, 2009 5:34:07 PM
 Subject: Re: spell checking
 
 
 Sorry for not being able to get my point across.
 
 I know the syntax that triggers an index build for spell checking. I actually
 ran the command and saw some additional files created in the data\spellchecker1
 directory. What I don't understand is what is in there, as I cannot get Solr
 to make spelling suggestions using the query structure documented in the
 wiki.
 
 Can anyone tell me what happens when the default spell check index is
 built? In my case, I used copyField to copy a couple of text fields into a
 field called spell. These fields contain the original text; they are the ones
 with typos that I need to run spell check on. But how can this original data
 be used as a basis for spell checking? How does Solr know which words are
 correctly spelled?
 
    <field name="tech_comment" type="text" indexed="true" stored="true" multiValued="true"/>
    <field name="cust_comment" type="text" indexed="true" stored="true" multiValued="true"/>
    ...
    <field name="spell" type="textSpell" indexed="true" stored="true" multiValued="true"/>
    ...
    <copyField source="tech_comment" dest="spell"/>
    <copyField source="cust_comment" dest="spell"/>
 
 
 



Re: spell checking

2009-06-02 Thread Yao Ge

Excellent. Now everything makes sense to me. :-)

The spell checking suggestion is the closest variant of the user input that
actually exists in the main index. So-called correction is relative to the
indexed text, so there is no need for a brute-force list of all correctly
spelled words. Maybe we should call this "alternative search terms" or
"suggested search terms" instead of spell checking. The current name is
misleading, as there is no right or wrong in spelling, only popular (term
frequency?) alternatives.

Thanks for the insight.


Otis Gospodnetic wrote:
 
 
 Hello,
 
 In short, the assumption behind this type of spell checker (SC) is that the
 text in the main index is (mostly) correctly spelled. When the SC finds query
 terms that are close in spelling to words indexed in the SC, it offers
 spelling suggestions/corrections using those presumably correctly spelled
 terms (there are other parameters that control the exact behaviour, but this
 is the idea).
 
 Solr (actually Lucene's spellchecker, which Solr uses under the hood) turns
 the input text (values from those fields you copy to the spell field) into
 so-called n-grams. You can see that if you open up the SC index with
 something like Luke. Please see
 http://wiki.apache.org/jakarta-lucene/SpellChecker .
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 

-- 
View this message in context: 
http://www.nabble.com/spell-checking-tp23835427p23844050.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: spell checking

2009-06-02 Thread Otis Gospodnetic

I'm glad my late-night explanation helped.
You may be right about there being a better name for this functionality.
Note that we do have support for a file-based (dictionary-like) spellchecker, too.
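For the file-based variant, the dictionary comes from a plain word list instead of the index. A minimal Python sketch, assuming a word list with one word per line (the list contents here are made up): it reads the list and ranks close matches with the standard library's difflib.get_close_matches. That function uses a SequenceMatcher similarity ratio, not any of the distance measures Lucene offers, so treat it as a stand-in for the idea rather than for Solr's exact behaviour.

```python
import difflib
import io

# Hypothetical word list in a one-word-per-line format.
WORD_LIST = io.StringIO("diabetes\ndiabetic\ndiet\ndialect\n")


def load_words(fh):
    """Read one word per line, skipping blank lines and lowercasing entries."""
    return [line.strip().lower() for line in fh if line.strip()]


def suggest(word, words, count=5, cutoff=0.6):
    """Return up to `count` dictionary words whose similarity ratio >= cutoff."""
    return difflib.get_close_matches(word.lower(), words, n=count, cutoff=cutoff)


words = load_words(WORD_LIST)
print(suggest("diabtes", words))
```

The trade-off is the reverse of the index-based checker: a curated list can be trusted to be correctly spelled, but it knows nothing about which terms actually occur in the index.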

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Yao Ge yao...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, June 2, 2009 9:42:48 PM
 Subject: Re: spell checking
 
 
 Excellent. Now everything makes sense to me. :-)
 
 The spell checking suggestion is the closest variant of the user input that
 actually exists in the main index. So-called correction is relative to the
 indexed text, so there is no need for a brute-force list of all correctly
 spelled words. Maybe we should call this "alternative search terms" or
 "suggested search terms" instead of spell checking. The current name is
 misleading, as there is no right or wrong in spelling, only popular (term
 frequency?) alternatives.
 
 Thanks for the insight.
 
 



Re: Spell checking not returning full terms

2009-02-05 Thread Grant Ingersoll


On Feb 4, 2009, at 7:54 PM, Rupert Fiasco wrote:

Awesome! After reading up on the links you sent me, I got it all
working. Thanks!


FYI - I did previously come across one of the links you sent over:

http://wiki.apache.org/solr/SpellCheckerRequestHandler

But what threw me off is that when I started reading about that
yesterday, the first paragraph says that this component is
deprecated and to use SpellCheckComponent, so at that point I stopped
reading and went over to the component page. If I had kept reading, I
would have encountered all of the gritty details that I in fact needed
to get it to work. The wiki entry makes it seem old, deprecated and no
longer relevant, but it certainly is relevant.


Hmmm, yeah, I see your point. Some people still use the
SpellCheckerReqHandler. I made it more explicit on each of the pages
by linking to a separate page: http://wiki.apache.org/solr/SpellCheckingAnalysis
Feel free to add/modify based on your experience!



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout mail/wiki/docs/ 
JIRA) using Solr/Lucene:

http://www.lucidimagination.com/search













Spell checking not returning full terms

2009-02-04 Thread Rupert Fiasco
We are using Solr 1.3 and trying to get spell checking functionality.

FYI, our index contains a lot of medical terms (which might or might
not make a difference as they are not English-y words, if that makes
any sense?)

If I specify a spellcheck query of spellcheck.q=diabtes

I get suggestions of:

<str>diabet</str>
<str>diabetogen</str>
<str>dilat</str>
<str>diamet</str>
<str>diatom</str>
<str>diastol</str>
<str>diactin</str>
<str>dialect</str>

If I misspell diabetes differently, as q=diabets, then I get no suggestions.

So, first off, two things:

1) Why would leaving out one e rather than the other affect the spelling
suggestions so substantially?
2) In the former list of suggestions, notice the first suggestion is
diabet, which isn't all that helpful; it should return something like
diabetes or maybe even diabetic.

Note that if I do a normal search against diabetes, I get a ton
of results; in other words, our index is full of occurrences of
diabetes.

My relevant solrconfig is:


<str name="queryAnalyzerFieldType">text</str>

<lst name="spellchecker">
  <str name="name">default</str>
  <str name="field">text_t</str>
  <str name="spellcheckIndexDir">./spellchecker1</str>
  <str name="accuracy">0.1</str>
</lst>
<lst name="spellchecker">
  <str name="name">jarowinkler</str>
  <str name="field">text_t</str>
  <!-- Use a different Distance Measure -->
  <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
  <str name="spellcheckIndexDir">./spellchecker2</str>
  <str name="accuracy">0.1</str>
</lst>

and I have

spellcheck.count = 8

Notice that I severely bumped down the accuracy setting to get more
results. Bumping it up yields fewer results. (I'm not sure what the
setting really means, so I don't know in which direction I want to change
that value; I am guessing that a lower value allows for more
misspellings, i.e. it's more promiscuous.)
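The direction of the accuracy setting can be checked with a small model. In Lucene's spell checker, accuracy acts as the minimum similarity score a suggestion must reach. The Python sketch below assumes an edit-distance score normalized to 0..1 (1 minus Levenshtein distance divided by the longer word's length, resembling Lucene's default measure; the JaroWinkler spellchecker scores differently), and the candidate terms are made up.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def score(a, b):
    """Normalized similarity in [0, 1]; 1.0 means identical."""
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def passing(word, candidates, accuracy):
    """Candidates whose score meets the accuracy threshold, best first."""
    scored = [(score(word, c), c) for c in candidates]
    kept = [t for t in scored if t[0] >= accuracy]
    return [c for _, c in sorted(kept, key=lambda t: (-t[0], t[1]))]


# Hypothetical terms that might sit in a spell checker index.
terms = ["diabetes", "diabetic", "diastol", "dialect", "diatom"]
for acc in (0.1, 0.5, 0.8):
    print(acc, passing("diabtes", terms, acc))
```

Lowering the threshold can only admit more candidates, so in this model a low value such as 0.1 is the permissive end of the scale.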

Our text and text_t fields are defined in schema.xml as:

<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
and
<dynamicField name="*_t" type="text" indexed="true" stored="true" multiValued="true"/>

Any help would be appreciated.

Thanks
-Rupert


Re: Spell checking not returning full terms

2009-02-04 Thread Grant Ingersoll
I'm guessing the field you are checking against is being stemmed. The
field you spell check against should have minimal analysis done to it,
i.e. tokenization and probably downcasing. See
http://wiki.apache.org/solr/SpellCheckComponent and
http://wiki.apache.org/solr/SpellCheckerRequestHandler for tips on how to
handle analysis for spelling.
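A tiny Python illustration of why a stemmed field yields suggestions such as diabet. The toy_stem function is a hypothetical suffix stripper invented for this example (it is not the Porter stemmer or any Solr filter); it only mimics the effect that stemming folds several surface forms into one truncated term. The minimal analysis, splitting on letters and lowercasing, is the kind described above.

```python
import re

SUFFIXES = ("es", "ic", "s")  # hypothetical toy rules, checked in this order


def minimal_analysis(text):
    """Tokenize on runs of letters and lowercase: what a spell field should do."""
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]


def toy_stem(token):
    """Strip the first matching suffix; a crude stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 3:
            return token[: -len(suffix)]
    return token


def stemmed_analysis(text):
    """Minimal analysis followed by the toy stemmer."""
    return [toy_stem(t) for t in minimal_analysis(text)]


text = "Diabetes and diabetic patients"
print(sorted(set(minimal_analysis(text))))
print(sorted(set(stemmed_analysis(text))))
```

A dictionary built from the stemmed terms can only ever offer diabet, which matches the symptom in the original question; a spell field whose type only tokenizes and lowercases keeps full words such as diabetes available as suggestions.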


On Feb 4, 2009, at 2:33 PM, Rupert Fiasco wrote:


We are using Solr 1.3 and trying to get spell checking functionality.

FYI, our index contains a lot of medical terms (which might or might
not make a difference as they are not English-y words, if that makes
any sense?)

If I specify a spellcheck query of spellcheck.q=diabtes

I get suggestions of:

<str>diabet</str>
<str>diabetogen</str>
<str>dilat</str>
<str>diamet</str>
<str>diatom</str>
<str>diastol</str>
<str>diactin</str>
<str>dialect</str>

If I misspell diabetes differently, as q=diabets, then I get no suggestions.

So, first off, two things:

1) Why would leaving out one e rather than the other affect the spelling
suggestions so substantially?
2) In the former list of suggestions, notice the first suggestion is
diabet, which isn't all that helpful; it should return something like
diabetes or maybe even diabetic.

Note that if I do a normal search against diabetes, I get a ton
of results; in other words, our index is full of occurrences of
diabetes.

My relevant solrconfig is:


   <str name="queryAnalyzerFieldType">text</str>

   <lst name="spellchecker">
     <str name="name">default</str>
     <str name="field">text_t</str>
     <str name="spellcheckIndexDir">./spellchecker1</str>
     <str name="accuracy">0.1</str>
   </lst>
   <lst name="spellchecker">
     <str name="name">jarowinkler</str>
     <str name="field">text_t</str>
     <!-- Use a different Distance Measure -->
     <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
     <str name="spellcheckIndexDir">./spellchecker2</str>
     <str name="accuracy">0.1</str>
   </lst>

and I have

spellcheck.count = 8

Notice that I severely bumped down the accuracy setting to get more
results. Bumping it up yields fewer results. (I'm not sure what the
setting really means, so I don't know in which direction I want to change
that value; I am guessing that a lower value allows for more
misspellings, i.e. it's more promiscuous.)

Our text and text_t fields are defined in schema.xml as:

<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
and
<dynamicField name="*_t" type="text" indexed="true" stored="true" multiValued="true"/>

Any help would be appreciated.

Thanks
-Rupert


--
Grant Ingersoll
http://www.lucidimagination.com/

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ













Re: Spell checking not returning full terms

2009-02-04 Thread Rupert Fiasco
Awesome! After reading up on the links you sent me, I got it all working. Thanks!

FYI - I did previously come across one of the links you sent over:

http://wiki.apache.org/solr/SpellCheckerRequestHandler

But what threw me off is that when I started reading about that
yesterday, the first paragraph says that this component is
deprecated and to use SpellCheckComponent, so at that point I stopped
reading and went over to the component page. If I had kept reading, I
would have encountered all of the gritty details that I in fact needed
to get it to work. The wiki entry makes it seem old, deprecated and no
longer relevant, but it certainly is relevant.

-Rupert

On Wed, Feb 4, 2009 at 11:57 AM, Grant Ingersoll gsing...@apache.org wrote:
 I'm guessing the field you are checking against is being stemmed.  The field
 you spell check against should have minimal analysis done to it, i.e.
 tokenization and probably downcasing.  See
 http://wiki.apache.org/solr/SpellCheckComponent and
 http://wiki.apache.org/solr/SpellCheckerRequestHandler for tips on how to
 handle analysis for spelling.


 --
 Grant Ingersoll
 http://www.lucidimagination.com/

 Lucene Helpful Hints:
 http://wiki.apache.org/lucene-java/BasicsOfPerformance
 http://wiki.apache.org/lucene-java/LuceneFAQ