Re: NPE during spell checking when result collapsing is activated and local parameters are used
I created https://issues.apache.org/jira/browse/SOLR-13944 (sorry, it took a while).

Thanks again!
Re: NPE during spell checking when result collapsing is activated and local parameters are used
Would you create a Jira issue anyway, to fix the fact that it throws an NPE instead of returning a bad request?
Re: NPE during spell checking when result collapsing is activated and local parameters are used
Indeed, you are right. Interestingly, it generally worked with the two {! ..} blocks in the filter query, apart from the problem with the collations, of course. Therefore I never questioned it...

Thank you!
Stefan
Re: NPE during spell checking when result collapsing is activated and local parameters are used
I believe your syntax is incorrect. I believe local params must all be included within the same {!...}, and "{!" can only be at the beginning.

Have you tried:

    fq={!collapse tag=collapser field=productId sort='merchantOrder asc, price asc, id asc'}
NPE during spell checking when result collapsing is activated and local parameters are used
Hi!

I have an issue with Solr 7.3.1 in the spell checking component:

    java.lang.NullPointerException
        at org.apache.solr.search.CollapsingQParserPlugin$OrdFieldValueCollector.finish(CollapsingQParserPlugin.java:1021)
        at org.apache.solr.search.CollapsingQParserPlugin$OrdFieldValueCollector.finish(CollapsingQParserPlugin.java:1081)
        at org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:230)
        at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1602)
        at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1419)
        at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:584)
        ...

I have found an issue that addresses a similar problem: https://issues.apache.org/jira/browse/SOLR-8807

The fix that was introduced with this issue seems to miss our situation, though. The relevant part of the query is this:

    fq={!tag=collapser}{!collapse field=productId sort='merchantOrder asc, price asc, id asc'}

When I remove the local parameter {!tag=collapser}, the collation works fine. Looking at the diff of the commit for the issue mentioned above, it seems that the "startsWith" could be the problem:

    +    // Collate testing does not support the Collapse QParser (See SOLR-8807)
    +    params.remove("expand");
    +    String[] filters = params.getParams(CommonParams.FQ);
    +    if (filters != null) {
    +      List<String> filtersToApply = new ArrayList<>(filters.length);
    +      for (String fq : filters) {
    +        if (!fq.startsWith("{!collapse")) {
    +          filtersToApply.add(fq);
    +        }
    +      }
    +      params.set("fq", filtersToApply.toArray(new String[filtersToApply.size()]));
    +    }

Can someone confirm this? I would open a bug ticket then, since the code is unchanged in the latest version.

Thanks,
Stefan
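To make the suspicion about "startsWith" concrete, here is a minimal standalone sketch that copies only the prefix test from the quoted diff and runs it against the two filter forms discussed in this thread. The class and method names are made up for illustration, and nothing here calls Solr itself; it only shows how the string comparison behaves.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; only the startsWith test is taken from the quoted diff.
final class CollapseFilterCheckDemo {

    // Keeps every filter that does NOT begin with "{!collapse",
    // which is what the loop in the quoted diff does.
    static List<String> dropCollapseFilters(String[] filters) {
        List<String> filtersToApply = new ArrayList<>(filters.length);
        for (String fq : filters) {
            if (!fq.startsWith("{!collapse")) {
                filtersToApply.add(fq);
            }
        }
        return filtersToApply;
    }

    public static void main(String[] args) {
        // Form from the original report: a separate {!tag=...} block comes first.
        String twoBlocks =
            "{!tag=collapser}{!collapse field=productId sort='merchantOrder asc, price asc, id asc'}";
        // Form suggested in the reply: all local params inside one block.
        String oneBlock =
            "{!collapse tag=collapser field=productId sort='merchantOrder asc, price asc, id asc'}";

        // The two-block filter survives the prefix test, so it would still be applied.
        System.out.println(dropCollapseFilters(new String[] {twoBlocks}).size());
        // The single-block filter is recognised and dropped.
        System.out.println(dropCollapseFilters(new String[] {oneBlock}).size());
    }
}
```

Under these assumptions the prefix test only recognises the collapse filter when "{!collapse" is the very first thing in the fq value, which matches the observation that removing {!tag=collapser} made the collation work.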
Re: spell checking on query
Hi Midas,

You can use Solr's spellcheck component: https://cwiki.apache.org/confluence/display/solr/Spell+Checking

Emir
spell checking on query
How can we do query-time spell checking with the help of Solr?
Spell checking: What is left to the programmer?
Greetings!

My Java app, using SolrJ, now successfully does searches. I've used the web interface to do a full-text indexing, and for each new entry added through my app, I have it add to this index.

But now I want to use SolrJ to also do spell checking. I have read several documents on this and examined a couple of Java examples, but one main question still persists.

Let's first assume that I have my configuration XML files set up correctly and I can spell-check a word through the web interface using something like .../spell?q=missspelled&spellcheck=on. Assume also that the end user has typed in a paragraph and is about to submit the text. In the current implementation of my software, using the SolrJ API, the text will get parsed into words and the words will be added to the search index.

For spell-checking, however, I am puzzled. Is it up to me, the programmer, to parse the text into individual words and determine which words are misspelled, then run the query on a misspelled word to get a list of suggestions for that misspelled word? Or does Solr itself parse the text string into words and run a query on every word, thus indicating which words are misspelled by the non-zero list of suggestions? Or is there a third option I haven't thought of (like spell-check as I type)?

I'm just trying to picture the behavior in my head so I know what programming approach to take.

Thanks for the help!
Mark
Re: Spell checking the synonym list?
Thanks both!

James, I like that approach. I'll give it a try. I forgot to mention that I was only using query-time synonyms, but it shouldn't be a problem in my case to add synonyms at index time.

Ryan
RE: Spell checking the synonym list?
Ryan,

If you use index-time synonyms on the spellcheck field, this will give you what you want. For instance, if the document has "lawyer" and you index both terms, "lawyer" and "attorney", then the spellchecker will see that "atorney" is 1 edit away from an indexed term and will suggest "attorney".

You'll need to have the same synonyms set up against the query field, but you have the option of making these query-time synonyms if you prefer.

James Dyer
Ingram Content Group
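The "1 edit away" remark can be checked with a plain Levenshtein distance. The sketch below is only an illustration of that arithmetic: the class name is made up, and it is not claimed to be the distance measure Solr's spellcheckers actually use internally.

```java
// Hypothetical demo class: classic two-row Levenshtein edit distance.
final class EditDistanceDemo {

    // Returns the minimum number of single-character insertions,
    // deletions, or substitutions needed to turn a into b.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // One inserted "t" turns the misspelling into the indexed synonym.
        System.out.println(levenshtein("atorney", "attorney"));
    }
}
```

The point of indexing the synonym is that "attorney" then exists as a term in the spellcheck field at all; without it, the only indexed term would be "lawyer", which is far more than one or two edits from "atorney".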
Spell checking the synonym list?
Hi all, I'm wondering if it's possible to have spell checking performed on terms in the synonym list? For example, let's say I have documents with the word lawyer in them and I add lawyer, attorney in the synonyms.txt file. Then a query is made for the word atorney. Is there any way to provide spell checking on this? Thanks, Ryan
RE: Spell checking the synonym list?
One of the uses of synonyms is to replace a misspelled query term with a correctly spelled value. The two-sided synonym file format allows you to control which values survive into the actual query:

    lawyer, attorney, ambulance chaser, atorney, lowyor => lawyer, attorney

I am not aware, however, of any integration between synonym processing and a spellcheck dictionary. It makes sense, though. But I think additional metadata would be required, per dictionary entry, to govern synonym processing. Thus, building the dictionary would not be a transparent/automatic process.

https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-SynonymFilter
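As a rough illustration of how a two-sided rule (written with "=>" in synonyms.txt) controls which values survive, the sketch below parses one such line into a lookup table. This is a simplified stand-in written for this thread, not Solr's own parser: the class and method names are hypothetical, and escaping, comments, and multi-line files are ignored.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class: maps every left-hand term to the right-hand terms.
final class ExplicitSynonymRuleDemo {

    // Parses "a, b, c => x, y" into {a=[x, y], b=[x, y], c=[x, y]}.
    static Map<String, List<String>> parseRule(String rule) {
        String[] sides = rule.split("=>");
        if (sides.length != 2) {
            throw new IllegalArgumentException("expected exactly one '=>' in: " + rule);
        }
        List<String> targets = splitTerms(sides[1]);
        Map<String, List<String>> mapping = new LinkedHashMap<>();
        for (String source : splitTerms(sides[0])) {
            mapping.put(source, targets);
        }
        return mapping;
    }

    // Splits a comma-separated side into trimmed, non-empty terms.
    static List<String> splitTerms(String side) {
        List<String> terms = new ArrayList<>();
        for (String raw : side.split(",")) {
            String term = raw.trim();
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        Map<String, List<String>> mapping = parseRule(
            "lawyer, attorney, ambulance chaser, atorney, lowyor => lawyer, attorney");
        // The misspelling is replaced by the two correctly spelled values.
        System.out.println(mapping.get("atorney"));
    }
}
```

Note that this approach only covers misspellings someone has listed by hand, which is the limitation the message above points out.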
How make Searching fast in spell checking
Hello all,

I have 49 GB of indexed data and I am working on spell checking. I have applied ShingleFilter on both the index and the query side, and I take 25 suggestions for each word in the query. I am not using collations.

When I search for a phrase of 5-6 words (e.g. "barack obama is president of America"), it takes 2 to 3 seconds to process, while searching for a single term (e.g. "barack") takes only 0.23 seconds, which is good.

Why does phrase checking take so long? Am I doing something wrong? Any help on this?
Odd extra character duplicates in spell checking
Hi,

I am going to make this question pretty short, so I don't overwhelm with technical details until the end. I suspect that some folks may be seeing this issue without the particular configuration we are using.

What our problem is:

1. Correctly spelled words are returned as not spelled correctly, and the suggestions are the original, correctly spelled word with a single oddball character appended.
2. Incorrectly spelled words return correct spelling suggestions with a single oddball character appended, as multiple suggestions.
3. We're seeing this in Solr 4.5x and 4.7x.

Example: the returned values each end in a single extra character (Unicode code point shown in square brackets).

    correction=attitude[2d]
    correction=attitude[2f]
    correction=attitude[2026]

Spurious characters:

* Unicode Character 'HYPHEN-MINUS' (U+002D)
* Unicode Character 'SOLIDUS' (U+002F)
* Unicode Character 'HORIZONTAL ELLIPSIS' (U+2026)

Anybody see anything like this? Anybody fix something like this?

Thanks!
Ed

OK, here are the gory details:

What we are doing: We have developed an application that returns "did you mean" spelling alternatives against a specific (presumably misspelled) word. We're using the vocabulary of indexed pages of a specified book as the source of the alternatives, so this is not a general dictionary spell check; we are returning only matching alternatives. So when I say "correctly spelled" I mean they are words found on at least one page.

We are using the collations, so that we restrict ourselves to those pages in one book. We are having to check for and "fix up" these faulty results. That's not a robust or desirable solution.

We are using SolrJ to get the collations:

    private static final String DID_YOU_MEAN_REQUEST_HANDLER = "/spell";
    ....
    SolrQuery query = new SolrQuery(q);
    query.set("spellcheck", true);
    query.set(SpellingParams.SPELLCHECK_COUNT, 10);
    query.set(SpellingParams.SPELLCHECK_COLLATE, true);
    query.set(SpellingParams.SPELLCHECK_COLLATE_EXTENDED_RESULTS, true);
    query.set("wt", "json");
    query.setRequestHandler(DID_YOU_MEAN_REQUEST_HANDLER);
    query.set("shards.qt", DID_YOU_MEAN_REQUEST_HANDLER);
    query.set("shards.tolerant", true);
    etc...

but we can duplicate the behavior without SolrJ, with the collations/misspellingsAndCorrections below, e.g.:

    solr/pg1/spell?q=+doc-id:(810500)+AND+attitudex&spellcheck=true&spellcheck.count=10&spellcheck.collate=true&spellcheck.collateExtendedResults=true&wt=json&qt=%2Fspell&shards.qt=%2Fspell&shards.tolerant=true

    {"responseHeader":{"status":0,"QTime":60},
     "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]},
     "spellcheck":{"suggestions":[
       "attitudex",{"numFound":6,"startOffset":21,"endOffset":30,"origFreq":0,
         "suggestion":[{"word":"attitudes","freq":362486},
                       {"word":"attitu dex","freq":4819},
                       {"word":"atti tudex","freq":3254},
                       {"word":"attit udex","freq":159},
                       {"word":"attitude-","freq":1080},
                       {"word":"attituden","freq":261}]},
       "correctlySpelled",false,
       "collation",["collationQuery"," doc-id:(810500) AND attitude-","hits",2,"misspellingsAndCorrections",["attitudex","attitude-"]],
       "collation",["collationQuery"," doc-id:(810500) AND attitude/","hits",2,"misspellingsAndCorrections",["attitudex","attitude/"]],
       "collation",["collationQuery"," doc-id:(810500) AND attitude…","hits",2,"misspellingsAndCorrections",["attitudex","attitude…"]]]}}

The configuration is:

    <requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
      <lst name="defaults">
        <str name="df">text</str>
        <str name="spellcheck.dictionary">default</str>
        <str name="spellcheck.dictionary">wordbreak</str>
        <str name="spellcheck">on</str>
        <str name="spellcheck.extendedResults">true</str>
        <str name="spellcheck.count">10</str>
        <str name="spellcheck.alternativeTermCount">5</str>
        <str name="spellcheck.maxResultsForSuggest">5</str>
        <str name="spellcheck.collate">true</str>
        <str name="spellcheck.collateExtendedResults">true</str>
        <str name="spellcheck.maxCollationTries">10</str>
        <str name="spellcheck.maxCollations">5</str>
      </lst>
      <arr name="last-components">
        <str>spellcheck</str>
      </arr>
    </requestHandler>

    <lst name="spellchecker">
      <str name="name">wordbreak</str>
      <str name="classname">solr.WordBreakSolrSpellChecker</str>
      <str name="field">text</str>
      <str name="combineWords">true</str>
      <str name="breakWords">true</str>
      <int name="maxChanges">25</int>
      <int name="minBreakLength">3</int>
    </lst>
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">text</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <str name="distanceMeasure">internal</str>
      <float name="accuracy">0.2</float>
      <int name="maxEdits">2</int>
      <int name="minPrefix">1</int>
      <int name="maxInspections">25</int>
      <int name="minQueryLength">4</int>
      <float name="maxQueryFrequency">1</float>
    </lst>

--
Ed Smiley, Senior Software Architect, eBooks
ProQuest
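The message mentions checking for and "fixing up" the faulty results but does not show how. A minimal sketch of what such a client-side workaround might look like is given below; the class and method names are hypothetical, and the set of characters is just the three reported above. It treats the symptom only and would also strip a trailing hyphen or slash that legitimately belongs to a term.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class: client-side cleanup of returned corrections.
final class SuggestionCleanupDemo {

    // True for the three spurious characters reported in this thread:
    // HYPHEN-MINUS, SOLIDUS, and HORIZONTAL ELLIPSIS.
    static boolean isSpurious(char c) {
        return c == '-' || c == '/' || c == '\u2026';
    }

    // Removes any run of spurious characters from the end of a correction.
    static String stripSpuriousSuffix(String correction) {
        int end = correction.length();
        while (end > 0 && isSpurious(correction.charAt(end - 1))) {
            end--;
        }
        return correction.substring(0, end);
    }

    // Strips each correction and drops the duplicates that result,
    // keeping the first-seen order.
    static List<String> cleanCorrections(List<String> corrections) {
        Set<String> unique = new LinkedHashSet<>();
        for (String correction : corrections) {
            unique.add(stripSpuriousSuffix(correction));
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        List<String> raw = new ArrayList<>();
        raw.add("attitude-");
        raw.add("attitude/");
        raw.add("attitude\u2026");
        // All three collapse to the same cleaned term.
        System.out.println(cleanCorrections(raw));
    }
}
```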
Search using the result returned from the spell checking component
Hi,

I've successfully configured the spell check component and it works well. I couldn't find an answer to my question, so any help would be much appreciated:

Can I send a single request to Solr, and make it so that if any part of the query was misspelled, the search would be performed using the first spell suggestion that is returned? I want to make only one request, i.e. submit a query only once, if that is possible.

For example: if a user searched for "jaca", then the search would be performed only once, for "java".

Thanks in advance for any answer or a link to a relevant resource (I couldn't find any).
RE: Search using the result returned from the spell checking component
What you want isn't supported. You always will need to issue that second request. This would be a nice feature to add though.

James Dyer
E-Commerce Systems
Ingram Content Group
RE: Search using the result returned from the spell checking component
Thank you.

I was wondering: what if I make a first request and ask it to return only 1 result? Will it still return the spell suggestions while avoiding the overhead of returning all relevant results? Then I could make a second request to get all the results I need.

Would that work?
Re: Search using the result returned from the spell checking component
You can even request zero rows. That will still return the number of matches.

--wunder
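The two-request flow discussed here can be sketched as follows: a cheap probe with rows=0 that asks for a collation, then the real search with whichever query the client decides to use. The base URL, class name, and helper names are assumptions made for this example; only the parameter names (q, rows, spellcheck, spellcheck.collate) are standard Solr request parameters, and the sketch builds URLs without sending anything.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class: builds the two request URLs, no HTTP involved.
final class TwoStepSpellcheckUrls {

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // First request: no documents returned, but spellcheck output requested.
    static String probeUrl(String handlerUrl, String userQuery) {
        return handlerUrl + "?q=" + encode(userQuery)
            + "&rows=0&spellcheck=true&spellcheck.collate=true";
    }

    // Picks the collation if the probe returned one, else the user's query.
    static String chooseQuery(String userQuery, String collation) {
        return (collation == null || collation.isEmpty()) ? userQuery : collation;
    }

    // Second request: the real search with the chosen query.
    static String searchUrl(String handlerUrl, String query, int rows) {
        return handlerUrl + "?q=" + encode(query) + "&rows=" + rows;
    }

    public static void main(String[] args) {
        String handler = "http://localhost:8983/solr/core1/select"; // assumed URL
        System.out.println(probeUrl(handler, "jaca"));
        System.out.println(searchUrl(handler, chooseQuery("jaca", "java"), 100));
    }
}
```

Parsing the collation out of the probe response is left out on purpose, since it depends on the response format and client library in use.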
Re: Search using the result returned from the spell checking component
And performance-wise: is asking for 0 rows the same as asking for 100 rows?
spell checking and filtering in the same query
Background: I have a Solr index containing foodtypes, chefs, and courses. This is an initial setup to test my configuration.

Here is the problem I'm trying to solve: when I query for a misspelt foodtype 'x' and filter by chef 'c', I should get a suggested list of foodtypes prepared by chef 'c'.

I've managed to set up a spellcheck component so I can make the following query:

    /suggest?q=ban&spellcheck.dictionary=foodtypes

This gets me the results 'banana bread' and 'banoffee pie'.

How can I modify this query and the Solr configuration to allow me to filter by another field? I'm aware that the fq parameter does not work with the SpellCheck component. Is there any way of passing the results of the first query to a filter query?

I've seen various posts on this topic, but no solutions. The best suggestion was to make the client make a second request, which is something I do not want to do. Is it possible to write a SearchComponent or SearchHandler that chains results?

Thanks for any help.
Mark
RE: spell checking and filtering in the same query
Mark, I'm not as familiar with the Suggester, but with normal spellcheck if you set spellcheck.maxCollationTries to something greater than 0 it will check the collations with the index. This checking includes any fq params you had. So in this sense the SpellCheckComponent does work with fq. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Mark Swinson [mailto:mark.swin...@bbc.co.uk] Sent: Thursday, February 09, 2012 7:38 AM To: solr-user@lucene.apache.org Subject: spell checking and filtering in the same query Background: I have an solr index containing foodtypes, chefs, and courses. This is an initial setup to test my configuration. Here is the problem I'm trying to solve : -When I query for a mispelt foodtype 'x' and filter by chef 'c' I should get a suggested list of foodtypes prepared by chef 'c' ok: I've managed to set up a spellcheck component so I can make the following query: /suggest?q=banspellcheck.dictionary=foodtypes This gets me the results 'banana bread' 'banoffee pie' How can I modify this query and the solr configuration to allow me to filter by another field? I'm aware the the fq parameter does not work with the SpellCheck component. Is there anyway of passing the results of the first query to a filter query? I've seen various posts on this topic, but no solutions. The best suggestion was to make the client make a second request, which is something I do not want to do. Is it possible to write a SearchComponent or SearchHandler that chains results? Thanks for any help. Mark http://www.bbc.co.uk/ This e-mail (and any attachments) is confidential and may contain personal views which are not the views of the BBC unless specifically stated. If you have received it in error, please delete it from your system. Do not use, copy or disclose the information in any way nor act in reliance on it and notify the sender immediately. Please note that the BBC monitors e-mails sent or received. 
Further communication will signify your consent to this.
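James's point about spellcheck.maxCollationTries can be sketched as a request that sends the filter and the collation parameters together. This is only an illustration: the host, port, handler path, the `chef` field name, and the value 5 are assumptions, not details taken from Mark's setup.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace with your own host, core and handler.
SOLR_HANDLER = "http://localhost:8983/solr/select"


def build_spellcheck_url(user_query, chef):
    """Build a query whose collations are tested against the index.

    With spellcheck.maxCollationTries > 0, Solr re-runs candidate
    collations, and (per James's reply) that check includes the fq.
    """
    params = [
        ("q", user_query),
        ("fq", f'chef:"{chef}"'),  # hypothetical field name
        ("spellcheck", "true"),
        ("spellcheck.collate", "true"),
        ("spellcheck.maxCollationTries", "5"),  # any value > 0 enables the check
    ]
    return SOLR_HANDLER + "?" + urlencode(params)


url = build_spellcheck_url("bannana bred", "c")
print(url)
```

Only collations that return hits under the filter should come back, which may cover the "foodtypes prepared by chef c" requirement without a second client request.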
RE: Spell Checking a multi word phrase
Camden,

You may also want to be aware that there is a new feature added to Spell Check's collate functionality that will guarantee the collations will return hits. It is also able to return more than one collation and tell you how many hits each one would result in if re-queried. This might do the same thing you're trying to do using shingles, but with more accuracy and less work.

For info, look at spellcheck.collate, spellcheck.maxCollations, spellcheck.maxCollationTries, and spellcheck.collateExtendedResults on the component's wiki page: http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate

This feature is committed to 3.x and 4.x and is available as a patch for 1.4.1 (here: https://issues.apache.org/jira/browse/SOLR-2010).

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Camden Daily [mailto:cam...@jaunter.com]
Sent: Monday, January 17, 2011 1:01 PM
To: solr-user@lucene.apache.org
Subject: Spell Checking a multi word phrase

Hello all,

I'm pretty new to Solr, and trying to set up a spell checker that can handle entire phrases. My goal would be to have something that could offer a suggestion of "united states" for a query of "untied stats".

I have a very large index, and I've worked a bit with creating shingles for the spelling index. The problem I'm running into now is that the SpellCheckComponent is always tokenizing the query that I pass to it. For example, a query like this:

http://localhost:8080/solr/spell?q=untied\stats&spellcheck=true&debugQuery=on

The debug information shows me that the parsed query is:

PhraseQuery(text:"untied stats")

But I receive the spelling suggestions for "untied" and "stats" separately. From what I understand, this is not a case where I would want to collate; I simply want the entire phrase treated as one token.

I found the following post after much searching that suggests setting up a custom QueryConverter:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200810.mbox/%3c1224516331.3820.119.ca...@localhost.localdomain.tld%3E

Does anyone know if that would be required? I had hoped to avoid Java code entirely with Solr (I haven't used Java in a very long time), but if I do need to set up the 'MultiWordSpellingQueryConvert' class, would anyone be able to give me some tips on exactly how I would add that functionality to Solr?

Relevant configs below:

solrconfig.xml:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spellShingle</str>
    <str name="spellcheckIndexDir">./spellShingle</str>
    <str name="queryAnalyzerFieldType">textSpellShingle</str>
    <str name="buildOnOptimize">true</str>
  </lst>
</searchComponent>

schema.xml:

<fieldType name="textSpellShingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

(I had thought setting the KeywordTokenizer for the query analyzer would keep it from being tokenized, but it doesn't seem to make any difference)

-Camden Daily
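To make the index-side analysis in the schema above more concrete, here is a rough sketch of what a shingle filter with maxShingleSize=2 and outputUnigrams=true produces. It is a simplified stand-in written for illustration, not Lucene's implementation; the function name and the sample text are assumptions.

```python
def shingles(tokens, max_size=2, output_unigrams=True):
    """Return word n-grams ("shingles") up to max_size words long.

    Simplified model of a shingle filter: for each position, emit the
    single token (if requested) followed by the longer shingles that
    start at that position.
    """
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])
        for size in range(2, max_size + 1):
            if i + size <= len(tokens):
                out.append(" ".join(tokens[i:i + size]))
    return out


indexed = shingles("united states of america".split())
print(indexed)
```

With terms such as "united states" in the spelling index, a whole-phrase suggestion is only possible if the query side also arrives as a single token, which is the part Camden reports is not happening.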
Re: Spell Checking a multi word phrase
James,

Thank you, but I'm not sure that will work for my needs. I'm very interested in contextual spell checking. Take for example the author "stephenie meyer": "stephenie" is a far less popular spelling than "stephanie", but in this context it's the correct option. I feel like shingles with an untokenized query string would be able to catch this, but I can't find many examples of people attempting this.
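The contextual idea Camden describes can be illustrated with a toy example that matches the whole query string against a list of known phrases. This uses Python's difflib as a stand-in and is not how Solr's spell checker scores candidates; the phrase list is a made-up substitute for a shingled spelling index.

```python
import difflib

# Hypothetical phrase dictionary standing in for the shingled spelling index.
PHRASES = ["stephenie meyer", "stephanie plum", "united states"]


def suggest_phrase(query, phrases=PHRASES):
    """Return the known phrase closest to the whole query, or None."""
    matches = difflib.get_close_matches(query.lower(), phrases, n=1)
    return matches[0] if matches else None


print(suggest_phrase("stephanie meyer"))
print(suggest_phrase("untied stats"))
```

Because the comparison is made on the full phrase, the rarer spelling "stephenie" can win when it appears next to "meyer", which a word-by-word check based on term popularity would likely miss.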
RE: Spell Checking a multi word phrase
Camden,

Have you seen Smiley & Pugh's Solr book? They describe something very similar to what you're trying to do on p. 180ff. The difference seems to be that they use a field that only has a couple of terms, so they don't bother with shingles. The book makes a big point about using spellcheck.q in this case in order to get the analysis right. I'm not sure if this is the solution, but I thought I'd mention it. I never tried spell checking this way because it seemed very limited and possibly quite expensive.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311
Re: Spell Checking a multi word phrase
James,

Thanks, the spellcheck.q was exactly what I needed to be using!

-Camden
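For readers landing on this thread, the fix Camden settled on can be sketched as a request that passes the raw phrase in spellcheck.q alongside the normal q. The handler URL is the one from Camden's message; the helper name and the quoting of q are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Handler URL as given earlier in the thread.
SPELL_HANDLER = "http://localhost:8080/solr/spell"


def build_phrase_spellcheck_url(phrase):
    """Send the phrase both as the search query and as spellcheck.q.

    The assumption here is that spellcheck.q is analyzed with the
    spellchecker's own query analyzer (KeywordTokenizer in Camden's
    schema), so the phrase is checked as a single token.
    """
    params = [
        ("q", f'"{phrase}"'),
        ("spellcheck", "true"),
        ("spellcheck.q", phrase),
        ("debugQuery", "on"),
    ]
    return SPELL_HANDLER + "?" + urlencode(params)


url = build_phrase_spellcheck_url("untied stats")
print(url)
```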
Re: Spell checking question from a Solr novice
On Mon, Oct 18, 2010 at 5:24 PM, Jason Blackerby jblacke...@gmail.com wrote:
> If you know the misspellings you could prevent them from being added to the
> dictionary with a StopFilterFactory like so:

Or, you know, correct the data :-)

--
Bill Dueber
Library Systems Programmer
University of Michigan Library
Spell checking question from a Solr novice
Hi, I am looking for a quick solution to improve a search engine's spell checking performance. I was wondering if anyone tried to integrate Google SpellCheck API with Solr search engine (if possible). Google spellcheck came to my mind because of two reasons. First, it is costly to clean up the data to be used as spell check baseline. Secondly, google probably has the most complete set of misspelled search terms. That's why I would like to know if it is a feasible way to go. Thanks, Xin This electronic mail message contains information that (a) is or may be CONFIDENTIAL, PROPRIETARY IN NATURE, OR OTHERWISE PROTECTED BY LAW FROM DISCLOSURE, and (b) is intended only for the use of the addressee(s) named herein. If you are not an intended recipient, please contact the sender immediately and take the steps necessary to delete the message completely from your computer system. Not Intended as a Substitute for a Writing: Notwithstanding the Uniform Electronic Transaction Act or any other law of similar effect, absent an express statement to the contrary, this e-mail message, its contents, and any attachments hereto are not intended to represent an offer or acceptance to enter into a contract and are not otherwise intended to bind this sender, barnesandnoble.com llc, barnesandnoble.com inc. or any other person or entity.
RE: Spell checking question from a Solr novice
Oops, never mind. Just read Google API policy. 1000 queries per day limit for non-commercial use only.
Re: Spell checking question from a Solr novice
In general, the benefit of the built-in Solr spellcheck is that it can use a dictionary based on your actual index. If you want to use some external API, you certainly can, in your actual client app -- but it doesn't really need to involve Solr at all anymore, does it? Is there any benefit I'm not thinking of to doing that on the solr side, instead of just in your client app?

I think Yahoo (and maybe Microsoft?) have similar APIs with more generous ToSs, but I haven't looked in a while.
Re: Spell checking question from a Solr novice
I think a spellchecker based on your index has clear advantages. You can spellcheck words specific to your domain which may not be available in an outside dictionary. You can always dump the list from WordNet to get a starter English dictionary.

But then it also means that misspelled words from your domain become the suggested correct word. Hmmm ... you'll need to have a way to prune out such words. Even then, your own domain-based dictionary is a total go.
Re: Spell checking question from a Solr novice
If you know the misspellings you could prevent them from being added to the dictionary with a StopFilterFactory like so:

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="misspelled_words.txt"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/>
    <filter class="solr.LengthFilterFactory" min="2" max="50"/>
  </analyzer>
</fieldType>

where misspelled_words.txt contains the misspellings.
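The effect of Jason's analyzer chain on the spelling dictionary can be approximated outside Solr to see which terms would survive. The sketch below is a simplified model of the steps he lists (tokenize, lowercase, drop listed misspellings, strip non a-z characters, length filter); the regex tokenizer is a crude substitute for StandardTokenizer, and the sample text is invented.

```python
import re


def spelling_terms(text, misspellings, min_len=2, max_len=50):
    """Approximate the terms that would reach the spelling dictionary."""
    stop = {w.lower() for w in misspellings}   # contents of misspelled_words.txt
    terms = []
    for token in re.findall(r"\w+", text.lower()):   # tokenize + lowercase
        if token in stop:                            # stop filter step
            continue
        token = re.sub(r"[^a-z]", "", token)         # pattern replace step
        if min_len <= len(token) <= max_len:         # length filter step
            terms.append(token)
    return terms


terms = spelling_terms("Banana bannana Bread 2010 a", {"bannana"})
print(terms)
```

Anything listed in the misspellings file never becomes a suggestion, at the cost of maintaining that file by hand.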
Re: Spell checking question from a Solr novice
You can cross the new words against a dictionary and keep them in the file as Jason described... What Pradeep said is true, is always better to have suggestions related to your index that have suggestions with no results... On Mon, Oct 18, 2010 at 6:24 PM, Jason Blackerby jblacke...@gmail.comwrote: If you know the misspellings you could prevent them from being added to the dictionary with a StopFilterFactory like so: fieldType name=textSpell class=solr.TextField positionIncrementGap=100 analyzer tokenizer class=solr.StandardTokenizerFactory/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=misspelled_words.txt/ filter class=solr.PatternReplaceFilterFactory pattern=([^a-z]) replacement= replace=all/ filter class=solr.LengthFilterFactory min=2 max=50/ /analyzer /fieldType where misspelled_words.txt contains the misspellings. On Mon, Oct 18, 2010 at 5:14 PM, Pradeep Singh pksing...@gmail.com wrote: I think a spellchecker based on your index has clear advantages. You can spellcheck words specific to your domain which may not be available in an outside dictionary. You can always dump the list from wordnet to get a starter english dictionary. But then it also means that misspelled words from your domain become the suggested correct word. Hmmm ... you'll need to have a way to prune out such words. Even then, your own domain based dictionary is a total go. On Mon, Oct 18, 2010 at 1:55 PM, Jonathan Rochkind rochk...@jhu.edu wrote: In general, the benefit of the built-in Solr spellcheck is that it can use a dictionary based on your actual index. If you want to use some external API, you certainly can, in your actual client app -- but it doesn't really need to involve Solr at all anymore, does it? Is there any benefit I'm not thinking of to doing that on the solr side, instead of just in your client app? I think Yahoo (and maybe Microsoft?) have similar APIs with more generous ToSs, but I haven't looked in a while. 
Xin Li wrote: Oops, never mind. Just read Google API policy. 1000 queries per day limit for non-commercial use only. -Original Message- From: Xin Li Sent: Monday, October 18, 2010 3:43 PM To: solr-user@lucene.apache.org Subject: Spell checking question from a Solr novice Hi, I am looking for a quick solution to improve a search engine's spell checking performance. I was wondering if anyone tried to integrate Google SpellCheck API with Solr search engine (if possible). Google spellcheck came to my mind because of two reasons. First, it is costly to clean up the data to be used as spell check baseline. Secondly, google probably has the most complete set of misspelled search terms. That's why I would like to know if it is a feasible way to go. Thanks, Xin This electronic mail message contains information that (a) is or may be CONFIDENTIAL, PROPRIETARY IN NATURE, OR OTHERWISE PROTECTED BY LAW FROM DISCLOSURE, and (b) is intended only for the use of the addressee(s) named herein. If you are not an intended recipient, please contact the sender immediately and take the steps necessary to delete the message completely from your computer system. Not Intended as a Substitute for a Writing: Notwithstanding the Uniform Electronic Transaction Act or any other law of similar effect, absent an express statement to the contrary, this e-mail message, its contents, and any attachments hereto are not intended to represent an offer or acceptance to enter into a contract and are not otherwise intended to bind this sender, barnesandnoble.com llc, barnesandnoble.com inc. or any other person or entity.
-- __ Ezequiel. Http://www.ironicnet.com
Re: Spell checking question from a Solr novice
The first question to ask is will it work for you. The SECOND question is do you want google to know what's in your data? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036' EARTH has a Right To Life, otherwise we all die. --- On Mon, 10/18/10, Xin Li x...@book.com wrote: From: Xin Li x...@book.com Subject: Spell checking question from a Solr novice To: solr-user@lucene.apache.org Date: Monday, October 18, 2010, 12:43 PM Hi, I am looking for a quick solution to improve a search engine's spell checking performance. I was wondering if anyone tried to integrate Google SpellCheck API with Solr search engine (if possible). Google spellcheck came to my mind because of two reasons. First, it is costly to clean up the data to be used as spell check baseline. Secondly, google probably has the most complete set of misspelled search terms. That's why I would like to know if it is a feasible way to go. Thanks, Xin
Spell checking and keyword tokenizer
Hi,

I'm trying to spell check a whole field using a lowercasing keyword tokenizer [1]. For example, if I query for furntree gully I'm hoping to get back ferntree gully as a suggestion. Unfortunately the spell checker seems to be recognizing this as two tokens and returning suggestions for both. Query [2] and result [3] below. In this case ferntree actually does end up with ferntree gully as a suggestion; however, it also gives bulla as a suggestion for gully (go figure :-) ). Any suggestions?

Regards,
Glen

[1] -
<fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

[2] - Query
q=locality_lc%3A%22furntree+gully%22&spellcheck=true&spellcheck.build=true&spellcheck.reload=true&spellcheck.accuracy=0.5&spellcheck.dictionary=locality_spellchecker&spellcheck.collate=true&fl=street_name%2Clocality%2Cstate

[3] -
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">379</int>
    <lst name="params">
      <str name="spellcheck">true</str>
      <str name="fl">street_name,locality,state</str>
      <str name="spellcheck.accuracy">0.5</str>
      <str name="q">locality_lc:&quot;furntree gully&quot;</str>
      <str name="spellcheck.dictionary">locality_spellchecker</str>
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.reload">true</str>
      <str name="spellcheck.build">true</str>
    </lst>
  </lst>
  <str name="command">build</str>
  <result name="response" numFound="0" start="0"/>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="furntree">
        <int name="numFound">1</int>
        <int name="startOffset">13</int>
        <int name="endOffset">21</int>
        <arr name="suggestion">
          <str>ferntree gully</str>
        </arr>
      </lst>
      <lst name="gully">
        <int name="numFound">1</int>
        <int name="startOffset">22</int>
        <int name="endOffset">27</int>
        <arr name="suggestion">
          <str>bulla</str>
        </arr>
      </lst>
      <str name="collation">locality_lc:&quot;ferntree gully bulla&quot;</str>
    </lst>
  </lst>
</response>
Re: Spell checking and keyword tokenizer
Nevermind this one... With a bit more research I discovered I can use spellcheck.q to provide the correct suggestion.
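A rough sketch of the spellcheck.q approach mentioned in this reply: the fielded query stays in q, while the raw phrase to be checked is passed separately in spellcheck.q, so the spell checker is handed the whole phrase rather than the parsed q string. The parameter names other than spellcheck.q are the ones from the original query; the host, port, and handler path are placeholders and not taken from the thread.

```python
from urllib.parse import urlencode

# Placeholder endpoint: host, port and handler path are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_spellcheck_url(field, phrase, dictionary):
    """Build a select URL that passes the whole phrase via spellcheck.q."""
    params = [
        ("q", f'{field}:"{phrase}"'),        # the actual search query
        ("spellcheck", "true"),
        ("spellcheck.q", phrase),            # raw text handed to the spell checker
        ("spellcheck.dictionary", dictionary),
        ("spellcheck.collate", "true"),
    ]
    return SOLR_SELECT + "?" + urlencode(params)


url = build_spellcheck_url("locality_lc", "furntree gully", "locality_spellchecker")
print(url)
```

Whether the phrase is then treated as a single token still depends on the analyzer configured for the spell checker, which is the keyword tokenizer field type in the original message.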
spell checking problem
Hi all,

I need some help with spellchecking. I configured my solrconfig and schema by looking through the user mailing list, and here is the configuration I made.

My schema.xml:

<fieldType name="spellText" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<field name="spell" type="spellText" indexed="true" stored="true" multiValued="true"/>
<copyField source="*" dest="spell"/>

My solrconfig.xml:

<requestHandler name="spellchecker" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">spellText</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">name</str>
    <!-- the default field in solrconfig if i change to spell field then the dictionary is not created -->
    <str name="spellcheckIndexDir">./spell</str>
    <str name="buildOnCommit">true</str>
  </lst>
  <!-- a spellchecker that uses a different distance measure -->
  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="field">spell</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
    <str name="spellcheckIndexDir">./spellcheckerjaro</str>
  </lst>
</searchComponent>

1) The problem here is that for the default dictionary the index is created, but if I query for jawa the suggestions it gives are data and sata, while the suggestion I expect is java. I have nearly 20 Java docs indexed.

2) Another problem: if I build the jarowinkler dictionary, which uses the spell field, the dictionary is not created, and I only see segments.gen and segments_1 in its directory.

Regards,
satya
spell checking....
Hi all,

I am new to Solr and have been able to index documents by following the Solr wiki. Now I am trying to add spellchecking. I followed the SpellCheckComponent page in the wiki but I am not getting the suggested spellings. I first built the index with spellcheck.build=true. Here is an example:

http://localhost:8080/solr/spell?q=javs&spellcheck=true&spellcheck.collate=true

<response>
  ...
  </result>
  <lst name="spellcheck">
    <lst name="suggestions"/>
  </lst>
</response>

Here the response should actually suggest java, but it didn't. Can anyone guide me on this? I am using Solr 1.4 with Tomcat on Ubuntu.

Regards,
swarup
Re: spell checking....
This is in solrconfig.xml:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="field">spell</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="accuracy">0.7</str>
    <str name="buildOnCommit">true</str>
    <str name="buildOnOptimize">true</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="field">lowerfilt</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="buildOnCommit">true</str>
    <str name="buildOnOptimize">true</str>
  </lst>
  <str name="queryAnalyzerFieldType">textSpell</str>
</searchComponent>

<!-- The SpellingQueryConverter to convert raw (CommonParams.Q) queries into tokens. Uses a simple regular expression to strip off field markup, boosts, ranges, etc. but it is not guaranteed to match an exact parse from the query parser. Optional, defaults to solr.SpellingQueryConverter -->
<queryConverter name="queryConverter" class="org.apache.solr.spelling.SpellingQueryConverter"/>

I added the following in the standard request handler:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <!-- default values for query parameters -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <!--
    <int name="rows">10</int>
    <str name="fl">*</str>
    <str name="version">2.1</str>
    <!-- Optional, must match spell checker's name as defined above, defaults to default -->
    <str name="spellcheck.dictionary">default</str>
    <!-- omp = Only More Popular -->
    <str name="spellcheck.onlyMorePopular">false</str>
    <!-- exr = Extended Results -->
    <str name="spellcheck.extendedResults">false</str>
    <!-- The number of suggestions to return -->
    <str name="spellcheck.count">1</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
Spell checking not working
I'm trying to set up a spell checker but failing miserably. I would like to have a spell check based on actual values injected into the index from other fields. The configuration is shown below.

After indexing and running a query with 'spellcheck.build=true' I can see that the spellcheck index files are updated, i.e. data is being injected. I can also see that the injected documents have 'spell' fields, such as 'spell=closed'. I would therefore expect that a search for 'clo' would return these as suggestions. But I have tried the queries:

[url]/solr/select?qt=huginn&q=clo&spellcheck=true
[url]/solr/select?qt=huginn&q=clo*&spellcheck=true
[url]/solr/select?qt=huginn&spell:clo&spellcheck=true
[url]/solr/select?qt=huginn&spell:clo*&spellcheck=true

With no effect. I do not get any hits back. What am I doing wrong?

Cheers,
Gert.

-- SCHEMA.XML -

<fieldType name="testSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</types>
<fields>
  ...
  <field name="ieStatus" type="text" indexed="true" stored="true" multiValued="true"/>
  <field name="spell" type="textSpell" indexed="true" stored="true" multiValued="true"/>
</fields>
<copyField source="ieStatus" dest="spell"/>

-- SOLRCONFIG.XML ---

<requestHanlder name="huginn" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    ... [setup as dismax handler]
  <lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="spellCheckIndexDir">./spellcheck/default</str>
    <str name="accuracy">0.5</str>
  </lst>
</searchComponent>

Please help Logica to respect the environment by not printing this email. This e-mail and any attachment is for authorised use by the intended recipient(s) only. It may contain proprietary material, confidential information and/or be subject to legal privilege. It should not be copied, disclosed to, retained or used by, any other party. If you are not an intended recipient then please promptly delete this e-mail and any attachment and all copies and inform the sender. Thank you.
Re: Spell checking: Is there a way to exclude words known to be wrong?
Use the stopwords feature with a custom mispeled_words.txt and a StopFilterFactory on the spell check field ;) Erik
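To illustrate the effect of the StopFilterFactory suggestion above, here is a small Python sketch (not Solr code) of what a case-insensitive stop filter does to the tokens that would otherwise end up in the spelling index. The function name and the sample data are made up for illustration; in Solr the list would live in the words file referenced by the filter.

```python
def filter_known_misspellings(tokens, misspelled_words):
    """Drop tokens listed as known misspellings, ignoring case."""
    blocked = {w.lower() for w in misspelled_words}
    return [t for t in tokens if t.lower() not in blocked]


# Hypothetical sample data standing in for the contents of the stop word
# file and for the tokens of a document copied into the spell field.
misspelled = ["beginnning"]
tokens = ["The", "beginnning", "of", "the", "beginning"]

print(filter_known_misspellings(tokens, misspelled))
# ['The', 'of', 'the', 'beginning']
```

Because the filter runs at analysis time, a listed misspelling never reaches the field the dictionary is built from, so it should not reappear when the spelling index is rebuilt on commit.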
Re: Spell checking: Is there a way to exclude words known to be wrong?
On Tue, Jul 14, 2009 at 6:37 PM, Erik Hatcher e...@ehatchersolutions.com wrote:

Use the stopwords feature with a custom mispeled_words.txt and a StopFilterFactory on the spell check field ;)

Very cool! :)

-- Regards, Shalin Shekhar Mangar.
Spell checking: Is there a way to exclude words known to be wrong?
We're building a spell index from a field in our main index with the following configuration:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

This works great and rebuilds the spelling index on commits as expected. However, we know there are misspellings in the spell field of our main index. We could remove these from the spelling index using Luke; however, they will be added again on commits. What we need is something similar to how the protwords.txt file is used, so that when we notice misspelled words such as beginnning being pulled from our main index we could add them to an exclusion file so they are not added to the spelling index again. Any tricks to make this possible?

-Jay
Re: Spell checking: Is there a way to exclude words known to be wrong?
I don't think there is a way currently, but it might make a nice patch. Or you could just implement a custom SolrSpellChecker - both FileBasedSpellChecker and IndexBasedSpellChecker are actually like maybe 50 lines of code or less. It would be fairly quick to just plug a custom version in as a plugin. -- - Mark http://www.lucidimagination.com
Re: spell checking
Walter Underwood wrote:

query suggest --wunder

That's very good. On the other hand, I noticed how the term spellcheck is spread all over the place, and that would be a massive renaming orgy. An explanation at the appropriate place in the documentation is less invasive. I added two sentences to the Introduction of: http://wiki.apache.org/solr/SpellCheckComponent

Michael Ludwig
Re: spell checking
On Thu, Jun 4, 2009 at 7:26 PM, Walter Underwood wunderw...@netflix.com wrote:

query suggest --wunder

How about DidYouMeanComponent?

-- Regards, Shalin Shekhar Mangar.
Re: spell checking
Yao Ge wrote:

Maybe we should call this alternative search terms or suggested search terms instead of spell checking. It is misleading as there is no right or wrong in spelling, there is only popular (term frequency?) alternatives.

I had exactly the same difficulty in understanding the concept because of the name given to the feature, which usually denotes just what it says, i.e. a spellchecker, which is driven by an authoritative dictionary and a set of rules, as integrated in word processors, in order to ensure correct orthography. What we have here is quite different from a spellchecker. IMHO, a name conveying the actual meaning, along the lines of suggest, would make more sense.

Michael Ludwig
Re: spell checking
query suggest --wunder
spell checking
Can someone help by providing a tutorial-like introduction on how to get spell checking to work in Solr? It appears many steps are required before the spell-checking functions can be used. It also appears that a dictionary (a list of correctly spelled words) is required to set up the spell checker. Can anyone validate my impression? Thanks. -- View this message in context: http://www.nabble.com/spell-checking-tp23835427p23835427.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: spell checking
Have you gone through: http://wiki.apache.org/solr/SpellCheckComponent -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: spell checking
Yes, I did. I was not able to grasp the concept of making spell checking work. For example, the wiki page says a spell check index needs to be built, but it does not say how to do it. Does Solr build the index out of thin air? Or is the index built from the main index? Or is it built from a dictionary or word list? Please help.
Re: spell checking
Hello, This is how you build the SC index: http://wiki.apache.org/solr/SpellCheckComponent#head-78f5afcf43df544832809abc68dd36b98152670c Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
Re: spell checking
The spell checking dictionary should be built on startup when spellchecking is enabled in the system. First we defined the component in solrconfig.xml. Notice how it has buildOnCommit to tell it to rebuild the dictionary.

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="field">field</str>
    <str name="spellcheckIndexDir">./spellchecker1</str>
    <str name="accuracy">0.5</str>
    <str name="buildOnCommit">true</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="field">field</str>
    <!-- Use a different Distance Measure -->
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
    <str name="spellcheckIndexDir">./spellchecker2</str>
    <str name="accuracy">0.5</str>
    <str name="buildOnCommit">true</str>
  </lst>

Second, we added the component to the dismax handler:

<arr name="last-components">
  <str>spellcheck</str>
</arr>

This seems to work for us. Hope it helps.

-- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562
Re: spell checking
Sorry for not being able to get my point across. I know the syntax that leads to an index build for spell checking. I actually ran the command and saw some additional files created in the data\spellchecker1 directory. What I don't understand is what is in there, as I cannot get Solr to make spell suggestions based on the query structure documented in the wiki. Can anyone tell me what happens when the default spell check index is built?

In my case, I used copyField to copy a couple of text fields into a field called spell. These fields are the original text; they are the ones with typos that I need to run spell check on. But how can this original data be used as a base for spell checking? How does Solr know which words are correctly spelled?

<field name="tech_comment" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="cust_comment" type="text" indexed="true" stored="true" multiValued="true"/>
...
<field name="spell" type="textSpell" indexed="true" stored="true" multiValued="true"/>
...
<copyField source="tech_comment" dest="spell"/>
<copyField source="cust_comment" dest="spell"/>
Re: spell checking
Hello,

In short, the assumption behind this type of SC is that the text in the main index is (mostly) correctly spelled. When the SC finds query terms that are close in spelling to words indexed in the SC, it offers spelling suggestions/corrections using those presumably correctly spelled terms (there are other parameters that control the exact behaviour, but this is the idea).

Solr (actually Lucene's spellchecker, which Solr uses under the hood) turns the input text (the values from the fields you copy to the spell field) into so-called n-grams. You can see that if you open up the SC index with something like Luke. Please see http://wiki.apache.org/jakarta-lucene/SpellChecker .

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Yao Ge yao...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tuesday, June 2, 2009 5:34:07 PM
Subject: Re: spell checking

> But how can these original data be used as a base for spell checking? How
> does Solr know what are correctly spelled words?
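To make the n-gram idea above concrete, here is a minimal sketch in Python. It is not Lucene's implementation: the function names (`ngrams`, `build_gram_index`, `suggest`), the fixed gram size of 3, and the "count shared grams" ranking are all assumptions chosen for illustration. It only shows why terms that already exist in the index can serve as suggestions for a misspelled query term.

```python
from collections import defaultdict


def ngrams(word, n=3):
    """Return the set of character n-grams in word (the whole word if it is short)."""
    if len(word) <= n:
        return {word}
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def build_gram_index(terms, n=3):
    """Map each n-gram to the set of indexed terms that contain it."""
    index = defaultdict(set)
    for term in terms:
        for gram in ngrams(term, n):
            index[gram].add(term)
    return index


def suggest(word, index, n=3, count=3):
    """Rank indexed terms by how many n-grams they share with word."""
    shared = defaultdict(int)
    for gram in ngrams(word, n):
        for term in index.get(gram, ()):
            shared[term] += 1
    shared.pop(word, None)  # a term is never a suggestion for itself
    ranked = sorted(shared, key=lambda t: (-shared[t], t))
    return ranked[:count]


# Hypothetical terms standing in for the contents of the spell field.
terms = ["diabetes", "diabetic", "dialect", "insulin"]
index = build_gram_index(terms)
print(suggest("diabtes", index))
```

The point of the sketch is that no list of "correct" words is consulted anywhere: candidates come only from terms that were indexed, so the quality of suggestions depends on the quality of the indexed text.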
Re: spell checking
Excellent. Now everything makes sense to me. :-)

The spell checking suggestion is the closest variant of the user input that actually exists in the main index. The so-called correction is relative to the text that has been indexed, so there is no need for a brute-force list of all correctly spelled words. Maybe we should call this "alternative search terms" or "suggested search terms" instead of spell checking. The name is misleading, as there is no right or wrong spelling here; there are only popular (term frequency?) alternatives.

Thanks for the insight.

Otis Gospodnetic wrote:
> In short, the assumption behind this type of SC is that the text in the
> main index is (mostly) correctly spelled.

--
View this message in context: http://www.nabble.com/spell-checking-tp23835427p23844050.html
Re: spell checking
I'm glad my late-night explanation helped. You may be right about there being a better name for this functionality. Note that we do have support for a file-based (dictionary-like) spellchecker, too.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Yao Ge yao...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tuesday, June 2, 2009 9:42:48 PM
Subject: Re: spell checking

> Maybe we should call this alternative search terms or suggested search
> terms instead of spell checking.
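For contrast with the index-based approach, the file-based (dictionary-like) variant mentioned above can be pictured roughly as follows. This is only a sketch under stated assumptions: the file name `spellings.txt`, the one-word-per-line format, and the helper names are hypothetical, and Python's standard `difflib` stands in for whatever similarity measure the real spellchecker uses.

```python
import difflib
import os
import tempfile


def load_dictionary(path):
    """Read one word per line, skipping blank lines (assumed file format)."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip().lower() for line in handle if line.strip()]


def suggest_from_dictionary(word, dictionary, count=3, cutoff=0.6):
    """Return up to count dictionary words whose similarity ratio is >= cutoff."""
    return difflib.get_close_matches(word.lower(), dictionary, n=count, cutoff=cutoff)


# Write a tiny hypothetical word list so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "spellings.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("diabetes\ndiabetic\ninsulin\n")
    dictionary = load_dictionary(path)

print(suggest_from_dictionary("diabtes", dictionary))
```

Here the word list, not the search index, defines what counts as a valid suggestion, which is closer to the "list of correctly spelled words" asked about at the start of the thread.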
Re: Spell checking not returning full terms
On Feb 4, 2009, at 7:54 PM, Rupert Fiasco wrote:
> But what threw me off is that when I started reading about that
> yesterday, in the first paragraph it says that this component is
> deprecated and to use SpellCheckComponent - so at that point I stopped
> reading and went over to the component page. [...] The wiki entry makes
> it seem old and deprecated and is no longer relevant, but it certainly is.

Hmmm, yeah, I see your point. Some people still use the SpellCheckerRequestHandler. I made it more explicit on each of the pages by linking to a separate page: http://wiki.apache.org/solr/SpellCheckingAnalysis

Feel free to add/modify based on your experience!

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout mail/wiki/docs/JIRA) using Solr/Lucene: http://www.lucidimagination.com/search
Spell checking not returning full terms
We are using Solr 1.3 and trying to get spell checking functionality working.

FYI, our index contains a lot of medical terms (which might or might not make a difference, as they are not English-y words, if that makes any sense?).

If I specify a spellcheck query of spellcheck.q=diabtes I get suggestions of:

    <str>diabet</str>
    <str>diabetogen</str>
    <str>dilat</str>
    <str>diamet</str>
    <str>diatom</str>
    <str>diastol</str>
    <str>diactin</str>
    <str>dialect</str>

If I re-mis-spell Diabetes as q=diabets then I get no suggestions.

So first off, two things:

1) Why would leaving out one e rather than the other affect the spelling suggestions so substantially?

2) In the former list of suggestions, notice the first suggestion is diabet, which isn't all that helpful; it should return something like diabetes or maybe even diabetic.

Note that if I do a normal search against diabetes then I get a ton of results; in other words, our index is filled with the term diabetes.

My relevant solrconfig is:

    <str name="queryAnalyzerFieldType">text</str>
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">text_t</str>
      <str name="spellcheckIndexDir">./spellchecker1</str>
      <str name="accuracy">0.1</str>
    </lst>
    <lst name="spellchecker">
      <str name="name">jarowinkler</str>
      <str name="field">text_t</str>
      <!-- Use a different Distance Measure -->
      <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
      <str name="spellcheckIndexDir">./spellchecker2</str>
      <str name="accuracy">0.1</str>
    </lst>

and I have spellcheck.count = 8.

Notice that I severely bumped down the accuracy setting to get more results. Bumping it up higher yields fewer results (I am not sure what the setting really means, so I don't know in what direction I want to change that value; I am guessing that a lower value allows for more mis-spellings, e.g. it's more promiscuous).

Our text and text_t fields are defined in schema.xml as:

    <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>

and

    <dynamicField name="*_t" type="text" indexed="true" stored="true" multiValued="true"/>

Any help would be appreciated.

Thanks
-Rupert
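On the accuracy question, here is a minimal sketch of one plausible reading: accuracy acts as a minimum similarity score that a candidate must reach to be returned. The normalization used here (1 minus edit distance divided by the longer length) is an assumption made for illustration, as are the function names; it is not taken from the Solr configuration above, and the Jaro-Winkler measure in the second spellchecker would score candidates differently.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance using two rolling rows."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Score in [0, 1]; 1.0 means identical (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def filter_by_accuracy(word, candidates, accuracy):
    """Keep candidates scoring at least accuracy, best first."""
    scored = [(similarity(word, c), c) for c in candidates]
    kept = [(s, c) for s, c in scored if s >= accuracy]
    return [c for s, c in sorted(kept, key=lambda pair: (-pair[0], pair[1]))]


candidates = ["diabet", "diabetogen", "dilat", "dialect"]
print(filter_by_accuracy("diabtes", candidates, accuracy=0.1))
print(filter_by_accuracy("diabtes", candidates, accuracy=0.7))
```

Under this reading, the guess in the message is right: a lower accuracy admits more distant candidates, and raising it prunes the list to the closest matches.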
Re: Spell checking not returning full terms
I'm guessing the field you are checking against is being stemmed. The field you spell check against should have minimal analysis done to it, i.e. tokenization and probably downcasing. See http://wiki.apache.org/solr/SpellCheckComponent and http://wiki.apache.org/solr/SpellCheckerRequestHandler for tips on how to handle analysis for spelling.

On Feb 4, 2009, at 2:33 PM, Rupert Fiasco wrote:
> 2) In the former list of suggestions, notice the first suggestion is
> diabet, which isnt all that helpful, it should return something like
> diabetes or maybe even diabetic.

--
Grant Ingersoll
http://www.lucidimagination.com/

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
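The stemming diagnosis can be illustrated with a toy example. The `toy_stem` function below is a deliberately crude, hypothetical suffix stripper, not the stemmer Solr's text field type actually uses; the sketch only shows that whatever the analyzer emits is what ends up in the spelling index, so stemmed terms come back as stemmed suggestions.

```python
def toy_stem(token):
    """Strip a few common suffixes (hypothetical stand-in for a real stemmer)."""
    for suffix in ("es", "ic", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[:-len(suffix)]
    return token


def analyze(text, stem):
    """Lowercase and whitespace-tokenize; optionally stem each token."""
    tokens = text.lower().split()
    return [toy_stem(t) for t in tokens] if stem else tokens


docs = ["Diabetes treatment", "diabetic patients"]

# Terms a spelling index would contain with and without stemming.
stemmed_terms = sorted({t for d in docs for t in analyze(d, stem=True)})
plain_terms = sorted({t for d in docs for t in analyze(d, stem=False)})

print(stemmed_terms)
print(plain_terms)
```

With stemming on, both "diabetes" and "diabetic" collapse to "diabet", so that is the only form available to suggest. With minimal analysis (tokenize and lowercase only), the full words survive and can be offered as corrections.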
Re: Spell checking not returning full terms
Awesome! After reading up on the links you sent me I got it all working. Thanks!

FYI - I did previously come across one of the links you sent over:

http://wiki.apache.org/solr/SpellCheckerRequestHandler

But what threw me off is that when I started reading about it yesterday, the first paragraph says that this component is deprecated and to use SpellCheckComponent, so at that point I stopped reading and went over to the component page. If I had kept reading I would have encountered all of the gritty details that I in fact needed to get it to work.

The wiki entry makes the page seem old, deprecated, and no longer relevant, but it certainly is relevant.

-Rupert

On Wed, Feb 4, 2009 at 11:57 AM, Grant Ingersoll gsing...@apache.org wrote:
> I'm guessing the field you are checking against is being stemmed. The
> field you spell check against should have minimal analysis done to it,
> i.e. tokenization and probably downcasing.