custom handler results don't seem to match manually entered query string
Hi, my problem is as follows: my request handler's code

    filters = null;
    DocListAndSet docs_main = searcher.getDocListAndSet(query, filters, null, start, rows, flags);
    String querystr = query.toString();
    rsp.add(QUERY_main, querystr);

gives zero responses:

    <str name="QUERY_main">((text:Travel text:Home text:Online_Archives text:Ireland text:Consumer_Information text:Regional text:Europe text:News text:Complaints text:CNN.com text:February text:Transport text:Airlines)^0.3)</str>
    <result name="response" numFound="0" start="0" maxScore="0.0"/>

while copying the QUERY_main string into the Solr admin returns plenty of results:

    <str name="q">(text:Travel text:Home text:Online_Archives text:Ireland text:Consumer_Information text:Regional text:Europe text:News text:Complaints text:CNN.com text:February text:Transport text:Airlines)^0.3</str>
    <str name="rows">10</str>
    <str name="version">2.2</str>
    </lst>
    </lst>
    <result name="response" numFound="71584" start="0">

Please help me understand what's going on; I'm a bit confused at the moment. Thanks :-)

-- View this message in context: http://www.nabble.com/custom-handler-results-don%27t-seem-to-match-manually-entered-query-string-tp15544268p15544268.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Integrated Spellchecking
Hey Doug,

If you have permission to donate, perhaps you can just post the patch anyway and state that it isn't quite ready to go. This is something I could use too, and so may have some cycles to work on it. I hate to replicate the work if you already have something that is more or less working. A half-baked patch is better than no patch.

-Grant

On Feb 15, 2008, at 12:45 PM, Doug Steigerwald wrote:

> That unfortunately got pushed aside to work on some of our higher-priority Solr work, since we already had it working one way. Hoping to revisit this after we push to production and start working on new features, and to share what I've done for this and multicore/spellcheck replication (which we have working quite well in QA right now).
>
> Doug Steigerwald
> Software Developer
> McClatchy Interactive
> [EMAIL PROTECTED]
> 919.861.1287
>
> oleg_gnatovskiy wrote:
>> dsteiger wrote:
>>> I've got a couple search components for automatic spell correction that I've been working on. I've converted most of the SpellCheckerRequestHandler to a search component (hopefully will throw a patch out soon for this). Then another search component that will do auto correction for a query if the search returns zero results. We're hoping to see some performance improvements out of handling this in Solr instead of our Rails service.
>>> doug
>>>
>>> Ryan McKinley wrote:
>>>> Yes -- this is what search components are for! Depending on where you put it in the chain, it could only return spell-checked results if there are too few results (or the top score is below some threshold).
>>>> ryan
>>>>
>>>> Grant Ingersoll wrote:
>>>>> Is it feasible to submit a query to any of the various handlers and have it bring back results and spelling suggestions all in one response? Is this something the query components piece would handle, assuming one exists for the spell checker?
>>>>> Thanks, Grant
>>
>> So have you succeeded in implementing this patch? I'd definitely like to use this functionality as a search suggestion.
Re: quick question
I think I remembered. Was it

    <Set name="Host">localhost</Set>

in the jetty.xml addListener section?

On 18 Feb 2008, at 14:44, matt davies wrote:

> Hello everyone
>
> I've forgotten where I stipulated in my solr that the solr admin back end was only viewable from localhost. Can anyone point me in the right direction?
>
> thanks
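For reference, the jetty.xml stanza being recalled above looks roughly like this. This is a sketch from memory of the Jetty 5-era configuration that shipped with Solr at the time; the listener class and port are assumptions, so verify against the jetty.xml in your own distribution:

```xml
<!-- jetty.xml (sketch): bind the HTTP listener to localhost only, so
     the Solr admin pages are unreachable from other hosts.
     org.mortbay.http.SocketListener is the Jetty 5-era class name;
     confirm the class and port against your Jetty version. -->
<Call name="addListener">
  <Arg>
    <New class="org.mortbay.http.SocketListener">
      <Set name="Host">localhost</Set>
      <Set name="Port">8983</Set>
    </New>
  </Arg>
</Call>
```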
quick question
Hello everyone,

I've forgotten where I stipulated in my Solr setup that the Solr admin back end was only viewable from localhost. Can anyone point me in the right direction?

thanks
Re: quick question
Hello Everyone,

I'm having some issues getting Solr to work with our data. I'm using it to index incident data for our technical support department. The main issues:

1) As an example, searching for binarydata_groupdocument_fk returns nothing, while searching for BinaryData_GroupDocument_FK returns results. I have the LowerCaseFilterFactory applied to both the index and query analyzers. Does this not actually set everything to lower case? The wiki at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters says it "Creates tokens by lowercasing all letters and dropping non-letters", but that does not seem to be happening here.

2) Some of our data is one sentence; some is over 5 MB of text. When searching for a term, the one-sentence data is returned first because the fieldNorm is so different (0.4 for one, 0.002 for others). Is there a way to disable using the fieldNorm in the score calculation? An alternative I tried was posting parts of the data in as different values of the field (i.e. having multiple tags of that field name in the add XML post), but that appeared to have zero effect on the results -- even the query debugger showed the exact same calculation for the search. Does anyone know how to disable the fieldNorm, or have the score created by adding the scores from each value of a multivalued field?

3) I discovered that searching for "certificate not found" (using the double quotes for a phrase here) did not return any results, even though the phrase did exist (and was lower case originally too, so different from my first issue). I discovered it was because of the stopword "not", but the same StopFilterFactory was applied to both the index and query analyzers. Am I doing something wrong there? As a workaround I'm having PHP manually remove stopwords from the query string, which is a real pain.

Here is the fieldType I do the actual searches on:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

Any help or advice would be greatly appreciated, thanks!

-Reece
Re: Using embedded Solr with admin GUI
If you are running a single webapp, you can just put the jsp files in there. I'm guessing that isn't what you mean though. Well, ultimately we're heading towards a single webapp with multiple embedded Solr cores. In that case, could the .jsp-based GUI/admin functionality peacefully co-exist with our use of the embedded cores? There are a bunch of admin request handlers that do many of the things from the /admin/ jsp files without the nice interface. The one major missing component was analysis.jsp, but grant just added: https://issues.apache.org/jira/browse/SOLR-477 Is there a description of the roadmap for the Solr GUI? For example, I'm assuming the .jsp files will still exist going forward, but will become much more of just a GUI layer on top of the new/beefed up admin request handlers - yes? Or is the plan to eventually get to just Javascript on HTML using JSON responses from these request handlers? Thanks, -- Ken Ken Krugler wrote: Hi all, We're moving towards embedding multiple Solr cores, versus using multiple Solr webapps, as a way of simplifying our build/deploy and also getting more control over the startup/update process. But I'd hate to lose that handy GUI for inspecting the schema and (most importantly) trying out queries with explain turned on. Has anybody tried this dual-mode method of operation? Thoughts on whether it's workable, and what the issues would be? I've taken a quick look at the .jsp and supporting Java code, and have some ideas on what would be needed, but I'm hoping there's an easy(er) approach than just whacking at the admin support code. Thanks, -- Ken -- Ken Krugler Krugle, Inc. +1 530-210-6378 If you can't find it, you can't fix it
Re: quick question
Beating Hossman to the punch: http://people.apache.org/~hossman/#threadhijack

Thread Hijacking on Mailing Lists: When starting a new discussion on a mailing list, please do not reply to an existing message; instead, start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to, and your question is hidden in that thread and gets less attention. It also makes following discussions in the mailing list archives particularly difficult. See also: http://en.wikipedia.org/wiki/Thread_hijacking

On Feb 18, 2008 10:48 AM, Reece [EMAIL PROTECTED] wrote:

> Hello Everyone,
>
> I'm having some issues getting Solr to work with our data. I'm using it to index incident data for our technical support department. The main issues:
>
> 1) As an example, searching for binarydata_groupdocument_fk returns nothing, while searching for BinaryData_GroupDocument_FK returns results. I have the LowerCaseFilterFactory applied to both the index and query analyzers. Does this not actually set everything to lower case? The wiki at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters says it "Creates tokens by lowercasing all letters and dropping non-letters", but that does not seem to be happening here.
>
> 2) Some of our data is one sentence; some is over 5 MB of text. When searching for a term, the one-sentence data is returned first because the fieldNorm is so different (0.4 for one, 0.002 for others). Is there a way to disable using the fieldNorm in the score calculation? An alternative I tried was posting parts of the data in as different values of the field (i.e. having multiple tags of that field name in the add XML post), but that appeared to have zero effect on the results -- even the query debugger showed the exact same calculation for the search. Does anyone know how to disable the fieldNorm, or have the score created by adding the scores from each value of a multivalued field?
>
> 3) I discovered that searching for "certificate not found" (using the double quotes for a phrase here) did not return any results, even though the phrase did exist (and was lower case originally too, so different from my first issue). I discovered it was because of the stopword "not", but the same StopFilterFactory was applied to both the index and query analyzers. Am I doing something wrong there? As a workaround I'm having PHP manually remove stopwords from the query string, which is a real pain.
>
> Here is the fieldType I do the actual searches on:
>
>     <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>       <analyzer type="index">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <!-- in this example, we will only use synonyms at query time
>         <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>         -->
>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>       </analyzer>
>     </fieldType>
>
> Any help or advice would be greatly appreciated, thanks!
>
> -Reece
Questions about filters and scoring
Hello Everyone,

First off, sorry about the thread hijack earlier; it was not intentional.

Back to the point though: I'm having some issues getting Solr to work with our dataset. I'm using it to index ticket data for our technical support department. Below are a few of the problems I've been having; the wiki hasn't had much to say about them.

1) As an example, searching for binarydata_groupdocument_fk returns nothing, while searching for BinaryData_GroupDocument_FK returns results. I have the LowerCaseFilterFactory applied to both the index and query analyzers. Does this not actually set everything to lower case? The wiki at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters says it "Creates tokens by lowercasing all letters and dropping non-letters", but that does not seem to be happening here. Am I forgetting to configure something?

2) Some of our data is one sentence; some is over 5 MB of text. When searching for a term, the one-sentence data is returned first because the fieldNorm is so different (0.4 for one, 0.002 for others). Is there a way to disable using the fieldNorm in the score calculation? An alternative I tried was posting parts of the data in as different values of the field (i.e. having multiple tags of that field name in the add XML post), but that appeared to have zero effect on the results -- even the query debugger showed the exact same calculation for the search. Does anyone know how to disable the fieldNorm, or have the score created by adding the scores from each value of a multivalued field?

3) I discovered that searching for "certificate not found" (using the double quotes for a phrase here) did not return any results, even though the phrase did exist (and was lower case originally too, so different from my first issue). I discovered it was because of the stopword "not", but the same StopFilterFactory was applied to both the index and query analyzers. Am I doing something wrong there? As a workaround I'm having PHP manually remove stopwords from the query string, which is a real pain. I'm thinking my filters aren't being applied correctly, since this is similar to issue #1 but with a different filter.

Here is the fieldType I do the actual searches on:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

Any help or advice would be greatly appreciated, thanks!

-Reece
Re: Questions about filters and scoring
On Feb 18, 2008 3:56 PM, Reece [EMAIL PROTECTED] wrote:

> Hello Everyone,
>
> First off, sorry about the thread hijack earlier; it was not intentional.
>
> Back to the point though: I'm having some issues getting Solr to work with our dataset. I'm using it to index ticket data for our technical support department. Below are a few of the problems I've been having; the wiki hasn't had much to say about them.
>
> 1) As an example, searching for binarydata_groupdocument_fk returns nothing, while searching for BinaryData_GroupDocument_FK returns results. I have the LowerCaseFilterFactory applied to both the index and query analyzers. Does this not actually set everything to lower case? The wiki at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters says it "Creates tokens by lowercasing all letters and dropping non-letters", but that does not seem to be happening here. Am I forgetting to configure something?

Did you re-index?

> 2) Some of our data is one sentence; some is over 5 MB of text. When searching for a term, the one-sentence data is returned first because the fieldNorm is so different (0.4 for one, 0.002 for others). Is there a way to disable using the fieldNorm in the score calculation?

It's probably Lucene's default length normalization over-emphasizing short fields. You could use a better similarity for your data, or turn off length normalization by setting omitNorms="true" for that field in the schema and then re-indexing (make sure to delete the old index entirely first).

> An alternative I tried was posting parts of the data in as different values of the field (i.e. having multiple tags of that field name in the add XML post), but that appeared to have zero effect on the results -- even the query debugger showed the exact same calculation for the search. Does anyone know how to disable the fieldNorm, or have the score created by adding the scores from each value of a multivalued field?
>
> 3) I discovered that searching for "certificate not found" (using the double quotes for a phrase here) did not return any results, even though the phrase did exist (and was lower case originally too, so different from my first issue). I discovered it was because of the stopword "not", but the same StopFilterFactory was applied to both the index and query analyzers. Am I doing something wrong there? As a workaround I'm having PHP manually remove stopwords from the query string, which is a real pain. I'm thinking my filters aren't being applied correctly, since this is similar to issue #1 but with a different filter.

Hmmm, looks like a recent change in Lucene probably causes this bug. Could you open a new Solr JIRA issue to report it?

-Yonik
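For concreteness, the omitNorms change Yonik suggests is a single attribute on the field definition in schema.xml. The field name and the indexed/stored flags below are illustrative, not taken from Reece's actual schema:

```xml
<!-- schema.xml sketch: omitNorms="true" disables length normalization
     for this field, so one-sentence and 5 MB documents no longer get
     wildly different fieldNorm values. Requires a full re-index
     (delete the old index first). Field name/flags are assumptions. -->
<field name="text" type="text" indexed="true" stored="true" omitNorms="true"/>
```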
Re: Questions about filters and scoring
On Feb 18, 2008 4:42 PM, Yonik Seeley [EMAIL PROTECTED] wrote:

> Hmmm, looks like a recent change in lucene probably causes this bug.

Nope... I just checked Solr 1.2, and it shows the same behavior. With the example data, a query of optimized for high finds the solr document, but the phrase "optimized for high" does not.

-Yonik
Re: Questions about filters and scoring
On Feb 18, 2008 5:05 PM, Yonik Seeley [EMAIL PROTECTED] wrote:

> On Feb 18, 2008 4:42 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
>> Hmmm, looks like a recent change in lucene probably causes this bug.
>
> Nope... I just checked Solr 1.2, and it shows the same behavior. With the example data, a query of optimized for high finds the solr document, but the phrase "optimized for high" does not.

Scratch that... both Solr 1.2 and trunk seem to work fine for me. My test was flawed because I was searching for "optimized for high" while the solr document had a misspelling: Optimizied

-Yonik
Re: Questions about filters and scoring
On Feb 18, 2008 4:50 PM, Lance Norskog [EMAIL PROTECTED] wrote:

> But then would not 'certificate anystopword found' match your phrase?

Yes... stopwords are ignored, so that's what it should do in general.

-Yonik
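A minimal sketch of the behavior Yonik describes, using a toy analysis chain (lowercase, whitespace split, stop filter) and an illustrative stopword list rather than Solr's real analyzers or stopwords.txt:

```python
# Toy model of stop filtering applied at both index and query time.
# Not Solr's actual analyzer: real Solr also tracks position increments,
# and the stopword list here is a made-up sample.
STOPWORDS = {"not", "the", "a", "for"}

def analyze(text):
    # lowercase, split on whitespace, drop stopwords
    return [t for t in text.lower().split() if t not in STOPWORDS]

# The stopword vanishes from both the indexed phrase and the query,
# so any stopword in that slot produces the same token sequence:
print(analyze("certificate not found"))   # ['certificate', 'found']
print(analyze("certificate the found") == analyze("certificate not found"))  # True
```

This is why 'certificate anystopword found' can match the indexed phrase: after analysis, both sides reduce to the same surviving tokens.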
RE: Questions about filters and scoring
3) But then would not 'certificate anystopword found' match your phrase?

I wound up making a separate index without stopwords just so that my phrase lookups would work. (I do not have the luxury of re-indexing, so now I'm stuck with this design even if there is a better one.) I also made one with the phonetic DoubleMetaphone analyzer. This is really useful, especially for spell checking.

Cheers,
Lance

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
Sent: Monday, February 18, 2008 1:43 PM
To: solr-user@lucene.apache.org; Reece
Subject: Re: Questions about filters and scoring

On Feb 18, 2008 3:56 PM, Reece [EMAIL PROTECTED] wrote:

> Hello Everyone,
>
> First off, sorry about the thread hijack earlier; it was not intentional.
>
> Back to the point though: I'm having some issues getting Solr to work with our dataset. I'm using it to index ticket data for our technical support department. Below are a few of the problems I've been having; the wiki hasn't had much to say about them.
>
> 1) As an example, searching for binarydata_groupdocument_fk returns nothing, while searching for BinaryData_GroupDocument_FK returns results. I have the LowerCaseFilterFactory applied to both the index and query analyzers. Does this not actually set everything to lower case? The wiki at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters says it "Creates tokens by lowercasing all letters and dropping non-letters", but that does not seem to be happening here. Am I forgetting to configure something?

Did you re-index?

> 2) Some of our data is one sentence; some is over 5 MB of text. When searching for a term, the one-sentence data is returned first because the fieldNorm is so different (0.4 for one, 0.002 for others). Is there a way to disable using the fieldNorm in the score calculation?

It's probably Lucene's default length normalization over-emphasizing short fields. You could use a better similarity for your data, or turn off length normalization by setting omitNorms="true" for that field in the schema and then re-indexing (make sure to delete the old index entirely first).

> An alternative I tried was posting parts of the data in as different values of the field (i.e. having multiple tags of that field name in the add XML post), but that appeared to have zero effect on the results -- even the query debugger showed the exact same calculation for the search. Does anyone know how to disable the fieldNorm, or have the score created by adding the scores from each value of a multivalued field?
>
> 3) I discovered that searching for "certificate not found" (using the double quotes for a phrase here) did not return any results, even though the phrase did exist (and was lower case originally too, so different from my first issue). I discovered it was because of the stopword "not", but the same StopFilterFactory was applied to both the index and query analyzers. Am I doing something wrong there? As a workaround I'm having PHP manually remove stopwords from the query string, which is a real pain. I'm thinking my filters aren't being applied correctly, since this is similar to issue #1 but with a different filter.

Hmmm, looks like a recent change in Lucene probably causes this bug. Could you open a new Solr JIRA issue to report it?

-Yonik
Re: Questions about filters and scoring
For #1, I just tested again and found the problem: WordDelimiterFilterFactory was splitting the words up because of the capitals in the middle of the word, so a lower-case version was seen as a different set of tokens.

For #2, I'll try using that attribute for the fieldType and let you know how it goes, but that looks like exactly what I needed.

For #3, I'll test it again tomorrow and make sure I didn't have a typo or something.

Thanks for the help!

-Reece

On Feb 18, 2008 5:11 PM, Yonik Seeley [EMAIL PROTECTED] wrote:

> On Feb 18, 2008 5:05 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
>> On Feb 18, 2008 4:42 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
>>> Hmmm, looks like a recent change in lucene probably causes this bug.
>>
>> Nope... I just checked Solr 1.2, and it shows the same behavior. With the example data, a query of optimized for high finds the solr document, but the phrase "optimized for high" does not.
>
> Scratch that... both Solr 1.2 and trunk seem to work fine for me. My test was flawed because I was searching for "optimized for high" while the solr document had a misspelling: Optimizied
>
> -Yonik
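Reece's #1 diagnosis can be illustrated with a rough approximation of what WordDelimiterFilterFactory plus LowerCaseFilterFactory do to a token. The splitting regex below is a simplification I'm assuming for illustration; the real filter is more involved (it also emits catenated forms when catenateWords=1 and handles number parts):

```python
import re

def word_delimiter(token):
    # Rough sketch of WordDelimiterFilter followed by LowerCaseFilter:
    # split on underscores and lower-to-upper case transitions, then
    # lowercase each part. An illustration only, not Solr's real filter.
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z]+|[a-z]+|\d+", token)
    return [p.lower() for p in parts]

# The mixed-case form is split on its inner capitals...
print(word_delimiter("BinaryData_GroupDocument_FK"))
# ...but the pre-lowercased form has no case changes left to split on,
# so it produces different tokens and the two queries cannot match:
print(word_delimiter("binarydata_groupdocument_fk"))
```

Because the two inputs yield different token sets, lowercasing the query string by hand before Solr sees it changes what gets searched, which is exactly the mismatch Reece observed.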
Field Search
Hi,

I have a content item with title "Advertise" under the category "Sales". I also have 3 other content items with titles {TV, Web, Radio} under the category "Advertise".

Now, if I search for title:Advertise, I get the following results:

    Title     --- Category
    =====================
    Advertise --- Sales
    TV        --- Advertise
    Radio     --- Advertise
    Web       --- Advertise

I gave a query intended to get all titles matching "Advertise", but it returned all content matching "advertise" (even where the title is different). How can I search for content that matches "advertise" only in the title?

Schema.xml:

    <defaultSearchField>text</defaultSearchField>
    <solrQueryParser defaultOperator="OR"/>

Thanks in advance for your precious time.

-kmu
Re: Field Search
Hi,

I forgot to include my dismax request handler:

    <requestHandler name="dismax" class="solr.DisMaxRequestHandler">
      <lst name="defaults">
        <str name="echoParams">explicit</str>
        <float name="tie">0.01</float>
        <str name="qf">text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4</str>
        <str name="pf">text^0.2 features^1.1 name^1.5 manu^1.4 manu_exact^1.9</str>
        <str name="bf">ord(poplarity)^0.5 recip(rord(price),1,1000,1000)^0.3</str>
        <str name="fl">id,name,price,score</str>
        <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
        <int name="ps">100</int>
        <str name="q.alt">*:*</str>
      </lst>
    </requestHandler>

Do you think this is causing the issue?

-Thanks and Regards,
kmu

On Feb 19, 2008 6:31 AM, Mahesh Udupa [EMAIL PROTECTED] wrote:

> Hi,
>
> I have a content item with title "Advertise" under the category "Sales". I also have 3 other content items with titles {TV, Web, Radio} under the category "Advertise".
>
> Now, if I search for title:Advertise, I get the following results:
>
>     Title     --- Category
>     =====================
>     Advertise --- Sales
>     TV        --- Advertise
>     Radio     --- Advertise
>     Web       --- Advertise
>
> I gave a query intended to get all titles matching "Advertise", but it returned all content matching "advertise" (even where the title is different). How can I search for content that matches "advertise" only in the title?
>
> Schema.xml:
>
>     <defaultSearchField>text</defaultSearchField>
>     <solrQueryParser defaultOperator="OR"/>
>
> Thanks in advance for your precious time.
>
> -kmu
Problem using wildcard characters ? and *
Hi,

I am using Solr in my application to search some blocks. These blocks have a unique key = block name + block id.

When I try to search a block using '?' it works partially. When I search for au?it (audit is the name of the block), it shows correct results. But when I try the same thing with crea?e (create is the name of the block), no results are displayed. Both audit and create are stored in the same place.

Also, the wildcard character '*' works partially. If I search for del* (I want to search for the delete block), it shows two results: delete and softdelete (the latter should not have come up). I tried changing the case as well, but it does not work.

Can somebody please explain the reason?

-- View this message in context: http://www.nabble.com/Problem-using-wildcard-characters---and-*-tp15554272p15554272.html
Sent from the Solr - User mailing list archive at Nabble.com.
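One common cause of exactly this symptom, offered as a guess rather than a diagnosis: wildcard queries are not passed through the analyzers, but indexed terms are. If the field uses a Porter-style stemmer, the index holds the stem "creat" rather than "create", so the unanalyzed pattern crea?e finds nothing. A toy illustration, with the stemmed forms hardcoded as assumptions:

```python
from fnmatch import fnmatch

# Assumed Porter-stemmed forms of the indexed block names: 'audit'
# happens to survive stemming unchanged, while 'create' stems to 'creat'.
index_terms = {"audit": "audit", "create": "creat"}

def wildcard_matches(pattern):
    # Wildcard patterns bypass the analysis chain, so the raw pattern
    # is matched against the *stemmed* terms actually in the index.
    return sorted(t for t in index_terms.values() if fnmatch(t, pattern))

print(wildcard_matches("au?it"))   # ['audit'] -- stem equals the surface form
print(wildcard_matches("crea?e"))  # []        -- index holds 'creat', not 'create'
```

If that is what's happening, searching crea* (matching the stem) or indexing the field without stemming would behave more predictably.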
Special character in queries
Hi,

I have 2 content items with names as follows: a) NewSong sing --- with in the name b) sing

I need to search for content *a* (i.e. NewSong sing). I tried with \NewSong\ sing, but it failed to find it. If I search with NewSong sing, it displays both (which is expected). It would be great if someone could suggest what is wrong with the first query.

Thanks in advance.

-kmu
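If the intent here was a phrase query, the usual Lucene/Solr syntax is double quotes around the phrase rather than backslashes, and when the query is sent over HTTP those quotes must be URL-encoded. A small sketch; the q parameter name is Solr's standard one, but the field setup is whatever the sender's schema defines:

```python
from urllib.parse import urlencode

# A quoted phrase requires the two terms to appear adjacent, whereas an
# unquoted two-word query is an OR of the terms in the default field and
# therefore matches both content items.
phrase_query = urlencode({"q": '"NewSong sing"'})
print(phrase_query)  # q=%22NewSong+sing%22
```

Backslash-escaping (\NewSong\) is only needed for characters that are query-syntax operators, which is likely why the first query failed to behave as a phrase search.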