Re: Strange anomaly(?) with string matching in query
: Does anybody have any further suggestions on what I might try in this
: situation? Any tools perhaps that might help me put my finger on Solr's
: pulse so I can figure out just what's going on in there at index and query
: time?

1) FYI: you don't always need the settings on every filter to be the same ... WordDelimiterFilterFactory is a particular example where it frequently makes sense for the settings to be different.

2) to answer a question earlier in this thread: the analysis tool only shows you how the text would be analyzed -- it doesn't do a full query parse. So if you are doing phrase queries like

    myfield:"aa bbb ccc ddd"

then you would paste this into the query section...

    aa bbb ccc ddd

3) i think all of the WDF discussion is a red herring ... looking back at the pastebin from your first email showing the document you expected to match, the literal value stored in the dc_subject field for that doc ends in a period (ie: .) ... but in your discussion of debugQuery and analysis.jsp you never used a . character at index or query time. My guess is it's in the indexed value, but not in the query value -- i could be wrong, i didn't load your schema to test whether WDF does something helpful with that trailing period -- but the first place to start debugging is to use the *exact* value (with period) in the index section of analysis.jsp.

-Hoss
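One way to check the trailing-period guess before going back to analysis.jsp is to diff the stored literal against the query string character by character. The sketch below is only an illustration. It assumes the stored dc_subject value is the query string plus a final period, as guessed above; the real stored value should be copied from the document itself. `literal_differences` is a hypothetical helper name, not part of Solr.

```python
import difflib


def literal_differences(stored: str, query: str) -> list[tuple[str, str, str]]:
    """Return (operation, stored_part, query_part) for each span where the strings differ."""
    matcher = difflib.SequenceMatcher(a=stored, b=query, autojunk=False)
    return [
        (op, stored[i1:i2], query[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


# Assumed values: the stored literal is taken to be the query plus a period.
stored_value = "Abilene Christian College -- Students -- Yearbooks."
query_value = "Abilene Christian College -- Students -- Yearbooks"

for op, stored_part, query_part in literal_differences(stored_value, query_value):
    print(f"{op}: stored={stored_part!r} query={query_part!r}")
```

Any difference reported here is a character that has to be pasted into the index section of the analysis page to reproduce what was actually indexed.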
Re: Strange anomaly(?) with string matching in query
Does anybody have any further suggestions on what I might try in this situation? Any tools perhaps that might help me put my finger on Solr's pulse so I can figure out just what's going on in there at index and query time?

-Kurt

--
View this message in context: http://www.nabble.com/Strange-anomaly%28-%29-with-string-matching-in-query-tp22704639p22785313.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Strange anomaly(?) with string matching in query
Kurt,

Attributes for WordDelimiterFilterFactory have different values in the index vs. query sections. Do things work if you make them identical? (You'll have to reindex.)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Kurt Nordstrom knordst...@library.unt.edu
To: solr-user@lucene.apache.org
Sent: Wednesday, March 25, 2009 1:33:41 PM
Subject: Re: Strange anomaly(?) with string matching in query

Otis,

Absolutely. Here are the tokenizers and filters for the text fieldtype in the schema.

http://pastebin.com/f2bb249f3

Thanks!
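To see exactly which attributes differ between the index and query analyzers, the fieldType definition can be parsed and the two filter declarations compared. The following is a minimal sketch using only the Python standard library. The schema fragment and its attribute values are hypothetical stand-ins, not the contents of the pastebin above, and the helper names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fieldType fragment; the attribute values are illustrative only
# and are NOT taken from the poster's schema.
SCHEMA_FRAGMENT = """
<fieldType name="text" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0"/>
  </analyzer>
</fieldType>
"""


def filter_attributes(field_type, analyzer_type, filter_class):
    """Return the attributes (minus 'class') of one filter in one analyzer."""
    for analyzer in field_type.findall("analyzer"):
        if analyzer.get("type") != analyzer_type:
            continue
        for flt in analyzer.findall("filter"):
            if flt.get("class") == filter_class:
                return {k: v for k, v in flt.attrib.items() if k != "class"}
    return {}


def differing_attributes(field_type, filter_class):
    """Map attribute name -> (index value, query value) where the two differ."""
    index_attrs = filter_attributes(field_type, "index", filter_class)
    query_attrs = filter_attributes(field_type, "query", filter_class)
    return {
        name: (index_attrs.get(name), query_attrs.get(name))
        for name in sorted(set(index_attrs) | set(query_attrs))
        if index_attrs.get(name) != query_attrs.get(name)
    }


field_type = ET.fromstring(SCHEMA_FRAGMENT)
differences = differing_attributes(field_type, "solr.WordDelimiterFilterFactory")
for name, (index_value, query_value) in differences.items():
    print(f"{name}: index={index_value} query={query_value}")
```

A reported difference is not automatically a bug; it only narrows down where the two analysis chains can diverge.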
Re: Strange anomaly(?) with string matching in query
Changed the config so that the WordDelimiterFilterFactory settings on both index and query use:

org.apache.solr.analysis.WordDelimiterFilterFactory {generateNumberParts=1, catenateWords=1, generateWordParts=1, catenateAll=0, catenateNumbers=1}

Restarted Solr, reindexed the records. Unfortunately, no change in the search results. It still won't find that pesky string. It seems to be generating the same results as before in the analysis page.

Any other things I might try or diagnostics that might give useful output?

-Kurt
Re: Strange anomaly(?) with string matching in query
Hi,

Take the whole string to your Solr Admin - Analysis page and analyze it. Does it get analyzed the way you'd expect it to be analyzed?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Kurt Nordstrom knordst...@library.unt.edu
To: solr-user@lucene.apache.org
Sent: Wednesday, March 25, 2009 11:52:07 AM
Subject: Strange anomaly(?) with string matching in query

Hello,

We've encountered a strange issue in our Solr install regarding a particular string that just doesn't seem to want to return results, despite the exact same string being in the index. What makes it even stranger is that we had the same data in a previous install of Solr, and it worked there, but doesn't here.

The string that's been showing the trouble is Abilene Christian College -- Students -- Yearbooks. The field, in this case, is of type text.

Strangely enough, when we search for Abilene Christian College -- Students --, the relevant documents are returned. It just fails when the full string is specified.

At this point, I'm a little bit stymied. Any suggestions or ideas would be highly appreciated. In order to possibly help with diagnosis, I'm including links to, hopefully, relevant outputs and configurations.

We're using Solr version 1.3.

This is the output of a search for the string, with debugQuery turned on.
http://pastebin.com/f72c017c1

This is the output of a document containing the string in question. The field is dc_subject.
http://pastebin.com/f17a2e722

Here is our current schema.
http://pastebin.com/f2768bece

If there's any more information or diagnostics that I can post or run, please let me know.

Thanks for your help and suggestions.

-Kurt
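For anyone reproducing the debugQuery output mentioned above, a request URL can be assembled as in the sketch below. The base URL `http://localhost:8983/solr` and the `/select` handler path are assumptions about a default install, and `debug_query_url` is a hypothetical helper. The sketch only builds the URL and decodes it again to show what Solr would receive; it does not contact a server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def debug_query_url(base_url: str, field: str, phrase: str) -> str:
    """Build a select URL that runs a phrase query with debugQuery enabled."""
    params = {
        "q": f'{field}:"{phrase}"',  # double quotes make this a phrase query
        "debugQuery": "on",
    }
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"


# Assumed base URL of a local Solr instance.
url = debug_query_url(
    "http://localhost:8983/solr",
    "dc_subject",
    "Abilene Christian College -- Students -- Yearbooks",
)
print(url)

# Decode the query string again to confirm the parameters as sent.
sent = parse_qs(urlsplit(url).query)
print(sent["q"][0])
```

The parsed query in the debug section of the response is the thing to compare against the tokens shown on the analysis page.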
Re: Strange anomaly(?) with string matching in query
Otis:

Okay, I'm not sure whether I should be including the quotes in the query when using the analyzer, so I've run it both ways (no quotes on the index value). I'll try to approximate the final tables returned for each term.

The field is dc_subject in both cases, being of type text.

***

Version 1 (With Quotes)

Index Value: Abilene Christian College -- Students -- Yearbooks
Query Value: "Abilene Christian College -- Students -- Yearbooks"

Index final table:
  position:  1        2          3        4         5
  term:      abilene  christian  college  students  yearbooks

Query final table:
  position:  1        2          3        4         6
  term:      abilene  christian  college  students  yearbooks

Version 2 (Without Quotes)

Index Value: Abilene Christian College -- Students -- Yearbooks
Query Value: Abilene Christian College -- Students -- Yearbooks

Index final table:
  position:  1        2          3        4         5
  term:      abilene  christian  college  students  yearbooks

Query final table:
  position:  1        2          3        4         5
  term:      abilene  christian  college  students  yearbooks

***

The main difference seems to be that there is no position 5 when I surround the string with quotes; instead it skips to 6. This happens at the WordDelimiterFilterFactory step.

It seems to me like those tokens should be returning a match, but either way, apparently they're not?

Any suggestions at this point?
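To reason about why a position gap could matter, here is a deliberately simplified model of exact phrase matching: every query term must appear in the document at the same offset from the first term as it has in the query. This is a sketch, not Lucene's implementation, and whether the Solr 1.3 query parser actually carries the gap into the phrase query is not confirmed anywhere in this thread. The token lists assume the Version 1 query table reads students at position 4 and yearbooks at position 6.

```python
def phrase_matches(index_tokens, query_tokens):
    """Simplified exact-phrase check on (term, position) pairs.

    A match requires some occurrence of the first query term such that every
    other query term sits at the same relative offset in the document.
    """
    positions = {}
    for term, pos in index_tokens:
        positions.setdefault(term, set()).add(pos)

    first_term, first_pos = query_tokens[0]
    return any(
        all(
            start + (pos - first_pos) in positions.get(term, set())
            for term, pos in query_tokens
        )
        for start in positions.get(first_term, set())
    )


# Token streams approximated from the analysis tables in the message above.
index_tokens = [("abilene", 1), ("christian", 2), ("college", 3),
                ("students", 4), ("yearbooks", 5)]
query_with_quotes = [("abilene", 1), ("christian", 2), ("college", 3),
                     ("students", 4), ("yearbooks", 6)]
query_without_quotes = [("abilene", 1), ("christian", 2), ("college", 3),
                        ("students", 4), ("yearbooks", 5)]

print(phrase_matches(index_tokens, query_with_quotes))
print(phrase_matches(index_tokens, query_without_quotes))
```

Under this model the gap at position 5 alone is enough to break the match, which is one reason to compare the query table against the parsed query in the debugQuery output rather than against the index table by eye.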
Re: Strange anomaly(?) with string matching in query
Otis,

Absolutely. Here are the tokenizers and filters for the text fieldtype in the schema.

http://pastebin.com/f2bb249f3

Thanks!

Otis Gospodnetic wrote:

That's what I suspected. Want to paste the relevant tokenizer+filters sections of your schema? The index-time and query-time analysis has to be the same or compatible enough, and that's not the case here.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch