Re: Strange anomaly(?) with string matching in query

2009-04-07 Thread Chris Hostetter

: Does anybody have any further suggestions on what I might try in this
: situation?  Any tools perhaps that might help me put my finger on Solr's
: pulse so I can figure out just what's going on in there at index and query
: time?

1) FYI: you don't always need the settings on every filter to be the same 
... WordDelimiterFilterFactory is a particular example where it frequently 
makes sense for the settings to be different

2) to answer a question earlier in this thread: when using the analysis 
tool, it only shows you how the text would be analyzed -- it doesn't do a 
full query parse.  so if you are doing phrase queries like 
myfield:"aa bbb ccc ddd" then you would paste this into the query section...
aa bbb ccc ddd

3) i think all of the WDF discussion is a red herring ... looking back at 
the pastebin from your first email showing the document you expected to 
match, the literal value stored in the dc_subject field for that doc ends 
in a period (ie: .) ... but in your discussion of debugQuery and 
analysis.jsp you never used a . character at index or query time.  

My guess is it's in the indexed value, but not in the query value -- i 
could be wrong, i didn't load your schema to test whether WDF does something 
helpful with that trailing period -- but the first place to start doing 
debugging is to use the *exact* value (with period) in the index section of 
analysis.jsp.
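
A quick way to spot a mismatch like the trailing period described above is to compare the stored value and the query text directly. The following is only a minimal sketch: the two strings are assumed from the description in this thread (the pastebin contents are not reproduced here), and the helper name is made up for illustration.

```python
import os.path


def leftover_after_common_prefix(stored, query):
    """Return what remains of each string once their shared prefix is removed."""
    prefix_len = len(os.path.commonprefix([stored, query]))
    return stored[prefix_len:], query[prefix_len:]


# Assumed values: the stored subject ends in a period, the query text does not.
stored_value = "Abilene Christian College -- Students -- Yearbooks."
query_value = "Abilene Christian College -- Students -- Yearbooks"

# Anything left over on either side is a character the other string lacks.
print(leftover_after_common_prefix(stored_value, query_value))
```

Whatever is left over on the stored side is what should also be pasted into the index section of analysis.jsp.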




-Hoss



Re: Strange anomaly(?) with string matching in query

2009-03-30 Thread Kurt Nordstrom

Does anybody have any further suggestions on what I might try in this
situation?  Any tools perhaps that might help me put my finger on Solr's
pulse so I can figure out just what's going on in there at index and query
time?

-Kurt




-- 
View this message in context: 
http://www.nabble.com/Strange-anomaly%28-%29-with-string-matching-in-query-tp22704639p22785313.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Strange anomaly(?) with string matching in query

2009-03-26 Thread Otis Gospodnetic

Kurt,

Attributes for WordDelimiterFilterFactory have different values in the index 
vs. query sections.  Do things work if you make them identical? (you'll have 
to reindex)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
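
To see which filter attributes differ between the index and query analyzers, the two sections of a fieldType can be compared mechanically. This is a rough sketch only: the schema fragment below is hypothetical (it is not the schema from the pastebin), its attribute values are illustrative, and the helper names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical fieldType fragment; the attribute values are illustrative only.
SCHEMA_FRAGMENT = """
<fieldType name="text" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0"/>
  </analyzer>
</fieldType>
"""


def filter_settings(field_type, analyzer_type, filter_class):
    """Return the attributes of one filter in the index or query analyzer."""
    for analyzer in field_type.findall("analyzer"):
        if analyzer.get("type") == analyzer_type:
            for flt in analyzer.findall("filter"):
                if flt.get("class") == filter_class:
                    return {k: v for k, v in flt.attrib.items() if k != "class"}
    return {}


def setting_differences(field_type, filter_class):
    """Map each differing attribute to its (index, query) pair of values."""
    index = filter_settings(field_type, "index", filter_class)
    query = filter_settings(field_type, "query", filter_class)
    return {
        key: (index.get(key), query.get(key))
        for key in sorted(set(index) | set(query))
        if index.get(key) != query.get(key)
    }


field_type = ET.fromstring(SCHEMA_FRAGMENT)
print(setting_differences(field_type, "solr.WordDelimiterFilterFactory"))
```

An empty result would mean the filter is configured identically on both sides.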






Re: Strange anomaly(?) with string matching in query

2009-03-26 Thread Kurt Nordstrom

Changed the config so that both WordDelimiterFilterFactory settings on both
index and query use: 

org.apache.solr.analysis.WordDelimiterFilterFactory {generateNumberParts=1,
catenateWords=1, generateWordParts=1, catenateAll=0, catenateNumbers=1}

Restarted Solr, reindexed the records.

Unfortunately, no change in the search results.  It still won't find that
pesky string.  It seems to be generating the same results as before in the
analysis page.

Any other things I might try or diagnostics that might give useful output?   

-Kurt





Re: Strange anomaly(?) with string matching in query

2009-03-25 Thread Otis Gospodnetic

Hi,

Take the whole string to your Solr Admin - Analysis page and analyze it.  Does 
it get analyzed the way you'd expect it to be analyzed?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Kurt Nordstrom knordst...@library.unt.edu
 To: solr-user@lucene.apache.org
 Sent: Wednesday, March 25, 2009 11:52:07 AM
 Subject: Strange anomaly(?) with string matching in query
 
 
 Hello,
 
 We've encountered a strange issue in our Solr install regarding a particular
 string that just doesn't seem to want to return results, despite the exact
 same string being in the index.
 
 What makes it even stranger is that we had the same data in a previous
 install of Solr, and it worked there, but doesn't here.
 
 The string that's been showing the trouble is "Abilene Christian College --
 Students -- Yearbooks".  The field, in this case, is of type text. 
 Strangely enough, when we search for "Abilene Christian College -- Students
 --", the relevant documents are returned.  It just fails when the full
 string is specified.
 
 At this point, I'm a little bit stymied.  Any suggestions or ideas would be
 highly appreciated.  In order to possibly help with diagnosis, I'm including
 links to, hopefully, relevant outputs and configurations.
 
 We're using Solr version 1.3.
 
 This is the output of a search for the string, with debugQuery turned on.
 http://pastebin.com/f72c017c1
 
 This is the output of a document containing the string in question.  The
 field is dc_subject. http://pastebin.com/f17a2e722
 
 Here is our current schema. http://pastebin.com/f2768bece
 
 If there's any more information or diagnostics that I can post or run,
 please let me know.  Thanks for your help and suggestions.
 
 -Kurt
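
For reference, a debugQuery search like the one described above can be reproduced with a URL along these lines. This is a sketch under assumptions: the host, port and path are placeholders for the actual install, the function name is made up, and the query is assumed to be a quoted phrase on dc_subject.

```python
from urllib.parse import urlencode

# Assumed base URL; adjust host, port and path for the actual install.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def debug_query_url(field, phrase, base_url=SOLR_SELECT_URL):
    """Build a select URL for a quoted phrase query with debugQuery enabled."""
    params = {"q": '{}:"{}"'.format(field, phrase), "debugQuery": "true"}
    return base_url + "?" + urlencode(params)


url = debug_query_url(
    "dc_subject", "Abilene Christian College -- Students -- Yearbooks"
)
print(url)
```

The parsed query in the debug section of the response shows how the phrase was analyzed at query time.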



Re: Strange anomaly(?) with string matching in query

2009-03-25 Thread Kurt Nordstrom

Otis:

Okay, I'm not sure whether I should be including the quotes in the query
when using the analyzer, so I've run it both ways (no quotes on the index
value).  I'll try to approximate the final tables returned for each term:

The field is dc_subject in both cases, and is of type text.

***

Version 1 (With Quotes)
Index Value: Abilene Christian College -- Students -- Yearbooks
Query Value: "Abilene Christian College -- Students -- Yearbooks"

Index final table:

1        2          3        4         5
abilene  christian  college  students  yearbooks

Query final table:

1        2          3        4         6
abilene  christian  college  students  yearbooks


Version 2 (Without Quotes)
Index Value: Abilene Christian College -- Students -- Yearbooks
Query Value: Abilene Christian College -- Students -- Yearbooks


Index final table:

1        2          3        4         5
abilene  christian  college  students  yearbooks

Query final table:

1        2          3        4         5
abilene  christian  college  students  yearbooks


***

The main difference seems to be that there is no position 5 when I surround
the string with quotes; instead it skips to 6.  This happens at the
WordDelimiterFilterFactory step.  It seems to me like those tokens should be
returning a match, but either way, apparently they're not.  Any suggestions
at this point?
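
Why a gap in positions would matter for a phrase search can be shown with a toy model. This is not Lucene's implementation, only a simplified sketch that assumes an exact phrase match (slop 0) requires the query terms to sit at the same relative positions in the indexed field. The function and variable names are made up for illustration.

```python
def phrase_matches(doc_positions, query_terms):
    """Simplified exact (slop 0) phrase check.

    doc_positions: dict mapping term -> set of positions in the indexed field.
    query_terms: list of (term, position) pairs from query-time analysis.
    """
    if not query_terms:
        return False
    first_term, first_pos = query_terms[0]
    # Try every occurrence of the first term as the start of the phrase.
    for start in doc_positions.get(first_term, ()):
        if all(
            (start + (pos - first_pos)) in doc_positions.get(term, ())
            for term, pos in query_terms
        ):
            return True
    return False


terms = ["abilene", "christian", "college", "students", "yearbooks"]
indexed = {term: {pos} for pos, term in enumerate(terms, start=1)}

query_without_gap = list(zip(terms, [1, 2, 3, 4, 5]))
query_with_gap = list(zip(terms, [1, 2, 3, 4, 6]))

print(phrase_matches(indexed, query_without_gap))
print(phrase_matches(indexed, query_with_gap))
```

In this model the query with positions 1 2 3 4 6 does not line up with an index holding 1 2 3 4 5, even though the terms themselves are identical.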





Re: Strange anomaly(?) with string matching in query

2009-03-25 Thread Kurt Nordstrom

Otis,

Absolutely.  Here are the tokenizers and filters for the text fieldtype in
the schema.  http://pastebin.com/f2bb249f3

Thanks!



Otis Gospodnetic wrote:
 
 That's what I suspected.  Want to paste the relevant tokenizer+filters
 sections of your schema?  The index-time and query-time analysis has to be
 the same or compatible enough, and that's not the case here.
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
