Re: Data not always returned
Hi Erick,

On Tue, Jun 7, 2011 at 11:42 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> Well, this is odd. Several questions:
> 1) what do your logs show? I'm wondering if somehow some data is getting rejected. I have no idea why that would be, but if you're seeing indexing exceptions that would explain it.
> 2) on the admin/stats page, are maxDocs and numDocs the same in the success/failure case? And are they equal to 40,000?
> 3) what does debugQuery=on show in the two cases? I'd expect it to be identical, but...
> 4) admin/schema browser. Look at your three fields and see if things like unique-terms are identical.
> 5) are the rows being returned before indexing in the same order? I'm wondering if somehow you're getting documents overwritten by having the same id (uniqueKey).
> 6) Have you poked around with Luke to see what, if anything, is dissimilar?
> These are shots in the dark, but my supposition is that somehow you're not indexing what you expect; the questions above might give us a clue where to look next.

You were right: I found a nasty problem with the indexer and Postgres which prevented some documents from being indexed. Once I fixed it, everything worked fine.

Thanks a lot for your support.

Best Regards,
--
Jérôme
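Erick's point 5 (documents silently overwritten because they share the same uniqueKey) can be checked on the fetched rows before anything is sent to Solr. The helper below is a minimal, hypothetical sketch: it assumes each row fetched from Postgres is a dict and that the uniqueKey field is called `id` (the real field name is in the attached schema.xml, which is not reproduced here).

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def duplicate_keys(rows: Iterable[Dict[str, str]], key: str = "id") -> Tuple[int, int, List[str]]:
    """Return (rows seen, distinct uniqueKey values, values occurring more than once).

    `key` defaults to a hypothetical 'id' field; use the schema's uniqueKey.
    """
    counts = Counter(row[key] for row in rows)
    total = sum(counts.values())
    dups = sorted(k for k, n in counts.items() if n > 1)
    return total, len(counts), dups


if __name__ == "__main__":
    rows = [{"id": "1"}, {"id": "2"}, {"id": "1"}]
    print(duplicate_keys(rows))
```

Since the core is emptied before each import, numDocs on admin/stats should end up equal to the distinct count; a lower numDocs suggests some documents never reached the index at all, which is what turned out to be the case here.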
Data not always returned
Hi all,

I have a problem with my index. Even though I index the same data over and over again, a couple of searches (always the same ones, since they are issued by a unit test suite) do not return the same results every time: sometimes I get 3 successes and 2 failures, sometimes the other way around. It is unpredictable.

Here is what I am trying to do: I created a new Solr core with its own solrconfig.xml and schema.xml. This core stores a list of towns which I plan to use with an auto-suggestion system based on ngrams (not the Suggester).

The indexing process is always the same:
1. the import script deletes all documents in the core with <delete><query>*:*</query></delete> and sends a <commit/>
2. the import script fetches data from Postgres, 100 rows at a time
3. the import script adds these 100 documents and sends a <commit/>
4. once all the rows (around 40,000) have been imported, the script sends an <optimize/> command

Here is what happens:
- I run the indexer once and search for 'foo': I get the results I expect, but if I search for 'bar' I get nothing.
- I reindex once again and search for 'foo': I get nothing, but if I search for 'bar' I get results.

The search is made on the name field, which is a pretty common TextField with ngrams. I also tried to physically remove the index (rm -rf path/to/index) and reindex everything, and still not all searches work: sometimes the 'foo' search works, sometimes the 'bar' one.

I have tried a lot of different things, but now I am running out of ideas. This is why I am asking for help.

Some useful information:
Solr version: 3.1.0
Solr Implementation Version: 3.1.0 1085815 - grantingersoll - 2011-03-26 18:00:07
Lucene Implementation Version: 3.1.0 1085809 - 2011-03-26 18:06:58
Java 1.5.0_24 on Mac OS X

solrconfig.xml and schema.xml are attached.

Thanks in advance for your help.

[Attachments: schema.xml.gz, solrconfig.xml.gz]
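The indexing steps described above can be sketched as the sequence of XML update messages the import script would send. This is a minimal sketch, assuming Solr's XML update message format and a hypothetical `id` uniqueKey next to the `name` field; it only builds the message bodies, leaves out the HTTP POST to the core's update handler, and stubs the Postgres fetch with an in-memory list.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, str]


def delete_all() -> str:
    """<delete><query>*:*</query></delete>: remove every document in the core."""
    root = ET.Element("delete")
    ET.SubElement(root, "query").text = "*:*"
    return ET.tostring(root, encoding="unicode")


def commit() -> str:
    return ET.tostring(ET.Element("commit"), encoding="unicode")


def optimize() -> str:
    return ET.tostring(ET.Element("optimize"), encoding="unicode")


def add_docs(rows: List[Row]) -> str:
    """Wrap one batch of rows in a single <add> message, one <doc> per row."""
    root = ET.Element("add")
    for row in rows:
        doc = ET.SubElement(root, "doc")
        for name, value in row.items():
            ET.SubElement(doc, "field", name=name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def batches(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Group rows into lists of `size`, mirroring fetching 100 rows at a time."""
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def import_messages(rows: Iterable[Row], size: int = 100) -> Iterator[str]:
    """Yield the update messages in the order the import script sends them."""
    yield delete_all()  # step 1: wipe the core...
    yield commit()      # ...and commit the delete
    for batch in batches(rows, size):  # steps 2-3: add each batch, then commit
        yield add_docs(batch)
        yield commit()
    yield optimize()    # step 4: optimize once everything is imported


if __name__ == "__main__":
    towns = [{"id": str(i), "name": f"town {i}"} for i in range(250)]
    for message in import_messages(towns):
        print(message[:60])
```

Counting the `<doc>` elements actually sent and comparing that total with numDocs after the final commit is a cheap way to notice rows lost between Postgres and Solr.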
Re: Weird behaviour with phrase queries
Hi Erick,

On Tue, Jan 25, 2011 at 1:38 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> Frankly, this puzzles me. It *looks* like it should be OK. One warning, the analysis page sometimes is a bit misleading, so beware of that. But the output of your queries make it look like the query is parsing as you expect, which leaves the question of whether your index contains what you think it does. You might get a copy of Luke, which allows you to examine what's actually in your index instead of what you think is in there. Sometimes there are surprises here!

Bingo! Some data was not in the index. Indexing it obviously fixed the problem.

> I didn't mean to re-index your whole corpus, I was thinking that you could just index a few documents in a test index so you have something small to look at. Sorry I can't spot what's happening right away.

No worries, thanks for your support :)

--
Jérôme
Weird behaviour with phrase queries
Hi,

I have a problem with phrase queries: from time to time I do not get any results, whereas I know something should be returned.

The search is run against a field of type text, whose definition is available at the following URL:
- http://pastebin.com/Ncem7M8z

This field is defined with the following configuration:

  <field name="meta_text" type="text" indexed="true" stored="true" multiValued="true" termVectors="true"/>

I use the following request handler:

  <requestHandler name="custom" class="solr.DisMaxRequestHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <float name="tie">0.01</float>
      <str name="qf">meta_text</str>
      <str name="pf">meta_text</str>
      <str name="bf"/>
      <str name="mm">1&lt;1 2&lt;-1 5&lt;-2 7&lt;60%</str>
      <int name="ps">100</int>
      <str name="q.alt">*:*</str>
    </lst>
  </requestHandler>

Depending on the phrase query I use, I get either exactly what I am looking for or nothing.

The index contents are all in French, so I thought about a possible problem with accents, but some phrase queries containing é and è characters, like académie or ingénieur, do work.

As you will see, the text type uses SnowballPorterFilterFactory configured for English. I plan to fix that by using the correct language for the index (French) and the following protwords: http://bit.ly/i8JeX6 . But apart from this mistake with the stemmer, did I do something else wrong? Did I overlook something? What could explain why I do not always get results for my phrase queries?

Thanks in advance for your feedback.

Best Regards,
--
Jérôme
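To make the `mm` value in this handler easier to read, here is a small Python port of the minimum-should-match rules as documented for DisMax: each `N<M` condition applies when there are more than N optional clauses, a negative value means "all but that many", and percentages are rounded down. It is a sketch for reading the spec, not Solr's own code, and it only handles the whitespace-separated form used in this handler.

```python
def min_should_match(clauses: int, spec: str) -> int:
    """Number of optional clauses that must match for a given mm spec (sketch)."""
    spec = spec.strip()
    if "<" in spec:
        # Conditional spec such as "1<1 2<-1": with <= N clauses, all are
        # required; above N, the value after '<' is used (later conditions win).
        result = clauses
        for cond in spec.split():
            upper, value = cond.split("<", 1)
            if clauses <= int(upper):
                return result
            result = min_should_match(clauses, value)
        return result
    if spec.endswith("%"):
        calc = clauses * int(spec[:-1]) / 100
        # int() truncates toward zero; a negative percentage means "all but".
        result = clauses + int(calc) if calc < 0 else int(calc)
    else:
        n = int(spec)
        result = clauses + n if n < 0 else n
    # Never require more clauses than exist, nor a negative number.
    return max(0, min(clauses, result))


if __name__ == "__main__":
    spec = "1<1 2<-1 5<-2 7<60%"
    for n in range(1, 11):
        print(n, min_should_match(n, spec))
```

A quoted phrase is parsed as a single clause, so `mm` only comes into play for unquoted multi-word queries.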
Re: Weird behaviour with phrase queries
Erick,

On Mon, Jan 24, 2011 at 9:57 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> Hmmm, I don't see any screen shots. Several things:
> 1) If your stopword file has comments, I'm not sure what the effect would be.

Ha, I thought comments were supported in stopwords.txt.

> 2) Something's not right here, or I'm being fooled again. Your withresults xml has this line:
> <str name="parsedquery">+DisjunctionMaxQuery((meta_text:"ecol d ingenieur")~0.01) ()</str>
> and your noresults has this line:
> <str name="parsedquery">+DisjunctionMaxQuery((meta_text:"academi charpenti")~0.01) DisjunctionMaxQuery((meta_text:"academi charpenti"~100)~0.01)</str>
> the empty () in the first one often means you're NOT going to your configured dismax parser in solrconfig.xml. Yet that doesn't square with your custom qt, so I'm puzzled. Could we see your raw query string on the way in? It's almost as if you defined qt in one and defType in the other, which are not equivalent.

You are right, I fixed this problem (my bad).

> 3) It may take 12 hours to index, but you could experiment with a smaller subset. You say you know that the noresults one should return documents, what proof do you have? If there's a single document that you know should match this, just index it and a few others and you should be able to make many runs until you get to the bottom of this...

I could, but I always thought I had to fully re-index after updating schema.xml. If I update only a few documents, will the changes be taken into account without breaking the rest?

> And obviously your stemming is happening on the query, are you sure it's happening at index time too?

Since you did not get the screenshots, you will find attached the full output of the analysis for a phrase that works and for another that does not.

Thanks for your support.

Best Regards,
--
Jérôme

[Attachments: analysis-noresults.html.gz, analysis-withresults.html.gz]
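Erick's point about qt versus defType is easiest to check by comparing the raw query strings. A minimal sketch, assuming a local Solr 3.x instance at a hypothetical URL with request dispatching through /select: `qt=custom` routes the request to the handler defined in solrconfig.xml, so its defaults (qf, pf, ps, mm) apply, whereas `defType=dismax` only switches the query parser of the handler that receives the request and brings none of the custom handler's defaults with it.

```python
from urllib.parse import urlencode

# Hypothetical base URL; adjust host, port and core name to the real setup.
SELECT_URL = "http://localhost:8983/solr/select"


def via_handler(q: str) -> str:
    """Route the query to the 'custom' handler so its dismax defaults apply."""
    return SELECT_URL + "?" + urlencode({"qt": "custom", "q": q, "debugQuery": "on"})


def via_deftype(q: str) -> str:
    """Only switch the parser; qf/pf/ps/mm from 'custom' are NOT applied."""
    return SELECT_URL + "?" + urlencode({"defType": "dismax", "q": q, "debugQuery": "on"})


if __name__ == "__main__":
    phrase = '"académie charpentier"'
    print(via_handler(phrase))
    print(via_deftype(phrase))
```

Issuing both requests with debugQuery=on and comparing their parsedquery entries should make the difference between the two paths visible.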