Re: Solr Shard - Strange results
I'm not quite sure which logs you are talking about, but in tomcat/logs/catalina.out I found the following [note: I can't copy/paste, so I am typing up a summary].

I execute the command:

    localhost:8080/bravo/select?q=fred&rows=102&start=0&shards=localhost:8080/alpha,localhost:8080/bravo

In this example, alpha has 27 instances of fred, while bravo has 0.

Then, in catalina.out:

- There is the request for the command I sent, shards parameter and all. It has the proper query string.

- Then I see the two requests sent to the shards, alpha and bravo. These two requests interleave until they are finished:

    INFO: REQUEST URI =/alpha/select
    INFO: REQUEST URI =/bravo/select

  The parameters have changed to:

    wt=javabin&fsv=true&version=2.2&fl=docNumber,score&q=fred&rows=102&isShard=true&start=0

- Then two INFO lines scroll across:

    INFO: [] webapp=/bravo path=/select params={wt=javabin&fsv=true&version=2.2&fl=docNumber,score&q=fred&rows=102&isShard=true&start=0} hits=0 status=0 QTime=1
    INFO: [] webapp=/alpha path=/select params={wt=javabin&fsv=true&version=2.2&fl=docNumber,score&q=fred&rows=102&isShard=true&start=0} hits=27 status=0 QTime=1

  **Note: hits=27

- Then I see some octet-streams being transferred, with status 200, so those are OK.

- Then I see something peculiar. It calls alpha with the following parameters:

    wt=javabin&version=2.2&ids=ABC-1353,ABC-408,ABC-1355,ABC-1824,ABC-1354,FRED-ID-27,55&q=fred&rows=102&parameter=isShard=true&start=0

Performing this query on my own (without the wt=javabin) gives me numFound=2, the same result set I get back from the overarching query. Changing it to rows=10 still gives me numFound=2 and 2 <doc>'s. So this query on its own does not show the strange behavior I was seeing with the overarching query, where numFound and the number of <doc>'s did not match.

This does raise the question: why did it add

    ids=ABC-1353,ABC-408,ABC-1355,ABC-1824,ABC-1354,FRED-ID-27,55

to the query? The values are in the format that would be under docNumber, if that helps. Any thoughts?
I will do some research on the docs with those particular IDs in the meantime.

Here's the configuration information. I have only posted the differences from the default files in solr/example/solr/conf.

[solrconfig.xml]

    <config>
      <dataDir>${solr.data.dir:/data/indices/bravo/solr/data}</dataDir>
      <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
        <lst name="defaults">
          <str name="config">/data/indices/bravo/solr/conf/data-config.xml</str>
        </lst>
      </requestHandler>
    </config>

[schema.xml]

    <schema>
      <fields>
        <field name="docNumber" type="text" indexed="true" stored="true" />
        <field name="column1" type="text" indexed="true" stored="true" />
        <field name="column2" type="text" indexed="true" stored="true" />
        <field name="column3" type="text" indexed="true" stored="true" />
        <field name="column4" type="text" indexed="true" stored="true" />
        <field name="column5" type="text" indexed="true" stored="true" />
        <field name="column6" type="text" indexed="true" stored="true" />
        <field name="column7" type="text" indexed="true" stored="true" />
        <field name="column8" type="text" indexed="true" stored="true" />
        <field name="column9" type="text" indexed="true" stored="true" />
      </fields>
      <uniqueKey>docNumber</uniqueKey>
      <defaultSearchField>column2</defaultSearchField>
    </schema>

[data-config.xml]

    <dataConfig>
      <dataSource type="JdbcDataSource" driver="com.metamatrix.jdbc.MMDriver" url="jdbc:metamatrix:b...@mms://hostname:port" user="username" password="password"/>
      <document name="DOC_NAME">
        <entity name="ENT_NAME" query="select * from ASDF.TABLE">
          <field column="TABLE_COL_NO" name="docNumber" />
          <field column="TABLE_COL_1" name="column1" />
          <field column="TABLE_COL_2" name="column2" />
          <field column="TABLE_COL_3" name="column3" />
          <field column="TABLE_COL_4" name="column4" />
          <field column="TABLE_COL_5" name="column5" />
          <field column="TABLE_COL_6" name="column6" />
          <field column="TABLE_COL_7" name="column7" />
          <field column="TABLE_COL_8" name="column8" />
          <field column="TABLE_COL_9" name="column9" />
        </entity>
      </document>
    </dataConfig>

Yonik Seeley-2 wrote:
> On Fri, May 15, 2009 at 4:11 PM, CB-PO <charles.bush...@gmail.com> wrote:
>> Yeah, the first thing I thought of was that perhaps there was something wrong with the uniqueKey and they were clashing between the indexes, however upon visual inspection of the data the field we are using as the unique key in each of the indexes is grossly different between the two databases, so there is no chance of them clashing.
>
> Yes, but is the same fieldname and FieldType used for both indexes? (that's sort of a requirement)
>
> You might also
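
For readers following the log summary at the top of this message: the two rounds of shard requests look like the two stages of a distributed query, one that collects only keys and scores from every shard and one that fetches the stored fields of the merged top documents by key, which would account for the ids= parameter. The sketch below only mirrors the parameters typed up from catalina.out. The function names are invented for illustration, the rows = start + rows rule is an assumption, and none of this is Solr's own code.

```python
from urllib.parse import urlencode


def first_stage_params(q, start, rows, key_field="docNumber"):
    """Per-shard request as summarized from the log: only key and score are requested."""
    return {
        "wt": "javabin",
        "fsv": "true",
        "version": "2.2",
        "fl": f"{key_field},score",
        "q": q,
        # Assumption: each shard is asked from offset 0 for start+rows hits,
        # because any of them could end up on the requested page.
        "rows": start + rows,
        "isShard": "true",
        "start": 0,
    }


def second_stage_params(q, ids):
    """Follow-up request that fetches the merged top documents by key (subset of the logged parameters)."""
    return {
        "wt": "javabin",
        "version": "2.2",
        "ids": ",".join(ids),
        "q": q,
        "isShard": "true",
    }


if __name__ == "__main__":
    # safe="," keeps the comma-separated lists readable in the printed query string.
    print(urlencode(first_stage_params("fred", 0, 102), safe=","))
    print(urlencode(second_stage_params("fred", ["ABC-1353", "ABC-408"]), safe=","))
```

Read this way, the keys listed in ids= would simply be the docNumber values that survived the merge of the first-stage results.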
Re: Solr Shard - Strange results
I'm not quite sure how that would make a difference... From my most recent testing, the problem seems to be related to the shards feature adding ids=[...] to one of the queries. However, I will give it a try.

Yao Ge wrote:
> Maybe you want to try with the docNumber field type as string and see if it would make a difference.
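
As background to the suggestion of switching docNumber to string: a text field is analyzed into terms before indexing, while a string field is indexed as one unmodified term. The sketch below uses a deliberately crude stand-in for an analyzer (lowercase, then split into runs of letters and digits). The real behavior depends on the fieldType definition in schema.xml, so treat the splitting rule as an assumption.

```python
import re


def rough_text_tokens(value):
    """Crude stand-in for a tokenizing 'text' field: lowercase, then split into letter and digit runs."""
    return re.findall(r"[a-z]+|[0-9]+", value.lower())


def string_tokens(value):
    """A 'string' field keeps the whole value as a single, unmodified term."""
    return [value]


if __name__ == "__main__":
    for key in ["ABC-1353", "ABC-408", "FRED-ID-27"]:
        print(key, rough_text_tokens(key), string_tokens(key))
```

Under this rough model, ABC-1353 and ABC-408 share the term abc once analyzed, whereas as string values they stay distinct as whole keys.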
Solr Shard - Strange results
Hello,

What we have done is create multiple Solr instances on the same server, where each instance is populated with the DataImportHandler from a different DB. The information in each DB is similar, so the schemas for the instances are pretty much the same. Our goal is to use the shards feature to combine the results into a single table.

The problem is that when we use shards, numFound behaves very strangely. Here are some examples.

2 Solr instances:

    localhost:8080/alpha/
    localhost:8080/bravo/

Let's say I'm searching for the term fred. If I do:

    localhost:8080/alpha/select?q=fred&rows=10&start=0

I get numFound=0. That's fine.

    localhost:8080/bravo/select?q=fred&rows=10&start=0

I get:

    <result name="response" numFound="27" start="0">

followed by 10 <doc></doc>'s. This is also fine.

When I do these [same result for both]:

    localhost:8080/alpha/select?q=fred&rows=10&start=0&shards=localhost:8080/alpha,localhost:8080/bravo
    localhost:8080/bravo/select?q=fred&rows=10&start=0&shards=localhost:8080/alpha,localhost:8080/bravo

I get:

    <result name="response" numFound="18" start="0">

followed by 1 <doc></doc>.

So... something weird happened. There should be 27 results, but even if it thought there were only 18 results, it should have displayed 10 of them. Alright, so I tried:

    localhost:8080/alpha/select?q=fred&rows=1&start=0&shards=localhost:8080/alpha,localhost:8080/bravo
    localhost:8080/bravo/select?q=fred&rows=1&start=0&shards=localhost:8080/alpha,localhost:8080/bravo

I got:

    <result name="response" numFound="27" start="0">

followed by 1 <doc></doc>. Seems to be working alright with this... But let's try:

    localhost:8080/alpha/select?q=fred&rows=1&start=1&shards=localhost:8080/alpha,localhost:8080/bravo
    localhost:8080/bravo/select?q=fred&rows=1&start=1&shards=localhost:8080/alpha,localhost:8080/bravo

I got:

    <result name="response" numFound="26" start="1">

with no <doc></doc>'s... wtf? I continued this up to start=10, and numFound decreased by 1 every time, with no more <doc></doc>'s.

So I changed it to rows=100&start=0 and I got:

    <result name="response" numFound="2" start="0">

followed by 2 <doc></doc>'s.

This issue happens with multiple search queries; with some other search queries, however, it works fine and returns the proper number for numFound and however many <doc>'s there are supposed to be.

Has anyone seen this issue before?

--
View this message in context: http://www.nabble.com/Solr-Shard---Strange-results-tp23561201p23561201.html
Sent from the Solr - User mailing list archive at Nabble.com.
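
The numbers reported in this message fit a simple pattern: in each sharded request, numFound equals 27 minus the number of collected hits that did not survive as distinct documents. One hypothesis consistent with that, and only a hypothesis, is that the coordinating instance drops collected hits whose uniqueKey value it has already seen and lowers numFound by one for each. The sketch below is a toy model under that assumption, with an invented key list (26 hits sharing key "A", one hit with key "B"). It is not Solr's code and not the real data.

```python
def merge_shard_hits(shard_results, start, rows):
    """Toy coordinator merge.

    shard_results: one (num_found, hits) pair per shard, where hits is a list
    of (key, score) tuples already sorted by descending score.
    Returns (num_found, keys_on_requested_page).
    """
    num_found = sum(found for found, _ in shard_results)

    # Each shard contributes at most its top start+rows hits.
    collected = []
    for _, hits in shard_results:
        collected.extend(hits[: start + rows])
    collected.sort(key=lambda hit: hit[1], reverse=True)

    # Assumption: a hit whose key was already seen is dropped and numFound is lowered.
    seen = set()
    merged = []
    for key, _score in collected:
        if key in seen:
            num_found -= 1
            continue
        seen.add(key)
        merged.append(key)
    return num_found, merged[start : start + rows]


if __name__ == "__main__":
    # Hypothetical data: alpha has no hits, bravo has 27 hits but only 2 distinct keys.
    bravo_hits = [("A", 1.0 - i * 0.01) for i in range(26)] + [("B", 0.5)]
    shards = [(0, []), (27, bravo_hits)]
    for start, rows in [(0, 10), (0, 1), (1, 1), (0, 100)]:
        print(start, rows, merge_shard_hits(shards, start, rows))
```

With this invented data the model yields the same (numFound, number of docs) pairs as reported: (18, 1) for rows=10, (27, 1) for rows=1 and start=0, (26, 0) for rows=1 and start=1, and (2, 2) for rows=100. That makes duplicate or colliding key values worth ruling out, without proving they are the cause.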
Re: Solr Shard - Strange results
Yeah, the first thing I thought of was that perhaps there was something wrong with the uniqueKey and the keys were clashing between the indexes. However, upon visual inspection of the data, the field we are using as the unique key is grossly different between the two databases, so there is no chance of them clashing.

Unfortunately, I cannot provide the data in order to reproduce this, but I will try to produce a set of sample data that reproduces the problem. I must add that when we were testing the shards feature on smaller sets of data (100,000 docs per index), we did not notice this issue, but when we fully filled each index (1,000,000 docs per index), the issue became more apparent. This is not to say that the issue wasn't there before; we just never noticed it.

On Monday, I will provide some configuration information and see if that helps.

Yonik Seeley-2 wrote:
> Certainly does seem strange.
> Do you have the same uniqueKeyField in both indexes?
> Any way you can provide some configuration and some data to reproduce this?
>
> -Yonik
>
> On Fri, May 15, 2009 at 10:40 AM, CB-PO <charles.bush...@gmail.com> wrote:
>> Hello,
>> [...]
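
At around 1,000,000 docs per index, visual inspection can miss a clash, so a mechanical check may be worth the effort. The sketch below assumes the docNumber values of each index have already been exported to plain Python lists (how to export them is not covered here). The function names and sample keys are made up for illustration.

```python
from collections import Counter


def duplicate_keys(keys):
    """Return {key: count} for every key that occurs more than once within one index."""
    return {key: count for key, count in Counter(keys).items() if count > 1}


def shared_keys(keys_a, keys_b):
    """Return the set of keys that occur in both indexes."""
    return set(keys_a) & set(keys_b)


if __name__ == "__main__":
    # Hypothetical exports of the uniqueKey field from each index.
    alpha_keys = ["ABC-1353", "ABC-408", "ABC-1355"]
    bravo_keys = ["XYZ-1", "XYZ-2", "XYZ-2", "ABC-408"]
    print("duplicates inside bravo:", duplicate_keys(bravo_keys))
    print("keys in both indexes:", shared_keys(alpha_keys, bravo_keys))
```

Checking within each index as well as across them covers both ways a key could repeat in a combined result.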