Re: Solr Shard - Strange results

2009-05-18 Thread CB-PO

I'm not quite sure what logs you are talking about, but in the
tomcat/logs/catalina.out logs, I found the following [note, I can't
copy/paste, so I am typing up a summary]:

I execute the command:
localhost:8080/bravo/select?q=fred&rows=102&start=0&shards=localhost:8080/alpha,localhost:8080/bravo
 
In this example, alpha has 27 instances of fred, while bravo has 0.
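
Since the request above was typed by hand, here is a minimal sketch of building the same sharded query string programmatically so the parameter separators cannot get lost. The helper name build_select_url is hypothetical; the hosts, core names, and parameters are the ones from this message.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_select_url(core_base, query, rows, start, shards):
    """Build a /select URL with a comma-separated shards parameter."""
    params = [
        ("q", query),
        ("rows", rows),
        ("start", start),
        ("shards", ",".join(shards)),
    ]
    # Keep ',', ':' and '/' readable inside the shards value.
    return f"http://{core_base}/select?{urlencode(params, safe=',:/')}"

url = build_select_url(
    "localhost:8080/bravo",
    "fred",
    rows=102,
    start=0,
    shards=["localhost:8080/alpha", "localhost:8080/bravo"],
)
print(url)
```

Parsing the result back with parse_qs(urlsplit(url).query) is a quick way to confirm that q, rows, start, and shards arrive as four separate parameters.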

Then in the catalina.out:

-There is the request for the command I sent, shards parameter and all.  It
has the proper queryString.
-Then I see the two requests sent to the shards, alpha and bravo.  These two
requests weave between each other until they are finished:
 INFO: REQUEST URI =/alpha/select
 INFO: REQUEST URI =/bravo/select
  The parameters have changed to:
 
wt=javabin&fsv=true&version=2.2&fl=docNumber,score&q=fred&rows=102&isShard=true&start=0

-Then 2 INFOs scroll across:
INFO: [] webapp=/bravo path=/select
params={wt=javabin&fsv=true&version=2.2&fl=docNumber,score&q=fred&rows=102&isShard=true&start=0}
hits=0 status=0 QTime=1
INFO: [] webapp=/alpha path=/select
params={wt=javabin&fsv=true&version=2.2&fl=docNumber,score&q=fred&rows=102&isShard=true&start=0}
hits=27 status=0 QTime=1
**Note, hits=27

-Then I see some octet-streams being transferred, with status 200, so those
are OK.

-Then I see something peculiar:
  It calls alpha with the following parameters: 
wt=javabin&version=2.2&ids=ABC-1353,ABC-408,ABC-1355,ABC-1824,ABC-1354,FRED-ID-27,55&q=fred&rows=102&parameter=isShard=true&start=0

Performing this query on my own (without the wt=javabin) gives me
numFound=2, the same result set I get back from the overarching query.  
Changing it to rows=10, it gives me numFound=2, and 2 docs.  This is not
the strange behavior I was seeing with the overarching query, where
numFound and the number of docs did not match.

This does raise the question: why did it add
ids=ABC-1353,ABC-408,ABC-1355,ABC-1824,ABC-1354,FRED-ID-27,55 to the
query?  They are in the format that would be under docNumber, if that helps.
Any thoughts?  I will do some research on the docs with those particular IDs
in the meantime.
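
For context, a distributed Solr query runs in two phases: the coordinating node first asks each shard for unique keys and scores (the docNumber,score request in the log), merges those lists, and then asks each shard for the stored fields of the documents it picked by sending their keys in an ids parameter. The following is a minimal sketch of that flow, not Solr's actual code; the function names and the KEY-n document keys are hypothetical, and splitting the logged ids value on commas is an assumption about how the list would be read.

```python
def shard_top_hits(docs, rows):
    """Phase one on a shard: report the hit count and the top (key, score) pairs."""
    ranked = sorted(docs, key=lambda d: d[1], reverse=True)
    return {"hits": len(docs), "top": ranked[:rows]}

def merge_and_pick_ids(responses, start, rows):
    """Coordinator: sum hit counts, merge by score, group the page's keys by shard."""
    num_found = sum(r["hits"] for r in responses.values())
    merged = sorted(
        ((score, key, shard)
         for shard, r in responses.items()
         for key, score in r["top"]),
        key=lambda t: t[0],
        reverse=True,
    )
    ids_by_shard = {}
    for _score, key, shard in merged[start:start + rows]:
        ids_by_shard.setdefault(shard, []).append(key)
    return num_found, ids_by_shard

# Hypothetical data: 27 matching docs on alpha, none on bravo.
alpha_docs = [(f"KEY-{i}", 1.0 - i * 0.01) for i in range(27)]
responses = {
    "alpha": shard_top_hits(alpha_docs, rows=102),
    "bravo": shard_top_hits([], rows=102),
}
num_found, ids_by_shard = merge_and_pick_ids(responses, start=0, rows=102)

# The value of the second-phase ids parameter sent to alpha in this sketch.
ids_param = ",".join(ids_by_shard["alpha"])

# The ids value copied from the log above, split on commas (assumption).
logged_ids = "ABC-1353,ABC-408,ABC-1355,ABC-1824,ABC-1354,FRED-ID-27,55".split(",")
print(num_found, len(ids_by_shard["alpha"]), len(logged_ids))
```

In this sketch a healthy exchange with hits=27 and rows=102 would request 27 keys from alpha, whereas the logged ids value holds only 7 comma-separated entries, and the last key appears to contain a comma of its own.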

Here's the configuration information.  I only posted the difference from the
default files in the solr/example/solr/conf

[solrconfig.xml]
<config>
<dataDir>${solr.data.dir:/data/indices/bravo/solr/data}</dataDir>

<requestHandler name="/dataimport"
class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">/data/indices/bravo/solr/conf/data-config.xml</str>
</lst>
</requestHandler>
</config>


[schema.xml]
<schema>
<fields>
<field name="docNumber" type="text" indexed="true" stored="true" />
<field name="column1" type="text" indexed="true" stored="true" />
<field name="column2" type="text" indexed="true" stored="true" />
<field name="column3" type="text" indexed="true" stored="true" />
<field name="column4" type="text" indexed="true" stored="true" />
<field name="column5" type="text" indexed="true" stored="true" />
<field name="column6" type="text" indexed="true" stored="true" />
<field name="column7" type="text" indexed="true" stored="true" />
<field name="column8" type="text" indexed="true" stored="true" />
<field name="column9" type="text" indexed="true" stored="true" />
</fields>
<uniqueKey>docNumber</uniqueKey>
<defaultSearchField>column2</defaultSearchField>
</schema>


[data-config.xml]
<dataConfig>
<dataSource type="JdbcDataSource" driver="com.metamatrix.jdbc.MMDriver"
url="jdbc:metamatrix:b...@mms://hostname:port" user="username"
password="password"/>
<document name="DOC_NAME">
<entity name="ENT_NAME" query="select * from ASDF.TABLE">
<field column="TABLE_COL_NO" name="docNumber" />
<field column="TABLE_COL_1" name="column1" />
<field column="TABLE_COL_2" name="column2" />
<field column="TABLE_COL_3" name="column3" />
<field column="TABLE_COL_4" name="column4" />
<field column="TABLE_COL_5" name="column5" />
<field column="TABLE_COL_6" name="column6" />
<field column="TABLE_COL_7" name="column7" />
<field column="TABLE_COL_8" name="column8" />
<field column="TABLE_COL_9" name="column9" />
</entity>
</document>
</dataConfig>





Yonik Seeley-2 wrote:
 
 On Fri, May 15, 2009 at 4:11 PM, CB-PO charles.bush...@gmail.com wrote:
 Yeah, the first thing I thought of was that perhaps there was something wrong
 with the uniqueKey and they were clashing between the indexes, however upon
 visual inspection of the data the field we are using as the unique key in
 each of the indexes is grossly different between the two databases, so there
 is no chance of them clashing.
 
 Yes, but is the same fieldname and FieldType used for both indexes?
 (that's sort of a requirement)
 
 You might also

Re: Solr Shard - Strange results

2009-05-18 Thread CB-PO

I'm not quite sure how that would make a difference... From my most recent
testing, it seems that the problem is related to the Shards element adding
ids=[...] to one of the queries.  However, I will give it a try.

Yao Ge wrote:
 
 Maybe you want to try with the docNumber field type as string and see if it
 would make a difference.
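
To illustrate why the field type might matter: a text field is run through an analyzer and indexed as several terms, while a string field is indexed as one exact term. The sketch below uses a toy tokenizer as a stand-in; it is an assumption for illustration only and is not the analysis chain of the text type in the example schema. The keys are taken from the ids list earlier in the thread.

```python
import re

def analyze_as_text(value):
    """Toy stand-in for an analyzed field: split into lowercase letter/digit runs."""
    return [t.lower() for t in re.findall(r"[A-Za-z]+|[0-9]+", value)]

def analyze_as_string(value):
    """An unanalyzed field keeps the whole value as a single term."""
    return [value]

keys = ["ABC-1353", "ABC-408", "FRED-ID-27,55"]

text_terms = {k: analyze_as_text(k) for k in keys}
string_terms = {k: analyze_as_string(k) for k in keys}

# Terms that two different keys have in common under each treatment.
shared_text = set(text_terms["ABC-1353"]) & set(text_terms["ABC-408"])
shared_string = set(string_terms["ABC-1353"]) & set(string_terms["ABC-408"])

for k in keys:
    print(k, text_terms[k], string_terms[k])
print(shared_text, shared_string)
```

Under the toy tokenizer, ABC-1353 and ABC-408 share the term abc, so a lookup by key could match more than one document; as single string terms the two keys have nothing in common.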
 
 

Solr Shard - Strange results

2009-05-15 Thread CB-PO

Hello, 
What we have done is created multiple solr instances on the same server,
where each instance is created with the DataImportHandler from a different
DB.  The information on each DB is similar, so the schemas for each
instance are pretty much the same.  Our goal is to use the shards feature to
combine the results into a single table.

The problem is that when we use shards, the numFound is acting very
strangely.  Here are some examples:

2 solr instances:
localhost:8080/alpha/
localhost:8080/bravo/

Let's say I'm searching for the term fred.  If I do:

localhost:8080/alpha/select?q=fred&rows=10&start=0
I get numFound=0.  That's fine.

localhost:8080/bravo/select?q=fred&rows=10&start=0
I get: <result name="response" numFound="27" start="0">  followed by 10
<doc></doc>'s.  This is also fine.

When I do these [same result for both]:
localhost:8080/alpha/select?q=fred&rows=10&start=0&shards=localhost:8080/alpha,localhost:8080/bravo
localhost:8080/bravo/select?q=fred&rows=10&start=0&shards=localhost:8080/alpha,localhost:8080/bravo

I get: <result name="response" numFound="18" start="0">  followed by 1
<doc></doc>

So... something weird happened... There should be 27 results, but even if it
thought there were only 18 results, it should have displayed 10 of them.


Alright, so I tried:

localhost:8080/alpha/select?q=fred&rows=1&start=0&shards=localhost:8080/alpha,localhost:8080/bravo
localhost:8080/bravo/select?q=fred&rows=1&start=0&shards=localhost:8080/alpha,localhost:8080/bravo

I got: <result name="response" numFound="27" start="0">  followed by 1
<doc></doc>
Seems to be working alright with this...  But let's try...

localhost:8080/alpha/select?q=fred&rows=1&start=1&shards=localhost:8080/alpha,localhost:8080/bravo
localhost:8080/bravo/select?q=fred&rows=1&start=1&shards=localhost:8080/alpha,localhost:8080/bravo

I got: <result name="response" numFound="26" start="1">  with no
<doc></doc>'s... wtf?

I continued this up to start=10, and numFound decreased by 1 every time,
with no more <doc></doc>'s.
So I changed it to rows=100&start=0 and I got:  <result name="response"
numFound="2" start="0"> followed by 2 <doc></doc>'s.

This issue is happening with multiple search queries; however, with some
other search queries, it works fine and returns the proper number for
numFound, and however many docs there are supposed to be.

Has anyone seen this issue before?
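
For reference, the arithmetic behind the expected results can be sketched as follows. This assumes that unique keys never collide across shards, so numFound is simply the sum of the per-shard hit counts; the function name expected_page is hypothetical, and the observed values are the ones reported in this message.

```python
def expected_page(shard_hits, start, rows):
    """Return (numFound, number of docs on the page) for a sharded query."""
    num_found = sum(shard_hits)
    docs_returned = max(0, min(rows, num_found - start))
    return num_found, docs_returned

shard_hits = [0, 27]  # alpha, bravo for q=fred

# (start, rows) -> (numFound, docs) as observed in the sharded responses.
observed = {
    (0, 10): (18, 1),
    (0, 1): (27, 1),
    (1, 1): (26, 0),
    (0, 100): (2, 2),
}

for (start, rows), seen in observed.items():
    expected = expected_page(shard_hits, start, rows)
    status = "ok" if expected == seen else "MISMATCH"
    print(f"start={start} rows={rows} expected={expected} observed={seen} {status}")
```

Only the rows=1, start=0 request matches the expectation; the other three differ in numFound, in the number of docs, or in both.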

-- 
View this message in context: 
http://www.nabble.com/Solr-Shard---Strange-results-tp23561201p23561201.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr Shard - Strange results

2009-05-15 Thread CB-PO

Yeah, the first thing I thought of was that perhaps there was something wrong
with the uniqueKey and they were clashing between the indexes, however upon
visual inspection of the data the field we are using as the unique key in
each of the indexes is grossly different between the two databases, so there
is no chance of them clashing.

Unfortunately, I cannot provide the data needed to reproduce this; however, I
will try to produce a set of sample data that reproduces the problem. 
I must add that when we were testing the shards feature on smaller
sets of data, we did not notice this issue ( < 100,000 docs per index ), but
when we fully filled each index, the issue became more apparent ( >
1,000,000 docs per index ).  This is not to say that the issue wasn't there
before; we just never noticed it.

On Monday, I will provide some configuration information and see if that
helps.


Yonik Seeley-2 wrote:
 
 Certainly does seem strange.
 Do you have the same uniqueKeyField in both indexes?
 Any way you can provide some configuration and some data to reproduce
 this?
 
 -Yonik
 
