Results Order When Performing Wildcard Query

2013-04-09 Thread P Williams

Hi,

I wrote a test of my application which revealed a Solr oddity (I think).
The test, which I wrote on Windows 7 and which makes use of the
solr-test-framework (http://lucene.apache.org/solr/4_1_0/solr-test-framework/index.html),
fails under Ubuntu 12.04 because the Solr results I expected for a wildcard
query of the test data are ordered differently under Ubuntu than under
Windows. On both Windows and Ubuntu, all items in the result set have a
score of 1.0 and appear to be ordered by docid (which looks like it
corresponds to alphabetical unique id on Windows but not on Ubuntu). I'm
guessing that the root of my issue is that a different docid was assigned
to the same document on each operating system.

The data was imported using a DataImportHandler configuration during a
@BeforeClass step in my JUnit test on both systems.

Any suggestions on how to ensure a consistently ordered wildcard query
result set for testing?

Thanks,
Tricia

Re: Results Order When Performing Wildcard Query

2013-04-09 Thread Shawn Heisey


On 4/9/2013 12:08 PM, P Williams wrote:

> I'm guessing that the root of my issue is that a different docid was
> assigned to the same document on each operating system.


It might be due to differences in how Java works on the two platforms,
or even something as simple as different Java versions. I don't know a
lot about the underlying Lucene stuff, so this next sentence may not be
correct: if you are not starting from an index whose actual index
directory was deleted before the test started (rather than just deleting
all documents), that might produce different internal Lucene document
ids.



> The data was imported using a DataImportHandler configuration during a
> @BeforeClass step in my JUnit test on both systems.
>
> Any suggestions on how to ensure a consistently ordered wildcard query
> result set for testing?


Include an explicit sort parameter. That way the order will depend on the
data, not on the internal Lucene representation.
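Why an explicit sort helps can be shown without Solr at all. The following stdlib-only Java sketch is a model, not Solr or Lucene code; the class, the Hit type and the document ids are hypothetical. Every hit scores 1.0, so, as observed in this thread, the hits fall back to docid order, which is indexing order. Two "machines" that index the same documents in a different order therefore get different default orderings, but the same ordering once the hits are sorted on the unique key (the analogue of sort=id asc, assuming id is the schema's uniqueKey field).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Stdlib-only model, not Solr/Lucene code: every hit scores 1.0, so the only
// thing left to order the hits is the docid, i.e. the order of indexing.
class ExplicitSortSketch {

    // Hypothetical stand-in for a search hit.
    static final class Hit {
        final String id;    // unique key (assumed to be the "id" field)
        final int docid;    // internal id, assigned in indexing order
        final float score;

        Hit(String id, int docid, float score) {
            this.id = id;
            this.docid = docid;
            this.score = score;
        }
    }

    // Simulates indexing: docids are handed out in arrival order.
    static List<Hit> index(List<String> idsInArrivalOrder) {
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < idsInArrivalOrder.size(); i++) {
            hits.add(new Hit(idsInArrivalOrder.get(i), i, 1.0f));
        }
        return hits;
    }

    // No explicit sort: score descending, equal scores fall back to docid.
    static List<String> defaultOrder(List<Hit> hits) {
        Comparator<Hit> byScoreDesc = Comparator.comparingDouble((Hit h) -> h.score).reversed();
        return ids(hits, byScoreDesc.thenComparingInt(h -> h.docid));
    }

    // Explicit sort on the unique key, analogous to sort=id asc.
    static List<String> sortedById(List<Hit> hits) {
        return ids(hits, Comparator.comparing((Hit h) -> h.id));
    }

    private static List<String> ids(List<Hit> hits, Comparator<Hit> order) {
        return hits.stream().sorted(order).map(h -> h.id).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Hit> windows = index(List.of("doc-a", "doc-b", "doc-c"));
        List<Hit> ubuntu = index(List.of("doc-c", "doc-a", "doc-b"));
        System.out.println(defaultOrder(windows)); // [doc-a, doc-b, doc-c]
        System.out.println(defaultOrder(ubuntu));  // [doc-c, doc-a, doc-b]
        System.out.println(sortedById(windows));   // [doc-a, doc-b, doc-c]
        System.out.println(sortedById(ubuntu));    // [doc-a, doc-b, doc-c]
    }
}
```

The default ordering differs between the two runs even though the data is identical; the explicit sort makes the result a function of the data only.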


Thanks,
Shawn

Re: Results Order When Performing Wildcard Query

2013-04-09 Thread P Williams

Hey Shawn,

My gut says the difference in assignment of docids has to do with how the
FileListEntityProcessor (http://wiki.apache.org/solr/DataImportHandler#FileListEntityProcessor)
works on the two operating systems. My guess is that the documents are
updated/imported in a different order, but I haven't tested that theory. I
still think it's kind of odd that there would be a difference.

Indexes are created from scratch in my test, so it's not that. java -version
reports the same values on both machines:

    java version 1.7.0_17
    Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
    Java HotSpot(TM) Client VM (build 23.7-b01, mixed mode)

The explicit (arbitrary non-score) sort parameter will work as a
work-around to get my test to pass in both environments while I think about
this some more. Thanks!
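An arbitrary non-score sort is enough for the test. If relevance order matters elsewhere, one option is to keep score as the primary sort key and add the unique key only as a tiebreaker; Solr's sort parameter accepts a comma-separated list such as score desc,id asc. The stdlib sketch below only builds such a /select query string with java.net.URLEncoder; the query title:foo* and the field names title and id (with id assumed to be the uniqueKey) are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Builds a form-encoded query string for a Solr /select request.
// Field names ("title", "id") are hypothetical; "id" is assumed to be the uniqueKey.
class SortParamSketch {

    static String queryString(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", "title:foo*");
        // Relevance first; the unique key breaks ties so equal scores are deterministic.
        params.put("sort", "score desc,id asc");
        System.out.println("/select?" + queryString(params));
        // /select?q=title%3Afoo*&sort=score+desc%2Cid+asc
    }
}
```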

Cheers,
Tricia

On Tue, Apr 9, 2013 at 2:13 PM, Shawn Heisey s...@elyograg.org wrote:

> Include an explicit sort parameter. That way it will depend on the data,
> not the internal Lucene representation.

Re: Results Order When Performing Wildcard Query

2013-04-09 Thread Chris Hostetter


: My gut says the difference in assignment of docids has to do with how the
: FileListEntityProcessor (http://wiki.apache.org/solr/DataImportHandler#FileListEntityProcessor)

docids just represent the order in which documents are added to the index.
If you use DIH with FileListEntityProcessor to create one doc per file, then
the order of the documents will (if I remember correctly) correspond to the
order of the files returned by the OS, which may vary.
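The OS-dependence is visible in the JDK itself: the Javadoc for java.io.File.listFiles() says there is no guarantee that the returned files appear in any specific order. The stdlib sketch below is not DIH code and does not represent a DIH option; it just shows, under the assumption that you control how files are enumerated, how sorting by file name gives the same sequence on every platform. String's natural ordering compares UTF-16 code units, so it does not depend on the platform locale. The directory and file names are illustrative.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortedListingSketch {

    // Returns the names of the regular files in dir, sorted by name.
    // File.listFiles() itself makes no ordering promise.
    static List<String> sortedFileNames(File dir) {
        File[] files = dir.listFiles(File::isFile);
        if (files == null) {
            throw new IllegalArgumentException("Not a readable directory: " + dir);
        }
        List<String> names = new ArrayList<>();
        for (File f : files) {
            names.add(f.getName());
        }
        Collections.sort(names); // String natural order: locale-independent
        return names;
    }

    // Creates a temporary directory whose files are created in non-alphabetical order.
    static File createDemoDir() {
        try {
            File dir = Files.createTempDirectory("dih-input").toFile();
            for (String name : new String[] {"c.xml", "a.xml", "b.xml"}) {
                Files.createFile(dir.toPath().resolve(name));
            }
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sortedFileNames(createDemoDir())); // [a.xml, b.xml, c.xml]
    }
}
```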

Even if the files are ordered consistently by modification date, the order
can still differ: 1) the modification dates of these files might be
different on your two machines; 2) the granularity of file modification
dates supported by the filesystem or the file I/O layer in the JVM on each
machine might be different, causing two files to appear to have identical
modification times on one machine but different ones on the other.
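The granularity point can be modeled with the standard library too. The granularity values, timestamps and file names below are illustrative assumptions, not measurements of any real filesystem; the sketch only shows what happens once timestamps are rounded down. A stable sort on the rounded time leaves tied files in whatever order they arrived, so the result still depends on the listing order, while adding the file name as a secondary key makes it deterministic.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class ModTimeGranularitySketch {

    // Hypothetical file record: name plus last-modified time.
    static final class FileInfo {
        final String name;
        final Instant modified;

        FileInfo(String name, Instant modified) {
            this.name = name;
            this.modified = modified;
        }
    }

    // Rounds a timestamp down to a multiple of the granularity,
    // mimicking storage that only keeps, say, whole seconds.
    static Instant truncate(Instant t, Duration granularity) {
        long g = granularity.toMillis();
        return Instant.ofEpochMilli(Math.floorDiv(t.toEpochMilli(), g) * g);
    }

    // Orders files by rounded modification time. List.sort is stable, so
    // without a tiebreaker, tied files keep their arrival order.
    static List<String> order(List<FileInfo> files, Duration granularity, boolean nameTiebreak) {
        Comparator<FileInfo> byTime =
                Comparator.comparing((FileInfo f) -> truncate(f.modified, granularity));
        Comparator<FileInfo> cmp =
                nameTiebreak ? byTime.thenComparing((FileInfo f) -> f.name) : byTime;
        List<FileInfo> copy = new ArrayList<>(files);
        copy.sort(cmp);
        return copy.stream().map(f -> f.name).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        FileInfo a = new FileInfo("a.xml", Instant.parse("2013-04-09T12:00:00.800Z"));
        FileInfo b = new FileInfo("b.xml", Instant.parse("2013-04-09T12:00:00.300Z"));
        Duration oneSecond = Duration.ofSeconds(1);
        System.out.println(order(List.of(a, b), Duration.ofMillis(1), false)); // [b.xml, a.xml]
        System.out.println(order(List.of(a, b), oneSecond, false)); // [a.xml, b.xml]
        System.out.println(order(List.of(b, a), oneSecond, false)); // [b.xml, a.xml]
        System.out.println(order(List.of(b, a), oneSecond, true));  // [a.xml, b.xml]
    }
}
```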


-Hoss
