Results Order When Performing Wildcard Query

2013-04-09 Thread P Williams
Hi,

I wrote a test of my application which revealed a Solr oddity (I think).
The test, which I wrote on Windows 7 and which makes use of the
solr-test-framework
(http://lucene.apache.org/solr/4_1_0/solr-test-framework/index.html),
fails under Ubuntu 12.04 because the Solr results I expected for a wildcard
query of the test data are ordered differently under Ubuntu than under
Windows.  On both Windows and Ubuntu all items in the result set have a
score of 1.0 and appear to be ordered by docid (which looks like it
corresponds to alphabetical unique id on Windows but not on Ubuntu).  I'm
guessing that the root of my issue is that a different docid was assigned
to the same document on each operating system.

The data was imported using a DataImportHandler configuration during a
@BeforeClass step in my JUnit test on both systems.

Any suggestions on how to ensure a consistently ordered wildcard query
result set for testing?

Thanks,
Tricia


Re: Results Order When Performing Wildcard Query

2013-04-09 Thread Shawn Heisey

On 4/9/2013 12:08 PM, P Williams wrote:

I wrote a test of my application which revealed a Solr oddity (I think).
The test, which I wrote on Windows 7 and which makes use of the
solr-test-framework
(http://lucene.apache.org/solr/4_1_0/solr-test-framework/index.html),
fails under Ubuntu 12.04 because the Solr results I expected for a wildcard
query of the test data are ordered differently under Ubuntu than under
Windows.  On both Windows and Ubuntu all items in the result set have a
score of 1.0 and appear to be ordered by docid (which looks like it
corresponds to alphabetical unique id on Windows but not on Ubuntu).  I'm
guessing that the root of my issue is that a different docid was assigned
to the same document on each operating system.


It might be due to differences in how Java works on the two platforms,
or even something as simple as different Java versions.  I don't know a
lot about the underlying Lucene stuff, so this next sentence may not be
correct: If you are not starting from an index where the actual
index directory was deleted before the test started (rather than
just deleting all documents), that might produce different internal
Lucene document ids.



The data was imported using a DataImportHandler configuration during a
@BeforeClass step in my JUnit test on both systems.

Any suggestions on how to ensure a consistently ordered wildcard query
result set for testing?


Include an explicit sort parameter.  That way it will depend on the 
data, not the internal Lucene representation.
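
A minimal sketch of what the explicit sort looks like at the request level.
It only builds the query string; the class name, the helper names, and the
field names "title" and "id" are made up for illustration and are not taken
from the test in question.  The sort field is assumed to be a single-valued,
indexed field such as the schema's unique key.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class SortedWildcardQuery {

    // Builds the parameter string for a query with an explicit ascending sort,
    // so the result order depends on field values rather than on docids.
    static String buildQueryString(String query, String sortField) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("q", query);
        params.put("sort", sortField + " asc");

        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    // URL-encodes one parameter name or value as UTF-8.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // "title" and "id" are hypothetical field names.
        System.out.println(buildQueryString("title:sol*", "id"));
        // prints q=title%3Asol*&sort=id+asc
    }
}
```

In a test, the same "sort" name/value pair would simply be added to the
parameters already being passed with the wildcard query.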


Thanks,
Shawn



Re: Results Order When Performing Wildcard Query

2013-04-09 Thread P Williams
Hey Shawn,

My gut says the difference in assignment of docids has to do with how the
FileListEntityProcessor
(http://wiki.apache.org/solr/DataImportHandler#FileListEntityProcessor)
works on the two operating systems. My guess is that the documents are
updated/imported in a different order, but I haven't tested that theory.
I still think it's kind of odd that there would be a difference.

Indexes are created from scratch in my test, so it's not that. java
-version reports the same values on both machines:
java version 1.7.0_17
Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
Java HotSpot(TM) Client VM (build 23.7-b01, mixed mode)

The explicit (arbitrary non-score) sort parameter will work as a
work-around to get my test to pass in both environments while I think about
this some more. Thanks!

Cheers,
Tricia



Re: Results Order When Performing Wildcard Query

2013-04-09 Thread Chris Hostetter

: My gut says the difference in assignment of docids has to do with how the
: FileListEntityProcessor
: (http://wiki.apache.org/solr/DataImportHandler#FileListEntityProcessor)

docids just represent the order documents are added to the index.  if you
use DIH with FileListEntityProcessor to create one doc per file then the
order of the documents will (if i remember correctly) correspond to the
order of the files returned by the OS, which may vary.
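
A small sketch of the underlying JVM behaviour, assuming plain java.io and
nothing DIH-specific: File.listFiles() makes no guarantee about the order of
its result, so any code that wants the same order on every platform has to
sort the listing itself.  The class and method names here are made up for
illustration; this is not how FileListEntityProcessor is implemented.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class StableFileOrder {

    // Lists a directory and sorts by file name, so the order no longer
    // depends on what the OS or filesystem happens to return.
    static List<String> sortedFileNames(File dir) {
        File[] files = dir.listFiles(); // order is unspecified by the API
        List<String> names = new ArrayList<String>();
        if (files == null) {
            return names; // not a directory, or an I/O error occurred
        }
        Arrays.sort(files, new Comparator<File>() {
            public int compare(File a, File b) {
                return a.getName().compareTo(b.getName());
            }
        });
        for (File f : files) {
            names.add(f.getName());
        }
        return names;
    }

    // Creates three files in a temp directory, deliberately out of order.
    static List<String> demo() {
        try {
            File dir = Files.createTempDirectory("stable-order").toFile();
            for (String n : new String[] {"c.xml", "a.xml", "b.xml"}) {
                new File(dir, n).createNewFile();
            }
            return sortedFileNames(dir);
        } catch (IOException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [a.xml, b.xml, c.xml]
    }
}
```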

even if the files are ordered consistently by modification date: 1) the
modification date of these files on your machines might be different; 2) the
granularity of file modification dates supported by the filesystem or file
io layer in the JVM on each machine might be different -- causing two
files to appear to have identical mod times on one machine, but different
mod times on the other machine.
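
To illustrate the granularity point with made-up numbers: the sketch below
simulates two filesystems by truncating the same pair of timestamps to
different resolutions.  The timestamps, the 1 ms and 1000 ms granularities,
and all names are assumptions chosen for the example, not measurements from
either machine.

```java
public class ModTimeGranularity {

    // Rounds a timestamp down to a multiple of the given granularity,
    // mimicking a filesystem that stores coarser modification times.
    static long truncate(long millis, long granularityMillis) {
        return millis - (millis % granularityMillis);
    }

    // Compares two modification times as seen at the given granularity.
    static int compareAt(long a, long b, long granularityMillis) {
        return Long.compare(truncate(a, granularityMillis),
                            truncate(b, granularityMillis));
    }

    public static void main(String[] args) {
        long first = 1365530400123L;  // hypothetical mod time of file A
        long second = 1365530400900L; // hypothetical mod time of file B

        // Millisecond granularity: A is strictly older than B.
        System.out.println(compareAt(first, second, 1L) < 0);     // prints true

        // One-second granularity: the two times tie, so an ordering by
        // mod time alone no longer decides which file comes first.
        System.out.println(compareAt(first, second, 1000L) == 0); // prints true
    }
}
```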


-Hoss