Results Order When Performing Wildcard Query
Hi,

I wrote a test of my application which revealed a Solr oddity (I think). The test, which I wrote on Windows 7 and which makes use of the solr-test-framework (http://lucene.apache.org/solr/4_1_0/solr-test-framework/index.html), fails under Ubuntu 12.04 because the Solr results I expected for a wildcard query of the test data are ordered differently under Ubuntu than under Windows. On both Windows and Ubuntu all items in the result set have a score of 1.0 and appear to be ordered by docid (which looks like it corresponds to alphabetical unique id on Windows but not on Ubuntu). I'm guessing that the root of my issue is that a different docid was assigned to the same document on each operating system.

The data was imported using a DataImportHandler configuration during a @BeforeClass step in my JUnit test on both systems.

Any suggestions on how to ensure a consistently ordered wildcard query result set for testing?

Thanks,
Tricia
Re: Results Order When Performing Wildcard Query
On 4/9/2013 12:08 PM, P Williams wrote:
> I wrote a test of my application which revealed a Solr oddity (I think). The test which I wrote on Windows 7 and makes use of the solr-test-framework (http://lucene.apache.org/solr/4_1_0/solr-test-framework/index.html) fails under Ubuntu 12.04 because the Solr results I expected for a wildcard query of the test data are ordered differently under Ubuntu than Windows. On both Windows and Ubuntu all items in the result set have a score of 1.0 and appear to be ordered by docid (which looks like it corresponds to alphabetical unique id on Windows but not Ubuntu). I'm guessing that the root of my issue is that a different docid was assigned to the same document on each operating system.

It might be due to differences in how Java works on the two platforms, or even something as simple as different Java versions.

I don't know a lot about the underlying Lucene stuff, so this next sentence may not be correct: If you are not starting from an index where the actual index directory was deleted before the test started (rather than just deleting all documents), that might produce different internal Lucene document ids.

> The data was imported using a DataImportHandler configuration during a @BeforeClass step in my JUnit test on both systems. Any suggestions on how to ensure a consistently ordered wildcard query result set for testing?

Include an explicit sort parameter. That way the order will depend on the data, not on the internal Lucene representation.

Thanks,
Shawn
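To illustrate the explicit sort suggestion, here is a minimal sketch that builds the request parameters for a wildcard query with a `sort` clause, using only the Java standard library. The field names `name` and `id` and the helper `buildParams` are assumptions made for the example; they are not from the original test, and the sort field would need to be a single-valued indexed field in the actual schema.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SortedWildcardQuery {

    // Builds the URL-encoded "q" and "sort" parameters for a Solr select request.
    // sortField is a hypothetical field name; ascending order is assumed.
    static String buildParams(String query, String sortField) {
        try {
            return "q=" + URLEncoder.encode(query, "UTF-8")
                 + "&sort=" + URLEncoder.encode(sortField + " asc", "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Every hit of a wildcard query scores 1.0, so without "sort" the
        // order falls back to internal docid order.
        System.out.println(buildParams("name:foo*", "id"));
    }
}
```

Because all hits tie on score, sorting on the unique key alone should be enough to make the order reproducible across machines.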
Re: Results Order When Performing Wildcard Query
Hey Shawn,

My gut says the difference in assignment of docids has to do with how the FileListEntityProcessor (http://wiki.apache.org/solr/DataImportHandler#FileListEntityProcessor) works on the two operating systems. My guess is that the documents are updated/imported in a different order, but I haven't tested that theory. I still think it's kind of odd that there would be a difference.

Indexes are created from scratch in my test, so it's not that. java -version reports the same values on both machines:

java version 1.7.0_17
Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
Java HotSpot(TM) Client VM (build 23.7-b01, mixed mode)

The explicit (arbitrary non-score) sort parameter will work as a work-around to get my test to pass in both environments while I think about this some more. Thanks!

Cheers,
Tricia
Re: Results Order When Performing Wildcard Query
: My gut says the difference in assignment of docids has to do with how the
: FileListEntityProcessor (http://wiki.apache.org/solr/DataImportHandler#FileListEntityProcessor)

docids just represent the order documents are added to the index. if you use DIH with FileListEntityProcessor to create one doc per file then the order of the documents will (if i remember correctly) correspond to the order of the files returned by the OS, which may vary.

even if the files are ordered consistently by modification date:

1) the modification date of these files on your machines might be different
2) the granularity of file modification dates supported by the filesystem or file io layer in the JVM on each machine might be different -- causing two files to appear to have identical mod times on one machine, but different mod times on the other machine.

-Hoss
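The OS-dependent file order Hoss describes can be seen with plain `java.io.File`: the Javadoc for `File.listFiles()` states that the returned entries are not guaranteed to be in any particular order. The sketch below is not a DIH configuration and does not show how FileListEntityProcessor is implemented; it only illustrates, under that assumption, how sorting a directory listing by name gives the same order on every platform. The class and method names are hypothetical.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;

public class StableFileOrder {

    // Returns the entries of dir sorted by name, so the order no longer
    // depends on what the operating system happens to return.
    static File[] listSorted(File dir) {
        File[] files = dir.listFiles();
        if (files == null) {
            return new File[0]; // not a directory, or an I/O error occurred
        }
        Arrays.sort(files, new Comparator<File>() {
            @Override
            public int compare(File a, File b) {
                return a.getName().compareTo(b.getName());
            }
        });
        return files;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (File f : listSorted(dir)) {
            System.out.println(f.getName());
        }
    }
}
```

If the import order itself cannot be controlled, sorting at query time remains the simpler way to keep a test deterministic.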