DataImportHandler has non-deterministic sort order for XML files
----------------------------------------------------------------

                 Key: SOLR-2864
                 URL: https://issues.apache.org/jira/browse/SOLR-2864
             Project: Solr
          Issue Type: Bug
          Components: contrib - DataImportHandler
    Affects Versions: 3.4
            Reporter: Gabriel Cooper
            Priority: Minor
             Fix For: 3.5


DataImportHandler's FileListEntityProcessor relies on Java's File.list() method 
to retrieve a list of files from the configured dataimport directory, but 
list() does not guarantee a sort order*. This means that if two files update 
the same record, the result is non-deterministic. Typically, list() does in 
fact return the names lexicographically sorted, but this is not 
guaranteed**.
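
As a minimal sketch of the general remedy (not the actual Solr code or a proposed patch), the array returned by File.list() can be sorted explicitly before it is processed, so the order no longer depends on the platform. The class and method names below (SortedListing, sortedCopy, sortedNames) are hypothetical and chosen only for illustration.

```java
import java.io.File;
import java.util.Arrays;

// Hypothetical helper, not part of Solr: makes a directory listing deterministic.
final class SortedListing {

    // Returns a sorted copy of the given names (String natural order),
    // leaving the input array untouched.
    static String[] sortedCopy(String[] names) {
        String[] copy = names.clone();
        Arrays.sort(copy);
        return copy;
    }

    // File.list() returns null if dir is not a directory or an I/O error
    // occurs, so that case is mapped to an empty array.
    static String[] sortedNames(File dir) {
        String[] names = dir.list();
        return names == null ? new String[0] : sortedCopy(names);
    }

    public static void main(String[] args) {
        // Lists the directory given as the first argument, or "." by default.
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (String name : sortedNames(dir)) {
            System.out.println(name);
        }
    }
}
```

This only makes the order repeatable. Whether lexicographic order is the right order for a given import is a separate question.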

An example of how you can get into trouble is to imagine the following:

xyz.xml -- Created one hour ago. Contains updates to records "Foo" and "Bar".
abc.xml -- Created one minute ago. Contains updates to records "Bar" and "Baz".

In this case, the newest file, abc.xml, would (likely, but not guaranteed) 
be run first, updating the "Bar" and "Baz" records. Next, the older file, 
xyz.xml, would update "Foo" and overwrite "Bar" with outdated changes.
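
One way to avoid this particular problem, sketched here under the assumption that a file's last-modified time is a good enough proxy for when it was created, is to process the oldest file first and break ties by name. This is an illustration only. The OldestFirst class is hypothetical and is not how DataImportHandler behaves.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper, not part of Solr: orders files so that older
// updates are applied before newer ones.
final class OldestFirst {

    // Oldest modification time first; the file name breaks ties so the
    // result stays deterministic when two timestamps are equal.
    static final Comparator<File> BY_MTIME_THEN_NAME =
            Comparator.<File>comparingLong(File::lastModified)
                      .thenComparing(File::getName);

    // Sorts the array in place and returns it for convenience.
    static File[] sortOldestFirst(File[] files) {
        Arrays.sort(files, BY_MTIME_THEN_NAME);
        return files;
    }

    public static void main(String[] args) {
        // listFiles() returns null if the path is not a directory.
        File[] files = new File(args.length > 0 ? args[0] : ".").listFiles();
        if (files == null) {
            return;
        }
        for (File f : sortOldestFirst(files)) {
            System.out.println(f.lastModified() + " " + f.getName());
        }
    }
}
```

With the two files above, xyz.xml (one hour old) would sort before abc.xml (one minute old), so the newer "Bar" update would be applied last.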

* Per 
http://download.oracle.com/javase/1,5,0/docs/api/java/io/File.html#list%28%29

"There is no guarantee that the name strings in the resulting array will appear 
in any specific order; they are not, in particular, guaranteed to appear in 
alphabetical order."

**  Even if it were guaranteed, lexicographical sorting would give you the 
following sort order:

  1.xml
  10.xml
  2.xml
  ...
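
If numerically named files should be processed in numeric order, a comparator that treats runs of digits as numbers is one option. The following is a rough sketch, assuming ASCII digits only. NaturalOrder is a hypothetical class written for this example, not an existing Solr or JDK API.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical comparator: compares digit runs by numeric value and all
// other characters by their char value.
final class NaturalOrder implements Comparator<String> {

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Returns the index just past the digit run starting at 'from'.
    private static int endOfDigits(String s, int from) {
        int end = from;
        while (end < s.length() && isDigit(s.charAt(end))) {
            end++;
        }
        return end;
    }

    // Skips leading zeros in [from, end) but always keeps the last digit.
    private static int skipZeros(String s, int from, int end) {
        while (from < end - 1 && s.charAt(from) == '0') {
            from++;
        }
        return from;
    }

    @Override
    public int compare(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            char ca = a.charAt(i);
            char cb = b.charAt(j);
            if (isDigit(ca) && isDigit(cb)) {
                int ie = endOfDigits(a, i);
                int je = endOfDigits(b, j);
                int is = skipZeros(a, i, ie);
                int js = skipZeros(b, j, je);
                // Without leading zeros, the longer run is the larger number.
                if (ie - is != je - js) {
                    return (ie - is) - (je - js);
                }
                // Equal length: plain string comparison matches numeric order.
                int c = a.substring(is, ie).compareTo(b.substring(js, je));
                if (c != 0) {
                    return c;
                }
                i = ie;
                j = je;
            } else {
                if (ca != cb) {
                    return ca - cb;
                }
                i++;
                j++;
            }
        }
        int rest = (a.length() - i) - (b.length() - j);
        // Fall back to plain order so that "01.xml" and "1.xml" are not equal.
        return rest != 0 ? rest : a.compareTo(b);
    }

    public static void main(String[] args) {
        String[] names = {"10.xml", "2.xml", "1.xml"};
        Arrays.sort(names, new NaturalOrder());
        System.out.println(Arrays.toString(names)); // prints [1.xml, 2.xml, 10.xml]
    }
}
```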
