[ https://issues.apache.org/jira/browse/SOLR-2864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabriel Cooper updated SOLR-2864:
---------------------------------

    Attachment: lucene-2864.patch

Patch to sort files by created/modified time.

> DataImportHandler has non-deterministic sort order for XML files
> ----------------------------------------------------------------
>
>                 Key: SOLR-2864
>                 URL: https://issues.apache.org/jira/browse/SOLR-2864
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 3.4
>            Reporter: Gabriel Cooper
>            Priority: Minor
>              Labels: dataimport, patch, xml
>             Fix For: 3.5
>
>         Attachments: lucene-2864.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> DataImportHandler's FileListEntityProcessor relies on Java's File.list()
> method to retrieve a list of files from the configured dataimport directory,
> but list() does not guarantee a sort order*. This means that if two files
> update the same record, the result is non-deterministic.
> Typically, list() does in fact return the names lexicographically sorted,
> but this is not guaranteed**.
> An example of how you can get into trouble:
> xyz.xml -- Created one hour ago. Contains updates to records "Foo" and "Bar".
> abc.xml -- Created one minute ago. Contains updates to records "Bar" and
> "Baz".
> In this case the newest file, abc.xml, would (likely, but not guaranteed)
> be processed first, updating the "Bar" and "Baz" records. Next, the older
> file, xyz.xml, would update "Foo" and overwrite "Bar" with outdated changes.
> * Per
> http://download.oracle.com/javase/1,5,0/docs/api/java/io/File.html#list%28%29
> "There is no guarantee that the name strings in the resulting array will
> appear in any specific order; they are not, in particular, guaranteed to
> appear in alphabetical order."
> ** Even if it were guaranteed, lexicographical sorting would give you the
> following sort order:
> 1.xml
> 10.xml
> 2.xml
> ...
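The direction the attached patch describes (sorting files by created/modified time) can be sketched in Java as follows. This is not the contents of lucene-2864.patch, which is not reproduced here; it is a minimal sketch assuming that "oldest last-modified time first" is the desired order, with the file name as a tiebreaker so files sharing a timestamp still come back in a repeatable order. The class and method names (`SortedFileList`, `listOldestFirst`, `OLDEST_FIRST`) are hypothetical. Note that `java.io.File` only exposes the last-modified time; a real creation time would need `java.nio.file.Files.readAttributes(path, BasicFileAttributes.class).creationTime()`, and not every file system records one.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper, not DataImportHandler code: lists a directory in a
// deterministic order instead of relying on File.list()'s unspecified order.
final class SortedFileList {

    // Oldest last-modified time first; ties broken by file name so the
    // ordering is total and repeatable across runs and platforms.
    static final Comparator<File> OLDEST_FIRST =
            Comparator.comparingLong(File::lastModified)
                      .thenComparing(File::getName);

    // Returns the regular files in dir, oldest first.
    // listFiles() returns null if dir does not exist or cannot be read,
    // so that case is mapped to an empty array.
    static File[] listOldestFirst(File dir) {
        File[] files = dir.listFiles(File::isFile);
        if (files == null) {
            return new File[0];
        }
        Arrays.sort(files, OLDEST_FIRST);
        return files;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (File f : listOldestFirst(dir)) {
            System.out.println(f.lastModified() + "  " + f.getName());
        }
    }
}
```

With the files from the report, xyz.xml (older) would be returned before abc.xml (newer), so the newer "Bar" update in abc.xml is applied last. Sorting by timestamp also sidesteps the 1.xml / 10.xml / 2.xml problem, since names only matter when timestamps are equal.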