[ 
https://issues.apache.org/jira/browse/SOLR-2864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Cooper updated SOLR-2864:
---------------------------------

    Attachment: lucene-2864.patch

Patch to sort files by created/modified time.
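
For readers without the attachment, the idea can be sketched roughly as below. This is not the contents of the attached patch, only a minimal illustration under stated assumptions: the class name ModifiedTimeFileLister and the method listOldestFirst are hypothetical, and the sketch orders by last-modified time only, since java.io.File does not expose a creation time. The file name is used as a tie-breaker so that two files with the same timestamp still come back in a stable order.

```java
import java.io.File;
import java.util.Arrays;

// Hypothetical helper for illustration; not the attached patch.
class ModifiedTimeFileLister {

    // Returns the entries of dir ordered oldest-first by last-modified time.
    // Ties are broken by file name so the result does not depend on the
    // order in which the file system happens to return entries.
    static File[] listOldestFirst(File dir) {
        File[] files = dir.listFiles();
        if (files == null) {
            // listFiles() returns null when dir is not a directory
            // or an I/O error occurs.
            return new File[0];
        }
        Arrays.sort(files, (a, b) -> {
            int byTime = Long.compare(a.lastModified(), b.lastModified());
            return byTime != 0 ? byTime : a.getName().compareTo(b.getName());
        });
        return files;
    }

    public static void main(String[] args) {
        // Directory to list; defaults to the current directory.
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (File f : listOldestFirst(dir)) {
            System.out.println(f.lastModified() + "  " + f.getName());
        }
    }
}
```

With files processed oldest-first, a newer file that touches the same record is applied after the older one, which is the behavior the issue below asks for.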
                
> DataImportHandler has non-deterministic sort order for XML files
> ----------------------------------------------------------------
>
>                 Key: SOLR-2864
>                 URL: https://issues.apache.org/jira/browse/SOLR-2864
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 3.4
>            Reporter: Gabriel Cooper
>            Priority: Minor
>              Labels: dataimport, patch, xml
>             Fix For: 3.5
>
>         Attachments: lucene-2864.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> DataImportHandler's FileListEntityProcessor relies on Java's File.list() 
> method to retrieve a list of files from the configured dataimport directory, 
> but list() does not guarantee a sort order*. This means that if you have two 
> files that update the same record, the results are non-deterministic. 
> Typically, list() does in fact return them lexicographically sorted, but this 
> is not guaranteed**.
> To see how this can get you into trouble, imagine the following:
> xyz.xml -- Created one hour ago. Contains updates to records "Foo" and "Bar".
> abc.xml -- Created one minute ago. Contains updates to records "Bar" and 
> "Baz".
> In this case, the newest file, abc.xml, would (likely, but not guaranteed) 
> be run first, updating the "Bar" and "Baz" records. Next, the older file, 
> xyz.xml, would update "Foo" and overwrite "Bar" with outdated changes.
> * Per 
> http://download.oracle.com/javase/1,5,0/docs/api/java/io/File.html#list%28%29
> "There is no guarantee that the name strings in the resulting array will 
> appear in any specific order; they are not, in particular, guaranteed to 
> appear in alphabetical order."
> **  Even if it were guaranteed, lexicographical sorting would give you the 
> following sort order:
>   1.xml
>   10.xml
>   2.xml
>   ...
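
The second footnote can be reproduced with plain strings. The sketch below is only an illustration: the class NameOrderDemo and its methods are hypothetical, and the number-aware variant assumes file names that start with a run of digits small enough to fit in a long. It contrasts the default String ordering with an ordering that compares the leading number first.

```java
import java.util.Arrays;

// Hypothetical demo class; names and naming scheme are assumptions.
class NameOrderDemo {

    // Parses the leading digits of a name such as "10.xml".
    // Names without leading digits sort after all numbered names.
    static long leadingNumber(String name) {
        int i = 0;
        while (i < name.length() && Character.isDigit(name.charAt(i))) {
            i++;
        }
        return i == 0 ? Long.MAX_VALUE : Long.parseLong(name.substring(0, i));
    }

    // Default String ordering, character by character.
    static String[] lexicographic(String... names) {
        String[] copy = names.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Orders by leading number, then by name as a tie-breaker.
    static String[] numeric(String... names) {
        String[] copy = names.clone();
        Arrays.sort(copy, (a, b) -> {
            int byNumber = Long.compare(leadingNumber(a), leadingNumber(b));
            return byNumber != 0 ? byNumber : a.compareTo(b);
        });
        return copy;
    }

    public static void main(String[] args) {
        // prints [1.xml, 10.xml, 2.xml]
        System.out.println(Arrays.toString(lexicographic("2.xml", "10.xml", "1.xml")));
        // prints [1.xml, 2.xml, 10.xml]
        System.out.println(Arrays.toString(numeric("2.xml", "10.xml", "1.xml")));
    }
}
```

Neither ordering reflects when a file was written, which is why the issue asks for ordering by file time rather than by name.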


