Pablo Lozano created SOLR-7952:
----------------------------------

             Summary: Change DeltaImport from HashSet to LinkedHashSet.
                 Key: SOLR-7952
                 URL: https://issues.apache.org/jira/browse/SOLR-7952
             Project: Solr
          Issue Type: Improvement
          Components: contrib - DataImportHandler
    Affects Versions: 5.2.1
            Reporter: Pablo Lozano
            Priority: Minor


This is only a minor modification, which in some cases might be useful for 
certain custom DataSources or ImportHandlers.

My imports fetch documents in batches, so I need to store those batches in a 
disk cache for a certain time, as they are not required in the meantime.

I also use lazy loading: my batches are not initialized by my custom 
iterators until they are iterated for the first time.

My issue is that the order in which I pass the ids of my documents to the 
ImportHandler during the "FIND_DELTA" step is not the same order in which 
they are fetched during the "DELTA_DUMP" step. This causes several of my 
batches to be initialized at once, when only one of them would need to be 
initialized at a time.

What I would like is to simply change the HashSet used in the "collectDelta" 
method to a LinkedHashSet. This would help, as we would obtain a predictable 
order of documents.
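
To illustrate the difference, here is a minimal standalone sketch. It is not 
the DataImportHandler code itself: the class name, the helper method and the 
ids are hypothetical, and plain strings stand in for the delta rows. It only 
shows that a LinkedHashSet iterates in insertion order, while a HashSet makes 
no ordering guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class, not part of Solr.
class DeltaOrderSketch {

    // Adds the ids to the given set in the order they were found, then
    // returns them in the order the set iterates them.
    static List<String> iterationOrder(Set<String> deltaSet, List<String> idsAsFound) {
        deltaSet.addAll(idsAsFound);
        return new ArrayList<>(deltaSet);
    }

    public static void main(String[] args) {
        // Assumed ids, grouped by the batch they belong to.
        List<String> ids = Arrays.asList(
                "batch1-doc1", "batch1-doc2",
                "batch2-doc1", "batch2-doc2",
                "batch3-doc1", "batch3-doc2");

        // LinkedHashSet: iteration follows insertion order, so documents of
        // the same batch stay next to each other.
        System.out.println("LinkedHashSet: " + iterationOrder(new LinkedHashSet<>(), ids));

        // HashSet: iteration order depends on hash codes and is unspecified,
        // so documents of different batches may be interleaved.
        System.out.println("HashSet:       " + iterationOrder(new HashSet<>(), ids));
    }
}
```

With an insertion-ordered set, a lazily loaded batch can be fully consumed 
before the next one is touched, so only one batch needs to be initialized at 
a time.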

This may be a very specific case, but the change is simple and shouldn't 
affect anything else.

The second option would be to create something like a "deltaImportQuery" 
that would work like: "select * from table where last_modified > 
'${dih.last_index_time}'".

I can submit a patch for this.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
