On Mon, Dec 8, 2014 at 9:11 AM, Sushmitha Chakka <sushmi...@sigmoidanalytics.com> wrote:
> Hi,
>
> I have an index with 6 crore (60 million) records. My use case is to read
> the entire index and check whether each record is present in a new index.
> If it is not, I have to index it into the new index. I used the scan and
> scroll operation to read the index with the Java API, but this process is
> taking a lot of time: processing 50,000 records takes 8 minutes. Can anyone
> suggest how I can configure or change my queries?
>

I know from experience that scan/scroll can handle batch sizes in the low thousands without trouble, so you should give that a shot. Each scroll call should be quite quick. It might be a good idea to post a JSON recreation of your problem so we can see what is happening.

Usually the slow part of copying a scan/scroll into a new index is the batch calls that add the documents to the new index. Whether or not that is "slow" really depends on the size of the documents, the complexity of any scripts you use on import, your disk speed, the complexity of your analysis, your CPU speed, and the merge settings you use. That list is roughly in order of how likely I've seen each factor affect import speed.

Nik
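To make the batching advice concrete, here is a minimal Java sketch of the copy-if-missing loop. It is a sketch under stated assumptions, not the Elasticsearch client API: the old and new indices are stand-in in-memory maps, and `ReindexSketch`, `copyMissing` and `Stats` are hypothetical names. The point is the shape of the loop: pull a page of up to `batchSize` documents (one scroll call), check the whole page against the new index in one round trip (in Elasticsearch that would presumably be a multi-get rather than one get per record, which is my assumption, not something the reply prescribes), and send every missing document from that page in one batch write.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: copy documents from an "old" index into a "new" index,
 * skipping IDs the new index already has. Both indices are in-memory maps here;
 * against a real cluster each page, existence check and write would be a request.
 */
public class ReindexSketch {

    /** Counts of simulated round trips, so the effect of batch size is visible. */
    public static final class Stats {
        public int existenceChecks; // one per page, like a single multi-get
        public int bulkRequests;    // one per page that has something to write
        public int indexed;         // documents actually written
    }

    public static Stats copyMissing(Map<String, String> source,
                                    Map<String, String> target,
                                    int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        Stats stats = new Stats();
        Iterator<Map.Entry<String, String>> scroll = source.entrySet().iterator();
        while (scroll.hasNext()) {
            // One "scroll page": up to batchSize documents.
            List<Map.Entry<String, String>> page = new ArrayList<>(batchSize);
            while (scroll.hasNext() && page.size() < batchSize) {
                page.add(scroll.next());
            }
            // Check the whole page against the target at once
            // instead of one lookup per document.
            stats.existenceChecks++;
            Map<String, String> batch = new LinkedHashMap<>();
            for (Map.Entry<String, String> doc : page) {
                if (!target.containsKey(doc.getKey())) {
                    batch.put(doc.getKey(), doc.getValue());
                }
            }
            // Write every missing document from this page in one batch.
            if (!batch.isEmpty()) {
                stats.bulkRequests++;
                target.putAll(batch);
                stats.indexed += batch.size();
            }
        }
        return stats;
    }
}
```

For example, with 10 source documents (inserted in order `id0` to `id9`), 2 of which (`id3`, `id7`) already exist in the target, `batchSize = 4` gives 3 existence checks, 3 batch writes and 8 documents indexed, while `batchSize = 1000` needs only 1 of each. Checking and writing one record at a time would need 10 lookups and 8 separate writes, which is why raising the batch size into the low thousands is the first thing to try.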