Hmm, try calling maybeMerge after each .addIndexes? Robert opened this issue to fix addIndexes: https://issues.apache.org/jira/browse/LUCENE-5672
Mike McCandless

http://blog.mikemccandless.com

On Wed, May 14, 2014 at 11:46 AM, danielv <dani...@exlibris.co.il> wrote:
> Hi,
>
> We have an index of about 550M records (~800GB), and we merge thousands of
> mini-indexes into it once a week using Hadoop - 45 mappers on 2 Hadoop nodes.
> After upgrading to Lucene 3.6.1 we noticed that the merge process
> continuously slows down.
> After testing a couple of options, it looks like we found the source of the
> problem, but we have no idea how to fix it.
> What we do - first we merge all mini-indexes into one intermediate mini-index,
> and then merge that one into the big (final) one.
> The difference is whether deleted_records exist in the mini-indexes:
> If the merged mini-indexes have no deleted_records, the merger runs for
> about 2h, with about 05s-2s per mini-index.
> If they have deleted_records, after about 10 minutes we see a dramatic
> degradation in the time it takes to merge mini-indexes into the intermediate
> one (the first 100-200 mini-index merges take less than a second each; after
> 10 minutes it takes more than 10s for one mini-index, and after an hour or
> two it is a couple of minutes!)
>
> This is from a jstack of a mapper:
>
>    java.lang.Thread.State: RUNNABLE
>         at java.lang.Thread.isAlive(Native Method)
>         at org.apache.lucene.util.CloseableThreadLocal.purge(CloseableThreadLocal.java:115)
>         - locked <0x00000007db0d6140> (a java.util.WeakHashMap)
>         at org.apache.lucene.util.CloseableThreadLocal.maybePurge(CloseableThreadLocal.java:105)
>         at org.apache.lucene.util.CloseableThreadLocal.get(CloseableThreadLocal.java:88)
>         at org.apache.lucene.index.TermInfosReader.getThreadResources(TermInfosReader.java:160)
>         at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:184)
>         at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:172)
>         at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:66)
>         at org.apache.lucene.index.BufferedDeletesStream.applyTermDeletes(BufferedDeletesStream.java:346)
>         - locked <0x00000007805766f0> (a org.apache.lucene.index.BufferedDeletesStream)
>         at org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:248)
>         - locked <0x00000007805766f0> (a org.apache.lucene.index.BufferedDeletesStream)
>         at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3615)
>         - locked <0x00000007805739a0> (a org.apache.lucene.index.IndexWriter)
>         at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3552)
>         at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:3120)
>         at org.apache.lucene.index.IndexWriter.addIndexesNoOptimize(IndexWriter.java:3064)
>
> We tried using org.apache.lucene.index.IndexWriter.addIndexes instead of
> org.apache.lucene.index.IndexWriter.addIndexesNoOptimize - same behavior.
>
> How can we eliminate this behavior and improve the performance of our
> merge?
>
> Thanks!
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Merger-performance-degradation-on-3-6-1-tp4135593.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
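
A minimal sketch of the suggestion at the top of this thread (calling maybeMerge after each addIndexes). It only shows the call pattern and is not a tested fix for the slowdown. MergeTarget, RecordingTarget and AddThenMerge are hypothetical names introduced here so the example runs without Lucene on the classpath. In Lucene 3.6 the corresponding calls would be IndexWriter.addIndexes(Directory...) and IndexWriter.maybeMerge(), both of which can throw IOException; the stand-in omits that for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical stand-in for the two IndexWriter calls discussed in the thread.
 * The real methods take Directory arguments and throw IOException.
 */
interface MergeTarget {
    void addIndexes(String miniIndexPath);
    void maybeMerge();
}

/** Test double that records the order of calls instead of touching an index. */
final class RecordingTarget implements MergeTarget {
    final List<String> calls = new ArrayList<>();

    @Override
    public void addIndexes(String miniIndexPath) {
        calls.add("addIndexes:" + miniIndexPath);
    }

    @Override
    public void maybeMerge() {
        calls.add("maybeMerge");
    }
}

final class AddThenMerge {
    private AddThenMerge() {}

    /**
     * Adds each mini-index one at a time and asks the writer to consider
     * merging right after each add, rather than only at the end of the run.
     * Returns the number of mini-indexes added.
     */
    static int mergeAll(MergeTarget writer, List<String> miniIndexPaths) {
        int added = 0;
        for (String path : miniIndexPaths) {
            writer.addIndexes(path);
            writer.maybeMerge(); // the step suggested in the reply
            added++;
        }
        return added;
    }

    public static void main(String[] args) {
        RecordingTarget target = new RecordingTarget();
        int added = mergeAll(target, Arrays.asList("mini-0001", "mini-0002"));
        System.out.println(added + " mini-indexes added; calls: " + target.calls);
    }
}
```

With a real IndexWriter, the loop body would open each mini-index Directory, pass it to addIndexes, and then call maybeMerge on the same writer. Whether this helps with the applyDeletes cost visible in the stack trace is an open question; the reply itself phrases it as something to try.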