Hi,

If the index size on disk is about 750 GiB, then about 2.3 GB of heap space for the FSTs seems fine. It's just a bit strange that you only have 10 million documents!
Are those documents huge, with lots of indexed text content, possibly OCR/scanned material? If so, the term dictionary may get huge because of the many misspelled terms. Please also send us the output of "ls -lh" for your index directory so that we can make a guess.

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: dawn breaks [mailto:2005dawnbre...@gmail.com]
> Sent: Thursday, January 11, 2018 3:40 AM
> To: java-user@lucene.apache.org
> Subject: Lucene OOM
>
> Hi, all
> We have a search engine service built with Lucene 4.7. It seems that
> Lucene eats too much memory. We have approximately 10 million
> documents, and the index size on disk is approximately 750G. My
> question is: why do the FST$Arc objects consume so much memory?
> Please refer to the following histogram from jmap. I hope somebody
> can give me some suggestions.
>
>  num    #instances       #bytes  class name
> ----------------------------------------------
>    1:      4346283   2294837424  [Lorg.apache.lucene.util.fst.FST$Arc;
>    2:     25918804   2023475632  [C
>    3:     17450041   1014051416  [B
>    4:     25878734    621089616  java.lang.String
>    5:     18634803    596313696  java.util.HashMap$Node
>    6:     14039862    561594480  java.util.TreeMap$Entry
>    7:      4346283    452013432  org.apache.lucene.util.fst.FST
>    8:      4522836    424741520  [Ljava.util.HashMap$Node;
>    9:      4346283    347702640  org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader
>   10:      4683616    337220352  org.apache.lucene.util.fst.FST$Arc
>   11:     12947467    310739208  org.apache.lucene.util.BytesRef
>   12:       790283    280383040  [J
>   13:      4359111    245496264  [Ljava.lang.Object;
>   14:      4545337    218176176  java.util.HashMap
>   15:      4510384    216498432  org.apache.lucene.index.FieldInfo
>   16:      4359066    199713232  [I
>   17:      4346283    173851320  org.apache.lucene.util.fst.BytesStore
>   18:      4510400    144332800  java.util.Collections$UnmodifiableMap
>   19:      4354347    104504328  java.util.ArrayList
>   20:      5736589     91785424  java.lang.Integer
>   21:       822685     59233320  org.apache.lucene.codecs.lucene45.Lucene45DocValuesProducer$NumericEntry
>   22:       428313     13706016  org.apache.lucene.facet.taxonomy.writercache.CollisionMap$Entry
>   23:       420547     13457504  org.wltea.analyzer.dic.DictSegment
>   24:       177039      5665248  [Lorg.wltea.analyzer.dic.DictSegment;
>   25:           20      5112128  [Lorg.apache.lucene.facet.taxonomy.writercache.CollisionMap$Entry;
>   26:        42454      2377424  org.apache.lucene.store.RAMInputStream
>   27:        50054      2002160  org.apache.lucene.util.packed.Packed64
>   28:        44036      1761440  org.apache.lucene.util.packed.DirectPackedReader
>   29:        33013      1056416  java.util.concurrent.ConcurrentHashMap$Node
>   30:        43957      1054968  org.apache.lucene.codecs.lucene45.Lucene45DocValuesProducer$2
>
> Thanks & Best Regards!
> lubin
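
One rough way to read the jmap histogram quoted above is to divide each row's byte total by its instance count, which gives the average shallow size per object. The sketch below does this for three of the rows. The class name HistoAverages and the helper bytesPerInstance are made up for illustration, and the parsing assumes the "num: #instances #bytes class name" layout shown in the histogram.

```java
public class HistoAverages {

    // Parses one histogram row of the form "rank: instances bytes className"
    // and returns the average shallow size per instance, in bytes
    // (integer division, so any remainder is dropped).
    static long bytesPerInstance(String row) {
        String[] cols = row.trim().split("\\s+");
        long instances = Long.parseLong(cols[1]);
        long bytes = Long.parseLong(cols[2]);
        return bytes / instances;
    }

    public static void main(String[] args) {
        // Rows copied from the histogram in the original message.
        String[] rows = {
            "1: 4346283 2294837424 [Lorg.apache.lucene.util.fst.FST$Arc;",
            "7: 4346283 452013432 org.apache.lucene.util.fst.FST",
            "9: 4346283 347702640 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader",
        };
        for (String row : rows) {
            String className = row.trim().split("\\s+")[3];
            System.out.println(className + " -> " + bytesPerInstance(row) + " bytes each");
        }
    }
}
```

For these rows the averages come out to 528, 104 and 80 bytes. The FST$Arc arrays, the FST objects and the FieldReader objects all have the same instance count (4346283), so the 2.3 GB appears to come from a very large number of small objects rather than from a few huge ones. That is only a reading of the numbers, not a diagnosis.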