This is definitely a confusing error condition. If we can add more information without creating an undue burden for the indexer, that would be nice, but I think it will be very challenging here, since the exception is thrown at a low level in the code where there might not be much useful information (e.g. the field name) to provide. And I expect there are other places that make a similar assumption, which we would also have to track down?
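The try/catch idea raised in the quoted thread, adding the field name to the exception, follows a general pattern: catch the low-level error at the first level that knows the field and rethrow with that context attached. The sketch below is a minimal, self-contained illustration of that pattern only. `ContextualOverflow`, `encodeCount` and `writeTermCount` are hypothetical names and do not model Lucene's actual postings or terms writers. Whether Lucene has a suitable catch point that knows the field is exactly the open question above.

```java
/**
 * Hypothetical sketch (not Lucene code): a low-level encoder only sees numbers,
 * so a higher-level caller that knows the field and term adds them on rethrow.
 */
class ContextualOverflow {

    /** Low-level helper: narrows a long to an int, like the Math.toIntExact frame in the trace. */
    static int encodeCount(long value) {
        // Throws ArithmeticException("integer overflow") if value is outside int range.
        return Math.toIntExact(value);
    }

    /** Higher-level caller (hypothetical) that knows which field and term it is writing. */
    static int writeTermCount(String field, String term, long value) {
        try {
            return encodeCount(value);
        } catch (ArithmeticException e) {
            // Keep the original exception as the cause so the low-level stack trace is not lost.
            throw new IllegalStateException(
                "Overflow while writing field=\"" + field + "\" term=\"" + term + "\" value=" + value, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeTermCount("someField", "lucene", 42L)); // 42
        try {
            writeTermCount("someField", "positional", 3_000_000_000L);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
            System.out.println("cause: " + e.getCause().getMessage()); // cause: integer overflow
        }
    }
}
```

Keeping the `ArithmeticException` as the cause means the enriched message is purely additive: the original trace still shows where the overflow happened.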
On Tue, May 7, 2024 at 9:10 AM Jerven Tjalling Bolleman <jerven.bolleman@sib.swiss> wrote:
>
> Dear Michael,
>
> Looking deeper into this, I think we overflowed a term frequency field.
> Looking at some statistics, in a previous release we had 1,288,526,281
> of a certain field; this would be larger now. Each of these would have
> had a limited set of values. But crucially, nearly all of them would have
> had the term "positional" or "non-positional" added to the document.
>
> There is no good reason to do this today; we should just turn this into
> a boolean field and update the UI. I will do this and report back.
>
> Do you think a patch adding a try/catch for a more informative log
> message would be appreciated by the community, e.g. one mentioning the
> field name in the exception?
>
> Regards,
> Jerven
>
> On 5/7/24 14:52, Jerven Tjalling Bolleman wrote:
> > Dear Michael,
> >
> > Thank you for your help.
> >
> > We don't use custom term frequencies (I just double-checked with a code
> > search).
> > We also always merge down to one segment (historical, but also because
> > we index once, then there are no changes for a week to a month, and then
> > we reindex every document from scratch).
> >
> > Your response is very helpful already and I very much appreciate it, as
> > it cuts down the search space significantly.
> >
> > Regards,
> > Jerven
> >
> >
> > On 5/7/24 14:03, Michael Sokolov wrote:
> >> It seems as if the term frequency for some term exceeded the maximum.
> >> This can happen if you supplied custom term frequencies, e.g. with
> >> TermFrequencyAttribute
> >> (https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/analysis/tokenattributes/TermFrequencyAttribute.html?is-external=true).
> >> The behavior didn't change since 8.x, but it's possible that the
> >> merging brought together some very "high frequency" terms that were
> >> previously not in the same segment?
> >>
> >> On Tue, May 7, 2024 at 4:03 AM Jerven Tjalling Bolleman
> >> <jerven.bolleman@sib.swiss> wrote:
> >>>
> >>> Dear Lucene community,
> >>>
> >>> This morning I found this exception in our logs. This was the first time
> >>> we indexed this data with Lucene 9.10; before that we were still on the
> >>> Lucene 8.x branch. Between the last indexing with 8.x and this one with
> >>> 9.10 we have a bit more data, so it could be something else that went
> >>> over a limit.
> >>>
> >>> Unfortunately, from this log message I am at a loss as to what is going
> >>> on, and what I could do to prevent this from happening. Does anyone have
> >>> any ideas?
> >>>
> >>> Regards,
> >>> Jerven Bolleman
> >>>
> >>>
> >>> Exception in thread "Lucene Merge Thread #202" org.apache.lucene.index.MergePolicy$MergeException: java.lang.ArithmeticException: integer overflow
> >>>     at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:735)
> >>>     at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:727)
> >>> Caused by: java.lang.ArithmeticException: integer overflow
> >>>     at java.base/java.lang.Math.toIntExact(Math.java:1135)
> >>>     at org.apache.lucene.store.DataOutput.writeGroupVInts(DataOutput.java:354)
> >>>     at org.apache.lucene.codecs.lucene99.Lucene99PostingsWriter.finishTerm(Lucene99PostingsWriter.java:379)
> >>>     at org.apache.lucene.codecs.PushPostingsWriterBase.writeTerm(PushPostingsWriterBase.java:173)
> >>>     at org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsWriter$TermsWriter.write(Lucene90BlockTreeTermsWriter.java:1097)
> >>>     at org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsWriter.write(Lucene90BlockTreeTermsWriter.java:398)
> >>>     at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:95)
> >>>     at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:205)
> >>>     at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:209)
> >>>     at org.apache.lucene.index.SegmentMerger.mergeWithLogging(SegmentMerger.java:298)
> >>>     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:137)
> >>>     at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5252)
> >>>     at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4740)
> >>>     at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:6541)
> >>>     at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:639)
> >>>     at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:700)
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >>> For additional commands, e-mail: java-user-h...@lucene.apache.org
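The root cause in the trace is `Math.toIntExact`, which throws `ArithmeticException("integer overflow")` for any long outside the int range, so whatever value reached it exceeded `Integer.MAX_VALUE` (2,147,483,647). The small sketch below only does the arithmetic on the count quoted in the thread: 1,288,526,281 from a previous release still fits in an int (about 60% of the maximum, leaving 858,957,366 of headroom), so this does not by itself identify the overflowing value. It only shows how close that count already was. `IntHeadroom` and its methods are hypothetical helpers, not Lucene API.

```java
/** Hypothetical helper: checks how close a count is to the int limit enforced by Math.toIntExact. */
class IntHeadroom {

    /** Fraction of Integer.MAX_VALUE taken up by the given count. */
    static double fractionOfIntMax(long count) {
        return (double) count / Integer.MAX_VALUE;
    }

    /** How much the count can still grow before it no longer fits in an int. */
    static long headroom(long count) {
        return Integer.MAX_VALUE - count; // long arithmetic, so no overflow here
    }

    /** True if Math.toIntExact would accept the value without throwing. */
    static boolean fitsInInt(long value) {
        try {
            Math.toIntExact(value);
            return true;
        } catch (ArithmeticException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        long previousRelease = 1_288_526_281L; // count quoted in the thread for a previous release
        System.out.printf("share of Integer.MAX_VALUE: %.1f%%%n", 100 * fractionOfIntMax(previousRelease));
        System.out.println("remaining headroom: " + headroom(previousRelease));          // 858957366
        System.out.println("fits in int: " + fitsInInt(previousRelease));                // true
        System.out.println("MAX_VALUE + 1 fits: " + fitsInInt(Integer.MAX_VALUE + 1L)); // false
    }
}
```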