Re: [PR] SOLR-17120: handle null value when merging partials (backport of solr#2214) [lucene-solr]
cpoerschke commented on code in PR #2683: URL: https://github.com/apache/lucene-solr/pull/2683#discussion_r1467470648 ## solr/CHANGES.txt: ## @@ -42,6 +42,9 @@ Bug Fixes * SOLR-17098: ZK Credentials and ACLs are no longer sent to all ZK Servers when using Streaming Expressions. They will only be used when sent to the default ZK Host. (Houston Putman, Jan Høydahl, David Smiley, Gus Heck, Qing Xu) +* SOLR-17120: Fix NullPointerException in UpdateLog.applyOlderUpdates that can occur if there are multiple partial + updates of the same document in separate requests using commitWithin. (Calvin Smith, Christine Poerschke) + Review Comment: Just noting that `main` and `branch_9x` and `branch_9_5` (if releasing after 8.11.3) don't have an 8.11.3 section yet but would presumably get it in bulk as part of the 8.11.3 release process, i.e. no need for individual tickets to add the entries incrementally on those branches. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[PR] Do not use HandleLimitFS for TestConcurrentMergeScheduler [lucene]
easyice opened a new pull request, #13035: URL: https://github.com/apache/lucene/pull/13035

```
./gradlew :lucene:core:test --tests "org.apache.lucene.index.TestConcurrentMergeScheduler.testNoStallMergeThreads" -Ptests.jvms=6 "-Ptests.jvmargs=-XX:TieredStopAtLevel=1 -XX:+UseParallelGC -XX:ActiveProcessorCount=1" -Ptests.seed=13FCF0E4FD5ABF60 -Ptests.nightly=true -Ptests.gui=false -Ptests.file.encoding=US-ASCII -Ptests.vectorsize=128
```

```
2> org.apache.lucene.index.MergePolicy$MergeException: java.nio.file.FileSystemException: /Users/zhangchao-so/Develop/git/fork/lucene/lucene/core/build/tmp/tests-tmp/lucene.index.TestConcurrentMergeScheduler_13FCF0E4FD5ABF60-008/index-NIOFSDirectory-001/_cr_Lucene99_0.tip: Too many open files
2>    at __randomizedtesting.SeedInfo.seed([13FCF0E4FD5ABF60]:0)
2>    at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:735)
2>    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:727)
2> Caused by: java.nio.file.FileSystemException: /Users/zhangchao-so/Develop/git/fork/lucene/lucene/core/build/tmp/tests-tmp/lucene.index.TestConcurrentMergeScheduler_13FCF0E4FD5ABF60-008/index-NIOFSDirectory-001/_cr_Lucene99_0.tip: Too many open files
2>    at org.apache.lucene.tests.mockfile.HandleLimitFS.onOpen(HandleLimitFS.java:67)
2>    at org.apache.lucene.tests.mockfile.HandleTrackingFS.callOpenHook(HandleTrackingFS.java:82)
2>    at org.apache.lucene.tests.mockfile.HandleTrackingFS.newFileChannel(HandleTrackingFS.java:202)
2>    at org.apache.lucene.tests.mockfile.FilterFileSystemProvider.newFileChannel(FilterFileSystemProvider.java:206)
2>    at java.base/java.nio.channels.FileChannel.open(FileChannel.java:298)
```

git bisect shows this commit as the perpetrator: d6836d3d0e5d33a98b35c0885b9787f46c4be47e
Re: [PR] Do not use mock merge policy for TestGrouping [lucene]
benwtrent merged PR #13034: URL: https://github.com/apache/lucene/pull/13034
Re: [I] Reproducible failure in TestGrouping.testRandom [lucene]
benwtrent closed issue #13025: Reproducible failure in TestGrouping.testRandom URL: https://github.com/apache/lucene/issues/13025
Re: [PR] Do not use HandleLimitFS for TestConcurrentMergeScheduler [lucene]
rmuir commented on PR #13035: URL: https://github.com/apache/lucene/pull/13035#issuecomment-1911992774 This is not the correct fix: instead the test must be fixed to not use so many files at once.
Re: [PR] Do not use HandleLimitFS for TestConcurrentMergeScheduler [lucene]
rmuir commented on code in PR #13035: URL: https://github.com/apache/lucene/pull/13035#discussion_r1467598866 ## lucene/core/src/test/org/apache/lucene/index/TestConcurrentMergeScheduler.java: ## @@ -43,6 +43,8 @@ import org.apache.lucene.tests.util.TestUtil; import org.apache.lucene.util.InfoStream; +// the testNoStallMergeThreads method will create many files +@LuceneTestCase.SuppressFileSystems("HandleLimitFS") Review Comment: This must never be suppressed for any test. It defeats the entire purpose, which is to detect tests that incorrectly use too many files rather than failing only on some computer systems and not on others.
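The idea behind the review comment, a check that fails deterministically once a test holds too many files open, can be sketched roughly as follows. This is a simplified, hypothetical model for illustration only: the class `HandleLimitSketch` and its methods are assumptions and are not Lucene's `HandleLimitFS` implementation, although the error text mirrors the "Too many open files" message seen in the stack traces in this thread.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

/** Simplified, hypothetical model of a handle-limit check; not Lucene's HandleLimitFS. */
final class HandleLimitSketch {
  private final int limit;
  private final AtomicInteger open = new AtomicInteger();

  HandleLimitSketch(int limit) {
    this.limit = limit;
  }

  /** Called when a file is opened; fails once more than {@code limit} handles would be open. */
  void onOpen(String path) throws IOException {
    if (open.incrementAndGet() > limit) {
      open.decrementAndGet(); // the open did not succeed, so undo the count
      throw new IOException(path + ": Too many open files");
    }
  }

  /** Called when a file is closed; frees one handle. */
  void onClose() {
    open.decrementAndGet();
  }

  int openCount() {
    return open.get();
  }
}
```

Because the limit is enforced in the test harness rather than by the operating system, a test that opens too many files fails on every machine, which is the property the reviewer wants to preserve.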
Re: [PR] Do not use HandleLimitFS for TestConcurrentMergeScheduler [lucene]
easyice commented on code in PR #13035: URL: https://github.com/apache/lucene/pull/13035#discussion_r1467674266 ## lucene/core/src/test/org/apache/lucene/index/TestConcurrentMergeScheduler.java: ## @@ -43,6 +43,8 @@ import org.apache.lucene.tests.util.TestUtil; import org.apache.lucene.util.InfoStream; +// the testNoStallMergeThreads method will create many files +@LuceneTestCase.SuppressFileSystems("HandleLimitFS") Review Comment: Thank you @rmuir, fixed!
Re: [PR] Do not use HandleLimitFS for TestConcurrentMergeScheduler [lucene]
easyice commented on PR #13035: URL: https://github.com/apache/lucene/pull/13035#issuecomment-1912109709 Thank you! I replaced the suppression with `setMaxBufferedDocs`; the number of open files in nightly tests dropped from 4000+ to 400+. Does that look okay? `testDeleteMerging` might also create many files, so I added a similar change to limit them. Can **not** be reproduced:

```
> Caused by:
> java.nio.file.FileSystemException: /dev/shm/lucene_candidate/lucene/core/build/tmp/tests-tmp/lucene.index.TestConcurrentMergeScheduler_13FCF0E4FD5ABF60-001/index-NIOFSDirectory-001/_cr_Lucene99_0.tip: Too many open files
> at org.apache.lucene.tests.mockfile.HandleLimitFS.onOpen(HandleLimitFS.java:67)
> at org.apache.lucene.tests.mockfile.HandleTrackingFS.callOpenHook(HandleTrackingFS.java:82)
> at org.apache.lucene.tests.mockfile.HandleTrackingFS.newFileChannel(HandleTrackingFS.java:202)
> at org.apache.lucene.tests.mockfile.FilterFileSystemProvider.newFileChannel(FilterFileSystemProvider.java:206)
> at java.base/java.nio.channels.FileChannel.open(FileChannel.java:309)
> at java.base/java.nio.channels.FileChannel.open(FileChannel.java:369)
> at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:78)
> at org.apache.lucene.tests.util.LuceneTestCase.slowFileExists(LuceneTestCase.java:3014)
> at org.apache.lucene.tests.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:800)
> at org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsReader.<init>(Lucene90BlockTreeTermsReader.java:146)
> at org.apache.lucene.codecs.lucene99.Lucene99PostingsFormat.fieldsProducer(Lucene99PostingsFormat.java:428)
> at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.<init>(PerFieldPostingsFormat.java:330)
> at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:392)
> at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:98)
> at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:95)
> at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:178)
> at org.apache.lucene.index.ReadersAndUpdates.getReaderForMerge(ReadersAndUpdates.java:784)
> at org.apache.lucene.index.IndexWriter.lambda$mergeMiddle$21(IndexWriter.java:5144)
> at org.apache.lucene.index.MergePolicy$OneMerge.initMergeReaders(MergePolicy.java:469)
> at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5140)
> at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4738)
> at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:6539)
> at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:639)
> at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:700)
2> NOTE: reproduce with: gradlew test --tests TestConcurrentMergeScheduler.testDeleteMerging -Dtests.seed=13FCF0E4FD5ABF60 -Dtests.nightly=true -Dtests.locale=mgh-Latn-MZ -Dtests.timezone=Asia/Dili -Dtests.asserts=true -Dtests.file.encoding=UTF-8
```
Re: [PR] Speedup concurrent multi-segment HNWS graph search 2 [lucene]
benwtrent commented on PR #12962: URL: https://github.com/apache/lucene/pull/12962#issuecomment-1912157192 I ran my own experiment, which showed some interesting and frustrating results. I adjusted the indexing to randomly commit() on every 500 docs or so. I indexed the first 10M docs of cohere-wiki and used max-inner product over the raw float32. This showed that we have some graph building problems, will include those results as well. # Graph ![image](https://github.com/apache/lucene/assets/4357155/7670c364-e66a-4719-a6b5-3f068d732863) The python code & raw data used: ```python ```
Re: [PR] Early exit range queries on non-matching segments. [lucene]
jpountz merged PR #13033: URL: https://github.com/apache/lucene/pull/13033
Re: [PR] Propagate minimum competitive score in ReqOptSumScorer. [lucene]
jpountz merged PR #13026: URL: https://github.com/apache/lucene/pull/13026
Re: [PR] SOLR-17120: handle null value when merging partials (backport of solr#2214) [lucene-solr]
cpoerschke merged PR #2683: URL: https://github.com/apache/lucene-solr/pull/2683
Re: [PR] Fix too many open files Exception for TestConcurrentMergeScheduler [lucene]
rmuir commented on PR #13035: URL: https://github.com/apache/lucene/pull/13035#issuecomment-1912510553 Thank you @easyice ! There is also a `TestUtil.reduceOpenFiles`, I'm not sure if it is appropriate here, but something to look into. It is also nice since it makes it obvious from the code why the settings are being changed.
Re: [I] HnwsGraph creates disconnected components [lucene]
benwtrent commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-1912598493 OK, I added https://github.com/mikemccand/luceneutil/pull/253 Doing some local benchmarking. It seems that the more merges occur, the worse we can get. Sometimes I get good graphs like this:

```
Leaf 5 has 5 layers
Leaf 5 has 137196 documents
Graph level=4 size=2, Fanout min=1, mean=1.00, max=1
% 0 10 20 30 40 50 60 70 80 90 100
0 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Graph level=3 size=34, Fanout min=4, mean=7.82, max=12
% 0 10 20 30 40 50 60 70 80 90 100
0 5 6 7 7 8 8 9 9 10 12
Graph level=2 size=545, Fanout min=13, mean=15.98, max=16
% 0 10 20 30 40 50 60 70 80 90 100
0 16 16 16 16 16 16 16 16 16 16
Graph level=1 size=8693, Fanout min=16, mean=16.00, max=16
% 0 10 20 30 40 50 60 70 80 90 100
0 16 16 16 16 16 16 16 16 16 16
Graph level=0 size=137196, Fanout min=3, mean=24.57, max=32
% 0 10 20 30 40 50 60 70 80 90 100
0 14 17 19 22 26 31 32 32 32 32
Graph level=4 size=2, connectedness=1.00
Graph level=3 size=34, connectedness=1.00
Graph level=2 size=545, connectedness=1.00
Graph level=1 size=8693, connectedness=1.00
Graph level=0 size=137196, connectedness=1.00
```

Other times I get graphs that are pretty abysmal:

```
Leaf 7 has 4 layers
Leaf 7 has 39628 documents
Graph level=3 size=8, Fanout min=1, mean=1.75, max=7
% 0 10 20 30 40 50 60 70 80 90 100
0 1 1 1 1 1 1 1 1 1 7 7
Graph level=2 size=153, Fanout min=1, mean=1.10, max=16
% 0 10 20 30 40 50 60 70 80 90 100
0 1 1 1 1 1 1 1 1 1 16
Graph level=1 size=2503, Fanout min=1, mean=1.01, max=16
% 0 10 20 30 40 50 60 70 80 90 100
0 1 1 1 1 1 1 1 1 1 16
Graph level=0 size=39628, Fanout min=1, mean=1.00, max=32
% 0 10 20 30 40 50 60 70 80 90 100
0 1 1 1 1 1 1 1 1 1 32
Graph level=3 size=8, connectedness=1.00
Graph level=2 size=153, connectedness=0.12
Graph level=1 size=2503, connectedness=0.01
Graph level=0 size=39628, connectedness=0.00
```

The numbers are so bad, I almost think this is a bug in my measurements, but it isn't clear to me where it would be. I am going to validate older versions of Lucene to see if this changes.
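The message does not show how the `connectedness` figure is computed. One plausible reading, stated here purely as an assumption, is the fraction of nodes on a level that can be reached from an entry node by following neighbor links. A minimal sketch of that measurement, using a hypothetical `ConnectednessSketch` class and a plain adjacency array rather than any Lucene or luceneutil type:

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper; assumes connectedness = reachable nodes / total nodes. */
final class ConnectednessSketch {
  /**
   * Breadth-first search from entryNode over neighbors[node] (the outgoing links of each node).
   * Returns the fraction of all nodes that were reached, including the entry node itself.
   */
  static double connectedness(int[][] neighbors, int entryNode) {
    int n = neighbors.length;
    if (n == 0) {
      return 0.0; // an empty graph has nothing to reach
    }
    boolean[] visited = new boolean[n];
    Deque<Integer> queue = new ArrayDeque<>();
    visited[entryNode] = true;
    queue.add(entryNode);
    int reached = 1;
    while (!queue.isEmpty()) {
      int node = queue.poll();
      for (int next : neighbors[node]) {
        if (!visited[next]) {
          visited[next] = true;
          reached++;
          queue.add(next);
        }
      }
    }
    return (double) reached / n;
  }
}
```

Under this reading, a value such as 0.12 on a level of 153 nodes would mean that only about 18 of them are reachable from the entry node, which is consistent with the fanout rows where nearly every node has a single neighbor.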
Re: [I] byte to int in TruncateTokenFilterFactory to TruncateTokenFilter [lucene]
scampi commented on issue #12449: URL: https://github.com/apache/lucene/issues/12449#issuecomment-1912710789 The PR got merged, this can be closed.
[PR] GitHub#12946: Removing thread sleep calls from TestIndexWriter.testThreadInterruptDeadlock and TestDirectoryReader.testStressTryIncRef [lucene]
iamsanjay opened a new pull request, #13037: URL: https://github.com/apache/lucene/pull/13037 ### Description As suggested in #12946, I tried to tackle two test cases by replacing the `Thread.sleep` calls in `TestIndexWriter.testThreadInterruptDeadlock` and `TestDirectoryReader.testStressTryIncRef`. And of course, #13001 is going to precede this one. For now I am seeking suggestions from the community regarding the new code that would replace the `Thread.sleep` calls.
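One common way to remove a `Thread.sleep` from a concurrent test is to wait on an explicit signal instead of a fixed delay. The sketch below uses `java.util.concurrent.CountDownLatch` for that. It is a generic illustration under stated assumptions, not the change proposed in the PR: the class `LatchInsteadOfSleepSketch` and the method `awaitWorkerStarted` are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical example: wait for a worker's signal rather than sleeping a fixed time. */
final class LatchInsteadOfSleepSketch {
  /** Starts a worker and blocks until it signals progress; returns true if the signal arrived in time. */
  static boolean awaitWorkerStarted(long timeoutMillis) {
    CountDownLatch started = new CountDownLatch(1);
    AtomicBoolean didWork = new AtomicBoolean(false);
    Thread worker = new Thread(() -> {
      didWork.set(true);   // the work the test wants to observe
      started.countDown(); // signal the waiting thread
    });
    worker.start();
    try {
      // Returns as soon as the worker signals; the timeout only bounds the failure case.
      boolean signalled = started.await(timeoutMillis, TimeUnit.MILLISECONDS);
      worker.join();
      return signalled && didWork.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status for callers
      return false;
    }
  }
}
```

The waiting thread proceeds as soon as the worker has made progress, so the test is neither slowed by an overly long sleep nor made flaky by one that is too short.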