Re: [PR] SOLR-17120: handle null value when merging partials (backport of solr#2214) [lucene-solr]

2024-01-26 Thread via GitHub


cpoerschke commented on code in PR #2683:
URL: https://github.com/apache/lucene-solr/pull/2683#discussion_r1467470648


##
solr/CHANGES.txt:
##
@@ -42,6 +42,9 @@ Bug Fixes
 * SOLR-17098: ZK Credentials and ACLs are no longer sent to all ZK Servers when using Streaming Expressions.
   They will only be used when sent to the default ZK Host. (Houston Putman, Jan Høydahl, David Smiley, Gus Heck, Qing Xu)
 
+* SOLR-17120: Fix NullPointerException in UpdateLog.applyOlderUpdates that can occur if there are multiple partial
+  updates of the same document in separate requests using commitWithin. (Calvin Smith, Christine Poerschke)
+

Review Comment:
   Just noting that `main`, `branch_9x` and `branch_9_5` (if releasing after
8.11.3) don't have an 8.11.3 section yet, but would presumably get it in bulk
as part of the 8.11.3 release process, i.e. there is no need for individual
tickets to add the entries incrementally on those branches.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Do not use HandleLimitFS for TestConcurrentMergeScheduler [lucene]

2024-01-26 Thread via GitHub


easyice opened a new pull request, #13035:
URL: https://github.com/apache/lucene/pull/13035

   ```
   ./gradlew :lucene:core:test --tests "org.apache.lucene.index.TestConcurrentMergeScheduler.testNoStallMergeThreads" -Ptests.jvms=6 "-Ptests.jvmargs=-XX:TieredStopAtLevel=1 -XX:+UseParallelGC -XX:ActiveProcessorCount=1" -Ptests.seed=13FCF0E4FD5ABF60 -Ptests.nightly=true -Ptests.gui=false -Ptests.file.encoding=US-ASCII -Ptests.vectorsize=128
   ```
   
   
   ```
 2> org.apache.lucene.index.MergePolicy$MergeException: java.nio.file.FileSystemException: /Users/zhangchao-so/Develop/git/fork/lucene/lucene/core/build/tmp/tests-tmp/lucene.index.TestConcurrentMergeScheduler_13FCF0E4FD5ABF60-008/index-NIOFSDirectory-001/_cr_Lucene99_0.tip: Too many open files
 2>    at __randomizedtesting.SeedInfo.seed([13FCF0E4FD5ABF60]:0)
 2>    at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:735)
 2>    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:727)
 2> Caused by: java.nio.file.FileSystemException: /Users/zhangchao-so/Develop/git/fork/lucene/lucene/core/build/tmp/tests-tmp/lucene.index.TestConcurrentMergeScheduler_13FCF0E4FD5ABF60-008/index-NIOFSDirectory-001/_cr_Lucene99_0.tip: Too many open files
 2>    at org.apache.lucene.tests.mockfile.HandleLimitFS.onOpen(HandleLimitFS.java:67)
 2>    at org.apache.lucene.tests.mockfile.HandleTrackingFS.callOpenHook(HandleTrackingFS.java:82)
 2>    at org.apache.lucene.tests.mockfile.HandleTrackingFS.newFileChannel(HandleTrackingFS.java:202)
 2>    at org.apache.lucene.tests.mockfile.FilterFileSystemProvider.newFileChannel(FilterFileSystemProvider.java:206)
 2>    at java.base/java.nio.channels.FileChannel.open(FileChannel.java:298)
   ```
   
   `git bisect` identifies this commit as the culprit: d6836d3d0e5d33a98b35c0885b9787f46c4be47e
   
   





Re: [PR] Do not use mock merge policy for TestGrouping [lucene]

2024-01-26 Thread via GitHub


benwtrent merged PR #13034:
URL: https://github.com/apache/lucene/pull/13034





Re: [I] Reproducible failure in TestGrouping.testRandom [lucene]

2024-01-26 Thread via GitHub


benwtrent closed issue #13025: Reproducible failure in TestGrouping.testRandom
URL: https://github.com/apache/lucene/issues/13025





Re: [PR] Do not use HandleLimitFS for TestConcurrentMergeScheduler [lucene]

2024-01-26 Thread via GitHub


rmuir commented on PR #13035:
URL: https://github.com/apache/lucene/pull/13035#issuecomment-1911992774

   This is not the correct fix: instead the test must be fixed to not use so 
many files at once.





Re: [PR] Do not use HandleLimitFS for TestConcurrentMergeScheduler [lucene]

2024-01-26 Thread via GitHub


rmuir commented on code in PR #13035:
URL: https://github.com/apache/lucene/pull/13035#discussion_r1467598866


##
lucene/core/src/test/org/apache/lucene/index/TestConcurrentMergeScheduler.java:
##
@@ -43,6 +43,8 @@
 import org.apache.lucene.tests.util.TestUtil;
 import org.apache.lucene.util.InfoStream;
 
+// the testNoStallMergeThreads method will create many files
+@LuceneTestCase.SuppressFileSystems("HandleLimitFS")

Review Comment:
   This must never be suppressed for any test. It defeats the entire purpose,
which is to detect tests that incorrectly use too many files, rather than
having them fail only on some computer systems and not on others.
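
For context, the idea behind a handle-limiting mock filesystem can be sketched in a few lines. The following is a simplified, hypothetical illustration and not Lucene's `HandleLimitFS` implementation: the class name `HandleLimiter`, its methods, and the limit value are assumptions made for this example. It counts open handles and fails deterministically with "Too many open files" once a fixed cap is exceeded, so a test that opens too many files fails on every machine rather than only on those with a low OS limit.

```java
import java.nio.file.FileSystemException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: caps the number of simultaneously open handles. */
final class HandleLimiter {
  private final int limit;
  private final AtomicInteger open = new AtomicInteger();

  HandleLimiter(int limit) {
    this.limit = limit;
  }

  /** Call when a file is opened; throws once the cap would be exceeded. */
  void onOpen(String path) throws FileSystemException {
    if (open.incrementAndGet() > limit) {
      open.decrementAndGet(); // the failed open does not hold a handle
      throw new FileSystemException(path, null, "Too many open files");
    }
  }

  /** Call when a file is closed. */
  void onClose() {
    open.decrementAndGet();
  }

  /** Number of handles currently counted as open. */
  int openCount() {
    return open.get();
  }
}
```

Suppressing such a check would hide the excessive file usage instead of removing it, which is the concern raised in the review comment above.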






Re: [PR] Do not use HandleLimitFS for TestConcurrentMergeScheduler [lucene]

2024-01-26 Thread via GitHub


easyice commented on code in PR #13035:
URL: https://github.com/apache/lucene/pull/13035#discussion_r1467674266


##
lucene/core/src/test/org/apache/lucene/index/TestConcurrentMergeScheduler.java:
##
@@ -43,6 +43,8 @@
 import org.apache.lucene.tests.util.TestUtil;
 import org.apache.lucene.util.InfoStream;
 
+// the testNoStallMergeThreads method will create many files
+@LuceneTestCase.SuppressFileSystems("HandleLimitFS")

Review Comment:
   Thank you @rmuir , fixed!






Re: [PR] Do not use HandleLimitFS for TestConcurrentMergeScheduler [lucene]

2024-01-26 Thread via GitHub


easyice commented on PR #13035:
URL: https://github.com/apache/lucene/pull/13035#issuecomment-1912109709

   Thank you! I replaced the suppression with `setMaxBufferedDocs`; the number
of open files in nightly tests dropped from 4000+ to 400+. Does that look okay?
   
   The `testDeleteMerging` test might also create many files, so I added a
similar change to limit them.
   The following failure can **not** be reproduced:
   
   
   ```
  > Caused by:
  > java.nio.file.FileSystemException: /dev/shm/lucene_candidate/lucene/core/build/tmp/tests-tmp/lucene.index.TestConcurrentMergeScheduler_13FCF0E4FD5ABF60-001/index-NIOFSDirectory-001/_cr_Lucene99_0.tip: Too many open files
  >     at org.apache.lucene.tests.mockfile.HandleLimitFS.onOpen(HandleLimitFS.java:67)
  >     at org.apache.lucene.tests.mockfile.HandleTrackingFS.callOpenHook(HandleTrackingFS.java:82)
  >     at org.apache.lucene.tests.mockfile.HandleTrackingFS.newFileChannel(HandleTrackingFS.java:202)
  >     at org.apache.lucene.tests.mockfile.FilterFileSystemProvider.newFileChannel(FilterFileSystemProvider.java:206)
  >     at java.base/java.nio.channels.FileChannel.open(FileChannel.java:309)
  >     at java.base/java.nio.channels.FileChannel.open(FileChannel.java:369)
  >     at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:78)
  >     at org.apache.lucene.tests.util.LuceneTestCase.slowFileExists(LuceneTestCase.java:3014)
  >     at org.apache.lucene.tests.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:800)
  >     at org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsReader.<init>(Lucene90BlockTreeTermsReader.java:146)
  >     at org.apache.lucene.codecs.lucene99.Lucene99PostingsFormat.fieldsProducer(Lucene99PostingsFormat.java:428)
  >     at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.<init>(PerFieldPostingsFormat.java:330)
  >     at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:392)
  >     at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:98)
  >     at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:95)
  >     at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:178)
  >     at org.apache.lucene.index.ReadersAndUpdates.getReaderForMerge(ReadersAndUpdates.java:784)
  >     at org.apache.lucene.index.IndexWriter.lambda$mergeMiddle$21(IndexWriter.java:5144)
  >     at org.apache.lucene.index.MergePolicy$OneMerge.initMergeReaders(MergePolicy.java:469)
  >     at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5140)
  >     at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4738)
  >     at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:6539)
  >     at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:639)
  >     at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:700)
 2> NOTE: reproduce with: gradlew test --tests TestConcurrentMergeScheduler.testDeleteMerging -Dtests.seed=13FCF0E4FD5ABF60 -Dtests.nightly=true -Dtests.locale=mgh-Latn-MZ -Dtests.timezone=Asia/Dili -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   ```
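
As a rough illustration of why a larger `setMaxBufferedDocs` value would be expected to lower the open-file count: each flush writes a new segment, and each segment consists of several files, so fewer flushes leave fewer files for merges to open. The sketch below is back-of-envelope arithmetic only. The document count, buffer sizes, and files-per-segment figure are made-up assumptions, not measurements from this test, and the estimate ignores merging and compound files.

```java
/** Hypothetical estimate of how many files flushed segments produce. */
final class OpenFileEstimate {
  /** Segments written when a flush happens every maxBufferedDocs documents. */
  static long flushedSegments(long docs, int maxBufferedDocs) {
    return (docs + maxBufferedDocs - 1) / maxBufferedDocs; // ceiling division
  }

  /** Upper bound on files if every flushed segment stays on disk unmerged. */
  static long estimatedFiles(long docs, int maxBufferedDocs, int filesPerSegment) {
    return flushedSegments(docs, maxBufferedDocs) * filesPerSegment;
  }

  public static void main(String[] args) {
    // Assumed figures: 1,000 documents and 10 files per segment.
    System.out.println(estimatedFiles(1_000, 2, 10));  // flush every 2 docs: prints 5000
    System.out.println(estimatedFiles(1_000, 20, 10)); // flush every 20 docs: prints 500
  }
}
```

Under these assumed figures, a tenfold increase in the buffer size gives a tenfold drop in the estimate, the same order of change as the 4000+ to 400+ reported above.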
   





Re: [PR] Speedup concurrent multi-segment HNWS graph search 2 [lucene]

2024-01-26 Thread via GitHub


benwtrent commented on PR #12962:
URL: https://github.com/apache/lucene/pull/12962#issuecomment-1912157192

   I ran my own experiment, which showed some interesting and frustrating
results.
   
   I adjusted the indexing to call `commit()` at random, roughly every 500
docs. I indexed the first 10M docs of cohere-wiki and used maximum inner
product over the raw float32 vectors. This showed that we have some graph
building problems; I will include those results as well.
   
   # Graph
   
   
![image](https://github.com/apache/lucene/assets/4357155/7670c364-e66a-4719-a6b5-3f068d732863)
   
   The python code & raw data used:
   
   
   
   ```python
   ```
   
   





Re: [PR] Early exit range queries on non-matching segments. [lucene]

2024-01-26 Thread via GitHub


jpountz merged PR #13033:
URL: https://github.com/apache/lucene/pull/13033





Re: [PR] Propagate minimum competitive score in ReqOptSumScorer. [lucene]

2024-01-26 Thread via GitHub


jpountz merged PR #13026:
URL: https://github.com/apache/lucene/pull/13026





Re: [PR] SOLR-17120: handle null value when merging partials (backport of solr#2214) [lucene-solr]

2024-01-26 Thread via GitHub


cpoerschke merged PR #2683:
URL: https://github.com/apache/lucene-solr/pull/2683





Re: [PR] Fix too many open files Exception for TestConcurrentMergeScheduler [lucene]

2024-01-26 Thread via GitHub


rmuir commented on PR #13035:
URL: https://github.com/apache/lucene/pull/13035#issuecomment-1912510553

   Thank you @easyice ! There is also `TestUtil.reduceOpenFiles`; I'm not
sure whether it is appropriate here, but it is something to look into. It is
also nice because it makes it obvious from the code why the settings are being
changed.





Re: [I] HnwsGraph creates disconnected components [lucene]

2024-01-26 Thread via GitHub


benwtrent commented on issue #12627:
URL: https://github.com/apache/lucene/issues/12627#issuecomment-1912598493

   OK, I added https://github.com/mikemccand/luceneutil/pull/253 
   
   Doing some local benchmarking. It seems that the more merges occur, the 
worse we can get. 
   
   Sometimes I get good graphs like this: 
   
   ```
   Leaf 5 has 5 layers
   Leaf 5 has 137196 documents
   Graph level=4 size=2, Fanout min=1, mean=1.00, max=1
   %   0  10  20  30  40  50  60  70  80  90 100
   0   1   1   1   1   1   1   1   1   1   1   1   1   1   1
   Graph level=3 size=34, Fanout min=4, mean=7.82, max=12
   %   0  10  20  30  40  50  60  70  80  90 100
   0   5   6   7   7   8   8   9   9  10  12
   Graph level=2 size=545, Fanout min=13, mean=15.98, max=16
   %   0  10  20  30  40  50  60  70  80  90 100
   0  16  16  16  16  16  16  16  16  16  16
   Graph level=1 size=8693, Fanout min=16, mean=16.00, max=16
   %   0  10  20  30  40  50  60  70  80  90 100
   0  16  16  16  16  16  16  16  16  16  16
   Graph level=0 size=137196, Fanout min=3, mean=24.57, max=32
   %   0  10  20  30  40  50  60  70  80  90 100
   0  14  17  19  22  26  31  32  32  32  32
   Graph level=4 size=2, connectedness=1.00
   Graph level=3 size=34, connectedness=1.00
   Graph level=2 size=545, connectedness=1.00
   Graph level=1 size=8693, connectedness=1.00
   Graph level=0 size=137196, connectedness=1.00
   ```
   
   Other times I get graphs that are pretty abysmal: 
   ```
   Leaf 7 has 4 layers
   Leaf 7 has 39628 documents
   Graph level=3 size=8, Fanout min=1, mean=1.75, max=7
   %   0  10  20  30  40  50  60  70  80  90 100
   0   1   1   1   1   1   1   1   1   1   7   7
   Graph level=2 size=153, Fanout min=1, mean=1.10, max=16
   %   0  10  20  30  40  50  60  70  80  90 100
   0   1   1   1   1   1   1   1   1   1  16
   Graph level=1 size=2503, Fanout min=1, mean=1.01, max=16
   %   0  10  20  30  40  50  60  70  80  90 100
   0   1   1   1   1   1   1   1   1   1  16
   Graph level=0 size=39628, Fanout min=1, mean=1.00, max=32
   %   0  10  20  30  40  50  60  70  80  90 100
   0   1   1   1   1   1   1   1   1   1  32
   Graph level=3 size=8, connectedness=1.00
   Graph level=2 size=153, connectedness=0.12
   Graph level=1 size=2503, connectedness=0.01
   Graph level=0 size=39628, connectedness=0.00
   ```
   
   The numbers are so bad, I almost think this is a bug in my measurements, but 
it isn't clear to me where it would be. 
   
   I am going to validate older versions of Lucene to see if this changes.
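
To make the `connectedness` column concrete, here is a minimal sketch of one way such a figure could be computed for a single graph level: the fraction of nodes reachable from an entry node by following neighbor links. This is an assumption about the metric, not the code from the luceneutil PR. The class name, the adjacency-array representation, and the choice of entry node are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: fraction of a level's nodes reachable from an entry node. */
final class Connectedness {
  /**
   * @param neighbors neighbors[i] lists the nodes that node i links to
   * @param entry index of the node the traversal starts from
   */
  static double of(int[][] neighbors, int entry) {
    if (neighbors.length == 0) {
      return 0.0;
    }
    boolean[] seen = new boolean[neighbors.length];
    Deque<Integer> queue = new ArrayDeque<>();
    seen[entry] = true;
    queue.add(entry);
    int reached = 0;
    // Breadth-first traversal over the level's neighbor links.
    while (!queue.isEmpty()) {
      int node = queue.poll();
      reached++;
      for (int next : neighbors[node]) {
        if (!seen[next]) {
          seen[next] = true;
          queue.add(next);
        }
      }
    }
    return (double) reached / neighbors.length;
  }
}
```

Under this reading, 1.00 means every node on the level is reachable, while values near 0.00, as in the second dump, would mean a search starting at the entry node can only ever visit a small fraction of the level.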





Re: [I] byte to int in TruncateTokenFilterFactory to TruncateTokenFilter [lucene]

2024-01-26 Thread via GitHub


scampi commented on issue #12449:
URL: https://github.com/apache/lucene/issues/12449#issuecomment-1912710789

   The PR got merged, this can be closed.





[PR] GitHub#12946: Removing thread sleep calls from TestIndexWriter.testThreadInterruptDeadlock and TestDirectoryReader.testStressTryIncRef [lucene]

2024-01-26 Thread via GitHub


iamsanjay opened a new pull request, #13037:
URL: https://github.com/apache/lucene/pull/13037

   ### Description
   
   
   As suggested in #12946, I tackled two test cases by replacing the
`Thread.sleep` calls in `TestIndexWriter.testThreadInterruptDeadlock`
and `TestDirectoryReader.testStressTryIncRef`.
   
   And of course, #13001 is going to precede this one. For now, I am seeking
suggestions from the community regarding the new code that would replace the
`Thread.sleep` calls.
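
One common way to remove a `Thread.sleep` from a concurrent test is to wait on an explicit signal instead of on elapsed time. The sketch below is a generic illustration using `java.util.concurrent.CountDownLatch`, not the change proposed in this PR; the class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: wait for a worker via a latch instead of Thread.sleep. */
final class LatchInsteadOfSleep {
  /** Starts a worker and returns its result once it signals that it is done. */
  static int runWorker() throws InterruptedException {
    CountDownLatch done = new CountDownLatch(1);
    AtomicInteger result = new AtomicInteger();
    Thread worker = new Thread(() -> {
      result.set(42);   // stand-in for the real work
      done.countDown(); // signal completion
    });
    worker.start();
    // Blocks only as long as needed; the timeout just guards against a hang.
    if (!done.await(10, TimeUnit.SECONDS)) {
      throw new IllegalStateException("worker did not finish in time");
    }
    worker.join();
    return result.get();
  }
}
```

Compared with sleeping for a fixed interval, the waiting thread resumes as soon as the worker signals, so the test is neither slower than necessary nor dependent on the sleep being long enough.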
   

