[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146683#comment-17146683 ]
Michael McCandless commented on LUCENE-8962:
--------------------------------------------

LOL, that's awesome.

Hmm, beasting just hit another failure in same source:

{noformat}
[junit4:pickseed] Seed property 'tests.seed' already defined: BC66D44FB7C99235
[junit4] <JUnit4> says salut! Master seed: BC66D44FB7C99235
[junit4] Executing 1 suite with 1 JVM.
[junit4]
[junit4] Started J0 PID(1739111@localhost).
[junit4] Suite: org.apache.lucene.search.TestPhraseWildcardQuery
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestPhraseWildcardQuery -Dtests.method=testMaxExpansions -Dtests.seed=BC66D44FB7C99235 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=es -Dtests.timezone=Africa/Douala -Dtests.asserts=true -Dtests.file.encoding=UTF-8
[junit4] FAILURE 0.22s | TestPhraseWildcardQuery.testMaxExpansions <<<
[junit4] > Throwable #1: java.lang.AssertionError: test test relies on 2 segments expected:<2> but was:<1>
[junit4] > at __randomizedtesting.SeedInfo.seed([BC66D44FB7C99235:83D14835E8C05BA3]:0)
[junit4] > at org.apache.lucene.search.TestPhraseWildcardQuery.setUp(TestPhraseWildcardQuery.java:76)
[junit4] > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit4] > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[junit4] > at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[junit4] > at java.base/java.lang.reflect.Method.invoke(Method.java:566)
[junit4] > at java.base/java.lang.Thread.run(Thread.java:834)
[junit4] 2> NOTE: test params are: codec=Asserting(Lucene86), sim=Asserting(RandomSimilarity(queryNorm=true): {other=DFR I(n)B2, author=org.apache.lucene.search.similarities.BooleanSimilarity@3dcd0436, title=DFR GL3(800.0), category=DFR I(F)LZ(0.3)}), locale=es, timezone=Africa/Douala
[junit4] 2> NOTE: Linux 5.5.6-arch1-1 amd64/Oracle Corporation 11.0.6 (64-bit)/cpus=128,threads=1,free=477476440,total=536870912
[junit4] 2> NOTE: All tests run in this JVM: [TestPhraseWildcardQuery]
[junit4] Completed [1/1 (1!)] in 0.46s, 1 test, 1 failure <<< FAILURES!
[junit4]
[junit4]
[junit4] Tests with failures [seed: BC66D44FB7C99235]:
[junit4] - org.apache.lucene.search.TestPhraseWildcardQuery.testMaxExpansions
[junit4]
[junit4]
[junit4] JVM J0: 0.34 .. 1.19 = 0.85s
[junit4] Execution time total: 1.20 sec.
[junit4] Tests summary: 1 suite, 1 test, 1 failure
{noformat}

which is odd because your fix (in {{.setUp}}) should have also fixed this one.

> Can we merge small segments during refresh, for faster searching?
> -----------------------------------------------------------------
>
> Key: LUCENE-8962
> URL: https://issues.apache.org/jira/browse/LUCENE-8962
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/index
> Reporter: Michael McCandless
> Priority: Major
> Fix For: 8.6
>
> Attachments: LUCENE-8962_demo.png, failed-tests.patch,
> failure_log.txt, test.diff
>
> Time Spent: 21h 50m
> Remaining Estimate: 0h
>
> With near-real-time search we ask {{IndexWriter}} to write all in-memory
> segments to disk and open an {{IndexReader}} to search them, and this is
> typically a quick operation.
> However, when you use many threads for concurrent indexing, {{IndexWriter}}
> will write many small segments during {{refresh}} and this then
> adds search-time cost as searching must visit all of these tiny segments.
> The merge policy would normally quickly coalesce these small segments if
> given a little time ... so, could we somehow improve {{IndexWriter}}'s
> refresh to optionally kick off merge policy to merge segments below some
> threshold before opening the near-real-time reader? It'd be a bit tricky
> because while we are waiting for merges, indexing may continue, and new
> segments may be flushed, but those new segments shouldn't be included in the
> point-in-time segments returned by refresh ...
> One could almost do this on top of Lucene today, with a custom merge policy,
> and some hackity logic to have the merge policy target small segments just
> written by refresh, but it's tricky to then open a near-real-time reader,
> excluding newly flushed but including newly merged segments since the refresh
> originally finished ...
> I'm not yet sure how best to solve this, so I wanted to open an issue for
> discussion!
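
The point-in-time requirement in the description above can be illustrated with a small, Lucene-free Java sketch. Everything in it is an assumption made for illustration: {{RefreshMergeSketch}}, {{Segment}}, {{mergeSmallSegments}} and the byte threshold are hypothetical names, not Lucene APIs, and the "merge" is just a sum of sizes. It only shows the bookkeeping: freeze the segment list when refresh starts, merge the segments below the threshold, and leave out anything flushed afterwards.

```java
import java.util.ArrayList;
import java.util.List;

public class RefreshMergeSketch {

    /** Hypothetical stand-in for a flushed segment; not a Lucene class. */
    static final class Segment {
        final String name;
        final long sizeBytes;

        Segment(String name, long sizeBytes) {
            this.name = name;
            this.sizeBytes = sizeBytes;
        }
    }

    /**
     * Returns the segments a point-in-time reader would see if every segment
     * in the snapshot smaller than thresholdBytes were merged into one.
     * Segments at or above the threshold are passed through unchanged.
     */
    static List<Segment> mergeSmallSegments(List<Segment> snapshot, long thresholdBytes) {
        List<Segment> visible = new ArrayList<>();
        List<Segment> small = new ArrayList<>();
        for (Segment s : snapshot) {
            if (s.sizeBytes < thresholdBytes) {
                small.add(s);
            } else {
                visible.add(s);
            }
        }
        if (small.size() == 1) {
            // A single small segment has nothing to merge with.
            visible.add(small.get(0));
        } else if (small.size() > 1) {
            long mergedSize = 0;
            for (Segment s : small) {
                mergedSize += s.sizeBytes;
            }
            visible.add(new Segment("merged", mergedSize));
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Segment> live = new ArrayList<>();
        live.add(new Segment("_0", 500_000));
        live.add(new Segment("_1", 10));
        live.add(new Segment("_2", 20));

        // Refresh starts: freeze the point-in-time view of the index.
        List<Segment> snapshot = new ArrayList<>(live);

        // Indexing continues while merges run; this flush happens after the
        // snapshot and must not show up in the reader returned by refresh.
        live.add(new Segment("_3", 15));

        for (Segment s : mergeSmallSegments(snapshot, 1_000)) {
            System.out.println(s.name + " " + s.sizeBytes);
        }
    }
}
```

In this toy run the reader would see {{_0}} plus one merged segment covering {{_1}} and {{_2}}, while {{_3}} stays invisible until the next refresh. A real implementation also has to deal with deletes, merge failures and timeouts, none of which are modelled here.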