[
https://issues.apache.org/jira/browse/SOLR-13696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907644#comment-16907644
]
Hoss Man commented on SOLR-13696:
---------------------------------
Gus: can you please take a look at this?
based on my assessment, here's the crucial bits of the log..
{noformat}
hossman@tray:~/tmp/jenkins/DimensionalRoutedAliasUpdateProcessorTest$ grep
testTimeCat__TRA__2019-07-05__CRA__calico
thetaphi_Lucene-Solr-8.x-MacOSX_272.log.txt | egrep '(Opening
\[Searcher|add=\[21|fq=cat_s:calico|\{\!terms\+f%3Did}21,20.*hits=2)'
[junit4] 2> 4476175 INFO (qtp759508539-75005) [n:127.0.0.1:55915_solr
c:testTimeCat__TRA__2019-07-05__CRA__calico s:shard1 r:core_node5
x:testTimeCat__TRA__2019-07-05__CRA__calico_shard1_replica_n2 ]
o.a.s.s.SolrIndexSearcher Opening
[Searcher@9bc49f7[testTimeCat__TRA__2019-07-05__CRA__calico_shard1_replica_n2]
main]
[junit4] 2> 4476176 INFO (qtp1536738594-75022) [n:127.0.0.1:55916_solr
c:testTimeCat__TRA__2019-07-05__CRA__calico s:shard1 r:core_node3
x:testTimeCat__TRA__2019-07-05__CRA__calico_shard1_replica_n1 ]
o.a.s.s.SolrIndexSearcher Opening
[Searcher@5547583d[testTimeCat__TRA__2019-07-05__CRA__calico_shard1_replica_n1]
main]
[junit4] 2> 4476186 INFO (qtp1998715126-75500) [n:127.0.0.1:55917_solr
c:testTimeCat__TRA__2019-07-05__CRA__calico s:shard2 r:core_node8
x:testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n7 ]
o.a.s.s.SolrIndexSearcher Opening
[Searcher@3d40f0e1[testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n7]
main]
[junit4] 2> 4476195 INFO (qtp927691752-75020) [n:127.0.0.1:55918_solr
c:testTimeCat__TRA__2019-07-05__CRA__calico s:shard2 r:core_node6
x:testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n4 ]
o.a.s.s.SolrIndexSearcher Opening
[Searcher@18c82bc1[testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n4]
main]
[junit4] 2> 4477375 INFO (qtp1998715126-75016) [n:127.0.0.1:55917_solr
c:testTimeCat__TRA__2019-07-05__CRA__calico s:shard2 r:core_node8
x:testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n7 ]
o.a.s.u.p.LogUpdateProcessorFactory
[testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n7] webapp=/solr
path=/update
params={update.distrib=FROMLEADER&distrib.from=http://127.0.0.1:55918/solr/testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n4/&wt=javabin&version=2}{add=[21
(1641811095092985856)]} 0 2
[junit4] 2> 4477960 INFO (qtp927691752-75506) [n:127.0.0.1:55918_solr
c:testTimeCat__TRA__2019-07-05__CRA__calico s:shard2 r:core_node6
x:testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n4 ]
o.a.s.u.p.LogUpdateProcessorFactory
[testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n4] webapp=/solr
path=/update
params={update.distrib=NONE&df=_text_&alias.update.distrib=TOLEADER&distrib.from=http://127.0.0.1:55918/solr/testTimeCat__TRA__2019-07-02__CRA__calico_shard2_replica_n6/&wt=javabin&version=2&processor=inc}{add=[21
(1641811095092985856)]} 0 590
[junit4] 2> 4477962 INFO (commitScheduler-24384-thread-1) [ ]
o.a.s.s.SolrIndexSearcher Opening
[Searcher@745b6c94[testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n4]
main]
[junit4] 2> 4478213 INFO (qtp1998715126-75501) [n:127.0.0.1:55917_solr
c:testTimeCat__TRA__2019-07-05__CRA__calico s:shard2 r:core_node8
x:testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n7 ]
o.a.s.c.S.Request [testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n7]
webapp=/solr path=/select
params={q={!terms+f%3Did}21,20&rows=0&wt=javabin&version=2} hits=2 status=0
QTime=13
[junit4] 2> 4478408 INFO (qtp1998715126-75016) [n:127.0.0.1:55917_solr
c:testTimeCat__TRA__2019-07-05__CRA__calico s:shard2 r:core_node8
x:testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n7 ]
o.a.s.c.S.Request [testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n7]
webapp=/solr path=/select
params={df=_text_&distrib=false&fl=id&fl=score&shards.purpose=516&start=0&fsv=true&fq=cat_s:calico&shard.url=http://127.0.0.1:55917/solr/testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n7/|http://127.0.0.1:55918/solr/testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n4/&rows=0&version=2&q=*:*&stats=true&omitHeader=false&NOW=1565753074817&isShard=true&wt=javabin&stats.field=timestamp_dt}
hits=0 status=0 QTime=0
[junit4] 2> 4478408 INFO (qtp1536738594-75032) [n:127.0.0.1:55916_solr
c:testTimeCat__TRA__2019-07-05__CRA__calico s:shard1 r:core_node3
x:testTimeCat__TRA__2019-07-05__CRA__calico_shard1_replica_n1 ]
o.a.s.c.S.Request [testTimeCat__TRA__2019-07-05__CRA__calico_shard1_replica_n1]
webapp=/solr path=/select
params={df=_text_&distrib=false&fl=id&fl=score&shards.purpose=516&start=0&fsv=true&fq=cat_s:calico&shard.url=http://127.0.0.1:55916/solr/testTimeCat__TRA__2019-07-05__CRA__calico_shard1_replica_n1/|http://127.0.0.1:55915/solr/testTimeCat__TRA__2019-07-05__CRA__calico_shard1_replica_n2/&rows=0&version=2&q=*:*&stats=true&omitHeader=false&NOW=1565753074817&isShard=true&wt=javabin&stats.field=timestamp_dt}
hits=0 status=0 QTime=0
[junit4] 2> 4478408 INFO (qtp1536738594-75031) [n:127.0.0.1:55916_solr
c:testTimeCat__TRA__2019-07-05__CRA__calico s:shard1 r:core_node3
x:testTimeCat__TRA__2019-07-05__CRA__calico_shard1_replica_n1 ]
o.a.s.c.S.Request [testTimeCat__TRA__2019-07-05__CRA__calico_shard1_replica_n1]
webapp=/solr path=/select
params={q=*:*&stats=true&fq=cat_s:calico&rows=0&wt=javabin&version=2&stats.field=timestamp_dt}
hits=0 status=0 QTime=1
[junit4] 2> 4478461 INFO (commitScheduler-24386-thread-1) [ ]
o.a.s.s.SolrIndexSearcher Opening
[Searcher@4ab6f7d7[testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n7]
main]
{noformat}
Summary:
* doc 21 is added to {{testTimeCat__TRA__2019-07-05__CRA__calico}}
* the complicated "check that it all got committed eventually" logic in
{{addDocsAndCommit()}} finds the 21 (and 20) on one replica
** but meanwhile
{{testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n7}} hasn't finished
opening the searcher
* the test continues and eventually gets to the
{{"stats=true&fq=cat_s:calico"}} request
** {{testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n7}} still
hasn't finished opening the searcher
** test fails to find this doc
I think either this test (helper method) needs to stop using "commitWithin" or
(if variations in routed alias code paths / behavior exist when commitWithin is
used and we definitely want to test that) then the "got committed eventually"
logic needs to be more robust and check *every* core (ie: loop over all the
replicas of all the shards until you find the doc on at least one, then loop
over all the other replicas of that same shard until you see that doc on all of
them)
(this problem doesn't seem specific to
DimensionalRoutedAliasUpdateProcessorTest , RoutedAliasUpdateProcessorTest and
any other subcalsses using {{addDocsAndCommit()}} are also going to be affected)
----
Also: this test (and i presume others like it esting the routed aliases) would
be a lot more robust and easier to debug if:
* it had more logging about what it was doing to use as "trace points" when
reading the logs
* it made simpler/tighter assertions earlier in the flow ie:
** do the "// verify that all the documents ended up in the right
collections." frequently
*** idealy at the begining of {{assertCatTimeInvariants()}} call
*** use a hardcoded number for the total docs expected (ie: don't rely on
{{lastDocId - numDocsDeletedOrFailed}} ) so that people reading/debugging the
test can sanity check exactly how many docs are expected at every point
**** the {{lastDocId - numDocsDeletedOrFailed}} is actually really brittle,
and will cause lots of headaches if anyone tries to add more docs/days to the
test because of how "router.1.autoDeleteAge" is used
** ideally: change {{assertCatTimeInvariants()}} to take the expected number
of docs per collection (as a map) instead of just a list a list of the expected
collection names
*** that way we can assert the specified number of docs exist in _each_
collection (ie: don't just checks that the docs we do find have field values
that match the collection name)
*** but first: use the sum of all values in that map to assert the total
number of docs found in the collection
> DimensionalRoutedAliasUpdateProcessorTest / RoutedAliasUpdateProcessorTest
> failures due commitWithin/openSearcher delays
> ------------------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-13696
> URL: https://issues.apache.org/jira/browse/SOLR-13696
> Project: Solr
> Issue Type: Test
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Hoss Man
> Assignee: Gus Heck
> Priority: Major
> Attachments: thetaphi_Lucene-Solr-8.x-MacOSX_272.log.txt
>
>
> Recent jenkins failure...
> Build: https://jenkins.thetaphi.de/job/Lucene-Solr-8.x-MacOSX/272/
> Java: 64bit/jdk1.8.0 -XX:-UseCompressedOops -XX:+UseParallelGC
> {noformat}
> Stack Trace:
> java.lang.AssertionError: expected:<16> but was:<15>
> at
> __randomizedtesting.SeedInfo.seed([DB6DC28D5560B1D2:E295833E1541FDB9]:0)
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:834)
> at org.junit.Assert.assertEquals(Assert.java:645)
> at org.junit.Assert.assertEquals(Assert.java:631)
> at
> org.apache.solr.update.processor.DimensionalRoutedAliasUpdateProcessorTest.assertCatTimeInvariants(DimensionalRoutedAliasUpdateProcessorTest.java:677
> )
> at
> org.apache.solr.update.processor.DimensionalRoutedAliasUpdateProcessorTest.testTimeCat(DimensionalRoutedAliasUpdateProcessorTest.java:282)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> {noformat}
> Digging into the logs, the problem appears to be in the way the test
> verifies/assumes docs have been committed.
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]