[ 
https://issues.apache.org/jira/browse/SOLR-13741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter updated SOLR-13741:
--------------------------------------
    Fix Version/s: 8.4
                   master (9.0)
      Description: 
This issue started out as an investigation into possible test or code ugs 
uncovered while hardening AuditLoggerIntegrationTest against timing related 
failures.  the bugs that were identified as being in code were spun of into 
their own issues for tracking purposes to raise visibility to end users.

this issue remains as for tracking the final hardening of the test and fixing 
of some test bugs found along the way.

Original jira description below...

----

A while back i saw a weird non-reproducible failure from 
AuditLoggerIntegrationTest.  When i started reading through that code, 2 things 
jumped out at me:

# the way the 'delay' option works is brittle, and makes assumptions about CPU 
scheduling that aren't neccessarily going to be true (and also suffers from the 
problem that Thread.sleep isn't garunteed to sleep as long as you ask it too)
# the way the existing {{waitForAuditEventCallbacks(number)}} logic works by 
checking the size of a (List) {{buffer}} of recieved events in a sleep/poll 
loop, until it contains at least N items -- but the code that adds items to 
that buffer in the async Callback thread async _before_ the code that updates 
other state variables (like the global {{count}} and the patch specific 
{{resourceCounts}}) meaning that a test waiting on 3 events could "see" 3 
events added to the buffer, but calling {{assertEquals(3, 
receiver.getTotalCount())}} could subsequently fail because that variable 
hadn't been udpated yet.

#2 was the source of the failures I was seeing, and while a quick fix for that 
specific problem would be to update all other state _before_ adding the event 
to the buffer, I set out to try and make more general improvements to the test:

* eliminate the dependency on sleep loops by {{await}}-ing on concurrent data 
structures
* harden the assertions made about the expected events recieved (updating some 
test methods that currently just assert the number of events recieved)
* add new assertions that _only_ the expected events are recieved.

In the process of doing this, I've found several oddities/descrepencies between 
things the test currently claims/asserts, and what *actually* happens under 
more rigerous scrutiny/assertions.

I'll attach a patch shortly that has my (in progress) updates and inlcudes 
copious nocommits about things seem suspect.  the summary of these concerns is:

* SolrException status codes that do not match what the existing test says they 
should (but doesn't assert)
* extra AuditEvents occuring that the existing test does not expect
* AuditEvents for incorrect credentials that do not at all match the expected 
AuditEvent in the existing test -- which the current test seems to miss in it's 
assertions because it's picking up some extra events from triggered by previuos 
requests earlier in the test that just happen to also match the asserctions.


...it's not clear to me if the test logic is correct and these are "code bugs" 
or if the test is faulty.


  was:
A while back i saw a weird non-reproducible failure from 
AuditLoggerIntegrationTest.  When i started reading through that code, 2 things 
jumped out at me:

# the way the 'delay' option works is brittle, and makes assumptions about CPU 
scheduling that aren't neccessarily going to be true (and also suffers from the 
problem that Thread.sleep isn't garunteed to sleep as long as you ask it too)
# the way the existing {{waitForAuditEventCallbacks(number)}} logic works by 
checking the size of a (List) {{buffer}} of recieved events in a sleep/poll 
loop, until it contains at least N items -- but the code that adds items to 
that buffer in the async Callback thread async _before_ the code that updates 
other state variables (like the global {{count}} and the patch specific 
{{resourceCounts}}) meaning that a test waiting on 3 events could "see" 3 
events added to the buffer, but calling {{assertEquals(3, 
receiver.getTotalCount())}} could subsequently fail because that variable 
hadn't been udpated yet.

#2 was the source of the failures I was seeing, and while a quick fix for that 
specific problem would be to update all other state _before_ adding the event 
to the buffer, I set out to try and make more general improvements to the test:

* eliminate the dependency on sleep loops by {{await}}-ing on concurrent data 
structures
* harden the assertions made about the expected events recieved (updating some 
test methods that currently just assert the number of events recieved)
* add new assertions that _only_ the expected events are recieved.

In the process of doing this, I've found several oddities/descrepencies between 
things the test currently claims/asserts, and what *actually* happens under 
more rigerous scrutiny/assertions.

I'll attach a patch shortly that has my (in progress) updates and inlcudes 
copious nocommits about things seem suspect.  the summary of these concerns is:

* SolrException status codes that do not match what the existing test says they 
should (but doesn't assert)
* extra AuditEvents occuring that the existing test does not expect
* AuditEvents for incorrect credentials that do not at all match the expected 
AuditEvent in the existing test -- which the current test seems to miss in it's 
assertions because it's picking up some extra events from triggered by previuos 
requests earlier in the test that just happen to also match the asserctions.


...it's not clear to me if the test logic is correct and these are "code bugs" 
or if the test is faulty.


       Issue Type: Test  (was: Bug)
          Summary: AuditLoggerIntegrationTest hardening  (was: possible 
AuditLogger bugs uncovered while hardening AuditLoggerIntegrationTest)

thanks Jan!

> AuditLoggerIntegrationTest hardening
> ------------------------------------
>
>                 Key: SOLR-13741
>                 URL: https://issues.apache.org/jira/browse/SOLR-13741
>             Project: Solr
>          Issue Type: Test
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Chris M. Hostetter
>            Assignee: Chris M. Hostetter
>            Priority: Major
>             Fix For: master (9.0), 8.4
>
>         Attachments: SOLR-13741.patch, SOLR-13741.patch, SOLR-13741.patch, 
> SOLR-13741.patch, SOLR-13741.patch, SOLR-13741.patch, SOLR-13741.patch, 
> SOLR-13741.patch
>
>
> This issue started out as an investigation into possible test or code ugs 
> uncovered while hardening AuditLoggerIntegrationTest against timing related 
> failures.  the bugs that were identified as being in code were spun of into 
> their own issues for tracking purposes to raise visibility to end users.
> this issue remains as for tracking the final hardening of the test and fixing 
> of some test bugs found along the way.
> Original jira description below...
> ----
> A while back i saw a weird non-reproducible failure from 
> AuditLoggerIntegrationTest.  When i started reading through that code, 2 
> things jumped out at me:
> # the way the 'delay' option works is brittle, and makes assumptions about 
> CPU scheduling that aren't neccessarily going to be true (and also suffers 
> from the problem that Thread.sleep isn't garunteed to sleep as long as you 
> ask it too)
> # the way the existing {{waitForAuditEventCallbacks(number)}} logic works by 
> checking the size of a (List) {{buffer}} of recieved events in a sleep/poll 
> loop, until it contains at least N items -- but the code that adds items to 
> that buffer in the async Callback thread async _before_ the code that updates 
> other state variables (like the global {{count}} and the patch specific 
> {{resourceCounts}}) meaning that a test waiting on 3 events could "see" 3 
> events added to the buffer, but calling {{assertEquals(3, 
> receiver.getTotalCount())}} could subsequently fail because that variable 
> hadn't been udpated yet.
> #2 was the source of the failures I was seeing, and while a quick fix for 
> that specific problem would be to update all other state _before_ adding the 
> event to the buffer, I set out to try and make more general improvements to 
> the test:
> * eliminate the dependency on sleep loops by {{await}}-ing on concurrent data 
> structures
> * harden the assertions made about the expected events recieved (updating 
> some test methods that currently just assert the number of events recieved)
> * add new assertions that _only_ the expected events are recieved.
> In the process of doing this, I've found several oddities/descrepencies 
> between things the test currently claims/asserts, and what *actually* happens 
> under more rigerous scrutiny/assertions.
> I'll attach a patch shortly that has my (in progress) updates and inlcudes 
> copious nocommits about things seem suspect.  the summary of these concerns 
> is:
> * SolrException status codes that do not match what the existing test says 
> they should (but doesn't assert)
> * extra AuditEvents occuring that the existing test does not expect
> * AuditEvents for incorrect credentials that do not at all match the expected 
> AuditEvent in the existing test -- which the current test seems to miss in 
> it's assertions because it's picking up some extra events from triggered by 
> previuos requests earlier in the test that just happen to also match the 
> asserctions.
> ...it's not clear to me if the test logic is correct and these are "code 
> bugs" or if the test is faulty.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to