[ 
https://issues.apache.org/jira/browse/SOLR-13741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter updated SOLR-13741:
--------------------------------------
    Attachment: SOLR-13741.patch
        Status: Open  (was: Open)

[~janhoy] : Thanks for that digging – It sounds like there is definitely a bug 
there and i've spun that off into SOLR-13835.

But based on your analysis, that only explains the "401+403 audit logs" 
situation happened when only a 401 audit log was "expected" ... it still 
doesn't explain several of the other nocommit questions from my last patch...
 * why did the comment for a "wrong password" claim it was going to get a 403 
exception + audit log ?
 ** I think this was just a test mistake, probably infered from the 403 audit 
log it saw anyway due to the above mentioned bug
 ** my understanding of when/why a 401 vs 403 should be used: 401 makes sense 
for "failed to authenticate", 403 only makes sense for "authenticated 
correctly, but not autorized"
 * why the _actual_ audit log recieved in the "wrong password" situation is so 
different (and sparse) compared to other audit log events ?
 ** notably:
 *** the path is entirely "/solr" (even though most audit events don't include 
the "/solr" prefix at all)
 *** none of the request params exist in the event, even though they are listed 
in the "queryString" attribute
 *** RequestType is 'UNKOWN'
 *** The 'user' isn't filled in.
 ** my best guess is that this is because the authentication happens _before_ 
the request parsing, so when that fails the request parsing is aborted and we 
don't have these details in the AuditEvent
 *** Does this hypothosis make sense with the current code ?
 *** even if this is expected for the current code, _SHOULD_ it be? or should 
be consider it a bug in the current code that failed-authentication produces 
AuditEvents that are this sparse? Should the code be improved so that at least 
the 'user' (whose password was wrong) is filled in?
 * Are {{/admin/info/key}} events expect when auth is enabled ?
 ** I noticed these showing up when i started experimenting with the tests and 
making tighter assertions about the number of total events generated
 ** is it expected that we should see AuditEvents for this path anytime 
authentication is enabled? is it ok for the test to explicitly ignore these 
events, or should we be asserting that they happen in a predictible way when 
authentication is enabled?

...if you could chime in on those as well I'd appreciate it.

In the interest of moving forward, I went ahead and updated the patch to 
include...
 * new explicit auth tests for:
 ** success logging when an authenticated & authorized user creates a collection
 ** failure logging when an authenticated user doesn't have neccessary 
authorization for admin request
 * updated comments / conditional logic explaining the extra events from the 
known bug: SOLR-13835
 * updated CallbackAuditLoggerPlugin to rip out the hardcoded sleep() function 
when "delay" is configured and replace it with the use of "named Semaphores" so 
that the test can signal exactly when it's ok for events to be processed, and 
how many.

> possible AuditLogger bugs uncovered while hardening AuditLoggerIntegrationTest
> ------------------------------------------------------------------------------
>
>                 Key: SOLR-13741
>                 URL: https://issues.apache.org/jira/browse/SOLR-13741
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Chris M. Hostetter
>            Assignee: Chris M. Hostetter
>            Priority: Major
>         Attachments: SOLR-13741.patch, SOLR-13741.patch
>
>
> A while back i saw a weird non-reproducible failure from 
> AuditLoggerIntegrationTest.  When i started reading through that code, 2 
> things jumped out at me:
> # the way the 'delay' option works is brittle, and makes assumptions about 
> CPU scheduling that aren't neccessarily going to be true (and also suffers 
> from the problem that Thread.sleep isn't garunteed to sleep as long as you 
> ask it too)
> # the way the existing {{waitForAuditEventCallbacks(number)}} logic works by 
> checking the size of a (List) {{buffer}} of recieved events in a sleep/poll 
> loop, until it contains at least N items -- but the code that adds items to 
> that buffer in the async Callback thread async _before_ the code that updates 
> other state variables (like the global {{count}} and the patch specific 
> {{resourceCounts}}) meaning that a test waiting on 3 events could "see" 3 
> events added to the buffer, but calling {{assertEquals(3, 
> receiver.getTotalCount())}} could subsequently fail because that variable 
> hadn't been udpated yet.
> #2 was the source of the failures I was seeing, and while a quick fix for 
> that specific problem would be to update all other state _before_ adding the 
> event to the buffer, I set out to try and make more general improvements to 
> the test:
> * eliminate the dependency on sleep loops by {{await}}-ing on concurrent data 
> structures
> * harden the assertions made about the expected events recieved (updating 
> some test methods that currently just assert the number of events recieved)
> * add new assertions that _only_ the expected events are recieved.
> In the process of doing this, I've found several oddities/descrepencies 
> between things the test currently claims/asserts, and what *actually* happens 
> under more rigerous scrutiny/assertions.
> I'll attach a patch shortly that has my (in progress) updates and inlcudes 
> copious nocommits about things seem suspect.  the summary of these concerns 
> is:
> * SolrException status codes that do not match what the existing test says 
> they should (but doesn't assert)
> * extra AuditEvents occuring that the existing test does not expect
> * AuditEvents for incorrect credentials that do not at all match the expected 
> AuditEvent in the existing test -- which the current test seems to miss in 
> it's assertions because it's picking up some extra events from triggered by 
> previuos requests earlier in the test that just happen to also match the 
> asserctions.
> ...it's not clear to me if the test logic is correct and these are "code 
> bugs" or if the test is faulty.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to