[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15931573#comment-15931573
 ] 

Rakesh R commented on ZOOKEEPER-2712:
-------------------------------------

Thanks a lot [~hanm] for the unit test results and comments.

bq. I am curious what exactly cause the races though - from the description it 
was not very clear to me. Do you mind to elaborate a little bit with regards to 
what code in test case that uses what function in the dependency library and 
what the race condition is?

I hope the following explanation helps clarify the concurrency flow.

Below is the auth-failed exception that we frequently hit in our 
{{KerberosSecurityTestcase}}-based unit test cases.
{code}
2017-03-17 15:55:51,397 [myid:] - WARN  
[NioProcessor-3:KerberosProtocolHandler@241] - Server not found in Kerberos 
database (7)
2017-03-17 15:55:51,398 [myid:] - WARN  
[NioProcessor-3:KerberosProtocolHandler@242] - Server not found in Kerberos 
database (7)
                [Krb5LoginModule] authentication failed 
Server not found in Kerberos database (7) - Server not found in Kerberos 
database
2017-03-17 15:55:51,409 [myid:1] - ERROR 
[Thread-3:QuorumPeerTestBase$MainThread@145] - unexpected exception in run
javax.security.sasl.SaslException: Failed to initialize authentication 
mechanism using SASL [Caused by javax.security.auth.login.LoginException: 
Server not found in Kerberos database (7) - Server not found in Kerberos 
database]
        at 
org.apache.zookeeper.server.quorum.auth.SaslQuorumAuthServer.<init>(SaslQuorumAuthServer.java:69)
        at 
org.apache.zookeeper.server.quorum.QuorumPeer.initialize(QuorumPeer.java:570)
        at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:162)
{code}

As we know, the test case creates a ZK cluster of size 3 and uses the 
[QuorumAuthTestBase.startServer() 
#L186|https://github.com/apache/zookeeper/blob/branch-3.4/src/java/test/org/apache/zookeeper/server/quorum/auth/QuorumAuthTestBase.java#L186]
 function to start each server in a separate thread. So we have three servers 
starting in parallel in three different threads. During startup, each server 
initializes SaslQuorumAuthServer and SaslQuorumAuthLearner in 
[QuorumPeer#init|https://github.com/apache/zookeeper/blob/branch-3.4/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java#L570]
 and performs the auth login. The Krb login internally uses the ApacheDS 
library, because this test case is based on {{KerberosSecurityTestcase}}. I 
experimented with a test scenario that makes multiple Krb 
{{javax.security.auth.login.LoginContext#login()}} calls simultaneously, and it 
hit exactly the same error, {{Server not found in Kerberos database}}. Later, I 
made the logins sequential and never hit the server-not-found problem again. I 
personally feel that the ApacheDS login module is sharing some resources, 
which results in the concurrency failure. IMHO, fixing ApacheDS is outside our 
scope, and the sequential login change makes the test case more consistent. 
Does this make sense to you?
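
For illustration, the sequential-login idea can be sketched as below. This is 
only a minimal sketch, not the actual patch: the class {{SequentialLogin}}, the 
interface {{LoginAction}} and the method {{run}} are hypothetical names, and the 
sketch assumes all three servers run in the same JVM, as they do in the test.

```java
/**
 * Hypothetical helper, not part of ZooKeeper or the attached patch:
 * funnels every login attempt through one JVM-wide lock so that only
 * one thread at a time performs the Kerberos login.
 */
final class SequentialLogin {

    /** A login step that may throw a checked exception such as LoginException. */
    @FunctionalInterface
    interface LoginAction<T, E extends Exception> {
        T call() throws E;
    }

    // Shared by all server threads started by the test.
    private static final Object LOGIN_LOCK = new Object();

    private SequentialLogin() {
    }

    /** Runs the action while holding the shared lock and returns its result. */
    static <T, E extends Exception> T run(LoginAction<T, E> action) throws E {
        synchronized (LOGIN_LOCK) {
            return action.call();
        }
    }
}
```

Each server thread would then wrap its own login, for example 
{{SequentialLogin.run(() -> { loginContext.login(); return loginContext; })}}, 
where {{loginContext}} is a {{javax.security.auth.login.LoginContext}} created 
by the caller. The servers still start in parallel; only the login step is 
serialized.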

> MiniKdc test case intermittently failing due to principal not found in 
> Kerberos database
> ----------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2712
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2712
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: tests
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>            Priority: Critical
>             Fix For: 3.4.10
>
>         Attachments: 
> TEST-org.apache.zookeeper.server.quorum.auth.QuorumKerberosAuthTest.txt
>
>
> MiniKdc test cases are intermittently failing due to not finding the 
> principal. Below is the failure stacktrace.
> {code}
> 2017-03-08 13:21:10,843 [myid:] - ERROR 
> [NioProcessor-1:AuthenticationService@187] - Error while searching for client 
> lear...@example.com : Client not found in Kerberos database
> 2017-03-08 13:21:10,843 [myid:] - WARN  
> [NioProcessor-2:KerberosProtocolHandler@241] - Server not found in Kerberos 
> database (7)
> 2017-03-08 13:21:10,845 [myid:] - WARN  
> [NioProcessor-2:KerberosProtocolHandler@242] - Server not found in Kerberos 
> database (7)
> 2017-03-08 13:21:10,844 [myid:] - WARN  
> [NioProcessor-1:KerberosProtocolHandler@241] - Client not found in Kerberos 
> database (6)
> 2017-03-08 13:21:10,845 [myid:] - WARN  
> [NioProcessor-1:KerberosProtocolHandler@242] - Client not found in Kerberos 
> database (6)
> {code}
> Will attach the detailed log to jira.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
