[
https://issues.apache.org/jira/browse/ZOOKEEPER-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15931573#comment-15931573
]
Rakesh R commented on ZOOKEEPER-2712:
-------------------------------------
Thanks a lot [~hanm] for the unit test results and comments.
bq. I am curious what exactly cause the races though - from the description it was not very clear to me. Do you mind to elaborate a little bit with regards to what code in test case that uses what function in the dependency library and what the race condition is?
I hope the following explanation helps clarify the concurrency flow.
Below is the authentication-failure exception that frequently occurs in our {{KerberosSecurityTestcase}}-based unit tests.
{code}
2017-03-17 15:55:51,397 [myid:] - WARN [NioProcessor-3:KerberosProtocolHandler@241] - Server not found in Kerberos database (7)
2017-03-17 15:55:51,398 [myid:] - WARN [NioProcessor-3:KerberosProtocolHandler@242] - Server not found in Kerberos database (7)
[Krb5LoginModule] authentication failed
Server not found in Kerberos database (7) - Server not found in Kerberos database
2017-03-17 15:55:51,409 [myid:1] - ERROR [Thread-3:QuorumPeerTestBase$MainThread@145] - unexpected exception in run
javax.security.sasl.SaslException: Failed to initialize authentication mechanism using SASL [Caused by javax.security.auth.login.LoginException: Server not found in Kerberos database (7) - Server not found in Kerberos database]
	at org.apache.zookeeper.server.quorum.auth.SaslQuorumAuthServer.<init>(SaslQuorumAuthServer.java:69)
	at org.apache.zookeeper.server.quorum.QuorumPeer.initialize(QuorumPeer.java:570)
	at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:162)
{code}
As we know, the test case creates a ZK cluster of size 3 and uses the [QuorumAuthTestBase.startServer() #L186|https://github.com/apache/zookeeper/blob/branch-3.4/src/java/test/org/apache/zookeeper/server/quorum/auth/QuorumAuthTestBase.java#L186] function to start each server in a separate thread, so three servers start in parallel in three different threads. During startup, each server initializes SaslQuorumAuthServer and SaslQuorumAuthLearner in [QuorumPeer#init|https://github.com/apache/zookeeper/blob/branch-3.4/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java#L570] and performs the auth login. Because this test case is based on {{KerberosSecurityTestcase}}, the Krb login internally goes through the ApacheDS library.
I experimented with a test scenario that performs multiple Krb {{javax.security.auth.login.LoginContext#login()}} calls simultaneously, and it hits exactly the same {{server not found in Kerberos database}} error. When I then made the logins sequential, the server-not-found problem never occurred. I personally feel that the ApacheDS login module shares some resources, which results in the concurrency failure. IMHO, fixing ApacheDS is not in our scope, and making the logins sequential makes the test case pass consistently. Does this make sense to you?
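To make the two scenarios above concrete, here is a minimal sketch (not the attached patch) of the parallel-vs-sequential login experiment. The class {{SerializedKrbLogin}}, the shared {{LOGIN_LOCK}}, and the {{maxOverlap}}/{{fakeLogin}} helpers are hypothetical and are not part of ZooKeeper or ApacheDS; only {{LoginContext#login()}} is the real JAAS API. A {{CountDownLatch}} releases all "servers" at once, like the parallel {{startServer()}} threads, and the {{serialize}} flag routes each login step through one JVM-wide lock. {{fakeLogin}} merely sleeps in place of a KDC round trip, so this demonstrates the serialization itself, not the ApacheDS failure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import javax.security.auth.login.LoginContext;
import javax.security.auth.login.LoginException;

public class SerializedKrbLogin {

    // Hypothetical JVM-wide lock: at most one thread performs a Kerberos login at a time.
    private static final Object LOGIN_LOCK = new Object();

    /** Performs a real JAAS login while holding the shared lock. */
    public static void login(LoginContext context) throws LoginException {
        synchronized (LOGIN_LOCK) {
            context.login();
        }
    }

    /**
     * Starts {@code servers} threads released together by a latch and runs
     * {@code loginStep} in each, either freely or under LOGIN_LOCK.
     * Returns the largest number of login steps observed running at once.
     */
    public static int maxOverlap(int servers, boolean serialize, Runnable loginStep) {
        CountDownLatch startGate = new CountDownLatch(1);
        AtomicInteger inside = new AtomicInteger();
        AtomicInteger maxSeen = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();

        // Wraps the login step with an "in-flight" counter to measure overlap.
        Runnable tracked = () -> {
            int now = inside.incrementAndGet();
            maxSeen.accumulateAndGet(now, Math::max);
            try {
                loginStep.run();
            } finally {
                inside.decrementAndGet();
            }
        };

        for (int i = 0; i < servers; i++) {
            Thread t = new Thread(() -> {
                try {
                    startGate.await(); // all "servers" begin logging in at the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (serialize) {
                    synchronized (LOGIN_LOCK) {
                        tracked.run();
                    }
                } else {
                    tracked.run();
                }
            });
            threads.add(t);
            t.start();
        }
        startGate.countDown();
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for servers", e);
            }
        }
        return maxSeen.get();
    }

    // Stand-in for the KDC round trip; no real Kerberos is involved here.
    public static void fakeLogin() {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("parallel:   " + maxOverlap(3, false, SerializedKrbLogin::fakeLogin));
        System.out.println("sequential: " + maxOverlap(3, true, SerializedKrbLogin::fakeLogin));
    }
}
```

With {{serialize=true}} the reported overlap is always 1, because every login step runs inside the same monitor. The unserialized run typically reports more than 1, which is the window in which the concurrent logins were observed to fail; the servers themselves still start in parallel, and only the login step is made sequential.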
> MiniKdc test case intermittently failing due to principal not found in Kerberos database
> ----------------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-2712
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2712
> Project: ZooKeeper
> Issue Type: Bug
> Components: tests
> Reporter: Rakesh R
> Assignee: Rakesh R
> Priority: Critical
> Fix For: 3.4.10
>
> Attachments:
> TEST-org.apache.zookeeper.server.quorum.auth.QuorumKerberosAuthTest.txt
>
>
> MiniKdc test cases are intermittently failing because the principal is not found. Below is the failure stack trace.
> {code}
> 2017-03-08 13:21:10,843 [myid:] - ERROR [NioProcessor-1:AuthenticationService@187] - Error while searching for client [email protected] : Client not found in Kerberos database
> 2017-03-08 13:21:10,843 [myid:] - WARN [NioProcessor-2:KerberosProtocolHandler@241] - Server not found in Kerberos database (7)
> 2017-03-08 13:21:10,845 [myid:] - WARN [NioProcessor-2:KerberosProtocolHandler@242] - Server not found in Kerberos database (7)
> 2017-03-08 13:21:10,844 [myid:] - WARN [NioProcessor-1:KerberosProtocolHandler@241] - Client not found in Kerberos database (6)
> 2017-03-08 13:21:10,845 [myid:] - WARN [NioProcessor-1:KerberosProtocolHandler@242] - Client not found in Kerberos database (6)
> {code}
> Will attach the detailed log to jira.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)