Alla Tumarkin created KAFKA-5709:
------------------------------------

             Summary: Improve logging to include errors from state-change log 
in server log
                 Key: KAFKA-5709
                 URL: https://issues.apache.org/jira/browse/KAFKA-5709
             Project: Kafka
          Issue Type: Improvement
          Components: log
    Affects Versions: 0.10.2.0
            Reporter: Alla Tumarkin


Problem
The following message was generated over and over again when running 
kafka-console-producer or kafka-console-consumer with SSL and ACLs enabled
{code}
WARN Error while fetching metadata with correlation id 1 : 
{test=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)` endlessly 
when I run kafka-console-producer or kafka-console-consumer
{code}
however server log (or authorizer log) did not indicate any problem.

Background
1) Initially, security.inter.broker.protocol setting was missing in 
server.properties, so connection was falling back to using plaintext port 
(9092), and only state-change.log actually contained the underlying error:
{code}
[2017-08-04 13:40:48,536] TRACE Controller 0 epoch 6 received response 
{error_code=31,partitions=[{topic=test,partition=0,error_code=31},{topic=__confluent.support.metrics,partition=0,error_code=31}]}
 for a request sent to broker localhost:9092 (id: 0 rack: null) 
(state.change.logger)
{code}
as per
https://kafka.apache.org/protocol#protocol_error_codes
{code}
CLUSTER_AUTHORIZATION_FAILED    31      False   Cluster authorization failed.
{code}

2) After setting "security.inter.broker.protocol=SSL" the port changed to 
secure (9093) yet the error in state-change log did not go away:
{code}
[2017-08-04 13:49:38,462] TRACE Controller 0 epoch 7 received response 
{error_code=31} for a request sent to broker localhost:9093 (id: 0 rack: null) 
(state.change.logger)
{code}
and LEADER_NOT_AVAILABLE was still generated.

This time though, kafka-authorizer.log had a better indication of the problem:
{code}
[2017-08-04 18:17:46,770] DEBUG No acl found for resource 
Cluster:kafka-cluster, authorized = false (kafka.authorizer.logger)
[2017-08-04 18:17:46,770] DEBUG Principal = 
User:CN=Unknown,OU=Unknown,O=Unknown,L=Unknown,ST=Unknown,C=Unknown is Denied 
Operation = ClusterAction from host = 127.0.0.1 on resource = 
Cluster:kafka-cluster (kafka.authorizer.logger)
{code}
The issue being that topic metadata is not propagated successfully from 
controller to broker since the broker user doesn't have ClusterAction 
permission.
Fixed by
{code}
bin/kafka-acls --authorizer-properties zookeeper.connect=localhost:2181 --add 
--allow-principal 
"User:CN=Unknown,OU=Unknown,O=Unknown,L=Unknown,ST=Unknown,C=Unknown" 
--operation ClusterAction --cluster
{code}

Request
The debugging is tricky since the controller to broker logging is done in 
controller/state-change log, not in the main server log.
We need to improve the logging on this.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to