[ 
https://issues.apache.org/jira/browse/KAFKA-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17930563#comment-17930563
 ] 

Bartosz Kubiak commented on KAFKA-15796:
----------------------------------------

Hello, we are seeing the very same case as described above by [~sthu]. We are 
using the SASL/OAUTHBEARER mechanism with Keycloak as the authentication 
service. When ExpiringCredentialRefreshingLogin fails to acquire a token for 
any reason, a SaslAuthenticationException is thrown during authentication and 
stops the consumer, because it is treated as a fatal error. After setting 
authorizationExceptionRetryInterval, the AuthException problem is solved and 
the consumer keeps running, but the client then calls 
LegacyKafkaConsumer.poll() in an endless loop that consumes all available CPU. 
This is a critical problem for us. Will this be fixed in an upcoming version? 
[~xiaotong.wang] [~pnee] 

> High CPU issue in Kafka Producer when Auth Failed 
> --------------------------------------------------
>
>                 Key: KAFKA-15796
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15796
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, producer 
>    Affects Versions: 3.2.2, 3.2.3, 3.3.1, 3.3.2, 3.5.0, 3.4.1, 3.6.0, 3.5.1
>            Reporter: xiaotong.wang
>            Priority: Major
>         Attachments: image-2023-11-07-14-18-32-016.png
>
>
> How to reproduce
> 1. Use a kafka-client 3.x.x producer with enable.idempotence=true (this is 
> the default).
> 2. Start a Kafka server that does not contain the client user's auth info.
> 3. Start the client producer. Since 3.x, the producer calls initProducerId 
> and the TCM state transitions to INITIALIZING.
> 4. The server rejects the client request and the producer raises an 
> AuthenticationException 
> (org.apache.kafka.clients.producer.internals.Sender#maybeSendAndPollTransactionalRequest).
> 5. In the client, 
> org.apache.kafka.clients.producer.internals.Sender#runOnce catches the 
> AuthenticationException and calls 
> transactionManager.authenticationFailed(e):
>     
>      synchronized void authenticationFailed(AuthenticationException e) {
>          for (TxnRequestHandler request : pendingRequests)
>              request.fatalError(e);
>      }
>
>      This method only handles the pending requests; the in-flight request 
> is not failed.
> 6. The TCM state stays in INITIALIZING forever, because the check 
> currentState != State.INITIALIZING && !hasProducerId() is never satisfied 
> while the state is INITIALIZING.
> 7. The producer sends a message, the message goes into the batch queue, and 
> the Sender wakes up, sets pollTimeout=0, and prepares to send the message.
> 8. Before the Sender runs sendProducerData, however, the messages are 
> filtered: RecordAccumulator drain -> drainBatchesForOneNode -> 
> shouldStopDrainBatchesForPartition. 
>       When producerIdAndEpoch.isValid() == false, this returns true, so no 
> messages are collected.
> 9. The Kafka producer network thread's CPU usage now goes to 100%.
> 10. Even after we add the user auth info and permissions on the Kafka 
> server, the producer cannot self-heal.
>  
>  
>  
> Suggestion: 
> Also catch AuthenticationException in 
> org.apache.kafka.clients.producer.internals.Sender#maybeSendAndPollTransactionalRequest
> and respond with a failure to the in-flight InitProducerId request.
>  
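
The mechanism in steps 5 to 9 of the quoted report, and the effect of the suggestion, can be sketched with a small stand-alone model. This is only an illustration under stated assumptions, not Kafka's code: SenderSpinModel, InitRequest, failInFlight, runOnceIsWasted and wastedPasses are hypothetical names, and resetting the state after failing the in-flight request is an assumption about how a fix might allow a retry. The actual fix in Kafka may look different.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of steps 5-9 of the report. None of these types are Kafka classes. */
class SenderSpinModel {
    enum State { UNINITIALIZED, INITIALIZING, READY }

    static final class InitRequest {
        boolean failed;
        void fatalError(RuntimeException e) { failed = true; }
    }

    State state = State.UNINITIALIZED;
    boolean hasProducerId = false;
    final Deque<InitRequest> pendingRequests = new ArrayDeque<>();
    InitRequest inFlight;        // request already handed to the network layer
    int queuedBatches = 0;       // records waiting to be drained
    final boolean failInFlight;  // true = hypothetical variant of the suggestion

    SenderSpinModel(boolean failInFlight) { this.failInFlight = failInFlight; }

    /** Step 3: queue the init request, then put it on the wire. */
    void startInit() {
        pendingRequests.add(new InitRequest());
        state = State.INITIALIZING;
        inFlight = pendingRequests.poll();
    }

    /** The broker accepted the init request. */
    void completeInit() {
        inFlight = null;
        hasProducerId = true;
        state = State.READY;
    }

    /** Step 5: authentication fails while the init request is in flight. */
    void authenticationFailed(RuntimeException e) {
        for (InitRequest r : pendingRequests) r.fatalError(e); // as quoted in the report
        if (failInFlight && inFlight != null) {
            inFlight.fatalError(e);
            inFlight = null;
            state = State.UNINITIALIZED; // assumption: lets a later attempt start over
        }
    }

    /** Step 6: the condition quoted in the report. */
    boolean wouldRetryInit() {
        return state != State.INITIALIZING && !hasProducerId;
    }

    /** Steps 7-8: one sender pass. True if data was ready (poll timeout 0) but nothing drained. */
    boolean runOnceIsWasted() {
        boolean dataReady = queuedBatches > 0;
        boolean canDrain = hasProducerId; // stands in for producerIdAndEpoch.isValid()
        if (dataReady && canDrain) {
            queuedBatches--;
            return false;
        }
        return dataReady;
    }

    /** Runs the scenario and counts sender passes that returned at once without progress. */
    static int wastedPasses(boolean failInFlight, int passes) {
        SenderSpinModel m = new SenderSpinModel(failInFlight);
        m.startInit();
        m.authenticationFailed(new RuntimeException("auth failed"));
        m.queuedBatches = 1; // step 7: the application sends one message
        int wasted = 0;
        for (int i = 0; i < passes; i++) {
            // Step 10: the broker now accepts the client, if the client ever retries.
            if (m.wouldRetryInit()) {
                m.startInit();
                m.completeInit();
            }
            if (m.runOnceIsWasted()) wasted++;
        }
        return wasted;
    }

    public static void main(String[] args) {
        System.out.println("failInFlight=false wastedPasses=" + wastedPasses(false, 1000));
        System.out.println("failInFlight=true wastedPasses=" + wastedPasses(true, 1000));
    }
}
```

In this model, the variant that fails only pending requests wastes all 1000 passes, because the state never leaves INITIALIZING and no producer ID is ever obtained. The variant that also fails the in-flight request retries the initialization and wastes none.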



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
