[
https://issues.apache.org/jira/browse/KAFKA-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17930563#comment-17930563
]
Bartosz Kubiak commented on KAFKA-15796:
----------------------------------------
Hello, we are seeing the very same case as described above by [~sthu]. We are
using the SASL/OAUTHBEARER mechanism with Keycloak as the authentication service.
When ExpiringCredentialRefreshingLogin fails to acquire a token for any reason,
a SaslAuthenticationException occurs during authentication and stops the
consumer, because it is treated as a fatal error. After adding
authorizationExceptionRetryInterval, the problem with the AuthException is solved and
the consumer keeps working, but the client starts calling LegacyKafkaConsumer.poll()
in an endless loop, which consumes all of the CPU. This is a critical problem for us. Will
this be fixed in any version? [~xiaotong.wang] [~pnee]
> High CPU issue in Kafka Producer when Auth Failed
> --------------------------------------------------
>
> Key: KAFKA-15796
> URL: https://issues.apache.org/jira/browse/KAFKA-15796
> Project: Kafka
> Issue Type: Bug
> Components: clients, producer
> Affects Versions: 3.2.2, 3.2.3, 3.3.1, 3.3.2, 3.5.0, 3.4.1, 3.6.0, 3.5.1
> Reporter: xiaotong.wang
> Priority: Major
> Attachments: image-2023-11-07-14-18-32-016.png
>
>
> How to reproduce
> 1. kafka-client 3.x.x producer with enable.idempotence=true (this is the
> default)
> 2. Start a Kafka server that does not contain the client user's auth info
> 3. Start the client producer; since 3.x, the producer calls initProducerId and the TCM
> state transitions to INITIALIZING
> 4. The server rejects the client request, and the producer raises an
> AuthenticationException
> (org.apache.kafka.clients.producer.internals.Sender#maybeSendAndPollTransactionalRequest)
> 5. In kafka-client, org.apache.kafka.clients.producer.internals.Sender#runOnce
> catches the
> AuthenticationException
> and calls transactionManager.authenticationFailed(e);
>
> synchronized void authenticationFailed(AuthenticationException e) {
>     for (TxnRequestHandler request : pendingRequests)
>         request.fatalError(e);
> }
> This method only handles pendingRequests; the in-flight request is not handled.
> 6. The TCM state will always stay in INITIALIZING
> because of the condition: currentState != State.INITIALIZING &&
> !hasProducerId()
> 7. The producer sends a message, the message goes into the batch queue, and the Sender wakes up
> and sets pollTimeout=0 to prepare to send the message
> 8. However, before the Sender calls sendProducerData, messages are filtered:
> RecordAccumulator drain
> -> drainBatchesForOneNode -> shouldStopDrainBatchesForPartition.
> When producerIdAndEpoch.isValid()==false, it returns true, so no
> messages are collected
> 9. Now the Kafka producer network thread's CPU usage goes to 100%
> 10. Even if we add the user auth info and permissions on the Kafka server, the producer cannot
> self-heal
>
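Steps 7 to 9 can be illustrated with a small toy model. This is only a sketch under stated assumptions: SpinModel, Accumulator and countBusyIterations are hypothetical names invented for the illustration and are not Kafka classes; the model only mirrors the described behaviour (poll timeout of 0 while data is queued, and a drain that collects nothing while the producer id is invalid).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class SpinModel {
    // Hypothetical stand-in for the record accumulator; not Kafka's class.
    static final class Accumulator {
        final Queue<String> queued = new ArrayDeque<>();
        boolean producerIdValid = false;

        boolean hasReadyData() {
            return !queued.isEmpty();
        }

        // Mirrors the described behaviour: nothing is drained
        // while the producer id is invalid.
        List<String> drain() {
            List<String> out = new ArrayList<>();
            if (!producerIdValid) {
                return out;
            }
            while (!queued.isEmpty()) {
                out.add(queued.poll());
            }
            return out;
        }
    }

    // Runs a bounded number of sender-like iterations and counts those that
    // would not block (timeout 0) and still send nothing.
    static int countBusyIterations(Accumulator acc, int iterations) {
        int busy = 0;
        for (int i = 0; i < iterations; i++) {
            // Queued data means the loop would not wait in poll.
            long pollTimeoutMs = acc.hasReadyData() ? 0L : 100L;
            List<String> drained = acc.drain();
            if (drained.isEmpty() && pollTimeoutMs == 0L) {
                busy++;
            }
        }
        return busy;
    }

    public static void main(String[] args) {
        Accumulator acc = new Accumulator();
        acc.queued.add("message-1");
        // The producer id never becomes valid, so every iteration is a busy one.
        System.out.println(countBusyIterations(acc, 1000));
    }
}
```

In this model the queued message is never drained, so the timeout stays at 0 on every pass; once producerIdValid is set to true the first pass drains the queue and later passes wait again.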
>
>
> Suggestion:
> also catch AuthenticationException in
> org.apache.kafka.clients.producer.internals.Sender#maybeSendAndPollTransactionalRequest
> and respond with a failure to the in-flight InitProducerId request
>
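The suggestion in the description can be sketched roughly as follows. This is not the actual Kafka code or a proposed patch: AuthFailureModel, Request, TxnManagerModel and both method names are hypothetical, and the single inFlightRequest field is an assumption made to keep the model small. It only contrasts failing the pending requests with failing the pending and the in-flight request.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class AuthFailureModel {
    // Hypothetical stand-in for a transactional request handler.
    static final class Request {
        final String name;
        Exception error;

        Request(String name) {
            this.name = name;
        }

        void fatalError(Exception e) {
            this.error = e;
        }

        boolean isFailed() {
            return error != null;
        }
    }

    // Hypothetical, much simplified model of the transaction manager.
    static final class TxnManagerModel {
        final Queue<Request> pendingRequests = new ArrayDeque<>();
        Request inFlightRequest; // null when nothing is in flight (assumption)

        // Behaviour as described in the report: only pending requests fail.
        synchronized void authenticationFailedPendingOnly(Exception e) {
            for (Request request : pendingRequests) {
                request.fatalError(e);
            }
        }

        // Suggested variant: also fail and clear the in-flight request.
        synchronized void authenticationFailedIncludingInFlight(Exception e) {
            authenticationFailedPendingOnly(e);
            if (inFlightRequest != null) {
                inFlightRequest.fatalError(e);
                inFlightRequest = null;
            }
        }
    }

    public static void main(String[] args) {
        TxnManagerModel manager = new TxnManagerModel();
        Request inFlight = new Request("InitProducerId");
        manager.inFlightRequest = inFlight;

        manager.authenticationFailedPendingOnly(new RuntimeException("auth failed"));
        System.out.println("pending-only failed in-flight: " + inFlight.isFailed());

        manager.authenticationFailedIncludingInFlight(new RuntimeException("auth failed"));
        System.out.println("including in-flight failed in-flight: " + inFlight.isFailed());
    }
}
```

In the model, the pending-only variant leaves the in-flight InitProducerId request untouched, while the second variant fails it and clears the slot so that a later retry would be possible.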
--
This message was sent by Atlassian Jira
(v8.20.10#820010)