[jira] [Created] (KAFKA-15796) High CPU issue in Kafka Producer when Auth Failed

2023-11-06 Thread xiaotong.wang (Jira)
xiaotong.wang created KAFKA-15796:
-

 Summary: High CPU issue in Kafka Producer when Auth Failed 
 Key: KAFKA-15796
 URL: https://issues.apache.org/jira/browse/KAFKA-15796
 Project: Kafka
  Issue Type: Bug
  Components: clients
Affects Versions: 3.5.1, 3.6.0, 3.4.1, 3.5.0, 3.3.2, 3.3.1, 3.2.3, 3.2.2
Reporter: xiaotong.wang


How to reproduce

1. Use kafka-client 3.x.x with the producer config enable.idempotence=true (this is 
the default).

2. Start a Kafka server that does not contain the client user's auth info.

3. Start the client producer. Since 3.x, the producer sends InitProducerId on startup and 
the transaction manager (TCM) state transitions to INITIALIZING.
4. The server rejects the client request, and the producer raises an 
AuthenticationException 
(org.apache.kafka.clients.producer.internals.Sender#maybeSendAndPollTransactionalRequest).
5. In the client, org.apache.kafka.clients.producer.internals.Sender#runOnce catches the 
AuthenticationException and calls transactionManager.authenticationFailed(e):

      synchronized void authenticationFailed(AuthenticationException e) {
          for (TxnRequestHandler request : pendingRequests)
              request.fatalError(e);
      }
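     This method only handles pending requests; the in-flight request is missed.
6. The TCM state therefore stays INITIALIZING forever. The guard condition for 
starting a new initialization is
      currentState != State.INITIALIZING && !hasProducerId()
   which can never be true while the state is stuck in INITIALIZING.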
7. The producer sends a message. The message goes into the batch queue, and the Sender 
wakes up and sets pollTimeout=0 to prepare to send it.

8. However, before the Sender reaches sendProducerData, the messages are filtered in 
RecordAccumulator: 
drain --> drainBatchesForOneNode --> shouldStopDrainBatchesForPartition. 
      When producerIdAndEpoch.isValid() == false, it returns true, so no messages are 
collected.
9. The Kafka producer network thread's CPU usage now goes to 100%.
10. Even if we then add the user auth info and permissions on the Kafka server, the 
producer cannot self-heal.
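
The steps above can be sketched as a small, self-contained model. This is a toy illustration only, not Kafka's code: the class SenderSpinModel, its fields, and the iteration cap are hypothetical names introduced here, and the real TransactionManager and Sender are considerably more involved. It assumes, as the report describes, that an authentication failure only touches pending requests and that nothing is drained without a valid producer ID.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model only: names mirror the report, but this is not Kafka's code.
class SenderSpinModel {
    enum State { UNINITIALIZED, INITIALIZING, FATAL_ERROR }

    State state = State.UNINITIALIZED;
    boolean hasProducerId = false;
    final Queue<String> pendingRequests = new ArrayDeque<>();
    String inFlightRequest = null;

    // Step 3: enqueue InitProducerId and move to INITIALIZING.
    // Step 6: the guard makes this a no-op once the state is INITIALIZING.
    void initProducerId() {
        if (state != State.INITIALIZING && !hasProducerId) {
            state = State.INITIALIZING;
            pendingRequests.add("InitProducerId");
        }
    }

    // Sending moves the request from the pending queue to in-flight.
    void send() {
        inFlightRequest = pendingRequests.poll();
    }

    // Step 5 as reported: only pending requests are failed.
    void authenticationFailed() {
        if (!pendingRequests.isEmpty()) {
            pendingRequests.clear();
            state = State.FATAL_ERROR;
        }
        // inFlightRequest is never completed, so the state stays INITIALIZING.
    }

    // Step 8: nothing is drained while there is no valid producer ID.
    int drain(int queuedRecords) {
        return hasProducerId ? queuedRecords : 0;
    }

    // Steps 7-9: count loop passes, each of which would poll with timeout 0.
    static int spinIterations(int maxIterations) {
        SenderSpinModel model = new SenderSpinModel();
        model.initProducerId();
        model.send();
        model.authenticationFailed();
        int queued = 1;
        int spins = 0;
        while (queued > 0 && spins < maxIterations) {
            model.initProducerId();            // no-op: state is still INITIALIZING
            queued -= model.drain(queued);     // drains nothing
            spins++;
        }
        return spins;
    }

    public static void main(String[] args) {
        System.out.println(spinIterations(1000));
    }
}
```

In this model the loop only stops because of the artificial cap; without it, the queued record is never drained and the loop never blocks, which corresponds to the busy network thread described in step 9.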
 
 
 
Suggestion: 
also catch AuthenticationException in 
org.apache.kafka.clients.producer.internals.Sender#maybeSendAndPollTransactionalRequest
and fail the in-flight InitProducerId request with it.
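
A minimal sketch of the idea behind this suggestion, under stated assumptions: InFlightFailureSketch and all of its methods are hypothetical stand-ins written for illustration, and CompletableFuture replaces Kafka's internal request and result types. It only shows the difference between failing pending requests alone and also failing requests that have already been sent; it is not a patch for the actual client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for the transaction manager; not Kafka's actual class.
class InFlightFailureSketch {
    private final List<CompletableFuture<Long>> pendingRequests = new ArrayList<>();
    private final List<CompletableFuture<Long>> inFlightRequests = new ArrayList<>();

    // Queue a request whose result would eventually carry the producer ID.
    CompletableFuture<Long> enqueueInitProducerId() {
        CompletableFuture<Long> result = new CompletableFuture<>();
        pendingRequests.add(result);
        return result;
    }

    // Sending moves every queued request from pending to in-flight.
    void markAllSent() {
        inFlightRequests.addAll(pendingRequests);
        pendingRequests.clear();
    }

    // alsoFailInFlight = false models the reported behavior,
    // alsoFailInFlight = true models the suggested behavior.
    void authenticationFailed(RuntimeException e, boolean alsoFailInFlight) {
        for (CompletableFuture<Long> request : pendingRequests)
            request.completeExceptionally(e);
        pendingRequests.clear();
        if (alsoFailInFlight) {
            for (CompletableFuture<Long> request : inFlightRequests)
                request.completeExceptionally(e);
            inFlightRequests.clear();
        }
    }

    // Returns whether an already-sent request is failed by an auth error.
    static boolean failsInFlight(boolean alsoFailInFlight) {
        InFlightFailureSketch manager = new InFlightFailureSketch();
        CompletableFuture<Long> result = manager.enqueueInitProducerId();
        manager.markAllSent();
        manager.authenticationFailed(
                new IllegalStateException("authentication failed"), alsoFailInFlight);
        return result.isCompletedExceptionally();
    }

    public static void main(String[] args) {
        System.out.println(failsInFlight(false));
        System.out.println(failsInFlight(true));
    }
}
```

Once the in-flight request is completed with the error, the caller waiting on it is notified and the state can leave INITIALIZING, which is what would let a later retry succeed after the auth info is added on the server.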
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-10254) 100% cpu usage by kafkaConsumer poll , when broker can‘t be connect

2020-07-09 Thread xiaotong.wang (Jira)
xiaotong.wang created KAFKA-10254:
-

 Summary: 100% cpu usage by kafkaConsumer poll , when broker can‘t 
be connect 
 Key: KAFKA-10254
 URL: https://issues.apache.org/jira/browse/KAFKA-10254
 Project: Kafka
  Issue Type: Bug
  Components: clients
Affects Versions: 2.5.0
Reporter: xiaotong.wang
 Attachments: image-2020-07-09-19-24-20-604.png

steps

1. Start the Kafka broker.

2. Start a Kafka consumer: subscribe to some topics with several KafkaConsumer 
instances, call kafkaConsumer.*poll(Duration.ofMillis(pollTimeout))*, and 
set enable.auto.commit=false.

3. Use iptables on the client VM to block the Kafka broker IP, or shut down the Kafka brokers.

4. CPU usage goes to 100%.

 

*why?*

 

 

Left: version 2.3.1

Right: version 2.5.0

 

In 2.3.1, when the Kafka brokers go down, the KafkaConsumer's 
updateAssignmentMetadataIfNeeded blocks for x ms and then returns empty records:

!image-2020-07-09-19-24-20-604.png!

 

In 2.5.0:

private Map<TopicPartition, List<ConsumerRecord<K, V>>> pollForFetches(Timer timer) {
    long pollTimeout = coordinator == null ? timer.remainingMs() :
        Math.min(coordinator.timeToNextPoll(timer.currentTimeMs()), timer.remainingMs());



I checked the Kafka client source: when the heartbeat times out, the poll timeout 
becomes 0 ms, so poll is called without blocking at all, which drives CPU usage 
to 100%.
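
The arithmetic behind this can be shown with a small standalone sketch. It only mirrors the pollTimeout expression quoted above; PollTimeoutSketch, the hasCoordinator flag, and the sample millisecond values are assumptions made for illustration, and the value 0 for timeToNextPoll stands for the heartbeat-timeout case described in this report.

```java
// Standalone model of the quoted pollTimeout expression; not Kafka's code.
class PollTimeoutSketch {
    // hasCoordinator stands in for "coordinator != null";
    // timeToNextPollMs stands in for coordinator.timeToNextPoll(now).
    static long pollTimeout(boolean hasCoordinator, long timeToNextPollMs, long remainingMs) {
        return hasCoordinator ? Math.min(timeToNextPollMs, remainingMs) : remainingMs;
    }

    public static void main(String[] args) {
        // Healthy case: the timeout is bounded by the caller's remaining time.
        System.out.println(pollTimeout(true, 3000L, 1000L)); // prints 1000
        // Reported case: timeToNextPoll is 0, so the poll does not block.
        System.out.println(pollTimeout(true, 0L, 1000L));    // prints 0
    }
}
```

With a timeout of 0 on every pass, a loop that keeps calling poll while the brokers are unreachable never sleeps, which matches the 100% CPU symptom.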

 

 

 


