[ 
https://issues.apache.org/jira/browse/KAFKA-16780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849695#comment-17849695
 ] 

Kamal Chandraprakash commented on KAFKA-16780:
----------------------------------------------

The issue described above also applies to regular topics on which remote 
storage is not enabled. When a consumer is configured with the READ_COMMITTED 
isolation level and reads from the beginning of the partition, we scan all the 
transaction indexes to collect the aborted transactions (the indexes are 
empty if the producer is not transactional). This can delay the 
FETCH response when there are a lot of segments/indexes to scan.
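
To illustrate why the cost grows with the number of segments, here is a minimal Python sketch of the scan. It is a simplified model, not the broker code: the names Segment, AbortedTxn and collect_aborted_txns are hypothetical, and it assumes the scan stops only once an index entry's last stable offset reaches the fetch upper bound.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class AbortedTxn:
    # Hypothetical model of one transaction index entry.
    producer_id: int
    first_offset: int
    last_offset: int
    last_stable_offset: int  # LSO recorded when the abort completed


@dataclass
class Segment:
    # Hypothetical model of a log segment and its transaction index.
    base_offset: int
    txn_index: List[AbortedTxn] = field(default_factory=list)


def collect_aborted_txns(
    segments: List[Segment], fetch_offset: int, upper_bound_offset: int
) -> Tuple[List[AbortedTxn], int]:
    """Return (aborted_txns, indexes_scanned) for one READ_COMMITTED fetch."""
    aborted: List[AbortedTxn] = []
    scanned = 0
    for i, segment in enumerate(segments):
        # Skip segments that end before the fetch offset.
        has_next = i + 1 < len(segments)
        if has_next and segments[i + 1].base_offset <= fetch_offset:
            continue
        scanned += 1
        for txn in segment.txn_index:
            # Keep entries that overlap the fetched offset range.
            if txn.last_offset >= fetch_offset and txn.first_offset < upper_bound_offset:
                aborted.append(txn)
            # Assumed stop condition: the index proves the range is covered.
            if txn.last_stable_offset >= upper_bound_offset:
                return aborted, scanned
    # Empty indexes never satisfy the stop condition, so every
    # remaining segment is visited.
    return aborted, scanned


# Non-txn topic: 1000 segments, all transaction indexes empty.
non_txn = [Segment(base_offset=i * 100) for i in range(1000)]
print(collect_aborted_txns(non_txn, 0, 100)[1])  # 1000 indexes scanned

# Txn topic: an aborted txn recorded in the second segment ends the scan early.
txn = [Segment(base_offset=i * 100) for i in range(1000)]
txn[1].txn_index.append(AbortedTxn(7, 50, 80, 111))
print(collect_aborted_txns(txn, 0, 100)[1])  # 2 indexes scanned
```

In this model a fetch at the start of a non-txn partition touches every index, while a transactional topic lets the scan stop after a couple of segments.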

[~jolshan] [~chia7712] [~showuon] 

Could you please suggest how to proceed on this? Thanks!

> Txn consumer exerts pressure on remote storage when reading non-txn topic
> -------------------------------------------------------------------------
>
>                 Key: KAFKA-16780
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16780
>             Project: Kafka
>          Issue Type: Task
>            Reporter: Kamal Chandraprakash
>            Priority: Major
>
> h3. Logic to read aborted txns:
>  # When the consumer sets isolation_level to {{READ_COMMITTED}} and reads 
> a non-txn topic, the broker has to 
> [traverse|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/LocalLog.scala#L394]
>  all the local log segments to collect the aborted transactions since there 
> won't be any entry in the transaction index.
>  # The same 
> [logic|https://github.com/apache/kafka/blob/trunk/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1436]
>  is applied while reading from remote storage. In this case, when the FETCH 
> request reads data from the first remote log segment, the broker has to 
> fetch the transaction indexes of all the remaining remote log segments and 
> then scan the local log segments before responding to the FETCH 
> request, which increases the time taken to serve the request.
> The [EoS Abort 
> Index|https://docs.google.com/document/d/1Rlqizmk7QCDe8qAnVW5e5X8rGvn6m2DCR3JR2yqwVjc]
>  design doc explains how the transaction index file is used to filter out the 
> aborted transaction records.
> The issue arises when consumers use the {{READ_COMMITTED}} isolation 
> level but read non-txn topics. If the topic is written transactionally, 
> we expect each transaction to either commit or abort within 
> 15 minutes (default transaction.max.timeout.ms = 15 mins), so we would likely 
> have to search only a few remote log segments to collect the aborted txns.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
