Jianbin Chen created KAFKA-17020:
------------------------------------

             Summary: After enabling tiered storage, occasional residual logs 
are left in the replica
                 Key: KAFKA-17020
                 URL: https://issues.apache.org/jira/browse/KAFKA-17020
             Project: Kafka
          Issue Type: Wish
    Affects Versions: 3.7.0
            Reporter: Jianbin Chen


After enabling tiered storage, occasional residual logs are left in the replica.
Based on the observed phenomenon, the index values of the rolled-out logs 
generated by the replica and the leader are not the same. As a result, the logs 
uploaded to S3 at the same time do not include the corresponding log files on 
the replica side, making it impossible to delete the local logs.
[!https://private-user-images.githubusercontent.com/19943636/341939158-d0b87a7d-aca1-4700-b3e1-fceff0530c79.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTkwMzY1OTIsIm5iZiI6MTcxOTAzNjI5MiwicGF0aCI6Ii8xOTk0MzYzNi8zNDE5MzkxNTgtZDBiODdhN2QtYWNhMS00NzAwLWIzZTEtZmNlZmYwNTMwYzc5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjIyVDA2MDQ1MlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWY3ZDQ2OGIxMmE3OGI2Njc2YzdkNzkwMzlhNmM5MzAxNjY0MWZiMzA2ZjgwNzgzM2JlYTMxMzM4Njk1NGI5MDYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.Sdsvwn0dUi_p1dG0W_AvQY6Iqeimy_UZ8VldKUS1Q0E!|https://private-user-images.githubusercontent.com/19943636/341939158-d0b87a7d-aca1-4700-b3e1-fceff0530c79.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTkwMzY1OTIsIm5iZiI6MTcxOTAzNjI5MiwicGF0aCI6Ii8xOTk0MzYzNi8zNDE5MzkxNTgtZDBiODdhN2QtYWNhMS00NzAwLWIzZTEtZmNlZmYwNTMwYzc5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjIyVDA2MDQ1MlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWY3ZDQ2OGIxMmE3OGI2Njc2YzdkNzkwMzlhNmM5MzAxNjY0MWZiMzA2ZjgwNzgzM2JlYTMxMzM4Njk1NGI5MDYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.Sdsvwn0dUi_p1dG0W_AvQY6Iqeimy_UZ8VldKUS1Q0E]
leader config:
num.partitions=3
default.replication.factor=2
delete.topic.enable=true
auto.create.topics.enable=false
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=2
transaction.state.log.min.isr=1
offsets.retention.minutes=4320
log.roll.ms=86400000
log.local.retention.ms=600000
log.segment.bytes=536870912
num.replica.fetchers=1
log.retention.ms=15811200000
remote.log.manager.thread.pool.size=4
remote.log.reader.threads=4
remote.log.metadata.topic.replication.factor=3
remote.log.storage.system.enable=true
remote.log.metadata.topic.retention.ms=180000000
rsm.config.fetch.chunk.cache.class=io.aiven.kafka.tieredstorage.fetch.cache.DiskChunkCache
rsm.config.fetch.chunk.cache.path=/data01/kafka-tiered-storage-cache
# Pick some cache size, 16 GiB here:
rsm.config.fetch.chunk.cache.size=34359738368
rsm.config.fetch.chunk.cache.retention.ms=1200000
# # # Prefetching size, 16 MiB here:
rsm.config.fetch.chunk.cache.prefetch.max.size=33554432
rsm.config.storage.backend.class=io.aiven.kafka.tieredstorage.storage.s3.S3Storage
rsm.config.storage.s3.bucket.name=
rsm.config.storage.s3.region=us-west-1
rsm.config.storage.aws.secret.access.key=
rsm.config.storage.aws.access.key.id=
rsm.config.chunk.size=8388608
remote.log.storage.manager.class.path=/home/admin/core-0.0.1-SNAPSHOT/*:/home/admin/s3-0.0.1-SNAPSHOT/*
remote.log.storage.manager.class.name=io.aiven.kafka.tieredstorage.RemoteStorageManager
remote.log.metadata.manager.class.name=org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager
remote.log.metadata.manager.listener.name=PLAINTEXT
rsm.config.upload.rate.limit.bytes.per.second=31457280
replica config:
num.partitions=3
default.replication.factor=2
delete.topic.enable=true
auto.create.topics.enable=false
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=2
transaction.state.log.min.isr=1
offsets.retention.minutes=4320
log.roll.ms=86400000
log.local.retention.ms=600000
log.segment.bytes=536870912
num.replica.fetchers=1
log.retention.ms=15811200000
remote.log.manager.thread.pool.size=4
remote.log.reader.threads=4
remote.log.metadata.topic.replication.factor=3
remote.log.storage.system.enable=true
#remote.log.metadata.topic.retention.ms=180000000
rsm.config.fetch.chunk.cache.class=io.aiven.kafka.tieredstorage.fetch.cache.DiskChunkCache
rsm.config.fetch.chunk.cache.path=/data01/kafka-tiered-storage-cache
# Pick some cache size, 16 GiB here:
rsm.config.fetch.chunk.cache.size=34359738368
rsm.config.fetch.chunk.cache.retention.ms=1200000
# # # Prefetching size, 16 MiB here:
rsm.config.fetch.chunk.cache.prefetch.max.size=33554432
rsm.config.storage.backend.class=io.aiven.kafka.tieredstorage.storage.s3.S3Storage
rsm.config.storage.s3.bucket.name=
rsm.config.storage.s3.region=us-west-1
rsm.config.storage.aws.secret.access.key=
rsm.config.storage.aws.access.key.id=
rsm.config.chunk.size=8388608
remote.log.storage.manager.class.path=/home/admin/core-0.0.1-SNAPSHOT/*:/home/admin/s3-0.0.1-SNAPSHOT/*
remote.log.storage.manager.class.name=io.aiven.kafka.tieredstorage.RemoteStorageManager
remote.log.metadata.manager.class.name=org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager
remote.log.metadata.manager.listener.name=PLAINTEXT
rsm.config.upload.rate.limit.bytes.per.second=31457280

 
topic config:
Dynamic configs for topic xxxxxx are:
  local.retention.ms=600000 sensitive=false 
synonyms=\{DYNAMIC_TOPIC_CONFIG:local.retention.ms=600000, 
STATIC_BROKER_CONFIG:log.local.retention.ms=600000, 
DEFAULT_CONFIG:log.local.retention.ms=-2}
  remote.storage.enable=true sensitive=false 
synonyms=\{DYNAMIC_TOPIC_CONFIG:remote.storage.enable=true}
  retention.ms=15811200000 sensitive=false 
synonyms=\{DYNAMIC_TOPIC_CONFIG:retention.ms=15811200000, 
STATIC_BROKER_CONFIG:log.retention.ms=15811200000, 
DEFAULT_CONFIG:log.retention.hours=168}
  segment.bytes=536870912 sensitive=false 
synonyms=\{DYNAMIC_TOPIC_CONFIG:segment.bytes=536870912, 
STATIC_BROKER_CONFIG:log.segment.bytes=536870912, 
DEFAULT_CONFIG:log.segment.bytes=1073741824}
 
[!https://private-user-images.githubusercontent.com/19943636/341939710-76088243-d764-4c50-bbec-18277f3c4296.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTkwMzY1OTIsIm5iZiI6MTcxOTAzNjI5MiwicGF0aCI6Ii8xOTk0MzYzNi8zNDE5Mzk3MTAtNzYwODgyNDMtZDc2NC00YzUwLWJiZWMtMTgyNzdmM2M0Mjk2LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjIyVDA2MDQ1MlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWE1NmE0MzA3MzQ0ODYyN2VlODc2NjI4ZjgyMTk1YzFkNTJjNzAzM2EzNGUzZDU2M2MzZjQ5ZGU4NTVmZDI4NjgmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.P1ZL9_WcHbcnbI5vmdimDY4wyMb8hRTSbOd-dGaTxM0!|https://private-user-images.githubusercontent.com/19943636/341939710-76088243-d764-4c50-bbec-18277f3c4296.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTkwMzY1OTIsIm5iZiI6MTcxOTAzNjI5MiwicGF0aCI6Ii8xOTk0MzYzNi8zNDE5Mzk3MTAtNzYwODgyNDMtZDc2NC00YzUwLWJiZWMtMTgyNzdmM2M0Mjk2LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjIyVDA2MDQ1MlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWE1NmE0MzA3MzQ0ODYyN2VlODc2NjI4ZjgyMTk1YzFkNTJjNzAzM2EzNGUzZDU2M2MzZjQ5ZGU4NTVmZDI4NjgmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.P1ZL9_WcHbcnbI5vmdimDY4wyMb8hRTSbOd-dGaTxM0]
By examining the segment logs for that time period in S3 for the topic, it can 
be observed that the indices of the two are different.
[!https://private-user-images.githubusercontent.com/19943636/341939810-9c4e3f2f-95b8-492c-b28a-ed9d1ea1893e.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTkwMzY1OTIsIm5iZiI6MTcxOTAzNjI5MiwicGF0aCI6Ii8xOTk0MzYzNi8zNDE5Mzk4MTAtOWM0ZTNmMmYtOTViOC00OTJjLWIyOGEtZWQ5ZDFlYTE4OTNlLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjIyVDA2MDQ1MlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPThjZmJmMjYxMzYzZmZhZDI5MTAxOWViMzg5ZDBiZmNhYTFkMTFkMzc1ODlkZTFmNDljMmRmOWE3YjM2YmUyZDcmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.nepnHPd4mMoDu0UNr6DCJsBf0ck1T7H4sEvEXs9hnys!|https://private-user-images.githubusercontent.com/19943636/341939810-9c4e3f2f-95b8-492c-b28a-ed9d1ea1893e.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTkwMzY1OTIsIm5iZiI6MTcxOTAzNjI5MiwicGF0aCI6Ii8xOTk0MzYzNi8zNDE5Mzk4MTAtOWM0ZTNmMmYtOTViOC00OTJjLWIyOGEtZWQ5ZDFlYTE4OTNlLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjIyVDA2MDQ1MlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPThjZmJmMjYxMzYzZmZhZDI5MTAxOWViMzg5ZDBiZmNhYTFkMTFkMzc1ODlkZTFmNDljMmRmOWE3YjM2YmUyZDcmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.nepnHPd4mMoDu0UNr6DCJsBf0ck1T7H4sEvEXs9hnys]
By searching for the residual log index through log analysis, it was found that 
there were no delete logs on both the leader and replica nodes. However, the 
logs for the corresponding time period in S3 can be queried in the leader node 
logs but not in the replica node logs. Therefore, I believe that the issue is 
due to the different log files generated by the leader and replica nodes.
[!https://private-user-images.githubusercontent.com/19943636/341939924-17a1cfdf-b3b1-42c1-912c-d2644a97b1b5.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTkwMzY1OTIsIm5iZiI6MTcxOTAzNjI5MiwicGF0aCI6Ii8xOTk0MzYzNi8zNDE5Mzk5MjQtMTdhMWNmZGYtYjNiMS00MmMxLTkxMmMtZDI2NDRhOTdiMWI1LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjIyVDA2MDQ1MlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWI2NDBjOGQ5YTNlODBhNzYwNjQ2MjAzYTVjYWMwMDFlMjkzMTMxZjUyMjM5NDc3ZDllYjlmY2M5MmJkNTQ3ZWQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.8Abjl8yVFW3hiihABF48S7cL8WpSU-WiUbANxVRD-s0!|https://private-user-images.githubusercontent.com/19943636/341939924-17a1cfdf-b3b1-42c1-912c-d2644a97b1b5.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTkwMzY1OTIsIm5iZiI6MTcxOTAzNjI5MiwicGF0aCI6Ii8xOTk0MzYzNi8zNDE5Mzk5MjQtMTdhMWNmZGYtYjNiMS00MmMxLTkxMmMtZDI2NDRhOTdiMWI1LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjIyVDA2MDQ1MlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWI2NDBjOGQ5YTNlODBhNzYwNjQ2MjAzYTVjYWMwMDFlMjkzMTMxZjUyMjM5NDc3ZDllYjlmY2M5MmJkNTQ3ZWQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.8Abjl8yVFW3hiihABF48S7cL8WpSU-WiUbANxVRD-s0]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to