[GitHub] [kafka] showuon commented on a diff in pull request #14049: KAFKA-14038: Optimise calculation of size for log in remote tier

2023-07-27 Thread via GitHub


showuon commented on code in PR #14049:
URL: https://github.com/apache/kafka/pull/14049#discussion_r1276187790


##
storage/src/test/java/org/apache/kafka/server/log/remote/metadata/storage/TopicBasedRemoteLogMetadataManagerTest.java:
##
@@ -168,4 +169,64 @@ private void waitUntilConsumerCatchesup(TopicIdPartition newLeaderTopicIdPartiti
 }
 }
 
+@Test
+public void testRemoteLogSizeCalculationForUnknownTopicIdPartitionThrows() {
+    TopicIdPartition topicIdPartition = new TopicIdPartition(Uuid.randomUuid(), new TopicPartition("singleton", 0));
+    Assertions.assertThrows(RemoteResourceNotFoundException.class, () -> topicBasedRlmm().remoteLogSize(topicIdPartition, 0));
+}
+
+@Test
+public void testRemoteLogSizeCalculationWithSegmentsOfTheSameEpoch() throws RemoteStorageException, TimeoutException {
+    TopicIdPartition topicIdPartition = new TopicIdPartition(Uuid.randomUuid(), new TopicPartition("singleton", 0));
+    TopicBasedRemoteLogMetadataManager topicBasedRemoteLogMetadataManager = topicBasedRlmm();
+
+    RemoteLogSegmentMetadata firstSegmentMetadata = new RemoteLogSegmentMetadata(new RemoteLogSegmentId(topicIdPartition, Uuid.randomUuid()),
+            0, 100, -1L, 0, time.milliseconds(), SEG_SIZE, Collections.singletonMap(0, 0L));
+    RemoteLogSegmentMetadata secondSegmentMetadata = new RemoteLogSegmentMetadata(new RemoteLogSegmentId(topicIdPartition, Uuid.randomUuid()),
+            100, 200, -1L, 0, time.milliseconds(), SEG_SIZE * 2, Collections.singletonMap(0, 0L));
+    RemoteLogSegmentMetadata thirdSegmentMetadata = new RemoteLogSegmentMetadata(new RemoteLogSegmentId(topicIdPartition, Uuid.randomUuid()),
+            200, 300, -1L, 0, time.milliseconds(), SEG_SIZE * 3, Collections.singletonMap(0, 0L));
+
+    topicBasedRemoteLogMetadataManager.addRemoteLogSegmentMetadata(firstSegmentMetadata);
+    topicBasedRemoteLogMetadataManager.addRemoteLogSegmentMetadata(secondSegmentMetadata);
+    topicBasedRemoteLogMetadataManager.addRemoteLogSegmentMetadata(thirdSegmentMetadata);
+
+    topicBasedRemoteLogMetadataManager.onPartitionLeadershipChanges(Collections.singleton(topicIdPartition), Collections.emptySet());
+
+    // RemoteLogSegmentMetadata events are already published, and topicBasedRlmm's consumer manager will start
+    // fetching those events and build the cache.
+    waitUntilConsumerCatchesup(topicIdPartition, topicIdPartition, 30_000L);
+
+    Long remoteLogSize = topicBasedRemoteLogMetadataManager.remoteLogSize(topicIdPartition, 0);
+
+    Assertions.assertEquals(SEG_SIZE * 6, remoteLogSize);
+}
+
+@Test
+public void testRemoteLogSizeCalculationWithSegmentsOfDifferentEpochs() throws RemoteStorageException, TimeoutException {
+    TopicIdPartition topicIdPartition = new TopicIdPartition(Uuid.randomUuid(), new TopicPartition("singleton", 0));
+    TopicBasedRemoteLogMetadataManager topicBasedRemoteLogMetadataManager = topicBasedRlmm();
+
+    RemoteLogSegmentMetadata firstSegmentMetadata = new RemoteLogSegmentMetadata(new RemoteLogSegmentId(topicIdPartition, Uuid.randomUuid()),
+            0, 100, -1L, 0, time.milliseconds(), SEG_SIZE, Collections.singletonMap(0, 0L));
+    RemoteLogSegmentMetadata secondSegmentMetadata = new RemoteLogSegmentMetadata(new RemoteLogSegmentId(topicIdPartition, Uuid.randomUuid()),
+            100, 200, -1L, 0, time.milliseconds(), SEG_SIZE * 2, Collections.singletonMap(1, 100L));
+    RemoteLogSegmentMetadata thirdSegmentMetadata = new RemoteLogSegmentMetadata(new RemoteLogSegmentId(topicIdPartition, Uuid.randomUuid()),
+            200, 300, -1L, 0, time.milliseconds(), SEG_SIZE * 3, Collections.singletonMap(2, 200L));
+
+    topicBasedRemoteLogMetadataManager.addRemoteLogSegmentMetadata(firstSegmentMetadata);
+    topicBasedRemoteLogMetadataManager.addRemoteLogSegmentMetadata(secondSegmentMetadata);
+    topicBasedRemoteLogMetadataManager.addRemoteLogSegmentMetadata(thirdSegmentMetadata);
+
+    topicBasedRemoteLogMetadataManager.onPartitionLeadershipChanges(Collections.singleton(topicIdPartition), Collections.emptySet());
+
+    // RemoteLogSegmentMetadata events are already published, and topicBasedRlmm's consumer manager will start
+    // fetching those events and build the cache.
+    waitUntilConsumerCatchesup(topicIdPartition, topicIdPartition, 30_000L);
+
+    Assertions.assertEquals(SEG_SIZE, topicBasedRemoteLogMetadataManager.remoteLogSize(topicIdPartition, 0));
+    Assertions.assertEquals(SEG_SIZE * 2, topicBasedRemoteLogMetadataManager.remoteLogSize(topicIdPartition, 1));
+    Assertions.assertEquals(SEG_SIZE * 3, topicBasedRemoteLogMetadataManager.remoteLogSize(topicIdPartition, 2));

Review Comment:
   Sorry, I didn't make it clear. Thanks @divijvaidya, yes, that's what I meant.
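The expected totals in the tests above follow directly from the three segment sizes: 1x, 2x, and 3x `SEG_SIZE` per epoch in the different-epochs case, and a 6x total when all three segments share epoch 0. A minimal check of that arithmetic (plain Java; the `SEG_SIZE` value here is illustrative, not the test suite's constant):

```java
public class SegmentSizeArithmetic {
    public static void main(String[] args) {
        long SEG_SIZE = 1024L; // illustrative; the test suite defines its own constant
        long[] segmentSizes = {SEG_SIZE, SEG_SIZE * 2, SEG_SIZE * 3};
        long total = 0L;
        for (long size : segmentSizes) {
            total += size;
        }
        // Same-epoch case: all three segments share epoch 0, so the total is 6 * SEG_SIZE.
        System.out.println(total == SEG_SIZE * 6); // true
    }
}
```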



[GitHub] [kafka] showuon commented on a diff in pull request #14049: KAFKA-14038: Optimise calculation of size for log in remote tier

2023-07-26 Thread via GitHub


showuon commented on code in PR #14049:
URL: https://github.com/apache/kafka/pull/14049#discussion_r1274908196


##
storage/src/test/java/org/apache/kafka/server/log/remote/metadata/storage/TopicBasedRemoteLogMetadataManagerTest.java:
##
Review Comment:
   Could we also verify what will be returned when the epoch is not out of range (I thi

[GitHub] [kafka] showuon commented on a diff in pull request #14049: KAFKA-14038: Optimise calculation of size for log in remote tier

2023-07-26 Thread via GitHub


showuon commented on code in PR #14049:
URL: https://github.com/apache/kafka/pull/14049#discussion_r1274883826


##
storage/api/src/main/java/org/apache/kafka/server/log/remote/storage/RemoteLogMetadataManager.java:
##
@@ -201,4 +201,13 @@ void onPartitionLeadershipChanges(Set<TopicIdPartition> leaderPartitions,
      * @param partitions topic partitions that have been stopped.
      */
     void onStopPartitions(Set<TopicIdPartition> partitions);
+
+    /**
+     * Returns total size of the log for the given leader epoch in remote storage.
+     *
+     * @param topicPartition topic partition for which size needs to be calculated.
+     * @param leaderEpoch    Size will only include segments belonging to this epoch.
+     * @return Total size of the log stored in remote storage in bytes.
+     */
+    Long remoteLogSize(TopicIdPartition topicPartition, int leaderEpoch) throws RemoteStorageException;

Review Comment:
   +1 to change it in subsequent commits + update the KIP.



##
storage/src/main/java/org/apache/kafka/server/log/remote/metadata/storage/TopicBasedRemoteLogMetadataManager.java:
##
@@ -327,6 +327,17 @@ public void onStopPartitions(Set<TopicIdPartition> partitions) {
 }
 }
 
+    @Override
+    public Long remoteLogSize(TopicIdPartition topicPartition, int leaderEpoch) throws RemoteStorageException {
+        long remoteLogSize = 0L;
+        Iterator<RemoteLogSegmentMetadata> remoteLogSegmentMetadataIterator = remotePartitionMetadataStore.listRemoteLogSegments(topicPartition, leaderEpoch);

Review Comment:
   I didn't see the comment you added.
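The `remoteLogSize` hunk above shows only the start of the loop; the rest of the diff is truncated in this archive. As a rough, self-contained sketch of the kind of per-epoch summation the method performs (plain Java; `SegmentMetadata` is a hypothetical stand-in for `RemoteLogSegmentMetadata`, and this is not the PR's actual code):

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class RemoteLogSizeSketch {
    // Hypothetical stand-in for RemoteLogSegmentMetadata: just the fields needed here.
    static final class SegmentMetadata {
        final int leaderEpoch;
        final long sizeInBytes;
        SegmentMetadata(int leaderEpoch, long sizeInBytes) {
            this.leaderEpoch = leaderEpoch;
            this.sizeInBytes = sizeInBytes;
        }
    }

    // Iterate the partition's segments and sum the sizes of those belonging to
    // the requested leader epoch, mirroring the iterator-based loop started in
    // the hunk above.
    static long remoteLogSize(List<SegmentMetadata> segments, int leaderEpoch) {
        long remoteLogSize = 0L;
        Iterator<SegmentMetadata> it = segments.iterator();
        while (it.hasNext()) {
            SegmentMetadata metadata = it.next();
            if (metadata.leaderEpoch == leaderEpoch) {
                remoteLogSize += metadata.sizeInBytes;
            }
        }
        return remoteLogSize;
    }

    public static void main(String[] args) {
        long SEG_SIZE = 1024L; // illustrative segment size
        List<SegmentMetadata> segments = Arrays.asList(
                new SegmentMetadata(0, SEG_SIZE),
                new SegmentMetadata(1, SEG_SIZE * 2),
                new SegmentMetadata(2, SEG_SIZE * 3));
        System.out.println(remoteLogSize(segments, 0)); // 1024
        System.out.println(remoteLogSize(segments, 2)); // 3072
        System.out.println(remoteLogSize(segments, 5)); // 0: no segments for an unknown epoch
    }
}
```

In the real manager, `listRemoteLogSegments(topicPartition, leaderEpoch)` already returns only that epoch's segments, so its loop would not need the epoch filter shown here; it can simply accumulate each element's segment size.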



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org