m1a2st opened a new pull request, #19214:
URL: https://github.com/apache/kafka/pull/19214

   Reading `lastOffset` requires loading the entire batch header, so we should 
check `baseOffset` instead.  
   
   To optimize this, we need to update the search logic. The previous approach 
simply checked whether each batch's `lastOffset()` was greater than or equal to 
the target offset. Once it found the first batch that met this condition, it 
returned that batch immediately.  
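
For reference, the previous behavior described above can be sketched roughly as follows. This is a simplified illustration, not the actual Kafka code: `Batch`, `LastOffsetSearchSketch`, and `search` are hypothetical names, and a plain in-memory list stands in for the batches read from a log segment.

```java
import java.util.List;
import java.util.Optional;

final class LastOffsetSearchSketch {
    // Hypothetical stand-in for a record batch; not Kafka's real batch type.
    record Batch(long baseOffset, long lastOffset) {}

    // Returns the first batch whose lastOffset() is >= targetOffset.
    // Note that lastOffset() is consulted for every batch visited.
    static Optional<Batch> search(List<Batch> batches, long targetOffset) {
        for (Batch batch : batches) {
            if (batch.lastOffset() >= targetOffset) {
                return Optional.of(batch);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Batch> batches = List.of(new Batch(0, 4), new Batch(10, 14));
        System.out.println(search(batches, 7));
    }
}
```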
   
   Now that we are using `baseOffset()`, we need to handle a special case: if 
the `targetOffset` falls between the `lastOffset` of the previous batch and the 
`baseOffset` of the matching batch, we should select the matching batch. The 
updated logic is structured as follows:  
   
   1. First, check if `baseOffset() == targetOffset` for an exact match.  
   2. Keep track of the previous batch (`prevBatch`) for comparison.  
   3. When encountering a batch where `baseOffset() > targetOffset`:  
      - If there is no previous batch, return the current batch.  
      - If the previous batch's `lastOffset()` is greater than or equal to 
`targetOffset`, return the previous batch (indicating that the target falls 
within it).  
      - Otherwise, return the current batch.  
   4. After iterating through all batches, check if the last batch contains the 
target offset.
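
The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the code in this PR: `Batch`, `BaseOffsetSearchSketch`, and `search` are hypothetical names, and an in-memory list replaces the batches of a log segment. The point it tries to show is that `lastOffset()` is only consulted on the previous batch, once the candidate position has been found by comparing `baseOffset()`.

```java
import java.util.List;
import java.util.Optional;

final class BaseOffsetSearchSketch {
    // Hypothetical stand-in for a record batch; not Kafka's real batch type.
    record Batch(long baseOffset, long lastOffset) {}

    static Optional<Batch> search(List<Batch> batches, long targetOffset) {
        Batch prevBatch = null; // Step 2: remember the previous batch.
        for (Batch batch : batches) {
            // Step 1: exact match on the base offset.
            if (batch.baseOffset() == targetOffset) {
                return Optional.of(batch);
            }
            // Step 3: the first batch that starts after the target.
            if (batch.baseOffset() > targetOffset) {
                if (prevBatch == null) {
                    return Optional.of(batch);
                }
                // The target lies inside the previous batch.
                if (prevBatch.lastOffset() >= targetOffset) {
                    return Optional.of(prevBatch);
                }
                // The target falls in the gap between the two batches.
                return Optional.of(batch);
            }
            prevBatch = batch;
        }
        // Step 4: the target may still lie inside the final batch.
        if (prevBatch != null && prevBatch.lastOffset() >= targetOffset) {
            return Optional.of(prevBatch);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Batch> batches =
            List.of(new Batch(0, 4), new Batch(10, 14), new Batch(15, 19));
        System.out.println(search(batches, 3));  // target inside the first batch
        System.out.println(search(batches, 7));  // target in the gap before the second batch
        System.out.println(search(batches, 25)); // target beyond the last batch
    }
}
```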
   
   Test: Verifying Memory Usage Improvement  
   To evaluate whether this optimization helps, I followed the steps below to 
monitor memory usage:  
   
   1. Start a Standalone Kafka Server  
   ```sh
   KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
   bin/kafka-storage.sh format --standalone -t $KAFKA_CLUSTER_ID -c config/server.properties
   bin/kafka-server-start.sh config/server.properties
   ```  
   
   2. Use Performance Console Tools to Produce and Consume Records  
   **Produce Records:**  
   ```sh
   ./bin/kafka-producer-perf-test.sh \
     --topic test-topic \
     --num-records 1000000000 \
     --record-size 100 \
     --throughput -1 \
     --producer-props bootstrap.servers=localhost:9092
   ```  
   **Consume Records:**  
   ```sh
   ./bin/kafka-consumer-perf-test.sh \
     --topic test-topic \
     --messages 1000000000 \
     --bootstrap-server localhost:9092
   ```  
   trunk:
   ![CleanShot 2025-03-16 at 11 53 31@2x](https://github.com/user-attachments/assets/eec26b1d-38ed-41c8-8c49-e5c68643761b)
   this PR:
   ![CleanShot 2025-03-16 at 11 54 05@2x](https://github.com/user-attachments/assets/3857fe7a-8deb-42ff-b7eb-cc11142b74ce)
   
   

