Re: [PR] [improve][meta] Log a warning when ZK batch fails with connectionloss [pulsar]

2024-04-23 Thread via GitHub


lhotari commented on PR #22566:
URL: https://github.com/apache/pulsar/pull/22566#issuecomment-2072024722

   I might have hit this issue recently while working on the /metrics endpoint issue.
   A few times I got errors such as `java.io.IOException: Packet len 15589885 is out of range!` with no further explanation. The test setup is described in the comments on #22477 and #22494. It seems that these two scenarios would reproduce the problem I faced:
   https://github.com/lhotari/pulsar-playground/blob/master/src/main/java/com/github/lhotari/pulsar/playground/TestScenarioCreateLongNamedTopics.java
   https://github.com/lhotari/pulsar-playground/blob/master/src/main/java/com/github/lhotari/pulsar/playground/TestScenarioLoadAll.java


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [improve][meta] Log a warning when ZK batch fails with connectionloss [pulsar]

2024-04-23 Thread via GitHub


lhotari commented on PR #22566:
URL: https://github.com/apache/pulsar/pull/22566#issuecomment-2072410563

   The test 
https://github.com/apache/pulsar/blob/61296d90d912493113d7a18c18eef23114810ce7/pulsar-metadata/src/test/java/org/apache/pulsar/metadata/MetadataStoreBatchingTest.java#L151-L182
 will reproduce the problem.
   With the changes in this PR, it will log "2024-04-23T17:03:51,416 - WARN  - [main-EventThread:ZKMetadataStore@204] - Connection loss while executing batch operation of 40 GET entries of total data size of 910. Retrying individual operations one-by-one."
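
   For illustration, the summary part of that warning ("40 GET entries of total data size of 910") can be built roughly as follows. This is a minimal standalone sketch modeled on the PR diff, not the merged code: `Op` and `OpType` are hypothetical stand-ins for Pulsar's `MetadataOp` and its type enum, and the `TreeMap` is added here only to make the output order stable.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical stand-in for the operation type enum used by MetadataOp.
enum OpType { GET, GET_CHILDREN, PUT, DELETE }

// Hypothetical stand-in for MetadataOp: only the type and the payload size matter here.
final class Op {
    final OpType type;
    final long size;

    Op(OpType type, long size) {
        this.type = type;
        this.size = size;
    }

    OpType getType() {
        return type;
    }

    long size() {
        return size;
    }
}

class BatchSummaryExample {
    // Counts the operations per type and renders them as "N TYPE entries, ...".
    static String countsByType(List<Op> ops) {
        Map<OpType, Integer> counts = ops.stream().collect(
                Collectors.groupingBy(Op::getType, TreeMap::new, Collectors.summingInt(op -> 1)));
        return counts.entrySet().stream()
                .map(e -> e.getValue() + " " + e.getKey().name() + " entries")
                .collect(Collectors.joining(", "));
    }

    // Sums the sizes of all operations in the batch.
    static long totalSize(List<Op> ops) {
        return ops.stream().mapToLong(Op::size).sum();
    }

    public static void main(String[] args) {
        List<Op> ops = List.of(new Op(OpType.GET, 20), new Op(OpType.GET, 30), new Op(OpType.PUT, 100));
        System.out.println("Connection loss while executing batch operation of "
                + countsByType(ops) + " of total data size of " + totalSize(ops) + ".");
    }
}
```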





Re: [PR] [improve][meta] Log a warning when ZK batch fails with connectionloss [pulsar]

2024-04-23 Thread via GitHub


lhotari commented on PR #22566:
URL: https://github.com/apache/pulsar/pull/22566#issuecomment-2072415742

   It seems that batching reads could do more harm than good when the returned data exceeds the jute.maxbuffer size.
   This affects stability significantly, so there should be a way to disable batch reads completely.
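
   As a rough sketch of what such a switch could look like: the class below groups read paths into requests and falls back to one path per request when batching is turned off. Everything here is hypothetical (the class name, the `batchReadsEnabled` flag and the `maxBatchSize` parameter are assumptions for illustration, not existing Pulsar settings).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical planner: decides how many read paths go into a single request.
class ReadBatchPlanner {
    private final boolean batchReadsEnabled; // assumed switch, not an existing Pulsar option
    private final int maxBatchSize;          // assumed upper bound of paths per batch

    ReadBatchPlanner(boolean batchReadsEnabled, int maxBatchSize) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be at least 1");
        }
        this.batchReadsEnabled = batchReadsEnabled;
        this.maxBatchSize = maxBatchSize;
    }

    // Splits the paths into groups; each group would be sent as one request.
    // With batching disabled every group holds exactly one path, so a single
    // large response can no longer be multiplied by the batch size.
    List<List<String>> plan(List<String> paths) {
        int groupSize = batchReadsEnabled ? maxBatchSize : 1;
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < paths.size(); i += groupSize) {
            int end = Math.min(i + groupSize, paths.size());
            groups.add(new ArrayList<>(paths.subList(i, end)));
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> paths = List.of("/a", "/b", "/c", "/d", "/e");
        System.out.println(new ReadBatchPlanner(true, 2).plan(paths));
        System.out.println(new ReadBatchPlanner(false, 2).plan(paths));
    }
}
```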





Re: [PR] [improve][meta] Log a warning when ZK batch fails with connectionloss [pulsar]

2024-04-23 Thread via GitHub


heesung-sn commented on code in PR #22566:
URL: https://github.com/apache/pulsar/pull/22566#discussion_r1576672640


##
pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/ZKMetadataStore.java:
##
@@ -192,7 +192,20 @@ protected void batchOperation(List<MetadataOp> ops) {
                 Code code = Code.get(rc);
                 if (code == Code.CONNECTIONLOSS) {
                     // There is the chance that we caused a connection reset by sending or requesting a batch
-                    // that passed the max ZK limit. Retry with the individual operations
+                    // that passed the max ZK limit.
+
+                    // Build the log warning message
+                    // summarize the operations by type
+                    String countsByType = ops.stream().collect(
+                                    Collectors.groupingBy(MetadataOp::getType, Collectors.summingInt(op -> 1)))
+                            .entrySet().stream().map(e -> e.getValue() + " " + e.getKey().name() + " entries")
+                            .collect(Collectors.joining(", "));
+                    Long totalSize = ops.stream().collect(Collectors.summingLong(MetadataOp::size));
+                    log.warn("Connection loss while executing batch operation of {} "

Review Comment:
   Nit: Can we also log the max ZK buffer size compared to the data size?
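
   A minimal sketch of how the comparison might be rendered, assuming the limit is read from the `jute.maxbuffer` system property. The class and method names are hypothetical, and the fallback value is passed in by the caller because the effective ZooKeeper default is not stated in this thread.

```java
// Hypothetical helper for putting the configured limit next to the batch size in a log line.
class MaxBufferCheck {
    // Reads the jute.maxbuffer system property; the fallback is an assumption
    // supplied by the caller, used only when the property is not set.
    static int configuredMaxBuffer(int fallback) {
        return Integer.getInteger("jute.maxbuffer", fallback);
    }

    // Renders the size next to the limit. Note that for reads the response size,
    // not the request size, is what can exceed the limit.
    static String describe(long totalSize, int maxBuffer) {
        String verdict = totalSize > maxBuffer ? "exceeds limit" : "within limit";
        return "total data size of " + totalSize + " (jute.maxbuffer=" + maxBuffer + ", " + verdict + ")";
    }

    public static void main(String[] args) {
        int maxBuffer = configuredMaxBuffer(1024 * 1024); // assumed fallback for this demo only
        System.out.println(describe(910, maxBuffer));
        System.out.println(describe(15589885L, maxBuffer));
    }
}
```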






Re: [PR] [improve][meta] Log a warning when ZK batch fails with connectionloss [pulsar]

2024-04-23 Thread via GitHub


heesung-sn commented on code in PR #22566:
URL: https://github.com/apache/pulsar/pull/22566#discussion_r1576674865


##
pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/ZKMetadataStore.java:
##
@@ -192,7 +192,20 @@ protected void batchOperation(List<MetadataOp> ops) {
                 Code code = Code.get(rc);
                 if (code == Code.CONNECTIONLOSS) {
                     // There is the chance that we caused a connection reset by sending or requesting a batch
-                    // that passed the max ZK limit. Retry with the individual operations
+                    // that passed the max ZK limit.
+
+                    // Build the log warning message
+                    // summarize the operations by type
+                    String countsByType = ops.stream().collect(
+                                    Collectors.groupingBy(MetadataOp::getType, Collectors.summingInt(op -> 1)))
+                            .entrySet().stream().map(e -> e.getValue() + " " + e.getKey().name() + " entries")
+                            .collect(Collectors.joining(", "));
+                    Long totalSize = ops.stream().collect(Collectors.summingLong(MetadataOp::size));
+                    log.warn("Connection loss while executing batch operation of {} "

Review Comment:
   Also, ideally it would be great if we could emit some metrics too.
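
   One possible shape for such metrics, as a sketch only: two in-process counters that the CONNECTIONLOSS branch could bump before retrying. The class and counter names are hypothetical, and wiring them into an actual metrics exporter is left out.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical counters for batches that failed with a connection loss.
class BatchFailureMetrics {
    private final LongAdder connectionLossBatches = new LongAdder();
    private final LongAdder retriedOps = new LongAdder();

    // Call once per failed batch, passing the number of operations
    // that will be retried one by one.
    void recordConnectionLoss(int opsInBatch) {
        connectionLossBatches.increment();
        retriedOps.add(opsInBatch);
    }

    long connectionLossBatches() {
        return connectionLossBatches.sum();
    }

    long retriedOps() {
        return retriedOps.sum();
    }

    public static void main(String[] args) {
        BatchFailureMetrics metrics = new BatchFailureMetrics();
        metrics.recordConnectionLoss(40);
        System.out.println(metrics.connectionLossBatches() + " failed batches, "
                + metrics.retriedOps() + " operations retried");
    }
}
```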






Re: [PR] [improve][meta] Log a warning when ZK batch fails with connectionloss [pulsar]

2024-04-26 Thread via GitHub


lhotari merged PR #22566:
URL: https://github.com/apache/pulsar/pull/22566

