vinkal-chudgar commented on PR #24962:
URL: https://github.com/apache/pulsar/pull/24962#issuecomment-3513194447

   > > `BinaryProtoLookupService#getTopicsUnderNamespace` currently does not 
deduplicate concurrent requests with identical parameters (`namespace`, `mode`, 
`topicsPattern`, `topicsHash`). For each call, it creates a new 
`CompletableFuture` and immediately invokes the helper to obtain a connection 
and send a request, which can cause request amplification when multiple 
components query the same namespace simultaneously.
   > 
   > @vinkal-chudgar This change itself is reasonable, but I'm just wondering 
if this PR is addressing a real problem. How common would it be to have 
multiple consumers with exactly the same topics pattern using the same Pulsar 
client instance? Do you have this use case your self?
   
   To the best of my knowledge, yes, this PR addresses a real case that occurs 
within a single `PulsarClient` instance. Please correct me if you think 
otherwise.
   
   ### 1. Is this a real problem? 
   
   The client currently creates a new future and starts a new request for every 
call to `getTopicsUnderNamespace(...)` even when the inputs are identical. In 
contrast, `getBroker(...)` and `getPartitionedTopicMetadata(...)` already 
coalesce identical in-flight calls. Pattern topic discovery uses 
`getTopicsUnderNamespace(...)` both during initial subscribe and during 
periodic rechecks, so identical calls can happen at the same time within the 
same `PulsarClient / BinaryProtoLookupService` instance. In that window the 
client issues duplicate requests that this PR removes.
   
   ### 2. How common is “same pattern, same client”?
   It depends on the application, but it is not unusual. Pulsar encourages 
reusing one `PulsarClient` per process. It is common to have more than one 
pattern consumer in the same process for the same namespace and regex but with 
different subscriptions or modules. There are two concrete moments where 
identical calls line up within the same `PulsarClient / 
BinaryProtoLookupService instance`:
   
   - Initial subscribe with a topics pattern: 
`PulsarClientImpl#subscribeAsync(...) `routes to 
`patternTopicSubscribeAsync(...)` when `topicsPattern` is set. That method 
calls `lookup.getTopicsUnderNamespace(namespaceName, subscriptionMode, regex, 
null)` to seed the topic set.
   
   - Periodic rechecks for pattern subscriptions: 
`PatternMultiTopicsConsumerImpl` also calls 
`lookup.getTopicsUnderNamespace(...)` during its scheduled rechecks. When 
multiple pattern consumers were created around the same time and share the same 
(`namespace`, `mode`, `topicsPattern`, `topicsHash`), these rechecks can line 
up and issue the same call concurrently.
   
   Even if this is not constant traffic, when it happens it amplifies requests. 
The fix is small and keeps behavior consistent with the other two lookup paths.
   
   ### 3. Do I have this use case myself?
   No, I do not have a personal production deployment that hits this case. I 
did see it surface in CI: while working on unrelated changes, a CI run failed 
on `PatternConsumerBackPressureTest.testInfiniteGetThousandsTopics` about 7 to 
8 days ago. That test drives many concurrent `getTopicsUnderNamespace(...)` 
calls on the same client with the same inputs. It highlighted that this path 
does not coalesce identical in-flight calls today.
   
   P.S. I have created Issue #24964 with all the required details 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to