BewareMyPower commented on PR #1494:
URL: 
https://github.com/apache/pulsar-client-go/pull/1494#issuecomment-4442066016

   The failed blue-green migration is not caused by this PR.
   
   ----
   
   Based on my analysis, **the test failure is NOT caused by the commits since 
master**. Here's why:
   
   **Changes since master:**
   All 8 commits only modify `pulsar/consumer_impl.go` — a pure refactoring of 
the `consumer` struct's `consumers` slice access for thread safety (replacing 
direct indexing with `findPartitionConsumer`/`unsafeFindPartitionConsumer`, 
adding a snapshot in `hasNext()`). The `sync.Mutex` was already embedded before 
these changes.
   
   **Test failure root cause:**
   The failure occurs in `consumer_partition.go` (unchanged by these commits). 
The `proxyConnection` sub-test times out because during topic migration 
reconnection:
   
   1. `grabConn` sends a Subscribe request after getting a connection from the 
pool
   2. The request times out (line 81: `request timed out` at 
`consumer_partition.go:2244`)
   3. The timeout handler tries to close the consumer via `RequestOnCnx` on the 
same connection (`consumer_partition.go:2251-2256`), but this also likely fails 
silently
   4. On subsequent retries, `GetConnection` returns the same pooled connection 
where the broker still has consumerID=1 registered from the timed-out 
subscribe, resulting in: `UnknownError: Consumer that failed is already present 
on the connection`
   
   This is a pre-existing race condition in the reconnection logic — the 
consumer doesn't properly handle the case where a Subscribe request reaches the 
broker but the response doesn't reach the client.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to