BewareMyPower commented on PR #1494: URL: https://github.com/apache/pulsar-client-go/pull/1494#issuecomment-4442066016
The failed blue-green migration is not caused by this PR. ---- Based on my analysis, **the test failure is NOT caused by the commits since master**. Here's why: **Changes since master:** All 8 commits only modify `pulsar/consumer_impl.go` — a pure refactoring of the `consumer` struct's `consumers` slice access for thread safety (replacing direct indexing with `findPartitionConsumer`/`unsafeFindPartitionConsumer`, adding a snapshot in `hasNext()`). The `sync.Mutex` was already embedded before these changes. **Test failure root cause:** The failure occurs in `consumer_partition.go` (unchanged by these commits). The `proxyConnection` sub-test times out because during topic migration reconnection: 1. `grabConn` sends a Subscribe request after getting a connection from the pool 2. The request times out (line 81: `request timed out` at `consumer_partition.go:2244`) 3. The timeout handler tries to close the consumer via `RequestOnCnx` on the same connection (`consumer_partition.go:2251-2256`), but this also likely fails silently 4. On subsequent retries, `GetConnection` returns the same pooled connection where the broker still has consumerID=1 registered from the timed-out subscribe, resulting in: `UnknownError: Consumer that failed is already present on the connection` This is a pre-existing race condition in the reconnection logic — the consumer doesn't properly handle the case where a Subscribe request reaches the broker but the response doesn't reach the client. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
