lin-sh opened a new issue, #1497: URL: https://github.com/apache/pulsar-client-go/issues/1497
### Search before asking

- [x] I searched in the [issues](https://github.com/apache/pulsar-client-go/issues) and found no similar issues.

### Version

v0.17.0

### Minimal reproduction steps

1. Create an async producer with `SendTimeout` configured
2. Trigger sustained broker unavailability (all brokers return timeout for sends)
3. Wait for the producer to attempt reconnection (`newPartitionProducer`)
4. During reconnection, `FailTimeoutMessages` is called to clean up pending messages
5. Process panics with nil pointer dereference in `sendRequest.done()`

### Expected behavior

`FailTimeoutMessages` should safely fail all pending messages without panicking, even when `MessageID` is nil.

### Actual behavior

The process panics with:

```
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x19df0ef]

goroutine 1079 [running]:
github.com/apache/pulsar-client-go/pulsar.(*sendRequest).done(0xc0a657e2c0, {0x0, 0x0}, {0x248fea0, 0xc000736750})
	/root/go/pkg/mod/github.com/apache/pulsar-client-go@v0.17.0/pulsar/producer_partition.go:1656 +0x1cf
github.com/apache/pulsar-client-go/pulsar.(*partitionProducer).FailTimeoutMessages(0xc005436000)
	/root/go/pkg/mod/github.com/apache/pulsar-client-go@v0.17.0/pulsar/producer_partition.go:1006 +0x4ea
github.com/apache/pulsar-client-go/pulsar.newPartitionProducer in goroutine 724
	/root/go/pkg/mod/github.com/apache/pulsar-client-go@v0.17.0/pulsar/producer_partition.go:239 +0xa3b
```

### Analysis

- The panic occurs inside `sendRequest.done()` at line 1656 of `producer_partition.go`
- `MessageID` passed to `done()` is nil (`{0x0, 0x0}` in the stack trace)
- The nil dereference is at offset `0x28` (40 bytes), suggesting a field access on the nil MessageID
- This is triggered during producer reconnection when `newPartitionProducer` (line 239) calls `FailTimeoutMessages` (line 1006) to clean up timed-out pending messages
- The user callback is never reached: the panic happens inside `done()` before the callback is invoked

### Context

- We are aware of PR #1121, which normalized sendRequest resource release into `sr.done()`. That fix is included in v0.17.0, but the panic still occurs.
- The issue happens under heavy timeout load (multiple brokers simultaneously returning send timeout errors)
- This crash kills the entire Go process, since the panic occurs in an internal library goroutine that we cannot recover from

### Suggested fix

Add a nil check for `MessageID` inside `sendRequest.done()` before accessing any of its fields, or ensure `FailTimeoutMessages` passes a non-nil sentinel MessageID when failing messages.

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!
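To illustrate the first suggested fix (a nil check before touching `MessageID`), here is a minimal, self-contained sketch. It does not reproduce the real `producer_partition.go`: the `MessageID` interface, `simpleID`, `describe`, and the simplified `sendRequest` below are hypothetical stand-ins that only show the shape of the guard.

```go
package main

import (
	"errors"
	"fmt"
)

// MessageID is a hypothetical stand-in for an interface-typed message ID.
type MessageID interface {
	LedgerID() int64
	EntryID() int64
}

// simpleID is a hypothetical concrete ID used only for this illustration.
type simpleID struct{ ledger, entry int64 }

func (s simpleID) LedgerID() int64 { return s.ledger }
func (s simpleID) EntryID() int64  { return s.entry }

// sendRequest is a heavily simplified, hypothetical request type.
type sendRequest struct {
	callback func(id MessageID, err error)
}

// errSendTimeout is a hypothetical error value for the timeout path.
var errSendTimeout = errors.New("send timeout")

// describe formats id for logging. Calling a method on a nil interface
// value panics, so the nil case is handled before any method call.
func describe(id MessageID) string {
	if id == nil {
		return "<no message ID>"
	}
	return fmt.Sprintf("%d:%d", id.LedgerID(), id.EntryID())
}

// done completes the request. It tolerates a nil id, which is what a
// timeout path would pass when the message was never acknowledged.
func (sr *sendRequest) done(id MessageID, err error) string {
	summary := describe(id)
	if sr.callback != nil {
		sr.callback(id, err) // the callback now runs even when id is nil
	}
	return summary
}

func main() {
	called := false
	sr := &sendRequest{callback: func(id MessageID, err error) {
		called = true
	}}
	summary := sr.done(nil, errSendTimeout)
	fmt.Println(summary, called) // prints "<no message ID> true"
}
```

With a guard of this kind, the failure path reaches the user callback with a nil ID and the timeout error instead of crashing the process. Whether the real `done()` should guard internally or `FailTimeoutMessages` should pass a sentinel is a design choice for the maintainers.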
