geniusjoe opened a new pull request, #1462: URL: https://github.com/apache/pulsar-client-go/pull/1462
Fixes https://github.com/apache/pulsar-client-go/issues/1448 ### Motivation Refer to issue: > When we produce a message with payload > maxChunkSize * MaxPendingMessages, this single message will occupy the entire partition producer's p.publishSemaphore and cannot be released, causing the entire partition producer sending progress block forever. The primary reason why the Java SDK does not have a message size limit, is due to its different chunk message generation strategy: In Java, for each chunk split from the original message, a semaphore for one message is acquired and then the chunk is written to the pendingQueue for asynchronous sending. In Go, however, the system must wait until all semaphores for the current message are acquired before sending the entire batch to the pendingQueue. We can see `testBlockIfQueueFullWhenChunking()` describe this issue in Java code: https://github.com/apache/pulsar/blob/f0ec07b3d8c5cfe36942957fc0ad32e40d69320d/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ProducerImpl.java#L657 ```Java for (int chunkId = 0; chunkId < totalChunks; chunkId++) { ... // check pendingQueue permit if (chunkId > 0 && conf.isBlockIfQueueFull() && !canEnqueueRequest(callback, message.getSequenceId(), 0 /* The memory was already reserved */)) { ... return; } // send chunk message individually synchronized (this) { // Update the message metadata before computing the payload chunk size // to avoid a large message cannot be split into chunks. final long sequenceId = updateMessageMetadataSequenceId(msgMetadata); String uuid = totalChunks > 1 ? String.format("%s-%d", producerName, sequenceId) : null; serializeAndSendMessage(msg, payload, sequenceId, uuid, chunkId, totalChunks, readStartIndex, payloadChunkSize, compressedPayload, compressed, compressedPayload.readableBytes(), callback, chunkedMessageCtx, messageId); readStartIndex = ((chunkId + 1) * payloadChunkSize); } } ``` ### Modifications 1. Added validation for message payload size and pendingQueue in `pulsar/producer_partition.go#updateChunkInfo`. However, the current bugfix does not resolve the potential deadlock issue caused by multiple chunk messages refer to https://github.com/apache/pulsar/blob/f0ec07b3d8c5cfe36942957fc0ad32e40d69320d/pulsar-broker/src/test/java/org/apache/pulsar/client/impl/MessageChunkingTest.java#L706 2. Added cleanup handling when chunk messages occupy a portion of the semaphore, fixing the issue where the semaphore is not released after destruction in `sendRequest.done()` ### Verifying this change - [x] Make sure that the change passes the CI checks. This change added tests and can be verified as follows: `TestChunkBlockIfQueueFullWithoutTimeout` `TestSemaphoreStateWithChunkAndTimeout` ### Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API: (no) - The schema: (no) - The default values of configurations: (no) - The wire protocol: (no) ### Documentation - Does this pull request introduce a new feature? (no) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
