Xi Lyu created SPARK-53525:
------------------------------
Summary: Spark Connect ArrowBatch Result Chunking
Key: SPARK-53525
URL: https://issues.apache.org/jira/browse/SPARK-53525
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 4.1.0
Reporter: Xi Lyu
Currently, we enforce gRPC message limits on both the client and the server.
These limits are largely meant to protect both sides from potential OOMs by
rejecting abnormally large messages. However, there are cases in which the
server incorrectly sends oversized messages that exceed these limits and cause
execution failures.
Specifically, the server-to-client large message issue addressed here comes
from the Arrow batch data in ExecutePlanResponse being too large. It occurs
when a single Arrow row exceeds the 128 MB message limit: Arrow cannot
partition the data any further, so it returns the single large row in one gRPC
message.
To improve Spark Connect stability, this PR implements chunking of large Arrow
batches when returning query results from the server to the client. Each
ExecutePlanResponse chunk stays within the size limit, and the chunks of a
batch are reassembled on the client before being parsed as an Arrow batch.
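The chunk-and-reassemble idea can be sketched roughly as follows. This is only
an illustration under stated assumptions, not the code in the PR: the Chunk
class, its field names (batch_id, index, num_chunks), and the split_batch and
reassemble helpers are all hypothetical, and the actual ExecutePlanResponse
fields may differ. The serialized Arrow batch is treated as an opaque byte
string.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class Chunk:
    """Hypothetical stand-in for one response message carrying part of a batch."""
    batch_id: int    # which serialized Arrow batch this chunk belongs to
    index: int       # position of this chunk within the batch
    num_chunks: int  # total number of chunks the batch was split into
    data: bytes      # slice of the serialized Arrow batch


def split_batch(batch_id: int, payload: bytes, max_chunk_size: int) -> Iterator[Chunk]:
    """Server side (sketch): split a serialized batch into size-bounded chunks."""
    if max_chunk_size <= 0:
        raise ValueError("max_chunk_size must be positive")
    # Ceiling division; an empty payload still produces one (empty) chunk.
    num_chunks = max(1, -(-len(payload) // max_chunk_size))
    for i in range(num_chunks):
        start = i * max_chunk_size
        yield Chunk(batch_id, i, num_chunks, payload[start:start + max_chunk_size])


def reassemble(chunks: List[Chunk]) -> bytes:
    """Client side (sketch): validate ordering and rebuild the original bytes."""
    if not chunks:
        raise ValueError("no chunks received")
    batch_id = chunks[0].batch_id
    expected = chunks[0].num_chunks
    if len(chunks) != expected:
        raise ValueError(f"expected {expected} chunks, got {len(chunks)}")
    for position, chunk in enumerate(chunks):
        if (chunk.batch_id != batch_id
                or chunk.index != position
                or chunk.num_chunks != expected):
            raise ValueError("chunk out of order or from a different batch")
    return b"".join(chunk.data for chunk in chunks)


if __name__ == "__main__":
    payload = bytes(range(256)) * 10  # 2560 bytes standing in for an Arrow batch
    chunks = list(split_batch(batch_id=7, payload=payload, max_chunk_size=1000))
    assert all(len(c.data) <= 1000 for c in chunks)
    assert reassemble(chunks) == payload
```

Carrying the chunk index and total count in every chunk lets the client detect
a missing or misordered chunk before it tries to parse the bytes as an Arrow
batch.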