Alex Khakhlyuk created SPARK-55047:
--------------------------------------

             Summary: [CONNECT] Add client-side limit for local relation size
                 Key: SPARK-55047
                 URL: https://issues.apache.org/jira/browse/SPARK-55047
             Project: Spark
          Issue Type: Bug
          Components: Connect
    Affects Versions: 4.1.1
            Reporter: Alex Khakhlyuk


Currently, local relation sizes are limited via the 
{{spark.sql.session.localRelationSizeLimit}} conf (3 GB by default).
This limit is only checked on the server. The client can upload arbitrarily 
large relations, e.g. 100 GB, as artifacts; the server stores them, tries to 
materialize the local relation on the driver, and only then throws an error if 
the relation exceeds the limit.
This is bad for several reasons:
 # The driver has to store an arbitrary amount of data and can run out of 
memory or disk space, causing driver failure.
 # The client wastes a lot of time uploading the data to the server only to 
fail later, which is bad UX.

This should be solved by enforcing the limit on the client.

If the client sees that the local relation it serializes exceeds the limit, it 
will throw an {{AnalysisException}} with error class 
{{LOCAL_RELATION_SIZE_LIMIT_EXCEEDED}} and SQL state {{54000}} (program limit 
exceeded).
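A minimal sketch of what such a client-side check could look like. The object and method names below ({{LocalRelationSizeCheck}}, {{checkSize}}) are illustrative assumptions, not the actual Spark Connect client API; the real implementation would raise {{AnalysisException}} through Spark's error framework.

```scala
// Sketch of a client-side size check run before any artifact upload.
// Names are hypothetical; only the limit default and error class/SQLSTATE
// come from the issue description.
object LocalRelationSizeCheck {
  // Mirrors the 3 GB default of spark.sql.session.localRelationSizeLimit.
  val DefaultLimitBytes: Long = 3L * 1024 * 1024 * 1024

  // Stand-in for AnalysisException with error class
  // LOCAL_RELATION_SIZE_LIMIT_EXCEEDED and SQLSTATE 54000.
  final case class LocalRelationSizeLimitExceeded(size: Long, limit: Long)
      extends Exception(
        s"[LOCAL_RELATION_SIZE_LIMIT_EXCEEDED] Serialized local relation is " +
          s"$size bytes, exceeding the limit of $limit bytes. SQLSTATE: 54000")

  // Fail fast on the client if the serialized relation is too large,
  // instead of uploading it and letting the driver materialize it.
  def checkSize(serializedBytes: Long, limit: Long = DefaultLimitBytes): Unit =
    if (serializedBytes > limit) {
      throw LocalRelationSizeLimitExceeded(serializedBytes, limit)
    }
}
```

Running the check before uploading avoids both failure modes above: the driver never receives oversized data, and the client fails immediately instead of after a long upload.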



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
