Costas Zarifis created SPARK-48967:
--------------------------------------

             Summary: Improve performance and memory footprint of "INSERT INTO ... VALUES" statements
                 Key: SPARK-48967
                 URL: https://issues.apache.org/jira/browse/SPARK-48967
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.4.4
            Reporter: Costas Zarifis


Currently, very large "INSERT INTO ... VALUES" statements result in 
disproportionately large parse trees, because each literal must remain in the 
parse tree until it is eventually evaluated into a LocalTable, once the 
appropriate analyzer/optimizer rules have been applied.
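
To make the scale concrete, here is a small Python sketch of how a client might generate such a statement. The helper name make_insert and the table name t are hypothetical and are not part of Spark; the point is only that the number of literals the parser must keep grows with rows times columns.

```python
def make_insert(table, rows):
    """Render integer rows as one INSERT INTO ... VALUES statement (toy helper)."""
    values = ", ".join(
        "(" + ", ".join(str(int(v)) for v in row) + ")" for row in rows
    )
    return f"INSERT INTO {table} VALUES {values}"


def literal_count(rows):
    """Number of literals the statement contains: one per cell."""
    return sum(len(row) for row in rows)


rows = [(i, i * 2) for i in range(3)]
statement = make_insert("t", rows)
```

With 100,000 rows of 10 columns each, the same helper would produce a statement containing one million literals.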

 

This increases memory pressure on the driver when such large statements are 
generated, which can lead to OOMs and GC pauses. It also hurts runtime 
performance, since the time it takes to apply analyzer/optimizer rules is 
typically proportional to the size of the parse tree.

 

Both issues can be resolved by eagerly applying, from the AST Builder, the 
functions that evaluate the unresolved table into a local table, thus 
short-circuiting the evaluation of such statements.
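
The difference between the two approaches can be sketched with a toy model in Python. This is not Spark's parser or the proposed patch: the classes Literal, UnresolvedInlineTable and LocalTable and the functions build_lazy, build_eager and evaluate are simplified, hypothetical stand-ins used only to show how the tree size differs.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class Literal:
    """One parse-tree node per literal value (toy stand-in)."""
    value: Any


@dataclass
class UnresolvedInlineTable:
    """Keeps every literal as its own node until later rules evaluate it."""
    rows: List[List[Literal]]

    def node_count(self) -> int:
        # One node for the table itself plus one node per literal.
        return 1 + sum(len(row) for row in self.rows)


@dataclass
class LocalTable:
    """Already-evaluated rows stored as plain tuples under a single node."""
    rows: List[Tuple[Any, ...]]

    def node_count(self) -> int:
        return 1


def build_lazy(raw_rows):
    """Current behaviour in this toy model: literals stay in the tree."""
    return UnresolvedInlineTable([[Literal(v) for v in row] for row in raw_rows])


def build_eager(raw_rows):
    """Proposed behaviour in this toy model: evaluate each row while building."""
    return LocalTable([tuple(Literal(v).value for v in row) for row in raw_rows])


def evaluate(table):
    """Stand-in for the later rules that turn the unresolved table into rows."""
    if isinstance(table, LocalTable):
        return table
    return LocalTable([tuple(lit.value for lit in row) for row in table.rows])
```

In this model both paths yield the same rows, but the lazy tree carries one node per literal through every later rule, while the eager tree is a single node from the start.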



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
