Costas Zarifis created SPARK-48967:
--------------------------------------

             Summary: Improve performance and memory footprint of "INSERT INTO ... VALUES" Statements
                 Key: SPARK-48967
                 URL: https://issues.apache.org/jira/browse/SPARK-48967
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.4.4
            Reporter: Costas Zarifis

Currently, very large "INSERT INTO ... VALUES" statements produce disproportionately large parse trees, because every literal must remain in the parse tree until it is eventually evaluated into a LocalTable, once the appropriate analyzer/optimizer rules have been applied. When such large statements are generated, this increases memory pressure on the driver and can lead to OOMs and GC pauses. It also results in suboptimal runtime performance, because the time it takes to apply analyzer/optimizer rules is typically proportional to the size of the parse tree.

Both issues can be resolved by eagerly applying, from the AST Builder, the functions that evaluate the unresolved table into a local table, thus short-circuiting the evaluation of such statements.
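
The proposal above can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark code: the classes Literal, InlineTable and LocalTable and the functions build_lazy, build_eager and evaluate are hypothetical stand-ins invented for this example, and node_count is a rough proxy for how much tree the analyzer/optimizer rules would have to traverse.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence, Tuple


@dataclass
class Literal:
    """Hypothetical parse-tree node wrapping a single literal value."""
    value: Any


@dataclass
class InlineTable:
    """Hypothetical lazy form: every literal stays in the tree as its own node."""
    names: List[str]
    rows: List[List[Literal]]

    def node_count(self) -> int:
        # One node for the table itself plus one node per literal.
        return 1 + sum(len(row) for row in self.rows)


@dataclass
class LocalTable:
    """Hypothetical eager form: one leaf node holding plain row tuples."""
    names: List[str]
    rows: List[Tuple[Any, ...]]

    def node_count(self) -> int:
        # The rows are plain data, not tree nodes.
        return 1


def build_lazy(names: List[str], raw_rows: Sequence[Sequence[Any]]) -> InlineTable:
    """Builder that defers evaluation: wraps each value in a Literal node."""
    return InlineTable(names, [[Literal(v) for v in row] for row in raw_rows])


def evaluate(table: InlineTable) -> LocalTable:
    """Stand-in for the later rule that turns the lazy form into a local table."""
    return LocalTable(table.names, [tuple(lit.value for lit in row) for row in table.rows])


def build_eager(names: List[str], raw_rows: Sequence[Sequence[Any]]) -> LocalTable:
    """Builder that evaluates rows immediately, so no Literal nodes are created."""
    rows = []
    for row in raw_rows:
        if len(row) != len(names):
            raise ValueError("row width does not match the number of columns")
        rows.append(tuple(row))
    return LocalTable(names, rows)
```

In this model both paths end with the same rows, but the eager builder never materializes the per-literal nodes, so the tree handed to later phases stays at a single node regardless of how many rows the statement contains. Whether a real implementation can evaluate every row this early (for example, rows containing non-literal expressions) is outside what this sketch covers.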