[ https://issues.apache.org/jira/browse/SPARK-47404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-47404:
-----------------------------------
    Labels: pull-request-available  (was: )

> Add hooks to release the ANTLR DFA cache after parsing SQL
> ----------------------------------------------------------
>
>                 Key: SPARK-47404
>                 URL: https://issues.apache.org/jira/browse/SPARK-47404
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Mark Jarvin
>            Priority: Major
>              Labels: pull-request-available
>
> ANTLR builds a DFA cache while parsing to speed up parsing of similar future
> inputs. However, this cache is never cleared and can only grow. Extremely
> large SQL inputs can lead to very large DFA caches (>20GiB in one extreme
> case I've seen).
>
> Spark’s ANTLR SQL parser is derived from the Presto ANTLR SQL Parser, and
> Presto has added hooks to be able to clear this DFA cache. I think Spark
> should have similar hooks.
>
> References:
> * [https://github.com/antlr/antlr4/blob/f08a19bbb202b02a521f84d99e661e386bea8625/runtime/Java/src/org/antlr/v4/runtime/atn/ParserATNSimulator.java#L163-L171]
> * [https://stackoverflow.com/questions/28017135/why-antlr4-parsers-accumulates-atnconfig-objects?rq=2]
> * [https://github.com/antlr/antlr4/issues/499]
> * [https://github.com/trinodb/trino/pull/3186/files#diff-75b81ed5837578d1af42fcc91e4094a247138e5da6edb9d9e4b67d53247b8ca9]
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
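A toy Python model of the growth pattern described in the issue, plus an opt-in release hook, may help make the proposal concrete. Everything in it is hypothetical: `ToyParser`, its prefix-keyed cache, and `release_dfa_cache_after_parse` are illustrative stand-ins, not ANTLR, Spark, Presto, or Trino APIs. The class-level dict mimics the way ANTLR's generated Java parsers share their DFA array through a static field, so entries accumulate across parser instances until something explicitly clears them.

```python
from contextlib import contextmanager


class ToyParser:
    """Hypothetical stand-in for an ANTLR-generated parser; not the ANTLR API."""

    # Class-level, so every instance shares it (like the static DFA array in
    # ANTLR's generated Java parsers). Nothing ever evicts entries.
    _dfa_cache: dict = {}

    def parse(self, sql: str) -> list:
        tokens = sql.split()
        # Toy "prediction" memo: one entry per distinct lookahead prefix seen.
        for i in range(len(tokens)):
            prefix = tuple(tokens[: i + 1])
            ToyParser._dfa_cache.setdefault(prefix, len(prefix))
        return tokens

    @classmethod
    def cache_size(cls) -> int:
        return len(cls._dfa_cache)

    @classmethod
    def clear_dfa(cls) -> None:
        # Loosely analogous to clearDFA() on ANTLR's Java ParserATNSimulator:
        # throw away every cached state and start again from an empty cache.
        cls._dfa_cache = {}


@contextmanager
def release_dfa_cache_after_parse(enabled: bool = True):
    """Hypothetical hook: clear the shared cache when the wrapped block exits."""
    try:
        yield
    finally:
        if enabled:
            ToyParser.clear_dfa()


if __name__ == "__main__":
    huge_query = " ".join(f"c{i}" for i in range(1000))

    ToyParser().parse(huge_query)
    print(ToyParser.cache_size())  # 1000: the entries outlive the parse

    with release_dfa_cache_after_parse():
        ToyParser().parse(huge_query)
    print(ToyParser.cache_size())  # 0: the hook released the cache
```

Clearing after every parse gives up the speed-up for later similar inputs, which is presumably why a hook like this would be opt-in (for example, only after unusually large statements) rather than always on.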