[ https://issues.apache.org/jira/browse/SPARK-47404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-47404:
-----------------------------------
    Labels: pull-request-available  (was: )

> Add hooks to release the ANTLR DFA cache after parsing SQL
> ----------------------------------------------------------
>
>                 Key: SPARK-47404
>                 URL: https://issues.apache.org/jira/browse/SPARK-47404
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Mark Jarvin
>            Priority: Major
>              Labels: pull-request-available
>
> ANTLR builds a DFA cache while parsing to speed up parsing of similar future 
> inputs. However, this cache is never cleared and can only grow. Extremely 
> large SQL inputs can lead to very large DFA caches (>20GiB in one extreme 
> case I've seen).
> Spark’s ANTLR SQL parser is derived from the Presto ANTLR SQL Parser, and 
> Presto has added hooks to be able to clear this DFA cache. I think Spark 
> should have similar hooks.
> References:
>  * [https://github.com/antlr/antlr4/blob/f08a19bbb202b02a521f84d99e661e386bea8625/runtime/Java/src/org/antlr/v4/runtime/atn/ParserATNSimulator.java#L163-L171]
>  * [https://stackoverflow.com/questions/28017135/why-antlr4-parsers-accumulates-atnconfig-objects?rq=2]
>  * [https://github.com/antlr/antlr4/issues/499]
>  * [https://github.com/trinodb/trino/pull/3186/files#diff-75b81ed5837578d1af42fcc91e4094a247138e5da6edb9d9e4b67d53247b8ca9]
>  
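The post-parse hook proposed in the issue above can be sketched in Java. This is a self-contained toy model, not Spark's, Presto's, or Trino's actual code. SharedDfaCache, PostParseHook, ClearWhenLargeHook, SqlParserSketch and DfaCacheHookDemo are hypothetical names, and the "clear once the cache exceeds a threshold" policy is an assumption. In the real ANTLR Java runtime, the linked ParserATNSimulator lines appear to be its clearDFA() method, which resets every per-decision DFA and would play the role of clear() here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy stand-in for ANTLR's DFA cache: entries are added while
// parsing and, like the real cache, nothing ever evicts them on its own.
final class SharedDfaCache {
    private final Map<String, Integer> states = new HashMap<>();

    void remember(String key) {
        states.putIfAbsent(key, states.size());
    }

    int size() {
        return states.size();
    }

    // Plays the role of ParserATNSimulator.clearDFA(): forget everything learned so far.
    void clear() {
        states.clear();
    }
}

// Hypothetical hook interface, called once after every parse.
@FunctionalInterface
interface PostParseHook {
    void afterParse(SharedDfaCache cache);
}

// Example policy (an assumption): clear once the cache holds more than maxStates entries.
final class ClearWhenLargeHook implements PostParseHook {
    private final int maxStates;

    ClearWhenLargeHook(int maxStates) {
        this.maxStates = maxStates;
    }

    @Override
    public void afterParse(SharedDfaCache cache) {
        if (cache.size() > maxStates) {
            cache.clear();
        }
    }
}

// Hypothetical parser wrapper. "Parsing" just records one cache entry per
// (position, token) pair, so the cache grows with input size and variety.
final class SqlParserSketch {
    private final SharedDfaCache cache;
    private final PostParseHook hook;

    SqlParserSketch(SharedDfaCache cache, PostParseHook hook) {
        this.cache = cache;
        this.hook = hook;
    }

    int parse(String sql) {
        String[] tokens = sql.trim().split("\\s+");
        try {
            for (int i = 0; i < tokens.length; i++) {
                cache.remember(i + ":" + tokens[i]);
            }
            return tokens.length;
        } finally {
            // Run the hook even if parsing throws, so a failed parse of a huge
            // input does not leave a huge cache behind.
            hook.afterParse(cache);
        }
    }
}

final class DfaCacheHookDemo {
    public static void main(String[] args) {
        SharedDfaCache cache = new SharedDfaCache();
        SqlParserSketch parser = new SqlParserSketch(cache, new ClearWhenLargeHook(3));
        parser.parse("SELECT a FROM t"); // adds 4 entries (> 3), so the hook clears the cache
        System.out.println(cache.size());
        parser.parse("SELECT 1");        // adds 2 entries (<= 3), so they are kept
        System.out.println(cache.size());
    }
}
```

Running DfaCacheHookDemo should print 0 and then 2. Calling the hook from a finally block is a design choice of this sketch: it keeps cache release independent of whether the parse succeeded, which matters most for the very large inputs the issue describes.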



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
