[ https://issues.apache.org/jira/browse/HUDI-5080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
sivabalan narayanan updated HUDI-5080:
--------------------------------------
    Sprint: 2022/10/18, 2022/11/01, 2022/11/15, 2022/11/29, 2022/12/12 (was: 2022/10/18, 2022/11/01, 2022/11/15, 2022/11/29, 2022/12/12, 0.13.0 Final Sprint)

> UnpersistRdds unpersist all rdds in the spark context
> -----------------------------------------------------
>
>                 Key: HUDI-5080
>                 URL: https://issues.apache.org/jira/browse/HUDI-5080
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: writer-core
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.13.0
>
>
> In SparkRDDWriteClient, we have a method that cleans up persisted RDDs to
> free the space they occupy.
> [https://github.com/apache/hudi/blob/b78c3441c4e28200abec340eaff852375764cbdb/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/SparkRDDWriteClient.java#L584]
> The issue is that it unpersists all persisted RDDs in the given Spark
> context, not only the ones this client persisted. This affects async
> compaction and any other async table services running in the same context,
> because their cached RDDs are dropped and must be recomputed. When multiple
> streams write to different tables from the same Spark context, the impact
> is even larger.
>
> The same problem also needs to be fixed in DeltaSync.
> [https://github.com/apache/hudi/blob/b78c3441c4e28200abec340eaff852375764cbdb/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L345]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
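The difference between context-wide cleanup and cleanup scoped to one client can be sketched in plain Java. This is a hypothetical, Spark-free model, not Hudi's actual fix: `FakeSparkContext` stands in for the context-wide persistent-RDD map (in Spark that map is exposed via `JavaSparkContext.getPersistentRDDs()`), and `WriteClient`, `releaseAllInContext` and `releaseOwned` are illustrative names only.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the HUDI-5080 problem; no Spark or Hudi classes are used.
public class ScopedUnpersist {

  /** Toy stand-in for the shared context's persistent-RDD map (id -> name). */
  static final class FakeSparkContext {
    private final Map<Integer, String> persistentRdds = new LinkedHashMap<>();
    private int nextId = 0;

    int persist(String name) {
      int id = nextId++;
      persistentRdds.put(id, name);
      return id;
    }

    void unpersist(int id) {
      persistentRdds.remove(id);
    }

    Map<Integer, String> getPersistentRdds() {
      return persistentRdds; // returned live for brevity
    }
  }

  /** Hypothetical write client that remembers which RDD ids it persisted. */
  static final class WriteClient {
    private final FakeSparkContext ctx;
    private final Set<Integer> ownedRddIds = new HashSet<>();

    WriteClient(FakeSparkContext ctx) {
      this.ctx = ctx;
    }

    void persist(String name) {
      ownedRddIds.add(ctx.persist(name));
    }

    /** Behaviour described in the issue: frees every persisted RDD in the context. */
    void releaseAllInContext() {
      // Copy the key set so we do not mutate the map while iterating over it.
      for (Integer id : new HashSet<>(ctx.getPersistentRdds().keySet())) {
        ctx.unpersist(id);
      }
      ownedRddIds.clear();
    }

    /** Scoped behaviour: frees only the RDDs this client persisted itself. */
    void releaseOwned() {
      for (Integer id : ownedRddIds) {
        ctx.unpersist(id);
      }
      ownedRddIds.clear();
    }
  }

  public static void main(String[] args) {
    FakeSparkContext ctx = new FakeSparkContext();
    WriteClient ingest = new WriteClient(ctx);
    WriteClient compaction = new WriteClient(ctx); // e.g. an async table service
    ingest.persist("ingest-records");
    compaction.persist("compaction-plan");

    ingest.releaseOwned();
    System.out.println(ctx.getPersistentRdds().values()); // prints [compaction-plan]
  }
}
```

Scoping cleanup by ownership means a second writer or an async table service sharing the same Spark context keeps its cached data. How Hudi actually records which RDDs belong to a given write is not shown here.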