[jira] [Updated] (SPARK-7970) Optimize code for SQL queries fired on Union of RDDs (closure cleaner)

Andrew Or (JIRA) Thu, 29 Oct 2015 04:47:57 -0700

     [ 
https://issues.apache.org/jira/browse/SPARK-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Andrew Or updated SPARK-7970:
-----------------------------
            Assignee: Nitin Goyal
    Target Version/s: 1.6.0

> Optimize code for SQL queries fired on Union of RDDs (closure cleaner)
> ----------------------------------------------------------------------
>
>                 Key: SPARK-7970
>                 URL: https://issues.apache.org/jira/browse/SPARK-7970
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, SQL
>    Affects Versions: 1.2.0, 1.3.0
>            Reporter: Nitin Goyal
>            Assignee: Nitin Goyal
>         Attachments: Screen Shot 2015-05-27 at 11.01.03 pm.png, Screen Shot 
> 2015-05-27 at 11.07.02 pm.png
>
>
> The closure cleaner slows down the execution of Spark SQL queries fired on a 
> union of RDDs. The time spent on the driver side increases linearly with the 
> number of RDDs unioned. Refer to the following thread for more context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/ClosureCleaner-slowing-down-Spark-SQL-queries-tt12466.html
> As can be seen in the attached JProfiler screenshots, a lot of time is 
> consumed in the "getClassReader" method of ClosureCleaner and the rest in 
> "ensureSerializable" (at least in my case).
> This can be fixed in two ways (as per my current understanding):
> 1. Fix at the Spark SQL level - As pointed out by yhuai, we can create 
> MapPartitionsRDD directly instead of calling rdd.mapPartitions, which calls 
> the ClosureCleaner clean method (see PR - 
> https://github.com/apache/spark/pull/6256).
> 2. Fix at Spark core level -
>   (i) Make "checkSerializable" property driven in SparkContext's clean method
>   (ii) Somehow cache classreader for last 'n' classes
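
Option 2(ii) above amounts to a small least-recently-used cache placed in front of an expensive lookup. The following is a minimal sketch of that idea only, written in Python for brevity; it is not Spark code. The names `LruCache` and `load_class_reader` are hypothetical, and the loader is a cheap stand-in for whatever "getClassReader" does when it reads class bytecode.

```python
from collections import OrderedDict


class LruCache:
    """Keeps the results for the last `capacity` distinct keys requested."""

    def __init__(self, capacity, loader):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._loader = loader          # called only on a cache miss
        self._entries = OrderedDict()  # ordered oldest to most recently used
        self.loads = 0                 # number of times the loader ran

    def get(self, key):
        if key in self._entries:
            # Cache hit: mark the key as most recently used.
            self._entries.move_to_end(key)
            return self._entries[key]
        # Cache miss: do the expensive work once and remember the result.
        value = self._loader(key)
        self.loads += 1
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)
        return value


def load_class_reader(class_name):
    """Hypothetical stand-in for an expensive per-class bytecode read."""
    return class_name.encode("utf-8")


if __name__ == "__main__":
    cache = LruCache(capacity=2, loader=load_class_reader)
    for name in ["A", "A", "B", "A", "C", "B"]:
        cache.get(name)
    print("loader calls:", cache.loads)
```

When many unioned RDDs share the same closure classes, most lookups would hit the cache, so the per-RDD cost on the driver would fall to a dictionary lookup. Whether this helps in practice depends on how many distinct classes are involved relative to 'n'.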



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
