[ https://issues.apache.org/jira/browse/SPARK-30732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17031996#comment-17031996 ]
L. C. Hsieh commented on SPARK-30732:
-------------------------------------

`getByteArrayRdd` is not used only there. And, as the comment says, UnsafeRow is highly compressible, so disabling compression there does not sound like a good idea. I think `spark.broadcast.compress` provides an option to disable compression because you might have objects in an RDD that are hardly compressible. The purpose of `getByteArrayRdd`, however, is to collect highly compressible rows back to the driver.

> BroadcastExchangeExec does not fully honor "spark.broadcast.compress"
> ---------------------------------------------------------------------
>
>                 Key: SPARK-30732
>                 URL: https://issues.apache.org/jira/browse/SPARK-30732
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Puneet
>            Priority: Major
>
> Setting {{spark.broadcast.compress}} to false disables compression while
> sending a broadcast variable to the executors
> ([https://spark.apache.org/docs/latest/configuration.html#compression-and-serialization]).
> However, this does not disable compression for any child relations sent by the
> executors to the driver.
> Setting {{spark.broadcast.compress}} to false should disable both sides of the
> traffic, allowing users to disable compression for the whole broadcast join,
> for example.
> [https://github.com/puneetguptanitj/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala#L89]
> Here `executeCollectIterator` calls `getByteArrayRdd`, which by default
> always gets a compressed stream.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
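The trade-off in the comment above can be illustrated with a small, hypothetical sketch: fixed-layout row data with many repeated bytes (similar in spirit to UnsafeRow's fixed binary format) compresses dramatically, while arbitrary high-entropy payloads barely compress at all. This uses Python's zlib purely as a stand-in codec; Spark's defaults (e.g. lz4) differ, and the byte layouts here are invented for illustration, not taken from UnsafeRow.

```python
import random
import zlib

# Row-like data: 1000 fixed-width records whose fields cycle through a
# handful of small values, leaving long runs of repeated/zero bytes --
# loosely analogous to collecting many UnsafeRows.
rows = b"".join((i % 10).to_bytes(8, "little") * 4 for i in range(1000))

# Arbitrary object data: pseudo-random bytes of the same total size,
# standing in for an RDD of objects that are hardly compressible.
random.seed(0)
blob = bytes(random.randrange(256) for _ in range(len(rows)))

rows_ratio = len(zlib.compress(rows)) / len(rows)
blob_ratio = len(zlib.compress(blob)) / len(blob)

# The repetitive row-like data shrinks to a small fraction of its size;
# the random data stays at (or slightly above) its original size.
print(f"row-like data: compressed to {rows_ratio:.0%} of original")
print(f"random data:   compressed to {blob_ratio:.0%} of original")
```

This is why a single `spark.broadcast.compress` switch is a blunt instrument for the collect path in `getByteArrayRdd`: the rows being collected there are exactly the kind of data for which compression pays off.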