[ https://issues.apache.org/jira/browse/SPARK-30732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17031996#comment-17031996 ]
L. C. Hsieh commented on SPARK-30732:
-------------------------------------

`getByteArrayRdd` is not used only there. And, as the comment says, UnsafeRow is highly compressible, so disabling compression there does not sound like a good idea. I think `spark.broadcast.compress` provides an option to disable compression because you might have objects in an RDD that are hardly compressible. The purpose of `getByteArrayRdd`, however, is to collect highly compressible rows back to the driver.

> BroadcastExchangeExec does not fully honor "spark.broadcast.compress"
> ---------------------------------------------------------------------
>
>                 Key: SPARK-30732
>                 URL: https://issues.apache.org/jira/browse/SPARK-30732
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Puneet
>            Priority: Major
>
> Setting {{spark.broadcast.compress}} to false disables compression while
> sending a broadcast variable to the executors
> ([https://spark.apache.org/docs/latest/configuration.html#compression-and-serialization]).
> However, this does not disable compression for any child relations sent by the
> executors to the driver.
> Setting {{spark.broadcast.compress}} to false should disable both sides of the
> traffic, allowing users to disable compression for the whole broadcast join,
> for example.
> [https://github.com/puneetguptanitj/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala#L89]
> Here `executeCollectIterator` calls `getByteArrayRdd`, which by default
> always gets a compressed stream.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
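The trade-off in the comment above can be illustrated with a small, hypothetical sketch: fixed-layout row data with many repeated bytes (similar in spirit to UnsafeRow's fixed binary format) compresses dramatically, while arbitrary high-entropy payloads barely compress at all. This uses Python's zlib purely as a stand-in codec; Spark's defaults (e.g. lz4) differ, and the byte layouts here are invented for illustration, not taken from UnsafeRow.

```python
import random
import zlib

# Row-like data: 1000 fixed-width records whose fields cycle through a
# handful of small values, leaving long runs of repeated/zero bytes --
# loosely analogous to collecting many UnsafeRows.
rows = b"".join((i % 10).to_bytes(8, "little") * 4 for i in range(1000))

# Arbitrary object data: pseudo-random bytes of the same total size,
# standing in for an RDD of objects that are hardly compressible.
random.seed(0)
blob = bytes(random.randrange(256) for _ in range(len(rows)))

rows_ratio = len(zlib.compress(rows)) / len(rows)
blob_ratio = len(zlib.compress(blob)) / len(blob)

# The repetitive row-like data shrinks to a small fraction of its size;
# the random data stays at (or slightly above) its original size.
print(f"row-like data: compressed to {rows_ratio:.0%} of original")
print(f"random data:   compressed to {blob_ratio:.0%} of original")
```

This is why a single `spark.broadcast.compress` switch is a blunt instrument for the collect path in `getByteArrayRdd`: the rows being collected there are exactly the kind of data for which compression pays off.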