[ https://issues.apache.org/jira/browse/SPARK-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14957540#comment-14957540 ]

Glenn Strycker commented on SPARK-6235:
---------------------------------------

Until this issue and its sub-tickets are resolved, are there any known work-arounds? Should we increase the number of partitions, or decrease it? Split an RDD into parts, run the command on each, and then union the results? Turn off Kryo? Use DataFrames? Help!!

I am hitting the 2GB bug while simply trying to (re)partition by key an RDD of modest size (84GB) and, as far as I can tell, low skew. I have the memory settings per executor, per master node, per JVM, etc. cranked up as far as they will go, and I am currently attempting to partition this RDD across 6800 partitions. Unless my skew is really bad, I don't see why roughly 12MB per partition would cause a shuffle to hit the 2GB limit, unless the overhead of so many partitions is actually hurting rather than helping. I am going to try adjusting the partition count and see what happens, but I wanted to know whether there is a standard work-around for this 2GB issue.

> Address various 2G limits
> -------------------------
>
>                 Key: SPARK-6235
>                 URL: https://issues.apache.org/jira/browse/SPARK-6235
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Shuffle, Spark Core
>            Reporter: Reynold Xin
>
> An umbrella ticket to track the various 2G limits we have in Spark, due to the
> use of byte arrays and ByteBuffers.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
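As a hedged back-of-envelope check of the arithmetic in the comment above: the 2GB limit comes from shuffle blocks being held in byte arrays and ByteBuffers indexed by a signed 32-bit int, so what matters is the size of the *largest* partition, not the average. The sketch below (plain Python, no Spark; `min_partitions` and the `safety_factor` headroom are illustrative names, not Spark API) reproduces the ~12MB-per-partition figure from the comment and shows that, if the data were evenly distributed, far fewer than 6800 partitions would keep each block under the limit; hence skew, not average size, is the likely culprit.

```python
# A shuffle block must stay under 2 GiB because it is backed by a byte
# array / ByteBuffer indexed by a signed 32-bit int (Integer.MAX_VALUE).
MAX_BLOCK_BYTES = 2**31 - 1

def min_partitions(total_bytes, safety_factor=4):
    """Smallest partition count keeping each partition well under 2 GiB,
    assuming perfectly even distribution.

    safety_factor is an illustrative cushion for serialization overhead
    and moderate skew, not a Spark-defined constant.
    """
    target = MAX_BLOCK_BYTES // safety_factor
    return -(-total_bytes // target)  # ceiling division

rdd_bytes = 84 * 1024**3  # the 84GB RDD from the comment

# Average partition size at 6800 partitions: ~12.65 MB, matching the
# comment's estimate of roughly 12MB per partition.
avg_mb = rdd_bytes / 6800 / 1024**2
print(round(avg_mb, 2))

# Partitions needed if the data were even: a few hundred at most.
print(min_partitions(rdd_bytes))
```

In other words, with even data even a few hundred partitions would suffice, so a single key (or a handful of keys) concentrating more than 2GB of records in one partition is the more plausible explanation than the partition count itself.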