I am trying to broadcast a large 5GB variable using Spark 1.2.0. I get the
following exception when the size of the broadcast variable exceeds 2GB. Any
ideas on how I can resolve this issue?
java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
at
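The 2GB cap comes from Spark storing each block in a structure indexed by a Java int, so a single broadcast cannot exceed Integer.MAX_VALUE bytes. A common workaround is to split the serialized payload into sub-2GB chunks and broadcast each chunk separately. A minimal sketch of the chunking idea in plain Python (the Spark calls in the trailing comment are illustrative, not the poster's code):

```python
# Sketch: split a large serialized payload into chunks that each fit
# under Integer.MAX_VALUE bytes, then reassemble on the executor side.
# Assumes the 5GB variable can be serialized to a single byte string.

INT_MAX = 2**31 - 1  # Integer.MAX_VALUE, the per-block ceiling

def chunk_payload(payload: bytes, chunk_size: int = INT_MAX) -> list:
    """Split a serialized payload into broadcast-sized chunks."""
    return [payload[i:i + chunk_size]
            for i in range(0, len(payload), chunk_size)]

def reassemble(chunks: list) -> bytes:
    """Executor side: stitch the broadcast chunks back together."""
    return b"".join(chunks)

# With Spark this would look roughly like (names are illustrative):
#   broadcasts = [sc.broadcast(c) for c in chunk_payload(serialized)]
#   on executors: data = reassemble([b.value for b in broadcasts])
```

Each chunk stays below the int-indexed block limit, at the cost of extra serialization and bookkeeping.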
I have been running into NegativeArraySizeExceptions when doing joins on
data with very skewed key distributions in Spark 1.2.0. I found a previous
post that mentioned that this exception arises when the size of the blocks
spilled during the shuffle exceeds 2GB. The post recommended increasing
I am testing the performance of Spark to see how it behaves when the
dataset size exceeds the amount of memory available. I am running
wordcount on a 4-node cluster (Intel Xeon 16 cores (32 threads), 256GB
RAM per node). I limited spark.executor.memory to 64g, so I have 256g
of memory available in
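For reference, a setup like the one described above might be launched roughly as follows; the class name, master URL, and paths are assumptions for illustration, not the poster's actual command:

```shell
# Illustrative spark-submit invocation for a 4-node cluster with
# executor memory capped at 64g per node (4 x 64g = 256g total).
spark-submit \
  --class org.example.WordCount \
  --master spark://master:7077 \
  --executor-memory 64g \
  wordcount.jar hdfs:///input hdfs:///output
```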
Hi Reynold,
Nice! What Spark configuration parameters did you use to get your job to
run successfully on a large dataset? My job is failing on 1TB of input data
(uncompressed) on a 4-node cluster (64GB memory per node). No OutOfMemory
errors, just lost executors.
Thanks,
Soila
On Mar 20, 2014 11
% of the data. Do you have any pointers on how to handle skewed key
distributions during a join?
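One common way to handle a skewed join (and roughly what Pig's skewed join does) is key salting: append a random salt to each key on the large, skewed side, and replicate each row on the small side once per salt value, so matching pairs still land in the same partition. A sketch simulated with plain Python lists of (key, value) pairs rather than RDDs; the salt count and record layout are assumptions:

```python
import itertools
import random

# Key salting for a skewed join, simulated without Spark.
# Large side: tag each row with a random salt in [0, N_SALTS).
# Small side: replicate each row once per salt so every salted
# large-side key has a matching small-side key.

N_SALTS = 4

def salt_large_side(records):
    """(key, value) -> ((key, salt), value), one random salt per row."""
    return [((k, random.randrange(N_SALTS)), v) for k, v in records]

def replicate_small_side(records):
    """(key, value) -> ((key, salt), value) for every salt value."""
    return [((k, s), v)
            for (k, v), s in itertools.product(records, range(N_SALTS))]

def join(left, right):
    """Plain hash join on the salted keys; strips the salt on output."""
    table = {}
    for k, v in right:
        table.setdefault(k, []).append(v)
    return [(k[0], (lv, rv)) for k, lv in left for rv in table.get(k, [])]
```

This spreads a hot key over N_SALTS partitions at the cost of replicating the small side N_SALTS times, so it only pays off when one side is much smaller or the skew is severe.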
Soila
On Fri, Feb 13, 2015 at 10:49 AM, Imran Rashid iras...@cloudera.com wrote:
Unfortunately this is a known issue:
https://issues.apache.org/jira/browse/SPARK-1476
as Sean suggested, you need to think
Thanks Shixiong,
I'll try out your PR. Do you know what the status of the PR is? Are
there any plans to incorporate this change into
DataFrames/SchemaRDDs in Spark 1.3?
Soila
On Thu, Mar 12, 2015 at 7:52 PM, Shixiong Zhu zsxw...@gmail.com wrote:
I sent a PR to add skewed join last year
Hi Tristan,
Did upgrading to Kryo3 help?
Thanks,
Soila
On Sun, Mar 1, 2015 at 2:48 PM, Tristan Blakers tris...@blackfrog.org wrote:
Yeah I implemented the same solution. It seems to kick in around the 4B
mark, but looking at the log I suspect it’s probably a function of the
number of unique
Does Spark support skewed joins similar to Pig's, which distribute large
keys across multiple partitions? I tried using the RangePartitioner but
I am still experiencing failures because some keys are too large to
fit in a single partition. I cannot use broadcast variables to
work around this because