Hey Dimitriy, thanks for sharing your solution.
I have some more updates.
The problem shows up when a shuffle is involved. Using coalesce with
shuffle = true behaves like reduceByKey with a smaller number of
partitions, except that the whole save stage hangs. I am not sure yet if
it only happens with UnionRDD or ...
Doing the reduceByKey without changing the number of partitions and then
doing a coalesce works.
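For reference, the two variants I mean can be sketched like this (the helper names and the toy pair type are illustrative, not the real job):

```scala
import org.apache.spark.rdd.RDD

// Illustrative only: the same reduce expressed two ways.
object CoalesceVariants {
  // Variant that hangs for me on 1.2.1: reduceByKey straight into a
  // smaller number of partitions.
  def reduceIntoFewer(pairs: RDD[(String, Int)]): RDD[(String, Int)] =
    pairs.reduceByKey(_ + _, 2)

  // Variant that works: keep the original partition count in the reduce,
  // then coalesce with a shuffle afterwards.
  def reduceThenCoalesce(pairs: RDD[(String, Int)]): RDD[(String, Int)] =
    pairs.reduceByKey(_ + _).coalesce(2, shuffle = true)
}
```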
But the other version still hangs, without any information (whereas it
works with Spark 1.1.1). The previous logs don't seem to be related to
what happens.
I don't think this is a memory issue, as the GC time ...
FWIW, I observed similar behavior in a similar situation. I was able to
work around it by forcefully committing one of the RDDs into cache right
before the union, and forcing that by executing take(1). Nothing else
ever helped.
Seems like a yet-undiscovered 1.2.x thing.
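Concretely, the workaround looks roughly like this (the function name and the generic shape are mine, not from the real job):

```scala
import org.apache.spark.rdd.RDD

// Sketch of the workaround: pin one side of the union in cache and force
// it to materialize with a cheap action before the union runs.
def unionWithMaterializedLeft[T](left: RDD[T], right: RDD[T]): RDD[T] = {
  val cached = left.cache()
  cached.take(1) // cheap action that forces the cached RDD to compute now
  cached.union(right)
}
```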
On Tue, Mar 17, 2015 at 4:21 PM,
Hum, I increased it to 1024 but it doesn't help; I still have the same problem :(
2015-03-13 18:28 GMT+01:00 Eugen Cepoi cepoi.eu...@gmail.com:
The default one, 0.07 of executor memory. I'll try increasing it and
post back the result.
Thanks
2015-03-13 18:09 GMT+01:00 Ted Yu yuzhih...@gmail.com:
Might be related: what's the value for spark.yarn.executor.memoryOverhead ?
See SPARK-6085
Cheers
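For reference, one way to set it (the 1024 here is just an example value in MB, not a recommendation):

```scala
import org.apache.spark.SparkConf

// Example only: raise the YARN executor memory overhead to 1024 MB.
// It can equally be passed to spark-submit as
// --conf spark.yarn.executor.memoryOverhead=1024
val conf = new SparkConf()
  .set("spark.yarn.executor.memoryOverhead", "1024")
```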
On Fri, Mar 13, 2015 at 9:45 AM, Eugen Cepoi cepoi.eu...@gmail.com wrote:
Hi,
I have a job that hangs after upgrading to spark 1.2.1 from 1.1.1. Strange
thing, the exact same code does work (after upgrade) in the spark-shell.
But this information might be misleading as it works with 1.1.1...
*The job takes as input two data sets:*
- rdd A of 170+ GB (with less it is