Re: Job consistently failing after leftOuterJoin() - oddly sized / non-uniform partitions

2015-07-07 Thread beancinematics
Right, I figured I'd need a custom partitioner from what I've read around! Documentation on this is super sparse; do you have any recommended links on solving data skew and/or creating custom partitioners in Spark 1.4? I'd also love to hear if this is an unusual problem with my type of set-up -

Job consistently failing after leftOuterJoin() - oddly sized / non-uniform partitions

2015-07-06 Thread Mohammed Omer
Afternoon all, Really loving this project and the community behind it. Thank you all for your hard work. This past week, though, I've been having a hard time getting my first deployed job to run without failing at the same point every time: Right after a leftOuterJoin, most partitions (600

Re: Job consistently failing after leftOuterJoin() - oddly sized / non-uniform partitions

2015-07-06 Thread ayan guha
You can bump up the number of partitions with a parameter to the join operator. However, you have a data skew problem, which you need to resolve using a reasonable partitioning function. On 7 Jul 2015 08:57, Mohammed Omer beancinemat...@gmail.com wrote: Afternoon all, Really loving this project and the
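
The suggestion above, a partitioning function that accounts for skew, could look roughly like the sketch below. This is not code from the thread: the names `stable_hash`, `make_skew_aware_partitioner`, and `partition_sizes`, as well as the sample keys, are hypothetical and exist only for illustration. It is plain Python so it can be run without a cluster. In PySpark, a function with this shape (key in, partition index out) could presumably be passed as the `partitionFunc` argument of `RDD.partitionBy`. Note the limit of the approach: every record for a given key still has to land in a single partition for the join, so a partitioner can keep hot keys from sharing a partition with other keys, but it cannot split one hot key across partitions.

```python
import zlib
from collections import Counter


def stable_hash(key):
    """Deterministic hash (hypothetical helper); Python's built-in hash()
    is randomized per process for strings, so it is avoided here."""
    return zlib.crc32(repr(key).encode("utf-8"))


def make_skew_aware_partitioner(hot_keys, num_partitions):
    """Return a key -> partition index function.

    Each known hot key gets a dedicated partition; all other keys are
    hashed across the remaining partitions. Assumes the hot keys were
    identified beforehand (for example by counting keys on a sample).
    """
    hot_keys = list(hot_keys)
    if len(hot_keys) >= num_partitions:
        raise ValueError("need at least one partition left for non-hot keys")
    dedicated = {key: index for index, key in enumerate(hot_keys)}
    shared = num_partitions - len(dedicated)

    def partition_func(key):
        if key in dedicated:
            return dedicated[key]
        # Non-hot keys are spread over the partitions after the dedicated ones.
        return len(dedicated) + stable_hash(key) % shared

    return partition_func


def partition_sizes(keys, partition_func, num_partitions):
    """Count how many records each partition would receive."""
    counts = Counter(partition_func(key) for key in keys)
    return [counts.get(index, 0) for index in range(num_partitions)]


# Made-up, skewed key distribution: two hot keys plus 200 rare keys.
keys = ["a"] * 1000 + ["b"] * 800 + ["k%d" % i for i in range(200)]
num_partitions = 8
part = make_skew_aware_partitioner(["a", "b"], num_partitions)
sizes = partition_sizes(keys, part, num_partitions)
print(sizes)
```

Running this prints the per-partition record counts, which makes it easy to check whether a given choice of hot keys and partition count evens things out before trying it on the real job. Raising the partition count alone, as also mentioned above, only helps with the non-hot keys.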