You may also refer to my other message, titled:
[Beg for help] spark job with very low efficiency
On Tuesday, December 22, 2015 1:49 AM, Ted Yu wrote:
I am not familiar with your use case, is it possible to perform the randomized
combination operation based on a subset of the rows in rdd0 ? That way you can
increase the parallelism.
Dear All,
For an RDD with just one partition, any operation or computation on it would
run serially, so the RDD loses all the parallelism benefit of the Spark
system ...
Is that exactly the case?
Thanks very much in advance!
Zhiliang
Have you tried the following method ?
* Note: With shuffle = true, you can actually coalesce to a larger number
* of partitions. This is useful if you have a small number of partitions,
* say 100, potentially with a few partitions being abnormally large. Calling
* coalesce(1000, shuffle = true) will result in 1000 partitions with the
* data distributed using a hash partitioner.
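To see concretely how a shuffle can grow the partition count and even out skewed partitions, here is a minimal plain-Python sketch of the idea. The `coalesce_with_shuffle` helper and its hash bucketing are illustrative stand-ins, not Spark's actual implementation of `coalesce(n, shuffle = true)`:

```python
from collections import defaultdict

def coalesce_with_shuffle(partitions, num_partitions):
    """Redistribute all rows across num_partitions buckets by hash,
    mimicking a full shuffle: the partition count can *increase*,
    and skewed partitions are evened out."""
    buckets = defaultdict(list)
    for part in partitions:
        for row in part:
            buckets[hash(row) % num_partitions].append(row)
    return [buckets[i] for i in range(num_partitions)]

# One abnormally large partition plus a few tiny ones, as in the note above.
skewed = [list(range(1000)), [1000], [1001]]
balanced = coalesce_with_shuffle(skewed, 8)
```

After the redistribution, `balanced` has 8 partitions of roughly equal size instead of one partition holding almost everything.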
Hi Ted,
Thanks a lot for your kind reply.
I need to convert this rdd0 into another rdd1, where the rows of rdd1 are
generated by randomly combining rows of rdd0. From that perspective, rdd0
would need to have a single partition so the random operation can see all of
its rows; however, it would then also lose its parallelism.
I am not familiar with your use case, is it possible to perform the
randomized combination operation based on a subset of the rows in rdd0 ?
That way you can increase the parallelism.
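This suggestion can be sketched in plain Python: treat each partition's rows as a list and do the random combination independently within each subset. In Spark such a per-partition function would run in parallel across partitions (e.g. via `rdd.mapPartitions`). The `combine_within_partition` helper is a hypothetical illustration, not code from this thread:

```python
import random

def combine_within_partition(rows, seed=None):
    """Randomly pair up rows *within* one partition.
    Each partition is processed independently, so there is no need
    to force all rows into a single partition just to combine them."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    # Pair consecutive rows after shuffling; a lone leftover row is dropped.
    return [(shuffled[i], shuffled[i + 1])
            for i in range(0, len(shuffled) - 1, 2)]

# Two partitions, each combined independently (in parallel under Spark).
partitions = [list(range(0, 10)), list(range(10, 20))]
pairs_per_partition = [combine_within_partition(p, seed=i)
                       for i, p in enumerate(partitions)]
```

The trade-off is that rows are only ever combined with rows from the same subset; if cross-subset pairs are required, an extra shuffle round between iterations would be needed.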
Cheers
On Mon, Dec 21, 2015 at 9:40 AM, Zhiliang Zhu wrote:
> Hi Ted,
>
> Thanks a lot for