Hi,

I am doing a join of two RDDs, and it gives a different result (counting the
number of joined records) each time I run the code on the same input.
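Simplified, the job is essentially the minimal sketch below (file paths and
the tab-separated parsing are placeholders standing in for the real inputs;
"local[2]" stands in for the single-worker, two-core setup):

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._  // implicits that add join() to pair RDDs

object JoinCount {
  def main(args: Array[String]) {
    // "local[2]" mimics one worker with two cores; paths are placeholders.
    val sc = new SparkContext("local[2]", "JoinCount")

    // Parse each line into a (key, value) pair; tab-separated fields assumed.
    val left = sc.textFile("path/to/left").map { line =>
      val fields = line.split("\t")
      (fields(0), fields(1))
    }
    val right = sc.textFile("path/to/right").map { line =>
      val fields = line.split("\t")
      (fields(0), fields(1))
    }

    // This count comes out different on each run over the same input.
    println("joined records: " + left.join(right).count())
    sc.stop()
  }
}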

The input files are large enough to be divided into two splits. When the
program runs on two workers with a single core assigned to each, the output is
consistent and looks correct. But when a single worker is used with two or
more cores, the result appears to be random: every run produces a different
count of joined records.

Does this sound like a defect, or is there something I need to take care of
when using join? I am using spark-0.9.1.


Regards
Ajay
