Hello everyone,
    I am porting a clustering algorithm to the Spark platform, and I have
run into a problem that has confused me for a long time.

    I have a PairRDD<Integer, Integer> named patternRDD, in which the key
represents a number and the value stores some information about that key.
For every pair of entries, I want to use the two VALUEs to calculate a
Kendall number, and if that number is greater than 0.6, output the two KEYs.
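
    To make the goal concrete, in plain non-distributed Java it would be
something like this (kendall() is only a placeholder for my own correlation
function, not real code from my job):

    import java.util.List;
    import scala.Tuple2;

    // Collect the entries and compare every pair of values; kendall() is
    // a placeholder for my own correlation function.
    List<Tuple2<Integer, Integer>> entries = patternRDD.collect();
    for (int i = 0; i < entries.size(); i++) {
        for (int j = i + 1; j < entries.size(); j++) {
            double k = kendall(entries.get(i)._2(), entries.get(j)._2());
            if (k > 0.6) {
                System.out.println(entries.get(i)._1() + ", "
                        + entries.get(j)._1());
            }
        }
    }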

    I tried to transform the PairRDD into an RDD<Tuple2<Integer,
Integer>>, add a common key 0 to every entry, and join the result with
itself, which gives a PairRDD<0, Iterable<Tuple2<Tuple2<key1, value1>,
Tuple2<key2, value2>>>>. I then used the values() method and tried to map
the keys out, but it gives me an "out of memory" error. I suspect the error
is caused by the join gathering every pairing of the entries under that
single key 0, but I have no idea how to solve it.
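
    Simplified, the attempt looks something like the sketch below (the
variable names are made up, kendall() again stands in for my own function,
and I have left out the grouping step that produces the Iterable):

    import scala.Tuple2;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;

    // Tag every (key, value) entry with the same constant key 0.
    JavaPairRDD<Integer, Tuple2<Integer, Integer>> tagged =
        patternRDD.mapToPair(e -> new Tuple2<>(0, e));

    // Self-join on the common key: every record now holds a pair of the
    // original entries, and all of them end up under the single key 0.
    JavaPairRDD<Integer, Tuple2<Tuple2<Integer, Integer>,
            Tuple2<Integer, Integer>>> joined = tagged.join(tagged);

    // Drop the dummy key, compare the two values, and keep the key pairs
    // whose Kendall number exceeds 0.6; the job dies with "out of memory"
    // around here.
    JavaRDD<Tuple2<Integer, Integer>> result = joined.values()
        .filter(p -> kendall(p._1()._2(), p._2()._2()) > 0.6)
        .map(p -> new Tuple2<>(p._1()._1(), p._2()._1()));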

     Can you help me?

Regards,
Gefei Li
