Hi, I have a few questions:

1. Does rdd.saveAsNewAPIHadoopFile(outputDir, keyClass, valueClass, outputFormatClass) shuffle data, or does it always create the same number of files in the output directory as there are partitions in the RDD?

2. How can I use multiple outputs with saveAsNewAPIHadoopFile so that the file name is generated from the key, for output formats other than TextOutputFormat?

3. I have a JavaPairRDD<K,V> and want to split it into as many partitions as there are distinct keys in the pair RDD.
   a. Will pairrdd.groupByKey() create a new RDD whose number of partitions equals the number of distinct keys in the parent pair RDD?
   b. Or do I have to count the distinct keys myself (using pairrdd.keys().distinct().count()) and then partition the pair RDD by key with a custom partitioner that uses that count as its number of partitions?

Thanks