How big is your input dataset?

On Thursday, November 27, 2014, Praveen Sripati <praveensrip...@gmail.com> wrote:
> Hi,
>
> When I run the program below, I see two files in HDFS because the
> number of partitions is 2. But one of the files is empty. Why is that?
> Is the work not distributed equally across all the tasks?
>
> textFile.flatMap(lambda line: line.split()) \
>     .map(lambda word: (word, 1)) \
>     .reduceByKey(lambda a, b: a + b) \
>     .repartition(2) \
>     .saveAsTextFile("hdfs://localhost:9000/user/praveen/output/")
>
> Thanks,
> Praveen

--
- Rishi
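Why the input size matters can be sketched under one assumption: that Spark's shuffle-based `repartition` places each input partition's records round-robin across the output partitions, starting from a pseudo-random output partition. The helper `round_robin_repartition` below is hypothetical. It is a plain-Python model, not Spark's actual code, and it shows that a result containing a single record leaves one of two output partitions (and therefore one output file) empty, while a larger result splits evenly.

```python
import random


def round_robin_repartition(input_partitions, num_partitions, seed=0):
    """Hypothetical, simplified model of a shuffle-based repartition.

    Each input partition starts at a pseudo-random output partition
    (seeded by its index) and then places its records round-robin,
    one output partition after another.
    """
    output = [[] for _ in range(num_partitions)]
    for index, records in enumerate(input_partitions):
        # Starting output partition depends on the input partition index.
        position = random.Random(seed + index).randrange(num_partitions)
        for record in records:
            output[position % num_partitions].append(record)
            position += 1
    return output


# Tiny word-count result: only one (word, count) pair survives reduceByKey,
# so one of the two output partitions stays empty.
tiny = [[("hello", 1)]]
print([len(p) for p in round_robin_repartition(tiny, 2)])

# A larger result from one input partition alternates between the outputs.
larger = [[(f"w{i}", 1) for i in range(1000)]]
print([len(p) for p in round_robin_repartition(larger, 2)])  # prints [500, 500]
```

On a real RDD, `rdd.glom().map(len).collect()` returns the number of records in each partition, which is a quick way to check whether the data is simply too small to fill both files.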