I tried this out. Because the input file is small, only one partition is created. lapplyPartition runs the given function on that partition and computes sumx as 55 and sumy as 55. SparkR treats the return value from lapplyPartition as a list, and collect concatenates the lists from all partitions.
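The flattening can be reproduced in plain R without a Spark context. This is only a rough sketch: collect_sim is a hypothetical stand-in for what collect does according to the explanation above, not SparkR's actual implementation, and the input lines "1,1" through "10,10" are assumed from the lapply output quoted in this thread. It also shows that wrapping the result in list(...) would keep both sums together as one element, assuming SparkR concatenates as described.

```r
# Assumed input: ten lines "1,1" ... "10,10", all in a single partition.
lines <- paste(1:10, 1:10, sep = ",")

# Per-partition function: sum the x and y columns of the "x,y" lines.
partition_sums <- function(lines) {
  xy <- do.call(rbind, lapply(strsplit(lines, ","), as.numeric))
  c(sum(xy[, 1]), sum(xy[, 2]))
}

# Hypothetical stand-in for collect(): treat each partition's return value
# as a list and concatenate the lists from all partitions.
collect_sim <- function(partition_results) {
  do.call(c, lapply(partition_results, as.list))
}

# Returning c(sumx, sumy): the vector is split into two one-number elements,
# so element[2] is NA for each of them.
flat <- collect_sim(list(partition_sums(lines)))
second <- sapply(flat, function(element) element[2])

# Returning list(c(sumx, sumy)): one element that holds both sums.
wrapped <- collect_sim(list(list(partition_sums(lines))))

for (element in wrapped) {
  cat(as.character(element[1]), as.character(element[2]), "\n")
}
```

In this simulation flat has two elements and wrapped has one, which matches the "vertical" versus "horizontal" output described in the question.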
Thus the output in this case is just a list with two values, each a single number, so accessing element[2] in the for loop gives NA. If you just use cat(as.character(element), "\n"), you should see 55 and 55.

Thanks
Shivaram

On Thu, Aug 7, 2014 at 3:21 PM, Pranay Dave <pranay.da...@gmail.com> wrote:

> Hello Zongheng
> In fact the problem is in lapplyPartition
> lapply gives output as
> 1,1
> 2,2
> 3,3
> ...
> 10,10
>
> However lapplyPartition gives output as
> 55, NA
> 55, NA
>
> Why is the lapply output horizontal and the lapplyPartition output vertical?
>
> Here is my code
> library(SparkR)
>
> sc <- sparkR.init("local")
> lines <- textFile(sc,"/sparkdev/datafiles/covariance.txt")
>
> totals <- lapplyPartition(lines, function(lines)
> {
>   sumx <- 0
>   sumy <- 0
>   totaln <- 0
>   for (i in 1:length(lines)){
>     dataxy <- unlist(strsplit(lines[i], ","))
>     sumx <- sumx + as.numeric(dataxy[1])
>     sumy <- sumy + as.numeric(dataxy[2])
>   }
>
>   ##list(as.numeric(sumx), as.numeric(sumy), as.numeric(sumxy), as.numeric(totaln))
>   ##list does same as below
>   c(sumx,sumy)
> }
> )
>
> output <- collect(totals)
> for (element in output) {
>   cat(as.character(element[1]),as.character(element[2]), "\n")
> }