Re: SparkR : lapplyPartition transforms the data in vertical format

Shivaram Venkataraman Thu, 07 Aug 2014 21:18:07 -0700

I tried this out. Because the input file is small, only one partition is
created. lapplyPartition runs the given function once on that partition and
computes sumx as 55 and sumy as 55. SparkR treats the return value of
lapplyPartition as a list of elements, and collect concatenates the lists
from all partitions.


The output in this case is therefore just a list with two values, each of
length one, so accessing element[2] in the for loop gives NA. If you just use
cat(as.character(element), "\n"), you should see 55 and 55.
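
The indexing behaviour can be reproduced in plain R without SparkR. This is
only a sketch: `output` is a hand-built stand-in for what collect returned
here, and `wrapped` illustrates an assumption (not verified against SparkR in
this thread) that returning list(c(sumx, sumy)) from the function would keep
the pair together as one element per partition.

```r
# Base R only. `output` stands in for the collected result in this case:
# the single partition's c(sumx, sumy), seen as two separate elements.
output <- as.list(c(55, 55))

for (element in output) {
  # Each element has length 1, so element[2] is out of range and yields NA.
  cat(as.character(element[1]), as.character(element[2]), "\n")  # 55 NA
}

for (element in output) {
  # Printing the whole element avoids the out-of-range index.
  cat(as.character(element), "\n")  # 55
}

# Assumption: wrapping the pair in list() makes it a single element,
# so both sums are reachable by index.
wrapped <- list(c(55, 55))
for (element in wrapped) {
  cat(as.character(element[1]), as.character(element[2]), "\n")  # 55 55
}
```

The first loop prints the same "55 NA" lines as in the original report.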

Thanks
Shivaram


On Thu, Aug 7, 2014 at 3:21 PM, Pranay Dave <pranay.da...@gmail.com> wrote:

> Hello Zongheng
> In fact the problem is in lapplyPartition
> lapply gives output as
> 1,1
> 2,2
> 3,3
> ...
> 10,10
>
> However lapplyPartition gives output as
> 55, NA
> 55, NA
>
> Why is the lapply output horizontal and the lapplyPartition output vertical?
>
> Here is my code
> library(SparkR)
>
>
> sc <- sparkR.init("local")
> lines <- textFile(sc,"/sparkdev/datafiles/covariance.txt")
>
> totals <- lapplyPartition(lines, function(lines)
> {
>
>
>         sumx <- 0
>         sumy <- 0
>         totaln <- 0
>         for (i in seq_along(lines)){
>                 dataxy <- unlist(strsplit(lines[i], ","))
>                 sumx <- sumx  + as.numeric(dataxy[1])
>                 sumy <- sumy  + as.numeric(dataxy[2])
>
>         }
>
>         ## list(as.numeric(sumx), as.numeric(sumy), as.numeric(sumxy), as.numeric(totaln))
>         ## list does same as below
>         c(sumx,sumy)
>
> }
>
> )
>
> output <- collect(totals)
> for (element in output) {
>   cat(as.character(element[1]),as.character(element[2]), "\n")
> }
