Hello Asmath,

Your list exists inside the driver, but you are trying to add elements to it
from the executors. They run in different processes, on different nodes, and
they do not share memory: each executor only mutates its own deserialized
copy of the closure, so the driver's list never changes.
https://spark.apache.org/docs/latest/rdd-programming-guide.html#actions
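To see why the mutation is lost, here is a minimal sketch (plain Scala, no Spark needed) that simulates what Spark does with a closure: it serializes the captured state on the driver and deserializes it on the executor, so the executor mutates a copy. The `roundTrip` helper and the `ClosureCopyDemo` object are hypothetical names introduced just for this illustration.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object ClosureCopyDemo {
  // Simulate Spark shipping state to an executor: serialize on the
  // "driver", deserialize in the "executor". The result is a copy.
  def roundTrip[T <: java.io.Serializable](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    in.readObject().asInstanceOf[T]
  }

  def main(args: Array[String]): Unit = {
    val points = new java.util.ArrayList[Double]()
    // The "executor" receives and mutates a deserialized copy.
    val executorCopy = roundTrip(points)
    executorCopy.add(1.0)
    println(points.size())       // 0 -- the driver's list is untouched
    println(executorCopy.size()) // 1
  }
}
```

This is exactly why `points.add(...)` inside `map` appears to do nothing: the add happens on the executor's copy, which is discarded.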

There is an action called 'collect' that builds the list for you on the
driver. Something like the following should do what you want:

     val points = df.rdd.map { row =>
         val latitude = com.navistar.telematics.datascience.validation.PreValidation.getDefaultDoubleVal(row.getAs[String](Constants.Datapoint.Latitude))
         val longitude = com.navistar.telematics.datascience.validation.PreValidation.getDefaultDoubleVal(row.getAs[String](Constants.Datapoint.Longitude))
         new Coordinate(latitude, longitude) // no 'return' in a closure: the last expression is the result
     }.collect()

Note that this retrieves ALL your coordinates into the driver. If you have
too much data, it will cause an OutOfMemoryError.

Le 8/29/2017 à 8:21 PM, KhajaAsmath Mohammed a écrit :
> Hi,
>
> I am initializing an arraylist before iterating through the map method. I 
> am always getting the list size value as zero after the map operation.
>
> How do I add values to list inside the map method of dataframe ? any 
> suggestions?
>
>  val points = new 
> java.util.ArrayList[com.vividsolutions.jts.geom.Coordinate]()
>     import scala.collection.JavaConversions._
>     df.rdd.map { row =>
>         val latitude = 
> com.navistar.telematics.datascience.validation.PreValidation.getDefaultDoubleVal(row.getAs[String](Constants.Datapoint.Latitude))
>         val longitude = 
> com.navistar.telematics.datascience.validation.PreValidation.getDefaultDoubleVal(row.getAs[String](Constants.Datapoint.Longitude))
>         points.add(new Coordinate(latitude, longitude))
>     }
> points.size is always zero.
>
>
> Thanks,
> Asmath
