Re: Dataset , RDD zipWithIndex -- How to use as a map .

Pedro Rodriguez Fri, 22 Jul 2016 20:10:38 -0700

You could either use monotonically_increasing_id or use a window function
with rank. The first is a simple Spark SQL function (its ids are unique and
increasing, but not consecutive). Databricks has a pretty helpful post on
how to use window functions (in this case the whole dataset is the window).
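
To use the zipWithIndex result from the original question as a map, the
(row, index) pairs need to be turned into a name-to-index lookup. Below is a
minimal sketch of that idea in plain Java with no Spark dependency. The class
NameIndexLookup and its helper methods are hypothetical names for
illustration, and the local zipWithIndex only mimics the RDD method on a
List. With Spark itself, I believe the analogous step would be mapToPair
followed by collectAsMap on the JavaPairRDD, which is only reasonable when
the data fits on the driver.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Spark: shows the (element, index) -> lookup idea.
class NameIndexLookup {

    // Pairs each element with its position, mimicking the shape of what
    // zipWithIndex produces on an RDD: (element, 0), (element, 1), ...
    static <T> List<Map.Entry<T, Long>> zipWithIndex(List<T> items) {
        List<Map.Entry<T, Long>> out = new ArrayList<>();
        long i = 0L;
        for (T item : items) {
            out.add(new AbstractMap.SimpleImmutableEntry<>(item, i++));
        }
        return out;
    }

    // Turns the (name, index) pairs into a name -> index map.
    // If a name occurs more than once, the first index seen is kept.
    static Map<String, Long> toLookup(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> lookup = new LinkedHashMap<>();
        for (Map.Entry<String, Long> p : pairs) {
            lookup.putIfAbsent(p.getKey(), p.getValue());
        }
        return lookup;
    }

    public static void main(String[] args) {
        // Made-up sample names standing in for the "name" column.
        List<String> names = Arrays.asList("Acme", "Bistro", "Cafe");
        Map<String, Long> lookup = toLookup(zipWithIndex(names));
        System.out.println(lookup.get("Bistro"));
    }
}
```

Keeping the first index for a repeated name is a design choice made for this
sketch. If names are not unique in the real data, keying the map on bid
instead would avoid the collision.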


On Fri, Jul 22, 2016 at 12:20 PM, Marco Mistroni <mmistr...@gmail.com>
wrote:

> Hi
> So you have a data frame, then use zipWithIndex and create a tuple ....
> I'm not sure if the df API has something useful for zip with index.
> But u can
> - get a data frame
> - convert it to rdd (there's a tordd )
> - do a zip with index
>
> That will give u a rdd with 3 fields...
> I don't think you can update df columns....
> Hth
> On 22 Jul 2016 5:19 pm, "VG" <vlin...@gmail.com> wrote:
>
> >
> > Hi All,
> >
> > Any suggestions for this
> >
> > Regards,
> > VG
> >
> > On Fri, Jul 22, 2016 at 6:40 PM, VG <vlin...@gmail.com> wrote:
> >>
> >> Hi All,
> >>
> >> I am really confused how to proceed further. Please help.
> >>
> >> I have a dataset created as follows:
> >> Dataset<Row> b = sqlContext.sql("SELECT bid, name FROM business");
> >>
> >> Now I need to map each name to a unique index, so I did the following:
> >> JavaPairRDD<Row, Long> indexedBId = business.javaRDD().zipWithIndex();
> >>
> >> In a later part of the code I need to change a data structure and update
> >> name with the index value generated above.
> >> I am unable to figure out how to do a lookup here.
> >>
> >> Please suggest.
> >>
> >> If there is a better way to do this please suggest that.
> >>
> >> Regards
> >> VG
> >>
> >
>



-- 
Pedro Rodriguez
PhD Student in Distributed Machine Learning | CU Boulder
UC Berkeley AMPLab Alumni

ski.rodrig...@gmail.com | pedrorodriguez.io | 909-353-4423
Github: github.com/EntilZha | LinkedIn:
https://www.linkedin.com/in/pedrorodriguezscience
