Re: Adding an indexed column

2015-06-04 Thread Deenar Toraskar
Or you could:

1) convert the DataFrame to an RDD
2) use mapPartitions and zipWithIndex within each partition
3) convert the RDD back to a DataFrame

You will need to make sure you preserve partitioning.

Deenar
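
A rough illustration of step 2, not Spark code: the sketch below models partitions as plain Python lists and numbers the rows of each one independently, which is the effect of zipping each partition's iterator with an index inside mapPartitions. The names `index_partition` and `index_within_partitions` are hypothetical, and the layout (one product per partition, already sorted by descending price) is an assumption. In a real job that layout is what "preserve partitioning" has to guarantee.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, str, float]  # (flag, product, price)
IndexedRow = Tuple[int, str, float, int]  # (flag, product, price, index)


def index_partition(rows: Iterable[Row]) -> Iterator[IndexedRow]:
    """Append a 0-based position to every row of one partition."""
    for idx, (flag, product, price) in enumerate(rows):
        yield (flag, product, price, idx)


def index_within_partitions(partitions: List[List[Row]]) -> List[List[IndexedRow]]:
    """Stand-in for mapPartitions: handle each partition separately."""
    return [list(index_partition(p)) for p in partitions]


# Assumed layout: one partition per product, sorted by descending price.
partitions = [
    [(1, "a", 47.808764653746), (1, "a", 31.9869279512204),
     (1, "a", 16.7599200038239)],
    [(1, "b", 47.808764653746), (1, "b", 47.7907893713564),
     (1, "b", 20.3916014172137), (1, "b", 16.7599200038239)],
]
indexed = index_within_partitions(partitions)
```

Because the counter restarts in every partition, the index is only meaningful per product if no product is split across partitions.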


Re: Adding an indexed column

2015-05-31 Thread ayan guha
If you are on Spark 1.3, use repartitionAndSortWithinPartitions followed by
mapPartitions. In 1.4, window functions will be supported, it seems.
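
A minimal plain-Python model of the Spark 1.3 suggestion, using the sample rows from the quoted question below. It is a sketch of the logic only, not Spark code: `repartition_and_sort` and `index_partition` are hypothetical names, and the toy partitioner is an assumption. The point it illustrates is that one partition can hold several products, so the counter has to restart whenever the product changes.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, str, float]  # (flag, product, price)
IndexedRow = Tuple[int, str, float, int]  # (flag, product, price, index)


def repartition_and_sort(rows: List[Row], num_partitions: int) -> List[List[Row]]:
    """Route rows to partitions by product, then sort every partition by
    product and descending price (a model of the shuffle, not Spark)."""
    partitions: List[List[Row]] = [[] for _ in range(num_partitions)]
    for row in rows:
        # Toy partitioner: a given product always lands in the same partition.
        partitions[sum(row[1].encode()) % num_partitions].append(row)
    return [sorted(p, key=lambda r: (r[1], -r[2])) for p in partitions]


def index_partition(rows: Iterable[Row]) -> Iterator[IndexedRow]:
    """Walk one sorted partition, restarting the counter at each new product."""
    for _, group in groupby(rows, key=lambda r: r[1]):
        for idx, (flag, product, price) in enumerate(group):
            yield (flag, product, price, idx)


rows = [
    (1, "a", 47.808764653746), (1, "b", 47.808764653746),
    (1, "a", 31.9869279512204), (1, "b", 47.7907893713564),
    (1, "a", 16.7599200038239), (1, "b", 16.7599200038239),
    (1, "b", 20.3916014172137),
]
result = [
    indexed
    for partition in repartition_and_sort(rows, num_partitions=2)
    for indexed in index_partition(partition)
]
```

With window functions the same result would come from a row number computed over a window partitioned by product and ordered by descending price, without dropping to the RDD level.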
On 1 Jun 2015 04:10, Ricardo Almeida ricardo.alme...@actnowib.com wrote:

 That's great. And how would you create an ordered index per partition (by
 product in this example)?

 Assuming now a DataFrame like:

 flag | product | price
 -----|---------|-----------------
 1    | a       | 47.808764653746
 1    | b       | 47.808764653746
 1    | a       | 31.9869279512204
 1    | b       | 47.7907893713564
 1    | a       | 16.7599200038239
 1    | b       | 16.7599200038239
 1    | b       | 20.3916014172137

 how would I get a new DataFrame such as:

 flag | product | price            | index
 -----|---------|------------------|------
 1    | a       | 47.808764653746  | 0
 1    | a       | 31.9869279512204 | 1
 1    | a       | 16.7599200038239 | 2
 1    | b       | 47.808764653746  | 0
 1    | b       | 47.7907893713564 | 1
 1    | b       | 20.3916014172137 | 2
 1    | b       | 16.7599200038239 | 3









Re: Adding an indexed column

2015-05-29 Thread Wesley Miao
One way I can see is to:

1. get the RDD from your DataFrame
2. call rdd.zipWithIndex to get a new RDD
3. turn the new RDD into a new DataFrame
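
To make step 2 concrete, here is a small plain-Python model of how zipWithIndex numbers rows: indices follow partition order first, then the order of items inside each partition. It is only a sketch and does not call Spark. The function name `zip_with_index` and the three-way split of the sample rows are assumptions made for illustration.

```python
from itertools import accumulate
from typing import List, Tuple

Row = Tuple[int, float]  # (flag, price)


def zip_with_index(partitions: List[List[Row]]) -> List[Tuple[Row, int]]:
    """Pair every row with a global 0-based index, ordered first by
    partition and then by position inside the partition."""
    sizes = [len(p) for p in partitions]
    # Each partition starts at the total size of the partitions before it.
    offsets = [0] + list(accumulate(sizes))[:-1]
    return [
        (row, offset + i)
        for part, offset in zip(partitions, offsets)
        for i, row in enumerate(part)
    ]


# Assumed split of the seven sample rows across three partitions.
partitions = [
    [(1, 47.808764653746), (1, 47.808764653746), (1, 31.9869279512204)],
    [(1, 47.7907893713564), (1, 16.7599200038239)],
    [(1, 16.7599200038239), (1, 20.3916014172137)],
]

# Step 3 in miniature: flatten (row, index) pairs into (flag, price, index).
indexed_rows = [(flag, price, idx)
                for (flag, price), idx in zip_with_index(partitions)]
```

The index therefore reflects whatever order the rows have in the RDD; if a particular order matters, sort before zipping.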


Adding an indexed column

2015-05-28 Thread Cesar Flores
Assuming that I have the following data frame:

flag | price
-----|-----------------
1    | 47.808764653746
1    | 47.808764653746
1    | 31.9869279512204
1    | 47.7907893713564
1    | 16.7599200038239
1    | 16.7599200038239
1    | 20.3916014172137

How can I create a data frame with an extra indexed column, like the following:

flag | price            | index
-----|------------------|------
1    | 47.808764653746  | 0
1    | 47.808764653746  | 1
1    | 31.9869279512204 | 2
1    | 47.7907893713564 | 3
1    | 16.7599200038239 | 4
1    | 16.7599200038239 | 5
1    | 20.3916014172137 | 6

-- 
Cesar Flores