Thanks Marco. This solved the order problem. I have another question, which is a
prerequisite to this.

As you can see below, ID2, ID1 and ID3 appear in order, and I need to maintain
this index order as well. But when we do the groupByKey
operation (*rdd.distinct.groupByKey().mapValues(v => v.toArray)*),
everything is *jumbled*.
Is there any way we can maintain this order as well?

scala> RDD.foreach(println)
(ID2,18159)
(ID1,18159)
(ID3,18159)

(ID2,18159)
(ID1,18159)
(ID3,18159)

(ID2,36318)
(ID1,36318)
(ID3,36318)

(ID2,54477)
(ID1,54477)
(ID3,54477)

*Jumbled version:*
Array(
(ID1,Array(*18159*, 308703, 72636, 64544, 39244, 107937, *54477*, 145272,
100079, *36318*, 160992, 817, 89366, 150022, 19622, 44683, 58866, 162076,
45431, 100136)),
(ID3,Array(100079, 19622, *18159*, 212064, 107937, 44683, 150022, 39244,
100136, 58866, 72636, 145272, 817, 89366, *54477*, *36318*, 308703,
160992, 45431, 162076)),
(ID2,Array(308703, *54477*, 89366, 39244, 150022, 72636, 817, 58866,
44683, 19622, 160992, 107937, 100079, 100136, 145272, 64544, *18159*,
45431, *36318*, 162076))
)

*Expected output:*
Array(
(ID1,Array(*18159*,*36318*, *54477,...*)),
(ID3,Array(*18159*,*36318*, *54477, ...*)),
(ID2,Array(*18159*,*36318*, *54477, ...*))
)

As you can see, after the *groupByKey* operation is complete, item 18159 is at
index 0 for ID1, index 2 for ID3 and index 16 for ID2, whereas the expected
position is index 0 for all three.
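
One possible way to keep the original order, sketched here as an assumption rather than a confirmed fix from this thread: tag every pair with its position before any shuffle, deduplicate by keeping the first position seen, group, and then sort each group by that position. In Spark this would roughly be rdd.zipWithIndex(), then reduceByKey with math.min, then a map to (id, (idx, value)), then groupByKey() and mapValues with a sort on idx. The block below mirrors those steps on plain Scala collections so it runs without a cluster; OrderPreservingGroup and groupKeepingOrder are hypothetical names.

```scala
object OrderPreservingGroup {
  // Plain-collections stand-in for the RDD pipeline described above.
  // Input: (id, value) pairs in their original order, possibly with duplicates.
  // Output: per id, the distinct values in order of first appearance.
  def groupKeepingOrder(pairs: Seq[(String, Int)]): Map[String, Vector[Int]] = {
    // Attach the original position to every pair: ((id, value), position).
    val indexed = pairs.zipWithIndex

    // Deduplicate (id, value) pairs, remembering the smallest position of each.
    // .toSeq avoids collapsing entries that share the same id.
    val firstSeen: Seq[(String, (Int, Int))] =
      indexed.groupBy(_._1).toSeq.map { case ((id, value), hits) =>
        (id, (hits.map(_._2).min, value))
      }

    // Group by id, then restore the original order by sorting on position.
    firstSeen.groupBy(_._1).map { case (id, entries) =>
      id -> entries.map(_._2).sortBy(_._1).map(_._2).toVector
    }
  }
}
```

With the sample pairs printed above as input, every ID maps to its values in first-seen order. The sort is needed because neither distinct nor groupByKey promises any ordering of the values within a key.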


On Sun, Jul 24, 2016 at 12:43 PM, Marco Mistroni <mmistr...@gmail.com>
wrote:

> Hello
>  Hmm, so you have an array containing 3 tuples?
> If all the arrays have the same length, you can just zip all of them,
> creating a list of tuples,
> then you can scan the list 5 by 5...?
>
> so something like
>
> (Array(0)._2, Array(1)._2, Array(2)._2).zipped.toList
>
> this will give you a list of 3-element tuples, each containing one item
> from each of ID1, ID2 and ID3 ... sample below
> res: List((18159,100079,308703), (308703, 19622, 54477), (72636,18159,
> 89366)..........)
>
> then you can use a recursive function to compare each element such as
>
> def iterate(lst: List[(Int, Int, Int)]): Unit = {
>     if (lst.isEmpty) () // nothing left; return your comparison result here
>     else {
>          val splits = lst.splitAt(5)
>          // do something with the next 5 tuples, splits._1
>          iterate(splits._2)
>     }
> }
>
> will this help? or am i still missing something?
>
> kr
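
The zip-then-scan idea above can also be sketched without explicit recursion. This is a minimal illustration, not the code from the thread: ChunkedZip and zipInChunks are hypothetical names, nested zip is used in place of .zipped (which is deprecated as of Scala 2.13), and grouped(5) replaces the repeated splitAt(5).

```scala
object ChunkedZip {
  // Zip three arrays position by position into triples, then split the
  // triples into consecutive blocks of `size`. If the arrays differ in
  // length, zip silently truncates to the shortest one.
  def zipInChunks(a: Array[Int], b: Array[Int], c: Array[Int],
                  size: Int): List[List[(Int, Int, Int)]] = {
    val triples = a.zip(b).zip(c).map { case ((x, y), z) => (x, y, z) }.toList
    // The last block may hold fewer than `size` triples.
    triples.grouped(size).toList
  }
}
```

Each inner list then holds the next block of up to 5 aligned elements from the three arrays, ready for whatever comparison is needed.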
>
> On 24 Jul 2016 5:52 pm, "janardhan shetty" <janardhan...@gmail.com> wrote:
>
>> Array(
>> (ID1,Array(18159, 308703, 72636, 64544, 39244, 107937, 54477, 145272,
>> 100079, 36318, 160992, 817, 89366, 150022, 19622, 44683, 58866, 162076,
>> 45431, 100136)),
>> (ID3,Array(100079, 19622, 18159, 212064, 107937, 44683, 150022, 39244,
>> 100136, 58866, 72636, 145272, 817, 89366, 54477, 36318, 308703, 160992,
>> 45431, 162076)),
>> (ID2,Array(308703, 54477, 89366, 39244, 150022, 72636, 817, 58866, 44683,
>> 19622, 160992, 107937, 100079, 100136, 145272, 64544, 18159, 45431, 36318,
>> 162076))
>> )
>>
>> I need to compare the first 5 elements of ID1 with the first 5 elements of
>> ID3, then the first 5 elements of ID1 with those of ID2, and similarly for
>> each next block of 5 elements, in that order, until the end of the arrays.
>> Let me know if this helps.
>>
>>
>> On Sun, Jul 24, 2016 at 7:45 AM, Marco Mistroni <mmistr...@gmail.com>
>> wrote:
>>
>>> Apologies I misinterpreted.... could you post two use cases?
>>> Kr
>>>
>>> On 24 Jul 2016 3:41 pm, "janardhan shetty" <janardhan...@gmail.com>
>>> wrote:
>>>
>>>> Marco,
>>>>
>>>> Thanks for the response. It is indexed order and not ascending or
>>>> descending order.
>>>> On Jul 24, 2016 7:37 AM, "Marco Mistroni" <mmistr...@gmail.com> wrote:
>>>>
>>>>> Use mapValues to transform to an RDD where the values are sorted?
>>>>> Hth
>>>>>
>>>>> On 24 Jul 2016 6:23 am, "janardhan shetty" <janardhan...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I have a key,value pair rdd where value is an array of Ints. I need
>>>>>> to maintain the order of the value in order to execute downstream
>>>>>> modifications. How do we maintain the order of values?
>>>>>> Ex:
>>>>>> rdd = (id1,[5,2,3,15]),
>>>>>> (id2,[9,4,2,5])....
>>>>>>
>>>>>> Follow-up question: how do we compare one element in the rdd with
>>>>>> all the other elements?
>>>>>>
>>>>>
>>
