Re: How to do this pairing in Spark?

2016-08-26 Thread ayan guha
Top of head select *from (Select ID, flag, lead(id) over(partition by city,zip order by flag,ID) c from t) Where id==0 and c is not null Should do it. Basically you want to keep records which has ID 0 and have a corresponding 1. Please let me know if doesn't work, so I can provide a right

Re: How to do this pairing in Spark?

2016-08-26 Thread Rex X
Hi Ayan, Yes, ID=3 can be paired with ID=1, and the same for ID=9 with ID=8. BUT we want to keep only ONE pair for the ID with Flag=0. Since ID=1 with Flag=0 already paired with ID=2, and ID=8 paired with ID=7, we simply delete ID=3 and ID=9. Thanks! Regards, Rex On Fri, Aug 26, 2016 at

Re: How to do this pairing in Spark?

2016-08-26 Thread ayan guha
Why 3 and 9 should be deleted? 3 can be paired with 1and 9 can be paired with 8. On 26 Aug 2016 11:00, "Rex X" wrote: > 1. Given following CSV file > > > $cat data.csv > > > > ID,City,Zip,Flag > > 1,A,95126,0 > > 2,A,95126,1 > > 3,A,95126,1 > >

How to do this pairing in Spark?

2016-08-25 Thread Rex X
1. Given following CSV file > $cat data.csv > > ID,City,Zip,Flag > 1,A,95126,0 > 2,A,95126,1 > 3,A,95126,1 > 4,B,95124,0 > 5,B,95124,1 > 6,C,95124,0 > 7,C,95127,1 > 8,C,95127,0 > 9,C,95127,1 (a) where "ID" above is a primary key (unique), (b) for