Why should 3 and 9 be deleted? 3 can be paired with 1, and 9 can be paired
with 8.
On 26 Aug 2016 11:00, "Rex X" <dnsr...@gmail.com> wrote:

> 1. Given the following CSV file
>
> >     $cat data.csv
> >
> >     ID,City,Zip,Flag
> >     1,A,95126,0
> >     2,A,95126,1
> >     3,A,95126,1
> >     4,B,95124,0
> >     5,B,95124,1
> >     6,C,95124,0
> >     7,C,95127,1
> >     8,C,95127,0
> >     9,C,95127,1
>
>
> (a) where "ID" above is a primary key (unique),
>
> (b) for each "City" and "Zip" combination, there is at most one ID with
> Flag=0, while there can be multiple IDs with Flag=1 for the same "City"
> and "Zip" combination.
>
> (c) Flag can be 0 or 1
>
>
> 2. For each ID with Flag=0, we want to pair it with another ID with
> Flag=1 but with the same City - Zip. If one cannot find another paired ID
> with Flag=1 and matched City - Zip, we just delete that record.
>
> Here is the expected result:
>
> >     ID,City,Zip,Flag
> >     1,A,95126,0
> >     2,A,95126,1
> >     4,B,95124,0
> >     5,B,95124,1
> >     7,C,95127,1
> >     8,C,95127,0
>
>
> Any tips on how to do this pairing in Python or Scala?
>
> Great thanks!
>
> Rex
>
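
If the intent is a strict one-to-one pairing (each Flag=0 ID takes exactly one
Flag=1 partner, and any extra Flag=1 IDs are dropped), that would explain why
3 and 9 disappear. Below is a minimal Python sketch under that assumption. It
picks the first Flag=1 row in file order as the partner, which is my guess and
not something the original post states. The names DATA and pair_rows are made
up for illustration.

```python
import csv
import io

# Sample data copied from the original post.
DATA = """ID,City,Zip,Flag
1,A,95126,0
2,A,95126,1
3,A,95126,1
4,B,95124,0
5,B,95124,1
6,C,95124,0
7,C,95127,1
8,C,95127,0
9,C,95127,1
"""


def pair_rows(rows):
    """Keep the Flag=0 row and one Flag=1 row per (City, Zip) group.

    Assumption: the partner is the first Flag=1 row in file order.
    Groups missing either side are dropped entirely.
    """
    zeros = {}  # (City, Zip) -> the Flag=0 row (at most one per group)
    ones = {}   # (City, Zip) -> first Flag=1 row seen for the group
    for row in rows:
        key = (row["City"], row["Zip"])
        target = zeros if row["Flag"] == "0" else ones
        target.setdefault(key, row)  # keeps only the first row per key

    # A group survives only if it has both a Flag=0 and a Flag=1 row.
    keep = set()
    for key in zeros.keys() & ones.keys():
        keep.add(zeros[key]["ID"])
        keep.add(ones[key]["ID"])

    # Filter the original list so the input order is preserved.
    return [row for row in rows if row["ID"] in keep]


rows = list(csv.DictReader(io.StringIO(DATA)))
paired = pair_rows(rows)
print([row["ID"] for row in paired])  # prints ['1', '2', '4', '5', '7', '8']
```

On the sample data this reproduces the expected result from the original post.
If instead every Flag=1 row in a matched group should be kept, only the
handling of `ones` would need to change (collect a list per key rather than
the first row).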
