Hi Ayan, Yes, ID=3 can be paired with ID=1, and the same for ID=9 with ID=8. BUT we want to keep only ONE pair for the ID with Flag=0.
Since ID=1 with Flag=0 already paired with ID=2, and ID=8 paired with ID=7, we simply delete ID=3 and ID=9. Thanks! Regards, Rex On Fri, Aug 26, 2016 at 12:46 AM, ayan guha <guha.a...@gmail.com> wrote: > Why 3 and 9 should be deleted? 3 can be paired with 1and 9 can be paired > with 8. > On 26 Aug 2016 11:00, "Rex X" <dnsr...@gmail.com> wrote: > >> 1. Given following CSV file >> >> > $cat data.csv >> > >> > ID,City,Zip,Flag >> > 1,A,95126,0 >> > 2,A,95126,1 >> > 3,A,95126,1 >> > 4,B,95124,0 >> > 5,B,95124,1 >> > 6,C,95124,0 >> > 7,C,95127,1 >> > 8,C,95127,0 >> > 9,C,95127,1 >> >> >> (a) where "ID" above is a primary key (unique), >> >> (b) for each "City" and "Zip" combination, there is one ID in max with >> Flag=0; while it can contain multiple IDs with Flag=1 for each "City" and >> "Zip" combination. >> >> (c) Flag can be 0 or 1 >> >> >> 2. For each ID with Flag=0, we want to pair it with another ID with >> Flag=1 but with the same City - Zip. If one cannot find another paired ID >> with Flag=1 and matched City - Zip, we just delete that record. >> >> Here is the expected result: >> >> > ID,City,Zip,Flag >> > 1,A,95126,0 >> > 2,A,95126,1 >> > 4,B,95124,0 >> > 5,B,95124,1 >> > 7,C,95127,1 >> > 8,C,95127,0 >> >> >> Any valuable tips how to do this pairing in Python or Scala? >> >> Great thanks! >> >> Rex >> >