Hello,

I am trying to filter the tuples in a bag that is generated by a sequence of operations in Pig. My data looks like this:

    (0,{(0,8),(0,1),(0,6),(0,7),(0,4)})
    (1,{(1,6),(1,7),(1,8),(1,4)})
    (4,{(4,6),(4,8),(4,7)})
    (6,{(6,8),(6,7)})
    (7,{(7,8)})

This relation is stored in R4. Running describe on it gives:

    R4: {group: int,R3: {R::b: int,R1::b1: int}}

I want to filter the inner bag so that only the tuple with the smallest difference (b1 - b) stays and all the others are filtered out. For example, the desired output would be:

    (0,{(0,1)})
    (1,{(1,4)})
    (4,{(4,6)})
    (6,{(6,7)})
    (7,{(7,8)})

I tried doing it like this:

    R5 = foreach R4 {
        R6 = filter R3 by MIN(b1-b);
        generate group;
    }

I also tried some other approaches, but then realized this was not the proper way of doing it, and now I am stuck. I thought about writing a UDF to achieve this, but it would be great if I could do it in Pig itself. Can anyone help me out with this?

Thanks,
Dhaval Deshpande