[ https://issues.apache.org/jira/browse/PIG-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207575#comment-13207575 ]

Jonathan Coveney commented on PIG-2531:
---------------------------------------

Thanks for making this. I agree that this is useful. I looked it over quickly 
and have a couple of comments.

In my opinion, it is confusing to have two separate functions, one for checking
whether a Tuple is contained in a Bag and one for checking whether it is
contained in another Tuple. IMHO, you should implement getArgToFuncMapping and
provide a single ContainsTuple (or whatever you want to call it) that handles
both argument types.
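A minimal sketch of the single-entry-point idea, written in plain Java so it runs without Pig on the classpath. In Pig itself the per-signature dispatch would be declared through EvalFunc.getArgToFuncMapping(); here it is modelled with a runtime type check instead. The names ContainsTupleSketch and Bag are hypothetical stand-ins (Bag for Pig's DataBag, a List for a Tuple), and the tuple-in-tuple rule (every field of the needle appears among the container's fields) is an assumption, since the patch's exact semantics are not stated here.

```java
import java.util.ArrayList;
import java.util.List;

public final class ContainsTupleSketch {

    /** Hypothetical stand-in for Pig's DataBag: a collection of tuples. */
    public static final class Bag {
        private final List<List<Object>> tuples;

        public Bag(List<List<Object>> tuples) {
            this.tuples = new ArrayList<>(tuples);
        }

        public List<List<Object>> tuples() {
            return tuples;
        }
    }

    private ContainsTupleSketch() {}

    /**
     * One entry point for both cases, dispatching on the container's runtime
     * type (a stand-in for registering two signatures via getArgToFuncMapping).
     */
    public static boolean contains(Object container, List<Object> needle) {
        if (container instanceof Bag) {
            // Bag case: some tuple in the bag equals the needle field by field.
            return ((Bag) container).tuples().contains(needle);
        }
        if (container instanceof List) {
            // Tuple case (ASSUMED semantics): every field of the needle
            // also occurs somewhere among the container's fields.
            return ((List<?>) container).containsAll(needle);
        }
        throw new IllegalArgumentException("expected a bag or a tuple, got " + container);
    }
}
```

Callers then use one function regardless of whether the first argument is a bag or a tuple, which is the consistency the comment asks for.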

When you check whether Tuples are equal, the bag version is much stricter than
the Tuple version. That inconsistency seems odd to me, though I could see an
argument either way. Personally, I would write a comparison function that can be
instantiated in either a strict or a loose mode (or just two functions, or
whatever), but I'd want the choice to be explicit.
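One way to make the strictness explicit, sketched in plain Java: a hypothetical TupleMatcher constructed in either STRICT or LOOSE mode. STRICT here means same arity and field-by-field equals(). The LOOSE rule (fields match when their string forms match, so the int 2000 matches the chararray "2000") is an assumed definition chosen for illustration, not the behaviour of either version in the patch.

```java
import java.util.List;
import java.util.Objects;

/**
 * Hypothetical tuple comparator with an explicit strictness mode.
 * STRICT: same arity, each pair of fields equal (type and value).
 * LOOSE (assumed definition): same arity, each pair of fields has the
 * same toString() form. In both modes a null field only matches null.
 */
public final class TupleMatcher {
    public enum Mode { STRICT, LOOSE }

    private final Mode mode;

    public TupleMatcher(Mode mode) {
        this.mode = Objects.requireNonNull(mode);
    }

    public boolean matches(List<?> a, List<?> b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            Object x = a.get(i);
            Object y = b.get(i);
            boolean same;
            if (x == null || y == null) {
                same = (x == y);
            } else if (mode == Mode.STRICT) {
                same = x.equals(y);
            } else {
                same = x.toString().equals(y.toString());
            }
            if (!same) {
                return false;
            }
        }
        return true;
    }
}
```

Because the mode is fixed at construction, both the bag and the tuple variants could share one matcher instance, so their notion of equality cannot silently drift apart.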

Lastly, there are some minor typos, e.g. IsTupelInTuple, IsBagInTupel, and so on.
                
> Filter function for IsTupleInBag and IsTupleInTuple
> ---------------------------------------------------
>
>                 Key: PIG-2531
>                 URL: https://issues.apache.org/jira/browse/PIG-2531
>             Project: Pig
>          Issue Type: New Feature
>          Components: piggybank
>    Affects Versions: 0.9.1
>            Reporter: Florian Leibert (flo)
>            Priority: Minor
>         Attachments: PIG-2531.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> It would be nice to have a FilterFunc that filters based on whether a tuple
> in the stream is contained in either another tuple or a bag.
> Example data (e.g. session data joined with follow-up sessions):
> > BAG: {('/login'), ('/show'), ('/logout?user_id=2000')}, TUPLE: ('/logout?user_id=2000')
> > BAG: {('/home'), ('/about')}, TUPLE: ('/admin')
> > BAG: {('login')}, TUPLE: ('/logout')
> It would be great to be able to filter based on the criteria <B1 CONTAINS T1>
> or <T1 CONTAINS T2>. In the example above, the only result of such an
> operation would be the first entry, '/logout?user_id=2000', which should make
> the usefulness obvious.
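To make the expected result concrete, here is a plain-Java model of the <B1 CONTAINS T1> filter applied to the three example rows above. Bags are modelled as lists of tuples and tuples as lists of fields; the class and method names are hypothetical, and this is not Pig's FilterFunc API. Exact field-by-field equality is assumed as the containment rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Plain-Java model of the requested filter: keep a row only when its bag
 * contains a tuple equal to the row's tuple. Illustration only.
 */
public final class BagContainsFilterDemo {
    private BagContainsFilterDemo() {}

    /** One input row: a bag of page tuples plus a single page tuple. */
    public static final class Row {
        public final List<List<Object>> bag;
        public final List<Object> tuple;

        public Row(List<List<Object>> bag, List<Object> tuple) {
            this.bag = bag;
            this.tuple = tuple;
        }
    }

    /** Builds a tuple (modelled as a list of fields). */
    private static List<Object> t(Object... fields) {
        return Arrays.asList(fields);
    }

    /** Keeps the rows whose bag contains a tuple equal to the row's tuple. */
    public static List<Row> filter(List<Row> rows) {
        List<Row> kept = new ArrayList<>();
        for (Row r : rows) {
            if (r.bag.contains(r.tuple)) {
                kept.add(r);
            }
        }
        return kept;
    }

    /** The three example rows from the issue description. */
    public static List<Row> exampleRows() {
        return Arrays.asList(
                new Row(Arrays.asList(t("/login"), t("/show"), t("/logout?user_id=2000")),
                        t("/logout?user_id=2000")),
                new Row(Arrays.asList(t("/home"), t("/about")), t("/admin")),
                new Row(Arrays.asList(t("login")), t("/logout")));
    }

    public static void main(String[] args) {
        for (Row r : filter(exampleRows())) {
            System.out.println(r.tuple);
        }
    }
}
```

Running main prints [/logout?user_id=2000], i.e. only the first row passes, as the description expects.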

--
This message is automatically generated by JIRA.