[ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268173#comment-13268173 ]
Jonathan Coveney commented on PIG-2600: --------------------------------------- Prashant, Thanks for your work on this! It's 99.9% of the way there. A more administrative point...you'll need to reupload it with permissions given to apache to use. What follows is fairly nitpicky. - In valuelist, you have {code} + } + else { {code} just bump that else up Less nitpicky: - You have many cases where you catch (Exception). Try and catch the least general exception possible. Example: In valuelist, you have {code} + } catch (Exception e) { + String msg = "Error while generating " + this.getClass().getSimpleName(); + throw new RuntimeException(msg, e); + } {code} (I'm not even quite sure what exception is being thrown?) Further, instead of throwing a runtime exception, the convention (in exec) is to throw ExecException since exec throws IOException anyway. The builtin functions have many examples of this. But yeah, in general, you want to catch the most specific exception possible in the most specific portion of code possible... I don't like the convention of wrapping an entire method in a try{}catch(Exception). At least, that's how I like to do it and I'm reviewing this ;) So basically: try's should be around the most specific piece of code possible, should catch the most specific exception possible, and should throw ExecException if the code are allows. Throwing a RuntimeException should be a last resort (and is really just jank around the annoying exceptions required by outputSchema). Similarly, in KEYSET you have {code} + } catch(FrontendException e) { + // TODO Auto-generated catch block + e.printStackTrace(); + } {code} which I assume was an oversight left by the IDE :) (those things will be the death of us Java programmers...) - I think we had talked a bit about how to do VALUESET... I still think you could do it in one pass, but I don't think the performance gain will be so massive that you can't do it in two passes. Your comment about frequent resizing isn't really accurate, since if you look at the HashSet constructor, it doesn't do any fancy math...it just adds all of the elements, and resizes dynamically as you add more of them. Thus, you could just add to the HashSet, and if it wasn't previously contained, add it to the Tuple, thus doing it in one pass (since that first pass, dynamic resizing and all, is going to happen anyway). Does that make sense? Thanks again for your work on this. It is super close. Jon > Better Map support > ------------------ > > Key: PIG-2600 > URL: https://issues.apache.org/jira/browse/PIG-2600 > Project: Pig > Issue Type: Improvement > Reporter: Jonathan Coveney > Assignee: Prashant Kommireddi > Fix For: 0.11 > > Attachments: PIG-2600.patch, PIG-2600_2.patch, PIG-2600_3.patch, > PIG-2600_4.patch, PIG-2600_5.patch > > > It would be nice if Pig played better with Maps. To that end, I'd like to add > a lot of utility around Maps. > - TOBAG should take a Map and output {(key, value)} > - TOMAP should take a Bag in that same form and make a map. > - KEYSET should return the set of keys. > - VALUESET should return the set of values. > - VALUELIST should return the List of values (no deduping). > - INVERSEMAP would return a Map of values => the set of keys that refer to > that Key > This would all be pretty easy. A more substantial piece of work would be to > make Pig support non-String keys (this is especially an issue since UDFs and > whatnot probably assume that they are all Integers). Not sure if it is worth > it. > I'd love to hear other things that would be useful for people! -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira