[ 
https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268173#comment-13268173
 ] 

Jonathan Coveney commented on PIG-2600:
---------------------------------------

Prashant,

Thanks for your work on this! It's 99.9% of the way there. A more 
administrative point...you'll need to reupload it with permissions given to 
apache to use. What follows is fairly nitpicky.
- In valuelist, you have
{code}
+            }
+            else {
{code}
just bump that else up

Less nitpicky:
- You have many cases where you catch (Exception). Try and catch the least 
general exception possible. Example: In valuelist, you have 
{code}
+        } catch (Exception e) {
+            String msg = "Error while generating " + 
this.getClass().getSimpleName();
+            throw new RuntimeException(msg, e);
+        }
{code}

(I'm not even quite sure what exception is being thrown?)

Further, instead of throwing a runtime exception, the convention (in exec) is 
to throw ExecException since exec throws IOException anyway. The builtin 
functions have many examples of this. But yeah, in general, you want to catch 
the most specific exception possible in the most specific portion of code 
possible... I don't like the convention of wrapping an entire method in a 
try{}catch(Exception). At least, that's how I like to do it and I'm reviewing 
this ;) So basically: try's should be around the most specific piece of code 
possible, should catch the most specific exception possible, and should throw 
ExecException if the code are allows. Throwing a RuntimeException should be a 
last resort (and is really just jank around the annoying exceptions required by 
outputSchema).

Similarly, in KEYSET you have
{code}
+        } catch(FrontendException e) {
+            // TODO Auto-generated catch block
+            e.printStackTrace();
+        }
{code}
which I assume was an oversight left by the IDE :) (those things will be the 
death of us Java programmers...)

- I think we had talked a bit about how to do VALUESET... I still think you 
could do it in one pass, but I don't think the performance gain will be so 
massive that you can't do it in two passes. Your comment about frequent 
resizing isn't really accurate, since if you look at the HashSet constructor, 
it doesn't do any fancy math...it just adds all of the elements, and resizes 
dynamically as you add more of them. Thus, you could just add to the HashSet, 
and if it wasn't previously contained, add it to the Tuple, thus doing it in 
one pass (since that first pass, dynamic resizing and all, is going to happen 
anyway). Does that make sense?

Thanks again for your work on this. It is super close.
Jon
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Prashant Kommireddi
>             Fix For: 0.11
>
>         Attachments: PIG-2600.patch, PIG-2600_2.patch, PIG-2600_3.patch, 
> PIG-2600_4.patch, PIG-2600_5.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add 
> a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to 
> that Key
> This would all be pretty easy. A more substantial piece of work would be to 
> make Pig support non-String keys (this is especially an issue since UDFs and 
> whatnot probably assume that they are all Integers). Not sure if it is worth 
> it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to