[ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268205#comment-13268205 ]
Prashant Kommireddi commented on PIG-2600:
------------------------------------------
Thanks for the review Jon.
1. I agree on exception handling in most cases, and thanks for catching the
"printStackTrace" in the code; it wasn't my intention to leave it in there :).
In general, wrapping specific portions of code in try-catch is good practice,
but I prefer not to split a try block into several when most lines in the
method throw the same exception and there isn't much other code. In these
UDFs, Schema.getField is used more than once, and of course every call throws
FrontendException.
And looking at the builtin UDFs for examples was really not the best idea :).
Looks like some refactoring is required there.
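To illustrate the single-try-block style I mean, here is a minimal sketch. It
does not use the real Pig classes: the nested FrontendException and Schema
types and the describe method are hypothetical stand-ins, kept only so the
example compiles on its own.

```java
import java.util.Arrays;
import java.util.List;

class SingleTryBlockSketch {
    /** Hypothetical stand-in for Pig's checked FrontendException. */
    static class FrontendException extends Exception {
        FrontendException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for a schema whose field lookup can fail. */
    static class Schema {
        private final List<String> fields;

        Schema(List<String> fields) {
            this.fields = fields;
        }

        String getField(int index) throws FrontendException {
            if (index < 0 || index >= fields.size()) {
                throw new FrontendException("No field at index " + index);
            }
            return fields.get(index);
        }
    }

    /** One try block covers both lookups, since both throw the same exception. */
    static String describe(Schema input) {
        try {
            String keyField = input.getField(0);
            String valueField = input.getField(1);
            return keyField + " -> " + valueField;
        } catch (FrontendException e) {
            // Rethrow with context and the original cause instead of
            // calling e.printStackTrace().
            throw new IllegalArgumentException("Expected a two-field schema", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(new Schema(Arrays.asList("key", "value"))));
    }
}
```

Splitting this into one try block per getField call would only repeat the same
catch clause twice without making the failure any easier to locate.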
2. Regarding resizing of the HashSet, trying to optimize right now might be a
bit premature. My comment about frequent resizing would only matter if the
number of distinct elements in the Map values were large. A HashSet starts
with a default internal capacity of 16 and expands (creating a new array and
copying elements over) once its size crosses a threshold, which is the
capacity times the load factor (0.75 by default). With the current approach,
the HashSet constructor picks an initial capacity based on the size of the
Collection passed to it. "it just adds all of the
elements, and resizes dynamically as you add more of them" - this can be a
costly operation if you start with a capacity of 16 and the number of distinct
values in the map is in the thousands or millions. "since that first pass, dynamic
resizing and all, is going to happen anyway" - surely the amount of resizing
is not the same in the 2 cases? Either way, it is too early to be thinking
about optimization there :)
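To put rough numbers on the difference, here is a small sketch. The class and
method names are made up for illustration, and doublingsNeeded is only a
simplified model of the default behaviour (capacity 16, load factor 0.75,
capacity doubling on each resize), not the JDK's actual code. presizedCapacity
mirrors the sizing expression used by the HashSet(Collection) constructor.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

class HashSetSizingSketch {
    static final float LOAD_FACTOR = 0.75f;

    /** Table capacities are powers of two, so round the request up. */
    static int roundUpToPowerOfTwo(int capacity) {
        return capacity <= 1 ? 1 : Integer.highestOneBit(capacity - 1) << 1;
    }

    /** Simplified model: how many times the table doubles to hold n distinct elements. */
    static int doublingsNeeded(int initialCapacity, int distinctElements) {
        long capacity = roundUpToPowerOfTwo(initialCapacity);
        int doublings = 0;
        while (distinctElements > capacity * (double) LOAD_FACTOR) {
            capacity *= 2;
            doublings++;
        }
        return doublings;
    }

    /** Initial capacity chosen when a Collection of this size is passed to the constructor. */
    static int presizedCapacity(int collectionSize) {
        return Math.max((int) (collectionSize / LOAD_FACTOR) + 1, 16);
    }

    /** Presized up front from the size of the input collection. */
    static <T> Set<T> viaConstructor(Collection<T> values) {
        return new HashSet<>(values);
    }

    /** Starts at the default capacity and grows as elements are added. */
    static <T> Set<T> viaDefaultThenAdd(Collection<T> values) {
        Set<T> set = new HashSet<>();
        for (T value : values) {
            set.add(value);
        }
        return set;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        System.out.println("default start: " + doublingsNeeded(16, n));
        System.out.println("presized:      " + doublingsNeeded(presizedCapacity(n), n));
        System.out.println(viaConstructor(Arrays.asList(1, 2, 2, 3))
                .equals(viaDefaultThenAdd(Arrays.asList(1, 2, 2, 3))));
    }
}
```

Under this model, a million distinct values take 17 doublings when starting
from the default capacity and none when the set is presized. Both approaches
produce the same set; only the amount of copying along the way differs.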
I will upload a patch with the changes soon. Thanks again for reviewing.
> Better Map support
> ------------------
>
> Key: PIG-2600
> URL: https://issues.apache.org/jira/browse/PIG-2600
> Project: Pig
> Issue Type: Improvement
> Reporter: Jonathan Coveney
> Assignee: Prashant Kommireddi
> Fix For: 0.11
>
> Attachments: PIG-2600.patch, PIG-2600_2.patch, PIG-2600_3.patch,
> PIG-2600_4.patch, PIG-2600_5.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add
> a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to
> that value
> This would all be pretty easy. A more substantial piece of work would be to
> make Pig support non-String keys (this is especially an issue since UDFs and
> whatnot probably assume that they are all Integers). Not sure if it is worth
> it.
> I'd love to hear other things that would be useful for people!