[ https://issues.apache.org/jira/browse/PIG-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Kawa updated PIG-2883:
---------------------------

    Description: 
HBaseStorage allows a user to load many HBase columns by specifying a column 
prefix. The problem is accessing such columns later when their names are 
created dynamically and carry meaningful information that you also want to 
process (this seems to be relatively common).

Quick example:

User = LOAD 'hbase://user' USING HBaseStorage('friends:*', '-loadKey true') 
  AS (username:bytearray, friendMap:map[]);
UserAndFriend = FOREACH User 
  GENERATE username, friendMap#'What_should_I_put_here?';

It would be convenient to easily get the full list of key/value pairs (or just 
the keys or the values) from a map, e.g. with MapKeysToBag, MapValuesToBag and 
MapEntriesToBag UDFs. With such UDFs, we could FLATTEN the returned bag to 
generate a relation that contains the unnested keys or values extracted from 
the map, e.g.:

UserAndFriend = FOREACH User
  GENERATE username, FLATTEN(MapKeysToBag(friendMap)) AS friendUsername;
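The transformation such UDFs would perform can be sketched in plain Java. This is a minimal, hypothetical sketch (the class MapToBagSketch and its method names are illustrative, not the Pigitos or Piggybank code): to run without Pig on the classpath it models a Pig bag as a List of tuples and a tuple as a List<Object>. A real Piggybank UDF would instead extend org.apache.pig.EvalFunc<DataBag> and build its result with BagFactory and TupleFactory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, dependency-free sketch of what MapKeysToBag, MapValuesToBag
 * and MapEntriesToBag would compute. A bag is modeled as a List of tuples and
 * a tuple as a {@code List<Object>}; a real UDF would extend
 * {@code org.apache.pig.EvalFunc<DataBag>} and build Tuple/DataBag objects.
 */
final class MapToBagSketch {

    private MapToBagSketch() {
        // static helpers only
    }

    /** One single-field tuple per key, e.g. {(alice),(bob)}. */
    public static List<List<Object>> keysToBag(Map<String, ?> map) {
        List<List<Object>> bag = new ArrayList<>();
        if (map == null) {
            return bag; // assumption of this sketch: a null map yields an empty bag
        }
        for (String key : map.keySet()) {
            bag.add(Collections.<Object>singletonList(key));
        }
        return bag;
    }

    /** One single-field tuple per value, in the map's iteration order. */
    public static List<List<Object>> valuesToBag(Map<String, ?> map) {
        List<List<Object>> bag = new ArrayList<>();
        if (map == null) {
            return bag; // same assumption as keysToBag
        }
        for (Object value : map.values()) {
            bag.add(Collections.singletonList(value));
        }
        return bag;
    }

    /** One two-field (key, value) tuple per map entry. */
    public static List<List<Object>> entriesToBag(Map<String, ?> map) {
        List<List<Object>> bag = new ArrayList<>();
        if (map == null) {
            return bag; // same assumption as keysToBag
        }
        for (Map.Entry<String, ?> entry : map.entrySet()) {
            List<Object> tuple = new ArrayList<>(2);
            tuple.add(entry.getKey());
            tuple.add(entry.getValue());
            bag.add(tuple);
        }
        return bag;
    }
}
```

Because each key becomes its own single-field tuple, FLATTEN over the result of a keys-to-bag function emits one (username, friendUsername) row per friend column. Treating a null map as an empty bag is a design choice of this sketch; note that FLATTEN of an empty bag emits no rows for that input.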

I have already implemented such UDFs (the repo is at 
https://github.com/kawaa/Pigitos, and here is a fancy example: 
http://bit.ly/Sf2KCP). I would love to add them to Piggybank (I have not found 
such functionality there).

If you think this is interesting, I can prepare a patch as soon as possible. 
Please let me know.

    
> MapKeysToBag and more UDFs to manipulate maps
> ---------------------------------------------
>
>                 Key: PIG-2883
>                 URL: https://issues.apache.org/jira/browse/PIG-2883
>             Project: Pig
>          Issue Type: Wish
>          Components: piggybank
>    Affects Versions: 0.10.0
>            Reporter: Adam Kawa
>            Priority: Trivial
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>

--
This message is automatically generated by JIRA.