[ https://issues.apache.org/jira/browse/PIG-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987981#action_12987981 ]
Bill Graham commented on PIG-1782:
----------------------------------

I was also thinking about a map, but I thought we might want to preserve the ordering of the fields specified when explicit fields are requested, as well as CFs, like Dmitriy's example. We'd get the CF fields in the natural ordering that HBase stores them in too. The more I think about it though, I don't think this is that useful, and a map approach seems the way to go.

@Eric: Yes, Pig doesn't have any timestamp (ts) control on writes currently (and that should be improved), but that shouldn't rule out the ability to read them. I can see many use cases where some non-Pig process is populating HBase, but Pig is used for queries.

@Dmitriy: I prototyped that exact use case using tuples of tuples, but ran into the downsides you point out. Also, each row read returns a variable number of tuples, which seems really difficult to work with.

I like this approach when reading all columns in a family:

{code}
(
  rowKey,
  {
    col1 => ((val1, ts), ..),
    col2 => ((val2, ts), ..)
  }
)
{code}

For Dmitriy's use case, one option is to return the same schema (always a map) regardless of how the column families are specified (i.e., 'cf1: cf2:foo' vs 'cf1:' vs 'cf2:foo cf2:bar'). Another is to return a map for CFs and a ((val1, ts), ..) tuple for explicit columns. I'm not sure which approach would make life easier on the script writer.
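To make the "always a map" option concrete, here is a rough sketch in plain Python (not Pig or HBaseStorage code) of the shape a loader could hand back. Everything in it is an assumption for illustration: the function names `select_columns` and `load_row`, the `(family, qualifier, value, ts)` cell layout, keying the map by `family:qualifier`, and ordering versions newest first. None of this is taken from the HBaseStorage implementation.

```python
from collections import defaultdict


def select_columns(cells, spec):
    """Group one row's cells into {"family:qualifier": ((value, ts), ...)}.

    cells: iterable of (family, qualifier, value, ts) tuples (hypothetical layout).
    spec:  space-separated column spec such as 'cf1: cf2:foo'. An entry with
           an empty qualifier ('cf1:') means "every column in that family".
    """
    wanted = []
    for item in spec.split():
        family, _, qualifier = item.partition(":")
        wanted.append((family, qualifier))

    grouped = defaultdict(list)
    for family, qualifier, value, ts in cells:
        for want_family, want_qualifier in wanted:
            if family == want_family and want_qualifier in ("", qualifier):
                grouped[f"{family}:{qualifier}"].append((value, ts))
                break  # do not add the same cell twice if specs overlap

    # Assumption: versions are returned newest first, as immutable tuples.
    return {
        column: tuple(sorted(versions, key=lambda v: v[1], reverse=True))
        for column, versions in grouped.items()
    }


def load_row(row_key, cells, spec):
    """Return (rowKey, map) no matter how the columns were specified."""
    return (row_key, select_columns(cells, spec))


if __name__ == "__main__":
    cells = [
        ("cpu", "sys.0", "0.5", 2),
        ("cpu", "sys.0", "0.4", 1),
        ("cpu", "user.0", "0.9", 2),
        ("mem", "free", "10", 2),
    ]
    print(load_row("host1", cells, "cpu:"))
    print(load_row("host1", cells, "cpu:sys.0 mem:free"))
```

With this shape, a whole-family spec and an explicit-column spec both yield a two-field result (row key plus map), so a script never has to deal with a variable number of top-level fields. What is given up is the ordering of the requested fields, since map entries are looked up by key rather than by position.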

> Add ability to load data by column family in HBaseStorage
> ---------------------------------------------------------
>
>                 Key: PIG-1782
>                 URL: https://issues.apache.org/jira/browse/PIG-1782
>             Project: Pig
>          Issue Type: New Feature
>         Environment: Java 6, Mac OS X 10.6
>            Reporter: Eric Yang
>            Assignee: Bill Graham
>
> It would be nice to load all columns in the column family by using short hand
> syntax like:
> {noformat}
> CpuMetrics = load 'hbase://SystemMetrics' USING
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('cpu:','-loadKey');
> {noformat}
> Assuming there are columns cpu:sys.0, cpu:sys.1, cpu:user.0, cpu:user.1, in
> cpu column family.
> CpuMetrics would contain something like:
> {noformat}
> (rowKey, cpu:sys.0, cpu:sys.1, cpu:user.0, cpu:user.1)
> {noformat}