Re: reading/writing HBase in Pig
On Jan 18, 2010, at 10:14 PM, Michael Dalton wrote:
> I took a look at the load-store branch and that definitely seems like the right place to do this. So the right thing to do would be to just open up a JIRA and then post a patch against the load-store rewrite tree, correct?

Yes. You should take a look at PIG-1200, which seems to be going part way towards doing what you want to do.

Alan.
Re: reading/writing HBase in Pig
PIG-1200 only supports using an InputFormat for now. The other features, loading the row key and storing to HBase, are not supported yet; I will continue with the remaining work.

On Mon, Jan 25, 2010 at 11:13 AM, Alan Gates ga...@yahoo-inc.com wrote:
> On Jan 18, 2010, at 10:14 PM, Michael Dalton wrote:
> > [...]
> Yes. You should take a look at PIG-1200, which seems to be going part way towards doing what you want to do.
> Alan.

--
Best Regards
Jeff Zhang
Re: reading/writing HBase in Pig
I took a look at the load-store branch, and that definitely seems like the right place to do this. So the right thing to do would be to open a JIRA and then post a patch against the load-store rewrite tree, correct?

Also, it seems that there's no existing support for row keys, which should also be fixed. The current HBaseStorage assumes that the user passes a list of columns (i.e., column family/qualifier pairs). However, users may encode data in the HBase row key as well; empty row keys are forbidden, so there is definitely data there. Any StoreFunc implementation for HBase will require row key support, since each Put must have a row key. So my plan is to modify HBaseStorage's LoadFunc support to handle row keys in addition to the existing column values, and then add StoreFunc support (with row keys) to HBaseStorage. Just wanted to make sure this sounds good.

Thanks
Best regards,
Mike

On Thu, Jan 14, 2010 at 10:40 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote:
> Hi Mike, It would be great to have a StoreFunc for HBase! [...]
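To make the row-key point concrete, here is a minimal sketch. It assumes a layout where field 0 of each tuple holds the row key and the remaining fields map, in order, onto a space-separated list of family:qualifier columns. PutSketch, parseColumns and toPut are hypothetical illustrations in plain Java; they are not HBase's Put class, the HBaseStorage constructor, or Pig's StoreFunc API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: PutSketch, parseColumns and toPut are hypothetical stand-ins,
// not HBase's Put class or Pig's HBaseStorage/StoreFunc API.
public class TupleToPut {

    // Hypothetical stand-in for an HBase Put: one row key plus column values.
    static final class PutSketch {
        final String rowKey;
        final Map<String, Object> columns = new LinkedHashMap<>();

        PutSketch(String rowKey) {
            this.rowKey = rowKey;
        }
    }

    // Parses an assumed space-separated "family:qualifier" column list.
    static List<String[]> parseColumns(String spec) {
        List<String[]> result = new ArrayList<>();
        for (String col : spec.trim().split("\\s+")) {
            int colon = col.indexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("expected family:qualifier, got " + col);
            }
            result.add(new String[] {col.substring(0, colon), col.substring(colon + 1)});
        }
        return result;
    }

    // Assumed layout: field 0 is the row key, fields 1..n follow the column list.
    static PutSketch toPut(List<Object> tuple, List<String[]> columns) {
        if (tuple.isEmpty() || tuple.get(0) == null || tuple.get(0).toString().isEmpty()) {
            // HBase forbids empty row keys, so every stored tuple must supply one.
            throw new IllegalArgumentException("row key (field 0) must be non-empty");
        }
        if (tuple.size() - 1 != columns.size()) {
            throw new IllegalArgumentException(
                "expected " + columns.size() + " value fields, got " + (tuple.size() - 1));
        }
        PutSketch put = new PutSketch(tuple.get(0).toString());
        for (int i = 0; i < columns.size(); i++) {
            String[] fq = columns.get(i);
            put.columns.put(fq[0] + ":" + fq[1], tuple.get(i + 1));
        }
        return put;
    }

    public static void main(String[] args) {
        List<String[]> cols = parseColumns("info:name info:age");
        PutSketch put = toPut(Arrays.<Object>asList("user42", "Alice", 30), cols);
        System.out.println(put.rowKey + " " + put.columns);
    }
}
```

Running main prints `user42 {info:name=Alice, info:age=30}`. A tuple whose first field is empty is rejected before anything is written, which is the kind of check a real HBase store function would need somewhere.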
reading/writing HBase in Pig
Hi all,

I was looking at the current Pig code in SVN, and it seems that HBase is supported for loading but not for storing. If this is the case, I'd like to add support for writing to HBase to Pig. Is anyone else working on this, and if not, is this something you'd like contributed?

Based on a cursory evaluation of the StoreFunc interface, the APIs there look fairly file-centric and may need to be modified to accommodate HBase's table-based design. For example, in all likelihood you won't be serializing your output to an OutputStream object.

I haven't contributed to Pig before, so I wanted to see whether this would benefit the rest of the Pig community and, if so, what next steps I should take (like starting a JIRA) to get the ball rolling.

Thanks
Best regards,
Mike
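The file-centric concern can be illustrated with a small sketch. It assumes a hypothetical record-oriented TupleSink interface (not Pig's actual StoreFunc) in which a store function only receives tuples, so a file-backed sink keeps its OutputStream as a private detail while a table-backed sink writes keyed rows with no stream at all. The in-memory TreeMap only loosely stands in for an HBase table, whose rows are kept sorted by key.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: TupleSink and both implementations are hypothetical, not Pig's StoreFunc.
public class SinkShapes {

    // Record-oriented contract: the sink just receives one tuple at a time.
    interface TupleSink {
        void putNext(List<Object> tuple);
    }

    // File-backed sink: the OutputStream is a private detail of this implementation.
    static final class TextLineSink implements TupleSink {
        private final OutputStream out;

        TextLineSink(OutputStream out) {
            this.out = out;
        }

        @Override
        public void putNext(List<Object> tuple) {
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < tuple.size(); i++) {
                if (i > 0) {
                    line.append('\t');
                }
                line.append(tuple.get(i));
            }
            line.append('\n');
            try {
                out.write(line.toString().getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // Table-backed sink: no byte stream at all, just rows keyed by field 0.
    static final class InMemoryTableSink implements TupleSink {
        final Map<String, List<Object>> rows = new TreeMap<>();

        @Override
        public void putNext(List<Object> tuple) {
            rows.put(tuple.get(0).toString(), new ArrayList<>(tuple.subList(1, tuple.size())));
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        TupleSink file = new TextLineSink(bytes);
        InMemoryTableSink table = new InMemoryTableSink();
        List<Object> tuple = Arrays.<Object>asList("row1", "Alice", 30);
        file.putNext(tuple);
        table.putNext(tuple);
        System.out.print(new String(bytes.toByteArray(), StandardCharsets.UTF_8));
        System.out.println(table.rows);
    }
}
```

Both sinks accept the same tuple. Only the file-backed one ever touches a stream, which is the separation the table-based case appears to need.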
Re: reading/writing HBase in Pig
Hi Mike,

It would be great to have a StoreFunc for HBase! There is a rewrite underway for the Load/Store stuff that will make that a lot easier; see https://issues.apache.org/jira/browse/PIG-966

You may want to consider writing it for the load-store redesign branch, which is probably what will go into 0.7. The first step would be to open a JIRA and look at the existing StoreFunc implementations.

-D

On Thu, Jan 14, 2010 at 9:59 PM, Michael Dalton mwdal...@gmail.com wrote:
> Hi all, I was looking at the current Pig code in SVN, and it seems like HBase is supported for loading, but not for storing. [...]