Re: reading/writing HBase in Pig

2010-01-25 Thread Alan Gates


On Jan 18, 2010, at 10:14 PM, Michael Dalton wrote:

I took a look at the load-store branch and that definitely seems like the
right place to do this. So the right thing to do would be to just open up a
JIRA and then post a patch against the load-store rewrite tree, correct?


Yes.  You should take a look at PIG-1200, which seems to be going part  
way towards doing what you want to do.


Alan.




Re: reading/writing HBase in Pig

2010-01-25 Thread Jeff Zhang
PIG-1200 currently supports only using the InputFormat. The other features,
loading the row key and storing to HBase, are not supported yet. I will
continue the remaining work.



-- 
Best Regards

Jeff Zhang


Re: reading/writing HBase in Pig

2010-01-18 Thread Michael Dalton
I took a look at the load-store branch and that definitely seems like the
right place to do this. So the right thing to do would be to just open up a
JIRA and then post a patch against the load-store rewrite tree, correct?
Also, it seems to me that there's no existing support for row keys, which
should also be fixed. The current HBaseStorage assumes that the user passes
a list of columns (i.e. column family/qualifier pairs). However, users may
encode data in the HBase row key as well -- empty row keys are forbidden, so
there is definitely data there.

Any StoreFunc implementation for HBase will require row key support, as each
Put must have a row key. So it looks like what I'll be doing is modifying
HBaseStorage's LoadFunc support to handle row keys in addition to the
existing support for column values, and then adding StoreFunc support (with
row keys) to HBaseStorage. Just wanted to make sure this sounds good. Thanks.
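
As a rough illustration of the row key point above, here is a minimal sketch
of one possible mapping between a tuple and an HBase-style write. It is not
the HBaseStorage implementation. The names RowKeySketch, RowWrite, toRowWrite
and toTuple are hypothetical, RowWrite is only a stand-in for an HBase Put,
and the convention that field 0 of the tuple carries the row key is an
assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types are Pig or HBase classes.
final class RowKeySketch {

    // Stand-in for an HBase write: one row key plus column -> value cells.
    static final class RowWrite {
        final String rowKey;
        final Map<String, String> cells = new LinkedHashMap<>();

        RowWrite(String rowKey) {
            // Empty row keys are forbidden, so reject them up front.
            if (rowKey == null || rowKey.isEmpty()) {
                throw new IllegalArgumentException("row key must be non-empty");
            }
            this.rowKey = rowKey;
        }
    }

    // Store direction: assumed convention is that field 0 is the row key and
    // fields 1..n line up with the configured "family:qualifier" columns.
    static RowWrite toRowWrite(List<String> tuple, List<String> columns) {
        if (tuple.size() != columns.size() + 1) {
            throw new IllegalArgumentException("expected row key plus one value per column");
        }
        RowWrite write = new RowWrite(tuple.get(0));
        for (int i = 0; i < columns.size(); i++) {
            write.cells.put(columns.get(i), tuple.get(i + 1));
        }
        return write;
    }

    // Load direction: emit the row key as field 0, then the column values.
    static List<String> toTuple(RowWrite row, List<String> columns) {
        List<String> tuple = new ArrayList<>();
        tuple.add(row.rowKey);
        for (String column : columns) {
            tuple.add(row.cells.get(column));
        }
        return tuple;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("info:name", "info:email");
        RowWrite write = toRowWrite(Arrays.asList("user42", "Ann", "ann@example.com"), columns);
        System.out.println(write.rowKey + " " + write.cells);
        System.out.println(toTuple(write, columns));
    }
}
```

Keeping the row key as an ordinary tuple field means the load and store
directions are symmetric: whatever a load emits can be passed back to a store
unchanged.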

Best regards,

Mike


reading/writing HBase in Pig

2010-01-14 Thread Michael Dalton
Hi all,

I was looking at the current Pig code in SVN, and it seems like HBase is
supported for loading, but not for storing. If this is the case, I'd like to
add support for writing to HBase to Pig. Is there anyone else working on
this, and if not, is this something that you'd like contributed? Based on a
cursory evaluation of the StoreFunc interface, it looks like the APIs there
are pretty file-centric and may need to be modified to accommodate HBase's
table-based design. For example, you aren't going to be serializing your
output to an OutputStream object in all likelihood.
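
To make the file-centric versus table-based contrast concrete, here is a
small sketch of the two shapes. The interfaces StreamStorer and RecordSink
and their method names are hypothetical illustrations and are not Pig's
StoreFunc API. The in-memory sink only stands in for a table.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: these interfaces are illustrations, not Pig's StoreFunc.
final class StoreShapeSketch {

    // File-centric shape: the storer is handed a byte stream and serializes into it.
    interface StreamStorer {
        void bindTo(OutputStream out);
        void putNext(List<String> tuple) throws IOException;
    }

    // Table-centric shape: the storer hands whole records to a sink; no byte stream.
    interface RecordSink {
        void write(String rowKey, List<String> values);
    }

    static final class TabDelimitedStorer implements StreamStorer {
        private OutputStream out;

        @Override
        public void bindTo(OutputStream out) {
            this.out = out;
        }

        @Override
        public void putNext(List<String> tuple) throws IOException {
            // Serialize the tuple as one tab-delimited line of UTF-8 bytes.
            out.write((String.join("\t", tuple) + "\n").getBytes(StandardCharsets.UTF_8));
        }
    }

    // In-memory stand-in for a table; a real sink would issue writes to the store.
    static final class InMemoryTableSink implements RecordSink {
        final List<String> rows = new ArrayList<>();

        @Override
        public void write(String rowKey, List<String> values) {
            rows.add(rowKey + " -> " + values);
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        TabDelimitedStorer storer = new TabDelimitedStorer();
        storer.bindTo(file);
        storer.putNext(Arrays.asList("user42", "Ann"));
        System.out.print(file.toString("UTF-8"));

        InMemoryTableSink table = new InMemoryTableSink();
        table.write("user42", Arrays.asList("Ann"));
        System.out.println(table.rows);
    }
}
```

The second shape never sees an OutputStream, which is the mismatch described
above: a table-backed store wants records keyed by row, not serialized bytes.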

I haven't contributed to Pig before, and I wanted to see if this is
something that would be beneficial to the rest of the Pig community, and if
so what next steps I should take (like starting a JIRA) to get the ball
rolling. Thanks

Best regards,

Mike


Re: reading/writing HBase in Pig

2010-01-14 Thread Dmitriy Ryaboy
Hi Mike,
It would be great to have a StoreFunc for HBase!
There is a rewrite underway for the Load/Store stuff that will make
that a lot easier -- see https://issues.apache.org/jira/browse/PIG-966.
You may want to consider writing it for the load-store redesign
branch. This is what's probably going to be in 0.7. The first step
would be to open a JIRA and look at the existing StoreFunc
implementations.

-D
