[jira] [Updated] (CRUNCH-127) Allow multiple HBaseTargets in a single pipeline

Micah Whitacre (JIRA) Tue, 18 Dec 2012 14:40:13 -0800

     [ 
https://issues.apache.org/jira/browse/CRUNCH-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Micah Whitacre updated CRUNCH-127:
----------------------------------

    Attachment: CRUNCH-127_itest.patch

I wrote up an itest that I thought would demonstrate writing to two tables 
successfully, but I haven't gotten it to execute successfully. (I do have your 
patch applied locally.)  The message indicates that the Job is failing, but I 
haven't dug into why just yet.

Is that how you anticipated the consumers using the multiple outputs?  Or did I 
do something wrong?  

It'd be nice if we could actually hide the multi-table support from consumers.  
From a consumer API perspective, the way I would hope to use this would be to 
do seemingly independent writes anywhere I want along the pipeline, while 
implementation-wise they would be aggregated to use the HBaseMultiTableTarget 
if necessary.

If there was a method on the ToHBase class like:

{code}
  PCollection<Put> puts = ...;
  ToHBase.write(pipeline, "tableName", puts);
{code}

This would essentially hide the conversion to PTable<ImmutableBytesWritable, 
Put>, which seems to be the same code everywhere.  The difficulty with the above 
is that ToHBase would have to track the internal state of the target and the 
union of the collections.
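
To make the state-tracking concern concrete, here is a rough sketch in plain Java with no Crunch or HBase dependencies. FakePut and MultiTableWriteBuffer are hypothetical stand-ins invented for illustration; they are not part of Crunch or of the attached patch. The sketch only shows the bookkeeping a facade would need: remember which table each write is for, union writes to the same table, and report whether more than one table is involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an HBase Put; only the row key is modeled.
final class FakePut {
    final String rowKey;

    FakePut(String rowKey) {
        this.rowKey = rowKey;
    }
}

// Hypothetical facade state: collects independent writes per table name
// so they can later be handed to a single- or multi-table target.
final class MultiTableWriteBuffer {
    // Insertion-ordered so tables come back in the order they were first written.
    private final Map<String, List<FakePut>> pendingByTable = new LinkedHashMap<>();

    // Record a write; repeated writes to the same table are unioned.
    void write(String tableName, List<FakePut> puts) {
        pendingByTable.computeIfAbsent(tableName, k -> new ArrayList<>()).addAll(puts);
    }

    // True once writes for more than one table have been recorded.
    boolean needsMultiTableTarget() {
        return pendingByTable.size() > 1;
    }

    // Return a read-only snapshot of everything recorded and reset the buffer.
    Map<String, List<FakePut>> drain() {
        Map<String, List<FakePut>> snapshot = new LinkedHashMap<>();
        for (Map.Entry<String, List<FakePut>> e : pendingByTable.entrySet()) {
            snapshot.put(e.getKey(),
                    Collections.unmodifiableList(new ArrayList<>(e.getValue())));
        }
        pendingByTable.clear();
        return snapshot;
    }
}
```

Whether state like this belongs in ToHBase, in HBaseTarget, or in the pipeline planner is the open question; the sketch only illustrates what would have to be tracked.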

Or if this could be hidden behind HBaseTarget itself that would be nice.  Just 
throwing out ideas and will hopefully get some time to play with the 
implementation.

Also, is the intention that a single pipeline would only ever use the 
HBaseMultiTableTarget or HBaseTarget?  Or would it be acceptable to use them 
together?  
                
> Allow multiple HBaseTargets in a single pipeline
> ------------------------------------------------
>
>                 Key: CRUNCH-127
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-127
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Micah Whitacre
>            Assignee: Josh Wills
>         Attachments: CRUNCH-127_itest.patch, CRUNCH-127.patch
>
>
> Currently when a pipeline contains writes to multiple HBaseTargets, all puts 
> are being sent to the first configured HBaseTarget ignoring the second one 
> and causing issues if the columns are not the same.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
