[ 
https://issues.apache.org/jira/browse/HBASE-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721380#comment-14721380
 ] 

Lars Hofhansl commented on HBASE-13153:
---------------------------------------

Thanks for thinking about this. Generally it might be worth considering doing 
this a layer above HBase. I.e. some code will generate a set of HFiles to be 
bulk loaded. Before the actual bulk load happens we could ship the HFiles to 
the slave cluster and do the bulk loading there as well (just the loading, not 
the generation of the files). This is just a general comment.

bq. Replication module will be one of the BulkLoad Actions Listener, so it will 
get notification about newly added hfiles along with their hdfs paths.

What if that notification is missed, for example because the RS dies just 
then? WAL replication does not have this issue, since it always deals with all 
existing WALs and so cannot miss anything.
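
To illustrate the concern, here is a minimal sketch (not HBase code) of the alternative: if every bulk load is first written to a durable record, the files still to be shipped can be recomputed after a crash instead of depending on a one-shot notification. The class and method names are hypothetical, and the in-memory sets only stand in for durable storage.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: pending work is derived from recorded state,
// so a missed notification cannot lose an HFile.
final class PendingHFileTracker {
    // Stand-ins for durable storage (assumption: the real thing would be persisted).
    private final Set<String> recorded = new LinkedHashSet<>();
    private final Set<String> replicated = new HashSet<>();

    // Called as part of the bulk load itself, before it is considered complete.
    void recordBulkLoad(String hfilePath) {
        recorded.add(hfilePath);
    }

    // Called once the peer cluster has confirmed the file.
    void markReplicated(String hfilePath) {
        replicated.add(hfilePath);
    }

    // Recomputed from scratch each time, e.g. after a region server restart.
    List<String> pending() {
        List<String> result = new ArrayList<>();
        for (String path : recorded) {
            if (!replicated.contains(path)) {
                result.add(path);
            }
        }
        return result;
    }
}
```

The point of the design is that `pending()` reads state rather than consuming events, which is the property WAL replication gets by always looking at all existing WALs.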

bq. HFileReplicationEndPoint will maintain a queue of hfiles. After every 
configurable interval or max request size limit, it will send a RPC request to 
peer cluster RS with all queued entries.

So you'll send the HFiles over RPC? These files can be huge. Can we use HDFS's 
DistCp here?

bq. HFileReplicationEndPoint will maintain a queue of hfiles. After every 
configurable interval or max request size limit, it will send a RPC request to 
peer cluster RS with all queued entries.

Can we simply use the standard bulk load mechanism here? It would split the 
files as necessary. 

bq. The hfile should not get deleted from archive folder until the replication 
is finished.

You'll need to ensure this somehow.
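
One possible way to ensure it, sketched under assumptions: keep, per HFile, the set of peers that have not yet confirmed replication, and let the archive cleaner delete a file only when that set is empty. The names below are hypothetical and the map is in-memory; a real implementation would need to persist this state and hook into whatever cleans the archive folder.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: an archived HFile is deletable only after every peer has acknowledged it.
final class HFileRetention {
    // hfile path -> peers that have not yet confirmed replication
    private final Map<String, Set<String>> pendingPeers = new HashMap<>();

    // Register the peers a bulk-loaded file must reach before it may be deleted.
    void register(String hfilePath, Collection<String> peerIds) {
        if (peerIds.isEmpty()) {
            return; // nothing to wait for
        }
        pendingPeers.computeIfAbsent(hfilePath, k -> new HashSet<>()).addAll(peerIds);
    }

    // Record that one peer has finished loading the file.
    void acknowledge(String hfilePath, String peerId) {
        Set<String> peers = pendingPeers.get(hfilePath);
        if (peers == null) {
            return;
        }
        peers.remove(peerId);
        if (peers.isEmpty()) {
            pendingPeers.remove(hfilePath);
        }
    }

    // The check an archive cleaner would consult before deleting a file.
    boolean isDeletable(String hfilePath) {
        return !pendingPeers.containsKey(hfilePath);
    }
}
```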

bq. Cyclic replication: There will not be any data validation for cyclic case.

That can lead to very tricky issues where the same files just go from cluster 
to cluster in a never-ending cycle. We know at the source that the HFiles came 
from a bulk load, so maybe we can handle that case specially.
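
As a rough sketch of what "handling that specially" could look like: each bulk load event carries the IDs of the clusters it has already passed through, and a cluster never ships the event to a peer that is already in that set. This is an assumption for illustration, not something taken from the design document, and all names are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: break replication cycles by tracking visited cluster IDs.
final class BulkLoadEvent {
    final String hfilePath;
    final Set<String> visitedClusterIds;

    private BulkLoadEvent(String hfilePath, Set<String> visited) {
        this.hfilePath = hfilePath;
        this.visitedClusterIds = Collections.unmodifiableSet(visited);
    }

    // Created on the cluster where the bulk load actually happened.
    static BulkLoadEvent originatedAt(String hfilePath, String clusterId) {
        Set<String> visited = new LinkedHashSet<>();
        visited.add(clusterId);
        return new BulkLoadEvent(hfilePath, visited);
    }

    // A peer that has already seen this event must not receive it again.
    boolean shouldShipTo(String peerClusterId) {
        return !visitedClusterIds.contains(peerClusterId);
    }

    // Called on the receiving cluster before it forwards the event any further.
    BulkLoadEvent receivedAt(String clusterId) {
        Set<String> visited = new LinkedHashSet<>(visitedClusterIds);
        visited.add(clusterId);
        return new BulkLoadEvent(hfilePath, visited);
    }
}
```

With clusters A and B replicating to each other, an event that originated at A and was received at B answers `shouldShipTo("A")` with `false`, so the file does not travel back.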

Lastly, it might generally be a good option to copy HFiles around rather than 
WALs (at least for some setups). Could we use this to do that?


> enable bulkload to support replication
> --------------------------------------
>
>                 Key: HBASE-13153
>                 URL: https://issues.apache.org/jira/browse/HBASE-13153
>             Project: HBase
>          Issue Type: New Feature
>          Components: Replication
>            Reporter: sunhaitao
>            Assignee: Ashish Singhi
>             Fix For: 2.0.0
>
>         Attachments: HBase Bulk Load Replication.pdf
>
>
> Currently we plan to use the HBase Replication feature to deal with a 
> disaster tolerance scenario. But we encounter an issue: we use bulkload very 
> frequently, and because bulkload bypasses the write path and does not 
> generate WAL, the data will not be replicated to the backup cluster. It's 
> inappropriate to bulkload twice, on both the active cluster and the backup 
> cluster. So I advise modifying the bulkload feature to enable bulkload to 
> both the active cluster and the backup cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
