[ https://issues.apache.org/jira/browse/HBASE-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313595#comment-14313595 ]

Vladimir Rodionov commented on HBASE-10216:
-------------------------------------------

* Flush the MemStore to HDFS with replication factor 3.
* Compact store files and write the new one with replication factor 1
(locally, hence no network I/O), but keep all the old files, which still have
replication factor 3.
* Periodically change the replication factor of the store file from 1 to 3
(but not very often).

This is not a fully local compaction (there is network I/O when we do the
1 -> 3 expansion for store files), but there is no need to change the HDFS API.

By controlling the frequency of the 1 -> 3 expansion we can control the
network overhead, as the sketch below shows.  This approach makes everything
a little more complex, of course.
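
A minimal sketch of the scheme against the stock FileSystem API
(setReplication() is an existing HDFS call; the class and file names below
are just illustration):

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class ReplicationTrickSketch {

  // Write the compacted store file with replication factor 1: HDFS places
  // the single replica on the local data node, so the write itself does
  // no network I/O.
  static Path writeCompacted(FileSystem fs, Path dir) throws IOException {
    Path out = new Path(dir, "compacted.tmp"); // hypothetical file name
    try (FSDataOutputStream os = fs.create(out, (short) 1)) {
      // ... write the merged HFile bytes here ...
    }
    return out;
  }

  // Later, and not too often, pay the deferred network cost: HDFS
  // re-replicates the blocks in the background to reach factor 3.
  static void expand(FileSystem fs, Path file) throws IOException {
    fs.setReplication(file, (short) 3);
  }
}
{code}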

> Change HBase to support local compactions
> -----------------------------------------
>
>                 Key: HBASE-10216
>                 URL: https://issues.apache.org/jira/browse/HBASE-10216
>             Project: HBase
>          Issue Type: New Feature
>          Components: Compaction
>         Environment: All
>            Reporter: David Witten
>
> As I understand it, compactions read data from DFS and write back to DFS.
> This means that even when the reading occurs on the local host (because the
> region server has a local copy), all the writing must go over the network to
> the other replicas.  This proposal suggests that HBase would perform much
> better if all the reading and writing occurred locally and did not go over
> the network.
> I propose that the DFS interface be extended to provide a method that would
> merge files, so that the merging and deleting can be performed on the local
> data nodes with no file contents moving over the network.  The method would
> take a list of paths to be merged and deleted, the merged file path, and an
> indication of a file-format-aware class that would be run on each data node
> to perform the merge.  The merge method provided by this merging class would
> be passed files open for reading for all the files to be merged and one file
> open for writing.  The merge method of the custom class would read all the
> input files and append to the output file using some standard API that would
> work across all DFS implementations.  The DFS would ensure that the merge had
> happened properly on all replicas before returning to the caller.  Greater
> resiliency might be achieved by implementing the deletion as a separate phase
> that is only done after enough of the replicas have completed the merge.
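> A sketch of what the proposed extension might look like (every name below is
> hypothetical; no such API exists in DFS today):
> {code:java}
> import java.io.IOException;
> import java.io.InputStream;
> import java.io.OutputStream;
> import java.util.List;
> import org.apache.hadoop.fs.Path;
>
> // Hypothetical format-aware merger that the DFS would run on each data
> // node.  It only sees plain streams, so the same implementation would
> // work across DFS implementations.
> interface FileMerger {
>   void merge(List<InputStream> inputs, OutputStream output) throws IOException;
> }
>
> // Hypothetical extension to the DFS interface.  The DFS would run the
> // given merger on every replica-holding data node and return only once
> // all replicas hold the merged file; deleting the inputs could be a
> // separate, later phase.
> interface MergingFileSystem {
>   void mergeFiles(List<Path> toMergeAndDelete, Path merged,
>                   Class<? extends FileMerger> mergerClass) throws IOException;
> }
> {code}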
> HBase would be changed to use the new merge method for compactions, and would 
> provide an implementation of the merging class that works with HFiles.
> This proposal would require custom code that understands the file format to
> be runnable on the data nodes to manage the merge.  So there would need to be
> a facility to load classes into the DFS if there isn't such a facility
> already.  Or, less generally, HDFS could build in support for HFile merging.
> The merge method might be optional.  If the DFS implementation did not
> provide it, a generic version that performed the merge on top of the regular
> DFS interfaces would be used.
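> For illustration, the generic version could run the same (hypothetical)
> FileMerger from the sketch above on the client side, over ordinary DFS
> streams, paying the usual network cost:
> {code:java}
> import java.io.IOException;
> import java.io.InputStream;
> import java.io.OutputStream;
> import java.util.ArrayList;
> import java.util.List;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> class GenericMergeFallback {
>   // Fallback when the DFS lacks mergeFiles(): merge through the regular
>   // client API.  Correct on any DFS, but every byte crosses the network,
>   // just as compactions do today.
>   static void mergeFiles(FileSystem fs, List<Path> inputs, Path merged,
>                          FileMerger merger) throws IOException {
>     List<InputStream> streams = new ArrayList<>();
>     try (OutputStream out = fs.create(merged)) {
>       for (Path p : inputs) {
>         streams.add(fs.open(p));
>       }
>       merger.merge(streams, out);
>     } finally {
>       for (InputStream in : streams) {
>         in.close();
>       }
>     }
>   }
> }
> {code}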
> It may be that this method needs to be tweaked or ignored when the region
> server does not have a local copy of the data, so that, as happens currently,
> one copy of the data moves to the region server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
