[ https://issues.apache.org/jira/browse/HBASE-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324486#comment-14324486 ]

Andrew Purtell commented on HBASE-10216:
----------------------------------------

{quote}
First, the HDFS project is unlikely to accept the idea or implement it in the
first place. Even in the unlikely event that happens,
[Witten] If there is a significant improvement to HBase, and to other HDFS
clients that do merging (do Parquet or other higher-level storage clients?),
I would think they'd be eager.
{quote}
That is not my expectation at all. I suggest taking this idea to HDFS on an
HDFS JIRA. If they take it up, we have something to discuss; otherwise there's
not much.

> Change HBase to support local compactions
> -----------------------------------------
>
>                 Key: HBASE-10216
>                 URL: https://issues.apache.org/jira/browse/HBASE-10216
>             Project: HBase
>          Issue Type: New Feature
>          Components: Compaction
>         Environment: All
>            Reporter: David Witten
>
> As I understand it, compactions read data from the DFS and write back to the
> DFS. This means that even when the reading occurs on the local host (because
> the region server has a local copy), all the writing must go over the network
> to the other replicas. This proposal suggests that HBase would perform much
> better if all the reading and writing occurred locally and did not go over
> the network.
> I propose that the DFS interface be extended to provide a method that would
> merge files, so that the merging and deleting can be performed on the local
> data nodes with no file contents moving over the network. The method would
> take a list of paths to be merged and deleted, the path of the merged output
> file, and an indication of a file-format-aware class that would be run on
> each data node to perform the merge. The merge method provided by this
> merging class would be passed files open for reading (one for each file to
> be merged) and one file open for writing. The custom class's merge method
> would read all the input files and append to the output file using some
> standard API that would work across all DFS implementations. The DFS would
> ensure that the merge had happened properly on all replicas before returning
> to the caller. Greater resiliency might be achieved by implementing the
> deletion as a separate phase that is only done after enough of the replicas
> had completed the merge.
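>
> To make the shape of this concrete, here is a minimal Java sketch of the
> proposed interface. Everything in it is hypothetical: FileMerger,
> MergingFileSystem, and mergeFiles are invented names, not existing HDFS API.
>
>   import java.io.IOException;
>   import java.io.InputStream;
>   import java.io.OutputStream;
>   import java.util.List;
>   import org.apache.hadoop.fs.FileSystem;
>   import org.apache.hadoop.fs.Path;
>
>   // File-format-aware merger; instances would be created on each data node.
>   interface FileMerger {
>     // Reads every input in full and appends the merged result to the output.
>     void merge(List<InputStream> inputs, OutputStream output) throws IOException;
>   }
>
>   // A DFS that supports local merging would expose something like this.
>   abstract class MergingFileSystem extends FileSystem {
>     // Merges the inputs into mergedPath locally on each data node holding
>     // replicas, then deletes the inputs once enough replicas have completed.
>     abstract void mergeFiles(List<Path> inputs, Path mergedPath,
>         Class<? extends FileMerger> mergerClass) throws IOException;
>   }
>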
> HBase would be changed to use the new merge method for compactions, and would 
> provide an implementation of the merging class that works with HFiles.
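>
> A compaction would then reduce to roughly the following call, assuming the
> sketch above plus a hypothetical HFileMerger (not written out here; it would
> perform a sorted merge of the KeyValues in the input HFiles):
>
>   // Hypothetical compaction call site; all names come from the sketch above.
>   void compactLocally(MergingFileSystem fs, List<Path> storeFiles,
>       Path compactedFile) throws IOException {
>     fs.mergeFiles(storeFiles, compactedFile, HFileMerger.class);
>   }
>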
> This proposal would require custom code that understands the file format to
> be runnable by the data nodes to manage the merge. So there would need to be
> a facility to load classes into the DFS, if there isn't such a facility
> already. Or, less generally, HDFS could build in support for HFile merging.
> The merge method might be optional. If the DFS implementation did not provide
> it, a generic version that performed the merge on top of the regular DFS
> interfaces would be used.
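>
> A minimal sketch of that fallback, using only standard Hadoop FileSystem
> calls (open, create, delete) and the FileMerger interface sketched above;
> note that in this path the file contents still cross the network:
>
>   import java.util.ArrayList;
>
>   // Generic client-side merge for DFS implementations with no native support.
>   static void genericMerge(FileSystem fs, List<Path> inputs, Path merged,
>       FileMerger merger) throws IOException {
>     List<InputStream> streams = new ArrayList<>();
>     try (OutputStream out = fs.create(merged, false /* don't overwrite */)) {
>       for (Path p : inputs) {
>         streams.add(fs.open(p));   // reads may be remote
>       }
>       merger.merge(streams, out);  // writes go over the network to replicas
>     } finally {
>       for (InputStream in : streams) {
>         in.close();
>       }
>     }
>     // Delete the inputs only after the merged file is durably written.
>     for (Path p : inputs) {
>       fs.delete(p, false);
>     }
>   }
>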
> It may be that this method needs to be tweaked or ignored when the region
> server does not have a local copy of the data, so that, as happens currently,
> one copy of the data moves to the region server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
