[jira] [Commented] (HBASE-5509) MR based copier for copying HFiles (trunk version)

2012-06-19 Thread Karthik Ranganathan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397163#comment-13397163
 ] 

Karthik Ranganathan commented on HBASE-5509:


I know :) but I don't get the reason though. Going to put in a couple more 
comments, but if it's a no go - then oh well.

> MR based copier for copying HFiles (trunk version)
> --------------------------------------------------
>
> Key: HBASE-5509
> URL: https://issues.apache.org/jira/browse/HBASE-5509
> Project: HBase
> Issue Type: Sub-task
> Components: documentation, regionserver
> Reporter: Karthik Ranganathan
> Assignee: Lars Hofhansl
> Attachments: 5509-v2.txt, 5509.txt
>
>
> This copier is a modification of the distcp tool in HDFS. It does the 
> following:
> 1. List out all the regions in the HBase cluster for the required table
> 2. Write the above out to a file
> 3. Each mapper 
>    3.1 lists all the HFiles for a given region by querying the regionserver
>    3.2 copies all the HFiles
>    3.3 outputs success if the copy succeeded, failure otherwise. Failed 
>        regions are retried in another loop
> 4. Mappers are placed on nodes which have maximum locality for a given region 
>    to speed up copying
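
For readers skimming the thread, the copy-then-retry flow in the issue 
description (steps 3.1 to 3.3) can be roughly sketched as below. This is only 
an illustration under assumptions: RegionCopySketch, copyWithRetries, 
copyRegion and maxPasses are hypothetical names, not taken from the patch, and 
the Predicate stands in for a whole mapper (list the region's HFiles, copy 
them, report success or failure).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch; none of these names come from the HBASE-5509 patch.
final class RegionCopySketch {

    /**
     * Runs copyRegion over every region, then retries only the regions that
     * failed, for at most maxPasses passes in total.
     * Returns the regions that are still failing after the last pass.
     */
    static List<String> copyWithRetries(List<String> regions,
                                        Predicate<String> copyRegion,
                                        int maxPasses) {
        List<String> pending = new ArrayList<>(regions);
        for (int pass = 0; pass < maxPasses && !pending.isEmpty(); pass++) {
            List<String> failed = new ArrayList<>();
            for (String region : pending) {
                // Stands in for one mapper: list the region's HFiles, copy
                // them, and report success (true) or failure (false).
                if (!copyRegion.test(region)) {
                    failed.add(region);
                }
            }
            // Only failed regions go into the next loop.
            pending = failed;
        }
        return pending;
    }

    public static void main(String[] args) {
        // Simulated copier: region "r2" fails on its first attempt only.
        int[] r2Attempts = {0};
        Predicate<String> flakyCopy =
            region -> !region.equals("r2") || ++r2Attempts[0] > 1;

        List<String> stillFailed =
            copyWithRetries(Arrays.asList("r1", "r2", "r3"), flakyCopy, 3);
        System.out.println("Regions still failing: " + stillFailed);
    }
}
```

In the real tool the per-region work runs as MapReduce tasks placed for 
locality (step 4); the sketch only shows the bookkeeping of which regions get 
retried.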

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5509) MR based copier for copying HFiles (trunk version)

2012-06-19 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396585#comment-13396585
 ] 

Lars Hofhansl commented on HBASE-5509:
--

bq. otherwise you'd have to wait for native hardlinks

Looks like that is not going to happen (HDFS-3370).





[jira] [Commented] (HBASE-5509) MR based copier for copying HFiles (trunk version)

2012-06-18 Thread Karthik Ranganathan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396510#comment-13396510
 ] 

Karthik Ranganathan commented on HBASE-5509:


@Lars - I ripped out some code which used the hardlinking - we have implemented 
it internally. I believe we are planning on open-sourcing this, otherwise you'd 
have to wait for native hardlinks. The current copy approach still works though 
for a few tens of TBs.





[jira] [Commented] (HBASE-5509) MR based copier for copying HFiles (trunk version)

2012-03-15 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230810#comment-13230810
 ] 

Lars Hofhansl commented on HBASE-5509:
--

Ran a first few tests on a table with 16 regions and two column families. 
(small cluster with 6 datanodes/regionservers)
Snapshot took about twice as long as distcp of the same data (which 
surprised me).
Interestingly, distcp used 48 mappers while Snapshot used 16 mappers.

> MR based copier for copying HFiles (trunk version)
> --------------------------------------------------
>
> Key: HBASE-5509
> URL: https://issues.apache.org/jira/browse/HBASE-5509
> Project: HBase
> Issue Type: Sub-task
> Components: documentation, regionserver
> Reporter: Karthik Ranganathan
> Assignee: Lars Hofhansl
> Fix For: 0.94.0, 0.96.0
>
> Attachments: 5509-v2.txt, 5509.txt
>
>
> This copier is a modification of the distcp tool in HDFS. It does the 
> following:
> 1. List out all the regions in the HBase cluster for the required table
> 2. Write the above out to a file
> 3. Each mapper 
>    3.1 lists all the HFiles for a given region by querying the regionserver
>    3.2 copies all the HFiles
>    3.3 outputs success if the copy succeeded, failure otherwise. Failed 
>        regions are retried in another loop
> 4. Mappers are placed on nodes which have maximum locality for a given region 
>    to speed up copying





[jira] [Commented] (HBASE-5509) MR based copier for copying HFiles (trunk version)

2012-03-09 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226617#comment-13226617
 ] 

stack commented on HBASE-5509:
--

I added some.





[jira] [Commented] (HBASE-5509) MR based copier for copying HFiles (trunk version)

2012-03-08 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225580#comment-13225580
 ] 

Zhihong Yu commented on HBASE-5509:
---

I put a few comments on RB.





[jira] [Commented] (HBASE-5509) MR based copier for copying HFiles (trunk version)

2012-03-08 Thread Jesse Yates (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225559#comment-13225559
 ] 

Jesse Yates commented on HBASE-5509:


Looking at it today - should have my comments in by COB.





[jira] [Commented] (HBASE-5509) MR based copier for copying HFiles (trunk version)

2012-03-08 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225539#comment-13225539
 ] 

Lars Hofhansl commented on HBASE-5509:
--

Any takers for a review?
I assume +1 from the FB guys (right Karthik?) :)






[jira] [Commented] (HBASE-5509) MR based copier for copying HFiles (trunk version)

2012-03-06 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224075#comment-13224075
 ] 

Lars Hofhansl commented on HBASE-5509:
--

Created https://reviews.apache.org/r/4218/ for better commenting.
Let's get this thing into trunk. And maybe 0.94 :)






[jira] [Commented] (HBASE-5509) MR based copier for copying HFiles (trunk version)

2012-03-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13223605#comment-13223605
 ] 

Zhihong Yu commented on HBASE-5509:
---

I agree about the point w.r.t. getStoreFileList()





[jira] [Commented] (HBASE-5509) MR based copier for copying HFiles (trunk version)

2012-03-06 Thread Karthik Ranganathan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13223598#comment-13223598
 ] 

Karthik Ranganathan commented on HBASE-5509:


@Zhihong Yu:
We use this code as the primary means to back up HFiles inside FB. We have made 
a lot of improvements to the DFS copy underneath, and they have caused some 
bugs, but that's unrelated to this code. Not too many issues, besides tuning the 
number of mappers to use so that we don't overwhelm a running system.

@Lars:
You are correct about getStoreFileList() - it is passed from the command line 
and it is overloaded for a subset of/all CFs. Zhihong - the list versus a 
comma-separated string is a trivial point since the list construction has to 
happen either in the RS or in the caller, so it should not make much of a 
difference practically.





[jira] [Commented] (HBASE-5509) MR based copier for copying HFiles (trunk version)

2012-03-03 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221796#comment-13221796
 ] 

Lars Hofhansl commented on HBASE-5509:
--

bq. For getStoreFileList():
{code}
+   * @param families
+   *  a comma separated list of column families for which we need to
{code}
I think List may be better data type for families parameter. 
This would make this method more general in that 

Families is taken from the command line; it might be (1) an indicator for all 
CFs, or  (2) a list of specific CFs. In the former case we cannot get the list 
of CFs before we know the RegionServer, which only happens later.

Again, this is not my code. I can refactor and have extra command line flags 
and extra code for this.
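
To make the two cases concrete, here is a minimal sketch of how a caller might 
turn the command-line value into either "all CFs, resolve later" or an 
explicit list. It is not the patch's code: FamiliesArgSketch, parseFamilies 
and the allMarker parameter are hypothetical, and the thread does not say what 
the actual "all CFs" indicator looks like.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical sketch; the real argument handling in the patch may differ.
final class FamiliesArgSketch {

    /**
     * Parses the families argument from the command line.
     * Returns Optional.empty() when the argument means "all column families";
     * in that case the concrete list can only be resolved later, once the
     * RegionServer is known. Otherwise returns the explicit list.
     * allMarker is an assumed sentinel value, not taken from the patch.
     */
    static Optional<List<String>> parseFamilies(String arg, String allMarker) {
        if (arg == null || arg.trim().equals(allMarker)) {
            return Optional.empty();
        }
        List<String> families = Arrays.stream(arg.split(","))
            .map(String::trim)
            .filter(name -> !name.isEmpty())
            .collect(Collectors.toList());
        return Optional.of(families);
    }

    public static void main(String[] args) {
        System.out.println(parseFamilies("cf1, cf2", "*"));
        System.out.println(parseFamilies("*", "*"));
    }
}
```

Either representation works for the RPC itself; the sketch just shows that the 
string has to be split somewhere, in the caller or in the RS.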

> MR based copier for copying HFiles (trunk version)
> --------------------------------------------------
>
> Key: HBASE-5509
> URL: https://issues.apache.org/jira/browse/HBASE-5509
> Project: HBase
> Issue Type: Sub-task
> Components: documentation, regionserver
> Reporter: Karthik Ranganathan
> Assignee: Lars Hofhansl
> Fix For: 0.94.0, 0.96.0
>
> Attachments: 5509.txt
>
>
> This copier is a modification of the distcp tool in HDFS. It does the 
> following:
> 1. List out all the regions in the HBase cluster for the required table
> 2. Write the above out to a file
> 3. Each mapper 
>    3.1 lists all the HFiles for a given region by querying the regionserver
>    3.2 copies all the HFiles
>    3.3 outputs success if the copy succeeded, failure otherwise. Failed 
>        regions are retried in another loop
> 4. Mappers are placed on nodes which have maximum locality for a given region 
>    to speed up copying





[jira] [Commented] (HBASE-5509) MR based copier for copying HFiles (trunk version)

2012-03-03 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221789#comment-13221789
 ] 

Lars Hofhansl commented on HBASE-5509:
--

bq. Is it possible to make the src and dst comply to same data type ? Either 
FileStatus or Path.

It is. It means the code in SnapshotMR would be slightly less readable. On the 
other hand we'd do fewer RPCs to get the file system status. It also requires 
sameFile to be package private, otherwise we need to double check the file 
here. I'll do that and then we can decide.






[jira] [Commented] (HBASE-5509) MR based copier for copying HFiles (trunk version)

2012-03-03 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221787#comment-13221787
 ] 

Lars Hofhansl commented on HBASE-5509:
--

bq. For sameFile(), I think false should be returned for dest file in the 
following case:{code}
+  //return true if checksum is not supported
+  //(i.e. some of the checksums is null)
{code}

Not sure I agree with that. The method comment says that if either FS does not 
support checksums, that check is ignored. I.e. if none of the earlier tests 
(like size comparison) flagged the files as different, they are considered 
equal.
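
A rough sketch of the semantics being described, using plain values instead of 
Hadoop types so it stays self-contained. This is not the sameFile() from the 
patch (which takes FileSystem, FileStatus and Path arguments); SameFileSketch 
and its parameters are made up for illustration, and a null checksum is 
assumed to model "this FS does not support checksums".

```java
import java.util.Arrays;

// Hypothetical sketch; not the sameFile() implementation from the patch.
final class SameFileSketch {

    /**
     * Cheap checks first (length), then checksums.
     * A null checksum models a file system without checksum support: the
     * checksum check is then ignored and the earlier checks decide.
     */
    static boolean sameFile(long srcLen, byte[] srcChecksum,
                            long dstLen, byte[] dstChecksum,
                            boolean skipCrcCheck) {
        if (srcLen != dstLen) {
            return false; // an earlier test flagged the files as different
        }
        if (skipCrcCheck || srcChecksum == null || dstChecksum == null) {
            return true; // checksum check skipped or unsupported
        }
        return Arrays.equals(srcChecksum, dstChecksum);
    }

    public static void main(String[] args) {
        byte[] crc = {1, 2, 3};
        System.out.println(sameFile(100L, crc, 100L, null, false));
        System.out.println(sameFile(100L, crc, 101L, crc, false));
    }
}
```

The alternative raised in the quoted comment would return false in the 
null-checksum branch, which forces a re-copy whenever checksums are 
unavailable.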





[jira] [Commented] (HBASE-5509) MR based copier for copying HFiles (trunk version)

2012-03-02 Thread Jesse Yates (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221499#comment-13221499
 ] 

Jesse Yates commented on HBASE-5509:


bq. The command options need to be documented better. In fact the argument 
parsing should be improved too.

+1

bq. Generally: Is o.a.h.h.backups the right place to put this? Do we want this 
in core HBase?

IMO, it should definitely be part of core. Think about the most common DBs, 
backup/snapshot is part of the database, as opposed to some other tool that you 
get from somewhere else. We can always break the paradigm, but it seems to fit 
in this case.

bq. 1. do we want to go this route at all 

I think this approach is pretty reasonable. To get 'real' snapshotting, we will 
obviously have to do a bit more work, but this is the right approach to get 
there. Ideally, I should just be able to hook up the region files to another 
cluster and be able to recover/rollback to the previous state. This seems the 
safest and fastest, though it's debatable how much of either, and whether it's 
worth the work at the moment.






[jira] [Commented] (HBASE-5509) MR based copier for copying HFiles (trunk version)

2012-03-02 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221482#comment-13221482
 ] 

Lars Hofhansl commented on HBASE-5509:
--

FileReporter seems like a left-over that was never used. Should be removed.
I'm going to post another patch soon with some cleanups.

The command options need to be documented better. In fact the argument parsing 
should be improved too.

Generally: Is o.a.h.h.backups the right place to put this? Do we want this in 
core HBase?






[jira] [Commented] (HBASE-5509) MR based copier for copying HFiles (trunk version)

2012-03-02 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221478#comment-13221478
 ] 

Zhihong Yu commented on HBASE-5509:
---

I think Karthik may tell us something about the failure scenarios they have 
handled through this approach :-)

SnapshotMR.FileReporter's run() method only sleeps. What purpose does 
FileReporter serve ?
{code}
+   * Map method. Copies one file from source file system to destination.
{code}
The above is inaccurate: every file returned from 
SnapshotUtilities.getStoreFileList() is copied.





[jira] [Commented] (HBASE-5509) MR based copier for copying HFiles (trunk version)

2012-03-02 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221404#comment-13221404
 ] 

Lars Hofhansl commented on HBASE-5509:
--

It's not ready for RB. Note that this *is* the Facebook patch ported to trunk 
with the changes I mentioned. All points Ted mentioned are from the FB patch.

The types of comments I am looking for are: 1. do we want to go this route at 
all, 2. general comments on failure scenarios.
Then I can go and clean up the finer points.






[jira] [Commented] (HBASE-5509) MR based copier for copying HFiles (trunk version)

2012-03-02 Thread Jesse Yates (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221385#comment-13221385
 ] 

Jesse Yates commented on HBASE-5509:


I think it might be time to RB this bad boy; I've got a bunch of comments of my 
own.





[jira] [Commented] (HBASE-5509) MR based copier for copying HFiles (trunk version)

2012-03-02 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221371#comment-13221371
 ] 

Zhihong Yu commented on HBASE-5509:
---

SnapshotUtilities.java is missing the license header and the class javadoc.
{code}
+  public static boolean sameFile(FileSystem srcfs, FileStatus srcstatus,
+  FileSystem dstfs, Path dstpath, boolean skipCRCCheck) throws IOException {
{code}
Is it possible to make src and dst use the same data type? Either 
FileStatus or Path.

For sameFile(), I think false should be returned for the destination file in 
the following case:
{code}
+  //return true if checksum is not supported
+  //(i.e. some of the checksums is null)
{code}
{code}
+  public static Path getPathInTrash(Path path, String hbaseUser,
+  FileSystem srcFileSys) throws IOException {
{code}
I think the FileSystem parameter should be the first parameter of the above 
method.
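
The two suggestions for sameFile() can be sketched together as below. This is a hypothetical illustration using only the JDK, not the Hadoop FileSystem API and not the code in the patch: FileInfo is an invented stand-in that describes source and destination with the same type, and a missing checksum makes the comparison return false instead of true.

```java
import java.util.OptionalLong;

class SameFileSketch {

  /** Hypothetical file description; an empty checksum means "not supported". */
  static final class FileInfo {
    final long length;
    final OptionalLong checksum;

    FileInfo(long length, OptionalLong checksum) {
      this.length = length;
      this.checksum = checksum;
    }
  }

  /** Source and destination are passed as the same type. */
  static boolean sameFile(FileInfo src, FileInfo dst, boolean skipCrcCheck) {
    if (dst == null || src.length != dst.length) {
      return false; // missing destination or different size
    }
    if (skipCrcCheck) {
      return true; // caller explicitly asked for a length-only comparison
    }
    if (!src.checksum.isPresent() || !dst.checksum.isPresent()) {
      return false; // cannot verify, so do not claim the files match
    }
    return src.checksum.getAsLong() == dst.checksum.getAsLong();
  }

  public static void main(String[] args) {
    FileInfo src = new FileInfo(1024L, OptionalLong.of(42L));
    FileInfo noChecksum = new FileInfo(1024L, OptionalLong.empty());
    System.out.println(sameFile(src, noChecksum, false));
  }
}
```

Returning false when the checksum is unavailable means such a file is copied again, which costs time but never leaves an unverified copy treated as good.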





[jira] [Commented] (HBASE-5509) MR based copier for copying HFiles (trunk version)

2012-03-02 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221361#comment-13221361
 ] 

Zhihong Yu commented on HBASE-5509:
---

Right.
The existing isTableInfoExists() has a different signature.

Pardon me.





[jira] [Commented] (HBASE-5509) MR based copier for copying HFiles (trunk version)

2012-03-02 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221356#comment-13221356
 ] 

Lars Hofhansl commented on HBASE-5509:
--

@Ted: This is against trunk.





[jira] [Commented] (HBASE-5509) MR based copier for copying HFiles (trunk version)

2012-03-02 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221152#comment-13221152
 ] 

Zhihong Yu commented on HBASE-5509:
---

The following already exists in FSTableDescriptors.java:
{code}
+  public static boolean isTableInfoExists(FileSystem fs, Path tabledir)
{code}
Can the patch be refreshed against the current TRUNK?





[jira] [Commented] (HBASE-5509) MR based copier for copying HFiles (trunk version)

2012-03-01 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220712#comment-13220712
 ] 

Lars Hofhansl commented on HBASE-5509:
--

This is a work in progress!
