[jira] [Commented] (HBASE-4655) Document architecture of backups

2012-05-31 Thread Karthik Ranganathan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286809#comment-13286809
 ] 

Karthik Ranganathan commented on HBASE-4655:


I think we should add this doc to the HBase book. The code for this HBase 
backups feature is already done. I think the next step is to implement a simple 
wrapper script, and document that as well.

The tasks are already created; see HBASE-4618 for the list of sub-tasks (tasks 
1, 2, 4 and 6 are done, though 4 still needs to be checked in and closed out).

The next one to look at would be HBASE-4664. Let me add some comments in there 
about what we came up with internally, and then we can go ahead from there.

 Document architecture of backups
 

 Key: HBASE-4655
 URL: https://issues.apache.org/jira/browse/HBASE-4655
 Project: HBase
  Issue Type: Sub-task
  Components: documentation, regionserver
Reporter: Karthik Ranganathan
Assignee: Karthik Ranganathan
 Attachments: HBase Backups Architecture v2.docx, HBase Backups 
 Architecture.docx


 Basic idea behind the backup architecture for HBase

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4655) Document architecture of backups

2012-05-29 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285289#comment-13285289
 ] 

stack commented on HBASE-4655:
--

What should we do with this doc, Karthik? It seems like there is still stuff to 
build out. Should we make issues for what's to be done?





[jira] [Commented] (HBASE-4655) Document architecture of backups

2012-05-23 Thread Karthik Ranganathan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281684#comment-13281684
 ] 

Karthik Ranganathan commented on HBASE-4655:


Marking as resolved. Feel free to send more comments my way in case something 
is not clear.





[jira] [Commented] (HBASE-4655) Document architecture of backups

2011-12-08 Thread Karthik Ranganathan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165436#comment-13165436
 ] 

Karthik Ranganathan commented on HBASE-4655:


@Doug:

> list all the regions, for each region, ask the RS hosting it for a list of 
> HFiles 

There is already an API to get a list of regions and the regionservers hosting 
them. And we added a new API to the RS to list the HFiles for the regions it 
hosts.

> The strategy is great, but it will generate a flurry of (warranted) 
> questions on how the average person does it. 

True, but this task is only to make sure the document is easy for an average 
user to read and understand. We can definitely add more details if needed, 
but that would risk confusing people. I will definitely incorporate the other 
suggestions (confusing names, etc.). The rest of the tasks deal with giving 
average users a way to do backups by running (or cron-ing) a command, without 
having to deal with the internals of how it works.

 Document architecture of backups
 

 Key: HBASE-4655
 URL: https://issues.apache.org/jira/browse/HBASE-4655
 Project: HBase
  Issue Type: Sub-task
  Components: documentation, regionserver
Reporter: Karthik Ranganathan
Assignee: Karthik Ranganathan
 Attachments: HBase Backups Architecture.docx


 Basic idea behind the backup architecture for HBase





[jira] [Commented] (HBASE-4655) Document architecture of backups

2011-11-30 Thread Doug Meil (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160286#comment-13160286
 ] 

Doug Meil commented on HBASE-4655:
--

Hi folks, sorry about the delay in commenting.

I liked the refresher on "why backup?" at the beginning.  

I also found some of the names confusing (e.g., RBU, CBU).

The strategy here in the doc is terrific, but I'd like to see this get a little 
more actionable with specifics. For example, the Stage 1 RBU incremental says to 
list all the regions and, for each region, ask the RS hosting it for a list of 
HFiles. How is this to be done? Using the Java API to list regions? Reading the 
HBase files from HDFS? Presumably the RS hosting the region has to come from 
an online API. The strategy is great, but it will generate a flurry of 
(warranted) questions on how the average person does it.


 





[jira] [Commented] (HBASE-4655) Document architecture of backups

2011-11-16 Thread Doug Meil (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151452#comment-13151452
 ] 

Doug Meil commented on HBASE-4655:
--

I'll gladly port this to the book, and I'd like to add this in here...
http://hbase.apache.org/book.html#ops.backup
... with the existing backup info.





[jira] [Commented] (HBASE-4655) Document architecture of backups

2011-11-16 Thread Karthik Ranganathan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151457#comment-13151457
 ] 

Karthik Ranganathan commented on HBASE-4655:


Sounds great Doug! Maybe we make a new section, keep adding stuff in, and 
deprecate the old stuff? Or whatever works...





[jira] [Commented] (HBASE-4655) Document architecture of backups

2011-11-16 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151508#comment-13151508
 ] 

Todd Lipcon commented on HBASE-4655:


Two quick notes from looking over the doc:

- The names are a little confusing to me: in-cluster backup is actually two 
clusters, right? I'd call your RBU an in-cluster backup, I'd call your CBU an 
in-datacenter backup, and I'd call your DBU a cross-datacenter backup, DR 
backup, or BCP backup.

- For RBU, maybe we can get atomicity in a simpler manner by having the region 
server initiate the copy of HFiles? It can hold the lock to block flushes while 
the copies happen (they're hard-link copies, right?).






[jira] [Commented] (HBASE-4655) Document architecture of backups

2011-11-16 Thread Karthik Ranganathan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151564#comment-13151564
 ] 

Karthik Ranganathan commented on HBASE-4655:


For #1, totally :) Internally, we use the term "cluster" to denote a section of 
the data center (as opposed to the HBase cluster); a data center is composed of 
a number of clusters, hence the name. In-DC and cross-DC work.

For #2, that would make the running cluster stall and not take updates for the 
duration of the copy. It is a fast copy with hard links underneath, but there is 
nothing in the current design that would stop it from being used against a 
remote cluster or a DFS version without hard links. Also, if for some reason 
the hard link fails, it does a deep copy, so it could have longer stalls.
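
The "hard link first, deep copy as fallback" behaviour mentioned above can be illustrated on a local filesystem. This is only an analogy and a minimal sketch: it uses Python's os.link and shutil.copy2, not any HDFS or HBase API, and the function name link_or_copy and the file names are hypothetical.

```python
import os
import shutil
import tempfile


def link_or_copy(src: str, dst: str) -> str:
    """Hard-link src to dst; fall back to a full copy if linking fails.

    Returns "link" or "copy" so the caller can tell which path was taken.
    """
    try:
        os.link(src, dst)  # fast path: no file data is copied
        return "link"
    except OSError:
        # Slow path, e.g. destination on another device or links unsupported.
        shutil.copy2(src, dst)
        return "copy"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "hfile-0001")  # hypothetical file name
        with open(src, "wb") as f:
            f.write(b"example bytes")
        print(link_or_copy(src, os.path.join(tmp, "backup-hfile-0001")))
```

The slow path is the one that matters for the stall concern: a caller holding a lock for the duration of link_or_copy would wait much longer whenever "copy" is returned.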





[jira] [Commented] (HBASE-4655) Document architecture of backups

2011-11-16 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151589#comment-13151589
 ] 

stack commented on HBASE-4655:
--

I echo Todd's #1 remarks.

For '...incremental backups at the Stage 1 (RBU) level', won't the time between 
steps b and d be 'large'? During the copy, the list of files could change on 
you; i.e. when you go to copy a file, it may have been removed because it had 
been compacted. What do you do in this case? (Your list may not include the 
compacted file.)

For 'a. The backups rely on the clocks across the various region-servers for 
determining the point in time to which the edits are re-played': say a server 
is lagging the others by a good bit. When replaying the edits, would you 
replay edits from when this lagging server said the backup began?

How will you know which hlogs to replay? Will you open each one and look at the 
first and last edits in the file? Or should we write out metadata files for 
hlogs? Or is it enough to rely on the HDFS modtime?

Looks great K.







[jira] [Commented] (HBASE-4655) Document architecture of backups

2011-11-16 Thread Karthik Ranganathan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151595#comment-13151595
 ] 

Karthik Ranganathan commented on HBASE-4655:


> For '...incremental backups at the Stage 1 (RBU) level', won't the time 
> between steps b and d be 'large'? During the copy, the list of files could 
> change on you; i.e. when you go to copy a file, it may have been removed 
> because it had been compacted. What do you do in this case? (Your list may 
> not include the compacted file.) 

We look for the deleted files in .Trash and reclaim them. If they are not 
present, we fail the backup for the region. The backup job runs in loops: the 
first loop starts out with all regions. The failed regions are output, and the 
second loop works only on the failed regions. The number of loops is 
configurable; the default is 5.
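
The looping behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the actual backup job: run_backup_loops and the backup_region callback are hypothetical names, and the real job runs as MapReduce rather than as a simple in-process loop.

```python
from typing import Callable, Iterable, List


def run_backup_loops(
    regions: Iterable[str],
    backup_region: Callable[[str], bool],
    max_loops: int = 5,
) -> List[str]:
    """Back up regions, retrying only the failures on each later loop.

    backup_region returns True on success and False on failure.
    Returns the regions that still failed after max_loops passes.
    """
    pending = list(regions)
    for _ in range(max_loops):
        if not pending:
            break
        # Keep only the regions whose backup failed in this pass.
        pending = [r for r in pending if not backup_region(r)]
    return pending
```

Anything left in the returned list after the last loop would need to be reported as a failed backup for that region.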


> For 'a. The backups rely on the clocks across the various region-servers for 
> determining the point in time to which the edits are re-played': say a 
> server is lagging the others by a good bit. When replaying the edits, would 
> you replay edits from when this lagging server said the backup began? 

No, right now we just subtract a configurable amount of time (say 5 mins) from 
the start time of the MR job to keep things simple. We could totally do what 
you say as an enhancement.
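
As a small worked example of that subtraction, assuming the result is used as the point in time for the backup: the function name replay_cutoff and the parameter skew_margin are hypothetical, and the 5-minute default simply mirrors the "say 5 mins" figure above.

```python
from datetime import datetime, timedelta


def replay_cutoff(
    job_start: datetime,
    skew_margin: timedelta = timedelta(minutes=5),
) -> datetime:
    """Return the job start time moved back by a configurable safety margin."""
    return job_start - skew_margin


if __name__ == "__main__":
    # A job started at 12:00 would use 11:55 as its point in time.
    print(replay_cutoff(datetime(2011, 11, 16, 12, 0)))
```

A fixed margin like this only tolerates clock lag up to the margin; a server lagging by more than that would need the per-server approach suggested in the question.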

> How will you know which hlogs to replay? Will you open each one and look at 
> the first and last edits in the file? Or should we write out metadata files 
> for hlogs? Or is it enough to rely on the HDFS modtime? 

The hlog files are named hlog.TIMESTAMP, where TIMESTAMP is the time when the 
log was created. We look at this time to determine the file set. We need all 
files where TIMESTAMP > start time and TIMESTAMP < finish time. We also need 
the latest file where TIMESTAMP < start time.
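
The file-set rule above can be sketched as below. This is an illustration under stated assumptions: the names follow the hlog.TIMESTAMP form described in the comment, timestamps are treated as plain integers, the function names are hypothetical, and the exact boundary handling (strict versus inclusive comparisons) is an assumption. The latest log created before the start time is included because it may still contain edits written after the start time.

```python
from typing import List


def hlog_timestamp(name: str) -> int:
    """Extract TIMESTAMP from a name of the form 'hlog.TIMESTAMP'."""
    return int(name.rsplit(".", 1)[1])


def select_hlogs(names: List[str], start: int, finish: int) -> List[str]:
    """Pick the logs needed to replay edits between start and finish."""
    by_ts = sorted(names, key=hlog_timestamp)
    # Logs created inside the backup window.
    in_window = [n for n in by_ts if start < hlog_timestamp(n) < finish]
    # Logs created at or before the start; only the newest one is needed.
    before = [n for n in by_ts if hlog_timestamp(n) <= start]
    return before[-1:] + in_window
```

For example, with logs created at 100, 200, 300, 400 and 500 and a window from 250 to 450, the sketch selects the logs created at 200, 300 and 400.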






[jira] [Commented] (HBASE-4655) Document architecture of backups

2011-11-16 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151605#comment-13151605
 ] 

stack commented on HBASE-4655:
--

Sounds good Karthik.
