Re: Recycling directories and backup performance. Was: Re: rsync --link-dest won't link even if existing file is out of date (fwd)

2015-04-16 Thread Ken Chase
How do you handle snapshotting? Or do you leave that to the block/fs
virtualization layer?

/kc


On Fri, Apr 17, 2015 at 01:35:27PM +1200, Henri Shustak said:
   Our backup procedures have provision for looking back at previous
   directories, but there is not much to be gained with recycled directories.
   Without recycling, and after a failure, the latest available backup may not
   have much in it.
  
  Just wanted to point out that LBackup has a number of checks in place to 
detect failures during a backup. If this happens, then that backup is not 
labeled as a successful snapshot. 
  
  At present, when the next snapshot is started, the previous incomplete
  snapshot(s) are not used as a link-dest source. As mentioned, this is something
  I have been looking at for a while. However, there are some edge cases which
  need to be handled carefully if you use incomplete backups as a link-dest
  source. I am sure these problems are all tractable; I have simply not spent
  enough time on them.
  

-- 
Ken Chase - k...@heavycomputing.ca skype:kenchase23 +1 416 897 6284 Toronto Canada
Heavy Computing - Clued bandwidth, colocation and managed linux VPS @151 Front St. W.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Recycling directories and backup performance. Was: Re: rsync --link-dest won't link even if existing file is out of date (fwd)

2015-04-16 Thread Henri Shustak
 Our backup procedures have provision for looking back at previous
 directories, but there is not much to be gained with recycled directories.
 Without recycling, and after a failure, the latest available backup may not
 have much in it.

Just wanted to point out that LBackup has a number of checks in place to detect 
failures during a backup. If this happens, then that backup is not labeled as a 
successful snapshot. 

At present, when the next snapshot is started, the previous incomplete
snapshot(s) are not used as a link-dest source. As mentioned, this is something
I have been looking at for a while. However, there are some edge cases which
need to be handled carefully if you use incomplete backups as a link-dest
source. I am sure these problems are all tractable; I have simply not spent
enough time on them.
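
[Editor's sketch] The behaviour described above, linking only against the most
recent successful snapshot and skipping incomplete ones, could look roughly
like the following. This is not LBackup's code. The marker file name, the
directory layout and both function names are assumptions made purely for
illustration; only the rsync options -a and --link-dest are real.

```python
from pathlib import Path
from typing import List, Optional

# Hypothetical marker file: not an LBackup convention, just an assumption
# made for this sketch to tell complete snapshots from incomplete ones.
SUCCESS_MARKER = ".backup_complete"


def latest_successful(backup_root: Path) -> Optional[Path]:
    """Return the newest snapshot directory that carries the success marker.

    Snapshot directory names are assumed to sort chronologically.
    """
    candidates = sorted(
        (d for d in backup_root.iterdir()
         if d.is_dir() and (d / SUCCESS_MARKER).exists()),
        key=lambda d: d.name,
    )
    return candidates[-1] if candidates else None


def build_rsync_argv(source: str, new_snapshot: Path,
                     link_dest: Optional[Path]) -> List[str]:
    """Build (but do not run) an rsync command line for a new snapshot."""
    argv = ["rsync", "-a"]
    if link_dest is not None:
        # An absolute path avoids rsync resolving --link-dest relative
        # to the destination directory.
        argv.append(f"--link-dest={link_dest.resolve()}")
    argv += [source, str(new_snapshot)]
    return argv
```

An incomplete snapshot never gets the marker, so it is never offered to
--link-dest. Using incomplete snapshots as well would mean relaxing the filter
in latest_successful, which is where the edge cases mentioned above would have
to be handled.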

-
This email is protected by LBackup, an open source backup solution.
http://www.lbackup.org






Re: Recycling directories and backup performance. Was: Re: rsync --link-dest won't link even if existing file is out of date (fwd)

2015-04-16 Thread Henri Shustak

 How do you handle snapshotting? or do you leave that to the block/fs 
 virtualization layer?


I am guessing this question is directed at me. 

Firstly, when I used the word snapshot, I was referring to a snapshot in the
LBackup sense, as outlined at http://www.lbackup.org/hard-links. It is not a
file system snapshot (unless you use the scripting subsystem); it is a backup
made at a specific date and time.

Secondly, if you use the scripting subsystem to take an actual file system
snapshot, possibly mount it, and then have LBackup take a snapshot of that,
then the file system virtualization layer is involved.

Apologies for the confusion that using this word has caused. It is simply a
snapshot as described and referenced throughout the LBackup documentation.

Let me know if further clarification is required.




Recycling directories and backup performance. Was: Re: rsync --link-dest won't link even if existing file is out of date (fwd)

2015-04-15 Thread Robert Bell

rsync folks,

Henri Shustak henri.shus...@gmail.com wrote:

LBackup always starts a new backup snapshot with an empty directory. I
have been looking at extending the --link-dest options to scan beyond just
the previous successful backup to failed backups and older backups.
However, there are all kinds of edge cases which are worth considering
with such changes. At present LBackup is focused on reliability; as
such, this R&D is quite slow given limited resources. The current
version of LBackup offers, IMHO, reliable backups of user data, and the
scripting sub-system offers a high degree of flexibility.

We recycle directories in our backup scheme, and in tests it is 3
to 6 times faster than creating a new directory tree and then deleting
an old one.  Your timing will differ: the speed depends on the relative
numbers of files and directories, I'd imagine.
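
[Editor's sketch] The thread does not show the actual recycling procedure, so
the following is only one plausible reading of "recycling": rename the oldest
snapshot directory to become the new target, then let rsync bring it up to
date. The function names and the chronologically sortable directory names are
assumptions; -a and --delete are standard rsync options.

```python
import os
from pathlib import Path
from typing import List


def recycle_oldest(backup_root: Path, new_name: str) -> Path:
    """Rename the oldest snapshot directory so it becomes the new target.

    Assumes snapshot directory names sort chronologically.
    """
    snapshots = sorted(
        (d for d in backup_root.iterdir() if d.is_dir()),
        key=lambda d: d.name,
    )
    if not snapshots:
        raise FileNotFoundError("no snapshot directory to recycle")
    target = backup_root / new_name
    # A rename within one filesystem touches only directory metadata,
    # so the existing tree is reused rather than rebuilt.
    os.rename(snapshots[0], target)
    return target


def recycle_rsync_argv(source: str, target: Path) -> List[str]:
    """Build (but do not run) the rsync command that refreshes the tree."""
    # --delete removes files that have vanished from the source since the
    # recycled tree was written; -a preserves the usual attributes.
    return ["rsync", "-a", "--delete", source, str(target)]
```

The saving would come from skipping both the creation of a fresh tree and the
deletion of an old one. Whether the recycled tree shares hard links with other
snapshots is not stated in the thread; if it does, that interaction deserves
testing before relying on a scheme like this.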

Half our recycled directories are 5 to 6 days old, and our churn rate
is typically only 0.5% of files and 1% of data each day. So, the
recycled directory is usually about 95% right.
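
[Editor's sketch] The "about 95% right" figure can be checked with a rough
estimate. The model below assumes each day's churn hits an independent random
slice of the tree, which is a simplification and not something stated in the
thread; the function name is likewise made up for illustration.

```python
def fraction_still_current(daily_churn: float, age_days: float) -> float:
    """Fraction of a tree unchanged after age_days.

    Assumes every day's churn affects an independent random slice of
    the tree (a modelling assumption, not a measurement).
    """
    return (1.0 - daily_churn) ** age_days


for age in (5, 6):
    by_files = fraction_still_current(0.005, age)  # 0.5% of files per day
    by_data = fraction_still_current(0.01, age)    # 1% of data per day
    print(f"{age} days old: {by_files:.1%} of files, "
          f"{by_data:.1%} of data still current")
```

With the quoted churn rates this gives roughly 97% of files and 94 to 95% of
data still current after 5 to 6 days, in line with the estimate above.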


Our backup procedures have provision for looking back at previous
directories, but there is not much to be gained with recycled
directories.  Without recycling, and after a failure, the latest
available backup may not have much in it, and won't be a good place to
link-dest from; you need to go further back, as Henri is considering.


Yes, every time you start a backup snapshot, a directory is
re-populated from scratch and this takes time with LBackup. However,
if you are seeking reliability then you may wish to check out the
following URL : http://www.lbackup.org


We rarely have a failure with our backups.   If we do, our procedure
just re-labels the unfinished directory, and re-syncs as normal on the
next attempt.

And, in another post, Henri Shustak henri.shus...@gmail.com gave
good advice on splitting up backups, etc to get performance, for someone
whose original post I didn't find.


I'll take a look, but I imagine I can't back up the 80 million files I
need to in under the 5 hours I have for nightly maintenance/backups.
Currently it's possible by recycling directories...



Here are some performance figures from our backup yesterday.  (We have
multiple streams to several filesystems).

The backups completed in 41 minutes.
They transferred:
        80533 files out of      18127734 files available (0.4%)
  57871616693 bytes out of 6827716377557 bytes available (0.8%)
So, with this low churn rate, we could back up 80 million files in
about 3 hours.
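
[Editor's sketch] The 3-hour estimate follows from the figures above if run
time is assumed to scale linearly with the number of files examined at the same
churn rate. That linear scaling is an assumption; the constant and function
names are just for illustration.

```python
FILES_AVAILABLE = 18_127_734  # files examined in the run quoted above
ELAPSED_MINUTES = 41          # wall-clock time of that run


def estimated_hours(total_files: int) -> float:
    """Extrapolate run time, assuming it scales linearly with file count."""
    files_per_minute = FILES_AVAILABLE / ELAPSED_MINUTES
    return total_files / files_per_minute / 60.0


print(f"{estimated_hours(80_000_000):.1f} hours")  # prints "3.0 hours"
```

At about 442,000 files examined per minute, 80 million files take roughly 181
minutes, which fits inside a 5-hour window under these assumptions.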

Our backup target filesystems include SSD and are managed by SGI's DMF.

Hope this helps.

Rob.

Dr Robert C. Bell
HPC National Partnerships | Scientific Computing
Information Management and Technology
CSIRO
T +61 3 9669 8102 Alt +61 3 8601 3810 Mob +61 428 108 333
robert.b...@csiro.au | www.csiro.au | wiki.csiro.au/display/ASC/
Street: CSIRO ASC Level 11, 700 Collins Street, Docklands Vic 3008, Australia
Postal: CSIRO ASC Level 11, GPO Box 1289, Melbourne Vic 3001, Australia
