Re: [BackupPC-users] RsyncP problem

2009-12-14 Thread Jeffrey J. Kosowsky
Harald Amtmann wrote at about 19:29:07 +0100 on Monday, December 7, 2009:
  So, for anyone who cares (doesn't seem to be anyone on this list who 
  noticed), I found this post from 2006 stating and analyzing my exact problem:

You are assuming something that is not true...

  
  http://www.topology.org/linux/backuppc.html
  On this site, search for "Design flaw: Avoidable re-transmission of massive 
  amounts of data".
  
  
  For future reference and archiving, I quote here in full:
  
  2006-6-7:
  During the last week while using BackupPC in earnest, I have
  noticed a very serious design flaw which is totally avoidable by
  making a small change to the software. First I will describe the
  flaw with an example.
 
 [details snipped]

 
  The design flaw here is crystal clear. Consider a single file
  home1/xyz.txt. The authors have designed the BackupPC system so that
  the file home1/xyz.txt is sent in full from client1 to server1
  unless 
  
 [details snipped]
  
  The cure for this design flaw is very easy indeed, and it would
  save me several days of saturated LAN bandwidth when I make
  back-ups. It's very sad that the authors did not design the
  software correctly. Here is how the software design flaw can be
  fixed. 

This is an open source project -- rather than repeatedly talking
about serious design flaws in a very workable piece of software (to
which I believe you have contributed nothing), and instead of talking
about how sad it is that the authors didn't correct it, why don't
you stop complaining and code a better version?

I'm sure if you produce a demonstrably better version and test it
under a range of use-cases to validate its robustness that people
would be more than happy to use your fix for this serious design flaw.

And you win a bigger bonus if you do this all using tar or rsync
without the requirement for any client software or any other remotely
executed commands...

  The above design concept would make BackupPC much more efficient
  even under normal circumstances where the variable
  $Conf{RsyncShareName} is unchanging. At present, rsyncd will only
  refrain from sending a file if it is present in the same path in
  the same module in a previous full back-up. If server1 already has
  the same identical file in any other location, the file is sent by
  rsyncd and then discarded after it arrives.

It sounds like you know what you want to do so start coding and stop
complaining...

  If the above serious design flaw is not fixed, it will not do much
  harm to people whose files are rarely changing and rarely
  moving. But if, for example, you move a directory tree from one
  place to another, BackupPC will re-send the whole lot across the
  LAN, and then it will discard the files when they arrive on the
  BackupPC server. This will keep on happening until after you have
  made a full back-up of the files in the new location.  

No one is stopping you from fixing this serious design flaw which
obviously is not keeping the bulk of us users up at night worrying.

And for the record, I don't necessarily disagree with you that there
are things that can be improved but your attitude is going to get you
less than nowhere. Also, the coders are hardly stupid and there are
good reasons for the various tradeoffs they have made that you would
be wise to try to understand before disparaging them and their
software.



Re: [BackupPC-users] RsyncP problem

2009-12-14 Thread Harald Amtmann
 
 And for the record, I don't necessarily disagree with you that there
 are things that can be improved but your attitude is going to get you
 less than nowhere. Also, the coders are hardly stupid and there are
 good reasons for the various tradeoffs they have made that you would
 be wise to try to understand before disparaging them and their
 software.

Hi, I didn't want to sound rude. I think this was my 6th mail regarding this 
problem (5 to this list, 1 personally to Craig). In the first 5 mails I was 
reporting my observations and asking whether what I am seeing is expected 
behaviour or an error on my part, each mail providing more detail as I was 
trying to find the source of the problem. In my personal mail to Craig I 
stated the same question and asked for pointers as to where in RsyncP the 
problem might be, so that I could start working on a fix (if possible). Not a 
single one of the mails got a reply, so I kept looking for an answer myself, 
both in Google and in the source code. This last mail was just me being happy 
that I found out that this is indeed expected behaviour, that I can stop 
looking for problems in my setup, and leaving a record for any future users 
who observe this behaviour.

Regards
Harald







Re: [BackupPC-users] RsyncP problem

2009-12-09 Thread Jeffrey J. Kosowsky
Les Mikesell wrote at about 14:11:12 -0600 on Monday, December 7, 2009:
  It applies to full rsync or rsyncd backups.  An interrupted full should 
  be marked as a 'partial' in your backup summary - and the subsequent 
  full retry should not transfer the completed files again although it 
  will take the time to do a block checksum compare over them.  I don't 
  think it applies to incomplete files, so if you have one huge file that 
  didn't finish I think it would retry from the start.   This and 
  $Conf{IncrLevels} are fairly recent additions - be sure you have a 
  current backuppc version and the code and documentation match.   Even 
  the current version won't find new or moved content if it exists in the 
  pool, though.

Is there any reason the rsync option --partial couldn't be implemented
in perl-File-RsyncP (if not already there)? This would presumably
allow partially transferred files to be resumed. Not sure how hard
it would be, but intuitively I wouldn't think it would be too hard.

This could be important when backing up large files (e.g., video,
databases, isos) and in particular over a slow link.
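
For reference, this is roughly what stock rsync itself offers today; the flags
below are standard rsync options, shown only for comparison -- whether and how
File::RsyncP could grow an equivalent on the BackupPC side is exactly the open
question:

  # Keep partially transferred files and reuse them as the basis for the
  # delta transfer on the next run instead of starting over from zero:
  rsync -a --partial --partial-dir=.rsync-partial /data/ server1:/backup/data/

Since File::RsyncP implements the rsync wire protocol in Perl, any such support
would presumably have to be added there and then wired into BackupPC's own
file handling.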









Re: [BackupPC-users] RsyncP problem

2009-12-07 Thread Harald Amtmann
So, for anyone who cares (doesn't seem to be anyone on this list who noticed), 
I found this post from 2006 stating and analyzing my exact problem:

http://www.topology.org/linux/backuppc.html
On this site, search for "Design flaw: Avoidable re-transmission of massive 
amounts of data".


For future reference and archiving, I quote here in full:

2006-6-7:
During the last week while using BackupPC in earnest, I have noticed a very 
serious design flaw which is totally avoidable by making a small change to the 
software. First I will describe the flaw with an example.

   1. First I back up the rsyncd module home from computer client1 to computer 
server1 using the rsyncd method. This uses the following line in the server1 
config.pl file:

  $Conf{RsyncShareName} = ['home'];

   2. Then I do an incremental back-up of module home from client1 to 
server1. This back-up correctly sends only the changes in the file-system 
module home over the network. So the back-up is very quick.
   3. Now I modify the variable $Conf{RsyncShareName} on server1 as follows:

  $Conf{RsyncShareName} = ['home', 'home1'];

   4. Next, I make an incremental back-up. Naturally, the home module is sent 
very efficiently over the LAN and home1 is sent in full, essentially 
uncompressed. Well, this isn't quite natural. In fact, it's quite avoidable, 
but I'll explain why this is so later.
   5. Now I make a second incremental back-up of home and home1. Since I have 
already backed up these two modules, I expect them both to be very quick. But 
this does not happen. In fact, all of home1 is sent in full over the LAN, which 
in my case takes about 10 hours. This is a real nuisance. This problem occurs 
even if I have this in the config.pl file on server1:

  $Conf{IncrFill} = 1;

   6. Next, I make a full back-up. This sends only the changes to home over the 
LAN, but sends the full contents of home1, uncompressed, over the LAN, even 
though I have already sent this module in full twice.
   7. Now when I make future back-ups, the modules home and home1 are both sent 
efficiently and quickly. 

The design flaw here is crystal clear. Consider a single file home1/xyz.txt. 
The authors have designed the BackupPC system so that the file home1/xyz.txt is 
sent in full from client1 to server1 unless

   1. the file home1/xyz.txt is already on server1 with the identical path in 
the identical module home1, and
   2. the back-up in which home1/xyz.txt exists is a full back-up, not an 
incremental back-up. 

If the above conditions do not both hold, the full file is transmitted by 
rsyncd on client1; then it is discarded by server1 if it is already present on 
server1 in either the same path in an earlier back-up, or in any path at all in 
any other module in any kind of earlier back-up. So the software correctly 
discards duplicate files when they arrive on server1, but they are still 
transmitted anyway.

The cure for this design flaw is very easy indeed, and it would save me several 
days of saturated LAN bandwidth when I make back-ups. It's very sad that the 
authors did not design the software correctly. Here is how the software design 
flaw can be fixed.

   1. When an rsync file-system module module1 is to be transmitted from 
client1 to server1, first transmit the hash (e.g. MD5) of each file from 
client1 to server1. This can be done (a) on a file by file basis, (b) for all 
the files in module1 at the same time, or (c) in bundles of say, a few hundred 
or thousand hashes at a time.
   2. The BackupPC server server1 matches the received file hashes with the 
global hash table of all files on server1, both full back-up files and 
incremental back-up files.
   3. Then server1 requests rsyncd on client1 to only transmit the files which 
are not already present on server1. Notice that the files on server1 do not 
have to be in the same path in the same module on server1 in a full back-up, 
which is the case in the current BackupPC software design.
   4. Then client1 sends only the files which are requested, which are the 
files which are not already present on server1. 

The above design concept would make BackupPC much more efficient even under 
normal circumstances where the variable $Conf{RsyncShareName} is unchanging. At 
present, rsyncd will only refrain from sending a file if it is present in the 
same path in the same module in a previous full back-up. If server1 already has 
the same identical file in any other location, the file is sent by rsyncd and 
then discarded after it arrives.

If the above serious design flaw is not fixed, it will not do much harm to 
people whose files are rarely changing and rarely moving. But if, for example, 
you move a directory tree from one place to another, BackupPC will re-send the 
whole lot across the LAN, and then it will discard the files when they arrive 
on the BackupPC server. This will keep on happening until after you have made a 
full back-up of the files in the new location. 
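
As a rough illustration of the hash-first exchange proposed in steps 1-4 above,
here is a minimal editorial sketch in Perl. It is not BackupPC's or
File::RsyncP's actual code, all of its names are made up, and it deliberately
ignores the objections Les Mikesell raises in his reply below (the rsync
protocol has no such phase, and pool hashes can collide):

  # Editorial sketch, not BackupPC code: the client builds a hash list for a
  # share, and the server answers with the paths it does not already have.
  use strict;
  use warnings;
  use Digest::MD5;
  use File::Find;

  # Client side: map each file's path (relative to the share) to its MD5.
  sub client_hash_list {
      my ($share_dir) = @_;
      my %hashes;
      find(sub {
          return unless -f $_;
          open(my $fh, '<', $_) or return;
          binmode $fh;
          (my $rel = $File::Find::name) =~ s{^\Q$share_dir\E/?}{};
          $hashes{$rel} = Digest::MD5->new->addfile($fh)->hexdigest;
          close $fh;
      }, $share_dir);
      return \%hashes;
  }

  # Server side: $pool_by_hash is a (hypothetical) index of the pool keyed by
  # content hash; only the paths whose hash is unknown need to be transferred.
  sub server_wanted_files {
      my ($client_hashes, $pool_by_hash) = @_;
      return [ grep { !exists $pool_by_hash->{ $client_hashes->{$_} } }
               sort keys %$client_hashes ];
  }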



Re: [BackupPC-users] RsyncP problem

2009-12-07 Thread Les Mikesell
Harald Amtmann wrote:
 So, for anyone who cares (doesn't seem to be anyone on this list who 
 noticed), I found this post from 2006 stating and analyzing my exact problem:
 
 http://www.topology.org/linux/backuppc.html
 On this site, search for "Design flaw: Avoidable re-transmission of massive 
 amounts of data".

It's documented behavior, so not a surprise.

5. Now I make a second incremental back-up of home and home1. Since I have 
 already backed up these two modules, I expect them both to be very quick. But 
 this does not happen. In fact, all of home1 is sent in full over the LAN, 
 which in my case takes about 10 hours. This is a real nuisance. This problem 
 occurs even if I have this in the config.pl file on server1:
   $Conf{IncrFill} = 1;

You have the wrong expectations. Do you have a reasonably current 
version, and did you read the section on $Conf{IncrLevels} in 
http://backuppc.sourceforge.net/faq/BackupPC.html?  You can also just do 
full runs instead of incrementals - they take a long time as the target 
has to read the files to verify the block checksums, but not a lot of 
bandwidth.
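
As a concrete illustration of the option Les refers to (check the linked FAQ
for the exact semantics in your BackupPC version), the default of a single
incremental level makes every incremental relative to the last full, while a
rising sequence makes each incremental relative to the most recent backup of a
lower level:

  # config.pl -- illustration only.  With the default of [1], every
  # incremental is compared against the last full backup; with increasing
  # levels, later incrementals in a cycle are compared against the previous
  # incremental instead, so the same changed data is not re-sent every time.
  $Conf{IncrLevels} = [1, 2, 3, 4, 5, 6];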

 The cure for this design flaw is very easy indeed, and it would save me 
 several days of saturated LAN bandwidth when I make back-ups. It's very sad 
 that the authors did not design the software correctly. Here is how the 
 software design flaw can be fixed.
 
1. When an rsync file-system module module1 is to be transmitted from 
 client1 to server1, first transmit the hash (e.g. MD5) of each file from 
 client1 to server1. This can be done (a) on a file by file basis, (b) for all 
 the files in module1 at the same time, or (c) in bundles of say, a few 
 hundred or thousand hashes at a time.

The rsync binary on the target isn't going to do that.

2. The BackupPC server server1 matches the received file hashes with the 
 global hash table of all files on server1, both full back-up files and 
 incremental back-up files.

Aside from not matching rsync, the file hashes have expected collisions 
that can only be resolved by a full data comparison.  And there's no 
reason to expect all of the files in the pool to have been collected 
with an rsync transfer method.
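
To make the collision point concrete, here is a small editorial sketch (not
BackupPC's actual pooling code): when the pool is keyed by a hash that is not
a strong digest of the full contents, a hash match only narrows the search,
and identity still has to be confirmed by comparing the data itself.

  # Editorial sketch only.  $pool is assumed to map a hash value to a list of
  # candidate pool files that happen to share that hash.
  use strict;
  use warnings;
  use File::Compare qw(compare);

  sub find_in_pool {
      my ($new_file, $hash, $pool) = @_;
      for my $candidate (@{ $pool->{$hash} || [] }) {
          # compare() returns 0 only when the files are byte-for-byte equal.
          return $candidate if compare($new_file, $candidate) == 0;
      }
      return undef;    # no true match: the content really is new
  }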

-- 
   Les Mikesell
lesmikes...@gmail.com




Re: [BackupPC-users] RsyncP problem

2009-12-07 Thread Harald Amtmann

 Original Message 
 Date: Mon, 07 Dec 2009 13:08:52 -0600
 From: Les Mikesell lesmikes...@gmail.com
 To: General list for user discussion, questions and support 
 backuppc-users@lists.sourceforge.net
 Subject: Re: [BackupPC-users] RsyncP problem

 Harald Amtmann wrote:
  So, for anyone who cares (doesn't seem to be anyone on this list who
 noticed), I found this post from 2006 stating and analyzing my exact problem:
  
  http://www.topology.org/linux/backuppc.html
  On this site, search for "Design flaw: Avoidable re-transmission of
 massive amounts of data".
 
 It's documented behavior, so not a surprise.

"With the rsync transfer method the partial backup is used to resume the next 
full backup, avoiding the need to retransfer the file data already in the 
partial backup."

This is also from the docs and doesn't work. I have 40 GB of data and do a 
first full backup. It gets interrupted. I start it again and all data is 
retransmitted. Does the rsync transfer method not include the rsyncd method 
which I am using?




Re: [BackupPC-users] RsyncP problem

2009-12-07 Thread Les Mikesell
Harald Amtmann wrote:
  Original Message 
 Date: Mon, 07 Dec 2009 13:08:52 -0600
 From: Les Mikesell lesmikes...@gmail.com
 To: General list for user discussion, questions and support 
 backuppc-users@lists.sourceforge.net
 Subject: Re: [BackupPC-users] RsyncP problem
 
 Harald Amtmann wrote:
 So, for anyone who cares (doesn't seem to be anyone on this list who
 noticed), I found this post from 2006 stating and analyzing my exact problem:
 http://www.topology.org/linux/backuppc.html
 On this site, search for "Design flaw: Avoidable re-transmission of
 massive amounts of data".

 It's documented behavior, so not a surprise.
 
 With the rsync transfer method the partial backup is used to resume the next 
 full backup, avoiding the need to retransfer the file data already in the 
 partial backup.
 
 This is also from the docs and doesn't work. I have 40 GB of data and do a 
 first full backup. It gets interrupted. I start it again and all data is 
 retransmitted. Does the rsync transfer method not include the rsyncd method 
 which I am using?

It applies to full rsync or rsyncd backups.  An interrupted full should 
be marked as a 'partial' in your backup summary - and the subsequent 
full retry should not transfer the completed files again although it 
will take the time to do a block checksum compare over them.  I don't 
think it applies to incomplete files, so if you have one huge file that 
didn't finish I think it would retry from the start.   This and 
$Conf{IncrLevels} are fairly recent additions - be sure you have a 
current backuppc version and the code and documentation match.   Even 
the current version won't find new or moved content if it exists in the 
pool, though.

-- 
   Les Mikesell
lesmikes...@gmail.com



Re: [BackupPC-users] RsyncP problem

2009-12-07 Thread Harald Amtmann

 $Conf{IncrLevels} are fairly recent additions - be sure you have a 
 current backuppc version and the code and documentation match.   Even 
 the current version won't find new or moved content if it exists in the 
 pool, though.

Are you referring to 3.2.0 beta 1 or 3.1.0 as the recent version? I am using 3.1.0 
from Debian.





Re: [BackupPC-users] RsyncP problem

2009-12-07 Thread Les Mikesell
Harald Amtmann wrote:
 $Conf{IncrLevels} are fairly recent additions - be sure you have a 
 current backuppc version and the code and documentation match.   Even 
 the current version won't find new or moved content if it exists in the 
 pool, though.
 
 Are you referring to 3.2.0 beta 1 or 3.1.0 as the recent version? I am using 
 3.1.0 from Debian.
 
 
 
From the changelog here 
http://sourceforge.net/project/shownotes.php?release_id=673692 I'd say 
the features should be in 3.1.0, but there could have been bugs with 
subsequent fixes.

-- 
   Les Mikesell
lesmikes...@gmail.com
