Re: [BackupPC-users] An idea to fix both SIGPIPE and memory issues with rsync

2009-12-14 Thread Les Mikesell
Robin Lee Powell wrote:
> I've only looked at the code briefly, but I believe this *should* be
> possible.  I don't know if I'll be implementing it, at least not
> right away, but it shouldn't actually be that hard, so I wanted to
> throw it out so someone else could run with it if ey wants.
> 
> It's an idea I had about rsync resumption:
> 
> Keep an array of all the things you haven't backed up yet, starting
> with the initial arguments; let's say we're transferring "/a" and
> "/b" from the remote machine.
> 
> Start by putting "a/" and "b/" in the array.  Then get the directory
> listing for a/, and replace "a/" in the array with "a/d", "a/e", ...
> for all files and directories in there.  When each file is
> transferred, it gets removed.  Directories are replaced with their
> contents.
> 
> If the transfer breaks, you can resume with that list of
> things-what-still-need-transferring/recursing-through without having
> to walk the parts of the tree you've already walked.
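
A minimal sketch of the work-queue idea quoted above, in Python; the
state-file name, the JSON checkpoint format, and the transfer_file()
placeholder are illustrative assumptions, not anything in BackupPC or rsync:

    import json
    import os

    STATE = "pending.json"   # hypothetical checkpoint file

    def transfer_file(path):
        """Placeholder for the real per-file transfer step."""
        pass

    def backup(roots):
        # Resume from a saved work queue if one exists, otherwise start fresh.
        if os.path.exists(STATE):
            with open(STATE) as f:
                pending = json.load(f)
        else:
            pending = [r.rstrip("/") + "/" if os.path.isdir(r) else r
                       for r in roots]          # e.g. ["a/", "b/"]

        while pending:
            item = pending.pop(0)
            if item.endswith("/"):
                # A directory is replaced by its contents, as in the proposal.
                for name in sorted(os.listdir(item)):
                    child = os.path.join(item, name)
                    pending.append((child + "/") if os.path.isdir(child)
                                   else child)
            else:
                transfer_file(item)
            # Checkpoint after every step so an interrupted run can resume
            # here instead of re-walking the parts already finished.
            with open(STATE, "w") as f:
                json.dump(pending, f)

        os.remove(STATE)

Directories that change after they have been expanded would simply be picked
up on the next run, which is exactly the trade-off discussed in the replies
below.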

Directories aren't static things.  If you don't complete a run, you would still 
need to re-walk the whole tree comparing for changes.

You can, however, explicitly break the runs at top-level directory boundaries 
and mount points if you have a problem with the size.

-- 
   Les Mikesell
lesmikes...@gmail.com


[BackupPC-users] Cannot start backuppc

2009-12-14 Thread Robert J. Phillips
I figured out the problem.  When the folder got re-created, the user and
group for all the folders were root.  I changed them to backuppc and
everything seems to have started up.

 

Thanks for the assistance.

 

Re: [BackupPC-users] An idea to fix both SIGPIPE and memory issues with rsync

2009-12-14 Thread Robin Lee Powell
On Sun, Dec 13, 2009 at 11:56:59PM -0500, Jeffrey J. Kosowsky wrote:
> Unfortunately, I don't think it is that simple. If it were, then
> rsync would have been written that way back in version .001. I
> mean there is a reason that rsync memory usage increases as the
> number of files increases (even in 3.0) and it is not due to
> memory holes or ignorant programmers. After all, your proposed fix
> is not exactly obscure.

The point is actually to fix resumption, not memory usage; memory
usage is already quite good in 3.0.  Resumption was never a goal of
rsync as far as I'm aware; it's handled by re-reading the whole tree
and dealing with partial copies.

> At least one reason is the need to keep track of inodes so that
> hard links can be copied properly. 

Yes, sorry, I forgot to mention that: hard links probably won't work
this way.  Although BackupPC handles them rather differently, since
it's only rsync-the-program on one end.

> Maybe you don't care but if so, you could probably do just about
> as well by dropping the --hard-links argument from RsyncArgs.

Already done; makes no significant difference.

> I don't believe there is any easy way to get something for free
> here...

It's not for free, it's at the expense of memory on the server.
Probably not very much, but still.

Do you actually see a *problem* with it, or are you just assuming it
won't work because it seems too easy?

-Robin

-- 
They say:  "The first AIs will be built by the military as weapons."
And I'm  thinking:  "Does it even occur to you to try for something
other  than  the default  outcome?"  See http://shrunklink.com/cdiz
http://www.digitalkingdom.org/~rlpowell/ *** http://www.lojban.org/

Re: [BackupPC-users] An idea to fix both SIGPIPE and memory issues with rsync

2009-12-14 Thread Robin Lee Powell
On Mon, Dec 14, 2009 at 07:57:10AM -0600, Les Mikesell wrote:
> Robin Lee Powell wrote:
> > I've only looked at the code briefly, but I believe this
> > *should* be possible.  I don't know if I'll be implementing it,
> > at least not right away, but it shouldn't actually be that hard,
> > so I wanted to throw it out so someone else could run with it if
> > ey wants.
> > 
> > It's an idea I had about rsync resumption:
> > 
> > Keep an array of all the things you haven't backed up yet,
> > starting with the initial arguments; let's say we're transferring
> > "/a" and "/b" from the remote machine.
> > 
> > Start by putting "a/" and "b/" in the array.  Then get the
> > directory listing for a/, and replace "a/" in the array with
> > "a/d", "a/e", ... for all files and directories in there.  When
> > each file is transferred, it gets removed.  Directories are
> > replaced with their contents.
> > 
> > If the transfer breaks, you can resume with that list of
> > things-what-still-need-transferring/recursing-through without
> > having to walk the parts of the tree you've already walked.
> 
> Directories aren't static things.  If you don't complete a run,
> you would still need to re-walk the whole tree comparing for
> changes.

Why?  The point here would be to explicitly declare "I don't care
about directories that changed since I passed them on this
particular backup run; they'll get caught on the next backup run".

> You can, however, explicitly break the runs at top-level directory
> boundaries and mount points if you have a problem with the size.

That doesn't always work; it certainly doesn't work in my case.
Millions of files scattered unevenly around a single file system; I
don't even know where the concentrations are because it takes so
long to run du/find on this filesystem, and it degrades performance
in a way that makes the client upset.

-Robin

-- 
They say:  "The first AIs will be built by the military as weapons."
And I'm  thinking:  "Does it even occur to you to try for something
other  than  the default  outcome?"  See http://shrunklink.com/cdiz
http://www.digitalkingdom.org/~rlpowell/ *** http://www.lojban.org/

[BackupPC-users] Consecutive BackupPC_archiveStart calling for the same archive host is not executing

2009-12-14 Thread Jose Torres
Hello!

I am executing the following consecutive BackupPC commands in a bash script:

/scriptpath/BackupPC_archiveStart localhost backuppc server1

/scriptpath/BackupPC_archiveStart localhost backuppc server2

/scriptpath/BackupPC_archiveStart localhost backuppc server3

localhost is the archive host, and server1, server2, and server3 are hosts
that were backed up prior to executing this script.

I could instead have done:

/scriptpath/BackupPC_archiveStart localhost backuppc server1 server2 server3

But I want to execute an ArchivePostUserCmd for each server, so I have to do
the archive for each host individually.

The problem is that each request appears in the BackupPC log, but only one
(the first request) is executed, so I end up with an archive of only
server1's backups.

Any light on this will be appreciated.


Re: [BackupPC-users] An idea to fix both SIGPIPE and memory issues with rsync

2009-12-14 Thread Jeffrey J. Kosowsky
Robin Lee Powell wrote at about 10:10:17 -0800 on Monday, December 14, 2009:
 > Do you actually see a *problem* with it, or are you just assuming it
 > won't work because it seems too easy?

The problem I see is that BackupPC won't be able to back up hard links
on any interrupted or sub-divided backup unless you are careful to
make sure that no hard links span multiple restarts. And once you mess
up the hard links for a file, all subsequent incrementals will be
unlinked too.
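
For illustration, a small Python sketch of why that is: an rsync-style scan
recognizes hard links by grouping paths that share an inode, which only works
if every link in a group is seen in the same pass (the function here is a
hypothetical example, not rsync or BackupPC code):

    import os

    def find_hard_link_groups(paths):
        """Group paths that share a (device, inode) pair."""
        by_inode = {}
        for path in paths:
            st = os.lstat(path)
            if st.st_nlink > 1:
                by_inode.setdefault((st.st_dev, st.st_ino), []).append(path)
        # Each group with more than one member is a set of hard links the
        # receiver must recreate as links rather than as independent copies.
        return [g for g in by_inode.values() if len(g) > 1]

If a backup is split or interrupted so that only one member of a group
appears in a given run, the receiver sees an ordinary file and the link
relationship is lost.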


If you are just using BackupPC to back up data, then that might not be
important. On the other hand, if you are using BackupPC to back up
entire systems with the goal of having (close to a) bare-metal
restore, then this method won't work.

Personally, I haven't seen a major memory sink using rsync
3.0+. Perhaps you could provide some real world data of the
potential savings so that people can understand the tradeoffs.

That being said, memory is pretty cheap, while reliable backups are
hard. So, I wouldn't expect Craig to integrate functionality that
would degrade the ability to reliably back up a *nix filesystem just
to save a little memory. Of course, none of this is meant to
discourage your own patches or forks if they suit your needs.

As an aside, if anything, I and others have been pushing to get
more reliable backup of filesystem details such as extended
attributes, ACLs, NTFS stuff, etc., and removing the ability to back
up hard links would be a step backwards from that perspective.

Finally, the problem with interrupted backups that I see mentioned
most on this group is the interruption of large transfers that have to
be restarted and then retransferred over a slow link. Rsync itself is
pretty fast when it just has to check file attributes to determine
what needs to be backed up. So, I think the best way for improvement
that would be consistent with BackupPC design would be to store
partial file transfers so that they could be resumed on
interruption. Also, people have suggested tweaks to the algorithm for
storing partial backups. I suspect that a little effort in those
directions would solve most problems with few if any drawbacks. Again,
I really haven't seen people mentioning memory issues per se in the
normal BackupPC context -- the memory issue seems to come up mostly
when people are using rsync (outside of BackupPC) to duplicate the
pool/pc trees and their large number of hard links.


Re: [BackupPC-users] An idea to fix both SIGPIPE and memory issues with rsync

2009-12-14 Thread Jeffrey J. Kosowsky
Robin Lee Powell wrote at about 10:12:28 -0800 on Monday, December 14, 2009:
 > On Mon, Dec 14, 2009 at 07:57:10AM -0600, Les Mikesell wrote:
 > > Robin Lee Powell wrote:
 > > > I've only looked at the code briefly, but I believe this
 > > > *should* be possible.  I don't know if I'll be implementing it,
 > > > at least not right away, but it shouldn't actually be that hard,
 > > > so I wanted to throw it out so someone else could run with it if
 > > > ey wants.
 > > > 
 > > > It's an idea I had about rsync resumption:
 > > > 
 > > > Keep an array of all the things you haven't backed up yet,
 > > > starting with the initial arguments; let's say we're transferring
 > > > "/a" and "/b" from the remote machine.
 > > > 
 > > > Start by putting "a/" and "b/" in the array.  Then get the
 > > > directory listing for a/, and replace "a/" in the array with
 > > > "a/d", "a/e", ... for all files and directories in there.  When
 > > > each file is transferred, it gets removed.  Directories are
 > > > replaced with their contents.
 > > > 
 > > > If the transfer breaks, you can resume with that list of
 > > > things-what-still-need-transferring/recursing-through without
 > > > having to walk the parts of the tree you've already walked.
 > > 
 > > Directories aren't static things.  If you don't complete a run,
 > > you would still need to re-walk the whole tree comparing for
 > > changes.
 > 
To be fair, unless you are using filesystem snapshots, the directories
aren't static during an uninterrupted rsync either...

 > Why?  The point here would be to explicitly declare "I don't care
 > about directories that changed since I passed them on this
 > particular backup run; they'll get caught on the next backup run".
 > 

Again I think this goes against the grain of the needs of many users
whose number one priority is typically reliability and consistency of
backups rather than speed. If anything, people are moving more to the
notion of filesystem snapshots to ensure consistency.


 > > You can, however, explicitly break the runs at top-level directory
 > > boundaries and mount points if you have a problem with the size.
 > 
 > That doesn't always work; it certainly doesn't work in my case.
 > Millions of files scattered unevenly around a single file system; I
 > don't even know where the concentrations are because it takes so
 > long to run du/find on this filesystem, and it degrades performance
 > in a way that makes the client upset.
 > 

I wonder how common your use case is where the files are scattered so
unevenly, so unpredictably, and in such a dynamically changing manner
that you can't make a dent in the complexity by subdividing the share
into smaller pieces. If the system is so dynamic and unpredictable,
then perhaps the more robust solution is to see whether the data
storage can be organized better...

Re: [BackupPC-users] An idea to fix both SIGPIPE and memory issues with rsync

2009-12-14 Thread Jeffrey J. Kosowsky
Shawn Perry wrote at about 23:42:33 -0700 on Sunday, December 13, 2009:
 > You can always run some sort of disk de-duplicator after you copy without -H

How does the disk de-duplicator know which duplications are
intentional vs. which ones are not?

Plus a de-duplicator will have similar memory scaling issues that
rsync has when dealing with hard links.

Re: [BackupPC-users] An idea to fix both SIGPIPE and memory issues with rsync

2009-12-14 Thread Robin Lee Powell
On Mon, Dec 14, 2009 at 02:17:01PM -0500, Jeffrey J. Kosowsky wrote:
> Robin Lee Powell wrote at about 10:12:28 -0800 on Monday, December 14, 2009:
>  > On Mon, Dec 14, 2009 at 07:57:10AM -0600, Les Mikesell wrote:
>  > >
>  > > You can, however, explicitly break the runs at top-level
>  > > directory boundaries and mount points if you have a problem
>  > > with the size.
>  > 
>  > That doesn't always work; it certainly doesn't work in my case.
>  > Millions of files scattered unevenly around a single file
>  > system; I don't even know where the concentrations are because
>  > it takes so long to run du/find on this filesystem, and it
>  > degrades performance in a way that makes the client upset.
>  > 
> 
> I wonder how common your use case is where the files are scattered
> so unevenly, so unpredictably, and in such a dynamically changing
> manner that you can't make a dent in the complexity by subdividing
> the share into smaller pieces. If the system is so dynamic and
> unpredictable, then perhaps the more robust solution is to see
> whether the data storage can be organized better...

We're a hosting company; these are backups for clients.  We can't
enforce that sort of shift.

Since I've got two clients with exactly the same issue (file trees
in the millions of files, that take ten hours or more just to run a
"find" on, let alone du), though, I'm inclined to think that it's
not *that* uncommon.

I find it more than a little odd that you are telling me "OMG
HARDLINKS!" on the one hand, and "subdivide the share" on the other,
since by definition subdivided shares break hard links.  What's up
with that?  (not intended confrontationally, just confused)

-Robin

-- 
They say:  "The first AIs will be built by the military as weapons."
And I'm  thinking:  "Does it even occur to you to try for something
other  than  the default  outcome?"  See http://shrunklink.com/cdiz
http://www.digitalkingdom.org/~rlpowell/ *** http://www.lojban.org/

Re: [BackupPC-users] An idea to fix both SIGPIPE and memory issues with rsync

2009-12-14 Thread Robin Lee Powell
On Mon, Dec 14, 2009 at 02:08:31PM -0500, Jeffrey J. Kosowsky wrote:
> Robin Lee Powell wrote at about 10:10:17 -0800 on Monday, December 14, 2009:
>  > Do you actually see a *problem* with it, or are you just
>  > assuming it won't work because it seems too easy?
> 
> The problem I see is that BackupPC won't be able to back up hard
> links on any interrupted or sub-divided backup unless you are
> careful to make sure that no hard links span multiple restarts.
> And once you mess up the hard links for a file, all subsequent
> incrementals will be unlinked too.
> 
> 
> If you are just using BackupPC to back up data, then that might not
> be important. On the other hand, if you are using BackupPC to back
> up entire systems with the goal of having (close to a) bare-metal
> restore, then this method won't work.

Agreed on both counts; I'm only interested in backing up data.
Obviously such a system would have to be optional.

> Personally, I haven't seen a major memory sink using rsync 3.0+.
> Perhaps you could provide some real world data of the potential
> savings so that people can understand the tradeoffs.
> 
> That being said, memory is pretty cheap, while reliable backups
> are hard. 

I'm *far* more worried about the reliability than the RAM usage;
that was just a side effect.  I'm routinely losing 10+ hour backups
to SIGPIPE, rsync dying on the remote end, and so on; *that* is
what the idea was designed to fix.  The whole point is to,
optionally, make rsync more reliable at the expense of losing
hardlink support and, tangentially, to save some RAM.

> As an aside, if anything, I and others have been pushing to
> get more reliable backup of filesystem details such as extended
> attributes, ACLs, NTFS stuff, etc., and removing the ability to
> back up hard links would be a step backwards from that perspective.

Understood.

> Finally, the problem with interrupted backups that I see mentioned
> most on this group is the interruption of large transfers that
> have to be restarted and then retransferred over a slow link.
> Rsync itself is pretty fast when it just has to check file
> attributes to determine what needs to be backed up. 

Not with large trees it isn't.  I have 3.5 million files, and more
than 300 GiB of data, in one file system.  The last incremental took
*twenty-one hours*.  I have another backup that's 4.5 million files,
also more than 300 GiB of data, also in one file system.  The full
took 20 hours; it hasn't succeeded at an incremental yet.  That's
over full 100BaseT, if not better (I'm not the networking person).

Asking rsync, and ssh, and a pair of firewalls and load balancers
(it's complicated) to stay perfectly fine for almost a full day is
really asking a whole hell of a lot.  For large data sets like this,
rsync simply isn't robust enough by itself.  Losing 15 hours' worth
of (BackupPC's) work because the ssh connection goes down is
*really* frustrating.

In both cases, the client-side rsync uses more than 300 MiB of RAM,
with --hard-links *removed* from the rsync option list.  Not
devastating, but not trivial either.

> So, I think the best way for improvement that would be consistent
> with BackupPC design would be to store partial file transfers so
> that they could be resumed on interruption. Also, people have
> suggested tweaks to the algorithm for storing partial backups. 

Partial transfers won't help in the slightest: the cost is the time
it takes to walk the file tree, which is what my idea was designed
to avoid: re-walking the tree on resumption.

Having said that, if incrementals could be resumed instead of just
thrown away, that would at least be marginally less frustrating when
a minor network glitch loses a 15+ hour transfer.

In the incremental I mentioned above, rsync's MB/sec listing is
0.08.  Over 100BaseT.  Seriously: the problem is that walking file
trees of that size, when they are active serving production traffic,
takes a *really* long time.  I don't see any way to avoid that
besides keeping track of where you've been.

-Robin

-- 
They say:  "The first AIs will be built by the military as weapons."
And I'm  thinking:  "Does it even occur to you to try for something
other  than  the default  outcome?"  See http://shrunklink.com/cdiz
http://www.digitalkingdom.org/~rlpowell/ *** http://www.lojban.org/

Re: [BackupPC-users] An idea to fix both SIGPIPE and memory issues with rsync

2009-12-14 Thread Jeffrey J. Kosowsky
Robin Lee Powell wrote at about 16:28:43 -0800 on Monday, December 14, 2009:
 > On Mon, Dec 14, 2009 at 02:17:01PM -0500, Jeffrey J. Kosowsky wrote:
 > > Robin Lee Powell wrote at about 10:12:28 -0800 on Monday, December 14, 2009:
 > >  > On Mon, Dec 14, 2009 at 07:57:10AM -0600, Les Mikesell wrote:
 > >  > >
 > >  > > You can, however, explicitly break the runs at top-level
 > >  > > directory boundaries and mount points if you have a problem
 > >  > > with the size.
 > >  > 
 > >  > That doesn't always work; it certainly doesn't work in my case.
 > >  > Millions of files scattered unevenly around a single file
 > >  > system; I don't even know where the concentrations are because
 > >  > it takes so long to run du/find on this filesystem, and it
 > >  > degrades performance in a way that makes the client upset.
 > >  > 
 > > 
 > > I wonder how common your use case is where the files are scattered
 > > so unevenly, so unpredictably, and in such a dynamically changing
 > > manner that you can't make a dent in the complexity by subdividing
 > > the share into smaller pieces. If the system is so dynamic and
 > > unpredictable, then perhaps the more robust solution is to see
 > > whether the data storage can be organized better...
 > 
 > We're a hosting company; these are backups for clients.  We can't
 > enforce that sort of shift.
 > 
 > Since I've got two clients with exactly the same issue (file trees
 > in the millions of files, that take ten hours or more just to run a
 > "find" on, let alone du), though, I'm inclined to think that it's
 > not *that* uncommon.
 > 
 > I find it more than a little odd that you are telling me "OMG
 > HARDLINKS!" on the one hand, and "subdivide the share" on the other,
 > since by definition subdivided shares break hard links.  What's up
 > with that?  (not intended confrontationally, just confused)

No confrontation taken ;)
But if you follow the thread, I wasn't the one initially suggesting
subdividing the share, nor am I pushing that solution. However, I think
that if you really can't do it all in one session, then subdividing is
a better recommendation than suggesting a change to the BackupPC
source (unless you just do it on your own).

Perhaps your setup is pushing the envelope more than others, but I
haven't seen many people with issues like yours. It would be
interesting to hear whether others are experiencing the same
problem. Also, before you go off rewriting backuppc, it might be
productive to investigate whether there are any correctable
software/hardware issues in your hosting setup that might be
contributing. You postulate several potential causes for your SIGPIPE
errors, but it would be good to identify the actual source(s) and
trigger(s).

ssh and rsync are pretty mature and robust software, and they shouldn't
just "flake out" from being active for 15 hours or from being
pounded hard with lots of files. While my setup is much, much smaller
than yours, I keep ssh connections open for months using very
off-the-shelf, consumer-grade hardware. And if there is something
flaky in your hardware/software, you probably want to know that if
you are a hosting company...

Re: [BackupPC-users] An idea to fix both SIGPIPE and memory issues with rsync

2009-12-14 Thread Les Mikesell
Robin Lee Powell wrote:
> 
> Asking rsync, and ssh, and a pair of firewalls and load balancers
> (it's complicated) to stay perfectly fine for almost a full day is
> really asking a whole hell of a lot.

I don't think that should be true.  There's no reason for a program to quit
just because it has been running for a day, and no particular limit to what
ssh can transfer.  And TCP can deal with quite a lot of lossage and problems -
unless your load balancers are NATing to different sources or tossing RSTs
when they fail over.

 > For large data sets like this,
> rsync simply isn't robust enough by itself.  Losing 15 hours' worth
> of (BackupPC's) work because the ssh connection goes down is
> *really* frustrating.

I don't think it is rsync or ssh's problem, although you are correct that rsync 
could be better about handling huge sets of files.  Both should be as reliable 
as the underlying hardware.

> In both cases, the client-side rsync uses more than 300MiB of RAM,
> with --hard-links *removed* from the rsync option list.  Not
> devestating, but not trivial either.
> 
>> So, I think the best way for improvement that would be consistent
>> with BackupPC design would be to store partial file transfers so
>> that they could be resumed on interruption. Also, people have
>> suggested tweaks to the algorithm for storing partial backups. 
> 
> Partial transfers won't help in the slightest: the cost is the time
> it takes to walk the file tree, which is what my idea was designed
> to avoid: re-walking the tree on resumption.
> 
> Having said that, if incrementals could be resumed instead of just
> thrown away, that would at least be marginally less frustrating when
> a minor network glitch loses a 15+ hour transfer.

I've always thought that activating the --ignore-times option should be
controlled separately instead of hard-coded into the full runs.  If you didn't
activate that, fulls could be almost as fast as incrementals, so you could
just do all fulls (with the server-side hit of rebuilding the tree for each
run).  Maybe you could try knocking it out of lib/BackupPC/Xfer/Rsync.pm to
see if it makes fulls fast enough to run all the time.
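
For reference, a rough Python model of the decision being described; this is
a simplification of rsync's "quick check" and is not taken from Rsync.pm:

    import os

    def needs_transfer(src_stat, dest_stat, ignore_times=False):
        # Without --ignore-times, a file whose size and mtime match the
        # previous copy is skipped; with it, every file goes through the
        # full delta/checksum step, which is what makes fulls so slow.
        if ignore_times or dest_stat is None:
            return True
        return (src_stat.st_size != dest_stat.st_size or
                int(src_stat.st_mtime) != int(dest_stat.st_mtime))

    # Example: needs_transfer(os.lstat("a.txt"), os.lstat("backup/a.txt"))

So dropping --ignore-times trades some certainty (unchanged-looking files are
trusted) for full runs that behave much more like incrementals.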

-- 
   Les Mikesell
lesmikes...@gmail.com


Re: [BackupPC-users] RsyncP problem

2009-12-14 Thread Jeffrey J. Kosowsky
Harald Amtmann wrote at about 19:29:07 +0100 on Monday, December 7, 2009:
 > So, for anyone who cares (doesn't seem to be anyone on this list who 
 > noticed), I found this post from 2006 stating and analyzing my exact problem:

You are assuming something that is not true...

 > 
 > http://www.topology.org/linux/backuppc.html
 > On this site, search for "Design flaw: Avoidable re-transmission of massive 
 > amounts of data."
 > 
 > 
 > For future reference and archiving, I quote here in full:
 > 
 > "2006-6-7:
 > During the last week while using BackupPC in earnest, I have
 > noticed a very serious design flaw which is totally avoidable by
 > making a small change to the software. First I will describe the
 > flaw with an example.
 
 details snipped

> 
 > The design flaw here is crystal clear. Consider a single file
 > home1/xyz.txt. The authors have designed the BackupPC system so that
 > the file home1/xyz.txt is sent in full from client1 to server1
 > unless 
 > 
 details snipped
 > 
 > The cure for this design flaw is very easy indeed, and it would
 > save me several days of saturated LAN bandwidth when I make
 > back-ups. It's very sad that the authors did not design the
 > software correctly. Here is how the software design flaw can be
 > fixed. 

This is an open source project -- rather than repeatedly talking
about "serious design flaws" in a very workable piece of software (to
which I believe you have contributed nothing), and instead of talking
about how "sad" it is that the authors didn't correct it, why don't
you stop complaining and code a better version?

I'm sure that if you produce a demonstrably better version and test it
under a range of use cases to validate its robustness, people
would be more than happy to use your fix for this "serious" design flaw.

And you win a bigger bonus if you do this all using tar or rsync
without the requirement for any client software or any other remotely
executed commands...

 > The above design concept would make BackupPC much more efficient
 > even under normal circumstances where the variable
 > $Conf{RsyncShareName} is unchanging. At present, rsyncd will only
 > refrain from sending a file if it is present in the same path in
 > the same module in a previous full back-up. If server1 already has
 > the same identical file in any other location, the file is sent by
 > rsyncd and then discarded after it arrives.
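
To illustrate the behaviour the quoted text describes -- the data crosses the
wire first and is only then collapsed against the pool -- here is a simplified
content-addressed pooling sketch in Python; the hash choice and pool layout
are assumptions, not BackupPC's actual pool format:

    import hashlib
    import os

    POOL = "/var/lib/pool-sketch"   # hypothetical pool directory

    def pool_file(received_path):
        # The file has already been transferred in full at this point.
        h = hashlib.md5()
        with open(received_path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        pooled = os.path.join(POOL, h.hexdigest())
        if os.path.exists(pooled):
            # Identical content is already pooled: keep one copy on disk,
            # but the bytes were still sent over the network to get here.
            # (Assumes pool and destination are on the same filesystem.)
            os.unlink(received_path)
            os.link(pooled, received_path)
        else:
            os.makedirs(POOL, exist_ok=True)
            os.link(received_path, pooled)

Deduplication against the pool happens only after arrival, so a file that
moved to a new path (or a new share) is retransmitted even though identical
content is already stored.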

It sounds like you know what you want to do, so start coding and stop
complaining...

 > If the above serious design flaw is not fixed, it will not do much
 > harm to people whose files are rarely changing and rarely
 > moving. But if, for example, you move a directory tree from one
 > place to another, BackupPC will re-send the whole lot across the
 > LAN, and then it will discard the files when they arrive on the
 > BackupPC server. This will keep on happening until after you have
 > made a full back-up of the files in the new location.  "

No one is stopping you from fixing this "serious design flaw" which
obviously is not keeping the bulk of us users up at night worrying.

And for the record, I don't necessarily disagree with you that there
are things that can be improved, but your attitude is going to get you
less than nowhere. Also, the coders are hardly stupid, and there are
good reasons for the various tradeoffs they have made that you would
be wise to try to understand before disparaging them and their
software.

Re: [BackupPC-users] RsyncP problem

2009-12-14 Thread Harald Amtmann
 
> And for the record, I don't necessarily disagree with you that there
> are things that can be improved, but your attitude is going to get you
> less than nowhere. Also, the coders are hardly stupid, and there are
> good reasons for the various tradeoffs they have made that you would
> be wise to try to understand before disparaging them and their
> software.

Hi, I didn't want to sound rude. This was my sixth mail regarding this problem
(5 to this list, 1 personally to Craig), I think. In the first 5 mails I was
reporting my observations and asking whether what I am seeing is expected
behaviour or an error on my part, with each mail providing more detail as I
tried to find the source of the problem. In my personal mail to Craig I asked
the same question and asked for pointers as to where in RsyncP the problem
might be, so that I could start working on a fix (if possible). Not a single
one of the mails got a reply, so I kept looking for an answer myself, both in
Google and in the source code. This last mail was just me being happy that I
found out that this is indeed expected behaviour, that I can stop looking for
problems in my setup, and that the mail can serve as a record for any future
users who observe this behaviour.

Regards
Harald




