Paul Slootman wrote:
> On Sun 20 Jan 2008, Brian wrote:
>> Paul Slootman wrote:
> 
>>> This means you're transferring a list that contains 15 copies of an
>>> image (assuming there's only one image on #1). Doing this while
>>> preserving all hard links is quite memory-intensive, when using current
>>> versions of rsync.
>>>
>>>> Apart from splitting the vault into smaller parts of the system, any 
>>> I'd typically do it by first transferring the first image, and then
>>> transferring the second one in a similar way that dirvish does, i.e. by
>>> using --link-dest that points to the first image, etc.
>>>
>>>> ideas how to reduce the memory overhead? Maybe it's just a bad idea to 
>>>> use dirvish to back up the complete Debian system?
>>> You'll always run into this sooner or later. Nothing wrong with
>>> backing up a complete Debian system.
>>>
>>> What you could try is the current prerelease version of rsync 3.0.0;
>>> currently it's prerelease 8.  I'm using that for a couple of systems,
>>> and it's much better in doing large lists. For one, it doesn't wait
>>> until the entire list is transferred before beginning with transferring
>>> files.  Do note that you need that version on both ends for the new
>>> protocol to work.
>>>
>>>
>>> Paul Slootman
>>> _______________________________________________
>>> Dirvish mailing list
>>> [email protected]
>>> http://www.dirvish.org/mailman/listinfo/dirvish
>>>
>> Paul,
>> thanks. Just to be certain, the vault has about 15 trees in it, most of 
>> the copies are identical so mainly hard links, but I assume that would 
>> only make a difference to the amount of data that may need to be 
>> transferred, not to the size of the actual file list sent.
> 
> Doesn't each hard link also need to be mentioned in the file list?
> (hint: yes :-)  How otherwise does rsync know what the names of the hard
> links are...

Logical. Still, I am surprised that I am using all 29 MB of RAM plus 
between 60 and 90 MB of swap just for a file list of between 200K and 
300K files. OTOH I have no idea how much info rsync puts in this file 
list.
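
As a rough back-of-the-envelope check, here is a small sketch. It assumes the ballpark of roughly 100 bytes per file-list entry that the rsync FAQ quotes for pre-3.0 versions; that figure is an assumption on my part for this setup, and -H needs extra memory on top of it, so treat the result as a lower bound only.

```python
# Rough estimate of rsync's in-memory file list size.
# ASSUMPTION: ~100 bytes per entry (ballpark from the rsync FAQ for
# pre-3.0 versions). Hard-link tracking with -H needs more on top.
BYTES_PER_ENTRY = 100


def file_list_mib(num_files, bytes_per_entry=BYTES_PER_ENTRY):
    """Return the estimated file list size in MiB."""
    return num_files * bytes_per_entry / (1024 * 1024)


for n in (200_000, 300_000):
    print(f"{n} files: about {file_list_mib(n):.0f} MiB")
```

Under that assumption the bare list for 300K files is already about the size of the slug's usable RAM, before any hard-link bookkeeping is counted.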


> 
>> So I'm not quite certain how --link-dest would help for my problem.
> 
> By splitting the transfer up into 15 separate sessions, the list is that
> much smaller.
> 
> 
>> I guess the new version may actually make things worse for my situation, 
>> as transferring files at the same time would probably take even more 
>> storage. Unless it means that the file list in storage would be smaller 
>> at any one point in time.
> 
> As the list is being processed while it's being transferred, those parts
> of the list that aren't needed anymore can be freed from memory.
> Besides, the memory needed for transferring files is insignificant in
> comparison to the file list. Hence the new incremental protocol usually
> speeds things up significantly, and uses significantly less memory.
> 
>> I think my main problem may just be the lousy paging performance of the 
>> NSLU2.
> 
> Of course, but why not try to optimise the procedure anyway...
> Especially if the performance of a system is bad it's most important to
> optimise.
> 
> 
> Note that rsync 3.0.0 is for all intents and purposes backwards
> compatible, so you can just drop it in place of an older version.
> The only things that may be different are that strange quirks of older
> versions have been fixed, and in some very specialised situations that
> may have an impact on how it works. For dirvish it's irrelevant.
> 
> The pre-8 prerelease version seems very stable, and an official release
> is not far away.
> 
> 
> Paul Slootman
> 
Paul,
for the situation of syncing the data between two disks on the same slug 
I have got relief from paging by going into the VAULT and doing an rsync 
on all the backup dirs in that vault. That works very well.

Doing the rsync from one slug to the other, I rsync from Slug2 with the 
following source, "rsync://192.168.123.252/slug1-system", to a directory 
on this slug. The slug2-system dir is the dirvish vault, so using a 
script to split this dir into many smaller parts is somewhat more 
complex, as I first need to pull a list of the contents of that dir to 
the local machine before I can issue the rsync commands. Still, I am 
going to look into it.
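
A minimal sketch of what such a script might look like, under several assumptions: running rsync with only the module URL as argument prints a listing of the vault, the image names contain no whitespace and sort chronologically, and DEST_ROOT plus the sample image names are made up for illustration. It only builds the per-image commands with --link-dest, as Paul suggested, and prints them rather than running anything.

```python
# Sketch: build one rsync command per dirvish image, each one hard-linking
# unchanged files against the previously transferred image (--link-dest).
# ASSUMPTIONS: DEST_ROOT and the sample image names are hypothetical; image
# names contain no whitespace and sort chronologically.

SRC = "rsync://192.168.123.252/slug1-system"  # module URL quoted in this mail
DEST_ROOT = "/mnt/backup/slug1-system"        # hypothetical local directory
SKIP = {".", "dirvish"}                       # vault config dir is not an image


def parse_listing(text):
    """Extract directory names from the output of `rsync SRC/` (list mode)."""
    names = []
    for line in text.splitlines():
        fields = line.split()
        # Listing lines look like: perms size date time name
        if len(fields) >= 5 and fields[0].startswith("d"):
            if fields[-1] not in SKIP:
                names.append(fields[-1])
    return sorted(names)


def build_commands(images, src=SRC, dest_root=DEST_ROOT):
    """Return one rsync argument list per image, oldest image first."""
    commands = []
    previous = None
    for image in images:
        cmd = ["rsync", "-a", "-H"]
        if previous is not None:
            # Files identical to the previous image become hard links to it.
            cmd.append(f"--link-dest={dest_root}/{previous}")
        cmd += [f"{src}/{image}/", f"{dest_root}/{image}/"]
        commands.append(cmd)
        previous = image
    return commands


# Hypothetical listing, in the format rsync prints in list mode.
sample = """\
drwxr-xr-x        4096 2008/01/20 03:00:00 .
drwxr-xr-x        4096 2008/01/18 03:00:00 20080118
drwxr-xr-x        4096 2008/01/19 03:00:00 20080119
drwxr-xr-x        4096 2008/01/10 03:00:00 dirvish
"""

for cmd in build_commands(parse_listing(sample)):
    print(" ".join(cmd))
```

Each command then only carries the file list of a single image, which is the point of splitting the transfer. The absolute path given to --link-dest avoids having to think about what a relative path would be resolved against.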

NB.
Currently I cannot find a V3 of rsync for the machine (NSLU2 running 
Debian) that I have, so I need to wait a little longer for it to leave 
the beta stage.

Cheers Brian











