Re: User controlled i/o block size?

2016-04-12 Thread Greg Freemyer
On Tue, Apr 12, 2016 at 3:22 PM, Kevin Korb  wrote:
> In that instance you would need to delete the incomplete file.  The
> same would happen if you used -u on rsync but -u is cp's only method
> of avoiding files that are already there.

Which is why I don't like to use cp for large directory copies of
hundreds of files and hundreds of GB of data.

rsync is much more functional for me.  I'd just like to get the
performance under cygwin improved.

Maybe I will run some straces tomorrow and see if I can figure out
why it isn't doing well with cygwin/windows filesystem caching.

Greg

--
Greg Freemyer
www.IntelligentAvatar.net

-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: User controlled i/o block size?

2016-04-12 Thread Kevin Korb
In that instance you would need to delete the incomplete file.  The
same would happen if you used -u on rsync but -u is cp's only method
of avoiding files that are already there.

On 04/12/2016 02:54 PM, Greg Freemyer wrote:
> On Mon, Apr 11, 2016 at 7:05 PM, Kevin Korb 
> wrote:
>> You didn't say if you were networking or what features of rsync
>> you are using but if you aren't networking and aren't doing
>> anything fancy you are probably better off with cp -au which is
>> essentially the same as rsync -au except faster.
> 
> I was curious if "cp -au" was indeed as robust as rsync.
> 
> No it isn't.  My test:
> 
> Create a folder with numerous files in it (a dozen in my case).
> Have one of them be 9GB (or anything relatively big).
> 
> cp -au  
> 
> Look in the destination folder and when you see the 9GB file
> growing, kill "cp -au".  (I just did a control-C).
> 
> Restart "cp -au".
> 
> I ended up with a truncated copy of the 9GB file.  (roughly a 3GB
> file.)
> 
> The copy I did yesterday was about 1200 files.  Almost all were
> about 1.5GB in size, so that was a multi-hour process to make the
> copy.
> 
> Using rsync, I can kill the copy at any time (by desire or system 
> issue) and just restart it.
> 
> Using the simple "rsync -avp --progress" command I end up
> recopying the file that was in progress when rsync was aborted, but
> 1.5GB files only take 10 or 15 seconds to copy, so that is a
> minimal wasted effort when considering a copy process that runs for
> hours.
> 
> fyi: In my job I work with 100GB+ read-only datasets all the time. 
> The tools are all designed  to segment the data into 1.5 GB files. 
> One advantage is if a file becomes corrupt, just that segment file
> has to be replaced.  All the large files are validated via MD5 hash
> (or SHA-256, etc).  I keep a minimum of two copies of all
> datasets. Yesterday I was making a third copy of several of the
> datasets, so I had almost 2TB of data to copy.
> 
> Thanks Greg -- Greg Freemyer www.IntelligentAvatar.net
> 

-- 
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,
Kevin Korb  Phone:(407) 252-6853
Systems Administrator   Internet:
FutureQuest, Inc.   ke...@futurequest.net  (work)
Orlando, Florida    k...@sanitarium.net (personal)
Web page:   http://www.sanitarium.net/
PGP public key available on web site.
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,



Re: User controlled i/o block size?

2016-04-12 Thread Greg Freemyer
On Mon, Apr 11, 2016 at 7:05 PM, Kevin Korb  wrote:
> You didn't say if you were networking or what features of rsync you
> are using but if you aren't networking and aren't doing anything fancy
> you are probably better off with cp -au which is essentially the same
> as rsync -au except faster.

I was curious if "cp -au" was indeed as robust as rsync.

No it isn't.  My test:

Create a folder with numerous files in it (a dozen in my case).  Have
one of them be 9GB (or anything relatively big).

cp -au  

Look in the destination folder and when you see the 9GB file growing,
kill "cp -au".  (I just did a control-C).

Restart "cp -au".

I ended up with a truncated copy of the 9GB file.  (roughly a 3GB file.)

The copy I did yesterday was about 1200 files.  Almost all were about
1.5GB in size, so that was a multi-hour process to make the copy.

Using rsync, I can kill the copy at any time (by desire or system
issue) and just restart it.

Using the simple "rsync -avp --progress" command I end up recopying
the file that was in progress when rsync was aborted, but 1.5GB files
only take 10 or 15 seconds to copy, so that is a minimal wasted effort
when considering a copy process that runs for hours.
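
A likely explanation for the difference: cp writes straight into the
destination file name, so an interrupted copy leaves a short file with a
fresh timestamp, and a later "cp -u" presumably treats it as up to date.
rsync, by default (without --inplace), writes to a temporary file in the
destination directory and renames it only once the transfer is complete.
Below is a minimal Python sketch of that write-then-rename idea, not
rsync's actual code. The function name copy_via_temp and the ".copy-"
prefix are made up for illustration.

```python
import os
import shutil
import tempfile


def copy_via_temp(src, dst, chunk_size=1024 * 1024):
    """Copy src to dst through a temporary file in dst's directory.

    Hypothetical helper: dst only ever appears under its final name
    once the whole file has been written.
    """
    dst_dir = os.path.dirname(os.path.abspath(dst))
    fd, tmp_path = tempfile.mkstemp(prefix=".copy-", dir=dst_dir)
    try:
        with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
            shutil.copyfileobj(inp, out, chunk_size)
            out.flush()
            os.fsync(out.fileno())
        shutil.copystat(src, tmp_path)  # carry over mtime and mode bits
        os.replace(tmp_path, dst)       # rename within the same directory
    except BaseException:
        # Interrupted (e.g. Ctrl-C) or failed: remove the partial temp
        # file and leave any existing dst untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

With this approach a killed copy never leaves a truncated file under the
real name, so a restart simply recopies the file that was in progress.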

fyi: In my job I work with 100GB+ read-only datasets all the time.
The tools are all designed  to segment the data into 1.5 GB files.
One advantage is if a file becomes corrupt, just that segment file has
to be replaced.  All the large files are validated via MD5 hash (or
SHA-256, etc).  I keep a minimum of two copies of all datasets.
Yesterday I was making a third copy of several of the datasets, so I
had almost 2TB of data to copy.
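
For illustration, checking the segment files of a copy against known
hashes could look roughly like the Python sketch below. It assumes the
expected digests are already available as a mapping from file path to hex
digest; how the real tools store their hashes is not stated here, and the
names file_digest and find_mismatches are hypothetical.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a file in chunks so large segment files are never fully in RAM."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_mismatches(expected, algorithm="sha256"):
    """Return the paths that are unreadable or whose digest differs.

    expected: dict mapping file path -> expected hex digest (assumed format).
    """
    bad = []
    for path, want in expected.items():
        try:
            got = file_digest(path, algorithm)
        except OSError:
            bad.append(path)  # missing or unreadable segment
            continue
        if got != want.lower():
            bad.append(path)  # corrupt or truncated segment
    return bad
```

Only the segments returned by find_mismatches would need to be recopied,
which is the advantage of splitting a dataset into fixed-size files.
Passing algorithm="md5" would match MD5-based manifests.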

Thanks
Greg
--
Greg Freemyer
www.IntelligentAvatar.net



Re: User controlled i/o block size?

2016-04-12 Thread Fabian Cenedese
At 01:33 12.04.2016, Greg Freemyer wrote:
>I'm just doing a local copy:
>rsync -avp --progress  

Just as side information: in local copies all files are copied whole;
the diff (delta) algorithm is not in effect. So if a file changes, it is
still copied completely (unless you use --partial, --no-whole-file etc).

Second thing: from what I remember, rsync makes a lot of stat calls to
get every file's properties. These are more expensive on cygwin/Windows
than on Linux directly. Rsync also uses processes/threads, which are
easier/faster to create and switch between on Linux than on Windows.
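
To get a rough feel for the stat cost on a given machine, something like
the Python sketch below could be run against the source tree. It is an
illustration only: the name time_stat_calls is made up, and it measures
the stat implementation of whichever Python runs it, so it reflects
Cygwin's overhead only if run with Cygwin's own Python.

```python
import os
import time


def time_stat_calls(root):
    """Walk root, lstat every file, return (file_count, seconds_in_lstat)."""
    count = 0
    elapsed = 0.0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            start = time.perf_counter()
            try:
                os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            elapsed += time.perf_counter() - start
            count += 1
    return count, elapsed
```

Comparing the result from a Cygwin Python and a native Windows Python on
the same tree would give a first hint of how much the emulation layer
adds. The first run may also be slower than later runs because of
filesystem caching.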

A native Windows implementation of rsync could run faster than the
original rsync on top of the cygwin layer. Some time ago somebody
announced a new program using the rsync algorithm, but I have never used
it, so I don't know about its features or speed.

http://www.acrosync.com/windows.html

bye  Fabi

