Re: Estimating backup usage with dir-merge filter

2011-10-07 Thread Paul Dugas
On Thu, Oct 6, 2011 at 6:49 PM, Henri Shustak henri.shus...@gmail.com wrote:
 It sounds like you missed the point of Kevin's message (in the other fork
 of this thread).  The point wasn't to use `du`, it was that you can run
 your stats against the backed-up files, not the source.  Then you're only
 running stats against the results of running the backup using the filters,
 so you don't need to filter them again.

 I got that but neglected to respond to the whole group.  My mistake.
 The backups are being performed using BackupPC to a central server
 where compression and de-duplication are done.  While it's true that
 the actual storage on the backup server being consumed by each user is
 less because of these, I don't have any problem hiding this from them
 and telling them what their uncompressed and duplicated usage is
 instead.  It has more of an effect that way, if you know what I
 mean.

 If that doesn't make sense or isn't possible (backups are on some remote
 server), then just use your rsync command with '--list-only', and
 post-process that list.
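
A minimal sketch of that post-processing, assuming the usual `--list-only` line layout (permissions, size, date, time, name) and that only regular files should be counted. The function name `sum_list_only` and the filter file name in the usage comment are made up for illustration; commas are stripped because newer rsync versions may print sizes with digit grouping.

```shell
# Sum the size column of `rsync --list-only` output read from stdin.
# Only lines starting with "-" (regular files) are counted.
sum_list_only() {
    awk '/^-/ { gsub(/,/, "", $2); total += $2 }
         END  { printf "%.0f\n", total }'
}

# Hypothetical usage, reusing the filter from the real backup command:
#   rsync --list-only -a --filter='dir-merge /.rsync-filter' . | sum_list_only
```

Because the function only reads text from stdin, it can be tried on a saved listing before pointing it at a live tree.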

 I've been tinkering with using --verbose and --dry-run then parsing
 the total size out of the last line of the output and I think I'm
 close.  Curiously, when I don't include the --filter option as a
 baseline, I'm not getting the same results as du.

 $ du -sb . | awk '{print $1}'
 508625653

 $ rsync --dry-run --verbose -a . /tmp/does_not_exist | tail -1 | awk '{print $4}'
 506037893

 The difference is minimal and probably negligible for this purpose but
 I'm still curious where it's coming from.  Maybe there are some sparse
 files in there somewhere.

 Do you have the same discrepancy if you use the --stats option?

Yes.  Using --stats, the total on the last line of the output is the same as
the earlier "Total file size:" line in the additional output.
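
For scripting this, the "Total file size:" line is easier to parse than the last line of the verbose output. A small sketch follows; `total_file_size` is a hypothetical helper name, and stripping non-digits is an assumption to cope with rsync versions that print grouped digits. One possible contributor to the gap against `du -sb`, not confirmed in this thread, is that `du` also adds the apparent size of each directory, while rsync's total covers file contents only.

```shell
# Extract the byte count from the "Total file size:" line of
# `rsync --stats` output read from stdin.
total_file_size() {
    awk -F': ' '/^Total file size:/ { gsub(/[^0-9]/, "", $2); print $2 }'
}

# Hypothetical usage:
#   rsync --dry-run --stats -a . /tmp/does_not_exist | total_file_size
# A file-only baseline to compare against (GNU find assumed):
#   find . -type f -printf '%s\n' | awk '{ s += $1 } END { printf "%.0f\n", s }'
```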

Paul
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

[Bug 8512] New: rsync -a slower than cp -a

2011-10-07 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=8512

   Summary: rsync -a slower than cp -a
   Product: rsync
   Version: 3.1.0
  Platform: All
OS/Version: All
Status: NEW
  Severity: normal
  Priority: P5
 Component: core
AssignedTo: way...@samba.org
ReportedBy: linux.n...@bucksch.org
 QAContact: rsync...@samba.org


Reproduction:
1. du /foo/bigfile
2. echo 3 > /proc/sys/vm/drop_caches
3. time rsync -avp /foo/bigfile /bar/bigfile
4. echo 3 > /proc/sys/vm/drop_caches
5. time cp -a /foo/bigfile /bar/bigfile

Actual result:
1. ~1286 MB
3. 27.9s, 45.9 MB/s per calc, 45.61 MB/s according to rsync
5. 14.6s, 88.1 MB/s per calc

In other words, cp is *almost twice as fast* as rsync, on a single big file
where no comparison is necessary.

Expected result:
When copying a file, rsync is as fast as cp.

rsync is great, but this is a real put-off. I don't see a good reason for it
either, as the two programs perform the same function in this case. It must be
some design problem. Please find the bottleneck and eliminate it.

Importance:
rsync advertises its speed.

-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug.


[Bug 8512] rsync -a slower than cp -a

2011-10-07 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=8512

--- Comment #1 from Brian K. White br...@aljex.com 2011-10-07 17:17:49 UTC ---
If it weren't just one file, having -v and -r and -D and not having --inplace
on rsync would be unfair. Only with a single file like this can you get away
with it.
Also, it doesn't affect speed, but -a already includes -p.

Actually, lots of rsync options and default behaviors do not compare fairly to
cp, but at the very least, add -W to rsync to compare it against cp. It'll
still be slower.
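
The throughput figures in the report can be rechecked with simple arithmetic, and the rerun with -W is a one-flag change. A small sketch; `mb_per_s` is a hypothetical helper, and the size used is the approximate ~1286 MB from the report, so the rsync figure may come out slightly different from the quoted 45.9 MB/s.

```shell
# Print throughput in MB/s given a size in MB and an elapsed time in seconds.
mb_per_s() {
    awk -v mb="$1" -v s="$2" 'BEGIN { printf "%.1f\n", mb / s }'
}

# Hypothetical usage with the figures from the report:
#   mb_per_s 1286 27.9   # rsync run
#   mb_per_s 1286 14.6   # cp run
# Rerunning step 3 with whole-file mode, as suggested above:
#   time rsync -a -W /foo/bigfile /bar/bigfile
```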



[Bug 8512] rsync -a slower than cp -a

2011-10-07 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=8512

--- Comment #2 from Ben Bucksch linux.n...@bucksch.org 2011-10-07 17:30:58 UTC ---
As you said, all these options are irrelevant when rsync is in the middle of
copying a single big file. This copy loop is inefficient, that's what my test
case shows. And it's very much real: when I move 5T from one server to another
new server via GBit Ethernet, and I use rsync (because that's my habit because
rsync is so great), I am some 10 hours slower than with cp.



[Bug 8512] rsync -a slower than cp -a

2011-10-07 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=8512

--- Comment #3 from Sandon san...@van-ness.com 2011-10-07 17:45:53 UTC ---
I am seeing performance problems as well, but with a CPU bottleneck. My issue
is that even cp is CPU-bound, and dd with direct I/O gives me the best
performance.

Nobody mentioned -W before but it didn't seem to make a difference for me.
Copying a file from one array to another I am seeing the following:

cp (no direct I/O)        94% cpu = 1m11s
rsync (no direct I/O)    210% cpu = 2m41s
rsync -W (no direct I/O) 205% cpu = 2m46s
dd (direct I/O)           13% cpu = 47s
dd (no direct I/O)        85% cpu = 1m24s
cp (libdirectio)         114% cpu = 50s

So my best bet is cp with libdirectio, but I am not fond of this method as it
also uses a lot of CPU. I wish there was something that could give me the
speed and CPU usage of dd with direct I/O. Pushing data through the memory
buffer on my machine eats up too much CPU and bottlenecks the transfer in my
case.

I don't know if rsync does checksumming and the like, and whether that is why
it seems to be the slowest of everything I tested.



[Bug 8513] New: files from local filesystem on source written to different filesystem on dest despite --one-file-system

2011-10-07 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=8513

   Summary: files from local filesystem on source written to
different filesystem on dest despite --one-file-system
   Product: rsync
   Version: 3.1.0
  Platform: All
OS/Version: All
Status: NEW
  Severity: normal
  Priority: P5
 Component: core
AssignedTo: way...@samba.org
ReportedBy: terry_n_br...@yahoo.com
 QAContact: rsync...@samba.org


Doing something like

rsync -e "ssh -p $PORT" \
--ignore-errors \
--only-write-batch=$STORAGE/${TS}_$NAME.rsb \
--archive \
--one-file-system \
--verbose \
--progress \
--delete \
$SOURCE/ \
$TARGET:$SOURCE/

On the source, due to historical accident, there are *local* files like

.../cifs/data1/cdrive/autoexec.bat
.../cifs/data1/cdrive/http/setup.ini

On the destination, .../cifs/data1/cdrive is a Windows share on another
filesystem / machine. When the batch from the above is applied there with
--read-batch, even with the --one-file-system flag, files on the other
filesystem are still overwritten.

Seems --one-file-system should never cross file system boundaries.
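
As far as the rsync man page describes it, --one-file-system limits recursion across mount points on the sending side (and the matching recursion during deletion on the receiver); it is not documented as a guard against writing through a mount point on the destination. A hedged sketch of a pre-flight check that could be run on the destination before applying a batch: it compares device numbers, which differ across filesystems. `same_fs` is a hypothetical helper and relies on GNU stat.

```shell
# Succeed if both paths live on the same filesystem (same device number).
# Returns 2 if either path cannot be examined.
# Relies on GNU stat; `-c %d` prints the device number of the file.
same_fs() {
    a=$(stat -c %d -- "$1") || return 2
    b=$(stat -c %d -- "$2") || return 2
    [ "$a" = "$b" ]
}

# Hypothetical usage on the destination host before running --read-batch:
#   same_fs "$SOURCE" "$SOURCE/cifs/data1/cdrive" || echo "crosses a mount point"
```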



Permissions option

2011-10-07 Thread James Moe

Hello,
  rsync v3.0.7
  I am backing up data to a USB memory drive that is formatted FAT32,
which does not understand Linux permissions.
  I added the --no-perms option to the option set. rsync still
attempts to change the permissions.

OPTS = --no-perms --archive --stats --delete --itemize-changes
--quiet --exclude-from='${EXCL_FILE}'

Typical entry in the log file:
2011/10/07 10:27:10 [9597] .d...p. sma-v3/sma-mailing-list/
2011/10/07 10:27:10 [9597] .f...p. sma-v3/sma-mailing-list/index.php

IIRC the p indicates a permissions change.

  Is this to be expected?

-- 
James Moe
moe dot james at sohnen-moe dot com
520.743.3936


Re: Permissions option

2011-10-07 Thread Kevin Korb

I believe order matters.  It needs to be after the --archive.

Actually, you really shouldn't be using --archive anyway, because FAT
only supports a couple of the things in --archive.  You should use
--recursive and --times instead of --archive plus a bunch of options to
remove the parts of --archive that don't work.

Also, if your source is not FAT you will probably also need
--modify-window=2, and if you live somewhere that has daylight saving
time you will occasionally need --modify-window=3602.
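
Put together, the advice above might look like the following sketch. The `fat_rsync_cmd` helper and the paths in the usage comment are hypothetical; it only prints the command so it can be inspected before being run.

```shell
# Print an rsync command line suited to a FAT32 destination:
# --recursive and --times instead of --archive, plus a 2-second
# tolerance for FAT's coarse modification timestamps.
fat_rsync_cmd() {
    printf 'rsync --recursive --times --modify-window=2 --stats --delete --itemize-changes %s %s\n' "$1" "$2"
}

# Hypothetical usage:
#   fat_rsync_cmd /home/user/data/ /media/usb/backup/
# If --archive is kept instead, the negating options must follow it:
#   rsync --archive --no-perms --no-owner --no-group ...
```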

On 10/07/11 14:17, James Moe wrote:
 Hello,
   rsync v3.0.7
   I am backing up data to a USB memory drive that is formatted FAT32,
 which does not comprehend linux permissions.
   I added the --no-perms option to the option set. rsync still
 attempts to change the permissions.
 
 OPTS = --no-perms --archive --stats --delete --itemize-changes
 --quiet --exclude-from='${EXCL_FILE}'
 
 Typical entry in the log file:
 2011/10/07 10:27:10 [9597] .d...p. sma-v3/sma-mailing-list/
 2011/10/07 10:27:10 [9597] .f...p. sma-v3/sma-mailing-list/index.php
 
 IIRC the p indicates a permissions change.
 
   Is this to be expected?
 

-- 
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~
Kevin Korb                  Phone:    (407) 252-6853
Systems Administrator       Internet:
FutureQuest, Inc.           ke...@futurequest.net  (work)
Orlando, Florida            k...@sanitarium.net (personal)
Web page:                   http://www.sanitarium.net/
PGP public key available on web site.
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~


Re: Permissions option

2011-10-07 Thread James Moe

On 10/07/2011 12:20 PM, Kevin Korb wrote:
 I believe order matters.  It needs to be after the --archive.
 
 Actually, you really shouldn't be using --archive anyways because
 FAT only supports a couple of the things in --archive.  You should
 use --recursive and --times instead of --archive and a bunch of
 things to remove the parts of --archive that don't work.

  Thank you. That works quite nicely.

-- 
James Moe
moe dot james at sohnen-moe dot com
520.743.3936


[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)

2011-10-07 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=5478

--- Comment #20 from Tim Taiwanese Liim tim.l...@alcatel-lucent.com 2011-10-07 21:57:54 UTC ---
I agree with Wayne and Eric that Eric's issue is outside of rsync,
somewhere in the transport.

Eric,
Have you tried to check the TCP buffers of the ssh process on both
ends?  For example,
    p=192.168.51.98:22
    while true; do date; netstat -tn | grep $p; sleep 1; done
    Proto Recv-Q Send-Q Local Address      Foreign Address      State
    tcp        0  19328 192.168.51.98:22   192.168.51.51:53010  ESTABLISHED
In this example, the sender has 19328 bytes in its TCP sending buffer.
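
To watch just the queue depth over time, the Send-Q column can be pulled out of the netstat output. A minimal sketch, assuming the `netstat -tn` column order shown in the example (Proto, Recv-Q, Send-Q, local address, foreign address, state); `send_q` is a hypothetical helper name.

```shell
# Print the Send-Q column for connections whose local address equals $1.
# Reads `netstat -tn` output from stdin.
send_q() {
    awk -v addr="$1" '$4 == addr { print $3 }'
}

# Hypothetical usage, polling once per second:
#   while true; do date; netstat -tn | send_q 192.168.51.98:22; sleep 1; done
```

A Send-Q value that stays high while the transfer is stalled would point at the network path or the receiving end rather than at the sender.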


You can also use tcpdump and wireshark to graph how well the TCP stream
flows:
    # Capture the first 100 bytes of each packet on eth0 and write them
    # to t.pcap.  We need only the first 100 bytes because we care only
    # about the TCP sequence numbers, not the actual file content.
    # Root access is needed to sniff packets.
    tcpdump -i eth0 -s 100 -w t.pcap
then feed the trace to wireshark:
    wireshark t.pcap
    # then select menu Statistics -> TCP Stream Graph ->
    # Time-Sequence Graph (Stevens)
With this you can visualize how the TCP flow goes (smooth, stalled,
fluctuating, or with excessive retries).  With tcpdump from both ends, you
can also check for lost packets.  (A few years ago I worked on a case
of stalled ssh; it turned out the NIC firmware was defective, causing
excessive packet loss.)  Comparison between the two targets (the good and
the bad one) may show the difference.  Do your two targets run on the same
host machine?  Or on host machines of the same configuration (same NIC
etc.)?  Could one have a bad NIC (e.g. working, but with excessive packet
loss in bursts)?

As Wayne pointed out, there is a pipe (or unix domain socket) between
rsync and ssh as well.  I don't know how to track the queue size in a
pipe yet, so let's track the known ones (TCP) first.

 Notice, the last successful read on the target side happened .06
 seconds before the last write on the source side, which is pretty
 much at the same time.
This is an important clue, although I don't know what to make of it
yet; maybe a few lost TCP ACKs in a row?

How did your bwlimit=32 test go?


BTW, I am not an rsync developer; I use rsync a lot and try to help
when possible, so please do not take my response as official.
