On 2012-11-09 00:14, Justin T Pryzby wrote: > On Thu, Nov 08, 2012 at 11:54:52PM +0100, Christian Iversen wrote: >> On 2012-11-02 19:32, Justin T Pryzby wrote: >>> On Fri, Nov 02, 2012 at 02:33:13PM +0100, Christian Iversen wrote: >>>> However, 1 server is giving me a lot of trouble. It has a directory >>>> with (currently) 734088 files in it, and every time I try to backup >>>> this dir, rsync hangs after transferring roughly 2000 files. >>>> Sometimes it's around 1800, sometimes it's over 2100 (I think), but >>>> it's in that ballbark. > >> That was a bit tricky, since the whole process hangs shortly after >> being started (the files are not that big). But I managed to strace >> -f -p $PID1 -p $PID2 the whole thing. > Do you know you can strace something while starting it? > strace -f rsync -avz /path/dir $dest 2>/strace/rsync
Yes, very. But you asked for server straces, not client ones, so.. :-) >> [pid 27310] select(6, [5], [], NULL, {60, 0}) = 1 (in [5], left {59, >> 999997}) >> [pid 27310] read(5, >> "\4\0\0kA*\0\0\4\0\0kB*\0\0\4\0\0kC*\0\0\4\0\0kD*\0\0"..., 8184) = >> 208 >> [pid 27310] select(6, [5], [1], [1], {60, 0} >> >> >> That {59, 99999x} looks highly suspicious. rsync wouldn't happen to >> hold open 100000 file handles, would it? Or something to that >> effect? > > No, that means FD#5 has data waiting; and, of the 60 second theshhold > given to "select", 59.999997 seconds remain (it returned almost > immediately). -ENOSLEEP. Of course. > BTW, you can check file descriptors in /proc/27310/fd/ (and, in recent > kernels, fdinfo/ has even more). BTW, symlinks in /proc are > "special". Eg. if a process opens /file as FD#3, and then the user > "unlinks" that file, and creates a new file with the same name (but > different inode), referencing the pathname will access the new file. > But, /proc/pid/fd/3 will access the old file. I know about /proc/*/fd, but I didn't think to take a look at open files. Will do next time :) > Could you strace the rsync client, too? Yup (I added newlines for clarity): [pid 24969] read(4, "\377\330\377\340\0\20JFIF\0\1\1\0\0\1\0\1\0\0\377\376\0;CREATOR:"..., 2427) = 2427 [pid 24969] select(6, NULL, [5], [5], {60, 0}) = 1 (out [5], left {59, 999997}) [pid 24969] write(5, "\374\17\0\7", 4) = 4 [pid 24969] select(6, NULL, [5], [5], {60, 0}) = 1 (out [5], left {59, 999998}) [pid 24969] write(5, "GHIJSTUVWXYZcdefghijstuvwxyz\203\204\205\206"..., 4092) = 4092 [pid 24969] close(4) = 0 [pid 24969] open("data/www/virtual/beta.ageforce.dk/usrPictures/10365-90x80-OnlineModerator.jpg", O_RDONLY) = 4 [pid 24969] fstat(4, {st_mode=S_IFREG|0644, st_size=1702, ...}) = 0 [pid 24969] write(1, "data/www/virtual/beta.ageforce.d"..., 78data/www/virtual/beta.ageforce.dk/usrPictures/10365-90x80-OnlineModerator.jpg ) = 78 [pid 24969] read(4, "\377\330\377\340\0\20JFIF\0\1\1\0\0\1\0\1\0\0\377\376\0;CREATOR:"..., 1702) = 1702 [pid 24969] close(4) = 0 [pid 24969] open("data/www/virtual/beta.ageforce.dk/usrPictures/10365-90x80-OnlineModeratorImage.jpg", O_RDONLY) = 4 [pid 24969] fstat(4, {st_mode=S_IFREG|0644, st_size=2517, ...}) = 0 [pid 24969] write(1, "data/www/virtual/beta.ageforce.d"..., 83data/www/virtual/beta.ageforce.dk/usrPictures/10365-90x80-OnlineModeratorImage.jpg ) = 83 [pid 24969] read(4, "\377\330\377\340\0\20JFIF\0\1\1\0\0\1\0\1\0\0\377\376\0;CREATOR:"..., 2517) = 2517 [pid 24969] select(6, NULL, [5], [5], {60, 0}) = 1 (out [5], left {59, 999997}) [pid 24969] write(5, "\374\17\0\7", 4) = 4 [pid 24969] select(6, NULL, [5], [5], {60, 0}) = 1 (out [5], left {59, 999998}) [pid 24969] write(5, "R\321\360$3br\202\t\n\26\27\30\31\32%&'()*456789:CDEF"..., 4092) = 4092 [pid 24969] close(4) = 0 [pid 24969] open("data/www/virtual/beta.ageforce.dk/usrPictures/10366-160x168-Offline.jpg", O_RDONLY) = 4 [pid 24969] fstat(4, {st_mode=S_IFREG|0644, st_size=2712, ...}) = 0 [pid 24969] write(1, "data/www/virtual/beta.ageforce.d"..., 72data/www/virtual/beta.ageforce.dk/usrPictures/10366-160x168-Offline.jpg ) = 72 [pid 24969] read(4, "\377\330\377\340\0\20JFIF\0\1\1\0\0\1\0\1\0\0\377\376\0;CREATOR:"..., 2712) = 2712 [pid 24969] select(6, NULL, [5], [5], {60, 0}) = 0 (Timeout) [pid 24969] select(6, NULL, [5], [5], {60, 0} Seems it just stops. Wonder what's going on here? >>> What filesystem is it? Can you create a tarball of the directory? >> >> Tarball could be problematic. It's 8.6GB of data spread over 734088 >> files in a single dir. Yes, that many. Yes, I know it's insane. No, >> I didn't choose it :) >> >>> time c /path/to/dir |wc >> >> What's "c"? > Oops, I meant: > > time tar c /path/to/dir |wc > > That creates a tarball piped to wc. If a tarball is problematic, then > rsync may be a problem, too. But it seems you were able to run "find" > on that dir just fine. Yeah, and du. So I can read the inodes at least. > What filesystem is that? I believe "huge dirs" is something *some* > filesystems specificaly intend to handle gracefully, but some > filesystems handle poorly. Some programs may handle it poorly, too :) > Could you also send the output of df -i for that partition? It's vanilla ext4, with: $ df -i /data Filesystem Inodes IUsed IFree IUse% Mounted on /dev/mapper/storage-data 16384000 1475262 14908738 10% /data and: $ file /dev/mapper/storage-data /dev/mapper/storage-data: Linux rev 1.0 ext4 filesystem data, UUID=8b748eca-bdfe-4165-b7b3-debfb9463366 (needs journal recovery) (extents) (large files) (huge files) -- De bedste hilsner, Christian Iversen Systemadministrator, Meebox.net ------- Denne e-mail kan indeholde fortrolige oplysninger. Er du ikke den rette modtager, bedes du returnere og slette denne e-mail. ------- -- Please use reply-all for most replies to avoid omitting the mailing list. To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html