I want to start by saying that I appreciate all the help and suggestions y'all have given on something that's obviously not your problem. :) Unfortunately, it looks like this problem is (1) far more interesting than I thought and (2) might be in BackupPC itself.
On Tue, Dec 15, 2009 at 11:29:46AM -0900, Chris Robertson wrote: > My guess would be that your firewalls are set up to close > "inactive" TCP sessions. Try adding "-o ServerAliveInterval=60" > to your RsyncClientCmd (so it looks something like "$sshPath -C -q > -x -o ServerAliveInterval=60 -l root $host $rsyncPath $argList+") > and see if that solves your problem. Also suggested by several other people. Well, umm, it kept the connection open. Now I've got [counts] 8 BackupPC_dump processes doing absolutely nothing. The one I was stracing and tcpduming and so on looks fine: ssh is sending the keepalives, and they're being responded to appropriately. The rsync is up on the client, the dump is up on the server, all procs are connected to each other correctly. They're just not *doing* anything. Nothing has errored out; BackupPC thinks everything is fine. Some of the places this is happening are very small backups that usually take a matter of minutes. Suddenly this isn't looking like a networking problem anymore; the networking appears to be just fine. This is looking like a BackupPC problem. My version, by the way, is 3.1.0. It's like at some point BackupPC forgot that it was supposed to be trying to back anything up. Is it possible the nightly jobs are configured in some weird way that broke things? Anyways, the one that has the problem consistently *also* always has it in exactly the same place; I was watching it in basically every way possible, so here comes the debugging stuff. As will become obvious, I probably need to turn the ssh debugging down and the BackupPC debugging up. If, for some reason, you want raw copies of any of this, let me know and I'll post them. On the machine in question, we're backing up /pictures/. /pictures/agate/shared/pictures/ has about a thousand subdirs that each have about a thousand subdirs; the *vast* majority of the files on the file system are in there. The backup always stops doing anything right after lstat of the last file in /pictures/agate/shared/pictures/0216/2987/. According to ls -U, /pictures/agate/shared/pictures/0216/ is the last directory in /pictures/agate/shared/pictures/, and /pictures/agate/shared/pictures/0216/2987/ is the last directory in /pictures/agate/shared/pictures/0216/. Now, there are other directories in /pictures/agate/shared/ that ls -U after pictures/, but with rsync 3.0 on the client, I'm assuming those are handled in parallel. So it looks like the backup is stalling right after finishing checking all the files. No "new" dir is made. There's a NewFileList, but it's empty. XferLog.z is mostly useless; too many -v in my ssh call apparently, but here's the last bit anyways (copied from less): - ------------- browse.jpg<CE>^...@^@w<85>@I:1^Kdisplay.jpgp<A3>^...@^@y<85>@I:1^Ksegment.jpg<8D>^...@^@t<85>@I:$^Y3187/DSCN1886_gallery.jpg<D4>^...@^@}...@i:2^Pfor_cropping.jpg<94>y...@^@<88>^...@i:2^Klibrary.jpg<8A> ^...@^@<8A>^...@i:2^Qmarket_banner.jpgN<C1>^...@^@<8C>^...@i:2 browse.jpg~...@^@<80>^...@i:1^D.JPG^T<A0>^I^@<8C>^...@i:1 _badge.j...@^@<87>^...@i:2^Lselector.jpg'^...@^@<84>^...@i:3^Nhop_banner.jpg<F7><88>^...@^@<89>^...@i:2^Kdisplay.jpg<BF>x...@^@<82>^...@i:2^ksegment.jpgl...@^@^...@i:%(968/food-wrapping-mr-gallery-x_badge.jpg<D5>b...@^@^...@i:d^kgallery.jpg^...@^@^...@i:D^Qmarket_banner.jpg<A8><D1>^...@^@^...@i:D^Klibrary.jpg<AB>^...@^@^...@i<BA>D^Oshop_banner.jpg<8C><CB>^...@^@:D browse.jpg&^...@^@@I:C^D.jpg<E3>^c...@^^$@I:C^L_segment.jpg<AD> ^...@^@@I:d^pfor_cropping.jpg!*...@^z$@I:D^Lselector.jpg<96>^...@^@^...@i:D^Kdisplay.jpgR1^A^@@I:%^K539/061.JPG<DC>f...@^@d @I:,^l_gallery.jpgr...@^@a @I:-^Klibrary.jpg<D7>^...@^@c @I:-^Pfor_cropping.jpg<F9>u...@^@` @I:-^Lselector.jpg<8C>^...@^@b @I:-^Qmarket_banner.jpg<D2>l...@^@d @I:-^oshop_banner.jpg^...@^@c @I:- browse.jpg<EE>^...@^@b @I:-^Ksegment.jpg<8D>^...@^@a @I:-^kdisplay.j...@^@^...@b @I:-^ibadge.jpg...@^@c @I:$^T7026/blog_browse.jpg<D9>^O ^...@^@<D7>o...@i:.^Klibrary.jpg<EC>^...@^@<D9>o...@i:.^lselector.jpg=...@^@<D8>o...@i:.^Kgallery.jpg,&^...@^@<D5>o...@i:.^Qmarket_banner.jpgl - ------------- The only relevant, and last, LOG line is: 2009-12-16 00:12:00 incr backup started back to 2009-12-12 00:00:08 (backup #0) for directory / Current time on the box is Wed Dec 16 10:54:14 PST 2009 (boxes agree to within a second; didn't test beyond flipping back and forth and typing "date"). Last bit of strace of the dump proc on the SERVER: backuppc 4607 24499 0 00:12 ? 00:00:07 /usr/bin/perl /usr/local/bin/BackupPC_dump -i ey02-s00416_pictures - ------------- 04:34:54.081334 read(7, "nner.jpgDX\1\0mW\342I\272-\vlibrary.jpg\177\24\0\0:,\4.JPGN \t\0kW\342I:,\22_market_banner.jpg^\253\1\0lW\342I:$\0376998/P1010001_1_shop_banner.jpg\234\274\0\0\257F\342I:3\4.JPG\25j\t\0\255F\342I:3\f_library.jpg\340\n\0\0\256F\342I:4\21market_banner.jpg\255\356\0\0\255F\342I:4\vgallery.jpg\0\31\0\0\257F\342I:4\vsegment.jpg<\5\0\0\260F\342I\2726\nlector.jpgK\6\0\0:4\vdisplay.jpg\212\214\0\0\256F\342I:4\nbrowse.jpgU\v\0\0\257F\342I:5\10adge.jpg]$\0\0\256F\342I:$%0716/skpstonespecanbowl4_selector.jpg\10\6\0\0\347F\341I:<\4.jpg\252\313\4\0\344F\341I:<\f_gallery.jpgP\27\0\0\347F\341I:=\21market_banner.jpg\360\235\0\0\345F\341I:=\17shop_banner.jpgC\230\0\0\346F\341I:=\tbadge.jpgG\"\0\0\345F\341I:=\vlibrary.jpg\346\v\0\0\346F\341I:=\nbrowse.jpg+\n\0\0\347F\341I:=\vsegment.jpg\326\4\0\0\350F\341I:=\vdisplay.jpgh\223\0\0\345F\341I:$\0319342/CIMG8427_library.jpgd\16\0\0\1[\342I:2\fselector....@\6\0\0\3[\342i:2\tbadge.jpg<0\0\0\1[\342I\2722\21market_banner.jpg\350\260\0\0:2\vgallery.jpgV\37\0\0\3[\342I:1\4.JPGR\210\2\0\0[\342I:1\20_shop_banner.jpg\260\243\0\0\2[\342I:2\vdisplay.jpg\212\311\0\0\1[\342I:2\vsegment.jpg\32\5\0\0\3[\342I\2722\nbrowse.jpg\363\v\0\0:$&7348/fuchsia_on_blue_2.JPG_display.jpg\346<\1\0\351K\342I\272?\tbadge.jpgH[\0\0:?\vsegment.jpg\305\t\0\0\353K\342I:>\4.jpg\257\360\22\0\350K\342I:>\f_library.jpg:\26\0\0\352K\342I\272?\vgallery.jpgf<\0\0:?\fselector.jpgS\f\0\0\353K\342I:@\16hop_banner.jpgA|\1\0\352K\342I:?\nbrowse.jpg\227\27\0\0\353K\342I:?\21market_banner.jpg`\301\1\0\351K\342I:$$0516/IMG_7919-copy_market_banner.jpg\363\366\0\0*D\341I:7\fselector.jpg4\n\0\0,D\341I:6\4.jpg\312y\10\0)D\341I:6\f_segment.jpg\371\7\0\0,D\341I\2727\nbrowse.jpg3\24\0\0:7\17shop_banner.jpg^\343\0\0+D\341I:7\vdisplay.jpgv\201\1\0*D\341I\2727\tbadge.jpg,K\0\0:7\vgallery.jpg\0223\0\0+D\341I\2727\vlibrary.jpg\352\23\0\0:$\0379142/CIMG8401_market_banner.jpg\205\351\0\0\253X\342I:2\vgallery.jpg\352#\0\0\254X\342I\2722\nbrowse.jpgq\r\0\0\2722\17shop_banner.jpg\264\330\0\0\2722\tbadge.jpg\3677\0\0\2722\vlibrary.jpg\225\17\0\0:2\fselector.jpg\271\6\0\0\255X\342I:1\4.JPG~\333\2\0\253X\342I:1\f_display.jpg\207\371\0\0\254X\342I:2\vsegment.jpg[\5\0\0\255X\342I:$)3457/ByNanasHands_437_gal"..., 65536) = 4092 04:34:54.081334 select(8, [7], NULL, [7], NULL) = 1 (in [7]) 04:35:06.849495 read(7, "\200\f\0\7"..., 65536) = 4 04:35:06.849495 select(8, [7], NULL, [7], NULL) = 1 (in [7]) 04:35:06.849495 read(7, ".jpg\3306\0\0002]\342I:7\21market_banner.jpg\323<\1\0000]\342I:$\0333057/IMG_3601new_browse.jpg\216\t\0\0\244\205\341I:5\vdisplay.jpg\274S\0\0\240\205\341I:5\vsegment.jpg\346\4\0\0\246\205\341I:7\nlector.jpg\315\5\0\0\245\205\341I:4\4.jpgy\304\22\0\235\205\341I:4\22_market_banner.jpg\r\252\0\0\236\205\341I:5\vgallery.jpgk\23\0\0\243\205\341I:5\tbadge.jpg\17\33\0\0\237\205\341I:5\vlibrary.jpg'\n\0\0\241\205\341I:5\17shop_banner.jpg\331u\0\0\242\205\341I:$\0265867/banner_browse.jpg\225\f\0\0\17*\342I\2720\vsegment.jpg\330\5\0\0\2720\vdisplay.jpg\30!\0\0\2720\21market_banner.jpg}\\\0\0\2720\vgallery.jpg\357\26\0\0\2720\vlibrary.jpgC\5\0\0\272/\4.JPG\t/\0\0\272/\r_selector.jpg\2\7\0\0\2720\tbadge.jpg\233\36\0\0\2720\17shop_banner.jpg\300?\0\0:$\0262887/williamsburg3.JPG\327v\5\0.{\341I:6\f_display.jpg\323\267\0\0/{\341I\2727\tbadge.jpg\34.\0\0:7\17shop_banner.jpg\227\344\0\0000{\341I:8\negment.jpg&\6\0\0001{\341I:9\nlector.jpg|\7\0\0000{\341I\2727\nbrowse.jpg\330\f\0\0:7\vlibrary.jpg\267\16\0\0/{\341I:7\21market_banner.jpg'\f\1\0.{\341I:7\vgallery.jpg\236\36\0\0000{\341I:$\0267919/Joy_2_segment.jpg-\7\0\0\312O\342I\272/\vdisplay.jpg\266\315\0\0\272/\20for_cropping.jpgg\227\0\0\272/\tbadge.jpg\2762\0\0\2720\trowse.jpg\27\20\0\0\272.\4.JPGZ\256\2\0\272.\20_shop_banner.jpgY\270\0\0\2720\velector.jpg\360\10\0\0\272/\vlibrary.jpg\234\22\0\0\272/\vgallery.jpg.#\0\0\272/\21market_banner.jpg\332\311\0\0:$\0320316/P1010406_selector.jpg\352\5\0\0\276B\341I:2\nbrowse.jpg\346\n\0\0\275B\341I:2\vdisplay.jpgY\275\0\0\273B\341I:2\vsegment.jpg\24\5\0\0\277B\341I:3\16hop_banner.jpg\3649\1\0\274B\341I:1\4.JPG\214$\20\0\271B\341I:1\f_library.jpg~\v\0\0\274B\341I:2\tbadge.jpg\r/\0\0\272B\341I\2722\21market_banner.jpg\310\261\1\0:2\vgallery.jpgb\35\0\0\275B\341I:$\0343257/S-54b_market_banner.jpg\245\17\0\0\233\223\341I:/\vlibrary.jpg6\v\0\0\234\223\341I\272/\vgallery.jpg\325\23\0\0\272/\17shop_banner.jpg\240\17\0\0\272/\vdisplay.jpg*[\0\0\272/\vsegment.jpg\310\4\0\0\272/\nbrowse.jpg\320\t\0\0:.\4.jpg\255T\1\0\233\223\341I:.\r_selector.jpg\33\6\0\0\234\223\341I\272/\tbadge.jpg\215\33\0\0:$\0309742/Jophiel_segment.jpg\364\5\0\0\0i\342I:1\vdisplay.jpg\23V\0\0\354h\342I:1\tbadge.jpg\321\"\0\0\377h\342I:1\17shop_banner.jpg\235\\\0\0\354h\342I:2\velector.jpgm\7\0\0\0i\342I:0\4.jpg\30a\2\0\353h\342"..., 65536) = 3200 04:35:06.849495 select(8, [7], NULL, [7], NULL - ------------- Note the select that's been sitting there for 6+ hours. lsof, same proc, on the SERVER: - ------------- xc88-s00071 ~ # lsof -p 4607 COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME BackupPC_ 4607 backuppc cwd DIR 8,1 624 2 / BackupPC_ 4607 backuppc rtd DIR 8,1 624 2 / BackupPC_ 4607 backuppc txt REG 8,1 1104232 14724 /usr/bin/perl5.8.8 BackupPC_ 4607 backuppc mem REG 8,1 115056 76305 /lib64/ld-2.6.1.so BackupPC_ 4607 backuppc mem REG 8,1 93392 75381 /lib64/libpthread-2.6.1.so BackupPC_ 4607 backuppc mem REG 8,1 84904 76357 /lib64/libnsl-2.6.1.so BackupPC_ 4607 backuppc mem REG 8,1 14632 76341 /lib64/libdl-2.6.1.so BackupPC_ 4607 backuppc mem REG 8,1 526568 75404 /lib64/libm-2.6.1.so BackupPC_ 4607 backuppc mem REG 8,1 22728 75346 /lib64/libcrypt-2.6.1.so BackupPC_ 4607 backuppc mem REG 8,1 10592 76264 /lib64/libutil-2.6.1.so BackupPC_ 4607 backuppc mem REG 8,1 1305848 76332 /lib64/libc-2.6.1.so BackupPC_ 4607 backuppc mem REG 8,1 31136 29639 /usr/lib64/perl5/5.8.8/x86_64-linux/auto/Data/Dumper/Dumper.so BackupPC_ 4607 backuppc mem REG 8,1 10288 29623 /usr/lib64/perl5/5.8.8/x86_64-linux/auto/Cwd/Cwd.so BackupPC_ 4607 backuppc mem REG 8,1 14288 29711 /usr/lib64/perl5/5.8.8/x86_64-linux/auto/Fcntl/Fcntl.so BackupPC_ 4607 backuppc mem REG 8,1 22592 29891 /usr/lib64/perl5/5.8.8/x86_64-linux/auto/Socket/Socket.so BackupPC_ 4607 backuppc mem REG 8,1 18440 29666 /usr/lib64/perl5/5.8.8/x86_64-linux/auto/Digest/MD5/MD5.so BackupPC_ 4607 backuppc mem REG 8,1 40080 29690 /usr/lib64/perl5/5.8.8/x86_64-linux/auto/Encode/Encode.so BackupPC_ 4607 backuppc mem REG 8,1 18592 29610 /usr/lib64/perl5/5.8.8/x86_64-linux/auto/IO/IO.so BackupPC_ 4607 backuppc mem REG 8,1 26904 209613 /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux/auto/List/Util/Util.so BackupPC_ 4607 backuppc mem REG 8,1 18560 29648 /usr/lib64/perl5/5.8.8/x86_64-linux/auto/File/Glob/Glob.so BackupPC_ 4607 backuppc mem REG 8,1 67944 209571 /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux/auto/Compress/Raw/Zlib/Zlib.so BackupPC_ 4607 backuppc mem REG 8,1 84368 76323 /lib64/libz.so.1.2.3 BackupPC_ 4607 backuppc mem REG 8,1 22688 211593 /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux/auto/File/RsyncP/Digest/Digest.so BackupPC_ 4607 backuppc mem REG 8,1 52064 211597 /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux/auto/File/RsyncP/FileList/FileList.so BackupPC_ 4607 backuppc mem REG 8,1 31544 76340 /lib64/libnss_compat-2.6.1.so BackupPC_ 4607 backuppc mem REG 8,1 43496 75398 /lib64/libnss_nis-2.6.1.so BackupPC_ 4607 backuppc mem REG 8,1 43448 75374 /lib64/libnss_files-2.6.1.so BackupPC_ 4607 backuppc 0w CHR 1,3 607 /dev/null BackupPC_ 4607 backuppc 1w FIFO 0,6 15292289 pipe BackupPC_ 4607 backuppc 2w FIFO 0,6 15292289 pipe BackupPC_ 4607 backuppc 3w REG 8,33 2726 430785 /backups/pc/ey02-s00416_pictures/LOG.122009 BackupPC_ 4607 backuppc 4w REG 8,33 44026496 11706115 /backups/pc/ey02-s00416_pictures/XferLOG.z BackupPC_ 4607 backuppc 5w REG 8,33 0 12301398 /backups/pc/ey02-s00416_pictures/NewFileList BackupPC_ 4607 backuppc 7u unix 0xffff88002497f5c0 15292306 socket - ------------- Last bits of strace of the ssh/rsyc proc on the SERVER: backuppc 4610 4607 0 00:12 ? 00:00:01 /usr/bin/ssh -p 8416 -i /engineyard/backuppc/.ssh/id_dsa -o StrictHostKeyChecking=no -o ServerAliveInterval=60 -vvv -x -l root 65.74.174.196 /usr/bin/rsync --server --sender --numeric-ids --perms --owner --group -D --links --times --block-size=2048 --recursive . / - ------------- 10:54:07.576492 write(3, "\221\221\241\"\310\217\321\26\3\302\f\255ne8.\235\17\350~\201\234\227\305\34\200D\271\323\2463K\35\355C\355\3226\340\252hm\336\310\267-J!?\273\362\301\333\335\237'H\32\361F\370\262W\373"..., 64) = 64 10:54:07.576492 select(7, [3 4], [], NULL, {60, 0}) = 1 (in [3], left {60, 0}) 10:54:07.576492 read(3, "\3377\335\202\35\5\212_\361\205R\2176\vJ>\306Q\352\370(\224\342+I\211\234\231\250<\220L"..., 8192) = 32 10:54:07.576492 select(7, [3 4], [], NULL, {60, 0}) = 0 (Timeout) 10:55:07.577250 select(7, [3 4], [3], NULL, {60, 0}) = 1 (out [3], left {60, 0}) 10:55:07.577250 write(3, "\320\212\6\230!\253\314>j\300\356\1^\306\351\217\230f\3044G\f\235y\263>\223\352i\222\262\361\34\344G\231\v\215EyBs\300$\343\325\200\23R\216\332E\264e\217\304\336\314Q]\227$BM"..., 64) = 64 10:55:07.577250 select(7, [3 4], [], NULL, {60, 0}) = 1 (in [3], left {60, 0}) 10:55:07.577250 read(3, "$1\277{\241\253\3k\351\35\225\242\30\2\2770o\335\v\35\342\333%\251\312[\262x#y9O"..., 8192) = 32 10:55:07.577250 select(7, [3 4], [], NULL, {60, 0}) = 0 (Timeout) 10:56:07.578008 select(7, [3 4], [3], NULL, {60, 0}) = 1 (out [3], left {60, 0}) 10:56:07.578008 write(3, "\350]\373a\246G\21\255\352\246\317\27\"\252\207\256\243\337\33\343\334N2\232\350\316B\316'\204\345pq!\5\3R\217mH\225\335?e\302wI\25\0327\357V\373\350\376\235\330_\222\250\270\6\371\350"..., 64) = 64 10:56:07.578008 select(7, [3 4], [], NULL, {60, 0}) = 1 (in [3], left {60, 0}) 10:56:07.578008 read(3, "~\f\325\212\323u\227\375X\226W`>\3258G\2471\311\306S;\3\37\273\23M\263q$\v\324"..., 8192) = 32 10:56:07.578008 select(7, [3 4], [], NULL, {60, 0} - ------------- Note the rather different timestamps. lsof, same proc, on the SERVER: - ------------- xc88-s00071 ~ # lsof -p 4607 COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME BackupPC_ 4607 backuppc cwd DIR 8,1 624 2 / BackupPC_ 4607 backuppc rtd DIR 8,1 624 2 / BackupPC_ 4607 backuppc txt REG 8,1 1104232 14724 /usr/bin/perl5.8.8 BackupPC_ 4607 backuppc mem REG 8,1 115056 76305 /lib64/ld-2.6.1.so BackupPC_ 4607 backuppc mem REG 8,1 93392 75381 /lib64/libpthread-2.6.1.so BackupPC_ 4607 backuppc mem REG 8,1 84904 76357 /lib64/libnsl-2.6.1.so BackupPC_ 4607 backuppc mem REG 8,1 14632 76341 /lib64/libdl-2.6.1.so BackupPC_ 4607 backuppc mem REG 8,1 526568 75404 /lib64/libm-2.6.1.so BackupPC_ 4607 backuppc mem REG 8,1 22728 75346 /lib64/libcrypt-2.6.1.so BackupPC_ 4607 backuppc mem REG 8,1 10592 76264 /lib64/libutil-2.6.1.so BackupPC_ 4607 backuppc mem REG 8,1 1305848 76332 /lib64/libc-2.6.1.so BackupPC_ 4607 backuppc mem REG 8,1 31136 29639 /usr/lib64/perl5/5.8.8/x86_64-linux/auto/Data/Dumper/Dumper.so BackupPC_ 4607 backuppc mem REG 8,1 10288 29623 /usr/lib64/perl5/5.8.8/x86_64-linux/auto/Cwd/Cwd.so BackupPC_ 4607 backuppc mem REG 8,1 14288 29711 /usr/lib64/perl5/5.8.8/x86_64-linux/auto/Fcntl/Fcntl.so BackupPC_ 4607 backuppc mem REG 8,1 22592 29891 /usr/lib64/perl5/5.8.8/x86_64-linux/auto/Socket/Socket.so BackupPC_ 4607 backuppc mem REG 8,1 18440 29666 /usr/lib64/perl5/5.8.8/x86_64-linux/auto/Digest/MD5/MD5.so BackupPC_ 4607 backuppc mem REG 8,1 40080 29690 /usr/lib64/perl5/5.8.8/x86_64-linux/auto/Encode/Encode.so BackupPC_ 4607 backuppc mem REG 8,1 18592 29610 /usr/lib64/perl5/5.8.8/x86_64-linux/auto/IO/IO.so BackupPC_ 4607 backuppc mem REG 8,1 26904 209613 /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux/auto/List/Util/Util.so BackupPC_ 4607 backuppc mem REG 8,1 18560 29648 /usr/lib64/perl5/5.8.8/x86_64-linux/auto/File/Glob/Glob.so BackupPC_ 4607 backuppc mem REG 8,1 67944 209571 /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux/auto/Compress/Raw/Zlib/Zlib.so BackupPC_ 4607 backuppc mem REG 8,1 84368 76323 /lib64/libz.so.1.2.3 BackupPC_ 4607 backuppc mem REG 8,1 22688 211593 /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux/auto/File/RsyncP/Digest/Digest.so BackupPC_ 4607 backuppc mem REG 8,1 52064 211597 /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux/auto/File/RsyncP/FileList/FileList.so BackupPC_ 4607 backuppc mem REG 8,1 31544 76340 /lib64/libnss_compat-2.6.1.so BackupPC_ 4607 backuppc mem REG 8,1 43496 75398 /lib64/libnss_nis-2.6.1.so BackupPC_ 4607 backuppc mem REG 8,1 43448 75374 /lib64/libnss_files-2.6.1.so BackupPC_ 4607 backuppc 0w CHR 1,3 607 /dev/null BackupPC_ 4607 backuppc 1w FIFO 0,6 15292289 pipe BackupPC_ 4607 backuppc 2w FIFO 0,6 15292289 pipe BackupPC_ 4607 backuppc 3w REG 8,33 2726 430785 /backups/pc/ey02-s00416_pictures/LOG.122009 BackupPC_ 4607 backuppc 4w REG 8,33 44026496 11706115 /backups/pc/ey02-s00416_pictures/XferLOG.z BackupPC_ 4607 backuppc 5w REG 8,33 0 12301398 /backups/pc/ey02-s00416_pictures/NewFileList BackupPC_ 4607 backuppc 7u unix 0xffff88002497f5c0 15292306 socket - ------------- rsync/ssh on the CLIENT: root 21661 3254 0 00:12 ? 00:00:01 sshd: r...@notty root 21663 21661 30 00:12 ? 03:18:21 /usr/bin/rsync --server --sender --numeric-ids --perms --owner --group -D --links --times --block-size=2048 --recursive . / last bits of strace from rsync/ssh on the CLIENT: - ------------- 04:34:54.502841 lstat("pictures/agate/shared/pictures/0216/2987/Sapphire_Pearl_Pendant_segment.jpg", {st_mode=S_IFREG|0644, st_size=1152, ...}) = 0 04:34:54.503024 lstat("pictures/agate/shared/pictures/0216/2987/Sapphire_Pearl_Pendant.JPG", {st_mode=S_IFREG|0644, st_size=88733, ...}) = 0 04:34:54.503208 lstat("pictures/agate/shared/pictures/0216/2987/Sapphire_Pearl_Pendant_market_banner.jpg", {st_mode=S_IFREG|0644, st_size=21168, ...}) = 0 04:34:54.503392 getdents(3, /* 0 entries */, 4096) = 0 04:34:54.503459 close(3) = 0 04:34:54.503523 gettimeofday({1260966894, 503548}, NULL) = 0 04:34:54.503583 gettimeofday({1260966894, 503608}, NULL) = 0 04:34:54.503648 mmap(NULL, 20447232, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b3d5b87c000 04:35:06.795465 munmap(0x2b3d5b87c000, 20447232) = 0 04:35:06.844403 select(2, NULL, [1], [1], {60, 0}) = 1 (out [1], left {60, 0}) 04:35:06.844557 write(1, "\200\f\0\7", 4) = 4 04:35:06.844633 select(2, NULL, [1], [1], {60, 0}) = 1 (out [1], left {60, 0}) 04:35:06.844739 write(1, ".jpg\3306\0\0002]\342I:7\21market_banner.jpg\323<\1\0000]\342I:$\0333057/IMG_3601new_browse.jpg\216\t\0\0\244\205\341I:5\vdisplay.jpg\274S\0\0\240\205\341I:5\vsegment.jpg\346\4\0\0\246\205\341I:7\nlector.jpg\315\5\0\0\245\205\341I:4\4.jpgy\304\22\0\235\205\341I:4\22_market_banner.jpg\r\252\0\0\236\205\341I:5\vgallery.jpgk\23\0\0\243\205\341I:5\tbadge.jpg\17\33\0\0\237\205\341I:5\vlibrary.jpg\'\n\0\0\241\205\341I:5\17shop_banner.jpg\331u\0\0\242\205\341I:$\0265867/banner_browse.jpg\225\f\0\0\17*\342I\2720\vsegment.jpg\330\5\0\0\2720\vdisplay.jpg\30!\0\0\2720\21market_banner.jpg}\\\0\0\2720\vgallery.jpg\357\26\0\0\2720\vlibrary.jpgC\5\0\0\272/\4.JPG\t/\0\0\272/\r_selector.jpg\2\7\0\0\2720\tbadge.jpg\233\36\0\0\2720\17shop_banner.jpg\300?\0\0:$\0262887/williamsburg3.JPG\327v\5\0.{\341I:6\f_display.jpg\323\267\0\0/{\341I\2727\tbadge.jpg\34.\0\0:7\17shop_banner.jpg\227\344\0\0000{\341I:8\negment.jpg&\6\0\0001{\341I:9\nlector.jpg|\7\0\0000{\341I\2727\nbrowse.jpg\330\f\0\0:7\vlibrary.jpg\267\16\0\0/{\341I:7\21market_banner.jpg\'\f\1\0.{\341I:7\vgallery.jpg\236\36\0\0000{\341I:$\0267919/Joy_2_segment.jpg-\7\0\0\312O\342I\272/\vdisplay.jpg\266\315\0\0\272/\20for_cropping.jpgg\227\0\0\272/\tbadge.jpg\2762\0\0\2720\trowse.jpg\27\20\0\0\272.\4.JPGZ\256\2\0\272.\20_shop_banner.jpgY\270\0\0\2720\velector.jpg\360\10\0\0\272/\vlibrary.jpg\234\22\0\0\272/\vgallery.jpg.#\0\0\272/\21market_banner.jpg\332\311\0\0:$\0320316/P1010406_selector.jpg\352\5\0\0\276B\341I:2\nbrowse.jpg\346\n\0\0\275B\341I:2\vdisplay.jpgY\275\0\0\273B\341I:2\vsegment.jpg\24\5\0\0\277B\341I:3\16hop_banner.jpg\3649\1\0\274B\341I:1\4.JPG\214$\20\0\271B\341I:1\f_library.jpg~\v\0\0\274B\341I:2\tbadge.jpg\r/\0\0\272B\341I\2722\21market_banner.jpg\310\261\1\0:2\vgallery.jpgb\35\0\0\275B\341I:$\0343257/S-54b_market_banner.jpg\245\17\0\0\233\223\341I:/\vlibrary.jpg6\v\0\0\234\223\341I\272/\vgallery.jpg\325\23\0\0\272/\17shop_banner.jpg\240\17\0\0\272/\vdisplay.jpg*[\0\0\272/\vsegment.jpg\310\4\0\0\272/\nbrowse.jpg\320\t\0\0:.\4.jpg\255T\1\0\233\223\341I:.\r_selector.jpg\33\6\0\0\234\223\341I\272/\tbadge.jpg\215\33\0\0:$\0309742/Jophiel_segment.jpg\364\5\0\0\0i\342I:1\vdisplay.jpg\23V\0\0\354h\342I:1\tbadge.jpg\321\"\0\0\377h\342I:1\17shop_banner.jpg\235\\\0\0\354h\342I:2\velector.jpgm\7\0\0\0i\342I:0\4.jpg\30a\2\0\353h"..., 3200) = 3200 04:35:06.845208 select(1, [0], [], NULL, {60, 0}) = 0 (Timeout) 04:36:06.844621 select(1, [0], [], NULL, {60, 0}) = 0 (Timeout) 04:37:06.949968 select(1, [0], [], NULL, {60, 0}) = 0 (Timeout) 04:38:06.950773 select(1, [0], [], NULL, {60, 0}) = 0 (Timeout) 04:39:06.951859 select(1, [0], [], NULL, {60, 0}) = 0 (Timeout) 04:40:06.952896 select(1, [0], [], NULL, {60, 0}) = 0 (Timeout) 04:41:06.953942 select(1, [0], [], NULL, {60, 0}) = 0 (Timeout) 04:42:07.181208 select(1, [0], [], NULL, {60, 0}) = 0 (Timeout) [snip] 10:53:10.572467 select(1, [0], [], NULL, {60, 0}) = 0 (Timeout) 10:54:11.041651 select(1, [0], [], NULL, {60, 0}) = 0 (Timeout) 10:55:11.282693 select(1, [0], [], NULL, {60, 0}) = 0 (Timeout) 10:56:11.523736 select(1, [0], [], NULL, {60, 0}) = 0 (Timeout) 10:57:11.524637 select(1, [0], [], NULL, {60, 0}) = 0 (Timeout) 10:58:11.525718 select(1, [0], [], NULL, {60, 0} - ------------- lsof of same on CLIENT: - ------------- ey02-s00416 ~ # lsof -p 21663 COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME rsync 21663 root cwd DIR 8,1 760 2 / rsync 21663 root rtd DIR 8,1 760 2 / rsync 21663 root txt REG 8,1 372408 519144 /usr/bin/rsync rsync 21663 root mem REG 8,1 114952 78335 /lib64/ld-2.6.1.so rsync 21663 root mem REG 8,1 31296 65085 /lib64/libacl.so.1.1.0 rsync 21663 root mem REG 8,1 31424 35956 /usr/lib64/libpopt.so.0.0.0 rsync 21663 root mem REG 8,1 1305744 78362 /lib64/libc-2.6.1.so rsync 21663 root mem REG 8,1 18696 99779 /lib64/libattr.so.1.1.0 rsync 21663 root 0u unix 0xffff8800b4546c40 175163806 socket rsync 21663 root 1u unix 0xffff8800b4546c40 175163806 socket rsync 21663 root 2u unix 0xffff880044935740 175163808 socket - ------------- And, finally, tcpdump on the CLIENT: - ------------- 04:34:53.525749 IP (tos 0x8, ttl 56, id 19535, offset 0, flags [DF], proto TCP (6), length 52) 129.177.74.65.static.xc88.engineyard.com.42585 > ey02-s00416.ey02.engineyard.com.ssh: ., cksum 0xf81e (correct), 54385:54385(0) ack 115876872 win 501 <nop,nop,timestamp144817076 513742551> 04:34:53.525792 IP (tos 0x8, ttl 64, id 35494, offset 0, flags [DF], proto TCP (6), length 1284) ey02-s00416.ey02.engineyard.com.ssh > 129.177.74.65.static.xc88.engineyard.com.42585: P 115878320:115879552(1232) ack 54385 win 132 <nop,nop,timestamp 513742551 144817076> 04:34:53.526944 IP (tos 0x8, ttl 56, id 19536, offset 0, flags [DF], proto TCP (6), length 52) 129.177.74.65.static.xc88.engineyard.com.42585 > ey02-s00416.ey02.engineyard.com.ssh: ., cksum 0xeda6 (correct), 54385:54385(0) ack 115879552 win 501 <nop,nop,timestamp144817076 513742551> 04:34:54.074567 IP (tos 0x8, ttl 64, id 35495, offset 0, flags [DF], proto TCP (6), length 100) ey02-s00416.ey02.engineyard.com.ssh> 129.177.74.65.static.xc88.engineyard.com.42585: P 115879552:115879600(48) ack 54385 win 132 <nop,nop,timestamp 513742688 144817076> 04:34:54.075133 IP (tos 0x8, ttl 64, id 35496, offset 0, flags [DF], proto TCP (6), length 2948) ey02-s00416.ey02.engineyard.com.ssh > 129.177.74.65.static.xc88.engineyard.com.42585: . 115879600:115882496(2896) ack 54385 win 132 <nop,nop,timestamp 513742688 144817076> 04:34:54.076020 IP (tos 0x8, ttl 56, id 19537, offset 0, flags [DF], proto TCP (6), length 52) 129.177.74.65.static.xc88.engineyard.com.42585 > ey02-s00416.ey02.engineyard.com.ssh: ., cksum 0xe6bc (correct), 54385:54385(0) ack 115881048 win 501 <nop,nop,timestamp144817213 513742688> 04:34:54.076051 IP (tos 0x8, ttl 64, id 35498, offset 0, flags [DF], proto TCP (6), length 1284) ey02-s00416.ey02.engineyard.com.ssh > 129.177.74.65.static.xc88.engineyard.com.42585: P 115882496:115883728(1232) ack 54385 win 132 <nop,nop,timestamp 513742689 144817213> 04:34:54.077032 IP (tos 0x8, ttl 56, id 19538, offset 0, flags [DF], proto TCP (6), length 52) 129.177.74.65.static.xc88.engineyard.com.42585 > ey02-s00416.ey02.engineyard.com.ssh: ., cksum 0xdc44 (correct), 54385:54385(0) ack 115883728 win 501 <nop,nop,timestamp144817213 513742688> 04:35:06.844658 IP (tos 0x8, ttl 64, id 35499, offset 0, flags [DF], proto TCP (6), length 100) ey02-s00416.ey02.engineyard.com.ssh> 129.177.74.65.static.xc88.engineyard.com.42585: P 115883728:115883776(48) ack 54385 win 132 <nop,nop,timestamp 513745881 144817213> 04:35:06.845281 IP (tos 0x8, ttl 64, id 35500, offset 0, flags [DF], proto TCP (6), length 2948) ey02-s00416.ey02.engineyard.com.ssh > 129.177.74.65.static.xc88.engineyard.com.42585: . 115883776:115886672(2896) ack 54385 win 132 <nop,nop,timestamp 513745881 144817213> 04:35:06.846300 IP (tos 0x8, ttl 56, id 19539, offset 0, flags [DF], proto TCP (6), length 52) 129.177.74.65.static.xc88.engineyard.com.42585 > ey02-s00416.ey02.engineyard.com.ssh: ., cksum 0xbd7a (correct), 54385:54385(0) ack 115885224 win 501 <nop,nop,timestamp144820406 513745881> 04:35:06.846353 IP (tos 0x8, ttl 64, id 35502, offset 0, flags [DF], proto TCP (6), length 404) ey02-s00416.ey02.engineyard.com.ssh> 129.177.74.65.static.xc88.engineyard.com.42585: P 115886672:115887024(352) ack 54385 win 132 <nop,nop,timestamp 513745881 144820406> 04:35:06.847088 IP (tos 0x8, ttl 56, id 19540, offset 0, flags [DF], proto TCP (6), length 52) 129.177.74.65.static.xc88.engineyard.com.42585 > ey02-s00416.ey02.engineyard.com.ssh: ., cksum 0xb672 (correct), 54385:54385(0) ack 115887024 win 501 <nop,nop,timestamp144820406 513745881> 04:36:06.846945 IP (tos 0x8, ttl 56, id 19541, offset 0, flags [DF], proto TCP (6), length 116) 129.177.74.65.static.xc88.engineyard.com.42585 > ey02-s00416.ey02.engineyard.com.ssh: P 54385:54449(64) ack 115887024 win 501 <nop,nop,timestamp 144835406 513745881> 04:36:06.846970 IP (tos 0x8, ttl 64, id 35503, offset 0, flags [DF], proto TCP (6), length 52) ey02-s00416.ey02.engineyard.com.ssh > 129.177.74.65.static.xc88.engineyard.com.42585: ., cksum 0x4273 (correct), 115887024:115887024(0) ack 54449 win 132 <nop,nop,timestamp 513760881 144835406> 04:36:06.847037 IP (tos 0x8, ttl 64, id 35504, offset 0, flags [DF], proto TCP (6), length 84) ey02-s00416.ey02.engineyard.com.ssh > 129.177.74.65.static.xc88.engineyard.com.42585: P 115887024:115887056(32) ack 54449 win 132 <nop,nop,timestamp 513760881 144835406> 04:36:06.886586 IP (tos 0x8, ttl 56, id 19542, offset 0, flags [DF], proto TCP (6), length 52) 129.177.74.65.static.xc88.engineyard.com.42585 > ey02-s00416.ey02.engineyard.com.ssh: ., cksum 0x40d8 (correct), 54449:54449(0) ack 115887056 win 501 <nop,nop,timestamp144835416 513760881> 04:37:06.847775 IP (tos 0x8, ttl 56, id 19543, offset 0, flags [DF], proto TCP (6), length 116) 129.177.74.65.static.xc88.engineyard.com.42585 > ey02-s00416.ey02.engineyard.com.ssh: P 54449:54513(64) ack 115887056 win 501 <nop,nop,timestamp 144850406 513760881> 04:37:06.847878 IP (tos 0x8, ttl 64, id 35505, offset 0, flags [DF], proto TCP (6), length 84) ey02-s00416.ey02.engineyard.com.ssh > 129.177.74.65.static.xc88.engineyard.com.42585: P 115887056:115887088(32) ack 54513 win 132 <nop,nop,timestamp 513775881 144850406> 04:37:06.848515 IP (tos 0x8, ttl 56, id 19544, offset 0, flags [DF], proto TCP (6), length 52) 129.177.74.65.static.xc88.engineyard.com.42585 > ey02-s00416.ey02.engineyard.com.ssh: ., cksum 0xcb51 (correct), 54513:54513(0) ack 115887088 win 501 <nop,nop,timestamp144850406 513775881> [snip] 10:53:07.575258 IP (tos 0x8, ttl 56, id 20295, offset 0, flags [DF], proto TCP (6), length 116) 129.177.74.65.static.xc88.engineyard.com.42585 > ey02-s00416.ey02.engineyard.com.ssh: P 78513:78577(64) ack 115899088 win 501 <nop,nop,timestamp 150490516 519400964> 10:53:07.575383 IP (tos 0x8, ttl 64, id 35883, offset 0, flags [DF], proto TCP (6), length 84) ey02-s00416.ey02.engineyard.com.ssh > 129.177.74.65.static.xc88.engineyard.com.42585: P 115899088:115899120(32) ack 78577 win 132 <nop,nop,timestamp 519415964 150490516> 10:53:07.576056 IP (tos 0x8, ttl 56, id 20296, offset 0, flags [DF], proto TCP (6), length 52) 129.177.74.65.static.xc88.engineyard.com.42585 > ey02-s00416.ey02.engineyard.com.ssh: ., cksum 0x1e64 (correct), 78577:78577(0) ack 115899120 win 501 <nop,nop,timestamp 150490516 519415964> 10:54:07.576336 IP (tos 0x8, ttl 56, id 20297, offset 0, flags [DF], proto TCP (6), length 116) 129.177.74.65.static.xc88.engineyard.com.42585 > ey02-s00416.ey02.engineyard.com.ssh: P 78577:78641(64) ack 115899120 win 501 <nop,nop,timestamp 150505516 519415964> 10:54:07.576514 IP (tos 0x8, ttl 64, id 35884, offset 0, flags [DF], proto TCP (6), length 84) ey02-s00416.ey02.engineyard.com.ssh > 129.177.74.65.static.xc88.engineyard.com.42585: P 115899120:115899152(32) ack 78641 win 132 <nop,nop,timestamp 519430964 150505516> - ------------- Suggestions welcome, but it sure doesn't look like any kind of networking problem to me. The only place I can see where there might have been some kind of communication breakdown is between BackupPC_dump on the server and the ssh/rsync on the server; it looks like BackupPC_dump is waiting for more data but rsync is done sending it. The last thing the client rsync sent was: 04:35:06.844739 write(1, ".jpg\3306\0\0002]\342I:7\21market_banner.jpg\323<\1\0000[snip]"..., 3200) = 3200 And the last thing BackupPC_dump saw before the select that never ends was: 04:35:06.849495 read(7, ".jpg\3306\0\0002]\342I:7\21market_banner.jpg\323<\1\0000[snip]"..., 65536) = 3200 So clearly their last communication was in accord. It just looks like BackupPC_dump has no idea that rsync is done. So after a while, the NAT deletes the apparently "dead" connection, except with keepalives that doesn't happen, so we just sit here forever. The other 7 BackupPC_dump sessions seem to be in the same state: select stuck on FD 7. I'm quite stumped. Any ideas? -Robin -- They say: "The first AIs will be built by the military as weapons." And I'm thinking: "Does it even occur to you to try for something other than the default outcome?" See http://shrunklink.com/cdiz http://www.digitalkingdom.org/~rlpowell/ *** http://www.lojban.org/ ------------------------------------------------------------------------------ This SF.Net email is sponsored by the Verizon Developer Community Take advantage of Verizon's best-in-class app development support A streamlined, 14 day to market process makes app distribution fast and easy Join now and get one step closer to millions of Verizon customers http://p.sf.net/sfu/verizon-dev2dev _______________________________________________ BackupPC-users mailing list BackupPC-users@lists.sourceforge.net List: https://lists.sourceforge.net/lists/listinfo/backuppc-users Wiki: http://backuppc.wiki.sourceforge.net Project: http://backuppc.sourceforge.net/