On 20/01/16 22:01, sf...@users.sourceforge.net wrote:
> OmegaPhil:
>> It has now been some time since I got the kernel memory allocation
>> failures, so clearly the libau hack has fixed it - thanks.
>
> Glad to hear that!
> (Honestly speaking, I totally forgot about this issue)
>
>
>> In the manpage, please can you change 'If you have a directory which has
>> millions of files' to say 'tens of thousands of files', and it would be
>> useful to mention 'page allocation failure' somehow so that its easy for
> :::
>
> How about the attached diff?
The diff looks good. However, for normal users it might be useful to
push them towards thinking 'syslog', since normal programs will probably
just throw a useless generic 'I/O error'. So instead of:

'You may meet "out of memory" message or "page allocation failure" due
to the memory fragmentation or real starvation'

perhaps:

'A program using the directory may throw an "out of memory" error and/or
the kernel may output a "page allocation failure" associated with the
program in the syslog, due to memory fragmentation or real starvation'

>> rsync: readdir("/omega1-storage-4/." (in backups)): Invalid argument (22)
>
> Hmm, won't you investigate it a little more?
> - which systemcall returned EINVAL(22)?
> - what parameter did rsync pass to the systemcall (or readdir)?
>
> And is your $LIBAU set to "all"?

I did look into it on the rsync side, but it didn't look useful: see
send_directory() in https://download.samba.org/pub/unpacked/rsync/flist.c,
where readdir is called on line 1739 and the error is reported on line 1771.

The VM suddenly no longer errors in the particular test I set up, so back
on the server I fiddled with the rsync init.d script and ran the daemon
via 'strace -fv'.
One EINVAL hit in the resulting file; here it is with some context:

=======================================================================
[pid 1293] stat("/omega1-home/", {st_dev=makedev(0, 34), st_ino=273972, st_mode=S_IFDIR|0755, st_nlink=4, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=0, st_size=66, st_atime=2016/01/20-20:44:12, st_mtime=2014/09/13-12:11:23, st_ctime=2015/01/07-07:46:30}) = 0
[pid 1293] chdir("/omega1-home/") = 0
[pid 1293] socketpair(PF_LOCAL, SOCK_STREAM, 0, [4, 6]) = 0
[pid 1293] fcntl(4, F_GETFL) = 0x2 (flags O_RDWR)
[pid 1293] fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 1293] fcntl(6, F_GETFL) = 0x2 (flags O_RDWR)
[pid 1293] fcntl(6, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 1293] clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f7b499149d0) = 1294
Process 1294 attached
[pid 1293] close(6 <unfinished ...>
[pid 1294] set_robust_list(0x7f7b499149e0, 24 <unfinished ...>
[pid 1293] <... close resumed> ) = 0
[pid 1294] <... set_robust_list resumed> ) = 0
[pid 1293] lstat(".", <unfinished ...>
[pid 1294] close(4 <unfinished ...>
[pid 1293] <... lstat resumed> {st_dev=makedev(0, 34), st_ino=273972, st_mode=S_IFDIR|0755, st_nlink=4, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=0, st_size=66, st_atime=2016/01/20-20:44:12, st_mtime=2014/09/13-12:11:23, st_ctime=2015/01/07-07:46:30}) = 0
[pid 1294] <... close resumed> ) = 0
[pid 1293] openat(AT_FDCWD, ".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC <unfinished ...>
[pid 1294] select(6, [5], [], [5], {60, 0} <unfinished ...>
[pid 1293] <... openat resumed> ) = 6
[pid 1293] brk(0x564d6bf68000) = 0x564d6bf68000
[pid 1293] fstatfs(6, {f_type=0x61756673, f_bsize=4096, f_blocks=3418641366, f_bfree=652846041, f_bavail=649635649, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=242, f_frsize=4096}) = 0
[pid 1293] ioctl(6, _IOC(_IOC_READ|_IOC_WRITE, 0x41, 0x00, 0x40), 0x7ffc3f394810) = -1 EINVAL (Invalid argument)
[pid 1293] sendto(3, "<28>Jan 21 19:11:31 rsyncd[1293]"..., 103, MSG_NOSIGNAL, NULL, 0) = 103
[pid 1293] fstatfs(6, {f_type=0x61756673, f_bsize=4096, f_blocks=3418641366, f_bfree=652846041, f_bavail=649635649, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=242, f_frsize=4096}) = 0
[pid 1293] futex(0x7f7b48b3d0a8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
[pid 1293] close(6) = 0
[pid 1293] lstat(".", {st_dev=makedev(0, 34), st_ino=273972, st_mode=S_IFDIR|0755, st_nlink=4, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=0, st_size=66, st_atime=2016/01/20-20:44:12, st_mtime=2014/09/13-12:11:23, st_ctime=2015/01/07-07:46:30}) = 0
=======================================================================

After lstating '.', rsync appears to go on and lstat the subdirectories.
I'm guessing that, because the failure is in an ioctl call, it didn't
appear in the usual '-e trace=file' invocation?

>> This appears to have happened after I upgraded the kernel to v4.3.3-5,
>
> Is this version debian kernel pkg's?
> According to your post in last year, your system is
> 4.2.0-1-amd64 #1 SMP Debian 4.2.5-1 (2015-10-27) x86_64
> GNU/Linux - Debian Testing standard kernel.
>
> If this problem is specific to debian v4.3.3-5 kernel, then I will try
> finding the changes made in
> 1. vanilla v4.3.3
> 2. debian v4.3.3-5
> particulary around ioctl(2).
Just confirmed. On this kernel the setup is fine:

=======================================================================
Linux 4.2.0-1-amd64 #1 SMP Debian 4.2.6-1 (2015-11-10) x86_64 GNU/Linux
=======================================================================

On this one it breaks:

=======================================================================
Linux 4.3.0-1-amd64 #1 SMP Debian 4.3.3-5 (2016-01-04) x86_64 GNU/Linux
=======================================================================

Yes, these are stock Debian kernels. The only special compilation I do
is your standalone aufs driver (there are some DKMS modules, mind).

Thanks