On 21/01/16 19:39, OmegaPhil wrote:
> On 20/01/16 22:01, sf...@users.sourceforge.net wrote:
>> OmegaPhil:
>>> It has now been some time since I got the kernel memory allocation
>>> failures, so clearly the libau hack has fixed it - thanks.
>>
>> Glad to hear that!
>> (Honestly speaking, I totally forgot about this issue)
>>
>>
>>> In the manpage, please can you change 'If you have a directory which has
>>> millions of files' to say 'tens of thousands of files', and it would be
>>> useful to mention 'page allocation failure' somehow so that its easy for
>> :::
>>
>> How about the attached diff?
>
> The diff looks good, however for normal users it might be useful to
> force them to think 'syslog', since normal programs will probably throw
> a useless generic 'I/O error':
>
> 'You may meet "out of memory" message or "page allocation failure" due
> to the memory fragmentation or real starvation'
>
> V
>
> 'A program using the directory may throw an "out of memory" error and/or
> the kernel may output a "page allocation failure" associated with the
> program in the syslog, due to memory fragmentation or real starvation'
>
>
>>> rsync: readdir("/omega1-storage-4/." (in backups)): Invalid argument (22)
>>
>> Hmm, won't you investigate it a little more?
>> - which systemcall returned EINVAL(22)?
>> - what parameter did rsync pass to the systemcall (or readdir)?
>>
>> And is your $LIBAU set to "all"?
>
> I did look into it on the rsync side, didn't look useful - see
> https://download.samba.org/pub/unpacked/rsync/flist.c:send_directory,
> the readdir is called on line 1739, with the error reported on 1771.
>
> Suddenly the VM doesn't error anymore in the particular test I set up,
> so back on the server, I fiddled with the rsync init.d script and ran
> the daemon via 'strace -fv'.
> One EINVAL hit in the resulting file, here
> is it with some context:
>
> =======================================================================
>
> [pid 1293] stat("/omega1-home/", {st_dev=makedev(0, 34), st_ino=273972,
> st_mode=S_IFDIR|0755, st_nlink=4, st_uid=0, st_gid=0, st_blksize=4096,
> st_blocks=0, st_size=66, st_atime=2016/01/20-20:44:12,
> st_mtime=2014/09/13-12:11:23, st_ctime=2015/01/07-07:46:30}) = 0
> [pid 1293] chdir("/omega1-home/") = 0
> [pid 1293] socketpair(PF_LOCAL, SOCK_STREAM, 0, [4, 6]) = 0
> [pid 1293] fcntl(4, F_GETFL) = 0x2 (flags O_RDWR)
> [pid 1293] fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
> [pid 1293] fcntl(6, F_GETFL) = 0x2 (flags O_RDWR)
> [pid 1293] fcntl(6, F_SETFL, O_RDWR|O_NONBLOCK) = 0
> [pid 1293] clone(child_stack=0,
> flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
> child_tidptr=0x7f7b499149d0) = 1294
> Process 1294 attached
> [pid 1293] close(6 <unfinished ...>
> [pid 1294] set_robust_list(0x7f7b499149e0, 24 <unfinished ...>
> [pid 1293] <... close resumed> ) = 0
> [pid 1294] <... set_robust_list resumed> ) = 0
> [pid 1293] lstat(".", <unfinished ...>
> [pid 1294] close(4 <unfinished ...>
> [pid 1293] <... lstat resumed> {st_dev=makedev(0, 34), st_ino=273972,
> st_mode=S_IFDIR|0755, st_nlink=4, st_uid=0, st_gid=0, st_blksize=4096,
> st_blocks=0, st_size=66, st_atime=2016/01/20-20:44:12,
> st_mtime=2014/09/13-12:11:23, st_ctime=2015/01/07-07:46:30}) = 0
> [pid 1294] <... close resumed> ) = 0
> [pid 1293] openat(AT_FDCWD, ".",
> O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC <unfinished ...>
> [pid 1294] select(6, [5], [], [5], {60, 0} <unfinished ...>
> [pid 1293] <... openat resumed> ) = 6
> [pid 1293] brk(0x564d6bf68000) = 0x564d6bf68000
> [pid 1293] fstatfs(6, {f_type=0x61756673, f_bsize=4096,
> f_blocks=3418641366, f_bfree=652846041, f_bavail=649635649, f_files=0,
> f_ffree=0, f_fsid={0, 0}, f_namelen=242, f_frsize=4096}) = 0
> [pid 1293] ioctl(6, _IOC(_IOC_READ|_IOC_WRITE, 0x41, 0x00, 0x40),
> 0x7ffc3f394810) = -1 EINVAL (Invalid argument)
> [pid 1293] sendto(3, "<28>Jan 21 19:11:31 rsyncd[1293]"..., 103,
> MSG_NOSIGNAL, NULL, 0) = 103
> [pid 1293] fstatfs(6, {f_type=0x61756673, f_bsize=4096,
> f_blocks=3418641366, f_bfree=652846041, f_bavail=649635649, f_files=0,
> f_ffree=0, f_fsid={0, 0}, f_namelen=242, f_frsize=4096}) = 0
> [pid 1293] futex(0x7f7b48b3d0a8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
> [pid 1293] close(6) = 0
> [pid 1293] lstat(".", {st_dev=makedev(0, 34), st_ino=273972,
> st_mode=S_IFDIR|0755, st_nlink=4, st_uid=0, st_gid=0, st_blksize=4096,
> st_blocks=0, st_size=66, st_atime=2016/01/20-20:44:12,
> st_mtime=2014/09/13-12:11:23, st_ctime=2015/01/07-07:46:30}) = 0
>
> =======================================================================
>
> After lstating '.', rsync appears to go on and lstat the subdirectories.
> I'm guessing that due to the failure being an ioctl call, it didn't
> appear in the usual '-e trace=file' invocation?
>
>
>>> This appears to have happened after I upgraded the kernel to v4.3.3-5,
>>
>> Is this version debian kernel pkg's?
>> According to your post in last year, your system is
>> 4.2.0-1-amd64 #1 SMP Debian 4.2.5-1 (2015-10-27) x86_64
>> GNU/Linux - Debian Testing standard kernel.
>>
>> If this problem is specific to debian v4.3.3-5 kernel, then I will try
>> finding the changes made in
>> 1. vanilla v4.3.3
>> 2. debian v4.3.3-5
>> particulary around ioctl(2).
>
> Just confirmed, on this kernel the setup is fine:
>
> =======================================================================
>
> Linux 4.2.0-1-amd64 #1 SMP Debian 4.2.6-1 (2015-11-10) x86_64 GNU/Linux
>
> =======================================================================
>
> On this it breaks:
>
> =======================================================================
>
> Linux 4.3.0-1-amd64 #1 SMP Debian 4.3.3-5 (2016-01-04) x86_64 GNU/Linux
>
> =======================================================================
>
> Yes these are stock Debian kernels - the only special compilation I do
> is your standalone aufs driver (there are some DKMS modules mind).
>
> Thanks

Minor update - rsync/aufs has suddenly started failing all over the place with this supposed kernel memory problem, so even SSH+rsync is somewhat unusable currently. I say 'somewhat' because forcing the relevant environment variables through the rsync SSH session then results in the 'invalid argument' problem in most, but not all, rsync calls.
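
For anyone reading the strace excerpt quoted above, here is a rough sketch of how the two numeric values in it can be decoded. It assumes the generic Linux ioctl number layout (asm-generic/ioctl.h, as used on x86_64); the helper names `ioc` and `fs_magic_ascii` are made up for illustration. That the resulting request is the aufs readdir-in-userspace ioctl issued via libau is only a guess based on the 'A' type byte, the trace itself does not confirm it.

```python
import struct

# Generic Linux ioctl number layout (assumed: asm-generic/ioctl.h, x86_64).
_IOC_NRBITS, _IOC_TYPEBITS, _IOC_SIZEBITS = 8, 8, 14
_IOC_NRSHIFT = 0
_IOC_TYPESHIFT = _IOC_NRSHIFT + _IOC_NRBITS      # 8
_IOC_SIZESHIFT = _IOC_TYPESHIFT + _IOC_TYPEBITS  # 16
_IOC_DIRSHIFT = _IOC_SIZESHIFT + _IOC_SIZEBITS   # 30
_IOC_WRITE, _IOC_READ = 1, 2


def ioc(direction, type_, nr, size):
    """Hypothetical helper: pack the four _IOC() fields into one request number."""
    return ((direction << _IOC_DIRSHIFT) | (size << _IOC_SIZESHIFT)
            | (type_ << _IOC_TYPESHIFT) | (nr << _IOC_NRSHIFT))


def fs_magic_ascii(f_type):
    """Hypothetical helper: read a 32-bit filesystem magic as big-endian ASCII."""
    return struct.pack(">I", f_type).decode("ascii")


# Values copied from the failing ioctl() and the fstatfs() in the strace output.
request = ioc(_IOC_READ | _IOC_WRITE, 0x41, 0x00, 0x40)
print(hex(request))                # 0xc0404100
print(chr(0x41))                   # A  (the ioctl "type" byte)
print(fs_magic_ascii(0x61756673))  # aufs
```

So the descriptor the ioctl fails on does belong to an aufs mount (f_type spells 'aufs'), and the request is a read/write ioctl of type 'A', number 0, with a 64-byte argument.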