On 21/01/16 19:39, OmegaPhil wrote:
> On 20/01/16 22:01, sf...@users.sourceforge.net wrote:
>> OmegaPhil:
>>> It has now been some time since I got the kernel memory allocation
>>> failures, so clearly the libau hack has fixed it - thanks.
>>
>> Glad to hear that!
>> (Honestly speaking, I totally forgot about this issue)
>>
>>
>>> In the manpage, please can you change 'If you have a directory which has
>>> millions of files' to say 'tens of thousands of files', and it would be
>>> useful to mention 'page allocation failure' somehow so that it's easy for
>>      :::
>>
>> How about the attached diff?
> 
> The diff looks good, however for normal users it might be useful to
> force them to think 'syslog', since normal programs will probably throw
> a useless generic 'I/O error':
> 
> 'You may meet "out of memory" message or "page allocation failure" due
> to the memory fragmentation or real starvation'
> 
> V
> 
> 'A program using the directory may throw an "out of memory" error and/or
> the kernel may output a "page allocation failure" associated with the
> program in the syslog, due to memory fragmentation or real starvation'
> 
> 
>>> rsync: readdir("/omega1-storage-4/." (in backups)): Invalid argument (22)
>>
>> Hmm, could you investigate it a little more?
>> - which system call returned EINVAL(22)?
>> - what parameter did rsync pass to the system call (or readdir)?
>>
>> And is your $LIBAU set to "all"?
> 
> I did look into it on the rsync side, but it didn't look useful - see
> https://download.samba.org/pub/unpacked/rsync/flist.c:send_directory,
> the readdir is called on line 1739, with the error reported on 1771.
> 
> Suddenly the VM doesn't error anymore in the particular test I set up,
> so back on the server, I fiddled with the rsync init.d script and ran
> the daemon via 'strace -fv'. One EINVAL hit in the resulting file, here
> it is with some context:
> 
> =======================================================================
> 
> [pid  1293] stat("/omega1-home/", {st_dev=makedev(0, 34), st_ino=273972,
> st_mode=S_IFDIR|0755, st_nlink=4, st_uid=0, st_gid=0, st_blksize=4096,
> st_blocks=0, st_size=66, st_atime=2016/01/20-20:44:12,
> st_mtime=2014/09/13-12:11:23, st_ctime=2015/01/07-07:46:30}) = 0
> [pid  1293] chdir("/omega1-home/")      = 0
> [pid  1293] socketpair(PF_LOCAL, SOCK_STREAM, 0, [4, 6]) = 0
> [pid  1293] fcntl(4, F_GETFL)           = 0x2 (flags O_RDWR)
> [pid  1293] fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
> [pid  1293] fcntl(6, F_GETFL)           = 0x2 (flags O_RDWR)
> [pid  1293] fcntl(6, F_SETFL, O_RDWR|O_NONBLOCK) = 0
> [pid  1293] clone(child_stack=0,
> flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
> child_tidptr=0x7f7b499149d0) = 1294
> Process 1294 attached
> [pid  1293] close(6 <unfinished ...>
> [pid  1294] set_robust_list(0x7f7b499149e0, 24 <unfinished ...>
> [pid  1293] <... close resumed> )       = 0
> [pid  1294] <... set_robust_list resumed> ) = 0
> [pid  1293] lstat(".",  <unfinished ...>
> [pid  1294] close(4 <unfinished ...>
> [pid  1293] <... lstat resumed> {st_dev=makedev(0, 34), st_ino=273972,
> st_mode=S_IFDIR|0755, st_nlink=4, st_uid=0, st_gid=0, st_blksize=4096,
> st_blocks=0, st_size=66, st_atime=2016/01/20-20:44:12,
> st_mtime=2014/09/13-12:11:23, st_ctime=2015/01/07-07:46:30}) = 0
> [pid  1294] <... close resumed> )       = 0
> [pid  1293] openat(AT_FDCWD, ".",
> O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC <unfinished ...>
> [pid  1294] select(6, [5], [], [5], {60, 0} <unfinished ...>
> [pid  1293] <... openat resumed> )      = 6
> [pid  1293] brk(0x564d6bf68000)         = 0x564d6bf68000
> [pid  1293] fstatfs(6, {f_type=0x61756673, f_bsize=4096,
> f_blocks=3418641366, f_bfree=652846041, f_bavail=649635649, f_files=0,
> f_ffree=0, f_fsid={0, 0}, f_namelen=242, f_frsize=4096}) = 0
> [pid  1293] ioctl(6, _IOC(_IOC_READ|_IOC_WRITE, 0x41, 0x00, 0x40),
> 0x7ffc3f394810) = -1 EINVAL (Invalid argument)
> [pid  1293] sendto(3, "<28>Jan 21 19:11:31 rsyncd[1293]"..., 103,
> MSG_NOSIGNAL, NULL, 0) = 103
> [pid  1293] fstatfs(6, {f_type=0x61756673, f_bsize=4096,
> f_blocks=3418641366, f_bfree=652846041, f_bavail=649635649, f_files=0,
> f_ffree=0, f_fsid={0, 0}, f_namelen=242, f_frsize=4096}) = 0
> [pid  1293] futex(0x7f7b48b3d0a8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
> [pid  1293] close(6)                    = 0
> [pid  1293] lstat(".", {st_dev=makedev(0, 34), st_ino=273972,
> st_mode=S_IFDIR|0755, st_nlink=4, st_uid=0, st_gid=0, st_blksize=4096,
> st_blocks=0, st_size=66, st_atime=2016/01/20-20:44:12,
> st_mtime=2014/09/13-12:11:23, st_ctime=2015/01/07-07:46:30}) = 0
> 
> =======================================================================
> 
> After lstating '.', rsync appears to go on and lstat the subdirectories.
> I'm guessing that due to the failure being an ioctl call, it didn't
> appear in the usual '-e trace=file' invocation?
> 
> 
>>> This appears to have happened after I upgraded the kernel to v4.3.3-5,
>>
>> Is this version from the Debian kernel package?
>> According to your post last year, your system is
>>      4.2.0-1-amd64 #1 SMP Debian 4.2.5-1 (2015-10-27) x86_64
>>      GNU/Linux - Debian Testing standard kernel.
>>
>> If this problem is specific to debian v4.3.3-5 kernel, then I will try
>> finding the changes made in
>> 1. vanilla v4.3.3
>> 2. debian v4.3.3-5
>> particularly around ioctl(2).
> 
> Just confirmed, on this kernel the setup is fine:
> 
> =======================================================================
> 
> Linux 4.2.0-1-amd64 #1 SMP Debian 4.2.6-1 (2015-11-10) x86_64 GNU/Linux
> 
> =======================================================================
> 
> On this it breaks:
> 
> =======================================================================
> 
> Linux 4.3.0-1-amd64 #1 SMP Debian 4.3.3-5 (2016-01-04) x86_64 GNU/Linux
> 
> =======================================================================
> 
> Yes these are stock Debian kernels - the only special compilation I do
> is your standalone aufs driver (there are some DKMS modules mind).
> 
> Thanks


Minor update - rsync/aufs has suddenly decided to fail all over the
place with this supposed kernel memory problem, so even SSH+rsync is
somewhat unusable currently.

I say somewhat because forcing the relevant environment variables
through the rsync SSH session then results in the 'invalid argument'
problem in most, but not all, rsync calls.
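
For reference, the failing ioctl and the f_type value in the strace
output quoted above can be decoded by hand. Below is a minimal sketch in
Python, assuming the generic Linux ioctl bit layout used on x86_64 (8-bit
nr, 8-bit type, 14-bit size, 2-bit direction). The helper names are made
up for illustration. Type 0x41 is ASCII 'A', which I believe is the aufs
control ioctl type, but that is an assumption to check against the aufs
headers. The f_type value is unambiguous: its bytes spell 'aufs'.

```python
# Assumed layout: asm-generic ioctl encoding as used on x86_64.
IOC_NRSHIFT, IOC_TYPESHIFT, IOC_SIZESHIFT, IOC_DIRSHIFT = 0, 8, 16, 30
IOC_WRITE, IOC_READ = 1, 2


def ioc(direction, type_, nr, size):
    """Build a request number the way the kernel's _IOC() macro does."""
    return ((direction << IOC_DIRSHIFT) | (size << IOC_SIZESHIFT)
            | (type_ << IOC_TYPESHIFT) | (nr << IOC_NRSHIFT))


def decode_ioc(request):
    """Split a request number back into its four fields."""
    direction = (request >> IOC_DIRSHIFT) & 0x3
    names = [n for bit, n in ((IOC_READ, "read"), (IOC_WRITE, "write"))
             if direction & bit]
    return {
        "dir": "|".join(names) or "none",
        "type": chr((request >> IOC_TYPESHIFT) & 0xFF),
        "nr": (request >> IOC_NRSHIFT) & 0xFF,
        "size": (request >> IOC_SIZESHIFT) & 0x3FFF,
    }


def magic_to_ascii(f_type):
    """Render a 32-bit filesystem magic number as four ASCII characters."""
    return f_type.to_bytes(4, "big").decode("ascii")


# Values copied from the strace output:
#   ioctl(6, _IOC(_IOC_READ|_IOC_WRITE, 0x41, 0x00, 0x40), ...) = -1 EINVAL
#   fstatfs(6, {f_type=0x61756673, ...})
request = ioc(IOC_READ | IOC_WRITE, 0x41, 0x00, 0x40)
print(hex(request), decode_ioc(request))
print(magic_to_ascii(0x61756673))
```

So the call that returns EINVAL is a read/write ioctl of type 'A', number
0, carrying a 64-byte argument, issued on a directory that fstatfs
reports as aufs.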

