Bug#838913: libc6: There's probably a bug in libpthread, affecting several user programs.

2016-09-28 Thread Fernando Santagata
On Tue, Sep 27, 2016 at 06:24:20PM +0200, Aurelien Jarno wrote:
> On 2016-09-27 16:00, Florian Weimer wrote:
> Ok. I have seen this change in 4.7.5:
> 
> | commit bec4e55b55867ed948a3afd9f9ccf3506bfdad24
> | Author: Michal Hocko 
> | Date:   Thu Sep 1 16:14:41 2016 -0700
> |
> | mm, oom: prevent premature OOM killer invocation for high order request
> 
> So I assumed it fixes the issue. Maybe it only fixes it partially.

Yesterday, before switching back to version 4.6, I upgraded the 4.7
version one last time. The net result was that the external USB drive
wasn't detected anymore. No message in the logs at all.

Everything worked fine again when I rebooted into the 4.6 kernel.

-- 
Fernando Santagata



2016-09-27 Thread Aurelien Jarno
On 2016-09-27 16:00, Florian Weimer wrote:
> * Aurelien Jarno:
> 
> > On 2016-09-27 13:44, Florian Weimer wrote:
> >> * Aurelien Jarno:
> >> 
> >> > Hmm, rsync doesn't use libpthread, so that clearly rules out a
> >> > libpthread issue. That said, all the examples you gave fail to allocate
> >> > the memory correctly, either through malloc (glibc) or mmap (kernel)
> >> > which returns -ENOMEM. This points to either a kernel issue or a
> >> > memory limit set, for example, with ulimit.
> >> 
> >> The mm subsystem in the 4.7 upstream kernel has a very visible issue
> >> which causes allocation failures:
> >> 
> >>   
> >>
> >> There are other threads as well.  (I personally see this with the
> >> xfs_inode cache.)
> >> 
> >> Usually it manifests in premature OOM killer invocations, but maybe
> >> something in the reporter's system configuration changes that (perhaps it
> >> runs with vm.overcommit_memory=2?).
> >  
> > Indeed, that is correct. The problem has been fixed in version 4.7.5,
> > while the reporter seems to run version 4.7.4. Upgrading to the latest
> > kernel version would be a good start.
> 
> I don't think this has been fully fixed in 4.7.5.  I'm running that
> version now, and with lots of xfs_inode objects, I observe basically
> zero read-ahead, which results in stuttering media playback with
> ogg123.  vm.drop_caches=3 makes the stuttering go away.
> 
> I need to see if I can still reproduce the OOMs.  This was a bit
> tricky before.

Ok. I have seen this change in 4.7.5:

| commit bec4e55b55867ed948a3afd9f9ccf3506bfdad24
| Author: Michal Hocko 
| Date:   Thu Sep 1 16:14:41 2016 -0700
|
| mm, oom: prevent premature OOM killer invocation for high order request

So I assumed it fixes the issue. Maybe it only fixes it partially.

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net



2016-09-27 Thread Aurelien Jarno
On 2016-09-27 15:22, Fernando Santagata wrote:
> On Tue, Sep 27, 2016 at 03:10:17PM +0200, Florian Weimer wrote:
> > * Fernando Santagata:
> > 
> > >> Usually it manifests in premature OOM killer invocations, but maybe
> > >> something the reporter's system configuration changes that (perhaps it
> > >> runs with vm.overcommit_memory=2?).
> > >
> > > That's it. I found this in /var/log/kern.log at the time I ran a program
> > > that crashed:
> > >
> > > Sep 27 10:37:31 gretux kernel: [ 77.250470] mmap: moar (2564): VmData 135217152 exceed data ulimit 134217728. Update limits or use boot option ignore_rlimit_data.
> > 
> > No, I think the above is unrelated.  It concerns a userspace ABI
> > break related to the RLIMIT_DATA implementation.
> 
> Anyway, booting with the 4.6.0 kernel solves all the issues.
> 
> Thanks for the help!

Thanks for the feedback; I am therefore closing the bug.

Aurelien




2016-09-27 Thread Florian Weimer
* Aurelien Jarno:

> On 2016-09-27 13:44, Florian Weimer wrote:
>> * Aurelien Jarno:
>> 
>> > Hmm, rsync doesn't use libpthread, so that clearly rules out a
>> > libpthread issue. That said, all the examples you gave fail to allocate
>> > the memory correctly, either through malloc (glibc) or mmap (kernel)
>> > which returns -ENOMEM. This points to either a kernel issue or a
>> > memory limit set, for example, with ulimit.
>> 
>> The mm subsystem in the 4.7 upstream kernel has a very visible issue
>> which causes allocation failures:
>> 
>>   
>>
>> There are other threads as well.  (I personally see this with the
>> xfs_inode cache.)
>> 
>> Usually it manifests in premature OOM killer invocations, but maybe
>> something in the reporter's system configuration changes that (perhaps it
>> runs with vm.overcommit_memory=2?).
>  
> Indeed, that is correct. The problem has been fixed in version 4.7.5,
> while the reporter seems to run version 4.7.4. Upgrading to the latest
> kernel version would be a good start.

I don't think this has been fully fixed in 4.7.5.  I'm running that
version now, and with lots of xfs_inode objects, I observe basically
zero read-ahead, which results in stuttering media playback with
ogg123.  vm.drop_caches=3 makes the stuttering go away.

I need to see if I can still reproduce the OOMs.  This was a bit
tricky before.



2016-09-27 Thread Fernando Santagata
On Tue, Sep 27, 2016 at 03:10:17PM +0200, Florian Weimer wrote:
> * Fernando Santagata:
> 
> >> Usually it manifests in premature OOM killer invocations, but maybe
> > >> something in the reporter's system configuration changes that (perhaps it
> >> runs with vm.overcommit_memory=2?).
> >
> > That's it. I found this in /var/log/kern.log at the time I ran a program
> > that crashed:
> >
> > Sep 27 10:37:31 gretux kernel: [ 77.250470] mmap: moar (2564): VmData 135217152 exceed data ulimit 134217728. Update limits or use boot option ignore_rlimit_data.
> 
> No, I think the above is unrelated.  It concerns a userspace ABI
> break related to the RLIMIT_DATA implementation.

Anyway, booting with the 4.6.0 kernel solves all the issues.

Thanks for the help!

-- 
Fernando Santagata



2016-09-27 Thread Florian Weimer
* Fernando Santagata:

>> Usually it manifests in premature OOM killer invocations, but maybe
>> something in the reporter's system configuration changes that (perhaps it
>> runs with vm.overcommit_memory=2?).
>
> That's it. I found this in /var/log/kern.log at the time I ran a program
> that crashed:
>
> Sep 27 10:37:31 gretux kernel: [ 77.250470] mmap: moar (2564): VmData 135217152 exceed data ulimit 134217728. Update limits or use boot option ignore_rlimit_data.

No, I think the above is unrelated.  It concerns a userspace ABI
break related to the RLIMIT_DATA implementation.
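The RLIMIT_DATA behaviour mentioned here can be probed with a small experiment. The Python sketch below is illustrative only and rests on assumptions: it runs on Linux with /proc mounted, and the helper names `vmdata_bytes` and `private_mapping_allowed` are hypothetical. In a forked child it caps RLIMIT_DATA a little above the child's current VmData, then requests a private, writable, anonymous mapping, which is the kind of mapping the kernel counts as data. If the kernel enforces the limit for such mappings, mmap should fail with ENOMEM, which is what the "exceed data ulimit" warning quoted above reports.

```python
import errno
import mmap
import os
import resource


def vmdata_bytes(status_text: str) -> int:
    """Return VmData in bytes from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmData:"):
            # The line looks like "VmData:\t  132048 kB".
            return int(line.split()[1]) * 1024
    raise ValueError("no VmData line found")


def private_mapping_allowed(headroom: int, map_size: int) -> bool:
    """Hypothetical probe: in a forked child, set the RLIMIT_DATA soft limit to
    (current VmData + headroom) and try a private anonymous mapping of
    map_size bytes. True if mmap succeeded, False if it failed with ENOMEM."""
    pid = os.fork()
    if pid == 0:  # child: the lowered limit never affects the parent
        try:
            with open("/proc/self/status") as f:
                current = vmdata_bytes(f.read())
            _, hard = resource.getrlimit(resource.RLIMIT_DATA)
            soft = current + headroom
            if hard != resource.RLIM_INFINITY:
                soft = min(soft, hard)
            resource.setrlimit(resource.RLIMIT_DATA, (soft, hard))
            # fileno=-1 makes Python add MAP_ANONYMOUS; MAP_PRIVATE with the
            # default read/write protection makes this a "data" mapping.
            m = mmap.mmap(-1, map_size, flags=mmap.MAP_PRIVATE)
            m.close()
            os._exit(0)
        except OSError as e:
            os._exit(1 if e.errno == errno.ENOMEM else 2)
        except BaseException:
            os._exit(2)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status) and os.WEXITSTATUS(status) in (0, 1):
        return os.WEXITSTATUS(status) == 0
    raise RuntimeError("child failed for an unrelated reason")
```

For example, `private_mapping_allowed(16 * 1024 * 1024, 64 * 1024 * 1024)` would be expected to return False on a kernel that enforces RLIMIT_DATA for mmap, and True where the limit only covers brk or where the `ignore_rlimit_data` boot option turns enforcement into a warning. Forking keeps the lowered limit out of the calling process.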



2016-09-27 Thread Fernando Santagata
On Tue, Sep 27, 2016 at 01:44:34PM +0200, Florian Weimer wrote:
> * Aurelien Jarno:
> 
> > Hmm, rsync doesn't use libpthread, so that clearly rules out a
> > libpthread issue. That said, all the examples you gave fail to allocate
> > the memory correctly, either through malloc (glibc) or mmap (kernel)
> > which returns -ENOMEM. This points to either a kernel issue or a
> > memory limit set, for example, with ulimit.
> 
> The mm subsystem in the 4.7 upstream kernel has a very visible issue
> which causes allocation failures:
> 
>   
> 
> There are other threads as well.  (I personally see this with the
> xfs_inode cache.)
> 
> Usually it manifests in premature OOM killer invocations, but maybe
> something in the reporter's system configuration changes that (perhaps it
> runs with vm.overcommit_memory=2?).

That's it. I found this in /var/log/kern.log at the time I ran a program
that crashed:

Sep 27 10:37:31 gretux kernel: [   77.250470] mmap: moar (2564): VmData 135217152 exceed data ulimit 134217728. Update limits or use boot option ignore_rlimit_data.

Looks like a kernel mmap issue.
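The numbers in this warning can be checked with a little arithmetic. The sketch below parses the line with a regular expression whose format is inferred from this single message (an assumption, not a documented log format), and it assumes 4 KiB pages as on x86-64. It shows that the data limit is exactly 128 MiB and that the reported VmData exceeds it by 999424 bytes, i.e. 244 pages. (`moar` is the MoarVM executable that runs Rakudo Perl 6, which fits the failing Perl6 programs.)

```python
import re

# Format inferred from the one warning above; assumes the command name has no spaces.
RLIMIT_DATA_WARNING = re.compile(
    r"mmap: (?P<comm>\S+) \((?P<pid>\d+)\): VmData (?P<vmdata>\d+) "
    r"exceed data ulimit (?P<limit>\d+)"
)

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def parse_warning(line: str) -> dict:
    """Extract the process, VmData and limit from the warning and compute the overshoot."""
    m = RLIMIT_DATA_WARNING.search(line)
    if m is None:
        raise ValueError("not an RLIMIT_DATA warning")
    vmdata = int(m.group("vmdata"))
    limit = int(m.group("limit"))
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "limit_mib": limit / (1024 * 1024),
        "excess_bytes": vmdata - limit,
        "excess_pages": (vmdata - limit) // PAGE_SIZE,
    }


line = ("Sep 27 10:37:31 gretux kernel: [   77.250470] mmap: moar (2564): "
        "VmData 135217152 exceed data ulimit 134217728. Update limits or use "
        "boot option ignore_rlimit_data.")
info = parse_warning(line)
print(info["limit_mib"], info["excess_bytes"], info["excess_pages"])
# prints: 128.0 999424 244
```

A 128 MiB data limit is what `ulimit -d 131072` would set, since bash reports and sets this limit in kilobytes.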

-- 
Fernando Santagata



2016-09-27 Thread Aurelien Jarno
On 2016-09-27 13:44, Florian Weimer wrote:
> * Aurelien Jarno:
> 
> > Hmm, rsync doesn't use libpthread, so that clearly rules out a
> > libpthread issue. That said, all the examples you gave fail to allocate
> > the memory correctly, either through malloc (glibc) or mmap (kernel)
> > which returns -ENOMEM. This points to either a kernel issue or a
> > memory limit set, for example, with ulimit.
> 
> The mm subsystem in the 4.7 upstream kernel has a very visible issue
> which causes allocation failures:
> 
>   
>
> There are other threads as well.  (I personally see this with the
> xfs_inode cache.)
> 
> Usually it manifests in premature OOM killer invocations, but maybe
> something in the reporter's system configuration changes that (perhaps it
> runs with vm.overcommit_memory=2?).
 
Indeed, that is correct. The problem has been fixed in version 4.7.5,
while the reporter seems to run version 4.7.4. Upgrading to the latest
kernel version would be a good start.

Aurelien




2016-09-27 Thread Florian Weimer
* Aurelien Jarno:

> Hmm, rsync doesn't use libpthread, so that clearly rules out a
> libpthread issue. That said, all the examples you gave fail to allocate
> the memory correctly, either through malloc (glibc) or mmap (kernel)
> which returns -ENOMEM. This points to either a kernel issue or a
> memory limit set, for example, with ulimit.

The mm subsystem in the 4.7 upstream kernel has a very visible issue
which causes allocation failures:

  

There are other threads as well.  (I personally see this with the
xfs_inode cache.)

Usually it manifests in premature OOM killer invocations, but maybe
something in the reporter's system configuration changes that (perhaps it
runs with vm.overcommit_memory=2?).
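The vm.overcommit_memory=2 guess can be made concrete. In that strict mode the kernel refuses new commitments once the committed address space would exceed CommitLimit, which the kernel's overcommit-accounting documentation describes as swap plus vm.overcommit_ratio percent (default 50) of RAM, excluding huge pages. Allocations then fail with ENOMEM instead of triggering the OOM killer. The sketch below computes that figure and can print the live values; the function names are hypothetical, and it ignores the alternative vm.overcommit_kbytes setting.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo text into {field: number}; most values are in kB."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def commit_limit_kib(mem_total_kib: int, swap_total_kib: int,
                     overcommit_ratio: int = 50, hugetlb_kib: int = 0) -> int:
    """Approximate CommitLimit for vm.overcommit_memory=2 (the kernel works in pages)."""
    return (mem_total_kib - hugetlb_kib) * overcommit_ratio // 100 + swap_total_kib


def report() -> None:
    """Print the live overcommit settings next to the computed limit (Linux only)."""
    def read_int(path: str) -> int:
        with open(path) as f:
            return int(f.read())

    mode = read_int("/proc/sys/vm/overcommit_memory")
    ratio = read_int("/proc/sys/vm/overcommit_ratio")
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    # HugePages_Total is a page count; Hugepagesize is in kB.
    hugetlb = info.get("HugePages_Total", 0) * info.get("Hugepagesize", 0)
    print("vm.overcommit_memory:", mode)
    print("computed CommitLimit:",
          commit_limit_kib(info["MemTotal"], info["SwapTotal"], ratio, hugetlb), "kB")
    print("kernel CommitLimit:  ", info.get("CommitLimit"), "kB")
    print("Committed_AS:        ", info.get("Committed_AS"), "kB")
```

For instance, a machine with 8 GiB of RAM, 2 GiB of swap and the default ratio of 50 would get a 6 GiB commit limit; calling `report()` shows whether Committed_AS is close to that value on the affected system.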



2016-09-27 Thread Aurelien Jarno
control: tag -1 + moreinfo
control: retitle -1 libc6: memory allocation issues

On 2016-09-26 14:10, Fernando Santagata wrote:
> Package: libc6
> Version: 2.24-3
> Severity: important
> 
> Dear Maintainer,
> 
> One month ago everything worked fine on my Debian sid computer.
> After an update/dist-upgrade cycle in which libc6 was updated, I started
> noticing some malfunctions.
> I couldn't use rakudobrew (the Perl6 installation program) anymore.
> I couldn't use the Selenium driver (a Java program that drives the browser
> and provides an API to several programming languages).
> rsync started failing on big files.
> "java -version" fails.
>
> Yet, this doesn't appear to be a hardware problem: my computer works fine,
> even under heavy load. No other program seems to be affected: not the
> browser (Chrome), nor the music player (Clementine), nor LibreOffice, Evince,
> GIMP, etc.
>
> All the failing programs appear to use threading. The problem shows up even
> in small snippets of code: I'm collecting interesting snippets of Perl6 code;
> while most of them work fine, the ones that use threading no longer work.
>
> I'm not able to debug libpthread, so all I can show are the symptoms.
> 
> The command "java -version" outputs this:
> 
> OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x7f570c3fb000, 172032, 0) failed; error='Cannot allocate memory' (errno=12)
> #
> # There is insufficient memory for the Java Runtime Environment to continue.
> # Native memory allocation (mmap) failed to map 172032 bytes for committing reserved memory.
> 
> I'm attaching the error logs as java-version.hs_err_pid12374.log and 
> javaws.replay_pid12456.log.
> 
> javaws outputs this:
> #
> # There is insufficient memory for the Java Runtime Environment to continue.
> # Native memory allocation (malloc) failed to allocate 32744 bytes for ChunkPool::allocate
> # An error report file with more information is saved as:
> # /home/nando/tmp/libc6_bug/hs_err_pid12456.log
> [thread 140120686573312 also had an error]
> [thread 140120688678656 also had an error]
> [thread 140121534289664 also had an error]
> 
> [error occurred during error reporting , id 0xe001]
> 
> I'm attaching the error log as javaws.hs_err_pid12456.log.
> 
> Running rsync I got this error:
> 
> ERROR: out of memory in flist_expand [sender]
> rsync error: error allocating core memory buffers (code 22) at util2.c(102) [sender=3.1.1]
> 

Hmm, rsync doesn't use libpthread, so that clearly rules out a
libpthread issue. That said, all the examples you gave fail to allocate
the memory correctly, either through malloc (glibc) or mmap (kernel)
which returns -ENOMEM. This points to either a kernel issue or a
memory limit set, for example, with ulimit.

Can you please give us the output of "ulimit -a" on your system?
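A minimal sketch of the same check from inside a process, using Python's standard `resource` module. The selection of limits is an assumption (the ones most likely to make mmap or malloc fail with ENOMEM), and `format_limit`/`describe_limits` are hypothetical helper names. Note that `ulimit -a` in bash reports these limits in kilobytes, while getrlimit returns bytes.

```python
import resource

# Limits most likely to make allocations fail with ENOMEM (selection is an assumption).
LIMITS = {
    "data seg size (RLIMIT_DATA)": resource.RLIMIT_DATA,
    "virtual memory (RLIMIT_AS)": resource.RLIMIT_AS,
    "stack size (RLIMIT_STACK)": resource.RLIMIT_STACK,
}


def format_limit(value: int) -> str:
    """Render a getrlimit value (bytes) the way a human would read it."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value} bytes ({value / (1024 * 1024):.1f} MiB)"


def describe_limits() -> dict:
    """Map each limit name to its (soft, hard) values as strings."""
    result = {}
    for name, res in LIMITS.items():
        soft, hard = resource.getrlimit(res)
        result[name] = (format_limit(soft), format_limit(hard))
    return result


for name, (soft, hard) in describe_limits().items():
    print(f"{name}: soft={soft}, hard={hard}")
```

A soft data limit of 134217728 bytes would be shown as `134217728 bytes (128.0 MiB)`, the same value that `ulimit -a` lists as 131072 kbytes.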

Thanks,
Aurelien




2016-09-26 Thread Florian Weimer
* Fernando Santagata:

> One month ago everything worked fine on my Debian sid computer.
> After an update/dist-upgrade cycle in which libc6 was updated I
> started noticing some malfunctions.

Did you upgrade the kernel at the same time?



2016-09-26 Thread Fernando Santagata
Package: libc6
Version: 2.24-3
Severity: important

Dear Maintainer,

One month ago everything worked fine on my Debian sid computer.
After an update/dist-upgrade cycle in which libc6 was updated, I started
noticing some malfunctions.
I couldn't use rakudobrew (the Perl6 installation program) anymore.
I couldn't use the Selenium driver (a Java program that drives the browser and
provides an API to several programming languages).
rsync started failing on big files.
"java -version" fails.

Yet, this doesn't appear to be a hardware problem: my computer works fine, even
under heavy load. No other program seems to be affected: not the browser
(Chrome), nor the music player (Clementine), nor LibreOffice, Evince, GIMP, etc.

All the failing programs appear to use threading. The problem shows up even in
small snippets of code: I'm collecting interesting snippets of Perl6 code; while
most of them work fine, the ones that use threading no longer work.

I'm not able to debug libpthread, so all I can show are the symptoms.

The command "java -version" outputs this:

OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x7f570c3fb000, 172032, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 172032 bytes for committing reserved memory.

I'm attaching the error logs as java-version.hs_err_pid12374.log and 
javaws.replay_pid12456.log.

javaws outputs this:
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 32744 bytes for ChunkPool::allocate
# An error report file with more information is saved as:
# /home/nando/tmp/libc6_bug/hs_err_pid12456.log
[thread 140120686573312 also had an error]
[thread 140120688678656 also had an error]
[thread 140121534289664 also had an error]

[error occurred during error reporting , id 0xe001]

I'm attaching the error log as javaws.hs_err_pid12456.log.

Running rsync I got this error:

ERROR: out of memory in flist_expand [sender]
rsync error: error allocating core memory buffers (code 22) at util2.c(102) [sender=3.1.1]

I can run Perl6 programs in general, but those that use the language's concurrent
programming interface fail with errors like this:

Memory allocation failed; could not allocate 4194304 bytes

I can't even use reportbug: when it spawns the editor to let me check the
generated email, it crashes with this error:

/usr/lib/x86_64-linux-gnu/gio/modules/libgioremote-volume-monitor.so: failed to map segment from shared object
Failed to load module: /usr/lib/x86_64-linux-gnu/gio/modules/libgioremote-volume-monitor.so
/usr/lib/x86_64-linux-gnu/gio/modules/libgioremote-volume-monitor.so: failed to map segment from shared object
Failed to load module: /usr/lib/x86_64-linux-gnu/gio/modules/libgioremote-volume-monitor.so

***MEMORY-ERROR***: reportbug[14467]: GSlice: failed to allocate 2032 bytes (alignment: 2048): Cannot allocate memory


*** /home/nando/tmp/libc6_bug/java-version.hs_err_pid12374.log
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 172032 bytes for committing reserved memory.
# Possible reasons:
#   The system is out of physical RAM or swap space
#   In 32 bit mode, the process size limit was hit
# Possible solutions:
#   Reduce memory load on the system
#   Increase physical memory or swap space
#   Check if swap backing store is full
#   Use 64 bit Java on a 64 bit OS
#   Decrease Java heap size (-Xmx/-Xms)
#   Decrease number of Java threads
#   Decrease Java thread stack sizes (-Xss)
#   Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
#
#  Out of Memory Error (os_linux.cpp:2630), pid=12374, tid=0x7f4723e9c700
#
# JRE version:  (8.0_102-b14) (build )
# Java VM: OpenJDK 64-Bit Server VM (25.102-b14 mixed mode linux-amd64 compressed oops)
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#

---  T H R E A D  ---

Current thread (0x7f471c009800):  JavaThread "Unknown thread" [_thread_in_vm, id=12375, stack(0x7f4723d9c000,0x7f4723e9d000)]

Stack: [0x7f4723d9c000,0x7f4723e9d000],  sp=0x7f4723e9b4b0,  free space=1021k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xa69680]
V  [libjvm.so+0x4c460c]
V  [libjvm.so+0x8b97c6]
V  [libjvm.so+0x8b405c]
V  [libjvm.so+0x3c4bb6]
V  [libjvm.so+0x3c3788]
V  [libjvm.so+0x91e9b8]
V  [libjvm.so+0x91ecb4]
V  [libjvm.so+0x2adfb1]
V  [libjvm.so+0x8de612]
V  [libjvm.so+0xa2dc11]
V  [libjvm.so+0xa2de72]
V  [libjvm.so+0x60e82f]
V  [libjvm.so+0xa12a7a]
V  [libjvm.so+0x688b01]  JNI_CreateJavaVM+0x61
C  [libjli.so+0x2f26]
C  [libjli.so+0x74bd]
C