On Fri, Mar 24, 2017 at 1:06 AM, James Cowgill <jcowg...@debian.org> wrote:
> reassign 858405 xsltproc
> forcemerge 750593 858405
> retitle 750593 xsltproc: bus error on some arches with linux < 4.1
> thanks
>
> Hi,
>
> On 22/03/17 21:01, Daniel Kahn Gillmor wrote:
>> On Wed 2017-03-22 06:22:41 -0400, James Cowgill wrote:
>>> On 22/03/17 01:29, Daniel Kahn Gillmor wrote:
>>>> For debian revisions of 3.20, failures happened on:
>>>>
>>>> mipsel-manda-02
>>>> eberlin
>>>>
>>>> Also for revisions of 3.20, successes happened on:
>>>>
>>>> mipsel-sil-01
>>>> mipsel-manda-03
>>>> mipsel-manda-01
>>>
>>> This is a known issue and it only affects Loongson buildds.
>>> Interestingly mipsel-manda-01 is Loongson and didn't fail there so there
>>> may be a random element involved here. I don't think anyone's tracked
>>> down the underlying issue though.
>>
>> thanks, is there a public reference for the known issue that we can
>> point to?
>
> I think #750593 looks a lot like the bug here.
>
> After some investigation, it seems I was being a bit unfair to Loongson.
> This is arguably a non mips specific bug in Linux < 4.1. It just so
> happens that all the Loongson buildds run jessie's 3.16 kernel and all
> the other buildds run >= 4.7 from backports.
>
> In #750593 there was lots of talk about stack overflows causing this but
> there is actually another element to this. Indeed if I reduced the stack
> size down with ulimit, the segfaults become more frequent, but
> increasing the stack size didn't help at all. After looking at the
> mappings for a failing process, I saw this (taken just after starting
> xsltproc):
>
> [...]
>> fff7f50000-fff7f5c000 ---p 00004000 fd:00 1060250 /usr/lib/mips64el-linux-gnuabi64/libeatmydata.so.1.1.2
>> fff7f5c000-fff7f60000 rw-p 00000000 fd:00 1060250 /usr/lib/mips64el-linux-gnuabi64/libeatmydata.so.1.1.2
>> fff7f60000-fff7f88000 r-xp 00000000 fd:00 1060375 /lib/mips64el-linux-gnuabi64/ld-2.24.so
>> fff7f94000-fff7f98000 rw-p 00024000 fd:00 1060375 /lib/mips64el-linux-gnuabi64/ld-2.24.so
>> fff7f98000-fff7fa0000 r-xp 00000000 fd:00 947544 /usr/bin/xsltproc
>> fff7fa4000-fff7fac000 rw-p 00000000 00:00 0
>> fff7fac000-fff7fb0000 rw-p 00004000 fd:00 947544 /usr/bin/xsltproc
>> ffff1d4000-ffff384000 rwxp 00000000 00:00 0 [heap]
>> ffff9e0000-ffffa04000 rwxp 00000000 00:00 0 [stack]
>> ffffffc000-10000000000 r-xp 00000000 00:00 0 [vdso]
>
> Notice that there is a very small gap between the heap and the stack
> here (at least compared to working xsltproc runs). I think that the heap
> is growing to a point where it limits the maximum size of the stack and
> so increasing the stack size with ulimit doesn't help.
>
> The reason the program and the heap are at these very high addresses is
> that xsltproc is built with PIE and the kernel is treating the
> executable like a mmap and grouping it with all the other libraries. In
> d1fd836dcf00 ("mm: split ET_DYN ASLR from mmap ASLR") the behavior
> changed and now the program and its heap will be mapped at a lower
> address so the bug does not affect newer kernels. Using "setarch -L" or
> "setarch -R" is another workaround for this bug because that moves the
> program so that there is a much larger gap between the heap and the stack.
>
> This might affect other applications as well. Effectively it means that
> PIE executables which use lots of stack space might not work properly
> with jessie's kernel.
> The chances the bug will be hit seem to vary
> between arches however (depending on what each arch does in
> arch_pick_mmap_layout and arch_randomize_brk) - mips64el seems to be hit
> pretty frequently. In xsltproc's case, PIE was enabled some time ago
> which is why this bug is quite old.
>
> I believe any of the following will fix this (though not all have been tested):
> - Reduce the stack usage in xsltproc (the upstream bug)
> - Upgrade the relevant buildds to Linux >= 4.1
> - Apply d1fd836dcf00 to jessie's kernel
> - Disable PIE in xsltproc.
> - Run xsltproc inside setarch -L / setarch -R
>
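To put a number on the heap/stack gap in the mappings quoted above, here is a rough Python sketch. It is only an illustration, not something from the bug report: the helper names (parse_named_regions, stack_headroom) are made up for this example, and the sample text is just the [heap], [stack] and [vdso] lines copied from the quoted output. The same parsing should work on the contents of /proc/<pid>/maps for a live process.

```python
# Sketch: measure how much room the stack has before it runs into the heap.
# SAMPLE_MAPS is copied from the mappings quoted in this thread.
SAMPLE_MAPS = """\
ffff1d4000-ffff384000 rwxp 00000000 00:00 0 [heap]
ffff9e0000-ffffa04000 rwxp 00000000 00:00 0 [stack]
ffffffc000-10000000000 r-xp 00000000 00:00 0 [vdso]
"""


def parse_named_regions(maps_text, wanted=("[heap]", "[stack]")):
    """Return {name: (start, end)} for the requested pseudo-path mappings."""
    regions = {}
    for line in maps_text.splitlines():
        fields = line.split()
        # Anonymous mappings have only five fields (no path); skip them.
        if len(fields) < 6 or fields[5] not in wanted:
            continue
        start, end = (int(part, 16) for part in fields[0].split("-"))
        regions[fields[5]] = (start, end)
    return regions


def stack_headroom(maps_text):
    """Return (gap, max_span) in bytes.

    gap      : distance from the end of the heap to the current stack bottom
    max_span : distance from the end of the heap to the top of the stack,
               i.e. an upper bound on how large the stack mapping could get
    """
    regions = parse_named_regions(maps_text)
    heap_end = regions["[heap]"][1]
    stack_start, stack_end = regions["[stack]"]
    return stack_start - heap_end, stack_end - heap_end


if __name__ == "__main__":
    gap, max_span = stack_headroom(SAMPLE_MAPS)
    mib = 1024 * 1024
    print(f"gap below current stack: {gap / mib:.2f} MiB")
    print(f"upper bound on stack size: {max_span / mib:.2f} MiB")
```

For the quoted mappings this gives an upper bound of 0x680000 bytes (6.5 MiB) between the end of the heap and the top of the stack. That is already below the 8 MiB soft stack limit that is a common default, which would be consistent with raising the limit via ulimit not helping. The kernel may reserve additional guard space, so the usable amount could be smaller still.
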
We have trouble running newer kernels on some Loongson machines, as
their PMON can only load an initrd of limited size. So backporting the
patch may be the ideal option for us for now.

>>> For the moment, I'll rebuild libreswan again and hope a good buildd is
>>> picked.
>>
>> i see 5 mips64el rebuilds now at
>> https://buildd.debian.org/status/logs.php?pkg=libreswan&ver=3.20-6&suite=sid,
>> but none of them have succeeded yet :/
>>
>> 3 of the builds are from mipsel-manda-02, 1 is from eberlin, and one
>> additional new "bad" builder is:
>>
>> mipsel-aql-01
>
> There are 3 non-Loongson buildds: mipsel-aql-03, mipsel-manda-03 and
> mipsel-sil-01. I expect libreswan will only build on one of those
> buildds at the moment.
>
> Thanks,
> James
>

-- 
YunQiang Su