On Thu, Aug 3, 2017 at 8:25 AM, Jeff Law <l...@redhat.com> wrote: > On 08/03/2017 08:17 AM, Yury Gribov wrote: >> On Thu, Aug 3, 2017 at 2:01 PM, H.J. Lu <hjl.to...@gmail.com> wrote: >>> On Thu, Aug 3, 2017 at 5:52 AM, Yury Gribov <tetra2005.patc...@gmail.com> >>> wrote: >>>> On Thu, Aug 3, 2017 at 1:22 PM, H.J. Lu <hjl.to...@gmail.com> wrote: >>>>> On Wed, Aug 2, 2017 at 9:29 PM, Yury Gribov <tetra2005.patc...@gmail.com> >>>>> wrote: >>>>>> On 02.08.2017 23:04, H.J. Lu wrote: >>>>>>> >>>>>>> On Wed, Aug 2, 2017 at 1:56 PM, Yury Gribov >>>>>>> <tetra2005.patc...@gmail.com> >>>>>>> wrote: >>>>>>>> >>>>>>>> On 02.08.2017 21:48, H.J. Lu wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Aug 2, 2017 at 1:39 PM, Yury Gribov >>>>>>>>> <tetra2005.patc...@gmail.com> >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 02.08.2017 20:02, Jeff Law wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 08/02/2017 12:47 PM, Jakub Jelinek wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Aug 02, 2017 at 12:38:13PM -0600, Jeff Law wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 07/17/2017 01:23 AM, Yuri Gribov wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I've rebased the previous patch to trunk per Andrew's suggestion. >>>>>>>>>>>>>> Original patch description/motivation/questions are in >>>>>>>>>>>>>> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Is his stuff used for exception handling? If so, doesn't that >>>>>>>>>>>>> make >>>>>>>>>>>>> the >>>>>>>>>>>>> performance a significant concern (ie that msync call?) >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> For the performance issue, I wonder if it wouldn't be better to >>>>>>>>>>>> just >>>>>>>>>>>> compile the unwinder twice, basically include everything needed for >>>>>>>>>>>> the >>>>>>>>>>>> verification backtrace in a single *.c file with special defines, >>>>>>>>>>>> and >>>>>>>>>>>> not export anything from that *.o file except the single >>>>>>>>>>>> entrypoint. >>>>>>>>>>>> That would also handle the ABI problems. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Yea. >>>>>>>>>>> >>>>>>>>>>> Given that the vast majority of the time the addresses are valid, I >>>>>>>>>>> wonder if we could take advantage of that property to keep the >>>>>>>>>>> overhead >>>>>>>>>>> lower in the most common cases. >>>>>>>>>>> >>>>>>>>>>>> For the concurrent calls of other threads unmapping stacks of >>>>>>>>>>>> running >>>>>>>>>>>> threads (that is where most of the verification goes against), I'm >>>>>>>>>>>> afraid >>>>>>>>>>>> that is simply non-solveable, even if you don't cache anything, it >>>>>>>>>>>> will >>>>>>>>>>>> still be racy. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Absolutely -- I think solving this stuff 100% reliably without races >>>>>>>>>>> isn't possible. I think the question is can we make this >>>>>>>>>>> significantly >>>>>>>>>>> better. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> All, >>>>>>>>>> >>>>>>>>>> First of all, thanks for the feedback. This is indeed not a 100% >>>>>>>>>> robust >>>>>>>>>> solution, just something to allow tools like Asan to produce more >>>>>>>>>> sane >>>>>>>>>> backtraces on crash (currently when stack is corrupted they would >>>>>>>>>> crash >>>>>>>>>> in >>>>>>>>>> the unwinder without printing at least partial backtrace). >>>>>>>>>> >>>>>>>>> >>>>>>>>> FWIW, glibc 2.26 is changed to abort as soon as possible when stack is >>>>>>>>> corrupted: >>>>>>>>> >>>>>>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=21752 >>>>>>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=12189 >>>>>>>>> >>>>>>>>> There is very little point to unwind stack when stack is corrupted. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> I'd argue that situation with __stack_chk_fail is very special. It's >>>>>>>> used >>>>>>>> in production code where you'd want to halt as soon as corruption is >>>>>>>> detected to prevent further damage so even printing message is an >>>>>>>> additional >>>>>>>> risk. Debugging tools (e.g. sanitizers) are different and would prefer >>>>>>>> to >>>>>>>> print as many survived frames as possible. >>>>>>>> >>>>>>> >>>>>>> But stack unwinder in libgcc is primarily for production use, not for >>>>>>> debugging. >>>>>> >>>>>> >>>>>> I can see it used in different projects to print backtraces on error >>>>>> (bind9, >>>>>> SimGrid, sanitizers). backtrace(3) also sits on top of libgcc unwinder >>>>>> and >>>>>> seems to be meant primarily for debugging (at least many projects use it >>>>>> for >>>>>> this purpose: >>>>>> https://codesearch.debian.net/search?q=backtrace_symbols&perpkg=1). Same >>>>>> for >>>>>> libbacktrace which, as its README mentions, is meant to "print a detailed >>>>>> backtrace when an error occurs or to gather detailed profiling >>>>>> information". >>>>> >>>>> But it shouldn't be performed on a corrupted stack. When a stack is >>>>> corrupted, there is just no way to reliably unwind it. >>>> >>>> Why not? Attached patch shows that it's possible (up to a point of >>>> corruption of course). >>>> >>>>> It isn't a problem >>>>> for a debugger which is designed to analyze corrupted program. >>>> >>>> Yes but forcing all applications that would like to print helpful >>>> message on stack corruption to employ gdb sounds less efficient that >>>> providing fail-safe APIs in standard library... >>> ^^^^^^^^^^^ There is no such a thing of fail-safe on >>> corrupted >>> stack. A bad, but valid address, can be put on corrupted stack. When >>> you unwind it, the unwinder may not crash, but give you bogus stack >>> backtrace. >> >> Agreed but at least my users were fine with this. Having at least >> partial stack is very helpful when you app fails early during OS >> startup with no chance to attach debugger. > Once it's corrupted, you should stop the backtrace, unless you are > absolutely certain that trying to interpret the data in the stack can't > cause your unwinder to perform undefined behavior. >
That is what we did in glibc 2.26. -- H.J.