Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind

H.J. Lu Thu, 03 Aug 2017 08:29:09 -0700

On Thu, Aug 3, 2017 at 8:25 AM, Jeff Law <[email protected]> wrote:
> On 08/03/2017 08:17 AM, Yury Gribov wrote:
>> On Thu, Aug 3, 2017 at 2:01 PM, H.J. Lu <[email protected]> wrote:
>>> On Thu, Aug 3, 2017 at 5:52 AM, Yury Gribov <[email protected]> 
>>> wrote:
>>>> On Thu, Aug 3, 2017 at 1:22 PM, H.J. Lu <[email protected]> wrote:
>>>>> On Wed, Aug 2, 2017 at 9:29 PM, Yury Gribov <[email protected]> 
>>>>> wrote:
>>>>>> On 02.08.2017 23:04, H.J. Lu wrote:
>>>>>>>
>>>>>>> On Wed, Aug 2, 2017 at 1:56 PM, Yury Gribov 
>>>>>>> <[email protected]>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> On 02.08.2017 21:48, H.J. Lu wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Aug 2, 2017 at 1:39 PM, Yury Gribov
>>>>>>>>> <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 02.08.2017 20:02, Jeff Law wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 08/02/2017 12:47 PM, Jakub Jelinek wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Aug 02, 2017 at 12:38:13PM -0600, Jeff Law wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 07/17/2017 01:23 AM, Yuri Gribov wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've rebased the previous patch to trunk per Andrew's suggestion.
>>>>>>>>>>>>>> Original patch description/motivation/questions are in
>>>>>>>>>>>>>> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is his stuff used for exception handling?  If so, doesn't that 
>>>>>>>>>>>>> make
>>>>>>>>>>>>> the
>>>>>>>>>>>>> performance a significant concern (ie that msync call?)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> For the performance issue, I wonder if it wouldn't be better to 
>>>>>>>>>>>> just
>>>>>>>>>>>> compile the unwinder twice, basically include everything needed for
>>>>>>>>>>>> the
>>>>>>>>>>>> verification backtrace in a single *.c file with special defines, 
>>>>>>>>>>>> and
>>>>>>>>>>>> not export anything from that *.o file except the single 
>>>>>>>>>>>> entrypoint.
>>>>>>>>>>>> That would also handle the ABI problems.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Yea.
>>>>>>>>>>>
>>>>>>>>>>> Given that the vast majority of the time the addresses are valid, I
>>>>>>>>>>> wonder if we could take advantage of that property to keep the
>>>>>>>>>>> overhead
>>>>>>>>>>> lower in the most common cases.
>>>>>>>>>>>
>>>>>>>>>>>> For the concurrent calls of other threads unmapping stacks of 
>>>>>>>>>>>> running
>>>>>>>>>>>> threads (that is where most of the verification goes against), I'm
>>>>>>>>>>>> afraid
>>>>>>>>>>>> that is simply non-solveable, even if you don't cache anything, it
>>>>>>>>>>>> will
>>>>>>>>>>>> still be racy.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Absolutely -- I think solving this stuff 100% reliably without races
>>>>>>>>>>> isn't possible.  I think the question is can we make this
>>>>>>>>>>> significantly
>>>>>>>>>>> better.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> All,
>>>>>>>>>>
>>>>>>>>>> First of all, thanks for the feedback.  This is indeed not a 100%
>>>>>>>>>> robust
>>>>>>>>>> solution, just something to allow tools like Asan to produce more 
>>>>>>>>>> sane
>>>>>>>>>> backtraces on crash (currently when stack is corrupted they would 
>>>>>>>>>> crash
>>>>>>>>>> in
>>>>>>>>>> the unwinder without printing at least partial backtrace).
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> FWIW, glibc 2.26 is changed to abort as soon as possible when stack is
>>>>>>>>> corrupted:
>>>>>>>>>
>>>>>>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=21752
>>>>>>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=12189
>>>>>>>>>
>>>>>>>>> There is very little point to unwind stack when stack is corrupted.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I'd argue that situation with __stack_chk_fail is very special.  It's
>>>>>>>> used
>>>>>>>> in production code where you'd want to halt as soon as corruption is
>>>>>>>> detected to prevent further damage so even printing message is an
>>>>>>>> additional
>>>>>>>> risk.  Debugging tools (e.g. sanitizers) are different and would prefer
>>>>>>>> to
>>>>>>>> print as many survived frames as possible.
>>>>>>>>
>>>>>>>
>>>>>>> But stack unwinder in libgcc is primarily for production use, not for
>>>>>>> debugging.
>>>>>>
>>>>>>
>>>>>> I can see it used in different projects to print backtraces on error 
>>>>>> (bind9,
>>>>>> SimGrid, sanitizers).  backtrace(3) also sits on top of libgcc unwinder 
>>>>>> and
>>>>>> seems to be meant primarily for debugging (at least many projects use it 
>>>>>> for
>>>>>> this purpose:
>>>>>> https://codesearch.debian.net/search?q=backtrace_symbols&perpkg=1). Same 
>>>>>> for
>>>>>> libbacktrace which, as its README mentions, is meant to "print a detailed
>>>>>> backtrace when an error occurs or to gather detailed profiling 
>>>>>> information".
>>>>>
>>>>> But it shouldn't be performed on a corrupted stack.  When a stack is
>>>>> corrupted, there is just no way to reliably unwind it.
>>>>
>>>> Why not? Attached patch shows that it's possible (up to a point of
>>>> corruption of course).
>>>>
>>>>> It isn't a problem
>>>>> for a debugger which is designed to analyze corrupted program.
>>>>
>>>> Yes but forcing all applications that would like to print helpful
>>>> message on stack corruption to employ gdb sounds less efficient that
>>>> providing fail-safe APIs in standard library...
>>>                 ^^^^^^^^^^^  There is no such a thing of fail-safe on 
>>> corrupted
>>> stack.   A bad, but valid address, can be put on corrupted stack.  When
>>> you unwind it, the unwinder may not crash, but give you bogus stack
>>> backtrace.
>>
>> Agreed but at least my users were fine with this.  Having at least
>> partial stack is very helpful when you app fails early during OS
>> startup with no chance to attach debugger.
> Once it's corrupted, you should stop the backtrace, unless you are
> absolutely certain that trying to interpret the data in the stack can't
> cause your unwinder to perform undefined behavior.
>


That is what we did in glibc 2.26.


-- 
H.J.

Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind

Reply via email to