Re: [PATCH] ARC: prevent showing irrelevant exception info in signal message

2019-01-21 Thread Vineet Gupta
On 1/21/19 9:07 AM, Eugeniy Paltsev wrote:
> We process signals at the end of the syscall/exception handler.
> If the signal is fatal, we print the register contents using
> show_regs(). show_regs() also prints information about the
> last exception that happened.
> 
> On a multicore system we can hit a situation where we print
> wrong information about the exception. See the example:
> __
> CPU-0: started to handle page fault
> CPU-1: sent signal to process, which is executed on CPU-0
> CPU-0: ended page fault handling. Started to process signal before
>    returning to userspace. Process signal, which is sent from
>    CPU-0. As the signal is fatal we call show_regs().
>    show_regs() will show information about last exception
>    which is *page fault* (instead of "trap" which is used for
>    signals and happened on CPU-0)
> 
> So we will get message like this:
># ./waitpid02
>   potentially unexpected fatal signal 8.
>   Path: /home/waitpid02
>   CPU: 0 PID: 100 Comm: waitpid02 Not tainted 4.10.0-rc4 #2
>   task: 9f11c200 task.stack: 9f3ae000
> 
>   [ECR   ]: 0x00050200 => Invalid Write @ 0x by insn @ 0x000123ec
>   [EFA   ]: 0x
>   [BLINK ]: 0x123ea
>   [ERET  ]: 0x123ec
> @off 0x123ec in [/home/waitpid02]
> VMA: 0x0001 to 0x00016000
>   [STAT32]: 0x80080882 : IE U
>   BTA: 0x000123ea  SP: 0x5ffd3db0  FP: 0x
>   LPS: 0x20031684 LPE: 0x2003169a LPC: 0x0006
>   [-other-info-]
> 
> This message is confusing because it shows information about a page fault
> ( [ECR   ]: 0x00050200 => Invalid Write ) which is absolutely irrelevant
> to the signal.
> 
> This situation was reproduced with waitpid02 LTP test.
> _
> 
> So remove printing information about exceptions from show_regs()
> to avoid confusing messages. Print information about exceptions
> only in required places instead of show_regs()

That is fine, but as I mentioned in your last posting, this is still not
complete. If printing the reg file confuses us in case of termination by a
signal from some other task, I don't see how just leaving out the exception
regs, but still printing the rest of the reg file, is the complete solution.
It is still bogus and any fixes to that effect are band-aids.

> 
> Now we don't print information about exceptions if the signal is simply
> sent by another userspace app. So in case of waitpid02 we will print
> the following message:

So all we are skipping is the decoding of ECR as you seem to be printing the raw
value anyways.

> _
># ./waitpid02
>   potentially unexpected fatal signal 8.
>   Path: /root/waitpid02
>   CPU: 2 PID: 105 Comm: waitpid02 Not tainted 4.18.0-rc8-2-gde0f6d6aeb53-dirty #17
>   [ECR   ]: 0x00050100
>   [EFA   ]: 0x
>   [BLINK ]: 0x20001486
>   [-other-info-]
> _
> 
> This patch fixes
> STAR 9001146055: waitpid02: Invalid Write @ 0x by insn @ 0x000123ec
> 
> NOTE:
> To be more clear I give examples of different faults (signal-based,
> userspace/kernelspace exception-based) with different values of
> "/proc/sys/kernel/print-fatal-signals" option.
> 
> 0) NULL pointer access from user space, print-fatal-signals == 1:
> >8---
>  # ./arc_hell
> Exception: arc_hell[103]: at 0x2003a35c [off 0x2e35c in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000]
>   ECR: 0x00050100 => Invalid Read @ 0x by insn @ 0x2003a35c
> potentially unexpected fatal signal 11.
> Path: /root/arc_hell
> CPU: 1 PID: 103 Comm: arc_hell Not tainted 4.18.0-rc8-2-gde0f6d6aeb53-dirty #17
> [ECR   ]: 0x00050100

So we are printing the ECR twice. Sorry this approach is not going to work.


> [EFA   ]: 0x
> [BLINK ]: 0x20039ef8
> [ERET  ]: 0x2003a35c
...
> 
> Segmentation fault
> >8---
> 
> 1) NULL pointer access from user space, print-fatal-signals == 0:
> >8---
>  # ./arc_hell
> Exception: arc_hell[107]: at 0x2003a35c [off 0x2e35c in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000]
>   ECR: 0x00050100 => Invalid Read @ 0x by insn @ 0x2003a35c
> Segmentation fault
> >8---
> 
> 2) Process killed by signal (waitpid02 test), print-fatal-signals == 1:
> >8---
>  # ./waitpid02
> potentially unexpected fatal signal 8.
> Path: /root/waitpid02
> CPU: 2 PID: 105 Comm: waitpid02 Not tainted 4.18.0-rc8-2-gde0f6d6aeb53-dirty #17
> [ECR   ]: 0x00050100
> [EFA   ]: 0x
> [BLINK ]: 0x20001486
> [ERET  ]: 0x2000146c
> [STAT32]: 0x80080082 : IE U
> BTA: 0x2fc4  SP: 0x5fa21d64  FP: 0x
> LPS: 0x200524a0 LPE: 0x200524b6 LPC: 0x0006
> r00: 0x2000c0dc r01: 0x0018 r02: 0x0001159a
> r03: 0x0001 r04: 0x r05: 0x0045
> r06: 0x004e r07: 0x01010101 r08: 0x00dc
> r09: 0x200a31e0 r10: 0x20003a5c r11: 0x20004038
> r12: 0x20001486 r13: 0x20004174 r14: 0x07ca2bc0
> r15: 

[PATCH] ARC: prevent showing irrelevant exception info in signal message

2019-01-21 Thread Eugeniy Paltsev
We process signals at the end of the syscall/exception handler.
If the signal is fatal, we print the register contents using
show_regs(). show_regs() also prints information about the
last exception that happened.

On a multicore system we can hit a situation where we print
wrong information about the exception. See the example:
__
CPU-0: started to handle page fault
CPU-1: sent signal to process, which is executed on CPU-0
CPU-0: ended page fault handling. Started to process signal before
   returning to userspace. Process signal, which is sent from
   CPU-0. As the signal is fatal we call show_regs().
   show_regs() will show information about last exception
   which is *page fault* (instead of "trap" which is used for
   signals and happened on CPU-0)

So we will get message like this:
   # ./waitpid02
  potentially unexpected fatal signal 8.
  Path: /home/waitpid02
  CPU: 0 PID: 100 Comm: waitpid02 Not tainted 4.10.0-rc4 #2
  task: 9f11c200 task.stack: 9f3ae000

  [ECR   ]: 0x00050200 => Invalid Write @ 0x by insn @ 0x000123ec
  [EFA   ]: 0x
  [BLINK ]: 0x123ea
  [ERET  ]: 0x123ec
@off 0x123ec in [/home/waitpid02]
VMA: 0x0001 to 0x00016000
  [STAT32]: 0x80080882 : IE U
  BTA: 0x000123ea  SP: 0x5ffd3db0  FP: 0x
  LPS: 0x20031684 LPE: 0x2003169a LPC: 0x0006
  [-other-info-]

This message is confusing because it shows information about a page fault
( [ECR   ]: 0x00050200 => Invalid Write ) which is absolutely irrelevant
to the signal.
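
For readers decoding the ECR values by hand, here is a minimal sketch. It
assumes the ECR layout is vector in bits 23:16, cause code in bits 15:8 and
parameter in bits 7:0, and it maps only the two cause codes that appear in
the logs of this thread (0x01 printed as "Invalid Read", 0x02 printed as
"Invalid Write"). The struct and function names are hypothetical and are not
kernel identifiers.

```c
#include <stdint.h>

/* Assumed ECR layout: vector [23:16], cause code [15:8], parameter [7:0]. */
struct ecr_fields {
	unsigned int vec;
	unsigned int cause;
	unsigned int param;
};

/* Split a raw ECR value into its three assumed fields. */
static struct ecr_fields ecr_split(uint32_t ecr)
{
	struct ecr_fields f;

	f.vec = (ecr >> 16) & 0xff;
	f.cause = (ecr >> 8) & 0xff;
	f.param = ecr & 0xff;
	return f;
}

/* Name the access type for the two cause codes seen in this thread's logs. */
static const char *ecr_access_name(uint32_t ecr)
{
	switch (ecr_split(ecr).cause) {
	case 0x01:
		return "Invalid Read";
	case 0x02:
		return "Invalid Write";
	default:
		return "unknown";
	}
}
```

With this, ecr_access_name(0x00050200) yields "Invalid Write" and
ecr_access_name(0x00050100) yields "Invalid Read"; both values carry the
same vector, 0x05.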

This situation was reproduced with waitpid02 LTP test.
_

So remove printing information about exceptions from show_regs()
to avoid confusing messages. Print information about exceptions
only in required places instead of show_regs()
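
The split described above can be sketched in plain userspace C. This is an
illustration under stated assumptions, not the patch itself: struct fake_regs
and the functions dump_regs(), dump_exception(), report_fatal_signal() and
mentions_decoded_fault() are hypothetical names, and output goes to a buffer
instead of the kernel log.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the saved exception registers. */
struct fake_regs {
	uint32_t ecr;	/* cause of the last exception the task took */
	uint32_t efa;	/* faulting address of that exception */
	uint32_t eret;	/* PC to return to */
};

/* Register dump: raw values only, no interpretation of ECR. */
static int dump_regs(char *buf, size_t len, const struct fake_regs *r)
{
	assert(buf && r);
	return snprintf(buf, len,
			"[ECR   ]: 0x%08x\n[EFA   ]: 0x%08x\n[ERET  ]: 0x%08x\n",
			(unsigned int)r->ecr, (unsigned int)r->efa,
			(unsigned int)r->eret);
}

/* Exception report: decodes ECR; intended for fault handlers only. */
static int dump_exception(char *buf, size_t len, const struct fake_regs *r)
{
	const char *what;

	assert(buf && r);
	what = ((r->ecr >> 8) & 0xff) == 0x02 ? "Invalid Write" : "Invalid Read";
	return snprintf(buf, len, "ECR: 0x%08x => %s @ 0x%08x by insn @ 0x%08x\n",
			(unsigned int)r->ecr, what, (unsigned int)r->efa,
			(unsigned int)r->eret);
}

/*
 * Fatal-signal path: r may still hold state from an earlier page fault,
 * so only the raw register dump is emitted here.
 */
static int report_fatal_signal(char *buf, size_t len, int sig,
			       const struct fake_regs *r)
{
	int n = snprintf(buf, len, "potentially unexpected fatal signal %d.\n", sig);
	int m;

	if (n < 0 || (size_t)n >= len)
		return n;
	m = dump_regs(buf + n, len - (size_t)n, r);
	return m < 0 ? m : n + m;
}

/* Helper for checking a report: does it contain decoded fault text? */
static int mentions_decoded_fault(const char *text)
{
	return strstr(text, "Invalid") != NULL;
}
```

Calling report_fatal_signal() with ecr = 0x00050200 produces the signal
banner and the raw registers but no "Invalid Write" text; only
dump_exception() produces the decoded line.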

Now we don't print information about exceptions if the signal is simply
sent by another userspace app. So in case of waitpid02 we will print
the following message:
_
   # ./waitpid02
  potentially unexpected fatal signal 8.
  Path: /root/waitpid02
  CPU: 2 PID: 105 Comm: waitpid02 Not tainted 4.18.0-rc8-2-gde0f6d6aeb53-dirty #17
  [ECR   ]: 0x00050100
  [EFA   ]: 0x
  [BLINK ]: 0x20001486
  [-other-info-]
_

This patch fixes
STAR 9001146055: waitpid02: Invalid Write @ 0x by insn @ 0x000123ec

NOTE:
To be clearer, here are examples of different faults (signal-based,
userspace/kernelspace exception-based) with different values of the
"/proc/sys/kernel/print-fatal-signals" option.

0) NULL pointer access from user space, print-fatal-signals == 1:
>8---
 # ./arc_hell
Exception: arc_hell[103]: at 0x2003a35c [off 0x2e35c in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000]
  ECR: 0x00050100 => Invalid Read @ 0x by insn @ 0x2003a35c
potentially unexpected fatal signal 11.
Path: /root/arc_hell
CPU: 1 PID: 103 Comm: arc_hell Not tainted 4.18.0-rc8-2-gde0f6d6aeb53-dirty #17
[ECR   ]: 0x00050100
[EFA   ]: 0x
[BLINK ]: 0x20039ef8
[ERET  ]: 0x2003a35c
[STAT32]: 0x80080882 : IE U
BTA: 0x2003a358  SP: 0x5fa27dc4  FP: 0x5fa27de8
LPS: 0x2003a628 LPE: 0x2003a62c LPC: 0x
r00: 0x r01: 0x200740b0 r02: 0x0001
r03: 0x0007 r04: 0x80808080 r05: 0x2f2f2f2f
r06: 0x7c7a2f43 r07: 0x r08: 0x1a131100
r09: 0x2008b1e0 r10: 0x20003a5c r11: 0x20004038
r12: 0x20039ef8 r13: 0x200740b0 r14: 0x
r15: 0x200740b0 r16: 0x r17: 0x0007d468
r18: 0x0009313a r19: 0x r20: 0x0009c22c
r21: 0x0009c23c r22: 0x0009ab64 r23: 0x
r24: 0x0009dfc5 r25: 0x20004b70

Segmentation fault
>8---

1) NULL pointer access from user space, print-fatal-signals == 0:
>8---
 # ./arc_hell
Exception: arc_hell[107]: at 0x2003a35c [off 0x2e35c in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000]
  ECR: 0x00050100 => Invalid Read @ 0x by insn @ 0x2003a35c
Segmentation fault
>8---

2) Process killed by signal (waitpid02 test), print-fatal-signals == 1:
>8---
 # ./waitpid02
potentially unexpected fatal signal 8.
Path: /root/waitpid02
CPU: 2 PID: 105 Comm: waitpid02 Not tainted 4.18.0-rc8-2-gde0f6d6aeb53-dirty #17
[ECR   ]: 0x00050100
[EFA   ]: 0x
[BLINK ]: 0x20001486
[ERET  ]: 0x2000146c
[STAT32]: 0x80080082 : IE U
BTA: 0x2fc4  SP: 0x5fa21d64  FP: 0x
LPS: 0x200524a0 LPE: 0x200524b6 LPC: 0x0006
r00: 0x2000c0dc r01: 0x0018 r02: 0x0001159a
r03: 0x0001 r04: 0x r05: 0x0045
r06: 0x004e r07: 0x01010101 r08: 0x00dc
r09: 0x200a31e0 r10: 0x20003a5c r11: 0x20004038
r12: 0x20001486 r13: 0x20004174 r14: 0x07ca2bc0
r15: 0x20004078 r16: 0x r17: 0x20004038
r18: 0x0001 r19: 0x r20: 0x0001159a
r21: 0x0001 r22: 0x r23: 0x0004
r24: 0x2000d1fc r25: 0x20004cd0
>8---

3) Process killed by signal (waitpid02 test), print-fatal-signals == 0:

Re: [PATCH] ARC: prevent showing irrelevant exception info in signal message

2018-07-06 Thread Eugeniy Paltsev
Hi Vineet,

On Thu, 2018-07-05 at 14:26 -0700, Vineet Gupta wrote:
> On 07/03/2018 03:57 AM, Eugeniy Paltsev wrote:
> > On Mon, 2018-07-02 at 10:57 -0700, Vineet Gupta wrote:
> > > +CC Al
> > > 
> > > On 06/29/2018 12:39 PM, Eugeniy Paltsev wrote:
> > > > We process signals in the end of syscall/exception handler.
> > > > If the signal is fatal we print register's content using
> > > > show_regs function. show_regs() also prints information about
> > > > last exception happened.
> > > > 
> > > > In case of multicore system we can catch the situation when we
> > > > will print wrong information about exception. See the example:
> > > > __
> > > > CPU-0: started to handle page fault
> > > > CPU-1: sent signal to process, which is executed on CPU-0
> > > > CPU-0: ended page fault handling. Started to process signal before
> > > >    returning to userspace. Process signal, which is sent from
> > > >    CPU-0. As the signal is fatal we call show_regs().
> > > >    show_regs() will show information about last exception
> > > >    which is *page fault* (instead of "trap" which is used for
> > > >    signals and happened on CPU-0)
> > > > 
> > > > So we will get message like this:
> > > > /home/waitpid02
> > > >   potentially unexpected fatal signal 8.
> > > >   Path: /home/waitpid02
> > > >   CPU: 0 PID: 100 Comm: waitpid02 Not tainted 4.10.0-rc4 #2
> > > >   task: 9f11c200 task.stack: 9f3ae000
> > > > 
> > > >   [ECR   ]: 0x00050200 => Invalid Write @ 0x by insn @ 0x000123ec
> > > >   [EFA   ]: 0x
> > > >   [BLINK ]: 0x123ea
> > > >   [ERET  ]: 0x123ec
> > > > @off 0x123ec in [/home/waitpid02]
> > > > VMA: 0x0001 to 0x00016000
> > > >   [STAT32]: 0x80080882 : IE U
> > > >   BTA: 0x000123ea  SP: 0x5ffd3db0  FP: 0x
> > > >   LPS: 0x20031684 LPE: 0x2003169a LPC: 0x0006
> > > >   [-other-info-]
> > > > 
> > > > This message is confusing because it show information about page fault
> > > > ( [ECR   ]: 0x00050200 => Invalid Write ) which is absolutely irrelevant
> > > > to signal.
> > > 
> > > Agreed this is misleading. @Al, is there a way to distinguish process
> > > termination by a signal because the process did something wrong from,
> > > say, an unhandled signal? For the former, we want to dump additional
> > > info in show_regs() such as PC / fault address etc., and not in the
> > > other scenario.
> > > 
> > > > This situation was reproduced with waitpid02 LTP test.
> > > > _
> > > > 
> > > > So remove printing information about exceptions from show_regs()
> > > > to avoid confusing messages. Print information about exceptions
> > > > only in required places instead of show_regs()
> > > > 
> > > > Now we don't print information about exceptions if signal is simply
> > > > send by another userspace app. So in case of waitpid02 we will print
> > > > next message:
> > > > _
> > > > ./waitpid02
> > > >   potentially unexpected fatal signal 8.
> > > >   [STAT32]: 0x80080082 : IE U
> > > >   BTA: 0x2fc4  SP: 0x5ff8bd64  FP: 0x
> > > >   LPS: 0x200524a0 LPE: 0x200524b6 LPC: 0x0006
> > > >   [-other-info-]
> > > > _
> > > 
> > > The prints I'm seeing now for a segv from a NULL pointer access are even
> > > more confusing! There's a mixup of prints:
> > > 
> > > >8
> > > Path: /segv
> > > CPU: 0 PID: 70 Comm: segv Not tainted 4.17.0+ #412
> > > 
> > > [ECR   ]: 0x00050200 => Invalid Write @ 0x by insn @ 0x000103ac
> > > [EFA   ]: 0x
> > > [BLINK ]: 0x20047bb0
> > > [ERET  ]: 0x103ac
> > > @off 0x103ac in [/segv]
> > > VMA: 0x0001 to 0x00012000
> > > 
> > > potentially unexpected fatal signal 11.
> > > [STAT32]: 0x80080882 : IE U
> > > BTA: 0x00010398 SP: 0x5fc95e1c FP: 0x5fc95e20
> > > LPS: 0x20039ffc    LPE: 0x2003a000    LPC: 0x
> > > r00: 0x0001    r01: 0x5fc95e94    r02: 0x
> > > r03: 0x0064    r04: 0x80808080    r05: 0x2f2f2f2f
> > > ...
> > > >8
> > > 
> > > and for the process killed by signal 8, we get below.
> > > 
> > > >8
> > > [ARCLinux]# kill -8 71
> > > [ARCLinux]# potentially unexpected fatal signal 8.
> > > [STAT32]: 0x80080882 : IE U
> > > BTA: 0x20020660 SP: 0x5fbcddec FP: 0x5fbcde1c
> > > LPS: 0x20039ffc    LPE: 0x2003a000    LPC: 0x
> > > r00: 0xfdfc    r01: 0x5fbcddf0    r02: 0x
> > > r03: 0x0008    r04: 0x80808080    r05: 0x2f2f2f2f
> > > r06: 0x7a2f5f4a    r07: 0x    r08: 0x0065
> > > ...
> > > 
> > > 
> > > [1]+  Floating point exception   ./sleep
> > > >8
> > > I'm not sure what the improvement is here vs. the status quo.
> > 
> > Why do you think this is confusing?
> > The main change is 

Re: [PATCH] ARC: prevent showing irrelevant exception info in signal message

2018-07-02 Thread Vineet Gupta
+CC Al

On 06/29/2018 12:39 PM, Eugeniy Paltsev wrote:
> We process signals in the end of syscall/exception handler.
> If the signal is fatal we print register's content using
> show_regs function. show_regs() also prints information about
> last exception happened.
>
> In case of multicore system we can catch the situation when we
> will print wrong information about exception. See the example:
> __
> CPU-0: started to handle page fault
> CPU-1: sent signal to process, which is executed on CPU-0
> CPU-0: ended page fault handling. Started to process signal before
>    returning to userspace. Process signal, which is sent from
>    CPU-0. As the signal is fatal we call show_regs().
>    show_regs() will show information about last exception
>    which is *page fault* (instead of "trap" which is used for
>    signals and happened on CPU-0)
>
> So we will get message like this:
> /home/waitpid02
>   potentially unexpected fatal signal 8.
>   Path: /home/waitpid02
>   CPU: 0 PID: 100 Comm: waitpid02 Not tainted 4.10.0-rc4 #2
>   task: 9f11c200 task.stack: 9f3ae000
>
>   [ECR   ]: 0x00050200 => Invalid Write @ 0x by insn @ 0x000123ec
>   [EFA   ]: 0x
>   [BLINK ]: 0x123ea
>   [ERET  ]: 0x123ec
> @off 0x123ec in [/home/waitpid02]
> VMA: 0x0001 to 0x00016000
>   [STAT32]: 0x80080882 : IE U
>   BTA: 0x000123ea  SP: 0x5ffd3db0  FP: 0x
>   LPS: 0x20031684 LPE: 0x2003169a LPC: 0x0006
>   [-other-info-]
>
> This message is confusing because it show information about page fault
> ( [ECR   ]: 0x00050200 => Invalid Write ) which is absolutely irrelevant
> to signal.

Agreed this is misleading. @Al, is there a way to distinguish process
termination by a signal because the process did something wrong from, say,
an unhandled signal? For the former, we want to dump additional info in
show_regs() such as PC / fault address etc., and not in the other scenario.
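
As an illustration of one distinction that is already visible from userspace
(a sketch only, not a claim about how the ARC kernel code should decide): on
Linux, a signal sent with kill(2) is delivered with si_code == SI_USER, while
signals the kernel raises for a fault carry a fault-specific si_code such as
SEGV_MAPERR. The function and variable names below are hypothetical, and the
code assumes a single-threaded process with SIGFPE unblocked, so that the
handler has run by the time kill() returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t seen_signal;
static volatile sig_atomic_t seen_si_code;

/* Records how the signal was generated, as reported in siginfo. */
static void on_sigfpe(int sig, siginfo_t *info, void *ctx)
{
	(void)sig;
	(void)ctx;
	seen_si_code = info->si_code;
	seen_signal = 1;
}

/*
 * Sends SIGFPE to the calling process with kill(2) and checks si_code.
 * Returns 1 if the signal was reported as user-sent (SI_USER),
 * 0 if it carried another code, -1 on setup or delivery failure.
 */
static int sigfpe_via_kill_is_user_sent(void)
{
	struct sigaction sa, old;
	int ret = -1;

	memset(&sa, 0, sizeof(sa));
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_SIGINFO;
	sa.sa_sigaction = on_sigfpe;
	if (sigaction(SIGFPE, &sa, &old) != 0)
		return -1;

	seen_signal = 0;
	/* Single-threaded, unblocked: the handler runs before kill() returns. */
	if (kill(getpid(), SIGFPE) == 0 && seen_signal)
		ret = (seen_si_code == SI_USER);

	/* Restore the previous disposition before returning. */
	sigaction(SIGFPE, &old, NULL);
	return ret;
}
```

This mirrors the "kill -8" case shown later in this mail: the SIGFPE there
comes from another process, not from an arithmetic fault in the target.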

> This situation was reproduced with waitpid02 LTP test.
> _
>
> So remove printing information about exceptions from show_regs()
> to avoid confusing messages. Print information about exceptions
> only in required places instead of show_regs()
>
> Now we don't print information about exceptions if signal is simply
> send by another userspace app. So in case of waitpid02 we will print
> next message:
> _
> ./waitpid02
>   potentially unexpected fatal signal 8.
>   [STAT32]: 0x80080082 : IE U
>   BTA: 0x2fc4  SP: 0x5ff8bd64  FP: 0x
>   LPS: 0x200524a0 LPE: 0x200524b6 LPC: 0x0006
>   [-other-info-]
> _

The prints I'm seeing now for a segv from a NULL pointer access are even more
confusing! There's a mixup of prints:

>8
Path: /segv
CPU: 0 PID: 70 Comm: segv Not tainted 4.17.0+ #412

[ECR   ]: 0x00050200 => Invalid Write @ 0x by insn @ 0x000103ac
[EFA   ]: 0x
[BLINK ]: 0x20047bb0
[ERET  ]: 0x103ac
    @off 0x103ac in [/segv]
    VMA: 0x0001 to 0x00012000

potentially unexpected fatal signal 11.
[STAT32]: 0x80080882 : IE U
BTA: 0x00010398     SP: 0x5fc95e1c     FP: 0x5fc95e20
LPS: 0x20039ffc    LPE: 0x2003a000    LPC: 0x
r00: 0x0001    r01: 0x5fc95e94    r02: 0x   
r03: 0x0064    r04: 0x80808080    r05: 0x2f2f2f2f   
...
>8

and for the process killed by signal 8, we get below.

>8
[ARCLinux]# kill -8 71
[ARCLinux]# potentially unexpected fatal signal 8.
[STAT32]: 0x80080882 : IE U
BTA: 0x20020660     SP: 0x5fbcddec     FP: 0x5fbcde1c
LPS: 0x20039ffc    LPE: 0x2003a000    LPC: 0x
r00: 0xfdfc    r01: 0x5fbcddf0    r02: 0x   
r03: 0x0008    r04: 0x80808080    r05: 0x2f2f2f2f   
r06: 0x7a2f5f4a    r07: 0x    r08: 0x0065   
...


[1]+  Floating point exception   ./sleep
>8

I'm not sure what the improvement is here vs. the status quo.

For a signal-based kill, we don't want to dump the extra registers; if anything,
we might still want to print the PC where the process was last seen in user mode
to give the user an idea of what it was doing at the time.

-Vineet

