[SCSI REGRESSION] 3.10.2 or 3.10.3: arcmsr failure at bootup / early userspace transition

2013-07-29 Thread Nick Alcock
My server's ARC-1210 has been working fine for years, but when I
upgraded from 3.10.1 to 3.10.3, it started failing:

Instead of

[0.784044] Areca RAID Controller0: F/W V1.46 2009-01-06 & Model ARC-1210
[0.804028] scsi0 : Areca SATA Host Adapter RAID Controller
 Driver Version 1.20.00.15 2010/08/05
[...]

[4.111770] sd 7:0:0:1: [sdd] Assuming drive cache: write through
[4.115399] sd 7:0:0:1: [sdd] No Caching mode page present
[4.115401] sd 7:0:0:1: [sdd] Assuming drive cache: write through
[4.118081]  sdd: sdd1
[4.124363] sd 7:0:0:1: [sdd] No Caching mode page present
[4.124601] sd 7:0:0:1: [sdd] Assuming drive cache: write through
[4.124867] sd 7:0:0:1: [sdd] Attached SCSI removable disk

I now see the following (timestamps and some of the right edge are
chopped off because my camera didn't capture them; there is no
netconsole, as this machine has all my storage and is my loghost, and
with this bug it can't get at any of that storage):

sd 7:0:0:1: [sdd] Assuming drive cache: write through
sd 7:0:0:1: [sdd] No Caching mode page present
sd 7:0:0:1: [sdd] Assuming drive cache: write through
 sdd: sdd1
sd 7:0:0:1: [sdd] No Caching mode page present
sd 7:0:0:1: [sdd] Assuming drive cache: write through
sd 7:0:0:1: [sdd] Attached SCSI removable disk
arcmsr0: abort device command of scsi id = 0 lun = 1
arcmsr0: abort device command of scsi id = 0 lun = 0
arcmsr: executing bus reset eh.num_resets=0, num_[...]

arcmsr0: wait 'abort all outstanding command' timeout
arcmsr0: executing hw bus reset 
arcmsr0: waiting for hw bus reset return, retry=0
arcmsr0: waiting for hw bus reset return, retry=1
Areca RAID Controller0: F/W V1.46 2009-01-06 & Model ARC-1210
arcmsr: scsi  bus reset eh returns with success
[and back to the top of the error messages again, apparently forever,
 not that the machine would be much use without its RAID array even
 if this loop terminated at some point, so I only gave it a couple
 of minutes]

The failure happens precisely at the moment we transition to early
userspace, so presumably userspace I/O is failing (or something related
to raw device access, perhaps, since the first thing it does is a
vgscan).

I haven't bisected yet (sorry, I have work to do which means this
machine must be running right now), but nothing has changed in the
arcmsr controller, nor in SCSI-land excepting

commit 98dcc2946adbe4349ef1ef9b99873b912831edd4
Author: Martin K. Petersen 
Date:   Thu Jun 6 22:15:55 2013 -0400

SCSI: sd: Update WRITE SAME heuristics

so my, admittedly largely baseless, suspicions currently fall there.


Obviously, at this point, this machine has no modules loaded (it has
almost none loaded even when fully operational).

.config, unchanged from 3.10.1 to 3.10.3:

CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_MMU=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_ARCH_HAS_CPU_AUTOPROBE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_64_SMP=y
CONFIG_X86_HT=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11"
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_KERNEL_LZMA=y
CONFIG_DEFAULT_HOSTNAME="spindle"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_FHANDLE=y
CONFIG_AUDIT=y
CONFIG_HAVE_GENERIC_HARDIRQS=y

CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
CONFIG_NO_HZ_IDLE=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y

Re: [SCSI REGRESSION] 3.10.2 or 3.10.3: arcmsr failure at bootup / early userspace transition

2013-07-30 Thread Nick Alcock
On 30 Jul 2013, Bernd Schubert told this:

> On 07/30/2013 01:34 AM, Martin K. Petersen wrote:
>> (wheezy)fslab1:~# sg_inq -v /dev/sdc
>> inquiry cdb: 12 00 00 00 24 00
>> standard INQUIRY:
>> inquiry cdb: 12 00 00 00 60 00
>>   PQual=0  Device_type=0  RMB=0  version=0x05  [SPC-3]
>>   [AERC=0]  [TrmTsk=0]  NormACA=0  HiSUP=0  Resp_data_format=2
>>   SCCS=0  ACC=0  TPGS=0  3PC=0  Protect=0  BQue=0
>>   EncServ=0  MultiP=0  [MChngr=0]  [ACKREQQ=0]  Addr16=1
>>   [RelAdr=0]  WBus16=1  Sync=0  Linked=0  [TranDis=0]  CmdQue=1
>>   [SPI: Clocking=0x3  QAS=0  IUS=0]
>> length=96 (0x60)   Peripheral device type: disk
>>  Vendor identification: Hitachi
>>  Product identification: HDS724040KLSA80
>>  Product revision level: R001
>> inquiry cdb: 12 01 00 00 fc 00
>> inquiry cdb: 12 01 80 00 fc 00
>>  Unit serial number: KRFS2CRAHXJZVD
>
> Besides the firmware, the difference might be that I'm exporting single disks 
> without any areca-raidset in between.
> I can try to confirm that tomorrow, I just need the system as it is till 
> tomorrow noon.

Aaah. Yeah, it looks like in JBOD mode it's just passing things straight
on to the disk: that vendor ID is a dead giveaway. For all I know my
earlier firmware does the same, but for obvious reasons I can't really
test that! Quite possibly it's passing *everything* on to the disk,
including all SCSI commands, in which case we don't actually know that
your Areca controller supports the VPD page we thought it did: quite
possibly only this underlying disk does.

You can get a degree of info on the underlying disks in the array even
if it's in RAID mode -- smartctl does it, for instance -- but it takes
Areca-specific code and chattering to the sg devices directly. I bet
that in JBOD mode, the sg device is the only exposure the controller has
to the world, and *all* the /dev/sd* devices are just passthroughs.

-- 
NULL && (void)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] scsi disk: Use its own buffer for the vpd request

2013-08-02 Thread Nick Alcock
On 1 Aug 2013, Bernd Schubert told this:

> Once I noticed that scsi_get_vpd_page() works fine from other function
> calls and that it is not 0x89, but already 0x0 that fails fixing it became
> easy.
>
> Nix, any chance you could verify it also works for you?

Sorry for the delay: it's hard for me to verify this during the working
week.

I'll check it tomorrow -- after I've run a backup! :} (why yes, bugs of
this nature do frighten me a bit. I know it's superstition, but I'm
always wondering whether the SCSI controller will come back again
whenever that post-error bus reset happens.)

-- 
NULL && (void)


Re: [PATCH] scsi disk: Use its own buffer for the vpd request

2013-08-03 Thread Nick Alcock
On 1 Aug 2013, Bernd Schubert stated:

> Once I noticed that scsi_get_vpd_page() works fine from other function
> calls and that it is not 0x89, but already 0x0 that fails fixing it became
> easy.
>
> Nix, any chance you could verify it also works for you?

Confirmed, thank you!

> Somehow older areca firmware versions have issues with
> scsi_get_vpd_page() and a large buffer.

I wonder if they're using math modulo SD_BUF_SIZE-1 by mistake, so they
misinterpret this as zero? (Still, doing math modulo 511 seems very
odd, even if this firmware *does* only support 512-byte sectors.)
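
A small sketch of the arithmetic being speculated about here, not a claim
about what the Areca firmware actually does. It assumes SD_BUF_SIZE is 512
and that the INQUIRY allocation length is written into CDB bytes 3 and 4,
most significant byte first; low_byte_only() is a purely hypothetical model
of a firmware that reads only byte 4.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of SD_BUF_SIZE (512 bytes). */
#define SD_BUF_SIZE 512

/*
 * Write a 16-bit allocation length into bytes 3 and 4 of a 6-byte
 * INQUIRY CDB, most significant byte first.
 */
static void encode_alloc_len(uint8_t cdb[6], unsigned int len)
{
	cdb[3] = (len >> 8) & 0xff;
	cdb[4] = len & 0xff;
}

/* Hypothetical firmware behaviour: only the low byte is examined. */
static unsigned int low_byte_only(const uint8_t cdb[6])
{
	return cdb[4];
}
```

With a length of 512 the CDB carries 0x02 0x00, so anything that looks only
at the low byte, or masks the length with 511, sees zero. Taking 512 modulo
511 would give 1 rather than 0, so a mask or a truncated length field fits
the symptom better than a true modulo. A length of 0xfc, as in the sg_inq
trace earlier in the thread, survives either way.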

-- 
NULL && (void)


[PATCH] vfs: respect FMODE_UNSIGNED_OFFSET in p(read|write)[v]*().

2014-01-29 Thread Nick Alcock
Because the pread and pwrite functions do not respect the unsigned
offset flag, you can read certain parts of /proc/$pid/mem via lseek()
and read(), but not via pread().  (This probably went unnoticed
because on i386 and x86-64, almost everything except the vdso is
normally located below the region where signed offsets become
negative: but this is not true on all platforms.)

Fixing pwrite() is currently academic because this flag is only
used by files that do not allow writing, but it is easiest to
be consistent and retain a similarity of form between the pread*()
and pwrite*() functions.

Signed-off-by: Nick Alcock 
---
 fs/read_write.c | 22 +++---
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 58e440d..f33f664 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -534,10 +534,12 @@ SYSCALL_DEFINE4(pread64, unsigned int, fd, char __user *, buf,
 	struct fd f;
 	ssize_t ret = -EBADF;
 
-	if (pos < 0)
+	f = fdget(fd);
+	if ((pos < 0) && (!f.file || !unsigned_offsets(f.file))) {
+		fdput(f);
 		return -EINVAL;
+	}
 
-	f = fdget(fd);
 	if (f.file) {
 		ret = -ESPIPE;
 		if (f.file->f_mode & FMODE_PREAD)
@@ -554,10 +556,12 @@ SYSCALL_DEFINE4(pwrite64, unsigned int, fd, const char __user *, buf,
 	struct fd f;
 	ssize_t ret = -EBADF;
 
-	if (pos < 0)
+	f = fdget(fd);
+	if ((pos < 0) && (!f.file || !unsigned_offsets(f.file))) {
+		fdput(f);
 		return -EINVAL;
+	}
 
-	f = fdget(fd);
 	if (f.file) {
 		ret = -ESPIPE;
 		if (f.file->f_mode & FMODE_PWRITE)
@@ -847,10 +851,12 @@ SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
 	struct fd f;
 	ssize_t ret = -EBADF;
 
-	if (pos < 0)
+	f = fdget(fd);
+	if ((pos < 0) && (!f.file || !unsigned_offsets(f.file))) {
+		fdput(f);
 		return -EINVAL;
+	}
 
-	f = fdget(fd);
 	if (f.file) {
 		ret = -ESPIPE;
 		if (f.file->f_mode & FMODE_PREAD)
@@ -871,8 +877,10 @@ SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
 	struct fd f;
 	ssize_t ret = -EBADF;
 
-	if (pos < 0)
+	if ((pos < 0) && (!f.file || !unsigned_offsets(f.file))) {
+		f = fdget(fd);
 		return -EINVAL;
+	}
 
 	f = fdget(fd);
 	if (f.file) {
-- 
1.8.5.2.169.ge058798
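
To make the commit message's point about signed offsets concrete, here is a
minimal userspace sketch of which addresses turn negative when a 64-bit
address is used as a signed 64-bit file offset. The helper name
offset_is_negative() and the sample addresses are illustrative assumptions,
not anything taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if a 64-bit address would be negative when reinterpreted as a
 * signed 64-bit offset, i.e. if its top bit is set. Testing the bit
 * directly avoids relying on implementation-defined conversion.
 */
static bool offset_is_negative(uint64_t addr)
{
	return (addr >> 63) != 0;
}
```

For example, 0x00007fffffffe000 (a typical x86-64 user stack address) stays
non-negative, whereas anything from 0x8000000000000000 upwards, such as
0xffffffffff600000, comes out negative and so is rejected by a plain
pos < 0 check before the file's flags are ever consulted.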


Re: [PATCH] vfs: respect FMODE_UNSIGNED_OFFSET in p(read|write)[v]*().

2014-01-29 Thread Nick Alcock
On 29 Jan 2014, Al Viro outgrape:

> On Wed, Jan 29, 2014 at 11:57:20AM +0000, Nick Alcock wrote:
>>  ssize_t ret = -EBADF;
>>  
>> -if (pos < 0)
>> +f = fdget(fd);
>> +if ((pos < 0) && (!f.file || !unsigned_offsets(f.file))) {
>> +fdput(f);
>>  return -EINVAL;
>> +}
>
> ... and now pread(-1, ...) fails with EINVAL instead of EBADF.

Sorry, I don't see it. If the fh is invalid, control flow is unchanged
unless pos is also < 0 (that's an && outside the bracketed section, not
an ||, and nothing I've touched changes ret outside that conditional
branch): if pos *is* < 0, we'd have had an EINVAL before and we have one
now, likewise unchanged.

What am I missing?

(Or did you miss the brackets enclosing (!f.file || !unsigned_offsets(f.file))?
If so, I'm not surprised: it would really be easier to read if that
function had the inverse sense, 'signed_offsets()'...)
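
The case analysis above can be checked with a simplified userspace model of
the two orderings. This is only a sketch: struct model_file, old_check() and
new_check() are hypothetical stand-ins for the fd lookup and the
unsigned_offsets() test, and the later FMODE_PREAD/-ESPIPE handling is left
out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-in for the looked-up file: only what the check uses. */
struct model_file {
	bool valid;            /* models f.file != NULL */
	bool unsigned_offsets; /* models unsigned_offsets(f.file) */
};

/* Pre-patch order: reject a negative pos before looking at the fd. */
static int old_check(struct model_file f, long long pos)
{
	if (pos < 0)
		return -EINVAL;
	if (!f.valid)
		return -EBADF;
	return 0; /* would go on to perform the read */
}

/* Post-patch: a negative pos is fatal only without unsigned offsets. */
static int new_check(struct model_file f, long long pos)
{
	if (pos < 0 && (!f.valid || !f.unsigned_offsets))
		return -EINVAL;
	if (!f.valid)
		return -EBADF;
	return 0;
}
```

In this model the two functions agree for every combination except a valid
file with unsigned offsets and a negative pos, which is the one case the
patch is meant to change. An invalid fd gives -EBADF for pos >= 0 and
-EINVAL for pos < 0 under both.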

-- 
NULL && (void)


[4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite

2016-05-27 Thread Nick Alcock
So I've been working on a patch series (see below) that applies GCC's
-fstack-protector{-all,-strong} to almost all of glibc bar the dynamic
linker. In trying to upstream it, one review commenter queried one
SPARC-specific patch in the series; the absence of this patch triggers a
BUG in the SPARC kernel when glibc is tested as an unprivileged user, on
all versions tested from Oracle UEK 4.1 right up to 4.6.0, at least on
the ldoms I have access to and presumably on bare hardware too.

This is clearly a bug, and equally clearly I think it needs fixing
before we can upstream the series, which it would be nice to do because
it would have prevented most of the recent spate of glibc stack
overflows from escalating to arbitrary code execution.

First, a representative sample of the BUG, as seen on 4.6.0:

ld-linux.so.2[36805]: segfault at 7ff ip   (null) (rpc   (null)) sp   (null) error 30001 in tst-kill6[10+4000]
ld-linux.so.2[36806]: segfault at 7ff ip   (null) (rpc   (null)) sp   (null) error 30001 in tst-kill6[10+4000]
ld-linux.so.2[36807]: segfault at 7ff ip   (null) (rpc   (null)) sp   (null) error 30001 in tst-kill6[10+4000]
kernel BUG at arch/sparc/mm/fault_64.c:299!
  \|/  \|/
  "@'/ .. \`@"
  /_| \__/ |_\
 \__U_/
ld-linux.so.2(36808): Kernel bad sw trap 5 [#1]
CPU: 1 PID: 36808 Comm: ld-linux.so.2 Not tainted 4.6.0 #34
task: fff8000303be5c60 ti: fff8000301344000 task.ti: fff8000301344000
TSTATE: 004410001601 TPC: 00a1a784 TNPC: 00a1a788 Y: 0002    Not tainted
TPC: 
g0: fff824fc8248 g1: 00db04dc g2:  g3: 0001
g4: fff8000303be5c60 g5: fff800030e672000 g6: fff8000301344000 g7: 0001
o0: 00b95ee8 o1: 012b o2:  o3: 000200b9b358
o4:  o5: fff8000301344040 sp: fff80003013475c1 ret_pc: 00a1a77c
RPC: 
l0: 07ff l1:  l2: 005f l3: 
l4: fff8000301347e98 l5: fff824ff3060 l6:  l7: 
i0: fff8000301347f60 i1: 00102400 i2:  i3: 
i4:  i5:  i6: fff80003013476a1 i7: 00404d4c
I7: 
Call Trace:
 [00404d4c] user_rtt_fill_fixup+0x6c/0x7c
Disabling lock debugging due to kernel taint
Caller[00404d4c]: user_rtt_fill_fixup+0x6c/0x7c
Caller[]:   (null)
Instruction DUMP: 9210212b  7fe84179  901222e8 <91d02005> 90102002  92102001  94100018  7fecd033  96100010
Kernel panic - not syncing: Fatal exception
Press Stop-A (L1-A) to return to the boot prom
---[ end Kernel panic - not syncing: Fatal exception

The crash moves around, and can even be seen striking in completely
random userspace processes that aren't part of the glibc under test
(e.g. I've seen it happen inside awk and GCC). The backtrace is always
the same, though.

It seems this is an unexpected TLB fault from this BUG in
do_sparc64_fault():

	if ((fault_code & FAULT_CODE_ITLB) &&
	    (fault_code & FAULT_CODE_DTLB))
		BUG();

which certainly explains the randomness to some extent.

Now, some details for replication. It's easy to replicate if you can
build and test glibc using a GCC that supports -fstack-protector-all on
Linux/SPARC: I used 4.9.3. (You don't need to *install* the glibc or
anything, and getting to the crash on reasonable hardware takes only a
few minutes.)

The patch series itself, in the hopefully-not-too-inconvenient form of a
pair of git bundles based on glibc commit
a5df3210a641c175138052037fcdad34298bfa4d (near the glibc-2.23 release),
though this happens on glibc trunk with these bundles merged in too:





You'll need to run autoconf-2.69 in the source tree after checkout,
since I haven't regenerated configure in either of them.

To configure/build/test, I used

../../glibc/configure --enable-stackguard-randomization \
--enable-stack-protector=all --prefix=/usr --enable-shared \
--enable-bind-now --enable-maintainer-mode --enable-obsolete-rpc \
--enable-add-ons=libidn --enable-kernel=4.1 --enable-check-abi=warn \
&& make -j 5 && make -j 5 check TIMEOUTFACTOR=5

though most of the configure flags are probably unnecessary and you'll
probably want to adjust the -j numbers. The crucial one is
--enable-stack-protector=all; without it, the first patch series is
equivalent to the second.

The crash almost invariably happens during the make check run, usually
during or after string/; both 32-bit and 64-bit glibc builds are
affected (the above configure line is for 64-bit). I have not yet
completed as many as four runs without a crash, and it almost always
happens in one or two. You can probably trigger one reliabl

Re: [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite

2016-05-27 Thread Nick Alcock
On 27 May 2016, John Paul Adrian Glaubitz outgrape:

> Hi Nick!
>
> On 05/27/2016 03:19 PM, Nick Alcock wrote:
>> So I've been working on a patch series (see below) that applies GCC's
>> -fstack-protector{-all,-strong} to almost all of glibc bar the dynamic
>> linker. In trying to upstream it, one review commenter queried one
>> SPARC-specific patch in the series; the absence of this patch triggers a
>> BUG in the SPARC kernel when glibc is tested as an unprivileged user, on
>> all versions tested from Oracle UEK 4.1 right up to 4.6.0, at least on
>> the ldoms I have access to and presumably on bare hardware too.
>
> I apologize for hijacking this thread but since you are mentioning glibc,
> there are actually a couple of tests in the glibc testsuite [1].

At least one of those failures is spurious:

FAIL: nptl/tst-cond11
original exit status 1
clock = 0
Timed out: killed the child process

You want to pass in a higher TIMEOUTFACTOR to the make check run, and
that problem at least should go away. (The TIMEOUTFACTOR you need
depends on how sluggish your test machine is.)


Re: [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite

2016-05-27 Thread Nick Alcock
On 27 May 2016, David Miller stated:

> From: Nick Alcock 
> Date: Fri, 27 May 2016 14:19:27 +0100
>
>> The only difference between the two series above is that in the crashing
>> series, the ka_restorer stub functions __rt_sigreturn_stub and
>> __sigreturn_stub (on sparc32) and __rt_sigreturn_stub (on sparc64) get
>> stack-protected; in the non-crashing series, they do not; the same is
>> true without --enable-stack-protector=all, because the functions have no
>> local variables at all, so without -fstack-protector-all they don't get
>> stack-protected in any case. Passing such a stack-protected function in
>> as the ka_restorer stub seems to suffice to cause this crash at some
>> later date. I'm wondering if the stack canary is clobbering something
>> that the caller does not expect to be clobbered: we saw this cause
>> trouble on x86 in a different context (see upstream commit
>> 7a25d6a84df9fea56963569ceccaaf7c2a88f161).
>
> This is amazing that it makes a difference since the sigreturn stub is
> implemented entirely in inline assembler :-)

I was fairly surprised as well, but not shocked, because people who
write a function that consists of one single inline assembler
instruction might well be rather surprised to find a massive pile of
prologue and epilogue code dumped around it!

> Normally the 64-bit stub is emitted as:
>
> __rt_sigreturn_stub:
>         mov     101, %g1
>         ta      0x6d
>
> and with -fstack-protector-all we get:
>
> __rt_sigreturn_stub:
>         save    %sp, -192, %sp
>         ldx     [%g7+40], %g1
>         stx     %g1, [%fp+2039]
>         mov     0, %g1
>
>         mov     101, %g1
>         ta      0x6d
>
>         ldx     [%fp+2039], %g1
>         ldx     [%g7+40], %g2
>         xor     %g1, %g2, %g1
>         mov     0, %g2
>         brnz,pn %g1, .LL4
>          nop
>         return  %i7+8
>          nop
> .LL4:
>         call    __stack_chk_fail, 0
>          nop
>         nop
>
> That 'save' is the problem.
>
> One can't change the register window or the stack pointer in this
> function, as the kernel has setup the restore frame at a precise
> location relative to the stack pointer when the stub is invoked.

Oops!

> Basically, do_rt_sigreturn is restoring garbage into the cpu
> registers.

Oh gods, is it supposed to do register restoration? I.e. the usual ABI
rules regarding stack changes etc. just don't apply to it?

Right, that's a disaster for stack-protection, obviously. The
stack-protector prologue/epilogue does rather assume that it's being
wrapped around a function, and in a very real sense this thing isn't a
function in the normal sense at all. This is exactly what I thought was
going on with the x86 code, but in the end that turned out to be a
simple case of the (assembly) caller assuming a call-clobbered register
had survived unchanged when the stack-protector epilogue had clobbered
it (as it was quite within its rights to).

> It obviously shouldn't crash, which I'll look into, but it is clear
> that we can't enable -fstack-protector-all for this function.

And now I have a good explanation of why that is for the commit log.
Thank you!
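
One way to keep the stack protector off a single function while building
everything else with -fstack-protector-all is a per-function attribute. The
following is a minimal sketch, assuming a reasonably recent GCC or Clang; the
macro name NO_STACK_PROTECTOR and the function add_one() are hypothetical and
are not taken from the glibc patch series.

```c
#include <assert.h>

/*
 * Pick whichever per-function opt-out the compiler offers:
 * the no_stack_protector attribute where it exists, otherwise GCC's
 * optimize attribute, otherwise nothing at all.
 */
#if defined(__has_attribute)
# if __has_attribute(no_stack_protector)
#  define NO_STACK_PROTECTOR __attribute__((no_stack_protector))
# endif
#endif
#ifndef NO_STACK_PROTECTOR
# if defined(__GNUC__) && !defined(__clang__)
#  define NO_STACK_PROTECTOR \
	__attribute__((__optimize__("-fno-stack-protector")))
# else
#  define NO_STACK_PROTECTOR
# endif
#endif

/* An ordinary function that should get no canary prologue/epilogue. */
static NO_STACK_PROTECTOR int add_one(int x)
{
	return x + 1;
}
```

Whether the attribute really suppresses the prologue and epilogue for a
given compiler is best confirmed by inspecting the generated assembly, as
was done for the stub above.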

> So far I'm playing with the patch below to do some basic sanity
> checks on the values inside of the sigreturn frame:

Good move. Segfaulting the process is fine! :) Any process that does
this sort of thing is clearly either terminally buggy, written by an
idiot who doesn't know what he's doing (i.e. my original patch) or
malicious. These all deserve SEGVs.

(I still don't understand why this leads to spurious TLB faults, though.
Filling the userland CPU registers with garbage is bad, but should still
be reasonably harmless to the kernel, surely?)

-- 
NULL && (void)