kernel crash when connecting MP3-Player (Sansa Clip Zip)

2014-02-24 Thread n . reusse
Hi bugs@,

whenever i connect my turned-off Sansa Clip Zip to my openbsd desktop
machine,
the kernel hangs and throws me into dbb.  It's 100% reproducible.  The
output
was generated from the latest amd64-current-snapshot (23.02., dmesg
below),
connecting the player immediately after boot.

I replaced the firmware on the player with rockbox 3.13, which crashes
from
time to time.  On debian wheezy i had similar problems (i.e. the
*player* crashed
and was not detected when i connected it), but debian never crashed.

Something else i noticed while looking at the ps output: there is a
'usbtask'
and a 'usbatsk' command running, is that correct or a typo somewhere?

Best,
N.

kernel: protection fault trap, code=0
Stopped at  usb_allocmem+0x166: cmpq%rbx,0(%rax)
ddb{0}> trace
usb_allocmem() at usb_allocmem+0x166
usbd_transfer() at usbd_transfer+0xe6
usbd_do_request_flags() at usbd_do_request_flags+0xcc
usbd_get_desc() at usbd_get_desc+0x2d
usbd_new_device() at usbd_new_device+0x1c7
uhub_explore() at uhub_explore+0x1c9
usb_explore() at usb_explore+0xcf
usb_task_thread() at usb_task_thread+0xb2
end trace frame: 0x0, count: -8
ddb{0}> ps
PID   PPID   PGRPUID  S   FLAGS  WAIT COMMAND
2487  1   2487  0  30x83  ttyingetty
74  1 74  0  30x83  ttyingetty
 11485  1  11485  0  30x83  ttyingetty
 17765  1  17765  0  30x83  ttyingetty
8940  1   8940  0  30x83  ttyingetty
 12835  1  12835  0  30x80  select   cron
 15782  1  15782  0  30x80  kqread   apmd
5226  1   5226 99  30x90  poll sndiod
 10951  1  10951  0  30xb0  select   sendmail
 32479  1  32479  0  30x80  select   sshd
1841   4987118 83  30x90  poll ntpd
4987118118 83  30x90  poll ntpd
118  1118  0  30x80  poll ntpd
 11290   6871   6871 74  30x90  bpf  pflogd
6871  1   6871  0  30x80  netiopflogd
4023  18959  18959 73  30x90  poll syslogd
 18959  1  18959  0  30x80  netiosyslogd
 25941  1  25941 77  30x90  poll dhclient
 24059  1  24059  0  30x80  poll dhclient
 22870  0  0  0  3  0x4200  boredttm_swap
 30340  0  0  0  3  0x4200  aiodoned aiodoned
8303  0  0  0  3  0x4200  syncer   update
 32489  0  0  0  3  0x4200  cleaner  cleaner
3256  0  0  0  3  0x4200  reaper   reaper
 27232  0  0  0  3  0x4200  pgdaemon pagedaemon
 15293  0  0  0  3  0x4200  boredcrypto
3835  0  0  0  3  0x4200  pftm pfpurge
 13053  0  0  0  3  0x4200  boredsensors
* 5362  0  0  0  7  0x4200 ;   usbtask
 18010  0  0  0  3  0x4200  usbatsk  usbatsk
 30029  0  0  0  3  0x40004200  acpi0apci0
 17126  0  0  0  7  0x40004200   & nbsp;   idle1
 16167  0  0  0  3  0x4200  boredsystq
 15255  0  0  0  3  0x4200  boredsyswq
 25236  0  0  0  3  0x40004200   & nbsp;   idle0
1  0  1  0  30x82  wait init
0 -1  0  0  3   0x200  schedulerswapper
ddb{0}>

*dmesg

OpenBSD 5.5 (GENERIC.MP) #296: Sun Feb 23 11:54:46 MST 2014
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 3738107904 (3564MB)
avail mem = 3629985792 (3461MB)
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xf0100 (61 entries)
bios0: vendor Award Software International, Inc. version "F4" date
07/28/2010
bios0: Gigabyte Technology Co., Ltd. GA-880GMA-UD2H
acpi0 at bios0: rev 0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SSDT HPET MCFG MATS APIC
acpi0: wakeup devices USB0(S3) USB1(S3) USB2(S3) USB3(S3) USB4(S3)
USB5(S3) USB6(S3) SBAZ(S4) PEX0(S5) PEX1(S5) PEX2(S5) PEX3(S5) P2P_(S5)
PCE2(S4) PCE3(S4) PCE4(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpihpet0 at acpi0: 14318180 Hz
acpimcfg0 at acpi0 addr 0xe000, bus 0-255
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Athlon(tm) II X2 240e Processor, 11252.14 MHz
cpu0:
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUS
H,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,POPCNT,NXE,MMXX,FFXSR,LONG,3DNOW2,3DN
OW,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,ITSC<
br> cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 1MB
64b/line 16-way L2 cache
cpu0: ITLB 32 4KB entries fully associative, 16 4MB entries fully
associative
cpu0: DTLB 48 4KB entries fully associative, 48 4MB entries fully
a

Re: kernel crash when connecting MP3-Player (Sansa Clip Zip)

2014-02-24 Thread Martin Pieuchot
Hello,

On 24/02/14(Mon) 13:01, n.reu...@hxgn.net wrote:
> Hi bugs@,
> 
> whenever i connect my turned-off Sansa Clip Zip to my openbsd desktop
> machine,
> the kernel hangs and throws me into dbb.  It's 100% reproducible.  The
> output
> was generated from the latest amd64-current-snapshot (23.02., dmesg
> below),
> connecting the player immediately after boot.

This looks like one of the DMA segment has been written past its size or
after being freed.
I see that you have various USB devices on your system, does it happen if
you only have your music player connected?  I assume it attaches at ehci,
could you confirm that?  If that's the case can you try to disable ehci
at boot (by doing "boot -c" then "disable ehci" and "quit") and see if
the problem is still there?

Finally could you try to following diff and send me its output?

> 
> Something else i noticed while looking at the ps output: there is a
> 'usbtask'
> and a 'usbatsk' command running, is that correct or a typo somewhere?

It's correct, the first one is the kernel thread executing most of the
USB operations (USB task) and the second one only do aborts (USB abort
task).


Thanks for you report,
M.

Index: usb_mem.c
===
RCS file: /home/ncvs/src/sys/dev/usb/usb_mem.c,v
retrieving revision 1.25
diff -u -p -r1.25 usb_mem.c
--- usb_mem.c   15 Apr 2013 09:23:02 -  1.25
+++ usb_mem.c   24 Feb 2014 15:06:12 -
@@ -52,10 +52,13 @@
 #include   /* just for struct usb_dma */
 #include 
 
+#undef USB_DEBUG
+#define USB_DEBUG
+
 #ifdef USB_DEBUG
 #define DPRINTF(x) do { if (usbdebug) printf x; } while (0)
 #define DPRINTFN(n,x)  do { if (usbdebug>(n)) printf x; } while (0)
-extern int usbdebug;
+int usbdebug = 7;
 #else
 #define DPRINTF(x)
 #define DPRINTFN(n,x)



Re: Bad return value for getpwnam_r et al

2014-02-24 Thread Ingo Schwarze
Hi Stuart,

Stuart Henderson wrote on Sun, Feb 23, 2014 at 02:22:47PM +:

> btw, here are results from a search for getpw(nam|uid)_r in unpacked
> ports source, after removing a bunch of autoconf checks etc.

Wow, these are nearly a hundred ports.  It is not feasible to audit
all that.  Besides, my patches also imply slight changes to the
behaviour of getgr{nam,gid}_r() (when called with errno already set)
and to getpw{nam,uid}() (which may now set errno even when finding
a match and will no longer set errno with the third patch in that
case).  So your list is too long for a complete audit even though
it is incomplete.

Anyway, i looked at seven examples from your list, choosing ports
from different categories.  The results are, if i read the code
correctly:

 - I found no cases where my patches might break something.
 - One of the seven ports would be significantly improved (postfix).
 - Three ports get minor improvements (gdm, glib2, ruby).
 - For three ports, it makes no difference (gmake, postgres, gnutls).
 - Nearly all changes in behaviour are due to the first patch -
   change of the return values - and among these, nearly all
   are due to changing the return value from 1 to 0 when not
   finding a match, but when there is no error.
 - The errno variable and the exact return value (as opposed to
   merely whether it's 0 or non-0) are rarely looked at.
 - For detailed results, look at the very end of this mail.

Even after unlock, i don't think it would be reasonable to try to
evaluate all the consequences of the changes.  After unlock, we
should just commit all the changes such that libc behaves reasonably,
then watch out for any fallout.

What do we commit now?

After this review, i'd say:

 * PLEASE somebody give me an OK just for the first patch.
   It actually improves various ports.
   It doesn't look like it's going to break stuff.
 (Theo said he is not opposed to fixing part of this,
  but cannot look at the details right now.)

 * Let's not commit the second patch
 (avoid errno clobbering as suggested by guenther@).
   It doesn't seem to matter much, so let's avoid any risk.

 * Let's not commit the third patch
 (let getpw*_r() not touch errno as suggested by kettenis@).
   Even though i didn't find any port where this might matter,
   i still consider it more risky than the first two patches.

Here is the first patch, again.

Yours,
  Ingo


Index: getpwent.c
===
RCS file: /cvs/src/lib/libc/gen/getpwent.c,v
retrieving revision 1.48
diff -u -p -r1.48 getpwent.c
--- getpwent.c  15 Nov 2013 22:32:55 -  1.48
+++ getpwent.c  19 Feb 2014 16:36:48 -
@@ -708,8 +708,12 @@ getpwnam_r(const char *name, struct pass
 {
struct passwd *pwret = NULL;
int flags = 0, *flagsp;
+   int my_errno = 0;
+   int saved_errno;
 
_THREAD_PRIVATE_MUTEX_LOCK(pw);
+   saved_errno = errno;
+   errno = 0;
if (!_pw_db && !__initdb())
goto fail;
 
@@ -733,8 +737,12 @@ getpwnam_r(const char *name, struct pass
 fail:
if (pwretp)
*pwretp = pwret;
+   if (pwret == NULL)
+   my_errno = errno;
+   if (!errno)
+   errno = saved_errno;
_THREAD_PRIVATE_MUTEX_UNLOCK(pw);
-   return (pwret ? 0 : 1);
+   return (my_errno);
 }
 
 struct passwd *
@@ -753,8 +761,12 @@ getpwuid_r(uid_t uid, struct passwd *pw,
 {
struct passwd *pwret = NULL;
int flags = 0, *flagsp;
+   int my_errno = 0;
+   int saved_errno;
 
_THREAD_PRIVATE_MUTEX_LOCK(pw);
+   saved_errno = errno;
+   errno = 0;
if (!_pw_db && !__initdb())
goto fail;
 
@@ -778,8 +790,12 @@ getpwuid_r(uid_t uid, struct passwd *pw,
 fail:
if (pwretp)
*pwretp = pwret;
+   if (pwret == NULL)
+   my_errno = errno;
+   if (!errno)
+   errno = saved_errno;
_THREAD_PRIVATE_MUTEX_UNLOCK(pw);
-   return (pwret ? 0 : 1);
+   return (my_errno);
 }
 
 struct passwd *
Index: getgrent.c
===
RCS file: /cvs/src/lib/libc/gen/getgrent.c,v
retrieving revision 1.38
diff -u -p -r1.38 getgrent.c
--- getgrent.c  17 Apr 2013 17:40:35 -  1.38
+++ getgrent.c  19 Feb 2014 16:36:49 -
@@ -134,6 +134,7 @@ getgrnam_r(const char *name, struct grou
if (bufsize < GETGR_R_SIZE_MAX)
return ERANGE;
errnosave = errno;
+   errno = 0;
*result = getgrnam_gs(name, grp, (struct group_storage *)buffer);
if (*result == NULL)
ret = errno;
@@ -180,6 +181,7 @@ getgrgid_r(gid_t gid, struct group *grp,
if (bufsize < GETGR_R_SIZE_MAX)
return ERANGE;
errnosave = errno;
+   errno = 0;
*result = getgrgid_gs(gid, grp, (struct group_storage *)buffer);
if (*result == NULL)
ret 

Re: Bad return value for getpwnam_r et al

2014-02-24 Thread Mark Kettenis
> Date: Mon, 24 Feb 2014 16:51:42 +0100
> From: Ingo Schwarze 
> 
> Hi Stuart,
> 
> Stuart Henderson wrote on Sun, Feb 23, 2014 at 02:22:47PM +:
> 
> > btw, here are results from a search for getpw(nam|uid)_r in unpacked
> > ports source, after removing a bunch of autoconf checks etc.
> 
> Wow, these are nearly a hundred ports.  It is not feasible to audit
> all that.  Besides, my patches also imply slight changes to the
> behaviour of getgr{nam,gid}_r() (when called with errno already set)
> and to getpw{nam,uid}() (which may now set errno even when finding
> a match and will no longer set errno with the third patch in that
> case).  So your list is too long for a complete audit even though
> it is incomplete.
> 
> Anyway, i looked at seven examples from your list, choosing ports
> from different categories.  The results are, if i read the code
> correctly:
> 
>  - I found no cases where my patches might break something.
>  - One of the seven ports would be significantly improved (postfix).
>  - Three ports get minor improvements (gdm, glib2, ruby).
>  - For three ports, it makes no difference (gmake, postgres, gnutls).
>  - Nearly all changes in behaviour are due to the first patch -
>change of the return values - and among these, nearly all
>are due to changing the return value from 1 to 0 when not
>finding a match, but when there is no error.
>  - The errno variable and the exact return value (as opposed to
>merely whether it's 0 or non-0) are rarely looked at.
>  - For detailed results, look at the very end of this mail.
> 
> Even after unlock, i don't think it would be reasonable to try to
> evaluate all the consequences of the changes.  After unlock, we
> should just commit all the changes such that libc behaves reasonably,
> then watch out for any fallout.
> 
> What do we commit now?
> 
> After this review, i'd say:
> 
>  * PLEASE somebody give me an OK just for the first patch.
>It actually improves various ports.
>It doesn't look like it's going to break stuff.
>  (Theo said he is not opposed to fixing part of this,
>   but cannot look at the details right now.)
> 
>  * Let's not commit the second patch
>  (avoid errno clobbering as suggested by guenther@).
>It doesn't seem to matter much, so let's avoid any risk.
> 
>  * Let's not commit the third patch
>  (let getpw*_r() not touch errno as suggested by kettenis@).
>Even though i didn't find any port where this might matter,
>i still consider it more risky than the first two patches.
> 
> Here is the first patch, again.

While I agree this is a step in the right direction, and probably not
going to break stuff, I'm not sure this is the time to change this.
It does change the behaviour, and some ports will end up in codepaths
that have not been tested on OpenBSD.

> Index: getpwent.c
> ===
> RCS file: /cvs/src/lib/libc/gen/getpwent.c,v
> retrieving revision 1.48
> diff -u -p -r1.48 getpwent.c
> --- getpwent.c15 Nov 2013 22:32:55 -  1.48
> +++ getpwent.c19 Feb 2014 16:36:48 -
> @@ -708,8 +708,12 @@ getpwnam_r(const char *name, struct pass
>  {
>   struct passwd *pwret = NULL;
>   int flags = 0, *flagsp;
> + int my_errno = 0;
> + int saved_errno;
>  
>   _THREAD_PRIVATE_MUTEX_LOCK(pw);
> + saved_errno = errno;
> + errno = 0;
>   if (!_pw_db && !__initdb())
>   goto fail;
>  
> @@ -733,8 +737,12 @@ getpwnam_r(const char *name, struct pass
>  fail:
>   if (pwretp)
>   *pwretp = pwret;
> + if (pwret == NULL)
> + my_errno = errno;
> + if (!errno)
> + errno = saved_errno;
>   _THREAD_PRIVATE_MUTEX_UNLOCK(pw);
> - return (pwret ? 0 : 1);
> + return (my_errno);
>  }
>  
>  struct passwd *
> @@ -753,8 +761,12 @@ getpwuid_r(uid_t uid, struct passwd *pw,
>  {
>   struct passwd *pwret = NULL;
>   int flags = 0, *flagsp;
> + int my_errno = 0;
> + int saved_errno;
>  
>   _THREAD_PRIVATE_MUTEX_LOCK(pw);
> + saved_errno = errno;
> + errno = 0;
>   if (!_pw_db && !__initdb())
>   goto fail;
>  
> @@ -778,8 +790,12 @@ getpwuid_r(uid_t uid, struct passwd *pw,
>  fail:
>   if (pwretp)
>   *pwretp = pwret;
> + if (pwret == NULL)
> + my_errno = errno;
> + if (!errno)
> + errno = saved_errno;
>   _THREAD_PRIVATE_MUTEX_UNLOCK(pw);
> - return (pwret ? 0 : 1);
> + return (my_errno);
>  }
>  
>  struct passwd *
> Index: getgrent.c
> ===
> RCS file: /cvs/src/lib/libc/gen/getgrent.c,v
> retrieving revision 1.38
> diff -u -p -r1.38 getgrent.c
> --- getgrent.c17 Apr 2013 17:40:35 -  1.38
> +++ getgrent.c19 Feb 2014 16:36:49 -
> @@ -134,6 +134,7 @@ getgrnam_r(const char *name, struct grou
>   if (bufsize < GETGR_R_SIZE_MAX)
> 

Re: Bad return value for getpwnam_r et al

2014-02-24 Thread Theo de Raadt
> While I agree this is a step in the right direction, and probably not
> going to break stuff, I'm not sure this is the time to change this.
> It does change the behaviour, and some ports will end up in codepaths

Agreed.  This is postponed to after release.  It didn't make the cutoff.