qemu crash

2008-11-26 Thread Marco Peereboom
assertion "!"feature is missing in this emulation: " "unknown word read"" 
failed: file "/usr/obj/i386/qemu-0.9.1p4/qemu-0.9.1/hw/eepro100.c", line 1202, 
function "eepro100_read2"

when running with fxp in qemu like:
qemu -nographic -hda boot.img -net nic,model=i82551 -net tap



Re: qemu crash

2008-12-01 Thread Marco Peereboom
No one cares?

On Wed, Nov 26, 2008 at 08:49:57PM -0600, Marco Peereboom wrote:
> assertion "!"feature is missing in this emulation: " "unknown word read"" 
> failed: file "/usr/obj/i386/qemu-0.9.1p4/qemu-0.9.1/hw/eepro100.c", line 
> 1202, function "eepro100_read2"
> 
> when running with fxp in qemu like:
> qemu -nographic -hda boot.img -net nic,model=i82551 -net tap
> 



Re: qemu crash

2008-12-01 Thread Brad
On Monday 01 December 2008 22:02:37 Marco Peereboom wrote:
> No one cares?
>
> On Wed, Nov 26, 2008 at 08:49:57PM -0600, Marco Peereboom wrote:
> > assertion "!"feature is missing in this emulation: " "unknown word read""
> > failed: file "/usr/obj/i386/qemu-0.9.1p4/qemu-0.9.1/hw/eepro100.c", line
> > 1202, function "eepro100_read2"
> >
> > when running with fxp in qemu like:
> > qemu -nographic -hda boot.img -net nic,model=i82551 -net tap

Can you try the i82557b model and see if there is any change in behavior?

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.



Re: qemu crash

2008-12-01 Thread Marco Peereboom
Same result. I tried all the fxp models.

On Mon, Dec 01, 2008 at 10:09:20PM -0500, Brad wrote:
> On Monday 01 December 2008 22:02:37 Marco Peereboom wrote:
> > No one cares?
> >
> > On Wed, Nov 26, 2008 at 08:49:57PM -0600, Marco Peereboom wrote:
> > > assertion "!"feature is missing in this emulation: " "unknown word read""
> > > failed: file "/usr/obj/i386/qemu-0.9.1p4/qemu-0.9.1/hw/eepro100.c", line
> > > 1202, function "eepro100_read2"
> > >
> > > when running with fxp in qemu like:
> > > qemu -nographic -hda boot.img -net nic,model=i82551 -net tap
> 
> Can you try the i82557b model and see if there is any change in behavior?
> 



Re: qemu crash

2008-12-07 Thread Todd T. Fries
As recommended every time you pkg_add qemu, please use ne2k_pci.

I've also had luck with model=rtl8139.

Anything else is slightly experimental at best.

I suspect newer qemu has better code for fxp network card emulation, but for
now (i.e. until the newer qemu comes out) if you want reliable behavior stick
with what works (i.e. realtek of one form or fashion).

Thanks,
-- 
Todd Fries .. [EMAIL PROTECTED]

 _
| \  1.636.410.0632 (voice)
| Free Daemon Consulting, LLC \  1.405.227.9094 (voice)
| http://FreeDaemonConsulting.com \  1.866.792.3418 (FAX)
| "..in support of free software solutions."  \  250797 (FWD)
| \
 \\
 
  37E7 D3EB 74D0 8D66 A68D  B866 0326 204E 3F42 004A
http://todd.fries.net/pgp.txt

Penned by Marco Peereboom on 20081201 21:10.14, we have:
| Same, I tried all fxp.
| 
| On Mon, Dec 01, 2008 at 10:09:20PM -0500, Brad wrote:
| > On Monday 01 December 2008 22:02:37 Marco Peereboom wrote:
| > > No one cares?
| > >
| > > On Wed, Nov 26, 2008 at 08:49:57PM -0600, Marco Peereboom wrote:
| > > > assertion "!"feature is missing in this emulation: " "unknown word read""
| > > > failed: file "/usr/obj/i386/qemu-0.9.1p4/qemu-0.9.1/hw/eepro100.c", line
| > > > 1202, function "eepro100_read2"
| > > >
| > > > when running with fxp in qemu like:
| > > > qemu -nographic -hda boot.img -net nic,model=i82551 -net tap
| > 
| > Can you try the i82557b model and see if there is any change in behavior?
| > 



Re: qemu crash

2008-12-08 Thread Brad
On Sunday 07 December 2008 23:47:08 Todd T. Fries wrote:
> I suspect newer qemu has better code for fxp network card emulation, but
> for now (i.e. until the newer qemu comes out) if you want reliable behavior
> stick with what works (i.e. realtek of one form or fashion).

No, it does not.




potential qemu crash fix, please test

2011-01-21 Thread Stefan Sperling
I've run into a qemu crash with the following trace:

#0  _thread_kern_sig_undefer ()
at /usr/src/lib/libpthread/uthread/uthread_kern.c:1003
1003    if (curthread->sig_defer_count > 1) {
(gdb) p curthread
$1 = (struct pthread *) 0x8
(gdb) bt
#0  _thread_kern_sig_undefer ()
at /usr/src/lib/libpthread/uthread/uthread_kern.c:1003
#1  0x000209fbb039 in _thread_kern_sig_defer ()
at /usr/src/lib/libpthread/uthread/uthread_kern.c:988
#2  0x000209fb6d82 in _thread_fd_unlock (fd=Variable "fd" is not available.
)
at /usr/src/lib/libpthread/uthread/uthread_fd.c:568
#3  0x000209fb5a14 in write (fd=121237504, buf=0x651d90, nbytes=8)
at /usr/src/lib/libpthread/uthread/uthread_write.c:170
#4  0x004070af in ?? ()
#5  0x00423b61 in ?? ()
#6  0x000209fbcb06 in _dispatch_signal (sig=31, scp=0x202612ac0)
at /usr/src/lib/libpthread/uthread/uthread_sig.c:400
#7  0x000209fbcbe3 in _dispatch_signals (scp=0x202612ac0)
at /usr/src/lib/libpthread/uthread/uthread_sig.c:429
#8  0x000209fbd31d in _thread_sig_handler (sig=14, info=0x202612ba0, 
scp=0x202612ac0) at /usr/src/lib/libpthread/uthread/uthread_sig.c:139
#9  0x000202612ac0 in ?? ()
#10 0x00020739f190 in ?? ()
#11 0x00020739f000 in ?? ()
#12 0x0004 in ?? ()
#13 0x4d38f2e5 in ?? ()
#14 0x

Note the call to write(), which gets a bogus fd.
This fd is io_thread_fd in the patch below.

This seems to help stability here.
It would be interesting to know if this helps others who have seen qemu crash.

Index: Makefile
===
RCS file: /cvs/ports/emulators/qemu/Makefile,v
retrieving revision 1.59
diff -u -p -r1.59 Makefile
--- Makefile22 Nov 2010 11:32:01 -  1.59
+++ Makefile21 Jan 2011 09:28:07 -
@@ -6,6 +6,7 @@ ONLY_FOR_ARCHS =i386 amd64 sparc64
 COMMENT =  multi system emulator
 
 DISTNAME = qemu-0.13.0
+REVISION = 0
 CATEGORIES =   emulators
 
 HOMEPAGE = http://www.qemu.org/
Index: patches/patch-cpus_c
===
RCS file: patches/patch-cpus_c
diff -N patches/patch-cpus_c
--- /dev/null   1 Jan 1970 00:00:00 -
+++ patches/patch-cpus_c21 Jan 2011 09:27:20 -
@@ -0,0 +1,12 @@
+$OpenBSD$
+--- cpus.c.origFri Jan 21 10:24:52 2011
 cpus.c Fri Jan 21 10:26:29 2011
+@@ -149,7 +149,7 @@ static void cpu_debug_handler(CPUState *env)
+ }
+ 
+ #ifndef _WIN32
+-static int io_thread_fd = -1;
++static volatile sig_atomic_t io_thread_fd = -1;
+ 
+ static void qemu_event_increment(void)
+ {
Index: patches/patch-net_h
===
RCS file: /cvs/ports/emulators/qemu/patches/patch-net_h,v
retrieving revision 1.1
diff -u -p -r1.1 patch-net_h
--- patches/patch-net_h 27 May 2010 17:55:05 -  1.1
+++ patches/patch-net_h 21 Jan 2011 09:27:19 -
@@ -1,7 +1,7 @@
 $OpenBSD: patch-net_h,v 1.1 2010/05/27 17:55:05 fgsch Exp $
 net.h.orig Tue Feb 23 20:54:38 2010
-+++ net.h  Mon Mar 22 20:26:50 2010
-@@ -172,7 +172,7 @@ void net_host_device_remove(Monitor *mon, const QDict 
+--- net.h.orig Fri Oct 15 22:56:09 2010
 net.h  Fri Jan 21 10:04:46 2011
+@@ -172,7 +172,7 @@ int do_netdev_del(Monitor *mon, const QDict *qdict, QO
  #ifdef __sun__
  #define SMBD_COMMAND "/usr/sfw/sbin/smbd"
  #else
Index: patches/patch-posix-aio-compat_c
===
RCS file: patches/patch-posix-aio-compat_c
diff -N patches/patch-posix-aio-compat_c
--- /dev/null   1 Jan 1970 00:00:00 -
+++ patches/patch-posix-aio-compat_c21 Jan 2011 09:27:20 -
@@ -0,0 +1,12 @@
+$OpenBSD$
+--- posix-aio-compat.c.origFri Jan 21 10:27:04 2011
 posix-aio-compat.c Fri Jan 21 10:15:20 2011
+@@ -495,7 +495,7 @@ static int posix_aio_flush(void *opaque)
+ return !!s->first_aio;
+ }
+ 
+-static PosixAioState *posix_aio_state;
++static volatile PosixAioState *posix_aio_state;
+ 
+ static void aio_signal_handler(int signum)
+ {
Index: patches/patch-qemu-options_hx
===
RCS file: /cvs/ports/emulators/qemu/patches/patch-qemu-options_hx,v
retrieving revision 1.1
diff -u -p -r1.1 patch-qemu-options_hx
--- patches/patch-qemu-options_hx   27 May 2010 17:55:05 -  1.1
+++ patches/patch-qemu-options_hx   21 Jan 2011 09:27:19 -
@@ -1,7 +1,7 @@
 $OpenBSD: patch-qemu-options_hx,v 1.1 2010/05/27 17:55:05 fgsch Exp $
 qemu-options.hx.orig   Tue Feb 23 20:54:38 2010
-+++ qemu-options.hxMon Mar 22 20:26:50 2010
-@@ -942,7 +942,7 @@ or @file{C:\WINNT\SYSTEM32\DRIVERS\ETC\LMHOSTS} (Windo
+--- qemu-options.hx.orig   Fri Oct 15 22:56:09 2010
 qemu-options.hxFri Jan 21 10:04:46 2011
+@@ -1105,7 +1105,7 @@ or @file{C:\WINNT\SYSTEM32\DRIVERS\ETC\LMH

Re: potential qemu crash fix, please test

2011-01-21 Thread Stuart Henderson
On 2011/01/21 12:18, Stefan Sperling wrote:
> I've run into a qemu crash with the following trace:

Cool! On the occasions when I have been able to obtain a valid
backtrace after a crash, it has always looked like this.

Since there have been a number of people who have been
complaining about the qemu port, there should be no shortage
of testers!

I committed to qemu recently, so here is an updated diff that
will apply to -current.


Index: Makefile
===
RCS file: /cvs/ports/emulators/qemu/Makefile,v
retrieving revision 1.60
diff -u -p -r1.60 Makefile
--- Makefile19 Jan 2011 16:22:31 -  1.60
+++ Makefile21 Jan 2011 11:22:13 -
@@ -6,7 +6,7 @@ ONLY_FOR_ARCHS =i386 amd64 sparc64
 COMMENT =  multi system emulator
 
 DISTNAME = qemu-0.13.0
-REVISION = 0
+REVISION = 1
 CATEGORIES =   emulators
 
 HOMEPAGE = http://www.qemu.org/
Index: patches/patch-cpus_c
===
RCS file: patches/patch-cpus_c
diff -N patches/patch-cpus_c
--- /dev/null   1 Jan 1970 00:00:00 -
+++ patches/patch-cpus_c21 Jan 2011 11:22:13 -
@@ -0,0 +1,12 @@
+$OpenBSD$
+--- cpus.c.origFri Jan 21 10:24:52 2011
 cpus.c Fri Jan 21 10:26:29 2011
+@@ -149,7 +149,7 @@ static void cpu_debug_handler(CPUState *env)
+ }
+ 
+ #ifndef _WIN32
+-static int io_thread_fd = -1;
++static volatile sig_atomic_t io_thread_fd = -1;
+ 
+ static void qemu_event_increment(void)
+ {
Index: patches/patch-net_h
===
RCS file: /cvs/ports/emulators/qemu/patches/patch-net_h,v
retrieving revision 1.1
diff -u -p -r1.1 patch-net_h
--- patches/patch-net_h 27 May 2010 17:55:05 -  1.1
+++ patches/patch-net_h 21 Jan 2011 11:22:13 -
@@ -1,7 +1,7 @@
 $OpenBSD: patch-net_h,v 1.1 2010/05/27 17:55:05 fgsch Exp $
 net.h.orig Tue Feb 23 20:54:38 2010
-+++ net.h  Mon Mar 22 20:26:50 2010
-@@ -172,7 +172,7 @@ void net_host_device_remove(Monitor *mon, const QDict 
+--- net.h.orig Fri Oct 15 22:56:09 2010
 net.h  Fri Jan 21 10:04:46 2011
+@@ -172,7 +172,7 @@ int do_netdev_del(Monitor *mon, const QDict *qdict, QO
  #ifdef __sun__
  #define SMBD_COMMAND "/usr/sfw/sbin/smbd"
  #else
Index: patches/patch-posix-aio-compat_c
===
RCS file: patches/patch-posix-aio-compat_c
diff -N patches/patch-posix-aio-compat_c
--- /dev/null   1 Jan 1970 00:00:00 -
+++ patches/patch-posix-aio-compat_c21 Jan 2011 11:22:13 -
@@ -0,0 +1,12 @@
+$OpenBSD$
+--- posix-aio-compat.c.origFri Jan 21 10:27:04 2011
 posix-aio-compat.c Fri Jan 21 10:15:20 2011
+@@ -495,7 +495,7 @@ static int posix_aio_flush(void *opaque)
+ return !!s->first_aio;
+ }
+ 
+-static PosixAioState *posix_aio_state;
++static volatile PosixAioState *posix_aio_state;
+ 
+ static void aio_signal_handler(int signum)
+ {
Index: patches/patch-qemu-options_hx
===
RCS file: /cvs/ports/emulators/qemu/patches/patch-qemu-options_hx,v
retrieving revision 1.1
diff -u -p -r1.1 patch-qemu-options_hx
--- patches/patch-qemu-options_hx   27 May 2010 17:55:05 -  1.1
+++ patches/patch-qemu-options_hx   21 Jan 2011 11:22:13 -
@@ -1,7 +1,7 @@
 $OpenBSD: patch-qemu-options_hx,v 1.1 2010/05/27 17:55:05 fgsch Exp $
 qemu-options.hx.orig   Tue Feb 23 20:54:38 2010
-+++ qemu-options.hxMon Mar 22 20:26:50 2010
-@@ -942,7 +942,7 @@ or @file{C:\WINNT\SYSTEM32\DRIVERS\ETC\LMHOSTS} (Windo
+--- qemu-options.hx.orig   Fri Oct 15 22:56:09 2010
 qemu-options.hxFri Jan 21 10:04:46 2011
+@@ -1105,7 +1105,7 @@ or @file{C:\WINNT\SYSTEM32\DRIVERS\ETC\LMHOSTS} (Windo
  Then @file{@var{dir}} can be accessed in @file{\\smbserver\qemu}.
  
  Note that a SAMBA server must be installed on the host OS in



Re: potential qemu crash fix, please test

2011-01-21 Thread David Coppa
On Fri, Jan 21, 2011 at 12:25 PM, Stuart Henderson wrote:
> On 2011/01/21 12:18, Stefan Sperling wrote:
>> I've run into a qemu crash with the following trace:
>
> Cool! On the occasions when I have been able to obtain a valid
> backtrace after a crash, it has always looked like this.
>
> Since there have been a number of people who have been
> complaining about the qemu port, there should be no shortage
> of testers!
>
> I committed to qemu recently, so here is an updated diff that
> will apply to -current.

Cool. Hoping this can go into 4.9... So, qemu users, please test it.

ciao,
david



Re: potential qemu crash fix, please test

2011-01-21 Thread Stefan Sperling
On Fri, Jan 21, 2011 at 12:18:28PM +0100, Stefan Sperling wrote:
> It would be interesting to know if this helps others who have seen qemu crash.

Well, it did eventually crash again, but with a nonsense trace this time.

Meanwhile I've been looking at some of the signal handlers and there
are quite a few that naively use non-volatile global variables and
also linked lists. Below is what I'm running with now hoping it will help.
But I didn't try to fix the ones that traverse linked lists (they
are in the shutdown and gdbstub code paths).

Putting this here in case more people want to help testing to see
if this really makes a difference.
This one also applies to -current, thanks for the hint Stuart.

Maybe compiling without optimisation will help? Did anyone ever try that?

Index: Makefile
===
RCS file: /cvs/ports/emulators/qemu/Makefile,v
retrieving revision 1.60
diff -u -p -r1.60 Makefile
--- Makefile19 Jan 2011 16:22:31 -  1.60
+++ Makefile21 Jan 2011 17:24:45 -
@@ -6,7 +6,7 @@ ONLY_FOR_ARCHS =i386 amd64 sparc64
 COMMENT =  multi system emulator
 
 DISTNAME = qemu-0.13.0
-REVISION = 0
+REVISION = 1
 CATEGORIES =   emulators
 
 HOMEPAGE = http://www.qemu.org/
Index: patches/patch-cpu-all_h
===
RCS file: patches/patch-cpu-all_h
diff -N patches/patch-cpu-all_h
--- /dev/null   1 Jan 1970 00:00:00 -
+++ patches/patch-cpu-all_h 21 Jan 2011 17:24:09 -
@@ -0,0 +1,12 @@
+$OpenBSD$
+--- cpu-all.h.orig Fri Jan 21 17:49:27 2011
 cpu-all.h  Fri Jan 21 17:49:43 2011
+@@ -775,7 +775,7 @@ void cpu_dump_statistics (CPUState *env, FILE *f,
+ void QEMU_NORETURN cpu_abort(CPUState *env, const char *fmt, ...)
+ __attribute__ ((__format__ (__printf__, 2, 3)));
+ extern CPUState *first_cpu;
+-extern CPUState *cpu_single_env;
++extern volatile CPUState *cpu_single_env;
+ 
+ #define CPU_INTERRUPT_HARD   0x02 /* hardware interrupt pending */
+ #define CPU_INTERRUPT_EXITTB 0x04 /* exit the current TB (use for x86 a20 case) */
Index: patches/patch-cpus_c
===
RCS file: patches/patch-cpus_c
diff -N patches/patch-cpus_c
--- /dev/null   1 Jan 1970 00:00:00 -
+++ patches/patch-cpus_c21 Jan 2011 17:24:09 -
@@ -0,0 +1,12 @@
+$OpenBSD$
+--- cpus.c.origFri Jan 21 10:24:52 2011
 cpus.c Fri Jan 21 10:26:29 2011
+@@ -149,7 +149,7 @@ static void cpu_debug_handler(CPUState *env)
+ }
+ 
+ #ifndef _WIN32
+-static int io_thread_fd = -1;
++static volatile sig_atomic_t io_thread_fd = -1;
+ 
+ static void qemu_event_increment(void)
+ {
Index: patches/patch-exec_c
===
RCS file: /cvs/ports/emulators/qemu/patches/patch-exec_c,v
retrieving revision 1.9
diff -u -p -r1.9 patch-exec_c
--- patches/patch-exec_c22 Nov 2010 11:32:01 -  1.9
+++ patches/patch-exec_c21 Jan 2011 17:24:09 -
@@ -1,6 +1,20 @@
 $OpenBSD: patch-exec_c,v 1.9 2010/11/22 11:32:01 fgsch Exp $
 exec.c.origFri Oct 15 21:56:09 2010
-+++ exec.c Thu Nov 18 09:21:58 2010
+--- exec.c.origFri Oct 15 22:56:09 2010
 exec.c Fri Jan 21 17:19:20 2011
+@@ -119,11 +119,11 @@ RAMList ram_list = { .blocks = QLIST_HEAD_INITIALIZER(
+ CPUState *first_cpu;
+ /* current CPU in the current thread. It is only valid inside
+cpu_exec() */
+-CPUState *cpu_single_env;
++volatile CPUState *cpu_single_env;
+ /* 0 = Do not count executed instructions.
+1 = Precise instruction counting.
+2 = Adaptive rate instruction counting.  */
+-int use_icount = 0;
++volatile sig_atomic_t use_icount = 0;
+ /* Current instruction counter.  While executing translated code this may
+include some instructions that have not yet been executed.  */
+ int64_t qemu_icount;
 @@ -524,7 +524,8 @@ static void code_gen_alloc(unsigned long tb_size)
  exit(1);
  }
Index: patches/patch-posix-aio-compat_c
===
RCS file: patches/patch-posix-aio-compat_c
diff -N patches/patch-posix-aio-compat_c
--- /dev/null   1 Jan 1970 00:00:00 -
+++ patches/patch-posix-aio-compat_c21 Jan 2011 17:24:09 -
@@ -0,0 +1,12 @@
+$OpenBSD$
+--- posix-aio-compat.c.origFri Jan 21 10:27:04 2011
 posix-aio-compat.c Fri Jan 21 10:15:20 2011
+@@ -495,7 +495,7 @@ static int posix_aio_flush(void *opaque)
+ return !!s->first_aio;
+ }
+ 
+-static PosixAioState *posix_aio_state;
++static volatile PosixAioState *posix_aio_state;
+ 
+ static void aio_signal_handler(int signum)
+ {
Index: patches/patch-qemu-timer_c
===
RCS file: /cvs/ports/emulators/qemu/patches/patch-qemu-timer_c,v
retrieving revision 1.1
diff -u -p -r1

Re: potential qemu crash fix, please test

2011-01-23 Thread Stefan Sperling
On Fri, Jan 21, 2011 at 06:39:17PM +0100, Stefan Sperling wrote:
> On Fri, Jan 21, 2011 at 12:18:28PM +0100, Stefan Sperling wrote:
> > It would be interesting to know if this helps others who have seen qemu 
> > crash.
> 
> Well, it did eventually crash again, but with a nonsense trace this time.

Turns out qemu is running into libpthread bugs.

With my patches a Linux guest wouldn't crash qemu anymore, even though
it was constantly running 2 builds of Subversion at once and also
Subversion's regression test suite (which is very I/O intensive).

But even with my qemu patches, an OpenBSD guest would still cause qemu
to crash during a simple cvs checkout. The crashes were always in SIGALRM
or SIGUSR2 signal handlers. Qemu uses these signals in its I/O code.

Some crashes had traces similar to the one seen before, with
_thread_kern_sig_defer() apparently calling _thread_kern_sig_undefer().
Some had an apparently corrupt stack -- when this happened, the
program counter within the last sigcontext saved for the running thread
was always in _thread_machdep_restore_float_state().
Not sure what to make of these crashes.

Anyway, I found that qemu runs stable with rthreads, with both
Linux and OpenBSD guests, and without any patches.

cd /usr/src/lib/librthread
make obj depend && make && sudo make install
mkdir ~/.lib
cd ~/.lib; ln -s /usr/lib/librthread.so.4.1 libpthread.so.13.1
sudo sysctl kern.rthreads=1
env LD_LIBRARY_PATH=$HOME/.lib qemu ...

With rthreads, the OpenBSD guest finished a cvs checkout and also
a kernel build. I'm starting a make build on it next.



Re: potential qemu crash fix, please test

2011-01-23 Thread Stefan Sperling
On Sun, Jan 23, 2011 at 02:18:10PM +0100, Stefan Sperling wrote:
> Turns out qemu is running into libpthreads bugs.
 
> Some crashes had similar traces as the one seen before, with
> _thread_kern_sig_defer() apparently calling _thread_kern_sig_undefer().
> Some had an apparently corrupt stack -- when this happened, the
> program counter within the last sigcontext saved for the running thread
> was always in _thread_machdep_restore_float_state().

> With rthreads, the OpenBSD guest finished a cvs checkout and also
> a kernel build. I'm starting a make build on it next.

With this patch to libpthread, an OpenBSD guest in a qemu run with
pthreads (not rthreads) has finished a cvs checkout and a kernel build.
I'm starting a make build on it now. (BTW the instance run with rthreads
is still happily running make build.)

The patch protects the region around _thread_machdep_restore_float_state(),
which severely messes with the stack of the current thread, from being
interrupted by signals.
Please test. This problem could also affect other applications.

I'm not sure if the bit making sig_defer_count volatile is needed,
but it does have an effect on the assembly code generated for
_thread_kern_sig_defer() and _thread_kern_sig_undefer().
Thrown in because it cannot hurt.

Index: uthread/pthread_private.h
===
RCS file: /cvs/src/lib/libpthread/uthread/pthread_private.h,v
retrieving revision 1.76
diff -u -p -r1.76 pthread_private.h
--- uthread/pthread_private.h   28 Oct 2010 15:02:41 -  1.76
+++ uthread/pthread_private.h   22 Jan 2011 17:07:45 -
@@ -761,7 +761,7 @@ struct pthread {
 * Set to non-zero when this thread has deferred signals.
 * We allow for recursive deferral.
 */
-   int sig_defer_count;
+   volatile sig_atomic_t   sig_defer_count;
 
/*
 * Set to TRUE if this thread should yield after undeferring
Index: uthread/uthread_kern.c
===
RCS file: /cvs/src/lib/libpthread/uthread/uthread_kern.c,v
retrieving revision 1.36
diff -u -p -r1.36 uthread_kern.c
--- uthread/uthread_kern.c  21 May 2007 16:50:36 -  1.36
+++ uthread/uthread_kern.c  23 Jan 2011 15:18:03 -
@@ -440,6 +440,12 @@ _thread_kern_sched(struct sigcontext * s
_queue_signals = 0;
}
 
+   /*
+* Prevent the signal handler from fiddling with this
+* thread before its state is set.
+*/
+   _queue_signals = 1;
+
/* Make the selected thread the current thread: */
_set_curthread(pthread_h);
curthread = pthread_h;
@@ -481,6 +487,9 @@ _thread_kern_sched(struct sigcontext * s
 */
curthread = _get_curthread();
_thread_kern_in_sched = 0;
+
+   /* Allow signals again. */
+   _queue_signals = 0;
 
/* run any installed switch-hooks */
if ((_sched_switch_hook != NULL) &&



Re: potential qemu crash fix, please test

2011-01-23 Thread Stefan Sperling
On Sun, Jan 23, 2011 at 09:04:46PM +0100, Stefan Sperling wrote:
> With this patch to libpthread, an OpenBSD guest in a qemu run with
> pthreads (not rthreads) has finished a cvs checkout and a kernel build.
> I'm starting a make build on it now.

Forgot to mention: The earlier patches to qemu are *not* needed with
this libpthread patch.

> Index: uthread/pthread_private.h
> ===
> RCS file: /cvs/src/lib/libpthread/uthread/pthread_private.h,v
> retrieving revision 1.76
> diff -u -p -r1.76 pthread_private.h
> --- uthread/pthread_private.h 28 Oct 2010 15:02:41 -  1.76
> +++ uthread/pthread_private.h 22 Jan 2011 17:07:45 -
> @@ -761,7 +761,7 @@ struct pthread {
>* Set to non-zero when this thread has deferred signals.
>* We allow for recursive deferral.
>*/
> - int sig_defer_count;
> + volatile sig_atomic_t   sig_defer_count;
>  
>   /*
>* Set to TRUE if this thread should yield after undeferring
> Index: uthread/uthread_kern.c
> ===
> RCS file: /cvs/src/lib/libpthread/uthread/uthread_kern.c,v
> retrieving revision 1.36
> diff -u -p -r1.36 uthread_kern.c
> --- uthread/uthread_kern.c21 May 2007 16:50:36 -  1.36
> +++ uthread/uthread_kern.c23 Jan 2011 15:18:03 -
> @@ -440,6 +440,12 @@ _thread_kern_sched(struct sigcontext * s
>   _queue_signals = 0;
>   }
>  
> + /*
> +  * Prevent the signal handler from fiddling with this
> +  * thread before its state is set.
> +  */
> + _queue_signals = 1;
> +
>   /* Make the selected thread the current thread: */
>   _set_curthread(pthread_h);
>   curthread = pthread_h;
> @@ -481,6 +487,9 @@ _thread_kern_sched(struct sigcontext * s
>*/
>   curthread = _get_curthread();
>   _thread_kern_in_sched = 0;
> +
> + /* Allow signals again. */
> + _queue_signals = 0;
>  
>   /* run any installed switch-hooks */
>   if ((_sched_switch_hook != NULL) &&



Re: potential qemu crash fix, please test

2011-01-24 Thread Ryan McBride
This patch helps a lot. I couldn't even get through an install before.
But please don't remove qemu-old yet: I'm using UDP multicast sockets to
build virtual networks, and they fail on 0.13.0:

$ sudo qemu -m 128 -no-fd-bootchk \
-hda virtual.img -boot n -nographic \
-net nic,vlan=0,model=rtl8139,macaddr=52:54:00:12:34:03 \
-net user -tftp /usr/src/sys/arch/i386/compile/TEST -bootp pxeboot \
-net nic,vlan=1,model=rtl8139,macaddr=52:54:00:23:03:01 \
-net tap,vlan=1,script=no \
-net nic,vlan=3,model=rtl8139,macaddr=52:54:00:23:03:03 \
-net socket,vlan=3,mcast=230.0.0.1:10003 
setsockopt(SOL_IP, IP_MULTICAST_LOOP): Invalid argument
qemu: -net socket,vlan=3,mcast=230.0.0.1:10003: Device 'socket' could not be initialized

Works fine if I comment out the last two lines.


On Fri, Jan 21, 2011 at 12:18:28PM +0100, Stefan Sperling wrote:
> I've run into a qemu crash with the following trace:
> 
> #0  _thread_kern_sig_undefer ()
> at /usr/src/lib/libpthread/uthread/uthread_kern.c:1003
> 1003    if (curthread->sig_defer_count > 1) {
> (gdb) p curthread
> $1 = (struct pthread *) 0x8
> (gdb) bt
> #0  _thread_kern_sig_undefer ()
> at /usr/src/lib/libpthread/uthread/uthread_kern.c:1003
> #1  0x000209fbb039 in _thread_kern_sig_defer ()
> at /usr/src/lib/libpthread/uthread/uthread_kern.c:988
> #2  0x000209fb6d82 in _thread_fd_unlock (fd=Variable "fd" is not 
> available.
> )
> at /usr/src/lib/libpthread/uthread/uthread_fd.c:568
> #3  0x000209fb5a14 in write (fd=121237504, buf=0x651d90, nbytes=8)
> at /usr/src/lib/libpthread/uthread/uthread_write.c:170
> #4  0x004070af in ?? ()
> #5  0x00423b61 in ?? ()
> #6  0x000209fbcb06 in _dispatch_signal (sig=31, scp=0x202612ac0)
> at /usr/src/lib/libpthread/uthread/uthread_sig.c:400
> #7  0x000209fbcbe3 in _dispatch_signals (scp=0x202612ac0)
> at /usr/src/lib/libpthread/uthread/uthread_sig.c:429
> #8  0x000209fbd31d in _thread_sig_handler (sig=14, info=0x202612ba0, 
> scp=0x202612ac0) at /usr/src/lib/libpthread/uthread/uthread_sig.c:139
> #9  0x000202612ac0 in ?? ()
> #10 0x00020739f190 in ?? ()
> #11 0x00020739f000 in ?? ()
> #12 0x0004 in ?? ()
> #13 0x4d38f2e5 in ?? ()
> #14 0x
> 
> Note that call to write() which gets a bogus fd.
> This fd is io_thread_fd in the patch below.
> 
> This seems to help stability here.
> It would be interesting to know if this helps others who have seen qemu crash.
> 
> Index: Makefile
> ===
> RCS file: /cvs/ports/emulators/qemu/Makefile,v
> retrieving revision 1.59
> diff -u -p -r1.59 Makefile
> --- Makefile  22 Nov 2010 11:32:01 -  1.59
> +++ Makefile  21 Jan 2011 09:28:07 -
> @@ -6,6 +6,7 @@ ONLY_FOR_ARCHS =  i386 amd64 sparc64
>  COMMENT =multi system emulator
>  
>  DISTNAME =   qemu-0.13.0
> +REVISION =   0
>  CATEGORIES = emulators
>  
>  HOMEPAGE =   http://www.qemu.org/
> Index: patches/patch-cpus_c
> ===
> RCS file: patches/patch-cpus_c
> diff -N patches/patch-cpus_c
> --- /dev/null 1 Jan 1970 00:00:00 -
> +++ patches/patch-cpus_c  21 Jan 2011 09:27:20 -
> @@ -0,0 +1,12 @@
> +$OpenBSD$
> +--- cpus.c.orig  Fri Jan 21 10:24:52 2011
>  cpus.c   Fri Jan 21 10:26:29 2011
> +@@ -149,7 +149,7 @@ static void cpu_debug_handler(CPUState *env)
> + }
> + 
> + #ifndef _WIN32
> +-static int io_thread_fd = -1;
> ++static volatile sig_atomic_t io_thread_fd = -1;
> + 
> + static void qemu_event_increment(void)
> + {
> Index: patches/patch-net_h
> ===
> RCS file: /cvs/ports/emulators/qemu/patches/patch-net_h,v
> retrieving revision 1.1
> diff -u -p -r1.1 patch-net_h
> --- patches/patch-net_h   27 May 2010 17:55:05 -  1.1
> +++ patches/patch-net_h   21 Jan 2011 09:27:19 -
> @@ -1,7 +1,7 @@
>  $OpenBSD: patch-net_h,v 1.1 2010/05/27 17:55:05 fgsch Exp $
>  net.h.orig   Tue Feb 23 20:54:38 2010
> -+++ net.hMon Mar 22 20:26:50 2010
> -@@ -172,7 +172,7 @@ void net_host_device_remove(Monitor *mon, const QDict 
> +--- net.h.orig   Fri Oct 15 22:56:09 2010
>  net.hFri Jan 21 10:04:46 2011
> +@@ -172,7 +172,7 @@ int do_netdev_del(Monitor *mon, const QDict *qdict, QO
>   #ifdef __sun__
>   #define SMBD_COMMAND "/usr/sfw/sbin/smbd"
>   #else
> Index: patches/patch-posix-aio-compat_c
> ===
&g

Re: potential qemu crash fix, please test

2011-01-24 Thread Federico G. Schwindt
On Mon, Jan 24, 2011 at 05:03:23PM +0900, Ryan McBride wrote:
> This patch helps a lot. I couldn't even get through an install before.
> But please don't remove qemu-old yet: I'm using UDP multicast sockets to
> build virtual networks, and they fail on  0.13.0:
> 
> $ sudo qemu -m 128 -no-fd-bootchk \
> -hda virtual.img -boot n -nographic \
> -net nic,vlan=0,model=rtl8139,macaddr=52:54:00:12:34:03 \
> -net user -tftp /usr/src/sys/arch/i386/compile/TEST -bootp pxeboot \
> -net nic,vlan=1,model=rtl8139,macaddr=52:54:00:23:03:01 \
> -net tap,vlan=1,script=no \
> -net nic,vlan=3,model=rtl8139,macaddr=52:54:00:23:03:03 \
> -net socket,vlan=3,mcast=230.0.0.1:10003 
> setsockopt(SOL_IP, IP_MULTICAST_LOOP): Invalid argument
> qemu: -net socket,vlan=3,mcast=230.0.0.1:10003: Device 'socket' could not be initialized
> 
> Works fine if I comment out the last two lines.

  this should fix it. can you try it please?

  f.-

Index: Makefile
===
RCS file: /cvs/ports/emulators/qemu/Makefile,v
retrieving revision 1.60
diff -N -u -p Makefile
--- Makefile19 Jan 2011 16:22:31 -  1.60
+++ Makefile24 Jan 2011 15:41:37 -
@@ -6,7 +6,7 @@ ONLY_FOR_ARCHS =i386 amd64 sparc64
 COMMENT =  multi system emulator
 
 DISTNAME = qemu-0.13.0
-REVISION = 0
+REVISION = 1
 CATEGORIES =   emulators
 
 HOMEPAGE = http://www.qemu.org/
Index: patches/patch-net_socket_c
===
RCS file: patches/patch-net_socket_c
diff -N -u -p patches/patch-net_socket_c
--- /dev/null   24 Jan 2011 08:41:37 -
+++ patches/patch-net_socket_c  24 Jan 2011 15:41:37 -
@@ -0,0 +1,12 @@
+$OpenBSD$
+--- net/socket.c.orig  Mon Jan 24 15:34:58 2011
 net/socket.c   Mon Jan 24 15:35:01 2011
+@@ -195,7 +195,7 @@ static int net_socket_mcast_create(struct sockaddr_in 
+ /* Force mcast msgs to loopback (eg. several QEMUs in same host */
+ val = 1;
+ ret=setsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP,
+-   (const char *)&val, sizeof(val));
++   (const char *)&val, sizeof(char));
+ if (ret < 0) {
+   perror("setsockopt(SOL_IP, IP_MULTICAST_LOOP)");
+   goto fail;
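
As background for the one-line change above: on BSD-derived network stacks
the IP_MULTICAST_LOOP option takes a one-byte (u_char) value, so passing
sizeof(int) as the length is rejected with EINVAL. A minimal sketch that
declares the value with the matching type looks like this. The helper names
are hypothetical and this is not the code from net/socket.c.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Enable multicast loopback on fd. Returns 0 on success, -1 on error. */
static int mcast_loop_enable(int fd)
{
    /* One byte, so that sizeof(loop) matches what BSD stacks expect. */
    unsigned char loop = 1;

    assert(fd >= 0);
    return setsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP,
        &loop, sizeof(loop));
}

/* Open a UDP socket with multicast loopback enabled, or return -1. */
static int open_mcast_udp_socket(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd == -1)
        return -1;
    if (mcast_loop_enable(fd) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Declaring the variable as unsigned char keeps the value and the length in
agreement, instead of passing a pointer to an int together with sizeof(char).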



Re: potential qemu crash fix, please test

2011-01-24 Thread Federico G. Schwindt
On Sun, Jan 23, 2011 at 09:04:46PM +0100, Stefan Sperling wrote:
> On Sun, Jan 23, 2011 at 02:18:10PM +0100, Stefan Sperling wrote:
> > Turns out qemu is running into libpthreads bugs.
>  
> > Some crashes had similar traces as the one seen before, with
> > _thread_kern_sig_defer() apparently calling _thread_kern_sig_undefer().
> > Some had an apparently corrupt stack -- when this happened, the
> > program counter within the last sigcontext saved for the running thread
> > was always in _thread_machdep_restore_float_state().
> 
> > With rthreads, the OpenBSD guest finished a cvs checkout and also
> > a kernel build. I'm starting a make build on it next.
> 
> With this patch to libpthread, an OpenBSD guest in a qemu run with
> pthreads (not rthreads) has finished a cvs checkout and a kernel build.
> I'm starting a make build on it now. (BTW the instance run with rthreads
> is still happily running make build.)
> 
> The patch protects the region around _thread_machdep_restore_float_state(),
> which severely messes with the stack of the current thread, from being
> interrupted by signals.
> Please test. This problem could also affect other applications.
> 
> I'm not sure if the bit making sig_defer_count volatile is needed,
> but it does have an effect on the assembly code generated for
> _thread_kern_sig_defer() and _thread_kern_sig_undefer().
> Thrown in because it cannot hurt.

  i will try to take a look tomorrow. i don't think the volatile is needed,
so i'd prefer it not be included unless we find otherwise.

  f.-



Re: potential qemu crash fix, please test

2011-01-24 Thread Philip Guenther
On Sun, 23 Jan 2011, Stefan Sperling wrote:
> The patch protects the region around _thread_machdep_restore_float_state(),
> which severely messes with the stack of the current thread, from being
> interrupted by signals.
> Please test. This problem could also affect other applications.
> 
> I'm not sure if the bit making sig_defer_count volatile is needed,
> but it does have an effect on the assembly code generated for
> _thread_kern_sig_defer() and _thread_kern_sig_undefer().

Like Federico, I want to eyeball this part a bit more closely before oking 
it.

The other thing I need to finish double checking is whether the nesting of 
_thread_kern_in_sched vs _queue_signals is correct here:

> @@ -481,6 +487,9 @@ _thread_kern_sched(struct sigcontext * s
>*/
>   curthread = _get_curthread();
>   _thread_kern_in_sched = 0;
> +
> + /* Allow signals again. */
> + _queue_signals = 0;
>  
>   /* run any installed switch-hooks */
>   if ((_sched_switch_hook != NULL) &&

...or whether the order should be flipped.  The core idea makes sense to 
me though.  I should be able to finish reviewing in the next couple days.


Philip



Re: potential qemu crash fix, please test

2011-01-24 Thread Stefan Sperling
On Mon, Jan 24, 2011 at 09:33:40AM -0800, Philip Guenther wrote:
> The other thing I need to finish double checking is whether the nesting of 
> _thread_kern_in_sched vs _queue_signals is correct here:
> 
> > @@ -481,6 +487,9 @@ _thread_kern_sched(struct sigcontext * s
> >  */
> > curthread = _get_curthread();
> > _thread_kern_in_sched = 0;
> > +
> > +   /* Allow signals again. */
> > +   _queue_signals = 0;
> >  
> > /* run any installed switch-hooks */
> > if ((_sched_switch_hook != NULL) &&
> 
> ...or whether the order should be flipped.

You're right, there's a problem. This is tricky.

If I'm not mistaken, with the current ordering, we get the following:

A signal is caught in-between setting these global variables,
i.e. they are: _thread_kern_in_sched == 0 && _queue_signals == 1

_thread_sig_handler()
{
    if (sig == _SCHED_SIGNAL) {
        if (There are pending signals for the current thread,
            i.e. a signal was received while we were in the critical
            section protected by the patch) {
            _SCHED_SIGNAL is ignored, current thread will
            yield in _thread_kern_sig_undefer()
        } else {
            _thread_kern_sched() is called with
            _queue_signals == 1. It immediately sets
            _thread_kern_in_sched to 1.

            Signals are blocked earlier than usual during
            _thread_kern_sched(), but it eventually wants
            _queue_signals == 1 anyway. No biggie.

            A new thread might be scheduled.
            Either the interrupted thread is scheduled again
            right away, or another _SCHED_SIGNAL eventually
            causes _thread_kern_sched() to resume the interrupted
            thread.

            When the interrupted thread resumes it is still
            in _thread_kern_sched(), about to set
            _queue_signals to zero.
            But the global _thread_kern_in_sched is 1 from the
            entry to _thread_kern_sched() and will NOT be set
            to zero because the thread has already done that!

            We did effectively jump within _thread_kern_sched(),
            skipping the part that sets _thread_kern_in_sched to
            zero again. Now we're stuck with
            _thread_kern_in_sched = 1.
        }
    } else {
        signal is queued because of _queue_signals == 1
    }

    return to interrupted thread, which proceeds to set
    _queue_signals to zero,
    or to the same or a newly scheduled thread in case
    sig == _SCHED_SIGNAL and there were no pending signals.
}


The other case is correct:

_thread_kern_in_sched == 1 && _queue_signals == 0

_thread_sig_handler()
{
    if (sig == _SCHED_SIGNAL) {
        signal is ignored because of _thread_kern_in_sched == 1;
    } else {
        _queue_signals = 1;
        signal is handled, possibly dispatched to application
        _queue_signals = 0;
    }

    return to interrupted thread, which proceeds to set
    _thread_kern_in_sched to zero.
}
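
The two windows described above can be modelled roughly as follows. This is
only a sketch of the reasoning in this message, not the libpthread source:
the struct, the enum and the function names are all hypothetical, and the
decision logic encodes nothing more than the pseudocode above.

```c
/* Model of the two global flags; NOT the libpthread source. */
struct sched_state {
    int in_sched;      /* models _thread_kern_in_sched */
    int queue_signals; /* models _queue_signals */
    int pending;       /* signals already queued for the current thread */
};

enum sched_action {
    SCHED_IGNORED,     /* handler returns without entering the scheduler */
    SCHED_REENTERED    /* handler calls the scheduler again */
};

/* What the handler does with _SCHED_SIGNAL, per the description above. */
static enum sched_action on_sched_signal(const struct sched_state *s)
{
    if (s->in_sched)
        return SCHED_IGNORED;   /* scheduler is already running */
    if (s->pending)
        return SCHED_IGNORED;   /* thread yields later when undeferring */
    return SCHED_REENTERED;     /* scheduler is entered from the handler */
}

/* State between the two stores when in_sched is cleared first. */
static struct sched_state window_in_sched_cleared_first(void)
{
    struct sched_state s = { 0, 1, 0 };
    return s;
}

/* State between the two stores when queue_signals is cleared first. */
static struct sched_state window_queue_signals_cleared_first(void)
{
    struct sched_state s = { 1, 0, 0 };
    return s;
}
```

With in_sched cleared first, a _SCHED_SIGNAL landing in the window re-enters
the scheduler, which is the problematic case. With queue_signals cleared
first, the same signal is ignored.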


New diff, also removing the volatile change because we're not sure if
it's necessary:

Index: uthread/uthread_kern.c
===
RCS file: /cvs/src/lib/libpthread/uthread/uthread_kern.c,v
retrieving revision 1.36
diff -u -p -r1.36 uthread_kern.c
--- uthread/uthread_kern.c  21 May 2007 16:50:36 -  1.36
+++ uthread/uthread_kern.c  24 Jan 2011 20:11:09 -
@@ -440,6 +440,12 @@ _thread_kern_sched(struct sigcontext * s
_queue_signals = 0;
}
 
+   /*
+* Prevent the signal handler from fiddling with this
+* thread before its state is set.
+*/
+   _queue_signals = 1;
+
/* Make the selected thread the current thread: */
_set_curthread(pthread_h);
curthread = pthread_h;
@@ -480,6 +486,11 @@ _thread_kern_sched(struct sigcontext * s
 * before use.
 */
curthread = _get_curthread();
+
+   /* Allow signals again. */
+   _queue_signals = 0;
+
+   /* Done with scheduling. */
_thread_kern_in_sched = 0;
 
/* run any installed switch-hooks */



Re: potential qemu crash fix, please test

2011-01-24 Thread Ryan McBride
On Mon, Jan 24, 2011 at 05:03:23PM +0900, Ryan McBride wrote:
> $ sudo qemu -m 128 -no-fd-bootchk \
> -hda virtual.img -boot n -nographic \
> -net nic,vlan=0,model=rtl8139,macaddr=52:54:00:12:34:03 \
> -net user -tftp /usr/src/sys/arch/i386/compile/TEST -bootp pxeboot \
> -net nic,vlan=1,model=rtl8139,macaddr=52:54:00:23:03:01 \
> -net tap,vlan=1,script=no \
> -net nic,vlan=3,model=rtl8139,macaddr=52:54:00:23:03:03 \
> -net socket,vlan=3,mcast=230.0.0.1:10003 
> setsockopt(SOL_IP, IP_MULTICAST_LOOP): Invalid argument
> qemu: -net socket,vlan=3,mcast=230.0.0.1:10003: Device 'socket' could not be 
> initialized
> 
> Works fine if I comment out the last two lines.

setsockopt(SOL_IP, IP_MULTICAST_LOOP) takes a u_char, not int as in the
0.13.0 qemu code.

The patch to net/socket.c below fixes this, which lets me test the
pthreads changes properly with my setup.

Index: Makefile
===
RCS file: /cvs/ports/emulators/qemu/Makefile,v
retrieving revision 1.60
diff -u -p -r1.60 Makefile
--- Makefile  19 Jan 2011 16:22:31 -0000  1.60
+++ Makefile  24 Jan 2011 23:40:48 -0000
@@ -6,7 +6,7 @@ ONLY_FOR_ARCHS =i386 amd64 sparc64
 COMMENT =  multi system emulator
 
 DISTNAME = qemu-0.13.0
-REVISION = 0
+REVISION = 1
 CATEGORIES =   emulators
 
 HOMEPAGE = http://www.qemu.org/
--- /dev/null   Tue Jan 25 08:41:19 2011
+++ patches/patch-net_socket_c  Tue Jan 25 05:57:22 2011
@@ -0,0 +1,23 @@
+$OpenBSD$
+--- net/socket.c.orig  Sat Oct 16 05:56:09 2010
++++ net/socket.c   Tue Jan 25 05:57:04 2011
+@@ -154,6 +154,7 @@ static int net_socket_mcast_create(struct sockaddr_in 
+ struct ip_mreq imr;
+ int fd;
+ int val, ret;
++u_char val2;
+ if (!IN_MULTICAST(ntohl(mcastaddr->sin_addr.s_addr))) {
+   fprintf(stderr, "qemu: error: specified mcastaddr \"%s\" (0x%08x) does not contain a multicast address\n",
+   inet_ntoa(mcastaddr->sin_addr),
+@@ -193,9 +194,9 @@ static int net_socket_mcast_create(struct sockaddr_in 
+ }
+ 
+ /* Force mcast msgs to loopback (eg. several QEMUs in same host */
+-val = 1;
++val2 = 1;
+ ret=setsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP,
+-   (const char *)&val, sizeof(val));
++   (const char *)&val2, sizeof(val2));
+ if (ret < 0) {
+   perror("setsockopt(SOL_IP, IP_MULTICAST_LOOP)");
+   goto fail;



Re: potential qemu crash fix, please test

2011-01-24 Thread Federico G. Schwindt
On Tue, Jan 25, 2011 at 08:45:38AM +0900, Ryan McBride wrote:
> On Mon, Jan 24, 2011 at 05:03:23PM +0900, Ryan McBride wrote:
> > $ sudo qemu -m 128 -no-fd-bootchk \
> > -hda virtual.img -boot n -nographic \
> > -net nic,vlan=0,model=rtl8139,macaddr=52:54:00:12:34:03 \
> > -net user -tftp /usr/src/sys/arch/i386/compile/TEST -bootp pxeboot \
> > -net nic,vlan=1,model=rtl8139,macaddr=52:54:00:23:03:01 \
> > -net tap,vlan=1,script=no \
> > -net nic,vlan=3,model=rtl8139,macaddr=52:54:00:23:03:03 \
> > -net socket,vlan=3,mcast=230.0.0.1:10003 
> > setsockopt(SOL_IP, IP_MULTICAST_LOOP): Invalid argument
> > qemu: -net socket,vlan=3,mcast=230.0.0.1:10003: Device 'socket' could not 
> > be initialized
> > 
> > Works fine if I comment out the last two lines.
> 
> setsockopt(SOL_IP, IP_MULTICAST_LOOP) takes a u_char, not int as in the
> 0.13.0 qemu code.
> 
> The patch to net/socket.c below fixes this, which lets me test the
> pthreads changes properly with my setup.

  erhm, I take it you didn't see the diff i sent earlier today?
 
  f.-



Re: potential qemu crash fix, please test

2011-01-24 Thread Ryan McBride
On Tue, Jan 25, 2011 at 12:33:08AM +0000, Federico G. Schwindt wrote:
>   erhm, I take it you didn't see the diff i sent earlier today?
>  

ah. Well, I've seen it now. :-)

I like yours better, ok by me.



Re: potential qemu crash fix, please test

2011-01-24 Thread Brad

On 24/01/11 10:43 AM, Federico G. Schwindt wrote:
> On Mon, Jan 24, 2011 at 05:03:23PM +0900, Ryan McBride wrote:
> > This patch helps a lot. I couldn't even get through an install before.
> > But please don't remove qemu-old yet: I'm using UDP multicast sockets to
> > build virtual networks, and they fail on 0.13.0:
> >
> > $ sudo qemu -m 128 -no-fd-bootchk \
> >  -hda virtual.img -boot n -nographic \
> >  -net nic,vlan=0,model=rtl8139,macaddr=52:54:00:12:34:03 \
> >  -net user -tftp /usr/src/sys/arch/i386/compile/TEST -bootp pxeboot \
> >  -net nic,vlan=1,model=rtl8139,macaddr=52:54:00:23:03:01 \
> >  -net tap,vlan=1,script=no \
> >  -net nic,vlan=3,model=rtl8139,macaddr=52:54:00:23:03:03 \
> >  -net socket,vlan=3,mcast=230.0.0.1:10003
> > setsockopt(SOL_IP, IP_MULTICAST_LOOP): Invalid argument
> > qemu: -net socket,vlan=3,mcast=230.0.0.1:10003: Device 'socket' could not be initialized
> >
> > Works fine if I comment out the last two lines.
>
>   this should fix it. can you try it please?
>
>   f.-
>
> Index: Makefile
> ===
> RCS file: /cvs/ports/emulators/qemu/Makefile,v
> retrieving revision 1.60
> diff -N -u -p Makefile
> --- Makefile  19 Jan 2011 16:22:31 -0000  1.60
> +++ Makefile  24 Jan 2011 15:41:37 -0000
> @@ -6,7 +6,7 @@ ONLY_FOR_ARCHS =i386 amd64 sparc64
>  COMMENT = multi system emulator
>
>  DISTNAME = qemu-0.13.0
> -REVISION = 0
> +REVISION = 1
>  CATEGORIES = emulators
>
>  HOMEPAGE = http://www.qemu.org/
> Index: patches/patch-net_socket_c
> ===
> RCS file: patches/patch-net_socket_c
> diff -N -u -p patches/patch-net_socket_c
> --- /dev/null   24 Jan 2011 08:41:37 -0000
> +++ patches/patch-net_socket_c  24 Jan 2011 15:41:37 -0000
> @@ -0,0 +1,12 @@
> +$OpenBSD$
> +--- net/socket.c.orig  Mon Jan 24 15:34:58 2011
> ++++ net/socket.c   Mon Jan 24 15:35:01 2011
> +@@ -195,7 +195,7 @@ static int net_socket_mcast_create(struct sockaddr_in
> + /* Force mcast msgs to loopback (eg. several QEMUs in same host */
> + val = 1;
> + ret=setsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP,
> +-   (const char *)&val, sizeof(val));
> ++   (const char *)&val, sizeof(char));
> + if (ret < 0) {
> +   perror("setsockopt(SOL_IP, IP_MULTICAST_LOOP)");
> +   goto fail;

Can you also check the rest of socket.c for the other uses of
setsockopt(), specifically the SO_REUSEADDR cases?




Re: potential qemu crash fix, please test

2011-01-24 Thread Ryan McBride
On Mon, Jan 24, 2011 at 08:04:05PM -0500, Brad wrote:
> Can you also check the rest of socket.c for the other uses of
> setsockopt(), specifically
> the SO_REUSEADDR cases?

Those are fine, they take an int as optval. See setsockopt(2).

(also, the code works correctly with only the IP_MULTICAST_LOOP
setsockopt fixed)
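
To make the distinction concrete, here is a minimal sketch contrasting the
two option types. The helper names set_reuseaddr() and set_mcast_loop() are
made up for illustration and do not come from qemu. The sketch uses unsigned
char in place of u_char so that it does not depend on BSD type names.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* SO_REUSEADDR takes an int as its option value. */
static int set_reuseaddr(int fd)
{
    int on = 1;

    return setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
}

/*
 * IP_MULTICAST_LOOP is passed as a single byte here, so that both the
 * pointer and the length describe the same one-byte object.
 */
static int set_mcast_loop(int fd)
{
    unsigned char loop = 1;

    return setsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP, &loop,
        sizeof(loop));
}
```

Both helpers return the setsockopt() result: 0 on success, -1 with errno set
on failure. They are meant to be called on a UDP socket obtained from
socket(AF_INET, SOCK_DGRAM, 0).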



Re: potential qemu crash fix, please test

2011-01-24 Thread Matthew Dempsky
On Mon, Jan 24, 2011 at 03:43:30PM +0000, Federico G. Schwindt wrote:
> +--- net/socket.c.orig  Mon Jan 24 15:34:58 2011
> ++++ net/socket.c   Mon Jan 24 15:35:01 2011
> +@@ -195,7 +195,7 @@ static int net_socket_mcast_create(struct sockaddr_in 
> + /* Force mcast msgs to loopback (eg. several QEMUs in same host */
> + val = 1;
> + ret=setsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP,
> +-   (const char *)&val, sizeof(val));
> ++   (const char *)&val, sizeof(char));
> + if (ret < 0) {
> + perror("setsockopt(SOL_IP, IP_MULTICAST_LOOP)");
> + goto fail;

Won't this break on sparc64, since it's a big-endian architecture?
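
A small sketch of why the byte order matters: passing the address of an int
together with a length of one byte hands over whichever byte sits at the
lowest address. The helpers below are hypothetical and assume a 32-bit value.
first_byte() lays the value out in an explicit byte order so that the result
does not depend on the machine running it.

```c
#include <stdint.h>
#include <string.h>

/*
 * Store v in memory using an explicit byte order and return the byte at
 * the lowest address, i.e. what a one-byte read through the pointer sees.
 */
static unsigned char first_byte(uint32_t v, int big_endian)
{
    unsigned char mem[4];
    int i;

    for (i = 0; i < 4; i++) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;

        mem[i] = (unsigned char)(v >> shift);
    }
    return mem[0];
}

/* Same question, answered for whatever machine the code runs on. */
static unsigned char first_byte_native(int v)
{
    unsigned char b;

    memcpy(&b, &v, 1);
    return b;
}
```

For the value 1, first_byte(1, 0) yields 1 while first_byte(1, 1) yields 0.
On a big-endian machine a one-byte read of an int holding 1 therefore sees
zero, which would turn the loopback option off instead of on.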



Re: potential qemu crash fix, please test

2011-01-25 Thread Federico G. Schwindt
On Mon, Jan 24, 2011 at 04:59:46PM -0800, Matthew Dempsky wrote:
> On Mon, Jan 24, 2011 at 03:43:30PM +0000, Federico G. Schwindt wrote:
> > +--- net/socket.c.orig  Mon Jan 24 15:34:58 2011
> > ++++ net/socket.c   Mon Jan 24 15:35:01 2011
> > +@@ -195,7 +195,7 @@ static int net_socket_mcast_create(struct sockaddr_in 
> > + /* Force mcast msgs to loopback (eg. several QEMUs in same host */
> > + val = 1;
> > + ret=setsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP,
> > +-   (const char *)&val, sizeof(val));
> > ++   (const char *)&val, sizeof(char));
> > + if (ret < 0) {
> > +   perror("setsockopt(SOL_IP, IP_MULTICAST_LOOP)");
> > +   goto fail;
> 
> Won't this break on sparc64, since it's a big-endian architecture?

  likely, so better to stick with adding a full new var.
  fwiw, i think we should relax the check in the kernel to reject only
m->m_len < 1 rather than m->m_len != 1, and ignore any trailing garbage.
  it seems no one except us is so strict wrt the size, which leads to
unnecessary patches imho.
  btw, ipv6 uses an int for IPV6_MULTICAST_LOOP.

  f.-
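
The difference between the two length checks can be sketched as below. This
only models the comparison mentioned in this message; the function names are
hypothetical and this is not the kernel code.

```c
#include <stddef.h>

/* Strict check: accept only an option value of exactly one byte. */
static int loop_len_ok_strict(size_t len)
{
    return len == 1;
}

/* Relaxed check: accept one byte or more, ignoring anything after it. */
static int loop_len_ok_relaxed(size_t len)
{
    return len >= 1;
}
```

With the strict form, a caller passing sizeof(int) is rejected, which matches
the "Invalid argument" error seen earlier in the thread. The relaxed form
would accept it, although a reader that then looks only at the first byte
would presumably still see 0 for an int holding 1 on a big-endian machine.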



Re: potential qemu crash fix, please test

2011-01-25 Thread Stefan Sperling
On Sun, Jan 23, 2011 at 09:04:46PM +0100, Stefan Sperling wrote:
> With this patch to libpthread, an OpenBSD guest in a qemu run with
> pthreads (not rthreads) has finished a cvs checkout and a kernel build.
> I'm starting a make build on it now.

The qemu instance run with this patch has finished make build and
also a xenocara build successfully.
There were a handful (less than 10) disk i/o soft errors:
  wd0f: device timeout writing fsbn 2096704 of 2096704-2096735 (wd0 bn 8209472; cn 511 tn 4 sn 5), retrying
  wd0: soft error (corrected)
  wd0(pciide0:0:0): timeout
These were probably due to signals sent from qemu's i/o thread to 
the main thread being queued for a while. There were 2 additional
qemu instances running in parallel doing heavy i/o so the host system
was under heavy load the whole time.

The soft i/o errors didn't show in the rthreads qemu instance (which has
also finished make build + xenocara). As expected, performance of this
instance was better than with pthreads. It finished a couple of hours earlier.

For further testing and review the updated diff at
http://marc.info/?l=openbsd-ports&m=129590196005318&w=2
should be used.

> Index: uthread/pthread_private.h
> ===
> RCS file: /cvs/src/lib/libpthread/uthread/pthread_private.h,v
> retrieving revision 1.76
> diff -u -p -r1.76 pthread_private.h
> --- uthread/pthread_private.h 28 Oct 2010 15:02:41 -  1.76
> +++ uthread/pthread_private.h 22 Jan 2011 17:07:45 -
> @@ -761,7 +761,7 @@ struct pthread {
>* Set to non-zero when this thread has deferred signals.
>* We allow for recursive deferral.
>*/
> - int sig_defer_count;
> + volatile sig_atomic_t   sig_defer_count;
>  
>   /*
>* Set to TRUE if this thread should yield after undeferring
> Index: uthread/uthread_kern.c
> ===
> RCS file: /cvs/src/lib/libpthread/uthread/uthread_kern.c,v
> retrieving revision 1.36
> diff -u -p -r1.36 uthread_kern.c
> --- uthread/uthread_kern.c21 May 2007 16:50:36 -  1.36
> +++ uthread/uthread_kern.c23 Jan 2011 15:18:03 -
> @@ -440,6 +440,12 @@ _thread_kern_sched(struct sigcontext * s
>   _queue_signals = 0;
>   }
>  
> + /*
> +  * Prevent the signal handler from fiddling with this
> +  * thread before its state is set.
> +  */
> + _queue_signals = 1;
> +
>   /* Make the selected thread the current thread: */
>   _set_curthread(pthread_h);
>   curthread = pthread_h;
> @@ -481,6 +487,9 @@ _thread_kern_sched(struct sigcontext * s
>*/
>   curthread = _get_curthread();
>   _thread_kern_in_sched = 0;
> +
> + /* Allow signals again. */
> + _queue_signals = 0;
>  
>   /* run any installed switch-hooks */
>   if ((_sched_switch_hook != NULL) &&



Re: potential qemu crash fix, please test

2011-01-25 Thread Dale Rahn
On Mon, Jan 24, 2011 at 03:43:30PM +0000, Federico G. Schwindt wrote:
> On Mon, Jan 24, 2011 at 05:03:23PM +0900, Ryan McBride wrote:
> > This patch helps a lot. I couldn't even get through an install before.
> > But please don't remove qemu-old yet: I'm using UDP multicast sockets to
> > build virtual networks, and they fail on  0.13.0:
> > 
> > $ sudo qemu -m 128 -no-fd-bootchk \
> > -hda virtual.img -boot n -nographic \
> > -net nic,vlan=0,model=rtl8139,macaddr=52:54:00:12:34:03 \
> > -net user -tftp /usr/src/sys/arch/i386/compile/TEST -bootp pxeboot \
> > -net nic,vlan=1,model=rtl8139,macaddr=52:54:00:23:03:01 \
> > -net tap,vlan=1,script=no \
> > -net nic,vlan=3,model=rtl8139,macaddr=52:54:00:23:03:03 \
> > -net socket,vlan=3,mcast=230.0.0.1:10003 
> > setsockopt(SOL_IP, IP_MULTICAST_LOOP): Invalid argument
> > qemu: -net socket,vlan=3,mcast=230.0.0.1:10003: Device 'socket' could not 
> > be initialized
> > 
> > Works fine if I comment out the last two lines.
> 
>   this should fix it. can you try it please?
> 
>   f.-

THIS IS WRONG.

If we ever want qemu to run on big endian again, use mcbride's diff.

> 
> Index: Makefile
> ===
> RCS file: /cvs/ports/emulators/qemu/Makefile,v
> retrieving revision 1.60
> diff -N -u -p Makefile
> --- Makefile  19 Jan 2011 16:22:31 -  1.60
> +++ Makefile  24 Jan 2011 15:41:37 -
> @@ -6,7 +6,7 @@ ONLY_FOR_ARCHS =  i386 amd64 sparc64
>  COMMENT =multi system emulator
>  
>  DISTNAME =   qemu-0.13.0
> -REVISION =   0
> +REVISION =   1
>  CATEGORIES = emulators
>  
>  HOMEPAGE =   http://www.qemu.org/
> Index: patches/patch-net_socket_c
> ===
> RCS file: patches/patch-net_socket_c
> diff -N -u -p patches/patch-net_socket_c
> --- /dev/null 24 Jan 2011 08:41:37 -
> +++ patches/patch-net_socket_c24 Jan 2011 15:41:37 -
> @@ -0,0 +1,12 @@
> +$OpenBSD$
> +--- net/socket.c.orig  Mon Jan 24 15:34:58 2011
> ++++ net/socket.c   Mon Jan 24 15:35:01 2011
> +@@ -195,7 +195,7 @@ static int net_socket_mcast_create(struct sockaddr_in 
> + /* Force mcast msgs to loopback (eg. several QEMUs in same host */
> + val = 1;
> + ret=setsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP,
> +-   (const char *)&val, sizeof(val));
> ++   (const char *)&val, sizeof(char));
> + if (ret < 0) {
> + perror("setsockopt(SOL_IP, IP_MULTICAST_LOOP)");
> + goto fail;
> 
Dale Rahn   dr...@dalerahn.com



Re: potential qemu crash fix, please test

2011-01-25 Thread Federico G. Schwindt
On Tue, Jan 25, 2011 at 09:40:22AM -0600, Dale Rahn wrote:
> On Mon, Jan 24, 2011 at 03:43:30PM +, Federico G. Schwindt wrote:
> > On Mon, Jan 24, 2011 at 05:03:23PM +0900, Ryan McBride wrote:
> > > This patch helps a lot. I couldn't even get through an install before.
> > > But please don't remove qemu-old yet: I'm using UDP multicast sockets to
> > > build virtual networks, and they fail on  0.13.0:
> > > 
> > > $ sudo qemu -m 128 -no-fd-bootchk \
> > > -hda virtual.img -boot n -nographic \
> > > -net nic,vlan=0,model=rtl8139,macaddr=52:54:00:12:34:03 \
> > > -net user -tftp /usr/src/sys/arch/i386/compile/TEST -bootp 
> > > pxeboot \
> > > -net nic,vlan=1,model=rtl8139,macaddr=52:54:00:23:03:01 \
> > > -net tap,vlan=1,script=no \
> > > -net nic,vlan=3,model=rtl8139,macaddr=52:54:00:23:03:03 \
> > > -net socket,vlan=3,mcast=230.0.0.1:10003 
> > > setsockopt(SOL_IP, IP_MULTICAST_LOOP): Invalid argument
> > > qemu: -net socket,vlan=3,mcast=230.0.0.1:10003: Device 'socket' could not 
> > > be initialized
> > > 
> > > Works fine if I comment out the last two lines.
> > 
> >   this should fix it. can you try it please?
> > 
> >   f.-
> 
> THIS IS WRONG.
> 
> If we ever want qemu to run on big endian again, use mcbride's diff.

  already agreed.

  f.-



Re: potential qemu crash fix, please test

2011-01-27 Thread Brad
On Tue, Jan 25, 2011 at 09:40:22AM -0600, Dale Rahn wrote:
> THIS IS WRONG.

And the wrong code is still in the qemu-old port.


Index: Makefile
===
RCS file: /home/cvs/ports/emulators/qemu-old/Makefile,v
retrieving revision 1.12
diff -u -p -r1.12 Makefile
--- Makefile  19 Jan 2011 16:22:31 -0000  1.12
+++ Makefile  28 Jan 2011 01:35:42 -0000
@@ -6,7 +6,7 @@ ONLY_FOR_ARCHS= amd64 i386 powerpc
 COMMENT=   multi system emulator
 
 DISTNAME=  qemu-0.9.1
-REVISION=  17
+REVISION=  18
 CATEGORIES=emulators
 
 HOMEPAGE=  http://www.nongnu.org/qemu/
Index: patches/patch-vl_c
===
RCS file: /home/cvs/ports/emulators/qemu-old/patches/patch-vl_c,v
retrieving revision 1.1.1.1
diff -u -p -r1.1.1.1 patch-vl_c
--- patches/patch-vl_c  27 May 2010 17:33:43 -  1.1.1.1
+++ patches/patch-vl_c  28 Jan 2011 01:35:16 -
@@ -1,6 +1,6 @@
 $OpenBSD: patch-vl_c,v 1.1.1.1 2010/05/27 17:33:43 fgsch Exp $
---- vl.c.orig  Sun Jan  6 13:38:42 2008
-+++ vl.c   Tue Jun 17 19:48:00 2008
+--- vl.c.orig  Sun Jan  6 14:38:42 2008
++++ vl.c   Thu Jan 27 20:35:07 2011
 @@ -61,7 +61,8 @@
  #include 
  #ifdef _BSD
@@ -123,16 +123,27 @@ $OpenBSD: patch-vl_c,v 1.1.1.1 2010/05/2
  
  if (ifname1 != NULL)
  pstrcpy(ifname, sizeof(ifname), ifname1);
-@@ -4320,7 +4396,7 @@ static int net_socket_mcast_create(struct sockaddr_in 
+@@ -4279,6 +4355,7 @@ static int net_socket_mcast_create(struct sockaddr_in 
+ struct ip_mreq imr;
+ int fd;
+ int val, ret;
++u_char loop;
+ if (!IN_MULTICAST(ntohl(mcastaddr->sin_addr.s_addr))) {
+   fprintf(stderr, "qemu: error: specified mcastaddr \"%s\" (0x%08x) does not contain a multicast address\n",
+   inet_ntoa(mcastaddr->sin_addr),
+@@ -4318,9 +4395,9 @@ static int net_socket_mcast_create(struct sockaddr_in 
+ }
+ 
  /* Force mcast msgs to loopback (eg. several QEMUs in same host */
- val = 1;
+-val = 1;
++loop = 1;
  ret=setsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP,
 -   (const char *)&val, sizeof(val));
-+   (const char *)&val, sizeof(char));
++   (const char *)&loop, sizeof(loop));
  if (ret < 0) {
perror("setsockopt(SOL_IP, IP_MULTICAST_LOOP)");
goto fail;
-@@ -4609,7 +4685,8 @@ static const char *get_word(char *buf, int buf_size, c
+@@ -4609,7 +4686,8 @@ static const char *get_word(char *buf, int buf_size, c
  return p;
  }
  
@@ -142,7 +153,7 @@ $OpenBSD: patch-vl_c,v 1.1.1.1 2010/05/2
 const char *tag, const char *str)
  {
  const char *p;
-@@ -4748,6 +4825,9 @@ static int net_client_init(const char *str)
+@@ -4748,6 +4826,9 @@ static int net_client_init(const char *str)
  char ifname[64];
  char setup_script[1024], down_script[1024];
  int fd;
@@ -152,7 +163,7 @@ $OpenBSD: patch-vl_c,v 1.1.1.1 2010/05/2
  vlan->nb_host_devs++;
  if (get_param_value(buf, sizeof(buf), "fd", p) > 0) {
  fd = strtol(buf, NULL, 0);
-@@ -4755,16 +4835,16 @@ static int net_client_init(const char *str)
+@@ -4755,16 +4836,16 @@ static int net_client_init(const char *str)
  if (net_tap_fd_init(vlan, fd))
  ret = 0;
  } else {
@@ -173,7 +184,7 @@ $OpenBSD: patch-vl_c,v 1.1.1.1 2010/05/2
  }
  } else
  #endif
-@@ -8130,19 +8210,23 @@ int main(int argc, char **argv)
+@@ -8130,19 +8211,23 @@ int main(int argc, char **argv)
  gdbstub_port = DEFAULT_GDBSTUB_PORT;
  #endif
  snapshot = 0;
