Re: Unprivileged double fault with GDB and simple program written in assembly
Hi, Il 08/09/24 01:23, Samuel Thibault ha scritto: Hello, AIUI, the fix you submitted was meant to fix this? yes, at least to correctly decode the registers in the fault handler, I didn't look deeper yet. Luca
[PATCH gnumach v3 1/3] add xfloat thread state interface
* i386/i386/fpu.c: extend current getter and setter to support the extended state; move the struct casting here to reuse the locking and allocation logic for the thread state; make sure the new state is set as valid, otherwise it won't be applied; add i386_get_xstate_size() to dynamically retrieve the FPU state size. * i386/i386/fpu.h: update prototypes to accept generic thread state * i386/i386/pcb.c: forward raw thread state to getter and setter, only checking for minimum size and use the new i386_get_xstate_size() helper. * i386/include/mach/i386/mach_i386.defs: expose the new helper i386_get_xstate_size(). * i386/include/mach/i386/thread_status.h: add interface definition for I386_XFLOAT_STATE and the corresponding data structure. --- i386/i386/fpu.c| 139 +++-- i386/i386/fpu.h| 9 +- i386/i386/pcb.c| 37 ++- i386/include/mach/i386/mach_i386.defs | 7 ++ i386/include/mach/i386/thread_status.h | 12 +++ 5 files changed, 161 insertions(+), 43 deletions(-) diff --git a/i386/i386/fpu.c b/i386/i386/fpu.c index 316e3b41..43771e7f 100644 --- a/i386/i386/fpu.c +++ b/i386/i386/fpu.c @@ -250,6 +250,17 @@ init_fpu(void) } } +kern_return_t +i386_get_xstate_size(host_t host, vm_size_t *size) +{ + if (host == HOST_NULL) + return KERN_INVALID_ARGUMENT; + + *size = sizeof(struct i386_xfloat_state) + fp_xsave_size; + + return KERN_SUCCESS; +} + /* * Initialize FP handling. */ @@ -385,10 +396,11 @@ twd_fxsr_to_i387 (struct i386_xfp_save *fxsave) * concurrent fpu_set_state or fpu_get_state. */ kern_return_t -fpu_set_state(const thread_t thread, - struct i386_float_state *state) +fpu_set_state(const thread_t thread, void *state, int flavor) { pcb_t pcb = thread->pcb; + struct i386_float_state *fstate = (struct i386_float_state*)state; + struct i386_xfloat_state *xfstate = (struct i386_xfloat_state*)state; struct i386_fpsave_state *ifps; struct i386_fpsave_state *new_ifps; @@ -410,7 +422,8 @@ ASSERT_IPL(SPL0); } #endif - if (state->initialized == 0) { + if ((flavor == i386_FLOAT_STATE && fstate->initialized == 0) || + (flavor == i386_XFLOAT_STATE && xfstate->initialized == 0)) { /* * new FPU state is 'invalid'. * Deallocate the fp state if it exists. @@ -428,13 +441,6 @@ ASSERT_IPL(SPL0); /* * Valid state. Allocate the fp state if there is none. */ - struct i386_fp_save *user_fp_state; - struct i386_fp_regs *user_fp_regs; - - user_fp_state = (struct i386_fp_save *) &state->hw_state[0]; - user_fp_regs = (struct i386_fp_regs *) - &state->hw_state[sizeof(struct i386_fp_save)]; - new_ifps = 0; Retry: simple_lock(&pcb->lock); @@ -454,10 +460,43 @@ ASSERT_IPL(SPL0); * Ensure that reserved parts of the environment are 0. */ memset(ifps, 0, fp_xsave_size); + ifps->fp_valid = TRUE; - if (fp_save_kind != FP_FNSAVE) { - int i; + if (flavor == i386_FLOAT_STATE) { + struct i386_fp_save *user_fp_state; + struct i386_fp_regs *user_fp_regs; + + user_fp_state = (struct i386_fp_save *) &fstate->hw_state[0]; + user_fp_regs = (struct i386_fp_regs *) + &fstate->hw_state[sizeof(struct i386_fp_save)]; + if (fp_save_kind != FP_FNSAVE) { + int i; + + ifps->xfp_save_state.fp_control = user_fp_state->fp_control; + ifps->xfp_save_state.fp_status = user_fp_state->fp_status; + ifps->xfp_save_state.fp_tag = twd_i387_to_fxsr(user_fp_state->fp_tag); + ifps->xfp_save_state.fp_eip = user_fp_state->fp_eip; + ifps->xfp_save_state.fp_cs = user_fp_state->fp_cs; + ifps->xfp_save_state.fp_opcode = user_fp_state->fp_opcode; + ifps->xfp_save_state.fp_dp = user_fp_state->fp_dp; + ifps->xfp_save_state.fp_ds = user_fp_state->fp_ds; + for (i=0; i<8; i++) + memcpy(&ifps->xfp_save_state.fp_reg_word[i], &user_fp_regs->fp_reg_word[i], sizeof(user_fp_regs->fp_reg_word[i])); + } else { + ifps->fp_save_state.fp_control = user_fp_state->fp_control; + ifps->fp_save_state.fp_status = user_fp_state->fp_status; + ifps->fp_save_state.fp_tag = user_fp_state->fp_tag; + ifps->fp_save_state.fp_eip = user_fp_state->fp_eip; + ifps->fp_save_state.fp_cs = user_fp_state->fp_cs; + ifps->fp_save_state.fp_opcode = user_fp_state->fp_opcode; +
[PATCH gnumach 3/3] x86_64: fix double fault handler
* x86_64/locore.S: adjust to the changes in the thread state structure (segment registers), and add the missing opcode. --- x86_64/locore.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/x86_64/locore.S b/x86_64/locore.S index 8f39a677..376f41c1 100644 --- a/x86_64/locore.S +++ b/x86_64/locore.S @@ -435,10 +435,10 @@ ENTRY(start_timer) ENTRY(t_dbl_fault) INT_FIX cli /* disable interrupts that might corrupt the state*/ + pushq $(T_DOUBLE_FAULT) /* indicate fault type */ pusha movq%cr2,%rax movq%rax,R_CR2-R_R15(%rsp) /* CR2 might contain the faulting address */ - subq$48,%rsp// FIXME remove when segments are cleaned up movq%rsp,%rdi /* pass the saved state */ callhandle_double_fault jmp cpu_shutdown/* reset */ -- 2.39.2
[PATCH gnumach v3 2/3] add tests for FLOAT/XFLOAT state
--- tests/include/testlib.h | 1 + tests/test-thread-state-fp.c | 240 +++ tests/testlib.c | 16 +++ tests/user-qemu.mk | 3 +- 4 files changed, 259 insertions(+), 1 deletion(-) create mode 100644 tests/test-thread-state-fp.c diff --git a/tests/include/testlib.h b/tests/include/testlib.h index 7c7c2b11..17a96660 100644 --- a/tests/include/testlib.h +++ b/tests/include/testlib.h @@ -70,6 +70,7 @@ thread_t test_thread_start(task_t task, void(*routine)(void*), void* arg); mach_port_t host_priv(void); mach_port_t device_priv(void); void wait_thread_terminated(thread_t th); +void wait_thread_suspended(thread_t th); extern vm_size_t vm_page_size; diff --git a/tests/test-thread-state-fp.c b/tests/test-thread-state-fp.c new file mode 100644 index ..2aab8ca9 --- /dev/null +++ b/tests/test-thread-state-fp.c @@ -0,0 +1,240 @@ +/* + * Copyright (c) 2024 Free Software Foundation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#include +#include +#include +#include +#include + +#if defined(__i386__) || defined(__x86_64__) +#include + +static void printx(struct i386_xfloat_state *state, int size) +{ + printf("xfloat init %d fp %d exc %d\n", + state->initialized, state->fpkind, state->exc_status); + struct i386_xfp_save *xfp = (struct i386_xfp_save *) &state->hw_state[0]; + printf("xfp %x %x %x %x %x %x %x %x\n", + xfp->fp_control, xfp->fp_status, xfp->fp_tag, xfp->fp_eip, + xfp->fp_cs, xfp->fp_opcode, xfp->fp_dp, xfp->fp_ds); + for (int i=0; i<8; i++) +{ + printf("fp%d", i); + for (int j=0; j<16; j++) +printf(" %02x", xfp->fp_reg_word[i][j]); + printf("\n"); +} + for (int i=0; i<16; i++) +{ + printf("xmm%02d", i); + for (int j=0; j<16; j++) +printf(" %02x", xfp->fp_xreg_word[i][j]); + printf("\n"); +} + printf("header xfp_features %llx bv %llx\n", + xfp->header.xfp_features, xfp->header.xcomp_bv); + int rem = size - sizeof(*state) - sizeof(struct i386_xfp_save); + if (rem > 0) +{ + int iter = 0; + while (rem > 0) +{ + const int len = 16; + int n; + if (rem > len) +n = len; + else +n = rem; + printf("ext"); + for (int j=0; jextended[j + iter*len]); + printf("\n"); + rem -= n; + iter++; +} +} +} + +static void thread_fp_getset(void *arg) +{ + int err; + thread_t th = *(thread_t*)arg; + + wait_thread_suspended(th); + + mach_msg_type_number_t state_count = i386_FLOAT_STATE_COUNT; + struct i386_float_state state; + + memset(&state, 0, sizeof(state)); + err = thread_get_state(th, i386_FLOAT_STATE, +(thread_state_t) &state, &state_count); + ASSERT_RET(err, "thread_get_state get failed"); + ASSERT(state_count == i386_FLOAT_STATE_COUNT, "bad state_count"); + + struct i386_fp_regs *fpr = + (struct i386_fp_regs *) (&state.hw_state[0] + sizeof(struct i386_fp_save)); + + printf("fp regs get:\n"); + for (int i=0; i<8; i++) + { + printf("fp%d", i); + for (int j=0; j<5; j++) + printf(" %04x", fpr->fp_reg_word[i][j]); + printf("\n"); + } + + char tmp[10]; + memcpy(tmp, &fpr->fp_reg_word[1][0], sizeof(tmp)); + memcpy(&fpr->fp_reg_word[1][0], &fpr->fp_reg_word[0][0], sizeof(tmp)); + memcpy(&fpr->fp_reg_word[0][0], tmp, sizeof(tmp)); + + printf("fp regs set:\n"); + for (int i=0; i<8; i++) + { + printf("fp%d", i); + for (int j=0; j<5; j++) + printf(" %04x", fpr->fp_reg_word[i][j]); + printf("\n"); + } + + err = thread_set_state(th, i386_FLOAT_STATE, +(thread_state_t) &state, state_count); + ASSERT_RET(err, "thread_set_state set failed"); + + err = thread_resume(th); + ASSERT_RET(err, "error in thread_resume"); + thread_terminate(mach_thread_self()); + FAILURE("thread_terminate"); +} + +static void test_fp_state_getset() +{ + int err; + thread_t th = mach_thread_self(); + + /* load some known value in FP registers */ + int n1[1] = {11}; + float d2[1] = {123.456}; + asm volatile ("fildl %0\n" +"fldl %1\n" +:: "m" (n1), "m" (d2) :); + + /* then switch to the ge
Re: Unprivileged double fault with GDB and simple program written in assembly
Hi, Il 28/08/24 15:01, J. E. Marinheiro ha scritto: At this point, a double fault evidently happens, Mach starts panicking, and the registers are dumped: * RAX=4010DE * RBX=0 * RCX=1 * RDX=0 * RSI=0 * RDI=0 * RBP=0 * RSP=0 * R8 to R12=0 * EFLAGS=4000CE The error message is: `trapno 0: Divide error, error 01402ff8' `panic ../i386/i386/trap.c:677: handle_double_fault: DOUBLE FAULT! This is critical' I can reproduce the issues, both with and without gdb; in particular, I see that there is a bug in decoding the registers after a double fault; Fixing that I see RAX 003c RBX RCX 004000de RDX RSI 004010de RDI RBP 0040 RSP R8 1403 R9 R10 R11 R12 R13 R14 R15 RIP 81012164 EFLAGS 00010102 where RIP seems to point the the syscall64 entry. It's also weird to have RSP=0 but that might be due to the bad state of the program being debugged. GDB correctly prints errors related to signals, as there is no signal thread to handle them: Thread 4 received signal ?, Unknown signal. _start () at program.S:11 11mov $60, %rax warning: Signal ? does not exist on this system. warning: Can't deliver signal ?: No signal thread. The signal seems caused by an invalid opcode exception after the second syscall, sent by the kernel and handled by GDB. From a kernel trace I collected it seems that the exception is sent twice, and it seems that somewhere after the second exception, when the program is resumed, the double fault happens. There might be an issue with restarting twice a task with an invalid state, if exceptions can be delivered; without GDB, the kernel terminates the task at the first error, as there is no exception port set. When not using GDB, the program is simply killed by the system and nothing bad seems to happen. I'm guessing Linux syscalls need not be the same as Mach syscalls, but a double fault from some faulty program shouldn't trigger a panic without even root privileges. The kernel has no idea about unix users, AFAIK in userspace the difference is basically the access to some privileged mach ports, but this is implemented in glibc and the hurd servers. Luca
Re: [PATCH gnumach 2/3] add tests for FLOAT/XFLOAT state
Il 23/08/24 11:50, Sergey Bugaev ha scritto: On Wed, Aug 21, 2024 at 7:37 PM Luca Dariz wrote: +#include + +static void printx(struct i386_xfloat_state *state, int size) +{ + printf("xfloat init %d fp %d exc %d\n", + state->initialized, state->fpkind, state->exc_status); + struct i386_xfp_save *xfp = (struct i386_xfp_save *) &state->hw_state[0]; + printf("xfp %d %d %d %d %d %d %d %d\n", + xfp->fp_control, xfp->fp_status, xfp->fp_tag, xfp->fp_eip, + xfp->fp_cs, xfp->fp_opcode, xfp->fp_dp, xfp->fp_ds); Please make sure to disable this on non-x86. You can disable the whole test, I guess, or -- would testing something like this also make sense on other architectures? Make it can just fail on other archs, I guess in general we can expect such test to be needed. Luca
Re: [PATCH v2 gnumach 1/3] add xfloat thread state interface
Il 22/08/24 23:26, Samuel Thibault ha scritto: Hello, Thanks for the improved version! Luca Dariz, le mer. 21 août 2024 18:36:14 +0200, a ecrit: @@ -495,10 +534,11 @@ ASSERT_IPL(SPL0); * concurrent fpu_set_state or fpu_get_state. */ kern_return_t -fpu_get_state(const thread_t thread, - struct i386_float_state *state) +fpu_get_state(const thread_t thread, void *state, int flavor) { pcb_t pcb = thread->pcb; + struct i386_float_state *fstate = (struct i386_float_state*)state; + struct i386_xfloat_state *xfstate = (struct i386_xfloat_state*)state; struct i386_fpsave_state *ifps; ASSERT_IPL(SPL0); @@ -512,7 +552,10 @@ ASSERT_IPL(SPL0); * No valid floating-point state. */ simple_unlock(&pcb->lock); - memset(state, 0, sizeof(struct i386_float_state)); +if (flavor == i386_FLOAT_STATE) +memset(state, 0, sizeof(struct i386_float_state)); +else if (flavor == i386_XFLOAT_STATE) +memset(state, 0, sizeof(struct i386_xfloat_state)); I guess we should also memset to 0 the fp_xsave_size - sizeof(struct i386_xfp_save) part, to avoid leaking data? Thus just pass fp_xsave_size rather than sizeof(struct i386_xfloat_state). diff --git a/i386/include/mach/i386/thread_status.h b/i386/include/mach/i386/thread_status.h index 94596a74..e5632ed6 100644 --- a/i386/include/mach/i386/thread_status.h +++ b/i386/include/mach/i386/thread_status.h @@ -148,6 +149,20 @@ struct i386_float_state { }; #define i386_FLOAT_STATE_COUNT (sizeof(struct i386_float_state)/sizeof(unsigned int)) +#define XFP_STATE_BYTES (sizeof (struct i386_xfp_save)) + +struct i386_xfloat_state { + int fpkind; /* FP_NO..FP_387X (readonly) */ + int initialized; + int exc_status; /* exception status (readonly) */ + int fp_save_kind; /* format of hardware state */ + unsigned char hw_state[XFP_STATE_BYTES]; /* actual "hardware" state */ I'm wondering if it's really useful to use XFP_STATE_BYTES here, since callers are supposed to be able to support the extended part. Better make hw_state zero-sized so that any miss in supporting the extended part will hopefully be much more probably caught. That'll also make i386_get_xstate_size simpler. Good point, I'll fix it. Luca
Re: [PATCH v2 gnumach 1/3] add xfloat thread state interface
Il 22/08/24 23:51, Samuel Thibault ha scritto: Luca Dariz, le mer. 21 août 2024 18:36:14 +0200, a ecrit: + } else if (flavor == i386_XFLOAT_STATE) { + int i; + struct i386_xfp_save *user_fp_state = (struct i386_xfp_save *) &xfstate->hw_state[0]; We should also check that user_fp_state->fp_save_kind == fp_save_kind. Right, will do. Luca
Re: [PATCH v2 gnumach 1/3] add xfloat thread state interface
Il 22/08/24 23:44, Samuel Thibault ha scritto: Luca Dariz, le mer. 21 août 2024 18:36:14 +0200, a ecrit: diff --git a/i386/include/mach/i386/mach_i386.defs b/i386/include/mach/i386/mach_i386.defs index 965d5c3b..61fed222 100644 --- a/i386/include/mach/i386/mach_i386.defs +++ b/i386/include/mach/i386/mach_i386.defs @@ -111,3 +111,9 @@ routine i386_get_gdt( target_thread : thread_t; selector: int; out desc: descriptor_t); + +/* Returns the size in bytes of the FPU extended state, to be used + with i386_XFLOAT_STATE instead of i386_XFLOAT_STATE_COUNT */ +routinei386_get_xstate_size( + host: host_t; + out size: natural_t); In principle that should rather be a vm_size_t? Yes, this was to match with thread_*_state(), which use mach_msg_type_number_t which is a natural_t, and considering that on 64 bit sizeof(natural_t) < sizeof(vm_size_t). If this makes sense, I'll document it here. Luca
[PATCH gnumach 3/3] add rpc interrupted test
* tests/test-machmsg.c: add two use cases used by glibc during signal handling * tests/include/testlib.h * tests/testlib.c: add new wait_thread_terminated() helper --- tests/include/testlib.h | 1 + tests/test-machmsg.c| 80 + tests/testlib.c | 15 3 files changed, 96 insertions(+) diff --git a/tests/include/testlib.h b/tests/include/testlib.h index 1d08067b..737efd41 100644 --- a/tests/include/testlib.h +++ b/tests/include/testlib.h @@ -70,6 +70,7 @@ thread_t test_thread_start(task_t task, void(*routine)(void*), void* arg); mach_port_t host_priv(void); mach_port_t device_priv(void); void wait_thread_suspended(thread_t th); +void wait_thread_terminated(thread_t th); extern vm_size_t vm_page_size; diff --git a/tests/test-machmsg.c b/tests/test-machmsg.c index ac292376..7f535bde 100644 --- a/tests/test-machmsg.c +++ b/tests/test-machmsg.c @@ -499,6 +499,83 @@ void test_msg_emptydesc(void) } +void recv_to_be_interrupted(void *arg) +{ + mach_msg_header_t msg; + kern_return_t ret; + mach_port_t rcv_name; + long err = (long)arg; + + ret = mach_port_allocate(mach_task_self (), MACH_PORT_RIGHT_RECEIVE, &rcv_name); + ASSERT_RET(ret, "creating rx port"); + + ret = mach_msg(&msg, MACH_RCV_MSG, + 0, sizeof(msg), rcv_name, + MACH_MSG_TIMEOUT_NONE, MACH_PORT_NULL); + + printf("mach_msg returned %x\n", ret); + ASSERT(ret == err, "recv not interrupted correctly"); + + thread_terminate(mach_thread_self()); + FAILURE("thread_terminate"); +} + +void test_recv_interrupted(void) +{ + kern_return_t ret; + thread_t th; + th = test_thread_start(mach_task_self(), recv_to_be_interrupted, (void*)MACH_RCV_INTERRUPTED); + msleep(100); + + ret = thread_suspend(th); + ASSERT_RET(ret, "thread_suspend"); + + ret = thread_abort(th); + ASSERT_RET(ret, "thread_abort"); + + ret = thread_resume(th); + ASSERT_RET(ret, "thread_resume"); + + wait_thread_terminated(th); +} + +void test_recv_interrupted_setreturn(void) +{ + kern_return_t ret; + thread_t th; + th = test_thread_start(mach_task_self(), recv_to_be_interrupted, (void*)123L); + msleep(100); + + ret = thread_suspend(th); + ASSERT_RET(ret, "thread_suspend"); + + ret = thread_abort(th); + ASSERT_RET(ret, "thread_abort"); + + + struct i386_thread_state state; + unsigned int count; + count = i386_THREAD_STATE_COUNT; + ret = thread_get_state(th, i386_REGS_SEGS_STATE, + (thread_state_t) &state, &count); + ASSERT_RET(ret, "thread_get_state()"); + +#ifdef __i386__ + state.eax = 123; +#elif defined(__x86_64__) + state.rax = 123; +#endif + ret = thread_set_state(th, i386_REGS_SEGS_STATE, + (thread_state_t) &state, i386_THREAD_STATE_COUNT); + ASSERT_RET(ret, "thread_set_state"); + + ret = thread_resume(th); + ASSERT_RET(ret, "thread_resume"); + + wait_thread_terminated(th); +} + + int main (int argc, char *argv[], int envc, char *envp[]) { @@ -512,5 +589,8 @@ main (int argc, char *argv[], int envc, char *envp[]) test_msg_emptydesc(); printf("test_iters()\n"); test_iterations(); + printf("test_recv_interrupted()\n"); + test_recv_interrupted(); + test_recv_interrupted_setreturn(); return 0; } diff --git a/tests/testlib.c b/tests/testlib.c index df9d7113..4200979d 100644 --- a/tests/testlib.c +++ b/tests/testlib.c @@ -225,6 +225,21 @@ void wait_thread_suspended(thread_t th) } while (1); } +void wait_thread_terminated(thread_t th) +{ + int err; + struct thread_basic_info info; + mach_msg_type_number_t count; + do { +count = THREAD_BASIC_INFO_COUNT; +err = thread_info(th, THREAD_BASIC_INFO, (thread_info_t)&info, &count); +if (err == MACH_SEND_INVALID_DEST) +break; +ASSERT_RET(err, "error in thread_info"); +msleep(100); // don't poll continuously + } while (1); +} + /* * Minimal _start() for test modules, we just take the arguments from the * kernel, call main() and reboot. As in glibc, we expect the argument pointer -- 2.39.2
[PATCH gnumach 2/3] add tests for FLOAT/XFLOAT state
--- tests/include/testlib.h | 1 + tests/test-thread-state-fp.c | 232 +++ tests/testlib.c | 16 +++ tests/user-qemu.mk | 3 +- 4 files changed, 251 insertions(+), 1 deletion(-) create mode 100644 tests/test-thread-state-fp.c diff --git a/tests/include/testlib.h b/tests/include/testlib.h index 035fdc28..1d08067b 100644 --- a/tests/include/testlib.h +++ b/tests/include/testlib.h @@ -69,6 +69,7 @@ thread_t test_thread_start(task_t task, void(*routine)(void*), void* arg); mach_port_t host_priv(void); mach_port_t device_priv(void); +void wait_thread_suspended(thread_t th); extern vm_size_t vm_page_size; diff --git a/tests/test-thread-state-fp.c b/tests/test-thread-state-fp.c new file mode 100644 index ..985ca4d6 --- /dev/null +++ b/tests/test-thread-state-fp.c @@ -0,0 +1,232 @@ +/* + * Copyright (c) 2024 Free Software Foundation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#include +#include +#include +#include +#include +#include + +static void printx(struct i386_xfloat_state *state, int size) +{ + printf("xfloat init %d fp %d exc %d\n", + state->initialized, state->fpkind, state->exc_status); + struct i386_xfp_save *xfp = (struct i386_xfp_save *) &state->hw_state[0]; + printf("xfp %d %d %d %d %d %d %d %d\n", + xfp->fp_control, xfp->fp_status, xfp->fp_tag, xfp->fp_eip, + xfp->fp_cs, xfp->fp_opcode, xfp->fp_dp, xfp->fp_ds); + for (int i=0; i<8; i++) +{ + printf("fp%d", i); + for (int j=0; j<16; j++) +printf(" %02x", xfp->fp_reg_word[i][j]); + printf("\n"); +} + for (int i=0; i<16; i++) +{ + printf("xmm%02d", i); + for (int j=0; j<16; j++) +printf(" %02x", xfp->fp_xreg_word[i][j]); + printf("\n"); +} + printf("header xfp_features %llx bv %llx\n", + xfp->header.xfp_features, xfp->header.xcomp_bv); + if (size > sizeof(*state)) +{ + int rem = size - sizeof(*state), iter = 0; + while (rem > 0) +{ + const int len = 16; + int n; + if (rem > len) +n = len; + else +n = rem; + printf("ext"); + for (int j=0; jextended[j + iter*len]); + printf("\n"); + rem -= n; + iter++; +} +} +} + +static void thread_fp_getset(void *arg) +{ + int err; + thread_t th = *(thread_t*)arg; + + wait_thread_suspended(th); + + mach_msg_type_number_t state_count = i386_FLOAT_STATE_COUNT; + struct i386_float_state state; + + memset(&state, 0, sizeof(state)); + err = thread_get_state(th, i386_FLOAT_STATE, +(thread_state_t) &state, &state_count); + ASSERT_RET(err, "thread_get_state get failed"); + ASSERT(state_count == i386_FLOAT_STATE_COUNT, "bad state_count"); + + struct i386_fp_regs *fpr = + (struct i386_fp_regs *) (&state.hw_state[0] + sizeof(struct i386_fp_save)); + + printf("fp regs get:\n"); + for (int i=0; i<8; i++) + { + printf("fp%d", i); + for (int j=0; j<5; j++) + printf(" %04x", fpr->fp_reg_word[i][j]); + printf("\n"); + } + + char tmp[10]; + memcpy(tmp, &fpr->fp_reg_word[1][0], sizeof(tmp)); + memcpy(&fpr->fp_reg_word[1][0], &fpr->fp_reg_word[0][0], sizeof(tmp)); + memcpy(&fpr->fp_reg_word[0][0], tmp, sizeof(tmp)); + + printf("fp regs set:\n"); + for (int i=0; i<8; i++) + { + printf("fp%d", i); + for (int j=0; j<5; j++) + printf(" %04x", fpr->fp_reg_word[i][j]); + printf("\n"); + } + + err = thread_set_state(th, i386_FLOAT_STATE, +(thread_state_t) &state, state_count); + ASSERT_RET(err, "thread_set_state set failed"); + + err = thread_resume(th); + ASSERT_RET(err, "error in thread_resume"); + thread_terminate(mach_thread_self()); + FAILURE("thread_terminate"); +} + +static void test_fp_state_getset() +{ + int err; + thread_t th = mach_thread_self(); + + /* load some known value in FP registers */ + int n1[1] = {11}; + float d2[1] = {123.456}; + asm volatile ("fildl %0\n" +"fldl %1\n" +:: "m" (n1), "m" (d2) :); + + /* then switch to the get/set test thread, and wait to be resumed */ + test_thread_start(mach_task_self(), thread_fp_getset, &th); + er
[PATCH v2 gnumach 1/3] add xfloat thread state interface
* i386/i386/fpu.c: extend current getter and setter to support the extended state; move the struct casting here to reuse the locking and allocation logic for the thread state; make sure the new state is set as valid, otherwise it won't be applied; add i386_get_xstate_size() to dynamically retrieve the FPU state size. * i386/i386/fpu.h: update prototypes to accept generic thread state * i386/i386/pcb.c: forward raw thread state to getter and setter, only checking for minimum size and use the new i386_get_xstate_size() helper. * i386/include/mach/i386/mach_i386.defs: expose the new helper i386_get_xstate_size(). * i386/include/mach/i386/thread_status.h: add interface definition for I386_XFLOAT_STATE and the corresponding data structure. --- i386/i386/fpu.c| 139 +++-- i386/i386/fpu.h| 9 +- i386/i386/pcb.c| 37 ++- i386/include/mach/i386/mach_i386.defs | 6 ++ i386/include/mach/i386/thread_status.h | 15 +++ 5 files changed, 163 insertions(+), 43 deletions(-) diff --git a/i386/i386/fpu.c b/i386/i386/fpu.c index 316e3b41..2d8a5887 100644 --- a/i386/i386/fpu.c +++ b/i386/i386/fpu.c @@ -250,6 +250,17 @@ init_fpu(void) } } +kern_return_t +i386_get_xstate_size(host_t host, natural_t *size) +{ + if (host == HOST_NULL) + return KERN_INVALID_ARGUMENT; + + *size = sizeof(struct i386_xfloat_state) + fp_xsave_size - sizeof(struct i386_xfp_save); + + return KERN_SUCCESS; +} + /* * Initialize FP handling. */ @@ -385,10 +396,11 @@ twd_fxsr_to_i387 (struct i386_xfp_save *fxsave) * concurrent fpu_set_state or fpu_get_state. */ kern_return_t -fpu_set_state(const thread_t thread, - struct i386_float_state *state) +fpu_set_state(const thread_t thread, void *state, int flavor) { pcb_t pcb = thread->pcb; + struct i386_float_state *fstate = (struct i386_float_state*)state; + struct i386_xfloat_state *xfstate = (struct i386_xfloat_state*)state; struct i386_fpsave_state *ifps; struct i386_fpsave_state *new_ifps; @@ -410,7 +422,8 @@ ASSERT_IPL(SPL0); } #endif - if (state->initialized == 0) { + if ((flavor == i386_FLOAT_STATE && fstate->initialized == 0) || + (flavor == i386_XFLOAT_STATE && xfstate->initialized == 0)) { /* * new FPU state is 'invalid'. * Deallocate the fp state if it exists. @@ -428,13 +441,6 @@ ASSERT_IPL(SPL0); /* * Valid state. Allocate the fp state if there is none. */ - struct i386_fp_save *user_fp_state; - struct i386_fp_regs *user_fp_regs; - - user_fp_state = (struct i386_fp_save *) &state->hw_state[0]; - user_fp_regs = (struct i386_fp_regs *) - &state->hw_state[sizeof(struct i386_fp_save)]; - new_ifps = 0; Retry: simple_lock(&pcb->lock); @@ -454,10 +460,43 @@ ASSERT_IPL(SPL0); * Ensure that reserved parts of the environment are 0. */ memset(ifps, 0, fp_xsave_size); + ifps->fp_valid = TRUE; - if (fp_save_kind != FP_FNSAVE) { - int i; + if (flavor == i386_FLOAT_STATE) { + struct i386_fp_save *user_fp_state; + struct i386_fp_regs *user_fp_regs; + + user_fp_state = (struct i386_fp_save *) &fstate->hw_state[0]; + user_fp_regs = (struct i386_fp_regs *) + &fstate->hw_state[sizeof(struct i386_fp_save)]; + if (fp_save_kind != FP_FNSAVE) { + int i; + + ifps->xfp_save_state.fp_control = user_fp_state->fp_control; + ifps->xfp_save_state.fp_status = user_fp_state->fp_status; + ifps->xfp_save_state.fp_tag = twd_i387_to_fxsr(user_fp_state->fp_tag); + ifps->xfp_save_state.fp_eip = user_fp_state->fp_eip; + ifps->xfp_save_state.fp_cs = user_fp_state->fp_cs; + ifps->xfp_save_state.fp_opcode = user_fp_state->fp_opcode; + ifps->xfp_save_state.fp_dp = user_fp_state->fp_dp; + ifps->xfp_save_state.fp_ds = user_fp_state->fp_ds; + for (i=0; i<8; i++) + memcpy(&ifps->xfp_save_state.fp_reg_word[i], &user_fp_regs->fp_reg_word[i], sizeof(user_fp_regs->fp_reg_word[i])); + } else { + ifps->fp_save_state.fp_control = user_fp_state->fp_control; + ifps->fp_save_state.fp_status = user_fp_state->fp_status; + ifps->fp_save_state.fp_tag = user_fp_state->fp_tag; + ifps->fp_save_state.fp_eip = user_fp_state->fp_eip; + ifps->fp_save_state.fp_cs = user_fp_state->fp_cs; + ifps->fp_save_state.fp_opcod
Re: [PATCH] add xfloat thread state interface
Hi, Il 05/08/24 20:28, Sergey Bugaev ha scritto: On Mon, Aug 5, 2024 at 9:23 PM Samuel Thibault wrote: Luca Dariz, le lun. 05 août 2024 14:52:24 +0200, a ecrit: Il 05/08/24 00:32, Samuel Thibault ha scritto: Luca Dariz, le ven. 02 août 2024 17:32:34 +0200, a ecrit: +#define XFP_STATE_BYTES (sizeof (struct i386_xfp_save)) That is not sufficient: depending on the sse level and the saving style, we have various amount of data to store. See fp_xsave_size computed in init_fpu, we have to save all of that. We thus have to make get_state check that the user-land buffer is large enough for that. And as I mentioned on irc, the i386_xfloat_state stucture should contain the fp_save_kind for glibc to know how to restore it. Right, maybe we also need a way for userspace to discover the required size? e.g. with a separate call i386_get_xstate_size(). I was thinking that perhaps glibc can compute it itself with cpuid, but it'll be simpler to add some call to gnumach indeed. Samuel I haven't looked closely, but just a quick reminder that MIG can return variable-sized data just fine, and the caller (glibc) will then learn the size after having received the data. The machine_thread_all_state abstraction (that glibc uses internally) does make this somewhat awkward though. AFAIK in thread_get_state() the caller needs to allocate enough space in any case, even if the exact amount is not known, as handling MIG_ARRAY_TOO_LARGE would lead to a partial state save. We could check if there is an upper bound on this, but it might result in more maintenance work if in the future the extended state is increased to a bigger size. Retrieving the amount with an arch-specific call, we only need to handle this in gnumach (my understanding is that glibc just needs to save and restore the xfp state, without inspecting it, during signal handling). Luca
Re: [PATCH] add xfloat thread state interface
Il 05/08/24 00:32, Samuel Thibault ha scritto: Hello, Luca Dariz, le ven. 02 août 2024 17:32:34 +0200, a ecrit: +#define XFP_STATE_BYTES (sizeof (struct i386_xfp_save)) That is not sufficient: depending on the sse level and the saving style, we have various amount of data to store. See fp_xsave_size computed in init_fpu, we have to save all of that. We thus have to make get_state check that the user-land buffer is large enough for that. And as I mentioned on irc, the i386_xfloat_state stucture should contain the fp_save_kind for glibc to know how to restore it. Right, maybe we also need a way for userspace to discover the required size? e.g. with a separate call i386_get_xstate_size(). Luca
[PATCH] add xfloat thread state interface
* i386/i386/fpu.c: extend current getter and setter to support the extended state; move the struct casting here to reuse the locking and allocation logic for the thread state. * i386/i386/fpu.h: update prototypes to accept generic thread state * i386/i386/pcb.c: forward raw thread state to getter and setter, only checking for minimum size. * i386/include/mach/i386/thread_status.h: add interface definition for I386_XFLOAT_STATE and the corresponding data structure. --- i386/i386/fpu.c| 115 ++--- i386/i386/fpu.h| 8 +- i386/i386/pcb.c| 23 - i386/include/mach/i386/thread_status.h | 12 +++ 4 files changed, 115 insertions(+), 43 deletions(-) diff --git a/i386/i386/fpu.c b/i386/i386/fpu.c index 316e3b41..4f783293 100644 --- a/i386/i386/fpu.c +++ b/i386/i386/fpu.c @@ -385,10 +385,11 @@ twd_fxsr_to_i387 (struct i386_xfp_save *fxsave) * concurrent fpu_set_state or fpu_get_state. */ kern_return_t -fpu_set_state(const thread_t thread, - struct i386_float_state *state) +fpu_set_state(const thread_t thread, void *state, int flavor) { pcb_t pcb = thread->pcb; + struct i386_float_state *fstate = (struct i386_float_state*)state; + struct i386_xfloat_state *xfstate = (struct i386_xfloat_state*)state; struct i386_fpsave_state *ifps; struct i386_fpsave_state *new_ifps; @@ -410,7 +411,8 @@ ASSERT_IPL(SPL0); } #endif - if (state->initialized == 0) { + if ((flavor == i386_FLOAT_STATE && fstate->initialized == 0) || + (flavor == i386_XFLOAT_STATE && xfstate->initialized == 0)) { /* * new FPU state is 'invalid'. * Deallocate the fp state if it exists. @@ -428,13 +430,6 @@ ASSERT_IPL(SPL0); /* * Valid state. Allocate the fp state if there is none. */ - struct i386_fp_save *user_fp_state; - struct i386_fp_regs *user_fp_regs; - - user_fp_state = (struct i386_fp_save *) &state->hw_state[0]; - user_fp_regs = (struct i386_fp_regs *) - &state->hw_state[sizeof(struct i386_fp_save)]; - new_ifps = 0; Retry: simple_lock(&pcb->lock); @@ -455,9 +450,41 @@ ASSERT_IPL(SPL0); */ memset(ifps, 0, fp_xsave_size); - if (fp_save_kind != FP_FNSAVE) { - int i; + if (flavor == i386_FLOAT_STATE) { + struct i386_fp_save *user_fp_state; + struct i386_fp_regs *user_fp_regs; + + user_fp_state = (struct i386_fp_save *) &fstate->hw_state[0]; + user_fp_regs = (struct i386_fp_regs *) + &fstate->hw_state[sizeof(struct i386_fp_save)]; + if (fp_save_kind != FP_FNSAVE) { + int i; + + ifps->xfp_save_state.fp_control = user_fp_state->fp_control; + ifps->xfp_save_state.fp_status = user_fp_state->fp_status; + ifps->xfp_save_state.fp_tag = twd_i387_to_fxsr(user_fp_state->fp_tag); + ifps->xfp_save_state.fp_eip = user_fp_state->fp_eip; + ifps->xfp_save_state.fp_cs = user_fp_state->fp_cs; + ifps->xfp_save_state.fp_opcode = user_fp_state->fp_opcode; + ifps->xfp_save_state.fp_dp = user_fp_state->fp_dp; + ifps->xfp_save_state.fp_ds = user_fp_state->fp_ds; + for (i=0; i<8; i++) + memcpy(&ifps->xfp_save_state.fp_reg_word[i], &user_fp_regs->fp_reg_word[i], sizeof(user_fp_regs->fp_reg_word[i])); + } else { + ifps->fp_save_state.fp_control = user_fp_state->fp_control; + ifps->fp_save_state.fp_status = user_fp_state->fp_status; + ifps->fp_save_state.fp_tag = user_fp_state->fp_tag; + ifps->fp_save_state.fp_eip = user_fp_state->fp_eip; + ifps->fp_save_state.fp_cs = user_fp_state->fp_cs; + ifps->fp_save_state.fp_opcode = user_fp_state->fp_opcode; + ifps->fp_save_state.fp_dp = user_fp_state->fp_dp; + ifps->fp_save_state.fp_ds = user_fp_state->fp_ds; + ifps->fp_regs = *user_fp_regs; + } + } else if (flavor == i386_XFLOAT_STATE) { + int i; + struct i386_xfp_save *user_fp_state = (struct i386_xfp_save *) &xfstate->hw_state[0]; ifps->xfp_save_state.fp_control = user_fp_state->fp_control; ifps->xfp_save_state.fp_status = user_fp_state->fp_status; ifps->xfp_save_state.fp_tag = twd_i387_to_fxsr(user_fp_state->fp_tag); @@ -467,17 +494,11 @@ ASSERT_IPL(SPL0); ifps->xfp_save_state.fp_dp = user_fp_state->fp_dp; ifps->x
[PATCH 2/2] tests/machmsg: check rx message size on different code paths
* tests/test-machmsg.c: add more combinations to existing cases: - make tx and rx ports independent in the send/receive tests - add two more variants for send/receive tests, using two separate system calls, using different code paths in mach_msg(). --- tests/test-machmsg.c | 117 +-- 1 file changed, 114 insertions(+), 3 deletions(-) diff --git a/tests/test-machmsg.c b/tests/test-machmsg.c index 60f3f49f..ac292376 100644 --- a/tests/test-machmsg.c +++ b/tests/test-machmsg.c @@ -40,6 +40,7 @@ static uint32_t align(uint32_t val, size_t aln) struct echo_params { + mach_port_t tx_port; mach_port_t rx_port; mach_msg_size_t rx_size; mach_msg_size_t rx_number; @@ -110,6 +111,7 @@ test_iterations (void) struct echo_params params; params.rx_port = port; + params.tx_port = port; params.rx_size = sizeof(message.header) + sizeof(message.type) + 5; ALIGN_INLINE(params.rx_size, MACH_MSG_USER_ALIGNMENT); params.rx_number = TEST_ITERATIONS; @@ -183,6 +185,7 @@ run_test_simple(void *msg, mach_msg_size_t msglen, mach_msg_id_t msgid) ASSERT_RET(err, "syscall_mach_port_allocate 2"); struct echo_params params; + params.tx_port = MACH_PORT_NULL; params.rx_port = port; params.rx_size = msglen; params.rx_number = 1; @@ -208,6 +211,63 @@ run_test_simple(void *msg, mach_msg_size_t msglen, mach_msg_id_t msgid) ASSERT(head->msgh_size == msglen, "wrong size in final rx"); } +/* same as run_test_simple(), but use two different sysccalls for tx and rx */ +void +run_test_simple_split(void *msg, mach_msg_size_t msglen, mach_msg_id_t msgid) +{ + mach_msg_header_t *head = msg; + mach_port_t port, receive; + int err; + + err = syscall_mach_port_allocate (mach_task_self (), +MACH_PORT_RIGHT_RECEIVE, &port); + ASSERT_RET(err, "syscall_mach_port_allocate"); + + err = syscall_mach_port_allocate (mach_task_self (), +MACH_PORT_RIGHT_RECEIVE, &receive); + ASSERT_RET(err, "syscall_mach_port_allocate 2"); + + struct echo_params params; + params.tx_port = receive; + params.rx_port = port; + params.rx_size = msglen; + params.rx_number = 1; + test_thread_start (mach_task_self (), echo_thread, ¶ms); + + head->msgh_bits = MACH_MSGH_BITS(MACH_MSG_TYPE_MAKE_SEND, + MACH_MSG_TYPE_MAKE_SEND_ONCE); + head->msgh_remote_port = port; + head->msgh_local_port = receive; + head->msgh_id = msgid; + head->msgh_size = 0; // check that the echo thread receives the correct size + + err = mach_msg (msg, + MACH_SEND_MSG, + msglen, + 0, + MACH_PORT_NULL, + MACH_MSG_TIMEOUT_NONE, + MACH_PORT_NULL); + ASSERT_RET(err, "mach_msg tx"); + + memset(msg, 0, msglen); + + err = mach_msg (msg, + MACH_RCV_MSG, + 0, + msglen, + receive, + MACH_MSG_TIMEOUT_NONE, + MACH_PORT_NULL); + ASSERT_RET(err, "mach_msg rx"); + + printf("size in final rx: %d expected %d\n", head->msgh_size, msglen); + ASSERT(head->msgh_size == msglen, "wrong size in final rx"); +} + +/* Text tx and rx of a message, without using a different thread. We + * also use the same port to send and receive the message. + */ void run_test_simple_self(void *msg, mach_msg_size_t msglen, mach_msg_id_t msgid) { @@ -222,9 +282,6 @@ run_test_simple_self(void *msg, mach_msg_size_t msglen, mach_msg_id_t msgid) head->msgh_bits = MACH_MSGH_BITS (MACH_MSG_TYPE_MAKE_SEND, MACH_MSG_TYPE_MAKE_SEND_ONCE); - /* head->msgh_bits */ - /* = MACH_MSGH_BITS (MACH_MSG_TYPE_MAKE_SEND_ONCE, */ - /* MACH_MSG_TYPE_COPY_SEND); */ head->msgh_bits |= MACH_MSGH_BITS_COMPLEX; head->msgh_remote_port = port; @@ -245,6 +302,52 @@ run_test_simple_self(void *msg, mach_msg_size_t msglen, mach_msg_id_t msgid) ASSERT(head->msgh_size == msglen, "wrong size in final rx\n"); } +/* same as run_test_simple_self(), but use two different sysccalls for tx and rx */ +void +run_test_simple_self_split(void *msg, mach_msg_size_t msglen, mach_msg_id_t msgid) +{ + mach_msg_header_t *head = msg; + mach_port_t port, receive; + int err; + + err = syscall_mach_port_allocate (mach_task_self (), +MACH_PORT_RIGHT_RECEIVE, &port); + ASSERT_RET(err, "syscall_mach_port_allocate"); + + head->msgh_bits += MACH_MSGH_BITS (MACH_MSG_TYPE_MAKE_SEND, + MACH_MSG_TYPE_MAKE_SEND_ONCE); + + head->msgh_bits |= MACH_MSGH_BITS_COMPLEX; + head->msgh_remote_port = port; + head->msgh_local_port = port; + head->msgh_id = msgid; + head->msgh_size = msglen; + + err = mach_msg (msg, + MACH_SEND_MSG, + msglen, + 0, + port, + MACH_MSG_TIMEOUT_NON
[PATCH 1/2] x86_64: fix msg size forwarding in case it's not set by userspace
* ipc/copy_user.c: recent MIG stubs should always fill the size correctly in the msg header, but we shouldn't rely on that. Instead, we use the size that was correctly copied-in, overwriting the value in the header. This is already done by the 32-bit copyinmsg(), and was missing in the 64-bit version. Furthermore, the assertion about user/kernel size make sense with and without USER32, so take it out if the #ifdef. --- ipc/copy_user.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/ipc/copy_user.c b/ipc/copy_user.c index a4b238de..850ea49e 100644 --- a/ipc/copy_user.c +++ b/ipc/copy_user.c @@ -442,16 +442,18 @@ int copyinmsg (const void *userbuf, void *kernelbuf, const size_t usize, const s } kmsg->msgh_size = sizeof(mach_msg_header_t) + ksaddr - (vm_offset_t)(kmsg + 1); - assert(kmsg->msgh_size <= ksize); #else /* The 64 bit interface ensures the header is the same size, so it does not need any resizing. */ _Static_assert(sizeof(mach_msg_header_t) == sizeof(mach_msg_user_header_t), "mach_msg_header_t and mach_msg_user_header_t expected to be of the same size"); if (copyin(umsg, kmsg, usize)) return 1; + + kmsg->msgh_size = usize; kmsg->msgh_remote_port &= 0x; // FIXME: still have port names here kmsg->msgh_local_port &= 0x; // also, this assumes little-endian #endif + assert(kmsg->msgh_size <= ksize); return 0; } -- 2.39.2
Re: [PATCH 1/2] RFC enhance tracing utilities
Il 09/03/24 16:24, Etienne Brateau ha scritto: Le sam. 9 mars 2024 à 15:03, Luca Dariz diff --git a/i386/i386/debug.h b/i386/i386/debug.h index 84397ba8..eff330c6 100644 --- a/i386/i386/debug.h +++ b/i386/i386/debug.h @@ -54,6 +54,7 @@ void debug_trace_dump(void); #else /* __ASSEMBLER__ */ +#ifndef __x86_64__ #define DEBUG_TRACE \ pushl $__LINE__ ;\ pushl $9f ;\ @@ -62,10 +63,21 @@ void debug_trace_dump(void); .data ;\ 9: .ascii __FILE__"\0" ;\ .text - +#else /* __x86_64__ */ +#define DEBUG_TRACE \ + pushq %rdi ;\ + pushq %rs1 ;\ Is it really %rs1 here and not %rsi ? you are pushing rs1 but popping rsi it's probably %rsi, this part is still to be cleaned up (see the list in the descritption) diff --git a/i386/i386/locore.S b/i386/i386/locore.S index 9d0513a1..26c5843c 100644 --- a/i386/i386/locore.S +++ b/i386/i386/locore.S @@ -634,6 +634,15 @@ ENTRY(thread_bootstrap_return) */ ENTRY(thread_syscall_return) +#if KERNEL_TRACE + testb $0xff,EXT(syscall_trace) + jz 1f + movl S_ARG0,%eax /* get return value */ + pushl %eax + call syscall_trace_return + movl %eax,S_ARG0 /* restore return value */ +1: +#endif movl S_ARG0,%eax /* get return value */ movl %esp,%ecx /* get kernel stack */ or $(KERNEL_STACK_SIZE-1),%ecx @@ -1174,18 +1183,26 @@ syscall_native: mach_call_call: -#ifdef DEBUG +#if KERNEL_TRACE testb $0xff,EXT(syscall_trace) jz 0f - pushl %eax - call EXT(syscall_trace_print) - /* will return with syscallofs still (or again) in eax */ - addl $4,%esp + pushl %eax /* add syscall num to args array */ + pushl %esp /* args array is the first argument*/ + call EXT(syscall_trace_enter) + popl %eax + popl %eax you are popping eax twice here, shouldn’t the second one be esp? The first popl just discards the %esp value, the second one restores the correct %eax, so the real syscall can be called as if nothing had happened. For i386 all function parameters are on the stack, so they are not touched by the tracing call. Luca
[PATCH 2/2] RFC: add kernel trace utilities
These are some utilities I used to decode and control the kernel trace, compiled as part of gnumach just so it's easier to build them. I'm not sure if they should be in their own package, for now they complement the kernel tracing patch so it can be tested more easily. With ktrace we can collect the trace of a specific executable, or to manually start/stop tracing. With parsetrace we can decode the trace file collected (with ktrace or from gdb) handling different configurations and mixed 32/64 bit kernel/user environments. This helps a lot when bootstrapping a new platform. The usage is very basic for now, but advanced filtering can be implemented (even with bindings to other languages). --- tests/Makefrag.am| 9 + tests/ktrace.c | 199 +++ tests/parsetrace.cpp | 594 +++ 3 files changed, 802 insertions(+) create mode 100644 tests/ktrace.c create mode 100644 tests/parsetrace.cpp diff --git a/tests/Makefrag.am b/tests/Makefrag.am index faabdf44..df1e1e90 100644 --- a/tests/Makefrag.am +++ b/tests/Makefrag.am @@ -32,3 +32,12 @@ clean-local: $(USER_TESTS_CLEAN) rm -fr tests/include-mach endif # !PLATFORM_xen + +parsetrace: tests/parsetrace.cpp $(MACH_TESTINSTALL) + $(CXX) -Wall -ggdb3 -I$(MACH_TESTINCLUDE) -I$(MIG_OUTDIR) -o $@ $< + +ktrace: tests/ktrace.c $(MIG_OUTDIR)/gnumach.user.c $(MIG_OUTDIR)/mach.user.c + $(CC) -Wall -ggdb3 -I$(MACH_TESTINCLUDE) -I$(MIG_OUTDIR) -o $@ $^ + + +include tests/hurd-integration.mk diff --git a/tests/ktrace.c b/tests/ktrace.c new file mode 100644 index ..b8b7c607 --- /dev/null +++ b/tests/ktrace.c @@ -0,0 +1,199 @@ +/* + * Copyright (C) 2024 Free Software Foundation + * + * This program is free software ; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation ; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY ; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with the program ; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include + + +/* start kernel tracing */ +void do_start(mach_port_t hostp, size_t bufsize, int kinds) +{ + int err = ktrace_start(hostp, bufsize, kinds); + if (err) +error(2, err, "ktrace_start"); +} + +/* stop kernel tracing */ +void do_stop(mach_port_t hostp) +{ + int err = ktrace_stop(hostp); + if (err) +error(2, err, "ktrace_stop"); +} + +/* collect tracing data and free the kernel resources */ +void do_collect(mach_port_t hostp, const char *path) +{ + vm_offset_t buf, last_ptr; + mach_msg_type_number_t bufsize; + int flags, err; + err = ktrace_collect(hostp, &buf, &bufsize, &last_ptr, &flags); + if (err) +error(1, err, "ktrace_collect"); + printf("out %s size %u last %lu flags %x overrun %d\n", path, bufsize, last_ptr, flags, flags & 0xF); + if (bufsize < last_ptr) +printf("ERROR overrun?\n"); + FILE *fp = fopen(path, "w"); + if (fp) +{ + trace_header_t th; + memset(&th, 0, sizeof(th)); + memcpy(th.magic, TH_MAGIC, 4); + th.flags = flags; + th.last = last_ptr; + err = fwrite(&th, 1, sizeof(th), fp); + if (err <= 0) +error(0, errno, "fwrite header"); + + err = fwrite((void*)buf, 1, bufsize, fp); + if (err < bufsize) +error(0, errno, "fwrite"); + fclose(fp); +} + else +error(0, errno, "fopen, trace file not written"); + + err = vm_deallocate(mach_task_self(), buf, bufsize); + if (err) +error(0, err, "vm_deallocate"); + + err = ktrace_free(hostp); + if (err) +error(1, err, "ktrace_free"); +} + +static const char *usage_text = + "usage: ktrace [ -n BUFSIZE_M ] [ -m MASK ] [ -o TRACEFILE ] start|stop|collect|run [ prog arg1 arg2 ... ]\n\n" + " -n BUFSIZE (on start) is in MiB, default 10 MiB\n" + " -m MASK (on start) is an integer bitwise-mask of the traced events, default all:\n" + " 1 - raw syscall entry/exit\n" + " 2 - user or kernel trap\n" + " 4 - context switch\n" + " 8 - interrupt handlers\n" + " 16 - raw IPC send/recv, traced on copyin/copyout. If mach_msg() fails copyout is skipped.\n" + " -o TRACEFILE (on collect) will contain the dump of trace ring buffer, the default file is trace.bin\n\n" + "start|stop|collect allow to manage the tracing freely\n" + "run atomatically starts and stop tracing of the command [ prog arg1 arg2 ... ].\n" + ; + +int main(int argc, char *argv[]) +{ + int err; + mach_port_t hostp; + size_t b
[PATCH 3/3] move x86 copy_user.[ch] to ipc/ and make it arch-indipendent
From: LD --- Makefrag.am| 2 ++ i386/Makefrag.am | 1 - {x86_64 => ipc}/copy_user.c| 7 +-- {i386/i386 => ipc}/copy_user.h | 18 +- ipc/ipc_kmsg.c | 2 +- ipc/ipc_mqueue.c | 2 +- ipc/mach_msg.c | 2 +- kern/ipc_mig.c | 2 +- x86_64/Makefrag.am | 1 - 9 files changed, 20 insertions(+), 17 deletions(-) rename {x86_64 => ipc}/copy_user.c (99%) rename {i386/i386 => ipc}/copy_user.h (90%) diff --git a/Makefrag.am b/Makefrag.am index 5b61a1d6..82fce628 100644 --- a/Makefrag.am +++ b/Makefrag.am @@ -76,6 +76,8 @@ endif # libkernel_a_SOURCES += \ + ipc/copy_user.c \ + ipc/copy_user.h \ ipc/ipc_entry.c \ ipc/ipc_entry.h \ ipc/ipc_init.c \ diff --git a/i386/Makefrag.am b/i386/Makefrag.am index 58ee3273..5e7d4740 100644 --- a/i386/Makefrag.am +++ b/i386/Makefrag.am @@ -91,7 +91,6 @@ endif # libkernel_a_SOURCES += \ - i386/i386/copy_user.h \ i386/i386/cswitch.S \ i386/i386/debug_trace.S \ i386/i386/idt_inittab.S \ diff --git a/x86_64/copy_user.c b/ipc/copy_user.c similarity index 99% rename from x86_64/copy_user.c rename to ipc/copy_user.c index c6e125d9..5c6329d3 100644 --- a/x86_64/copy_user.c +++ b/ipc/copy_user.c @@ -16,14 +16,15 @@ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */ +#ifdef __LP64__ + #include #include +#include #include #include -#include - /* Mach field descriptors measure size in bits */ #define descsize_to_bytes(n) (n / 8) @@ -611,3 +612,5 @@ int copyoutmsg (const void *kernelbuf, void *userbuf, const size_t ksize) return 0; } + +#endif /* __LP64__ */ diff --git a/i386/i386/copy_user.h b/ipc/copy_user.h similarity index 90% rename from i386/i386/copy_user.h rename to ipc/copy_user.h index 3d1c7278..a57b3ee5 100644 --- a/i386/i386/copy_user.h +++ b/ipc/copy_user.h @@ -28,7 +28,7 @@ /* * The copyin_32to64() and copyout_64to32() routines are meant for data types * that have different size in kernel and user space. They should be independent - * of endianness and hopefully can be reused in the future on other archs. + * of endianness and hopefully can be reused on all archs. * These types are e.g.: * - port names vs port pointers, on a 64-bit kernel * - memory addresses, on a 64-bit kernel and 32-bit user @@ -71,23 +71,23 @@ static inline int copyout_address(const vm_offset_t *kaddr, rpc_vm_offset_t *uad static inline int copyin_port(const mach_port_name_t *uaddr, mach_port_t *kaddr) { -#ifdef __x86_64__ +#ifdef __LP64__ return copyin_32to64(uaddr, kaddr); -#else /* __x86_64__ */ +#else /* __LP64__ */ return copyin(uaddr, kaddr, sizeof(*uaddr)); -#endif /* __x86_64__ */ +#endif /* __LP64__ */ } static inline int copyout_port(const mach_port_t *kaddr, mach_port_name_t *uaddr) { -#ifdef __x86_64__ +#ifdef __LP64__ return copyout_64to32(kaddr, uaddr); -#else /* __x86_64__ */ +#else /* __LP64__ */ return copyout(kaddr, uaddr, sizeof(*kaddr)); -#endif /* __x86_64__ */ +#endif /* __LP64__ */ } -#if defined(__x86_64__) && defined(USER32) +#if defined(__LP64__) && defined(USER32) /* For 32 bit userland, kernel and user land messages are not the same size. */ size_t msg_usize(const mach_msg_header_t *kmsg); #else @@ -95,6 +95,6 @@ static inline size_t msg_usize(const mach_msg_header_t *kmsg) { return kmsg->msgh_size; } -#endif /* __x86_64__ && USER32 */ +#endif /* __LP64__ && USER32 */ #endif /* COPY_USER_H */ diff --git a/ipc/ipc_kmsg.c b/ipc/ipc_kmsg.c index bd843804..8bd645ff 100644 --- a/ipc/ipc_kmsg.c +++ b/ipc/ipc_kmsg.c @@ -42,7 +42,6 @@ #include #include #include -#include #include #include #include @@ -51,6 +50,7 @@ #include #include #include +#include #include #include #include diff --git a/ipc/ipc_mqueue.c b/ipc/ipc_mqueue.c index 44e1eb98..95308f35 100644 --- a/ipc/ipc_mqueue.c +++ b/ipc/ipc_mqueue.c @@ -36,13 +36,13 @@ #include #include -#include #include #include #include #include #include #include +#include #include #include #include diff --git a/ipc/mach_msg.c b/ipc/mach_msg.c index 6194ef7b..ff5e5b09 100644 --- a/ipc/mach_msg.c +++ b/ipc/mach_msg.c @@ -39,7 +39,6 @@ #include #include #include -#include #include #include #include @@ -49,6 +48,7 @@ #include #include #include +#include #include #include #include diff --git a/kern/ipc_mig.c b/kern/ipc_mig.c index d26d2c6d..b753a25f 100644 --- a/kern/ipc_mig.c +++ b/kern/ipc_mig.c @@ -30,7 +30,6 @@ #include #include #include -#include #include #include #include @@ -42,6 +41,7 @@ #include #include #include +#include #include #include #include diff --git a/x86_64/Makefrag.am b/x86_64/Makefrag.am index b0bc45c2..2bbed986 100644 --- a/x86_64/Makefrag.am +++ b/x86_64/Makefrag.am @@ -90,7 +90,6 @@ libkernel_a_SOURCES += \ i386/i386/
[PATCH 1/2] RFC enhance tracing utilities
This extends the previous debug utility to trace system calls with more events and the ability to control the tracing from userspace, collecting a trace of the whole systems. This tool was quite useful in porting the rpc format to 64 bit and handle the 32to64 translation, but also to debug user-space programs. It's still a draft with some limitations and open points: - DEBUG_TRACE for 64 bit is missing - check if we can have meaningful and reliable data removing the syscall number on syscall return; this would avoid saving it in the thread structure. - add more info to each trace entry, e.g. current CPU or timestamps - trace more events - add more filtering capabilities - trace a single subhurd - trace format is not exported --- Makefrag.am | 7 + configfrag.ac | 12 ++ i386/i386/debug.h | 16 +- i386/i386/debug_i386.c| 45 ++-- i386/i386/locore.S| 33 ++- i386/i386/pcb.c | 4 + i386/i386/trap.c | 5 + include/mach/gnumach.defs | 33 +++ include/mach/trace.h | 61 ++ ipc/ipc_kmsg.c| 6 +- ipc/mach_msg.c| 2 + kern/exception.c | 1 + kern/startup.c| 2 + kern/thread.h | 4 + kern/trace.c | 432 ++ kern/trace.h | 42 scripts/gnumach-gdb.py| 63 ++ tests/test-trace.c| 58 + tests/user-qemu.mk| 3 +- x86_64/debug_trace.S | 9 +- x86_64/interrupt.S| 19 ++ x86_64/locore.S | 84 +++- 22 files changed, 887 insertions(+), 54 deletions(-) create mode 100644 include/mach/trace.h create mode 100644 kern/trace.c create mode 100644 kern/trace.h create mode 100644 scripts/gnumach-gdb.py create mode 100644 tests/test-trace.c diff --git a/Makefrag.am b/Makefrag.am index 82fce628..25d73ec3 100644 --- a/Makefrag.am +++ b/Makefrag.am @@ -221,6 +221,12 @@ libkernel_a_SOURCES += \ kern/xpr.h \ kern/elf-load.c \ kern/boot_script.c + +if KERNEL_TRACE +libkernel_a_SOURCES += \ + kern/trace.c +endif + EXTRA_DIST += \ kern/exc.defs \ kern/mach.srv \ @@ -407,6 +413,7 @@ include_mach_HEADERS = \ include/mach/profilparam.h \ include/mach/std_types.h \ include/mach/syscall_sw.h \ + include/mach/trace.h \ include/mach/task_info.h \ include/mach/task_special_ports.h \ include/mach/thread_info.h \ diff --git a/configfrag.ac b/configfrag.ac index b8b41261..f4dbe63d 100644 --- a/configfrag.ac +++ b/configfrag.ac @@ -139,6 +139,18 @@ AC_DEFINE([SLAB_VERIFY], [0], [SLAB_VERIFY]) # Enable the CPU pool layer in the slab allocator. AC_DEFINE([SLAB_USE_CPU_POOLS], [0], [SLAB_USE_CPU_POOLS]) + +# enable support for tracing various kernel events +AC_ARG_ENABLE([kernel-trace], + AS_HELP_STRING([--enable-kernel-trace], [Enable kernel event tracing])) +[if [ x"$enable_kernel_trace" = xyes ]; then] + AC_DEFINE([KERNEL_TRACE], [1], [Enable kernel event tracing]) + AM_CONDITIONAL([KERNEL_TRACE], [true]) +[else] + AC_DEFINE([KERNEL_TRACE], [0], [Enable kernel event tracing]) + AM_CONDITIONAL([KERNEL_TRACE], [false]) +[fi] + # # Options. diff --git a/i386/i386/debug.h b/i386/i386/debug.h index 84397ba8..eff330c6 100644 --- a/i386/i386/debug.h +++ b/i386/i386/debug.h @@ -54,6 +54,7 @@ void debug_trace_dump(void); #else /* __ASSEMBLER__ */ +#ifndef __x86_64__ #define DEBUG_TRACE\ pushl $__LINE__ ;\ pushl $9f ;\ @@ -62,10 +63,21 @@ void debug_trace_dump(void); .data ;\ 9: .ascii __FILE__"\0";\ .text - +#else /* __x86_64__ */ +#define DEBUG_TRACE\ + pushq %rdi;\ + pushq %rs1;\ + movq$__LINE__,%rdi ;\ + movq$9f,%rsi;\ + call__debug_trace ;\ +popq %rsi;\ +popq %rdi;\ + .data ;\ +9: .ascii __FILE__"\0";\ + .text +#endif #endif /* __ASSEMBLER__ */ - #endif /* DEBUG */ /* XXX #include_next "debug.h" */ diff --git a/i386/i386/debug_i386.c b/i386/i386/debug_i386.c index 41d032e3..61d73dff 100644 --- a/i386/i386/debug_i386.c +++ b/i386/i386/debug_i386.c @@ -22,6 +22,7 @@ */ #include +#include #include "thread.h" #include "trap.h" @@ -140,39 +141,23 @@ debug_trace_dump(void) splx(s); } +#endif /* DEBUG */ -#include - -int syscall_trace = 0; -task_t syscall_trace_task; +#if KERNEL_TRACE -int -syscall_trace_print(int syscallvec, ...) +#ifndef __x86_64__ +#define SYSCALL_NUM_SHIFT 4 +#else +#define SYSCALL_NUM_SHIFT 5 +#endif
[PATCH 1/3] x86_64: split SET_KERNEL_SEGMENTS() for NCPU > 1
This allows 32on64 to work again. Also, it's a clearer indication of a missing part. --- x86_64/locore.S | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/x86_64/locore.S b/x86_64/locore.S index 806762bb..8f39a677 100644 --- a/x86_64/locore.S +++ b/x86_64/locore.S @@ -163,17 +163,21 @@ #define POP_SEGMENTS_ISR(reg) #endif +#if NCPUS > 1 +#define SET_KERNEL_SEGMENTS(reg)\ + ud2 /* TODO: use swapgs or similar */ +#else // NCPUS > 1 #ifdef USER32 #define SET_KERNEL_SEGMENTS(reg) \ mov %ss,reg /* switch to kernel segments */ ;\ mov reg,%ds /* (same as kernel stack segment) */ ;\ mov reg,%es ;\ mov reg,%fs ;\ - mov $(PERCPU_DS),reg;\ mov reg,%gs -#else +#else // USER32 #define SET_KERNEL_SEGMENTS(reg) -#endif +#endif // USER32 +#endif // NCPUS > 1 /* * Fault recovery. -- 2.39.2
[PATCH 2/3] remove machine/machspl.h as it duplicates machine/spl.h
From: LD --- device/chario.c | 2 +- device/ds_routines.c | 4 ++-- device/net_io.c | 4 ++-- i386/Makefrag_x86.am | 1 - i386/i386/db_interface.c | 2 +- i386/i386/db_trace.c | 2 +- i386/i386/fpu.c | 2 +- i386/i386/ipl.h | 2 +- i386/i386/irq.c | 2 +- i386/i386/machspl.h | 29 - i386/i386/pic.c | 2 +- i386/i386/trap.c | 2 +- i386/i386at/com.c| 2 +- i386/i386at/kd.c | 2 +- i386/i386at/kd.h | 2 +- i386/i386at/kd_event.c | 2 +- i386/i386at/model_dep.c | 2 +- i386/i386at/rtc.c| 2 +- kern/ast.c | 2 +- kern/eventcount.c| 2 +- kern/ipc_host.c | 2 +- kern/ipc_sched.c | 2 +- kern/mach_clock.c| 2 +- kern/machine.c | 2 +- kern/priority.c | 2 +- kern/sched_prim.c| 2 +- kern/startup.c | 2 +- kern/syscall_subr.c | 2 +- kern/task.c | 2 +- kern/thread.c| 2 +- kern/thread_swap.c | 2 +- kern/xpr.c | 2 +- linux/dev/arch/i386/kernel/irq.c | 2 +- linux/dev/kernel/sched.c | 2 +- xen/console.c| 2 +- 35 files changed, 35 insertions(+), 65 deletions(-) delete mode 100644 i386/i386/machspl.h diff --git a/device/chario.c b/device/chario.c index 3fe93ccb..efb55867 100644 --- a/device/chario.c +++ b/device/chario.c @@ -34,7 +34,7 @@ #include #include #include -#include/* spl definitions */ +#include/* spl definitions */ #include diff --git a/device/ds_routines.c b/device/ds_routines.c index d97d229e..439fc5b3 100644 --- a/device/ds_routines.c +++ b/device/ds_routines.c @@ -63,7 +63,7 @@ #include #include #include -#include/* spl definitions */ +#include/* spl definitions */ #include #include @@ -95,7 +95,7 @@ #include #include -#include +#include #ifdef LINUX_DEV extern struct device_emulation_ops linux_block_emulation_ops; diff --git a/device/net_io.c b/device/net_io.c index ee9435d7..efde9d6c 100644 --- a/device/net_io.c +++ b/device/net_io.c @@ -42,7 +42,7 @@ #include #include -#include/* spl definitions */ +#include/* spl definitions */ #include #include #include @@ -64,7 +64,7 @@ #include #include -#include +#include #ifMACH_TTD #include diff --git a/i386/Makefrag_x86.am b/i386/Makefrag_x86.am index 272de023..a6c7a5c8 100644 --- a/i386/Makefrag_x86.am +++ b/i386/Makefrag_x86.am @@ -50,7 +50,6 @@ libkernel_a_SOURCES += \ i386/i386/mach_param.h \ i386/i386/machine_routines.h \ i386/i386/machine_task.c \ - i386/i386/machspl.h \ i386/i386/model_dep.h \ i386/i386/mp_desc.c \ i386/i386/mp_desc.h \ diff --git a/i386/i386/db_interface.c b/i386/i386/db_interface.c index 483991d6..392e2b69 100644 --- a/i386/i386/db_interface.c +++ b/i386/i386/db_interface.c @@ -59,7 +59,7 @@ #include #include #include -#include +#include #if MACH_KDB /* Whether the kernel uses any debugging register. */ diff --git a/i386/i386/db_trace.c b/i386/i386/db_trace.c index 0ef72518..abc32554 100644 --- a/i386/i386/db_trace.c +++ b/i386/i386/db_trace.c @@ -34,7 +34,7 @@ #include #include -#include +#include #include #include #include diff --git a/i386/i386/fpu.c b/i386/i386/fpu.c index 4cd31dd9..316e3b41 100644 --- a/i386/i386/fpu.c +++ b/i386/i386/fpu.c @@ -43,7 +43,7 @@ #include #include -#include/* spls */ +#include/* spls */ #include #include #include diff --git a/i386/i386/ipl.h b/i386/i386/ipl.h index 6e59b368..2d02e3c2 100644 --- a/i386/i386/ipl.h +++ b/i386/i386/ipl.h @@ -71,7 +71,7 @@ WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. #ifdef KERNEL #ifndef__ASSEMBLER__ -#include +#include /* Note that interrupts have varying signatures */ typedef void (*interrupt_handler_fn)(int); extern interrupt_handler_fn ivect[]; diff --git a/i386/i386/irq.c b/i386/i386/irq.c index a7c98890..3c2f1748 100644 --- a/i386/i386/irq.c +++ b/i386/i386/irq.c @@ -22,7 +22,7 @@ #include #include #include -#include +#include extern queue_head_t main_intr_queue; diff --git a/i386/i386/machspl.h b/i386/i386/machspl.h deleted file mode 100644 index bbb26754.. --- a/i386/i386/machspl.h +++ /dev/null @@ -1,29 +0,0 @@ -/* - * Mach Operating System - * Copyright (c) 1991,1990 Carnegie Mellon University - * All Rights Reserved. - * - * Permission to use, copy, modify and distribute this software and its - * documentation is hereby granted, provided that both the copyright - * notice and
Re: Help with cross install
Hi, Il 12/01/24 05:18, Joshua Branson ha scritto: Here is my /boot/grub/custom.cfg #+begin_example menuentry "pci-arbiter + acpi + rumpdisk" { set root=(hd0,msdos1) multiboot /boot/gnumach-1.8-486.gz root=part:1:device:wd0 noide -s you can try to add also "noahci" to gnumach command line after "noide"; this should prevent gnumach from probing AHCI devices and then rumpdisk should be able to find them. Luca
[PATCH 02/11] add mach_host tests
--- tests/test-mach_host.c | 81 ++ tests/user-qemu.mk | 3 +- 2 files changed, 83 insertions(+), 1 deletion(-) create mode 100644 tests/test-mach_host.c diff --git a/tests/test-mach_host.c b/tests/test-mach_host.c new file mode 100644 index ..53f30240 --- /dev/null +++ b/tests/test-mach_host.c @@ -0,0 +1,81 @@ +/* + * Copyright (C) 2024 Free Software Foundation + * + * This program is free software ; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation ; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY ; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with the program ; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#include + +#include + +void test_kernel_version() +{ + int err; + kernel_version_t kver; + err = host_get_kernel_version(mach_host_self(), kver); + ASSERT_RET(err, "host_kernel_info"); + printf("kernel version: %s\n", kver); +} + +void test_host_info() +{ + int err; + mach_msg_type_number_t count; + mach_port_t thishost = mach_host_self(); + + host_basic_info_data_t binfo; + count = HOST_BASIC_INFO_COUNT; + err = host_info(thishost, HOST_BASIC_INFO, (host_info_t)&binfo, &count); + ASSERT_RET(err, "host_basic_info"); + ASSERT(count == HOST_BASIC_INFO_COUNT, ""); + ASSERT(binfo.max_cpus > 0, "no cpu?"); + ASSERT(binfo.avail_cpus > 0, "no cpu available?"); + ASSERT(binfo.memory_size > (1024 * 1024), "memory too low"); + + const int maxcpus = 255; + int proc_slots[maxcpus]; + count = maxcpus; + err = host_info(thishost, HOST_PROCESSOR_SLOTS, (host_info_t)&proc_slots, &count); + ASSERT_RET(err, "host_processor_slots"); + ASSERT((1 <= count) && (count <= maxcpus), ""); + + host_sched_info_data_t sinfo; + count = HOST_SCHED_INFO_COUNT; + err = host_info(thishost, HOST_SCHED_INFO, (host_info_t)&sinfo, &count); + ASSERT_RET(err, "host_sched_info"); + ASSERT(count == HOST_SCHED_INFO_COUNT, ""); + ASSERT(sinfo.min_timeout < 1000, "timeout unexpectedly high"); + ASSERT(sinfo.min_quantum < 1000, "quantum unexpectedly high"); + + host_load_info_data_t linfo; + count = HOST_LOAD_INFO_COUNT; + err = host_info(thishost, HOST_LOAD_INFO, (host_info_t)&linfo, &count); + ASSERT_RET(err, "host_load_info"); + ASSERT(count == HOST_LOAD_INFO_COUNT, ""); + for (int i=0; i<3; i++) + { + printf("avenrun %d\n", linfo.avenrun[i]); + printf("mach_factor %d\n", linfo.mach_factor[i]); + } +} + +// TODO processor sets + +int main(int argc, char *argv[], int envc, char *envp[]) +{ + test_kernel_version(); + test_host_info(); + return 0; +} diff --git a/tests/user-qemu.mk b/tests/user-qemu.mk index 78775938..8b338241 100644 --- a/tests/user-qemu.mk +++ b/tests/user-qemu.mk @@ -195,7 +195,8 @@ clean-test-%: USER_TESTS := \ - tests/test-hello + tests/test-hello \ + tests/test-mach_host USER_TESTS_CLEAN = $(subst tests/,clean-,$(USER_TESTS)) -- 2.39.2
[PATCH 09/11] add raw mach_msg tests
--- tests/test-machmsg.c | 405 +++ tests/user-qemu.mk | 3 +- 2 files changed, 407 insertions(+), 1 deletion(-) create mode 100644 tests/test-machmsg.c diff --git a/tests/test-machmsg.c b/tests/test-machmsg.c new file mode 100644 index ..60f3f49f --- /dev/null +++ b/tests/test-machmsg.c @@ -0,0 +1,405 @@ +/* + * Copyright (C) 2024 Free Software Foundation + * + * This program is free software ; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation ; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY ; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with the program ; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#include +#include +#include + +#include +#include + +#include +#include +#include + +#define ECHO_MAX_BODY_LEN 256 + +static uint32_t align(uint32_t val, size_t aln) +{ + // we should check aln is a power of 2 + aln--; + return (val + aln) & (~aln); +} + +#define ALIGN_INLINE(val, n) { (val) = align((val), (n)); } + +struct echo_params +{ + mach_port_t rx_port; + mach_msg_size_t rx_size; + mach_msg_size_t rx_number; +}; + +void echo_thread (void *arg) +{ + struct echo_params *params = arg; + int err; + struct message + { +mach_msg_header_t header; +char body[ECHO_MAX_BODY_LEN]; + } message; + + printf ("thread echo started\n"); + for (mach_msg_size_t i=0; irx_number; i++) +{ + message.header.msgh_local_port = params->rx_port; + message.header.msgh_size = sizeof (message); + + err = mach_msg (&message.header, + MACH_RCV_MSG, + 0, sizeof (message), + params->rx_port, MACH_MSG_TIMEOUT_NONE, MACH_PORT_NULL); + ASSERT_RET(err, "mach_msg echo rx"); + printf("echo rx %d expected 5d\n", + message.header.msgh_size, params->rx_size); + ASSERT(message.header.msgh_size == params->rx_size, + "wrong size in echo rx"); + + message.header.msgh_local_port = MACH_PORT_NULL; + printf ("echo: msgh_id %d\n", message.header.msgh_id); + + err = mach_msg (&message.header, + MACH_SEND_MSG, + message.header.msgh_size, 0, + MACH_PORT_NULL, MACH_MSG_TIMEOUT_NONE, MACH_PORT_NULL); + ASSERT_RET(err, "mach_msg echo tx"); +} + printf ("thread echo stopped\n"); + thread_terminate (mach_thread_self ()); + FAILURE("thread_terminate"); +} + +#define TEST_ITERATIONS 3 + +// TODO run_test_iterations +void +test_iterations (void) +{ + mach_port_t port, receive; + int err; + struct message + { +mach_msg_header_t header; +mach_msg_type_t type; +char data[64]; + } message; + + err = mach_port_allocate (mach_task_self (), + MACH_PORT_RIGHT_RECEIVE, &port); + ASSERT_RET(err, "mach_port_allocate"); + + err = mach_port_allocate (mach_task_self (), + MACH_PORT_RIGHT_RECEIVE, &receive); + ASSERT_RET(err, "mach_port_allocate 2"); + + struct echo_params params; + params.rx_port = port; + params.rx_size = sizeof(message.header) + sizeof(message.type) + 5; + ALIGN_INLINE(params.rx_size, MACH_MSG_USER_ALIGNMENT); + params.rx_number = TEST_ITERATIONS; + test_thread_start (mach_task_self (), echo_thread, ¶ms); + + time_value_t start_time; + err = host_get_time (mach_host_self (), &start_time); + ASSERT_RET(err, "host_get_time"); + + /* Send a message down the port */ + for (int i = 0; i < TEST_ITERATIONS; i++) +{ + struct message message; + + memset (&message, 0, sizeof (message)); + strcpy (message.data, "ciao"); + size_t datalen = strlen (message.data) + 1; + + message.header.msgh_bits + = MACH_MSGH_BITS (MACH_MSG_TYPE_MAKE_SEND, + MACH_MSG_TYPE_MAKE_SEND_ONCE); + message.header.msgh_remote_port = port; /* Request port */ + message.header.msgh_local_port = receive;/* Reply port */ + message.header.msgh_id = 123;/* Message id */ + message.header.msgh_size = sizeof (message.header) + sizeof (message.type) + datalen;/* Message size */ + ALIGN_INLINE(message.header.msgh_size, 4); + message.type.msgt_name = MACH_MSG_TYPE_STRING; /* Parameter type */ + message.type.msgt_size = 8 * datalen;/* # Bits */ + message.type.msgt_number = 1;/* Number of elements */ + message.type.msgt_inline = TRUE; /* Inlined */ + message.type.msgt_longform = FALSE; /* Shortform */ + message.type.msgt_dealloc
[PATCH 03/11] add gsync tests
--- tests/test-gsync.c | 122 + tests/user-qemu.mk | 3 +- 2 files changed, 124 insertions(+), 1 deletion(-) create mode 100644 tests/test-gsync.c diff --git a/tests/test-gsync.c b/tests/test-gsync.c new file mode 100644 index ..a5160651 --- /dev/null +++ b/tests/test-gsync.c @@ -0,0 +1,122 @@ +/* + * Copyright (C) 2024 Free Software Foundation + * + * This program is free software ; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation ; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY ; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with the program ; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#include +#include + +#include +#include +#include +#include + +#include +#include + +/* Gsync flags. */ +#ifndef GSYNC_SHARED +# define GSYNC_SHARED 0x01 +# define GSYNC_QUAD0x02 +# define GSYNC_TIMED 0x04 +# define GSYNC_BROADCAST 0x08 +# define GSYNC_MUTATE 0x10 +#endif + +static uint32_t single_shared; +static struct { + uint32_t val1; + uint32_t val2; +} single_shared_quad; + +static void test_single() +{ + int err; + single_shared = 0; + err = gsync_wait(mach_task_self(), (vm_offset_t)&single_shared, 0, 0, 100, GSYNC_TIMED); + ASSERT(err == KERN_TIMEDOUT, "gsync_wait did not timeout"); + + single_shared = 1; + err = gsync_wait(mach_task_self(), (vm_offset_t)&single_shared, 0, 0, 100, GSYNC_TIMED); + ASSERT(err == KERN_INVALID_ARGUMENT, "gsync_wait on wrong value"); + err = gsync_wait(mach_task_self(), (vm_offset_t)&single_shared, 1, 0, 100, GSYNC_TIMED); + ASSERT(err == KERN_TIMEDOUT, "gsync_wait again on correct value did not timeout"); + + single_shared_quad.val1 = 1; + single_shared_quad.val2 = 2; + err = gsync_wait(mach_task_self(), (vm_offset_t)&single_shared_quad, 99, 88, + 100, GSYNC_TIMED | GSYNC_QUAD); + ASSERT(err == KERN_INVALID_ARGUMENT, "gsync_wait on wrong quad value"); + err = gsync_wait(mach_task_self(), (vm_offset_t)&single_shared_quad, 1, 2, + 100, GSYNC_TIMED | GSYNC_QUAD); + ASSERT(err == KERN_TIMEDOUT, "gsync_wait again on correct value did not timeout"); + + err = gsync_wake(mach_task_self(), (vm_offset_t)&single_shared, 0, 0); + ASSERT_RET(err, "gsync_wake with nobody waiting"); + + err = gsync_wait(mach_task_self(), (vm_offset_t)&single_shared, 1, 0, 100, GSYNC_TIMED); + ASSERT(err == KERN_TIMEDOUT, "gsync_wait not affected by previous gsync_wake"); + + err = gsync_wake(mach_task_self(), (vm_offset_t)&single_shared, 0, GSYNC_BROADCAST); + ASSERT_RET(err, "gsync_wake broadcast with nobody waiting"); + + err = gsync_wait(mach_task_self(), (vm_offset_t)&single_shared, 1, 0, 100, GSYNC_TIMED); + ASSERT(err == KERN_TIMEDOUT, "gsync_wait not affected by previous gsync_wake"); + + err = gsync_wake(mach_task_self(), (vm_offset_t)&single_shared, 2, GSYNC_MUTATE); + ASSERT_RET(err, "gsync_wake nobody + mutate"); + ASSERT(single_shared == 2, "gsync_wake single_shared did not mutate"); + + err = gsync_wake(mach_task_self(), (vm_offset_t)&single_shared, 0, 0); + ASSERT_RET(err, "gsync_wake nobody without mutate"); + err = gsync_wake(mach_task_self(), (vm_offset_t)&single_shared, 0, 0); + ASSERT_RET(err, "gsync_wake 3a"); +} + +static void single_thread_setter(void *arg) +{ + int err; + int val = (long)arg; + + /* It should be enough to sleep a bit for our creator to call + gsync_wait(). To verify that the test is performed with the + correct sequence, we also change the value so if the wait is + called after our wake it will fail with KERN_INVALID_ARGUMENT */ + msleep(100); + + err = gsync_wake(mach_task_self(), (vm_offset_t)&single_shared, val, GSYNC_MUTATE); + ASSERT_RET(err, "gsync_wake from thread + mutate"); + + thread_terminate(mach_thread_self()); + FAILURE("thread_terminate"); +} + +static void test_single_from_thread() +{ + int err; + single_shared = 10; + test_thread_start(mach_task_self(), single_thread_setter, (void*)11); + err = gsync_wait(mach_task_self(), (vm_offset_t)&single_shared, 10, 0, 0, 0); + ASSERT_RET(err, "gsync_wait without timeout for wake from another thread"); + ASSERT(single_shared == 11, "wake didn't mutate"); +} + +int main(int argc, char *argv[], int envc, char *envp[]) +{ + test_single_from_thread(); + test_single(); + return 0; +} diff --git a/tests/user-qemu.mk b/tests/user-qemu.mk index 8b338241..7c5e6927 100644 --- a/tests/user-qemu.mk +++ b/tests/user-qemu.mk @@ -196,7 +196,8 @@ clean-test-%: USER_TE
[PATCH 06/11] add basic vm tests
--- tests/test-vm.c| 85 ++ tests/user-qemu.mk | 3 +- 2 files changed, 87 insertions(+), 1 deletion(-) create mode 100644 tests/test-vm.c diff --git a/tests/test-vm.c b/tests/test-vm.c new file mode 100644 index ..4ece792e --- /dev/null +++ b/tests/test-vm.c @@ -0,0 +1,85 @@ +/* + * Copyright (C) 2024 Free Software Foundation + * + * This program is free software ; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation ; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY ; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with the program ; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#include +#include + +#include +#include +#include +#include +#include + +#include +#include +#include +#include + + +static void test_memobj() +{ + // this emulates maptime() mapping and reading + struct mapped_time_value *mtime; + int64_t secs, usecs; + mach_port_t device, memobj; + int err; + + err = device_open (device_priv(), 0, "time", &device); + ASSERT_RET(err, "device_open"); + err = device_map (device, VM_PROT_READ, 0, sizeof(*mtime), &memobj, 0); + ASSERT_RET(err, "device_map"); + err = mach_port_deallocate (mach_task_self (), device); + ASSERT_RET(err, "mach_port_deallocate"); + mtime = 0; + err = vm_map(mach_task_self (), (vm_address_t *)&mtime, sizeof *mtime, 0, 1, + memobj, 0, 0, VM_PROT_READ, VM_PROT_READ, VM_INHERIT_NONE); + ASSERT_RET(err, "vm_map"); + + do +{ + secs = mtime->seconds; + __sync_synchronize (); + usecs = mtime->microseconds; + __sync_synchronize (); +} + while (secs != mtime->check_seconds); + printf("mapped time is %lld.%lld\n",secs, usecs); + + err = mach_port_deallocate (mach_task_self (), memobj); + ASSERT_RET(err, "mach_port_deallocate"); + err = vm_deallocate(mach_task_self(), (vm_address_t)mtime, sizeof(*mtime)); + ASSERT_RET(err, "vm_deallocate"); +} + +static void test_wire() +{ + int err = vm_wire_all(host_priv(), mach_task_self(), VM_WIRE_ALL); + ASSERT_RET(err, "vm_wire_all-ALL"); + err = vm_wire_all(host_priv(), mach_task_self(), VM_WIRE_NONE); + ASSERT_RET(err, "vm_wire_all-NONE"); + // TODO check that all memory is actually wired or unwired +} + +int main(int argc, char *argv[], int envc, char *envp[]) +{ + printf("VM_MIN_ADDRESS=0x%p\n", VM_MIN_ADDRESS); + printf("VM_MAX_ADDRESS=0x%p\n", VM_MAX_ADDRESS); + test_wire(); + test_memobj(); + return 0; +} diff --git a/tests/user-qemu.mk b/tests/user-qemu.mk index fe0f9569..1ab7cb8c 100644 --- a/tests/user-qemu.mk +++ b/tests/user-qemu.mk @@ -198,7 +198,8 @@ USER_TESTS := \ tests/test-hello \ tests/test-mach_host \ tests/test-gsync \ - tests/test-mach_port + tests/test-mach_port \ + tests/test-vm USER_TESTS_CLEAN = $(subst tests/,clean-,$(USER_TESTS)) -- 2.39.2
[PATCH 11/11] add basic thread tests
--- tests/test-threads.c | 104 +++ tests/user-qemu.mk | 3 +- 2 files changed, 106 insertions(+), 1 deletion(-) create mode 100644 tests/test-threads.c diff --git a/tests/test-threads.c b/tests/test-threads.c new file mode 100644 index ..06630bef --- /dev/null +++ b/tests/test-threads.c @@ -0,0 +1,104 @@ +/* + * Copyright (C) 2024 Free Software Foundation + * + * This program is free software ; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation ; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY ; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with the program ; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#include +#include + +#include +#include + +#include + +void sleeping_thread(void* arg) +{ + printf("starting thread %d\n", arg); + for (int i=0; i<100; i++) + msleep(50); + printf("stopping thread %d\n", arg); + thread_terminate(mach_thread_self()); + FAILURE("thread_terminate"); +} + +void test_many(void) +{ + for (long tid=0; tid<10; tid++) +{ + test_thread_start(mach_task_self(), sleeping_thread, (void*)tid); +} + // TODO: wait for thread end notifications + msleep(6000); +} + +#ifdef __x86_64__ +void test_fsgs_base_thread(void* tid) +{ + int err; +#if defined(__SEG_FS) && defined(__SEG_GS) + long __seg_fs *fs_ptr; + long __seg_gs *gs_ptr; + long fs_value; + long gs_value; + + struct i386_fsgs_base_state state; + state.fs_base = (unsigned long)&fs_value; + state.gs_base = (unsigned long)&gs_value; + err = thread_set_state(mach_thread_self(), i386_FSGS_BASE_STATE, + (thread_state_t) &state, i386_FSGS_BASE_STATE_COUNT); + ASSERT_RET(err, "thread_set_state"); + + fs_value = 0x100 + (long)tid; + gs_value = 0x200 + (long)tid; + + msleep(50); // allow the others to set their segment base + + fs_ptr = 0; + gs_ptr = 0; + long rdvalue = *fs_ptr; + printf("FS expected %lx read %lx\n", fs_value, rdvalue); + ASSERT(fs_value == rdvalue, "FS base error\n"); + rdvalue = *gs_ptr; + printf("GS expected %lx read %lx\n", gs_value, rdvalue); + ASSERT(gs_value == rdvalue, "GS base error\n"); +#else +#error " missing __SEG_FS and __SEG_GS" +#endif + + thread_terminate(mach_thread_self()); + FAILURE("thread_terminate"); +} +#endif + +void test_fsgs_base(void) +{ +#ifdef __x86_64__ + int err; + for (long tid=0; tid<10; tid++) +{ + test_thread_start(mach_task_self(), test_fsgs_base_thread, (void*)tid); +} + msleep(1000); // TODO: wait for threads +#endif +} + + +int main(int argc, char *argv[], int envc, char *envp[]) +{ + test_fsgs_base(); + test_many(); + return 0; +} diff --git a/tests/user-qemu.mk b/tests/user-qemu.mk index 77259198..fd5ae1ab 100644 --- a/tests/user-qemu.mk +++ b/tests/user-qemu.mk @@ -203,7 +203,8 @@ USER_TESTS := \ tests/test-vm \ tests/test-syscalls \ tests/test-machmsg \ - tests/test-task + tests/test-task \ + tests/test-threads USER_TESTS_CLEAN = $(subst tests/,clean-,$(USER_TESTS)) -- 2.39.2
[PATCH 05/11] adjust range when changing memory pageability
* vm/vm_map.c: use actual limits instead of min/max boundaries to change pageability of the currently mapped memory. This caused the initial vm_wire_all(host, task VM_WIRE_ALL) in glibc startup to fail with KERN_NO_SPACE. --- vm/vm_map.c | 31 ++- 1 file changed, 26 insertions(+), 5 deletions(-) diff --git a/vm/vm_map.c b/vm/vm_map.c index 26e18676..e454bb2a 100644 --- a/vm/vm_map.c +++ b/vm/vm_map.c @@ -1829,6 +1829,30 @@ kern_return_t vm_map_pageable( return(KERN_SUCCESS); } +/* Update pageability of all the memory currently in the map. + * The map must be locked, and protection mismatch will not be checked, see + * vm_map_pageable(). + */ +static kern_return_t +vm_map_pageable_current(vm_map_t map, vm_prot_t access_type) +{ + struct rbtree_node *node; + vm_offset_t min_address, max_address; + + node = rbtree_first(&map->hdr.tree); + min_address = rbtree_entry(node, struct vm_map_entry, + tree_node)->vme_start; + + node = rbtree_last(&map->hdr.tree); + max_address = rbtree_entry(node, struct vm_map_entry, + tree_node)->vme_end; + + /* Returns with the map read-locked if successful */ + return vm_map_pageable(map, min_address, max_address,access_type, + FALSE, FALSE); +} + + /* * vm_map_pageable_all: * @@ -1859,8 +1883,7 @@ vm_map_pageable_all(struct vm_map *map, vm_wire_t flags) map->wiring_required = FALSE; /* Returns with the map read-locked if successful */ - kr = vm_map_pageable(map, map->min_offset, map->max_offset, -VM_PROT_NONE, FALSE, FALSE); + kr = vm_map_pageable_current(map, VM_PROT_NONE); vm_map_unlock(map); return kr; } @@ -1873,9 +1896,7 @@ vm_map_pageable_all(struct vm_map *map, vm_wire_t flags) if (flags & VM_WIRE_CURRENT) { /* Returns with the map read-locked if successful */ - kr = vm_map_pageable(map, map->min_offset, map->max_offset, -VM_PROT_READ | VM_PROT_WRITE, -FALSE, FALSE); + kr = vm_map_pageable_current(map, VM_PROT_READ | VM_PROT_WRITE); if (kr != KERN_SUCCESS) { if (flags & VM_WIRE_FUTURE) { -- 2.39.2
[PATCH 01/11] add basic user-space tests with qemu
* configure.ac: move test fragment to have USER32 * tests/Makefrag.am: add user tests * tests/README: add basic info on how to run and debug user tests * tests/configfrag.ac: allow the test compiler/flags to be autoconfigured or customized * tests/grub.cfg.single.template: add minimal grub config to boot a module * tests/include/device/cons.h: add a simplified version of device/cons.h usable for tests * tests/include/kern/printf.h: symlink to kern/printf.h * tests/include/mach/mig_support.h: add basic version for user-space tests * tests/include/syscalls.h: add prototypes for syscalls used in tests. * tests/include/testlib.h: add definitions for common test functionalities * tests/include/util/atoi.h: symlink to util/atoi.h * tests/run-qemu.sh.template: add a simple qemu test runner * tests/start.S: add arch-specific entry point * tests/syscalls.S: generate syscalls entry points * tests/test-hello.c: add basic smoke test * tests/testlib.c: add the minimal functionality to run a user-space executable and reboot the system, and some test helpers. * tests/user-qemu.mk: add rules to build simple user-space test modules, including generating mig stubs. The tests reuse some kernel code (like printf(), mach_atoi(), mem*(), str*() functions) so we can use the freestanding environment and not depend on glibc. --- configure.ac | 6 +- tests/Makefrag.am| 11 +- tests/README | 37 ++ tests/configfrag.ac | 16 +++ tests/grub.cfg.single.template | 4 + tests/include/device/cons.h | 27 tests/include/kern/printf.h | 1 + tests/include/mach/mig_support.h | 71 +++ tests/include/syscalls.h | 83 tests/include/testlib.h | 74 +++ tests/include/util/atoi.h| 1 + tests/run-qemu.sh.template | 38 ++ tests/start.S| 28 tests/syscalls.S | 4 + tests/test-hello.c | 26 tests/testlib.c | 114 + tests/user-qemu.mk | 212 +++ 17 files changed, 748 insertions(+), 5 deletions(-) create mode 100644 tests/README create mode 100644 tests/grub.cfg.single.template create mode 100644 tests/include/device/cons.h create mode 12 tests/include/kern/printf.h create mode 100644 tests/include/mach/mig_support.h create mode 100644 tests/include/syscalls.h create mode 100644 tests/include/testlib.h create mode 12 tests/include/util/atoi.h create mode 100644 tests/run-qemu.sh.template create mode 100644 tests/start.S create mode 100644 tests/syscalls.S create mode 100644 tests/test-hello.c create mode 100644 tests/testlib.c create mode 100644 tests/user-qemu.mk diff --git a/configure.ac b/configure.ac index cadc33b6..69f75cf2 100644 --- a/configure.ac +++ b/configure.ac @@ -123,9 +123,6 @@ AC_CHECK_PROG([PATCH], [patch], [patch], [patch-not-found]) # configure fragments. # -# The test suite. -m4_include([tests/configfrag.ac]) - # Default set of device drivers. AC_ARG_ENABLE([device-drivers], AS_HELP_STRING([--enable-device-drivers=WHICH], [specify WHICH (on `ix86-at' @@ -181,6 +178,9 @@ m4_include([configfrag.ac]) # Linux code snarfed into GNU Mach. m4_include([linux/configfrag.ac]) + +# The test suite. +m4_include([tests/configfrag.ac]) # # Compiler features. diff --git a/tests/Makefrag.am b/tests/Makefrag.am index 2723f64a..88c7fe8c 100644 --- a/tests/Makefrag.am +++ b/tests/Makefrag.am @@ -21,6 +21,13 @@ # if !PLATFORM_xen + +include tests/user-qemu.mk + TESTS += \ - tests/test-multiboot -endif + tests/test-multiboot \ + $(USER_TESTS) + +clean-local: $(USER_TESTS_CLEAN) + +endif # !PLATFORM_xen diff --git a/tests/README b/tests/README new file mode 100644 index ..3dacc184 --- /dev/null +++ b/tests/README @@ -0,0 +1,37 @@ + +There are some basic tests that can be run qith qemu. You can run all the tests with + +$ make check + +or selectively with: + +$ make run-hello + +Also, you can debug the existing tests, or a new one, by starting on one shell + +$ make debug-hello + +and on another shell you can attach with gdb, load the symbols of the +bootstrap module and break on its _start(): + +$ gdb gnumach +... +(gdb) target remote :1234 +... +(gdb) b setup_main +Breakpoint 11 at 0x81019d60: file ../kern/startup.c, line 98. +(gdb) c +Continuing. + +Breakpoint 11, setup_main () at ../kern/startup.c:98 +98 cninit(); +(gdb) add-symbol-file ../gnumach/build-64/module-task +Reading symbols from ../gnumach/build-64/module-task... +(gdb) b _start +Breakpoint 12 at 0x40324a: _start. (2 locations) +(gdb) c +Continuing. + +Breakpoint 12, _start () at ../tests/testlib.c:96 +96 { +(gdb) diff --git a/tests/configfrag.ac b/tests/configfrag.ac index 55c6da62..de87cbad 100
[PATCH 10/11] add basic task tests
--- tests/test-task.c | 171 + tests/user-qemu.mk | 3 +- 2 files changed, 173 insertions(+), 1 deletion(-) create mode 100644 tests/test-task.c diff --git a/tests/test-task.c b/tests/test-task.c new file mode 100644 index ..cbc75e23 --- /dev/null +++ b/tests/test-task.c @@ -0,0 +1,171 @@ +/* + * Copyright (C) 2024 Free Software Foundation + * + * This program is free software ; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation ; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY ; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with the program ; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#include +#include + +#include +#include +#include +#include +#include + +#include +#include + + +void test_task() +{ + mach_port_t ourtask = mach_task_self(); + mach_msg_type_number_t count; + int err; + + struct task_basic_info binfo; + count = TASK_BASIC_INFO_COUNT; + err = task_info(ourtask, TASK_BASIC_INFO, (task_info_t)&binfo, &count); + ASSERT_RET(err, "TASK_BASIC_INFO"); + ASSERT(binfo.virtual_size > binfo.resident_size, "wrong memory counters"); + + struct task_events_info einfo; + count = TASK_EVENTS_INFO_COUNT; + err = task_info(ourtask, TASK_EVENTS_INFO, (task_info_t)&einfo, &count); + ASSERT_RET(err, "TASK_EVENTS_INFO"); + printf("msgs sent %llu received %llu\n", + einfo.messages_sent, einfo.messages_received); + + struct task_thread_times_info ttinfo; + count = TASK_THREAD_TIMES_INFO_COUNT; + err = task_info(ourtask, TASK_THREAD_TIMES_INFO, (task_info_t)&ttinfo, &count); + ASSERT_RET(err, "TASK_THREAD_TIMES_INFO"); + printf("run user %lld system %lld\n", + ttinfo.user_time64.seconds, ttinfo.user_time64.nanoseconds); +} + + +void dummy_thread(void *arg) +{ + printf("started dummy thread\n"); + while (1) +; +} + +void check_threads(thread_t *threads, mach_msg_type_number_t nthreads) +{ + for (int tid=0; tid
[PATCH 07/11] add thread creation helper to tests
--- tests/include/testlib.h | 1 + tests/testlib_thread_start.c | 86 tests/user-qemu.mk | 1 + 3 files changed, 88 insertions(+) create mode 100644 tests/testlib_thread_start.c diff --git a/tests/include/testlib.h b/tests/include/testlib.h index e492f2f6..a3f3a6a8 100644 --- a/tests/include/testlib.h +++ b/tests/include/testlib.h @@ -65,6 +65,7 @@ const char* e2s(int err); const char* e2s_gnumach(int err); void halt(); int msleep(uint32_t timeout); +thread_t test_thread_start(task_t task, void(*routine)(void*), void* arg); mach_port_t host_priv(void); mach_port_t device_priv(void); diff --git a/tests/testlib_thread_start.c b/tests/testlib_thread_start.c new file mode 100644 index ..c52694ca --- /dev/null +++ b/tests/testlib_thread_start.c @@ -0,0 +1,86 @@ +/* + * MIT License + * + * Copyright (c) 2017 Luc Chabassier + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in all + * copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +/* This small helper was started from + * https://github.com/dwarfmaster/mach-ipc/blob/master/minimal_threads/main.c + * and then reworked. */ + +#include +#include +#include + +/* This is just a temporary mapping to set up the stack */ +static long stack_top[PAGE_SIZE/sizeof(long)] __attribute__ ((aligned (PAGE_SIZE))); + +thread_t test_thread_start(task_t task, void(*routine)(void*), void* arg) { + const vm_size_t stack_size = PAGE_SIZE * 16; + kern_return_t ret; + vm_address_t stack; + + ret = vm_allocate(task, &stack, stack_size, TRUE); + ASSERT_RET(ret, "can't allocate the stack for a new thread"); + + ret = vm_protect(task, stack, PAGE_SIZE, FALSE, VM_PROT_NONE); + ASSERT_RET(ret, "can't protect the stack from overflows"); + + long *top = (long*)((vm_offset_t)stack_top + PAGE_SIZE) - 1; +#ifdef __i386__ + *top = (long)arg; /* The argument is passed on the stack on x86_32 */ + *(top - 1) = 0; /* The return address */ +#elif defined(__x86_64__) + *top = 0; /* The return address */ +#endif + ret = vm_write(task, stack + stack_size - PAGE_SIZE, (vm_offset_t)stack_top, PAGE_SIZE); + ASSERT_RET(ret, "can't initialize the stack for the new thread"); + + thread_t thread; + ret = thread_create(task, &thread); + ASSERT_RET(ret, "thread_create()"); + + struct i386_thread_state state; + unsigned int count; + count = i386_THREAD_STATE_COUNT; + ret = thread_get_state(thread, i386_REGS_SEGS_STATE, + (thread_state_t) &state, &count); + ASSERT_RET(ret, "thread_get_state()"); + +#ifdef __i386__ + state.eip = (long) routine; + state.uesp = (long) (stack + stack_size - sizeof(long) * 2); + state.ebp = 0; +#elif defined(__x86_64__) + state.rip = (long) routine; + state.ursp = (long) (stack + stack_size - sizeof(long) * 1); + state.rbp = 0; + state.rdi = (long)arg; +#endif + ret = thread_set_state(thread, i386_REGS_SEGS_STATE, + (thread_state_t) &state, i386_THREAD_STATE_COUNT); + ASSERT_RET(ret, "thread_set_state"); + + ret = thread_resume(thread); + ASSERT_RET(ret, "thread_resume"); + + return thread; +} diff --git a/tests/user-qemu.mk b/tests/user-qemu.mk index 1ab7cb8c..8f10ba36 100644 --- a/tests/user-qemu.mk +++ b/tests/user-qemu.mk @@ -125,6 +125,7 @@ SRC_TESTLIB= \ $(srcdir)/tests/syscalls.S \ $(srcdir)/tests/start.S \ $(srcdir)/tests/testlib.c \ + $(srcdir)/tests/testlib_thread_start.c \ $(builddir)/tests/errlist.c \ $(MIG_GEN_CC) -- 2.39.2
[PATCH 08/11] add syscall tests
--- tests/test-syscalls.c | 166 ++ tests/user-qemu.mk| 3 +- 2 files changed, 168 insertions(+), 1 deletion(-) create mode 100644 tests/test-syscalls.c diff --git a/tests/test-syscalls.c b/tests/test-syscalls.c new file mode 100644 index ..be4df8c3 --- /dev/null +++ b/tests/test-syscalls.c @@ -0,0 +1,166 @@ +/* + * Copyright (C) 2024 Free Software Foundation + * + * This program is free software ; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation ; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY ; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with the program ; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#include +#include + +#include +#include +#include + +#include +#include +#include + + +static struct { + mach_port_t exception_port; + mach_port_t thread; + mach_port_t task; + integer_t exception; + integer_t code; + integer_t subcode; +} last_exc; +kern_return_t catch_exception_raise(mach_port_t exception_port, +mach_port_t thread, mach_port_t task, +integer_t exception, integer_t code, +long_integer_t subcode) +{ + printf("received catch_exception_raise(%u %u %u %d %d %d)\n", + exception_port, thread, task, exception, code, subcode); + last_exc.exception_port = exception_port; + last_exc.thread = thread; + last_exc.task = task; + last_exc.exception = exception; + last_exc.code = code; + last_exc.subcode = subcode; + return KERN_SUCCESS; +} + +static char simple_request_data[PAGE_SIZE]; +static char simple_reply_data[PAGE_SIZE]; +int simple_msg_server(boolean_t (*demuxer) (mach_msg_header_t *request, + mach_msg_header_t *reply), + mach_port_t rcv_port_name, + int num_msgs) +{ + int midx = 0, mok = 0; + int ret; + mig_reply_header_t *request = (mig_reply_header_t*)simple_request_data; + mig_reply_header_t *reply = (mig_reply_header_t*)simple_reply_data; + while ((midx - num_msgs) < 0) +{ + ret = mach_msg(&request->Head, MACH_RCV_MSG, 0, PAGE_SIZE, + rcv_port_name, 0, MACH_PORT_NULL); + switch (ret) +{ +case MACH_MSG_SUCCESS: + if ((*demuxer)(&request->Head, &reply->Head)) +mok++; // TODO send reply + else +FAILURE("demuxer didn't handle the message"); + break; +default: + ASSERT_RET(ret, "receiving in msg_server"); + break; +} + midx++; +} + if (mok != midx) +FAILURE("wrong number of message received"); + return mok != midx; +} + + +void test_syscall_bad_arg_on_stack(void *arg) +{ + /* mach_msg() has 7 arguments, so the last one should be always + passed on the stack on x86. Here we make ESP/RSP point to the + wrong place to test the access check */ +#ifdef __x86_64__ + asm volatile("movq $0x123,%rsp;" \ + "movq $-25,%rax;" \ + "syscall;" \ + ); +#else + asm volatile("mov$0x123,%esp;" \ + "mov$-25,%eax;" \ + "lcall $0x7,$0x0;" \ + ); +#endif + FAILURE("we shouldn't be here!"); +} + +void test_bad_syscall_num(void *arg) +{ +#ifdef __x86_64__ + asm volatile("movq $0x123456,%rax;"\ + "syscall;" \ + ); +#else + asm volatile("mov$0x123456,%eax;"\ + "lcall $0x7,$0x0;" \ + ); +#endif + FAILURE("we shouldn't be here!"); +} + + +int main(int argc, char *argv[], int envc, char *envp[]) +{ + int err; + mach_port_t excp; + + err = mach_port_allocate(mach_task_self (), MACH_PORT_RIGHT_RECEIVE, &excp); + ASSERT_RET(err, "creating exception port"); + + err = mach_port_insert_right(mach_task_self(), excp, excp, + MACH_MSG_TYPE_MAKE_SEND); + ASSERT_RET(err, "inserting send right into exception port"); + + err = task_set_special_port(mach_task_self(), TASK_EXCEPTION_PORT, excp); + ASSERT_RET(err, "setting task exception port"); + + /* FIXME: receiving an exception with small size causes GP on 64 bit userspace */ + /* mig_reply_header_t msg; */ + /* err = mach_msg(&msg.Head, /\* The header *\/ */ + /*
[PATCH 04/11] add mach_port tests
--- tests/test-mach_port.c | 121 + tests/user-qemu.mk | 3 +- 2 files changed, 123 insertions(+), 1 deletion(-) create mode 100644 tests/test-mach_port.c diff --git a/tests/test-mach_port.c b/tests/test-mach_port.c new file mode 100644 index ..81a1072b --- /dev/null +++ b/tests/test-mach_port.c @@ -0,0 +1,121 @@ +/* + * Copyright (C) 2024 Free Software Foundation + * + * This program is free software ; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation ; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY ; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with the program ; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#include +#include +#include + +#include +#include + +#include +#include + +void test_mach_port(void) +{ + kern_return_t err; + + mach_port_name_t *namesp; + mach_msg_type_number_t namesCnt=0; + mach_port_type_t *typesp; + mach_msg_type_number_t typesCnt=0; + err = mach_port_names(mach_task_self(), &namesp, &namesCnt, &typesp, &typesCnt); + ASSERT_RET(err, "mach_port_names"); + printf("mach_port_names: type/name length: %d %d\n", namesCnt, typesCnt); + ASSERT((namesCnt != 0) && (namesCnt == typesCnt), + "mach_port_names: wrong type/name length"); + for (int i=0; i
Re: [PATCH 03/14] add mach_host tests
Il 06/01/24 20:43, Sergey Bugaev ha scritto: On Sat, Jan 6, 2024 at 10:26 PM Luca wrote: Uhm, I still have an issue, although a bit different now: By the way, the exception is still the same (General Protection, which is usually forwarded to user space), but for a different reason, apparently a non-canonical address in $rax=0x820175c0 I think this is just that you now have to add 2, not 1, to the result of __builtin_frame_address(). Or better, do what I did in glibc, change it to an explicit function argument [0], since __builtin_frame_address() doesn't return a useful (for this purpose) value on aarch64. [0]: https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=24b707c1665afae7eb302542ffa92d53aa577111 great, this works :) Thanks! Luca
Re: [PATCH] news/2023-q4.mdwn: new qoth for q4 of 2023.
Il 06/01/24 21:20, Sergey Bugaev ha scritto: On Sat, Jan 6, 2024 at 10:47 PM jbra...@dismail.de wrote: +Luca Dariz worked on adding [[some simple GNUMach user-space tests I've never seen gnumach spelled like that (GNUMach), only GNU Mach or gnumach. Is that intentional? +Flavio Cruz [[improved GNUMach's +IPC|https://lists.gnu.org/archive/html/bug-hurd/2023-11/msg00033.html]] +by reordering `mach_msg_type_t` fields to byte align `msgt_name` and +`msgt_size`. He also created a patch series to [[avoid message +resizing for +x86_64|https://lists.gnu.org/archive/html/bug-hurd/2023-11/msg00028.html]]. +He also [[removed untyped mach RPC +code|https://lists.gnu.org/archive/html/bug-hurd/2023-11/msg00026.html]], +since the Hurd always uses typed Mach RPC. It's not that much that *the Hurd* uses typed IPC, it's that gnumach does. The Hurd could easily support either, and in fact this support was present (though likely bitrotten), and this is what Flavio has removed. This is because going forward, we don't see the Hurd running on any other flavor of Mach; but it was originally the goal for it to run on non-GNU Machs, IIRC. +Sergey Bugaev added [[GNUMach entry re-coalescing +support|https://darnassus.sceen.net/~hurd-web/open_issues/gnumach_vm_map_entry_forward_merging/]]. +Essentially, Mach is not always able to merge two vm entries that are +made next to each other, which slows down ext2 and bash. Sergey +allowed GNUMach to merge entries in the common cases, which greatly +helps ext2fs. Is that actually true though — did it speed anything up? I reinstalled hurd-amd64 and it does seem faster than a few months ago, at least noticeable during a foreign debootstrap (on both 32 and 64-bit). Luca
Re: [PATCH 03/14] add mach_host tests
Il 06/01/24 20:02, Sergey Bugaev ha scritto: On Sat, Jan 6, 2024 at 9:45 PM Samuel Thibault wrote: Luca, le sam. 06 janv. 2024 19:41:17 +0100, a ecrit: Il 29/12/23 15:14, Luca Dariz ha scritto: Il 29/12/23 14:44, Samuel Thibault ha scritto: Also, it would be useful to compile the tests with -ftrivial-auto-var-init=pattern so as to fill the structures with random values before making the gnumach calls. with this option all tests fail on the first mig-generated stub entry, which is task_get_special_port(), in _start(). Maybe it's related to SSE somehow, I see a page fault here: Is $rbp unaligned? (we do want to fix such bug anyway) (gdb) disassemble task_get_special_port Dump of assembler code for function task_get_special_port: 0x00416bc6 <+0>: push %rbp 0x00416bc7 <+1>: mov%rsp,%rbp 0x00416bca <+4>: sub$0xa0,%rsp 0x00416bd1 <+11>: mov%edi,-0x94(%rbp) 0x00416bd7 <+17>: mov%esi,-0x98(%rbp) 0x00416bdd <+23>: mov%rdx,-0xa0(%rbp) 0x00416be4 <+30>: lea-0x60(%rbp),%rax 0x00416be8 <+34>: movdqa 0x124f0(%rip),%xmm0# 0x4290e0 => 0x00416bf0 <+42>: movaps %xmm0,(%rax) 0x00416bf3 <+45>: movaps %xmm0,0x10(%rax) Yes, you have to align the stack. An executable gets entered at _start (or whatever the ELF header specifies) with %rsp 16-aligned, but you must enter C code with %rsp being 8 modulo 8. To fix this, change your _start like so: asm(".global _start\n" "_start:\n" " callq c_start"); void __attribute__((used, retain)) c_start() { ... } Uhm, I still have an issue, although a bit different now: (gdb) disassemble c_start Dump of assembler code for function c_start: 0x00402ec1 <+0>: push %rbp 0x00402ec2 <+1>: mov%rsp,%rbp 0x00402ec5 <+4>: sub$0x30,%rsp 0x00402ec9 <+8>: movl $0xfefefefe,-0x8(%rbp) 0x00402ed0 <+15>: movl $0xfefefefe,-0xc(%rbp) 0x00402ed7 <+22>: mov%rbp,%rax 0x00402eda <+25>: add$0x8,%rax 0x00402ede <+29>: mov%rax,-0x18(%rbp) 0x00402ee2 <+33>: mov-0x18(%rbp),%rax 0x00402ee6 <+37>: mov%rax,-0x20(%rbp) 0x00402eea <+41>: mov-0x20(%rbp),%rax 0x00402eee <+45>: mov(%rax),%rax 0x00402ef1 <+48>: mov%eax,0x2a111(%rip)# 0x42d008 0x00402ef7 <+54>: mov-0x20(%rbp),%rax 0x00402efb <+58>: add$0x8,%rax 0x00402eff <+62>: mov%rax,0x2915a(%rip)# 0x42c060 0x00402f06 <+69>: mov0x29153(%rip),%rax# 0x42c060 0x00402f0d <+76>: mov0x2a0f5(%rip),%edx# 0x42d008 0x00402f13 <+82>: movslq %edx,%rdx 0x00402f16 <+85>: add$0x1,%rdx 0x00402f1a <+89>: shl$0x3,%rdx 0x00402f1e <+93>: add%rdx,%rax 0x00402f21 <+96>: mov%rax,0x2a0e8(%rip)# 0x42d010 0x00402f28 <+103>: movl $0x0,0x2a0e6(%rip)# 0x42d018 0x00402f32 <+113>: jmp0x402f43 0x00402f34 <+115>: mov0x2a0de(%rip),%eax# 0x42d018 0x00402f3a <+121>: add$0x1,%eax 0x00402f3d <+124>: mov%eax,0x2a0d5(%rip)# 0x42d018 0x00402f43 <+130>: mov0x2a0c6(%rip),%rax# 0x42d010 0x00402f4a <+137>: mov0x2a0c8(%rip),%edx# 0x42d018 0x00402f50 <+143>: movslq %edx,%rdx 0x00402f53 <+146>: shl$0x3,%rdx 0x00402f57 <+150>: add%rdx,%rax => 0x000000402f5a <+153>: mov(%rax),%rax 0x00402f5d <+156>: test %rax,%rax By the way, the exception is still the same (General Protection, which is usually forwarded to user space), but for a different reason, apparently a non-canonical address in $rax=0x820175c0 Luca
Re: [PATCH 03/14] add mach_host tests
Il 06/01/24 19:44, Samuel Thibault ha scritto: Luca, le sam. 06 janv. 2024 19:41:17 +0100, a ecrit: Il 29/12/23 15:14, Luca Dariz ha scritto: Il 29/12/23 14:44, Samuel Thibault ha scritto: Also, it would be useful to compile the tests with -ftrivial-auto-var-init=pattern so as to fill the structures with random values before making the gnumach calls. with this option all tests fail on the first mig-generated stub entry, which is task_get_special_port(), in _start(). Maybe it's related to SSE somehow, I see a page fault here: Is $rbp unaligned? (we do want to fix such bug anyway) it's aligned to 8 byte, but it seems that for MOVAPS we need 16-biyte alignment. By the way, by mistake that dump was with yet another change not part of this patch set, but also with the simple setup the failure is the same, just in another place (when printing the first string). Luca
Re: [PATCH 03/14] add mach_host tests
Il 29/12/23 15:14, Luca Dariz ha scritto: Il 29/12/23 14:44, Samuel Thibault ha scritto: Also, it would be useful to compile the tests with -ftrivial-auto-var-init=pattern so as to fill the structures with random values before making the gnumach calls. with this option all tests fail on the first mig-generated stub entry, which is task_get_special_port(), in _start(). Maybe it's related to SSE somehow, I see a page fault here: (gdb) disassemble task_get_special_port Dump of assembler code for function task_get_special_port: 0x00416bc6 <+0>: push %rbp 0x00416bc7 <+1>: mov%rsp,%rbp 0x00416bca <+4>: sub$0xa0,%rsp 0x00416bd1 <+11>: mov%edi,-0x94(%rbp) 0x00416bd7 <+17>: mov%esi,-0x98(%rbp) 0x00416bdd <+23>: mov%rdx,-0xa0(%rbp) 0x00416be4 <+30>: lea-0x60(%rbp),%rax 0x00416be8 <+34>: movdqa 0x124f0(%rip),%xmm0# 0x4290e0 => 0x00416bf0 <+42>: movaps %xmm0,(%rax) 0x00416bf3 <+45>: movaps %xmm0,0x10(%rax) Luca
Re: [PATCH 11/14] add raw mach_msg tests
Il 29/12/23 15:20, Samuel Thibault ha scritto: Luca Dariz, le jeu. 28 déc. 2023 20:42:58 +0100, a ecrit: + +#define align_inline(val, n) { val = align(val, n); } Rather name it ALIGN_INLINE, so people don't mistake this for a function (that doesn't have side-effects on its parameters). ok +void +run_test_simple(void *msg, mach_msg_size_t msglen, mach_msg_id_t msgid) +{ + mach_msg_header_t *head = msg; + mach_port_t port, receive; + int err; + + err = syscall_mach_port_allocate (mach_task_self (), +MACH_PORT_RIGHT_RECEIVE, &port); + ASSERT_RET(err, "syscall_mach_port_allocate"); + + err = syscall_mach_port_allocate (mach_task_self (), +MACH_PORT_RIGHT_RECEIVE, &receive); + ASSERT_RET(err, "syscall_mach_port_allocate 2"); + + struct echo_params params; + params.rx_port = port; + params.rx_size = msglen; + params.rx_number = 1; + test_thread_start (mach_task_self (), echo_thread, ¶ms); + msleep(100); Is this msleep really needed? If yes, this is racy. If it's just to make the test a bit more stressful, then ok. not really, I'll remove it Luca
Re: [PATCH 08/14] add thread creation helper to tests
Il 29/12/23 15:14, Samuel Thibault ha scritto: Luca Dariz, le jeu. 28 déc. 2023 20:42:55 +0100, a ecrit: diff --git a/tests/testlib.c b/tests/testlib.c index 6abe8c4d..e6be46ee 100644 --- a/tests/testlib.c +++ b/tests/testlib.c @@ -95,3 +95,56 @@ void _start() printf("%s: test %s exit code %x\n", TEST_SUCCESS_MARKER, argv[0], ret); halt(); } + +// started from https://github.com/dwarfmaster/mach-ipc/blob/master/minimal_threads/main.c So this *has* to reproduce the MIT license header of that repository. Better put it in a separate file. ok +static uint32_t stack_top[PAGE_SIZE] __attribute__ ((aligned (PAGE_SIZE))); +thread_t test_thread_start(task_t task, void(*routine)(void*), void* arg) { + const vm_size_t stack_size = PAGE_SIZE * 16; + kern_return_t ret; + vm_address_t stack; + + ret = vm_allocate(task, &stack, stack_size, TRUE); + ASSERT_RET(ret, "can't allocate the stack for a new thread"); + + ret = vm_protect(task, stack, PAGE_SIZE, FALSE, VM_PROT_NONE); + ASSERT_RET(ret, "can't protect the stack from overflows"); + + long *top = (long*)((vm_offset_t)stack_top + PAGE_SIZE) - 2; + *top = 0;/* The return address */ +#ifndef __x86_64__ + *(top + 1) = (long)arg; /* The argument is passed on the stack on x86_32 */ +#endif + ret = vm_write(task, stack + stack_size - PAGE_SIZE, (vm_offset_t)stack_top, PAGE_SIZE); + ASSERT_RET(ret, "can't initialize the stack for the new thread"); + + thread_t thread; + ret = thread_create(task, &thread); + ASSERT_RET(ret, "thread_create()"); + + struct i386_thread_state state; + unsigned int count; + count = i386_THREAD_STATE_COUNT; + ret = thread_get_state(thread, i386_REGS_SEGS_STATE, + (thread_state_t) &state, &count); + ASSERT_RET(ret, "thread_get_state()"); + +#ifdef __x86_64__ + state.rip = (long) routine; + state.ursp = (long) (stack + stack_size - 16); Rather use sizeof(long)*2. + state.rbp = 0; + state.rdi = (long)arg; +#else + state.eip = (long) routine; + state.uesp = (long) (stack + stack_size - 8); And here as well. ok + state.ebp = 0; +#endif + ret = thread_set_state(thread, i386_REGS_SEGS_STATE, + (thread_state_t) &state, i386_THREAD_STATE_COUNT); + ASSERT_RET(ret, "thread_set_state"); + + ret = thread_resume(thread); + ASSERT_RET(ret, "thread_resume"); + + return thread; +} -- 2.39.2
Re: [PATCH 07/14] add basic vm tests
Il 29/12/23 15:09, Samuel Thibault ha scritto: Luca Dariz, le jeu. 28 déc. 2023 20:42:54 +0100, a ecrit: + // this emulates maptime() + struct mapped_time_value *mtime; + mach_port_t device, memobj; + int err = device_open (device_priv(), 0, "time", &device); + ASSERT_RET(err, "device_open"); + err = device_map (device, VM_PROT_READ, 0, sizeof(*mtime), &memobj, 0); + ASSERT_RET(err, "device_map"); + err = mach_port_deallocate (mach_task_self (), device); + ASSERT_RET(err, "mach_port_deallocate"); + mtime = 0; + err = +vm_map (mach_task_self (), (vm_address_t *)&mtime, sizeof *mtime, 0, 1, +memobj, 0, 0, VM_PROT_READ, VM_PROT_READ, VM_INHERIT_NONE); + ASSERT_RET(err, "vm_map"); + err = mach_port_deallocate (mach_task_self (), memobj); + ASSERT_RET(err, "mach_port_deallocate"); I'd say try to print the content, so as to check that this did return a pointer that is readable? ok Luca
Re: [PATCH 06/14] adjust range when changing memory pageabilityg
Il 29/12/23 15:06, Samuel Thibault ha scritto: Luca Dariz, le jeu. 28 déc. 2023 20:42:53 +0100, a ecrit: * vm/vm_map.c: if the start address is not in the map, try to find the nearest entry instead of failing. This caused the initial vm_wire_all(host, task VM_WIRE_ALL) in glibc startup to fail with KERN_NO_SPACE. --- vm/vm_map.c | 21 - 1 file changed, 16 insertions(+), 5 deletions(-) diff --git a/vm/vm_map.c b/vm/vm_map.c index 26e18676..4a5861e3 100644 --- a/vm/vm_map.c +++ b/vm/vm_map.c @@ -1774,13 +1774,24 @@ kern_return_t vm_map_pageable( if (!vm_map_lookup_entry(map, start, &start_entry)) { /* -* Start address is not in map; this is fatal. +* Start address is not in map; try to find the nearest entry It is wrong to drop the error like this. The caller did ask for these addresses. If they are wrong, the caller should be told so, and not papered over. Possibly it's vm_map_pageable_all or such that actually needs to be fixed. yes that seems a better place. Luca
Re: [PATCH 05/14] add mach_port tests
Il 29/12/23 15:01, Samuel Thibault ha scritto: Luca Dariz, le jeu. 28 déc. 2023 20:42:52 +0100, a ecrit: + mach_port_t newname = 123; Why initializing it? the idea was to check that the value it's actually set, I'll fix it by using a valid port name. + err = mach_port_allocate(mach_task_self(), MACH_PORT_RIGHT_RECEIVE, &newname); + ASSERT_RET(err, "mach_port_allocate"); + err = mach_port_allocate(mach_task_self(), MACH_PORT_RIGHT_PORT_SET, &newname); + ASSERT_RET(err, "mach_port_allocate"); + err = mach_port_allocate(mach_task_self(), MACH_PORT_RIGHT_DEAD_NAME, &newname); + ASSERT_RET(err, "mach_port_allocate"); You could check that you are getting different names. ok + err = mach_port_destroy(mach_task_self(), newname); + ASSERT_RET(err, "mach_port_destroy"); + + err = mach_port_deallocate(mach_task_self(), 1002); + ASSERT_RET(err, "mach_port_deallocate"); + + mach_port_urefs_t urefs; + err = mach_port_get_refs(mach_task_self(), 1002, MACH_PORT_RIGHT_DEAD_NAME, &urefs); + ASSERT_RET(err, "mach_port_get_refs"); You could also check the references of the ports created above. ok Luca
Re: 64bit startup
Il 04/01/24 10:55, Sergey Bugaev ha scritto: P.S. I have posted all of my patches, so if you're interested in hacking on aarch64-gnu Mach, you should be able to build the full toolchain now. sure, I've started looking into it, but it will take a while before I can run something in userspace. I'm working on top of your gnumach patch for now. Luca
Re: 64bit startup
Il 05/01/24 19:12, Sergey Bugaev ha scritto: /servers/crash-dump-core crashes on the memset () call in hurd:exec/elfcore.c:fetch_thread_fpregset (); the (*fpregs) pointer is NULL. The caller passes fpregs = ¬e.data.pr_fpreg, where note.data is of type struct elf_lwpstatus, defined in hurd:include/sys/procfs.h, whose pr_fpreg field is of type prfpregset_t, which is a typedef to fpregset_t, which was an actual struct on i386, but is a pointer on x86_64. This would've been easier to debug if I had debuginfo :) I had this small patch applied that apparently is enough for me to have some kind of core dump, I'm not sure if it's a good solution: diff --git a/exec/elfcore.c b/exec/elfcore.c index c6aa2bc8b..405fa8e0c 100644 --- a/exec/elfcore.c +++ b/exec/elfcore.c @@ -544,6 +544,11 @@ dump_core (task_t task, file_t file, off_t corelimit, note.data.pr_info.si_code = sigcode; note.data.pr_info.si_errno = sigerror; +#ifdef __x86_64__ + struct _libc_fpstate fpstate; + memset(&fpstate, 0, sizeof(fpstate)); + note.data.pr_fpreg = &fpstate; +#endif fetch_thread_regset (threads[i], ¬e.data.pr_reg); fetch_thread_fpregset (threads[i], ¬e.data.pr_fpreg); HTH Luca
Re: 64bit startup
Hi Sergey, Il 03/01/24 09:17, Sergey Bugaev ha scritto: How are you running it? Should I still be using a ramdisk image and not rumpdisk? Recently I've been installing hurd-amd64 on another disk of my hurd-i386 vm and booting from that. Basically I prepare the disk with debootstrap --foreign, then I reuse the i386 grub install to boot the 64 bit kernel with a custom entry, then run the --second stage, configure login, fstab and network and reboot. I can give you the exact commands and setup I'm using if you want (I need to reinstall it anyway due to latest changes), I'm currently using qemu via virt-manager, mostly with the default configuration for an x86_64 vm; that means a virtual SATA disk controller and Q35 chipset. The only issue I see is that sometimes at shutdown rumpdisk hangs and I can't halt the system, however this seems the same with hurd-i686 and it doesn't happen if I force the shutdown with halt-hurd (or reboot-hurd). I didn't had a deeper look at this so far, but I don't have issues booting or connecting via ssh. I didn't try heavy builds, so probably that's why I don't see Samuel's issue, but I've been building gdb and the hurd itself (for a fix for the crash server that I have in my queue). Luca
Re: aarch64-gnu (and Happy New Year!)
Il 01/01/24 14:51, Sergey Bugaev ha scritto: On Mon, Jan 1, 2024 at 4:02 PM Luca wrote: Hi Sergey, Hi Luca, Really great work! To work on gnumach we just need MIG and any armv8 compiler (also targeting GNU/Linux is fine), and it seems MIG works fine without adjustments? Maybe there could be some issues once it's run somewhere, e.g. alignment issues. Well, I have the aarch64-gnu toolchain here, please see the binutils & GCC patches that I posted. But yeah, any aarch64-targeting compiler should work for gnumach, including aarch64-linux-gnu or just a generic embedded aarch64 target. MIG seems to just work (thanks to all the Flávio's work!). I'm using the same message ABI as on x86_64, and haven't seen any issues so far — neither compiler errors / failed static assertions (about struct sizes and such), nor hardware errors from misaligned accesses. Good to hear! so if I understand it correctly you tested it on aarch64 GNU/Linux? Once I post the preliminary gnumach patches, you should be able to build a complete binutils + GCC + MIG aarch64-gnu toolchain. Another issue with ARM in general is that the hardware support is much less streamlined than x86, although with v8 there should be some alignment on basic stuff like IRQ and UEFI. So I've heard, yeah. But yeah I've also heard that newer chips support UEFI, and from UEFI it should be possible to boot GRUB and from there gnumach, so maybe the boot process is not as much of a problem as it was, and we could avoid separate per-board builds of gnumach. Starting the kernel itself will probably not a problem, thanks to GRUB, but it shouldn't be too hard to add support for U-Boot either if needed. I think more issues might come out setting up the various pieces of the system. For example, some chips have heterogeneous cores, (e.g.mine has two A72 cores and four A53 cores) so SMP will be more complicated. Also, about the serial console, it might be useful at some point to use a driver from userspace, if we can reuse some drivers from netbsd or linux, to avoid embedding all of them in gnumach. But as always, I have no idea what I'm talking about. Probably even the serial console needs a platform-specific driver (I'm not sure, I'm more familiar with older and more embedded variants like Cortex-M) To bootstrap gnumach the first thing we'd need would probably be the console, setting up the virtual memory, then thread states, context switch, irqs and userspace entry points (list by no means exhaustive). I actually have an armv8 server that would be handy for some development, so I might be able to help with something in the future. Sounds great! (But you should also be able to use qemu, no?) You can emulate aarch64 from an amd64 machine, but it's not very efficient. Also, I would realay like to run GNU/Hurd natively :) Even before you write any code, it would be great if you helped review my gnumach patches (e.g. aarch64_thread_state & the arm/aarch64 exceptions). sure, but I'll have to get a bit more familiar with the aarch64 details first. Also, there is a bunch of design work to do. Will/can AArch64 use the same mechanism for letting userland handle interrupts? Do we have all the mechanisms required for userland to poke at specific addresses in memory (to replace I/O ports)? — I believe we do, but I haven't looked closely. AFAIK there are no I/O ports in ARM, the usual way to configure things is with memory-mapped registers, so this might become even easier. About IRQs, probably it needs to be arch-specific anyway. What should the API for manipulating PAC keys look like? I kind of want it to be just another flavor of thread state, but then it is really supposed to be per-trask, not per-thread, and maybe adding some aarch64-specific RPCs to read and write the PAC keys in mach_aarch64.defs is the right way. But also AFAICS Mach currently has no notion of per-task arch-specific data (unlike for threads, and other than the VM map), so it'd be interesting to add one — could it be useful for something else? Do you mean Pointer Authentication Code? That seems quite a new feature, so maybe not fundamental to have a first working kernel, but anyway I don't see issues in having arch-specific data in tasks. What are the debugging facilities available on ARM / AArch64? Should we expose them as more flavors of thread state, or...? What would GDB need? For gnumach, it might be easier to debug on real hw than x86 if we can use openocd as gdbserver (but JTAG lines are not always reachable). For user space tasks I don't know. Should gnumach accept tagged addresses (like PR_SET_TAGGED_ADDR_CTRL on Linux)? It could, but also it doesn't seem mandatory, so I would start without it. Can we make Linux code (in-Mach drivers, pfinet, netdde, ...) work on AArch64? I think nowadays LKL would be a better target, unless
Re: aarch64-gnu (and Happy New Year!)
teers for a gnumach port? :) Another issue with ARM in general is that the hardware support is much less streamlined than x86, although with v8 there should be some alignment on basic stuff like IRQ and UEFI. Probably even the serial console needs a platform-specific driver (I'm not sure, I'm more familiar with older and more embedded variants like Cortex-M) To bootstrap gnumach the first thing we'd need would probably be the console, setting up the virtual memory, then thread states, context switch, irqs and userspace entry points (list by no means exhaustive). I actually have an armv8 server that would be handy for some development, so I might be able to help with something in the future. Luca
Re: [PATCH 04/14] add gsync tests
Il 29/12/23 14:56, Samuel Thibault ha scritto: Luca Dariz, le jeu. 28 déc. 2023 20:42:51 +0100, a ecrit: +static void single_t2(void *arg) +{ + int err; + msleep(100); + err = gsync_wake(mach_task_self(), (vm_offset_t)&single_shared, 0, 0); + ASSERT_RET(err, "gsync_wake t2"); + + err = thread_terminate(mach_thread_self()); + ASSERT_RET(err, "thread_terminate"); thread_terminate is not actually supposed to return, so that'd rather be an assert(0) right, I think I added the FAILURE() macro later than this and I forgot to update +} + +static void test_single() +{ + int err; + single_shared = 0; + err = gsync_wait(mach_task_self(), (vm_offset_t)&single_shared, 0, 0, 100, GSYNC_TIMED); + ASSERT(err == KERN_TIMEDOUT, "gsync_wait 1"); + + single_shared = 1; + err = gsync_wait(mach_task_self(), (vm_offset_t)&single_shared, 0, 0, 100, GSYNC_TIMED); + ASSERT(err == KERN_INVALID_ARGUMENT, "gsync_wait 2"); + err = gsync_wait(mach_task_self(), (vm_offset_t)&single_shared, 1, 0, 100, GSYNC_TIMED); + ASSERT(err == KERN_TIMEDOUT, "gsync_wait 2b"); + + err = gsync_wake(mach_task_self(), (vm_offset_t)&single_shared, 0, 0); + ASSERT_RET(err, "gsync_wake 1"); + + err = gsync_wake(mach_task_self(), (vm_offset_t)&single_shared, 0, GSYNC_BROADCAST); + ASSERT_RET(err, "gsync_wake 2"); Here, you want to try gsync_wait again, and check that they time out again (i.e. the kernel did forget about the gsync_wake, as expected). ok added + err = gsync_wake(mach_task_self(), (vm_offset_t)&single_shared, 2, GSYNC_MUTATE); + ASSERT_RET(err, "gsync_wake 2a"); + ASSERT(single_shared == 2, "gsync_wake single_shared"); + + err = gsync_wake(mach_task_self(), (vm_offset_t)&single_shared, 0, 0); + ASSERT_RET(err, "gsync_wake 3"); + err = gsync_wake(mach_task_self(), (vm_offset_t)&single_shared, 0, 0); + ASSERT_RET(err, "gsync_wake 3a"); Why two such calls? To be honest I don't remember, maybe a mistake. I'll try to add more descriptive messages on assertions. + single_shared = 1; + test_thread_start(mach_task_self(), single_t2, 0); + err = gsync_wait(mach_task_self(), (vm_offset_t)&single_shared, 1, 0, 0, 0); + ASSERT_RET(err, "gsync_wait 4"); This is racy since the thread may start and call gsync_wake before gsync_wait gets to block, and thus gsync_wait would stay stuck. Of course in practice this doesn't happen because of the msleep(100) in the thread, but that still leaves it racy and prone to fail if the system is e.g. very busy. You want to make single_t2 modify single_shared, so that if test_single2 happens to call gsync_wait after that, it will just not block and return KERN_INVALID_ARGUMENT instead, as expected. ok +static void test_single2() +{ + int err; + test_thread_start(mach_task_self(), single_t2, 0); + single_shared = 2; + err = gsync_wait(mach_task_self(), (vm_offset_t)&single_shared, 2, 0, 0, 0); + ASSERT_RET(err, "gsync_wait 1"); + // should this fail? + err = gsync_wake(mach_task_self(), (vm_offset_t)&single_shared, 0, 0); + ASSERT_RET(err, "gsync_wake 2"); +} I don't understand the test. Why waking after waiting? There is nobody to wake. You also want to set single_shared before starting the thread. Looking at it again, it really seems the same case as above, I don't remember why I added the additional wake. I'll make this a test_single_from_thread() or similar, merge your suggestion above and remove the duplicate test from test_single(), I guess the exact value used in the synchronization is not relevant, as long as it's the expected one. Luca
Re: [PATCH 03/14] add mach_host tests
Il 29/12/23 14:44, Samuel Thibault ha scritto: Luca Dariz, le jeu. 28 déc. 2023 20:42:50 +0100, a ecrit: --- tests/test-mach_host.c | 54 ++ 1 file changed, 54 insertions(+) create mode 100644 tests/test-mach_host.c diff --git a/tests/test-mach_host.c b/tests/test-mach_host.c new file mode 100644 index ..99fc4aca --- /dev/null +++ b/tests/test-mach_host.c Such files are also non-trivial, and thus deserve a copyright header too. @@ -0,0 +1,54 @@ + +#include + +#include + +void test_kernel_version() +{ + int err; + kernel_version_t kver; + err = host_get_kernel_version(mach_host_self(), kver); + ASSERT_RET(err, "host_kernel_info"); + printf("kernel version: %s\n", kver); +} + +void test_host_info() +{ + int err; + mach_msg_type_number_t count; + mach_port_t thishost = mach_host_self(); + + host_basic_info_data_t binfo; + count = HOST_BASIC_INFO_COUNT; + err = host_info(thishost, HOST_BASIC_INFO, (host_info_t)&binfo, &count); + ASSERT_RET(err, "host_basic_info"); + ASSERT(count == HOST_BASIC_INFO_COUNT, ""); Perhaps also make some basic checks, such as binfo.avail_cpus <= binfo.max_cpus. ok + const int maxcpus = 255; + int proc_slots[maxcpus]; + count = maxcpus; + err = host_info(thishost, HOST_PROCESSOR_SLOTS, (host_info_t)&proc_slots, &count); + ASSERT_RET(err, "host_processor_slots"); + ASSERT((1 <= count) && (count <= maxcpus), ""); + + host_sched_info_data_t sinfo; + count = HOST_SCHED_INFO_COUNT; + err = host_info(thishost, HOST_SCHED_INFO, (host_info_t)&sinfo, &count); + ASSERT_RET(err, "host_sched_info"); + ASSERT(count == HOST_SCHED_INFO_COUNT, ""); Here you can probably check that the min_* values is smaller than e.g. 1000, to catch randomly-filled values. ok Also, it would be useful to compile the tests with -ftrivial-auto-var-init=pattern so as to fill the structures with random values before making the gnumach calls. good idea, I didn't know this feature. Luca
Re: [PATCH 02/14] add basic user-space tests with qemu
Il 29/12/23 14:57, Samuel Thibault ha scritto: Luca Dariz, le ven. 29 déc. 2023 14:51:31 +0100, a ecrit: Il 29/12/23 14:37, Samuel Thibault ha scritto: Luca Dariz, le jeu. 28 déc. 2023 20:42:49 +0100, a ecrit: new file mode 100644 index ..4cf25891 --- /dev/null +++ b/tests/README @@ -0,0 +1,37 @@ + +There are some basic tests that can be run qith qemu. You can run all the tests with + +$ make check + +or selectively with: + +$ make run-hello + +Also, you can debug the existing tests, or a new one, by starting on one shell + +$ make debug-hello + +and on another shell you can attach with gdb, load the symbols of the +bootstrap module and break on its _start(): + +$ gdb gnuamch Typo ;) What would be a better command? The command is fine, I just mean the typo gnuamch -> gnumach :) aah ok :) +++ b/tests/run-qemu.sh.template +++ b/tests/test-hello.c +++ b/tests/user-qemu.mk These are non-trivial, and so definitely need a copyright header. Right, I forgot to add it to all files... I'll resend the patch set Perhaps wait for my reviews on the other patchs. diff --git a/tests/syscalls.S b/tests/syscalls.S new file mode 100644 index ..df9c9bc0 --- /dev/null +++ b/tests/syscalls.S @@ -0,0 +1,4 @@ + Spurious line? I'll add the copyright header here also +#include + +kernel_trap(invalid_syscall,-31,0) For such a trivial content it's not really useful. Samuel
Re: [PATCH 02/14] add basic user-space tests with qemu
Hi, Il 29/12/23 14:37, Samuel Thibault ha scritto: Hello, Thanks, this looks good! Luca Dariz, le jeu. 28 déc. 2023 20:42:49 +0100, a ecrit: new file mode 100644 index ..4cf25891 --- /dev/null +++ b/tests/README @@ -0,0 +1,37 @@ + +There are some basic tests that can be run qith qemu. You can run all the tests with + +$ make check + +or selectively with: + +$ make run-hello + +Also, you can debug the existing tests, or a new one, by starting on one shell + +$ make debug-hello + +and on another shell you can attach with gdb, load the symbols of the +bootstrap module and break on its _start(): + +$ gdb gnuamch Typo ;) What would be a better command? This is actually how I start gdb from the build directory, then I have a .gdbinit that connects to qemu and adds the userspace symbols if needed, sets breakpoints etc... Maybe I can add a minimal .gdbinit and copy it in the build directory? +++ b/tests/run-qemu.sh.template +++ b/tests/test-hello.c +++ b/tests/user-qemu.mk These are non-trivial, and so definitely need a copyright header. Right, I forgot to add it to all files... I'll resend the patch set diff --git a/tests/syscalls.S b/tests/syscalls.S new file mode 100644 index ..df9c9bc0 --- /dev/null +++ b/tests/syscalls.S @@ -0,0 +1,4 @@ + Spurious line? I'll add the copyright header here also +#include + +kernel_trap(invalid_syscall,-31,0)
[PATCH 14/14] add tests to make check
--- tests/user-qemu.mk | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/tests/user-qemu.mk b/tests/user-qemu.mk index 50b04736..669eb77a 100644 --- a/tests/user-qemu.mk +++ b/tests/user-qemu.mk @@ -178,7 +178,15 @@ clean-test-%: USER_TESTS := \ - tests/test-hello + tests/test-hello \ + tests/test-mach_host \ + tests/test-gsync \ + tests/test-machmsg \ + tests/test-mach_port \ + tests/test-syscalls \ + tests/test-task \ + tests/test-threads \ + tests/test-vm USER_TESTS_CLEAN = $(subst tests/,clean-,$(USER_TESTS)) -- 2.39.2
[PATCH 11/14] add raw mach_msg tests
--- tests/test-machmsg.c | 390 +++ 1 file changed, 390 insertions(+) create mode 100644 tests/test-machmsg.c diff --git a/tests/test-machmsg.c b/tests/test-machmsg.c new file mode 100644 index ..c262f5f4 --- /dev/null +++ b/tests/test-machmsg.c @@ -0,0 +1,390 @@ + +#include +#include +#include + +#include +#include + +#include +#include +#include + +#define ECHO_MAX_BODY_LEN 256 + +static uint32_t align(uint32_t val, size_t aln) +{ + // we should check aln is a power of 2 + aln--; + return (val + aln) & (~aln); +} + +#define align_inline(val, n) { val = align(val, n); } + +struct echo_params +{ + mach_port_t rx_port; + mach_msg_size_t rx_size; + mach_msg_size_t rx_number; +}; + +void echo_thread (void *arg) +{ + struct echo_params *params = arg; + int err; + struct message + { +mach_msg_header_t header; +char body[ECHO_MAX_BODY_LEN]; + } message; + + printf ("thread echo started\n"); + for (mach_msg_size_t i=0; irx_number; i++) +{ + message.header.msgh_local_port = params->rx_port; + message.header.msgh_size = sizeof (message); + + err = mach_msg (&message.header, + MACH_RCV_MSG, + 0, sizeof (message), + params->rx_port, MACH_MSG_TIMEOUT_NONE, MACH_PORT_NULL); + ASSERT_RET(err, "mach_msg echo rx"); + printf("echo rx %d expected 5d\n", + message.header.msgh_size, params->rx_size); + ASSERT(message.header.msgh_size == params->rx_size, + "wrong size in echo rx"); + + message.header.msgh_local_port = MACH_PORT_NULL; + printf ("echo: msgh_id %d\n", message.header.msgh_id); + + err = mach_msg (&message.header, + MACH_SEND_MSG, + message.header.msgh_size, 0, + MACH_PORT_NULL, MACH_MSG_TIMEOUT_NONE, MACH_PORT_NULL); + ASSERT_RET(err, "mach_msg echo tx"); +} + printf ("thread echo stopped\n"); + err = thread_terminate (mach_thread_self ()); + ASSERT_RET(err, "thread_terminate"); +} + +#define TEST_ITERATIONS 3 + +// TODO run_test_iterations +void +test_iterations (void) +{ + mach_port_t port, receive; + int err; + struct message + { +mach_msg_header_t header; +mach_msg_type_t type; +char data[64]; + } message; + + err = mach_port_allocate (mach_task_self (), + MACH_PORT_RIGHT_RECEIVE, &port); + ASSERT_RET(err, "mach_port_allocate"); + + err = mach_port_allocate (mach_task_self (), + MACH_PORT_RIGHT_RECEIVE, &receive); + ASSERT_RET(err, "mach_port_allocate 2"); + + struct echo_params params; + params.rx_port = port; + params.rx_size = sizeof(message.header) + sizeof(message.type) + 5; + align_inline(params.rx_size, MACH_MSG_USER_ALIGNMENT); + params.rx_number = TEST_ITERATIONS; + test_thread_start (mach_task_self (), echo_thread, ¶ms); + + time_value_t start_time; + err = host_get_time (mach_host_self (), &start_time); + ASSERT_RET(err, "host_get_time"); + + /* Send a message down the port */ + for (int i = 0; i < TEST_ITERATIONS; i++) +{ + struct message message; + + memset (&message, 0, sizeof (message)); + strcpy (message.data, "ciao"); + size_t datalen = strlen (message.data) + 1; + + message.header.msgh_bits + = MACH_MSGH_BITS (MACH_MSG_TYPE_MAKE_SEND, + MACH_MSG_TYPE_MAKE_SEND_ONCE); + message.header.msgh_remote_port = port; /* Request port */ + message.header.msgh_local_port = receive;/* Reply port */ + message.header.msgh_id = 123;/* Message id */ + message.header.msgh_size = sizeof (message.header) + sizeof (message.type) + datalen;/* Message size */ + align_inline(message.header.msgh_size, 4); + message.type.msgt_name = MACH_MSG_TYPE_STRING; /* Parameter type */ + message.type.msgt_size = 8 * datalen;/* # Bits */ + message.type.msgt_number = 1;/* Number of elements */ + message.type.msgt_inline = TRUE; /* Inlined */ + message.type.msgt_longform = FALSE; /* Shortform */ + message.type.msgt_deallocate = FALSE;/* Do not deallocate */ + message.type.msgt_unused = 0;/* = 0 */ + + /* Send the message on its way and wait for a reply. */ + err = mach_msg (&message.header, /* The header */ + MACH_SEND_MSG | MACH_RCV_MSG, /* Flags */ + message.header.msgh_size, /* Send size */ + sizeof (message), /* Max receive Size */ + receive, /* Receive port */ + MACH_MSG_TIMEOUT_NONE,/* No timeout */ + MACH_PORT_NULL); /* No notification */ + ASSERT_RET(err, "mach_msg txrx"); +} + + time_value_t stop_time; + err = host_get_time (mach_host_self (), &stop_time); + ASSERT_RET(err, "host_get_time"); + + printf ("start: %d.%06d\n", start_time.seconds, star
[PATCH 12/14] add basic task tests
--- tests/test-task.c | 149 ++ 1 file changed, 149 insertions(+) create mode 100644 tests/test-task.c diff --git a/tests/test-task.c b/tests/test-task.c new file mode 100644 index ..da9289ff --- /dev/null +++ b/tests/test-task.c @@ -0,0 +1,149 @@ + +#include +#include + +#include +#include +#include +#include +#include + +#include +#include + + +void test_task() +{ + mach_port_t ourtask = mach_task_self(); + mach_msg_type_number_t count; + int err; + + struct task_basic_info binfo; + count = TASK_BASIC_INFO_COUNT; + err = task_info(ourtask, TASK_BASIC_INFO, (task_info_t)&binfo, &count); + ASSERT_RET(err, "TASK_BASIC_INFO"); + + struct task_events_info einfo; + count = TASK_EVENTS_INFO_COUNT; + err = task_info(ourtask, TASK_EVENTS_INFO, (task_info_t)&einfo, &count); + ASSERT_RET(err, "TASK_EVENTS_INFO"); + + struct task_thread_times_info ttinfo; + count = TASK_THREAD_TIMES_INFO_COUNT; + err = task_info(ourtask, TASK_THREAD_TIMES_INFO, (task_info_t)&ttinfo, &count); + ASSERT_RET(err, "TASK_THREAD_TIMES_INFO"); +} + + +void dummy_thread(void *arg) +{ + printf("started dummy thread\n"); + while (1) +; +} + +void check_threads(thread_t *threads, mach_msg_type_number_t nthreads) +{ + for (int tid=0; tid
[PATCH 13/14] add basic thread tests
--- tests/test-threads.c | 88 1 file changed, 88 insertions(+) create mode 100644 tests/test-threads.c diff --git a/tests/test-threads.c b/tests/test-threads.c new file mode 100644 index ..659aaf3b --- /dev/null +++ b/tests/test-threads.c @@ -0,0 +1,88 @@ + +#include +#include + +#include +#include + +#include + +void sleeping_thread(void* arg) +{ + printf("starting thread %d\n", arg); + for (int i=0; i<100; i++) + msleep(50); // maybe we could just yield + printf("stopping thread %d\n", arg); + int err = thread_terminate(mach_thread_self()); + ASSERT_RET(err, "thread_terminate"); +} + +void test_many(void) +{ + int err; + for (long tid=0; tid<10; tid++) +{ + test_thread_start(mach_task_self(), sleeping_thread, (void*)tid); +} + // TODO: wait for thread end notifications + msleep(6000); +} + +#ifdef __x86_64__ +void test_fsgs_base_thread(void* tid) +{ + int err; +#if defined(__SEG_FS) && defined(__SEG_GS) + long __seg_fs *fs_ptr; + long __seg_gs *gs_ptr; + long fs_value; + long gs_value; + + struct i386_fsgs_base_state state; + state.fs_base = (unsigned long)&fs_value; + state.gs_base = (unsigned long)&gs_value; + err = thread_set_state(mach_thread_self(), i386_FSGS_BASE_STATE, + (thread_state_t) &state, i386_FSGS_BASE_STATE_COUNT); + ASSERT_RET(err, "thread_set_state"); + + fs_value = 0x100 + (long)tid; + gs_value = 0x200 + (long)tid; + + msleep(50); // allow the others to set their segment base + + fs_ptr = 0; + gs_ptr = 0; + long rdvalue = *fs_ptr; + printf("FS expected %lx read %lx\n", fs_value, rdvalue); + ASSERT(fs_value == rdvalue, "FS base error\n"); + rdvalue = *gs_ptr; + printf("GS expected %lx read %lx\n", gs_value, rdvalue); + ASSERT(gs_value == rdvalue, "GS base error\n"); +#else +#error " missing __SEG_FS and __SEG_GS" +#endif + + err = thread_terminate(mach_thread_self()); + ASSERT_RET(err, "thread_terminate"); +} +#endif + +void test_fsgs_base(void) +{ +#ifdef __x86_64__ + int err; + for (long tid=0; tid<10; tid++) +{ + test_thread_start(mach_task_self(), test_fsgs_base_thread, (void*)tid); +} + msleep(1000); // TODO: wait for threads +#endif +} + + +int main(int argc, char *argv[], int envc, char *envp[]) +{ + test_fsgs_base(); + test_many(); + return 0; +} -- 2.39.2
[PATCH v2 00/14] add user tests to gnumach
This patch set adds simple user-space tests as part of gnumach's make check, running on qemu, and some fixes found in the meantime. The tests are far for complete and currently mostly cover issues I found working on x86_64, but they are quite minimal and only require a working MIG. They will be definitely useful in case of new architectures, if supported by qemu. Currently I ran the tests only on GNU/Linux as qemu does not run yet on the Hurd. Luca Dariz (14): USER32: change default to disabled and make it a general option add basic user-space tests with qemu add mach_host tests add gsync tests add mach_port tests adjust range when changing memory pageability add basic vm tests add thread creation helper to tests add syscall tests expose MACH_MSG_USER_ALIGNMENT for manually-built messages add raw mach_msg tests add basic task tests add basic thread tests add tests to make check configure.ac | 16 +- include/mach/message.h | 19 +- tests/Makefrag.am| 11 +- tests/README | 37 +++ tests/configfrag.ac | 16 ++ tests/grub.cfg.single.template | 4 + tests/include/device/cons.h | 10 + tests/include/kern/printf.h | 1 + tests/include/mach/mig_support.h | 71 ++ tests/include/syscalls.h | 83 +++ tests/include/testlib.h | 58 + tests/include/util/atoi.h| 1 + tests/run-qemu.sh.template | 23 ++ tests/syscalls.S | 4 + tests/test-gsync.c | 87 +++ tests/test-hello.c | 9 + tests/test-mach_host.c | 54 + tests/test-mach_port.c | 80 +++ tests/test-machmsg.c | 390 +++ tests/test-syscalls.c| 149 tests/test-task.c| 149 tests/test-threads.c | 88 +++ tests/test-vm.c | 52 + tests/testlib.c | 150 tests/user-qemu.mk | 203 vm/vm_map.c | 21 +- x86_64/configfrag.ac | 14 +- 27 files changed, 1771 insertions(+), 29 deletions(-) create mode 100644 tests/README create mode 100644 tests/grub.cfg.single.template create mode 100644 tests/include/device/cons.h create mode 12 tests/include/kern/printf.h create mode 100644 tests/include/mach/mig_support.h create mode 100644 tests/include/syscalls.h create mode 100644 tests/include/testlib.h create mode 12 tests/include/util/atoi.h create mode 100644 tests/run-qemu.sh.template create mode 100644 tests/syscalls.S create mode 100644 tests/test-gsync.c create mode 100644 tests/test-hello.c create mode 100644 tests/test-mach_host.c create mode 100644 tests/test-mach_port.c create mode 100644 tests/test-machmsg.c create mode 100644 tests/test-syscalls.c create mode 100644 tests/test-task.c create mode 100644 tests/test-threads.c create mode 100644 tests/test-vm.c create mode 100644 tests/testlib.c create mode 100644 tests/user-qemu.mk -- 2.39.2
[PATCH 03/14] add mach_host tests
--- tests/test-mach_host.c | 54 ++ 1 file changed, 54 insertions(+) create mode 100644 tests/test-mach_host.c diff --git a/tests/test-mach_host.c b/tests/test-mach_host.c new file mode 100644 index ..99fc4aca --- /dev/null +++ b/tests/test-mach_host.c @@ -0,0 +1,54 @@ + +#include + +#include + +void test_kernel_version() +{ + int err; + kernel_version_t kver; + err = host_get_kernel_version(mach_host_self(), kver); + ASSERT_RET(err, "host_kernel_info"); + printf("kernel version: %s\n", kver); +} + +void test_host_info() +{ + int err; + mach_msg_type_number_t count; + mach_port_t thishost = mach_host_self(); + + host_basic_info_data_t binfo; + count = HOST_BASIC_INFO_COUNT; + err = host_info(thishost, HOST_BASIC_INFO, (host_info_t)&binfo, &count); + ASSERT_RET(err, "host_basic_info"); + ASSERT(count == HOST_BASIC_INFO_COUNT, ""); + + const int maxcpus = 255; + int proc_slots[maxcpus]; + count = maxcpus; + err = host_info(thishost, HOST_PROCESSOR_SLOTS, (host_info_t)&proc_slots, &count); + ASSERT_RET(err, "host_processor_slots"); + ASSERT((1 <= count) && (count <= maxcpus), ""); + + host_sched_info_data_t sinfo; + count = HOST_SCHED_INFO_COUNT; + err = host_info(thishost, HOST_SCHED_INFO, (host_info_t)&sinfo, &count); + ASSERT_RET(err, "host_sched_info"); + ASSERT(count == HOST_SCHED_INFO_COUNT, ""); + + host_load_info_data_t linfo; + count = HOST_LOAD_INFO_COUNT; + err = host_info(thishost, HOST_LOAD_INFO, (host_info_t)&linfo, &count); + ASSERT_RET(err, "host_load_info"); + ASSERT(count == HOST_LOAD_INFO_COUNT, ""); +} + +// TODO processor sets + +int main(int argc, char *argv[], int envc, char *envp[]) +{ + test_kernel_version(); + test_host_info(); + return 0; +} -- 2.39.2
[PATCH 09/14] add syscall tests
--- tests/test-syscalls.c | 149 ++ 1 file changed, 149 insertions(+) create mode 100644 tests/test-syscalls.c diff --git a/tests/test-syscalls.c b/tests/test-syscalls.c new file mode 100644 index ..7e4234ac --- /dev/null +++ b/tests/test-syscalls.c @@ -0,0 +1,149 @@ + +#include +#include + +#include +#include +#include + +#include +#include +#include + + +static struct { + mach_port_t exception_port; + mach_port_t thread; + mach_port_t task; + integer_t exception; + integer_t code; + integer_t subcode; +} last_exc; +kern_return_t catch_exception_raise(mach_port_t exception_port, +mach_port_t thread, mach_port_t task, +integer_t exception, integer_t code, +long_integer_t subcode) +{ + printf("received catch_exception_raise(%u %u %u %d %d %d)\n", + exception_port, thread, task, exception, code, subcode); + last_exc.exception_port = exception_port; + last_exc.thread = thread; + last_exc.task = task; + last_exc.exception = exception; + last_exc.code = code; + last_exc.subcode = subcode; + return KERN_SUCCESS; +} + +static char simple_request_data[PAGE_SIZE]; +static char simple_reply_data[PAGE_SIZE]; +int simple_msg_server(boolean_t (*demuxer) (mach_msg_header_t *request, + mach_msg_header_t *reply), + mach_port_t rcv_port_name, + int num_msgs) +{ + int midx = 0, mok = 0; + int ret; + mig_reply_header_t *request = (mig_reply_header_t*)simple_request_data; + mig_reply_header_t *reply = (mig_reply_header_t*)simple_reply_data; + while ((midx - num_msgs) < 0) +{ + ret = mach_msg(&request->Head, MACH_RCV_MSG, 0, PAGE_SIZE, + rcv_port_name, 0, MACH_PORT_NULL); + switch (ret) +{ +case MACH_MSG_SUCCESS: + if ((*demuxer)(&request->Head, &reply->Head)) +mok++; // TODO send reply + else +FAILURE("demuxer didn't handle the message"); + break; +default: + ASSERT_RET(ret, "receiving in msg_server"); + break; +} + midx++; +} + if (mok != midx) +FAILURE("wrong number of message received"); + return mok != midx; +} + + +void test_syscall_bad_arg_on_stack(void *arg) +{ + /* mach_msg() has 7 arguments, so the last one should be passed on the stack + make RSP points the the wrong place to check the access check + */ +#ifdef __x86_64__ + asm volatile("movq $0x123,%rsp;" \ + "movq $-25,%rax;" \ + "syscall;" \ + ); +#else + asm volatile("mov$0x123,%esp;" \ + "mov$-25,%eax;" \ + "lcall $0x7,$0x0;" \ + ); +#endif + FAILURE("we shouldn't be here!"); +} + +void test_bad_syscall_num(void *arg) +{ +#ifdef __x86_64__ + asm volatile("movq $0x123456,%rax;"\ + "syscall;" \ + ); +#else + asm volatile("mov$0x123456,%eax;"\ + "lcall $0x7,$0x0;" \ + ); +#endif + FAILURE("we shouldn't be here!"); +} + + +int main(int argc, char *argv[], int envc, char *envp[]) +{ + int err; + mach_port_t excp; + + err = mach_port_allocate(mach_task_self (), MACH_PORT_RIGHT_RECEIVE, &excp); + ASSERT_RET(err, "creating exception port"); + + err = mach_port_insert_right(mach_task_self(), excp, excp, + MACH_MSG_TYPE_MAKE_SEND); + ASSERT_RET(err, "inserting sr into exception port"); + + err = task_set_special_port(mach_task_self(), TASK_EXCEPTION_PORT, excp); + ASSERT_RET(err, "setting task exception port"); + + /* FIXME: receiving an exception with small size causes GP on 64 bit userspace */ + /* mig_reply_header_t msg; */ + /* err = mach_msg(&msg.Head, /\* The header *\/ */ + /*MACH_RCV_MSG, */ + /*0, */ + /*sizeof (msg), /\* Max receive Size *\/ */ + /*excp, */ + /*1000, */ + /*MACH_PORT_NULL); */ + + // FIXME: maybe MIG should provide this prototype? + boolean_t exc_server +(mach_msg_header_t *InHeadP, mach_msg_header_t *OutHeadP); + + memset(&last_exc, 0, sizeof(last_exc)); + test_thread_start(mach_task_self(), test_bad_syscall_num, NULL); + ASSERT_RET(simple_msg_server(exc_server, excp, 1), "error in exc server"); + ASSERT((last_exc.exception == EXC_BAD_INSTRUCTION) && (last_exc.code == EXC_I386_INVOP), + "bad exception for test_bad_syscall_num()"); + + memset(&last_exc, 0, sizeof(last_exc)); + test_thread_start(mach_task_self(), test_syscall_bad_arg_on_stack, NULL); + ASSERT_RET(simple_msg_server(e
[PATCH 07/14] add basic vm tests
--- tests/test-vm.c | 52 + 1 file changed, 52 insertions(+) create mode 100644 tests/test-vm.c diff --git a/tests/test-vm.c b/tests/test-vm.c new file mode 100644 index ..ba52876b --- /dev/null +++ b/tests/test-vm.c @@ -0,0 +1,52 @@ + +#include +#include + +#include +#include +#include +#include +#include + +#include +#include +#include +#include + + +static void test_memobj() +{ + // this emulates maptime() + struct mapped_time_value *mtime; + mach_port_t device, memobj; + int err = device_open (device_priv(), 0, "time", &device); + ASSERT_RET(err, "device_open"); + err = device_map (device, VM_PROT_READ, 0, sizeof(*mtime), &memobj, 0); + ASSERT_RET(err, "device_map"); + err = mach_port_deallocate (mach_task_self (), device); + ASSERT_RET(err, "mach_port_deallocate"); + mtime = 0; + err = +vm_map (mach_task_self (), (vm_address_t *)&mtime, sizeof *mtime, 0, 1, +memobj, 0, 0, VM_PROT_READ, VM_PROT_READ, VM_INHERIT_NONE); + ASSERT_RET(err, "vm_map"); + err = mach_port_deallocate (mach_task_self (), memobj); + ASSERT_RET(err, "mach_port_deallocate"); + err = vm_deallocate(mach_task_self(), (vm_address_t)mtime, sizeof(*mtime)); + ASSERT_RET(err, "vm_deallocate"); +} + +static void test_wire() +{ + int err = vm_wire_all(host_priv(), mach_task_self(), VM_WIRE_ALL); + ASSERT_RET(err, "vm_wire_all"); +} + +int main(int argc, char *argv[], int envc, char *envp[]) +{ + printf("VM_MIN_ADDRESS=0x%p\n", VM_MIN_ADDRESS); + printf("VM_MAX_ADDRESS=0x%p\n", VM_MAX_ADDRESS); + test_wire(); + test_memobj(); + return 0; +} -- 2.39.2
[PATCH 02/14] add basic user-space tests with qemu
* configure.ac: move test fragment to have USER32 * tests/Makefrag.am: add user tests * tests/README: add basic info on how to run and debug user tests * tests/configfrag.ac: allow the test compiler/flags to be autoconfigured or customized * tests/grub.cfg.single.template: add minimal grub config to boot a module * tests/include/device/cons.h: add a simplified version of device/cons.h usable for tests * tests/include/kern/printf.h: symlink to kern/printf.h * tests/include/mach/mig_support.h: add basic version for user-space tests * tests/include/syscalls.h: add prototypes for syscalls used in tests. * tests/include/testlib.h: add definitions for common test functionalities * tests/include/util/atoi.h: symlink to util/atoi.h * tests/run-qemu.sh.template: add a simple qemu test runner * tests/syscalls.S: generate syscalls entry points * tests/test-hello.c: add basic smoke test * tests/testlib.c: add the imnimal funcitonality to run a user-space executable and reboot the system, and some test helpers. * tests/user-qemu.mk: add rules to build simple user-space test modules, including generating mig stubs. The tests reuse some kernel code (like printf(), mach_atoi(), mem*(), str*() functions) so we can use the freestanding environment and not depend on glibc. --- configure.ac | 6 +- tests/Makefrag.am| 11 +- tests/README | 37 ++ tests/configfrag.ac | 16 +++ tests/grub.cfg.single.template | 4 + tests/include/device/cons.h | 10 ++ tests/include/kern/printf.h | 1 + tests/include/mach/mig_support.h | 71 +++ tests/include/syscalls.h | 83 + tests/include/testlib.h | 57 + tests/include/util/atoi.h| 1 + tests/run-qemu.sh.template | 23 tests/syscalls.S | 4 + tests/test-hello.c | 9 ++ tests/testlib.c | 97 +++ tests/user-qemu.mk | 195 +++ 16 files changed, 620 insertions(+), 5 deletions(-) create mode 100644 tests/README create mode 100644 tests/grub.cfg.single.template create mode 100644 tests/include/device/cons.h create mode 12 tests/include/kern/printf.h create mode 100644 tests/include/mach/mig_support.h create mode 100644 tests/include/syscalls.h create mode 100644 tests/include/testlib.h create mode 12 tests/include/util/atoi.h create mode 100644 tests/run-qemu.sh.template create mode 100644 tests/syscalls.S create mode 100644 tests/test-hello.c create mode 100644 tests/testlib.c create mode 100644 tests/user-qemu.mk diff --git a/configure.ac b/configure.ac index 52205eee..0c5d4d61 100644 --- a/configure.ac +++ b/configure.ac @@ -123,9 +123,6 @@ AC_CHECK_PROG([PATCH], [patch], [patch], [patch-not-found]) # configure fragments. # -# The test suite. -m4_include([tests/configfrag.ac]) - # Default set of device drivers. AC_ARG_ENABLE([device-drivers], AS_HELP_STRING([--enable-device-drivers=WHICH], [specify WHICH (on `ix86-at' @@ -181,6 +178,9 @@ m4_include([configfrag.ac]) # Linux code snarfed into GNU Mach. m4_include([linux/configfrag.ac]) + +# The test suite. +m4_include([tests/configfrag.ac]) # # Compiler features. diff --git a/tests/Makefrag.am b/tests/Makefrag.am index 2723f64a..88c7fe8c 100644 --- a/tests/Makefrag.am +++ b/tests/Makefrag.am @@ -21,6 +21,13 @@ # if !PLATFORM_xen + +include tests/user-qemu.mk + TESTS += \ - tests/test-multiboot -endif + tests/test-multiboot \ + $(USER_TESTS) + +clean-local: $(USER_TESTS_CLEAN) + +endif # !PLATFORM_xen diff --git a/tests/README b/tests/README new file mode 100644 index ..4cf25891 --- /dev/null +++ b/tests/README @@ -0,0 +1,37 @@ + +There are some basic tests that can be run qith qemu. You can run all the tests with + +$ make check + +or selectively with: + +$ make run-hello + +Also, you can debug the existing tests, or a new one, by starting on one shell + +$ make debug-hello + +and on another shell you can attach with gdb, load the symbols of the +bootstrap module and break on its _start(): + +$ gdb gnuamch +... +(gdb) target remote :1234 +... +(gdb) b setup_main +Breakpoint 11 at 0x81019d60: file ../kern/startup.c, line 98. +(gdb) c +Continuing. + +Breakpoint 11, setup_main () at ../kern/startup.c:98 +98 cninit(); +(gdb) add-symbol-file ../gnumach/build-64/module-task +Reading symbols from ../gnumach/build-64/module-task... +(gdb) b _start +Breakpoint 12 at 0x40324a: _start. (2 locations) +(gdb) c +Continuing. + +Breakpoint 12, _start () at ../tests/testlib.c:96 +96 { +(gdb) diff --git a/tests/configfrag.ac b/tests/configfrag.ac index 55c6da62..de87cbad 100644 --- a/tests/configfrag.ac +++ b/tests/configfrag.ac @@ -22,6 +22,22 @@ dnl 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301,
[PATCH 06/14] adjust range when changing memory pageability
* vm/vm_map.c: if the start address is not in the map, try to find the nearest entry instead of failing. This caused the initial vm_wire_all(host, task VM_WIRE_ALL) in glibc startup to fail with KERN_NO_SPACE. --- vm/vm_map.c | 21 - 1 file changed, 16 insertions(+), 5 deletions(-) diff --git a/vm/vm_map.c b/vm/vm_map.c index 26e18676..4a5861e3 100644 --- a/vm/vm_map.c +++ b/vm/vm_map.c @@ -1774,13 +1774,24 @@ kern_return_t vm_map_pageable( if (!vm_map_lookup_entry(map, start, &start_entry)) { /* -* Start address is not in map; this is fatal. +* Start address is not in map; try to find the nearest entry */ - if (lock_map) { - vm_map_unlock(map); - } + struct rbtree_node *node; + node = rbtree_lookup_nearest(&map->hdr.tree, start, +vm_map_entry_cmp_lookup, RBTREE_LEFT); + + if (node == NULL) { + node = rbtree_lookup_nearest(&map->hdr.tree, start, +vm_map_entry_cmp_lookup, RBTREE_RIGHT); + if (node == NULL) { + if (lock_map) { + vm_map_unlock(map); + } + return KERN_NO_SPACE; + } - return KERN_NO_SPACE; + } + start_entry = rbtree_entry(node, struct vm_map_entry, tree_node); } /* -- 2.39.2
[PATCH 10/14] expose MACH_MSG_USER_ALIGNMENT for manually-built messages
--- include/mach/message.h | 19 ++- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/include/mach/message.h b/include/mach/message.h index 0b8b34d4..9790ef98 100644 --- a/include/mach/message.h +++ b/include/mach/message.h @@ -401,6 +401,16 @@ typedef integer_t mach_msg_option_t; #define MACH_SEND_ALWAYS 0x0001 /* internal use only */ +#ifdef __x86_64__ +#if defined(KERNEL) && defined(USER32) +#define MACH_MSG_USER_ALIGNMENT 4 +#else +#define MACH_MSG_USER_ALIGNMENT 8 +#endif +#else +#define MACH_MSG_USER_ALIGNMENT 4 +#endif + #ifdef KERNEL /* This is the alignment of msg descriptors and the actual data * for both in kernel messages and user land messages. @@ -411,15 +421,6 @@ typedef integer_t mach_msg_option_t; * so that kernel messages are correctly aligned. */ #define MACH_MSG_KERNEL_ALIGNMENT sizeof(uintptr_t) -#ifdef __x86_64__ -#ifdef USER32 -#define MACH_MSG_USER_ALIGNMENT 4 -#else -#define MACH_MSG_USER_ALIGNMENT 8 -#endif -#else -#define MACH_MSG_USER_ALIGNMENT 4 -#endif #define mach_msg_align(x, alignment) \ ( ( ((vm_offset_t)(x)) + ((alignment)-1) ) & ~((alignment)-1) ) -- 2.39.2
[PATCH 08/14] add thread creation helper to tests
--- tests/include/testlib.h | 1 + tests/testlib.c | 53 + 2 files changed, 54 insertions(+) diff --git a/tests/include/testlib.h b/tests/include/testlib.h index 2b7a67c0..52605047 100644 --- a/tests/include/testlib.h +++ b/tests/include/testlib.h @@ -48,6 +48,7 @@ const char* e2s(int err); const char* e2s_gnumach(int err); void halt(); int msleep(uint32_t timeout); +thread_t test_thread_start(task_t task, void(*routine)(void*), void* arg); mach_port_t host_priv(void); mach_port_t device_priv(void); diff --git a/tests/testlib.c b/tests/testlib.c index 6abe8c4d..e6be46ee 100644 --- a/tests/testlib.c +++ b/tests/testlib.c @@ -95,3 +95,56 @@ void _start() printf("%s: test %s exit code %x\n", TEST_SUCCESS_MARKER, argv[0], ret); halt(); } + +// started from https://github.com/dwarfmaster/mach-ipc/blob/master/minimal_threads/main.c + +static uint32_t stack_top[PAGE_SIZE] __attribute__ ((aligned (PAGE_SIZE))); +thread_t test_thread_start(task_t task, void(*routine)(void*), void* arg) { + const vm_size_t stack_size = PAGE_SIZE * 16; + kern_return_t ret; + vm_address_t stack; + + ret = vm_allocate(task, &stack, stack_size, TRUE); + ASSERT_RET(ret, "can't allocate the stack for a new thread"); + + ret = vm_protect(task, stack, PAGE_SIZE, FALSE, VM_PROT_NONE); + ASSERT_RET(ret, "can't protect the stack from overflows"); + + long *top = (long*)((vm_offset_t)stack_top + PAGE_SIZE) - 2; + *top = 0;/* The return address */ +#ifndef __x86_64__ + *(top + 1) = (long)arg; /* The argument is passed on the stack on x86_32 */ +#endif + ret = vm_write(task, stack + stack_size - PAGE_SIZE, (vm_offset_t)stack_top, PAGE_SIZE); + ASSERT_RET(ret, "can't initialize the stack for the new thread"); + + thread_t thread; + ret = thread_create(task, &thread); + ASSERT_RET(ret, "thread_create()"); + + struct i386_thread_state state; + unsigned int count; + count = i386_THREAD_STATE_COUNT; + ret = thread_get_state(thread, i386_REGS_SEGS_STATE, + (thread_state_t) &state, &count); + ASSERT_RET(ret, "thread_get_state()"); + +#ifdef __x86_64__ + state.rip = (long) routine; + state.ursp = (long) (stack + stack_size - 16); + state.rbp = 0; + state.rdi = (long)arg; +#else + state.eip = (long) routine; + state.uesp = (long) (stack + stack_size - 8); + state.ebp = 0; +#endif + ret = thread_set_state(thread, i386_REGS_SEGS_STATE, + (thread_state_t) &state, i386_THREAD_STATE_COUNT); + ASSERT_RET(ret, "thread_set_state"); + + ret = thread_resume(thread); + ASSERT_RET(ret, "thread_resume"); + + return thread; +} -- 2.39.2
[PATCH 04/14] add gsync tests
--- tests/test-gsync.c | 87 ++ 1 file changed, 87 insertions(+) create mode 100644 tests/test-gsync.c diff --git a/tests/test-gsync.c b/tests/test-gsync.c new file mode 100644 index ..80701606 --- /dev/null +++ b/tests/test-gsync.c @@ -0,0 +1,87 @@ + +#include +#include + +#include +#include +#include +#include + +#include +#include + +/* Gsync flags. */ +#ifndef GSYNC_SHARED +# define GSYNC_SHARED 0x01 +# define GSYNC_QUAD0x02 +# define GSYNC_TIMED 0x04 +# define GSYNC_BROADCAST 0x08 +# define GSYNC_MUTATE 0x10 +#endif + +static uint32_t single_shared; +static uint64_t single_shared_quad; + +static void single_t2(void *arg) +{ + int err; + msleep(100); + err = gsync_wake(mach_task_self(), (vm_offset_t)&single_shared, 0, 0); + ASSERT_RET(err, "gsync_wake t2"); + + err = thread_terminate(mach_thread_self()); + ASSERT_RET(err, "thread_terminate"); +} + +static void test_single() +{ + int err; + single_shared = 0; + err = gsync_wait(mach_task_self(), (vm_offset_t)&single_shared, 0, 0, 100, GSYNC_TIMED); + ASSERT(err == KERN_TIMEDOUT, "gsync_wait 1"); + + single_shared = 1; + err = gsync_wait(mach_task_self(), (vm_offset_t)&single_shared, 0, 0, 100, GSYNC_TIMED); + ASSERT(err == KERN_INVALID_ARGUMENT, "gsync_wait 2"); + err = gsync_wait(mach_task_self(), (vm_offset_t)&single_shared, 1, 0, 100, GSYNC_TIMED); + ASSERT(err == KERN_TIMEDOUT, "gsync_wait 2b"); + + err = gsync_wake(mach_task_self(), (vm_offset_t)&single_shared, 0, 0); + ASSERT_RET(err, "gsync_wake 1"); + + err = gsync_wake(mach_task_self(), (vm_offset_t)&single_shared, 0, GSYNC_BROADCAST); + ASSERT_RET(err, "gsync_wake 2"); + + err = gsync_wake(mach_task_self(), (vm_offset_t)&single_shared, 2, GSYNC_MUTATE); + ASSERT_RET(err, "gsync_wake 2a"); + ASSERT(single_shared == 2, "gsync_wake single_shared"); + + err = gsync_wake(mach_task_self(), (vm_offset_t)&single_shared, 0, 0); + ASSERT_RET(err, "gsync_wake 3"); + err = gsync_wake(mach_task_self(), (vm_offset_t)&single_shared, 0, 0); + ASSERT_RET(err, "gsync_wake 3a"); + + single_shared = 1; + test_thread_start(mach_task_self(), single_t2, 0); + err = gsync_wait(mach_task_self(), (vm_offset_t)&single_shared, 1, 0, 0, 0); + ASSERT_RET(err, "gsync_wait 4"); +} + +static void test_single2() +{ + int err; + test_thread_start(mach_task_self(), single_t2, 0); + single_shared = 2; + err = gsync_wait(mach_task_self(), (vm_offset_t)&single_shared, 2, 0, 0, 0); + ASSERT_RET(err, "gsync_wait 1"); + // should this fail? + err = gsync_wake(mach_task_self(), (vm_offset_t)&single_shared, 0, 0); + ASSERT_RET(err, "gsync_wake 2"); +} + +int main(int argc, char *argv[], int envc, char *envp[]) +{ + test_single2(); + test_single(); + return 0; +} -- 2.39.2
[PATCH 01/14] USER32: change default to disabled and make it a general option
* configfrag.ac: add the global option USER32; although it makes sense for 64-bit versions only, it can be used by future 64-bit architectiures and not only x86_64. Also, change the default setting to be disabled; now that we have a working full 64-bit system, it makes sense to consider it the common choice. * x86_64/configfrag.ac: define the correct 32-bit cpu if USER32 is enabled. --- configure.ac | 10 ++ x86_64/configfrag.ac | 14 -- 2 files changed, 14 insertions(+), 10 deletions(-) diff --git a/configure.ac b/configure.ac index 3665d47a..52205eee 100644 --- a/configure.ac +++ b/configure.ac @@ -147,6 +147,16 @@ AC_ARG_ENABLE([device-drivers], [;; esac] +AC_ARG_ENABLE([user32], +AS_HELP_STRING([--enable-user32], [enable 32-bit user space on a 64-bit kernel])) +[if [ x"$enable_user32" = xyes ]; then] +AC_DEFINE([USER32], [], [enable 32-bit user on 64-bit kernel]) +AM_CONDITIONAL([enable_user32], [true]) +[else] +AM_CONDITIONAL([enable_user32], [false]) +[fi] + + # Platform-specific configuration. # PC AT. diff --git a/x86_64/configfrag.ac b/x86_64/configfrag.ac index 71f8d8c1..f119a9a3 100644 --- a/x86_64/configfrag.ac +++ b/x86_64/configfrag.ac @@ -27,22 +27,16 @@ dnl USE OF THIS SOFTWARE. # Determines the size of the CPU cache line. AC_DEFINE([CPU_L1_SHIFT], [6], [CPU_L1_SHIFT]) -AC_ARG_ENABLE([user32], - AS_HELP_STRING([--disable-user32], [disable 32-bit user on 64-bit kernel])) -[if [ x"$enable_user32" != xno ]; then] - AC_DEFINE([USER32], [], [enable 32-bit user on 64-bit kernel]) - AM_CONDITIONAL([enable_user32], [true]) -[else] - AM_CONDITIONAL([enable_user32], [false]) -[fi] +[if test x"$enable_user32" = xyes ; then + user32_cpu=i686 +fi] [# Does the architecture provide machine-specific interfaces? mach_machine_routines=1 enable_pae=yes;; *)] -AM_CONDITIONAL([HOST_x86_64], [false]) -AM_CONDITIONAL([enable_user32], [true])[;; +AM_CONDITIONAL([HOST_x86_64], [false])[;; esac case $host_platform in -- 2.39.2
[PATCH 05/14] add mach_port tests
--- tests/test-mach_port.c | 80 ++ 1 file changed, 80 insertions(+) create mode 100644 tests/test-mach_port.c diff --git a/tests/test-mach_port.c b/tests/test-mach_port.c new file mode 100644 index ..4f095047 --- /dev/null +++ b/tests/test-mach_port.c @@ -0,0 +1,80 @@ + +#include +#include +#include + +#include +#include + +#include +#include + +void test_mach_port(void) +{ + kern_return_t err; + + mach_port_name_t *namesp; + mach_msg_type_number_t namesCnt=0; + mach_port_type_t *typesp; + mach_msg_type_number_t typesCnt=0; + err = mach_port_names(mach_task_self(), &namesp, &namesCnt, &typesp, &typesCnt); + ASSERT_RET(err, "mach_port_names"); + printf("mach_port_names: type/name length: %d %d\n", namesCnt, typesCnt); + ASSERT((namesCnt != 0) && (namesCnt == typesCnt), + "mach_port_names: wrong type/name length"); + for (int i=0; i
Re: [PATCH gnumach] x86_64: Support 8 byte inlined port rights to avoid message resizing.
Il 24/11/23 22:30, Flavio Cruz ha scritto: If a port is inlined in a message, the user has to use mach_port_name_inlined_t to define each port. Out of line memory continues to use mach_port_name_t since that memory has to be copied to the kernel anyway. Both copyinmsg and copyoutmsg can be reduced to nothing (if we ignore USER32) as a follow up but kept this patch simple for ease of review. wouldn't they just become almost an alias to copyin/copyout, plus setting msg size? (as in the 32-bit case) In that case probably copy_user.c is not needed anymore, on a full 64-bit system, and would be also simpler than this change. Luca
Re: [RFC PATCH 1/2] add basic user-space tests with qemu
Hi, Il 22/10/23 17:19, Samuel Thibault ha scritto: Luca Dariz, le jeu. 19 oct. 2023 20:57:46 +0200, a ecrit: * tests/configfrag.ac: add MIGUSER, required for user-space tests as it might be different from the one used to ubild the kernel. Can't we just automatically select the proper mig? depending on user-land bitness we can use i686-gnu-mig or x86_64-gnu-mig. yes that should be the default indeed, but I would keep the possibility to override it, to test a new mig if needed. * tests/include/kern/printf.h: reuse kern/printf.h * tests/include/util/atoi.h: reuse util/atoi.h Rather use a symlink for such files? they are already a symlink, maybe I'll update the commit description to be clearer diff --git a/tests/Makefrag.am b/tests/Makefrag.am index 2723f64a..c16326e8 100644 --- a/tests/Makefrag.am +++ b/tests/Makefrag.am @@ -23,4 +23,129 @@ +if HOST_ix86 +QEMU=qemu-system-i386 Using qemu-system-i386 fails on my system: grub starts, loads the kernel, but booting it fails. It doesn't seem related to this patch series, as qemu-system-i386 -kernel gnumach seems to fail the same way. But we'd rather fix it before adding the test support :) I've tried changing some options and I see the kernel rebooting only with -cpu pentium (pentium2 works). I'm on debian 12 with qemu 7.2+dfsg-7+deb12u2, I tried with both debian's current gnumach and compiling from master. Does it work for you if using qemu-system-x86_64? I would avoid using it for the 32-bit gnumach tests as it confuses gdb. +QEMU_GDB_PORT ?= 11234 Typo? Actually this is somehow related to my .gdbinit, as sometimes I was running two debug session in parallel, to compare the 32 and 64 bit behavior. Having it customizable is probably enough, with one single default. + cat $(srcdir)/tests/grub.cfg.single.template | \ + sed -e s/BOOTMODULE/$$(builddir)/tests/isofiles/boot/grub/grub.cfg rather < $(srcdir)/tests/grub.cfg.single.template \ sed -e s/BOOTMODULE/$$(builddir)/tests/isofiles/boot/grub/grub.cfg will do diff --git a/tests/run-qemu.sh.template b/tests/run-qemu.sh.template new file mode 100644 index ..1f116354 --- /dev/null +++ b/tests/run-qemu.sh.template @@ -0,0 +1,22 @@ +#!/bin/sh + +set -e + +cmd="QEMU QEMU_OPTS -cdrom test-TESTNAME.iso" +out=out-TESTNAME +if which QEMU >/dev/null ; then +if ! timeout -v --foreground --kill-after=3 15s $cmd > $out; then +tail -n +18 $out # skip terminal reconfiguration Better use sed -n '/start module-/,$p' or such to properly skip all kernel startup prints. You could also make the grub script print something, that you can catch in the sed script. And you can put the sed script instead of "> $out", so the output file is already clean and you don't have to duplicate the one-liner to print it. Good idea, no need to hardcode the number of lines then. Also, please think about cleaning files :) do you mean make clean or some other style cleanup in the code? diff --git a/tests/testlib.c b/tests/testlib.c new file mode 100644 index ..13aa2614 --- /dev/null +++ b/tests/testlib.c @@ -0,0 +1,125 @@ +// from glibc's sysdeps/mach/usleep.c No need to keep the reference, it's small and trivial enough that it's not copyrightable :) ok +int msleep(uint32_t timeout) +{ + mach_port_t recv = mach_reply_port(); + return mach_msg(NULL, MACH_RCV_MSG|MACH_RCV_TIMEOUT|MACH_RCV_INTERRUPT, + 0, 0, recv, timeout, MACH_PORT_NULL); +} + +const char* e2s(int err) +{ +switch (err) +{ +case MACH_SEND_INVALID_DATA: return "MACH_SEND_INVALID_DATA"; You could use a macro #define E2S(s) \ case s: return #s; E2S(MACH_SEND_INVALID_DATA) In addition to this I was also thinking about auto-generating the switch/case from message.h kern_return.h and mig_errors.h, with the possibility to extend it. Luca
[RFC PATCH 2/2] add mach_host tests
--- tests/Makefrag.am | 4 +++- tests/test-mach_host.c | 54 ++ 2 files changed, 57 insertions(+), 1 deletion(-) create mode 100644 tests/test-mach_host.c diff --git a/tests/Makefrag.am b/tests/Makefrag.am index c16326e8..cb946bc7 100644 --- a/tests/Makefrag.am +++ b/tests/Makefrag.am @@ -146,6 +146,8 @@ debug-%: test-%.iso TESTS += \ - tests/test-hello + tests/test-hello \ + tests/test-mach_host + endif # !PLATFORM_xen diff --git a/tests/test-mach_host.c b/tests/test-mach_host.c new file mode 100644 index ..2e9e6989 --- /dev/null +++ b/tests/test-mach_host.c @@ -0,0 +1,54 @@ + +#include + +#include + +void test_kernel_version() +{ + int err; + kernel_version_t kver; + err = host_get_kernel_version(mach_host_self(), kver); + ASSERT_RET(err, "host_kernel_info"); + printf("kernel version: %s\n", kver); +} + +void test_host_info() +{ + int err; + mach_msg_type_number_t count; + mach_port_t thishost = mach_host_self(); + + host_basic_info_data_t binfo; + count = HOST_BASIC_INFO_COUNT; + err = host_info(thishost, HOST_BASIC_INFO, (host_info_t)&binfo, &count); + ASSERT_RET(err, "host_basic_info"); + ASSERT(count == HOST_BASIC_INFO_COUNT); + + const int maxcpus = 255; + int proc_slots[maxcpus]; + count = maxcpus; + err = host_info(thishost, HOST_PROCESSOR_SLOTS, (host_info_t)&proc_slots, &count); + ASSERT_RET(err, "host_processor_slots"); + ASSERT((1 <= count) && (count <= maxcpus)); + + host_sched_info_data_t sinfo; + count = HOST_SCHED_INFO_COUNT; + err = host_info(thishost, HOST_SCHED_INFO, (host_info_t)&sinfo, &count); + ASSERT_RET(err, "host_sched_info"); + ASSERT(count == HOST_SCHED_INFO_COUNT); + + host_load_info_data_t linfo; + count = HOST_LOAD_INFO_COUNT; + err = host_info(thishost, HOST_LOAD_INFO, (host_info_t)&linfo, &count); + ASSERT_RET(err, "host_load_info"); + ASSERT(count == HOST_LOAD_INFO_COUNT); +} + +// TODO processor sets + +int main(int argc, char *argv[], int envc, char *envp[]) +{ + test_kernel_version(); + test_host_info(); + return 0; +} -- 2.39.2
[RFC PATCH 0/2] gnumach testing
This patch adds the possibility to automate simple tests with user-space programs, using qemu. I used this method to work on the x86_64 port so far, and it was quite useful for short iterations and testing special conditions (e.g. memory objects, syscall errors). I'm not sure if this is the best way to integrate it in the existing codebase, as it adds more dependencies for testing (qemu, two versions of MIG if testing 32on64), or if it would be better to have a separate repository for this kind of testing. For example the same method can easily be combined with hurd code, by automating the build of the relevant parts. I usually use this method to check the 32bit (non-PAE), 32bit (PAE), 32on64 and full 64 bit before sending out patches. For example, currently it seems that the 32on64 configuration does not start properly. I also add a small mach_host test example; once the testing environment is settled I can adapt other tests I currently have. Luca Dariz (2): add basic user-space tests with qemu add mach_host tests tests/Makefrag.am| 127 +++ tests/README | 37 + tests/configfrag.ac | 10 +++ tests/grub.cfg.single.template | 3 + tests/include/device/cons.h | 10 +++ tests/include/kern/printf.h | 1 + tests/include/mach/mig_support.h | 71 + tests/include/syscalls.h | 83 tests/include/testlib.h | 46 +++ tests/include/util/atoi.h| 1 + tests/run-qemu.sh.template | 22 ++ tests/syscalls.S | 4 + tests/test-hello.c | 9 +++ tests/test-mach_host.c | 54 + tests/testlib.c | 125 ++ 15 files changed, 603 insertions(+) create mode 100644 tests/README create mode 100644 tests/grub.cfg.single.template create mode 100644 tests/include/device/cons.h create mode 12 tests/include/kern/printf.h create mode 100644 tests/include/mach/mig_support.h create mode 100644 tests/include/syscalls.h create mode 100644 tests/include/testlib.h create mode 12 tests/include/util/atoi.h create mode 100644 tests/run-qemu.sh.template create mode 100644 tests/syscalls.S create mode 100644 tests/test-hello.c create mode 100644 tests/test-mach_host.c create mode 100644 tests/testlib.c -- 2.39.2
[RFC PATCH 1/2] add basic user-space tests with qemu
* tests/Makefrag.am: add rules to build simple user-space test modules, including generating mig stubs. The tests reuse some kernel code (the printf(), mach_atoi(), mem*(), str*() functions) so we can use the freestanding environment and not depend on glibc. * tests/README: add basic info on how to run and debug tests * tests/configfrag.ac: add MIGUSER, required for user-space tests as it might be different from the one used to ubild the kernel. * tests/grub.cfg.single.template: add minimal grub config to boot a module * tests/include/device/cons.h: add a simplified version of device/cons.h usable for tests * tests/include/kern/printf.h: reuse kern/printf.h * tests/include/mach/mig_support.h: add basic version for user-space tests * tests/include/syscalls.h: add prototypes for syscalls used in tests. * tests/include/testlib.h: add definitions for common test functionalities * tests/include/util/atoi.h: reuse util/atoi.h * tests/run-qemu.sh.template: add a simple qemu test runner * tests/syscalls.S: generate syscalls entry points * tests/test-hello.c: add basic smoke test * tests/testlib.c: add the imnimal funcitonality to run a user-space executable and reboot the system, and some test helpers. --- tests/Makefrag.am| 125 +++ tests/README | 37 + tests/configfrag.ac | 10 +++ tests/grub.cfg.single.template | 3 + tests/include/device/cons.h | 10 +++ tests/include/kern/printf.h | 1 + tests/include/mach/mig_support.h | 71 ++ tests/include/syscalls.h | 83 tests/include/testlib.h | 46 tests/include/util/atoi.h| 1 + tests/run-qemu.sh.template | 22 ++ tests/syscalls.S | 4 + tests/test-hello.c | 9 +++ tests/testlib.c | 125 +++ 14 files changed, 547 insertions(+) create mode 100644 tests/README create mode 100644 tests/grub.cfg.single.template create mode 100644 tests/include/device/cons.h create mode 12 tests/include/kern/printf.h create mode 100644 tests/include/mach/mig_support.h create mode 100644 tests/include/syscalls.h create mode 100644 tests/include/testlib.h create mode 12 tests/include/util/atoi.h create mode 100644 tests/run-qemu.sh.template create mode 100644 tests/syscalls.S create mode 100644 tests/test-hello.c create mode 100644 tests/testlib.c diff --git a/tests/Makefrag.am b/tests/Makefrag.am index 2723f64a..c16326e8 100644 --- a/tests/Makefrag.am +++ b/tests/Makefrag.am @@ -23,4 +23,129 @@ if !PLATFORM_xen TESTS += \ tests/test-multiboot + +# FIXME: how can we reliably detect a change on any header and reinstall them? +$(MACH_TESTINSTALL): + mkdir -p $@ + $(MAKE) install-data DESTDIR=$@ + +prepare-test: $(MACH_TESTINSTALL) + +$(MIG_OUTDIR): + mkdir -p $@ + +MACH_TESTINSTALL=$(builddir)/tests/include-mach +MACH_TESTINCLUDE=$(MACH_TESTINSTALL)/$(prefix)/include + +MIGCOMUSER=$(MIGUSER) -n -cc cat - /dev/null +MIG_OUTDIR=$(builddir)/tests/mig-out +MIG_CPPFLAGS=-nostdinc -I$(MACH_TESTINCLUDE) -ggdb + +define generate_mig_client +$(MIG_OUTDIR)/$(2).user.c: prepare-test $(MIG_OUTDIR) $(srcdir)/include/$(1)/$(2).defs + $(CPP) $(MIG_CPPFLAGS) -o $(MIG_OUTDIR)/$(2).user.defs $(srcdir)/include/$(1)/$(2).defs + $(MIGCOMUSER) $(MIGCOMFLAGS) $(MIGCOMUFLAGS)\ + -user $(MIG_OUTDIR)/$(2).user.c -header $(MIG_OUTDIR)/$(2).user.h \ + -list $(MIG_OUTDIR)/$(2).user.msgids \ + < $(MIG_OUTDIR)/$(2).user.defs +endef + +define generate_mig_server +$(MIG_OUTDIR)/$(2).server.c: prepare-test $(MIG_OUTDIR) $(srcdir)/include/$(1)/$(2).defs + $(CPP) $(MIG_CPPFLAGS) -o $(MIG_OUTDIR)/$(2).server.defs $(srcdir)/include/$(1)/$(2).defs + $(MIGCOMUSER) $(MIGCOMFLAGS) $(MIGCOMUFLAGS)\ + -server $(MIG_OUTDIR)/$(2).server.c -header $(MIG_OUTDIR)/$(2).server.h \ + -list $(MIG_OUTDIR)/$(2).server.msgids \ + < $(MIG_OUTDIR)/$(2).server.defs +endef + +MIG_GEN_CC= \ + $(MIG_OUTDIR)/mach.user.c \ + $(MIG_OUTDIR)/mach_host.user.c \ + $(MIG_OUTDIR)/mach_port.user.c \ + $(MIG_OUTDIR)/gnumach.user.c \ + $(MIG_OUTDIR)/device_reply.user.c \ + $(MIG_OUTDIR)/device.user.c \ + $(MIG_OUTDIR)/exc.server.c + +# eval->info for debug +$(eval $(call generate_mig_client,mach,mach)) +$(eval $(call generate_mig_client,mach,mach_host)) +$(eval $(call generate_mig_client,mach,mach_port)) +$(eval $(call generate_mig_client,mach,gnumach)) +$(eval $(call generate_mig_client,device,device_reply)) +$(eval $(call generate_mig_client,device,device)) +$(eval $(call generate_mig_server,mach,exc)) + + +TESTCFLAGS=-static -nostartfiles -nolibc \ + -ffreestanding \ + -I$(srcdir)/tests/include \ + -I$
Re: [PATCH] Updated the emacs open issues page.
Hi, Il 05/09/23 20:27, Guy-Fleury Iteriteka ha scritto: https://people.debian.org/~sthibault/hurd-i386/initrd-amd64.img.gz » And how you launch this one. I get: no bootable module It can be used as a ramdisk; one way to use it (maybe not the simplest, but more or less the one I used) is to build a bootable grub iso image: * from the ramdisk extract ext2fs.static and exec.static * compile gnumach for x86_64 user and kernel (with --disable-user32), make sure to include debian's ramdisk patch (or maybe take the image from https://people.debian.org/~sthibault/tmp/hurd-amd64/pool-hurd-amd64/g/gnumach/gnumach-image-1.8-486_1.8+git20230526-3_hurd-amd64.deb) * use as grub.cfg something like this: >8 set timeout=0 set default=0 menuentry "gnumach test" { multiboot /boot/gnumach console=com0 TERM=mach-gnu-color init=/bin/sh module --nounzip /boot/initrd.gz initrd '$(ramdisk-create)' module /hurd/ext2fs.static ext2fs \ --multiboot-command-line='${kernel-command-line}' \ --host-priv-port='${host-port}' \ --device-master-port='${device-port}' \ --exec-server-task='${exec-task}' -T typed gunzip:device:rd0 \ '$(task-create)' '$(task-resume)' module /hurd/exec.static exec \ --device-master-port='${device-port}' \ '$(exec-task=task-create)' boot } >8 * then assemble the grub image and run it: >8 mkdir -p isofiles/boot/grub mkdir -p isofiles/hurd cp grub.cfg isofiles/boot/grub/grub.cfg cp gnumach initrd-amd64.img.gz isofiles/boot/ cp ext2fs.static exec.static isofiles/hurd/ grub-mkrescue -o test-ramhurd.iso ./isofiles qemu-system-x86_64 -cdrom test-ramhurd.iso -m 4096 -nographic -no-reboot -boot d >8 * once you have the ramdisk running, launch debootstrap's second stage as in https://www.gnu.org/software/hurd/open_issues/64-bit_port.html To have something persistent, I installed it on a second disk attached to qemu: - attach a disk image with -hda testdisk.img (which can also be a copy of initrd-amd64.img.gz, unpacked) - once the first ramdisk debootstrap completes, mount the disk using the rumpdisk driver on /dev/wd0, chroot into it and run the --second-stage again - to boot from disk, I use a similar .iso image, with a different grub.cfg, taking also the pci-arbiter.static and rumpdisk.static files from the disk image as above (I haven't tried installing grub on disk from hurd-amd64 yet): >8 set timeout=0 set default=0 menuentry "gnumach test" { multiboot /boot/gnumach console=com0 root=part:1:device:wd0 module /boot/pci-arbiter.static pci-arbiter \ --host-priv-port='${host-port}' \ --device-master-port='${device-port}' \ --next-task='${disk-task}' \ '$(task-create)' '$(task-resume)' module /boot/rumpdisk.static rumpdisk \ --next-task='${fs-task}' \ '$(disk-task=task-create)' module /boot/ext2fs.static ext2fs \ --multiboot-command-line='${kernel-command-line}' \ --exec-server-task='${exec-task}' -T typed '${root}' \ '$(fs-task=task-create)' module /boot/exec.static exec '$(exec-task=task-create)' boot } >8---- - once installed, you can configure a network interface and ssh into it as a 32-bit hurd virtual machine Probably there is a simpler way to use it, but I hope this helps :) I guess once the port is stabilized, it will be easier to have it running. Luca
Re: [RFC PATCH gnumach] percpu area using gs segment
Hi, Il 26/08/23 08:48, Damien Zammit ha scritto: diff --git a/i386/i386/cpu_number.c b/i386/i386/cpu_number.c index ef19e11f..241015b5 100644 --- a/i386/i386/cpu_number.c +++ b/i386/i386/cpu_number.c @@ -20,11 +20,17 @@ #include #include #include +#include #include #if NCPUS > 1 -int cpu_number(void) +int cpu_number_slow(void) { return cpu_id_lut[apic_get_current_cpu()]; } This should also be defined for non-smp, otherwise compilation fails (I only tested x86_64). Also I would make compilaiton explicitely fail somewhere on x86_64+smp for now. diff --git a/i386/i386/percpu.c b/i386/i386/percpu.c new file mode 100644 index ..0bc8b234 --- /dev/null +++ b/i386/i386/percpu.c @@ -0,0 +1,30 @@ +/* + * Copyright (c) 2023 Free Software Foundation, Inc. + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. + */ +#include +#include +#include + +struct percpu percpu_array[NCPUS] __aligned(0x8000); Why do you need this alignment? diff --git a/i386/i386/percpu.h b/i386/i386/percpu.h new file mode 100644 index ..b22d512c --- /dev/null +++ b/i386/i386/percpu.h @@ -0,0 +1,72 @@ +/* + * Copyright (c) 2023 Free Software Foundation, Inc. + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. + */ + +#ifndef _PERCPU_H_ +#define _PERCPU_H_ + +#include +#include +#include +#include +#include +#include +#include +#include + +#define percpu_assign(stm, val) \ +asm("mov %0, %%gs:%1" \ + : : "r" (val), "m" (__builtin_offsetof(struct percpu, stm))); + +#define percpu_ptr(typ, stm)\ +MACRO_BEGIN \ +typ *ptr_ = (typ *)__builtin_offsetof(struct percpu, stm); \ +\ +asm("add %%gs:0, %0"\ + : "+r" (ptr_) \ + : ); \ +\ +ptr_; \ +MACRO_END If it could simplify this accessor, gcc (and also clang, if we ever need it) supports using a specific segment base for a variable: https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gcc/Named-Address-Spaces.html#x86-Named-Address-Spaces and https://kernel.org/doc/html/next/x86/x86_64/fsgs.html#compiler-support-for-fs-gs-based-addressing Luca
Re: [PATCH 1/3 gnumach] apic: Use cpuid to read the apic id for speed
Il 11/08/23 10:34, Damien Zammit ha scritto:> diff --git a/i386/i386/cpu_number.h b/i386/i386/cpu_number.h index c00896e8..df086370 100644 --- a/i386/i386/cpu_number.h +++ b/i386/i386/cpu_number.h @@ -39,12 +39,30 @@ #define CX(addr, reg) addr(,reg,8) #endif -#define CPU_NUMBER(reg) \ +#defineCPU_NUMBER_NO_STACK(reg)\ movl%cs:lapic, reg ;\ movl%cs:APIC_ID(reg), reg ;\ shrl$24, reg;\ movl%cs:CX(cpu_id_lut, reg), reg;\ +/* Never call CPU_NUMBER(%esi) */ +#define CPU_NUMBER(reg)\ + pushl %esi;\ + pushl %eax;\ + pushl %ebx;\ + pushl %ecx;\ + pushl %edx;\ + movl$1, %eax;\ + cpuid ;\ + shrl$24, %ebx ;\ + movl%cs:CX(cpu_id_lut, %ebx), %esi ;\ + popl%edx;\ + popl%ecx;\ + popl%ebx;\ + popl%eax;\ + movl%esi, reg ;\ + popl%esi;\ + How much faster is this? did you measure also on hw? I would use RDTSC to do it, I don't know if there are better ways. Interestingly, Linux uses either LSL or RDPID to obtain the same value, I wonder if they could be an easy alternative. Also, could it be that there is a particular use of it that is much more frequent than the others, and that might benefit from further optimizations? Luca
Re: 64bit startup
Il 09/08/23 19:57, Samuel Thibault ha scritto: Samuel Thibault, le mar. 08 août 2023 19:15:50 +0200, a ecrit: I continued stabbing at the network issue, it was just a size alignment problem, so that depending on their sizes, half of the network packets would be discarded by mach_msg. Hurd-amd64 packages are getting rebuilt to contain the fix. They're rebuilt, the image as well. Along the way, I fixed the hurd package getting built with pie enabled, that was a missing commit to be uploaded to dpkg. I'll also try to build with nostrip. In case it's not too big that'll make debugging even easier. Thanks! now I am able to set up networking so I can install some more stuff, I can also ssh from my host to the 64-bit hurd :D By the way, if I try to add the repository key with apt-key adv --recv-keys --keyserver keyring.debian.org 900CB024B67931D40F82304BD0178C767D069EE6 it fails with E: gnupg, gnupg2 and gnupg1 do not seem to be installed, but one of them is required for this operation Were there any build failures with gnupg or was it just left out? (I see other gnupg-related packages built successfully, e.g. gpgv). I can bypass authentication anyway for now with --allow-unauthenticated and --allow-insecure-repositories. Luca
Re: [PATCH 5/5] x86_64: remove unneeded segment selectors handling on full 64 bit
Il 04/08/23 23:50, Samuel Thibault ha scritto: Luca Dariz, le sam. 29 juil. 2023 19:47:53 +0200, a ecrit: @@ -803,10 +809,7 @@ kern_return_t thread_getstatus( == 0) saved_state->efl &= ~EFL_IF; } - } - else -#endif - { + } else { /* * 386 mode. */ @@ -815,6 +818,7 @@ kern_return_t thread_getstatus( state->fs = saved_state->fs & 0x; state->gs = saved_state->gs & 0x; } +#endif Mmm, I believe we shall then drop [gfed]s from i386_thread_state? Otherwise we'll leak content in the structure in thread_getstatus calls. Yes, they would not be necessary anymore. It seems this would also require to adapt glibc accordingly in sysdeps/mach/hurd/x86_64/bits/sigcontext.h, I tried bootstrapping the whole system with a simple patch but failed so far. Luca
gnumach kernel memory map (was: Re: 64bit startup)
Il 09/08/23 11:20, Samuel Thibault ha scritto: Sergey Bugaev, le mer. 09 août 2023 12:12:29 +0300, a ecrit: It should just reuse whatever memory the bootloader has already loaded the module into, no need to copy it out anywhere. (my guess is that it's not so simple because possibly we somehow get rid of the memory allocated for modules, otherwise we'd get stuck with that piece of memory area that we cannot reuse for other system memory uses) From what I understand so far it's a bit tricky, especially in the early boot phase. The current kernel map on x86_64 is very very simple, and inherits some constraints from the 32-bit map for simplicity. It's also partly limited by the simple bootstrapping procedure in boothdr.S. Some issues that I found so far are e.g.: * decouple the bootstrap map created in boothdr.S from the runtime map and initalize it properly in pmap_bootstarap() (see also _phystokv vs phystokv). There is some memory allocated in the early C code that might need to be remapped (e.g the kernel page table map and the biosmem heap) * the kernel .text/.bss/.data etc seem to be the only parts required to be in the upper 2G, due to mcmodel=kernel. Everything else could be anywhere. * highmem is somehow limited by VM_KERNEL_MAP_SIZE, which itself is currently limited by the 2G range * in some places it seems there is the assumption that memory is directly mapped (i.e. with a simple offset between virtual/physical address), I'm not sure how pervasive this assumption is and how difficult it would be to remove it where it's not needed. Some time ago I started a first step in this direction, attempting to move down the kernel map so the highmem/directmap regions could be made much bigger, and still keeping the static data in the upper 2G. I uploaded it here [0] but it's really just a first step. My idea here was to have as a first step the various dma/directnap/highmem regions between -4G and -2G and still contiguous, and then try to leave some red zone between them and make the regions bigger, once the limits could be freely moved. For the bootstrap modules, it might be enough to explicitly reserve the memory in pmap_bootstrap(), although that would consume "low" memory. Once we have bigger memory regions, it shouldn't be too hard to move the bootstrap module somewhere else. Any thoughts? I have probably overlooked some other issues, so please correct me if I'm wrong. Luca [0] https://gitlab.com/luckyd/gnumach/-/commit/f0564e8ed63b82b4e8b0342821c37f2461625438
Re: [PATCH gnumach] Update the 64bit RPC ABI to be simpler (v2)
Il 09/08/23 03:09, Samuel Thibault ha scritto: Hello, So, is anybody against making this change? Hi, for me it's ok, I tried this patch together with the mig one and didn't see any issue so far. Luca
[PATCH v2] x86_64: install emergency handler for double fault
* i386/i386/idt.c: add selector for the interrupt-specific stack * i386/i386/ktss.c: configure ist1 to use a dedicated stack * i386/i386/trap.c: add double fault handler, which just prints the state and panics. There is not much else to do in this case but it's useful for troubleshooting * x86_64/idt_inittab.S: allow to specify an interrupt stack for custom handlers * x86_64/locore.S: add double fault handler --- i386/i386/idt.c | 12 +++- i386/i386/ktss.c | 4 +++- i386/i386/trap.c | 6 ++ x86_64/idt_inittab.S | 25 + x86_64/locore.S | 15 +++ 5 files changed, 48 insertions(+), 14 deletions(-) diff --git a/i386/i386/idt.c b/i386/i386/idt.c index cdfb9a88..caa44d71 100644 --- a/i386/i386/idt.c +++ b/i386/i386/idt.c @@ -34,6 +34,10 @@ struct idt_init_entry unsigned long entrypoint; unsigned short vector; unsigned short type; +#ifdef __x86_64__ + unsigned short ist; + unsigned short pad_0; +#endif }; extern struct idt_init_entry idt_inittab[]; @@ -49,7 +53,13 @@ idt_fill(struct real_gate *myidt) /* Initialize the exception vectors from the idt_inittab. */ while (iie->entrypoint) { - fill_idt_gate(myidt, iie->vector, iie->entrypoint, KERNEL_CS, iie->type, 0); + fill_idt_gate(myidt, iie->vector, iie->entrypoint, KERNEL_CS, iie->type, +#ifdef __x86_64__ + iie->ist +#else + 0 +#endif + ); iie++; } diff --git a/i386/i386/ktss.c b/i386/i386/ktss.c index 1d880167..34cb6df2 100644 --- a/i386/i386/ktss.c +++ b/i386/i386/ktss.c @@ -43,9 +43,10 @@ struct task_tss ktss; void ktss_fill(struct task_tss *myktss, struct real_descriptor *mygdt) { - /* XXX temporary exception stack */ + /* XXX temporary exception stacks */ /* FIXME: make it per-processor */ static int exception_stack[1024]; + static int double_fault_stack[1024]; #ifdef MACH_RING1 /* Xen won't allow us to do any I/O by default anyway, just register @@ -61,6 +62,7 @@ ktss_fill(struct task_tss *myktss, struct real_descriptor *mygdt) /* Initialize the master TSS. */ #ifdef __x86_64__ myktss->tss.rsp0 = (unsigned long)(exception_stack+1024); + myktss->tss.ist1 = (unsigned long)(double_fault_stack+1024); #else /* ! __x86_64__ */ myktss->tss.ss0 = KERNEL_DS; myktss->tss.esp0 = (unsigned long)(exception_stack+1024); diff --git a/i386/i386/trap.c b/i386/i386/trap.c index f7bd8e38..b3689c9a 100644 --- a/i386/i386/trap.c +++ b/i386/i386/trap.c @@ -666,3 +666,9 @@ db_debug_all_traps (boolean_t enable) } #endif /* MACH_KDB */ + +void handle_double_fault(struct i386_saved_state *regs) +{ + dump_ss(regs); + panic("DOUBLE FAULT! This is critical\n"); +} diff --git a/x86_64/idt_inittab.S b/x86_64/idt_inittab.S index f021b56d..fc1df0c7 100644 --- a/x86_64/idt_inittab.S +++ b/x86_64/idt_inittab.S @@ -50,12 +50,13 @@ ENTRY(idt_inittab) .quad entry ;\ .text #else /* MACH_PV_DESCRIPTORS */ -#defineIDT_ENTRY(n,entry,type) \ +#defineIDT_ENTRY(n,entry,type,ist) \ .data 2 ;\ .quad entry ;\ .word n ;\ .word type;\ - .long 0 /*pad*/ ;\ + .word ist ;\ + .word 0 /*pad*/ ;\ .text #endif /* MACH_PV_DESCRIPTORS */ @@ -63,7 +64,7 @@ ENTRY(idt_inittab) * No error code. Clear error code and push trap number. */ #defineEXCEPTION(n,name) \ - IDT_ENTRY(n,EXT(name),ACC_PL_K|ACC_TRAP_GATE);\ + IDT_ENTRY(n,EXT(name),ACC_PL_K|ACC_TRAP_GATE, 0);\ ENTRY(name);\ INT_FIX ;\ pushq $(0);\ @@ -74,7 +75,7 @@ ENTRY(name) ;\ * User-accessible exception. Otherwise, same as above. */ #defineEXCEP_USR(n,name) \ - IDT_ENTRY(n,EXT(name),ACC_PL_U|ACC_TRAP_GATE);\ + IDT_ENTRY(n,EXT(name),ACC_PL_U|ACC_TRAP_GATE, 0);\ ENTRY(name);\ INT_FIX ;\ pushq $(0);\ @@ -85,7 +86,7 @@ ENTRY(name) ;\ * Error code has been pushed. Just push trap number. */ #defineEXCEP_ERR(n,name) \ - IDT_ENTRY(n,EXT(name),ACC_PL_K|ACC_INTR_GATE);\ + IDT_ENTRY(n,EXT(name),ACC_PL_K|ACC_INTR_GATE, 0);\ ENTRY(name);\ INT_FIX ;\ pushq $(n);\ @@ -95,25 +96,25 @@ ENTRY(name) ;\ * Special interrupt code: dispatches to a unique entrypoint, * not defined automatically here. */ -#defineEXCEP_SPC(n,name) \ - IDT_ENTRY(n,EXT(name),ACC_PL_K|ACC_TRAP_GATE) +#defineE
[PATCH 3/5] x86_64: format pusha/popa macros for readability
--- x86_64/locore.S | 37 +++-- 1 file changed, 35 insertions(+), 2 deletions(-) diff --git a/x86_64/locore.S b/x86_64/locore.S index 413d43c4..7127957b 100644 --- a/x86_64/locore.S +++ b/x86_64/locore.S @@ -39,8 +39,41 @@ #include #include -#define pusha pushq %rax ; pushq %rcx ; pushq %rdx ; pushq %rbx ; subq $8,%rsp ; pushq %rbp ; pushq %rsi ; pushq %rdi ; pushq %r8 ; pushq %r9 ; pushq %r10 ; pushq %r11 ; pushq %r12 ; pushq %r13 ; pushq %r14 ; pushq %r15 -#define popa popq %r15 ; popq %r14 ; popq %r13 ; popq %r12 ; popq %r11 ; popq %r10 ; popq %r9 ; popq %r8 ; popq %rdi ; popq %rsi ; popq %rbp ; addq $8,%rsp ; popq %rbx ; popq %rdx ; popq %rcx ; popq %rax +#define pusha \ + pushq %rax ;\ + pushq %rcx ;\ + pushq %rdx ;\ + pushq %rbx ;\ + subq $8,%rsp;\ + pushq %rbp ;\ + pushq %rsi ;\ + pushq %rdi ;\ + pushq %r8 ;\ + pushq %r9 ;\ + pushq %r10 ;\ + pushq %r11 ;\ + pushq %r12 ;\ + pushq %r13 ;\ + pushq %r14 ;\ + pushq %r15 + +#define popa \ + popq %r15 ;\ + popq %r14 ;\ + popq %r13 ;\ + popq %r12 ;\ + popq %r11 ;\ + popq %r10 ;\ + popq %r9;\ + popq %r8;\ + popq %rdi ;\ + popq %rsi ;\ + popq %rbp ;\ + addq $8,%rsp;\ + popq %rbx ;\ + popq %rdx ;\ + popq %rcx ;\ + popq %rax #ifdef USER32 #define PUSH_FSGS \ -- 2.39.2
[PATCH 2/5] x86_64: disable V86 mode on full 64-bit configuration
* i386/i386/pcb.c: simplify exception stack location and adapt thread gettrs/setters * i386/i386/thread.h: don't include V86 fields on full 64-bit * x86_64/locore.S: don't include checks for V86 mode on full 64-bit --- i386/i386/pcb.c| 35 ++- i386/i386/thread.h | 6 ++ x86_64/locore.S| 8 3 files changed, 40 insertions(+), 9 deletions(-) diff --git a/i386/i386/pcb.c b/i386/i386/pcb.c index fb535709..d1c5fb50 100644 --- a/i386/i386/pcb.c +++ b/i386/i386/pcb.c @@ -145,9 +145,14 @@ void switch_ktss(pcb_t pcb) * won`t save the v86 segments, so we leave room. */ +#if !defined(__x86_64__) || defined(USER32) pcb_stack_top = (pcb->iss.efl & EFL_VM) ? (long) (&pcb->iss + 1) : (long) (&pcb->iss.v86_segs); +#else + pcb_stack_top = (vm_offset_t) (&pcb->iss + 1); +#endif + #ifdef __x86_64__ assert((pcb_stack_top & 0xF) == 0); #endif @@ -534,6 +539,7 @@ kern_return_t thread_setstatus( | EFL_USER_SET; #endif /* __x86_64__ && !USER32 */ +#if !defined(__x86_64__) || defined(USER32) /* * Segment registers. Set differently in V8086 mode. */ @@ -563,8 +569,9 @@ kern_return_t thread_setstatus( thread->pcb->ims.v86s.flags = saved_state->efl & (EFL_TF | EFL_IF); } - } - else if (flavor == i386_THREAD_STATE) { + } else +#endif + if (flavor == i386_THREAD_STATE) { /* * 386 mode. Set segment registers for flat * 32-bit address space. @@ -630,7 +637,7 @@ kern_return_t thread_setstatus( #endif break; } - +#if !defined(__x86_64__) || defined(USER32) case i386_V86_ASSIST_STATE: { struct i386_v86_assist_state *state; @@ -657,7 +664,7 @@ kern_return_t thread_setstatus( USER_REGS(thread)->efl & (EFL_TF | EFL_IF); break; } - +#endif case i386_DEBUG_STATE: { struct i386_debug_state *state; @@ -710,13 +717,20 @@ kern_return_t thread_getstatus( { switch (flavor) { case THREAD_STATE_FLAVOR_LIST: - if (*count < 4) +#if !defined(__x86_64__) || defined(USER32) + unsigned int ncount = 4; +#else + unsigned int ncount = 3; +#endif + if (*count < ncount) return (KERN_INVALID_ARGUMENT); tstate[0] = i386_THREAD_STATE; tstate[1] = i386_FLOAT_STATE; tstate[2] = i386_ISA_PORT_MAP_STATE; +#if !defined(__x86_64__) || defined(USER32) tstate[3] = i386_V86_ASSIST_STATE; - *count = 4; +#endif + *count = ncount; break; case i386_THREAD_STATE: @@ -770,6 +784,7 @@ kern_return_t thread_getstatus( state->cs = saved_state->cs; state->ss = saved_state->ss; +#if !defined(__x86_64__) || defined(USER32) if (saved_state->efl & EFL_VM) { /* * V8086 mode. @@ -789,7 +804,9 @@ kern_return_t thread_getstatus( saved_state->efl &= ~EFL_IF; } } - else { + else +#endif + { /* * 386 mode. */ @@ -835,7 +852,7 @@ kern_return_t thread_getstatus( *count = i386_ISA_PORT_MAP_STATE_COUNT; break; } - +#if !defined(__x86_64__) || defined(USER32) case i386_V86_ASSIST_STATE: { struct i386_v86_assist_state *state; @@ -850,7 +867,7 @@ kern_return_t thread_getstatus( *count = i386_V86_ASSIST_STATE_COUNT; break; } - +#endif case i386_DEBUG_STATE: { struct i386_debug_state *state; diff --git a/i386/i386/thread.h b/i386/i386/thread.h index b5fc5ffb..2378154f 100644 --- a/i386/i386/thread.h +++ b/i386/i386/thread.h @@ -85,12 +85,14 @@ struct i386_saved_state { unsigned long efl; unsigned long uesp; unsigned long ss; +#if !defined(__x86_64__) || defined(USER32) struct v86_segs { unsigned long v86_es; /* virtual 8086 segment registers */ unsigned long v86_ds; unsigned long v86_fs; unsigned long v86_gs; } v86_segs; +#endif }; /* @@ -144,6 +146,7 @@ struct i386_fpsave_state { }; }; +#if !defined(__x86_64__) || defined(USER32) /* * v86_assist_state: * @@ -157,6 +160,7 @@ struct v86_assist_state { unsigned short flags; /* 8086 flag bits */ };
[PATCH 4/5] x86_64: refactor segment register handling
The actual values are not saved together with the rest of the thread state, both because it would be quite espensive (reading MSR, unless rdfsbase instructions are supported, but that's optional) and not really needed. The only way the user has to change its value is with a specific RPC, so we can intercept the change easily. Furthermore, Leaving the values there exposes them to being corrupted in case of a double interruption, e.g. an irq is handled just before iretq but after starting to restore the thread state. This solution was suggested by Sergey Bugaev. * i386/i386/db_trace.c: remove fsbase/gsbase from the registers available * i386/i386/debug_i386.c: remove fsbase/gsbase from the printed thread state * i386/i386/i386asm.sym: remove fsbase/gsbase as it's not needed in asm anymore * i386/i386/pcb.c: point fsbase/gsbase to the new location * i386/i386/thread.h: move fsbase/gsbase to the machine state * x86_64/locore.S: generalize segment-handling including es/ds/gs/fs and remove fsbase/gsbase handling. Also, factor out kernel segment selector setting to a macro. --- i386/i386/db_trace.c | 2 - i386/i386/debug_i386.c | 2 - i386/i386/i386asm.sym | 2 - i386/i386/pcb.c| 12 +-- i386/i386/thread.h | 22 +++- x86_64/locore.S| 228 +++-- 6 files changed, 108 insertions(+), 160 deletions(-) diff --git a/i386/i386/db_trace.c b/i386/i386/db_trace.c index 2b6ad741..8bd86fa5 100644 --- a/i386/i386/db_trace.c +++ b/i386/i386/db_trace.c @@ -78,8 +78,6 @@ struct db_variable db_regs[] = { { "r13",(long *)&ddb_regs.r13, db_i386_reg_value }, { "r14",(long *)&ddb_regs.r14, db_i386_reg_value }, { "r15",(long *)&ddb_regs.r15, db_i386_reg_value }, - { "fsb",(long *)&ddb_regs.fsbase,db_i386_reg_value }, - { "gsb",(long *)&ddb_regs.gsbase,db_i386_reg_value }, #endif }; struct db_variable *db_eregs = db_regs + sizeof(db_regs)/sizeof(db_regs[0]); diff --git a/i386/i386/debug_i386.c b/i386/i386/debug_i386.c index b5465796..41d032e3 100644 --- a/i386/i386/debug_i386.c +++ b/i386/i386/debug_i386.c @@ -40,8 +40,6 @@ void dump_ss(const struct i386_saved_state *st) st->r8, st->r9, st->r10, st->r11); printf("R12 %016lx R13 %016lx R14 %016lx R15 %016lx\n", st->r12, st->r13, st->r14, st->r15); - printf("FSBASE %016lx GSBASE %016lx\n", - st->fsbase, st->gsbase); printf("RIP %016lx EFLAGS %08lx\n", st->eip, st->efl); #else printf("EAX %08lx EBX %08lx ECX %08lx EDX %08lx\n", diff --git a/i386/i386/i386asm.sym b/i386/i386/i386asm.sym index fd0be557..1b9b40bb 100644 --- a/i386/i386/i386asm.sym +++ b/i386/i386/i386asm.sym @@ -108,8 +108,6 @@ offset i386_saved_stater r12 offset i386_saved_stater r13 offset i386_saved_stater r14 offset i386_saved_stater r15 -offset i386_saved_stater fsbase -offset i386_saved_stater gsbase #endif offset i386_interrupt_statei eip diff --git a/i386/i386/pcb.c b/i386/i386/pcb.c index d1c5fb50..1cf87eb1 100644 --- a/i386/i386/pcb.c +++ b/i386/i386/pcb.c @@ -229,8 +229,8 @@ void switch_ktss(pcb_t pcb) #endif /* MACH_PV_DESCRIPTORS */ #if defined(__x86_64__) && !defined(USER32) - wrmsr(MSR_REG_FSBASE, pcb->iss.fsbase); - wrmsr(MSR_REG_GSBASE, pcb->iss.gsbase); + wrmsr(MSR_REG_FSBASE, pcb->ims.sbs.fsbase); + wrmsr(MSR_REG_GSBASE, pcb->ims.sbs.gsbase); #endif db_load_context(pcb); @@ -687,8 +687,8 @@ kern_return_t thread_setstatus( return KERN_INVALID_ARGUMENT; state = (struct i386_fsgs_base_state *) tstate; -thread->pcb->iss.fsbase = state->fs_base; -thread->pcb->iss.gsbase = state->gs_base; +thread->pcb->ims.sbs.fsbase = state->fs_base; +thread->pcb->ims.sbs.gsbase = state->gs_base; if (thread == current_thread()) { wrmsr(MSR_REG_FSBASE, state->fs_base); wrmsr(MSR_REG_GSBASE, state->gs_base); @@ -889,8 +889,8 @@ kern_return_t thread_getstatus( return KERN_INVALID_ARGUMENT; state = (struct i386_fsgs_base_state *) tstate; -state->fs_base = thread->pcb->iss.fsbase; -state->gs_base = thread->pcb->iss.gsbase; +state->fs_base = thread->pcb->ims.sbs.fsbase; +state->gs_base = thread->pcb->ims.sbs.gsbase; *count = i386_FSGS_BASE_STATE_COUNT; break; } diff --git a/i386/i386/thread.h b/i386/i386/thread.h index 2378154f..86a44098 100644 --- a/i386/i386/thread.h +++ b/i386/i386/thread.h @@ -51,10 +51,6 @@ */ struct i386_saved_state { -#ifdef __x86_64__ - unsigned long fsbase; - unsi
[PATCH 5/5] x86_64: remove unneeded segment selectors handling on full 64 bit
* i386/i386/db_interface.c: don't set unused segment selectors on full 64-bit * i386/i386/db_trace.c: likewise. * i386/i386/i386asm.sym: likewise. * i386/i386/pcb.c:: likewise. * i386/i386/thread.h: remove ES/DS/FS/GS from thread state on !USER32, as they are unused in this configuration. Only SS and CS are kept. * x86_64/locore.S: convert segment handling macros to no-op on full 64-bit --- i386/i386/db_interface.c | 10 +++--- i386/i386/db_trace.c | 2 ++ i386/i386/i386asm.sym| 2 ++ i386/i386/pcb.c | 12 i386/i386/thread.h | 4 x86_64/locore.S | 16 +--- 6 files changed, 28 insertions(+), 18 deletions(-) diff --git a/i386/i386/db_interface.c b/i386/i386/db_interface.c index 5a4ace9f..8f0ab4ec 100644 --- a/i386/i386/db_interface.c +++ b/i386/i386/db_interface.c @@ -332,12 +332,13 @@ kdb_trap( regs->ebp= ddb_regs.ebp; regs->esi= ddb_regs.esi; regs->edi= ddb_regs.edi; - regs->es = ddb_regs.es & 0x; regs->cs = ddb_regs.cs & 0x; +#if !defined(__x86_64__) || defined(USER32) + regs->es = ddb_regs.es & 0x; regs->ds = ddb_regs.ds & 0x; regs->fs = ddb_regs.fs & 0x; regs->gs = ddb_regs.gs & 0x; - +#endif if ((type == T_INT3) && (db_get_task_value(regs->eip, BKPT_SIZE, FALSE, TASK_NULL) == BKPT_INST)) @@ -401,11 +402,12 @@ kdb_kentry( ddb_regs.esi = is->rsi; ddb_regs.edi = is->rdi; #endif +#if !defined(__x86_64__) || defined(USER32) ddb_regs.ds = is->ds; ddb_regs.es = is->es; ddb_regs.fs = is->fs; ddb_regs.gs = is->gs; - +#endif cnpollc(TRUE); db_task_trap(-1, 0, (ddb_regs.cs & 0x3) != 0); cnpollc(FALSE); @@ -430,10 +432,12 @@ kdb_kentry( is->rsi = ddb_regs.esi; is->rdi = ddb_regs.edi; #endif +#if !defined(__x86_64__) || defined(USER32) is->ds = ddb_regs.ds & 0x; is->es = ddb_regs.es & 0x; is->fs = ddb_regs.fs & 0x; is->gs = ddb_regs.gs & 0x; +#endif } #ifNCPUS > 1 db_leave(); diff --git a/i386/i386/db_trace.c b/i386/i386/db_trace.c index 8bd86fa5..b63e140f 100644 --- a/i386/i386/db_trace.c +++ b/i386/i386/db_trace.c @@ -54,10 +54,12 @@ */ struct db_variable db_regs[] = { { "cs", (long *)&ddb_regs.cs, db_i386_reg_value }, +#if !defined(__x86_64__) || defined(USER32) { "ds", (long *)&ddb_regs.ds, db_i386_reg_value }, { "es", (long *)&ddb_regs.es, db_i386_reg_value }, { "fs", (long *)&ddb_regs.fs, db_i386_reg_value }, { "gs", (long *)&ddb_regs.gs, db_i386_reg_value }, +#endif { "ss", (long *)&ddb_regs.ss, db_i386_reg_value }, { "eax",(long *)&ddb_regs.eax, db_i386_reg_value }, { "ecx",(long *)&ddb_regs.ecx, db_i386_reg_value }, diff --git a/i386/i386/i386asm.sym b/i386/i386/i386asm.sym index 1b9b40bb..8af0c5d6 100644 --- a/i386/i386/i386asm.sym +++ b/i386/i386/i386asm.sym @@ -84,8 +84,10 @@ size i386_kernel_state iks size i386_exception_link iel +#if !defined(__x86_64__) || defined(USER32) offset i386_saved_stater gs offset i386_saved_stater fs +#endif offset i386_saved_stater cs offset i386_saved_stater uesp offset i386_saved_stater eax diff --git a/i386/i386/pcb.c b/i386/i386/pcb.c index 1cf87eb1..e0f4f57a 100644 --- a/i386/i386/pcb.c +++ b/i386/i386/pcb.c @@ -417,10 +417,12 @@ void pcb_init(task_t parent_task, thread_t thread) */ pcb->iss.cs = USER_CS; pcb->iss.ss = USER_DS; +#if !defined(__x86_64__) || defined(USER32) pcb->iss.ds = USER_DS; pcb->iss.es = USER_DS; pcb->iss.fs = USER_DS; pcb->iss.gs = USER_DS; +#endif pcb->iss.efl = EFL_USER_SET; thread->pcb = pcb; @@ -578,10 +580,12 @@ kern_return_t thread_setstatus( */ saved_state->cs = USER_CS; saved_state->ss = USER_DS; +#if !defined(__x86_64__) || defined(USER32) saved_state->ds = USER_DS; saved_state->es = USER_DS; saved_state->fs = USER_DS; saved_state->gs = USER_DS; +#endif } else { /* @@ -592,10 +596,12 @@ kern_return_t thread_setstatus( */ saved_state->cs = state->cs; saved_state->ss = state->ss; +#if !defined(__x86_64__) || defined(USER32) saved_state->ds = state->ds; saved_state->es = state->es; saved_state->fs = state->fs; saved_state->gs =
[PATCH 1/5] x86_64: fix stack handling on recursive interrupts for USER32
* x86_64/locore.S: ensure the thread state is filled completely even on recursive interrups. The value of the segment selectors is not very important in this case, but we still need to align the stack to the bottom of i386_interrupt_state. --- x86_64/locore.S | 16 +++- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/x86_64/locore.S b/x86_64/locore.S index 16b0dde5..ac7138b7 100644 --- a/x86_64/locore.S +++ b/x86_64/locore.S @@ -752,17 +752,17 @@ ENTRY(all_intrs) pushq %r11 cld /* clear direction flag */ - movq%rsp,%rdx /* on an interrupt stack? */ - and $(~(INTSTACK_SIZE-1)),%rdx - cmpq%ss:EXT(int_stack_base),%rdx - je int_from_intstack /* if not: */ - movq%ds,%rdx/* save segment registers */ pushq %rdx movq%es,%rdx pushq %rdx PUSH_FSGS_ISR + movq%rsp,%rdx /* on an interrupt stack? */ + and $(~(INTSTACK_SIZE-1)),%rdx + cmpq%ss:EXT(int_stack_base),%rdx + je int_from_intstack /* if not: */ + mov %ss,%dx /* switch to kernel segments */ mov %dx,%ds mov %dx,%es @@ -830,6 +830,12 @@ int_from_intstack: jb stack_overflowed/* if not: */ callEXT(interrupt) /* call interrupt routine */ _return_to_iret_i: /* ( label for kdb_kintr) */ + POP_FSGS_ISR + pop %rdx + mov %rdx,%es + pop %rdx + mov %rdx,%ds + pop %r11 pop %r10 pop %r9 -- 2.39.2
Re: [RFC PATCH v4 1/2 hurd] libirqhelp: Add library
Il 05/07/23 09:59, Damien Zammit ha scritto: diff --git a/libirqhelp/irqhelp.c b/libirqhelp/irqhelp.c new file mode 100644 index 0..bafae6cf1 --- /dev/null +++ b/libirqhelp/irqhelp.c +void * +irqhelp_server_loop(void *arg) +{ + struct irq *irq = (struct irq *)arg; + + if (!irq) +{ + printf("irqhelp cannot start this irq thread\n"); + return NULL; +} + + int interrupt_demuxer (mach_msg_header_t *inp, +mach_msg_header_t *outp) + { +device_intr_notification_t *n = (device_intr_notification_t *) inp; + +((mig_reply_header_t *) outp)->RetCode = MIG_NO_REPLY; +if (n->intr_header.msgh_id != DEVICE_INTR_NOTIFY) + return 0; /* not an interrupt */ + +/* FIXME: id <-> gsi now has an indirection, assuming 1:1 */ +if (n->id != irq->gsi) + return 0; /* interrupt not for us */ + +/* wait if irq disabled */ +pthread_mutex_lock (&irq->irqlock); +while (!irq->enabled) + pthread_cond_wait (&irq->irqcond, &irq->irqlock); +pthread_mutex_unlock (&irq->irqlock); > + +/* call handler */ +irq->handler(irq->context); + +/* ACK interrupt */ +device_intr_ack (irqdev, irq->port, MACH_MSG_TYPE_MAKE_SEND); + +if (irq->shutdown) + { + mach_port_deallocate(mach_task_self (), irq->port); + irq->port = MACH_PORT_NULL; + pthread_cond_destroy(&irq->irqcond); + pthread_mutex_destroy(&irq->irqlock); + free(irq); +pthread_exit(NULL); This would prevent the thread calling irqhelp_server_loop() to do some cleanup. Looking at the implementation in glibc (in mach/msgserver.c), maybe one way would be to set a bad reply message, that would cause an error different from MACH_SEND_INVALID_DEST or MACH_RCV_TOO_LARGE, so the server loop will return with this error code. It's a bit hacky but I don't know if there is another way, since the demuxer return code seems ignored. Luca
Re: Recent patches break ACPI tables
Il 19/06/23 21:40, l...@orpolo.org ha scritto: and at this stage the lapic pointer is not yet initialized: (gdb) p lapic $4 = (volatile ApicLocalUnit *) 0x0 (gdb) x &lapic 0xc109bc6c : 0x I guess so far this worked because the address 0 was mapped, and now it isn't. I'm not sure what would be the proper way to solve this. I tried anticipating the call to machine_init() to be before vm_mem_bootstrap() (to have lapic initialized) but this triggers another assert. Strangely, initializing lapic with a dummy structure as in ===8<=== diff --git a/i386/i386/apic.c b/i386/i386/apic.c index ff7ba3e2..db5cbc15 100644 --- a/i386/i386/apic.c +++ b/i386/i386/apic.c @@ -27,7 +27,9 @@ #include -volatile ApicLocalUnit* lapic = NULL; +static ApicLocalUnit init_lapic; + +volatile ApicLocalUnit* lapic = &init_lapic; ApicInfo apic_data; ===8<=== makes qemu crash, as if the cpu is started multiple times: ===8<=== qemu-system-i386 -m 512 -nographic -no-reboot -cdrom test-hello.iso -smp 2 GNU Mach 1.8 ELF section header table at c0010198 biosmem: physical memory map: biosmem: 00:09f000, available biosmem: 09fc00:0a, reserved biosmem: 0f:10, reserved biosmem: 10:001ffe, available biosmem: 001ffe:002000, reserved biosmem: 00fffc:01, reserved Loaded ELF symbol table for mach (3162 symbols) vm_page: page table size: 131024 entries (7168k) vm_page: DMA: pages: 4080 (15M), free: 0 (0M) vm_page: DMA: min:500 low:600 high:1000 vm_page: DIRECTMAP: pages: 126944 (495M), free: 124031 (484M) vm_page: DIRECTMAP: min:6347 low:7616 high:12694 ACPI: rsdp = c00f59d0; rsdp->rsdt_addr = 1ffe1b21 rsdt = e1184b21; rsdt->length = 34 (n = 4) CPUS: CPU 0 - APIC ID 0 - addr=0xe1187000 CPU 1 - APIC ID 1 - addr=0xe1187000 IOAPICS: IOAPIC 0 - APIC ID 0 - addr=0xe1188000 IRQ override: pin=0 gsi=2 trigger=EDGE polarity=HIGH IRQ override: pin=5 gsi=5 trigger=LEVEL polarity=HIGH IRQ override: pin=9 gsi=9 trigger=LEVEL polarity=HIGH IRQ override: pin=10 gsi=10 trigger=LEVEL polarity=HIGH IRQ override: pin=11 gsi=11 trigger=LEVEL polarity=HIGH IOAPIC 0 configured com 2 out of range lpr0: at atbus0, port = 378x, spl = 6d, pic = 7. RTC time is 2023-06-19 20:11:07 timer calibration... done Starting AP 1 Trying to enable: 1 ** ERROR:../../accel/tcg/tcg-accel-ops-mttcg.c:110:mttcg_cpu_thread_fn: assertion failed: (cpu->halted) Sending IPIs to APIC ID 1...Bail out! ERROR:../../accel/tcg/tcg-accel-ops-mttcg.c:110:mttcg_cpu_thread_fn: assertion failed: (cpu->halted) ===8<=== Luca
Re: Recent patches break ACPI tables
Il 19/06/23 20:35, Almudena Garcia ha scritto: But the code which starts the secondary cpus is so much later than the crash. Then, the crash could be produced by the reading of ACPI tables, which are supposed to be in a certain memory region, defined by a physical address. phystokv will doesn't solve fully the problem, because the lapic address is out of the range allowed by this function. Currently, we are using paging to map every ACPI table which we need to access (to get a virtual address of this). But the search of the initial ACPI address is based in a physical address range. I could go a bit further with debugging, and it seems that the problem is a bit different, it seems removing the 1:1 map exposed an issue that went hidden so far. In my test the cpu is reset by a triple fault (you can see this by enabling interrupt and cpu_reset logging with qemu, e.g. using -d int,cpu_reset) which is triggered after the first call to splvm: (gdb) bt #0 splvm () at ../i386/i386/spl.S:122 #1 0xc1001da6 in pmap_enter (pmap=, v=, pa=, prot=, wired=) at ../i386/intel/pmap.c:2171 #2 0xc1029b99 in pmap_steal_memory (size=) at ../vm/vm_resident.c:278 #3 0xc1029c48 in vm_page_bootstrap (startp=, endp=) at ../vm/vm_resident.c:207 #4 0xc101b893 in vm_mem_bootstrap () at ../vm/vm_init.c:65 #5 0xc10161d1 in setup_main () at ../kern/startup.c:115 #6 0xc1004652 in c_boot_entry (bi=) at ../i386/i386at/model_dep.c:578 #7 0xc193 in iplt_done () at ../i386/i386at/boothdr.S:103 (gdb) si 124 cli 1: x/i $pc => 0xc100ac5d : cli (gdb) 125 CPU_NUMBER(%edx) 1: x/i $pc => 0xc100ac5e : mov%cs:0xc109bc6c,%edx (gdb) 0xc100ac65 125 CPU_NUMBER(%edx) 1: x/i $pc => 0xc100ac65 : mov%cs:0x20(%edx),%edx (gdb) t_page_fault () at ../i386/i386/locore.S:435 435 pushl $(T_PAGE_FAULT) /* mark a page fault trap */ 1: x/i $pc => 0xc100a42c : push $0xe ... and here it will enter recursively t_page_fault, because in trap_from_kernel there is another CPU_NUMBER. I guess the triple fault is triggered because at some point the exception stack overflows. With --enable-ncpu=2 it seems that CPU_NUMBER is #define CPU_NUMBER(reg) \ movl%cs:lapic, reg ;\ movl%cs:APIC_ID(reg), reg ;\ shrl$24, reg;\ and at this stage the lapic pointer is not yet initialized: (gdb) p lapic $4 = (volatile ApicLocalUnit *) 0x0 (gdb) x &lapic 0xc109bc6c : 0x I guess so far this worked because the address 0 was mapped, and now it isn't. I'm not sure what would be the proper way to solve this. I tried anticipating the call to machine_init() to be before vm_mem_bootstrap() (to have lapic initialized) but this triggers another assert. Any idea? Luca
Re: [PATCH 2/3] x86_64: install emergency handler for double fault
Hi, Il 19/06/23 19:53, Almudena Garcia ha scritto: The cpus are started sequentially: don't start the next until the current has finished its configuration. in this case I think it's not needed to have a per-cpu bootstrap stack, assuming the secondary cpu already runs a proper thread (i.e. KTSS is loaded) when the next cpu is started. Luca
Re: [PATCH 2/3] x86_64: install emergency handler for double fault
Il 17/06/23 23:12, Samuel Thibault ha scritto: Luca Dariz, le jeu. 15 juin 2023 23:49:30 +0200, a ecrit: diff --git a/i386/i386/ktss.c b/i386/i386/ktss.c index 1d880167..52f3722c 100644 --- a/i386/i386/ktss.c +++ b/i386/i386/ktss.c @@ -61,6 +61,7 @@ ktss_fill(struct task_tss *myktss, struct real_descriptor *mygdt) /* Initialize the master TSS. */ #ifdef __x86_64__ myktss->tss.rsp0 = (unsigned long)(exception_stack+1024); + myktss->tss.ist1 = (unsigned long)(exception_stack+1024); Shouldn't we use a different stack, to avoid overwriting information from the first fault? you're right, otherwise it will be less useful in case a double fault happens before scheduling starts (which then overwrites rsp0 with the pcb stack on every context switch). Thinking about the smp case, are the cpu started in parallel or sequentially? There might also be a need to set a cpu-specific first rsp0, if there is the possibility to have early interrupts concurrently. Luca
Re: Recent patches break ACPI tables
Hi, Il 18/06/23 22:52, Samuel Thibault ha scritto: Hello, Damien Zammit, le dim. 18 juin 2023 00:48:15 +, a ecrit: Almu and I discovered that the following commit breaks --enable-apic --enable-ncpus= >1 --disable-linux-groups * d972c01c pmap: only map lower BIOS memory 1:1 when using Linux drivers I believe the ACPI tables need temporary low memory mapping to access them. Luca, what do you think? Can we perhaps fix the ACPI code to use phystokv to translate from physical addresses to virtual addresses? yes that's probably the best way to fix the code. By the way, how, is it exactly crashing? It might not be related to ACPI/APIC code, I tried a simple program on a 32-bit kernel with --enable-apic but without --enable-ncpus and it seems to work. If I use --enable-ncpus=2 it seems to fail quite early, even without -smp=2 in qemu. At the moment I'm not able to set a breakpoint even to the C entry point, I'm not sure why, I think I did it in the past. Maybe there is some other part, SMP_specific, that assumes the lower memory is mapped 1:1? I remember there was some bootstrap code to start the secondary cpus, but I don't remember the details. Luca
[PATCH 1/3] x86_64: use solid intstack already during bootstrap
* x86_64/boothdr.S: there is no reason to not use it right away --- x86_64/boothdr.S | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/x86_64/boothdr.S b/x86_64/boothdr.S index d81f9a78..0ab9bd55 100644 --- a/x86_64/boothdr.S +++ b/x86_64/boothdr.S @@ -158,7 +158,7 @@ switch64: boot_entry64: /* Switch to our own interrupt stack. */ - movq$(_intstack+INTSTACK_SIZE),%rax + movq$solid_intstack+INTSTACK_SIZE-16, %rax andq$(~15),%rax movq%rax,%rsp @@ -192,9 +192,6 @@ iplt_done: /* not reached */ nop - .section .boot.data - .comm _intstack,INTSTACK_SIZE - .code32 .section .boot.data .align 4096 -- 2.39.2
[PATCH 2/3] x86_64: install emergency handler for double fault
* i386/i386/idt.c: add selector for the interrupt-specific stack * i386/i386/ktss.c: configure ist1 * i386/i386/trap.c: add double fault handler, which just prints the state and panics. There is not much else to do in this case but it's useful for troubleshooting * x86_64/idt_inittab.S: allow to specify an interrupt stack for custom handlers * x86_64/locore.S: add double fault handler --- i386/i386/idt.c | 12 +++- i386/i386/ktss.c | 1 + i386/i386/trap.c | 6 ++ x86_64/idt_inittab.S | 25 + x86_64/locore.S | 15 +++ 5 files changed, 46 insertions(+), 13 deletions(-) diff --git a/i386/i386/idt.c b/i386/i386/idt.c index cdfb9a88..caa44d71 100644 --- a/i386/i386/idt.c +++ b/i386/i386/idt.c @@ -34,6 +34,10 @@ struct idt_init_entry unsigned long entrypoint; unsigned short vector; unsigned short type; +#ifdef __x86_64__ + unsigned short ist; + unsigned short pad_0; +#endif }; extern struct idt_init_entry idt_inittab[]; @@ -49,7 +53,13 @@ idt_fill(struct real_gate *myidt) /* Initialize the exception vectors from the idt_inittab. */ while (iie->entrypoint) { - fill_idt_gate(myidt, iie->vector, iie->entrypoint, KERNEL_CS, iie->type, 0); + fill_idt_gate(myidt, iie->vector, iie->entrypoint, KERNEL_CS, iie->type, +#ifdef __x86_64__ + iie->ist +#else + 0 +#endif + ); iie++; } diff --git a/i386/i386/ktss.c b/i386/i386/ktss.c index 1d880167..52f3722c 100644 --- a/i386/i386/ktss.c +++ b/i386/i386/ktss.c @@ -61,6 +61,7 @@ ktss_fill(struct task_tss *myktss, struct real_descriptor *mygdt) /* Initialize the master TSS. */ #ifdef __x86_64__ myktss->tss.rsp0 = (unsigned long)(exception_stack+1024); + myktss->tss.ist1 = (unsigned long)(exception_stack+1024); #else /* ! __x86_64__ */ myktss->tss.ss0 = KERNEL_DS; myktss->tss.esp0 = (unsigned long)(exception_stack+1024); diff --git a/i386/i386/trap.c b/i386/i386/trap.c index f7bd8e38..b3689c9a 100644 --- a/i386/i386/trap.c +++ b/i386/i386/trap.c @@ -666,3 +666,9 @@ db_debug_all_traps (boolean_t enable) } #endif /* MACH_KDB */ + +void handle_double_fault(struct i386_saved_state *regs) +{ + dump_ss(regs); + panic("DOUBLE FAULT! This is critical\n"); +} diff --git a/x86_64/idt_inittab.S b/x86_64/idt_inittab.S index f021b56d..fc1df0c7 100644 --- a/x86_64/idt_inittab.S +++ b/x86_64/idt_inittab.S @@ -50,12 +50,13 @@ ENTRY(idt_inittab) .quad entry ;\ .text #else /* MACH_PV_DESCRIPTORS */ -#defineIDT_ENTRY(n,entry,type) \ +#defineIDT_ENTRY(n,entry,type,ist) \ .data 2 ;\ .quad entry ;\ .word n ;\ .word type;\ - .long 0 /*pad*/ ;\ + .word ist ;\ + .word 0 /*pad*/ ;\ .text #endif /* MACH_PV_DESCRIPTORS */ @@ -63,7 +64,7 @@ ENTRY(idt_inittab) * No error code. Clear error code and push trap number. */ #defineEXCEPTION(n,name) \ - IDT_ENTRY(n,EXT(name),ACC_PL_K|ACC_TRAP_GATE);\ + IDT_ENTRY(n,EXT(name),ACC_PL_K|ACC_TRAP_GATE, 0);\ ENTRY(name);\ INT_FIX ;\ pushq $(0);\ @@ -74,7 +75,7 @@ ENTRY(name) ;\ * User-accessible exception. Otherwise, same as above. */ #defineEXCEP_USR(n,name) \ - IDT_ENTRY(n,EXT(name),ACC_PL_U|ACC_TRAP_GATE);\ + IDT_ENTRY(n,EXT(name),ACC_PL_U|ACC_TRAP_GATE, 0);\ ENTRY(name);\ INT_FIX ;\ pushq $(0);\ @@ -85,7 +86,7 @@ ENTRY(name) ;\ * Error code has been pushed. Just push trap number. */ #defineEXCEP_ERR(n,name) \ - IDT_ENTRY(n,EXT(name),ACC_PL_K|ACC_INTR_GATE);\ + IDT_ENTRY(n,EXT(name),ACC_PL_K|ACC_INTR_GATE, 0);\ ENTRY(name);\ INT_FIX ;\ pushq $(n);\ @@ -95,25 +96,25 @@ ENTRY(name) ;\ * Special interrupt code: dispatches to a unique entrypoint, * not defined automatically here. */ -#defineEXCEP_SPC(n,name) \ - IDT_ENTRY(n,EXT(name),ACC_PL_K|ACC_TRAP_GATE) +#defineEXCEP_SPC(n,name, ist) \ + IDT_ENTRY(n,EXT(name),ACC_PL_K|ACC_TRAP_GATE, ist) EXCEPTION(0x00,t_zero_div) -EXCEP_SPC(0x01,t_debug) +EXCEP_SPC(0x01,t_debug, 0) /* skip NMI interrupt - let more specific code figure that out. */ EXCEP_USR(0x03,t_int3) EXCEP_USR(0x04,t_into) EXCEP_USR(0x05,t_bounds) EXCEPTION(0x06,t_invop) EXCEPTION(0x07,t_nofpu) -EXCEPTION(0x08,a_dbl_fault) +EXCEP_SPC(0x08,t_dbl_fault, 1) EXCEPTION(0x09,a_fpu_over) EXCEPTION(0x0
[PATCH 3/3] x86_64: add a critical section on entry and exit from syscall/sysret
When entering a syscall we're still using the user stack, so we can't reliably handle exceptions or interrupts, otherwise a user thread can easily crash the machine with an invalid stack. Instead, disable interrupts and (hopefullly) avoid traps in the fragments where we need to have the user stack in RSP. * i386/i386/ldt.c: mask interrupts and IOPL on syscall entry * x86_64/locore.S: keep interrupts disabled when we use the user stack --- i386/i386/ldt.c | 3 ++- x86_64/locore.S | 4 +++- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/i386/i386/ldt.c b/i386/i386/ldt.c index 4d7ec19a..5db36426 100644 --- a/i386/i386/ldt.c +++ b/i386/i386/ldt.c @@ -27,6 +27,7 @@ * "Local" descriptor table. At the moment, all tasks use the * same LDT. */ +#include #include #include @@ -75,7 +76,7 @@ ldt_fill(struct real_descriptor *myldt, struct real_descriptor *mygdt) wrmsr(MSR_REG_EFER, rdmsr(MSR_REG_EFER) | MSR_EFER_SCE); wrmsr(MSR_REG_LSTAR, (vm_offset_t)syscall64); wrmsr(MSR_REG_STAR, long)USER_CS - 16) << 16) | (long)KERNEL_CS) << 32); -wrmsr(MSR_REG_FMASK, 0); // ? +wrmsr(MSR_REG_FMASK, EFL_IF | EFL_IOPL_USER); #else /* defined(__x86_64__) && ! defined(USER32) */ fill_ldt_gate(myldt, USER_SCALL, (vm_offset_t)&syscall, KERNEL_CS, diff --git a/x86_64/locore.S b/x86_64/locore.S index a6697fb9..16b0dde5 100644 --- a/x86_64/locore.S +++ b/x86_64/locore.S @@ -1405,9 +1405,10 @@ ENTRY(syscall64) mov %r11,%rbx /* prepare for error handling */ mov %r10,%rcx /* fix arg3 location according to C ABI */ - /* switch to kernel stack */ + /* switch to kernel stack, then we can enable interrupts */ CPU_NUMBER(%r11) movqCX(EXT(kernel_stack),%r11),%rsp + sti /* Now we have saved state and args 1-6 are in place. * Before invoking the syscall we do some bound checking and, @@ -1468,6 +1469,7 @@ _syscall64_check_for_ast: _syscall64_restore_state: /* Restore thread state and return to user using sysret. */ + cli /* block interrupts when using the user stack in kernel space */ movqCX(EXT(active_threads),%r11),%r11 /* point to current thread */ movqTH_PCB(%r11),%r11 /* point to pcb */ addq$ PCB_ISS,%r11 /* point to saved state */ -- 2.39.2
Re: [VERY RFC PATCH] x86_64: Rework storing segment registers
Il 15/06/23 19:41, Sergey Bugaev ha scritto: On Thu, Jun 15, 2023 at 8:02 PM Luca wrote: That's a bit strange, with my kernel I can now successfully end the second stage, and without these warnings. Strangely, I don't see the "Setting up translators: " line... Check /debootstrap/debootatrap.log :) The output I posted was from both debootstrap std{out,err} and the log, interleaved. I'm launching tail -f /debootstrap/debootatrap.log in parallel with deboostrap itself, following Samuel's instructions. ah yes, now it makes sense :D I saw that, but then I forgot... +#if !defined(__x86_64__) || defined(USER32) I'm wondering if we could simplify this case by defining USER32 also on i386. Perhaps, but then you'd need to go over the other usages of USER32 and verify they don't imply x86_64. from a quick look It seems indeed that most cases assume x86_64. Maybe it would make sense to define USER64 as defined(__x86_64__) && !defined(USER32) since that seems the most common case. +#if defined(__x86_64__) && !defined(USER32) +struct i386_saved_fsgs_base_state { + unsigned long fsbase; + unsigned long gsbase; +}; +#endif + These can maybe be placed in the i386_machine_state below Yeah, that'd probably be better. Why did you remove these? The idea was that they could now be empty on full 64-bit, and encapsulating the segment handling code for 32-bit, which is repeated in various places... One addition could also be to set the kernel segments after saving the user ones, or are there some more variations of them? Because there's no longer a need to do anything special for fs/gs, they are now handled the same way as es/ds, namely ifdef'ed out on !USER32. Maybe it would make sense to define new macros like PUSH_SEGS and POP_SEGS that would push/pop all of (ds, es, fs, gs), but that'd be a separate change from this patch. ok, in any case I would keep PUSH_SEGS and SET_KERNEL_SEGMENTS separated, as I think they need to be split in all_intrs. Luca
Re: [VERY RFC PATCH] x86_64: Rework storing segment registers
Hi Sergey, Il 15/06/23 16:54, Sergey Bugaev ha scritto: * For USER32, don't store fs/gs base at all * For !USER32, store fs/gs base outside of the PCB stack * For !USER32, don't store or touch es, ds, fs, gs (but keep ss and cs) * For !USER32, disable all of the v86 code --- Incidentally I have a patch very very similar to this that I'm testing in various configurations (also thanks to your suggestion about the stack corruption!), although in my case I had to disable some more code (e.g. in db_trace.c, db_interface.c, probably because I have kdb enabled). So I went ahead and just made x86_64 !USER32 not store or access those segment registers, along with moving fs/gs base out of iss and disabling v86 (not that I know what v86 is, but it sounds like something we don't need to support considering we only allow running x86_64 code). I think V86 is not even used on the 32-bit build, apparently it is a special protected mode to support older x86 programs... I have only tested my configuration (x86_64 !USER32 !MACH_KDB) -- quite likely this doesn't build or work in others; but for me it seems to work very well and I haven't got a single crash, in kernel space or user space. Please do review! debootstrap is still not quite happy. I've uploaded the log here: [0] [0]: https://paste.gg/p/anonymous/c976008dc38342cd963b0778586ead19 That's a bit strange, with my kernel I can now successfully end the second stage, and without these warnings. Strangely, I don't see the "Setting up translators: " line... On the other hand, I had to fix a tricky combination of syscalls and interrupts, that for some reason was triggered quite often and caused a triple fault, together with an issue with nested interrupts (which I'm not sure are still used, but the code to handle it looks weird). How are you launching the second stage? For now I used Samuel's archive as ramdisk, then launched bash and then debootstrap --second-stage: https://paste.debian.net/1283107/ i386/i386/debug_i386.c | 2 - i386/i386/i386asm.sym | 4 +- i386/i386/pcb.c| 30 i386/i386/thread.h | 27 +-- x86_64/locore.S| 105 + 5 files changed, 90 insertions(+), 78 deletions(-) diff --git a/i386/i386/debug_i386.c b/i386/i386/debug_i386.c index b5465796..41d032e3 100644 --- a/i386/i386/debug_i386.c +++ b/i386/i386/debug_i386.c @@ -40,8 +40,6 @@ void dump_ss(const struct i386_saved_state *st) st->r8, st->r9, st->r10, st->r11); printf("R12 %016lx R13 %016lx R14 %016lx R15 %016lx\n", st->r12, st->r13, st->r14, st->r15); - printf("FSBASE %016lx GSBASE %016lx\n", - st->fsbase, st->gsbase); printf("RIP %016lx EFLAGS %08lx\n", st->eip, st->efl); #else printf("EAX %08lx EBX %08lx ECX %08lx EDX %08lx\n", diff --git a/i386/i386/i386asm.sym b/i386/i386/i386asm.sym index fd0be557..8af0c5d6 100644 --- a/i386/i386/i386asm.sym +++ b/i386/i386/i386asm.sym @@ -84,8 +84,10 @@ size i386_kernel_state iks size i386_exception_link iel +#if !defined(__x86_64__) || defined(USER32) I'm wondering if we could simplify this case by defining USER32 also on i386. offseti386_saved_stater gs offseti386_saved_stater fs +#endif offseti386_saved_stater cs offseti386_saved_stater uesp offseti386_saved_stater eax @@ -108,8 +110,6 @@ offset i386_saved_stater r12 offseti386_saved_stater r13 offseti386_saved_stater r14 offseti386_saved_stater r15 -offset i386_saved_stater fsbase -offset i386_saved_stater gsbase #endif offset i386_interrupt_state i eip diff --git a/i386/i386/pcb.c b/i386/i386/pcb.c index fb535709..d8987ddf 100644 --- a/i386/i386/pcb.c +++ b/i386/i386/pcb.c @@ -145,9 +145,14 @@ void switch_ktss(pcb_t pcb) * won`t save the v86 segments, so we leave room. */ +#if !defined(__x86_64__) || defined(USER32) pcb_stack_top = (pcb->iss.efl & EFL_VM) ? (long) (&pcb->iss + 1) : (long) (&pcb->iss.v86_segs); +#else + pcb_stack_top = (vm_offset_t) (&pcb->iss + 1); +#endif + #ifdef __x86_64__ assert((pcb_stack_top & 0xF) == 0); #endif @@ -224,8 +229,8 @@ void switch_ktss(pcb_t pcb) #endif /* MACH_PV_DESCRIPTORS */ #if defined(__x86_64__) && !defined(USER32) - wrmsr(MSR_REG_FSBASE, pcb->iss.fsbase); - wrmsr(MSR_REG_GSBASE, pcb->iss.gsbase); + wrmsr(MSR_REG_FSBASE, pcb->isbs.fsbase); + wrmsr(MSR_REG_GSBASE, pcb->isbs.gsbase); #endif db_load_context(pcb); @@ -412,10 +417,12 @@ void pcb_init(task_t parent_task, thread_t thread) */ pcb->iss.cs = USER_CS; pcb->iss.ss = USER_DS; +#if !de
Re: 64bit startup
Hi, Il 09/06/23 19:49, Sergey Bugaev ha scritto: with the following hacky patch, I no longer see any crashes, debootstrap --second-stage runs all the way and leaves me with an almost full Debian GNU/Hurd x86_64 system \o/ I was able to avoid the issues with a different approach, basically commenting any write to ES/DS in x86_64/locore.S. In my case the corruption usually happens when handling the error of writing a bad value to ES/DS, and disabling this probably hides the corruption issue, but it's still a different issue that could be solved by just removing the useless segment handling. It might be that the corruption itself is caused by a second trap/irq happening when the exit code from a first trap is executing (e.g. pop the last registers before iretq). This would enter the trap/irq handler using the same pcb stack, and probably some fsbase/gsbase push/pop is not done correctly in this case. Luca
Re: 64bit startup
Il 6 giugno 2023 20:22:53 UTC, Sergey Bugaev ha scritto: >On Mon, Jun 5, 2023 at 3:03 PM Sergey Bugaev wrote: >> That is going to be much easier to debug than debootstrap, thank you! > >Unfortunately I'm facing some troubles :| > >For one thing you seem to have rebuilt/updated the packages, but not >the rootfs image, so now the debuginfo I download doesn't match the >binaries in the image. Please update the image too! > >Also, gdb seems to suddenly have some sort of trouble with >interpreting the debuginfo files, specifically I'm trying ones from >the 'hurd-dbsym' package (to be even more specific: >/usr/lib/debug/.build-id/bf/dd0c0525d0ca383bd842796063345a2dd0c001.debug >from that package, which corresponds to ext2fs.static -- but I've done >a quick check and other files seem to behave the same way too). GDB >loads regular symbols from them, but not the debuginfo, i.e. I can see >what function is at which address, but not map addresses to source >lines or access local variables or use types (tcbhead_t is the one I >currently need most). I don't know enough about GDB and DWARF to >diagnose exactly what's going on; readelf --debug-dump=info seems to >dump the debuginfo just fine. > >Please try to reproduce this with your GDB (no Hurd system required), >and if you have changed something recently about how debug files are >generated, maybe that's what has broken it. > >So that all being said, here's one crash I am (and have been) seeing a >lot: the crash at any sort of TCB access when fs_base suddenly turns >out to be equal to the address of _kret_popl_ds. This makes no sense >-- surely userspace would never set that, so it must be a gnumach bug. > >I've got a little theory of how something like that could happen: > >It is my understanding that "the PCB stack" (whatever that is) where >locore.S pushes user's registers and thread->pcb->iss is really the >exact same place, pushing registers onto that stack is exactly writing >to the thread's i386_saved_state structure. The first four members of >struct i386_saved_state are unsigned long fsbase, gsbase, gs, fs -- >and being the first members of the struct means they have the lowest >addresses, i.e. are located at the top of the PCB stack. > >locore.S actually skips pushing or popping these four members: > >#define PUSH_FSGS \ >subq$32,%rsp > >#define POP_FSGS\ >addq$32,%rsp > >This is because fs and gs we don't care about, and fsbase/gsbase of a >thread state can only be changed by explicit thread_set_state calls >and not by the thread itself, so, no need to rdmsr and push it, since >the value is already saved in the PCB slot. > >However, *something* goes wrong and the fsbase slot gets overwritten >with an unrelated value (_kret_popl_ds). The real %fs_base MSR keeps >the proper value -- until we context-switch away from the thread and >then back to it, at which point the bogus value gets loaded into >%fs_base and then the userland tries to use it and faults. > >I don't know nearly enough about x86 interrupts/traps to say, but >could it be that we get another interrupt/trap, while presumably >executing _kret_popl_ds, and that causes the faulting %rip to be >pushed onto the stack, but since we're at the PCB stack at that point >it clobbers the stored fsbase? That doesn't cause issues for all the >other registers because we have already popped their values off and >won't be accessing them anymore; we'll push the new values the next >time the thread enters the kernel -- though I guess it could show up >in thread_get_state if you do that without stopping the thread on an >SMP kernel. > >cc'ing Luca -- does what I'm saying make sense? could this happen? can >you reproduce %fs_base getting set to _kret_popl_ds? yes this makes perfectly sense, I think I'm chasing the same bug currently, or some variation of it. With some tracing I saw that this corruption seems to happen when an irq (usually timer) fires when returning from a trap, although not necessarily at one specific point. I still have to find exactly the reason, in my tests fsbase gets overwritten either with a kernel address or 0x17, which might be a segment value. Btw in locore.S the 64-bit-only segment handling code is doing way too much, e.g. es/ds and such could be ignored, I guess simplifying this part may also solve this issue. Luca Hi Sergey,