Re: [Qemu-devel] Re: gdbstub: packet reply is too long
On Sun, Dec 21, 2008 at 12:44:04AM +0100, Jan Kiszka wrote:
> And that means setting current_gdbarch while keeping target_gdbarch -
> that's where reality (existing gdb code) bites us. Again, I'm not
> arguing against fixing this, I'm arguing in keeping qemu's workaround
> until this is done. I will look into the gdb part, but one after the other.

No, it does not mean setting current_gdbarch different from target_gdbarch. With the current gdbarch set to a 64-bit one that accurately describes the target, GDB should be able to debug code running in 32-bit mode. If it can't, there are simply bugs in GDB to fix.

If you'd like to reach some solution to this problem, which I've seen come up on the QEMU list a half-dozen times now, please describe how you're using GDB on the g...@sourceware.org mailing list and let's see if we can't fix the GDB bugs.

I'm pretty sure that any solution is going to involve always transferring the x86-64 register set, though.

--
Daniel Jacobowitz
CodeSourcery
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
BUG() with SCSI-interfaced disk images
I'm encountering a kernel BUG() in guests using SCSI-interfaced disk images. I've tried with the Debian packaging of KVM 79 and 82; both exhibit the same behavior (disclaimer: Debian has about a dozen patches in their kvm packaging, but they all seem to be changes to the build/install process or security-related). IDE-interfaced disk images seem fine.

Host and guest are up-to-date Debian lenny (32-bit/i386) running kernel 2.6.26 (Debian linux-image-2.6.26-1-amd64 2.6.26-12). After a few minutes of disk activity (fsck(8)ing a fairly empty ~20GB filesystem is a reliable trigger), the kernel BUGs (oops output below).

I was previously using KVM 72, and tried upgrading to 79 because both Debian lenny and Ubuntu hardy guests were panicking due to sym disconnects/timeouts. 79 makes the lenny guest start BUGging as described above. 82 is not perceivably different from 79 for the lenny guest.

FWIW, the upgrade to 79 allowed the Ubuntu hardy guest to stay up, although it emits:

Dec 25 00:28:51 vicar kernel: [106621.553272] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
Dec 25 00:28:51 vicar kernel: [106621.553279] Info fld=0x0
Dec 25 00:28:51 vicar kernel: [106621.553280] sd 2:0:0:0: [sda] Add. Sense: No additional sense information

at seemingly random intervals. The upgrade to 82 made the hardy guest start BUGging on soft lockups at random intervals (I can provide the full output if anyone's interested, but I'm much more interested in the lenny guest oops at this point).
john

run via libvirt:

/usr/bin/kvm -S -M pc -m 512 -smp 1 -name test -monitor pty \
    -boot c -drive file=image.qcow,if=scsi,index=0,boot=on \
    -net nic,macaddr=00:0c:29:1e:ea:b9,vlan=0,model=e1000 \
    -net tap,fd=17,script=,vlan=0,ifname=vnet2 \
    -net nic,macaddr=00:0c:29:1e:ea:c3,vlan=1,model=e1000 \
    -net tap,fd=18,script=,vlan=1,ifname=vnet3 \
    -serial pty -parallel none -usb -vnc 0.0.0.0:1

[The KVMWiki asks whether the problem is reproducible with -no-kvm-irqchip, -no-kvm-pit, or -no-kvm, but when I tried invoking the above command line by hand (outside of libvirt), the VNC console was always blank and there was no console output on the serial pty. If this would be useful information to have in this case, I'd love to know what I'm doing wrong, or if there's a way to specify additional command line arguments with libvirt.]

oops generated in the guest:

[ 140.101828] sym0: unexpected disconnect
[ 140.102748] BUG: unable to handle kernel NULL pointer dereference at 0358
[ 140.103818] IP: [] :sym53c8xx:sym_int_sir+0x547/0x118f
[ 140.106449] *pdpt = 1f5f9001 *pde =
[ 140.107356] Oops: [#1] SMP
[ 140.107864] Modules linked in: loop virtio_balloon psmouse pcspkr serio_raw i2c_piix4 i2c_core button evdev ext3 jbd mbcache sd_mod ide_cd_mod cdrom ata_generic libata dock ide_pci_generic floppy virtio_pci virtio_ring virtio sym53c8xx scsi_transport_spi scsi_mod e1000 uhci_hcd usbcore piix ide_core thermal processor fan thermal_sys
[ 140.108062]
[ 140.108062] Pid: 131, comm: pdflush Not tainted (2.6.26-1-686-bigmem #1)
[ 140.108062] EIP: 0060:[] EFLAGS: 00010287 CPU: 0
[ 140.108062] EIP is at sym_int_sir+0x547/0x118f [sym53c8xx]
[ 140.108062] EAX: 000a EBX: ECX: 1f98c084 EDX: 0030
[ 140.108062] ESI: df98c084 EDI: df98c000 EBP: df98c000 ESP: de0f3ba0
[ 140.108062] DS: 007b ES: 007b FS: 00d8 GS: SS: 0068
[ 140.108062] Process pdflush (pid: 131, ti=de0f2000 task=df48e520 task.ti=de0f2000)
[ 140.108062] Stack: 000144d6 7f5a222c c011a853 0021d496
[ 140.108062] df98c000 e08e08cd 0001 df98c000
[ 140.108062] 0084 e08e3f2f df988c00 0046 df544400 0196
[ 140.108062] Call Trace:
[ 140.108062] [] pvclock_clocksource_read+0x4b/0xd0
[ 140.108062] [] sym_recover_scsi_int+0xb3/0x10d [sym53c8xx]
[ 140.108062] [] sym_interrupt+0x3ee/0x5fd [sym53c8xx]
[ 140.108062] [] sym53c8xx_intr+0x35/0x56 [sym53c8xx]
[ 140.108062] [] handle_IRQ_event+0x23/0x51
[ 140.108062] [] handle_fasteoi_irq+0x71/0xa4
[ 140.108062] [] do_IRQ+0x4d/0x63
[ 140.108062] [] common_interrupt+0x23/0x28
[ 140.108062] [] ptrace_request+0x1ec/0x278
[ 140.108062] [] __do_softirq+0x57/0xd3
[ 140.108062] [] do_softirq+0x45/0x53
[ 140.108062] [] irq_exit+0x35/0x67
[ 140.108062] [] smp_apic_timer_interrupt+0x6b/0x75
[ 140.108062] [] apic_timer_interrupt+0x28/0x30
[ 140.108062] [] _spin_unlock_irqrestore+0x7/0x10
[ 140.108062] [] scsi_dispatch_cmd+0x197/0x205 [scsi_mod]
[ 140.108062] [] scsi_request_fn+0x264/0x32a [scsi_mod]
[ 140.108063] [] __generic_unplug_device+0x1a/0x1c
[ 140.108063] [] __make_request+0x2fe/0x348
[ 140.108063] [] generic_make_request+0x34d/0x37b
[ 140.108063] [] mempool_alloc+0x1c/0xba
[ 140.108063] [] submit_bio+0xc6/0xcd
[ 140.108063] [] bio_alloc_bioset+0x9b/0xf3
[ 140.108063] [] subm
Re: how increase/decrease ram on running vm ?
2008/12/27 Ryota OZAKI:
> Have you tried decreasing memory? AFAIK, current ballooning cannot
> increase memory.

Oops, I mean ballooning cannot increase memory over the amount of memory specified in the qemu/kvm arguments.

> Regards,
> ozaki-r
>
> 2008/12/27 Василец Дмитрий:
>> I read this, but I don't have balloon in the CLI.
>>
>> On Fri, 26/12/2008 at 23:25 +0900, Ryota OZAKI wrote:
>>> Hi,
>>>
>>> http://www.linux-kvm.com/content/memory-ballooning-feature-coming-soon-kvm
>>>
>>> This page might help you.
>>>
>>> Regards,
>>> ozaki-r
>>>
>>> 2008/12/26 Василец Дмитрий:
>>> > How do I increase/decrease RAM on a running VM?
>>> > I found the virtio_balloon module, but don't know how it works.
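The correction above can be made concrete with a small model: the balloon can only give back memory the guest was started with. This is a minimal sketch, assuming a hypothetical helper `sketch_balloon_target_mb` (an illustration, not a QEMU or KVM function) and sizes in megabytes as with the `-m` argument.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative model only, not QEMU code: the balloon target a guest can
 * actually reach is capped by the memory size given at startup (e.g. -m 512).
 * The function name and signature are hypothetical.
 */
static uint64_t sketch_balloon_target_mb(uint64_t requested_mb, uint64_t boot_mb)
{
    assert(boot_mb > 0);  /* a VM always starts with some memory */

    /* Requests above the startup size are clamped; smaller ones pass through. */
    return requested_mb > boot_mb ? boot_mb : requested_mb;
}
```

For a guest started with 512 MB, a request for 256 MB would be honoured as-is in this model, while a request for 1024 MB would be capped at 512 MB.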
Re: how increase/decrease ram on running vm ?
Have you tried decreasing memory? AFAIK, current ballooning cannot increase memory.

Regards,
ozaki-r

2008/12/27 Василец Дмитрий:
> I read this, but I don't have balloon in the CLI.
>
> On Fri, 26/12/2008 at 23:25 +0900, Ryota OZAKI wrote:
>> Hi,
>>
>> http://www.linux-kvm.com/content/memory-ballooning-feature-coming-soon-kvm
>>
>> This page might help you.
>>
>> Regards,
>> ozaki-r
>>
>> 2008/12/26 Василец Дмитрий:
>> > How do I increase/decrease RAM on a running VM?
>> > I found the virtio_balloon module, but don't know how it works.
Re: how increase/decrease ram on running vm ?
I read this, but I don't have balloon in the CLI.

On Fri, 26/12/2008 at 23:25 +0900, Ryota OZAKI wrote:
> Hi,
>
> http://www.linux-kvm.com/content/memory-ballooning-feature-coming-soon-kvm
>
> This page might help you.
>
> Regards,
> ozaki-r
>
> 2008/12/26 Василец Дмитрий:
> > How do I increase/decrease RAM on a running VM?
> > I found the virtio_balloon module, but don't know how it works.
Re: [PATCH 0/4] Remove interrupt stack table usage from x86_64 kernel (v2)
* Ingo Molnar wrote:

> They have the following commit IDs, and they are also in tip/master:
>
>   921e521: x86: move NMI back to interrupt stack
>   36ef6c9: x86: make interrupt stack switching atomic
>   dd64891: x86: consolidate irq stack switching to a single macro
>   955a368: x86: drop the use of the tss interrupt stack table (IST)
>
> I also started testing them in tip-qa.

Testing failed quickly: the attached config crashes. I've pushed out the bad kernel to the tip/tmp.master.bad branch:

  fe3aac9: Merge branch 'x86/irq'

(No crashlog is available. All I know is that the box crashed and rebooted when booted with the bzImage built out of the attached config.)

	Ingo

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.28
# Fri Dec 26 15:31:21 2008
#
CONFIG_64BIT=y
# CONFIG_X86_32 is not set
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_FAST_CMPXCHG_LOCAL=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
# CONFIG_RWSEM_XCHGADD_ALGORITHM is not set
CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_DEFAULT_IDLE=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
# CONFIG_HAVE_SETUP_PER_CPU_AREA is not set
# CONFIG_HAVE_CPUMASK_OF_CPU_MAP is not set
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ZONE_DMA32=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_X86_BIOS_REBOOT=y
# CONFIG_KTIME_SCALAR is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
# CONFIG_SYSVIPC is not set
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
# CONFIG_TASK_IO_ACCOUNTING is not set
CONFIG_AUDIT=y
# CONFIG_AUDITSYSCALL is not set
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=21
# CONFIG_CGROUPS is not set
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
# CONFIG_GROUP_SCHED is not set
CONFIG_SYSFS_DEPRECATED=y
CONFIG_SYSFS_DEPRECATED_V2=y
CONFIG_RELAY=y
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_USER_NS=y
CONFIG_PID_NS=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KALLSYMS_EXTRA_PASS=y
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_COMPAT_BRK=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_HAVE_PERF_COUNTERS=y

#
# Performance Counters
#
CONFIG_PERF_COUNTERS=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_PCI_QUIRKS=y
CONFIG_SLUB_DEBUG=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
# CONFIG_MARKERS is not set
CONFIG_OPROFILE=y
CONFIG_HAVE_OPROFILE=y
# CONFIG_KPROBES is not set
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
# CONFIG_HAVE_GENERIC_DMA_COHERENT is not set
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_FORCE_LOAD=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y
CONFIG_BLOCK=y
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_BLK_DEV_BSG=y
CONFIG_BLK_DEV_INTEGRITY=y
CONFIG_BLOCK_COMPAT=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=m
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_CLASSIC_RCU=y
# CONFIG_TREE_RCU is not set
# CONFIG_PREEMPT_RCU is not set
# CONFIG_TREE_RCU_TRACE is not set
# CONFIG_PREEMPT_RCU_TRACE is not set
CONFIG_FREEZER=y

#
# Processor type and features
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
# CONFIG_HIGH_RES_TIMERS is not set
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
# CONFIG_SMP is not set
CONFIG_SPARSE_IRQ=y
CONFIG_X86_FIND_SMP_CONFIG=y
CONFIG_X86_MPPARSE=y
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_GENER
Re: [PATCH 0/4] Remove interrupt stack table usage from x86_64 kernel (v2)
* Avi Kivity wrote:

> The interrupt stack table (IST) mechanism is the only thing preventing
> kvm from deferring saving and reloading of some significant state. It
> is also somewhat complicated.
>
> Remove it by switching the special exceptions to use the normal irqstack.
>
> Changes from v1:
> - rebase on tip/master
> - as a step, consolidate stack switching into a single macro
>
> Jeremy, Xen is also affected; please review.
>
> Avi Kivity (4):
>   x86: drop the use of the tss interrupt stack table (IST)
>   x86: Consolidate irq stack switching to a single macro
>   x86: Make interrupt stack switching atomic
>   x86: Move NMI back to interrupt stack
>
>  arch/x86/include/asm/desc.h      |   12 -
>  arch/x86/include/asm/page_64.h   |    7 ---
>  arch/x86/include/asm/pda.h       |    2 +-
>  arch/x86/include/asm/processor.h |   11 
>  arch/x86/kernel/asm-offsets_64.c |    1 -
>  arch/x86/kernel/cpu/common.c     |   35 --
>  arch/x86/kernel/dumpstack_64.c   |   96 --
>  arch/x86/kernel/entry_64.S       |   89 ++-
>  arch/x86/kernel/traps.c          |   12 ++--
>  9 files changed, 33 insertions(+), 232 deletions(-)

Applied to tip/x86/irq, thanks Avi!

They have the following commit IDs, and they are also in tip/master:

  921e521: x86: move NMI back to interrupt stack
  36ef6c9: x86: make interrupt stack switching atomic
  dd64891: x86: consolidate irq stack switching to a single macro
  955a368: x86: drop the use of the tss interrupt stack table (IST)

I also started testing them in tip-qa.

I added the standard Impact-lines that we do in the x86 tree. Note that this patch:

  dd64891: x86: consolidate irq stack switching to a single macro

isn't just consolidating IRQ entry assembly code; it is also changing the paranoidentry macros to do IRQ stack entries, and hence switches all critical exception entry sequences except NMI over to the IRQ stack. Your later patch:

  921e521: x86: move NMI back to interrupt stack

covers the NMI entry code too.

Please double-check that we indeed now have all the critical exceptions on the IRQ stack (they are all rare, so testing alone won't show this), and please also double-check that we don't have more exceptions and entry callpaths on the IRQ stack than what we wanted. For example, on a preemptible kernel (or in any codepath that calls schedule()) it is fatal to be on the IRQ stack, so this has to be very accurately coded.

	Ingo
Re: how increase/decrease ram on running vm ?
Hi,

http://www.linux-kvm.com/content/memory-ballooning-feature-coming-soon-kvm

This page might help you.

Regards,
ozaki-r

2008/12/26 Василец Дмитрий:
> How do I increase/decrease RAM on a running VM?
> I found the virtio_balloon module, but don't know how it works.
[PATCH 3/4] x86: Make interrupt stack switching atomic
Instead of relying on pda.irqcount to tell us whether we're already in an interrupt or not, examine the stack pointer directly. This makes the switch atomic (since there's no window between incrementing the counter and switching the stack where an NMI could see the new counter but the old stack), and lets us get rid of a variable.

Signed-off-by: Avi Kivity
---
 arch/x86/include/asm/pda.h       |    2 +-
 arch/x86/kernel/asm-offsets_64.c |    1 -
 arch/x86/kernel/cpu/common.c     |    1 -
 arch/x86/kernel/entry_64.S       |    9 +
 4 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/pda.h b/arch/x86/include/asm/pda.h
index 1a79e16..362fd28 100644
--- a/arch/x86/include/asm/pda.h
+++ b/arch/x86/include/asm/pda.h
@@ -14,7 +14,7 @@ struct x8664_pda {
 					   address */
 	unsigned long kernelstack;	/* 16 top of kernel stack for current */
 	unsigned long oldrsp;		/* 24 user rsp for system call */
-	int irqcount;			/* 32 Irq nesting counter. Starts -1 */
+	int unused;			/* 32 for rent */
 	unsigned int cpunumber;		/* 36 Logical CPU number */
 	unsigned long stack_canary;	/* 40 stack canary value */
 					/* gcc-ABI: this canary MUST be at
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index 1d41d3f..62dde96 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -52,7 +52,6 @@ int main(void)
 	ENTRY(kernelstack);
 	ENTRY(oldrsp);
 	ENTRY(pcurrent);
-	ENTRY(irqcount);
 	ENTRY(cpunumber);
 	ENTRY(irqstackptr);
 	ENTRY(data_offset);
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 6808c3a..f0ea980 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -881,7 +881,6 @@ void __cpuinit pda_init(int cpu)
 	mb();
 
 	pda->cpunumber = cpu;
-	pda->irqcount = -1;
 	pda->kernelstack = (unsigned long)stack_thread_info() -
 				 PDA_STACKOFFSET + THREAD_SIZE;
 	pda->active_mm = &init_mm;
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 4f1a38f..61c54d9 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -315,20 +315,21 @@ ENTRY(native_usergs_sysret64)
 	CFI_REL_OFFSET r15, R15+\offset
 	.endm
 
-	.macro call_in_irqstack func
+	.macro call_in_irqstack func, scratch=%rax
 	/* Switch to the irq stack, unless already on it, then call func */
 	push %rbp
 	CFI_ADJUST_CFA_OFFSET 8
 	mov %rsp,%rbp
 	CFI_DEF_CFA_REGISTER rbp
-	incl %gs:pda_irqcount
-	cmovz %gs:pda_irqstackptr,%rsp
+	mov %gs:pda_irqstackptr, \scratch
+	sub %rsp,\scratch
+	cmp $IRQSTACKSIZE-64,%rax
+	cmova %gs:pda_irqstackptr,%rsp
 	EMPTY_FRAME 0
 	call \func
 	leaveq
 	CFI_DEF_CFA_REGISTER rsp
 	CFI_ADJUST_CFA_OFFSET -8
-	decl %gs:pda_irqcount
 	.endm
 
 	/* save partial stack frame */
-- 
1.6.0.6
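The stack-pointer test in the new call_in_irqstack macro can be restated in C. The following is a minimal sketch rather than kernel code: the function name and the SKETCH_ constants are hypothetical, 4 KiB pages are assumed, and irqstackptr is assumed to point near the top of the per-CPU IRQ stack. It mirrors the mov/sub/cmp/cmova sequence: one unsigned subtraction and one unsigned comparison decide whether a switch is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: 4 KiB pages and IRQSTACK_ORDER 2, as in page_64.h. */
#define SKETCH_PAGE_SIZE    4096UL
#define SKETCH_IRQSTACKSIZE (SKETCH_PAGE_SIZE << 2)

/*
 * Hypothetical helper mirroring the macro's check. Returns true when the
 * caller must switch to the IRQ stack, i.e. when rsp is not already inside
 * the IRQ stack region just below irqstackptr.
 */
static bool sketch_needs_irqstack_switch(uintptr_t rsp, uintptr_t irqstackptr)
{
    uintptr_t distance;

    assert(irqstackptr != 0);

    /*
     * Unsigned subtraction: if rsp lies above irqstackptr the result wraps
     * to a very large value, so a single comparison covers both "far below"
     * and "above" the IRQ stack.
     */
    distance = irqstackptr - rsp;
    return distance > SKETCH_IRQSTACKSIZE - 64;
}
```

Because the decision depends only on the current stack pointer, there is no separate counter that could disagree with the stack actually in use, which is the atomicity property the commit message describes.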
[PATCH 2/4] x86: Consolidate irq stack switching to a single macro
Instead of scattering the logic around, move all stack switching logic into a single macro which calls a caller-supplied function. This makes changing the logic easier and improves readability.

Signed-off-by: Avi Kivity
---
 arch/x86/kernel/entry_64.S |   59 +++
 1 files changed, 21 insertions(+), 38 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 4d47cb8..4f1a38f 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -315,6 +315,22 @@ ENTRY(native_usergs_sysret64)
 	CFI_REL_OFFSET r15, R15+\offset
 	.endm
 
+	.macro call_in_irqstack func
+	/* Switch to the irq stack, unless already on it, then call func */
+	push %rbp
+	CFI_ADJUST_CFA_OFFSET 8
+	mov %rsp,%rbp
+	CFI_DEF_CFA_REGISTER rbp
+	incl %gs:pda_irqcount
+	cmovz %gs:pda_irqstackptr,%rsp
+	EMPTY_FRAME 0
+	call \func
+	leaveq
+	CFI_DEF_CFA_REGISTER rsp
+	CFI_ADJUST_CFA_OFFSET -8
+	decl %gs:pda_irqcount
+	.endm
+
 /* save partial stack frame */
 ENTRY(save_args)
 	XCPT_FRAME
@@ -336,18 +352,6 @@ ENTRY(save_args)
 	je 1f
 	SWAPGS
 	/*
-	 * irqcount is used to check if a CPU is already on an interrupt stack
-	 * or not. While this is essentially redundant with preempt_count it is
-	 * a little cheaper to use a separate counter in the PDA (short of
-	 * moving irq_enter into assembly, which would be too much work)
-	 */
-1:	incl %gs:pda_irqcount
-	jne 2f
-	popq_cfi %rax			/* move return address... */
-	mov %gs:pda_irqstackptr,%rsp
-	EMPTY_FRAME 0
-	pushq_cfi %rax			/* ... to the new stack */
-	/*
 	 * We entered an interrupt context - irqs are off:
 	 */
 2:	TRACE_IRQS_OFF
@@ -819,8 +823,7 @@ END(interrupt)
 	subq $10*8, %rsp
 	CFI_ADJUST_CFA_OFFSET 10*8
 	call save_args
-	PARTIAL_FRAME 0
-	call \func
+	call_in_irqstack \func
 	.endm
 
 /*
@@ -836,7 +839,6 @@ common_interrupt:
 ret_from_intr:
 	DISABLE_INTERRUPTS(CLBR_NONE)
 	TRACE_IRQS_OFF
-	decl %gs:pda_irqcount
 	leaveq
 	CFI_DEF_CFA_REGISTER	rsp
 	CFI_ADJUST_CFA_OFFSET	-8
@@ -1060,7 +1062,7 @@ ENTRY(\sym)
 	TRACE_IRQS_OFF
 	movq %rsp,%rdi		/* pt_regs pointer */
 	xorl %esi,%esi		/* no error code */
-	call \do_sym
+	call_in_irqstack \do_sym
 	jmp paranoid_exit	/* %ebx: no swapgs flag */
 	CFI_ENDPROC
 END(\sym)
@@ -1096,7 +1098,7 @@ ENTRY(\sym)
 	movq %rsp,%rdi		/* pt_regs pointer */
 	movq ORIG_RAX(%rsp),%rsi	/* get error code */
 	movq $-1,ORIG_RAX(%rsp)	/* no syscall to restart */
-	call \do_sym
+	call_in_irqstack \do_sym
 	jmp paranoid_exit	/* %ebx: no swapgs flag */
 	CFI_ENDPROC
 END(\sym)
@@ -1239,19 +1241,7 @@ END(kernel_execve)
 /* Call softirq on interrupt stack. Interrupts are off. */
 ENTRY(call_softirq)
 	CFI_STARTPROC
-	push %rbp
-	CFI_ADJUST_CFA_OFFSET	8
-	CFI_REL_OFFSET rbp,0
-	mov %rsp,%rbp
-	CFI_DEF_CFA_REGISTER rbp
-	incl %gs:pda_irqcount
-	cmove %gs:pda_irqstackptr,%rsp
-	push %rbp			# backlink for old unwinder
-	call __do_softirq
-	leaveq
-	CFI_DEF_CFA_REGISTER	rsp
-	CFI_ADJUST_CFA_OFFSET	-8
-	decl %gs:pda_irqcount
+	call_in_irqstack __do_softirq
 	ret
 	CFI_ENDPROC
 END(call_softirq)
@@ -1281,15 +1271,8 @@ ENTRY(xen_do_hypervisor_callback) # do_hypervisor_callback(struct *pt_regs)
 	movq %rdi, %rsp		# we don't return, adjust the stack frame
 	CFI_ENDPROC
 	DEFAULT_FRAME
-11:	incl %gs:pda_irqcount
-	movq %rsp,%rbp
-	CFI_DEF_CFA_REGISTER rbp
-	cmovzq %gs:pda_irqstackptr,%rsp
-	pushq %rbp			# backlink for old unwinder
-	call xen_evtchn_do_upcall
-	popq %rsp
+	call_in_irqstack xen_evtchn_do_upcall
 	CFI_DEF_CFA_REGISTER rsp
-	decl %gs:pda_irqcount
 	jmp error_exit
 	CFI_ENDPROC
 END(do_hypervisor_callback)
-- 
1.6.0.6
[PATCH 4/4] x86: Move NMI back to interrupt stack
Now that interrupt stack switching is atomic, we can move the NMI handler to the interrupt stack.

Signed-off-by: Avi Kivity
---
 arch/x86/kernel/entry_64.S |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 61c54d9..3d45880 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1496,7 +1496,7 @@ ENTRY(nmi)
 	/* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
 	movq %rsp,%rdi
 	movq $-1,%rsi
-	call do_nmi
+	call_in_irqstack do_nmi
 #ifdef CONFIG_TRACE_IRQFLAGS
 	/* paranoidexit; without TRACE_IRQS_OFF */
 	/* ebx: no swapgs flag */
-- 
1.6.0.6
[PATCH 1/4] x86: drop the use of the tss interrupt stack table (IST)
The IST is the only thing that requires a valid TSS while running in kernel mode. Dropping its use unlocks an optimization opportunity for kvm: if we don't need a valid TSS while in kernel mode we can defer the use of the VMLOAD/VMSAVE instructions until the next context switch, reducing the executions of these costly instructions by a nice factor.

Kernel reliability should also be improved since interrupt paths are simplified.

Signed-off-by: Avi Kivity
---
 arch/x86/include/asm/desc.h      |   12 -
 arch/x86/include/asm/page_64.h   |    7 ---
 arch/x86/include/asm/processor.h |   11 
 arch/x86/kernel/cpu/common.c     |   34 -
 arch/x86/kernel/dumpstack_64.c   |   96 --
 arch/x86/kernel/entry_64.S       |   27 +-
 arch/x86/kernel/traps.c          |   12 ++--
 7 files changed, 9 insertions(+), 190 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index dc27705..c8787ff 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -367,18 +367,6 @@ static inline void set_task_gate(unsigned int n, unsigned int gdt_entry)
 	_set_gate(n, GATE_TASK, (void *)0, 0, 0, (gdt_entry<<3));
 }
 
-static inline void set_intr_gate_ist(int n, void *addr, unsigned ist)
-{
-	BUG_ON((unsigned)n > 0xFF);
-	_set_gate(n, GATE_INTERRUPT, addr, 0, ist, __KERNEL_CS);
-}
-
-static inline void set_system_intr_gate_ist(int n, void *addr, unsigned ist)
-{
-	BUG_ON((unsigned)n > 0xFF);
-	_set_gate(n, GATE_INTERRUPT, addr, 0x3, ist, __KERNEL_CS);
-}
-
 #else
 /*
  * GET_DESC_BASE reads the descriptor base of the specified segment.
diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index 5ebca29..7c89095 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -16,13 +16,6 @@
 #define IRQSTACK_ORDER 2
 #define IRQSTACKSIZE (PAGE_SIZE << IRQSTACK_ORDER)
 
-#define STACKFAULT_STACK 1
-#define DOUBLEFAULT_STACK 2
-#define NMI_STACK 3
-#define DEBUG_STACK 4
-#define MCE_STACK 5
-#define N_EXCEPTION_STACKS 5  /* hw limit: 7 */
-
 #define PUD_PAGE_SIZE		(_AC(1, UL) << PUD_SHIFT)
 #define PUD_PAGE_MASK		(~(PUD_PAGE_SIZE-1))
 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 091cd88..16d0cbe 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -277,13 +277,6 @@ struct tss_struct {
 
 DECLARE_PER_CPU(struct tss_struct, init_tss);
 
-/*
- * Save the original ist values for checking stack pointers during debugging
- */
-struct orig_ist {
-	unsigned long ist[7];
-};
-
 #define	MXCSR_DEFAULT		0x1f80
 
 struct i387_fsave_struct {
@@ -376,10 +369,6 @@ union thread_xstate {
 	struct xsave_struct		xsave;
 };
 
-#ifdef CONFIG_X86_64
-DECLARE_PER_CPU(struct orig_ist, orig_ist);
-#endif
-
 extern void print_cpu_info(struct cpuinfo_x86 *);
 extern unsigned int xstate_size;
 extern void free_thread_xstate(struct task_struct *);
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 376b9f9..6808c3a 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -907,9 +907,6 @@ void __cpuinit pda_init(int cpu)
 	}
 }
 
-static char boot_exception_stacks[(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ +
-				  DEBUG_STKSZ] __page_aligned_bss;
-
 extern asmlinkage void ignore_sysret(void);
 
 /* May not be marked __init: used by software suspend */
@@ -935,12 +932,6 @@ void syscall_init(void)
 
 unsigned long kernel_eflags;
 
-/*
- * Copies of the original ist values from the tss are only accessed during
- * debugging, no special alignment required.
- */
-DEFINE_PER_CPU(struct orig_ist, orig_ist);
-
 #else
 
 /* Make sure %fs is initialized properly in idle threads */
@@ -964,17 +955,13 @@ void __cpuinit cpu_init(void)
 {
 	int cpu = stack_smp_processor_id();
 	struct tss_struct *t = &per_cpu(init_tss, cpu);
-	struct orig_ist *orig_ist = &per_cpu(orig_ist, cpu);
 	unsigned long v;
-	char *estacks = NULL;
 	struct task_struct *me;
 	int i;
 
 	/* CPU 0 is initialised in head64.c */
 	if (cpu != 0)
 		pda_init(cpu);
-	else
-		estacks = boot_exception_stacks;
 
 	me = current;
 
@@ -1004,27 +991,6 @@ void __cpuinit cpu_init(void)
 	if (cpu != 0 && x2apic)
 		enable_x2apic();
 
-	/*
-	 * set up and load the per-CPU TSS
-	 */
-	if (!orig_ist->ist[0]) {
-		static const unsigned int order[N_EXCEPTION_STACKS] = {
-		  [0 ... N_EXCEPTION_STACKS - 1] = EXCEPTION_STACK_ORDER,
-		  [DEBUG_STACK - 1] = DEBUG_STACK_ORDER
-		};
-		for (v = 0; v < N_EXCEPTION_STACKS; v++) {
-			if (cpu) {
-				estacks = (char *)__get_free_pages(GFP_AT
[PATCH 0/4] Remove interrupt stack table usage from x86_64 kernel (v2)
The interrupt stack table (IST) mechanism is the only thing preventing kvm from deferring saving and reloading of some significant state. It is also somewhat complicated.

Remove it by switching the special exceptions to use the normal irqstack.

Changes from v1:
- rebase on tip/master
- as a step, consolidate stack switching into a single macro

Jeremy, Xen is also affected; please review.

Avi Kivity (4):
  x86: drop the use of the tss interrupt stack table (IST)
  x86: Consolidate irq stack switching to a single macro
  x86: Make interrupt stack switching atomic
  x86: Move NMI back to interrupt stack

 arch/x86/include/asm/desc.h      |   12 -
 arch/x86/include/asm/page_64.h   |    7 ---
 arch/x86/include/asm/pda.h       |    2 +-
 arch/x86/include/asm/processor.h |   11 
 arch/x86/kernel/asm-offsets_64.c |    1 -
 arch/x86/kernel/cpu/common.c     |   35 --
 arch/x86/kernel/dumpstack_64.c   |   96 --
 arch/x86/kernel/entry_64.S       |   89 ++-
 arch/x86/kernel/traps.c          |   12 ++--
 9 files changed, 33 insertions(+), 232 deletions(-)
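For readers unfamiliar with where the IST index lives: on x86-64 each IDT gate descriptor carries a 3-bit IST field, and a value of 0 means the CPU does not switch to an IST stack for that vector. The sketch below packs such a descriptor by hand to show the field. All names (sketch_gate64, sketch_pack_gate, sketch_gate_ist) are hypothetical and this is not the kernel's _set_gate(); it is only meant to illustrate what the removed set_intr_gate_ist() helpers were selecting.

```c
#include <assert.h>
#include <stdint.h>

/* The two 64-bit words of an x86-64 IDT gate descriptor (illustrative type). */
struct sketch_gate64 {
    uint64_t lo;
    uint64_t hi;
};

#define SKETCH_GATE_INTERRUPT 0xEULL  /* 64-bit interrupt gate type */

/* Hypothetical helper: pack an interrupt gate with the given IST index. */
static struct sketch_gate64 sketch_pack_gate(uint64_t handler, uint16_t selector,
                                             unsigned ist, unsigned dpl)
{
    struct sketch_gate64 g;

    assert(ist <= 7);  /* 3-bit field; 0 means "no IST stack switch" */
    assert(dpl <= 3);

    g.lo = (handler & 0xFFFFULL)                   /* handler offset bits 15:0  */
         | ((uint64_t)selector << 16)              /* code segment selector     */
         | ((uint64_t)ist << 32)                   /* IST index                 */
         | (SKETCH_GATE_INTERRUPT << 40)           /* gate type                 */
         | ((uint64_t)dpl << 45)                   /* descriptor privilege      */
         | (1ULL << 47)                            /* present bit               */
         | (((handler >> 16) & 0xFFFFULL) << 48);  /* handler offset bits 31:16 */
    g.hi = handler >> 32;                          /* handler offset bits 63:32 */
    return g;
}

/* Extract the IST index from a packed gate. */
static unsigned sketch_gate_ist(struct sketch_gate64 g)
{
    return (unsigned)((g.lo >> 32) & 0x7);
}
```

With the series applied, every gate would be installed with an IST index of 0 in terms of this sketch, so the stack choice is left to the software switch in the entry code rather than to the TSS.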
Re: [PATCH] KVM: userspace: Remove duplicated functionality for cpuid processing
Hi Amit,

On 26.12.2008, at 07:02, Amit Shah wrote:

> host_cpuid is now available in target-i386/helper.c.
> Remove the duplicated code now in kvm-specific code.
>
> Signed-off-by: Amit Shah
> ---
>  qemu/qemu-kvm-x86.c |   70 ---
>  1 files changed, 0 insertions(+), 70 deletions(-)
>
> diff --git a/qemu/qemu-kvm-x86.c b/qemu/qemu-kvm-x86.c
> index aa36be8..1bf86e1 100644
> --- a/qemu/qemu-kvm-x86.c
> +++ b/qemu/qemu-kvm-x86.c
> @@ -451,39 +451,6 @@ void kvm_arch_save_regs(CPUState *env)
>      }
>  }
>
> -static void host_cpuid(uint32_t function, uint32_t *eax, uint32_t *ebx,
> -                       uint32_t *ecx, uint32_t *edx)
> -{
> -    uint32_t vec[4];
> -
> -#ifdef __x86_64__
> -    asm volatile("cpuid"
> -                 : "=a"(vec[0]), "=b"(vec[1]),
> -                   "=c"(vec[2]), "=d"(vec[3])
> -                 : "0"(function) : "cc");
> -#else
> -    asm volatile("pusha \n\t"
> -                 "cpuid \n\t"
> -                 "mov %%eax, 0(%1) \n\t"
> -                 "mov %%ebx, 4(%1) \n\t"
> -                 "mov %%ecx, 8(%1) \n\t"
> -                 "mov %%edx, 12(%1) \n\t"
> -                 "popa"
> -                 : : "a"(function), "S"(vec)
> -                 : "memory", "cc");
> -#endif
> -
> -    if (eax)
> -        *eax = vec[0];
> -    if (ebx)
> -        *ebx = vec[1];
> -    if (ecx)
> -        *ecx = vec[2];
> -    if (edx)
> -        *edx = vec[3];
> -}
> -
> -
>  static void do_cpuid_ent(struct kvm_cpuid_entry *e, uint32_t function,
>                           CPUState *env)
>  {
> @@ -494,43 +461,6 @@ static void do_cpuid_ent(struct kvm_cpuid_entry *e, uint32_t function,
>      e->ebx = env->regs[R_EBX];
>      e->ecx = env->regs[R_ECX];
>      e->edx = env->regs[R_EDX];

That looks a lot better, but I think we could easily do more!

do_cpuid_ent is only called twice, like this:

    do_cpuid_ent(&cpuid_ent[cpuid_nent++], i, &copy);

We can replace that with:

    struct kvm_cpuid_entry *e = &cpuid_ent[cpuid_nent++];
    e->eax = i;
    cpu_x86_cpuid(&copy, &e->eax, &e->ebx, &e->ecx, &e->edx);

The same could be done for qemu_kvm_cpuid_on_env. Then we can get rid of qemu-kvm-helper.c too :-).

Alex
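The refactoring suggested above, filling each entry in place instead of going through a wrapper, can be sketched in isolation. Everything below is a stand-in: `sketch_cpuid_entry` is not `struct kvm_cpuid_entry`, `sketch_cpuid` is not `cpu_x86_cpuid` (it returns made-up values and takes no CPU state), and `sketch_fill_entries` is a hypothetical loop. Only the calling pattern is the point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a CPUID table entry; field names are assumptions. */
struct sketch_cpuid_entry {
    uint32_t function;
    uint32_t eax, ebx, ecx, edx;
};

/*
 * Stand-in for the shared CPUID helper. It returns fake, deterministic
 * values and, like the removed host_cpuid(), tolerates NULL outputs.
 */
static void sketch_cpuid(uint32_t function, uint32_t *eax, uint32_t *ebx,
                         uint32_t *ecx, uint32_t *edx)
{
    if (eax)
        *eax = function + 1;
    if (ebx)
        *ebx = function + 2;
    if (ecx)
        *ecx = function + 3;
    if (edx)
        *edx = function + 4;
}

/* Fill entries for functions 0..limit in place; returns entries written. */
static size_t sketch_fill_entries(struct sketch_cpuid_entry *ents, size_t cap,
                                  uint32_t limit)
{
    size_t n = 0;

    assert(ents != NULL);

    for (uint32_t i = 0; i <= limit && n < cap; i++) {
        /* Take the next slot first, then let the helper write into it. */
        struct sketch_cpuid_entry *e = &ents[n++];

        e->function = i;
        sketch_cpuid(i, &e->eax, &e->ebx, &e->ecx, &e->edx);
    }
    return n;
}
```

Passing pointers to the entry's own fields avoids the temporary copy through register state that a do_cpuid_ent-style wrapper performs.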
Re: [PATCH 0/3] Remove interrupt stack table usage from x86_64 kernel
* Avi Kivity wrote:

> The interrupt stack table (IST) mechanism is the only thing preventing
> kvm from deferring saving and reloading of some significant state. It
> is also somewhat complicated.
>
> Remove it by switching the special exceptions to use the normal irqstack.
>
> Avi Kivity (3):
>   x86: drop the use of the tss interrupt stack table (IST)
>   x86: Remove pda.irqcount
>   x86: Switch critical exceptions and NMI to irqstack
>
>  arch/x86/include/asm/desc.h      |   12 -
>  arch/x86/include/asm/page_64.h   |    7 ---
>  arch/x86/include/asm/pda.h       |    2 +-
>  arch/x86/include/asm/processor.h |   11 
>  arch/x86/kernel/asm-offsets_64.c |    1 -
>  arch/x86/kernel/cpu/common.c     |   35 --
>  arch/x86/kernel/dumpstack_64.c   |   96 --
>  arch/x86/kernel/entry_64.S       |   49 ---
>  arch/x86/kernel/traps.c          |   12 ++--
>  9 files changed, 27 insertions(+), 198 deletions(-)

Looks good. Please base your work on the tip/master tree; we have a ton of pending (and conflicting) changes in the lowlevel assembly area:

  http://people.redhat.com/mingo/tip.git/README

	Ingo