Re: [Qemu-devel] Re: gdbstub: packet reply is too long

2008-12-26 Thread Daniel Jacobowitz
On Sun, Dec 21, 2008 at 12:44:04AM +0100, Jan Kiszka wrote:
> And that means setting current_gdbarch while keeping target_gdbarch -
> that's where reality (existing gdb code) bites us. Again, I'm not
> arguing against fixing this, I'm arguing for keeping qemu's workaround
> until this is done. I will look into the gdb part, but one after the other.

No, it does not mean setting current_gdbarch different from
target_gdbarch.  With the current gdbarch set to a 64-bit one that
accurately describes the target, GDB should be able to debug code
running in 32-bit mode.  If it can't, there are simply bugs in GDB to
fix.

If you'd like to reach some solution to this problem, which I've seen
come up on the QEMU list a half-dozen times now, please describe how
you're using GDB on the g...@sourceware.org mailing list and let's see
if we can't fix the GDB bugs.  I'm pretty sure that any solution is
going to involve always transferring the x86-64 register set, though.

-- 
Daniel Jacobowitz
CodeSourcery
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


BUG() with SCSI-interfaced disk images

2008-12-26 Thread John Morrissey
I'm encountering a kernel BUG() in guests using SCSI-interfaced disk images.
I've tried with the Debian packaging of KVM 79 and 82; both exhibit the same
behavior (disclaimer: Debian has about a dozen patches in their kvm
packaging, but they all seem to be changes to the build/install process or
security-related).

IDE-interfaced disk images seem fine. Host and guest are up-to-date Debian
lenny (32-bit/i386) running kernel 2.6.26 (Debian linux-image-2.6.26-1-amd64
2.6.26-12).

After a few minutes of disk activity (fsck(8)ing a fairly empty ~20GB
filesystem is a reliable trigger), the kernel BUGs (oops output below).

I was previously using KVM 72, and tried upgrading to 79 because both Debian
lenny and Ubuntu hardy guests were panicking due to sym disconnects/timeouts.
79 makes the lenny guest start BUGging as described above. 82 is not
perceivably different from 79 for the lenny guest.

FWIW, the upgrade to 79 allowed the Ubuntu hardy guest to stay up, although
it emits:

Dec 25 00:28:51 vicar kernel: [106621.553272] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
Dec 25 00:28:51 vicar kernel: [106621.553279] Info fld=0x0
Dec 25 00:28:51 vicar kernel: [106621.553280] sd 2:0:0:0: [sda] Add. Sense: No additional sense information

at seemingly random intervals. The upgrade to 82 made the hardy guest start
BUGging on soft lockups at random intervals (I can provide the full output
if anyone's interested, but I'm much more interested in the lenny guest
oops at this point).

john


run via libvirt:
/usr/bin/kvm -S -M pc -m 512 -smp 1 -name test -monitor pty \
-boot c -drive file=image.qcow,if=scsi,index=0,boot=on \
-net nic,macaddr=00:0c:29:1e:ea:b9,vlan=0,model=e1000 \
-net tap,fd=17,script=,vlan=0,ifname=vnet2 \
-net nic,macaddr=00:0c:29:1e:ea:c3,vlan=1,model=e1000 \
-net tap,fd=18,script=,vlan=1,ifname=vnet3 \
-serial pty -parallel none -usb -vnc 0.0.0.0:1

[The KVMWiki asks whether the problem is reproducible with -no-kvm-irqchip,
 -no-kvm-pit, or -no-kvm, but when I tried invoking the above command line
 by hand (outside of libvirt), the VNC console was always blank and there
 was no console output on the serial pty. If this would be useful
 information to have in this case, I'd love to know what I'm doing wrong, or
 if there's a way to specify additional command line arguments with
 libvirt.]

oops generated in the guest:
[  140.101828] sym0: unexpected disconnect
[  140.102748] BUG: unable to handle kernel NULL pointer dereference at 0358
[  140.103818] IP: [] :sym53c8xx:sym_int_sir+0x547/0x118f
[  140.106449] *pdpt = 1f5f9001 *pde =  
[  140.107356] Oops:  [#1] SMP 
[  140.107864] Modules linked in: loop virtio_balloon psmouse pcspkr serio_raw i2c_piix4 i2c_core button evdev ext3 jbd mbcache sd_mod ide_cd_mod cdrom ata_generic libata dock ide_pci_generic floppy virtio_pci virtio_ring virtio sym53c8xx scsi_transport_spi scsi_mod e1000 uhci_hcd usbcore piix ide_core thermal processor fan thermal_sys
[  140.108062] 
[  140.108062] Pid: 131, comm: pdflush Not tainted (2.6.26-1-686-bigmem #1)
[  140.108062] EIP: 0060:[] EFLAGS: 00010287 CPU: 0
[  140.108062] EIP is at sym_int_sir+0x547/0x118f [sym53c8xx]
[  140.108062] EAX: 000a EBX:  ECX: 1f98c084 EDX: 0030
[  140.108062] ESI: df98c084 EDI: df98c000 EBP: df98c000 ESP: de0f3ba0
[  140.108062]  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
[  140.108062] Process pdflush (pid: 131, ti=de0f2000 task=df48e520 task.ti=de0f2000)
[  140.108062] Stack:  000144d6 7f5a222c c011a853 0021d496  
  
[  140.108062] df98c000 e08e08cd   0001 
 df98c000 
[  140.108062]0084 e08e3f2f df988c00 0046  df544400 
0196  
[  140.108062] Call Trace:
[  140.108062]  [] pvclock_clocksource_read+0x4b/0xd0
[  140.108062]  [] sym_recover_scsi_int+0xb3/0x10d [sym53c8xx]
[  140.108062]  [] sym_interrupt+0x3ee/0x5fd [sym53c8xx]
[  140.108062]  [] sym53c8xx_intr+0x35/0x56 [sym53c8xx]
[  140.108062]  [] handle_IRQ_event+0x23/0x51
[  140.108062]  [] handle_fasteoi_irq+0x71/0xa4
[  140.108062]  [] do_IRQ+0x4d/0x63
[  140.108062]  [] common_interrupt+0x23/0x28
[  140.108062]  [] ptrace_request+0x1ec/0x278
[  140.108062]  [] __do_softirq+0x57/0xd3
[  140.108062]  [] do_softirq+0x45/0x53
[  140.108062]  [] irq_exit+0x35/0x67
[  140.108062]  [] smp_apic_timer_interrupt+0x6b/0x75
[  140.108062]  [] apic_timer_interrupt+0x28/0x30
[  140.108062]  [] _spin_unlock_irqrestore+0x7/0x10
[  140.108062]  [] scsi_dispatch_cmd+0x197/0x205 [scsi_mod]
[  140.108062]  [] scsi_request_fn+0x264/0x32a [scsi_mod]
[  140.108063]  [] __generic_unplug_device+0x1a/0x1c
[  140.108063]  [] __make_request+0x2fe/0x348
[  140.108063]  [] generic_make_request+0x34d/0x37b
[  140.108063]  [] mempool_alloc+0x1c/0xba
[  140.108063]  [] submit_bio+0xc6/0xcd
[  140.108063]  [] bio_alloc_bioset+0x9b/0xf3
[  140.108063]  [] subm

Re: how increase/decrease ram on running vm ?

2008-12-26 Thread Ryota OZAKI
2008/12/27 Ryota OZAKI :
> Have you tried decreasing memory? AFAIK, current ballooning cannot
> increase memory.

Oops, I mean ballooning cannot increase memory beyond the amount of
memory specified in the qemu/kvm arguments.
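
To make the limit concrete, here is a toy model in Python. It is only a
sketch and not QEMU code: the function name and the use of megabytes are
assumptions made for illustration. The point is that a balloon request can
shrink the guest, or give memory back, but only up to the amount the VM was
started with via -m.

```python
def effective_guest_memory(boot_mb: int, balloon_target_mb: int) -> int:
    """Toy model: memory (in MB) the guest has after a balloon request.

    Assumption for illustration: the guest can never receive more than
    the amount passed with -m at startup, and never less than zero.
    """
    return max(0, min(balloon_target_mb, boot_mb))


if __name__ == "__main__":
    boot = 512  # VM started with "-m 512"
    print(effective_guest_memory(boot, 256))   # 256: shrinking works
    print(effective_guest_memory(boot, 1024))  # 512: capped at the -m value
```

On builds that include balloon support, the QEMU monitor commands for this
are "balloon <target in MB>" and "info balloon".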

> Regards,
>  ozaki-r
>
> 2008/12/27 Василец Дмитрий :
>> I read this, but I don't have a balloon command in the CLI.
>>
>> On Fri, 26/12/2008 at 23:25 +0900, Ryota OZAKI wrote:
>>> Hi,
>>>
>>> http://www.linux-kvm.com/content/memory-ballooning-feature-coming-soon-kvm
>>>
>>> This page might help you.
>>>
>>> Regards,
>>>   ozaki-r
>>>
>>> 2008/12/26 Василец Дмитрий :
>>> > How can I increase/decrease RAM on a running VM?
>>> > I found the virtio_balloon module, but don't know how it works.


Re: how increase/decrease ram on running vm ?

2008-12-26 Thread Ryota OZAKI
Have you tried decreasing memory? AFAIK, current ballooning cannot
increase memory.

Regards,
  ozaki-r

2008/12/27 Василец Дмитрий :
> I read this, but I don't have a balloon command in the CLI.
>
> On Fri, 26/12/2008 at 23:25 +0900, Ryota OZAKI wrote:
>> Hi,
>>
>> http://www.linux-kvm.com/content/memory-ballooning-feature-coming-soon-kvm
>>
>> This page might help you.
>>
>> Regards,
>>   ozaki-r
>>
>> 2008/12/26 Василец Дмитрий :
>> > How can I increase/decrease RAM on a running VM?
>> > I found the virtio_balloon module, but don't know how it works.


Re: how increase/decrease ram on running vm ?

2008-12-26 Thread Василец Дмитрий
I read this, but I don't have a balloon command in the CLI.

On Fri, 26/12/2008 at 23:25 +0900, Ryota OZAKI wrote:
> Hi,
> 
> http://www.linux-kvm.com/content/memory-ballooning-feature-coming-soon-kvm
> 
> This page might help you.
> 
> Regards,
>   ozaki-r
> 
> 2008/12/26 Василец Дмитрий :
> > How can I increase/decrease RAM on a running VM?
> > I found the virtio_balloon module, but don't know how it works.


Re: [PATCH 0/4] Remove interrupt stack table usage from x86_64 kernel (v2)

2008-12-26 Thread Ingo Molnar

* Ingo Molnar  wrote:

> They have the following commit IDs, and they are also in tip/master:
> 
>  921e521: x86: move NMI back to interrupt stack
>  36ef6c9: x86: make interrupt stack switching atomic
>  dd64891: x86: consolidate irq stack switching to a single macro
>  955a368: x86: drop the use of the tss interrupt stack table (IST)
> 
> I also started testing them in tip-qa.

testing failed quickly, the attached config crashes. I've pushed out the 
bad kernel to the tip/tmp.master.bad branch:

 fe3aac9: Merge branch 'x86/irq'

(no crashlog available - all I know is that the box crashed and rebooted 
when booted with the bzImage built out of the attached config.)

Ingo
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.28
# Fri Dec 26 15:31:21 2008
#
CONFIG_64BIT=y
# CONFIG_X86_32 is not set
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_FAST_CMPXCHG_LOCAL=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
# CONFIG_RWSEM_XCHGADD_ALGORITHM is not set
CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_DEFAULT_IDLE=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
# CONFIG_HAVE_SETUP_PER_CPU_AREA is not set
# CONFIG_HAVE_CPUMASK_OF_CPU_MAP is not set
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ZONE_DMA32=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_X86_BIOS_REBOOT=y
# CONFIG_KTIME_SCALAR is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
# CONFIG_SYSVIPC is not set
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
# CONFIG_TASK_IO_ACCOUNTING is not set
CONFIG_AUDIT=y
# CONFIG_AUDITSYSCALL is not set
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=21
# CONFIG_CGROUPS is not set
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
# CONFIG_GROUP_SCHED is not set
CONFIG_SYSFS_DEPRECATED=y
CONFIG_SYSFS_DEPRECATED_V2=y
CONFIG_RELAY=y
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_USER_NS=y
CONFIG_PID_NS=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KALLSYMS_EXTRA_PASS=y
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_COMPAT_BRK=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_HAVE_PERF_COUNTERS=y

#
# Performance Counters
#
CONFIG_PERF_COUNTERS=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_PCI_QUIRKS=y
CONFIG_SLUB_DEBUG=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
# CONFIG_MARKERS is not set
CONFIG_OPROFILE=y
CONFIG_HAVE_OPROFILE=y
# CONFIG_KPROBES is not set
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
# CONFIG_HAVE_GENERIC_DMA_COHERENT is not set
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_FORCE_LOAD=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y
CONFIG_BLOCK=y
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_BLK_DEV_BSG=y
CONFIG_BLK_DEV_INTEGRITY=y
CONFIG_BLOCK_COMPAT=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=m
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_CLASSIC_RCU=y
# CONFIG_TREE_RCU is not set
# CONFIG_PREEMPT_RCU is not set
# CONFIG_TREE_RCU_TRACE is not set
# CONFIG_PREEMPT_RCU_TRACE is not set
CONFIG_FREEZER=y

#
# Processor type and features
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
# CONFIG_HIGH_RES_TIMERS is not set
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
# CONFIG_SMP is not set
CONFIG_SPARSE_IRQ=y
CONFIG_X86_FIND_SMP_CONFIG=y
CONFIG_X86_MPPARSE=y
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_GENER

Re: [PATCH 0/4] Remove interrupt stack table usage from x86_64 kernel (v2)

2008-12-26 Thread Ingo Molnar

* Avi Kivity  wrote:

> The interrupt stack table (IST) mechanism is the only thing preventing
> kvm from deferring saving and reloading of some significant state.  It
> is also somewhat complicated.
> 
> Remove it by switching the special exceptions to use the normal irqstack.
> 
> Changes from v1:
> - rebase on tip/master
> - as a step, consolidate stack switching into a single macro
> 
> Jeremy, Xen is also affected; please review.
> 
> Avi Kivity (4):
>   x86: drop the use of the tss interrupt stack table (IST)
>   x86: Consolidate irq stack switching to a single macro
>   x86: Make interrupt stack switching atomic
>   x86: Move NMI back to interrupt stack
> 
>  arch/x86/include/asm/desc.h  |   12 -
>  arch/x86/include/asm/page_64.h   |7 ---
>  arch/x86/include/asm/pda.h   |2 +-
>  arch/x86/include/asm/processor.h |   11 
>  arch/x86/kernel/asm-offsets_64.c |1 -
>  arch/x86/kernel/cpu/common.c |   35 --
>  arch/x86/kernel/dumpstack_64.c   |   96 --
>  arch/x86/kernel/entry_64.S   |   89 ++-
>  arch/x86/kernel/traps.c  |   12 ++--
>  9 files changed, 33 insertions(+), 232 deletions(-)

applied to tip/x86/irq, thanks Avi!

They have the following commit IDs, and they are also in tip/master:

 921e521: x86: move NMI back to interrupt stack
 36ef6c9: x86: make interrupt stack switching atomic
 dd64891: x86: consolidate irq stack switching to a single macro
 955a368: x86: drop the use of the tss interrupt stack table (IST)

I also started testing them in tip-qa.

I added the standard Impact-lines that we do in the x86 tree. Note that 
this patch:

 dd64891: x86: consolidate irq stack switching to a single macro

isn't just consolidating IRQ entry assembly code, it is also changing the 
paranoidentry macros to do IRQ stack entries - and hence switches all the 
critical exception entry sequences except NMI over to the IRQ stack. Your 
later patch:

 921e521: x86: move NMI back to interrupt stack

covers the NMI entry code too.

Please double-check that we indeed now have all the critical exceptions on 
the IRQ stack (they are all rare so testing alone won't show this), and 
please also double-check that we don't have more exceptions and entry 
callpaths on the IRQ stack than what we wanted. For example on a 
preemptible kernel (or in any codepath that calls schedule()) it is fatal 
to be on the IRQ stack, so this has to be very accurately coded.

Ingo


Re: how increase/decrease ram on running vm ?

2008-12-26 Thread Ryota OZAKI
Hi,

http://www.linux-kvm.com/content/memory-ballooning-feature-coming-soon-kvm

This page might help you.

Regards,
  ozaki-r

2008/12/26 Василец Дмитрий :
> How can I increase/decrease RAM on a running VM?
> I found the virtio_balloon module, but don't know how it works.


[PATCH 3/4] x86: Make interrupt stack switching atomic

2008-12-26 Thread Avi Kivity
Instead of relying on pda.irqcount to tell us whether we're already in an
interrupt or not, examine the stack pointer directly.  This makes the switch
atomic (since there's no window between incrementing the counter and
switching the stack where an NMI could see the new counter but the old stack),
and lets us get rid of a variable.

Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/pda.h   |2 +-
 arch/x86/kernel/asm-offsets_64.c |1 -
 arch/x86/kernel/cpu/common.c |1 -
 arch/x86/kernel/entry_64.S   |9 +
 4 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/pda.h b/arch/x86/include/asm/pda.h
index 1a79e16..362fd28 100644
--- a/arch/x86/include/asm/pda.h
+++ b/arch/x86/include/asm/pda.h
@@ -14,7 +14,7 @@ struct x8664_pda {
   address */
unsigned long kernelstack;  /* 16 top of kernel stack for current */
unsigned long oldrsp;   /* 24 user rsp for system call */
-   int irqcount;   /* 32 Irq nesting counter. Starts -1 */
+   int unused; /* 32 for rent */
unsigned int cpunumber; /* 36 Logical CPU number */
unsigned long stack_canary; /* 40 stack canary value */
/* gcc-ABI: this canary MUST be at
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index 1d41d3f..62dde96 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -52,7 +52,6 @@ int main(void)
ENTRY(kernelstack); 
ENTRY(oldrsp); 
ENTRY(pcurrent); 
-   ENTRY(irqcount);
ENTRY(cpunumber);
ENTRY(irqstackptr);
ENTRY(data_offset);
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 6808c3a..f0ea980 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -881,7 +881,6 @@ void __cpuinit pda_init(int cpu)
mb();
 
pda->cpunumber = cpu;
-   pda->irqcount = -1;
pda->kernelstack = (unsigned long)stack_thread_info() -
 PDA_STACKOFFSET + THREAD_SIZE;
pda->active_mm = &init_mm;
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 4f1a38f..61c54d9 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -315,20 +315,21 @@ ENTRY(native_usergs_sysret64)
CFI_REL_OFFSET r15, R15+\offset
.endm
 
-   .macro call_in_irqstack func
+   .macro call_in_irqstack func, scratch=%rax
/* Switch to the irq stack, unless already on it, then call func */
push %rbp
CFI_ADJUST_CFA_OFFSET 8
mov %rsp,%rbp
CFI_DEF_CFA_REGISTER rbp
-   incl %gs:pda_irqcount
-   cmovz %gs:pda_irqstackptr,%rsp
+   mov %gs:pda_irqstackptr, \scratch
+   sub %rsp,\scratch
+   cmp $IRQSTACKSIZE-64,%rax
+   cmova %gs:pda_irqstackptr,%rsp
EMPTY_FRAME 0
call \func
leaveq
CFI_DEF_CFA_REGISTER rsp
CFI_ADJUST_CFA_OFFSET -8
-   decl %gs:pda_irqcount
.endm
 
 /* save partial stack frame */
-- 
1.6.0.6
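
For readers who do not read AT&T assembly, the new check can be modelled
roughly as below. This is a Python sketch, not kernel code. The name
next_rsp is made up, 64-bit register wraparound is emulated with a mask,
PAGE_SIZE is assumed to be 4096, and the scratch register is assumed to be
the default %rax. The example addresses are hypothetical.

```python
MASK64 = (1 << 64) - 1          # emulate 64-bit register wraparound
PAGE_SIZE = 4096                # assumed x86-64 page size
IRQSTACK_ORDER = 2              # value from asm/page_64.h
IRQSTACKSIZE = PAGE_SIZE << IRQSTACK_ORDER


def next_rsp(rsp: int, irqstackptr: int) -> int:
    """Return the stack pointer the called function would run on."""
    # mov %gs:pda_irqstackptr, scratch ; sub %rsp, scratch
    distance = (irqstackptr - rsp) & MASK64
    # cmp $IRQSTACKSIZE-64, scratch ; cmova %gs:pda_irqstackptr, %rsp
    # "above" is an unsigned comparison, so an rsp higher than
    # irqstackptr wraps to a huge distance and also triggers the switch.
    if distance > IRQSTACKSIZE - 64:
        return irqstackptr
    return rsp


if __name__ == "__main__":
    base = 0xFFFF880000100000            # hypothetical irq stack base
    top = base + IRQSTACKSIZE - 64       # assumed value of pda_irqstackptr
    print(hex(next_rsp(top - 128, top)))             # already on irq stack: unchanged
    print(hex(next_rsp(0xFFFF880001000000, top)))    # elsewhere: switch to top
```

The decision depends only on the current stack pointer, so there is no
separate counter that can disagree with the stack actually in use.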



[PATCH 2/4] x86: Consolidate irq stack switching to a single macro

2008-12-26 Thread Avi Kivity
Instead of scattering the logic around, move all stack switching logic
into a single macro which calls a caller-supplied function.

This makes changing the logic easier and improves readability.

Signed-off-by: Avi Kivity 
---
 arch/x86/kernel/entry_64.S |   59 +++
 1 files changed, 21 insertions(+), 38 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 4d47cb8..4f1a38f 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -315,6 +315,22 @@ ENTRY(native_usergs_sysret64)
CFI_REL_OFFSET r15, R15+\offset
.endm
 
+   .macro call_in_irqstack func
+   /* Switch to the irq stack, unless already on it, then call func */
+   push %rbp
+   CFI_ADJUST_CFA_OFFSET 8
+   mov %rsp,%rbp
+   CFI_DEF_CFA_REGISTER rbp
+   incl %gs:pda_irqcount
+   cmovz %gs:pda_irqstackptr,%rsp
+   EMPTY_FRAME 0
+   call \func
+   leaveq
+   CFI_DEF_CFA_REGISTER rsp
+   CFI_ADJUST_CFA_OFFSET -8
+   decl %gs:pda_irqcount
+   .endm
+
 /* save partial stack frame */
 ENTRY(save_args)
XCPT_FRAME
@@ -336,18 +352,6 @@ ENTRY(save_args)
je 1f
SWAPGS
/*
-* irqcount is used to check if a CPU is already on an interrupt stack
-* or not. While this is essentially redundant with preempt_count it is
-* a little cheaper to use a separate counter in the PDA (short of
-* moving irq_enter into assembly, which would be too much work)
-*/
-1: incl %gs:pda_irqcount
-   jne 2f
-   popq_cfi %rax   /* move return address... */
-   mov %gs:pda_irqstackptr,%rsp
-   EMPTY_FRAME 0
-   pushq_cfi %rax  /* ... to the new stack */
-   /*
 * We entered an interrupt context - irqs are off:
 */
 2: TRACE_IRQS_OFF
@@ -819,8 +823,7 @@ END(interrupt)
subq $10*8, %rsp
CFI_ADJUST_CFA_OFFSET 10*8
call save_args
-   PARTIAL_FRAME 0
-   call \func
+   call_in_irqstack \func
.endm
 
/*
@@ -836,7 +839,6 @@ common_interrupt:
 ret_from_intr:
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
-   decl %gs:pda_irqcount
leaveq
CFI_DEF_CFA_REGISTERrsp
CFI_ADJUST_CFA_OFFSET   -8
@@ -1060,7 +1062,7 @@ ENTRY(\sym)
TRACE_IRQS_OFF
movq %rsp,%rdi  /* pt_regs pointer */
xorl %esi,%esi  /* no error code */
-   call \do_sym
+   call_in_irqstack \do_sym
jmp paranoid_exit   /* %ebx: no swapgs flag */
CFI_ENDPROC
 END(\sym)
@@ -1096,7 +1098,7 @@ ENTRY(\sym)
movq %rsp,%rdi  /* pt_regs pointer */
movq ORIG_RAX(%rsp),%rsi/* get error code */
movq $-1,ORIG_RAX(%rsp) /* no syscall to restart */
-   call \do_sym
+   call_in_irqstack \do_sym
jmp paranoid_exit   /* %ebx: no swapgs flag */
CFI_ENDPROC
 END(\sym)
@@ -1239,19 +1241,7 @@ END(kernel_execve)
 /* Call softirq on interrupt stack. Interrupts are off. */
 ENTRY(call_softirq)
CFI_STARTPROC
-   push %rbp
-   CFI_ADJUST_CFA_OFFSET   8
-   CFI_REL_OFFSET rbp,0
-   mov  %rsp,%rbp
-   CFI_DEF_CFA_REGISTER rbp
-   incl %gs:pda_irqcount
-   cmove %gs:pda_irqstackptr,%rsp
-   push  %rbp  # backlink for old unwinder
-   call __do_softirq
-   leaveq
-   CFI_DEF_CFA_REGISTERrsp
-   CFI_ADJUST_CFA_OFFSET   -8
-   decl %gs:pda_irqcount
+   call_in_irqstack __do_softirq
ret
CFI_ENDPROC
 END(call_softirq)
@@ -1281,15 +1271,8 @@ ENTRY(xen_do_hypervisor_callback)   # do_hypervisor_callback(struct *pt_regs)
movq %rdi, %rsp# we don't return, adjust the stack frame
CFI_ENDPROC
DEFAULT_FRAME
-11:incl %gs:pda_irqcount
-   movq %rsp,%rbp
-   CFI_DEF_CFA_REGISTER rbp
-   cmovzq %gs:pda_irqstackptr,%rsp
-   pushq %rbp  # backlink for old unwinder
-   call xen_evtchn_do_upcall
-   popq %rsp
+   call_in_irqstack xen_evtchn_do_upcall
CFI_DEF_CFA_REGISTER rsp
-   decl %gs:pda_irqcount
jmp  error_exit
CFI_ENDPROC
 END(do_hypervisor_callback)
-- 
1.6.0.6
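
The counter-based behaviour of the consolidated macro can be modelled as
follows. This is a Python sketch under stated assumptions, not kernel code:
the Pda class is a hypothetical stand-in for the per-CPU data area, stack
pointers are plain integers, and the handler is passed in as a Python
callable that receives the stack pointer it runs on.

```python
class Pda:
    """Hypothetical stand-in for the per-CPU data area (struct x8664_pda)."""

    def __init__(self, irqstackptr: int) -> None:
        self.irqcount = -1              # "Starts -1" per asm/pda.h
        self.irqstackptr = irqstackptr


def call_in_irqstack(pda: Pda, rsp: int, func):
    """Model of the macro: switch stacks on the outermost entry only."""
    pda.irqcount += 1                   # incl %gs:pda_irqcount
    if pda.irqcount == 0:               # cmovz %gs:pda_irqstackptr,%rsp
        rsp = pda.irqstackptr
    try:
        return func(rsp)                # call \func
    finally:
        # leaveq restores the caller's rsp, which this model never changed
        pda.irqcount -= 1               # decl %gs:pda_irqcount


if __name__ == "__main__":
    pda = Pda(irqstackptr=0x9000)
    seen = []

    def inner(rsp):
        seen.append(rsp)

    def outer(rsp):
        seen.append(rsp)
        # A nested entry while already on the irq stack, 0x100 bytes deeper.
        call_in_irqstack(pda, rsp - 0x100, inner)

    call_in_irqstack(pda, 0x1000, outer)
    print([hex(r) for r in seen])       # ['0x9000', '0x8f00']
```

The outer call switches to the irq stack; the nested call sees a non-zero
counter and keeps the stack pointer it was given.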



[PATCH 4/4] x86: Move NMI back to interrupt stack

2008-12-26 Thread Avi Kivity
Now that interrupt stack switching is atomic, we can move the NMI handler
to the interrupt stack.

Signed-off-by: Avi Kivity 
---
 arch/x86/kernel/entry_64.S |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 61c54d9..3d45880 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1496,7 +1496,7 @@ ENTRY(nmi)
/* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
movq %rsp,%rdi
movq $-1,%rsi
-   call do_nmi
+   call_in_irqstack do_nmi
 #ifdef CONFIG_TRACE_IRQFLAGS
/* paranoidexit; without TRACE_IRQS_OFF */
/* ebx: no swapgs flag */
-- 
1.6.0.6
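
Why atomicity matters for the NMI can be illustrated with a small model.
This is a Python sketch of one reading of the race described in patch 3/4,
not kernel code: the addresses are hypothetical, the function names are
made up, and "on the irq stack" is simplified to an equality test instead
of a range check.

```python
IRQSTACKPTR = 0x9000   # hypothetical top of the irq stack
TASK_RSP = 0x1000      # hypothetical task stack pointer


def nmi_stack_counter_scheme(nmi_in_window: bool) -> int:
    """Stack the NMI handler would run on under the irqcount scheme."""
    irqcount = -1
    rsp = TASK_RSP
    # An ordinary interrupt starts its entry sequence:
    irqcount += 1                       # incl: counter is now 0
    if not nmi_in_window:
        rsp = IRQSTACKPTR               # cmovz completed before the NMI
    # The NMI arrives and runs the same sequence:
    irqcount += 1                       # counter is now 1
    if irqcount == 0:                   # never true here
        rsp = IRQSTACKPTR
    return rsp


def nmi_stack_rsp_scheme(nmi_in_window: bool) -> int:
    """Stack the NMI handler would run on when %rsp itself is examined."""
    rsp = TASK_RSP
    if not nmi_in_window:
        rsp = IRQSTACKPTR               # interrupted code already switched
    if rsp != IRQSTACKPTR:              # simplified "not on irq stack" test
        rsp = IRQSTACKPTR
    return rsp


if __name__ == "__main__":
    # NMI lands between the increment and the stack switch:
    print(hex(nmi_stack_counter_scheme(True)))   # 0x1000: new counter, old stack
    print(hex(nmi_stack_rsp_scheme(True)))       # 0x9000
```

In the counter model the NMI handler ends up on the interrupted task's
stack when it lands in the window; with the stack-pointer check it ends up
on the irq stack in both cases.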



[PATCH 1/4] x86: drop the use of the tss interrupt stack table (IST)

2008-12-26 Thread Avi Kivity
The IST is the only thing that requires a valid TSS while running in
kernel mode.  Dropping its use unlocks an optimization opportunity for
kvm: if we don't need a valid TSS while in kernel mode we can defer the
use of the VMLOAD/VMSAVE instructions until the next context switch,
reducing the executions of these costly instructions by a nice factor.

Kernel reliability should also be improved since interrupt paths are
simplified.

Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/desc.h  |   12 -
 arch/x86/include/asm/page_64.h   |7 ---
 arch/x86/include/asm/processor.h |   11 
 arch/x86/kernel/cpu/common.c |   34 -
 arch/x86/kernel/dumpstack_64.c   |   96 --
 arch/x86/kernel/entry_64.S   |   27 +-
 arch/x86/kernel/traps.c  |   12 ++--
 7 files changed, 9 insertions(+), 190 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index dc27705..c8787ff 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -367,18 +367,6 @@ static inline void set_task_gate(unsigned int n, unsigned int gdt_entry)
_set_gate(n, GATE_TASK, (void *)0, 0, 0, (gdt_entry<<3));
 }
 
-static inline void set_intr_gate_ist(int n, void *addr, unsigned ist)
-{
-   BUG_ON((unsigned)n > 0xFF);
-   _set_gate(n, GATE_INTERRUPT, addr, 0, ist, __KERNEL_CS);
-}
-
-static inline void set_system_intr_gate_ist(int n, void *addr, unsigned ist)
-{
-   BUG_ON((unsigned)n > 0xFF);
-   _set_gate(n, GATE_INTERRUPT, addr, 0x3, ist, __KERNEL_CS);
-}
-
 #else
 /*
  * GET_DESC_BASE reads the descriptor base of the specified segment.
diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index 5ebca29..7c89095 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -16,13 +16,6 @@
 #define IRQSTACK_ORDER 2
 #define IRQSTACKSIZE (PAGE_SIZE << IRQSTACK_ORDER)
 
-#define STACKFAULT_STACK 1
-#define DOUBLEFAULT_STACK 2
-#define NMI_STACK 3
-#define DEBUG_STACK 4
-#define MCE_STACK 5
-#define N_EXCEPTION_STACKS 5  /* hw limit: 7 */
-
 #define PUD_PAGE_SIZE  (_AC(1, UL) << PUD_SHIFT)
 #define PUD_PAGE_MASK  (~(PUD_PAGE_SIZE-1))
 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 091cd88..16d0cbe 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -277,13 +277,6 @@ struct tss_struct {
 
 DECLARE_PER_CPU(struct tss_struct, init_tss);
 
-/*
- * Save the original ist values for checking stack pointers during debugging
- */
-struct orig_ist {
-   unsigned long   ist[7];
-};
-
 #defineMXCSR_DEFAULT   0x1f80
 
 struct i387_fsave_struct {
@@ -376,10 +369,6 @@ union thread_xstate {
struct xsave_struct xsave;
 };
 
-#ifdef CONFIG_X86_64
-DECLARE_PER_CPU(struct orig_ist, orig_ist);
-#endif
-
 extern void print_cpu_info(struct cpuinfo_x86 *);
 extern unsigned int xstate_size;
 extern void free_thread_xstate(struct task_struct *);
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 376b9f9..6808c3a 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -907,9 +907,6 @@ void __cpuinit pda_init(int cpu)
}
 }
 
-static char boot_exception_stacks[(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ +
- DEBUG_STKSZ] __page_aligned_bss;
-
 extern asmlinkage void ignore_sysret(void);
 
 /* May not be marked __init: used by software suspend */
@@ -935,12 +932,6 @@ void syscall_init(void)
 
 unsigned long kernel_eflags;
 
-/*
- * Copies of the original ist values from the tss are only accessed during
- * debugging, no special alignment required.
- */
-DEFINE_PER_CPU(struct orig_ist, orig_ist);
-
 #else
 
 /* Make sure %fs is initialized properly in idle threads */
@@ -964,17 +955,13 @@ void __cpuinit cpu_init(void)
 {
int cpu = stack_smp_processor_id();
struct tss_struct *t = &per_cpu(init_tss, cpu);
-   struct orig_ist *orig_ist = &per_cpu(orig_ist, cpu);
unsigned long v;
-   char *estacks = NULL;
struct task_struct *me;
int i;
 
/* CPU 0 is initialised in head64.c */
if (cpu != 0)
pda_init(cpu);
-   else
-   estacks = boot_exception_stacks;
 
me = current;
 
@@ -1004,27 +991,6 @@ void __cpuinit cpu_init(void)
if (cpu != 0 && x2apic)
enable_x2apic();
 
-   /*
-* set up and load the per-CPU TSS
-*/
-   if (!orig_ist->ist[0]) {
-   static const unsigned int order[N_EXCEPTION_STACKS] = {
- [0 ... N_EXCEPTION_STACKS - 1] = EXCEPTION_STACK_ORDER,
- [DEBUG_STACK - 1] = DEBUG_STACK_ORDER
-   };
-   for (v = 0; v < N_EXCEPTION_STACKS; v++) {
-   if (cpu) {
-   estacks = (char *)__get_free_pages(GFP_AT

[PATCH 0/4] Remove interrupt stack table usage from x86_64 kernel (v2)

2008-12-26 Thread Avi Kivity
The interrupt stack table (IST) mechanism is the only thing preventing
kvm from deferring saving and reloading of some significant state.  It
is also somewhat complicated.

Remove it by switching the special exceptions to use the normal irqstack.

Changes from v1:
- rebase on tip/master
- as a step, consolidate stack switching into a single macro

Jeremy, Xen is also affected; please review.

Avi Kivity (4):
  x86: drop the use of the tss interrupt stack table (IST)
  x86: Consolidate irq stack switching to a single macro
  x86: Make interrupt stack switching atomic
  x86: Move NMI back to interrupt stack

 arch/x86/include/asm/desc.h  |   12 -
 arch/x86/include/asm/page_64.h   |7 ---
 arch/x86/include/asm/pda.h   |2 +-
 arch/x86/include/asm/processor.h |   11 
 arch/x86/kernel/asm-offsets_64.c |1 -
 arch/x86/kernel/cpu/common.c |   35 --
 arch/x86/kernel/dumpstack_64.c   |   96 --
 arch/x86/kernel/entry_64.S   |   89 ++-
 arch/x86/kernel/traps.c  |   12 ++--
 9 files changed, 33 insertions(+), 232 deletions(-)



Re: [PATCH] KVM: userspace: Remove duplicated functionality for cpuid processing

2008-12-26 Thread Alexander Graf

Hi Amit,

On 26.12.2008, at 07:02, Amit Shah wrote:


host_cpuid is now available in target-i386/helper.c.
Remove the duplicated code now in kvm-specific code.

Signed-off-by: Amit Shah 
---
qemu/qemu-kvm-x86.c |   70 ---
1 files changed, 0 insertions(+), 70 deletions(-)

diff --git a/qemu/qemu-kvm-x86.c b/qemu/qemu-kvm-x86.c
index aa36be8..1bf86e1 100644
--- a/qemu/qemu-kvm-x86.c
+++ b/qemu/qemu-kvm-x86.c
@@ -451,39 +451,6 @@ void kvm_arch_save_regs(CPUState *env)
}
}

-static void host_cpuid(uint32_t function, uint32_t *eax, uint32_t *ebx,
-  uint32_t *ecx, uint32_t *edx)
-{
-uint32_t vec[4];
-
-#ifdef __x86_64__
-asm volatile("cpuid"
-: "=a"(vec[0]), "=b"(vec[1]),
-  "=c"(vec[2]), "=d"(vec[3])
-: "0"(function) : "cc");
-#else
-asm volatile("pusha \n\t"
-"cpuid \n\t"
-"mov %%eax, 0(%1) \n\t"
-"mov %%ebx, 4(%1) \n\t"
-"mov %%ecx, 8(%1) \n\t"
-"mov %%edx, 12(%1) \n\t"
-"popa"
-: : "a"(function), "S"(vec)
-: "memory", "cc");
-#endif
-
-if (eax)
-   *eax = vec[0];
-if (ebx)
-   *ebx = vec[1];
-if (ecx)
-   *ecx = vec[2];
-if (edx)
-   *edx = vec[3];
-}
-
-
static void do_cpuid_ent(struct kvm_cpuid_entry *e, uint32_t function,
 CPUState *env)
{
@@ -494,43 +461,6 @@ static void do_cpuid_ent(struct kvm_cpuid_entry *e, uint32_t function,
e->ebx = env->regs[R_EBX];
e->ecx = env->regs[R_ECX];
e->edx = env->regs[R_EDX];


That looks a lot better, but I think we could easily do more!

do_cpuid_ent is only called twice, like this:
do_cpuid_ent(&cpuid_ent[cpuid_nent++], i, &copy);


We can replace that with:

struct kvm_cpuid_entry *e = &cpuid_ent[cpuid_nent++];
e->eax = i;
cpu_x86_cpuid(&copy, &e->eax, &e->ebx, &e->ecx, &e->edx);

The same could be done for qemu_kvm_cpuid_on_env. Then we can get rid  
of qemu-kvm-helper.c too :-).


Alex



Re: [PATCH 0/3] Remove interrupt stack table usage from x86_64 kernel

2008-12-26 Thread Ingo Molnar

* Avi Kivity  wrote:

> The interrupt stack table (IST) mechanism is the only thing preventing
> kvm from deferring saving and reloading of some significant state.  It
> is also somewhat complicated.
> 
> Remove it by switching the special exceptions to use the normal irqstack.
> 
> Avi Kivity (3):
>   x86: drop the use of the tss interrupt stack table (IST)
>   x86: Remove pda.irqcount
>   x86: Switch critical exceptions and NMI to irqstack
> 
>  arch/x86/include/asm/desc.h  |   12 -
>  arch/x86/include/asm/page_64.h   |7 ---
>  arch/x86/include/asm/pda.h   |2 +-
>  arch/x86/include/asm/processor.h |   11 
>  arch/x86/kernel/asm-offsets_64.c |1 -
>  arch/x86/kernel/cpu/common.c |   35 --
>  arch/x86/kernel/dumpstack_64.c   |   96 --
>  arch/x86/kernel/entry_64.S   |   49 ---
>  arch/x86/kernel/traps.c  |   12 ++--
>  9 files changed, 27 insertions(+), 198 deletions(-)

looks good. Please base your work on the tip/master tree, we have a ton of 
pending (and conflicting) changes in the lowlevel assembly area:

  http://people.redhat.com/mingo/tip.git/README

Ingo