Re: Oracle RAC in libvirt+KVM environment

2013-08-20 Thread Timon Wang
My domain xml is like this:

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <name>2008-2</name>
  <uuid>6325d8a5-468d-42e9-b5cb-9a04f5f34e80</uuid>
  <memory unit='KiB'>524288</memory>
  <currentMemory unit='KiB'>524288</currentMemory>
  <vcpu placement='static'>2</vcpu>
  <os>
    <type arch='x86_64' machine='pc-i440fx-1.4'>hvm</type>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='localtime'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-kvm</emulator>
    <disk type='file' device='floppy'>
      <driver name='qemu' type='raw' cache='none'/>
      <target dev='fda' bus='fdc'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/home/images/win2008_2_sys'/>
      <target dev='hda' bus='ide'/>
      <boot order='3'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/home/isos/windows2008_64r2.iso'/>
      <target dev='sdc' bus='ide'/>
      <readonly/>
      <boot order='1'/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw'/>
      <source dev='/dev/fedora/q_disk'/>
      <target dev='sda' bus='virtio'/>
      <shareable/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>
    </disk>
    <controller type='fdc' index='0'/>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </controller>
    <controller type='usb' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='scsi' index='0' model='virtio-scsi'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:71:20:ae'/>
      <source bridge='br0'/>
      <target dev='vport2'/>
      <model type='rtl8139'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <interface type='network'>
      <mac address='52:54:00:12:a0:fd'/>
      <source network='default'/>
      <model type='rtl8139'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <input type='tablet' bus='usb'/>
    <input type='mouse' bus='ps2'/>
    <graphics type='spice' autoport='yes' listen='0.0.0.0'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <sound model='ac97'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </sound>
    <video>
      <model type='qxl' ram='65536' vram='32768' heads='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </memballoon>
  </devices>
  <qemu:commandline>
    <qemu:arg value='-rtc-td-hack'/>
  </qemu:commandline>
</domain>



On 8/19/13, Paolo Bonzini pbonz...@redhat.com wrote:
 On 15/08/2013 12:01, Timon Wang wrote:
 Thanks.

 I have read the link you provided; there is another link which tells me
 to pass an NPIV discovery LUN as a disk, and this is seen as a local
 direct-access disk in Windows. RAC and Failover Cluster both consider
 this passthrough disk a local disk, not a shared disk, and the setup
 process failed.

 Hyper-V provides a virtual Fibre Channel implementation, so I wonder
 if KVM has a similar solution.

 Can you include the XML file you are using for the domain?

 Paolo




-- 
Focus on: Server Virtualization, Network security, Scanner, NodeJS, Java, WWW
Blog: http://www.nohouse.net
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Oracle RAC in libvirt+KVM environment

2013-08-20 Thread Timon Wang
Right now, I have found that Windows Failover Cluster needs SCSI-3
Persistent Reservations, and I don't know whether virtio-scsi supports
this. According to http://www.ovirt.org/Features/Virtio-SCSI I found
this:

limited flexibility: virtio-blk does not support all possible storage
scenarios. For example, it does not allow SCSI passthrough or
persistent reservations. In principle, virtio-scsi provides anything
that the underlying SCSI target (be it physical storage, iSCSI or the
in-kernel target) supports.

So virtio-blk does not support persistent reservations, but virtio-scsi
may support them.

Another web page from a mailing list archive,
http://www.spinics.net/lists/target-devel/msg01813.html, requests that
virtio-scsi implement the SPC-3 (persistent reservation) feature, but I
can't find any more information about this.
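For what it's worth, the reservation state of a LUN can be inspected from the host with sg_persist from the sg3_utils package. This is only a sketch; /dev/sdb and the key value are placeholders for the shared LUN:

```shell
# Placeholder device: the shared LUN as seen on the host.
DEV=/dev/sdb

# PERSISTENT RESERVE IN: list registered keys and the active reservation.
sg_persist --in --read-keys $DEV
sg_persist --in --read-reservation $DEV

# PERSISTENT RESERVE OUT: register a key, then take a
# write-exclusive (type 1) reservation with it.
sg_persist --out --register --param-sark=0x1234 $DEV
sg_persist --out --reserve --param-rk=0x1234 --prout-type=1 $DEV
```

If the first two commands fail with an illegal-request sense, the target (or the path to it) does not implement SPC-3 persistent reservations.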

On Tue, Aug 20, 2013 at 2:00 PM, Timon Wang timon...@gmail.com wrote:
 My domain xml is like this:

 [domain XML snipped; identical to the message above]



 On 8/19/13, Paolo Bonzini pbonz...@redhat.com wrote:
 On 15/08/2013 12:01, Timon Wang wrote:
 Thanks.

 I have read the link you provide, there is another link which tells me
 to pass a NPIV discovery lun as a disk, this is seen as a local direct
 access disk in windows. RAC and Failure Cluster both consider this
 pass through disk as local disk, not a share disk, and the setup
 process failed.

 Hyper-v provides a virtual Fiber Channel implementation, so I
 wondering 

RE: [Qemu-devel] [PATCH] qemu-kvm bugfix for IA32_FEATURE_CONTROL

2013-08-20 Thread Liu, Jinsong
Paolo Bonzini wrote:
 On 19/08/2013 16:59, Andreas Färber wrote:
 qemu-kvm is no longer maintained since 1.3 so it should not be
 occurring any more. 
 
 Please use a prefix of "target-i386: " (the directory name) to signal
 where you are changing code, i.e. x86 only.
 
 bugfix is not a very telling description of what a patch is doing.
 
 (Up to Paolo and Gleb whether they'll fix it or whether they require
 a resend.)
 
 No, not this time at least. :)
 
 Paolo

Thanks Paolo; Andreas's comments are also good, so I will update the commit
message and send it out later.

Regards,
Jinsong



Re: Emulation failure

2013-08-20 Thread Paolo Bonzini
On 20/08/2013 03:26, Duy Nguyen TN wrote:
 On Mon, 19 Aug 2013 at 11:27 +0200, Paolo Bonzini wrote:
 The disassembled code is

0xb1dd10:  push   %rbx
0xb1dd11:  mov    $0x6e,%eax
0xb1dd16:  mov    %rdi,%rbx
0xb1dd19:  sub    $0x20,%rsp
0xb1dd1d:  test   %rdi,%rdi
0xb1dd20:  je     0xb1dd92
0xb1dd22:  mov    0x4bf1e0(%rip),%eax
0xb1dd28:  cmp    $0x,%eax
0xb1dd2b:  je     0xb1ddd0
0xb1dd31:  test   %eax,%eax
0xb1dd33:  jne    0xb1dd92
0xb1dd35:  mov    0xe1f55c(%rip),%rax
0xb1dd3c:  cmpq   $0x0,0xf0(%rax)
0xb1dd44:  fildll 0xf0(%rax)
0xb1dd4a:  js     0xb1ddf0
0xb1dd50:  mov    0xe1f54a(%rip),%eax
0xb1dd56:  mov    %rax,-0x80(%rsp)
0xb1dd5b:  fildll -0x80(%rsp)
0xb1dd5f:  fmulp  %st,%st(1)

 Not sure if it helps but rax after 0xb1dd35 contains the pointer to
 mmap'd memory of /dev/hpet

 I think this wouldn't work even with the latest kernel.  Emulation of
 x87 instructions is not supported yet.
 
 I'm confused. How could this program work? It produces similar assembly
 listing

The information you posted is not really enough to get the complete
picture (it is better to grab it from ftrace in the host, or from the
QEMU monitor), but my understanding is that the instruction at 0xb1dd44
doesn't refer to RAM; it refers to a memory-mapped I/O region.  In this
case, the instructions are not executed by the processor.  Instead, they
are emulated by the hypervisor.  KVM does not support emulation of x87
instructions.

Paolo

 -- >8 --
 #include <stdio.h>
 #include <stdint.h>
 
 uint64_t s_rtcClockPeriod = 10;
 uint64_t mc = 30;
 int main(int ac, char **av)
 {
 uint64_t value = (uint64_t)((long double)mc *
  (long double)s_rtcClockPeriod /
 10.0L);
 printf("%lu\n", value);
 return 0;
 }
 -- >8 --
 
 and the assembly I got is
 
 -- >8 --
 sub    $0x18,%rsp
 cmpq   $0x0,0x200adc(%rip)
 fildll 0x200ad6(%rip)
 js     0x4005f8 <main+184>
 cmpq   $0x0,0x200ac0(%rip)
 fildll 0x200aba(%rip)
 js     0x400612 <main+210>
 fmulp  %st,%st(1)
 fdivs  0x1ac(%rip)
 flds   0x1aa(%rip)
 fxch   %st(1)
 fucomi %st(1),%st
 jae    0x4005c0 <main+128>
 fstp   %st(1)
 fnstcw 0x16(%rsp)
 ...
 -- >8 --
 



Re: Oracle RAC in libvirt+KVM environment

2013-08-20 Thread Paolo Bonzini
On 20/08/2013 08:00, Timon Wang wrote:
 <disk type='file' device='disk'>
   <driver name='qemu' type='raw' cache='none'/>
   <source file='/home/images/win2008_2_sys'/>
   <target dev='hda' bus='ide'/>
   <boot order='3'/>
   <address type='drive' controller='0' bus='0' target='0' unit='0'/>
 </disk>
 <disk type='file' device='cdrom'>
   <driver name='qemu' type='raw'/>
   <source file='/home/isos/windows2008_64r2.iso'/>
   <target dev='sdc' bus='ide'/>
   <readonly/>
   <boot order='1'/>
   <address type='drive' controller='0' bus='1' target='0' unit='0'/>
 </disk>
 <disk type='block' device='disk'>

I'm not sure this will be enough, but if you want passthrough to the
host device you should use device='lun' here.  However, you still would
not be able to issue SCSI reservations unless you run QEMU with the
CAP_SYS_RAWIO capability (using <disk ... rawio='yes'>).

Most important, it still would be unsafe to do this if the same device
is passed to multiple virtual machines on the same host.  You need to
have NPIV and create separate virtual HBAs.  Then each virtual machine
should get a separate virtual HBA.  Otherwise, persistent reservations
are not attached to a particular virtual machine, but generically to the
host.

   <driver name='qemu' type='raw'/>
   <source dev='/dev/fedora/q_disk'/>
   <target dev='sda' bus='virtio'/>

You are not exposing a virtio-scsi disk here.  You are exposing a
virtio-blk disk.  You can see this from the type='pci' address that
libvirt gave to the disk.

If you use bus='scsi', you will see that libvirt will use type='drive'
for the address.
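For example, the same disk attached to the virtio-scsi controller would look roughly like this (a sketch based on the disk above):

```xml
<disk type='block' device='disk'>
  <driver name='qemu' type='raw'/>
  <source dev='/dev/fedora/q_disk'/>
  <target dev='sda' bus='scsi'/>
  <shareable/>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
```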

 <controller type='scsi' index='0' model='virtio-scsi'>
   <address type='pci' domain='0x0000' bus='0x00' slot='0x07'
 function='0x0'/>
 </controller>

This is okay.

   <qemu:commandline>
 <qemu:arg value='-rtc-td-hack'/>
   </qemu:commandline>

FWIW, this can be replaced with

  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
  </clock>

(you already have a <clock> element, but no <timer> element inside it).

Paolo

 </domain>
 
 
 
 On 8/19/13, Paolo Bonzini pbonz...@redhat.com wrote:
 On 15/08/2013 12:01, Timon Wang wrote:
 Thanks.

 I have read the link you provide, there is another link which tells me
 to pass a NPIV discovery lun as a disk, this is seen as a local direct
 access disk in windows. RAC and Failure Cluster both consider this
 pass through disk as local disk, not a share disk, and the setup
 process failed.

 Hyper-v provides a virtual Fiber Channel implementation, so I
 wondering if kvm has the same solution like it.

 Can you include the XML file you are using for the domain?

 Paolo


 
 



Re: Multi Queue KVM Support

2013-08-20 Thread Paolo Bonzini
On 20/08/2013 05:21, Naor Shlomo wrote:
 Hi Paolo,
 
 The host is running CentOS release 6.3 (Final).
 I did yum upgrade libvirt and yum upgrade qemu-kvm a couple of days ago 
 and ended up with these versions.
 
 What do you suggest regarding qemu? compile 6.5 or later myself?

RHEL/CentOS 6.5 is not yet out, it's still a few months before it's
released.

You can compile QEMU 1.6 from source, or wait for CentOS to have the
feature.
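A from-source build goes roughly like this. The version number and configure options below are illustrative, not a prescription:

```shell
# Illustrative release; pick the current tarball from the QEMU site.
wget http://wiki.qemu-project.org/download/qemu-1.6.0.tar.bz2
tar xjf qemu-1.6.0.tar.bz2
cd qemu-1.6.0
./configure --target-list=x86_64-softmmu --enable-kvm
make
sudo make install    # installs qemu-system-x86_64 under /usr/local by default
```

Note that libvirt will keep using whatever binary the domain's <emulator> element points at, so the domain XML may need updating to the new path.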

Paolo

 I appreciate your help,
 Naor
 
 -Original Message-
 From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of Paolo 
 Bonzini
 Sent: Monday, August 19, 2013 11:22 PM
 To: Naor Shlomo
 Cc: kvm@vger.kernel.org
 Subject: Re: Multi Queue KVM Support
 
 On 19/08/2013 13:29, Naor Shlomo wrote:
 Hello experts,

 I am trying to use the multi queue support on a Linux guest running Kernel 
 3.9.7.

 The host's virsh version command reports the following output:
 Compiled against library: libvirt 0.10.2
 Using library: libvirt 0.10.2
 Using API: QEMU 0.10.2
 Running hypervisor: QEMU 0.12.1
 
 Is it RHEL or CentOS or Scientific Linux, or something else?  If RHEL/CentOS, 
 what release?
 
 The problem is that virtio_has_feature(vdev, VIRTIO_NET_F_MQ) returns FALSE 
 and I don't know why.
 
 This version of QEMU is too old.  It's possible that 6.5 will have 
 multiqueue, but I'm not entirely sure.
 
 Paolo
 



Re: Oracle RAC in libvirt+KVM environment

2013-08-20 Thread Timon Wang
On Tue, Aug 20, 2013 at 4:33 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 On 20/08/2013 08:00, Timon Wang wrote:
  <disk type='file' device='disk'>
    <driver name='qemu' type='raw' cache='none'/>
    <source file='/home/images/win2008_2_sys'/>
    <target dev='hda' bus='ide'/>
    <boot order='3'/>
    <address type='drive' controller='0' bus='0' target='0' unit='0'/>
  </disk>
  <disk type='file' device='cdrom'>
    <driver name='qemu' type='raw'/>
    <source file='/home/isos/windows2008_64r2.iso'/>
    <target dev='sdc' bus='ide'/>
    <readonly/>
    <boot order='1'/>
    <address type='drive' controller='0' bus='1' target='0' unit='0'/>
  </disk>
  <disk type='block' device='disk'>

 I'm not sure this will be enough, but if you want passthrough to the
 host device you should use device='lun' here.  However, you still would
 not be able to issue SCSI reservations unless you run QEMU with the
 CAP_SYS_RAWIO capability (using <disk ... rawio='yes'>).


After changing the libvirt XML like this:
<disk type='block' device='lun' rawio='yes'>
  <driver name='qemu' type='raw' cache='none'/>
  <source dev='/dev/VM-IMAGES-BACKUP-DO-NOT-REMOVE/q_disk'/>
  <target dev='sda' bus='scsi'/>
  <shareable/>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
I got these errors:
char device redirected to /dev/pts/1 (label charserial0)
qemu-system-x86_64: -device
scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0:
scsi-block: INQUIRY failed
qemu-system-x86_64: -device
scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0:
Device 'scsi-block' could not be initialized


 Most important, it still would be unsafe to do this if the same device
 is passed to multiple virtual machines on the same host.  You need to
 have NPIV and create separate virtual HBAs.  Then each virtual machine
 should get a separate virtual HBA.  Otherwise, persistent reservations
 are not attached to a particular virtual machine, but generically to the
 host.

How can I use NPIV virtual HBAs in the libvirt XML configuration? I can
define the nodedev, but I have no idea how to pass the nodedev to the
VM.
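What I can do so far is define the vHBA node device, roughly like this (the parent HBA name and the WWNN/WWPN below are placeholders for my setup):

```xml
<!-- vhba.xml -->
<device>
  <parent>scsi_host5</parent>
  <capability type='scsi_host'>
    <capability type='fc_host'>
      <wwnn>20000000c9831b4b</wwnn>
      <wwpn>10000000c9831b4b</wwpn>
    </capability>
  </capability>
</device>
```

and create it with `virsh nodedev-create vhba.xml`, but I don't see how to reference the LUNs it discovers from the domain XML.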


    <driver name='qemu' type='raw'/>
    <source dev='/dev/fedora/q_disk'/>
    <target dev='sda' bus='virtio'/>

 You are not exposing a virtio-scsi disk here.  You are exposing a
 virtio-blk disk.  You can see this from the type='pci' address that
 libvirt gave to the disk.

 If you use bus='scsi', you will see that libvirt will use type='drive'
 for the address.

  <controller type='scsi' index='0' model='virtio-scsi'>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x07'
  function='0x0'/>
  </controller>

 This is okay.

    <qemu:commandline>
  <qemu:arg value='-rtc-td-hack'/>
    </qemu:commandline>

 FWIW, this can be replaced with

    <clock offset='localtime'>
  <timer name='rtc' tickpolicy='catchup'/>
    </clock>

 (you already have a <clock> element, but no <timer> element inside it).

Thanks for this tip.


 Paolo

 </domain>



 On 8/19/13, Paolo Bonzini pbonz...@redhat.com wrote:
 On 15/08/2013 12:01, Timon Wang wrote:
 Thanks.

 I have read the link you provide, there is another link which tells me
 to pass a NPIV discovery lun as a disk, this is seen as a local direct
 access disk in windows. RAC and Failure Cluster both consider this
 pass through disk as local disk, not a share disk, and the setup
 process failed.

 Hyper-v provides a virtual Fiber Channel implementation, so I
 wondering if kvm has the same solution like it.

 Can you include the XML file you are using for the domain?

 Paolo






-- 
Focus on: Server Virtualization, Network security, Scanner, NodeJS, Java, WWW
Blog: http://www.nohouse.net


Re: Oracle RAC in libvirt+KVM environment

2013-08-20 Thread Paolo Bonzini
On 20/08/2013 11:59, Timon Wang wrote:
 On Tue, Aug 20, 2013 at 4:33 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 On 20/08/2013 08:00, Timon Wang wrote:
 [disk XML snipped]

 I'm not sure this will be enough, but if you want passthrough to the
 host device you should use device='lun' here.  However, you still would
 not be able to issue SCSI reservations unless you run QEMU with the
 CAP_SYS_RAWIO capability (using <disk ... rawio='yes'>).

 
 After changing the libvirt XML like this:
 <disk type='block' device='lun' rawio='yes'>
   <driver name='qemu' type='raw' cache='none'/>
   <source dev='/dev/VM-IMAGES-BACKUP-DO-NOT-REMOVE/q_disk'/>
   <target dev='sda' bus='scsi'/>
   <shareable/>
   <address type='drive' controller='0' bus='0' target='0' unit='0'/>
 </disk>
 I got these errors:
 char device redirected to /dev/pts/1 (label charserial0)
 qemu-system-x86_64: -device
 scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0:
 scsi-block: INQUIRY failed
 qemu-system-x86_64: -device
 scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0:
 Device 'scsi-block' could not be initialized

Can you do

# ls -l /dev/VM-IMAGES-BACKUP-DO-NOT-REMOVE/q_disk
# sg_inq /dev/VM-IMAGES-BACKUP-DO-NOT-REMOVE/q_disk

?

Paolo



Re: Oracle RAC in libvirt+KVM environment

2013-08-20 Thread Paolo Bonzini
On 20/08/2013 12:42, Timon Wang wrote:
 [root@localhost /]# ls -l /dev/VM-IMAGES-BACKUP-DO-NOT-REMOVE/q_disk
 lrwxrwxrwx. 1 root root 8 Aug 20 17:38
 /dev/VM-IMAGES-BACKUP-DO-NOT-REMOVE/q_disk -> ../dm-13
 [root@localhost /]# sg_inq /dev/VM-IMAGES-BACKUP-DO-NOT-REMOVE/q_disk
 standard INQUIRY:
   PQual=0  Device_type=0  RMB=0  version=0x05  [SPC-3]
   [AERC=0]  [TrmTsk=0]  NormACA=0  HiSUP=0  Resp_data_format=0
   SCCS=1  ACC=0  TPGS=1  3PC=0  Protect=0  [BQue=0]
   EncServ=0  MultiP=0  [MChngr=0]  [ACKREQQ=0]  Addr16=0
   [RelAdr=0]  WBus16=1  Sync=1  Linked=0  [TranDis=0]  CmdQue=1
 length=36 (0x24)   Peripheral device type: disk
  Vendor identification: MacroSAN
  Product identification: LU
  Product revision level: 1.0
  Unit serial number: 0d9281ae-aea4-6da0--02180142b300
 
 This LUN is from a VG built on top of an iSCSI target.

If it is a logical volume, you cannot pass it as a LUN to the guest.
Only the whole iSCSI LUN can be passed as a LUN.
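For example, something along these lines, with the source pointing at the whole iSCSI device rather than the logical volume (the by-path name below is only a placeholder):

```xml
<disk type='block' device='lun' rawio='yes'>
  <driver name='qemu' type='raw' cache='none'/>
  <source dev='/dev/disk/by-path/ip-192.0.2.7:3260-iscsi-iqn.2013-08.example:storage-lun-1'/>
  <target dev='sda' bus='scsi'/>
  <shareable/>
</disk>
```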

Paolo

 [root@localhost /]# libvirtd --version
 libvirtd (libvirt) 1.0.5
 [root@localhost /]# qemu-kvm --version
 QEMU emulator version 1.4.1, Copyright (c) 2003-2008 Fabrice Bellard
 [root@localhost /]# uname -a
 Linux localhost.localdomain 3.9.2-301.fc19.x86_64 #1 SMP Mon May 13
 12:36:24 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
 
 
 On Tue, Aug 20, 2013 at 6:16 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 On 20/08/2013 11:59, Timon Wang wrote:
 On Tue, Aug 20, 2013 at 4:33 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 On 20/08/2013 08:00, Timon Wang wrote:
 [disk XML snipped]

 I'm not sure this will be enough, but if you want passthrough to the
 host device you should use device='lun' here.  However, you still would
 not be able to issue SCSI reservations unless you run QEMU with the
 CAP_SYS_RAWIO capability (using <disk ... rawio='yes'>).


 After changing the libvirt XML like this:
 <disk type='block' device='lun' rawio='yes'>
   <driver name='qemu' type='raw' cache='none'/>
   <source dev='/dev/VM-IMAGES-BACKUP-DO-NOT-REMOVE/q_disk'/>
   <target dev='sda' bus='scsi'/>
   <shareable/>
   <address type='drive' controller='0' bus='0' target='0' unit='0'/>
 </disk>
 I got these errors:
 char device redirected to /dev/pts/1 (label charserial0)
 qemu-system-x86_64: -device
 scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0:
 scsi-block: INQUIRY failed
 qemu-system-x86_64: -device
 scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0:
 Device 'scsi-block' could not be initialized

 Can you do

 # ls -l /dev/VM-IMAGES-BACKUP-DO-NOT-REMOVE/q_disk
 # sg_inq /dev/VM-IMAGES-BACKUP-DO-NOT-REMOVE/q_disk

 ?

 Paolo

 
 
 



RE: Multi Queue KVM Support

2013-08-20 Thread Naor Shlomo
Hi Paolo and thanks for your help.

I upgraded the following (compiled from source)
qemu : 1.5.2 stable
libvirt : 1.1.1

but for some reason when I run the version command inside virsh:

Compiled against library: libvirt 1.1.1
Using library: libvirt 1.1.1
Using API: QEMU 1.1.1
Running hypervisor: QEMU 0.12.1

It says that my running hypervisor is still QEMU 0.12.1.

Could you please tell me what I missed, and how to upgrade the hypervisor?

Thanks,
Naor

-Original Message-
From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of Paolo Bonzini
Sent: Tuesday, August 20, 2013 12:28 PM
To: Naor Shlomo
Cc: kvm@vger.kernel.org
Subject: Re: Multi Queue KVM Support

On 20/08/2013 05:21, Naor Shlomo wrote:
 Hi Paolo,
 
 The host is running CentOS release 6.3 (Final).
 I did yum upgrade libvirt and yum upgrade qemu-kvm a couple of days ago 
 and ended up with these versions.
 
 What do you suggest regarding qemu? compile 6.5 or later myself?

RHEL/CentOS 6.5 is not yet out, it's still a few months before it's released.

You can compile QEMU 1.6 from source, or wait for CentOS to have the feature.

Paolo

 I appreciate your help,
 Naor
 
 -Original Message-
 From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of 
 Paolo Bonzini
 Sent: Monday, August 19, 2013 11:22 PM
 To: Naor Shlomo
 Cc: kvm@vger.kernel.org
 Subject: Re: Multi Queue KVM Support
 
 On 19/08/2013 13:29, Naor Shlomo wrote:
 Hello experts,

 I am trying to use the multi queue support on a Linux guest running Kernel 
 3.9.7.

 The host's virsh version command reports the following output:
 Compiled against library: libvirt 0.10.2
 Using library: libvirt 0.10.2
 Using API: QEMU 0.10.2
 Running hypervisor: QEMU 0.12.1
 
 Is it RHEL or CentOS or Scientific Linux, or something else?  If RHEL/CentOS, 
 what release?
 
 The problem is that virtio_has_feature(vdev, VIRTIO_NET_F_MQ) returns FALSE 
 and I don't know why.
 
 This version of QEMU is too old.  It's possible that 6.5 will have 
 multiqueue, but I'm not entirely sure.
 
 Paolo
 



Re: Oracle RAC in libvirt+KVM environment

2013-08-20 Thread Timon Wang
Thanks, the whole iSCSI LUN has now been passed to the VM.

But when I test it with scsicmd, I find that the driver may not support
SPC-3; if I use the Microsoft iSCSI initiator instead, all of the
scsi3_test tests pass.

The tool can be found here:
http://www.symantec.com/business/support/index?page=contentid=TECH72086

Does this mean that the virtio-scsi Windows driver does not support the
SPC-3 feature? I have read a post about this which says that supporting
it would require changing both the implementation and the documents in
the virtio spec.

I am new to this list, so I don't know what the current situation is.

Would somebody please give me some advice on this?


On Tue, Aug 20, 2013 at 6:49 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 On 20/08/2013 12:42, Timon Wang wrote:
 [root@localhost /]# ls -l /dev/VM-IMAGES-BACKUP-DO-NOT-REMOVE/q_disk
 lrwxrwxrwx. 1 root root 8 Aug 20 17:38
 /dev/VM-IMAGES-BACKUP-DO-NOT-REMOVE/q_disk -> ../dm-13
 [root@localhost /]# sg_inq /dev/VM-IMAGES-BACKUP-DO-NOT-REMOVE/q_disk
 standard INQUIRY:
   PQual=0  Device_type=0  RMB=0  version=0x05  [SPC-3]
   [AERC=0]  [TrmTsk=0]  NormACA=0  HiSUP=0  Resp_data_format=0
   SCCS=1  ACC=0  TPGS=1  3PC=0  Protect=0  [BQue=0]
   EncServ=0  MultiP=0  [MChngr=0]  [ACKREQQ=0]  Addr16=0
   [RelAdr=0]  WBus16=1  Sync=1  Linked=0  [TranDis=0]  CmdQue=1
 length=36 (0x24)   Peripheral device type: disk
  Vendor identification: MacroSAN
  Product identification: LU
  Product revision level: 1.0
  Unit serial number: 0d9281ae-aea4-6da0--02180142b300

 This lun is from a vg build based on iscsi target.

 If it is a logical volume, you cannot pass it as a LUN to the guest.
 Only the whole iSCSI LUN can be passed as a LUN.

 Paolo

 [root@localhost /]# libvirtd --version
 libvirtd (libvirt) 1.0.5
 [root@localhost /]# qemu-kvm --version
 QEMU emulator version 1.4.1, Copyright (c) 2003-2008 Fabrice Bellard
 [root@localhost /]# uname -a
 Linux localhost.localdomain 3.9.2-301.fc19.x86_64 #1 SMP Mon May 13
 12:36:24 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux


 On Tue, Aug 20, 2013 at 6:16 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 On 20/08/2013 11:59, Timon Wang wrote:
 On Tue, Aug 20, 2013 at 4:33 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 On 20/08/2013 08:00, Timon Wang wrote:
 [disk XML snipped]

 I'm not sure this will be enough, but if you want passthrough to the
 host device you should use device='lun' here.  However, you still would
 not be able to issue SCSI reservations unless you run QEMU with the
 CAP_SYS_RAWIO capability (using <disk ... rawio='yes'>).


 After changing the libvirt XML like this:
 <disk type='block' device='lun' rawio='yes'>
   <driver name='qemu' type='raw' cache='none'/>
   <source dev='/dev/VM-IMAGES-BACKUP-DO-NOT-REMOVE/q_disk'/>
   <target dev='sda' bus='scsi'/>
   <shareable/>
   <address type='drive' controller='0' bus='0' target='0' unit='0'/>
 </disk>
 I got these errors:
 char device redirected to /dev/pts/1 (label charserial0)
 qemu-system-x86_64: -device
 scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0:
 scsi-block: INQUIRY failed
 qemu-system-x86_64: -device
 scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0:
 Device 'scsi-block' could not be initialized

 Can you do

 # ls -l /dev/VM-IMAGES-BACKUP-DO-NOT-REMOVE/q_disk
 # sg_inq /dev/VM-IMAGES-BACKUP-DO-NOT-REMOVE/q_disk

 ?

 Paolo








-- 
Focus on: Server Virtualization, Network Security, Scanner, NodeJS, JAVA, WWW
Blog: http://www.nohouse.net
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Oracle RAC in libvirt+KVM environment

2013-08-20 Thread Timon Wang
I found that when I use the scsicmd -d1 -s13 test command to test the
controller bus reset request, there is a blue screen on Windows 2008 R2.

The error code is:
BugCheck D1, {4, a, 0, f8800154dd06}

1: kd> !analyze -v
***
* *
*Bugcheck Analysis*
* *
***

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: 0004, memory referenced
Arg2: 000a, IRQL
Arg3: , value 0 = read operation, 1 = write operation
Arg4: f8800154dd06, address which referenced memory

Debugging Details:
--

Page 17c41 not present in the dump file. Type .hh dbgerr004 for details

READ_ADDRESS:  0004

CURRENT_IRQL:  a

FAULTING_IP:
vioscsi+1d06
f880`0154dd06 458b4804mov r9d,dword ptr [r8+4]

DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT

BUGCHECK_STR:  0xD1

PROCESS_NAME:  scsicmd.exe

TRAP_FRAME:  f880009f7670 -- (.trap 0xf880009f7670)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0002 rbx= rcx=fa800065e738
rdx=fa800065e8f8 rsi= rdi=
rip=f8800154dd06 rsp=f880009f7800 rbp=fa800065e8f8
 r8=  r9= r10=fa80009155b0
r11=f880009f7848 r12= r13=
r14= r15=
iopl=0 nv up ei pl zr na po nc
vioscsi+0x1d06:
f880`0154dd06 458b4804mov r9d,dword ptr [r8+4]
ds:e630:0004=
Resetting default scope

LAST_CONTROL_TRANSFER:  from f800016ca469 to f800016caf00

STACK_TEXT:
f880`009f7528 f800`016ca469 : `000a
`0004 `000a ` :
nt!KeBugCheckEx
f880`009f7530 f800`016c90e0 : `
fa80`009151b0 fa80`0155a290 f880`01339110 :
nt!KiBugCheckDispatch+0x69
f880`009f7670 f880`0154dd06 : `0001
f880`0154dcec fa80`009151b0 f880`01323934 :
nt!KiPageFault+0x260
f880`009f7800 f880`0132abcf : fa80`009151b0
fa80`0065e8f8 fa80`0065e738 `0001 : vioscsi+0x1d06
f880`009f7850 f880`0154d971 : `0001
`0001 `002d5000 fa80`00925000 :
storport!StorPortSynchronizeAccess+0x4f
f880`009f7890 f880`01323a0c : fa80`0fb1
fa80`0155a200 `002d5000 fa80`01576010 : vioscsi+0x1971
f880`009f78d0 f880`01333adf : fa80`006eeb30
fa80`006e2070 ` `0801 :
storport!RaCallMiniportResetBus+0x1c
f880`009f7900 f880`01333b68 : fa80`0155a290
fa80`006b39f0 0040` ` :
storport!RaidAdapterResetBus+0x2f
f880`009f7950 f880`0136de0b : `20206f49
`0001 `0001 `20206f49 :
storport!RaidAdapterStorageResetBusIoctl+0x28
f880`009f7980 f880`0136d1d0 : f880`01339110
fa80`00915060 ` fa80`006e2070 : storport! ??
::NNGAKEGL::`string'+0x3c8
f880`009f79d0 f800`019e33a7 : fa80`006e2070
f880`009f7ca0 fa80`006e2070 fa80`0155a290 :
storport!RaDriverDeviceControlIrp+0x90
f880`009f7a10 f800`019e3c06 : `
` ` ` :
nt!IopXxxControlFile+0x607
f880`009f7b40 f800`016ca153 : `001aeb01
`0001 `001aeba0 f800`019db152 :
nt!NtDeviceIoControlFile+0x56
f880`009f7bb0 `77a2ff2a : `
` ` ` :
nt!KiSystemServiceCopyEnd+0x13
`001af1d8 ` : `
` ` ` : 0x77a2ff2a


STACK_COMMAND:  kb

FOLLOWUP_IP:
vioscsi+1d06
f880`0154dd06 458b4804mov r9d,dword ptr [r8+4]

SYMBOL_STACK_INDEX:  3

SYMBOL_NAME:  vioscsi+1d06

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: vioscsi

IMAGE_NAME:  vioscsi.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  5200724f

FAILURE_BUCKET_ID:  X64_0xD1_vioscsi+1d06

BUCKET_ID:  X64_0xD1_vioscsi+1d06

Followup: MachineOwner
-

On Tue, Aug 20, 2013 at 7:43 PM, Timon Wang timon...@gmail.com wrote:
 Thanks, the whole iSCSI LUN has been passed to the VM.

 But I tested it with scsicmd and found that the driver may not
 support SPC-3; but if I use it via the Microsoft iSCSI initiator, I can
 pass all the scsi3_test tests.

Re: Oracle RAC in libvirt+KVM environment

2013-08-20 Thread Paolo Bonzini
On 20/08/2013 13:43, Timon Wang wrote:
 Thanks, the whole iSCSI LUN has been passed to the VM.
 
 But I tested it with scsicmd and found that the driver may not
 support SPC-3; but if I use it via the Microsoft iSCSI initiator, I can
 pass all the scsi3_test tests.

If you are passing the LUN to the VM with device='lun', the driver and
VMM do not interpret any SCSI command.  You should see exactly the same
data as in the host, which includes support for SPC-3:

 [root@localhost /]# sg_inq /dev/VM-IMAGES-BACKUP-DO-NOT-REMOVE/q_disk
 standard INQUIRY:
   PQual=0  Device_type=0  RMB=0  version=0x05  [SPC-3]
   [AERC=0]  [TrmTsk=0]  NormACA=0  HiSUP=0  Resp_data_format=0
   SCCS=1  ACC=0  TPGS=1  3PC=0  Protect=0  [BQue=0]
   EncServ=0  MultiP=0  [MChngr=0]  [ACKREQQ=0]  Addr16=0
   [RelAdr=0]  WBus16=1  Sync=1  Linked=0  [TranDis=0]  CmdQue=1
 length=36 (0x24)   Peripheral device type: disk
  Vendor identification: MacroSAN
  Product identification: LU
  Product revision level: 1.0
  Unit serial number: 0d9281ae-aea4-6da0--02180142b300
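As a rough illustration of what sg_inq is decoding here: the SPC level it
reports comes straight from byte 2 of the 36-byte standard INQUIRY data (0x05
means SPC-3), and the vendor/product strings sit at fixed offsets. A minimal
decoder sketch in Python (function name and sample buffer are mine, not from
sg3_utils):

```python
# Hypothetical helper: decode the fields sg_inq prints from a raw 36-byte
# standard INQUIRY response. Byte offsets follow the SPC-3 standard INQUIRY
# data layout; 0x05 in the version byte means the device claims SPC-3.
VERSIONS = {0x03: "SPC", 0x04: "SPC-2", 0x05: "SPC-3", 0x06: "SPC-4"}

def decode_inquiry(buf: bytes) -> dict:
    assert len(buf) >= 36, "standard INQUIRY data is at least 36 bytes"
    return {
        "device_type": buf[0] & 0x1F,               # 0 = direct-access (disk)
        "version": VERSIONS.get(buf[2], hex(buf[2])),
        "vendor": buf[8:16].decode("ascii").strip(),   # T10 vendor id
        "product": buf[16:32].decode("ascii").strip(),
    }

# A made-up buffer matching the output above: disk, SPC-3, "MacroSAN" / "LU"
buf = (bytes([0x00, 0x00, 0x05, 0x02, 0x1F, 0, 0, 0])
       + b"MacroSAN" + b"LU".ljust(16) + b"1.0 ")
print(decode_inquiry(buf))
```

If the guest sees a different version byte than the host does here, something
in the stack is intercepting INQUIRY rather than passing it through.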

Can you try using a Linux VM and executing sg_inq in the VM?

Paolo



Re: Multi Queue KVM Support

2013-08-20 Thread Paolo Bonzini
On 20/08/2013 13:13, Naor Shlomo wrote:
 Hi Paolo and thanks for your help.
 
 I upgraded the following (compiled from source)
 qemu : 1.5.2 stable
 libvirt : 1.1.1
 
 but for some reason when I run the version command inside virsh:
 
 Compiled against library: libvirt 1.1.1
 Using library: libvirt 1.1.1
 Using API: QEMU 1.1.1
 Running hypervisor: QEMU 0.12.1
 
 It says that my running Hypervisor is QEMU 0.12.1
 
 Could you please tell me what I missed? How do I upgrade the hypervisor?

Not sure.  Adding the libvirt-users mailing list.

 Thanks,
 Naor
 
 -Original Message-
 From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of Paolo 
 Bonzini
 Sent: Tuesday, August 20, 2013 12:28 PM
 To: Naor Shlomo
 Cc: kvm@vger.kernel.org
 Subject: Re: Multi Queue KVM Support
 
 On 20/08/2013 05:21, Naor Shlomo wrote:
 Hi Paolo,

 The host is running CentOS release 6.3 (Final).
 I did yum upgrade libvirt and yum upgrade qemu-kvm a couple of days ago 
 and ended up with these versions.

 What do you suggest regarding qemu? Compile 6.5 or later myself?
 
 RHEL/CentOS 6.5 is not out yet; it will still be a few months before it is released.
 
 You can compile QEMU 1.6 from source, or wait for CentOS to have the feature.
 
 Paolo
 
 I appreciate your help,
 Naor

 -Original Message-
 From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of 
 Paolo Bonzini
 Sent: Monday, August 19, 2013 11:22 PM
 To: Naor Shlomo
 Cc: kvm@vger.kernel.org
 Subject: Re: Multi Queue KVM Support

 On 19/08/2013 13:29, Naor Shlomo wrote:
 Hello experts,

 I am trying to use the multi queue support on a Linux guest running Kernel 
 3.9.7.

 The host's virsh version command reports the following output:
 Compiled against library: libvirt 0.10.2
 Using library: libvirt 0.10.2
 Using API: QEMU 0.10.2
 Running hypervisor: QEMU 0.12.1

 Is it RHEL or CentOS or Scientific Linux, or something else?  If 
 RHEL/CentOS, what release?

 The problem is that virtio_has_feature(vdev, VIRTIO_NET_F_MQ) returns FALSE 
 and I don't know why.

 This version of QEMU is too old.  It's possible that 6.5 will have 
 multiqueue, but I'm not entirely sure.

 Paolo

 



RE: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

2013-08-20 Thread Zhanghaoyu (A)
  The QEMU command line (/var/log/libvirt/qemu/[domain name].log):
  LC_ALL=C PATH=/bin:/sbin:/usr/bin:/usr/sbin HOME=/ QEMU_AUDIO_DRV=none
  /usr/local/bin/qemu-system-x86_64 -name ATS1 -S -M pc-0.12 -cpu qemu32
  -enable-kvm -m 12288 -smp 4,sockets=4,cores=1,threads=1 -uuid
  0505ec91-382d-800e-2c79-e5b286eb60b5 -no-user-config -nodefaults
  -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/ATS1.monitor,server,nowait
  -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime
  -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
  -drive file=/opt/ne/vm/ATS1.img,if=none,id=drive-virtio-disk0,format=raw,cache=none
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
  -netdev tap,fd=20,id=hostnet0,vhost=on,vhostfd=21
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:e0:fc:00:0f:00,bus=pci.0,addr=0x3,bootindex=2
  -netdev tap,fd=22,id=hostnet1,vhost=on,vhostfd=23
  -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:e0:fc:01:0f:00,bus=pci.0,addr=0x4
  -netdev tap,fd=24,id=hostnet2,vhost=on,vhostfd=25
  -device virtio-net-pci,netdev=hostnet2,id=net2,mac=00:e0:fc:02:0f:00,bus=pci.0,addr=0x5
  -netdev tap,fd=26,id=hostnet3,vhost=on,vhostfd=27
  -device virtio-net-pci,netdev=hostnet3,id=net3,mac=00:e0:fc:03:0f:00,bus=pci.0,addr=0x6
  -netdev tap,fd=28,id=hostnet4,vhost=on,vhostfd=29
  -device virtio-net-pci,netdev=hostnet4,id=net4,mac=00:e0:fc:0a:0f:00,bus=pci.0,addr=0x7
  -netdev tap,fd=30,id=hostnet5,vhost=on,vhostfd=31
  -device virtio-net-pci,netdev=hostnet5,id=net5,mac=00:e0:fc:0b:0f:00,bus=pci.0,addr=0x9
  -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0
  -vnc *:0 -k en-us -vga cirrus -device i6300esb,id=watchdog0,bus=pci.0,addr=0xb
  -watchdog-action poweroff -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xa
 Which QEMU version is this? Can you try with e1000 NICs instead of virtio?
 
 This QEMU version is 1.0.0, but I also tested QEMU 1.5.2 and the same problems
 exist, including the performance degradation and the readonly GFNs' flooding.
 I also tried e1000 NICs instead of virtio (with QEMU 1.5.2); the performance
 degradation and the readonly GFNs' flooding still occur.
 With either e1000 or virtio NICs, the GFNs' flooding is initiated at the
 post-restore stage (i.e. the running stage): as soon as the restore completes,
 the flooding starts.
 
 Thanks,
 Zhang Haoyu
 
 --
   Gleb.
 
 Should we focus on the first bad
 commit (612819c3c6e67bac8fceaa7cc402f13b1b63f7e4) and the surprising GFNs'
 flooding?
 
Not really. There is no point in debugging a very old version compiled
with kvm-kmod; there are too many variables in the environment. I cannot
reproduce the GFN flooding on upstream, so the problem may be gone, may
be a result of a kvm-kmod problem, or may be something different in how I
invoke qemu. So the best way to proceed is for you to reproduce with the
upstream version; then at least I will be sure that we are using the same code.

Thanks, I will test the combos of upstream kvm kernel and upstream qemu.
Also, the guest OS version I gave above was wrong; the currently running guest
OS is SLES10SP4.

I tested below combos of qemu and kernel,
+-+-+-+
|  kvm kernel |  QEMU   |   test result   |
+-+-+-+
|  kvm-3.11-2 |   qemu-1.5.2|  GOOD   |
+-+-+-+
|  SLES11SP2  |   qemu-1.0.0|  BAD|
+-+-+-+
|  SLES11SP2  |   qemu-1.4.0|  BAD|
+-+-+-+
|  SLES11SP2  |   qemu-1.4.2|  BAD|
+-+-+-+
|  SLES11SP2  | qemu-1.5.0-rc0  |  GOOD   |
+-+-+-+
|  SLES11SP2  |   qemu-1.5.0|  GOOD   |
+-+-+-+
|  SLES11SP2  |   qemu-1.5.1|  GOOD   |
+-+-+-+
|  SLES11SP2  |   qemu-1.5.2|  GOOD   |
+-+-+-+
NOTE:
1. kvm-3.11-2 in the table above is the tagged kernel downloaded in full from
https://git.kernel.org/pub/scm/virt/kvm/kvm.git
2. SLES11SP2's kernel version is 3.0.13-0.27

Then I git bisected the qemu changes between qemu-1.4.2 and qemu-1.5.0-rc0,
marking the good versions as bad and the bad versions as good, so that the
first bad commit is exactly the patch which fixes the degradation problem.
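The inverted-bisect trick used here can be modeled in a few lines: git bisect
converges on the first commit where a predicate flips from "good" to "bad", so
swapping the labels makes it converge on the fixing commit instead. A toy
Python sketch (commit names are made up):

```python
# Toy model of git bisect: find the first commit in history where
# is_bad() flips from False to True, given is_bad(first)==False and
# is_bad(last)==True. To locate the commit that *fixes* a bug, invert
# the labels: report "bad" whenever the build is actually good.
def bisect_first(commits, is_bad):
    lo, hi = 0, len(commits) - 1    # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]

history = ["c1", "c2", "c3", "c4-fix", "c5", "c6"]
still_broken = lambda c: history.index(c) < history.index("c4-fix")
# Inverted labels: a commit that is no longer broken is reported as "bad"
first_fixed = bisect_first(history, lambda c: not still_broken(c))
print(first_fixed)  # → c4-fix
```

This is exactly what marking qemu-1.4.2 as "good-labeled bad" and
qemu-1.5.0-rc0 as "bad-labeled good" achieves with the real tool.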
++---+-+-+
| bisect No. |  commit   |  save-restore   |
migration|

Re: Call for agenda for 2013-08-20

2013-08-20 Thread Juan Quintela
Juan Quintela quint...@redhat.com wrote:
 Hi

 Please, send any topic that you are interested in covering.


Call cancelled.

As this was the only topic, and neither Frederik nor Konrad is able to
attend today, the topic got moved to the next call in two weeks.


 Thanks, Juan.

 Agenda so far:
 - Talk about qemu reverse execution (1st description was done this week)
   How to handle IO when we want to do reverse execution.
   How does this relate to Kemari's needs?
   And to icount changes?

 Call details:

 10:00 AM to 11:00 AM EDT
 Every two weeks

 If you need phone number details,  contact me privately.


Re: [PATCH] target-ppc: Update slb array with correct index values.

2013-08-20 Thread Aneesh Kumar K.V
Alexander Graf ag...@suse.de writes:

 On 19.08.2013, at 09:25, Aneesh Kumar K.V wrote:

 Alexander Graf ag...@suse.de writes:
 
 On 11.08.2013, at 20:16, Aneesh Kumar K.V wrote:
 
 From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
  Without this, a value of rb=0 and rs=0 results in us replacing the 0th
  index
 
 Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
 Wrong mailing list again ;).
 
 Will post the series again with updated commit message to the qemu list.
 
 
 ---
 target-ppc/kvm.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)
 
 diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
 index 30a870e..5d4e613 100644
 --- a/target-ppc/kvm.c
 +++ b/target-ppc/kvm.c
 @@ -1034,8 +1034,18 @@ int kvm_arch_get_registers(CPUState *cs)
/* Sync SLB */
 #ifdef TARGET_PPC64
    for (i = 0; i < 64; i++) {
 -ppc_store_slb(env, sregs.u.s.ppc64.slb[i].slbe,
 -   sregs.u.s.ppc64.slb[i].slbv);
 +target_ulong rb  = sregs.u.s.ppc64.slb[i].slbe;
 +/*
  + * KVM_GET_SREGS doesn't return slb entry with slot information
 + * same as index. So don't depend on the slot information in
 + * the returned value.
 
 This is the generating code in book3s_pr.c:
 
     if (vcpu->arch.hflags & BOOK3S_HFLAG_SLB) {
         for (i = 0; i < 64; i++) {
             sregs->u.s.ppc64.slb[i].slbe = vcpu->arch.slb[i].orige | i;
             sregs->u.s.ppc64.slb[i].slbv = vcpu->arch.slb[i].origv;
         }
 
 Where exactly did you see broken slbe entries?
 
 
 I noticed this when adding support for guest memory dumping via qemu gdb
 server. Now the array we get would look like below
 
 slbe0 slbv0
 slbe1 slbv1
   0
   0

 Ok, so that's where the problem lies. Why are the entries 0 here?
 Either we try to fetch more entries than we should, we populate
 entries incorrectly or the kernel simply returns invalid SLB entry
 values for invalid entries.


The ioctl zeroes out sregs and fills only slb_max entries, so we find
zero-filled entries above slb_max. Also, we don't pass slb_max to user
space, so userspace has to look at all 64 entries.



 Are you seeing this with PR KVM or HV KVM?


HV KVM

-aneesh



Re: kvm/queue still ahead of kvm/next

2013-08-20 Thread Paolo Bonzini
On 09/08/2013 21:26, Paolo Bonzini wrote:
 Hi all,
 
 I'm seeing some breakage of shadow-on-shadow and shadow-on-EPT nested
 VMX.  Until I can track more precisely whether it is a regression, and
 on which hosts I can reproduce it, I'm going to leave the patches out of
 kvm/next.
 
 The good news is that nested EPT works pretty well. :)

Yeah, shadow-on-EPT doesn't work on at least the Westmere I tried, so
I'll merge kvm/queue to kvm/next soon.

Paolo


Re: [PATCH uq/master] kvm: Simplify kvm_handle_io

2013-08-20 Thread Paolo Bonzini
On 13/08/2013 14:43, Jan Kiszka wrote:
 Now that cpu_in/out is just a wrapper around address_space_rw, we can
 also call the latter directly. As host endianness == guest endianness,
 there is no need for the memory access helpers st*_p/ld*_p either.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
  kvm-all.c |   28 ++--
  1 files changed, 2 insertions(+), 26 deletions(-)
 
 diff --git a/kvm-all.c b/kvm-all.c
 index 716860f..c861354 100644
 --- a/kvm-all.c
 +++ b/kvm-all.c
 @@ -1499,32 +1499,8 @@ static void kvm_handle_io(uint16_t port, void *data, int direction, int size,
  uint8_t *ptr = data;
  
   for (i = 0; i < count; i++) {
 -if (direction == KVM_EXIT_IO_IN) {
 -switch (size) {
 -case 1:
 -stb_p(ptr, cpu_inb(port));
 -break;
 -case 2:
 -stw_p(ptr, cpu_inw(port));
 -break;
 -case 4:
 -stl_p(ptr, cpu_inl(port));
 -break;
 -}
 -} else {
 -switch (size) {
 -case 1:
 -cpu_outb(port, ldub_p(ptr));
 -break;
 -case 2:
 -cpu_outw(port, lduw_p(ptr));
 -break;
 -case 4:
 -cpu_outl(port, ldl_p(ptr));
 -break;
 -}
 -}
 -
  +address_space_rw(&address_space_io, port, ptr, size,
  + direction == KVM_EXIT_IO_OUT);
  ptr += size;
  }
  }
 

Applied, thanks.

Paolo
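The endianness argument in the commit message can be illustrated outside QEMU:
when the load and store sides agree on byte order, wrapping a value through
explicit-endian helpers is just an identity byte copy, so handing the buffer
straight to address_space_rw loses nothing. A toy sketch with stand-in names
(not QEMU's actual helpers):

```python
# Stand-ins for QEMU's ldl_p/stl_p memory-access helpers, pinned to one
# byte order to model "host endianness == guest endianness".
def ldl_p(buf: bytes) -> int:            # load a 32-bit value from the buffer
    return int.from_bytes(buf[:4], "little")

def stl_p(buf: bytearray, value: int):   # store a 32-bit value into the buffer
    buf[:4] = value.to_bytes(4, "little")

raw = bytearray(b"\x78\x56\x34\x12")     # bytes as they sit in the I/O buffer
out = bytearray(4)
stl_p(out, ldl_p(raw))                   # load-then-store round trip...
print(out == raw)                        # ...is a plain byte-for-byte copy → True
```

Only when the two sides disagree on byte order (TCG cross-endian emulation)
would the helpers actually transform the data.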


Re: [Qemu-devel] [PATCH] qemu-kvm bugfix for IA32_FEATURE_CONTROL

2013-08-20 Thread Paolo Bonzini
On 20/08/2013 05:33, Liu, Jinsong wrote:
 Thanks Andreas!
 
 This patch is for qemu-kvm.

Even though the repository is still called qemu-kvm, the uq/master
branch is the only active one and patches there will end up in upstream
QEMU.

There are no qemu-kvm releases anymore.

I applied the patch to uq/master, thanks.

Paolo

 Per my understanding, there are some patches that are first checked into the
 qemu-kvm uq/master branch.
 This patch fixes c/s 0779caeb1a17f4d3ed14e2925b36ba09b084fb7b of the qemu-kvm
 uq/master branch
 (which co-works with KVM IA32_FEATURE_CONTROL and is currently not yet in
 upstream qemu).
 
 This patch fixes the bug introduced by
 0779caeb1a17f4d3ed14e2925b36ba09b084fb7b of the qemu-kvm uq/master branch. The
 bug is reported at
 https://bugs.launchpad.net/qemu-kvm/+bug/1207623
 https://bugs.launchpad.net/qemu/+bug/1213797
 
 Is there anything I misunderstand about upstream qemu and qemu-kvm?



Re: Fix lapic time counter read for periodic mode

2013-08-20 Thread Paolo Bonzini
On 13/11/2012 21:40, Marcelo Tosatti wrote:
 On Tue, Nov 13, 2012 at 08:52:54AM +0100, Christian Ehrhardt wrote:
  
  Hi,
  
  thanks for your reply.
  
  On Mon, Nov 12, 2012 at 07:32:37PM -0200, Marcelo Tosatti wrote:
there is a bug in the emulation of the lapic time counter. In 
particular
what we are seeing is that the time counter of a periodic lapic timer
in the guest reads as zero 99% of the time. The patch below fixes 
that.

The emulation of the lapic timer is done with the help of a hires
timer that expires with the same frequency as the lapic counter.
New expiration times for a periodic timer are calculated 
incrementally
based on the last scheduled expiration time. This ensures long term
accuracy of the emulated timer close to that of the underlying clock.

The actual value of the lapic time counter is calculated from the
real time difference between current time and scheduled expiration 
time
of the hires timer. If this difference is negative, the hires timer
expired. For oneshot mode this is correctly translated into a zero 
value
for the time counter. However, in periodic mode we must use the 
negative
difference unmodified.

 regards   Christian

Fix lapic time counter read for periodic mode.
   
   In periodic mode the hrtimer is rearmed once expired, see
   apic_timer_fn. So _get_remaining should return proper value
   even if the guest is not able to process timer interrupts. 
   
   Can you describe your specific scenario in more detail?
  
  In my specific case, the host is admittedly somewhat special as it
  already is a rehosted version of linux, i.e. not running directly on
  native hardware. It is still unclear if the host has sufficiently accurate
  timer interrupts. This is most likely part of the problems we are seeing.
  
   However, AFAICS apic_timer_fn is only called once per jiffy (at least in
   some configurations). In particular, it is not called by
   hrtimer_get_remaining. Thus, depending on the frequency of the LAPIC timer
   in the guest, there might be _several_ iterations that are missed. This can
   probably be mitigated by hires timer interrupts. However, I think
   the problem is still there even in that case.
  
  Additionally, the behaviour that I want to establish matches that of the
  PIT timer (in a not completely obvious way, though).
  
  Having said that the proposed patch in my first mail is incomplete, as
  the mod_64 does not work correctly for negative values. A fixed version
  is below.
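   My reading of the counter-read logic being discussed, sketched in Python
   (names are mine, not the kernel's): the counter derives from
   scheduled expiry - now; a negative remainder means the hrtimer already
   fired, which should read as zero in one-shot mode but should be folded
   back into one period in periodic mode — and the mod has to map negative
   values into [0, period), which is where the first patch's mod_64 went
   wrong.

```python
def lapic_counter_ns(remaining_ns: int, period_ns: int, periodic: bool) -> int:
    """Sketch of the LAPIC current-count read (not the kernel's code).

    remaining_ns = scheduled hrtimer expiry - now; it can be negative when
    the hrtimer has already expired (and, in periodic mode, been rearmed).
    """
    if periodic:
        # Fold into [0, period); Python's % already handles negatives this way.
        return remaining_ns % period_ns
    return max(0, remaining_ns)          # one-shot: expired reads as zero

print(lapic_counter_ns(-300, 1000, periodic=True))   # → 700
print(lapic_counter_ns(-300, 1000, periodic=False))  # → 0
```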
  
   regards Christian
  
  Signed-off-by: Christian Ehrhardt l...@c--e.de
 Alright. Please add a comment from the LAPIC documentation describing
 this behaviour (and a nice changelog). Thanks.
 

Christian, did you ever resubmit the patch?

Paolo


Re: [uq/master patch 2/2] kvm-all.c: max_cpus should not exceed KVM vcpu limit

2013-08-20 Thread Paolo Bonzini
On 12/08/2013 21:56, Marcelo Tosatti wrote:
 maxcpus, which specifies the maximum number of hotpluggable CPUs,
 should not exceed KVM's vcpu limit.
 
 Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
 
 Index: qemu/kvm-all.c
 ===
 --- qemu.orig/kvm-all.c
 +++ qemu/kvm-all.c
 @@ -1391,6 +1391,13 @@ int kvm_init(void)
  goto err;
  }
  
 +if (max_cpus > max_vcpus) {
 +ret = -EINVAL;
 +fprintf(stderr, "Number of max_cpus requested (%d) exceeds max cpus "
 +"supported by KVM (%d)\n", max_cpus, max_vcpus);
 +goto err;
 +}
 +
  s->vmfd = kvm_ioctl(s, KVM_CREATE_VM, 0);
  if (s->vmfd < 0) {
  #ifdef TARGET_S390X
 
 
 

I applied this patch to uq/master.  Thanks,

Paolo


Re: [uq/master PATCH] kvm: i386: fix LAPIC TSC deadline timer save/restore

2013-08-20 Thread Paolo Bonzini
On 19/08/2013 22:01, Marcelo Tosatti wrote:
 On Mon, Aug 19, 2013 at 08:57:58PM +0200, Paolo Bonzini wrote:
 On 19/08/2013 19:13, Marcelo Tosatti wrote:

 The configuration of the timer represented by MSR_IA32_TSCDEADLINE depends 
 on:

 - APIC LVT Timer register.
 - TSC value.

 Change the order to respect the dependency.

 Do you have a testcase?

 Paolo
 
 Autotest:
 
 python ConfigTest.py --guestname=RHEL.7  --driveformat=virtio_scsi
 --nicmodel=e1000 --mem=2048 --vcpu=4
 --testcase=timedrift..ntp.with_migration --nrepeat=10

Thanks, applied.

Paolo



KVM: x86: update masterclock when kvmclock_offset is calculated

2013-08-20 Thread Marcelo Tosatti

The offset to add to the hosts monotonic time, kvmclock_offset, is
calculated against the monotonic time at KVM_SET_CLOCK ioctl time.

Request a master clock update at this time, to reduce a potentially
unbounded difference between the values of the masterclock and
the clock value used to calculate kvmclock_offset.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: linux-2.6-kvmclock-fixes/arch/x86/kvm/x86.c
===
--- linux-2.6-kvmclock-fixes.orig/arch/x86/kvm/x86.c
+++ linux-2.6-kvmclock-fixes/arch/x86/kvm/x86.c
@@ -3806,6 +3806,7 @@ long kvm_arch_vm_ioctl(struct file *filp
delta = user_ns.clock - now_ns;
local_irq_enable();
kvm->arch.kvmclock_offset = delta;
+   kvm_gen_update_masterclock(kvm);
break;
}
case KVM_GET_CLOCK: {


[PATCH v3] vfio-pci: PCI hot reset interface

2013-08-20 Thread Alex Williamson
The current VFIO_DEVICE_RESET interface only maps to PCI use cases
where we can isolate the reset to the individual PCI function.  This
means the device must support FLR (PCIe or AF), PM reset on D3hot->D0
transition, device specific reset, or be a singleton device on a bus
for a secondary bus reset.  FLR does not have widespread support,
PM reset is not very reliable, and bus topology is dictated by the
system and device design.  We need to provide a means for a user to
induce a bus reset in cases where the existing mechanisms are not
available or not reliable.

This device specific extension to VFIO provides the user with this
ability.  Two new ioctls are introduced:
 - VFIO_DEVICE_PCI_GET_HOT_RESET_INFO
 - VFIO_DEVICE_PCI_HOT_RESET

The first provides the user with information about the extent of
devices affected by a hot reset.  This is essentially a list of
devices and the IOMMU groups they belong to.  The user may then
initiate a hot reset by calling the second ioctl.  We must be
careful that the user has ownership of all the affected devices
found via the first ioctl, so the second ioctl takes a list of file
descriptors for the VFIO groups affected by the reset.  Each group
must have IOMMU protection established for the ioctl to succeed.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

v2: Use PCI bus iterators.  Depends on pci_walk_slot() patch
v3: #include <linux/file.h> per Al Viro

 drivers/vfio/pci/vfio_pci.c |  280 +++
 include/uapi/linux/vfio.h   |   38 ++
 2 files changed, 317 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index cef6002..1dfec392 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -13,6 +13,7 @@
 
 #include <linux/device.h>
 #include <linux/eventfd.h>
+#include <linux/file.h>
 #include <linux/interrupt.h>
 #include <linux/iommu.h>
 #include <linux/module.h>
@@ -227,6 +228,104 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
return 0;
 }
 
+struct vfio_pci_walk_info {
+	int ret;
+	void *data;
+};
+
+static int vfio_pci_count_devs(struct pci_dev *pdev, void *data)
+{
+	struct vfio_pci_walk_info *walk = data;
+	int *count = walk->data;
+
+	(*count)++;
+	return walk->ret;
+}
+
+struct vfio_pci_fill_info {
+	int max;
+	int cur;
+	struct vfio_pci_dependent_device *devices;
+};
+
+static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
+{
+	struct vfio_pci_walk_info *walk = data;
+	struct vfio_pci_fill_info *fill = walk->data;
+	struct iommu_group *iommu_group;
+
+	if (fill->cur == fill->max) {
+		walk->ret = -EAGAIN; /* Something changed, try again */
+		return walk->ret;
+	}
+
+	iommu_group = iommu_group_get(&pdev->dev);
+	if (!iommu_group) {
+		walk->ret = -EPERM; /* Cannot reset non-isolated devices */
+		return walk->ret;
+	}
+
+	fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
+	fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus);
+	fill->devices[fill->cur].bus = pdev->bus->number;
+	fill->devices[fill->cur].devfn = pdev->devfn;
+	fill->cur++;
+	iommu_group_put(iommu_group);
+	return walk->ret;
+}
+
+struct vfio_pci_group_entry {
+	struct vfio_group *group;
+	int id;
+};
+
+struct vfio_pci_group_info {
+	int count;
+	struct vfio_pci_group_entry *groups;
+};
+
+static int vfio_pci_validate_devs(struct pci_dev *pdev, void *data)
+{
+	struct vfio_pci_walk_info *walk = data;
+	struct vfio_pci_group_info *info = walk->data;
+	struct iommu_group *group;
+	int id, i;
+
+	group = iommu_group_get(&pdev->dev);
+	if (!group) {
+		walk->ret = -EPERM;
+		return walk->ret;
+	}
+
+	id = iommu_group_id(group);
+
+	for (i = 0; i < info->count; i++)
+		if (info->groups[i].id == id)
+			break;
+
+	iommu_group_put(group);
+
+	if (i == info->count)
+		walk->ret = -EINVAL;
+
+	return walk->ret;
+}
+
+static int vfio_pci_for_each_slot_or_bus(struct pci_dev *pdev,
+					 int (*fn)(struct pci_dev *,
+						   void *data), void *data,
+					 bool slot)
+{
+	struct vfio_pci_walk_info info = { .ret = 0, .data = data };
+
+	if (slot)
+		pci_walk_slot(pdev->slot, fn, &info);
+	else
+		pci_walk_bus(pdev->bus, fn, &info);
+
+	return info.ret;
+}
+
 static long vfio_pci_ioctl(void *device_data,
   unsigned int cmd, unsigned long arg)
 {
@@ -407,10 +506,189 @@ static long vfio_pci_ioctl(void *device_data,
 
return ret;
 
-   } else if (cmd == VFIO_DEVICE_RESET)
+   } else if (cmd == 

[PATCH] vfio-pci: Use fdget() rather than eventfd_fget()

2013-08-20 Thread Alex Williamson
eventfd_fget() tests to see whether the file is an eventfd file, which
we then immediately pass to eventfd_ctx_fileget(), which again tests
whether the file is an eventfd file.  Simplify slightly by using
fdget() so that we only test that we're looking at an eventfd once.
fget() could also be used, but fdget() makes use of fget_light() for
another slight optimization.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 drivers/vfio/pci/vfio_pci_intrs.c |   18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 4bc704e..7507975 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -130,7 +130,7 @@ static int virqfd_enable(struct vfio_pci_device *vdev,
 void (*thread)(struct vfio_pci_device *, void *),
 void *data, struct virqfd **pvirqfd, int fd)
 {
-   struct file *file = NULL;
+   struct fd irqfd;
struct eventfd_ctx *ctx = NULL;
struct virqfd *virqfd;
int ret = 0;
@@ -149,13 +149,13 @@ static int virqfd_enable(struct vfio_pci_device *vdev,
	INIT_WORK(&virqfd->shutdown, virqfd_shutdown);
	INIT_WORK(&virqfd->inject, virqfd_inject);
 
-   file = eventfd_fget(fd);
-   if (IS_ERR(file)) {
-   ret = PTR_ERR(file);
+   irqfd = fdget(fd);
+   if (!irqfd.file) {
+   ret = -EBADF;
goto fail;
}
 
-   ctx = eventfd_ctx_fileget(file);
+   ctx = eventfd_ctx_fileget(irqfd.file);
if (IS_ERR(ctx)) {
ret = PTR_ERR(ctx);
goto fail;
@@ -187,7 +187,7 @@ static int virqfd_enable(struct vfio_pci_device *vdev,
	init_waitqueue_func_entry(&virqfd->wait, virqfd_wakeup);
	init_poll_funcptr(&virqfd->pt, virqfd_ptable_queue_proc);
 
-	events = file->f_op->poll(file, &virqfd->pt);
+	events = irqfd.file->f_op->poll(irqfd.file, &virqfd->pt);
 
/*
 * Check if there was an event already pending on the eventfd
@@ -202,7 +202,7 @@ static int virqfd_enable(struct vfio_pci_device *vdev,
 * Do not drop the file until the irqfd is fully initialized,
 * otherwise we might race against the POLLHUP.
 */
-   fput(file);
+   fdput(irqfd);
 
return 0;
 
@@ -210,8 +210,8 @@ fail:
	if (ctx && !IS_ERR(ctx))
eventfd_ctx_put(ctx);
 
-	if (file && !IS_ERR(file))
-   fput(file);
+   if (irqfd.file)
+   fdput(irqfd);
 
kfree(virqfd);
 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] vfio-pci: Use fdget() rather than eventfd_fget()

2013-08-20 Thread Al Viro
On Tue, Aug 20, 2013 at 01:18:07PM -0600, Alex Williamson wrote:
 eventfd_fget() tests to see whether the file is an eventfd file, which
 we then immediately pass to eventfd_ctx_fileget(), which again tests
 whether the file is an eventfd file.  Simplify slightly by using
 fdget() so that we only test that we're looking at an eventfd once.
 fget() could also be used, but fdget() makes use of fget_light() for
 another slight optimization.

Umm...

 @@ -210,8 +210,8 @@ fail:
  if (ctx && !IS_ERR(ctx))
   eventfd_ctx_put(ctx);
  
 - if (file && !IS_ERR(file))
 - fput(file);
 + if (irqfd.file)
 + fdput(irqfd);
  
   kfree(virqfd);

IMO it's a bad style; you have three failure exits leading here, and those
ifs are nothing but "how far did we get before we'd failed".

fail3:
eventfd_ctx_put(ctx);
fail2:
fdput(irqfd);
fail1:
kfree(virqfd);

is much easier to analyse.  It's a very common pattern and IME it's more
robust than this kind of flexibility...
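[Editor's note] The unwind pattern suggested above generalizes to any multi-step setup. Here is a minimal self-contained userspace sketch; the `acquire()` helper is a hypothetical stand-in for the kzalloc/fdget/eventfd_ctx_fileget steps in the patch, and the label names mirror the fail1/fail2/fail3 style:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical acquisition step: succeeds or fails on demand. */
static char *acquire(int ok)
{
    return ok ? malloc(16) : NULL;
}

/* Three-step setup in the fail-label style: the goto target encodes how
 * many resources are already held, so the unwind path needs no
 * "how far did we get" conditionals. */
static int setup(int ok_a, int ok_b, int ok_c)
{
    char *a, *b, *c;

    a = acquire(ok_a);
    if (!a)
        goto fail0;
    b = acquire(ok_b);
    if (!b)
        goto fail1;
    c = acquire(ok_c);
    if (!c)
        goto fail2;

    /* success path: released here only for brevity; a real caller
     * would keep the resources until teardown */
    free(c);
    free(b);
    free(a);
    return 0;

fail2:
    free(b);
fail1:
    free(a);
fail0:
    return -1;
}
```

Each failure exit falls through the later labels, so releases always happen in exact reverse order of acquisition, which is what makes the pattern easy to audit.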


Re: [PATCH-v3 1/4] idr: Percpu ida

2013-08-20 Thread Andrew Morton
On Fri, 16 Aug 2013 23:09:06 +0000 Nicholas A. Bellinger 
n...@linux-iscsi.org wrote:

 From: Kent Overstreet k...@daterainc.com
 
 Percpu frontend for allocating ids. With percpu allocation (that works),
 it's impossible to guarantee it will always be possible to allocate all
 nr_tags - typically, some will be stuck on a remote percpu freelist
 where the current job can't get to them.
 
 We do guarantee that it will always be possible to allocate at least
 (nr_tags / 2) tags - this is done by keeping track of which and how many
 cpus have tags on their percpu freelists. On allocation failure if
 enough cpus have tags that there could potentially be (nr_tags / 2) tags
 stuck on remote percpu freelists, we then pick a remote cpu at random to
 steal from.
 
 Note that there's no cpu hotplug notifier - we don't care, because
 steal_tags() will eventually get the down cpu's tags. We _could_ satisfy
 more allocations if we had a notifier - but we'll still meet our
 guarantees and it's absolutely not a correctness issue, so I don't think
 it's worth the extra code.

 ...

  include/linux/idr.h |   53 +
  lib/idr.c   |  316 
 +--

I don't think this should be in idr.[ch] at all.  It has no
relationship with the existing code.  Apart from duplicating its
functionality :(

 
 ...

 @@ -243,4 +245,55 @@ static inline int ida_get_new(struct ida *ida, int *p_id)
  
  void __init idr_init_cache(void);
  
 +/* Percpu IDA/tag allocator */
 +
 +struct percpu_ida_cpu;
 +
 +struct percpu_ida {
 + /*
 +  * number of tags available to be allocated, as passed to
 +  * percpu_ida_init()
 +  */
 + unsigned		nr_tags;
 +
 + struct percpu_ida_cpu __percpu  *tag_cpu;
 +
 + /*
 +  * Bitmap of cpus that (may) have tags on their percpu freelists:
 +  * steal_tags() uses this to decide when to steal tags, and which cpus
 +  * to try stealing from.
 +  *
 +  * It's ok for a freelist to be empty when its bit is set - steal_tags()
 +  * will just keep looking - but the bitmap _must_ be set whenever a
 +  * percpu freelist does have tags.
 +  */
 + unsigned long   *cpus_have_tags;

Why not cpumask_t?

 + struct {
 + spinlock_t  lock;
 + /*
 +  * When we go to steal tags from another cpu (see steal_tags()),
 +  * we want to pick a cpu at random. Cycling through them every
 +  * time we steal is a bit easier and more or less equivalent:
 +  */
 + unsigned		cpu_last_stolen;
 +
 + /* For sleeping on allocation failure */
 + wait_queue_head_t   wait;
 +
 + /*
 +  * Global freelist - it's a stack where nr_free points to the
 +  * top
 +  */
 + unsigned		nr_free;
 + unsigned		*freelist;
 + } ____cacheline_aligned_in_smp;

Why the ____cacheline_aligned_in_smp?

 +};
 
 ...

 +
 +/* Percpu IDA */
 +
 +/*
 + * Number of tags we move between the percpu freelist and the global 
 freelist at
 + * a time

"between a percpu freelist" would be more accurate?

 + */
 +#define IDA_PCPU_BATCH_MOVE  32U
 +
 +/* Max size of percpu freelist, */
 +#define IDA_PCPU_SIZE	((IDA_PCPU_BATCH_MOVE * 3) / 2)
 +
 +struct percpu_ida_cpu {
 + spinlock_t  lock;
 + unsigned		nr_free;
 + unsigned		freelist[];
 +};

Data structure needs documentation.  There's one of these per cpu.  I
guess nr_free and freelist are clear enough.  The presence of a lock
in a percpu data structure is a surprise.  It's for cross-cpu stealing,
I assume?

 +static inline void move_tags(unsigned *dst, unsigned *dst_nr,
 +  unsigned *src, unsigned *src_nr,
 +  unsigned nr)
 +{
 + *src_nr -= nr;
 + memcpy(dst + *dst_nr, src + *src_nr, sizeof(unsigned) * nr);
 + *dst_nr += nr;
 +}
 +
 
 ...

 +static inline void alloc_global_tags(struct percpu_ida *pool,
 +  struct percpu_ida_cpu *tags)
 +{
 + move_tags(tags->freelist, &tags->nr_free,
 +   pool->freelist, &pool->nr_free,
 +   min(pool->nr_free, IDA_PCPU_BATCH_MOVE));
 +}

Document this function?

 +static inline unsigned alloc_local_tag(struct percpu_ida *pool,
 +struct percpu_ida_cpu *tags)
 +{
 + int tag = -ENOSPC;
 +
 + spin_lock(&tags->lock);
 + if (tags->nr_free)
 + tag = tags->freelist[--tags->nr_free];
 + spin_unlock(&tags->lock);
 +
 + return tag;
 +}

I guess this one's clear enough, if the data structure relationships are
understood.

 +/**
 + * percpu_ida_alloc - allocate a tag
 + * @pool: pool to allocate from
 + * @gfp: gfp flags
 + *
 + * Returns a tag - an integer in the 
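[Editor's note] The two-level freelist described in the commit message (a global stack plus per-cpu stacks, refilled a batch of tags at a time) can be modeled in a short single-threaded sketch. All names and sizes below are illustrative, and the per-freelist locking, cross-cpu stealing, and sleeping on exhaustion are deliberately omitted:

```c
#include <assert.h>
#include <string.h>

#define NR_TAGS    64
#define NR_CPUS    4
#define BATCH_MOVE 8    /* stands in for IDA_PCPU_BATCH_MOVE */

/* Toy model: every tag sits either on the global stack or on one
 * cpu's stack; nr counters point one past the top of each stack. */
static unsigned global_free[NR_TAGS], global_nr;
static unsigned cpu_free[NR_CPUS][NR_TAGS], cpu_nr[NR_CPUS];

static void pool_init(void)
{
    for (global_nr = 0; global_nr < NR_TAGS; global_nr++)
        global_free[global_nr] = global_nr;
    memset(cpu_nr, 0, sizeof(cpu_nr));
}

/* Same shape as move_tags() in the patch: pop nr off src, push onto dst. */
static void move_tags(unsigned *dst, unsigned *dst_nr,
                      unsigned *src, unsigned *src_nr, unsigned nr)
{
    *src_nr -= nr;
    memcpy(dst + *dst_nr, src + *src_nr, sizeof(unsigned) * nr);
    *dst_nr += nr;
}

static int tag_alloc(int cpu)
{
    if (!cpu_nr[cpu]) {
        unsigned nr = global_nr < BATCH_MOVE ? global_nr : BATCH_MOVE;

        if (!nr)
            return -1;  /* the real code would steal, then sleep */
        move_tags(cpu_free[cpu], &cpu_nr[cpu],
                  global_free, &global_nr, nr);
    }
    return cpu_free[cpu][--cpu_nr[cpu]];
}

static void tag_free(int cpu, unsigned tag)
{
    cpu_free[cpu][cpu_nr[cpu]++] = tag;  /* overflow back-off omitted */
}
```

After the first allocation pulls a batch from the global stack, later allocations and frees on that cpu stay entirely local; the patch adds per-freelist spinlocks and cross-cpu stealing on top of this to preserve the nr_tags/2 guarantee.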

Re: [PATCH v2] vhost: Include linux/uio.h instead of linux/socket.h

2013-08-20 Thread David Miller
From: Asias He as...@redhat.com
Date: Mon, 19 Aug 2013 09:23:19 +0800

 memcpy_fromiovec is moved from net/core/iovec.c to lib/iovec.c.
 linux/uio.h provides the declaration for memcpy_fromiovec.
 
 Include linux/uio.h instead of linux/socket.h for it.
 
 Signed-off-by: Asias He as...@redhat.com

Applied.


Re: Oracle RAC in libvirt+KVM environment

2013-08-20 Thread Timon Wang
From the fedora 19 host:
[root@fedora ~]# sg_inq /dev/sdc
standard INQUIRY:
  PQual=0  Device_type=0  RMB=0  version=0x05  [SPC-3]
  [AERC=0]  [TrmTsk=0]  NormACA=0  HiSUP=0  Resp_data_format=0
  SCCS=1  ACC=0  TPGS=1  3PC=0  Protect=0  [BQue=0]
  EncServ=0  MultiP=0  [MChngr=0]  [ACKREQQ=0]  Addr16=0
  [RelAdr=0]  WBus16=1  Sync=1  Linked=0  [TranDis=0]  CmdQue=1
length=36 (0x24)   Peripheral device type: disk
 Vendor identification: MacroSAN
 Product identification: LU
 Product revision level: 1.0
 Unit serial number: fd01ece6-8540-f4c7--fe170142b300

From the fedora 19 vm:
[root@fedoravm ~]# sg_inq /dev/sdb
standard INQUIRY:
  PQual=0  Device_type=0  RMB=0  version=0x05  [SPC-3]
  [AERC=0]  [TrmTsk=0]  NormACA=0  HiSUP=0  Resp_data_format=0
  SCCS=1  ACC=0  TPGS=1  3PC=0  Protect=0  [BQue=0]
  EncServ=0  MultiP=0  [MChngr=0]  [ACKREQQ=0]  Addr16=0
  [RelAdr=0]  WBus16=1  Sync=1  Linked=0  [TranDis=0]  CmdQue=1
length=36 (0x24)   Peripheral device type: disk
 Vendor identification: MacroSAN
 Product identification: LU
 Product revision level: 1.0
 Unit serial number: fd01ece6-8540-f4c7--fe170142b300

The results from the fedora 19 host and the fedora 19 vm are the same.
Does that mean the Windows VM SCSI pass-through driver I installed is wrong?
Or is there a tool like sg_inq for Windows 2008?
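[Editor's note] I'm not aware of a stock sg_inq for Windows, but the fields it prints come from fixed byte offsets in the 36-byte standard INQUIRY response defined by SPC-3, so any tool that can issue a raw INQUIRY can recover them. A minimal decoder follows; the sample buffer is hand-built to mimic the MacroSAN output above, not captured from a real device:

```c
#include <assert.h>
#include <string.h>

/* Fields sg_inq prints, at their SPC-3 standard INQUIRY offsets. */
struct inq_info {
    unsigned pqual;     /* byte 0, bits 7:5 */
    unsigned dev_type;  /* byte 0, bits 4:0 (0 = disk) */
    unsigned version;   /* byte 2 (0x05 = SPC-3) */
    char vendor[9];     /* bytes 8-15, ASCII, space padded */
    char product[17];   /* bytes 16-31, ASCII, space padded */
    char rev[5];        /* bytes 32-35, ASCII */
};

static void parse_std_inquiry(const unsigned char *d, struct inq_info *i)
{
    i->pqual    = d[0] >> 5;
    i->dev_type = d[0] & 0x1f;
    i->version  = d[2];
    memcpy(i->vendor,  d + 8,  8);  i->vendor[8]   = '\0';
    memcpy(i->product, d + 16, 16); i->product[16] = '\0';
    memcpy(i->rev,     d + 32, 4);  i->rev[4]      = '\0';
}

/* Hand-built sample resembling the MacroSAN LU above (not real data). */
static void sample_inquiry(unsigned char *d)
{
    memset(d, ' ', 36);
    d[0] = 0x00;                    /* PQual=0, device type 0 (disk) */
    d[2] = 0x05;                    /* SPC-3 */
    d[4] = 31;                      /* additional length = 36 - 5 */
    memcpy(d + 8,  "MacroSAN", 8);
    memcpy(d + 16, "LU", 2);
    memcpy(d + 32, "1.0", 3);
}
```

Note that the unit serial number sg_inq shows is not in these 36 bytes; it comes from a separate INQUIRY for VPD page 0x80.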


On Tue, Aug 20, 2013 at 8:09 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 On 20/08/2013 13:43, Timon Wang wrote:
 Thanks, the whole iSCSI LUN has been passed to the VM.

 But I tested it with scsicmd and found that the driver may not support
 SPC-3; however, if I use the Microsoft iSCSI initiator, I can pass all
 the scsi3_test tests.

 If you are passing the LUN to the VM with device='lun', the driver and
 VMM do not interpret any SCSI command.  You should see exactly the same
 data as in the host, which includes support for SPC-3:

 [root@localhost /]# sg_inq /dev/VM-IMAGES-BACKUP-DO-NOT-REMOVE/q_disk
 standard INQUIRY:
   PQual=0  Device_type=0  RMB=0  version=0x05  [SPC-3]
   [AERC=0]  [TrmTsk=0]  NormACA=0  HiSUP=0  Resp_data_format=0
   SCCS=1  ACC=0  TPGS=1  3PC=0  Protect=0  [BQue=0]
   EncServ=0  MultiP=0  [MChngr=0]  [ACKREQQ=0]  Addr16=0
   [RelAdr=0]  WBus16=1  Sync=1  Linked=0  [TranDis=0]  CmdQue=1
 length=36 (0x24)   Peripheral device type: disk
  Vendor identification: MacroSAN
  Product identification: LU
  Product revision level: 1.0
  Unit serial number: 0d9281ae-aea4-6da0--02180142b300

 Can you try using a Linux VM and executing sg_inq in the VM?

 Paolo




-- 
Focus on: Server Virtualization, Network Security, Scanner, NodeJS, JAVA, WWW
Blog: http://www.nohouse.net