from:"Pavel Fedin"

Re: [Qemu-devel] [libvirt] [RFC PATCH 0/2] ARM: add QMP command to query GIC version

2016-02-15 Thread Pavel Fedin

 Hello!

> I know Pavel Fedin was trying to revive kernel_irqchip=off once,
> but I don't know if that effort was abandoned or not.

 It should work with the latest kernel, at least i posted patches and all of 
them were applied. If nothing got broken during later
rewrites.
 The only missing part is generic timer support. There were problems with it, 
however, after rewrite, they can be clearly addressed,
without need for any hacks. The following patchset implements this on kernel 
side, but it has never been reviewed:
http://www.spinics.net/lists/kvm/msg124539.html. I also have qemu support in my 
experimental tree and it works great, i can run
"virt" guest on a Samsung's proprietary board with FrankenGIC, but since there 
was no interest, i never polished it up and
published.

> I think it
> could be a nice-to-have, in order to help isolate bugs with KVM,
> but I agree running that way wouldn't be the norm.

 IMHO it depends on what you want to achieve. If you strive for performance, 
then yes, of course. But, if you want to emulate some
particular hardware on another hardware, then this can be the only way to do it 
if, for example, you have GICv3-only hardware. KVM
without irqchip is still much better than TCG.

 But yes, i never included it into Libvirt. Once i was thinking about something 
like , but perhaps it's not good
idea because this option is not ARM-specific, it's architecture-agnostic and 
applicable to any KVM acceleration for IRQ controller.
Whether it works or not for the given platform, it is IMHO different story.

Kind regards,
Pavel Fedin
Senior Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [libvirt] ARM KVM GICv3 Support

2016-02-02 Thread Pavel Fedin

 Hello!

> Shouldn't the default be "host", to mean "whatever the host supports",
> rather than a specific version based either on host or QEMU probing?

 No, because:
1) Older qemu, which does not support this option, uses v2.
2) It also depends on whether KVM or TCG is used. Currently we can have v3 only 
with KVM, but when software emulation is implemented, it can be used in all 
cases, regardless of host KVM restrictions.

> That should work for every QEMU version, right?

 Wrong. See (1) above.

Kind regards,
Pavel Fedin
Senior Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [RFC 6/7] hw: arm: virt: register reserved IOVA region

2016-01-27 Thread Pavel Fedin

 Hello!

> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 3839c68..7eaf8be 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -125,6 +125,7 @@ static const MemMapEntry a15memmap[] = {
>  [VIRT_GPIO] =   { 0x0903, 0x1000 },
>  [VIRT_SECURE_UART] ={ 0x0904, 0x1000 },
>  [VIRT_MMIO] =   { 0x0a00, 0x0200 },
> +[VIRT_RESERVED] =   { 0x0be0, 0x0010 },

 Looks like with this approach we would need to add this to all machine models 
which make use of PCI. But is it a good idea? As far
as i understand, the only requirement for this region is not to clash with 
guest RAM addresses. So, can we instead have some code,
which automatically finds some place, based on the size? For now we hardcode 
the size to 0x0010, but in future we could query
the host for the size, because it's still host's MSI controller.

Kind regards,
Pavel Fedin
Senior Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH v1 4/5] kvm/x86: Hyper-V VMBus hypercall userspace exit

2016-01-20 Thread Pavel Fedin

 Hello!

> Because, as the docs say, we don't want to do that.

 Where do they say this? I remember that "we want to use KVM_EXIT_IO or 
KVM_EXIT_MMIO for handling device I/O". In other words - we
should not introduce anything that requires any other mechanism (e. g. 
hypercalls) to handle this. And that's all. By the way,
Hyper-V implementation contradicts this rule by itself. And i wonder why we 
need it at all, but, well, why not. So, Hyper-V violates
this rule only because it was designed by other people under different rules. 
And we have to follow these rules. So we have to
support hypercalls. And hypercalls cannot map to EXIT_IO or EXIT_MMIO, so...

>  We want to use
> KVM_EXIT_IO or KVM_EXIT_MMIO, with two exceptions: s390 and wherever we
> can't do that for compatibility purposes.  This is the latter.

 Yes, basically this is what i told above...

>  So we
> should not add a new exit (I suggested elsewhere the existing hyper-v
> exit)

 Yes, this would also be more consistent i think, if we think subsystem-centric 
("we are implementing Hyper-V") instead of
implementation-centric ("we are implementing hypercalls").

Kind regards,
Pavel Fedin
Senior Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH v1 4/5] kvm/x86: Hyper-V VMBus hypercall userspace exit

2016-01-14 Thread Pavel Fedin

 Hello!

> We thought reusing KVM_EXIT_HYPERCALL was a bad idea exactly because of
> that.  Hypercalls are not universal, the calling and return conventions
> are hypervisor-specific.

 Treatment of them is hypervisor-specific, but from CPUs point of view they are 
the same. You load something into registers, and
execute hypercall instruction. So, you just need to pass registers in your 
structure. Or, you could even use generic register access
APIs.

>  KVM already has to make the decision that the
> particular vmexit is a HyperV hypercall; it appears unnatural to then
> pass the data on to userspace in a generic structure and have them make
> that decision again.

 Is it so difficult to make such a decision? The userland already knows what we 
are emulating.
 I'm afraid that in future we can end up in having 10 versions of 
KVM_EXIT_xxx_HYPERCALL with very small difference between them.
Will it be good?

Kind regards,
Pavel Fedin
Senior Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH v1 4/5] kvm/x86: Hyper-V VMBus hypercall userspace exit

2016-01-14 Thread Pavel Fedin

 Hello!

> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -3359,6 +3359,14 @@ Hyper-V SynIC state change. Notification is used to 
> remap SynIC
>  event/message pages and to enable/disable SynIC messages/events processing
>  in userspace.
> 
> + /* KVM_EXIT_HYPERV_HCALL */
> + struct {
> + __u64 input;
> + __u64 params[2];
> + __u64 result;
> + } hv_hcall;
> +Indicates that the VCPU exits into userspace to process some guest
> +Hyper-V hypercalls.

 Why introducing this? We already have KVM_EXIT_HYPERCALL, which is exactly the 
same. AFAIK it's not used at the moment.
 Additionally, in theory we could have hypercalls handled in userspace for 
something else except HyperV. And not only for x86.

Kind regards,
Pavel Fedin
Senior Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] live migration vs device assignment (motivation)

2015-12-28 Thread Pavel Fedin

 Hello!

> A dedicated IRQ per device for something that is a system wide event
> sounds like a waste.  I don't understand why a spec change is strictly
> required, we only need to support this with the specific virtual bridge
> used by QEMU, so I think that a vendor specific capability will do.
> Once this works well in the field, a PCI spec ECN might make sense
> to standardise the capability.

 Keeping track of your discussion for some time, decided to jump in...
 So far, we want to have some kind of mailbox to notify the quest about 
migration. So what about some dedicated "pci device" for
this purpose? Some kind of "migration controller". This is:
a) perhaps easier to implement than capability, we don't need to push anything 
to PCI spec.
b) could easily make friendship with Windows, because this means that no bus 
code has to be touched at all. It would rely only on
drivers' ability to communicate with each other (i guess it should be possible 
in Windows, isn't it?)
c) does not need to steal resources (BARs, IRQs, etc) from the actual devices.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH v4 5/5] kvm/x86: Hyper-V kvm exit

2015-12-21 Thread Pavel Fedin

 Hello!

> >   It depends. Can i read about these hypercalls somewhere? Is there any 
> > documentation?
> I don't know about a documentation, but you can look at the code of
> Hyper-V hypercall handling inside KVM:
> 
> https://github.com/torvalds/linux/blob/master/arch/x86/kvm/hyperv.c#L346

 Aha, i see, so vmmcall CPU instruction is employed. Well, i believe this very 
well fits into the sematics of KVM_EXIT_HYPERCALL,
because it's a true hypercall.

> The code simply decodes hypercall parameters from vcpu registers then
> handle hypercall code in switch and encode return code inside vcpu
> registers. Probably encode and decode of hypercall parameters/return
> code can be done in QEMU so we need only some exit with parameter that
> this is Hyper-V hypercall and probably KVM_EXIT_HYPERCALL is good for it.

 Or you could even reuse the whole structure, it has all you need:

__u64 nr;   /* Reserved for x86, other 
architectures can use it, for example ARM "hvc #nr" */
__u64 args[6];  /* rax, rbx, rcx, rdx, rdi, rsi */
__u64 ret;
__u32 longmode; /* longmode; other architectures (like 
ARM64) can also make sense of it */

 Or you could put in struct kvm_regs instead of args and ret, and allow the 
userspace to manipulate it.

> But KVM_EXIT_HYPERCALL is not used inside KVM/QEMU so requires
> implementation.

 I guess your hypercalls to be introduced using KVM_EXIT_HYPERV are also not 
used inside qemu so require implementation :)

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH v4 5/5] kvm/x86: Hyper-V kvm exit

2015-12-21 Thread Pavel Fedin

 Hello!

> Yes, we can use  KVM_EXIT_REG_IO/MSR_IO for Hyper-V SynIC MSRS's changes
> and can even use only one MSR value . So union inside struct
> kvm_hyperv_exit is excessive.
> 
> But we still need Vcpu exit to handle VMBus hypercalls by QEMU to
> emulate VMBus devices inside QEMU.
> 
> And currently we are going to extend struct kvm_hyperv_exit
> to store Hyper-V VMBus hypercall parameters.

 Hm... Hypercalls, you say?
 We already have KVM_EXIT_HYPERCALL. Documentation says it's currently unused. 
Is it a leftover from ia64 KVM? Could we reuse it for
the purpose?

> but could we replace Hyper-V VMBus hypercall and it's parameters
> by KVM_EXIT_REG_IO/MSR_IO too?

 It depends. Can i read about these hypercalls somewhere? Is there any 
documentation?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH v4 5/5] kvm/x86: Hyper-V kvm exit

2015-12-20 Thread Pavel Fedin

 Hello!

 Replying to everything in one message.

> > As far as i understand this code, KVM_EXIT_HYPERV is called when one
> > of three MSRs are accessed. But, shouldn't we have implemented
> > instead something more generic, like KVM_EXIT_REG_IO, which would
> > work similar to KVM_EXIT_PIO or KVM_EXIT_MMIO, but carry register
> > code and value?
> 
> Yes, we considered that.  There were actually patches for this as well.

 Ah, i missed them, what a pity. There are lots of patches, i don't review them 
all. Actually i have noticed the change only after
it appeared in linux-next.

>  However, in this case the register is still emulated in the kernel, and
> userspace just gets informed of the new value.

 I see, but this doesn't change the semantic. All we need to do is to tell the 
userland that "register has been written".
Additionally to this we could do whatever we want, including caching the data 
in kernel, using it in kernel, and processing reads in
kernel.

> If we do get that, we will just rename KVM_EXIT_HYPERV to
> KVM_EXIT_MSR_ACCESS, and KVM_EXIT_HYPERV_SYNIC to
> KVM_EXIT_MSR_HYPERV_SYNIC, and struct kvm_hyperv_exit to kvm_msr_exit.

 Actually, i see this in more generic way, something like:

/* KVM_EXIT_REG_ACCESS */
struct {
__u64 reg;
__u64 data;
__u8  is_write;
} mmio;
 
 'data' and 'is_write' are self-explanatory, 'reg' would be generalized 
register code, the same as used for KVM_(GET|SET)_ONE_REG:
 - for ARM64: ARM64_SYS_REG(op0, op1, crn, crm, op2) - see
http://lxr.free-electrons.com/source/arch/arm64/include/uapi/asm/kvm.h#L189
 - for x86  : to be defined (i know, we don't use ..._ONE_REG operations here 
yet), like X86_MSR_REG(id), where the macro itself is:

#define X86_MSR_REG(id) (KVM_REG_X86 | KVM_REG_X86_MSR | 
KVM_REG_SIZE_U64 | (id))

 - for other architectures: to be defined in a similar way, once needed.

> On brief inspection of Andrey's patch (I have not been following
> closely) it looks like the kvm_hyperv_exit struct that's returned to
> userspace contains more data (control, evt_page, and msg_page fields)
> than simply the value of the MSR, so would the desired SynIC exit fit
> into a general-purpose exit for MSR emulation?

 I have looked at the code too, and these three fields are nothing more than 
values of respective MSR's:

case HV_X64_MSR_SCONTROL:
synic->control = data;
if (!host)
synic_exit(synic, msr);
break;



case HV_X64_MSR_SIEFP:
if (data & HV_SYNIC_SIEFP_ENABLE)
if (kvm_clear_guest(vcpu->kvm,
data & PAGE_MASK, PAGE_SIZE)) {
ret = 1;
break;
}
synic->evt_page = data;
if (!host)
synic_exit(synic, msr);
break;
case HV_X64_MSR_SIMP:
if (data & HV_SYNIC_SIMP_ENABLE)
if (kvm_clear_guest(vcpu->kvm,
data & PAGE_MASK, PAGE_SIZE)) {
ret = 1;
break;
}
synic->msg_page = data;
if (!host)
synic_exit(synic, msr);
break;

 So, every time one of these thee MSRs is written, we get a vmexit with values 
of all three registers, and that's all. We could
easily have 'synic_exit(synic, msr, data)' in all three cases, and i think the 
userspace could easily deal with proposed
KVM_EXIT_REG_ACCESS, just cache these values internally if needed.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH v4 5/5] kvm/x86: Hyper-V kvm exit

2015-12-18 Thread Pavel Fedin

 Hello!

 I realize that it's perhaps too late, because patches are already on 
Linux-next, but i have one concern... May be it's not too
late...

 I dislike implementing architecture-dependent exit code where we could 
implement an architecture-independent one.

 As far as i understand this code, KVM_EXIT_HYPERV is called when one of three 
MSRs are accessed. But, shouldn't we have implemented
instead something more generic, like KVM_EXIT_REG_IO, which would work similar 
to KVM_EXIT_PIO or KVM_EXIT_MMIO, but carry register
code and value?

 This would allow us to solve the same task which we have done here, but this 
solution would be reusable for other devices and other
archirectures. What if in future we have more system registers to emulate in 
userspace?

 I write this because at one point i suggested similar thing for ARM64 (but i 
never actually wrote it), to emulate physical CP15
timer. And it would require exactly the same capability - process some trapped 
system register accesses in userspace.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

[Qemu-devel] vhost_user and VIRTIO_NET_F_GUEST_ANNOUNCE

2015-12-10 Thread Pavel Fedin

 Hello!

 I am currently testing this patchset 
(http://dpdk.org/ml/archives/dev/2015-December/029193.html), and I've got a 
question.
 After the migration the guest has to announce its MAC address for the network 
to work. Guest OS can do it itself, or qemu can send
VHOST_USER_SEND_RARP to the vhost device. The problem arises because:
 1. DPDK currently does not implement VHOST_USER_SEND_RARP
 2. DPDK does not announce VIRTIO_NET_F_GUEST_ANNOUNCE feature, therefore qemu 
does not propagate it to the guest, and the quest
does not activate own announcing.

 The question is: why do we expect vhost-user to return 
VIRTIO_NET_F_GUEST_ANNOUNCE ? Does the device really need to have some
explicit support for it? As far as I can understand, there's no special 
handling to be done, the guest just sends GARP and that's
it. So why doesn't qemu turn on this feature by itself?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH] vfio: Align iova also to IOMMU page size

2015-12-09 Thread Pavel Fedin

 Hello!

> So I think in the end, the one page size we care about is the minimum
> IOMMU granularity.  We don't really care about the target page size at
> all and maybe we only care about the host page size for determining what
> might share a page with a sub-page mapping.

 Ok, so, in v2 i remove TARGET_PAGE_ALIGN and simply align to 
vfio_container_granularity. Would it be OK for now?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH] vfio: Align iova also to IOMMU page size

2015-12-07 Thread Pavel Fedin

 Hello!

> TAGET_PAGE_ALIGN tells us that it *could* be a valid DMA target though.
> The VM model is capable of using that as a page size, which means we
> assume it is and want to generate a fault.

 We seem to have looped back. So...
 It is possible to fix this according to this assumption. In this case we would 
need to make TARGET_PAGE_BITS a variable. If we are emulating ancient armv5te, 
it will be set to 10. For modern targets, ARMv6 and newer, it will be 12.
 Peter, would you ACK this approach? BTW, we even have a kind of hack in 
target-arm/cpu.h:
--- cut ---
#if defined(CONFIG_USER_ONLY)
#define TARGET_PAGE_BITS 12
#else
/* The ARM MMU allows 1k pages.  */
/* ??? Linux doesn't actually use these, and they're deprecated in recent
   architecture revisions.  Maybe a configure option to disable them.  */
#define TARGET_PAGE_BITS 10
#endif
--- cut ---

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH] vfio: Align iova also to IOMMU page size

2015-12-03 Thread Pavel Fedin

 Hello!

> >  My device defines this BAR to be of 2M size. In this case qemu splits it 
> > up into three
> > regions:
> >  1) Region below the MSI-X table (it's called "mmap", for me it's empty 
> > because table offset
> > is 0)
> >  2) MSI-X table itself (20 vectors = 0x0140 bytes for me).
> >  3) Region above the MSi-X table (it's called "mmap msix-hi"). Size for my 
> > case is 2M -
> > REAL_HOST_PAGE_ALIGN(0x140) = 0x1FF000.
> > Regions (1) and (3) are passed through and directly mapped by VFIO, region 
> > (2) is emulated.
> >  Now, we have PBA. The device says that PBA is placed in memory specified 
> > by the same BAR as
> > table, and its offset is 0x000f. PBA is also emulated by qemu, and as a 
> > result it overlaps
> > "mmap msix-hi" region, which breaks up into two fragments as a consequence:
> >  3a) offset 0x0 size 0x0EF000
> >  3b) offset 0xEF008 size 0x10FFF8
> >  PBA sits between these fragments, having only 8 bytes in size.
> >  Attempt to map (3b) causes the problem. vfio_mmap_region() is expected to 
> > align this up,
> > and it does, but it does it to a minimum possible granularity for ARM 
> > platform, which, as qemu
> > thinks, is always 1K.
> 
> Yep, on x86 we'd have target page size == host page size == minimum
> iommu page size and we'd align that up to something that can be mapped,
> which means we wouldn't have an iommu mapping for peer-to-peer in this
> space, but nobody is likely to notice.

 So, OK, to keep the quoting short... I've also read your previous reply about 
that "rounding up just works". Let's sum things up.

1. Rounding up "just works" in my case too. So i don't see a direct reason to 
change it.
2. The only problem is rounding up to 1K, which TARGET_PAGE_ALIGN() does on 
ARM. What we need is 4K,
   which is appropriate for host too. And, by the way, for modern ARM guests 
too. So, we can do either of:
   a) Keep my original approach - TARGET_PAGE_ALIGN(), then align to 
vfio_container_granularity().
   b) Just align to vfio_container_granularity() instead of using 
TARGET_PAGE_ALIGN().
   c) Use HOST_PAGE_ALIGN() instead of TARGET_PAGE_ALIGN() (what Peter 
suggested).

   On x86 there would be no difference, because all these alignments are 
identical. On ARM, actually, all of these approaches would also give correct 
result, because it's only TARGET_PAGE_ALIGN() wrong in this case. So - what do 
you think is the most appropriate? Let's make a choice and i'll send a proper 
patch.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [Qemu-arm] [PATCH v3 1/8] hw/arm/virt: Add a GPIO controller

2015-12-02 Thread Pavel Fedin

 Hello!

> PSCI handles the actions initiated from the inside of OS. Examples
> include system shutdown and hotplug (still inside OS). From this
> perspective PSCI works well. However this communication is
> one-direction: there isn't a way to communicate from the outside (e.g.
> libvirt) to the guest OS. For instance, "system_powerdown" qmp command
> won't work because guest OS can't receive the notification.

 Ah, so it is necessary for e.g. commanding a graceful shutdown using 
virt-manager's "shutdown" command? Good then, i've missed that.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH v3 1/8] hw/arm/virt: Add a GPIO controller

2015-12-01 Thread Pavel Fedin

 Hello!

> >> ACPI 5.0 supports GPIO-signaled ACPI Events. This can be used for
> >> powerdown, hotplug evnets. Add a GPIO controller in machine virt,
> > s/evnets/events/
> >
> >> to support powerdown, maybe can be used for cpu hotplug. And
> >> here we use pl061.

 Sorry for late jumping in, but this was the first message Cc'ed to me.
 With these devices virt machine IMHO goes farther and farther away from its 
initial goal: be a minimalistic virtual box, which ensures maximum possible 
compatibility and portability.
 virt machine already supports poweroff using PSCI interface. Why we need to 
add more hardware? Can't ACPI deal with PSCI?
 To tell the truth, i dislike ACPI + EFI thing at all. It looks like cramming 
PC-oriented firmware into architecture for which it was never meant to be 
written. Too much overcomplications, we drop already established things and 
reinvent a (triangular) wheel, but what's the purpose? Is it being done only 
because vendors want obscure proprietary firmware instead of old good u-boot?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

[Qemu-devel] BUG ALERT: vexpress model + KVM does not boot up with default settings

2015-11-30 Thread Pavel Fedin

 Hello!

 Unfortunately i don't have much time to investigate this and it is out of my 
project's scope, but from time to time i test vexpress
model on ARM32. And after the recent changes regarding TrustZone support 
vexpress model with direct kernel boot in KVM gets stuck at
this point:

--- cut ---
Uncompressing Linux... done, booting the kernel.
Booting Linux on physical CPU 0x0
Initializing cgroup subsys cpuset
Linux version 4.1.4+ (p.fedin@fedinw7x64) (gcc version 4.8.3 (Linaro GCC 
4.8-2014.04-1~dev) ) #5 SMP Wed Aug 5 18:47:36 MSK 2015
CPU: ARMv7 Processor [412fc0f3] revision 3 (ARMv7), cr=30c5387d
CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache
Machine model: V2P-CA15
bootconsole [earlycon0] enabled
Forcing write-allocate cache policy for SMP
Memory policy: Data cache writealloc
PERCPU: Embedded 12 pages/cpu @8fdd5000 s16704 r8192 d24256 u49152
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 65024
Kernel command line: console=ttyAMA0,115200n8 root=/dev/vda rw rootwait 
earlyprintk
PID hash table entries: 1024 (order: 0, 4096 bytes)
Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
Memory: 252256K/262144K available (5374K kernel code, 158K rwdata, 1448K 
rodata, 272K init, 153K bss, 9888K reserved, 0K
cma-reserved)
Virtual kernel memory layout:
vector  : 0x - 0x1000   (   4 kB)
fixmap  : 0xffc0 - 0xfff0   (3072 kB)
vmalloc : 0x9080 - 0xff00   (1768 MB)
lowmem  : 0x8000 - 0x9000   ( 256 MB)
modules : 0x7f00 - 0x8000   (  16 MB)
  .text : 0x80008000 - 0x806b1d28   (6824 kB)
  .init : 0x806b2000 - 0x806f6000   ( 272 kB)
  .data : 0x806f6000 - 0x8071dae0   ( 159 kB)
   .bss : 0x8071dae0 - 0x80743f78   ( 154 kB)
SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
Hierarchical RCU implementation.
RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=2.
RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
NR_IRQS:16 nr_irqs:16 16
L2C: failed to init: -19
NO_HZ: Clearing 0 from nohz_full range for timekeeping
NO_HZ: Full dynticks CPUs: 1.
Note: kernel parameter 'rcu_nocbs=' contains nonexistent CPUs.
Offload RCU callbacks from CPUs: 1.
Architected cp15 timer(s) running at 24.00MHz (virt).
clocksource arch_sys_counter: mask: 0xff max_cycles: 0x588fe9dc0, 
max_idle_ns: 440795202592 ns
sched_clock: 56 bits at 24MHz, resolution 41ns, wraps every 4398046511097ns
Switching to timer-based delay loop, resolution 41ns
sched_clock: 32 bits at 24MHz, resolution 41ns, wraps every 89478484971ns
clocksource arm,sp804: mask: 0x max_cycles: 0x, max_idle_ns: 
1911260446275 ns
Console: colour dummy device 80x30
Calibrating delay loop (skipped), value calculated using timer frequency.. 
48.00 BogoMIPS (lpj=24)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 1024 (order: 0, 4096 bytes)
Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes)
CPU: Testing write buffer coherency: ok
/cpus/cpu@0 missing clock-frequency property
/cpus/cpu@1 missing clock-frequency property
CPU0: thread -1, cpu 0, socket 0, mpidr 8000
Setting up static identity map for 0x80008300 - 0x80008358
--- cut ---

 The problem goes away if i add "secure=off" option. "virt" machine is not 
affected.
 By the way, is it legitimate to default to "secure=on" in KVM mode at all? We 
cannot have trustzone inside KVM.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [RFC PATCH v3 1/5] hw/intc: Implement ITS base class

2015-11-29 Thread Pavel Fedin

 Hello!

> > +/* Our two regions are always adjacent, therefore we now combine them
> > + * into a single one in order to make our users' life easier.
> > + */
> > +memory_region_init(&s->iomem_main, OBJECT(s), "gicv3_its", ITS_SIZE);
> > +memory_region_add_subregion(&s->iomem_main, 0, &s->iomem_its_cntrl);
> > +memory_region_add_subregion(&s->iomem_main, ITS_CONTROL_SIZE,
> > +&s->iomem_its);
> > +sysbus_init_mmio(sbd, &s->iomem_main);
> 
> So we have two memory subregions:
>  * the control register page, whose ops are passed in by the subclass
>  * the translation register page, whose only register is implemented
>here by looking up the subclass and calling its send_msi method
> 
> Does that complexity gain us anything later? There doesn't really
> seem to be anything much actually in common here, which would
> suggest just having the subclasses create their own mmio region(s)
> (which could then just directly implement the right behaviour for
> GITS_TRANSLATER).

 I started from this, but decided to move region creation into base class, 
because:
1. We always have this region, both for KVM and for SW implementation, and it 
operates in the same way.
2. We always need to do address validation, as well as have 
gicv3_its_trans_read() stub.
3. Also, in the next revision i'll implement 16-bit handling here.

 So, i decided not to duplicate these things.

> > +
> > +const char *its_class_name(void)
> > +{
> > +if (kvm_irqchip_in_kernel()) {
> > +#ifdef TARGET_AARCH64
> > +/* KVM implementation requires this capability */
> > +if (kvm_direct_msi_enabled()) {
> > +return "arm-its-kvm";
> > +}
> > +#endif
> 
> Why is this in an #ifdef? In theory we could support
> the GICv3 in a 32-bit system.

 Only for code consistency, because "arm-its-kvm" class is compiled only for 
TARGET_AARCH64, because only there we have the relevant kernel API definitions. 
So, i did not want to refer to nonexistent class.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [RFC PATCH v3 3/5] kvm_arm: Pass requester ID to MSI routing functions

2015-11-29 Thread Pavel Fedin

 Hello!

> > +route->u.msi.devid = pci_requester_id(dev);
> > +}
> 
> Is there anything that would go wrong if we just always set
> the u.msi.devid and the VALID_DEVID flag? (ie do we need the
> kvm_arm_msi_use_devid bool?)

 Current kernels always make sure that flags == 0, or they simply return 
-EINVAL. Therefore, for backwards compatibility we set the flag only if we know 
that we need it.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH v2] vhost: Fix aborting if KVM does not support eventfds

2015-11-25 Thread Pavel Fedin

 Hello!

> > Or better yet, actually implement host notifiers
> > in userspace - should not be that hard to do, just
> > do write into eventfd.
> 
> I sent this RFC patch a while back:

 Thank you very much, but Paolo already pointed it out for me, and i have 
picked it up:
https://lists.gnu.org/archive/html/qemu-devel/2015-11/msg04626.html

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH] vfio: Align iova also to IOMMU page size

2015-11-24 Thread Pavel Fedin

 Hello!

> There are a number of different interesting page sizes here:
>  * the host kernel page size
>  * the target CPU architecture's worst-case smallest page size
>  * the page size the guest kernel is actually using at the moment
>(consider a 4K-page guest kernel on a 64K-page host kernel)
> 
> These don't necessarily have to all be the same. I would
> expect VFIO to be interested in the host kernel page size,
> not TARGET_PAGE_ALIGN.

 So would the appropriate fix be just replacing TARGET_PAGE_ALIGN with 
HOST_PAGE_ALIGN?
 Alex: there's however, still no answer to "why are we aligning up, not down?" 
question..

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH] vfio: Align iova also to IOMMU page size

2015-11-24 Thread Pavel Fedin

 Hello!

> >  So, i've got this problem on ARM64. On ARM64 we actually can never have 1K 
> > pages. This page
> > size was supported only by old 32-bit ARM CPUs, up to ARMv5 IIRC, then it 
> > was dropped. Linux
> > OS never even used it.
> >  But, since qemu can emulate those ancient CPUs, TARGET_PAGE_BITS is 
> > defined to 10 for ARM.
> > And, ARM64 and ARM32 is actually the same target for qemu, so this is why 
> > we still get it.
> >  Perhaps, TARGET_PAGE_BITS should be a variable for ARM, and we should set 
> > it according to
> > the actual used CPU. Then this IOMMU alignment problem would disappear 
> > automatically. What do
> > you think?
> >  Cc'ed Peter since he is the main ARM guy here.
> 
> Do we only see these alignments when we're emulating those old 1k page
> processors?

 No. We see them when we are emulating brand new ARM64. And we see them only 
because TARGET_PAGE_BITS is still hardcoded to 10.

> If not, should we really be telling a 4k page VM about 1k
> aligned memory?

 Well, first of all, we are not telling about aligned memory at all. I've 
researched for the origin of this address, it appears when we are mapping MSI-X 
BAR of the VFIO device.
 My device defines this BAR to be of 2M size. In this case qemu splits it up 
into three regions:
 1) Region below the MSI-X table (it's called "mmap", for me it's empty because 
table offset is 0)
 2) MSI-X table itself (20 vectors = 0x0140 bytes for me).
 3) Region above the MSi-X table (it's called "mmap msix-hi"). Size for my case 
is 2M - REAL_HOST_PAGE_ALIGN(0x140) = 0x1FF000.
Regions (1) and (3) are passed through and directly mapped by VFIO, region (2) 
is emulated.
 Now, we have PBA. The device says that PBA is placed in memory specified by 
the same BAR as table, and its offset is 0x000f. PBA is also emulated by 
qemu, and as a result it overlaps "mmap msix-hi" region, which breaks up into 
two fragments as a consequence:
 3a) offset 0x0 size 0x0EF000
 3b) offset 0xEF008 size 0x10FFF8
 PBA sits between these fragments, having only 8 bytes in size.
 Attempt to map (3b) causes the problem. vfio_mmap_region() is expected to 
align this up, and it does, but it does it to a minimum possible granularity 
for ARM platform, which, as qemu thinks, is always 1K.

>  If that's all legit, maybe we should be aligning down
> rather than up, we know the host memory is at least 4k pages, so
> anything in the gap between those alignments should be backed by memory,
> right?

 You know, i also have this question. Why are we aligning up? What if the 
device tries to access parts that we skipped by our alignment?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

[Qemu-devel] [RFC PATCH v3 5/5] arm/virt: Add ITS to the virt board

2015-11-24 Thread Pavel Fedin

If supported by the configuration, ITS will be added automatically.

This patch also renames v2m_phandle to msi_phandle because it's now used
by both MSI implementations.

Signed-off-by: Pavel Fedin 
---
 hw/arm/virt.c | 47 +--
 1 file changed, 41 insertions(+), 6 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 9c6792c..2196bbe 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -70,7 +70,7 @@ typedef struct VirtBoardInfo {
 int fdt_size;
 uint32_t clock_phandle;
 uint32_t gic_phandle;
-uint32_t v2m_phandle;
+uint32_t msi_phandle;
 } VirtBoardInfo;
 
 typedef struct {
@@ -351,9 +351,22 @@ static void fdt_add_cpu_nodes(const VirtBoardInfo *vbi)
 }
 }
 
+static void fdt_add_its_gic_node(VirtBoardInfo *vbi)
+{
+vbi->msi_phandle = qemu_fdt_alloc_phandle(vbi->fdt);
+qemu_fdt_add_subnode(vbi->fdt, "/intc/its");
+qemu_fdt_setprop_string(vbi->fdt, "/intc/its", "compatible",
+"arm,gic-v3-its");
+qemu_fdt_setprop(vbi->fdt, "/intc/its", "msi-controller", NULL, 0);
+qemu_fdt_setprop_sized_cells(vbi->fdt, "/intc/its", "reg",
+ 2, vbi->memmap[VIRT_GIC_ITS].base,
+ 2, vbi->memmap[VIRT_GIC_ITS].size);
+qemu_fdt_setprop_cell(vbi->fdt, "/intc/its", "phandle", vbi->msi_phandle);
+}
+
 static void fdt_add_v2m_gic_node(VirtBoardInfo *vbi)
 {
-vbi->v2m_phandle = qemu_fdt_alloc_phandle(vbi->fdt);
+vbi->msi_phandle = qemu_fdt_alloc_phandle(vbi->fdt);
 qemu_fdt_add_subnode(vbi->fdt, "/intc/v2m");
 qemu_fdt_setprop_string(vbi->fdt, "/intc/v2m", "compatible",
 "arm,gic-v2m-frame");
@@ -361,7 +374,7 @@ static void fdt_add_v2m_gic_node(VirtBoardInfo *vbi)
 qemu_fdt_setprop_sized_cells(vbi->fdt, "/intc/v2m", "reg",
  2, vbi->memmap[VIRT_GIC_V2M].base,
  2, vbi->memmap[VIRT_GIC_V2M].size);
-qemu_fdt_setprop_cell(vbi->fdt, "/intc/v2m", "phandle", vbi->v2m_phandle);
+qemu_fdt_setprop_cell(vbi->fdt, "/intc/v2m", "phandle", vbi->msi_phandle);
 }
 
 static void fdt_add_gic_node(VirtBoardInfo *vbi, int type)
@@ -397,6 +410,26 @@ static void fdt_add_gic_node(VirtBoardInfo *vbi, int type)
 qemu_fdt_setprop_cell(vbi->fdt, "/intc", "phandle", vbi->gic_phandle);
 }
 
+static void create_its(VirtBoardInfo *vbi, DeviceState *gicdev)
+{
+const char *itsclass = its_class_name();
+DeviceState *dev;
+
+if (!itsclass) {
+/* Do nothing if not supported */
+return;
+}
+
+dev = qdev_create(NULL, itsclass);
+
+object_property_set_link(OBJECT(dev), OBJECT(gicdev), "parent-gicv3",
+ &error_abort);
+qdev_init_nofail(dev);
+sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, vbi->memmap[VIRT_GIC_ITS].base);
+
+fdt_add_its_gic_node(vbi);
+}
+
 static void create_v2m(VirtBoardInfo *vbi, qemu_irq *pic)
 {
 int i;
@@ -480,7 +513,9 @@ static void create_gic(VirtBoardInfo *vbi, qemu_irq *pic, 
int type, bool secure)
 
 fdt_add_gic_node(vbi, type);
 
-if (type == 2) {
+if (type == 3) {
+create_its(vbi, gicdev);
+} else {
 create_v2m(vbi, pic);
 }
 }
@@ -799,9 +834,9 @@ static void create_pcie(const VirtBoardInfo *vbi, qemu_irq 
*pic,
 qemu_fdt_setprop_cells(vbi->fdt, nodename, "bus-range", 0,
nr_pcie_buses - 1);
 
-if (vbi->v2m_phandle) {
+if (vbi->msi_phandle) {
 qemu_fdt_setprop_cells(vbi->fdt, nodename, "msi-parent",
-   vbi->v2m_phandle);
+   vbi->msi_phandle);
 }
 
 qemu_fdt_setprop_sized_cells(vbi->fdt, nodename, "reg",
-- 
2.4.4

[Qemu-devel] [RFC PATCH v3 4/5] kvm_arm: Implement support for ITS emulation by KVM

2015-11-24 Thread Pavel Fedin

This patch relies on new kernel API which is not released yet.

Signed-off-by: Pavel Fedin 
---
 hw/intc/Makefile.objs  |  1 +
 hw/intc/arm_gicv3_its_common.c |  2 +-
 hw/intc/arm_gicv3_its_kvm.c| 88 ++
 3 files changed, 90 insertions(+), 1 deletion(-)
 create mode 100644 hw/intc/arm_gicv3_its_kvm.c

diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
index 2d6543b..8d5 100644
--- a/hw/intc/Makefile.objs
+++ b/hw/intc/Makefile.objs
@@ -19,6 +19,7 @@ common-obj-$(CONFIG_OPENPIC) += openpic.o
 obj-$(CONFIG_APIC) += apic.o apic_common.o
 obj-$(CONFIG_ARM_GIC_KVM) += arm_gic_kvm.o
 obj-$(call land,$(CONFIG_ARM_GIC_KVM),$(TARGET_AARCH64)) += arm_gicv3_kvm.o
+obj-$(call land,$(CONFIG_ARM_GIC_KVM),$(TARGET_AARCH64)) += arm_gicv3_its_kvm.o
 obj-$(CONFIG_STELLARIS) += armv7m_nvic.o
 obj-$(CONFIG_EXYNOS4) += exynos4210_gic.o exynos4210_combiner.o
 obj-$(CONFIG_GRLIB) += grlib_irqmp.o
diff --git a/hw/intc/arm_gicv3_its_common.c b/hw/intc/arm_gicv3_its_common.c
index 873d62a..c040011 100644
--- a/hw/intc/arm_gicv3_its_common.c
+++ b/hw/intc/arm_gicv3_its_common.c
@@ -70,7 +70,7 @@ static MemTxResult gicv3_its_trans_write(void *opaque, hwaddr 
offset,
 if (offset == 0x0040) {
 GICv3ITSState *s = ARM_GICV3_ITS_COMMON(opaque);
 GICv3ITSCommonClass *c = ARM_GICV3_ITS_COMMON_GET_CLASS(s);
-int ret = c->send_msi(s, le32_to_cpu(value), attrs.stream_id);
+int ret = c->send_msi(s, le32_to_cpu(value), attrs.requester_id);
 
 if (ret) {
 qemu_log_mask(LOG_GUEST_ERROR,
diff --git a/hw/intc/arm_gicv3_its_kvm.c b/hw/intc/arm_gicv3_its_kvm.c
new file mode 100644
index 000..62323c6
--- /dev/null
+++ b/hw/intc/arm_gicv3_its_kvm.c
@@ -0,0 +1,88 @@
+/*
+ * KVM-based ITS implementation for a GICv3-based system
+ *
+ * Copyright (c) 2015 Samsung Electronics Co., Ltd.
+ * Written by Pavel Fedin 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "hw/intc/arm_gicv3_its_common.h"
+#include "sysemu/kvm.h"
+#include "kvm_arm.h"
+
+#define TYPE_KVM_ARM_ITS "arm-its-kvm"
+#define KVM_ARM_ITS(obj) OBJECT_CHECK(GICv3ITSState, (obj), TYPE_KVM_ARM_ITS)
+
+static int kvm_its_send_msi(GICv3ITSState *s, uint32_t value, uint16_t devid)
+{
+struct kvm_msi msi;
+
+msi.address_lo = 0x0040;
+msi.address_hi = 0;
+msi.data = value;
+msi.flags = KVM_MSI_VALID_DEVID;
+msi.devid = devid;
+memset(msi.pad, 0, sizeof(msi.pad));
+
+return kvm_vm_ioctl(kvm_state, KVM_SIGNAL_MSI, &msi);
+}
+
+static void kvm_arm_its_realize(DeviceState *dev, Error **errp)
+{
+GICv3ITSState *s = ARM_GICV3_ITS_COMMON(dev);
+
+gicv3_its_init_mmio(s, NULL);
+kvm_arm_register_device(&s->iomem_its_cntrl, -1, KVM_DEV_ARM_VGIC_GRP_ADDR,
+KVM_VGIC_V3_ADDR_TYPE_ITS, s->gicv3->dev_fd);
+
+kvm_arm_msi_use_devid = true;
+kvm_gsi_routing_allowed = kvm_has_gsi_routing();
+kvm_msi_via_irqfd_allowed = kvm_gsi_routing_allowed;
+}
+
+static void kvm_arm_its_init(Object *obj)
+{
+GICv3ITSState *s = KVM_ARM_ITS(obj);
+
+object_property_add_link(obj, "parent-gicv3",
+ "kvm-arm-gicv3", (Object **)&s->gicv3,
+ object_property_allow_set_link,
+ OBJ_PROP_LINK_UNREF_ON_RELEASE,
+ &error_abort);
+}
+
+static void kvm_arm_its_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+GICv3ITSCommonClass *icc = ARM_GICV3_ITS_COMMON_CLASS(klass);
+
+dc->realize = kvm_arm_its_realize;
+icc->send_msi = kvm_its_send_msi;
+}
+
+static const TypeInfo kvm_arm_its_info = {
+.name = TYPE_KVM_ARM_ITS,
+.parent = TYPE_ARM_GICV3_ITS_COMMON,
+.instance_size = sizeof(GICv3ITSState),
+.instance_init = kvm_arm_its_init,
+.class_init = kvm_arm_its_class_init,
+};
+
+static void kvm_arm_its_register_types(void)
+{
+type_register_static(&kvm_arm_its_info);
+}
+
+type_init(kvm_arm_its_register_types)
-- 
2.4.4

[Qemu-devel] [RFC PATCH v3 0/5] vITS support

2015-11-24 Thread Pavel Fedin

This series introduces support for in-kernel GICv3 ITS emulation.
It is based on kernel API which is not released yet, therefore i post
it as an RFC.

Kernel patch sets which implement this functionality are:
- [PATCH v3 00/16] KVM: arm64: GICv3 ITS emulation
  http://www.spinics.net/lists/kvm/msg121878.html
- [PATCH v3 0/7] KVM: arm/arm64: gsi routing support
  http://www.spinics.net/lists/kvm/msg119567.html

v2 => v3:
- Really added unmigratable flag, was overlooked in v2
- Fixed checkpatch issue with initializing static variable to zero

v1 => v2:
- Added registers and reset method
- Added unmigratable flag
- Rebased on top of current master, use kvm_arch_fixup_msi_route() now

Pavel Fedin (5):
  hw/intc: Implement ITS base class
  kernel: Add vGICv3 ITS definitions
  kvm_arm: Pass requester ID to MSI routing functions
  kvm_arm: Implement support for ITS emulation by KVM
  arm/virt: Add ITS to the virt board

 hw/arm/virt.c  |  47 --
 hw/intc/Makefile.objs  |   2 +
 hw/intc/arm_gicv3_its_common.c | 155 +
 hw/intc/arm_gicv3_its_kvm.c|  88 +++
 include/hw/intc/arm_gicv3_its_common.h |  72 +++
 linux-headers/asm-arm64/kvm.h  |   1 +
 linux-headers/linux/kvm.h  |   9 +-
 target-arm/kvm.c   |   6 ++
 target-arm/kvm_arm.h   |  13 +++
 target-arm/machine.c   |  16 
 10 files changed, 401 insertions(+), 8 deletions(-)
 create mode 100644 hw/intc/arm_gicv3_its_common.c
 create mode 100644 hw/intc/arm_gicv3_its_kvm.c
 create mode 100644 include/hw/intc/arm_gicv3_its_common.h

-- 
2.4.4

[Qemu-devel] [RFC PATCH v3 1/5] hw/intc: Implement ITS base class

2015-11-24 Thread Pavel Fedin

This is the basic skeleton for both KVM and software-emulated ITS.
Since we already prepare status structure, we also introduce complete
VMState description. But, because we currently have no migratable
implementations, we also set unmigratable flag.

Signed-off-by: Pavel Fedin 
---
 hw/intc/Makefile.objs  |   1 +
 hw/intc/arm_gicv3_its_common.c | 155 +
 include/hw/intc/arm_gicv3_its_common.h |  72 +++
 target-arm/kvm_arm.h   |  10 +++
 target-arm/machine.c   |  16 
 5 files changed, 254 insertions(+)
 create mode 100644 hw/intc/arm_gicv3_its_common.c
 create mode 100644 include/hw/intc/arm_gicv3_its_common.h

diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
index 004b0c2..2d6543b 100644
--- a/hw/intc/Makefile.objs
+++ b/hw/intc/Makefile.objs
@@ -13,6 +13,7 @@ common-obj-$(CONFIG_ARM_GIC) += arm_gic_common.o
 common-obj-$(CONFIG_ARM_GIC) += arm_gic.o
 common-obj-$(CONFIG_ARM_GIC) += arm_gicv2m.o
 common-obj-$(CONFIG_ARM_GIC) += arm_gicv3_common.o
+common-obj-$(CONFIG_ARM_GIC) += arm_gicv3_its_common.o
 common-obj-$(CONFIG_OPENPIC) += openpic.o
 
 obj-$(CONFIG_APIC) += apic.o apic_common.o
diff --git a/hw/intc/arm_gicv3_its_common.c b/hw/intc/arm_gicv3_its_common.c
new file mode 100644
index 000..873d62a
--- /dev/null
+++ b/hw/intc/arm_gicv3_its_common.c
@@ -0,0 +1,155 @@
+/*
+ * ITS base class for a GICv3-based system
+ *
+ * Copyright (c) 2015 Samsung Electronics Co., Ltd.
+ * Written by Pavel Fedin
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "hw/pci/msi.h"
+#include "hw/intc/arm_gicv3_its_common.h"
+
+static void gicv3_its_pre_save(void *opaque)
+{
+GICv3ITSState *s = (GICv3ITSState *)opaque;
+GICv3ITSCommonClass *c = ARM_GICV3_ITS_COMMON_GET_CLASS(s);
+
+if (c->pre_save) {
+c->pre_save(s);
+}
+}
+
+static int gicv3_its_post_load(void *opaque, int version_id)
+{
+GICv3ITSState *s = (GICv3ITSState *)opaque;
+GICv3ITSCommonClass *c = ARM_GICV3_ITS_COMMON_GET_CLASS(s);
+
+if (c->post_load) {
+c->post_load(s);
+}
+return 0;
+}
+
+static const VMStateDescription vmstate_its = {
+.name = "arm_gicv3_its",
+.pre_save = gicv3_its_pre_save,
+.post_load = gicv3_its_post_load,
+.fields = (VMStateField[]) {
+VMSTATE_UINT32(ctlr, GICv3ITSState),
+VMSTATE_UINT64(cbaser, GICv3ITSState),
+VMSTATE_UINT64(cwriter, GICv3ITSState),
+VMSTATE_UINT64(creadr, GICv3ITSState),
+VMSTATE_UINT64_ARRAY(baser, GICv3ITSState, 8),
+VMSTATE_END_OF_LIST()
+}
+.unmigratable = true,
+};
+
+static uint64_t gicv3_its_trans_read(void *opaque, hwaddr offset, unsigned 
size)
+{
+qemu_log_mask(LOG_GUEST_ERROR, "ITS read at offset 0x%jX\n", offset);
+return ~0ULL;
+}
+
+static MemTxResult gicv3_its_trans_write(void *opaque, hwaddr offset,
+ uint64_t value, unsigned size,
+ MemTxAttrs attrs)
+{
+if (offset == 0x0040) {
+GICv3ITSState *s = ARM_GICV3_ITS_COMMON(opaque);
+GICv3ITSCommonClass *c = ARM_GICV3_ITS_COMMON_GET_CLASS(s);
+int ret = c->send_msi(s, le32_to_cpu(value), attrs.stream_id);
+
+if (ret) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "ITS: Error sending MSI: %s\n", strerror(-ret));
+return MEMTX_DECODE_ERROR;
+}
+
+return MEMTX_OK;
+} else {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "ITS write at bad offset 0x%jX\n", offset);
+return MEMTX_DECODE_ERROR;
+}
+}
+
+static const MemoryRegionOps gicv3_its_trans_ops = {
+.read = gicv3_its_trans_read,
+.write_with_attrs = gicv3_its_trans_write,
+.impl = {
+ .min_access_size = 4,
+ .max_access_size = 4,
+ },
+.endianness = DEVICE_NATIVE_ENDIAN,
+};
+
+void gicv3_its_init_mmio(GICv3ITSState *s, const MemoryRegionOps *ops)
+{
+SysBusDevice *sbd = SYS_BUS_DEVICE(s);
+
+memory_region_init_io(&s->iomem_its_cntrl, OBJECT(s), ops, s,
+  "control", ITS_CONTROL_SIZE);
+memory_region_init_io(&s->iomem_its, OBJECT(s), &gicv3_its_trans_ops, s,
+

[Qemu-devel] [RFC PATCH v3 2/5] kernel: Add vGICv3 ITS definitions

2015-11-24 Thread Pavel Fedin

This temporary patch adds kernel API definitions. Use proper header update
procedure after these features are released.

Signed-off-by: Pavel Fedin 
---
 linux-headers/asm-arm64/kvm.h | 1 +
 linux-headers/linux/kvm.h | 9 +++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h
index d3714c0..a4f3230 100644
--- a/linux-headers/asm-arm64/kvm.h
+++ b/linux-headers/asm-arm64/kvm.h
@@ -87,6 +87,7 @@ struct kvm_regs {
 /* Supported VGICv3 address types  */
 #define KVM_VGIC_V3_ADDR_TYPE_DIST 2
 #define KVM_VGIC_V3_ADDR_TYPE_REDIST   3
+#define KVM_VGIC_V3_ADDR_TYPE_ITS  4
 
 #define KVM_VGIC_V3_DIST_SIZE  SZ_64K
 #define KVM_VGIC_V3_REDIST_SIZE(2 * SZ_64K)
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index dcc410e..32938d7 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -843,7 +843,10 @@ struct kvm_irq_routing_msi {
__u32 address_lo;
__u32 address_hi;
__u32 data;
-   __u32 pad;
+   union {
+   __u32 pad;
+   __u32 devid;
+   };
 };
 
 struct kvm_irq_routing_s390_adapter {
@@ -982,12 +985,14 @@ struct kvm_one_reg {
__u64 addr;
 };
 
+#define KVM_MSI_VALID_DEVID(1U << 0)
 struct kvm_msi {
__u32 address_lo;
__u32 address_hi;
__u32 data;
__u32 flags;
-   __u8  pad[16];
+   __u32 devid;
+   __u8  pad[12];
 };
 
 struct kvm_arm_device_addr {
-- 
2.4.4

[Qemu-devel] [RFC PATCH v3 3/5] kvm_arm: Pass requester ID to MSI routing functions

2015-11-24 Thread Pavel Fedin

Introduce global kvm_arm_msi_use_devid flag and pass device IDs in
kvm_arch_fixup_msi_route(). Device IDs are required by the ITS.

Signed-off-by: Pavel Fedin 
---
 target-arm/kvm.c | 6 ++
 target-arm/kvm_arm.h | 3 +++
 2 files changed, 9 insertions(+)

diff --git a/target-arm/kvm.c b/target-arm/kvm.c
index 79ef4c6..0e46930 100644
--- a/target-arm/kvm.c
+++ b/target-arm/kvm.c
@@ -23,6 +23,7 @@
 #include "cpu.h"
 #include "internals.h"
 #include "hw/arm/arm.h"
+#include "hw/pci/pci.h"
 #include "exec/memattrs.h"
 
 const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
@@ -30,6 +31,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 };
 
 static bool cap_has_mp_state;
+bool kvm_arm_msi_use_devid;
 
 int kvm_arm_vcpu_init(CPUState *cs)
 {
@@ -607,6 +609,10 @@ int kvm_arm_vgic_probe(void)
 int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
  uint64_t address, uint32_t data, PCIDevice *dev)
 {
+if (kvm_arm_msi_use_devid) {
+route->flags = KVM_MSI_VALID_DEVID;
+route->u.msi.devid = pci_requester_id(dev);
+}
 return 0;
 }
 
diff --git a/target-arm/kvm_arm.h b/target-arm/kvm_arm.h
index 0ec221b..181ff37 100644
--- a/target-arm/kvm_arm.h
+++ b/target-arm/kvm_arm.h
@@ -120,6 +120,9 @@ bool write_kvmstate_to_list(ARMCPU *cpu);
 void kvm_arm_reset_vcpu(ARMCPU *cpu);
 
 #ifdef CONFIG_KVM
+
+extern bool kvm_arm_msi_use_devid;
+
 /**
  * kvm_arm_create_scratch_host_vcpu:
  * @cpus_to_try: array of QEMU_KVM_ARM_TARGET_* values (terminated with
-- 
2.4.4

[Qemu-devel] PING: [PATCH] backends/hostmem: Ignore ENOSYS while setting MPOL_DEFAULT

2015-11-22 Thread Pavel Fedin

 Hello! No news for a long time, we are at RC stage. Could we get this in?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

> -Original Message-
> From: qemu-devel-bounces+p.fedin=samsung@nongnu.org [mailto:qemu-devel-
> bounces+p.fedin=samsung@nongnu.org] On Behalf Of Eduardo Habkost
> Sent: Tuesday, October 27, 2015 8:32 PM
> To: Pavel Fedin
> Cc: 'Paolo Bonzini'; qemu-devel@nongnu.org
> Subject: Re: [Qemu-devel] [PATCH] backends/hostmem: Ignore ENOSYS while 
> setting MPOL_DEFAULT
> 
> On Tue, Oct 27, 2015 at 03:51:31PM +0300, Pavel Fedin wrote:
> > Currently hostmem backend fails if CONFIG_NUMA is enabled for the qemu
> > (default), but NUMA is not supported by the kernel. This makes it
> > impossible to use ivshmem in such configurations.
> >
> > This patch fixes the problem by ignoring ENOSYS error if policy is set to
> > MPOL_DEFAULT. This way the code behaves in the same way as if CONFIG_NUMA
> > was not defined. qemu will still fail if the user specifies some other
> > policy, so that the user knows it.
> >
> > Signed-off-by: Pavel Fedin 
> 
> Reviewed-by: Eduardo Habkost 
> 
> Thanks. Applied to numa tree, with the following indentation fix:
> 
> diff --git a/backends/hostmem.c b/backends/hostmem.c
> index 94a4ac0..1b4eb45 100644
> --- a/backends/hostmem.c
> +++ b/backends/hostmem.c
> @@ -315,7 +315,7 @@ host_memory_backend_memory_complete(UserCreatable *uc, 
> Error **errp)
>maxnode ? backend->host_nodes : NULL, maxnode + 1, flags)) 
> {
>  if (backend->policy != MPOL_DEFAULT || errno != ENOSYS) {
>  error_setg_errno(errp, errno,
> - "cannot bind memory to host NUMA nodes");
> + "cannot bind memory to host NUMA nodes");
>  return;
>  }
>  }
> 
> > ---
> >  backends/hostmem.c | 6 --
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/backends/hostmem.c b/backends/hostmem.c
> > index 41ba2af..94a4ac0 100644
> > --- a/backends/hostmem.c
> > +++ b/backends/hostmem.c
> > @@ -313,9 +313,11 @@ host_memory_backend_memory_complete(UserCreatable *uc, 
> > Error **errp)
> >  assert(maxnode <= MAX_NODES);
> >  if (mbind(ptr, sz, backend->policy,
> >maxnode ? backend->host_nodes : NULL, maxnode + 1, 
> > flags)) {
> > -error_setg_errno(errp, errno,
> > +if (backend->policy != MPOL_DEFAULT || errno != ENOSYS) {
> > +error_setg_errno(errp, errno,
> >   "cannot bind memory to host NUMA nodes");
> > -return;
> > +return;
> > +}
> >  }
> >  #endif
> >  /* Preallocate memory after the NUMA policy has been instantiated.
> > --
> > 1.9.5.msysgit.0
> >
> >
> 
> --
> Eduardo

[Qemu-devel] [PATCH] memory: emulate ioeventfd

2015-11-20 Thread Pavel Fedin

The ioeventfd mechanism is used by vhost, dataplane, and virtio-pci to
turn guest MMIO/PIO writes into eventfd file descriptor events.  This
allows arbitrary threads to be notified when the guest writes to a
specific MMIO/PIO address.

qtest and TCG do not support ioeventfd because memory writes are not
checked against registered ioeventfds in QEMU.  This patch implements
this in memory_region_dispatch_write() so qtest can use ioeventfd.

Also this patch fixes vhost aborting on some misconfigured old kernels
like 3.18.0 on ARM. It is possible to explicitly enable CONFIG_EVENTFD
in expert settings, while MMIO binding support in KVM will still be
missing.

Signed-off-by: Stefan Hajnoczi 
Signed-off-by: Pavel Fedin 
---
RFC => PATCH:
- Add !kvm_eventfds_enabled() conditions to bypass eventfd injection when not 
needed
- Renamed "ioeventfd" to "eventfd", just to make words shorter
- Add a one-shot warning about missing MMIO bindings in KVM
---
 kvm-all.c |  6 --
 memory.c  | 42 ++
 2 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index ddb007a..70f5cec 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1633,8 +1633,10 @@ static int kvm_init(MachineState *ms)
 
 kvm_state = s;
 
-s->memory_listener.listener.eventfd_add = kvm_mem_ioeventfd_add;
-s->memory_listener.listener.eventfd_del = kvm_mem_ioeventfd_del;
+if (kvm_eventfds_allowed) {
+s->memory_listener.listener.eventfd_add = kvm_mem_ioeventfd_add;
+s->memory_listener.listener.eventfd_del = kvm_mem_ioeventfd_del;
+}
 s->memory_listener.listener.coalesced_mmio_add = kvm_coalesce_mmio_region;
 s->memory_listener.listener.coalesced_mmio_del = 
kvm_uncoalesce_mmio_region;
 
diff --git a/memory.c b/memory.c
index e193658..4d138fb 100644
--- a/memory.c
+++ b/memory.c
@@ -18,12 +18,14 @@
 #include "exec/ioport.h"
 #include "qapi/visitor.h"
 #include "qemu/bitops.h"
+#include "qemu/error-report.h"
 #include "qom/object.h"
 #include "trace.h"
 #include 
 
 #include "exec/memory-internal.h"
 #include "exec/ram_addr.h"
+#include "sysemu/kvm.h"
 #include "sysemu/sysemu.h"
 
 //#define DEBUG_UNASSIGNED
@@ -1141,6 +1143,32 @@ MemTxResult memory_region_dispatch_read(MemoryRegion *mr,
 return r;
 }
 
+/* Return true if an eventfd was signalled */
+static bool memory_region_dispatch_write_eventfds(MemoryRegion *mr,
+hwaddr addr,
+uint64_t data,
+unsigned size,
+MemTxAttrs attrs)
+{
+MemoryRegionIoeventfd ioeventfd = {
+.addr = addrrange_make(int128_make64(addr), int128_make64(size)),
+.data = data,
+};
+unsigned i;
+
+for (i = 0; i < mr->ioeventfd_nb; i++) {
+ioeventfd.match_data = mr->ioeventfds[i].match_data;
+ioeventfd.e = mr->ioeventfds[i].e;
+
+if (memory_region_ioeventfd_equal(ioeventfd, mr->ioeventfds[i])) {
+event_notifier_set(ioeventfd.e);
+return true;
+}
+}
+
+return false;
+}
+
 MemTxResult memory_region_dispatch_write(MemoryRegion *mr,
  hwaddr addr,
  uint64_t data,
@@ -1154,6 +1182,11 @@ MemTxResult memory_region_dispatch_write(MemoryRegion 
*mr,
 
 adjust_endianness(mr, &data, size);
 
+if ((!kvm_eventfds_enabled()) &&
+memory_region_dispatch_write_eventfds(mr, addr, data, size, attrs)) {
+return MEMTX_OK;
+}
+
 if (mr->ops->write) {
 return access_with_adjusted_size(addr, &data, size,
  mr->ops->impl.min_access_size,
@@ -1672,6 +1705,8 @@ void memory_region_clear_global_locking(MemoryRegion *mr)
 mr->global_locking = false;
 }
 
+static bool userspace_eventfd_warning;
+
 void memory_region_add_eventfd(MemoryRegion *mr,
hwaddr addr,
unsigned size,
@@ -1688,6 +1723,13 @@ void memory_region_add_eventfd(MemoryRegion *mr,
 };
 unsigned i;
 
+if (kvm_enabled() && (!(kvm_eventfds_enabled() ||
+userspace_eventfd_warning))) {
+userspace_eventfd_warning = true;
+error_report("Using eventfd without MMIO binding in KVM. "
+ "Suboptimal performance expected");
+}
+
 if (size) {
 adjust_endianness(mr, &mrfd.data, size);
 }
-- 
1.9.5.msysgit.0

Re: [Qemu-devel] [PATCH] virtio: Implement userspace forwarding for host notifiers

2015-11-19 Thread Pavel Fedin

 Hello!

> If I read this correctly, memory regions already keep track of
> ioeventfds and this patch can simply trigger them manually if that had
> not already been done?

 Yes, exactly, and this is the new idea.

> Only that we don't have such a nice tracking structure that memory
> regions already have.
> 
> The intercept handler for diagnose 500 already ends up at the correct
> device (its id is one of the parameters), so I'd need to check for an
> existing notifier on the virtqueue and trigger that instead of calling
> virtio_queue_notify directly?

 Yes, absolutely correct, but only if the notifier has actually been installed. 
Pure userspace virtio will not do it, AFAIK, if we
don't have KVM_CAP_IOEVENTFD. Take a careful look at virtio-mmio.c, you'll see 
that host notifiers are actually installed from two
places:
1. virtio_mmio_start_ioeventfd()
2. virtio_mmio_set_host_notifier()

 Userspace implementation of virtio uses (1) path, while vhost uses (2). Note 
also that (1) does not do anything at all if KVM does
not support ioeventfds. So, (1) case breaks up into two:
 1a. userspace virtio, notifications triggered by handling MMIO writes in 
userspace
 1b. userspace virtio, notifications triggered by ioeventfds, which are bound 
to MMIO in kernel
 I don't know why we have (1b) at all, because i'm in doubt there would be any 
noticeable performance advantage. But, i believe,
there was some reason to implement it.

 Both (1) and (2) end up in the same internal routine, and the only difference 
is 'bool set_handler', which is set to false for
vhost. This is what i rely on, and this is what you should detect too.

 PCI backend is a little bit more complicated to read, but it works absolutely 
in the same way.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH] virtio: Implement userspace forwarding for host notifiers

2015-11-19 Thread Pavel Fedin

 Hello!

> > Paolo, do you see anything wrong with making
> > memory_region_add_eventfd work (slowly) without kvm ioeventfd support?
> >
> 
> Sure, it's even been on the todo list for a while.  Stefan Hajnoczi had
> a patch, the only ugly thing was that it slowed down a little all
> non-ioeventfd accesses, but I guess that's okay because it's probably
> not measurable.
> 
> http://lists.nongnu.org/archive/html/qemu-devel/2015-07/msg04710.html

 Ok, good. So, this has already been tested. Ok, i think i can further extend 
this patch by:
1. Add a flag to disable memory_region_dispatch_write_ioeventfds(), to be set 
by KVM code for example. So that with KVM we have even
smaller performance degradation.
2. Issue a one-shot warning, like in my current patch, so that the user knows 
that his/her kernel prevents from getting the best
performance.
 Will it be OK then?

 Cc'ed also Cornelia because he's taking part in CCW discussion. And the same 
kind of generic event forwarding could be implemented
for CCW then.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH] virtio: Implement userspace forwarding for host notifiers

2015-11-19 Thread Pavel Fedin

 Hello!

> > I think if you have motivation, you could implement userspace forwarding 
> > for ccw yourself,
> since you know the thing and know how to
> > do it. For MMIO it was indeed very simple.
> 
> I'll take a peek at your patch.

 This is actually a note for Michael. ccw support is one more reason for 
handling forwarding on virtio device level, not on memory
region level.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH] virtio: Implement userspace forwarding for host notifiers

2015-11-19 Thread Pavel Fedin

 Hello!

> OK, that's better, thanks!

 Yes, indeed this was simple.

> Why do we need
> if (kvm_eventfds_enabled()) {
> memory_region_add_eventfd(&proxy->iomem, VIRTIO_MMIO_QUEUENOTIFY, 
> 4,
> true, n, notifier);
> } else if (!set_handler) {
>virtio_queue_set_host_notifier_forwarding(vq);
> }
> everywhere though?
> 
> Can't memory_region_add_eventfd DTRT depending on kvm etc?

 You know... I took a look at this, and yes, i could simply hook up emulation 
into memory region handlers. And everything that
expects KVM eventfd binding will magically start working, probably rendering 
some bypass code obsolete.
 I have only one concern against this. qemu is a large piece of software, 
consisting of lots of components. I cannot test absolutely
everything in every configuration. I suggest, old code was written with the 
assumption that if memory_region_add_eventfd() works, we
are really using KVM acceleration. If we break this assumption, how much code 
will mysteriously misbehave instead of throwing
"function not supported" error, which is quite easy to trace down and fix?
 What i would refactor, perhaps, is to add a return code to 
memory_region_add_eventfd(), so that it can signal failure instead of a
critical abort, this would allow to get rid of kvm_eventfds_enabled() 
accompanying checks.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH] vfio: Align iova also to IOMMU page size

2015-11-19 Thread Pavel Fedin

 Hello!

> > On some architectures TARGET_PAGE_ALIGN() is not enough to get the right
> > alignment. For example on ARM TARGET_PAGE_BITS is 10 because some old CPUs
> > support 1K page size, while minimum SMMU page size is 4K.
> >
> > This fixes problems like:
> >
> > 2015-11-17T07:37:42.892265Z qemu-system-aarch64: VFIO_MAP_DMA: -22
> > 2015-11-17T07:37:42.892309Z qemu-system-aarch64: vfio_dma_map(0x223da230, 
> > 0x80002f0400,
> 0x10fc00, 0x7f89b40400) = -22 (Invalid
> > argument)
> > qemu: hardware error: vfio: DMA mapping failed, unable to continue

[skip]

> I don't understand how this is supposed to work, if we align to a larger
> size than the processor, then there are processor size pages of RAM than
> could be handed out as DMA targets for devices, but we can't map them
> through the IOMMU.  Thus if the guest tries to use them, we get IOMMU
> faults in the host and likely memory corruption in the guest because the
> device can't read or write to the page it's supposed to.  This doesn't
> seem like the right solution.

 Well, this was my first try on the problem. I've got your idea. But i guess we 
should discuss the proper solution then.
 So, i've got this problem on ARM64. On ARM64 we actually can never have 1K 
pages. This page size was supported only by old 32-bit ARM CPUs, up to ARMv5 
IIRC, then it was dropped. Linux OS never even used it.
 But, since qemu can emulate those ancient CPUs, TARGET_PAGE_BITS is defined to 
10 for ARM. And, ARM64 and ARM32 is actually the same target for qemu, so this 
is why we still get it.
 Perhaps, TARGET_PAGE_BITS should be a variable for ARM, and we should set it 
according to the actual used CPU. Then this IOMMU alignment problem would 
disappear automatically. What do you think?
 Cc'ed Peter since he is the main ARM guy here.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH] virtio: Implement userspace forwarding for host notifiers

2015-11-19 Thread Pavel Fedin

 Hello!

> Looks like this would be ok for virtio-ccw (as it does not call
> virtio_queue_set_host_notifier_forwarding)

 Yes, this is the reason why i intentionally did not insert extra logic into 
virtio_queue_set_host_notifier_fd_handler(), but made
it into separate function. I don't know whether KVM on S390 reports 
KVM_CAP_IOEVENTFD.

> Question is might something like this for virtio-ccw useful as well?

 Yes, provided you have something to bind userspace actions to. Looks like S390 
doesn't use MMIO here at all, but some obscure (for
me) thing, called "subchannel".
 Another question is whether you can have kernel without KVM_CAP_IOEVENTFD on 
S390. Was it introduced since the beginning, or some
time later?

> We cannot use memory_region, though, we would need to handle that in our
> diagnose code.

 Sorry, this phrase is a complete mystery to me, i have no idea what is 
diagnose code, as well as have never seen S390 in real life.
I think if you have motivation, you could implement userspace forwarding for 
ccw yourself, since you know the thing and know how to
do it. For MMIO it was indeed very simple.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

[Qemu-devel] [PATCH] virtio: Implement userspace forwarding for host notifiers

2015-11-19 Thread Pavel Fedin

If you happen to have a kernel with ioeventfd support enabled, but
missing support for them in KVM, and you attempt to enable vhost by
setting vhost=on, qemu aborts with error:

kvm_mem_ioeventfd_add: error adding ioeventfd: Function not implemented

This patch adds a mechanism which allows to emulate KVM binding by
triggering the related notifiers via the userspace. The first time the
emulation is used, a warning is displayed, so that the user knows about
potential performance impact:

2015-11-19T09:35:16.618380Z qemu-system-aarch64: KVM does not support eventfd 
binding, using userspace event forwarding (slow)

This problem can be observed with libvirt, which checks for /dev/vhost-net
availability and just inserts "vhost=on" automatically in this case; on an
ARM64 system using stock kernel 3.18.0 with CONFIG_IOEVENTFD enabled in
expert settings.

Signed-off-by: Pavel Fedin 
---
 hw/virtio/virtio-mmio.c| 15 +---
 hw/virtio/virtio-pci.c | 61 ++
 hw/virtio/virtio.c | 24 +-
 include/hw/virtio/virtio.h |  1 +
 4 files changed, 69 insertions(+), 32 deletions(-)

diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
index 16621fa..69d4cbc 100644
--- a/hw/virtio/virtio-mmio.c
+++ b/hw/virtio/virtio-mmio.c
@@ -110,11 +110,18 @@ static int 
virtio_mmio_set_host_notifier_internal(VirtIOMMIOProxy *proxy,
 return r;
 }
 virtio_queue_set_host_notifier_fd_handler(vq, true, set_handler);
-memory_region_add_eventfd(&proxy->iomem, VIRTIO_MMIO_QUEUENOTIFY, 4,
-  true, n, notifier);
+
+if (kvm_eventfds_enabled()) {
+memory_region_add_eventfd(&proxy->iomem, VIRTIO_MMIO_QUEUENOTIFY, 
4,
+  true, n, notifier);
+} else if (!set_handler) {
+virtio_queue_set_host_notifier_forwarding(vq);
+}
 } else {
-memory_region_del_eventfd(&proxy->iomem, VIRTIO_MMIO_QUEUENOTIFY, 4,
-  true, n, notifier);
+if (kvm_eventfds_enabled()) {
+memory_region_del_eventfd(&proxy->iomem, VIRTIO_MMIO_QUEUENOTIFY, 
4,
+  true, n, notifier);
+}
 virtio_queue_set_host_notifier_fd_handler(vq, false, false);
 event_notifier_cleanup(notifier);
 }
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 96be4fd..b27a630 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -293,41 +293,48 @@ static int 
virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy,
 return r;
 }
 virtio_queue_set_host_notifier_fd_handler(vq, true, set_handler);
-if (modern) {
-if (fast_mmio) {
-memory_region_add_eventfd(modern_mr, modern_addr, 0,
-  false, n, notifier);
-} else {
-memory_region_add_eventfd(modern_mr, modern_addr, 2,
-  false, n, notifier);
+
+if (kvm_eventfds_enabled()) {
+if (modern) {
+if (fast_mmio) {
+memory_region_add_eventfd(modern_mr, modern_addr, 0,
+  false, n, notifier);
+} else {
+memory_region_add_eventfd(modern_mr, modern_addr, 2,
+  false, n, notifier);
+}
+if (modern_pio) {
+memory_region_add_eventfd(modern_notify_mr, 0, 2,
+  true, n, notifier);
+}
 }
-if (modern_pio) {
-memory_region_add_eventfd(modern_notify_mr, 0, 2,
-  true, n, notifier);
+if (legacy) {
+memory_region_add_eventfd(legacy_mr, legacy_addr, 2,
+  true, n, notifier);
 }
-}
-if (legacy) {
-memory_region_add_eventfd(legacy_mr, legacy_addr, 2,
-  true, n, notifier);
+} else if (!set_handler) {
+virtio_queue_set_host_notifier_forwarding(vq);
 }
 } else {
-if (modern) {
-if (fast_mmio) {
-memory_region_del_eventfd(modern_mr, modern_addr, 0,
-  false, n, notifier);
-} else {
-memory_region_del_eventfd(modern_mr, modern_addr, 2,
-  false, n, notifier);
+if (kvm_eventfds_enabled()) {
+if (modern) {
+if (fast_mmio) {
+memory_region_del_eventfd(modern_mr, modern_addr, 0,
+  false,

[Qemu-devel] [PATCH] vfio: Align iova also to IOMMU page size

2015-11-16 Thread Pavel Fedin

On some architectures TARGET_PAGE_ALIGN() is not enough to get the right
alignment. For example on ARM TARGET_PAGE_BITS is 10 because some old CPUs
support 1K page size, while minimum SMMU page size is 4K.

This fixes problems like:

2015-11-17T07:37:42.892265Z qemu-system-aarch64: VFIO_MAP_DMA: -22
2015-11-17T07:37:42.892309Z qemu-system-aarch64: vfio_dma_map(0x223da230, 
0x80002f0400, 0x10fc00, 0x7f89b40400) = -22 (Invalid
argument)
qemu: hardware error: vfio: DMA mapping failed, unable to continue

Signed-off-by: Pavel Fedin 
---
 hw/vfio/common.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index ff5a89a..328140c 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -326,7 +326,7 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
  MemoryRegionSection *section)
 {
 VFIOContainer *container = container_of(listener, VFIOContainer, listener);
-hwaddr iova, end;
+hwaddr iova, end, iommu_page_size;
 Int128 llend;
 void *vaddr;
 int ret;
@@ -346,6 +346,8 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 }
 
 iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+iommu_page_size = vfio_container_granularity(container);
+iova = (iova + iommu_page_size - 1) & ~(iommu_page_size - 1);
 llend = int128_make64(section->offset_within_address_space);
 llend = int128_add(llend, section->size);
 llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
@@ -390,8 +392,7 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
 
 memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
-memory_region_iommu_replay(giommu->iommu, &giommu->n,
-   vfio_container_granularity(container),
+memory_region_iommu_replay(giommu->iommu, &giommu->n, iommu_page_size,
false);
 
 return;
-- 
1.9.5.msysgit.0

[Qemu-devel] [PATCH v2] vhost: Fix aborting if KVM does not support eventfds

2015-11-16 Thread Pavel Fedin

If you happen to have a stock kernel of old version, like 3.x, and you
attempt to enable vhost by setting vhost=on, qemu aborts with error:

kvm_mem_ioeventfd_add: error adding ioeventfd: Function not implemented

This patch adds capability check, so that vhost gets disabled instead. A
warning is displayed, explaining the reason:

2015-11-13T08:43:51.146802Z qemu-system-aarch64: KVM does not support eventfd 
binding
2015-11-13T08:43:51.146915Z qemu-system-aarch64: unable to start vhost net: 38: 
falling back on userspace virtio

This problem can be observed with libvirt, which checks for /dev/vhost-net
availability and just inserts "vhost=on" automatically in this case.

Signed-off-by: Pavel Fedin 
---
v1 => v2:
- Removed "MMIO" from warning message, because it applies not only to MMIO
- Add note about warning to the commit message
---
 hw/virtio/vhost.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 1794f0d..50b8171 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -24,6 +24,7 @@
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
 #include "migration/migration.h"
+#include "sysemu/kvm.h"
 
 static struct vhost_log *vhost_log;
 static struct vhost_log *vhost_log_shm;
@@ -1083,6 +1084,11 @@ int vhost_dev_enable_notifiers(struct vhost_dev *hdev, 
VirtIODevice *vdev)
 r = -ENOSYS;
 goto fail;
 }
+if (!kvm_eventfds_enabled()) {
+error_report("KVM does not support eventfd binding");
+r = -ENOSYS;
+goto fail;
+}
 
 for (i = 0; i < hdev->nvqs; ++i) {
 r = k->set_host_notifier(qbus->parent, hdev->vq_index + i, true);
-- 
1.9.5.msysgit.0

Re: [Qemu-devel] [PATCH v4 6/7] qom: replace object property list with GHashTable

2015-11-16 Thread Pavel Fedin

 Hello!

> My idea, which I wanted to investigate after the weekend, is iterating
> through the hashtable to create a list of prop->release functions and
> call them only after finishing the iteration. That might not work
> either, so we may need to loop over the releasing to allow for released
> properties to disappear after prop->release().

 Hm...
 May be instead of actually deleting a property, while we are iterating, we 
should mark it as pending for deletion, and then, after
iteration is done, actually remove them? This would cost one 'bool 
delete_pending' per property.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH] vhost: Fix aborting if KVM does not support eventfds

2015-11-16 Thread Pavel Fedin

 Hello!

> > If you happen to have a stock kernel of old version, like 3.x, and you
> > attempt to enable vhost by setting vhost=on, qemu aborts with error:
> >
> > kvm_mem_ioeventfd_add: error adding ioeventfd: Function not implemented
> >
> > This patch adds capability check, so that vhost gets disabled instead. A
> > warning is displayed, explaining the reason.
> >
> > This problem can be observed with libvirt, which checks for /dev/vhost-net
> > availability and just inserts "vhost=on" automatically in this case.
> >
> > Signed-off-by: Pavel Fedin 
> 
> How come you have /dev/vhost-net though?

 Easily. Just enable CONFIG_VHOST_NET for the kernel. I happened to have it 
because i just copied over .config from another kernel
version, which had ioeventfds working.
 CONFIG_VHOST_NET in the kernel depends on CONFIG_VHOST, which in turn depends 
on CONFIG_EVENTFD. But there's no direct dependency
between it and CONFIG_HAVE_KVM_EVENTFD. So, you can just enable 
CONFIG_IOEVENTFD in "Configure standard kernel features", and get
ioeventfds by themselves with vhost-net. Which, technically speaking, has no 
direct dependency on KVM's ability to bind ioeventfd to
memory accesses. Because, theoretically, you could have some other use for it.

> That was supposed to only be there if you have vhost-net,
> and that never existed on kernels without eventfd.

 I believe distro maintainers (should) have taken care of it and disable it. 
But, as you can see, in some circumstances you could
have it enabled. IMHO it's better to be more flexible and process this 
situation correctly. The fix is trivial.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH] vhost: Fix aborting if KVM does not support eventfds

2015-11-16 Thread Pavel Fedin

 Hello!

> OK so it's a misconfigured kernel.
> Fine but I'm not happy with silently using userspace instead.

 It's not silent. You get two warnings in the log:
--- cut ---
2015-11-13T08:43:51.146802Z qemu-system-aarch64: KVM does not support MMIO 
eventfds
2015-11-13T08:43:51.146915Z qemu-system-aarch64: unable to start vhost net: 38: 
falling back on userspace virtio
--- cut ---

> If people ask for vhost they should get it.
> How about
> - kicking vhost from userspace

 This can be interesting, this would be similar to forwarding IRQ through the 
userspace if we don't have irqfds. But i believe this
would require a lot to implement. To tell the truth, i'm not ready for this.
 Actually, for now i just want to provide some way to switch kernels and use 
the same userspace configuration, and make all possible
conbinations working somehow, without critical faults. Anyway, if we want to 
use vhost, i suppose we want to get maximum
performance, which can not be achieved with userspace event forwarding, so it's 
IMHO okay just to fallback to userspace (which we
already do, just we miss to catch some situations), just explain why.
 I even plan to add another warning, when eventfds work and irqfds don't. 
Currently we just silently fall back to userspace IRQ
forwarding.

> or
> - failing vhost init

 That's what my patch does. vhost_dev_enable_notifiers () is called from within 
vhost-net init routine. I intentionally patch
vhost.c so that it should also have the same effect on vhost-scsi.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH v4 6/7] qom: replace object property list with GHashTable

2015-11-16 Thread Pavel Fedin

 Hello!

> >> (process:4102): GLib-CRITICAL **: g_hash_table_iter_next: assertion
> >> 'ri->version == ri->hash_table->version' failed
> >>
> >> (process:4102): GLib-CRITICAL **: g_hash_table_iter_next: assertion
> >> 'ri->version == ri->hash_table->version' failed
> >>
> >> (process:4102): GLib-CRITICAL **: iter_remove_or_steal: assertion
> >> 'ri->version == ri->hash_table->version' failed

 Wow... Actually this may come from attempts to modify the tree inside 
iteration.

> Thanks! sclp_init() seems to violate several QOM design principles in
> that it uses object_new() during TypeInfo::instance_init() and uses a
> TYPE_... constant as property name. But nothing else stands out immediately.

 I think we should refactor this and retry. If not all problems go away, then 
we are indeed modifying the tree during iteration, and
we have to find some solution.
 I wonder... Could we have both list and hashtable? hashtable for searching by 
name and list for iteration. In this case we would
not have to use glib's iterators, and would be free of problems with them. Just 
keep the list and hashtable in sync.
 Or, is there any hashtable implementation out there which would keep iterators 
valid during modification?
 OTOH, glib has a function "remove the element at iterator's position", and we 
could postpone addition. So, perhaps, using both
containers would be an overkill, just refactor the code to adapt to the new 
behavior.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH] vfio: Fix handling VFIO_IOMMU_GET_INFO results

2015-11-13 Thread Pavel Fedin

 Hello!

> >  If we fix qemu, it will automatically start working with all
> > available kernels which are there in the wild. If we fix kernel, older
> > versions will still not work, however they can.
> >  That's why i think that we should adapt qemu to what already exists.
> > But, well, you are The Boss, so you can just say "i don't care". So,
> > just let me now if you strongly disagree with this.
> 
> I do care, in fact I care enough about the ABI that I'm suggesting what
> I think is the correct fix rather than taking the quick and dirty
> solution.  It's an unfortunate bug, but it's not worth changing the ABI
> and removing the kernel's ability to indicate whether the pgsize bitmap
> field is valid IMO.

 Ok, i see your point...
 But what about fix, which would work both with future kernels, which do 
provide this flag, as well as would be compatible with already existing 
kernels, which set flags == 0?
 We could check for ((info.flags == 0) || (info.flags & 
VFIO_IOMMU_INFO_PGSIZES)). This would conform to both behaviors:
 a) All current kernels set flags = 0 and report page sizes.
 b) Some future kernels could have set some flags, but not reported page sizes 
and not set VFIO_IOMMI_PGSIZES

 What would you say about this? Yes, this would be a "workaround" instead of 
"fix".

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

[Qemu-devel] [PATCH] vhost: Fix aborting if KVM does not support eventfds

2015-11-13 Thread Pavel Fedin

If you happen to have a stock kernel of old version, like 3.x, and you
attempt to enable vhost by setting vhost=on, qemu aborts with error:

kvm_mem_ioeventfd_add: error adding ioeventfd: Function not implemented

This patch adds capability check, so that vhost gets disabled instead. A
warning is displayed, explaining the reason.

This problem can be observed with libvirt, which checks for /dev/vhost-net
availability and just inserts "vhost=on" automatically in this case.

Signed-off-by: Pavel Fedin 
---
 hw/virtio/vhost.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 1794f0d..3121e19 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -24,6 +24,7 @@
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
 #include "migration/migration.h"
+#include "sysemu/kvm.h"
 
 static struct vhost_log *vhost_log;
 static struct vhost_log *vhost_log_shm;
@@ -1083,6 +1084,11 @@ int vhost_dev_enable_notifiers(struct vhost_dev *hdev, 
VirtIODevice *vdev)
 r = -ENOSYS;
 goto fail;
 }
+if (!kvm_eventfds_enabled()) {
+error_report("KVM does not support MMIO eventfds");
+r = -ENOSYS;
+goto fail;
+}
 
 for (i = 0; i < hdev->nvqs; ++i) {
 r = k->set_host_notifier(qbus->parent, hdev->vq_index + i, true);
-- 
1.9.5.msysgit.0

Re: [Qemu-devel] [PATCH] vfio: Fix handling VFIO_IOMMU_GET_INFO results

2015-11-12 Thread Pavel Fedin

 Hello!

> > Kernel headers define VFIO_IOMMU_INFO_PGSIZES flag, however it has
> > actually been never used, probably by mistake which now became a part
> > of the ABI. The kernel always sets info.flags to 0:
> 
> I don't see how this implies that it becomes part of the ABI.  In fact,
> as the defacto userspace driver for vfio, QEMU honoring the flag and not
> using the value the kernel provides implies the ABI is still valid.  We
> should fix the kernel instead.

 Well... I intentionally put two links to LXR. From the very beginning, this 
ioctl returned valid page sizes. And it never set this flag. We simply cannot 
have a kernel which does not report page sizes.
 If we fix qemu, it will automatically start working with all available kernels 
which are there in the wild. If we fix kernel, older versions will still not 
work, however they can.
 That's why i think that we should adapt qemu to what already exists. But, 
well, you are The Boss, so you can just say "i don't care". So, just let me now 
if you strongly disagree with this.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

[Qemu-devel] [PATCH] vfio: Fix handling VFIO_IOMMU_GET_INFO results

2015-11-11 Thread Pavel Fedin

Kernel headers define VFIO_IOMMU_INFO_PGSIZES flag, however it has
actually been never used, probably by mistake which now became a part
of the ABI. The kernel always sets info.flags to 0:

http://lxr.free-electrons.com/source/drivers/vfio/vfio_iommu_type1.c?v=3.7#L675
http://lxr.free-electrons.com/source/drivers/vfio/vfio_iommu_type1.c#L974

Signed-off-by: Pavel Fedin 
---
 hw/vfio/common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 6797208..afc10c7 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -704,7 +704,7 @@ static int vfio_connect_container(VFIOGroup *group, 
AddressSpace *as)
 info.argsz = sizeof(info);
 ret = ioctl(fd, VFIO_IOMMU_GET_INFO, &info);
 /* Ignore errors */
-if ((ret == 0) && (info.flags & VFIO_IOMMU_INFO_PGSIZES)) {
+if (ret == 0) {
 container->iova_pgsizes = info.iova_pgsizes;
 }
 } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
-- 
1.9.5.msysgit.0

Re: [Qemu-devel] [PATCH v4 6/7] qom: replace object property list with GHashTable

2015-11-06 Thread Pavel Fedin

 Hello!

> > > -QTAILQ_FOREACH(prop, &obj->parent->properties, node) {
> > > +g_hash_table_iter_init(&iter, obj->parent->properties);
> > > +while (g_hash_table_iter_next(&iter, NULL, (gpointer *)&prop)) {
> >
> > Is this cast needed?
> 
> Probably not, as any pointer should coerce to void * without an
> explicit cast, unless it had a 'const' involved.

 But after '&' it becomes "**", not just "*", and this implicit conversion
rule doesn't apply any more. "void **" and "something **" are always considered
different.  

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH v4 6/7] qom: replace object property list with GHashTable

2015-11-06 Thread Pavel Fedin

 Hello!

> >  static void object_property_del_all(Object *obj)
> >  {
> > -while (!QTAILQ_EMPTY(&obj->properties)) {
> > -ObjectProperty *prop = QTAILQ_FIRST(&obj->properties);
> > -
> > -QTAILQ_REMOVE(&obj->properties, prop, node);
> > +ObjectProperty *prop;
> > +GHashTableIter iter;
> > +gpointer key, value;
> >
> > +g_hash_table_iter_init(&iter, obj->properties);
> > +while (g_hash_table_iter_next(&iter, &key, &value)) {
> > +prop = value;
> >  if (prop->release) {
> >  prop->release(obj, prop->name, prop->opaque);
> >  }
> 
> Why is this not in property_free(), too? Is there a timing difference?

 This is what i started from, and got NAKed. property_free() gets only 
ObjectProperty * as argument, but for our release callback we
need also 'obj'. In my first version i added Object * backpointer to 
ObjectProperty and Daniel reported a problem with that:
--- cut ---
I have a patch which adds property registration against the
class, so requiring ObjectProperty to have a back-poointer
to an object instance is not desirable.
--- cut ---
 Full message here: 
https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg01999.html

> 
> > -
> > -g_free(prop->name);
> > -    g_free(prop->type);
> > -g_free(prop->description);
> > -g_free(prop);
> >  }
> > +
> > +g_hash_table_unref(obj->properties);
> >  }

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH] migration: Introduce migration_in_completion()

2015-10-29 Thread Pavel Fedin

 Hello!

> ok, your problem here is that you modify ram.  Could you take a look at
> how vhost manage this?  It is done at migration_bitmap_sync(), and it
> just marks the pages that are dirty.

 Hm, interesting... I see it hooks into memory_region_sync_dirty_bitmap(). 
Sorry for maybe lame question, i do not know the whole
code, and it will be much faster for you to explain it to me, than for me to 
dig into it myself. At what moment is it called during
migration?

 For you to better understand what is necessary... ITS is a thing which can be 
implemented in-kernel by KVM, and i work on exactly
this. In my implementation i add an ioctl, which is called after CPUs are 
stopped. It flushes internal caches of the vITS to the
RAM. It happens inside the kernel. I guess, dirty state tracking works 
correctly in this case, because memory gets modified by the
kernel, and i guess from qemu's point of view it's the same as memory being 
modified by the guest. Therefore, i do not need to
modify memory state bitmaps. I only need to tell the kernel to actually write 
out the data.
 If we talk about making this thing iterative, we anyway need this ioctl. It 
could be modified inside the kernel to update only
those RAM parts whose data have been modified since the last flush. The 
semantics would stay the same - it's just an ioctl telling
the virtual device to store its data in RAM.
 Ah, and again, these memory listeners are not prioritized too. I guess i could 
use them, but i would need a guarantee that my
listener is called before KVMMemoryListener, which picks up changes.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH] migration: Introduce migration_in_completion()

2015-10-28 Thread Pavel Fedin

 Hello!

> Power people have a similar problem with its hashed page tables, they
> integrated their own save_live implementation because they are too big
> for the last stage.  You can look there for inspiration.

 I examined their code. Interesting, and, indeed, it opens up a way for 
decreasing downtime by implementing iterative migration for
the ITS.
 However, this is not really what is necessary. This thing aims to produce own 
data chunk, and it's not good for ITS. ITS already
stores everything in system RAM, therefore savevm_ram_handlers take perfect 
care about these data. The only thing to do is to tell
the ITS to dump its state into RAM. This is what i currently do using 
migration_in_completion().
 An alternate, perhaps better approach, would be to be able to hook into 
ram_save_iterate() and ram_save_complete(). This way we
could kick ITS right before attempting to migrate RAM.
 Could we extend the infrastructure so that:
a) Handlers are prioritized, and we can determine order of their execution?
b) We can choose whether our handlers actually produce extra chunk or not?

 OTOH, what i've done is actually a way to hook up into save_live_complete 
before any other registered handlers get executed. What
is missing is one more notifier_list_notify() call right before 
qemu_savevm_state_iterate(), and a corresponding
migration_is_active() checker.

 What do you think ?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH v2] target-arm: Extract some external ARM CPU API

2015-10-28 Thread Pavel Fedin

 Hello!

>  Ok, so decided. I will convert my code, test the build and send a small 
> patch for this soon,
> perhaps today.

 arm_gicv3_kvm.o is already in obj-y, and arm_gicv3_common.o does not use any 
of those definitions. So, nothing to move, there will be no patch.

 So far, we have only this small leftover: 
http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg02349.html. Needed 
both by live migration and SW emulation of GICv3.

 Shlomo: Just add your GICv3 code to obj-$(CONFIG_ARM_GIC), and you'll be able 
to include things you need.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH] migration: Introduce migration_in_completion()

2015-10-28 Thread Pavel Fedin

 Hello!

> Power people have a similar problem with its hashed page tables, they
> integrated their own save_live implementation because they are too big
> for the last stage.  You can look there for inspiration.

 Hm, i'll take a look...

> How big are we talking here?

 Tables are normally large (about 640K on my VM), but extremely sparse. So the 
actually modified space during this callback would be
several pages, unlikely more.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

[Qemu-devel] [PATCH v2] backends/hostmem-file: Allow to specify full pathname for backing file

2015-10-28 Thread Pavel Fedin

This allows to explicitly specify file name to use with the backend. This
is important when using it together with ivshmem in order to make it backed
by hugetlbfs. By default filename is autogenerated using mkstemp(), and the
file is unlink()ed after creation, effectively making it anonymous. This is
not very useful with ivshmem because it ends up in a memory which cannot be
accessed by something else.

Distinction between directory and file name is done by stat() check. If an
existing directory is given, the code keeps old behavior. Otherwise it
creates or opens a file with the given pathname.

Signed-off-by: Pavel Fedin 
Tested-by: Igor Skalkin 
---
v1 => v2:
- Changed title to more generic one
- Do not introduce new property, check whether the given path is a
  directory instead
---
 exec.c| 34 +-
 qemu-doc.texi |  2 +-
 2 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/exec.c b/exec.c
index 8af2570..3238c9a 100644
--- a/exec.c
+++ b/exec.c
@@ -1205,6 +1205,7 @@ static void *file_ram_alloc(RAMBlock *block,
 const char *path,
 Error **errp)
 {
+struct stat st;
 char *filename;
 char *sanitized_name;
 char *c;
@@ -1233,26 +1234,33 @@ static void *file_ram_alloc(RAMBlock *block,
 goto error;
 }
 
-/* Make name safe to use with mkstemp by replacing '/' with '_'. */
-sanitized_name = g_strdup(memory_region_name(block->mr));
-for (c = sanitized_name; *c != '\0'; c++) {
-if (*c == '/')
-*c = '_';
-}
+if (!stat(path, &st) && S_ISDIR(st.st_mode)) {
+/* Make name safe to use with mkstemp by replacing '/' with '_'. */
+sanitized_name = g_strdup(memory_region_name(block->mr));
+for (c = sanitized_name; *c != '\0'; c++) {
+if (*c == '/') {
+*c = '_';
+}
+}
 
-filename = g_strdup_printf("%s/qemu_back_mem.%s.XX", path,
-   sanitized_name);
-g_free(sanitized_name);
+filename = g_strdup_printf("%s/qemu_back_mem.%s.XX", path,
+   sanitized_name);
+g_free(sanitized_name);
+
+fd = mkstemp(filename);
+if (fd >= 0) {
+unlink(filename);
+}
+g_free(filename);
+} else {
+fd = open(path, O_RDWR | O_CREAT, 0644);
+}
 
-fd = mkstemp(filename);
 if (fd < 0) {
 error_setg_errno(errp, errno,
  "unable to create backing store for hugepages");
-g_free(filename);
 goto error;
 }
-unlink(filename);
-g_free(filename);
 
 memory = ROUND_UP(memory, hpagesize);
 
diff --git a/qemu-doc.texi b/qemu-doc.texi
index 3126abd..460ab71 100644
--- a/qemu-doc.texi
+++ b/qemu-doc.texi
@@ -1299,7 +1299,7 @@ Instead of specifying the  using POSIX shm, you 
may specify
 a memory backend that has hugepage support:
 
 @example
-qemu-system-i386 -object 
memory-backend-file,size=1G,mem-path=/mnt/hugepages,id=mb1
+qemu-system-i386 -object 
memory-backend-file,size=1G,mem-path=/mnt/hugepages/my-shmem-file,id=mb1
  -device ivshmem,memdev=mb1
 @end example
 
-- 
1.9.5.msysgit.0

Re: [Qemu-devel] proposal: new qemu-arm mailing list

2015-10-27 Thread Pavel Fedin

 Hello!

> Just to jump on and bead a dead horse, I am not OK with the idea of a
> mailing list where patches might get reviewed and staged for pull
> without the general population of qemu-devel being able to look first.

 Ok ok, i don't object, Peter has already explained a similar thing to me, and 
i silently agreed.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH v2] target-arm: Extract some external ARM CPU API

2015-10-27 Thread Pavel Fedin

 Hello!

> If there are no dependent patches on list that are 2.5 candidate, I suggest 
> we just drop this one for the short term.
> Is there something on list?

 For 2.5, AFAIK, no. Unless Shlomo suddenly comes up with clean GICv3 
implementation which could be quickly accepted.
 I wanted to upstream this now in order to forget about the problem and allow 
me and Shlomo to post cleaner series without the need to include preparations 
like this every time.

> I have no objection to the obj-y solution medium term, as there are already 
> multiple users of the ARM CP API using
> their obj-y privileges to access it.

 Ok, so decided. I will convert my code, test the build and send a small patch 
for this soon, perhaps today.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH] backends/hostmem-file: Add file-name property

2015-10-27 Thread Pavel Fedin

 Hello!

> Is there a missing qapi change, so that the new filename attribute can
> also be specified by QMP command?  Or do we not support hotplug of
> ivshmem yet?

 To tell the truth, i don't know, and we currently do not test ivshmem with 
hotplug here, neither i heard that we would want it. And i actually even don't 
know how hotplug works in qemu.
 I discovered an issue and fixed it - this simple. Without this fix ivshmem + 
hostmem-file is unusable because the file cannot be located, therefore the 
memory cannot be shared.
 Could this (possibly) missing thing be added afterwards, if necessary? Anyway, 
my fix does not break anything (i hope).

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH v2] target-arm: Extract some external ARM CPU API

2015-10-27 Thread Pavel Fedin

 Hello!

> But Peter C has a better grasp of the current status of
> multiarch than I do, so if he prefers using obj-y then
> I'm happy to take his recommendation.

 I am neutral, and i will accept any solution. OTOH, he agreed with my proposal 
too, and even promised to pick it up into his patchsets, but suggested to 
change include pathname to include/hw/arm/cpu.h. And i have the last concern 
about it because hw/arm/ is a place where board-specific includes are placed.
 Additionally, we already have three more specific ARM CPU files in 
include/hw/cpu/, so they just look nice together.

 And you know... Theoretically... As far as i can understand, multi-arch is a 
way to enable emulation of heterogeneous systems, which combine different CPUs. 
Am i right about it? In this case, what if in future we have some heterogeneous 
HW, where e.g. ARM and x86 are interoperating using system registers? In this 
case, x86-targeted qemu code would probably have to talk to ARM-targeted one. 
And having this API separated from the inner CPU guts would only help.
 Compared to this architectural improvement, obj-y looks like very simple way 
to ignore, but not to solve the problem. :)

 So, Peter C, what is your final decision?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH] migration: Introduce migration_in_completion()

2015-10-27 Thread Pavel Fedin

 Hello!

> Applied, thanks.

 Thank you.

> I wonder who it has never been needed before.  BTW, what is ITS?

 ITS is Interrupt Translation Service. It is used by ARM64 architecture. In a 
short, it is a special hardware which performs
table-based lookup of MSI IDs and their destination CPUs, and routes them. 
There can be lots of CPUs and lots of interrupts, so
these tables are large and are held in a system RAM.
 qemu implementation of this thing needs to be able to flush its internal 
cached state into memory for live migration. This has to
be done during final stage, right after CPUs have been stopped, because right 
after this point migration code will recheck RAM
status and send out modified data for the last time.
 The implementation itself has to wait, because the necessary KVM API is also 
not ready yet, but i prefer to upstream as much of the
infrastructure as possible, because: a) we want to reduce amount of out-of-tree 
patches for our project; b) these things affect qemu
core code, and upstreaming them makes sure that we do it right, and our 
out-of-tree code will not diverge from the upstream too
much.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH] migration: Introduce MIGRATION_STATUS_FINISHING

2015-10-27 Thread Pavel Fedin

 Hello!

> If it is truly internal, then avoiding exposing the state to external
> clients is indeed the way to go.

 Already done: 
http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg06262.html, you 
should have received it too.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

[Qemu-devel] [PATCH] backends/hostmem-file: Add file-name property

2015-10-27 Thread Pavel Fedin

This option allows to explicitly specify file name to use with the backend.
This is important when using it together with ivshmem in order to make it
backed by hugetlbfs. By default filename is autogenerated using mkstemp(),
and the file is unlink()ed after creation, effectively making it
anonymous. This is not very useful with ivshmem because it ends up in a
memory which cannot be accessed by something else.

Signed-off-by: Pavel Fedin 
Tested-by: Igor Skalkin 
---
 backends/hostmem-file.c | 26 +-
 exec.c  | 36 
 include/exec/memory.h   |  3 +++
 include/exec/ram_addr.h |  2 +-
 memory.c|  4 +++-
 numa.c  |  2 +-
 qemu-doc.texi   |  2 +-
 7 files changed, 58 insertions(+), 17 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index e9b6d21..557eaf1 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -31,6 +31,7 @@ struct HostMemoryBackendFile {
 
 bool share;
 char *mem_path;
+char *file_name;
 };
 
 static void
@@ -54,7 +55,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error 
**errp)
 memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
  object_get_canonical_path(OBJECT(backend)),
  backend->size, fb->share,
- fb->mem_path, errp);
+ fb->mem_path, fb->file_name, errp);
 }
 #endif
 }
@@ -83,10 +84,31 @@ static void set_mem_path(Object *o, const char *str, Error 
**errp)
 error_setg(errp, "cannot change property value");
 return;
 }
+
 g_free(fb->mem_path);
 fb->mem_path = g_strdup(str);
 }
 
+static char *get_file_name(Object *o, Error **errp)
+{
+HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+
+return g_strdup(fb->file_name);
+}
+
+static void set_file_name(Object *o, const char *str, Error **errp)
+{
+HostMemoryBackend *backend = MEMORY_BACKEND(o);
+HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+
+if (memory_region_size(&backend->mr)) {
+error_setg(errp, "cannot change property value");
+return;
+}
+g_free(fb->file_name);
+fb->file_name = g_strdup(str);
+}
+
 static bool file_memory_backend_get_share(Object *o, Error **errp)
 {
 HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
@@ -114,6 +136,8 @@ file_backend_instance_init(Object *o)
 file_memory_backend_set_share, NULL);
 object_property_add_str(o, "mem-path", get_mem_path,
 set_mem_path, NULL);
+object_property_add_str(o, "file-name", get_file_name,
+set_file_name, NULL);
 }
 
 static const TypeInfo file_backend_info = {
diff --git a/exec.c b/exec.c
index 8af2570..6cf1a36 100644
--- a/exec.c
+++ b/exec.c
@@ -1203,6 +1203,7 @@ static long gethugepagesize(const char *path, Error 
**errp)
 static void *file_ram_alloc(RAMBlock *block,
 ram_addr_t memory,
 const char *path,
+const char *name,
 Error **errp)
 {
 char *filename;
@@ -1240,18 +1241,29 @@ static void *file_ram_alloc(RAMBlock *block,
 *c = '_';
 }
 
-filename = g_strdup_printf("%s/qemu_back_mem.%s.XX", path,
-   sanitized_name);
-g_free(sanitized_name);
+if (name) {
+filename = g_strdup_printf("%s/%s", path, name);
+fd = open(filename, O_RDWR | O_CREAT, 0644);
+if (fd < 0) {
+error_setg_errno(errp, errno,
+ "unable to open backing store for hugepages");
+g_free(filename);
+goto error;
+}
+} else {
+filename = g_strdup_printf("%s/qemu_back_mem.%s.XX", path,
+   sanitized_name);
+g_free(sanitized_name);
 
-fd = mkstemp(filename);
-if (fd < 0) {
-error_setg_errno(errp, errno,
- "unable to create backing store for hugepages");
-g_free(filename);
-goto error;
+fd = mkstemp(filename);
+if (fd < 0) {
+error_setg_errno(errp, errno,
+ "unable to create backing store for hugepages");
+g_free(filename);
+goto error;
+}
+unlink(filename);
 }
-unlink(filename);
 g_free(filename);
 
 memory = ROUND_UP(memory, hpagesize);
@@ -1563,7 +1575,7 @@ static ram_addr_t ram_block_add(RAMBlock *new_block, 
Error **errp)
 #ifdef __linux__
 ram_addr_t qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
 bool share,

[Qemu-devel] [PATCH] backends/hostmem: Ignore ENOSYS while setting MPOL_DEFAULT

2015-10-27 Thread Pavel Fedin

Currently hostmem backend fails if CONFIG_NUMA is enabled for the qemu
(default), but NUMA is not supported by the kernel. This makes it
impossible to use ivshmem in such configurations.

This patch fixes the problem by ignoring ENOSYS error if policy is set to
MPOL_DEFAULT. This way the code behaves in the same way as if CONFIG_NUMA
was not defined. qemu will still fail if the user specifies some other
policy, so that the user knows it.

Signed-off-by: Pavel Fedin 
---
 backends/hostmem.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index 41ba2af..94a4ac0 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -313,9 +313,11 @@ host_memory_backend_memory_complete(UserCreatable *uc, 
Error **errp)
 assert(maxnode <= MAX_NODES);
 if (mbind(ptr, sz, backend->policy,
   maxnode ? backend->host_nodes : NULL, maxnode + 1, flags)) {
-error_setg_errno(errp, errno,
+if (backend->policy != MPOL_DEFAULT || errno != ENOSYS) {
+error_setg_errno(errp, errno,
  "cannot bind memory to host NUMA nodes");
-return;
+return;
+}
 }
 #endif
 /* Preallocate memory after the NUMA policy has been instantiated.
-- 
1.9.5.msysgit.0

Re: [Qemu-devel] [PATCH] backends/hostmem: Ignore ENOSYS while setting MPOL_DEFAULT

2015-10-27 Thread Pavel Fedin

 Sorry, guys, i forgot to add "v2" and simply sent the mail. Log is:

v1 => v2:
- Removed unnecessary parenthesis

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

> -Original Message-
> From: qemu-devel-bounces+p.fedin=samsung@nongnu.org [mailto:qemu-devel-
> bounces+p.fedin=samsung@nongnu.org] On Behalf Of Pavel Fedin
> Sent: Tuesday, October 27, 2015 3:52 PM
> To: qemu-devel@nongnu.org
> Cc: 'Paolo Bonzini'; 'Eduardo Habkost'
> Subject: [Qemu-devel] [PATCH] backends/hostmem: Ignore ENOSYS while setting 
> MPOL_DEFAULT
> 
> Currently hostmem backend fails if CONFIG_NUMA is enabled for the qemu
> (default), but NUMA is not supported by the kernel. This makes it
> impossible to use ivshmem in such configurations.
> 
> This patch fixes the problem by ignoring ENOSYS error if policy is set to
> MPOL_DEFAULT. This way the code behaves in the same way as if CONFIG_NUMA
> was not defined. qemu will still fail if the user specifies some other
> policy, so that the user knows it.
> 
> Signed-off-by: Pavel Fedin 
> ---
>  backends/hostmem.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/backends/hostmem.c b/backends/hostmem.c
> index 41ba2af..94a4ac0 100644
> --- a/backends/hostmem.c
> +++ b/backends/hostmem.c
> @@ -313,9 +313,11 @@ host_memory_backend_memory_complete(UserCreatable *uc, 
> Error **errp)
>  assert(maxnode <= MAX_NODES);
>  if (mbind(ptr, sz, backend->policy,
>maxnode ? backend->host_nodes : NULL, maxnode + 1, flags)) 
> {
> -error_setg_errno(errp, errno,
> +if (backend->policy != MPOL_DEFAULT || errno != ENOSYS) {
> +error_setg_errno(errp, errno,
>   "cannot bind memory to host NUMA nodes");
> -return;
> +return;
> +}
>  }
>  #endif
>  /* Preallocate memory after the NUMA policy has been instantiated.
> --
> 1.9.5.msysgit.0
>

[Qemu-devel] [PATCH] backends/hostmem: Ignore ENOSYS while setting MPOL_DEFAULT

2015-10-27 Thread Pavel Fedin

Currently hostmem backend fails if CONFIG_NUMA is enabled for the qemu
(default), but NUMA is not supported by the kernel. This makes it
impossible to use ivshmem in such configurations.

This patch fixes the problem by ignoring ENOSYS error if policy is set to
MPOL_DEFAULT. This way the code behaves in the same way as if CONFIG_NUMA
was not defined. qemu will still fail if the user specifies some other
policy, so that the user knows it.

Signed-off-by: Pavel Fedin 
---
 backends/hostmem.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index 41ba2af..826141b 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -313,9 +313,11 @@ host_memory_backend_memory_complete(UserCreatable *uc, 
Error **errp)
 assert(maxnode <= MAX_NODES);
 if (mbind(ptr, sz, backend->policy,
   maxnode ? backend->host_nodes : NULL, maxnode + 1, flags)) {
-error_setg_errno(errp, errno,
+if ((backend->policy != MPOL_DEFAULT) || (errno != ENOSYS)) {
+error_setg_errno(errp, errno,
  "cannot bind memory to host NUMA nodes");
-return;
+return;
+}
 }
 #endif
 /* Preallocate memory after the NUMA policy has been instantiated.
-- 
1.9.5.msysgit.0

Re: [Qemu-devel] [PATCH v2] target-arm: Extract some external ARM CPU API

2015-10-27 Thread Pavel Fedin

 Hello!

> We have plenty of precedent in tree for interrupt controllers being
> compiled as arch-specific for reasons such as this. Can we just
> promote GIC to an obj-y (much the same way the KVM GIC or V7MNVIC are
> promoted)? You should them have access to cpu.h and the CP interface.

 Huh, indeed, it's so simple... I failed to notice this.
 Peter, what do you think, if we indeed simply move gicv3 implementation from 
common-obj-y to obj-y?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH v2] target-arm: Extract some external ARM CPU API

2015-10-27 Thread Pavel Fedin

 Hello!

> >  include/hw/cpu/arm.h | 308 
> > +++
> 
> I think this would be hw/arm/cpu.h

 Sorry for split-reply, forgot to look at this...
 include/hw/arm is for board classes, isn't it?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

[Qemu-devel] PING-2: [PATCH v4 0/7] qom: more efficient object property handling

2015-10-27 Thread Pavel Fedin

 Hello!

> > I can further test cases to do more coverage of object proprty handling
> > wrt to classes, if you want me to.
> 
> No, if that is sorted out now, I'll drop v2 and need to review v4.

 How is it? Any problems / advancements? Could i help somehow?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] proposal: new qemu-arm mailing list

2015-10-27 Thread Pavel Fedin

 Hello!

> The idea would be to get people to cc the
> list with their ARM related patches, so it would mostly act as
> a way for people to filter their mail to separate the ARM stuff
> out from the qemu-devel firehose.

 I also vote for it. The main qemu-devel traffic is quite high, and not 
everyone wants to get it.

> (Everything should still be cc'd to qemu-devel as well.)

 I would drop this requirement, except for stuff which could touch core 
functionality (qom, qobject).

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

[Qemu-devel] [PATCH] migration: Introduce migration_in_completion()

2015-10-27 Thread Pavel Fedin

This allows to signal migration notifiers that the migration has entered
final phase. The condition is set after vm_stop_force_state().

This will be necessary for ITS live migration on ARM, which will have to
dump its state into guest RAM at this point.

Signed-off-by: Pavel Fedin 
---
 include/migration/migration.h | 2 ++
 migration/migration.c | 7 +++
 2 files changed, 9 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 8334621..51b0ea2 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -69,6 +69,7 @@ struct MigrationState
 int parameters[MIGRATION_PARAMETER_MAX];
 
 int state;
+bool in_completion;
 MigrationParams params;
 double mbps;
 int64_t total_time;
@@ -117,6 +118,7 @@ int migrate_fd_close(MigrationState *s);
 void add_migration_state_change_notifier(Notifier *notify);
 void remove_migration_state_change_notifier(Notifier *notify);
 bool migration_in_setup(MigrationState *);
+bool migration_in_completion(MigrationState *);
 bool migration_has_finished(MigrationState *);
 bool migration_has_failed(MigrationState *);
 MigrationState *migrate_get_current(void);
diff --git a/migration/migration.c b/migration/migration.c
index b092f38..f4a2421 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -674,6 +674,11 @@ bool migration_in_setup(MigrationState *s)
 return s->state == MIGRATION_STATUS_SETUP;
 }
 
+bool migration_in_completion(MigrationState *s)
+{
+return s->in_completion;
+}
+
 bool migration_has_finished(MigrationState *s)
 {
 return s->state == MIGRATION_STATUS_COMPLETED;
@@ -996,6 +1001,8 @@ static void migration_completion(MigrationState *s, bool 
*old_vm_running,
 if (!ret) {
 ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
 if (ret >= 0) {
+s->in_completion = true;
+notifier_list_notify(&migration_state_notifiers, s);
 qemu_file_set_rate_limit(s->file, INT64_MAX);
 qemu_savevm_state_complete(s->file);
 }
-- 
1.9.5.msysgit.0

Re: [Qemu-devel] [PATCH] migration: Introduce MIGRATION_STATUS_FINISHING

2015-10-27 Thread Pavel Fedin

 Hello!

> adding new user-visible states
> has a tendency to break existing clients that aren't prepared for
> unexpected states (although technically such bugs are in the client - in
> the past, libvirt used to be one such client, although we've tried to
> fix it to gracefully ignore unknown states).

 Yes, i know this, my host uses libvirt v1.2.9.3 (i backport necessary patches 
to it) and it barfed. I didn't check master though...

>  Are we sure no other
> existing state works for this?  I'm not opposed to the addition, just
> wanting to make sure we have good reason for it.

 You know, actually, this is a thing only for qemu's internal use, we don't 
need a new state at all. Instead, we could introduce a 'bool cpus_stopped' to 
MigrationState structure and test for it in migration_in_finishing(), like:
--- cut ---
bool migration_in_finishing(MigrationState *s)
{
return s->cpus_stopped;
}
--- cut ---
 What do you think about this solution? It is much less complicated than...

> One design idea for adding new states is making sure the new state will
> not be exposed unless the client specifies some new option to enable the
> state, so that old clients will never see the state and new clients have
> expressed their interest in the state

 With this kind of approach we would not be able to migrate ITS without this 
option. Because it's not external clients being interested in this state at the 
moment, but qemu itself.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

[Qemu-devel] [PATCH] migration: Introduce MIGRATION_STATUS_FINISHING

2015-10-26 Thread Pavel Fedin

This status is set after vm_stop_force_state(), and is telling us that all
CPUs are stopped, and we are finishing up with the migration.

Also, call notifier_list_notify() when this status is set. This will be
necessary for ITS live migration on ARM, which will have to dump its state
into guest RAM at this point.

Signed-off-by: Pavel Fedin 
---
 hmp.c |  1 +
 include/migration/migration.h |  1 +
 migration/migration.c | 19 ---
 qapi-schema.json  |  4 +++-
 4 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/hmp.c b/hmp.c
index 5048eee..202768f 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1467,6 +1467,7 @@ static void hmp_migrate_status_cb(void *opaque)
 
 info = qmp_query_migrate(NULL);
 if (!info->has_status || info->status == MIGRATION_STATUS_ACTIVE ||
+info->status == MIGRATION_STATUS_FINISHING ||
 info->status == MIGRATION_STATUS_SETUP) {
 if (info->has_disk) {
 int progress;
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 8334621..ff4bfcb 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -117,6 +117,7 @@ int migrate_fd_close(MigrationState *s);
 void add_migration_state_change_notifier(Notifier *notify);
 void remove_migration_state_change_notifier(Notifier *notify);
 bool migration_in_setup(MigrationState *);
+bool migration_in_finishing(MigrationState *);
 bool migration_has_finished(MigrationState *);
 bool migration_has_failed(MigrationState *);
 MigrationState *migrate_get_current(void);
diff --git a/migration/migration.c b/migration/migration.c
index b092f38..8f28585 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -427,6 +427,7 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 info->has_total_time = false;
 break;
 case MIGRATION_STATUS_ACTIVE:
+case MIGRATION_STATUS_FINISHING:
 case MIGRATION_STATUS_CANCELLING:
 info->has_status = true;
 info->has_total_time = true;
@@ -507,6 +508,7 @@ void 
qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
 MigrationCapabilityStatusList *cap;
 
 if (s->state == MIGRATION_STATUS_ACTIVE ||
+s->state == MIGRATION_STATUS_FINISHING ||
 s->state == MIGRATION_STATUS_SETUP) {
 error_setg(errp, QERR_MIGRATION_ACTIVE);
 return;
@@ -641,7 +643,8 @@ static void migrate_fd_cancel(MigrationState *s)
 do {
 old_state = s->state;
 if (old_state != MIGRATION_STATUS_SETUP &&
-old_state != MIGRATION_STATUS_ACTIVE) {
+old_state != MIGRATION_STATUS_ACTIVE &&
+old_state != MIGRATION_STATUS_FINISHING) {
 break;
 }
 migrate_set_state(s, old_state, MIGRATION_STATUS_CANCELLING);
@@ -674,6 +677,11 @@ bool migration_in_setup(MigrationState *s)
 return s->state == MIGRATION_STATUS_SETUP;
 }
 
+bool migration_in_finishing(MigrationState *s)
+{
+return s->state == MIGRATION_STATUS_FINISHING;
+}
+
 bool migration_has_finished(MigrationState *s)
 {
 return s->state == MIGRATION_STATUS_COMPLETED;
@@ -774,6 +782,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 params.shared = has_inc && inc;
 
 if (s->state == MIGRATION_STATUS_ACTIVE ||
+s->state == MIGRATION_STATUS_FINISHING ||
 s->state == MIGRATION_STATUS_SETUP ||
 s->state == MIGRATION_STATUS_CANCELLING) {
 error_setg(errp, QERR_MIGRATION_ACTIVE);
@@ -985,6 +994,7 @@ int64_t migrate_xbzrle_cache_size(void)
 static void migration_completion(MigrationState *s, bool *old_vm_running,
  int64_t *start_time)
 {
+MigrationStatus state = MIGRATION_STATUS_ACTIVE;
 int ret;
 
 qemu_mutex_lock_iothread();
@@ -996,6 +1006,9 @@ static void migration_completion(MigrationState *s, bool 
*old_vm_running,
 if (!ret) {
 ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
 if (ret >= 0) {
+state = MIGRATION_STATUS_FINISHING;
+migrate_set_state(s, MIGRATION_STATUS_ACTIVE, state);
+notifier_list_notify(&migration_state_notifiers, s);
 qemu_file_set_rate_limit(s->file, INT64_MAX);
 qemu_savevm_state_complete(s->file);
 }
@@ -1011,11 +1024,11 @@ static void migration_completion(MigrationState *s, 
bool *old_vm_running,
 goto fail;
 }
 
-migrate_set_state(s, MIGRATION_STATUS_ACTIVE, MIGRATION_STATUS_COMPLETED);
+migrate_set_state(s, state, MIGRATION_STATUS_COMPLETED);
 return;
 
 fail:
-migrate_set_state(s, MIGRATION_STATUS_ACTIVE, MIGRATION_STATUS_FAILED);
+migrate_set_state(s, state, MIGRATION_STATUS_FAILED);
 }
 
 /* migration thread support */
diff --git a/qapi-schema.json b/qapi-schema.json
index f60be29..c6226da 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@

Re: [Qemu-devel] [RFC PATCH v3 3/4] hw/intc/arm_gicv3_kvm: Implement get/put functions

2015-10-26 Thread Pavel Fedin

 Hello!

> > +reg = c->pendbaser & (GICR_PENDBASER_OUTER_CACHEABILITY_MASK |
> > +  GICR_PENDBASER_ADDR_MASK |
> > +  GICR_PENDBASER_SHAREABILITY_MASK |
> > +  GICR_PENDBASER_CACHEABILITY_MASK);
> > +if (!c->redist_ctlr & GICR_CTLR_ENABLE_LPIS) {
> > +reg |= GICR_PENDBASER_PTZ;
> > +}
> 
> Why does the state of the pendbaser register depend on state in the
> redist_ctlr ?

 PTZ bit is write-only, we cannot read it back. And spec says that setting PTZ 
is adviced while LPIs are not enabled, because it shortens down the time of GIC 
initialization. So, i had to implement this small heuristics here. Is this 
approach OK?

> Worth a comment, whatever the answer is.

 I will.

> > +kvm_gicr_access(s, GICR_PENDBASER, ncpu, ®, false);
> > +c->pendbaser = reg & (GICR_PENDBASER_OUTER_CACHEABILITY_MASK |
> > +  GICR_PENDBASER_ADDR_MASK |
> > +  GICR_PENDBASER_SHAREABILITY_MASK |
> > +  GICR_PENDBASER_CACHEABILITY_MASK);
> 
> Why do we need to mask these values?

 I decided to do this at least for the case of KVM->TCG migration (as far as i 
understand, such things are possible). In this case i think we should not 
pollute our state with read-only bits, which get added by the emulation code 
itself.

> Do we not transfer ICC_SRE_EL1 because it's implemented as RO?
> (I think that's right for no-irq/fiq-bypass, sysregs only.)

 Yes, also because looks like KVM is not going to implement GICv3 with non-SRE 
mode, instead, if we want to run a legacy guest, we just configure our host to 
provide GICv2 for it.
 I actually migrate only those CPU interface registers, which are saved by the 
kernel code as part of guest's context.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

[Qemu-devel] [PATCH v2] target-arm: Extract some external ARM CPU API

2015-10-26 Thread Pavel Fedin

Includes, which reside in target-arm, are very problematic to use from code
which is considered target-independent by the build system (for example,
hw/intc/something.c). It happens because they depend on config-target.h,
which is unreachable in this case. This creates problems for example for
GICv3 implementation, which needs to call define_arm_cp_regs_with_opaque()
in order to add system registers to the processor model, as well as play
with affinity IDs.

This patch solves the problem by extracting some self-sufficient
definitions into public area (include/hw/cpu).

Signed-off-by: Pavel Fedin 
---
v1 => v2:
- mp-affinity property addition left out, now a pure code move
- Move some more useful definitions (REGINFO_SENTINEL) and NOP accessors.
---
 include/hw/cpu/arm.h | 308 +++
 target-arm/cpu-qom.h |  40 +--
 target-arm/cpu.h | 239 +--
 3 files changed, 311 insertions(+), 276 deletions(-)
 create mode 100644 include/hw/cpu/arm.h

diff --git a/include/hw/cpu/arm.h b/include/hw/cpu/arm.h
new file mode 100644
index 000..de7bec7
--- /dev/null
+++ b/include/hw/cpu/arm.h
@@ -0,0 +1,308 @@
+/*
+ * QEMU ARM CPU
+ *
+ * Copyright (c) 2003 Fabrice Bellard
+ * Copyright (c) 2012 SUSE LINUX Products GmbH
+ * Copyright (c) 2015 Samsung Electronics Co. Ltd.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see
+ * <http://www.gnu.org/licenses/gpl-2.0.html>
+ */
+
+#ifndef HW_CPU_ARM_H
+#define HW_CPU_ARM_H
+
+#include "qom/cpu.h"
+
+#define TYPE_ARM_CPU "arm-cpu"
+
+#define ARM_CPU_CLASS(klass) \
+OBJECT_CLASS_CHECK(ARMCPUClass, (klass), TYPE_ARM_CPU)
+#define ARM_CPU(obj) \
+OBJECT_CHECK(ARMCPU, (obj), TYPE_ARM_CPU)
+#define ARM_CPU_GET_CLASS(obj) \
+OBJECT_GET_CLASS(ARMCPUClass, (obj), TYPE_ARM_CPU)
+
+/**
+ * ARMCPUClass:
+ * @parent_realize: The parent class' realize handler.
+ * @parent_reset: The parent class' reset handler.
+ *
+ * An ARM CPU model.
+ */
+typedef struct ARMCPUClass {
+/*< private >*/
+CPUClass parent_class;
+/*< public >*/
+
+DeviceRealize parent_realize;
+void (*parent_reset)(CPUState *cpu);
+} ARMCPUClass;
+
+/* These two are black boxes for us */
+typedef struct ARMCPU ARMCPU;
+typedef struct CPUARMState CPUARMState;
+
+/* ARMCPRegInfo type field bits. If the SPECIAL bit is set this is a
+ * special-behaviour cp reg and bits [15..8] indicate what behaviour
+ * it has. Otherwise it is a simple cp reg, where CONST indicates that
+ * TCG can assume the value to be constant (ie load at translate time)
+ * and 64BIT indicates a 64 bit wide coprocessor register. SUPPRESS_TB_END
+ * indicates that the TB should not be ended after a write to this register
+ * (the default is that the TB ends after cp writes). OVERRIDE permits
+ * a register definition to override a previous definition for the
+ * same (cp, is64, crn, crm, opc1, opc2) tuple: either the new or the
+ * old must have the OVERRIDE bit set.
+ * ALIAS indicates that this register is an alias view of some underlying
+ * state which is also visible via another register, and that the other
+ * register is handling migration and reset; registers marked ALIAS will not be
+ * migrated but may have their state set by syncing of register state from KVM.
+ * NO_RAW indicates that this register has no underlying state and does not
+ * support raw access for state saving/loading; it will not be used for either
+ * migration or KVM state synchronization. (Typically this is for "registers"
+ * which are actually used as instructions for cache maintenance and so on.)
+ * IO indicates that this register does I/O and therefore its accesses
+ * need to be surrounded by gen_io_start()/gen_io_end(). In particular,
+ * registers which implement clocks or timers require this.
+ */
+#define ARM_CP_SPECIAL 1
+#define ARM_CP_CONST 2
+#define ARM_CP_64BIT 4
+#define ARM_CP_SUPPRESS_TB_END 8
+#define ARM_CP_OVERRIDE 16
+#define ARM_CP_ALIAS 32
+#define ARM_CP_IO 64
+#define ARM_CP_NO_RAW 128
+#define ARM_CP_NOP (ARM_CP_SPECIAL | (1 << 8))
+#define ARM_CP_WFI (ARM_CP_SPECIAL | (2 << 8))
+#define ARM_CP_NZCV (ARM_CP_SPECIAL | (3 << 8))
+#define ARM_CP_CURRENTEL (ARM_CP_SPECIAL | (4 << 8))
+#define ARM_CP_DC_ZVA (ARM_CP_SPECIAL | (5 << 8))
+#define ARM_LAST_SPECIAL ARM_CP_DC_ZVA
+/* Used

Re: [Qemu-devel] [RFC PATCH v3 1/4] hw/intc/arm_gicv3_common: Add state information

2015-10-26 Thread Pavel Fedin

 Hello!

 I skipped many of comments because they are straightforward to address, 
decided to discuss only important ones.

> So we're going to (potentially) emulate:
>  * non-system-register config
>  * non-affinity-routing config
> 
> ? OK, but are we really sure we want to do that? Legacy config
> is deprecated and it's really going to complicate the model...

 I remember that you told me that we are going to emulate full-blown GICv3, 
with HYP support and will all legacy stuff. You told me about this while 
merging the initial GICv3 series, so we reserved MMIO space for legacy CPU 
interface. So, i was pretty confident that we are going to do this over time, 
aren't we?

> (Are we starting out with the non-legacy config that forces
> system-regs and affinity-routing to always-on?)

 Yes, we are, but i also remember that you told me that migration data format, 
once implemented, is set in stone, so we should think very well. (*) So far, i 
added also the following legacy stuff:

1. GICC_CTLR

 This is currently used to store GRPEN bits. I thought that having dedicated 
bool's for them is too much, they are just single bits, and they have to be 
mirrored in GICC_CTLR, once implemented. So i just squashed them in there from 
the beginning. Additionally, some of its bits, like FIQBypDisGrpX and 
IRQBypDisGrpX, do not have analogs in system registers. So, if we even 
implement them, we'll have to store them in dedicated place, and now we already 
have this place.

2. GIC_SPENDSGIR (**)

 Actually this is not used by in-kernel vGIC, but this is necessary for SW 
emulation. Because, as far as i can understand Shlomo's code, for software 
emulation we need to track down which source CPUs are sending SGIs, even if we 
don't emulate legacy interface. Because, for example, if two CPUs send an SGI 
to another CPU at the same time, the destination CPU should actually get two 
interrupts. So, we have to track down whose interrupts are already delivered 
and whose are not yet. And Shlomo's code uses a bitmask for that, which i put 
in GICv3SGISource.

3. GICD_ITARGETSR

 Ok, this is actually not used yet, but again, this does not have any analog in 
system register interface, so once we have legacy mode, we have to store it 
somewhere. So i decided to reserve it too, because of (*).

> > +/* Workaround!
> > + * Linux (drivers/irqchip/irq-gic-v3.c) is enabling only group one,
> > + * in gic_cpu_sys_reg_init it calls gic_write_grpen1(1);
> > + * but it doesn't conigure any interrupt to be in group one.
> > + * The same for SPIs below
> > + */
> 
> Is this a bug in Linux, or is it just expecting that firmware configures
> all interrupts into group 1 for it? (ie do we need some variation on
> commit 8ff41f3995ad2d for gicv3 ?)

 To tell the truth i don't know, i left original Shlomo's comment.
 I'm not sure it’s really a bug, i think Linux relies on the firmware. All 
boards i know have some kind of trustzone code, even minimal one, and i believe 
it's its responsibility to set these things up.

> > +uint32_t gicv3_get_igrpen0(GICv3State *s, int cpuindex)
> > +{
> > +GICv3CPUState *c = &s->cpu[cpuindex];
> > +
> > +return extract32(c->legacy_ctlr, GICC_CTLR_EN_GRP0_SHIFT, 1);
> > +}
> 
> My gut feeling is that if we alias legacy and system register
> state, then we should do it by having the 'master' copy be
> the system register format.

 Explained above. I could have bool grpen[2], but this would be two 32-bit 
variables, not used for anything else. And legacy_ctlr is a single 32-bit 
variable, which is potentially more functional, and provides better backwards 
compatibility for live migration data format.

> > +struct GICv3SGISource {
> > +/* For each SGI on the target CPU, we store bit mask
> > + * indicating which source CPUs have made this SGI
> > + * pending on the target CPU. These correspond to
> > + * the bytes in the GIC_SPENDSGIR* registers as
> > + * read by the target CPU.
> > + */
> > +unsigned long *pending;
> > +int32_t size; /* Bitmap size for migration */
> > +};
> 
> This doesn't look right. There is one GICD_SPENDSGIR* register set
> for each CPU, but each CPU has only 8 pending bits per SGI.
> (That's because this is only relevant for legacy operation
> with affinity-routing disabled, and the limitations on
> legacy operation apply.) So you don't need a huge bitmap here.
> I would default to modelling this as uint32_t spendsgir[4]
> unless there's a good reason to do something else.

 This is about (**). Or do you want to tell that GICv3 with affinity routing 
enabled simply doesn't care about source CPUs, and if several CPUs trigger the 
same SGI for the same destination, the destination gets only one SGI?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH] target-arm: Extract some external ARM CPU API

2015-10-23 Thread Pavel Fedin

 Hello!

> Please don't do two things in one patch. (I actually read this patch
> code-first and thought you'd accidentally inserted this change due
> to a rebasing mishap...) It's particularly bad in patches which are
> otherwise almost entirely moving code from one file to another,
> because it's easy to overlook the substantive change in the resulting
> large patch.

 Ok, ok, sorry... I really wanted to push this in somehow, and i thought that 
maybe you'll like it as a little thing that completes the reusable API.
 Should i post v2 with this hunk removed?
 
> > +typedef struct ARMCPUClass {
> > +/*< private >*/
> > +CPUClass parent_class;
> > +/*< public >*/
> > +
> > +DeviceRealize parent_realize;
> > +void (*parent_reset)(CPUState *cpu);
> > +} ARMCPUClass;
> 
> I'm slightly surprised we need to define the ARMCPUClass struct
> here -- what needs the definition rather than just the typedef,
> or is this just to follow other header files?

 Other header files, i believe, do it for a reason. To tell the truth i made 
the patch very quickly, and didn't have much time to dig in 
OBJECT_CLASS_CHECK() internals. Does it need a sizeof() maybe? And without it 
thing looked a bit ugly and incomplete.
 Class structure is pretty self-sufficient and straightforward, so i just moved 
it too.

 I'll reply to the rest of messages on monday, weekend is coming. :)

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

[Qemu-devel] [RFC PATCH v3 0/4] GICv3 live migration support

2015-10-22 Thread Pavel Fedin

This series introduces support for GICv3 live migration. It is based on
kernel API which is not released yet, therefore i post it as an RFC.

Kernel patches which implement this functionality are:
- [PATCH v5 0/7] KVM: arm64: Implement API for vGICv3 live migration
  http://www.spinics.net/lists/kvm/msg122040.html

One more prerequisite for this series is:
- [PATCH] target-arm: Extract some external ARM CPU API
  http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05269.html

The main purpose of this RFC is to agree on GICv3 state data format,
because software implementation of GICv3 is also going to use it. In order
to simplify GICv3 software emulation development, parts 1 and 2 of this
patchset can be accepted right now, without waiting for the kernel part.

v2 => v3:
- Integrated state manipulation macros from Shlomo Pongratz' GICv3 RFC
  (with some changes)
- Added fields for SGI source masks
- Do not use mp-affinity property every time, cache IDs internally instead
- Removed mp-affinity property patch, now a part of prerequisite

v1 => v2:
- Use different kernel API, agreed upon by KVM developers
- Reworked state representation, do not duplicate SPI fields any more
- Added basic LPI support (PROPBASER and PENDBASER).

Pavel Fedin (4):
  hw/intc/arm_gicv3_common: Add state information
  kernel: Add definitions for GICv3 attributes
  hw/intc/arm_gicv3_kvm: Implement get/put functions
  hw/intc/arm_gicv3_common: Add vmstate descriptors

 hw/intc/arm_gicv3_common.c | 185 ++-
 hw/intc/arm_gicv3_kvm.c| 456 -
 hw/intc/gicv3_internal.h   | 265 +
 include/hw/intc/arm_gicv3_common.h |  93 +++-
 include/migration/vmstate.h|   9 +
 linux-headers/asm-arm64/kvm.h  |  17 +-
 6 files changed, 1011 insertions(+), 14 deletions(-)
 create mode 100644 hw/intc/gicv3_internal.h

-- 
2.4.4

[Qemu-devel] [RFC PATCH v3 3/4] hw/intc/arm_gicv3_kvm: Implement get/put functions

2015-10-22 Thread Pavel Fedin

This actually implements pre_save and post_load methods for in-kernel
vGICv3.

Signed-off-by: Pavel Fedin 
---
 hw/intc/arm_gicv3_kvm.c | 456 +++-
 1 file changed, 452 insertions(+), 4 deletions(-)

diff --git a/hw/intc/arm_gicv3_kvm.c b/hw/intc/arm_gicv3_kvm.c
index b48f78f..ce8d2a0 100644
--- a/hw/intc/arm_gicv3_kvm.c
+++ b/hw/intc/arm_gicv3_kvm.c
@@ -21,8 +21,11 @@
 
 #include "hw/intc/arm_gicv3_common.h"
 #include "hw/sysbus.h"
+#include "migration/migration.h"
+#include "qemu/error-report.h"
 #include "sysemu/kvm.h"
 #include "kvm_arm.h"
+#include "gicv3_internal.h"
 #include "vgic_common.h"
 
 #ifdef DEBUG_GICV3_KVM
@@ -41,6 +44,23 @@
 #define KVM_ARM_GICV3_GET_CLASS(obj) \
  OBJECT_GET_CLASS(KVMARMGICv3Class, (obj), TYPE_KVM_ARM_GICV3)
 
+#define ICC_PMR_EL1 \
+KVM_DEV_ARM_VGIC_SYSREG(0b11, 0b000, 0b0100, 0b0110, 0b000)
+#define ICC_BPR0_EL1\
+KVM_DEV_ARM_VGIC_SYSREG(0b11, 0b000, 0b1100, 0b1000, 0b011)
+#define ICC_APR0_EL1(n) \
+KVM_DEV_ARM_VGIC_SYSREG(0b11, 0b000, 0b1100, 0b1000, 0b100 | n)
+#define ICC_APR1_EL1(n) \
+KVM_DEV_ARM_VGIC_SYSREG(0b11, 0b000, 0b1100, 0b1001, 0b000 | n)
+#define ICC_BPR1_EL1\
+KVM_DEV_ARM_VGIC_SYSREG(0b11, 0b000, 0b1100, 0b1100, 0b011)
+#define ICC_CTLR_EL1\
+KVM_DEV_ARM_VGIC_SYSREG(0b11, 0b000, 0b1100, 0b1100, 0b100)
+#define ICC_IGRPEN0_EL1 \
+KVM_DEV_ARM_VGIC_SYSREG(0b11, 0b000, 0b1100, 0b1100, 0b110)
+#define ICC_IGRPEN1_EL1 \
+KVM_DEV_ARM_VGIC_SYSREG(0b11, 0b000, 0b1100, 0b1100, 0b111)
+
 typedef struct KVMARMGICv3Class {
 ARMGICv3CommonClass parent_class;
 DeviceRealize parent_realize;
@@ -54,16 +74,431 @@ static void kvm_arm_gicv3_set_irq(void *opaque, int irq, 
int level)
 kvm_arm_gic_set_irq(s->num_irq, irq, level);
 }
 
+#define VGIC_CPUID(cpuid) cpuid) & ARM_AFF3_MASK) >> 8) | \
+   ((cpuid) & ARM32_AFFINITY_MASK))
+#define KVM_VGIC_ATTR(reg, cpuid) \
+((VGIC_CPUID(cpuid) << KVM_DEV_ARM_VGIC_CPUID_SHIFT) | (reg))
+
+static inline void kvm_gicd_access(GICv3State *s, int offset, int cpu,
+   uint64_t *val, bool write)
+{
+kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_DIST_REGS,
+  KVM_VGIC_ATTR(offset, s->cpu[cpu].affinity_id),
+  val, write);
+}
+
+static inline void kvm_gicr_access(GICv3State *s, int offset, int cpu,
+   uint64_t *val, bool write)
+{
+kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_REDIST_REGS,
+  KVM_VGIC_ATTR(offset, s->cpu[cpu].affinity_id),
+  val, write);
+}
+
+static inline void kvm_gicc_access(GICv3State *s, uint64_t reg, int cpu,
+   uint64_t *val, bool write)
+{
+kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS,
+  KVM_VGIC_ATTR(reg, s->cpu[cpu].affinity_id),
+  val, write);
+}
+
+/*
+ * Translate from the in-kernel field for an IRQ value to/from the qemu
+ * representation.
+ */
+typedef void (*vgic_translate_fn)(GICv3State *s, int irq, int cpu,
+  uint32_t *field, bool to_kernel);
+
+/* synthetic translate function used for clear/set registers to completely
+ * clear a setting using a clear-register before setting the remaining bits
+ * using a set-register */
+static void translate_clear(GICv3State *s, int irq, int cpu,
+uint32_t *field, bool to_kernel)
+{
+if (to_kernel) {
+*field = ~0;
+} else {
+/* does not make sense: qemu model doesn't use set/clear regs */
+abort();
+}
+}
+
+static void translate_enabled(GICv3State *s, int irq, int cpu,
+  uint32_t *field, bool to_kernel)
+{
+if (to_kernel) {
+*field = GIC_TEST_ENABLED(irq, cpu);
+} else {
+GIC_REPLACE_ENABLED(irq, cpu, *field);
+}
+}
+
+static void translate_group(GICv3State *s, int irq, int cpu,
+uint32_t *field, bool to_kernel)
+{
+if (to_kernel) {
+*field = GIC_TEST_GROUP(irq, cpu);
+} else {
+GIC_REPLACE_GROUP(irq, cpu, *field);
+}
+}
+
+static void translate_trigger(GICv3State *s, int irq, int cpu,
+  uint32_t *field, bool to_kernel)
+{
+if (to_kernel) {
+*field = GIC_TEST_EDGE_TRIGGER(irq, cpu) ? 2 : 0;
+} else {
+GIC_REPLACE_EDGE_TRIGGER(irq, cpu, *field & 2);
+}
+}
+
+static void translate_pending(GICv3State *s, int irq, int cpu,
+  uint32_t *field, bool to_kernel)
+{
+if (to_kernel) {
+*field = gic_test_pending(s, irq, cpu);
+} else {
+GIC_REPLACE_PENDING(irq, cpu, *field);
+/* TODO: Capture if level-line is held high in the kernel

[Qemu-devel] [RFC PATCH v3 4/4] hw/intc/arm_gicv3_common: Add vmstate descriptors

2015-10-22 Thread Pavel Fedin

Add state structure descriptors and actually enable live migration.

In order to describe fixed-size bitmaps, VMSTATE_BITMAP_STATIC() macro is
introduced.

Signed-off-by: Pavel Fedin 
---
 hw/intc/arm_gicv3_common.c  | 58 -
 include/migration/vmstate.h |  9 +++
 2 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/hw/intc/arm_gicv3_common.c b/hw/intc/arm_gicv3_common.c
index 2082d05..12d9de1 100644
--- a/hw/intc/arm_gicv3_common.c
+++ b/hw/intc/arm_gicv3_common.c
@@ -45,11 +45,67 @@ static int gicv3_post_load(void *opaque, int version_id)
 return 0;
 }
 
+static const VMStateDescription vmstate_gicv3_sgi = {
+.name = "arm_gicv3_sgi",
+.version_id = 1,
+.minimum_version_id = 1,
+.fields = (VMStateField[]) {
+VMSTATE_BITMAP(pending, GICv3SGISource, 0, size),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static const VMStateDescription vmstate_gicv3_cpu = {
+.name = "arm_gicv3_cpu",
+.version_id = 1,
+.minimum_version_id = 1,
+.fields = (VMStateField[]) {
+VMSTATE_BOOL(cpu_enabled, GICv3CPUState),
+VMSTATE_UINT32(redist_ctlr, GICv3CPUState),
+VMSTATE_UINT32(group, GICv3CPUState),
+VMSTATE_UINT32(enabled, GICv3CPUState),
+VMSTATE_UINT32(pending, GICv3CPUState),
+VMSTATE_UINT32(active, GICv3CPUState),
+VMSTATE_UINT32(edge_trigger, GICv3CPUState),
+VMSTATE_UINT8_ARRAY(priority, GICv3CPUState, GIC_INTERNAL),
+VMSTATE_UINT64(propbaser, GICv3CPUState),
+VMSTATE_UINT64(pendbaser, GICv3CPUState),
+VMSTATE_UINT32_ARRAY(ctlr, GICv3CPUState, 2),
+VMSTATE_UINT32(priority_mask, GICv3CPUState),
+VMSTATE_UINT32_ARRAY(bpr, GICv3CPUState, 2),
+VMSTATE_UINT32_2DARRAY(apr, GICv3CPUState, 4, 2),
+VMSTATE_UINT32(legacy_ctlr, GICv3CPUState),
+VMSTATE_STRUCT_ARRAY(sgi, GICv3CPUState, GIC_NR_SGIS, 1,
+ vmstate_gicv3_sgi, GICv3SGISource),
+VMSTATE_UINT16(current_pending, GICv3CPUState),
+VMSTATE_UINT16(running_irq, GICv3CPUState),
+VMSTATE_UINT16(running_priority, GICv3CPUState),
+VMSTATE_UINT16_ARRAY(last_active, GICv3CPUState, GICV3_MAXIRQ),
+VMSTATE_END_OF_LIST()
+}
+};
+
 static const VMStateDescription vmstate_gicv3 = {
 .name = "arm_gicv3",
-.unmigratable = 1,
+.version_id = 1,
+.minimum_version_id = 1,
 .pre_save = gicv3_pre_save,
 .post_load = gicv3_post_load,
+.fields = (VMStateField[]) {
+VMSTATE_UINT32(ctlr, GICv3State),
+VMSTATE_BITMAP_STATIC(group, GICv3State , 0, GICV3_MAXSPI),
+VMSTATE_BITMAP_STATIC(enabled, GICv3State , 0, GICV3_MAXSPI),
+VMSTATE_BITMAP_STATIC(pending, GICv3State , 0, GICV3_MAXSPI),
+VMSTATE_BITMAP_STATIC(active, GICv3State , 0, GICV3_MAXSPI),
+VMSTATE_BITMAP_STATIC(level, GICv3State , 0, GICV3_MAXSPI),
+VMSTATE_BITMAP_STATIC(edge_trigger, GICv3State , 0, GICV3_MAXSPI),
+VMSTATE_UINT8_ARRAY(priority, GICv3State, GICV3_MAXSPI),
+VMSTATE_UINT8_ARRAY(irq_target, GICv3State, GICV3_MAXSPI),
+VMSTATE_UINT64_ARRAY(irq_route, GICv3State, GICV3_MAXSPI),
+VMSTATE_STRUCT_VARRAY_POINTER_UINT32(cpu, GICv3State, num_cpu,
+ vmstate_gicv3_cpu, GICv3CPUState),
+VMSTATE_END_OF_LIST()
+}
 };
 
 void gicv3_init_irqs_and_mmio(GICv3State *s, qemu_irq_handler handler,
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 9a65522..7d060e9 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -537,6 +537,15 @@ extern const VMStateInfo vmstate_info_bitmap;
 .offset   = offsetof(_state, _field),\
 }
 
+#define VMSTATE_BITMAP_STATIC(_field, _state, _version, _size) { \
+.name = (stringify(_field)), \
+.version_id   = (_version),  \
+.size = (_size), \
+.info = &vmstate_info_bitmap,\
+.flags= VMS_BUFFER,  \
+.offset   = offsetof(_state, _field),\
+}
+
 /* _f : field name
_f_n : num of elements field_name
_n : num of elements
-- 
2.4.4

[Qemu-devel] [RFC PATCH v3 2/4] kernel: Add definitions for GICv3 attributes

2015-10-22 Thread Pavel Fedin

This temporary patch adds kernel API definitions. Use proper header update
procedure after these features are released.

Signed-off-by: Pavel Fedin 
---
 linux-headers/asm-arm64/kvm.h | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h
index d3714c0..6e377c1 100644
--- a/linux-headers/asm-arm64/kvm.h
+++ b/linux-headers/asm-arm64/kvm.h
@@ -179,14 +179,14 @@ struct kvm_arch_memory_slot {
KVM_REG_ARM64_SYSREG_ ## n ## _MASK)
 
 #define __ARM64_SYS_REG(op0,op1,crn,crm,op2) \
-   (KVM_REG_ARM64 | KVM_REG_ARM64_SYSREG | \
-   ARM64_SYS_REG_SHIFT_MASK(op0, OP0) | \
+   (ARM64_SYS_REG_SHIFT_MASK(op0, OP0) | \
ARM64_SYS_REG_SHIFT_MASK(op1, OP1) | \
ARM64_SYS_REG_SHIFT_MASK(crn, CRN) | \
ARM64_SYS_REG_SHIFT_MASK(crm, CRM) | \
ARM64_SYS_REG_SHIFT_MASK(op2, OP2))
 
-#define ARM64_SYS_REG(...) (__ARM64_SYS_REG(__VA_ARGS__) | KVM_REG_SIZE_U64)
+#define ARM64_SYS_REG(...) (__ARM64_SYS_REG(__VA_ARGS__) | KVM_REG_ARM64 | \
+   KVM_REG_SIZE_U64 | KVM_REG_ARM64_SYSREG)
 
 #define KVM_REG_ARM_TIMER_CTL  ARM64_SYS_REG(3, 3, 14, 3, 1)
 #define KVM_REG_ARM_TIMER_CNT  ARM64_SYS_REG(3, 3, 14, 3, 2)
@@ -197,12 +197,21 @@ struct kvm_arch_memory_slot {
 #define KVM_DEV_ARM_VGIC_GRP_DIST_REGS 1
 #define KVM_DEV_ARM_VGIC_GRP_CPU_REGS  2
 #define   KVM_DEV_ARM_VGIC_CPUID_SHIFT 32
-#define   KVM_DEV_ARM_VGIC_CPUID_MASK  (0xffULL << 
KVM_DEV_ARM_VGIC_CPUID_SHIFT)
+#define   KVM_DEV_ARM_VGIC_CPUID_MASK  (0xULL << 
KVM_DEV_ARM_VGIC_CPUID_SHIFT)
 #define   KVM_DEV_ARM_VGIC_OFFSET_SHIFT0
 #define   KVM_DEV_ARM_VGIC_OFFSET_MASK (0xULL << 
KVM_DEV_ARM_VGIC_OFFSET_SHIFT)
 #define KVM_DEV_ARM_VGIC_GRP_NR_IRQS   3
 #define KVM_DEV_ARM_VGIC_GRP_CTRL  4
 #define   KVM_DEV_ARM_VGIC_CTRL_INIT   0
+#define KVM_DEV_ARM_VGIC_GRP_REDIST_REGS 5
+#define KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS 6
+#define   KVM_DEV_ARM_VGIC_SYSREG_MASK (KVM_REG_ARM64_SYSREG_OP0_MASK | \
+KVM_REG_ARM64_SYSREG_OP1_MASK | \
+KVM_REG_ARM64_SYSREG_CRN_MASK | \
+KVM_REG_ARM64_SYSREG_CRM_MASK | \
+KVM_REG_ARM64_SYSREG_OP2_MASK)
+#define   KVM_DEV_ARM_VGIC_SYSREG(op0,op1,crn,crm,op2) \
+   __ARM64_SYS_REG(op0,op1,crn,crm,op2)
 
 /* KVM_IRQ_LINE irq field index values */
 #define KVM_ARM_IRQ_TYPE_SHIFT 24
-- 
2.4.4

[Qemu-devel] [RFC PATCH v3 1/4] hw/intc/arm_gicv3_common: Add state information

2015-10-22 Thread Pavel Fedin

Add state information to GICv3 object structure and implement
arm_gicv3_common_reset(). Also, add some functions for registers which are
not stored directly but simulated.

State information includes not only pure GICv3 data, but also some legacy
registers. This will be useful for implementing software emulation of GICv3
with v2 backwards compatilibity mode.

Signed-off-by: Pavel Fedin 
---
 hw/intc/arm_gicv3_common.c | 127 +-
 hw/intc/gicv3_internal.h   | 265 +
 include/hw/intc/arm_gicv3_common.h |  93 -
 3 files changed, 480 insertions(+), 5 deletions(-)
 create mode 100644 hw/intc/gicv3_internal.h

diff --git a/hw/intc/arm_gicv3_common.c b/hw/intc/arm_gicv3_common.c
index 032ece2..2082d05 100644
--- a/hw/intc/arm_gicv3_common.c
+++ b/hw/intc/arm_gicv3_common.c
@@ -3,8 +3,9 @@
  *
  * Copyright (c) 2012 Linaro Limited
  * Copyright (c) 2015 Huawei.
+ * Copyright (c) 2015 Samsung Electronics Co., Ltd.
  * Written by Peter Maydell
- * Extended to 64 cores by Shlomo Pongratz
+ * Reworked for GICv3 by Shlomo Pongratz and Pavel Fedin
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -21,6 +22,7 @@
  */
 
 #include "hw/intc/arm_gicv3_common.h"
+#include "gicv3_internal.h"
 
 static void gicv3_pre_save(void *opaque)
 {
@@ -88,6 +90,8 @@ void gicv3_init_irqs_and_mmio(GICv3State *s, qemu_irq_handler 
handler,
 static void arm_gicv3_common_realize(DeviceState *dev, Error **errp)
 {
 GICv3State *s = ARM_GICV3_COMMON(dev);
+Object *cpu;
+unsigned int i, j;
 
 /* revision property is actually reserved and currently used only in order
  * to keep the interface compatible with GICv2 code, avoiding extra
@@ -98,11 +102,130 @@ static void arm_gicv3_common_realize(DeviceState *dev, 
Error **errp)
 error_setg(errp, "unsupported GIC revision %d", s->revision);
 return;
 }
+
+if (s->num_irq > GICV3_MAXIRQ) {
+error_setg(errp,
+   "requested %u interrupt lines exceeds GIC maximum %d",
+   s->num_irq, GICV3_MAXIRQ);
+return;
+}
+
+s->cpu = g_malloc(s->num_cpu * sizeof(GICv3CPUState));
+
+for (i = 0; i < s->num_cpu; i++) {
+for (j = 0; j < GIC_NR_SGIS; j++) {
+s->cpu[i].sgi[j].pending = g_malloc(BITS_TO_LONGS(s->num_cpu));
+s->cpu[i].sgi[j].size = s->num_cpu;
+}
+
+cpu = OBJECT(qemu_get_cpu(i));
+s->cpu[i].affinity_id = object_property_get_int(cpu, "mp-affinity",
+NULL);
+}
 }
 
 static void arm_gicv3_common_reset(DeviceState *dev)
 {
-/* TODO */
+GICv3State *s = ARM_GICV3_COMMON(dev);
+unsigned int i, j;
+
+for (i = 0; i < s->num_cpu; i++) {
+GICv3CPUState *c = &s->cpu[i];
+
+c->cpu_enabled = false;
+c->redist_ctlr = 0;
+c->propbaser = GICR_PROPBASER_InnerShareable|GICR_PROPBASER_WaWb;
+c->pendbaser = GICR_PENDBASER_InnerShareable|GICR_PENDBASER_WaWb;
+/* Workaround!
+ * Linux (drivers/irqchip/irq-gic-v3.c) is enabling only group one,
+ * in gic_cpu_sys_reg_init it calls gic_write_grpen1(1);
+ * but it doesn't conigure any interrupt to be in group one.
+ * The same for SPIs below
+ */
+c->group = 0x;
+/* GIC-500 comment 'j' SGI are always enabled */
+c->enabled = 0x;
+c->pending = 0;
+c->active = 0;
+c->level = 0;
+c->edge_trigger = 0x;
+memset(c->priority, 0, sizeof(c->priority));
+for (j = 0; j < GIC_NR_SGIS; j++) {
+bitmap_zero(c->sgi[j].pending, s->num_cpu);
+}
+
+c->ctlr[0] = 0;
+c->ctlr[1] = 0;
+c->legacy_ctlr = 0;
+c->priority_mask = 0;
+c->bpr[0] = GIC_MIN_BPR0;
+c->bpr[1] = GIC_MIN_BPR1;
+memset(c->apr, 0, sizeof(c->apr));
+
+c->current_pending = 1023;
+c->running_irq = 1023;
+c->running_priority = 0x100;
+memset(c->last_active, 0, sizeof(c->last_active));
+}
+
+memset(s->group, 0, sizeof(s->group));
+memset(s->enabled, 0, sizeof(s->enabled));
+memset(s->pending, 0, sizeof(s->pending));
+memset(s->active, 0, sizeof(s->active));
+memset(s->level, 0, sizeof(s->level));
+memset(s->edge_trigger, 0, sizeof(s->edge_trigger));
+
+/* Workaround! (the same as c->group above) */
+for (i = GIC_INTERNAL; i < s->num_irq; i++) {
+set_bit(i - GIC_INTERNAL, s->group);
+}
+
+/* By default all interrupts always target CPU #0 */
+for

[Qemu-devel] [PATCH] target-arm: Extract some external ARM CPU API

2015-10-22 Thread Pavel Fedin

Includes, which reside in target-arm, are very problematic to use from code
which is considered target-independent by the build system (for example,
hw/intc/something.c). It happens because they depend on config-target.h,
which is unreachable in this case. This creates problems for example for
GICv3 implementation, which needs to call define_arm_cp_regs_with_opaque()
in order to add system registers to the processor model, as well as play
with affinity IDs.

This patch solves the problem by extracting some self-sufficient
definitions into public area (include/hw/cpu).

Additionally, the patch introduces useful "mp-affinity" property.

Signed-off-by: Pavel Fedin 
---
 include/hw/cpu/arm.h | 295 +++
 target-arm/cpu-qom.h |  40 +--
 target-arm/cpu.c |   1 +
 target-arm/cpu.h | 226 +--
 4 files changed, 299 insertions(+), 263 deletions(-)
 create mode 100644 include/hw/cpu/arm.h

diff --git a/include/hw/cpu/arm.h b/include/hw/cpu/arm.h
new file mode 100644
index 000..17544c9
--- /dev/null
+++ b/include/hw/cpu/arm.h
@@ -0,0 +1,295 @@
+/*
+ * QEMU ARM CPU
+ *
+ * Copyright (c) 2003 Fabrice Bellard
+ * Copyright (c) 2012 SUSE LINUX Products GmbH
+ * Copyright (c) 2015 Samsung Electronics Co. Ltd.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see
+ * <http://www.gnu.org/licenses/gpl-2.0.html>
+ */
+
+#ifndef HW_CPU_ARM_H
+#define HW_CPU_ARM_H
+
+#include "qom/cpu.h"
+
+#define TYPE_ARM_CPU "arm-cpu"
+
+#define ARM_CPU_CLASS(klass) \
+OBJECT_CLASS_CHECK(ARMCPUClass, (klass), TYPE_ARM_CPU)
+#define ARM_CPU(obj) \
+OBJECT_CHECK(ARMCPU, (obj), TYPE_ARM_CPU)
+#define ARM_CPU_GET_CLASS(obj) \
+OBJECT_GET_CLASS(ARMCPUClass, (obj), TYPE_ARM_CPU)
+
+/**
+ * ARMCPUClass:
+ * @parent_realize: The parent class' realize handler.
+ * @parent_reset: The parent class' reset handler.
+ *
+ * An ARM CPU model.
+ */
+typedef struct ARMCPUClass {
+/*< private >*/
+CPUClass parent_class;
+/*< public >*/
+
+DeviceRealize parent_realize;
+void (*parent_reset)(CPUState *cpu);
+} ARMCPUClass;
+
+/* These two are black boxes for us */
+typedef struct ARMCPU ARMCPU;
+typedef struct CPUARMState CPUARMState;
+
+/* ARMCPRegInfo type field bits. If the SPECIAL bit is set this is a
+ * special-behaviour cp reg and bits [15..8] indicate what behaviour
+ * it has. Otherwise it is a simple cp reg, where CONST indicates that
+ * TCG can assume the value to be constant (ie load at translate time)
+ * and 64BIT indicates a 64 bit wide coprocessor register. SUPPRESS_TB_END
+ * indicates that the TB should not be ended after a write to this register
+ * (the default is that the TB ends after cp writes). OVERRIDE permits
+ * a register definition to override a previous definition for the
+ * same (cp, is64, crn, crm, opc1, opc2) tuple: either the new or the
+ * old must have the OVERRIDE bit set.
+ * ALIAS indicates that this register is an alias view of some underlying
+ * state which is also visible via another register, and that the other
+ * register is handling migration and reset; registers marked ALIAS will not be
+ * migrated but may have their state set by syncing of register state from KVM.
+ * NO_RAW indicates that this register has no underlying state and does not
+ * support raw access for state saving/loading; it will not be used for either
+ * migration or KVM state synchronization. (Typically this is for "registers"
+ * which are actually used as instructions for cache maintenance and so on.)
+ * IO indicates that this register does I/O and therefore its accesses
+ * need to be surrounded by gen_io_start()/gen_io_end(). In particular,
+ * registers which implement clocks or timers require this.
+ */
+#define ARM_CP_SPECIAL 1
+#define ARM_CP_CONST 2
+#define ARM_CP_64BIT 4
+#define ARM_CP_SUPPRESS_TB_END 8
+#define ARM_CP_OVERRIDE 16
+#define ARM_CP_ALIAS 32
+#define ARM_CP_IO 64
+#define ARM_CP_NO_RAW 128
+#define ARM_CP_NOP (ARM_CP_SPECIAL | (1 << 8))
+#define ARM_CP_WFI (ARM_CP_SPECIAL | (2 << 8))
+#define ARM_CP_NZCV (ARM_CP_SPECIAL | (3 << 8))
+#define ARM_CP_CURRENTEL (ARM_CP_SPECIAL | (4 << 8))
+#define ARM_CP_DC_ZVA (ARM_CP_SPECIAL | (5 << 8))
+#define ARM_LAST_SPECIAL ARM_CP_DC_ZVA
+/* Used only as a terminator for ARMCPRegInfo l

Re: [Qemu-devel] [PATCH RFC V5 1/9] hw/intc: Implement GIC-500 support files

2015-10-22 Thread Pavel Fedin

)
> +
> +/*
> + * Re-Distributor registers, offsets from SGI_base
> + */
> +#define GICR_ISENABLER0 GICD_ISENABLER
> +#define GICR_ICENABLER0 GICD_ICENABLER
> +#define GICR_ISPENDR0   GICD_ISPENDR
> +#define GICR_ICPENDR0   GICD_ICPENDR
> +#define GICR_ISACTIVER0 GICD_ISACTIVER
> +#define GICR_ICACTIVER0 GICD_ICACTIVER
> +#define GICR_IPRIORITYR0GICD_IPRIORITYR
> +#define GICR_ICFGR0 GICD_ICFGR
> +
> +#define GICR_TYPER_VLPIS(1U << 1)
> +#define GICR_TYPER_LAST (1U << 4)
> +
> +/*
> + * Simulated used system registers
> + */
> +#define GICC_CTLR_EN_GRP0(1U << 0)
> +#define GICC_CTLR_EN_GRP1(1U << 1)
> +#define GICC_CTLR_ACK_CTL(1U << 2)
> +#define GICC_CTLR_FIQ_EN (1U << 3)
> +#define GICC_CTLR_CBPR   (1U << 4) /* GICv1: SBPR */
> +#define GICC_CTLR_EOIMODE(1U << 9)
> +#define GICC_CTLR_EOIMODE_NS (1U << 10)
> +
> +#endif /* !QEMU_ARM_GIC_INTERNAL_H */
> diff --git a/include/hw/intc/arm_gicv3.h b/include/hw/intc/arm_gicv3.h
> new file mode 100644
> index 000..a03af35
> --- /dev/null
> +++ b/include/hw/intc/arm_gicv3.h
> @@ -0,0 +1,44 @@
> +/*
> + * ARM GIC support
> + *
> + * Copyright (c) 2012 Linaro Limited
> + * Copyright (c) 2015 Huawei.
> + * Written by Peter Maydell
> + * Extended to 64 cores by Shlomo Pongratz
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation, either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef HW_ARM_GICV3_H
> +#define HW_ARM_GICV3_H
> +
> +#include "arm_gicv3_common.h"
> +
> +#define TYPE_ARM_GICV3 "arm_gicv3"
> +#define ARM_GICV3(obj) \
> + OBJECT_CHECK(GICv3State, (obj), TYPE_ARM_GICV3)
> +#define ARM_GICV3_CLASS(klass) \
> + OBJECT_CLASS_CHECK(ARMGICv3Class, (klass), TYPE_ARM_GICV3)
> +#define ARM_GICV3_GET_CLASS(obj) \
> + OBJECT_GET_CLASS(ARMGICv3Class, (obj), TYPE_ARM_GICV3)
> +
> +typedef struct ARMGICv3Class {
> +/*< private >*/
> +ARMGICv3CommonClass parent_class;
> +/*< public >*/
> +
> +DeviceRealize parent_realize;
> +} ARMGICv3Class;
> +
> +#endif
> diff --git a/include/hw/intc/arm_gicv3_common.h 
> b/include/hw/intc/arm_gicv3_common.h
> index c2fd8da..aa32229 100644
> --- a/include/hw/intc/arm_gicv3_common.h
> +++ b/include/hw/intc/arm_gicv3_common.h
> @@ -26,6 +26,51 @@
>  #include "hw/sysbus.h"
>  #include "hw/intc/arm_gic_common.h"
> 
> +/* Maximum number of possible interrupts, determined by the GIC architecture 
> */
> +#define GICV3_MAXIRQ 1020
> +/* First 32 are private to each CPU (SGIs and PPIs). */
> +#define GICV3_INTERNAL 32
> +#define GICV3_NR_SGIS 16
> +
> +#define ARM_AFF0_SHIFT 0
> +#define ARM_AFF0_MASK  (0xFFULL << ARM_AFF0_SHIFT)
> +#define ARM_AFF1_SHIFT 8
> +#define ARM_AFF1_MASK  (0xFFULL << ARM_AFF1_SHIFT)
> +#define ARM_AFF2_SHIFT 16
> +#define ARM_AFF2_MASK  (0xFFULL << ARM_AFF2_SHIFT)
> +#define ARM_AFF3_SHIFT 32
> +#define ARM_AFF3_MASK  (0xFFULL << ARM_AFF3_SHIFT)

 Actually, these are defined in target-arm/cpu-qom.h. Looks like it was my 
fault and they have to be moved to something like
include/hw/cpu/cpu_arm.h. I'll post some patches to move them, but not sure 
that Peter accepts them, just because "there are no
users for this yet".

> +
> +#define MAX_NR_GROUP_PRIO 128
> +
> +typedef struct gicv3_irq_state {
> +/* The enable bits are only banked for per-cpu interrupts.  */
> +unsigned long *enabled;
> +unsigned long *pending;
> +unsigned long *active;
> +unsigned long *level;
> +unsigned long *group;
> +unsigned long *target;
> +uint16_t *last_active;
> +bool edge_trigger; /* true: edge-triggered, false: level-triggered  */
> +int32_t num_cpu; /* For VMSTATE_BITMAP & VMSTATE_VARRAY */
> +} gicv3_irq_state;
> +
> +typedef struct gicv3_sgi_state {
> +unsigned long *pending;
> +int32_t num_cpu; /* For VMSTATE_BITMAP */
> +} gicv3_sgi_state;
> +
> +typedef struct gicv3_sgi_vec {
> +struct gicv3_sgi_state *state;
> +int32_t num_cpu; /* For VMSTATE_VARRAY */
> +} gicv3_sgi_vec;
> +
> +typedef struct gicv3_priority {
> +uint8_t *p;
> +int32_t num_cpu; /* For VMSTATE_VARRAY */
> +} gicv3_Priority;
> +
>  typedef struct GICv3State {
>  /*< private >*/
>  SysBusDevice parent_obj;
> @@ -33,14 +78,45 @@ typedef struct GICv3State {
> 
>  qemu_irq *parent_irq;
>  qemu_irq *parent_fiq;
> +/* GICD_CTLR; for a GIC with the security extensions the NS banked 
> version
> + * of this register is just an alias of bit 1 of the S banked version.
> + */
> +uint32_t ctlr;
> +/* Sim GICC_CTLR; again, the NS banked version is just aliases of bits of
> + * the S banked register, so our state only needs to store the S version.
> + */
> +uint32_t *cpu_ctlr;
> +unsigned long *cpu_enabled;
> +
> +gicv3_irq_state irq_state[GICV3_MAXIRQ];
> +gicv3_Priority priority1[GICV3_INTERNAL];
> +uint8_t priority2[GICV3_MAXIRQ - GICV3_INTERNAL];
> +/* For each SGI on the target CPU, we store bitmap
> + * indicating which source CPUs have made this SGI
> + * pending on the target CPU. These correspond to
> + * the bytes in the GIC_SPENDSGIR* registers as
> + * read by the target CPU.
> + */
> +gicv3_sgi_vec sgi[GICV3_NR_SGIS];
> +
> +uint16_t *priority_mask;
> +uint16_t *running_irq;
> +uint16_t *running_priority;
> +uint16_t *current_pending;
> +uint32_t num_mp_affinity;
> +uint64_t *mp_affinity;
> 
>  MemoryRegion iomem_dist; /* Distributor */
> +MemoryRegion iomem_spi;
> +MemoryRegion iomem_its_cntrl;
> +MemoryRegion iomem_its;
>  MemoryRegion iomem_redist; /* Redistributors */
> 
> +uint32_t cpu_mask; /* For redistributer */
>  uint32_t num_cpu;
>  uint32_t num_irq;
>  uint32_t revision;
> -bool security_extn;
> +uint8_t security_levels; /* replace seurity extentions */
> 
>  int dev_fd; /* kvm device fd if backed by kvm vgic support */
>  } GICv3State;
> --
> 1.9.1

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH RFC V5 8/9] target-arm/cpu64 GICv3 system instructions support

2015-10-22 Thread Pavel Fedin

 Hello!

>> It has nothing to do with KVM. EFI is a firmware, which originates from 
>> Intel, but now adopted by ARM64 architecture too. You can also run
>> it under qemu, if you want to make kind of "full" machine. And it writes 
>> some value to BPR1, which is indeed ignored by Linux kernel.
 
> So were is it used in QEMU?

You can configure any ARM machine to use firmware blob instead of kernel + 
device tree. This way it looks more like a real machine.

> Which machine in hw/arm needs it?

 We ran it on virt, for example.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH RFC V5 8/9] target-arm/cpu64 GICv3 system instructions support

2015-10-22 Thread Pavel Fedin

 Hello!

> I've implemented the registers accessed by Linux driver in 
> drivers/irqchip/irq-gic-v3.c
> If this register is used only with KVM e.g. virt/kvm/arm/vgic-v3.c than it is 
> out of my mandate.

 It has nothing to do with KVM. EFI is a firmware, which originates from Intel, 
but now adopted by ARM64 architecture too. You can also run it under qemu, if 
you want to make kind of "full" machine. And it writes some value to BPR1, 
which is indeed ignored by Linux kernel.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH RFC V5 8/9] target-arm/cpu64 GICv3 system instructions support

2015-10-22 Thread Pavel Fedin

nv);
> +value = armv8_gicv3_get_igrpen1(ri->opaque, cpu->cpu_index);
> +return value;
> +}
> +
> +static void igrpen1_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t 
> value)
> +{
> +CPUState *cpu = ENV_GET_CPU(env);
> +armv8_gicv3_set_igrpen1(ri->opaque, cpu->cpu_index, value);
> +}
> +#endif
> +
> +static const ARMCPRegInfo cortex_a57_a53_cp_with_opaque_reginfo[] = {
> +{ .name = "EIOR1_EL1", .state = ARM_CP_STATE_AA64,
> +#ifndef CONFIG_USER_ONLY
> +  .writefn = eoir_write,
> +#endif
> +  .opc0 = 3, .opc1 = 0, .crn = 12, .crm = 12, .opc2 = 1,
> +  .access = PL1_W, .type = ARM_CP_SPECIAL, .resetvalue = 0 },
> +{ .name = "IAR1_EL1", .state = ARM_CP_STATE_AA64,
> +#ifndef CONFIG_USER_ONLY
> +  .readfn = iar_read,
> +#endif
> +  .opc0 = 3, .opc1 = 0, .crn = 12, .crm = 12, .opc2 = 0,
> +  .access = PL1_R, .type = ARM_CP_SPECIAL, .resetvalue = 0 },
> +{ .name = "SGI1R_EL1", .state = ARM_CP_STATE_AA64,
> +#ifndef CONFIG_USER_ONLY
> +  .writefn = sgi_write,
> +#endif
> +  .opc0 = 3, .opc1 = 0, .crn = 12, .crm = 11, .opc2 = 5,
> +  .access = PL1_RW, .type = ARM_CP_SPECIAL, .resetvalue = 0 },
> +{ .name = "PMR_EL1", .state = ARM_CP_STATE_AA64,
> +  .opc0 = 3, .opc1 = 0, .crn = 4, .crm = 6, .opc2 = 0,
> +#ifndef CONFIG_USER_ONLY
> +  .readfn = pmr_read, .writefn = pmr_write,
> +#endif
> +  .access = PL1_RW, .type = ARM_CP_SPECIAL, .resetvalue = 0 },
> +{ .name = "CTLR_EL1", .state = ARM_CP_STATE_AA64,
> +  .opc0 = 3, .opc1 = 0, .crn = 12, .crm = 12, .opc2 = 4,
> +  .access = PL1_RW, .type = ARM_CP_SPECIAL, .resetvalue = 0 },
> +{ .name = "SRE_EL1", .state = ARM_CP_STATE_AA64,
> +  .opc0 = 3, .opc1 = 0, .crn = 12, .crm = 12, .opc2 = 5, .resetvalue = 0,
> +#ifndef CONFIG_USER_ONLY
> +  .readfn = sre_read, .writefn = sre_write,
> +#endif
> +  .access = PL1_RW, .type = ARM_CP_SPECIAL, .resetvalue = 0 },
> +{ .name = "IGRPEN1_EL1", .state = ARM_CP_STATE_AA64,
> +  .opc0 = 3, .opc1 = 0, .crn = 12, .crm = 12, .opc2 = 7,
> +#ifndef CONFIG_USER_ONLY
> +  .readfn = igrpen1_read, .writefn = igrpen1_write,
> +#endif
> +  .access = PL1_RW, .type = ARM_CP_SPECIAL, .resetvalue = 0 },
> +REGINFO_SENTINEL
> +};
> +

 One more note on this table, which i previously forgot. We tried to run EFI 
here, and it crashes with your emulation because when
setting up GICv3 it also pokes BPR1_EL1 register. You should implement it.

>  static const ARMCPRegInfo cortex_a57_a53_cp_reginfo[] = {
>  #ifndef CONFIG_USER_ONLY
>  { .name = "L2CTLR_EL1", .state = ARM_CP_STATE_AA64,
> @@ -258,6 +367,15 @@ static void aarch64_cpu_set_aarch64(Object *obj, bool 
> value, Error **errp)
>  }
>  }
> 
> +void aarch64_registers_with_opaque_set(Object *obj, void *opaque)
> +{
> +ARMCPU *cpu = ARM_CPU(obj);
> +
> +define_arm_cp_regs_with_opaque(cpu,
> +   cortex_a57_a53_cp_with_opaque_reginfo,
> +   opaque);
> +}
> +
>  static void aarch64_cpu_initfn(Object *obj)
>  {
>  object_property_add_bool(obj, "aarch64", aarch64_cpu_get_aarch64,
> --
> 1.9.1

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH RFC V5 6/9] hw/intc: arm_gicv3_spi_its

2015-10-21 Thread Pavel Fedin

 Hello!

> This patch includes a placeholder code for future spi and its
> implementation.

 Forgot to comment on this. I see that here you are building an ITS into GIC as 
a monolithic thing. This can be wrong because we
could want to emulate platforms which have GICv3 but don't have ITS. I would 
suggest to implement ITS as a separate class, and i
have actually done it in 
http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg04613.html.
 So, i think we should not bother about ITS for now and suggest you to leave it 
out completely and focus on pure GICv3.
 Even more, i have software-emulated ITS code in one of my abandoned branches, 
it lacks only GICv3's ability to receive LPIs (which
is indeed GIC's functionality). After you're done with GIC, i could rebase it 
on top and post at least as an RFC. May be i'll
complete it myself, may be you'll want to pick it up, i don't know. Actually, 
it works with Linux kernel and successfully translates
an MSI event into LPI number, which is then injected into the GIC object. Just 
it's GIC missing the appropriate LPI hanlders. I
wrote it when there was no in-kernel vITS implementation, but abandoned when 
vITS patch series was published.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

[Qemu-devel] PING: [PATCH] Add mp-affinity property for ARM CPU class

2015-10-21 Thread Pavel Fedin

 Hello!

> Nothing wrong with this patch, but I'd rather add it as part of
> the series which actually uses it. (I see you have it in one of
> your GICv3 patchsets.)

 Just a small PING, Shlomo asked for this in 
http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg04884.html
 I told him that he can also include it in his GIC-500 software emulation 
patchset, but finally it's up to you.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH RFC V5 6/9] hw/intc: arm_gicv3_spi_its

2015-10-21 Thread Pavel Fedin

 Hello!

> I just added a placeholder, I didn't add any functionality.

 I see. Just wanted to say, that if we accept my proposal (implementing ITS as 
a separate object), then the only thing we would do with this placeholder is to 
remove it. It should go then to something like hw/intc/arm_gicv3_its.c and its 
object should inherit from TYPE_ARM_GICV3_ITS_COMMON.
 Or do you have some explicit reasons to have everything as a monolith?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH RFC V5 1/9] hw/intc: Implement GIC-500 support files

2015-10-21 Thread Pavel Fedin

 Hello!

> Do you mean that in virt.c::create_gic I'll take the cpu's affinity from the 
> cpu's property and not directly from
> ARM_CPU(qemu_get_cpu(i))->mp_affinity

 I mean that you can do it in your GIC's realize function. And, even better, in 
arm_gicv3_common_realize(), because KVM GICv3 live migration code will also 
need it:
--- cut ---
for (i = 0; i < s->num_cpu; i++) {
Object *cpu = OBJECT(qemu_get_cpu(i));
s->cpu[i].mp_affinity = object_property_get_int(cpu, "mp-affinity", NULL);
}
--- cut ---

> I can do that but that depends on acceptance of your patch.

 Peter ACKed it, just he doesn't like having unused code: 
http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg03105.html
 Just include it into your next respin and forget. :) Actually, i made my RFC 
so that you could just take 0001 and 0002 from it and use for your purpose. 
With additions, of course, if necessary.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH RFC V5 9/9] hw/arm: Add virt-v3 machine that uses GIC-500

2015-10-21 Thread Pavel Fedin

 Hello!

> I see, just how do I pass the gic version from the command line?

 Easy. -machine virt,gic-version=3

> GICV3 is accessed by system instructions that exists only in ARCH64.

 Wrong. In 32-bit mode the CPU sees them as CP15 registers. See "8.5 AArch32 
System register descriptions".
 I would say, system registers are nothing new, these are just renamed CP15 
registers, with "CPn" prefix removed.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH RFC V5 1/9] hw/intc: Implement GIC-500 support files

2015-10-21 Thread Pavel Fedin

 Hello!

>> See this: 
>> http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg02349.html. This 
>> is also a part of my live migration RFC.
>> I remember that Peter told long time ago that "it should really be a 
>> property", when i integrated full affinity support. But, he currently
>> refused to accept this small standalone patch because there are no users for 
>> now. My GICv3 live migration is waiting for kernel API to be
>> ready. And kernel API is waiting for kernel 4.5 development cycle to begin.
> So please resubmit it and mention me as a client.

 Ok, i'll PING, but you can also include it into your patchset. BTW, why is it 
still RFC?

> But I wonder if accessing the property in real time (not in only in 
> initialization) from the GIC code will have impact on performance.

 It can, but you can cache them during realize. For example, if you accept my 
data layout, then you can just add "uint64_t mp_affinity" to GICv3CPUState.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH RFC V5 6/9] hw/intc: arm_gicv3_spi_its

2015-10-21 Thread Pavel Fedin

 Hello!

> For QEMU we could in theory do either; I was leaning towards
> direct connection just because on the KVM side the in-kernel
> GIC isn't going to separate them out as two distinct things.

 I'd say this is not entirely true. With KVM you still can have vGIC without 
vITS. Just don't set KVM_VGIC_V3_ADDR_TYPE_ITS and that's it. The only thing 
linked in is LPI support. With KVM you cannot have LPIs without vITS. Neither 
you can directly inject LPIs.
 So, in-kernel ITS is also optional.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH RFC V5 6/9] hw/intc: arm_gicv3_spi_its

2015-10-21 Thread Pavel Fedin

 Hello!

> I just wanted to understand. I don't have any preferences.

 In other words, in short: spec says that ITS is optional, so we can implement 
it as a separate component, which gets attached to the GIC using some specified 
interface. It's not a problem to design such an interface. Actually, i believe 
real HW does the same thing.
 In my RFC i have implemented a part of this interface. My ITS class has 
gic-parent property, which is used to attach it to the GIC. KVM implementation 
fetches vGIC's fd from there, while software emulation can use it to call LPI 
methods on the GIC. The property is declared as implementation-specific only 
because it would have different object type, for additional fail-safety. 
Software-emulated ITS cannot be attached to KVM vGIC and vice versa, actually 
only because kernel guys don't want direct LPI injection.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH RFC V5 6/9] hw/intc: arm_gicv3_spi_its

2015-10-21 Thread Pavel Fedin

 Hello!

> As far as I understand Figure 1 in GICv3 architecture document the interrupts 
> from device goes to the distributor and from it to the
> re-distributors or from the deices via the ITS to the re-distributors.
> So eventually ITS should be part of the GIC. That is if the ITS is a 
> different entity how do you see it work with the rest of the GIC?

 Easy. As i said before, what ITS actually does is transforming device ID + 
event ID into a single LPI number. Then, an LPI goes to the redistributor. 
Therefore, all we need from GIC is a method like inject_lpi(int lpi). Well, may 
be a couple more, for moving LPIs between collections. But this is not a 
problem and makes up a nice interface.
 Even on the picture you mentioned ITS is a kind of separate thing. It can even 
be completely missing.

 Well, my reasons end here. If you still disagree... Do what you want then, it 
will not be a problem to remove these unused stubs.
 Peter, we are summoning you. :)

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH RFC V5 6/9] hw/intc: arm_gicv3_spi_its

2015-10-21 Thread Pavel Fedin

 Hello!

>> Or do you have some explicit reasons to have everything as a monolith?
> No I just didn't want to have 3 stub files spi, its and its_control.
> Do you suggest that I'll split it to 3 files?

 You didn't understand my question. It's not about internal structure of ITS 
implementation. It is about GIC and ITS connection.
 Please review my KVM ITS RFC: 
http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg04613.html. You'll 
see that ITS is a separate object of a separate class, which can even be 
omitted at all, if machine model doesn't need it for some reason. So, i suggest 
that all the ITS code should go there, and it would be a completely separate 
entity, and a separate patch set, after your GICv3 is accepted. I will help you 
with this.
 Peter, i know you can be very busy, but, could you at least take a glance at 
my vITS v2 RFC structure and judge us? Should ITS + GICv3 be a monolithic 
object, or is my suggestion better?
 By the way, gicv3_init_irqs_and_mmio() expects only two regions, so it will 
not even pay attention to your stubs. You could patch it, of course, but... I 
don't think it's the good thing to do.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH RFC V5 9/9] hw/arm: Add virt-v3 machine that uses GIC-500

2015-10-21 Thread Pavel Fedin

 Hello!

>> The system register implementation belongs in the gic code, not
>> target-arm/. We already have support for devices that say
>> "I have some system registers, please add them to this CPU".

> I don't understand.
> The system registers are defined in ARM Architecture reference Manual.
> It is true that the real implementation is in arm_gicv3_interrupts.c
> But the crn, crm, op0, and op1 of the instructions are in CPU domain.

 Not really. If you take a closer look, you'll see that crn, crm, op1, op2 are 
the same for both ARM64 and ARM32. The only difference is that ARM64 uses op0 = 
3, and ARM32 uses cp = 15. Both of these specifiers can coexist in the 
descriptor table.
 I think Peter wants to tell that you should not insert your register table and 
registration function into target-arm/cpu64.c. This code should go to 
hw/intc/arm_gicv3_cpu_interface.c, add .cp = 15, and - voila - it magically 
works with both ARM32 and ARM64.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH RFC V5 1/9] hw/intc: Implement GIC-500 support files

2015-10-21 Thread Pavel Fedin

 Hello!

> I can't find the patch that handles the for example modification of "uint8_t 
> sgi_pending[GIC_NR_SGIS][GIC_NCPU]" to fixed-size bitmaps. as
> GIC_NCPU is not a constant and uint8_t need to have the size of nubmer of 
> CPUs which is no longer bounded.

 This is the only thing which i excluded, because my vGIC implementation didn't 
need these flags. I know that you cannot actually omit them in software 
emulation, because they are needed in order to handle a situation where more 
than one CPU sends the same SGI number.
 I think you can place it in distributor, in "internal state" section, if you 
look at my patch.

> Are you sure that the semantics is the same? In section 4.1.4 of GICv3 I see 
> that the security level is a combination of two registers GRPMOD > and GROUP 
> and I wanted to be prepared for it.

 Looks like you have some private NDA version of the manual, because my one 
(downloaded from infocenter) doesn't have a paragraph numbered 4.1.4. Anyway, 
after reading my one, i think that GRPMOD simply overrides what is specified in 
GROUP. I cannot find such thing as "group 2" anywhere, and all the text starts 
with "In a GIC which supports two security states". So, there's only non-secure 
and secure state, nothing else.
 Again, nothing changes since ARM32.

> I assume you want to distinguish between Secure EL1 and Secure EL3 (in the 
> future).

 As far as i understand, EL3 is what was called "monitor" in ARM32, and i would 
still prefer to call it this way, because it is more meaningful than those 
stupid (IMHO) numbers. The only special thing about this mode is that it allows 
you to freely switch between secure and non-secure states. So, again, there's 
nothing special about "secure EL3".

 Peter, please correct me if i'm wrong.

> I don't have access to the internal CPU's data structures in the GIC's core, 
> its an opaque structure to the GIC.
> Including the CPU include files doesn't seems to work.

 See this: 
http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg02349.html. This is 
also a part of my live migration RFC.
 I remember that Peter told long time ago that "it should really be a 
property", when i integrated full affinity support. But, he currently refused 
to accept this small standalone patch because there are no users for now. My 
GICv3 live migration is waiting for kernel API to be ready. And kernel API is 
waiting for kernel 4.5 development cycle to begin.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH RFC V5 9/9] hw/arm: Add virt-v3 machine that uses GIC-500

2015-10-21 Thread Pavel Fedin

class_init(ObjectClass *oc, void *data)
>  {
>  MachineClass *mc = MACHINE_CLASS(oc);
> @@ -1174,9 +1221,33 @@ static const TypeInfo machvirt_info = {
>  .class_init = virt_class_init,
>  };
> 
> +static void virtv3_instance_init(Object *obj)
> +{
> +virt_instance_init_common(obj, 3);
> +}
> +
> +static void virtv3_class_init(ObjectClass *oc, void *data)
> +{
> +MachineClass *mc = MACHINE_CLASS(oc);
> +
> +mc->name = TYPE_VIRTV3_MACHINE;
> +mc->desc = "ARM Virtual Machine with GICv3",
> +mc->init = machvirt_init;
> +}
> +
> +static const TypeInfo machvirtv3_info = {
> +.name = TYPE_VIRTV3_MACHINE,
> +.parent = TYPE_VIRT_MACHINE,
> +.instance_size = sizeof(VirtMachineState),
> +.instance_init = virtv3_instance_init,
> +.class_size = sizeof(VirtMachineClass),
> +.class_init = virtv3_class_init,
> +};
> +
>  static void machvirt_machine_init(void)
>  {
>  type_register_static(&machvirt_info);
> +type_register_static(&machvirtv3_info);
>  }

 As i said before, this stuff is simply not needed.

> 
>  machine_init(machvirt_machine_init);
> diff --git a/include/hw/arm/fdt.h b/include/hw/arm/fdt.h
> index c3d5015..ad8f85c 100644
> --- a/include/hw/arm/fdt.h
> +++ b/include/hw/arm/fdt.h
> @@ -30,5 +30,7 @@
> 
>  #define GIC_FDT_IRQ_PPI_CPU_START 8
>  #define GIC_FDT_IRQ_PPI_CPU_WIDTH 8
> +#define GICV3_FDT_IRQ_PPI_CPU_WIDTH 24
> +
> 
>  #endif
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index f464586..53ff50a 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -46,6 +46,7 @@ enum {
>  VIRT_CPUPERIPHS,
>  VIRT_GIC_DIST,
>  VIRT_GIC_CPU,
> +VIRT_GIC_DIST_SPI = VIRT_GIC_CPU,
>  VIRT_GIC_V2M,
>  VIRT_GIC_ITS,
>  VIRT_GIC_REDIST,
> diff --git a/target-arm/machine.c b/target-arm/machine.c
> index 36a0d15..33f28be 100644
> --- a/target-arm/machine.c
> +++ b/target-arm/machine.c
> @@ -341,7 +341,12 @@ const char *gicv3_class_name(void)
>  #endif
>  } else {
>  /* TODO: Software emulation is not implemented yet */
> -error_report("KVM is currently required for GICv3 emulation\n");
> +#ifdef TARGET_AARCH64
> +return "arm_gicv3";

 Peter told me, there is a policy to use dash ("-") in class names. So 
"arm-gicv3".
 By the way, why does it depend on TARGET_AARCH64? Actually, TARGET_ARM and 
TARGET_AARCH64 are the same. You can make both emulators
running both 64- and 32-bit code. KVM code depends on TARGET_AARCH64 only 
because some definitions are missing on 32-bit kernels.
You don't have this restriction, so you don't need this #ifdef.

> +#else
> +error_report("KVM GICv3 acceleration is not supported on this "
> + "platform\n");

 Why "KVM" here? Well, rubbish anyway because of the above. :)

> +#endif
>  }
> 
>  exit(1);
> --
> 1.9.1

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH RFC V5 1/9] hw/intc: Implement GIC-500 support files

2015-10-20 Thread Pavel Fedin

 GICD_ISENABLER
> +#define GICR_ICENABLER0 GICD_ICENABLER
> +#define GICR_ISPENDR0   GICD_ISPENDR
> +#define GICR_ICPENDR0   GICD_ICPENDR
> +#define GICR_ISACTIVER0 GICD_ISACTIVER
> +#define GICR_ICACTIVER0 GICD_ICACTIVER
> +#define GICR_IPRIORITYR0GICD_IPRIORITYR
> +#define GICR_ICFGR0 GICD_ICFGR
> +
> +#define GICR_TYPER_VLPIS(1U << 1)
> +#define GICR_TYPER_LAST (1U << 4)
> +
> +/*
> + * Simulated used system registers
> + */
> +#define GICC_CTLR_EN_GRP0(1U << 0)
> +#define GICC_CTLR_EN_GRP1(1U << 1)
> +#define GICC_CTLR_ACK_CTL(1U << 2)
> +#define GICC_CTLR_FIQ_EN (1U << 3)
> +#define GICC_CTLR_CBPR   (1U << 4) /* GICv1: SBPR */
> +#define GICC_CTLR_EOIMODE(1U << 9)
> +#define GICC_CTLR_EOIMODE_NS (1U << 10)
> +
> +#endif /* !QEMU_ARM_GIC_INTERNAL_H */
> diff --git a/include/hw/intc/arm_gicv3.h b/include/hw/intc/arm_gicv3.h
> new file mode 100644
> index 000..a03af35
> --- /dev/null
> +++ b/include/hw/intc/arm_gicv3.h
> @@ -0,0 +1,44 @@
> +/*
> + * ARM GIC support
> + *
> + * Copyright (c) 2012 Linaro Limited
> + * Copyright (c) 2015 Huawei.
> + * Written by Peter Maydell
> + * Extended to 64 cores by Shlomo Pongratz
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation, either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef HW_ARM_GICV3_H
> +#define HW_ARM_GICV3_H
> +
> +#include "arm_gicv3_common.h"
> +
> +#define TYPE_ARM_GICV3 "arm_gicv3"
> +#define ARM_GICV3(obj) \
> + OBJECT_CHECK(GICv3State, (obj), TYPE_ARM_GICV3)
> +#define ARM_GICV3_CLASS(klass) \
> + OBJECT_CLASS_CHECK(ARMGICv3Class, (klass), TYPE_ARM_GICV3)
> +#define ARM_GICV3_GET_CLASS(obj) \
> + OBJECT_GET_CLASS(ARMGICv3Class, (obj), TYPE_ARM_GICV3)
> +
> +typedef struct ARMGICv3Class {
> +/*< private >*/
> +ARMGICv3CommonClass parent_class;
> +/*< public >*/
> +
> +DeviceRealize parent_realize;
> +} ARMGICv3Class;
> +
> +#endif
> diff --git a/include/hw/intc/arm_gicv3_common.h 
> b/include/hw/intc/arm_gicv3_common.h
> index c2fd8da..aa32229 100644
> --- a/include/hw/intc/arm_gicv3_common.h
> +++ b/include/hw/intc/arm_gicv3_common.h
> @@ -26,6 +26,51 @@
>  #include "hw/sysbus.h"
>  #include "hw/intc/arm_gic_common.h"
> 
> +/* Maximum number of possible interrupts, determined by the GIC architecture 
> */
> +#define GICV3_MAXIRQ 1020
> +/* First 32 are private to each CPU (SGIs and PPIs). */
> +#define GICV3_INTERNAL 32
> +#define GICV3_NR_SGIS 16
> +
> +#define ARM_AFF0_SHIFT 0
> +#define ARM_AFF0_MASK  (0xFFULL << ARM_AFF0_SHIFT)
> +#define ARM_AFF1_SHIFT 8
> +#define ARM_AFF1_MASK  (0xFFULL << ARM_AFF1_SHIFT)
> +#define ARM_AFF2_SHIFT 16
> +#define ARM_AFF2_MASK  (0xFFULL << ARM_AFF2_SHIFT)
> +#define ARM_AFF3_SHIFT 32
> +#define ARM_AFF3_MASK  (0xFFULL << ARM_AFF3_SHIFT)
> +
> +#define MAX_NR_GROUP_PRIO 128
> +
> +typedef struct gicv3_irq_state {
> +/* The enable bits are only banked for per-cpu interrupts.  */
> +unsigned long *enabled;
> +unsigned long *pending;
> +unsigned long *active;
> +unsigned long *level;
> +unsigned long *group;
> +unsigned long *target;
> +uint16_t *last_active;
> +bool edge_trigger; /* true: edge-triggered, false: level-triggered  */
> +int32_t num_cpu; /* For VMSTATE_BITMAP & VMSTATE_VARRAY */
> +} gicv3_irq_state;
> +
> +typedef struct gicv3_sgi_state {
> +unsigned long *pending;
> +int32_t num_cpu; /* For VMSTATE_BITMAP */
> +} gicv3_sgi_state;
> +
> +typedef struct gicv3_sgi_vec {
> +struct gicv3_sgi_state *state;
> +int32_t num_cpu; /* For VMSTATE_VARRAY */
> +} gicv3_sgi_vec;
> +
> +typedef struct gicv3_priority {
> +uint8_t *p;
> +int32_t num_cpu; /* For VMSTATE_VARRAY */
> +} gicv3_Priority;
> +
>  typedef struct GICv3State {
>  /*< private >*/
>  SysBusDevice parent_obj;
> @@ -33,14 +78,45 @@ typedef struct GICv3State {
> 
>  qemu_irq *parent_irq;
>  qemu_irq *parent_fiq;
> +/* GICD_CTLR; for a GIC with the security extensions the NS banked 
> version
> + * of this register is just an alias of bit 1 of the S banked version.
> + */
> +uint32_t ctlr;
> +/* Sim GICC_CTLR; again, the NS banked version is just aliases of bits of
> + * the S banked register, so our state only needs to store the S version.
> + */
> +uint32_t *cpu_ctlr;
> +unsigned long *cpu_enabled;
> +
> +gicv3_irq_state irq_state[GICV3_MAXIRQ];
> +gicv3_Priority priority1[GICV3_INTERNAL];
> +uint8_t priority2[GICV3_MAXIRQ - GICV3_INTERNAL];
> +/* For each SGI on the target CPU, we store bitmap
> + * indicating which source CPUs have made this SGI
> + * pending on the target CPU. These correspond to
> + * the bytes in the GIC_SPENDSGIR* registers as
> + * read by the target CPU.
> + */
> +gicv3_sgi_vec sgi[GICV3_NR_SGIS];
> +
> +uint16_t *priority_mask;
> +uint16_t *running_irq;
> +uint16_t *running_priority;
> +uint16_t *current_pending;
> +uint32_t num_mp_affinity;
> +uint64_t *mp_affinity;
> 
>  MemoryRegion iomem_dist; /* Distributor */
> +MemoryRegion iomem_spi;
> +MemoryRegion iomem_its_cntrl;
> +MemoryRegion iomem_its;
>  MemoryRegion iomem_redist; /* Redistributors */
> 
> +uint32_t cpu_mask; /* For redistributer */
>  uint32_t num_cpu;
>  uint32_t num_irq;
>  uint32_t revision;
> -bool security_extn;
> +uint8_t security_levels; /* replace seurity extentions */
> 
>  int dev_fd; /* kvm device fd if backed by kvm vgic support */
>  } GICv3State;
> --
> 1.9.1

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

1 2 3 4 5 6 >

1 - 100 of 511 matches

Mail list logo