date:20131122

Re: [Qemu-devel] [PATCH 1.7 0/2] PPC: Fix BookE timer performance regression

2013-11-22 Thread Stefan Weil

Am 23.11.2013 05:08, schrieb Stefan Weil:
> Good morning Alexander,
>
> I have two small remarks:
>
> Patch 1: Could you please use 'QemuTimer' instead of 'struct QEMUTimer'
> in the modified lines of code?

Sorry, QEMUTimer, of course. The 'struct' keyword is redundant and
should be avoided.

Re: [Qemu-devel] [PATCH 1.7 0/2] PPC: Fix BookE timer performance regression

2013-11-22 Thread Stefan Weil

Am 23.11.2013 04:53, schrieb Alexander Graf:
> Thanks to the new timer infrastructure we are now able to trigger timer events
> and ridiculous granularities in sub-microsecond orders.
>
> However, the BookE targets were quite unhappy about that change, showing up
> to x10 slowdown on a simple Linux guest bootup test.
>
> This patch set makes the constant timer facilities in BookE more lazy and less
> fine grained than they could be. That way we're at least as good as we were in
> QEMU 1.6.
>
> Alexander Graf (2):
>   PPC: Make BookE FIT/WDT timers more lazy
>   PPC: BookE: Make FIT/WDT timers at best millisecond grained
>
>  hw/ppc/ppc_booke.c | 49 -
>  1 file changed, 44 insertions(+), 5 deletions(-)

Good morning Alexander,

I have two small remarks:

Patch 1: Could you please use 'QemuTimer' instead of 'struct QEMUTimer'
in the modified lines of code?

Patch 2: Milliseconds instead of microseconds (commit message and code
comment)?

Cheers,
Stefan

[Qemu-devel] [PATCH 1.7 0/2] PPC: Fix BookE timer performance regression

2013-11-22 Thread Alexander Graf

Thanks to the new timer infrastructure we are now able to trigger timer events
and ridiculous granularities in sub-microsecond orders.

However, the BookE targets were quite unhappy about that change, showing up
to x10 slowdown on a simple Linux guest bootup test.

This patch set makes the constant timer facilities in BookE more lazy and less
fine grained than they could be. That way we're at least as good as we were in
QEMU 1.6.

Alexander Graf (2):
  PPC: Make BookE FIT/WDT timers more lazy
  PPC: BookE: Make FIT/WDT timers at best millisecond grained

 hw/ppc/ppc_booke.c | 49 -
 1 file changed, 44 insertions(+), 5 deletions(-)

-- 
1.7.12.4

[Qemu-devel] [PATCH 2/2] PPC: BookE: Make FIT/WDT timers at best microsecond grained

2013-11-22 Thread Alexander Graf

The default granularity for the FIT timer on 440 is on every 0x1000th
transition of TB from 0 to 1. Translated that means 48828 times a second.

Since interrupts are quite expensive for 440 and we don't really care
about the accuracy of the FIT to that significance, let's force FIT and
WDT to at best microsecond granularity.

This basically restores behavior as it was in QEMU 1.6, where timers
could only deal with microsecond granularities at all.

This patch greatly improves performance with the 440 target and restores
roughly the same performance level that QEMU 1.6 had for me.

Signed-off-by: Alexander Graf 
---
 hw/ppc/ppc_booke.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/ppc/ppc_booke.c b/hw/ppc/ppc_booke.c
index 28cf9fd..3038acc 100644
--- a/hw/ppc/ppc_booke.c
+++ b/hw/ppc/ppc_booke.c
@@ -174,6 +174,12 @@ static void booke_update_fixed_timer(CPUPPCState 
*env,
 
 if (*next == now) {
 (*next)++;
+} else {
+/*
+ * There's no point to fake any granularity that's more fine grained
+ * than microseconds. Anything beyond that just overloads the system.
+ */
+*next = MAX(*next, now + SCALE_MS);
 }
 
 /* Fire the next timer */
-- 
1.7.12.4

[Qemu-devel] [PATCH 1/2] PPC: Make BookE FIT/WDT timers more lazy

2013-11-22 Thread Alexander Graf

Today we fire FIT and WDT timer events every time the respective bit
position in TB flips from 0 -> 1.

However, there is no need to do this if the end result would be that
we're changing a TSR bit that is set to 1 to 1 again. No guest visible
change would have occured.

So whenever we see that the TSR bit to our timer is already set, don't
even bother to update the timer that would potentially fire it off.

However, we do need to make sure that we update our timer that notifies
us of the TB flip when the respective TSR bit gets unset. In that case
we do care about the flip and need to notify the guest again. So add
a callback into our timer handlers when TSR bits get unset.

This improves performance for me when the guest is busy processing things.

Signed-off-by: Alexander Graf 
---
 hw/ppc/ppc_booke.c | 43 ++-
 1 file changed, 38 insertions(+), 5 deletions(-)

diff --git a/hw/ppc/ppc_booke.c b/hw/ppc/ppc_booke.c
index 8bbfc72..28cf9fd 100644
--- a/hw/ppc/ppc_booke.c
+++ b/hw/ppc/ppc_booke.c
@@ -128,7 +128,8 @@ static uint8_t booke_get_wdt_target(CPUPPCState *env, 
ppc_tb_t *tb_env)
 static void booke_update_fixed_timer(CPUPPCState *env,
  uint8_t   target_bit,
  uint64_t  *next,
- struct QEMUTimer *timer)
+ struct QEMUTimer *timer,
+ int   tsr_bit)
 {
 ppc_tb_t *tb_env = env->tb_env;
 uint64_t delta_tick, ticks = 0;
@@ -136,6 +137,14 @@ static void booke_update_fixed_timer(CPUPPCState 
*env,
 uint64_t period;
 uint64_t now;
 
+if (!env->spr[SPR_BOOKE_TSR] & tsr_bit) {
+/*
+ * Don't arm the timer again when the guest has the current
+ * interrupt still pending. Wait for it to ack it.
+ */
+return;
+}
+
 now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
 tb  = cpu_ppc_get_tb(tb_env, now, tb_env->tb_offset);
 period = 1ULL << target_bit;
@@ -167,6 +176,7 @@ static void booke_update_fixed_timer(CPUPPCState 
*env,
 (*next)++;
 }
 
+/* Fire the next timer */
 timer_mod(timer, *next);
 }
 
@@ -200,7 +210,8 @@ static void booke_fit_cb(void *opaque)
 booke_update_fixed_timer(env,
  booke_get_fit_target(env, tb_env),
  &booke_timer->fit_next,
- booke_timer->fit_timer);
+ booke_timer->fit_timer,
+ TSR_FIS);
 }
 
 static void booke_wdt_cb(void *opaque)
@@ -220,15 +231,35 @@ static void booke_wdt_cb(void *opaque)
 booke_update_fixed_timer(env,
  booke_get_wdt_target(env, tb_env),
  &booke_timer->wdt_next,
- booke_timer->wdt_timer);
+ booke_timer->wdt_timer,
+ TSR_WIS);
 }
 
 void store_booke_tsr(CPUPPCState *env, target_ulong val)
 {
 PowerPCCPU *cpu = ppc_env_get_cpu(env);
+ppc_tb_t *tb_env = env->tb_env;
+booke_timer_t *booke_timer = tb_env->opaque;
 
 env->spr[SPR_BOOKE_TSR] &= ~val;
 kvmppc_clear_tsr_bits(cpu, val);
+
+if (val & TSR_FIS) {
+booke_update_fixed_timer(env,
+ booke_get_fit_target(env, tb_env),
+ &booke_timer->fit_next,
+ booke_timer->fit_timer,
+ TSR_FIS);
+}
+
+if (val & TSR_WIS) {
+booke_update_fixed_timer(env,
+ booke_get_wdt_target(env, tb_env),
+ &booke_timer->wdt_next,
+ booke_timer->wdt_timer,
+ TSR_WIS);
+}
+
 booke_update_irq(cpu);
 }
 
@@ -247,12 +278,14 @@ void store_booke_tcr(CPUPPCState *env, target_ulong val)
 booke_update_fixed_timer(env,
  booke_get_fit_target(env, tb_env),
  &booke_timer->fit_next,
- booke_timer->fit_timer);
+ booke_timer->fit_timer,
+ TSR_FIS);
 
 booke_update_fixed_timer(env,
  booke_get_wdt_target(env, tb_env),
  &booke_timer->wdt_next,
- booke_timer->wdt_timer);
+ booke_timer->wdt_timer,
+ TSR_WIS);
 }
 
 static void ppc_booke_timer_reset_handle(void *opaque)
-- 
1.7.12.4

Re: [Qemu-devel] [PULL for-1.7 5/5] Revert "e1000/rtl8139: update HMP NIC when every bit is written"

2013-11-22 Thread Amos Kong

On Thu, Nov 21, 2013 at 05:14:36PM +0200, Michael S. Tsirkin wrote:
> This reverts commit cd5be5829c1ce87aa6b3a7806524fac07ac9a757.
> Digging into hardware specs shows this does not
> actually make QEMU behave more like hardware:
> 
> There are valid arguments backed by the spec to indicate why the version
> of e1000 prior to cd5be582 was more correct: the high byte actually
> includes a valid bit, this is why all guests write it last.
> 
> For rtl8139 there's actually a separate undocumented valid bit, but we
> don't implement it yet.
> 
> To summarize all the drivers we know about behave in one way
> that allows us to make an assumption about write order and avoid
> spurious, incorrect mac address updates to the monitor.
> 
> Let's stick to the tried heuristic for 1.7 and
> possibly revisit for 1.8.
> 
> Reported-by: Vlad Yasevich 
> Reviewed-by: Vlad Yasevich 
> Cc: Amos Kong 
> Signed-off-by: Michael S. Tsirkin 

Reviewed-by: Amos Kong 

> ---
>  hw/net/e1000.c   | 2 +-
>  hw/net/rtl8139.c | 5 -
>  2 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/net/e1000.c b/hw/net/e1000.c
> index ae63591..8387443 100644
> --- a/hw/net/e1000.c
> +++ b/hw/net/e1000.c
> @@ -1106,7 +1106,7 @@ mac_writereg(E1000State *s, int index, uint32_t val)
>  
>  s->mac_reg[index] = val;
>  
> -if (index == RA || index == RA + 1) {
> +if (index == RA + 1) {
>  macaddr[0] = cpu_to_le32(s->mac_reg[RA]);
>  macaddr[1] = cpu_to_le32(s->mac_reg[RA + 1]);
>  qemu_format_nic_info_str(qemu_get_queue(s->nic), (uint8_t *)macaddr);
> diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
> index 7f2b4db..5329f44 100644
> --- a/hw/net/rtl8139.c
> +++ b/hw/net/rtl8139.c
> @@ -2741,7 +2741,10 @@ static void rtl8139_io_writeb(void *opaque, uint8_t 
> addr, uint32_t val)
>  
>  switch (addr)
>  {
> -case MAC0 ... MAC0+5:
> +case MAC0 ... MAC0+4:
> +s->phys[addr - MAC0] = val;
> +break;
> +case MAC0+5:
>  s->phys[addr - MAC0] = val;
>  qemu_format_nic_info_str(qemu_get_queue(s->nic), s->phys);
>  break;
> -- 
> MST
> 

-- 
Amos.

Re: [Qemu-devel] [edk2 PATCH] OvmfPkg: split the variable store to a separate file

2013-11-22 Thread Laszlo Ersek

On 11/22/13 21:51, Jordan Justen wrote:

> Many of these scenarios were discussed in the past on qemu-devel, but
> a single -pflash was the only thing that stuck. This has made me just
> focus on making the single file pflash work.

I almost forgot to reflect on this -- I'm extremely grateful to you that
you implemented the flash driver.

I tried to be as non-intrusive in my OVMF patch as possible. Keeping the
PCD values and the original memory layout was crucial -- I wanted the
driver to continue working as-is. Everything else (the qemu patch, for
example) came from that. In fact I wrote the OVMF patch first, and then
said "let's see how we can accommodate this in qemu".

(Sorry about answering twice.)

Laszlo

Re: [Qemu-devel] [edk2 PATCH 1/1] OvmfPkg/AcpiPlatformDxe: download ACPI tables from QEMU

2013-11-22 Thread Jordan Justen

On Fri, Nov 22, 2013 at 10:49 AM, Laszlo Ersek  wrote:
> On 11/22/13 19:10, Jordan Justen wrote:
>> Do we need to print all those fields?
>>
>> Anyway, maybe a better centralized place for this would be:
>> MdeModulePkg/Universal/Acpi/AcpiTableDxe/AcpiTableProtocol.c
>>
>> Also, I think we try to keep debug messages under 80 characters.
>
> We seem to think completely differently about debug messages :)

Maybe... ?

I think you can have too much debug if:
* it makes it debug builds excessively slow or large
* it makes it challenging to find items in the debug log
* it makes the code significantly more difficult to read
* it is in a code area that is just not valuable to log

> I can cut most of the fields though, and simply keep the signature (and
> remove the format string macro too, because for the signature I'd only
> use it once). Would that work for you?

Sure.

>> We do have a local InstallAcpiTable function. (Just remove "AcpiProtocol->")
>
> Yes, I saw that. But, we don't have a wrapper for the rollback on the
> error path (where we uninstall the tables). Since I used AcpiProtocol->
> in the rollback part, I wanted to be consistent here.

Good point.

-Jordan

Re: [Qemu-devel] [edk2 PATCH] OvmfPkg: split the variable store to a separate file

2013-11-22 Thread Laszlo Ersek

On 11/22/13 21:51, Jordan Justen wrote:
> On Fri, Nov 22, 2013 at 10:43 AM, Laszlo Ersek  wrote:
>> OTOH building all three FDs per default might be confusing for
>> end-users. We know for a fact that they usually don't read the README
>> (or not thoroughly enough). If we only give them one output file per
>> default, that's less potential for confusion.
> 
> If it will just add confusion, then perhaps it is something best left
> out of the README. And at that point, since splitting it is so easy,
> the value of having it in OVMF is debatable.
> 
>> But I can certainly post a version where all three files are built per
>> default, if that's what you prefer (and aren't opposed to the idea in
>> general).
> 
> I'm not opposed to it, but I would like to wait to see what QEMU does.

OK. Let's see where the qemu patch goes (I'll have to fix the comment
typo that Eric found), and if it is accepted (maybe with modifications),
we can return to the OVMF patch.

> Many of these scenarios were discussed in the past on qemu-devel, but
> a single -pflash was the only thing that stuck.

But (thanks to you) we now have a flash driver in OVMF that works *in
practice*. The discussion about multiple -pflash flags is not academic
any longer. (Hm, sorry, that's maybe too strong -- I can't really say if
the discussion used to be academic before. But it cannot have been this
practical.)

We also know that the libvirt developers prefer the split file. Plus:

> This has made me just
> focus on making the single file pflash work. You also can't deny that
> the QEMU command line gets ugly real fast.

We're in violent agreement on that -- which emphasizes the need for good
libvirt support all the more.

Seriously, I refuse to work with the qemu command line day to day. I
have an elaborate wrapper script that I insert between qemu and libvirt
(specified in the  libvirt domain XML) that post-processes the
arguments composed by libvirt. I need this to test out features that
libvirt doesn't yet support.

For example, I can set various filenames in the  element -- it
specifies the argument to the -bios option. In the script I check the
argument against various patterns, and I can turn it into:
- -bios "$ARG"
- -pflash "$ARG"
- -pflash "$ARG" -pflash "$(manipulate "$ARG")"

The script handles CSM vs. pure-UEFI for Windows 2008, it can honor
upstream vs. RHEL qemu differences, it can turn off the iPXE virtio-net
driver. In the single -pflash option case, it does refresh the code part
in the VM's copy of OVMF (with dd) from my most recent build. It sets a
name-dependent output file for the debug port. Etc.

Maintaining this script over months has paid off orders of magnitude
better than working with the raw qemu command line would have. (I can
compare: on my (occasionally used) AMD desktop that runs Debian I have
not bothered to install libvirt, and each time I need to modify the
forty line qemu command that I keep in a script file, I cringe. And I
can't bring myself to install a second guest.)

The convenience/efficiency libvirt gives me is critical. Changing boot
order, adding or removing devices, starting & stopping -- these are very
fast in virt-manager. Editing the configuration with virsh is nice (and
it has some level of automatic verification when you save and exit).
Comparing guest configs against each other (using the XMLs) is very
convenient. And so on. I depend on these every day, but I need to modify
the wrapper script only occasionally.

Libvirt still allows me to send custom monitor commands, and I can set
it to log all of its *own* commands that it sends to the monitor or the
guest agent.

I've always wondered how other developers (for example, you :)) can
exist without libvirt at all!

That is why "keep the qemu command line simple" (== approx. "two -pflash
options are inconvenient") is no concern of mine. That ship has sailed.

Thanks,
Laszlo

Re: [Qemu-devel] [edk2 PATCH] OvmfPkg: split the variable store to a separate file

2013-11-22 Thread Jordan Justen

On Fri, Nov 22, 2013 at 12:54 PM, Eric Blake  wrote:
> On 11/22/2013 01:51 PM, Jordan Justen wrote:
>
>> Tangentially related idea: if the user specifies -pflash to a
>> non-existent file, then QEMU could copy 'pflash-$(arch).bin' from the
>> roms folder into that file, and then the use it for the flash. It
>> could make using a flash almost as easy as using the default bios.
>
> But that image won't be auto-updated the next time the master is
> updated.

Yes. We've decided to call that a feature. :)
https://lists.gnu.org/archive/html/qemu-devel/2011-07/msg02507.html

> Besides, under sVirt protections of libvirt, libvirt would
> have to pre-create the file (because libvirt intentionally denies qemu
> the ability to create files), so libvirt would have to be aware of the
> default. It's not much harder to make libvirt aware of handling a split
> image, and a split image is easier to handle than having to copy the
> default into a destination.

Well, admittedly, this idea is to optimize for qemu command line
ease-of-use. So I seem to be targeting a niche market. :) But I still
like to run qemu directly.

-Jordan

Re: [Qemu-devel] [PATCH] qemu-ga: vss-win32: Install VSS provider COM+ application service

2013-11-22 Thread Michael Roth

Quoting Tomoki Sekiyama (2013-11-01 16:47:25)
> Currently, qemu-ga for Windows fails to execute guset-fsfreeze-freeze when
> no user is logging in to Windows, with an error message:
>   {"error":{"class":"GenericError",
> "desc":"failed to add C:\\ to snapshotset:  (error: 8004230f)"}}
> 
> To enable guest-fsfreeze-freeze/thaw without logging in users, this installs
> a service to execute qemu-ga VSS provider COM+ application that has full
> access privileges to the local system. The service will automatically be
> removed when the COM+ application is deregistered.
> 
> This patch replaces ICOMAdminCatalog interface with ICOMAdminCatalog2
> interface that contains CreateServiceForApplication() method in addition.
> 
> Signed-off-by: Tomoki Sekiyama 

Thanks Tomoki, applied to qga branch and pull sent for 1.7:

https://github.com/mdroth/qemu/commits/qga

> ---
>  qga/vss-win32/install.cpp |   16 ++--
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/qga/vss-win32/install.cpp b/qga/vss-win32/install.cpp
> index 37731a7..b791a6c 100644
> --- a/qga/vss-win32/install.cpp
> +++ b/qga/vss-win32/install.cpp
> @@ -25,8 +25,8 @@ extern HINSTANCE g_hinstDll;
> 
>  const GUID CLSID_COMAdminCatalog = { 0xF618C514, 0xDFB8, 0x11d1,
>  {0xA2, 0xCF, 0x00, 0x80, 0x5F, 0xC7, 0x92, 0x35} };
> -const GUID IID_ICOMAdminCatalog = { 0xDD662187, 0xDFC2, 0x11d1,
> -{0xA2, 0xCF, 0x00, 0x80, 0x5F, 0xC7, 0x92, 0x35} };
> +const GUID IID_ICOMAdminCatalog2 = { 0x790C6E0B, 0x9194, 0x4cc9,
> +{0x94, 0x26, 0xA4, 0x8A, 0x63, 0x18, 0x56, 0x96} };
>  const GUID CLSID_WbemLocator = { 0x4590f811, 0x1d3a, 0x11d0,
>  {0x89, 0x1f, 0x00, 0xaa, 0x00, 0x4b, 0x2e, 0x24} };
>  const GUID IID_IWbemLocator = { 0xdc12a687, 0x737f, 0x11cf,
> @@ -141,7 +141,7 @@ static HRESULT QGAProviderFind(
>  HRESULT hr;
>  COMInitializer initializer;
>  COMPointer pUnknown;
> -COMPointer pCatalog;
> +COMPointer pCatalog;
>  COMPointer pColl;
>  COMPointer pObj;
>  _variant_t var;
> @@ -149,7 +149,7 @@ static HRESULT QGAProviderFind(
> 
>  chk(CoCreateInstance(CLSID_COMAdminCatalog, NULL, CLSCTX_INPROC_SERVER,
>   IID_IUnknown, (void **)pUnknown.replace()));
> -chk(pUnknown->QueryInterface(IID_ICOMAdminCatalog,
> +chk(pUnknown->QueryInterface(IID_ICOMAdminCatalog2,
>   (void **)pCatalog.replace()));
>  chk(pCatalog->GetCollection(_bstr_t(L"Applications"),
>  (IDispatch **)pColl.replace()));
> @@ -206,7 +206,7 @@ STDAPI COMRegister(void)
>  HRESULT hr;
>  COMInitializer initializer;
>  COMPointer pUnknown;
> -COMPointer pCatalog;
> +COMPointer pCatalog;
>  COMPointer pApps, pRoles, pUsersInRole;
>  COMPointer pObj;
>  long n;
> @@ -229,7 +229,7 @@ STDAPI COMRegister(void)
> 
>  chk(CoCreateInstance(CLSID_COMAdminCatalog, NULL, CLSCTX_INPROC_SERVER,
>   IID_IUnknown, (void **)pUnknown.replace()));
> -chk(pUnknown->QueryInterface(IID_ICOMAdminCatalog,
> +chk(pUnknown->QueryInterface(IID_ICOMAdminCatalog2,
>   (void **)pCatalog.replace()));
> 
>  /* Install COM+ Component */
> @@ -273,6 +273,10 @@ STDAPI COMRegister(void)
>  goto out;
>  }
> 
> +chk(pCatalog->CreateServiceForApplication(
> +_bstr_t(QGA_PROVIDER_LNAME), _bstr_t(QGA_PROVIDER_LNAME),
> +_bstr_t(L"SERVICE_AUTO_START"), _bstr_t(L"SERVICE_ERROR_NORMAL"),
> +_bstr_t(L""), _bstr_t(L".\\localsystem"), _bstr_t(L""), FALSE));
>  chk(pCatalog->InstallComponent(_bstr_t(QGA_PROVIDER_LNAME),
> _bstr_t(dllPath), _bstr_t(tlbPath),
> _bstr_t("")));

[Qemu-devel] [PATCH] qemu-ga: vss-win32: Install VSS provider COM+ application service

2013-11-22 Thread Michael Roth

From: Tomoki Sekiyama 

Currently, qemu-ga for Windows fails to execute guset-fsfreeze-freeze when
no user is logging in to Windows, with an error message:
  {"error":{"class":"GenericError",
"desc":"failed to add C:\\ to snapshotset:  (error: 8004230f)"}}

To enable guest-fsfreeze-freeze/thaw without logging in users, this installs
a service to execute qemu-ga VSS provider COM+ application that has full
access privileges to the local system. The service will automatically be
removed when the COM+ application is deregistered.

This patch replaces ICOMAdminCatalog interface with ICOMAdminCatalog2
interface that contains CreateServiceForApplication() method in addition.

Signed-off-by: Tomoki Sekiyama 
Signed-off-by: Michael Roth 
---
 qga/vss-win32/install.cpp |   16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/qga/vss-win32/install.cpp b/qga/vss-win32/install.cpp
index 37731a7..b791a6c 100644
--- a/qga/vss-win32/install.cpp
+++ b/qga/vss-win32/install.cpp
@@ -25,8 +25,8 @@ extern HINSTANCE g_hinstDll;
 
 const GUID CLSID_COMAdminCatalog = { 0xF618C514, 0xDFB8, 0x11d1,
 {0xA2, 0xCF, 0x00, 0x80, 0x5F, 0xC7, 0x92, 0x35} };
-const GUID IID_ICOMAdminCatalog = { 0xDD662187, 0xDFC2, 0x11d1,
-{0xA2, 0xCF, 0x00, 0x80, 0x5F, 0xC7, 0x92, 0x35} };
+const GUID IID_ICOMAdminCatalog2 = { 0x790C6E0B, 0x9194, 0x4cc9,
+{0x94, 0x26, 0xA4, 0x8A, 0x63, 0x18, 0x56, 0x96} };
 const GUID CLSID_WbemLocator = { 0x4590f811, 0x1d3a, 0x11d0,
 {0x89, 0x1f, 0x00, 0xaa, 0x00, 0x4b, 0x2e, 0x24} };
 const GUID IID_IWbemLocator = { 0xdc12a687, 0x737f, 0x11cf,
@@ -141,7 +141,7 @@ static HRESULT QGAProviderFind(
 HRESULT hr;
 COMInitializer initializer;
 COMPointer pUnknown;
-COMPointer pCatalog;
+COMPointer pCatalog;
 COMPointer pColl;
 COMPointer pObj;
 _variant_t var;
@@ -149,7 +149,7 @@ static HRESULT QGAProviderFind(
 
 chk(CoCreateInstance(CLSID_COMAdminCatalog, NULL, CLSCTX_INPROC_SERVER,
  IID_IUnknown, (void **)pUnknown.replace()));
-chk(pUnknown->QueryInterface(IID_ICOMAdminCatalog,
+chk(pUnknown->QueryInterface(IID_ICOMAdminCatalog2,
  (void **)pCatalog.replace()));
 chk(pCatalog->GetCollection(_bstr_t(L"Applications"),
 (IDispatch **)pColl.replace()));
@@ -206,7 +206,7 @@ STDAPI COMRegister(void)
 HRESULT hr;
 COMInitializer initializer;
 COMPointer pUnknown;
-COMPointer pCatalog;
+COMPointer pCatalog;
 COMPointer pApps, pRoles, pUsersInRole;
 COMPointer pObj;
 long n;
@@ -229,7 +229,7 @@ STDAPI COMRegister(void)
 
 chk(CoCreateInstance(CLSID_COMAdminCatalog, NULL, CLSCTX_INPROC_SERVER,
  IID_IUnknown, (void **)pUnknown.replace()));
-chk(pUnknown->QueryInterface(IID_ICOMAdminCatalog,
+chk(pUnknown->QueryInterface(IID_ICOMAdminCatalog2,
  (void **)pCatalog.replace()));
 
 /* Install COM+ Component */
@@ -273,6 +273,10 @@ STDAPI COMRegister(void)
 goto out;
 }
 
+chk(pCatalog->CreateServiceForApplication(
+_bstr_t(QGA_PROVIDER_LNAME), _bstr_t(QGA_PROVIDER_LNAME),
+_bstr_t(L"SERVICE_AUTO_START"), _bstr_t(L"SERVICE_ERROR_NORMAL"),
+_bstr_t(L""), _bstr_t(L".\\localsystem"), _bstr_t(L""), FALSE));
 chk(pCatalog->InstallComponent(_bstr_t(QGA_PROVIDER_LNAME),
_bstr_t(dllPath), _bstr_t(tlbPath),
_bstr_t("")));
-- 
1.7.9.5

[Qemu-devel] [PULL 0/1 for-1.7] qemu-ga fsfreeze fix for Windows

2013-11-22 Thread Michael Roth

Hi Anthony,

Please pull the following fix, which allows guest-freeze-fsfreeze to be
executed while a user is not logged in to the guest.

The following changes since commit 607bb022f2a44797cbf40e85e84da4134e2f0e01:

  Update version for 1.7.0-rc1 release (2013-11-21 08:11:47 -0800)

are available in the git repository at:

  git://github.com/mdroth/qemu.git qga-pull-2013-11-22

for you to fetch changes up to 3c040d5cba585e27eded8efc121ef599723581f8:

  qemu-ga: vss-win32: Install VSS provider COM+ application service (2013-11-22 
14:54:42 -0600)


Tomoki Sekiyama (1):
  qemu-ga: vss-win32: Install VSS provider COM+ application service

 qga/vss-win32/install.cpp |   16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

Re: [Qemu-devel] [edk2 PATCH] OvmfPkg: split the variable store to a separate file

2013-11-22 Thread Eric Blake

On 11/22/2013 01:51 PM, Jordan Justen wrote:

> Tangentially related idea: if the user specifies -pflash to a
> non-existent file, then QEMU could copy 'pflash-$(arch).bin' from the
> roms folder into that file, and then the use it for the flash. It
> could make using a flash almost as easy as using the default bios.

But that image won't be auto-updated the next time the master is
updated.  Besides, under sVirt protections of libvirt, libvirt would
have to pre-create the file (because libvirt intentionally denies qemu
the ability to create files), so libvirt would have to be aware of the
default.  It's not much harder to make libvirt aware of handling a split
image, and a split image is easier to handle than having to copy the
default into a destination.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [edk2 PATCH] OvmfPkg: split the variable store to a separate file

2013-11-22 Thread Jordan Justen

On Fri, Nov 22, 2013 at 10:43 AM, Laszlo Ersek  wrote:
> OTOH building all three FDs per default might be confusing for
> end-users. We know for a fact that they usually don't read the README
> (or not thoroughly enough). If we only give them one output file per
> default, that's less potential for confusion.

If it will just add confusion, then perhaps it is something best left
out of the README. And at that point, since splitting it is so easy,
the value of having it in OVMF is debatable.

> But I can certainly post a version where all three files are built per
> default, if that's what you prefer (and aren't opposed to the idea in
> general).

I'm not opposed to it, but I would like to wait to see what QEMU does.

Many of these scenarios were discussed in the past on qemu-devel, but
a single -pflash was the only thing that stuck. This has made me just
focus on making the single file pflash work. You also can't deny that
the QEMU command line gets ugly real fast.

Tangentially related idea: if the user specifies -pflash to a
non-existent file, then QEMU could copy 'pflash-$(arch).bin' from the
roms folder into that file, and then the use it for the flash. It
could make using a flash almost as easy as using the default bios.

-Jordan

Re: [Qemu-devel] [PATCH v4 6/7] qmp: add command 'blockdev-backup'

2013-11-22 Thread Eric Blake

On 11/21/2013 10:24 PM, Fam Zheng wrote:
> Similar to drive-backup, but this command uses a device id as target
> instead of creating/opening an image file.
> 
> Also add blocker on target bs, since the target is also a named device
> now.
> 
> Signed-off-by: Fam Zheng 
> ---

> +++ b/qapi-schema.json
> @@ -1844,6 +1844,40 @@
>  '*on-target-error': 'BlockdevOnError' } }
>  
>  ##
> +# @BlockdevBackup
> +#
> +# @device: the name of the device which should be copied.
> +#
> +# @target: the name of the backup target device

We're not very consistent on whether to end these with a '.'; but that's
not the end of the world :)

> +# Note that @on-source-error and @on-target-error only affect background I/O.
> +# If an error occurs during a guest write request, the device's rerror/werror
> +# actions will be used.
> +#
> +# Since: 1.7

1.8

> +##
> +{ 'type': 'BlockdevBackup',
> +  'data': { 'device': 'str', 'target': 'str',
> +'sync': 'MirrorSyncMode',
> +'*speed': 'int',
> +'*on-source-error': 'BlockdevOnError',
> +'*on-target-error': 'BlockdevOnError' } }
> +

> +# @blockdev-backup
> +#
> +# Block device version for drive-backup command. Use existing device as 
> target
> +# of backup.
> +#
> +# For the arguments, see the documentation of BlockdevBackup.
> +#
> +# Returns: nothing on success
> +#  If @device or @target is not a valid block device, DeviceNotFound
> +#
> +# Since 1.7

1.8

Interface seems okay; I didn't closely review the code leading up to it
(but do think your array of blockers, containing a ready-to-return error
message, is kind of slick)

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v4 1/7] qapi: Add BlockOperationType enum

2013-11-22 Thread Eric Blake

On 11/21/2013 10:24 PM, Fam Zheng wrote:
> This adds the enum of all the operations that can be taken on a block
> device.
> 
> Signed-off-by: Fam Zheng 
> ---
>  qapi-schema.json | 25 +
>  1 file changed, 25 insertions(+)

>  
>  ##
> +# @BlockOperationType
> +# Type of a block operation.
> +#
> +# Since: 1.8

It might be worth adding a bit more docs, as in one line per enum value
with a cross-reference to the command(s) that can trigger that
operation.  Something like:

@backup: See the 'drive-backup' command.
@change: See the 'change' command.

This is especially true if we later add new block ops later than 1.8, so
such additions can have the useful '(since 1.9)' notation.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] GIT master fails compilation for ACPI

2013-11-22 Thread Erik Rull


Paolo Bonzini wrote:

Il 22/11/2013 12:16, Erik Rull ha scritto:

It's getting more and more complex to build qemu, is there a reason why everyone
needs to build the acpi stuff by himself?


It is only attempted if iasl is installed but as you said below, your
version is too old.  Please run "make V=1" so that we can see what is
the problem.


It should be something static like the bios binary files, right?


ACPI tables are now generated by QEMU, so the ACPI compilation step
happens while compiling QEMU.


So you could provide the defaults directly and everyone that wants to modify the
defaults is free to compile it by himself.

And if these tools are required, please add an error message at configure
runtime so that the successive errors at runtime will not appear, because
compiling is then blocked by configure. And if the IASL is too old, a version
check at configure runtime would be helpful as well.


Good idea.  Any chance you could help?  Version 20090123 should be new
enough.


Hi Paolo,

Sure, here the V=1 result - I tried already some make options to get the 
verbose output, but I didn't find this trivial option :-)


erik@debian:~/qemu-test/qemu$ make V=1
make  BUILD_DIR=/home/erik/qemu-test/qemu -C pixman V="1" all
make[1]: Entering directory `/home/erik/qemu-test/qemu/pixman'
make  all-recursive
make[2]: Entering directory `/home/erik/qemu-test/qemu/pixman'
Making all in pixman
make[3]: Entering directory `/home/erik/qemu-test/qemu/pixman/pixman'
make  all-am
make[4]: Entering directory `/home/erik/qemu-test/qemu/pixman/pixman'
make[4]: Nothing to be done for `all-am'.
make[4]: Leaving directory `/home/erik/qemu-test/qemu/pixman/pixman'
make[3]: Leaving directory `/home/erik/qemu-test/qemu/pixman/pixman'
Making all in test
make[3]: Entering directory `/home/erik/qemu-test/qemu/pixman/test'
make[3]: Nothing to be done for `all'.
make[3]: Leaving directory `/home/erik/qemu-test/qemu/pixman/test'
make[3]: Entering directory `/home/erik/qemu-test/qemu/pixman'
make[3]: Nothing to be done for `all-am'.
make[3]: Leaving directory `/home/erik/qemu-test/qemu/pixman'
make[2]: Leaving directory `/home/erik/qemu-test/qemu/pixman'
make[1]: Leaving directory `/home/erik/qemu-test/qemu/pixman'
make  BUILD_DIR=/home/erik/qemu-test/qemu -C x86_64-softmmu V="1" 
TARGET_DIR="x86_64-softmmu/" all

make[1]: Entering directory `/home/erik/qemu-test/qemu/x86_64-softmmu'
cpp -P /home/erik/qemu-test/qemu/hw/i386/acpi-dsdt.dsl -o acpi-dsdt.dsl.i.orig
python /home/erik/qemu-test/qemu/scripts/acpi_extract_preprocess.py 
acpi-dsdt.dsl.i.orig > acpi-dsdt.dsl.i

iasl  -vs -l -tc -p acpi-dsdt acpi-dsdt.dsl.i   2>&1
acpi-dsdt.dsl.i84: 0x80,
Error4094 -   ^ Conversion error: AE_BAD_HEX_CONSTANT


acpi-dsdt.dsl.i85: 0xFF,
Error4094 -   ^ Conversion error: AE_BAD_HEX_CONSTANT


acpi-dsdt.dsl.i87: 0x80,
Error4094 -   ^ Conversion error: AE_BAD_HEX_CONSTANT


ASL Input:  acpi-dsdt.dsl.i - 476 lines, 19189 bytes, 316 keywords
Compilation complete. 3 Errors, 0 Warnings, 0 Remarks, 246 Optimizations
make[1]: *** [hw/i386/acpi-dsdt.hex] Error 1
make[1]: Leaving directory `/home/erik/qemu-test/qemu/x86_64-softmmu'
make: *** [subdir-x86_64-softmmu] Error 2
erik@debian:~/qemu-test/qemu$

Best regards,

Erik




Paolo

Re: [Qemu-devel] [RFC PATCH v3 0/2] Point-in-time snapshot exporting over NBD

2013-11-22 Thread Ian Main

On Fri, Nov 22, 2013 at 01:47:26PM +0800, Fam Zheng wrote:
> On 2013年11月20日 10:32, Ian Main wrote:
> >On Thu, Oct 17, 2013 at 01:36:41PM +0800, Fam Zheng wrote:
> >>This series adds for point-in-time snapshot NBD exporting based on
> >>blockdev-backup (variant of drive-backup with existing device as target).
> >
> >In general this seems to work great.  I did a bunch of testing and was
> >able to mount filesystems over NBD, do many writes (on backed up
> >image)/reads (on target), intense IO etc and all held up fine.
> >
> >There are definitely some issues with some of the commands allowing you
> >to blow things up.  I'm interested to hear opinions on whether this is a
> >showstopper at this time or not.
> >
> 
> Thanks very much for your testing, Ian. QEMU should report error
> instead of crashing with invalid operations. I added an "operation
> blocker" and incorporated into the next version.
> 
> In the future, we still want to review more on those commands and
> try to enable safe ones, and possibly also allow multiple block jobs
> on a BDS. For now it is good enough to only allow blockdev-backup on
> the source and NBD export on the target (which is safe, and all what
> we need for image fleecing, as this version shows). With that being
> a working implementation, I've posted v4. Please test and free to
> make comments.

That's great.  Interesting solution you have come up with.  I will give
it a try and let you know how it goes.
 
> >>We get a thin point-in-time snapshot by COW mechanism of drive-backup, and
> >>export it through built in NBD server. The steps are as below:
> >>
> >>  1. (SHELL) qemu-img create -f qcow2 BACKUP.qcow2 
> >>
> >> (Alternatively we can use -o backing_file=RUNNING-VM.img to omit 
> >> explicitly
> >> providing the size by ourselves, but it's risky because 
> >> RUNNING-VM.qcow2 is
> >> used r/w by guest. Whether or not setting backing file in the image 
> >> file
> >> doesn't matter, as we are going to override the backing hd in the next
> >> step)
> >>
> >>  2. (QMP) blockdev-add backing=source-drive file.driver=file 
> >> file.filename=BACKUP.qcow2 id=target0 if=none driver=qcow2
> >
> >I had to create a custom python script to make these commands work as
> >they require nested dicts.  Is that normal or did I miss something
> >entirely?
> >
> 
> Yes, I use a python script locally to test it too.

I like your notation.. I might try to make a patch for qmp/qmp-shell to
allow it to create nested dicts using the dot notation.

Ian

Re: [Qemu-devel] [PATCH] vfio-pci: Release all MSI-X vectors when disabled

2013-11-22 Thread Bandan Das

Alex Williamson  writes:

> We were relying on msix_unset_vector_notifiers() to release all the
> vectors when we disable MSI-X, but this only happens when MSI-X is
> still enabled on the device.  Perform further cleanup by releasing
> any remaining vectors listed as in-use after this call.  This caused
> a leak of IRQ routes on hotplug depending on how the guest OS prepared
> the device for removal.
>
> Signed-off-by: Alex Williamson 
> Cc: qemu-sta...@nongnu.org
> ---
>  hw/misc/vfio.c |   12 
>  1 file changed, 12 insertions(+)
>
> diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
> index f7f8a19..355b018 100644
> --- a/hw/misc/vfio.c
> +++ b/hw/misc/vfio.c
> @@ -878,8 +878,20 @@ static void vfio_disable_msi_common(VFIODevice *vdev)
>  
>  static void vfio_disable_msix(VFIODevice *vdev)
>  {
> +int i;
> +
>  msix_unset_vector_notifiers(&vdev->pdev);
>  
> +/*
> + * MSI-X will only release vectors if MSI-X is still enabled on the
> + * device, check through the rest and release it ourselves if necessary.
> + */
> +for (i = 0; i < vdev->nr_vectors; i++) {
> +if (vdev->msi_vectors[i].use) {
> +vfio_msix_vector_release(&vdev->pdev, i);
> +}
> +}
> +
>  if (vdev->nr_vectors) {
>  vfio_disable_irqindex(vdev, VFIO_PCI_MSIX_IRQ_INDEX);
>  }

Reviewed-by: Bandan Das

Re: [Qemu-devel] [qemu PATCH] hw/i386/pc_sysfw: support more than one flash drive

2013-11-22 Thread Laszlo Ersek

On 11/22/13 13:21, Markus Armbruster wrote:
> Laszlo Ersek  writes:
> 
>> This patch allows the user to usefully specify
>>
>>   -drive file=img_1,if=pflash,format=raw,readonly \
>>   -drive file=img_2,if=pflash,format=raw
>>
>> on the command line. The flash images will be mapped under 4G in their
>> reverse unit order -- that is, with their base addresses progressing
>> downwards, in increasing unit order.
>>
>> (The unit number increases with command line order if not explicitly
>> specified.)
>>
>> This accommodates the following use case: suppose that OVMF is split in
>> two parts, a writeable host file for non-volatile variable storage, and a
>> read-only part for bootstrap and decompressible executable code.
>>
>> The binary code part would be read-only, centrally managed on the host
>> system, and passed in as unit 0. The variable store would be writeable,
>> VM-specific, and passed in as unit 1.
>>
>>   ffe0-ffe1 (prio 0, R-): system.flash1
>>   ffe2- (prio 0, R-): system.flash0
>>
>> (If the guest tries to write to the flash range that is backed by the
>> read-only drive, bdrv_write() in pflash_update() will correctly deny the
>> write with -EACCES, and pflash_update() won't care, which suits us well.)
> 
> Before this patch:
> 
> You can define as many if=pflash drives as you want.  Any with non-zero
> index are silently ignored.
> 
> If you don't specify one with index=0, you get a ROM at the top of the
> 32 bit address space, contents taken from -bios (default "bios.bin").
> Up to its last 128KiB are aliased at the top of the ISA address space.
> 
> If you do specify one with index=0, you get a pflash device at the top
> of the 32 bit address space, with contents from the drive, and -bios is
> silently ignored.  Up to its last 128KiB are copied to a ROM at the top
> of the (20 bit) ISA address space.
> 
> After this patch (please correct misunderstandings):
> 
> Now the drives after the first unused index are silently ignored.
> 
> If you don't specify one with index=0, no change.
> 
> If you do, you now get N pflash devices, where N is the first unused
> index.  Each pflash's contents is taken from the respective drive.  The
> flashes are mapped at the top of the 32 bit address space in reverse
> index order.  -bios is silently ignored, as before.  Up to the last
> 128KiB of the index=0 flash are copied to a ROM at the top of the ISA
> address space.
> 
> Thus, no change for index=0.  For index=1..N, we now get additional
> flash devices.
> 
> Correct?

Yes.

Thanks
Laszlo

[Qemu-devel] [PATCH] kvm: Query KVM for available memory slots

2013-11-22 Thread Alex Williamson

KVM reports the number of available memory slots (KVM_CAP_NR_MEMSLOTS)
using the extension interface.  Both x86 and s390 implement this, ARM
and powerpc do not yet enable it.  Convert the static slots array to
be dynamically allocated, supporting more slots when available.
Default to 32 when KVM_CAP_NR_MEMSLOTS is not implemented.  The
motivation for this change is to support more assigned devices, where
memory mapped PCI MMIO BARs typically take one slot each.

Signed-off-by: Alex Williamson 
---
 kvm-all.c |   30 +-
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 4478969..63c4e9b 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -72,7 +72,8 @@ typedef struct kvm_dirty_log KVMDirtyLog;
 
 struct KVMState
 {
-KVMSlot slots[32];
+KVMSlot *slots;
+int nr_slots;
 int fd;
 int vmfd;
 int coalesced_mmio;
@@ -125,7 +126,7 @@ static KVMSlot *kvm_alloc_slot(KVMState *s)
 {
 int i;
 
-for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
+for (i = 0; i < s->nr_slots; i++) {
 if (s->slots[i].memory_size == 0) {
 return &s->slots[i];
 }
@@ -141,7 +142,7 @@ static KVMSlot *kvm_lookup_matching_slot(KVMState *s,
 {
 int i;
 
-for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
+for (i = 0; i < s->nr_slots; i++) {
 KVMSlot *mem = &s->slots[i];
 
 if (start_addr == mem->start_addr &&
@@ -163,7 +164,7 @@ static KVMSlot *kvm_lookup_overlapping_slot(KVMState *s,
 KVMSlot *found = NULL;
 int i;
 
-for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
+for (i = 0; i < s->nr_slots; i++) {
 KVMSlot *mem = &s->slots[i];
 
 if (mem->memory_size == 0 ||
@@ -185,7 +186,7 @@ int kvm_physical_memory_addr_from_host(KVMState *s, void 
*ram,
 {
 int i;
 
-for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
+for (i = 0; i < s->nr_slots; i++) {
 KVMSlot *mem = &s->slots[i];
 
 if (ram >= mem->ram && ram < mem->ram + mem->memory_size) {
@@ -357,7 +358,7 @@ static int kvm_set_migration_log(int enable)
 
 s->migration_log = enable;
 
-for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
+for (i = 0; i < s->nr_slots; i++) {
 mem = &s->slots[i];
 
 if (!mem->memory_size) {
@@ -1383,9 +1384,6 @@ int kvm_init(void)
 #ifdef KVM_CAP_SET_GUEST_DEBUG
 QTAILQ_INIT(&s->kvm_sw_breakpoints);
 #endif
-for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
-s->slots[i].slot = i;
-}
 s->vmfd = -1;
 s->fd = qemu_open("/dev/kvm", O_RDWR);
 if (s->fd == -1) {
@@ -1409,6 +1407,19 @@ int kvm_init(void)
 goto err;
 }
 
+s->nr_slots = kvm_check_extension(s, KVM_CAP_NR_MEMSLOTS);
+
+/* If unspecified, use the previous default value */
+if (!s->nr_slots) {
+s->nr_slots = 32;
+}
+
+s->slots = g_malloc0(s->nr_slots * sizeof(KVMSlot));
+
+for (i = 0; i < s->nr_slots; i++) {
+s->slots[i].slot = i;
+}
+
 /* check the vcpu limits */
 soft_vcpus_limit = kvm_recommended_vcpus(s);
 hard_vcpus_limit = kvm_max_vcpus(s);
@@ -1527,6 +1538,7 @@ err:
 if (s->fd != -1) {
 close(s->fd);
 }
+g_free(s->slots);
 g_free(s);
 
 return ret;

Re: [Qemu-devel] [PATCH v9 10/11] target-arm: Provide '-cpu host' when running KVM

2013-11-22 Thread Christoffer Dall

On Fri, Nov 22, 2013 at 06:50:47PM +, Peter Maydell wrote:
> On 22 November 2013 18:25, Christoffer Dall  
> wrote:
> > On Fri, Nov 22, 2013 at 05:17:17PM +, Peter Maydell wrote:
> >> +bool kvm_arm_create_scratch_host_vcpu(const uint32_t *cpus_to_try,
> >> +  int *fdarray,
> >> +  struct kvm_vcpu_init *init);
> >
> > why do we need to export this at all?
> 
> For 64 bit KVM support the 32 bit code moves into kvm32.c
> and then we have two users of this function, one in
> kvm32.c and one in kvm64.c. It seemed easiest to just
> make the function public with the proper doc comment
> from the start, rather than starting it private and
> then having a patch which makes it public.
> 
Yes, that does make sense.

Thanks.

Re: [Qemu-devel] [PATCH v9 10/11] target-arm: Provide '-cpu host' when running KVM

2013-11-22 Thread Peter Maydell

On 22 November 2013 18:25, Christoffer Dall  wrote:
> On Fri, Nov 22, 2013 at 05:17:17PM +, Peter Maydell wrote:
>> +bool kvm_arm_create_scratch_host_vcpu(const uint32_t *cpus_to_try,
>> +  int *fdarray,
>> +  struct kvm_vcpu_init *init);
>
> why do we need to export this at all?

For 64 bit KVM support the 32 bit code moves into kvm32.c
and then we have two users of this function, one in
kvm32.c and one in kvm64.c. It seemed easiest to just
make the function public with the proper doc comment
from the start, rather than starting it private and
then having a patch which makes it public.

thanks
-- PMM

Re: [Qemu-devel] [edk2 PATCH 1/1] OvmfPkg/AcpiPlatformDxe: download ACPI tables from QEMU

2013-11-22 Thread Laszlo Ersek

On 11/22/13 19:10, Jordan Justen wrote:
> On Tue, Nov 12, 2013 at 7:11 AM, Laszlo Ersek  wrote:
>> Qemu v1.7.0-rc0 features an ACPI linker/loader interface, available over
>> fw_cfg, written by Michael Tsirkin.
>>
>> Qemu composes all ACPI tables on the host side, according to the target
>> hardware configuration, and makes the tables available to any guest
>> firmware over fw_cfg.
>>
>> The feature moves the burden of keeping ACPI tables up-to-date from boot
>> firmware to qemu (which is the source of hardware configuration anyway).
>>
>> This patch adds client code for this feature.
>>
>> Contributed-under: TianoCore Contribution Agreement 1.0
>> Signed-off-by: Laszlo Ersek 
>> ---
>>  OvmfPkg/AcpiPlatformDxe/AcpiPlatform.h |   7 +-
>>  OvmfPkg/AcpiPlatformDxe/AcpiPlatform.c |  12 +-
>>  OvmfPkg/AcpiPlatformDxe/Qemu.c | 204 
>> +
>>  3 files changed, 215 insertions(+), 8 deletions(-)
>>
>> diff --git a/OvmfPkg/AcpiPlatformDxe/AcpiPlatform.h 
>> b/OvmfPkg/AcpiPlatformDxe/AcpiPlatform.h
>> index 21107cd..c643fa1 100644
>> --- a/OvmfPkg/AcpiPlatformDxe/AcpiPlatform.h
>> +++ b/OvmfPkg/AcpiPlatformDxe/AcpiPlatform.h
>> @@ -10,7 +10,7 @@
>>THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS" BASIS,
>>WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR 
>> IMPLIED.
>>
>> -**/
>> +**/
>>
>>  #ifndef _ACPI_PLATFORM_H_INCLUDED_
>>  #define _ACPI_PLATFORM_H_INCLUDED_
>> @@ -61,5 +61,10 @@ InstallXenTables (
>>IN   EFI_ACPI_TABLE_PROTOCOL   *AcpiProtocol
>>);
>>
>> +EFI_STATUS
>> +EFIAPI
>> +InstallQemuLinkedTables (
>> +  IN   EFI_ACPI_TABLE_PROTOCOL   *AcpiProtocol
>> +  );
>>  #endif
>>
>> diff --git a/OvmfPkg/AcpiPlatformDxe/AcpiPlatform.c 
>> b/OvmfPkg/AcpiPlatformDxe/AcpiPlatform.c
>> index 6e0b610..084c393 100644
>> --- a/OvmfPkg/AcpiPlatformDxe/AcpiPlatform.c
>> +++ b/OvmfPkg/AcpiPlatformDxe/AcpiPlatform.c
>> @@ -256,16 +256,14 @@ AcpiPlatformEntryPoint (
>>
>>if (XenDetected ()) {
>>  Status = InstallXenTables (AcpiTable);
>> -if (EFI_ERROR (Status)) {
>> -  Status = FindAcpiTablesInFv (AcpiTable);
>> -}
>>} else {
>> +Status = InstallQemuLinkedTables (AcpiTable);
>> +  }
>> +
>> +  if (EFI_ERROR (Status)) {
>>  Status = FindAcpiTablesInFv (AcpiTable);
>>}
>> -  if (EFI_ERROR (Status)) {
>> -return Status;
>> -  }
>>
>> -  return EFI_SUCCESS;
>> +  return Status;
>>  }
>>
>> diff --git a/OvmfPkg/AcpiPlatformDxe/Qemu.c b/OvmfPkg/AcpiPlatformDxe/Qemu.c
>> index 06bd463..c572f8a 100644
>> --- a/OvmfPkg/AcpiPlatformDxe/Qemu.c
>> +++ b/OvmfPkg/AcpiPlatformDxe/Qemu.c
>> @@ -515,3 +515,207 @@ QemuInstallAcpiTable (
>> );
>>  }
>>
>> +
>> +/**
>> +  Download the ACPI table data file from QEMU and interpret it.
>> +
>> +  Starting with v1.7.0-rc0, QEMU provides the following three files for 1.7+
>> +  machine types:
>> +  - etc/acpi/rsdp
>> +  - etc/acpi/tables
>> +  - etc/table-loader
>> +
>> +  "etc/acpi/rsdp" and "etc/acpi/tables" are similar, they are only kept
>> +  separate because they have different allocation requirements in SeaBIOS.
>> +
>> +  Both of these fw_cfg files contain preformatted ACPI payload. 
>> "etc/acpi/rsdp"
>> +  contains only the RSDP table, while "etc/acpi/tables" contains all other
>> +  tables, concatenated.
>> +
>> +  The tables in these two files have been filled in by qemu, but two kinds 
>> of
>> +  fields are incomplete in each table: pointers to other tables, and 
>> checksums
>> +  (which depend on the pointers). The pointers are pre-initialized with
>> +  *relative* offsets, but their final values depend on where the actual 
>> files
>> +  will be placed in memory by the guest. That is, the pointer fields need 
>> to be
>> +  "relocated" (incremented) by the base addresses of where "/etc/acpi/rsdp" 
>> and
>> +  "/etc/acpi/tables" will be placed in guest memory.
>> +
>> +  This is where the third file, "/etc/table-loader" comes into the picture. 
>> It
>> +  is a linker/loader script that has several command types:
>> +
>> +One command type instructs the guest to download the other two files.
>> +
>> +Another command type instructs the guest to increment ("absolutize") a
>> +pointer field (having a relative initial value) in some file, with the
>> +dynamic base address of the same (or another) file.
>> +
>> +The third command type instructs the guest to compute checksums over
>> +ranges and to store them.
>> +
>> +  In edk2, EFI_ACPI_TABLE_PROTOCOL knows about table relationships -- it
>> +  handles linkage automatically when a table is installed. The protocol 
>> takes
>> +  care of checksumming too. RSDP is installed automatically. Hence we only 
>> need
>> +  to care about the "etc/acpi/tables" file, determining the boundaries of 
>> each
>> +  table and installing it.
> 
> I'm wondering if some of the information in this comment might have a
> better home in the commit message. Will it help in maintaining the

Re: [Qemu-devel] [edk2 PATCH] OvmfPkg: split the variable store to a separate file

2013-11-22 Thread Laszlo Ersek

On 11/22/13 18:37, Jordan Justen wrote:
> On Thu, Nov 21, 2013 at 2:21 PM, Laszlo Ersek  wrote:
>> Split the variable store off to a separate file when SPLIT_VARSTORE is
>> defined.
> 
> Is the goal to allow the central OVMF to be updated so the VM will
> automatically be running the newest firmware? (While preserving
> variables)

Yes.

In some distros this is what happens (*) when you uprage the
seabios(-bin) package on the host system. When you reboot any VM on it,
it will see the upgraded bios.

(*) Except of course you have no variable store.

The bios binary (the file belonging to its containing package) is also
owned by root and has some nice file mode bits and SELinux permissions:

$ ls -lL /usr/share/qemu-kvm/bios.bin
-rw-r--r--. 1 root root 131072 2013-11-20 14:55:29 +0100
/usr/share/qemu-kvm/bios.bin

$ ls -lLZ /usr/share/qemu-kvm/bios.bin
-rw-r--r--. root root system_u:object_r:usr_t:s0
/usr/share/qemu-kvm/bios.bin

These are distinctly different from what libvirt sets for the private
image files that underneath the VM's disks.

OVMF.fd would correspond to "bios.bin" above, and NVVARSTORE.fd is a
private disk file.

> I think in your scenario, you are assuming some VM manager will build
> the command line parameters. But, couldn't the VM manager also merge
> flash updates into the *single* VM specific flash image? Then QEMU
> would not need to be changed.

Correct. I floated this idea to the libvirt guys and Cole (of
virt-manager fame). The merging proposal was frowned upon. (Also, if
we're talking explicit reflashing, maybe the user should click a button
on virt-manager to request the update... Not sure.)

So basically these patches are just the non-merging alternative that is
perceived as more suitable for some distros.

> This might also enable a 'feature' where the particular VM instance
> can choose to not update the firmware when the central OVMF is
> updated.

Correct. (See the click the button thing above.) However right now VMs
have no choice, and auto-upgrading their OVMF wouldn't be a step back.
But, your remark is 100% valid.

> If we decided splitting was a good thing, then it would be nice to
> just always build the split and full images. I'm not sure .fdf can
> handle this though.

I think it can, if we add all three FDs with different names (like
OVMF.fd, OVMF_SPLIT.fd, NVVARSTORE.fd). I think the build process reuses
the FV files if there are multiple referring FDs -- I seem to recall
that's already happening between MEMFD.fd and OVMF.fd, no?

> Seems like build.sh could be tweaked if .fdf is
> not up to the task.

Certainly -- just invoke it twice with different params.

OTOH building all three FDs per default might be confusing for
end-users. We know for a fact that they usually don't read the README
(or not thoroughly enough). If we only give them one output file per
default, that's less potential for confusion.

But I can certainly post a version where all three files are built per
default, if that's what you prefer (and aren't opposed to the idea in
general).

Thanks!
Laszlo

Re: [Qemu-devel] [PATCH v9 10/11] target-arm: Provide '-cpu host' when running KVM

2013-11-22 Thread Christoffer Dall

On Fri, Nov 22, 2013 at 05:17:17PM +, Peter Maydell wrote:
> Implement '-cpu host' for ARM when we're using KVM, broadly
> in line with other KVM-supporting architectures.
> 
> Signed-off-by: Peter Maydell 
> ---
>  target-arm/helper.c  |6 ++
>  target-arm/kvm.c |  224 
> ++
>  target-arm/kvm_arm.h |   55 +
>  3 files changed, 285 insertions(+)
> 
> diff --git a/target-arm/helper.c b/target-arm/helper.c
> index 3445813..263dbbf 100644
> --- a/target-arm/helper.c
> +++ b/target-arm/helper.c
> @@ -1842,6 +1842,12 @@ void arm_cpu_list(FILE *f, fprintf_function 
> cpu_fprintf)
>  (*cpu_fprintf)(f, "Available CPUs:\n");
>  g_slist_foreach(list, arm_cpu_list_entry, &s);
>  g_slist_free(list);
> +#ifdef CONFIG_KVM
> +/* The 'host' CPU type is dynamically registered only if KVM is
> + * enabled, so we have to special-case it here:
> + */
> +(*cpu_fprintf)(f, "  host (only available in KVM mode)\n");
> +#endif
>  }
>  
>  static void arm_cpu_add_definition(gpointer data, gpointer user_data)
> diff --git a/target-arm/kvm.c b/target-arm/kvm.c
> index 182db85..f865dac 100644
> --- a/target-arm/kvm.c
> +++ b/target-arm/kvm.c
> @@ -27,12 +27,236 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] 
> = {
>  KVM_CAP_LAST_INFO
>  };
>  
> +bool kvm_arm_create_scratch_host_vcpu(const uint32_t *cpus_to_try,
> +  int *fdarray,
> +  struct kvm_vcpu_init *init)
> +{
> +int ret, kvmfd = -1, vmfd = -1, cpufd = -1;
> +
> +kvmfd = qemu_open("/dev/kvm", O_RDWR);
> +if (kvmfd < 0) {
> +goto err;
> +}
> +vmfd = ioctl(kvmfd, KVM_CREATE_VM, 0);
> +if (vmfd < 0) {
> +goto err;
> +}
> +cpufd = ioctl(vmfd, KVM_CREATE_VCPU, 0);
> +if (cpufd < 0) {
> +goto err;
> +}
> +
> +ret = ioctl(vmfd, KVM_ARM_PREFERRED_TARGET, init);
> +if (ret >= 0) {
> +ret = ioctl(cpufd, KVM_ARM_VCPU_INIT, init);
> +if (ret < 0) {
> +goto err;
> +}
> +} else {
> +/* Old kernel which doesn't know about the
> + * PREFERRED_TARGET ioctl: we know it will only support
> + * creating one kind of guest CPU which is its preferred
> + * CPU type.
> + */
> +while (*cpus_to_try != QEMU_KVM_ARM_TARGET_NONE) {
> +init->target = *cpus_to_try++;
> +memset(init->features, 0, sizeof(init->features));
> +ret = ioctl(cpufd, KVM_ARM_VCPU_INIT, init);
> +if (ret >= 0) {
> +break;
> +}
> +}
> +if (ret < 0) {
> +goto err;
> +}
> +}
> +
> +fdarray[0] = kvmfd;
> +fdarray[1] = vmfd;
> +fdarray[2] = cpufd;

you could consider using a define/enum/struct for this instead of an
array, but bah, not important.

> +
> +return true;
> +
> +err:
> +if (cpufd >= 0) {
> +close(cpufd);
> +}
> +if (vmfd >= 0) {
> +close(vmfd);
> +}
> +if (kvmfd >= 0) {
> +close(kvmfd);
> +}
> +
> +return false;
> +}
> +
> +void kvm_arm_destroy_scratch_host_vcpu(int *fdarray)
> +{
> +int i;
> +
> +for (i = 2; i >= 0; i--) {
> +close(fdarray[i]);
> +}
> +}
> +
> +static inline void set_feature(uint64_t *features, int feature)
> +{
> +*features |= 1ULL << feature;
> +}
> +
> +bool kvm_arm_get_host_cpu_features(ARMHostCPUClass *ahcc)
> +{
> +/* Identify the feature bits corresponding to the host CPU, and
> + * fill out the ARMHostCPUClass fields accordingly. To do this
> + * we have to create a scratch VM, create a single CPU inside it,
> + * and then query that CPU for the relevant ID registers.
> + */
> +int i, ret, fdarray[3];
> +uint32_t midr, id_pfr0, id_isar0, mvfr1;
> +uint64_t features = 0;
> +/* Old kernels may not know about the PREFERRED_TARGET ioctl: however
> + * we know these will only support creating one kind of guest CPU,
> + * which is its preferred CPU type.
> + */
> +static const uint32_t cpus_to_try[] = {
> +QEMU_KVM_ARM_TARGET_CORTEX_A15,
> +QEMU_KVM_ARM_TARGET_NONE
> +};
> +struct kvm_vcpu_init init;
> +struct kvm_one_reg idregs[] = {
> +{
> +.id = KVM_REG_ARM | KVM_REG_SIZE_U32
> +| ENCODE_CP_REG(15, 0, 0, 0, 0, 0),
> +.addr = (uintptr_t)&midr,
> +},
> +{
> +.id = KVM_REG_ARM | KVM_REG_SIZE_U32
> +| ENCODE_CP_REG(15, 0, 0, 1, 0, 0),
> +.addr = (uintptr_t)&id_pfr0,
> +},
> +{
> +.id = KVM_REG_ARM | KVM_REG_SIZE_U32
> +| ENCODE_CP_REG(15, 0, 0, 2, 0, 0),
> +.addr = (uintptr_t)&id_isar0,
> +},
> +{
> +.id = KVM_REG_ARM | KVM_REG_SIZE_U32
> +| KVM_REG_ARM_VFP | KVM_REG_ARM_VFP_MVFR1,
> +.a

Re: [Qemu-devel] [PATCH v9 00/11] target-arm: mach virt and -cpu host support

2013-11-22 Thread Christoffer Dall

On Fri, Nov 22, 2013 at 05:17:07PM +, Peter Maydell wrote:
> This patchset combines the 'virt' machine definition and
> -cpu host support patchsets I've posted previous versions
> of. I think these are now ready to go in once 1.8 opens
> up; review appreciated.
> 
> Changes since previous versions:
>  * added in -cpu host patchset, since it really only makes
>sense with mach-virt
>  * rearranged virt address space a bit to allow space
>for a reasonable sized boot flash device and possible
>PCI window if a PCI controller model ever becomes possible
>  * fixed bug where we weren't starting secondary CPUs in
>"PSCI power-down" state (by adding a CPU property which
>lets the board specify that the CPU starts powered down)
>[thanks to Giridhar Maruthy for tracking this down]
>  * includes a KVM header update against current mainline
>  * includes (previously RFC'd) patch which provides defines
>of some kernel KVM constants we can use even if not CONFIG_KVM
>  * put the KVM CPU type into the CPU object rather than having
>a silly 'QOM object type => constant' lookup table
>  * put dtb 'compatible' string into CPU object rather than
>having the board model need to know about it
>  * abstracted out the "create a scratch vcpu for querying
>for capabilities" code; this is currently only called in
>one place, but the abstraction will be needed for 64 bit
>KVM ARM support
> 
> Mostly this is cleanup and streamlining which I noticed
> was either possible or necessary in the course of putting
> 64 bit KVM ARM control on top of this patchset.
> 
> 
> The kernel patch to get the PL011 to work is still needed
> (Christoffer tells me he's working on getting a proper fix
> done for this):
> 

I tell myself that too...

Anyway, looked over the series and it looks good to me.

So except for some of the QOM that I can't really wrap around my brain
yet:

Reviewed-by: Christoffer Dall

Re: [Qemu-devel] [PATCH v9 07/11] hw/arm: Add 'virt' platform

2013-11-22 Thread Christoffer Dall

On Fri, Nov 22, 2013 at 06:17:51PM +, Peter Maydell wrote:
> On 22 November 2013 18:11, Christoffer Dall  
> wrote:
> > On Fri, Nov 22, 2013 at 05:17:14PM +, Peter Maydell wrote:
> >> + * Emulate a virtual board which works by passing Linux all the 
> >> information
> >> + * it needs about what devices are present via the device tree.
> >> + * There are some restrictions about what we can do here:
> >> + *  + we can only present devices whose Linux drivers will work based
> >> + *purely on the device tree with no platform data at all
> >> + *  + we want to present a very stripped-down minimalist platform,
> >> + *both because this reduces the security attack surface from the guest
> >> + *and also because it reduces our exposure to being broken when
> >> + *the kernel updates its device tree bindings and requires further
> >> + *information in a device binding that we aren't providing.
> >> + * This is essentially the same approach kvmtool uses.
> >
> > nit: questionable value of this last line in the comment.
> 
> Well, it says we're not doing something completely bonkers
> of our own devising ;-)
> 

;)

> > +vbi->bootinfo.board_id = -1;
> >
> > board_id = -1 ?
> 
> This is what the kernel's boot protocol says you should
> pass as the board ID register for boards which are new
> enough that they only support booting via device tree
> and not the old-style board ID number system.
> 
ok, thanks.
-Christoffer

Re: [Qemu-devel] [PATCH v9 07/11] hw/arm: Add 'virt' platform

2013-11-22 Thread Peter Maydell

On 22 November 2013 18:11, Christoffer Dall  wrote:
> On Fri, Nov 22, 2013 at 05:17:14PM +, Peter Maydell wrote:
>> + * Emulate a virtual board which works by passing Linux all the information
>> + * it needs about what devices are present via the device tree.
>> + * There are some restrictions about what we can do here:
>> + *  + we can only present devices whose Linux drivers will work based
>> + *purely on the device tree with no platform data at all
>> + *  + we want to present a very stripped-down minimalist platform,
>> + *both because this reduces the security attack surface from the guest
>> + *and also because it reduces our exposure to being broken when
>> + *the kernel updates its device tree bindings and requires further
>> + *information in a device binding that we aren't providing.
>> + * This is essentially the same approach kvmtool uses.
>
> nit: questionable value of this last line in the comment.

Well, it says we're not doing something completely bonkers
of our own devising ;-)

> +vbi->bootinfo.board_id = -1;
>
> board_id = -1 ?

This is what the kernel's boot protocol says you should
pass as the board ID register for boards which are new
enough that they only support booting via device tree
and not the old-style board ID number system.

-- PMM

Re: [Qemu-devel] [edk2 PATCH 1/1] OvmfPkg/AcpiPlatformDxe: download ACPI tables from QEMU

2013-11-22 Thread Jordan Justen

On Tue, Nov 12, 2013 at 7:11 AM, Laszlo Ersek  wrote:
> Qemu v1.7.0-rc0 features an ACPI linker/loader interface, available over
> fw_cfg, written by Michael Tsirkin.
>
> Qemu composes all ACPI tables on the host side, according to the target
> hardware configuration, and makes the tables available to any guest
> firmware over fw_cfg.
>
> The feature moves the burden of keeping ACPI tables up-to-date from boot
> firmware to qemu (which is the source of hardware configuration anyway).
>
> This patch adds client code for this feature.
>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Laszlo Ersek 
> ---
>  OvmfPkg/AcpiPlatformDxe/AcpiPlatform.h |   7 +-
>  OvmfPkg/AcpiPlatformDxe/AcpiPlatform.c |  12 +-
>  OvmfPkg/AcpiPlatformDxe/Qemu.c | 204 
> +
>  3 files changed, 215 insertions(+), 8 deletions(-)
>
> diff --git a/OvmfPkg/AcpiPlatformDxe/AcpiPlatform.h 
> b/OvmfPkg/AcpiPlatformDxe/AcpiPlatform.h
> index 21107cd..c643fa1 100644
> --- a/OvmfPkg/AcpiPlatformDxe/AcpiPlatform.h
> +++ b/OvmfPkg/AcpiPlatformDxe/AcpiPlatform.h
> @@ -10,7 +10,7 @@
>THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS" BASIS,
>WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR 
> IMPLIED.
>
> -**/
> +**/
>
>  #ifndef _ACPI_PLATFORM_H_INCLUDED_
>  #define _ACPI_PLATFORM_H_INCLUDED_
> @@ -61,5 +61,10 @@ InstallXenTables (
>IN   EFI_ACPI_TABLE_PROTOCOL   *AcpiProtocol
>);
>
> +EFI_STATUS
> +EFIAPI
> +InstallQemuLinkedTables (
> +  IN   EFI_ACPI_TABLE_PROTOCOL   *AcpiProtocol
> +  );
>  #endif
>
> diff --git a/OvmfPkg/AcpiPlatformDxe/AcpiPlatform.c 
> b/OvmfPkg/AcpiPlatformDxe/AcpiPlatform.c
> index 6e0b610..084c393 100644
> --- a/OvmfPkg/AcpiPlatformDxe/AcpiPlatform.c
> +++ b/OvmfPkg/AcpiPlatformDxe/AcpiPlatform.c
> @@ -256,16 +256,14 @@ AcpiPlatformEntryPoint (
>
>if (XenDetected ()) {
>  Status = InstallXenTables (AcpiTable);
> -if (EFI_ERROR (Status)) {
> -  Status = FindAcpiTablesInFv (AcpiTable);
> -}
>} else {
> +Status = InstallQemuLinkedTables (AcpiTable);
> +  }
> +
> +  if (EFI_ERROR (Status)) {
>  Status = FindAcpiTablesInFv (AcpiTable);
>}
> -  if (EFI_ERROR (Status)) {
> -return Status;
> -  }
>
> -  return EFI_SUCCESS;
> +  return Status;
>  }
>
> diff --git a/OvmfPkg/AcpiPlatformDxe/Qemu.c b/OvmfPkg/AcpiPlatformDxe/Qemu.c
> index 06bd463..c572f8a 100644
> --- a/OvmfPkg/AcpiPlatformDxe/Qemu.c
> +++ b/OvmfPkg/AcpiPlatformDxe/Qemu.c
> @@ -515,3 +515,207 @@ QemuInstallAcpiTable (
> );
>  }
>
> +
> +/**
> +  Download the ACPI table data file from QEMU and interpret it.
> +
> +  Starting with v1.7.0-rc0, QEMU provides the following three files for 1.7+
> +  machine types:
> +  - etc/acpi/rsdp
> +  - etc/acpi/tables
> +  - etc/table-loader
> +
> +  "etc/acpi/rsdp" and "etc/acpi/tables" are similar, they are only kept
> +  separate because they have different allocation requirements in SeaBIOS.
> +
> +  Both of these fw_cfg files contain preformatted ACPI payload. 
> "etc/acpi/rsdp"
> +  contains only the RSDP table, while "etc/acpi/tables" contains all other
> +  tables, concatenated.
> +
> +  The tables in these two files have been filled in by qemu, but two kinds of
> +  fields are incomplete in each table: pointers to other tables, and 
> checksums
> +  (which depend on the pointers). The pointers are pre-initialized with
> +  *relative* offsets, but their final values depend on where the actual files
> +  will be placed in memory by the guest. That is, the pointer fields need to 
> be
> +  "relocated" (incremented) by the base addresses of where "/etc/acpi/rsdp" 
> and
> +  "/etc/acpi/tables" will be placed in guest memory.
> +
> +  This is where the third file, "/etc/table-loader" comes into the picture. 
> It
> +  is a linker/loader script that has several command types:
> +
> +One command type instructs the guest to download the other two files.
> +
> +Another command type instructs the guest to increment ("absolutize") a
> +pointer field (having a relative initial value) in some file, with the
> +dynamic base address of the same (or another) file.
> +
> +The third command type instructs the guest to compute checksums over
> +ranges and to store them.
> +
> +  In edk2, EFI_ACPI_TABLE_PROTOCOL knows about table relationships -- it
> +  handles linkage automatically when a table is installed. The protocol takes
> +  care of checksumming too. RSDP is installed automatically. Hence we only 
> need
> +  to care about the "etc/acpi/tables" file, determining the boundaries of 
> each
> +  table and installing it.

I'm wondering if some of the information in this comment might have a
better home in the commit message. Will it help in maintaining the
code to have it here? Or, maybe a more terse version can live here?

Of course, I rarely comment my code enough, which is much worse. So,
feel free to ignore this feedback. :)

> +  @param[in

Re: [Qemu-devel] [PATCH v9 07/11] hw/arm: Add 'virt' platform

2013-11-22 Thread Christoffer Dall

On Fri, Nov 22, 2013 at 05:17:14PM +, Peter Maydell wrote:
> Add 'virt' platform support corresponding to arch/arm/mach-virt
> in the Linux kernel tree. This has no platform-specific code but
> can use any device whose kernel driver is is able to work purely
> from a device tree node. We use this to instantiate a minimal
> set of devices: a GIC and some virtio-mmio transports.
> 
> Signed-off-by: John Rigby 
> [PMM:
>  Significantly overhauled:
>  * renamed user-facing machine to just "virt"
>  * removed the A9 support (it can't work since the A9 has no
>generic timers)
>  * added virtio-mmio transports instead of random set of 'soc' devices
>(though we retain a pl011 UART)
>  * instead of updating io_base as we step through adding devices,
>define a memory map with an array (similar to vexpress)
>  * similarly, define irqmap with an array
>  * folded in some minor fixes from John's aarch64-support patch
>  * rather than explicitly doing endian-swapping on FDT cells,
>use fdt APIs that let us just pass in host-endian values
>and let the fdt layer take care of the swapping
>  * miscellaneous minor code cleanups and style fixes
> ]
> Signed-off-by: Peter Maydell 
> ---
>  hw/arm/Makefile.objs |2 +-
>  hw/arm/virt.c|  444 
> ++
>  2 files changed, 445 insertions(+), 1 deletion(-)
>  create mode 100644 hw/arm/virt.c
> 
> diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs
> index 3671b42..78b5614 100644
> --- a/hw/arm/Makefile.objs
> +++ b/hw/arm/Makefile.objs
> @@ -1,7 +1,7 @@
>  obj-y += boot.o collie.o exynos4_boards.o gumstix.o highbank.o
>  obj-y += integratorcp.o kzm.o mainstone.o musicpal.o nseries.o
>  obj-y += omap_sx1.o palm.o realview.o spitz.o stellaris.o
> -obj-y += tosa.o versatilepb.o vexpress.o xilinx_zynq.o z2.o
> +obj-y += tosa.o versatilepb.o vexpress.o virt.o xilinx_zynq.o z2.o
>  
>  obj-y += armv7m.o exynos4210.o pxa2xx.o pxa2xx_gpio.o pxa2xx_pic.o
>  obj-y += omap1.o omap2.o strongarm.o
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> new file mode 100644
> index 000..1e42cc2
> --- /dev/null
> +++ b/hw/arm/virt.c
> @@ -0,0 +1,444 @@
> +/*
> + * ARM mach-virt emulation
> + *
> + * Copyright (c) 2013 Linaro Limited
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2 or later, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along 
> with
> + * this program.  If not, see .
> + *
> + * Emulate a virtual board which works by passing Linux all the information
> + * it needs about what devices are present via the device tree.
> + * There are some restrictions about what we can do here:
> + *  + we can only present devices whose Linux drivers will work based
> + *purely on the device tree with no platform data at all
> + *  + we want to present a very stripped-down minimalist platform,
> + *both because this reduces the security attack surface from the guest
> + *and also because it reduces our exposure to being broken when
> + *the kernel updates its device tree bindings and requires further
> + *information in a device binding that we aren't providing.
> + * This is essentially the same approach kvmtool uses.

nit: questionable value of this last line in the comment.

> + */
> +
> +#include "hw/sysbus.h"
> +#include "hw/arm/arm.h"
> +#include "hw/arm/primecell.h"
> +#include "hw/devices.h"
> +#include "net/net.h"
> +#include "sysemu/device_tree.h"
> +#include "sysemu/sysemu.h"
> +#include "sysemu/kvm.h"
> +#include "hw/boards.h"
> +#include "exec/address-spaces.h"
> +#include "qemu/bitops.h"
> +#include "qemu/error-report.h"
> +
> +#define NUM_VIRTIO_TRANSPORTS 32
> +
> +/* Number of external interrupt lines to configure the GIC with */
> +#define NUM_IRQS 128
> +
> +#define GIC_FDT_IRQ_TYPE_SPI 0
> +#define GIC_FDT_IRQ_TYPE_PPI 1
> +
> +#define GIC_FDT_IRQ_FLAGS_EDGE_LO_HI 1
> +#define GIC_FDT_IRQ_FLAGS_EDGE_HI_LO 2
> +#define GIC_FDT_IRQ_FLAGS_LEVEL_HI 4
> +#define GIC_FDT_IRQ_FLAGS_LEVEL_LO 8
> +
> +#define GIC_FDT_IRQ_PPI_CPU_START 8
> +#define GIC_FDT_IRQ_PPI_CPU_WIDTH 8
> +
> +enum {
> +VIRT_FLASH,
> +VIRT_MEM,
> +VIRT_CPUPERIPHS,
> +VIRT_GIC_DIST,
> +VIRT_GIC_CPU,
> +VIRT_UART,
> +VIRT_MMIO,
> +};
> +
> +typedef struct MemMapEntry {
> +hwaddr base;
> +hwaddr size;
> +} MemMapEntry;
> +
> +typedef struct VirtBoardInfo {
> +struct arm_boot_info bootinfo;
> +const char *cpu_model;
> +const char *qdevname;
> +const char *gic_compatible;
> +const MemMap

Re: [Qemu-devel] [PATCH v9 04/11] target-arm: Provide PSCI constants to generic QEMU code

2013-11-22 Thread Christoffer Dall

On Fri, Nov 22, 2013 at 05:17:11PM +, Peter Maydell wrote:
> Provide versions of the KVM PSCI constants to non-KVM code;
> this will allow us to avoid an ifdef in boards which set up
> a PSCI node in the device tree.
> 
> Signed-off-by: Peter Maydell 
> ---
>  target-arm/kvm-consts.h |   12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/target-arm/kvm-consts.h b/target-arm/kvm-consts.h
> index 6f56f72..4062f11 100644
> --- a/target-arm/kvm-consts.h
> +++ b/target-arm/kvm-consts.h
> @@ -36,6 +36,18 @@ MISMATCH_CHECK(CP_REG_SIZE_U32, KVM_REG_SIZE_U32)
>  MISMATCH_CHECK(CP_REG_SIZE_U64, KVM_REG_SIZE_U64)
>  MISMATCH_CHECK(CP_REG_ARM, KVM_REG_ARM)
>  
> +#define PSCI_FN_BASE 0x95c1ba5e
> +#define PSCI_FN(n) (PSCI_FN_BASE + (n))
> +#define PSCI_FN_CPU_SUSPEND PSCI_FN(0)
> +#define PSCI_FN_CPU_OFF PSCI_FN(1)
> +#define PSCI_FN_CPU_ON PSCI_FN(2)
> +#define PSCI_FN_MIGRATE PSCI_FN(3)
> +
> +MISMATCH_CHECK(PSCI_FN_CPU_SUSPEND, KVM_PSCI_FN_CPU_SUSPEND)
> +MISMATCH_CHECK(PSCI_FN_CPU_OFF, KVM_PSCI_FN_CPU_OFF)
> +MISMATCH_CHECK(PSCI_FN_CPU_ON, KVM_PSCI_FN_CPU_ON)
> +MISMATCH_CHECK(PSCI_FN_MIGRATE, KVM_PSCI_FN_MIGRATE)
> +
>  #undef MISMATCH_CHECK
>  
>  #endif
> -- 
> 1.7.9.5
> 
These are the defines specifially chosen for the KVM implementation of
PSCI 0.1 right?  I wonder if we should name them as such, but then it
conflicts with the actual KVM defines I guess.  Also, perhaps unrelated,
I guess it makes sense to reuse this base and these function IDs if we
add PSCI support to QEMU itself for TCG.  Hmmm.

-Christoffer

[Qemu-devel] [PATCH v9 04/11] target-arm: Provide PSCI constants to generic QEMU code

2013-11-22 Thread Peter Maydell

Provide versions of the KVM PSCI constants to non-KVM code;
this will allow us to avoid an ifdef in boards which set up
a PSCI node in the device tree.

Signed-off-by: Peter Maydell 
---
 target-arm/kvm-consts.h |   12 
 1 file changed, 12 insertions(+)

diff --git a/target-arm/kvm-consts.h b/target-arm/kvm-consts.h
index 6f56f72..4062f11 100644
--- a/target-arm/kvm-consts.h
+++ b/target-arm/kvm-consts.h
@@ -36,6 +36,18 @@ MISMATCH_CHECK(CP_REG_SIZE_U32, KVM_REG_SIZE_U32)
 MISMATCH_CHECK(CP_REG_SIZE_U64, KVM_REG_SIZE_U64)
 MISMATCH_CHECK(CP_REG_ARM, KVM_REG_ARM)
 
+#define PSCI_FN_BASE 0x95c1ba5e
+#define PSCI_FN(n) (PSCI_FN_BASE + (n))
+#define PSCI_FN_CPU_SUSPEND PSCI_FN(0)
+#define PSCI_FN_CPU_OFF PSCI_FN(1)
+#define PSCI_FN_CPU_ON PSCI_FN(2)
+#define PSCI_FN_MIGRATE PSCI_FN(3)
+
+MISMATCH_CHECK(PSCI_FN_CPU_SUSPEND, KVM_PSCI_FN_CPU_SUSPEND)
+MISMATCH_CHECK(PSCI_FN_CPU_OFF, KVM_PSCI_FN_CPU_OFF)
+MISMATCH_CHECK(PSCI_FN_CPU_ON, KVM_PSCI_FN_CPU_ON)
+MISMATCH_CHECK(PSCI_FN_MIGRATE, KVM_PSCI_FN_MIGRATE)
+
 #undef MISMATCH_CHECK
 
 #endif
-- 
1.7.9.5

[Qemu-devel] [PATCH v9 11/11] hw/arm/virt: Support -cpu host

2013-11-22 Thread Peter Maydell

Support -cpu host in virt machine (treating it like an A15, ie
with a GIC v2 and the A15's private peripherals.)

Signed-off-by: Peter Maydell 
---
 hw/arm/virt.c |8 
 1 file changed, 8 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 1e42cc2..9531b5a 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -122,6 +122,14 @@ static VirtBoardInfo machines[] = {
 .memmap = a15memmap,
 .irqmap = a15irqmap,
 },
+{
+.cpu_model = "host",
+/* We use the A15 private peripheral model to get a V2 GIC */
+.qdevname = "a15mpcore_priv",
+.gic_compatible = "arm,cortex-a15-gic",
+.memmap = a15memmap,
+.irqmap = a15irqmap,
+},
 };
 
 static VirtBoardInfo *find_machine_info(const char *cpu)
-- 
1.7.9.5

Re: [Qemu-devel] [edk2 PATCH] OvmfPkg: split the variable store to a separate file

2013-11-22 Thread Jordan Justen

On Thu, Nov 21, 2013 at 2:21 PM, Laszlo Ersek  wrote:
> Split the variable store off to a separate file when SPLIT_VARSTORE is
> defined.

Is the goal to allow the central OVMF to be updated so the VM will
automatically be running the newest firmware? (While preserving
variables)

I think in your scenario, you are assuming some VM manager will build
the command line parameters. But, couldn't the VM manager also merge
flash updates into the *single* VM specific flash image? Then QEMU
would not need to be changed.

This might also enable a 'feature' where the particular VM instance
can choose to not update the firmware when the central OVMF is
updated.

If we decided splitting was a good thing, then it would be nice to
just always build the split and full images. I'm not sure .fdf can
handle this though. Seems like build.sh could be tweaked if .fdf is
not up to the task.

-Jordan

> Even in this case, the preexistent PCDs' values don't change. Qemu must
> take care of contiguously mapping NVVARSTORE.fd + OVMF.fd so that when
> concatenated they end exactly at 4GB.
>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Laszlo Ersek 
> ---
>  Generated with 8 lines of context for easier in-patch verification with
>  the help of the clipboard.
>
>  OvmfPkg/OvmfPkgIa32.fdf| 48 
> ++
>  OvmfPkg/OvmfPkgIa32X64.fdf | 48 
> ++
>  OvmfPkg/OvmfPkgX64.fdf | 48 
> ++
>  OvmfPkg/README | 16 
>  4 files changed, 160 insertions(+)
>
> diff --git a/OvmfPkg/OvmfPkgIa32.fdf b/OvmfPkg/OvmfPkgIa32.fdf
> index c50709c..310d6a9 100644
> --- a/OvmfPkg/OvmfPkgIa32.fdf
> +++ b/OvmfPkg/OvmfPkgIa32.fdf
> @@ -23,31 +23,51 @@
>  #
>  [Defines]
>  !if $(TARGET) == RELEASE
>  !ifndef $(FD_SIZE_2MB)
>  DEFINE FD_SIZE_1MB=
>  !endif
>  !endif
>
> +!ifdef $(SPLIT_VARSTORE)
> +!ifdef $(FD_SIZE_1MB)
> +[FD.NvVarStore]
> +BaseAddress   = 0xFFF0|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFdBaseAddress
> +SET gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFirmwareFdSize = 0x0010
> +Size  = 0x2
> +ErasePolarity = 1
> +BlockSize = 0x1000|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFirmwareBlockSize
> +NumBlocks = 0x20
> +!else
> +[FD.NvVarStore]
> +BaseAddress   = 0xFFE0|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFdBaseAddress
> +SET gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFirmwareFdSize = 0x0020
> +Size  = 0x2
> +ErasePolarity = 1
> +BlockSize = 0x1000|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFirmwareBlockSize
> +NumBlocks = 0x20
> +!endif
> +!else
>  !ifdef $(FD_SIZE_1MB)
>  [FD.OVMF]
>  BaseAddress   = 0xFFF0|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFdBaseAddress
>  Size  = 0x0010|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFirmwareFdSize
>  ErasePolarity = 1
>  BlockSize = 0x1000|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFirmwareBlockSize
>  NumBlocks = 0x100
>  !else
>  [FD.OVMF]
>  BaseAddress   = 0xFFE0|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFdBaseAddress
>  Size  = 0x0020|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFirmwareFdSize
>  ErasePolarity = 1
>  BlockSize = 0x1000|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFirmwareBlockSize
>  NumBlocks = 0x200
>  !endif
> +!endif
>
>  0x|0xe000
>  
> gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFlashNvStorageVariableBase|gEfiMdeModulePkgTokenSpaceGuid.PcdFlashNvStorageVariableSize
>  #NV_VARIABLE_STORE
>  DATA = {
>## This is the EFI_FIRMWARE_VOLUME_HEADER
># ZeroVector []
>0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
> @@ -106,30 +126,58 @@ DATA = {
># WriteQueueSize: UINT64
>0xE0, 0x0F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
>  }
>
>  0x0001|0x0001
>  #NV_FTW_SPARE
>  
> gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFlashNvStorageFtwSpareBase|gEfiMdeModulePkgTokenSpaceGuid.PcdFlashNvStorageFtwSpareSize
>
> +!ifdef $(SPLIT_VARSTORE)
> +!ifdef $(FD_SIZE_1MB)
> +[FD.OVMF]
> +BaseAddress   = 0xFFF2
> +Size  = 0x000E
> +ErasePolarity = 1
> +BlockSize = 0x1000
> +NumBlocks = 0xE0
> +
> +0x|0x000CC000
> +FV = FVMAIN_COMPACT
> +0x000CC000|0x14000
> +FV = SECFV
> +!else
> +[FD.OVMF]
> +BaseAddress   = 0xFFE2
> +Size  = 0x001E
> +ErasePolarity = 1
> +BlockSize = 0x1000
> +NumBlocks = 0x1E0
> +
> +0x|0x001AC000
> +FV = FVMAIN_COMPACT
> +0x001AC000|0x34000
> +FV = SECFV
> +!endif
> +!else
>  !ifdef $(FD_SIZE_1MB)
>  0x0002|0x000CC000
>  FV = FVMAIN_COMPACT
>
>  0x000EC000|0x14000
>  FV = SECFV
>
>  !else
>  0x0002|0x001AC000
>  FV = FVMAIN_COMPACT
>
>  0x001CC000|0x34000
>  FV = SECFV
>  !endif
> +!endif
>
>  
> 
>
>  [FD.MEMFD]
>  BaseAddress   = 0x80|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfMemFvBase
>  Size  = 0x80|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfMemFvSize
>  ErasePolarity = 1
>  BlockSize = 0x1
> diff --git a/OvmfPkg/OvmfPkgIa32X64.f

[Qemu-devel] virtio-net: network stops responding in Win2k3 server

2013-11-22 Thread Mario De Chenno

Hi all.
We are facing some network issues on some Windows Server 2003 machines.
In short, network seems to lock up and stops responding even to ping
requests. From TCPdump on the tap interface on the server I only see arp
request to the gateway, without replies. Increased tx overruns on the some
interface too. No errors at all on windows side. Most of the time we can
disable the interface from within windows and re-enable it to make things
work again. Rarely we have to shut down the virtual machine (Windows reboot
does not solve the problem).
The issue appears randomly without apparent relation with server activity.
One server makes a lot of small outbond connections and locks up about
every 12-24 hours. It has two nic defined and only one is affected (the
heaviest loaded). Other servers have just one nic and lock less often.
We run Qemu-kvm 1.4.0 and latest virtio-win drivers. On the same host we
did run a Linux web server with no issue in months of activity, with
traffic peaks up to 80Mb/s.

Here's is the startup command:

qemu-system-x86_64 -enable-kvm -machine type=pc,accel=kvm -cpu kvm64
-pidfile /vmstore/vm_pids/solari.pid -rtc base=localtime -drive
file=/vmstore/vm_disks/solari.img,if=virtio -netdev
tap,id=nic01,ifname=tap-solaripub,script=pub102-ifup -device
virtio-net-pci,netdev=nic01,mac=CE:DA:01:00:17:16 -netdev
tap,id=nic02,ifname=tap-solariconsip,script=consip-ifup -device
virtio-net-pci,netdev=nic02,mac=CE:DA:01:00:16:16 -vnc :16 -m 4096 -boot c
-k it -usbdevice tablet -name solari -daemonize

We tried also the following options without result
,vhost=off,vnet_hdr=on
,ioeventfd=on,event_idx=off


How can we troubleshoot the issue?
Thanks

[Qemu-devel] [PATCH v9 07/11] hw/arm: Add 'virt' platform

2013-11-22 Thread Peter Maydell

Add 'virt' platform support corresponding to arch/arm/mach-virt
in the Linux kernel tree. This has no platform-specific code but
can use any device whose kernel driver is is able to work purely
from a device tree node. We use this to instantiate a minimal
set of devices: a GIC and some virtio-mmio transports.

Signed-off-by: John Rigby 
[PMM:
 Significantly overhauled:
 * renamed user-facing machine to just "virt"
 * removed the A9 support (it can't work since the A9 has no
   generic timers)
 * added virtio-mmio transports instead of random set of 'soc' devices
   (though we retain a pl011 UART)
 * instead of updating io_base as we step through adding devices,
   define a memory map with an array (similar to vexpress)
 * similarly, define irqmap with an array
 * folded in some minor fixes from John's aarch64-support patch
 * rather than explicitly doing endian-swapping on FDT cells,
   use fdt APIs that let us just pass in host-endian values
   and let the fdt layer take care of the swapping
 * miscellaneous minor code cleanups and style fixes
]
Signed-off-by: Peter Maydell 
---
 hw/arm/Makefile.objs |2 +-
 hw/arm/virt.c|  444 ++
 2 files changed, 445 insertions(+), 1 deletion(-)
 create mode 100644 hw/arm/virt.c

diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs
index 3671b42..78b5614 100644
--- a/hw/arm/Makefile.objs
+++ b/hw/arm/Makefile.objs
@@ -1,7 +1,7 @@
 obj-y += boot.o collie.o exynos4_boards.o gumstix.o highbank.o
 obj-y += integratorcp.o kzm.o mainstone.o musicpal.o nseries.o
 obj-y += omap_sx1.o palm.o realview.o spitz.o stellaris.o
-obj-y += tosa.o versatilepb.o vexpress.o xilinx_zynq.o z2.o
+obj-y += tosa.o versatilepb.o vexpress.o virt.o xilinx_zynq.o z2.o
 
 obj-y += armv7m.o exynos4210.o pxa2xx.o pxa2xx_gpio.o pxa2xx_pic.o
 obj-y += omap1.o omap2.o strongarm.o
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
new file mode 100644
index 000..1e42cc2
--- /dev/null
+++ b/hw/arm/virt.c
@@ -0,0 +1,444 @@
+/*
+ * ARM mach-virt emulation
+ *
+ * Copyright (c) 2013 Linaro Limited
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ *
+ * Emulate a virtual board which works by passing Linux all the information
+ * it needs about what devices are present via the device tree.
+ * There are some restrictions about what we can do here:
+ *  + we can only present devices whose Linux drivers will work based
+ *purely on the device tree with no platform data at all
+ *  + we want to present a very stripped-down minimalist platform,
+ *both because this reduces the security attack surface from the guest
+ *and also because it reduces our exposure to being broken when
+ *the kernel updates its device tree bindings and requires further
+ *information in a device binding that we aren't providing.
+ * This is essentially the same approach kvmtool uses.
+ */
+
+#include "hw/sysbus.h"
+#include "hw/arm/arm.h"
+#include "hw/arm/primecell.h"
+#include "hw/devices.h"
+#include "net/net.h"
+#include "sysemu/device_tree.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/kvm.h"
+#include "hw/boards.h"
+#include "exec/address-spaces.h"
+#include "qemu/bitops.h"
+#include "qemu/error-report.h"
+
+#define NUM_VIRTIO_TRANSPORTS 32
+
+/* Number of external interrupt lines to configure the GIC with */
+#define NUM_IRQS 128
+
+#define GIC_FDT_IRQ_TYPE_SPI 0
+#define GIC_FDT_IRQ_TYPE_PPI 1
+
+#define GIC_FDT_IRQ_FLAGS_EDGE_LO_HI 1
+#define GIC_FDT_IRQ_FLAGS_EDGE_HI_LO 2
+#define GIC_FDT_IRQ_FLAGS_LEVEL_HI 4
+#define GIC_FDT_IRQ_FLAGS_LEVEL_LO 8
+
+#define GIC_FDT_IRQ_PPI_CPU_START 8
+#define GIC_FDT_IRQ_PPI_CPU_WIDTH 8
+
+enum {
+VIRT_FLASH,
+VIRT_MEM,
+VIRT_CPUPERIPHS,
+VIRT_GIC_DIST,
+VIRT_GIC_CPU,
+VIRT_UART,
+VIRT_MMIO,
+};
+
+typedef struct MemMapEntry {
+hwaddr base;
+hwaddr size;
+} MemMapEntry;
+
+typedef struct VirtBoardInfo {
+struct arm_boot_info bootinfo;
+const char *cpu_model;
+const char *qdevname;
+const char *gic_compatible;
+const MemMapEntry *memmap;
+const int *irqmap;
+int smp_cpus;
+void *fdt;
+int fdt_size;
+uint32_t clock_phandle;
+} VirtBoardInfo;
+
+/* Addresses and sizes of our components.
+ * 0..128MB is space for a flash device so we can run bootrom code such as 
UEFI.
+ * 128MB..256MB is used for miscellaneous device I/O.
+ * 256MB..1GB is reserved for possible future PCI support (ie w

[Qemu-devel] [PATCH v9 06/11] target-arm: Allow secondary KVM CPUs to be booted via PSCI

2013-11-22 Thread Peter Maydell

New ARM boards are generally expected to boot their secondary CPUs
via the PSCI interface, rather than ad-hoc "loop around in holding
pen code" as hw/arm/boot.c implements. In particular this is
necessary for mach-virt kernels. For KVM we achieve this by creating
the VCPUs with a feature flag marking them as starting in PSCI
powered-down state; the guest kernel will then make a PSCI call
(implemented in the host kernel) to start the secondaries at
an address of its choosing once it has got the primary CPU up.

Implement this setting of the feature flag, controlled by a
qdev property for ARMCPU, which board code can set if it is a
PSCI system.

Signed-off-by: Peter Maydell 
---
 target-arm/cpu-qom.h |3 +++
 target-arm/cpu.c |7 +++
 target-arm/kvm.c |3 +++
 3 files changed, 13 insertions(+)

diff --git a/target-arm/cpu-qom.h b/target-arm/cpu-qom.h
index cbb9eec..8bd3e36 100644
--- a/target-arm/cpu-qom.h
+++ b/target-arm/cpu-qom.h
@@ -94,6 +94,9 @@ typedef struct ARMCPU {
 /* 'compatible' string for this CPU for Linux device trees */
 const char *dtb_compatible;
 
+/* Should CPU start in PSCI powered-off state? */
+bool start_powered_off;
+
 /* The instance init functions for implementation-specific subclasses
  * set these fields to specify the implementation-dependent values of
  * various constant registers and reset values of non-constant
diff --git a/target-arm/cpu.c b/target-arm/cpu.c
index 4c8d9c7..0325815 100644
--- a/target-arm/cpu.c
+++ b/target-arm/cpu.c
@@ -20,6 +20,7 @@
 
 #include "cpu.h"
 #include "qemu-common.h"
+#include "hw/qdev-properties.h"
 #if !defined(CONFIG_USER_ONLY)
 #include "hw/loader.h"
 #endif
@@ -944,6 +945,11 @@ static const ARMCPUInfo arm_cpus[] = {
 #endif
 };
 
+static Property arm_cpu_properties[] = {
+DEFINE_PROP_BOOL("start-powered-off", ARMCPU, start_powered_off, false),
+DEFINE_PROP_END_OF_LIST()
+};
+
 static void arm_cpu_class_init(ObjectClass *oc, void *data)
 {
 ARMCPUClass *acc = ARM_CPU_CLASS(oc);
@@ -952,6 +958,7 @@ static void arm_cpu_class_init(ObjectClass *oc, void *data)
 
 acc->parent_realize = dc->realize;
 dc->realize = arm_cpu_realizefn;
+dc->props = arm_cpu_properties;
 
 acc->parent_reset = cc->reset;
 cc->reset = arm_cpu_reset;
diff --git a/target-arm/kvm.c b/target-arm/kvm.c
index 3098456..80c58c5 100644
--- a/target-arm/kvm.c
+++ b/target-arm/kvm.c
@@ -79,6 +79,9 @@ int kvm_arch_init_vcpu(CPUState *cs)
 
 init.target = KVM_ARM_TARGET_CORTEX_A15;
 memset(init.features, 0, sizeof(init.features));
+if (cpu->start_powered_off) {
+init.features[0] = 1 << KVM_ARM_VCPU_POWER_OFF;
+}
 ret = kvm_vcpu_ioctl(cs, KVM_ARM_VCPU_INIT, &init);
 if (ret) {
 return ret;
-- 
1.7.9.5

[Qemu-devel] [PATCH v9 01/11] target-arm: Provide mechanism for getting KVM constants even if not CONFIG_KVM

2013-11-22 Thread Peter Maydell

There are a number of places where it would be convenient for ARM
code to have working definitions of KVM constants even in code
which is compiled with CONFIG_KVM not set. In this situation we
can't simply include the kernel KVM headers (which might conflict
with host header definitions or not even compile on the compiler
we're using) so we have to redefine equivalent constants.
Provide a mechanism for doing this and checking that the values
match, and use it for the constants we're currently exposing
via an ad-hoc mechanism.

Signed-off-by: Peter Maydell 
---
 target-arm/cpu.h|   13 ++---
 target-arm/kvm-consts.h |   41 +
 target-arm/kvm.c|9 -
 3 files changed, 43 insertions(+), 20 deletions(-)
 create mode 100644 target-arm/kvm-consts.h

diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 9f110f1..c3f007f 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -21,6 +21,8 @@
 
 #include "config.h"
 
+#include "kvm-consts.h"
+
 #if defined(TARGET_AARCH64)
   /* AArch64 definitions */
 #  define TARGET_LONG_BITS 64
@@ -497,17 +499,6 @@ void armv7m_nvic_complete_irq(void *opaque, int irq);
 (((cp) << 16) | ((is64) << 15) | ((crn) << 11) |\
  ((crm) << 7) | ((opc1) << 3) | (opc2))
 
-/* Note that these must line up with the KVM/ARM register
- * ID field definitions (kvm.c will check this, but we
- * can't just use the KVM defines here as the kvm headers
- * are unavailable to non-KVM-specific files)
- */
-#define CP_REG_SIZE_SHIFT 52
-#define CP_REG_SIZE_MASK   0x00f0ULL
-#define CP_REG_SIZE_U320x0020ULL
-#define CP_REG_SIZE_U640x0030ULL
-#define CP_REG_ARM 0x4000ULL
-
 /* Convert a full 64 bit KVM register ID to the truncated 32 bit
  * version used as a key for the coprocessor register hashtable
  */
diff --git a/target-arm/kvm-consts.h b/target-arm/kvm-consts.h
new file mode 100644
index 000..6f56f72
--- /dev/null
+++ b/target-arm/kvm-consts.h
@@ -0,0 +1,41 @@
+/*
+ * KVM ARM ABI constant definitions
+ *
+ * Copyright (c) 2013 Linaro Limited
+ *
+ * Provide versions of KVM constant defines that can be used even
+ * when CONFIG_KVM is not set and we don't have access to the
+ * KVM headers. If CONFIG_KVM is set, we do a compile-time check
+ * that we haven't got out of sync somehow.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#ifndef ARM_KVM_CONSTS_H
+#define ARM_KVM_CONSTS_H
+
+#ifdef CONFIG_KVM
+#include "qemu/compiler.h"
+#include 
+
+#define MISMATCH_CHECK(X, Y) QEMU_BUILD_BUG_ON(X != Y)
+
+#else
+#define MISMATCH_CHECK(X, Y)
+#endif
+
+#define CP_REG_SIZE_SHIFT 52
+#define CP_REG_SIZE_MASK   0x00f0ULL
+#define CP_REG_SIZE_U320x0020ULL
+#define CP_REG_SIZE_U640x0030ULL
+#define CP_REG_ARM 0x4000ULL
+
+MISMATCH_CHECK(CP_REG_SIZE_SHIFT, KVM_REG_SIZE_SHIFT)
+MISMATCH_CHECK(CP_REG_SIZE_MASK, KVM_REG_SIZE_MASK)
+MISMATCH_CHECK(CP_REG_SIZE_U32, KVM_REG_SIZE_U32)
+MISMATCH_CHECK(CP_REG_SIZE_U64, KVM_REG_SIZE_U64)
+MISMATCH_CHECK(CP_REG_ARM, KVM_REG_ARM)
+
+#undef MISMATCH_CHECK
+
+#endif
diff --git a/target-arm/kvm.c b/target-arm/kvm.c
index 6e5cd36..3098456 100644
--- a/target-arm/kvm.c
+++ b/target-arm/kvm.c
@@ -23,15 +23,6 @@
 #include "cpu.h"
 #include "hw/arm/arm.h"
 
-/* Check that cpu.h's idea of coprocessor fields matches KVM's */
-#if (CP_REG_SIZE_SHIFT != KVM_REG_SIZE_SHIFT) || \
-(CP_REG_SIZE_MASK != KVM_REG_SIZE_MASK) ||   \
-(CP_REG_SIZE_U32 != KVM_REG_SIZE_U32) || \
-(CP_REG_SIZE_U64 != KVM_REG_SIZE_U64) || \
-(CP_REG_ARM != KVM_REG_ARM)
-#error mismatch between cpu.h and KVM header definitions
-#endif
-
 const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 KVM_CAP_LAST_INFO
 };
-- 
1.7.9.5

[Qemu-devel] [PATCH v9 05/11] target-arm: Add ARMCPU field for Linux device-tree 'compatible' string

2013-11-22 Thread Peter Maydell

Linux requires device tree CPU nodes to include a 'compatible'
string describing the CPU. Add a field in the ARMCPU struct for
this so that boards which construct a device tree can insert
the correct CPU nodes.

Note that there is currently no officially specified 'compatible'
string for the TI925T, Cortex-M3 or SA1110 CPUs.

Signed-off-by: Peter Maydell 
---
 target-arm/cpu-qom.h |3 +++
 target-arm/cpu.c |   50 ++
 2 files changed, 53 insertions(+)

diff --git a/target-arm/cpu-qom.h b/target-arm/cpu-qom.h
index b55306a..cbb9eec 100644
--- a/target-arm/cpu-qom.h
+++ b/target-arm/cpu-qom.h
@@ -91,6 +91,9 @@ typedef struct ARMCPU {
 /* GPIO outputs for generic timer */
 qemu_irq gt_timer_outputs[NUM_GTIMERS];
 
+/* 'compatible' string for this CPU for Linux device trees */
+const char *dtb_compatible;
+
 /* The instance init functions for implementation-specific subclasses
  * set these fields to specify the implementation-dependent values of
  * various constant registers and reset values of non-constant
diff --git a/target-arm/cpu.c b/target-arm/cpu.c
index d40f2a7..4c8d9c7 100644
--- a/target-arm/cpu.c
+++ b/target-arm/cpu.c
@@ -217,6 +217,12 @@ static void arm_cpu_initfn(Object *obj)
ARRAY_SIZE(cpu->gt_timer_outputs));
 #endif
 
+/* DTB consumers generally don't in fact care what the 'compatible'
+ * string is, so always provide some string and trust that a hypothetical
+ * picky DTB consumer will also provide a helpful error message.
+ */
+cpu->dtb_compatible = "qemu,unknown";
+
 if (tcg_enabled() && !inited) {
 inited = true;
 arm_translate_init();
@@ -318,6 +324,8 @@ static ObjectClass *arm_cpu_class_by_name(const char 
*cpu_model)
 static void arm926_initfn(Object *obj)
 {
 ARMCPU *cpu = ARM_CPU(obj);
+
+cpu->dtb_compatible = "arm,arm926";
 set_feature(&cpu->env, ARM_FEATURE_V5);
 set_feature(&cpu->env, ARM_FEATURE_VFP);
 set_feature(&cpu->env, ARM_FEATURE_DUMMY_C15_REGS);
@@ -331,6 +339,8 @@ static void arm926_initfn(Object *obj)
 static void arm946_initfn(Object *obj)
 {
 ARMCPU *cpu = ARM_CPU(obj);
+
+cpu->dtb_compatible = "arm,arm946";
 set_feature(&cpu->env, ARM_FEATURE_V5);
 set_feature(&cpu->env, ARM_FEATURE_MPU);
 set_feature(&cpu->env, ARM_FEATURE_DUMMY_C15_REGS);
@@ -342,6 +352,8 @@ static void arm946_initfn(Object *obj)
 static void arm1026_initfn(Object *obj)
 {
 ARMCPU *cpu = ARM_CPU(obj);
+
+cpu->dtb_compatible = "arm,arm1026";
 set_feature(&cpu->env, ARM_FEATURE_V5);
 set_feature(&cpu->env, ARM_FEATURE_VFP);
 set_feature(&cpu->env, ARM_FEATURE_AUXCR);
@@ -374,6 +386,8 @@ static void arm1136_r2_initfn(Object *obj)
  * for 1136_r2 (in particular r0p2 does not actually implement most
  * of the ID registers).
  */
+
+cpu->dtb_compatible = "arm,arm1136";
 set_feature(&cpu->env, ARM_FEATURE_V6);
 set_feature(&cpu->env, ARM_FEATURE_VFP);
 set_feature(&cpu->env, ARM_FEATURE_DUMMY_C15_REGS);
@@ -403,6 +417,8 @@ static void arm1136_r2_initfn(Object *obj)
 static void arm1136_initfn(Object *obj)
 {
 ARMCPU *cpu = ARM_CPU(obj);
+
+cpu->dtb_compatible = "arm,arm1136";
 set_feature(&cpu->env, ARM_FEATURE_V6K);
 set_feature(&cpu->env, ARM_FEATURE_V6);
 set_feature(&cpu->env, ARM_FEATURE_VFP);
@@ -433,6 +449,8 @@ static void arm1136_initfn(Object *obj)
 static void arm1176_initfn(Object *obj)
 {
 ARMCPU *cpu = ARM_CPU(obj);
+
+cpu->dtb_compatible = "arm,arm1176";
 set_feature(&cpu->env, ARM_FEATURE_V6K);
 set_feature(&cpu->env, ARM_FEATURE_VFP);
 set_feature(&cpu->env, ARM_FEATURE_VAPA);
@@ -463,6 +481,8 @@ static void arm1176_initfn(Object *obj)
 static void arm11mpcore_initfn(Object *obj)
 {
 ARMCPU *cpu = ARM_CPU(obj);
+
+cpu->dtb_compatible = "arm,arm11mpcore";
 set_feature(&cpu->env, ARM_FEATURE_V6K);
 set_feature(&cpu->env, ARM_FEATURE_VFP);
 set_feature(&cpu->env, ARM_FEATURE_VAPA);
@@ -516,6 +536,8 @@ static const ARMCPRegInfo cortexa8_cp_reginfo[] = {
 static void cortex_a8_initfn(Object *obj)
 {
 ARMCPU *cpu = ARM_CPU(obj);
+
+cpu->dtb_compatible = "arm,cortex-a8";
 set_feature(&cpu->env, ARM_FEATURE_V7);
 set_feature(&cpu->env, ARM_FEATURE_VFP3);
 set_feature(&cpu->env, ARM_FEATURE_NEON);
@@ -580,6 +602,8 @@ static const ARMCPRegInfo cortexa9_cp_reginfo[] = {
 static void cortex_a9_initfn(Object *obj)
 {
 ARMCPU *cpu = ARM_CPU(obj);
+
+cpu->dtb_compatible = "arm,cortex-a9";
 set_feature(&cpu->env, ARM_FEATURE_V7);
 set_feature(&cpu->env, ARM_FEATURE_VFP3);
 set_feature(&cpu->env, ARM_FEATURE_VFP_FP16);
@@ -649,6 +673,8 @@ static const ARMCPRegInfo cortexa15_cp_reginfo[] = {
 static void cortex_a15_initfn(Object *obj)
 {
 ARMCPU *cpu = ARM_CPU(obj);
+
+cpu->dtb_compatible = "arm,cortex-a15";
 set_feature(&cpu->env, ARM_FEATURE_V7);
 set_feature(

[Qemu-devel] [PATCH v9 08/11] linux-headers: Update from mainline

2013-11-22 Thread Peter Maydell

Update Linux KVM headers from mainline commit 5d6e63323fe779.

Signed-off-by: Peter Maydell 
---
 linux-headers/asm-arm/kvm.h  |3 +-
 linux-headers/asm-powerpc/epapr_hcalls.h |4 +-
 linux-headers/asm-powerpc/kvm.h  |   86 --
 linux-headers/asm-x86/hyperv.h   |   19 +++
 linux-headers/asm-x86/kvm.h  |6 +--
 linux-headers/linux/kvm.h|   11 
 6 files changed, 119 insertions(+), 10 deletions(-)

diff --git a/linux-headers/asm-arm/kvm.h b/linux-headers/asm-arm/kvm.h
index c1ee007..c498b60 100644
--- a/linux-headers/asm-arm/kvm.h
+++ b/linux-headers/asm-arm/kvm.h
@@ -63,7 +63,8 @@ struct kvm_regs {
 
 /* Supported Processor Types */
 #define KVM_ARM_TARGET_CORTEX_A15  0
-#define KVM_ARM_NUM_TARGETS1
+#define KVM_ARM_TARGET_CORTEX_A7   1
+#define KVM_ARM_NUM_TARGETS2
 
 /* KVM_ARM_SET_DEVICE_ADDR ioctl id encoding */
 #define KVM_ARM_DEVICE_TYPE_SHIFT  0
diff --git a/linux-headers/asm-powerpc/epapr_hcalls.h 
b/linux-headers/asm-powerpc/epapr_hcalls.h
index 33b3f89..06f7247 100644
--- a/linux-headers/asm-powerpc/epapr_hcalls.h
+++ b/linux-headers/asm-powerpc/epapr_hcalls.h
@@ -78,7 +78,7 @@
 #define EV_SUCCESS 0
 #define EV_EPERM   1   /* Operation not permitted */
 #define EV_ENOENT  2   /*  Entry Not Found */
-#define EV_EIO 3   /* I/O error occurred */
+#define EV_EIO 3   /* I/O error occured */
 #define EV_EAGAIN  4   /* The operation had insufficient
 * resources to complete and should be
 * retried
@@ -89,7 +89,7 @@
 #define EV_ENODEV  7   /* No such device */
 #define EV_EINVAL  8   /* An argument supplied to the hcall
   was out of range or invalid */
-#define EV_INTERNAL9   /* An internal error occurred */
+#define EV_INTERNAL9   /* An internal error occured */
 #define EV_CONFIG  10  /* A configuration error was detected */
 #define EV_INVALID_STATE   11  /* The object is in an invalid state */
 #define EV_UNIMPLEMENTED   12  /* Unimplemented hypercall */
diff --git a/linux-headers/asm-powerpc/kvm.h b/linux-headers/asm-powerpc/kvm.h
index 0fb1a6e..6836ec7 100644
--- a/linux-headers/asm-powerpc/kvm.h
+++ b/linux-headers/asm-powerpc/kvm.h
@@ -27,6 +27,7 @@
 #define __KVM_HAVE_PPC_SMT
 #define __KVM_HAVE_IRQCHIP
 #define __KVM_HAVE_IRQ_LINE
+#define __KVM_HAVE_GUEST_DEBUG
 
 struct kvm_regs {
__u64 pc;
@@ -269,7 +270,24 @@ struct kvm_fpu {
__u64 fpr[32];
 };
 
+/*
+ * Defines for h/w breakpoint, watchpoint (read, write or both) and
+ * software breakpoint.
+ * These are used as "type" in KVM_SET_GUEST_DEBUG ioctl and "status"
+ * for KVM_DEBUG_EXIT.
+ */
+#define KVMPPC_DEBUG_NONE  0x0
+#define KVMPPC_DEBUG_BREAKPOINT(1UL << 1)
+#define KVMPPC_DEBUG_WATCH_WRITE   (1UL << 2)
+#define KVMPPC_DEBUG_WATCH_READ(1UL << 3)
 struct kvm_debug_exit_arch {
+   __u64 address;
+   /*
+* exiting to userspace because of h/w breakpoint, watchpoint
+* (read, write or both) and software breakpoint.
+*/
+   __u32 status;
+   __u32 reserved;
 };
 
 /* for KVM_SET_GUEST_DEBUG */
@@ -281,10 +299,6 @@ struct kvm_guest_debug_arch {
 * Type denotes h/w breakpoint, read watchpoint, write
 * watchpoint or watchpoint (both read and write).
 */
-#define KVMPPC_DEBUG_NONE  0x0
-#define KVMPPC_DEBUG_BREAKPOINT(1UL << 1)
-#define KVMPPC_DEBUG_WATCH_WRITE   (1UL << 2)
-#define KVMPPC_DEBUG_WATCH_READ(1UL << 3)
__u32 type;
__u32 reserved;
} bp[16];
@@ -429,6 +443,11 @@ struct kvm_get_htab_header {
 #define KVM_REG_PPC_MMCR0  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x10)
 #define KVM_REG_PPC_MMCR1  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x11)
 #define KVM_REG_PPC_MMCRA  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x12)
+#define KVM_REG_PPC_MMCR2  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x13)
+#define KVM_REG_PPC_MMCRS  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x14)
+#define KVM_REG_PPC_SIAR   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x15)
+#define KVM_REG_PPC_SDAR   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x16)
+#define KVM_REG_PPC_SIER   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x17)
 
 #define KVM_REG_PPC_PMC1   (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x18)
 #define KVM_REG_PPC_PMC2   (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x19)
@@ -499,6 +518,65 @@ struct kvm_get_htab_header {
 #define KVM_REG_PPC_TLB3PS (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x9a)
 #define KVM_REG_PPC_EPTCFG (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x9b)
 
+/* Timebase offset */
+#define KVM_REG_PPC_TB_OFFSET  (KVM_R

[Qemu-devel] [PATCH v9 03/11] hw/arm/boot: Allow boards to provide an fdt blob

2013-11-22 Thread Peter Maydell

From: John Rigby 

If no fdt is provided on command line and the new field
get_dtb in struct arm_boot_info is set then call it to
get a device tree blob.

Signed-off-by: John Rigby 
[PMM: minor tweaks and cleanup]
Signed-off-by: Peter Maydell 
---
 hw/arm/boot.c|   32 
 include/hw/arm/arm.h |7 +++
 2 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index 583ec79..55d552f 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -228,23 +228,31 @@ static void set_kernel_args_old(const struct 
arm_boot_info *info)
 static int load_dtb(hwaddr addr, const struct arm_boot_info *binfo)
 {
 void *fdt = NULL;
-char *filename;
 int size, rc;
 uint32_t acells, scells;
 
-filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, binfo->dtb_filename);
-if (!filename) {
-fprintf(stderr, "Couldn't open dtb file %s\n", binfo->dtb_filename);
-goto fail;
-}
+if (binfo->dtb_filename) {
+char *filename;
+filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, binfo->dtb_filename);
+if (!filename) {
+fprintf(stderr, "Couldn't open dtb file %s\n", 
binfo->dtb_filename);
+goto fail;
+}
 
-fdt = load_device_tree(filename, &size);
-if (!fdt) {
-fprintf(stderr, "Couldn't open dtb file %s\n", filename);
+fdt = load_device_tree(filename, &size);
+if (!fdt) {
+fprintf(stderr, "Couldn't open dtb file %s\n", filename);
+g_free(filename);
+goto fail;
+}
 g_free(filename);
-goto fail;
+} else if (binfo->get_dtb) {
+fdt = binfo->get_dtb(binfo, &size);
+if (!fdt) {
+fprintf(stderr, "Board was unable to create a dtb blob\n");
+goto fail;
+}
 }
-g_free(filename);
 
 acells = qemu_devtree_getprop_cell(fdt, "/", "#address-cells");
 scells = qemu_devtree_getprop_cell(fdt, "/", "#size-cells");
@@ -438,7 +446,7 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info 
*info)
 /* for device tree boot, we pass the DTB directly in r2. Otherwise
  * we point to the kernel args.
  */
-if (info->dtb_filename) {
+if (info->dtb_filename || info->get_dtb) {
 /* Place the DTB after the initrd in memory. Note that some
  * kernels will trash anything in the 4K page the initrd
  * ends in, so make sure the DTB isn't caught up in that.
diff --git a/include/hw/arm/arm.h b/include/hw/arm/arm.h
index ecbbba8..cbbf4ca 100644
--- a/include/hw/arm/arm.h
+++ b/include/hw/arm/arm.h
@@ -50,6 +50,13 @@ struct arm_boot_info {
  const struct arm_boot_info *info);
 void (*secondary_cpu_reset_hook)(ARMCPU *cpu,
  const struct arm_boot_info *info);
+/* if a board is able to create a dtb without a dtb file then it
+ * sets get_dtb. This will only be used if no dtb file is provided
+ * by the user. On success, sets *size to the length of the created
+ * dtb, and returns a pointer to it. (The caller must free this memory
+ * with g_free() when it has finished with it.) On failure, returns NULL.
+ */
+void *(*get_dtb)(const struct arm_boot_info *info, int *size);
 /* if a board needs to be able to modify a device tree provided by
  * the user it should implement this hook.
  */
-- 
1.7.9.5

[Qemu-devel] [PATCH v9 09/11] target-arm: Don't hardcode KVM target CPU to be A15

2013-11-22 Thread Peter Maydell

Instead of assuming that a KVM target CPU must always be a
Cortex-A15 and hardcoding this in kvm_arch_init_vcpu(),
store the KVM_ARM_TARGET_* value in the ARMCPU class,
and use that.

Signed-off-by: Peter Maydell 
---
 target-arm/cpu-qom.h|5 +
 target-arm/cpu.c|2 ++
 target-arm/kvm-consts.h |   11 +++
 target-arm/kvm.c|7 ++-
 4 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/target-arm/cpu-qom.h b/target-arm/cpu-qom.h
index 8bd3e36..f32178a 100644
--- a/target-arm/cpu-qom.h
+++ b/target-arm/cpu-qom.h
@@ -97,6 +97,11 @@ typedef struct ARMCPU {
 /* Should CPU start in PSCI powered-off state? */
 bool start_powered_off;
 
+/* [QEMU_]KVM_ARM_TARGET_* constant for this CPU, or
+ * QEMU_KVM_ARM_TARGET_NONE if the kernel doesn't support this CPU type.
+ */
+uint32_t kvm_target;
+
 /* The instance init functions for implementation-specific subclasses
  * set these fields to specify the implementation-dependent values of
  * various constant registers and reset values of non-constant
diff --git a/target-arm/cpu.c b/target-arm/cpu.c
index 0325815..0635e78 100644
--- a/target-arm/cpu.c
+++ b/target-arm/cpu.c
@@ -223,6 +223,7 @@ static void arm_cpu_initfn(Object *obj)
  * picky DTB consumer will also provide a helpful error message.
  */
 cpu->dtb_compatible = "qemu,unknown";
+cpu->kvm_target = QEMU_KVM_ARM_TARGET_NONE;
 
 if (tcg_enabled() && !inited) {
 inited = true;
@@ -685,6 +686,7 @@ static void cortex_a15_initfn(Object *obj)
 set_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER);
 set_feature(&cpu->env, ARM_FEATURE_DUMMY_C15_REGS);
 set_feature(&cpu->env, ARM_FEATURE_LPAE);
+cpu->kvm_target = QEMU_KVM_ARM_TARGET_CORTEX_A15;
 cpu->midr = 0x412fc0f1;
 cpu->reset_fpsid = 0x410430f0;
 cpu->mvfr0 = 0x10110222;
diff --git a/target-arm/kvm-consts.h b/target-arm/kvm-consts.h
index 4062f11..2bba0bd 100644
--- a/target-arm/kvm-consts.h
+++ b/target-arm/kvm-consts.h
@@ -48,6 +48,17 @@ MISMATCH_CHECK(PSCI_FN_CPU_OFF, KVM_PSCI_FN_CPU_OFF)
 MISMATCH_CHECK(PSCI_FN_CPU_ON, KVM_PSCI_FN_CPU_ON)
 MISMATCH_CHECK(PSCI_FN_MIGRATE, KVM_PSCI_FN_MIGRATE)
 
+#define QEMU_KVM_ARM_TARGET_CORTEX_A15 0
+
+/* There's no kernel define for this: sentinel value which
+ * matches no KVM target value for either 64 or 32 bit
+ */
+#define QEMU_KVM_ARM_TARGET_NONE UINT_MAX
+
+#ifndef TARGET_AARCH64
+MISMATCH_CHECK(QEMU_KVM_ARM_TARGET_CORTEX_A15, KVM_ARM_TARGET_CORTEX_A15)
+#endif
+
 #undef MISMATCH_CHECK
 
 #endif
diff --git a/target-arm/kvm.c b/target-arm/kvm.c
index 80c58c5..182db85 100644
--- a/target-arm/kvm.c
+++ b/target-arm/kvm.c
@@ -77,7 +77,12 @@ int kvm_arch_init_vcpu(CPUState *cs)
 struct kvm_reg_list *rlp;
 ARMCPU *cpu = ARM_CPU(cs);
 
-init.target = KVM_ARM_TARGET_CORTEX_A15;
+if (cpu->kvm_target == QEMU_KVM_ARM_TARGET_NONE) {
+fprintf(stderr, "KVM is not supported for this guest CPU type\n");
+return -EINVAL;
+}
+
+init.target = cpu->kvm_target;
 memset(init.features, 0, sizeof(init.features));
 if (cpu->start_powered_off) {
 init.features[0] = 1 << KVM_ARM_VCPU_POWER_OFF;
-- 
1.7.9.5

[Qemu-devel] [PATCH v9 00/11] target-arm: mach virt and -cpu host support

2013-11-22 Thread Peter Maydell

This patchset combines the 'virt' machine definition and
-cpu host support patchsets I've posted previous versions
of. I think these are now ready to go in once 1.8 opens
up; review appreciated.

Changes since previous versions:
 * added in -cpu host patchset, since it really only makes
   sense with mach-virt
 * rearranged virt address space a bit to allow space
   for a reasonable sized boot flash device and possible
   PCI window if a PCI controller model ever becomes possible
 * fixed bug where we weren't starting secondary CPUs in
   "PSCI power-down" state (by adding a CPU property which
   lets the board specify that the CPU starts powered down)
   [thanks to Giridhar Maruthy for tracking this down]
 * includes a KVM header update against current mainline
 * includes (previously RFC'd) patch which provides defines
   of some kernel KVM constants we can use even if not CONFIG_KVM
 * put the KVM CPU type into the CPU object rather than having
   a silly 'QOM object type => constant' lookup table
 * put dtb 'compatible' string into CPU object rather than
   having the board model need to know about it
 * abstracted out the "create a scratch vcpu for querying
   for capabilities" code; this is currently only called in
   one place, but the abstraction will be needed for 64 bit
   KVM ARM support

Mostly this is cleanup and streamlining which I noticed
was either possible or necessary in the course of putting
64 bit KVM ARM control on top of this patchset.


The kernel patch to get the PL011 to work is still needed
(Christoffer tells me he's working on getting a proper fix
done for this):

diff --git a/arch/arm/mach-virt/virt.c b/arch/arm/mach-virt/virt.c
index b184e57..2b6aceb 100644
--- a/arch/arm/mach-virt/virt.c
+++ b/arch/arm/mach-virt/virt.c
@@ -21,11 +21,13 @@
 #include 
 #include 
 #include 
+#include 

 #include 

 static void __init virt_init(void)
 {
+   of_clk_init(NULL);
of_platform_populate(NULL, of_default_bus_match_table, NULL, NULL);
 }


John Rigby (1):
  hw/arm/boot: Allow boards to provide an fdt blob

Peter Maydell (10):
  target-arm: Provide mechanism for getting KVM constants even if not
CONFIG_KVM
  device_tree.c: Terminate the empty reservemap in create_device_tree()
  target-arm: Provide PSCI constants to generic QEMU code
  target-arm: Add ARMCPU field for Linux device-tree 'compatible'
string
  target-arm: Allow secondary KVM CPUs to be booted via PSCI
  hw/arm: Add 'virt' platform
  linux-headers: Update from mainline
  target-arm: Don't hardcode KVM target CPU to be A15
  target-arm: Provide '-cpu host' when running KVM
  hw/arm/virt: Support -cpu host

 device_tree.c|4 +
 hw/arm/Makefile.objs |2 +-
 hw/arm/boot.c|   32 ++-
 hw/arm/virt.c|  452 ++
 include/hw/arm/arm.h |7 +
 linux-headers/asm-arm/kvm.h  |3 +-
 linux-headers/asm-powerpc/epapr_hcalls.h |4 +-
 linux-headers/asm-powerpc/kvm.h  |   86 +-
 linux-headers/asm-x86/hyperv.h   |   19 ++
 linux-headers/asm-x86/kvm.h  |6 +-
 linux-headers/linux/kvm.h|   11 +
 target-arm/cpu-qom.h |   11 +
 target-arm/cpu.c |   59 
 target-arm/cpu.h |   13 +-
 target-arm/helper.c  |6 +
 target-arm/kvm-consts.h  |   64 +
 target-arm/kvm.c |  243 +++-
 target-arm/kvm_arm.h |   55 
 18 files changed, 1033 insertions(+), 44 deletions(-)
 create mode 100644 hw/arm/virt.c
 create mode 100644 target-arm/kvm-consts.h

-- 
1.7.9.5

[Qemu-devel] [PATCH v9 10/11] target-arm: Provide '-cpu host' when running KVM

2013-11-22 Thread Peter Maydell

Implement '-cpu host' for ARM when we're using KVM, broadly
in line with other KVM-supporting architectures.

Signed-off-by: Peter Maydell 
---
 target-arm/helper.c  |6 ++
 target-arm/kvm.c |  224 ++
 target-arm/kvm_arm.h |   55 +
 3 files changed, 285 insertions(+)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index 3445813..263dbbf 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -1842,6 +1842,12 @@ void arm_cpu_list(FILE *f, fprintf_function cpu_fprintf)
 (*cpu_fprintf)(f, "Available CPUs:\n");
 g_slist_foreach(list, arm_cpu_list_entry, &s);
 g_slist_free(list);
+#ifdef CONFIG_KVM
+/* The 'host' CPU type is dynamically registered only if KVM is
+ * enabled, so we have to special-case it here:
+ */
+(*cpu_fprintf)(f, "  host (only available in KVM mode)\n");
+#endif
 }
 
 static void arm_cpu_add_definition(gpointer data, gpointer user_data)
diff --git a/target-arm/kvm.c b/target-arm/kvm.c
index 182db85..f865dac 100644
--- a/target-arm/kvm.c
+++ b/target-arm/kvm.c
@@ -27,12 +27,236 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = 
{
 KVM_CAP_LAST_INFO
 };
 
+bool kvm_arm_create_scratch_host_vcpu(const uint32_t *cpus_to_try,
+  int *fdarray,
+  struct kvm_vcpu_init *init)
+{
+int ret, kvmfd = -1, vmfd = -1, cpufd = -1;
+
+kvmfd = qemu_open("/dev/kvm", O_RDWR);
+if (kvmfd < 0) {
+goto err;
+}
+vmfd = ioctl(kvmfd, KVM_CREATE_VM, 0);
+if (vmfd < 0) {
+goto err;
+}
+cpufd = ioctl(vmfd, KVM_CREATE_VCPU, 0);
+if (cpufd < 0) {
+goto err;
+}
+
+ret = ioctl(vmfd, KVM_ARM_PREFERRED_TARGET, init);
+if (ret >= 0) {
+ret = ioctl(cpufd, KVM_ARM_VCPU_INIT, init);
+if (ret < 0) {
+goto err;
+}
+} else {
+/* Old kernel which doesn't know about the
+ * PREFERRED_TARGET ioctl: we know it will only support
+ * creating one kind of guest CPU which is its preferred
+ * CPU type.
+ */
+while (*cpus_to_try != QEMU_KVM_ARM_TARGET_NONE) {
+init->target = *cpus_to_try++;
+memset(init->features, 0, sizeof(init->features));
+ret = ioctl(cpufd, KVM_ARM_VCPU_INIT, init);
+if (ret >= 0) {
+break;
+}
+}
+if (ret < 0) {
+goto err;
+}
+}
+
+fdarray[0] = kvmfd;
+fdarray[1] = vmfd;
+fdarray[2] = cpufd;
+
+return true;
+
+err:
+if (cpufd >= 0) {
+close(cpufd);
+}
+if (vmfd >= 0) {
+close(vmfd);
+}
+if (kvmfd >= 0) {
+close(kvmfd);
+}
+
+return false;
+}
+
+void kvm_arm_destroy_scratch_host_vcpu(int *fdarray)
+{
+int i;
+
+for (i = 2; i >= 0; i--) {
+close(fdarray[i]);
+}
+}
+
+static inline void set_feature(uint64_t *features, int feature)
+{
+*features |= 1ULL << feature;
+}
+
+bool kvm_arm_get_host_cpu_features(ARMHostCPUClass *ahcc)
+{
+/* Identify the feature bits corresponding to the host CPU, and
+ * fill out the ARMHostCPUClass fields accordingly. To do this
+ * we have to create a scratch VM, create a single CPU inside it,
+ * and then query that CPU for the relevant ID registers.
+ */
+int i, ret, fdarray[3];
+uint32_t midr, id_pfr0, id_isar0, mvfr1;
+uint64_t features = 0;
+/* Old kernels may not know about the PREFERRED_TARGET ioctl: however
+ * we know these will only support creating one kind of guest CPU,
+ * which is its preferred CPU type.
+ */
+static const uint32_t cpus_to_try[] = {
+QEMU_KVM_ARM_TARGET_CORTEX_A15,
+QEMU_KVM_ARM_TARGET_NONE
+};
+struct kvm_vcpu_init init;
+struct kvm_one_reg idregs[] = {
+{
+.id = KVM_REG_ARM | KVM_REG_SIZE_U32
+| ENCODE_CP_REG(15, 0, 0, 0, 0, 0),
+.addr = (uintptr_t)&midr,
+},
+{
+.id = KVM_REG_ARM | KVM_REG_SIZE_U32
+| ENCODE_CP_REG(15, 0, 0, 1, 0, 0),
+.addr = (uintptr_t)&id_pfr0,
+},
+{
+.id = KVM_REG_ARM | KVM_REG_SIZE_U32
+| ENCODE_CP_REG(15, 0, 0, 2, 0, 0),
+.addr = (uintptr_t)&id_isar0,
+},
+{
+.id = KVM_REG_ARM | KVM_REG_SIZE_U32
+| KVM_REG_ARM_VFP | KVM_REG_ARM_VFP_MVFR1,
+.addr = (uintptr_t)&mvfr1,
+},
+};
+
+if (!kvm_arm_create_scratch_host_vcpu(cpus_to_try, fdarray, &init)) {
+return false;
+}
+
+ahcc->target = init.target;
+
+/* This is not strictly blessed by the device tree binding docs yet,
+ * but in practice the kernel does not care about this string so
+ * there is no point maintaining an KVM_ARM_TARGET_* -> string table.
+ */
+ahcc->dtb_compatible = "arm,arm-v7";
+
+for (i = 0;

[Qemu-devel] [PATCH v9 02/11] device_tree.c: Terminate the empty reservemap in create_device_tree()

2013-11-22 Thread Peter Maydell

Device trees created with create_device_tree() may not have any
entries in their reservemap, because the FDT API requires that the
reservemap is completed before any FDT nodes are added, and
create_device_tree() itself creates a node.  However we were not
calling fdt_finish_reservemap(), which meant that there was no
terminator in the reservemap list and whatever happened to be at the
start of the FDT data section would end up being interpreted as
reservemap entries.  Avoid this by calling fdt_finish_reservemap()
to add the terminator.

Signed-off-by: Peter Maydell 
Acked-by: Alexander Graf 
---
 device_tree.c |4 
 1 file changed, 4 insertions(+)

diff --git a/device_tree.c b/device_tree.c
index ffec99a..391da8c 100644
--- a/device_tree.c
+++ b/device_tree.c
@@ -41,6 +41,10 @@ void *create_device_tree(int *sizep)
 if (ret < 0) {
 goto fail;
 }
+ret = fdt_finish_reservemap(fdt);
+if (ret < 0) {
+goto fail;
+}
 ret = fdt_begin_node(fdt, "");
 if (ret < 0) {
 goto fail;
-- 
1.7.9.5

Re: [Qemu-devel] [PATCH 13/27] acpi: memory hotplug ACPI hardware implementation

2013-11-22 Thread Igor Mammedov

On Thu, 21 Nov 2013 16:38:47 +0200
"Michael S. Tsirkin"  wrote:

> On Thu, Nov 21, 2013 at 03:21:37PM +0100, Igor Mammedov wrote:
> > On Thu, 21 Nov 2013 11:42:02 +0200
> > "Michael S. Tsirkin"  wrote:
[...]
> > > > +  [0x14] highest memory hot-plug interface version supported by 
> > > > QEMU
> > > 
> > > So this can make guest fail gracefully but it appears that
> > > detecting guest version would be nicer?
> > > It would let us actually support old guests ...
> > 
> > my idea of how to it was,
> > guest writes its version into [0x14] register and reads QEMU version
> > from it back, if they do not match then than BIOS ignores GPE.3 event
> > effectively disabling hotplug on guest side.
> > I haven't thought about supporting multiple implementations in QEMU though.
> > Do we really want it?
> 
> I'm talking about old bios which does not read acpi from qemu.
> We want it to work even if it can't see hotplugged memory.
to sum up cases that were discussed on IRC.

 * migration from NEW to OLD machine leads us to state when we run new BIOS with
   ACPI tables supporting memory hotplug on OLD machine.
   In general should work since NEW machine has to be started without memory
   hotplug enabled, which leads to disabling related ASL handler in ACPI tables.
   Case where NEW machine with memory hotplug enabled migrating to OLD machine
   without memory hotplug is invalid since migration will fail early due to
   missing devices (DimmBus/DIMM devices)

 * running OLD BIOS on new machine with memory hotplug turned on at QEMU CLI
qemu -m X,slots=Y,maxmem=Z -bios old-bios-image -M pc-1.8

   Problem here is that OLD BIOS doesn't know about 'etc/reserved-memory-end' 
fwcfg,
   so it can start mapping PCI BARs right after high memory (i.e. in memory 
hotplug
   reserved space)

   That's fine as far as there is no DIMM devices since access will fall through
   hotplug-mem container to PCI address space.

   But if DIMM device added at startup or runtime, its MemoryRegion will 
overshadow
   PCI BARs mapped at its range.

   If it were only hotadd case, then QEMU could start with enabled but not 
active
   memory hotplug and require BIOS to acknowledge (activate) hotlug so that 
adding
   DIMM devices wouldn't be possible without hotplug enabled BIOS. 
   That would guarantee empty hotplug-mem container for OLD BIOS since it can't
   activate hotplug and avoid conflict.

   But if there are cold-plugged DIMM devices on QEMU CLI, that wouldn't work
   because OLD BIOS has no idea about them (i.e. it reads above4gb memory CMOS 
value)
   and it is bound to map 64-bit BARs in invalid place.

   It would be nice to run OLD BIOS on new machine and it will even work IF 
memory
   hotplug is not enabled.
   But I'm not sure we should care about case when not compatible BIOS is 
running
   on machine with active memory hotplug.

> 
> > > 
> > > > +  [0x15] Memory device status fields
> > > > +  bits:
> > > > +  1: device is enabled and may be used by guest
> > > > +  2: device insert event, used by ACPI BIOS to distinguish
> > > > + device for which no device check event to OSPM was 
> > > > issued
> > > 
> > > what does the above mean?
> > After OSPM issued device check on selected device it clears this bit to 
> > mark event
> > as handled.
> > It allows to avoid keeping this state in ASL (as it's done for CPU hotplug, 
> > see CPON)
> 
> That's fine.
> 
> > > what if device is not present?
> > ASL will issue device check and clear bit, it might be a bug since _STA 
> > would report
> > not present but no eject event was issued.
> > 
> > Papering over it ASL could check present bit first and issue device check 
> > only if
> > it's present.
> 
> Is this a problem? If yes - that will still be racy won't it?
I thought about it some more and I don't see why it would be racy.
bit 0x15.1 is set when there is initialized DIMM device in slot so 0x15.2 
couldn't
be set if 0x15.1 == 0.

May be following description would be better:

   2: device insert event, used by ACPI BIOS to distinguish
  device for which no device check event to OSPM was issued.
  Bit is valid only when 0x15.1 bit is set.

It's possible to remove is_inserting bit altogether and just send device check 
to
each present memory device. Device check will be NOP for memory devices that 
OSPM
processed.
That will work fine with small amount of slots but would increase load on guest
in case of 1000s devices. That's the main reason for dedicated is_inserting bit,
allowing to handle hotadd effectively.

> 
> 
> Also, should guest reset eject memory that we requested unplug for?
I'm not sure what you saying here in general.
In particular unplug is not part of this series.

[...]
> > > selected?
> > see "write access: [0x0-0x3]"
> 
> yes but you have a typo above
Ahh, blind me. Thanks, I'll fix it.

> 
> > > 
> > > How about we actually require guest to enable memory?
> > > 
> > > This way i

Re: [Qemu-devel] [PATCH v3 for-1.7 resend] rdma: rename 'x-rdma' => 'rdma'

2013-11-22 Thread Michael R. Hines


On 11/23/2013 12:37 AM, Daniel P. Berrange wrote:

On Sat, Nov 23, 2013 at 12:29:51AM +0800, mrhi...@linux.vnet.ibm.com wrote:

From: "Michael R. Hines" 

As far as we can tell, all known bugs have been fixed:
3. Libvirt patches are ready

Please stop claiming this. A proof of concept was posted and got some
review feedback. AFAIK, no followup has been posted to actually finish
the job, nor deal with the new questions around cgroups ACL management
raised last week on qemu-devel

Regards,
Daniel


My apologies. I have them in my laptop, but haven't sent them out yet.

I'm currently in the process of moving my family from one country to
another am very busy resettling, so I was trying to get the most
urgent patch out first before cleaning up the libvirt patch on my laptop.

- Michael

Re: [Qemu-devel] GIT master fails compilation for ACPI

2013-11-22 Thread Paolo Bonzini

Il 22/11/2013 12:16, Erik Rull ha scritto:
> It's getting more and more complex to build qemu, is there a reason why 
> everyone
> needs to build the acpi stuff by himself?

It is only attempted if iasl is installed but as you said below, your
version is too old.  Please run "make V=1" so that we can see what is
the problem.

> It should be something static like the bios binary files, right?

ACPI tables are now generated by QEMU, so the ACPI compilation step
happens while compiling QEMU.

> So you could provide the defaults directly and everyone that wants to modify 
> the
> defaults is free to compile it by himself.
> 
> And if these tools are required, please add an error message at configure
> runtime so that the successive errors at runtime will not appear, because
> compiling is then blocked by configure. And if the IASL is too old, a version
> check at configure runtime would be helpful as well.

Good idea.  Any chance you could help?  Version 20090123 should be new
enough.

Paolo

Re: [Qemu-devel] [PATCH v4 0/7] Drop in_use from BlockDriverState and enable point-in-time snapshot exporting over NBD

2013-11-22 Thread Stefan Hajnoczi

On Fri, Nov 22, 2013 at 01:24:47PM +0800, Fam Zheng wrote:
> This series adds for point-in-time snapshot NBD exporting based on
> blockdev-backup (variant of drive-backup with existing device as target).
> 
> We get a thin point-in-time snapshot by COW mechanism of drive-backup, and
> export it through built in NBD server. The steps are as below:
> 
>  1. (SHELL) qemu-img create -f qcow2 BACKUP.qcow2 
> 
> (Alternatively we can use -o backing_file=RUNNING-VM.img to omit 
> explicitly
> providing the size by ourselves, but it's risky because RUNNING-VM.qcow2 
> is
> used r/w by guest. Whether or not setting backing file in the image file
> doesn't matter, as we are going to override the backing hd in the next
> step)
> 
>  2. (QMP) blockdev-add backing=source-drive file.driver=file 
> file.filename=BACKUP.qcow2 id=target0 if=none driver=qcow2
> 
> (where ide0-hd0 is the running BlockDriverState name for
> RUNNING-VM.img. This patch implements "backing=" option to override
> backing_hd for added drive)
> 
>  3. (QMP) blockdev-backup device=source-drive sync=none target=target0
> 
> (this is the QMP command introduced by this series, which use a named
> device as target of drive-backup)
> 
>  4. (QMP) nbd-server-add device=target0
> 
> When image fleecing done:
> 
>  1. (QMP) block-job-complete device=ide0-hd0
> 
>  2. (HMP) drive_del target0
> 
>  3. (SHELL) rm BACKUP.qcow2

Interesting implementation, it looks pretty good.  I'll need to review
it a second time to track all the operation block/unblocks.  It wasn't
immediately clear to me whether these patches will restrict something
that used to work.

Stefan

Re: [Qemu-devel] [PATCH v3 for-1.7] rdma: rename 'x-rdma' => 'rdma'

2013-11-22 Thread Paolo Bonzini

Il 16/11/2013 11:32, Daniel P. Berrange ha scritto:
> There's also an SELinux question to deal with there. If multiple QEMUs
> need concurrent access we can't do a selective grant of the device just
> when migration is running - we would have to give all QEMU's access
> all the time.  This would be a case where doing FD passing of the
> pre-opened devices might be a better option. It depends on what the
> downsides are to giving QEMU access to the devices unconditionally.

I think unconditional SELinux access + conditional cgroups access would
work best here.

How did Gluster deal with the same problem (for the gluster+rdma:// URI
scheme)?  I guess no one bothered to mention it when the Gluster patches
were committed, but it should be the same.   It would also be the same
for userspace iSCSI if libiscsi were to grow support for iSER (iSCSI
extensions for RDMA).

Paolo

Re: [Qemu-devel] [PATCH v3 for-1.7 resend] rdma: rename 'x-rdma' => 'rdma'

2013-11-22 Thread Daniel P. Berrange

On Sat, Nov 23, 2013 at 12:29:51AM +0800, mrhi...@linux.vnet.ibm.com wrote:
> From: "Michael R. Hines" 
> 
> As far as we can tell, all known bugs have been fixed:

> 3. Libvirt patches are ready

Please stop claiming this. A proof of concept was posted and got some
review feedback. AFAIK, no followup has been posted to actually finish
the job, nor deal with the new questions around cgroups ACL management
raised last week on qemu-devel

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

[Qemu-devel] [PATCH v3 for-1.7 resend] rdma: rename 'x-rdma' => 'rdma'

2013-11-22 Thread mrhines

From: "Michael R. Hines" 

As far as we can tell, all known bugs have been fixed:

1. Parallel RDMA migrations are working
2. IPv6 migration is working
3. Libvirt patches are ready
4. virt-test is working

Objections?

Reviewed-by: Eric Blake 
Signed-off-by: Michael R. Hines 
---
 docs/rdma.txt|   24 ++--
 migration-rdma.c |2 +-
 migration.c  |6 +++---
 qapi-schema.json |7 +++
 4 files changed, 17 insertions(+), 22 deletions(-)

diff --git a/docs/rdma.txt b/docs/rdma.txt
index 8d1e003..c559f9a 100644
--- a/docs/rdma.txt
+++ b/docs/rdma.txt
@@ -66,7 +66,7 @@ bulk-phase round of the migration and can be enabled for 
extremely
 high-performance RDMA hardware using the following command:
 
 QEMU Monitor Command:
-$ migrate_set_capability x-rdma-pin-all on # disabled by default
+$ migrate_set_capability rdma-pin-all on # disabled by default
 
 Performing this action will cause all 8GB to be pinned, so if that's
 not what you want, then please ignore this step altogether.
@@ -93,12 +93,12 @@ $ migrate_set_speed 40g # or whatever is the MAX of your 
RDMA device
 
 Next, on the destination machine, add the following to the QEMU command line:
 
-qemu . -incoming x-rdma:host:port
+qemu . -incoming rdma:host:port
 
 Finally, perform the actual migration on the source machine:
 
 QEMU Monitor Command:
-$ migrate -d x-rdma:host:port
+$ migrate -d rdma:host:port
 
 PERFORMANCE
 ===
@@ -120,8 +120,8 @@ For example, in the same 8GB RAM example with all 8GB of 
memory in
 active use and the VM itself is completely idle using the same 40 gbps
 infiniband link:
 
-1. x-rdma-pin-all disabled total time: approximately 7.5 seconds @ 9.5 Gbps
-2. x-rdma-pin-all enabled total time: approximately 4 seconds @ 26 Gbps
+1. rdma-pin-all disabled total time: approximately 7.5 seconds @ 9.5 Gbps
+2. rdma-pin-all enabled total time: approximately 4 seconds @ 26 Gbps
 
 These numbers would of course scale up to whatever size virtual machine
 you have to migrate using RDMA.
@@ -407,18 +407,14 @@ socket is broken during a non-RDMA based migration.
 
 TODO:
 =
-1. 'migrate x-rdma:host:port' and '-incoming x-rdma' options will be
-   renamed to 'rdma' after the experimental phase of this work has
-   completed upstream.
-2. Currently, 'ulimit -l' mlock() limits as well as cgroups swap limits
+1. Currently, 'ulimit -l' mlock() limits as well as cgroups swap limits
are not compatible with infinband memory pinning and will result in
an aborted migration (but with the source VM left unaffected).
-3. Use of the recent /proc//pagemap would likely speed up
+2. Use of the recent /proc//pagemap would likely speed up
the use of KSM and ballooning while using RDMA.
-4. Also, some form of balloon-device usage tracking would also
+3. Also, some form of balloon-device usage tracking would also
help alleviate some issues.
-5. Move UNREGISTER requests to a separate thread.
-6. Use LRU to provide more fine-grained direction of UNREGISTER
+4. Use LRU to provide more fine-grained direction of UNREGISTER
requests for unpinning memory in an overcommitted environment.
-7. Expose UNREGISTER support to the user by way of workload-specific
+5. Expose UNREGISTER support to the user by way of workload-specific
hints about application behavior.
diff --git a/migration-rdma.c b/migration-rdma.c
index f94f3b4..eeb4302 100644
--- a/migration-rdma.c
+++ b/migration-rdma.c
@@ -3412,7 +3412,7 @@ void rdma_start_outgoing_migration(void *opaque,
 }
 
 ret = qemu_rdma_source_init(rdma, &local_err,
-s->enabled_capabilities[MIGRATION_CAPABILITY_X_RDMA_PIN_ALL]);
+s->enabled_capabilities[MIGRATION_CAPABILITY_RDMA_PIN_ALL]);
 
 if (ret) {
 goto err;
diff --git a/migration.c b/migration.c
index b4f8462..d9c7a62 100644
--- a/migration.c
+++ b/migration.c
@@ -81,7 +81,7 @@ void qemu_start_incoming_migration(const char *uri, Error 
**errp)
 if (strstart(uri, "tcp:", &p))
 tcp_start_incoming_migration(p, errp);
 #ifdef CONFIG_RDMA
-else if (strstart(uri, "x-rdma:", &p))
+else if (strstart(uri, "rdma:", &p))
 rdma_start_incoming_migration(p, errp);
 #endif
 #if !defined(WIN32)
@@ -423,7 +423,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 if (strstart(uri, "tcp:", &p)) {
 tcp_start_outgoing_migration(s, p, &local_err);
 #ifdef CONFIG_RDMA
-} else if (strstart(uri, "x-rdma:", &p)) {
+} else if (strstart(uri, "rdma:", &p)) {
 rdma_start_outgoing_migration(s, p, &local_err);
 #endif
 #if !defined(WIN32)
@@ -501,7 +501,7 @@ bool migrate_rdma_pin_all(void)
 
 s = migrate_get_current();
 
-return s->enabled_capabilities[MIGRATION_CAPABILITY_X_RDMA_PIN_ALL];
+return s->enabled_capabilities[MIGRATION_CAPABILITY_RDMA_PIN_ALL];
 }
 
 bool migrate_auto_converge(void)
diff --git a/qapi-schema.json b/qapi-schema.json
index 145eca8..92509f6 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -61

Re: [Qemu-devel] [PATCH v4 2/7] block: Introduce op_blockers to BlockDriverState

2013-11-22 Thread Stefan Hajnoczi

On Fri, Nov 22, 2013 at 01:24:49PM +0800, Fam Zheng wrote:
> +bool bdrv_op_is_blocked(BlockDriverState *bs, BlockOpType op, Error **errp)
> +{
> +BdrvOpBlocker *blocker;
> +assert(op >=0 && op < BLOCK_OP_TYPE_MAX);
> +if (!QLIST_EMPTY(&bs->op_blockers[op])) {
> +blocker = QLIST_FIRST(&bs->op_blockers[op]);
> +*errp = error_copy(blocker->reason);
> +return true;
> +}
> +return false;
> +}

It's worth following the convention that Error **errp may be NULL:

if (errp) {
*errp = error_copy(blocker->reason);
}

The bool return value might be enough for some callers who don't need
the full Error.

[Qemu-devel] [PATCH v2 7/7] blkverify: Don't require protocol filename

2013-11-22 Thread Max Reitz

If the filename is not prefixed by "blkverify:" in
blkverify_parse_filename(), the blkverify driver was not selected
through that protocol prefix, but by an explicit command line option
(like file.driver=blkverify). Contrary to the current reaction, this is
not really a problem; the whole filename just has to be stored (in the
x-image option) and the user has to manually specify the x-raw option.

Signed-off-by: Max Reitz 
---
 block/blkverify.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/blkverify.c b/block/blkverify.c
index 3c63528..bdbdd68 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -78,7 +78,9 @@ static void blkverify_parse_filename(const char *filename, 
QDict *options,
 
 /* Parse the blkverify: prefix */
 if (!strstart(filename, "blkverify:", &filename)) {
-error_setg(errp, "File name string must start with 'blkverify:'");
+/* There was no prefix; therefore, all options have to be already
+   present in the QDict (except for the filename) */
+qdict_put(options, "x-image", qstring_from_str(filename));
 return;
 }
 
-- 
1.8.4.2

[Qemu-devel] [PATCH v2 6/7] blkdebug: Use command-line in read_config()

2013-11-22 Thread Max Reitz

Use qemu_config_parse_qdict() to parse the command-line options in
addition to the config file.

Signed-off-by: Max Reitz 
---
 block/blkdebug.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 1db999b..58bf05a 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -271,11 +271,13 @@ static void remove_rule(BlkdebugRule *rule)
 g_free(rule);
 }
 
-static int read_config(BDRVBlkdebugState *s, const char *filename, Error 
**errp)
+static int read_config(BDRVBlkdebugState *s, const char *filename,
+   QDict *options, Error **errp)
 {
 FILE *f = NULL;
 int ret;
 struct add_rule_data d;
+Error *local_err = NULL;
 
 if (filename) {
 f = fopen(filename, "r");
@@ -294,6 +296,13 @@ static int read_config(BDRVBlkdebugState *s, const char 
*filename, Error **errp)
 }
 }
 
+qemu_config_parse_qdict(options, config_groups, &local_err);
+if (error_is_set(&local_err)) {
+error_propagate(errp, local_err);
+ret = -EINVAL;
+goto fail;
+}
+
 d.s = s;
 d.action = ACTION_INJECT_ERROR;
 qemu_opts_foreach(&inject_error_opts, add_rule, &d, 0);
@@ -378,9 +387,9 @@ static int blkdebug_open(BlockDriverState *bs, QDict 
*options, int flags,
 goto fail;
 }
 
-/* Read rules from config file */
+/* Read rules from config file or command line options */
 config = qemu_opt_get(opts, "config");
-ret = read_config(s, config, errp);
+ret = read_config(s, config, options, errp);
 if (ret) {
 goto fail;
 }
-- 
1.8.4.2

[Qemu-devel] [PATCH v2 4/7] qemu-option: Add qemu_config_parse_qdict()

2013-11-22 Thread Max Reitz

This function basically parses command-line options given as a QDict
replacing a config file.

For instance, the QDict {"section.opt1": 42, "section.opt2": 23}
corresponds to the config file:

[section]
opt1 = 42
opt2 = 23

It is possible to specify multiple sections and also multiple sections
of the same type. On the command line, this looks like the following:

inject-error.0.event=reftable_load,\
inject-error.1.event=l2_load,\
set-state.event=l1_update

This would correspond to the following config file:

[inject-error "inject-error.0"]
event = reftable_load

[inject-error "inject-error.1"]
event = l2_load

[set-state]
event = l1_update

Signed-off-by: Max Reitz 
---
 include/qemu/config-file.h |  6 +++
 util/qemu-config.c | 95 ++
 2 files changed, 101 insertions(+)

diff --git a/include/qemu/config-file.h b/include/qemu/config-file.h
index 508428f..dbd97c4 100644
--- a/include/qemu/config-file.h
+++ b/include/qemu/config-file.h
@@ -4,6 +4,7 @@
 #include 
 #include "qemu/option.h"
 #include "qapi/error.h"
+#include "qapi/qmp/qdict.h"
 
 QemuOptsList *qemu_find_opts(const char *group);
 QemuOptsList *qemu_find_opts_err(const char *group, Error **errp);
@@ -18,6 +19,11 @@ int qemu_config_parse(FILE *fp, QemuOptsList **lists, const 
char *fname);
 
 int qemu_read_config_file(const char *filename);
 
+/* Parse QDict options as a replacement for a config file (allowing multiple
+   enumerated (0..(n-1)) configuration "sections") */
+void qemu_config_parse_qdict(QDict *options, QemuOptsList **lists,
+ Error **errp);
+
 /* Read default QEMU config files
  */
 int qemu_read_default_config_files(bool userconfig);
diff --git a/util/qemu-config.c b/util/qemu-config.c
index 04da942..12178d4 100644
--- a/util/qemu-config.c
+++ b/util/qemu-config.c
@@ -356,3 +356,98 @@ int qemu_read_config_file(const char *filename)
 return -EINVAL;
 }
 }
+
+static void config_parse_qdict_section(QDict *options, QemuOptsList *opts,
+   Error **errp)
+{
+QemuOpts *subopts;
+QDict *subqdict;
+QList *list = NULL;
+Error *local_err = NULL;
+size_t orig_size, enum_size;
+char *prefix;
+
+prefix = g_strdup_printf("%s.", opts->name);
+qdict_extract_subqdict(options, &subqdict, prefix);
+g_free(prefix);
+orig_size = qdict_size(subqdict);
+if (!orig_size) {
+goto out;
+}
+
+subopts = qemu_opts_create_nofail(opts);
+qemu_opts_absorb_qdict(subopts, subqdict, &local_err);
+if (error_is_set(&local_err)) {
+error_propagate(errp, local_err);
+goto out;
+}
+
+enum_size = qdict_size(subqdict);
+if (enum_size < orig_size && enum_size) {
+error_setg(errp, "Unknown option '%s' for [%s]",
+   qdict_first(subqdict)->key, opts->name);
+goto out;
+}
+
+if (enum_size) {
+/* Multiple, enumerated sections */
+QListEntry *list_entry;
+unsigned i = 0;
+
+/* Not required anymore */
+qemu_opts_del(subopts);
+
+qdict_array_split(subqdict, &list);
+if (qdict_size(subqdict)) {
+error_setg(errp, "Unused option '%s' for [%s]",
+   qdict_first(subqdict)->key, opts->name);
+goto out;
+}
+
+QLIST_FOREACH_ENTRY(list, list_entry) {
+QDict *section = qobject_to_qdict(qlist_entry_obj(list_entry));
+char *opt_name;
+
+opt_name = g_strdup_printf("%s.%u", opts->name, i++);
+subopts = qemu_opts_create(opts, opt_name, 1, &local_err);
+g_free(opt_name);
+if (error_is_set(&local_err)) {
+error_propagate(errp, local_err);
+goto out;
+}
+
+qemu_opts_absorb_qdict(subopts, section, &local_err);
+if (error_is_set(&local_err)) {
+error_propagate(errp, local_err);
+qemu_opts_del(subopts);
+goto out;
+}
+
+if (qdict_size(section)) {
+error_setg(errp, "[%s] section doesn't support the option 
'%s'",
+   opts->name, qdict_first(section)->key);
+qemu_opts_del(subopts);
+goto out;
+}
+}
+}
+
+out:
+QDECREF(subqdict);
+QDECREF(list);
+}
+
+void qemu_config_parse_qdict(QDict *options, QemuOptsList **lists,
+ Error **errp)
+{
+int i;
+Error *local_err = NULL;
+
+for (i = 0; lists[i]; i++) {
+config_parse_qdict_section(options, lists[i], &local_err);
+if (error_is_set(&local_err)) {
+error_propagate(errp, local_err);
+return;
+}
+}
+}
-- 
1.8.4.2

[Qemu-devel] [PATCH v2 5/7] blkdebug: Always call read_config()

2013-11-22 Thread Max Reitz

Move the check whether there actually is a config file into the
read_config() function.

Signed-off-by: Max Reitz 
---
 block/blkdebug.c | 40 +---
 1 file changed, 21 insertions(+), 19 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index b50db9d..1db999b 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -273,23 +273,25 @@ static void remove_rule(BlkdebugRule *rule)
 
 static int read_config(BDRVBlkdebugState *s, const char *filename, Error 
**errp)
 {
-FILE *f;
+FILE *f = NULL;
 int ret;
 struct add_rule_data d;
 
-f = fopen(filename, "r");
-if (f == NULL) {
-error_setg_errno(errp, errno, "Could not read blkdebug config file "
- "'%s'", filename);
-return -errno;
-}
+if (filename) {
+f = fopen(filename, "r");
+if (f == NULL) {
+error_setg_errno(errp, errno, "Could not read blkdebug config file 
"
+ "'%s'", filename);
+return -errno;
+}
 
-ret = qemu_config_parse(f, config_groups, filename);
-if (ret < 0) {
-error_setg(errp, "Could not parse blkdebug config file '%s'",
-   filename);
-ret = -EINVAL;
-goto fail;
+ret = qemu_config_parse(f, config_groups, filename);
+if (ret < 0) {
+error_setg(errp, "Could not parse blkdebug config file '%s'",
+   filename);
+ret = -EINVAL;
+goto fail;
+}
 }
 
 d.s = s;
@@ -303,7 +305,9 @@ static int read_config(BDRVBlkdebugState *s, const char 
*filename, Error **errp)
 fail:
 qemu_opts_reset(&inject_error_opts);
 qemu_opts_reset(&set_state_opts);
-fclose(f);
+if (f) {
+fclose(f);
+}
 return ret;
 }
 
@@ -376,11 +380,9 @@ static int blkdebug_open(BlockDriverState *bs, QDict 
*options, int flags,
 
 /* Read rules from config file */
 config = qemu_opt_get(opts, "config");
-if (config) {
-ret = read_config(s, config, errp);
-if (ret) {
-goto fail;
-}
+ret = read_config(s, config, errp);
+if (ret) {
+goto fail;
 }
 
 /* Set initial state */
-- 
1.8.4.2

[Qemu-devel] [PATCH v2 3/7] qdict: Add qdict_array_split()

2013-11-22 Thread Max Reitz

This function splits a QDict consisting of entries prefixed by
incrementally enumerated indices into a QList of QDicts.

Signed-off-by: Max Reitz 
---
 include/qapi/qmp/qdict.h |  1 +
 qobject/qdict.c  | 36 
 2 files changed, 37 insertions(+)

diff --git a/include/qapi/qmp/qdict.h b/include/qapi/qmp/qdict.h
index 5cefd80..1ddf97b 100644
--- a/include/qapi/qmp/qdict.h
+++ b/include/qapi/qmp/qdict.h
@@ -68,5 +68,6 @@ QDict *qdict_clone_shallow(const QDict *src);
 void qdict_flatten(QDict *qdict);
 
 void qdict_extract_subqdict(QDict *src, QDict **dst, const char *start);
+void qdict_array_split(QDict *src, QList **dst);
 
 #endif /* QDICT_H */
diff --git a/qobject/qdict.c b/qobject/qdict.c
index 0f3e0a6..ef76281 100644
--- a/qobject/qdict.c
+++ b/qobject/qdict.c
@@ -548,3 +548,39 @@ void qdict_extract_subqdict(QDict *src, QDict **dst, const 
char *start)
 entry = next;
 }
 }
+
+/**
+ * qdict_array_split(): This function creates a QList of QDicts. Every entry in
+ * the original QDict with a key prefixed "%u.", where %u designates an 
unsigned
+ * integer starting at 0 and incrementally counting up, will be moved to a new
+ * QDict at index %u in the output QList with the key prefix removed. The
+ * function terminates when there is no entry in the QDict with a prefix
+ * directly (incrementally) following the last one.
+ * Example: {"0.a": 42, "0.b": 23, "1.x": 0, "3.y": 1, "o.o": 7}
+ *  (or {"1.x": 0, "3.y": 1, "0.a": 42, "o.o": 7, "0.b": 23})
+ *   => [{"a": 42, "b": 23}, {"x": 0}]
+ *  and {"3.y": 1, "o.o": 7} (remainder of the old QDict)
+ */
+void qdict_array_split(QDict *src, QList **dst)
+{
+unsigned i;
+
+*dst = qlist_new();
+
+for (i = 0; i < UINT_MAX; i++) {
+QDict *subqdict;
+char prefix[32];
+size_t snprintf_ret;
+
+snprintf_ret = snprintf(prefix, 32, "%u.", i);
+assert(snprintf_ret < 32);
+
+qdict_extract_subqdict(src, &subqdict, prefix);
+if (!qdict_size(subqdict)) {
+QDECREF(subqdict);
+break;
+}
+
+qlist_append_obj(*dst, QOBJECT(subqdict));
+}
+}
-- 
1.8.4.2

[Qemu-devel] [PATCH v2 1/7] blkdebug: Use errp for read_config()

2013-11-22 Thread Max Reitz

Use an Error variable in the read_config() function.

Signed-off-by: Max Reitz 
---
 block/blkdebug.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 16d2b91..9dfa712 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -271,7 +271,7 @@ static void remove_rule(BlkdebugRule *rule)
 g_free(rule);
 }
 
-static int read_config(BDRVBlkdebugState *s, const char *filename)
+static int read_config(BDRVBlkdebugState *s, const char *filename, Error 
**errp)
 {
 FILE *f;
 int ret;
@@ -279,11 +279,16 @@ static int read_config(BDRVBlkdebugState *s, const char 
*filename)
 
 f = fopen(filename, "r");
 if (f == NULL) {
+error_setg_errno(errp, errno, "Could not read blkdebug config file "
+ "'%s'", filename);
 return -errno;
 }
 
 ret = qemu_config_parse(f, config_groups, filename);
 if (ret < 0) {
+error_setg(errp, "Could not parse blkdebug config file '%s'",
+   filename);
+ret = -EINVAL;
 goto fail;
 }
 
@@ -370,9 +375,8 @@ static int blkdebug_open(BlockDriverState *bs, QDict 
*options, int flags,
 /* Read rules from config file */
 config = qemu_opt_get(opts, "config");
 if (config) {
-ret = read_config(s, config);
-if (ret < 0) {
-error_setg_errno(errp, -ret, "Could not read blkdebug config 
file");
+ret = read_config(s, config, errp);
+if (ret) {
 goto fail;
 }
 }
-- 
1.8.4.2

[Qemu-devel] [PATCH v2 2/7] blkdebug: Don't require sophisticated filename

2013-11-22 Thread Max Reitz

If the filename is not prefixed by "blkdebug:" in
blkdebug_parse_filename(), the blkdebug driver was not selected through
that protocol prefix, but by an explicit command line option
(file.driver=blkdebug or something similar). Contrary to the current
reaction, this is not a problem at all; we just need to store the
filename (in the x-image option) and can go on; the user just has to
manually specify the config option.

Signed-off-by: Max Reitz 
---
 block/blkdebug.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 9dfa712..b50db9d 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -315,7 +315,9 @@ static void blkdebug_parse_filename(const char *filename, 
QDict *options,
 
 /* Parse the blkdebug: prefix */
 if (!strstart(filename, "blkdebug:", &filename)) {
-error_setg(errp, "File name string must start with 'blkdebug:'");
+/* There was no prefix; therefore, all options have to be already
+   present in the QDict (except for the filename) */
+qdict_put(options, "x-image", qstring_from_str(filename));
 return;
 }
 
-- 
1.8.4.2

[Qemu-devel] [PATCH v2 0/7] blkdebug/blkverify: Allow command-line configuration

2013-11-22 Thread Max Reitz

Currently, the configuration of blkdebug and blkverify is done through
the "filename" alone. There is now way of manually choosing blkdebug or
blkverify as a driver and using a normal image filename.

In the case of blkdebug, the filename starts with the protocol prefix,
follows up with the name of a configuration file and ends with the name
of the image file.

In the case of blkverify, the filename starts with the protocol prefix,
follows up with the raw reference image filename and ends with the name
of the image file.

This patch allows the configuration of both drivers completely through
command-line options. The driver has to be selected through the
file.driver option (or similar), the image filename has to be given as
the filename (obviously) and, depending on the driver, further options
have to be given to control the behavior.

In case of blkverify, the x-raw option specifies the name of the raw
reference image file.

In case of blkdebug, one may either set the config option to the
filename of a configuration file, or the contents of the configuration
file may be given directly on the command line (see description of patch
4 for an example).

v2:
 - Addressed Fam's comments:
- Added patch 3
- Fixed patch 4 and modified it to use the function introduced in
  patch 3
 - Removed unnecessary error_propagate() in patch 1 (affects patches 5
   and 6 as well)


Max Reitz (7):
  blkdebug: Use errp for read_config()
  blkdebug: Don't require sophisticated filename
  qdict: Add qdict_array_split()
  qemu-option: Add qemu_config_parse_qdict()
  blkdebug: Always call read_config()
  blkdebug: Use command-line in read_config()
  blkverify: Don't require protocol filename

 block/blkdebug.c   | 49 
 block/blkverify.c  |  4 +-
 include/qapi/qmp/qdict.h   |  1 +
 include/qemu/config-file.h |  6 +++
 qobject/qdict.c| 36 ++
 util/qemu-config.c | 95 ++
 6 files changed, 174 insertions(+), 17 deletions(-)

-- 
1.8.4.2

Re: [Qemu-devel] [RFC PATCH 6/8] cpu: Add per-cpu address space

2013-11-22 Thread Edgar E. Iglesias

Hi, no I actually had it in cpustate first but had to do env-get-cpu all
over so i moved it to env. Iiuc env-get-cpu involves a dyn typecheck. I
havent meassured the perf impact though.

Sorry for phone email...

Cheers

---
Sent from my phone
On Nov 22, 2013 4:51 PM, "Peter Maydell"  wrote:

> On 22 November 2013 15:15,   wrote:
> > @@ -176,6 +176,9 @@ typedef struct CPUWatchpoint {
> >  sigjmp_buf jmp_env;
> \
> >  int exception_index;
>  \
> >
>  \
> > +/* Per CPU address-space.  */
> \
> > +AddressSpace *as;
> \
> > +
>  \
>
> Does this really have to live in the env struct rather than
> CPUState ?
>
> thanks
> -- PMM
>

Re: [Qemu-devel] [PATCH for-1.7] seccomp: setting "-sandbox on" by default

2013-11-22 Thread Paul Moore

On Friday, November 22, 2013 04:48:41 PM Stefan Hajnoczi wrote:
> On Fri, Nov 22, 2013 at 09:44:42AM -0500, Paul Moore wrote:
> > On Friday, November 22, 2013 11:39:31 AM Stefan Hajnoczi wrote:
> > > On Thu, Nov 21, 2013 at 10:48:58AM -0500, Paul Moore wrote:
> > > > I'm always open to suggestions on how to improve the
> > > > development/debugging
> > > > process, so if you have any ideas please let me know.
> > > 
> > > The failure mode is terrible:
> > Glad to see you don't feel strongly about things.
> 
> Sorry for the rant :).  I know you and Eduardo understand the issues and
> have already been working on them.

I can't speak for Eduardo, but no worries on my end; it just wouldn't be an 
Open Source project without a bit of hyperbole now and then would it? ;)

> I hope hearing it from a developer who isn't following seccomp is useful
> though.

Definitely.  I should have said it earlier, but I do appreciate you taking the 
time to comment.

> It shows which issues stick out and hinder usability.  Users will only be
> happy with seccomp when it works silently behind the scenes.

Exactly.  Users don't tolerate bugs and I don't blame them.  After all, at 
some point we are all users too.

> Developers will only be happy with seccomp if it's easy and rewarding to
> support/debug.

Agreed.

As a developer, how do you feel about the audit/syslog based approach I 
mentioned earlier?

-- 
paul moore
security and virtualization @ redhat

Re: [Qemu-devel] GIT master fails compilation for ACPI

2013-11-22 Thread Stefan Hajnoczi

On Fri, Nov 22, 2013 at 12:16:37PM +0100, Erik Rull wrote:
> > On November 22, 2013 at 11:56 AM Stefan Hajnoczi  wrote:
> >
> >
> > On Thu, Nov 21, 2013 at 09:44:29PM +0100, Erik Rull wrote:
> > > Hu Tao wrote:
> > > >On Thu, Nov 21, 2013 at 09:06:43AM +0100, Erik Rull wrote:
> > > >>Hi all,
> > > >>
> > > >>when doing a git clone on the latest master, it fails compiling:
> > > >>
> > > >>   CC    x86_64-softmmu/memory_mapping.o
> > > >>   CC    x86_64-softmmu/dump.o
> > > >>   CC    x86_64-softmmu/xen-stub.o
> > > >>   CC    x86_64-softmmu/hw/i386/multiboot.o
> > > >>   CC    x86_64-softmmu/hw/i386/smbios.o
> > > >>   CC    x86_64-softmmu/hw/i386/pc.o
> > > >>   CC    x86_64-softmmu/hw/i386/pc_piix.o
> > > >>   CC    x86_64-softmmu/hw/i386/pc_q35.o
> > > >>   CC    x86_64-softmmu/hw/i386/pc_sysfw.o
> > > >>   CC    x86_64-softmmu/hw/i386/kvmvapic.o
> > > >>   CPP x86_64-softmmu/acpi-dsdt.dsl.i.orig
> > > >>   ACPI_PREPROCESS x86_64-softmmu/acpi-dsdt.dsl.i
> > > >>   IASL x86_64-softmmu/acpi-dsdt.dsl.i
> > > >>make[1]: *** [hw/i386/acpi-dsdt.hex] Error 1
> > > >>make: *** [subdir-x86_64-softmmu] Error 2
> > > >>erik@debian:~/qemu-test/qemu$
> > > >>
> > > >>
> > > >>My configure options are:
> > > >>./configure --prefix= --target-list=x86_64-softmmu --disable-vnc-png
> > > >
> > > >Are you sure about --prefix= and...
> > > >
> > > >>--disable-vnc-jpeg --disable-vnc-tls --disable-vnc-sasl  
> > > >>--audio-drv-list=
> > > >
> > > >... --audio-drv-list= ?
> > > >
> > > >>--enable-sdl --disable-xen --disable-brlapi --disable-bluez 
> > > >>--disable-curl
> > > >>--disable-guest-agent --disable-spice
> > > >
> > > >I can't even do a configure with exactly the above options, it gives an
> > > >error message:
> > > >
> > > >   ERROR: unknown option --disable-vnc-sasl
> > > >
> > > >While removing --prefix= and --audio-drv-list=, the compilation goes
> > > >fine.
> > > >
> > >
> > > No, sorry, the error is still the same! I did a make distclean and
> > > redid the configure without the options you pointed out.
> > >
> > > Any further ideas? Are there more possibilities to debug why the
> > > compilation fails?
> >
> > Just a guess, but do you have the acpica-tools package installed?  You
> > need the iasl compiler.
> >
> > Stefan
> >
> 
> Hi Stefan,
> 
> my distro doesn't seem to have this package:
> debian:/home/erik# apt-get install acpica-tools
> Reading package lists... Done
> Building dependency tree... Done
> E: Couldn't find package acpica-tools
> debian:/home/erik#
> 
> The IASL compiler is installed, here the version:
> Intel ACPI Component Architecture
> ASL Optimizing Compiler version 20060912 [Dec 20 2006]
> Copyright (C) 2000 - 2006 Intel Corporation
> Supports ACPI Specification Revision 3.0a
> 
> And there is no newer version available for my distro.
> 
> It's getting more and more complex to build qemu, is there a reason why 
> everyone
> needs to build the acpi stuff by himself? It should be something static like 
> the
> bios binary files, right? (Sorry for this question, I'm not familiar with 
> ACPI)
> So you could provide the defaults directly and everyone that wants to modify 
> the
> defaults is free to compile it by himself.
> 
> And if these tools are required, please add an error message at configure
> runtime so that the successive errors at runtime will not appear, because
> compiling is then blocked by configure. And if the IASL is too old, a version
> check at configure runtime would be helpful as well.

All of this is reasonable but I don't know the answers :).

CCing Michael Tsirkin for ACPI help

Stefan

Re: [Qemu-devel] [PATCH v3 0/6] Add cache mode option to qemu-iotests, and change default mode to "writeback"

2013-11-22 Thread Stefan Hajnoczi

On Fri, Nov 22, 2013 at 10:30:30AM +0100, Kevin Wolf wrote:
> Am 22.11.2013 um 09:57 hat Stefan Hajnoczi geschrieben:
> > On Thu, Nov 21, 2013 at 03:16:23PM +0100, Kevin Wolf wrote:
> > > Am 21.11.2013 um 13:43 hat Stefan Hajnoczi geschrieben:
> > > > On Wed, Nov 20, 2013 at 03:44:11PM +0800, Fam Zheng wrote:
> > > > > This series adds cache mode option in the iotests framework. Test 
> > > > > cases are
> > > > > updated to make use of cache mode and mask supported modes.
> > > > > 
> > > > > v3: Change _unsupported_qemu_io_options to _supported_cache_modes.
> > > > > Change default mode to "writeback".
> > > > > Clean up some whitespaces in the end of series.
> > > > > Fix "026.out.nocache" case.
> > > > > Fix 048 case on tmpfs.
> > > > > 
> > > > > 
> > > > > Fam Zheng (6):
> > > > >   qemu-iotests: Add "-c " option
> > > > >   qemu-iotests: Honour cache mode in iotests.py
> > > > >   qemu-iotests: Add _supported_cache_modes
> > > > >   qemu-iotests: Change default cache mode to "writeback"
> > > > >   qemu-iotests: Force qcow2 in error path test in 048
> > > > >   qemu-iotests: Clean up spaces in usage output
> > > > > 
> > > > >  tests/qemu-iotests/026|  2 +-
> > > > >  tests/qemu-iotests/039|  2 +-
> > > > >  tests/qemu-iotests/048|  8 +++-
> > > > >  tests/qemu-iotests/052|  3 +--
> > > > >  tests/qemu-iotests/check  |  2 +-
> > > > >  tests/qemu-iotests/common | 37 
> > > > > +++--
> > > > >  tests/qemu-iotests/common.rc  | 20 ++--
> > > > >  tests/qemu-iotests/iotests.py |  3 ++-
> > > > >  8 files changed, 50 insertions(+), 27 deletions(-)
> > > > 
> > > > In principle I'm happy with the series but there are two instances where
> > > > you are silently changing what the test does (overriding cache mode and
> > > > image format).
> > > > 
> > > > Please skip tests that cannot run instead of silently testing something
> > > > else.
> > > 
> > > Skipping isn't really a good solution either. For example, I almost
> > > never test 039, because I always run ./check -nocache. What would be
> > > most useful is to use a working cache mode for this test rather than
> > > leaving it untested over weeks and months.
> > > 
> > > I mean we can make things as sophisticated as we want, and introduce a
> > > list of cache modes in the order of their preference, so that I can say
> > > ./check -c none,writethrough, but that's probably a bit overengineered.
> > 
> > Why not just:
> > 
> > # Before sending a block pull request
> > ./check -c none
> > ./check -c writeback
> > 
> > That covers both O_DIRECT and buffered I/O.
> 
> And takes about twice the time. Sorry, no, in practice it means that 039
> (and I think one or two more) stays untested by me, because running all
> test cases a second time with writethrough (not writeback) would take
> forever.
> 
> > I think 1:1 behavior is important to keep it easy to reason about test
> > cases.  If test cases go off an change behavior behind the scenes it's
> > possible to get confused or draw the wrong conclusions.
> 
> I can understand your position and would agree in theory. I just think
> that running the test with more difficult reasoning is preferable to
> not running the test and therefore not reasoning about it even though
> the reasoning would have been easy.

I think we've gotten some good ideas from an IRC discussion with Fam and
Wenchao.

A "./check" default mode that runs every test case with appropriate
settings is useful.  This makes it easy to run the tests and gives us
good coverage.

Stefan

Re: [Qemu-devel] [RFC PATCH 6/8] cpu: Add per-cpu address space

2013-11-22 Thread Peter Maydell

On 22 November 2013 15:15,   wrote:
> @@ -176,6 +176,9 @@ typedef struct CPUWatchpoint {
>  sigjmp_buf jmp_env; \
>  int exception_index;\
>  \
> +/* Per CPU address-space.  */   \
> +AddressSpace *as;   \
> +\

Does this really have to live in the env struct rather than
CPUState ?

thanks
-- PMM

Re: [Qemu-devel] [PATCH for-1.7] seccomp: setting "-sandbox on" by default

2013-11-22 Thread Stefan Hajnoczi

On Fri, Nov 22, 2013 at 09:44:42AM -0500, Paul Moore wrote:
> On Friday, November 22, 2013 11:39:31 AM Stefan Hajnoczi wrote:
> > On Thu, Nov 21, 2013 at 10:48:58AM -0500, Paul Moore wrote:
> > > I'm always open to suggestions on how to improve the development/debugging
> > > process, so if you have any ideas please let me know.
> > 
> > The failure mode is terrible:
> 
> Glad to see you don't feel strongly about things.

Sorry for the rant :).  I know you and Eduardo understand the issues and
have already been working on them.

I hope hearing it from a developer who isn't following seccomp is useful
though.  It shows which issues stick out and hinder usability.  Users
will only be happy with seccomp when it works silently behind the
scenes.  Developers will only be happy with seccomp if it's easy and
rewarding to support/debug.

Stefan

[Qemu-devel] [RFC PATCH 7/8] exec: On AS changes, only flush affected CPU TLBs

2013-11-22 Thread edgar . iglesias

From: "Edgar E. Iglesias" 

Signed-off-by: Edgar E. Iglesias 
---
 exec.c |5 +
 1 file changed, 5 insertions(+)

diff --git a/exec.c b/exec.c
index acbd2e6..f273a46 100644
--- a/exec.c
+++ b/exec.c
@@ -1715,6 +1715,11 @@ static void tcg_commit(MemoryListener *listener)
reset the modified entries */
 /* XXX: slow ! */
 CPU_FOREACH(cpu) {
+/* FIXME: Disentangle the cpu.h circular files deps so we can
+   directly get the right CPU from listener.  */
+if (cpu->tcg_as_listener != listener) {
+continue;
+}
 CPUArchState *env = cpu->env_ptr;
 
 tlb_flush(env, 1);
-- 
1.7.10.4

[Qemu-devel] [RFC PATCH 8/8] exec: Make ldl_*_phys input an AddressSpace

2013-11-22 Thread edgar . iglesias

From: "Edgar E. Iglesias" 

Signed-off-by: Edgar E. Iglesias 
---
 cpu-exec.c|5 +-
 exec.c|   19 ---
 hw/dma/pl080.c|9 ++--
 hw/dma/sun4m_iommu.c  |3 +-
 hw/net/vmware_utils.h |2 +-
 hw/ppc/spapr_hcall.c  |8 +--
 hw/s390x/css.c|3 +-
 hw/s390x/s390-virtio-bus.c|2 +-
 hw/s390x/virtio-ccw.c |5 +-
 hw/scsi/megasas.c |4 +-
 hw/scsi/vmw_pvscsi.c  |3 +-
 hw/virtio/virtio.c|3 +-
 include/exec/cpu-common.h |6 +--
 include/hw/ppc/spapr.h|2 +-
 target-alpha/helper.h |2 +-
 target-alpha/mem_helper.c |8 +--
 target-alpha/translate.c  |2 +-
 target-arm/helper.c   |   12 ++---
 target-i386/arch_memory_mapping.c |   36 ++---
 target-i386/helper.c  |8 +--
 target-i386/seg_helper.c  |4 +-
 target-i386/smm_helper.c  |  102 +++--
 target-i386/svm_helper.c  |   26 +-
 target-ppc/excp_helper.c  |2 +-
 target-ppc/mmu-hash32.h   |4 +-
 target-s390x/cpu.c|2 +-
 target-sparc/ldst_helper.c|6 +--
 target-sparc/mmu_helper.c |   18 +++
 target-unicore32/softmmu.c|4 +-
 target-xtensa/helper.c|2 +-
 30 files changed, 163 insertions(+), 149 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 30cfa2a..8f6766b 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -374,7 +374,10 @@ int cpu_exec(CPUArchState *env)
 /* FIXME: this should respect TPR */
 cpu_svm_check_intercept_param(env, SVM_EXIT_VINTR,
   0);
-intno = ldl_phys(env->vm_vmcb + offsetof(struct 
vmcb, control.int_vector));
+intno = ldl_phys(env->as,
+ env->vm_vmcb
+ + offsetof(struct vmcb,
+control.int_vector));
 qemu_log_mask(CPU_LOG_TB_IN_ASM, "Servicing 
virtual hardware INT=0x%02x\n", intno);
 do_interrupt_x86_hardirq(env, intno, 1);
 cpu->interrupt_request &= ~CPU_INTERRUPT_VIRQ;
diff --git a/exec.c b/exec.c
index f273a46..ba67f98 100644
--- a/exec.c
+++ b/exec.c
@@ -1490,7 +1490,7 @@ static uint64_t watch_mem_read(void *opaque, hwaddr addr,
 switch (size) {
 case 1: return ldub_phys(addr);
 case 2: return lduw_phys(addr);
-case 4: return ldl_phys(addr);
+case 4: return ldl_phys(&address_space_memory, addr);
 default: abort();
 }
 }
@@ -2216,7 +2216,7 @@ void cpu_physical_memory_unmap(void *buffer, hwaddr len,
 }
 
 /* warning: addr must be aligned */
-static inline uint32_t ldl_phys_internal(hwaddr addr,
+static inline uint32_t ldl_phys_internal(AddressSpace *as, hwaddr addr,
  enum device_endian endian)
 {
 uint8_t *ptr;
@@ -2225,8 +2225,7 @@ static inline uint32_t ldl_phys_internal(hwaddr addr,
 hwaddr l = 4;
 hwaddr addr1;
 
-mr = address_space_translate(&address_space_memory, addr, &addr1, &l,
- false);
+mr = address_space_translate(as, addr, &addr1, &l, false);
 if (l < 4 || !memory_access_is_direct(mr, false)) {
 /* I/O case */
 io_mem_read(mr, addr1, &val, 4);
@@ -2259,19 +2258,19 @@ static inline uint32_t ldl_phys_internal(hwaddr addr,
 return val;
 }
 
-uint32_t ldl_phys(hwaddr addr)
+uint32_t ldl_phys(AddressSpace *as, hwaddr addr)
 {
-return ldl_phys_internal(addr, DEVICE_NATIVE_ENDIAN);
+return ldl_phys_internal(as, addr, DEVICE_NATIVE_ENDIAN);
 }
 
-uint32_t ldl_le_phys(hwaddr addr)
+uint32_t ldl_le_phys(AddressSpace *as, hwaddr addr)
 {
-return ldl_phys_internal(addr, DEVICE_LITTLE_ENDIAN);
+return ldl_phys_internal(as, addr, DEVICE_LITTLE_ENDIAN);
 }
 
-uint32_t ldl_be_phys(hwaddr addr)
+uint32_t ldl_be_phys(AddressSpace *as, hwaddr addr)
 {
-return ldl_phys_internal(addr, DEVICE_BIG_ENDIAN);
+return ldl_phys_internal(as, addr, DEVICE_BIG_ENDIAN);
 }
 
 /* warning: addr must be aligned */
diff --git a/hw/dma/pl080.c b/hw/dma/pl080.c
index 35b9015..58556f3 100644
--- a/hw/dma/pl080.c
+++ b/hw/dma/pl080.c
@@ -8,6 +8,7 @@
  */
 
 #include "hw/sysbus.h"
+#include "exec/address-spaces.h"
 
 #define PL080_MAX_CHANNELS 8
 #define PL080_CONF_E0x1
@@ -204,10 +205,10 @@ again:
 if (size == 0) {
 /* Transfer complete.  */
 if (ch->lli) {
-ch->src = ldl_le_phys(ch->lli);
-ch->dest = ldl_le_phys(ch->lli + 4);
-ch->ctrl

[Qemu-devel] [RFC PATCH 5/8] memory: Add MemoryListener to typedefs

2013-11-22 Thread edgar . iglesias

From: "Edgar E. Iglesias" 

Signed-off-by: Edgar E. Iglesias 
---
 include/qemu/typedefs.h |1 +
 1 file changed, 1 insertion(+)

diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index a4c1b84..425ca1a 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -24,6 +24,7 @@ typedef struct BusClass BusClass;
 typedef struct AddressSpace AddressSpace;
 typedef struct MemoryRegion MemoryRegion;
 typedef struct MemoryRegionSection MemoryRegionSection;
+typedef struct MemoryListener MemoryListener;
 
 typedef struct MemoryMappingList MemoryMappingList;
 
-- 
1.7.10.4

[Qemu-devel] [RFC PATCH 1/8] exec: Make tb_invalidate_phys_addr input an AS

2013-11-22 Thread edgar . iglesias

From: "Edgar E. Iglesias" 

No functional change.

Signed-off-by: Edgar E. Iglesias 
---
 exec.c|3 ++-
 include/exec/exec-all.h   |2 +-
 target-xtensa/op_helper.c |3 ++-
 translate-all.c   |4 ++--
 4 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/exec.c b/exec.c
index 95c4356..189c324 100644
--- a/exec.c
+++ b/exec.c
@@ -411,7 +411,8 @@ static void breakpoint_invalidate(CPUState *cpu, 
target_ulong pc)
 {
 hwaddr phys = cpu_get_phys_page_debug(cpu, pc);
 if (phys != -1) {
-tb_invalidate_phys_addr(phys | (pc & ~TARGET_PAGE_MASK));
+tb_invalidate_phys_addr(&address_space_memory,
+phys | (pc & ~TARGET_PAGE_MASK));
 }
 }
 #endif
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index ea90b64..b4dfc07 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -101,7 +101,7 @@ void tlb_flush(CPUArchState *env, int flush_global);
 void tlb_set_page(CPUArchState *env, target_ulong vaddr,
   hwaddr paddr, int prot,
   int mmu_idx, target_ulong size);
-void tb_invalidate_phys_addr(hwaddr addr);
+void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr);
 #else
 static inline void tlb_flush_page(CPUArchState *env, target_ulong addr)
 {
diff --git a/target-xtensa/op_helper.c b/target-xtensa/op_helper.c
index cf97025..89a72b5 100644
--- a/target-xtensa/op_helper.c
+++ b/target-xtensa/op_helper.c
@@ -29,6 +29,7 @@
 #include "helper.h"
 #include "qemu/host-utils.h"
 #include "exec/softmmu_exec.h"
+#include "exec/address-spaces.h"
 
 static void do_unaligned_access(CPUXtensaState *env,
 target_ulong addr, int is_write, int is_user, uintptr_t retaddr);
@@ -90,7 +91,7 @@ static void tb_invalidate_virtual_addr(CPUXtensaState *env, 
uint32_t vaddr)
 int ret = xtensa_get_physical_addr(env, false, vaddr, 2, 0,
 &paddr, &page_size, &access);
 if (ret == 0) {
-tb_invalidate_phys_addr(paddr);
+tb_invalidate_phys_addr(&address_space_memory, paddr);
 }
 }
 
diff --git a/translate-all.c b/translate-all.c
index aeda54d..7596b8d 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -1352,13 +1352,13 @@ static TranslationBlock *tb_find_pc(uintptr_t tc_ptr)
 }
 
 #if defined(TARGET_HAS_ICE) && !defined(CONFIG_USER_ONLY)
-void tb_invalidate_phys_addr(hwaddr addr)
+void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr)
 {
 ram_addr_t ram_addr;
 MemoryRegion *mr;
 hwaddr l = 1;
 
-mr = address_space_translate(&address_space_memory, addr, &addr, &l, 
false);
+mr = address_space_translate(as, addr, &addr, &l, false);
 if (!(memory_region_is_ram(mr)
   || memory_region_is_romd(mr))) {
 return;
-- 
1.7.10.4

[Qemu-devel] [RFC PATCH 3/8] exec: Always initialize MemorySection address spaces

2013-11-22 Thread edgar . iglesias

From: "Edgar E. Iglesias" 

Signed-off-by: Edgar E. Iglesias 
---
 exec.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/exec.c b/exec.c
index 5e17110..05df217 100644
--- a/exec.c
+++ b/exec.c
@@ -806,6 +806,7 @@ static void register_subpage(AddressSpaceDispatch *d, 
MemoryRegionSection *secti
 
 if (!(existing->mr->subpage)) {
 subpage = subpage_init(d->as, base);
+subsection.address_space = d->as;
 subsection.mr = &subpage->iomem;
 phys_page_set(d, base >> TARGET_PAGE_BITS, 1,
   phys_section_add(&subsection));
@@ -1609,6 +1610,7 @@ static subpage_t *subpage_init(AddressSpace *as, hwaddr 
base)
 static uint16_t dummy_section(MemoryRegion *mr)
 {
 MemoryRegionSection section = {
+.address_space = &address_space_memory,
 .mr = mr,
 .offset_within_address_space = 0,
 .offset_within_region = 0,
-- 
1.7.10.4

[Qemu-devel] [RFC PATCH 6/8] cpu: Add per-cpu address space

2013-11-22 Thread edgar . iglesias

From: "Edgar E. Iglesias" 

Signed-off-by: Edgar E. Iglesias 
---
 cputlb.c|4 ++--
 exec.c  |   31 +++
 include/exec/cpu-defs.h |3 +++
 include/exec/exec-all.h |1 +
 include/exec/softmmu_template.h |4 ++--
 include/qom/cpu.h   |2 ++
 6 files changed, 33 insertions(+), 12 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index 0399172..a2264a3 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -254,7 +254,7 @@ void tlb_set_page(CPUArchState *env, target_ulong vaddr,
 }
 
 sz = size;
-section = address_space_translate_for_iotlb(&address_space_memory, paddr,
+section = address_space_translate_for_iotlb(env->as, paddr,
 &xlat, &sz);
 assert(sz >= TARGET_PAGE_SIZE);
 
@@ -327,7 +327,7 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env1, 
target_ulong addr)
 cpu_ldub_code(env1, addr);
 }
 pd = env1->iotlb[mmu_idx][page_index] & ~TARGET_PAGE_MASK;
-mr = iotlb_to_region(&address_space_memory, pd);
+mr = iotlb_to_region(env1->as, pd);
 if (memory_region_is_unassigned(mr)) {
 CPUState *cpu = ENV_GET_CPU(env1);
 CPUClass *cc = CPU_GET_CLASS(cpu);
diff --git a/exec.c b/exec.c
index 0162eb3..acbd2e6 100644
--- a/exec.c
+++ b/exec.c
@@ -129,6 +129,7 @@ static PhysPageMap next_map;
 
 static void io_mem_init(void);
 static void memory_map_init(void);
+static void tcg_commit(MemoryListener *listener);
 
 static MemoryRegion io_mem_watch;
 #endif
@@ -361,6 +362,25 @@ CPUState *qemu_get_cpu(int index)
 return NULL;
 }
 
+#if !defined(CONFIG_USER_ONLY)
+void cpu_address_space_init(CPUState *cpu, AddressSpace *as)
+{
+CPUArchState *env = cpu->env_ptr;
+
+if (tcg_enabled()) {
+if (cpu->tcg_as_listener) {
+memory_listener_unregister(cpu->tcg_as_listener);
+} else {
+cpu->tcg_as_listener = g_new0(MemoryListener, 1);
+}
+cpu->tcg_as_listener->commit = tcg_commit;
+memory_listener_register(cpu->tcg_as_listener, as);
+}
+
+env->as = as;
+}
+#endif
+
 void cpu_exec_init(CPUArchState *env)
 {
 CPUState *cpu = ENV_GET_CPU(env);
@@ -380,6 +400,7 @@ void cpu_exec_init(CPUArchState *env)
 QTAILQ_INIT(&env->breakpoints);
 QTAILQ_INIT(&env->watchpoints);
 #ifndef CONFIG_USER_ONLY
+cpu_address_space_init(cpu, &address_space_memory);
 cpu->thread_id = qemu_get_thread_id();
 #endif
 QTAILQ_INSERT_TAIL(&cpus, cpu, node);
@@ -409,9 +430,10 @@ static void breakpoint_invalidate(CPUState *cpu, 
target_ulong pc)
 #else
 static void breakpoint_invalidate(CPUState *cpu, target_ulong pc)
 {
+CPUArchState *env = cpu->env_ptr;
 hwaddr phys = cpu_get_phys_page_debug(cpu, pc);
 if (phys != -1) {
-tb_invalidate_phys_addr(&address_space_memory,
+tb_invalidate_phys_addr(env->as,
 phys | (pc & ~TARGET_PAGE_MASK));
 }
 }
@@ -1717,10 +1739,6 @@ static MemoryListener core_memory_listener = {
 .priority = 1,
 };
 
-static MemoryListener tcg_memory_listener = {
-.commit = tcg_commit,
-};
-
 void address_space_init_dispatch(AddressSpace *as)
 {
 as->dispatch = NULL;
@@ -1755,9 +1773,6 @@ static void memory_map_init(void)
 address_space_init(&address_space_io, system_io, "I/O");
 
 memory_listener_register(&core_memory_listener, &address_space_memory);
-if (tcg_enabled()) {
-memory_listener_register(&tcg_memory_listener, &address_space_memory);
-}
 }
 
 MemoryRegion *get_system_memory(void)
diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
index 01cd8c7..406b36c 100644
--- a/include/exec/cpu-defs.h
+++ b/include/exec/cpu-defs.h
@@ -176,6 +176,9 @@ typedef struct CPUWatchpoint {
 sigjmp_buf jmp_env; \
 int exception_index;\
 \
+/* Per CPU address-space.  */   \
+AddressSpace *as;   \
+\
 /* user data */ \
 void *opaque;   \
 
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 6129365..61770ee 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -95,6 +95,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, 
tb_page_addr_t end,
 void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end,
   int is_cpu_write_access);
 #if !defined(CONFIG_USER_ONLY)
+void cpu_address_space_init(CPUState *cpu, AddressSpace *as);
 /* cputlb.c */
 void tlb_flush_page(CPUArchState *env, target_ulong addr);
 void tlb_flush(CPUA

[Qemu-devel] [RFC PATCH 0/8] Steps towards per CPU address-spaces

2013-11-22 Thread edgar . iglesias

From: "Edgar E. Iglesias" 

Hi,

I'm looking at modelling a system where multiple CPUs co-exist with
different views of their attached buses/devs.

With this RFC series I'm trying to take small steps towards having
an address-space per CPU. This is still incomplete but I would like
to get comments early before changing to much. As there are no users
in the tree yet, this code restructure should not result in any
functional changes.

Patch 6 shows an issue with circular header deps between
qemu-common.h, cpu.h. I've got a branch with a possible
solution to that but its very intrusive so I opted to keep that
for later.

Patch 8 shows a possible transform to ld*/st*_phys here only
implemented for ldl_*_phys.

Future work will have to transform more of the cpu_* bus accessing
functions.

Cheers,
Edgar

Edgar E. Iglesias (8):
  exec: Make tb_invalidate_phys_addr input an AS
  exec: Make iotlb_to_region input an AS
  exec: Always initialize MemorySection address spaces
  exec: Make memory_region_section_get_iotlb use section AS
  memory: Add MemoryListener to typedefs
  cpu: Add per-cpu address space
  exec: On AS changes, only flush affected CPU TLBs
  exec: Make ldl_*_phys input an AddressSpace

 cpu-exec.c|5 +-
 cputlb.c  |4 +-
 exec.c|   64 +++
 hw/dma/pl080.c|9 ++--
 hw/dma/sun4m_iommu.c  |3 +-
 hw/net/vmware_utils.h |2 +-
 hw/ppc/spapr_hcall.c  |8 +--
 hw/s390x/css.c|3 +-
 hw/s390x/s390-virtio-bus.c|2 +-
 hw/s390x/virtio-ccw.c |5 +-
 hw/scsi/megasas.c |4 +-
 hw/scsi/vmw_pvscsi.c  |3 +-
 hw/virtio/virtio.c|3 +-
 include/exec/cpu-common.h |6 +--
 include/exec/cpu-defs.h   |3 ++
 include/exec/exec-all.h   |5 +-
 include/exec/softmmu_template.h   |5 +-
 include/hw/ppc/spapr.h|2 +-
 include/qemu/typedefs.h   |1 +
 include/qom/cpu.h |2 +
 target-alpha/helper.h |2 +-
 target-alpha/mem_helper.c |8 +--
 target-alpha/translate.c  |2 +-
 target-arm/helper.c   |   12 ++---
 target-i386/arch_memory_mapping.c |   36 ++---
 target-i386/helper.c  |8 +--
 target-i386/seg_helper.c  |4 +-
 target-i386/smm_helper.c  |  102 +++--
 target-i386/svm_helper.c  |   26 +-
 target-ppc/excp_helper.c  |2 +-
 target-ppc/mmu-hash32.h   |4 +-
 target-s390x/cpu.c|2 +-
 target-sparc/ldst_helper.c|6 +--
 target-sparc/mmu_helper.c |   18 +++
 target-unicore32/softmmu.c|4 +-
 target-xtensa/helper.c|2 +-
 target-xtensa/op_helper.c |3 +-
 translate-all.c   |4 +-
 38 files changed, 215 insertions(+), 169 deletions(-)

-- 
1.7.10.4

[Qemu-devel] [RFC PATCH 2/8] exec: Make iotlb_to_region input an AS

2013-11-22 Thread edgar . iglesias

From: "Edgar E. Iglesias" 

Signed-off-by: Edgar E. Iglesias 
---
 cputlb.c|2 +-
 exec.c  |4 ++--
 include/exec/exec-all.h |2 +-
 include/exec/softmmu_template.h |5 +++--
 4 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index fff0afb..0399172 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -327,7 +327,7 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env1, 
target_ulong addr)
 cpu_ldub_code(env1, addr);
 }
 pd = env1->iotlb[mmu_idx][page_index] & ~TARGET_PAGE_MASK;
-mr = iotlb_to_region(pd);
+mr = iotlb_to_region(&address_space_memory, pd);
 if (memory_region_is_unassigned(mr)) {
 CPUState *cpu = ENV_GET_CPU(env1);
 CPUClass *cc = CPU_GET_CLASS(cpu);
diff --git a/exec.c b/exec.c
index 189c324..5e17110 100644
--- a/exec.c
+++ b/exec.c
@@ -1618,9 +1618,9 @@ static uint16_t dummy_section(MemoryRegion *mr)
 return phys_section_add(§ion);
 }
 
-MemoryRegion *iotlb_to_region(hwaddr index)
+MemoryRegion *iotlb_to_region(AddressSpace *as, hwaddr index)
 {
-return address_space_memory.dispatch->sections[index & 
~TARGET_PAGE_MASK].mr;
+return as->dispatch->sections[index & ~TARGET_PAGE_MASK].mr;
 }
 
 static void io_mem_init(void)
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index b4dfc07..6129365 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -324,7 +324,7 @@ extern uintptr_t tci_tb_ptr;
 
 void phys_mem_set_alloc(void *(*alloc)(size_t));
 
-struct MemoryRegion *iotlb_to_region(hwaddr index);
+struct MemoryRegion *iotlb_to_region(AddressSpace *as, hwaddr index);
 bool io_mem_read(struct MemoryRegion *mr, hwaddr addr,
  uint64_t *pvalue, unsigned size);
 bool io_mem_write(struct MemoryRegion *mr, hwaddr addr,
diff --git a/include/exec/softmmu_template.h b/include/exec/softmmu_template.h
index c6a5440..69d856a 100644
--- a/include/exec/softmmu_template.h
+++ b/include/exec/softmmu_template.h
@@ -22,6 +22,7 @@
  * License along with this library; if not, see .
  */
 #include "qemu/timer.h"
+#include "exec/address-spaces.h"
 #include "exec/memory.h"
 
 #define DATA_SIZE (1 << SHIFT)
@@ -118,7 +119,7 @@ static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState 
*env,
   uintptr_t retaddr)
 {
 uint64_t val;
-MemoryRegion *mr = iotlb_to_region(physaddr);
+MemoryRegion *mr = iotlb_to_region(&address_space_memory, physaddr);
 
 physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
 env->mem_io_pc = retaddr;
@@ -324,7 +325,7 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
   target_ulong addr,
   uintptr_t retaddr)
 {
-MemoryRegion *mr = iotlb_to_region(physaddr);
+MemoryRegion *mr = iotlb_to_region(&address_space_memory, physaddr);
 
 physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
 if (mr != &io_mem_rom && mr != &io_mem_notdirty && !can_do_io(env)) {
-- 
1.7.10.4

[Qemu-devel] [RFC PATCH 4/8] exec: Make memory_region_section_get_iotlb use section AS

2013-11-22 Thread edgar . iglesias

From: "Edgar E. Iglesias" 

Signed-off-by: Edgar E. Iglesias 
---
 exec.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/exec.c b/exec.c
index 05df217..0162eb3 100644
--- a/exec.c
+++ b/exec.c
@@ -709,7 +709,7 @@ hwaddr memory_region_section_get_iotlb(CPUArchState *env,
 iotlb |= PHYS_SECTION_ROM;
 }
 } else {
-iotlb = section - address_space_memory.dispatch->sections;
+iotlb = section - section->address_space->dispatch->sections;
 iotlb += xlat;
 }
 
-- 
1.7.10.4

[Qemu-devel] [RFC] hw/net: add DEC Tulip NIC emulation

2013-11-22 Thread Dmitry Smagin

From: Antony Pavlov 

This patch adds emulation of DEC/Intel Tulip 21143 with some external chips:
  * Intel LXT971A 10/100 Mbps PHY MII Tranceiver;
  * Microchip 93LC46B 1K Microwire Compatible Serial EEPROM.

Restrictions and TODOs:
  - Tulip always work in promisc-mode with no packet filtering
(TODO: check qemu packet filtering interfaces)

  - Address errors in packet descriptors are not checked

  - Tulip is bound to PCI, no other bus is possible for now
(TODO: Rewrite code to permit Tulip to live on buses other than PCI)

  - Internal transceiver is not emulated
The reason: internal transceiver works with 10 Mbps but we need 100Mbps
which only external transceivers can give.

  - Link status is not emulated, it's always ON
(TODO: use standard qemu interface)

  - Incorrect behavior of the Tulip driver is not checked and there's no error
signaling via register CSR5

  - Only transfer/receive (TI/RI) interrupts are emulated

  - Subsystem IDs in EEPROM image are not good
(TODO: find proper IDs)

TESTING
---

NOTE:
  buildroot with tulip-friendly config for qemu is here:
https://github.com/dmitrysmagin/buildroot.git : master
(configs/qemu_mips_malta_tulip_defconfig)

  $ git clone https://github.com/dmitrysmagin/qemu.git
  $ cd qemu
  $ ./configure --target-list=mips-softmmu
  ...
  $ make
  ...
  $ ./mips-softmmu/qemu-system-mips \
   -M malta \
   -m 256 \
   -nodefaults \
   -nographic \
   -kernel qemu-malta-testing/vmlinux \
   -initrd qemu-malta-testing/rootfs.cpio \
   -serial stdio \
   -net nic,vlan=0,model=tulip,macaddr=00:11:22:33:44:ee

  Linux version 3.12.0 (exmortis@lazar) (gcc version 4.7.3 (Sourcery CodeBench 
Lite 2013.05-66) ) #1 SMP Thu Nov 14 11:58:04 MSK 2013
  ...
  Welcome to Buildroot
  buildroot login: root
  # lspci -d 0x1011:
  00:12.0 Ethernet controller: Digital Equipment Corporation DECchip 21142/43 
(rev 41)
  # ifconfig eth0
  eth0  Link encap:Ethernet  HWaddr 00:11:22:33:44:EE
BROADCAST MULTICAST  MTU:1500  Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

  To test data transfer please use '-net tap,vlan=0,...' or
  '-net vde,vlan=0,...' (depends on your local network settings).

Signed-off-by: Antony Pavlov 
Signed-off-by: Dmitry Smagin 
---
 default-configs/pci.mak |   1 +
 hw/net/Makefile.objs|   1 +
 hw/net/tulip.c  | 680 
 hw/net/tulip_mdio.c | 196 +
 hw/net/tulip_mdio.h |  98 +++
 hw/net/tulip_uwire_eeprom.c | 111 
 hw/net/tulip_uwire_eeprom.h |  60 
 hw/pci/pci.c|   2 +
 include/hw/pci/pci_ids.h|   1 +
 9 files changed, 1150 insertions(+)
 create mode 100644 hw/net/tulip.c
 create mode 100644 hw/net/tulip_mdio.c
 create mode 100644 hw/net/tulip_mdio.h
 create mode 100644 hw/net/tulip_uwire_eeprom.c
 create mode 100644 hw/net/tulip_uwire_eeprom.h

diff --git a/default-configs/pci.mak b/default-configs/pci.mak
index 91b1e92..28a282e 100644
--- a/default-configs/pci.mak
+++ b/default-configs/pci.mak
@@ -18,6 +18,7 @@ CONFIG_MEGASAS_SCSI_PCI=y
 CONFIG_RTL8139_PCI=y
 CONFIG_E1000_PCI=y
 CONFIG_VMXNET3_PCI=y
+CONFIG_TULIP_PCI=y
 CONFIG_IDE_CORE=y
 CONFIG_IDE_QDEV=y
 CONFIG_IDE_PCI=y
diff --git a/hw/net/Makefile.objs b/hw/net/Makefile.objs
index 951cca3..014c24b 100644
--- a/hw/net/Makefile.objs
+++ b/hw/net/Makefile.objs
@@ -10,6 +10,7 @@ common-obj-$(CONFIG_E1000_PCI) += e1000.o
 common-obj-$(CONFIG_RTL8139_PCI) += rtl8139.o
 common-obj-$(CONFIG_VMXNET3_PCI) += vmxnet_tx_pkt.o vmxnet_rx_pkt.o
 common-obj-$(CONFIG_VMXNET3_PCI) += vmxnet3.o
+common-obj-$(CONFIG_TULIP_PCI) += tulip.o tulip_uwire_eeprom.o tulip_mdio.o
 
 common-obj-$(CONFIG_SMC91C111) += smc91c111.o
 common-obj-$(CONFIG_LAN9118) += lan9118.o
diff --git a/hw/net/tulip.c b/hw/net/tulip.c
new file mode 100644
index 000..bd52413
--- /dev/null
+++ b/hw/net/tulip.c
@@ -0,0 +1,680 @@
+/*
+ * QEMU DEC 21143 (Tulip) emulation
+ *
+ * Copyright (C) 2011, 2013 Antony Pavlov
+ * Copyright (C) 2013 Dmitry Smagin
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#include "hw/hw.h"
+#include "hw

[Qemu-devel] [RFC][PATCH] qemu-img: add support for skipping zeroes in input during convert

2013-11-22 Thread Peter Lieven

Signed-off-by: Peter Lieven 
---
 qemu-img.c |   52 ++--
 1 file changed, 30 insertions(+), 22 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index b6b5644..808f8f8 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1121,8 +1121,9 @@ out3:
 
 static int img_convert(int argc, char **argv)
 {
-int c, ret = 0, n, n1, bs_n, bs_i, compress, cluster_size,
+int c, n, n1, bs_n, bs_i, compress, cluster_size,
 cluster_sectors, skip_create;
+int64_t ret = 0;
 int progress = 0, flags;
 const char *fmt, *out_fmt, *cache, *out_baseimg, *out_filename;
 BlockDriver *drv, *proto_drv;
@@ -1479,11 +1480,6 @@ static int img_convert(int argc, char **argv)
 if (nb_sectors <= 0) {
 break;
 }
-if (nb_sectors >= (IO_BUF_SIZE / 512)) {
-n = (IO_BUF_SIZE / 512);
-} else {
-n = nb_sectors;
-}
 
 while (sector_num - bs_offset >= bs_sectors) {
 bs_i ++;
@@ -1495,34 +1491,46 @@ static int img_convert(int argc, char **argv)
sector_num, bs_i, bs_offset, bs_sectors); */
 }
 
-if (n > bs_offset + bs_sectors - sector_num) {
-n = bs_offset + bs_sectors - sector_num;
-}
-
-/* If the output image is being created as a copy on write image,
-   assume that sectors which are unallocated in the input image
-   are present in both the output's and input's base images (no
-   need to copy them). */
-if (out_baseimg) {
-ret = bdrv_is_allocated(bs[bs_i], sector_num - bs_offset,
-n, &n1);
+if (out_baseimg || has_zero_init) {
+n = nb_sectors > INT_MAX ? INT_MAX : nb_sectors;
+ret = bdrv_get_block_status(bs[bs_i], sector_num - bs_offset,
+n, &n1);
 if (ret < 0) {
-error_report("error while reading metadata for sector "
- "%" PRId64 ": %s",
+error_report("error while reading block status of sector 
%" PRId64 ": %s",
  sector_num - bs_offset, strerror(-ret));
 goto out;
 }
-if (!ret) {
+/* If the output image is zero initialized, we are not working
+ * on a shared base and the input is zero we can skip the next
+ * n1 bytes */
+if (!out_baseimg && has_zero_init && ret & BDRV_BLOCK_ZERO) {
+sector_num += n1;
+continue;
+}
+/* If the output image is being created as a copy on write 
image,
+   assume that sectors which are unallocated in the input image
+   are present in both the output's and input's base images (no
+   need to copy them). */
+if (out_baseimg && !(ret & BDRV_BLOCK_DATA)) {
 sector_num += n1;
 continue;
 }
 /* The next 'n1' sectors are allocated in the input image. Copy
only those as they may be followed by unallocated sectors. 
*/
-n = n1;
+nb_sectors = n1;
+}
+
+if (nb_sectors >= (IO_BUF_SIZE / 512)) {
+n = (IO_BUF_SIZE / 512);
 } else {
-n1 = n;
+n = nb_sectors;
+}
+
+if (n > bs_offset + bs_sectors - sector_num) {
+n = bs_offset + bs_sectors - sector_num;
 }
 
+n1 = n;
 ret = bdrv_read(bs[bs_i], sector_num - bs_offset, buf, n);
 if (ret < 0) {
 error_report("error while reading sector %" PRId64 ": %s",
-- 
1.7.9.5

Re: [Qemu-devel] [PATCH for-1.7] seccomp: setting "-sandbox on" by default

2013-11-22 Thread Paul Moore

On Friday, November 22, 2013 11:39:31 AM Stefan Hajnoczi wrote:
> On Thu, Nov 21, 2013 at 10:48:58AM -0500, Paul Moore wrote:
> > I'm always open to suggestions on how to improve the development/debugging
> > process, so if you have any ideas please let me know.
> 
> The failure mode is terrible:

Glad to see you don't feel strongly about things.

>   "The following fails to start here (the shell hangs and ps shows QEMU
>   is a  process)"
> 
> When a process dies the shell prints a warning.  That alerts the user
> and prompts them to take further steps.
> 
> Hanging the shell is a really bad way to fail.  We can't expect users to
> begin searching logs for audit failures.  They probably don't even know
> about audit or seccomp.

First things first, if a normal user hits a seccomp failure I consider it to 
be a bug, and to be honest, I've seen much nastier, much more subtle bugs than 
an inadvertent seccomp "death".  In a perfect world only developers would run 
into this problem and I would expect developers to be smart enough to figure 
out what is going on.

Getting past that, if seccomp is configured to kill the process when it 
violates the filter rules there isn't much else we can do; the kernel kills 
the process and then the rest of userspace, e.g. the shell/libvirt/etc., does 
whatever it does.  We have very little control over things.

> Is it possible to produce output when a seccomp violation takes place?

See the earlier comments from Eduardo about his attempts in this area.

-- 
paul moore
security and virtualization @ redhat

Re: [Qemu-devel] [PATCH] vl: verify if combination of cpus, sockets, cores and threads is sane

2013-11-22 Thread Andreas Färber

Am 22.11.2013 12:20, schrieb Paolo Bonzini:
> Il 22/11/2013 12:13, Peter Lieven ha scritto:
>>> I see where you come from, but I think the potential for this patch to
>>> break some working configuration (for some definition of working) is too
>>> high.  Can you split out the fixes to the "fill in the blanks" logic?
>>
>> I can, but the number of sockets is logal to the parse function.
> 
> Not sure why that matters, just make two patches instead of one.

...and please CC me and possibly Eduardo who refactored it last IIRC.

Andreas

>> What would you think is it okay to just send a warning about
>> the illegal config and drop the exit(1).
> 
> Yes, that would be okay.
> 
> Paolo

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH for-1.7] seccomp: setting "-sandbox on" by default

2013-11-22 Thread Paul Moore

On Friday, November 22, 2013 11:34:41 AM Stefan Hajnoczi wrote:
> IMO this seccomp approach is doomed since QEMU does not practice
> privilege separation.  QEMU is monolithic so it's really hard to create
> a meaningful sets of system calls.

I'm a big fan of decomposing QEMU, but based on previous discussions there 
seems to be a lot of fear from the core QEMU folks around decomposition; 
enough that I'm not sure it is worth the time and effort at this point to 
pursue it.  While I agree that a decomposed QEMU would be able to make better 
use of syscall filtering (and LSM/SELinux protection, and ...) I don't believe 
it means syscall filtering is a complete lost cause with a monolithic QEMU.  
Any improvement you can make, no matter how small, is still and improvement.

> To avoid breaking stuff you need to be too liberal, defeating the purpose of
> seccomp.

Even if you can only disable a few syscalls you are still better off than you 
were before.  Could it be done better, of course it could, but it doesn't mean 
you shouldn't try for some benefit.

> For each QEMU command-line there may be a different set of syscalls that
> should be allowed/forbidden.

I'm not sure if you missed it or not, but I had an email exchange with Eduardo 
on this list about making the syscall whitelist a bit more "intelligent" and 
dependent on what functionality was enabled for a given QEMU instance.  This 
should help a bit with the problems you are describing.

> The existing approach clearly doesn't support the full range of options
> that users specify on the command-line.

Bugs.  It will get fixed in time with more testing/debugging.  Eduardo is 
working on improving the testing and RH's QA folks are working hard to shake 
out the bugs too.  I just posted another bug fix patch to the whitelist a few 
days ago.

> So I guess the options are:
> 
> 1. Don't make it the default since it breaks stuff but use it for very
> specific scenarios (e.g. libvirt use cases that have been well tested).

In my opinion, I think it was probably a bit premature to make enable it by 
default, but at some point in the future I think we do need to do this.

> 2. Provide a kind of syscall set for various QEMU options and apply the
> union of them at launch.  This still seems fragile but in theory it
> could work.

This is what I was discussing above.  I think this is likely the next big 
improvement.

-- 
paul moore
security and virtualization @ redhat

Re: [Qemu-devel] [RFC PATCH 1/2] e1000: Use Address_Available bit as HW latch

2013-11-22 Thread Vlad Yasevich

On 11/22/2013 04:47 AM, Jason Wang wrote:
> On 11/22/2013 04:04 AM, Vlad Yasevich wrote:
>> e1000 provides a E1000_RAH_AV bit on every complete write
>> to the Receive Address Register.  We can use this bit
>> 2 ways:
>>  1) To trigger HMP notifications.  When the bit is set the
>> mac address is fully set and we can update the HMP.
>>
>>  2) We can turn off he bit on the write to low order bits of
>> the Receive Address Register, so that we would not try
>> to match received traffic to this address when it is
>> not completely set.
>>
>> Signed-off-by: Vlad Yasevich 
>> ---
>>  hw/net/e1000.c | 11 ++-
>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/net/e1000.c b/hw/net/e1000.c
>> index ae63591..82978ea 100644
>> --- a/hw/net/e1000.c
>> +++ b/hw/net/e1000.c
>> @@ -1106,10 +1106,19 @@ mac_writereg(E1000State *s, int index, uint32_t val)
>>  
>>  s->mac_reg[index] = val;
>>  
>> -if (index == RA || index == RA + 1) {
>> +switch (index) {
>> +case RA:
>> +/* Mask off AV bit on the write of the low dword.  The write of
>> + * the high dword will set the bit.  This way a half-written
>> + * mac address will not be used to filter on rx.
>> + */
>> +s->mac_reg[RA+1] &= ~E1000_RAH_AV;
> 
> If a stupid driver write high dword first, it won't receive any packets.

I need to ping Intel guys again.  I asked them what happens when only
the low register is set, but haven't heard back.

>> +break;
>> +case (RA + 1):
>>  macaddr[0] = cpu_to_le32(s->mac_reg[RA]);
>>  macaddr[1] = cpu_to_le32(s->mac_reg[RA + 1]);
>>  qemu_format_nic_info_str(qemu_get_queue(s->nic), (uint8_t 
>> *)macaddr);
> 
> Guest may invalid the mac address by clearing the AV bit through writing
> to high dword. So this may notify a wrong mac address.

In this case, testing for the AV bit would solve this issue.
> 
> Generally, we could teset the AV bit before notification, and try to do
> the this on both high and low dword. This obeys specs and
> receive_filter() above.

This will not really help since the AV bit would already be set from the
prior mac address.  So, if a stupid driver writes just the low word,
the AV bit would already be set.

> 
> If we don't want half-written status, driver should clear AV bit before
> each writing of new mac address. But looks like linux and freebsd does
> not do this.  But the window is really small and harmless.

We can emulate this.  Thanks for the idea.

-vlad

>> +break;
>>  }
>>  }
>>  
>

[Qemu-devel] [RESENT PATCH for-1.7] curses: fixup SIGWINCH handler mess

2013-11-22 Thread Gerd Hoffmann

Don't run code in the signal handler, only set a flag.
Use sigaction(2) to avoid non-portable signal(2) semantics.
Make #ifdefs less messy.

Signed-off-by: Gerd Hoffmann 
Reviewed-by: Laszlo Ersek 
---
 ui/curses.c | 44 
 1 file changed, 28 insertions(+), 16 deletions(-)

diff --git a/ui/curses.c b/ui/curses.c
index 289a955..dbc3d5e 100644
--- a/ui/curses.c
+++ b/ui/curses.c
@@ -106,9 +106,9 @@ static void curses_resize(DisplayChangeListener *dcl,
 curses_calc_pad();
 }
 
-#ifndef _WIN32
-#if defined(SIGWINCH) && defined(KEY_RESIZE)
-static void curses_winch_handler(int signum)
+#if !defined(_WIN32) && defined(SIGWINCH) && defined(KEY_RESIZE)
+static volatile sig_atomic_t got_sigwinch;
+static void curses_winch_check(void)
 {
 struct winsize {
 unsigned short ws_row;
@@ -117,18 +117,34 @@ static void curses_winch_handler(int signum)
 unsigned short ws_ypixel;   /* unused */
 } ws;
 
-/* terminal size changed */
-if (ioctl(1, TIOCGWINSZ, &ws) == -1)
+if (!got_sigwinch) {
+return;
+}
+got_sigwinch = false;
+
+if (ioctl(1, TIOCGWINSZ, &ws) == -1) {
 return;
+}
 
 resize_term(ws.ws_row, ws.ws_col);
-curses_calc_pad();
 invalidate = 1;
+}
 
-/* some systems require this */
-signal(SIGWINCH, curses_winch_handler);
+static void curses_winch_handler(int signum)
+{
+got_sigwinch = true;
 }
-#endif
+
+static void curses_winch_init(void)
+{
+struct sigaction old, winch = {
+.sa_handler  = curses_winch_handler,
+};
+sigaction(SIGWINCH, &winch, &old);
+}
+#else
+static void curses_winch_check(void) {}
+static void curses_winch_init(void) {}
 #endif
 
 static void curses_cursor_position(DisplayChangeListener *dcl,
@@ -163,6 +179,8 @@ static void curses_refresh(DisplayChangeListener *dcl)
 {
 int chr, nextchr, keysym, keycode, keycode_alt;
 
+curses_winch_check();
+
 if (invalidate) {
 clear();
 refresh();
@@ -349,13 +367,7 @@ void curses_display_init(DisplayState *ds, int full_screen)
 curses_keyboard_setup();
 atexit(curses_atexit);
 
-#ifndef _WIN32
-#if defined(SIGWINCH) && defined(KEY_RESIZE)
-/* some curses implementations provide a handler, but we
- * want to be sure this is handled regardless of the library */
-signal(SIGWINCH, curses_winch_handler);
-#endif
-#endif
+curses_winch_init();
 
 dcl = (DisplayChangeListener *) g_malloc0(sizeof(DisplayChangeListener));
 dcl->ops = &dcl_ops;
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH RFC 0/3] add direct support of event in qapi schema

2013-11-22 Thread Luiz Capitulino

On Fri, 22 Nov 2013 10:03:44 +0800
Wenchao Xia  wrote:

> 于 2013/11/13 9:44, Wenchao Xia 写道:
> > This series add support for tag/keyword 'event' in qapi-schema.
> > The implemention doesn't generate a struture and visit function
> > in the background for every event, so it doesn't support nested
> > structure in the define to avoid trouble.
> > 
> > It is on top of series:
> > http://lists.nongnu.org/archive/html/qemu-devel/2013-11/msg01574.html
> > 
> > Wenchao Xia (3):
> >1 os-posix: include sys/time.h
> >2 qapi script: add support of event
> >3 tests: add test cases for qapi event support
> > 
> >   Makefile|9 +-
> >   Makefile.objs   |2 +-
> >   include/sysemu/os-posix.h   |2 +
> >   qapi/Makefile.objs  |1 +
> >   scripts/qapi-event.py   |  355 
> > +++
> >   tests/Makefile  |   14 +-
> >   tests/qapi-schema/qapi-schema-test.json |   12 +
> >   tests/qapi-schema/qapi-schema-test.out  |   10 +-
> >   tests/test-qmp-event.c  |  250 ++
> >   9 files changed, 646 insertions(+), 9 deletions(-)
> >   create mode 100644 scripts/qapi-event.py
> >   create mode 100644 tests/test-qmp-event.c
> > 
>   Luiz, do you think this series is in the right direction?

Sorry, but I haven't reviewed it yet. Will do it next week.

Re: [Qemu-devel] [PATCH 21/27] pc: add memory hotplug 440fx machine

2013-11-22 Thread Gerd Hoffmann

On Do, 2013-11-21 at 03:38 +0100, Igor Mammedov wrote:
> Add DimmBus for memory hotplug below 4Gb or above 4Gb depending
> on initial memory size and hotplug memory size.
> 
> * if ram_size is less than 32-bit PCI hole start, reserve
>   hotplug memory region as [ram_size,32bit-PCIhole-start)
>   if hotplug memory region fits there,
>   otherwise reserve hotplug memory region after "0x1ULL
>   + above_4g_mem_size"

Hmm, 32-bit pci hole start depends on ram size ...

Does it make sense to hotplug memory above 4g unconditionally to
simplify things?

cheers,
  Gerd

[Qemu-devel] [PATCH] vl.c: clean up missing parameters calculation in smp_parse

2013-11-22 Thread Peter Lieven

Signed-off-by: Peter Lieven 
---
 vl.c |   28 +---
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/vl.c b/vl.c
index 8d5d874..470a8d1 100644
--- a/vl.c
+++ b/vl.c
@@ -1385,35 +1385,33 @@ static QemuOptsList qemu_smp_opts = {
 static void smp_parse(QemuOpts *opts)
 {
 if (opts) {
-
 unsigned cpus= qemu_opt_get_number(opts, "cpus", 0);
 unsigned sockets = qemu_opt_get_number(opts, "sockets", 0);
 unsigned cores   = qemu_opt_get_number(opts, "cores", 0);
 unsigned threads = qemu_opt_get_number(opts, "threads", 0);
 
 /* compute missing values, prefer sockets over cores over threads */
-if (cpus == 0 || sockets == 0) {
+if (cpus == 0) {
 sockets = sockets > 0 ? sockets : 1;
 cores = cores > 0 ? cores : 1;
 threads = threads > 0 ? threads : 1;
-if (cpus == 0) {
-cpus = cores * threads * sockets;
-}
+cpus = cores * threads * sockets;
+} else if (sockets == 0) {
+cores = cores > 0 ? cores : 1;
+threads = threads > 0 ? threads : 1;
+sockets = cpus / (cores * threads);
+} else if (cores == 0) {
+threads = threads > 0 ? threads : 1;
+cores = cpus / (sockets * threads);
 } else {
-if (cores == 0) {
-threads = threads > 0 ? threads : 1;
-cores = cpus / (sockets * threads);
-} else {
-threads = cpus / (cores * sockets);
-}
+threads = cpus / (sockets * cores);
 }
 
-max_cpus = qemu_opt_get_number(opts, "maxcpus", 0);
-
 smp_cpus = cpus;
-smp_cores = cores > 0 ? cores : 1;
-smp_threads = threads > 0 ? threads : 1;
+smp_cores = cores;
+smp_threads = threads;
 
+max_cpus = qemu_opt_get_number(opts, "maxcpus", 0);
 }
 
 if (max_cpus == 0) {
-- 
1.7.9.5

Re: [Qemu-devel] [PATCH 3/6] qemu-option: Add qemu_config_parse_qdict()

2013-11-22 Thread Max Reitz


On 22.11.2013 09:08, Fam Zheng wrote:

On 2013年11月22日 03:05, Max Reitz wrote:

This function basically parses command-line options given as a QDict
replacing a config file.

For instance, the QDict {"section.opt1": 42, "section.opt2": 23}
corresponds to the config file:

[section]
opt1 = 42
opt2 = 23

It is possible to specify multiple sections and also multiple sections
of the same type. On the command line, this looks like the following:

inject-error.0.event=reftable_load,\
inject-error.1.event=l2_load,\
set-state.event=l1_update

This would correspond to the following config file:

[inject-error "inject-error.0"]
event = reftable_load

[inject-error "inject-error.1"]
event = l2_load

[set-state]
event = l1_update

Signed-off-by: Max Reitz 
---
  include/qemu/config-file.h |   6 +++
  util/qemu-config.c | 111 
+

  2 files changed, 117 insertions(+)

diff --git a/include/qemu/config-file.h b/include/qemu/config-file.h
index 508428f..dbd97c4 100644
--- a/include/qemu/config-file.h
+++ b/include/qemu/config-file.h
@@ -4,6 +4,7 @@
  #include 
  #include "qemu/option.h"
  #include "qapi/error.h"
+#include "qapi/qmp/qdict.h"

  QemuOptsList *qemu_find_opts(const char *group);
  QemuOptsList *qemu_find_opts_err(const char *group, Error **errp);
@@ -18,6 +19,11 @@ int qemu_config_parse(FILE *fp, QemuOptsList 
**lists, const char *fname);


  int qemu_read_config_file(const char *filename);

+/* Parse QDict options as a replacement for a config file (allowing 
multiple

+   enumerated (0..(n-1)) configuration "sections") */
+void qemu_config_parse_qdict(QDict *options, QemuOptsList **lists,
+ Error **errp);
+
  /* Read default QEMU config files
   */
  int qemu_read_default_config_files(bool userconfig);
diff --git a/util/qemu-config.c b/util/qemu-config.c
index 04da942..80d82d4 100644
--- a/util/qemu-config.c
+++ b/util/qemu-config.c
@@ -356,3 +356,114 @@ int qemu_read_config_file(const char *filename)
  return -EINVAL;
  }
  }
+
+static void config_parse_qdict_section(QDict *options, QemuOptsList 
*opts,

+   Error **errp)
+{
+QemuOpts *subopts, *subopts_i;
+QDict *subqdict, *subqdict_i = NULL;
+char *prefix = g_strdup_printf("%s.", opts->name);
+Error *local_err = NULL;
+size_t orig_size, enum_size;
+
+qdict_extract_subqdict(options, &subqdict, prefix);
+orig_size = qdict_size(subqdict);
+if (!orig_size) {
+goto out;
+}
+
+subopts = qemu_opts_create_nofail(opts);
+qemu_opts_absorb_qdict(subopts, subqdict, &local_err);
+if (error_is_set(&local_err)) {
+error_propagate(errp, local_err);
+goto out;
+}
+
+enum_size = qdict_size(subqdict);
+if (enum_size < orig_size && enum_size) {
+error_setg(errp, "Unknown option '%s' for '%s'",
+   qdict_first(subqdict)->key, opts->name);
+goto out;
+}
+
+if (enum_size) {
+/* Multiple, enumerated rules */
+int i;
+
+/* Not required anymore */
+qemu_opts_del(subopts);
+
+for (i = 0; i < INT_MAX; i++) {
+char i_prefix[32], opt_name[48];
+size_t snprintf_ret;
+
+snprintf_ret = snprintf(i_prefix, 32, "%i.", i);
+assert(snprintf_ret < 32);
+
+snprintf_ret = snprintf(opt_name, 48, "%s.%i", 
opts->name, i);

+assert(snprintf_ret < 48);


Is there a length limit of opts->name? I.e. is this an error case 
rather than an assertion case?


Hm, yes, I had most of the code directly in block/blkdebug.c at first, 
that's why there are some issues here which work fine just for blkdebug, 
but are not that good in the general case. I'll use g_strdup_sprintf 
instead.





+
+qdict_extract_subqdict(subqdict, &subqdict_i, i_prefix);
+if (!qdict_size(subqdict_i)) {
+break;
+}
+
+subopts_i = qemu_opts_create(opts, opt_name, 1, NULL);


Could pass in errp and skip error_setg below.


Oh, right, missed that, thanks.




+if (!subopts_i) {
+error_setg(errp, "Option ID '%s' already in use", 
opt_name);

+goto out;
+}
+
+qemu_opts_absorb_qdict(subopts_i, subqdict_i, &local_err);
+if (error_is_set(&local_err)) {
+error_propagate(errp, local_err);
+qemu_opts_del(subopts_i);
+goto out;
+}
+
+if (qdict_size(subqdict_i)) {
+error_setg(errp, "[%s] rules don't support the 
option '%s'",


s/rules don't/section doesn't/


Yes, this one also is fine for blkdebug, but not so much for the general 
case.





+   opts->name, qdict_first(subqdict_i)->key);
+qemu_opts_del(subopts_i);
+goto out;
+}
+
+/*
+if (add_rule(subopts_i, (void *)ard)

Re: [Qemu-devel] [PATCH 00/21] RFCv2: add Spice block device

2013-11-22 Thread Marc-André Lureau

Hi there,

Even though there is no rush to review this series, it would be nice
to get the first ~10 patches. It would make the next rebase easier,
and the review smaller.

cheers

On Mon, Nov 18, 2013 at 1:25 PM, Marc-André Lureau
 wrote:
> Hi,
>
> The following patch series implement a Spice block device, which
> allows the client to redirect a block device using the NBD protocol,
> which greatly simplifies the Spice code by reusing an existing
> protocol, and allows sharing existing qemu NBD implementation.
>
> The backend only support read-only device atm (although it shouldn't
> be hard to add write support if necessary)
>
> Usage with a CDROM drive:
>  -device ide-cd,drive=cd -drive if=none,id=cd,readonly,file=spicebd:
>
> The associated server and client bits are:
> http://lists.freedesktop.org/archives/spice-devel/2013-June/013608.html
> http://lists.freedesktop.org/archives/spice-devel/2013-November/015452.html
> http://lists.freedesktop.org/archives/spice-devel/2013-November/015431.html
>
> Caveats: This block device driver is a bit special, since it is
> successfully initialized with size 0, and once the client is connected
> (or want to change block device) it re-opens itself. For this to work,
> we allow a block driver to be open with an existing opaque data. We
> also save the associate device name in the block drivers.
>
> During migration, the source needs to be able to flush pending
> operations, so the Spice channel context must be in a running loop. A
> modification to the Spice server API allows to associate a particular
> channel with the AIO loop, and may be used in the future to associate
> channels with different context or athreads. However, the AIO context
> doesn't have timers yet. Since they aren't really needed for the NBD
> channel, it's not a problem. I have been told timers in AIO are on
> their way, so this could be updated later.
>
> Since the block driver state is not migrated, the destination needs to
> wait until the block driver is initialized before the VM can run. This
> is done with a simple hold count. It is also necessary to avoid extra
> media changed notifications, which is easily done by checking
> migration state.
>
>
> Marc-André Lureau (21):
>   vscclient: do not add a socket watch if there is not data to send
>   spice-char: remove unused field
>   qmp_change_blockdev() remove unused has_format
>   include: add missing config-host.h include
>   char: add qemu_chr_fe_event()
>   Split nbd block client code
>   nbd: don't change socket block during negotiate
>   nbd: pass export name as init argument
>   nbd: make session_close() idempotent
>   nbd: finish any pending coroutine
>   nbd: avoid uninitialized warnings
>   block: save the associated child name in BlockDriverState
>   blockdev: add qmp_change_blockdev_int()
>   block: extract make_snapshot() from bdrv_open()
>   block: add "snapshot.size" option to avoid extra bdrv_open()
>   block: learn to open a driver with a given opaque
>   block: allow to call bdrv_open() with an opaque
>   block: do not notify change during migration
>   sysemu: add vm_start_hold/release
>   spice-core: allow an interface to be in AIO context
>   block: add spice block device backend
>
>  block.c   | 225 ---
>  block/Makefile.objs   |   3 +-
>  block/nbd-client.c| 384 +
>  block/nbd-client.h|  50 +
>  block/nbd.c   | 380 +++-
>  block/spicebd.c   | 536 
> ++
>  blockdev.c|  24 ++-
>  hw/block/fdc.c|   8 +-
>  hw/ide/core.c |  12 +-
>  hw/scsi/scsi-disk.c   |  11 +-
>  hw/sd/sd.c|   6 +-
>  include/block/block.h |   2 +-
>  include/block/block_int.h |   1 +
>  include/sysemu/blockdev.h |   5 +-
>  include/sysemu/char.h |  10 +
>  include/sysemu/sysemu.h   |   2 +
>  include/ui/qemu-spice.h   |   4 +-
>  libcacard/vscclient.c |  10 +-
>  nbd.c |   1 -
>  qemu-char.c   |   9 +-
>  qmp.c |   2 +-
>  spice-qemu-char.c |  20 +-
>  stubs/vm-stop.c   |   5 +
>  ui/spice-core.c   |  62 +-
>  vl.c  |  17 ++
>  25 files changed, 1320 insertions(+), 469 deletions(-)
>  create mode 100644 block/nbd-client.c
>  create mode 100644 block/nbd-client.h
>  create mode 100644 block/spicebd.c
>
> --
> 1.8.3.1
>



-- 
Marc-André Lureau

[Qemu-devel] [PATCH v3 08/19] block drivers: add discard/write_zeroes properties to bdrv_get_info implementation

2013-11-22 Thread Paolo Bonzini

Signed-off-by: Paolo Bonzini 
---
 block/qcow2.c | 2 ++
 block/qed.c   | 2 ++
 block/vdi.c   | 1 +
 block/vhdx.c  | 3 +++
 block/vpc.c   | 1 +
 5 files changed, 9 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 2fe37ed..1542750 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1892,6 +1892,8 @@ static coroutine_fn int 
qcow2_co_flush_to_os(BlockDriverState *bs)
 static int qcow2_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
 {
 BDRVQcowState *s = bs->opaque;
+bdi->unallocated_blocks_are_zero = true;
+bdi->can_write_zeroes_with_unmap = (s->qcow_version >= 3);
 bdi->cluster_size = s->cluster_size;
 bdi->vm_state_offset = qcow2_vm_state_offset(s);
 return 0;
diff --git a/block/qed.c b/block/qed.c
index adc2736..59516a5 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -1475,6 +1475,8 @@ static int bdrv_qed_get_info(BlockDriverState *bs, 
BlockDriverInfo *bdi)
 memset(bdi, 0, sizeof(*bdi));
 bdi->cluster_size = s->header.cluster_size;
 bdi->is_dirty = s->header.features & QED_F_NEED_CHECK;
+bdi->unallocated_blocks_are_zero = true;
+bdi->can_write_zeroes_with_unmap = true;
 return 0;
 }
 
diff --git a/block/vdi.c b/block/vdi.c
index b6ec002..2d7490f 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -331,6 +331,7 @@ static int vdi_get_info(BlockDriverState *bs, 
BlockDriverInfo *bdi)
 logout("\n");
 bdi->cluster_size = s->block_size;
 bdi->vm_state_offset = 0;
+bdi->unallocated_blocks_are_zero = true;
 return 0;
 }
 
diff --git a/block/vhdx.c b/block/vhdx.c
index ed6fa53..67bbe10 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -1049,6 +1049,9 @@ static int vhdx_get_info(BlockDriverState *bs, 
BlockDriverInfo *bdi)
 
 bdi->cluster_size = s->block_size;
 
+bdi->unallocated_blocks_are_zero =
+(s->params.data_bits & VHDX_PARAMS_HAS_PARENT) == 0;
+
 return 0;
 }
 
diff --git a/block/vpc.c b/block/vpc.c
index 551876f..1d326cb 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -464,6 +464,7 @@ static int vpc_get_info(BlockDriverState *bs, 
BlockDriverInfo *bdi)
 bdi->cluster_size = s->block_size;
 }
 
+bdi->unallocated_blocks_are_zero = true;
 return 0;
 }
 
-- 
1.8.4.2

Re: [Qemu-devel] Buildbot failure: MinGW (was: qga/vss-win32/requester.h compile error)

2013-11-22 Thread Gerd Hoffmann

  Hi,

> > 'make check' for MinGW cross builds needs an installation of wine to run
> > the resulting exe files.

I don't think it is a good idea to run tests for cross builds by
default.

cheers,
  Gerd

[Qemu-devel] [PATCH v3 17/19] scsi-disk: catch write protection errors in UNMAP

2013-11-22 Thread Paolo Bonzini

This is the same that is already done for WRITE SAME.

Signed-off-by: Paolo Bonzini 
---
 hw/scsi/scsi-disk.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 74e6a14..4138268 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -1543,6 +1543,7 @@ done:
 
 static void scsi_disk_emulate_unmap(SCSIDiskReq *r, uint8_t *inbuf)
 {
+SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
 uint8_t *p = inbuf;
 int len = r->req.cmd.xfer;
 UnmapCBData *data;
@@ -1560,6 +1561,11 @@ static void scsi_disk_emulate_unmap(SCSIDiskReq *r, 
uint8_t *inbuf)
 goto invalid_param_len;
 }
 
+if (bdrv_is_read_only(s->qdev.conf.bs)) {
+scsi_check_condition(r, SENSE_CODE(WRITE_PROTECTED));
+return;
+}
+
 data = g_new0(UnmapCBData, 1);
 data->r = r;
 data->inbuf = &p[8];
-- 
1.8.4.2

[Qemu-devel] [PATCH v3 16/19] qemu-iotests: 033 is fast

2013-11-22 Thread Paolo Bonzini

Signed-off-by: Paolo Bonzini 
---
 tests/qemu-iotests/group | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index b63b18c..303e0f3 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -39,7 +39,7 @@
 030 rw auto backing
 031 rw auto quick
 032 rw auto
-033 rw auto
+033 rw auto quick
 034 rw auto backing
 035 rw auto quick
 036 rw auto quick
-- 
1.8.4.2

[Qemu-devel] [PATCH v3 06/19] block: make bdrv_co_do_write_zeroes stricter in producing aligned requests

2013-11-22 Thread Paolo Bonzini

Right now, bdrv_co_do_write_zeroes will only try to align the
beginning of the request.  However, it is simpler for many
formats to expect the block layer to separate both the head *and*
the tail.  This makes sure that the format's bdrv_co_write_zeroes
function will be called with aligned sector_num and nb_sectors for
the bulk of the request.

Signed-off-by: Paolo Bonzini 
---
 block.c | 35 +++
 1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/block.c b/block.c
index b18ee6b..de66b21 100644
--- a/block.c
+++ b/block.c
@@ -2768,14 +2768,21 @@ static int coroutine_fn 
bdrv_co_do_write_zeroes(BlockDriverState *bs,
 while (nb_sectors > 0 && !ret) {
 int num = nb_sectors;
 
-/* align request */
-if (bs->bl.write_zeroes_alignment &&
-num >= bs->bl.write_zeroes_alignment &&
-sector_num % bs->bl.write_zeroes_alignment) {
-if (num > bs->bl.write_zeroes_alignment) {
+/* Align request.  Block drivers can expect the "bulk" of the request
+ * to be aligned.
+ */
+if (bs->bl.write_zeroes_alignment
+&& num > bs->bl.write_zeroes_alignment) {
+if (sector_num % bs->bl.write_zeroes_alignment != 0) {
+/* Make a small request up to the first aligned sector.  */
 num = bs->bl.write_zeroes_alignment;
+num -= sector_num % bs->bl.write_zeroes_alignment;
+} else if ((sector_num + num) % bs->bl.write_zeroes_alignment != 
0) {
+/* Shorten the request to the last aligned sector.  num cannot
+ * underflow because num > bs->bl.write_zeroes_alignment.
+ */
+num -= (sector_num + num) % bs->bl.write_zeroes_alignment;
 }
-num -= sector_num % bs->bl.write_zeroes_alignment;
 }
 
 /* limit request size */
@@ -2793,16 +2800,20 @@ static int coroutine_fn 
bdrv_co_do_write_zeroes(BlockDriverState *bs,
 /* Fall back to bounce buffer if write zeroes is unsupported */
 iov.iov_len = num * BDRV_SECTOR_SIZE;
 if (iov.iov_base == NULL) {
-/* allocate bounce buffer only once and ensure that it
- * is big enough for this and all future requests.
- */
-size_t bufsize = num <= nb_sectors ? num : max_write_zeroes;
-iov.iov_base = qemu_blockalign(bs, bufsize * BDRV_SECTOR_SIZE);
-memset(iov.iov_base, 0, bufsize * BDRV_SECTOR_SIZE);
+iov.iov_base = qemu_blockalign(bs, num * BDRV_SECTOR_SIZE);
+memset(iov.iov_base, 0, num * BDRV_SECTOR_SIZE);
 }
 qemu_iovec_init_external(&qiov, &iov, 1);
 
 ret = drv->bdrv_co_writev(bs, sector_num, num, &qiov);
+
+/* Keep bounce buffer around if it is big enough for all
+ * all future requests.
+ */
+if (num < max_write_zeroes) {
+qemu_vfree(iov.iov_base);
+iov.iov_base = NULL;
+}
 }
 
 sector_num += num;
-- 
1.8.4.2

[Qemu-devel] [PATCH v3 19/19] scsi-disk: correctly implement WRITE SAME

2013-11-22 Thread Paolo Bonzini

Fetch the data to be written from the input buffer.  If it is all zeroes,
we can use the write_zeroes call (possibly with the new MAY_UNMAP flag).
Otherwise, do as many write cycles as needed, writing 512k at a time.

Strictly speaking, this is still incorrect because a zero cluster should
only be written if the MAY_UNMAP flag is set.  But this is a bug in qcow2
and the other formats, not in the SCSI code.

Signed-off-by: Paolo Bonzini 
---
 hw/scsi/scsi-disk.c | 140 +++-
 1 file changed, 116 insertions(+), 24 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 0640bb0..efadfc0 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -41,6 +41,7 @@ do { printf("scsi-disk: " fmt , ## __VA_ARGS__); } while (0)
 #include 
 #endif
 
+#define SCSI_WRITE_SAME_MAX 524288
 #define SCSI_DMA_BUF_SIZE   131072
 #define SCSI_MAX_INQUIRY_LEN256
 #define SCSI_MAX_MODE_LEN   256
@@ -634,6 +635,8 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, 
uint8_t *outbuf)
 buflen = 0x40;
 memset(outbuf + 4, 0, buflen - 4);
 
+outbuf[4] = 0x1; /* wsnz */
+
 /* optimal transfer length granularity */
 outbuf[6] = (min_io_size >> 8) & 0xff;
 outbuf[7] = min_io_size & 0xff;
@@ -1589,6 +1592,111 @@ invalid_field:
 scsi_check_condition(r, SENSE_CODE(INVALID_FIELD));
 }
 
+typedef struct WriteSameCBData {
+SCSIDiskReq *r;
+int64_t sector;
+int nb_sectors;
+QEMUIOVector qiov;
+struct iovec iov;
+} WriteSameCBData;
+
+static void scsi_write_same_complete(void *opaque, int ret)
+{
+WriteSameCBData *data = opaque;
+SCSIDiskReq *r = data->r;
+SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
+
+assert(r->req.aiocb != NULL);
+r->req.aiocb = NULL;
+bdrv_acct_done(s->qdev.conf.bs, &r->acct);
+if (r->req.io_canceled) {
+goto done;
+}
+
+if (ret < 0) {
+if (scsi_handle_rw_error(r, -ret)) {
+goto done;
+}
+}
+
+data->nb_sectors -= data->iov.iov_len / 512;
+data->sector += data->iov.iov_len / 512;
+data->iov.iov_len = MIN(data->nb_sectors * 512, data->iov.iov_len);
+if (data->iov.iov_len) {
+bdrv_acct_start(s->qdev.conf.bs, &r->acct, data->iov.iov_len, 
BDRV_ACCT_WRITE);
+r->req.aiocb = bdrv_aio_writev(s->qdev.conf.bs, data->sector,
+   &data->qiov, data->iov.iov_len / 512,
+   scsi_write_same_complete, r);
+return;
+}
+
+scsi_req_complete(&r->req, GOOD);
+
+done:
+if (!r->req.io_canceled) {
+scsi_req_unref(&r->req);
+}
+qemu_vfree(data->iov.iov_base);
+g_free(data);
+}
+
+static void scsi_disk_emulate_write_same(SCSIDiskReq *r, uint8_t *inbuf)
+{
+SCSIRequest *req = &r->req;
+SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, req->dev);
+uint32_t nb_sectors = scsi_data_cdb_length(r->req.cmd.buf);
+WriteSameCBData *data;
+uint8_t *buf;
+int i;
+
+/* Fail if PBDATA=1 or LBDATA=1 or ANCHOR=1.  */
+if (nb_sectors == 0 || (req->cmd.buf[1] & 0x16)) {
+scsi_check_condition(r, SENSE_CODE(INVALID_FIELD));
+return;
+}
+
+if (bdrv_is_read_only(s->qdev.conf.bs)) {
+scsi_check_condition(r, SENSE_CODE(WRITE_PROTECTED));
+return;
+}
+if (!check_lba_range(s, r->req.cmd.lba, nb_sectors)) {
+scsi_check_condition(r, SENSE_CODE(LBA_OUT_OF_RANGE));
+return;
+}
+
+if (buffer_is_zero(inbuf, s->qdev.blocksize)) {
+int flags = (req->cmd.buf[1] & 0x8) ? BDRV_REQ_MAY_UNMAP : 0;
+
+/* The request is used as the AIO opaque value, so add a ref.  */
+scsi_req_ref(&r->req);
+bdrv_acct_start(s->qdev.conf.bs, &r->acct, nb_sectors * 
s->qdev.blocksize,
+BDRV_ACCT_WRITE);
+r->req.aiocb = bdrv_aio_write_zeroes(s->qdev.conf.bs,
+ r->req.cmd.lba * 
(s->qdev.blocksize / 512),
+ nb_sectors * (s->qdev.blocksize / 
512),
+ flags, scsi_aio_complete, r);
+return;
+}
+
+data = g_new0(WriteSameCBData, 1);
+data->r = r;
+data->sector = r->req.cmd.lba * (s->qdev.blocksize / 512);
+data->nb_sectors = nb_sectors * (s->qdev.blocksize / 512);
+data->iov.iov_len = MIN(data->nb_sectors * 512, SCSI_WRITE_SAME_MAX);
+data->iov.iov_base = buf = qemu_blockalign(s->qdev.conf.bs, 
data->iov.iov_len);
+qemu_iovec_init_external(&data->qiov, &data->iov, 1);
+
+for (i = 0; i < data->iov.iov_len; i += s->qdev.blocksize) {
+memcpy(&buf[i], inbuf, s->qdev.blocksize);
+}
+
+scsi_req_ref(&r->req);
+bdrv_acct_start(s->qdev.conf.bs, &r->acct, data->iov.iov_len, 
BDRV_ACCT_WRITE);
+r->req.aiocb = bdrv_aio_writev(s->qdev.conf

[Qemu-devel] [PATCH v3 14/19] raw-posix: implement write_zeroes with MAY_UNMAP for block devices

2013-11-22 Thread Paolo Bonzini

See the next commit for the description of the Linux kernel problem
that is worked around in raw_open_common.

Signed-off-by: Paolo Bonzini 
---
 block/raw-posix.c | 37 +
 1 file changed, 37 insertions(+)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 7f3f47d..b3feed6 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -334,6 +334,22 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
 if (S_ISREG(st.st_mode)) {
 s->discard_zeroes = true;
 }
+if (S_ISBLK(st.st_mode)) {
+#ifdef BLKDISCARDZEROES
+unsigned int arg;
+if (ioctl(s->fd, BLKDISCARDZEROES, &arg) == 0 && arg) {
+s->discard_zeroes = true;
+}
+#endif
+#ifdef __linux__
+/* On Linux 3.10, BLKDISCARD leaves stale data in the page cache.  Do
+ * not rely on the contents of discarded blocks unless using O_DIRECT.
+ */
+if (!(bs->open_flags & BDRV_O_NOCACHE)) {
+s->discard_zeroes = false;
+}
+#endif
+}
 
 #ifdef CONFIG_XFS
 if (platform_test_xfs_fd(s->fd)) {
@@ -1586,6 +1602,26 @@ static coroutine_fn BlockDriverAIOCB 
*hdev_aio_discard(BlockDriverState *bs,
cb, opaque, QEMU_AIO_DISCARD|QEMU_AIO_BLKDEV);
 }
 
+static coroutine_fn int hdev_co_write_zeroes(BlockDriverState *bs,
+int64_t sector_num, int nb_sectors, BdrvRequestFlags flags)
+{
+BDRVRawState *s = bs->opaque;
+int rc;
+
+rc = fd_open(bs);
+if (rc < 0) {
+return rc;
+}
+if (!(flags & BDRV_REQ_MAY_UNMAP)) {
+return -ENOTSUP;
+}
+if (!s->discard_zeroes) {
+return -ENOTSUP;
+}
+return paio_submit_co(bs, s->fd, sector_num, NULL, nb_sectors,
+  QEMU_AIO_DISCARD|QEMU_AIO_BLKDEV);
+}
+
 static int hdev_create(const char *filename, QEMUOptionParameter *options,
Error **errp)
 {
@@ -1638,6 +1674,7 @@ static BlockDriver bdrv_host_device = {
 .bdrv_reopen_abort   = raw_reopen_abort,
 .bdrv_create= hdev_create,
 .create_options = raw_create_options,
+.bdrv_co_write_zeroes = hdev_co_write_zeroes,
 
 .bdrv_aio_readv= raw_aio_readv,
 .bdrv_aio_writev   = raw_aio_writev,
-- 
1.8.4.2

[Qemu-devel] [PATCH v3 13/19] raw-posix: implement write_zeroes with MAY_UNMAP for files

2013-11-22 Thread Paolo Bonzini

Writing zeroes to a file can be done by punching a hole if
MAY_UNMAP is set.

Note that in this case ENOTSUP is not ignored, but makes
the block layer fall back to the generic implementation.

Signed-off-by: Paolo Bonzini 
---
 block/raw-posix.c | 66 +--
 trace-events  |  1 +
 2 files changed, 65 insertions(+), 2 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index cfa3162..7f3f47d 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -139,9 +139,10 @@ typedef struct BDRVRawState {
 void *aio_ctx;
 #endif
 #ifdef CONFIG_XFS
-bool is_xfs : 1;
+bool is_xfs:1;
 #endif
-bool has_discard : 1;
+bool has_discard:1;
+bool discard_zeroes:1;
 } BDRVRawState;
 
 typedef struct BDRVRawReopenState {
@@ -283,6 +284,7 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
 Error *local_err = NULL;
 const char *filename;
 int fd, ret;
+struct stat st;
 
 opts = qemu_opts_create_nofail(&raw_runtime_opts);
 qemu_opts_absorb_qdict(opts, options, &local_err);
@@ -324,6 +326,15 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
 #endif
 
 s->has_discard = true;
+
+if (fstat(s->fd, &st) < 0) {
+error_setg_errno(errp, errno, "Could not stat file");
+goto fail;
+}
+if (S_ISREG(st.st_mode)) {
+s->discard_zeroes = true;
+}
+
 #ifdef CONFIG_XFS
 if (platform_test_xfs_fd(s->fd)) {
 s->is_xfs = true;
@@ -787,6 +798,29 @@ static int aio_worker(void *arg)
 return ret;
 }
 
+static int paio_submit_co(BlockDriverState *bs, int fd,
+int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
+int type)
+{
+RawPosixAIOData *acb = g_slice_new(RawPosixAIOData);
+ThreadPool *pool;
+
+acb->bs = bs;
+acb->aio_type = type;
+acb->aio_fildes = fd;
+
+if (qiov) {
+acb->aio_iov = qiov->iov;
+acb->aio_niov = qiov->niov;
+}
+acb->aio_nbytes = nb_sectors * 512;
+acb->aio_offset = sector_num * 512;
+
+trace_paio_submit_co(sector_num, nb_sectors, type);
+pool = aio_get_thread_pool(bdrv_get_aio_context(bs));
+return thread_pool_submit_co(pool, aio_worker, acb);
+}
+
 static BlockDriverAIOCB *paio_submit(BlockDriverState *bs, int fd,
 int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
 BlockDriverCompletionFunc *cb, void *opaque, int type)
@@ -1199,6 +1233,31 @@ static coroutine_fn BlockDriverAIOCB 
*raw_aio_discard(BlockDriverState *bs,
cb, opaque, QEMU_AIO_DISCARD);
 }
 
+static int coroutine_fn raw_co_write_zeroes(
+BlockDriverState *bs, int64_t sector_num,
+int nb_sectors, BdrvRequestFlags flags)
+{
+BDRVRawState *s = bs->opaque;
+
+if (!(flags & BDRV_REQ_MAY_UNMAP)) {
+return -ENOTSUP;
+}
+if (!s->discard_zeroes) {
+return -ENOTSUP;
+}
+return paio_submit_co(bs, s->fd, sector_num, NULL, nb_sectors,
+  QEMU_AIO_DISCARD);
+}
+
+static int raw_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
+{
+BDRVRawState *s = bs->opaque;
+
+bdi->unallocated_blocks_are_zero = s->discard_zeroes;
+bdi->can_write_zeroes_with_unmap = s->discard_zeroes;
+return 0;
+}
+
 static QEMUOptionParameter raw_create_options[] = {
 {
 .name = BLOCK_OPT_SIZE,
@@ -1222,6 +1281,7 @@ static BlockDriver bdrv_file = {
 .bdrv_create = raw_create,
 .bdrv_has_zero_init = bdrv_has_zero_init_1,
 .bdrv_co_get_block_status = raw_co_get_block_status,
+.bdrv_co_write_zeroes = raw_co_write_zeroes,
 
 .bdrv_aio_readv = raw_aio_readv,
 .bdrv_aio_writev = raw_aio_writev,
@@ -1230,6 +1290,7 @@ static BlockDriver bdrv_file = {
 
 .bdrv_truncate = raw_truncate,
 .bdrv_getlength = raw_getlength,
+.bdrv_get_info = raw_get_info,
 .bdrv_get_allocated_file_size
 = raw_get_allocated_file_size,
 
@@ -1585,6 +1646,7 @@ static BlockDriver bdrv_host_device = {
 
 .bdrv_truncate  = raw_truncate,
 .bdrv_getlength= raw_getlength,
+.bdrv_get_info = raw_get_info,
 .bdrv_get_allocated_file_size
 = raw_get_allocated_file_size,
 
diff --git a/trace-events b/trace-events
index d318d6f..e32d00c 100644
--- a/trace-events
+++ b/trace-events
@@ -128,6 +128,7 @@ thread_pool_cancel(void *req, void *opaque) "req %p opaque 
%p"
 
 # block/raw-win32.c
 # block/raw-posix.c
+paio_submit_co(int64_t sector_num, int nb_sectors, int type) "sector_num 
%"PRId64" nb_sectors %d type %d"
 paio_submit(void *acb, void *opaque, int64_t sector_num, int nb_sectors, int 
type) "acb %p opaque %p sector_num %"PRId64" nb_sectors %d type %d"
 
 # ioport.c
-- 
1.8.4.2

[Qemu-devel] [PATCH v3 10/19] block/iscsi: remove .bdrv_has_zero_init

2013-11-22 Thread Paolo Bonzini

From: Peter Lieven 

since commit 3ac21627 the default value changed to 0.

Signed-off-by: Peter Lieven 
Signed-off-by: Paolo Bonzini 
---
 block/iscsi.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index b7b5238..b6b62aa 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1505,11 +1505,6 @@ static int iscsi_truncate(BlockDriverState *bs, int64_t 
offset)
 return 0;
 }
 
-static int iscsi_has_zero_init(BlockDriverState *bs)
-{
-return 0;
-}
-
 static int iscsi_create(const char *filename, QEMUOptionParameter *options,
 Error **errp)
 {
@@ -1608,8 +1603,6 @@ static BlockDriver bdrv_iscsi = {
 .bdrv_aio_writev = iscsi_aio_writev,
 .bdrv_aio_flush  = iscsi_aio_flush,
 
-.bdrv_has_zero_init = iscsi_has_zero_init,
-
 #ifdef __linux__
 .bdrv_ioctl   = iscsi_ioctl,
 .bdrv_aio_ioctl   = iscsi_aio_ioctl,
-- 
1.8.4.2

[Qemu-devel] [PATCH v3 12/19] block/iscsi: check WRITE SAME support differently depending on MAY_UNMAP

2013-11-22 Thread Paolo Bonzini

The current check is right for MAY_UNMAP=1.  For MAY_UNMAP=0, just
try and fall back to regular writes as soon as a WRITE SAME command
fails.

Signed-off-by: Paolo Bonzini 
---
 block/iscsi.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 20f4f55..93fee6d 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -55,6 +55,7 @@ typedef struct IscsiLun {
 QEMUTimer *nop_timer;
 uint8_t lbpme;
 uint8_t lbprz;
+uint8_t has_write_same;
 struct scsi_inquiry_logical_block_provisioning lbp;
 struct scsi_inquiry_block_limits bl;
 unsigned char *zeroblock;
@@ -976,8 +977,13 @@ coroutine_fn iscsi_co_write_zeroes(BlockDriverState *bs, 
int64_t sector_num,
 return -EINVAL;
 }
 
-if (!iscsilun->lbp.lbpws) {
-/* WRITE SAME is not supported by the target */
+if (!(flags & BDRV_REQ_MAY_UNMAP) && !iscsilun->has_write_same) {
+/* WRITE SAME without UNMAP is not supported by the target */
+return -ENOTSUP;
+}
+
+if ((flags & BDRV_REQ_MAY_UNMAP) && !iscsilun->lbp.lbpws) {
+/* WRITE SAME with UNMAP is not supported by the target */
 return -ENOTSUP;
 }
 
@@ -1012,6 +1018,14 @@ retry:
 }
 
 if (iTask.status != SCSI_STATUS_GOOD) {
+if (iTask.status == SCSI_STATUS_CHECK_CONDITION &&
+iTask.task->sense.key == SCSI_SENSE_ILLEGAL_REQUEST &&
+iTask.task->sense.ascq == SCSI_SENSE_ASCQ_INVALID_OPERATION_CODE) {
+/* WRITE SAME is not supported by the target */
+iscsilun->has_write_same = false;
+return -ENOTSUP;
+}
+
 return -EIO;
 }
 
@@ -1375,6 +1389,7 @@ static int iscsi_open(BlockDriverState *bs, QDict 
*options, int flags,
 }
 
 iscsilun->type = inq->periperal_device_type;
+iscsilun->has_write_same = true;
 
 if ((ret = iscsi_readcapacity_sync(iscsilun)) != 0) {
 goto out;
-- 
1.8.4.2

1 2 >

1 - 100 of 149 matches

Mail list logo