Re: [PATCH v13 06/12] numa: Extend CLI to provide memory latency and bandwidth information

2019-10-27 Thread Markus Armbruster
Tao Xu  writes:

> Got it. I will use bytes per second for bandwidth here. Usually we use
> nanosecond for memory latency, so if we use second for latency, it may
> lose precision. So can I use nanosecond here, because we now use
> nanosecond as smallest time unit.

Sounds fair, go ahead.




Re: [PATCH v2] net: add tulip (dec21143) driver

2019-10-27 Thread Jason Wang



On 2019/10/25 下午7:43, Michael S. Tsirkin wrote:

On Tue, Oct 22, 2019 at 05:54:13PM +0200, Sven Schnelle wrote:

This adds the basic functionality to emulate a Tulip NIC.

Jason, do you want to queue this?

Overall ok so

Acked-by: Michael S. Tsirkin



Yes, I've queued V3 of this.

Thanks




Re: [PATCH v20 0/5] Add ARMv8 RAS virtualization support in QEMU

2019-10-27 Thread gengdongjiu
Hi Michael/All

On 2019/10/27 18:17, Michael S. Tsirkin wrote:
> On Sat, Oct 26, 2019 at 11:24:42AM +0800, Xiang Zheng wrote:
>> In the ARMv8 platform, the CPU error types are synchronous external 
>> abort(SEA)
>> and SError Interrupt (SEI). If exception happens in guest, sometimes it's 
>> better
>> for guest to perform the recovery, because host does not know the detailed
>> information of guest. For example, if an exception happens in a user-space
>> application within guest, host does not know which application encounters
>> errors.
>>
>> For the ARMv8 SEA/SEI, KVM or host kernel delivers SIGBUS to notify 
>> userspace.
>> After user space gets the notification, it will record the CPER into guest 
>> GHES
>> buffer and inject an exception or IRQ into guest.
>>
>> In the current implementation, if the type of SIGBUS is BUS_MCEERR_AR, we 
>> will
>> treat it as a synchronous exception, and notify guest with ARMv8 SEA
>> notification type after recording CPER into guest.
>>
>> This series of patches are based on Qemu 4.1, which include two parts:
>> 1. Generate APEI/GHES table.
>> 2. Handle the SIGBUS signal, record the CPER in runtime and fill it into 
>> guest
>>memory, then notify guest according to the type of SIGBUS.
>>
>> The whole solution was suggested by James(james.mo...@arm.com); The solution 
>> of
>> APEI section was suggested by Laszlo(ler...@redhat.com).
>> Show some discussions in [1].
>>
>> This series of patches have already been tested on ARM64 platform with RAS
>> feature enabled:
>> Show the APEI part verification result in [2].
>> Show the BUS_MCEERR_AR SIGBUS handling verification result in [3].
> 
> This looks mostly OK to me.  I sent some minor style comments but they
> can be addressed by follow up patches.
> 
> Maybe it's a good idea to merge this before soft freeze to make sure it
> gets some testing.  I'll leave this decision to the ARM maintainer.  For
> ACPI parts:
> 
> Reviewed-by: Michael S. Tsirkin 

Got it, Thanks for the Reviewed-by from Michael.

Hi Michael,
  According to discussion with QEMU community, I finished and developed the 
whole ARM RAS virtualization solution, and introduce the ARM APEI table in the 
first time.
For the newly created files, which are mainly about ARM APEI/GHES part,I would 
like to maintain them. If you agree it, whether I can add new maintainers[1]? 
thanks a lot.


[1]:
+ARM APEI Subsystem
+M: Dongjiu Geng 
+M: Xiang zheng 
+L: qemu-...@nongnu.org
+S: Maintained
+F: hw/acpi/acpi_ghes.c
+F: include/hw/acpi/acpi_ghes.h
+F: docs/specs/acpi_hest_ghes.rst


> 
> 
>> ---




Re: [PATCH v20 0/5] Add ARMv8 RAS virtualization support in QEMU

2019-10-27 Thread Xiang Zheng



On 2019/10/27 18:17, Michael S. Tsirkin wrote:
> On Sat, Oct 26, 2019 at 11:24:42AM +0800, Xiang Zheng wrote:
>> In the ARMv8 platform, the CPU error types are synchronous external 
>> abort(SEA)
>> and SError Interrupt (SEI). If exception happens in guest, sometimes it's 
>> better
>> for guest to perform the recovery, because host does not know the detailed
>> information of guest. For example, if an exception happens in a user-space
>> application within guest, host does not know which application encounters
>> errors.
>>
>> For the ARMv8 SEA/SEI, KVM or host kernel delivers SIGBUS to notify 
>> userspace.
>> After user space gets the notification, it will record the CPER into guest 
>> GHES
>> buffer and inject an exception or IRQ into guest.
>>
>> In the current implementation, if the type of SIGBUS is BUS_MCEERR_AR, we 
>> will
>> treat it as a synchronous exception, and notify guest with ARMv8 SEA
>> notification type after recording CPER into guest.
>>
>> This series of patches are based on Qemu 4.1, which include two parts:
>> 1. Generate APEI/GHES table.
>> 2. Handle the SIGBUS signal, record the CPER in runtime and fill it into 
>> guest
>>memory, then notify guest according to the type of SIGBUS.
>>
>> The whole solution was suggested by James(james.mo...@arm.com); The solution 
>> of
>> APEI section was suggested by Laszlo(ler...@redhat.com).
>> Show some discussions in [1].
>>
>> This series of patches have already been tested on ARM64 platform with RAS
>> feature enabled:
>> Show the APEI part verification result in [2].
>> Show the BUS_MCEERR_AR SIGBUS handling verification result in [3].
> 
> 
> This looks mostly OK to me.  I sent some minor style comments but they
> can be addressed by follow up patches.
> 
> Maybe it's a good idea to merge this before soft freeze to make sure it
> gets some testing.  I'll leave this decision to the ARM maintainer.  For
> ACPI parts:
> 
> Reviewed-by: Michael S. Tsirkin 
> 

Hi Peter,

I can address the style comments and send the series of patches before soft
freeze if needed. And if there is no window before soft freeze, I can also
address them by follow up patches. :)

-- 

Thanks,
Xiang




Failure of troublesome vhost-user/flags-mismatch test can't be reproduced

2019-10-27 Thread Coiby Xu
Hi,

Can anyone reproduce the failure of one troublesome test
vhost-user/flags-mismatch (https://wiki.qemu.org/Testing/CI/TroublesomeTests)
which has been disabled since Aug 15 2017 (
https://patchwork.kernel.org/patch/9939431/)?  Last month I downloaded QEMU
source code (commit ID: 3483534ec3) and this version also failed this test.
After debugging this issue, I thought I have fixed this issue by saving
acknowledged features after vhost backend disconnect with the client
(qemu). But later I reviewed the commit history of net/vhost-user.c and
found out this test failed on my machines because of  a newly introduced
bug from a recent commit (https://patchwork.kernel.org/patch/11054781/) on
July 29 2019 and this new bug has already been fixed by a more recent
commit (
https://lists.sr.ht/~philmd/qemu/%3C20190924162044.11414-1-amorenoz%40redhat.com%3E).
After more investigation, it surprised me both 3.1.0 and 2.9.1could pass
vhost-user/flags-mismatch test. According to the comments on the commit
https://patchwork.kernel.org/patch/9939431/ which disabled the
vhost-user/flags-mismatch and other two tests, it seems the tests are only
broken on travis. Maybe we should re-investigate this issue to re-enable
the test so similar bugs like
https://bugzilla.redhat.com/show_bug.cgi?id=1738768 could be prevented in
the first place.

Btw we can excluisvely check this test on latest QEMU by given TESTPATH and
setting QTEST_VHOST_USER_FIXME environment variable,
QTEST_VHOST_USER_FIXME=1
QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img
tests/qos-test -p
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-net-pci/virtio-net/virtio-net-tests/vhost-user/flags-mismatch


-- 
*Best regards,*
*Coiby*


Re: [PATCH v13 06/12] numa: Extend CLI to provide memory latency and bandwidth information

2019-10-27 Thread Tao Xu

On 10/26/2019 4:51 AM, Eduardo Habkost wrote:

On Fri, Oct 25, 2019 at 09:44:50PM +0200, Markus Armbruster wrote:

Igor Mammedov  writes:


On Fri, 25 Oct 2019 14:33:53 +0800
Tao Xu  wrote:


On 10/23/2019 11:28 PM, Igor Mammedov wrote:

On Sun, 20 Oct 2019 19:11:19 +0800
Tao Xu  wrote:

[...]

+#
+# @access-bandwidth: access bandwidth (MB/s)
+#
+# @read-bandwidth: read bandwidth (MB/s)
+#
+# @write-bandwidth: write bandwidth (MB/s)

I think units here are not appropriate, values stored in fields are
minimal base units only and nothing else (i.e. ps and B/s)
   

Eric suggest me to drop picoseconds. So here I can use ns. For
bandwidth, if we use B/s here, does it let user or developer to
misunderstand that the smallest unit is B/s ?


It's not nanoseconds or MB/s stored in theses fields, isn't it?
I'd specify units in which value is stored or drop units altogether.

Maybe Eric and Markus can suggest a better way to describe fields.


This isn't review (yet), just an attempt to advise more quickly on
general QAPI/QMP conventions.

Unit prefixes like Mebi- are nice for humans, because 1MiB is clearer
than 1048576B.

QMP is for machines.  We eschew unit prefixes and unit symbols there.
The unit is implied.  Unit prefixes only complicate things.  Machines
can deal with 1048576 easily.  Also dealing 1024Ki and 1Mi is additional
work.  We therefore use JSON numbers for byte counts, not strings with
units.

The general rule is "always use the plainest implied unit that would
do."  There are exceptions, mostly due to review failure.

Byte rates should be in bytes per second.

For time, we've made a godawful mess.  The plainest unit is clearly the
second.  We commonly need sub-second granularity, though.
Floating-point seconds are unpopular for some reason :)  Instead we use
milli-, micro-, and nanoseconds, and even (seconds, microseconds) pairs.

QAPI schema documentation describes both the generated C and the QMP
wire protocol.  It must be written with the implied unit.  If you send a
byte rate in bytes per second via QMP, that's what you document.  Even
if a human interface lets you specify the byte rate in MiB/s.

Does this make sense?


This makes sense for the bandwidth fields.  We still need to
decide how to represent the latency field, though.

Seconds would be the obvious choice, if only it didn't risk
silently losing precision when converting numbers to floats.

Got it. I will use bytes per second for bandwidth here. Usually we use 
nanosecond for memory latency, so if we use second for latency, it may 
lose precision. So can I use nanosecond here, because we now use 
nanosecond as smallest time unit.




Re: [Qemu-devel] [PATCH] migration: check length directly to make sure the range is aligned

2019-10-27 Thread Wei Yang
On Fri, Jul 19, 2019 at 07:06:51PM +0100, Dr. David Alan Gilbert wrote:
>* Paolo Bonzini (pbonz...@redhat.com) wrote:
>> On 19/07/19 19:54, Dr. David Alan Gilbert wrote:
>> >> -if ((uintptr_t)host_endaddr & (rb->page_size - 1)) {
>> >> -error_report("ram_block_discard_range: Unaligned end 
>> >> address: %p",
>> >> - host_endaddr);
>> >> +if (length & (rb->page_size - 1)) {
>> >> +error_report("ram_block_discard_range: Unaligned length: 
>> >> %lx",
>> >> + length);
>> > Yes, I *think* this is safe, we'll need to watch out for any warnings;
>> 
>> Do you mean compiler or QEMU warning?
>
>No, I mean lots of these error reports being printed out in some common
>case.
>

Hi, Dave

Haven't see you for a period of time :-)

>Dave
>
>  The patch is safe since there's an
>> 
>> if ((uintptr_t)host_startaddr & (rb->page_size - 1)) {
>> error_report("ram_block_discard_range: Unaligned start address: %p",
>>  host_startaddr);
>> goto err;
>> }
>> 
>> just before this context.
>> 
>> Paolo
>--
>Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

-- 
Wei Yang
Help you, Help me



Fwd: [retrocomputing devroom] FOSDEM 2020 - Retrocomputing DevRoom Call for Participation

2019-10-27 Thread François Revol
Hi,

anyone wants to talk about using QEMU for old platform emulation? (like,
m68k or older…)


François.


 Message transféré 
Sujet : [retrocomputing devroom] FOSDEM 2020 - Retrocomputing DevRoom
Call for Participation
Date : Sun, 27 Oct 2019 23:48:33 +0100
De : Pau Garcia Quiles 
Pour : retrocomputing-devr...@lists.fosdem.org

Hello,

FOSDEM is a free software event that offers open source communities a
place to meet, share ideas and collaborate. It is renowned for being
highly developer-oriented and brings together 8000+ participants from
all over the world. It is held in the city of Brussels (Belgium).

FOSDEM 2020 will take place during the weekend of February 1st-2nd
2020. More details about the event can be found at
http://www.fosdem.org.


CALL FOR PARTICIPATION

After success in the past two years, the Retrocomputing DevRoom will
be back in 2020, with talks about usage of older computing hardware
and software in modern times.

Presentation topics could include but are not limited to:
   - Emulation of old systems to run videogames, legacy software, etc
   - New software, hardware or related to be used with classic systems
   - Open source software emulation/simulation
   - Open hardware
   - Operating systems/executives for retrocomputers/retrosystems
   - Uses of retrocomputing today
   - Other retrosystems topics
   - Opportunities in retrocomputing

You are not limited to slide presentations, of course. Be creative.

However, FOSDEM is an open source conference, therefore we ask you to
stay clear of marketing presentations. We are not afraid of technical
stuff: devrooms are a place for development teams to meet, discuss,
hack and publicly present their project’s latest  improvements and
future directions.

If you will have special needs for your talk (e. g. because you will
need to plug some sort of a system, you need sound, etc), please note
that clearly in your proposal so that we can provide it.

You can use the Wikipedia definition of retrocomputing as a reference
definition to see if you talk qualifies, although it is not exclusive:
https://en.wikipedia.org/wiki/Retrocomputing


IMPORTANT DATES
   -  30 November 2019: submission deadline for talk proposals
   -  14 December 2019: announcement of the final schedule
   -  1 February 2020: Retrocomputing DevRoom


USEFUL INFORMATION

Use the FOSDEM Pentabarf tool to submit your proposal:
https://penta.fosdem.org/submission/FOSDEM20

- If necessary, create a Pentabarf account and activate it. Please
reuse your account from previous years if you have already created it.
- In the "Person" section, provide First name, Last name (in the
"General" tab), Email (in the "Contact" tab) and Bio ("Abstract" field
in the "Description" tab).
- Submit a proposal by clicking on "Create event".
- Important! Select the "Retrocomputing DevRoom" track (on the
"General" tab). If you do not select a track, then nobody, from any
track, will look at your submission!
- Provide the title of your talk ("Event title" in the "General" tab).
- Provide a description of the subject of the talk and the intended
audience (in the "Abstract" field of the "Description" tab)
- Provide a rough outline of the talk or goals of the session (a short
list of bullet points covering topics that will be discussed) in the
"Full description" field in the "Description" tab
- Provide an expected length of your talk in the "Duration" field,
including discussion. The default duration is 30 minutes.
- Slides are NOT required at the moment of submission

Please note neither FOSDEM nor the Retrocomputing DevRoom will
reimburse any expenses you incur.


RECORDING OF TALKS

The FOSDEM organizers plan to have live streaming and recording fully
working, both for remote/later viewing of talks, and so that people
can watch streams in the hallways when rooms are full. This requires
speakers to consent to being recorded and streamed.

If you plan to be a speaker, please understand that by doing so you
implicitly give consent for your talk to be recorded and streamed. The
recordings will be published under the same license as all FOSDEM
content (CC-BY).

Hope to hear from you soon! And please forward this announcement.


CONTACT

The Retrocomputing DevRoom is managed by Pau Garcia Quiles and
François Revol (retro-devroom-mana...@fosdem.org).

A mailing list of speakers, audience and the curious is available,
please subscribe at:
https://lists.fosdem.org/listinfo/retrocomputing-devroom


-- 
Pau Garcia Quiles
http://www.elpauer.org
___
retrocomputing-devroom mailing list
retrocomputing-devr...@lists.fosdem.org
https://lists.fosdem.org/listinfo/retrocomputing-devroom



Re: [PULL v2 00/73] tcg plugins and testing updates

2019-10-27 Thread Peter Maydell
On Fri, 25 Oct 2019 at 21:24, Markus Armbruster  wrote:
> Alex Bennée  writes:
> > I'd rather not unless we can make an exception for late merging of the
> > PR. I've worked quite hard to make sure everything is ready for the 4.2
> > window and I'd rather not miss a whole release cycle on a
> > misunderstanding of what these plugins allow.
>
> I think there are multiple ways to avoid the nuclear outcome.
>
> Coming to a conclusion before the soft freeze is the nicest one.
>
> Making an exception for late merging is another one, but Peter may
> prefer not to.
>
> Yet another one is merging the pull request before the soft freeze with
> the understanding that it'll be reverted unless we come to a positive
> conclusion before say -rc0 (Nov 5).  I'm confident we can work it out in
> Lyon.

I'm happy with any of these (and we have a longstanding rule
that as long as a version of the pullreq was on list before soft
freeze it's ok to apply before hardfreeze, even if it needed to
go through a few versions or was otherwise a bit delayed).

In practice, since I'm on holiday Mon/Tues and this hotel wifi is
awful it's quite likely that I wouldn't get round to actually processing
a pullreq with the TCG plugins in it before we all get a chance
to talk face-to-face on Wednesday anyhow :-)

Alex: I suggest you send out a pullreq with the plugins stuff
(I've just applied your testing pullreq), and then we can
make sure it gets over the "passes merge build/tests" hurdle.

thanks
-- PMM



Re: [PULL v3 00/15] testing updates

2019-10-27 Thread Peter Maydell
On Fri, 25 Oct 2019 at 20:37, Alex Bennée  wrote:
>
> The following changes since commit 03bf012e523ecdf047ac56b2057950247256064d:
>
>   Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging 
> (2019-10-25 14:59:53 +0100)
>
> are available in the Git repository at:
>
>   https://github.com/stsquad/qemu.git tags/pull-testing-next-251019-3
>
> for you to fetch changes up to 8ce2f68fc90e36d8cd57585f7f4bc75e5038f0b1:
>
>   tests/docker: update Travis image to a more current version (2019-10-25 
> 19:24:21 +0100)
>
> 
> Testing updates (split from mega PR)
>
>   - various Travis dependency updates
>   - enable tcg debug for check-tcg
>   - additional Xcode build for Cirrus
>   - dependency tweak for gitlab



Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/4.2
for any user-visible changes.

-- PMM



[PULL 3/3] contrib/gitdm: add China Mobile to the domain map

2019-10-27 Thread Alex Bennée
We've had a number of contributions from this domain. Mao has
confirmed they are company contributions.

Signed-off-by: Alex Bennée 
Acked-by: Mao Zhongyi 
Cc: Zhang Shengju 

diff --git a/contrib/gitdm/domain-map b/contrib/gitdm/domain-map
index 9efe066ec9c..dd79147c769 100644
--- a/contrib/gitdm/domain-map
+++ b/contrib/gitdm/domain-map
@@ -5,6 +5,7 @@
 #
 
 amd.com AMD
+cmss.chinamobile.com China Mobile
 citrix.com  Citrix
 greensocs.com   GreenSocs
 fujitsu.com Fujitsu
-- 
2.20.1




[PULL 1/3] contrib/gitdm: add Emanuele as an individual

2019-10-27 Thread Alex Bennée
Again this is guess work based on public websites. Please confirm.

Signed-off-by: Alex Bennée 
Acked-by: Emanuele Giuseppe Esposito 

diff --git a/contrib/gitdm/group-map-individuals 
b/contrib/gitdm/group-map-individuals
index 1c847174380..bcb50e325cc 100644
--- a/contrib/gitdm/group-map-individuals
+++ b/contrib/gitdm/group-map-individuals
@@ -14,3 +14,4 @@ nor...@nocrew.org
 samuel.thiba...@ens-lyon.org
 aurel...@aurel32.net
 bala...@eik.bme.hu
+e.emanuelegiuse...@gmail.com
-- 
2.20.1




[PULL 2/3] contrib/gitdm: add Andrey to the individual group

2019-10-27 Thread Alex Bennée
Signed-off-by: Alex Bennée 
Acked-by: Andrey Smirnov 

diff --git a/contrib/gitdm/group-map-individuals 
b/contrib/gitdm/group-map-individuals
index bcb50e325cc..cf8a2ce3671 100644
--- a/contrib/gitdm/group-map-individuals
+++ b/contrib/gitdm/group-map-individuals
@@ -15,3 +15,4 @@ samuel.thiba...@ens-lyon.org
 aurel...@aurel32.net
 bala...@eik.bme.hu
 e.emanuelegiuse...@gmail.com
+andrew.smir...@gmail.com
-- 
2.20.1




[PULL 0/3] a few gitdm updates

2019-10-27 Thread Alex Bennée
The following changes since commit 856bd2c28e108ad0eb909bbbf3774f6f8bd7c2d4:

  Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into 
staging (2019-10-25 21:57:41 +0100)

are available in the Git repository at:

  https://github.com/stsquad/qemu.git tags/pull-gitdm-next-271019-1

for you to fetch changes up to 82448ecd110ec1c4af1504a394b5ea5097dcca3c:

  contrib/gitdm: add China Mobile to the domain map (2019-10-27 19:16:30 +)


gitdm updates

  - Add a few individuals
  - Add China Mobile


Alex Bennée (3):
  contrib/gitdm: add Emanuele as an individual
  contrib/gitdm: add Andrey to the individual group
  contrib/gitdm: add China Mobile to the domain map

 contrib/gitdm/domain-map| 1 +
 contrib/gitdm/group-map-individuals | 2 ++
 2 files changed, 3 insertions(+)

-- 
2.20.1




Re: [PATCH 2/5] ipmi: Add support to customize OEM functions

2019-10-27 Thread Corey Minyard
On Sun, Oct 27, 2019 at 06:47:39PM +0100, David Gibson wrote:
> On Mon, Oct 21, 2019 at 09:30:17AM -0500, Corey Minyard wrote:
> > On Mon, Oct 21, 2019 at 03:12:12PM +0200, Cédric Le Goater wrote:
> > > The routine ipmi_register_oem_netfn() lets external modules register
> > > command handlers for OEM functions. Required for the PowerNV machine.
> > 
> > Comments inline.
> > 
> > > 
> > > Cc: Corey Minyard 
> > > Signed-off-by: Cédric Le Goater 
> > > ---
> > >  include/hw/ipmi/ipmi.h | 36 
> > >  hw/ipmi/ipmi_bmc_sim.c | 41 ++---
> > >  2 files changed, 42 insertions(+), 35 deletions(-)
> > > 
> > > diff --git a/include/hw/ipmi/ipmi.h b/include/hw/ipmi/ipmi.h
> > > index 6f2413b39b4a..cb7203b06767 100644
> > > --- a/include/hw/ipmi/ipmi.h
> > > +++ b/include/hw/ipmi/ipmi.h
> > > @@ -265,4 +265,40 @@ int ipmi_bmc_sdr_find(IPMIBmc *b, uint16_t recid,
> > >const struct ipmi_sdr_compact **sdr, uint16_t 
> > > *nextrec);
> > >  void ipmi_bmc_gen_event(IPMIBmc *b, uint8_t *evt, bool log);
> > >  
> > > +typedef struct IPMIBmcSim IPMIBmcSim;
> > 
> > This type isn't very useful outside of the simulator, but changes for
> > that can come as they are needed.  I don't see an easy way to avoid
> > putting it here.
> > 
> > > +
> > > +typedef struct RspBuffer {
> > > +uint8_t buffer[MAX_IPMI_MSG_SIZE];
> > > +unsigned int len;
> > > +} RspBuffer;
> > > +
> > > +static inline void rsp_buffer_set_error(RspBuffer *rsp, uint8_t byte)
> > > +{
> > > +rsp->buffer[2] = byte;
> > > +}
> > > +
> > > +/* Add a byte to the response. */
> > > +static inline void rsp_buffer_push(RspBuffer *rsp, uint8_t byte)
> > > +{
> > > +if (rsp->len >= sizeof(rsp->buffer)) {
> > > +rsp_buffer_set_error(rsp, IPMI_CC_REQUEST_DATA_TRUNCATED);
> > > +return;
> > > +}
> > > +rsp->buffer[rsp->len++] = byte;
> > > +}
> > > +
> > > +typedef struct IPMICmdHandler {
> > > +void (*cmd_handler)(IPMIBmcSim *s,
> > > +uint8_t *cmd, unsigned int cmd_len,
> > > +RspBuffer *rsp);
> > > +unsigned int cmd_len_min;
> > > +} IPMICmdHandler;
> > > +
> > > +typedef struct IPMINetfn {
> > > +unsigned int cmd_nums;
> > > +const IPMICmdHandler *cmd_handlers;
> > > +} IPMINetfn;
> > > +
> > > +int ipmi_register_oem_netfn(IPMIBmc *b, const IPMINetfn *netfnd);
> > > +
> > >  #endif
> > > diff --git a/hw/ipmi/ipmi_bmc_sim.c b/hw/ipmi/ipmi_bmc_sim.c
> > > index 71e56f3b13d1..770aace55b08 100644
> > > --- a/hw/ipmi/ipmi_bmc_sim.c
> > > +++ b/hw/ipmi/ipmi_bmc_sim.c
> > > @@ -98,6 +98,7 @@
> > >  #define IPMI_CMD_GET_SEL_TIME 0x48
> > >  #define IPMI_CMD_SET_SEL_TIME 0x49
> > >  
> > > +#define IPMI_NETFN_OEM0x3a
> > >  
> > >  /* Same as a timespec struct. */
> > >  struct ipmi_time {
> > > @@ -167,23 +168,8 @@ typedef struct IPMISensor {
> > >  #define MAX_SENSORS 20
> > >  #define IPMI_WATCHDOG_SENSOR 0
> > >  
> > > -typedef struct IPMIBmcSim IPMIBmcSim;
> > > -typedef struct RspBuffer RspBuffer;
> > > -
> > >  #define MAX_NETFNS 64
> > >  
> > > -typedef struct IPMICmdHandler {
> > > -void (*cmd_handler)(IPMIBmcSim *s,
> > > -uint8_t *cmd, unsigned int cmd_len,
> > > -RspBuffer *rsp);
> > > -unsigned int cmd_len_min;
> > > -} IPMICmdHandler;
> > > -
> > > -typedef struct IPMINetfn {
> > > -unsigned int cmd_nums;
> > > -const IPMICmdHandler *cmd_handlers;
> > > -} IPMINetfn;
> > > -
> > >  typedef struct IPMIRcvBufEntry {
> > >  QTAILQ_ENTRY(IPMIRcvBufEntry) entry;
> > >  uint8_t len;
> > > @@ -279,28 +265,8 @@ struct IPMIBmcSim {
> > >  #define IPMI_BMC_WATCHDOG_ACTION_POWER_DOWN  2
> > >  #define IPMI_BMC_WATCHDOG_ACTION_POWER_CYCLE 3
> > >  
> > > -struct RspBuffer {
> > > -uint8_t buffer[MAX_IPMI_MSG_SIZE];
> > > -unsigned int len;
> > > -};
> > > -
> > >  #define RSP_BUFFER_INITIALIZER { }
> > >  
> > > -static inline void rsp_buffer_set_error(RspBuffer *rsp, uint8_t byte)
> > > -{
> > > -rsp->buffer[2] = byte;
> > > -}
> > > -
> > > -/* Add a byte to the response. */
> > > -static inline void rsp_buffer_push(RspBuffer *rsp, uint8_t byte)
> > > -{
> > > -if (rsp->len >= sizeof(rsp->buffer)) {
> > > -rsp_buffer_set_error(rsp, IPMI_CC_REQUEST_DATA_TRUNCATED);
> > > -return;
> > > -}
> > > -rsp->buffer[rsp->len++] = byte;
> > > -}
> > > -
> > >  static inline void rsp_buffer_pushmore(RspBuffer *rsp, uint8_t *bytes,
> > > unsigned int n)
> > >  {
> > > @@ -640,6 +606,11 @@ static int ipmi_register_netfn(IPMIBmcSim *s, 
> > > unsigned int netfn,
> > >  return 0;
> > >  }
> > >  
> > > +int ipmi_register_oem_netfn(IPMIBmc *b, const IPMINetfn *netfnd)
> > > +{
> > > +return ipmi_register_netfn(IPMI_BMC_SIMULATOR(b), IPMI_NETFN_OEM, 
> > > netfnd);
> > > +}
> > 
> > I 

Re: [PATCH 2/5] ipmi: Add support to customize OEM functions

2019-10-27 Thread David Gibson
On Mon, Oct 21, 2019 at 09:30:17AM -0500, Corey Minyard wrote:
> On Mon, Oct 21, 2019 at 03:12:12PM +0200, Cédric Le Goater wrote:
> > The routine ipmi_register_oem_netfn() lets external modules register
> > command handlers for OEM functions. Required for the PowerNV machine.
> 
> Comments inline.
> 
> > 
> > Cc: Corey Minyard 
> > Signed-off-by: Cédric Le Goater 
> > ---
> >  include/hw/ipmi/ipmi.h | 36 
> >  hw/ipmi/ipmi_bmc_sim.c | 41 ++---
> >  2 files changed, 42 insertions(+), 35 deletions(-)
> > 
> > diff --git a/include/hw/ipmi/ipmi.h b/include/hw/ipmi/ipmi.h
> > index 6f2413b39b4a..cb7203b06767 100644
> > --- a/include/hw/ipmi/ipmi.h
> > +++ b/include/hw/ipmi/ipmi.h
> > @@ -265,4 +265,40 @@ int ipmi_bmc_sdr_find(IPMIBmc *b, uint16_t recid,
> >const struct ipmi_sdr_compact **sdr, uint16_t 
> > *nextrec);
> >  void ipmi_bmc_gen_event(IPMIBmc *b, uint8_t *evt, bool log);
> >  
> > +typedef struct IPMIBmcSim IPMIBmcSim;
> 
> This type isn't very useful outside of the simulator, but changes for
> that can come as they are needed.  I don't see an easy way to avoid
> putting it here.
> 
> > +
> > +typedef struct RspBuffer {
> > +uint8_t buffer[MAX_IPMI_MSG_SIZE];
> > +unsigned int len;
> > +} RspBuffer;
> > +
> > +static inline void rsp_buffer_set_error(RspBuffer *rsp, uint8_t byte)
> > +{
> > +rsp->buffer[2] = byte;
> > +}
> > +
> > +/* Add a byte to the response. */
> > +static inline void rsp_buffer_push(RspBuffer *rsp, uint8_t byte)
> > +{
> > +if (rsp->len >= sizeof(rsp->buffer)) {
> > +rsp_buffer_set_error(rsp, IPMI_CC_REQUEST_DATA_TRUNCATED);
> > +return;
> > +}
> > +rsp->buffer[rsp->len++] = byte;
> > +}
> > +
> > +typedef struct IPMICmdHandler {
> > +void (*cmd_handler)(IPMIBmcSim *s,
> > +uint8_t *cmd, unsigned int cmd_len,
> > +RspBuffer *rsp);
> > +unsigned int cmd_len_min;
> > +} IPMICmdHandler;
> > +
> > +typedef struct IPMINetfn {
> > +unsigned int cmd_nums;
> > +const IPMICmdHandler *cmd_handlers;
> > +} IPMINetfn;
> > +
> > +int ipmi_register_oem_netfn(IPMIBmc *b, const IPMINetfn *netfnd);
> > +
> >  #endif
> > diff --git a/hw/ipmi/ipmi_bmc_sim.c b/hw/ipmi/ipmi_bmc_sim.c
> > index 71e56f3b13d1..770aace55b08 100644
> > --- a/hw/ipmi/ipmi_bmc_sim.c
> > +++ b/hw/ipmi/ipmi_bmc_sim.c
> > @@ -98,6 +98,7 @@
> >  #define IPMI_CMD_GET_SEL_TIME 0x48
> >  #define IPMI_CMD_SET_SEL_TIME 0x49
> >  
> > +#define IPMI_NETFN_OEM0x3a
> >  
> >  /* Same as a timespec struct. */
> >  struct ipmi_time {
> > @@ -167,23 +168,8 @@ typedef struct IPMISensor {
> >  #define MAX_SENSORS 20
> >  #define IPMI_WATCHDOG_SENSOR 0
> >  
> > -typedef struct IPMIBmcSim IPMIBmcSim;
> > -typedef struct RspBuffer RspBuffer;
> > -
> >  #define MAX_NETFNS 64
> >  
> > -typedef struct IPMICmdHandler {
> > -void (*cmd_handler)(IPMIBmcSim *s,
> > -uint8_t *cmd, unsigned int cmd_len,
> > -RspBuffer *rsp);
> > -unsigned int cmd_len_min;
> > -} IPMICmdHandler;
> > -
> > -typedef struct IPMINetfn {
> > -unsigned int cmd_nums;
> > -const IPMICmdHandler *cmd_handlers;
> > -} IPMINetfn;
> > -
> >  typedef struct IPMIRcvBufEntry {
> >  QTAILQ_ENTRY(IPMIRcvBufEntry) entry;
> >  uint8_t len;
> > @@ -279,28 +265,8 @@ struct IPMIBmcSim {
> >  #define IPMI_BMC_WATCHDOG_ACTION_POWER_DOWN  2
> >  #define IPMI_BMC_WATCHDOG_ACTION_POWER_CYCLE 3
> >  
> > -struct RspBuffer {
> > -uint8_t buffer[MAX_IPMI_MSG_SIZE];
> > -unsigned int len;
> > -};
> > -
> >  #define RSP_BUFFER_INITIALIZER { }
> >  
> > -static inline void rsp_buffer_set_error(RspBuffer *rsp, uint8_t byte)
> > -{
> > -rsp->buffer[2] = byte;
> > -}
> > -
> > -/* Add a byte to the response. */
> > -static inline void rsp_buffer_push(RspBuffer *rsp, uint8_t byte)
> > -{
> > -if (rsp->len >= sizeof(rsp->buffer)) {
> > -rsp_buffer_set_error(rsp, IPMI_CC_REQUEST_DATA_TRUNCATED);
> > -return;
> > -}
> > -rsp->buffer[rsp->len++] = byte;
> > -}
> > -
> >  static inline void rsp_buffer_pushmore(RspBuffer *rsp, uint8_t *bytes,
> > unsigned int n)
> >  {
> > @@ -640,6 +606,11 @@ static int ipmi_register_netfn(IPMIBmcSim *s, unsigned 
> > int netfn,
> >  return 0;
> >  }
> >  
> > +int ipmi_register_oem_netfn(IPMIBmc *b, const IPMINetfn *netfnd)
> > +{
> > +return ipmi_register_netfn(IPMI_BMC_SIMULATOR(b), IPMI_NETFN_OEM, 
> > netfnd);
> > +}
> 
> I think I would prefer just exposing ipmi_register_netfn() and maybe
> rename it ipmi_sim_register_netfn() or something like that.  There may
> be other netfns needed in the future.
> 
> But with that change, this looks good to me:
> 
> Reviewed-by: Corey Minyard 

What's the plan for merging this, once it's ready?  Is there an IPMI
tree for it to 

Re: [PATCH 1/5] ppc/pnv: Add a PNOR model

2019-10-27 Thread David Gibson
On Mon, Oct 21, 2019 at 03:12:11PM +0200, Cédric Le Goater wrote:
> From: Cédric Le Goater 
> 
> On a POWERPC PowerNV system, the host firmware is stored in a PNOR
> flash chip which contents is mapped on the LPC bus. This model adds a
> simple dummy device to map the contents of a block device in the host
> address space.
> 
> Signed-off-by: Cédric Le Goater 

Applied to ppc-for-4.2.  The rest of the series will need acks for the
generic IPMI changes, of course.

> ---
>  include/hw/ppc/pnv.h  |   3 +
>  include/hw/ppc/pnv_pnor.h |  25 +++
>  hw/ppc/pnv.c  |  14 
>  hw/ppc/pnv_pnor.c | 135 ++
>  hw/ppc/Makefile.objs  |   4 +-
>  5 files changed, 180 insertions(+), 1 deletion(-)
>  create mode 100644 include/hw/ppc/pnv_pnor.h
>  create mode 100644 hw/ppc/pnv_pnor.c
> 
> diff --git a/include/hw/ppc/pnv.h b/include/hw/ppc/pnv.h
> index 5e01a9f3df95..e2f20f2b0bc4 100644
> --- a/include/hw/ppc/pnv.h
> +++ b/include/hw/ppc/pnv.h
> @@ -24,6 +24,7 @@
>  #include "hw/sysbus.h"
>  #include "hw/ipmi/ipmi.h"
>  #include "hw/ppc/pnv_lpc.h"
> +#include "hw/ppc/pnv_pnor.h"
>  #include "hw/ppc/pnv_psi.h"
>  #include "hw/ppc/pnv_occ.h"
>  #include "hw/ppc/pnv_homer.h"
> @@ -173,6 +174,8 @@ typedef struct PnvMachineState {
>  
>  IPMIBmc  *bmc;
>  Notifier powerdown_notifier;
> +
> +PnvPnor  *pnor;
>  } PnvMachineState;
>  
>  static inline bool pnv_chip_is_power9(const PnvChip *chip)
> diff --git a/include/hw/ppc/pnv_pnor.h b/include/hw/ppc/pnv_pnor.h
> new file mode 100644
> index ..dec811695c8d
> --- /dev/null
> +++ b/include/hw/ppc/pnv_pnor.h
> @@ -0,0 +1,25 @@
> +/*
> + * QEMU PowerNV PNOR simple model
> + *
> + * Copyright (c) 2019, IBM Corporation.
> + *
> + * This code is licensed under the GPL version 2 or later. See the
> + * COPYING file in the top-level directory.
> + */
> +#ifndef _PPC_PNV_PNOR_H
> +#define _PPC_PNV_PNOR_H
> +
> +#define TYPE_PNV_PNOR  "pnv-pnor"
> +#define PNV_PNOR(obj)  OBJECT_CHECK(PnvPnor, (obj), TYPE_PNV_PNOR)
> +
> +typedef struct PnvPnor {
> +SysBusDevice   parent_obj;
> +
> +BlockBackend   *blk;
> +
> +uint8_t*storage;
> +uint32_t   size;
> +MemoryRegion   mmio;
> +} PnvPnor;
> +
> +#endif /* _PPC_PNV_PNOR_H */
> diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
> index 1a22dfd46031..b74528eba42a 100644
> --- a/hw/ppc/pnv.c
> +++ b/hw/ppc/pnv.c
> @@ -44,6 +44,7 @@
>  #include "hw/ppc/xics.h"
>  #include "hw/qdev-properties.h"
>  #include "hw/ppc/pnv_xscom.h"
> +#include "hw/ppc/pnv_pnor.h"
>  
>  #include "hw/isa/isa.h"
>  #include "hw/boards.h"
> @@ -633,6 +634,8 @@ static void pnv_init(MachineState *machine)
>  long fw_size;
>  int i;
>  char *chip_typename;
> +DriveInfo *pnor = drive_get(IF_MTD, 0, 0);
> +DeviceState *dev;
>  
>  /* allocate RAM */
>  if (machine->ram_size < (1 * GiB)) {
> @@ -644,6 +647,17 @@ static void pnv_init(MachineState *machine)
>   machine->ram_size);
>  memory_region_add_subregion(get_system_memory(), 0, ram);
>  
> +/*
> + * Create our simple PNOR device
> + */
> +dev = qdev_create(NULL, TYPE_PNV_PNOR);
> +if (pnor) {
> +qdev_prop_set_drive(dev, "drive", blk_by_legacy_dinfo(pnor),
> +_abort);
> +}
> +qdev_init_nofail(dev);
> +pnv->pnor = PNV_PNOR(dev);
> +
>  /* load skiboot firmware  */
>  if (bios_name == NULL) {
>  bios_name = FW_FILE_NAME;
> diff --git a/hw/ppc/pnv_pnor.c b/hw/ppc/pnv_pnor.c
> new file mode 100644
> index ..bfb1e92b0392
> --- /dev/null
> +++ b/hw/ppc/pnv_pnor.c
> @@ -0,0 +1,135 @@
> +/*
> + * QEMU PowerNV PNOR simple model
> + *
> + * Copyright (c) 2015-2019, IBM Corporation.
> + *
> + * This code is licensed under the GPL version 2 or later. See the
> + * COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "qemu/error-report.h"
> +#include "qemu/log.h"
> +#include "sysemu/block-backend.h"
> +#include "sysemu/blockdev.h"
> +#include "hw/loader.h"
> +#include "hw/ppc/pnv_pnor.h"
> +#include "hw/qdev-properties.h"
> +
> +static uint64_t pnv_pnor_read(void *opaque, hwaddr addr, unsigned size)
> +{
> +PnvPnor *s = PNV_PNOR(opaque);
> +uint64_t ret = 0;
> +int i;
> +
> +for (i = 0; i < size; i++) {
> +ret |= (uint64_t) s->storage[addr + i] << (8 * (size - i - 1));
> +}
> +
> +return ret;
> +}
> +
> +static void pnv_pnor_update(PnvPnor *s, int offset, int size)
> +{
> +int offset_end;
> +
> +if (s->blk) {
> +return;
> +}
> +
> +offset_end = offset + size;
> +offset = QEMU_ALIGN_DOWN(offset, BDRV_SECTOR_SIZE);
> +offset_end = QEMU_ALIGN_UP(offset_end, BDRV_SECTOR_SIZE);
> +
> +blk_pwrite(s->blk, offset, s->storage + offset,
> +   offset_end - offset, 0);
> +}
> +
> +static void pnv_pnor_write(void 

Re: [RFC v2 06/22] hw/pci: modify pci_setup_iommu() to set PCIIOMMUOps

2019-10-27 Thread David Gibson
On Thu, Oct 24, 2019 at 08:34:27AM -0400, Liu Yi L wrote:
> This patch modifies pci_setup_iommu() to set PCIIOMMUOps instead of only
> setting PCIIOMMUFunc. PCIIOMMUFunc is previously used to get an address
> space for a device in vendor specific way. The PCIIOMMUOps still offers
> this functionality. Use PCIIOMMUOps leaves space to add more iommu related
> vendor specific operations.
> 
> Cc: Kevin Tian 
> Cc: Jacob Pan 
> Cc: Peter Xu 
> Cc: Eric Auger 
> Cc: Yi Sun 
> Cc: David Gibson 
> Signed-off-by: Liu Yi L 

Reviewed-by: David Gibson 

> ---
>  hw/alpha/typhoon.c   |  6 +-
>  hw/arm/smmu-common.c |  6 +-
>  hw/hppa/dino.c   |  6 +-
>  hw/i386/amd_iommu.c  |  6 +-
>  hw/i386/intel_iommu.c|  6 +-
>  hw/pci-host/designware.c |  6 +-
>  hw/pci-host/ppce500.c|  6 +-
>  hw/pci-host/prep.c   |  6 +-
>  hw/pci-host/sabre.c  |  6 +-
>  hw/pci/pci.c | 11 ++-
>  hw/ppc/ppc440_pcix.c |  6 +-
>  hw/ppc/spapr_pci.c   |  6 +-
>  hw/s390x/s390-pci-bus.c  |  8 ++--
>  include/hw/pci/pci.h |  8 ++--
>  include/hw/pci/pci_bus.h |  2 +-
>  15 files changed, 74 insertions(+), 21 deletions(-)
> 
> diff --git a/hw/alpha/typhoon.c b/hw/alpha/typhoon.c
> index 179e1f7..b890771 100644
> --- a/hw/alpha/typhoon.c
> +++ b/hw/alpha/typhoon.c
> @@ -741,6 +741,10 @@ static AddressSpace *typhoon_pci_dma_iommu(PCIBus *bus, 
> void *opaque, int devfn)
>  return >pchip.iommu_as;
>  }
>  
> +static const PCIIOMMUOps typhoon_iommu_ops = {
> +.get_address_space = typhoon_pci_dma_iommu,
> +};
> +
>  static void typhoon_set_irq(void *opaque, int irq, int level)
>  {
>  TyphoonState *s = opaque;
> @@ -901,7 +905,7 @@ PCIBus *typhoon_init(ram_addr_t ram_size, ISABus 
> **isa_bus,
>   "iommu-typhoon", UINT64_MAX);
>  address_space_init(>pchip.iommu_as, MEMORY_REGION(>pchip.iommu),
> "pchip0-pci");
> -pci_setup_iommu(b, typhoon_pci_dma_iommu, s);
> +pci_setup_iommu(b, _iommu_ops, s);
>  
>  /* Pchip0 PCI special/interrupt acknowledge, 0x801.F800., 64MB.  */
>  memory_region_init_io(>pchip.reg_iack, OBJECT(s), _pci_iack_ops,
> diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
> index 245817d..d668514 100644
> --- a/hw/arm/smmu-common.c
> +++ b/hw/arm/smmu-common.c
> @@ -342,6 +342,10 @@ static AddressSpace *smmu_find_add_as(PCIBus *bus, void 
> *opaque, int devfn)
>  return >as;
>  }
>  
> +static const PCIIOMMUOps smmu_ops = {
> +.get_address_space = smmu_find_add_as,
> +};
> +
>  IOMMUMemoryRegion *smmu_iommu_mr(SMMUState *s, uint32_t sid)
>  {
>  uint8_t bus_n, devfn;
> @@ -436,7 +440,7 @@ static void smmu_base_realize(DeviceState *dev, Error 
> **errp)
>  s->smmu_pcibus_by_busptr = g_hash_table_new(NULL, NULL);
>  
>  if (s->primary_bus) {
> -pci_setup_iommu(s->primary_bus, smmu_find_add_as, s);
> +pci_setup_iommu(s->primary_bus, _ops, s);
>  } else {
>  error_setg(errp, "SMMU is not attached to any PCI bus!");
>  }
> diff --git a/hw/hppa/dino.c b/hw/hppa/dino.c
> index ab6969b..dbcff03 100644
> --- a/hw/hppa/dino.c
> +++ b/hw/hppa/dino.c
> @@ -389,6 +389,10 @@ static AddressSpace *dino_pcihost_set_iommu(PCIBus *bus, 
> void *opaque,
>  return >bm_as;
>  }
>  
> +static const PCIIOMMUOps dino_iommu_ops = {
> +.get_address_space = dino_pcihost_set_iommu,
> +};
> +
>  /*
>   * Dino interrupts are connected as shown on Page 78, Table 23
>   * (Little-endian bit numbers)
> @@ -508,7 +512,7 @@ PCIBus *dino_init(MemoryRegion *addr_space,
>  memory_region_add_subregion(>bm, 0xfff0,
>  >bm_cpu_alias);
>  address_space_init(>bm_as, >bm, "pci-bm");
> -pci_setup_iommu(b, dino_pcihost_set_iommu, s);
> +pci_setup_iommu(b, _iommu_ops, s);
>  
>  *p_rtc_irq = qemu_allocate_irq(dino_set_timer_irq, s, 0);
>  *p_ser_irq = qemu_allocate_irq(dino_set_serial_irq, s, 0);
> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
> index d372636..ba6904c 100644
> --- a/hw/i386/amd_iommu.c
> +++ b/hw/i386/amd_iommu.c
> @@ -1451,6 +1451,10 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, 
> void *opaque, int devfn)
>  return _as[devfn]->as;
>  }
>  
> +static const PCIIOMMUOps amdvi_iommu_ops = {
> +.get_address_space = amdvi_host_dma_iommu,
> +};
> +
>  static const MemoryRegionOps mmio_mem_ops = {
>  .read = amdvi_mmio_read,
>  .write = amdvi_mmio_write,
> @@ -1576,7 +1580,7 @@ static void amdvi_realize(DeviceState *dev, Error **err)
>  
>  sysbus_init_mmio(SYS_BUS_DEVICE(s), >mmio);
>  sysbus_mmio_map(SYS_BUS_DEVICE(s), 0, AMDVI_BASE_ADDR);
> -pci_setup_iommu(bus, amdvi_host_dma_iommu, s);
> +pci_setup_iommu(bus, _iommu_ops, s);
>  s->devid = object_property_get_int(OBJECT(>pci), "addr", err);
>  msi_init(>pci.dev, 0, 1, true, false, err);
>  amdvi_init(s);
> diff --git 

Re: [RFC v2 04/22] hw/iommu: introduce IOMMUContext

2019-10-27 Thread David Gibson
On Thu, Oct 24, 2019 at 08:34:25AM -0400, Liu Yi L wrote:
> From: Peter Xu 
> 
> This patch adds IOMMUContext as an abstract layer of IOMMU related
> operations. The current usage of this abstract layer is setup dual-
> stage IOMMU translation (vSVA) for vIOMMU.
> 
> To setup dual-stage IOMMU translation, vIOMMU needs to propagate
> guest changes to host via passthru channels (e.g. VFIO). To have
> a better abstraction, it is better to avoid direct calling between
> vIOMMU and VFIO. So we have this new structure to act as abstract
> layer between VFIO and vIOMMU. So far, it is proposed to provide a
> notifier mechanism, which registered by VFIO and fired by vIOMMU.
> 
> For more background, may refer to the discussion below:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg05022.html
> 
> Cc: Kevin Tian 
> Cc: Jacob Pan 
> Cc: Peter Xu 
> Cc: Eric Auger 
> Cc: Yi Sun 
> Cc: David Gibson 
> Signed-off-by: Peter Xu 
> Signed-off-by: Liu Yi L 
> ---
>  hw/Makefile.objs |  1 +
>  hw/iommu/Makefile.objs   |  1 +
>  hw/iommu/iommu.c | 66 
>  include/hw/iommu/iommu.h | 79 
> 
>  4 files changed, 147 insertions(+)
>  create mode 100644 hw/iommu/Makefile.objs
>  create mode 100644 hw/iommu/iommu.c
>  create mode 100644 include/hw/iommu/iommu.h
> 
> diff --git a/hw/Makefile.objs b/hw/Makefile.objs
> index ece6cc3..ac19f9c 100644
> --- a/hw/Makefile.objs
> +++ b/hw/Makefile.objs
> @@ -39,6 +39,7 @@ devices-dirs-y += xen/
>  devices-dirs-$(CONFIG_MEM_DEVICE) += mem/
>  devices-dirs-y += semihosting/
>  devices-dirs-y += smbios/
> +devices-dirs-y += iommu/
>  endif
>  
>  common-obj-y += $(devices-dirs-y)
> diff --git a/hw/iommu/Makefile.objs b/hw/iommu/Makefile.objs
> new file mode 100644
> index 000..0484b79
> --- /dev/null
> +++ b/hw/iommu/Makefile.objs
> @@ -0,0 +1 @@
> +obj-y += iommu.o
> diff --git a/hw/iommu/iommu.c b/hw/iommu/iommu.c
> new file mode 100644
> index 000..2391b0d
> --- /dev/null
> +++ b/hw/iommu/iommu.c
> @@ -0,0 +1,66 @@
> +/*
> + * QEMU abstract of IOMMU context
> + *
> + * Copyright (C) 2019 Red Hat Inc.
> + *
> + * Authors: Peter Xu ,
> + *  Liu Yi L 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> +
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> +
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see .
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/iommu/iommu.h"
> +
> +void iommu_ctx_notifier_register(IOMMUContext *iommu_ctx,
> + IOMMUCTXNotifier *n,
> + IOMMUCTXNotifyFn fn,
> + IOMMUCTXEvent event)
> +{
> +n->event = event;
> +n->iommu_ctx_event_notify = fn;
> +QLIST_INSERT_HEAD(_ctx->iommu_ctx_notifiers, n, node);

Having this both modify the IOMMUCTXNotifier structure and insert it
in the list seems confusing to me - and gratuitously different from
the interface for both IOMMUNotifier and Notifier.

Separating out a iommu_ctx_notifier_init() as a helper and having
register take a fully initialized structure seems better to me.

> +return;

Using an explicit return at the end of a function returning void is an
odd style.

> +}
> +
> +void iommu_ctx_notifier_unregister(IOMMUContext *iommu_ctx,
> +   IOMMUCTXNotifier *notifier)
> +{
> +IOMMUCTXNotifier *cur, *next;
> +
> +QLIST_FOREACH_SAFE(cur, _ctx->iommu_ctx_notifiers, node, next) {
> +if (cur == notifier) {
> +QLIST_REMOVE(cur, node);
> +break;
> +}
> +}
> +}
> +
> +void iommu_ctx_event_notify(IOMMUContext *iommu_ctx,
> +IOMMUCTXEventData *event_data)
> +{
> +IOMMUCTXNotifier *cur;
> +
> +QLIST_FOREACH(cur, _ctx->iommu_ctx_notifiers, node) {
> +if ((cur->event == event_data->event) &&
> + cur->iommu_ctx_event_notify) {

Do you actually need the test on iommu_ctx_event_notify?  I can't see
any reason to register a notifier with a NULL function pointer.

> +cur->iommu_ctx_event_notify(cur, event_data);
> +}
> +}
> +}
> +
> +void iommu_context_init(IOMMUContext *iommu_ctx)
> +{
> +QLIST_INIT(_ctx->iommu_ctx_notifiers);
> +}
> diff --git a/include/hw/iommu/iommu.h b/include/hw/iommu/iommu.h
> new file mode 100644
> index 000..c22c442
> --- /dev/null
> +++ b/include/hw/iommu/iommu.h
> @@ 

Re: [PATCH 5/6] spapr: Don't use CPU_FOREACH() in 'info pic'

2019-10-27 Thread David Gibson
On Thu, Oct 24, 2019 at 11:28:45AM +0200, Greg Kurz wrote:
> On Thu, 24 Oct 2019 14:02:31 +1100
> David Gibson  wrote:
> 
> > On Wed, Oct 23, 2019 at 04:52:21PM +0200, Greg Kurz wrote:
> > > Now that presenter objects are parented to the interrupt controller, stop
> > > relying on CPU_FOREACH() which can race with CPU hotplug and crash QEMU.
> > > 
> > > Signed-off-by: Greg Kurz 
> > 
> > So.. we might be able to go further than this.  Having the
> > TYPE_INTERRUPT_STATS_PROVIDER interrupt on the machine is actually an
> > spapr and pnv oddity.  In most cases that interface is on the various
> > components of the interrupt controller directly.  hmp_info_irq() scans
> > the whole QOM tree looking for everything with the interface to
> > produce the info pic output.
> > 
> > It would be nice if we can do the same for xics and xive.  The tricky
> > bit is that we do have the possibility of both, in which case the
> > individual components would need to know if they're currently "active"
> > and minimize their output if so.
> > 
> 
> Yes but this looks like 4.3 material. If we want to fix this for 4.2,
> I'm now thinking it might be safer to keep CPU_FOREACH() and check the
> state.

Yeah, for 4.2 that sounds like a good idea.

> 
> > > ---
> > >  hw/intc/spapr_xive.c  |8 +---
> > >  hw/intc/xics.c|   12 
> > >  hw/intc/xics_spapr.c  |8 +---
> > >  hw/intc/xive.c|   12 
> > >  include/hw/ppc/xics.h |1 +
> > >  include/hw/ppc/xive.h |2 ++
> > >  6 files changed, 29 insertions(+), 14 deletions(-)
> > > 
> > > diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> > > index d74ee71e76b4..05763a58cf5d 100644
> > > --- a/hw/intc/spapr_xive.c
> > > +++ b/hw/intc/spapr_xive.c
> > > @@ -579,14 +579,8 @@ static void 
> > > spapr_xive_set_irq(SpaprInterruptController *intc, int irq, int val)
> > >  static void spapr_xive_print_info(SpaprInterruptController *intc, 
> > > Monitor *mon)
> > >  {
> > >  SpaprXive *xive = SPAPR_XIVE(intc);
> > > -CPUState *cs;
> > > -
> > > -CPU_FOREACH(cs) {
> > > -PowerPCCPU *cpu = POWERPC_CPU(cs);
> > > -
> > > -xive_tctx_pic_print_info(spapr_cpu_state(cpu)->tctx, mon);
> > > -}
> > >  
> > > +xive_presenter_print_info(XIVE_ROUTER(intc), mon);
> > >  spapr_xive_pic_print_info(xive, mon);
> > >  }
> > >  
> > > diff --git a/hw/intc/xics.c b/hw/intc/xics.c
> > > index d5e4db668a4b..6e820c4851f3 100644
> > > --- a/hw/intc/xics.c
> > > +++ b/hw/intc/xics.c
> > > @@ -88,6 +88,18 @@ void ics_pic_print_info(ICSState *ics, Monitor *mon)
> > >  }
> > >  }
> > >  
> > > +static int do_ics_pic_print_icp_infos(Object *child, void *opaque)
> > > +{
> > > +icp_pic_print_info(ICP(child), opaque);
> > > +return 0;
> > > +}
> > > +
> > > +void ics_pic_print_icp_infos(ICSState *ics, const char *type, Monitor 
> > > *mon)
> > > +{
> > > +object_child_foreach_type(OBJECT(ics), type, 
> > > do_ics_pic_print_icp_infos,
> > > +  mon);
> > > +}
> > > +
> > >  /*
> > >   * ICP: Presentation layer
> > >   */
> > > diff --git a/hw/intc/xics_spapr.c b/hw/intc/xics_spapr.c
> > > index 080ed73aad64..7624d693c8da 100644
> > > --- a/hw/intc/xics_spapr.c
> > > +++ b/hw/intc/xics_spapr.c
> > > @@ -400,14 +400,8 @@ static void 
> > > xics_spapr_set_irq(SpaprInterruptController *intc, int irq, int val)
> > >  static void xics_spapr_print_info(SpaprInterruptController *intc, 
> > > Monitor *mon)
> > >  {
> > >  ICSState *ics = ICS_SPAPR(intc);
> > > -CPUState *cs;
> > > -
> > > -CPU_FOREACH(cs) {
> > > -PowerPCCPU *cpu = POWERPC_CPU(cs);
> > > -
> > > -icp_pic_print_info(spapr_cpu_state(cpu)->icp, mon);
> > > -}
> > >  
> > > +ics_pic_print_icp_infos(ics, TYPE_ICP, mon);
> > >  ics_pic_print_info(ics, mon);
> > >  }
> > >  
> > > diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> > > index 8d2da4a11163..40ce43152456 100644
> > > --- a/hw/intc/xive.c
> > > +++ b/hw/intc/xive.c
> > > @@ -547,6 +547,18 @@ void xive_tctx_pic_print_info(XiveTCTX *tctx, 
> > > Monitor *mon)
> > >  }
> > >  }
> > >  
> > > +static int do_xive_presenter_print_info(Object *child, void *opaque)
> > > +{
> > > +xive_tctx_pic_print_info(XIVE_TCTX(child), opaque);
> > > +return 0;
> > > +}
> > > +
> > > +void xive_presenter_print_info(XiveRouter *xrtr, Monitor *mon)
> > > +{
> > > +object_child_foreach_type(OBJECT(xrtr), TYPE_XIVE_TCTX,
> > > +  do_xive_presenter_print_info, mon);
> > > +}
> > > +
> > >  void xive_tctx_reset(XiveTCTX *tctx)
> > >  {
> > >  memset(tctx->regs, 0, sizeof(tctx->regs));
> > > diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
> > > index f4827e748fd8..4de1f421c997 100644
> > > --- a/include/hw/ppc/xics.h
> > > +++ b/include/hw/ppc/xics.h
> > > @@ -175,6 +175,7 @@ static inline bool ics_irq_free(ICSState *ics, 
> > > uint32_t srcno)
> > >  void ics_set_irq_type(ICSState *ics, int srcno, 

Re: [PATCH 0/3] ppc: Fix 'info pic' crash

2019-10-27 Thread David Gibson
On Thu, Oct 24, 2019 at 04:27:16PM +0200, Greg Kurz wrote:
> The interrupt presenters are currently parented to their associated
> VCPU, and we rely on CPU_FOREACH() when we need to perform a specific
> task with them. Like exposing their state with 'info pic', or finding
> the target VCPU for an interrupt when using the XIVE controller.
> 
> We recently realized that the latter could crash QEMU because CPU_FOREACH()
> can race with CPU hotplug. This got fixed by checking the presenter pointer
> under the CPU was set (commit 627fa61746f7), but I'm not that sure that
> this is enough since the presenter pointers also get stale at some point
> during CPU unplug. And we still have other users of CPU_FOREACH(), namely
> 'info pic' with both XICS and XIVE, that have the very same problem:
> 
> With XIVE:
> 
> Thread 1 "qemu-system-ppc" received signal SIGSEGV, Segmentation fault.
> 0x0001003d2848 in xive_tctx_pic_print_info (tctx=0x101ae5280, 
> mon=0x7fffe180) at /home/greg/Work/qemu/qemu-spapr/hw/intc/xive.c:526
> 526 int cpu_index = tctx->cs ? tctx->cs->cpu_index : -1;
> (gdb) p tctx
> $1 = (XiveTCTX *) 0x101ae5280
> (gdb) p tctx->cs
> $2 = (CPUState *) 0x2057512020203a5d <-- tctx is stale
> (gdb) p tctx->cs->cpu_index
> Cannot access memory at address 0x205751202020bead
> 
> With XICS:
> 
> Thread 1 "qemu-system-ppc" received signal SIGSEGV, Segmentation fault.
> 0x0001003cc39c in icp_pic_print_info (icp=0x10244ccf0, mon=0x7fffe940)
> at /home/greg/Work/qemu/qemu-spapr/hw/intc/xics.c:47
> 47  int cpu_index = icp->cs ? icp->cs->cpu_index : -1;
> (gdb) p icp
> $1 = (ICPState *) 0x10244ccf0
> (gdb) p icp->cs
> $2 = (CPUState *) 0x524958203220 <-- icp is stale
> (gdb) p icp->cs->cpu_index
> Cannot access memory at address 0x52495820b670
> 
> It may be worth finding a way to address this globally instead of
> open-coding the check of the presenter pointer everywhere because
> this is fragile. I gave a try with this series:
> 
>   [0/6] ppc: Reparent the interrupt presenter
> 
>   https://patchwork.ozlabs.org/cover/1182224/
> 
> but it requires some more reflexion. Also, we're about to enter
> softfreeze, and it seems better to come up with a simpler fix.
> 
> Let's forget the reparenting and check the presenter pointers
> where needed instead. Patch 1 from the previous series was changed
> to also NULLify presenter pointers, so that they can be used to
> filter out unwanted vCPUs in patch 3. I've kept patch 2 because
> it's a fix in the same area, but it isn't related to the QEMU
> crashes.

Applied to ppc-for-4.2, thanks.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH v5 4/7] ppc/pnv: Add a PnvChip pointer to PnvCore

2019-10-27 Thread David Gibson
On Thu, Oct 24, 2019 at 07:30:30PM +0200, Greg Kurz wrote:
> On Thu, 24 Oct 2019 13:38:12 +1100
> David Gibson  wrote:
> 
> > On Tue, Oct 22, 2019 at 06:38:09PM +0200, Cédric Le Goater wrote:
> > > We will use it to reset the interrupt presenter from the CPU reset
> > > handler.
> > > 
> > > Signed-off-by: Cédric Le Goater 
> > > Reviewed-by: Greg Kurz 
> > > ---
> > >  include/hw/ppc/pnv_core.h | 3 +++
> > >  hw/ppc/pnv_core.c | 3 ++-
> > >  2 files changed, 5 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/include/hw/ppc/pnv_core.h b/include/hw/ppc/pnv_core.h
> > > index bfbd2ec42aa6..55eee95104da 100644
> > > --- a/include/hw/ppc/pnv_core.h
> > > +++ b/include/hw/ppc/pnv_core.h
> > > @@ -31,6 +31,8 @@
> > >  #define PNV_CORE_GET_CLASS(obj) \
> > >   OBJECT_GET_CLASS(PnvCoreClass, (obj), TYPE_PNV_CORE)
> > >  
> > > +typedef struct PnvChip PnvChip;
> > > +
> > >  typedef struct PnvCore {
> > >  /*< private >*/
> > >  CPUCore parent_obj;
> > > @@ -38,6 +40,7 @@ typedef struct PnvCore {
> > >  /*< public >*/
> > >  PowerPCCPU **threads;
> > >  uint32_t pir;
> > > +PnvChip *chip;
> > 
> > I don't love having this as a redundant encoding of the information
> > already in the property, since it raises the possibility of confusing
> > bugs if they ever got out of sync.
> > 
> 
> Ouch, we also have this pattern in xive_tctx_realize(). The XiveTCXT
> object has both a "cpu" property and a pointer to the vCPU.
> 
> > It's not a huge deal, but it would be nice to at least to at least
> > consider either a) grabbing the property everywhere you need it (if
> > there aren't too many places) or b) customizing the property
> > definition so it's written directly into that field.
> > 
> 
> The pointer to the vCPU is used among other things to get the
> value of the PIR, which is needed by the presenting logic to
> match physical CAM lines. This is a _hot_ path so it's probably
> better to go for b).

Agreed.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH 6/6] xive: Don't use CPU_FOREACH() to perform CAM line matching

2019-10-27 Thread David Gibson
On Thu, Oct 24, 2019 at 02:33:27PM +0200, Greg Kurz wrote:
> On Thu, 24 Oct 2019 14:05:36 +1100
> David Gibson  wrote:
> 
> > On Wed, Oct 23, 2019 at 04:52:27PM +0200, Greg Kurz wrote:
> > > Now that the TCTX objects are children of the XIVE router, stop
> > > using CPU_FOREACH() when looking for a matching VCPU target.
> > > 
> > > Signed-off-by: Greg Kurz 
> > > ---
> > >  hw/intc/xive.c |  100 
> > > +++-
> > >  1 file changed, 62 insertions(+), 38 deletions(-)
> > > 
> > > diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> > > index 40ce43152456..ec5e7d0ee39a 100644
> > > --- a/hw/intc/xive.c
> > > +++ b/hw/intc/xive.c
> > > @@ -1403,55 +1403,79 @@ typedef struct XiveTCTXMatch {
> > >  uint8_t ring;
> > >  } XiveTCTXMatch;
> > >  
> > > -static bool xive_presenter_match(XiveRouter *xrtr, uint8_t format,
> > > - uint8_t nvt_blk, uint32_t nvt_idx,
> > > - bool cam_ignore, uint8_t priority,
> > > - uint32_t logic_serv, XiveTCTXMatch 
> > > *match)
> > > +typedef struct XivePresenterMatch {
> > > +uint8_t format;
> > > +uint8_t nvt_blk;
> > > +uint32_t nvt_idx;
> > > +bool cam_ignore;
> > > +uint8_t priority;
> > > +uint32_t logic_serv;
> > > +XiveTCTXMatch *match;
> > > +int count;
> > > +} XivePresenterMatch;
> > > +
> > > +static int do_xive_presenter_match(Object *child, void *opaque)
> > >  {
> > > -CPUState *cs;
> > > +XiveTCTX *tctx = XIVE_TCTX(child);
> > > +XivePresenterMatch *xpm = opaque;
> > > +int ring;
> > >  
> > >  /*
> > >   * TODO (PowerNV): handle chip_id overwrite of block field for
> > >   * hardwired CAM compares
> > >   */
> > >  
> > > -CPU_FOREACH(cs) {
> > > -XiveTCTX *tctx = xive_router_get_tctx(xrtr, cs);
> > > -int ring;
> > > +/*
> > > + * HW checks that the CPU is enabled in the Physical Thread
> > > + * Enable Register (PTER).
> > > + */
> > >  
> > > -/*
> > > - * Skip partially initialized vCPUs. This can happen when
> > > - * vCPUs are hotplugged.
> > > - */
> > > -if (!tctx) {
> > > -continue;
> > > +/*
> > > + * Check the thread context CAM lines and record matches. We
> > > + * will handle CPU exception delivery later
> > > + */
> > > +ring = xive_presenter_tctx_match(tctx, xpm->format, xpm->nvt_blk,
> > > + xpm->nvt_idx, xpm->cam_ignore,
> > > + xpm->logic_serv);
> > > +
> > > +/*
> > > + * Save the context and follow on to catch duplicates, that we
> > > + * don't support yet.
> > > + */
> > > +if (ring != -1) {
> > > +if (xpm->match->tctx) {
> > > +qemu_log_mask(LOG_GUEST_ERROR, "XIVE: already found a thread 
> > > "
> > > +  "context NVT %x/%x\n", xpm->nvt_blk, 
> > > xpm->nvt_idx);
> > > +return -1;
> > >  }
> > >  
> > > -/*
> > > - * HW checks that the CPU is enabled in the Physical Thread
> > > - * Enable Register (PTER).
> > > - */
> > > +xpm->match->ring = ring;
> > > +xpm->match->tctx = tctx;
> > > +xpm->count++;
> > > +}
> > >  
> > > -/*
> > > - * Check the thread context CAM lines and record matches. We
> > > - * will handle CPU exception delivery later
> > > - */
> > > -ring = xive_presenter_tctx_match(tctx, format, nvt_blk, nvt_idx,
> > > - cam_ignore, logic_serv);
> > > -/*
> > > - * Save the context and follow on to catch duplicates, that we
> > > - * don't support yet.
> > > - */
> > > -if (ring != -1) {
> > > -if (match->tctx) {
> > > -qemu_log_mask(LOG_GUEST_ERROR, "XIVE: already found a 
> > > thread "
> > > -  "context NVT %x/%x\n", nvt_blk, nvt_idx);
> > > -return false;
> > > -}
> > > -
> > > -match->ring = ring;
> > > -match->tctx = tctx;
> > > -}
> > > +return 0;
> > > +}
> > > +
> > > +static bool xive_presenter_match(XiveRouter *xrtr, uint8_t format,
> > > + uint8_t nvt_blk, uint32_t nvt_idx,
> > > + bool cam_ignore, uint8_t priority,
> > > + uint32_t logic_serv, XiveTCTXMatch 
> > > *match)
> > > +{
> > > +XivePresenterMatch xpm = {
> > > +.format = format,
> > > +.nvt_blk= nvt_blk,
> > > +.nvt_idx= nvt_idx,
> > > +.cam_ignore = cam_ignore,
> > > +.priority   = priority,
> > > +.logic_serv = logic_serv,
> > > +.match  = match,
> > > +.count  = 0,
> > > +};
> > > +
> > > +if 

Re: [PATCH v5 4/7] ppc/pnv: Add a PnvChip pointer to PnvCore

2019-10-27 Thread David Gibson
On Thu, Oct 24, 2019 at 11:57:05AM +0200, Cédric Le Goater wrote:
> On 24/10/2019 04:38, David Gibson wrote:
> > On Tue, Oct 22, 2019 at 06:38:09PM +0200, Cédric Le Goater wrote:
> >> We will use it to reset the interrupt presenter from the CPU reset
> >> handler.
> >>
> >> Signed-off-by: Cédric Le Goater 
> >> Reviewed-by: Greg Kurz 
> >> ---
> >>  include/hw/ppc/pnv_core.h | 3 +++
> >>  hw/ppc/pnv_core.c | 3 ++-
> >>  2 files changed, 5 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/include/hw/ppc/pnv_core.h b/include/hw/ppc/pnv_core.h
> >> index bfbd2ec42aa6..55eee95104da 100644
> >> --- a/include/hw/ppc/pnv_core.h
> >> +++ b/include/hw/ppc/pnv_core.h
> >> @@ -31,6 +31,8 @@
> >>  #define PNV_CORE_GET_CLASS(obj) \
> >>   OBJECT_GET_CLASS(PnvCoreClass, (obj), TYPE_PNV_CORE)
> >>  
> >> +typedef struct PnvChip PnvChip;
> >> +
> >>  typedef struct PnvCore {
> >>  /*< private >*/
> >>  CPUCore parent_obj;
> >> @@ -38,6 +40,7 @@ typedef struct PnvCore {
> >>  /*< public >*/
> >>  PowerPCCPU **threads;
> >>  uint32_t pir;
> >> +PnvChip *chip;
> > 
> > I don't love having this as a redundant encoding of the information
> > already in the property, since it raises the possibility of confusing
> > bugs if they ever got out of sync.
> 
> Indeed.
> 
> > It's not a huge deal, but it would be nice to at least to at least
> > consider either a) grabbing the property everywhere you need it (if
> > there aren't too many places) 
> 
> We need the chip at core creation and core reset to call the 
> interrupt chip handlers. These are not hot path but the pointer
> seemed practical.
> 
> > or b) customizing the property
> > definition so it's written directly into that field.
> 
> OK. That is better than just a link.

I guess.  If there are only two non hot path callers, it seems like it
might be simpler to just pull it out of the property at those places.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH v5 4/7] ppc/pnv: Add a PnvChip pointer to PnvCore

2019-10-27 Thread David Gibson
On Thu, Oct 24, 2019 at 06:48:23PM +0200, Greg Kurz wrote:
> On Thu, 24 Oct 2019 11:57:05 +0200
> Cédric Le Goater  wrote:
> 
> > On 24/10/2019 04:38, David Gibson wrote:
> > > On Tue, Oct 22, 2019 at 06:38:09PM +0200, Cédric Le Goater wrote:
> > >> We will use it to reset the interrupt presenter from the CPU reset
> > >> handler.
> > >>
> > >> Signed-off-by: Cédric Le Goater 
> > >> Reviewed-by: Greg Kurz 
> > >> ---
> > >>  include/hw/ppc/pnv_core.h | 3 +++
> > >>  hw/ppc/pnv_core.c | 3 ++-
> > >>  2 files changed, 5 insertions(+), 1 deletion(-)
> > >>
> > >> diff --git a/include/hw/ppc/pnv_core.h b/include/hw/ppc/pnv_core.h
> > >> index bfbd2ec42aa6..55eee95104da 100644
> > >> --- a/include/hw/ppc/pnv_core.h
> > >> +++ b/include/hw/ppc/pnv_core.h
> > >> @@ -31,6 +31,8 @@
> > >>  #define PNV_CORE_GET_CLASS(obj) \
> > >>   OBJECT_GET_CLASS(PnvCoreClass, (obj), TYPE_PNV_CORE)
> > >>  
> > >> +typedef struct PnvChip PnvChip;
> > >> +
> > >>  typedef struct PnvCore {
> > >>  /*< private >*/
> > >>  CPUCore parent_obj;
> > >> @@ -38,6 +40,7 @@ typedef struct PnvCore {
> > >>  /*< public >*/
> > >>  PowerPCCPU **threads;
> > >>  uint32_t pir;
> > >> +PnvChip *chip;
> > > 
> > > I don't love having this as a redundant encoding of the information
> > > already in the property, since it raises the possibility of confusing
> > > bugs if they ever got out of sync.
> > 
> > Indeed.
> > 
> > > It's not a huge deal, but it would be nice to at least to at least
> > > consider either a) grabbing the property everywhere you need it (if
> > > there aren't too many places) 
> > 
> > We need the chip at core creation and core reset to call the 
> > interrupt chip handlers. These are not hot path but the pointer
> > seemed practical.
> > 
> 
> FWIW this is also used at core destruction in this patch:
> 
> [1/3] ppc: Add intc_destroy() handlers to SpaprInterruptController/PnvChip
> https://patchwork.ozlabs.org/patch/1183158/
> 
> > > or b) customizing the property
> > > definition so it's written directly into that field.
> > 
> > OK. That is better than just a link.
> > 
> 
> I think David is suggesting to use object_property_add_link()
> instead of object_property_add_const_link() actually. Something
> like that:

TBH, I hadn't looked into the mechanics of how to do this that
closely.  Change below looks pretty reasonable, though.

> 
> diff --git a/include/hw/ppc/pnv_core.h b/include/hw/ppc/pnv_core.h
> index 55eee95104da..fce6d8d9b31b 100644
> --- a/include/hw/ppc/pnv_core.h
> +++ b/include/hw/ppc/pnv_core.h
> @@ -36,11 +36,11 @@ typedef struct PnvChip PnvChip;
>  typedef struct PnvCore {
>  /*< private >*/
>  CPUCore parent_obj;
> +PnvChip *chip;
>  
>  /*< public >*/
>  PowerPCCPU **threads;
>  uint32_t pir;
> -PnvChip *chip;
>  
>  MemoryRegion xscom_regs;
>  } PnvCore;
> diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
> index 68cc3c81aa75..90449d33e422 100644
> --- a/hw/ppc/pnv.c
> +++ b/hw/ppc/pnv.c
> @@ -1312,8 +1312,8 @@ static void pnv_chip_core_realize(PnvChip *chip, Error 
> **errp)
>  object_property_set_int(OBJECT(pnv_core),
>  pcc->core_pir(chip, core_hwid),
>  "pir", _fatal);
> -object_property_add_const_link(OBJECT(pnv_core), "chip",
> -   OBJECT(chip), _fatal);
> +object_property_set_link(OBJECT(pnv_core), OBJECT(chip), "chip",
> + _abort);
>  object_property_set_bool(OBJECT(pnv_core), true, "realized",
>   _fatal);
>  
> diff --git a/hw/ppc/pnv_core.c b/hw/ppc/pnv_core.c
> index 61b3d3ce2250..8562a9961845 100644
> --- a/hw/ppc/pnv_core.c
> +++ b/hw/ppc/pnv_core.c
> @@ -217,15 +217,6 @@ static void pnv_core_realize(DeviceState *dev, Error 
> **errp)
>  void *obj;
>  int i, j;
>  char name[32];
> -Object *chip;
> -
> -chip = object_property_get_link(OBJECT(dev), "chip", _err);
> -if (!chip) {
> -error_propagate_prepend(errp, local_err,
> -"required link 'chip' not found: ");
> -return;
> -}
> -pc->chip = PNV_CHIP(chip);
>  
>  pc->threads = g_new(PowerPCCPU *, cc->nr_threads);
>  for (i = 0; i < cc->nr_threads; i++) {
> @@ -323,6 +314,14 @@ static void pnv_core_class_init(ObjectClass *oc, void 
> *data)
>  dc->props = pnv_core_properties;
>  }
>  
> +static void pnv_core_instance_init(Object *obj)
> +{
> +object_property_add_link(obj, "chip", TYPE_PNV_CHIP,
> + (Object **) _CORE(obj)->chip,
> + qdev_prop_allow_set_link_before_realize,
> + 0, _abort);
> +}
> +
>  #define DEFINE_PNV_CORE_TYPE(family, cpu_model) \
>  {   \
>  .parent = TYPE_PNV_CORE,\
> @@ -335,6 +334,7 @@ static const TypeInfo pnv_core_infos[] = 

Re: [PATCH 3/6] ppc: Reparent presenter objects to the interrupt controller object

2019-10-27 Thread David Gibson
On Thu, Oct 24, 2019 at 11:04:53AM +0200, Greg Kurz wrote:
> On Thu, 24 Oct 2019 13:58:41 +1100
> David Gibson  wrote:
> 
> > On Wed, Oct 23, 2019 at 04:52:10PM +0200, Greg Kurz wrote:
> > > Each VCPU is associated to a presenter object within the interrupt
> > > controller, ie. TCTX for XIVE and ICP for XICS, but our current
> > > models put these objects below the VCPU, and we rely on CPU_FOREACH()
> > > to do anything that involves presenters.
> > > 
> > > This recently bit us with the CAM line matching logic in XIVE because
> > > CPU_FOREACH() can race with CPU hotplug and we ended up considering a
> > > VCPU that wasn't associated to a TCTX object yet. Other users of
> > > CPU_FOREACH() are 'info pic' for both XICS and XIVE. It is again very
> > > easy to crash QEMU with concurrent VCPU hotplug/unplug and 'info pic'.
> > > 
> > > Reparent the presenter objects to the corresponding interrupt controller
> > > object, ie. XIVE router or ICS, to make it clear they are not some extra
> > > data hanging from the CPU but internal XIVE or XICS entities. The CPU
> > > object now needs to explicitely take a reference on the presenter to
> > > ensure its pointer remains valid until unrealize time.
> > > 
> > > This will allow to get rid of CPU_FOREACH() and ease further improvements
> > > to the XIVE model.
> > > 
> > > This change doesn't impact section ids and is thus transparent to
> > > migration.
> > > 
> > > Signed-off-by: Greg Kurz 
> > 
> > 
> > Urgh.  I see why we want something like this, but reparenting the
> > presenters to the source modules (particularly for XICS) makes me
> > uncomfortable.
> > 
> > AFAICT the association here is *purely* about managing lifetimes, not
> > because those ICPs inherently have something to do with those ICSes.
> > Same with XIVE, although since they'll be on the same chip there's a
> > bit more logic to it.
> > 
> 
> I did it this way for XIVE because they are indeed on the same chip,
> but I agree it is questionable for XICS.
> 
> > For spapr we kinda-sorta treat the (single) ICS or XiveRouter object
> > as the "master" of the interrupt controller.  But that's not really
> 
> Agreed for XICS. On the other side, the XIVE controller chip really has
> a routing sub-engine (IVRE) and a presenter sub-engine (IVPE), and the
> TCTXs do reside in an SRAM within the IVPE. The XiveRouter type might
> be better named XiveController, and each instance to expose a XiveRouter
> and a XivePresenter interface. We have one per chip for PNV and only a
> single one for sPAPR.

Yes, that sounds like a reasonable approach for XIVE.

> 
> > accurate to the hardware, so I don't really want to push that way of
> > looking at it any further than it already is.
> > 
> > If we could make the presenters children of the machine (spapr) or
> > chip (pnv) that might make more sense?
> > 
> 
> It probably makes sense for XICS, not sure this makes things clearer
> for XIVE.
> 
> > I'm also not totally convinced that having the presenter as a child of
> > the cpu is inherently bad.  Yes we have bugs now, but maybe the right
> > fix *is* actually to do the NULL check on every CPU_FOREACH().
> > 
> > For comparison, the lapic on x86 machines is a child of the cpu, and I
> > believe they do have NULL checks on cpu->apic_state in various places
> > they use CPU_FOREACH().
> > 
> 
> I didn't want to go that way because I was finding CPU_FOREACH() to
> be fragile since all users are broken,

Hm, ok.  There aren't that many existing users though, right?

> but I can revisit that. Maybe
> worth consolidating the logic in a PRESENTER_FOREACH() macro in order
> to avoid future breakage with CPU_FOREACH() ?

That sounds worth considering at least, yes.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH v15 01/11] esp: move handle_ti_cmd() cleanup code to esp_do_dma().

2019-10-27 Thread Paolo Bonzini
On 26/10/19 18:45, Laurent Vivier wrote:
> To prepare following patches move do_cmd and DMA special case
> from handle_ti() to esp_do_dma().
> 
> This part of the code must be only executed with real DMA, not with
> pseudo-DMA. And PDMA is detected in esp_do_dma(), so move this part
> of the code in esp_do_dma(). We keep the code in handle_ti_cmd()
> in the case no DMA is done.
> 
> Signed-off-by: Laurent Vivier 
> ---
>  hw/scsi/esp.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
> index 841d79b60e..09b28cba17 100644
> --- a/hw/scsi/esp.c
> +++ b/hw/scsi/esp.c
> @@ -249,10 +249,19 @@ static void esp_do_dma(ESPState *s)
>  
>  len = s->dma_left;
>  if (s->do_cmd) {
> +/*
> + * handle_ti_cmd() case: esp_do_dma() is called only from
> + * handle_ti_cmd() with do_cmd != NULL (see the assert())
> + */
>  trace_esp_do_dma(s->cmdlen, len);
>  assert (s->cmdlen <= sizeof(s->cmdbuf) &&
>  len <= sizeof(s->cmdbuf) - s->cmdlen);
>  s->dma_memory_read(s->dma_opaque, >cmdbuf[s->cmdlen], len);
> +trace_esp_handle_ti_cmd(s->cmdlen);
> +s->ti_size = 0;
> +s->cmdlen = 0;
> +s->do_cmd = 0;
> +do_cmd(s, s->cmdbuf);
>  return;
>  }
>  if (s->async_len == 0) {
> @@ -373,8 +382,7 @@ static void handle_ti(ESPState *s)
>  s->dma_left = minlen;
>  s->rregs[ESP_RSTAT] &= ~STAT_TC;
>  esp_do_dma(s);
> -}
> -if (s->do_cmd) {
> +} else if (s->do_cmd) {
>  trace_esp_handle_ti_cmd(s->cmdlen);
>  s->ti_size = 0;
>  s->cmdlen = 0;
> 

Acked-by: Paolo Bonzini 



Re: [PATCH v15 03/11] esp: add pseudo-DMA as used by Macintosh

2019-10-27 Thread Paolo Bonzini
On 26/10/19 18:45, Laurent Vivier wrote:
> There is no DMA in Quadra 800, so the CPU reads/writes the data from the
> PDMA register (offset 0x100, ESP_PDMA in hw/m68k/q800.c) and copies them
> to/from the memory.
> 
> There is a nice assembly loop in the kernel to do that, see
> linux/drivers/scsi/mac_esp.c:MAC_ESP_PDMA_LOOP().
> 
> The start of the transfer is triggered by the DREQ interrupt (see linux
> mac_esp_send_pdma_cmd()), the CPU polls on the IRQ flag to start the
> transfer after a SCSI command has been sent (in Quadra 800 it goes
> through the VIA2, the via2-irq line and the vIFR register)
> 
> The Macintosh hardware includes hardware handshaking to prevent the CPU
> from reading invalid data or writing data faster than the peripheral
> device can accept it.
> 
> This is the "blind mode", and from the doc:
> "Approximate maximum SCSI transfer rates within a blocks are 1.4 MB per
> second for blind transfers in the Macintosh II"
> 
> Some references can be found in:
>   Apple Macintosh Family Hardware Reference, ISBN 0-201-19255-1
>   Guide to the Macintosh Family Hardware, ISBN-0-201-52405-8
> 
> Acked-by: Dr. David Alan Gilbert 
> Co-developed-by: Mark Cave-Ayland 
> Signed-off-by: Mark Cave-Ayland 
> Signed-off-by: Laurent Vivier 
> ---
>  include/hw/scsi/esp.h |  15 +++
>  hw/scsi/esp.c | 278 --
>  2 files changed, 284 insertions(+), 9 deletions(-)
> 
> diff --git a/include/hw/scsi/esp.h b/include/hw/scsi/esp.h
> index adab63d1c9..6ba47dac41 100644
> --- a/include/hw/scsi/esp.h
> +++ b/include/hw/scsi/esp.h
> @@ -14,10 +14,18 @@ typedef void (*ESPDMAMemoryReadWriteFunc)(void *opaque, 
> uint8_t *buf, int len);
>  
>  typedef struct ESPState ESPState;
>  
> +enum pdma_origin_id {
> +PDMA,
> +TI,
> +CMD,
> +ASYNC,
> +};
> +
>  struct ESPState {
>  uint8_t rregs[ESP_REGS];
>  uint8_t wregs[ESP_REGS];
>  qemu_irq irq;
> +qemu_irq irq_data;
>  uint8_t chip_id;
>  bool tchi_written;
>  int32_t ti_size;
> @@ -48,6 +56,12 @@ struct ESPState {
>  ESPDMAMemoryReadWriteFunc dma_memory_write;
>  void *dma_opaque;
>  void (*dma_cb)(ESPState *s);
> +uint8_t pdma_buf[32];
> +int pdma_origin;
> +uint32_t pdma_len;
> +uint32_t pdma_start;
> +uint32_t pdma_cur;
> +void (*pdma_cb)(ESPState *s);
>  };
>  
>  #define TYPE_ESP "esp"
> @@ -59,6 +73,7 @@ typedef struct {
>  /*< public >*/
>  
>  MemoryRegion iomem;
> +MemoryRegion pdma;
>  uint32_t it_shift;
>  ESPState esp;
>  } SysBusESPState;
> diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
> index 0230ede21d..f8fc30cccb 100644
> --- a/hw/scsi/esp.c
> +++ b/hw/scsi/esp.c
> @@ -38,6 +38,8 @@
>   * 
> http://www.ibiblio.org/pub/historic-linux/early-ports/Sparc/NCR/NCR89C100.txt
>   * and
>   * 
> http://www.ibiblio.org/pub/historic-linux/early-ports/Sparc/NCR/NCR53C9X.txt
> + *
> + * On Macintosh Quadra it is a NCR53C96.
>   */
>  
>  static void esp_raise_irq(ESPState *s)
> @@ -58,6 +60,16 @@ static void esp_lower_irq(ESPState *s)
>  }
>  }
>  
> +static void esp_raise_drq(ESPState *s)
> +{
> +qemu_irq_raise(s->irq_data);
> +}
> +
> +static void esp_lower_drq(ESPState *s)
> +{
> +qemu_irq_lower(s->irq_data);
> +}
> +
>  void esp_dma_enable(ESPState *s, int irq, int level)
>  {
>  if (level) {
> @@ -84,6 +96,30 @@ void esp_request_cancelled(SCSIRequest *req)
>  }
>  }
>  
> +static void set_pdma(ESPState *s, enum pdma_origin_id origin,
> + uint32_t index, uint32_t len)
> +{
> +s->pdma_origin = origin;
> +s->pdma_start = index;
> +s->pdma_cur = index;
> +s->pdma_len = len;
> +}
> +
> +static uint8_t *get_pdma_buf(ESPState *s)
> +{
> +switch (s->pdma_origin) {
> +case PDMA:
> +return s->pdma_buf;
> +case TI:
> +return s->ti_buf;
> +case CMD:
> +return s->cmdbuf;
> +case ASYNC:
> +return s->async_buf;
> +}
> +return NULL;
> +}
> +
>  static int get_cmd_cb(ESPState *s)
>  {
>  int target;
> @@ -125,7 +161,14 @@ static uint32_t get_cmd(ESPState *s, uint8_t *buf, 
> uint8_t buflen)
>  if (dmalen > buflen) {
>  return 0;
>  }
> -s->dma_memory_read(s->dma_opaque, buf, dmalen);
> +if (s->dma_memory_read) {
> +s->dma_memory_read(s->dma_opaque, buf, dmalen);
> +} else {
> +memcpy(s->pdma_buf, buf, dmalen);
> +set_pdma(s, PDMA, 0, dmalen);
> +esp_raise_drq(s);
> +return 0;
> +}
>  } else {
>  dmalen = s->ti_size;
>  if (dmalen > TI_BUFSZ) {
> @@ -177,6 +220,16 @@ static void do_cmd(ESPState *s, uint8_t *buf)
>  do_busid_cmd(s, [1], busid);
>  }
>  
> +static void satn_pdma_cb(ESPState *s)
> +{
> +if (get_cmd_cb(s) < 0) {
> +return;
> +}
> +if (s->pdma_cur != s->pdma_start) {
> +do_cmd(s, get_pdma_buf(s) + s->pdma_start);
> +}
> +}
> +
>  static 

Re: [PATCH v15 02/11] esp: move get_cmd() post-DMA code to get_cmd_cb()

2019-10-27 Thread Paolo Bonzini
On 26/10/19 18:45, Laurent Vivier wrote:
> This will be needed to implement pseudo-DMA
> 
> Signed-off-by: Laurent Vivier 
> ---
>  hw/scsi/esp.c | 46 +-
>  1 file changed, 29 insertions(+), 17 deletions(-)
> 
> diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
> index 09b28cba17..0230ede21d 100644
> --- a/hw/scsi/esp.c
> +++ b/hw/scsi/esp.c
> @@ -84,6 +84,34 @@ void esp_request_cancelled(SCSIRequest *req)
>  }
>  }
>  
> +static int get_cmd_cb(ESPState *s)
> +{
> +int target;
> +
> +target = s->wregs[ESP_WBUSID] & BUSID_DID;
> +
> +s->ti_size = 0;
> +s->ti_rptr = 0;
> +s->ti_wptr = 0;
> +
> +if (s->current_req) {
> +/* Started a new command before the old one finished.  Cancel it.  */
> +scsi_req_cancel(s->current_req);
> +s->async_len = 0;
> +}
> +
> +s->current_dev = scsi_device_find(>bus, 0, target, 0);
> +if (!s->current_dev) {
> +/* No such drive */
> +s->rregs[ESP_RSTAT] = 0;
> +s->rregs[ESP_RINTR] = INTR_DC;
> +s->rregs[ESP_RSEQ] = SEQ_0;
> +esp_raise_irq(s);
> +return -1;
> +}
> +return 0;
> +}
> +
>  static uint32_t get_cmd(ESPState *s, uint8_t *buf, uint8_t buflen)
>  {
>  uint32_t dmalen;
> @@ -108,23 +136,7 @@ static uint32_t get_cmd(ESPState *s, uint8_t *buf, 
> uint8_t buflen)
>  }
>  trace_esp_get_cmd(dmalen, target);
>  
> -s->ti_size = 0;
> -s->ti_rptr = 0;
> -s->ti_wptr = 0;
> -
> -if (s->current_req) {
> -/* Started a new command before the old one finished.  Cancel it.  */
> -scsi_req_cancel(s->current_req);
> -s->async_len = 0;
> -}
> -
> -s->current_dev = scsi_device_find(>bus, 0, target, 0);
> -if (!s->current_dev) {
> -// No such drive
> -s->rregs[ESP_RSTAT] = 0;
> -s->rregs[ESP_RINTR] = INTR_DC;
> -s->rregs[ESP_RSEQ] = SEQ_0;
> -esp_raise_irq(s);
> +if (get_cmd_cb(s) < 0) {
>  return 0;
>  }
>  return dmalen;
> 

Acked-by: Paolo Bonzini 



Re: [PATCH 0/5] hw/i386/pc: Extract pc_gsi_create() and pc_i8259_create()

2019-10-27 Thread Paolo Bonzini
On 26/10/19 16:55, Philippe Mathieu-Daudé wrote:
> In the patch 'hw/i386: Introduce the microvm machine type'
> Sergio uses gsi_handler(), so you need to drop 'hw/i386/pc:
> Reduce gsi_handler scope' to keep microvm building.

Indeed, and I also have patches on top to move those out of pc.c
(otherwise you cannot build CONFIG_MICROVM without either i440fx or q35).

Paolo



[PATCH] iotests: Test nbd client reconnect

2019-10-27 Thread Andrey Shinkevich
The stress test for an NBD client. The NBD server is disconnected after
a client write operation. The NBD client should reconnect and retry the
operation.

Suggested-by: Denis V. Lunev 
Signed-off-by: Andrey Shinkevich 
---
 tests/qemu-iotests/277 | 91 ++
 tests/qemu-iotests/277.out |  7 
 tests/qemu-iotests/group   |  1 +
 3 files changed, 99 insertions(+)
 create mode 100755 tests/qemu-iotests/277
 create mode 100644 tests/qemu-iotests/277.out

diff --git a/tests/qemu-iotests/277 b/tests/qemu-iotests/277
new file mode 100755
index 000..46a29b7
--- /dev/null
+++ b/tests/qemu-iotests/277
@@ -0,0 +1,91 @@
+#!/usr/bin/env python
+#
+# Test nbd client reconnect
+#
+# Copyright (c) 2019 Virtuozzo International GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+import os
+import sys
+import io
+import subprocess
+import iotests
+from iotests import file_path, log
+
+
+def start_server_NBD(nbd_sock, conf_file):
+srv = subprocess.Popen(["nbd-fault-injector.py", "--classic-negotiation",
+   nbd_sock, conf_file], stdout=subprocess.PIPE,
+   stderr=subprocess.STDOUT, universal_newlines=True)
+line = srv.stdout.readline()
+if "Listening on " in line:
+log('NBD server: started')
+else:
+log('NBD server: {}'.format(line.rstrip()))
+
+return srv
+
+
+def check_server_NBD(srv):
+exitcode = srv.wait(timeout=10)
+
+if exitcode < 0:
+sys.stderr.write('NBD server: ERROR %i\n' % (-exitcode))
+log(srv.communicate()[0])
+else:
+line = srv.stdout.readline()
+log('NBD server: ' + line.rstrip())
+
+os.remove(nbd_sock)
+os.remove(conf_file)
+
+
+def make_conf_file(conf_file, event):
+if os.path.exists(conf_file):
+os.remove(conf_file)
+
+with open(conf_file, "w+") as conff:
+conff.write("[inject-error]\nevent={}\nwhen=after".format(event))
+
+
+disk, nbd_sock = file_path('disk', 'nbd-sock')
+nbd_uri = 'nbd+unix:///?socket=' + nbd_sock
+if os.path.exists(nbd_sock):
+os.remove(nbd_sock)
+
+conf_file = os.path.join(iotests.test_dir, "nbd-fault-injector.conf")
+make_conf_file(conf_file, "data")
+srv = start_server_NBD(nbd_sock, conf_file)
+
+log('NBD client: QEMU-IO write')
+args = iotests.qemu_io_args + ['-f', 'raw', '-c', 'write -P 0x7 0 3M', nbd_uri]
+clt = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
+
+check_server_NBD(srv)
+
+make_conf_file(conf_file, "reply")
+srv = start_server_NBD(nbd_sock, conf_file)
+
+exitcode = clt.wait(timeout=10)
+if exitcode < 0:
+sys.stderr.write('qemu-io received signal %i: %s\n' %
+ (-exitcode, ' '.join(args)))
+
+for line in io.TextIOWrapper(clt.stdout, encoding="utf-8"):
+if "3 MiB" not in line:
+log('NBD client: ' + line.rstrip())
+
+check_server_NBD(srv)
diff --git a/tests/qemu-iotests/277.out b/tests/qemu-iotests/277.out
new file mode 100644
index 000..07e6e82
--- /dev/null
+++ b/tests/qemu-iotests/277.out
@@ -0,0 +1,7 @@
+NBD server: started
+NBD client: QEMU-IO write
+NBD server: Closing connection on rule match inject-error
+NBD server: started
+NBD client: Connection closed
+NBD client: wrote 3145728/3145728 bytes at offset 0
+NBD server: Closing connection on rule match inject-error
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index af322af..22ef1b8 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -282,3 +282,4 @@
 267 rw auto quick snapshot
 268 rw auto quick
 270 rw backing quick
+277 rw
-- 
1.8.3.1




Re: [RFC 1/3] WIP virtiofsd: import Linux header file

2019-10-27 Thread Stefan Hajnoczi
On Sat, Oct 26, 2019 at 05:49:11PM -0400, Michael S. Tsirkin wrote:
> On Fri, Oct 25, 2019 at 12:01:50PM +0200, Stefan Hajnoczi wrote:
> > tests/vhost-user-fs-test.c needs fuse.h.  The private copy that
> > virtiofsd has can be replaced with a properly imported file using
> > update-linux-headers.sh.
> > 
> > TODO rerun update-linux-headers.sh with upstream kernel tree!
> > 
> > Signed-off-by: Stefan Hajnoczi 
> 
> OK I would just add this with the virtiofsd patchset.

Yes, I'll talk to David.

Stefan


signature.asc
Description: PGP signature


Re: [RFC 0/3] block/file-posix: Work around XFS bug

2019-10-27 Thread Stefan Hajnoczi
On Fri, Oct 25, 2019 at 02:36:49PM +, Vladimir Sementsov-Ogievskiy wrote:
> 25.10.2019 17:19, Max Reitz wrote:
> > On 25.10.19 15:56, Vladimir Sementsov-Ogievskiy wrote:
> >> 25.10.2019 16:40, Vladimir Sementsov-Ogievskiy wrote:
> >>> 25.10.2019 12:58, Max Reitz wrote:
>  Hi,
> 
>  It seems to me that there is a bug in Linux’s XFS kernel driver, as
>  I’ve explained here:
> 
>  https://lists.nongnu.org/archive/html/qemu-block/2019-10/msg01429.html
> 
>  In combination with our commit c8bb23cbdbe32f, this may lead to guest
>  data corruption when using qcow2 images on XFS with aio=native.
> 
>  We can’t wait until the XFS kernel driver is fixed, we should work
>  around the problem ourselves.
> 
>  This is an RFC for two reasons:
>  (1) I don’t know whether this is the right way to address the issue,
>  (2) Ideally, we should detect whether the XFS kernel driver is fixed and
>    if so stop applying the workaround.
>    I don’t know how we would go about this, so this series doesn’t do
>    it.  (Hence it’s an RFC.)
>  (3) Perhaps it’s a bit of a layering violation to let the file-posix
>    driver access and modify a BdrvTrackedRequest object.
> 
>  As for how we can address the issue, I see three ways:
>  (1) The one presented in this series: On XFS with aio=native, we extend
>    tracked requests for post-EOF fallocate() calls (i.e., write-zero
>    operations) to reach until infinity (INT64_MAX in practice), mark
>    them serializing and wait for other conflicting requests.
> 
>    Advantages:
>    + Limits the impact to very specific cases
>      (And that means it wouldn’t hurt too much to keep this workaround
>      even when the XFS driver has been fixed)
>    + Works around the bug where it happens, namely in file-posix
> 
>    Disadvantages:
>    - A bit complex
>    - A bit of a layering violation (should file-posix have access to
>      tracked requests?)
> 
>  (2) Always skip qcow2’s handle_alloc_space() on XFS.  The XFS bug only
>    becomes visible due to that function: I don’t think qcow2 writes
>    zeroes in any other I/O path, and raw images are fixed in size so
>    post-EOF writes won’t happen.
> 
>    Advantages:
>    + Maybe simpler, depending on how difficult it is to handle the
>      layering violation
>    + Also fixes the performance problem of handle_alloc_space() being
>      slow on ppc64+XFS.
> 
>    Disadvantages:
>    - Huge layering violation because qcow2 would need to know whether
>      the image is stored on XFS or not.
>    - We’d definitely want to skip this workaround when the XFS driver
>      has been fixed, so we need some method to find out whether it has
> 
>  (3) Drop handle_alloc_space(), i.e. revert c8bb23cbdbe32f.
>    To my knowledge I’m the only one who has provided any benchmarks 
>  for
>    this commit, and even then I was a bit skeptical because it 
>  performs
>    well in some cases and bad in others.  I concluded that it’s
>    probably worth it because the “some cases” are more likely to 
>  occur.
> 
>    Now we have this problem of corruption here (granted due to a bug 
>  in
>    the XFS driver), and another report of massively degraded
>    performance on ppc64
>    (https://bugzilla.redhat.com/show_bug.cgi?id=1745823 – sorry, a
>    private BZ; I hate that :-/  The report is about 40 % worse
>    performance for an in-guest fio write benchmark.)
> 
>    So I have to ask the question about what the justification for
>    keeping c8bb23cbdbe32f is.  How much does performance increase with
>    it actually?  (On non-(ppc64+XFS) machines, obviously)
> 
>    Advantages:
>    + Trivial
>    + No layering violations
>    + We wouldn’t need to keep track of whether the kernel bug has been
>      fixed or not
>    + Fixes the ppc64+XFS performance problem
> 
>    Disadvantages:
>    - Reverts cluster allocation performance to pre-c8bb23cbdbe32f
>      levels, whatever that means
> 
>  So this is the main reason this is an RFC: What should we do?  Is (1)
>  really the best choice?
> 
> 
>  In any case, I’ve ran the test case I showed in
>  https://lists.nongnu.org/archive/html/qemu-block/2019-10/msg01282.html
>  more than ten times with this series applied and the installation
>  succeeded every time.  (Without this series, it fails like every other
>  time.)
> 
> 
> >>>
> >>> Hi!
> >>>
> >>> First, great thanks for your investigation!
> >>>
> >>> We need 

Re: [RFC 0/3] block/file-posix: Work around XFS bug

2019-10-27 Thread Stefan Hajnoczi
On Fri, Oct 25, 2019 at 11:58:46AM +0200, Max Reitz wrote:
> As for how we can address the issue, I see three ways:
> (1) The one presented in this series: On XFS with aio=native, we extend
> tracked requests for post-EOF fallocate() calls (i.e., write-zero
> operations) to reach until infinity (INT64_MAX in practice), mark
> them serializing and wait for other conflicting requests.
> 
> Advantages:
> + Limits the impact to very specific cases
>   (And that means it wouldn’t hurt too much to keep this workaround
>   even when the XFS driver has been fixed)
> + Works around the bug where it happens, namely in file-posix
> 
> Disadvantages:
> - A bit complex
> - A bit of a layering violation (should file-posix have access to
>   tracked requests?)

Your patch series is reasonable.  I don't think it's too bad.

The main question is how to detect the XFS fix once it ships.  XFS
already has a ton of ioctls, so maybe they don't mind adding a
feature/quirk bit map ioctl for publishing information about bug fixes
to userspace.  I didn't see another obvious way of doing it, maybe a
mount option that the kernel automatically sets and that gets reported
to userspace?

If we imagine that XFS will not provide a mechanism to detect the
presence of the fix, then could we ask QEMU package maintainers to
./configure --disable-xfs-fallocate-beyond-eof-workaround at some point
in the future when their distro has been shipping a fixed kernel for a
while?  It's ugly because it doesn't work if the user installs an older
custom-built kernel on the host.  But at least it will cover 98% of
users...

> (3) Drop handle_alloc_space(), i.e. revert c8bb23cbdbe32f.
> To my knowledge I’m the only one who has provided any benchmarks for
> this commit, and even then I was a bit skeptical because it performs
> well in some cases and bad in others.  I concluded that it’s
> probably worth it because the “some cases” are more likely to occur.
> 
> Now we have this problem of corruption here (granted due to a bug in
> the XFS driver), and another report of massively degraded
> performance on ppc64
> (https://bugzilla.redhat.com/show_bug.cgi?id=1745823 – sorry, a
> private BZ; I hate that :-/  The report is about 40 % worse
> performance for an in-guest fio write benchmark.)
> 
> So I have to ask the question about what the justification for
> keeping c8bb23cbdbe32f is.  How much does performance increase with
> it actually?  (On non-(ppc64+XFS) machines, obviously)
> 
> Advantages:
> + Trivial
> + No layering violations
> + We wouldn’t need to keep track of whether the kernel bug has been
>   fixed or not
> + Fixes the ppc64+XFS performance problem
> 
> Disadvantages:
> - Reverts cluster allocation performance to pre-c8bb23cbdbe32f
>   levels, whatever that means

My favorite because it is clean and simple, but Vladimir has a valid
use-case for requiring this performance optimization so reverting isn't
an option.

Stefan


signature.asc
Description: PGP signature


Re: [PATCH v2] qemu-img.texi: Describe data_file and data_file_raw

2019-10-27 Thread Stefan Hajnoczi
On Mon, Oct 21, 2019 at 09:14:21AM +0800, Han Han wrote:
> https://bugzilla.redhat.com/show_bug.cgi?id=1763105
> 
> Signed-off-by: Han Han 
> ---
>  qemu-img.texi | 10 ++
>  1 file changed, 10 insertions(+)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH v7 0/8] Packed virtqueue for virtio

2019-10-27 Thread Stefan Hajnoczi
On Fri, Oct 25, 2019 at 07:34:03AM -0400, Michael S. Tsirkin wrote:
> On Fri, Oct 25, 2019 at 10:35:19AM +0200, Eugenio Pérez wrote:
> > Hi:
> > 
> > This is an updated version of packed virtqueue support based on Wei and 
> > Jason's
> > V6, mainly solving the clang leak detector error CI gave.
> 
> 
> Looks good, I will queue this up.
> 
> It would be nice to add libqos based tests on top,
> based on Stefan's work.

Packed virtqueue support in libqos would be nice now that we have VIRTIO
1.0 support.  I think it could be added nicely now.

Writing a low-level virtqueue layout test is also possible if you want
to test QEMU's packed virtqueue code.  For example, a test case could
trigger error code paths by creating invalid rings/descriptors.

Stefan


signature.asc
Description: PGP signature


[PATCH] qom/type_initialize_interface: inherit .class_data from the template TypeInfo

2019-10-27 Thread Maciej Bielski
The `TypeInfo::class_data` value of an interface is never properly
propagated for initializations of all types implementing the interface.
Although interfaces are rarely used, IMHO their functionality is
currently a bit incomplete.

A `TypeInfo fooable_info` of an interface "fooable" may have the
`.class_base_init` callback and the `.class_data` field defined. Let's
assume that `.class_data = 0xdeadbeef`. Additionally, the `fooable_info`
is only used to allocate one associated instance of `TypeImpl`, let's
say `fooable_impl`.

Then, for each type `TypeInfo xyz_info`, that implements the interface
"fooable", there is another `TypeInfo info` (and associated TypeImpl)
automatically created within `type_initialize_interface()`. The
automatic `info` reflects the fact that the interface "fooable" is
implemented by `xyz_info` (for example by having
`.name="xyz::fooable"`). In a way, the `info` inherits from
`fooable_impl`, for example it sets `.parent` field accordingly.

The problem is that this inheritance is fixed by the implementation of
`type_initialize_interface` and the `info.class_data` is always `NULL`.
Further, this NULL value is passed to `fooable_info::class_base_init()`,
where actually a common-sense expectation would be to have the
`0xdeadbeef` from the interface definition.

The fix below seems to be the easiest solution. Another option would be
to dereference `klass->type->parent_type->...->parent_type->class_data`
but the `TypeImpl` definition is (perhaps on purpose) private to
`qom/object.c`.

Signed-off-by: Maciej Bielski 
---
 qom/object.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/qom/object.c b/qom/object.c
index aac08351b7..dc305e14b0 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -248,6 +248,7 @@ static void type_initialize_interface(TypeImpl *ti, 
TypeImpl *interface_type,
 TypeImpl *iface_impl;
 
 info.parent = parent_type->name;
+info.class_data = parent_type->class_data;
 info.name = g_strdup_printf("%s::%s", ti->name, interface_type->name);
 info.abstract = true;
 
-- 
2.21.0




Re: [PATCH 00/30] virtiofs daemon (base)

2019-10-27 Thread Michael S. Tsirkin
On Mon, Oct 21, 2019 at 11:58:02AM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" 
> 
> Hi,
>   This is the 1st set for the virtiofsd - a daemon
> that implements the user space side of virtiofs.
> 
>   The kernel and qemu device parts recently went in,
> so the daemon is the only thing missing to have a working
> set.
> 
>   This set is the absolute minimal base set of patches;
> it's not yet safe to use (from security or correctness);

So based on this I'm guessing we don't want to rush this in
before the soft freeze.
If you have a different opinion, let me know pls.

-- 
MST



Re: [PATCH v20 0/5] Add ARMv8 RAS virtualization support in QEMU

2019-10-27 Thread Michael S. Tsirkin
On Sat, Oct 26, 2019 at 11:24:42AM +0800, Xiang Zheng wrote:
> In the ARMv8 platform, the CPU error types are synchronous external abort(SEA)
> and SError Interrupt (SEI). If exception happens in guest, sometimes it's 
> better
> for guest to perform the recovery, because host does not know the detailed
> information of guest. For example, if an exception happens in a user-space
> application within guest, host does not know which application encounters
> errors.
> 
> For the ARMv8 SEA/SEI, KVM or host kernel delivers SIGBUS to notify userspace.
> After user space gets the notification, it will record the CPER into guest 
> GHES
> buffer and inject an exception or IRQ into guest.
> 
> In the current implementation, if the type of SIGBUS is BUS_MCEERR_AR, we will
> treat it as a synchronous exception, and notify guest with ARMv8 SEA
> notification type after recording CPER into guest.
> 
> This series of patches are based on Qemu 4.1, which include two parts:
> 1. Generate APEI/GHES table.
> 2. Handle the SIGBUS signal, record the CPER in runtime and fill it into guest
>memory, then notify guest according to the type of SIGBUS.
> 
> The whole solution was suggested by James(james.mo...@arm.com); The solution 
> of
> APEI section was suggested by Laszlo(ler...@redhat.com).
> Show some discussions in [1].
> 
> This series of patches have already been tested on ARM64 platform with RAS
> feature enabled:
> Show the APEI part verification result in [2].
> Show the BUS_MCEERR_AR SIGBUS handling verification result in [3].


This looks mostly OK to me.  I sent some minor style comments but they
can be addressed by follow up patches.

Maybe it's a good idea to merge this before soft freeze to make sure it
gets some testing.  I'll leave this decision to the ARM maintainer.  For
ACPI parts:

Reviewed-by: Michael S. Tsirkin 


> ---
> Change since v19:
> 1. Fix clang compile error
> 2. Fix sphinx build error
> 
> Change since v18:
> 1. Fix some code-style and typo/grammar problems.
> 2. Remove no_ras in the VirtMachineClass struct.
> 3. Convert documentation to rst format.
> 4. Simplize the code and add comments for some magic value.
> 5. Move kvm_inject_arm_sea() function into the patch where it's used.
> 6. Register the reset handler(kvm_unpoison_all()) in the kvm_init() function.
> 
> Change since v17:
> 1. Improve some commit messages and comments.
> 2. Fix some code-style problems.
> 3. Add a *ras* machine option.
> 4. Move HEST/GHES related structures and macros into "hw/acpi/acpi_ghes.*".
> 5. Move HWPoison page functions into "include/sysemu/kvm_int.h".
> 6. Fix some bugs.
> 7. Improve the design document.
> 
> Change since v16:
> 1. check whether ACPI table is enabled when handling the memory error in the 
> SIGBUS handler.
> 
> Change since v15:
> 1. Add a doc-comment in the proper format for 'include/exec/ram_addr.h'
> 2. Remove write_part_cpustate_to_list() because there is another bug fix patch
>has been merged "arm: Allow system registers for KVM guests to be changed 
> by QEMU code"
> 3. Add some comments for kvm_inject_arm_sea() in 'target/arm/kvm64.c'
> 4. Compare the arm_current_el() return value to 0,1,2,3, not to PSTATE_MODE_* 
> constants.
> 5. Change the RAS support wasn't introduced before 4.1 QEMU version.
> 6. Move the no_ras flag  patch to begin in this series
> 
> Change since v14:
> 1. Remove the BUS_MCEERR_AO handling logic because this asynchronous signal 
> was masked by main thread
> 2. Address some Igor Mammedov's comments(ACPI part)
>1) change the comments for the enum AcpiHestNotifyType definition and 
> remove ditto in patch 1
>2) change some patch commit messages and separate "APEI GHES table 
> generation" patch to more patches.
> 3. Address some peter's comments(arm64 Synchronous External Abort injection)
>1) change some code notes
>2) using arm_current_el() for current EL
>2) use the helper functions for those (syn_data_abort_*).
> 
> Change since v13:
> 1. Move the patches that set guest ESR and inject virtual SError out of this 
> series
> 2. Clean and optimize the APEI part patches
> 3. Update the commit messages and add some comments for the code
> 
> Change since v12:
> 1. Address Paolo's comments to move HWPoisonPage definition to 
> accel/kvm/kvm-all.c
> 2. Only call kvm_cpu_synchronize_state() when get the BUS_MCEERR_AR signal
> 3. Only add and enable GPIO-Signal and ARMv8 SEA two hardware error sources
> 4. Address Michael's comments to not sync SPDX from Linux kernel header file
> 
> Change since v11:
> Address James's comments(james.mo...@arm.com)
> 1. Check whether KVM has the capability to to set ESR instead of detecting 
> host CPU RAS capability
> 2. For SIGBUS_MCEERR_AR SIGBUS, use Synchronous-External-Abort(SEA) 
> notification type
>for SIGBUS_MCEERR_AO SIGBUS, use GPIO-Signal notification
> 
> 
> Address Shannon's comments(for ACPI part):
> 1. Unify hest_ghes.c and hest_ghes.h license declaration
> 2. Remove unnecessary 

Re: [PATCH v20 3/5] ACPI: Add APEI GHES table generation support

2019-10-27 Thread Michael S. Tsirkin
On Sat, Oct 26, 2019 at 11:24:45AM +0800, Xiang Zheng wrote:
> From: Dongjiu Geng 
> 
> This patch implements APEI GHES Table generation via fw_cfg blobs. Now
> it only supports ARMv8 SEA, a type of GHESv2 error source. Afterwards,
> we can extend the supported types if needed. For the CPER section,
> currently it is memory section because kernel mainly wants userspace to
> handle the memory errors.
> 
> This patch follows the spec ACPI 6.2 to build the Hardware Error Source
> table. For more detailed information, please refer to document:
> docs/specs/acpi_hest_ghes.rst
> 
> Suggested-by: Laszlo Ersek 
> Signed-off-by: Dongjiu Geng 
> Signed-off-by: Xiang Zheng 
> ---
>  default-configs/arm-softmmu.mak |   1 +
>  hw/acpi/Kconfig |   4 +
>  hw/acpi/Makefile.objs   |   1 +
>  hw/acpi/acpi_ghes.c | 217 
>  hw/acpi/aml-build.c |   2 +
>  hw/arm/virt-acpi-build.c|  12 ++
>  include/hw/acpi/acpi_ghes.h | 106 
>  include/hw/acpi/aml-build.h |   1 +
>  8 files changed, 344 insertions(+)
>  create mode 100644 hw/acpi/acpi_ghes.c
>  create mode 100644 include/hw/acpi/acpi_ghes.h
> 
> diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
> index 1f2e0e7fde..5722f3130e 100644
> --- a/default-configs/arm-softmmu.mak
> +++ b/default-configs/arm-softmmu.mak
> @@ -40,3 +40,4 @@ CONFIG_FSL_IMX25=y
>  CONFIG_FSL_IMX7=y
>  CONFIG_FSL_IMX6UL=y
>  CONFIG_SEMIHOSTING=y
> +CONFIG_ACPI_APEI=y
> diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
> index 12e3f1e86e..ed8c34d238 100644
> --- a/hw/acpi/Kconfig
> +++ b/hw/acpi/Kconfig
> @@ -23,6 +23,10 @@ config ACPI_NVDIMM
>  bool
>  depends on ACPI
>  
> +config ACPI_APEI
> +bool
> +depends on ACPI
> +
>  config ACPI_PCI
>  bool
>  depends on ACPI && PCI
> diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
> index 655a9c1973..84474b0ca8 100644
> --- a/hw/acpi/Makefile.objs
> +++ b/hw/acpi/Makefile.objs
> @@ -5,6 +5,7 @@ common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu_hotplug.o
>  common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
>  common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o
>  common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
> +common-obj-$(CONFIG_ACPI_APEI) += acpi_ghes.o
>  common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o
>  common-obj-$(CONFIG_ACPI_HW_REDUCED) += generic_event_device.o
>  common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o
> diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
> new file mode 100644
> index 00..23f8a9928c
> --- /dev/null
> +++ b/hw/acpi/acpi_ghes.c
> @@ -0,0 +1,217 @@
> +/*
> + * Support for generating APEI tables and recording CPER for Guests
> + *
> + * Copyright (c) 2019 HUAWEI TECHNOLOGIES CO., LTD.
> + *
> + * Author: Dongjiu Geng 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> +
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> +
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see .
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/acpi/acpi.h"
> +#include "hw/acpi/aml-build.h"
> +#include "hw/acpi/acpi_ghes.h"
> +#include "hw/nvram/fw_cfg.h"
> +#include "sysemu/sysemu.h"
> +#include "qemu/error-report.h"
> +
> +/*
> + * Hardware Error Notification
> + * ACPI 4.0: 17.3.2.7 Hardware Error Notification
> + */
> +static void acpi_ghes_build_notify(GArray *table, const uint8_t type)
> +{
> +/* Type */
> +build_append_int_noprefix(table, type, 1);
> +/*
> + * Length:
> + * Total length of the structure in bytes
> + */
> +build_append_int_noprefix(table, 28, 1);
> +/* Configuration Write Enable */
> +build_append_int_noprefix(table, 0, 2);
> +/* Poll Interval */
> +build_append_int_noprefix(table, 0, 4);
> +/* Vector */
> +build_append_int_noprefix(table, 0, 4);
> +/* Switch To Polling Threshold Value */
> +build_append_int_noprefix(table, 0, 4);
> +/* Switch To Polling Threshold Window */
> +build_append_int_noprefix(table, 0, 4);
> +/* Error Threshold Value */
> +build_append_int_noprefix(table, 0, 4);
> +/* Error Threshold Window */
> +build_append_int_noprefix(table, 0, 4);
> +}
> +
> +/* Build table for the hardware error fw_cfg blob */
> +void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker)
> +{
> +int i, error_status_block_offset;
> +
> +   

[Bug 1848901] Re: kvm_mem_ioeventfd_add: error adding ioeventfd: No space left on device (28)

2019-10-27 Thread P.O.
As a sidenote, while running the same ASAv in "GNS3 VM.ova" in oracle 
virtualbox in the same desktop computer, which apparently uses QEMU 3.1.0, it 
does work correctly.
However I would really like it to work without the VM inbetween directly in my 
OS QEMU 4 :-)

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1848901

Title:
  kvm_mem_ioeventfd_add: error adding ioeventfd: No space left on device
  (28)

Status in QEMU:
  New

Bug description:
  => QEMU process has stopped, return code: -6

  Start QEMU with /usr/bin/qemu-system-x86_64 -name CiscoASAv9.8.1-1 -m
  2048M -smp cpus=1 -enable-kvm -machine smm=off -boot order=c -drive
  'file=/home/deemon/GNS3/projects/ASAv my ass/project-files/qemu
  /7725cdea-5e66-4777-b4dd-
  c3905f258394/hda_disk.qcow2,if=virtio,index=0,media=disk,id=drive0'
  -uuid 7725cdea-5e66-4777-b4dd-c3905f258394 -serial
  telnet:127.0.0.1:5000,server,nowait -monitor
  tcp:127.0.0.1:44629,server,nowait -net none -device
  e1000,mac=0c:7a:1d:83:94:00,netdev=gns3-0 -netdev
  socket,id=gns3-0,udp=127.0.0.1:10001,localaddr=127.0.0.1:1 -device
  e1000,mac=0c:7a:1d:83:94:01,netdev=gns3-1 -netdev
  socket,id=gns3-1,udp=127.0.0.1:10003,localaddr=127.0.0.1:10002 -device
  e1000,mac=0c:7a:1d:83:94:02,netdev=gns3-2 -netdev
  socket,id=gns3-2,udp=127.0.0.1:10005,localaddr=127.0.0.1:10004 -device
  e1000,mac=0c:7a:1d:83:94:03,netdev=gns3-3 -netdev
  socket,id=gns3-3,udp=127.0.0.1:10007,localaddr=127.0.0.1:10006 -device
  e1000,mac=0c:7a:1d:83:94:04,netdev=gns3-4 -netdev
  socket,id=gns3-4,udp=127.0.0.1:10009,localaddr=127.0.0.1:10008 -device
  e1000,mac=0c:7a:1d:83:94:05,netdev=gns3-5 -netdev
  socket,id=gns3-5,udp=127.0.0.1:10011,localaddr=127.0.0.1:10010 -device
  e1000,mac=0c:7a:1d:83:94:06,netdev=gns3-6 -netdev
  socket,id=gns3-6,udp=127.0.0.1:10013,localaddr=127.0.0.1:10012 -device
  e1000,mac=0c:7a:1d:83:94:07,netdev=gns3-7 -netdev
  socket,id=gns3-7,udp=127.0.0.1:10015,localaddr=127.0.0.1:10014
  -nographic

   
  Execution log:
  kvm_mem_ioeventfd_add: error adding ioeventfd: No space left on device (28)

  and then it just closes...


  [deemon@Zen ~]$ coredumpctl info 8638
 PID: 8638 (qemu-system-x86)
 UID: 1000 (deemon)
 GID: 1000 (deemon)
  Signal: 6 (ABRT)
   Timestamp: Sun 2019-10-20 04:27:29 EEST (5min ago)
Command Line: /usr/bin/qemu-system-x86_64 -name CiscoASAv9.8.1-1 -m 2048M 
-smp cpus=1 -enable-kvm -machine smm=off -boot order=c -drive 
file=/home/deemon/GNS3/projects/ASAv my ass/project-files/qemu>
  Executable: /usr/bin/qemu-system-x86_64
   Control Group: /user.slice/user-1000.slice/session-2.scope
Unit: session-2.scope
   Slice: user-1000.slice
 Session: 2
   Owner UID: 1000 (deemon)
 Boot ID: cd30f69a8d194359a31889dc7b6b026c
  Machine ID: d0a2d74a5cd9430797d902f5237c448d
Hostname: Zen
 Storage: 
/var/lib/systemd/coredump/core.qemu-system-x86.1000.cd30f69a8d194359a31889dc7b6b026c.8638.157153484900.lz4
 (truncated)
 Message: Process 8638 (qemu-system-x86) of user 1000 dumped core.
  
  Stack trace of thread 8642:
  #0  0x7f1a33609f25 n/a (n/a)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1848901/+subscriptions



Re: [PATCH v3 00/20] hw/i386/pc: Split PIIX3 southbridge from i440FX northbridge

2019-10-27 Thread Aleksandar Markovic
On Saturday, October 26, 2019, Philippe Mathieu-Daudé 
wrote:

> Changes since v2 [0]:
> - Use a #define
> - Reword one description
> - Added review tags (thanks all for reviewing!)
>
> Changes since v1 [1]:
> - Removed patch reintroducing DO_UPCAST() use (thuth)
> - Took various patches out to reduce series (thuth)
> - Added review tags (thanks all for reviewing!)
>
> $ git backport-diff -u pc_split_i440fx_piix-v2
> Key:
> [] : patches are identical
> [] : number of functional differences between upstream/downstream patch
> [down] : patch is downstream-only
> The flags [FC] indicate (F)unctional and (C)ontextual differences,
> respectively
>
> 001/20:[] [--] 'MAINTAINERS: Keep PIIX4 South Bridge separate from PC
> Chipsets'
> 002/20:[0004] [FC] 'piix4: Add the Reset Control Register'
> 003/20:[0002] [FC] 'piix4: Add an i8259 Interrupt Controller as specified
> in datasheet'
> 004/20:[] [--] 'Revert "irq: introduce qemu_irq_proxy()"'
> 005/20:[] [--] 'piix4: Rename PIIX4 object to piix4-isa'
> 006/20:[] [--] 'piix4: Add an i8257 DMA Controller as specified in
> datasheet'
> 007/20:[] [-C] 'piix4: Add an i8254 PIT Controller as specified in
> datasheet'
> 008/20:[0004] [FC] 'piix4: Add a MC146818 RTC Controller as specified in
> datasheet'
> 009/20:[] [--] 'hw/mips/mips_malta: Create IDE hard drive array
> dynamically'
> 010/20:[] [--] 'hw/mips/mips_malta: Extract the PIIX4 creation code as
> piix4_create()'
> 011/20:[] [-C] 'hw/isa/piix4: Move piix4_create() to hw/isa/piix4.c'
> 012/20:[] [--] 'hw/i386: Remove obsolete LoadStateHandler::load_state_old
> handlers'
> 013/20:[] [--] 'hw/pci-host/piix: Extract piix3_create()'
> 014/20:[0002] [FC] 'hw/pci-host/piix: Move RCR_IOPORT register definition'
> 015/20:[] [--] 'hw/pci-host/piix: Define and use the PIIX IRQ Route
> Control Registers'
> 016/20:[] [-C] 'hw/pci-host/piix: Move i440FX declarations to
> hw/pci-host/i440fx.h'
> 017/20:[] [--] 'hw/pci-host/piix: Fix code style issues'
> 018/20:[] [--] 'hw/pci-host/piix: Extract PIIX3 functions to
> hw/isa/piix3.c'
> 019/20:[] [--] 'hw/pci-host: Rename incorrectly named 'piix' as
> 'i440fx''
> 020/20:[0004] [FC] 'hw/pci-host/i440fx: Remove the last PIIX3 traces'
>
> Previous cover:
>
> This series is a rework of "piix4: cleanup and improvements" [2]
> from Hervé, and my "remove i386/pc dependency: PIIX cleanup" [3].
>
> Still trying to remove the strong X86/PC dependency 2 years later,
> one step at a time.
> Here we split the PIIX3 southbridge from i440FX northbridge.
> The i440FX northbridge is only used by the PC machine, while the
> PIIX southbridge is also used by the Malta MIPS machine.
>
> This is also a step forward using KConfig with the Malta board.
> Without this split, it was impossible to compile the Malta without
> pulling various X86 pieces of code.
>
> The overall design cleanup is not yet perfect, but enough to post
> as a series.
>
> Now that the PIIX3 code is extracted, the code duplication with the
> PIIX4 chipset is obvious. Not worth improving for now because it
> isn't broken.
>
> Based-on: <1572097538-18898-1-git-send-email-pbonz...@redhat.com>
> to include:
> mc146818rtc: Allow call object_initialize(MC146818_RTC) instead of
> rtc_init()
> 20191018133547.10936-1-philmd@redhat.com">https://mid.mail-archive.com/20191018133547.10936-1-philmd@redhat.com
>
> Since Aleksandar offered me to send the pull request [4] I plan to do
> it once Paolo's pull [5] is merged.
>
>
Philippe,

I attempted the other day the integration of v2 of this series into MIPS
pull request, but couldn't do it - since another series of yours was
already merged, acting on the same code, making rebasing difficult. Now
this, v3, series can't be applied since certain patches in some, on
surface, unrelated series aren't megred, and v3 assumes they are merged.

If you send a series, it should preferably be based on the latest (current)
code base, not on some imagined future state.

Why did you create this such mess with interdependencies of your own
multiple series, and just right before softfreeze? :( You should have
distributed submitting those series over longer time interval, and
absolutely avoid, if possible, this hectic around-softfreeze period. You
did the opposite: waited for softfreeze to become close, and sent several
interdependant series in matter of days - creating stress without any real
technical reason.

In case you, for any reason, can't complete this by softfreeze, I advice
you not to rush, and postpone the integration to 5.0.

Thanks,
Aleksandar



> Thanks,
>
> Phil.
>
> CI results:
> https://travis-ci.org/philmd/qemu/builds/603253987
> https://app.shippable.com/github/philmd/qemu/runs/550/summary/console
>
> [0] https://lists.gnu.org/archive/html/qemu-devel/2019-10/msg04662.html
> [1] https://lists.gnu.org/archive/html/qemu-devel/2019-10/msg03685.html
> [2] https://www.mail-archive.com/qemu-devel@nongnu.org/msg500737.html
> [3] 

[Bug 1759522] Re: windows qemu-img create vpc/vhdx error

2019-10-27 Thread Adam Baxter
Can confirm this is still an issue with 4.1.0.

?field.comment=Can confirm this is still an issue with 4.1.0.


** Attachment added: "Files created on Windows and Debian"
   
https://bugs.launchpad.net/qemu/+bug/1759522/+attachment/5300530/+files/vhdx_comparison.7z

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1759522

Title:
  windows qemu-img create vpc/vhdx error

Status in QEMU:
  New

Bug description:
  On windows, using qemu-img (version 2.11.90) to create vpc/vhdx
  virtual disk tends to fail. Here's the way to reproduce:

  1. Install qemu-w64-setup-20180321.exe

  2. Use `qemu-img create -f vhdx -o subformat=fixed disk.vhdx 512M` to create 
a vhdx:
 Formatting 'disk.vhdx', fmt=vhdx size=536870912 log_size=1048576 
block_size=0 subformat=fixed

  3. Execute `qemu-img info disk.vhdx` gives the result, (note the `disk size` 
is incorrect):
 image: disk.vhdx
 file format: vhdx
 virtual size: 512M (536870912 bytes)
 disk size: 1.4M
 cluster_size: 8388608

  4. On Windows 10 (V1709), double click disk.vhdx gives an error:
 Make sure the file is in an NTFS volume and isn't in a compressed folder 
or volume.

 Using Disk Management -> Action -> Attach VHD gives an error:
 The requested operation could not be completed due to a virtual disk 
system limitation. Virtual hard disk files must be uncompressed and uneccrypted 
and must not be sparse.

  Comparison with Windows 10 created VHDX:

  1. Using Disk Management -> Action -> Create VHD:
 File name: win.vhdx
 Virtual hard disk size: 512MB
 Virtual hard disk format: VHDX
 Virtual hard disk type: Fixed size

  2. Detach VHDX

  3. Execute `qemu-img info win.vhdx` gives the result:
 image: win.vhdx
 file format: vhdx
 virtual size: 512M (536870912 bytes)
 disk size: 516M
 cluster_size: 33554432

  Comparison with qemu-img under Ubuntu:

  1. Version: qemu-img version 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.16),
  Copyright (c) 2004-2008 Fabrice Bellard

  2. qemu-img create -f vhdx -o subformat=fixed lin.vhdx 512M
 Formatting 'lin.vhdx', fmt=vhdx size=536870912 log_size=1048576 
block_size=0 subformat=fixed

  3. qemu-img info lin.vhdx
 image: lin.vhdx
 file format: vhdx
 virtual size: 512M (536870912 bytes)
 disk size: 520M
 cluster_size: 8388608

  4. Load lin.vhdx under Windows 10 is ok

  The same thing happens on `vpc` format with or without
  `oformat=fixed`, it seems that windows version of qemu-img has some
  incorrect operation? My guess is that windows version of qemu-img
  doesn't handle the description field of vpc/vhdx, which leads to an
  incorrect `disk size` field.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1759522/+subscriptions



qemu-img still misbehaving on Windows

2019-10-27 Thread Adam Baxter
Hi,
I've attached some example files to
https://bugs.launchpad.net/qemu/+bug/1759522 - the bug still exists in qemu
4.1.0.

Thanks,
Adam