Re: [PATCH qemu v2] spapr: Kill SLOF

2020-01-12 Thread David Gibson
On Thu, Jan 09, 2020 at 05:31:24PM +1100, Alexey Kardashevskiy wrote:
> 
> 
> On 09/01/2020 15:07, David Gibson wrote:
> > On Wed, Jan 08, 2020 at 03:07:41PM +1100, Alexey Kardashevskiy wrote:
> >>
> >>
> >> On 07/01/2020 16:26, David Gibson wrote:
> >>
>  +static uint32_t client_setprop(SpaprMachineState *sm,
>  +   uint32_t nodeph, uint32_t pname,
>  +   uint32_t valaddr, uint32_t vallen)
>  +{
>  +char propname[64];
>  +uint32_t ret = -1;
>  +int proplen = 0;
>  +const void *prop;
>  +
>  +readstr(pname, propname);
>  +if (vallen == sizeof(uint32_t) &&
>  +((strncmp(propname, "linux,rtas-base", 15) == 0) ||
>  + (strncmp(propname, "linux,rtas-entry", 16) == 0))) {
>  +
>  +sm->rtas_base = readuint32(valaddr);
>  +prop = fdt_getprop_namelen(sm->fdt_blob,
>  +   
>  fdt_node_offset_by_phandle(sm->fdt_blob,
>  +  
>  nodeph),
>  +   propname, strlen(propname), 
>  );
>  +if (proplen == vallen) {
>  +*(uint32_t *) prop = cpu_to_be32(sm->rtas_base);
>  +ret = proplen;
>  +}
> >>>
> >>> Is there a particular reason to restrict this to the rtas properties,
> >>> rather than just allowing the guest to fdt_setprop() something
> >>> arbitrary?
> >>
> >> The FDT is flatten and I am not quite sure if libfdt can handle 
> >> updating
> >> properties if the length has changed.
> >
> > fdt_setprop() will handle updating properties with changed length (in
> > fact there's a special fdt_setprop_inplace() optimized for the case
> > where you don't need that).  It's not particularly efficient, but it
> > should work fine for the cases we have here.  In fact, I think you're
> > already relying on this for the code that adds the phandles to
> > everything.
> 
>  Well, I used to add phandles before calling fdt_pack() so it is not 
>  exactly the same.
> >>>
> >>> Ah, right, that's why adding the phandles worked.
> >>>
> > One complication is that it can return FDT_ERR_NOSPACE if there isn't
> > enough buffer for the increased thing.  We could either trap that,
> > resize and retry, or we could leave a bunch of extra space.  The
> > latter would be basically equivalent to not doing fdt_pack() on the
> > blob in the nobios case.
> 
> 
>  This is what I ended up doing.
> 
> 
> >> Also, more importantly, potentially property changes like this may have
> >> to be reflected in the QEMU device tree so I allowed only the 
> >> properties
> >> which I know how to deal with.
> >
> > That's a reasonable concern, but the nice thing about not having SLOF
> > is that there's only one copy of the device tree - the blob in qemu.
> > So a setprop() on that is automatically a setprop() everywhere (this
> > is another reason not to write the fdt into guest memory in the nobios
> > case - it will become stale as soon as the client changes anything).
> 
> 
>  True to a degree. It is "setprop" to the current fdt blob which we do not
>  analyze after we build the fdt. We either need to do parse the tree 
>  before
>  we rebuild it as CAS so we do not lose the updates or do selective 
>  changes
>  to the QEMUs objects from the "setprop" handler (this is what I do
>  now).
> >>>
> >>> Hrm.. do those setprops happen before CAS?
> >>
> >> Yes, vmlinux/zimage call "setprop" for "linux,initrd-start",
> >> "linux,initrd-end", "bootargs", "linux,stdout-path"; vmlinux sets
> >> properties if linux,initrd-* came from r3/r4 and zImage sets properties
> >> no matter how it discovered them - from r3/r4 or the device tree.
> > 
> > Ok, and those setprops happen before CAS?
> 
> Yes.
> 
> > In a sense this is kind of a fundamental problem with rebuilding the
> > whole DT at CAS time.  Except that strictly speaking it's a problem
> > even without that: we just get away with it by accident because CAS
> > isn't likely to change the same things that guest setprops do.
> 
> > It's still basically unsynchronized mutations by two parties to a
> > shared data structure.
> 
> True... We may end up not having FDT at all and reuse QOM objects for
> that, can even use hashes of QObject pointers as phandles :)

Hm, interesting idea.  I suspect the QOM hierarchy won't be quite
similar enough to the fdt hierarchy (or at least not guaranteed to be)
to make it work, but it's worth thinking about at least.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | 

Re: [PATCH qemu v2] spapr: Kill SLOF

2020-01-08 Thread Alexey Kardashevskiy



On 09/01/2020 15:07, David Gibson wrote:
> On Wed, Jan 08, 2020 at 03:07:41PM +1100, Alexey Kardashevskiy wrote:
>>
>>
>> On 07/01/2020 16:26, David Gibson wrote:
>>
 +static uint32_t client_setprop(SpaprMachineState *sm,
 +   uint32_t nodeph, uint32_t pname,
 +   uint32_t valaddr, uint32_t vallen)
 +{
 +char propname[64];
 +uint32_t ret = -1;
 +int proplen = 0;
 +const void *prop;
 +
 +readstr(pname, propname);
 +if (vallen == sizeof(uint32_t) &&
 +((strncmp(propname, "linux,rtas-base", 15) == 0) ||
 + (strncmp(propname, "linux,rtas-entry", 16) == 0))) {
 +
 +sm->rtas_base = readuint32(valaddr);
 +prop = fdt_getprop_namelen(sm->fdt_blob,
 +   
 fdt_node_offset_by_phandle(sm->fdt_blob,
 +  nodeph),
 +   propname, strlen(propname), 
 );
 +if (proplen == vallen) {
 +*(uint32_t *) prop = cpu_to_be32(sm->rtas_base);
 +ret = proplen;
 +}
>>>
>>> Is there a particular reason to restrict this to the rtas properties,
>>> rather than just allowing the guest to fdt_setprop() something
>>> arbitrary?
>>
>> The FDT is flatten and I am not quite sure if libfdt can handle updating
>> properties if the length has changed.
>
> fdt_setprop() will handle updating properties with changed length (in
> fact there's a special fdt_setprop_inplace() optimized for the case
> where you don't need that).  It's not particularly efficient, but it
> should work fine for the cases we have here.  In fact, I think you're
> already relying on this for the code that adds the phandles to
> everything.

 Well, I used to add phandles before calling fdt_pack() so it is not 
 exactly the same.
>>>
>>> Ah, right, that's why adding the phandles worked.
>>>
> One complication is that it can return FDT_ERR_NOSPACE if there isn't
> enough buffer for the increased thing.  We could either trap that,
> resize and retry, or we could leave a bunch of extra space.  The
> latter would be basically equivalent to not doing fdt_pack() on the
> blob in the nobios case.


 This is what I ended up doing.


>> Also, more importantly, potentially property changes like this may have
>> to be reflected in the QEMU device tree so I allowed only the properties
>> which I know how to deal with.
>
> That's a reasonable concern, but the nice thing about not having SLOF
> is that there's only one copy of the device tree - the blob in qemu.
> So a setprop() on that is automatically a setprop() everywhere (this
> is another reason not to write the fdt into guest memory in the nobios
> case - it will become stale as soon as the client changes anything).


 True to a degree. It is "setprop" to the current fdt blob which we do not
 analyze after we build the fdt. We either need to do parse the tree before
 we rebuild it as CAS so we do not lose the updates or do selective changes
 to the QEMUs objects from the "setprop" handler (this is what I do
 now).
>>>
>>> Hrm.. do those setprops happen before CAS?
>>
>> Yes, vmlinux/zimage call "setprop" for "linux,initrd-start",
>> "linux,initrd-end", "bootargs", "linux,stdout-path"; vmlinux sets
>> properties if linux,initrd-* came from r3/r4 and zImage sets properties
>> no matter how it discovered them - from r3/r4 or the device tree.
> 
> Ok, and those setprops happen before CAS?

Yes.

> In a sense this is kind of a fundamental problem with rebuilding the
> whole DT at CAS time.  Except that strictly speaking it's a problem
> even without that: we just get away with it by accident because CAS
> isn't likely to change the same things that guest setprops do.

> It's still basically unsynchronized mutations by two parties to a
> shared data structure.

True... We may end up not having FDT at all and reuse QOM objects for
that, can even use hashes of QObject pointers as phandles :)



>> btw we write them as "cells" (==4bytes long) in qemu but vminux changes
>> them to 8 bytes and zImage keeps it 4 bytes. Not a problem but an
>> interesting fact, this is why I had to allow extending the properties in
>> "setprop" :)
>>
>>
>>>  I would have thought we'd
>>> call CAS before instantiating RTAS.
>>
>> This is correct but I do not think the order is mandatory.
> 
> Hm, right.
> 

-- 
Alexey



Re: [PATCH qemu v2] spapr: Kill SLOF

2020-01-08 Thread David Gibson
On Wed, Jan 08, 2020 at 03:07:41PM +1100, Alexey Kardashevskiy wrote:
> 
> 
> On 07/01/2020 16:26, David Gibson wrote:
> 
> >> +static uint32_t client_setprop(SpaprMachineState *sm,
> >> +   uint32_t nodeph, uint32_t pname,
> >> +   uint32_t valaddr, uint32_t vallen)
> >> +{
> >> +char propname[64];
> >> +uint32_t ret = -1;
> >> +int proplen = 0;
> >> +const void *prop;
> >> +
> >> +readstr(pname, propname);
> >> +if (vallen == sizeof(uint32_t) &&
> >> +((strncmp(propname, "linux,rtas-base", 15) == 0) ||
> >> + (strncmp(propname, "linux,rtas-entry", 16) == 0))) {
> >> +
> >> +sm->rtas_base = readuint32(valaddr);
> >> +prop = fdt_getprop_namelen(sm->fdt_blob,
> >> +   
> >> fdt_node_offset_by_phandle(sm->fdt_blob,
> >> +  nodeph),
> >> +   propname, strlen(propname), 
> >> );
> >> +if (proplen == vallen) {
> >> +*(uint32_t *) prop = cpu_to_be32(sm->rtas_base);
> >> +ret = proplen;
> >> +}
> >
> > Is there a particular reason to restrict this to the rtas properties,
> > rather than just allowing the guest to fdt_setprop() something
> > arbitrary?
> 
>  The FDT is flatten and I am not quite sure if libfdt can handle updating
>  properties if the length has changed.
> >>>
> >>> fdt_setprop() will handle updating properties with changed length (in
> >>> fact there's a special fdt_setprop_inplace() optimized for the case
> >>> where you don't need that).  It's not particularly efficient, but it
> >>> should work fine for the cases we have here.  In fact, I think you're
> >>> already relying on this for the code that adds the phandles to
> >>> everything.
> >>
> >> Well, I used to add phandles before calling fdt_pack() so it is not 
> >> exactly the same.
> > 
> > Ah, right, that's why adding the phandles worked.
> > 
> >>> One complication is that it can return FDT_ERR_NOSPACE if there isn't
> >>> enough buffer for the increased thing.  We could either trap that,
> >>> resize and retry, or we could leave a bunch of extra space.  The
> >>> latter would be basically equivalent to not doing fdt_pack() on the
> >>> blob in the nobios case.
> >>
> >>
> >> This is what I ended up doing.
> >>
> >>
>  Also, more importantly, potentially property changes like this may have
>  to be reflected in the QEMU device tree so I allowed only the properties
>  which I know how to deal with.
> >>>
> >>> That's a reasonable concern, but the nice thing about not having SLOF
> >>> is that there's only one copy of the device tree - the blob in qemu.
> >>> So a setprop() on that is automatically a setprop() everywhere (this
> >>> is another reason not to write the fdt into guest memory in the nobios
> >>> case - it will become stale as soon as the client changes anything).
> >>
> >>
> >> True to a degree. It is "setprop" to the current fdt blob which we do not
> >> analyze after we build the fdt. We either need to do parse the tree before
> >> we rebuild it as CAS so we do not lose the updates or do selective changes
> >> to the QEMUs objects from the "setprop" handler (this is what I do
> >> now).
> > 
> > Hrm.. do those setprops happen before CAS?
> 
> Yes, vmlinux/zimage call "setprop" for "linux,initrd-start",
> "linux,initrd-end", "bootargs", "linux,stdout-path"; vmlinux sets
> properties if linux,initrd-* came from r3/r4 and zImage sets properties
> no matter how it discovered them - from r3/r4 or the device tree.

Ok, and those setprops happen before CAS?

In a sense this is kind of a fundamental problem with rebuilding the
whole DT at CAS time.  Except that strictly speaking it's a problem
even without that: we just get away with it by accident because CAS
isn't likely to change the same things that guest setprops do.

It's still basically unsynchronized mutations by two parties to a
shared data structure.

> btw we write them as "cells" (==4bytes long) in qemu but vminux changes
> them to 8 bytes and zImage keeps it 4 bytes. Not a problem but an
> interesting fact, this is why I had to allow extending the properties in
> "setprop" :)
> 
> 
> >  I would have thought we'd
> > call CAS before instantiating RTAS.
> 
> This is correct but I do not think the order is mandatory.

Hm, right.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH qemu v2] spapr: Kill SLOF

2020-01-08 Thread David Gibson
On Wed, Jan 08, 2020 at 04:53:06PM +1100, Alexey Kardashevskiy wrote:
> 
> 
> On 08/01/2020 15:20, Alexey Kardashevskiy wrote:
> > 
> > 
> > On 07/01/2020 16:54, David Gibson wrote:
> >> On Tue, Jan 07, 2020 at 03:44:35PM +1100, Alexey Kardashevskiy wrote:
> >>>
> >>>
> >>> On 06/01/2020 15:19, David Gibson wrote:
> > +
> > +static uint32_t client_package_to_path(const void *fdt, uint32_t 
> > phandle,
> > +   uint32_t buf, uint32_t len)
> > +{
> > +char tmp[256];
> 
>  Fixed sized buffers are icky.  You could either dynamically allocate
>  this based on the size the client gives, or you could use
>  memory_region_get_ram_ptr() to read the data from the tree directly
>  into guest memory.
> >>>
> >>> @len comes from the guest, I am really not comfortable with allocating
> >>> whatever (broken) guest requested. And if I limit @len by 1024 or
> >>> similar, then a fixed size buffer will do too, no?
> >>
> >> I see your point.  Does this call have a way to report failure?  In
> >> that case you could outright fail the call if it requests too long a
> >> length.
> > 
> > It returns length which can be 0 to signal an error.
> > 
> > but with this particular method the bigger problem is that I cannot know
> > in advance the actual path length from fdt_get_path(). I could double
> > the size until fdt_get_path() succeeded, just seems overkill here.
> > 
> > Property names seem to be limited by 32:
> 
> 
> >>> len("ibm,query-interrupt-source-number")
> 33
> 
> Awesome. Oh well :(

Yeah, as I suspected.  Also 'ibm,associativity-reference-points'.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH qemu v2] spapr: Kill SLOF

2020-01-08 Thread David Gibson
On Wed, Jan 08, 2020 at 03:20:22PM +1100, Alexey Kardashevskiy wrote:
> 
> 
> On 07/01/2020 16:54, David Gibson wrote:
> > On Tue, Jan 07, 2020 at 03:44:35PM +1100, Alexey Kardashevskiy wrote:
> >>
> >>
> >> On 06/01/2020 15:19, David Gibson wrote:
>  +
>  +static uint32_t client_package_to_path(const void *fdt, uint32_t 
>  phandle,
>  +   uint32_t buf, uint32_t len)
>  +{
>  +char tmp[256];
> >>>
> >>> Fixed sized buffers are icky.  You could either dynamically allocate
> >>> this based on the size the client gives, or you could use
> >>> memory_region_get_ram_ptr() to read the data from the tree directly
> >>> into guest memory.
> >>
> >> @len comes from the guest, I am really not comfortable with allocating
> >> whatever (broken) guest requested. And if I limit @len by 1024 or
> >> similar, then a fixed size buffer will do too, no?
> > 
> > I see your point.  Does this call have a way to report failure?  In
> > that case you could outright fail the call if it requests too long a
> > length.
> 
> It returns length which can be 0 to signal an error.
> 
> but with this particular method the bigger problem is that I cannot know
> in advance the actual path length from fdt_get_path(). I could double
> the size until fdt_get_path() succeeded, just seems overkill here.

fdt_get_path() will return -FDT_ERR_NOSPACE if the path doesn't fit in
the provided buffer.  I think that's enough to fail in the relevant
cases.  You could then use a buffer of size min(client provided size,
fixed max size).

I believe I've thought of trying to implement something that returns
what the path length would be without constructing it, but it turns
out to be essentially impossible given the fdt format and the fact
that we don't allocate inside libfdt (basically we have to use the
partially constructed path buf as state information to know how to
proceed with our scan).

> Property names seem to be limited by 32:
> 
> OF1275:
> ===
> nextprop
> IN:phandle, [string] previous, [address] buf
> OUT:  flag
> 
> Copies the name of the property following previous in the property list
> of the device node identified by phandle into buf, as a null-terminated
> string. Buf is the address of a 32-byte region of memory. If previous is
> zero or a pointer to a null string, copies the name of the device node’s
> first property.
> ===

Yeah... IEEE1275 says that, but I don't think most OF implementations
enforce it.  I'm also pretty sure that limit is broken by device trees
in the wild (I remember investigating this when implementing dtc &
libfdt).

> >> btw how exactly can I use memory_region_get_ram_ptr()?
> >> get_system_memory() returns a root MR which is not RAM, RAM is a
> >> "spapr.ram" sub-MR.
> > 
> > Right, but you know that RAM is always at offset 0 within that root
> > MR. 
> 
> Well, it could potentially be more than just one level down in the MR
> tree, for example we could add NUMA MRs and place actual RAM MR
> under these.

Oh.. yeah, sorry, I think you'd need address_space_translate() or
something like it to locate the right MR first.  Which come to think
of it, validates / truncates the length to, so..

> > That said, it doesn't look like it's that easy to bounds check
> > that pointer, so maybe that's not a good idea after all.
> 
> ok.

.. I think that would work after all.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH qemu v2] spapr: Kill SLOF

2020-01-07 Thread Alexey Kardashevskiy



On 08/01/2020 15:20, Alexey Kardashevskiy wrote:
> 
> 
> On 07/01/2020 16:54, David Gibson wrote:
>> On Tue, Jan 07, 2020 at 03:44:35PM +1100, Alexey Kardashevskiy wrote:
>>>
>>>
>>> On 06/01/2020 15:19, David Gibson wrote:
> +
> +static uint32_t client_package_to_path(const void *fdt, uint32_t phandle,
> +   uint32_t buf, uint32_t len)
> +{
> +char tmp[256];

 Fixed sized buffers are icky.  You could either dynamically allocate
 this based on the size the client gives, or you could use
 memory_region_get_ram_ptr() to read the data from the tree directly
 into guest memory.
>>>
>>> @len comes from the guest, I am really not comfortable with allocating
>>> whatever (broken) guest requested. And if I limit @len by 1024 or
>>> similar, then a fixed size buffer will do too, no?
>>
>> I see your point.  Does this call have a way to report failure?  In
>> that case you could outright fail the call if it requests too long a
>> length.
> 
> It returns length which can be 0 to signal an error.
> 
> but with this particular method the bigger problem is that I cannot know
> in advance the actual path length from fdt_get_path(). I could double
> the size until fdt_get_path() succeeded, just seems overkill here.
> 
> Property names seem to be limited by 32:


>>> len("ibm,query-interrupt-source-number")
33

Awesome. Oh well :(


> 
> OF1275:
> ===
> nextprop
> IN:phandle, [string] previous, [address] buf
> OUT:  flag
> 
> Copies the name of the property following previous in the property list
> of the device node identified by phandle into buf, as a null-terminated
> string. Buf is the address of a 32-byte region of memory. If previous is
> zero or a pointer to a null string, copies the name of the device node’s
> first property.
> ===
> 
> 
>>> btw how exactly can I use memory_region_get_ram_ptr()?
>>> get_system_memory() returns a root MR which is not RAM, RAM is a
>>> "spapr.ram" sub-MR.
>>
>> Right, but you know that RAM is always at offset 0 within that root
>> MR. 
> 
> Well, it could potentially be more than just one level down in the MR
> tree, for example we could add NUMA MRs and place actual RAM MR under these.
> 
> 
>> That said, it doesn't look like it's that easy to bounds check
>> that pointer, so maybe that's not a good idea after all.
> 
> ok.
> 
> 

-- 
Alexey



Re: [PATCH qemu v2] spapr: Kill SLOF

2020-01-07 Thread Alexey Kardashevskiy



On 07/01/2020 16:54, David Gibson wrote:
> On Tue, Jan 07, 2020 at 03:44:35PM +1100, Alexey Kardashevskiy wrote:
>>
>>
>> On 06/01/2020 15:19, David Gibson wrote:
 +
 +static uint32_t client_package_to_path(const void *fdt, uint32_t phandle,
 +   uint32_t buf, uint32_t len)
 +{
 +char tmp[256];
>>>
>>> Fixed sized buffers are icky.  You could either dynamically allocate
>>> this based on the size the client gives, or you could use
>>> memory_region_get_ram_ptr() to read the data from the tree directly
>>> into guest memory.
>>
>> @len comes from the guest, I am really not comfortable with allocating
>> whatever (broken) guest requested. And if I limit @len by 1024 or
>> similar, then a fixed size buffer will do too, no?
> 
> I see your point.  Does this call have a way to report failure?  In
> that case you could outright fail the call if it requests too long a
> length.

It returns length which can be 0 to signal an error.

but with this particular method the bigger problem is that I cannot know
in advance the actual path length from fdt_get_path(). I could double
the size until fdt_get_path() succeeded, just seems overkill here.

Property names seem to be limited by 32:

OF1275:
===
nextprop
IN:phandle, [string] previous, [address] buf
OUT:  flag

Copies the name of the property following previous in the property list
of the device node identified by phandle into buf, as a null-terminated
string. Buf is the address of a 32-byte region of memory. If previous is
zero or a pointer to a null string, copies the name of the device node’s
first property.
===


>> btw how exactly can I use memory_region_get_ram_ptr()?
>> get_system_memory() returns a root MR which is not RAM, RAM is a
>> "spapr.ram" sub-MR.
> 
> Right, but you know that RAM is always at offset 0 within that root
> MR. 

Well, it could potentially be more than just one level down in the MR
tree, for example we could add NUMA MRs and place actual RAM MR under these.


> That said, it doesn't look like it's that easy to bounds check
> that pointer, so maybe that's not a good idea after all.

ok.


-- 
Alexey



Re: [PATCH qemu v2] spapr: Kill SLOF

2020-01-07 Thread Alexey Kardashevskiy



On 07/01/2020 16:26, David Gibson wrote:

>> +static uint32_t client_setprop(SpaprMachineState *sm,
>> +   uint32_t nodeph, uint32_t pname,
>> +   uint32_t valaddr, uint32_t vallen)
>> +{
>> +char propname[64];
>> +uint32_t ret = -1;
>> +int proplen = 0;
>> +const void *prop;
>> +
>> +readstr(pname, propname);
>> +if (vallen == sizeof(uint32_t) &&
>> +((strncmp(propname, "linux,rtas-base", 15) == 0) ||
>> + (strncmp(propname, "linux,rtas-entry", 16) == 0))) {
>> +
>> +sm->rtas_base = readuint32(valaddr);
>> +prop = fdt_getprop_namelen(sm->fdt_blob,
>> +   
>> fdt_node_offset_by_phandle(sm->fdt_blob,
>> +  nodeph),
>> +   propname, strlen(propname), 
>> );
>> +if (proplen == vallen) {
>> +*(uint32_t *) prop = cpu_to_be32(sm->rtas_base);
>> +ret = proplen;
>> +}
>
> Is there a particular reason to restrict this to the rtas properties,
> rather than just allowing the guest to fdt_setprop() something
> arbitrary?

 The FDT is flatten and I am not quite sure if libfdt can handle updating
 properties if the length has changed.
>>>
>>> fdt_setprop() will handle updating properties with changed length (in
>>> fact there's a special fdt_setprop_inplace() optimized for the case
>>> where you don't need that).  It's not particularly efficient, but it
>>> should work fine for the cases we have here.  In fact, I think you're
>>> already relying on this for the code that adds the phandles to
>>> everything.
>>
>> Well, I used to add phandles before calling fdt_pack() so it is not exactly 
>> the same.
> 
> Ah, right, that's why adding the phandles worked.
> 
>>> One complication is that it can return FDT_ERR_NOSPACE if there isn't
>>> enough buffer for the increased thing.  We could either trap that,
>>> resize and retry, or we could leave a bunch of extra space.  The
>>> latter would be basically equivalent to not doing fdt_pack() on the
>>> blob in the nobios case.
>>
>>
>> This is what I ended up doing.
>>
>>
 Also, more importantly, potentially property changes like this may have
 to be reflected in the QEMU device tree so I allowed only the properties
 which I know how to deal with.
>>>
>>> That's a reasonable concern, but the nice thing about not having SLOF
>>> is that there's only one copy of the device tree - the blob in qemu.
>>> So a setprop() on that is automatically a setprop() everywhere (this
>>> is another reason not to write the fdt into guest memory in the nobios
>>> case - it will become stale as soon as the client changes anything).
>>
>>
>> True to a degree. It is "setprop" to the current fdt blob which we do not
>> analyze after we build the fdt. We either need to do parse the tree before
>> we rebuild it as CAS so we do not lose the updates or do selective changes
>> to the QEMUs objects from the "setprop" handler (this is what I do
>> now).
> 
> Hrm.. do those setprops happen before CAS?

Yes, vmlinux/zimage call "setprop" for "linux,initrd-start",
"linux,initrd-end", "bootargs", "linux,stdout-path"; vmlinux sets
properties if linux,initrd-* came from r3/r4 and zImage sets properties
no matter how it discovered them - from r3/r4 or the device tree.

btw we write them as "cells" (==4bytes long) in qemu but vminux changes
them to 8 bytes and zImage keeps it 4 bytes. Not a problem but an
interesting fact, this is why I had to allow extending the properties in
"setprop" :)


>  I would have thought we'd
> call CAS before instantiating RTAS.

This is correct but I do not think the order is mandatory.


-- 
Alexey



Re: [PATCH qemu v2] spapr: Kill SLOF

2020-01-07 Thread David Gibson
On Mon, Jan 06, 2020 at 11:34:13PM +1100, Alexey Kardashevskiy wrote:
> 
> 
> On 06/01/2020 19:50, David Gibson wrote:
> > On Mon, Jan 06, 2020 at 05:28:55PM +1100, Alexey Kardashevskiy wrote:
> > > 
> > > 
> > > On 06/01/2020 15:19, David Gibson wrote:
> > > > On Mon, Jan 06, 2020 at 10:42:42AM +1100, Alexey Kardashevskiy wrote:
> > > > > The Petitboot bootloader is way more advanced than SLOF is ever going 
> > > > > to
> > > > > be as Petitboot comes with the full-featured Linux kernel with all
> > > > > the drivers, and initramdisk with quite user friendly interface.
> > > > > The problem with ditching SLOF is that an unmodified pseries kernel 
> > > > > can
> > > > > either start via:
> > > > > 1. kexec, this requires presence of RTAS and skips
> > > > > ibm,client-architecture-support entirely;
> > > > > 2. normal boot, this heavily relies on the OF1275 client interface to
> > > > > fetch the device tree and do early setup (claim memory).
> > > > > 
> > > > > This adds a new bios-less mode to the pseries machine: "bios=on|off".
> > > > > When enabled, QEMU does not load SLOF and jumps to the kernel from
> > > > > "-kernel".
> > > > > 
> > > > > The client interface is implemented exactly as RTAS - a 20 bytes blob,
> > > > > right next after the RTAS blob. The entry point is passed to the 
> > > > > kernel
> > > > > via GPR5.
> > > > > 
> > > > > This implements a handful of client interface methods just to get 
> > > > > going.
> > > > > In particular, this implements the device tree fetching,
> > > > > ibm,client-architecture-support and instantiate-rtas.
> > > > > 
> > > > > This implements changing FDT properties only for "linux,rtas-base" and
> > > > > "linux,rtas-entry", just to get early boot going.
> > > > > 
> > > > > This assigns "phandles" to device tree nodes as there is no more SLOF
> > > > > and OF nodes addresses of which served as phandle values.
> > > > > This keeps predefined nodes (such as XICS/NVLINK/...) unchanged.
> > > > > phandles are regenerated at every FDT rebuild.
> > > > > 
> > > > > When bios=off, this adds "/chosen" every time QEMU builds a tree.
> > > > > 
> > > > > This implements "claim" which the client (Linux) uses for memory
> > > > > allocation; this is also  used by QEMU for claiming kernel/initrd 
> > > > > images,
> > > > > client interface entry point, RTAS and the initial stack.
> > > > > 
> > > > > While at this, add a "kernel-addr" machine parameter to allow moving
> > > > > the kernel in memory. This is useful for debugging if the kernel is
> > > > > loaded at @0, although not necessary.
> > > > > 
> > > > > This does not implement instances properly but in order to boot a VM,
> > > > > we only really need a stdout instance and the "/" instance for
> > > > > "call-method", we fake these.
> > > > 
> > > > As we've discussed, I really like the idea of this.  It think the
> > > > basic approach looks good too.
> > > > 
> > > > As you probably realize, there are quite a lot of things to be
> > > > polished though.  Comments below.
> > > > 
> > > > > 
> > > > > The test command line:
> > > > > 
> > > > > qemu-system-ppc64 \
> > > > > -nodefaults \
> > > > > -chardev stdio,id=STDIO0,signal=off,mux=on \
> > > > > -device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
> > > > > -mon id=MON0,chardev=STDIO0,mode=readline \
> > > > > -nographic \
> > > > > -vga none \
> > > > > -kernel vmldbg \
> > > > > -machine pseries,bios=off \
> > > > > -m 4G \
> > > > > -enable-kvm \
> > > > > -initrd pb/rootfs.cpio.xz \
> > > > > img/u1804-64le.qcow2 \
> > > > > -snapshot \
> > > > > -smp 8,threads=8 \
> > > > > -L /home/aik/t/qemu-ppc64-bios/ \
> > > > > -trace events=qemu_trace_events \
> > > > > -d guest_errors \
> > > > > -chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.ssh37874 \
> > > > > -mon chardev=SOCKET0,mode=control
> > > > > 
> > > > > Signed-off-by: Alexey Kardashevskiy 
> > > > > ---
> > > > > Changes:
> > > > > v2:
> > > > > * fixed claim()
> > > > > * added "setprop"
> > > > > * cleaner client interface and RTAS blobs management
> > > > > * boots to petitboot and further to the target system
> > > > > * more trace points
> > > > > ---
> > > > >   hw/ppc/Makefile.objs   |   1 +
> > > > >   include/hw/loader.h|   1 +
> > > > >   include/hw/ppc/spapr.h |  25 ++-
> > > > >   hw/ppc/spapr.c | 231 ++--
> > > > >   hw/ppc/spapr_client.c  | 464 
> > > > > +
> > > > >   hw/ppc/spapr_hcall.c   |  49 +++--
> > > > >   hw/ppc/trace-events|   9 +
> > > > >   7 files changed, 743 insertions(+), 37 deletions(-)
> > > > >   create mode 100644 hw/ppc/spapr_client.c
> > > > > 
> > > > > diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
> > > > > index 101e9fc59185..ce31c0acd2fb 100644
> > > > > --- a/hw/ppc/Makefile.objs
> > > > > +++ b/hw/ppc/Makefile.objs
> > > > > @@ -6,6 +6,7 @@ obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o 
> > > > > spapr_rtas.o
> > > > >   

Re: [PATCH qemu v2] spapr: Kill SLOF

2020-01-07 Thread David Gibson
On Tue, Jan 07, 2020 at 03:44:35PM +1100, Alexey Kardashevskiy wrote:
> 
> 
> On 06/01/2020 15:19, David Gibson wrote:
> >> +
> >> +static uint32_t client_package_to_path(const void *fdt, uint32_t phandle,
> >> +   uint32_t buf, uint32_t len)
> >> +{
> >> +char tmp[256];
> > 
> > Fixed sized buffers are icky.  You could either dynamically allocate
> > this based on the size the client gives, or you could use
> > memory_region_get_ram_ptr() to read the data from the tree directly
> > into guest memory.
> 
> @len comes from the guest, I am really not comfortable with allocating
> whatever (broken) guest requested. And if I limit @len by 1024 or
> similar, then a fixed size buffer will do too, no?

I see your point.  Does this call have a way to report failure?  In
that case you could outright fail the call if it requests too long a
length.

> btw how exactly can I use memory_region_get_ram_ptr()?
> get_system_memory() returns a root MR which is not RAM, RAM is a
> "spapr.ram" sub-MR.

Right, but you know that RAM is always at offset 0 within that root
MR.  That said, it doesn't look like it's that easy to bounds check
that pointer, so maybe that's not a good idea after all.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH qemu v2] spapr: Kill SLOF

2020-01-06 Thread Alexey Kardashevskiy



On 06/01/2020 15:19, David Gibson wrote:
>> +
>> +static uint32_t client_package_to_path(const void *fdt, uint32_t phandle,
>> +   uint32_t buf, uint32_t len)
>> +{
>> +char tmp[256];
> 
> Fixed sized buffers are icky.  You could either dynamically allocate
> this based on the size the client gives, or you could use
> memory_region_get_ram_ptr() to read the data from the tree directly
> into guest memory.

@len comes from the guest, I am really not comfortable with allocating
whatever (broken) guest requested. And if I limit @len by 1024 or
similar, then a fixed size buffer will do too, no?

btw how exactly can I use memory_region_get_ram_ptr()?
get_system_memory() returns a root MR which is not RAM, RAM is a
"spapr.ram" sub-MR.



-- 
Alexey



Re: [PATCH qemu v2] spapr: Kill SLOF

2020-01-06 Thread Alexey Kardashevskiy



On 07/01/2020 04:25, Peter Maydell wrote:
> On Mon, 6 Jan 2020 at 17:09, Cédric Le Goater  wrote:
>> ARM bootloaders are also embedded in QEMU's code. See hw/arm/boot.c.
>> You could improve a bit the definition though.
> 
> Given an opportunity to restart from scratch I don't know
> that I'd do things the way hw/arm/boot.c does. The initial
> idea was really simple and straightforward: 3 or 4 insns
> which just set some registers and jumped to the kernel.

I believe this is the case here - the previous blob like this (RTAS) has
never changed. In fact it was compiled so rarely that when I did compile
it, I found that it does not compile big-endian as it must :) Having a
blob like this solves this too.


-- 
Alexey



Re: [PATCH qemu v2] spapr: Kill SLOF

2020-01-06 Thread David Gibson
On Tue, Jan 07, 2020 at 12:39:42AM +1100, Alexey Kardashevskiy wrote:
> 
> 
> On 06/01/2020 15:19, David Gibson wrote:
> 
> > > +const struct fdt_property *prop;
> > > +const char *tmp;
> > > +
> > > +readstr(prevaddr, prev);
> > > +for (offset = fdt_first_property_offset(fdt, offset);
> > > + (offset >= 0);
> > > + (offset = fdt_next_property_offset(fdt, offset))) {
> > > +
> > > +prop = fdt_get_property_by_offset(fdt, offset, );
> > 
> > fdt_getprop_by_offset() also returns the property's name without
> > having to dick around with fdt_get_string() manually.
> 
> 
> btw I looked quickly:
> 
> const void *fdt_getprop_by_offset(const void *fdt, int offset,
>   const char **namep, int *lenp)
> {
> const struct fdt_property *prop;
> 
> prop = fdt_get_property_by_offset_(fdt, offset, lenp);
> if (!prop)
> return NULL;
> if (namep) {
> const char *name;
> int namelen;
> name = fdt_get_string(fdt, fdt32_ld(>nameoff),
>   );
> if (!name) {
> if (lenp)
> *lenp = namelen;
> return NULL;
> }
> *namep = name;
> }
> 
> /* Handle realignment */
> if (fdt_version(fdt) < 0x10 && (offset + sizeof(*prop)) % 8 &&
> fdt32_ld(>len) >= 8)
> return prop->data + 4;
> return prop->data;
> }
> 
> 
> and incorrectly assumed *lenp is the name length because of "*lenp =
> namelen". Huh :)

Ah, yeah, that is a bit confusing, but I don't see a way to improve
it.  That's the error case, in which case both *lenp and namelen are
carrying an error code, not their usual value.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH qemu v2] spapr: Kill SLOF

2020-01-06 Thread Philippe Mathieu-Daudé

On 1/6/20 6:25 PM, Peter Maydell wrote:

On Mon, 6 Jan 2020 at 17:09, Cédric Le Goater  wrote:

ARM bootloaders are also embedded in QEMU's code. See hw/arm/boot.c.
You could improve a bit the definition though.


Given an opportunity to restart from scratch I don't know
that I'd do things the way hw/arm/boot.c does. The initial
idea was really simple and straightforward: 3 or 4 insns
which just set some registers and jumped to the kernel.
Fast-forward a decade or two, and the complexity has
significantly increased as we added extra tweaks to deal
with SMP systems, the GIC interrupt controller, boards
which need to do some extra odd stuff, CPUs which start
in Secure state, 64-bit CPUs, mangling the DTB, booting
multiple flavours of image file format, implementing
various 'firmware' functionality and APIs, and so on and on.


:S


Insisting from the start that QEMU emulates what bare metal
hardware does, and doesn't get into the business of faking
up the behaviour of firmware would have been a neater
separation of concerns in the long run.

To the narrower concern, yeah, on the arm side we
just embed hand-assembled hex values in the C file;
this is mostly to avoid needing a cross-compiler setup at
QEMU build time, but it also lets us hand-patch the
binary blob at runtime to fill in addresses and so on.


This was also before we provide a handy docker image with cross compiler 
for about all architectures supported.


Maybe the cross-build-guest-tests target from tests/tcg/Makefile.qemu 
can be reused to build spapr-rtas.S?





Re: [PATCH qemu v2] spapr: Kill SLOF

2020-01-06 Thread Peter Maydell
On Mon, 6 Jan 2020 at 17:09, Cédric Le Goater  wrote:
> ARM bootloaders are also embedded in QEMU's code. See hw/arm/boot.c.
> You could improve a bit the definition though.

Given an opportunity to restart from scratch I don't know
that I'd do things the way hw/arm/boot.c does. The initial
idea was really simple and straightforward: 3 or 4 insns
which just set some registers and jumped to the kernel.
Fast-forward a decade or two, and the complexity has
significantly increased as we added extra tweaks to deal
with SMP systems, the GIC interrupt controller, boards
which need to do some extra odd stuff, CPUs which start
in Secure state, 64-bit CPUs, mangling the DTB, booting
multiple flavours of image file format, implementing
various 'firmware' functionality and APIs, and so on and on.
Insisting from the start that QEMU emulates what bare metal
hardware does, and doesn't get into the business of faking
up the behaviour of firmware would have been a neater
separation of concerns in the long run.

To the narrower concern, yeah, on the arm side we
just embed hand-assembled hex values in the C file;
this is mostly to avoid needing a cross-compiler setup at
QEMU build time, but it also lets us hand-patch the
binary blob at runtime to fill in addresses and so on.

thanks
-- PMM



Re: [PATCH qemu v2] spapr: Kill SLOF

2020-01-06 Thread Cédric Le Goater
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index e62c89b3dd40..1c97534a0aea 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -896,6 +896,51 @@ out:
>   return ret;
>   }
>   +/*
> + * Below is a compiled version of RTAS blob and OF client interface 
> entry point.
> + *
> + * gcc -nostdlib  -mbig -o spapr-rtas.img spapr-rtas.S
> + * objcopy  -O binary -j .text  spapr-rtas.img spapr-rtas.bin
> + *
> + *   .globl  _start
> + *   _start:
> + *   mr  4,3
> + *   lis 3,KVMPPC_H_RTAS@h
> + *   ori 3,3,KVMPPC_H_RTAS@l
> + *   sc  1
> + *   blr
> + *   mr  4,3
> + *   lis 3,KVMPPC_H_CLIENT@h
> + *   ori 3,3,KVMPPC_H_CLIENT@l
> + *   sc  1
> + *   blr
> + */
> +static struct {
> +    uint8_t rtas[20], client[20];
> +} QEMU_PACKED rtas_client_blob = {
> +    .rtas = {
> +    0x7c, 0x64, 0x1b, 0x78,
> +    0x3c, 0x60, 0x00, 0x00,
> +    0x60, 0x63, 0xf0, 0x00,
> +    0x44, 0x00, 0x00, 0x22,
> +    0x4e, 0x80, 0x00, 0x20
> +    },
> +    .client = {
> +    0x7c, 0x64, 0x1b, 0x78,
> +    0x3c, 0x60, 0x00, 0x00,
> +    0x60, 0x63, 0xf0, 0x05,
> +    0x44, 0x00, 0x00, 0x22,
> +    0x4e, 0x80, 0x00, 0x20
> +    }
> +};

 I'd really prefer to read this in from a file (or files) which we
 assemble (as with the existing spapr-rtas.img), rather than having a
 magic array in qemu.
>>>
>>> Two considerations here:
>>> 1. This blob is not going to change (at least often)
>>
>> True, but I'd still prefer to build it from a .S rather than have the
>> binary written out.  We already had this until recently (when we moved
>> the RTAS image into SLOF), and it's really not too hard to manage.
> 
> 
> But what is exactly the benefit? It is not going to change. I can look into 
> making it a .S, compile it to a binary an include this binary into 
> qemu-system-ppc64 as a byte array, just need to refresh my gcc/ldd memories 
> but a separate file is too much for this imho.

ARM bootloaders are also embedded in QEMU's code. See hw/arm/boot.c.
You could improve a bit the definition though.

C.

> 
>>> 2. A new file which needs to be packaged/signed/copied.
>>
>> Eh, again, we had this for RTAS until very recently and it really
>> wasn't that much hassle.
> 
> 
> I keep secure VMs in mind. And SLOF is a separate package, why was not RTAS?




Re: [PATCH qemu v2] spapr: Kill SLOF

2020-01-06 Thread Alexey Kardashevskiy




On 06/01/2020 15:19, David Gibson wrote:


+const struct fdt_property *prop;
+const char *tmp;
+
+readstr(prevaddr, prev);
+for (offset = fdt_first_property_offset(fdt, offset);
+ (offset >= 0);
+ (offset = fdt_next_property_offset(fdt, offset))) {
+
+prop = fdt_get_property_by_offset(fdt, offset, );


fdt_getprop_by_offset() also returns the property's name without
having to dick around with fdt_get_string() manually.



btw I looked quickly:

const void *fdt_getprop_by_offset(const void *fdt, int offset,
  const char **namep, int *lenp)
{
const struct fdt_property *prop;

prop = fdt_get_property_by_offset_(fdt, offset, lenp);
if (!prop)
return NULL;
if (namep) {
const char *name;
int namelen;
name = fdt_get_string(fdt, fdt32_ld(>nameoff),
  );
if (!name) {
if (lenp)
*lenp = namelen;
return NULL;
}
*namep = name;
}

/* Handle realignment */
if (fdt_version(fdt) < 0x10 && (offset + sizeof(*prop)) % 8 &&
fdt32_ld(>len) >= 8)
return prop->data + 4;
return prop->data;
}


and incorrectly assumed *lenp is the name length because of "*lenp = namelen". 
Huh :)


--
Alexey



Re: [PATCH qemu v2] spapr: Kill SLOF

2020-01-06 Thread Alexey Kardashevskiy




On 06/01/2020 19:50, David Gibson wrote:

On Mon, Jan 06, 2020 at 05:28:55PM +1100, Alexey Kardashevskiy wrote:



On 06/01/2020 15:19, David Gibson wrote:

On Mon, Jan 06, 2020 at 10:42:42AM +1100, Alexey Kardashevskiy wrote:

The Petitboot bootloader is way more advanced than SLOF is ever going to
be as Petitboot comes with the full-featured Linux kernel with all
the drivers, and initramdisk with quite user friendly interface.
The problem with ditching SLOF is that an unmodified pseries kernel can
either start via:
1. kexec, this requires presence of RTAS and skips
ibm,client-architecture-support entirely;
2. normal boot, this heavily relies on the OF1275 client interface to
fetch the device tree and do early setup (claim memory).

This adds a new bios-less mode to the pseries machine: "bios=on|off".
When enabled, QEMU does not load SLOF and jumps to the kernel from
"-kernel".

The client interface is implemented exactly as RTAS - a 20 bytes blob,
right next after the RTAS blob. The entry point is passed to the kernel
via GPR5.

This implements a handful of client interface methods just to get going.
In particular, this implements the device tree fetching,
ibm,client-architecture-support and instantiate-rtas.

This implements changing FDT properties only for "linux,rtas-base" and
"linux,rtas-entry", just to get early boot going.

This assigns "phandles" to device tree nodes as there is no more SLOF
and OF nodes addresses of which served as phandle values.
This keeps predefined nodes (such as XICS/NVLINK/...) unchanged.
phandles are regenerated at every FDT rebuild.

When bios=off, this adds "/chosen" every time QEMU builds a tree.

This implements "claim" which the client (Linux) uses for memory
allocation; this is also  used by QEMU for claiming kernel/initrd images,
client interface entry point, RTAS and the initial stack.

While at this, add a "kernel-addr" machine parameter to allow moving
the kernel in memory. This is useful for debugging if the kernel is
loaded at @0, although not necessary.

This does not implement instances properly but in order to boot a VM,
we only really need a stdout instance and the "/" instance for
"call-method", we fake these.


As we've discussed, I really like the idea of this.  It think the
basic approach looks good too.

As you probably realize, there are quite a lot of things to be
polished though.  Comments below.



The test command line:

qemu-system-ppc64 \
-nodefaults \
-chardev stdio,id=STDIO0,signal=off,mux=on \
-device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
-mon id=MON0,chardev=STDIO0,mode=readline \
-nographic \
-vga none \
-kernel vmldbg \
-machine pseries,bios=off \
-m 4G \
-enable-kvm \
-initrd pb/rootfs.cpio.xz \
img/u1804-64le.qcow2 \
-snapshot \
-smp 8,threads=8 \
-L /home/aik/t/qemu-ppc64-bios/ \
-trace events=qemu_trace_events \
-d guest_errors \
-chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.ssh37874 \
-mon chardev=SOCKET0,mode=control

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v2:
* fixed claim()
* added "setprop"
* cleaner client interface and RTAS blobs management
* boots to petitboot and further to the target system
* more trace points
---
  hw/ppc/Makefile.objs   |   1 +
  include/hw/loader.h|   1 +
  include/hw/ppc/spapr.h |  25 ++-
  hw/ppc/spapr.c | 231 ++--
  hw/ppc/spapr_client.c  | 464 +
  hw/ppc/spapr_hcall.c   |  49 +++--
  hw/ppc/trace-events|   9 +
  7 files changed, 743 insertions(+), 37 deletions(-)
  create mode 100644 hw/ppc/spapr_client.c

diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
index 101e9fc59185..ce31c0acd2fb 100644
--- a/hw/ppc/Makefile.objs
+++ b/hw/ppc/Makefile.objs
@@ -6,6 +6,7 @@ obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o 
spapr_rtas.o
  obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o
  obj-$(CONFIG_PSERIES) += spapr_cpu_core.o spapr_ovec.o spapr_irq.o
  obj-$(CONFIG_PSERIES) += spapr_tpm_proxy.o
+obj-$(CONFIG_PSERIES) += spapr_client.o


This applies in a bunch of places here.  Just calling things "client"
is clear enough if you're already thinking about Open Firmware.  But
this appears in a bunch of places where you might come across it
without that context, in which case it could be kind of confusing.  So
I'd suggest using "of_client" or "of_client_interface" in most places
you're using "client".


  obj-$(CONFIG_SPAPR_RNG) +=  spapr_rng.o
  # IBM PowerNV
  obj-$(CONFIG_POWERNV) += pnv.o pnv_xscom.o pnv_core.o pnv_lpc.o pnv_psi.o 
pnv_occ.o pnv_bmc.o
diff --git a/include/hw/loader.h b/include/hw/loader.h
index 48a96cd55904..a2f58077a47e 100644
--- a/include/hw/loader.h
+++ b/include/hw/loader.h
@@ -262,6 +262,7 @@ MemoryRegion *rom_add_blob(const char *name, const void 
*blob, size_t len,
  int rom_add_elf_program(const char *name, GMappedFile *mapped_file, void 
*data,
  size_t datasize, size_t romsize, hwaddr addr,
  AddressSpace 

Re: [PATCH qemu v2] spapr: Kill SLOF

2020-01-06 Thread David Gibson
On Mon, Jan 06, 2020 at 05:28:55PM +1100, Alexey Kardashevskiy wrote:
> 
> 
> On 06/01/2020 15:19, David Gibson wrote:
> > On Mon, Jan 06, 2020 at 10:42:42AM +1100, Alexey Kardashevskiy wrote:
> >> The Petitboot bootloader is way more advanced than SLOF is ever going to
> >> be as Petitboot comes with the full-featured Linux kernel with all
> >> the drivers, and initramdisk with quite user friendly interface.
> >> The problem with ditching SLOF is that an unmodified pseries kernel can
> >> either start via:
> >> 1. kexec, this requires presence of RTAS and skips
> >> ibm,client-architecture-support entirely;
> >> 2. normal boot, this heavily relies on the OF1275 client interface to
> >> fetch the device tree and do early setup (claim memory).
> >>
> >> This adds a new bios-less mode to the pseries machine: "bios=on|off".
> >> When enabled, QEMU does not load SLOF and jumps to the kernel from
> >> "-kernel".
> >>
> >> The client interface is implemented exactly as RTAS - a 20 bytes blob,
> >> right next after the RTAS blob. The entry point is passed to the kernel
> >> via GPR5.
> >>
> >> This implements a handful of client interface methods just to get going.
> >> In particular, this implements the device tree fetching,
> >> ibm,client-architecture-support and instantiate-rtas.
> >>
> >> This implements changing FDT properties only for "linux,rtas-base" and
> >> "linux,rtas-entry", just to get early boot going.
> >>
> >> This assigns "phandles" to device tree nodes as there is no more SLOF
> >> and OF nodes addresses of which served as phandle values.
> >> This keeps predefined nodes (such as XICS/NVLINK/...) unchanged.
> >> phandles are regenerated at every FDT rebuild.
> >>
> >> When bios=off, this adds "/chosen" every time QEMU builds a tree.
> >>
> >> This implements "claim" which the client (Linux) uses for memory
> >> allocation; this is also  used by QEMU for claiming kernel/initrd images,
> >> client interface entry point, RTAS and the initial stack.
> >>
> >> While at this, add a "kernel-addr" machine parameter to allow moving
> >> the kernel in memory. This is useful for debugging if the kernel is
> >> loaded at @0, although not necessary.
> >>
> >> This does not implement instances properly but in order to boot a VM,
> >> we only really need a stdout instance and the "/" instance for
> >> "call-method", we fake these.
> > 
> > As we've discussed, I really like the idea of this.  It think the
> > basic approach looks good too.
> > 
> > As you probably realize, there are quite a lot of things to be
> > polished though.  Comments below.
> > 
> >>
> >> The test command line:
> >>
> >> qemu-system-ppc64 \
> >> -nodefaults \
> >> -chardev stdio,id=STDIO0,signal=off,mux=on \
> >> -device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
> >> -mon id=MON0,chardev=STDIO0,mode=readline \
> >> -nographic \
> >> -vga none \
> >> -kernel vmldbg \
> >> -machine pseries,bios=off \
> >> -m 4G \
> >> -enable-kvm \
> >> -initrd pb/rootfs.cpio.xz \
> >> img/u1804-64le.qcow2 \
> >> -snapshot \
> >> -smp 8,threads=8 \
> >> -L /home/aik/t/qemu-ppc64-bios/ \
> >> -trace events=qemu_trace_events \
> >> -d guest_errors \
> >> -chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.ssh37874 \
> >> -mon chardev=SOCKET0,mode=control
> >>
> >> Signed-off-by: Alexey Kardashevskiy 
> >> ---
> >> Changes:
> >> v2:
> >> * fixed claim()
> >> * added "setprop"
> >> * cleaner client interface and RTAS blobs management
> >> * boots to petitboot and further to the target system
> >> * more trace points
> >> ---
> >>  hw/ppc/Makefile.objs   |   1 +
> >>  include/hw/loader.h|   1 +
> >>  include/hw/ppc/spapr.h |  25 ++-
> >>  hw/ppc/spapr.c | 231 ++--
> >>  hw/ppc/spapr_client.c  | 464 +
> >>  hw/ppc/spapr_hcall.c   |  49 +++--
> >>  hw/ppc/trace-events|   9 +
> >>  7 files changed, 743 insertions(+), 37 deletions(-)
> >>  create mode 100644 hw/ppc/spapr_client.c
> >>
> >> diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
> >> index 101e9fc59185..ce31c0acd2fb 100644
> >> --- a/hw/ppc/Makefile.objs
> >> +++ b/hw/ppc/Makefile.objs
> >> @@ -6,6 +6,7 @@ obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o 
> >> spapr_rtas.o
> >>  obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o
> >>  obj-$(CONFIG_PSERIES) += spapr_cpu_core.o spapr_ovec.o spapr_irq.o
> >>  obj-$(CONFIG_PSERIES) += spapr_tpm_proxy.o
> >> +obj-$(CONFIG_PSERIES) += spapr_client.o
> > 
> > This applies in a bunch of places here.  Just calling things "client"
> > is clear enough if you're already thinking about Open Firmware.  But
> > this appears in a bunch of places where you might come across it
> > without that context, in which case it could be kind of confusing.  So
> > I'd suggest using "of_client" or "of_client_interface" in most places
> > you're using "client".
> > 
> >>  obj-$(CONFIG_SPAPR_RNG) +=  spapr_rng.o
> >>  # IBM PowerNV
> >>  obj-$(CONFIG_POWERNV) += pnv.o pnv_xscom.o 

Re: [PATCH qemu v2] spapr: Kill SLOF

2020-01-05 Thread Alexey Kardashevskiy



On 06/01/2020 15:19, David Gibson wrote:
> On Mon, Jan 06, 2020 at 10:42:42AM +1100, Alexey Kardashevskiy wrote:
>> The Petitboot bootloader is way more advanced than SLOF is ever going to
>> be as Petitboot comes with the full-featured Linux kernel with all
>> the drivers, and initramdisk with quite user friendly interface.
>> The problem with ditching SLOF is that an unmodified pseries kernel can
>> either start via:
>> 1. kexec, this requires presence of RTAS and skips
>> ibm,client-architecture-support entirely;
>> 2. normal boot, this heavily relies on the OF1275 client interface to
>> fetch the device tree and do early setup (claim memory).
>>
>> This adds a new bios-less mode to the pseries machine: "bios=on|off".
>> When enabled, QEMU does not load SLOF and jumps to the kernel from
>> "-kernel".
>>
>> The client interface is implemented exactly as RTAS - a 20 bytes blob,
>> right next after the RTAS blob. The entry point is passed to the kernel
>> via GPR5.
>>
>> This implements a handful of client interface methods just to get going.
>> In particular, this implements the device tree fetching,
>> ibm,client-architecture-support and instantiate-rtas.
>>
>> This implements changing FDT properties only for "linux,rtas-base" and
>> "linux,rtas-entry", just to get early boot going.
>>
>> This assigns "phandles" to device tree nodes as there is no more SLOF
>> and OF nodes addresses of which served as phandle values.
>> This keeps predefined nodes (such as XICS/NVLINK/...) unchanged.
>> phandles are regenerated at every FDT rebuild.
>>
>> When bios=off, this adds "/chosen" every time QEMU builds a tree.
>>
>> This implements "claim" which the client (Linux) uses for memory
>> allocation; this is also  used by QEMU for claiming kernel/initrd images,
>> client interface entry point, RTAS and the initial stack.
>>
>> While at this, add a "kernel-addr" machine parameter to allow moving
>> the kernel in memory. This is useful for debugging if the kernel is
>> loaded at @0, although not necessary.
>>
>> This does not implement instances properly but in order to boot a VM,
>> we only really need a stdout instance and the "/" instance for
>> "call-method", we fake these.
> 
> As we've discussed, I really like the idea of this.  It think the
> basic approach looks good too.
> 
> As you probably realize, there are quite a lot of things to be
> polished though.  Comments below.
> 
>>
>> The test command line:
>>
>> qemu-system-ppc64 \
>> -nodefaults \
>> -chardev stdio,id=STDIO0,signal=off,mux=on \
>> -device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
>> -mon id=MON0,chardev=STDIO0,mode=readline \
>> -nographic \
>> -vga none \
>> -kernel vmldbg \
>> -machine pseries,bios=off \
>> -m 4G \
>> -enable-kvm \
>> -initrd pb/rootfs.cpio.xz \
>> img/u1804-64le.qcow2 \
>> -snapshot \
>> -smp 8,threads=8 \
>> -L /home/aik/t/qemu-ppc64-bios/ \
>> -trace events=qemu_trace_events \
>> -d guest_errors \
>> -chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.ssh37874 \
>> -mon chardev=SOCKET0,mode=control
>>
>> Signed-off-by: Alexey Kardashevskiy 
>> ---
>> Changes:
>> v2:
>> * fixed claim()
>> * added "setprop"
>> * cleaner client interface and RTAS blobs management
>> * boots to petitboot and further to the target system
>> * more trace points
>> ---
>>  hw/ppc/Makefile.objs   |   1 +
>>  include/hw/loader.h|   1 +
>>  include/hw/ppc/spapr.h |  25 ++-
>>  hw/ppc/spapr.c | 231 ++--
>>  hw/ppc/spapr_client.c  | 464 +
>>  hw/ppc/spapr_hcall.c   |  49 +++--
>>  hw/ppc/trace-events|   9 +
>>  7 files changed, 743 insertions(+), 37 deletions(-)
>>  create mode 100644 hw/ppc/spapr_client.c
>>
>> diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
>> index 101e9fc59185..ce31c0acd2fb 100644
>> --- a/hw/ppc/Makefile.objs
>> +++ b/hw/ppc/Makefile.objs
>> @@ -6,6 +6,7 @@ obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o 
>> spapr_rtas.o
>>  obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o
>>  obj-$(CONFIG_PSERIES) += spapr_cpu_core.o spapr_ovec.o spapr_irq.o
>>  obj-$(CONFIG_PSERIES) += spapr_tpm_proxy.o
>> +obj-$(CONFIG_PSERIES) += spapr_client.o
> 
> This applies in a bunch of places here.  Just calling things "client"
> is clear enough if you're already thinking about Open Firmware.  But
> this appears in a bunch of places where you might come across it
> without that context, in which case it could be kind of confusing.  So
> I'd suggest using "of_client" or "of_client_interface" in most places
> you're using "client".
> 
>>  obj-$(CONFIG_SPAPR_RNG) +=  spapr_rng.o
>>  # IBM PowerNV
>>  obj-$(CONFIG_POWERNV) += pnv.o pnv_xscom.o pnv_core.o pnv_lpc.o pnv_psi.o 
>> pnv_occ.o pnv_bmc.o
>> diff --git a/include/hw/loader.h b/include/hw/loader.h
>> index 48a96cd55904..a2f58077a47e 100644
>> --- a/include/hw/loader.h
>> +++ b/include/hw/loader.h
>> @@ -262,6 +262,7 @@ MemoryRegion *rom_add_blob(const char *name, const void 
>> *blob, 

Re: [PATCH qemu v2] spapr: Kill SLOF

2020-01-05 Thread David Gibson
On Mon, Jan 06, 2020 at 10:42:42AM +1100, Alexey Kardashevskiy wrote:
> The Petitboot bootloader is way more advanced than SLOF is ever going to
> be as Petitboot comes with the full-featured Linux kernel with all
> the drivers, and initramdisk with quite user friendly interface.
> The problem with ditching SLOF is that an unmodified pseries kernel can
> either start via:
> 1. kexec, this requires presence of RTAS and skips
> ibm,client-architecture-support entirely;
> 2. normal boot, this heavily relies on the OF1275 client interface to
> fetch the device tree and do early setup (claim memory).
> 
> This adds a new bios-less mode to the pseries machine: "bios=on|off".
> When enabled, QEMU does not load SLOF and jumps to the kernel from
> "-kernel".
> 
> The client interface is implemented exactly as RTAS - a 20 bytes blob,
> right next after the RTAS blob. The entry point is passed to the kernel
> via GPR5.
> 
> This implements a handful of client interface methods just to get going.
> In particular, this implements the device tree fetching,
> ibm,client-architecture-support and instantiate-rtas.
> 
> This implements changing FDT properties only for "linux,rtas-base" and
> "linux,rtas-entry", just to get early boot going.
> 
> This assigns "phandles" to device tree nodes as there is no more SLOF
> and OF nodes addresses of which served as phandle values.
> This keeps predefined nodes (such as XICS/NVLINK/...) unchanged.
> phandles are regenerated at every FDT rebuild.
> 
> When bios=off, this adds "/chosen" every time QEMU builds a tree.
> 
> This implements "claim" which the client (Linux) uses for memory
> allocation; this is also  used by QEMU for claiming kernel/initrd images,
> client interface entry point, RTAS and the initial stack.
> 
> While at this, add a "kernel-addr" machine parameter to allow moving
> the kernel in memory. This is useful for debugging if the kernel is
> loaded at @0, although not necessary.
> 
> This does not implement instances properly but in order to boot a VM,
> we only really need a stdout instance and the "/" instance for
> "call-method", we fake these.

As we've discussed, I really like the idea of this.  It think the
basic approach looks good too.

As you probably realize, there are quite a lot of things to be
polished though.  Comments below.

> 
> The test command line:
> 
> qemu-system-ppc64 \
> -nodefaults \
> -chardev stdio,id=STDIO0,signal=off,mux=on \
> -device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
> -mon id=MON0,chardev=STDIO0,mode=readline \
> -nographic \
> -vga none \
> -kernel vmldbg \
> -machine pseries,bios=off \
> -m 4G \
> -enable-kvm \
> -initrd pb/rootfs.cpio.xz \
> img/u1804-64le.qcow2 \
> -snapshot \
> -smp 8,threads=8 \
> -L /home/aik/t/qemu-ppc64-bios/ \
> -trace events=qemu_trace_events \
> -d guest_errors \
> -chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.ssh37874 \
> -mon chardev=SOCKET0,mode=control
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
> Changes:
> v2:
> * fixed claim()
> * added "setprop"
> * cleaner client interface and RTAS blobs management
> * boots to petitboot and further to the target system
> * more trace points
> ---
>  hw/ppc/Makefile.objs   |   1 +
>  include/hw/loader.h|   1 +
>  include/hw/ppc/spapr.h |  25 ++-
>  hw/ppc/spapr.c | 231 ++--
>  hw/ppc/spapr_client.c  | 464 +
>  hw/ppc/spapr_hcall.c   |  49 +++--
>  hw/ppc/trace-events|   9 +
>  7 files changed, 743 insertions(+), 37 deletions(-)
>  create mode 100644 hw/ppc/spapr_client.c
> 
> diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
> index 101e9fc59185..ce31c0acd2fb 100644
> --- a/hw/ppc/Makefile.objs
> +++ b/hw/ppc/Makefile.objs
> @@ -6,6 +6,7 @@ obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o 
> spapr_rtas.o
>  obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o
>  obj-$(CONFIG_PSERIES) += spapr_cpu_core.o spapr_ovec.o spapr_irq.o
>  obj-$(CONFIG_PSERIES) += spapr_tpm_proxy.o
> +obj-$(CONFIG_PSERIES) += spapr_client.o

This applies in a bunch of places here.  Just calling things "client"
is clear enough if you're already thinking about Open Firmware.  But
this appears in a bunch of places where you might come across it
without that context, in which case it could be kind of confusing.  So
I'd suggest using "of_client" or "of_client_interface" in most places
you're using "client".

>  obj-$(CONFIG_SPAPR_RNG) +=  spapr_rng.o
>  # IBM PowerNV
>  obj-$(CONFIG_POWERNV) += pnv.o pnv_xscom.o pnv_core.o pnv_lpc.o pnv_psi.o 
> pnv_occ.o pnv_bmc.o
> diff --git a/include/hw/loader.h b/include/hw/loader.h
> index 48a96cd55904..a2f58077a47e 100644
> --- a/include/hw/loader.h
> +++ b/include/hw/loader.h
> @@ -262,6 +262,7 @@ MemoryRegion *rom_add_blob(const char *name, const void 
> *blob, size_t len,
>  int rom_add_elf_program(const char *name, GMappedFile *mapped_file, void 
> *data,
>  size_t datasize, size_t romsize, hwaddr addr,
>