Re: [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-28 Thread Christoffer Dall
On Tue, Jan 28, 2014 at 03:47:32PM +1100, Benjamin Herrenschmidt wrote:
> On Mon, 2014-01-27 at 16:44 -0800, Christoffer Dall wrote:
> 
> > I'm losing track of this discussion, Ben, can you explain a bit?  You
> > wrote:
> > 
> >   Having a byte array coming in that represents what the CPU does in its
> >   current byte order means you do *NOT* need to query the endianness of
> >   the guest CPU from userspace.
> > 
> > What does "a byte array that represents what the CPU does in its current
> > byte order" mean in this context?  Do you mean the VCPU or the physical
> > CPU when you say CPU?
> 
> It doesn't matter once it's a byte array in address order. Again this is
> the *right* abstraction for the kernel ABI, because you do not care
> about the endianness of either side, guest or host.
> 
> It makes no sense to treat a modern CPU data bus as having an MSB and an
> LSB (even if they have them sometimes on the block diagram). Only when
> *interpreting a value* on that bus, such as an *address*, does the
> endianness become of use.
> 
> Treat the bus instead as an ordered sequence of bytes in ascending
> address order and most of the complexity goes away.
> 
> From there, for a given device, it all depends on which bytes *that device*
> chooses to consider as the MSB vs. LSB. It's not even a bus thing,
> though of course some busses suggest an endianness, and some, like PCI,
> mandate it for configuration space.
> 
> But it remains a device-side choice.
> 
> > I read your text as saying "just do a store of the register into the
> > data pointer and don't worry about endianness", but somebody, somewhere,
> > has to check the VCPU endianness setting.
> > 
> > I'm probably wrong, and you are probably the right person to clear this
> > up, but can you formulate exactly what you think the KVM ABI is and how
> > you would put it in Documentation/virtual/kvm/api.txt?
> > 
> > My point of view is that it is KVM that needs to do this, and it should
> > "emulate the CPU" by performing a byteswap in the case where the CPU
> > E-bit is set on ARM, but this is an ARM-centric way of looking at
> > things.
> 
> The ABI going to qemu should be (and inside qemu from TCG to the
> emulation) that the CPU did an access N bytes wide at address A
> whose value is the byte array data[] in ascending address order.
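
For reference, the KVM_EXIT_MMIO payload in struct kvm_run
(include/uapi/linux/kvm.h) already has exactly this byte-array shape.
A minimal sketch of filling it under the semantics Ben describes
follows; the helper is hypothetical, not actual kernel code:

#include <stdint.h>
#include <string.h>

/* Mirrors the KVM_EXIT_MMIO payload layout in struct kvm_run. */
struct kvm_mmio {
    uint64_t phys_addr;   /* guest physical address of the access */
    uint8_t  data[8];     /* value as bytes in ascending address order */
    uint32_t len;         /* access width in bytes */
    uint8_t  is_write;
};

/*
 * Hypothetical helper: record a guest store of `len` bytes whose
 * memory image (what the guest would have written to RAM) is already
 * laid out in `bytes` in ascending address order.  No endianness
 * query of guest or host is needed.
 */
static void mmio_fill_write(struct kvm_mmio *m, uint64_t addr,
                            const uint8_t *bytes, uint32_t len)
{
    m->phys_addr = addr;
    m->len = len;
    m->is_write = 1;
    memcpy(m->data, bytes, len);  /* data[0] = byte at addr + 0, ... */
}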
> 
OK, I've sent a v3 of the ABI clarification patch following the wording
from you and Scott.  I think we all agree on what the format should look
like at this point, and hopefully we can quickly agree on a text to
describe it.

Thanks,
-Christoffer


Re: [Qemu-devel] [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-28 Thread Avi Kivity

On 01/28/2014 01:27 AM, Benjamin Herrenschmidt wrote:

On Wed, 2014-01-22 at 17:29 +, Peter Maydell wrote:

Basically, if it were on a real bus: take the byte value
that corresponds to address phys_addr + 0 and place
it into data[0], take the byte value that corresponds to
phys_addr + 1 and place it into data[1], etc.

This just isn't how real buses work.

Actually it can be :-)


  There is no
"address + 1, address + 2". There is a single address
for the memory transaction and a set of data on
data lines and some separate size information.
How the device at the far end of the bus chooses
to respond to 32 bit accesses to address X versus
8 bit accesses to addresses X through X+3 is entirely
its own business and unrelated to the CPU.

However, the bus has a definition of which byte lane is the lowest in
address order. Byte-order invariance is an important property of
all busses.

I think that trying to treat it any differently than an address
ordered series of bytes is going to turn into a complete and
inextricable mess.


I agree.

The two options are:

 (address, byte array, length)

and

 (address, value, word size, endianness)

The first is the KVM ABI, the second is how MemoryRegions work. Both are 
valid, but the first is more general (it supports the 3-byte accesses 
sometimes generated on x86).
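
As a sketch, the two representations side by side (all names here are
illustrative, not from KVM or QEMU); converting the second into the
first is always possible, but not vice versa for a 3-byte access:

#include <stdint.h>

/* Option 1: what the KVM ABI carries -- no endianness field needed. */
struct mmio_bytes {
    uint64_t addr;
    uint8_t  data[8];   /* ascending address order */
    uint32_t len;       /* any length up to 8, including 3 */
};

/* Option 2: a MemoryRegion-style access. */
enum device_endian { DEVICE_LITTLE_ENDIAN, DEVICE_BIG_ENDIAN };

struct mmio_value {
    uint64_t addr;
    uint64_t value;     /* the integer the device sees */
    unsigned size;      /* word size: 1, 2, 4 or 8 */
    enum device_endian endian;
};

static void value_to_bytes(const struct mmio_value *v, struct mmio_bytes *b)
{
    b->addr = v->addr;
    b->len  = v->size;
    for (unsigned i = 0; i < v->size; i++) {
        unsigned shift = (v->endian == DEVICE_LITTLE_ENDIAN)
                             ? 8 * i                   /* LSB at lowest address */
                             : 8 * (v->size - 1 - i);  /* MSB at lowest address */
        b->data[i] = (uint8_t)(v->value >> shift);
    }
}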

  (It would
be perfectly possible to have a device which when
you read from address X as 32 bits returned 0x12345678,
when you read from address X as 16 bits returned
0x9abc, returned 0x42 for an 8 bit read from X+1,
and so on. Having byte reads from X..X+3 return
values corresponding to parts of the 32 bit access
is purely a convention.)

Right, it's possible. It's also stupid and not how most modern devices
and busses work. Besides, there is no reason why that can't be
implemented with Victor's proposal anyway.


Right.


Re: [Qemu-devel] KVM and variable-endianness guest CPUs

2014-01-28 Thread Avi Kivity

On 01/22/2014 12:22 PM, Peter Maydell wrote:

On 22 January 2014 05:39, Victor Kamensky  wrote:

Hi Guys,

Christoffer and I had a somewhat heated chat :) on this
subject last night. Christoffer, I really appreciate
your time! We did not really reach agreement
during the chat, and Christoffer asked me to follow
up on this thread.
Here it goes. Sorry, it is a very long email.

I don't believe we can assign any endianness to the
mmio.data[] byte array. I believe mmio.data[] and
mmio.len act just like memcpy, and that is all. As
memcpy does not imply any endianness of the underlying
data, mmio.data[] should not either.

This email is about five times too long to be actually
useful, but the major issue here is that the data being
transferred is not just a bag of bytes. The data[]
array plus the size field are being (mis)used to indicate
that the memory transaction is one of:
  * an 8 bit access
  * a 16 bit access of some uint16_t value
  * a 32 bit access of some uint32_t value
  * a 64 bit access of some uint64_t value

exactly as a CPU hardware bus would do. It's
because the API is defined in this awkward way with
a uint8_t[] array that we need to specify how both
sides should go from the actual properties of the
memory transaction (value and size) to filling in the
array.


That is not how x86 hardware works.  Back when there was a bus, there 
were no address lines A0-A2; instead we had eight byte enables, BE0-BE7.  A 
memory transaction placed the qword address on the address lines and 
asserted the byte enables for the appropriate byte, word, dword, or 
qword, shifted for the low-order bits of the address.


If you generated an unaligned access, the transaction was split into 
two, so an 8-byte write might appear as a 5-byte write followed by a 
3-byte write.  In fact, the two halves of the transaction might go to 
different devices, or one might go to a device and another to memory.


PCI works the same way.
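
A rough sketch of that byte-enable scheme (an illustration of the idea,
not a bus-accurate model):

#include <stdint.h>
#include <stdio.h>

/*
 * Sketch: derive the byte enables BE0-BE7 for an access of `len`
 * bytes at `addr` on a 64-bit bus whose address phase carries the
 * qword (8-byte-aligned) address.  An access crossing a qword
 * boundary is split into two transactions, as described above.
 */
static void bus_access(uint64_t addr, unsigned len)
{
    while (len) {
        unsigned off   = addr & 7;   /* low bits pick the byte lanes */
        unsigned chunk = 8 - off;    /* bytes left in this qword */
        if (chunk > len)
            chunk = len;
        uint8_t be = (uint8_t)(((1u << chunk) - 1) << off);
        printf("addr=0x%llx BE=0x%02x\n",
               (unsigned long long)(addr & ~7ull), be);
        addr += chunk;
        len  -= chunk;
    }
}

int main(void)
{
    bus_access(0x1003, 8);  /* unaligned qword: a 5-byte then a 3-byte write */
    return 0;
}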

Furthermore, device endianness is entirely irrelevant
for deciding the properties of mmio.data[], because the
thing we're modelling here is essentially the CPU->bus
interface. In real hardware, the properties of individual
devices on the bus are irrelevant to how the CPU's
interface to the bus behaves, and similarly here the
properties of emulated devices don't affect how KVM's
interface to QEMU userspace needs to work.

MemoryRegion's 'endianness' field, incidentally, is
a dreadful mess that we should get rid of. It is attempting
to model the property that some buses/bridges have of
doing byte-lane-swaps on data that passes through as
a property of the device itself. It would be better if we
modelled it properly, with container regions having possible
byte-swapping and devices just being devices.



No, that is not what it is modelling.

Suppose a little-endian CPU writes a dword 0x12345678 to address 0 of a 
device, and reads back a byte from address 0.  What value do you read back?


Some (most) devices will return 0x78, others will return 0x12. Other 
devices don't support mixed sizes at all, but many do.  PCI 
configuration space is an example; it is common to read both Device ID 
and Vendor ID with a single 32-bit transaction, but you can also read 
them separately with two 16-bit transactions.  Because PCI is 
little-endian, the Vendor ID at address 0 will be returned as the low 
word of the 32-bit read on a little-endian processor.
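
Concretely (a sketch; the dword value below is made up for illustration):

#include <stdint.h>

/* PCI config space is little-endian: Vendor ID at offset 0, Device ID
 * at offset 2.  A 32-bit read at offset 0 returns both, with the
 * Vendor ID in the low word. */
static uint16_t vendor_id(uint32_t dword0) { return (uint16_t)dword0; }
static uint16_t device_id(uint32_t dword0) { return (uint16_t)(dword0 >> 16); }

/* e.g. dword0 == 0x12345678 -> vendor 0x5678, device 0x1234 */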


If you remove device endianness from memory regions, you have to pass 
the data as arrays of bytes (like the KVM interface) and let the device 
assemble words from those bytes itself, taking into consideration its 
own endianness.  What MemoryRegion's endianness field does is let the 
device declare its endianness to the API and let the API do all the work.
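
In other words (a sketch of the idea, not QEMU's actual implementation),
the declared endianness tells the core how to assemble a value from
address-ordered bytes on the device's behalf:

#include <stddef.h>
#include <stdint.h>

enum device_endian { DEVICE_LITTLE_ENDIAN, DEVICE_BIG_ENDIAN };

/* Assemble the integer that a device with the given declared
 * endianness should see from `len` bytes in ascending address order. */
static uint64_t device_value(const uint8_t *data, size_t len,
                             enum device_endian e)
{
    uint64_t v = 0;
    for (size_t i = 0; i < len; i++) {
        if (e == DEVICE_LITTLE_ENDIAN)
            v |= (uint64_t)data[i] << (8 * i);  /* data[0] is the LSB */
        else
            v = (v << 8) | data[i];             /* data[0] is the MSB */
    }
    return v;
}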



Re: [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-27 Thread Benjamin Herrenschmidt
On Mon, 2014-01-27 at 16:44 -0800, Christoffer Dall wrote:

> I'm losing track of this discussion, Ben, can you explain a bit?  You
> wrote:
> 
>   Having a byte array coming in that represents what the CPU does in its
>   current byte order means you do *NOT* need to query the endianness of
>   the guest CPU from userspace.
> 
> What does "a byte array that represents what the CPU does in its current
> byte order" mean in this context?  Do you mean the VCPU or the physical
> CPU when you say CPU?

It doesn't matter once it's a byte array in address order. Again this is
the *right* abstraction for the kernel ABI, because you do not care
about the endianness of either side, guest or host.

It makes no sense to treat a modern CPU data bus as having an MSB and an
LSB (even if they have them sometimes on the block diagram). Only when
*interpreting a value* on that bus, such as an *address*, does the
endianness become of use.

Treat the bus instead as an ordered sequence of bytes in ascending
address order and most of the complexity goes away.

From there, for a given device, it all depends on which bytes *that device*
chooses to consider as the MSB vs. LSB. It's not even a bus thing,
though of course some busses suggest an endianness, and some, like PCI,
mandate it for configuration space.

But it remains a device-side choice.

> I read your text as saying "just do a store of the register into the
> data pointer and don't worry about endianness", but somebody, somewhere,
> has to check the VCPU endianness setting.
> 
> I'm probably wrong, and you are probably the right person to clear this
> up, but can you formulate exactly what you think the KVM ABI is and how
> you would put it in Documentation/virtual/kvm/api.txt?
> 
> My point of view is that it is KVM that needs to do this, and it should
> "emulate the CPU" by performing a byteswap in the case where the CPU
> E-bit is set on ARM, but this is an ARM-centric way of looking at
> things.

The ABI going to qemu should be (and inside qemu from TCG to the
emulation) that the CPU did an access N bytes wide at address A
whose value is the byte array data[] in ascending address order.

Ben.




Re: [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-27 Thread Christoffer Dall
On Tue, Jan 28, 2014 at 11:36:13AM +1100, Benjamin Herrenschmidt wrote:
> On Mon, 2014-01-27 at 23:49 +, Peter Maydell wrote:
> > 
> > Er, what? If we make the array be in the guest's current order
> > then by definition userspace has to look at the guest's
> > current endianness. I agree that would be bad. Either
> > of the two current proposals (host kernel order; guest
> > CPU's native/natural/default-byte-order) avoid it, though.
> 
> No, this has nothing to do with the guest endianness, and
> everything to do with the (hopefully) byte-address-invariant bus we have
> on the processor.
> 
> Anyway, the existing crap is ABI so I suspect we have to stick with it,
> just maybe document it better.
> 

I'm losing track of this discussion, Ben, can you explain a bit?  You
wrote:

  Having a byte array coming in that represents what the CPU does in its
  current byte order means you do *NOT* need to query the endianness of
  the guest CPU from userspace.

What does "a byte array that represents what the CPU does in its current
byte order" mean in this context?  Do you mean the VCPU or the physical
CPU when you say CPU?

I read your text as saying "just do a store of the register into the
data pointer and don't worry about endianness", but somebody, somewhere,
has to check the VCPU endianness setting.

I'm probably wrong, and you are probably the right person to clear this
up, but can you formulate exactly what you think the KVM ABI is and how
you would put it in Documentation/virtual/kvm/api.txt?

My point of view is that it is KVM that needs to do this, and it should
"emulate the CPU" by performing a byteswap in the case where the CPU
E-bit is set on ARM, but this is an ARM-centric way of looking at
things.

Thanks,
-Christoffer


Re: [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-27 Thread Christoffer Dall
On Tue, Jan 28, 2014 at 11:32:41AM +1100, Benjamin Herrenschmidt wrote:
> On Thu, 2014-01-23 at 20:11 -0800, Victor Kamensky wrote:
> > > I would take 50 byteswaps with a clear ABI any day over an obscure
> > > standard that can avoid a single hardware-on-register instruction.
> > > This is about designing a clean software interface, not about
> > > building an optimized integrated stack.
> > >
> > > Unfortunately, this is going nowhere, so I think we need to stop
> > > this thread.  As you can see I have sent a patch as a clarification
> > > to the ABI, if it's merged we can move on with more important tasks.
> > 
> > OK, that is fine. I still believe it is not the best choice,
> > but I agree that we need to move on. I will respin my
> > V7 KVM BE patches according to these new semantics, I will
> > integrate the comments that you (thanks!) and others gave me
> > on the mailing list and post my series again when it is ready.
> 
> Right, the whole "host endian" thing is a horrible choice any way you
> look at it, but I'm afraid it's unfixable since it's already ABI :-(
> 
Why is it a horrible choice?

I don't think it's actually ABI at this point, it's undefined.

The only thing fixed is PPC BE host and ARM LE host, and in both cases
we currently perform a byteswap in KVM if the guest is a different
endianness.

Honestly I don't care which way it's defined, as long as it's defined
somehow, and I have not yet seen anyone formulate how the ABI
specification should be worded so that people clearly understand what's
going on.

If you take a look at the v2 patch "KVM: Specify byte order for
KVM_EXIT_MMIO", that's where it ended up.

If you can formulate something with your experience in endianness that
makes this clear, it would be extremely helpful.

-Christoffer


Re: [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-27 Thread Benjamin Herrenschmidt
On Mon, 2014-01-27 at 23:49 +, Peter Maydell wrote:
> 
> Er, what? If we make the array be in the guest's current order
> then by definition userspace has to look at the guest's
> current endianness. I agree that would be bad. Either
> of the two current proposals (host kernel order; guest
> CPU's native/natural/default-byte-order) avoid it, though.

No, this has nothing to do with the guest endianness, and
everything to do with the (hopefully) byte-address-invariant bus we have
on the processor.

Anyway, the existing crap is ABI so I suspect we have to stick with it,
just maybe document it better.

Ben.




Re: [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-27 Thread Benjamin Herrenschmidt
On Thu, 2014-01-23 at 20:11 -0800, Victor Kamensky wrote:
> > I would take 50 byteswaps with a clear ABI any day over an obscure
> > standard that can avoid a single hardware-on-register instruction.
> > This is about designing a clean software interface, not about
> > building an optimized integrated stack.
> >
> > Unfortunately, this is going nowhere, so I think we need to stop
> > this thread.  As you can see I have sent a patch as a clarification
> > to the ABI, if it's merged we can move on with more important tasks.
> 
> OK, that is fine. I still believe it is not the best choice,
> but I agree that we need to move on. I will respin my
> V7 KVM BE patches according to these new semantics, I will
> integrate the comments that you (thanks!) and others gave me
> on the mailing list and post my series again when it is ready.

Right, the whole "host endian" thing is a horrible choice any way you
look at it, but I'm afraid it's unfixable since it's already ABI :-(

Ben.




Re: [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-27 Thread Benjamin Herrenschmidt

> The point is simple, and Peter has made it over and over:
> Any consumer of a memory operation sees "value, len, address".
> 
> This is what KVM_EXIT_MMIO emulates.  So just by knowing the ABI
> definition and having a pointer to the structure you need to be able to
> tell me "value, len, address".

But that's useless, because it doesn't tell you the byte order
of the value, which is critical for emulation, unless you *define* the
byte order of the value in your ABI and thus include an artificial
swap when the guest is in a different endian mode than the host.


My understanding is that ARM is byte-address invariant, as is powerpc,
so it makes a LOT more sense to carry a sequence of address-ordered
bytes instead, which will correspond to what the guest code thinks
it's writing, and have the device respond appropriately based on
the endianness of the bus it sits on or of the device itself.
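
A small host-side illustration of byte-address invariance (compile and
run it on both an LE and a BE machine to see the point):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Store a 32-bit value, then look at the same memory as bytes.
     * The byte at offset i is the same whether it is read with a
     * byte load or driven as part of the word store; only which byte
     * the CPU treats as the MSB differs between LE and BE cores. */
    uint32_t word = 0x01020304;
    uint8_t bytes[4];
    memcpy(bytes, &word, sizeof word);
    for (int i = 0; i < 4; i++)
        printf("addr+%d = 0x%02x\n", i, bytes[i]);
    /* LE host: 04 03 02 01; BE host: 01 02 03 04.  Either way, the
     * address-ordered byte sequence is exactly what the storing code
     * produced, with no reference to the other side's endianness. */
    return 0;
}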


> > >  (2) the API between kernel and userspace needs to define
> > >  the semantics of mmio.data, ie how to map between
> > >  "x byte wide transaction with value v" and the array,
> > >  and that is primarily what this conversation is about
> > >  (3) the only choice which is both (a) sensible and (b)
> > >  not breaking existing usage is to say "the array is
> > >  in host-kernel-byte-order"
> > >  (4) PPC CPUs in BE mode and ARM CPUs in BE mode are not
> > >  the same, because in the ARM case it is doing an
> > >  internal-to-CPU byteswap, and in the PPC case it is not

I very much doubt that there is a difference here; I'm not sure about
that business with the "internal byteswap".

The order in which the bytes of a word are presented on the bus changes
depending on the core endianness. If that's what you call a "byteswap"
then both ARM and PPC do it when they are in their "other" endian,
but that confusion comes from assuming that a data bus has an endianness
at all to begin with.

I was hoping that by 2014, such ideas were things of the past.

> > That is one of the key disconnects. I'll go find real examples
> > in the ARM LE, ARM BE, and PPC BE Linux kernels. Just for
> > everybody's sake, here is a summary of the disconnect:
> >
> > If we have the same h/w connected to the memory bus in ARM
> > and PPC systems, and we have the following three pieces
> > of code that work with r0 holding the same register address
> > of the same device:
> >
> > 1. ARM LE word write of 0x04030201:
> > setend le
> > mov r1, #0x04030201
> > str r1, [r0]
> >
> > 2. ARM BE word write of 0x01020304:
> > setend be
> > mov r1, #0x01020304
> > str r1, [r0]
> >
> > 3. PPC BE word write of 0x01020304:
> > lis r1,0x102
> > ori r1,r1,0x304
> > stw r1,0(r0)
> >
> > I claim that h/w will see the same data on the bus lines in all
> > three cases, and h/w would act the same in all three
> > cases. Peter says that in the ARM BE and PPC BE cases h/w
> > would act differently.
> >
> > If anyone else can offer an opinion on that while I am looking
> > for real examples, that would be great.
> > 
> 
> I really don't think listing all these examples helps.  You need to focus
> on the key points that Peter listed in his previous mail.
> 
> I tried in our chat to ask you these questions:
> 
> vcpu_data_host_to_guest() is handling a read from an emulated device.
> All the info you have is:
> (1) len of memory access
> (2) mmio.data pointer
> (3) destination register
> (4) host CPU endianness
> (5) guest CPU endianness
> 
> Based on this information alone, you need to decide whether you do a
> byteswap or not before loading the hardware register upon returning to
> the guest.
> 
> You will find it impossible to answer, because you don't know the layout
> of mmio.data, and that is the thing we are trying to solve.
> 
> If you cannot reply to this point in less than 50 lines, or mention
> anything about devices being LE or BE, or come up with examples, I am
> probably not going to read your reply, sorry.
> 
> -Christoffer




Re: [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-27 Thread Benjamin Herrenschmidt
On Thu, 2014-01-23 at 15:33 +, Peter Maydell wrote:
>  (4) PPC CPUs in BE mode and ARM CPUs in BE mode are not
>  the same, because in the ARM case it is doing an
>  internal-to-CPU byteswap, and in the PPC case it is not

Aren't they both byte-order invariant?

In that case they are the same.

Ben.




Re: [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-27 Thread Benjamin Herrenschmidt
On Tue, 2014-01-28 at 11:07 +1100, Benjamin Herrenschmidt wrote:
> On Thu, 2014-01-23 at 15:33 +, Peter Maydell wrote:
> >  (4) PPC CPUs in BE mode and ARM CPUs in BE mode are not
> >  the same, because in the ARM case it is doing an
> >  internal-to-CPU byteswap, and in the PPC case it is not
> 
> Aren't they both byte-order invariant?

I meant byte-address...

> In that case they are the same.
> 
> Ben.
> 




Re: [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-27 Thread Peter Maydell
On 27 January 2014 23:34, Benjamin Herrenschmidt
 wrote:
> On Wed, 2014-01-22 at 20:02 +, Peter Maydell wrote:
>>
>> Defining it as being always guest-order would mean that
>> userspace had to continually look at the guest CPU
>> endianness bit, which is annoying and awkward.
>>
>> Defining it as always host-endian order is the most
>> reasonable option available. It also happens to work
>> for the current QEMU code, which is nice.
>
> No.
>
> Having a byte array coming in that represents what the CPU does in its
> current byte order means you do *NOT* need to query the endianness of
> the guest CPU from userspace.

Er, what? If we make the array be in the guest's current order
then by definition userspace has to look at the guest's
current endianness. I agree that would be bad. Either
of the two current proposals (host kernel order; guest
CPU's native/natural/default-byte-order) avoid it, though.

-- PMM


Re: [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-27 Thread Benjamin Herrenschmidt
On Wed, 2014-01-22 at 20:02 +, Peter Maydell wrote:
> 
> Defining it as being always guest-order would mean that
> userspace had to continually look at the guest CPU
> endianness bit, which is annoying and awkward.
> 
> Defining it as always host-endian order is the most
> reasonable option available. It also happens to work
> for the current QEMU code, which is nice.

No.

Having a byte array coming in that represents what the CPU does in its
current byte order means you do *NOT* need to query the endianness of
the guest CPU from userspace.

Ben.




Re: [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-27 Thread Benjamin Herrenschmidt
On Wed, 2014-01-22 at 11:29 -0800, Victor Kamensky wrote:
> I don't see why you are so attached to the desire to describe
> the data part of a memory transaction as just one of the int
> types. If we are talking about a bunch of hypothetical
> cases, imagine a bus that allows transactions with a
> size of 6 bytes. How do you describe such data in
> your ints speak? What endianness can you assign to a
> sequence of 6 bytes? Note that describing such a
> transaction as a set of 6 byte values at address
> $whatever makes perfect sense.

Absolutely. For example, the "real" bus out of a POWER8 core is
something like 128 bytes wide, though I wouldn't be surprised if it were
serialized; I don't actually know the details, it's all inside the chip.
The interconnect between chips is a multi-lane elastic interface whose
width has nothing to do with the payload size. Same goes for PCIe.

The only thing that can more or less sanely represent all of these things
is a series of bytes ordered by address, with attributes such as the
access size (or byte enables, if that makes more sense or we want to
emulate really funky stuff) and possibly other decoration that some
architectures might want to have (such as caching/combining attributes
etc., which *might* be useful under some circumstances).
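
One way such a representation could look (a sketch; every name here is
invented for illustration, not an existing API):

#include <stdint.h>

struct bus_txn {
    uint64_t addr;         /* starting address */
    const uint8_t *data;   /* bytes in ascending address order */
    uint32_t len;          /* payload size -- 6 is as valid as 4 */
    uint64_t byte_enables; /* optional: sparse lanes for funky devices */
    uint32_t attrs;        /* optional: caching/combining hints, etc. */
};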

Cheers,
Ben.




Re: [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-27 Thread Benjamin Herrenschmidt
On Wed, 2014-01-22 at 17:29 +, Peter Maydell wrote:
> 
> > Basically, if it were on a real bus: take the byte value
> > that corresponds to address phys_addr + 0 and place
> > it into data[0], take the byte value that corresponds to
> > phys_addr + 1 and place it into data[1], etc.
> 
> This just isn't how real buses work.

Actually it can be :-)

>  There is no
> "address + 1, address + 2". There is a single address
> for the memory transaction and a set of data on
> data lines and some separate size information.
> How the device at the far end of the bus chooses
> to respond to 32 bit accesses to address X versus
> 8 bit accesses to addresses X through X+3 is entirely
> its own business and unrelated to the CPU.

However, the bus has a definition of which byte lane is the lowest in
address order. Byte-order invariance is an important property of
all busses.

I think that trying to treat it any differently than an address
ordered series of bytes is going to turn into a complete and
inextricable mess.

>  (It would
> be perfectly possible to have a device which when
> you read from address X as 32 bits returned 0x12345678,
> when you read from address X as 16 bits returned
> 0x9abc, returned 0x42 for an 8 bit read from X+1,
> and so on. Having byte reads from X..X+3 return
> values corresponding to parts of the 32 bit access
> is purely a convention.)

Right, it's possible. It's also stupid and not how most modern devices
and busses work. Besides, there is no reason why that can't be
implemented with Victor's proposal anyway.

Ben.





Re: KVM and variable-endianness guest CPUs

2014-01-23 Thread Victor Kamensky
On 23 January 2014 18:14, Christoffer Dall  wrote:
> On Thu, Jan 23, 2014 at 04:50:18PM -0800, Victor Kamensky wrote:
>> On 23 January 2014 12:45, Christoffer Dall  
>> wrote:
>> > On Thu, Jan 23, 2014 at 08:25:35AM -0800, Victor Kamensky wrote:
>> >> On 23 January 2014 07:33, Peter Maydell  wrote:
>> >> > On 23 January 2014 15:06, Victor Kamensky  
>> >> > wrote:
>> >> >> In [1] I wrote
>> >> >>
>> >> >> "I don't see why you are so attached to the desire to describe
>> >> >> the data part of a memory transaction as just one of the int
>> >> >> types. If we are talking about a bunch of hypothetical
>> >> >> cases, imagine a bus that allows transactions with a
>> >> >> size of 6 bytes. How do you describe such data in
>> >> >> your ints speak? What endianness can you assign to a
>> >> >> sequence of 6 bytes? Note that describing such a
>> >> >> transaction as a set of 6 byte values at address
>> >> >> $whatever makes perfect sense."
>> >> >>
>> >> >> But notice that in your next reply [2] you just dropped it
>> >> >
>> >> > Yes. This is because it was one of the places where
>> >> > I would have just had to repeat "no, I'm afraid you're wrong
>> >> > about how hardware works". I think in general it's going
>> >> > to be better if I don't try to reply point by point to this
>> >> > email; I think you should go back and reread the emails I've
>> >> > sent. Key points:
>> >> >  (1) hardware is not doing anything involving arrays
>> >> >  of bytes
>> >>
>> >> Array of bytes or integers is just a way to describe data lines
>> >> on the bus. Did you look at this document?
>> >>
>> >> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0290g/ch06s05s01.html
>> >>
>> >> A0, A1, ..., A7 byte values are the same for both LE and BE-8
>> >> case (first two columns in the table) and they unambiguously
>> >> describe data bus signals
>> >>
>> >
>> > The point is simple, and Peter has made it over and over:
>> > Any consumer of a memory operation sees "value, len, address".
>>
>> and the "endianness" of the operation.
>
> no, value is value, is value.  By a consumer I mean whatever sits at
> the end of the memory bus.  There is no endianness.
>
>>
>> here is a memory operation
>>
>> *(int *) (0x1000) = 0x01020304;
>
> this is from the CPU's perspective and involves specifics of a
> programming language and a compiler.  You cannot compare it to the above.

compare it with a description of memory like this:
unsigned char mem[] = {0x4, 0x3, 0x2, 0x1};
that is the same memory content from *anyone's* perspective:
at address mem we have 0x4, at address mem+1 we have 0x3, etc.

>>
>> can you tell how the memory will look at address 0x1000? You can't:
>> in LE it will look one way, in BE it will be byteswapped.
>>
>> > This is what KVM_EXIT_MMIO emulates.  So just by knowing the ABI
>> > definition and having a pointer to the structure you need to be able to
>> > tell me "value, len, address".
>> >
>> >> >  (2) the API between kernel and userspace needs to define
>> >> >  the semantics of mmio.data, ie how to map between
>> >> >  "x byte wide transaction with value v" and the array,
>> >> >  and that is primarily what this conversation is about
>> >> >  (3) the only choice which is both (a) sensible and (b)
>> >> >  not breaking existing usage is to say "the array is
>> >> >  in host-kernel-byte-order"
>> >> >  (4) PPC CPUs in BE mode and ARM CPUs in BE mode are not
>> >> >  the same, because in the ARM case it is doing an
>> >> >  internal-to-CPU byteswap, and in the PPC case it is not
>> >>
>> >> That is one of the key disconnects. I'll go find real examples
>> >> in the ARM LE, ARM BE, and PPC BE Linux kernels. Just for
>> >> everybody's sake, here is a summary of the disconnect:
>> >>
>> >> If we have the same h/w connected to the memory bus in ARM
>> >> and PPC systems, and we have the following three pieces
>> >> of code that work with r0 holding the same register address
>> >> of the same device:
>> >>
>> >> 1. ARM LE word write of 0x04030201:
>> >> setend le
>> >> mov r1, #0x04030201
>> >> str r1, [r0]
>> >>
>> >> 2. ARM BE word write of 0x01020304:
>> >> setend be
>> >> mov r1, #0x01020304
>> >> str r1, [r0]
>> >>
>> >> 3. PPC BE word write of 0x01020304:
>> >> lis r1,0x102
>> >> ori r1,r1,0x304
>> >> stw r1,0(r0)
>> >>
>> >> I claim that h/w will see the same data on the bus lines in all
>> >> three cases, and h/w would act the same in all three
>> >> cases. Peter says that in the ARM BE and PPC BE cases h/w
>> >> would act differently.
>> >>
>> >> If anyone else can offer an opinion on that while I am looking
>> >> for real examples, that would be great.
>> >>
>> >
>> > I really don't think listing all these examples helps.
>>
>> I think Peter is wrong in his understanding of how real
>> BE PPC kernel drivers work with h/w-mapped devices. Going
>> with such a misunderstanding to suggest how the emulated
>> mmio case should hold info is quite strange.
>>
>> > You need to focus
>> > on the key points that Peter listed in his previous mail.
>> >
>> > I tried in our chat to ask you these questions: [...]

Re: KVM and variable-endianness guest CPUs

2014-01-23 Thread Christoffer Dall
On Thu, Jan 23, 2014 at 04:50:18PM -0800, Victor Kamensky wrote:
> On 23 January 2014 12:45, Christoffer Dall  
> wrote:
> > On Thu, Jan 23, 2014 at 08:25:35AM -0800, Victor Kamensky wrote:
> >> On 23 January 2014 07:33, Peter Maydell  wrote:
> >> > On 23 January 2014 15:06, Victor Kamensky  
> >> > wrote:
> >> >> In [1] I wrote
> >> >>
> >> >> "I don't see why you are so attached to the desire to describe
> >> >> the data part of a memory transaction as just one of the int
> >> >> types. If we are talking about a bunch of hypothetical
> >> >> cases, imagine a bus that allows transactions with a
> >> >> size of 6 bytes. How do you describe such data in
> >> >> your ints speak? What endianness can you assign to a
> >> >> sequence of 6 bytes? Note that describing such a
> >> >> transaction as a set of 6 byte values at address
> >> >> $whatever makes perfect sense."
> >> >>
> >> >> But notice that in your next reply [2] you just dropped it
> >> >
> >> > Yes. This is because it was one of the places where
> >> > I would have just had to repeat "no, I'm afraid you're wrong
> >> > about how hardware works". I think in general it's going
> >> > to be better if I don't try to reply point by point to this
> >> > email; I think you should go back and reread the emails I've
> >> > sent. Key points:
> >> >  (1) hardware is not doing anything involving arrays
> >> >  of bytes
> >>
> >> Array of bytes or integers is just a way to describe data lines
> >> on the bus. Did you look at this document?
> >>
> >> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0290g/ch06s05s01.html
> >>
> >> A0, A1, ..., A7 byte values are the same for both LE and BE-8
> >> case (first two columns in the table) and they unambiguously
> >> describe data bus signals
> >>
> >
> > The point is simple, and Peter has made it over and over:
> > Any consumer of a memory operation sees "value, len, address".
> 
> and the "endianness" of the operation.

no, value is value, is value.  By a consumer I mean whatever sits at
the end of the memory bus.  There is no endianness.

> 
> here is a memory operation
> 
> *(int *) (0x1000) = 0x01020304;

this is from the CPU's perspective and involves specifics of a
programming language and a compiler.  You cannot compare it to the above.

> 
> can you tell how the memory will look at address 0x1000? You can't:
> in LE it will look one way, in BE it will be byteswapped.
> 
> > This is what KVM_EXIT_MMIO emulates.  So just by knowing the ABI
> > definition and having a pointer to the structure you need to be able to
> > tell me "value, len, address".
> >
> >> >  (2) the API between kernel and userspace needs to define
> >> >  the semantics of mmio.data, ie how to map between
> >> >  "x byte wide transaction with value v" and the array,
> >> >  and that is primarily what this conversation is about
> >> >  (3) the only choice which is both (a) sensible and (b)
> >> >  not breaking existing usage is to say "the array is
> >> >  in host-kernel-byte-order"
> >> >  (4) PPC CPUs in BE mode and ARM CPUs in BE mode are not
> >> >  the same, because in the ARM case it is doing an
> >> >  internal-to-CPU byteswap, and in the PPC case it is not
> >>
> >> That is one of the key disconnects. I'll go find real examples
> >> in the ARM LE, ARM BE, and PPC BE Linux kernels. Just for
> >> everybody's sake, here is a summary of the disconnect:
> >>
> >> If we have the same h/w connected to the memory bus in ARM
> >> and PPC systems, and we have the following three pieces
> >> of code that work with r0 holding the same register address
> >> of the same device:
> >>
> >> 1. ARM LE word write of 0x04030201:
> >> setend le
> >> mov r1, #0x04030201
> >> str r1, [r0]
> >>
> >> 2. ARM BE word write of 0x01020304:
> >> setend be
> >> mov r1, #0x01020304
> >> str r1, [r0]
> >>
> >> 3. PPC BE word write of 0x01020304:
> >> lis r1,0x102
> >> ori r1,r1,0x304
> >> stw r1,0(r0)
> >>
> >> I claim that h/w will see the same data on the bus lines in all
> >> three cases, and h/w would act the same in all three
> >> cases. Peter says that in the ARM BE and PPC BE cases h/w
> >> would act differently.
> >>
> >> If anyone else can offer an opinion on that while I am looking
> >> for real examples, that would be great.
> >>
> >
> > I really don't think listing all these examples helps.
> 
> I think Peter is wrong in his understanding of how real
> BE PPC kernel drivers work with h/w-mapped devices. Going
> with such a misunderstanding to suggest how the emulated
> mmio case should hold info is quite strange.
> 
> > You need to focus
> > on the key points that Peter listed in his previous mail.
> >
> > I tried in our chat to ask you these questions:
> >
> > vcpu_data_host_to_guest() is handling a read from an emulated device.
> > All the info you have is:
> > (1) len of memory access
> > (2) mmio.data pointer
> > (3) destination register
> > (4) host CPU endianness
> > (5) guest CPU endianness
> >
> > Based on this information alone, you need to decide whether you do a
> > byteswap or not before loading the hardware register upon returning to
> > the guest. [...]

Re: KVM and variable-endianness guest CPUs

2014-01-23 Thread Victor Kamensky
On 23 January 2014 12:45, Christoffer Dall  wrote:
> On Thu, Jan 23, 2014 at 08:25:35AM -0800, Victor Kamensky wrote:
>> On 23 January 2014 07:33, Peter Maydell  wrote:
>> > On 23 January 2014 15:06, Victor Kamensky  
>> > wrote:
>> >> In [1] I wrote
>> >>
>> >> "I don't see why you are so attached to the desire to describe
>> >> the data part of a memory transaction as just one of the int
>> >> types. If we are talking about a bunch of hypothetical
>> >> cases, imagine a bus that allows transactions with a
>> >> size of 6 bytes. How do you describe such data in
>> >> your ints speak? What endianness can you assign to a
>> >> sequence of 6 bytes? Note that describing such a
>> >> transaction as a set of 6 byte values at address
>> >> $whatever makes perfect sense."
>> >>
>> >> But notice that in your next reply [2] you just dropped it
>> >
>> > Yes. This is because it was one of the places where
>> > I would have just had to repeat "no, I'm afraid you're wrong
>> > about how hardware works". I think in general it's going
>> > to be better if I don't try to reply point by point to this
>> > email; I think you should go back and reread the emails I've
>> > sent. Key points:
>> >  (1) hardware is not doing anything involving arrays
>> >  of bytes
>>
>> Array of bytes or integers is just a way to describe data lines
>> on the bus. Did you look at this document?
>>
>> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0290g/ch06s05s01.html
>>
>> A0, A1, ..., A7 byte values are the same for both LE and BE-8
>> case (first two columns in the table) and they unambiguously
>> describe data bus signals
>>
>
> The point is simple, and Peter has made it over and over:
> Any consumer of a memory operation sees "value, len, address".

and the "endianness" of the operation.

here is a memory operation

*(int *) (0x1000) = 0x01020304;

can you tell how the memory will look at address 0x1000? You can't:
in LE it will look one way, in BE it will be byteswapped.

> This is what KVM_EXIT_MMIO emulates.  So just by knowing the ABI
> definition and having a pointer to the structure you need to be able to
> tell me "value, len, address".
>
>> >  (2) the API between kernel and userspace needs to define
>> >  the semantics of mmio.data, ie how to map between
>> >  "x byte wide transaction with value v" and the array,
>> >  and that is primarily what this conversation is about
>> >  (3) the only choice which is both (a) sensible and (b)
>> >  not breaking existing usage is to say "the array is
>> >  in host-kernel-byte-order"
>> >  (4) PPC CPUs in BE mode and ARM CPUs in BE mode are not
>> >  the same, because in the ARM case it is doing an
>> >  internal-to-CPU byteswap, and in the PPC case it is not
>>
>> That is one of the key disconnects. I'll go find real examples
>> in the ARM LE, ARM BE, and PPC BE Linux kernels. Just for
>> everybody's sake, here is a summary of the disconnect:
>>
>> If we have the same h/w connected to the memory bus in ARM
>> and PPC systems, and we have the following three pieces
>> of code that work with r0 holding the same register address
>> of the same device:
>>
>> 1. ARM LE word write of 0x04030201:
>> setend le
>> mov r1, #0x04030201
>> str r1, [r0]
>>
>> 2. ARM BE word write of 0x01020304:
>> setend be
>> mov r1, #0x01020304
>> str r1, [r0]
>>
>> 3. PPC BE word write of 0x01020304:
>> lis r1,0x102
>> ori r1,r1,0x304
>> stw r1,0(r0)
>>
>> I claim that h/w will see the same data on the bus lines in all
>> three cases, and h/w would act the same in all three
>> cases. Peter says that in the ARM BE and PPC BE cases h/w
>> would act differently.
>>
>> If anyone else can offer an opinion on that while I am looking
>> for real examples, that would be great.
>>
>
> I really don't think listing all these examples helps.

I think Peter is wrong in his understanding of how real
BE PPC kernel drivers work with h/w-mapped devices. Going
with such a misunderstanding to suggest how the emulated
mmio case should hold info is quite strange.

> You need to focus
> on the key points that Peter listed in his previous mail.
>
> I tried in our chat to ask you these questions:
>
> vcpu_data_host_to_guest() is handling a read from an emulated device.
> All the info you have is:
> (1) len of memory access
> (2) mmio.data pointer
> (3) destination register
> (4) host CPU endianness
> (5) guest CPU endianness
>
> Based on this information alone, you need to decide whether you do a
> byteswap or not before loading the hardware register upon returning to
> the guest.
>
> You will find it impossible to answer, because you don't know the layout
> of mmio.data, and that is the thing we are trying to solve.

Actually I am not arguing with the above. I agree that
the meaning of mmio.data should be better clarified.

I propose my clarification: an array of bytes at
address phys_addr on a BE-8, byte-invariant memory bus.
That unambiguously describes the data bus signals in the
case of a BE-8 memory bus. Please look at

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi

Re: KVM and variable-endianness guest CPUs

2014-01-23 Thread Victor Kamensky
On 23 January 2014 08:25, Victor Kamensky  wrote:
> On 23 January 2014 07:33, Peter Maydell  wrote:
>> On 23 January 2014 15:06, Victor Kamensky  wrote:
>>> In [1] I wrote
>>>
>>> "I don't see why you are so attached to the desire to describe
>>> the data part of a memory transaction as just one of the int
>>> types. If we are talking about a bunch of hypothetical
>>> cases, imagine a bus that allows transactions with a
>>> size of 6 bytes. How do you describe such data in
>>> your ints speak? What endianness can you assign to a
>>> sequence of 6 bytes? Note that describing such a
>>> transaction as a set of 6 byte values at address
>>> $whatever makes perfect sense."
>>>
>>> But notice that in your next reply [2] you just dropped it
>>
>> Yes. This is because it was one of the places where
>> I would have just had to repeat "no, I'm afraid you're wrong
>> about how hardware works". I think in general it's going
>> to be better if I don't try to reply point by point to this
>> email; I think you should go back and reread the emails I've
>> sent. Key points:
>>  (1) hardware is not doing anything involving arrays
>>  of bytes
>
> Array of bytes or integers is just a way to describe data lines
> on the bus. Did you look at this document?
>
> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0290g/ch06s05s01.html
>
> A0, A1, ..., A7 byte values are the same for both LE and BE-8
> case (first two columns in the table) and they unambiguously
> describe data bus signals
>
>>  (2) the API between kernel and userspace needs to define
>>  the semantics of mmio.data, ie how to map between
>>  "x byte wide transaction with value v" and the array,
>>  and that is primarily what this conversation is about
>>  (3) the only choice which is both (a) sensible and (b)
>>  not breaking existing usage is to say "the array is
>>  in host-kernel-byte-order"
>>  (4) PPC CPUs in BE mode and ARM CPUs in BE mode are not
>>  the same, because in the ARM case it is doing an
>>  internal-to-CPU byteswap, and in the PPC case it is not
>
> That is one of the key disconnects. I'll go find real examples
> in the ARM LE, ARM BE, and PPC BE Linux kernels. Just for
> everybody's sake, here is a summary of the disconnect:
>
> If we have the same h/w connected to the memory bus in ARM
> and PPC systems, and we have the following three pieces
> of code that work with r0 holding the same register address
> of the same device:
>
> 1. ARM LE word write of 0x04030201:
> setend le
> mov r1, #0x04030201
> str r1, [r0]
>
> 2. ARM BE word write of 0x01020304:
> setend be
> mov r1, #0x01020304
> str r1, [r0]
>
> 3. PPC BE word write of 0x01020304:
> lis r1,0x102
> ori r1,r1,0x304
> stw r1,0(r0)
>
> I claim that h/w will see the same data on the bus lines in all
> three cases, and h/w would act the same in all three
> cases. Peter says that in the ARM BE and PPC BE cases h/w
> would act differently.
>
> If anyone else can offer an opinion on that while I am looking
> for real examples, that would be great.

Here is my example:

Let's look at the isp1760 USB host controller (effectively
the one used by TC2). The source code is in the
drivers/usb/host/isp1760-hcd.c file. The driver can
be enabled in the kernel with the CONFIG_USB_ISP1760_HCD=y
config option. I enabled it in the ppc image build; arm TC2 already
has it.

The isp1760 USB host controller's registers are
in LE format. The isp1760 driver uses the
reg_write32 function to write memory-mapped controller
registers. That in turn calls writel, which is an LE
device word write:

void reg_write32(void __iomem *base, u32 reg, u32 val)
{
writel(val, base + reg);
}

In C terms, writel is an LE word-write function. It is
effectively a memory barrier, a cpu_to_le32, and a write.
cpu_to_le32 will do a byteswap only in the BE case.
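
A sketch of those semantics (not the actual kernel implementation;
the my_ names are placeholders):

#include <stdint.h>

/* cpu_to_le32 is the identity on an LE CPU and a byteswap on a BE
 * CPU, so writel always presents the value's LSB on the lowest byte
 * lane of the device register, whatever the core's endianness. */
static inline uint32_t my_cpu_to_le32(uint32_t v)
{
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
    return __builtin_bswap32(v);  /* BE host: swap into LE wire format */
#else
    return v;                     /* LE host: already in LE format */
#endif
}

static inline void my_writel(uint32_t val, volatile uint32_t *addr)
{
    /* the memory barrier is omitted in this sketch */
    *addr = my_cpu_to_le32(val);
}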

LE ARM
--

2e04 <reg_write32>:
2e04:   e92d4070    push    {r4, r5, r6, lr}
2e08:   e1a04000    mov     r4, r0
2e0c:   e1a05001    mov     r5, r1
2e10:   e1a06002    mov     r6, r2
2e14:   f57ff04e    dsb     st
2e18:   e59f3018    ldr     r3, [pc, #24]   ; 2e38
2e1c:   e5933018    ldr     r3, [r3, #24]
2e20:   e3530000    cmp     r3, #0
2e24:   0a000000    beq     2e2c
2e28:   e12fff33    blx     r3
2e2c:   e0841005    add     r1, r4, r5
2e30:   e5816000    str     r6, [r1]    @ <---
2e34:   e8bd8070    pop     {r4, r5, r6, pc}
2e38:   .word   0x

Operates in LE: it just writes the value to the device register memory location.

BE ARM
--

0590 <reg_write32>:
 590:   e92d4070    push    {r4, r5, r6, lr}
 594:   e1a04000    mov     r4, r0
 598:   e1a05001    mov     r5, r1
 59c:   e1a06002    mov     r6, r2
 5a0:   f57ff04e    dsb     st
 5a4:   e59f301c    ldr     r3, [pc, #28]   ; 5c8
 5a8:   e5933018    ldr     r3, [r3, #24]
 5ac:   e3530000    cmp     r3, #0

Re: [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-23 Thread Christoffer Dall
On Wed, Jan 22, 2014 at 02:27:29PM +0530, Anup Patel wrote:

[...]

> 
> Thanks for the info on the QEMU-side handling of MMIO data.
> 
> I was not aware that we would only have "target endian = LE"
> for ARM/ARM64 in QEMU. I think Marc Z had mentioned a similar
> thing about MMIO in our previous discussions on his patches.
> (Please refer to http://www.spinics.net/lists/arm-kernel/msg283313.html)
> 
> This clearly means MMIO data passed to user space (QEMU) has
> to be of host endianness so that QEMU can take care of the bus->device
> endian map.

Hmmm, I'm not sure what you mean exactly, but the fact remains that we
simply need to decide on a layout of mmio.data that (1) doesn't break
existing userspace and (2) is clearly defined for mixed-mmio use cases.

> 
> The current vcpu_data_guest_to_host() and vcpu_data_host_to_guest()
> do not perform endianness conversion of MMIO data to LE when
> we are running an LE guest on a BE host, so we do need Victor's patch
> fixing vcpu_data_guest_to_host() and vcpu_data_host_to_guest().
> (Already reported a long time back by me:
> http://www.spinics.net/lists/arm-kernel/msg283308.html)
> 

The problem is that we cannot decide how the patch should look
before the endianness of mmio.data is decided.

Alex, Peter, and I agree that it should be that of the host endianness
and represent what the architecture in question would put on the memory
bus.  In the case of ARM, it's the register value when the VCPU E-bit is
clear, and it's the byteswapped register value when the VCPU E-bit is
set.

Therefore, the patch needs to do an unconditional byteswap when the VCPU
E-bit is set, instead of the beXX_to_cpu and cpu_to_beXX.
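
A sketch of that rule (illustrative only, not the actual arch/arm/kvm
code; vcpu_is_be is an assumed helper, not a real API):

#include <stdint.h>

static uint64_t swap_bytes(uint64_t v, unsigned len)
{
    switch (len) {
    case 2: return __builtin_bswap16((uint16_t)v);
    case 4: return __builtin_bswap32((uint32_t)v);
    case 8: return __builtin_bswap64(v);
    default: return v;  /* a single byte has nothing to swap */
    }
}

/* Swap unconditionally when the VCPU E-bit is set -- no host-endian
 * dependent beXX_to_cpu/cpu_to_beXX forms involved. */
static uint64_t data_host_to_guest(uint64_t data, unsigned len, int vcpu_is_be)
{
    return vcpu_is_be ? swap_bytes(data, len) : data;
}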

I'm sending out a patch to clarify the KVM API so we can move on.

-Christoffer


Re: KVM and variable-endianness guest CPUs

2014-01-23 Thread Christoffer Dall
On Thu, Jan 23, 2014 at 08:25:35AM -0800, Victor Kamensky wrote:
> On 23 January 2014 07:33, Peter Maydell  wrote:
> > On 23 January 2014 15:06, Victor Kamensky  
> > wrote:
> >> In [1] I wrote
> >>
> >> "I don't see why you are so attached to the desire to describe
> >> the data part of a memory transaction as just one of the int
> >> types. If we are talking about a bunch of hypothetical
> >> cases, imagine a bus that allows transactions with a
> >> size of 6 bytes. How do you describe such data in
> >> your ints speak? What endianness can you assign to a
> >> sequence of 6 bytes? Note that describing such a
> >> transaction as a set of 6 byte values at address
> >> $whatever makes perfect sense."
> >>
> >> But notice that in your next reply [2] you just dropped it
> >
> > Yes. This is because it was one of the places where
> > I would have just had to repeat "no, I'm afraid you're wrong
> > about how hardware works". I think in general it's going
> > to be better if I don't try to reply point by point to this
> > email; I think you should go back and reread the emails I've
> > sent. Key points:
> >  (1) hardware is not doing anything involving arrays
> >  of bytes
> 
> Array of bytes or integers is just a way to describe data lines
> on the bus. Did you look at this document?
> 
> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0290g/ch06s05s01.html
> 
> A0, A1, ..., A7 byte values are the same for both LE and BE-8
> case (first two columns in the table) and they unambiguously
> describe data bus signals
> 

The point is simple, and Peter has made it over and over:
Any consumer of a memory operation sees "value, len, address".

This is what KVM_EXIT_MMIO emulates.  So just by knowing the ABI
definition and having a pointer to the structure you need to be able to
tell me "value, len, address".

> >  (2) the API between kernel and userspace needs to define
> >  the semantics of mmio.data, ie how to map between
> >  "x byte wide transaction with value v" and the array,
> >  and that is primarily what this conversation is about
> >  (3) the only choice which is both (a) sensible and (b)
> >  not breaking existing usage is to say "the array is
> >  in host-kernel-byte-order"
> >  (4) PPC CPUs in BE mode and ARM CPUs in BE mode are not
> >  the same, because in the ARM case it is doing an
> >  internal-to-CPU byteswap, and in the PPC case it is not
> 
> That is one of the key disconnects. I'll go find real examples
> in the ARM LE, ARM BE, and PPC BE Linux kernels. Just for
> everybody's sake, here is a summary of the disconnect:
> 
> If we have the same h/w connected to the memory bus in ARM
> and PPC systems, and we have the following three pieces
> of code that work with r0 holding the same register address
> of the same device:
> 
> 1. ARM LE word write of 0x04030201:
> setend le
> mov r1, #0x04030201
> str r1, [r0]
> 
> 2. ARM BE word write of 0x01020304:
> setend be
> mov r1, #0x01020304
> str r1, [r0]
> 
> 3. PPC BE word write of 0x01020304:
> lis r1,0x102
> ori r1,r1,0x304
> stw r1,0(r0)
> 
> I claim that h/w will see the same data on the bus lines in all
> three cases, and h/w would act the same in all three
> cases. Peter says that in the ARM BE and PPC BE cases h/w
> would act differently.
> 
> If anyone else can offer an opinion on that while I am looking
> for real examples, that would be great.
> 

I really don't think listing all these examples helps.  You need to focus
on the key points that Peter listed in his previous mail.

I tried in our chat to ask you these questions:

vcpu_data_host_to_guest() is handling a read from an emulated device.
All the info you have is:
(1) len of memory access
(2) mmio.data pointer
(3) destination register
(4) host CPU endianness
(5) guest CPU endianness

Based on this information alone, you need to decide whether you do a
byteswap or not before loading the hardware register upon returning to
the guest.

You will find it impossible to answer, because you don't know the layout
of mmio.data, and that is the thing we are trying to solve.

If you cannot reply to this point in less than 50 lines, or mention
anything about devices being LE or BE, or come up with examples, I am
probably not going to read your reply, sorry.

-Christoffer


Re: KVM and variable-endianness guest CPUs

2014-01-23 Thread Victor Kamensky
On 23 January 2014 07:33, Peter Maydell  wrote:
> On 23 January 2014 15:06, Victor Kamensky  wrote:
>> In [1] I wrote
>>
>> "I don't see why you are so attached to the desire to describe
>> the data part of a memory transaction as just one of the int
>> types. If we are talking about a bunch of hypothetical
>> cases, imagine a bus that allows transactions with a
>> size of 6 bytes. How do you describe such data in
>> your ints speak? What endianness can you assign to a
>> sequence of 6 bytes? Note that describing such a
>> transaction as a set of 6 byte values at address
>> $whatever makes perfect sense."
>>
>> But notice that in your next reply [2] you just dropped it
>
> Yes. This is because it was one of the places where
> I would have just had to repeat "no, I'm afraid you're wrong
> about how hardware works". I think in general it's going
> to be better if I don't try to reply point by point to this
> email; I think you should go back and reread the emails I've
> sent. Key points:
>  (1) hardware is not doing anything involving arrays
>  of bytes

Array of bytes or integers is just a way to describe data lines
on the bus. Did you look at this document?

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0290g/ch06s05s01.html

A0, A1, ..., A7 byte values are the same for both LE and BE-8
case (first two columns in the table) and they unambiguously
describe data bus signals

>  (2) the API between kernel and userspace needs to define
>  the semantics of mmio.data, ie how to map between
>  "x byte wide transaction with value v" and the array,
>  and that is primarily what this conversation is about
>  (3) the only choice which is both (a) sensible and (b)
>  not breaking existing usage is to say "the array is
>  in host-kernel-byte-order"
>  (4) PPC CPUs in BE mode and ARM CPUs in BE mode are not
>  the same, because in the ARM case it is doing an
>  internal-to-CPU byteswap, and in the PPC case it is not

That is one of the key disconnects. I'll go find real examples
in the ARM LE, ARM BE, and PPC BE Linux kernels. Just for
everybody's sake, here is a summary of the disconnect:

If we have the same h/w connected to the memory bus in ARM
and PPC systems, and we have the following three pieces
of code, each with r0 holding the address of the same
register of the same device:

1. ARM LE word write of 0x04030201:
setend le
mov r1, #0x04030201
str r1, [r0]

2. ARM BE word write of 0x01020304:
setend be
mov r1, #0x01020304
str r1, [r0]

3. PPC BE word write of 0x01020304:
lis r1,0x102
ori r1,r1,0x304
stw r1,0(r0)

I claim that h/w will see the same data on the bus lines in all
three cases, and h/w would act the same in all three
cases. Peter says that in the ARM BE and PPC BE cases h/w
would act differently.

If anyone else can offer an opinion on that while I am looking
for real examples, that would be great.

Thanks,
Victor

> thanks
> -- PMM


Re: KVM and variable-endianness guest CPUs

2014-01-23 Thread Peter Maydell
On 23 January 2014 15:06, Victor Kamensky  wrote:
> In [1] I wrote
>
> "I don't see why you so attached to desire to describe
> data part of memory transaction as just one of int
> types. If we are talking about bunch of hypothetical
> cases imagine such bus that allow transaction with
> size of 6 bytes. How do you describe such data in
> your ints speak? What endianity you can assign to
> sequence of 6 bytes? While note that description of
> such transaction as set of 6 byte values at address
> $whatever makes perfect sense."
>
> But notice that in your next reply [2] you just dropped it

Yes. This is because it was one of the places where
I would have just had to repeat "no, I'm afraid you're wrong
about how hardware works". I think in general it's going
to be better if I don't try to reply point by point to this
email; I think you should go back and reread the emails I've
sent. Key points:
 (1) hardware is not doing anything involving arrays
 of bytes
 (2) the API between kernel and userspace needs to define
 the semantics of mmio.data, ie how to map between
 "x byte wide transaction with value v" and the array,
 and that is primarily what this conversation is about
 (3) the only choice which is both (a) sensible and (b)
 not breaking existing usage is to say "the array is
 in host-kernel-byte-order"
 (4) PPC CPUs in BE mode and ARM CPUs in BE mode are not
 the same, because in the ARM case it is doing an
 internal-to-CPU byteswap, and in the PPC case it is not
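
As a C sketch of what (3) means in practice (illustrative only, not
the actual kernel code): for a 32 bit transaction with value v, both
sides fill and parse the array with a plain native-order access:

#include <stdint.h>
#include <string.h>

/* "host kernel byte order": a native store of the sized value.
 * On an LE host data[0] holds the LSB; on a BE host, the MSB. */
static void fill_mmio_data32(uint8_t data[8], uint32_t v)
{
    memcpy(data, &v, sizeof(v));
}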

thanks
-- PMM


Re: KVM and variable-endianness guest CPUs

2014-01-23 Thread Victor Kamensky
On 23 January 2014 02:23, Peter Maydell  wrote:
> On 23 January 2014 00:22, Victor Kamensky  wrote:
>> Peter, could I please ask you a favor. Could you please
>> stop deleting pieces of your and my previous responses
>> when you reply.
>
> No, sorry. It produces excessively long and totally unreadable
> emails for everybody else if people don't trim for context.
> This is standard mailing list practice.

Usually it is OK, but with your choices you sometimes remove
my questions without answering them, where I think they
are essential to the discussion. For example, I asked about 'len'
values that are not a power of 2:

In [1] I wrote

"I don't see why you so attached to desire to describe
data part of memory transaction as just one of int
types. If we are talking about bunch of hypothetical
cases imagine such bus that allow transaction with
size of 6 bytes. How do you describe such data in
your ints speak? What endianity you can assign to
sequence of 6 bytes? While note that description of
such transaction as set of 6 byte values at address
$whatever makes perfect sense."

But notice that in your next reply [2] you just dropped it

A similar situation happens with this reply: you removed a
piece that I had to bring back. Please see below.

>>>> Consider the above big endian case (setend be) example,
>>>> but now running on a BE KVM host. 0x4 is the LSB of the CPU
>>>> core register in this case.
>>>
>>> Yes. In this case if we are using the "mmio.data is host
>>> kernel endianness" definition then mmio.data[0] should be
>>> 0x01 (the MSB of the 32 bit data value).
>>
>> If mmio.data[0] is 0x1, then mmio.data[] = {0x1, 0x2, 0x3, 0x4},
>> with the KVM host and emulator now running in BE mode.
>> But that contradicts what you said before.
>
> Sorry, I misread the example here (and assumed we were
> writing the same word in both cases, when actually the BE
> code example is writing a different value). mmio.data[0] should
> be 0x4, because:
>  * BE ARM guest, so KVM must byte-swap the register value
> (giving 0x04030201)
>  * BE host, so it writes the uint32_t in host order (giving
>    0x4 in mmio.data[0])
>
>>>> I believe, but I need to check, that the PPC BE setup actually
>>>> acts like the second case in the above example. If we have a PPC
>>>> BE guest executing the following instructions:
>>>>
>>>> lis r1,0x102
>>>> ori r1,r1,0x304
>>>> stw r1,0(r0)
>>>>
>>>> after the first two instructions r1 would contain 0x01020304.
>>>> IMHO it exactly corresponds to my second ARM case above -
>>>> a BE guest running under an ARM BE KVM host. I believe
>>>> that mmio.data[] in the PPC BE case would be {0x1, 0x2, 0x3, 0x4}.
>>>
>>> Yes, assuming a BE PPC host kernel (which is the usual
>>> arrangement).
>>
>> OK, that confirms my understanding of how PPC mmio
>> should work.
>>
>>>> But according to you data[0] must be 0x4 in the BE host case
>>>
>>> Er, no. The data here is 0x01020304, so for a BE host
>>> data[0] is the big end, ie 0x1. It would only be 0x4 if
>>> mmio.data[] were LE always (or if you were running
>>> your BE PPC guest on an LE PPC host, which I don't
>>> think is supported currently).
>>
>> So do you agree that for all three code snippets cited in this
>> email, we will always have mmio.data[] = {0x1, 0x2,
>> 0x3, 0x4}: for ARM LE qemu/host, for ARM BE qemu/host,
>> and for the ppc code snippet in PPC BE qemu/host?
>
> No. Also your ARM and PPC examples are not usefully
> comparable, because:
>
>> setend le
>> mov r1, #0x04030201
>> str r1, [r0]
>
> This is an LE guest writing 0x04030201, and that is the
> value that will go out on the bus.
>
>> and
>>
>> setend be
>> mov r1, #0x01020304
>> str r1, [r0]
>
> This is a BE guest writing 0x01020304, as far as the
> code running on the CPU is concerned; the value on the
> bus will be byteswapped.
>
>> lis r1,0x102
>> ori r1,r1,0x304
>> stw r1,0(r0)
>
> This is also a BE guest writing 0x01020304. I'm pretty
> sure that the PPC approach is that for BE guests writing
> a word that word goes out to the bus as is; for LE guests
> (or if the page table is set up to say "this page is LE") the
> CPU swaps it before putting it on the bus. In this regard
> it is the opposite way round to ARM.
>
> So the value you start with in the CPU register is not
> the same in all three cases, and what the hardware
> does is not the same either.

So in what way does the h/w act differently? I think we agreed
before that in the ARM cases it is the same memory
transaction, so the h/w cannot do anything different. And the ARM
'setend be' case matches the PPC BE case: in both cases a
BE write happens to the same h/w address with value
0x01020304, so why would the h/w see it differently? It is
the same write.

I think you are missing that in all discussed cases BE-8,
byte-invariant CPU memory buses are used. Here is what I wrote
in reply to Alex; it is worth copying it here:

 start quote from my response to Alex -
I disagree with Peter's point of view as you saw from our
long thread :). I strongly believe that current mmio.data[]
descri

Re: [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-23 Thread Greg Kurz
On Wed, 22 Jan 2014 20:25:05 -0800
Victor Kamensky  wrote:

> Hi Alex,
> 
> Sorry, for delayed reply, I was focusing on discussion
> with Peter. Hope you and other folks may get something
> out of it :).
> 
> Please see responses inline
> 
> On 22 January 2014 02:52, Alexander Graf  wrote:
> >
> > On 22.01.2014, at 08:26, Victor Kamensky 
> > wrote:
> >
> >> On 21 January 2014 22:41, Alexander Graf  wrote:
> >>>
> >>>
> >>> "Native endian" really is just a shortcut for "target endian"
> >>> which is LE for ARM and BE for PPC. There shouldn't be
> >>> a qemu-system-armeb or qemu-system-ppc64le.
> >>
> >> I disagree. Fully functional ARM BE system is what we've
> >> been working on for last few months. 'We' is Linaro
> >> Networking Group, Endian subteam and some other guys
>> in ARM and across the community. Why we do that is a bit
>> beyond this discussion.
> >>
> >> ARM BE patches for both V7 and V8 are already in mainline
> >> kernel. But ARM BE KVM host is broken now. It is known
> >> deficiency that I am trying to fix. Please look at [1]. Patches
> >> for V7 BE KVM were proposed and currently under active
> >> discussion. Currently I work on ARM V8 BE KVM changes.
> >>
> >> So "native endian" in ARM is value of CPSR register E bit.
> >> If it is off native endian is LE, if it is on it is BE.
> >>
> >> Once and if we agree on ARM BE KVM host changes, the
> >> next step would be patches in qemu one of which introduces
> >> qemu-system-armeb. Please see [2].
> >
> > I think we're facing an ideology conflict here. Yes, there
> > should be a qemu-system-arm that is BE capable.
> 
> Maybe it is not an ideology conflict but rather a terminology
> clarity issue :). I am not sure what you mean by "qemu-system-arm
> that is BE capable". In qemu build system there is just target
> name 'arm', which is ARM V7 cpu in LE mode, and 'armeb'
> target which is ARM V7 cpu in BE mode. That is true for a lot
> of open source packages. You could check [1] patch that
> introduces armeb target into qemu. Build for
> arm target produces qemu-system-arm executable that is
> marked 'ELF 32-bit LSB executable' and it could run on LE
> traditional ARM Linux. Build for armeb target produces
> qemu-system-armeb executable that is marked 'ELF 32-bit
> MSB executable' that can run on BE ARM Linux. armeb is
> nothing special here, just build option for qemu that should run
> on BE ARM Linux.
> 

Hmmm... it looks like there is some confusion about the qemu command naming.
The -target suffix in qemu-system-target has nothing to do with the ELF
information of the command itself.

[greg@bahia ~]$ file `which qemu-system-arm`
/bin/qemu-system-arm: ELF 64-bit LSB shared object, x86-64, version 1
(SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32,
BuildID[sha1]=0xbcb974847daa8159c17ed74906cd5351387d4097, stripped

It is valid to create a new target if it is substantially different from
existing ones (ppc64 versus ppc for example). This is not the case with ARM
since it is the very same CPU that can switch endianness with the 'setend'
instruction (which needs to be emulated anyway when running in TCG mode).

qemu-system-arm is THE command that should be able to emulate an ARM cpu,
whether the guest does 'setend le' or 'setend be'.

> Both qemu-system-arm and qemu-system-armeb should
> be BE/LE capable. I.e either of them along with KVM could
> either run LE or BE guest. MarcZ demonstrated that this
> is possible. I've tested both LE and BE guests with
> qemu-system-arm running on traditional LE ARM Linux,
> effectively repeating Marc's setup but with qemu.
> And I did test with my patches both BE and LE guests with
> qemu-system-armeb running on BE ARM Linux.
> 
> > There
> > should also be a qemu-system-ppc64 that is LE capable.
> > But there is no point in changing the "default endiannes"
> > for the virtual CPUs that we plug in there. Both CPUs are
> > perfectly capable of running in LE or BE mode, the
> > question is just what we declare the "default".
> 
> I am not sure what you mean by "default"? Is it the initial
> setting of the CPSR E bit and the 'cp15 c1, c0, 0' EE bit? Yes,
> the way it is currently implemented by the committed
> qemu-system-arm and the proposed qemu-system-armeb
> patches, they are both off. I.e. even qemu-system-armeb
> starts running the vcpu in LE mode, for a very similar
> reason as described in your next paragraph:
> qemu-system-armeb has a tiny bootloader that starts
> in LE mode and jumps to the kernel; the kernel switches the
> cpu to run in BE mode ('setend be'), and the EE bit is set
> just before the mmu is enabled.
> 
> > Think about the PPC bootstrap. We start off with a
> > BE firmware, then boot into the Linux kernel which
> > calls a hypercall to set the LE bit on every interrupt.
> 
> We have very similar situation with BE ARM Linux.
> When we run ARM BE Linux we start with bootloader
> which is LE and then CPU issues 'setend be' very
> soon as it starts executing kernel code, all secondary
> CPUs issue 'setend be' when they go out 

Re: [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-23 Thread Alexander Graf

On 23.01.2014, at 05:25, Victor Kamensky  wrote:

> Hi Alex,
> 
> Sorry, for delayed reply, I was focusing on discussion
> with Peter. Hope you and other folks may get something
> out of it :).
> 
> Please see responses inline
> 
> On 22 January 2014 02:52, Alexander Graf  wrote:
>> 
>> On 22.01.2014, at 08:26, Victor Kamensky  wrote:
>> 
>>> On 21 January 2014 22:41, Alexander Graf  wrote:
>>>>
>>>>
>>>> "Native endian" really is just a shortcut for "target endian"
>>>> which is LE for ARM and BE for PPC. There shouldn't be
>>>> a qemu-system-armeb or qemu-system-ppc64le.
>>> 
>>> I disagree. Fully functional ARM BE system is what we've
>>> been working on for last few months. 'We' is Linaro
>>> Networking Group, Endian subteam and some other guys
>>> in ARM and across community. Why we do that is a bit
>>> beyond of this discussion.
>>> 
>>> ARM BE patches for both V7 and V8 are already in mainline
>>> kernel. But ARM BE KVM host is broken now. It is known
>>> deficiency that I am trying to fix. Please look at [1]. Patches
>>> for V7 BE KVM were proposed and currently under active
>>> discussion. Currently I work on ARM V8 BE KVM changes.
>>> 
>>> So "native endian" in ARM is value of CPSR register E bit.
>>> If it is off native endian is LE, if it is on it is BE.
>>> 
>>> Once and if we agree on ARM BE KVM host changes, the
>>> next step would be patches in qemu one of which introduces
>>> qemu-system-armeb. Please see [2].
>> 
>> I think we're facing an ideology conflict here. Yes, there
>> should be a qemu-system-arm that is BE capable.
> 
> Maybe it is not an ideology conflict but rather a terminology
> clarity issue :). I am not sure what you mean by "qemu-system-arm
> that is BE capable". In qemu build system there is just target
> name 'arm', which is ARM V7 cpu in LE mode, and 'armeb'
> target which is ARM V7 cpu in BE mode. That is true for a lot
> of open source packages. You could check [1] patch that
> introduces armeb target into qemu. Build for
> arm target produces qemu-system-arm executable that is
> marked 'ELF 32-bit LSB executable' and it could run on LE
> traditional ARM Linux. Build for armeb target produces
> qemu-system-armeb executable that is marked 'ELF 32-bit
> MSB executable' that can run on BE ARM Linux. armeb is
> nothing special here, just build option for qemu that should run
> on BE ARM Linux.

But why should it be called armeb then? What actual difference does the model
have compared to the qemu-system-arm model?

> 
> Both qemu-system-arm and qemu-system-armeb should
> be BE/LE capable. I.e either of them along with KVM could
> either run LE or BE guest. MarcZ demonstrated that this
> is possible. I've tested both LE and BE guests with
> qemu-system-arm running on traditional LE ARM Linux,
> effectively repeating Marc's setup but with qemu.
> And I did test with my patches both BE and LE guests with
> qemu-system-armeb running on BE ARM Linux.
> 
>> There
>> should also be a qemu-system-ppc64 that is LE capable.
>> But there is no point in changing the "default endiannes"
>> for the virtual CPUs that we plug in there. Both CPUs are
>> perfectly capable of running in LE or BE mode, the
>> question is just what we declare the "default".
> 
> I am not sure what you mean by "default"? Is it the initial
> setting of the CPSR E bit and the 'cp15 c1, c0, 0' EE bit? Yes,
> the way it is currently implemented by the committed
> qemu-system-arm and the proposed qemu-system-armeb
> patches, they are both off. I.e. even qemu-system-armeb
> starts running the vcpu in LE mode, for a very similar
> reason as described in your next paragraph:
> qemu-system-armeb has a tiny bootloader that starts
> in LE mode and jumps to the kernel; the kernel switches the
> cpu to run in BE mode ('setend be'), and the EE bit is set
> just before the mmu is enabled.

You're proving my point even more. If both targets are LE/BE capable and both
targets start execution in LE mode, then why do we need a qemu-system-armeb at
all? Just use qemu-system-arm.
> 

>> Think about the PPC bootstrap. We start off with a
>> BE firmware, then boot into the Linux kernel which
>> calls a hypercall to set the LE bit on every interrupt.
> 
> We have a very similar situation with BE ARM Linux.
> When we run ARM BE Linux we start with a bootloader
> which is LE, and then the CPU issues 'setend be' very
> soon after it starts executing kernel code; all secondary
> CPUs issue 'setend be' when they come out of the reset pen
> or bootmonitor sleep.
> 
>> But there's no reason this little endian kernel
>> couldn't theoretically have big endian user space running
>> with access to emulated device registers.
> 
> I don't want to go there, it is very very messy ...
> 
> -- Just a side note: --
> Interestingly, half a year before I joined Linaro in Cisco I and
> my colleague implemented kernel patch that allowed to run
> BE user-space processes as sort of separate personality on
> top of LE ARM kernel ... treated kind of multi-abi system.
> Effectively we had to do byteswaps on all no

Re: KVM and variable-endianness guest CPUs

2014-01-23 Thread Peter Maydell
On 23 January 2014 00:22, Victor Kamensky  wrote:
> Peter, could I please ask you a favor. Could you please
> stop deleting pieces of your and my previous responses
> when you reply.

No, sorry. It produces excessively long and totally unreadable
emails for everybody else if people don't trim for context.
This is standard mailing list practice.

>>> Consider the above big endian case (setend be) example,
>>> but now running on a BE KVM host. 0x4 is the LSB of the CPU
>>> core register in this case.
>>
>> Yes. In this case if we are using the "mmio.data is host
>> kernel endianness" definition then mmio.data[0] should be
>> 0x01 (the MSB of the 32 bit data value).
>
> If mmio.data[0] is 0x1, then mmio.data[] = {0x1, 0x2, 0x3, 0x4},
> with the KVM host and emulator now running in BE mode.
> But that contradicts what you said before.

Sorry, I misread the example here (and assumed we were
writing the same word in both cases, when actually the BE
code example is writing a different value). mmio.data[0] should
be 0x4, because:
 * BE ARM guest, so KVM must byte-swap the register value
(giving 0x04030201)
 * BE host, so it writes the uint32_t in host order (giving
   0x4 in mmio.data[0])

>>> I believe, but I need to check, that the PPC BE setup actually
>>> acts like the second case in the above example. If we have a PPC
>>> BE guest executing the following instructions:
>>>
>>> lis r1,0x102
>>> ori r1,r1,0x304
>>> stw r1,0(r0)
>>>
>>> after the first two instructions r1 would contain 0x01020304.
>>> IMHO it exactly corresponds to my second ARM case above -
>>> a BE guest running under an ARM BE KVM host. I believe
>>> that mmio.data[] in the PPC BE case would be {0x1, 0x2, 0x3, 0x4}.
>>
>> Yes, assuming a BE PPC host kernel (which is the usual
>> arrangement).
>
> OK, that confirms my understanding of how PPC mmio
> should work.
>
> But according to you data[0] must be 0x4 in the BE host case
>>
>> Er, no. The data here is 0x01020304, so for a BE host
>> data[0] is the big end, ie 0x1. It would only be 0x4 if
>> mmio.data[] were LE always (or if you were running
>> your BE PPC guest on an LE PPC host, which I don't
>> think is supported currently).
>
> So do you agree that for all three code snippets cited in this
> email, we will always have mmio.data[] = {0x1, 0x2,
> 0x3, 0x4}: for ARM LE qemu/host, for ARM BE qemu/host,
> and for the ppc code snippet in PPC BE qemu/host?

No. Also your ARM and PPC examples are not usefully
comparable, because:

> setend le
> mov r1, #0x04030201
> str r1, [r0]

This is an LE guest writing 0x04030201, and that is the
value that will go out on the bus.

> and
>
> setend be
> mov r1, #0x01020304
> str r1, [r0]

This is a BE guest writing 0x01020304, as far as the
code running on the CPU is concerned; the value on the
bus will be byteswapped.

> lis r1,0x102
> ori r1,r1,0x304
> stw r1,0(r0)

This is also a BE guest writing 0x01020304. I'm pretty
sure that the PPC approach is that for BE guests writing
a word that word goes out to the bus as is; for LE guests
(or if the page table is set up to say "this page is LE") the
CPU swaps it before putting it on the bus. In this regard
it is the opposite way round to ARM.

So the value you start with in the CPU register is not
the same in all three cases, and what the hardware
does is not the same either.
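
(The same statement as a compilable C sketch, with
__builtin_bswap32 standing in for the hardware byte-lane swap:)

#include <stdint.h>

int main(void)
{
    uint32_t arm_le = 0x04030201u;                    /* out as-is    */
    uint32_t arm_be = __builtin_bswap32(0x01020304u); /* lane-swapped */
    uint32_t ppc_be = 0x01020304u;                    /* out as-is    */
    /* arm_le == arm_be == 0x04030201, but ppc_be == 0x01020304:
     * the ARM BE and PPC BE stores are different bus transactions. */
    return (arm_le == arm_be && arm_be != ppc_be) ? 0 : 1;
}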

thanks
-- PMM


Re: [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-22 Thread Victor Kamensky
Hi Alex,

Sorry for the delayed reply; I was focusing on the discussion
with Peter. Hope you and other folks may get something
out of it :).

Please see responses inline

On 22 January 2014 02:52, Alexander Graf  wrote:
>
> On 22.01.2014, at 08:26, Victor Kamensky  wrote:
>
>> On 21 January 2014 22:41, Alexander Graf  wrote:
>>>
>>>
>>> "Native endian" really is just a shortcut for "target endian"
>>> which is LE for ARM and BE for PPC. There shouldn't be
>>> a qemu-system-armeb or qemu-system-ppc64le.
>>
>> I disagree. Fully functional ARM BE system is what we've
>> been working on for last few months. 'We' is Linaro
>> Networking Group, Endian subteam and some other guys
>> in ARM and across the community. Why we do that is a bit
>> beyond this discussion.
>>
>> ARM BE patches for both V7 and V8 are already in mainline
>> kernel. But ARM BE KVM host is broken now. It is known
>> deficiency that I am trying to fix. Please look at [1]. Patches
>> for V7 BE KVM were proposed and currently under active
>> discussion. Currently I work on ARM V8 BE KVM changes.
>>
>> So "native endian" in ARM is value of CPSR register E bit.
>> If it is off native endian is LE, if it is on it is BE.
>>
>> Once and if we agree on ARM BE KVM host changes, the
>> next step would be patches in qemu one of which introduces
>> qemu-system-armeb. Please see [2].
>
> I think we're facing an ideology conflict here. Yes, there
> should be a qemu-system-arm that is BE capable.

Maybe it is not an ideology conflict but rather a terminology
clarity issue :). I am not sure what you mean by "qemu-system-arm
that is BE capable". In the qemu build system there is just the target
name 'arm', which is an ARM V7 cpu in LE mode, and the 'armeb'
target, which is an ARM V7 cpu in BE mode. That is true for a lot
of open source packages. You could check the [1] patch that
introduces the armeb target into qemu. A build for the
arm target produces a qemu-system-arm executable that is
marked 'ELF 32-bit LSB executable' and it can run on
traditional LE ARM Linux. A build for the armeb target produces
a qemu-system-armeb executable that is marked 'ELF 32-bit
MSB executable' and can run on BE ARM Linux. armeb is
nothing special here, just a build option for qemu that should run
on BE ARM Linux.

Both qemu-system-arm and qemu-system-armeb should
be BE/LE capable. I.e. either of them, along with KVM, could
run either an LE or a BE guest. MarcZ demonstrated that this
is possible. I've tested both LE and BE guests with
qemu-system-arm running on traditional LE ARM Linux,
effectively repeating Marc's setup but with qemu.
And I did test with my patches both BE and LE guests with
qemu-system-armeb running on BE ARM Linux.

> There
> should also be a qemu-system-ppc64 that is LE capable.
> But there is no point in changing the "default endiannes"
> for the virtual CPUs that we plug in there. Both CPUs are
> perfectly capable of running in LE or BE mode, the
> question is just what we declare the "default".

I am not sure what you mean by "default"? Is it the initial
setting of the CPSR E bit and the 'cp15 c1, c0, 0' EE bit? Yes,
the way it is currently implemented by the committed
qemu-system-arm and the proposed qemu-system-armeb
patches, they are both off. I.e. even qemu-system-armeb
starts running the vcpu in LE mode, for a very similar
reason as described in your next paragraph:
qemu-system-armeb has a tiny bootloader that starts
in LE mode and jumps to the kernel; the kernel switches the
cpu to run in BE mode ('setend be'), and the EE bit is set
just before the mmu is enabled.

> Think about the PPC bootstrap. We start off with a
> BE firmware, then boot into the Linux kernel which
> calls a hypercall to set the LE bit on every interrupt.

We have a very similar situation with BE ARM Linux.
When we run ARM BE Linux we start with a bootloader
which is LE, and then the CPU issues 'setend be' very
soon after it starts executing kernel code; all secondary
CPUs issue 'setend be' when they come out of the reset pen
or bootmonitor sleep.

> But there's no reason this little endian kernel
> couldn't theoretically have big endian user space running
> with access to emulated device registers.

I don't want to go there, it is very very messy ...

-- Just a side note: --
Interestingly, half a year before I joined Linaro in Cisco I and
my colleague implemented kernel patch that allowed to run
BE user-space processes as sort of separate personality on
top of LE ARM kernel ... treated kind of multi-abi system.
Effectively we had to do byteswaps on all non-trivial
system calls and ioctls in side of the kernel. We converted
around 30 system calls and around 10 ioctls. Our target process
was just using those and it works working, but patch was
very intrusive and unnatural. I think in Linaro there was
some public version of my presentation circulated that
explained all this mess. I don't want seriously to consider it.

The only robust mixed mode, as MarcZ demonstrated,
can be done only on VM boundaries. I.e. an LE host can
run a BE guest fine, and a BE host can run an LE guest fine.
Everything else would

Re: KVM and variable-endianness guest CPUs

2014-01-22 Thread Victor Kamensky
Peter, could I please ask you a favor. Could you please
stop deleting pieces of your and my previous responses
when you reply.
Please just reply inline. Sometimes I would like to
reference my or your previous statement, but I cannot
find it in your response email. It is very bizarre. Sorry,
it will make your response emails bigger, but I am very
confused otherwise.

On 22 January 2014 15:18, Peter Maydell  wrote:
> On 22 January 2014 22:47, Victor Kamensky  wrote:
>> You deleted my example, but I need it again:
>> Consider the following ARM code snippets:
>>
>> setend le
>> mov r1, #0x04030201
>> str r1, [r0]
>>
>> and
>>
>> setend be
>> mov r1, #0x01020304
>> str r1, [r0]
>>
>> Just for the LE host case, basically you are saying that if the
>> guest issues a 4-byte store
>> instruction from a CPU core register and the CPSR E bit is off,
>> mmio.data[0] would contain the LSB of the integer from this CPU core
>> register. I don't understand your bus endianness thing, but I do
>> understand the LSB of an integer in a core CPU register. Do we agree
>> that in the above example, in the second case when BE access is
>> on (E bit is on), it is exactly the same memory transaction
>> but data[0] = 0x1 is the MSB of the integer in the CPU core register
>> (still the same LE host case)?
>
> Yes, this is true both if we define mmio.data[] as "always
> little endian" and if we define it as "host kernel endianness",
> since you've specified an LE host here.

OK, we are in agreement here. mmio.data[] = { 0x1, 0x2, 0x3, 0x4}
for both types of guest access when the host is LE. And as
far as the bus is concerned, it is absolutely the same transaction.

> (The kernel has to byte swap if CPSR.E is set, because
> it has to emulate the byte-lane-swap the CPU hardware
> does internally before register data goes out to the bus.)
>
>> Consider the above big endian case (setend be) example,
>> but now running on a BE KVM host. 0x4 is the LSB of the CPU
>> core register in this case.
>
> Yes. In this case if we are using the "mmio.data is host
> kernel endianness" definition then mmio.data[0] should be
> 0x01 (the MSB of the 32 bit data value).

If mmio.data[0] is 0x1, then mmio.data[] = {0x1, 0x2, 0x3, 0x4},
with the KVM host and emulator now running in BE mode.
But that contradicts what you said before in a previous
email (please see [1]). Here is what you and I just said a few
paragraphs before my "Consider the above big endian .."
paragraph above:

- start quote --

>> BTW could you please propose how you would see such a
>> "32 bit transaction, value 0x04030201, address $whatever"
>> on an ARM LE CPU in mmio.data?
>
> That is exactly the problem we're discussing in this thread.
> Indeed, I proposed an answer to it, which is that the mmio.data
> array should be in host kernel byte order, in which case it
> would be (for an LE host kernel) 0x01 in mmio.data[0] and so
> on up.
>
>> If it would be {0x01, 0x02, 0x03, 0x04}, that is fine with
>> me. That is the current ARM LE case when the above
>> snippets are executed by the guest.
>>
>> Would we agree that the same arrangement would be
>> true for all other cases on ARM, regardless of all the other
>> endiannesses of qemu, KVM host, guest, hypervisor, etc?
>
> No; under my proposal, for a big-endian host kernel (and
> thus big-endian QEMU) the order would be
> mmio.data[0] = 0x04, etc. (This wouldn't change based
> on the guest kernel endianness or whether it happened
> to have set the E bit temporarily.)

- end quote --

So in one case, for the same memory transaction (the ARM
setend be snippet) executed
under a BE ARM KVM host, you said that "mmio.data[0]
should be 0x01 (the MSB of the 32 bit data value)", and
before you said "No; under my proposal, for a big-endian
host kernel (and thus big-endian QEMU) the order would be
mmio.data[0] = 0x04, etc". So which one is mmio.data[0]?

I argue that for all three code snippets in this email (two for
ARM and one for PPC) mmio.data[] = {0x1, 0x2, 0x3, 0x4},
and that does not depend on whether it is an LE ARM KVM host,
a BE ARM KVM host, or a BE PPC KVM host.

> (Notice that the
> BE host kernel can actually just behave exactly like the LE
> one: byteswap 32 bit value from guest register if guest
> CPSR.E is set, then do a 32-bit store of the 32 bit word
> into mmio.data[].)
>
>>> Defining that mmio.data[] is always little-endian would
>>> be a valid definition of an API if we were doing it from
>>> scratch. It has the unfortunate property that it would
>>> completely break the existing PPC BE setups, which
>>> don't define it that way, so it is a non-starter.
>>
>> I believe, but I need to check, that the PPC BE setup actually
>> acts like the second case in the above example. If we have a PPC
>> BE guest executing the following instructions:
>>
>> lis r1,0x102
>> ori r1,r1,0x304
>> stw r1,0(r0)
>>
>> after the first two instructions r1 would contain 0x01020304.
>> IMHO it exactly corresponds to my second ARM case above -
>> a BE guest running under an ARM BE KVM host. I believe
>> 

Re: KVM and variable-endianness guest CPUs

2014-01-22 Thread Peter Maydell
On 22 January 2014 22:47, Victor Kamensky  wrote:
> You deleted my example, but I need it again:
> Consider the following ARM code snippets:
>
> setend le
> mov r1, #0x04030201
> str r1, [r0]
>
> and
>
> setend be
> mov r1, #0x01020304
> str r1, [r0]
>
> Just for the LE host case, basically you are saying that if the
> guest issues a 4-byte store
> instruction from a CPU core register and the CPSR E bit is off,
> mmio.data[0] would contain the LSB of the integer from this CPU core
> register. I don't understand your bus endianness thing, but I do
> understand the LSB of an integer in a core CPU register. Do we agree
> that in the above example, in the second case when BE access is
> on (E bit is on), it is exactly the same memory transaction
> but data[0] = 0x1 is the MSB of the integer in the CPU core register
> (still the same LE host case)?

Yes, this is true both if we define mmio.data[] as "always
little endian" and if we define it as "host kernel endianness",
since you've specified an LE host here.

(The kernel has to byte swap if CPSR.E is set, because
it has to emulate the byte-lane-swap the CPU hardware
does internally before register data goes out to the bus.)

> Consider the above big endian case (setend be) example,
> but now running on a BE KVM host. 0x4 is the LSB of the CPU
> core register in this case.

Yes. In this case if we are using the "mmio.data is host
kernel endianness" definition then mmio.data[0] should be
0x01 (the MSB of the 32 bit data value). (Notice that the
BE host kernel can actually just behave exactly like the LE
one: byteswap 32 bit value from guest register if guest
CPSR.E is set, then do a 32-bit store of the 32 bit word
into mmio.data[].)
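
(That rule as a C sketch, assuming <linux/kvm.h> and <string.h>; the
helper boolean is made up, and the code is the same on an LE and a
BE host:)

static void mmio_store32(struct kvm_run *run, uint32_t reg_val,
                         int guest_cpsr_e_set)
{
    if (guest_cpsr_e_set)                     /* guest did a BE access */
        reg_val = __builtin_bswap32(reg_val); /* emulate the lane swap */
    memcpy(run->mmio.data, &reg_val, sizeof(reg_val)); /* host order   */
}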

>> Defining that mmio.data[] is always little-endian would
>> be a valid definition of an API if we were doing it from
>> scratch. It has the unfortunate property that it would
>> completely break the existing PPC BE setups, which
>> don't define it that way, so it is a non-starter.
>
> I believe, but I need to check, that the PPC BE setup actually
> acts like the second case in the above example. If we have a PPC
> BE guest executing the following instructions:
>
> lis r1,0x102
> ori r1,r1,0x304
> stw r1,0(r0)
>
> after the first two instructions r1 would contain 0x01020304.
> IMHO it exactly corresponds to my second ARM case above -
> a BE guest running under an ARM BE KVM host. I believe
> that mmio.data[] in the PPC BE case would be {0x1, 0x2, 0x3, 0x4}.

Yes, assuming a BE PPC host kernel (which is the usual
arrangement).

> But according to you data[0] must be 0x4 in the BE host case

Er, no. The data here is 0x01020304, so for a BE host
data[0] is the big end, ie 0x1. It would only be 0x4 if
mmio.data[] were LE always (or if you were running
your BE PPC guest on an LE PPC host, which I don't
think is supported currently).

thanks
-- PMM


Re: KVM and variable-endianness guest CPUs

2014-01-22 Thread Victor Kamensky
On 22 January 2014 12:02, Peter Maydell  wrote:
> On 22 January 2014 19:29, Victor Kamensky  wrote:
>> On 22 January 2014 09:29, Peter Maydell  wrote:
>>> This just isn't how real buses work. There is no
>>> "address + 1, address + 2". There is a single address
>>> for the memory transaction and a set of data on
>>> data lines and some separate size information.
>>
>> Yes, and those data lines are just binary signal lines,
>> not numbers. If one wants to describe the information
>> on the data lines as a number, he/she needs to assign integer
>> bit numbers to the lines, and that is an absolutely arbitrary
>> process.
>
> It is part of the definition of the bus which signal pin is
> D0 and which is D31...
>
>> If one chooses one way to assign those
>> bits to lines and another chooses the reverse way, they will
>> talk about completely different numbers for the same
>> signals on the bus. Such data line enumeration has
>> no reflection on how the bus actually works. And I don't
>> even see why it should be described just as a single
>> integer; for example, one can describe the information on
>> the data lines as a set of 4 byte values, nothing wrong with
>> such a description.
>
> It is not how the hardware works. If you describe it as
> a set of 4 bytes, then you need to also say how you are
> mapping from those 4 bytes to the actual 32 bit data
> transaction the hardware is doing. Which is the question
> we're trying to answer in this thread.
>
> I've snipped a huge chunk of my initial reply to this email,
> because it all boiled down to "sorry, you're just not correct
> about how the hardware works" and it doesn't seem
> necessary to repeat it three times. Devices really do see
> "this is a transaction with this value and this size". They
> do not in any way see a 32 bit word write as "this is a collection
> of byte writes". Therefore:
>
>  1) thinking about a 32 bit word write in terms of a byte array
> is confusing
>  2) since the KVM API is unfortunately stuck with this byte
>array, we must define the semantics of what it actually
>contains, so that the kernel and QEMU can go between
>   "the value being read/written in the transaction" and
>   "the contents of the byte array
>
>>> As soon as you try to think of the mmio.data as a set
>>> of bytes then you have to specify some endianness to
>>> the data, so that both sides (kernel and userspace)
>>> know how to reconstruct the actual data value from the
>>> array of bytes.
>>
>> What actual value? In what sense? You need to bring
>> the semantics of this h/w address into the discussion to
>> really tell that.
>
> I've just spent the last two emails doing exactly that.
> The actual value, as in "this CPU just did a memory
> transaction of a 32 bit data value".

You deleted my example, but I need it again:
Consider the following ARM code snippets:

setend le
mov r1, #0x04030201
str r1, [r0]

and

setend be
mov r1, #0x01020304
str r1, [r0]

Just for the LE host case, basically you are saying that if the
guest issues a 4-byte store
instruction from a CPU core register and the CPSR E bit is off,
mmio.data[0] would contain the LSB of the integer from this CPU core
register. I don't understand your bus endianness thing, but I do
understand the LSB of an integer in a core CPU register. Do we agree
that in the above example, in the second case when BE access is
on (E bit is on), it is exactly the same memory transaction
but data[0] = 0x1 is the MSB of the integer in the CPU core register
(still the same LE host case)?

>> BTW could you please propose how you would see such a
>> "32 bit transaction, value 0x04030201, address $whatever"
>> on an ARM LE CPU in mmio.data?
>
> That is exactly the problem we're discussing in this thread.
> Indeed, I proposed an answer to it, which is that the mmio.data
> array should be in host kernel byte order, in which case it
> would be (for an LE host kernel) 0x01 in mmio.data[0] and so
> on up.
>
>> If it would be {0x01, 0x02, 0x03, 0x04}, that is fine with
>> me. That is the current ARM LE case when the above
>> snippets are executed by the guest.
>>
>> Would we agree that the same arrangement would be
>> true for all other cases on ARM, regardless of all the other
>> endiannesses of qemu, KVM host, guest, hypervisor, etc?
>
> No; under my proposal, for a big-endian host kernel (and
> thus big-endian QEMU) the order would be
> mmio.data[0] = 0x04, etc. (This wouldn't change based
> on the guest kernel endianness or whether it happened
> to have set the E bit temporarily.)

Consider the above big endian case (setend be) example,
but now running on a BE KVM host. 0x4 is the LSB of the CPU
core register in this case.

> Defining that mmio.data[] is always little-endian would
> be a valid definition of an API if we were doing it from
> scratch. It has the unfortunate property that it would
> completely break the existing PPC BE setups, which
> don't define it that way, so it is a non-starter.

I believe, but I need to check, that the PPC BE setup actually
acts like the second case in the above example. If we have a PPC
BE guest executing the followi

Re: KVM and variable-endianness guest CPUs

2014-01-22 Thread Peter Maydell
On 22 January 2014 19:29, Victor Kamensky  wrote:
> On 22 January 2014 09:29, Peter Maydell  wrote:
>> This just isn't how real buses work. There is no
>> "address + 1, address + 2". There is a single address
>> for the memory transaction and a set of data on
>> data lines and some separate size information.
>
> Yes, and those data lines are just binary signal lines,
> not numbers. If one wants to describe the information
> on the data lines as a number, he/she needs to assign integer
> bit numbers to the lines, and that is an absolutely arbitrary
> process.

It is part of the definition of the bus which signal pin is
D0 and which is D31...

> If one chooses one way to assign those
> bits to lines and another chooses the reverse way, they will
> talk about completely different numbers for the same
> signals on the bus. Such data line enumeration has
> no reflection on how the bus actually works. And I don't
> even see why it should be described just as a single
> integer; for example, one can describe the information on
> the data lines as a set of 4 byte values, nothing wrong with
> such a description.

It is not how the hardware works. If you describe it as
a set of 4 bytes, then you need to also say how you are
mapping from those 4 bytes to the actual 32 bit data
transaction the hardware is doing. Which is the question
we're trying to answer in this thread.

I've snipped a huge chunk of my initial reply to this email,
because it all boiled down to "sorry, you're just not correct
about how the hardware works" and it doesn't seem
necessary to repeat it three times. Devices really do see
"this is a transaction with this value and this size". They
do not in any way see a 32 bit word write as "this is a collection
of byte writes". Therefore:

 1) thinking about a 32 bit word write in terms of a byte array
is confusing
 2) since the KVM API is unfortunately stuck with this byte
   array, we must define the semantics of what it actually
   contains, so that the kernel and QEMU can go between
  "the value being read/written in the transaction" and
  "the contents of the byte array

>> As soon as you try to think of the mmio.data as a set
>> of bytes then you have to specify some endianness to
>> the data, so that both sides (kernel and userspace)
>> know how to reconstruct the actual data value from the
>> array of bytes.
>
> What actual value? In what sense? You need to bring
> the semantics of this h/w address into the discussion to
> really tell that.

I've just spent the last two emails doing exactly that.
The actual value, as in "this CPU just did a memory
transaction of a 32 bit data value".

> BTW could you please propose how you would see such a
> "32 bit transaction, value 0x04030201, address $whatever"
> on an ARM LE CPU in mmio.data?

That is exactly the problem we're discussing in this thread.
Indeed, I proposed an answer to it, which is that the mmio.data
array should be in host kernel byte order, in which case it
would be (for an LE host kernel) 0x01 in mmio.data[0] and so
on up.

> If it would be {0x01, 0x02, 0x03, 0x04}, that is fine with
> me. That is the current ARM LE case when the above
> snippets are executed by the guest.
>
> Would we agree that the same arrangement would be
> true for all other cases on ARM, regardless of all the other
> endiannesses of qemu, KVM host, guest, hypervisor, etc?

No; under my proposal, for a big-endian host kernel (and
thus big-endian QEMU) the order would be
mmio.data[0] = 0x04, etc. (This wouldn't change based
on the guest kernel endianness or whether it happened
to have set the E bit temporarily.)

Defining that mmio.data[] is always little-endian would
be a valid definition of an API if we were doing it from
scratch. It has the unfortunate property that it would
completely break the existing PPC BE setups, which
don't define it that way, so it is a non-starter.

Defining it as being always guest-order would mean that
userspace had to continually look at the guest CPU
endianness bit, which is annoying and awkward.

Defining it as always host-endian order is the most
reasonable option available. It also happens to work
for the current QEMU code, which is nice.
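
(And the userspace side of that definition, as a sketch assuming
<linux/kvm.h> and <string.h>: QEMU recovers the transaction value
with a plain native-order load.)

static uint32_t mmio_value32(const struct kvm_run *run)
{
    uint32_t v;
    memcpy(&v, run->mmio.data, sizeof(v));
    return v; /* the numeric value of the 32 bit transaction */
}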

thanks
-- PMM


Re: KVM and variable-endianness guest CPUs

2014-01-22 Thread Victor Kamensky
On 22 January 2014 09:29, Peter Maydell  wrote:
> On 22 January 2014 17:19, Victor Kamensky  wrote:
>> On 22 January 2014 02:22, Peter Maydell  wrote:
>>> but the major issue here is that the data being
>>> transferred is not just a bag of bytes. The data[]
>>> array plus the size field are being (mis)used to indicate
>>> that the memory transaction is one of:
>>>  * an 8 bit access
>>>  * a 16 bit access of some uint16_t value
>>>  * a 32 bit access of some uint32_t value
>>>  * a 64 bit access of some uint64_t value
>>>
>>> exactly as a CPU hardware bus would do. It's
>>> because the API is defined in this awkward way with
>>> a uint8_t[] array that we need to specify how both
>>> sides should go from the actual properties of the
>>> memory transaction (value and size) to filling in the
>>> array.
>>
>> While responding to Alex last night, I found, I think,
>> easiest and shortest way to think about mmio.data[]
>>
>> Just for discussion reference here it is again
>> struct {
>> __u64 phys_addr;
>> __u8  data[8];
>> __u32 len;
>> __u8  is_write;
>> } mmio;
>> I believe that in all cases it should be interpreted
>> in the following sense
>>byte data[0] goes into byte at phys_addr + 0
>>byte data[1] goes into byte at phys_addr + 1
>>byte data[2] goes into byte at phys_addr + 2
>>and so on up to len size
>>
>> Basically if it would be on real bus, get byte value
>> that corresponds to phys_addr + 0 address place
>> it into data[0], get byte value that corresponds to
>> phys_addr + 1 address place it into data[1], etc.
>
> This just isn't how real buses work. There is no
> "address + 1, address + 2". There is a single address
> for the memory transaction and a set of data on
> data lines and some separate size information.

Yes, and those data lines are just binary signal lines,
not numbers. If one wants to describe the information
on the data lines as a number, he/she needs to assign integer
bit numbers to the lines, and that is an absolutely arbitrary
process. If one chooses one way to assign those
bits to lines and another chooses the reverse way, they will
talk about completely different numbers for the same
signals on the bus. Such data line enumeration has
no reflection on how the bus actually works. And I don't
even see why it should be described just as a single
integer; for example, one can describe the information on
the data lines as a set of 4 byte values, nothing wrong with
such a description.

> How the device at the far end of the bus chooses
> to respond to 32 bit accesses to address X versus
> 8 bit accesses to addresses X through X+3 is entirely
> its own business and unrelated to the CPU. (It would
> be perfectly possible to have a device which when
> you read from address X as 32 bits returned 0x12345678,
> when you read from address X as 16 bits returned
> 0x9abc, returned 0x42 for an 8 bit read from X+1,
> and so on. Having byte reads from X..X+3 return
> values corresponding to parts of the 32 bit access
> is purely a convention.)

I don't follow the above; one read from
device address X as 32 bits may return 0x12345678,
and another read from the same address X as 32 bits
may return 0xabcdef12, so what? Maybe a real example
would help.

>> Note that nowhere in my above description have I talked
>> about the endianness of anything: device, access (E bit),
>> KVM host, guest, hypervisor. All these endiannesses
>> are irrelevant to the mmio interface.
>
> As soon as you try to think of the mmio.data as a set
> of bytes then you have to specify some endianness to
> the data, so that both sides (kernel and userspace)
> know how to reconstruct the actual data value from the
> array of bytes.

What actual value? In what sense? You need to bring
the semantics of this h/w address into the discussion to really
tell that. The driver that reads/writes is aware of the semantics
of those addresses. For example, a device gives the channel 1 byte
value at phys_addr and the channel 2 byte value at
phys_addr + 1, so a 16-bit integer read from phys_addr
will bring two channel values into the register, not one 16-bit
integer.
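
(As a Linux-kernel-style sketch, with made-up names and assuming the
usual asm/io.h accessors:)

static void read_channels(const volatile void __iomem *dev_base)
{
    u8  ch1  = readb(dev_base + 0); /* channel 1 sample */
    u8  ch2  = readb(dev_base + 1); /* channel 2 sample */
    u16 both = readw(dev_base);     /* one 16-bit access returns both
                                       channels packed together */
    (void)ch1; (void)ch2; (void)both;
}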

>>> Furthermore, device endianness is entirely irrelevant
>>> for deciding the properties of mmio.data[], because the
>>> thing we're modelling here is essentially the CPU->bus
>>> interface. In real hardware, the properties of individual
>>> devices on the bus are irrelevant to how the CPU's
>>> interface to the bus behaves, and similarly here the
>>> properties of emulated devices don't affect how KVM's
>>> interface to QEMU userspace needs to work.
>>
>> As far as the mmio interface is concerned, I claim that any
>> endianness is irrelevant here. I am utterly lost about
>> which endianness you care about.
>
> I care about knowing which end of mmio.data is the
> least significant byte, obviously.

LSB of what? Memory semantics does not have a
notion of LSB. It comes in only when one starts
interpreting memory content. memcpy does not
hav

Re: KVM and variable-endianness guest CPUs

2014-01-22 Thread Peter Maydell
On 22 January 2014 17:19, Victor Kamensky  wrote:
> On 22 January 2014 02:22, Peter Maydell  wrote:
>> but the major issue here is that the data being
>> transferred is not just a bag of bytes. The data[]
>> array plus the size field are being (mis)used to indicate
>> that the memory transaction is one of:
>>  * an 8 bit access
>>  * a 16 bit access of some uint16_t value
>>  * a 32 bit access of some uint32_t value
>>  * a 64 bit access of some uint64_t value
>>
>> exactly as a CPU hardware bus would do. It's
>> because the API is defined in this awkward way with
>> a uint8_t[] array that we need to specify how both
>> sides should go from the actual properties of the
>> memory transaction (value and size) to filling in the
>> array.
>
> While responding to Alex last night, I found, I think,
> easiest and shortest way to think about mmio.data[]
>
> Just for discussion reference here it is again
> struct {
> __u64 phys_addr;
> __u8  data[8];
> __u32 len;
> __u8  is_write;
> } mmio;
> I believe that in all cases it should be interpreted
> in the following sense
>byte data[0] goes into byte at phys_addr + 0
>byte data[1] goes into byte at phys_addr + 1
>byte data[2] goes into byte at phys_addr + 2
>and so on up to len size
>
> Basically if it would be on real bus, get byte value
> that corresponds to phys_addr + 0 address place
> it into data[0], get byte value that corresponds to
> phys_addr + 1 address place it into data[1], etc.

This just isn't how real buses work. There is no
"address + 1, address + 2". There is a single address
for the memory transaction and a set of data on
data lines and some separate size information.
How the device at the far end of the bus chooses
to respond to 32 bit accesses to address X versus
8 bit accesses to addresses X through X+3 is entirely
its own business and unrelated to the CPU. (It would
be perfectly possible to have a device which when
you read from address X as 32 bits returned 0x12345678,
when you read from address X as 16 bits returned
0x9abc, returned 0x42 for an 8 bit read from X+1,
and so on. Having byte reads from X..X+3 return
values corresponding to parts of the 32 bit access
is purely a convention.)
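
(That hypothetical device as a sketch of a QEMU-style read callback;
the MemoryRegionOps signature here is from memory and may not match
a given QEMU version exactly:)

static uint64_t odd_device_read(void *opaque, hwaddr addr, unsigned size)
{
    if (size == 4 && addr == 0) return 0x12345678; /* 32-bit read at X   */
    if (size == 2 && addr == 0) return 0x9abc;     /* 16-bit read at X   */
    if (size == 1 && addr == 1) return 0x42;       /*  8-bit read at X+1 */
    return 0; /* each size is answered independently, by convention only */
}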

> Note that nowhere in my above description have I talked
> about the endianness of anything: device, access (E bit),
> KVM host, guest, hypervisor. All these endiannesses
> are irrelevant to the mmio interface.

As soon as you try to think of the mmio.data as a set
of bytes then you have to specify some endianness to
the data, so that both sides (kernel and userspace)
know how to reconstruct the actual data value from the
array of bytes.

>> Furthermore, device endianness is entirely irrelevant
>> for deciding the properties of mmio.data[], because the
>> thing we're modelling here is essentially the CPU->bus
>> interface. In real hardware, the properties of individual
>> devices on the bus are irrelevant to how the CPU's
>> interface to the bus behaves, and similarly here the
>> properties of emulated devices don't affect how KVM's
>> interface to QEMU userspace needs to work.
>
> As far as the mmio interface is concerned, I claim that any
> endianness is irrelevant here. I am utterly lost about
> which endianness you care about.

I care about knowing which end of mmio.data is the
least significant byte, obviously.

> Consider
> the following ARM code snippets:
>
> setend le
> mov r1, #0x04030201
> str r1, [r0]
>
> and
>
> setend be
> mov r1, #0x01020304
> str r1, [r0]
>
> When the above snippets are executed the memory bus
> sees absolutely the same thing. Can you tell, by
> looking at this memory transaction, what endianness
> it is? And the endianness of what? I can't.

That is correct. That is because the value sent out on
the bus from the CPU is always the same: it says
"32 bit transaction, value 0x04030201, address $whatever".

> The only thing you can tell by looking at this bus
> memory transaction is that the byte value 0x01 goes at
> address r0, the byte value 0x02 goes at address r0 + 1,
> etc.

No, this part is absolutely wrong, see above.

thanks
-- PMM


Re: KVM and variable-endianness guest CPUs

2014-01-22 Thread Victor Kamensky
Hi Peter,

On 22 January 2014 02:22, Peter Maydell  wrote:
> On 22 January 2014 05:39, Victor Kamensky  wrote:
>> Hi Guys,
>>
>> Christoffer and I had a bit of a heated chat :) on this
>> subject last night. Christoffer, I really appreciate
>> your time! We did not really reach agreement
>> during the chat and Christoffer asked me to follow
>> up on this thread.
>> Here it goes. Sorry, it is a very long email.
>>
>> I don't believe we can assign any endianness to the
>> mmio.data[] byte array. I believe mmio.data[] and
>> mmio.len act just like memcpy and that is all. As
>> memcpy does not imply any endianness of the underlying
>> data, mmio.data[] should not either.
>
> This email is about five times too long to be actually
> useful,

Sorry, you may be right about that.
My responses below are much shorter :)

> but the major issue here is that the data being
> transferred is not just a bag of bytes. The data[]
> array plus the size field are being (mis)used to indicate
> that the memory transaction is one of:
>  * an 8 bit access
>  * a 16 bit access of some uint16_t value
>  * a 32 bit access of some uint32_t value
>  * a 64 bit access of some uint64_t value
>
> exactly as a CPU hardware bus would do. It's
> because the API is defined in this awkward way with
> a uint8_t[] array that we need to specify how both
> sides should go from the actual properties of the
> memory transaction (value and size) to filling in the
> array.

While responding to Alex last night I found, I think, the
easiest and shortest way to think about mmio.data[].

Just for discussion reference, here it is again:
struct {
__u64 phys_addr;
__u8  data[8];
__u32 len;
__u8  is_write;
} mmio;
I believe that in all cases it should be interpreted
in the following sense:
   byte data[0] goes into the byte at phys_addr + 0
   byte data[1] goes into the byte at phys_addr + 1
   byte data[2] goes into the byte at phys_addr + 2
   and so on up to len size

Basically, if it were on a real bus: take the byte value
that corresponds to address phys_addr + 0 and place
it into data[0], take the byte value that corresponds to
phys_addr + 1 and place it into data[1], etc.
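
(In C, the interpretation I am proposing is nothing more than this
sketch; the helper name is made up, <stdint.h> assumed:)

static void apply_mmio_write(uint8_t *phys_mem, uint64_t phys_addr,
                             const uint8_t data[8], uint32_t len)
{
    uint32_t i;
    for (i = 0; i < len; i++)
        phys_mem[phys_addr + i] = data[i]; /* data[i] -> phys_addr + i */
}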

I believe this is true for the current ARM LE case and
the PPC BE case. I am asking you to keep it this way
for all other cases. My ARM BE V7 KVM patches
still use it in the same sense.

What is wrong with it?

Note that nowhere in my above description have I talked
about the endianness of anything: device, access (E bit),
KVM host, guest, hypervisor. All these endiannesses
are irrelevant to the mmio interface.

> Furthermore, device endianness is entirely irrelevant
> for deciding the properties of mmio.data[], because the
> thing we're modelling here is essentially the CPU->bus
> interface. In real hardware, the properties of individual
> devices on the bus are irrelevant to how the CPU's
> interface to the bus behaves, and similarly here the
> properties of emulated devices don't affect how KVM's
> interface to QEMU userspace needs to work.

As far as the mmio interface is concerned, I claim that any
endianness is irrelevant here. I am utterly lost about
which endianness you care about. Consider
the following ARM code snippets:

setend le
mov r1, #0x04030201
str r1, [r0]

and

setend be
mov r1, #0x01020304
str r1, [r0]

When the above snippets are executed the memory bus
sees absolutely the same thing. Can you tell, by
looking at this memory transaction, what endianness
it is? And the endianness of what? I can't.
The only thing you can tell by looking at this bus
memory transaction is that the byte value 0x01 goes at
address r0, the byte value 0x02 goes at address r0 + 1,
etc.

Thanks,
Victor

> MemoryRegion's 'endianness' field, incidentally, is
> a dreadful mess that we should get rid of. It is attempting
> to model the property that some buses/bridges have of
> doing byte-lane-swaps on data that passes through as
> a property of the device itself. It would be better if we
> modelled it properly, with container regions having possible
> byte-swapping and devices just being devices.
>
> thanks
> -- PMM


Re: [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-22 Thread Alexander Graf

On 22.01.2014, at 08:26, Victor Kamensky  wrote:

> On 21 January 2014 22:41, Alexander Graf  wrote:
>> 
>> 
>> "Native endian" really is just a shortcut for "target endian"
>> which is LE for ARM and BE for PPC. There shouldn't be
>> a qemu-system-armeb or qemu-system-ppc64le.
> 
> I disagree. Fully functional ARM BE system is what we've
> been working on for last few months. 'We' is Linaro
> Networking Group, Endian subteam and some other guys
> in ARM and across community. Why we do that is a bit
> beyond of this discussion.
> 
> ARM BE patches for both V7 and V8 are already in mainline
> kernel. But ARM BE KVM host is broken now. It is known
> deficiency that I am trying to fix. Please look at [1]. Patches
> for V7 BE KVM were proposed and currently under active
> discussion. Currently I work on ARM V8 BE KVM changes.
> 
> So "native endian" in ARM is value of CPSR register E bit.
> If it is off native endian is LE, if it is on it is BE.
> 
> Once and if we agree on ARM BE KVM host changes, the
> next step would be patches in qemu one of which introduces
> qemu-system-armeb. Please see [2].

I think we're facing an ideology conflict here. Yes, there should be a 
qemu-system-arm that is BE capable. There should also be a qemu-system-ppc64 
that is LE capable. But there is no point in changing the "default endianness" 
for the virtual CPUs that we plug in there. Both CPUs are perfectly capable of 
running in LE or BE mode, the question is just what we declare the "default".

Think about the PPC bootstrap. We start off with a BE firmware, then boot into 
the Linux kernel which calls a hypercall to set the LE bit on every interrupt. 
But there's no reason this little endian kernel couldn't theoretically have big 
endian user space running with access to emulated device registers.

As Peter already pointed out, the actual breakage behind this is that we have a 
"default endianness" at all. But that's a very difficult thing to resolve and I 
don't think it should be our primary goal. Just live with the fact that we declare 
ARM little endian in QEMU and swap things accordingly - then everyone's happy.

This really only ever becomes a problem if you have devices that have awareness 
of the CPU's endian mode. The only one on PPC that I'm aware of that falls into 
this category is virtio and there are patches pending to solve that. I don't 
know if there are any QEMU emulated devices outside of virtio with this issue 
on ARM, but you'll have to make the emulation code for those look at the CPU 
state then.

> 
>> QEMU emulates everything that comes after the CPU, so
>> imagine the ioctl struct as a bus package. Your bus
>> doesn't care what endianness the CPU is in - it just
>> gets data from the CPU.
> 
> I am not sure that I follow above. Suppose I have
> 
> mov r1, #1
> str r1, [r0]
> 
> where r0 is device address. Now depending on CPSR
> E bit value device address will receive 1 as integer either
> in LE order or in BE order. That is how ARM v7 CPU
> works, regardless whether it is emulated or not.
> 
> So if E bit is off (LE case) after str is executed
> byte at r0 address will get 1
> byte at r0 + 1 address will get 0
> byte at r0 + 2 address will get 0
> byte at r0 + 3 address will get 0
> 
> If E bit is on (BE case) after str is executed
> byte at r0 address will get 0
> byte at r0 + 1 address will get 0
> byte at r0 + 2 address will get 0
> byte at r0 + 3 address will get 1
> 
> my point that mmio.data[] just carries bytes for phys_addr
> mmio.data[0] would be value for byte at phys_addr,
> mmio.data[1] would be value for byte at phys_addr + 1, and
> so on.

What we get is an instruction that traps because it wants to "write r1 (which 
has value=1) into address x". So at that point we get the register value.

Then we need to take a look at the E bit to see whether the write was supposed 
to be in non-host endianness because we need to emulate exactly the LE/BE 
difference you're indicating above. The way we implement this on PPC is that we 
simply byte swap the register value when guest_endian != host_endian.
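
A minimal C sketch of that swap-on-mismatch rule (the function and
parameter names are assumed for illustration; this is not the actual
KVM code):

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Store a trapped guest register value into mmio.data[], byte-swapping
 * first when the guest's current endianness differs from the host's. */
static void emulate_store(uint8_t *data, uint64_t reg_val,
                          uint32_t len, bool guest_be, bool host_be)
{
    bool swap = (guest_be != host_be);

    switch (len) {
    case 1:
        data[0] = (uint8_t)reg_val;
        break;
    case 2: {
        uint16_t v = (uint16_t)reg_val;
        if (swap)
            v = __builtin_bswap16(v);
        memcpy(data, &v, 2);        /* data[] ends up in host byte order */
        break;
    }
    case 4: {
        uint32_t v = (uint32_t)reg_val;
        if (swap)
            v = __builtin_bswap32(v);
        memcpy(data, &v, 4);
        break;
    }
    case 8: {
        uint64_t v = reg_val;
        if (swap)
            v = __builtin_bswap64(v);
        memcpy(data, &v, 8);
        break;
    }
    }
}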

With this in place, QEMU can just memcpy() the value into a local register and 
feed it into its emulation code which expects a "register value as if the CPU 
was running in native endianness" as parameter - with "native" meaning "little 
endian" for qemu-system-arm. Device emulation code doesn't know what to do with 
a byte array.

Take a look at QEMU's MMIO handler:

case KVM_EXIT_MMIO:
    DPRINTF("handle_mmio\n");
    cpu_physical_memory_rw(run->mmio.phys_addr,
                           run->mmio.data,
                           run->mmio.len,
                           run->mmio.is_write);
    ret = 0;
    break;

which translates to

switch (l) {
case 8:
    /* 64 bit write access */
    val = ldq_p(buf);
    error |= io_mem_write(mr, addr1, val, 8);
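
(Here ldq_p is QEMU's target-endian load. Roughly - a simplified
sketch of the header definition, not the verbatim QEMU source:)

#if defined(TARGET_WORDS_BIGENDIAN)
#define ldq_p(p)  ldq_be_p(p)  /* interpret the 8 bytes at p as big endian */
#else
#define ldq_p(p)  ldq_le_p(p)  /* interpret the 8 bytes at p as little endian */
#endif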

Re: KVM and variable-endianness guest CPUs

2014-01-22 Thread Peter Maydell
On 22 January 2014 05:39, Victor Kamensky  wrote:
> Hi Guys,
>
> Christoffer and I had a bit heated chat :) on this
> subject last night. Christoffer, really appreciate
> your time! We did not really reach agreement
> during the chat and Christoffer asked me to follow
> up on this thread.
> Here it goes. Sorry, it is very long email.
>
> I don't believe we can assign any endianity to
> mmio.data[] byte array. I believe mmio.data[] and
> mmio.len acts just memcpy and that is all. As
> memcpy does not imply any endianity of underlying
> data mmio.data[] should not either.

This email is about five times too long to be actually
useful, but the major issue here is that the data being
transferred is not just a bag of bytes. The data[]
array plus the size field are being (mis)used to indicate
that the memory transaction is one of:
 * an 8 bit access
 * a 16 bit access of some uint16_t value
 * a 32 bit access of some uint32_t value
 * a 64 bit access of some uint64_t value

exactly as a CPU hardware bus would do. It's
because the API is defined in this awkward way with
a uint8_t[] array that we need to specify how both
sides should go from the actual properties of the
memory transaction (value and size) to filling in the
array.

Furthermore, device endianness is entirely irrelevant
for deciding the properties of mmio.data[], because the
thing we're modelling here is essentially the CPU->bus
interface. In real hardware, the properties of individual
devices on the bus are irrelevant to how the CPU's
interface to the bus behaves, and similarly here the
properties of emulated devices don't affect how KVM's
interface to QEMU userspace needs to work.

MemoryRegion's 'endianness' field, incidentally, is
a dreadful mess that we should get rid of. It is attempting
to model the property that some buses/bridges have of
doing byte-lane-swaps on data that passes through as
a property of the device itself. It would be better if we
modelled it properly, with container regions having possible
byte-swapping and devices just being devices.

thanks
-- PMM


Re: KVM and variable-endianness guest CPUs

2014-01-21 Thread Victor Kamensky
Hi Guys,

Christoffer and I had a somewhat heated chat :) on this
subject last night. Christoffer, I really appreciate
your time! We did not really reach agreement
during the chat, and Christoffer asked me to follow
up on this thread.
Here it goes. Sorry, it is a very long email.

I don't believe we can assign any endianness to the
mmio.data[] byte array. I believe mmio.data[] and
mmio.len act just like memcpy, and that is all. As
memcpy does not imply any endianness of the underlying
data, mmio.data[] should not either.

Here is my definition:

mmio.data[] is an array of bytes that contains memory
bytes in such a form that, for the read case, if those
bytes were placed in guest memory and the guest executed
the same read access instruction with an address pointing
to this memory, the result would be the same as a real
h/w device memory access. The rest of the KVM host and
hypervisor code should take care of the mmio.data[]
memory so that it is delivered to vcpu registers and
restored by the hypervisor part in such a way that the
guest CPU register value is the same as it would be for
a real, non-emulated h/w read access (that is the
emulation part). The same goes for write access: if the
guest writes into memory and those bytes are just copied
to the emulated h/w register, it has the same effect as
a real mapped h/w register write.

In shorter form, i.e. for a len=4 access: the endianness
of the integer at address &mmio.data[0] should match the
endianness of the emulated h/w device behind phys_addr,
regardless of the endianness of the emulator, KVM host,
hypervisor, and guest.

Examples that illustrate my definition
--

1) An LE guest (E bit off, in ARM speak) reads an integer
(4 bytes) from a mapped h/w LE device register -
mmio.data[3] contains the MSB, mmio.data[0] the LSB.

2) A BE guest (E bit on, in ARM speak) reads an integer
from a mapped h/w LE device register - mmio.data[3]
contains the MSB, mmio.data[0] the LSB. Note that
if the &mmio.data[0] memory were placed in the guest
address space and the instruction restarted with the new
address, it would meet the BE guest's expectations
- the guest knows that it reads LE h/w, so it will byteswap
the register before processing it further. This is the BE
guest ARM case (regardless of the KVM host endianness).

3) A BE guest reads an integer from a mapped h/w BE device
register - mmio.data[0] contains the MSB, mmio.data[3]
the LSB. Note that if the &mmio.data[0] memory were
placed in the guest address space and the instruction
restarted with the new address, it would meet the BE
guest's expectation - the guest knows that it reads
BE h/w, so it will proceed further without any other
work. I guess this is the BE ppc case.
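
In byte-array terms, assuming a register value of 0x04030201 for
illustration:

/* Cases 1) and 2): LE device register holding 0x04030201 */
uint8_t data_le_dev[4] = { 0x01, 0x02, 0x03, 0x04 };  /* data[0] = LSB */

/* Case 3): BE device register holding 0x04030201 */
uint8_t data_be_dev[4] = { 0x04, 0x03, 0x02, 0x01 };  /* data[0] = MSB */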


Arguments in favor of memcpy semantics for mmio.data[]
--

x) What are the possible values of 'len'? Previous discussions
imply that it is always a power of 2. Why is that? Maybe
there will be a CPU that needs to do a 5-byte or 6-byte
mmio access. How do you assign endianness to
such a case? A 'len' of 5 or 6, or anything else, works
fine with memcpy semantics. I admit it is a hypothetical
case, but IMHO it tests how clean the ABI definition is.

x) A byte array does not have endianness because it
does not have any structure. If one wanted to
imply structure, why is mmio not defined in such a way
that the structure is reflected in its definition?
Something like:


/* KVM_EXIT_MMIO */
struct {
    __u64 phys_addr;
    union {
        __u8  byte;
        __u16 hword;
        __u32 word;
        __u64 dword;
    } data;
    __u32 len;
    __u8  is_write;
} mmio;

where len really serves as a union discriminator and
the only allowed len values are 1, 2, 4, 8.
In this case, I agree, the endianness of the integer types
should be defined. I believe the use of a byte array strongly
implies that the original intent was to have the semantics
of a byte stream copy, just like memcpy.
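
Decoding under that hypothetical union layout would then look
something like this sketch:

/* 'mmio' is the hypothetical union-based struct above. */
uint64_t val = 0;
switch (mmio.len) {
case 1: val = mmio.data.byte;  break;
case 2: val = mmio.data.hword; break;
case 4: val = mmio.data.word;  break;
case 8: val = mmio.data.dword; break;
}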

x) Note there is nothing wrong with a user/kernel ABI
using just a byte stream as a parameter. There are already
precedents, like the 'read' and 'write' system calls :).

x) Consider the case where KVM works with emulated memory-mapped
h/w devices, where some devices operate in LE mode and others
operate in BE mode. The mode is defined by the semantics of the
real h/w device, and it should be emulated by the emulator and KVM
given all other context. As far as the mmio.data[] array is concerned,
if the same integer value is read from these devices' registers,
mmio.data[] memory should contain the integer in opposite endianness
for the two cases, i.e. the MSB is data[0] in one case and the MSB
is data[3] in the other. It cannot be the same, because apart from
the emulator and the guest kernel, everything else, like the KVM
host and hypervisor, has no clue what the endianness of the device
actually is - they must treat mmio.data[] in the same way.
But the resulting guest target CPU register would need to contain
nor

Re: KVM and variable-endianness guest CPUs

2014-01-20 Thread Christoffer Dall
On Mon, Jan 20, 2014 at 03:22:11PM +0100, Alexander Graf wrote:
> 
> On 17.01.2014, at 19:52, Peter Maydell  wrote:
> 
> > On 17 January 2014 17:53, Peter Maydell  wrote:
> >> Specifically, the KVM API says "here's a uint8_t[] byte
> >> array and a length", and the current QEMU code treats that
> >> as "this is a byte array written as if the guest CPU
> >> (a) were in TARGET_WORDS_BIGENDIAN order and (b) wrote its
> >> I/O access to this buffer rather than to the device".
> >> 
> >> The KVM API docs don't actually specify the endianness
> >> semantics of the byte array, but I think that that really
> >> needs to be nailed down. I can think of a couple of options:
> >> * always LE
> >> * always BE
> >>   [these first two are non-starters because they would
> >>   break either x86 or PPC existing code]
> >> * always the endianness the guest is at the time
> >> * always some arbitrary endianness based purely on the
> >>   endianness the KVM implementation used historically
> >> * always the endianness of the host QEMU binary
> >> * something else?
> >> 
> >> Any preferences? Current QEMU code basically assumes
> >> "always the endianness of TARGET_WORDS_BIGENDIAN",
> >> which is pretty random.
> > 
> > Having thought a little more about this, my opinion is:
> > 
> > * we should specify that the byte order of the mmio.data
> >   array is host kernel endianness (ie same endianness
> >   as the QEMU process itself) [this is what it actually
> >   is, I think, for all the cases that work today]
> > * we should fix the code path in QEMU for handling
> >   mmio.data which currently has the implicit assumption
> >   that when using KVM TARGET_WORDS_BIGENDIAN is the same
> >   as the QEMU host process endianness (because it's using
> >   load/store functions which swap if TARGET_WORDS_BIGENDIAN
> >   is different from HOST_WORDS_BIGENDIAN)
> 
> Yes, I fully agree :).
> 
Great, I'll prepare a patch for the KVM API documentation.

-Christoffer


Re: KVM and variable-endianness guest CPUs

2014-01-20 Thread Peter Maydell
On 20 January 2014 14:20, Alexander Graf  wrote:
> I think I see the problem now. You're thinking about LE hosts, not LE guests.
>
> I think the only really sensible options would be to
>
>   a) Always use a statically define target endianness (big for ppc)
>   b) Always use host endianness

> Currently QEMU apparently implements a), but that can
> easily be changed. Today we don't have kvm support for
> ppc64le hosts yet.

Yes; I would ideally like us be able to get rid of that
statically defined target endianness eventually, so if we
have the leeway to define the kernel<->userspace ABI in a
way that doesn't care about the current guest CPU endianness
(ie we haven't actually yet claimed support for
reverse-endianness guests in a way that locks us into an
unhelpful definition of the ABI) we should take it while
we still can.

Then the current QEMU restrictions boil down to "you can
only use QEMU for KVM on a host kernel with the same
endianness as QEMU's legacy TARGET_WORDS_BIGENDIAN
setting for that CPU" (but such a QEMU can deal with
guests whatever they do with the endianness control bits).

> I personally prefer b). It's the natural thing to do for
> a host interface to be in host endianness and it's exactly
> what we expose for LE-on-BE systems with ppc already.

Yes. Strictly speaking by "host endianness" here I guess
we mean "the endianness of the kernel-to-userspace ABI",
since it is at least in theory possible to have an LE
kernel which runs BE userspace processes.

thanks
-- PMM


Re: KVM and variable-endianness guest CPUs

2014-01-20 Thread Alexander Graf

On 17.01.2014, at 19:52, Peter Maydell  wrote:

> On 17 January 2014 17:53, Peter Maydell  wrote:
>> Specifically, the KVM API says "here's a uint8_t[] byte
>> array and a length", and the current QEMU code treats that
>> as "this is a byte array written as if the guest CPU
>> (a) were in TARGET_WORDS_BIGENDIAN order and (b) wrote its
>> I/O access to this buffer rather than to the device".
>> 
>> The KVM API docs don't actually specify the endianness
>> semantics of the byte array, but I think that that really
>> needs to be nailed down. I can think of a couple of options:
>> * always LE
>> * always BE
>>   [these first two are non-starters because they would
>>   break either x86 or PPC existing code]
>> * always the endianness the guest is at the time
>> * always some arbitrary endianness based purely on the
>>   endianness the KVM implementation used historically
>> * always the endianness of the host QEMU binary
>> * something else?
>> 
>> Any preferences? Current QEMU code basically assumes
>> "always the endianness of TARGET_WORDS_BIGENDIAN",
>> which is pretty random.
> 
> Having thought a little more about this, my opinion is:
> 
> * we should specify that the byte order of the mmio.data
>   array is host kernel endianness (ie same endianness
>   as the QEMU process itself) [this is what it actually
>   is, I think, for all the cases that work today]
> * we should fix the code path in QEMU for handling
>   mmio.data which currently has the implicit assumption
>   that when using KVM TARGET_WORDS_BIGENDIAN is the same
>   as the QEMU host process endianness (because it's using
>   load/store functions which swap if TARGET_WORDS_BIGENDIAN
>   is different from HOST_WORDS_BIGENDIAN)

Yes, I fully agree :).


Alex



Re: KVM and variable-endianness guest CPUs

2014-01-20 Thread Alexander Graf

On 18.01.2014, at 11:15, Peter Maydell  wrote:

> On 18 January 2014 07:32, Alexander Graf  wrote:
>>> Am 18.01.2014 um 05:24 schrieb Christoffer Dall :
>>>> On Fri, Jan 17, 2014 at 06:52:57PM +, Peter Maydell wrote:
>>>> Having thought a little more about this, my opinion is:
>>>>
>>>> * we should specify that the byte order of the mmio.data
>>>>   array is host kernel endianness (ie same endianness
>>>>   as the QEMU process itself) [this is what it actually
>>>>   is, I think, for all the cases that work today]
>>> 
>>> I completely agree, given that it's too late to be set on always LE/BE,
>>> I think the natural choice is something that allows a user to cast the
>>> byte array to an appropriate pointer type and dereference it.
>>> 
>>> And I think we need to amend the KVM API docs to specify this.
>> 
>> I don't see the problem.
> 
> The problem is (a) the docs aren't clear about the semantics
> (b) people have picked behaviour that suited them
> to implement without documenting what it was.

I think I see the problem now. You're thinking about LE hosts, not LE guests.

I think the only really sensible options would be to

  a) Always use a statically define target endianness (big for ppc)
  b) Always use host endianness

Currently QEMU apparently implements a), but that can easily be changed. Today 
we don't have kvm support for ppc64le hosts yet.

I personally prefer b). It's the natural thing to do for a host interface to be 
in host endianness and it's exactly what we expose for LE-on-BE systems with 
ppc already.

> 
>> For ppc we always do mmio emulation
>> as if the cpu was big endian.
> 
> Even if the guest, the host kernel and QEMU in userspace are
> all little endian?
> 
> Also "mmio emulation as if the CPU was big endian"
> doesn't make sense -- MMIO emulation doesn't depend
> on CPU endianness.
> 
>> We've had an is_bigendian variable
>> for that since the very first versions.
> 
> Where? In the kernel? In QEMU? What does it control?

In KVM. Check out 
https://github.com/agraf/linux-2.6/commit/1c00e7c21e39e20be7b03b111d5ab90ce938f108
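
Loosely, the idea behind it looks like this (a paraphrased sketch with
assumed names, from memory rather than copied from that commit):

/* On a BE ppc host the mmio.data[] bytes are already in host order,
 * so a BE guest access is copied as-is, while a byte-reversed (LE)
 * access gets swabbed first.  4-byte load case only; the field and
 * helper names here are illustrative, not verbatim. */
u32 val;
memcpy(&val, run->mmio.data, 4);       /* bytes as the (BE) host sees them */
if (!vcpu->arch.mmio_is_bigendian)
    val = swab32(val);                 /* LE access: undo the byte order */
kvmppc_set_gpr(vcpu, reg, val);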


Alex



Re: KVM and variable-endianness guest CPUs

2014-01-18 Thread Peter Maydell
On 18 January 2014 07:32, Alexander Graf  wrote:
>> Am 18.01.2014 um 05:24 schrieb Christoffer Dall :
>>> On Fri, Jan 17, 2014 at 06:52:57PM +, Peter Maydell wrote:
>>> Having thought a little more about this, my opinion is:
>>>
>>> * we should specify that the byte order of the mmio.data
>>>   array is host kernel endianness (ie same endianness
>>>   as the QEMU process itself) [this is what it actually
>>>   is, I think, for all the cases that work today]
>>
>> I completely agree, given that it's too late to be set on always LE/BE,
>> I think the natural choice is something that allows a user to cast the
>> byte array to an appropriate pointer type and dereference it.
>>
>> And I think we need to amend the KVM API docs to specify this.
>
> I don't see the problem.

The problem is (a) the docs aren't clear about the semantics
(b) people have picked behaviour that suited them
to implement without documenting what it was.

> For ppc we always do mmio emulation
> as if the cpu was big endian.

Even if the guest, the host kernel and QEMU in userspace are
all little endian?

Also "mmio emulation as if the CPU was big endian"
doesn't make sense -- MMIO emulation doesn't depend
on CPU endianness.

> We've had an is_bigendian variable
> for that since the very first versions.

Where? In the kernel? In QEMU? What does it control?

thanks
-- PMM


Re: KVM and variable-endianness guest CPUs

2014-01-17 Thread Alexander Graf


> Am 18.01.2014 um 05:24 schrieb Christoffer Dall :
> 
>> On Fri, Jan 17, 2014 at 06:52:57PM +, Peter Maydell wrote:
>>> On 17 January 2014 17:53, Peter Maydell  wrote:
>>> Specifically, the KVM API says "here's a uint8_t[] byte
>>> array and a length", and the current QEMU code treats that
>>> as "this is a byte array written as if the guest CPU
>>> (a) were in TARGET_WORDS_BIGENDIAN order and (b) wrote its
>>> I/O access to this buffer rather than to the device".
>>> 
>>> The KVM API docs don't actually specify the endianness
>>> semantics of the byte array, but I think that that really
>>> needs to be nailed down. I can think of a couple of options:
>>> * always LE
>>> * always BE
>>>   [these first two are non-starters because they would
>>>   break either x86 or PPC existing code]
>>> * always the endianness the guest is at the time
>>> * always some arbitrary endianness based purely on the
>>>   endianness the KVM implementation used historically
>>> * always the endianness of the host QEMU binary
>>> * something else?
>>> 
>>> Any preferences? Current QEMU code basically assumes
>>> "always the endianness of TARGET_WORDS_BIGENDIAN",
>>> which is pretty random.
>> 
>> Having thought a little more about this, my opinion is:
>> 
>> * we should specify that the byte order of the mmio.data
>>   array is host kernel endianness (ie same endianness
>>   as the QEMU process itself) [this is what it actually
>>   is, I think, for all the cases that work today]
> 
> I completely agree, given that it's too late to be set on always LE/BE,
> I think the natural choice is something that allows a user to cast the
> byte array to an appropriate pointer type and dereference it.
> 
> And I think we need to amend the KVM API docs to specify this.

I don't see the problem. For ppc we always do mmio emulation as if the cpu was 
big endian. We've had an is_bigendian variable for that since the very first 
versions.


Alex

> 
> -- 
> Christoffer


Re: KVM and variable-endianness guest CPUs

2014-01-17 Thread Christoffer Dall
On Fri, Jan 17, 2014 at 06:52:57PM +, Peter Maydell wrote:
> On 17 January 2014 17:53, Peter Maydell  wrote:
> > Specifically, the KVM API says "here's a uint8_t[] byte
> > array and a length", and the current QEMU code treats that
> > as "this is a byte array written as if the guest CPU
> > (a) were in TARGET_WORDS_BIGENDIAN order and (b) wrote its
> > I/O access to this buffer rather than to the device".
> >
> > The KVM API docs don't actually specify the endianness
> > semantics of the byte array, but I think that that really
> > needs to be nailed down. I can think of a couple of options:
> >  * always LE
> >  * always BE
> >[these first two are non-starters because they would
> >break either x86 or PPC existing code]
> >  * always the endianness the guest is at the time
> >  * always some arbitrary endianness based purely on the
> >endianness the KVM implementation used historically
> >  * always the endianness of the host QEMU binary
> >  * something else?
> >
> > Any preferences? Current QEMU code basically assumes
> > "always the endianness of TARGET_WORDS_BIGENDIAN",
> > which is pretty random.
> 
> Having thought a little more about this, my opinion is:
> 
>  * we should specify that the byte order of the mmio.data
>array is host kernel endianness (ie same endianness
>as the QEMU process itself) [this is what it actually
>is, I think, for all the cases that work today]

I completely agree, given that it's too late to be set on always LE/BE,
I think the natural choice is something that allows a user to cast the
byte array to an appropriate pointer type and dereference it.
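
I.e., something like the following sketch (assuming a 4-byte access
and the host-endianness rule above):

uint32_t val;
memcpy(&val, run->mmio.data, sizeof(val));   /* data[] is host-endian */
/* or equivalently, modulo strict-aliasing caveats: */
uint32_t val2 = *(uint32_t *)run->mmio.data;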

And I think we need to amend the KVM API docs to specify this.

-- 
Christoffer


Re: KVM and variable-endianness guest CPUs

2014-01-17 Thread Peter Maydell
On 17 January 2014 17:53, Peter Maydell  wrote:
> Specifically, the KVM API says "here's a uint8_t[] byte
> array and a length", and the current QEMU code treats that
> as "this is a byte array written as if the guest CPU
> (a) were in TARGET_WORDS_BIGENDIAN order and (b) wrote its
> I/O access to this buffer rather than to the device".
>
> The KVM API docs don't actually specify the endianness
> semantics of the byte array, but I think that that really
> needs to be nailed down. I can think of a couple of options:
>  * always LE
>  * always BE
>[these first two are non-starters because they would
>break either x86 or PPC existing code]
>  * always the endianness the guest is at the time
>  * always some arbitrary endianness based purely on the
>endianness the KVM implementation used historically
>  * always the endianness of the host QEMU binary
>  * something else?
>
> Any preferences? Current QEMU code basically assumes
> "always the endianness of TARGET_WORDS_BIGENDIAN",
> which is pretty random.

Having thought a little more about this, my opinion is:

 * we should specify that the byte order of the mmio.data
   array is host kernel endianness (ie same endianness
   as the QEMU process itself) [this is what it actually
   is, I think, for all the cases that work today]
 * we should fix the code path in QEMU for handling
   mmio.data which currently has the implicit assumption
   that when using KVM TARGET_WORDS_BIGENDIAN is the same
   as the QEMU host process endianness (because it's using
   load/store functions which swap if TARGET_WORDS_BIGENDIAN
   is different from HOST_WORDS_BIGENDIAN)

thanks
-- PMM


KVM and variable-endianness guest CPUs

2014-01-17 Thread Peter Maydell
[This seemed like a good jumping off point for this question.]

On 16 January 2014 17:51, Alexander Graf  wrote:
> Am 16.01.2014 um 18:41 schrieb Peter Maydell :
>> Also see my remarks on the previous patch series suggesting
>> that we should look at this in a more holistic way than
>> just randomly fixing small bits of things. A good place
>> to start would be "what should the semantics of stl_p()
>> be for a QEMU where the CPU is currently operating with
>> a reversed endianness to the TARGET_WORDS_BIGENDIAN
>> setting?".
>
> That'd open a giant can of worms that I'd rather not open.

Yeah, but you kind of have to open that can, because stl_p()
is used in the code path for KVM MMIO accesses to devices.

Specifically, the KVM API says "here's a uint8_t[] byte
array and a length", and the current QEMU code treats that
as "this is a byte array written as if the guest CPU
(a) were in TARGET_WORDS_BIGENDIAN order and (b) wrote its
I/O access to this buffer rather than to the device".

The KVM API docs don't actually specify the endianness
semantics of the byte array, but I think that that really
needs to be nailed down. I can think of a couple of options:
 * always LE
 * always BE
   [these first two are non-starters because they would
   break either x86 or PPC existing code]
 * always the endianness the guest is at the time
 * always some arbitrary endianness based purely on the
   endianness the KVM implementation used historically
 * always the endianness of the host QEMU binary
 * something else?

Any preferences? Current QEMU code basically assumes
"always the endianness of TARGET_WORDS_BIGENDIAN",
which is pretty random.

thanks
-- PMM