Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-24 Thread Alexey Kardashevskiy
On 06/25/2014 07:54 AM, Benjamin Herrenschmidt wrote:
> On Wed, 2014-06-25 at 00:33 +1000, Alexey Kardashevskiy wrote:
>>
>> I do not understand why @val is considered LE here and need to be
>> converted
>> to CPU. Really. I truly believe it should be cpu_to_le32().
> 
> No. Both are slightly wrong semantically but le32_to_cpu() is less
> wrong :-)
>
> iowrite32 supposedly takes a "cpu" value as argument and writes an "le"
> value. So if anything, you need something that converts to a "cpu" value
> before you call iowrite32.
> 
> Now it's still slightly wrong because the "input" to le32_to_cpu() is
> supposed to be an "LE" value but of course here it's not, it's a "cpu"
> value too :-)
> 
> But yes, I agree with aw here, either do nothing or stick a set of
> iowriteXX_native or something equivalent in the generic iomap header,
> define them in term of iowrite32be/iowrite32 based on the compile time
> endian of the arch.


Ok. I'll do nothing.


> Hitting asm-generic/iomap.h I think will cover all archs except ARM.
> For ARM, just hack arch/arm/asm/io.h


-- 
Alexey
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-24 Thread Benjamin Herrenschmidt
On Wed, 2014-06-25 at 00:33 +1000, Alexey Kardashevskiy wrote:
> 
> I do not understand why @val is considered LE here and need to be
> converted
> to CPU. Really. I truly believe it should be cpu_to_le32().

No. Both are slightly wrong semantically but le32_to_cpu() is less
wrong :-)

iowrite32 supposedly takes a "cpu" value as argument and writes an "le"
value. So if anything, you need something that converts to a "cpu" value
before you call iowrite32.

Now it's still slightly wrong because the "input" to le32_to_cpu() is
supposed to be an "LE" value but of course here it's not, it's a "cpu"
value too :-)

But yes, I agree with aw here, either do nothing or stick a set of
iowriteXX_native or something equivalent in the generic iomap header,
define them in term of iowrite32be/iowrite32 based on the compile time
endian of the arch.

Hitting asm-generic/iomap.h I think will cover all archs except ARM.
For ARM, just hack arch/arm/asm/io.h

Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-24 Thread Benjamin Herrenschmidt
On Tue, 2014-06-24 at 12:41 +0200, Alexander Graf wrote:
> Is there actually any difference in generated code with this patch 
> applied and without? I would hope that iowrite..() is inlined and 
> cancels out the cpu_to_le..() calls that are also inlined?

No, the former uses byteswapping asm, the compiler can't "cancel" it
out, but the overhead of the additional byteswap might not be
measurable.

Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-24 Thread Alexey Kardashevskiy
On 06/25/2014 12:43 AM, Alex Williamson wrote:
> On Wed, 2014-06-25 at 00:33 +1000, Alexey Kardashevskiy wrote:
>> On 06/25/2014 12:21 AM, Alex Williamson wrote:
>>> On Tue, 2014-06-24 at 15:22 +0200, Alexander Graf wrote:
 On 24.06.14 15:01, Alexey Kardashevskiy wrote:
> On 06/24/2014 10:52 PM, Alexander Graf wrote:
>> On 24.06.14 14:50, Alexey Kardashevskiy wrote:
>>> On 06/24/2014 08:41 PM, Alexander Graf wrote:
 On 24.06.14 12:11, Alexey Kardashevskiy wrote:
> On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote:
>> On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:
>>
>>> Working on big endian being an accident may be a matter of 
>>> perspective
>> :-)
>>
>>> The comment remains that this patch doesn't actually fix anything 
>>> except
>>> the overhead on big endian systems doing redundant byte swapping and
>>> maybe the philosophy that vfio regions are little endian.
>> Yes, that works by accident because technically VFIO is a transport 
>> and
>> thus shouldn't perform any endian swapping of any sort, which remains
>> the responsibility of the end driver which is the only one to know
>> whether a given BAR location is a a register or some streaming data
>> and in the former case whether it's LE or BE (some PCI devices are BE
>> even ! :-)
>>
>> But yes, in the end, it works with the dual "cancelling" swaps and 
>> the
>> overhead of those swaps is probably drowned in the noise of the 
>> syscall
>> overhead.
>>
>>> I'm still not a fan of iowrite vs iowritebe, there must be 
>>> something we
>>> can use that doesn't have an implicit swap.
>> Sadly there isn't ... In the old day we didn't even have the "be"
>> variant and readl/writel style accessors still don't have them either
>> for all archs.
>>
>> There is __raw_readl/writel but here the semantics are much more than
>> just "don't swap", they also don't have memory barriers (which means
>> they are essentially useless to most drivers unless those are 
>> platform
>> specific drivers which know exactly what they are doing, or in the 
>> rare
>> cases such as accessing a framebuffer which we know never have side
>> effects).
>>
>>> Calling it iowrite*_native is also an abuse of the namespace.
>>> Next thing we know some common code
>>> will legitimately use that name.
>> I might make sense to those definitions into a common header. There 
>> have
>> been a handful of cases in the past that wanted that sort of "native
>> byte order" MMIOs iirc (though don't ask me for examples, I can't 
>> really
>> remember).
>>
>>> If we do need to define an alias
>>> (which I'd like to avoid) it should be something like 
>>> vfio_iowrite32.
> Ping?
>
> We need to make a decision whether to move those xxx_native() helpers
> somewhere (where?) or leave the patch as is (as we figured out that
> iowriteXX functions implement barriers and we cannot just use raw
> accessors) and fix commit log to explain everything.
 Is there actually any difference in generated code with this patch 
 applied
 and without? I would hope that iowrite..() is inlined and cancels out 
 the
 cpu_to_le..() calls that are also inlined?
>>> iowrite32 is a non-inline function so conversions take place so are the
>>> others. And sorry but I fail to see why this matters. We are not trying 
>>> to
>>> accelerate things, we are removing redundant operations which confuse
>>> people who read the code.
>> The confusion depends on where you're coming from. If you happen to know
>> that "iowrite32" writes in LE, then the LE conversion makes a lot of 
>> sense.
> It was like this (and this is just confusing):
>
> iowrite32(le32_to_cpu(val), io + off);
>
> What would make sense (according to you and I would understand this) is 
> this:
>
> iowrite32(cpu_to_le32(val), io + off);
>
>
> Or I missed your point, did I?

 No, you didn't miss it. I think for people who know how iowrite32() 
 works the above is obvious. I find the fact that iowrite32() writes in 
 LE always pretty scary though ;).

 So IMHO we should either create new, generic iowrite helpers that don't 
 do any endian swapping at all or do iowrite32(cpu_to_le32(val)) calls.
>>>
>>> I'm one of those people for whom iowrite32(le32_to_cpu(val)) makes sense
>>
>>
>> I do not understand why @val is considered LE here and need to be converted
>> to CPU. Really. I truly believe it should be 

Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-24 Thread Alex Williamson
On Wed, 2014-06-25 at 00:33 +1000, Alexey Kardashevskiy wrote:
> On 06/25/2014 12:21 AM, Alex Williamson wrote:
> > On Tue, 2014-06-24 at 15:22 +0200, Alexander Graf wrote:
> >> On 24.06.14 15:01, Alexey Kardashevskiy wrote:
> >>> On 06/24/2014 10:52 PM, Alexander Graf wrote:
>  On 24.06.14 14:50, Alexey Kardashevskiy wrote:
> > On 06/24/2014 08:41 PM, Alexander Graf wrote:
> >> On 24.06.14 12:11, Alexey Kardashevskiy wrote:
> >>> On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote:
>  On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:
> 
> > Working on big endian being an accident may be a matter of 
> > perspective
>  :-)
> 
> > The comment remains that this patch doesn't actually fix anything 
> > except
> > the overhead on big endian systems doing redundant byte swapping and
> > maybe the philosophy that vfio regions are little endian.
>  Yes, that works by accident because technically VFIO is a transport 
>  and
>  thus shouldn't perform any endian swapping of any sort, which remains
>  the responsibility of the end driver which is the only one to know
>  whether a given BAR location is a a register or some streaming data
>  and in the former case whether it's LE or BE (some PCI devices are BE
>  even ! :-)
> 
>  But yes, in the end, it works with the dual "cancelling" swaps and 
>  the
>  overhead of those swaps is probably drowned in the noise of the 
>  syscall
>  overhead.
> 
> > I'm still not a fan of iowrite vs iowritebe, there must be 
> > something we
> > can use that doesn't have an implicit swap.
>  Sadly there isn't ... In the old day we didn't even have the "be"
>  variant and readl/writel style accessors still don't have them either
>  for all archs.
> 
>  There is __raw_readl/writel but here the semantics are much more than
>  just "don't swap", they also don't have memory barriers (which means
>  they are essentially useless to most drivers unless those are 
>  platform
>  specific drivers which know exactly what they are doing, or in the 
>  rare
>  cases such as accessing a framebuffer which we know never have side
>  effects).
> 
> > Calling it iowrite*_native is also an abuse of the namespace.
> > Next thing we know some common code
> > will legitimately use that name.
>  I might make sense to those definitions into a common header. There 
>  have
>  been a handful of cases in the past that wanted that sort of "native
>  byte order" MMIOs iirc (though don't ask me for examples, I can't 
>  really
>  remember).
> 
> > If we do need to define an alias
> > (which I'd like to avoid) it should be something like 
> > vfio_iowrite32.
> >>> Ping?
> >>>
> >>> We need to make a decision whether to move those xxx_native() helpers
> >>> somewhere (where?) or leave the patch as is (as we figured out that
> >>> iowriteXX functions implement barriers and we cannot just use raw
> >>> accessors) and fix commit log to explain everything.
> >> Is there actually any difference in generated code with this patch 
> >> applied
> >> and without? I would hope that iowrite..() is inlined and cancels out 
> >> the
> >> cpu_to_le..() calls that are also inlined?
> > iowrite32 is a non-inline function so conversions take place so are the
> > others. And sorry but I fail to see why this matters. We are not trying 
> > to
> > accelerate things, we are removing redundant operations which confuse
> > people who read the code.
>  The confusion depends on where you're coming from. If you happen to know
>  that "iowrite32" writes in LE, then the LE conversion makes a lot of 
>  sense.
> >>> It was like this (and this is just confusing):
> >>>
> >>> iowrite32(le32_to_cpu(val), io + off);
> >>>
> >>> What would make sense (according to you and I would understand this) is 
> >>> this:
> >>>
> >>> iowrite32(cpu_to_le32(val), io + off);
> >>>
> >>>
> >>> Or I missed your point, did I?
> >>
> >> No, you didn't miss it. I think for people who know how iowrite32() 
> >> works the above is obvious. I find the fact that iowrite32() writes in 
> >> LE always pretty scary though ;).
> >>
> >> So IMHO we should either create new, generic iowrite helpers that don't 
> >> do any endian swapping at all or do iowrite32(cpu_to_le32(val)) calls.
> > 
> > I'm one of those people for whom iowrite32(le32_to_cpu(val)) makes sense
> 
> 
> I do not understand why @val is considered LE here and need to be converted
> to CPU. Really. I truly believe it should be cpu_to_le32().

Because iowrite32 is defined to take 

RE: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-24 Thread David Laight
From: Alexey Kardashevskiy
...
> >> So IMHO we should either create new, generic iowrite helpers that don't
> >> do any endian swapping at all or do iowrite32(cpu_to_le32(val)) calls.
> >
> > I'm one of those people for whom iowrite32(le32_to_cpu(val)) makes sense
> 
> 
> I do not understand why @val is considered LE here and need to be converted
> to CPU. Really. I truly believe it should be cpu_to_le32().

Think about it carefully...

Apparently iowrite32() is writing a 'cpu' value out 'le'.
So if you have a 'le' value you need to convert it to 'cpu' first.

I really won't ask how 'be' ppc managed to get 'le' on-chip peripherals.

David

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-24 Thread Alexey Kardashevskiy
On 06/25/2014 12:21 AM, Alex Williamson wrote:
> On Tue, 2014-06-24 at 15:22 +0200, Alexander Graf wrote:
>> On 24.06.14 15:01, Alexey Kardashevskiy wrote:
>>> On 06/24/2014 10:52 PM, Alexander Graf wrote:
 On 24.06.14 14:50, Alexey Kardashevskiy wrote:
> On 06/24/2014 08:41 PM, Alexander Graf wrote:
>> On 24.06.14 12:11, Alexey Kardashevskiy wrote:
>>> On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote:
 On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:

> Working on big endian being an accident may be a matter of perspective
 :-)

> The comment remains that this patch doesn't actually fix anything 
> except
> the overhead on big endian systems doing redundant byte swapping and
> maybe the philosophy that vfio regions are little endian.
 Yes, that works by accident because technically VFIO is a transport and
 thus shouldn't perform any endian swapping of any sort, which remains
 the responsibility of the end driver which is the only one to know
 whether a given BAR location is a a register or some streaming data
 and in the former case whether it's LE or BE (some PCI devices are BE
 even ! :-)

 But yes, in the end, it works with the dual "cancelling" swaps and the
 overhead of those swaps is probably drowned in the noise of the syscall
 overhead.

> I'm still not a fan of iowrite vs iowritebe, there must be something 
> we
> can use that doesn't have an implicit swap.
 Sadly there isn't ... In the old day we didn't even have the "be"
 variant and readl/writel style accessors still don't have them either
 for all archs.

 There is __raw_readl/writel but here the semantics are much more than
 just "don't swap", they also don't have memory barriers (which means
 they are essentially useless to most drivers unless those are platform
 specific drivers which know exactly what they are doing, or in the rare
 cases such as accessing a framebuffer which we know never have side
 effects).

> Calling it iowrite*_native is also an abuse of the namespace.
> Next thing we know some common code
> will legitimately use that name.
 I might make sense to those definitions into a common header. There 
 have
 been a handful of cases in the past that wanted that sort of "native
 byte order" MMIOs iirc (though don't ask me for examples, I can't 
 really
 remember).

> If we do need to define an alias
> (which I'd like to avoid) it should be something like vfio_iowrite32.
>>> Ping?
>>>
>>> We need to make a decision whether to move those xxx_native() helpers
>>> somewhere (where?) or leave the patch as is (as we figured out that
>>> iowriteXX functions implement barriers and we cannot just use raw
>>> accessors) and fix commit log to explain everything.
>> Is there actually any difference in generated code with this patch 
>> applied
>> and without? I would hope that iowrite..() is inlined and cancels out the
>> cpu_to_le..() calls that are also inlined?
> iowrite32 is a non-inline function so conversions take place so are the
> others. And sorry but I fail to see why this matters. We are not trying to
> accelerate things, we are removing redundant operations which confuse
> people who read the code.
 The confusion depends on where you're coming from. If you happen to know
 that "iowrite32" writes in LE, then the LE conversion makes a lot of sense.
>>> It was like this (and this is just confusing):
>>>
>>> iowrite32(le32_to_cpu(val), io + off);
>>>
>>> What would make sense (according to you and I would understand this) is 
>>> this:
>>>
>>> iowrite32(cpu_to_le32(val), io + off);
>>>
>>>
>>> Or I missed your point, did I?
>>
>> No, you didn't miss it. I think for people who know how iowrite32() 
>> works the above is obvious. I find the fact that iowrite32() writes in 
>> LE always pretty scary though ;).
>>
>> So IMHO we should either create new, generic iowrite helpers that don't 
>> do any endian swapping at all or do iowrite32(cpu_to_le32(val)) calls.
> 
> I'm one of those people for whom iowrite32(le32_to_cpu(val)) makes sense


I do not understand why @val is considered LE here and need to be converted
to CPU. Really. I truly believe it should be cpu_to_le32().


> and keeps the byte order consistent regardless of the platform, while
> iowrite32(val) or iowrite32be(val) makes me scratch my head and try to
> remember that the byte swaps are a nop on the given platforms.  As Ben
> noted, a native, no-swap ioread/write doesn't exist, but perhaps should.
> I'd prefer an attempt be made to make it exist before adding
> vfio-specific macros.  vfio is arguably 

Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-24 Thread Alex Williamson
On Tue, 2014-06-24 at 15:22 +0200, Alexander Graf wrote:
> On 24.06.14 15:01, Alexey Kardashevskiy wrote:
> > On 06/24/2014 10:52 PM, Alexander Graf wrote:
> >> On 24.06.14 14:50, Alexey Kardashevskiy wrote:
> >>> On 06/24/2014 08:41 PM, Alexander Graf wrote:
>  On 24.06.14 12:11, Alexey Kardashevskiy wrote:
> > On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote:
> >> On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:
> >>
> >>> Working on big endian being an accident may be a matter of perspective
> >> :-)
> >>
> >>> The comment remains that this patch doesn't actually fix anything 
> >>> except
> >>> the overhead on big endian systems doing redundant byte swapping and
> >>> maybe the philosophy that vfio regions are little endian.
> >> Yes, that works by accident because technically VFIO is a transport and
> >> thus shouldn't perform any endian swapping of any sort, which remains
> >> the responsibility of the end driver which is the only one to know
> >> whether a given BAR location is a a register or some streaming data
> >> and in the former case whether it's LE or BE (some PCI devices are BE
> >> even ! :-)
> >>
> >> But yes, in the end, it works with the dual "cancelling" swaps and the
> >> overhead of those swaps is probably drowned in the noise of the syscall
> >> overhead.
> >>
> >>> I'm still not a fan of iowrite vs iowritebe, there must be something 
> >>> we
> >>> can use that doesn't have an implicit swap.
> >> Sadly there isn't ... In the old day we didn't even have the "be"
> >> variant and readl/writel style accessors still don't have them either
> >> for all archs.
> >>
> >> There is __raw_readl/writel but here the semantics are much more than
> >> just "don't swap", they also don't have memory barriers (which means
> >> they are essentially useless to most drivers unless those are platform
> >> specific drivers which know exactly what they are doing, or in the rare
> >> cases such as accessing a framebuffer which we know never have side
> >> effects).
> >>
> >>> Calling it iowrite*_native is also an abuse of the namespace.
> >>> Next thing we know some common code
> >>> will legitimately use that name.
> >> I might make sense to those definitions into a common header. There 
> >> have
> >> been a handful of cases in the past that wanted that sort of "native
> >> byte order" MMIOs iirc (though don't ask me for examples, I can't 
> >> really
> >> remember).
> >>
> >>> If we do need to define an alias
> >>> (which I'd like to avoid) it should be something like vfio_iowrite32.
> > Ping?
> >
> > We need to make a decision whether to move those xxx_native() helpers
> > somewhere (where?) or leave the patch as is (as we figured out that
> > iowriteXX functions implement barriers and we cannot just use raw
> > accessors) and fix commit log to explain everything.
>  Is there actually any difference in generated code with this patch 
>  applied
>  and without? I would hope that iowrite..() is inlined and cancels out the
>  cpu_to_le..() calls that are also inlined?
> >>> iowrite32 is a non-inline function so conversions take place so are the
> >>> others. And sorry but I fail to see why this matters. We are not trying to
> >>> accelerate things, we are removing redundant operations which confuse
> >>> people who read the code.
> >> The confusion depends on where you're coming from. If you happen to know
> >> that "iowrite32" writes in LE, then the LE conversion makes a lot of sense.
> > It was like this (and this is just confusing):
> >
> > iowrite32(le32_to_cpu(val), io + off);
> >
> > What would make sense (according to you and I would understand this) is 
> > this:
> >
> > iowrite32(cpu_to_le32(val), io + off);
> >
> >
> > Or I missed your point, did I?
> 
> No, you didn't miss it. I think for people who know how iowrite32() 
> works the above is obvious. I find the fact that iowrite32() writes in 
> LE always pretty scary though ;).
> 
> So IMHO we should either create new, generic iowrite helpers that don't 
> do any endian swapping at all or do iowrite32(cpu_to_le32(val)) calls.

I'm one of those people for whom iowrite32(le32_to_cpu(val)) makes sense
and keeps the byte order consistent regardless of the platform, while
iowrite32(val) or iowrite32be(val) makes me scratch my head and try to
remember that the byte swaps are a nop on the given platforms.  As Ben
noted, a native, no-swap ioread/write doesn't exist, but perhaps should.
I'd prefer an attempt be made to make it exist before adding
vfio-specific macros.  vfio is arguably doing the right thing here given
the functions available.  Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More 

Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-24 Thread Alexander Graf


On 24.06.14 15:01, Alexey Kardashevskiy wrote:

On 06/24/2014 10:52 PM, Alexander Graf wrote:

On 24.06.14 14:50, Alexey Kardashevskiy wrote:

On 06/24/2014 08:41 PM, Alexander Graf wrote:

On 24.06.14 12:11, Alexey Kardashevskiy wrote:

On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote:

On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:


Working on big endian being an accident may be a matter of perspective

:-)


The comment remains that this patch doesn't actually fix anything except
the overhead on big endian systems doing redundant byte swapping and
maybe the philosophy that vfio regions are little endian.

Yes, that works by accident because technically VFIO is a transport and
thus shouldn't perform any endian swapping of any sort, which remains
the responsibility of the end driver which is the only one to know
whether a given BAR location is a a register or some streaming data
and in the former case whether it's LE or BE (some PCI devices are BE
even ! :-)

But yes, in the end, it works with the dual "cancelling" swaps and the
overhead of those swaps is probably drowned in the noise of the syscall
overhead.


I'm still not a fan of iowrite vs iowritebe, there must be something we
can use that doesn't have an implicit swap.

Sadly there isn't ... In the old day we didn't even have the "be"
variant and readl/writel style accessors still don't have them either
for all archs.

There is __raw_readl/writel but here the semantics are much more than
just "don't swap", they also don't have memory barriers (which means
they are essentially useless to most drivers unless those are platform
specific drivers which know exactly what they are doing, or in the rare
cases such as accessing a framebuffer which we know never have side
effects).


Calling it iowrite*_native is also an abuse of the namespace.
Next thing we know some common code
will legitimately use that name.

I might make sense to those definitions into a common header. There have
been a handful of cases in the past that wanted that sort of "native
byte order" MMIOs iirc (though don't ask me for examples, I can't really
remember).


If we do need to define an alias
(which I'd like to avoid) it should be something like vfio_iowrite32.

Ping?

We need to make a decision whether to move those xxx_native() helpers
somewhere (where?) or leave the patch as is (as we figured out that
iowriteXX functions implement barriers and we cannot just use raw
accessors) and fix commit log to explain everything.

Is there actually any difference in generated code with this patch applied
and without? I would hope that iowrite..() is inlined and cancels out the
cpu_to_le..() calls that are also inlined?

iowrite32 is a non-inline function so conversions take place so are the
others. And sorry but I fail to see why this matters. We are not trying to
accelerate things, we are removing redundant operations which confuse
people who read the code.

The confusion depends on where you're coming from. If you happen to know
that "iowrite32" writes in LE, then the LE conversion makes a lot of sense.

It was like this (and this is just confusing):

iowrite32(le32_to_cpu(val), io + off);

What would make sense (according to you and I would understand this) is this:

iowrite32(cpu_to_le32(val), io + off);


Or I missed your point, did I?


No, you didn't miss it. I think for people who know how iowrite32() 
works the above is obvious. I find the fact that iowrite32() writes in 
LE always pretty scary though ;).


So IMHO we should either create new, generic iowrite helpers that don't 
do any endian swapping at all or do iowrite32(cpu_to_le32(val)) calls.



Alex

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-24 Thread Alexey Kardashevskiy
On 06/24/2014 10:52 PM, Alexander Graf wrote:
> 
> On 24.06.14 14:50, Alexey Kardashevskiy wrote:
>> On 06/24/2014 08:41 PM, Alexander Graf wrote:
>>> On 24.06.14 12:11, Alexey Kardashevskiy wrote:
 On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:
>
>> Working on big endian being an accident may be a matter of perspective
>:-)
>
>> The comment remains that this patch doesn't actually fix anything except
>> the overhead on big endian systems doing redundant byte swapping and
>> maybe the philosophy that vfio regions are little endian.
> Yes, that works by accident because technically VFIO is a transport and
> thus shouldn't perform any endian swapping of any sort, which remains
> the responsibility of the end driver which is the only one to know
> whether a given BAR location is a a register or some streaming data
> and in the former case whether it's LE or BE (some PCI devices are BE
> even ! :-)
>
> But yes, in the end, it works with the dual "cancelling" swaps and the
> overhead of those swaps is probably drowned in the noise of the syscall
> overhead.
>
>> I'm still not a fan of iowrite vs iowritebe, there must be something we
>> can use that doesn't have an implicit swap.
> Sadly there isn't ... In the old day we didn't even have the "be"
> variant and readl/writel style accessors still don't have them either
> for all archs.
>
> There is __raw_readl/writel but here the semantics are much more than
> just "don't swap", they also don't have memory barriers (which means
> they are essentially useless to most drivers unless those are platform
> specific drivers which know exactly what they are doing, or in the rare
> cases such as accessing a framebuffer which we know never have side
> effects).
>
>>Calling it iowrite*_native is also an abuse of the namespace.
>>Next thing we know some common code
>> will legitimately use that name.
> I might make sense to those definitions into a common header. There have
> been a handful of cases in the past that wanted that sort of "native
> byte order" MMIOs iirc (though don't ask me for examples, I can't really
> remember).
>
>>If we do need to define an alias
>> (which I'd like to avoid) it should be something like vfio_iowrite32.
 Ping?

 We need to make a decision whether to move those xxx_native() helpers
 somewhere (where?) or leave the patch as is (as we figured out that
 iowriteXX functions implement barriers and we cannot just use raw
 accessors) and fix commit log to explain everything.
>>> Is there actually any difference in generated code with this patch applied
>>> and without? I would hope that iowrite..() is inlined and cancels out the
>>> cpu_to_le..() calls that are also inlined?
>> iowrite32 is a non-inline function so conversions take place so are the
>> others. And sorry but I fail to see why this matters. We are not trying to
>> accelerate things, we are removing redundant operations which confuse
>> people who read the code.
> 
> The confusion depends on where you're coming from. If you happen to know
> that "iowrite32" writes in LE, then the LE conversion makes a lot of sense.

It was like this (and this is just confusing):

iowrite32(le32_to_cpu(val), io + off);

What would make sense (according to you and I would understand this) is this:

iowrite32(cpu_to_le32(val), io + off);


Or I missed your point, did I?


> I don't have a strong feeling either way though and will let Alex decide on
> the path forward :)

It would probably help if you picked the side :)


-- 
Alexey
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-24 Thread Alexander Graf


On 24.06.14 14:50, Alexey Kardashevskiy wrote:

On 06/24/2014 08:41 PM, Alexander Graf wrote:

On 24.06.14 12:11, Alexey Kardashevskiy wrote:

On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote:

On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:


Working on big endian being an accident may be a matter of perspective

   :-)


The comment remains that this patch doesn't actually fix anything except
the overhead on big endian systems doing redundant byte swapping and
maybe the philosophy that vfio regions are little endian.

Yes, that works by accident because technically VFIO is a transport and
thus shouldn't perform any endian swapping of any sort, which remains
the responsibility of the end driver which is the only one to know
whether a given BAR location is a a register or some streaming data
and in the former case whether it's LE or BE (some PCI devices are BE
even ! :-)

But yes, in the end, it works with the dual "cancelling" swaps and the
overhead of those swaps is probably drowned in the noise of the syscall
overhead.


I'm still not a fan of iowrite vs iowritebe, there must be something we
can use that doesn't have an implicit swap.

Sadly there isn't ... In the old day we didn't even have the "be"
variant and readl/writel style accessors still don't have them either
for all archs.

There is __raw_readl/writel but here the semantics are much more than
just "don't swap", they also don't have memory barriers (which means
they are essentially useless to most drivers unless those are platform
specific drivers which know exactly what they are doing, or in the rare
cases such as accessing a framebuffer which we know never have side
effects).


   Calling it iowrite*_native is also an abuse of the namespace.
   Next thing we know some common code
will legitimately use that name.

I might make sense to those definitions into a common header. There have
been a handful of cases in the past that wanted that sort of "native
byte order" MMIOs iirc (though don't ask me for examples, I can't really
remember).


   If we do need to define an alias
(which I'd like to avoid) it should be something like vfio_iowrite32.

Ping?

We need to make a decision whether to move those xxx_native() helpers
somewhere (where?) or leave the patch as is (as we figured out that
iowriteXX functions implement barriers and we cannot just use raw
accessors) and fix commit log to explain everything.

Is there actually any difference in generated code with this patch applied
and without? I would hope that iowrite..() is inlined and cancels out the
cpu_to_le..() calls that are also inlined?

iowrite32 is a non-inline function so conversions take place so are the
others. And sorry but I fail to see why this matters. We are not trying to
accelerate things, we are removing redundant operations which confuse
people who read the code.


The confusion depends on where you're coming from. If you happen to know 
that "iowrite32" writes in LE, then the LE conversion makes a lot of sense.


I don't have a strong feeling either way though and will let Alex decide 
on the path forward :).



Alex

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-24 Thread Alexey Kardashevskiy
On 06/24/2014 08:41 PM, Alexander Graf wrote:
> 
> On 24.06.14 12:11, Alexey Kardashevskiy wrote:
>> On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote:
>>> On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:
>>>
 Working on big endian being an accident may be a matter of perspective
>>>   :-)
>>>
 The comment remains that this patch doesn't actually fix anything except
 the overhead on big endian systems doing redundant byte swapping and
 maybe the philosophy that vfio regions are little endian.
>>> Yes, that works by accident because technically VFIO is a transport and
>>> thus shouldn't perform any endian swapping of any sort, which remains
>>> the responsibility of the end driver which is the only one to know
>>> whether a given BAR location is a a register or some streaming data
>>> and in the former case whether it's LE or BE (some PCI devices are BE
>>> even ! :-)
>>>
>>> But yes, in the end, it works with the dual "cancelling" swaps and the
>>> overhead of those swaps is probably drowned in the noise of the syscall
>>> overhead.
>>>
 I'm still not a fan of iowrite vs iowritebe, there must be something we
 can use that doesn't have an implicit swap.
>>> Sadly there isn't ... In the old day we didn't even have the "be"
>>> variant and readl/writel style accessors still don't have them either
>>> for all archs.
>>>
>>> There is __raw_readl/writel but here the semantics are much more than
>>> just "don't swap", they also don't have memory barriers (which means
>>> they are essentially useless to most drivers unless those are platform
>>> specific drivers which know exactly what they are doing, or in the rare
>>> cases such as accessing a framebuffer which we know never have side
>>> effects).
>>>
   Calling it iowrite*_native is also an abuse of the namespace.
>>>
   Next thing we know some common code
 will legitimately use that name.
>>> I might make sense to those definitions into a common header. There have
>>> been a handful of cases in the past that wanted that sort of "native
>>> byte order" MMIOs iirc (though don't ask me for examples, I can't really
>>> remember).
>>>
   If we do need to define an alias
 (which I'd like to avoid) it should be something like vfio_iowrite32.
>>
>> Ping?
>>
>> We need to make a decision whether to move those xxx_native() helpers
>> somewhere (where?) or leave the patch as is (as we figured out that
>> iowriteXX functions implement barriers and we cannot just use raw
>> accessors) and fix commit log to explain everything.
> 
> Is there actually any difference in generated code with this patch applied
> and without? I would hope that iowrite..() is inlined and cancels out the
> cpu_to_le..() calls that are also inlined?

iowrite32 is a non-inline function so conversions take place so are the
others. And sorry but I fail to see why this matters. We are not trying to
accelerate things, we are removing redundant operations which confuse
people who read the code.


-- 
Alexey
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-24 Thread Alexander Graf


On 24.06.14 12:11, Alexey Kardashevskiy wrote:

On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote:

On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:


Working on big endian being an accident may be a matter of perspective

  :-)


The comment remains that this patch doesn't actually fix anything except
the overhead on big endian systems doing redundant byte swapping and
maybe the philosophy that vfio regions are little endian.

Yes, that works by accident because technically VFIO is a transport and
thus shouldn't perform any endian swapping of any sort, which remains
the responsibility of the end driver which is the only one to know
whether a given BAR location is a a register or some streaming data
and in the former case whether it's LE or BE (some PCI devices are BE
even ! :-)

But yes, in the end, it works with the dual "cancelling" swaps and the
overhead of those swaps is probably drowned in the noise of the syscall
overhead.


I'm still not a fan of iowrite vs iowritebe, there must be something we
can use that doesn't have an implicit swap.

Sadly there isn't ... In the old day we didn't even have the "be"
variant and readl/writel style accessors still don't have them either
for all archs.

There is __raw_readl/writel but here the semantics are much more than
just "don't swap", they also don't have memory barriers (which means
they are essentially useless to most drivers unless those are platform
specific drivers which know exactly what they are doing, or in the rare
cases such as accessing a framebuffer which we know never have side
effects).


  Calling it iowrite*_native is also an abuse of the namespace.



  Next thing we know some common code
will legitimately use that name.

I might make sense to those definitions into a common header. There have
been a handful of cases in the past that wanted that sort of "native
byte order" MMIOs iirc (though don't ask me for examples, I can't really
remember).


  If we do need to define an alias
(which I'd like to avoid) it should be something like vfio_iowrite32.


Ping?

We need to make a decision whether to move those xxx_native() helpers
somewhere (where?) or leave the patch as is (as we figured out that
iowriteXX functions implement barriers and we cannot just use raw
accessors) and fix commit log to explain everything.


Is there actually any difference in generated code with this patch 
applied and without? I would hope that iowrite..() is inlined and 
cancels out the cpu_to_le..() calls that are also inlined?



Alex

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-24 Thread Alexey Kardashevskiy
On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:
> 
>> Working on big endian being an accident may be a matter of perspective
> 
>  :-)
> 
>> The comment remains that this patch doesn't actually fix anything except
>> the overhead on big endian systems doing redundant byte swapping and
>> maybe the philosophy that vfio regions are little endian.
> 
> Yes, that works by accident because technically VFIO is a transport and
> thus shouldn't perform any endian swapping of any sort, which remains
> the responsibility of the end driver which is the only one to know
> whether a given BAR location is a a register or some streaming data
> and in the former case whether it's LE or BE (some PCI devices are BE
> even ! :-)
> 
> But yes, in the end, it works with the dual "cancelling" swaps and the
> overhead of those swaps is probably drowned in the noise of the syscall
> overhead.
> 
>> I'm still not a fan of iowrite vs iowritebe, there must be something we
>> can use that doesn't have an implicit swap.
> 
> Sadly there isn't ... In the old day we didn't even have the "be"
> variant and readl/writel style accessors still don't have them either
> for all archs.
> 
> There is __raw_readl/writel but here the semantics are much more than
> just "don't swap", they also don't have memory barriers (which means
> they are essentially useless to most drivers unless those are platform
> specific drivers which know exactly what they are doing, or in the rare
> cases such as accessing a framebuffer which we know never have side
> effects). 
> 
>>  Calling it iowrite*_native is also an abuse of the namespace.
> 
> 
>>  Next thing we know some common code
>> will legitimately use that name. 
> 
> I might make sense to those definitions into a common header. There have
> been a handful of cases in the past that wanted that sort of "native
> byte order" MMIOs iirc (though don't ask me for examples, I can't really
> remember).
> 
>>  If we do need to define an alias
>> (which I'd like to avoid) it should be something like vfio_iowrite32.


Ping?

We need to make a decision whether to move those xxx_native() helpers
somewhere (where?) or leave the patch as is (as we figured out that
iowriteXX functions implement barriers and we cannot just use raw
accessors) and fix commit log to explain everything.

Thanks!




>> Thanks,
> 
> Cheers,
> Ben.
> 
>> Alex
>>
 ===

 any better?




>>> Suggested-by: Benjamin Herrenschmidt 
>>> Signed-off-by: Alexey Kardashevskiy 
>>> ---
>>>  drivers/vfio/pci/vfio_pci_rdwr.c | 20 
>>>  1 file changed, 16 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c 
>>> b/drivers/vfio/pci/vfio_pci_rdwr.c
>>> index 210db24..f363b5a 100644
>>> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
>>> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
>>> @@ -21,6 +21,18 @@
>>>  
>>>  #include "vfio_pci_private.h"
>>>  
>>> +#ifdef __BIG_ENDIAN__
>>> +#define ioread16_nativeioread16be
>>> +#define ioread32_nativeioread32be
>>> +#define iowrite16_native   iowrite16be
>>> +#define iowrite32_native   iowrite32be
>>> +#else
>>> +#define ioread16_nativeioread16
>>> +#define ioread32_nativeioread32
>>> +#define iowrite16_native   iowrite16
>>> +#define iowrite32_native   iowrite32
>>> +#endif
>>> +
>>>  /*
>>>   * Read or write from an __iomem region (MMIO or I/O port) with an 
>>> excluded
>>>   * range which is inaccessible.  The excluded range drops writes and 
>>> fills
>>> @@ -50,9 +62,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user 
>>> *buf,
>>> if (copy_from_user(, buf, 4))
>>> return -EFAULT;
>>>  
>>> -   iowrite32(le32_to_cpu(val), io + off);
>>> +   iowrite32_native(val, io + off);
>>> } else {
>>> -   val = cpu_to_le32(ioread32(io + off));
>>> +   val = ioread32_native(io + off);
>>>  
>>> if (copy_to_user(buf, , 4))
>>> return -EFAULT;
>>> @@ -66,9 +78,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user 
>>> *buf,
>>> if (copy_from_user(, buf, 2))
>>> return -EFAULT;
>>>  
>>> -   iowrite16(le16_to_cpu(val), io + off);
>>> +   iowrite16_native(val, io + off);
>>> } else {
>>> -   val = cpu_to_le16(ioread16(io + off));
>>> +   

Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-24 Thread Alexey Kardashevskiy
On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote:
 On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:
 
 Working on big endian being an accident may be a matter of perspective
 
  :-)
 
 The comment remains that this patch doesn't actually fix anything except
 the overhead on big endian systems doing redundant byte swapping and
 maybe the philosophy that vfio regions are little endian.
 
 Yes, that works by accident because technically VFIO is a transport and
 thus shouldn't perform any endian swapping of any sort, which remains
 the responsibility of the end driver which is the only one to know
 whether a given BAR location is a a register or some streaming data
 and in the former case whether it's LE or BE (some PCI devices are BE
 even ! :-)
 
 But yes, in the end, it works with the dual cancelling swaps and the
 overhead of those swaps is probably drowned in the noise of the syscall
 overhead.
 
 I'm still not a fan of iowrite vs iowritebe, there must be something we
 can use that doesn't have an implicit swap.
 
 Sadly there isn't ... In the old day we didn't even have the be
 variant and readl/writel style accessors still don't have them either
 for all archs.
 
 There is __raw_readl/writel but here the semantics are much more than
 just don't swap, they also don't have memory barriers (which means
 they are essentially useless to most drivers unless those are platform
 specific drivers which know exactly what they are doing, or in the rare
 cases such as accessing a framebuffer which we know never have side
 effects). 
 
  Calling it iowrite*_native is also an abuse of the namespace.
 
 
  Next thing we know some common code
 will legitimately use that name. 
 
 I might make sense to those definitions into a common header. There have
 been a handful of cases in the past that wanted that sort of native
 byte order MMIOs iirc (though don't ask me for examples, I can't really
 remember).
 
  If we do need to define an alias
 (which I'd like to avoid) it should be something like vfio_iowrite32.


Ping?

We need to make a decision whether to move those xxx_native() helpers
somewhere (where?) or leave the patch as is (as we figured out that
iowriteXX functions implement barriers and we cannot just use raw
accessors) and fix commit log to explain everything.

Thanks!




 Thanks,
 
 Cheers,
 Ben.
 
 Alex

 ===

 any better?




 Suggested-by: Benjamin Herrenschmidt b...@kernel.crashing.org
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
  drivers/vfio/pci/vfio_pci_rdwr.c | 20 
  1 file changed, 16 insertions(+), 4 deletions(-)

 diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c 
 b/drivers/vfio/pci/vfio_pci_rdwr.c
 index 210db24..f363b5a 100644
 --- a/drivers/vfio/pci/vfio_pci_rdwr.c
 +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
 @@ -21,6 +21,18 @@
  
  #include vfio_pci_private.h
  
 +#ifdef __BIG_ENDIAN__
 +#define ioread16_nativeioread16be
 +#define ioread32_nativeioread32be
 +#define iowrite16_native   iowrite16be
 +#define iowrite32_native   iowrite32be
 +#else
 +#define ioread16_nativeioread16
 +#define ioread32_nativeioread32
 +#define iowrite16_native   iowrite16
 +#define iowrite32_native   iowrite32
 +#endif
 +
  /*
   * Read or write from an __iomem region (MMIO or I/O port) with an 
 excluded
   * range which is inaccessible.  The excluded range drops writes and 
 fills
 @@ -50,9 +62,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user 
 *buf,
 if (copy_from_user(val, buf, 4))
 return -EFAULT;
  
 -   iowrite32(le32_to_cpu(val), io + off);
 +   iowrite32_native(val, io + off);
 } else {
 -   val = cpu_to_le32(ioread32(io + off));
 +   val = ioread32_native(io + off);
  
 if (copy_to_user(buf, val, 4))
 return -EFAULT;
 @@ -66,9 +78,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user 
 *buf,
 if (copy_from_user(val, buf, 2))
 return -EFAULT;
  
 -   iowrite16(le16_to_cpu(val), io + off);
 +   iowrite16_native(val, io + off);
 } else {
 -   val = cpu_to_le16(ioread16(io + off));
 +   val = ioread16_native(io + off);
  
 if (copy_to_user(buf, val, 2))
 return -EFAULT;




-- 
Alexey
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-24 Thread Alexander Graf


On 24.06.14 12:11, Alexey Kardashevskiy wrote:

On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote:

On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:


Working on big endian being an accident may be a matter of perspective

  :-)


The comment remains that this patch doesn't actually fix anything except
the overhead on big endian systems doing redundant byte swapping and
maybe the philosophy that vfio regions are little endian.

Yes, that works by accident because technically VFIO is a transport and
thus shouldn't perform any endian swapping of any sort, which remains
the responsibility of the end driver which is the only one to know
whether a given BAR location is a a register or some streaming data
and in the former case whether it's LE or BE (some PCI devices are BE
even ! :-)

But yes, in the end, it works with the dual cancelling swaps and the
overhead of those swaps is probably drowned in the noise of the syscall
overhead.


I'm still not a fan of iowrite vs iowritebe, there must be something we
can use that doesn't have an implicit swap.

Sadly there isn't ... In the old day we didn't even have the be
variant and readl/writel style accessors still don't have them either
for all archs.

There is __raw_readl/writel but here the semantics are much more than
just don't swap, they also don't have memory barriers (which means
they are essentially useless to most drivers unless those are platform
specific drivers which know exactly what they are doing, or in the rare
cases such as accessing a framebuffer which we know never have side
effects).


  Calling it iowrite*_native is also an abuse of the namespace.



  Next thing we know some common code
will legitimately use that name.

I might make sense to those definitions into a common header. There have
been a handful of cases in the past that wanted that sort of native
byte order MMIOs iirc (though don't ask me for examples, I can't really
remember).


  If we do need to define an alias
(which I'd like to avoid) it should be something like vfio_iowrite32.


Ping?

We need to make a decision whether to move those xxx_native() helpers
somewhere (where?) or leave the patch as is (as we figured out that
iowriteXX functions implement barriers and we cannot just use raw
accessors) and fix commit log to explain everything.


Is there actually any difference in generated code with this patch 
applied and without? I would hope that iowrite..() is inlined and 
cancels out the cpu_to_le..() calls that are also inlined?



Alex

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-24 Thread Alexey Kardashevskiy
On 06/24/2014 08:41 PM, Alexander Graf wrote:
 
 On 24.06.14 12:11, Alexey Kardashevskiy wrote:
 On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote:
 On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:

 Working on big endian being an accident may be a matter of perspective
   :-)

 The comment remains that this patch doesn't actually fix anything except
 the overhead on big endian systems doing redundant byte swapping and
 maybe the philosophy that vfio regions are little endian.
 Yes, that works by accident because technically VFIO is a transport and
 thus shouldn't perform any endian swapping of any sort, which remains
 the responsibility of the end driver which is the only one to know
 whether a given BAR location is a a register or some streaming data
 and in the former case whether it's LE or BE (some PCI devices are BE
 even ! :-)

 But yes, in the end, it works with the dual cancelling swaps and the
 overhead of those swaps is probably drowned in the noise of the syscall
 overhead.

 I'm still not a fan of iowrite vs iowritebe, there must be something we
 can use that doesn't have an implicit swap.
 Sadly there isn't ... In the old day we didn't even have the be
 variant and readl/writel style accessors still don't have them either
 for all archs.

 There is __raw_readl/writel but here the semantics are much more than
 just don't swap, they also don't have memory barriers (which means
 they are essentially useless to most drivers unless those are platform
 specific drivers which know exactly what they are doing, or in the rare
 cases such as accessing a framebuffer which we know never have side
 effects).

   Calling it iowrite*_native is also an abuse of the namespace.

   Next thing we know some common code
 will legitimately use that name.
 I might make sense to those definitions into a common header. There have
 been a handful of cases in the past that wanted that sort of native
 byte order MMIOs iirc (though don't ask me for examples, I can't really
 remember).

   If we do need to define an alias
 (which I'd like to avoid) it should be something like vfio_iowrite32.

 Ping?

 We need to make a decision whether to move those xxx_native() helpers
 somewhere (where?) or leave the patch as is (as we figured out that
 iowriteXX functions implement barriers and we cannot just use raw
 accessors) and fix commit log to explain everything.
 
 Is there actually any difference in generated code with this patch applied
 and without? I would hope that iowrite..() is inlined and cancels out the
 cpu_to_le..() calls that are also inlined?

iowrite32 is a non-inline function so conversions take place so are the
others. And sorry but I fail to see why this matters. We are not trying to
accelerate things, we are removing redundant operations which confuse
people who read the code.


-- 
Alexey
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-24 Thread Alexander Graf


On 24.06.14 14:50, Alexey Kardashevskiy wrote:

On 06/24/2014 08:41 PM, Alexander Graf wrote:

On 24.06.14 12:11, Alexey Kardashevskiy wrote:

On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote:

On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:


Working on big endian being an accident may be a matter of perspective

   :-)


The comment remains that this patch doesn't actually fix anything except
the overhead on big endian systems doing redundant byte swapping and
maybe the philosophy that vfio regions are little endian.

Yes, that works by accident because technically VFIO is a transport and
thus shouldn't perform any endian swapping of any sort, which remains
the responsibility of the end driver which is the only one to know
whether a given BAR location is a a register or some streaming data
and in the former case whether it's LE or BE (some PCI devices are BE
even ! :-)

But yes, in the end, it works with the dual cancelling swaps and the
overhead of those swaps is probably drowned in the noise of the syscall
overhead.


I'm still not a fan of iowrite vs iowritebe, there must be something we
can use that doesn't have an implicit swap.

Sadly there isn't ... In the old day we didn't even have the be
variant and readl/writel style accessors still don't have them either
for all archs.

There is __raw_readl/writel but here the semantics are much more than
just don't swap, they also don't have memory barriers (which means
they are essentially useless to most drivers unless those are platform
specific drivers which know exactly what they are doing, or in the rare
cases such as accessing a framebuffer which we know never have side
effects).


   Calling it iowrite*_native is also an abuse of the namespace.
   Next thing we know some common code
will legitimately use that name.

I might make sense to those definitions into a common header. There have
been a handful of cases in the past that wanted that sort of native
byte order MMIOs iirc (though don't ask me for examples, I can't really
remember).


   If we do need to define an alias
(which I'd like to avoid) it should be something like vfio_iowrite32.

Ping?

We need to make a decision whether to move those xxx_native() helpers
somewhere (where?) or leave the patch as is (as we figured out that
iowriteXX functions implement barriers and we cannot just use raw
accessors) and fix commit log to explain everything.

Is there actually any difference in generated code with this patch applied
and without? I would hope that iowrite..() is inlined and cancels out the
cpu_to_le..() calls that are also inlined?

iowrite32 is a non-inline function so conversions take place so are the
others. And sorry but I fail to see why this matters. We are not trying to
accelerate things, we are removing redundant operations which confuse
people who read the code.


The confusion depends on where you're coming from. If you happen to know 
that iowrite32 writes in LE, then the LE conversion makes a lot of sense.


I don't have a strong feeling either way though and will let Alex decide 
on the path forward :).



Alex

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-24 Thread Alexey Kardashevskiy
On 06/24/2014 10:52 PM, Alexander Graf wrote:
 
 On 24.06.14 14:50, Alexey Kardashevskiy wrote:
 On 06/24/2014 08:41 PM, Alexander Graf wrote:
 On 24.06.14 12:11, Alexey Kardashevskiy wrote:
 On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote:
 On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:

 Working on big endian being an accident may be a matter of perspective
:-)

 The comment remains that this patch doesn't actually fix anything except
 the overhead on big endian systems doing redundant byte swapping and
 maybe the philosophy that vfio regions are little endian.
 Yes, that works by accident because technically VFIO is a transport and
 thus shouldn't perform any endian swapping of any sort, which remains
 the responsibility of the end driver which is the only one to know
 whether a given BAR location is a a register or some streaming data
 and in the former case whether it's LE or BE (some PCI devices are BE
 even ! :-)

 But yes, in the end, it works with the dual cancelling swaps and the
 overhead of those swaps is probably drowned in the noise of the syscall
 overhead.

 I'm still not a fan of iowrite vs iowritebe, there must be something we
 can use that doesn't have an implicit swap.
 Sadly there isn't ... In the old day we didn't even have the be
 variant and readl/writel style accessors still don't have them either
 for all archs.

 There is __raw_readl/writel but here the semantics are much more than
 just don't swap, they also don't have memory barriers (which means
 they are essentially useless to most drivers unless those are platform
 specific drivers which know exactly what they are doing, or in the rare
 cases such as accessing a framebuffer which we know never have side
 effects).

Calling it iowrite*_native is also an abuse of the namespace.
Next thing we know some common code
 will legitimately use that name.
 I might make sense to those definitions into a common header. There have
 been a handful of cases in the past that wanted that sort of native
 byte order MMIOs iirc (though don't ask me for examples, I can't really
 remember).

If we do need to define an alias
 (which I'd like to avoid) it should be something like vfio_iowrite32.
 Ping?

 We need to make a decision whether to move those xxx_native() helpers
 somewhere (where?) or leave the patch as is (as we figured out that
 iowriteXX functions implement barriers and we cannot just use raw
 accessors) and fix commit log to explain everything.
 Is there actually any difference in generated code with this patch applied
 and without? I would hope that iowrite..() is inlined and cancels out the
 cpu_to_le..() calls that are also inlined?
 iowrite32 is a non-inline function so conversions take place so are the
 others. And sorry but I fail to see why this matters. We are not trying to
 accelerate things, we are removing redundant operations which confuse
 people who read the code.
 
 The confusion depends on where you're coming from. If you happen to know
 that iowrite32 writes in LE, then the LE conversion makes a lot of sense.

It was like this (and this is just confusing):

iowrite32(le32_to_cpu(val), io + off);

What would make sense (according to you and I would understand this) is this:

iowrite32(cpu_to_le32(val), io + off);


Or I missed your point, did I?


 I don't have a strong feeling either way though and will let Alex decide on
 the path forward :)

It would probably help if you picked the side :)


-- 
Alexey
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-24 Thread Alexander Graf


On 24.06.14 15:01, Alexey Kardashevskiy wrote:

On 06/24/2014 10:52 PM, Alexander Graf wrote:

On 24.06.14 14:50, Alexey Kardashevskiy wrote:

On 06/24/2014 08:41 PM, Alexander Graf wrote:

On 24.06.14 12:11, Alexey Kardashevskiy wrote:

On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote:

On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:


Working on big endian being an accident may be a matter of perspective

:-)


The comment remains that this patch doesn't actually fix anything except
the overhead on big endian systems doing redundant byte swapping and
maybe the philosophy that vfio regions are little endian.

Yes, that works by accident because technically VFIO is a transport and
thus shouldn't perform any endian swapping of any sort, which remains
the responsibility of the end driver which is the only one to know
whether a given BAR location is a a register or some streaming data
and in the former case whether it's LE or BE (some PCI devices are BE
even ! :-)

But yes, in the end, it works with the dual cancelling swaps and the
overhead of those swaps is probably drowned in the noise of the syscall
overhead.


I'm still not a fan of iowrite vs iowritebe, there must be something we
can use that doesn't have an implicit swap.

Sadly there isn't ... In the old day we didn't even have the be
variant and readl/writel style accessors still don't have them either
for all archs.

There is __raw_readl/writel but here the semantics are much more than
just don't swap, they also don't have memory barriers (which means
they are essentially useless to most drivers unless those are platform
specific drivers which know exactly what they are doing, or in the rare
cases such as accessing a framebuffer which we know never have side
effects).


Calling it iowrite*_native is also an abuse of the namespace.
Next thing we know some common code
will legitimately use that name.

I might make sense to those definitions into a common header. There have
been a handful of cases in the past that wanted that sort of native
byte order MMIOs iirc (though don't ask me for examples, I can't really
remember).


If we do need to define an alias
(which I'd like to avoid) it should be something like vfio_iowrite32.

Ping?

We need to make a decision whether to move those xxx_native() helpers
somewhere (where?) or leave the patch as is (as we figured out that
iowriteXX functions implement barriers and we cannot just use raw
accessors) and fix commit log to explain everything.

Is there actually any difference in generated code with this patch applied
and without? I would hope that iowrite..() is inlined and cancels out the
cpu_to_le..() calls that are also inlined?

iowrite32 is a non-inline function so conversions take place so are the
others. And sorry but I fail to see why this matters. We are not trying to
accelerate things, we are removing redundant operations which confuse
people who read the code.

The confusion depends on where you're coming from. If you happen to know
that iowrite32 writes in LE, then the LE conversion makes a lot of sense.

It was like this (and this is just confusing):

iowrite32(le32_to_cpu(val), io + off);

What would make sense (according to you and I would understand this) is this:

iowrite32(cpu_to_le32(val), io + off);


Or I missed your point, did I?


No, you didn't miss it. I think for people who know how iowrite32() 
works the above is obvious. I find the fact that iowrite32() writes in 
LE always pretty scary though ;).


So IMHO we should either create new, generic iowrite helpers that don't 
do any endian swapping at all or do iowrite32(cpu_to_le32(val)) calls.



Alex

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-24 Thread Alex Williamson
On Tue, 2014-06-24 at 15:22 +0200, Alexander Graf wrote:
 On 24.06.14 15:01, Alexey Kardashevskiy wrote:
  On 06/24/2014 10:52 PM, Alexander Graf wrote:
  On 24.06.14 14:50, Alexey Kardashevskiy wrote:
  On 06/24/2014 08:41 PM, Alexander Graf wrote:
  On 24.06.14 12:11, Alexey Kardashevskiy wrote:
  On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote:
  On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:
 
  Working on big endian being an accident may be a matter of perspective
  :-)
 
  The comment remains that this patch doesn't actually fix anything 
  except
  the overhead on big endian systems doing redundant byte swapping and
  maybe the philosophy that vfio regions are little endian.
  Yes, that works by accident because technically VFIO is a transport and
  thus shouldn't perform any endian swapping of any sort, which remains
  the responsibility of the end driver which is the only one to know
  whether a given BAR location is a a register or some streaming data
  and in the former case whether it's LE or BE (some PCI devices are BE
  even ! :-)
 
  But yes, in the end, it works with the dual cancelling swaps and the
  overhead of those swaps is probably drowned in the noise of the syscall
  overhead.
 
  I'm still not a fan of iowrite vs iowritebe, there must be something 
  we
  can use that doesn't have an implicit swap.
  Sadly there isn't ... In the old day we didn't even have the be
  variant and readl/writel style accessors still don't have them either
  for all archs.
 
  There is __raw_readl/writel but here the semantics are much more than
  just don't swap, they also don't have memory barriers (which means
  they are essentially useless to most drivers unless those are platform
  specific drivers which know exactly what they are doing, or in the rare
  cases such as accessing a framebuffer which we know never have side
  effects).
 
  Calling it iowrite*_native is also an abuse of the namespace.
  Next thing we know some common code
  will legitimately use that name.
  I might make sense to those definitions into a common header. There 
  have
  been a handful of cases in the past that wanted that sort of native
  byte order MMIOs iirc (though don't ask me for examples, I can't 
  really
  remember).
 
  If we do need to define an alias
  (which I'd like to avoid) it should be something like vfio_iowrite32.
  Ping?
 
  We need to make a decision whether to move those xxx_native() helpers
  somewhere (where?) or leave the patch as is (as we figured out that
  iowriteXX functions implement barriers and we cannot just use raw
  accessors) and fix commit log to explain everything.
  Is there actually any difference in generated code with this patch 
  applied
  and without? I would hope that iowrite..() is inlined and cancels out the
  cpu_to_le..() calls that are also inlined?
  iowrite32 is a non-inline function so conversions take place so are the
  others. And sorry but I fail to see why this matters. We are not trying to
  accelerate things, we are removing redundant operations which confuse
  people who read the code.
  The confusion depends on where you're coming from. If you happen to know
  that iowrite32 writes in LE, then the LE conversion makes a lot of sense.
  It was like this (and this is just confusing):
 
  iowrite32(le32_to_cpu(val), io + off);
 
  What would make sense (according to you and I would understand this) is 
  this:
 
  iowrite32(cpu_to_le32(val), io + off);
 
 
  Or I missed your point, did I?
 
 No, you didn't miss it. I think for people who know how iowrite32() 
 works the above is obvious. I find the fact that iowrite32() writes in 
 LE always pretty scary though ;).
 
 So IMHO we should either create new, generic iowrite helpers that don't 
 do any endian swapping at all or do iowrite32(cpu_to_le32(val)) calls.

I'm one of those people for whom iowrite32(le32_to_cpu(val)) makes sense
and keeps the byte order consistent regardless of the platform, while
iowrite32(val) or iowrite32be(val) makes me scratch my head and try to
remember that the byte swaps are a nop on the given platforms.  As Ben
noted, a native, no-swap ioread/write doesn't exist, but perhaps should.
I'd prefer an attempt be made to make it exist before adding
vfio-specific macros.  vfio is arguably doing the right thing here given
the functions available.  Thanks,

Alex

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-24 Thread Alexey Kardashevskiy
On 06/25/2014 12:21 AM, Alex Williamson wrote:
 On Tue, 2014-06-24 at 15:22 +0200, Alexander Graf wrote:
 On 24.06.14 15:01, Alexey Kardashevskiy wrote:
 On 06/24/2014 10:52 PM, Alexander Graf wrote:
 On 24.06.14 14:50, Alexey Kardashevskiy wrote:
 On 06/24/2014 08:41 PM, Alexander Graf wrote:
 On 24.06.14 12:11, Alexey Kardashevskiy wrote:
 On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote:
 On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:

 Working on big endian being an accident may be a matter of perspective
 :-)

 The comment remains that this patch doesn't actually fix anything 
 except
 the overhead on big endian systems doing redundant byte swapping and
 maybe the philosophy that vfio regions are little endian.
 Yes, that works by accident because technically VFIO is a transport and
 thus shouldn't perform any endian swapping of any sort, which remains
 the responsibility of the end driver which is the only one to know
 whether a given BAR location is a a register or some streaming data
 and in the former case whether it's LE or BE (some PCI devices are BE
 even ! :-)

 But yes, in the end, it works with the dual cancelling swaps and the
 overhead of those swaps is probably drowned in the noise of the syscall
 overhead.

 I'm still not a fan of iowrite vs iowritebe, there must be something 
 we
 can use that doesn't have an implicit swap.
 Sadly there isn't ... In the old day we didn't even have the be
 variant and readl/writel style accessors still don't have them either
 for all archs.

 There is __raw_readl/writel but here the semantics are much more than
 just don't swap, they also don't have memory barriers (which means
 they are essentially useless to most drivers unless those are platform
 specific drivers which know exactly what they are doing, or in the rare
 cases such as accessing a framebuffer which we know never have side
 effects).

 Calling it iowrite*_native is also an abuse of the namespace.
 Next thing we know some common code
 will legitimately use that name.
 I might make sense to those definitions into a common header. There 
 have
 been a handful of cases in the past that wanted that sort of native
 byte order MMIOs iirc (though don't ask me for examples, I can't 
 really
 remember).

 If we do need to define an alias
 (which I'd like to avoid) it should be something like vfio_iowrite32.
 Ping?

 We need to make a decision whether to move those xxx_native() helpers
 somewhere (where?) or leave the patch as is (as we figured out that
 iowriteXX functions implement barriers and we cannot just use raw
 accessors) and fix commit log to explain everything.
 Is there actually any difference in generated code with this patch 
 applied
 and without? I would hope that iowrite..() is inlined and cancels out the
 cpu_to_le..() calls that are also inlined?
 iowrite32 is a non-inline function so conversions take place so are the
 others. And sorry but I fail to see why this matters. We are not trying to
 accelerate things, we are removing redundant operations which confuse
 people who read the code.
 The confusion depends on where you're coming from. If you happen to know
 that iowrite32 writes in LE, then the LE conversion makes a lot of sense.
 It was like this (and this is just confusing):

 iowrite32(le32_to_cpu(val), io + off);

 What would make sense (according to you and I would understand this) is 
 this:

 iowrite32(cpu_to_le32(val), io + off);


 Or I missed your point, did I?

 No, you didn't miss it. I think for people who know how iowrite32() 
 works the above is obvious. I find the fact that iowrite32() writes in 
 LE always pretty scary though ;).

 So IMHO we should either create new, generic iowrite helpers that don't 
 do any endian swapping at all or do iowrite32(cpu_to_le32(val)) calls.
 
 I'm one of those people for whom iowrite32(le32_to_cpu(val)) makes sense


I do not understand why @val is considered LE here and need to be converted
to CPU. Really. I truly believe it should be cpu_to_le32().


 and keeps the byte order consistent regardless of the platform, while
 iowrite32(val) or iowrite32be(val) makes me scratch my head and try to
 remember that the byte swaps are a nop on the given platforms.  As Ben
 noted, a native, no-swap ioread/write doesn't exist, but perhaps should.
 I'd prefer an attempt be made to make it exist before adding
 vfio-specific macros.  vfio is arguably doing the right thing here given
 the functions available.  Thanks,


-- 
Alexey
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-24 Thread David Laight
From: Alexey Kardashevskiy
...
  So IMHO we should either create new, generic iowrite helpers that don't
  do any endian swapping at all or do iowrite32(cpu_to_le32(val)) calls.
 
  I'm one of those people for whom iowrite32(le32_to_cpu(val)) makes sense
 
 
 I do not understand why @val is considered LE here and need to be converted
 to CPU. Really. I truly believe it should be cpu_to_le32().

Think about it carefully...

Apparently iowrite32() is writing a 'cpu' value out 'le'.
So if you have a 'le' value you need to convert it to 'cpu' first.

I really won't ask how 'be' ppc managed to get 'le' on-chip peripherals.

David

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-24 Thread Alex Williamson
On Wed, 2014-06-25 at 00:33 +1000, Alexey Kardashevskiy wrote:
 On 06/25/2014 12:21 AM, Alex Williamson wrote:
  On Tue, 2014-06-24 at 15:22 +0200, Alexander Graf wrote:
  On 24.06.14 15:01, Alexey Kardashevskiy wrote:
  On 06/24/2014 10:52 PM, Alexander Graf wrote:
  On 24.06.14 14:50, Alexey Kardashevskiy wrote:
  On 06/24/2014 08:41 PM, Alexander Graf wrote:
  On 24.06.14 12:11, Alexey Kardashevskiy wrote:
  On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote:
  On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:
 
  Working on big endian being an accident may be a matter of 
  perspective
  :-)
 
  The comment remains that this patch doesn't actually fix anything 
  except
  the overhead on big endian systems doing redundant byte swapping and
  maybe the philosophy that vfio regions are little endian.
  Yes, that works by accident because technically VFIO is a transport 
  and
  thus shouldn't perform any endian swapping of any sort, which remains
  the responsibility of the end driver which is the only one to know
  whether a given BAR location is a a register or some streaming data
  and in the former case whether it's LE or BE (some PCI devices are BE
  even ! :-)
 
  But yes, in the end, it works with the dual cancelling swaps and 
  the
  overhead of those swaps is probably drowned in the noise of the 
  syscall
  overhead.
 
  I'm still not a fan of iowrite vs iowritebe, there must be 
  something we
  can use that doesn't have an implicit swap.
  Sadly there isn't ... In the old day we didn't even have the be
  variant and readl/writel style accessors still don't have them either
  for all archs.
 
  There is __raw_readl/writel but here the semantics are much more than
  just don't swap, they also don't have memory barriers (which means
  they are essentially useless to most drivers unless those are 
  platform
  specific drivers which know exactly what they are doing, or in the 
  rare
  cases such as accessing a framebuffer which we know never have side
  effects).
 
  Calling it iowrite*_native is also an abuse of the namespace.
  Next thing we know some common code
  will legitimately use that name.
  I might make sense to those definitions into a common header. There 
  have
  been a handful of cases in the past that wanted that sort of native
  byte order MMIOs iirc (though don't ask me for examples, I can't 
  really
  remember).
 
  If we do need to define an alias
  (which I'd like to avoid) it should be something like 
  vfio_iowrite32.
  Ping?
 
  We need to make a decision whether to move those xxx_native() helpers
  somewhere (where?) or leave the patch as is (as we figured out that
  iowriteXX functions implement barriers and we cannot just use raw
  accessors) and fix commit log to explain everything.
  Is there actually any difference in generated code with this patch 
  applied
  and without? I would hope that iowrite..() is inlined and cancels out 
  the
  cpu_to_le..() calls that are also inlined?
  iowrite32 is a non-inline function so conversions take place so are the
  others. And sorry but I fail to see why this matters. We are not trying 
  to
  accelerate things, we are removing redundant operations which confuse
  people who read the code.
  The confusion depends on where you're coming from. If you happen to know
  that iowrite32 writes in LE, then the LE conversion makes a lot of 
  sense.
  It was like this (and this is just confusing):
 
  iowrite32(le32_to_cpu(val), io + off);
 
  What would make sense (according to you and I would understand this) is 
  this:
 
  iowrite32(cpu_to_le32(val), io + off);
 
 
  Or I missed your point, did I?
 
  No, you didn't miss it. I think for people who know how iowrite32() 
  works the above is obvious. I find the fact that iowrite32() writes in 
  LE always pretty scary though ;).
 
  So IMHO we should either create new, generic iowrite helpers that don't 
  do any endian swapping at all or do iowrite32(cpu_to_le32(val)) calls.
  
  I'm one of those people for whom iowrite32(le32_to_cpu(val)) makes sense
 
 
 I do not understand why @val is considered LE here and need to be converted
 to CPU. Really. I truly believe it should be cpu_to_le32().

Because iowrite32 is defined to take a cpu byte order value and write it
as little endian.

  and keeps the byte order consistent regardless of the platform, while
  iowrite32(val) or iowrite32be(val) makes me scratch my head and try to
  remember that the byte swaps are a nop on the given platforms.  As Ben
  noted, a native, no-swap ioread/write doesn't exist, but perhaps should.
  I'd prefer an attempt be made to make it exist before adding
  vfio-specific macros.  vfio is arguably doing the right thing here given
  the functions available.  Thanks,
 
 



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the 

Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-24 Thread Alexey Kardashevskiy
On 06/25/2014 12:43 AM, Alex Williamson wrote:
 On Wed, 2014-06-25 at 00:33 +1000, Alexey Kardashevskiy wrote:
 On 06/25/2014 12:21 AM, Alex Williamson wrote:
 On Tue, 2014-06-24 at 15:22 +0200, Alexander Graf wrote:
 On 24.06.14 15:01, Alexey Kardashevskiy wrote:
 On 06/24/2014 10:52 PM, Alexander Graf wrote:
 On 24.06.14 14:50, Alexey Kardashevskiy wrote:
 On 06/24/2014 08:41 PM, Alexander Graf wrote:
 On 24.06.14 12:11, Alexey Kardashevskiy wrote:
 On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote:
 On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:

 Working on big endian being an accident may be a matter of 
 perspective
 :-)

 The comment remains that this patch doesn't actually fix anything 
 except
 the overhead on big endian systems doing redundant byte swapping and
 maybe the philosophy that vfio regions are little endian.
 Yes, that works by accident because technically VFIO is a transport 
 and
 thus shouldn't perform any endian swapping of any sort, which remains
 the responsibility of the end driver which is the only one to know
 whether a given BAR location is a a register or some streaming data
 and in the former case whether it's LE or BE (some PCI devices are BE
 even ! :-)

 But yes, in the end, it works with the dual cancelling swaps and 
 the
 overhead of those swaps is probably drowned in the noise of the 
 syscall
 overhead.

 I'm still not a fan of iowrite vs iowritebe, there must be 
 something we
 can use that doesn't have an implicit swap.
 Sadly there isn't ... In the old day we didn't even have the be
 variant and readl/writel style accessors still don't have them either
 for all archs.

 There is __raw_readl/writel but here the semantics are much more than
 just don't swap, they also don't have memory barriers (which means
 they are essentially useless to most drivers unless those are 
 platform
 specific drivers which know exactly what they are doing, or in the 
 rare
 cases such as accessing a framebuffer which we know never have side
 effects).

 Calling it iowrite*_native is also an abuse of the namespace.
 Next thing we know some common code
 will legitimately use that name.
 I might make sense to those definitions into a common header. There 
 have
 been a handful of cases in the past that wanted that sort of native
 byte order MMIOs iirc (though don't ask me for examples, I can't 
 really
 remember).

 If we do need to define an alias
 (which I'd like to avoid) it should be something like 
 vfio_iowrite32.
 Ping?

 We need to make a decision whether to move those xxx_native() helpers
 somewhere (where?) or leave the patch as is (as we figured out that
 iowriteXX functions implement barriers and we cannot just use raw
 accessors) and fix commit log to explain everything.
 Is there actually any difference in generated code with this patch 
 applied
 and without? I would hope that iowrite..() is inlined and cancels out 
 the
 cpu_to_le..() calls that are also inlined?
 iowrite32 is a non-inline function so conversions take place so are the
 others. And sorry but I fail to see why this matters. We are not trying 
 to
 accelerate things, we are removing redundant operations which confuse
 people who read the code.
 The confusion depends on where you're coming from. If you happen to know
 that iowrite32 writes in LE, then the LE conversion makes a lot of 
 sense.
 It was like this (and this is just confusing):

 iowrite32(le32_to_cpu(val), io + off);

 What would make sense (according to you and I would understand this) is 
 this:

 iowrite32(cpu_to_le32(val), io + off);


 Or I missed your point, did I?

 No, you didn't miss it. I think for people who know how iowrite32() 
 works the above is obvious. I find the fact that iowrite32() writes in 
 LE always pretty scary though ;).

 So IMHO we should either create new, generic iowrite helpers that don't 
 do any endian swapping at all or do iowrite32(cpu_to_le32(val)) calls.

 I'm one of those people for whom iowrite32(le32_to_cpu(val)) makes sense


 I do not understand why @val is considered LE here and need to be converted
 to CPU. Really. I truly believe it should be cpu_to_le32().
 
 Because iowrite32 is defined to take a cpu byte order value and write it
 as little endian.


Ok, then neither le32_to_cpu() nor cpu_to_le32() should be there at all, if
we are talking about not scratching anyone's head :)



 and keeps the byte order consistent regardless of the platform, while
 iowrite32(val) or iowrite32be(val) makes me scratch my head and try to
 remember that the byte swaps are a nop on the given platforms.  As Ben
 noted, a native, no-swap ioread/write doesn't exist, but perhaps should.
 I'd prefer an attempt be made to make it exist before adding
 vfio-specific macros.  vfio is arguably doing the right thing here given
 the functions available.  Thanks,


I do not mind to make that atempt but what exactly would make sense here?
Try moving macros to include/asm-generic/io.h? Something else? 

Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-24 Thread Benjamin Herrenschmidt
On Tue, 2014-06-24 at 12:41 +0200, Alexander Graf wrote:
 Is there actually any difference in generated code with this patch 
 applied and without? I would hope that iowrite..() is inlined and 
 cancels out the cpu_to_le..() calls that are also inlined?

No, the former uses byteswapping asm, the compiler can't cancel it
out, but the overhead of the additional byteswap might not be
measurable.

Cheers,
Ben.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-24 Thread Benjamin Herrenschmidt
On Wed, 2014-06-25 at 00:33 +1000, Alexey Kardashevskiy wrote:
 
 I do not understand why @val is considered LE here and need to be
 converted
 to CPU. Really. I truly believe it should be cpu_to_le32().

No. Both are slightly wrong semantically but le32_to_cpu() is less
wrong :-)

iowrite32 supposedly takes a cpu value as argument and writes an le
value. So if anything, you need something that converts to a cpu value
before you call iowrite32.

Now it's still slightly wrong because the input to le32_to_cpu() is
supposed to be an LE value but of course here it's not, it's a cpu
value too :-)

But yes, I agree with aw here, either do nothing or stick a set of
iowriteXX_native or something equivalent in the generic iomap header,
define them in term of iowrite32be/iowrite32 based on the compile time
endian of the arch.

Hitting asm-generic/iomap.h I think will cover all archs except ARM.
For ARM, just hack arch/arm/asm/io.h

Cheers,
Ben.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-24 Thread Alexey Kardashevskiy
On 06/25/2014 07:54 AM, Benjamin Herrenschmidt wrote:
 On Wed, 2014-06-25 at 00:33 +1000, Alexey Kardashevskiy wrote:

 I do not understand why @val is considered LE here and need to be
 converted
 to CPU. Really. I truly believe it should be cpu_to_le32().
 
 No. Both are slightly wrong semantically but le32_to_cpu() is less
 wrong :-)

 iowrite32 supposedly takes a cpu value as argument and writes an le
 value. So if anything, you need something that converts to a cpu value
 before you call iowrite32.
 
 Now it's still slightly wrong because the input to le32_to_cpu() is
 supposed to be an LE value but of course here it's not, it's a cpu
 value too :-)
 
 But yes, I agree with aw here, either do nothing or stick a set of
 iowriteXX_native or something equivalent in the generic iomap header,
 define them in term of iowrite32be/iowrite32 based on the compile time
 endian of the arch.


Ok. I'll do nothing.


 Hitting asm-generic/iomap.h I think will cover all archs except ARM.
 For ARM, just hack arch/arm/asm/io.h


-- 
Alexey
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-20 Thread Benjamin Herrenschmidt
On Sat, 2014-06-21 at 00:14 +1000, Alexey Kardashevskiy wrote:

> We can still use __raw_writel, would that be ok?

No unless you understand precisely what kind of memory barriers each
platform require for these.

Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-20 Thread Benjamin Herrenschmidt
On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:

> Working on big endian being an accident may be a matter of perspective

 :-)

> The comment remains that this patch doesn't actually fix anything except
> the overhead on big endian systems doing redundant byte swapping and
> maybe the philosophy that vfio regions are little endian.

Yes, that works by accident because technically VFIO is a transport and
thus shouldn't perform any endian swapping of any sort, which remains
the responsibility of the end driver which is the only one to know
whether a given BAR location is a a register or some streaming data
and in the former case whether it's LE or BE (some PCI devices are BE
even ! :-)

But yes, in the end, it works with the dual "cancelling" swaps and the
overhead of those swaps is probably drowned in the noise of the syscall
overhead.

> I'm still not a fan of iowrite vs iowritebe, there must be something we
> can use that doesn't have an implicit swap.

Sadly there isn't ... In the old day we didn't even have the "be"
variant and readl/writel style accessors still don't have them either
for all archs.

There is __raw_readl/writel but here the semantics are much more than
just "don't swap", they also don't have memory barriers (which means
they are essentially useless to most drivers unless those are platform
specific drivers which know exactly what they are doing, or in the rare
cases such as accessing a framebuffer which we know never have side
effects). 

>  Calling it iowrite*_native is also an abuse of the namespace.


>  Next thing we know some common code
> will legitimately use that name. 

I might make sense to those definitions into a common header. There have
been a handful of cases in the past that wanted that sort of "native
byte order" MMIOs iirc (though don't ask me for examples, I can't really
remember).

>  If we do need to define an alias
> (which I'd like to avoid) it should be something like vfio_iowrite32.
> Thanks,

Cheers,
Ben.

> Alex
> 
> > > ===
> > > 
> > > any better?
> > > 
> > > 
> > > 
> > > 
> >  Suggested-by: Benjamin Herrenschmidt 
> >  Signed-off-by: Alexey Kardashevskiy 
> >  ---
> >   drivers/vfio/pci/vfio_pci_rdwr.c | 20 
> >   1 file changed, 16 insertions(+), 4 deletions(-)
> > 
> >  diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c 
> >  b/drivers/vfio/pci/vfio_pci_rdwr.c
> >  index 210db24..f363b5a 100644
> >  --- a/drivers/vfio/pci/vfio_pci_rdwr.c
> >  +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
> >  @@ -21,6 +21,18 @@
> >   
> >   #include "vfio_pci_private.h"
> >   
> >  +#ifdef __BIG_ENDIAN__
> >  +#define ioread16_native   ioread16be
> >  +#define ioread32_native   ioread32be
> >  +#define iowrite16_native  iowrite16be
> >  +#define iowrite32_native  iowrite32be
> >  +#else
> >  +#define ioread16_native   ioread16
> >  +#define ioread32_native   ioread32
> >  +#define iowrite16_native  iowrite16
> >  +#define iowrite32_native  iowrite32
> >  +#endif
> >  +
> >   /*
> >    * Read or write from an __iomem region (MMIO or I/O port) with an 
> >  excluded
> >    * range which is inaccessible.  The excluded range drops writes and 
> >  fills
> >  @@ -50,9 +62,9 @@ static ssize_t do_io_rw(void __iomem *io, char 
> >  __user *buf,
> > if (copy_from_user(, buf, 4))
> > return -EFAULT;
> >   
> >  -  iowrite32(le32_to_cpu(val), io + off);
> >  +  iowrite32_native(val, io + off);
> > } else {
> >  -  val = cpu_to_le32(ioread32(io + off));
> >  +  val = ioread32_native(io + off);
> >   
> > if (copy_to_user(buf, , 4))
> > return -EFAULT;
> >  @@ -66,9 +78,9 @@ static ssize_t do_io_rw(void __iomem *io, char 
> >  __user *buf,
> > if (copy_from_user(, buf, 2))
> > return -EFAULT;
> >   
> >  -  iowrite16(le16_to_cpu(val), io + off);
> >  +  iowrite16_native(val, io + off);
> > } else {
> >  -  val = cpu_to_le16(ioread16(io + off));
> >  +  val = ioread16_native(io + off);
> >   
> > if (copy_to_user(buf, , 2))
> > return -EFAULT;
> > >>>
> > >>>
> > >>>
> > >>
> > >>
> > > 
> > > 
> > 
> > 
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to 

Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-20 Thread Alexey Kardashevskiy
On 06/20/2014 01:21 PM, Alex Williamson wrote:
> On Thu, 2014-06-19 at 13:48 +1000, Alexey Kardashevskiy wrote:
>> On 06/19/2014 11:50 AM, Alexey Kardashevskiy wrote:
>>> On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote:
 On 06/19/2014 04:35 AM, Alex Williamson wrote:
> On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote:
>> VFIO exposes BARs to user space as a byte stream so userspace can
>> read it using pread()/pwrite(). Since this is a byte stream, VFIO should
>> not do byte swapping and simply return values as it gets them from
>> PCI device.
>>
>> Instead, the existing code assumes that byte stream in read/write is
>> little-endian and it fixes endianness for values which it passes to
>> ioreadXX/iowriteXX helpers. This works for little-endian as PCI is
>> little endian and le32_to_cpu/... are stubs.
>
> vfio read32:
>
> val = cpu_to_le32(ioread32(io + off));
>
> Where the typical x86 case, ioread32 is:
>
> #define ioread32(addr)  readl(addr)
>
> and readl is:
>
> __le32_to_cpu(__raw_readl(addr));
>
> So we do canceling byte swaps, which are both nops on x86, and end up
> returning device endian, which we assume is little endian.
>
> vfio write32 is similar:
>
> iowrite32(le32_to_cpu(val), io + off);
>
> The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel
> out, so input data is device endian, which is assumed little.
>
>> This also works for big endian but rather by an accident: it reads 4 
>> bytes
>> from the stream (@val is big endian), converts to CPU format (which 
>> should
>> be big endian) as it was little endian (@val becomes actually little
>> endian) and calls iowrite32() which does not do swapping on big endian
>> system.
>
> Really?
>
> In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around
> writel(), which seems to use the generic implementation, which does
> include a cpu_to_le32.


 Ouch, wrong comment. iowrite32() does swapping. My bad.


>
> I also see other big endian archs like parisc doing cpu_to_le32 on
> iowrite32, so I don't think this statement is true.  I imagine it's
> probably working for you because the swap cancel.
>
>> This removes byte swapping and makes use ioread32be/iowrite32be
>> (and 16bit versions) on big-endian systems. The "be" helpers take
>> native endian values and do swapping at the moment of writing to a PCI
>> register using one of "store byte-reversed" instructions.
>
> So now you want iowrite32() on little endian and iowrite32be() on big
> endian, the former does a cpu_to_le32 (which is a nop on little endian)
> and the latter does a cpu_to_be32 (which is a nop on big endian)...
> should we just be using __raw_writel() on both?


 We can do that too. The beauty of iowrite32be on ppc64 is that it does not
 swap and write separately, it is implemented via the "Store Word
 Byte-Reverse Indexed X-form" single instruction.

 And some archs (do not know which ones) may add memory barriers in their
 implementations of ioread/iowrite. __raw_writel is too raw :)

>  There doesn't actually
> seem to be any change in behavior here, it just eliminates back-to-back
> byte swaps, which are a nop on x86, but not power, right?

 Exactly. No dependency for QEMU.
>>>
>>> How about that:
>>> ===
>>>
>>> VFIO exposes BARs to user space as a byte stream so userspace can
>>> read it using pread()/pwrite(). Since this is a byte stream, VFIO should
>>> not do byte swapping and simply return values as it gets them from
>>> PCI device.
>>>
>>> Instead, the existing code assumes that byte stream in read/write is
>>> little-endian and it fixes endianness for values which it passes to
>>> ioreadXX/iowriteXX helpers in native format. The IO helpers do swapping
>>> again. Since both byte swaps are nops on little-endian host, this works.
>>>
>>> This also works for big endian but rather by an accident: it reads 4 bytes
>>> from the stream (@val is big endian), converts to CPU format (which should
>>> be big endian) as it was little endian (and @val becomes actually little
>>> endian) and calls iowrite32() which does swapping on big endian
>>> system again. So byte swap gets cancelled, __raw_writel() receives
>>> a native value and then
>>> *(volatile unsigned int __force *)PCI_FIX_ADDR(addr) = v;
>>> just does the right thing.
>>
>> I am wrong here, sorry. This is what happens when you watch soccer between
>> 2am and 4am :)
>>
>>
>>>
>>> This removes byte swaps and makes use of ioread32be/iowrite32be
>>> (and 16bit versions) which do explicit byte swapping at the moment
>>> of write to a PCI register. PPC64 uses a special "Store Word
>>> Byte-Reverse Indexed X-form" instruction which does swap and store.
>>
>> No 

Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-20 Thread Alexey Kardashevskiy
On 06/20/2014 01:21 PM, Alex Williamson wrote:
 On Thu, 2014-06-19 at 13:48 +1000, Alexey Kardashevskiy wrote:
 On 06/19/2014 11:50 AM, Alexey Kardashevskiy wrote:
 On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote:
 On 06/19/2014 04:35 AM, Alex Williamson wrote:
 On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote:
 VFIO exposes BARs to user space as a byte stream so userspace can
 read it using pread()/pwrite(). Since this is a byte stream, VFIO should
 not do byte swapping and simply return values as it gets them from
 PCI device.

 Instead, the existing code assumes that byte stream in read/write is
 little-endian and it fixes endianness for values which it passes to
 ioreadXX/iowriteXX helpers. This works for little-endian as PCI is
 little endian and le32_to_cpu/... are stubs.

 vfio read32:

 val = cpu_to_le32(ioread32(io + off));

 Where the typical x86 case, ioread32 is:

 #define ioread32(addr)  readl(addr)

 and readl is:

 __le32_to_cpu(__raw_readl(addr));

 So we do canceling byte swaps, which are both nops on x86, and end up
 returning device endian, which we assume is little endian.

 vfio write32 is similar:

 iowrite32(le32_to_cpu(val), io + off);

 The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel
 out, so input data is device endian, which is assumed little.

 This also works for big endian but rather by an accident: it reads 4 
 bytes
 from the stream (@val is big endian), converts to CPU format (which 
 should
 be big endian) as it was little endian (@val becomes actually little
 endian) and calls iowrite32() which does not do swapping on big endian
 system.

 Really?

 In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around
 writel(), which seems to use the generic implementation, which does
 include a cpu_to_le32.


 Ouch, wrong comment. iowrite32() does swapping. My bad.



 I also see other big endian archs like parisc doing cpu_to_le32 on
 iowrite32, so I don't think this statement is true.  I imagine it's
 probably working for you because the swap cancel.

 This removes byte swapping and makes use ioread32be/iowrite32be
 (and 16bit versions) on big-endian systems. The be helpers take
 native endian values and do swapping at the moment of writing to a PCI
 register using one of store byte-reversed instructions.

 So now you want iowrite32() on little endian and iowrite32be() on big
 endian, the former does a cpu_to_le32 (which is a nop on little endian)
 and the latter does a cpu_to_be32 (which is a nop on big endian)...
 should we just be using __raw_writel() on both?


 We can do that too. The beauty of iowrite32be on ppc64 is that it does not
 swap and write separately, it is implemented via the Store Word
 Byte-Reverse Indexed X-form single instruction.

 And some archs (do not know which ones) may add memory barriers in their
 implementations of ioread/iowrite. __raw_writel is too raw :)

  There doesn't actually
 seem to be any change in behavior here, it just eliminates back-to-back
 byte swaps, which are a nop on x86, but not power, right?

 Exactly. No dependency for QEMU.

 How about that:
 ===

 VFIO exposes BARs to user space as a byte stream so userspace can
 read it using pread()/pwrite(). Since this is a byte stream, VFIO should
 not do byte swapping and simply return values as it gets them from
 PCI device.

 Instead, the existing code assumes that byte stream in read/write is
 little-endian and it fixes endianness for values which it passes to
 ioreadXX/iowriteXX helpers in native format. The IO helpers do swapping
 again. Since both byte swaps are nops on little-endian host, this works.

 This also works for big endian but rather by an accident: it reads 4 bytes
 from the stream (@val is big endian), converts to CPU format (which should
 be big endian) as it was little endian (and @val becomes actually little
 endian) and calls iowrite32() which does swapping on big endian
 system again. So byte swap gets cancelled, __raw_writel() receives
 a native value and then
 *(volatile unsigned int __force *)PCI_FIX_ADDR(addr) = v;
 just does the right thing.

 I am wrong here, sorry. This is what happens when you watch soccer between
 2am and 4am :)



 This removes byte swaps and makes use of ioread32be/iowrite32be
 (and 16bit versions) which do explicit byte swapping at the moment
 of write to a PCI register. PPC64 uses a special Store Word
 Byte-Reverse Indexed X-form instruction which does swap and store.

 No swapping is done here if we use ioread32be as it calls in_be32 and that
 animal does lwz which is simple load from memory.

 So @val (16/32 bit variable on stack) will have different values on LE and
 BE but since we do not handle it the host and just memcpy it to the buffer,
 nothing breaks here.


 So it should be like this:
 ===
 VFIO exposes BARs to user space as a byte stream so userspace can
 read it using pread()/pwrite(). Since this is a byte stream, VFIO should
 not do byte swapping and simply return 

Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-20 Thread Benjamin Herrenschmidt
On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:

 Working on big endian being an accident may be a matter of perspective

 :-)

 The comment remains that this patch doesn't actually fix anything except
 the overhead on big endian systems doing redundant byte swapping and
 maybe the philosophy that vfio regions are little endian.

Yes, that works by accident because technically VFIO is a transport and
thus shouldn't perform any endian swapping of any sort, which remains
the responsibility of the end driver which is the only one to know
whether a given BAR location is a a register or some streaming data
and in the former case whether it's LE or BE (some PCI devices are BE
even ! :-)

But yes, in the end, it works with the dual cancelling swaps and the
overhead of those swaps is probably drowned in the noise of the syscall
overhead.

 I'm still not a fan of iowrite vs iowritebe, there must be something we
 can use that doesn't have an implicit swap.

Sadly there isn't ... In the old day we didn't even have the be
variant and readl/writel style accessors still don't have them either
for all archs.

There is __raw_readl/writel but here the semantics are much more than
just don't swap, they also don't have memory barriers (which means
they are essentially useless to most drivers unless those are platform
specific drivers which know exactly what they are doing, or in the rare
cases such as accessing a framebuffer which we know never have side
effects). 

  Calling it iowrite*_native is also an abuse of the namespace.


  Next thing we know some common code
 will legitimately use that name. 

I might make sense to those definitions into a common header. There have
been a handful of cases in the past that wanted that sort of native
byte order MMIOs iirc (though don't ask me for examples, I can't really
remember).

  If we do need to define an alias
 (which I'd like to avoid) it should be something like vfio_iowrite32.
 Thanks,

Cheers,
Ben.

 Alex
 
   ===
   
   any better?
   
   
   
   
   Suggested-by: Benjamin Herrenschmidt b...@kernel.crashing.org
   Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
   ---
drivers/vfio/pci/vfio_pci_rdwr.c | 20 
1 file changed, 16 insertions(+), 4 deletions(-)
  
   diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c 
   b/drivers/vfio/pci/vfio_pci_rdwr.c
   index 210db24..f363b5a 100644
   --- a/drivers/vfio/pci/vfio_pci_rdwr.c
   +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
   @@ -21,6 +21,18 @@

#include vfio_pci_private.h

   +#ifdef __BIG_ENDIAN__
   +#define ioread16_native   ioread16be
   +#define ioread32_native   ioread32be
   +#define iowrite16_native  iowrite16be
   +#define iowrite32_native  iowrite32be
   +#else
   +#define ioread16_native   ioread16
   +#define ioread32_native   ioread32
   +#define iowrite16_native  iowrite16
   +#define iowrite32_native  iowrite32
   +#endif
   +
/*
 * Read or write from an __iomem region (MMIO or I/O port) with an 
   excluded
 * range which is inaccessible.  The excluded range drops writes and 
   fills
   @@ -50,9 +62,9 @@ static ssize_t do_io_rw(void __iomem *io, char 
   __user *buf,
  if (copy_from_user(val, buf, 4))
  return -EFAULT;

   -  iowrite32(le32_to_cpu(val), io + off);
   +  iowrite32_native(val, io + off);
  } else {
   -  val = cpu_to_le32(ioread32(io + off));
   +  val = ioread32_native(io + off);

  if (copy_to_user(buf, val, 4))
  return -EFAULT;
   @@ -66,9 +78,9 @@ static ssize_t do_io_rw(void __iomem *io, char 
   __user *buf,
  if (copy_from_user(val, buf, 2))
  return -EFAULT;

   -  iowrite16(le16_to_cpu(val), io + off);
   +  iowrite16_native(val, io + off);
  } else {
   -  val = cpu_to_le16(ioread16(io + off));
   +  val = ioread16_native(io + off);

  if (copy_to_user(buf, val, 2))
  return -EFAULT;
  
  
  
  
  
   
   
  
  
 
 
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-20 Thread Benjamin Herrenschmidt
On Sat, 2014-06-21 at 00:14 +1000, Alexey Kardashevskiy wrote:

 We can still use __raw_writelco, would that be ok?

No unless you understand precisely what kind of memory barriers each
platform require for these.

Cheers,
Ben.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-19 Thread Alex Williamson
On Thu, 2014-06-19 at 13:48 +1000, Alexey Kardashevskiy wrote:
> On 06/19/2014 11:50 AM, Alexey Kardashevskiy wrote:
> > On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote:
> >> On 06/19/2014 04:35 AM, Alex Williamson wrote:
> >>> On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote:
>  VFIO exposes BARs to user space as a byte stream so userspace can
>  read it using pread()/pwrite(). Since this is a byte stream, VFIO should
>  not do byte swapping and simply return values as it gets them from
>  PCI device.
> 
>  Instead, the existing code assumes that byte stream in read/write is
>  little-endian and it fixes endianness for values which it passes to
>  ioreadXX/iowriteXX helpers. This works for little-endian as PCI is
>  little endian and le32_to_cpu/... are stubs.
> >>>
> >>> vfio read32:
> >>>
> >>> val = cpu_to_le32(ioread32(io + off));
> >>>
> >>> Where the typical x86 case, ioread32 is:
> >>>
> >>> #define ioread32(addr)  readl(addr)
> >>>
> >>> and readl is:
> >>>
> >>> __le32_to_cpu(__raw_readl(addr));
> >>>
> >>> So we do canceling byte swaps, which are both nops on x86, and end up
> >>> returning device endian, which we assume is little endian.
> >>>
> >>> vfio write32 is similar:
> >>>
> >>> iowrite32(le32_to_cpu(val), io + off);
> >>>
> >>> The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel
> >>> out, so input data is device endian, which is assumed little.
> >>>
>  This also works for big endian but rather by an accident: it reads 4 
>  bytes
>  from the stream (@val is big endian), converts to CPU format (which 
>  should
>  be big endian) as it was little endian (@val becomes actually little
>  endian) and calls iowrite32() which does not do swapping on big endian
>  system.
> >>>
> >>> Really?
> >>>
> >>> In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around
> >>> writel(), which seems to use the generic implementation, which does
> >>> include a cpu_to_le32.
> >>
> >>
> >> Ouch, wrong comment. iowrite32() does swapping. My bad.
> >>
> >>
> >>>
> >>> I also see other big endian archs like parisc doing cpu_to_le32 on
> >>> iowrite32, so I don't think this statement is true.  I imagine it's
> >>> probably working for you because the swap cancel.
> >>>
>  This removes byte swapping and makes use ioread32be/iowrite32be
>  (and 16bit versions) on big-endian systems. The "be" helpers take
>  native endian values and do swapping at the moment of writing to a PCI
>  register using one of "store byte-reversed" instructions.
> >>>
> >>> So now you want iowrite32() on little endian and iowrite32be() on big
> >>> endian, the former does a cpu_to_le32 (which is a nop on little endian)
> >>> and the latter does a cpu_to_be32 (which is a nop on big endian)...
> >>> should we just be using __raw_writel() on both?
> >>
> >>
> >> We can do that too. The beauty of iowrite32be on ppc64 is that it does not
> >> swap and write separately, it is implemented via the "Store Word
> >> Byte-Reverse Indexed X-form" single instruction.
> >>
> >> And some archs (do not know which ones) may add memory barriers in their
> >> implementations of ioread/iowrite. __raw_writel is too raw :)
> >>
> >>>  There doesn't actually
> >>> seem to be any change in behavior here, it just eliminates back-to-back
> >>> byte swaps, which are a nop on x86, but not power, right?
> >>
> >> Exactly. No dependency for QEMU.
> > 
> > How about that:
> > ===
> > 
> > VFIO exposes BARs to user space as a byte stream so userspace can
> > read it using pread()/pwrite(). Since this is a byte stream, VFIO should
> > not do byte swapping and simply return values as it gets them from
> > PCI device.
> > 
> > Instead, the existing code assumes that byte stream in read/write is
> > little-endian and it fixes endianness for values which it passes to
> > ioreadXX/iowriteXX helpers in native format. The IO helpers do swapping
> > again. Since both byte swaps are nops on little-endian host, this works.
> > 
> > This also works for big endian but rather by an accident: it reads 4 bytes
> > from the stream (@val is big endian), converts to CPU format (which should
> > be big endian) as it was little endian (and @val becomes actually little
> > endian) and calls iowrite32() which does swapping on big endian
> > system again. So byte swap gets cancelled, __raw_writel() receives
> > a native value and then
> > *(volatile unsigned int __force *)PCI_FIX_ADDR(addr) = v;
> > just does the right thing.
> 
> I am wrong here, sorry. This is what happens when you watch soccer between
> 2am and 4am :)
> 
> 
> > 
> > This removes byte swaps and makes use of ioread32be/iowrite32be
> > (and 16bit versions) which do explicit byte swapping at the moment
> > of write to a PCI register. PPC64 uses a special "Store Word
> > Byte-Reverse Indexed X-form" instruction which does swap and store.
> 
> No swapping is done here if we use ioread32be as it 

Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-19 Thread Alexey Kardashevskiy
On 06/19/2014 03:30 PM, bharat.bhus...@freescale.com wrote:
> 
> 
>> -Original Message-
>> From: Linuxppc-dev [mailto:linuxppc-dev-
>> bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Alexey
>> Kardashevskiy
>> Sent: Thursday, June 19, 2014 9:18 AM
>> To: Alex Williamson
>> Cc: k...@vger.kernel.org; Nikunj A Dadhania; linux-kernel@vger.kernel.org;
>> Alexander Graf; linuxppc-...@lists.ozlabs.org
>> Subject: Re: [PATCH] vfio: Fix endianness handling for emulated BARs
>>
>> On 06/19/2014 11:50 AM, Alexey Kardashevskiy wrote:
>>> On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote:
>>>> On 06/19/2014 04:35 AM, Alex Williamson wrote:
>>>>> On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote:
>>>>>> VFIO exposes BARs to user space as a byte stream so userspace can
>>>>>> read it using pread()/pwrite(). Since this is a byte stream, VFIO
>>>>>> should not do byte swapping and simply return values as it gets
>>>>>> them from PCI device.
>>>>>>
>>>>>> Instead, the existing code assumes that byte stream in read/write
>>>>>> is little-endian and it fixes endianness for values which it passes
>>>>>> to ioreadXX/iowriteXX helpers. This works for little-endian as PCI
>>>>>> is little endian and le32_to_cpu/... are stubs.
>>>>>
>>>>> vfio read32:
>>>>>
>>>>> val = cpu_to_le32(ioread32(io + off));
>>>>>
>>>>> Where the typical x86 case, ioread32 is:
>>>>>
>>>>> #define ioread32(addr)  readl(addr)
>>>>>
>>>>> and readl is:
>>>>>
>>>>> __le32_to_cpu(__raw_readl(addr));
>>>>>
>>>>> So we do canceling byte swaps, which are both nops on x86, and end
>>>>> up returning device endian, which we assume is little endian.
>>>>>
>>>>> vfio write32 is similar:
>>>>>
>>>>> iowrite32(le32_to_cpu(val), io + off);
>>>>>
>>>>> The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel
>>>>> out, so input data is device endian, which is assumed little.
>>>>>
>>>>>> This also works for big endian but rather by an accident: it reads
>>>>>> 4 bytes from the stream (@val is big endian), converts to CPU
>>>>>> format (which should be big endian) as it was little endian (@val
>>>>>> becomes actually little
>>>>>> endian) and calls iowrite32() which does not do swapping on big
>>>>>> endian system.
>>>>>
>>>>> Really?
>>>>>
>>>>> In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around
>>>>> writel(), which seems to use the generic implementation, which does
>>>>> include a cpu_to_le32.
>>>>
>>>>
>>>> Ouch, wrong comment. iowrite32() does swapping. My bad.
>>>>
>>>>
>>>>>
>>>>> I also see other big endian archs like parisc doing cpu_to_le32 on
>>>>> iowrite32, so I don't think this statement is true.  I imagine it's
>>>>> probably working for you because the swap cancel.
>>>>>
>>>>>> This removes byte swapping and makes use ioread32be/iowrite32be
>>>>>> (and 16bit versions) on big-endian systems. The "be" helpers take
>>>>>> native endian values and do swapping at the moment of writing to a
>>>>>> PCI register using one of "store byte-reversed" instructions.
>>>>>
>>>>> So now you want iowrite32() on little endian and iowrite32be() on
>>>>> big endian, the former does a cpu_to_le32 (which is a nop on little
>>>>> endian) and the latter does a cpu_to_be32 (which is a nop on big 
>>>>> endian)...
>>>>> should we just be using __raw_writel() on both?
>>>>
>>>>
>>>> We can do that too. The beauty of iowrite32be on ppc64 is that it
>>>> does not swap and write separately, it is implemented via the "Store
>>>> Word Byte-Reverse Indexed X-form" single instruction.
>>>>
>>>> And some archs (do not know which ones) may add memory barriers in
>>>> their implementations of ioread/iowrite. __raw_writel is too raw :)
>>>>
>>>>>  There doesn't actually
>

Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-19 Thread Alexey Kardashevskiy
On 06/19/2014 03:30 PM, bharat.bhus...@freescale.com wrote:
 
 
 -Original Message-
 From: Linuxppc-dev [mailto:linuxppc-dev-
 bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Alexey
 Kardashevskiy
 Sent: Thursday, June 19, 2014 9:18 AM
 To: Alex Williamson
 Cc: k...@vger.kernel.org; Nikunj A Dadhania; linux-kernel@vger.kernel.org;
 Alexander Graf; linuxppc-...@lists.ozlabs.org
 Subject: Re: [PATCH] vfio: Fix endianness handling for emulated BARs

 On 06/19/2014 11:50 AM, Alexey Kardashevskiy wrote:
 On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote:
 On 06/19/2014 04:35 AM, Alex Williamson wrote:
 On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote:
 VFIO exposes BARs to user space as a byte stream so userspace can
 read it using pread()/pwrite(). Since this is a byte stream, VFIO
 should not do byte swapping and simply return values as it gets
 them from PCI device.

 Instead, the existing code assumes that byte stream in read/write
 is little-endian and it fixes endianness for values which it passes
 to ioreadXX/iowriteXX helpers. This works for little-endian as PCI
 is little endian and le32_to_cpu/... are stubs.

 vfio read32:

 val = cpu_to_le32(ioread32(io + off));

 Where the typical x86 case, ioread32 is:

 #define ioread32(addr)  readl(addr)

 and readl is:

 __le32_to_cpu(__raw_readl(addr));

 So we do canceling byte swaps, which are both nops on x86, and end
 up returning device endian, which we assume is little endian.

 vfio write32 is similar:

 iowrite32(le32_to_cpu(val), io + off);

 The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel
 out, so input data is device endian, which is assumed little.

 This also works for big endian but rather by an accident: it reads
 4 bytes from the stream (@val is big endian), converts to CPU
 format (which should be big endian) as it was little endian (@val
 becomes actually little
 endian) and calls iowrite32() which does not do swapping on big
 endian system.

 Really?

 In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around
 writel(), which seems to use the generic implementation, which does
 include a cpu_to_le32.


 Ouch, wrong comment. iowrite32() does swapping. My bad.



 I also see other big endian archs like parisc doing cpu_to_le32 on
 iowrite32, so I don't think this statement is true.  I imagine it's
 probably working for you because the swap cancel.

 This removes byte swapping and makes use ioread32be/iowrite32be
 (and 16bit versions) on big-endian systems. The be helpers take
 native endian values and do swapping at the moment of writing to a
 PCI register using one of store byte-reversed instructions.

 So now you want iowrite32() on little endian and iowrite32be() on
 big endian, the former does a cpu_to_le32 (which is a nop on little
 endian) and the latter does a cpu_to_be32 (which is a nop on big 
 endian)...
 should we just be using __raw_writel() on both?


 We can do that too. The beauty of iowrite32be on ppc64 is that it
 does not swap and write separately, it is implemented via the Store
 Word Byte-Reverse Indexed X-form single instruction.

 And some archs (do not know which ones) may add memory barriers in
 their implementations of ioread/iowrite. __raw_writel is too raw :)

  There doesn't actually
 seem to be any change in behavior here, it just eliminates
 back-to-back byte swaps, which are a nop on x86, but not power, right?

 Exactly. No dependency for QEMU.

 How about that:
 ===

 VFIO exposes BARs to user space as a byte stream so userspace can read
 it using pread()/pwrite(). Since this is a byte stream, VFIO should
 not do byte swapping and simply return values as it gets them from PCI
 device.

 Instead, the existing code assumes that byte stream in read/write is
 little-endian and it fixes endianness for values which it passes to
 ioreadXX/iowriteXX helpers in native format. The IO helpers do
 swapping again. Since both byte swaps are nops on little-endian host, this
 works.

 This also works for big endian but rather by an accident: it reads 4
 bytes from the stream (@val is big endian), converts to CPU format
 (which should be big endian) as it was little endian (and @val becomes
 actually little
 endian) and calls iowrite32() which does swapping on big endian system
 again. So byte swap gets cancelled, __raw_writel() receives a native
 value and then *(volatile unsigned int __force *)PCI_FIX_ADDR(addr) =
 v; just does the right thing.

 I am wrong here, sorry. This is what happens when you watch soccer between 
 2am
 and 4am :)



 This removes byte swaps and makes use of ioread32be/iowrite32be (and
 16bit versions) which do explicit byte swapping at the moment of write
 to a PCI register. PPC64 uses a special Store Word Byte-Reverse
 Indexed X-form instruction which does swap and store.

 No swapping is done here if we use ioread32be as it calls in_be32 and that
 animal does lwz which is simple load from memory.

 So @val (16/32 bit

Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-19 Thread Alex Williamson
On Thu, 2014-06-19 at 13:48 +1000, Alexey Kardashevskiy wrote:
 On 06/19/2014 11:50 AM, Alexey Kardashevskiy wrote:
  On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote:
  On 06/19/2014 04:35 AM, Alex Williamson wrote:
  On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote:
  VFIO exposes BARs to user space as a byte stream so userspace can
  read it using pread()/pwrite(). Since this is a byte stream, VFIO should
  not do byte swapping and simply return values as it gets them from
  PCI device.
 
  Instead, the existing code assumes that byte stream in read/write is
  little-endian and it fixes endianness for values which it passes to
  ioreadXX/iowriteXX helpers. This works for little-endian as PCI is
  little endian and le32_to_cpu/... are stubs.
 
  vfio read32:
 
  val = cpu_to_le32(ioread32(io + off));
 
  Where the typical x86 case, ioread32 is:
 
  #define ioread32(addr)  readl(addr)
 
  and readl is:
 
  __le32_to_cpu(__raw_readl(addr));
 
  So we do canceling byte swaps, which are both nops on x86, and end up
  returning device endian, which we assume is little endian.
 
  vfio write32 is similar:
 
  iowrite32(le32_to_cpu(val), io + off);
 
  The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel
  out, so input data is device endian, which is assumed little.
 
  This also works for big endian but rather by an accident: it reads 4 
  bytes
  from the stream (@val is big endian), converts to CPU format (which 
  should
  be big endian) as it was little endian (@val becomes actually little
  endian) and calls iowrite32() which does not do swapping on big endian
  system.
 
  Really?
 
  In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around
  writel(), which seems to use the generic implementation, which does
  include a cpu_to_le32.
 
 
  Ouch, wrong comment. iowrite32() does swapping. My bad.
 
 
 
  I also see other big endian archs like parisc doing cpu_to_le32 on
  iowrite32, so I don't think this statement is true.  I imagine it's
  probably working for you because the swap cancel.
 
  This removes byte swapping and makes use ioread32be/iowrite32be
  (and 16bit versions) on big-endian systems. The be helpers take
  native endian values and do swapping at the moment of writing to a PCI
  register using one of store byte-reversed instructions.
 
  So now you want iowrite32() on little endian and iowrite32be() on big
  endian, the former does a cpu_to_le32 (which is a nop on little endian)
  and the latter does a cpu_to_be32 (which is a nop on big endian)...
  should we just be using __raw_writel() on both?
 
 
  We can do that too. The beauty of iowrite32be on ppc64 is that it does not
  swap and write separately, it is implemented via the Store Word
  Byte-Reverse Indexed X-form single instruction.
 
  And some archs (do not know which ones) may add memory barriers in their
  implementations of ioread/iowrite. __raw_writel is too raw :)
 
   There doesn't actually
  seem to be any change in behavior here, it just eliminates back-to-back
  byte swaps, which are a nop on x86, but not power, right?
 
  Exactly. No dependency for QEMU.
  
  How about that:
  ===
  
  VFIO exposes BARs to user space as a byte stream so userspace can
  read it using pread()/pwrite(). Since this is a byte stream, VFIO should
  not do byte swapping and simply return values as it gets them from
  PCI device.
  
  Instead, the existing code assumes that byte stream in read/write is
  little-endian and it fixes endianness for values which it passes to
  ioreadXX/iowriteXX helpers in native format. The IO helpers do swapping
  again. Since both byte swaps are nops on little-endian host, this works.
  
  This also works for big endian but rather by an accident: it reads 4 bytes
  from the stream (@val is big endian), converts to CPU format (which should
  be big endian) as it was little endian (and @val becomes actually little
  endian) and calls iowrite32() which does swapping on big endian
  system again. So byte swap gets cancelled, __raw_writel() receives
  a native value and then
  *(volatile unsigned int __force *)PCI_FIX_ADDR(addr) = v;
  just does the right thing.
 
 I am wrong here, sorry. This is what happens when you watch soccer between
 2am and 4am :)
 
 
  
  This removes byte swaps and makes use of ioread32be/iowrite32be
  (and 16bit versions) which do explicit byte swapping at the moment
  of write to a PCI register. PPC64 uses a special Store Word
  Byte-Reverse Indexed X-form instruction which does swap and store.
 
 No swapping is done here if we use ioread32be as it calls in_be32 and that
 animal does lwz which is simple load from memory.
 
 So @val (16/32 bit variable on stack) will have different values on LE and
 BE but since we do not handle it the host and just memcpy it to the buffer,
 nothing breaks here.
 
 
 So it should be like this:
 ===
 VFIO exposes BARs to user space as a byte stream so userspace can
 read it using pread()/pwrite(). Since this 

RE: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-18 Thread bharat.bhus...@freescale.com


> -Original Message-
> From: Linuxppc-dev [mailto:linuxppc-dev-
> bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Alexey
> Kardashevskiy
> Sent: Thursday, June 19, 2014 9:18 AM
> To: Alex Williamson
> Cc: k...@vger.kernel.org; Nikunj A Dadhania; linux-kernel@vger.kernel.org;
> Alexander Graf; linuxppc-...@lists.ozlabs.org
> Subject: Re: [PATCH] vfio: Fix endianness handling for emulated BARs
> 
> On 06/19/2014 11:50 AM, Alexey Kardashevskiy wrote:
> > On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote:
> >> On 06/19/2014 04:35 AM, Alex Williamson wrote:
> >>> On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote:
> >>>> VFIO exposes BARs to user space as a byte stream so userspace can
> >>>> read it using pread()/pwrite(). Since this is a byte stream, VFIO
> >>>> should not do byte swapping and simply return values as it gets
> >>>> them from PCI device.
> >>>>
> >>>> Instead, the existing code assumes that byte stream in read/write
> >>>> is little-endian and it fixes endianness for values which it passes
> >>>> to ioreadXX/iowriteXX helpers. This works for little-endian as PCI
> >>>> is little endian and le32_to_cpu/... are stubs.
> >>>
> >>> vfio read32:
> >>>
> >>> val = cpu_to_le32(ioread32(io + off));
> >>>
> >>> Where the typical x86 case, ioread32 is:
> >>>
> >>> #define ioread32(addr)  readl(addr)
> >>>
> >>> and readl is:
> >>>
> >>> __le32_to_cpu(__raw_readl(addr));
> >>>
> >>> So we do canceling byte swaps, which are both nops on x86, and end
> >>> up returning device endian, which we assume is little endian.
> >>>
> >>> vfio write32 is similar:
> >>>
> >>> iowrite32(le32_to_cpu(val), io + off);
> >>>
> >>> The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel
> >>> out, so input data is device endian, which is assumed little.
> >>>
> >>>> This also works for big endian but rather by an accident: it reads
> >>>> 4 bytes from the stream (@val is big endian), converts to CPU
> >>>> format (which should be big endian) as it was little endian (@val
> >>>> becomes actually little
> >>>> endian) and calls iowrite32() which does not do swapping on big
> >>>> endian system.
> >>>
> >>> Really?
> >>>
> >>> In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around
> >>> writel(), which seems to use the generic implementation, which does
> >>> include a cpu_to_le32.
> >>
> >>
> >> Ouch, wrong comment. iowrite32() does swapping. My bad.
> >>
> >>
> >>>
> >>> I also see other big endian archs like parisc doing cpu_to_le32 on
> >>> iowrite32, so I don't think this statement is true.  I imagine it's
> >>> probably working for you because the swap cancel.
> >>>
> >>>> This removes byte swapping and makes use ioread32be/iowrite32be
> >>>> (and 16bit versions) on big-endian systems. The "be" helpers take
> >>>> native endian values and do swapping at the moment of writing to a
> >>>> PCI register using one of "store byte-reversed" instructions.
> >>>
> >>> So now you want iowrite32() on little endian and iowrite32be() on
> >>> big endian, the former does a cpu_to_le32 (which is a nop on little
> >>> endian) and the latter does a cpu_to_be32 (which is a nop on big 
> >>> endian)...
> >>> should we just be using __raw_writel() on both?
> >>
> >>
> >> We can do that too. The beauty of iowrite32be on ppc64 is that it
> >> does not swap and write separately, it is implemented via the "Store
> >> Word Byte-Reverse Indexed X-form" single instruction.
> >>
> >> And some archs (do not know which ones) may add memory barriers in
> >> their implementations of ioread/iowrite. __raw_writel is too raw :)
> >>
> >>>  There doesn't actually
> >>> seem to be any change in behavior here, it just eliminates
> >>> back-to-back byte swaps, which are a nop on x86, but not power, right?
> >>
> >> Exactly. No dependency for QEMU.
> >
> > How about that:
> > ===
> >
> > VFIO exposes BARs to user space as a byte stream so

Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-18 Thread Alexey Kardashevskiy
On 06/19/2014 11:50 AM, Alexey Kardashevskiy wrote:
> On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote:
>> On 06/19/2014 04:35 AM, Alex Williamson wrote:
>>> On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote:
 VFIO exposes BARs to user space as a byte stream so userspace can
 read it using pread()/pwrite(). Since this is a byte stream, VFIO should
 not do byte swapping and simply return values as it gets them from
 PCI device.

 Instead, the existing code assumes that byte stream in read/write is
 little-endian and it fixes endianness for values which it passes to
 ioreadXX/iowriteXX helpers. This works for little-endian as PCI is
 little endian and le32_to_cpu/... are stubs.
>>>
>>> vfio read32:
>>>
>>> val = cpu_to_le32(ioread32(io + off));
>>>
>>> Where the typical x86 case, ioread32 is:
>>>
>>> #define ioread32(addr)  readl(addr)
>>>
>>> and readl is:
>>>
>>> __le32_to_cpu(__raw_readl(addr));
>>>
>>> So we do canceling byte swaps, which are both nops on x86, and end up
>>> returning device endian, which we assume is little endian.
>>>
>>> vfio write32 is similar:
>>>
>>> iowrite32(le32_to_cpu(val), io + off);
>>>
>>> The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel
>>> out, so input data is device endian, which is assumed little.
>>>
 This also works for big endian but rather by an accident: it reads 4 bytes
 from the stream (@val is big endian), converts to CPU format (which should
 be big endian) as it was little endian (@val becomes actually little
 endian) and calls iowrite32() which does not do swapping on big endian
 system.
>>>
>>> Really?
>>>
>>> In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around
>>> writel(), which seems to use the generic implementation, which does
>>> include a cpu_to_le32.
>>
>>
>> Ouch, wrong comment. iowrite32() does swapping. My bad.
>>
>>
>>>
>>> I also see other big endian archs like parisc doing cpu_to_le32 on
>>> iowrite32, so I don't think this statement is true.  I imagine it's
>>> probably working for you because the swap cancel.
>>>
 This removes byte swapping and makes use ioread32be/iowrite32be
 (and 16bit versions) on big-endian systems. The "be" helpers take
 native endian values and do swapping at the moment of writing to a PCI
 register using one of "store byte-reversed" instructions.
>>>
>>> So now you want iowrite32() on little endian and iowrite32be() on big
>>> endian, the former does a cpu_to_le32 (which is a nop on little endian)
>>> and the latter does a cpu_to_be32 (which is a nop on big endian)...
>>> should we just be using __raw_writel() on both?
>>
>>
>> We can do that too. The beauty of iowrite32be on ppc64 is that it does not
>> swap and write separately, it is implemented via the "Store Word
>> Byte-Reverse Indexed X-form" single instruction.
>>
>> And some archs (do not know which ones) may add memory barriers in their
>> implementations of ioread/iowrite. __raw_writel is too raw :)
>>
>>>  There doesn't actually
>>> seem to be any change in behavior here, it just eliminates back-to-back
>>> byte swaps, which are a nop on x86, but not power, right?
>>
>> Exactly. No dependency for QEMU.
> 
> How about that:
> ===
> 
> VFIO exposes BARs to user space as a byte stream so userspace can
> read it using pread()/pwrite(). Since this is a byte stream, VFIO should
> not do byte swapping and simply return values as it gets them from
> PCI device.
> 
> Instead, the existing code assumes that byte stream in read/write is
> little-endian and it fixes endianness for values which it passes to
> ioreadXX/iowriteXX helpers in native format. The IO helpers do swapping
> again. Since both byte swaps are nops on little-endian host, this works.
> 
> This also works for big endian but rather by an accident: it reads 4 bytes
> from the stream (@val is big endian), converts to CPU format (which should
> be big endian) as it was little endian (and @val becomes actually little
> endian) and calls iowrite32() which does swapping on big endian
> system again. So byte swap gets cancelled, __raw_writel() receives
> a native value and then
> *(volatile unsigned int __force *)PCI_FIX_ADDR(addr) = v;
> just does the right thing.

I am wrong here, sorry. This is what happens when you watch soccer between
2am and 4am :)


> 
> This removes byte swaps and makes use of ioread32be/iowrite32be
> (and 16bit versions) which do explicit byte swapping at the moment
> of write to a PCI register. PPC64 uses a special "Store Word
> Byte-Reverse Indexed X-form" instruction which does swap and store.

No swapping is done here if we use ioread32be as it calls in_be32 and that
animal does "lwz" which is simple load from memory.

So @val (16/32 bit variable on stack) will have different values on LE and
BE but since we do not handle it the host and just memcpy it to the buffer,
nothing breaks here.


So it should be like this:
===
VFIO exposes BARs to user 

Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-18 Thread Alexey Kardashevskiy
On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote:
> On 06/19/2014 04:35 AM, Alex Williamson wrote:
>> On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote:
>>> VFIO exposes BARs to user space as a byte stream so userspace can
>>> read it using pread()/pwrite(). Since this is a byte stream, VFIO should
>>> not do byte swapping and simply return values as it gets them from
>>> PCI device.
>>>
>>> Instead, the existing code assumes that byte stream in read/write is
>>> little-endian and it fixes endianness for values which it passes to
>>> ioreadXX/iowriteXX helpers. This works for little-endian as PCI is
>>> little endian and le32_to_cpu/... are stubs.
>>
>> vfio read32:
>>
>> val = cpu_to_le32(ioread32(io + off));
>>
>> Where the typical x86 case, ioread32 is:
>>
>> #define ioread32(addr)  readl(addr)
>>
>> and readl is:
>>
>> __le32_to_cpu(__raw_readl(addr));
>>
>> So we do canceling byte swaps, which are both nops on x86, and end up
>> returning device endian, which we assume is little endian.
>>
>> vfio write32 is similar:
>>
>> iowrite32(le32_to_cpu(val), io + off);
>>
>> The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel
>> out, so input data is device endian, which is assumed little.
>>
>>> This also works for big endian but rather by an accident: it reads 4 bytes
>>> from the stream (@val is big endian), converts to CPU format (which should
>>> be big endian) as it was little endian (@val becomes actually little
>>> endian) and calls iowrite32() which does not do swapping on big endian
>>> system.
>>
>> Really?
>>
>> In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around
>> writel(), which seems to use the generic implementation, which does
>> include a cpu_to_le32.
> 
> 
> Ouch, wrong comment. iowrite32() does swapping. My bad.
> 
> 
>>
>> I also see other big endian archs like parisc doing cpu_to_le32 on
>> iowrite32, so I don't think this statement is true.  I imagine it's
>> probably working for you because the swap cancel.
>>
>>> This removes byte swapping and makes use ioread32be/iowrite32be
>>> (and 16bit versions) on big-endian systems. The "be" helpers take
>>> native endian values and do swapping at the moment of writing to a PCI
>>> register using one of "store byte-reversed" instructions.
>>
>> So now you want iowrite32() on little endian and iowrite32be() on big
>> endian, the former does a cpu_to_le32 (which is a nop on little endian)
>> and the latter does a cpu_to_be32 (which is a nop on big endian)...
>> should we just be using __raw_writel() on both?
> 
> 
> We can do that too. The beauty of iowrite32be on ppc64 is that it does not
> swap and write separately, it is implemented via the "Store Word
> Byte-Reverse Indexed X-form" single instruction.
> 
> And some archs (do not know which ones) may add memory barriers in their
> implementations of ioread/iowrite. __raw_writel is too raw :)
> 
>>  There doesn't actually
>> seem to be any change in behavior here, it just eliminates back-to-back
>> byte swaps, which are a nop on x86, but not power, right?
> 
> Exactly. No dependency for QEMU.

How about that:
===

VFIO exposes BARs to user space as a byte stream so userspace can
read it using pread()/pwrite(). Since this is a byte stream, VFIO should
not do byte swapping and simply return values as it gets them from
PCI device.

Instead, the existing code assumes that byte stream in read/write is
little-endian and it fixes endianness for values which it passes to
ioreadXX/iowriteXX helpers in native format. The IO helpers do swapping
again. Since both byte swaps are nops on little-endian host, this works.

This also works for big endian but rather by an accident: it reads 4 bytes
from the stream (@val is big endian), converts to CPU format (which should
be big endian) as it was little endian (and @val becomes actually little
endian) and calls iowrite32() which does swapping on big endian
system again. So byte swap gets cancelled, __raw_writel() receives
a native value and then
*(volatile unsigned int __force *)PCI_FIX_ADDR(addr) = v;
just does the right thing.

This removes byte swaps and makes use of ioread32be/iowrite32be
(and 16bit versions) which do explicit byte swapping at the moment
of write to a PCI register. PPC64 uses a special "Store Word
Byte-Reverse Indexed X-form" instruction which does swap and store.
===

any better?




>>> Suggested-by: Benjamin Herrenschmidt 
>>> Signed-off-by: Alexey Kardashevskiy 
>>> ---
>>>  drivers/vfio/pci/vfio_pci_rdwr.c | 20 
>>>  1 file changed, 16 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c 
>>> b/drivers/vfio/pci/vfio_pci_rdwr.c
>>> index 210db24..f363b5a 100644
>>> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
>>> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
>>> @@ -21,6 +21,18 @@
>>>  
>>>  #include "vfio_pci_private.h"
>>>  
>>> +#ifdef __BIG_ENDIAN__
>>> +#define ioread16_nativeioread16be
>>> +#define ioread32_native

Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-18 Thread Alexey Kardashevskiy
On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote:
> On 06/19/2014 04:35 AM, Alex Williamson wrote:
>> On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote:
>>> VFIO exposes BARs to user space as a byte stream so userspace can
>>> read it using pread()/pwrite(). Since this is a byte stream, VFIO should
>>> not do byte swapping and simply return values as it gets them from
>>> PCI device.
>>>
>>> Instead, the existing code assumes that byte stream in read/write is
>>> little-endian and it fixes endianness for values which it passes to
>>> ioreadXX/iowriteXX helpers. This works for little-endian as PCI is
>>> little endian and le32_to_cpu/... are stubs.
>>
>> vfio read32:
>>
>> val = cpu_to_le32(ioread32(io + off));
>>
>> Where the typical x86 case, ioread32 is:
>>
>> #define ioread32(addr)  readl(addr)
>>
>> and readl is:
>>
>> __le32_to_cpu(__raw_readl(addr));
>>
>> So we do canceling byte swaps, which are both nops on x86, and end up
>> returning device endian, which we assume is little endian.
>>
>> vfio write32 is similar:
>>
>> iowrite32(le32_to_cpu(val), io + off);
>>
>> The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel
>> out, so input data is device endian, which is assumed little.
>>
>>> This also works for big endian but rather by an accident: it reads 4 bytes
>>> from the stream (@val is big endian), converts to CPU format (which should
>>> be big endian) as it was little endian (@val becomes actually little
>>> endian) and calls iowrite32() which does not do swapping on big endian
>>> system.
>>
>> Really?
>>
>> In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around
>> writel(), which seems to use the generic implementation, which does
>> include a cpu_to_le32.
> 
> 
> Ouch, wrong comment. iowrite32() does swapping. My bad.
> 
> 
>>
>> I also see other big endian archs like parisc doing cpu_to_le32 on
>> iowrite32, so I don't think this statement is true.  I imagine it's
>> probably working for you because the swap cancel.
>>
>>> This removes byte swapping and makes use ioread32be/iowrite32be
>>> (and 16bit versions) on big-endian systems. The "be" helpers take
>>> native endian values and do swapping at the moment of writing to a PCI
>>> register using one of "store byte-reversed" instructions.
>>
>> So now you want iowrite32() on little endian and iowrite32be() on big
>> endian, the former does a cpu_to_le32 (which is a nop on little endian)
>> and the latter does a cpu_to_be32 (which is a nop on big endian)...
>> should we just be using __raw_writel() on both?
> 
> 
> We can do that too. The beauty of iowrite32be on ppc64 is that it does not
> swap and write separately, it is implemented via the "Store Word
> Byte-Reverse Indexed X-form" single instruction.
> 
> And some archs (do not know which ones) may add memory barriers in their
> implementations of ioread/iowrite. __raw_writel is too raw :)
> 
>>  There doesn't actually
>> seem to be any change in behavior here, it just eliminates back-to-back
>> byte swaps, which are a nop on x86, but not power, right?
> 
> Exactly. No dependency for QEMU.

How about that:
===

VFIO exposes BARs to user space as a byte stream so userspace can
read it using pread()/pwrite(). Since this is a byte stream, VFIO should
not do byte swapping and simply return values as it gets them from
PCI device.

Instead, the existing code assumes that byte stream in read/write is
little-endian and it fixes endianness for values which it passes to
ioreadXX/iowriteXX helpers in native format. The IO helpers do swapping
again. Since both byte swaps are nops on little-endian host, this works.

This also works for big endian but rather by an accident: it reads 4 bytes
from the stream (@val is big endian), converts to CPU format (which should
be big endian) as it was little endian (and @val becomes actually little
endian) and calls iowrite32() which does swapping on big endian
system again. So byte swap gets cancelled, __raw_writel() receives
a native value and then
*(volatile unsigned int __force *)PCI_FIX_ADDR(addr) = v;
just does the right thing.

This removes byte swaps and makes use of ioread32be/iowrite32be
(and 16bit versions) which do explicit byte swapping at the moment
of write to a PCI register. PPC64 uses a special "Store Word
Byte-Reverse Indexed X-form" instruction which does swap and store.
===

any better?




>>> Suggested-by: Benjamin Herrenschmidt 
>>> Signed-off-by: Alexey Kardashevskiy 
>>> ---
>>>  drivers/vfio/pci/vfio_pci_rdwr.c | 20 
>>>  1 file changed, 16 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c 
>>> b/drivers/vfio/pci/vfio_pci_rdwr.c
>>> index 210db24..f363b5a 100644
>>> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
>>> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
>>> @@ -21,6 +21,18 @@
>>>  
>>>  #include "vfio_pci_private.h"
>>>  
>>> +#ifdef __BIG_ENDIAN__
>>> +#define ioread16_nativeioread16be
>>> +#define ioread32_native

Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-18 Thread Alexey Kardashevskiy
On 06/19/2014 04:35 AM, Alex Williamson wrote:
> On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote:
>> VFIO exposes BARs to user space as a byte stream so userspace can
>> read it using pread()/pwrite(). Since this is a byte stream, VFIO should
>> not do byte swapping and simply return values as it gets them from
>> PCI device.
>>
>> Instead, the existing code assumes that byte stream in read/write is
>> little-endian and it fixes endianness for values which it passes to
>> ioreadXX/iowriteXX helpers. This works for little-endian as PCI is
>> little endian and le32_to_cpu/... are stubs.
> 
> vfio read32:
> 
> val = cpu_to_le32(ioread32(io + off));
> 
> Where the typical x86 case, ioread32 is:
> 
> #define ioread32(addr)  readl(addr)
> 
> and readl is:
> 
> __le32_to_cpu(__raw_readl(addr));
> 
> So we do canceling byte swaps, which are both nops on x86, and end up
> returning device endian, which we assume is little endian.
> 
> vfio write32 is similar:
> 
> iowrite32(le32_to_cpu(val), io + off);
> 
> The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel
> out, so input data is device endian, which is assumed little.
> 
>> This also works for big endian but rather by an accident: it reads 4 bytes
>> from the stream (@val is big endian), converts to CPU format (which should
>> be big endian) as it was little endian (@val becomes actually little
>> endian) and calls iowrite32() which does not do swapping on big endian
>> system.
> 
> Really?
> 
> In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around
> writel(), which seems to use the generic implementation, which does
> include a cpu_to_le32.


Ouch, wrong comment. iowrite32() does swapping. My bad.


> 
> I also see other big endian archs like parisc doing cpu_to_le32 on
> iowrite32, so I don't think this statement is true.  I imagine it's
> probably working for you because the swap cancel.
> 
>> This removes byte swapping and makes use ioread32be/iowrite32be
>> (and 16bit versions) on big-endian systems. The "be" helpers take
>> native endian values and do swapping at the moment of writing to a PCI
>> register using one of "store byte-reversed" instructions.
> 
> So now you want iowrite32() on little endian and iowrite32be() on big
> endian, the former does a cpu_to_le32 (which is a nop on little endian)
> and the latter does a cpu_to_be32 (which is a nop on big endian)...
> should we just be using __raw_writel() on both?


We can do that too. The beauty of iowrite32be on ppc64 is that it does not
swap and write separately, it is implemented via the "Store Word
Byte-Reverse Indexed X-form" single instruction.

And some archs (do not know which ones) may add memory barriers in their
implementations of ioread/iowrite. __raw_writel is too raw :)

>  There doesn't actually
> seem to be any change in behavior here, it just eliminates back-to-back
> byte swaps, which are a nop on x86, but not power, right?

Exactly. No dependency for QEMU.


> 
>> Suggested-by: Benjamin Herrenschmidt 
>> Signed-off-by: Alexey Kardashevskiy 
>> ---
>>  drivers/vfio/pci/vfio_pci_rdwr.c | 20 
>>  1 file changed, 16 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c 
>> b/drivers/vfio/pci/vfio_pci_rdwr.c
>> index 210db24..f363b5a 100644
>> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
>> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
>> @@ -21,6 +21,18 @@
>>  
>>  #include "vfio_pci_private.h"
>>  
>> +#ifdef __BIG_ENDIAN__
>> +#define ioread16_native ioread16be
>> +#define ioread32_native ioread32be
>> +#define iowrite16_nativeiowrite16be
>> +#define iowrite32_nativeiowrite32be
>> +#else
>> +#define ioread16_native ioread16
>> +#define ioread32_native ioread32
>> +#define iowrite16_nativeiowrite16
>> +#define iowrite32_nativeiowrite32
>> +#endif
>> +
>>  /*
>>   * Read or write from an __iomem region (MMIO or I/O port) with an excluded
>>   * range which is inaccessible.  The excluded range drops writes and fills
>> @@ -50,9 +62,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
>>  if (copy_from_user(, buf, 4))
>>  return -EFAULT;
>>  
>> -iowrite32(le32_to_cpu(val), io + off);
>> +iowrite32_native(val, io + off);
>>  } else {
>> -val = cpu_to_le32(ioread32(io + off));
>> +val = ioread32_native(io + off);
>>  
>>  if (copy_to_user(buf, , 4))
>>  return -EFAULT;
>> @@ -66,9 +78,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
>>  if (copy_from_user(, buf, 2))
>>  return -EFAULT;
>>  
>> -iowrite16(le16_to_cpu(val), io + off);
>> +

Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-18 Thread Alex Williamson
On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote:
> VFIO exposes BARs to user space as a byte stream so userspace can
> read it using pread()/pwrite(). Since this is a byte stream, VFIO should
> not do byte swapping and simply return values as it gets them from
> PCI device.
> 
> Instead, the existing code assumes that byte stream in read/write is
> little-endian and it fixes endianness for values which it passes to
> ioreadXX/iowriteXX helpers. This works for little-endian as PCI is
> little endian and le32_to_cpu/... are stubs.

vfio read32:

val = cpu_to_le32(ioread32(io + off));

Where the typical x86 case, ioread32 is:

#define ioread32(addr)  readl(addr)

and readl is:

__le32_to_cpu(__raw_readl(addr));

So we do canceling byte swaps, which are both nops on x86, and end up
returning device endian, which we assume is little endian.

vfio write32 is similar:

iowrite32(le32_to_cpu(val), io + off);

The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel
out, so input data is device endian, which is assumed little.

> This also works for big endian but rather by an accident: it reads 4 bytes
> from the stream (@val is big endian), converts to CPU format (which should
> be big endian) as it was little endian (@val becomes actually little
> endian) and calls iowrite32() which does not do swapping on big endian
> system.

Really?

In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around
writel(), which seems to use the generic implementation, which does
include a cpu_to_le32.

I also see other big endian archs like parisc doing cpu_to_le32 on
iowrite32, so I don't think this statement is true.  I imagine it's
probably working for you because the swap cancel.

> This removes byte swapping and makes use ioread32be/iowrite32be
> (and 16bit versions) on big-endian systems. The "be" helpers take
> native endian values and do swapping at the moment of writing to a PCI
> register using one of "store byte-reversed" instructions.

So now you want iowrite32() on little endian and iowrite32be() on big
endian, the former does a cpu_to_le32 (which is a nop on little endian)
and the latter does a cpu_to_be32 (which is a nop on big endian)...
should we just be using __raw_writel() on both?  There doesn't actually
seem to be any change in behavior here, it just eliminates back-to-back
byte swaps, which are a nop on x86, but not power, right?

> Suggested-by: Benjamin Herrenschmidt 
> Signed-off-by: Alexey Kardashevskiy 
> ---
>  drivers/vfio/pci/vfio_pci_rdwr.c | 20 
>  1 file changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c 
> b/drivers/vfio/pci/vfio_pci_rdwr.c
> index 210db24..f363b5a 100644
> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
> @@ -21,6 +21,18 @@
>  
>  #include "vfio_pci_private.h"
>  
> +#ifdef __BIG_ENDIAN__
> +#define ioread16_native  ioread16be
> +#define ioread32_native  ioread32be
> +#define iowrite16_native iowrite16be
> +#define iowrite32_native iowrite32be
> +#else
> +#define ioread16_native  ioread16
> +#define ioread32_native  ioread32
> +#define iowrite16_native iowrite16
> +#define iowrite32_native iowrite32
> +#endif
> +
>  /*
>   * Read or write from an __iomem region (MMIO or I/O port) with an excluded
>   * range which is inaccessible.  The excluded range drops writes and fills
> @@ -50,9 +62,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
>   if (copy_from_user(, buf, 4))
>   return -EFAULT;
>  
> - iowrite32(le32_to_cpu(val), io + off);
> + iowrite32_native(val, io + off);
>   } else {
> - val = cpu_to_le32(ioread32(io + off));
> + val = ioread32_native(io + off);
>  
>   if (copy_to_user(buf, , 4))
>   return -EFAULT;
> @@ -66,9 +78,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
>   if (copy_from_user(, buf, 2))
>   return -EFAULT;
>  
> - iowrite16(le16_to_cpu(val), io + off);
> + iowrite16_native(val, io + off);
>   } else {
> - val = cpu_to_le16(ioread16(io + off));
> + val = ioread16_native(io + off);
>  
>   if (copy_to_user(buf, , 2))
>   return -EFAULT;



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-18 Thread Alex Williamson
On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote:
 VFIO exposes BARs to user space as a byte stream so userspace can
 read it using pread()/pwrite(). Since this is a byte stream, VFIO should
 not do byte swapping and simply return values as it gets them from
 PCI device.
 
 Instead, the existing code assumes that byte stream in read/write is
 little-endian and it fixes endianness for values which it passes to
 ioreadXX/iowriteXX helpers. This works for little-endian as PCI is
 little endian and le32_to_cpu/... are stubs.

vfio read32:

val = cpu_to_le32(ioread32(io + off));

Where the typical x86 case, ioread32 is:

#define ioread32(addr)  readl(addr)

and readl is:

__le32_to_cpu(__raw_readl(addr));

So we do canceling byte swaps, which are both nops on x86, and end up
returning device endian, which we assume is little endian.

vfio write32 is similar:

iowrite32(le32_to_cpu(val), io + off);

The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel
out, so input data is device endian, which is assumed little.

 This also works for big endian but rather by an accident: it reads 4 bytes
 from the stream (@val is big endian), converts to CPU format (which should
 be big endian) as it was little endian (@val becomes actually little
 endian) and calls iowrite32() which does not do swapping on big endian
 system.

Really?

In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around
writel(), which seems to use the generic implementation, which does
include a cpu_to_le32.

I also see other big endian archs like parisc doing cpu_to_le32 on
iowrite32, so I don't think this statement is true.  I imagine it's
probably working for you because the swap cancel.

 This removes byte swapping and makes use ioread32be/iowrite32be
 (and 16bit versions) on big-endian systems. The be helpers take
 native endian values and do swapping at the moment of writing to a PCI
 register using one of store byte-reversed instructions.

So now you want iowrite32() on little endian and iowrite32be() on big
endian, the former does a cpu_to_le32 (which is a nop on little endian)
and the latter does a cpu_to_be32 (which is a nop on big endian)...
should we just be using __raw_writel() on both?  There doesn't actually
seem to be any change in behavior here, it just eliminates back-to-back
byte swaps, which are a nop on x86, but not power, right?

 Suggested-by: Benjamin Herrenschmidt b...@kernel.crashing.org
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
  drivers/vfio/pci/vfio_pci_rdwr.c | 20 
  1 file changed, 16 insertions(+), 4 deletions(-)
 
 diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c 
 b/drivers/vfio/pci/vfio_pci_rdwr.c
 index 210db24..f363b5a 100644
 --- a/drivers/vfio/pci/vfio_pci_rdwr.c
 +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
 @@ -21,6 +21,18 @@
  
  #include vfio_pci_private.h
  
 +#ifdef __BIG_ENDIAN__
 +#define ioread16_native  ioread16be
 +#define ioread32_native  ioread32be
 +#define iowrite16_native iowrite16be
 +#define iowrite32_native iowrite32be
 +#else
 +#define ioread16_native  ioread16
 +#define ioread32_native  ioread32
 +#define iowrite16_native iowrite16
 +#define iowrite32_native iowrite32
 +#endif
 +
  /*
   * Read or write from an __iomem region (MMIO or I/O port) with an excluded
   * range which is inaccessible.  The excluded range drops writes and fills
 @@ -50,9 +62,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
   if (copy_from_user(val, buf, 4))
   return -EFAULT;
  
 - iowrite32(le32_to_cpu(val), io + off);
 + iowrite32_native(val, io + off);
   } else {
 - val = cpu_to_le32(ioread32(io + off));
 + val = ioread32_native(io + off);
  
   if (copy_to_user(buf, val, 4))
   return -EFAULT;
 @@ -66,9 +78,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
   if (copy_from_user(val, buf, 2))
   return -EFAULT;
  
 - iowrite16(le16_to_cpu(val), io + off);
 + iowrite16_native(val, io + off);
   } else {
 - val = cpu_to_le16(ioread16(io + off));
 + val = ioread16_native(io + off);
  
   if (copy_to_user(buf, val, 2))
   return -EFAULT;



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-18 Thread Alexey Kardashevskiy
On 06/19/2014 04:35 AM, Alex Williamson wrote:
 On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote:
 VFIO exposes BARs to user space as a byte stream so userspace can
 read it using pread()/pwrite(). Since this is a byte stream, VFIO should
 not do byte swapping and simply return values as it gets them from
 PCI device.

 Instead, the existing code assumes that byte stream in read/write is
 little-endian and it fixes endianness for values which it passes to
 ioreadXX/iowriteXX helpers. This works for little-endian as PCI is
 little endian and le32_to_cpu/... are stubs.
 
 vfio read32:
 
 val = cpu_to_le32(ioread32(io + off));
 
 Where the typical x86 case, ioread32 is:
 
 #define ioread32(addr)  readl(addr)
 
 and readl is:
 
 __le32_to_cpu(__raw_readl(addr));
 
 So we do canceling byte swaps, which are both nops on x86, and end up
 returning device endian, which we assume is little endian.
 
 vfio write32 is similar:
 
 iowrite32(le32_to_cpu(val), io + off);
 
 The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel
 out, so input data is device endian, which is assumed little.
 
 This also works for big endian but rather by an accident: it reads 4 bytes
 from the stream (@val is big endian), converts to CPU format (which should
 be big endian) as it was little endian (@val becomes actually little
 endian) and calls iowrite32() which does not do swapping on big endian
 system.
 
 Really?
 
 In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around
 writel(), which seems to use the generic implementation, which does
 include a cpu_to_le32.


Ouch, wrong comment. iowrite32() does swapping. My bad.


 
 I also see other big endian archs like parisc doing cpu_to_le32 on
 iowrite32, so I don't think this statement is true.  I imagine it's
 probably working for you because the swap cancel.
 
 This removes byte swapping and makes use ioread32be/iowrite32be
 (and 16bit versions) on big-endian systems. The be helpers take
 native endian values and do swapping at the moment of writing to a PCI
 register using one of store byte-reversed instructions.
 
 So now you want iowrite32() on little endian and iowrite32be() on big
 endian, the former does a cpu_to_le32 (which is a nop on little endian)
 and the latter does a cpu_to_be32 (which is a nop on big endian)...
 should we just be using __raw_writel() on both?


We can do that too. The beauty of iowrite32be on ppc64 is that it does not
swap and write separately, it is implemented via the Store Word
Byte-Reverse Indexed X-form single instruction.

And some archs (do not know which ones) may add memory barriers in their
implementations of ioread/iowrite. __raw_writel is too raw :)

  There doesn't actually
 seem to be any change in behavior here, it just eliminates back-to-back
 byte swaps, which are a nop on x86, but not power, right?

Exactly. No dependency for QEMU.


 
 Suggested-by: Benjamin Herrenschmidt b...@kernel.crashing.org
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
  drivers/vfio/pci/vfio_pci_rdwr.c | 20 
  1 file changed, 16 insertions(+), 4 deletions(-)

 diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c 
 b/drivers/vfio/pci/vfio_pci_rdwr.c
 index 210db24..f363b5a 100644
 --- a/drivers/vfio/pci/vfio_pci_rdwr.c
 +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
 @@ -21,6 +21,18 @@
  
  #include vfio_pci_private.h
  
 +#ifdef __BIG_ENDIAN__
 +#define ioread16_native ioread16be
 +#define ioread32_native ioread32be
 +#define iowrite16_nativeiowrite16be
 +#define iowrite32_nativeiowrite32be
 +#else
 +#define ioread16_native ioread16
 +#define ioread32_native ioread32
 +#define iowrite16_nativeiowrite16
 +#define iowrite32_nativeiowrite32
 +#endif
 +
  /*
   * Read or write from an __iomem region (MMIO or I/O port) with an excluded
   * range which is inaccessible.  The excluded range drops writes and fills
 @@ -50,9 +62,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
  if (copy_from_user(val, buf, 4))
  return -EFAULT;
  
 -iowrite32(le32_to_cpu(val), io + off);
 +iowrite32_native(val, io + off);
  } else {
 -val = cpu_to_le32(ioread32(io + off));
 +val = ioread32_native(io + off);
  
  if (copy_to_user(buf, val, 4))
  return -EFAULT;
 @@ -66,9 +78,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
  if (copy_from_user(val, buf, 2))
  return -EFAULT;
  
 -iowrite16(le16_to_cpu(val), io + off);
 +iowrite16_native(val, io + off);
  } else {
 -val = cpu_to_le16(ioread16(io + off));
 +  

Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-18 Thread Alexey Kardashevskiy
On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote:
 On 06/19/2014 04:35 AM, Alex Williamson wrote:
 On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote:
 VFIO exposes BARs to user space as a byte stream so userspace can
 read it using pread()/pwrite(). Since this is a byte stream, VFIO should
 not do byte swapping and simply return values as it gets them from
 PCI device.

 Instead, the existing code assumes that byte stream in read/write is
 little-endian and it fixes endianness for values which it passes to
 ioreadXX/iowriteXX helpers. This works for little-endian as PCI is
 little endian and le32_to_cpu/... are stubs.

 vfio read32:

 val = cpu_to_le32(ioread32(io + off));

 Where the typical x86 case, ioread32 is:

 #define ioread32(addr)  readl(addr)

 and readl is:

 __le32_to_cpu(__raw_readl(addr));

 So we do canceling byte swaps, which are both nops on x86, and end up
 returning device endian, which we assume is little endian.

 vfio write32 is similar:

 iowrite32(le32_to_cpu(val), io + off);

 The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel
 out, so input data is device endian, which is assumed little.

 This also works for big endian but rather by an accident: it reads 4 bytes
 from the stream (@val is big endian), converts to CPU format (which should
 be big endian) as it was little endian (@val becomes actually little
 endian) and calls iowrite32() which does not do swapping on big endian
 system.

 Really?

 In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around
 writel(), which seems to use the generic implementation, which does
 include a cpu_to_le32.
 
 
 Ouch, wrong comment. iowrite32() does swapping. My bad.
 
 

 I also see other big endian archs like parisc doing cpu_to_le32 on
 iowrite32, so I don't think this statement is true.  I imagine it's
 probably working for you because the swap cancel.

 This removes byte swapping and makes use ioread32be/iowrite32be
 (and 16bit versions) on big-endian systems. The be helpers take
 native endian values and do swapping at the moment of writing to a PCI
 register using one of store byte-reversed instructions.

 So now you want iowrite32() on little endian and iowrite32be() on big
 endian, the former does a cpu_to_le32 (which is a nop on little endian)
 and the latter does a cpu_to_be32 (which is a nop on big endian)...
 should we just be using __raw_writel() on both?
 
 
 We can do that too. The beauty of iowrite32be on ppc64 is that it does not
 swap and write separately, it is implemented via the Store Word
 Byte-Reverse Indexed X-form single instruction.
 
 And some archs (do not know which ones) may add memory barriers in their
 implementations of ioread/iowrite. __raw_writel is too raw :)
 
  There doesn't actually
 seem to be any change in behavior here, it just eliminates back-to-back
 byte swaps, which are a nop on x86, but not power, right?
 
 Exactly. No dependency for QEMU.

How about that:
===

VFIO exposes BARs to user space as a byte stream so userspace can
read it using pread()/pwrite(). Since this is a byte stream, VFIO should
not do byte swapping and simply return values as it gets them from
PCI device.

Instead, the existing code assumes that byte stream in read/write is
little-endian and it fixes endianness for values which it passes to
ioreadXX/iowriteXX helpers in native format. The IO helpers do swapping
again. Since both byte swaps are nops on little-endian host, this works.

This also works for big endian but rather by an accident: it reads 4 bytes
from the stream (@val is big endian), converts to CPU format (which should
be big endian) as it was little endian (and @val becomes actually little
endian) and calls iowrite32() which does swapping on big endian
system again. So byte swap gets cancelled, __raw_writel() receives
a native value and then
*(volatile unsigned int __force *)PCI_FIX_ADDR(addr) = v;
just does the right thing.

This removes byte swaps and makes use of ioread32be/iowrite32be
(and 16bit versions) which do explicit byte swapping at the moment
of write to a PCI register. PPC64 uses a special Store Word
Byte-Reverse Indexed X-form instruction which does swap and store.
===

any better?




 Suggested-by: Benjamin Herrenschmidt b...@kernel.crashing.org
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
  drivers/vfio/pci/vfio_pci_rdwr.c | 20 
  1 file changed, 16 insertions(+), 4 deletions(-)

 diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c 
 b/drivers/vfio/pci/vfio_pci_rdwr.c
 index 210db24..f363b5a 100644
 --- a/drivers/vfio/pci/vfio_pci_rdwr.c
 +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
 @@ -21,6 +21,18 @@
  
  #include vfio_pci_private.h
  
 +#ifdef __BIG_ENDIAN__
 +#define ioread16_nativeioread16be
 +#define ioread32_nativeioread32be
 +#define iowrite16_native   iowrite16be
 +#define iowrite32_native   iowrite32be
 +#else
 +#define ioread16_nativeioread16
 +#define ioread32_native

Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-18 Thread Alexey Kardashevskiy
On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote:
 On 06/19/2014 04:35 AM, Alex Williamson wrote:
 On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote:
 VFIO exposes BARs to user space as a byte stream so userspace can
 read it using pread()/pwrite(). Since this is a byte stream, VFIO should
 not do byte swapping and simply return values as it gets them from
 PCI device.

 Instead, the existing code assumes that byte stream in read/write is
 little-endian and it fixes endianness for values which it passes to
 ioreadXX/iowriteXX helpers. This works for little-endian as PCI is
 little endian and le32_to_cpu/... are stubs.

 vfio read32:

 val = cpu_to_le32(ioread32(io + off));

 Where the typical x86 case, ioread32 is:

 #define ioread32(addr)  readl(addr)

 and readl is:

 __le32_to_cpu(__raw_readl(addr));

 So we do canceling byte swaps, which are both nops on x86, and end up
 returning device endian, which we assume is little endian.

 vfio write32 is similar:

 iowrite32(le32_to_cpu(val), io + off);

 The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel
 out, so input data is device endian, which is assumed little.

 This also works for big endian but rather by an accident: it reads 4 bytes
 from the stream (@val is big endian), converts to CPU format (which should
 be big endian) as it was little endian (@val becomes actually little
 endian) and calls iowrite32() which does not do swapping on big endian
 system.

 Really?

 In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around
 writel(), which seems to use the generic implementation, which does
 include a cpu_to_le32.
 
 
 Ouch, wrong comment. iowrite32() does swapping. My bad.
 
 

 I also see other big endian archs like parisc doing cpu_to_le32 on
 iowrite32, so I don't think this statement is true.  I imagine it's
 probably working for you because the swap cancel.

 This removes byte swapping and makes use ioread32be/iowrite32be
 (and 16bit versions) on big-endian systems. The be helpers take
 native endian values and do swapping at the moment of writing to a PCI
 register using one of store byte-reversed instructions.

 So now you want iowrite32() on little endian and iowrite32be() on big
 endian, the former does a cpu_to_le32 (which is a nop on little endian)
 and the latter does a cpu_to_be32 (which is a nop on big endian)...
 should we just be using __raw_writel() on both?
 
 
 We can do that too. The beauty of iowrite32be on ppc64 is that it does not
 swap and write separately, it is implemented via the Store Word
 Byte-Reverse Indexed X-form single instruction.
 
 And some archs (do not know which ones) may add memory barriers in their
 implementations of ioread/iowrite. __raw_writel is too raw :)
 
  There doesn't actually
 seem to be any change in behavior here, it just eliminates back-to-back
 byte swaps, which are a nop on x86, but not power, right?
 
 Exactly. No dependency for QEMU.

How about that:
===

VFIO exposes BARs to user space as a byte stream so userspace can
read it using pread()/pwrite(). Since this is a byte stream, VFIO should
not do byte swapping and simply return values as it gets them from
PCI device.

Instead, the existing code assumes that byte stream in read/write is
little-endian and it fixes endianness for values which it passes to
ioreadXX/iowriteXX helpers in native format. The IO helpers do swapping
again. Since both byte swaps are nops on little-endian host, this works.

This also works for big endian but rather by an accident: it reads 4 bytes
from the stream (@val is big endian), converts to CPU format (which should
be big endian) as it was little endian (and @val becomes actually little
endian) and calls iowrite32() which does swapping on big endian
system again. So byte swap gets cancelled, __raw_writel() receives
a native value and then
*(volatile unsigned int __force *)PCI_FIX_ADDR(addr) = v;
just does the right thing.

This removes byte swaps and makes use of ioread32be/iowrite32be
(and 16bit versions) which do explicit byte swapping at the moment
of write to a PCI register. PPC64 uses a special Store Word
Byte-Reverse Indexed X-form instruction which does swap and store.
===

any better?




 Suggested-by: Benjamin Herrenschmidt b...@kernel.crashing.org
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
  drivers/vfio/pci/vfio_pci_rdwr.c | 20 
  1 file changed, 16 insertions(+), 4 deletions(-)

 diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c 
 b/drivers/vfio/pci/vfio_pci_rdwr.c
 index 210db24..f363b5a 100644
 --- a/drivers/vfio/pci/vfio_pci_rdwr.c
 +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
 @@ -21,6 +21,18 @@
  
  #include vfio_pci_private.h
  
 +#ifdef __BIG_ENDIAN__
 +#define ioread16_nativeioread16be
 +#define ioread32_nativeioread32be
 +#define iowrite16_native   iowrite16be
 +#define iowrite32_native   iowrite32be
 +#else
 +#define ioread16_nativeioread16
 +#define ioread32_native

Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-18 Thread Alexey Kardashevskiy
On 06/19/2014 11:50 AM, Alexey Kardashevskiy wrote:
 On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote:
 On 06/19/2014 04:35 AM, Alex Williamson wrote:
 On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote:
 VFIO exposes BARs to user space as a byte stream so userspace can
 read it using pread()/pwrite(). Since this is a byte stream, VFIO should
 not do byte swapping and simply return values as it gets them from
 PCI device.

 Instead, the existing code assumes that byte stream in read/write is
 little-endian and it fixes endianness for values which it passes to
 ioreadXX/iowriteXX helpers. This works for little-endian as PCI is
 little endian and le32_to_cpu/... are stubs.

 vfio read32:

 val = cpu_to_le32(ioread32(io + off));

 Where the typical x86 case, ioread32 is:

 #define ioread32(addr)  readl(addr)

 and readl is:

 __le32_to_cpu(__raw_readl(addr));

 So we do canceling byte swaps, which are both nops on x86, and end up
 returning device endian, which we assume is little endian.

 vfio write32 is similar:

 iowrite32(le32_to_cpu(val), io + off);

 The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel
 out, so input data is device endian, which is assumed little.

 This also works for big endian but rather by an accident: it reads 4 bytes
 from the stream (@val is big endian), converts to CPU format (which should
 be big endian) as it was little endian (@val becomes actually little
 endian) and calls iowrite32() which does not do swapping on big endian
 system.

 Really?

 In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around
 writel(), which seems to use the generic implementation, which does
 include a cpu_to_le32.


 Ouch, wrong comment. iowrite32() does swapping. My bad.



 I also see other big endian archs like parisc doing cpu_to_le32 on
 iowrite32, so I don't think this statement is true.  I imagine it's
 probably working for you because the swap cancel.

 This removes byte swapping and makes use ioread32be/iowrite32be
 (and 16bit versions) on big-endian systems. The be helpers take
 native endian values and do swapping at the moment of writing to a PCI
 register using one of store byte-reversed instructions.

 So now you want iowrite32() on little endian and iowrite32be() on big
 endian, the former does a cpu_to_le32 (which is a nop on little endian)
 and the latter does a cpu_to_be32 (which is a nop on big endian)...
 should we just be using __raw_writel() on both?


 We can do that too. The beauty of iowrite32be on ppc64 is that it does not
 swap and write separately, it is implemented via the Store Word
 Byte-Reverse Indexed X-form single instruction.

 And some archs (do not know which ones) may add memory barriers in their
 implementations of ioread/iowrite. __raw_writel is too raw :)

  There doesn't actually
 seem to be any change in behavior here, it just eliminates back-to-back
 byte swaps, which are a nop on x86, but not power, right?

 Exactly. No dependency for QEMU.
 
 How about that:
 ===
 
 VFIO exposes BARs to user space as a byte stream so userspace can
 read it using pread()/pwrite(). Since this is a byte stream, VFIO should
 not do byte swapping and simply return values as it gets them from
 PCI device.
 
 Instead, the existing code assumes that byte stream in read/write is
 little-endian and it fixes endianness for values which it passes to
 ioreadXX/iowriteXX helpers in native format. The IO helpers do swapping
 again. Since both byte swaps are nops on little-endian host, this works.
 
 This also works for big endian but rather by an accident: it reads 4 bytes
 from the stream (@val is big endian), converts to CPU format (which should
 be big endian) as it was little endian (and @val becomes actually little
 endian) and calls iowrite32() which does swapping on big endian
 system again. So byte swap gets cancelled, __raw_writel() receives
 a native value and then
 *(volatile unsigned int __force *)PCI_FIX_ADDR(addr) = v;
 just does the right thing.

I am wrong here, sorry. This is what happens when you watch soccer between
2am and 4am :)


 
 This removes byte swaps and makes use of ioread32be/iowrite32be
 (and 16bit versions) which do explicit byte swapping at the moment
 of write to a PCI register. PPC64 uses a special Store Word
 Byte-Reverse Indexed X-form instruction which does swap and store.

No swapping is done here if we use ioread32be as it calls in_be32 and that
animal does lwz which is simple load from memory.

So @val (16/32 bit variable on stack) will have different values on LE and
BE but since we do not handle it the host and just memcpy it to the buffer,
nothing breaks here.


So it should be like this:
===
VFIO exposes BARs to user space as a byte stream so userspace can
read it using pread()/pwrite(). Since this is a byte stream, VFIO should
not do byte swapping and simply return values as it gets them from
PCI device and copy_to_user will save bytes in the correct
same true for writes.

Instead, the 

RE: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-18 Thread bharat.bhus...@freescale.com


 -Original Message-
 From: Linuxppc-dev [mailto:linuxppc-dev-
 bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Alexey
 Kardashevskiy
 Sent: Thursday, June 19, 2014 9:18 AM
 To: Alex Williamson
 Cc: k...@vger.kernel.org; Nikunj A Dadhania; linux-kernel@vger.kernel.org;
 Alexander Graf; linuxppc-...@lists.ozlabs.org
 Subject: Re: [PATCH] vfio: Fix endianness handling for emulated BARs
 
 On 06/19/2014 11:50 AM, Alexey Kardashevskiy wrote:
  On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote:
  On 06/19/2014 04:35 AM, Alex Williamson wrote:
  On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote:
  VFIO exposes BARs to user space as a byte stream so userspace can
  read it using pread()/pwrite(). Since this is a byte stream, VFIO
  should not do byte swapping and simply return values as it gets
  them from PCI device.
 
  Instead, the existing code assumes that byte stream in read/write
  is little-endian and it fixes endianness for values which it passes
  to ioreadXX/iowriteXX helpers. This works for little-endian as PCI
  is little endian and le32_to_cpu/... are stubs.
 
  vfio read32:
 
  val = cpu_to_le32(ioread32(io + off));
 
  Where the typical x86 case, ioread32 is:
 
  #define ioread32(addr)  readl(addr)
 
  and readl is:
 
  __le32_to_cpu(__raw_readl(addr));
 
  So we do canceling byte swaps, which are both nops on x86, and end
  up returning device endian, which we assume is little endian.
 
  vfio write32 is similar:
 
  iowrite32(le32_to_cpu(val), io + off);
 
  The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel
  out, so input data is device endian, which is assumed little.
 
  This also works for big endian but rather by an accident: it reads
  4 bytes from the stream (@val is big endian), converts to CPU
  format (which should be big endian) as it was little endian (@val
  becomes actually little
  endian) and calls iowrite32() which does not do swapping on big
  endian system.
 
  Really?
 
  In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around
  writel(), which seems to use the generic implementation, which does
  include a cpu_to_le32.
 
 
  Ouch, wrong comment. iowrite32() does swapping. My bad.
 
 
 
  I also see other big endian archs like parisc doing cpu_to_le32 on
  iowrite32, so I don't think this statement is true.  I imagine it's
  probably working for you because the swap cancel.
 
  This removes byte swapping and makes use ioread32be/iowrite32be
  (and 16bit versions) on big-endian systems. The be helpers take
  native endian values and do swapping at the moment of writing to a
  PCI register using one of store byte-reversed instructions.
 
  So now you want iowrite32() on little endian and iowrite32be() on
  big endian, the former does a cpu_to_le32 (which is a nop on little
  endian) and the latter does a cpu_to_be32 (which is a nop on big 
  endian)...
  should we just be using __raw_writel() on both?
 
 
  We can do that too. The beauty of iowrite32be on ppc64 is that it
  does not swap and write separately, it is implemented via the Store
  Word Byte-Reverse Indexed X-form single instruction.
 
  And some archs (do not know which ones) may add memory barriers in
  their implementations of ioread/iowrite. __raw_writel is too raw :)
 
   There doesn't actually
  seem to be any change in behavior here, it just eliminates
  back-to-back byte swaps, which are a nop on x86, but not power, right?
 
  Exactly. No dependency for QEMU.
 
  How about that:
  ===
 
  VFIO exposes BARs to user space as a byte stream so userspace can read
  it using pread()/pwrite(). Since this is a byte stream, VFIO should
  not do byte swapping and simply return values as it gets them from PCI
  device.
 
  Instead, the existing code assumes that byte stream in read/write is
  little-endian and it fixes endianness for values which it passes to
  ioreadXX/iowriteXX helpers in native format. The IO helpers do
  swapping again. Since both byte swaps are nops on little-endian host, this
 works.
 
  This also works for big endian but rather by an accident: it reads 4
  bytes from the stream (@val is big endian), converts to CPU format
  (which should be big endian) as it was little endian (and @val becomes
  actually little
  endian) and calls iowrite32() which does swapping on big endian system
  again. So byte swap gets cancelled, __raw_writel() receives a native
  value and then *(volatile unsigned int __force *)PCI_FIX_ADDR(addr) =
  v; just does the right thing.
 
 I am wrong here, sorry. This is what happens when you watch soccer between 2am
 and 4am :)
 
 
 
  This removes byte swaps and makes use of ioread32be/iowrite32be (and
  16bit versions) which do explicit byte swapping at the moment of write
  to a PCI register. PPC64 uses a special Store Word Byte-Reverse
  Indexed X-form instruction which does swap and store.
 
 No swapping is done here if we use ioread32be as it calls in_be32 and that
 animal does lwz which