Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On 06/25/2014 07:54 AM, Benjamin Herrenschmidt wrote: > On Wed, 2014-06-25 at 00:33 +1000, Alexey Kardashevskiy wrote: >> >> I do not understand why @val is considered LE here and need to be >> converted >> to CPU. Really. I truly believe it should be cpu_to_le32(). > > No. Both are slightly wrong semantically but le32_to_cpu() is less > wrong :-) > > iowrite32 supposedly takes a "cpu" value as argument and writes an "le" > value. So if anything, you need something that converts to a "cpu" value > before you call iowrite32. > > Now it's still slightly wrong because the "input" to le32_to_cpu() is > supposed to be an "LE" value but of course here it's not, it's a "cpu" > value too :-) > > But yes, I agree with aw here, either do nothing or stick a set of > iowriteXX_native or something equivalent in the generic iomap header, > define them in term of iowrite32be/iowrite32 based on the compile time > endian of the arch. Ok. I'll do nothing. > Hitting asm-generic/iomap.h I think will cover all archs except ARM. > For ARM, just hack arch/arm/asm/io.h -- Alexey -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On Wed, 2014-06-25 at 00:33 +1000, Alexey Kardashevskiy wrote: > > I do not understand why @val is considered LE here and need to be > converted > to CPU. Really. I truly believe it should be cpu_to_le32(). No. Both are slightly wrong semantically but le32_to_cpu() is less wrong :-) iowrite32 supposedly takes a "cpu" value as argument and writes an "le" value. So if anything, you need something that converts to a "cpu" value before you call iowrite32. Now it's still slightly wrong because the "input" to le32_to_cpu() is supposed to be an "LE" value but of course here it's not, it's a "cpu" value too :-) But yes, I agree with aw here, either do nothing or stick a set of iowriteXX_native or something equivalent in the generic iomap header, define them in term of iowrite32be/iowrite32 based on the compile time endian of the arch. Hitting asm-generic/iomap.h I think will cover all archs except ARM. For ARM, just hack arch/arm/asm/io.h Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On Tue, 2014-06-24 at 12:41 +0200, Alexander Graf wrote: > Is there actually any difference in generated code with this patch > applied and without? I would hope that iowrite..() is inlined and > cancels out the cpu_to_le..() calls that are also inlined? No, the former uses byteswapping asm, the compiler can't "cancel" it out, but the overhead of the additional byteswap might not be measurable. Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On 06/25/2014 12:43 AM, Alex Williamson wrote: > On Wed, 2014-06-25 at 00:33 +1000, Alexey Kardashevskiy wrote: >> On 06/25/2014 12:21 AM, Alex Williamson wrote: >>> On Tue, 2014-06-24 at 15:22 +0200, Alexander Graf wrote: On 24.06.14 15:01, Alexey Kardashevskiy wrote: > On 06/24/2014 10:52 PM, Alexander Graf wrote: >> On 24.06.14 14:50, Alexey Kardashevskiy wrote: >>> On 06/24/2014 08:41 PM, Alexander Graf wrote: On 24.06.14 12:11, Alexey Kardashevskiy wrote: > On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote: >> On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote: >> >>> Working on big endian being an accident may be a matter of >>> perspective >> :-) >> >>> The comment remains that this patch doesn't actually fix anything >>> except >>> the overhead on big endian systems doing redundant byte swapping and >>> maybe the philosophy that vfio regions are little endian. >> Yes, that works by accident because technically VFIO is a transport >> and >> thus shouldn't perform any endian swapping of any sort, which remains >> the responsibility of the end driver which is the only one to know >> whether a given BAR location is a a register or some streaming data >> and in the former case whether it's LE or BE (some PCI devices are BE >> even ! :-) >> >> But yes, in the end, it works with the dual "cancelling" swaps and >> the >> overhead of those swaps is probably drowned in the noise of the >> syscall >> overhead. >> >>> I'm still not a fan of iowrite vs iowritebe, there must be >>> something we >>> can use that doesn't have an implicit swap. >> Sadly there isn't ... In the old day we didn't even have the "be" >> variant and readl/writel style accessors still don't have them either >> for all archs. >> >> There is __raw_readl/writel but here the semantics are much more than >> just "don't swap", they also don't have memory barriers (which means >> they are essentially useless to most drivers unless those are >> platform >> specific drivers which know exactly what they are doing, or in the >> rare >> cases such as accessing a framebuffer which we know never have side >> effects). >> >>> Calling it iowrite*_native is also an abuse of the namespace. >>> Next thing we know some common code >>> will legitimately use that name. >> I might make sense to those definitions into a common header. There >> have >> been a handful of cases in the past that wanted that sort of "native >> byte order" MMIOs iirc (though don't ask me for examples, I can't >> really >> remember). >> >>> If we do need to define an alias >>> (which I'd like to avoid) it should be something like >>> vfio_iowrite32. > Ping? > > We need to make a decision whether to move those xxx_native() helpers > somewhere (where?) or leave the patch as is (as we figured out that > iowriteXX functions implement barriers and we cannot just use raw > accessors) and fix commit log to explain everything. Is there actually any difference in generated code with this patch applied and without? I would hope that iowrite..() is inlined and cancels out the cpu_to_le..() calls that are also inlined? >>> iowrite32 is a non-inline function so conversions take place so are the >>> others. And sorry but I fail to see why this matters. We are not trying >>> to >>> accelerate things, we are removing redundant operations which confuse >>> people who read the code. >> The confusion depends on where you're coming from. If you happen to know >> that "iowrite32" writes in LE, then the LE conversion makes a lot of >> sense. > It was like this (and this is just confusing): > > iowrite32(le32_to_cpu(val), io + off); > > What would make sense (according to you and I would understand this) is > this: > > iowrite32(cpu_to_le32(val), io + off); > > > Or I missed your point, did I? No, you didn't miss it. I think for people who know how iowrite32() works the above is obvious. I find the fact that iowrite32() writes in LE always pretty scary though ;). So IMHO we should either create new, generic iowrite helpers that don't do any endian swapping at all or do iowrite32(cpu_to_le32(val)) calls. >>> >>> I'm one of those people for whom iowrite32(le32_to_cpu(val)) makes sense >> >> >> I do not understand why @val is considered LE here and need to be converted >> to CPU. Really. I truly believe it should be
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On Wed, 2014-06-25 at 00:33 +1000, Alexey Kardashevskiy wrote: > On 06/25/2014 12:21 AM, Alex Williamson wrote: > > On Tue, 2014-06-24 at 15:22 +0200, Alexander Graf wrote: > >> On 24.06.14 15:01, Alexey Kardashevskiy wrote: > >>> On 06/24/2014 10:52 PM, Alexander Graf wrote: > On 24.06.14 14:50, Alexey Kardashevskiy wrote: > > On 06/24/2014 08:41 PM, Alexander Graf wrote: > >> On 24.06.14 12:11, Alexey Kardashevskiy wrote: > >>> On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote: > On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote: > > > Working on big endian being an accident may be a matter of > > perspective > :-) > > > The comment remains that this patch doesn't actually fix anything > > except > > the overhead on big endian systems doing redundant byte swapping and > > maybe the philosophy that vfio regions are little endian. > Yes, that works by accident because technically VFIO is a transport > and > thus shouldn't perform any endian swapping of any sort, which remains > the responsibility of the end driver which is the only one to know > whether a given BAR location is a a register or some streaming data > and in the former case whether it's LE or BE (some PCI devices are BE > even ! :-) > > But yes, in the end, it works with the dual "cancelling" swaps and > the > overhead of those swaps is probably drowned in the noise of the > syscall > overhead. > > > I'm still not a fan of iowrite vs iowritebe, there must be > > something we > > can use that doesn't have an implicit swap. > Sadly there isn't ... In the old day we didn't even have the "be" > variant and readl/writel style accessors still don't have them either > for all archs. > > There is __raw_readl/writel but here the semantics are much more than > just "don't swap", they also don't have memory barriers (which means > they are essentially useless to most drivers unless those are > platform > specific drivers which know exactly what they are doing, or in the > rare > cases such as accessing a framebuffer which we know never have side > effects). > > > Calling it iowrite*_native is also an abuse of the namespace. > > Next thing we know some common code > > will legitimately use that name. > I might make sense to those definitions into a common header. There > have > been a handful of cases in the past that wanted that sort of "native > byte order" MMIOs iirc (though don't ask me for examples, I can't > really > remember). > > > If we do need to define an alias > > (which I'd like to avoid) it should be something like > > vfio_iowrite32. > >>> Ping? > >>> > >>> We need to make a decision whether to move those xxx_native() helpers > >>> somewhere (where?) or leave the patch as is (as we figured out that > >>> iowriteXX functions implement barriers and we cannot just use raw > >>> accessors) and fix commit log to explain everything. > >> Is there actually any difference in generated code with this patch > >> applied > >> and without? I would hope that iowrite..() is inlined and cancels out > >> the > >> cpu_to_le..() calls that are also inlined? > > iowrite32 is a non-inline function so conversions take place so are the > > others. And sorry but I fail to see why this matters. We are not trying > > to > > accelerate things, we are removing redundant operations which confuse > > people who read the code. > The confusion depends on where you're coming from. If you happen to know > that "iowrite32" writes in LE, then the LE conversion makes a lot of > sense. > >>> It was like this (and this is just confusing): > >>> > >>> iowrite32(le32_to_cpu(val), io + off); > >>> > >>> What would make sense (according to you and I would understand this) is > >>> this: > >>> > >>> iowrite32(cpu_to_le32(val), io + off); > >>> > >>> > >>> Or I missed your point, did I? > >> > >> No, you didn't miss it. I think for people who know how iowrite32() > >> works the above is obvious. I find the fact that iowrite32() writes in > >> LE always pretty scary though ;). > >> > >> So IMHO we should either create new, generic iowrite helpers that don't > >> do any endian swapping at all or do iowrite32(cpu_to_le32(val)) calls. > > > > I'm one of those people for whom iowrite32(le32_to_cpu(val)) makes sense > > > I do not understand why @val is considered LE here and need to be converted > to CPU. Really. I truly believe it should be cpu_to_le32(). Because iowrite32 is defined to take
RE: [PATCH] vfio: Fix endianness handling for emulated BARs
From: Alexey Kardashevskiy ... > >> So IMHO we should either create new, generic iowrite helpers that don't > >> do any endian swapping at all or do iowrite32(cpu_to_le32(val)) calls. > > > > I'm one of those people for whom iowrite32(le32_to_cpu(val)) makes sense > > > I do not understand why @val is considered LE here and need to be converted > to CPU. Really. I truly believe it should be cpu_to_le32(). Think about it carefully... Apparently iowrite32() is writing a 'cpu' value out 'le'. So if you have a 'le' value you need to convert it to 'cpu' first. I really won't ask how 'be' ppc managed to get 'le' on-chip peripherals. David N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a��� 0��h���i
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On 06/25/2014 12:21 AM, Alex Williamson wrote: > On Tue, 2014-06-24 at 15:22 +0200, Alexander Graf wrote: >> On 24.06.14 15:01, Alexey Kardashevskiy wrote: >>> On 06/24/2014 10:52 PM, Alexander Graf wrote: On 24.06.14 14:50, Alexey Kardashevskiy wrote: > On 06/24/2014 08:41 PM, Alexander Graf wrote: >> On 24.06.14 12:11, Alexey Kardashevskiy wrote: >>> On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote: On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote: > Working on big endian being an accident may be a matter of perspective :-) > The comment remains that this patch doesn't actually fix anything > except > the overhead on big endian systems doing redundant byte swapping and > maybe the philosophy that vfio regions are little endian. Yes, that works by accident because technically VFIO is a transport and thus shouldn't perform any endian swapping of any sort, which remains the responsibility of the end driver which is the only one to know whether a given BAR location is a a register or some streaming data and in the former case whether it's LE or BE (some PCI devices are BE even ! :-) But yes, in the end, it works with the dual "cancelling" swaps and the overhead of those swaps is probably drowned in the noise of the syscall overhead. > I'm still not a fan of iowrite vs iowritebe, there must be something > we > can use that doesn't have an implicit swap. Sadly there isn't ... In the old day we didn't even have the "be" variant and readl/writel style accessors still don't have them either for all archs. There is __raw_readl/writel but here the semantics are much more than just "don't swap", they also don't have memory barriers (which means they are essentially useless to most drivers unless those are platform specific drivers which know exactly what they are doing, or in the rare cases such as accessing a framebuffer which we know never have side effects). > Calling it iowrite*_native is also an abuse of the namespace. > Next thing we know some common code > will legitimately use that name. I might make sense to those definitions into a common header. There have been a handful of cases in the past that wanted that sort of "native byte order" MMIOs iirc (though don't ask me for examples, I can't really remember). > If we do need to define an alias > (which I'd like to avoid) it should be something like vfio_iowrite32. >>> Ping? >>> >>> We need to make a decision whether to move those xxx_native() helpers >>> somewhere (where?) or leave the patch as is (as we figured out that >>> iowriteXX functions implement barriers and we cannot just use raw >>> accessors) and fix commit log to explain everything. >> Is there actually any difference in generated code with this patch >> applied >> and without? I would hope that iowrite..() is inlined and cancels out the >> cpu_to_le..() calls that are also inlined? > iowrite32 is a non-inline function so conversions take place so are the > others. And sorry but I fail to see why this matters. We are not trying to > accelerate things, we are removing redundant operations which confuse > people who read the code. The confusion depends on where you're coming from. If you happen to know that "iowrite32" writes in LE, then the LE conversion makes a lot of sense. >>> It was like this (and this is just confusing): >>> >>> iowrite32(le32_to_cpu(val), io + off); >>> >>> What would make sense (according to you and I would understand this) is >>> this: >>> >>> iowrite32(cpu_to_le32(val), io + off); >>> >>> >>> Or I missed your point, did I? >> >> No, you didn't miss it. I think for people who know how iowrite32() >> works the above is obvious. I find the fact that iowrite32() writes in >> LE always pretty scary though ;). >> >> So IMHO we should either create new, generic iowrite helpers that don't >> do any endian swapping at all or do iowrite32(cpu_to_le32(val)) calls. > > I'm one of those people for whom iowrite32(le32_to_cpu(val)) makes sense I do not understand why @val is considered LE here and need to be converted to CPU. Really. I truly believe it should be cpu_to_le32(). > and keeps the byte order consistent regardless of the platform, while > iowrite32(val) or iowrite32be(val) makes me scratch my head and try to > remember that the byte swaps are a nop on the given platforms. As Ben > noted, a native, no-swap ioread/write doesn't exist, but perhaps should. > I'd prefer an attempt be made to make it exist before adding > vfio-specific macros. vfio is arguably
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On Tue, 2014-06-24 at 15:22 +0200, Alexander Graf wrote: > On 24.06.14 15:01, Alexey Kardashevskiy wrote: > > On 06/24/2014 10:52 PM, Alexander Graf wrote: > >> On 24.06.14 14:50, Alexey Kardashevskiy wrote: > >>> On 06/24/2014 08:41 PM, Alexander Graf wrote: > On 24.06.14 12:11, Alexey Kardashevskiy wrote: > > On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote: > >> On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote: > >> > >>> Working on big endian being an accident may be a matter of perspective > >> :-) > >> > >>> The comment remains that this patch doesn't actually fix anything > >>> except > >>> the overhead on big endian systems doing redundant byte swapping and > >>> maybe the philosophy that vfio regions are little endian. > >> Yes, that works by accident because technically VFIO is a transport and > >> thus shouldn't perform any endian swapping of any sort, which remains > >> the responsibility of the end driver which is the only one to know > >> whether a given BAR location is a a register or some streaming data > >> and in the former case whether it's LE or BE (some PCI devices are BE > >> even ! :-) > >> > >> But yes, in the end, it works with the dual "cancelling" swaps and the > >> overhead of those swaps is probably drowned in the noise of the syscall > >> overhead. > >> > >>> I'm still not a fan of iowrite vs iowritebe, there must be something > >>> we > >>> can use that doesn't have an implicit swap. > >> Sadly there isn't ... In the old day we didn't even have the "be" > >> variant and readl/writel style accessors still don't have them either > >> for all archs. > >> > >> There is __raw_readl/writel but here the semantics are much more than > >> just "don't swap", they also don't have memory barriers (which means > >> they are essentially useless to most drivers unless those are platform > >> specific drivers which know exactly what they are doing, or in the rare > >> cases such as accessing a framebuffer which we know never have side > >> effects). > >> > >>> Calling it iowrite*_native is also an abuse of the namespace. > >>> Next thing we know some common code > >>> will legitimately use that name. > >> I might make sense to those definitions into a common header. There > >> have > >> been a handful of cases in the past that wanted that sort of "native > >> byte order" MMIOs iirc (though don't ask me for examples, I can't > >> really > >> remember). > >> > >>> If we do need to define an alias > >>> (which I'd like to avoid) it should be something like vfio_iowrite32. > > Ping? > > > > We need to make a decision whether to move those xxx_native() helpers > > somewhere (where?) or leave the patch as is (as we figured out that > > iowriteXX functions implement barriers and we cannot just use raw > > accessors) and fix commit log to explain everything. > Is there actually any difference in generated code with this patch > applied > and without? I would hope that iowrite..() is inlined and cancels out the > cpu_to_le..() calls that are also inlined? > >>> iowrite32 is a non-inline function so conversions take place so are the > >>> others. And sorry but I fail to see why this matters. We are not trying to > >>> accelerate things, we are removing redundant operations which confuse > >>> people who read the code. > >> The confusion depends on where you're coming from. If you happen to know > >> that "iowrite32" writes in LE, then the LE conversion makes a lot of sense. > > It was like this (and this is just confusing): > > > > iowrite32(le32_to_cpu(val), io + off); > > > > What would make sense (according to you and I would understand this) is > > this: > > > > iowrite32(cpu_to_le32(val), io + off); > > > > > > Or I missed your point, did I? > > No, you didn't miss it. I think for people who know how iowrite32() > works the above is obvious. I find the fact that iowrite32() writes in > LE always pretty scary though ;). > > So IMHO we should either create new, generic iowrite helpers that don't > do any endian swapping at all or do iowrite32(cpu_to_le32(val)) calls. I'm one of those people for whom iowrite32(le32_to_cpu(val)) makes sense and keeps the byte order consistent regardless of the platform, while iowrite32(val) or iowrite32be(val) makes me scratch my head and try to remember that the byte swaps are a nop on the given platforms. As Ben noted, a native, no-swap ioread/write doesn't exist, but perhaps should. I'd prefer an attempt be made to make it exist before adding vfio-specific macros. vfio is arguably doing the right thing here given the functions available. Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On 24.06.14 15:01, Alexey Kardashevskiy wrote: On 06/24/2014 10:52 PM, Alexander Graf wrote: On 24.06.14 14:50, Alexey Kardashevskiy wrote: On 06/24/2014 08:41 PM, Alexander Graf wrote: On 24.06.14 12:11, Alexey Kardashevskiy wrote: On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote: On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote: Working on big endian being an accident may be a matter of perspective :-) The comment remains that this patch doesn't actually fix anything except the overhead on big endian systems doing redundant byte swapping and maybe the philosophy that vfio regions are little endian. Yes, that works by accident because technically VFIO is a transport and thus shouldn't perform any endian swapping of any sort, which remains the responsibility of the end driver which is the only one to know whether a given BAR location is a a register or some streaming data and in the former case whether it's LE or BE (some PCI devices are BE even ! :-) But yes, in the end, it works with the dual "cancelling" swaps and the overhead of those swaps is probably drowned in the noise of the syscall overhead. I'm still not a fan of iowrite vs iowritebe, there must be something we can use that doesn't have an implicit swap. Sadly there isn't ... In the old day we didn't even have the "be" variant and readl/writel style accessors still don't have them either for all archs. There is __raw_readl/writel but here the semantics are much more than just "don't swap", they also don't have memory barriers (which means they are essentially useless to most drivers unless those are platform specific drivers which know exactly what they are doing, or in the rare cases such as accessing a framebuffer which we know never have side effects). Calling it iowrite*_native is also an abuse of the namespace. Next thing we know some common code will legitimately use that name. I might make sense to those definitions into a common header. There have been a handful of cases in the past that wanted that sort of "native byte order" MMIOs iirc (though don't ask me for examples, I can't really remember). If we do need to define an alias (which I'd like to avoid) it should be something like vfio_iowrite32. Ping? We need to make a decision whether to move those xxx_native() helpers somewhere (where?) or leave the patch as is (as we figured out that iowriteXX functions implement barriers and we cannot just use raw accessors) and fix commit log to explain everything. Is there actually any difference in generated code with this patch applied and without? I would hope that iowrite..() is inlined and cancels out the cpu_to_le..() calls that are also inlined? iowrite32 is a non-inline function so conversions take place so are the others. And sorry but I fail to see why this matters. We are not trying to accelerate things, we are removing redundant operations which confuse people who read the code. The confusion depends on where you're coming from. If you happen to know that "iowrite32" writes in LE, then the LE conversion makes a lot of sense. It was like this (and this is just confusing): iowrite32(le32_to_cpu(val), io + off); What would make sense (according to you and I would understand this) is this: iowrite32(cpu_to_le32(val), io + off); Or I missed your point, did I? No, you didn't miss it. I think for people who know how iowrite32() works the above is obvious. I find the fact that iowrite32() writes in LE always pretty scary though ;). So IMHO we should either create new, generic iowrite helpers that don't do any endian swapping at all or do iowrite32(cpu_to_le32(val)) calls. Alex -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On 06/24/2014 10:52 PM, Alexander Graf wrote: > > On 24.06.14 14:50, Alexey Kardashevskiy wrote: >> On 06/24/2014 08:41 PM, Alexander Graf wrote: >>> On 24.06.14 12:11, Alexey Kardashevskiy wrote: On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote: > On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote: > >> Working on big endian being an accident may be a matter of perspective >:-) > >> The comment remains that this patch doesn't actually fix anything except >> the overhead on big endian systems doing redundant byte swapping and >> maybe the philosophy that vfio regions are little endian. > Yes, that works by accident because technically VFIO is a transport and > thus shouldn't perform any endian swapping of any sort, which remains > the responsibility of the end driver which is the only one to know > whether a given BAR location is a a register or some streaming data > and in the former case whether it's LE or BE (some PCI devices are BE > even ! :-) > > But yes, in the end, it works with the dual "cancelling" swaps and the > overhead of those swaps is probably drowned in the noise of the syscall > overhead. > >> I'm still not a fan of iowrite vs iowritebe, there must be something we >> can use that doesn't have an implicit swap. > Sadly there isn't ... In the old day we didn't even have the "be" > variant and readl/writel style accessors still don't have them either > for all archs. > > There is __raw_readl/writel but here the semantics are much more than > just "don't swap", they also don't have memory barriers (which means > they are essentially useless to most drivers unless those are platform > specific drivers which know exactly what they are doing, or in the rare > cases such as accessing a framebuffer which we know never have side > effects). > >>Calling it iowrite*_native is also an abuse of the namespace. >>Next thing we know some common code >> will legitimately use that name. > I might make sense to those definitions into a common header. There have > been a handful of cases in the past that wanted that sort of "native > byte order" MMIOs iirc (though don't ask me for examples, I can't really > remember). > >>If we do need to define an alias >> (which I'd like to avoid) it should be something like vfio_iowrite32. Ping? We need to make a decision whether to move those xxx_native() helpers somewhere (where?) or leave the patch as is (as we figured out that iowriteXX functions implement barriers and we cannot just use raw accessors) and fix commit log to explain everything. >>> Is there actually any difference in generated code with this patch applied >>> and without? I would hope that iowrite..() is inlined and cancels out the >>> cpu_to_le..() calls that are also inlined? >> iowrite32 is a non-inline function so conversions take place so are the >> others. And sorry but I fail to see why this matters. We are not trying to >> accelerate things, we are removing redundant operations which confuse >> people who read the code. > > The confusion depends on where you're coming from. If you happen to know > that "iowrite32" writes in LE, then the LE conversion makes a lot of sense. It was like this (and this is just confusing): iowrite32(le32_to_cpu(val), io + off); What would make sense (according to you and I would understand this) is this: iowrite32(cpu_to_le32(val), io + off); Or I missed your point, did I? > I don't have a strong feeling either way though and will let Alex decide on > the path forward :) It would probably help if you picked the side :) -- Alexey -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On 24.06.14 14:50, Alexey Kardashevskiy wrote: On 06/24/2014 08:41 PM, Alexander Graf wrote: On 24.06.14 12:11, Alexey Kardashevskiy wrote: On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote: On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote: Working on big endian being an accident may be a matter of perspective :-) The comment remains that this patch doesn't actually fix anything except the overhead on big endian systems doing redundant byte swapping and maybe the philosophy that vfio regions are little endian. Yes, that works by accident because technically VFIO is a transport and thus shouldn't perform any endian swapping of any sort, which remains the responsibility of the end driver which is the only one to know whether a given BAR location is a a register or some streaming data and in the former case whether it's LE or BE (some PCI devices are BE even ! :-) But yes, in the end, it works with the dual "cancelling" swaps and the overhead of those swaps is probably drowned in the noise of the syscall overhead. I'm still not a fan of iowrite vs iowritebe, there must be something we can use that doesn't have an implicit swap. Sadly there isn't ... In the old day we didn't even have the "be" variant and readl/writel style accessors still don't have them either for all archs. There is __raw_readl/writel but here the semantics are much more than just "don't swap", they also don't have memory barriers (which means they are essentially useless to most drivers unless those are platform specific drivers which know exactly what they are doing, or in the rare cases such as accessing a framebuffer which we know never have side effects). Calling it iowrite*_native is also an abuse of the namespace. Next thing we know some common code will legitimately use that name. I might make sense to those definitions into a common header. There have been a handful of cases in the past that wanted that sort of "native byte order" MMIOs iirc (though don't ask me for examples, I can't really remember). If we do need to define an alias (which I'd like to avoid) it should be something like vfio_iowrite32. Ping? We need to make a decision whether to move those xxx_native() helpers somewhere (where?) or leave the patch as is (as we figured out that iowriteXX functions implement barriers and we cannot just use raw accessors) and fix commit log to explain everything. Is there actually any difference in generated code with this patch applied and without? I would hope that iowrite..() is inlined and cancels out the cpu_to_le..() calls that are also inlined? iowrite32 is a non-inline function so conversions take place so are the others. And sorry but I fail to see why this matters. We are not trying to accelerate things, we are removing redundant operations which confuse people who read the code. The confusion depends on where you're coming from. If you happen to know that "iowrite32" writes in LE, then the LE conversion makes a lot of sense. I don't have a strong feeling either way though and will let Alex decide on the path forward :). Alex -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On 06/24/2014 08:41 PM, Alexander Graf wrote: > > On 24.06.14 12:11, Alexey Kardashevskiy wrote: >> On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote: >>> On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote: >>> Working on big endian being an accident may be a matter of perspective >>> :-) >>> The comment remains that this patch doesn't actually fix anything except the overhead on big endian systems doing redundant byte swapping and maybe the philosophy that vfio regions are little endian. >>> Yes, that works by accident because technically VFIO is a transport and >>> thus shouldn't perform any endian swapping of any sort, which remains >>> the responsibility of the end driver which is the only one to know >>> whether a given BAR location is a a register or some streaming data >>> and in the former case whether it's LE or BE (some PCI devices are BE >>> even ! :-) >>> >>> But yes, in the end, it works with the dual "cancelling" swaps and the >>> overhead of those swaps is probably drowned in the noise of the syscall >>> overhead. >>> I'm still not a fan of iowrite vs iowritebe, there must be something we can use that doesn't have an implicit swap. >>> Sadly there isn't ... In the old day we didn't even have the "be" >>> variant and readl/writel style accessors still don't have them either >>> for all archs. >>> >>> There is __raw_readl/writel but here the semantics are much more than >>> just "don't swap", they also don't have memory barriers (which means >>> they are essentially useless to most drivers unless those are platform >>> specific drivers which know exactly what they are doing, or in the rare >>> cases such as accessing a framebuffer which we know never have side >>> effects). >>> Calling it iowrite*_native is also an abuse of the namespace. >>> Next thing we know some common code will legitimately use that name. >>> I might make sense to those definitions into a common header. There have >>> been a handful of cases in the past that wanted that sort of "native >>> byte order" MMIOs iirc (though don't ask me for examples, I can't really >>> remember). >>> If we do need to define an alias (which I'd like to avoid) it should be something like vfio_iowrite32. >> >> Ping? >> >> We need to make a decision whether to move those xxx_native() helpers >> somewhere (where?) or leave the patch as is (as we figured out that >> iowriteXX functions implement barriers and we cannot just use raw >> accessors) and fix commit log to explain everything. > > Is there actually any difference in generated code with this patch applied > and without? I would hope that iowrite..() is inlined and cancels out the > cpu_to_le..() calls that are also inlined? iowrite32 is a non-inline function so conversions take place so are the others. And sorry but I fail to see why this matters. We are not trying to accelerate things, we are removing redundant operations which confuse people who read the code. -- Alexey -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On 24.06.14 12:11, Alexey Kardashevskiy wrote: On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote: On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote: Working on big endian being an accident may be a matter of perspective :-) The comment remains that this patch doesn't actually fix anything except the overhead on big endian systems doing redundant byte swapping and maybe the philosophy that vfio regions are little endian. Yes, that works by accident because technically VFIO is a transport and thus shouldn't perform any endian swapping of any sort, which remains the responsibility of the end driver which is the only one to know whether a given BAR location is a a register or some streaming data and in the former case whether it's LE or BE (some PCI devices are BE even ! :-) But yes, in the end, it works with the dual "cancelling" swaps and the overhead of those swaps is probably drowned in the noise of the syscall overhead. I'm still not a fan of iowrite vs iowritebe, there must be something we can use that doesn't have an implicit swap. Sadly there isn't ... In the old day we didn't even have the "be" variant and readl/writel style accessors still don't have them either for all archs. There is __raw_readl/writel but here the semantics are much more than just "don't swap", they also don't have memory barriers (which means they are essentially useless to most drivers unless those are platform specific drivers which know exactly what they are doing, or in the rare cases such as accessing a framebuffer which we know never have side effects). Calling it iowrite*_native is also an abuse of the namespace. Next thing we know some common code will legitimately use that name. I might make sense to those definitions into a common header. There have been a handful of cases in the past that wanted that sort of "native byte order" MMIOs iirc (though don't ask me for examples, I can't really remember). If we do need to define an alias (which I'd like to avoid) it should be something like vfio_iowrite32. Ping? We need to make a decision whether to move those xxx_native() helpers somewhere (where?) or leave the patch as is (as we figured out that iowriteXX functions implement barriers and we cannot just use raw accessors) and fix commit log to explain everything. Is there actually any difference in generated code with this patch applied and without? I would hope that iowrite..() is inlined and cancels out the cpu_to_le..() calls that are also inlined? Alex -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote: > On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote: > >> Working on big endian being an accident may be a matter of perspective > > :-) > >> The comment remains that this patch doesn't actually fix anything except >> the overhead on big endian systems doing redundant byte swapping and >> maybe the philosophy that vfio regions are little endian. > > Yes, that works by accident because technically VFIO is a transport and > thus shouldn't perform any endian swapping of any sort, which remains > the responsibility of the end driver which is the only one to know > whether a given BAR location is a a register or some streaming data > and in the former case whether it's LE or BE (some PCI devices are BE > even ! :-) > > But yes, in the end, it works with the dual "cancelling" swaps and the > overhead of those swaps is probably drowned in the noise of the syscall > overhead. > >> I'm still not a fan of iowrite vs iowritebe, there must be something we >> can use that doesn't have an implicit swap. > > Sadly there isn't ... In the old day we didn't even have the "be" > variant and readl/writel style accessors still don't have them either > for all archs. > > There is __raw_readl/writel but here the semantics are much more than > just "don't swap", they also don't have memory barriers (which means > they are essentially useless to most drivers unless those are platform > specific drivers which know exactly what they are doing, or in the rare > cases such as accessing a framebuffer which we know never have side > effects). > >> Calling it iowrite*_native is also an abuse of the namespace. > > >> Next thing we know some common code >> will legitimately use that name. > > I might make sense to those definitions into a common header. There have > been a handful of cases in the past that wanted that sort of "native > byte order" MMIOs iirc (though don't ask me for examples, I can't really > remember). > >> If we do need to define an alias >> (which I'd like to avoid) it should be something like vfio_iowrite32. Ping? We need to make a decision whether to move those xxx_native() helpers somewhere (where?) or leave the patch as is (as we figured out that iowriteXX functions implement barriers and we cannot just use raw accessors) and fix commit log to explain everything. Thanks! >> Thanks, > > Cheers, > Ben. > >> Alex >> === any better? >>> Suggested-by: Benjamin Herrenschmidt >>> Signed-off-by: Alexey Kardashevskiy >>> --- >>> drivers/vfio/pci/vfio_pci_rdwr.c | 20 >>> 1 file changed, 16 insertions(+), 4 deletions(-) >>> >>> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c >>> b/drivers/vfio/pci/vfio_pci_rdwr.c >>> index 210db24..f363b5a 100644 >>> --- a/drivers/vfio/pci/vfio_pci_rdwr.c >>> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c >>> @@ -21,6 +21,18 @@ >>> >>> #include "vfio_pci_private.h" >>> >>> +#ifdef __BIG_ENDIAN__ >>> +#define ioread16_nativeioread16be >>> +#define ioread32_nativeioread32be >>> +#define iowrite16_native iowrite16be >>> +#define iowrite32_native iowrite32be >>> +#else >>> +#define ioread16_nativeioread16 >>> +#define ioread32_nativeioread32 >>> +#define iowrite16_native iowrite16 >>> +#define iowrite32_native iowrite32 >>> +#endif >>> + >>> /* >>> * Read or write from an __iomem region (MMIO or I/O port) with an >>> excluded >>> * range which is inaccessible. The excluded range drops writes and >>> fills >>> @@ -50,9 +62,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user >>> *buf, >>> if (copy_from_user(, buf, 4)) >>> return -EFAULT; >>> >>> - iowrite32(le32_to_cpu(val), io + off); >>> + iowrite32_native(val, io + off); >>> } else { >>> - val = cpu_to_le32(ioread32(io + off)); >>> + val = ioread32_native(io + off); >>> >>> if (copy_to_user(buf, , 4)) >>> return -EFAULT; >>> @@ -66,9 +78,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user >>> *buf, >>> if (copy_from_user(, buf, 2)) >>> return -EFAULT; >>> >>> - iowrite16(le16_to_cpu(val), io + off); >>> + iowrite16_native(val, io + off); >>> } else { >>> - val = cpu_to_le16(ioread16(io + off)); >>> +
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote: On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote: Working on big endian being an accident may be a matter of perspective :-) The comment remains that this patch doesn't actually fix anything except the overhead on big endian systems doing redundant byte swapping and maybe the philosophy that vfio regions are little endian. Yes, that works by accident because technically VFIO is a transport and thus shouldn't perform any endian swapping of any sort, which remains the responsibility of the end driver which is the only one to know whether a given BAR location is a a register or some streaming data and in the former case whether it's LE or BE (some PCI devices are BE even ! :-) But yes, in the end, it works with the dual cancelling swaps and the overhead of those swaps is probably drowned in the noise of the syscall overhead. I'm still not a fan of iowrite vs iowritebe, there must be something we can use that doesn't have an implicit swap. Sadly there isn't ... In the old day we didn't even have the be variant and readl/writel style accessors still don't have them either for all archs. There is __raw_readl/writel but here the semantics are much more than just don't swap, they also don't have memory barriers (which means they are essentially useless to most drivers unless those are platform specific drivers which know exactly what they are doing, or in the rare cases such as accessing a framebuffer which we know never have side effects). Calling it iowrite*_native is also an abuse of the namespace. Next thing we know some common code will legitimately use that name. I might make sense to those definitions into a common header. There have been a handful of cases in the past that wanted that sort of native byte order MMIOs iirc (though don't ask me for examples, I can't really remember). If we do need to define an alias (which I'd like to avoid) it should be something like vfio_iowrite32. Ping? We need to make a decision whether to move those xxx_native() helpers somewhere (where?) or leave the patch as is (as we figured out that iowriteXX functions implement barriers and we cannot just use raw accessors) and fix commit log to explain everything. Thanks! Thanks, Cheers, Ben. Alex === any better? Suggested-by: Benjamin Herrenschmidt b...@kernel.crashing.org Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- drivers/vfio/pci/vfio_pci_rdwr.c | 20 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c index 210db24..f363b5a 100644 --- a/drivers/vfio/pci/vfio_pci_rdwr.c +++ b/drivers/vfio/pci/vfio_pci_rdwr.c @@ -21,6 +21,18 @@ #include vfio_pci_private.h +#ifdef __BIG_ENDIAN__ +#define ioread16_nativeioread16be +#define ioread32_nativeioread32be +#define iowrite16_native iowrite16be +#define iowrite32_native iowrite32be +#else +#define ioread16_nativeioread16 +#define ioread32_nativeioread32 +#define iowrite16_native iowrite16 +#define iowrite32_native iowrite32 +#endif + /* * Read or write from an __iomem region (MMIO or I/O port) with an excluded * range which is inaccessible. The excluded range drops writes and fills @@ -50,9 +62,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf, if (copy_from_user(val, buf, 4)) return -EFAULT; - iowrite32(le32_to_cpu(val), io + off); + iowrite32_native(val, io + off); } else { - val = cpu_to_le32(ioread32(io + off)); + val = ioread32_native(io + off); if (copy_to_user(buf, val, 4)) return -EFAULT; @@ -66,9 +78,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf, if (copy_from_user(val, buf, 2)) return -EFAULT; - iowrite16(le16_to_cpu(val), io + off); + iowrite16_native(val, io + off); } else { - val = cpu_to_le16(ioread16(io + off)); + val = ioread16_native(io + off); if (copy_to_user(buf, val, 2)) return -EFAULT; -- Alexey -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On 24.06.14 12:11, Alexey Kardashevskiy wrote: On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote: On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote: Working on big endian being an accident may be a matter of perspective :-) The comment remains that this patch doesn't actually fix anything except the overhead on big endian systems doing redundant byte swapping and maybe the philosophy that vfio regions are little endian. Yes, that works by accident because technically VFIO is a transport and thus shouldn't perform any endian swapping of any sort, which remains the responsibility of the end driver which is the only one to know whether a given BAR location is a a register or some streaming data and in the former case whether it's LE or BE (some PCI devices are BE even ! :-) But yes, in the end, it works with the dual cancelling swaps and the overhead of those swaps is probably drowned in the noise of the syscall overhead. I'm still not a fan of iowrite vs iowritebe, there must be something we can use that doesn't have an implicit swap. Sadly there isn't ... In the old day we didn't even have the be variant and readl/writel style accessors still don't have them either for all archs. There is __raw_readl/writel but here the semantics are much more than just don't swap, they also don't have memory barriers (which means they are essentially useless to most drivers unless those are platform specific drivers which know exactly what they are doing, or in the rare cases such as accessing a framebuffer which we know never have side effects). Calling it iowrite*_native is also an abuse of the namespace. Next thing we know some common code will legitimately use that name. I might make sense to those definitions into a common header. There have been a handful of cases in the past that wanted that sort of native byte order MMIOs iirc (though don't ask me for examples, I can't really remember). If we do need to define an alias (which I'd like to avoid) it should be something like vfio_iowrite32. Ping? We need to make a decision whether to move those xxx_native() helpers somewhere (where?) or leave the patch as is (as we figured out that iowriteXX functions implement barriers and we cannot just use raw accessors) and fix commit log to explain everything. Is there actually any difference in generated code with this patch applied and without? I would hope that iowrite..() is inlined and cancels out the cpu_to_le..() calls that are also inlined? Alex -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On 06/24/2014 08:41 PM, Alexander Graf wrote: On 24.06.14 12:11, Alexey Kardashevskiy wrote: On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote: On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote: Working on big endian being an accident may be a matter of perspective :-) The comment remains that this patch doesn't actually fix anything except the overhead on big endian systems doing redundant byte swapping and maybe the philosophy that vfio regions are little endian. Yes, that works by accident because technically VFIO is a transport and thus shouldn't perform any endian swapping of any sort, which remains the responsibility of the end driver which is the only one to know whether a given BAR location is a a register or some streaming data and in the former case whether it's LE or BE (some PCI devices are BE even ! :-) But yes, in the end, it works with the dual cancelling swaps and the overhead of those swaps is probably drowned in the noise of the syscall overhead. I'm still not a fan of iowrite vs iowritebe, there must be something we can use that doesn't have an implicit swap. Sadly there isn't ... In the old day we didn't even have the be variant and readl/writel style accessors still don't have them either for all archs. There is __raw_readl/writel but here the semantics are much more than just don't swap, they also don't have memory barriers (which means they are essentially useless to most drivers unless those are platform specific drivers which know exactly what they are doing, or in the rare cases such as accessing a framebuffer which we know never have side effects). Calling it iowrite*_native is also an abuse of the namespace. Next thing we know some common code will legitimately use that name. I might make sense to those definitions into a common header. There have been a handful of cases in the past that wanted that sort of native byte order MMIOs iirc (though don't ask me for examples, I can't really remember). If we do need to define an alias (which I'd like to avoid) it should be something like vfio_iowrite32. Ping? We need to make a decision whether to move those xxx_native() helpers somewhere (where?) or leave the patch as is (as we figured out that iowriteXX functions implement barriers and we cannot just use raw accessors) and fix commit log to explain everything. Is there actually any difference in generated code with this patch applied and without? I would hope that iowrite..() is inlined and cancels out the cpu_to_le..() calls that are also inlined? iowrite32 is a non-inline function so conversions take place so are the others. And sorry but I fail to see why this matters. We are not trying to accelerate things, we are removing redundant operations which confuse people who read the code. -- Alexey -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On 24.06.14 14:50, Alexey Kardashevskiy wrote: On 06/24/2014 08:41 PM, Alexander Graf wrote: On 24.06.14 12:11, Alexey Kardashevskiy wrote: On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote: On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote: Working on big endian being an accident may be a matter of perspective :-) The comment remains that this patch doesn't actually fix anything except the overhead on big endian systems doing redundant byte swapping and maybe the philosophy that vfio regions are little endian. Yes, that works by accident because technically VFIO is a transport and thus shouldn't perform any endian swapping of any sort, which remains the responsibility of the end driver which is the only one to know whether a given BAR location is a a register or some streaming data and in the former case whether it's LE or BE (some PCI devices are BE even ! :-) But yes, in the end, it works with the dual cancelling swaps and the overhead of those swaps is probably drowned in the noise of the syscall overhead. I'm still not a fan of iowrite vs iowritebe, there must be something we can use that doesn't have an implicit swap. Sadly there isn't ... In the old day we didn't even have the be variant and readl/writel style accessors still don't have them either for all archs. There is __raw_readl/writel but here the semantics are much more than just don't swap, they also don't have memory barriers (which means they are essentially useless to most drivers unless those are platform specific drivers which know exactly what they are doing, or in the rare cases such as accessing a framebuffer which we know never have side effects). Calling it iowrite*_native is also an abuse of the namespace. Next thing we know some common code will legitimately use that name. I might make sense to those definitions into a common header. There have been a handful of cases in the past that wanted that sort of native byte order MMIOs iirc (though don't ask me for examples, I can't really remember). If we do need to define an alias (which I'd like to avoid) it should be something like vfio_iowrite32. Ping? We need to make a decision whether to move those xxx_native() helpers somewhere (where?) or leave the patch as is (as we figured out that iowriteXX functions implement barriers and we cannot just use raw accessors) and fix commit log to explain everything. Is there actually any difference in generated code with this patch applied and without? I would hope that iowrite..() is inlined and cancels out the cpu_to_le..() calls that are also inlined? iowrite32 is a non-inline function so conversions take place so are the others. And sorry but I fail to see why this matters. We are not trying to accelerate things, we are removing redundant operations which confuse people who read the code. The confusion depends on where you're coming from. If you happen to know that iowrite32 writes in LE, then the LE conversion makes a lot of sense. I don't have a strong feeling either way though and will let Alex decide on the path forward :). Alex -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On 06/24/2014 10:52 PM, Alexander Graf wrote: On 24.06.14 14:50, Alexey Kardashevskiy wrote: On 06/24/2014 08:41 PM, Alexander Graf wrote: On 24.06.14 12:11, Alexey Kardashevskiy wrote: On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote: On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote: Working on big endian being an accident may be a matter of perspective :-) The comment remains that this patch doesn't actually fix anything except the overhead on big endian systems doing redundant byte swapping and maybe the philosophy that vfio regions are little endian. Yes, that works by accident because technically VFIO is a transport and thus shouldn't perform any endian swapping of any sort, which remains the responsibility of the end driver which is the only one to know whether a given BAR location is a a register or some streaming data and in the former case whether it's LE or BE (some PCI devices are BE even ! :-) But yes, in the end, it works with the dual cancelling swaps and the overhead of those swaps is probably drowned in the noise of the syscall overhead. I'm still not a fan of iowrite vs iowritebe, there must be something we can use that doesn't have an implicit swap. Sadly there isn't ... In the old day we didn't even have the be variant and readl/writel style accessors still don't have them either for all archs. There is __raw_readl/writel but here the semantics are much more than just don't swap, they also don't have memory barriers (which means they are essentially useless to most drivers unless those are platform specific drivers which know exactly what they are doing, or in the rare cases such as accessing a framebuffer which we know never have side effects). Calling it iowrite*_native is also an abuse of the namespace. Next thing we know some common code will legitimately use that name. I might make sense to those definitions into a common header. There have been a handful of cases in the past that wanted that sort of native byte order MMIOs iirc (though don't ask me for examples, I can't really remember). If we do need to define an alias (which I'd like to avoid) it should be something like vfio_iowrite32. Ping? We need to make a decision whether to move those xxx_native() helpers somewhere (where?) or leave the patch as is (as we figured out that iowriteXX functions implement barriers and we cannot just use raw accessors) and fix commit log to explain everything. Is there actually any difference in generated code with this patch applied and without? I would hope that iowrite..() is inlined and cancels out the cpu_to_le..() calls that are also inlined? iowrite32 is a non-inline function so conversions take place so are the others. And sorry but I fail to see why this matters. We are not trying to accelerate things, we are removing redundant operations which confuse people who read the code. The confusion depends on where you're coming from. If you happen to know that iowrite32 writes in LE, then the LE conversion makes a lot of sense. It was like this (and this is just confusing): iowrite32(le32_to_cpu(val), io + off); What would make sense (according to you and I would understand this) is this: iowrite32(cpu_to_le32(val), io + off); Or I missed your point, did I? I don't have a strong feeling either way though and will let Alex decide on the path forward :) It would probably help if you picked the side :) -- Alexey -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On 24.06.14 15:01, Alexey Kardashevskiy wrote: On 06/24/2014 10:52 PM, Alexander Graf wrote: On 24.06.14 14:50, Alexey Kardashevskiy wrote: On 06/24/2014 08:41 PM, Alexander Graf wrote: On 24.06.14 12:11, Alexey Kardashevskiy wrote: On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote: On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote: Working on big endian being an accident may be a matter of perspective :-) The comment remains that this patch doesn't actually fix anything except the overhead on big endian systems doing redundant byte swapping and maybe the philosophy that vfio regions are little endian. Yes, that works by accident because technically VFIO is a transport and thus shouldn't perform any endian swapping of any sort, which remains the responsibility of the end driver which is the only one to know whether a given BAR location is a a register or some streaming data and in the former case whether it's LE or BE (some PCI devices are BE even ! :-) But yes, in the end, it works with the dual cancelling swaps and the overhead of those swaps is probably drowned in the noise of the syscall overhead. I'm still not a fan of iowrite vs iowritebe, there must be something we can use that doesn't have an implicit swap. Sadly there isn't ... In the old day we didn't even have the be variant and readl/writel style accessors still don't have them either for all archs. There is __raw_readl/writel but here the semantics are much more than just don't swap, they also don't have memory barriers (which means they are essentially useless to most drivers unless those are platform specific drivers which know exactly what they are doing, or in the rare cases such as accessing a framebuffer which we know never have side effects). Calling it iowrite*_native is also an abuse of the namespace. Next thing we know some common code will legitimately use that name. I might make sense to those definitions into a common header. There have been a handful of cases in the past that wanted that sort of native byte order MMIOs iirc (though don't ask me for examples, I can't really remember). If we do need to define an alias (which I'd like to avoid) it should be something like vfio_iowrite32. Ping? We need to make a decision whether to move those xxx_native() helpers somewhere (where?) or leave the patch as is (as we figured out that iowriteXX functions implement barriers and we cannot just use raw accessors) and fix commit log to explain everything. Is there actually any difference in generated code with this patch applied and without? I would hope that iowrite..() is inlined and cancels out the cpu_to_le..() calls that are also inlined? iowrite32 is a non-inline function so conversions take place so are the others. And sorry but I fail to see why this matters. We are not trying to accelerate things, we are removing redundant operations which confuse people who read the code. The confusion depends on where you're coming from. If you happen to know that iowrite32 writes in LE, then the LE conversion makes a lot of sense. It was like this (and this is just confusing): iowrite32(le32_to_cpu(val), io + off); What would make sense (according to you and I would understand this) is this: iowrite32(cpu_to_le32(val), io + off); Or I missed your point, did I? No, you didn't miss it. I think for people who know how iowrite32() works the above is obvious. I find the fact that iowrite32() writes in LE always pretty scary though ;). So IMHO we should either create new, generic iowrite helpers that don't do any endian swapping at all or do iowrite32(cpu_to_le32(val)) calls. Alex -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On Tue, 2014-06-24 at 15:22 +0200, Alexander Graf wrote: On 24.06.14 15:01, Alexey Kardashevskiy wrote: On 06/24/2014 10:52 PM, Alexander Graf wrote: On 24.06.14 14:50, Alexey Kardashevskiy wrote: On 06/24/2014 08:41 PM, Alexander Graf wrote: On 24.06.14 12:11, Alexey Kardashevskiy wrote: On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote: On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote: Working on big endian being an accident may be a matter of perspective :-) The comment remains that this patch doesn't actually fix anything except the overhead on big endian systems doing redundant byte swapping and maybe the philosophy that vfio regions are little endian. Yes, that works by accident because technically VFIO is a transport and thus shouldn't perform any endian swapping of any sort, which remains the responsibility of the end driver which is the only one to know whether a given BAR location is a a register or some streaming data and in the former case whether it's LE or BE (some PCI devices are BE even ! :-) But yes, in the end, it works with the dual cancelling swaps and the overhead of those swaps is probably drowned in the noise of the syscall overhead. I'm still not a fan of iowrite vs iowritebe, there must be something we can use that doesn't have an implicit swap. Sadly there isn't ... In the old day we didn't even have the be variant and readl/writel style accessors still don't have them either for all archs. There is __raw_readl/writel but here the semantics are much more than just don't swap, they also don't have memory barriers (which means they are essentially useless to most drivers unless those are platform specific drivers which know exactly what they are doing, or in the rare cases such as accessing a framebuffer which we know never have side effects). Calling it iowrite*_native is also an abuse of the namespace. Next thing we know some common code will legitimately use that name. I might make sense to those definitions into a common header. There have been a handful of cases in the past that wanted that sort of native byte order MMIOs iirc (though don't ask me for examples, I can't really remember). If we do need to define an alias (which I'd like to avoid) it should be something like vfio_iowrite32. Ping? We need to make a decision whether to move those xxx_native() helpers somewhere (where?) or leave the patch as is (as we figured out that iowriteXX functions implement barriers and we cannot just use raw accessors) and fix commit log to explain everything. Is there actually any difference in generated code with this patch applied and without? I would hope that iowrite..() is inlined and cancels out the cpu_to_le..() calls that are also inlined? iowrite32 is a non-inline function so conversions take place so are the others. And sorry but I fail to see why this matters. We are not trying to accelerate things, we are removing redundant operations which confuse people who read the code. The confusion depends on where you're coming from. If you happen to know that iowrite32 writes in LE, then the LE conversion makes a lot of sense. It was like this (and this is just confusing): iowrite32(le32_to_cpu(val), io + off); What would make sense (according to you and I would understand this) is this: iowrite32(cpu_to_le32(val), io + off); Or I missed your point, did I? No, you didn't miss it. I think for people who know how iowrite32() works the above is obvious. I find the fact that iowrite32() writes in LE always pretty scary though ;). So IMHO we should either create new, generic iowrite helpers that don't do any endian swapping at all or do iowrite32(cpu_to_le32(val)) calls. I'm one of those people for whom iowrite32(le32_to_cpu(val)) makes sense and keeps the byte order consistent regardless of the platform, while iowrite32(val) or iowrite32be(val) makes me scratch my head and try to remember that the byte swaps are a nop on the given platforms. As Ben noted, a native, no-swap ioread/write doesn't exist, but perhaps should. I'd prefer an attempt be made to make it exist before adding vfio-specific macros. vfio is arguably doing the right thing here given the functions available. Thanks, Alex -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On 06/25/2014 12:21 AM, Alex Williamson wrote: On Tue, 2014-06-24 at 15:22 +0200, Alexander Graf wrote: On 24.06.14 15:01, Alexey Kardashevskiy wrote: On 06/24/2014 10:52 PM, Alexander Graf wrote: On 24.06.14 14:50, Alexey Kardashevskiy wrote: On 06/24/2014 08:41 PM, Alexander Graf wrote: On 24.06.14 12:11, Alexey Kardashevskiy wrote: On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote: On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote: Working on big endian being an accident may be a matter of perspective :-) The comment remains that this patch doesn't actually fix anything except the overhead on big endian systems doing redundant byte swapping and maybe the philosophy that vfio regions are little endian. Yes, that works by accident because technically VFIO is a transport and thus shouldn't perform any endian swapping of any sort, which remains the responsibility of the end driver which is the only one to know whether a given BAR location is a a register or some streaming data and in the former case whether it's LE or BE (some PCI devices are BE even ! :-) But yes, in the end, it works with the dual cancelling swaps and the overhead of those swaps is probably drowned in the noise of the syscall overhead. I'm still not a fan of iowrite vs iowritebe, there must be something we can use that doesn't have an implicit swap. Sadly there isn't ... In the old day we didn't even have the be variant and readl/writel style accessors still don't have them either for all archs. There is __raw_readl/writel but here the semantics are much more than just don't swap, they also don't have memory barriers (which means they are essentially useless to most drivers unless those are platform specific drivers which know exactly what they are doing, or in the rare cases such as accessing a framebuffer which we know never have side effects). Calling it iowrite*_native is also an abuse of the namespace. Next thing we know some common code will legitimately use that name. I might make sense to those definitions into a common header. There have been a handful of cases in the past that wanted that sort of native byte order MMIOs iirc (though don't ask me for examples, I can't really remember). If we do need to define an alias (which I'd like to avoid) it should be something like vfio_iowrite32. Ping? We need to make a decision whether to move those xxx_native() helpers somewhere (where?) or leave the patch as is (as we figured out that iowriteXX functions implement barriers and we cannot just use raw accessors) and fix commit log to explain everything. Is there actually any difference in generated code with this patch applied and without? I would hope that iowrite..() is inlined and cancels out the cpu_to_le..() calls that are also inlined? iowrite32 is a non-inline function so conversions take place so are the others. And sorry but I fail to see why this matters. We are not trying to accelerate things, we are removing redundant operations which confuse people who read the code. The confusion depends on where you're coming from. If you happen to know that iowrite32 writes in LE, then the LE conversion makes a lot of sense. It was like this (and this is just confusing): iowrite32(le32_to_cpu(val), io + off); What would make sense (according to you and I would understand this) is this: iowrite32(cpu_to_le32(val), io + off); Or I missed your point, did I? No, you didn't miss it. I think for people who know how iowrite32() works the above is obvious. I find the fact that iowrite32() writes in LE always pretty scary though ;). So IMHO we should either create new, generic iowrite helpers that don't do any endian swapping at all or do iowrite32(cpu_to_le32(val)) calls. I'm one of those people for whom iowrite32(le32_to_cpu(val)) makes sense I do not understand why @val is considered LE here and need to be converted to CPU. Really. I truly believe it should be cpu_to_le32(). and keeps the byte order consistent regardless of the platform, while iowrite32(val) or iowrite32be(val) makes me scratch my head and try to remember that the byte swaps are a nop on the given platforms. As Ben noted, a native, no-swap ioread/write doesn't exist, but perhaps should. I'd prefer an attempt be made to make it exist before adding vfio-specific macros. vfio is arguably doing the right thing here given the functions available. Thanks, -- Alexey -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] vfio: Fix endianness handling for emulated BARs
From: Alexey Kardashevskiy ... So IMHO we should either create new, generic iowrite helpers that don't do any endian swapping at all or do iowrite32(cpu_to_le32(val)) calls. I'm one of those people for whom iowrite32(le32_to_cpu(val)) makes sense I do not understand why @val is considered LE here and need to be converted to CPU. Really. I truly believe it should be cpu_to_le32(). Think about it carefully... Apparently iowrite32() is writing a 'cpu' value out 'le'. So if you have a 'le' value you need to convert it to 'cpu' first. I really won't ask how 'be' ppc managed to get 'le' on-chip peripherals. David N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a��� 0��h���i
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On Wed, 2014-06-25 at 00:33 +1000, Alexey Kardashevskiy wrote: On 06/25/2014 12:21 AM, Alex Williamson wrote: On Tue, 2014-06-24 at 15:22 +0200, Alexander Graf wrote: On 24.06.14 15:01, Alexey Kardashevskiy wrote: On 06/24/2014 10:52 PM, Alexander Graf wrote: On 24.06.14 14:50, Alexey Kardashevskiy wrote: On 06/24/2014 08:41 PM, Alexander Graf wrote: On 24.06.14 12:11, Alexey Kardashevskiy wrote: On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote: On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote: Working on big endian being an accident may be a matter of perspective :-) The comment remains that this patch doesn't actually fix anything except the overhead on big endian systems doing redundant byte swapping and maybe the philosophy that vfio regions are little endian. Yes, that works by accident because technically VFIO is a transport and thus shouldn't perform any endian swapping of any sort, which remains the responsibility of the end driver which is the only one to know whether a given BAR location is a a register or some streaming data and in the former case whether it's LE or BE (some PCI devices are BE even ! :-) But yes, in the end, it works with the dual cancelling swaps and the overhead of those swaps is probably drowned in the noise of the syscall overhead. I'm still not a fan of iowrite vs iowritebe, there must be something we can use that doesn't have an implicit swap. Sadly there isn't ... In the old day we didn't even have the be variant and readl/writel style accessors still don't have them either for all archs. There is __raw_readl/writel but here the semantics are much more than just don't swap, they also don't have memory barriers (which means they are essentially useless to most drivers unless those are platform specific drivers which know exactly what they are doing, or in the rare cases such as accessing a framebuffer which we know never have side effects). Calling it iowrite*_native is also an abuse of the namespace. Next thing we know some common code will legitimately use that name. I might make sense to those definitions into a common header. There have been a handful of cases in the past that wanted that sort of native byte order MMIOs iirc (though don't ask me for examples, I can't really remember). If we do need to define an alias (which I'd like to avoid) it should be something like vfio_iowrite32. Ping? We need to make a decision whether to move those xxx_native() helpers somewhere (where?) or leave the patch as is (as we figured out that iowriteXX functions implement barriers and we cannot just use raw accessors) and fix commit log to explain everything. Is there actually any difference in generated code with this patch applied and without? I would hope that iowrite..() is inlined and cancels out the cpu_to_le..() calls that are also inlined? iowrite32 is a non-inline function so conversions take place so are the others. And sorry but I fail to see why this matters. We are not trying to accelerate things, we are removing redundant operations which confuse people who read the code. The confusion depends on where you're coming from. If you happen to know that iowrite32 writes in LE, then the LE conversion makes a lot of sense. It was like this (and this is just confusing): iowrite32(le32_to_cpu(val), io + off); What would make sense (according to you and I would understand this) is this: iowrite32(cpu_to_le32(val), io + off); Or I missed your point, did I? No, you didn't miss it. I think for people who know how iowrite32() works the above is obvious. I find the fact that iowrite32() writes in LE always pretty scary though ;). So IMHO we should either create new, generic iowrite helpers that don't do any endian swapping at all or do iowrite32(cpu_to_le32(val)) calls. I'm one of those people for whom iowrite32(le32_to_cpu(val)) makes sense I do not understand why @val is considered LE here and need to be converted to CPU. Really. I truly believe it should be cpu_to_le32(). Because iowrite32 is defined to take a cpu byte order value and write it as little endian. and keeps the byte order consistent regardless of the platform, while iowrite32(val) or iowrite32be(val) makes me scratch my head and try to remember that the byte swaps are a nop on the given platforms. As Ben noted, a native, no-swap ioread/write doesn't exist, but perhaps should. I'd prefer an attempt be made to make it exist before adding vfio-specific macros. vfio is arguably doing the right thing here given the functions available. Thanks, -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On 06/25/2014 12:43 AM, Alex Williamson wrote: On Wed, 2014-06-25 at 00:33 +1000, Alexey Kardashevskiy wrote: On 06/25/2014 12:21 AM, Alex Williamson wrote: On Tue, 2014-06-24 at 15:22 +0200, Alexander Graf wrote: On 24.06.14 15:01, Alexey Kardashevskiy wrote: On 06/24/2014 10:52 PM, Alexander Graf wrote: On 24.06.14 14:50, Alexey Kardashevskiy wrote: On 06/24/2014 08:41 PM, Alexander Graf wrote: On 24.06.14 12:11, Alexey Kardashevskiy wrote: On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote: On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote: Working on big endian being an accident may be a matter of perspective :-) The comment remains that this patch doesn't actually fix anything except the overhead on big endian systems doing redundant byte swapping and maybe the philosophy that vfio regions are little endian. Yes, that works by accident because technically VFIO is a transport and thus shouldn't perform any endian swapping of any sort, which remains the responsibility of the end driver which is the only one to know whether a given BAR location is a a register or some streaming data and in the former case whether it's LE or BE (some PCI devices are BE even ! :-) But yes, in the end, it works with the dual cancelling swaps and the overhead of those swaps is probably drowned in the noise of the syscall overhead. I'm still not a fan of iowrite vs iowritebe, there must be something we can use that doesn't have an implicit swap. Sadly there isn't ... In the old day we didn't even have the be variant and readl/writel style accessors still don't have them either for all archs. There is __raw_readl/writel but here the semantics are much more than just don't swap, they also don't have memory barriers (which means they are essentially useless to most drivers unless those are platform specific drivers which know exactly what they are doing, or in the rare cases such as accessing a framebuffer which we know never have side effects). Calling it iowrite*_native is also an abuse of the namespace. Next thing we know some common code will legitimately use that name. I might make sense to those definitions into a common header. There have been a handful of cases in the past that wanted that sort of native byte order MMIOs iirc (though don't ask me for examples, I can't really remember). If we do need to define an alias (which I'd like to avoid) it should be something like vfio_iowrite32. Ping? We need to make a decision whether to move those xxx_native() helpers somewhere (where?) or leave the patch as is (as we figured out that iowriteXX functions implement barriers and we cannot just use raw accessors) and fix commit log to explain everything. Is there actually any difference in generated code with this patch applied and without? I would hope that iowrite..() is inlined and cancels out the cpu_to_le..() calls that are also inlined? iowrite32 is a non-inline function so conversions take place so are the others. And sorry but I fail to see why this matters. We are not trying to accelerate things, we are removing redundant operations which confuse people who read the code. The confusion depends on where you're coming from. If you happen to know that iowrite32 writes in LE, then the LE conversion makes a lot of sense. It was like this (and this is just confusing): iowrite32(le32_to_cpu(val), io + off); What would make sense (according to you and I would understand this) is this: iowrite32(cpu_to_le32(val), io + off); Or I missed your point, did I? No, you didn't miss it. I think for people who know how iowrite32() works the above is obvious. I find the fact that iowrite32() writes in LE always pretty scary though ;). So IMHO we should either create new, generic iowrite helpers that don't do any endian swapping at all or do iowrite32(cpu_to_le32(val)) calls. I'm one of those people for whom iowrite32(le32_to_cpu(val)) makes sense I do not understand why @val is considered LE here and need to be converted to CPU. Really. I truly believe it should be cpu_to_le32(). Because iowrite32 is defined to take a cpu byte order value and write it as little endian. Ok, then neither le32_to_cpu() nor cpu_to_le32() should be there at all, if we are talking about not scratching anyone's head :) and keeps the byte order consistent regardless of the platform, while iowrite32(val) or iowrite32be(val) makes me scratch my head and try to remember that the byte swaps are a nop on the given platforms. As Ben noted, a native, no-swap ioread/write doesn't exist, but perhaps should. I'd prefer an attempt be made to make it exist before adding vfio-specific macros. vfio is arguably doing the right thing here given the functions available. Thanks, I do not mind to make that atempt but what exactly would make sense here? Try moving macros to include/asm-generic/io.h? Something else?
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On Tue, 2014-06-24 at 12:41 +0200, Alexander Graf wrote: Is there actually any difference in generated code with this patch applied and without? I would hope that iowrite..() is inlined and cancels out the cpu_to_le..() calls that are also inlined? No, the former uses byteswapping asm, the compiler can't cancel it out, but the overhead of the additional byteswap might not be measurable. Cheers, Ben. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On Wed, 2014-06-25 at 00:33 +1000, Alexey Kardashevskiy wrote: I do not understand why @val is considered LE here and need to be converted to CPU. Really. I truly believe it should be cpu_to_le32(). No. Both are slightly wrong semantically but le32_to_cpu() is less wrong :-) iowrite32 supposedly takes a cpu value as argument and writes an le value. So if anything, you need something that converts to a cpu value before you call iowrite32. Now it's still slightly wrong because the input to le32_to_cpu() is supposed to be an LE value but of course here it's not, it's a cpu value too :-) But yes, I agree with aw here, either do nothing or stick a set of iowriteXX_native or something equivalent in the generic iomap header, define them in term of iowrite32be/iowrite32 based on the compile time endian of the arch. Hitting asm-generic/iomap.h I think will cover all archs except ARM. For ARM, just hack arch/arm/asm/io.h Cheers, Ben. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On 06/25/2014 07:54 AM, Benjamin Herrenschmidt wrote: On Wed, 2014-06-25 at 00:33 +1000, Alexey Kardashevskiy wrote: I do not understand why @val is considered LE here and need to be converted to CPU. Really. I truly believe it should be cpu_to_le32(). No. Both are slightly wrong semantically but le32_to_cpu() is less wrong :-) iowrite32 supposedly takes a cpu value as argument and writes an le value. So if anything, you need something that converts to a cpu value before you call iowrite32. Now it's still slightly wrong because the input to le32_to_cpu() is supposed to be an LE value but of course here it's not, it's a cpu value too :-) But yes, I agree with aw here, either do nothing or stick a set of iowriteXX_native or something equivalent in the generic iomap header, define them in term of iowrite32be/iowrite32 based on the compile time endian of the arch. Ok. I'll do nothing. Hitting asm-generic/iomap.h I think will cover all archs except ARM. For ARM, just hack arch/arm/asm/io.h -- Alexey -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On Sat, 2014-06-21 at 00:14 +1000, Alexey Kardashevskiy wrote: > We can still use __raw_writel, would that be ok? No unless you understand precisely what kind of memory barriers each platform require for these. Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote: > Working on big endian being an accident may be a matter of perspective :-) > The comment remains that this patch doesn't actually fix anything except > the overhead on big endian systems doing redundant byte swapping and > maybe the philosophy that vfio regions are little endian. Yes, that works by accident because technically VFIO is a transport and thus shouldn't perform any endian swapping of any sort, which remains the responsibility of the end driver which is the only one to know whether a given BAR location is a a register or some streaming data and in the former case whether it's LE or BE (some PCI devices are BE even ! :-) But yes, in the end, it works with the dual "cancelling" swaps and the overhead of those swaps is probably drowned in the noise of the syscall overhead. > I'm still not a fan of iowrite vs iowritebe, there must be something we > can use that doesn't have an implicit swap. Sadly there isn't ... In the old day we didn't even have the "be" variant and readl/writel style accessors still don't have them either for all archs. There is __raw_readl/writel but here the semantics are much more than just "don't swap", they also don't have memory barriers (which means they are essentially useless to most drivers unless those are platform specific drivers which know exactly what they are doing, or in the rare cases such as accessing a framebuffer which we know never have side effects). > Calling it iowrite*_native is also an abuse of the namespace. > Next thing we know some common code > will legitimately use that name. I might make sense to those definitions into a common header. There have been a handful of cases in the past that wanted that sort of "native byte order" MMIOs iirc (though don't ask me for examples, I can't really remember). > If we do need to define an alias > (which I'd like to avoid) it should be something like vfio_iowrite32. > Thanks, Cheers, Ben. > Alex > > > > === > > > > > > any better? > > > > > > > > > > > > > > Suggested-by: Benjamin Herrenschmidt > > Signed-off-by: Alexey Kardashevskiy > > --- > > drivers/vfio/pci/vfio_pci_rdwr.c | 20 > > 1 file changed, 16 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c > > b/drivers/vfio/pci/vfio_pci_rdwr.c > > index 210db24..f363b5a 100644 > > --- a/drivers/vfio/pci/vfio_pci_rdwr.c > > +++ b/drivers/vfio/pci/vfio_pci_rdwr.c > > @@ -21,6 +21,18 @@ > > > > #include "vfio_pci_private.h" > > > > +#ifdef __BIG_ENDIAN__ > > +#define ioread16_native ioread16be > > +#define ioread32_native ioread32be > > +#define iowrite16_native iowrite16be > > +#define iowrite32_native iowrite32be > > +#else > > +#define ioread16_native ioread16 > > +#define ioread32_native ioread32 > > +#define iowrite16_native iowrite16 > > +#define iowrite32_native iowrite32 > > +#endif > > + > > /* > > * Read or write from an __iomem region (MMIO or I/O port) with an > > excluded > > * range which is inaccessible. The excluded range drops writes and > > fills > > @@ -50,9 +62,9 @@ static ssize_t do_io_rw(void __iomem *io, char > > __user *buf, > > if (copy_from_user(, buf, 4)) > > return -EFAULT; > > > > - iowrite32(le32_to_cpu(val), io + off); > > + iowrite32_native(val, io + off); > > } else { > > - val = cpu_to_le32(ioread32(io + off)); > > + val = ioread32_native(io + off); > > > > if (copy_to_user(buf, , 4)) > > return -EFAULT; > > @@ -66,9 +78,9 @@ static ssize_t do_io_rw(void __iomem *io, char > > __user *buf, > > if (copy_from_user(, buf, 2)) > > return -EFAULT; > > > > - iowrite16(le16_to_cpu(val), io + off); > > + iowrite16_native(val, io + off); > > } else { > > - val = cpu_to_le16(ioread16(io + off)); > > + val = ioread16_native(io + off); > > > > if (copy_to_user(buf, , 2)) > > return -EFAULT; > > >>> > > >>> > > >>> > > >> > > >> > > > > > > > > > > > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On 06/20/2014 01:21 PM, Alex Williamson wrote: > On Thu, 2014-06-19 at 13:48 +1000, Alexey Kardashevskiy wrote: >> On 06/19/2014 11:50 AM, Alexey Kardashevskiy wrote: >>> On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote: On 06/19/2014 04:35 AM, Alex Williamson wrote: > On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote: >> VFIO exposes BARs to user space as a byte stream so userspace can >> read it using pread()/pwrite(). Since this is a byte stream, VFIO should >> not do byte swapping and simply return values as it gets them from >> PCI device. >> >> Instead, the existing code assumes that byte stream in read/write is >> little-endian and it fixes endianness for values which it passes to >> ioreadXX/iowriteXX helpers. This works for little-endian as PCI is >> little endian and le32_to_cpu/... are stubs. > > vfio read32: > > val = cpu_to_le32(ioread32(io + off)); > > Where the typical x86 case, ioread32 is: > > #define ioread32(addr) readl(addr) > > and readl is: > > __le32_to_cpu(__raw_readl(addr)); > > So we do canceling byte swaps, which are both nops on x86, and end up > returning device endian, which we assume is little endian. > > vfio write32 is similar: > > iowrite32(le32_to_cpu(val), io + off); > > The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel > out, so input data is device endian, which is assumed little. > >> This also works for big endian but rather by an accident: it reads 4 >> bytes >> from the stream (@val is big endian), converts to CPU format (which >> should >> be big endian) as it was little endian (@val becomes actually little >> endian) and calls iowrite32() which does not do swapping on big endian >> system. > > Really? > > In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around > writel(), which seems to use the generic implementation, which does > include a cpu_to_le32. Ouch, wrong comment. iowrite32() does swapping. My bad. > > I also see other big endian archs like parisc doing cpu_to_le32 on > iowrite32, so I don't think this statement is true. I imagine it's > probably working for you because the swap cancel. > >> This removes byte swapping and makes use ioread32be/iowrite32be >> (and 16bit versions) on big-endian systems. The "be" helpers take >> native endian values and do swapping at the moment of writing to a PCI >> register using one of "store byte-reversed" instructions. > > So now you want iowrite32() on little endian and iowrite32be() on big > endian, the former does a cpu_to_le32 (which is a nop on little endian) > and the latter does a cpu_to_be32 (which is a nop on big endian)... > should we just be using __raw_writel() on both? We can do that too. The beauty of iowrite32be on ppc64 is that it does not swap and write separately, it is implemented via the "Store Word Byte-Reverse Indexed X-form" single instruction. And some archs (do not know which ones) may add memory barriers in their implementations of ioread/iowrite. __raw_writel is too raw :) > There doesn't actually > seem to be any change in behavior here, it just eliminates back-to-back > byte swaps, which are a nop on x86, but not power, right? Exactly. No dependency for QEMU. >>> >>> How about that: >>> === >>> >>> VFIO exposes BARs to user space as a byte stream so userspace can >>> read it using pread()/pwrite(). Since this is a byte stream, VFIO should >>> not do byte swapping and simply return values as it gets them from >>> PCI device. >>> >>> Instead, the existing code assumes that byte stream in read/write is >>> little-endian and it fixes endianness for values which it passes to >>> ioreadXX/iowriteXX helpers in native format. The IO helpers do swapping >>> again. Since both byte swaps are nops on little-endian host, this works. >>> >>> This also works for big endian but rather by an accident: it reads 4 bytes >>> from the stream (@val is big endian), converts to CPU format (which should >>> be big endian) as it was little endian (and @val becomes actually little >>> endian) and calls iowrite32() which does swapping on big endian >>> system again. So byte swap gets cancelled, __raw_writel() receives >>> a native value and then >>> *(volatile unsigned int __force *)PCI_FIX_ADDR(addr) = v; >>> just does the right thing. >> >> I am wrong here, sorry. This is what happens when you watch soccer between >> 2am and 4am :) >> >> >>> >>> This removes byte swaps and makes use of ioread32be/iowrite32be >>> (and 16bit versions) which do explicit byte swapping at the moment >>> of write to a PCI register. PPC64 uses a special "Store Word >>> Byte-Reverse Indexed X-form" instruction which does swap and store. >> >> No
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On 06/20/2014 01:21 PM, Alex Williamson wrote: On Thu, 2014-06-19 at 13:48 +1000, Alexey Kardashevskiy wrote: On 06/19/2014 11:50 AM, Alexey Kardashevskiy wrote: On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote: On 06/19/2014 04:35 AM, Alex Williamson wrote: On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote: VFIO exposes BARs to user space as a byte stream so userspace can read it using pread()/pwrite(). Since this is a byte stream, VFIO should not do byte swapping and simply return values as it gets them from PCI device. Instead, the existing code assumes that byte stream in read/write is little-endian and it fixes endianness for values which it passes to ioreadXX/iowriteXX helpers. This works for little-endian as PCI is little endian and le32_to_cpu/... are stubs. vfio read32: val = cpu_to_le32(ioread32(io + off)); Where the typical x86 case, ioread32 is: #define ioread32(addr) readl(addr) and readl is: __le32_to_cpu(__raw_readl(addr)); So we do canceling byte swaps, which are both nops on x86, and end up returning device endian, which we assume is little endian. vfio write32 is similar: iowrite32(le32_to_cpu(val), io + off); The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel out, so input data is device endian, which is assumed little. This also works for big endian but rather by an accident: it reads 4 bytes from the stream (@val is big endian), converts to CPU format (which should be big endian) as it was little endian (@val becomes actually little endian) and calls iowrite32() which does not do swapping on big endian system. Really? In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around writel(), which seems to use the generic implementation, which does include a cpu_to_le32. Ouch, wrong comment. iowrite32() does swapping. My bad. I also see other big endian archs like parisc doing cpu_to_le32 on iowrite32, so I don't think this statement is true. I imagine it's probably working for you because the swap cancel. This removes byte swapping and makes use ioread32be/iowrite32be (and 16bit versions) on big-endian systems. The be helpers take native endian values and do swapping at the moment of writing to a PCI register using one of store byte-reversed instructions. So now you want iowrite32() on little endian and iowrite32be() on big endian, the former does a cpu_to_le32 (which is a nop on little endian) and the latter does a cpu_to_be32 (which is a nop on big endian)... should we just be using __raw_writel() on both? We can do that too. The beauty of iowrite32be on ppc64 is that it does not swap and write separately, it is implemented via the Store Word Byte-Reverse Indexed X-form single instruction. And some archs (do not know which ones) may add memory barriers in their implementations of ioread/iowrite. __raw_writel is too raw :) There doesn't actually seem to be any change in behavior here, it just eliminates back-to-back byte swaps, which are a nop on x86, but not power, right? Exactly. No dependency for QEMU. How about that: === VFIO exposes BARs to user space as a byte stream so userspace can read it using pread()/pwrite(). Since this is a byte stream, VFIO should not do byte swapping and simply return values as it gets them from PCI device. Instead, the existing code assumes that byte stream in read/write is little-endian and it fixes endianness for values which it passes to ioreadXX/iowriteXX helpers in native format. The IO helpers do swapping again. Since both byte swaps are nops on little-endian host, this works. This also works for big endian but rather by an accident: it reads 4 bytes from the stream (@val is big endian), converts to CPU format (which should be big endian) as it was little endian (and @val becomes actually little endian) and calls iowrite32() which does swapping on big endian system again. So byte swap gets cancelled, __raw_writel() receives a native value and then *(volatile unsigned int __force *)PCI_FIX_ADDR(addr) = v; just does the right thing. I am wrong here, sorry. This is what happens when you watch soccer between 2am and 4am :) This removes byte swaps and makes use of ioread32be/iowrite32be (and 16bit versions) which do explicit byte swapping at the moment of write to a PCI register. PPC64 uses a special Store Word Byte-Reverse Indexed X-form instruction which does swap and store. No swapping is done here if we use ioread32be as it calls in_be32 and that animal does lwz which is simple load from memory. So @val (16/32 bit variable on stack) will have different values on LE and BE but since we do not handle it the host and just memcpy it to the buffer, nothing breaks here. So it should be like this: === VFIO exposes BARs to user space as a byte stream so userspace can read it using pread()/pwrite(). Since this is a byte stream, VFIO should not do byte swapping and simply return
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote: Working on big endian being an accident may be a matter of perspective :-) The comment remains that this patch doesn't actually fix anything except the overhead on big endian systems doing redundant byte swapping and maybe the philosophy that vfio regions are little endian. Yes, that works by accident because technically VFIO is a transport and thus shouldn't perform any endian swapping of any sort, which remains the responsibility of the end driver which is the only one to know whether a given BAR location is a a register or some streaming data and in the former case whether it's LE or BE (some PCI devices are BE even ! :-) But yes, in the end, it works with the dual cancelling swaps and the overhead of those swaps is probably drowned in the noise of the syscall overhead. I'm still not a fan of iowrite vs iowritebe, there must be something we can use that doesn't have an implicit swap. Sadly there isn't ... In the old day we didn't even have the be variant and readl/writel style accessors still don't have them either for all archs. There is __raw_readl/writel but here the semantics are much more than just don't swap, they also don't have memory barriers (which means they are essentially useless to most drivers unless those are platform specific drivers which know exactly what they are doing, or in the rare cases such as accessing a framebuffer which we know never have side effects). Calling it iowrite*_native is also an abuse of the namespace. Next thing we know some common code will legitimately use that name. I might make sense to those definitions into a common header. There have been a handful of cases in the past that wanted that sort of native byte order MMIOs iirc (though don't ask me for examples, I can't really remember). If we do need to define an alias (which I'd like to avoid) it should be something like vfio_iowrite32. Thanks, Cheers, Ben. Alex === any better? Suggested-by: Benjamin Herrenschmidt b...@kernel.crashing.org Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- drivers/vfio/pci/vfio_pci_rdwr.c | 20 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c index 210db24..f363b5a 100644 --- a/drivers/vfio/pci/vfio_pci_rdwr.c +++ b/drivers/vfio/pci/vfio_pci_rdwr.c @@ -21,6 +21,18 @@ #include vfio_pci_private.h +#ifdef __BIG_ENDIAN__ +#define ioread16_native ioread16be +#define ioread32_native ioread32be +#define iowrite16_native iowrite16be +#define iowrite32_native iowrite32be +#else +#define ioread16_native ioread16 +#define ioread32_native ioread32 +#define iowrite16_native iowrite16 +#define iowrite32_native iowrite32 +#endif + /* * Read or write from an __iomem region (MMIO or I/O port) with an excluded * range which is inaccessible. The excluded range drops writes and fills @@ -50,9 +62,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf, if (copy_from_user(val, buf, 4)) return -EFAULT; - iowrite32(le32_to_cpu(val), io + off); + iowrite32_native(val, io + off); } else { - val = cpu_to_le32(ioread32(io + off)); + val = ioread32_native(io + off); if (copy_to_user(buf, val, 4)) return -EFAULT; @@ -66,9 +78,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf, if (copy_from_user(val, buf, 2)) return -EFAULT; - iowrite16(le16_to_cpu(val), io + off); + iowrite16_native(val, io + off); } else { - val = cpu_to_le16(ioread16(io + off)); + val = ioread16_native(io + off); if (copy_to_user(buf, val, 2)) return -EFAULT; -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On Sat, 2014-06-21 at 00:14 +1000, Alexey Kardashevskiy wrote: We can still use __raw_writelco, would that be ok? No unless you understand precisely what kind of memory barriers each platform require for these. Cheers, Ben. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On Thu, 2014-06-19 at 13:48 +1000, Alexey Kardashevskiy wrote: > On 06/19/2014 11:50 AM, Alexey Kardashevskiy wrote: > > On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote: > >> On 06/19/2014 04:35 AM, Alex Williamson wrote: > >>> On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote: > VFIO exposes BARs to user space as a byte stream so userspace can > read it using pread()/pwrite(). Since this is a byte stream, VFIO should > not do byte swapping and simply return values as it gets them from > PCI device. > > Instead, the existing code assumes that byte stream in read/write is > little-endian and it fixes endianness for values which it passes to > ioreadXX/iowriteXX helpers. This works for little-endian as PCI is > little endian and le32_to_cpu/... are stubs. > >>> > >>> vfio read32: > >>> > >>> val = cpu_to_le32(ioread32(io + off)); > >>> > >>> Where the typical x86 case, ioread32 is: > >>> > >>> #define ioread32(addr) readl(addr) > >>> > >>> and readl is: > >>> > >>> __le32_to_cpu(__raw_readl(addr)); > >>> > >>> So we do canceling byte swaps, which are both nops on x86, and end up > >>> returning device endian, which we assume is little endian. > >>> > >>> vfio write32 is similar: > >>> > >>> iowrite32(le32_to_cpu(val), io + off); > >>> > >>> The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel > >>> out, so input data is device endian, which is assumed little. > >>> > This also works for big endian but rather by an accident: it reads 4 > bytes > from the stream (@val is big endian), converts to CPU format (which > should > be big endian) as it was little endian (@val becomes actually little > endian) and calls iowrite32() which does not do swapping on big endian > system. > >>> > >>> Really? > >>> > >>> In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around > >>> writel(), which seems to use the generic implementation, which does > >>> include a cpu_to_le32. > >> > >> > >> Ouch, wrong comment. iowrite32() does swapping. My bad. > >> > >> > >>> > >>> I also see other big endian archs like parisc doing cpu_to_le32 on > >>> iowrite32, so I don't think this statement is true. I imagine it's > >>> probably working for you because the swap cancel. > >>> > This removes byte swapping and makes use ioread32be/iowrite32be > (and 16bit versions) on big-endian systems. The "be" helpers take > native endian values and do swapping at the moment of writing to a PCI > register using one of "store byte-reversed" instructions. > >>> > >>> So now you want iowrite32() on little endian and iowrite32be() on big > >>> endian, the former does a cpu_to_le32 (which is a nop on little endian) > >>> and the latter does a cpu_to_be32 (which is a nop on big endian)... > >>> should we just be using __raw_writel() on both? > >> > >> > >> We can do that too. The beauty of iowrite32be on ppc64 is that it does not > >> swap and write separately, it is implemented via the "Store Word > >> Byte-Reverse Indexed X-form" single instruction. > >> > >> And some archs (do not know which ones) may add memory barriers in their > >> implementations of ioread/iowrite. __raw_writel is too raw :) > >> > >>> There doesn't actually > >>> seem to be any change in behavior here, it just eliminates back-to-back > >>> byte swaps, which are a nop on x86, but not power, right? > >> > >> Exactly. No dependency for QEMU. > > > > How about that: > > === > > > > VFIO exposes BARs to user space as a byte stream so userspace can > > read it using pread()/pwrite(). Since this is a byte stream, VFIO should > > not do byte swapping and simply return values as it gets them from > > PCI device. > > > > Instead, the existing code assumes that byte stream in read/write is > > little-endian and it fixes endianness for values which it passes to > > ioreadXX/iowriteXX helpers in native format. The IO helpers do swapping > > again. Since both byte swaps are nops on little-endian host, this works. > > > > This also works for big endian but rather by an accident: it reads 4 bytes > > from the stream (@val is big endian), converts to CPU format (which should > > be big endian) as it was little endian (and @val becomes actually little > > endian) and calls iowrite32() which does swapping on big endian > > system again. So byte swap gets cancelled, __raw_writel() receives > > a native value and then > > *(volatile unsigned int __force *)PCI_FIX_ADDR(addr) = v; > > just does the right thing. > > I am wrong here, sorry. This is what happens when you watch soccer between > 2am and 4am :) > > > > > > This removes byte swaps and makes use of ioread32be/iowrite32be > > (and 16bit versions) which do explicit byte swapping at the moment > > of write to a PCI register. PPC64 uses a special "Store Word > > Byte-Reverse Indexed X-form" instruction which does swap and store. > > No swapping is done here if we use ioread32be as it
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On 06/19/2014 03:30 PM, bharat.bhus...@freescale.com wrote: > > >> -Original Message- >> From: Linuxppc-dev [mailto:linuxppc-dev- >> bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Alexey >> Kardashevskiy >> Sent: Thursday, June 19, 2014 9:18 AM >> To: Alex Williamson >> Cc: k...@vger.kernel.org; Nikunj A Dadhania; linux-kernel@vger.kernel.org; >> Alexander Graf; linuxppc-...@lists.ozlabs.org >> Subject: Re: [PATCH] vfio: Fix endianness handling for emulated BARs >> >> On 06/19/2014 11:50 AM, Alexey Kardashevskiy wrote: >>> On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote: >>>> On 06/19/2014 04:35 AM, Alex Williamson wrote: >>>>> On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote: >>>>>> VFIO exposes BARs to user space as a byte stream so userspace can >>>>>> read it using pread()/pwrite(). Since this is a byte stream, VFIO >>>>>> should not do byte swapping and simply return values as it gets >>>>>> them from PCI device. >>>>>> >>>>>> Instead, the existing code assumes that byte stream in read/write >>>>>> is little-endian and it fixes endianness for values which it passes >>>>>> to ioreadXX/iowriteXX helpers. This works for little-endian as PCI >>>>>> is little endian and le32_to_cpu/... are stubs. >>>>> >>>>> vfio read32: >>>>> >>>>> val = cpu_to_le32(ioread32(io + off)); >>>>> >>>>> Where the typical x86 case, ioread32 is: >>>>> >>>>> #define ioread32(addr) readl(addr) >>>>> >>>>> and readl is: >>>>> >>>>> __le32_to_cpu(__raw_readl(addr)); >>>>> >>>>> So we do canceling byte swaps, which are both nops on x86, and end >>>>> up returning device endian, which we assume is little endian. >>>>> >>>>> vfio write32 is similar: >>>>> >>>>> iowrite32(le32_to_cpu(val), io + off); >>>>> >>>>> The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel >>>>> out, so input data is device endian, which is assumed little. >>>>> >>>>>> This also works for big endian but rather by an accident: it reads >>>>>> 4 bytes from the stream (@val is big endian), converts to CPU >>>>>> format (which should be big endian) as it was little endian (@val >>>>>> becomes actually little >>>>>> endian) and calls iowrite32() which does not do swapping on big >>>>>> endian system. >>>>> >>>>> Really? >>>>> >>>>> In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around >>>>> writel(), which seems to use the generic implementation, which does >>>>> include a cpu_to_le32. >>>> >>>> >>>> Ouch, wrong comment. iowrite32() does swapping. My bad. >>>> >>>> >>>>> >>>>> I also see other big endian archs like parisc doing cpu_to_le32 on >>>>> iowrite32, so I don't think this statement is true. I imagine it's >>>>> probably working for you because the swap cancel. >>>>> >>>>>> This removes byte swapping and makes use ioread32be/iowrite32be >>>>>> (and 16bit versions) on big-endian systems. The "be" helpers take >>>>>> native endian values and do swapping at the moment of writing to a >>>>>> PCI register using one of "store byte-reversed" instructions. >>>>> >>>>> So now you want iowrite32() on little endian and iowrite32be() on >>>>> big endian, the former does a cpu_to_le32 (which is a nop on little >>>>> endian) and the latter does a cpu_to_be32 (which is a nop on big >>>>> endian)... >>>>> should we just be using __raw_writel() on both? >>>> >>>> >>>> We can do that too. The beauty of iowrite32be on ppc64 is that it >>>> does not swap and write separately, it is implemented via the "Store >>>> Word Byte-Reverse Indexed X-form" single instruction. >>>> >>>> And some archs (do not know which ones) may add memory barriers in >>>> their implementations of ioread/iowrite. __raw_writel is too raw :) >>>> >>>>> There doesn't actually >
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On 06/19/2014 03:30 PM, bharat.bhus...@freescale.com wrote: -Original Message- From: Linuxppc-dev [mailto:linuxppc-dev- bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Alexey Kardashevskiy Sent: Thursday, June 19, 2014 9:18 AM To: Alex Williamson Cc: k...@vger.kernel.org; Nikunj A Dadhania; linux-kernel@vger.kernel.org; Alexander Graf; linuxppc-...@lists.ozlabs.org Subject: Re: [PATCH] vfio: Fix endianness handling for emulated BARs On 06/19/2014 11:50 AM, Alexey Kardashevskiy wrote: On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote: On 06/19/2014 04:35 AM, Alex Williamson wrote: On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote: VFIO exposes BARs to user space as a byte stream so userspace can read it using pread()/pwrite(). Since this is a byte stream, VFIO should not do byte swapping and simply return values as it gets them from PCI device. Instead, the existing code assumes that byte stream in read/write is little-endian and it fixes endianness for values which it passes to ioreadXX/iowriteXX helpers. This works for little-endian as PCI is little endian and le32_to_cpu/... are stubs. vfio read32: val = cpu_to_le32(ioread32(io + off)); Where the typical x86 case, ioread32 is: #define ioread32(addr) readl(addr) and readl is: __le32_to_cpu(__raw_readl(addr)); So we do canceling byte swaps, which are both nops on x86, and end up returning device endian, which we assume is little endian. vfio write32 is similar: iowrite32(le32_to_cpu(val), io + off); The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel out, so input data is device endian, which is assumed little. This also works for big endian but rather by an accident: it reads 4 bytes from the stream (@val is big endian), converts to CPU format (which should be big endian) as it was little endian (@val becomes actually little endian) and calls iowrite32() which does not do swapping on big endian system. Really? In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around writel(), which seems to use the generic implementation, which does include a cpu_to_le32. Ouch, wrong comment. iowrite32() does swapping. My bad. I also see other big endian archs like parisc doing cpu_to_le32 on iowrite32, so I don't think this statement is true. I imagine it's probably working for you because the swap cancel. This removes byte swapping and makes use ioread32be/iowrite32be (and 16bit versions) on big-endian systems. The be helpers take native endian values and do swapping at the moment of writing to a PCI register using one of store byte-reversed instructions. So now you want iowrite32() on little endian and iowrite32be() on big endian, the former does a cpu_to_le32 (which is a nop on little endian) and the latter does a cpu_to_be32 (which is a nop on big endian)... should we just be using __raw_writel() on both? We can do that too. The beauty of iowrite32be on ppc64 is that it does not swap and write separately, it is implemented via the Store Word Byte-Reverse Indexed X-form single instruction. And some archs (do not know which ones) may add memory barriers in their implementations of ioread/iowrite. __raw_writel is too raw :) There doesn't actually seem to be any change in behavior here, it just eliminates back-to-back byte swaps, which are a nop on x86, but not power, right? Exactly. No dependency for QEMU. How about that: === VFIO exposes BARs to user space as a byte stream so userspace can read it using pread()/pwrite(). Since this is a byte stream, VFIO should not do byte swapping and simply return values as it gets them from PCI device. Instead, the existing code assumes that byte stream in read/write is little-endian and it fixes endianness for values which it passes to ioreadXX/iowriteXX helpers in native format. The IO helpers do swapping again. Since both byte swaps are nops on little-endian host, this works. This also works for big endian but rather by an accident: it reads 4 bytes from the stream (@val is big endian), converts to CPU format (which should be big endian) as it was little endian (and @val becomes actually little endian) and calls iowrite32() which does swapping on big endian system again. So byte swap gets cancelled, __raw_writel() receives a native value and then *(volatile unsigned int __force *)PCI_FIX_ADDR(addr) = v; just does the right thing. I am wrong here, sorry. This is what happens when you watch soccer between 2am and 4am :) This removes byte swaps and makes use of ioread32be/iowrite32be (and 16bit versions) which do explicit byte swapping at the moment of write to a PCI register. PPC64 uses a special Store Word Byte-Reverse Indexed X-form instruction which does swap and store. No swapping is done here if we use ioread32be as it calls in_be32 and that animal does lwz which is simple load from memory. So @val (16/32 bit
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On Thu, 2014-06-19 at 13:48 +1000, Alexey Kardashevskiy wrote: On 06/19/2014 11:50 AM, Alexey Kardashevskiy wrote: On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote: On 06/19/2014 04:35 AM, Alex Williamson wrote: On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote: VFIO exposes BARs to user space as a byte stream so userspace can read it using pread()/pwrite(). Since this is a byte stream, VFIO should not do byte swapping and simply return values as it gets them from PCI device. Instead, the existing code assumes that byte stream in read/write is little-endian and it fixes endianness for values which it passes to ioreadXX/iowriteXX helpers. This works for little-endian as PCI is little endian and le32_to_cpu/... are stubs. vfio read32: val = cpu_to_le32(ioread32(io + off)); Where the typical x86 case, ioread32 is: #define ioread32(addr) readl(addr) and readl is: __le32_to_cpu(__raw_readl(addr)); So we do canceling byte swaps, which are both nops on x86, and end up returning device endian, which we assume is little endian. vfio write32 is similar: iowrite32(le32_to_cpu(val), io + off); The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel out, so input data is device endian, which is assumed little. This also works for big endian but rather by an accident: it reads 4 bytes from the stream (@val is big endian), converts to CPU format (which should be big endian) as it was little endian (@val becomes actually little endian) and calls iowrite32() which does not do swapping on big endian system. Really? In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around writel(), which seems to use the generic implementation, which does include a cpu_to_le32. Ouch, wrong comment. iowrite32() does swapping. My bad. I also see other big endian archs like parisc doing cpu_to_le32 on iowrite32, so I don't think this statement is true. I imagine it's probably working for you because the swap cancel. This removes byte swapping and makes use ioread32be/iowrite32be (and 16bit versions) on big-endian systems. The be helpers take native endian values and do swapping at the moment of writing to a PCI register using one of store byte-reversed instructions. So now you want iowrite32() on little endian and iowrite32be() on big endian, the former does a cpu_to_le32 (which is a nop on little endian) and the latter does a cpu_to_be32 (which is a nop on big endian)... should we just be using __raw_writel() on both? We can do that too. The beauty of iowrite32be on ppc64 is that it does not swap and write separately, it is implemented via the Store Word Byte-Reverse Indexed X-form single instruction. And some archs (do not know which ones) may add memory barriers in their implementations of ioread/iowrite. __raw_writel is too raw :) There doesn't actually seem to be any change in behavior here, it just eliminates back-to-back byte swaps, which are a nop on x86, but not power, right? Exactly. No dependency for QEMU. How about that: === VFIO exposes BARs to user space as a byte stream so userspace can read it using pread()/pwrite(). Since this is a byte stream, VFIO should not do byte swapping and simply return values as it gets them from PCI device. Instead, the existing code assumes that byte stream in read/write is little-endian and it fixes endianness for values which it passes to ioreadXX/iowriteXX helpers in native format. The IO helpers do swapping again. Since both byte swaps are nops on little-endian host, this works. This also works for big endian but rather by an accident: it reads 4 bytes from the stream (@val is big endian), converts to CPU format (which should be big endian) as it was little endian (and @val becomes actually little endian) and calls iowrite32() which does swapping on big endian system again. So byte swap gets cancelled, __raw_writel() receives a native value and then *(volatile unsigned int __force *)PCI_FIX_ADDR(addr) = v; just does the right thing. I am wrong here, sorry. This is what happens when you watch soccer between 2am and 4am :) This removes byte swaps and makes use of ioread32be/iowrite32be (and 16bit versions) which do explicit byte swapping at the moment of write to a PCI register. PPC64 uses a special Store Word Byte-Reverse Indexed X-form instruction which does swap and store. No swapping is done here if we use ioread32be as it calls in_be32 and that animal does lwz which is simple load from memory. So @val (16/32 bit variable on stack) will have different values on LE and BE but since we do not handle it the host and just memcpy it to the buffer, nothing breaks here. So it should be like this: === VFIO exposes BARs to user space as a byte stream so userspace can read it using pread()/pwrite(). Since this
RE: [PATCH] vfio: Fix endianness handling for emulated BARs
> -Original Message- > From: Linuxppc-dev [mailto:linuxppc-dev- > bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Alexey > Kardashevskiy > Sent: Thursday, June 19, 2014 9:18 AM > To: Alex Williamson > Cc: k...@vger.kernel.org; Nikunj A Dadhania; linux-kernel@vger.kernel.org; > Alexander Graf; linuxppc-...@lists.ozlabs.org > Subject: Re: [PATCH] vfio: Fix endianness handling for emulated BARs > > On 06/19/2014 11:50 AM, Alexey Kardashevskiy wrote: > > On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote: > >> On 06/19/2014 04:35 AM, Alex Williamson wrote: > >>> On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote: > >>>> VFIO exposes BARs to user space as a byte stream so userspace can > >>>> read it using pread()/pwrite(). Since this is a byte stream, VFIO > >>>> should not do byte swapping and simply return values as it gets > >>>> them from PCI device. > >>>> > >>>> Instead, the existing code assumes that byte stream in read/write > >>>> is little-endian and it fixes endianness for values which it passes > >>>> to ioreadXX/iowriteXX helpers. This works for little-endian as PCI > >>>> is little endian and le32_to_cpu/... are stubs. > >>> > >>> vfio read32: > >>> > >>> val = cpu_to_le32(ioread32(io + off)); > >>> > >>> Where the typical x86 case, ioread32 is: > >>> > >>> #define ioread32(addr) readl(addr) > >>> > >>> and readl is: > >>> > >>> __le32_to_cpu(__raw_readl(addr)); > >>> > >>> So we do canceling byte swaps, which are both nops on x86, and end > >>> up returning device endian, which we assume is little endian. > >>> > >>> vfio write32 is similar: > >>> > >>> iowrite32(le32_to_cpu(val), io + off); > >>> > >>> The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel > >>> out, so input data is device endian, which is assumed little. > >>> > >>>> This also works for big endian but rather by an accident: it reads > >>>> 4 bytes from the stream (@val is big endian), converts to CPU > >>>> format (which should be big endian) as it was little endian (@val > >>>> becomes actually little > >>>> endian) and calls iowrite32() which does not do swapping on big > >>>> endian system. > >>> > >>> Really? > >>> > >>> In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around > >>> writel(), which seems to use the generic implementation, which does > >>> include a cpu_to_le32. > >> > >> > >> Ouch, wrong comment. iowrite32() does swapping. My bad. > >> > >> > >>> > >>> I also see other big endian archs like parisc doing cpu_to_le32 on > >>> iowrite32, so I don't think this statement is true. I imagine it's > >>> probably working for you because the swap cancel. > >>> > >>>> This removes byte swapping and makes use ioread32be/iowrite32be > >>>> (and 16bit versions) on big-endian systems. The "be" helpers take > >>>> native endian values and do swapping at the moment of writing to a > >>>> PCI register using one of "store byte-reversed" instructions. > >>> > >>> So now you want iowrite32() on little endian and iowrite32be() on > >>> big endian, the former does a cpu_to_le32 (which is a nop on little > >>> endian) and the latter does a cpu_to_be32 (which is a nop on big > >>> endian)... > >>> should we just be using __raw_writel() on both? > >> > >> > >> We can do that too. The beauty of iowrite32be on ppc64 is that it > >> does not swap and write separately, it is implemented via the "Store > >> Word Byte-Reverse Indexed X-form" single instruction. > >> > >> And some archs (do not know which ones) may add memory barriers in > >> their implementations of ioread/iowrite. __raw_writel is too raw :) > >> > >>> There doesn't actually > >>> seem to be any change in behavior here, it just eliminates > >>> back-to-back byte swaps, which are a nop on x86, but not power, right? > >> > >> Exactly. No dependency for QEMU. > > > > How about that: > > === > > > > VFIO exposes BARs to user space as a byte stream so
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On 06/19/2014 11:50 AM, Alexey Kardashevskiy wrote: > On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote: >> On 06/19/2014 04:35 AM, Alex Williamson wrote: >>> On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote: VFIO exposes BARs to user space as a byte stream so userspace can read it using pread()/pwrite(). Since this is a byte stream, VFIO should not do byte swapping and simply return values as it gets them from PCI device. Instead, the existing code assumes that byte stream in read/write is little-endian and it fixes endianness for values which it passes to ioreadXX/iowriteXX helpers. This works for little-endian as PCI is little endian and le32_to_cpu/... are stubs. >>> >>> vfio read32: >>> >>> val = cpu_to_le32(ioread32(io + off)); >>> >>> Where the typical x86 case, ioread32 is: >>> >>> #define ioread32(addr) readl(addr) >>> >>> and readl is: >>> >>> __le32_to_cpu(__raw_readl(addr)); >>> >>> So we do canceling byte swaps, which are both nops on x86, and end up >>> returning device endian, which we assume is little endian. >>> >>> vfio write32 is similar: >>> >>> iowrite32(le32_to_cpu(val), io + off); >>> >>> The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel >>> out, so input data is device endian, which is assumed little. >>> This also works for big endian but rather by an accident: it reads 4 bytes from the stream (@val is big endian), converts to CPU format (which should be big endian) as it was little endian (@val becomes actually little endian) and calls iowrite32() which does not do swapping on big endian system. >>> >>> Really? >>> >>> In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around >>> writel(), which seems to use the generic implementation, which does >>> include a cpu_to_le32. >> >> >> Ouch, wrong comment. iowrite32() does swapping. My bad. >> >> >>> >>> I also see other big endian archs like parisc doing cpu_to_le32 on >>> iowrite32, so I don't think this statement is true. I imagine it's >>> probably working for you because the swap cancel. >>> This removes byte swapping and makes use ioread32be/iowrite32be (and 16bit versions) on big-endian systems. The "be" helpers take native endian values and do swapping at the moment of writing to a PCI register using one of "store byte-reversed" instructions. >>> >>> So now you want iowrite32() on little endian and iowrite32be() on big >>> endian, the former does a cpu_to_le32 (which is a nop on little endian) >>> and the latter does a cpu_to_be32 (which is a nop on big endian)... >>> should we just be using __raw_writel() on both? >> >> >> We can do that too. The beauty of iowrite32be on ppc64 is that it does not >> swap and write separately, it is implemented via the "Store Word >> Byte-Reverse Indexed X-form" single instruction. >> >> And some archs (do not know which ones) may add memory barriers in their >> implementations of ioread/iowrite. __raw_writel is too raw :) >> >>> There doesn't actually >>> seem to be any change in behavior here, it just eliminates back-to-back >>> byte swaps, which are a nop on x86, but not power, right? >> >> Exactly. No dependency for QEMU. > > How about that: > === > > VFIO exposes BARs to user space as a byte stream so userspace can > read it using pread()/pwrite(). Since this is a byte stream, VFIO should > not do byte swapping and simply return values as it gets them from > PCI device. > > Instead, the existing code assumes that byte stream in read/write is > little-endian and it fixes endianness for values which it passes to > ioreadXX/iowriteXX helpers in native format. The IO helpers do swapping > again. Since both byte swaps are nops on little-endian host, this works. > > This also works for big endian but rather by an accident: it reads 4 bytes > from the stream (@val is big endian), converts to CPU format (which should > be big endian) as it was little endian (and @val becomes actually little > endian) and calls iowrite32() which does swapping on big endian > system again. So byte swap gets cancelled, __raw_writel() receives > a native value and then > *(volatile unsigned int __force *)PCI_FIX_ADDR(addr) = v; > just does the right thing. I am wrong here, sorry. This is what happens when you watch soccer between 2am and 4am :) > > This removes byte swaps and makes use of ioread32be/iowrite32be > (and 16bit versions) which do explicit byte swapping at the moment > of write to a PCI register. PPC64 uses a special "Store Word > Byte-Reverse Indexed X-form" instruction which does swap and store. No swapping is done here if we use ioread32be as it calls in_be32 and that animal does "lwz" which is simple load from memory. So @val (16/32 bit variable on stack) will have different values on LE and BE but since we do not handle it the host and just memcpy it to the buffer, nothing breaks here. So it should be like this: === VFIO exposes BARs to user
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote: > On 06/19/2014 04:35 AM, Alex Williamson wrote: >> On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote: >>> VFIO exposes BARs to user space as a byte stream so userspace can >>> read it using pread()/pwrite(). Since this is a byte stream, VFIO should >>> not do byte swapping and simply return values as it gets them from >>> PCI device. >>> >>> Instead, the existing code assumes that byte stream in read/write is >>> little-endian and it fixes endianness for values which it passes to >>> ioreadXX/iowriteXX helpers. This works for little-endian as PCI is >>> little endian and le32_to_cpu/... are stubs. >> >> vfio read32: >> >> val = cpu_to_le32(ioread32(io + off)); >> >> Where the typical x86 case, ioread32 is: >> >> #define ioread32(addr) readl(addr) >> >> and readl is: >> >> __le32_to_cpu(__raw_readl(addr)); >> >> So we do canceling byte swaps, which are both nops on x86, and end up >> returning device endian, which we assume is little endian. >> >> vfio write32 is similar: >> >> iowrite32(le32_to_cpu(val), io + off); >> >> The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel >> out, so input data is device endian, which is assumed little. >> >>> This also works for big endian but rather by an accident: it reads 4 bytes >>> from the stream (@val is big endian), converts to CPU format (which should >>> be big endian) as it was little endian (@val becomes actually little >>> endian) and calls iowrite32() which does not do swapping on big endian >>> system. >> >> Really? >> >> In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around >> writel(), which seems to use the generic implementation, which does >> include a cpu_to_le32. > > > Ouch, wrong comment. iowrite32() does swapping. My bad. > > >> >> I also see other big endian archs like parisc doing cpu_to_le32 on >> iowrite32, so I don't think this statement is true. I imagine it's >> probably working for you because the swap cancel. >> >>> This removes byte swapping and makes use ioread32be/iowrite32be >>> (and 16bit versions) on big-endian systems. The "be" helpers take >>> native endian values and do swapping at the moment of writing to a PCI >>> register using one of "store byte-reversed" instructions. >> >> So now you want iowrite32() on little endian and iowrite32be() on big >> endian, the former does a cpu_to_le32 (which is a nop on little endian) >> and the latter does a cpu_to_be32 (which is a nop on big endian)... >> should we just be using __raw_writel() on both? > > > We can do that too. The beauty of iowrite32be on ppc64 is that it does not > swap and write separately, it is implemented via the "Store Word > Byte-Reverse Indexed X-form" single instruction. > > And some archs (do not know which ones) may add memory barriers in their > implementations of ioread/iowrite. __raw_writel is too raw :) > >> There doesn't actually >> seem to be any change in behavior here, it just eliminates back-to-back >> byte swaps, which are a nop on x86, but not power, right? > > Exactly. No dependency for QEMU. How about that: === VFIO exposes BARs to user space as a byte stream so userspace can read it using pread()/pwrite(). Since this is a byte stream, VFIO should not do byte swapping and simply return values as it gets them from PCI device. Instead, the existing code assumes that byte stream in read/write is little-endian and it fixes endianness for values which it passes to ioreadXX/iowriteXX helpers in native format. The IO helpers do swapping again. Since both byte swaps are nops on little-endian host, this works. This also works for big endian but rather by an accident: it reads 4 bytes from the stream (@val is big endian), converts to CPU format (which should be big endian) as it was little endian (and @val becomes actually little endian) and calls iowrite32() which does swapping on big endian system again. So byte swap gets cancelled, __raw_writel() receives a native value and then *(volatile unsigned int __force *)PCI_FIX_ADDR(addr) = v; just does the right thing. This removes byte swaps and makes use of ioread32be/iowrite32be (and 16bit versions) which do explicit byte swapping at the moment of write to a PCI register. PPC64 uses a special "Store Word Byte-Reverse Indexed X-form" instruction which does swap and store. === any better? >>> Suggested-by: Benjamin Herrenschmidt >>> Signed-off-by: Alexey Kardashevskiy >>> --- >>> drivers/vfio/pci/vfio_pci_rdwr.c | 20 >>> 1 file changed, 16 insertions(+), 4 deletions(-) >>> >>> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c >>> b/drivers/vfio/pci/vfio_pci_rdwr.c >>> index 210db24..f363b5a 100644 >>> --- a/drivers/vfio/pci/vfio_pci_rdwr.c >>> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c >>> @@ -21,6 +21,18 @@ >>> >>> #include "vfio_pci_private.h" >>> >>> +#ifdef __BIG_ENDIAN__ >>> +#define ioread16_nativeioread16be >>> +#define ioread32_native
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote: > On 06/19/2014 04:35 AM, Alex Williamson wrote: >> On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote: >>> VFIO exposes BARs to user space as a byte stream so userspace can >>> read it using pread()/pwrite(). Since this is a byte stream, VFIO should >>> not do byte swapping and simply return values as it gets them from >>> PCI device. >>> >>> Instead, the existing code assumes that byte stream in read/write is >>> little-endian and it fixes endianness for values which it passes to >>> ioreadXX/iowriteXX helpers. This works for little-endian as PCI is >>> little endian and le32_to_cpu/... are stubs. >> >> vfio read32: >> >> val = cpu_to_le32(ioread32(io + off)); >> >> Where the typical x86 case, ioread32 is: >> >> #define ioread32(addr) readl(addr) >> >> and readl is: >> >> __le32_to_cpu(__raw_readl(addr)); >> >> So we do canceling byte swaps, which are both nops on x86, and end up >> returning device endian, which we assume is little endian. >> >> vfio write32 is similar: >> >> iowrite32(le32_to_cpu(val), io + off); >> >> The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel >> out, so input data is device endian, which is assumed little. >> >>> This also works for big endian but rather by an accident: it reads 4 bytes >>> from the stream (@val is big endian), converts to CPU format (which should >>> be big endian) as it was little endian (@val becomes actually little >>> endian) and calls iowrite32() which does not do swapping on big endian >>> system. >> >> Really? >> >> In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around >> writel(), which seems to use the generic implementation, which does >> include a cpu_to_le32. > > > Ouch, wrong comment. iowrite32() does swapping. My bad. > > >> >> I also see other big endian archs like parisc doing cpu_to_le32 on >> iowrite32, so I don't think this statement is true. I imagine it's >> probably working for you because the swap cancel. >> >>> This removes byte swapping and makes use ioread32be/iowrite32be >>> (and 16bit versions) on big-endian systems. The "be" helpers take >>> native endian values and do swapping at the moment of writing to a PCI >>> register using one of "store byte-reversed" instructions. >> >> So now you want iowrite32() on little endian and iowrite32be() on big >> endian, the former does a cpu_to_le32 (which is a nop on little endian) >> and the latter does a cpu_to_be32 (which is a nop on big endian)... >> should we just be using __raw_writel() on both? > > > We can do that too. The beauty of iowrite32be on ppc64 is that it does not > swap and write separately, it is implemented via the "Store Word > Byte-Reverse Indexed X-form" single instruction. > > And some archs (do not know which ones) may add memory barriers in their > implementations of ioread/iowrite. __raw_writel is too raw :) > >> There doesn't actually >> seem to be any change in behavior here, it just eliminates back-to-back >> byte swaps, which are a nop on x86, but not power, right? > > Exactly. No dependency for QEMU. How about that: === VFIO exposes BARs to user space as a byte stream so userspace can read it using pread()/pwrite(). Since this is a byte stream, VFIO should not do byte swapping and simply return values as it gets them from PCI device. Instead, the existing code assumes that byte stream in read/write is little-endian and it fixes endianness for values which it passes to ioreadXX/iowriteXX helpers in native format. The IO helpers do swapping again. Since both byte swaps are nops on little-endian host, this works. This also works for big endian but rather by an accident: it reads 4 bytes from the stream (@val is big endian), converts to CPU format (which should be big endian) as it was little endian (and @val becomes actually little endian) and calls iowrite32() which does swapping on big endian system again. So byte swap gets cancelled, __raw_writel() receives a native value and then *(volatile unsigned int __force *)PCI_FIX_ADDR(addr) = v; just does the right thing. This removes byte swaps and makes use of ioread32be/iowrite32be (and 16bit versions) which do explicit byte swapping at the moment of write to a PCI register. PPC64 uses a special "Store Word Byte-Reverse Indexed X-form" instruction which does swap and store. === any better? >>> Suggested-by: Benjamin Herrenschmidt >>> Signed-off-by: Alexey Kardashevskiy >>> --- >>> drivers/vfio/pci/vfio_pci_rdwr.c | 20 >>> 1 file changed, 16 insertions(+), 4 deletions(-) >>> >>> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c >>> b/drivers/vfio/pci/vfio_pci_rdwr.c >>> index 210db24..f363b5a 100644 >>> --- a/drivers/vfio/pci/vfio_pci_rdwr.c >>> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c >>> @@ -21,6 +21,18 @@ >>> >>> #include "vfio_pci_private.h" >>> >>> +#ifdef __BIG_ENDIAN__ >>> +#define ioread16_nativeioread16be >>> +#define ioread32_native
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On 06/19/2014 04:35 AM, Alex Williamson wrote: > On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote: >> VFIO exposes BARs to user space as a byte stream so userspace can >> read it using pread()/pwrite(). Since this is a byte stream, VFIO should >> not do byte swapping and simply return values as it gets them from >> PCI device. >> >> Instead, the existing code assumes that byte stream in read/write is >> little-endian and it fixes endianness for values which it passes to >> ioreadXX/iowriteXX helpers. This works for little-endian as PCI is >> little endian and le32_to_cpu/... are stubs. > > vfio read32: > > val = cpu_to_le32(ioread32(io + off)); > > Where the typical x86 case, ioread32 is: > > #define ioread32(addr) readl(addr) > > and readl is: > > __le32_to_cpu(__raw_readl(addr)); > > So we do canceling byte swaps, which are both nops on x86, and end up > returning device endian, which we assume is little endian. > > vfio write32 is similar: > > iowrite32(le32_to_cpu(val), io + off); > > The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel > out, so input data is device endian, which is assumed little. > >> This also works for big endian but rather by an accident: it reads 4 bytes >> from the stream (@val is big endian), converts to CPU format (which should >> be big endian) as it was little endian (@val becomes actually little >> endian) and calls iowrite32() which does not do swapping on big endian >> system. > > Really? > > In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around > writel(), which seems to use the generic implementation, which does > include a cpu_to_le32. Ouch, wrong comment. iowrite32() does swapping. My bad. > > I also see other big endian archs like parisc doing cpu_to_le32 on > iowrite32, so I don't think this statement is true. I imagine it's > probably working for you because the swap cancel. > >> This removes byte swapping and makes use ioread32be/iowrite32be >> (and 16bit versions) on big-endian systems. The "be" helpers take >> native endian values and do swapping at the moment of writing to a PCI >> register using one of "store byte-reversed" instructions. > > So now you want iowrite32() on little endian and iowrite32be() on big > endian, the former does a cpu_to_le32 (which is a nop on little endian) > and the latter does a cpu_to_be32 (which is a nop on big endian)... > should we just be using __raw_writel() on both? We can do that too. The beauty of iowrite32be on ppc64 is that it does not swap and write separately, it is implemented via the "Store Word Byte-Reverse Indexed X-form" single instruction. And some archs (do not know which ones) may add memory barriers in their implementations of ioread/iowrite. __raw_writel is too raw :) > There doesn't actually > seem to be any change in behavior here, it just eliminates back-to-back > byte swaps, which are a nop on x86, but not power, right? Exactly. No dependency for QEMU. > >> Suggested-by: Benjamin Herrenschmidt >> Signed-off-by: Alexey Kardashevskiy >> --- >> drivers/vfio/pci/vfio_pci_rdwr.c | 20 >> 1 file changed, 16 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c >> b/drivers/vfio/pci/vfio_pci_rdwr.c >> index 210db24..f363b5a 100644 >> --- a/drivers/vfio/pci/vfio_pci_rdwr.c >> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c >> @@ -21,6 +21,18 @@ >> >> #include "vfio_pci_private.h" >> >> +#ifdef __BIG_ENDIAN__ >> +#define ioread16_native ioread16be >> +#define ioread32_native ioread32be >> +#define iowrite16_nativeiowrite16be >> +#define iowrite32_nativeiowrite32be >> +#else >> +#define ioread16_native ioread16 >> +#define ioread32_native ioread32 >> +#define iowrite16_nativeiowrite16 >> +#define iowrite32_nativeiowrite32 >> +#endif >> + >> /* >> * Read or write from an __iomem region (MMIO or I/O port) with an excluded >> * range which is inaccessible. The excluded range drops writes and fills >> @@ -50,9 +62,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf, >> if (copy_from_user(, buf, 4)) >> return -EFAULT; >> >> -iowrite32(le32_to_cpu(val), io + off); >> +iowrite32_native(val, io + off); >> } else { >> -val = cpu_to_le32(ioread32(io + off)); >> +val = ioread32_native(io + off); >> >> if (copy_to_user(buf, , 4)) >> return -EFAULT; >> @@ -66,9 +78,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf, >> if (copy_from_user(, buf, 2)) >> return -EFAULT; >> >> -iowrite16(le16_to_cpu(val), io + off); >> +
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote: > VFIO exposes BARs to user space as a byte stream so userspace can > read it using pread()/pwrite(). Since this is a byte stream, VFIO should > not do byte swapping and simply return values as it gets them from > PCI device. > > Instead, the existing code assumes that byte stream in read/write is > little-endian and it fixes endianness for values which it passes to > ioreadXX/iowriteXX helpers. This works for little-endian as PCI is > little endian and le32_to_cpu/... are stubs. vfio read32: val = cpu_to_le32(ioread32(io + off)); Where the typical x86 case, ioread32 is: #define ioread32(addr) readl(addr) and readl is: __le32_to_cpu(__raw_readl(addr)); So we do canceling byte swaps, which are both nops on x86, and end up returning device endian, which we assume is little endian. vfio write32 is similar: iowrite32(le32_to_cpu(val), io + off); The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel out, so input data is device endian, which is assumed little. > This also works for big endian but rather by an accident: it reads 4 bytes > from the stream (@val is big endian), converts to CPU format (which should > be big endian) as it was little endian (@val becomes actually little > endian) and calls iowrite32() which does not do swapping on big endian > system. Really? In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around writel(), which seems to use the generic implementation, which does include a cpu_to_le32. I also see other big endian archs like parisc doing cpu_to_le32 on iowrite32, so I don't think this statement is true. I imagine it's probably working for you because the swap cancel. > This removes byte swapping and makes use ioread32be/iowrite32be > (and 16bit versions) on big-endian systems. The "be" helpers take > native endian values and do swapping at the moment of writing to a PCI > register using one of "store byte-reversed" instructions. So now you want iowrite32() on little endian and iowrite32be() on big endian, the former does a cpu_to_le32 (which is a nop on little endian) and the latter does a cpu_to_be32 (which is a nop on big endian)... should we just be using __raw_writel() on both? There doesn't actually seem to be any change in behavior here, it just eliminates back-to-back byte swaps, which are a nop on x86, but not power, right? > Suggested-by: Benjamin Herrenschmidt > Signed-off-by: Alexey Kardashevskiy > --- > drivers/vfio/pci/vfio_pci_rdwr.c | 20 > 1 file changed, 16 insertions(+), 4 deletions(-) > > diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c > b/drivers/vfio/pci/vfio_pci_rdwr.c > index 210db24..f363b5a 100644 > --- a/drivers/vfio/pci/vfio_pci_rdwr.c > +++ b/drivers/vfio/pci/vfio_pci_rdwr.c > @@ -21,6 +21,18 @@ > > #include "vfio_pci_private.h" > > +#ifdef __BIG_ENDIAN__ > +#define ioread16_native ioread16be > +#define ioread32_native ioread32be > +#define iowrite16_native iowrite16be > +#define iowrite32_native iowrite32be > +#else > +#define ioread16_native ioread16 > +#define ioread32_native ioread32 > +#define iowrite16_native iowrite16 > +#define iowrite32_native iowrite32 > +#endif > + > /* > * Read or write from an __iomem region (MMIO or I/O port) with an excluded > * range which is inaccessible. The excluded range drops writes and fills > @@ -50,9 +62,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf, > if (copy_from_user(, buf, 4)) > return -EFAULT; > > - iowrite32(le32_to_cpu(val), io + off); > + iowrite32_native(val, io + off); > } else { > - val = cpu_to_le32(ioread32(io + off)); > + val = ioread32_native(io + off); > > if (copy_to_user(buf, , 4)) > return -EFAULT; > @@ -66,9 +78,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf, > if (copy_from_user(, buf, 2)) > return -EFAULT; > > - iowrite16(le16_to_cpu(val), io + off); > + iowrite16_native(val, io + off); > } else { > - val = cpu_to_le16(ioread16(io + off)); > + val = ioread16_native(io + off); > > if (copy_to_user(buf, , 2)) > return -EFAULT; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote: VFIO exposes BARs to user space as a byte stream so userspace can read it using pread()/pwrite(). Since this is a byte stream, VFIO should not do byte swapping and simply return values as it gets them from PCI device. Instead, the existing code assumes that byte stream in read/write is little-endian and it fixes endianness for values which it passes to ioreadXX/iowriteXX helpers. This works for little-endian as PCI is little endian and le32_to_cpu/... are stubs. vfio read32: val = cpu_to_le32(ioread32(io + off)); Where the typical x86 case, ioread32 is: #define ioread32(addr) readl(addr) and readl is: __le32_to_cpu(__raw_readl(addr)); So we do canceling byte swaps, which are both nops on x86, and end up returning device endian, which we assume is little endian. vfio write32 is similar: iowrite32(le32_to_cpu(val), io + off); The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel out, so input data is device endian, which is assumed little. This also works for big endian but rather by an accident: it reads 4 bytes from the stream (@val is big endian), converts to CPU format (which should be big endian) as it was little endian (@val becomes actually little endian) and calls iowrite32() which does not do swapping on big endian system. Really? In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around writel(), which seems to use the generic implementation, which does include a cpu_to_le32. I also see other big endian archs like parisc doing cpu_to_le32 on iowrite32, so I don't think this statement is true. I imagine it's probably working for you because the swap cancel. This removes byte swapping and makes use ioread32be/iowrite32be (and 16bit versions) on big-endian systems. The be helpers take native endian values and do swapping at the moment of writing to a PCI register using one of store byte-reversed instructions. So now you want iowrite32() on little endian and iowrite32be() on big endian, the former does a cpu_to_le32 (which is a nop on little endian) and the latter does a cpu_to_be32 (which is a nop on big endian)... should we just be using __raw_writel() on both? There doesn't actually seem to be any change in behavior here, it just eliminates back-to-back byte swaps, which are a nop on x86, but not power, right? Suggested-by: Benjamin Herrenschmidt b...@kernel.crashing.org Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- drivers/vfio/pci/vfio_pci_rdwr.c | 20 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c index 210db24..f363b5a 100644 --- a/drivers/vfio/pci/vfio_pci_rdwr.c +++ b/drivers/vfio/pci/vfio_pci_rdwr.c @@ -21,6 +21,18 @@ #include vfio_pci_private.h +#ifdef __BIG_ENDIAN__ +#define ioread16_native ioread16be +#define ioread32_native ioread32be +#define iowrite16_native iowrite16be +#define iowrite32_native iowrite32be +#else +#define ioread16_native ioread16 +#define ioread32_native ioread32 +#define iowrite16_native iowrite16 +#define iowrite32_native iowrite32 +#endif + /* * Read or write from an __iomem region (MMIO or I/O port) with an excluded * range which is inaccessible. The excluded range drops writes and fills @@ -50,9 +62,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf, if (copy_from_user(val, buf, 4)) return -EFAULT; - iowrite32(le32_to_cpu(val), io + off); + iowrite32_native(val, io + off); } else { - val = cpu_to_le32(ioread32(io + off)); + val = ioread32_native(io + off); if (copy_to_user(buf, val, 4)) return -EFAULT; @@ -66,9 +78,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf, if (copy_from_user(val, buf, 2)) return -EFAULT; - iowrite16(le16_to_cpu(val), io + off); + iowrite16_native(val, io + off); } else { - val = cpu_to_le16(ioread16(io + off)); + val = ioread16_native(io + off); if (copy_to_user(buf, val, 2)) return -EFAULT; -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On 06/19/2014 04:35 AM, Alex Williamson wrote: On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote: VFIO exposes BARs to user space as a byte stream so userspace can read it using pread()/pwrite(). Since this is a byte stream, VFIO should not do byte swapping and simply return values as it gets them from PCI device. Instead, the existing code assumes that byte stream in read/write is little-endian and it fixes endianness for values which it passes to ioreadXX/iowriteXX helpers. This works for little-endian as PCI is little endian and le32_to_cpu/... are stubs. vfio read32: val = cpu_to_le32(ioread32(io + off)); Where the typical x86 case, ioread32 is: #define ioread32(addr) readl(addr) and readl is: __le32_to_cpu(__raw_readl(addr)); So we do canceling byte swaps, which are both nops on x86, and end up returning device endian, which we assume is little endian. vfio write32 is similar: iowrite32(le32_to_cpu(val), io + off); The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel out, so input data is device endian, which is assumed little. This also works for big endian but rather by an accident: it reads 4 bytes from the stream (@val is big endian), converts to CPU format (which should be big endian) as it was little endian (@val becomes actually little endian) and calls iowrite32() which does not do swapping on big endian system. Really? In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around writel(), which seems to use the generic implementation, which does include a cpu_to_le32. Ouch, wrong comment. iowrite32() does swapping. My bad. I also see other big endian archs like parisc doing cpu_to_le32 on iowrite32, so I don't think this statement is true. I imagine it's probably working for you because the swap cancel. This removes byte swapping and makes use ioread32be/iowrite32be (and 16bit versions) on big-endian systems. The be helpers take native endian values and do swapping at the moment of writing to a PCI register using one of store byte-reversed instructions. So now you want iowrite32() on little endian and iowrite32be() on big endian, the former does a cpu_to_le32 (which is a nop on little endian) and the latter does a cpu_to_be32 (which is a nop on big endian)... should we just be using __raw_writel() on both? We can do that too. The beauty of iowrite32be on ppc64 is that it does not swap and write separately, it is implemented via the Store Word Byte-Reverse Indexed X-form single instruction. And some archs (do not know which ones) may add memory barriers in their implementations of ioread/iowrite. __raw_writel is too raw :) There doesn't actually seem to be any change in behavior here, it just eliminates back-to-back byte swaps, which are a nop on x86, but not power, right? Exactly. No dependency for QEMU. Suggested-by: Benjamin Herrenschmidt b...@kernel.crashing.org Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- drivers/vfio/pci/vfio_pci_rdwr.c | 20 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c index 210db24..f363b5a 100644 --- a/drivers/vfio/pci/vfio_pci_rdwr.c +++ b/drivers/vfio/pci/vfio_pci_rdwr.c @@ -21,6 +21,18 @@ #include vfio_pci_private.h +#ifdef __BIG_ENDIAN__ +#define ioread16_native ioread16be +#define ioread32_native ioread32be +#define iowrite16_nativeiowrite16be +#define iowrite32_nativeiowrite32be +#else +#define ioread16_native ioread16 +#define ioread32_native ioread32 +#define iowrite16_nativeiowrite16 +#define iowrite32_nativeiowrite32 +#endif + /* * Read or write from an __iomem region (MMIO or I/O port) with an excluded * range which is inaccessible. The excluded range drops writes and fills @@ -50,9 +62,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf, if (copy_from_user(val, buf, 4)) return -EFAULT; -iowrite32(le32_to_cpu(val), io + off); +iowrite32_native(val, io + off); } else { -val = cpu_to_le32(ioread32(io + off)); +val = ioread32_native(io + off); if (copy_to_user(buf, val, 4)) return -EFAULT; @@ -66,9 +78,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf, if (copy_from_user(val, buf, 2)) return -EFAULT; -iowrite16(le16_to_cpu(val), io + off); +iowrite16_native(val, io + off); } else { -val = cpu_to_le16(ioread16(io + off)); +
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote: On 06/19/2014 04:35 AM, Alex Williamson wrote: On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote: VFIO exposes BARs to user space as a byte stream so userspace can read it using pread()/pwrite(). Since this is a byte stream, VFIO should not do byte swapping and simply return values as it gets them from PCI device. Instead, the existing code assumes that byte stream in read/write is little-endian and it fixes endianness for values which it passes to ioreadXX/iowriteXX helpers. This works for little-endian as PCI is little endian and le32_to_cpu/... are stubs. vfio read32: val = cpu_to_le32(ioread32(io + off)); Where the typical x86 case, ioread32 is: #define ioread32(addr) readl(addr) and readl is: __le32_to_cpu(__raw_readl(addr)); So we do canceling byte swaps, which are both nops on x86, and end up returning device endian, which we assume is little endian. vfio write32 is similar: iowrite32(le32_to_cpu(val), io + off); The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel out, so input data is device endian, which is assumed little. This also works for big endian but rather by an accident: it reads 4 bytes from the stream (@val is big endian), converts to CPU format (which should be big endian) as it was little endian (@val becomes actually little endian) and calls iowrite32() which does not do swapping on big endian system. Really? In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around writel(), which seems to use the generic implementation, which does include a cpu_to_le32. Ouch, wrong comment. iowrite32() does swapping. My bad. I also see other big endian archs like parisc doing cpu_to_le32 on iowrite32, so I don't think this statement is true. I imagine it's probably working for you because the swap cancel. This removes byte swapping and makes use ioread32be/iowrite32be (and 16bit versions) on big-endian systems. The be helpers take native endian values and do swapping at the moment of writing to a PCI register using one of store byte-reversed instructions. So now you want iowrite32() on little endian and iowrite32be() on big endian, the former does a cpu_to_le32 (which is a nop on little endian) and the latter does a cpu_to_be32 (which is a nop on big endian)... should we just be using __raw_writel() on both? We can do that too. The beauty of iowrite32be on ppc64 is that it does not swap and write separately, it is implemented via the Store Word Byte-Reverse Indexed X-form single instruction. And some archs (do not know which ones) may add memory barriers in their implementations of ioread/iowrite. __raw_writel is too raw :) There doesn't actually seem to be any change in behavior here, it just eliminates back-to-back byte swaps, which are a nop on x86, but not power, right? Exactly. No dependency for QEMU. How about that: === VFIO exposes BARs to user space as a byte stream so userspace can read it using pread()/pwrite(). Since this is a byte stream, VFIO should not do byte swapping and simply return values as it gets them from PCI device. Instead, the existing code assumes that byte stream in read/write is little-endian and it fixes endianness for values which it passes to ioreadXX/iowriteXX helpers in native format. The IO helpers do swapping again. Since both byte swaps are nops on little-endian host, this works. This also works for big endian but rather by an accident: it reads 4 bytes from the stream (@val is big endian), converts to CPU format (which should be big endian) as it was little endian (and @val becomes actually little endian) and calls iowrite32() which does swapping on big endian system again. So byte swap gets cancelled, __raw_writel() receives a native value and then *(volatile unsigned int __force *)PCI_FIX_ADDR(addr) = v; just does the right thing. This removes byte swaps and makes use of ioread32be/iowrite32be (and 16bit versions) which do explicit byte swapping at the moment of write to a PCI register. PPC64 uses a special Store Word Byte-Reverse Indexed X-form instruction which does swap and store. === any better? Suggested-by: Benjamin Herrenschmidt b...@kernel.crashing.org Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- drivers/vfio/pci/vfio_pci_rdwr.c | 20 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c index 210db24..f363b5a 100644 --- a/drivers/vfio/pci/vfio_pci_rdwr.c +++ b/drivers/vfio/pci/vfio_pci_rdwr.c @@ -21,6 +21,18 @@ #include vfio_pci_private.h +#ifdef __BIG_ENDIAN__ +#define ioread16_nativeioread16be +#define ioread32_nativeioread32be +#define iowrite16_native iowrite16be +#define iowrite32_native iowrite32be +#else +#define ioread16_nativeioread16 +#define ioread32_native
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote: On 06/19/2014 04:35 AM, Alex Williamson wrote: On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote: VFIO exposes BARs to user space as a byte stream so userspace can read it using pread()/pwrite(). Since this is a byte stream, VFIO should not do byte swapping and simply return values as it gets them from PCI device. Instead, the existing code assumes that byte stream in read/write is little-endian and it fixes endianness for values which it passes to ioreadXX/iowriteXX helpers. This works for little-endian as PCI is little endian and le32_to_cpu/... are stubs. vfio read32: val = cpu_to_le32(ioread32(io + off)); Where the typical x86 case, ioread32 is: #define ioread32(addr) readl(addr) and readl is: __le32_to_cpu(__raw_readl(addr)); So we do canceling byte swaps, which are both nops on x86, and end up returning device endian, which we assume is little endian. vfio write32 is similar: iowrite32(le32_to_cpu(val), io + off); The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel out, so input data is device endian, which is assumed little. This also works for big endian but rather by an accident: it reads 4 bytes from the stream (@val is big endian), converts to CPU format (which should be big endian) as it was little endian (@val becomes actually little endian) and calls iowrite32() which does not do swapping on big endian system. Really? In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around writel(), which seems to use the generic implementation, which does include a cpu_to_le32. Ouch, wrong comment. iowrite32() does swapping. My bad. I also see other big endian archs like parisc doing cpu_to_le32 on iowrite32, so I don't think this statement is true. I imagine it's probably working for you because the swap cancel. This removes byte swapping and makes use ioread32be/iowrite32be (and 16bit versions) on big-endian systems. The be helpers take native endian values and do swapping at the moment of writing to a PCI register using one of store byte-reversed instructions. So now you want iowrite32() on little endian and iowrite32be() on big endian, the former does a cpu_to_le32 (which is a nop on little endian) and the latter does a cpu_to_be32 (which is a nop on big endian)... should we just be using __raw_writel() on both? We can do that too. The beauty of iowrite32be on ppc64 is that it does not swap and write separately, it is implemented via the Store Word Byte-Reverse Indexed X-form single instruction. And some archs (do not know which ones) may add memory barriers in their implementations of ioread/iowrite. __raw_writel is too raw :) There doesn't actually seem to be any change in behavior here, it just eliminates back-to-back byte swaps, which are a nop on x86, but not power, right? Exactly. No dependency for QEMU. How about that: === VFIO exposes BARs to user space as a byte stream so userspace can read it using pread()/pwrite(). Since this is a byte stream, VFIO should not do byte swapping and simply return values as it gets them from PCI device. Instead, the existing code assumes that byte stream in read/write is little-endian and it fixes endianness for values which it passes to ioreadXX/iowriteXX helpers in native format. The IO helpers do swapping again. Since both byte swaps are nops on little-endian host, this works. This also works for big endian but rather by an accident: it reads 4 bytes from the stream (@val is big endian), converts to CPU format (which should be big endian) as it was little endian (and @val becomes actually little endian) and calls iowrite32() which does swapping on big endian system again. So byte swap gets cancelled, __raw_writel() receives a native value and then *(volatile unsigned int __force *)PCI_FIX_ADDR(addr) = v; just does the right thing. This removes byte swaps and makes use of ioread32be/iowrite32be (and 16bit versions) which do explicit byte swapping at the moment of write to a PCI register. PPC64 uses a special Store Word Byte-Reverse Indexed X-form instruction which does swap and store. === any better? Suggested-by: Benjamin Herrenschmidt b...@kernel.crashing.org Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- drivers/vfio/pci/vfio_pci_rdwr.c | 20 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c index 210db24..f363b5a 100644 --- a/drivers/vfio/pci/vfio_pci_rdwr.c +++ b/drivers/vfio/pci/vfio_pci_rdwr.c @@ -21,6 +21,18 @@ #include vfio_pci_private.h +#ifdef __BIG_ENDIAN__ +#define ioread16_nativeioread16be +#define ioread32_nativeioread32be +#define iowrite16_native iowrite16be +#define iowrite32_native iowrite32be +#else +#define ioread16_nativeioread16 +#define ioread32_native
Re: [PATCH] vfio: Fix endianness handling for emulated BARs
On 06/19/2014 11:50 AM, Alexey Kardashevskiy wrote: On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote: On 06/19/2014 04:35 AM, Alex Williamson wrote: On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote: VFIO exposes BARs to user space as a byte stream so userspace can read it using pread()/pwrite(). Since this is a byte stream, VFIO should not do byte swapping and simply return values as it gets them from PCI device. Instead, the existing code assumes that byte stream in read/write is little-endian and it fixes endianness for values which it passes to ioreadXX/iowriteXX helpers. This works for little-endian as PCI is little endian and le32_to_cpu/... are stubs. vfio read32: val = cpu_to_le32(ioread32(io + off)); Where the typical x86 case, ioread32 is: #define ioread32(addr) readl(addr) and readl is: __le32_to_cpu(__raw_readl(addr)); So we do canceling byte swaps, which are both nops on x86, and end up returning device endian, which we assume is little endian. vfio write32 is similar: iowrite32(le32_to_cpu(val), io + off); The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel out, so input data is device endian, which is assumed little. This also works for big endian but rather by an accident: it reads 4 bytes from the stream (@val is big endian), converts to CPU format (which should be big endian) as it was little endian (@val becomes actually little endian) and calls iowrite32() which does not do swapping on big endian system. Really? In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around writel(), which seems to use the generic implementation, which does include a cpu_to_le32. Ouch, wrong comment. iowrite32() does swapping. My bad. I also see other big endian archs like parisc doing cpu_to_le32 on iowrite32, so I don't think this statement is true. I imagine it's probably working for you because the swap cancel. This removes byte swapping and makes use ioread32be/iowrite32be (and 16bit versions) on big-endian systems. The be helpers take native endian values and do swapping at the moment of writing to a PCI register using one of store byte-reversed instructions. So now you want iowrite32() on little endian and iowrite32be() on big endian, the former does a cpu_to_le32 (which is a nop on little endian) and the latter does a cpu_to_be32 (which is a nop on big endian)... should we just be using __raw_writel() on both? We can do that too. The beauty of iowrite32be on ppc64 is that it does not swap and write separately, it is implemented via the Store Word Byte-Reverse Indexed X-form single instruction. And some archs (do not know which ones) may add memory barriers in their implementations of ioread/iowrite. __raw_writel is too raw :) There doesn't actually seem to be any change in behavior here, it just eliminates back-to-back byte swaps, which are a nop on x86, but not power, right? Exactly. No dependency for QEMU. How about that: === VFIO exposes BARs to user space as a byte stream so userspace can read it using pread()/pwrite(). Since this is a byte stream, VFIO should not do byte swapping and simply return values as it gets them from PCI device. Instead, the existing code assumes that byte stream in read/write is little-endian and it fixes endianness for values which it passes to ioreadXX/iowriteXX helpers in native format. The IO helpers do swapping again. Since both byte swaps are nops on little-endian host, this works. This also works for big endian but rather by an accident: it reads 4 bytes from the stream (@val is big endian), converts to CPU format (which should be big endian) as it was little endian (and @val becomes actually little endian) and calls iowrite32() which does swapping on big endian system again. So byte swap gets cancelled, __raw_writel() receives a native value and then *(volatile unsigned int __force *)PCI_FIX_ADDR(addr) = v; just does the right thing. I am wrong here, sorry. This is what happens when you watch soccer between 2am and 4am :) This removes byte swaps and makes use of ioread32be/iowrite32be (and 16bit versions) which do explicit byte swapping at the moment of write to a PCI register. PPC64 uses a special Store Word Byte-Reverse Indexed X-form instruction which does swap and store. No swapping is done here if we use ioread32be as it calls in_be32 and that animal does lwz which is simple load from memory. So @val (16/32 bit variable on stack) will have different values on LE and BE but since we do not handle it the host and just memcpy it to the buffer, nothing breaks here. So it should be like this: === VFIO exposes BARs to user space as a byte stream so userspace can read it using pread()/pwrite(). Since this is a byte stream, VFIO should not do byte swapping and simply return values as it gets them from PCI device and copy_to_user will save bytes in the correct same true for writes. Instead, the
RE: [PATCH] vfio: Fix endianness handling for emulated BARs
-Original Message- From: Linuxppc-dev [mailto:linuxppc-dev- bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Alexey Kardashevskiy Sent: Thursday, June 19, 2014 9:18 AM To: Alex Williamson Cc: k...@vger.kernel.org; Nikunj A Dadhania; linux-kernel@vger.kernel.org; Alexander Graf; linuxppc-...@lists.ozlabs.org Subject: Re: [PATCH] vfio: Fix endianness handling for emulated BARs On 06/19/2014 11:50 AM, Alexey Kardashevskiy wrote: On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote: On 06/19/2014 04:35 AM, Alex Williamson wrote: On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote: VFIO exposes BARs to user space as a byte stream so userspace can read it using pread()/pwrite(). Since this is a byte stream, VFIO should not do byte swapping and simply return values as it gets them from PCI device. Instead, the existing code assumes that byte stream in read/write is little-endian and it fixes endianness for values which it passes to ioreadXX/iowriteXX helpers. This works for little-endian as PCI is little endian and le32_to_cpu/... are stubs. vfio read32: val = cpu_to_le32(ioread32(io + off)); Where the typical x86 case, ioread32 is: #define ioread32(addr) readl(addr) and readl is: __le32_to_cpu(__raw_readl(addr)); So we do canceling byte swaps, which are both nops on x86, and end up returning device endian, which we assume is little endian. vfio write32 is similar: iowrite32(le32_to_cpu(val), io + off); The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel out, so input data is device endian, which is assumed little. This also works for big endian but rather by an accident: it reads 4 bytes from the stream (@val is big endian), converts to CPU format (which should be big endian) as it was little endian (@val becomes actually little endian) and calls iowrite32() which does not do swapping on big endian system. Really? In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around writel(), which seems to use the generic implementation, which does include a cpu_to_le32. Ouch, wrong comment. iowrite32() does swapping. My bad. I also see other big endian archs like parisc doing cpu_to_le32 on iowrite32, so I don't think this statement is true. I imagine it's probably working for you because the swap cancel. This removes byte swapping and makes use ioread32be/iowrite32be (and 16bit versions) on big-endian systems. The be helpers take native endian values and do swapping at the moment of writing to a PCI register using one of store byte-reversed instructions. So now you want iowrite32() on little endian and iowrite32be() on big endian, the former does a cpu_to_le32 (which is a nop on little endian) and the latter does a cpu_to_be32 (which is a nop on big endian)... should we just be using __raw_writel() on both? We can do that too. The beauty of iowrite32be on ppc64 is that it does not swap and write separately, it is implemented via the Store Word Byte-Reverse Indexed X-form single instruction. And some archs (do not know which ones) may add memory barriers in their implementations of ioread/iowrite. __raw_writel is too raw :) There doesn't actually seem to be any change in behavior here, it just eliminates back-to-back byte swaps, which are a nop on x86, but not power, right? Exactly. No dependency for QEMU. How about that: === VFIO exposes BARs to user space as a byte stream so userspace can read it using pread()/pwrite(). Since this is a byte stream, VFIO should not do byte swapping and simply return values as it gets them from PCI device. Instead, the existing code assumes that byte stream in read/write is little-endian and it fixes endianness for values which it passes to ioreadXX/iowriteXX helpers in native format. The IO helpers do swapping again. Since both byte swaps are nops on little-endian host, this works. This also works for big endian but rather by an accident: it reads 4 bytes from the stream (@val is big endian), converts to CPU format (which should be big endian) as it was little endian (and @val becomes actually little endian) and calls iowrite32() which does swapping on big endian system again. So byte swap gets cancelled, __raw_writel() receives a native value and then *(volatile unsigned int __force *)PCI_FIX_ADDR(addr) = v; just does the right thing. I am wrong here, sorry. This is what happens when you watch soccer between 2am and 4am :) This removes byte swaps and makes use of ioread32be/iowrite32be (and 16bit versions) which do explicit byte swapping at the moment of write to a PCI register. PPC64 uses a special Store Word Byte-Reverse Indexed X-form instruction which does swap and store. No swapping is done here if we use ioread32be as it calls in_be32 and that animal does lwz which