RE: [PATCH v3 17/35] x86/pci/xen: Use msi_msg shadow structs

2020-10-25 Thread David Laight
From: David Woodhouse
> Sent: 25 October 2020 10:26
> To: David Laight ; x...@kernel.org
> 
> On Sun, 2020-10-25 at 09:49 +, David Laight wrote:
> > Just looking at a random one of these patches...
> >
> > Does the compiler manage to optimise that reasonably?
> > Or does it generate a lot of shifts and masks as each
> > bitfield is set?
> >
> > The code generation for bitfields is often a lot worse
> > than that for |= setting bits in a word.
> 
> Indeed, it appears to be utterly appalling. That was one of my
> motivations for doing it with masks and shifts in the first place.

I thought it would be.

I'm not even sure using bitfields to map hardware registers
makes the code any more readable (or foolproof).

I suspect using #define constants and explicit |= and &= ~
is actually best - provided the names are descriptive enough.

If you set all the fields, the compiler will merge the values
and do a single write - provided nothing you read might
alias the target.
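To make the comparison concrete, here is a toy sketch of the two styles
(the register layout and all names below are invented for illustration,
not the real MSI format):

```c
#include <stdint.h>

/* Hypothetical 32-bit register layout, invented for illustration. */
#define REG_VECTOR_MASK		0x000000FFu	/* bits 0-7  */
#define REG_DELIVERY_SHIFT	8
#define REG_DELIVERY_MASK	0x00000700u	/* bits 8-10 */
#define REG_LEVEL_ASSERT	0x00004000u	/* bit 14    */

/* Mask/shift style: one expression, which the compiler folds into a
 * single OR chain and a single store. */
static inline uint32_t compose_masks(uint8_t vector, uint8_t delivery)
{
	return ((uint32_t)vector & REG_VECTOR_MASK) |
	       (((uint32_t)delivery << REG_DELIVERY_SHIFT) & REG_DELIVERY_MASK) |
	       REG_LEVEL_ASSERT;
}

/* Bitfield style: each assignment is logically a read-modify-write of
 * part of the word; whether the compiler merges them is up to it. */
struct reg_bits {
	uint32_t vector		: 8;
	uint32_t delivery	: 3;
	uint32_t reserved	: 3;
	uint32_t level_assert	: 1;
	uint32_t rest		: 17;
};

static inline uint32_t compose_bits(uint8_t vector, uint8_t delivery)
{
	/* Bitfield layout is implementation-defined; this matches
	 * little-endian gcc/clang (LSB allocated first). */
	union { struct reg_bits b; uint32_t raw; } u = { .raw = 0 };
	u.b.vector = vector;
	u.b.delivery = delivery;
	u.b.level_assert = 1;
	return u.raw;
}
```

Both produce the same value; the point being argued in the thread is that
the first form reliably compiles to one OR-and-store, while the second
depends on the compiler noticing that the partial writes can be merged.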

The only way to get strongly typed integers is to cast them
to a structure pointer type.
Together with a static inline function to remove the casts
and | the values together, it might make things foolproof.
But I've never tried it.
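A minimal sketch of that untried scheme (every name here is invented for
illustration, not kernel code):

```c
#include <stdint.h>

/* Each field gets its own dummy struct type; a value travels around as
 * a pointer of that type, so passing a vector where a delivery mode is
 * expected becomes a compile-time type error.  The pointers are never
 * dereferenced - the integer value lives in the pointer's bits. */
struct msi_vector { char unused; };
struct msi_dmode  { char unused; };

#define MSI_VECTOR(v)	((struct msi_vector *)(uintptr_t)((uint32_t)(v)))
#define MSI_DMODE(v)	((struct msi_dmode *)(uintptr_t)((uint32_t)(v) << 8))

/* The static inline helper removes the casts and ORs the values together. */
static inline uint32_t msi_compose(struct msi_vector *vec,
				   struct msi_dmode *dmode)
{
	return (uint32_t)(uintptr_t)vec | (uint32_t)(uintptr_t)dmode;
}
```

With this, msi_compose(MSI_VECTOR(0x31), MSI_DMODE(3)) is accepted, while
swapping the two arguments draws a pointer-type error - which is the
foolproofing being described.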

ISTR once doing something like that with error codes, but
it never made it into source control.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 17/35] x86/pci/xen: Use msi_msg shadow structs

2020-10-25 Thread David Woodhouse
On Sun, 2020-10-25 at 09:49 +, David Laight wrote:
> Just looking at a random one of these patches...
> 
> Does the compiler manage to optimise that reasonably?
> Or does it generate a lot of shifts and masks as each
> bitfield is set?
> 
> The code generation for bitfields is often a lot worse
> than that for |= setting bits in a word.

Indeed, it appears to be utterly appalling. That was one of my
motivations for doing it with masks and shifts in the first place.

Because in ioapic_setup_msg_from_msi(), for example, these two are
consecutive in both source and destination:

entry->vector   = msg.arch_data.vector;
entry->delivery_mode= msg.arch_data.delivery_mode;

And so are those:

entry->ir_format= msg.arch_addr_lo.dmar_format;
entry->ir_index_0_14= msg.arch_addr_lo.dmar_index_0_14;

But GCC 7.5.0 here is doing them all separately...

11e0 :
{
11e0:	53                   	push   %rbx
11e1:	48 89 f3             	mov    %rsi,%rbx
11e4:	48 83 ec 18          	sub    $0x18,%rsp
	irq_chip_compose_msi_msg(irq_data, &msg);
11e8:	48 89 e6             	mov    %rsp,%rsi
{
11eb:	65 48 8b 04 25 28 00	mov    %gs:0x28,%rax
11f2:	00 00
11f4:	48 89 44 24 10       	mov    %rax,0x10(%rsp)
11f9:	31 c0                	xor    %eax,%eax
	irq_chip_compose_msi_msg(irq_data, &msg);
11fb:	e8 00 00 00 00      	callq  1200

	entry->vector			= msg.arch_data.vector;
1200:	0f b6 44 24 08       	movzbl 0x8(%rsp),%eax
	entry->delivery_mode		= msg.arch_data.delivery_mode;
1205:	0f b6 53 01          	movzbl 0x1(%rbx),%edx
1209:	0f b6 74 24 09       	movzbl 0x9(%rsp),%esi
	entry->vector			= msg.arch_data.vector;
120e:	88 03                	mov    %al,(%rbx)
	entry->dest_mode_logical	= msg.arch_addr_lo.dest_mode_logical;
1210:	0f b6 04 24          	movzbl (%rsp),%eax
	entry->delivery_mode		= msg.arch_data.delivery_mode;
1214:	83 e2 f0             	and    $0xfffffff0,%edx
1217:	83 e6 07             	and    $0x7,%esi
	entry->dest_mode_logical	= msg.arch_addr_lo.dest_mode_logical;
121a:	09 f2                	or     %esi,%edx
121c:	8d 0c 00             	lea    (%rax,%rax,1),%ecx
	entry->ir_format		= msg.arch_addr_lo.dmar_format;
121f:	c0 e8 04             	shr    $0x4,%al
	entry->dest_mode_logical	= msg.arch_addr_lo.dest_mode_logical;
1222:	83 e1 08             	and    $0x8,%ecx
1225:	09 ca                	or     %ecx,%edx
1227:	88 53 01             	mov    %dl,0x1(%rbx)
	entry->ir_format		= msg.arch_addr_lo.dmar_format;
122a:	89 c2                	mov    %eax,%edx
	entry->ir_index_0_14		= msg.arch_addr_lo.dmar_index_0_14;
122c:	8b 04 24             	mov    (%rsp),%eax
122f:	83 e2 01             	and    $0x1,%edx
1232:	c1 e8 05             	shr    $0x5,%eax
1235:	66 25 ff 7f          	and    $0x7fff,%ax
1239:	8d 0c 00             	lea    (%rax,%rax,1),%ecx
123c:	66 c1 e8 07          	shr    $0x7,%ax
1240:	88 43 07             	mov    %al,0x7(%rbx)
1243:	09 ca                	or     %ecx,%edx
}
1245:	48 8b 44 24 10       	mov    0x10(%rsp),%rax
124a:	65 48 33 04 25 28 00	xor    %gs:0x28,%rax
1251:	00 00
	entry->ir_index_0_14		= msg.arch_addr_lo.dmar_index_0_14;
1253:	88 53 06             	mov    %dl,0x6(%rbx)
}
1256:	75 06                	jne    125e

1258:	48 83 c4 18          	add    $0x18,%rsp
125c:	5b                   	pop    %rbx
125d:	c3                   	retq
125e:	e8 00 00 00 00      	callq  1263

1263:	0f 1f 00             	nopl   (%rax)
1266:	66 2e 0f 1f 84 00 00	nopw   %cs:0x0(%rax,%rax,1)
126d:	00 00 00


Still, this isn't really a fast path, so I'm content to file the GCC PR
for the really *obvious* misses and let it take its course.



RE: [PATCH v3 17/35] x86/pci/xen: Use msi_msg shadow structs

2020-10-25 Thread David Laight
From: David Woodhouse
> Sent: 24 October 2020 22:35
> 
> From: Thomas Gleixner 
> 
> Use the msi_msg shadow structs and compose the message with named bitfields
> instead of the unreadable macro maze.
> 
> Signed-off-by: Thomas Gleixner 
> Signed-off-by: David Woodhouse 
> ---
>  arch/x86/pci/xen.c | 26 +++---
>  1 file changed, 11 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
> index c552cd2d0632..3d41a09c2c14 100644
> --- a/arch/x86/pci/xen.c
> +++ b/arch/x86/pci/xen.c
> @@ -152,7 +152,6 @@ static int acpi_register_gsi_xen(struct device *dev, u32 gsi,
> 
>  #if defined(CONFIG_PCI_MSI)
>  #include 
> -#include 
> 
>  struct xen_pci_frontend_ops *xen_pci_frontend;
>  EXPORT_SYMBOL_GPL(xen_pci_frontend);
> @@ -210,23 +209,20 @@ static int xen_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
>   return ret;
>  }
> 
> -#define XEN_PIRQ_MSI_DATA  (MSI_DATA_TRIGGER_EDGE | \
> - MSI_DATA_LEVEL_ASSERT | (3 << 8) | MSI_DATA_VECTOR(0))
> -
>  static void xen_msi_compose_msg(struct pci_dev *pdev, unsigned int pirq,
>   struct msi_msg *msg)
>  {
> - /* We set vector == 0 to tell the hypervisor we don't care about it,
> -  * but we want a pirq setup instead.
> -  * We use the dest_id field to pass the pirq that we want. */
> - msg->address_hi = MSI_ADDR_BASE_HI | MSI_ADDR_EXT_DEST_ID(pirq);
> - msg->address_lo =
> - MSI_ADDR_BASE_LO |
> - MSI_ADDR_DEST_MODE_PHYSICAL |
> - MSI_ADDR_REDIRECTION_CPU |
> - MSI_ADDR_DEST_ID(pirq);
> -
> - msg->data = XEN_PIRQ_MSI_DATA;
> + /*
> +  * We set vector == 0 to tell the hypervisor we don't care about
> +  * it, but we want a pirq setup instead.  We use the dest_id fields
> +  * to pass the pirq that we want.
> +  */
> + memset(msg, 0, sizeof(*msg));
> + msg->address_hi = X86_MSI_BASE_ADDRESS_HIGH;
> + msg->arch_addr_hi.destid_8_31 = pirq >> 8;
> + msg->arch_addr_lo.destid_0_7 = pirq & 0xFF;
> + msg->arch_addr_lo.base_address = X86_MSI_BASE_ADDRESS_LOW;
> + msg->arch_data.delivery_mode = APIC_DELIVERY_MODE_EXTINT;
>  }

Just looking at a random one of these patches...

Does the compiler manage to optimise that reasonably?
Or does it generate a lot of shifts and masks as each
bitfield is set?

The code generation for bitfields is often a lot worse
than that for |= setting bits in a word.

David
