➢ You only need to byte copy 7 bytes max, so this seems like a bug in your CopyMem routine. You only have to align the first few and last few bytes, and then do a fast CopyMem in the middle?
Unfortunately, no. Let me illustrate it with an example:

  Source buffer: 0x10000000 (aligned, uncached)
  Dest buffer:   0x02100012 (unaligned, cached)

Since the buffers do not share the same (mis)alignment there is no opportunity for a fast CopyMem, and we must resort to a byte copy. The CopyMem I'm using is the ArmPkg BaseMemoryLibStm -- with this combination of alignments it bypasses all of the fast paths and falls back to a ldrb/strb loop. I think what you're describing (go slow, then fast) will only work if the buffers share the same misalignment (say, 0x2 and 0x2).

Eugene

From: Andrew Fish [mailto:af...@apple.com]
Sent: Thursday, August 22, 2013 7:55 AM
To: Cohen, Eugene
Cc: edk2-devel@lists.sourceforge.net
Subject: Re: [edk2] MNP PaddingSize Question

On Aug 22, 2013, at 6:50 AM, "Cohen, Eugene" <eug...@hp.com> wrote:

➢ I don't quite understand how this works. If the CommonBuffer becomes the user data, where does Unmap() happen? It seems like this driver would leak uncached memory? CommonBuffer means you are going to keep doing DMA into the buffer, not that you have passed it off to the consumer?

There is a memory copy from the DMA common buffer to the user buffer; they're not using the same buffer. The data is copied by the driver explicitly (there's no Map/Unmap since common buffers do not require it). The copy goes from the uncached/aligned common buffer (the network rx packet DMA destination) to the cached/unaligned user buffer (MNP). Because the user buffer is not 4-byte aligned we downshift to a very slow byte copy.

You only need to byte copy 7 bytes max, so this seems like a bug in your CopyMem routine. You only have to align the first few and last few bytes, and then do a fast CopyMem in the middle?
From: Andrew Fish [mailto:af...@apple.com]
Sent: Thursday, August 22, 2013 7:46 AM
To: edk2-devel@lists.sourceforge.net
Cc: Cohen, Eugene
Subject: Re: [edk2] MNP PaddingSize Question

On Aug 22, 2013, at 6:41 AM, Andrew Fish <af...@apple.com> wrote:

On Aug 22, 2013, at 5:58 AM, "Cohen, Eugene" <eug...@hp.com> wrote:

Thanks for the responses Siyuan and Andrew.

I think I understand your explanation -- to get the payload aligned properly so higher layers can get the best performance, and not necessarily to align the start of the frame itself. Do you have some data you can share on how much improvement aligning the payload gives? I would assume network performance in UEFI would be limited more by the latency of timer-tick polling (since we don't get real interrupts) than by payload alignment.

DMA double-buffering is not happening. The UEFI network driver we're using (from one of the big networking vendors) uses common buffer mappings instead. Because of the maturity of the network driver I don't think it's reasonable to ask the vendor to change their driver's DMA scheme to use BusMasterRead and BusMasterWrite instead of common buffers (it could even be impossible because of HW limitations).

I don't quite understand how this works. If the CommonBuffer becomes the user data, where does Unmap() happen? It seems like this driver would leak uncached memory? CommonBuffer means you are going to keep doing DMA into the buffer, not that you have passed it off to the consumer? And on a system with an IOMMU this would be a potential security issue.

Thanks,

Andrew Fish

For our systems which do not support cache-coherent DMA (ARM) the common buffers must be uncached. The common buffers themselves are accessed in an aligned manner, but the caller's (cached) buffer is unaligned for the reasons we're discussing. So this forces a CopyMem from an aligned uncached location to an unaligned cached location.
The memory copy code must downshift to a byte copy because of this misalignment, and we get horrible performance (byte accesses to uncached memory regions are the worst possible workload). I experimented with changing the padding size from 6 to 8 and performance improved significantly, since the CopyMem could then operate efficiently. So it looks like we have two competing optimizations.

As you can imagine, on my platform the slowdown from the uncached byte copy is far worse than the misaligned accesses to the cached IP protocol fields. Is there some way we can address both concerns? Here are some options I can think of:

1. Add some parameter (PCD value) to configure MNP to optimize for either an aligned payload or an aligned frame.

2. Add the option to double-buffer, so the first CopyMem (from uncached to cached) is frame-aligned and a second CopyMem then goes to a buffer that is payload-aligned.
   a. This is really no different than if BusMasterRead/BusMasterWrite double-buffering were used; it would just need to be done somewhere above the driver, maybe in the SNP driver on top of UNDI. Unfortunately there is no DMA Unmap() call in this common buffer case that we could use to add the additional CopyMem, so it would have to be explicit.

3. Analyze the performance benefit of the aligned payload and, if it's not significant enough, abandon that approach and just use frame-aligned buffers (we need data).

4. Extend some protocol interfaces so that higher layers can ask lower layers what the required alignment is (like IoAlign in BLOCK_IO). On our platform we would say that frame alignment on 4 bytes is required; perhaps on X64 it would be payload alignment on 4 bytes instead.

1, 3, and 4 are the best-performing options since they avoid the need for an additional CopyMem, so those would be my preference. #1 has the downside that we're tuning for a particular DMA and driver scheme with a PCD value for a hardware-independent service (not the greatest architectural approach).
If we decide to pursue #4 in the long term, it would still be helpful to me to do #1 in the short term. Do you have other options or preferences for which approach is used?

Eugene

From: Andrew Fish [mailto:af...@apple.com]
Sent: Thursday, August 22, 2013 1:38 AM
To: edk2-devel@lists.sourceforge.net
Cc: Cohen, Eugene; edk2-devel@lists.sourceforge.net
Subject: Re: [edk2] MNP PaddingSize Question

Sent from my iPhone

On Aug 22, 2013, at 12:15 AM, "Fu, Siyuan" <siyuan...@intel.com> wrote:

Hi, Eugene

The PaddingSize is there to make the packet data (excluding the media header) 4-byte aligned when we receive a packet. When the MNP driver calls the Snp.Receive() interface, both the media header and the data are placed into *Buffer*. Using an IP packet over Ethernet as an example, the media header is 14 bytes long (2 * 6-byte MAC addresses + 2-byte protocol type), with the IP4 header immediately following the media header. The EFI network stack is designed to minimize the number of memory copies, so most of the upper-layer drivers operate on this buffer directly. Thus we have 2 choices:

(1) If the *Buffer* passed to Snp.Receive() is 4-byte aligned, the packet data will start at a non-dword-aligned address. Since most network protocols are designed with alignment in mind, the upper-layer protocol data items (IP, UDP, TCP) will also start at non-dword-aligned addresses. I think parsing this data at unaligned addresses will also have a performance cost.

(2) If we make the packet data aligned, the *Buffer* is unaligned, which brings the performance issue you described. Fortunately this unaligned memory copy happens only once per packet (only in the SNP or UNDI driver). I think that's why the MNP driver tries to align a later part of the Ethernet packet.

I have tested PXE boot and TCP download on my side and do not see clear differences between the two (maybe because my UNDI driver does not use DMA?).

ARM platforms have to do DMA into uncached buffers.
This is why it is so important to follow the EFI DMA rules. Eugene, have you tried double-buffering the data into a cached buffer? I wonder if you have a lot of small misaligned accesses to uncached memory, and a single copy to a cached buffer would be less overhead. Or maybe you could enable caching on the buffer after the DMA completes?

Hope my explanation is helpful.

Fu, Siyuan

From: Cohen, Eugene [mailto:eug...@hp.com]
Sent: Thursday, August 22, 2013 11:46 AM
To: edk2-devel@lists.sourceforge.net
Subject: Re: [edk2] MNP PaddingSize Question

Ruth,

The performance impact is related to unaligned copies to uncached buffers, so I suppose any machine that must use uncached buffers for DMA coherency would see the same slowdown, although I have not had a reason to measure this on other platforms.

The code seems strange since, for a normal driver (UNDI, SNP), the receive buffer address passed down is no longer 4-byte aligned. Apparently this code is trying to align a later part of the Ethernet packet (the payload, not the header), but I can't think of a reason for this.

Eugene

From: Li, Ruth [mailto:ruth...@intel.com]
Sent: Wednesday, August 21, 2013 7:55 PM
To: edk2-devel@lists.sourceforge.net
Subject: Re: [edk2] MNP PaddingSize Question

Hi Eugene,

The below piece of code has been there for a long time; we need some time to evaluate it and assess the possible impact. BTW, may I ask whether you see the performance impact only on your machine, or generally on all machines?
Thanks,
Ruth

From: Cohen, Eugene [mailto:eug...@hp.com]
Sent: Tuesday, August 20, 2013 3:56 AM
To: edk2-devel@lists.sourceforge.net
Subject: [edk2] MNP PaddingSize Question

I've been tracking down a performance issue and have isolated it to this piece of MNP initialization code:

  //
  // Make sure the protocol headers immediately following the media header
  // 4-byte aligned, and also preserve additional space for VLAN tag
  //
  MnpDeviceData->PaddingSize = ((4 - SnpMode->MediaHeaderSize) & 0x3) + NET_VLAN_TAG_LEN;

On my system this comes up with '6' (MediaHeaderSize = 0xE), which is causing performance issues since some of the memory copies to the resulting non-dword-aligned addresses are slower. As an experiment I tried bumping this number to '8' and things worked well.

This value is used later when NET_BUFs are being allocated:

  if (MnpDeviceData->PaddingSize > 0) {
    //
    // Pad padding bytes before the media header
    //
    NetbufAllocSpace (Nbuf, MnpDeviceData->PaddingSize, NET_BUF_TAIL);
    NetbufTrim (Nbuf, MnpDeviceData->PaddingSize, NET_BUF_HEAD);
  }

Can someone explain the purpose of PaddingSize and how it affects the later processing of packets? Is this number a minimum value, and is it OK for it to be larger?

Thanks,
Eugene

------------------------------------------------------------------------------
Introducing Performance Central, a new site from SourceForge and AppDynamics. Performance Central is your source for news, insights, analysis and resources for efficient Application Performance Management. Visit us today!
http://pubads.g.doubleclick.net/gampad/clk?id=48897511&iu=/4140/ostg.clktrk
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/edk2-devel