Hi,

I've run into some very peculiar behavior on some new AMD Rome based systems 
(Dell R7515 and R6525).  I've been back and forth with Dell over it for some 
time, and at this point I'm pretty well convinced that it isn't their 
problem.

We've used ipxe for years, and embed a script that makes an HTTP call to a 
server which, based on the serial number of the booting system, returns boot 
instructions.  Normally this is to just boot from local disk, but sometimes 
(rebuilds, etc.) it is to fetch a boot script from a second HTTP server, which 
contains boot params and the locations from which to fetch (HTTP again) the 
kernel and initrd.  This has all worked well for quite some time, including 
across our transition from legacy BIOS boot to EFI a year or two back.

These two systems both seem to have problems with the HTTP requests though.  
Upon setting up some port spans and captures we saw a couple of interesting 
things:

- Initial BOOTROM dhcp/dns/tftp was completely normal
- As soon as ipxe took over, we started seeing trailing, repeating (i.e. the 
same sequence, but starting in different places on different packets) "garbage" 
tacked onto the end of outgoing packets, both UDP (DNS) and TCP (HTTP)
- Nearby (within our lab setup) switches seemed to pass these odd packets ok 
but as soon as we hit the WAN routers, they were dropped.
- Failures consistently happened when we went to a "far" (across the WAN) http 
server to retrieve boot instructions.  The SYN succeeds, but the ACK + GET 
with trailing garbage in the packet never makes it through the WAN router.  The 
http server keeps trying to resend its part of the handshake, which arrives, 
and ipxe dutifully responds, but that too gets dropped.
- Booting these same systems against the same servers and same ipxe revision 
but in legacy BIOS mode results in no trailing garbage and a successful boot
- Booting these same systems in UEFI mode but using Fedora/CentOS' patched grub 
as the 'PXE' image (which uses the UEFI IP stack) works fine, with no trailing 
garbage
- Whether ipxe attempts to use the onboard BCM5720 gige ports or an add-on 
Mellanox CX5 25Gb adapter, the results are the same
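
One way to quantify the "trailing garbage" in a capture is to compare each 
frame's length against the IPv4 total-length field.  The following is a minimal 
Python sketch, not something taken from ipxe or from our tooling.  It assumes 
untagged (no 802.1Q) IPv4 frames captured without the FCS; the function names 
trailing_bytes and has_unexpected_trailer are hypothetical.

```python
import struct

ETH_HEADER_LEN = 14      # dst MAC (6) + src MAC (6) + EtherType (2)
ETHERTYPE_IPV4 = 0x0800
ETH_MIN_PAYLOAD = 46     # shorter payloads are legitimately padded up to this


def trailing_bytes(frame: bytes) -> bytes:
    """Return whatever follows the IPv4 datagram inside an Ethernet frame.

    Assumes an untagged IPv4 frame captured without the trailing FCS.
    """
    if len(frame) < ETH_HEADER_LEN + 20:
        raise ValueError("frame too short for Ethernet + IPv4 headers")
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        raise ValueError("not an untagged IPv4 frame")
    # IPv4 total length lives at bytes 2-3 of the IP header.
    (total_len,) = struct.unpack_from("!H", frame, ETH_HEADER_LEN + 2)
    end = ETH_HEADER_LEN + total_len
    if end > len(frame):
        raise ValueError("frame is shorter than the IPv4 total length")
    return frame[end:]


def has_unexpected_trailer(frame: bytes) -> bool:
    """True if the frame carries more trailing bytes than minimum-size padding explains."""
    trailer = trailing_bytes(frame)
    ip_len = len(frame) - ETH_HEADER_LEN - len(trailer)
    expected_pad = max(0, ETH_MIN_PAYLOAD - ip_len)
    return len(trailer) > expected_pad
```

Feeding it the raw frame bytes from a capture (however they are extracted) 
separates ordinary short-frame padding from extra bytes that the IP header does 
not account for.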

We build our own ipxe, but only to embed certificates and boot scripts.  There 
are no code modifications whatsoever.  We're currently using v1.20.1, although 
I've tried many older versions and master HEAD.

I've been playing around with ipxe debug options, and the transport layer sizes 
look right and the IP layer csum is correct.  Is there some way to get ipxe to 
dump the full ethernet frame that it believes it is sending?  I had hoped iobuf 
would do the trick, but apparently not.

I do have some packet captures which I've gotten approval to share off-list 
with a few developers, provided any on-list discussion avoids mention of / 
obfuscates hostnames, IPs (yes, even RFC1918), etc.

Any help with this issue would be greatly appreciated,
_______________________________________________
ipxe-devel mailing list
ipxe-devel@lists.ipxe.org
https://lists.ipxe.org/mailman/listinfo/ipxe-devel
