Well, perhaps a bit embarrassingly, I realized yesterday that ethernet has
a minimum frame size of 64 bytes, and I believe this is where the 18 byte
padding comes into play. The 18 bytes pads the ARP reply to 64 bytes total,
as I mentioned in my first post.

Anyway, while this 18 byte padding is definitely unnecessary on the VAX
architecture, since the ethernet low level drivers in boot handle the
packet padding, I wonder about other architectures. Perhaps other
architectures do not have the padding in their ethernet drivers. In this
case, the 18 bytes certainly don't seem to hurt, so I'd say to leave it,
but perhaps it should have a better comment about it? The comment for the
arp_reply function currently reads something along the lines of 'padded to
46 bytes' without much explanation. This, I now realize, is 46 bytes that
are sent to sendether and then have an additional 18 tacked on for ethernet
header and checksum, coming out to 64 in length at the end of this all.
Perhaps an additional comment is all that's necessary here:

--- arp.c    2016-10-03 10:20:33.689098700 -0500
+++ arp_new.c    2016-10-03 10:31:56.438994000 -0500
@@ -240,7 +240,8 @@

 /*
  * Convert an ARP request into a reply and send it.
- * Notes:  Re-uses buffer.  Pad to length = 46.
+ * Notes:  Re-uses buffer.
+ * Pad to length = 46 to meet minimum ethernet payload.
  */
 void
 arp_reply(struct iodesc *d, void *pkt)


This patch was done on the 6.0 source.

Now on the other hand, I did figure out what was wrong with the VAX boot
program. I know it's unsupported now, but I'll put it here for reference of
anyone trying to solve similar issues. To recap the issue, boot.mop in 5.8
and every other older release I could find would begin loading over NFS,
and then intermittently crash shortly after it encountered an ARP request.
As it turns out, this issue is specific to the DEC SGEC ethernet
controller. The low level driver for this in boot.mop is
/arch/vax/stand/boot/if_ze.c. Near line 288, in the transmit function, we
find these two lines:

    /* Reset the transmitter interrupt pending flag. */
    addr->ze_nicsr5 |= ZE_NICSR5_TI;

purportedly clearing the TX interrupt. In actuality, this line clears *all*
of the interrupts. The csr5 register holds many if not all of the interrupt
status bits for the SGEC. A interrupt request bit is read as 1 in this
register if an interrupt has occured. To clear a request bit, you write to
the register with the corresponding bit as a 1. This line, however, writes
csr5 with a copy of itself. For every interrupt that is currently
requested, there will be a 1, and when it is written, all of those 1's
clear that corresponding interrupt. The issue here is that the receive
function in if_ze.c uses polling of this interrupt register, so if we
receive an ARP request, we transmit a reply and clear the RX interrupt as
well. That means any packets received between the time that the ARP request
is received and the end of the reply are never found in the receive
function because the RX interrupt is never triggered and the receive
function correspondingly assumes no packets have arrived. This issue is
corrected by changing the line to *actually* just clear the TX interrupt:

    /* Reset the transmitter interrupt pending flag. */
    addr->ze_nicsr5 = ZE_NICSR5_TI;

Correcting this line certainly improved the boot.mop program, making it
possible to reliably load OpenBSD most of the time. However, debug messages
and packet sniffing still shows hiccups after ARP requests occurred.

Another potential issue that I just now realized is that the RX function
also clears the RX interrupt every time it gets a packet, EVEN if there are
other packets waiting in the buffer. The result is that when an ARP request
comes in, and another packet for, say, NFS also arrived a moment later, the
RX function in if_ze just gets the ARP request, and clears the RX
interrupt, so it never sees the NFS packet that also came in, until the
next packet comes in (which is usually after a re-request occurs).

Anyway, I'm going to keep looking into this VAX issue, but I know it's
unsupported now, so I'll have to handle that on my own.

Best Regards,

Joe Zatarski


On Mon, Oct 3, 2016 at 5:24 AM, Martin Pieuchot <[email protected]> wrote:

> On 29/09/16(Thu) 16:20, Joseph Zatarski wrote:
> > Yesterday, I was attempting to install 5.8 on a VAX architecture (a DEC
> > VXT2000 with 18MB RAM and SPX graphics to be specific) by net booting
> using
> > the VAX boot.mop file. I was having issues during the process, and I
> > eventually used a network sniffer attempting to troubleshoot issues. I
> > noticed that loading over NFS seemed to stop shortly after my VXT
> received
> > an ARP request from my laptop, and also that the ARP reply seemed to be
> of
> > the wrong length. This issue was also a highly intermittent issue, as
> well
> > as variable in the ways it seemed to fail. Sometimes a full reboot would
> be
> > required before an NFS download worked again, other times I could retry
> > without rebooting, and rarely I would be able to boot without issues.
> >
> > I know VAX is no longer supported, but I discovered issues in
> > arch-independent sections of the source, so please keep reading.
> >
> > I did find, I think, the source of the issue with ARP reply packet
> length.
> > In libsa, the function arp_reply is what handles arp replies. Some
> comments
> > in this section read:
> >
> > /*
> >  * Convert an ARP request into a reply and send it.
> >  * Notes:  Re-uses buffer.  Pad to length = 46.
> >  */
> >
> > The length I see in the network sniffer is actually 64 bytes, which is 18
> > bytes longer than the 46 mentioned here. A normal ARP reply or request
> > (including ethernet frame) is actually 46 bytes, so this appears to be
> > correct in that the length should ultimately be 46. This function
> accepts a
> > pointer to an ethernet packet of type void, and casts it to a structure
> > devised to hold an ARP request or reply, called ether_arp:
> >
> >     struct ether_arp *arp = pkt;
> >
> > This structure should be the size of the ethernet frame's 'payload' for
> an
> > ARP packet, which comes out to be 28 bytes. This is 18 less than 46. The
> > additional 18 is for the layer 2 ethernet overhead.
> >
> > Near the end of the function, we find a call to sendether which 'pads'
> with
> > 18 bytes:
> >
> >     (void) sendether(d, pkt, sizeof(*arp) + 18,
> >         arp->arp_tha, ETHERTYPE_ARP);
> >
> > However, this is where we go wrong. In ether.c, we find the function
> > sendether. Sendether has the following notes:
> >
> > /* Caller must leave room for ethernet header in front!! */
> >
> > And later:
> >
> >     eh = (struct ether_header *)pkt - 1;
> >     len += sizeof(*eh);
> >
> > which is actually what does the padding here, meaning the '+ 18' in the
> > call to sendether is redundant and wrong.
> >
> > Looking at this reveals another potential issue. arp_reply passes *pkt to
> > sendether, and sendether will overwrite some number of bytes before the
> > buffer, which could be out the bounds of that buffer. Of course, it is
> also
> > possible that the buffer that *pkt points inside actually has enough
> space
> > within it before *pkt that this is a nonissue, but I need to investigate
> > this further (starting with finding the caller of arp_reply) to determine
> > whether it's an issue or not. Although weak evidence, data corruption
> > caused by ARP reply would seem to explain the issues I had with
> netbooting
> > my VXT2000.
> >
> > To conclude, I believe that line 291 in /sys/lib/libsa/arp.c should be
> > changed from:
> >
> >     (void) sendether(d, pkt, sizeof(*arp) + 18,
> >
> > to
> >
> >     (void) sendether(d, pkt, sizeof(*arp),
> >
> > in order to solve the packet length issue. Aside from this, I will
> continue
> > looking into potential issues with exceeding buffer bounds when sendether
> > is called from arp_reply.
>
> Thanks for the analyse, did you try what you suggested?  On which archs?
>
> Do you have a diff to share?
>

Reply via email to