Hi,

On 10/29/2014 06:31 PM, Michael Brown wrote:
On 29/10/14 17:14, Floris Bos wrote:
I'm not sure if it is actually the iBFT that is the problem.
My initial guess was that was the case because the nameserver does not
show up in "ipconfig", and my iSCSI disk is not there.
But perhaps Windows does not copy the nameserver from iBFT, but normally
gets that by using normal DHCP later on.
And the real problem is that network connectivity is just screwed up,
perhaps caused by iPXE leaving the network adapter in some kind of state
Windows is not expecting.
The fact that I am seeing DHCP requests, and repeated ARP requests for
the IP of my SAN, after Windows has booted supports the theory that it
does have the iBFT, and that Windows is able to transmit network packets
but somehow has problems receiving them.

- Several commits I tried before "[tcp] Do not send RST for unrecognised
connections" all work properly
- Several commits I tried after, all fail
- It might be coincidence, but I just managed to get HEAD to work by
reversing both "[tcp] Do not send RST for unrecognised connections" and
"[tcp] Defer sending ACKs until all received packets have been
processed", both of which do hackery in src/net/tcp.c.

The problem does not seem to be related to the iBFT; I think we can leave that aside for now.

Interesting. I wonder if it could be somehow related to the possibility of packets arriving between the time that Windows last allows iPXE control of the NIC (via an INT 13 call) and the moment that the Windows native driver starts up.

Unfortunately there is no way to enforce a clean handover of the NIC when doing anything with iSCSI, since the INT 13 API simply does not have any "shut down device" call. The Windows driver will therefore always find the NIC in a slightly unexpected state in which it is already up and running and receiving packets. It's plausible that the two TCP-related changes alter the behaviour in terms of when packets are transmitted (and thus responses received) sufficiently to trigger/avoid a bug.

You could try using the iPXE native driver instead of undionly.kkpxe. This will definitely change the state of the NIC at the time that the Windows driver starts up, and it may be that Windows likes this state better.


Does seem to work with the native driver.

You could also try using wireshark to see if there are any packets present on the network which might arrive after iPXE last relinquishes control (i.e. after the last packet sent by iPXE within its TCP connection to the iSCSI target) but before the Windows driver has started up (i.e. before Windows' initial DHCP request or anything else which has obviously been sent by Windows).


undionly.kkpxe with the two patches reversed (does work):

- iPXE communication seems to end with an iSCSI read response, which iPXE ACKs nicely.
- Then there is a long wait during Windows startup (waiting for disks?), and during that the SAN retransmits an iSCSI NOP In several times, trying to keep the connection to VirtualBox warm.
- Straight after that Windows takes over: there is some DHCP/ARP traffic (not shown below), an LLMNR request for wpad, and a new iSCSI login.

==
No. Time Source Destination Protocol Length Info
315 29.079661000 192.168.178.4 192.168.178.99 iSCSI 116 SCSI: Read(10) LUN: 0x00 (LBA: 0x00000000, Len: 1)
316 29.080009000 192.168.178.99 192.168.178.4 TCP 116 [TCP segment of a reassembled PDU]
317 29.080017000 192.168.178.99 192.168.178.4 iSCSI 580 SCSI: Data In LUN: 0x00 (Read(10) Response Data) SCSI: Response LUN: 0x00 (Read(10)) (Good)
318 29.080127000 192.168.178.4 192.168.178.99 TCP 68 5624 > iscsi-target [ACK] Seq=929 Ack=4389 Win=262144 Len=0 TSval=1345915 TSecr=9180359
319 29.080229000 192.168.178.4 192.168.178.99 TCP 68 5624 > iscsi-target [ACK] Seq=929 Ack=4901 Win=262144 Len=0 TSval=1345915 TSecr=9180359
383 39.098163000 192.168.178.99 192.168.178.4 iSCSI 116 NOP In
384 39.941994000 192.168.178.99 192.168.178.4 iSCSI 116 [TCP Retransmission] NOP In
385 41.633880000 192.168.178.99 192.168.178.4 iSCSI 116 [TCP Retransmission] NOP In
388 45.017528000 192.168.178.99 192.168.178.4 iSCSI 116 [TCP Retransmission] NOP In
403 51.784762000 192.168.178.99 192.168.178.4 iSCSI 116 [TCP Retransmission] NOP In
907 172.706543000 192.168.178.4 224.0.0.252 LLMNR 66 Standard query 0xbed4 A wpad
908 172.706550000 192.168.178.4 224.0.0.252 LLMNR 66 Standard query 0xbed4 A wpad
909 172.787428000 192.168.178.4 192.168.178.99 TCP 68 49154 > iscsi-target [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
910 172.787730000 192.168.178.99 192.168.178.4 TCP 68 iscsi-target > 49154 [SYN, ACK] Seq=0 Ack=1 Win=14600 Len=0 MSS=1460 SACK_PERM=1 WS=8
911 172.787999000 192.168.178.4 192.168.178.99 TCP 56 49154 > iscsi-target [ACK] Seq=1 Ack=1 Win=65536 Len=0
912 172.789389000 192.168.178.4 192.168.178.99 iSCSI 244 Login Command
[...various iSCSI commands...]
2149 173.114694000 192.168.178.4 224.0.0.252 LLMNR 66 Standard query 0xbed4 A wpad
2150 173.114697000 192.168.178.4 224.0.0.252 LLMNR 66 Standard query 0xbed4 A wpad
2169 176.317161000 192.168.178.4 192.168.178.255 NBNS 112 Registration NB WORKGROUP<00>
2170 176.317172000 192.168.178.4 192.168.178.255 NBNS 112 Registration NB WORKGROUP<00>
2171 176.317356000 192.168.178.4 192.168.178.255 NBNS 112 Registration NB MININT-SG3NP4U<00>
==
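
To make the "packets arriving during the handover" check a bit more mechanical, here is a minimal sketch in C of how the capture can be filtered. It is only an illustration: `struct pkt`, `handover_window()` and the `from_client` flag are names I made up, and the two boundary timestamps have to be picked by hand from the trace (last packet sent by iPXE, first packet shown that is obviously sent by Windows).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One captured frame, reduced to what the check needs (hypothetical type). */
struct pkt {
    unsigned int frame;  /* Wireshark frame number */
    double time;         /* capture timestamp in seconds */
    bool from_client;    /* true if sent by the booting machine */
};

/*
 * Collect the frames sent TO the client strictly after its last pre-boot
 * transmission and strictly before the first transmission by the OS.
 * Returns the number of matching frames; at most out_cap frame numbers
 * are stored in out[].
 */
static size_t handover_window(const struct pkt *pkts, size_t n,
                              double last_ipxe_tx, double first_os_tx,
                              unsigned int *out, size_t out_cap)
{
    size_t found = 0;

    assert(last_ipxe_tx <= first_os_tx);
    for (size_t i = 0; i < n; i++) {
        if (pkts[i].from_client)
            continue;  /* only inbound traffic is of interest */
        if (pkts[i].time > last_ipxe_tx && pkts[i].time < first_os_tx) {
            if (found < out_cap)
                out[found] = pkts[i].frame;
            found++;
        }
    }
    return found;
}
```

Fed with the frames of the trace above (last iPXE packet at 29.080229, first shown Windows packet at 172.706543), this selects frames 383, 384, 385, 388 and 403, i.e. the NOP In and its retransmissions.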

undionly.kkpxe without patch reversion (does NOT work):

- The last segment of the iSCSI read response (Seq=4389, Len=512) seems to be retransmitted because the final ACK (Ack=4901 in the working trace) is never sent. Those retransmissions may arrive when Windows is about to take over.
- Windows does not seem to do any iSCSI communication.

==
293 30.869835000 192.168.178.4 192.168.178.99 iSCSI 116 SCSI: Read(10) LUN: 0x00 (LBA: 0x00000000, Len: 1)
294 30.870209000 192.168.178.99 192.168.178.4 TCP 116 [TCP segment of a reassembled PDU]
295 30.870230000 192.168.178.99 192.168.178.4 iSCSI 580 SCSI: Data In LUN: 0x00 (Read(10) Response Data) SCSI: Response LUN: 0x00 (Read(10)) (Good)
296 30.870346000 192.168.178.4 192.168.178.99 TCP 68 28509 > iscsi-target [ACK] Seq=929 Ack=4389 Win=262144 Len=0 TSval=1363248 TSecr=9418323
315 34.310476000 192.168.178.99 192.168.178.4 TCP 580 [TCP Retransmission] iscsi-target > 28509 [PSH, ACK] Seq=4389 Ack=929 Win=16624 Len=512 TSval=9419184 TSecr=1363248[Reassembly error, protocol TCP: New fragment overlaps old data (retransmission?)]
353 41.189666000 192.168.178.99 192.168.178.4 TCP 580 [TCP Retransmission] iscsi-target > 28509 [PSH, ACK] Seq=4389 Ack=929 Win=16624 Len=512 TSval=9420904 TSecr=1363248[Reassembly error, protocol TCP: New fragment overlaps old data (retransmission?)]
522 54.980144000 192.168.178.99 192.168.178.4 TCP 580 [TCP Retransmission] iscsi-target > 28509 [PSH, ACK] Seq=4389 Ack=929 Win=16624 Len=512 TSval=9424352 TSecr=1363248[Reassembly error, protocol TCP: New fragment overlaps old data (retransmission?)]
549 60.879695000 192.168.178.99 192.168.178.4 iSCSI 164 NOP In, NOP In
1333 173.270305000 192.168.178.4 192.168.178.255 NBNS 112 Registration NB MININT-GRDEK79<00>
1334 173.270318000 192.168.178.4 192.168.178.255 NBNS 112 Registration NB MININT-GRDEK79<00>
1335 173.270536000 192.168.178.4 192.168.178.255 NBNS 112 Registration NB WORKGROUP<00>
==
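
The difference between the two traces comes down to simple sequence number arithmetic, which can be sketched in C as follows. This is just my own illustration with made-up names (`struct tcp_segment`, `covering_ack()`, `segment_acked()`), using the relative sequence numbers Wireshark displays; it is not how iPXE or the SAN decides anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A data segment sent by the target, as shown in the capture. */
struct tcp_segment {
    uint32_t seq;  /* relative sequence number of the first payload byte */
    uint32_t len;  /* payload length in bytes */
};

/* The ACK number that acknowledges the whole segment is seq + len. */
static uint32_t covering_ack(const struct tcp_segment *seg)
{
    return seg->seq + seg->len;
}

/*
 * True if any of the ACK numbers sent by the initiator covers the segment.
 * A plain >= is enough for these small relative numbers; real TCP code
 * needs a wraparound-safe comparison instead.
 */
static bool segment_acked(const struct tcp_segment *seg,
                          const uint32_t *acks, size_t n_acks)
{
    assert(acks != NULL || n_acks == 0);
    for (size_t i = 0; i < n_acks; i++) {
        if (acks[i] >= covering_ack(seg))
            return true;
    }
    return false;
}
```

For the retransmitted segment (Seq=4389, Len=512) the covering ACK is 4901. The working trace contains Ack=4389 and Ack=4901, so the segment counts as acknowledged; the failing trace only contains Ack=4389, so the target keeps retransmitting.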


A problem is that the SAN-booted OS is likely to clear the screen
almost immediately, meaning that the warning message would not be seen
in practice.

But if I am doing a SAN installation couldn't the warning be printed the
moment I do the sanhook command?

The iBFT is not created until you attempt to boot from the SAN target.

I thought you already had memory reserved for it in some data segment, before filling it in.

==
/** The boot firmware table generated by iPXE */
static union xbft_table __bss16 ( xbftab ) __attribute__ (( aligned ( 16 ) ));
==

Or am I misunderstanding what that does?
(not a low level programmer)
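
To make the question concrete, here is a minimal sketch of how I read that declaration, in plain C with hypothetical names (`demo_table`, `demo_table_fill()`, `demo_table_present()`). It is not iPXE's actual code, and whether iPXE really behaves like this is exactly what I am asking: the storage exists, zero-filled, from the start, but there is no valid table in it until something fills it in.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a firmware table; not the real iBFT layout. */
struct demo_table {
    char signature[4];
    unsigned int length;
};

/* Static storage: reserved and zero-initialised before any code runs. */
static struct demo_table demo_tab __attribute__ (( aligned ( 16 ) ));

/* The table only "exists" for a consumer once its signature is in place. */
static bool demo_table_present(void)
{
    return memcmp(demo_tab.signature, "iBFT", 4) == 0;
}

/* Filling in happens later, e.g. when a SAN boot is attempted. */
static void demo_table_fill(void)
{
    memcpy(demo_tab.signature, "iBFT", 4);
    demo_tab.length = sizeof(demo_tab);
}
```

In this sketch `demo_table_present()` returns false until `demo_table_fill()` has been called, even though the memory for `demo_tab` was reserved all along.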

--
Yours sincerely,

Floris Bos

_______________________________________________
ipxe-devel mailing list
[email protected]
https://lists.ipxe.org/mailman/listinfo.cgi/ipxe-devel
