Re: Problem with netconsole and eth0 timing
On 9/27/18 3:16 PM, don fisher wrote:
> On 9/27/18 12:01 AM, valdis.kletni...@vt.edu wrote:
>> On Wed, 26 Sep 2018 21:38:27 -0700, don fisher said:
>>> Thanks. I tried building with the driver embedded in the kernel, but the
>>> compile failed with a halt. No crash is apparent, just a halt. It turned
>>> out that this was repeated until I removed the netconsole command during
>>> boot. System appears stable now. I will try tomorrow to embed the
>>> driver, then add netconsole option in the command line.
>>
>> Wait, what? The *compile* "failed with a halt"? What the heck does that mean?
>
> Don't know what it means. The compile just happened to be what I was
> executing. The system just stopped with no output to screen, dmesg or
> journal. With trial and error I discovered that if I eliminated the
> netconsole command from the grub2 linux command line, the system appeared
> stable again. I put that netconsole command in /etc/default/grub, so it is
> sort of a pain to insert and remove it.
>
> Don

As threatened, I rebuilt with the alx driver embedded. I tested this kernel just to make sure the alx driver still supported standard Ethernet, which it did. I then added the linux netconsole command and rebooted. Everything worked well for a while, but the remote output stopped at 11.909 sec, while dmesg has entries up to 12.627 sec, and later up to 206.453 sec. The last message from dmesg is "work still pending". Before that, at 11.977 sec, there was a "No iBFT detected" message. This is about the time the output terminated. I do not know what iBFT is.

The netconsole receiver, nc -u -l 64001, is still running at 100% CPU utilization.

journalctl on the source gave:

701:Sep 27 16:25:01 dfpc60 kernel: netpoll: netconsole: local port 64001
703:Sep 27 16:25:01 dfpc60 kernel: netpoll: netconsole: local IPv4 address 192.168.7.60
705:Sep 27 16:25:01 dfpc60 kernel: netpoll: netconsole: interface 'eth0'
706:Sep 27 16:25:01 dfpc60 kernel: netpoll: netconsole: remote port 64001
707:Sep 27 16:25:01 dfpc60 kernel: netpoll: netconsole: remote IPv4 address 192.168.7.55
709:Sep 27 16:25:01 dfpc60 kernel: netpoll: netconsole: remote ethernet address 34:e6:d7:01:2a:dd
711:Sep 27 16:25:01 dfpc60 kernel: netpoll: netconsole: device eth0 not up yet, forcing it
719:Sep 27 16:25:01 dfpc60 kernel: netconsole: network logging started
750:Sep 27 16:25:01 dfpc60 systemd-modules-load[186]: Module 'netconsole' is builtin

Is the "forcing it" correct?

Don

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
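For anyone who wants a receiver other than nc on the 192.168.7.55 side, here is a minimal Python sketch of a UDP listener on port 64001. It only assumes that netconsole sends plain text in UDP datagrams; the function names (`open_listener`, `read_message`, `listen_forever`) are made up for illustration and are not part of any netconsole tooling. It makes no claim about why nc was busy in the report above.

```python
import socket


def open_listener(host: str = "0.0.0.0", port: int = 64001) -> socket.socket:
    """Bind a UDP socket on the port named in the netconsole= parameter."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def read_message(sock: socket.socket, timeout=None, bufsize: int = 65535):
    """Block until one datagram arrives; return (sender address, decoded text)."""
    sock.settimeout(timeout)
    data, (sender, _sender_port) = sock.recvfrom(bufsize)
    # Kernel messages are text; replace any undecodable bytes instead of failing.
    return sender, data.decode("utf-8", errors="replace")


def listen_forever(sock: socket.socket) -> None:
    """Print every datagram as it arrives. recvfrom() sleeps between packets."""
    while True:
        sender, text = read_message(sock)
        print(f"{sender}: {text}", end="" if text.endswith("\n") else "\n")
```

Calling `listen_forever(open_listener())` on the receiving machine prints each message with the sender's address, which also makes it easy to see exactly where the remote output stops.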
Re: Problem with netconsole and eth0 timing
On 9/27/18 12:01 AM, valdis.kletni...@vt.edu wrote:
> On Wed, 26 Sep 2018 21:38:27 -0700, don fisher said:
>> Thanks. I tried building with the driver embedded in the kernel, but the
>> compile failed with a halt. No crash is apparent, just a halt. It turned
>> out that this was repeated until I removed the netconsole command during
>> boot. System appears stable now. I will try tomorrow to embed the
>> driver, then add netconsole option in the command line.
>
> Wait, what? The *compile* "failed with a halt"? What the heck does that mean?

Don't know what it means. The compile just happened to be what I was executing. The system just stopped with no output to screen, dmesg or journal. With trial and error I discovered that if I eliminated the netconsole command from the grub2 linux command line, the system appeared stable again. I put that netconsole command in /etc/default/grub, so it is sort of a pain to insert and remove it.

Don
Re: Problem with netconsole and eth0 timing
On Wed, 26 Sep 2018 21:38:27 -0700, don fisher said:
> Thanks. I tried building with the driver embedded in the kernel, but the
> compile failed with a halt. No crash is apparent, just a halt. It turned
> out that this was repeated until I removed the netconsole command during
> boot. System appears stable now. I will try tomorrow to embed the
> driver, then add netconsole option in the command line.

Wait, what? The *compile* "failed with a halt"? What the heck does that mean?
Re: Problem with netconsole and eth0 timing
On 9/26/18 2:33 PM, valdis.kletni...@vt.edu wrote:
> On Wed, 26 Sep 2018 13:25:35 -0700, don fisher said:
>> Would you tell me how to tell the driver that it is to be eth0, ip
>> address etc. Maybe on linux command line, but I do not know the format.
>
> To quote some guy named Don Fisher:
>
>> my kernel and including the proper command (as shown below) in the linux boot
>> string:
>> netconsole=64001@192.168.7.60/eth0,64001@192.168.7.55/34:e6:d7:01:2a:dd
>
> That's how. The netconsole command gets the info it needs from there, and
> tells the network layer how to configure the ethernet device and the network
> layer - although mostly the network layer.
>
> And the devices will auto-name themselves during boot, so all you need to do
> is know *which* name the kernel gives to the port you want to use, and then
> use that name. So grovel around in dmesg, and look for lines like (2 examples
> I have handy here)
>
> grep eth /var/log/dmesg
> [7.278395] igb :07:00.0: added PHC on eth0
> [7.278398] igb :07:00.0: eth0: (PCIe:5.0Gb/s:Width x2) 24:6e:96:10:db:64
> [7.278916] igb :07:00.0: eth0: PBA No: G61346-000
> [7.368911] igb :07:00.1: added PHC on eth1
> [7.368913] igb :07:00.1: eth1: (PCIe:5.0Gb/s:Width x2) 24:6e:96:10:db:65
> [7.369372] igb :07:00.1: eth1: PBA No: G61346-000
>
> dmesg | grep eth
> [2.642006] e1000e :00:19.0 eth0: (PCI Express:2.5GT/s:Width x1) f0:1f:af:0c:8a:da
> [2.642076] e1000e :00:19.0 eth0: Intel(R) PRO/1000 Network Connection
> [2.642118] e1000e :00:19.0 eth0: MAC: 10, PHY: 11, PBA No: 7011FF-0FF
> [5.071095] e1000e :00:19.0 eno1: renamed from eth0
> [ 44.516004] e1000e :00:19.0 eth0: renamed from eno1
>
> And find the one that has the MAC address of the port you want to use. Note
> that you want the kernel-assigned device name, not the one that udev/systemd
> finally assign to the device. So for the second example (my laptop), you'd
> want either the eth0 or eno1 name (depending on which one was in effect when
> the netconsole module initializes). The initial eth0 is from PCI enumeration,
> the rename to eno1 is courtesy of the kernel, and then the rename back to
> eth0 by systemd.
>
> Some experimentation may be needed (I've got a few servers that have 4 1G
> ports on the motherboard and multiple 10G/40G dual-port cards, so sometimes
> the port wired to our management network ends up at eth6 or eth7...)
>
> So if the port you want to use gets named eth4 by the kernel, you use
>
> netconsole=64001@192.168.7.60/eth4,64001@192.168.7.55/34:e6:d7:01:2a:dd
>
> (Gory ethernet details follow :)
>
> Remember that strictly speaking, the ethernet device itself doesn't *need* to
> know what its IP address is - it only needs to know its own MAC address so it
> knows which packets on the wire to accept to hand to the network stack, and
> *maybe* a list of other MAC addresses it should accept. And the hardware
> already knows its own MAC address.. :)
>
> You can get ethernet devices working with a *very* small set of functions:
>
> 0) Tell the kernel your hardware state (link/no link, MAC address, a few other things)
> 1) Receive packets for your own MAC address
> 2) Receive broadcast packets
> 3) Receive packets for another specified MAC address (semi-optional)
> 4) Receive packets in promiscuous mode (semi-optional)
> 5) Transmit packet to the MAC address provided
>
> Pretty much everything else can be done in kernelspace (though modern cards
> often provide offload of some IP and even TCP handling, interrupt coalescing,
> and all sorts of other stuff)
>
> (I learned far too much about minimalist Ethernet when the Clarkson Packet
> Drivers were getting created in the cubicle next to mine. :)

Thanks. I tried building with the driver embedded in the kernel, but the compile failed with a halt. No crash is apparent, just a halt. It turned out that this was repeated until I removed the netconsole command during boot. System appears stable now. I will try tomorrow to embed the driver, then add netconsole option in the command line.

Don
Re: Problem with netconsole and eth0 timing
On Wed, 26 Sep 2018 13:25:35 -0700, don fisher said:
> Would you tell me how to tell the driver that it is to be eth0, ip
> address etc. Maybe on linux command line, but I do not know the format.

To quote some guy named Don Fisher:

> my kernel and including the proper command (as shown below) in the linux boot
> string:
> netconsole=64001@192.168.7.60/eth0,64001@192.168.7.55/34:e6:d7:01:2a:dd

That's how. The netconsole command gets the info it needs from there, and tells the network layer how to configure the ethernet device and the network layer - although mostly the network layer.

And the devices will auto-name themselves during boot, so all you need to do is know *which* name the kernel gives to the port you want to use, and then use that name. So grovel around in dmesg, and look for lines like (2 examples I have handy here)

grep eth /var/log/dmesg
[7.278395] igb :07:00.0: added PHC on eth0
[7.278398] igb :07:00.0: eth0: (PCIe:5.0Gb/s:Width x2) 24:6e:96:10:db:64
[7.278916] igb :07:00.0: eth0: PBA No: G61346-000
[7.368911] igb :07:00.1: added PHC on eth1
[7.368913] igb :07:00.1: eth1: (PCIe:5.0Gb/s:Width x2) 24:6e:96:10:db:65
[7.369372] igb :07:00.1: eth1: PBA No: G61346-000

dmesg | grep eth
[2.642006] e1000e :00:19.0 eth0: (PCI Express:2.5GT/s:Width x1) f0:1f:af:0c:8a:da
[2.642076] e1000e :00:19.0 eth0: Intel(R) PRO/1000 Network Connection
[2.642118] e1000e :00:19.0 eth0: MAC: 10, PHY: 11, PBA No: 7011FF-0FF
[5.071095] e1000e :00:19.0 eno1: renamed from eth0
[ 44.516004] e1000e :00:19.0 eth0: renamed from eno1

And find the one that has the MAC address of the port you want to use. Note that you want the kernel-assigned device name, not the one that udev/systemd finally assign to the device. So for the second example (my laptop), you'd want either the eth0 or eno1 name (depending on which one was in effect when the netconsole module initializes). The initial eth0 is from PCI enumeration, the rename to eno1 is courtesy of the kernel, and then the rename back to eth0 by systemd.

Some experimentation may be needed (I've got a few servers that have 4 1G ports on the motherboard and multiple 10G/40G dual-port cards, so sometimes the port wired to our management network ends up at eth6 or eth7...)

So if the port you want to use gets named eth4 by the kernel, you use

netconsole=64001@192.168.7.60/eth4,64001@192.168.7.55/34:e6:d7:01:2a:dd

(Gory ethernet details follow :)

Remember that strictly speaking, the ethernet device itself doesn't *need* to know what its IP address is - it only needs to know its own MAC address so it knows which packets on the wire to accept to hand to the network stack, and *maybe* a list of other MAC addresses it should accept. And the hardware already knows its own MAC address.. :)

You can get ethernet devices working with a *very* small set of functions:

0) Tell the kernel your hardware state (link/no link, MAC address, a few other things)
1) Receive packets for your own MAC address
2) Receive broadcast packets
3) Receive packets for another specified MAC address (semi-optional)
4) Receive packets in promiscuous mode (semi-optional)
5) Transmit packet to the MAC address provided

Pretty much everything else can be done in kernelspace (though modern cards often provide offload of some IP and even TCP handling, interrupt coalescing, and all sorts of other stuff)

(I learned far too much about minimalist Ethernet when the Clarkson Packet Drivers were getting created in the cubicle next to mine. :)
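The "grovel around in dmesg" step can be roughed out in a few lines of Python. This is only a sketch that assumes the log lines look like the two excerpts quoted above (timestamp, driver, PCI address, interface name, message); other drivers print different formats, and `name_history_for_mac` and `name_at` are hypothetical helper names, not existing tools.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Matches lines such as:
#   [2.642006] e1000e :00:19.0 eth0: (PCI Express:2.5GT/s:Width x1) f0:1f:af:0c:8a:da
#   [5.071095] e1000e :00:19.0 eno1: renamed from eth0
_LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<driver>\S+)\s+(?P<pci>\S+?):?\s+"
    r"(?P<ifname>\w+):\s+(?P<rest>.*)$"
)


def name_history_for_mac(lines: Iterable[str], mac: str) -> List[Tuple[float, str]]:
    """Return [(timestamp, interface name), ...] for the device that owns `mac`."""
    mac = mac.lower()
    parsed = []
    for line in lines:
        m = _LINE_RE.match(line.strip())
        if m:
            parsed.append((float(m["ts"]), m["pci"], m["ifname"], m["rest"]))
    # The line that prints the MAC tells us which PCI device to follow.
    pci = next((p for _, p, _, rest in parsed if mac in rest.lower()), None)
    if pci is None:
        return []
    history: List[Tuple[float, str]] = []
    for ts, p, ifname, _ in parsed:
        # Record a new entry only when the name changes (i.e. on a rename).
        if p == pci and (not history or history[-1][1] != ifname):
            history.append((ts, ifname))
    return history


def name_at(history: List[Tuple[float, str]], seconds: float) -> Optional[str]:
    """Name in effect at `seconds` after boot, or None if not yet registered."""
    current = None
    for ts, ifname in history:
        if ts <= seconds:
            current = ifname
    return current
```

Fed the laptop excerpt and the MAC f0:1f:af:0c:8a:da, the history comes out as eth0, then eno1, then eth0 again, so which name belongs in the netconsole= string depends on when netconsole initializes relative to those renames.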
Re: Problem with netconsole and eth0 timing
On 9/25/18 7:26 PM, valdis.kletni...@vt.edu wrote:
> On Tue, 25 Sep 2018 18:26:06 -0700, don fisher said:
>> The wicked message eth0: up comes at Sep 24 22:02:01.173616. The
>> difference is maybe 36 seconds? There is an eth0: avail message at Sep
>> 24 22:01:34.112744, don't know if that would suffice for netconsole. Both
>> are after netconsole has bailed out. Any obvious solutions I am missing?
>> The documentation is pretty clear on how to set this up, so there must
>> be some way to get it to work. I could find nothing on Google.
>
> Here's the big clue:
>
> Sep 24 22:01:25 dfpc60 systemd-modules-load[185]: Module 'netconsole' is builtin
>
> which means that it's going to get initialized during *very* early boot,
> before the initramfs gets called. netconsole is able to do the equivalent of
> 'ifconfig' (or at least enough of it to set the IPs/ports/ARP entries before
> the rest of networking comes up), but it can't also get the physical device
> up and running if the hardware driver isn't present.
>
> Since it's builtin, this is probably a custom-built kernel. So make sure that
> the driver for eth0 is also builtin.

Would you tell me how to tell the driver that it is to be eth0, ip address etc. Maybe on linux command line, but I do not know the format.

Thanks
Don
Re: Problem with netconsole and eth0 timing
On Tue, 25 Sep 2018 18:26:06 -0700, don fisher said:
> The wicked message eth0: up comes at Sep 24 22:02:01.173616. The
> difference is maybe 36 seconds? There is an eth0: avail message at Sep
> 24 22:01:34.112744, don't know if that would suffice for netconsole. Both
> are after netconsole has bailed out. Any obvious solutions I am missing?
> The documentation is pretty clear on how to set this up, so there must
> be some way to get it to work. I could find nothing on Google.

Here's the big clue:

Sep 24 22:01:25 dfpc60 systemd-modules-load[185]: Module 'netconsole' is builtin

which means that it's going to get initialized during *very* early boot, before the initramfs gets called. netconsole is able to do the equivalent of 'ifconfig' (or at least enough of it to set the IPs/ports/ARP entries before the rest of networking comes up), but it can't also get the physical device up and running if the hardware driver isn't present.

Since it's builtin, this is probably a custom-built kernel. So make sure that the driver for eth0 is also builtin.
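The builtin-versus-module point can be checked against the kernel's .config before rebooting. The Python sketch below is one way to do that, assuming the usual .config text format ("CONFIG_FOO=y", "CONFIG_FOO=m", "# CONFIG_FOO is not set"). The helper names are hypothetical, and the driver symbol is passed in by the caller; CONFIG_ALX in the usage note is an assumption based on the alx driver named elsewhere in this thread.

```python
def config_state(config_text: str, symbol: str) -> str:
    """Return 'y', 'm', or 'n' for a Kconfig symbol in .config-style text."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {symbol} is not set":
            return "n"
        if line.startswith(f"{symbol}="):
            return line.split("=", 1)[1]
    # A symbol that does not appear at all is treated as disabled.
    return "n"


def builtin_netconsole_lacks_driver(config_text: str, driver_symbol: str) -> bool:
    """True when netconsole is builtin (=y) but the NIC driver is not.

    That is the combination described above: netconsole initializes during
    early boot and finds no device because the driver has not been loaded.
    """
    netconsole = config_state(config_text, "CONFIG_NETCONSOLE")
    driver = config_state(config_text, driver_symbol)
    return netconsole == "y" and driver != "y"
```

For example, `builtin_netconsole_lacks_driver(open(".config").read(), "CONFIG_ALX")` returning True would match the "eth0 doesn't exist, aborting" symptom from the original report.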
Problem with netconsole and eth0 timing
I am trying to use netconsole to examine some problems I am having booting Alienware 13 r3 and 14 laptops. I am running openSUSE 42.3 with a generic 4.18.7 kernel configured to include netconsole. After building my kernel and including the proper command (as shown below) in the linux boot string:

netconsole=64001@192.168.7.60/eth0,64001@192.168.7.55/34:e6:d7:01:2a:dd

I receive the following error message, as shown by journalctl:

sudo journalctl -b | grep netconsole
Sep 24 22:01:25 dfpc60 kernel: netpoll: netconsole: local port 64001
Sep 24 22:01:25 dfpc60 kernel: netpoll: netconsole: local IPv4 address 192.168.7.60
Sep 24 22:01:25 dfpc60 kernel: netpoll: netconsole: interface 'eth0'
Sep 24 22:01:25 dfpc60 kernel: netpoll: netconsole: remote port 64001
Sep 24 22:01:25 dfpc60 kernel: netpoll: netconsole: remote IPv4 address 192.168.7.55
Sep 24 22:01:25 dfpc60 kernel: netpoll: netconsole: remote ethernet address 34:e6:d7:01:2a:dd
Sep 24 22:01:25 dfpc60 kernel: netpoll: netconsole: eth0 doesn't exist, aborting
Sep 24 22:01:25 dfpc60 kernel: netconsole: cleaning up
Sep 24 22:01:25 dfpc60 systemd-modules-load[185]: Module 'netconsole' is builtin

I examined the timing in more detail and found that the above "eth0 doesn't exist" message comes at Sep 24 22:01:25.080603. The wicked message "eth0: up" comes at Sep 24 22:02:01.173616. The difference is maybe 36 seconds? There is an "eth0: avail" message at Sep 24 22:01:34.112744, but I don't know if that would suffice for netconsole. Both are after netconsole has bailed out.

Any obvious solutions I am missing? The documentation is pretty clear on how to set this up, so there must be some way to get it to work. I could find nothing on Google.

This is my first post to this list, so if this question is not appropriate please let me know without flames :-)

Don
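For readers unfamiliar with the boot string used above, it has the shape src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac, which lines up with the six netpoll lines in the journal. The Python sketch below splits such a string into named fields. It is a rough illustration that only handles the fully specified form used in this post (the kernel also accepts omitted fields, which this does not attempt), and `parse_netconsole` and `NetconsoleTarget` are hypothetical names.

```python
import ipaddress
import re
from typing import NamedTuple


class NetconsoleTarget(NamedTuple):
    local_port: int
    local_ip: str
    device: str
    remote_port: int
    remote_ip: str
    remote_mac: str


_MAC_RE = re.compile(r"^[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}$")


def parse_netconsole(value: str) -> NetconsoleTarget:
    """Split 'src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac' into its fields.

    Every field is required here; a malformed string raises ValueError.
    """
    if value.startswith("netconsole="):
        value = value[len("netconsole="):]
    local, remote = value.split(",", 1)
    local_port, local_rest = local.split("@", 1)
    local_ip, device = local_rest.split("/", 1)
    remote_port, remote_rest = remote.split("@", 1)
    remote_ip, remote_mac = remote_rest.split("/", 1)
    # ip_address() raises ValueError for anything that is not a valid address.
    ipaddress.ip_address(local_ip)
    ipaddress.ip_address(remote_ip)
    if not _MAC_RE.match(remote_mac):
        raise ValueError(f"not a MAC address: {remote_mac!r}")
    return NetconsoleTarget(
        int(local_port), local_ip, device,
        int(remote_port), remote_ip, remote_mac.lower(),
    )
```

Parsing the string from this post gives local port 64001 on 192.168.7.60 via eth0, sending to port 64001 on 192.168.7.55 at 34:e6:d7:01:2a:dd, the same values netpoll echoes in the journal.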