Re: Problem with netconsole and eth0 timing

2018-09-27 Thread don fisher

As threatened, I rebuilt with the alx driver embedded. I tested this
kernel just to make sure the alx driver still supported standard
Ethernet, which it did. I then added the linux netconsole command and
rebooted. Everything worked well for a while, but the remote output
stopped at 11.909 sec, while dmesg has entries up to 12.627 sec, and
later up to 206.453 sec. The last message from dmesg is "work still
pending". Before that, at 11.977 sec, there was a "No iBFT detected"
message. This is about the time the output terminated. I do not know
what iBFT is. The netconsole receiver, nc -u -l 64001, is still running
at 100% CPU utilization.
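
For reference, a minimal sketch of a UDP receiver that could stand in for `nc -u -l 64001` on the receiving machine. It is an illustration only and was not used in this thread: the names `split_datagram` and `listen` are hypothetical, and the port 64001 is taken from the netconsole line quoted in the thread. Because it blocks in `recvfrom` while idle, it should not spin the CPU when no messages arrive.

```python
import socket


def split_datagram(data: bytes) -> list[str]:
    """Decode one netconsole UDP payload into non-empty text lines."""
    text = data.decode("utf-8", errors="replace")
    # netconsole forwards raw kernel log text; drop empty pieces.
    return [line for line in text.splitlines() if line]


def listen(port: int = 64001, host: str = "0.0.0.0") -> None:
    """Print every datagram received on the given UDP port (blocks forever)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            # recvfrom sleeps until a packet arrives, so idle time costs no CPU.
            data, (sender, _) = sock.recvfrom(65535)
            for line in split_datagram(data):
                print(f"{sender}: {line}")
```

Calling `listen()` starts the receiver; stop it with Ctrl-C.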


journalctl on the source gave:

701:Sep 27 16:25:01 dfpc60 kernel: netpoll: netconsole: local port 64001
703:Sep 27 16:25:01 dfpc60 kernel: netpoll: netconsole: local IPv4 address 192.168.7.60
705:Sep 27 16:25:01 dfpc60 kernel: netpoll: netconsole: interface 'eth0'
706:Sep 27 16:25:01 dfpc60 kernel: netpoll: netconsole: remote port 64001
707:Sep 27 16:25:01 dfpc60 kernel: netpoll: netconsole: remote IPv4 address 192.168.7.55
709:Sep 27 16:25:01 dfpc60 kernel: netpoll: netconsole: remote ethernet address 34:e6:d7:01:2a:dd
711:Sep 27 16:25:01 dfpc60 kernel: netpoll: netconsole: device eth0 not up yet, forcing it
719:Sep 27 16:25:01 dfpc60 kernel: netconsole: network logging started
750:Sep 27 16:25:01 dfpc60 systemd-modules-load[186]: Module 'netconsole' is builtin


Is the "forcing it" correct?

Don


___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Problem with netconsole and eth0 timing

2018-09-27 Thread don fisher


Don't know what it means. The compile just happened to be what I was
executing. The system just stopped with no output to screen, dmesg or
journal. By trial and error I discovered that if I eliminated the
netconsole command from the grub2 linux command line, the system
appeared stable again. I put that netconsole command in
/etc/default/grub, so it is sort of a pain to insert and remove it.


Don



Re: Problem with netconsole and eth0 timing

2018-09-27 Thread valdis . kletnieks
On Wed, 26 Sep 2018 21:38:27 -0700, don fisher said:

> Thanks. I tried building with the driver embedded in the kernel, but the
> compile failed with a halt. No crash is apparent, just a halt. It turned
> out that this was repeated until I removed the netconsole command during
> boot. System appears stable now. I will try tomorrow to embed the
> driver, then add netconsole option in the command line.

Wait, what?  The *compile* "failed with a halt"?  What the heck does that mean?




Re: Problem with netconsole and eth0 timing

2018-09-26 Thread don fisher


Thanks. I tried building with the driver embedded in the kernel, but the 
compile failed with a halt. No crash is apparent, just a halt. It turned 
out that this was repeated until I removed the netconsole command during 
boot. System appears stable now. I will try tomorrow to embed the 
driver, then add netconsole option in the command line.


Don






Re: Problem with netconsole and eth0 timing

2018-09-26 Thread valdis . kletnieks
On Wed, 26 Sep 2018 13:25:35 -0700, don fisher said:
> Would you tell me how to tell the driver that it is to be eth0, ip
> address etc. Maybe on linux command line, but I do not know the format.

To quote some guy named Don Fisher:

> my kernel and including the proper command (as shown below) in the linux boot 
> string:
>   netconsole=64001@192.168.7.60/eth0,64001@192.168.7.55/34:e6:d7:01:2a:dd

That's how. The netconsole command gets the info it needs from there, and tells
the network layer how to configure the ethernet device and the network layer -
although mostly the network layer.  And the devices will auto-name themselves
during boot, so all you need to do is know *which* name the kernel gives to the
port you want to use, and then use that name.

So grovel around in dmesg, and look for lines like these (2 examples I have
handy here):

 grep eth /var/log/dmesg
[7.278395] igb :07:00.0: added PHC on eth0
[7.278398] igb :07:00.0: eth0: (PCIe:5.0Gb/s:Width x2) 24:6e:96:10:db:64
[7.278916] igb :07:00.0: eth0: PBA No: G61346-000
[7.368911] igb :07:00.1: added PHC on eth1
[7.368913] igb :07:00.1: eth1: (PCIe:5.0Gb/s:Width x2) 24:6e:96:10:db:65
[7.369372] igb :07:00.1: eth1: PBA No: G61346-000

dmesg | grep eth
[2.642006] e1000e :00:19.0 eth0: (PCI Express:2.5GT/s:Width x1) f0:1f:af:0c:8a:da
[2.642076] e1000e :00:19.0 eth0: Intel(R) PRO/1000 Network Connection
[2.642118] e1000e :00:19.0 eth0: MAC: 10, PHY: 11, PBA No: 7011FF-0FF
[5.071095] e1000e :00:19.0 eno1: renamed from eth0
[   44.516004] e1000e :00:19.0 eth0: renamed from eno1

And find the one that has the MAC address of the port you want to use. Note
that you want the kernel-assigned device name, not the one that udev/systemd
finally assign to the device. So for the second example (my laptop), you'd want
either the eth0 or eno1 name (depending on which one was in effect when the
netconsole module initializes).  The initial eth0 is from PCI enumeration, the
rename to eno1 is courtesy of the kernel, and then the rename back to eth0 is by
systemd.  Some experimentation may be needed (I've got a few servers that have
4 1G ports on the motherboard and multiple 10G/40G dual-port cards, so
sometimes the port wired to our management network ends up at eth6 or eth7...)
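
The renames described above can be followed mechanically. A minimal sketch, assuming dmesg text in the format of the second example; the function name `name_at` and the query times are hypothetical, chosen only for illustration. It replays the "renamed from" lines in order and reports which name was in effect at a given time since boot.

```python
import re

# Sample lines copied from the laptop dmesg example in this message.
DMESG = """\
[2.642076] e1000e :00:19.0 eth0: Intel(R) PRO/1000 Network Connection
[5.071095] e1000e :00:19.0 eno1: renamed from eth0
[   44.516004] e1000e :00:19.0 eth0: renamed from eno1
"""

# Captures: timestamp, new name, old name.
RENAME = re.compile(r"^\[\s*(\d+\.\d+)\]\s.*?\s(\S+): renamed from (\S+)\s*$")


def name_at(dmesg_text: str, initial: str, when: float) -> str:
    """Return the interface name in effect `when` seconds after boot."""
    current = initial
    for line in dmesg_text.splitlines():
        match = RENAME.match(line)
        if not match:
            continue
        stamp = float(match.group(1))
        new, old = match.group(2), match.group(3)
        # Apply only renames of this device that happened before `when`.
        if stamp <= when and old == current:
            current = new
    return current


print(name_at(DMESG, "eth0", 3.0))   # eth0
print(name_at(DMESG, "eth0", 10.0))  # eno1
print(name_at(DMESG, "eth0", 50.0))  # eth0
```

The sketch assumes the log lines are in chronological order, as dmesg normally prints them.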

So if the port you want to use gets named eth4 by the kernel, you use
netconsole=64001@192.168.7.60/eth4,64001@192.168.7.55/34:e6:d7:01:2a:dd
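
To make the layout of that string explicit, here is a small sketch that splits it into named fields. It assumes the simple form used in this thread, `src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac`; the class and function names are hypothetical, and the kernel's own parser accepts variations (such as omitted optional fields) that this sketch does not validate.

```python
from typing import NamedTuple


class NetconsoleTarget(NamedTuple):
    src_port: str
    src_ip: str
    dev: str
    tgt_port: str
    tgt_ip: str
    tgt_mac: str


def parse_netconsole(value: str) -> NetconsoleTarget:
    """Split 'src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac' into its fields."""
    local, remote = value.split(",", 1)      # sender half, receiver half
    src_port, rest = local.split("@", 1)
    src_ip, dev = rest.split("/", 1)
    tgt_port, rest = remote.split("@", 1)
    tgt_ip, tgt_mac = rest.split("/", 1)
    return NetconsoleTarget(src_port, src_ip, dev, tgt_port, tgt_ip, tgt_mac)


target = parse_netconsole(
    "64001@192.168.7.60/eth4,64001@192.168.7.55/34:e6:d7:01:2a:dd"
)
print(target.dev)      # eth4
print(target.tgt_mac)  # 34:e6:d7:01:2a:dd
```

Only the `dev` field changes when the port ends up under a different kernel-assigned name.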

(Gory ethernet details follow :)

Remember that strictly speaking, the ethernet device itself doesn't *need* to
know what its IP address is - it only needs to know its own MAC address so it
knows which packets on the wire to accept to hand to the network stack, and
*maybe* a list of other MAC addresses it should accept.

And the hardware already knows its own MAC address.. :)

You can get ethernet devices working with a *very* small set of functions:

0) Tell the kernel your hardware state (link/no link, MAC address, a few other things)
1) Receive packets for your own MAC address
2) Receive broadcast packets
3) Receive packets for another specified MAC address (semi-optional)
4) Receive packets in promiscuous mode (semi-optional)
5) Transmit packet to the MAC address provided

Pretty much everything else can be done in kernelspace (though modern
cards often provide offload of some IP and even TCP handling, interrupt
coalescing, and all sorts of other stuff)
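
The receive rules in items 1 through 4 of the list above amount to a short filter on the destination MAC address. A sketch of that decision, written in Python purely for illustration (real hardware does this in silicon, and the names `mac` and `accepts` are hypothetical):

```python
BROADCAST = bytes([0xFF] * 6)


def mac(text: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in text.split(":"))


def accepts(dest: bytes, own: bytes, extra: frozenset = frozenset(),
            promiscuous: bool = False) -> bool:
    """Decide whether a frame addressed to `dest` is handed to the stack."""
    if promiscuous:        # item 4: take everything on the wire
        return True
    if dest == own:        # item 1: our own MAC address
        return True
    if dest == BROADCAST:  # item 2: broadcast frames
        return True
    return dest in extra   # item 3: other explicitly listed addresses


own = mac("f0:1f:af:0c:8a:da")
other = mac("24:6e:96:10:db:64")
print(accepts(own, own))                       # True
print(accepts(other, own))                     # False
print(accepts(other, own, promiscuous=True))   # True
```

Note that no IP address appears anywhere in the filter, which is the point made above.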

(I learned far too much about minimalist Ethernet when the Clarkson Packet
Drivers were getting created in the cubicle next to mine. :)






Re: Problem with netconsole and eth0 timing

2018-09-26 Thread don fisher


Would you tell me how to tell the driver that it is to be eth0, ip 
address etc. Maybe on linux command line, but I do not know the format.


Thanks
Don



Re: Problem with netconsole and eth0 timing

2018-09-25 Thread valdis . kletnieks
On Tue, 25 Sep 2018 18:26:06 -0700, don fisher said:

> The wicked message eth0: up comes at Sep 24 22:02:01.173616. The
> difference is maybe 36 seconds? There is an eth0: avail message at Sep
> 24 22:01:34.112744, don't know if that would suffice for netconsole Both
> are after netconsole has bailed out. Any obvious solutions I am missing?
> The documentation is pretty clear on how to set this up, so there must
> be some way to get it to work. I could find nothing on Google.

Here's the big clue:

Sep 24 22:01:25 dfpc60 systemd-modules-load[185]: Module 'netconsole' is builtin

which means that it's going to get initialized during *very* early boot, before
the initramfs gets called. netconsole is able to do the equivalent of
'ifconfig' (or at least enough of it to set the IPs/ports/ARP entries before
the rest of networking comes up), but it can't also get the physical device up
and running if the hardware driver isn't present.

Since it's builtin, this is probably a custom-built kernel.  So make sure that
the driver for eth0 is also builtin.
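
One way to check this before rebooting is to look at the kernel's `.config`. A minimal sketch, assuming the usual `CONFIG_FOO=y` / `CONFIG_FOO=m` / `# CONFIG_FOO is not set` file format; the function names are hypothetical, and `CONFIG_ALX` is used only as an example symbol because the alx driver is the one mentioned elsewhere in this thread. Substitute the symbol for your own NIC driver.

```python
def config_state(config_text: str, symbol: str) -> str:
    """Return 'y', 'm', or 'n' for a symbol in kernel .config text."""
    prefix = symbol + "="
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return line[len(prefix):]
    # Absent, or listed as "# CONFIG_FOO is not set": treated as disabled.
    return "n"


def both_builtin(config_text: str, nic_symbol: str) -> bool:
    """True when netconsole and the NIC driver are both compiled in (=y)."""
    return (config_state(config_text, "CONFIG_NETCONSOLE") == "y"
            and config_state(config_text, nic_symbol) == "y")


sample = "CONFIG_NETCONSOLE=y\nCONFIG_ALX=m\n"
print(both_builtin(sample, "CONFIG_ALX"))  # False
```

A result of False with netconsole set to `y` and the driver set to `m` matches the situation described above: netconsole starts before the driver can be loaded.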





Problem with netconsole and eth0 timing

2018-09-25 Thread don fisher
I am trying to use netconsole to examine some problems I am having
booting Alienware 13 R3 and 14 laptops. I am running openSUSE 42.3 with
a generic 4.18.7 kernel configured to include netconsole. After building
my kernel and including the proper command (as shown below) in the linux
boot string:

 netconsole=64001@192.168.7.60/eth0,64001@192.168.7.55/34:e6:d7:01:2a:dd

I receive the following error message, as shown by journalctl:

sudo journalctl -b | grep netconsole
Sep 24 22:01:25 dfpc60 kernel: netpoll: netconsole: local port 64001
Sep 24 22:01:25 dfpc60 kernel: netpoll: netconsole: local IPv4 address 192.168.7.60
Sep 24 22:01:25 dfpc60 kernel: netpoll: netconsole: interface 'eth0'
Sep 24 22:01:25 dfpc60 kernel: netpoll: netconsole: remote port 64001
Sep 24 22:01:25 dfpc60 kernel: netpoll: netconsole: remote IPv4 address 192.168.7.55
Sep 24 22:01:25 dfpc60 kernel: netpoll: netconsole: remote ethernet address 34:e6:d7:01:2a:dd
Sep 24 22:01:25 dfpc60 kernel: netpoll: netconsole: eth0 doesn't exist, aborting
Sep 24 22:01:25 dfpc60 kernel: netconsole: cleaning up
Sep 24 22:01:25 dfpc60 systemd-modules-load[185]: Module 'netconsole' is builtin
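
When comparing boots, it can help to reduce this output to a single verdict. A small sketch, assuming the two message texts seen in this thread ("eth0 doesn't exist, aborting" on failure and "netconsole: network logging started" on success); the function name `netconsole_outcome` is hypothetical, and other kernel versions may word these messages differently.

```python
def netconsole_outcome(journal_lines: list[str]) -> str:
    """Return 'started', 'aborted', or 'unknown' from kernel log lines."""
    outcome = "unknown"
    for line in journal_lines:
        if "netconsole: network logging started" in line:
            outcome = "started"
        elif "netpoll: netconsole:" in line and "aborting" in line:
            outcome = "aborted"
    return outcome


lines = [
    "Sep 24 22:01:25 dfpc60 kernel: netpoll: netconsole: interface 'eth0'",
    "Sep 24 22:01:25 dfpc60 kernel: netpoll: netconsole: eth0 doesn't exist, aborting",
    "Sep 24 22:01:25 dfpc60 kernel: netconsole: cleaning up",
]
print(netconsole_outcome(lines))  # aborted
```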


I examined the timing in more detail and found that the above "eth0
doesn't exist" message comes at Sep 24 22:01:25.080603.


The wicked message "eth0: up" comes at Sep 24 22:02:01.173616. The
difference is about 36 seconds. There is an "eth0: avail" message at Sep
24 22:01:34.112744; I don't know if that would suffice for netconsole.
Both are after netconsole has bailed out. Any obvious solutions I am
missing? The documentation is pretty clear on how to set this up, so
there must be some way to get it to work. I could find nothing on Google.
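
The gaps between those timestamps can be worked out exactly. A short sketch using only the three times quoted in this message; the helper name `seconds_between` is hypothetical, and the year 2018 is supplied from the thread date because the journal timestamps do not carry one.

```python
from datetime import datetime

FMT = "%Y %b %d %H:%M:%S.%f"


def seconds_between(earlier: str, later: str, year: int = 2018) -> float:
    """Elapsed seconds between two journal-style timestamps in the same year."""
    start = datetime.strptime(f"{year} {earlier}", FMT)
    end = datetime.strptime(f"{year} {later}", FMT)
    return (end - start).total_seconds()


abort = "Sep 24 22:01:25.080603"  # netconsole gives up on eth0
avail = "Sep 24 22:01:34.112744"  # wicked reports eth0: avail
up = "Sep 24 22:02:01.173616"     # wicked reports eth0: up

print(round(seconds_between(abort, avail), 6))  # 9.032141
print(round(seconds_between(abort, up), 6))     # 36.093013
```

So the interface first appears about 9 seconds after netconsole has aborted, and is fully up about 36 seconds after.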


This is my first post to this list, so if this question is not 
appropriate please let me know without flames:-)


Don
