Re: Re: reproducible kernel crash with quota

2022-04-26 Thread 6bone

Hello everyone,

I made a mistake when testing the two patches. The two patches do solve the 
problem; with them applied, the kernel no longer seems to crash.



Thank you for your efforts


Regards
Uwe


On Thu, 21 Apr 2022, J. Hannken-Illjes wrote:


Date: Thu, 21 Apr 2022 14:09:03 +0200
From: J. Hannken-Illjes 
To: 6b...@6bone.informatik.uni-leipzig.de
Cc: current-users@netbsd.org, Manuel Bouyer 
Subject: [Extern] Re: reproducible kernel crash with quota


On 21. Apr 2022, at 00:36, 6b...@6bone.informatik.uni-leipzig.de wrote:

On Wed, 20 Apr 2022, J. Hannken-Illjes wrote:


Date: Wed, 20 Apr 2022 22:19:30 +0200
From: J. Hannken-Illjes 
To: 6b...@6bone.informatik.uni-leipzig.de
Cc: current-users@netbsd.org, Manuel Bouyer 
Subject: [Extern] Re: reproducible kernel crash with quota

On 20. Apr 2022, at 22:10, 6b...@6bone.informatik.uni-leipzig.de wrote:

On Tue, 19 Apr 2022, J. Hannken-Illjes wrote:


Date: Tue, 19 Apr 2022 11:07:48 +0200
From: J. Hannken-Illjes 
To: 6b...@6bone.informatik.uni-leipzig.de
Cc: current-users@netbsd.org, Manuel Bouyer 
Subject: [Extern] Re: reproducible kernel crash with quota

On 19. Apr 2022, at 08:38, 6b...@6bone.informatik.uni-leipzig.de wrote:



Please try again with both diffs applied.


I tested with both patches. If I enable only userquota, it seems to work. If I 
also enable groupquota, the kernel crashes:

output:

/etc/rc.d/quota restart
Checking quotas:quotacheck: creating quota file //quota.group


You have root (/) with quota?  What exactly do you have in /etc/fstab?


cat /etc/fstab
# NetBSD /etc/fstab
# See /usr/share/examples/fstab/ for more examples.
NAME=179d5ca2-7f26-476b-b544-823bd1849816   /   ffs   rw,userquota,groupquota  1 1


I'm confused.  With "/dev/ld0a / ffs rw,userquota,groupquota 1 1"
in /etc/fstab and both patches applied I get:

$ /etc/rc.d/quota restart
Checking quotas: done.

No line "creating quota file ..."

--
J. Hannken-Illjes - hann...@mailbox.org



Re: Re: reproducible kernel crash with quota

2022-04-25 Thread 6bone

On Thu, 21 Apr 2022, J. Hannken-Illjes wrote:


Date: Thu, 21 Apr 2022 14:09:03 +0200
From: J. Hannken-Illjes 
To: 6b...@6bone.informatik.uni-leipzig.de
Cc: current-users@netbsd.org, Manuel Bouyer 
Subject: [Extern] Re: reproducible kernel crash with quota


On 21. Apr 2022, at 00:36, 6b...@6bone.informatik.uni-leipzig.de wrote:

On Wed, 20 Apr 2022, J. Hannken-Illjes wrote:


Date: Wed, 20 Apr 2022 22:19:30 +0200
From: J. Hannken-Illjes 
To: 6b...@6bone.informatik.uni-leipzig.de
Cc: current-users@netbsd.org, Manuel Bouyer 
Subject: [Extern] Re: reproducible kernel crash with quota

On 20. Apr 2022, at 22:10, 6b...@6bone.informatik.uni-leipzig.de wrote:

On Tue, 19 Apr 2022, J. Hannken-Illjes wrote:


Date: Tue, 19 Apr 2022 11:07:48 +0200
From: J. Hannken-Illjes 
To: 6b...@6bone.informatik.uni-leipzig.de
Cc: current-users@netbsd.org, Manuel Bouyer 
Subject: [Extern] Re: reproducible kernel crash with quota

On 19. Apr 2022, at 08:38, 6b...@6bone.informatik.uni-leipzig.de wrote:



Please try again with both diffs applied.


I tested with both patches. If I enable only userquota, it seems to work. If I 
also enable groupquota, the kernel crashes:

output:

/etc/rc.d/quota restart
Checking quotas:quotacheck: creating quota file //quota.group


You have root (/) with quota?  What exactly do you have in /etc/fstab?


cat /etc/fstab
# NetBSD /etc/fstab
# See /usr/share/examples/fstab/ for more examples.
NAME=179d5ca2-7f26-476b-b544-823bd1849816   /   ffs   rw,userquota,groupquota  1 1


I'm confused.  With "/dev/ld0a / ffs rw,userquota,groupquota 1 1"
in /etc/fstab and both patches applied I get:

$ /etc/rc.d/quota restart
Checking quotas: done.

No line "creating quota file ..."


Sorry, my mistake! I had deleted the quota.group file because I wanted to 
make sure that the problem wasn't caused by a corrupt file.


I tested again with the current sources and the two patches. The crash can 
still be reproduced.



Thank you for your efforts

Regards
Uwe




--
J. Hannken-Illjes - hann...@mailbox.org



Re: Re: reproducible kernel crash with quota

2022-04-20 Thread 6bone

On Wed, 20 Apr 2022, J. Hannken-Illjes wrote:


Date: Wed, 20 Apr 2022 22:19:30 +0200
From: J. Hannken-Illjes 
To: 6b...@6bone.informatik.uni-leipzig.de
Cc: current-users@netbsd.org, Manuel Bouyer 
Subject: [Extern] Re: reproducible kernel crash with quota


On 20. Apr 2022, at 22:10, 6b...@6bone.informatik.uni-leipzig.de wrote:

On Tue, 19 Apr 2022, J. Hannken-Illjes wrote:


Date: Tue, 19 Apr 2022 11:07:48 +0200
From: J. Hannken-Illjes 
To: 6b...@6bone.informatik.uni-leipzig.de
Cc: current-users@netbsd.org, Manuel Bouyer 
Subject: [Extern] Re: reproducible kernel crash with quota

On 19. Apr 2022, at 08:38, 6b...@6bone.informatik.uni-leipzig.de wrote:

On Thu, 14 Apr 2022, J. Hannken-Illjes wrote:


Date: Thu, 14 Apr 2022 13:09:02 +0200
From: J. Hannken-Illjes 
To: 6b...@6bone.informatik.uni-leipzig.de
Cc: current-users@netbsd.org, Manuel Bouyer 
Subject: [Extern] Re: reproducible kernel crash with quota

On 12. Apr 2022, at 08:52, 6b...@6bone.informatik.uni-leipzig.de wrote:

Hello,

since I already have some open bugs with reproducible kernel crashes, I'm only 
writing this to the mailing list.

how to reproduce the crash: /etc/rc.d/quota restart

dmesg:

[   412.047595] panic: kernel diagnostic assertion "dq->dq_ump->um_quotas[dq->dq_type] != vp" failed: file "/usr/src/sys/ufs/ufs/ufs_quota.c", line 978
[   412.047595] cpu8: Begin traceback...
[   412.047595] vpanic() at netbsd:vpanic+0x156
[   412.057595] kern_assert() at netbsd:kern_assert+0x4b
[   412.057595] dqflush() at netbsd:dqflush+0x92
[   412.057595] quota1_handle_cmd_quotaoff() at netbsd:quota1_handle_cmd_quotaoff+0x120
[   412.057595] ufs_quotactl() at netbsd:ufs_quotactl+0x3d
[   412.057595] VFS_QUOTACTL() at netbsd:VFS_QUOTACTL+0x22
[   412.057595] vfs_quotactl_quotaoff() at netbsd:vfs_quotactl_quotaoff+0x1b
[   412.057595] do_sys_quotactl() at netbsd:do_sys_quotactl+0xf1
[   412.067595] sys___quotactl() at netbsd:sys___quotactl+0x2e
[   412.067595] syscall() at netbsd:syscall+0x196
[   412.067595] --- syscall (number 473) ---
[   412.067595] netbsd:syscall+0x196:
[   412.067595] cpu8: End traceback...

[   412.067595] dumping to dev 168,1 (offset=8, size=33425953):
[   412.067595] dump


(gdb) target kvm netbsd.1.core



I'm quite sure you have a /etc/fstab with "userquota,groupquota", yes?

with gdb:

frame 4 (dqflush())
print dq->dq_ump->um_quotas[0]
print dq->dq_ump->um_quotas[1]

gives the same vnode address for both fields, yes?

If this is the case the attached diff should help, since 2012-01-30
group quota got enabled on the user quota file.

As a workaround you could try to name the quota files in /etc/fstab
like "groupquota=XXX/quota.group".
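To make the diagnosis above concrete, here is a heavily simplified C model of what goes wrong when one vnode backs both quota types. It is a hypothetical sketch, not the NetBSD ufs_quota code: the types `vnode_model`, `ufsmount_model` and `dquot_model` and the functions `dqflush_model()` and `quotaoff_model()` are invented for illustration, and a `true` return stands in for the KASSERT at ufs_quota.c line 978 firing. The order of operations in `quotaoff_model()` (detach the quota file vnode, then flush) is an assumption based on quota1_handle_cmd_quotaoff() calling dqflush() in the backtrace.

```c
#include <assert.h>   /* used by the checks in the usage example */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; the real definitions live in sys/ufs/ufs. */
enum { USRQUOTA = 0, GRPQUOTA = 1, MAXQUOTAS = 2 };

struct vnode_model { int id; };

struct ufsmount_model {
    struct vnode_model *um_quotas[MAXQUOTAS]; /* quota file vnode per type */
};

struct dquot_model {
    struct ufsmount_model *dq_ump;
    int dq_type;
};

/* Stands in for the check in dqflush(vp): returns true where the
 * assertion "um_quotas[dq->dq_type] != vp" would fail. */
static bool dqflush_model(const struct dquot_model *dqs, size_t n,
                          const struct vnode_model *vp)
{
    for (size_t i = 0; i < n; i++)
        if (dqs[i].dq_ump->um_quotas[dqs[i].dq_type] == vp)
            return true;
    return false;
}

/* Turning off one quota type: clear its slot, then flush its vnode. */
static bool quotaoff_model(struct ufsmount_model *ump, int type,
                           const struct dquot_model *dqs, size_t n)
{
    struct vnode_model *qvp = ump->um_quotas[type];
    ump->um_quotas[type] = NULL;
    return dqflush_model(dqs, n, qvp);
}
```

If um_quotas[0] and um_quotas[1] point at the same vnode, turning off group quota (type=1, as in frame #5 of the backtrace) leaves the user quota slot still pointing at the vnode being flushed, so the check trips. With separate quota.user and quota.group vnodes it does not.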


You are right. I use groupquota and userquota in fstab. I tested the patch. 
With the patch there is no crash, but /etc/rc.d/quota restart now hangs the 
file system; the only way out is to power off the server. This also 
happens when I use only userquota in fstab.


Sorry, forgot the second diff (now attached) that prevents looping
when taking the quota off on a modified file system.

Please try again with both diffs applied.


I tested with both patches. If I enable only userquota, it seems to work. If I 
also enable groupquota, the kernel crashes:

output:

/etc/rc.d/quota restart
Checking quotas:quotacheck: creating quota file //quota.group


You have root (/) with quota?  What exactly do you have in /etc/fstab?


cat /etc/fstab
# NetBSD /etc/fstab
# See /usr/share/examples/fstab/ for more examples.
NAME=179d5ca2-7f26-476b-b544-823bd1849816   /   ffs   rw,userquota,groupquota  1 1
NAME=088fb0c9-d042-451d-b768-39d9c1f44a17   none   swap   sw,dp   0 0

tmpfs   /tmp   tmpfs   rw,-m=1777,-s=ram%15
kernfs  /kern   kernfs  rw
ptyfs   /dev/pts   ptyfs   rw
procfs  /proc   procfs  rw
/dev/cd0a   /cdrom  cd9660  ro,noauto
tmpfs   /var/shm   tmpfs   rw,-m1777,-sram%15


done.

-> crash


Are "dq->dq_ump->um_quotas[0]" and "dq->dq_ump->um_quotas[1]" now different?


[   448.325252] panic: kernel diagnostic assertion "dq->dq_ump->um_quotas[dq->dq_type] != vp" failed: file "/usr/src/sys/ufs/ufs/ufs_quota.c", line 978
[   448.325252] cpu1: Begin traceback...
[   448.325252] vpanic() at netbsd:vpanic+0x156
[   448.325252] kern_assert() at netbsd:kern_assert+0x4b
[   448.325252] dqflush() at netbsd:dqflush+0x92
[   448.335252] quota1_handle_cmd_quotaoff() at netbsd:quota1_handle_cmd_quotaoff+0x117
[   448.335252] ufs_quotactl() at netbsd:ufs_quotactl+0x3d
[   448.335252] VFS_QUOTACTL() at netbsd:VFS_QUOTACTL+0x22
[   448.335252] vfs_quotactl_quotaoff() at netbsd:vfs_quotactl_quotaoff+0x1b
[   448.335252] do_sys_quotactl() at netbsd:do_sys_quotactl+0xf1
[   448.335252] sys___quotactl() at netbsd:sys___quotactl+0x2e
[   448.345252] syscall() at netbsd:syscall+0x196
[   448.345252] --- syscall (number 473) ---
[   448.345252] netbsd:syscall+0x196:
[   448.345252] cpu1: End traceback...


Re: Re: reproducible kernel crash with quota

2022-04-20 Thread 6bone

On Tue, 19 Apr 2022, J. Hannken-Illjes wrote:


Date: Tue, 19 Apr 2022 11:07:48 +0200
From: J. Hannken-Illjes 
To: 6b...@6bone.informatik.uni-leipzig.de
Cc: current-users@netbsd.org, Manuel Bouyer 
Subject: [Extern] Re: reproducible kernel crash with quota


On 19. Apr 2022, at 08:38, 6b...@6bone.informatik.uni-leipzig.de wrote:

On Thu, 14 Apr 2022, J. Hannken-Illjes wrote:


Date: Thu, 14 Apr 2022 13:09:02 +0200
From: J. Hannken-Illjes 
To: 6b...@6bone.informatik.uni-leipzig.de
Cc: current-users@netbsd.org, Manuel Bouyer 
Subject: [Extern] Re: reproducible kernel crash with quota

On 12. Apr 2022, at 08:52, 6b...@6bone.informatik.uni-leipzig.de wrote:

Hello,

since I already have some open bugs with reproducible kernel crashes, I'm only 
writing this to the mailing list.

how to reproduce the crash: /etc/rc.d/quota restart

dmesg:

[   412.047595] panic: kernel diagnostic assertion "dq->dq_ump->um_quotas[dq->dq_type] != vp" failed: file "/usr/src/sys/ufs/ufs/ufs_quota.c", line 978
[   412.047595] cpu8: Begin traceback...
[   412.047595] vpanic() at netbsd:vpanic+0x156
[   412.057595] kern_assert() at netbsd:kern_assert+0x4b
[   412.057595] dqflush() at netbsd:dqflush+0x92
[   412.057595] quota1_handle_cmd_quotaoff() at netbsd:quota1_handle_cmd_quotaoff+0x120
[   412.057595] ufs_quotactl() at netbsd:ufs_quotactl+0x3d
[   412.057595] VFS_QUOTACTL() at netbsd:VFS_QUOTACTL+0x22
[   412.057595] vfs_quotactl_quotaoff() at netbsd:vfs_quotactl_quotaoff+0x1b
[   412.057595] do_sys_quotactl() at netbsd:do_sys_quotactl+0xf1
[   412.067595] sys___quotactl() at netbsd:sys___quotactl+0x2e
[   412.067595] syscall() at netbsd:syscall+0x196
[   412.067595] --- syscall (number 473) ---
[   412.067595] netbsd:syscall+0x196:
[   412.067595] cpu8: End traceback...

[   412.067595] dumping to dev 168,1 (offset=8, size=33425953):
[   412.067595] dump


(gdb) target kvm netbsd.1.core



I'm quite sure you have a /etc/fstab with "userquota,groupquota", yes?

with gdb:

frame 4 (dqflush())
print dq->dq_ump->um_quotas[0]
print dq->dq_ump->um_quotas[1]

gives the same vnode address for both fields, yes?

If this is the case the attached diff should help, since 2012-01-30
group quota got enabled on the user quota file.

As a workaround you could try to name the quota files in /etc/fstab
like "groupquota=XXX/quota.group".


You are right. I use groupquota and userquota in fstab. I tested the patch. 
With the patch there is no crash, but /etc/rc.d/quota restart now hangs the 
file system; the only way out is to power off the server. This also 
happens when I use only userquota in fstab.


Sorry, forgot the second diff (now attached) that prevents looping
when taking the quota off on a modified file system.

Please try again with both diffs applied.


I tested with both patches. If I enable only userquota, it seems to work. 
If I also enable groupquota, the kernel crashes:


output:

 /etc/rc.d/quota restart
Checking quotas:quotacheck: creating quota file //quota.group
 done.

-> crash

[   448.325252] panic: kernel diagnostic assertion "dq->dq_ump->um_quotas[dq->dq_type] != vp" failed: file "/usr/src/sys/ufs/ufs/ufs_quota.c", line 978
[   448.325252] cpu1: Begin traceback...
[   448.325252] vpanic() at netbsd:vpanic+0x156
[   448.325252] kern_assert() at netbsd:kern_assert+0x4b
[   448.325252] dqflush() at netbsd:dqflush+0x92
[   448.335252] quota1_handle_cmd_quotaoff() at netbsd:quota1_handle_cmd_quotaoff+0x117
[   448.335252] ufs_quotactl() at netbsd:ufs_quotactl+0x3d
[   448.335252] VFS_QUOTACTL() at netbsd:VFS_QUOTACTL+0x22
[   448.335252] vfs_quotactl_quotaoff() at netbsd:vfs_quotactl_quotaoff+0x1b
[   448.335252] do_sys_quotactl() at netbsd:do_sys_quotactl+0xf1
[   448.335252] sys___quotactl() at netbsd:sys___quotactl+0x2e
[   448.345252] syscall() at netbsd:syscall+0x196
[   448.345252] --- syscall (number 473) ---
[   448.345252] netbsd:syscall+0x196:
[   448.345252] cpu1: End traceback...

[   448.345252] dumping to dev 168,1 (offset=8, size=33425953):
[   448.345252] dump



Thank you for your efforts

Regards
Uwe






Maybe someone can fix the problem.


Thank you for your efforts


Regards
Uwe


--
J. Hannken-Illjes - hann...@mailbox.org



--
J. Hannken-Illjes - hann...@mailbox.org




Re: Re: reproducible kernel crash with quota

2022-04-19 Thread 6bone

On Thu, 14 Apr 2022, J. Hannken-Illjes wrote:


Date: Thu, 14 Apr 2022 13:09:02 +0200
From: J. Hannken-Illjes 
To: 6b...@6bone.informatik.uni-leipzig.de
Cc: current-users@netbsd.org, Manuel Bouyer 
Subject: [Extern] Re: reproducible kernel crash with quota


On 12. Apr 2022, at 08:52, 6b...@6bone.informatik.uni-leipzig.de wrote:

Hello,

since I already have some open bugs with reproducible kernel crashes, I'm only 
writing this to the mailing list.

how to reproduce the crash: /etc/rc.d/quota restart

dmesg:

[   412.047595] panic: kernel diagnostic assertion "dq->dq_ump->um_quotas[dq->dq_type] != vp" failed: file "/usr/src/sys/ufs/ufs/ufs_quota.c", line 978
[   412.047595] cpu8: Begin traceback...
[   412.047595] vpanic() at netbsd:vpanic+0x156
[   412.057595] kern_assert() at netbsd:kern_assert+0x4b
[   412.057595] dqflush() at netbsd:dqflush+0x92
[   412.057595] quota1_handle_cmd_quotaoff() at netbsd:quota1_handle_cmd_quotaoff+0x120
[   412.057595] ufs_quotactl() at netbsd:ufs_quotactl+0x3d
[   412.057595] VFS_QUOTACTL() at netbsd:VFS_QUOTACTL+0x22
[   412.057595] vfs_quotactl_quotaoff() at netbsd:vfs_quotactl_quotaoff+0x1b
[   412.057595] do_sys_quotactl() at netbsd:do_sys_quotactl+0xf1
[   412.067595] sys___quotactl() at netbsd:sys___quotactl+0x2e
[   412.067595] syscall() at netbsd:syscall+0x196
[   412.067595] --- syscall (number 473) ---
[   412.067595] netbsd:syscall+0x196:
[   412.067595] cpu8: End traceback...

[   412.067595] dumping to dev 168,1 (offset=8, size=33425953):
[   412.067595] dump


(gdb) target kvm netbsd.1.core



I'm quite sure you have a /etc/fstab with "userquota,groupquota", yes?

with gdb:

frame 4 (dqflush())
print dq->dq_ump->um_quotas[0]
print dq->dq_ump->um_quotas[1]

gives the same vnode address for both fields, yes?

If this is the case the attached diff should help, since 2012-01-30
group quota got enabled on the user quota file.

As a workaround you could try to name the quota files in /etc/fstab
like "groupquota=XXX/quota.group".


You are right. I use groupquota and userquota in fstab. I tested the 
patch. With the patch there is no crash, but /etc/rc.d/quota restart now 
hangs the file system; the only way out is to power off the server. This 
also happens when I use only userquota in fstab.



Thank you for your efforts

Regards
Uwe






Maybe someone can fix the problem.


Thank you for your efforts


Regards
Uwe


--
J. Hannken-Illjes - hann...@mailbox.org



reproducible kernel crash with quota

2022-04-12 Thread 6bone

Hello,

since I already have some open bugs with reproducible kernel crashes, I'm 
only writing this to the mailing list.


how to reproduce the crash: /etc/rc.d/quota restart

dmesg:

[   412.047595] panic: kernel diagnostic assertion "dq->dq_ump->um_quotas[dq->dq_type] != vp" failed: file "/usr/src/sys/ufs/ufs/ufs_quota.c", line 978
[   412.047595] cpu8: Begin traceback...
[   412.047595] vpanic() at netbsd:vpanic+0x156
[   412.057595] kern_assert() at netbsd:kern_assert+0x4b
[   412.057595] dqflush() at netbsd:dqflush+0x92
[   412.057595] quota1_handle_cmd_quotaoff() at netbsd:quota1_handle_cmd_quotaoff+0x120
[   412.057595] ufs_quotactl() at netbsd:ufs_quotactl+0x3d
[   412.057595] VFS_QUOTACTL() at netbsd:VFS_QUOTACTL+0x22
[   412.057595] vfs_quotactl_quotaoff() at netbsd:vfs_quotactl_quotaoff+0x1b
[   412.057595] do_sys_quotactl() at netbsd:do_sys_quotactl+0xf1
[   412.067595] sys___quotactl() at netbsd:sys___quotactl+0x2e
[   412.067595] syscall() at netbsd:syscall+0x196
[   412.067595] --- syscall (number 473) ---
[   412.067595] netbsd:syscall+0x196:
[   412.067595] cpu8: End traceback...

[   412.067595] dumping to dev 168,1 (offset=8, size=33425953):
[   412.067595] dump


(gdb) target kvm netbsd.1.core

0x80226145 in cpu_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0) at /usr/src/sys/arch/amd64/amd64/machdep.c:720
720 dumpsys();
(gdb) bt
#0  0x80226145 in cpu_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0) at /usr/src/sys/arch/amd64/amd64/machdep.c:720
#1  0x80d39a67 in kern_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0) at /usr/src/sys/kern/kern_reboot.c:73
#2  0x80d7e722 in vpanic (fmt=0x81390ac8 "kernel %sassertion \"%s\" failed: file \"%s\", line %d ", ap=ap@entry=0xb30927a7bcc8) at /usr/src/sys/kern/subr_prf.c:290
#3  0x80f3e47f in kern_assert (fmt=fmt@entry=0x81390ac8 "kernel %sassertion \"%s\" failed: file \"%s\", line %d ") at /usr/src/sys/lib/libkern/kern_assert.c:51
#4  0x80cbc937 in dqflush (vp=0xa7149294cd00) at /usr/src/sys/ufs/ufs/ufs_quota.c:978
#5  0x80cbd173 in quota1_handle_cmd_quotaoff (l=l@entry=0xa7147954eb80, ump=0xa7149627a900, type=1) at /usr/src/sys/ufs/ufs/ufs_quota1.c:461
#6  0x80cbbbd1 in quota_handle_cmd_quotaoff (args=0xb30927a7be30, l=0xa7147954eb80, mp=0xa7149626f000) at /usr/src/sys/ufs/ufs/ufs_quota.c:675
#7  0x80cc480c in ufs_quotactl (mp=0xa7149626f000, args=0xb30927a7be30) at /usr/src/sys/ufs/ufs/ufs_vfsops.c:142
#8  0x80ddf825 in VFS_QUOTACTL (mp=mp@entry=0xa7149626f000, args=args@entry=0xb30927a7be30) at /usr/src/sys/kern/vfs_subr.c:1449
#9  0x80ddce21 in vfs_quotactl_quotaoff (mp=mp@entry=0xa7149626f000, idtype=) at /usr/src/sys/kern/vfs_quotactl.c:193
#10 0x80de3071 in do_sys_quotactl_quotaoff (idtype=out>, mp=0xa7149626f000) at /usr/src/sys/kern/vfs_syscalls.c:1125
#11 do_sys_quotactl (path_u=, args=args@entry=0xb30927a7bf50) at /usr/src/sys/kern/vfs_syscalls.c:1203
#12 0x80de34df in sys___quotactl (l=, uap=0xb30927a7c000, retval=) at /usr/src/sys/kern/vfs_syscalls.c:1232
#13 0x805726ee in sy_call (rval=0xb30927a7bfb0, uap=0xb30927a7c000, l=0xa7147954eb80, sy=0x81888338 ) at /usr/src/sys/sys/syscallvar.h:65
#14 sy_invoke (code=473, rval=0xb30927a7bfb0, uap=0xb30927a7c000, l=0xa7147954eb80, sy=0x81888338 ) at /usr/src/sys/sys/syscallvar.h:94
#15 syscall (frame=0xb30927a7c000) at /usr/src/sys/arch/x86/x86/syscall.c:138
#16 0x8020b25d in handle_syscall ()
#17 0x76b67bfbf010 in ?? ()
#18 0x7f7fffe65c90 in ?? ()
#19 0x0001 in ?? ()
#20 0x76b67aff94eb in ?? ()
#21 0x in ?? ()

Maybe someone can fix the problem.


Thank you for your efforts


Regards
Uwe


Re: Re: Crash on various Supermicro motherboards

2022-04-08 Thread 6bone

Here is the CPU type:

https://speicherwolke.uni-leipzig.de/index.php/s/SM6LQqKPqKYeCqM


Regards
Uwe

On Thu, 7 Apr 2022, Christos Zoulas wrote:


Date: Thu, 7 Apr 2022 20:36:26 - (UTC)
From: Christos Zoulas 
To: current-users@netbsd.org
Subject: [Extern] Re: Crash on various Supermicro motherboards

In article ,
<6b...@6bone.informatik.uni-leipzig.de> wrote:

Hello,

I now have the backtrace:

https://speicherwolke.uni-leipzig.de/index.php/s/cFXAbL6axwHpKkL


What CPUs are these? I don't see the cpu lines in the avi...

christos



Re: Re: Re: Crash on various Supermicro motherboards

2022-04-07 Thread 6bone

Hello,

I now have the backtrace:

https://speicherwolke.uni-leipzig.de/index.php/s/cFXAbL6axwHpKkL


Thank you for your efforts


Regards
Uwe


On Wed, 23 Mar 2022, Paul Goyette wrote:


Date: Wed, 23 Mar 2022 14:19:50 -0700 (PDT)
From: Paul Goyette 
To: 6b...@6bone.informatik.uni-leipzig.de
Cc: current-users@netbsd.org
Subject: [Extern] Re: Re: Crash on various Supermicro motherboards

On Wed, 23 Mar 2022, 6b...@6bone.informatik.uni-leipzig.de wrote:


On Wed, 23 Mar 2022, Paul Goyette wrote:


Date: Wed, 23 Mar 2022 08:50:03 -0700 (PDT)
From: Paul Goyette 
To: 6b...@6bone.informatik.uni-leipzig.de
Cc: current-users@netbsd.org
Subject: [Extern] Re: Crash on various Supermicro motherboards

On Wed, 23 Mar 2022, 6b...@6bone.informatik.uni-leipzig.de wrote:

I can't provide a dump. The kernel drops into ddb, which does not accept 
input from USB devices.


Recompile a kernel with ``options DDB_COMMANDONENTER="reboot 0x100"''


Is reboot 0x100 supposed to create a kernel dump? I'm afraid that won't work: 
no drive or swap is mounted at the time of the crash.


The config line should help

config netbsd root on wd0a dump on wd0b

(of course use the correct drive designation)

also try setting up a serial-console connection to another machine
so you can capture the console messages


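+--------------------+--------------------------+----------------------+
| Paul Goyette       | PGP Key fingerprint:     | E-mail addresses:    |
| (Retired)          | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com    |
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org  |
| & Network Engineer |                          | pgoyett...@gmail.com |
+--------------------+--------------------------+----------------------+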



Re: Re: Crash on various Supermicro motherboards

2022-03-23 Thread 6bone

On Wed, 23 Mar 2022, Paul Goyette wrote:


Date: Wed, 23 Mar 2022 08:50:03 -0700 (PDT)
From: Paul Goyette 
To: 6b...@6bone.informatik.uni-leipzig.de
Cc: current-users@netbsd.org
Subject: [Extern] Re: Crash on various Supermicro motherboards

On Wed, 23 Mar 2022, 6b...@6bone.informatik.uni-leipzig.de wrote:

I can't provide a dump. The kernel drops into ddb, which does not accept 
input from USB devices.


Recompile a kernel with ``options DDB_COMMANDONENTER="reboot 0x100"''


Is reboot 0x100 supposed to create a kernel dump? I'm afraid that won't work: 
no drive or swap is mounted at the time of the crash.








Crash on various Supermicro motherboards

2022-03-23 Thread 6bone

Hello,

When trying to install netbsd-current on different Supermicro servers, 
reproducible crashes occur.


A concrete example:

Serverboard: Supermicro X11DPL-i
BIOS version 3.1

If the BIOS option "Extended APIC" is disabled, everything works as 
expected. If the option is enabled, the kernel crashes while booting.


The manual says:

###
Extended APIC

Select Enable to activate APIC (Advanced Programmable Interrupt 
Controller) support. The options are Disable and Enable.

###

I can't provide a dump. The kernel drops into ddb, which does not accept 
input from USB devices.


Is this a bug, or does NetBSD not support this feature?


Thank you for your efforts


Regards
Uwe


Re: Re: reproducible kernel crash with the agr interface

2022-02-22 Thread 6bone

Hello,

for me the lagg interface works perfectly.

Thanks for the tip.


Regards
Uwe


On Mon, 21 Feb 2022, Shoichi Yamaguchi wrote:


Date: Mon, 21 Feb 2022 16:18:59 +0900
From: Shoichi Yamaguchi 
To: 6b...@6bone.informatik.uni-leipzig.de
Cc: current-users@netbsd.org
Subject: [Extern] Re: reproducible kernel crash with the agr interface

Hi,

This panic may be caused by adding an interface to agr(4)
after another interface has been added to agr(4) and then removed.


ifconfig agr0 create
ifconfig agr0 agrport wm1


ifp->if_hwdl of agr0 is created when wm1 is added.


ifconfig agr0 -agrport wm1


ifp->if_hwdl of agr0 is not deleted when wm1 is removed.


ifconfig agr0 agrport bnx1


ifp->if_hwdl is created again while the old one still exists, so the
assertion "ifp->if_hwdl == NULL" in if_set_sadl() fails and the kernel panics.
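The sequence described above can be modelled in a few lines of C. This is a hypothetical, simplified sketch, not agr(4) or if.c code: `ifnet_model`, `set_sadl_model()`, `remove_port_buggy()` and `remove_port_fixed()` are invented names, a return value of -1 stands in for the KASSERT(ifp->if_hwdl == NULL) in if_set_sadl() firing, and `remove_port_fixed()` only illustrates one possible direction for a fix, not an actual patch.

```c
#include <assert.h>   /* used by the checks in the usage example */
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ifnet; only the field that matters here. */
struct ifnet_model {
    void *if_hwdl; /* placeholder for the hardware link-level address record */
};

/* Models if_set_sadl(): returns -1 where the real kernel would panic,
 * -2 on allocation failure, 0 on success. */
static int set_sadl_model(struct ifnet_model *ifp)
{
    if (ifp->if_hwdl != NULL)
        return -1; /* KASSERT(ifp->if_hwdl == NULL) would fail here */
    ifp->if_hwdl = malloc(1);
    return ifp->if_hwdl != NULL ? 0 : -2;
}

/* Port removal as described in the reply: if_hwdl is left in place. */
static void remove_port_buggy(struct ifnet_model *ifp)
{
    (void)ifp;
}

/* Illustrative alternative: release the record when the port goes away. */
static void remove_port_fixed(struct ifnet_model *ifp)
{
    free(ifp->if_hwdl);
    ifp->if_hwdl = NULL;
}
```

Adding a port (wm1), removing it with the buggy path, and adding another port (bnx1) hits the -1 case, matching the reported panic; with the record released on removal, the second add succeeds.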

You may be able to use lagg(4) to avoid the panic.

% ifconfig lagg0 create
% ifconfig lagg0 laggproto lacp
% ifconfig lagg0 laggport wm1
% ifconfig lagg0 -laggport wm1
% ifconfig lagg0 laggport bnx1

-- yamaguchi



reproducible kernel crash with the agr interface

2022-02-18 Thread 6bone

Hello,

I have a system with bnx and wm interfaces. When I try the following the 
kernel crashes.


ifconfig agr0 create
ifconfig agr0 agrport wm1
ifconfig agr0 -agrport wm1
ifconfig agr0 agrport bnx1
->dump


[ 22261.159501] panic: kernel diagnostic assertion "ifp->if_hwdl == NULL" failed: file "/usr/src/sys/net/if.c", line 456
[ 22261.159501] cpu1: Begin traceback...
[ 22261.159501] vpanic() at netbsd:vpanic+0x156
[ 22261.159501] kern_assert() at netbsd:kern_assert+0x4b
[ 22261.159501] if_set_sadl() at netbsd:if_set_sadl+0x9b
[ 22261.159501] ether_ifattach() at netbsd:ether_ifattach+0x78
[ 22261.159501] agrether_ctor() at netbsd:agrether_ctor+0x6d
[ 22261.159501] agr_ioctl() at netbsd:agr_ioctl+0x977
[ 22261.169542] doifioctl() at netbsd:doifioctl+0x307
[ 22261.169542] sys_ioctl() at netbsd:sys_ioctl+0x56d
[ 22261.169542] syscall() at netbsd:syscall+0x196
[ 22261.169542] --- syscall (number 54) ---
[ 22261.169542] netbsd:syscall+0x196:
[ 22261.169542] cpu1: End traceback...

[ 22261.169542] dumping to dev 4,1 (offset=7071, size=12581616):
[ 22261.169542] dump


Is this information enough to reproduce the problem, or do I need to do more 
work?



Thank you for your efforts

Regards
Uwe



Re: Re: Re: Bug or no Bug?

2022-02-11 Thread 6bone

Hello,

Since the server doesn't have an Nvidia graphics card, I tried to find 
another cause. In my case the problem seems to occur when starting the 
bnx* network interface.


If LOCKDEBUG is enabled in the kernel, the kernel crashes when the bnx* 
network card is activated. With LOCKDEBUG but without networking, the server 
starts fine.


Without LOCKDEBUG everything runs stably.


Thank you for your efforts

Regards
Uwe

On Thu, 10 Feb 2022, Martin Husemann wrote:


Date: Thu, 10 Feb 2022 13:58:59 +0100
From: Martin Husemann 
To: 6b...@6bone.informatik.uni-leipzig.de
Cc: Manuel Bouyer , current-users@netbsd.org
Subject: [Extern] Re: Re: Bug or no Bug?

The kernel lock is held too long while the graphics card is configured.
I have seen that with some Nvidia cards, where I just have not been able
to boot LOCKDEBUG kernels (example in PR 55185).

You can patch out the kernel lock spinout code (so the kernel does
not limit the KERNEL_LOCK() spin time and you then can investigate
other locking issues). Maybe we should offer an official option to do
that?

Martin
P.S.: only guessing it is the graphics card, something(tm) is taking very
long while holding the kernel lock, anything called during autoconfiguration
could do that - but also a few other things.



Re: Re: Bug or no Bug?

2022-02-10 Thread 6bone

Hello,

the kernel crashes during the boot process after enabling the network. At 
this point no dump files have been written.


As a first step, I have recorded a video clip of the crash:

https://speicherwolke.uni-leipzig.de/index.php/s/jFpEa5TAnJAmEcF

As soon as I get a usable crash dump, I will file an official bug report.


Thank you for your efforts

Regards
Uwe


On Thu, 10 Feb 2022, Manuel Bouyer wrote:


Date: Thu, 10 Feb 2022 09:53:07 +0100
From: Manuel Bouyer 
To: 6b...@6bone.informatik.uni-leipzig.de
Cc: current-users@netbsd.org
Subject: [Extern] Re: Bug or no Bug?

On Wed, Feb 09, 2022 at 09:22:34PM +0100, 6b...@6bone.informatik.uni-leipzig.de 
wrote:

Hello,

I have installed the 9.99.xx kernel on several systems. On most systems
there are no problems. On a Dell 2800, the kernel crashes during boot. The
problem only occurs if the option LOCKDEBUG is set.

options LOCKDEBUG   # expensive locking checks/support

Should a bug report be made in this case? Or should problems that only occur
when LOCKDEBUG is enabled be ignored?


Crashes with LOCKDEBUG are not expected, so please report.


--
Manuel Bouyer 
NetBSD: 26 years of experience will always make the difference
--



Bug or no Bug?

2022-02-09 Thread 6bone

Hello,

I have installed the 9.99.xx kernel on several systems. On most systems 
there are no problems. On a Dell 2800, the kernel crashes during boot. The 
problem only occurs if the option LOCKDEBUG is set.


options LOCKDEBUG   # expensive locking checks/support

Should a bug report be made in this case? Or should problems that only 
occur when LOCKDEBUG is enabled be ignored?



Thank you for your efforts

Regards
Uwe



crash at MegaRAID SAS 9341-8i

2022-02-03 Thread 6bone

Hi there,

could someone take a look at PR kern/56669? I have tested different 
kernels on different days, and the problem can be reproduced quickly.


An SSD is also installed in the server. If you only work on the SSD (no 
MegaRAID) the problem does not occur. So I suspect there is a problem 
using the controller.


Is there more information I can provide?


Thank you for your efforts
Uwe


Re: Re: netbsd-9.99.93 crash

2022-01-26 Thread 6bone



On Sat, 22 Jan 2022, Martin Husemann wrote:


Date: Sat, 22 Jan 2022 08:36:22 +
From: Martin Husemann 
To: Christos Zoulas 
Cc: current-users@netbsd.org
Subject: [Extern] Re: netbsd-9.99.93 crash

On Mon, Jan 10, 2022 at 10:18:02PM -, Christos Zoulas wrote:

In article ,
 <6b...@6bone.informatik.uni-leipzig.de> wrote:

Does that help?

Regards
Uwe


I have the same issue and I can reproduce this on demand. I have disabled
IPv6 on this router.


Has this been fixed? If not, can you file a PR please? I consider this
a blocker for the netbsd-10 branch.


I tested with the current code. I can no longer reproduce the crash.



Thanks,

Martin



Regards
Uwe


Re: Re: netbsd-9.99.93 crash

2022-01-24 Thread 6bone

Hello,

I think the problem is not fixed yet. At the moment I can't boot my 
server to test.



Regards
Uwe


On Sat, 22 Jan 2022, Martin Husemann wrote:


Date: Sat, 22 Jan 2022 08:36:22 +
From: Martin Husemann 
To: Christos Zoulas 
Cc: current-users@netbsd.org
Subject: [Extern] Re: netbsd-9.99.93 crash

On Mon, Jan 10, 2022 at 10:18:02PM -, Christos Zoulas wrote:

In article ,
 <6b...@6bone.informatik.uni-leipzig.de> wrote:

Does that help?

Regards
Uwe


I have the same issue and I can reproduce this on demand. I have disabled
IPv6 on this router.


Has this been fixed? If not, can you file a PR please? I consider this
a blocker for the netbsd-10 branch.

Thanks,

Martin



Re: Re: netbsd-9.99.93 crash

2022-01-10 Thread 6bone

Does that help?

Regards
Uwe

(gdb) list *(0x80ea0a90)
0x80ea0a90 is in doifioctl (/usr/src/sys/net/if.c:3494).
3489	}
3490	}
3491	
3492	oif_flags = ifp->if_flags;
3493	
3494	KERNEL_LOCK_UNLESS_IFP_MPSAFE(ifp);
3495	IFNET_LOCK(ifp);
3496	
3497	error = if_ioctl(ifp, cmd, data);
3498	if (error != ENOTTY)


On Mon, 10 Jan 2022, Martin Husemann wrote:


Date: Mon, 10 Jan 2022 12:32:50 +0100
From: Martin Husemann 
To: 6b...@6bone.informatik.uni-leipzig.de
Cc: current-users@netbsd.org
Subject: [Extern] Re: netbsd-9.99.93 crash

You need to identify what the "last locked" location is, something
like:

gdb netbsd.gdb  # or just netbsd, if you do not have a netbsd.gdb
gdb> list *(0x80ea0a90)

(I never can remember the equivalent addr2line flags for this, probably:
addr2line -a 0x80ea0a90 -e netbsd.gdb)

Martin



netbsd-9.99.93 crash

2022-01-10 Thread 6bone

Hi there,

when starting the network the kernel crashes.

Does anyone have an idea what the problem could be?


Thank you for your efforts

Regards
Uwe

dmesg -M netbsd.14.core -N netbsd.14
...
[28.382481] boot device: sd0
[28.382481] root on sd0a dumps on sd0b
[28.382481] dump_misc_init: max_paddr = 0xc3000
[28.382481] mountroot: trying lfs...
[28.382481] mountroot: trying ffs...
[28.382481] root file system type: ffs
[28.382481] kern.module.path=/stand/amd64/9.99.93/modules
[28.402663] init: copying out path `/sbin/init' 11
[34.076183] mfi0: normal state on 'mfi0:0' (online)
[   177.114151] Kernel lock error: _kernel_lock,239: spinout

[   177.114151] lock address : 0x818a5040 type : spin
[   177.114151] initialized  : 0x80f82ed0
[   177.114151] shared holds :  0 exclusive: 1
[   177.114151] shares wanted:  0 exclusive: 2
[   177.114151] relevant cpu :  0 last held: 1
[   177.114151] relevant lwp : 0x865288bc1080 last held: 0x865280278080
[   177.114151] last locked* : 0x80ea0a90 unlocked : 0x80ea0aa1
[   177.114151] curcpu holds :  0 wanted by: 0x865288bc1080


[   177.114151] panic: LOCKDEBUG: Kernel lock error: _kernel_lock,239: spinout

[   177.114151] cpu0: Begin traceback...
[   177.114151] vpanic() at netbsd:vpanic+0x156
[   177.124150] panic() at netbsd:panic+0x3c
[   177.124150] lockdebug_abort1() at netbsd:lockdebug_abort1+0xe6
[   177.124150] _kernel_lock() at netbsd:_kernel_lock+0x22a
[   177.124150] frag6_fasttimo() at netbsd:frag6_fasttimo+0x1a
[   177.124150] pffasttimo() at netbsd:pffasttimo+0x34
[   177.124150] callout_softclock() at netbsd:callout_softclock+0xbe
[   177.124150] softint_dispatch() at netbsd:softint_dispatch+0xf2
[   177.124150] DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0x99037d0eb0f0
[   177.124150] Xsoftintr() at netbsd:Xsoftintr+0x4f
[   177.134195] --- interrupt ---
[   177.134195] 0:
[   177.134195] cpu0: End traceback...

[   177.134195] dumping to dev 4,1 (offset=7071, size=12581616):
[   177.134195] dump


vmstat -M netbsd.14.core -N netbsd.14 -s
 4096 bytes per page
   64 page colors
 12192674 pages managed
 12109936 pages free
0 pages paging
0 pages wired
0 zero pages
1 reserve pagedaemon pages
   20 reserve kernel pages
   381045 boot kernel pages
189529716 kernel pool pages
  635 anonymous pages
  153 cached file pages
  552 cached executable pages
 1024 minimum free pages
 1365 target free pages
  4064224 maximum wired pages
1 swap devices
 1535 swap pages
0 swap pages in use
0 swap allocations
0 total faults taken
0 traps
0 device interrupts
0 CPU context switches
0 software interrupts
18446517600032562176 system calls
0 pagein requests
0 pageout requests
0 pages swapped in
0 pages swapped out
  134 forks total
   86 forks blocked parent
   86 forks shared address space with parent
0 pagealloc zero wanted and avail
0 pagealloc zero wanted and not avail
0 aborts of idle page zeroing
81370 pagealloc desired color avail
0 pagealloc desired color not avail
21255 pagealloc local cpu avail
60115 pagealloc local cpu not avail
0 faults with no memory
0 faults with no anons
0 faults had to wait on pages
0 faults found released page
  143 faults relock (143 ok)
   726290 anon page faults
0 anon retry faults
 3480 amap copy faults
0 neighbour anon page faults
33715 neighbour object page faults
 9305 locked pager get faults
  143 unlocked pager get faults
   724465 anon faults
 1825 anon copy on write faults
 7249 object faults
 2056 promote copy faults
12206 promote zero fill faults
0 times daemon wokeup
0 revolutions of the clock hand
0 pages freed by daemon
0 pages scanned by daemon
0 anonymous pages scanned by daemon
0 object pages scanned by daemon
0 pages reactivated
0 pages found busy by daemon
0 total pending pageouts
0 pages deactivated
0 total name lookups
0 good hits
0 negative hits
0 bad hits
0 false hits
0 miss
0 too long
0 pass2 hits
0 2passes
  cache hits (0% pos + 0% neg) system 0% per-process
  deletions 0%, falsehits 0%, toolong 0%


gdb /netbsd
GNU gdb (GDB) 8.3
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 


This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64--netbsd".
Type 

NetBSD 9.0 RC1 compile error

2019-12-16 Thread 6bone

Hello,

With DEBUG enabled, the build fails:

--- if_ena.o ---
/usr/src/sys/dev/pci/if_ena.c: In function 'ena_request_io_irq':
/usr/src/sys/dev/pci/if_ena.c:2089:7: error: unused variable 'irq_slot' [-Werror=unused-variable]
   int irq_slot = i + irq_off;
   ^~~~


Regards
Uwe


Intel XXV710-DA2 / ixl

2019-09-18 Thread 6bone

Hi,

I have a server with an Intel XXV710-DA2 network card. Unfortunately, 
NetBSD does not support this card. On the mailing list I read that other 
users have asked for support for this card. Would anyone be able to port 
the ixl driver from FreeBSD? I could offer access to an unused server with 
this card.



Thanks for your efforts

Best regards

Uwe


support for Intel NIC

2019-08-14 Thread 6bone

Hello,

are there plans to add support for the Intel XXV710-DA2 NIC (vendor 0x8086, 
product 0x158b)?


Thank you for your efforts

Regards
Uwe



Network questions

2018-11-11 Thread 6bone

Hello,

We run a NetBSD router for our network. The router has two Layer-3 uplinks 
(quagga / ospf) to the provider. The router also has two Layer-3 links 
(quagga / ospf) to the data center.


The provider offers us two lines (active / active). Unfortunately we cannot 
use both for upstream traffic because NetBSD does not support ECMP 
(equal-cost multipath) routing. Is ECMP support planned for the near future?


We also run a firewall with npf. Is there any way to define firewall rules 
with connection tracking when using asymmetric routing as described?


Some years ago it was not possible to build a port channel (link 
aggregation) with Broadcom network cards, because the driver had problems 
setting the MAC address of the physical interfaces. The problem does not 
seem to be solved in NetBSD-8. Is there any way to do this now?


Thank you for your efforts

Regards
Uwe


Re: netbsd-8 crash

2018-08-05 Thread 6bone

On Sun, 5 Aug 2018, Christos Zoulas wrote:


Also, are there other print messages on the console?


The dump is written automatically, and messages on the console are lost. 
Since it's a production server, I cannot change this behavior: the server 
must be available again as soon as possible after a crash.




christos



Re: netbsd-8 crash

2018-08-04 Thread 6bone

On Sat, 4 Aug 2018, Martin Husemann wrote:


Looks like a bug in the ciss driver.

   /* if never got a chance to be done above... */
   if (ccb->ccb_state != CISS_CCB_FREE) {
   KASSERT(error);
   ccb->ccb_err.cmd_stat = CISS_ERR_TMO;
   error = ciss_done(ccb);
   }


Could you try to change that KASSERT(error); into a KASSERTMSG() and
print ccb->ccb_state? When savecore runs, it should log the full panic
message (in case you can not capture it at crash time).
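For reference, here is a minimal userland sketch of the change suggested above. KASSERTMSG() is emulated with fprintf() and abort() so the fragment compiles outside the kernel, and struct ccb_model, CCB_FREE, CCB_BUSY and finish_ccb() are hypothetical stand-ins, not ciss(4) code. In ciss.c itself the change would amount to replacing KASSERT(error); with something like KASSERTMSG(error, "ccb_state %d", ccb->ccb_state);, assuming ccb_state is an integer field.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Userland stand-in for the kernel's KASSERTMSG(expr, fmt, ...):
 * if the expression is false, print the message and abort.
 */
#define KASSERTMSG(e, ...)						\
	do {								\
		if (!(e)) {						\
			fprintf(stderr, "assertion \"%s\" failed: ", #e); \
			fprintf(stderr, __VA_ARGS__);			\
			fputc('\n', stderr);				\
			abort();					\
		}							\
	} while (0)

/* Hypothetical CCB states; the driver has its own, e.g. CISS_CCB_FREE. */
enum { CCB_FREE = 0, CCB_BUSY = 1 };

struct ccb_model {
	int ccb_state;
};

/*
 * Shape of the quoted completion path: a CCB that never completed
 * must come with a non-zero error.  With KASSERTMSG the panic
 * message also reports the state the CCB was left in.
 */
int
finish_ccb(struct ccb_model *ccb, int error)
{
	assert(ccb != NULL);
	if (ccb->ccb_state != CCB_FREE) {
		KASSERTMSG(error, "ccb_state %d", ccb->ccb_state);
		ccb->ccb_state = CCB_FREE;	/* stands in for ciss_done() */
	}
	return error;
}
```

If the assertion fires, the message then names the unexpected state instead of only the failed expression.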



I am unable to rewrite the code. Can you send me a patch?


Thank you for your efforts

Regards
Uwe


netbsd-8 crash

2018-08-04 Thread 6bone

Hello,

Under high CPU load, netbsd-8 crashes from time to time. The crash dump is 
not always written correctly, e.g. crash number 8:


-rw---  1 root  wheel  614974272 Aug  3 08:59 netbsd.8.core.gz
-rw---  1 root  wheel  0 Aug  3 08:59 netbsd.8.gz
-rw---  1 root  wheel  755472452 Aug  4 10:28 netbsd.9.core.gz
-rw---  1 root  wheel 756718 Aug  4 10:28 netbsd.9.gz

netbsd.8.gz is 0 bytes long, and netbsd.8.core.gz cannot be analysed:

(gdb) target kvm netbsd.8.core
Cannot access memory at address 0x8001daffdc50


(gdb) target kvm netbsd.9.core
0x80222605 in cpu_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0)
    at /usr/src/sys/arch/amd64/amd64/machdep.c:707
707 dumpsys();
(gdb) bt
#0  0x80222605 in cpu_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0)
    at /usr/src/sys/arch/amd64/amd64/machdep.c:707
#1  0x8093f78c in vpanic (fmt=0x810dd3b0 "kernel %sassertion \"%s\" failed: file \"%s\", line %d ", ap=ap@entry=0x8001daffdcd8)
    at /usr/src/sys/kern/subr_prf.c:342
#2  0x80c95445 in kern_assert (fmt=fmt@entry=0x810dd3b0 "kernel %sassertion \"%s\" failed: file \"%s\", line %d ")
    at /usr/src/sys/lib/libkern/kern_assert.c:51
#3  0x8052c021 in ciss_cmd (ccb=0x8001cbed1400, flags=flags@entry=1, wait=)
    at /usr/src/sys/dev/ic/ciss.c:633
#4  0x8052c6bb in ciss_ldid (sc=sc@entry=0xfe812c06dc08, target=, id=id@entry=0x8001d035a000)
    at /usr/src/sys/dev/ic/ciss.c:912
#5  0x8052cc15 in ciss_ioctl_vol (sc=0xfe812c06dc08, bv=bv@entry=0x8001daffddc8)
    at /usr/src/sys/dev/ic/ciss.c:1417
#6  0x8052ce33 in ciss_sensor_refresh (sme=, edata=0xfe812c0cf0d0)
    at /usr/src/sys/dev/ic/ciss.c:1573
#7  0x8066f975 in sysmon_envsys_refresh_sensor (sme=sme@entry=0xfe812c0cf208, edata=edata@entry=0xfe812c0cf0d0)
    at /usr/src/sys/dev/sysmon/sysmon_envsys.c:2109
#8  0x80671bd0 in sme_events_worker (wk=0xfe813504be48, arg=)
    at /usr/src/sys/dev/sysmon/sysmon_envsys_events.c:784
#9  0x80946d66 in workqueue_runlist (wq=0xfe8134fbc480, wq=0xfe8134fbc480, list=0xfe8134fbc4f0)
    at /usr/src/sys/kern/subr_workqueue.c:106
#10 workqueue_worker (cookie=0xfe8134fbc480)
    at /usr/src/sys/kern/subr_workqueue.c:133
#11 0x80207797 in lwp_trampoline ()
#12 0x in ?? ()


I hope that helps to find the problem.


Thank you for your efforts

Regards
Uwe



Re: ixg tester needed (was Re: Problems with netbsd-8 RC1 and ixg drivers (?))

2018-06-03 Thread 6bone

Hello,

I have applied

http://www.netbsd.org/~msaitoh/ixgbe-eitr-20180522-0.dif
and
http://www.netbsd.org/~msaitoh/ixgbe-norearm-20180530-0.dif

to netbsd-8 RC1. With these patches the problem seems to be solved.


Thank you for your efforts

Regards
Uwe


On Fri, 1 Jun 2018, Masanobu SAITOH wrote:


Date: Fri, 1 Jun 2018 12:47:32 +0900
From: Masanobu SAITOH 
To: 6b...@6bone.informatik.uni-leipzig.de
Cc: msai...@execsw.org, Martin Husemann ,
current-users@netbsd.org
Subject: Re: ixg tester needed (was Re: Problems with netbsd-8 RC1 and ixg
drivers (?))




  The same diff is at:

 http://www.netbsd.org/~msaitoh/ixgbe-norearm-20180530-0.dif



Updated patch (Fix compile error and ixv patch):

--
Don't call ixgbe_rearm_queues() in ixgbe_local_timer1(). ixgbe_enable_queue()
and ixgbe_disable_queue() try to enable/disable a queue interrupt safely. They
use an internal counter. When a queue's MSI-X interrupt is received,
ixgbe_msix_que() is called (IPL_NET). This function disables the queue's
interrupt with ixgbe_disable_queue() and schedules a softint.
ixgbe_handle_que() is called by the softint (IPL_SOFTNET); it processes TX
and RX and calls ixgbe_enable_queue() at the end.

ixgbe_local_timer1() is a callout and always runs on CPU 0 (IPL_SOFTCLOCK).
When ixgbe_rearm_queues() is called, an MSI-X interrupt is issued for a
specific queue, which may not be on CPU 0. If that interrupt's
ixgbe_msix_que() runs and calls softint_schedule() before the previous
softint has been run by softint_execute(), the softint_schedule() call
fails because of SOFTINT_PENDING. This breaks the internal counter of
ixgbe_{enable,disable}_queue().

ixgbe_local_timer1() is written not to call ixgbe_rearm_queues() if the
interrupt is disabled, but it is called anyway because of an unknown bug or
a race.

One solution is not to use the internal counter, but that is a little
difficult.

Another solution is to stop using ixgbe_rearm_queues() altogether.
Essentially, ixgbe_rearm_queues() is not required (it was added in ixgbe.c
rev. 1.43 (2016/12/01)). ixgbe_rearm_queues() helps against lost
interrupts, but I have never seen a lost-interrupt problem other than the
one caused by ixgbe_rearm_queues() itself.
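The counter problem described above can be illustrated with a small, simplified model. Everything below is hypothetical (struct queue_model and the model_* functions do not exist in ixgbe.c) and it leaves out the locking and register access of the real driver; it only shows how a second disable that finds the softint already pending leaves the counter unbalanced.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-queue state: nesting counter, mask, one-shot softint. */
struct queue_model {
	int  disabled_count;	/* nesting depth of disable calls */
	bool irq_masked;	/* vector masked in "hardware" */
	bool softint_pending;	/* softint scheduled but not yet run */
};

static void
model_disable_queue(struct queue_model *q)
{
	if (q->disabled_count++ == 0)
		q->irq_masked = true;	/* first disable masks the vector */
}

static void
model_enable_queue(struct queue_model *q)
{
	assert(q->disabled_count > 0);
	if (--q->disabled_count == 0)
		q->irq_masked = false;	/* last enable unmasks it */
}

/* Scheduling a softint that is already pending is a no-op. */
static void
model_softint_schedule(struct queue_model *q)
{
	if (q->softint_pending)
		return;
	q->softint_pending = true;
}

/* Hard interrupt handler (IPL_NET): disable the queue, defer the work. */
static void
model_msix_que(struct queue_model *q)
{
	model_disable_queue(q);
	model_softint_schedule(q);
}

/* Softint handler (IPL_SOFTNET): process TX/RX, then enable once. */
static void
model_handle_que(struct queue_model *q)
{
	if (!q->softint_pending)
		return;
	q->softint_pending = false;
	model_enable_queue(q);
}
```

One call to model_msix_que() followed by model_handle_que() returns the counter to 0 and unmasks the vector. Two calls to model_msix_que() before model_handle_que() runs, as a forced interrupt from ixgbe_rearm_queues() can cause in the scenario above, leave disabled_count at 1 and the vector masked, so no further interrupt arrives to run the handler again.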


Index: ixgbe.c
===
RCS file: /cvsroot/src/sys/dev/pci/ixgbe/ixgbe.c,v
retrieving revision 1.158
diff -u -p -r1.158 ixgbe.c
--- ixgbe.c 30 May 2018 09:17:17 -  1.158
+++ ixgbe.c 1 Jun 2018 03:22:05 -
@@ -4411,6 +4411,7 @@ ixgbe_local_timer1(void *arg)
/* Only truely watchdog if all queues show hung */
if (hung == adapter->num_queues)
goto watchdog;
+#if 0 /* XXX Avoid unexpectedly disabling interrupt forever (PR#53294) */
else if (queues != 0) { /* Force an IRQ on queues with work */
que = adapter->queues;
for (i = 0; i < adapter->num_queues; i++, que++) {
@@ -4421,6 +4422,7 @@ ixgbe_local_timer1(void *arg)
mutex_exit(&que->dc_mtx);
}
}
+#endif
 out:
callout_reset(&adapter->timer, hz, ixgbe_local_timer, adapter);
@@ -6643,7 +6645,7 @@ ixgbe_handle_link(void *context)
/
 * ixgbe_rearm_queues
 /
-static void
+static __inline void
ixgbe_rearm_queues(struct adapter *adapter, u64 queues)
{
u32 mask;
Index: ixv.c
===
RCS file: /cvsroot/src/sys/dev/pci/ixgbe/ixv.c,v
retrieving revision 1.102
diff -u -p -r1.102 ixv.c
--- ixv.c   30 May 2018 08:35:26 -  1.102
+++ ixv.c   1 Jun 2018 03:22:05 -
@@ -1266,9 +1266,11 @@ ixv_local_timer_locked(void *arg)
/* Only truly watchdog if all queues show hung */
if (hung == adapter->num_queues)
goto watchdog;
+#if 0
else if (queues != 0) { /* Force an IRQ on queues with work */
ixv_rearm_queues(adapter, queues);
}
+#endif
callout_reset(&adapter->timer, hz, ixv_local_timer, adapter);
--

The same diff is at:

http://www.netbsd.org/~msaitoh/ixgbe-norearm-20180531-0.dif

--
---
   SAITOH Masanobu (msai...@execsw.org
msai...@netbsd.org)


Re: ixg tester needed (was Re: Problems with netbsd-8 RC1 and ixg drivers (?))

2018-05-29 Thread 6bone

Hello,

I have tested the patch with netbsd-8. The problem is not solved.


Regards
Uwe


On Mon, 28 May 2018, Masanobu SAITOH wrote:


Date: Mon, 28 May 2018 17:10:02 +0900
From: Masanobu SAITOH 
To: Martin Husemann ,
6b...@6bone.informatik.uni-leipzig.de, current-users@netbsd.org
Cc: msai...@execsw.org
Subject: ixg tester needed (was Re: Problems with netbsd-8 RC1 and ixg drivers
 (?))

On 2018/05/28 16:51, Martin Husemann wrote:
On Mon, May 28, 2018 at 09:46:21AM +0200, 
6b...@6bone.informatik.uni-leipzig.de wrote:

Hello,

At the weekend I tried to update to a current version of netbsd-8 rc1.

After the restart, the kernel works for a few hours. After that, no
packets arrive at the network card.



Anyone using ixg(4) on netbsd-8 or -current, please try the following patch:

http://www.netbsd.org/~msaitoh/ixgbe-eitr-20180522-0.dif

This change might fix the RX stall problem. If you get a TX device timeout or
an RX stall, please report it together with the output of:

sysctl hw |grep ixg


Regards.




The server is running normally. There are no hints in dmesg.

Some network programs report issues:

zebra[371]: rtm_write: write : No buffer space available (55)

syslogd[541]: recvfrom() unix `/var/run/log': No buffer space available

gate zebra[1423]: routing socket error: No buffer space available


You are seeing two different issues here. The "No buffer space" messages are
considered harmless (they used to be silent, but the lossage should be the same).

The problem of ixg(4) no longer receiving packets is under investigation; RC2
is waiting for a proposed patch that is being tested.

Martin




--
---
   SAITOH Masanobu (msai...@execsw.org
msai...@netbsd.org)



Problems with netbsd-8 RC1 and ixg drivers (?)

2018-05-28 Thread 6bone

Hello,

At the weekend I tried to update to a current version of netbsd-8 rc1.

After the restart, the kernel works for a few hours. After that, no 
packets arrive at the network card. The server is running normally. 
There are no hints in dmesg.


Some network programs report issues:

zebra[371]: rtm_write: write : No buffer space available (55)

syslogd[541]: recvfrom() unix `/var/run/log': No buffer space available

gate zebra[1423]: routing socket error: No buffer space available


Hardware is an HP G5 with an Intel PRO/10GbE NIC.


After downgrading to the old kernel from Fri Feb 16, everything is 
stable again.


Any ideas what the problem could be? Is there anything I can test to narrow 
down the problem?



Thank you for your efforts


Regards
Uwe


Re: nfs client issue at netbsd-8

2018-04-16 Thread 6bone

On Wed, 28 Mar 2018, Manuel Bouyer wrote:


On Wed, Mar 28, 2018 at 06:23:56PM +0200, 6b...@6bone.informatik.uni-leipzig.de 
wrote:

I changed from 1.88.2.10 to 1.88.2.13. The problem was not solved. I have
also tested the onboard network card. After a few hours, the problem also
occurred there. I still think it's an nfs problem.

It seems like the whole system is blocked, not just the network. If no nfs
shares are mounted, the problem does not occur.

Any ideas how to narrow down the cause?


Can you enter ddb? If so, can you see whether some soft interrupt thread is
blocked?


While the system hangs, it is not possible to enter ddb. When the system 
is not hanging, ddb can be entered without any problems.




I also use NFS on a netbsd-8 host, without problems.

--
Manuel Bouyer 
NetBSD: 26 years of experience will always make the difference
--



Regards
Uwe



Re: nfs client issue at netbsd-8

2018-03-28 Thread 6bone



On Fri, 23 Mar 2018, SAITOH Masanobu wrote:


How old is your kernel?

If your kernel's ixgbe.c is older than 1.88.2.13, please update to the latest
netbsd-8 and try again. 1.88.2.13 (and 1.88.2.10) fixed a serious interrupt problem.



I changed from 1.88.2.10 to 1.88.2.13. The problem was not solved. I have 
also tested the onboard network card. After a few hours, the problem also 
occurred there. I still think it's an nfs problem.


It seems like the whole system is blocked, not just the network. If no nfs 
shares are mounted, the problem does not occur.


Any ideas how to narrow down the cause?

Many thanks for your efforts

Regards
Uwe



Re: nfs client issue at netbsd-8

2018-03-20 Thread 6bone


On Tue, 20 Mar 2018, Martin Husemann wrote:


Which network driver is used on the client?


ixg0 at pci8 dev 0 function 0: Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.2.12-k
ixg0: device 82599EB
ixg0: ETrackID 830d
ixg0: for TX/RX, interrupting at msix0 vec 0, bound queue 0 to cpu 0
ixg0: for TX/RX, interrupting at msix0 vec 1, bound queue 1 to cpu 1
ixg0: for TX/RX, interrupting at msix0 vec 2, bound queue 2 to cpu 2
ixg0: for TX/RX, interrupting at msix0 vec 3, bound queue 3 to cpu 3
ixg0: for link, interrupting at msix0 vec 4, affinity to cpu 0
ixg0: Using MSI-X interrupts with 5 vectors
ixg0: PCI Express Bus: Speed 2.5GT/s Width x4
ixg0: PCI-Express bandwidth available for this card
 is not sufficient for optimal performance.
ixg0: For optimal performance a x8 PCIE, or x4 PCIE Gen2 slot is required.
ixg0: feature cap 0x1780
ixg0: feature ena 0x400
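The bandwidth warning in this dmesg can be checked with a little arithmetic. The sketch assumes the usual PCIe figures, 2.5 GT/s per lane for PCIe 1.x and 5 GT/s for 2.x, both with 8b/10b line encoding; the helper name pcie_8b10b_mbps() is made up for illustration, and packet overhead is ignored, so real throughput is somewhat lower.

```c
#include <assert.h>

/*
 * Usable bandwidth per direction, in Mbit/s, for PCIe generations
 * that use 8b/10b encoding (1.x at 2500 MT/s, 2.x at 5000 MT/s per
 * lane).  TLP/DLLP overhead is not taken into account.
 */
static unsigned long
pcie_8b10b_mbps(unsigned long mtps_per_lane, unsigned lanes)
{
	assert(lanes > 0);
	/* 8 payload bits for every 10 bits on the wire */
	return mtps_per_lane * 8 / 10 * lanes;
}
```

For the reported slot (2.5 GT/s, x4) this gives 8000 Mbit/s per direction, below the 10 Gbit/s line rate of a single port, while both alternatives the driver suggests (x8 at 2.5 GT/s, or x4 at PCIe Gen2) give 16000 Mbit/s.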



Martin



Uwe



nfs client issue at netbsd-8

2018-03-20 Thread 6bone

Hello,

I have problems with the nfs client under netbsd-8. In principle, nfs 
works, but some time after an nfs share is mounted, there are network 
interruptions on the client. The following ping from an outside host to 
the client shows the phenomenon.



...
64 bytes from 139.18.xx.yy: icmp_seq=6496 ttl=254 time=0.175 ms
64 bytes from 139.18.xx.yy: icmp_seq=6497 ttl=254 time=0.178 ms
64 bytes from 139.18.xx.yy: icmp_seq=6498 ttl=254 time=0.202 ms
64 bytes from 139.18.xx.yy: icmp_seq=6499 ttl=254 time=0.179 ms
64 bytes from 139.18.xx.yy: icmp_seq=6500 ttl=254 time=0.161 ms
64 bytes from 139.18.xx.yy: icmp_seq=6501 ttl=254 time=3126 ms
64 bytes from 139.18.xx.yy: icmp_seq=6502 ttl=254 time=2126 ms
64 bytes from 139.18.xx.yy: icmp_seq=6503 ttl=254 time=1126 ms
64 bytes from 139.18.xx.yy: icmp_seq=6504 ttl=254 time=126 ms
64 bytes from 139.18.xx.yy: icmp_seq=6505 ttl=254 time=0.177 ms
64 bytes from 139.18.xx.yy: icmp_seq=6506 ttl=254 time=0.182 ms
64 bytes from 139.18.xx.yy: icmp_seq=6507 ttl=254 time=0.098 ms
...
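The round-trip times during the stall are consistent with a single pause. Assuming ping's default one-second send interval, the time each reply arrived, measured from when icmp_seq 6501 was sent, is its send offset plus its RTT; reply_arrival_ms() is a hypothetical helper for that calculation.

```c
#include <assert.h>

/*
 * Arrival time of a reply, in ms after the first delayed request was
 * sent, assuming ping's default one-second send interval:
 *     arrival = (seq - first_seq) * 1000 + rtt
 */
static double
reply_arrival_ms(int seq, int first_seq, double rtt_ms)
{
	return (seq - first_seq) * 1000.0 + rtt_ms;
}
```

For icmp_seq 6501 to 6504 every reply works out to 3126 ms, so all four replies arrived at the same moment: the requests appear to have been held for about three seconds and then answered in one burst.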

While the ping is running, no user process is active on the nfs client, and 
there is no disk or network access. The stalls recur within a few 
minutes, and over a few days the intervals between them get shorter. 
Over time, the CPU usage of the system process keeps increasing; see the 
following top output.


load averages:  0.00,  0.01,  0.00;   up 12+10:41:47 
35 processes: 1 runnable, 32 sleeping, 2 on CPU

CPU0 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU1 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU2 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU3 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Memory: 3697M Act, 308K Inact, 11M Wired, 36M Exec, 3584M File, 35G Free
Swap: 59G Total, 59G Free

  PID USERNAME PRI NICE   SIZE   RES STATE  TIME   WCPUCPU COMMAND
0 root   00 0K  708M CPU/3 62:17  0.00% 65.89% [system]
  975 root  85048M 2456K kqueue/0   0:33  0.00%  0.00% master
  191 root  85023M 2068K kqueue/1   0:16  0.00%  0.00% syslogd
  364 root  85023M   10M pause/20:03  0.00%  0.00% ntpd
  964 postfix   85049M 3880K kqueue/1   0:03  0.00%  0.00% qmgr

After unmounting the nfs share, the problems disappear. A reboot is not 
necessary.


The nfs server is a netapp. The nfs share has a size of 10TB (usage 5.7TB, 
3600307 files, 92262 directories).


The mount-options are: 
userquota,nosuid,rw,tcp,soft,intr,-x15,rdirplus,rsize=65536,-I65536,readahead=4


Does anyone have an idea which mount option could be causing the problem? 
Can the problem be eliminated by a configuration change, or is it a bug?


Thank you for your efforts

Best Regards
Uwe


NetBSD 8.0_BETA / snmpd

2018-01-12 Thread 6bone

Hello,

I'm running NetBSD 8.0_BETA with a kernel from the end of August, without 
any problems. Today I compiled an 8.0_BETA kernel from the current CVS 
sources. The kernel is running, but snmpd no longer works.


The snmpd process is running, and netstat shows it listening on port 161:

udp0  0  *.161  *.*

tcpdump shows the requests arriving, but snmpd does not send a response. 
There is no crash and no entry in the log files. After booting with the old 
kernel, everything works again.


Does anyone have an idea which settings must be changed so that the snmpd 
runs with the current 8.0_BETA kernel?




Thank you for your efforts


Regards
Uwe



Re: netbsd-8 crash in ixg driver during booting

2017-11-15 Thread 6bone

On Thu, 16 Nov 2017, Masanobu SAITOH wrote:


This problem is different from ixg(4)'s problem. I'm now working on a fix
for this softint-related problem.

This problem is caused by devices that use a lot of softints. Could you
tell me the machine's spec, e.g.:

number of wm(4) and/or ixg(4) ports
number of nvme(4) devices
etc.


Hello,

the hardware is an HP G5 with two dual-port Intel 10GE network cards. The 
server is used as a router.


Here is the dmesg output from the netbsd-8 kernel.

Thank you for your efforts

Regards
Uwe


Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017
The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.

NetBSD 8.0_BETA (MYCONF8.gdb) #0: Mon Aug 28 22:51:59 CEST 2017

r...@gate.ipv6.uni-leipzig.de:/usr/obj/sys/arch/amd64/compile/MYCONF8.gdb
total memory = 24565 MB
avail memory = 23830 MB
timecounter: Timecounters tick every 10.000 msec
Kernelized RAIDframe activated
running cgd selftest aes-xts-256 aes-xts-512 done
timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
HP ProLiant DL380 G7
mainbus0 (root)
ACPI: RSDP 0x000F4F00 24 (v02 HP)
ACPI: XSDT 0xDF630340 BC (v01 HP ProLiant 0002 ??   162E)
ACPI: FACP 0xDF630440 F4 (v03 HP ProLiant 0002 ??   162E)
ACPI BIOS Warning (bug): Invalid length for FADT/Pm1aControlBlock: 32, using default 16 (20170303/tbfadt-745)
ACPI BIOS Warning (bug): Invalid length for FADT/Pm2ControlBlock: 32, using default 8 (20170303/tbfadt-745)
ACPI: DSDT 0xDF630540 0020BD (v01 HP DSDT 0001 INTL 20030228)
ACPI: FACS 0xDF62F100 40
ACPI: FACS 0xDF62F100 40
ACPI: SPCR 0xDF62F140 50 (v01 HP SPCRRBSU 0001 ??   162E)
ACPI: MCFG 0xDF62F1C0 3C (v01 HP ProLiant 0001  )
ACPI: HPET 0xDF62F200 38 (v01 HP ProLiant 0002 ??   162E)
ACPI:  0xDF62F240 64 (v02 HP ProLiant 0002 ??   162E)
ACPI: SPMI 0xDF62F2C0 40 (v05 HP ProLiant 0001 ??   162E)
ACPI: ERST 0xDF62F300 0001D0 (v01 HP ProLiant 0001 ??   162E)
ACPI: APIC 0xDF62F500 00015E (v01 HP ProLiant 0002  )
ACPI: SRAT 0xDF62F680 000570 (v01 HP Proliant 0001 ??   162E)
ACPI:  0xDF62FC00 000176 (v01 HP ProLiant 0001 ??   162E)
ACPI: BERT 0xDF62FD80 30 (v01 HP ProLiant 0001 ??   162E)
ACPI: HEST 0xDF62FDC0 BC (v01 HP ProLiant 0001 ??   162E)
ACPI: DMAR 0xDF62FE80 00017C (v01 HP ProLiant 0001 ??   162E)
ACPI: SSDT 0xDF632600 000125 (v03 HP CRSPCI0  0002 HP   0001)
ACPI: SSDT 0xDF632740 000255 (v03 HP riser1a  0002 INTL 20061109)
ACPI: SSDT 0xDF6329C0 00025D (v03 HP riser2a  0002 INTL 20061109)
ACPI: SSDT 0xDF632C40 0003BB (v01 HP pcc  0001 INTL 20090625)
ACPI: SSDT 0xDF633000 000377 (v01 HP pmab 0001 INTL 20090625)
ACPI: SSDT 0xDF633380 002B64 (v01 INTEL  PPM RCM  0001 INTL 20061109)
ACPI: 7 ACPI AML tables successfully acquired and loaded
ioapic0 at mainbus0 apid 8: pa 0xfec0, version 0x20, 24 pins
ioapic1 at mainbus0 apid 0: pa 0xfec8, version 0x20, 24 pins
cpu0 at mainbus0 apid 0
cpu0: Intel(R) Xeon(R) CPU   E5630  @ 2.53GHz, id 0x206c2
cpu0: package 0, core 0, smt 0
cpu1 at mainbus0 apid 32
cpu1: Intel(R) Xeon(R) CPU   E5630  @ 2.53GHz, id 0x206c2
cpu1: package 1, core 0, smt 0
cpu2 at mainbus0 apid 20
cpu2: Intel(R) Xeon(R) CPU   E5630  @ 2.53GHz, id 0x206c2
cpu2: package 0, core 10, smt 0
cpu3 at mainbus0 apid 52
cpu3: Intel(R) Xeon(R) CPU   E5630  @ 2.53GHz, id 0x206c2
cpu3: package 1, core 10, smt 0
cpu4 at mainbus0 apid 2
cpu4: Intel(R) Xeon(R) CPU   E5630  @ 2.53GHz, id 0x206c2
cpu4: package 0, core 1, smt 0
cpu5 at mainbus0 apid 34
cpu5: Intel(R) Xeon(R) CPU   E5630  @ 2.53GHz, id 0x206c2
cpu5: package 1, core 1, smt 0
cpu6 at mainbus0 apid 18
cpu6: Intel(R) Xeon(R) CPU   E5630  @ 2.53GHz, id 0x206c2
cpu6: package 0, core 9, smt 0
cpu7 at mainbus0 apid 50
cpu7: Intel(R) Xeon(R) CPU   E5630  @ 2.53GHz, id 0x206c2
cpu7: package 1, core 9, smt 0
cpu8 at mainbus0 apid 1
cpu8: Intel(R) Xeon(R) CPU   E5630  @ 2.53GHz, id 0x206c2
cpu8: package 0, core 0, smt 1
cpu9 at mainbus0 apid 33
cpu9: Intel(R) Xeon(R) CPU   E5630  @ 2.53GHz, id 0x206c2
cpu9: package 1, core 0, smt 1
cpu10 at mainbus0 apid 21
cpu10: Intel(R) Xeon(R) CPU   E5630  @ 2.53GHz, id 0x206c2
cpu10: package 0, core 10, smt 1
cpu11 at mainbus0 apid 53
cpu11: Intel(R) Xeon(R) CPU   E5630  @ 

Re: netbsd-8 crash in ixg driver during booting

2017-11-14 Thread 6bone


Does your machine boot with the latest -current?


I have tested the current sources from tonight.

https://suse.uni-leipzig.de/crash/crash-current1.jpg
https://suse.uni-leipzig.de/crash/crash-current2.jpg

Regards
Uwe


Re: netbsd-8 crash in ixg driver during booting

2017-11-13 Thread 6bone

On Sun, 12 Nov 2017, SAITOH Masanobu wrote:

Hello,

I checked out the current CVS sources this morning, but I can't compile them 
because of an error:


--- streambuf-inst.o ---
#   compile  libstdc++-v3/streambuf-inst.o
/usr/src/obj/tooldir.NetBSD-8.0_BETA-amd64/bin/x86_64--netbsd-c++ 
-frandom-seed=fd5fac20 -O2 -Wall -Wpointer-arith -Wno-sign-compare 
-Wsystem-headers -Wa,--fatal-warnings -Werror -fPIE 
-fno-implicit-templates -fdiagnostics-show-location=once 
--sysroot=/usr/src/obj/destdir.amd64 
-I/usr/src/external/gpl3/gcc.old/dist/gcc 
-I/usr/src/external/gpl3/gcc.old/dist/include 
-I/usr/src/external/gpl3/gcc.old/dist/libstdc++-v3/libsupc++ 
-I/usr/src/external/gpl3/gcc.old/dist/libgcc 
-I/usr/src/external/gpl3/gcc.old/lib/libstdc++-v3/../libstdc++-v3/arch/x86_64 
-I. -DHAVE_STDLIB_H -DHAVE_STRING_H 
-I/usr/src/external/gpl3/gcc.old/dist/libstdc++-v3/include 
-I/usr/src/external/gpl3/gcc.old/lib/libstdc++-v3/arch/x86_64 
-D_GLIBCXX_SHARED -DGTHREAD_USE_WEAK -DSUPPORTS_WEAK  -c -std=gnu++11 
/usr/src/external/gpl3/gcc.old/dist/libstdc++-v3/src/c++11/streambuf-inst.cc 
-o streambuf-inst.o

--- random.o ---
*** [random.o] Error code 1

nbmake[6]: stopped in /usr/src/external/gpl3/gcc.old/lib/libstdc++-v3
--- iostream-inst.o ---


Regards
Uwe




Does your machine boot with the latest -current?
If it boots, could you show the dmesg output with the
following patch?

http://www.netbsd.org/~msaitoh/ixgbe-current-20171112-0.dif

And, if you can, please test netbsd-8 with
the following patch and show the dmesg output:

http://www.netbsd.org/~msaitoh/ixgbe-n8-20171112-0.dif


Thanks in advance.





https://suse.uni-leipzig.de/crash/crash2.jpg
https://suse.uni-leipzig.de/crash/crash3.jpg

My old kernel from August 2017 did not have the problem yet.

Can someone take a look at the problem?

Thank you for your efforts

Regards
Uwe







--
---
   SAITOH Masanobu (msai...@execsw.org
msai...@netbsd.org)



netbsd-8 crash in ixg driver during booting

2017-11-09 Thread 6bone
The current version of netbsd-8 crashes while booting, during the 
initialization of the network driver.


https://suse.uni-leipzig.de/crash/crash1.jpg
https://suse.uni-leipzig.de/crash/crash2.jpg
https://suse.uni-leipzig.de/crash/crash3.jpg

My old kernel from August 2017 did not have the problem yet.

Can someone take a look at the problem?

Thank you for your efforts

Regards
Uwe


Re: problems with vlan interface counters (NetBSD 8.0_BETA)

2017-08-30 Thread 6bone

Hello,

I was out of the office for three weeks, so I can only answer today.

I have tested the netbsd-8 sources of yesterday. The problem is solved.


Thank you for your efforts

Regards
Uwe


On Wed, 9 Aug 2017, Kengo NAKAHARA wrote:


Date: Wed, 9 Aug 2017 15:36:15 +0900
From: Kengo NAKAHARA 
To: s.ymgch...@gmail.com, 6b...@6bone.informatik.uni-leipzig.de
Cc: current-users@netbsd.org
Subject: Re: problems with vlan interface counters (NetBSD 8.0_BETA)

Hi,

On 2017/08/03 18:13, s ymgch wrote:

The problem was introduced by the MP-ification of vlan(4).
I fixed this problem with the following patch in my environment.

Could you apply the patch and check it?

Regards,
s-yamaguchi@IIJ

 patch 
diff --git a/sys/net/if_vlan.c b/sys/net/if_vlan.c
index 531a2f5..a4ea6e1 100644
--- a/sys/net/if_vlan.c
+++ b/sys/net/if_vlan.c
@@ -1451,10 +1451,13 @@ vlan_transmit(struct ifnet *ifp, struct mbuf *m)
/* mbuf is already freed */
ifp->if_oerrors++;
} else {
+   size_t pktlen = m->m_pkthdr.len;
+   bool mcast = (m->m_flags & M_MCAST) != 0;
+
ifp->if_opackets++;
-   /*
-* obytes is incremented at ether_output() or bridge_enqueue().
-*/
+   ifp->if_obytes += pktlen;
+   if (mcast)
+   ifp->if_omcasts++;
}

 out:
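The same counting pattern can be sketched outside the kernel. Once the mbuf has been handed to the parent interface it belongs to the transmit path (the error branch notes that it is already freed), so a safe general approach is to record the length and multicast flag before the hand-off. struct pkt, struct if_counters, parent_transmit() and vlan_like_transmit() are hypothetical stand-ins, not the committed if_vlan.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for an mbuf and the interface counters. */
struct pkt {
	size_t len;	/* like m->m_pkthdr.len */
	bool   mcast;	/* like (m->m_flags & M_MCAST) != 0 */
};

struct if_counters {
	unsigned long opackets, obytes, omcasts, oerrors;
};

/*
 * Stand-in for the parent's transmit routine: it takes ownership of
 * the packet and frees it whether or not the send succeeds.
 */
static int
parent_transmit(struct pkt *m, int fail)
{
	free(m);
	return fail;
}

static void
vlan_like_transmit(struct if_counters *ifc, struct pkt *m, int fail)
{
	assert(m != NULL);

	/* Copy what the counters need while m is still ours ... */
	size_t pktlen = m->len;
	bool mcast = m->mcast;

	if (parent_transmit(m, fail) != 0) {
		/* ... because m must not be dereferenced after the hand-off. */
		ifc->oerrors++;
	} else {
		ifc->opackets++;
		ifc->obytes += pktlen;
		if (mcast)
			ifc->omcasts++;
	}
}
```

A successful 1500-byte multicast send increments opackets, obytes (by 1500) and omcasts; a failed send increments only oerrors.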

2017-07-28 17:10 GMT+09:00  <6b...@6bone.informatik.uni-leipzig.de>:

Hello,

The interface counters of vlan interfaces do not count correctly:

bash-4.4# ifconfig -v vlan8
vlan8: flags=0x8843 mtu 1500
capabilities=7ff80
capabilities=7ff80
capabilities=7ff80
enabled=0
vlan: 8 parent: ixg0
address: a0:36:9f:d4:3c:08
input: 1966263 packets, 273676300 bytes, 66058 multicasts
output: 1238957 packets, 0 bytes
inet6 fe80::a236:9fff:fed4:3c08%vlan8/64 flags 0x0 scopeid 0x1a
inet6 ::::::: flags 0x0

The output byte counter shows 0. With netbsd-7 everything worked fine.

As a result, it is no longer possible to record traffic data via snmp.

I tested s-yamaguchi's patch together with some minor fixes I received locally.
# He is my co-worker :)
The patch fixes the problem correctly, so I committed it as if_vlan.c:r1.99.

I will send a pullup request for the netbsd-8 branch. Could you retry after
it has been pulled up?


Thanks,

--
//
Internet Initiative Japan Inc.

Device Engineering Section,
IoT Platform Development Department,
Network Division,
Technology Unit

Kengo NAKAHARA 



problems with vlan interface counters (NetBSD 8.0_BETA)

2017-07-28 Thread 6bone

Hello,

The interface counters of vlan interfaces do not count correctly:

bash-4.4# ifconfig -v vlan8
vlan8: flags=0x8843 mtu 1500
capabilities=7ff80
capabilities=7ff80
capabilities=7ff80
enabled=0
vlan: 8 parent: ixg0
address: a0:36:9f:d4:3c:08
input: 1966263 packets, 273676300 bytes, 66058 multicasts
output: 1238957 packets, 0 bytes
inet6 fe80::a236:9fff:fed4:3c08%vlan8/64 flags 0x0 scopeid 0x1a
inet6 ::::::: flags 0x0

The output byte counter shows 0. With netbsd-7 everything worked fine.

As a result, it is no longer possible to record traffic data via snmp.


Regards
Uwe


bad counter for ixg* interfaces

2017-04-28 Thread 6bone

Hello,

ifconfig -v ixg0 shows:

ixg0: flags=8843 mtu 1500
capabilities=fff80
capabilities=fff80
capabilities=fff80
enabled=0
ec_capabilities=7
ec_enabled=7
address: a0:36:9f:d4:3c:08
media: Ethernet autoselect (10GbaseSR full-duplex,rxpause,txpause)
status: active
input: 22429456 packets, 12404771626 bytes, 132266 multicasts
output: 10573554 packets, 0 bytes
...


The outgoing packets are counted correctly. The outgoing bytes remain at 
0.


Kernel version: NetBSD 7.99.70. Older versions have the same problem.

Programs that evaluate these counters, such as snmpd, report wrong data.

Maybe someone can take a look at the code.
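An SNMP poller typically turns two samples of an interface octet counter into a rate, so a counter stuck at 0 yields a flat graph. A minimal sketch, assuming the 32-bit Counter32 semantics of the IF-MIB ifOutOctets object (it wraps at 2^32); the demo_ function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Difference between two samples of a 32-bit SNMP Counter32.
 * Unsigned arithmetic handles one wrap past 2^32 - 1 between polls.
 */
static uint32_t
demo_counter32_delta(uint32_t prev, uint32_t cur)
{
	return (uint32_t)(cur - prev);	/* computed modulo 2^32 */
}

/* Average rate (units per second) over a polling interval of secs seconds. */
static double
demo_counter32_rate(uint32_t prev, uint32_t cur, double secs)
{
	assert(secs > 0.0);
	return (double)demo_counter32_delta(prev, cur) / secs;
}
```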



Thank you for your efforts


Regards
Uwe



Re: npf bug(?)

2017-04-13 Thread 6bone

On Mon, 10 Apr 2017, Christos Zoulas wrote:


Npf just calls "error = ip_reass_packet(mp, ip)" and if that fails
it increments NPF_STAT_REASSFAIL. Since it uses the same exact
call the regular ip stack uses, I would expect that:

   NPF_STAT_REASSFAIL = IP_STAT_BADFRAGS + IP_STAT_RCVMEMDROP;


From your stats I see (without npf):


1795 fragments received

 335 malformed fragments dropped
 440 fragments dropped after timeout
 636 packets reassembled ok
  ---
 1411 what happened to the rest?

I guess we should look at the code to figure out whether we are not printing
some, or not updating the stats for some. Nevertheless, the stack
seems to be able to reassemble a bit more than 1/3 of the fragments, while 2/3
of them are dropped. What's the ratio with npf?



Sorry for the slow response; I could not restart the router until now. It 
now runs the kernel without the patch, and I made sure that all counters 
started at 0.


After a few hours the statistics look as follows:

npf:

Fragmentation:
870 fragments
774 reassembled
102 failed reassembly

netstat:
1644 fragments received
0 fragments dropped (dup or out of space)
0 fragments dropped (out of ipqent)
102 malformed fragments dropped
33 fragments dropped after timeout
774 packets reassembled ok

There is again a difference between the npf counters and the counters of 
the IP stack.



Regards
Uwe


Re: npf bug(?)

2017-04-09 Thread 6bone

On Sun, 9 Apr 2017, Christos Zoulas wrote:


Perhaps you get a lot of dup fragments? netstat -s should show you the
stack's reassembly and fragment stats. Perhaps those agree with what
npf shows?


The patch is currently active, which is why I have no npf statistics. The 
netstat statistics look plausible to me.


If npf handles reassembly, do the npf counters and the IP stack counters 
run in parallel? Or does the IP stack only count the packets that npf lets through?


netstat -s shows:

ip:
413339977 total packets received
0 bad header checksums
0 with size smaller than minimum
0 with data size < data length
0 with length > max ip packet size
0 with header length < data size
0 with data length < header length
0 with bad options
0 with incorrect version number
1795 fragments received
0 fragments dropped (dup or out of space)
0 fragments dropped (out of ipqent)
335 malformed fragments dropped
440 fragments dropped after timeout
636 packets reassembled ok
410154493 packets for this host
35 packets for unknown/unsupported protocol
10 packets forwarded (6 packets fast forwarded)
3183945 packets not forwardable
0 redirects sent
0 packets no matching gif found
218900862 packets sent from this host
0 packets sent with fabricated ip header
0 output packets dropped due to no bufs, etc.
0 output packets discarded due to no route
31819922 output datagrams fragmented
33505129 fragments created
0 datagrams that can't be fragmented
0 datagrams with bad address in header


Regards
Uwe



Re: npf bug(?)

2017-04-09 Thread 6bone

On Thu, 6 Apr 2017, Christos Zoulas wrote:



| Thanks for the patch. For me it works very well.

Well, the question is do you need it? I.e. why don't you let the v4
traffic flow through npf? Is it a performance issue or doesn't npf
do reassembly correctly?


I'm not sure whether reassembly works properly. After a few hours the 
statistics showed over 7000 failed reassemblies but only 1296 fragments, 
of which 1104 were reassembled. I find these numbers unusual. I also know 
no way to check whether reassembly works correctly, so I would like to 
disable packet reassembly.




| Is this a special solution just for me, or will the patch become part of
| the -current kernel?

Special, but we can consider adding the functionality in a more general
way if it is needed.



Regards
Uwe



Re: npf bug(?)

2017-04-06 Thread 6bone

On Mon, 3 Apr 2017, Christos Zoulas wrote:



Here's a rough patch that kills v4 processing.

christos



Thanks for the patch. For me it works very well.

Is this a special solution just for me, or will the patch become part of 
the -current kernel?



Regards
Uwe


Re: npf bug(?)

2017-04-02 Thread 6bone

On Sun, 2 Apr 2017, Christos Zoulas wrote:



I am trying to understand the use case here:
1. you want to have V4 DNS and 6to4 service that can generate V4 fragments
2. you want V4 fragments dropped.
3. you can't put V4 rules in your firewall to restrict traffic to only
  those services.

Is that correct?


That is not completely right. I want to filter IPv6 with npf. IPv4 should 
not be filtered. After activating npf, the statistics show:


Fragmentation:
1296 fragments
1104 reassembled
7160 failed reassembly

Since IPv6 is no longer being reassembled, these must be IPv4 packets. I want to 
make sure that the reassembly errors do not lead to packet loss, 
especially for 6to4.



Regards
Uwe


Re: reproducible kernel crash in NetBSD 7.1_RC1

2017-01-24 Thread 6bone

On Mon, 23 Jan 2017, Christos Zoulas wrote:


Date: Mon, 23 Jan 2017 19:35:06 +0000 (UTC)
From: Christos Zoulas 
To: current-users@netbsd.org
Subject: Re: reproducible kernel crash in NetBSD 7.1_RC1

I think that the vlan creation/removal code is racy even under /current.
There was some discussion recently about it.

Unfortunately I did not follow that discussion. Is there a plan to fix the 
problem in the near future?

christos


Regards
Uwe


Re: configure vlan-if jumbo mtu crashs kernel

2016-11-19 Thread 6bone

On Fri, 18 Nov 2016, Joerg Sonnenberger wrote:


Date: Fri, 18 Nov 2016 16:26:49 +0100
From: Joerg Sonnenberger 
To: current-users@netbsd.org
Subject: Re: configure vlan-if jumbo mtu crashs kernel

On Fri, Nov 18, 2016 at 02:35:03PM +0100, 6b...@6bone.informatik.uni-leipzig.de 
wrote:

In some networks we are working with jumbo MTUs. If I configure the MTU on
the native interface, everything works fine.

ifconfig ixg1 mtu 9000 (no problem)

but

ifconfig vlan850 ip4csum tcp4csum udp4csum tcp6csum udp6csum ip4csum-tx
ip4csum-rx tcp4csum-tx tcp4csum-rx udp4csum-tx udp4csum-rx tcp6csum-tx
tcp6csum-rx udp6csum-tx udp6csum-rx tso4 tso6 mtu 9000


The attached patch should prevent the crash, does it?


Yes, the patch solves the problem. Can you commit the patch to the 
netbsd-7 kernel and (if necessary) to -current?



Thank you for your efforts

Regards
Uwe



Joerg



configure vlan-if jumbo mtu crashs kernel

2016-11-18 Thread 6bone

hello,

In some networks we are working with jumbo MTUs. If I configure the MTU 
on the native interface, everything works fine.


ifconfig ixg1 mtu 9000 (no problem)

but

ifconfig vlan850 ip4csum tcp4csum udp4csum tcp6csum udp6csum ip4csum-tx 
ip4csum-rx tcp4csum-tx tcp4csum-rx udp4csum-tx udp4csum-rx tcp6csum-tx 
tcp6csum-rx udp6csum-tx udp6csum-rx tso4 tso6 mtu 9000


results in a kernel crash (netbsd-7):

#0  0xffffffff806867df in cpu_reboot (howto=howto@entry=260,
bootstr=bootstr@entry=0x0) at /usr/src/sys/arch/amd64/amd64/machdep.c:671
#1  0xffffffff808cc214 in vpanic (fmt=fmt@entry=0xffffffff80dd44dd "trap",
ap=ap@entry=0xfffffe815af29a28) at /usr/src/sys/kern/subr_prf.c:340
#2  0xffffffff808cc2cf in panic (fmt=fmt@entry=0xffffffff80dd44dd "trap")
at /usr/src/sys/kern/subr_prf.c:256
#3  0xffffffff8091b531 in trap (frame=0xfffffe815af29b30)
at /usr/src/sys/arch/amd64/amd64/trap.c:298
#4  0xffffffff80100fde in alltraps ()
#5  0xffffffff804c62fb in vlan_ioctl (ifp=0xfffffe8a70fc2410, cmd=2149607797,
data=0xfffffe815af29df8) at /usr/src/sys/net/if_vlan.c:553
#6  0xffffffff803e88f5 in doifioctl (so=0xfffffe8bf6dfadb0, cmd=2149607797,
data=, l=0xfffffe88032ca560) at /usr/src/sys/net/if.c:1934
#7  0xffffffff808e9976 in soo_ioctl (fp=, cmd=2149607797,
data=0xfffffe815af29df8) at /usr/src/sys/kern/sys_socket.c:202
#8  0xffffffff808deb79 in sys_ioctl (l=,
uap=0xfffffe815af29f00, retval=)
at /usr/src/sys/kern/sys_generic.c:681
#9  0xffffffff808e9a7a in sy_call (rval=0xfffffe815af29eb8,
uap=0xfffffe815af29f00, l=0xfffffe88032ca560,
sy=0xffffffff810d2ba0 ) at /usr/src/sys/sys/syscallvar.h:61
#10 sy_invoke (code=54, rval=0xfffffe815af29eb8, uap=0xfffffe815af29f00,
l=0xfffffe88032ca560, sy=0xffffffff810d2ba0 )
at /usr/src/sys/sys/syscallvar.h:85
#11 syscall (frame=0xfffffe815af29f00)
at /usr/src/sys/arch/x86/x86/syscall.c:156
#12 0xffffffff80100691 in Xsyscall ()


Is it possible to get a patch so that I can use jumbo frames with vlan 
interfaces?
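The cmd=2149607797 argument in the backtrace is an encoded ioctl request, and decoding it shows which request vlan_ioctl was handling. A minimal sketch using the BSD ioctl layout; the constants are restated from <sys/ioccom.h> with a DEMO_ prefix to mark them as local copies. 2149607797 is 0x80206975: an "in" (write) request with a 32-byte argument, group 'i', number 117. If I read NetBSD's <sys/sockio.h> correctly, _IOW('i', 117, struct ifcapreq) is SIOCSIFCAP (struct ifcapreq being a 16-byte name plus two 64-bit fields), which would point at the checksum/TSO flags on that ifconfig line rather than at the mtu; treat that as a hypothesis to check against the headers.

```c
#include <assert.h>

/* BSD ioctl command encoding (local copies of <sys/ioccom.h> constants). */
#define DEMO_IOC_VOID		0x20000000UL	/* no parameters */
#define DEMO_IOC_OUT		0x40000000UL	/* copy parameters out */
#define DEMO_IOC_IN		0x80000000UL	/* copy parameters in */
#define DEMO_IOCPARM_MASK	0x1fffUL	/* parameter length mask */
#define DEMO_IOCPARM_SHIFT	16
#define DEMO_IOCGROUP_SHIFT	8

struct demo_ioc {
	int in, out, is_void;
	unsigned len;	/* size of the argument structure in bytes */
	char group;	/* e.g. 'i' for interface requests */
	unsigned num;	/* request number within the group */
};

static struct demo_ioc
demo_decode_ioctl(unsigned long cmd)
{
	struct demo_ioc d;

	d.in = (cmd & DEMO_IOC_IN) != 0;
	d.out = (cmd & DEMO_IOC_OUT) != 0;
	d.is_void = (cmd & DEMO_IOC_VOID) != 0;
	d.len = (unsigned)((cmd >> DEMO_IOCPARM_SHIFT) & DEMO_IOCPARM_MASK);
	d.group = (char)((cmd >> DEMO_IOCGROUP_SHIFT) & 0xff);
	d.num = (unsigned)(cmd & 0xff);
	assert(d.len <= DEMO_IOCPARM_MASK);
	return d;
}
```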




Thank you for your efforts


Regards
Uwe



multipath fibre channel

2016-08-10 Thread 6bone

Hello,

I want to configure multipath for Fibre Channel storage. I only need 
availability, not performance. I have not found any documentation on this 
subject for NetBSD. Is multipath possible for FC storage? If not, is it 
possible or useful to use a software RAID 1 over both paths?



Regards
Uwe


Re: nfs client kernel crash

2016-06-20 Thread 6bone

On Fri, 17 Jun 2016, Christos Zoulas wrote:


| I am mounting with: nosuid,userquota,rw,tcp,soft,intr
|
| Is this a problem?

Yes :-), I'd get rid of "soft" first.

christos


I changed the nfs options to: userquota,nosuid,rw,tcp,bg

The message is still there.


Regards
Uwe



Re: nfs client kernel crash

2016-06-17 Thread 6bone

On Tue, 14 Jun 2016, Christos Zoulas wrote:


On Jun 14,  2:07pm, 6b...@6bone.informatik.uni-leipzig.de 
(6b...@6bone.informatik.uni-leipzig.de) wrote:
-- Subject: Re: nfs client kernel crash

| On Tue, 14 Jun 2016, Christos Zoulas wrote:
|
| > Yes, that helps. No crash though?
|
| So far, no crash. But this often takes several days.

Ok. Let's take a look at the EINTR issue then.

christos

First the good news: the patch seems to help! Since applying it I have 
not seen any more crashes. Can you commit the patch to -current and 
netbsd-7?



The second problem is the

'nfs server 172.18.86.13:/vol/vol_bsd_2: not responding'

messages. In some cases an NFS I/O error occurs at the same time. 
Unfortunately the printf statement is not executed, so the problem must 
occur somewhere else.



Regards
Uwe



Re: nfs client kernel crash

2016-06-14 Thread 6bone

On Tue, 14 Jun 2016, Christos Zoulas wrote:


Yes, that helps. No crash though?


So far, no crash. But this often takes several days.


Uwe



Re: nfs client kernel crash

2016-06-14 Thread 6bone

On Tue, 14 Jun 2016, 6b...@6bone.informatik.uni-leipzig.de wrote:

I applied the patch. For testing, I started an "rm -rf" on a large 
directory tree. dmesg reports:


nfs server 172.18.86.13:/vol/vol_bsd_2: is alive again
nfs server 172.18.86.13:/vol/vol_bsd_2: not responding
nfs server 172.18.86.13:/vol/vol_bsd_2: not responding
nfs server 172.18.86.13:/vol/vol_bsd_2: not responding
nfs server 172.18.86.13:/vol/vol_bsd_2: not responding
nfs server 172.18.86.13:/vol/vol_bsd_2: is alive again
nfs server 172.18.86.13:/vol/vol_bsd_2: not responding
nfs server 172.18.86.13:/vol/vol_bsd_2: is alive again
...



If it helps: 'rm -rf' randomly reports "Interrupted system call".


Regards
Uwe


Re: nfs client kernel crash

2016-06-14 Thread 6bone

On Mon, 13 Jun 2016, Christos Zoulas wrote:


Can you try this? The first one might not apply cleanly since I changed
the loop, but it should work just the same if you put the spl stuff around
the old loop.


I applied the patch. For testing, I started an "rm -rf" on a large 
directory tree. dmesg reports:


nfs server 172.18.86.13:/vol/vol_bsd_2: is alive again
nfs server 172.18.86.13:/vol/vol_bsd_2: not responding
nfs server 172.18.86.13:/vol/vol_bsd_2: not responding
nfs server 172.18.86.13:/vol/vol_bsd_2: not responding
nfs server 172.18.86.13:/vol/vol_bsd_2: not responding
nfs server 172.18.86.13:/vol/vol_bsd_2: is alive again
nfs server 172.18.86.13:/vol/vol_bsd_2: not responding
nfs server 172.18.86.13:/vol/vol_bsd_2: is alive again
...

This behavior already existed before I applied the patch. I think that the 
crash will also recur.


The storage is a NetApp NFS volume. The volume is mounted only on my 
server. The storage has no load problems.



Regards
Uwe


Re: nfs client kernel crash

2016-06-13 Thread 6bone

On Sat, 4 Jun 2016, Christos Zoulas wrote:


| The PR/50432 may describe the same problem.

Thanks!

christos



A few weeks ago I moved the storage of our NetBSD mirror to NFS. Since 
the change, the mirror has been crashing regularly. There are also other 
problems with NFS, so the mirror is not usable at the moment.


Can you have a look at the problem? Perhaps you can find a quick 
workaround.



Thank you for your efforts

Regards
Uwe


Re: nfs client kernel crash

2016-06-04 Thread 6bone

On Thu, 2 Jun 2016, Christos Zoulas wrote:


| The PR is for netbsd-5 and is some years old. Does it make sense to create a
| new bug report for netbsd-7?

Sure, and close 40491 saying superseded by the new one.


I have opened a new bug-report (kern/51215). The PR kern/40491 is already 
closed.


The PR/50432 may describe the same problem.


Regards
Uwe


Re: nfs client kernel crash

2016-06-01 Thread 6bone

On Thu, 2 Jun 2016, Christos Zoulas wrote:


Date: Thu, 2 Jun 2016 00:07:48 +0000 (UTC)
From: Christos Zoulas 
To: current-users@netbsd.org
Subject: Re: nfs client kernel crash

In article ,
<6b...@6bone.informatik.uni-leipzig.de> wrote:

hello

In rare cases, under high load, the NFS client crashes:

0xffffffff8068630f in cpu_reboot (howto=howto@entry=260,
bootstr=bootstr@entry=0x0) at
/usr/src/sys/arch/amd64/amd64/machdep.c:671
671 dumpsys();
(gdb) bt
#0  0xffffffff8068630f in cpu_reboot (howto=howto@entry=260,
bootstr=bootstr@entry=0x0) at
/usr/src/sys/arch/amd64/amd64/machdep.c:671
#1  0xffffffff808cbce4 in vpanic (fmt=fmt@entry=0xffffffff80dd3d3d "trap",
ap=ap@entry=0xfffffe8157920d18) at /usr/src/sys/kern/subr_prf.c:340
#2  0xffffffff808cbd9f in panic (fmt=fmt@entry=0xffffffff80dd3d3d "trap")
at /usr/src/sys/kern/subr_prf.c:256
#3  0xffffffff8091b001 in trap (frame=0xfffffe8157920e20)
at /usr/src/sys/arch/amd64/amd64/trap.c:298
#4  0xffffffff80100fde in alltraps ()
#5  0xffffffff8070605c in nfs_timer (arg=)
at /usr/src/sys/nfs/nfs_socket.c:770
#6  0xffffffff80622022 in callout_softclock (v=)
at /usr/src/sys/kern/kern_timeout.c:736
#7  0xffffffff80616bb8 in softint_execute (l=, s=2,
si=0xffff80035866a0c0) at /usr/src/sys/kern/kern_softint.c:589
#8  softint_dispatch (pinned=, s=2)
at /usr/src/sys/kern/kern_softint.c:871
#9  0xffffffff8011412f in Xsoftintr ()

kernel: NetBSD 7.0_STABLE

Any ideas what could be the problem?


See PR/40491. I don't think that the mutex there is correct. It seems
that in other places the nfs_reqq is protected with splsoftnet().

The PR is for netbsd-5 and is some years old. Does it make sense to create a 
new bug report for netbsd-7?



Regards
Uwe


nfs client kernel crash

2016-06-01 Thread 6bone

hello

In rare cases, under high load, the NFS client crashes:

0xffffffff8068630f in cpu_reboot (howto=howto@entry=260,
bootstr=bootstr@entry=0x0) at
/usr/src/sys/arch/amd64/amd64/machdep.c:671
671 dumpsys();
(gdb) bt
#0  0xffffffff8068630f in cpu_reboot (howto=howto@entry=260,
bootstr=bootstr@entry=0x0) at
/usr/src/sys/arch/amd64/amd64/machdep.c:671
#1  0xffffffff808cbce4 in vpanic (fmt=fmt@entry=0xffffffff80dd3d3d "trap",
ap=ap@entry=0xfffffe8157920d18) at /usr/src/sys/kern/subr_prf.c:340
#2  0xffffffff808cbd9f in panic (fmt=fmt@entry=0xffffffff80dd3d3d "trap")
at /usr/src/sys/kern/subr_prf.c:256
#3  0xffffffff8091b001 in trap (frame=0xfffffe8157920e20)
at /usr/src/sys/arch/amd64/amd64/trap.c:298
#4  0xffffffff80100fde in alltraps ()
#5  0xffffffff8070605c in nfs_timer (arg=)
at /usr/src/sys/nfs/nfs_socket.c:770
#6  0xffffffff80622022 in callout_softclock (v=)
at /usr/src/sys/kern/kern_timeout.c:736
#7  0xffffffff80616bb8 in softint_execute (l=, s=2,
si=0xffff80035866a0c0) at /usr/src/sys/kern/kern_softint.c:589
#8  softint_dispatch (pinned=, s=2)
at /usr/src/sys/kern/kern_softint.c:871
#9  0xffffffff8011412f in Xsoftintr ()

kernel: NetBSD 7.0_STABLE

Any ideas what could be the problem?


Regards
Uwe


Re: high cpu load with tcpdump

2016-03-01 Thread 6bone

On Mon, 29 Feb 2016, Christos Zoulas wrote:



This tcpdump ktrace is when bind is running, right?
What happens if you stop it? How does the ktrace look then?
Or once you start bind, tcpdump goes nuts and stays that way?


You're right. When tcpdump is started after bind, tcpdump causes 
problems.



Here is the ktrace (tcpdump started before bind):

  3451  1 tcpdump  1456839639.234979724 RET   read 0
  3451  1 tcpdump  1456839639.234983565 CALL  read(3,0x7f7ff7b16000,0x8)
  3451  1 tcpdump  1456839640.235319234 GIO   fd 3 read 0 bytes  ""
  3451  1 tcpdump  1456839640.235322516 RET   read 0
  3451  1 tcpdump  1456839640.235328872 CALL  read(3,0x7f7ff7b16000,0x8)
  3451  1 tcpdump  1456839641.235662445 GIO   fd 3 read 0 bytes  ""
  3451  1 tcpdump  1456839641.235665868 RET   read 0
  3451  1 tcpdump  1456839641.235670128 CALL  read(3,0x7f7ff7b16000,0x8)
  3451  1 tcpdump  1456839642.236012990 GIO   fd 3 read 0 bytes  ""
  3451  1 tcpdump  1456839642.236017460 RET   read 0
  3451  1 tcpdump  1456839642.236021371 CALL  read(3,0x7f7ff7b16000,0x8)
  3451  1 tcpdump  1456839643.236348030 GIO   fd 3 read 0 bytes  ""


In this case all works fine. There is also no change in the bind ktrace.
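Here each read() blocks for about one second and returns 0 (presumably the bpf read timeout expiring with no packets), whereas in the 2016-02-27 trace the read() calls follow each other every few microseconds. A minimal sketch for measuring that from the kdump timestamps; demo_ts and demo_ts_diff_ns are hypothetical helpers, not part of ktrace or kdump. Seconds and nanoseconds are kept as separate integers because a double cannot hold a 10-digit seconds value with full nanosecond precision.

```c
#include <assert.h>
#include <stdint.h>

/* A kdump timestamp such as 1456839640.235319234, split into its two parts. */
struct demo_ts {
	int64_t sec;
	int64_t nsec;	/* 0 .. 999999999 */
};

/* Returns b - a in nanoseconds. */
static int64_t
demo_ts_diff_ns(struct demo_ts a, struct demo_ts b)
{
	assert(a.nsec >= 0 && a.nsec < 1000000000);
	assert(b.nsec >= 0 && b.nsec < 1000000000);
	return (b.sec - a.sec) * 1000000000 + (b.nsec - a.nsec);
}
```

From the CALL at 1456839639.234983565 to the GIO at 1456839640.235319234 this gives 1000335669 ns, about one second per read(). In the 2016-02-27 trace, five consecutive CALLs span 24025 ns, four intervals of roughly 6 µs, or about 166,000 read() calls per second, consistent with the "more than 150,000" reported there.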


Regards
Uwe



Re: high cpu load with tcpdump

2016-02-29 Thread 6bone

On Mon, 29 Feb 2016, Christos Zoulas wrote:


| Hello,
|
| the problem occurs only on one of my servers. I tried to find the
| difference: it is bind9 (bind-9.10.3pl3). If I stop bind9, tcpdump
| works without problems. When I restart bind9, the CPU load goes back
| up to 100%.
|
| Is it a problem in the kernel, tcpdump, or bind9?

Can you ktrace the bind? Perhaps it is waking up tcpdump spuriously.
That would indicate a kernel problem.


ktrace tcpdump starting at timestamp 1456773618

  1847  1 tcpdump  1456773617.99648 CALL  read(3,0x7f7ff7b16000,0x8)
  1847  1 tcpdump  1456773618.01813 RET   read -1 errno 35 Resource 
temporarily unavailable
  1847  1 tcpdump  1456773618.03699 CALL  read(3,0x7f7ff7b16000,0x8)
  1847  1 tcpdump  1456773618.06213 RET   read -1 errno 35 Resource 
temporarily unavailable
  1847  1 tcpdump  1456773618.08029 CALL  read(3,0x7f7ff7b16000,0x8)
  1847  1 tcpdump  1456773618.10333 RET   read -1 errno 35 Resource 
temporarily unavailable
  1847  1 tcpdump  1456773618.12289 CALL  read(3,0x7f7ff7b16000,0x8)
  1847  1 tcpdump  1456773618.15641 RET   read -1 errno 35 Resource 
temporarily unavailable
  1847  1 tcpdump  1456773618.17667 CALL  read(3,0x7f7ff7b16000,0x8)
  1847  1 tcpdump  1456773618.20111 RET   read -1 errno 35 Resource 
temporarily unavailable
  1847  1 tcpdump  1456773618.22206 CALL  read(3,0x7f7ff7b16000,0x8)
  1847  1 tcpdump  1456773618.24860 RET   read -1 errno 35 Resource 
temporarily unavailable
  1847  1 tcpdump  1456773618.26746 CALL  read(3,0x7f7ff7b16000,0x8)
  1847  1 tcpdump  1456773618.28981 RET   read -1 errno 35 Resource 
temporarily unavailable
...


ktrace named starting at timestamp 1456773618

  2362  9 named    1456773617.648034355 CALL  setsockopt(0x262,0xffff,0x800,0x7f7ff09f8484,4)
  2362  6 named    1456773618.151726698 RET   setsockopt 0
  2362  6 named    1456773618.151732844 CALL  setsockopt(0x260,0xffff,0x2000,0x7f7ff15fb484,4)
  2362  3 named    1456773618.151748627 RET   setsockopt 0
  2362  6 named    1456773618.151749815 RET   setsockopt 0
  2362  3 named    1456773618.151760291 CALL  getsockopt(0x261,0xffff,0x1002,0x7f7ff21fe43c,0x7f7ff21fe438)
  2362  6 named    1456773618.151763224 CALL  getsockopt(0x260,0xffff,0x1002,0x7f7ff15fb48c,0x7f7ff15fb488)
  2362  9 named    1456773618.151783617 RET   setsockopt 0
  2362  3 named    1456773618.151790601 RET   getsockopt 0
  2362  6 named    1456773618.151792696 RET   getsockopt 0
  2362  5 named    1456773618.151793395 RET   __socket30 14/0xe
  2362  3 named    1456773618.151807642 CALL  bind(0x261,0x7f7ff21fe630,0x10)
  2362  5 named    1456773618.151812601 CALL  fcntl(0xe,0,0x200)
  2362  9 named    1456773618.151791509 CALL  setsockopt(0x262,0xffff,0x2000,0x7f7ff09f8484,4)
  2362  3 named    1456773618.151817071 MISC  mbsoname: [0.0.0.0]
  2362  6 named    1456773618.151803941 CALL  bind(0x260,0x7f7ff15fb680,0x10)
  2362  6 named    1456773618.151832016 MISC  mbsoname: [0.0.0.0]
  2362  3 named    1456773618.151861768 RET   bind 0
  2362  5 named    1456773618.151883279 RET   fcntl 611/0x263
  2362  3 named    1456773618.151884815 CALL  recvmsg(0x261,0x7f7ff21fe320,0)
  2362  5 named    1456773618.151889844 CALL  close(0xe)
  2362  3 named    1456773618.151893545 MISC  msghdr: [name=0x7f7fef511088, namelen=128, iov=0x7f7ff21fe350, iovlen=1, control=0x7f7ff39a6fa0, controllen=96, flags=400]
  2362  5 named    1456773618.151908980 RET   close 0
  2362  5 named    1456773618.151913729 CALL  fcntl(0x263,3,0)
  2362  5 named    1456773618.151916732 RET   fcntl 2
  2362  5 named    1456773618.151920992 CALL  fcntl(0x263,4,6)
  2362  9 named    1456773618.151922808 RET   setsockopt 0
  2362  9 named    1456773618.151943900 CALL  getsockopt(0x262,0xffff,0x1002,0x7f7ff09f848c,0x7f7ff09f8488)
  2362  9 named    1456773618.151980705 RET   getsockopt 0
  2362  9 named    1456773618.151992299 CALL  bind(0x262,0x7f7ff09f8680,0x10)
  2362  9 named    1456773618.15251 MISC  mbsoname: [0.0.0.0]
  2362 11 named    1456773618.152002845 RET   close 0
  2362 11 named    1456773618.152016184 CALL  read(5,0x7f7ff01f6f10,8)
  2362  9 named    1456773618.152018768 RET   bind 0
  2362  6 named    1456773618.152018140 RET   bind 0
  2362  3 named    1456773618.152029244 RET   recvmsg -1 errno 35 Resource temporarily unavailable
  2362 11 named    1456773618.152035390 GIO   fd 5 read 8 bytes ",\^B\0\0\M-{\M^?\M^?\M^?"
  2362 11 named    1456773618.152041396 RET   read 8
  2362  6 named    1456773618.152040698 CALL  recvmsg(0x260,0x7f7ff15fb370,0)
  2362  9 named    1456773618.152033854 CALL  recvmsg(0x262,0x7f7ff09f8370,0)
  2362 11 named    1456773618.152050266 CALL

Re: high cpu load with tcpdump

2016-02-29 Thread 6bone

On Sat, 27 Feb 2016, 6b...@6bone.informatik.uni-leipzig.de wrote:


Date: Sat, 27 Feb 2016 23:52:24 +0100 (CET)
From: 6b...@6bone.informatik.uni-leipzig.de
To: Joerg Sonnenberger 
Cc: current-users@netbsd.org
Subject: Re: high cpu load with tcpdump

On Sat, 27 Feb 2016, Joerg Sonnenberger wrote:


fstat should tell you what the file descriptor is, I just want to
identify what device seems to have the trouble.


Hello,

the problem occurs only on one of my servers. I tried to find the 
difference: it is bind9 (bind-9.10.3pl3). If I stop bind9, tcpdump 
works without problems. When I restart bind9, the CPU load goes back 
up to 100%.


Is it a problem in the kernel, tcpdump, or bind9?


Thank you for your efforts


Regards
Uwe


Re: high cpu load with tcpdump

2016-02-27 Thread 6bone

On Sat, 27 Feb 2016, Joerg Sonnenberger wrote:


fstat should tell you what the file descriptor is, I just want to
identify what device seems to have the trouble.



You're right.

USER     CMD      PID   FD MOUNT        INUM MODE         SZ|DV R/W
_tcpdump tcpdump  823 root /        14249633 drwxr-xr-x     512 r
_tcpdump tcpdump  823   wd /        14249633 drwxr-xr-x     512 r
_tcpdump tcpdump  823    0 /dev/pts        5 crw--w--w-   pts/1 rw
_tcpdump tcpdump  823    1 /dev/pts        5 crw--w--w-   pts/1 rw
_tcpdump tcpdump  823    2 /dev/pts        5 crw--w--w-   pts/1 rw
_tcpdump tcpdump  823    3* bpf rec=0, dr=0, cap=0, pid=823, seesent, idle
_tcpdump tcpdump  823    4 /         9157214 -rw-r--r--   25628 r
_tcpdump tcpdump  823    5* kqueue pending 0
_tcpdump tcpdump  823    6 /         9157439 -rw-r--r--      71 r



Regards
Uwe



Re: high cpu load with tcpdump

2016-02-27 Thread 6bone

On Sat, 27 Feb 2016, Joerg Sonnenberger wrote:


Date: Sat, 27 Feb 2016 20:38:37 +0100
From: Joerg Sonnenberger 
To: current-users@netbsd.org
Subject: Re: high cpu load with tcpdump

On Sat, Feb 27, 2016 at 08:18:41PM +0100, 6b...@6bone.informatik.uni-leipzig.de 
wrote:

  5015  1 tcpdump  1456559035.621583576 CALL  read(3,0x7f7ff7b16000,0x8)


FD 3 is a BPF instance?

Joerg



I don't know what FD 3 is used for; perhaps it is a BPF descriptor.

modstat | grep bpf
bpf  driver builtin7 0--
if_athn_usb  driver builtin0 0-bpf
if_axe   driver builtin0 0-bpf
if_axen  driver builtin0 0-bpf
if_rum   driver builtin0 0-bpf
if_run   driver builtin0 0-bpf
if_urtw  driver builtin0 0-bpf
if_urtwn driver builtin0 0-bpf

But I think this is the default for all kernels based on the GENERIC 
configuration. The system has some ipfilter rules but no npf 
configuration. Stopping ipfilter has no effect on the tcpdump problem.



Regards
Uwe


Re: high cpu load with tcpdump

2016-02-27 Thread 6bone

On Fri, 26 Feb 2016, Christos Zoulas wrote:


Date: Fri, 26 Feb 2016 14:52:46 +0000 (UTC)
From: Christos Zoulas 
To: current-users@netbsd.org
Subject: Re: high cpu load with tcpdump

In article ,
<6b...@6bone.informatik.uni-leipzig.de> wrote:

Hello,

On my router, tcpdump always uses 100% CPU.

  PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
 3403 _tcpdump  38    0    19M 3016K RUN/6      0:24 98.08% 70.02% tcpdump
    0 root       0    0     0K 1182M CPU/7     49:35  0.00%  3.76% [system]

The problem also occurs when tcpdump is listening on an interface
with no network traffic. Therefore, it cannot be a load problem.

The problem does not occur on all interfaces.

ixg(0|1), bnx(0|1) and tap(0|1) are affected.

lo0, stf0 and vlan* are not affected.


System is: NetBSD 7.0_STABLE

Any idea what could be the problem?


Thank you for your efforts.


ktrace please?

christos


...
  5015  1 tcpdump  1456559035.621583576 CALL  read(3,0x7f7ff7b16000,0x8)
  5015  1 tcpdump  1456559035.621593842 RET   read -1 errno 35 Resource 
temporarily unavailable
  5015  1 tcpdump  1456559035.621595938 CALL  read(3,0x7f7ff7b16000,0x8)
  5015  1 tcpdump  1456559035.621598033 RET   read -1 errno 35 Resource 
temporarily unavailable
  5015  1 tcpdump  1456559035.621599849 CALL  read(3,0x7f7ff7b16000,0x8)
  5015  1 tcpdump  1456559035.621601944 RET   read -1 errno 35 Resource 
temporarily unavailable
  5015  1 tcpdump  1456559035.621603690 CALL  read(3,0x7f7ff7b16000,0x8)
  5015  1 tcpdump  1456559035.621605785 RET   read -1 errno 35 Resource 
temporarily unavailable
  5015  1 tcpdump  1456559035.621607601 CALL  read(3,0x7f7ff7b16000,0x8)
  5015  1 tcpdump  1456559035.621609626 RET   read -1 errno 35 Resource 
temporarily unavailable
...

There are more than 150,000 read operations per second.
When a data packet is sent or received by the interface, the data packet 
is displayed correctly.
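The trace shows read() on fd 3 (listed as bpf by fstat elsewhere in this thread) failing with errno 35, which is EAGAIN on NetBSD, in a tight loop: the descriptor is nonblocking and the caller keeps retrying, which fits Christos's suggestion that something wakes tcpdump spuriously. A minimal POSIX sketch of the difference, using a pipe as a stand-in for the bpf device (an assumption; this is not libpcap's or tcpdump's code): read() on an empty nonblocking descriptor returns EAGAIN at once, while waiting in poll() first sleeps until data arrives or the timeout expires.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Put fd into nonblocking mode; returns 0 on success, -1 on error. */
static int
demo_set_nonblock(int fd)
{
	int fl = fcntl(fd, F_GETFL);

	if (fl == -1)
		return -1;
	return fcntl(fd, F_SETFL, fl | O_NONBLOCK) == -1 ? -1 : 0;
}

/*
 * Sleep in poll() for up to timeout_ms, then read once.  Returns the
 * number of bytes read, 0 if nothing became readable, or -1 on error.
 * A loop that skipped poll() and retried read() directly would spin on
 * EAGAIN, as in the ktrace above.
 */
static ssize_t
demo_wait_and_read(int fd, void *buf, size_t len, int timeout_ms)
{
	struct pollfd pfd;
	int n;

	assert(buf != NULL);
	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;
	n = poll(&pfd, 1, timeout_ms);
	if (n <= 0)
		return n;	/* 0: timed out, -1: poll() failed */
	return read(fd, buf, len);
}
```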



Regards
Uwe



high cpu load with tcpdump

2016-02-26 Thread 6bone

Hello,

On my router, tcpdump always uses 100% CPU.

  PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
 3403 _tcpdump  38    0    19M 3016K RUN/6      0:24 98.08% 70.02% tcpdump
    0 root       0    0     0K 1182M CPU/7     49:35  0.00%  3.76% [system]

The problem also occurs when tcpdump is listening on an interface 
with no network traffic. Therefore, it cannot be a load problem.


The problem does not occur on all interfaces.

ixg(0|1), bnx(0|1) and tap(0|1) are affected.

lo0, stf0 and vlan* are not affected.


System is: NetBSD 7.0_STABLE

Any idea what could be the problem?


Thank you for your efforts.


Regards
Uwe



Re: nfs client and quota

2016-01-27 Thread 6bone

On Mon, 25 Jan 2016, Manuel Bouyer wrote:


Date: Mon, 25 Jan 2016 15:28:49 +0100
From: Manuel Bouyer 
To: 6b...@6bone.informatik.uni-leipzig.de
Cc: current-users@netbsd.org
Subject: Re: nfs client and quota

On Mon, Jan 25, 2016 at 02:57:08PM +0100, 6b...@6bone.informatik.uni-leipzig.de 
wrote:

Where can I find the file getnfsquota.c?

"find /usr/src/ -name getnfsquota.c -type f" reports no result.


I probably looked at old sources, in netbsd-7 it's function
__quota_nfs_get in lib/libquota/quota_nfs.c




at file lib/libquota/quota_nfs.c:180

ret = callaurpc(host, RQUOTAPROG, EXT_RQUOTAVERS, RQUOTAPROC_GETQUOTA, 
(xdrproc_t)xdr_ext_getquota_args, _gq_args, 
(xdrproc_t)xdr_getquota_rslt, _rslt);


returns (RPC_PROGNOTREGISTERED=15)

the value of host is 172.18.86.9, 'rpcinfo -p 172.18.86.9' reports

bash-4.3# rpcinfo -p 172.18.86.9
   program vers proto   port  service
    100000    2   udp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    3   udp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    4   tcp    111  portmapper
    100003    3   udp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    400010    1   tcp   2049
    100005    1   udp    635  mountd
    100005    2   udp    635  mountd
    100005    3   udp    635  mountd
    100005    1   tcp    635  mountd
    100005    2   tcp    635  mountd
    100005    3   tcp    635  mountd
    100021    4   udp   4045  nlockmgr
    100021    4   tcp   4045  nlockmgr
    100024    1   udp   4046  status
    100024    1   tcp   4046  status
    100011    1   udp   4049  rquotad



Regards
Uwe



Re: nfs client and quota

2016-01-25 Thread 6bone

Where can I find the file getnfsquota.c?

"find /usr/src/ -name getnfsquota.c -type f" reports no result.


Regards
Uwe



On Tue, 19 Jan 2016, Manuel Bouyer wrote:


Date: Tue, 19 Jan 2016 09:29:34 +0100
From: Manuel Bouyer 
To: 6b...@6bone.informatik.uni-leipzig.de
Cc: current-users@netbsd.org
Subject: Re: nfs client and quota

On Tue, Jan 19, 2016 at 09:13:54AM +0100, 6b...@6bone.informatik.uni-leipzig.de 
wrote:

[...]
    100024    1   udp   4046  status
    100024    1   tcp   4046  status
    100011    1   udp   4049  rquotad


Only version 1 rquotad; this may be the issue. The servers I tested all
support version 2.

I guess the problem is in src/lib/libquota/getnfsquota.c:getnfsquota()
If callaurpc() for EXT_RQUOTAVERS fails, it's supposed to try
RQUOTAVERS. Can you try to trace what happens here ?
Is the first callaurpc() call returning RPC_PROGVERSMISMATCH or something
else ? Is rpcqtype really RQUOTA_USRQUOTA ?
What is returning the second call ?

--
Manuel Bouyer 
NetBSD: 26 years of experience will always make the difference
--



Re: nfs client and quota

2016-01-19 Thread 6bone

On Tue, 19 Jan 2016, Manuel Bouyer wrote:


Date: Tue, 19 Jan 2016 09:01:10 +0100
From: Manuel Bouyer 
To: 6b...@6bone.informatik.uni-leipzig.de
Cc: current-users@netbsd.org
Subject: Re: nfs client and quota

On Tue, Jan 19, 2016 at 08:23:37AM +0100, 6b...@6bone.informatik.uni-leipzig.de 
wrote:

hello,

I am trying to display the quota on an NFS client. The NFS server is a
NetApp. With a Linux client everything works fine. NetBSD-7 does not show anything.

At boot time the nfs client starts the following services:

rpcbind=YES
nfs_client=YES
lockd=YES
statd=YES

There is no firewall running.

Are quotas for NFS clients implemented in NetBSD?


Yes


What could be the problem?


what does rpcinfo -p against the server reports ?


bash-4.3#  rpcinfo -p 172.18.86.9
   program vers proto   port  service
    100000    2   udp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    3   udp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    4   tcp    111  portmapper
    100003    3   udp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    400010    1   tcp   2049
    100005    1   udp    635  mountd
    100005    2   udp    635  mountd
    100005    3   udp    635  mountd
    100005    1   tcp    635  mountd
    100005    2   tcp    635  mountd
    100005    3   tcp    635  mountd
    100021    4   udp   4045  nlockmgr
    100021    4   tcp   4045  nlockmgr
    100024    1   udp   4046  status
    100024    1   tcp   4046  status
    100011    1   udp   4049  rquotad



nfs client and quota

2016-01-18 Thread 6bone

hello,

I am trying to display the quota on an NFS client. The NFS server is a 
NetApp. With a Linux client everything works fine. NetBSD-7 does not show 
anything.


At boot time the nfs client starts the following services:

rpcbind=YES
nfs_client=YES
lockd=YES
statd=YES

There is no firewall running.

Are quotas for NFS clients implemented in NetBSD? What could be the 
problem?




Regards
Uwe


netbsd-7 crash

2016-01-04 Thread 6bone

hello,

Today my NetBSD router crashed after about 3 months of uptime. To me it 
looks like an error in the network code. Maybe someone can see something 
more specific in the dump.


If more information is needed, I have the crashdump and the kernel with 
debug information.




Regards
Uwe

Reading symbols from /netbsd...done.
(gdb) target kvm /tmp/netbsd.22.core
(gdb) bt
#0  0xffffffff80689f9f in cpu_reboot (howto=howto@entry=260,
bootstr=bootstr@entry=0x0) at 
/usr/src/sys/arch/amd64/amd64/machdep.c:671
#1  0xffffffff808b72b4 in vpanic (
fmt=0xffffffff80d000f8 "kernel %sassertion \"%s\" failed: file \"%s\", line %d ",
ap=ap@entry=0xfffffe813a5ef9f0) at /usr/src/sys/kern/subr_prf.c:340
#2  0xffffffff80a53b83 in kern_assert (
fmt=fmt@entry=0xffffffff80d000f8 "kernel %sassertion \"%s\" failed: file \"%s\", line %d ")
at /usr/src/sys/lib/libkern/kern_assert.c:51
#3  0xffffffff8096830b in m_freem (m=0xfffffe813c896600)
at /usr/src/sys/kern/uipc_mbuf.c:652
#4  0xffffffff8056f758 in ipf_fastroute (m0=0xfffffe813c896600,
mpp=mpp@entry=0xfffffe813a5efb70, fin=fin@entry=0xfffffe813a5efb78,
fdp=fdp@entry=0x0)
at /usr/src/sys/external/bsd/ipf/netinet/ip_fil_netbsd.c:1073
#5  0xffffffff8056eb4d in ipf_send_ip (fin=fin@entry=0xfffffe813a5efd50,
m=m@entry=0xfffffe813c896600)
at /usr/src/sys/external/bsd/ipf/netinet/ip_fil_netbsd.c:849
#6  0xffffffff8056f033 in ipf_send_icmp_err (type=,
type@entry=3, fin=fin@entry=0xfffffe813a5efd50, dst=dst@entry=0)
at /usr/src/sys/external/bsd/ipf/netinet/ip_fil_netbsd.c:1033
#7  0xffffffff80356917 in ipf_check (ctx=0xffffffff81106600 ,
ip=, hlen=, ifp=,
out=, mp=0xfffffe813a5efea0)
at /usr/src/sys/external/bsd/ipf/netinet/fil.c:3072
#8  0xffffffff8074d2ef in pfil_run_hooks (ph=,
mp=mp@entry=0xfffffe813a5efee8, ifp=0xfffffe813c5bd810, dir=dir@entry=1)
at /usr/src/sys/net/pfil.c:266
#9  0xffffffff8055a8b1 in ip6_input (m=0xfffffe87f5c0e600)
at /usr/src/sys/netinet6/ip6_input.c:350
#10 0xffffffff8055b265 in ip6intr (arg=)
at /usr/src/sys/netinet6/ip6_input.c:238
#11 0xffffffff8061aa98 in softint_execute (l=, s=4,
si=0xffff80023b4e7230) at /usr/src/sys/kern/kern_softint.c:589
#12 softint_dispatch (pinned=, s=4)
at /usr/src/sys/kern/kern_softint.c:871
#13 0xffffffff8011412f in Xsoftintr ()
(gdb)


Re: agr issue in netbsd-7

2016-01-03 Thread 6bone

On Wed, 19 Aug 2015, Havard Eidnes wrote:


In the meantime, perhaps someone of you could file a PR?
(so this doesn't get lost in the archives...)


Done, PR#50155.

Regards,

- Håvard


Hello,

Is there any chance that this problem will be fixed in the near future? 
The workaround described in the PR does not work correctly: if you set 
the link address with ifconfig, LACP keeps using the hardware MAC 
address.


ifconfig bnx1
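bnx1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
capabilities=3f00<IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx>
capabilities=3f00<UDP4CSUM_Rx,UDP4CSUM_Tx>
enabled=3f00<IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx>
enabled=3f00<UDP4CSUM_Rx,UDP4CSUM_Tx>
ec_capabilities=7<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU>
ec_enabled=0
address: 00:19:b9:b0:f1:43
media: Ethernet autoselect (1000baseT full-duplex,master)
status: active
link 00:19:b9:b0:f1:45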

The LACP information from the switch:

Partner's information:

                  LACP port                        Admin  Oper   Port    Port
Port      Flags   Priority  Dev ID          Age    key    Key    Number  State
Gi7/44    SA      32768     0019.b9b0.f145  13s    0x0    0xD0   0x1     0x3D
Gi7/46    SA      32768     0019.b9b0.f143  15s    0x0    0xD0   0x4     0xD

Port Gi7/46 still sees the hardware MAC address of bnx1 (0019.b9b0.f143).


Thank you for your efforts

Regards
Uwe

Re: netbsd-7 quota issue

2015-12-14 Thread 6bone

On Fri, 11 Dec 2015, Manuel Bouyer wrote:


On Fri, Dec 11, 2015 at 03:15:06PM +0100, 6b...@6bone.informatik.uni-leipzig.de 
wrote:

I'm trying to activate quotas on a NetBSD-7 system, but the startup script
returns an exit code of 1.

[running /etc/rc.d/quota]
Checking quotas: done.
/etc/rc.d/quota exited with code 1


Do you have any error in /var/run/rc.log ?


No. rc.log shows only the three lines.





When I try to disable quotas with 'quotaoff -a', the process goes into an
uninterruptible state, uses 100% CPU and cannot be terminated.

Here my fstab:

# NetBSD /targetroot/etc/fstab
# See /usr/share/examples/fstab/ for more examples.
/dev/sd0a   /   ffs rw,userquota 1 1


I would suggest using the new quota2, though.
See tunefs(8) and fsck_ffs(8)


It was not possible to activate quotas with tunefs, because my kernel was 
built with INSECURE commented out ("#INSECURE"). I now use a kernel with 
INSECURE enabled, and tunefs can activate quotas. rc.log now looks like 
this:


[running /etc/rc.d/quota]
Checking quotas:quotacheck: filesystem / has quotas already turned on

Question: Can I use QUOTA2 with a kernel that has INSECURE commented out 
('#INSECURE')?

Is it still necessary to run /etc/rc.d/quota at boot, or to use the 
userquota option in fstab?


Thank you for your efforts.

Regards
Uwe


agr issue in netbsd-7

2015-07-30 Thread 6bone

hello,

I tried to configure a port channel (agr0).
When I configure the port channel only with bnx0 or only with bnx1 
everything works. If I use bnx0 and bnx1, the Cisco switch sets one of 
the two links to suspended mode.



bnx0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
capabilities=3f00<IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx>
capabilities=3f00<UDP4CSUM_Rx,UDP4CSUM_Tx>
enabled=0
ec_capabilities=7<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU>
ec_enabled=0
address: 00:19:b9:b0:f1:45
media: Ethernet autoselect (1000baseT full-duplex,master)
status: active

bnx1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
capabilities=3f00<IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx>
capabilities=3f00<UDP4CSUM_Rx,UDP4CSUM_Tx>
enabled=0
ec_capabilities=7<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU>
ec_enabled=0
address: 00:19:b9:b0:f1:43
media: Ethernet autoselect (1000baseT full-duplex)
status: active

agr0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
capabilities=3f00<IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx>
capabilities=3f00<UDP4CSUM_Rx,UDP4CSUM_Tx>
enabled=0
agrport: bnx0, flags=0x3<COLLECTING,DISTRIBUTING>
agrport: bnx1, flags=0x0
address: 00:19:b9:b0:f1:45
inet 139.18.25.36 netmask 0xfffffff8 broadcast 139.18.25.39
inet6 fe80::219:b9ff:feb0:f145%agr0 prefixlen 64 scopeid 0x6


The Cisco Catalyst reports:

Channel group 4 neighbors

Partner's information:

                  LACP port                        Admin  Oper   Port    Port
Port      Flags   Priority  Dev ID          Age    key    Key    Number  State
Gi7/44    SA      32768     0019.b9b0.f145  14s    0x0    0xD0   0x1     0x3D
Gi7/46    SA      32768     0019.b9b0.f143  14s    0x0    0xD0   0x4     0xD


Maybe the problem is the Dev ID: the two ports report different IDs 
(0019.b9b0.f145 and 0019.b9b0.f143). I think the device ID should be the 
same for all ports in a port channel.
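The suspicion can be illustrated with a simplified form of the LACP aggregation rule from IEEE 802.1AX: ports are bundled only if they see the same partner system ID and the same partner key. The C sketch below is a hypothetical model (the `lacp_partner` struct and `lacp_same_aggregator` function are made up for illustration, not taken from agr(4) or the switch), filled in with the two rows from the table above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of one row of "Partner's information". */
struct lacp_partner {
	uint8_t  sys_id[6];	/* partner system ID ("Dev ID"), a MAC address */
	uint16_t oper_key;	/* partner operational key ("Oper Key") */
};

/* Simplified 802.1AX rule: two ports may share one aggregation only if
 * they see the same partner system ID and the same partner key. */
static bool
lacp_same_aggregator(const struct lacp_partner *a,
    const struct lacp_partner *b)
{
	assert(a != NULL && b != NULL);
	return memcmp(a->sys_id, b->sys_id, sizeof(a->sys_id)) == 0 &&
	    a->oper_key == b->oper_key;
}

/* The two rows of the Catalyst output above. */
static const struct lacp_partner gi7_44 = {
	{ 0x00, 0x19, 0xb9, 0xb0, 0xf1, 0x45 }, 0xD0 };
static const struct lacp_partner gi7_46 = {
	{ 0x00, 0x19, 0xb9, 0xb0, 0xf1, 0x43 }, 0xD0 };
```

With these values the check fails because the two Dev IDs differ in the last byte (f1:45 vs. f1:43), which is consistent with the switch suspending one of the two links.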



Can someone take a look at the problem?


Thank you for your efforts


Regards
Uwe


Re: question to stf interface (current)

2015-06-10 Thread 6bone

On Sun, 31 May 2015, Christos Zoulas wrote:


Let's keep monitoring it, and perhaps we can run a tcpdump to capture the
exact packet and see what it contains...



It took some time, but I have now identified the packets: responses to DNS 
requests trigger the error messages. Only DNS replies sent to 2002::/16 
addresses are affected.


For example:

14:22:19.482338 AF IPv6 (24), length 551: (hlim 64, next-header UDP (17) 
payload length: 507) 2001:638:902:1::10.53 > 
2002:d9f5:7de1:8000:3631:c4ff:fef3:6b3d.51711: [udp sum ok] 47776| q: A? 
au.download.windowsupdate.com. 10/7/0 au.download.windowsupdate.com. 
[37m57s] CNAME audownload.windowsupdate.nsatc.net., 
audownload.windowsupdate.nsatc.net. [7m56s] CNAME au.au-msedge.net., 
au.au-msedge.net. [1m] CNAME au.c-0001.c-msedge.net., 
au.c-0001.c-msedge.net. [34s] CNAME au.c-0001.e-msedge.net., 
au.c-0001.e-msedge.net. [4m] CNAME 
edgereturn.audownload.windowsupdate.nsatc.net., 
edgereturn.audownload.windowsupdate.nsatc.net. [10m] CNAME 
edgehop.audownload.windowsupdate.nsatc.net., 
edgehop.audownload.windowsupdate.nsatc.net. [10m] CNAME 
au.download.windowsupdate.com.edgesuite.net., 
au.download.windowsupdate.com.edgesuite.net. [3m15s] CNAME 
a767.dscd.akamai.net., a767.dscd.akamai.net. [20s] A 212.201.100.136, 
a767.dscd.akamai.net. [20s] A 212.201.100.135 ns: dscd.akamai.net. 
[31m17s] NS n5dscd.akamai.net., dscd.akamai.net. [31m17s] NS 
n7dscd.akamai.net., dscd.akamai.net. [31m17s] NS n2dscd.akamai.net., 
dscd.akamai.net. [31m17s] NS n6dscd.akamai.net., dscd.akamai.net. [31m17s] 
NS n0dscd.akamai.net., dscd.akamai.net. [31m17s] NS n1dscd.akamai.net., 
dscd.akamai.net. [31m17s] NS n3dscd.akamai.net. (499)
14:22:19.538227 AF IPv6 (24), length 551: (hlim 64, next-header UDP (17) 
payload length: 507) 2001:638:902:1::10.53 > 
2002:d9f5:7de1:8000:3631:c4ff:fef3:6b3d.51711: [udp sum ok] 60014| q: A? 
au.download.windowsupdate.com. 10/7/0 au.download.windowsupdate.com. 
[37m57s] CNAME audownload.windowsupdate.nsatc.net., 
audownload.windowsupdate.nsatc.net. [7m56s] CNAME au.au-msedge.net., 
au.au-msedge.net. [1m] CNAME au.c-0001.c-msedge.net., 
au.c-0001.c-msedge.net. [34s] CNAME au.c-0001.e-msedge.net., 
au.c-0001.e-msedge.net. [4m] CNAME 
edgereturn.audownload.windowsupdate.nsatc.net., 
edgereturn.audownload.windowsupdate.nsatc.net. [10m] CNAME 
edgehop.audownload.windowsupdate.nsatc.net., 
edgehop.audownload.windowsupdate.nsatc.net. [10m] CNAME 
au.download.windowsupdate.com.edgesuite.net., 
au.download.windowsupdate.com.edgesuite.net. [3m15s] CNAME 
a767.dscd.akamai.net., a767.dscd.akamai.net. [20s] A 212.201.100.135, 
a767.dscd.akamai.net. [20s] A 212.201.100.136 ns: dscd.akamai.net. 
[31m17s] NS n2dscd.akamai.net., dscd.akamai.net. [31m17s] NS 
n6dscd.akamai.net., dscd.akamai.net. [31m17s] NS n4dscd.akamai.net., 
dscd.akamai.net. [31m17s] NS n7dscd.akamai.net., dscd.akamai.net. [31m17s] 
NS n5dscd.akamai.net., dscd.akamai.net. [31m17s] NS n3dscd.akamai.net., 
dscd.akamai.net. [31m17s] NS n1dscd.akamai.net. (499)



dmesg reports:

Jun 10 14:22:19 gate /netbsd: nd6_storelladdr: bad gateway address type 
inet6: 2002:8b12:1921:: for dst 2002:d9f5:7de1:8000:3631:c4ff:fef3:6b3d 
through interface vlan404


The DNS server runs on the machine with the stf interface.

stf0: flags=1<UP> mtu 1280
inet6 2002:8b12:1921:: prefixlen 16

ixg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
capabilities=fff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
status: active
inet 139.18.25.33 netmask 0xfffffff8 broadcast 139.18.25.39
inet alias 139.18.25.34 netmask 0xffffffff broadcast 139.18.25.34
inet alias 192.88.99.1 netmask 0xffffffff broadcast 192.88.99.1
inet6 fe80::a236:9fff:fe27:4330%ixg0 prefixlen 64 scopeid 0x1
inet6 2001:638:902:1::1 prefixlen 64
inet6 2001:638:902:1::10 prefixlen 128


vlan404: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
vlan: 404 parent: ixg0
inet6 fe80::a236:9fff:fe27:4330%vlan404 prefixlen 64 scopeid 0xf
inet6 2001:638:c:a00a::2 prefixlen 64


and the routing table:

IPv4
default            139.18.25.38          UG          -        -      -  L ixg0

IPv6
default            2001:638:c:a00a::1    UG          -        -      -  vlan404




Any ideas what could be the problem?


Regards
Uwe



Re: question to stf interface (current)

2015-05-28 Thread 6bone

On Thu, 28 May 2015, Christos Zoulas wrote:


Date: Thu, 28 May 2015 09:30:10 -0400
From: Christos Zoulas chris...@zoulas.com
To: 6b...@6bone.informatik.uni-leipzig.de
Cc: current-users@netbsd.org
Subject: Re: question to stf interface (current)

On May 28,  3:13pm, 6b...@6bone.informatik.uni-leipzig.de 
(6b...@6bone.informatik.uni-leipzig.de) wrote:
-- Subject: Re: question to stf interface (current)

| On Thu, 28 May 2015, Christos Zoulas wrote:
|
|  I think it has to go the other way (from the router) for this to happen.
|  Can you also post ifconfig -a?
| 
|  christos
|
| Here the ifconfig -a

Thanks. Are you having issues because of the dropped packets, or are
you just worried about the messages?


I am worried about the error message and would like to know where it comes 
from.



Uwe


question to stf interface (current)

2015-05-28 Thread 6bone

Hello,

I run a 6to4 router and use the stf interface.

ifconfig stf0

stf0: flags=1<UP> mtu 1280
inet6 2002:8b12:1921:: prefixlen 16

dmesg now regularly reports errors, for example:

nd6_storelladdr: bad gateway address type inet6: 2002:8b12:1921:: for dst 
2002:72f3:e21d:0:ad57:3c67:30ff:720e through interface vlan404


I think the packet to 2002:72f3:e21d:0:ad57:3c67:30ff:720e should be 
sent to 114.243.226.29 via IPv4 and not via IPv6.
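The expected IPv4 endpoint follows from the 6to4 address layout (RFC 3056): the 32 bits after the 2002::/16 prefix are the IPv4 address of the 6to4 site. A minimal C sketch of that extraction using the standard inet_pton()/inet_ntop() calls; the helper name `sixto4_ipv4` is hypothetical:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* Write the IPv4 address embedded in a 6to4 address (2002:WWXX:YYZZ::/48)
 * into buf as dotted-quad text. Returns false if v6 is not a valid
 * address inside 2002::/16 or buf is too small. */
static bool
sixto4_ipv4(const char *v6, char *buf, socklen_t buflen)
{
	struct in6_addr a6;
	struct in_addr a4;

	assert(v6 != NULL && buf != NULL);
	if (inet_pton(AF_INET6, v6, &a6) != 1)
		return false;
	if (a6.s6_addr[0] != 0x20 || a6.s6_addr[1] != 0x02)
		return false;		/* not a 2002::/16 address */
	/* Bytes 2..5 hold the IPv4 address in network byte order. */
	memcpy(&a4.s_addr, &a6.s6_addr[2], sizeof(a4.s_addr));
	return inet_ntop(AF_INET, &a4, buf, buflen) != NULL;
}
```

For the destination above it yields 114.243.226.29, and for the router's own prefix 2002:8b12:1921:: it yields 139.18.25.33, the address configured on ixg0.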


Here are the relevant entries in the routing table:

Routing tables

Internet:
Destination        Gateway               Flags    Refs      Use    Mtu Interface
default            139.18.25.38          UG          -        -      -  L ixg0

...

Internet6:
Destination        Gateway               Flags    Refs      Use    Mtu Interface
default            2001:638:c:a00a::1    UG          -        -      -  vlan404
2002::/16          2002:8b12:1921::      U           -        -      -  stf0
2002:8b12:1921::   link#30               UHL         -        -      -  lo0



In my opinion, the packet to 2002:72f3:e21d:0:ad57:3c67:30ff:720e should 
be sent to interface stf0. There it should be encapsulated in IPv4 and 
sent via the IPv4 default route through interface ixg0. The packet must 
not be sent to interface vlan404.
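The reasoning is ordinary longest-prefix matching: 2002::/16 is more specific than the default route, so any 2002::/16 destination should select stf0. A small C sketch of that lookup over the three IPv6 routes listed above; the `route6` table and the `lookup6` function are hypothetical, and the kernel's real route lookup is considerably more involved:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical routing-table entry: only what the lookup needs. */
struct route6 {
	const char *prefix;	/* IPv6 prefix in text form */
	int         plen;	/* prefix length in bits */
	const char *ifname;	/* outgoing interface */
};

/* Nonzero if the first plen bits of a and b are equal. */
static int
prefix_match(const struct in6_addr *a, const struct in6_addr *b, int plen)
{
	int i = 0;

	assert(plen >= 0 && plen <= 128);
	for (; plen >= 8; i++, plen -= 8)
		if (a->s6_addr[i] != b->s6_addr[i])
			return 0;
	if (plen > 0) {
		unsigned char mask = (unsigned char)(0xff << (8 - plen));
		if ((a->s6_addr[i] & mask) != (b->s6_addr[i] & mask))
			return 0;
	}
	return 1;
}

/* Longest-prefix match: interface of the most specific route, or NULL. */
static const char *
lookup6(const struct route6 *tbl, size_t n, const char *dst)
{
	struct in6_addr d, p;
	const char *best = NULL;
	int bestlen = -1;

	if (inet_pton(AF_INET6, dst, &d) != 1)
		return NULL;
	for (size_t i = 0; i < n; i++) {
		if (inet_pton(AF_INET6, tbl[i].prefix, &p) != 1)
			continue;	/* skip malformed entries */
		if (tbl[i].plen > bestlen &&
		    prefix_match(&d, &p, tbl[i].plen)) {
			best = tbl[i].ifname;
			bestlen = tbl[i].plen;
		}
	}
	return best;
}

/* The three IPv6 routes from the netstat output above. */
static const struct route6 rtab[] = {
	{ "::",               0,   "vlan404" },	/* default via 2001:638:c:a00a::1 */
	{ "2002::",           16,  "stf0" },
	{ "2002:8b12:1921::", 128, "lo0" },
};
```

In this model the 6to4 destination selects stf0, while a non-6to4 address such as 2001:638:902:1::10 falls through to the default route on vlan404.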


The message appears only for packets with the source address 
2002:8b12:1921:: (the router itself). If I ping the affected addresses 
from the router, everything works fine, so I cannot trigger the fault 
this way.



Does anyone have an idea where the messages could come from?


Thank you for your efforts


Regards
Uwe


Kernel RNG ???

2015-05-20 Thread 6bone

Hello,

dmesg reported:

Kernel RNG 2105 0 2 runs test FAILURE: too many runs of 3 1s (728 > 723)
cprng 2105 0 2: failed statistical RNG test

Any ideas what could be the problem?


kernel version: NetBSD 7.99.9
distribution: netbsd-7 (version May 11)
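The message reports a failed statistical self-test on the kernel RNG output. Assuming it is the FIPS 140-2 runs test, which examines a 20000-bit sample and requires the number of runs of exactly three 1 bits to lie between 527 and 723, a count of 728 is just above the upper limit. The C sketch below performs only that one check; the function names are hypothetical and this is not the kernel's own test code. Any statistical test of this kind will occasionally reject output from a correctly working generator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FIPS 140-2 runs-test interval for runs of length 3 in a 20000-bit sample. */
#define RUNS3_MIN 527
#define RUNS3_MAX 723

/* Count maximal runs of exactly `len` consecutive 1 bits in the first
 * nbits bits of buf, reading each byte most significant bit first. */
static unsigned
count_runs_of_ones(const uint8_t *buf, size_t nbits, unsigned len)
{
	unsigned runs = 0, cur = 0;

	assert(buf != NULL && len > 0);
	for (size_t i = 0; i < nbits; i++) {
		if ((buf[i / 8] >> (7 - i % 8)) & 1) {
			cur++;
		} else {
			if (cur == len)
				runs++;
			cur = 0;
		}
	}
	if (cur == len)		/* a run that ends at the last bit */
		runs++;
	return runs;
}

/* Nonzero if a runs-of-3 count lies inside the FIPS 140-2 interval. */
static int
runs3_ok(unsigned runs)
{
	return runs >= RUNS3_MIN && runs <= RUNS3_MAX;
}
```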


Regards
Uwe


Re: current status of ixg(4)

2015-04-09 Thread 6bone

On Tue, 7 Apr 2015, Justin Cormack wrote:


Try the sysctls, there is a maximum interrupt rate, hw.ixgbe.max_interrupt_rate

Justin



Are you sure that this sysctl exists on NetBSD?

# sysctl -a | grep ixg0
net.interfaces.ixg0.sndq.len = 0
net.interfaces.ixg0.sndq.maxlen = 2046
net.interfaces.ixg0.sndq.drops = 0
hw.ixg0.num_rx_desc = 2048
hw.ixg0.num_queues = 1
hw.ixg0.fc = 3
hw.ixg0.enable_aim = 1
hw.ixg0.advertise_speed = 0
hw.ixg0.ts = 0
hw.ixg0.rx_processing_limit = 256
hw.ixg0.queue0.interrupt_rate = 2000
hw.ixg0.queue0.irqs = 8053816
hw.ixg0.queue0.txd_head = 1872
hw.ixg0.queue0.txd_tail = 1872
hw.ixg0.queue0.rxd_head = 159
hw.ixg0.queue0.rxd_tail = 158



 Regards
 Uwe


Re: current status of ixg(4)

2015-04-09 Thread 6bone

On Wed, 8 Apr 2015, SAITOH Masanobu wrote:


Use new one:

http://www.netbsd.org/~msaitoh/ixg-20150407-1.dif



After a first test, it looks as if the interrupt throttling now works (better).


 Regards
 Uwe


Re: current status of ixg(4)

2015-03-31 Thread 6bone

On Fri, 27 Mar 2015, Masanobu SAITOH wrote:


This change has been committed now.

New patch:

http://www.netbsd.org/~msaitoh/ixg-20150327-0.dif



I have tested the patch and found no problems.

With the new driver, my server (HP G5) can handle packet rates of up to 
200,000 packets per second. At that rate CPU0 runs at 100% handling 
interrupts.


If I have not miscalculated, 200,000 packets per second at an MTU of 
1500 bytes gives a maximum of 2.4 Gbit/s.
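The estimate is packets per second × bytes per packet × 8 bits: 200,000 × 1500 × 8 = 2.4 × 10^9 bit/s. As a small C helper (the name is hypothetical), assuming every packet is MTU-sized and ignoring Ethernet framing overhead:

```c
#include <assert.h>
#include <stdint.h>

/* Throughput in bit/s if every packet carries mtu_bytes bytes. Ethernet
 * header, FCS, preamble and inter-frame gap are ignored, so this is the
 * IP-level figure, not the load on the wire. */
static uint64_t
ip_bits_per_second(uint64_t packets_per_second, uint64_t mtu_bytes)
{
	assert(mtu_bytes > 0);
	return packets_per_second * mtu_bytes * 8;
}
```

With smaller packets the same packet rate carries proportionally less data; at 64 bytes per packet, 200,000 packets per second is only about 102 Mbit/s.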


Is it possible to optimize some parameters for the interrupt throttling?


Regards
Uwe


Re: current status of ixg(4)

2015-03-25 Thread 6bone

On Wed, 25 Mar 2015, SAITOH Masanobu wrote:



Did you really applied this patch?



Oops... I tried to apply the patch against -current sources to which I 
had already applied http://www.netbsd.org/~msaitoh/ixg-20150321-0.dif.



# patch < vlan.patch
Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|Index: ixgbe.c
|===================================================================
|RCS file: /cvsroot/src/sys/dev/pci/ixgbe/ixgbe.c,v
|retrieving revision 1.14.2.2
|diff -u -p -r1.14.2.2 ixgbe.c
|--- ixgbe.c    24 Feb 2015 10:41:09 -0000    1.14.2.2
|+++ ixgbe.c    23 Mar 2015 07:32:50 -0000
--------------------------
Patching file ixgbe.c using Plan A...
Hunk #1 failed at 1064.
1 out of 1 hunks failed--saving rejects to ixgbe.c.rej
Hmm...  Ignoring the trailing garbage.
done

So you are right: the patch was not applied. If I add the code manually, 
it works perfectly!


Thank you.

Regards
Uwe


Re: current status of ixg(4)

2015-03-23 Thread 6bone

On Mon, 23 Mar 2015, Masanobu SAITOH wrote:


Is this problem filed PR? If not, could you file a PR?


Could you test with this patch?



The patch doesn't solve the problem.

Here is the requested information:

HW:
023:00:0: Intel 82599 (SFP+) 10 GbE Controller (ethernet network, revision 0x01)
023:00:1: Intel 82599 (SFP+) 10 GbE Controller (ethernet network, revision 0x01)

Driver:
ixg0 at pci14 dev 0 function 0: Intel(R) PRO/10GbE PCI-Express Network Driver, 
Version - 2.4.5
ixg0: interrupting at ioapic0 pin 19
ixg0: PCI Express Bus: Speed 2.5Gb/s Width x8
ixg1 at pci14 dev 0 function 1: Intel(R) PRO/10GbE PCI-Express Network Driver, 
Version - 2.4.5
ixg1: interrupting at ioapic0 pin 16
ifmedia_match: multiple match for 0x20/0xfff, selected instance 0
ixg1: PCI Express Bus: Speed 2.5Gb/s Width x8


ifconfig:

ixg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
capabilities=bff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
capabilities=bff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
capabilities=bff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,LRO>
enabled=0
ec_capabilities=7<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU>
ec_enabled=7<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU>
address: a0:36:9f:26:95:04
media: Ethernet autoselect (10GbaseSR full-duplex)
status: active
input: 64405 packets, 5192777 bytes, 700 multicasts, 2459 unknown protocol
output: 7 packets, 1138 bytes, 3 multicasts
inet 0.0.0.0 netmask 0xff000000 broadcast 255.255.255.255
inet6 fe80::a236:9fff:fe26:9504%ixg0 prefixlen 64 scopeid 0x1

vlan8: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
capabilities=3ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
capabilities=3ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
capabilities=3ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx>
enabled=0
vlan: 8 parent: ixg0
address: a0:36:9f:26:95:04
input: 0 packets, 0 bytes
output: 3 packets, 250 bytes, 3 multicasts
inet6 fe80::a236:9fff:fe26:9504%vlan8 prefixlen 64 scopeid 0x4

As you can see, the input counter is 0, and tcpdump -i vlan8 shows no 
packets. But tcpdump -evi ixg0 shows tagged packets for VLAN 8:


e.g.:
23:26:13.880538 a2:de:48:00:00:0e > ff:ff:ff:ff:ff:ff, ethertype 802.1Q 
(0x8100), length 64: vlan 8, p 0, ethertype ARP, Request who-has 139.18.13.212 
tell 139.18.13.254, length 46
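tcpdump takes the "vlan 8" from the 802.1Q tag inside the frame. As a reminder of where that ID sits, a C sketch (hypothetical function, not the vlan(4) input code) that extracts it from a raw Ethernet frame. Since ec_enabled on ixg0 includes VLAN_HWTAGGING, the NIC may strip the tag in hardware before the driver sees the frame; in that case the driver has to hand the VLAN ID to the stack separately, and raw-frame parsing like this does not apply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TPID_8021Q 0x8100	/* EtherType value that marks an 802.1Q tag */

/* If the Ethernet frame carries an 802.1Q tag, store the 12-bit VLAN ID
 * in *vid and return 1; return 0 for untagged or truncated frames. */
static int
frame_vlan_id(const uint8_t *frame, size_t len, uint16_t *vid)
{
	uint16_t tpid, tci;

	assert(frame != NULL && vid != NULL);
	if (len < 18)	/* dst(6) + src(6) + TPID(2) + TCI(2) + EtherType(2) */
		return 0;
	tpid = (uint16_t)(frame[12] << 8 | frame[13]);
	if (tpid != TPID_8021Q)
		return 0;
	tci = (uint16_t)(frame[14] << 8 | frame[15]);
	*vid = tci & 0x0fff;	/* top 3 bits are the priority ("p 0"), then DEI */
	return 1;
}
```

Fed with the header of the ARP broadcast captured above (destination ff:ff:ff:ff:ff:ff, source a2:de:48:00:00:0e, TCI 0x0008), this returns VLAN ID 8.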


Thank you for your efforts


Regards
Uwe



Re: current status of ixg(4)

2015-03-21 Thread 6bone

On Fri, 20 Mar 2015, Masanobu SAITOH wrote:


Date: Fri, 20 Mar 2015 17:38:03 +0900
From: Masanobu SAITOH msai...@execsw.org
To: current-users@NetBSD.org
Cc: msai...@execsw.org
Subject: current status of ixg(4)

Hello.

Yesterday, I commited some changes to ixg(4) on -current.

http://mail-index.netbsd.org/source-changes/2015/03/19/msg064110.html

I'll wait a few days for feedback on this change, and then
I'll send a pullup request to pullup-7@.



I have applied the patch on -current. The build fails with:

ixgbe_api.o: In function `ixgbe_init_shared_code':
ixgbe_api.c:(.text+0x16d): undefined reference to `ixgbe_init_ops_X540'


Regards
Uwe


Re: current status of ixg(4)

2015-03-21 Thread 6bone

On Sat, 21 Mar 2015, SAITOH Masanobu wrote:


New patch:

http://www.netbsd.org/~msaitoh/ixg-20150321-0.dif

Could you try with this patch again?


Now the patch works, but I found a problem with vlan interfaces.

You can create a vlan interface with:

ifconfig vlan8 create
ifconfig vlan8 vlan 8 vlanif ixg0 up

The interface is created, but it receives no packets: ifconfig -v shows 
no inbound packets, although tcpdump on ixg0 shows tagged packets for 
VLAN 8.


The problem also existed with the previous ixg driver. With wm interfaces, 
the problem does not seem to exist.



Regards
Uwe


Re: DoS attack against TCP services

2015-02-28 Thread 6bone

On Fri, 13 Feb 2015, Christos Zoulas wrote:


I tried adding show callout to crash(8) but it is not useful because the
pointers move too quickly. OTOH, next time this happens you can enter ddb
on your machine and type show callout and see if that sheds any light
to the expired and not fired callouts...

christos



The problem occurred again. I have taken a couple of screenshots. 
Unfortunately I cannot interpret the output.


https://www.ipv6.uni-leipzig.de/callout_1.png
https://www.ipv6.uni-leipzig.de/callout_2.png
https://www.ipv6.uni-leipzig.de/callout_3.png
https://www.ipv6.uni-leipzig.de/callout_4.png
https://www.ipv6.uni-leipzig.de/callout_x.png


Thank you for your efforts


Regards
Uwe


Re: DoS attack against TCP services

2015-02-28 Thread 6bone

On Sat, 28 Feb 2015, J. Hannken-Illjes wrote:


This one looks bad.  Which thread holds proc_lock?



Does this help?

https://www.ipv6.uni-leipzig.de/proc_lock.png


Regards
Uwe



Re: DoS attack against TCP services

2015-02-28 Thread 6bone

On Sat, 28 Feb 2015, Christos Zoulas wrote:


Yes, that's a good start but we need to find which process that
lwp belongs to.


I'm not sure what the best course of action is. The machine is still 
running. Should I try to get the information from the running system, or 
force a dump and analyze that?


On Sat, 28 Feb 2015, J. Hannken-Illjes wrote:


Looks unlocked -- what about a backtrace of thread 0.5,
bt /a 0xfffffe882df11860


https://www.ipv6.uni-leipzig.de/bt_0xfe882df11860.png


Regards
Uwe


Re: DoS attack against TCP services

2015-02-28 Thread 6bone

On Sat, 28 Feb 2015, Christos Zoulas wrote:


Good idea. You can use crash, ps and see what each process is holding...

christos


Here is the output from crash and ps:

gate# crash
Crash version 7.0_BETA, image version 7.99.5.
WARNING: versions differ, you may not be able to examine this image.
Output from a running system is unreliable.
crash> ps
PID    LID S CPU  FLAGS  STRUCT LWP *            NAME WAIT
9470     1 7   4      0  fffffe8824dfc760       crash
22012    1 3   4     80  fffffe82ae6b52a0          sh wait
26781    1 3   5     80  fffffe813bb998a0          su wait
9257     1 3   0     80  fffffe815fcc50a0          sh wait
13570    1 7   0      0  fffffe881c291280        sshd
12631    1 3   6     80  fffffe81ab1bd540        sshd select
21044    1 3   4     80  fffffe8811bdca00      pickup kqueue
6605     1 3   7     80  fffffe813b7cc040       getty ttyraw
7943     1 3   0     80  fffffe8817684040        sshd netio
8076     1 3   0     80  fffffe873be9cb80        sshd select
29513    1 7   7      0  fffffe881a39c6e0       snmpd
13553   11 3   0   1080  fffffe881ce1f680       named kqueue
13553   10 3   0     80  fffffe8348fa56e0       named parked
13553    9 3   6     80  fffffe87b4b1cb40       named parked
13553    8 3   1     80  fffffe863979c940       named parked
13553    7 2   7      0  fffffe881ed3c500       named
13553    6 3   4     80  fffffe88182101a0       named parked
13553    5 3   2     80  fffffe85240e6ac0       named parked
13553    4 3   6     80  fffffe881610a980       named parked
13553    3 3   5     80  fffffe872fa5c460       named parked
13553    2 3   3     80  fffffe813ccfa140       named parked
13553    1 3   0     80  fffffe85237e7a80       named sigwait
11500    1 3   2     80  fffffe8811bdc1c0        ntpd pause
6567     1 3   5     80  fffffe815fcc54c0        bash ttyraw
2965     1 3   5     80  fffffe881c2916a0     openvpn select
34       1 3   7     80  fffffe8823ae0580          sh wait
50       1 3   6     80  fffffe8823ae09a0          su wait
43       1 3   6     80  fffffe88239955a0          sh wait
2712     1 3   5     80  fffffe83120a94a0        sshd select
42       1 3   7     80  fffffe8823995180        sshd select
1968     1 3   3     80  fffffe88209c5a60        cron nanoslp
2073     1 3   0     80  fffffe8824270620       inetd kqueue
1847     1 3   0     80  fffffe8824270a40      ospf6d select
1604     1 3   3     80  fffffe881f366a80        qmgr kqueue
2321     1 3   4     80  fffffe8824270200      master kqueue
1882     1 3   6     80  fffffe88268ee5e0        sshd select
1742     1 3   5     80  fffffe88268eea00      powerd kqueue
1477     1 3   6     80  fffffe8823ae0160       zebra select
1379     1 3   2     80  fffffe83e1769920    dhcrelay select
827      1 3   6     80  fffffe813be09900     syslogd kqueue
1        1 3   7     80  fffffe813b871420        init wait
0      104 3   5    200  fffffe813be094e0       ipmi0 ipmi0
0      103 3   5    200  fffffe813be090c0     physiod physiod
0      102 3   0    200  fffffe813b7cc460    aiodoned aiodoned
0      101 3   1    200  fffffe813b84d020     ioflush syncer
0      100 3   0    200  fffffe813b7cc880    pgdaemon pgdaemon
0       97 3   0    200  fffffe813b493b40    scsibus1 sccomp
0       96 3   7    200  fffffe813b452700        usb5 usbevt
0       95 3   2    200  fffffe813b451b00        usb3 usbevt
0       94 3   0    200  fffffe813b4522e0        usb1 usbevt
0       93 3   7    200  fffffe813b4516e0        usb4 usbevt
0       92 3   0    200  fffffe813b452b20        usb0 usbevt
0       91 3   4    200  fffffe813b84d860        usb2 usbevt
0       90 3   0    200  fffffe813b84d440   atapibus0 sccomp
0       88 3   0    200  fffffe813b871000   cryptoret crypto_w
0       87 3   0    200  fffffe813b871840       unpgc unpgc
0       86 3   5    200  fffffe813b4512c0 vmem_rehash vmem_rehash
0       85 3   5    200  fffffe813b4d7360   coretemp7 coretemp7
0       84 3   3    200  fffffe813b4d7780   coretemp6 coretemp6
0       83 3   3    200  fffffe813b4d7ba0   coretemp5 coretemp5
0       82 3   0    200  fffffe813b4d6340   coretemp4 coretemp4
0       81 3   6    200  fffffe813b4d6760   coretemp3 coretemp3
0       80 3   7    200  fffffe813b4d6b80   coretemp2 coretemp2
0       79 3   7    200  fffffe813b495320

intel nic (ixgbe driver) and vlan interfaces

2015-02-23 Thread 6bone

Hello,

I use an Intel 10GbE card with ixgbe driver. My configuration is as 
follows:


cat ifconfig.ixg0
up

cat ifconfig.vlan103
create
vlan 103 vlanif ixg0 up


The following settings in rc.conf work without problems:

cat /etc/rc.conf
...
auto_ifconfig=NO
net_interfaces="vlan103 ixg0"


With the following configuration, the vlan103 interface does not work:

cat /etc/rc.conf
...
auto_ifconfig=NO
net_interfaces="ixg0 vlan103"


The problem shows up only after a reboot. Both interfaces are created 
and both are up. tcpdump shows incoming packets on ixg0, including 
packets tagged for VLAN 103, but no packets at all on vlan103, and 
'ifconfig -v vlan103' shows 0 packets received.


I tried to reproduce the behavior with a 1 Gbit/s Intel NIC. There, the 
two interfaces work regardless of the order in which they are created.


Is there a rule for the order in which the interfaces must be created?



Regards
Uwe



Re: DoS attack against TCP services

2015-02-13 Thread 6bone

On Wed, 4 Feb 2015, Sverre Froyen wrote:


I'd also look at the open descriptors of the named process (although they
should be closed at this time, since TIME_WAIT means closed on this side,
and waiting for the 4 minutes to expire before killing the connection)...

Also I'd record that information every minute or so to see how many
connections are added and how many are going away.

Perhaps there is some bug triggered in the tcp stack and somehow connections
are not being GC’ed?


This is vaguely similar to a problem I have seen from time to time. On my 
servers, it is usually port 80 that gets attacked. Someone opens TCP 
connections to this port on the server, sends no request, and leaves the 
connection open indefinitely. See 
http://mail-index.netbsd.org/netbsd-users/2011/01/04/msg007484.html

When I test such a scenario to port 53 (using telnet), the connection shows as 
ESTABLISHED for 30 seconds. Then, presumably, named times-out and closes the 
connection. At this point netstat shows the connection as TIME_WAIT for another 
10 seconds. After that it disappears.

If I disable the network connection during the 30 second period before named 
times out, however, I instead observe the connection in FIN_WAIT_1 mode for 
another 30 minutes or so.

This is on netbsd-6. I notice that your netstat output has the client and 
server columns in the reverse order from what I see. Could it be that in your 
netstat output, FIN_WAIT_1 is reported as TIME_WAIT?

Regards,
Sverre


The two problems are not identical. In my case, the connections really 
are in the TIME_WAIT state. Christos also found that the 2MSL timer of 
each of these connections is negative. If this value is negative, the 
connection should already have been removed.


The callout code in kern_timeout.c:

if (delta < 0)
	cc->cc_ev_late.ev_count++;

At the same time, expired entries are not deleted from the NDP table: 
'ndp -a' shows expired entries.


Both problems occur only after several days of uptime. They probably have 
the same cause.


The problem you described is different.


Regards
Uwe

Re: DoS attack against TCP services

2015-02-07 Thread 6bone

On Fri, 6 Feb 2015, Robert Elz wrote:


What's more, it seems peculiar to your system, as no-one else seems to
be reporting similar problems.   So I'd be investigating how the timers
are working (or are not working) in the kernel - perhaps even try
selecting a different timer.



Just to make sure: could the bug described here be the cause? Or has 
that problem already been fixed?


 http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2014-7250

 Regards
 Uwe


Re: DoS attack against TCP services

2015-02-07 Thread 6bone

On Sat, 7 Feb 2015, Greg Troxel wrote:


I don't know; I will take a look, but in this case the connections are
initiated by the afflicted system.


And so far we don't have any traces showing packets that look like attacks.


It need not be an attack, yes. However, the description says that the 
attack exploits a memory leak. Maybe this leak can also cause problems in 
normal use.


http://vigilance.fr/vulnerability/FreeBSD-NetBSD-OpenBSD-memory-leak-via-Net-2-TCP-Timer-15696

And since Robert Elz suspected a problem with a timer, that bug might 
fit. The article says: "However, the implementation of TCP Timers is 
invalid. The memory allocated to process them is never freed."



Regards
Uwe


Re: DoS attack against TCP services

2015-02-05 Thread 6bone

On Fri, 6 Feb 2015, Robert Elz wrote:


I assume that time (as seen from user processes) is functioning correctly?


ndp -a shows:

...
2001:638:902:2000:290:f5ff:fe39:3815 00:90:f5:39:38:15 vlan14 23h27m30s S
2001:638:902:2000:565:50de:c658:60cc 90:b1:1c:a6:b5:99 vlan14 expired   R
2001:638:902:2000:955:6e00:75d6:8dce f8:b1:56:d7:c2:d1 vlan14 expired   S
...


bash-4.3# ndp -a | grep expired | wc -l
2369

Is this what you mean by a 'timer' problem?


Regards
Uwe


Re: DoS attack against TCP services

2015-02-05 Thread 6bone

On Fri, 6 Feb 2015, Robert Elz wrote:


What's more, it seems peculiar to your system, as no-one else seems to
be reporting similar problems.   So I'd be investigating how the timers
are working (or are not working) in the kernel - perhaps even try
selecting a different timer.


I am also surprised that no one else has similar problems. The main task 
of the server is to route IPv6 packets between VLANs, which are carried 
over a single 10GbE link. The highest rate I have observed was 2 Gbit/s; 
CPU 0 was then over 50% busy, caused by interrupts, while all other CPUs 
were almost 100% idle. Everything works fine until TCP connections stop 
being closed properly.



I assume that time (as seen from user processes) is functioning correctly?

kre



I can not notice any problems with the system time.


Regards
Uwe


Re: DoS attack against TCP services

2015-02-04 Thread 6bone
Yes, I am sure that most TIME_WAIT connections stay forever. I cannot 
say for sure that no TIME_WAIT connection is ever removed, but I can say 
that some sample connections have existed for more than 5 hours.



Regards
Uwe



On Wed, 4 Feb 2015, Johnny Billquist wrote:


Date: Wed, 04 Feb 2015 19:54:59 +0100
From: Johnny Billquist b...@update.uu.se
To: 6b...@6bone.informatik.uni-leipzig.de,
Christos Zoulas chris...@astron.com
Cc: current-users@netbsd.org
Subject: Re: DoS attack against TCP services

Are you *sure* the same connections stay around forever, or might it just be 
that you get new ones at a higher rate than old ones go away?
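This question can be framed with Little's law: in a steady state, the number of TIME_WAIT entries equals the rate at which new ones appear multiplied by how long each one lives. A C sketch with hypothetical helper names; the 60-second lifetime used below is an assumed example value, not a measurement from this system:

```c
#include <assert.h>

/* Little's law for a steady state: average number of entries in a state
 * = arrival rate (per second) x average time each entry spends there. */
static double
entries_in_state(double arrivals_per_sec, double seconds_in_state)
{
	assert(arrivals_per_sec >= 0.0 && seconds_in_state >= 0.0);
	return arrivals_per_sec * seconds_in_state;
}

/* Arrival rate needed to keep `count` entries in the state. */
static double
rate_for_count(double count, double seconds_in_state)
{
	assert(seconds_in_state > 0.0);
	return count / seconds_in_state;
}
```

With a 60 s lifetime, 5000 entries would require about 83 new connections per second. Individual entries that are still present after 5 hours cannot be explained by a high arrival rate alone, since each entry should leave the state after its lifetime regardless of the rate.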


Johnny

On 2015-02-04 19:44, 6b...@6bone.informatik.uni-leipzig.de wrote:

Now the server has over 5000 TIME_WAIT connections.

netstat -a -n | grep TIME_WAIT
tcp        0      0  139.18.25.33.59256    198.6.1.83.53     TIME_WAIT
tcp        0      0  139.18.25.33.59257    77.222.50.250.53  TIME_WAIT
tcp        0      0  139.18.25.33.59258    193.232.128.6.53  TIME_WAIT
tcp        0      0  139.18.25.33.59259    78.104.145.37.53  TIME_WAIT
tcp        0      0  139.18.25.33.59260    192.5.6.30.53     TIME_WAIT
tcp        0      0  139.18.25.33.59261    192.41.162.30.53  TIME_WAIT
tcp        0      0  139.18.25.33.59262    192.35.51.30.53   TIME_WAIT
tcp        0      0  139.18.25.33.59263    192.43.172.30.53  TIME_WAIT
tcp        0      0  139.18.25.33.59264    202.12.27.33.53   TIME_WAIT
...

It seems to be caused by named. However, lsof shows that the connections
are not owned by named; lsof doesn't show any of the TIME_WAIT
connections. So stopping and restarting named doesn't remove the
connections.

Any more things that could be interesting for a problem report?


Regards
Uwe


On Wed, 4 Feb 2015, Christos Zoulas wrote:


Date: Wed, 4 Feb 2015 15:40:00 +0000 (UTC)
From: Christos Zoulas chris...@astron.com
To: current-users@netbsd.org
Subject: Re: DoS attack against TCP services

In article
pine.neb.4.64.1502041602460@6bone.informatik.uni-leipzig.de,
6b...@6bone.informatik.uni-leipzig.de wrote:

Hello,

The problem occurred again. The kernel has over 3,000 connections in
TIME_WAIT state. Even after waiting an hour, the connections have not
disappeared, and more and more connections end up in the TIME_WAIT
state. My settings are:

net.inet.tcp.mslt.enable = 1
net.inet.tcp.mslt.loopback = 2
net.inet.tcp.mslt.local = 10
net.inet.tcp.mslt.remote = 60
net.inet.tcp.mslt.remote_threshold = 6

The last few times I have restarted the server to solve the
problem, but frequent reboots are very inconvenient for a server.

Does anyone have instructions on what further information I can gather
for a bug report? Just stating that connections in the TIME_WAIT state
are not being cleaned up is probably not enough to find the problem.


Thank you for your efforts


Can you find what daemon/process is being connected to and from where?

christos





Re: DoS attack against TCP services

2015-02-04 Thread 6bone

Now the server has over 5000 TIME_WAIT connections.

netstat -a -n | grep TIME_WAIT
tcp        0      0  139.18.25.33.59256     198.6.1.83.53          TIME_WAIT
tcp        0      0  139.18.25.33.59257     77.222.50.250.53       TIME_WAIT
tcp        0      0  139.18.25.33.59258     193.232.128.6.53       TIME_WAIT
tcp        0      0  139.18.25.33.59259     78.104.145.37.53       TIME_WAIT
tcp        0      0  139.18.25.33.59260     192.5.6.30.53          TIME_WAIT
tcp        0      0  139.18.25.33.59261     192.41.162.30.53       TIME_WAIT
tcp        0      0  139.18.25.33.59262     192.35.51.30.53        TIME_WAIT
tcp        0      0  139.18.25.33.59263     192.43.172.30.53       TIME_WAIT
tcp        0      0  139.18.25.33.59264     202.12.27.33.53        TIME_WAIT
...

It seems to be caused by named. According to lsof, the connections are 
not owned by named; in fact, lsof doesn't show any of the TIME_WAIT 
connections. So stopping and restarting named doesn't remove the connections.
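Christos asked what is being connected to. One way to answer that is to tally the TIME_WAIT entries by remote port. In the output above, every foreign endpoint ends in `.53`, which is DNS over TCP. That fits the suspicion that named is involved. Below is a minimal Python sketch of such a tally. It assumes the NetBSD `netstat -n` layout, in which the port is the last dot-separated field of an endpoint. The function names are illustrative, and the ESTABLISHED line in the sample is made up for contrast.

```python
from collections import Counter


def split_endpoint(endpoint):
    """Split '139.18.25.33.59256' into ('139.18.25.33', '59256').

    NetBSD's netstat -n appends the port after the last dot.
    """
    addr, _, port = endpoint.rpartition(".")
    return addr, port


def time_wait_by_foreign_port(netstat_lines):
    """Count TIME_WAIT entries per foreign (remote) port."""
    counts = Counter()
    for line in netstat_lines:
        fields = line.split()
        # Assumed column order at the end of the line: local, foreign, state.
        if len(fields) < 3 or fields[-1] != "TIME_WAIT":
            continue
        _, fport = split_endpoint(fields[-2])
        counts[fport] += 1
    return counts


sample = [
    "tcp        0      0  139.18.25.33.59256     198.6.1.83.53          TIME_WAIT",
    "tcp        0      0  139.18.25.33.59257     77.222.50.250.53       TIME_WAIT",
    "tcp        0      0  139.18.25.33.59258     193.232.128.6.53       TIME_WAIT",
    # Made-up line, not from the thread, showing that other states are skipped.
    "tcp        0      0  139.18.25.33.22        192.0.2.10.50000       ESTABLISHED",
]

for port, n in time_wait_by_foreign_port(sample).most_common():
    print(f"remote port {port}: {n} connection(s) in TIME_WAIT")
```

On a live system, the same function could be fed `sys.stdin.readlines()`, with the output of `netstat -a -n` piped into the script.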


Is there anything else that would be useful for a problem report?


Regards
Uwe


On Wed, 4 Feb 2015, Christos Zoulas wrote:



Date: Wed, 4 Feb 2015 15:40:00 +0000 (UTC)
From: Christos Zoulas chris...@astron.com
To: current-users@netbsd.org
Subject: Re: DoS attack against TCP services

In article pine.neb.4.64.1502041602460@6bone.informatik.uni-leipzig.de,
6b...@6bone.informatik.uni-leipzig.de wrote:

Hello,

The problem occurred again. The kernel has over 3,000 connections in
TIME_WAIT state. Even after waiting an hour, the connections have not
disappeared, and more and more connections are entering the TIME_WAIT
state. My settings are:

net.inet.tcp.mslt.enable = 1
net.inet.tcp.mslt.loopback = 2
net.inet.tcp.mslt.local = 10
net.inet.tcp.mslt.remote = 60
net.inet.tcp.mslt.remote_threshold = 6

The last few times I have restarted the server to solve the
problem, but frequent reboots are very inconvenient for a server.

Does anyone have instructions on what further information I can gather
for a bug report? Just stating that connections in the TIME_WAIT state
are not being cleaned up is probably not enough to find the problem.


Thank you for your efforts


Can you find what daemon/process is being connected to and from where?

christos



Re: DoS attack against TCP services

2015-02-04 Thread 6bone

Hello,

The problem occurred again. The kernel has over 3,000 connections in 
TIME_WAIT state. Even after waiting an hour, the connections have not 
disappeared, and more and more connections are entering the TIME_WAIT 
state. My settings are:


net.inet.tcp.mslt.enable = 1
net.inet.tcp.mslt.loopback = 2
net.inet.tcp.mslt.local = 10
net.inet.tcp.mslt.remote = 60
net.inet.tcp.mslt.remote_threshold = 6

The last few times I have restarted the server to solve the 
problem, but frequent reboots are very inconvenient for a server.


Does anyone have instructions on what further information I can gather 
for a bug report? Just stating that connections in the TIME_WAIT state 
are not being cleaned up is probably not enough to find the problem.



Thank you for your efforts


Regards
Uwe


On Mon, 19 Jan 2015, Michael van Elst wrote:


Date: Mon, 19 Jan 2015 19:51:31 +0000 (UTC)
From: Michael van Elst mlel...@serpens.de
To: current-users@netbsd.org
Newsgroups: lists.netbsd.current-users
Subject: Re: DoS attack against TCP services

b...@update.uu.se (Johnny Billquist) writes:


Timeout should not depend on distance, and should actually be (at least)
2*MSS, which would be something in the several minutes range.


It's 2*msl, but msl can be a bit variable:

net.inet.tcp.mslt.enable = 1
net.inet.tcp.mslt.loopback = 2
net.inet.tcp.mslt.local = 10
net.inet.tcp.mslt.remote = 60

If I understand this correctly, these msl values are in units of 500ms,
so 2*msl is the same value in seconds.

What is considered a local connection is a bit of magic and if you set
net.inet.tcp.mslt.enable=0 then everything is treated as a remote
connection.
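Michael reads the mslt values as counts of 500 ms ticks. Under that reading, the TIME_WAIT duration works out as 2 * msl * 0.5 s, which is numerically equal to the sysctl value in seconds. Below is a small Python sketch of that arithmetic, using the values quoted above. The 0.5 s tick is the assumption stated in the message; it has not been checked here against the kernel source.

```python
# Assumption from the message above: mslt values are counted in 500 ms ticks.
TICK_SECONDS = 0.5

# Values of net.inet.tcp.mslt.* quoted in this thread.
MSLT = {"loopback": 2, "local": 10, "remote": 60}


def time_wait_seconds(msl_ticks, tick_seconds=TICK_SECONDS):
    """TIME_WAIT lasts 2 * MSL; convert the tick count to seconds."""
    return 2 * msl_ticks * tick_seconds


for kind, ticks in MSLT.items():
    print(f"{kind:<8} msl = {ticks:>2} ticks -> TIME_WAIT of {time_wait_seconds(ticks):g} s")
```

With these settings, even a remote connection should leave TIME_WAIT after about a minute. That is consistent with the 10 to 60 seconds mentioned elsewhere in the thread, and far from the hour-plus lifetimes reported.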

--
   Michael van Elst
Internet: mlel...@serpens.de
   A potential Snark may lurk in every tree.



Re: DoS attack against TCP services

2015-01-19 Thread 6bone

On Mon, 19 Jan 2015, Michael van Elst wrote:


Date: Mon, 19 Jan 2015 09:24:02 +0000 (UTC)
From: Michael van Elst mlel...@serpens.de
To: current-users@netbsd.org
Newsgroups: lists.netbsd.current-users
Subject: Re: DoS attack against TCP services

6b...@6bone.informatik.uni-leipzig.de writes:


Unfortunately, all TCP connections are now in the TIME_WAIT state.

bash-4.3 # netstat -a -n | grep TIME_WAIT | wc -l
 34611

Is there a way to remove them without rebooting the server?


tcpdrop(8)?
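tcpdrop(8) handles one connection per invocation. With tens of thousands of TIME_WAIT entries, it can help to generate the command lines from the netstat output. The Python sketch below only prints commands. It assumes the four-argument form `tcpdrop laddr lport faddr fport`, so check tcpdrop(8) on your release before running anything it prints. The helper names are hypothetical.

```python
def split_endpoint(endpoint):
    """'139.18.25.33.59256' -> ('139.18.25.33', '59256'); port follows the last dot."""
    addr, _, port = endpoint.rpartition(".")
    return addr, port


def tcpdrop_commands(netstat_lines, state="TIME_WAIT"):
    """Build (but do not run) one tcpdrop command line per matching socket."""
    commands = []
    for line in netstat_lines:
        fields = line.split()
        # Assumed trailing columns: local endpoint, foreign endpoint, state.
        if len(fields) < 3 or fields[-1] != state:
            continue
        laddr, lport = split_endpoint(fields[-3])
        faddr, fport = split_endpoint(fields[-2])
        commands.append(" ".join(["tcpdrop", laddr, lport, faddr, fport]))
    return commands


sample = [
    "tcp        0      0  139.18.25.33.59256     198.6.1.83.53          TIME_WAIT",
    "tcp        0      0  139.18.25.33.59257     77.222.50.250.53       TIME_WAIT",
]

for cmd in tcpdrop_commands(sample):
    print(cmd)
```

Printing the commands rather than executing them keeps a human in the loop. Dropping connections this way is a workaround, not a fix for whatever keeps them from expiring.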



It works. But why doesn't the kernel drop them automatically?



TCP connections in TIME_WAIT will expire after some time, usually between
10 and 60 seconds after a connection is closed. The timeout depends on
the distance of the remote machine.


Yes, but in my case the connections have not expired even after more than an hour.


Uwe


