Re: Bind9 + TCP_FASTOPEN => no rndc

2017-09-27 Thread hiren panchasara
On 09/27/17 at 01:35P, Christopher Sean Hilton wrote:
> I'm trying to configure bind 9.11 as a nameserver on FreeBSD
> 11-STABLE. When the bind9 port compiles, it enables TCP_FASTOPEN, but the
> changes haven't yet been baked into the GENERIC Kernel. I can't find a
> way to disable the use of TCP_FASTOPEN in bind at startup. Is the only
> way to fix this problem to build a new kernel with TCP_FASTOPEN
> enabled?

AFAIK, yes: add 'options TCP_RFC7413' to your kernel config.
Even with the option compiled in, Fast Open defaults to off;
net.inet.tcp.fastopen.enabled=1 will enable actual use.

Cheers,
Hiren
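For the bind9/TCP_FASTOPEN question above, one way a daemon could treat Fast Open as optional rather than fatal is sketched below. This is a minimal illustration assuming a TCP socket that is about to listen(); it is not how BIND itself handles the option, the helper name try_enable_tfo is hypothetical, and the value 16 is an arbitrary non-zero choice whose exact meaning differs between platforms and releases. On a kernel built without TCP_RFC7413 (or with net.inet.tcp.fastopen.enabled=0), setsockopt() is expected to fail, and the function simply reports that so the caller can carry on with plain TCP.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Try to turn on TCP Fast Open for a TCP socket that is about to listen().
 * Returns 0 if the kernel accepted the option and -1 otherwise, so the
 * caller can fall back to plain TCP instead of failing outright.
 */
static int try_enable_tfo(int fd)
{
    assert(fd >= 0); /* caller must pass a valid socket descriptor */
#ifdef TCP_FASTOPEN
    /* Hypothetical value: any non-zero int; whether it acts as an on/off
     * flag or as a pending-queue length is platform specific. */
    int val = 16;

    if (setsockopt(fd, IPPROTO_TCP, TCP_FASTOPEN, &val, sizeof(val)) == 0)
        return 0;
    fprintf(stderr, "TCP_FASTOPEN rejected (%s); using plain TCP\n",
        strerror(errno));
    return -1;
#else
    (void)fd; /* these headers do not define TCP_FASTOPEN at all */
    return -1;
#endif
}
```

The design choice here is to log and continue: a missing kernel option then costs only the Fast Open optimization instead of the whole service.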




Re: 'show alllocks' of completely locked machine [Was: Re: Complete IO lockup, state "ufs" from userland, debuging help wanted]

2017-03-06 Thread hiren panchasara
On 03/06/17 at 08:56P, Harry Schmalzbauer wrote:
>  Regarding Harry Schmalzbauer's message of 05.03.2017 22:59 (localtime):
> >  Hello,
> >
> > I can easily lock up FreeBSD stable/11 from userland. Not that I want to...
> > I'm running squid, which starts an authentication helper
> > "*negotiate_kerberos_auth*", which seems to be the culprit.
> > All IO is completely blocked; there's no way to get anything from any
> > filesystem.
> > All non-IO-requesting processes (threads) run well, including sshd and
> > shells.
> > There's no load (neither CPU nor IO); it's just that any process
> > requesting IO gets stuck in state "ufs"
> >
> > Can anyone help me finding out what's going wrong?
> > Serial console is available.
> 
> Dear hackers,
> 
> I managed to get into DDB, but I'm lost from there...
> 
> What information could be useful to find out the cause of this complete
> lockup?
> 
> I'd need someone who could guide me through it; I'd pay for a debugging
> lesson! (quite a constrained budget, though)
> 
> This happens when the machine got stuck:
> 
> intr_event_handle() at intr_event_handle+0x9c/frame 0xfe0093dcb7d0
> intr_execute_handlers() at intr_execute_handlers+0x48/frame 0xfe0093dcb800
> lapic_handle_intr() at lapic_handle_intr+0x68/frame 0xfe0093dcb840
> Xapic_isr1() at Xapic_isr1+0xb7/frame 0xfe0093dcb840
> --- interrupt, rip = 0x807b9bd6, rsp = 0xfe0093dcb910, rbp = 0xfe0093dcb910 ---
> acpi_cpu_c1() at acpi_cpu_c1+0x6/frame 0xfe0093dcb910
> acpi_cpu_idle() at acpi_cpu_idle+0x2ea/frame 0xfe0093dcb960
> cpu_idle_acpi() at cpu_idle_acpi+0x3f/frame 0xfe0093dcb980
> cpu_idle() at cpu_idle+0x8f/frame 0xfe0093dcb9a0
> sched_idletd() at sched_idletd+0x436/frame 0xfe0093dcba70
> fork_exit() at fork_exit+0x84/frame 0xfe0093dcbab0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe0093dcbab0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> 
> 
> db> show alllocks
> Process 1259 (negotiate_kerberos_) thread 0xf80005ddea00 (100096)
> shared lockmgr ufs (ufs) r = 0 (0xf8000523d5f0) locked @
> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
> Process 1258 (negotiate_kerberos_) thread 0xf80005ddc500 (100252)
> shared lockmgr ufs (ufs) r = 0 (0xf8000523d5f0) locked @
> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
> Process 1257 (negotiate_kerberos_) thread 0xf80005ddda00 (100247)
> shared lockmgr ufs (ufs) r = 0 (0xf8000523d5f0) locked @
> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
> Process 1256 (negotiate_kerberos_) thread 0xf80065612500 (100261)
> shared lockmgr ufs (ufs) r = 0 (0xf8000523d5f0) locked @
> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
> Process 1255 (negotiate_kerberos_) thread 0xf80065612a00 (100260)
> shared lockmgr ufs (ufs) r = 0 (0xf8000523d5f0) locked @
> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
> Process 1254 (negotiate_kerberos_) thread 0xf80065613000 (100257)
> shared lockmgr ufs (ufs) r = 0 (0xf8000523d5f0) locked @
> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
> Process 1253 (negotiate_kerberos_) thread 0xf80065614000 (100254)
> shared lockmgr ufs (ufs) r = 0 (0xf8000523d5f0) locked @
> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
> Process 1252 (negotiate_kerberos_) thread 0xf800651e1000 (100246)
> shared lockmgr ufs (ufs) r = 0 (0xf8000523d5f0) locked @
> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
> Process 1251 (negotiate_kerberos_) thread 0xf80005ddca00 (100251)
> shared lockmgr ufs (ufs) r = 0 (0xf8000523d5f0) locked @
> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
> Process 1250 (negotiate_kerberos_) thread 0xf800651e2a00 (100241)
> shared lockmgr ufs (ufs) r = 0 (0xf8000523d5f0) locked @
> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
> Process 1251 (negotiate_kerberos_) thread 0xf80005ddca00 (100251)
> shared lockmgr ufs (ufs) r = 0 (0xf8000523d5f0) locked @
> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
> Process 1250 (negotiate_kerberos_) thread 0xf800651e2a00 (100241)
> shared lockmgr ufs (ufs) r = 0 (0xf8000523d5f0) locked @
> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
> Process 1247 (sqtop) thread 0xf80065650a00 (100259)
> shared lockmgr ufs (ufs) r = 0 (0xf8000523d5f0) locked @
> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
> Process 1184 (systat) thread 0xf80065613a00 (100255)
> shared lockmgr ufs (ufs) r = 0 (0xf8000523d5f0) locked @
> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
> Process 1042 (negotiate_kerberos_) thread 0xf800651e2500 (100242)
> shared lockmgr ufs (ufs) r = 0 (0xf8000523d5f0) locked @
> 

Re: Question about pmcstat

2017-02-07 Thread hiren panchasara
On 02/07/17 at 05:55P, rai...@ultra-secure.de wrote:
> Hi,
> 
> in Brendan Gregg's tutorial:
> 
> http://www.brendangregg.com/blog/2015-03-10/freebsd-flame-graphs.html
> 
> it says to run
> 
> pmcstat ?S RESOURCE_STALLS.ANY -O out.pmcstat sleep 10

Not sure if it's the mailer or what but it should be '-S' and not '?S'.
> 
> However, I get
> 
> freebsd11 ) 0 # pmcstat ?S RESOURCE_STALLS.ANY -O out.pmcstat sleep 10
> pmcstat: [options] [commandline]
>   Measure process and/or system performance using hardware
>   performance monitoring counters.
>   Options include:
>   -C  (toggle) show cumulative counts
>   -D path create profiles in directory "path"
>   -E  (toggle) show counts at process exit
>   -F file write a system-wide callgraph (Kcachegrind format) to "file"
>   -G file write a system-wide callgraph to "file"
>   -M file print executable/gmon file map to "file"
>   -N  (toggle) capture callchains
>   -O file send log output to "file"
>   -P spec allocate a process-private sampling PMC
>   -R file read events from "file"
>   -S spec allocate a system-wide sampling PMC
>   -T  start in top mode
>   -W  (toggle) show counts per context switch
>   -a file print sampled PCs and callgraph to "file"
>   -c cpu-list set cpus for subsequent system-wide PMCs
>   -d  (toggle) track descendants
>   -e  use wide history counter for gprof(1) output
>   -f spec pass "spec" to as plugin option
>   -g  produce gprof(1) compatible profiles
>   -k dir  set the path to the kernel
>   -l secs set duration time
>   -m file print sampled PCs to "file"
>   -n rate set sampling rate
>   -o file send print output to "file"
>   -p spec allocate a process-private counting PMC
>   -q  suppress verbosity
>   -r fsroot   specify FS root directory
>   -s spec allocate a system-wide counting PMC
>   -t process-spec attach to running processes matching "process-spec"
>   -v  increase verbosity
>   -w secs set printing time interval
>   -z depthlimit callchain display depth
> 
> 
> I assume the event specifier is not correct. Is there a list of the
> valid ones in FreeBSD 11?

You can see the available event specs via 'pmccontrol -L'.

Cheers,
Hiren




Re: sonewconn: pcb [...]: Listen queue overflow to human-readable form

2016-12-19 Thread hiren panchasara
On 12/16/16 at 11:20P, Andrey V. Elsukov wrote:
> On 15.12.2016 20:51, hiren panchasara wrote:
> > On 12/15/16 at 05:23P, Eugene M. Zheganin wrote:
> >> Hi.
> >>
> >> Sometimes on one of my servers I get a dmesg full of
> >>
> >> sonewconn: pcb 0xf80373aec000: Listen queue overflow: 49 already in
> >> queue awaiting acceptance (6 occurrences)
> > [skip]
> >>
> >> but at the time of investigation the socket is already closed and lsof
> >> cannot show me the owner. I wonder if the kernel itself can decode this
> >> output and write it in human-readable form?
> > 
> > I have this not-quite-correct patch that may help you. (If you follow the
> > discussion there, you'd know why it's not complete.) 
> > 
> > https://lists.freebsd.org/pipermail/freebsd-net/2014-March/038074.html
> 
> Hi Hiren,
> 
> I think the check for socket's domain should be enough?
> 
> 
> -- 
> WBR, Andrey V. Elsukov

> Index: sys/kern/uipc_socket.c
> ===
> --- sys/kern/uipc_socket.c(revision 309834)
> +++ sys/kern/uipc_socket.c(working copy)
> @@ -139,6 +139,7 @@ __FBSDID("$FreeBSD$");
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  
> @@ -577,10 +578,15 @@ sonewconn(struct socket *head, int connstatus)
>   overcount++;
>  
>   if (ratecheck(, )) {
> - log(LOG_DEBUG, "%s: pcb %p: Listen queue overflow: "
> - "%i already in queue awaiting acceptance "
> - "(%d occurrences)\n",
> - __func__, head->so_pcb, head->so_qlen, overcount);
> + if (INP_CHECK_SOCKAF(head, AF_INET) ||
> + INP_CHECK_SOCKAF(head, AF_INET6))
> + over = ntohs(sotoinpcb(head)->inp_lport);
> + else
> + over = 0;
> + log(LOG_DEBUG, "%s: pcb %p: Listen queue overflow on "
> + "port %d: %i already in queue awaiting acceptance "
> + "(%d occurrences)\n", __func__, head->so_pcb,
> + over, head->so_qlen, overcount);
>  
>   overcount = 0;
>   }

Andrey,
Thanks, this seems correct to me. :-)

Cheers,
Hiren
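The patch above recovers the listening port with ntohs(sotoinpcb(head)->inp_lport), but only after checking that the socket is AF_INET or AF_INET6, and logs 0 otherwise. As a userland analogue (an illustration only, not the kernel code), the same domain check and byte-order conversion can be sketched with getsockname(); the helper name socket_local_port is hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Userland analogue of the patch: return the local port of a socket for
 * AF_INET/AF_INET6, 0 for any other domain (like the patch's "over = 0"),
 * and -1 if getsockname() fails.
 */
static int socket_local_port(int fd)
{
    struct sockaddr_storage ss;
    socklen_t len = sizeof(ss);

    assert(fd >= 0);
    memset(&ss, 0, sizeof(ss));
    if (getsockname(fd, (struct sockaddr *)&ss, &len) != 0)
        return -1;

    switch (ss.ss_family) {
    case AF_INET:
        /* Ports are kept in network byte order, hence ntohs(). */
        return ntohs(((struct sockaddr_in *)&ss)->sin_port);
    case AF_INET6:
        return ntohs(((struct sockaddr_in6 *)&ss)->sin6_port);
    default:
        return 0; /* e.g. AF_UNIX: there is no port to report */
    }
}
```

The domain check matters for the same reason as in the patch: only inet sockets have a port, so other domains must not be interpreted as an inpcb.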




Re: sonewconn: pcb [...]: Listen queue overflow to human-readable form

2016-12-15 Thread hiren panchasara
On 12/15/16 at 05:23P, Eugene M. Zheganin wrote:
> Hi.
> 
> Sometimes on one of my servers I get a dmesg full of
> 
> sonewconn: pcb 0xf80373aec000: Listen queue overflow: 49 already in
> queue awaiting acceptance (6 occurrences)
[skip]
> 
> but at the time of investigation the socket is already closed and lsof
> cannot show me the owner. I wonder if the kernel itself can decode this
> output and write it in human-readable form?

I have this not-quite-correct patch that may help you. (If you follow the
discussion there, you'd know why it's not complete.) 

https://lists.freebsd.org/pipermail/freebsd-net/2014-March/038074.html

Cheers,
Hiren




Re: 11.0 stuck on high network load

2016-10-06 Thread hiren panchasara
On 10/06/16 at 09:51P, Julien Charbon wrote:
> 
>  Hi Hiren,
> 
> On 10/6/16 9:44 AM, hiren panchasara wrote:
> > On 10/06/16 at 09:28P, Julien Charbon wrote:
> >> On 9/28/16 1:59 PM, Slawa Olhovchenkov wrote:
> >>> On Wed, Sep 28, 2016 at 12:06:47PM +0200, Julien Charbon wrote:
> >>>> 
> >>>>  I am still trying to reproduce your issue, without success so far.
> >>
> >>  Thanks to Slawa's effort and multiple debug reports, we are starting to
> >> see the bottom of this issue, and it seems to be a generic one.  The most
> >> useful report being:
> >>
> >> panic: tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp != NULL
> > 
> > I know there are multiple and probably related problems being
> > discussed here, but what about the one mentioned in the subject of this
> > thread?
> > Apologies if I've missed something conclusive in one of the replies of
> > this thread about that issue.
> 
>  This issue can lead to the machine being stuck on high network load: by
> double freeing an inp, you can corrupt/leak an inp lock, and the network
> stack can wait indefinitely for this inp lock to be released.  You get this
> assert only with INVARIANTS defined.
> 
>  As usual, there can be more than one issue here, but this
> INP_TIMEWAIT|INP_DROPPED issue needs to be fixed anyway.

Thanks for the explanation, Julien.

Cheers,
Hiren




Re: 11.0 stuck on high network load

2016-10-06 Thread hiren panchasara
On 10/06/16 at 09:28P, Julien Charbon wrote:
> 
>  Hi,
> 
> On 9/28/16 1:59 PM, Slawa Olhovchenkov wrote:
> > On Wed, Sep 28, 2016 at 12:06:47PM +0200, Julien Charbon wrote:
> >> 
> >>  I am still trying to reproduce your issue, without success so far.
> 
>  Thanks to Slawa's effort and multiple debug reports, we are starting to see
> the bottom of this issue, and it seems to be a generic one.  The most useful
> report being:
> 
> panic: tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp != NULL

I know there are multiple and probably related problems being
discussed here, but what about the one mentioned in the subject of this
thread?
Apologies if I've missed something conclusive in one of the replies of
this thread about that issue.

Cheers,
Hiren




Re: 11.0 stuck on high network load

2016-09-16 Thread hiren panchasara
On 09/16/16 at 02:46P, hiren panchasara wrote:
> On 09/16/16 at 11:30P, Slawa Olhovchenkov wrote:
> > On Fri, Sep 16, 2016 at 12:11:55PM -0700, hiren panchasara wrote:
> > 
> > > 
> > > As I suspected, this looks like a hang trying to lock V_tcbinfo.
> > > 
> > > I'm ccing Julien here who worked on WLOCK -> RLOCK transition to improve
> > > performance for short-lived connections. I am not too sure if that's the
> > > problem, but it looks like a similar area, so he may be able to provide
> > > some insights.
> > 
> > No, this is a different case. In my case, there had been no network
> > traffic for more than an hour at that time. This is some sort of deadlock
> > or the like.
> 
> In my limited understanding, such a deadlock-like situation can occur
> without traffic too.

Err, I meant to say light traffic.

Cheers,
Hiren




Re: 11.0 stuck on high network load

2016-09-16 Thread hiren panchasara
On 09/16/16 at 11:30P, Slawa Olhovchenkov wrote:
> On Fri, Sep 16, 2016 at 12:11:55PM -0700, hiren panchasara wrote:
> 
> > 
> > As I suspected, this looks like a hang trying to lock V_tcbinfo.
> > 
> > I'm ccing Julien here who worked on WLOCK -> RLOCK transition to improve
> > performance for short-lived connections. I am not too sure if that's the
> > problem, but it looks like a similar area, so he may be able to provide
> > some insights.
> 
> No, this is a different case. In my case, there had been no network
> traffic for more than an hour at that time. This is some sort of deadlock
> or the like.

In my limited understanding, such a deadlock-like situation can occur
without traffic too.

Cheers,
Hiren




Re: 11.0 stuck on high network load

2016-09-16 Thread hiren panchasara
+ jch@ 
On 09/16/16 at 10:03P, Slawa Olhovchenkov wrote:
> On Fri, Sep 16, 2016 at 11:30:53AM -0700, hiren panchasara wrote:
> 
> > On 09/16/16 at 09:18P, Slawa Olhovchenkov wrote:
> > > On Thu, Sep 15, 2016 at 12:06:33PM +0300, Slawa Olhovchenkov wrote:
> > > 
> > > > On Thu, Sep 15, 2016 at 11:59:38AM +0300, Konstantin Belousov wrote:
> > > > 
> > > > > On Thu, Sep 15, 2016 at 12:35:04AM +0300, Slawa Olhovchenkov wrote:
> > > > > > On Sun, Sep 04, 2016 at 06:46:12PM -0700, hiren panchasara wrote:
> > > > > > 
> > > > > > > On 09/05/16 at 12:57P, Slawa Olhovchenkov wrote:
> > > > > > > > I am trying to use 11.0 on a dual E5-2620 (no X2APIC).
> > > > > > > > Under high network load, and maybe additional conditions, the
> > > > > > > > system goes into an unresponsive state -- no reaction to network
> > > > > > > > or console (USB IPMI emulation). INVARIANTS gives too high an
> > > > > > > > overhead. Is there some way to debug this?
> > > > > > > 
> > > > > > > Can you panic it from console to get to db> to get backtrace and
> > > > > > > other info when it goes unresponsive?
> > > > > > 
> > > > > > ipmi console doesn't respond (chassis power diag doesn't react);
> > > > > > login on sol console is stuck on *tcp.
> > > > > 
> > > > > Is the 'login' you reference the ipmi client state, or do you mean
> > > > > login(1) on the wedged host?
> > > > 
> > > > on the wedged host
> > > > 
> > > > > If BMC stops responding simultaneously with the host, I would suspect
> > > > > hardware platform issues instead of a software problem.  Do you have
> > > > > a dedicated LAN port for BMC?
> > > > 
> > > > Yes.
> > > > But BMC emulates a USB keyboard, and this may be a lock inside the USB
> > > > system.
> > > > "ipmi console doesn't respond" should be read as "ipmi console is
> > > > running and attached, but the system doesn't react to keypresses on
> > > > this console". At the same moment, the system responds to `enter` on
> > > > the ipmi sol console, but after entering `root`, login is stuck in the
> > > > '*tcp' state (I think this is NIS related).
> > > 
> > > ~^B doesn't break into the debugger.
> > > But I can log in to the sol console.
> > 
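> > You can probably enter the debugger with the sysctl
> > debug.kdb.enter ("set to enter the debugger"), i.e. 'sysctl debug.kdb.enter=1',
> > 
> > or force a panic and get a vmcore with
> > debug.kdb.panic ("set to panic the kernel"), i.e. 'sysctl debug.kdb.panic=1'.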
> 
> I reset this host.
> PMC samples collected and decoded:
> 
> @ CPU_CLK_UNHALTED_CORE [4653445 samples]
> 
> 51.86%  [2413083]  lock_delay @ /boot/kernel.VSTREAM/kernel
>  100.0%  [2413083]   __rw_wlock_hard
>   100.0%  [2413083]tcp_tw_2msl_scan
>99.99%  [2412958] pfslowtimo
> 100.0%  [2412958]  softclock_call_cc
>  100.0%  [2412958]   softclock
>   100.0%  [2412958]intr_event_execute_handlers
>100.0%  [2412958] ithread_loop
> 100.0%  [2412958]  fork_exit
>00.01%  [125] tcp_twstart
> 100.0%  [125]  tcp_do_segment
>  100.0%  [125]   tcp_input
>   100.0%  [125]ip_input
>100.0%  [125] swi_net
> 100.0%  [125]  intr_event_execute_handlers
>  100.0%  [125]   ithread_loop
>   100.0%  [125]fork_exit
> 
> 09.43%  [438774]   _rw_runlock_cookie @ /boot/kernel.VSTREAM/kernel
>  100.0%  [438774]tcp_tw_2msl_scan
>   99.99%  [438735] pfslowtimo
>100.0%  [438735]  softclock_call_cc
> 100.0%  [438735]   softclock
>  100.0%  [438735]intr_event_execute_handlers
>   100.0%  [438735] ithread_loop
>100.0%  [438735]  fork_exit
>   00.01%  [39] tcp_twstart
>100.0%  [39]  tcp_do_segment
> 100.0%  [39]   tcp_input
>  100.0%  [39]ip_input
>   100.0%  [39] swi_net
>100.0%  [39]  intr_event_execute_handlers
> 100.0%  [39]   ithread_loop
>  100.0%  [39]fork_exit
> 
> 08.57%  [398970]   __rw_wlock_hard @ /boot/kernel.VSTRE

Re: 11.0 stuck on high network load

2016-09-16 Thread hiren panchasara
On 09/16/16 at 09:18P, Slawa Olhovchenkov wrote:
> On Thu, Sep 15, 2016 at 12:06:33PM +0300, Slawa Olhovchenkov wrote:
> 
> > On Thu, Sep 15, 2016 at 11:59:38AM +0300, Konstantin Belousov wrote:
> > 
> > > On Thu, Sep 15, 2016 at 12:35:04AM +0300, Slawa Olhovchenkov wrote:
> > > > On Sun, Sep 04, 2016 at 06:46:12PM -0700, hiren panchasara wrote:
> > > > 
> > > > > On 09/05/16 at 12:57P, Slawa Olhovchenkov wrote:
> > > > > > I am trying to use 11.0 on a dual E5-2620 (no X2APIC).
> > > > > > Under high network load, and maybe additional conditions, the system
> > > > > > goes into an unresponsive state -- no reaction to network or console
> > > > > > (USB IPMI emulation). INVARIANTS gives too high an overhead. Is there
> > > > > > some way to debug this?
> > > > > 
> > > > > Can you panic it from console to get to db> to get backtrace and other
> > > > > info when it goes unresponsive?
> > > > 
> > > > ipmi console doesn't respond (chassis power diag doesn't react);
> > > > login on sol console is stuck on *tcp.
> > > 
> > > Is the 'login' you reference the ipmi client state, or do you mean
> > > login(1) on the wedged host?
> > 
> > on the wedged host
> > 
> > > If BMC stops responding simultaneously with the host, I would suspect
> > > hardware platform issues instead of a software problem.  Do you have a
> > > dedicated LAN port for BMC?
> > 
> > Yes.
> > But BMC emulates a USB keyboard, and this may be a lock inside the USB
> > system.
> > "ipmi console doesn't respond" should be read as "ipmi console is running
> > and attached, but the system doesn't react to keypresses on this console".
> > At the same moment, the system responds to `enter` on the ipmi sol console,
> > but after entering `root`, login is stuck in the '*tcp' state (I think this
> > is NIS related).
> 
> ~^B doesn't break into the debugger.
> But I can log in to the sol console.

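You can probably enter the debugger with the sysctl
debug.kdb.enter ("set to enter the debugger"), i.e. 'sysctl debug.kdb.enter=1',

or force a panic and get a vmcore with
debug.kdb.panic ("set to panic the kernel"), i.e. 'sysctl debug.kdb.panic=1'.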

Cheers,
Hiren




Re: 11.0 stuck on high network load

2016-09-14 Thread hiren panchasara
On 09/15/16 at 12:57P, Slawa Olhovchenkov wrote:
> On Wed, Sep 14, 2016 at 02:43:06PM -0700, hiren panchasara wrote:
> 
> > On 09/15/16 at 12:35P, Slawa Olhovchenkov wrote:
> > > On Sun, Sep 04, 2016 at 06:46:12PM -0700, hiren panchasara wrote:
> > > 
> > > > On 09/05/16 at 12:57P, Slawa Olhovchenkov wrote:
> > > > > I am trying to use 11.0 on a dual E5-2620 (no X2APIC).
> > > > > Under high network load, and maybe additional conditions, the system
> > > > > goes into an unresponsive state -- no reaction to network or console
> > > > > (USB IPMI emulation). INVARIANTS gives too high an overhead. Is there
> > > > > some way to debug this?
> > > > 
> > > > Can you panic it from console to get to db> to get backtrace and other
> > > > info when it goes unresponsive?
> > > 
> > > ipmi console doesn't respond (chassis power diag doesn't react);
> > > login on sol console is stuck on *tcp.
> > 
> > I assume you tried ~^b (tilde followed by ctrl+b) without success?
> 
> ~B, as in man ipmitool

No, not shift-b but ctrl-b.

I am not aware of the ipmitool reference. On an unresponsive console, try
~^b (tilde followed by ctrl+b).

> 
> > That usually drops into db>
> 
> Maybe some sysctl is now needed to enable this?

There is a sysctl for this too, but on the console the keystrokes I mentioned
should work, imo.

Cheers,
Hiren




Re: 11.0 stuck on high network load

2016-09-14 Thread hiren panchasara
On 09/15/16 at 12:35P, Slawa Olhovchenkov wrote:
> On Sun, Sep 04, 2016 at 06:46:12PM -0700, hiren panchasara wrote:
> 
> > On 09/05/16 at 12:57P, Slawa Olhovchenkov wrote:
> > > I am trying to use 11.0 on a dual E5-2620 (no X2APIC).
> > > Under high network load, and maybe additional conditions, the system goes
> > > into an unresponsive state -- no reaction to network or console (USB IPMI
> > > emulation). INVARIANTS gives too high an overhead. Is there some way to
> > > debug this?
> > 
> > Can you panic it from console to get to db> to get backtrace and other
> > info when it goes unresponsive?
> 
> ipmi console doesn't respond (chassis power diag doesn't react);
> login on sol console is stuck on *tcp.

Also, does *tcp mean it's stuck on the lock named "tcp"? If so, that'd be
the lock on V_tcbinfo, I think.

tcp_subr.c has tcp_init() which calls
in_pcbinfo_init(_tcbinfo, "tcp", _tcb, hashsize, hashsize,
"tcp_inpcb", tcp_inpcb_init, NULL, 0, IPI_HASHFIELDS_4TUPLE);

and "tcp" is the name used to initialise the lock inside
in_pcbinfo_init() with
INP_INFO_LOCK_INIT(pcbinfo, name); 

What exact svn rev are you on? Also do you have any local changes?

Cheers,
Hiren




Re: 11.0 stuck on high network load

2016-09-14 Thread hiren panchasara
On 09/15/16 at 12:35P, Slawa Olhovchenkov wrote:
> On Sun, Sep 04, 2016 at 06:46:12PM -0700, hiren panchasara wrote:
> 
> > On 09/05/16 at 12:57P, Slawa Olhovchenkov wrote:
> > > I am trying to use 11.0 on a dual E5-2620 (no X2APIC).
> > > Under high network load, and maybe additional conditions, the system goes
> > > into an unresponsive state -- no reaction to network or console (USB IPMI
> > > emulation). INVARIANTS gives too high an overhead. Is there some way to
> > > debug this?
> > 
> > Can you panic it from console to get to db> to get backtrace and other
> > info when it goes unresponsive?
> 
> ipmi console doesn't respond (chassis power diag doesn't react);
> login on sol console is stuck on *tcp.

I assume you tried ~^b (tilde followed by ctrl+b) without success?

That usually drops into db>

I am also fighting an issue where, upon said keystrokes, I see
"KDB: enter: Break to debugger" but it doesn't drop to db>.
At that point I have to 'ipmitool blah power reset' the box.

Cheers,
Hiren




Re: 11.0 stuck on high network load

2016-09-04 Thread hiren panchasara
On 09/05/16 at 12:57P, Slawa Olhovchenkov wrote:
> I am trying to use 11.0 on a dual E5-2620 (no X2APIC).
> Under high network load, and maybe additional conditions, the system goes
> into an unresponsive state -- no reaction to network or console (USB IPMI
> emulation). INVARIANTS gives too high an overhead. Is there some way to
> debug this?

Can you panic it from console to get to db> to get backtrace and other
info when it goes unresponsive?

Cheers,
Hiren




Re: intr using Swap

2016-02-17 Thread hiren panchasara
On 02/17/16 at 04:44P, Efraín Déctor wrote:
> On 17/02/2016 at 01:15 p.m., dweimer wrote:
> >
> > They may not show as swapped unless the entire process is actually 
> > swapped, which would be unlikely to occur. Personally I wouldn't worry 
> > about it, the only thing I can think of is to restart processes one at 
> > a time to see which one clears up the swap usage. Granted you may see 
> > a little clear after each process.
> >
> > The more important task would be to determine what caused the memory 
> > to run out in the first place, and decide if it's going to be a 
> > frequent enough occurrence to justify adding physical memory to the 
> > system.
> >
> > There is likely some way to find out what is using it, but that is 
> > beyond my knowledge.
> >
> > -- 
> > Thanks,
> >Dean E. Weimer
> > http://www.dweimer.net/
> 
> The server has 64 GB of RAM and 40-45 GB are always inactive; that's why
> I'm wondering why the processes are being swapped out.

Yes, I've seen this too. Inact ends up accumulating a very large chunk of
memory, leaving Free very low. 

What VM/pagedaemon seems to care about is Free+Cache and not just Free.
I kind of get that Free mem is wasted mem, but putting everything in Inact
to the point that the machine has to go into swap when a sudden need arises
also doesn't seem right.

I guess it all boils down to adjusting the defaults to the system's needs,
i.e. if you know you have a proc that may need a large chunk of mem,
you'd need to tweak the free+cache target accordingly. What I find lacking
is the correct/easy way to do it. If I look at the available sysctls:
vm.v_free_min: Minimum low-free-pages threshold
vm.v_cache_min: Min pages on cache queue
vm.v_free_target: Desired free pages
I cannot get them to do the right thing and keep more Free around so
swapping doesn't happen on a sudden need. And are these all runtime
sysctls, or do they require a reboot to work right? 

Anyways, enough rant from someone who doesn't know much about VM. :-)

Cheers,
Hiren
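The vm.v_free_min, vm.v_cache_min and vm.v_free_target thresholds listed in the message above are expressed in pages, so reasoning about them starts with converting a desired memory headroom into a page count. The C sketch below does that conversion and then applies a deliberately simplified, hypothetical model in which only Free+Cache counts as immediately available memory; it is not the pagedaemon's actual algorithm, and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Page size of the running system; falls back to 4096 (an assumption)
 * if sysconf() cannot report it. */
static uint64_t system_page_size(void)
{
    long ps = sysconf(_SC_PAGESIZE);

    return ps > 0 ? (uint64_t)ps : 4096u;
}

/* Convert a headroom given in MiB to pages, rounding up. */
static uint64_t mib_to_pages(uint64_t mib, uint64_t page_size)
{
    assert(page_size > 0);
    return (mib * 1024u * 1024u + page_size - 1) / page_size;
}

/*
 * Simplified, hypothetical model: does Free+Cache cover a sudden request
 * of need_pages and still stay at or above target_pages afterwards?
 */
static int headroom_ok(uint64_t free_pages, uint64_t cache_pages,
    uint64_t target_pages, uint64_t need_pages)
{
    uint64_t avail = free_pages + cache_pages;

    return avail >= need_pages && avail - need_pages >= target_pages;
}
```

For example, keeping about 1 GiB of headroom on a system with 4 KiB pages corresponds to mib_to_pages(1024, 4096), i.e. 262144 pages.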




Re: Strange TCP behaviour in STABLE

2016-01-14 Thread hiren panchasara
On 01/15/16 at 02:48P, Slawa Olhovchenkov wrote:
> 02:14:20.410159 IP 127.0.0.1.5423 > 127.0.0.1.80: Flags [S], seq 818919263, 
> win 65535, options [mss 16344,nop,wscale 9,sackOK,TS val 749536482 ecr 0], 
> length 0
> 02:14:20.410173 IP 127.0.0.1.80 > 127.0.0.1.5423: Flags [S.], seq 644693209, 
> ack 818919264, win 65535, options [mss 16344,nop,wscale 9,sackOK,TS val 
> 3209080170 ecr 749536482], length 0
> 02:14:20.410193 IP 127.0.0.1.5423 > 127.0.0.1.80: Flags [.], ack 1, win 159, 
> options [nop,nop,TS val 749536482 ecr 3209080170], length 0
> 02:14:20.410212 IP 127.0.0.1.5423 > 127.0.0.1.80: Flags [P.], seq 1:417, ack 
> 1, win 159, options [nop,nop,TS val 749536482 ecr 3209080170], length 416
> 02:14:20.410236 IP 127.0.0.1.80 > 127.0.0.1.5423: Flags [.], ack 417, win 
> 158, options [nop,nop,TS val 3209080170 ecr 749536482], length 0
> 02:14:20.412066 IP 127.0.0.1.80 > 127.0.0.1.5423: Flags [P.], seq 1:557, ack 
> 417, win 159, options [nop,nop,TS val 3209080172 ecr 749536482], length 556
> 02:14:20.412086 IP 127.0.0.1.5423 > 127.0.0.1.80: Flags [.], ack 557, win 
> 158, options [nop,nop,TS val 749536484 ecr 3209080172], length 0
> 02:14:20.412163 IP 127.0.0.1.5423 > 127.0.0.1.80: Flags [F.], seq 417, ack 
> 557, win 159, options [nop,nop,TS val 749536484 ecr 3209080172], length 0
> 02:14:20.412175 IP 127.0.0.1.80 > 127.0.0.1.5423: Flags [.], ack 418, win 
> 159, options [nop,nop,TS val 3209080172 ecr 749536484], length 0
> 02:14:20.412241 IP 127.0.0.1.80 > 127.0.0.1.5423: Flags [F.], seq 557, ack 
> 418, win 159, options [nop,nop,TS val 3209080172 ecr 749536484], length 0
> 02:14:20.656139 IP 127.0.0.1.80 > 127.0.0.1.5423: Flags [F.], seq 557, ack 
> 418, win 159, options [nop,nop,TS val 3209080416 ecr 749536484], length 0
> 02:14:20.918187 IP 127.0.0.1.80 > 127.0.0.1.5423: Flags [F.], seq 557, ack 
> 418, win 159, options [nop,nop,TS val 3209080678 ecr 749536484], length 0
> 02:14:21.249783 IP 127.0.0.1.80 > 127.0.0.1.5423: Flags [F.], seq 557, ack 
> 418, win 159, options [nop,nop,TS val 3209081010 ecr 749536484], length 0
> 02:14:21.692560 IP 127.0.0.1.80 > 127.0.0.1.5423: Flags [F.], seq 557, ack 
> 418, win 159, options [nop,nop,TS val 3209081452 ecr 749536484], length 0
> 02:14:22.371972 IP 127.0.0.1.80 > 127.0.0.1.5423: Flags [F.], seq 557, ack 
> 418, win 159, options [nop,nop,TS val 3209082133 ecr 749536484], length 0
> 02:14:23.531776 IP 127.0.0.1.80 > 127.0.0.1.5423: Flags [F.], seq 557, ack 
> 418, win 159, options [nop,nop,TS val 3209083292 ecr 749536484], length 0
> 02:14:25.651788 IP 127.0.0.1.80 > 127.0.0.1.5423: Flags [F.], seq 557, ack 
> 418, win 159, options [nop,nop,TS val 3209085412 ecr 749536484], length 0
> 02:14:29.722527 IP 127.0.0.1.80 > 127.0.0.1.5423: Flags [F.], seq 557, ack 
> 418, win 159, options [nop,nop,TS val 3209089482 ecr 749536484], length 0
> 02:14:37.618090 IP 127.0.0.1.80 > 127.0.0.1.5423: Flags [F.], seq 557, ack 
> 418, win 159, options [nop,nop,TS val 3209097379 ecr 749536484], length 0
> 02:14:53.178362 IP 127.0.0.1.80 > 127.0.0.1.5423: Flags [F.], seq 557, ack 
> 418, win 159, options [nop,nop,TS val 3209112938 ecr 749536484], length 0
> 02:15:08.737766 IP 127.0.0.1.80 > 127.0.0.1.5423: Flags [F.], seq 557, ack 
> 418, win 159, options [nop,nop,TS val 3209128498 ecr 749536484], length 0
> 02:15:24.307989 IP 127.0.0.1.80 > 127.0.0.1.5423: Flags [F.], seq 557, ack 
> 418, win 159, options [nop,nop,TS val 3209144068 ecr 749536484], length 0
> 02:15:39.869441 IP 127.0.0.1.80 > 127.0.0.1.5423: Flags [R.], seq 558, ack 
> 418, win 159, options [nop,nop,TS val 3209159630 ecr 749536484], length 0
> 
> What is the purpose of these retransmits?

A few questions:
1) What svn rev are you on? 
2) Does this happen all the time or is there a specific trigger?
3) Do you see this on -head?

Cheers,
Hiren
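The retransmission schedule in the trace above is easier to judge as gaps between successive packets. Computed from the tcpdump timestamps, the gaps between the FIN retransmissions grow from about 0.24 s, roughly double from about 0.7 s onward, and then stay near 15.56 s for the last few retransmissions and the final RST. The C sketch below is a hypothetical helper for that arithmetic; it parses tcpdump's HH:MM:SS.ffffff format and assumes the capture does not cross midnight.

```c
#include <assert.h>
#include <stdio.h>

/* Parse a tcpdump "HH:MM:SS.ffffff" timestamp into seconds since midnight. */
static int parse_ts(const char *s, double *out)
{
    int h, m;
    double sec;

    if (sscanf(s, "%d:%d:%lf", &h, &m, &sec) != 3)
        return -1;
    *out = h * 3600.0 + m * 60.0 + sec;
    return 0;
}

/*
 * Store the gap between consecutive timestamps in deltas[0..n-2].
 * Returns the number of gaps, or -1 on bad input.
 * Assumes the capture does not cross midnight.
 */
static int retransmit_gaps(const char *const ts[], int n, double deltas[])
{
    double prev, cur;

    assert(ts != NULL && deltas != NULL);
    if (n < 2 || parse_ts(ts[0], &prev) != 0)
        return -1;
    for (int i = 1; i < n; i++) {
        if (parse_ts(ts[i], &cur) != 0)
            return -1;
        deltas[i - 1] = cur - prev;
        prev = cur;
    }
    return n - 1;
}
```

Feeding it the first three FIN timestamps (02:14:20.412241, 02:14:20.656139, 02:14:20.918187) yields gaps of about 0.244 s and 0.262 s.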




Re: TCP regression between 288167 and 291456

2015-12-08 Thread hiren panchasara
+ net@
On 12/03/15 at 06:24P, Slawa Olhovchenkov wrote:
> After upgrading STABLE to r291456, I see a bunch of sockets in
> TIME_WAIT state. In a normal situation I expect about 30k-50k such
> sockets. Now I see all of net.inet.tcp.maxtcptw (440k currently).
> Setting net.inet.tcp.msl to a low value doesn't reduce these sockets.
> 
> I see sockets in TIME_WAIT state for 30 minutes.
> Perhaps sockets may stay in this state forever.

Does updating to the latest stable/10 help? I see there were other fixes
that went in after r291456. (I am not 100% sure that'll help, but it's worth
trying.)

I've added -net to get more relevant eyes on the problem.

Cheers,
Hiren
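As a rough sanity check on the numbers in the report above, Little's law gives the expected steady-state TIME_WAIT population as the connection-close rate multiplied by the TIME_WAIT lifetime of 2*MSL. With the default net.inet.tcp.msl of 30000 (assumed here to be in milliseconds), each socket should leave TIME_WAIT after about 60 seconds, so sockets still in TIME_WAIT after 30 minutes suggest entries that are not being expired rather than simply more load. The C sketch below encodes only that arithmetic; the close rate used in the example is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Little's law estimate of the steady-state TIME_WAIT population:
 * sockets enter TIME_WAIT at close_rate_per_sec and each should stay
 * for 2 * MSL, with MSL given in milliseconds (assumed unit of
 * net.inet.tcp.msl).
 */
static uint64_t expected_timewait(uint64_t close_rate_per_sec, uint64_t msl_ms)
{
    /* Guard against overflow of rate * 2 * msl for absurd inputs. */
    assert(msl_ms == 0 || close_rate_per_sec <= UINT64_MAX / (2u * msl_ms));
    return close_rate_per_sec * 2u * msl_ms / 1000u;
}

/* Expected TIME_WAIT lifetime in seconds for a given MSL in milliseconds. */
static double timewait_seconds(uint64_t msl_ms)
{
    return 2.0 * (double)msl_ms / 1000.0;
}
```

For instance, a hypothetical 500 closes per second with the default MSL gives expected_timewait(500, 30000) = 30000 sockets, in the 30k-50k range described as normal, while 440k entries lingering for 30 minutes would be far outside this estimate.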




Re: Silent data corruption on em(4) interfaces

2015-08-24 Thread hiren panchasara
On 08/20/15 at 12:57P, KOT MATPOCKuH wrote:
> Hello!
> 
> I got silent data corruption when transferring data via the em(4) interface
> on 10.2-STABLE r286912.
> 1. A large file transferred via ftp arrived broken (MD5 checksum mismatch);
> 2. I got disconnects when transferring large data via ssh with messages:
> Corrupted MAC on input.
> Disconnecting: Packet corrupt
> 
> The problem occurs only after a few hours of uptime. Immediately after a
> reboot I transferred the same file via ftp without any errors.
> 
> I tried to use:
> - em0 and em2 interfaces in link aggregation
> - em1 as a clean interface
> But I got the same problem in both cases.
> 
> Also, one time when transferring a file I got these messages:
> em0: Interface stopped DISTRIBUTING, possible flapping
> em0: Watchdog timeout -- resetting
> em2: Interface stopped DISTRIBUTING, possible flapping
> em2: Watchdog timeout -- resetting
> 
> netstat -in does not show any problems:
> Name   Mtu Network  Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll
> em0   1500 Link#1   00:14:4f:01:3f:7a  6689452     0     0   146720     0     0
> em1   1500 Link#2   00:14:4f:01:3f:7b  5732168     0     0  2865912     0     0
> em2   1500 Link#3   00:14:4f:01:3f:7c   501817     0     0  3392333     0     0
> 
> The network adapters are built into the Sun Fire X4100 motherboard:
> em0@pci0:1:1:0: class=0x02 card=0x10118086 chip=0x10108086 rev=0x03 hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = '82546EB Gigabit Ethernet Controller (Copper)'
>     class      = network
>     subclass   = ethernet
> 
> TCP_OFFLOAD is disabled in the kernel's config.

See if disabling TSO helps. You can disable it on the interface with
'-tso', e.g. 'ifconfig em0 -tso'.

Cheers,
Hiren

