Re: Nvidia_load not working

2016-09-18 Thread David Wolfskill
On Sun, Sep 18, 2016 at 06:58:38PM -0700, Charles Cowart wrote:
> I did a clean install of RC3 over RC2, and I noticed that nvidia_load="yes"
> no longer appears to work in /boot/loader.conf. I can still load the module
> from /etc/rc.conf.
> ...

As the nvidia kernel module is part of a port/package, I suspect that
this is more of a "ports" issue than a "stable" issue; in particular, if
the version of the nvidia-driver you're using is sufficiently recent,
you may find a recent ports/UPDATING entry relevant:

20160829:
  AFFECTS: users of x11/nvidia-driver
  AUTHOR: c...@freebsd.org

  The NVidia driver has been updated to version 367.35.  Starting with
  version 358.09, a new kernel module, nvidia-modeset.ko, was added.  This
  new driver component works in conjunction with the nvidia.ko kernel
  module to program the display engine of the GPU.

  Users that experience hangs when starting the X11 server, or observe

(II) NVIDIA(0): Validated MetaModes:
(II) NVIDIA(0): "NULL"

  messages in their /var/log/Xorg.0.log file should replace ``nvidia''
  with ``nvidia-modeset'' in /boot/loader.conf or /etc/rc.conf files,
  depending on how they prefer to load the NVidia driver kernel module.

Peace,
david
-- 
David H. Wolfskill  da...@catwhisker.org
Those who would murder in the name of God or prophet are blasphemous cowards.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.




Nvidia_load not working

2016-09-18 Thread Charles Cowart
I did a clean install of RC3 over RC2, and I noticed that nvidia_load="yes"
no longer appears to work in /boot/loader.conf. I can still load the module
from /etc/rc.conf.
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: 11.0 stuck on high network load

2016-09-18 Thread Slawa Olhovchenkov
On Sun, Sep 18, 2016 at 10:38:58PM +0200, Hans Petter Selasky wrote:

> On 09/18/16 20:10, Slawa Olhovchenkov wrote:
> > On Sun, Sep 18, 2016 at 07:50:08PM +0200, Hans Petter Selasky wrote:
> >
> >> Hi,
> >>
> >> Got some tips regarding this thread.
> >>
> >> Some things you can try:
> >>
> >> 1) Compile kernel from projects/hps_head instead of your 11-stable?
> >
> > How much does it differ from 11-stable?
> 
> Hi,
> 
> The callout subsystem has a different implementation. Otherwise identical.

Is it userland compatible?
Can I recompile only the kernel?


Re: 11.0 stuck on high network load

2016-09-18 Thread Hans Petter Selasky

On 09/18/16 20:10, Slawa Olhovchenkov wrote:
> On Sun, Sep 18, 2016 at 07:50:08PM +0200, Hans Petter Selasky wrote:
>> Hi,
>>
>> Got some tips regarding this thread.
>>
>> Some things you can try:
>>
>> 1) Compile kernel from projects/hps_head instead of your 11-stable?
>
> How much does it differ from 11-stable?

Hi,

The callout subsystem has a different implementation. Otherwise identical.

>> 2) Set net.inet.tcp.per_cpu_timers=1
>
> Already. From 10.x, by manual MFC.

OK.

>> If the system just hangs, it is pretty likely that the timers are going
>> in a loop due to a typical use-after-free.
>>
>> Please keep me CC'ed, since I'm not subscribed to @stable.

--HPS



Re: buildkernel fails with a 'invalid conversion specifier' compiler error

2016-09-18 Thread Dimitry Andric
On 18 Sep 2016, at 20:37, Alex T.  wrote:
> 
> I'm on the stable/10 branch and have been using it to rebuild world
> and kernel. This is the revision I'm currently trying to build, but I
> started seeing the following issue well before it.
> 
> URL: svn://svn.freebsd.org/base/stable/10
> Revision: 305760
> 
> The world builds fine, but building the kernel fails with this error:
> 
> /usr/src/sys/cam/cam_xpt.c:1060:27: error:
>  invalid conversion specifier 'b'
>  [-Werror,-Wformat-invalid-specifier]
>  ...printf("%s%d: quirks=0x%b\n", perip...
>~^
> /usr/src/sys/cam/cam_xpt.c:1061:36: error:
>  data argument not used by format
>  string [-Werror,-Wformat-extra-args]
>  ...periph->unit_number, quirks, bit_st...
> 
> This is what my /etc/make.conf looks like:
> WITH_PKGNG=yes
> SSP_CFLAGS=-fstack-protector-all
> WITH_SSP_PORTS=yes
> WITHOUT="DOCS"
> 
> and I don't have /etc/src.conf. Has anyone seen this issue?
> 
> Any idea what might be misconfigured or missing here?

It's hard to say what is different on your system, but it looks like the
-fformat-extensions flag is somehow not being used for building your
kernel.  If you can't figure out what causes this, you can try to work
around it by setting WITHOUT_FORMAT_EXTENSIONS, or setting WERROR to
empty.

-Dimitry





buildkernel fails with a 'invalid conversion specifier' compiler error

2016-09-18 Thread Alex T.
Hi guys,

I'm on the stable/10 branch and have been using it to rebuild world
and kernel. This is the revision I'm currently trying to build, but I
started seeing the following issue well before it.

URL: svn://svn.freebsd.org/base/stable/10
Revision: 305760

The world builds fine, but building the kernel fails with this error:

/usr/src/sys/cam/cam_xpt.c:1060:27: error:
  invalid conversion specifier 'b'
  [-Werror,-Wformat-invalid-specifier]
  ...printf("%s%d: quirks=0x%b\n", perip...
~^
/usr/src/sys/cam/cam_xpt.c:1061:36: error:
  data argument not used by format
  string [-Werror,-Wformat-extra-args]
  ...periph->unit_number, quirks, bit_st...

This is what my /etc/make.conf looks like:
WITH_PKGNG=yes
SSP_CFLAGS=-fstack-protector-all
WITH_SSP_PORTS=yes
WITHOUT="DOCS"

and I don't have /etc/src.conf. Has anyone seen this issue?

Any idea what might be misconfigured or missing here?

Thank you.


Re: 11.0 stuck on high network load

2016-09-18 Thread Slawa Olhovchenkov
On Sun, Sep 18, 2016 at 07:50:08PM +0200, Hans Petter Selasky wrote:

> Hi,
> 
> Got some tips regarding this thread.
> 
> Some things you can try:
> 
> 1) Compile kernel from projects/hps_head instead of your 11-stable?

How much does it differ from 11-stable?

> 2) Set net.inet.tcp.per_cpu_timers=1

Already. From 10.x, by manual MFC.

> If the system just hangs, it is pretty likely that the timers are going
> in a loop due to a typical use-after-free.
> 
> Please keep me CC'ed, since I'm not subscribed to @stable.
> 
> --HPS


11.0 stuck on high network load

2016-09-18 Thread Hans Petter Selasky

Hi,

Got some tips regarding this thread.

Some things you can try:

1) Compile kernel from projects/hps_head instead of your 11-stable?

2) Set net.inet.tcp.per_cpu_timers=1

If the system just hangs, it is pretty likely that the timers are going
in a loop due to a typical use-after-free.

Please keep me CC'ed, since I'm not subscribed to @stable.

--HPS


Re: zfs resilver keeps restarting

2016-09-18 Thread Alan Somers
On Sun, Sep 18, 2016 at 10:46 AM, Marc UBM Bocklet via freebsd-stable
 wrote:
> On Sun, 18 Sep 2016 10:05:52 -0600
> Alan Somers  wrote:
>
>> On Sun, Sep 18, 2016 at 7:09 AM, Marc UBM Bocklet via freebsd-stable
>>  wrote:
>> >
>> > Hi all,
>> >
>> > due to two bad cables, I had two drives drop from my striped raidz2
>> > pool (built on top of geli encrypted drives). I replaced one of the
>> > drives before I realized that the cabling was at fault - that's the
>> > drive which is being replaced in the output of zpool status below.
>> >
>> > I have just installed the new cables and all SATA errors are gone.
>> > However, the resilver of the pool keeps restarting.
>> >
>> > I see no errors in /var/log/messages, but zpool history -i says:
>> >
>> > 2016-09-18.14:56:21 [txg:1219501] scan setup func=2 mintxg=3 maxtxg=1219391
>> > 2016-09-18.14:56:51 [txg:1219505] scan done complete=0
>> > 2016-09-18.14:56:51 [txg:1219505] scan setup func=2 mintxg=3 maxtxg=1219391
>> > 2016-09-18.14:57:20 [txg:1219509] scan done complete=0
>> > 2016-09-18.14:57:20 [txg:1219509] scan setup func=2 mintxg=3 maxtxg=1219391
>> > 2016-09-18.14:57:49 [txg:1219513] scan done complete=0
>> > 2016-09-18.14:57:49 [txg:1219513] scan setup func=2 mintxg=3 maxtxg=1219391
>> > 2016-09-18.14:58:19 [txg:1219517] scan done complete=0
>> > 2016-09-18.14:58:19 [txg:1219517] scan setup func=2 mintxg=3 maxtxg=1219391
>> > 2016-09-18.14:58:45 [txg:1219521] scan done complete=0
>> > 2016-09-18.14:58:45 [txg:1219521] scan setup func=2 mintxg=3 maxtxg=1219391
>> >
>> > I assume that "scan done complete=0" means that the resilver didn't
>> > finish?
>> >
>> > pool layout is the following:
>> >
>> >   pool: pool
>> >  state: DEGRADED
>> > status: One or more devices is currently being resilvered.  The pool will
>> >         continue to function, possibly in a degraded state.
>> > action: Wait for the resilver to complete.
>> >   scan: resilver in progress since Sun Sep 18 14:51:39 2016
>> >         235G scanned out of 9.81T at 830M/s, 3h21m to go
>> >         13.2M resilvered, 2.34% done
>> > config:
>> >
>> >     NAME                        STATE     READ WRITE CKSUM
>> >     pool                        DEGRADED     0     0     0
>> >       raidz2-0                  ONLINE       0     0     0
>> >         da6.eli                 ONLINE       0     0     0
>> >         da7.eli                 ONLINE       0     0     0
>> >         ada1.eli                ONLINE       0     0     0
>> >         ada2.eli                ONLINE       0     0     0
>> >         da10.eli                ONLINE       0     0     2
>> >         da11.eli                ONLINE       0     0     0
>> >         da12.eli                ONLINE       0     0     0
>> >         da13.eli                ONLINE       0     0     0
>> >       raidz2-1                  DEGRADED     0     0     0
>> >         da0.eli                 ONLINE       0     0     0
>> >         da1.eli                 ONLINE       0     0     0
>> >         da2.eli                 ONLINE       0     0     1  (resilvering)
>> >         replacing-3             DEGRADED     0     0     1
>> >           10699825708166646100  UNAVAIL      0     0     0  was /dev/da3.eli
>> >           da4.eli               ONLINE       0     0     0  (resilvering)
>> >         da3.eli                 ONLINE       0     0     0
>> >         da5.eli                 ONLINE       0     0     0
>> >         da8.eli                 ONLINE       0     0     0
>> >         da9.eli                 ONLINE       0     0     0
>> >
>> > errors: No known data errors
>> >
>> > system is
>> > FreeBSD xxx 10.1-BETA1 FreeBSD 10.1-BETA1 #27 r271633:
>> > Mon Sep 15 22:34:05 CEST 2014
>> > root@xxx:/usr/obj/usr/src/sys/xxx  amd64
>> >
>> > controller is
>> > SAS2116 PCI-Express Fusion-MPT SAS-2 [Meteor]
>> >
>> > Drives are connected via four four-port SATA cables.
>> >
>> > Should I upgrade to 10.3-RELEASE or did I make some sort of
>> > configuration error / overlook something?
>> >
>> > Thanks in advance!
>> >
>> > Cheers,
>> > Marc
>>
>> Resilver will start over anytime there's new damage.  In your case,
>> with two failed drives, resilver should've begun after you replaced
>> the first drive, and restarted after you replaced the second.  Have
>> you seen it restart more than that?  If so, keep an eye on the error
>> counters in "zpool status"; they might give you a clue.  You could
>> also raise the loglevel of devd to "info" in /etc/syslog.conf and see
>> what gets logged to /var/log/devd.log.  That will tell you if drives are
>> dropping out and automatically rejoining the pool, for example.
>
> Thanks a lot for your fast reply. Unfortunately (or not), devd is silent
> and the error count for the pool remains at zero. The resilver, however,
> just keeps restarting. The furthest it got was about 68% 

Re: zfs resilver keeps restarting

2016-09-18 Thread Marc UBM Bocklet via freebsd-stable
On Sun, 18 Sep 2016 10:05:52 -0600
Alan Somers  wrote:

> On Sun, Sep 18, 2016 at 7:09 AM, Marc UBM Bocklet via freebsd-stable
>  wrote:
> >
> > Hi all,
> >
> > due to two bad cables, I had two drives drop from my striped raidz2
> > pool (built on top of geli encrypted drives). I replaced one of the
> > drives before I realized that the cabling was at fault - that's the
> > drive which is being replaced in the output of zpool status below.
> >
> > I have just installed the new cables and all SATA errors are gone.
> > However, the resilver of the pool keeps restarting.
> >
> > I see no errors in /var/log/messages, but zpool history -i says:
> >
> > 2016-09-18.14:56:21 [txg:1219501] scan setup func=2 mintxg=3 maxtxg=1219391
> > 2016-09-18.14:56:51 [txg:1219505] scan done complete=0
> > 2016-09-18.14:56:51 [txg:1219505] scan setup func=2 mintxg=3 maxtxg=1219391
> > 2016-09-18.14:57:20 [txg:1219509] scan done complete=0
> > 2016-09-18.14:57:20 [txg:1219509] scan setup func=2 mintxg=3 maxtxg=1219391
> > 2016-09-18.14:57:49 [txg:1219513] scan done complete=0
> > 2016-09-18.14:57:49 [txg:1219513] scan setup func=2 mintxg=3 maxtxg=1219391
> > 2016-09-18.14:58:19 [txg:1219517] scan done complete=0
> > 2016-09-18.14:58:19 [txg:1219517] scan setup func=2 mintxg=3 maxtxg=1219391
> > 2016-09-18.14:58:45 [txg:1219521] scan done complete=0
> > 2016-09-18.14:58:45 [txg:1219521] scan setup func=2 mintxg=3 maxtxg=1219391
> >
> > I assume that "scan done complete=0" means that the resilver didn't
> > finish?
> >
> > pool layout is the following:
> >
> >   pool: pool
> >  state: DEGRADED
> > status: One or more devices is currently being resilvered.  The pool will
> >         continue to function, possibly in a degraded state.
> > action: Wait for the resilver to complete.
> >   scan: resilver in progress since Sun Sep 18 14:51:39 2016
> >         235G scanned out of 9.81T at 830M/s, 3h21m to go
> >         13.2M resilvered, 2.34% done
> > config:
> >
> >     NAME                        STATE     READ WRITE CKSUM
> >     pool                        DEGRADED     0     0     0
> >       raidz2-0                  ONLINE       0     0     0
> >         da6.eli                 ONLINE       0     0     0
> >         da7.eli                 ONLINE       0     0     0
> >         ada1.eli                ONLINE       0     0     0
> >         ada2.eli                ONLINE       0     0     0
> >         da10.eli                ONLINE       0     0     2
> >         da11.eli                ONLINE       0     0     0
> >         da12.eli                ONLINE       0     0     0
> >         da13.eli                ONLINE       0     0     0
> >       raidz2-1                  DEGRADED     0     0     0
> >         da0.eli                 ONLINE       0     0     0
> >         da1.eli                 ONLINE       0     0     0
> >         da2.eli                 ONLINE       0     0     1  (resilvering)
> >         replacing-3             DEGRADED     0     0     1
> >           10699825708166646100  UNAVAIL      0     0     0  was /dev/da3.eli
> >           da4.eli               ONLINE       0     0     0  (resilvering)
> >         da3.eli                 ONLINE       0     0     0
> >         da5.eli                 ONLINE       0     0     0
> >         da8.eli                 ONLINE       0     0     0
> >         da9.eli                 ONLINE       0     0     0
> >
> > errors: No known data errors
> >
> > system is
> > FreeBSD xxx 10.1-BETA1 FreeBSD 10.1-BETA1 #27 r271633:
> > Mon Sep 15 22:34:05 CEST 2014
> > root@xxx:/usr/obj/usr/src/sys/xxx  amd64
> >
> > controller is
> > SAS2116 PCI-Express Fusion-MPT SAS-2 [Meteor]
> >
> > Drives are connected via four four-port SATA cables.
> >
> > Should I upgrade to 10.3-RELEASE or did I make some sort of
> > configuration error / overlook something?
> >
> > Thanks in advance!
> >
> > Cheers,
> > Marc
> 
> Resilver will start over anytime there's new damage.  In your case,
> with two failed drives, resilver should've begun after you replaced
> the first drive, and restarted after you replaced the second.  Have
> you seen it restart more than that?  If so, keep an eye on the error
> counters in "zpool status"; they might give you a clue.  You could
> also raise the loglevel of devd to "info" in /etc/syslog.conf and see
> what gets logged to /var/log/devd.log.  That will tell you if drives are
> dropping out and automatically rejoining the pool, for example.

Thanks a lot for your fast reply. Unfortunately (or not), devd is silent
and the error count for the pool remains at zero. The resilver, however,
just keeps restarting. The furthest it got was about 68% resilvered.
Usually, it gets to 2 - 3%, then restarts. 

I plan on offlining the pool, upgrading to 10.3, and then reimporting
the pool next. Does that make sense?

Cheers,
Marc

-- 
Marc "UBM" Bocklet 

Re: 11.0 stuck on high network load

2016-09-18 Thread Slawa Olhovchenkov
On Fri, Sep 16, 2016 at 12:11:55PM -0700, hiren panchasara wrote:

> + jch@ 
> On 09/16/16 at 10:03P, Slawa Olhovchenkov wrote:
> > On Fri, Sep 16, 2016 at 11:30:53AM -0700, hiren panchasara wrote:
> > 
> > > On 09/16/16 at 09:18P, Slawa Olhovchenkov wrote:
> > > > On Thu, Sep 15, 2016 at 12:06:33PM +0300, Slawa Olhovchenkov wrote:
> > > > 
> > > > > On Thu, Sep 15, 2016 at 11:59:38AM +0300, Konstantin Belousov wrote:
> > > > > 
> > > > > > On Thu, Sep 15, 2016 at 12:35:04AM +0300, Slawa Olhovchenkov wrote:
> > > > > > > On Sun, Sep 04, 2016 at 06:46:12PM -0700, hiren panchasara wrote:
> > > > > > > 
> > > > > > > > On 09/05/16 at 12:57P, Slawa Olhovchenkov wrote:
> > > > > > > > > I am trying to use 11.0 on a dual E5-2620 (no X2APIC).
> > > > > > > > > Under high network load, and maybe additional conditions, the
> > > > > > > > > system goes into an unresponsive state -- no reaction to network
> > > > > > > > > or console (USB IPMI emulation). INVARIANTS gives too high an
> > > > > > > > > overhead. Is there some way to debug this?
> > > > > > > > 
> > > > > > > > Can you panic it from the console to get to db> to get a backtrace
> > > > > > > > and other info when it goes unresponsive?
> > > > > > > 
> > > > > > > The IPMI console doesn't respond (chassis power diag doesn't react);
> > > > > > > login on the SOL console gets stuck in *tcp.
> > > > > > 
> > > > > > Is the 'login' you reference the IPMI client state, or do you mean
> > > > > > login(1) on the wedged host?
> > > > > 
> > > > > on the wedged host
> > > > > 
> > > > > > If the BMC stops responding simultaneously with the host, I would
> > > > > > suspect hardware platform issues instead of a software problem.  Do you
> > > > > > have a dedicated LAN port for the BMC?
> > > > > 
> > > > > Yes.
> > > > > But the BMC emulates a USB keyboard, and this may be a lock inside the
> > > > > USB system.
> > > > > "IPMI console doesn't respond" should be read as "the IPMI console is
> > > > > running and attached, but the system doesn't react to keypresses on this
> > > > > console". At the same moment the system responds to `enter` on the IPMI
> > > > > SOL console, but after entering `root`, login gets stuck in the '*tcp'
> > > > > state (I think this is NIS related).
> > > > 
> > > > ~^B doesn't break into the debugger.
> > > > But I can log in to the SOL console.
> > > 
> > > You can probably:
> > > debug.kdb.enter: set to enter the debugger
> > > 
> > > or force a panic and get vmcore:
> > > debug.kdb.panic: set to panic the kernel
> > 
> > I have reset this host.
> > PMC samples collected and decoded:
> > 
> > @ CPU_CLK_UNHALTED_CORE [4653445 samples]
> > 
> > 51.86%  [2413083]  lock_delay @ /boot/kernel.VSTREAM/kernel
> >   100.0%  [2413083]  __rw_wlock_hard
> >     100.0%  [2413083]  tcp_tw_2msl_scan
> >       99.99%  [2412958]  pfslowtimo
> >         100.0%  [2412958]  softclock_call_cc
> >           100.0%  [2412958]  softclock
> >             100.0%  [2412958]  intr_event_execute_handlers
> >               100.0%  [2412958]  ithread_loop
> >                 100.0%  [2412958]  fork_exit
> >       00.01%  [125]  tcp_twstart
> >         100.0%  [125]  tcp_do_segment
> >           100.0%  [125]  tcp_input
> >             100.0%  [125]  ip_input
> >               100.0%  [125]  swi_net
> >                 100.0%  [125]  intr_event_execute_handlers
> >                   100.0%  [125]  ithread_loop
> >                     100.0%  [125]  fork_exit
> > 
> > 09.43%  [438774]  _rw_runlock_cookie @ /boot/kernel.VSTREAM/kernel
> >   100.0%  [438774]  tcp_tw_2msl_scan
> >     99.99%  [438735]  pfslowtimo
> >       100.0%  [438735]  softclock_call_cc
> >         100.0%  [438735]  softclock
> >           100.0%  [438735]  intr_event_execute_handlers
> >             100.0%  [438735]  ithread_loop
> >               100.0%  [438735]  fork_exit
> >     00.01%  [39]  tcp_twstart
> >       100.0%  [39]  tcp_do_segment
> >         100.0%  [39]  tcp_input
> >           100.0%  [39]  ip_input
> >             100.0%  [39]  swi_net
> >               100.0%  [39]  intr_event_execute_handlers
> >                 100.0%  [39]  ithread_loop
> >                   100.0%  [39]  fork_exit
> > 
> > 08.57%  [398970]  __rw_wlock_hard @ /boot/kernel.VSTREAM/kernel
> >   100.0%  [398970]  tcp_tw_2msl_scan
> >     99.99%  [398940]  pfslowtimo
> >       100.0%  [398940]  softclock_call_cc
> >         100.0%  [398940]  softclock
> >           100.0%  [398940]  intr_event_execute_handlers
> >             100.0%  [398940]  ithread_loop
> >               100.0%  [398940]  fork_exit
> >     00.01%  [30]  tcp_twstart
> >       100.0%  [30]  tcp_do_segment
> >         100.0%  [30]  tcp_input
> >           100.0%  [30]  ip_input
> >             100.0%  [30]  swi_net
> >               100.0%  [30]  intr_event_execute_handlers
> >                 100.0%  [30]

Re: nginx and FreeBSD11

2016-09-18 Thread Slawa Olhovchenkov
On Thu, Sep 15, 2016 at 10:28:11AM -0700, John Baldwin wrote:

> On Thursday, September 15, 2016 05:41:03 PM Slawa Olhovchenkov wrote:
> > On Wed, Sep 07, 2016 at 10:13:48PM +0300, Slawa Olhovchenkov wrote:
> > 
> > > I have a strange issue with nginx on FreeBSD11.
> > > I have FreeBSD11 installed over STABLE-10.
> > > nginx built for FreeBSD10 and run without recompiling works fine.
> > > nginx built for FreeBSD11 crashes inside rbtree lookups: the next node
> > > is total garbage.
> > > 
> > > I see the following potential causes:
> > > 
> > > 1) clang 3.8 code generation issue
> > > 2) system library issue
> > > 
> > > Maybe I am missing something?
> > > 
> > > How do I find the real cause?
> > 
> > I found the real cause, and it looks like a show-stopper for RELEASE.
> > I use nginx with AIO, and AIO from one nginx process corrupts the memory
> > of another nginx process. Yes, this is cross-process memory
> > corruption.
> > 
> > In the last case, the process with pid 1060 dumped core at 15:45:14.
> > Corrupted memory is at 0x860697000.
> > I know there is good memory at 0x86067f800.
> > Dumping this region (from the core) to a file and analyzing it with
> > hexdump, I found the start of the corrupt region -- offset c8c0 from
> > 0x86067f800.
> > 0x86067f800+0xc8c0 = 0x86068c0c0
> > 
> > I had previously enabled debug logging of started AIO operations to the
> > nginx error log (memory address, file name, offset and size of transfer).
> > 
> > grep -i 86068c0c0 error.log near 15:45:14 gives the target file.
> > grep ce949665cbcd.hls error.log near 15:45:14 gives the following result:
> > 
> > 2016/09/15 15:45:13 [notice] 1055#0: *11659936 AIO_RD 00082065DB60 start 00086068C0C0 561b0   2646736 ce949665cbcd.hls
> > 2016/09/15 15:45:14 [notice] 1060#0: *10998125 AIO_RD 00081F1FFB60 start 00086FF2C0C0 6cdf0 140016832 ce949665cbcd.hls
> > 2016/09/15 15:45:14 [notice] 1055#0: *11659936 AIO_RD 0008216B6B60 start 00086472B7C0 7ff70   2999424 ce949665cbcd.hls
> 
> Does nginx only use AIO for regular files or does it also use it with sockets?
> 
> You can try using this patch as a diagnostic (you will need to
> run with INVARIANTS enabled, or at least enabled for vfs_aio.c):
> 
> Index: vfs_aio.c
> ===
> --- vfs_aio.c (revision 305811)
> +++ vfs_aio.c (working copy)
> @@ -787,6 +787,8 @@ aio_process_rw(struct kaiocb *job)
>* aio_aqueue() acquires a reference to the file that is
>* released in aio_free_entry().
>*/
> + KASSERT(curproc->p_vmspace == job->userproc->p_vmspace,
> + ("%s: vmspace mismatch", __func__));
>   if (cb->aio_lio_opcode == LIO_READ) {
>   auio.uio_rw = UIO_READ;
>   if (auio.uio_resid == 0)
> @@ -1054,6 +1056,8 @@ aio_switch_vmspace(struct kaiocb *job)
>  {
>  
>   vmspace_switch_aio(job->userproc->p_vmspace);
> + KASSERT(curproc->p_vmspace == job->userproc->p_vmspace,
> + ("%s: vmspace mismatch", __func__));
>  }
> 
> If this panics, then vmspace_switch_aio() is not working for
> some reason.

I tried the following DTrace script:

#pragma D option dynvarsize=64m

/* pid that issued aio_read() for a (vmspace, user buffer) pair */
int req[struct vmspace *, void *];
self int trace;

/* Remember every user buffer passed to aio_read(). */
syscall:freebsd:aio_read:entry
{
	this->aio = *(struct aiocb *)copyin(arg0, sizeof(struct aiocb));
	req[curthread->td_proc->p_vmspace, this->aio.aio_buf] =
	    curthread->td_proc->p_pid;
}

/* A job is being serviced: trace I/O done by this thread. */
fbt:kernel:aio_process_rw:entry
{
	self->job = args[0];
	self->trace = 1;
}

/* Job finished: forget its buffer and stop tracing. */
fbt:kernel:aio_process_rw:return
/self->trace/
{
	req[self->job->userproc->p_vmspace, self->job->uaiocb.aio_buf] = 0;
	self->job = 0;
	self->trace = 0;
}

/*
 * I/O issued while servicing a job, into a buffer that is not
 * registered for the current vmspace: report it.
 */
fbt:kernel:vn_io_fault:entry
/self->trace && !req[curthread->td_proc->p_vmspace,
    args[1]->uio_iov[0].iov_base]/
{
	this->buf = args[1]->uio_iov[0].iov_base;
	printf("%Y vn_io_fault %p:%p pid %d\n", walltimestamp,
	    curthread->td_proc->p_vmspace, this->buf,
	    req[curthread->td_proc->p_vmspace, this->buf]);
}
===

And I didn't get any messages near the nginx core dump.
What can I check next?
Maybe check the context/address space switch for the kernel process?


Re: zfs resilver keeps restarting

2016-09-18 Thread Alan Somers
On Sun, Sep 18, 2016 at 7:09 AM, Marc UBM Bocklet via freebsd-stable
 wrote:
>
> Hi all,
>
> due to two bad cables, I had two drives drop from my striped raidz2
> pool (built on top of geli encrypted drives). I replaced one of the
> drives before I realized that the cabling was at fault - that's the
> drive which is being replaced in the output of zpool status below.
>
> I have just installed the new cables and all SATA errors are gone.
> However, the resilver of the pool keeps restarting.
>
> I see no errors in /var/log/messages, but zpool history -i says:
>
> 2016-09-18.14:56:21 [txg:1219501] scan setup func=2 mintxg=3 maxtxg=1219391
> 2016-09-18.14:56:51 [txg:1219505] scan done complete=0
> 2016-09-18.14:56:51 [txg:1219505] scan setup func=2 mintxg=3 maxtxg=1219391
> 2016-09-18.14:57:20 [txg:1219509] scan done complete=0
> 2016-09-18.14:57:20 [txg:1219509] scan setup func=2 mintxg=3 maxtxg=1219391
> 2016-09-18.14:57:49 [txg:1219513] scan done complete=0
> 2016-09-18.14:57:49 [txg:1219513] scan setup func=2 mintxg=3 maxtxg=1219391
> 2016-09-18.14:58:19 [txg:1219517] scan done complete=0
> 2016-09-18.14:58:19 [txg:1219517] scan setup func=2 mintxg=3 maxtxg=1219391
> 2016-09-18.14:58:45 [txg:1219521] scan done complete=0
> 2016-09-18.14:58:45 [txg:1219521] scan setup func=2 mintxg=3 maxtxg=1219391
>
> I assume that "scan done complete=0" means that the resilver didn't
> finish?
>
> pool layout is the following:
>
>   pool: pool
>  state: DEGRADED
> status: One or more devices is currently being resilvered.  The pool will
>         continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>   scan: resilver in progress since Sun Sep 18 14:51:39 2016
>         235G scanned out of 9.81T at 830M/s, 3h21m to go
>         13.2M resilvered, 2.34% done
> config:
>
>     NAME                        STATE     READ WRITE CKSUM
>     pool                        DEGRADED     0     0     0
>       raidz2-0                  ONLINE       0     0     0
>         da6.eli                 ONLINE       0     0     0
>         da7.eli                 ONLINE       0     0     0
>         ada1.eli                ONLINE       0     0     0
>         ada2.eli                ONLINE       0     0     0
>         da10.eli                ONLINE       0     0     2
>         da11.eli                ONLINE       0     0     0
>         da12.eli                ONLINE       0     0     0
>         da13.eli                ONLINE       0     0     0
>       raidz2-1                  DEGRADED     0     0     0
>         da0.eli                 ONLINE       0     0     0
>         da1.eli                 ONLINE       0     0     0
>         da2.eli                 ONLINE       0     0     1  (resilvering)
>         replacing-3             DEGRADED     0     0     1
>           10699825708166646100  UNAVAIL      0     0     0  was /dev/da3.eli
>           da4.eli               ONLINE       0     0     0  (resilvering)
>         da3.eli                 ONLINE       0     0     0
>         da5.eli                 ONLINE       0     0     0
>         da8.eli                 ONLINE       0     0     0
>         da9.eli                 ONLINE       0     0     0
>
> errors: No known data errors
>
> system is
> FreeBSD xxx 10.1-BETA1 FreeBSD 10.1-BETA1 #27 r271633:
> Mon Sep 15 22:34:05 CEST 2014
> root@xxx:/usr/obj/usr/src/sys/xxx  amd64
>
> controller is
> SAS2116 PCI-Express Fusion-MPT SAS-2 [Meteor]
>
> Drives are connected via four four-port SATA cables.
>
> Should I upgrade to 10.3-RELEASE or did I make some sort of
> configuration error / overlook something?
>
> Thanks in advance!
>
> Cheers,
> Marc

Resilver will start over anytime there's new damage.  In your case,
with two failed drives, resilver should've begun after you replaced
the first drive, and restarted after you replaced the second.  Have
you seen it restart more than that?  If so, keep an eye on the error
counters in "zpool status"; they might give you a clue.  You could
also raise the loglevel of devd to "info" in /etc/syslog.conf and see
what gets logged to /var/log/devd.log.  That will tell you if drives are
dropping out and automatically rejoining the pool, for example.

-Alan


RE: working now: freebsd-update to FreeBSD 11.0-RC3 then kernel compile fails In function `iflib_legacy_setup'

2016-09-18 Thread Kim Culhan
To isolate the problem, I compiled a GENERIC kernel with nothing added,
with no problem.

Then with only pf added, and then with pf and altq added, and still no
compile problem.

I did not reboot the machine again between these trials, so I think the
update process was good; I do not know what the difference is.

Sorry for the noise.

thanks
-kim


freebsd-update to FreeBSD 11.0-RC3 then kernel compile fails In function `iflib_legacy_setup'

2016-09-18 Thread Kim Culhan
I used freebsd-update to go from 11.0-RC1 to 11.0-RC3, and the kernel compile failed:

linking kernel.full
iflib.o: In function `iflib_legacy_setup':
/usr/src/sys/amd64/compile/hyster3/../../../net/iflib.c:4457: undefined reference to `taskqgroup_attach'

kernel config was GENERIC with added:

> device  pf
> device  pflog
> device  pfsync
>
> options ALTQ
> options ALTQ_CBQ
> options ALTQ_RED
> options ALTQ_RIO
> options ALTQ_CODEL
> options ALTQ_HFSC
> options ALTQ_FAIRQ
> options ALTQ_CDNR
> options ALTQ_PRIQ

Any help greatly appreciated.

thanks
-kim