subject:"freeze"

Re: Rare NVME related freeze at boot (was: Re: NVME aborting outstanding i/o)

2019-04-05 Thread Patrick M. Hausen

Hi!

> On 05.04.2019 at 16:36, Warner Losh wrote:
> What normally comes after the nvme6 line in boot? Often times it's the next 
> thing after the last message that's the issue, not the last thing.

nvme7 ;-)

And I had hangs at nvme1, nvme3, … as well.

Patrick
-- 
punkt.de GmbH   Internet - Dienstleistungen - Beratung
Kaiserallee 13a Tel.: 0721 9109-0 Fax: -100
76133 Karlsruhe i...@punkt.de   http://punkt.de
AG Mannheim 108285  Gf: Juergen Egeling

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: Rare NVME related freeze at boot (was: Re: NVME aborting outstanding i/o)

2019-04-05 Thread Warner Losh

On Fri, Apr 5, 2019 at 6:41 AM Patrick M. Hausen  wrote:

> Hi all,
>
> In addition to the aborted commands, roughly once every dozen system boots
> (that order of magnitude) the kernel simply hangs during initialisation of
> one of the NVMe devices:
>
> https://cloud.hausen.com/s/TxPTDFJwMe6sJr2
>
> The particular device affected is not constant.
>
> A power cycle fixes it; the system has not shown hangs or freezes during
> multiuser operation yet.
>
>
> Any ideas?
>

What normally comes after the nvme6 line in boot? Often times it's the next
thing after the last message that's the issue, not the last thing.
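
One way to answer that question is to take the boot log of a boot that did complete and look up what follows the last message seen on the hung console. A minimal sketch, assuming a saved log such as /var/run/dmesg.boot from a good boot; the function name line_after is made up for illustration:

```shell
#!/bin/sh
# line_after PREFIX FILE
# Print the first line that follows the last run of lines starting with PREFIX.
line_after() {
    awk -v prefix="$1" '
        index($0, prefix) == 1 { in_run = 1; next }
        in_run                 { candidate = $0; in_run = 0 }
        END                    { if (candidate != "") print candidate }
    ' "$2"
}

# Example (the path is an assumption; use any log saved from a boot that completed):
# line_after "nvme6:" /var/run/dmesg.boot
```

Passing the prefix with its trailing colon keeps "nvme6:" from also matching a hypothetical "nvme60:".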

Warner

Rare NVME related freeze at boot (was: Re: NVME aborting outstanding i/o)

2019-04-05 Thread Patrick M. Hausen

Hi all,

In addition to the aborted commands, roughly once every dozen system boots
(that order of magnitude) the kernel simply hangs during initialisation of
one of the NVMe devices:

https://cloud.hausen.com/s/TxPTDFJwMe6sJr2

The particular device affected is not constant.

A power cycle fixes it; the system has not shown hangs or freezes during
multiuser operation yet.


Any ideas?
Patrick
-- 
punkt.de GmbH   Internet - Dienstleistungen - Beratung
Kaiserallee 13a Tel.: 0721 9109-0 Fax: -100
76133 Karlsruhe i...@punkt.de   http://punkt.de
AG Mannheim 108285  Gf: Juergen Egeling


Freebsd-11.2-p2/amd64: Black-screen freeze with base i915kms and xf86-video-intel

2018-09-03 Thread Karl Dunn


I need help getting Xorg working with the intel driver.  Here is some info:

Machine: Dellox755 shipped 2009-May-16

dmesg snips:
 FreeBSD 11.2-RELEASE-p2 #0: Tue Aug 14 21:45:40 UTC 2018
 r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
 ...
 CPU: Intel(R) Core(TM)2 Duo CPU E8400  @ 3.00GHz (2992.19-MHz K8-class CPU)
   Origin="GenuineIntel"  Id=0x1067a  Family=0x6  Model=0x17  Stepping=10
   Features=0xbfebfbff
   Features2=0xc08e3fd
   AMD Features=0x20100800
   AMD Features2=0x1
   VT-x: (disabled in BIOS) HLT,PAUSE
   TSC: P-state invariant, performance statistics
 real memory  = 3221225472 (3072 MB)
 ...
 vgapci0:  port 0xec90-0xec97 mem 0xfea0-0xfea7,0xd000-0xdfff,0xfeb0-0xfebf irq 16 at device 2.0 on pci0
 agp0:  on vgapci0
 agp0: aperture size is 256M, detected 7164k stolen memory
 ...

pciconf -lv snip:
 vgapci0@pci0:0:2:0: class=0x03 card=0x02111028 chip=0x29b28086 rev=0x02 hdr=0x00
     vendor     = 'Intel Corporation'
     device     = '82Q35 Express Integrated Graphics Controller'
     class      = display
     subclass   = VGA
 vgapci1@pci0:0:2:1: class=0x038000 card=0x02111028 chip=0x29b38086 rev=0x02 hdr=0x00
     vendor     = 'Intel Corporation'
     device     = '82Q35 Express Integrated Graphics Controller'
     class      = display

Screen capture, install of xf86-video-intel:
 # pkg install xf86-video-intel
 Updating FreeBSD repository catalogue...
 FreeBSD repository is up to date.
 All repositories are up to date.
 Checking integrity... done (0 conflicting)
 The following 1 package(s) will be affected (of 0 checked):

 New packages to be INSTALLED:
 xf86-video-intel: 2.99.917.20180512

 Number of packages to be installed: 1

 The process will require 2 MiB more space.

 Proceed with this action? [y/N]: y
 [1/1] Installing xf86-video-intel-2.99.917.20180512...
 [1/1] Extracting xf86-video-intel-2.99.917.20180512: 100%

If I do anything much different from the following, I get a black-screen
freeze (BSF): no network, no interrupts, dead keyboard.  The only way out is
power-off/power-on, followed by a single-user boot and fsck.
 kldload i915
 startx
which gets X going, but it uses VESA, not the intel driver.
For example:
 kldload i915kms --> black-screen-freeze
 kldload drm2; kldload i915kms --> BSF
 Don't load either --> BSF (xf86-video-intel installed, no drm or 915 modules)

HOWEVER: Exactly once, I forgot to kldload any of drm, i915kms or i915 and
then ran startx.  Xorg used the intel driver and it seemed to work OK.  It even
loaded i915kms and drm2 by itself, as was shown by kldstat after Xorg had shut
down.  It never did that again!  Unfortunately I can't find the Xorg log I
think I saved!

Some more detail, for the case where Xorg uses VESA:
kldstat right after boot:
 Id Refs Address    Size     Name
  1    7 0x8020     20647f8  kernel
  2    1 0x82421000 1780     uhid.ko
  3    1 0x82423000 2328     ums.ko
kldstat right after loading i915.ko:
 Id Refs Address    Size     Name
  1   13 0x8020     20647f8  kernel
  2    1 0x82421000 1780     uhid.ko
  3    1 0x82423000 2328     ums.ko
  4    1 0x82426000 6d44     i915.ko
  5    1 0x8242d000 10708    drm.ko
kldstat after startx and then exiting X:
 Id Refs Address    Size     Name
  1   33 0x8020     20647f8  kernel
  2    1 0x82421000 1780     uhid.ko
  3    1 0x82423000 2328     ums.ko
  4    1 0x82426000 6d44     i915.ko
  5    1 0x8242d000 10708    drm.ko
  6    1 0x8243e000 7a2b8    i915kms.ko
  7    1 0x824b9000 3f8cc    drm2.ko
  8    4 0x824f9000 1ed0     iicbus.ko
  9    1 0x824fb000 e58      iic.ko
 10    1 0x824fc000 1570     iicbb.ko
Note that Xorg loaded the last five itself.

all.log snip, evidently because of kldload i915:
 Aug 31 15:02:00 dellox755 kernel: drm0:  on vgapci0
 Aug 31 15:02:00 dellox755 kernel: info: [drm] MSI enabled 1 message(s)
 Aug 31 15:02:00 dellox755 kernel: info: [drm] AGP at 0xd000 256MB
 Aug 31 15:02:00 dellox755 kernel: info: [drm] Initialized i915 1.6.0 20080730

all.log snip, evidently because of startx:
 Aug 31 15:02:42 dellox755 kernel: info: [drm] Initialized drm 1.1.0 20060810
 Aug 31 15:02:42 dellox755 kernel: drmn0:  on vgapci0
 Aug 31 15:02:42 dellox755 kernel: error: [drm:pid706:drm_get_minor] *ERROR* Failed to create cdev: 17
 Aug 31 15:02:42 dellox755 kernel: device_attach: drmn0 attach returned -17
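
Error 17 is EEXIST on FreeBSD, and the earlier log snip shows drm0 (from i915.ko) already attached to vgapci0 before drmn0 tried to attach. One possible reading, not confirmed anywhere in this thread, is that the legacy and KMS driver stacks collide when both are loaded. A small sketch that flags that situation from kldstat output; the function name check_drm_conflict is illustrative only:

```shell
#!/bin/sh
# check_drm_conflict
# Read kldstat-style output on stdin and report which DRM stack is loaded.
# Legacy stack: drm.ko / i915.ko.  KMS stack: drm2.ko / i915kms.ko.
check_drm_conflict() {
    awk '
        $NF == "i915.ko"    || $NF == "drm.ko"  { legacy = 1 }
        $NF == "i915kms.ko" || $NF == "drm2.ko" { kms = 1 }
        END {
            if (legacy && kms) { print "conflict: legacy and KMS drm modules both loaded" }
            else if (kms)      { print "KMS drm modules only" }
            else if (legacy)   { print "legacy drm modules only" }
            else               { print "no drm modules loaded" }
        }'
}

# Usage on the affected machine:
# kldstat | check_drm_conflict
```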

Xorg.0.log snips:
 [  3056.909]
 X.Org X Server 1.18.4
 Release Date: 2016-07-19
 ...
 [  3057.252] (==) AIGLX enabled
 [  3057.253] (II) LoadModule: "intel"
 [  3057.253] (II) Loading /usr/local/lib/xorg/modules/drivers/intel_drv.so
 [  3057.335] (II) Module intel: vendor="X.Org Foundation"
 [  3057.335]   compiled for 1.18.4, module version = 2.99.91

Re: make kernel ctfmerge freeze on 11-STABLE

2017-01-02 Thread Aryeh Friedman

Completely cleaning out /usr/src and /usr/obj fixed it (both current and
past revisions)

On Mon, Jan 2, 2017 at 8:33 AM, Aryeh Friedman 
wrote:

>
>
> On Mon, Jan 2, 2017 at 7:57 AM, Mateusz Guzik  wrote:
>
>> On Mon, Jan 02, 2017 at 07:48:22AM -0500, Aryeh Friedman wrote:
>> > On Mon, Jan 2, 2017 at 7:36 AM, Mateusz Guzik 
>> wrote:
>> >
>> > > On Mon, Jan 02, 2017 at 06:57:48AM -0500, Aryeh Friedman wrote:
>> > > > FreeBSD lilith 11.0-STABLE FreeBSD 11.0-STABLE #7 r311003: Sun Jan
>> 1
>> > > > 02:45:34 EST 2017 root@lilith:/usr/obj/usr/src/sys/GENERIC
>> amd64
>> > > >
>> > > >
>> > > > --
>> > > > >>> stage 3.1: building everything
>> > > > --
>> > > > cd /usr/obj/usr/src/sys/GENERIC; COMPILER_VERSION=30901
>> > > > COMPILER_TYPE=clang  COMPILER_FREEBSD_VERSION=1100503
>> > > > MAKEOBJDIRPREFIX=/usr/obj  MACHINE_ARCH=amd64  MACHINE=amd64
>> CPUTYPE=
>> > > > GROFF_BIN_PATH=/usr/obj/usr/src/tmp/legacy/usr/bin
>> > > > GROFF_FONT_PATH=/usr/obj/usr/src/tmp/legacy/usr/share/groff_font
>> > > > GROFF_TMAC_PATH=/usr/obj/usr/src/tmp/legacy/usr/share/tmac CC="cc
>> > > -target
>> > > > x86_64-unknown-freebsd11.0 --sysroot=/usr/obj/usr/src/tmp
>> > > > -B/usr/obj/usr/src/tmp/usr/bin" CXX="c++  -target
>> > > > x86_64-unknown-freebsd11.0 --sysroot=/usr/obj/usr/src/tmp
>> > > > -B/usr/obj/usr/src/tmp/usr/bin"  CPP="cpp -target
>> > > > x86_64-unknown-freebsd11.0 --sysroot=/usr/obj/usr/src/tmp
>> > > > -B/usr/obj/usr/src/tmp/usr/bin"  AS="as" AR="ar" LD="ld" NM=nm
>> > > > OBJDUMP=objdump OBJCOPY="objcopy"  RANLIB=ranlib STRINGS=
>> SIZE="size"
>> > > > INSTALL="sh /usr/src/tools/install.sh"
>> > > > PATH=/usr/obj/usr/src/tmp/legacy/usr/sbin:/usr/obj/usr/
>> > > src/tmp/legacy/usr/bin:/usr/obj/usr/src/tmp/legacy/bin:/
>> > > usr/obj/usr/src/tmp/usr/sbin:/usr/obj/usr/src/tmp/usr/bin:/
>> > > sbin:/bin:/usr/sbin:/usr/bin
>> > > > make  -m /usr/src/share/mk  KERNEL=kernel all -DNO_MODULES_OBJ
>> > > > linking kernel.full
>> > > > ctfmerge -L VERSION -g -o kernel.full ...
>> > > > 
>> > >
>> > > How reproducible is the crash? What previous kernel was known to work?
>> > > Can you narrow it down to a particular revision, preferably with
>> kernel
>> > > debugging enabled? (see the end of the mail)
>> > >
>> >
>> > It first appeared a few days ago (I forget which revision), then disappeared
>> > the day after and reappeared yesterday.  It is 100% reproducible (i.e.
>> > clearing out /usr/obj and doing a make kernel in either single or multiuser
>> > mode both cause it).  Turning on debugging would be hard, but perhaps I
>> > should slightly qualify "freeze": make freezes but the rest of the system
>> > is responsive, and killing make leaves a zombie ctfmerge.  If I still need
>> > kernel debugging based on the above I will do it, but I am looking for an
>> > easier explanation first.
>> >
>>
>> I definitely don't run into anything of the sort and the problem
>> statement is quite vague.
>>
>> However, if the problem is indeed reproducible, the minimum you can do
>> is find the first revision where it started appearing and that would
>> definitely help with an investigation.
>>
>>
> Any advice on how to do that?  Since I update daily, I can tell you the day
> it started but not the actual revision ID.
>
>
>
> --
> Aryeh M. Friedman, Lead Developer, http://www.PetiteCloud.org
>



-- 
Aryeh M. Friedman, Lead Developer, http://www.PetiteCloud.org

Re: make kernel ctfmerge freeze on 11-STABLE

2017-01-02 Thread Mateusz Guzik

On Mon, Jan 02, 2017 at 08:33:29AM -0500, Aryeh Friedman wrote:
> On Mon, Jan 2, 2017 at 7:57 AM, Mateusz Guzik  wrote:
> 
> > On Mon, Jan 02, 2017 at 07:48:22AM -0500, Aryeh Friedman wrote:
> > > On Mon, Jan 2, 2017 at 7:36 AM, Mateusz Guzik  wrote:
> > >
> > > > On Mon, Jan 02, 2017 at 06:57:48AM -0500, Aryeh Friedman wrote:
> > > > > FreeBSD lilith 11.0-STABLE FreeBSD 11.0-STABLE #7 r311003: Sun Jan  1
> > > > > 02:45:34 EST 2017 root@lilith:/usr/obj/usr/src/sys/GENERIC
> > amd64
> > > > >
> > > > >
> > > > > --
> > > > > >>> stage 3.1: building everything
> > > > > --
> > > > > cd /usr/obj/usr/src/sys/GENERIC; COMPILER_VERSION=30901
> > > > > COMPILER_TYPE=clang  COMPILER_FREEBSD_VERSION=1100503
> > > > > MAKEOBJDIRPREFIX=/usr/obj  MACHINE_ARCH=amd64  MACHINE=amd64
> > CPUTYPE=
> > > > > GROFF_BIN_PATH=/usr/obj/usr/src/tmp/legacy/usr/bin
> > > > > GROFF_FONT_PATH=/usr/obj/usr/src/tmp/legacy/usr/share/groff_font
> > > > > GROFF_TMAC_PATH=/usr/obj/usr/src/tmp/legacy/usr/share/tmac CC="cc
> > > > -target
> > > > > x86_64-unknown-freebsd11.0 --sysroot=/usr/obj/usr/src/tmp
> > > > > -B/usr/obj/usr/src/tmp/usr/bin" CXX="c++  -target
> > > > > x86_64-unknown-freebsd11.0 --sysroot=/usr/obj/usr/src/tmp
> > > > > -B/usr/obj/usr/src/tmp/usr/bin"  CPP="cpp -target
> > > > > x86_64-unknown-freebsd11.0 --sysroot=/usr/obj/usr/src/tmp
> > > > > -B/usr/obj/usr/src/tmp/usr/bin"  AS="as" AR="ar" LD="ld" NM=nm
> > > > > OBJDUMP=objdump OBJCOPY="objcopy"  RANLIB=ranlib STRINGS=
> > SIZE="size"
> > > > > INSTALL="sh /usr/src/tools/install.sh"
> > > > > PATH=/usr/obj/usr/src/tmp/legacy/usr/sbin:/usr/obj/usr/
> > > > src/tmp/legacy/usr/bin:/usr/obj/usr/src/tmp/legacy/bin:/
> > > > usr/obj/usr/src/tmp/usr/sbin:/usr/obj/usr/src/tmp/usr/bin:/
> > > > sbin:/bin:/usr/sbin:/usr/bin
> > > > > make  -m /usr/src/share/mk  KERNEL=kernel all -DNO_MODULES_OBJ
> > > > > linking kernel.full
> > > > > ctfmerge -L VERSION -g -o kernel.full ...
> > > > > 
> > > >
> > > > How reproducible is the crash? What previous kernel was known to work?
> > > > Can you narrow it down to a particular revision, preferably with kernel
> > > > debugging enabled? (see the end of the mail)
> > > >
> > >
> > > It first appeared a few days ago (I forget which revision), then disappeared
> > > the day after and reappeared yesterday.  It is 100% reproducible (i.e.
> > > clearing out /usr/obj and doing a make kernel in either single or multiuser
> > > mode both cause it).  Turning on debugging would be hard, but perhaps I
> > > should slightly qualify "freeze": make freezes but the rest of the system
> > > is responsive, and killing make leaves a zombie ctfmerge.  If I still need
> > > kernel debugging based on the above I will do it, but I am looking for an
> > > easier explanation first.
> > >
> >
> > I definitely don't run into anything of the sort and the problem
> > statement is quite vague.
> >
> > However, if the problem is indeed reproducible, the minimum you can do
> > is find the first revision where it started appearing and that would
> > definitely help with an investigation.
> >
> >
> Any advice on how to do that?  Since I update daily, I can tell you the day
> it started but not the actual revision ID.
> 

Just get the source, e.g.:
svn checkout https://svn.freebsd.org/base/stable/11 /usr/src

You can then switch to a particular revision with svn update -r, e.g.:
svn update -r r310953

to switch to the revision prior to the namecache merge.

Preferably though you would use git as it allows easy bisection.
https://github.com/freebsd/freebsd, the branch is origin/stable/11.
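
Subversion has no built-in bisect, but the search is easy to script by hand. A minimal sketch, assuming a known-good and a known-bad revision number (r310953 and r311003 from this thread would be natural starting points) and a caller-supplied function is_bad, which is hypothetical here: it would update /usr/src to the given revision, rebuild, and return 0 if ctfmerge hangs.

```shell
#!/bin/sh
# bisect_rev GOOD BAD
# Binary search for the first revision in (GOOD, BAD] for which is_bad
# returns 0.  is_bad must be defined by the caller, for example:
#
#   is_bad() {
#       svn update -r "r$1" /usr/src
#       # ...rebuild the kernel here and return 0 if the hang shows up...
#   }
bisect_rev() {
    good=$1
    bad=$2
    while [ $((bad - good)) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))
        if is_bad "$mid"; then
            bad=$mid      # problem present: first bad revision is at or before mid
        else
            good=$mid     # problem absent: first bad revision is after mid
        fi
    done
    echo "$bad"
}
```

Each step halves the range, so the roughly 50 revisions between the two numbers above would need about six builds.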

-- 
Mateusz Guzik 

Re: make kernel ctfmerge freeze on 11-STABLE

2017-01-02 Thread Aryeh Friedman

On Mon, Jan 2, 2017 at 7:57 AM, Mateusz Guzik  wrote:

> On Mon, Jan 02, 2017 at 07:48:22AM -0500, Aryeh Friedman wrote:
> > On Mon, Jan 2, 2017 at 7:36 AM, Mateusz Guzik  wrote:
> >
> > > On Mon, Jan 02, 2017 at 06:57:48AM -0500, Aryeh Friedman wrote:
> > > > FreeBSD lilith 11.0-STABLE FreeBSD 11.0-STABLE #7 r311003: Sun Jan  1
> > > > 02:45:34 EST 2017 root@lilith:/usr/obj/usr/src/sys/GENERIC
> amd64
> > > >
> > > >
> > > > --
> > > > >>> stage 3.1: building everything
> > > > --
> > > > cd /usr/obj/usr/src/sys/GENERIC; COMPILER_VERSION=30901
> > > > COMPILER_TYPE=clang  COMPILER_FREEBSD_VERSION=1100503
> > > > MAKEOBJDIRPREFIX=/usr/obj  MACHINE_ARCH=amd64  MACHINE=amd64
> CPUTYPE=
> > > > GROFF_BIN_PATH=/usr/obj/usr/src/tmp/legacy/usr/bin
> > > > GROFF_FONT_PATH=/usr/obj/usr/src/tmp/legacy/usr/share/groff_font
> > > > GROFF_TMAC_PATH=/usr/obj/usr/src/tmp/legacy/usr/share/tmac CC="cc
> > > -target
> > > > x86_64-unknown-freebsd11.0 --sysroot=/usr/obj/usr/src/tmp
> > > > -B/usr/obj/usr/src/tmp/usr/bin" CXX="c++  -target
> > > > x86_64-unknown-freebsd11.0 --sysroot=/usr/obj/usr/src/tmp
> > > > -B/usr/obj/usr/src/tmp/usr/bin"  CPP="cpp -target
> > > > x86_64-unknown-freebsd11.0 --sysroot=/usr/obj/usr/src/tmp
> > > > -B/usr/obj/usr/src/tmp/usr/bin"  AS="as" AR="ar" LD="ld" NM=nm
> > > > OBJDUMP=objdump OBJCOPY="objcopy"  RANLIB=ranlib STRINGS=
> SIZE="size"
> > > > INSTALL="sh /usr/src/tools/install.sh"
> > > > PATH=/usr/obj/usr/src/tmp/legacy/usr/sbin:/usr/obj/usr/
> > > src/tmp/legacy/usr/bin:/usr/obj/usr/src/tmp/legacy/bin:/
> > > usr/obj/usr/src/tmp/usr/sbin:/usr/obj/usr/src/tmp/usr/bin:/
> > > sbin:/bin:/usr/sbin:/usr/bin
> > > > make  -m /usr/src/share/mk  KERNEL=kernel all -DNO_MODULES_OBJ
> > > > linking kernel.full
> > > > ctfmerge -L VERSION -g -o kernel.full ...
> > > > 
> > >
> > > How reproducible is the crash? What previous kernel was known to work?
> > > Can you narrow it down to a particular revision, preferably with kernel
> > > debugging enabled? (see the end of the mail)
> > >
> >
> > It first appeared a few days ago (I forget which revision), then disappeared
> > the day after and reappeared yesterday.  It is 100% reproducible (i.e.
> > clearing out /usr/obj and doing a make kernel in either single or multiuser
> > mode both cause it).  Turning on debugging would be hard, but perhaps I
> > should slightly qualify "freeze": make freezes but the rest of the system
> > is responsive, and killing make leaves a zombie ctfmerge.  If I still need
> > kernel debugging based on the above I will do it, but I am looking for an
> > easier explanation first.
> >
>
> I definitely don't run into anything of the sort and the problem
> statement is quite vague.
>
> However, if the problem is indeed reproducible, the minimum you can do
> is find the first revision where it started appearing and that would
> definitely help with an investigation.
>
>
Any advice on how to do that?  Since I update daily, I can tell you the day
it started but not the actual revision ID.



-- 
Aryeh M. Friedman, Lead Developer, http://www.PetiteCloud.org

Re: make kernel ctfmerge freeze on 11-STABLE

2017-01-02 Thread Mateusz Guzik

On Mon, Jan 02, 2017 at 07:48:22AM -0500, Aryeh Friedman wrote:
> On Mon, Jan 2, 2017 at 7:36 AM, Mateusz Guzik  wrote:
> 
> > On Mon, Jan 02, 2017 at 06:57:48AM -0500, Aryeh Friedman wrote:
> > > FreeBSD lilith 11.0-STABLE FreeBSD 11.0-STABLE #7 r311003: Sun Jan  1
> > > 02:45:34 EST 2017 root@lilith:/usr/obj/usr/src/sys/GENERIC  amd64
> > >
> > >
> > > --
> > > >>> stage 3.1: building everything
> > > --
> > > cd /usr/obj/usr/src/sys/GENERIC; COMPILER_VERSION=30901
> > > COMPILER_TYPE=clang  COMPILER_FREEBSD_VERSION=1100503
> > > MAKEOBJDIRPREFIX=/usr/obj  MACHINE_ARCH=amd64  MACHINE=amd64  CPUTYPE=
> > > GROFF_BIN_PATH=/usr/obj/usr/src/tmp/legacy/usr/bin
> > > GROFF_FONT_PATH=/usr/obj/usr/src/tmp/legacy/usr/share/groff_font
> > > GROFF_TMAC_PATH=/usr/obj/usr/src/tmp/legacy/usr/share/tmac CC="cc
> > -target
> > > x86_64-unknown-freebsd11.0 --sysroot=/usr/obj/usr/src/tmp
> > > -B/usr/obj/usr/src/tmp/usr/bin" CXX="c++  -target
> > > x86_64-unknown-freebsd11.0 --sysroot=/usr/obj/usr/src/tmp
> > > -B/usr/obj/usr/src/tmp/usr/bin"  CPP="cpp -target
> > > x86_64-unknown-freebsd11.0 --sysroot=/usr/obj/usr/src/tmp
> > > -B/usr/obj/usr/src/tmp/usr/bin"  AS="as" AR="ar" LD="ld" NM=nm
> > > OBJDUMP=objdump OBJCOPY="objcopy"  RANLIB=ranlib STRINGS=  SIZE="size"
> > > INSTALL="sh /usr/src/tools/install.sh"
> > > PATH=/usr/obj/usr/src/tmp/legacy/usr/sbin:/usr/obj/usr/
> > src/tmp/legacy/usr/bin:/usr/obj/usr/src/tmp/legacy/bin:/
> > usr/obj/usr/src/tmp/usr/sbin:/usr/obj/usr/src/tmp/usr/bin:/
> > sbin:/bin:/usr/sbin:/usr/bin
> > > make  -m /usr/src/share/mk  KERNEL=kernel all -DNO_MODULES_OBJ
> > > linking kernel.full
> > > ctfmerge -L VERSION -g -o kernel.full ...
> > > 
> >
> > How reproducible is the crash? What previous kernel was known to work?
> > Can you narrow it down to a particular revision, preferably with kernel
> > debugging enabled? (see the end of the mail)
> >
> 
> It first appeared a few days ago (I forget which revision), then disappeared
> the day after and reappeared yesterday.  It is 100% reproducible (i.e.
> clearing out /usr/obj and doing a make kernel in either single or multiuser
> mode both cause it).  Turning on debugging would be hard, but perhaps I
> should slightly qualify "freeze": make freezes but the rest of the system
> is responsive, and killing make leaves a zombie ctfmerge.  If I still need
> kernel debugging based on the above I will do it, but I am looking for an
> easier explanation first.
> 

I definitely don't run into anything of the sort and the problem
statement is quite vague.

However, if the problem is indeed reproducible, the minimum you can do
is find the first revision where it started appearing and that would
definitely help with an investigation.

-- 
Mateusz Guzik 

Re: make kernel ctfmerge freeze on 11-STABLE

2017-01-02 Thread Aryeh Friedman

On Mon, Jan 2, 2017 at 7:36 AM, Mateusz Guzik  wrote:

> On Mon, Jan 02, 2017 at 06:57:48AM -0500, Aryeh Friedman wrote:
> > FreeBSD lilith 11.0-STABLE FreeBSD 11.0-STABLE #7 r311003: Sun Jan  1
> > 02:45:34 EST 2017 root@lilith:/usr/obj/usr/src/sys/GENERIC  amd64
> >
> >
> > --
> > >>> stage 3.1: building everything
> > --
> > cd /usr/obj/usr/src/sys/GENERIC; COMPILER_VERSION=30901
> > COMPILER_TYPE=clang  COMPILER_FREEBSD_VERSION=1100503
> > MAKEOBJDIRPREFIX=/usr/obj  MACHINE_ARCH=amd64  MACHINE=amd64  CPUTYPE=
> > GROFF_BIN_PATH=/usr/obj/usr/src/tmp/legacy/usr/bin
> > GROFF_FONT_PATH=/usr/obj/usr/src/tmp/legacy/usr/share/groff_font
> > GROFF_TMAC_PATH=/usr/obj/usr/src/tmp/legacy/usr/share/tmac CC="cc
> -target
> > x86_64-unknown-freebsd11.0 --sysroot=/usr/obj/usr/src/tmp
> > -B/usr/obj/usr/src/tmp/usr/bin" CXX="c++  -target
> > x86_64-unknown-freebsd11.0 --sysroot=/usr/obj/usr/src/tmp
> > -B/usr/obj/usr/src/tmp/usr/bin"  CPP="cpp -target
> > x86_64-unknown-freebsd11.0 --sysroot=/usr/obj/usr/src/tmp
> > -B/usr/obj/usr/src/tmp/usr/bin"  AS="as" AR="ar" LD="ld" NM=nm
> > OBJDUMP=objdump OBJCOPY="objcopy"  RANLIB=ranlib STRINGS=  SIZE="size"
> > INSTALL="sh /usr/src/tools/install.sh"
> > PATH=/usr/obj/usr/src/tmp/legacy/usr/sbin:/usr/obj/usr/
> src/tmp/legacy/usr/bin:/usr/obj/usr/src/tmp/legacy/bin:/
> usr/obj/usr/src/tmp/usr/sbin:/usr/obj/usr/src/tmp/usr/bin:/
> sbin:/bin:/usr/sbin:/usr/bin
> > make  -m /usr/src/share/mk  KERNEL=kernel all -DNO_MODULES_OBJ
> > linking kernel.full
> > ctfmerge -L VERSION -g -o kernel.full ...
> > 
>
> How reproducible is the crash? What previous kernel was known to work?
> Can you narrow it down to a particular revision, preferably with kernel
> debugging enabled? (see the end of the mail)
>

It first appeared a few days ago (I forget which revision), then disappeared
the day after and reappeared yesterday.  It is 100% reproducible (i.e.
clearing out /usr/obj and doing a make kernel in either single or multiuser
mode both cause it).  Turning on debugging would be hard, but perhaps I
should slightly qualify "freeze": make freezes but the rest of the system
is responsive, and killing make leaves a zombie ctfmerge.  If I still need
kernel debugging based on the above I will do it, but I am looking for an
easier explanation first.
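
Since the rest of the system stays responsive, the hung ctfmerge can be inspected from another terminal before resorting to a debug kernel. A minimal sketch, assuming FreeBSD's procstat(1) is available (the function skips it otherwise); the name dump_stuck is illustrative. `procstat -kk` prints the kernel stack of each thread, which usually shows where a process is blocked.

```shell
#!/bin/sh
# dump_stuck NAME
# Show state, wait channel and (where procstat exists) kernel stacks
# for every process whose name is exactly NAME.
dump_stuck() {
    pids=$(pgrep -x "$1" 2>/dev/null || true)
    if [ -z "$pids" ]; then
        echo "no $1 process found"
        return 1
    fi
    for pid in $pids; do
        ps -o pid,stat,wchan,command -p "$pid"
        if command -v procstat >/dev/null 2>&1; then
            procstat -kk "$pid"
        fi
    done
}

# Usage while the build is hung:
# dump_stuck ctfmerge
```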


>
>


-- 
Aryeh M. Friedman, Lead Developer, http://www.PetiteCloud.org

Re: make kernel ctfmerge freeze on 11-STABLE

2017-01-02 Thread David Wolfskill

On Mon, Jan 02, 2017 at 01:36:31PM +0100, Mateusz Guzik wrote:
> On Mon, Jan 02, 2017 at 06:57:48AM -0500, Aryeh Friedman wrote:
> > FreeBSD lilith 11.0-STABLE FreeBSD 11.0-STABLE #7 r311003: Sun Jan  1
> > 02:45:34 EST 2017 root@lilith:/usr/obj/usr/src/sys/GENERIC  amd64
> > ... 
> > make  -m /usr/src/share/mk  KERNEL=kernel all -DNO_MODULES_OBJ
> > linking kernel.full
> > ctfmerge -L VERSION -g -o kernel.full ...
> > 
> 
> How reproducible is the crash? What previous kernel was known to work?
> Can you narrow it down to a particular revision, preferably with kernel
> debugging enabled? (see the end of the mail)

FWIW, I did not see anything approaching such a freeze, either on my
build machine or my laptop, during the just-completed upgrade from:

FreeBSD g1-252.catwhisker.org 11.0-STABLE FreeBSD 11.0-STABLE #209  
r311007M/311007:1100508: Sun Jan  1 03:51:25 PST 2017 
r...@g1-252.catwhisker.org:/common/S1/obj/usr/src/sys/CANARY  amd64


to:

FreeBSD g1-252.catwhisker.org 11.0-STABLE FreeBSD 11.0-STABLE #210  
r311047M/311097:1100508: Mon Jan  2 04:23:25 PST 2017 
r...@g1-252.catwhisker.org:/common/S1/obj/usr/src/sys/CANARY  amd64


(Or any prior upgrade, that I recall).

>  

Peace,
david
-- 
David H. Wolfskill  da...@catwhisker.org
Epistemology for post-truthers: How do we select parts of reality to ignore?

See http://www.catwhisker.org/~david/publickey.gpg for my public key.



Re: make kernel ctfmerge freeze on 11-STABLE

2017-01-02 Thread Mateusz Guzik

On Mon, Jan 02, 2017 at 06:57:48AM -0500, Aryeh Friedman wrote:
> FreeBSD lilith 11.0-STABLE FreeBSD 11.0-STABLE #7 r311003: Sun Jan  1
> 02:45:34 EST 2017 root@lilith:/usr/obj/usr/src/sys/GENERIC  amd64
> 
> 
> --
> >>> stage 3.1: building everything
> --
> cd /usr/obj/usr/src/sys/GENERIC; COMPILER_VERSION=30901
> COMPILER_TYPE=clang  COMPILER_FREEBSD_VERSION=1100503
> MAKEOBJDIRPREFIX=/usr/obj  MACHINE_ARCH=amd64  MACHINE=amd64  CPUTYPE=
> GROFF_BIN_PATH=/usr/obj/usr/src/tmp/legacy/usr/bin
> GROFF_FONT_PATH=/usr/obj/usr/src/tmp/legacy/usr/share/groff_font
> GROFF_TMAC_PATH=/usr/obj/usr/src/tmp/legacy/usr/share/tmac CC="cc -target
> x86_64-unknown-freebsd11.0 --sysroot=/usr/obj/usr/src/tmp
> -B/usr/obj/usr/src/tmp/usr/bin" CXX="c++  -target
> x86_64-unknown-freebsd11.0 --sysroot=/usr/obj/usr/src/tmp
> -B/usr/obj/usr/src/tmp/usr/bin"  CPP="cpp -target
> x86_64-unknown-freebsd11.0 --sysroot=/usr/obj/usr/src/tmp
> -B/usr/obj/usr/src/tmp/usr/bin"  AS="as" AR="ar" LD="ld" NM=nm
> OBJDUMP=objdump OBJCOPY="objcopy"  RANLIB=ranlib STRINGS=  SIZE="size"
> INSTALL="sh /usr/src/tools/install.sh"
> PATH=/usr/obj/usr/src/tmp/legacy/usr/sbin:/usr/obj/usr/src/tmp/legacy/usr/bin:/usr/obj/usr/src/tmp/legacy/bin:/usr/obj/usr/src/tmp/usr/sbin:/usr/obj/usr/src/tmp/usr/bin:/sbin:/bin:/usr/sbin:/usr/bin
> make  -m /usr/src/share/mk  KERNEL=kernel all -DNO_MODULES_OBJ
> linking kernel.full
> ctfmerge -L VERSION -g -o kernel.full ...
> 

How reproducible is the crash? What previous kernel was known to work?
Can you narrow it down to a particular revision, preferably with kernel
debugging enabled? (see the end of the mail)

There was one invasive change merged - fine-grained namecache in
r310959 and that can be treated as the likely culprit.

That said, I would start the search with verifying there are no issues
with r310953 first.

Debug opts:

options KDB # Enable kernel debugger support.
options KDB_TRACE   # Print a stack trace for a panic.
# For full debugger support use (turn off in stable branch):
options DDB # Support DDB.
options GDB # Support remote GDB.
options INVARIANTS  # Enable calls of extra sanity checking
options INVARIANT_SUPPORT   # Extra sanity checks of internal structures, required by INVARIANTS
options WITNESS # Enable checks to detect deadlocks and cycles
options WITNESS_SKIPSPIN # Don't run witness on spinlocks for speed
options MALLOC_DEBUG_MAXZONES=8 # Separate malloc(9) zones
options DEBUG_VFS_LOCKS
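
One way to apply these options without editing GENERIC is a small custom kernel configuration that includes it. A sketch under assumptions: the config name DEBUG and the helper function write_debug_kernconf are made up for illustration, and any option GENERIC already sets on a given branch may draw a duplicate-option warning from config(8).

```shell
#!/bin/sh
# write_debug_kernconf FILE
# Write a kernel config that layers the debug options above on top of GENERIC.
write_debug_kernconf() {
    cat > "$1" <<'EOF'
include GENERIC
ident DEBUG

options KDB
options KDB_TRACE
options DDB
options GDB
options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS
options WITNESS_SKIPSPIN
options MALLOC_DEBUG_MAXZONES=8
options DEBUG_VFS_LOCKS
EOF
}

# Usage on amd64 (paths follow the standard source layout):
# write_debug_kernconf /usr/src/sys/amd64/conf/DEBUG
# cd /usr/src && make buildkernel KERNCONF=DEBUG
```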

-- 
Mateusz Guzik 

make kernel ctfmerge freeze on 11-STABLE

2017-01-02 Thread Aryeh Friedman

FreeBSD lilith 11.0-STABLE FreeBSD 11.0-STABLE #7 r311003: Sun Jan  1
02:45:34 EST 2017 root@lilith:/usr/obj/usr/src/sys/GENERIC  amd64


--
>>> stage 3.1: building everything
--
cd /usr/obj/usr/src/sys/GENERIC; COMPILER_VERSION=30901
COMPILER_TYPE=clang  COMPILER_FREEBSD_VERSION=1100503
MAKEOBJDIRPREFIX=/usr/obj  MACHINE_ARCH=amd64  MACHINE=amd64  CPUTYPE=
GROFF_BIN_PATH=/usr/obj/usr/src/tmp/legacy/usr/bin
GROFF_FONT_PATH=/usr/obj/usr/src/tmp/legacy/usr/share/groff_font
GROFF_TMAC_PATH=/usr/obj/usr/src/tmp/legacy/usr/share/tmac CC="cc -target
x86_64-unknown-freebsd11.0 --sysroot=/usr/obj/usr/src/tmp
-B/usr/obj/usr/src/tmp/usr/bin" CXX="c++  -target
x86_64-unknown-freebsd11.0 --sysroot=/usr/obj/usr/src/tmp
-B/usr/obj/usr/src/tmp/usr/bin"  CPP="cpp -target
x86_64-unknown-freebsd11.0 --sysroot=/usr/obj/usr/src/tmp
-B/usr/obj/usr/src/tmp/usr/bin"  AS="as" AR="ar" LD="ld" NM=nm
OBJDUMP=objdump OBJCOPY="objcopy"  RANLIB=ranlib STRINGS=  SIZE="size"
INSTALL="sh /usr/src/tools/install.sh"
PATH=/usr/obj/usr/src/tmp/legacy/usr/sbin:/usr/obj/usr/src/tmp/legacy/usr/bin:/usr/obj/usr/src/tmp/legacy/bin:/usr/obj/usr/src/tmp/usr/sbin:/usr/obj/usr/src/tmp/usr/bin:/sbin:/bin:/usr/sbin:/usr/bin
make  -m /usr/src/share/mk  KERNEL=kernel all -DNO_MODULES_OBJ
linking kernel.full
ctfmerge -L VERSION -g -o kernel.full ...


-- 
Aryeh M. Friedman, Lead Developer, http://www.PetiteCloud.org

Re: msk msk0 watchdog timeout freeze hang lock stop problem

2015-08-27 Thread Yonghyeon PYUN

On Thu, Aug 27, 2015 at 11:29:28AM +0200, Johann Hugo wrote:
> It's working for me so far and I haven't seen any watchdog timeouts.
> With 10.2-RELEASE I got timeouts and lost connectivity in less than a
> minute.
> 

Ok, great.  Committed in r287238.
Thanks again.

> Johann
> 
> On Wed, Aug 26, 2015 at 10:28 AM, Yonghyeon PYUN  wrote:
> > On Wed, Aug 26, 2015 at 10:06:29AM +0200, Johann Hugo wrote:
> >> 10.2-RELEASE does not work for me. It works for a very short while and
> >> then it stops with "msk0 watchdog timeout" errors
> >>
> >
> > Thanks a lot for your report.  This is the first report for
> > msk(4) watchdog timeouts on 10.2-RELEASE.
> >
> >> I'm not sure what patch Roosevelt was talking about, but the patch in
> >> this thread works for me:
> >> https://lists.freebsd.org/pipermail/freebsd-stable/2015-April/082226.html
> >>
> >> I've changed MSK_STAT_ALIGN  from 4096 to 8192 in if_mskreg.h and it's
> >> been running stable for the last week.
> >>
> >
> > I see.  I'm under the impression that RX/TX descriptor ring
> > alignment shall trigger the same issue so it would be better to
> > know how attached patch works on your box.
> >
> > Thanks.
> >
> >> Johann
> >>
> >> On Sun, Aug 16, 2015 at 2:08 PM, Yonghyeon PYUN  wrote:
> >> > On Wed, Aug 12, 2015 at 09:44:06AM -0400, Roosevelt Littleton wrote:
> >> >> Hi,
> >> >> So, I can confirm that, with the attached patch, I have a working msk0
> >> >> that hasn't failed for the past month. I consider this problem fixed for
> >> >> me, since I have gone a long time without any problems. Thanks!
> >> >
> >> > I'm not sure which patch you used.  Given that users reported
> >> > 10.2-RELEASE works, it would be great if you revert local patch
> >> > and try it again on 10.2-RELEASE.

Re: msk msk0 watchdog timeout freeze hang lock stop problem

2015-08-27 Thread Johann Hugo

It's working for me so far and I haven't seen any watchdog timeouts.
With 10.2-RELEASE I got timeouts and lost connectivity in less than a
minute.

Johann

On Wed, Aug 26, 2015 at 10:28 AM, Yonghyeon PYUN  wrote:
> On Wed, Aug 26, 2015 at 10:06:29AM +0200, Johann Hugo wrote:
>> 10.2-RELEASE does not work for me. It works for a very short while and
>> then it stops with "msk0 watchdog timeout" errors
>>
>
> Thanks a lot for your report.  This is the first report for
> msk(4) watchdog timeouts on 10.2-RELEASE.
>
>> I'm not sure what patch Roosevelt was talking about, but the patch in
>> this thread works for me:
>> https://lists.freebsd.org/pipermail/freebsd-stable/2015-April/082226.html
>>
>> I've changed MSK_STAT_ALIGN  from 4096 to 8192 in if_mskreg.h and it's
>> been running stable for the last week.
>>
>
> I see.  I'm under the impression that RX/TX descriptor ring
> alignment shall trigger the same issue so it would be better to
> know how attached patch works on your box.
>
> Thanks.
>
>> Johann
>>
>> On Sun, Aug 16, 2015 at 2:08 PM, Yonghyeon PYUN  wrote:
>> > On Wed, Aug 12, 2015 at 09:44:06AM -0400, Roosevelt Littleton wrote:
>> >> Hi,
>> >> So, I can confirm with the attached patch. I have a working msk0 that
>> >> hasn't failed for the past month. I considered this problem fix for me.
>> >> Since, I have went a long time without any problems. Thanks!
>> >
>> > I'm not sure which patch you used.  Given that users reported
>> > 10.2-RELEASE works, it would be great if you revert local patch
>> > and try it again on 10.2-RELEASE.

Re: msk msk0 watchdog timeout freeze hang lock stop problem

2015-08-26 Thread Yonghyeon PYUN

On Wed, Aug 26, 2015 at 10:06:29AM +0200, Johann Hugo wrote:
> 10.2-RELEASE does not work for me. It works for a very short while and
> then it stops with "msk0 watchdog timeout" errors
> 

Thanks a lot for your report.  This is the first report for
msk(4) watchdog timeouts on 10.2-RELEASE.

> I'm not sure what patch Roosevelt was talking about, but the patch in
> this thread works for me:
> https://lists.freebsd.org/pipermail/freebsd-stable/2015-April/082226.html
> 
> I've changed MSK_STAT_ALIGN  from 4096 to 8192 in if_mskreg.h and it's
> been running stable for the last week.
> 

I see.  My impression is that the RX/TX descriptor ring alignment can
trigger the same issue, so it would be good to know how the attached
patch works on your box.

Thanks.

> Johann
> 
> On Sun, Aug 16, 2015 at 2:08 PM, Yonghyeon PYUN  wrote:
> > On Wed, Aug 12, 2015 at 09:44:06AM -0400, Roosevelt Littleton wrote:
> >> Hi,
> >> So, I can confirm with the attached patch. I have a working msk0 that
> >> hasn't failed for the past month. I considered this problem fix for me.
> >> Since, I have went a long time without any problems. Thanks!
> >
> > I'm not sure which patch you used.  Given that users reported
> > 10.2-RELEASE works, it would be great if you revert local patch
> > and try it again on 10.2-RELEASE.
Index: sys/dev/msk/if_mskreg.h
===
--- sys/dev/msk/if_mskreg.h	(revision 281587)
+++ sys/dev/msk/if_mskreg.h	(working copy)
@@ -2175,13 +2175,8 @@
 #define MSK_ADDR_LO(x)	((uint64_t) (x) & 0xffffffffUL)
 #define MSK_ADDR_HI(x)	((uint64_t) (x) >> 32)
 
-/*
- * At first I guessed 8 bytes, the size of a single descriptor, would be
- * required alignment constraints. But, it seems that Yukon II have 4096
- * bytes boundary alignment constraints.
- */
-#define MSK_RING_ALIGN	4096
-#define	MSK_STAT_ALIGN	4096
+#define	MSK_RING_ALIGN	32768
+#define	MSK_STAT_ALIGN	32768
 
 /* Rx descriptor data structure */
 struct msk_rx_desc {

Re: msk msk0 watchdog timeout freeze hang lock stop problem

2015-08-26 Thread Johann Hugo

10.2-RELEASE does not work for me. It works for a very short while and
then stops with "msk0 watchdog timeout" errors.

I'm not sure what patch Roosevelt was talking about, but the patch in
this thread works for me:
https://lists.freebsd.org/pipermail/freebsd-stable/2015-April/082226.html

I've changed MSK_STAT_ALIGN from 4096 to 8192 in if_mskreg.h and it's
been running stably for the last week.

Johann

On Sun, Aug 16, 2015 at 2:08 PM, Yonghyeon PYUN  wrote:
> On Wed, Aug 12, 2015 at 09:44:06AM -0400, Roosevelt Littleton wrote:
>> Hi,
>> So, I can confirm with the attached patch. I have a working msk0 that
>> hasn't failed for the past month. I considered this problem fix for me.
>> Since, I have went a long time without any problems. Thanks!
>
> I'm not sure which patch you used.  Given that users reported
> 10.2-RELEASE works, it would be great if you revert local patch
> and try it again on 10.2-RELEASE.

Re: msk msk0 watchdog timeout freeze hang lock stop problem

2015-08-16 Thread Yonghyeon PYUN

On Wed, Aug 12, 2015 at 09:44:06AM -0400, Roosevelt Littleton wrote:
> Hi,
> So, I can confirm with the attached patch. I have a working msk0 that
> hasn't failed for the past month. I considered this problem fix for me.
> Since, I have went a long time without any problems. Thanks!

I'm not sure which patch you used.  Given that other users report that
10.2-RELEASE works, it would be great if you could revert your local
patch and try again on 10.2-RELEASE.

Re: msk msk0 watchdog timeout freeze hang lock stop problem

2015-08-15 Thread Alnis Morics


On 08/12/2015 04:44 PM, Roosevelt Littleton wrote:
> Hi,
> So, I can confirm with the attached patch. I have a working msk0 that
> hasn't failed for the past month. I considered this problem fix for me.
> Since, I have went a long time without any problems. Thanks!
>
> Roosevelt

Since 10.2-RC1 it works for me, too; now on 10.2-RELEASE. And I still
don't use any patches.


-Alnis

Re: msk msk0 watchdog timeout freeze hang lock stop problem

2015-08-12 Thread Roosevelt Littleton

Hi,
So, I can confirm that, with the attached patch, I have a working msk0
that hasn't failed for the past month. I consider this problem fixed for
me, since I have gone a long time without any problems. Thanks!

Roosevelt

Re: msk msk0 watchdog timeout freeze hang lock stop problem

2015-07-26 Thread Alnis Morics


On 07/26/2015 01:40 PM, Yonghyeon PYUN wrote:
> On Sat, Jul 25, 2015 at 02:08:10PM +0300, Alnis Morics wrote:
>
>> Just tried 10.2-RC1 amd64 GENERIC, and the problem seems to be gone. I
>> was even able to scp a 500 MB file. Could it be related to this fix in
>> BETA2, as mentioned in the announcement, "The watchdog(4) device has
>> been fixed to print to the correct buffer."?
>
> msk(4) will show watchdog timeouts when it detects driver TX path
> is in stuck condition but I believe this has nothing to do with
> watchdog(4).
>
> There was no msk(4) code change in 10.2-RC1.  If you happen to see
> the watchdog timeouts again, please try attached patch and let me
> know whether it makes any difference for you.  I didn't get much
> feedbacks on the patch so I'm not sure whether it really fixes the
> root cause.
>
>> pciconf -lv
>> [..]
>> mskc0@pci0:9:0:0:class=0x02 card=0xc072144d chip=0x435411ab
>> rev=0x00 hdr=0x00
>>  vendor = 'Marvell Technology Group Ltd.'
>>  device = '88E8040 PCI-E Fast Ethernet Controller'
>>  class  = network
>>  subclass   = ethernet

Thanks, Pyun. If the watchdog timeouts reappear, I'll try the patch and
report back with the results.


-Alnis

Re: msk msk0 watchdog timeout freeze hang lock stop problem

2015-07-26 Thread Yonghyeon PYUN

On Sat, Jul 25, 2015 at 02:08:10PM +0300, Alnis Morics wrote:

> Just tried 10.2-RC1 amd64 GENERIC, and the problem seems to be gone. I 
> was even able to scp a 500 MB file. Could it be related to this fix in 
> BETA2, as mentioned in the announcement, "The watchdog(4) device has 
> been fixed to print to the correct buffer."?
> 

msk(4) reports watchdog timeouts when it detects that the driver's TX
path is stuck, but I believe this has nothing to do with
watchdog(4).
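
As a rough illustration of the kind of driver-level TX watchdog meant
here, the following is a minimal sketch, not the msk(4) source. The
struct, the function names and the 5-tick timeout are all hypothetical.
The idea is that the driver arms a countdown when it queues a frame,
re-arms or disarms it when the hardware reports a completion, and a
periodic tick reports a timeout if the countdown expires first.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-interface state; not the real msk(4) softc. */
struct tx_watchdog {
	int	timer;		/* ticks left; 0 means disarmed */
	int	pending;	/* frames queued but not yet completed */
};

#define	TX_TIMEOUT	5	/* assumed timeout, in ticks */

/* Called when a frame is handed to the hardware: arm the countdown. */
static void
wd_tx_queued(struct tx_watchdog *wd)
{
	wd->pending++;
	wd->timer = TX_TIMEOUT;
}

/* Called when the hardware reports a completed frame. */
static void
wd_tx_done(struct tx_watchdog *wd)
{
	assert(wd->pending > 0);	/* a completion needs a queued frame */
	wd->pending--;
	/* Disarm once nothing is outstanding; otherwise restart the countdown. */
	wd->timer = (wd->pending == 0) ? 0 : TX_TIMEOUT;
}

/* Called once per tick; returns true when the TX path looks stuck. */
static bool
wd_tick(struct tx_watchdog *wd)
{
	if (wd->timer == 0 || --wd->timer > 0)
		return (false);
	return (true);	/* countdown expired with frames still pending */
}
```

Nothing in this sketch involves the watchdog(4) device; the countdown
lives entirely inside the network driver.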

There was no msk(4) code change in 10.2-RC1.  If you happen to see
the watchdog timeouts again, please try the attached patch and let me
know whether it makes any difference for you.  I didn't get much
feedback on the patch, so I'm not sure whether it really fixes the
root cause.

> pciconf -lv
> [..]
> mskc0@pci0:9:0:0:class=0x02 card=0xc072144d chip=0x435411ab 
> rev=0x00 hdr=0x00
> vendor = 'Marvell Technology Group Ltd.'
> device = '88E8040 PCI-E Fast Ethernet Controller'
> class  = network
> subclass   = ethernet
> 
> 
Index: sys/dev/msk/if_mskreg.h
===
--- sys/dev/msk/if_mskreg.h	(revision 281587)
+++ sys/dev/msk/if_mskreg.h	(working copy)
@@ -2175,13 +2175,8 @@
 #define MSK_ADDR_LO(x)	((uint64_t) (x) & 0xffffffffUL)
 #define MSK_ADDR_HI(x)	((uint64_t) (x) >> 32)
 
-/*
- * At first I guessed 8 bytes, the size of a single descriptor, would be
- * required alignment constraints. But, it seems that Yukon II have 4096
- * bytes boundary alignment constraints.
- */
-#define MSK_RING_ALIGN	4096
-#define	MSK_STAT_ALIGN	4096
+#define	MSK_RING_ALIGN	32768
+#define	MSK_STAT_ALIGN	32768
 
 /* Rx descriptor data structure */
 struct msk_rx_desc {

Re: msk msk0 watchdog timeout freeze hang lock stop problem

2015-07-25 Thread Alnis Morics

Just tried 10.2-RC1 amd64 GENERIC, and the problem seems to be gone. I
was even able to scp a 500 MB file. Could it be related to this fix in
BETA2, as mentioned in the announcement, "The watchdog(4) device has
been fixed to print to the correct buffer."?

pciconf -lv
[..]
mskc0@pci0:9:0:0: class=0x02 card=0xc072144d chip=0x435411ab rev=0x00 hdr=0x00
    vendor     = 'Marvell Technology Group Ltd.'
    device     = '88E8040 PCI-E Fast Ethernet Controller'
    class      = network
    subclass   = ethernet

Heads-Up: stable/10 freeze in effect

2015-07-02 Thread Glen Barber

For those not subscribed to svn commit email, the code freeze for the
upcoming 10.2-RELEASE is now in effect.

The full schedule as it stands now is available here:

https://www.FreeBSD.org/releases/10.2R/schedule.html

If you are aware of an issue that affects stable/10 that does not have
a corresponding PR, please file a bug report so we do not lose track.

Thank you.

Glen
On behalf of:   re@




Re: msk msk0 watchdog timeout freeze hang lock stop problem

2015-04-15 Thread Yonghyeon PYUN

On Wed, Apr 15, 2015 at 09:52:09PM +, Gareth Wyn Roberts wrote:
> I've inserted code to print some values which show the differences between 
> specifying 4096 or 8192 for MSK_STAT_ALIGN.  In both cases the status buffer 
> has length 0x4000 (8x2048=16K) but the alignments are different as expected, 
> respectively start addresses 0x5c3b000 or 0xbdc2c000.
> 
> The following values were output from functions msk_status_dma_alloc(), 
> msk_dmamap_cb() and msk_handle_events().
> The "Break #n" refer to breaks in msk_handle_events(). "#1" occurs if 
> ((control & HW_OWNER) == 0), "#5" is OP_RXSTAT and "#6" is OP_TXINDEXLE.
> 
> The first output is for MSK_STAT_ALIGN=8192.  It continues normally.  
> Although not shown here, it reaches cons=2047 then cons=0 as expected.
> 
> The second output is for MSK_STAT_ALIGN=4096.  Although there can be isolated 
> occurrences of "Break #1" (e.g. cons=196) (?are these to be expected?),  it 
> continues normally until cons=512. At this point it continually invokes the 
> "#1" block because the msk_control from msk_stat_ring[512] is always zero and 
> the network hangs immediately. This suggests the Yukon Ultra 2 88E8057 can't 
> access the next 4096 memory block, but why not?
> 

Yes, it seems the status LE block is not updated at all for
MSK_STAT_ALIGN == 4096, and some elements of the status block look
suspicious (the put index increases but the value at that location is
0).  My guess is that this indicates DMA alignment and/or DMA
boundary issues.
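
To make the "Break #1" symptom concrete, here is a minimal sketch of a
status ring consumer loop. It is a simplification, not the real
msk_handle_events(): the HW_OWNER value, the struct layout and the
function name are assumptions, and endianness conversion is left out.
The loop stops as soon as it meets an element the hardware has not
written, so if the put index moves on while element 512 stays zero, the
consumer index never advances past 512.

```c
#include <assert.h>
#include <stdint.h>

#define	HW_OWNER	0x80000000u	/* assumed value of the ownership bit */
#define	STAT_RING_CNT	2048		/* elements on 64-bit, per this thread */

/* Simplified 8-byte status list element; field names follow the debug output. */
struct stat_desc {
	uint32_t	msk_status;
	uint32_t	msk_control;
};

/*
 * Consume elements from 'cons' up to the hardware put index 'put'.
 * Stops early ("Break #1") when an element has not been handed over
 * by the hardware.  Returns the new consumer index.
 */
static int
stat_ring_consume(struct stat_desc *ring, int cons, int put)
{
	assert(cons >= 0 && cons < STAT_RING_CNT);
	while (cons != put) {
		struct stat_desc *sd = &ring[cons];

		if ((sd->msk_control & HW_OWNER) == 0)
			break;			/* slot was never written */
		sd->msk_control = 0;		/* give the slot back */
		cons = (cons + 1) % STAT_RING_CNT;
	}
	return (cons);
}
```

With elements 0..511 marked as owned and element 512 left at zero, a
call with put=513 returns 512, and so does every later call, which
matches the repeated "cons=512" lines in the debug output.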
The maximum number of elements in the status block is 4096, so the
maximum size of the status block is 32KB.  On i386, msk(4) uses an
8KB status block (1024 elements).  On 64-bit architectures, the
block size is increased to 16KB (2048 elements).
The safe alignment value for the status block would probably be
32KB.  This looks excessive to me, but it should avoid having to
guess at DMA boundary issues.
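
The arithmetic behind these figures can be sketched as follows. This is
only an illustration: the helper names are hypothetical, and the 8-byte
element size is inferred from "4096 elements = 32KB". The helpers
compute the block size for a given element count, test an address for
alignment, and test whether a block crosses a power-of-two boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define	STAT_LE_SIZE	8u	/* bytes per status list element */

/* Size in bytes of a status block holding 'count' elements. */
static uint64_t
stat_block_size(uint32_t count)
{
	return ((uint64_t)count * STAT_LE_SIZE);
}

/* True if 'addr' is a multiple of 'align' (a power of two). */
static bool
is_aligned(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((addr & (align - 1)) == 0);
}

/* True if [base, base + size) spans more than one 'boundary'-sized window. */
static bool
crosses_boundary(uint64_t base, uint64_t size, uint64_t boundary)
{
	assert(size > 0 && boundary != 0 && (boundary & (boundary - 1)) == 0);
	return ((base & ~(boundary - 1)) !=
	    ((base + size - 1) & ~(boundary - 1)));
}
```

Applied to the addresses reported earlier in this thread, 0x5c3b000 is
4096-aligned but not 8192-aligned, while 0xbdc2c000 is 8192-aligned. A
block of at most 32KB that starts on a 32KB boundary can never cross a
32KB boundary, which is why that alignment removes the guesswork.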

> Please let me know if any further information would be helpful.
> 

Thanks a lot.  I've attached a diff which sets the alignment of the
TX/RX rings and the status block to 32KB.  I'm not sure whether this
also addresses other msk(4)-related watchdog timeouts.
Index: sys/dev/msk/if_mskreg.h
===
--- sys/dev/msk/if_mskreg.h	(revision 281587)
+++ sys/dev/msk/if_mskreg.h	(working copy)
@@ -2175,13 +2175,8 @@
 #define MSK_ADDR_LO(x)	((uint64_t) (x) & 0xffffffffUL)
 #define MSK_ADDR_HI(x)	((uint64_t) (x) >> 32)
 
-/*
- * At first I guessed 8 bytes, the size of a single descriptor, would be
- * required alignment constraints. But, it seems that Yukon II have 4096
- * bytes boundary alignment constraints.
- */
-#define MSK_RING_ALIGN	4096
-#define	MSK_STAT_ALIGN	4096
+#define	MSK_RING_ALIGN	32768
+#define	MSK_STAT_ALIGN	32768
 
 /* Rx descriptor data structure */
 struct msk_rx_desc {

RE: msk msk0 watchdog timeout freeze hang lock stop problem

2015-04-15 Thread Gareth Wyn Roberts

mskc0: msk_handle_events: Break #5  cons=197  csrread=198
...
mskc0: msk_handle_events: Break #5  cons=510  csrread=511
mskc0: msk_handle_events: Break #5  cons=511  csrread=512
mskc0: msk_handle_events: Break #1  cons=512  csrread=513
mskc0: msk_handle_events: sd=0xfe011e23c000  sd->msk_control=0  control=0
mskc0: msk_handle_events: Break #1  cons=512  csrread=513
mskc0: msk_handle_events: sd=0xfe011e23c000  sd->msk_control=0  control=0
mskc0: msk_handle_events: Break #1  cons=512  csrread=513
mskc0: msk_handle_events: sd=0xfe011e23c000  sd->msk_control=0  control=0
mskc0: msk_handle_events: Break #1  cons=512  csrread=513
mskc0: msk_handle_events: sd=0xfe011e23c000  sd->msk_control=0  control=0
mskc0: msk_handle_events: Break #1  cons=512  csrread=513
mskc0: msk_handle_events: sd=0xfe011e23c000  sd->msk_control=0  control=0
mskc0: msk_handle_events: Break #1  cons=512  csrread=513
mskc0: msk_handle_events: sd=0xfe011e23c000  sd->msk_control=0  control=0
mskc0: msk_handle_events: Break #1  cons=512  csrread=513
mskc0: msk_handle_events: sd=0xfe011e23c000  sd->msk_control=0  control=0
...
mskc0: msk_handle_events: Break #1  cons=512  csrread=519
mskc0: msk_handle_events: sd=0xfe011e23c000  sd->msk_control=0  control=0
mskc0: msk_handle_events: Break #1  cons=512  csrread=519
mskc0: msk_handle_events: sd=0xfe011e23c000  sd->msk_control=0  control=0
...etc



From: owner-freebsd-sta...@freebsd.org [owner-freebsd-sta...@freebsd.org] on 
behalf of Yonghyeon PYUN [pyu...@gmail.com]
Sent: 13 April 2015 09:13
To: Gareth Wyn Roberts
Cc: freebsd-stable@freebsd.org
Subject: Re: msk msk0 watchdog timeout freeze hang lock stop problem

On Sun, Apr 12, 2015 at 05:57:34PM +, Gareth Wyn Roberts wrote:
> I've run in to problems using the msk device where initially it works well 
> enough to set DHCP etc. but stops/freezes as soon as any appreciable network 
> traffic occurs . There are several threads describing similar symptoms over 
> the past two years or more.  I've been following several false leads but have 
> finally found a solution (at least it solves my problem).
>
> I'm running a standard FreeBSD 10.1-RELEASE and the NIC is detected as:
>
> mskc0:  mem 0xfa00-0xfa003fff irq 
> 19 at device 0.0 on pci6
> msk0:  on mskc0
> msk0: Ethernet address: 00:13:77:e9:df:eb
> miibus0:  on msk0
> e1000phy0:  PHY 0 on miibus0
> e1000phy0:  none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
> 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
>
> The network worked when using the i386 release, but failed for the amd64 
> release (as reported previously) which prompted me to disable 64-bit DMA (the 
> patch for this is attached below).  This worked for the first kernel built 
> but mysteriously failed when another unrelated part of the kernel was changed 
> (a usb driver) and the kernel recompiled.  So identical msk driver code 
> worked in one kernel but not the second! This suggested that alignment 
> differences between the two kernels were causing the msk driver to fail. 
> Others have reported varying behaviour depending on different circumstances.
>
> It transpires that changing just one value in the if_mskreg.h file solved all 
> my problems.  Subsequently I have not been able to make it fail under heavy 
> network traffic in either 32-bit or 64-bit mode.
> I'm working on 10.1-RELEASE source, i.e. if_msk.c revision 262524 and 
> if_mskreg.h revision 264442.

Thanks for letting me know your findings.  I really appreciate
that.
I recall that the alignment requirement of status LEs(List Elements
in Marvell terms) is 2048 and the maximum size of the status LEs is
4096 bytes(Actual alignment seems to be much lower value like 32 or
64 bytes, but alignment 2048 is chosen to avoid silicon bugs).
Later experiments showed some variants of Yukon II require 4096
bytes alignment and I changed the alignment to 4096 in the past.
It seems your finding indicates msk(4) needs 8192 alignment for
status LEs.

However this does not explain how and why the same code in 8.x/9.x
works well.  In addition, it's not common to require alignment size
greater than PAGE_SIZE on x86 given that the maximum size of DMA
buffer is 4096 bytes.  I have to check whether there was a change
in bus_dma(9) between 8.x/9.x and 10.x but it needs more time due
to lack of spare time.  Probably you can verify the DMA address of
status LEs meets the following requirements both on i386 and amd64.
  - Alignment is 4096.
  - Number of DMA segment is 1.
  - DMA segment base address plus DMA segment size does not cross
a PAGE_SIZE boundary.

>
> Here's the patch to if_mskreg.h
> --- if_mskreg.h-orig2014-11-11 20:02:58.0 +
> +++ if_mskreg.h 2015-04-12 18:47:20.0 +0100
> @@ -21

msk msk0 watchdog timeout freeze hang lock stop problem

2015-04-13 Thread Alnis Morics

Hm... I patched if_msk.c with if_msk.c.rev262524.dma.diff 
(attachment-001.bin) and if_mskreg.h with if_mskreg.h.rev264442.dma.diff 
(attachment-002.bin), and nothing changed: scp'ing 50 MB soon got 
"stalled" and ended up with "broken pipe", as it was before.


I have 10.1-RELEASE-p9 amd64

pciconf -lv:
[..]
mskc0@pci0:9:0:0: class=0x02 card=0xc072144d chip=0x435411ab rev=0x00 hdr=0x00
    vendor     = 'Marvell Technology Group Ltd.'
    device     = '88E8040 PCI-E Fast Ethernet Controller'
    class      = network
    subclass   = ethernet

Alnis

Re: msk msk0 watchdog timeout freeze hang lock stop problem

2015-04-13 Thread Yonghyeon PYUN

On Sun, Apr 12, 2015 at 05:57:34PM +, Gareth Wyn Roberts wrote:
> I've run in to problems using the msk device where initially it works well 
> enough to set DHCP etc. but stops/freezes as soon as any appreciable network 
> traffic occurs . There are several threads describing similar symptoms over 
> the past two years or more.  I've been following several false leads but have 
> finally found a solution (at least it solves my problem).
> 
> I'm running a standard FreeBSD 10.1-RELEASE and the NIC is detected as:
> 
> mskc0:  mem 0xfa00-0xfa003fff irq 
> 19 at device 0.0 on pci6
> msk0:  on mskc0
> msk0: Ethernet address: 00:13:77:e9:df:eb
> miibus0:  on msk0
> e1000phy0:  PHY 0 on miibus0
> e1000phy0:  none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
> 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
> 
> The network worked when using the i386 release, but failed for the amd64 
> release (as reported previously) which prompted me to disable 64-bit DMA (the 
> patch for this is attached below).  This worked for the first kernel built 
> but mysteriously failed when another unrelated part of the kernel was changed 
> (a usb driver) and the kernel recompiled.  So identical msk driver code 
> worked in one kernel but not the second! This suggested that alignment 
> differences between the two kernels were causing the msk driver to fail. 
> Others have reported varying behaviour depending on different circumstances.
> 
> It transpires that changing just one value in the if_mskreg.h file solved all 
> my problems.  Subsequently I have not been able to make it fail under heavy 
> network traffic in either 32-bit or 64-bit mode.
> I'm working on 10.1-RELEASE source, i.e. if_msk.c revision 262524 and 
> if_mskreg.h revision 264442.

Thanks for letting me know your findings.  I really appreciate
that.
I recall that the alignment requirement of status LEs (List Elements
in Marvell terms) is 2048 and the maximum size of the status LEs is
4096 bytes.  (The actual alignment requirement seems to be much lower,
like 32 or 64 bytes, but 2048 was chosen to avoid silicon bugs.)
Later experiments showed that some variants of Yukon II require 4096
byte alignment, and I changed the alignment to 4096 in the past.
Your finding seems to indicate that msk(4) needs 8192 byte alignment
for status LEs.

However, this does not explain how and why the same code works well
in 8.x/9.x.  In addition, it's not common to require an alignment
greater than PAGE_SIZE on x86, given that the maximum size of the DMA
buffer is 4096 bytes.  I have to check whether there was a change
in bus_dma(9) between 8.x/9.x and 10.x, but that will take a while
because I'm short of spare time.  You can probably verify that the DMA
address of the status LEs meets the following requirements on both
i386 and amd64.
  - Alignment is 4096.
  - Number of DMA segments is 1.
  - DMA segment base address plus DMA segment size does not cross
    a PAGE_SIZE boundary.
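
The three checks listed above could be expressed roughly as follows.
This is a sketch under assumptions, not driver code: struct dma_seg is a
hypothetical stand-in for a DMA segment (base address and length), and
the function name and CHK_* flags are made up for illustration. In the
driver, the values would be the ones seen in the bus_dma(9) load
callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a DMA segment (base address and length). */
struct dma_seg {
	uint64_t	ds_addr;
	uint64_t	ds_len;
};

#define	CHK_ALIGN	0x1u	/* base address is not aligned */
#define	CHK_NSEG	0x2u	/* more than one segment */
#define	CHK_BOUNDARY	0x4u	/* segment crosses a boundary */

/* Returns 0 if all three requirements hold, else a mask of failed checks. */
static unsigned
stat_dma_check(const struct dma_seg *segs, int nseg, uint64_t align,
    uint64_t boundary)
{
	unsigned failed = 0;

	assert(segs != NULL && nseg >= 1 && segs[0].ds_len > 0);
	assert(align != 0 && (align & (align - 1)) == 0);
	assert(boundary != 0 && (boundary & (boundary - 1)) == 0);

	if ((segs[0].ds_addr & (align - 1)) != 0)
		failed |= CHK_ALIGN;
	if (nseg != 1)
		failed |= CHK_NSEG;
	/* First and last byte must fall in the same boundary-sized window. */
	uint64_t first = segs[0].ds_addr;
	uint64_t last = first + segs[0].ds_len - 1;
	if ((first / boundary) != (last / boundary))
		failed |= CHK_BOUNDARY;
	return (failed);
}
```

Note that with both align and boundary set to 4096, the third check can
only pass for a segment of at most 4096 bytes.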

> 
> Here's the patch to if_mskreg.h
> --- if_mskreg.h-orig 2014-11-11 20:02:58.0 +
> +++ if_mskreg.h 2015-04-12 18:47:20.0 +0100
> @@ -2179,9 +2179,11 @@
>   * At first I guessed 8 bytes, the size of a single descriptor, would be
>   * required alignment constraints. But, it seems that Yukon II have 4096
>   * bytes boundary alignment constraints.
> + * And it seems that the DMA status region for the Yukon Ultra 2 (88E8057)
> + * requires 8192 byte alignment to prevent locking.
>   */
>  #define MSK_RING_ALIGN 4096
> -#define MSK_STAT_ALIGN  4096
> +#define MSK_STAT_ALIGN  8192
> 
> 
> The patches to both files which also implement a MSK_64BIT_DMA_DISABLE flag 
> are attached.  Perhaps the developers would consider committing these as it 
> may be useful for future debugging.
> 

If you have more than 4GB of memory installed and disable 64-bit DMA
addressing, msk(4) will use bounce buffers.  Passing packets
through bounce buffers involves a copy operation, which is expensive.
You can check the hw.busdma sysctl node to see whether any
drivers use bounce buffers.  And if you want to disable 64-bit
DMA on 64-bit architectures, add '#undef MSK_64BIT_DMA' just below the
BUS_SPACE_MAXADDR check in if_mskreg.h.
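
The '#undef' suggestion amounts to something like the fragment below.
Treat it as a sketch only: the exact shape of the BUS_SPACE_MAXADDR
check in if_mskreg.h is assumed here, BUS_SPACE_MAXADDR is given a
stand-in value so the fragment compiles on its own, and
msk_64bit_dma_enabled() is a hypothetical helper that exists only to
show the effect.

```c
#include <assert.h>

/* Stand-in for the platform constant; on amd64 it exceeds 32 bits. */
#ifndef BUS_SPACE_MAXADDR
#define	BUS_SPACE_MAXADDR	0xFFFFFFFFFFFFFFFFULL
#endif

/* Assumed shape of the existing check in if_mskreg.h. */
#if (BUS_SPACE_MAXADDR > 0xFFFFFFFF)
#define	MSK_64BIT_DMA
#endif

/* Local debugging change: force 32-bit DMA addressing. */
#undef MSK_64BIT_DMA

/* Hypothetical helper that reports which mode was compiled in. */
static int
msk_64bit_dma_enabled(void)
{
#ifdef MSK_64BIT_DMA
	return (1);
#else
	return (0);
#endif
}
```

With the '#undef' in place the helper returns 0 even on a 64-bit
platform; removing that one line restores 64-bit DMA addressing.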

Re: msk msk0 watchdog timeout freeze hang lock stop problem

2015-04-12 Thread Kurt Jaeger

Hi!

> I've run in to problems using the msk device [...]

> I'm working on 10.1-RELEASE source, i.e. if_msk.c revision 262524 and 
> if_mskreg.h revision 264442.
> 
> Here's the patch to if_mskreg.h
[...]

Thanks for the suggested fix.

There are five PRs that all describe similar symptoms:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197887
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197002
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=189404
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=186872
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=166727

I added a pointer to your posting to each of them; maybe someone can test it?

-- 
p...@opsec.eu+49 171 3101372 5 years to go !

msk msk0 watchdog timeout freeze hang lock stop problem

2015-04-12 Thread Gareth Wyn Roberts

I've run into problems using the msk device: initially it works well 
enough to set up DHCP etc., but it stops/freezes as soon as any appreciable 
network traffic occurs. There are several threads describing similar symptoms 
over the past two years or more.  I've been following several false leads but 
have finally found a solution (at least it solves my problem).

I'm running a standard FreeBSD 10.1-RELEASE and the NIC is detected as:

mskc0:  mem 0xfa00-0xfa003fff irq 
19 at device 0.0 on pci6
msk0:  on mskc0
msk0: Ethernet address: 00:13:77:e9:df:eb
miibus0:  on msk0
e1000phy0:  PHY 0 on miibus0
e1000phy0:  none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow

The network worked when using the i386 release, but failed for the amd64 
release (as reported previously) which prompted me to disable 64-bit DMA (the 
patch for this is attached below).  This worked for the first kernel built but 
mysteriously failed when another unrelated part of the kernel was changed (a 
usb driver) and the kernel recompiled.  So identical msk driver code worked in 
one kernel but not the second! This suggested that alignment differences 
between the two kernels were causing the msk driver to fail. Others have 
reported varying behaviour depending on different circumstances.

It transpires that changing just one value in the if_mskreg.h file solved all 
my problems.  Subsequently I have not been able to make it fail under heavy 
network traffic in either 32-bit or 64-bit mode.
I'm working on 10.1-RELEASE source, i.e. if_msk.c revision 262524 and 
if_mskreg.h revision 264442.

Here's the patch to if_mskreg.h
--- if_mskreg.h-orig 2014-11-11 20:02:58.0 +
+++ if_mskreg.h 2015-04-12 18:47:20.0 +0100
@@ -2179,9 +2179,11 @@
  * At first I guessed 8 bytes, the size of a single descriptor, would be
  * required alignment constraints. But, it seems that Yukon II have 4096
  * bytes boundary alignment constraints.
+ * And it seems that the DMA status region for the Yukon Ultra 2 (88E8057)
+ * requires 8192 byte alignment to prevent locking.
  */
 #define MSK_RING_ALIGN 4096
-#define MSK_STAT_ALIGN  4096
+#define MSK_STAT_ALIGN  8192


The patches to both files which also implement a MSK_64BIT_DMA_DISABLE flag are 
attached.  Perhaps the developers would consider committing these as it may be 
useful for future debugging.

Gareth.
--- if_mskreg.h-orig	2014-11-11 20:02:58.0 +
+++ if_mskreg.h	2015-04-12 18:47:20.0 +0100
@@ -2179,9 +2179,11 @@
  * At first I guessed 8 bytes, the size of a single descriptor, would be
  * required alignment constraints. But, it seems that Yukon II have 4096
  * bytes boundary alignment constraints.
+ * And it seems that the DMA status region for the Yukon Ultra 2 (88E8057)
+ * requires 8192 byte alignment to prevent locking.
  */
 #define MSK_RING_ALIGN	4096
-#define	MSK_STAT_ALIGN	4096
+#define	MSK_STAT_ALIGN	8192
 
 /* Rx descriptor data structure */
 struct msk_rx_desc {
--- if_msk.c-orig	2014-11-11 20:02:58.0 +
+++ if_msk.c	2015-04-12 02:15:12.551005000 +0100
@@ -2164,8 +2164,8 @@
 	error = bus_dma_tag_create(
 		bus_get_dma_tag(sc->msk_dev),	/* parent */
 		MSK_STAT_ALIGN, 0,		/* alignment, boundary */
-		BUS_SPACE_MAXADDR,		/* lowaddr */
-		BUS_SPACE_MAXADDR,		/* highaddr */
+		BUS_DMA_TAG_LOWADDR,	/* lowaddr */
+		BUS_DMA_TAG_HIGHADDR,	/* highaddr */
 		NULL, NULL,			/* filter, filterarg */
 		stat_sz,			/* maxsize */
 		1,				/* nsegments */
@@ -2235,8 +2235,8 @@
 	error = bus_dma_tag_create(
 		bus_get_dma_tag(sc_if->msk_if_dev),	/* parent */
 		1, 0,			/* alignment, boundary */
-		BUS_SPACE_MAXADDR,		/* lowaddr */
-		BUS_SPACE_MAXADDR,		/* highaddr */
+		BUS_DMA_TAG_LOWADDR,	/* lowaddr */
+		BUS_DMA_TAG_HIGHADDR,	/* highaddr */
 		NULL, NULL,			/* filter, filterarg */
 		BUS_SPACE_MAXSIZE_32BIT,	/* maxsize */
 		0,				/* nsegments */
@@ -2252,8 +2252,8 @@
 	/* Create tag for Tx ring. */
 	error = bus_dma_tag_create(sc_if->msk_cdata.msk_parent_tag,/* parent */
 		MSK_RING_ALIGN, 0,		/* alignment, boundary */
-		BUS_SPACE_MAXADDR,		/* lowaddr */
-		BUS_SPACE_MAXADDR,		/* highaddr */
+		BUS_DMA_TAG_LOWADDR,	/* lowaddr */
+		BUS_DMA_TAG_HIGHADDR,	/* highaddr */
 		NULL, NULL,			/* filter, filterarg */
 		MSK_TX_RING_SZ,		/* maxsize */
 		1,				/* nsegments */
@@ -2270,8 +2270,8 @@
 	/* Create tag for Rx ring. */
 	error = bus_dma_tag_create(sc_if->msk_cdata.msk_parent_tag,/* parent */
 		MSK_RING_ALIGN, 0,		/* alignment, boundary */
-		BUS_SPACE_MAXADDR,		/* lowaddr */
-		BUS_SPACE_MAXADDR,		/* highaddr */
+		BUS_DMA_TAG_LOWADDR,	/* lowaddr */
+		BUS_DMA_TAG_HIGHADDR,	/* highaddr */
 		NULL, NULL,			/* filter, filterarg */
 		MSK_RX_RING_SZ,		/* maxsize */
 		1,				/* nsegments */
@@ -2288,8 +2288,8 @@
 	/* Create

Re: FreeBSD 9-Stable + Atom D510 Freeze

2013-09-20 Thread Thomas Laus

Gary Palmer [gpal...@freebsd.org] wrote:
> It used to be that ports had MAKE_JOBS_SAFE in the Makefile to mark that
> the port could be built using parallel compiles with the '-j' argument
> to make.  It appears that the logic has been switched and now you have
> to mark them as MAKE_JOBS_UNSAFE to say that parallel builds shouldn't be
> done, indicating that parallel builds are the default now (unless I'm
> misreading the code)
> 
> You can try putting
> 
> DISABLE_MAKE_JOBS=yes
> 
> into /etc/make.conf to see if that stops the problem on port builds.
>
Gary:

Making that change worked for me.  I built both Subversion and Tshark,
my two problem children.  The build time was not too much different
than without the flag.  Only 1 CPU was active with cc1 at a time.
I had no 'pfault' states on any entries in top for both builds.

I guess that we can close out this issue.

Thank you and the list for the suggestion.

Tom

-- 
Public Keys:
PGP KeyID = 0x5F22FDC1
GnuPG KeyID = 0x620836CF
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: FreeBSD 9-Stable + Atom D510 Freeze

2013-09-20 Thread Thomas Laus

Gary Palmer [gpal...@freebsd.org] wrote:
> It's not a compiler flag, it's a make flag.  make -j n will fork off up to
> n compilers to do the build.  If you just do "make buildworld" then there
> is no parallel compilation.
> 
> It used to be that ports had MAKE_JOBS_SAFE in the Makefile to mark that
> the port could be built using parallel compiles with the '-j' argument
> to make.  It appears that the logic has been switched and now you have
> to mark them as MAKE_JOBS_UNSAFE to say that parallel builds shouldn't be
> done, indicating that parallel builds are the default now (unless I'm
> misreading the code)
> 
> You can try putting
> 
> DISABLE_MAKE_JOBS=yes
> 
> into /etc/make.conf to see if that stops the problem on port builds.
>
Gary:

I don't see that as an option in /usr/share/examples/etc/make.conf.
Did you find that one by reading the source code?  I will add that
to my /etc/make.conf and see if it makes a difference.  This issue is
very intermittent and may not trigger for weeks or months.  I'll
repost to the list if any problems show up after setting the flag in
my /etc/make.conf.

Thanks for the help.

Tom

-- 
Public Keys:
PGP KeyID = 0x5F22FDC1
GnuPG KeyID = 0x620836CF

Re: FreeBSD 9-Stable + Atom D510 Freeze

2013-09-20 Thread ill...@gmail.com

On 20 September 2013 11:52, Gary Palmer  wrote:
> On Fri, Sep 20, 2013 at 10:49:28AM -0400, Thomas Laus wrote:
>> Gary Palmer [gpal...@freebsd.org] wrote:
>> >
>> > When building kernel & world do you use the '-j' argument to do parallel
>> > builds?  AFAIK thats not done by default, but it is for some ports.
>> >
>> Gary:
>>
>> I just use the system defaults when building anything.  If there is a
>> '-j' argument passed to the compiler, I was not the one that did it.
>> Does this mean that the port building process needs to determine the
>> processor type in the configure stage?  I only use portmaster to keep
>> the ports updated.  I don't know of a global hook that will change the
>> compiler build flags in portmaster.
>
> Hi Tom,
>
> It's not a compiler flag, it's a make flag.  make -j n will fork off up to
> n compilers to do the build.  If you just do "make buildworld" then there
> is no parallel compilation.
>
> It used to be that ports had MAKE_JOBS_SAFE in the Makefile to mark that
> the port could be built using parallel compiles with the '-j' argument
> to make.  It appears that the logic has been switched and now you have
> to mark them as MAKE_JOBS_UNSAFE to say that parallel builds shouldn't be
> done, indicating that parallel builds are the default now (unless I'm
> misreading the code)
>
> You can try putting
>
> DISABLE_MAKE_JOBS=yes
>
> into /etc/make.conf to see if that stops the problem on port builds.
> Alternatively I think you could do
>
> portmaster -m DISABLE_MAKE_JOBS=yes 
>
> However you'd have to do that each time you run portmaster.  I think
> putting
>
> PM_MAKE_ARGS="DISABLE_MAKE_JOBS=yes"
>
> in your .portmasterrc may do the same thing (not tried it).
>
> Note: this is NOT a fix.  If it works, it merely stops the ports builder
> from triggering the problem by not doing parallel compiles.  The compiles
> will also take longer.
>

I believe that both world/kernel & ports will honour
MAKE_JOBS_NUMBER=1 #(in /etc/make.conf)
which should restrict all builds to 1 "parallel" thread,
yes?

-- 
--

Re: FreeBSD 9-Stable + Atom D510 Freeze

2013-09-20 Thread Gary Palmer

On Fri, Sep 20, 2013 at 10:49:28AM -0400, Thomas Laus wrote:
> Gary Palmer [gpal...@freebsd.org] wrote:
> > 
> > When building kernel & world do you use the '-j' argument to do parallel
> > builds?  AFAIK thats not done by default, but it is for some ports.
> >
> Gary:
> 
> I just use the system defaults when building anything.  If there is a
> '-j' argument passed to the compiler, I was not the one that did it.
> Does this mean that the port building process needs to determine the
> processor type in the configure stage?  I only use portmaster to keep
> the ports updated.  I don't know of a global hook that will change the
> compiler build flags in portmaster.

Hi Tom,

It's not a compiler flag, it's a make flag.  make -j n will fork off up to
n compilers to do the build.  If you just do "make buildworld" then there
is no parallel compilation.

It used to be that ports had MAKE_JOBS_SAFE in the Makefile to mark that
the port could be built using parallel compiles with the '-j' argument
to make.  It appears that the logic has been switched and now you have
to mark them as MAKE_JOBS_UNSAFE to say that parallel builds shouldn't be
done, indicating that parallel builds are the default now (unless I'm
misreading the code)

You can try putting

DISABLE_MAKE_JOBS=yes

into /etc/make.conf to see if that stops the problem on port builds.
Alternatively I think you could do

portmaster -m DISABLE_MAKE_JOBS=yes 

However you'd have to do that each time you run portmaster.  I think
putting

PM_MAKE_ARGS="DISABLE_MAKE_JOBS=yes"

in your .portmasterrc may do the same thing (not tried it).

Note: this is NOT a fix.  If it works, it merely stops the ports builder
from triggering the problem by not doing parallel compiles.  The compiles
will also take longer.
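
For anyone who wants to script the make.conf change, here is a minimal sketch in POSIX sh that adds the line only if it is not already there. The helper name `ensure_conf_line` is made up for illustration; it is not part of the ports tree or portmaster.

```shell
#!/bin/sh
# ensure_conf_line FILE LINE: append LINE to FILE unless an identical
# line already exists. Hypothetical helper, not a FreeBSD tool.
ensure_conf_line() {
    conf=$1
    line=$2
    # -x matches whole lines only, -F treats LINE as a fixed string.
    if ! grep -qxF -- "$line" "$conf" 2>/dev/null; then
        printf '%s\n' "$line" >> "$conf"
    fi
}

# Example (run as root):
#   ensure_conf_line /etc/make.conf 'DISABLE_MAKE_JOBS=yes'
```

Running it twice leaves a single copy of the line, so it is safe to put in a provisioning script. Removing the line again restores the default behaviour.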

Regards,

Gary

Re: FreeBSD 9-Stable + Atom D510 Freeze

2013-09-20 Thread Thomas Laus

Gary Palmer [gpal...@freebsd.org] wrote:
> 
> When building kernel & world do you use the '-j' argument to do parallel
> builds?  AFAIK thats not done by default, but it is for some ports.
>
Gary:

I just use the system defaults when building anything.  If there is a
'-j' argument passed to the compiler, I was not the one that did it.
Does this mean that the port building process needs to determine the
processor type in the configure stage?  I only use portmaster to keep
the ports updated.  I don't know of a global hook that will change the
compiler build flags in portmaster.

Tom 

-- 
Public Keys:
PGP KeyID = 0x5F22FDC1
GnuPG KeyID = 0x620836CF

Re: FreeBSD 9-Stable + Atom D510 Freeze

2013-09-20 Thread Gary Palmer

On Fri, Sep 20, 2013 at 09:12:09AM -0400, Thomas Laus wrote:
> > Tom,
> > I have had multiple D510's and now D525's that are part of my test
> > systems, all are 4GB machines and all run the latest (ie 2 days old) 9.X
> > Stable.  They're faultless.  I have a D510 in production serving 30
> > users - yes its a 1G system running, sendmail, squid, samba as PDC. 
> > It's been in place for at least 7 months and runs without any hiccups.
> > 
> > Though I would point out that the Atom processor does NOT do out of
> > order processing, so a VIA motherboard that is of lower GHz builds
> > worlds/ports in less time that a supposedly faster Atom. 
> > 
> > Your question re HT, yes HT introduces some additional latency, but is
> > unlikely to be the problem.
> >
> Thanks for the information about the HT CPU's.  I asked the question to the 
> group because I did not know if they were functionally any different than a 
> traditional CPU.  I successfully built my problem port, Tshark, yesterday 
> while monitoring 'top' on another console.  I observed that all 4 cpu's were 
> in service for the build and at times were running at 100 percent each.  The 
> State column on all 4 occasionally showed a 'pfault' on all 4 but recovered 
> and the build continued to successful completion.
>  
> > When I experience something like spurious reboots and it is definately
> > not hardware, then I delete /usr/src and /usr/ports and perform a
> > complete rebuild.  (Yes seriously, and on the Atom's we're talking days,
> > aren't we :)  )
> >
> I have been using this Atom D510 since it was released about 3 years ago.  It 
> ran on FreeBSD 8-Stable until about a month ago.  I installed an Intel 520 
> SSD and loaded a fresh copy of a FreeBSD 9 Snapshot.  After getting the 
> source and ports tarballs, I used svnup to bring both up to date.  I built 
> and installed world and the kernel to bring me up to Stable.  I rebuilt all 
> of my ports using Portmaster.
> 
> The spurious reboot issue existed for the last 3 years when running FreeBSD-8 
> Stable.  I never had the problem building world or kernel.  It only occurred 
> when building some ports.  Subversion and Tshark more often than others.  
> FreeBSD 9-Stable was frozen when I tried to build tshark, but I was able to 
> build it OK yesterday.  Everything hardware related other than the Atom 
> microprocessor and the Intel motherboard itself is new.  The OS is now a 
> different version and all of the source was rebuilt monthly.  The ports have 
> been been built many times in the last 3 years.

When building kernel & world do you use the '-j' argument to do parallel
builds?  AFAIK that's not done by default, but it is for some ports.

Gary

Re: FreeBSD 9-Stable + Atom D510 Freeze

2013-09-20 Thread Thomas Laus

> Tom,
> I have had multiple D510's and now D525's that are part of my test
> systems, all are 4GB machines and all run the latest (ie 2 days old) 9.X
> Stable.  They're faultless.  I have a D510 in production serving 30
> users - yes its a 1G system running, sendmail, squid, samba as PDC. 
> It's been in place for at least 7 months and runs without any hiccups.
> 
> Though I would point out that the Atom processor does NOT do out of
> order processing, so a VIA motherboard that is of lower GHz builds
> worlds/ports in less time than a supposedly faster Atom. 
> 
> Your question re HT, yes HT introduces some additional latency, but is
> unlikely to be the problem.
>
Thanks for the information about the HT CPUs.  I asked the question to the 
group because I did not know if they were functionally any different from a 
traditional CPU.  I successfully built my problem port, Tshark, yesterday 
while monitoring 'top' on another console.  I observed that all 4 CPUs were 
in service for the build and at times were running at 100 percent each.  The 
State column occasionally showed 'pfault' on all 4, but they recovered 
and the build continued to successful completion.
 
> When I experience something like spurious reboots and it is definitely
> not hardware, then I delete /usr/src and /usr/ports and perform a
> complete rebuild.  (Yes seriously, and on the Atom's we're talking days,
> aren't we :)  )
>
I have been using this Atom D510 since it was released about 3 years ago.  It 
ran on FreeBSD 8-Stable until about a month ago.  I installed an Intel 520 
SSD and loaded a fresh copy of a FreeBSD 9 Snapshot.  After getting the 
source and ports tarballs, I used svnup to bring both up to date.  I built 
and installed world and the kernel to bring me up to Stable.  I rebuilt all 
of my ports using Portmaster.

The spurious reboot issue existed for the last 3 years when running FreeBSD 
8-Stable.  I never had the problem building world or kernel.  It only occurred 
when building some ports, Subversion and Tshark more often than others.  
FreeBSD 9-Stable froze when I tried to build tshark, but I was able to 
build it OK yesterday.  Everything hardware related other than the Atom 
microprocessor and the Intel motherboard itself is new.  The OS is now a 
different version and all of the source was rebuilt monthly.  The ports have 
been built many times in the last 3 years.

Tom

-- 
Public Keys:
PGP KeyID = 0x5F22FDC1
GnuPG KeyID = 0x620836CF


FreeBSD 9-Stable + Atom D510 Freeze

2013-09-19 Thread Thomas Laus

I have an Intel Atom D510 motherboard that has been used in my home router for
the last several years.  It started on FreeBSD 8-Stable and was recently upgraded
to FreeBSD 9-Stable.  Through the years I have observed spurious reboots when
rebuilding ports, but never world or kernel.  I have tried both schedulers in
FreeBSD 8-Stable.  I have also replaced memory, power supply and disk drives
to attempt to isolate hardware from the equation.  Last evening I had a complete
freeze when rebuilding tshark.  The keyboard was dead, the screen display was
frozen and there was no network access.  I recovered by pressing the reset
switch.  As always, there are no log entries about a panic and no core dumps in
the swap partition.

My question to the group is whether FreeBSD is correctly identifying the number
of CPUs on this motherboard.  I see 4 listed in the top utility and it appears
that code is being run on all 4.  Are HT CPUs equal in performance to 'real'
ones and should they participate fully in the task scheduler operation?  Since
my problem is very intermittent and non-reproducible, is it possible that code
may try to exercise something in a HT core that should only be run on a 'real'
one?

My DMESG:

Sep 18 20:50:19 mail kernel: Copyright (c) 1992-2013 The FreeBSD Project.
Sep 18 20:50:19 mail kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
Sep 18 20:50:19 mail kernel: The Regents of the University of California. All rights reserved.
Sep 18 20:50:19 mail kernel: FreeBSD is a registered trademark of The FreeBSD Foundation.
Sep 18 20:50:19 mail kernel: FreeBSD 9.2-PRERELEASE #2: Sat Sep 14 18:27:55 EDT 2013
Sep 18 20:50:19 mail kernel: root@x.x.x:/usr/obj/usr/src/sys/ROUTER amd64
Sep 18 20:50:19 mail kernel: gcc version 4.2.1 20070831 patched [FreeBSD]
Sep 18 20:50:19 mail kernel: CPU: Intel(R) Atom(TM) CPU D510   @ 1.66GHz (1662.72-MHz K8-class CPU)
Sep 18 20:50:19 mail kernel: Origin = "GenuineIntel"  Id = 0x106ca  Family = 0x6  Model = 0x1c  Stepping = 10
Sep 18 20:50:19 mail kernel: Features=0xbfebfbff
Sep 18 20:50:19 mail kernel: Features2=0x40e31d
Sep 18 20:50:19 mail kernel: AMD Features=0x20100800
Sep 18 20:50:19 mail kernel: AMD Features2=0x1
Sep 18 20:50:19 mail kernel: TSC: P-state invariant, performance statistics
Sep 18 20:50:19 mail kernel: real memory  = 1073741824 (1024 MB)
Sep 18 20:50:19 mail kernel: avail memory = 1002127360 (955 MB)
Sep 18 20:50:19 mail kernel: Event timer "LAPIC" quality 400
Sep 18 20:50:19 mail kernel: ACPI APIC Table: 
Sep 18 20:50:19 mail kernel: FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
Sep 18 20:50:19 mail kernel: FreeBSD/SMP: 1 package(s) x 2 core(s) x 2 HTT threads
Sep 18 20:50:19 mail kernel: cpu0 (BSP): APIC ID:  0
Sep 18 20:50:19 mail kernel: cpu1 (AP/HT): APIC ID:  1
Sep 18 20:50:19 mail kernel: cpu2 (AP): APIC ID:  2
Sep 18 20:50:19 mail kernel: cpu3 (AP/HT): APIC ID:  3
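
As a cross-check on the CPU count, the SMP topology line in the dmesg multiplies out to the number of logical CPUs (1 x 2 x 2 = 4 here, matching what top shows). A small sketch in POSIX sh that does this arithmetic on a log line; the function name `smp_logical_cpus` is made up for illustration, and the pattern assumes the message format shown in this dmesg.

```shell
#!/bin/sh
# smp_logical_cpus: read log text on stdin, find the FreeBSD/SMP topology
# line, and print packages * cores * threads. Hypothetical helper.
smp_logical_cpus() {
    sed -n 's/.*FreeBSD\/SMP: \([0-9][0-9]*\) package(s) x \([0-9][0-9]*\) core(s) x \([0-9][0-9]*\) HTT.*/\1 \2 \3/p' |
    while read -r packages cores threads; do
        echo $((packages * cores * threads))
    done
}

# Example:
#   dmesg | smp_logical_cpus
```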

Tom

-- 
Public Keys:
PGP KeyID = 0x5F22FDC1
GnuPG KeyID = 0x620836CF


Re: stopping amd causes a freeze

2013-08-18 Thread Dominic Fandrey

On 28/07/2013 08:24, Konstantin Belousov wrote:
> On Sat, Jul 27, 2013 at 10:33:18AM +0200, Dominic Fandrey wrote:
>> On 26/07/2013 19:10, Dominic Fandrey wrote:
>>> On 25/07/2013 12:00, Konstantin Belousov wrote:
>>>> On Thu, Jul 25, 2013 at 09:56:59AM +0200, Dominic Fandrey wrote:
>>>>> On 22/07/2013 12:07, Konstantin Belousov wrote:
>>>>>> On Mon, Jul 22, 2013 at 11:50:24AM +0200, Dominic Fandrey wrote:
>>>>>>> ...
>>>>>>>
>>>>>>> I run amd through sysutils/automounter, which is a scripting solution
>>>>>>> that generates an amd.map file based on encountered devices and devd
>>>>>>> events. The SIGHUP it sends to amd to tell it the map file was updated
>>>>>>> does not cause problems, only a -SIGKILL- SIGTERM may cause the freeze.
>>>>>>>
>>>>>>> Nothing was mounted (by amd) during the last freeze.
>>>>>>>
>>>>>>> ...
>>>>>>
>>>>>> Are you sure that the machine did not panic?  Do you have a serial
>>>>>> console?
>>>>>>
>>>>>> The amd(8) locks itself into memory, most likely due to the fear of
>>>>>> deadlock. There are some known issues with user wirings in stable/9.
>>>>>> If the problem you see is indeed due to wiring, you might try to apply
>>>>>> r253187-r253191.
>>>>>
>>>>> I tried that. Applying the diff was straightforward enough. But the
>>>>> resulting kernel paniced as soon as it tried to mount the root fs.
>>>> You did provided a useful info to diagnose the issue.
>>>>
>>>> Patch should keep KBI compatible, but, just in case, if you have any
>>>> third-party module, rebuild it.
>>>>
>>>>>
>>>>> So I'll wait for the MFC from someone who knows what he/she is doing.
>>>>
>>>> Patch below booted for me, and I run some sanity check tests for the
>>>> mlockall(2), which also did not resulted in misbehaviour.
>>>>
>>>
>>> Your patch applied cleanly and the system booted with the resulting
>>> kernel.
>>>
>>> Amd exhibits several very strange behaviours. ...
>>
>> I can verify the whole thing with a clean world and kernel.
>>
>> This time I'll concentrate on the first instance of amd:
>>
>> # tail -n3 /var/log/messages
>> Jul 27 10:08:56 mobileKamikaze kernel: newnfs server pid5868@mobileKamikaze:/var/run/automounter.amd.mnt: not responding
>> Jul 27 10:09:41 mobileKamikaze kernel: newnfs server pid5868@mobileKamikaze:/var/run/automounter.amd.mnt: not responding
>> Jul 27 10:11:41 mobileKamikaze last message repeated 3 times
>>
>> The process, it turns out, simply doesn't exist. There is another
>> process, though:
>> # ps auxww | grep -F sbin/amd
>> root   5869   0.0  0.1  12036   8020 ??  S    10:08am   0:00.01 /usr/sbin/amd -r -p -a /var/run/automounter.amd -c 4 -w 2 /var/run/automounter.amd.mnt /var/run/automounter.amd.map
>>
>> # cat /var/run/automounter.amd.pid
>> 5868
>>
>> Here is what I think happens, amd forks a subprocess and the main
>> process, silently dies after it wrote its pidfile.
> Nothing dies silently.  Either process was killed by signal, or it
> exited with the explicit call to exit(2).  In the first case, default
> kernel settings of kern.logsigexit should make a record in the syslog.
> The machdep.uprintf_signal might be also useful, but not for daemons.

Well, it finally turned out that amd came up in this broken state with
missing processes because rpcbind wasn't running.

I think it would be a good idea for amd to fail with a bit of noise
instead of coming up broken, causing the kernel to spam syslog, and
confusing the user.
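
Until amd complains on its own, the mismatch seen earlier in this thread (pidfile names a process that no longer exists) can be detected by hand. A minimal sketch in POSIX sh; the function name `pidfile_alive` is made up for illustration, and the pidfile path in the example is the one from the output quoted above.

```shell
#!/bin/sh
# pidfile_alive FILE: succeed only if FILE holds a numeric PID that refers
# to a process we can signal. Hypothetical helper.
# Note: kill -0 also fails for a live process owned by another user,
# so run this as root when checking daemons.
pidfile_alive() {
    [ -r "$1" ] || return 1
    pid=$(head -n 1 "$1")
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    kill -0 "$pid" 2>/dev/null
}

# Example:
#   pidfile_alive /var/run/automounter.amd.pid || echo "amd pidfile is stale"
```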

At this point I'd usually pull whoever works on amd into the conversation,
but the most recent change to src/contrib/amd is 4 years old.

-- 
A: Because it fouls the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail? 

Re: stopping amd causes a freeze

2013-08-17 Thread Dominic Fandrey

On 28/07/2013 11:00, Daniel Braniss wrote:
>> On 28/07/2013 08:24, Konstantin Belousov wrote:
>>> On Sat, Jul 27, 2013 at 10:33:18AM +0200, Dominic Fandrey wrote:
>>>> On 26/07/2013 19:10, Dominic Fandrey wrote:
>>>>> On 25/07/2013 12:00, Konstantin Belousov wrote:
>>>>>> On Thu, Jul 25, 2013 at 09:56:59AM +0200, Dominic Fandrey wrote:
>>>>>>> On 22/07/2013 12:07, Konstantin Belousov wrote:
>>>>>>>> On Mon, Jul 22, 2013 at 11:50:24AM +0200, Dominic Fandrey wrote:
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>> I run amd through sysutils/automounter, which is a scripting solution
>>>>>>>>> that generates an amd.map file based on encountered devices and devd
>>>>>>>>> events. The SIGHUP it sends to amd to tell it the map file was updated
>>>>>>>>> does not cause problems, only a -SIGKILL- SIGTERM may cause the 
>>>>>>>>> freeze.
>>>>>>>>>
>>>>>>>>> Nothing was mounted (by amd) during the last freeze.
>>>>>>>>>
>>>>>>>>> ...

Thank you everyone, after updating to stable/9 r254418 the problem
has dissipated.


-- 
A: Because it fouls the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail? 

Re: stopping amd causes a freeze

2013-07-28 Thread Daniel Braniss

> On 28/07/2013 08:24, Konstantin Belousov wrote:
> > On Sat, Jul 27, 2013 at 10:33:18AM +0200, Dominic Fandrey wrote:
> >> On 26/07/2013 19:10, Dominic Fandrey wrote:
> >>> On 25/07/2013 12:00, Konstantin Belousov wrote:
> >>>> On Thu, Jul 25, 2013 at 09:56:59AM +0200, Dominic Fandrey wrote:
> >>>>> On 22/07/2013 12:07, Konstantin Belousov wrote:
> >>>>>> On Mon, Jul 22, 2013 at 11:50:24AM +0200, Dominic Fandrey wrote:
> >>>>>>> ...
> >>>>>>>
> >>>>>>> I run amd through sysutils/automounter, which is a scripting solution
> >>>>>>> that generates an amd.map file based on encountered devices and devd
> >>>>>>> events. The SIGHUP it sends to amd to tell it the map file was updated
> >>>>>>> does not cause problems, only a -SIGKILL- SIGTERM may cause the 
> >>>>>>> freeze.
> >>>>>>>
> >>>>>>> Nothing was mounted (by amd) during the last freeze.
> >>>>>>>
> >>>>>>> ...
> >>>>>>
> >>>>>> Are you sure that the machine did not panic?  Do you have a serial
> >>>>>> console?
> >>>>>>
> >>>>>> The amd(8) locks itself into memory, most likely due to the fear of
> >>>>>> deadlock. There are some known issues with user wirings in stable/9.
> >>>>>> If the problem you see is indeed due to wiring, you might try to apply
> >>>>>> r253187-r253191.
> >>>>>
> >>>>> I tried that. Applying the diff was straightforward enough. But the
> >>>>> resulting kernel paniced as soon as it tried to mount the root fs.
> >>>> You did provided a useful info to diagnose the issue.
> >>>>
> >>>> Patch should keep KBI compatible, but, just in case, if you have any
> >>>> third-party module, rebuild it.
> >>>>
> >>>>>
> >>>>> So I'll wait for the MFC from someone who knows what he/she is doing.
> >>>>
> >>>> Patch below booted for me, and I run some sanity check tests for the
> >>>> mlockall(2), which also did not resulted in misbehaviour.
> >>>>
> >>>
> >>> Your patch applied cleanly and the system booted with the resulting
> >>> kernel.
> >>>
> >>> Amd exhibits several very strange behaviours. ...
> >>
> >> I can verify the whole thing with a clean world and kernel.
> >>
> >> This time I'll concentrate on the first instance of amd:
> >>
> >> # tail -n3 /var/log/messages
> >> Jul 27 10:08:56 mobileKamikaze kernel: newnfs server pid5868@mobileKamikaze:/var/run/automounter.amd.mnt: not responding
> >> Jul 27 10:09:41 mobileKamikaze kernel: newnfs server pid5868@mobileKamikaze:/var/run/automounter.amd.mnt: not responding
> >> Jul 27 10:11:41 mobileKamikaze last message repeated 3 times
> >>
> >> The process, it turns out, simply doesn't exist. There is another
> >> process, though:
> >> # ps auxww | grep -F sbin/amd
> >> root   5869   0.0  0.1  12036   8020 ??  S    10:08am   0:00.01 /usr/sbin/amd -r -p -a /var/run/automounter.amd -c 4 -w 2 /var/run/automounter.amd.mnt /var/run/automounter.amd.map
> >>
> >> # cat /var/run/automounter.amd.pid
> >> 5868
> >>
> >> Here is what I think happens, amd forks a subprocess and the main
> >> process, silently dies after it wrote its pidfile.
> > Nothing dies silently.  Either process was killed by signal, or it
> > exited with the explicit call to exit(2).  In the first case, default
> > kernel settings of kern.logsigexit should make a record in the syslog.
> > The machdep.uprintf_signal might be also useful, but not for daemons.
> 
> Well, after I reverted your patch I got some things in the syslog.
> Sometimes amd works as expected, sometimes it dies right after starting:
> Jul 28 10:19:42 mobileKamikaze kernel: pid 24217 (amd), uid 0: exited on signal 11 (core dumped)
> 
> This is just all over confusing.

just to confuse you a bit more :-)
I gave up with mlockall(2) so I compiled amd statically linked.

my 5 cents.

danny



Re: stopping amd causes a freeze

2013-07-28 Thread Dominic Fandrey

On 28/07/2013 08:24, Konstantin Belousov wrote:
> On Sat, Jul 27, 2013 at 10:33:18AM +0200, Dominic Fandrey wrote:
>> On 26/07/2013 19:10, Dominic Fandrey wrote:
>>> On 25/07/2013 12:00, Konstantin Belousov wrote:
>>>> On Thu, Jul 25, 2013 at 09:56:59AM +0200, Dominic Fandrey wrote:
>>>>> On 22/07/2013 12:07, Konstantin Belousov wrote:
>>>>>> On Mon, Jul 22, 2013 at 11:50:24AM +0200, Dominic Fandrey wrote:
>>>>>>> ...
>>>>>>>
>>>>>>> I run amd through sysutils/automounter, which is a scripting solution
>>>>>>> that generates an amd.map file based on encountered devices and devd
>>>>>>> events. The SIGHUP it sends to amd to tell it the map file was updated
>>>>>>> does not cause problems, only a -SIGKILL- SIGTERM may cause the freeze.
>>>>>>>
>>>>>>> Nothing was mounted (by amd) during the last freeze.
>>>>>>>
>>>>>>> ...
>>>>>>
>>>>>> Are you sure that the machine did not panic?  Do you have a serial
>>>>>> console?
>>>>>>
>>>>>> The amd(8) locks itself into memory, most likely due to the fear of
>>>>>> deadlock. There are some known issues with user wirings in stable/9.
>>>>>> If the problem you see is indeed due to wiring, you might try to apply
>>>>>> r253187-r253191.
>>>>>
>>>>> I tried that. Applying the diff was straightforward enough. But the
>>>>> resulting kernel paniced as soon as it tried to mount the root fs.
>>>> You did provided a useful info to diagnose the issue.
>>>>
>>>> Patch should keep KBI compatible, but, just in case, if you have any
>>>> third-party module, rebuild it.
>>>>
>>>>>
>>>>> So I'll wait for the MFC from someone who knows what he/she is doing.
>>>>
>>>> Patch below booted for me, and I run some sanity check tests for the
>>>> mlockall(2), which also did not resulted in misbehaviour.
>>>>
>>>
>>> Your patch applied cleanly and the system booted with the resulting
>>> kernel.
>>>
>>> Amd exhibits several very strange behaviours. ...
>>
>> I can verify the whole thing with a clean world and kernel.
>>
>> This time I'll concentrate on the first instance of amd:
>>
>> # tail -n3 /var/log/messages
>> Jul 27 10:08:56 mobileKamikaze kernel: newnfs server pid5868@mobileKamikaze:/var/run/automounter.amd.mnt: not responding
>> Jul 27 10:09:41 mobileKamikaze kernel: newnfs server pid5868@mobileKamikaze:/var/run/automounter.amd.mnt: not responding
>> Jul 27 10:11:41 mobileKamikaze last message repeated 3 times
>>
>> The process, it turns out, simply doesn't exist. There is another
>> process, though:
>> # ps auxww | grep -F sbin/amd
>> root   5869   0.0  0.1  12036   8020 ??  S    10:08am   0:00.01 /usr/sbin/amd -r -p -a /var/run/automounter.amd -c 4 -w 2 /var/run/automounter.amd.mnt /var/run/automounter.amd.map
>>
>> # cat /var/run/automounter.amd.pid
>> 5868
>>
>> Here is what I think happens, amd forks a subprocess and the main
>> process, silently dies after it wrote its pidfile.
> Nothing dies silently.  Either process was killed by signal, or it
> exited with the explicit call to exit(2).  In the first case, default
> kernel settings of kern.logsigexit should make a record in the syslog.
> The machdep.uprintf_signal might be also useful, but not for daemons.

Well, after I reverted your patch I got some entries in the syslog.
Sometimes amd works as expected, sometimes it dies right after starting:
Jul 28 10:19:42 mobileKamikaze kernel: pid 24217 (amd), uid 0: exited on signal 11 (core dumped)

This is all just confusing.

-- 
A: Because it fouls the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail? 
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: stopping amd causes a freeze

2013-07-27 Thread Konstantin Belousov

On Sat, Jul 27, 2013 at 10:33:18AM +0200, Dominic Fandrey wrote:
> On 26/07/2013 19:10, Dominic Fandrey wrote:
> > On 25/07/2013 12:00, Konstantin Belousov wrote:
> >> On Thu, Jul 25, 2013 at 09:56:59AM +0200, Dominic Fandrey wrote:
> >>> On 22/07/2013 12:07, Konstantin Belousov wrote:
> >>>> On Mon, Jul 22, 2013 at 11:50:24AM +0200, Dominic Fandrey wrote:
> >>>>> ...
> >>>>>
> >>>>> I run amd through sysutils/automounter, which is a scripting solution
> >>>>> that generates an amd.map file based on encountered devices and devd
> >>>>> events. The SIGHUP it sends to amd to tell it the map file was updated
> >>>>> does not cause problems, only a -SIGKILL- SIGTERM may cause the freeze.
> >>>>>
> >>>>> Nothing was mounted (by amd) during the last freeze.
> >>>>>
> >>>>> ...
> >>>>
> >>>> Are you sure that the machine did not paniced ?  Do you have serial 
> >>>> console ?
> >>>>
> >>>> The amd(8) locks itself into memory, most likely due to the fear of
> >>>> deadlock. There are some known issues with user wirings in stable/9.
> >>>> If the problem you see is indeed due to wiring, you might try to apply
> >>>> r253187-r253191.
> >>>
> >>> I tried that. Applying the diff was straightforward enough. But the
> >>> resulting kernel paniced as soon as it tried to mount the root fs.
> >> You did provided a useful info to diagnose the issue.
> >>
> >> Patch should keep KBI compatible, but, just in case, if you have any
> >> third-party module, rebuild it.
> >>
> >>>
> >>> So I'll wait for the MFC from someone who knows what he/she is doing.
> >>
> >> Patch below booted for me, and I run some sanity check tests for the
> >> mlockall(2), which also did not resulted in misbehaviour.
> >>
> > 
> > Your patch applied cleanly and the system booted with the resulting
> > kernel.
> > 
> > Amd exhibits several very strange behaviours. ...
> 
> I can verify the whole thing with a clean world and kernel.
> 
> This time I'll concentrate on the first instance of amd:
> 
> # tail -n3 /var/log/messages
> Jul 27 10:08:56 mobileKamikaze kernel: newnfs server 
> pid5868@mobileKamikaze:/var/run/automounter.amd.mnt: not responding
> Jul 27 10:09:41 mobileKamikaze kernel: newnfs server 
> pid5868@mobileKamikaze:/var/run/automounter.amd.mnt: not responding
> Jul 27 10:11:41 mobileKamikaze last message repeated 3 times
> 
> The process, it turns out, simply doesn't exist. There is another
> process, though:
> # ps auxww | grep -F sbin/amd
> root   5869   0.0  0.1  12036   8020 ??  S10:08am   0:00.01 
> /usr/sbin/amd -r -p -a /var/run/automounter.amd -c 4 -w 2 
> /var/run/automounter.amd.mnt /var/run/automounter.amd.map
> 
> # cat /var/run/automounter.amd.pid
> 5868
> 
> Here is what I think happens, amd forks a subprocess and the main
> process, silently dies after it wrote its pidfile.
Nothing dies silently.  Either the process was killed by a signal, or it
exited with an explicit call to exit(2).  In the first case, the default
kernel setting of kern.logsigexit should make a record in the syslog.
The machdep.uprintf_signal sysctl might also be useful, but not for daemons.

If the process called exit(2), ktrace would show it.
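
Both cases can be checked from a shell. What follows is a minimal sketch, assuming a FreeBSD system with ktrace(1) and kdump(1) available; the helper name find_signal_exits and the trace file path are illustrative only, and the search pattern is modelled on the kernel's "exited on signal" syslog record.

```sh
#!/bin/sh
# Sketch only: find_signal_exits is a hypothetical helper name.

# Print syslog records written by the kernel when a process named $1
# was killed by a signal (these are logged when kern.logsigexit is 1).
# $2 is the log file to search, /var/log/messages by default.
find_signal_exits() {
    prog=$1
    log=${2:-/var/log/messages}
    grep -E "pid [0-9]+ \(${prog}\), uid [0-9]+: exited on signal [0-9]+" "$log"
}

# Usage on the affected machine (FreeBSD-specific, not executed here):
#   sysctl kern.logsigexit          # should print 1
#   find_signal_exits amd
#   ktrace -i -f /tmp/amd.ktrace /usr/sbin/amd <usual amd arguments>
#   kdump -f /tmp/amd.ktrace | grep -w exit
```

The -i flag makes ktrace follow children created after tracing starts, which matters here because amd forks.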

> 
> For completeness:
> # mount
> /dev/ufs/5root on / (ufs, local, noatime, soft-updates)
> devfs on /dev (devfs, local, multilabel)
> /dev/ufs/5stor on /pool/5stor (ufs, local, noatime, soft-updates)
> /pool/5stor/usr on /usr (nullfs, local, noatime)
> /pool/5stor/var on /var (nullfs, local, noatime)
> /usr/home/root on /root (nullfs, local, noatime)
> tmpfs on /var/log (tmpfs, local)
> tmpfs on /var/run (tmpfs, local)
> tmpfs on /tmp (tmpfs, local)
> 
> Everything else seems to work. I'll revert your patch for now and
> wait for the MFC.

I was unable to get useful information from any of your posts.
My current plan is to merge the revisions after the 9.2 freeze is over.



Re: stopping amd causes a freeze

2013-07-27 Thread Dominic Fandrey

On 26/07/2013 19:10, Dominic Fandrey wrote:
> On 25/07/2013 12:00, Konstantin Belousov wrote:
>> On Thu, Jul 25, 2013 at 09:56:59AM +0200, Dominic Fandrey wrote:
>>> On 22/07/2013 12:07, Konstantin Belousov wrote:
>>>> On Mon, Jul 22, 2013 at 11:50:24AM +0200, Dominic Fandrey wrote:
>>>>> ...
>>>>>
>>>>> I run amd through sysutils/automounter, which is a scripting solution
>>>>> that generates an amd.map file based on encountered devices and devd
>>>>> events. The SIGHUP it sends to amd to tell it the map file was updated
>>>>> does not cause problems, only a -SIGKILL- SIGTERM may cause the freeze.
>>>>>
>>>>> Nothing was mounted (by amd) during the last freeze.
>>>>>
>>>>> ...
>>>>
>>>> Are you sure that the machine did not paniced ?  Do you have serial 
>>>> console ?
>>>>
>>>> The amd(8) locks itself into memory, most likely due to the fear of
>>>> deadlock. There are some known issues with user wirings in stable/9.
>>>> If the problem you see is indeed due to wiring, you might try to apply
>>>> r253187-r253191.
>>>
>>> I tried that. Applying the diff was straightforward enough. But the
>>> resulting kernel paniced as soon as it tried to mount the root fs.
>> You did provided a useful info to diagnose the issue.
>>
>> Patch should keep KBI compatible, but, just in case, if you have any
>> third-party module, rebuild it.
>>
>>>
>>> So I'll wait for the MFC from someone who knows what he/she is doing.
>>
>> Patch below booted for me, and I run some sanity check tests for the
>> mlockall(2), which also did not resulted in misbehaviour.
>>
> 
> Your patch applied cleanly and the system booted with the resulting
> kernel.
> 
> Amd exhibits several very strange behaviours. ...

I can verify the whole thing with a clean world and kernel.

This time I'll concentrate on the first instance of amd:

# tail -n3 /var/log/messages
Jul 27 10:08:56 mobileKamikaze kernel: newnfs server pid5868@mobileKamikaze:/var/run/automounter.amd.mnt: not responding
Jul 27 10:09:41 mobileKamikaze kernel: newnfs server pid5868@mobileKamikaze:/var/run/automounter.amd.mnt: not responding
Jul 27 10:11:41 mobileKamikaze last message repeated 3 times

The process, it turns out, simply doesn't exist. There is another
process, though:
# ps auxww | grep -F sbin/amd
root   5869   0.0  0.1  12036   8020 ??  S    10:08am   0:00.01 /usr/sbin/amd -r -p -a /var/run/automounter.amd -c 4 -w 2 /var/run/automounter.amd.mnt /var/run/automounter.amd.map

# cat /var/run/automounter.amd.pid
5868

Here is what I think happens: amd forks a subprocess, and the main
process silently dies after writing its pidfile.
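
The mismatch between the pidfile and the running process can be checked mechanically. This is a minimal sketch, not part of sysutils/automounter; the helper name pidfile_status is hypothetical, and it relies only on kill -0, which tests for a process's existence without delivering a signal.

```sh
#!/bin/sh
# Sketch only: pidfile_status is a hypothetical helper name.

# Report whether the PID recorded in pidfile $1 belongs to a live process.
# Prints "alive <pid>", "stale <pid>" or "missing"; returns 0 only if alive.
pidfile_status() {
    pidfile=$1
    if [ ! -r "$pidfile" ]; then
        echo "missing"
        return 2
    fi
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only checks that the process exists
    # and that we are allowed to signal it.
    if kill -0 "$pid" 2>/dev/null; then
        echo "alive $pid"
        return 0
    fi
    echo "stale $pid"
    return 1
}

# Usage with the pidfile from this report (not executed here):
#   pidfile_status /var/run/automounter.amd.pid
#   pgrep -lf sbin/amd      # list the amd processes that actually exist
```

With the values shown above this would print "stale 5868", while pgrep would list PID 5869.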

For completeness:
# mount
/dev/ufs/5root on / (ufs, local, noatime, soft-updates)
devfs on /dev (devfs, local, multilabel)
/dev/ufs/5stor on /pool/5stor (ufs, local, noatime, soft-updates)
/pool/5stor/usr on /usr (nullfs, local, noatime)
/pool/5stor/var on /var (nullfs, local, noatime)
/usr/home/root on /root (nullfs, local, noatime)
tmpfs on /var/log (tmpfs, local)
tmpfs on /var/run (tmpfs, local)
tmpfs on /tmp (tmpfs, local)

Everything else seems to work. I'll revert your patch for now and
wait for the MFC.


Re: stopping amd causes a freeze

2013-07-27 Thread Dominic Fandrey

On 26/07/2013 20:37, Artem Belevich wrote:
> On Fri, Jul 26, 2013 at 10:10 AM, Dominic Fandrey wrote:
> 
>> Amd exhibits several very strange behaviours.
>>
>> a)
>> During the first start it writes the wrong PID into the pidfile,
>> it however still reacts to SIGTERM.
>>
>> b)
>> After starting it again, it no longer reacts to SIGTERM.
>>
> 
> amd does block off signals in some of its sub-processes. For instance amd
> process that works as NFS server and handles amd mount points does block
> off INT/TERM/CHLD/HUP. See /usr/src/contrib/amd/amd/nfs_start.c

I didn't know that. But sending signals to the process in the pidfile
used to work™.

>> c)
>> It appear to be no longer reacting to SIGHUP, which is required to
>> tell it that the amd.map was updated.
>>
>>
> Try using 'amq -f' which would ask amd to reload its maps via RPC and
> should work regardless of whether you know the right PID.

Right now amq -m and amq -p just hang and do nothing. This will be good
to know once amd works again, though.

Sending a SIGINFO:
load: 0.58  cmd: amq 6071 [kqread] 4.71r 0.00u 0.00s 0% 2132k


Re: stopping amd causes a freeze

2013-07-26 Thread Artem Belevich

On Fri, Jul 26, 2013 at 10:10 AM, Dominic Fandrey wrote:

> Amd exhibits several very strange behaviours.
>
> a)
> During the first start it writes the wrong PID into the pidfile,
> it however still reacts to SIGTERM.
>
> b)
> After starting it again, it no longer reacts to SIGTERM.
>

amd does block signals in some of its subprocesses. For instance, the amd
process that works as the NFS server and handles amd mount points blocks
INT/TERM/CHLD/HUP. See /usr/src/contrib/amd/amd/nfs_start.c

>
> c)
> It appear to be no longer reacting to SIGHUP, which is required to
> tell it that the amd.map was updated.
>
>
Try using 'amq -f' which would ask amd to reload its maps via RPC and
should work regardless of whether you know the right PID.

Strangely enough, the amd man page does not mention SIGHUP at all.
amd/doc/am-utils.texi in the source tree does, but only when it talks about
hlfsd or about 'type:=auto' maps with the 'cache' option.
The documentation on am-utils.org matches am-utils.texi.

As far as I can tell 'amq -f' is the official way to tell amd that it
should reload maps.

--Artem

> d)
> It doesn't work at all, I only get:
> # cd /media/ufs/FreeBSD_Install
> /media/ufs/FreeBSD_Install: Too many levels of symbolic links.
>
> e)
> A SIGKILL without load will terminate the process. A SIGKILL while
> there is heavy file system load panics the system.
>
>
> I'll try a clean buildworld buildkernel and repeat.

Re: stopping amd causes a freeze

2013-07-26 Thread Dominic Fandrey

On 25/07/2013 12:00, Konstantin Belousov wrote:
> On Thu, Jul 25, 2013 at 09:56:59AM +0200, Dominic Fandrey wrote:
>> On 22/07/2013 12:07, Konstantin Belousov wrote:
>>> On Mon, Jul 22, 2013 at 11:50:24AM +0200, Dominic Fandrey wrote:
>>>> ...
>>>>
>>>> I run amd through sysutils/automounter, which is a scripting solution
>>>> that generates an amd.map file based on encountered devices and devd
>>>> events. The SIGHUP it sends to amd to tell it the map file was updated
>>>> does not cause problems, only a -SIGKILL- SIGTERM may cause the freeze.
>>>>
>>>> Nothing was mounted (by amd) during the last freeze.
>>>>
>>>> ...
>>>
>>> Are you sure that the machine did not paniced ?  Do you have serial console 
>>> ?
>>>
>>> The amd(8) locks itself into memory, most likely due to the fear of
>>> deadlock. There are some known issues with user wirings in stable/9.
>>> If the problem you see is indeed due to wiring, you might try to apply
>>> r253187-r253191.
>>
>> I tried that. Applying the diff was straightforward enough. But the
>> resulting kernel paniced as soon as it tried to mount the root fs.
> You did provided a useful info to diagnose the issue.
> 
> Patch should keep KBI compatible, but, just in case, if you have any
> third-party module, rebuild it.
> 
>>
>> So I'll wait for the MFC from someone who knows what he/she is doing.
> 
> Patch below booted for me, and I run some sanity check tests for the
> mlockall(2), which also did not resulted in misbehaviour.
> 

Your patch applied cleanly and the system booted with the resulting
kernel.

Amd exhibits several very strange behaviours.

a)
During the first start it writes the wrong PID into the pidfile;
however, it still reacts to SIGTERM.

b)
After starting it again, it no longer reacts to SIGTERM.

c)
It appears to no longer react to SIGHUP, which is required to
tell it that the amd.map was updated.

d)
It doesn't work at all, I only get:
# cd /media/ufs/FreeBSD_Install
/media/ufs/FreeBSD_Install: Too many levels of symbolic links.

e)
A SIGKILL without load will terminate the process. A SIGKILL while
there is heavy file system load panics the system.

I'll try a clean buildworld buildkernel and repeat.


Re: stopping amd causes a freeze

2013-07-25 Thread Konstantin Belousov

On Thu, Jul 25, 2013 at 09:56:59AM +0200, Dominic Fandrey wrote:
> On 22/07/2013 12:07, Konstantin Belousov wrote:
> > On Mon, Jul 22, 2013 at 11:50:24AM +0200, Dominic Fandrey wrote:
> >> ...
> >>
> >> I run amd through sysutils/automounter, which is a scripting solution
> >> that generates an amd.map file based on encountered devices and devd
> >> events. The SIGHUP it sends to amd to tell it the map file was updated
> >> does not cause problems, only a -SIGKILL- SIGTERM may cause the freeze.
> >>
> >> Nothing was mounted (by amd) during the last freeze.
> >>
> >> ...
> > 
> > Are you sure that the machine did not paniced ?  Do you have serial console 
> > ?
> > 
> > The amd(8) locks itself into memory, most likely due to the fear of
> > deadlock. There are some known issues with user wirings in stable/9.
> > If the problem you see is indeed due to wiring, you might try to apply
> > r253187-r253191.
> 
> I tried that. Applying the diff was straightforward enough. But the
> resulting kernel paniced as soon as it tried to mount the root fs.
You did not provide any useful info to diagnose the issue.

The patch should keep the KBI compatible, but, just in case, if you have any
third-party modules, rebuild them.

> 
> So I'll wait for the MFC from someone who knows what he/she is doing.

The patch below booted for me, and I ran some sanity-check tests for
mlockall(2), which also did not result in misbehaviour.

Index: kern/vfs_bio.c
===
--- kern/vfs_bio.c  (revision 253643)
+++ kern/vfs_bio.c  (working copy)
@@ -1614,7 +1614,8 @@ brelse(struct buf *bp)
(PAGE_SIZE - poffset) : resid;
 
KASSERT(presid >= 0, ("brelse: extra page"));
-   vm_page_set_invalid(m, poffset, presid);
+   if (pmap_page_wired_mappings(m) == 0)
+   vm_page_set_invalid(m, poffset, presid);
if (had_bogus)
printf("avoided corruption bug in bogus_page/brelse code\n");
}
Index: vm/vm_fault.c
===
--- vm/vm_fault.c   (revision 253643)
+++ vm/vm_fault.c   (working copy)
@@ -286,6 +286,19 @@ RetryFault:;
(u_long)vaddr);
}
 
+   if (fs.entry->eflags & MAP_ENTRY_IN_TRANSITION &&
+   fs.entry->wiring_thread != curthread) {
+   vm_map_unlock_read(fs.map);
+   vm_map_lock(fs.map);
+   if (vm_map_lookup_entry(fs.map, vaddr, &fs.entry) &&
+   (fs.entry->eflags & MAP_ENTRY_IN_TRANSITION)) {
+   fs.entry->eflags |= MAP_ENTRY_NEEDS_WAKEUP;
+   vm_map_unlock_and_wait(fs.map, 0);
+   } else
+   vm_map_unlock(fs.map);
+   goto RetryFault;
+   }
+
/*
 * Make a reference to this object to prevent its disposal while we
 * are messing with it.  Once we have the reference, the map is free
Index: vm/vm_map.c
===
--- vm/vm_map.c (revision 253643)
+++ vm/vm_map.c (working copy)
@@ -2272,6 +2272,7 @@ vm_map_unwire(vm_map_t map, vm_offset_t start, vm_
 * above.)
 */
entry->eflags |= MAP_ENTRY_IN_TRANSITION;
+   entry->wiring_thread = curthread;
/*
 * Check the map for holes in the specified region.
 * If VM_MAP_WIRE_HOLESOK was specified, skip this check.
@@ -2304,8 +2305,24 @@ done:
else
KASSERT(result, ("vm_map_unwire: lookup failed"));
}
-   entry = first_entry;
-   while (entry != &map->header && entry->start < end) {
+   for (entry = first_entry; entry != &map->header && entry->start < end;
+   entry = entry->next) {
+   /*
+* If VM_MAP_WIRE_HOLESOK was specified, an empty
+* space in the unwired region could have been mapped
+* while the map lock was dropped for draining
+* MAP_ENTRY_IN_TRANSITION.  Moreover, another thread
+* could be simultaneously wiring this new mapping
+* entry.  Detect these cases and skip any entries
+* marked as in transition by us.
+*/
+   if ((entry->eflags & MAP_ENTRY_IN_TRANSITION) == 0 ||
+   entry->wiring_thread != curthread) {
+

Re: stopping amd causes a freeze

2013-07-25 Thread Dominic Fandrey

On 22/07/2013 12:07, Konstantin Belousov wrote:
> On Mon, Jul 22, 2013 at 11:50:24AM +0200, Dominic Fandrey wrote:
>> ...
>>
>> I run amd through sysutils/automounter, which is a scripting solution
>> that generates an amd.map file based on encountered devices and devd
>> events. The SIGHUP it sends to amd to tell it the map file was updated
>> does not cause problems, only a -SIGKILL- SIGTERM may cause the freeze.
>>
>> Nothing was mounted (by amd) during the last freeze.
>>
>> ...
> 
> Are you sure that the machine did not paniced ?  Do you have serial console ?
> 
> The amd(8) locks itself into memory, most likely due to the fear of
> deadlock. There are some known issues with user wirings in stable/9.
> If the problem you see is indeed due to wiring, you might try to apply
> r253187-r253191.

I tried that. Applying the diff was straightforward enough. But the
resulting kernel panicked as soon as it tried to mount the root fs.

So I'll wait for the MFC from someone who knows what he/she is doing.

Regards


Re: stopping amd causes a freeze

2013-07-23 Thread Artem Belevich

On Tue, Jul 23, 2013 at 10:43 AM, Dominic Fandrey wrote:

> > Don't use KILL or make sure that nobody tries to use amd mountpoints
> > until new instance starts. Manually unmounting them before killing amd
> > may help.
> > Why not let amd do it itself with "/etc/rc.d/amd stop" ?
>
> That was a typo, I'm using SIGTERM. Sorry about that.
>
>
On SIGTERM amd will attempt to unmount its mountpoints. If someone is using
them, the unmount may not succeed. I've no clue what amd does in such a case.

The point is that you should treat an amd restart like the reboot of an NFS
server. An amd map reload does not really require an amd restart. In some cases
you may have to manually unmount some automounted filesystem if the underlying
map has changed, but that's the only case I can think of off the top of my
head. In most cases "amq -f" worked well enough for me.

By the way, are you absolutely sure that your script that restarts amd is
guaranteed not to touch anything mounted with amd? Otherwise you're risking
a deadlock. For example, if PATH contains amd-mounted directory then when
it's time to execute next command your script may attempt to touch such
path and may hang waiting for amd to respond which will never happen
because the script can't start it.

Now, back to debugging your problem. One way to check what's going on would
be to figure out where the processes get stuck.
Start with "ps -axl" and look at the STAT field. Chances are that stuck
processes will be in uninterruptible sleep state 'D'. Check the MWCHAN field
for those. Hitting '^T', which normally sends SIGINFO, should also produce a
message that includes the process's wait channel; it is convenient when you
have the console where you started the hung app.

Dig further into the sleeping process with "procstat -kk PID" -- it will
give you an in-kernel stack trace for the process's threads, which should show
what's going on. You may want to do it from a root login with a local home
directory and a minimalistic PATH so it does not touch any amd mount points.
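
The steps above can be sketched as a small shell helper. This is an illustration under assumptions, not a tested recipe for this machine: the helper name filter_uninterruptible is hypothetical, and the usage lines assume FreeBSD's ps(1) output keywords pid, stat, mwchan and command.

```sh
#!/bin/sh
# Sketch only: filter_uninterruptible is a hypothetical helper name.

# Read "PID STAT MWCHAN COMMAND..." lines on stdin and print those whose
# state starts with D (uninterruptible sleep), keeping the header line.
filter_uninterruptible() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Usage on the affected machine (FreeBSD-specific, not executed here), from
# a root shell with a minimal environment so nothing touches amd mount points:
#   env -i PATH=/sbin:/bin:/usr/sbin:/usr/bin /bin/sh
#   cd /
#   ps -ax -o pid,stat,mwchan,command | filter_uninterruptible
#   procstat -kk PID        # in-kernel stack of each thread of a stuck PID
```

Selecting columns by keyword rather than parsing "ps -axl" keeps the filter independent of the column order of the long format.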

--Artem

Re: stopping amd causes a freeze

2013-07-23 Thread Dominic Fandrey

On 22/07/2013 20:05, Artem Belevich wrote:
> On Mon, Jul 22, 2013 at 2:50 AM, Dominic Fandrey wrote:
> 
>> Occasionally stopping amd freezes my system. It's a rare occurrence,
>> and I haven't found a reliable way to reproduce it.
>>
>> It's also a real freeze, so there's no way to get into the debugger
>> or grab a core dump. I only can perform the 4 seconds hard shutdown to
>> revive the system.
>>
>> I run amd through sysutils/automounter, which is a scripting solution
>> that generates an amd.map file based on encountered devices and devd
>> events. The SIGHUP it sends to amd to tell it the map file was updated
>> does not cause problems, only a SIGKILL may cause the freeze.
>>
>> Nothing was mounted (by amd) during the last freeze.
>>
>>
> ...
> 
> 
>> I don't see any angle to tackle this, but I'm throwing it out here
>> any way, in the hopes that someone actually has an idea how to approach
>> the issue.
>>
> 
> Don't use KILL or make sure that nobody tries to use amd mountpoints until
> new instance starts. Manually unmounting them before killing amd may help.
> Why not let amd do it itself with "/etc/rc.d/amd stop" ?

That was a typo, I'm using SIGTERM. Sorry about that.


Re: stopping amd causes a freeze

2013-07-22 Thread Artem Belevich

On Mon, Jul 22, 2013 at 2:50 AM, Dominic Fandrey wrote:

> Occasionally stopping amd freezes my system. It's a rare occurrence,
> and I haven't found a reliable way to reproduce it.
>
> It's also a real freeze, so there's no way to get into the debugger
> or grab a core dump. I only can perform the 4 seconds hard shutdown to
> revive the system.
>
> I run amd through sysutils/automounter, which is a scripting solution
> that generates an amd.map file based on encountered devices and devd
> events. The SIGHUP it sends to amd to tell it the map file was updated
> does not cause problems, only a SIGKILL may cause the freeze.
>
> Nothing was mounted (by amd) during the last freeze.
>
>
amd itself is a primitive NFS server as far as the system is concerned, and amd
mount points are mounted from it. If you just KILL it without giving it a
chance to clean things up, you'll potentially end up in a situation similar
to mounting from a remote NFS server that's unresponsive. From mount_nfs(8):

 If the server becomes unresponsive while an NFS file system is mounted,
 any new or outstanding file operations on that file system will hang
 uninterruptibly until the server comes back.  To modify this default
 behaviour, see the intr and soft options.



> I don't see any angle to tackle this, but I'm throwing it out here
> any way, in the hopes that someone actually has an idea how to approach
> the issue.
>

Don't use KILL or make sure that nobody tries to use amd mountpoints until
new instance starts. Manually unmounting them before killing amd may help.
Why not let amd do it itself with "/etc/rc.d/amd stop" ?

--Artem


>
> # uname -a
> FreeBSD mobileKamikaze.norad 9.2-PRERELEASE FreeBSD 9.2-PRERELEASE #0
> r253413: Wed Jul 17 13:12:46 CEST 2013 
> root@mobileKamikaze.norad:/usr/obj/HP6510b-91/amd64/usr/src/sys/HP6510b-91
>  amd64
>
> That's amd's starting message:
> Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  no logfile defined; using
> stderr
> Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  AM-UTILS VERSION
> INFORMATION:
> Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  Copyright (c) 1997-2006
> Erez Zadok
> Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  Copyright (c) 1990
> Jan-Simon Pendry
> Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  Copyright (c) 1990
> Imperial College of Science, Technology & Medicine
> Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  Copyright (c) 1990 The
> Regents of the University of California.
> Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  am-utils version 6.1.5
> (build 901505).
> Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  Report bugs to
> https://bugzilla.am-utils.org/ or am-ut...@am-utils.org.
> Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  Configured by David
> O'Brien  on date 4-December-2007 PST.
> Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  Built by
> root@mobileKamikaze.norad.
> Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  cpu=amd64 (little-endian),
> arch=amd64, karch=amd64.
> Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  full_os=freebsd9.2,
> os=freebsd9, osver=9.2, vendor=undermydesk, distro=The FreeBSD Project.
> Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  domain=norad,
> host=mobileKamikaze, hostd=mobileKamikaze.norad.
> Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  Map support for: root,
> passwd, union, nis, ndbm, file, exec, error.
> Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  AMFS: nfs, link, nfsx,
> nfsl, host, linkx, program, union, ufs, cdfs,
> Jul 22 11:32:28 mobileKamikaze amd[8176]/info:pcfs, auto, direct,
> toplvl, error, inherit.
> Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  FS: cd9660, nfs, nfs3,
> nullfs, msdosfs, ufs, unionfs.
> Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  Network 1:
> wire="192.168.1.0" (netnumber=192.168.1).
> Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  Network 2:
> wire="192.168.0.0" (netnumber=192.168).
> Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  My ip addr is 127.0.0.1
>
> amd is called with the flags -r -p -a -c 4 -w 2

Re: stopping amd causes a freeze

2013-07-22 Thread Dominic Fandrey

On 22/07/2013 14:35, Ronald Klop wrote:
> On Mon, 22 Jul 2013 14:19:44 +0200, Dominic Fandrey  
> wrote:
> 
>> On 22/07/2013 12:07, Konstantin Belousov wrote:
>>> On Mon, Jul 22, 2013 at 11:50:24AM +0200, Dominic Fandrey wrote:
>>>> Occasionally stopping amd freezes my system. It's a rare occurrence,
>>>> and I haven't found a reliable way to reproduce it.
>>>>
>>>> It's also a real freeze, so there's no way to get into the debugger
>>>> or grab a core dump. I only can perform the 4 seconds hard shutdown to
>>>> revive the system.
>>>>
>>>> I run amd through sysutils/automounter, which is a scripting solution
>>>> that generates an amd.map file based on encountered devices and devd
>>>> events. The SIGHUP it sends to amd to tell it the map file was updated
>>>> does not cause problems, only a SIGKILL may cause the freeze.
>>>>
>>>> Nothing was mounted (by amd) during the last freeze.
>>>>
>>>> I don't see any angle to tackle this, but I'm throwing it out here
>>>> any way, in the hopes that someone actually has an idea how to approach
>>>> the issue.
>>>
>>> Are you sure that the machine did not paniced ?  Do you have serial console 
>>> ?
>>
>> No, I don't have one. All that I can tell is that everything freezes
>> (i.e. Xorg screen and mouse). ACPI events like shutdown don't cause a
>> reaction. And the system doesn't respond to ICMP queries.
>>
>>> The amd(8) locks itself into memory, most likely due to the fear of
>>> deadlock. There are some known issues with user wirings in stable/9.
>>> If the problem you see is indeed due to wiring, you might try to apply
>>> r253187-r253191.
>>
>> From head? That may be worth a try. It would be better for testing if I
>> managed to reproduce the problem reliably, before I test patches.
>>
>> I see it's scheduled for MFC, soon.
>>
> 
> Did you try a run with the INVARIANTS, etc. options in the kernel? That 
> enables more sanity checking for locks which is too slow for production.

No I didn't, but I managed to reproduce it in combination with heavy tmpfs
load. So now I've got a working test case and will be able to determine
whether the suggested fix works.

I suppose INVARIANTS would be the next step.


Re: stopping amd causes a freeze

2013-07-22 Thread Ronald Klop

On Mon, 22 Jul 2013 14:19:44 +0200, Dominic Fandrey wrote:

> On 22/07/2013 12:07, Konstantin Belousov wrote:
>> On Mon, Jul 22, 2013 at 11:50:24AM +0200, Dominic Fandrey wrote:
>>> Occasionally stopping amd freezes my system. It's a rare occurrence,
>>> and I haven't found a reliable way to reproduce it.
>>>
>>> It's also a real freeze, so there's no way to get into the debugger
>>> or grab a core dump. I only can perform the 4 seconds hard shutdown to
>>> revive the system.
>>>
>>> I run amd through sysutils/automounter, which is a scripting solution
>>> that generates an amd.map file based on encountered devices and devd
>>> events. The SIGHUP it sends to amd to tell it the map file was updated
>>> does not cause problems, only a SIGKILL may cause the freeze.
>>>
>>> Nothing was mounted (by amd) during the last freeze.
>>>
>>> I don't see any angle to tackle this, but I'm throwing it out here
>>> any way, in the hopes that someone actually has an idea how to approach
>>> the issue.
>>
>> Are you sure that the machine did not paniced ?  Do you have serial console ?
>
> No, I don't have one. All that I can tell is that everything freezes
> (i.e. Xorg screen and mouse). ACPI events like shutdown don't cause a
> reaction. And the system doesn't respond to ICMP queries.
>
>> The amd(8) locks itself into memory, most likely due to the fear of
>> deadlock. There are some known issues with user wirings in stable/9.
>> If the problem you see is indeed due to wiring, you might try to apply
>> r253187-r253191.
>
> From head? That may be worth a try. It would be better for testing if I
> managed to reproduce the problem reliably, before I test patches.
>
> I see it's scheduled for MFC, soon.



Did you try a run with the INVARIANTS, etc. options in the kernel? That  
enables more sanity checking for locks which is too slow for production.


Ronald.

Re: stopping amd causes a freeze

2013-07-22 Thread Dominic Fandrey

On 22/07/2013 12:07, Konstantin Belousov wrote:
> On Mon, Jul 22, 2013 at 11:50:24AM +0200, Dominic Fandrey wrote:
>> Occasionally stopping amd freezes my system. It's a rare occurrence,
>> and I haven't found a reliable way to reproduce it.
>>
>> It's also a real freeze, so there's no way to get into the debugger
>> or grab a core dump. I only can perform the 4 seconds hard shutdown to
>> revive the system.
>>
>> I run amd through sysutils/automounter, which is a scripting solution
>> that generates an amd.map file based on encountered devices and devd
>> events. The SIGHUP it sends to amd to tell it the map file was updated
>> does not cause problems, only a SIGKILL may cause the freeze.
>>
>> Nothing was mounted (by amd) during the last freeze.
>>
>> I don't see any angle to tackle this, but I'm throwing it out here
>> any way, in the hopes that someone actually has an idea how to approach
>> the issue.
> 
> Are you sure that the machine did not panic? Do you have a serial console?

No, I don't have one. All that I can tell is that everything freezes
(i.e. Xorg screen and mouse). ACPI events like shutdown don't cause a
reaction. And the system doesn't respond to ICMP queries.

> The amd(8) locks itself into memory, most likely due to the fear of
> deadlock. There are some known issues with user wirings in stable/9.
> If the problem you see is indeed due to wiring, you might try to apply
> r253187-r253191.

From head? That may be worth a try. It would be better for testing if I
managed to reproduce the problem reliably, before I test patches.

I see it's scheduled for MFC, soon.

-- 
A: Because it fouls the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail? 

Re: stopping amd causes a freeze

2013-07-22 Thread Konstantin Belousov

On Mon, Jul 22, 2013 at 11:50:24AM +0200, Dominic Fandrey wrote:
> Occasionally stopping amd freezes my system. It's a rare occurrence,
> and I haven't found a reliable way to reproduce it.
> 
> It's also a real freeze, so there's no way to get into the debugger
> or grab a core dump. I only can perform the 4 seconds hard shutdown to
> revive the system.
> 
> I run amd through sysutils/automounter, which is a scripting solution
> that generates an amd.map file based on encountered devices and devd
> events. The SIGHUP it sends to amd to tell it the map file was updated
> does not cause problems, only a SIGKILL may cause the freeze.
> 
> Nothing was mounted (by amd) during the last freeze.
> 
> I don't see any angle to tackle this, but I'm throwing it out here
> any way, in the hopes that someone actually has an idea how to approach
> the issue.

Are you sure that the machine did not panic? Do you have a serial console?

The amd(8) locks itself into memory, most likely due to the fear of
deadlock. There are some known issues with user wirings in stable/9.
If the problem you see is indeed due to wiring, you might try to apply
r253187-r253191.



stopping amd causes a freeze

2013-07-22 Thread Dominic Fandrey

Occasionally stopping amd freezes my system. It's a rare occurrence,
and I haven't found a reliable way to reproduce it.

It's also a real freeze, so there's no way to get into the debugger
or grab a core dump. I only can perform the 4 seconds hard shutdown to
revive the system.

I run amd through sysutils/automounter, which is a scripting solution
that generates an amd.map file based on encountered devices and devd
events. The SIGHUP it sends to amd to tell it the map file was updated
does not cause problems, only a SIGKILL may cause the freeze.

Nothing was mounted (by amd) during the last freeze.

I don't see any angle to tackle this, but I'm throwing it out here
any way, in the hopes that someone actually has an idea how to approach
the issue.

# uname -a
FreeBSD mobileKamikaze.norad 9.2-PRERELEASE FreeBSD 9.2-PRERELEASE #0 r253413: Wed Jul 17 13:12:46 CEST 2013 root@mobileKamikaze.norad:/usr/obj/HP6510b-91/amd64/usr/src/sys/HP6510b-91 amd64

That's amd's starting message:
Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  no logfile defined; using stderr
Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  AM-UTILS VERSION INFORMATION:
Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  Copyright (c) 1997-2006 Erez Zadok
Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  Copyright (c) 1990 Jan-Simon Pendry
Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  Copyright (c) 1990 Imperial College of Science, Technology & Medicine
Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  Copyright (c) 1990 The Regents of the University of California.
Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  am-utils version 6.1.5 (build 901505).
Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  Report bugs to https://bugzilla.am-utils.org/ or am-ut...@am-utils.org.
Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  Configured by David O'Brien  on date 4-December-2007 PST.
Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  Built by root@mobileKamikaze.norad.
Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  cpu=amd64 (little-endian), arch=amd64, karch=amd64.
Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  full_os=freebsd9.2, os=freebsd9, osver=9.2, vendor=undermydesk, distro=The FreeBSD Project.
Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  domain=norad, host=mobileKamikaze, hostd=mobileKamikaze.norad.
Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  Map support for: root, passwd, union, nis, ndbm, file, exec, error.
Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  AMFS: nfs, link, nfsx, nfsl, host, linkx, program, union, ufs, cdfs,
Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  pcfs, auto, direct, toplvl, error, inherit.
Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  FS: cd9660, nfs, nfs3, nullfs, msdosfs, ufs, unionfs.
Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  Network 1: wire="192.168.1.0" (netnumber=192.168.1).
Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  Network 2: wire="192.168.0.0" (netnumber=192.168).
Jul 22 11:32:28 mobileKamikaze amd[8176]/info:  My ip addr is 127.0.0.1

amd is called with the flags -r -p -a -c 4 -w 2

-- 
A: Because it fouls the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail? 

Re: 6.2-Release ..ish.. CF + ata == freeze

2012-07-17 Thread john fleming

This is a really old thread I thought I would bring back to life. I have heard
that the flash card vendor has fessed up to a problem and said there is a
software fix they can create. So far I have no ETA on when that is going to
happen, and for the record I don't think I will.
 
Oh well... here comes a crap load of RMAs.

Re: FreeBSD 8.3 and 9.0 freeze with firefox

2012-04-15 Thread Reed Loefgren


On 04/15/12 03:29, Ronald Klop wrote:
> On Fri, 30 Mar 2012 04:28:06 +0200, Joseph Olatt wrote:
>
>> Hi,
>>
>> Starting with 8.3, I've been experiencing FreeBSD freezing up completely
>> after using firefox for a while. Thinking the problem would go away if I
>> upgraded to 9.0, I did that and I am still experiencing the same
>> freezing up. The mouse pointer freezes, the keyboard freezes (caps lock
>> light will not come on; Ctrl-Alt-F[1-10] does not work etc.). The only
>> way to get the system back is by pressing and holding down the power
>> button.
>>
>> The problem seems similar to: kern/163145
>>
>> There is nothing in /var/log/messages to indicate a problem. Output of
>> pciconf -lv and uname -a are at:
>>
>> http://www.eskimo.com/~joji/wisdom/
>>
>> Anybody else experiencing similar freeze ups with 8.3 or 9.0 while using
>> firefox?
>
> Since Firefox uses all kinds of GPU stuff nowadays, is it possible it
> locks up your graphics card?
>
> I suggest trying to turn off GPU hardware acceleration in Firefox.
>
> Ronald.

Ronald and Joseph,

I've been seeing this too on an AMD Phenom II X3 720 running Stable from
2012-03-23. A Prescott Dell (9 Stable) that I use at work does it too. The
freezes last some number of seconds, longer if YouTube-type video is
involved. Annoying. I'll try the graphics tip.


Thanks,
r

Re: FreeBSD 8.3 and 9.0 freeze with firefox

2012-04-15 Thread Ronald Klop


On Fri, 30 Mar 2012 04:28:06 +0200, Joseph Olatt wrote:

> Hi,
>
> Starting with 8.3, I've been experiencing FreeBSD freezing up completely
> after using firefox for a while. Thinking the problem would go away if I
> upgraded to 9.0, I did that and I am still experiencing the same
> freezing up. The mouse pointer freezes, the keyboard freezes (caps lock
> light will not come on; Ctrl-Alt-F[1-10] does not work etc.). The only
> way to get the system back is by pressing and holding down the power
> button.
>
> The problem seems similar to: kern/163145
>
> There is nothing in /var/log/messages to indicate a problem. Output of
> pciconf -lv and uname -a are at:
>
> http://www.eskimo.com/~joji/wisdom/
>
> Anybody else experiencing similar freeze ups with 8.3 or 9.0 while using
> firefox?

Since Firefox uses all kinds of GPU stuff nowadays, is it possible it
locks up your graphics card?

I suggest trying to turn off GPU hardware acceleration in Firefox.

Ronald.

Re: FreeBSD 8.3 and 9.0 freeze with firefox

2012-03-30 Thread Joseph Olatt

On Fri, Mar 30, 2012 at 10:41:54AM +0700, Erich Dollansky wrote:
> Hi,
> 
> On Friday 30 March 2012 09:28:06 Joseph Olatt wrote:
> > 
> > Starting with 8.3, I've been experiencing FreeBSD freezing up completely
> > after using firefox for a while. Thinking the problem would go away if I
> > upgraded to 9.0, I did that and I am still experiencing the same
> > freezing up. The mouse pointer freezes, the keyboard freezes (caps lock
> > light will not come on; Ctrl-Alt-F[1-10] does not work etc.). The only
> > way to get the system back is by pressing and holding down the power
> > button. 
> > 
> > The problem seems similar to: kern/163145
> > 
> > 
> > There is nothing in /var/log/messages to indicate a problem. Output of 
> > pciconf -lv and uname -a are at:
> > 
> > http://www.eskimo.com/~joji/wisdom/
> > 
> > Anybody else experiencing similar freeze ups with 8.3 or 9.0 while using
> > firefox?
> > 
> I use 8.3 and Firefox without problems. What extensions did you install? Are 
> they all properly updated?
> 
> In the past, deleting Firefox's directory in the user's home directory helped.
> 
> Erich


Erich,

Thanks for your response.

I've removed the .mozilla directory from my home directory. Let's see if
it will make a difference. It is quite possible it will. I had updated
the firefox port when I updated from 8.2 to 8.3.

Thanks for the suggestion.

joseph


Re: FreeBSD 8.3 and 9.0 freeze with firefox

2012-03-29 Thread Erich Dollansky

Hi,

On Friday 30 March 2012 09:28:06 Joseph Olatt wrote:
> 
> Starting with 8.3, I've been experiencing FreeBSD freezing up completely
> after using firefox for a while. Thinking the problem would go away if I
> upgraded to 9.0, I did that and I am still experiencing the same
> freezing up. The mouse pointer freezes, the keyboard freezes (caps lock
> light will not come on; Ctrl-Alt-F[1-10] does not work etc.). The only
> way to get the system back is by pressing and holding down the power
> button. 
> 
> The problem seems similar to: kern/163145
> 
> 
> There is nothing in /var/log/messages to indicate a problem. Output of 
> pciconf -lv and uname -a are at:
> 
> http://www.eskimo.com/~joji/wisdom/
> 
> Anybody else experiencing similar freeze ups with 8.3 or 9.0 while using
> firefox?
> 
I use 8.3 and Firefox without problems. What extensions did you install? Are 
they all properly updated?

In the past, deleting Firefox's directory in the user's home directory helped.

Erich

FreeBSD 8.3 and 9.0 freeze with firefox

2012-03-29 Thread Joseph Olatt

Hi,

Starting with 8.3, I've been experiencing FreeBSD freezing up completely
after using firefox for a while. Thinking the problem would go away if I
upgraded to 9.0, I did that and I am still experiencing the same
freezing up. The mouse pointer freezes, the keyboard freezes (caps lock
light will not come on; Ctrl-Alt-F[1-10] does not work etc.). The only
way to get the system back is by pressing and holding down the power
button. 

The problem seems similar to: kern/163145


There is nothing in /var/log/messages to indicate a problem. Output of 
pciconf -lv and uname -a are at:

http://www.eskimo.com/~joji/wisdom/

Anybody else experiencing similar freeze ups with 8.3 or 9.0 while using
firefox?



Re: 6.2-Release ..ish.. CF + ata == freeze?

2012-02-16 Thread john fleming

The plot is starting to thicken. I've noticed all the systems that have done 
this (so far) have this flash card in them.

STEC M2+ CF 9.0.2 K1186-2

From talking to Checkpoint, this is a newer flash card they have started using. I 
just had a 4th machine do the same thing yesterday. Basic install, about 70% 
disk space free, very new install, like 1-2 months, and the uptime on the 
machine in question was only 16 days. After rebooting I did a few dd 
if=/dev/zero of=~/file bs=1m count=350 and didn't get any errors.
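
For anyone on a box without a convenient dd, here is a rough Python sketch of the same kind of write test: it writes zero-filled blocks to a file, forces them out with fsync, and lets any I/O error surface as an OSError. The function name write_test and its defaults are made up for this illustration; the defaults only mirror the bs=1m count=350 values above. A clean run only shows that these particular writes succeeded at that moment; it does not prove the card is healthy.

```python
import os
import tempfile


def write_test(path, block_size=1024 * 1024, count=350):
    """Write `count` zero-filled blocks to `path`, fsync, and return bytes written.

    Any failure reported by the kernel (for example EIO from a failing card)
    propagates to the caller as OSError.
    """
    block = bytes(block_size)  # zero-filled buffer, like reading /dev/zero
    written = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(count):
            view = memoryview(block)
            while view:  # os.write may write fewer bytes than requested
                n = os.write(fd, view)
                view = view[n:]
                written += n
        os.fsync(fd)  # push the data to the device instead of leaving it cached
    finally:
        os.close(fd)
    return written


if __name__ == "__main__":
    # Small demonstration run against a temporary file, not the real card.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "zero.bin")
        try:
            print(write_test(target, block_size=4096, count=4), "bytes written")
        except OSError as exc:
            print("write failed:", exc.errno, exc.strerror)
```

Calling fsync matters here: without it the data may sit in the buffer cache and the test can return before anything reaches the flash.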

The latest machine is a 1 gig version of the flash listed above, so this ate 
almost all the free disk space. Checkpoint is asking that we RMA one of the 
flash cards so they can play with it.


Re: 6.2-Release ..ish.. CF + ata == freeze?

2012-02-14 Thread jflemingeds

2 of the 3 CF cards are very new, like less than 6 months old. 

I think around 65-70 percent is in use. This number doesn't change unless the 
user dumps data in a home dir, which isn't the case so far. 

You are correct that only writes are failing. Msgbuf has more than what I 
pasted, but I'm pretty sure it's just more of the same errors. I'll 
double-check. 

The other slices are very small. One is 35 meg, the other is 100-some-odd meg. H 
is 1.2 gig.  

I don't know if I'll be able to try the dd test for a few reasons, but I'll check 
it out. Let me ask you this: say zeroing out the drive works without error. 
Does that tell me anything?  

I also don't have access to SMART tools, as this is basically a closed system 
and the vendor would never give us access to a compiler. Granted, I haven't 
tried just throwing on gcc from 6.2. I could play with that, or maybe, since said 
vendor's dev team is keeping track of this thread, they could provide said 
binary :). 

I really don't like the idea of replacing hardware, as I'm looking at around 200 
boxes. I really hope it doesn't come to that. 

Thanks for the reply!

Sent via BlackBerry from T-Mobile

-Original Message-
From: Jeremy Chadwick 
Date: Mon, 13 Feb 2012 21:18:28 
To: john fleming
Cc: freebsd-stable@freebsd.org
Subject: Re: 6.2-Release ..ish.. CF + ata == freeze?

On Mon, Feb 13, 2012 at 08:43:08PM -0800, john fleming wrote:
> Just thought i would post over here as i'm not getting a warm fuzzy from 
> checkpoint about being able to find the root cause of an issue. I have a 
> large install base of IPSO checkpoint firewalls, which are based on FreeBSD 
> 6.2. I've had 3 firewalls hang basically the same way, with something that 
> looks like a filesystem issue or an issue with a CF card. 

FreeBSD 6.2 was EOL'd in early-to-mid-2008.  The ATA driver has changed
significantly since then (present-day uses CAM).

> Does anyone happen to know of any bugs (I've been looking around) that could 
> cause something like that? Granted, it could be a batch of bad CF cards, but 
> it's odd that I'm seeing the same thing on 3 different boxes and once rebooted 
> they seem ok.
>
> Also, is it possible to get useful info from the ATA controller when things go 
> south like this from the ddb prompt?

Not particularly.  What's shown below indicates that the driver had
issued some form of ATA write command (there are multiple kinds per ATA
specification), and either the underlying media (CF/disk) or controller
stalled/locked up/took too long.  I forget what the timeout value is in
6.2; I can't be bothered to remember such from 6 years ago.  :-)

> This is what shows in show msgbuf
> ad0: timeout waiting to issue command
> ad0: error issuing WRITE command
> ad0: timeout waiting to issue command
> ad0: error issuing WRITE command
> ad0: timeout waiting to issue command
> ad0: error issuing WRITE command
> ad0: timeout waiting to issue command
> ad0: error issuing WRITE command
> g_vfs_done():ad0s4h[WRITE(offset=33849344, length=131072)]error = 5 
> g_vfs_done():ad0s4h[WRITE(offset=33980416, length=131072)]error = 5 
> g_vfs_done():ad0s4h[WRITE(offset=34111488, length=131072)]error = 5
> g_vfs_done():ad0s4h[WRITE(offset=34242560, length=131072)]error = 5 
> g_vfs_done():ad0s4h[WRITE(offset=34373632, length=131072)]error = 5 

error 5 = EIO = Input/output error.  But this isn't too big of a
surprise given the timeouts you see prior.
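
As a quick cross-check of the "error 5 = EIO" reading, the sketch below pulls the fields out of g_vfs_done() lines in the form quoted in this thread and looks the error number up in the C library's errno table. The regular expression and the decode_line helper are made up for this illustration and are not part of any FreeBSD tool.

```python
import errno
import os
import re

# Two of the lines from the msgbuf output quoted in this thread.
MSGBUF_LINES = [
    "g_vfs_done():ad0s4h[WRITE(offset=33849344, length=131072)]error = 5",
    "g_vfs_done():ad0s4h[WRITE(offset=33980416, length=131072)]error = 5",
]

# Hypothetical pattern, written only to match the line format shown above.
G_VFS_DONE_RE = re.compile(
    r"g_vfs_done\(\):(?P<dev>\w+)\[(?P<op>\w+)"
    r"\(offset=(?P<offset>\d+), length=(?P<length>\d+)\)\]"
    r"error = (?P<code>\d+)"
)


def decode_line(line):
    """Return device, operation and decoded errno for one line, or None."""
    match = G_VFS_DONE_RE.search(line)
    if match is None:
        return None
    code = int(match.group("code"))
    return {
        "dev": match.group("dev"),
        "op": match.group("op"),
        "errno_name": errno.errorcode.get(code, "unknown"),
        "message": os.strerror(code),  # text comes from the local C library
    }


if __name__ == "__main__":
    for line in MSGBUF_LINES:
        print(decode_line(line))
```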

Are these CF cards brand new -- meaning, are they completely unused
(having never had any writes done to them), or have they been in use a
while?  I'm betting they've been in use a while, and have probably been
doing many writes over the years.

Two things to note here:

1) The errors you've shown are only happening on writes, not reads.  Of
course if you omitted information then this isn't an accurate statement.
2) Timeouts are seen when issuing writes to some LBA regions.
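
To make point 2 concrete, here is a small sketch that merges the failed writes from the msgbuf output into byte ranges and converts them to sector numbers. The offsets and lengths are copied from the g_vfs_done() lines in this thread; the 512-byte sector size and the reading of the offsets as relative to ad0s4h rather than to the whole card are assumptions, and merge_ranges is a made-up helper name.

```python
SECTOR_SIZE = 512  # assumption: 512-byte sectors

# (offset, length) in bytes, copied from the g_vfs_done() lines quoted above.
FAILED_WRITES = [
    (33849344, 131072),
    (33980416, 131072),
    (34111488, 131072),
    (34242560, 131072),
    (34373632, 131072),
]


def merge_ranges(writes):
    """Merge adjacent or overlapping (offset, length) pairs into (start, end) ranges.

    `end` is exclusive, i.e. the first byte after the range.
    """
    merged = []
    for offset, length in sorted(writes):
        end = offset + length
        if merged and offset <= merged[-1][1]:
            # Touches or overlaps the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((offset, end))
    return merged


if __name__ == "__main__":
    for start, end in merge_ranges(FAILED_WRITES):
        print(
            f"bytes {start}-{end - 1} ({(end - start) // 1024} KiB), "
            f"sectors {start // SECTOR_SIZE}-{(end - 1) // SECTOR_SIZE}"
        )
```

Each offset is exactly 131072 bytes past the previous one, so the five failed writes collapse into a single contiguous 640 KiB run rather than being scattered across the partition.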

How full is the CF card, disk-space-wise?  Not just ad0s4h, I'm talking
about the entire card.  How much space is roughly available?  They're
very small CF cards (1.8GByte roughly), and the less space available,
the less effectiveness of wear levelling (and in some cases the slower
the writes are).

Reason I ask: given that these are CF cards, this smells of cards which
are simply "worn down".  CF cards have limited numbers of writes, and
the card may be "freaking out" internally when attempting to write to
some LBAs which map to CF sectors that are, in effect, "bad".  The CF
cards' ECC implementation may be buggy, or may simply be "spinning hard"
for too long.  You can read about this sort of behaviour on Wikipedia's
CompactFlash article.

You wouldn't be able to verify this with dd if=/dev/ad0, because those
are r

Re: 6.2-Release ..ish.. CF + ata == freeze?

2012-02-14 Thread Ian Lepore

On Tue, 2012-02-14 at 00:12 -0500, Jason Hellenthal wrote:
> 
> On Mon, Feb 13, 2012 at 08:43:08PM -0800, john fleming wrote:
> > Just thought i would post over here as i'm not getting a warm fuzzy from 
> > checkpoint about being able to find the root cause of an issue. I have a 
> > large install base of IPSO checkpoint firewalls, which are based on FreeBSD 
> > 6.2. I've had 3 firewalls hang basically the same way, with something that 
> > looks like a filesystem issue or an issue with a CF card. 
> >  
> > Does anyone happen to know of any bugs (I've been looking around) that 
> > could cause something like that? Granted, it could be a batch of bad CF 
> > cards, but it's odd that I'm seeing the same thing on 3 different boxes and 
> > once rebooted they seem ok.
> >  
> > Also, is it possible to get useful info from the ATA controller when things 
> > go south like this from the ddb prompt?
> >  
> > This is what shows in show msgbuf
> > ad0: timeout waiting to issue command
> > ad0: error issuing WRITE command
> > ad0: timeout waiting to issue command
> > ad0: error issuing WRITE command
> > ad0: timeout waiting to issue command
> > ad0: error issuing WRITE command
> > ad0: timeout waiting to issue command
> > ad0: error issuing WRITE command
> > g_vfs_done():ad0s4h[WRITE(offset=33849344, length=131072)]error = 5 
> > g_vfs_done():ad0s4h[WRITE(offset=33980416, length=131072)]error = 5 
> > g_vfs_done():ad0s4h[WRITE(offset=34111488, length=131072)]error = 5
> >  g_vfs_done():ad0s4h[WRITE(offset=34242560, length=131072)]error = 5 
> > g_vfs_done():ad0s4h[WRITE(offset=34373632, length=131072)]error = 5 
> >  
> > ad0: 1882MB  at ata0-master PIO4
> > atapci0:  port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x5070-0x507f mem 0x80301000-0x803013ff at device 31.1 on pci0
> > ata0:  on atapci0
> > ata1:  on atapci0
> > atapci1:  port 0x5088-0x508f,0x50a4-0x50a7,0x5080-0x5087,0x50a0-0x50a3,0x5060-0x506f irq 15 at device 31.2 on pci0
> > ata2:  on atapci1
> > ata3:  on atapci1
> >
> > ad0s4h is basically a r/w ufs partition on the box where almost anything
> > that needs to be written goes.
> >
> > trace
> > Tracing pid 1101 tid 100043 td 0x656d8460
> > kdb_enter(608cc388,6246,656d8460,64ba1400,6095d580,...) at kdb_enter+0x2b
> > siointr1(64ba1400) at siointr1+0xf0
> > siointr(64ba1400) at siointr+0x38
> > intr_execute_handler(6095d580,f0a4ab04,6,6095d580,f0a4aafc,...) at 
> > intr_execute_handler+0x61
> > intr_execute_handlers(6095d580,f0a4ab04,6,0,656d8460,...) at 
> > intr_execute_handlers+0x40
> > atpic_handle_intr(4) at atpic_handle_intr+0x96
> > Xatpic_intr4() at Xatpic_intr4+0x20
> > --- interrupt, eip = 0x606044af, esp = 0xf0a4ab48, ebp = 0xf0a4ab5c ---
> > lockmgr(e1456a04,6,0,656d8460) at lockmgr+0x58f
> > getdirtybuf(e14569a4,60a405e4,1) at getdirtybuf+0x2e2
> > flush_deplist(68b30850,1,f0a4abb8) at flush_deplist+0x30
> > flush_inodedep_deps(656fa28c,1f235) at flush_inodedep_deps+0xcf
> > softdep_sync_metadata(65964618) at softdep_sync_metadata+0x61
> > ffs_syncvnode(65964618,1) at ffs_syncvnode+0x3a2
> > ffs_fsync(f0a4ac74) at ffs_fsync+0x12
> > VOP_FSYNC_APV(60949260,f0a4ac74) at VOP_FSYNC_APV+0x38
> > fsync(656d8460,f0a4acb4) at fsync+0x170
> > syscall(805003b,806003b,5fbf003b,805,288be450,...) at syscall+0x2ee
> > Xint0x80_syscall() at Xint0x80_syscall+0x1f
> 
> This looks to be a problem with softupdates and CF cards. Can you get
> this to repeat on a brand new (good) card ?
> 

EIO errors on a write that lead to a panic nearly always backtrace into
the softupdates code, because that code pretty much has to panic if it
can't write things in the proper order.  That doesn't imply that the
softupdates code is at fault in any way, or that the errors would go
away if softupdates were turned off.  

In fact, I consider it important to have softupdates enabled on CF and
SDCard media.  The number of writes (and especially of repeated
re-writes of the same filesystem metadata sectors) goes way way up
without SU enabled, and that's bad for media with a limited number of
write cycles in its lifetime.

We've been using 6.2 with SU enabled on CF cards for many years at
Symmetricom; we're still shipping systems with that config.  Depending
on the motherboard or SBC, we often have to disable ata DMA, or limit it
to a max of WDMA2 mode.  The indication that you need to do so is
typically a lockup either trying to load the kernel and modules, or
sometimes that works but it locks up while initializing the ata driver.
[1]  If your systems have been running fine with DMA enabled, it's not
the sort of problem that suddenly appears out of the blue.  You find out
when you need to disable it pretty quickly on new hardware because it
doesn't boot reliably.

I tend to agree with Jeremy's assessment that you may have some CF cards
that have neared the end of their life, and especially if they're full
the automatic wear leveling can't find any un-worn cells to use.  If the
cards are old they may have primitive wear-leve

Re: 6.2-Release ..ish.. CF + ata == freeze?

2012-02-13 Thread Jason Hellenthal



On Mon, Feb 13, 2012 at 11:38:06PM -0600, Adam Vande More wrote:
> On Mon, Feb 13, 2012 at 10:43 PM, john fleming wrote:
> 
> > Just thought i would post over here as i'm not getting a warm fuzzy from
> > checkpoint about being able to find the root cause of an issue. I have a
> > large install base of IPSO checkpoint firewalls, which are based on FreeBSD
> > 6.2. I've had 3 firewalls hang basically the same way, with something that
> > looks like a filesystem issue or an issue with a CF card.
> >
> 
> There was a thread just the other day mentioned lockup problems with DMA
> and CF cards. Disabling DMA or reducing the mode helped.  Not sure if
> applicable to that old of FreeBSD version.
> 

I saw that thread. I doubt it is related to his issue, since he is running
6.2. Besides, his dmesg shows the card attached in PIO mode rather than DMA:

ad0: 1882MB  at ata0-master PIO4
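
The same check can be scripted. The sketch below takes the transfer mode to be the last word of an "adN:" attach line and treats anything containing "PIO" as non-DMA. The regular expression and helper names are made up for this illustration, and the WDMA2 line mentioned in the note after the code is a hypothetical variant of the line above, not output from a real box.

```python
import re

# Hypothetical pattern for attach lines shaped like the one quoted above.
ATTACH_RE = re.compile(
    r"^(?P<dev>ad\d+): .* at (?P<channel>ata\d+)-(?P<role>master|slave) "
    r"(?P<mode>\S+)\s*$"
)


def transfer_mode(line):
    """Return the transfer mode word (e.g. 'PIO4'), or None if not an attach line."""
    match = ATTACH_RE.match(line)
    return match.group("mode") if match else None


def uses_dma(line):
    """Return True/False for DMA vs PIO modes, or None if the line does not match."""
    mode = transfer_mode(line)
    if mode is None:
        return None
    return "PIO" not in mode


if __name__ == "__main__":
    line = "ad0: 1882MB  at ata0-master PIO4"
    print(transfer_mode(line), "DMA in use:", uses_dma(line))
```

For the line from this thread the mode is PIO4, so DMA is not in use. A line ending in WDMA2 would be reported as DMA.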


Re: 6.2-Release ..ish.. CF + ata == freeze?

2012-02-13 Thread Adam Vande More

On Mon, Feb 13, 2012 at 10:43 PM, john fleming wrote:

> Just thought i would post over here as i'm not getting a warm fuzzy from
> checkpoint about being able to find the root cause of an issue. I have a
> large install base of IPSO checkpoint firewalls, which are based on FreeBSD
> 6.2. I've had 3 firewalls hang basically the same way, with something that
> looks like a filesystem issue or an issue with a CF card.
>

There was a thread just the other day mentioned lockup problems with DMA
and CF cards. Disabling DMA or reducing the mode helped.  Not sure if
applicable to that old of FreeBSD version.

-- 
Adam Vande More

Re: 6.2-Release ..ish.. CF + ata == freeze?

2012-02-13 Thread john fleming

I can't seem to replicate it at all. I've seen it happen on 3 different IPSO 
boxes so far. The last machine it happened on is maybe 4 months old. Basically, 
on all 3 machines, once rebooted the problem doesn't come back. Checkpoint so 
far is telling me it's a known issue and they don't know what the fix is.

What makes you think it's softupdates? Would that cause the write timeout as 
well? I'm just not sure what level of this is failing: filesystem, flash, or ATA 
controller.

Thanks for the reply!

Re: 6.2-Release ..ish.. CF + ata == freeze?

2012-02-13 Thread Jeremy Chadwick

On Mon, Feb 13, 2012 at 08:43:08PM -0800, john fleming wrote:
> Just thought i would post over here as i'm not getting a warm fuzzy from 
> checkpoint about being able to find the root cause of an issue. I have a 
> large install base of IPSO checkpoint firewalls, which are based on FreeBSD 
> 6.2. I've had 3 firewalls hang basically the same way, with something that 
> looks like a filesystem issue or an issue with a CF card.

FreeBSD 6.2 was EOL'd in early-to-mid-2008.  The ATA driver has changed
significantly since then (present-day uses CAM).

> Does anyone happen to know of any bugs (i've been looking around) that could 
> cause something like that? Granted, it could be a batch of bad CF cards, but 
> its odd that i'm seeing the same thing on 3 different boxes and once rebooted 
> they seem ok.
>
> Also is it possible to get useful info from the ATA controller when things go
> south like this from the ddb prompt?

Not particularly.  What's shown below indicates that the driver had
issued some form of ATA write command (there are multiple kinds per ATA
specification), and either the underlying media (CF/disk) or controller
stalled/locked up/took too long.  I forget what the timeout value is in
6.2; I can't be bothered to remember such from 6 years ago.  :-)

> This is what shows in show msgbuf
> ad0: timeout waiting to issue command
> ad0: error issuing WRITE command
> ad0: timeout waiting to issue command
> ad0: error issuing WRITE command
> ad0: timeout waiting to issue command
> ad0: error issuing WRITE command
> ad0: timeout waiting to issue command
> ad0: error issuing WRITE command
> g_vfs_done():ad0s4h[WRITE(offset=33849344, length=131072)]error = 5 
> g_vfs_done():ad0s4h[WRITE(offset=33980416, length=131072)]error = 5 
> g_vfs_done():ad0s4h[WRITE(offset=34111488, length=131072)]error = 5
> g_vfs_done():ad0s4h[WRITE(offset=34242560, length=131072)]error = 5
> g_vfs_done():ad0s4h[WRITE(offset=34373632, length=131072)]error = 5 

error 5 = EIO = Input/output error.  But this isn't too big of a
surprise given the timeouts you see prior.
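
A quick way to look at those g_vfs_done() lines is sketched below in Python.
This is only an illustration: the regular expression and the helper names
(parse, is_contiguous) are made up for this example, and it assumes the line
format shown in the quote above.

```python
import errno
import re

# The five failed requests, copied from the msgbuf output quoted above.
LOG = """\
g_vfs_done():ad0s4h[WRITE(offset=33849344, length=131072)]error = 5
g_vfs_done():ad0s4h[WRITE(offset=33980416, length=131072)]error = 5
g_vfs_done():ad0s4h[WRITE(offset=34111488, length=131072)]error = 5
g_vfs_done():ad0s4h[WRITE(offset=34242560, length=131072)]error = 5
g_vfs_done():ad0s4h[WRITE(offset=34373632, length=131072)]error = 5
"""

# Hypothetical pattern for this one message format; not taken from any tool.
PATTERN = re.compile(
    r"g_vfs_done\(\):(?P<dev>\w+)\[(?P<op>\w+)"
    r"\(offset=(?P<offset>\d+), length=(?P<length>\d+)\)\]error = (?P<err>\d+)"
)


def parse(log):
    """Return one dict per g_vfs_done() line found in the text."""
    entries = []
    for m in PATTERN.finditer(log):
        entries.append({
            "dev": m.group("dev"),
            "op": m.group("op"),
            "offset": int(m.group("offset")),
            "length": int(m.group("length")),
            "err": int(m.group("err")),
        })
    return entries


def is_contiguous(entries):
    """True if every request starts exactly where the previous one ended."""
    return all(a["offset"] + a["length"] == b["offset"]
               for a, b in zip(entries, entries[1:]))


entries = parse(LOG)
for e in entries:
    # errno.errorcode maps the number back to its symbolic name.
    print(e["dev"], e["op"], e["offset"], errno.errorcode.get(e["err"]))
print("contiguous:", is_contiguous(entries))
```

With the lines above, the failed requests are back-to-back 128 KByte writes,
i.e. one contiguous region of ad0s4h.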

Are these CF cards brand new -- meaning, are they completely unused
(having never had any writes done to them), or have they been in use a
while?  I'm betting they've been in use a while, and have probably been
doing many writes over the years.

Two things to note here:

1) The errors you've shown are only happening on writes, not reads.  Of
course if you omitted information then this isn't an accurate statement.
2) Timeouts are seen when issuing writes to some LBA regions.

How full is the CF card, disk-space-wise?  Not just ad0s4h, I'm talking
about the entire card.  Roughly how much space is available?  They're
very small CF cards (roughly 1.8GByte), and the less free space there is,
the less effective wear levelling becomes (and in some cases the slower
the writes are).

Reason I ask: given that these are CF cards, this smells of cards which
are simply "worn down".  CF cards have limited numbers of writes, and
the card may be "freaking out" internally when attempting to write to
some LBAs which map to CF sectors that are, in effect, "bad".  The CF
cards' ECC implementation may be buggy, or may simply be "spinning hard"
for too long.  You can read about this sort of behaviour on Wikipedia's
CompactFlash article.

You wouldn't be able to verify this with dd if=/dev/ad0, because those
are read operations.  You could zero the media (dd if=/dev/zero
of=/dev/ad0) as a form of verification if you wanted.
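
If you'd rather see which offsets fail than just watch dd abort, a write
pass could be sketched roughly like this in Python. It is a hypothetical
helper (zero_fill is not an existing tool), it destroys whatever is on the
target, and on a raw device the length should be a multiple of the sector
size.

```python
import os


def zero_fill(path, total_bytes, chunk=128 * 1024):
    """Overwrite the first total_bytes of path with zeros.

    Returns a list of (offset, errno) tuples for chunks that failed, so a
    bad region shows up as offsets instead of a single aborted run.
    WARNING: destructive; everything in that range is lost.
    """
    failed = []
    zeros = bytes(chunk)
    fd = os.open(path, os.O_WRONLY)
    try:
        offset = 0
        while offset < total_bytes:
            n = min(chunk, total_bytes - offset)
            try:
                written = os.pwrite(fd, zeros[:n], offset)
                os.fsync(fd)  # push the data out so errors surface here
                if written != n:
                    failed.append((offset, None))  # short write
            except OSError as exc:
                failed.append((offset, exc.errno))
            offset += n
    finally:
        os.close(fd)
    return failed
```

Something like zero_fill("/dev/ad0", size_of_card_in_bytes) would be the
equivalent of the dd command above; an empty result means every write went
through.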

Do you happen to know if these CF cards support SMART?  If so,
installing smartmontools (version 5.42 or newer please) and providing
output from "smartctl -a /dev/ad0" may be helpful to me, but I make no
guarantees anything of use will be shown there.

Overall my advice would be to replace the CF cards, especially if they
have been in use for a long while.  It really doesn't matter to me that
it's happening on 3 machines (honest), especially if these are 6.2
machines with CF cards that have been in use for years.  We're lucky to
get 2 years out of our CF cards on our Juniper M120/320s before they
start spitting I/O errors.  Pick larger CF cards as well; more space =
more room for effective wear levelling.

>
> ad0: 1882MB  at ata0-master PIO4
> atapci0:  port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x5070-0x507f mem 0x80301000-0x803013ff at device 31.1 on pci0
> ata0:  on atapci0
> ata1:  on atapci0
> atapci1:  port 0x5088-0x508f,0x50a4-0x50a7,0x5080-0x5087,0x50a0-0x50a3,0x5060-0x506f irq 15 at device 31.2 on pci0
> ata2:  on atapci1
> ata3:  on atapci1
>
> ad0s4h is basically a r/w ufs partition on the box where almost anything
> that needs to be written goes.
>
> trace
> Tracing pid 1101 tid 100043 td 0x656d8460
> kdb_enter(608cc388,6246,656d8460,64ba1400,6095d580,...) at kdb_enter+0x2b
> siointr1(64ba1400) at siointr1+0xf0
> siointr(64ba1400) at siointr+0x38
> intr_execute_handler(6095d580,f0a4ab04,6,6095d580,f0a4aafc,...) at 
> intr_execute_handler+0x61
> int

Re: 6.2-Release ..ish.. CF + ata == freeze?

2012-02-13 Thread Jason Hellenthal



On Mon, Feb 13, 2012 at 08:43:08PM -0800, john fleming wrote:
> Just thought i would post over here as i'm not getting a warm fuzzy from 
> checkpoint about being able to find the root cause of an issue. I have a 
> large install base of IPSO checkpoint firewalls, which are based on FreeBSD 
> 6.2. I've had 3 firewalls hang basically the same way, with something that 
> looks like a filesystem issue or an issue with a CF card. 
>  
> Does anyone happen to know of any bugs (i've been looking around) that could 
> cause something like that? Granted, it could be a batch of bad CF cards, but 
> its odd that i'm seeing the same thing on 3 different boxes and once rebooted 
> they seem ok.
>  
> Also is it possible to get useful info from the ATA controller when things go
> south like this from the ddb prompt?
>  
> This is what shows in show msgbuf
> ad0: timeout waiting to issue command
> ad0: error issuing WRITE command
> ad0: timeout waiting to issue command
> ad0: error issuing WRITE command
> ad0: timeout waiting to issue command
> ad0: error issuing WRITE command
> ad0: timeout waiting to issue command
> ad0: error issuing WRITE command
> g_vfs_done():ad0s4h[WRITE(offset=33849344, length=131072)]error = 5 
> g_vfs_done():ad0s4h[WRITE(offset=33980416, length=131072)]error = 5 
> g_vfs_done():ad0s4h[WRITE(offset=34111488, length=131072)]error = 5
> g_vfs_done():ad0s4h[WRITE(offset=34242560, length=131072)]error = 5
> g_vfs_done():ad0s4h[WRITE(offset=34373632, length=131072)]error = 5 
>  
> ad0: 1882MB  at ata0-master PIO4
> atapci0:  port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x5070-0x507f mem 0x80301000-0x803013ff at device 31.1 on pci0
> ata0:  on atapci0
> ata1:  on atapci0
> atapci1:  port 0x5088-0x508f,0x50a4-0x50a7,0x5080-0x5087,0x50a0-0x50a3,0x5060-0x506f irq 15 at device 31.2 on pci0
> ata2:  on atapci1
> ata3:  on atapci1
>
> ad0s4h is basically a r/w ufs partition on the box where almost anything
> that needs to be written goes.
>
> trace
> Tracing pid 1101 tid 100043 td 0x656d8460
> kdb_enter(608cc388,6246,656d8460,64ba1400,6095d580,...) at kdb_enter+0x2b
> siointr1(64ba1400) at siointr1+0xf0
> siointr(64ba1400) at siointr+0x38
> intr_execute_handler(6095d580,f0a4ab04,6,6095d580,f0a4aafc,...) at intr_execute_handler+0x61
> intr_execute_handlers(6095d580,f0a4ab04,6,0,656d8460,...) at intr_execute_handlers+0x40
> atpic_handle_intr(4) at atpic_handle_intr+0x96
> Xatpic_intr4() at Xatpic_intr4+0x20
> --- interrupt, eip = 0x606044af, esp = 0xf0a4ab48, ebp = 0xf0a4ab5c ---
> lockmgr(e1456a04,6,0,656d8460) at lockmgr+0x58f
> getdirtybuf(e14569a4,60a405e4,1) at getdirtybuf+0x2e2
> flush_deplist(68b30850,1,f0a4abb8) at flush_deplist+0x30
> flush_inodedep_deps(656fa28c,1f235) at flush_inodedep_deps+0xcf
> softdep_sync_metadata(65964618) at softdep_sync_metadata+0x61
> ffs_syncvnode(65964618,1) at ffs_syncvnode+0x3a2
> ffs_fsync(f0a4ac74) at ffs_fsync+0x12
> VOP_FSYNC_APV(60949260,f0a4ac74) at VOP_FSYNC_APV+0x38
> fsync(656d8460,f0a4acb4) at fsync+0x170
> syscall(805003b,806003b,5fbf003b,805,288be450,...) at syscall+0x2ee
> Xint0x80_syscall() at Xint0x80_syscall+0x1f

This looks to be a problem with softupdates and CF cards. Can you get
this to repeat on a brand new (good) card?

-- 
;s =;

6.2-Release ..ish.. CF + ata == freeze?

2012-02-13 Thread john fleming

Just thought I would post over here, as I'm not getting a warm fuzzy from
Check Point about being able to find the root cause of an issue. I have a large
install base of IPSO Check Point firewalls, which are based on FreeBSD 6.2. I've
had 3 firewalls hang in basically the same way, with something that looks like a
filesystem issue or an issue with a CF card.

Does anyone happen to know of any bugs (I've been looking around) that could
cause something like that? Granted, it could be a batch of bad CF cards, but
it's odd that I'm seeing the same thing on 3 different boxes, and once rebooted
they seem OK.

Also, is it possible to get useful info from the ATA controller at the ddb
prompt when things go south like this?
 
This is what shows in show msgbuf
ad0: timeout waiting to issue command
ad0: error issuing WRITE command
ad0: timeout waiting to issue command
ad0: error issuing WRITE command
ad0: timeout waiting to issue command
ad0: error issuing WRITE command
ad0: timeout waiting to issue command
ad0: error issuing WRITE command
g_vfs_done():ad0s4h[WRITE(offset=33849344, length=131072)]error = 5 
g_vfs_done():ad0s4h[WRITE(offset=33980416, length=131072)]error = 5 
g_vfs_done():ad0s4h[WRITE(offset=34111488, length=131072)]error = 5
g_vfs_done():ad0s4h[WRITE(offset=34242560, length=131072)]error = 5
g_vfs_done():ad0s4h[WRITE(offset=34373632, length=131072)]error = 5 
 
ad0: 1882MB  at ata0-master PIO4
atapci0:  port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x5070-0x507f mem 0x80301000-0x803013ff at device 31.1 on pci0
ata0:  on atapci0
ata1:  on atapci0
atapci1:  port 0x5088-0x508f,0x50a4-0x50a7,0x5080-0x5087,0x50a0-0x50a3,0x5060-0x506f irq 15 at device 31.2 on pci0
ata2:  on atapci1
ata3:  on atapci1

ad0s4h is basically a r/w ufs partition on the box where almost anything
that needs to be written goes.

trace
Tracing pid 1101 tid 100043 td 0x656d8460
kdb_enter(608cc388,6246,656d8460,64ba1400,6095d580,...) at kdb_enter+0x2b
siointr1(64ba1400) at siointr1+0xf0
siointr(64ba1400) at siointr+0x38
intr_execute_handler(6095d580,f0a4ab04,6,6095d580,f0a4aafc,...) at intr_execute_handler+0x61
intr_execute_handlers(6095d580,f0a4ab04,6,0,656d8460,...) at intr_execute_handlers+0x40
atpic_handle_intr(4) at atpic_handle_intr+0x96
Xatpic_intr4() at Xatpic_intr4+0x20
--- interrupt, eip = 0x606044af, esp = 0xf0a4ab48, ebp = 0xf0a4ab5c ---
lockmgr(e1456a04,6,0,656d8460) at lockmgr+0x58f
getdirtybuf(e14569a4,60a405e4,1) at getdirtybuf+0x2e2
flush_deplist(68b30850,1,f0a4abb8) at flush_deplist+0x30
flush_inodedep_deps(656fa28c,1f235) at flush_inodedep_deps+0xcf
softdep_sync_metadata(65964618) at softdep_sync_metadata+0x61
ffs_syncvnode(65964618,1) at ffs_syncvnode+0x3a2
ffs_fsync(f0a4ac74) at ffs_fsync+0x12
VOP_FSYNC_APV(60949260,f0a4ac74) at VOP_FSYNC_APV+0x38
fsync(656d8460,f0a4acb4) at fsync+0x170
syscall(805003b,806003b,5fbf003b,805,288be450,...) at syscall+0x2ee
Xint0x80_syscall() at Xint0x80_syscall+0x1f

Re: FreeBSD 9.0-RC1 freeze after swapoff/swapon procedure on md/vnode-backend file

2011-10-26 Thread Sergey Kandaurov

On 26 October 2011 23:31, Subbsd  wrote:
> Hi
>
> I can easily reproduce a hang on servers that use a file-backed swap
> device after a swapoff/swapon cycle (for this to happen, some data
> must already be swapped out). For example:
>
> 1) dd if=/dev/zero of=/usr/swp1 bs=1m count=100
> 2) mdconfig -a -t vnode -f /usr/swp1
> 3) swapon /dev/md0
> 4) start allocating memory, for example simply with:
> tail /dev/zero
>
> 5) once some percentage of swap is in use, run swapoff /dev/md0, then
> swapon /dev/md0. You can repeat this step.
>
> The system may stop responding to commands and freeze or lock up
> after some time. From the outside, the kernel is still alive (it answers
> ICMP), but the disk subsystem is unavailable.
>
> PS: I believe one of my servers froze even without swapoff/swapon; it
> just had three swap files, and it crashed a day later.

Something interesting while trying to reproduce your problem:

swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1048970, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1057174, size: 4096
panic: swapoff: failed to locate 16056 swap blocks
cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper() at 0x802e009a = db_trace_self_wrapper+0x2a
kdb_backtrace() at 0x80486d87 = kdb_backtrace+0x37
panic() at 0x8044f6ee = panic+0x2ee
swapoff_one() at 0x80687425 = swapoff_one+0x475
sys_swapoff() at 0x8068789b = sys_swapoff+0x1bb
amd64_syscall() at 0x806c60c9 = amd64_syscall+0x299
Xfast_syscall() at 0x806b1467 = Xfast_syscall+0xf7
--- syscall (424, FreeBSD ELF64, sys_swapoff), rip = 0x800ab307c, rsp = 0x7fffd9c8, rbp = 0 ---
KDB: enter: panic
[ thread pid 63255 tid 100211 ]
Stopped at      0x8048696b = kdb_enter+0x3b:    movq    $0,0x735f72(%rip)

Below is a trace for a process on another CPU that's doing intensive
malloc+bzero in userland:

db> bt 63066
Tracing pid 63066 tid 100199 td 0xfe000e89f000
cpustop_handler() at 0x806bb46b = cpustop_handler+0x2b
ipi_nmi_handler() at 0x806bb540 = ipi_nmi_handler+0x50
trap() at 0x806c7035 = trap+0x2a5
nmi_calltrap() at 0x806b15bf = nmi_calltrap+0x8
--- trap 0x13, rip = 0x8043e0d0, rsp = 0x80dc4dc0, rbp = 0xff80908de750 ---
_mtx_unlock_flags() at 0x8043e0d0 = _mtx_unlock_flags+0x170
swp_pager_meta_ctl() at 0x806841aa = swp_pager_meta_ctl+0xea
swap_pager_haspage() at 0x80684272 = swap_pager_haspage+0x42
vm_fault_hold() at 0x8068e379 = vm_fault_hold+0x599
trap_pfault() at 0x806c6c26 = trap_pfault+0xe6
trap() at 0x806c733f = trap+0x5af
calltrap() at 0x806b1183 = calltrap+0x8
--- trap 0xc, rip = 0x4006ed, rsp = 0x7fffdad0, rbp = 0x7fffdae0 ---

That corresponds to kgdb:

#9  0x8044f6e4 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:624
#10 0x80687425 in swapoff_one (sp=0xfe000dc3dc00, cred=0xff808e5bf340)
    at /usr/src/sys/vm/swap_pager.c:1774
#11 0x8068789b in sys_swapoff (td=0xfe000e90f000, uap=Variable "uap" is not available.
) at /usr/src/sys/vm/swap_pager.c:2236
#12 0x806c60c9 in amd64_syscall (td=0xfe000e90f000, traced=0)
    at subr_syscall.c:131
#13 0x806b1467 in Xfast_syscall ()
    at /usr/src/sys/amd64/amd64/exception.S:387

and

(kgdb) thr 108
[Switching to thread 108 (Thread 100199)]
#0  cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1394
1394            CPU_SET_ATOMIC(cpu, &stopped_cpus);
(kgdb) bt
#0  cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1394
#1  0x806bb540 in ipi_nmi_handler ()
    at /usr/src/sys/amd64/amd64/mp_machdep.c:1376
#2  0x806c7035 in trap (frame=0x80dc4d10)
    at /usr/src/sys/amd64/amd64/trap.c:200
#3  0x806b15bf in nmi_calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:501
#4  0x8043e0d0 in _mtx_unlock_flags (m=0x80d8af80, opts=0,
    file=0x80791e48 "/usr/src/sys/vm/swap_pager.c", line=2040)
    at /usr/src/sys/kern/kern_mutex.c:221
[something is wrong here: no further stack frames (swap_pager_*, etc.)]

Here both swap_pager_swapoff() and swp_pager_meta_ctl() contend on
swhash_mtx. Or is this rather due to the low limit set on the retry counter?
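
For scale, the block count from the panic message can be converted to bytes.
The Python sketch below assumes that one swap block is one 4096-byte page,
which matches the size field in the "indefinite wait buffer" lines but is
otherwise my assumption; the helper name is invented for this example.

```python
# Assumption: a swap block is one page of 4096 bytes (see "size: 4096" above).
PAGE_SIZE = 4096


def swap_blocks_to_mib(blocks, page_size=PAGE_SIZE):
    """Convert a count of swap blocks to MiB."""
    return blocks * page_size / (1024 * 1024)


missing = 16056  # from "panic: swapoff: failed to locate 16056 swap blocks"
print(f"{missing} blocks = {missing * PAGE_SIZE} bytes "
      f"(about {swap_blocks_to_mib(missing):.1f} MiB)")
```

Under that assumption, swapoff gave up while tens of megabytes were still
unaccounted for, not just a handful of pages.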


Here is another crash:

swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1062783, size: 4096
panic: swapoff: failed to locate 22133 swap blocks
cpuid = 2
KDB: stack backtrace:
db_trace_self_wrapper() at 0x802e009a = db_trace_self_wrapper+0x2a
kdb_backtrace() at 0x80486d87 = kdb_backtrace+0x37
panic() at 0x8044f6ee = panic+0x2ee
swapoff_one() at 0x80687425 = swapoff_one+0x475
sys_swapoff() at 0x8068789b = sys_swapoff+0x1bb
amd64_syscall() at 0x806c60c9 = amd64_syscall+0x299
Xfast_syscall() at 0x806b1467 = Xfast_syscall+0xf7
--- syscall (424, FreeBSD ELF64, sys_swapoff), rip = 0x800ab307c, rsp
= 0x7ff

FreeBSD 9.0-RC1 freeze after swapoff/swapon procedure on md/vnode-backend file

2011-10-26 Thread Subbsd

Hi

I can easily reproduce a hang on servers that use a file-backed swap
device after a swapoff/swapon cycle (for this to happen, some data
must already be swapped out). For example:

1) dd if=/dev/zero of=/usr/swp1 bs=1m count=100
2) mdconfig -a -t vnode -f /usr/swp1
3) swapon /dev/md0
4) start allocating memory, for example simply with:
tail /dev/zero

5) once some percentage of swap is in use, run swapoff /dev/md0, then
swapon /dev/md0. You can repeat this step.

The system may stop responding to commands and freeze or lock up
after some time. From the outside, the kernel is still alive (it answers
ICMP), but the disk subsystem is unavailable.

PS: I believe one of my servers froze even without swapoff/swapon; it
just had three swap files, and it crashed a day later.
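
The steps above can be collected into a small dry-run script, sketched here
in Python. The function names are made up for this example, and it assumes
that mdconfig hands out md0 as in the recipe. The memory-pressure step
(tail /dev/zero) is left to be started by hand in another terminal.

```python
import subprocess


def repro_commands(swap_file="/usr/swp1", size_mb=100, md_dev="/dev/md0"):
    """Return the commands from the recipe as argv lists, in order."""
    return [
        ["dd", "if=/dev/zero", f"of={swap_file}", "bs=1m", f"count={size_mb}"],
        ["mdconfig", "-a", "-t", "vnode", "-f", swap_file],
        ["swapon", md_dev],
        # ... generate memory pressure here until swap is partly used ...
        ["swapoff", md_dev],
        ["swapon", md_dev],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(repro_commands())  # dry run: only prints the commands
```

Only call run() with dry_run=False on a scratch machine, since the whole
point of the sequence is to provoke the hang.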

Re: [HEADSUP]: ports feature freeze starts soon

2011-10-17 Thread Erwin Lansing

On Fri, Oct 07, 2011 at 11:20:28AM +0200, Erwin Lansing wrote:
> In preparation for 9.0 the ports tree will be in feature freeze
> after release candidate 2 (RC2) is released, currently planned for
> October 17.
> 
Depending on your timezone, October 17 has come and gone and the ports
tree has not frozen yet.  As always, we'll follow the actual dates
during the release cycle and not the estimated dates in the tentative
schedule.  A rough guess would be that RC2, and thus the ports feature
freeze, will happen at the end of the month, so please take this as a
reminder to get anything you want included in the release into the
tree as soon as possible.

Erwin

-- 
Erwin Lansing                                 http://droso.org
Prediction is very difficult
especially about the future                er...@freebsd.org

Fwd: Re: [HEADSUP]: ports feature freeze starts soon

2011-10-08 Thread Chris Rees

Just in case anyone's not on ports@:

-- Forwarded message --
From: "Chris Rees" 
Date: 8 Oct 2011 10:30
Subject: Re: [HEADSUP]: ports feature freeze starts soon
To: "Thomas Mueller" 
Cc: , "Erwin Lansing" 

On 8 October 2011 10:22, Thomas Mueller  wrote:
> from Erwin Lansing :
>
>> In preparation for 9.0 the ports tree will be in feature freeze
>> after release candidate 1 (RC2)is released, currently planned for
>> October 17.
>
> Was there a typo here?  Did you mean release candidate 1 or 2?
>
> RC1 seems more logical, since RC1 has not been released yet,
> and October 17 is only nine days away.
>


-- Forwarded message --
From: Erwin Lansing 
Date: 7 October 2011 17:34
Subject: Re: [HEADSUP]: ports feature freeze starts soon
To: "develop...@freebsd.org" 
On Oct 7, 2011, at 11:20, Erwin Lansing  wrote:

> In preparation for 9.0 the ports tree will be in feature freeze
> after release candidate 1 (RC2)is released, currently planned for
> October 17.
>
Sorry about the typo. Just to be clear, I did mean RC2, not RC1 as
usual, since an RC3 has been planned in this release cycle.

Erwin

[HEADSUP]: ports feature freeze starts soon

2011-10-07 Thread Erwin Lansing

In preparation for 9.0 the ports tree will be in feature freeze
after release candidate 1 (RC2)is released, currently planned for
October 17.

If you have any commits with high impact planned, get them in the tree
before then and if they require an experimental build, have a request
for one in portmgr hands within the next few days.

Note that this again will be a feature freeze and not a full freeze.
Normal upgrades, new ports, and changes that only affect other branches
will be allowed without prior approval, but with the extra
"Feature safe: yes" tag in the commit message.  Any commit that is sweeping,
i.e. touches a large number of ports, infrastructural changes, commits to
ports with an unusually high number of dependencies, and any other commit
that requires the rebuilding of many packages will not be allowed
without prior explicit approval from portmgr after that date.

-erwin

-- 
Erwin Lansing                                 http://droso.org
Prediction is very difficult
especially about the future                er...@freebsd.org

Re: System freeze: Adaptec (aac) timeouts (releng 8)

2011-09-15 Thread Elliot Finley

On Thu, Sep 15, 2011 at 12:13 AM, Dennis Koegel  wrote:
> I'm not aware how licensing issues play in here, but apart from that, it
> should be easy to patch this into base. (I was already half-way there
> yesterday and I think I could work up a patch against HEAD and 8.x).

I won't worry about filing a PR then.  Sounds like you've got it under control.

Thanks!

Elliot

Re: System freeze: Adaptec (aac) timeouts (releng 8)

2011-09-14 Thread Dennis Koegel

On Wed, Sep 14, 2011 at 07:26:20PM -0700, Jeremy Chadwick wrote:
> I'm actually very surprised to hear there's an official FreeBSD driver
> on Adaptec's site that's actually intended for FreeBSD 8.x.

As far as I can tell from the source, it's the very same driver (same
source code and copyright notices), only that Adaptec has taken over
development; fbsd has 2.1, Adaptec has continued development to a
version 2.4.

I'm not aware how licensing issues play in here, but apart from that, it
should be easy to patch this into base. (I was already half-way there
yesterday and I think I could work up a patch against HEAD and 8.x).

- D.

Re: System freeze: Adaptec (aac) timeouts (releng 8)

2011-09-14 Thread Jeremy Chadwick

On Thu, Sep 15, 2011 at 09:36:43AM +0800, Adrian Chadd wrote:
> On 15 September 2011 00:38, Elliot Finley  wrote:
> > I was having the exact same problem using an Adaptec 52445.  After
> > downloading and using the latest driver from the adaptec website, the
> > problems stopped.  I haven't had a single freeze since using the new
> > code.  The newest driver from the website has source code with it, so
> > it shouldn't be that big of a deal to incorporate it into the base
> > system.  I emailed the authors of the aac driver (Mike Smith and
> > Scott Long), but they have both retired.  So I'm not really sure how
> > to get this code into the base.  If anyone knows, please take up the
> > charge.
> 
> File a PR and hound people on freebsd-current until it gets done?

...which will either be ignored given that (TMK) nobody is maintaining
the Adaptec drivers, or will be addressed in HEAD which won't help the
OP who runs RELENG_8 until an MFC happens -- and if it happens (I forget
how MFC approvals work).  :-)

As for the lack of aac(4) maintainer, I'm not sure how this should be
addressed in the aac(4) man page.  AUTHORS tends to indicate the names
of the people who created or were involved in creating/maintaining said
driver, which is sometimes (but on FreeBSD hardly always) the
individual(s) who currently support it.  In the case that there is a
different maintainer, how does this get denoted in the man page?

I'm actually very surprised to hear there's an official FreeBSD driver
on Adaptec's site that's actually intended for FreeBSD 8.x.  Last I knew
they had basically blown off FreeBSD support.  I wonder who at Adaptec
is responsible for the FreeBSD driver?  It would be good to know to
involve them in all communiqués.

-- 
| Jeremy Chadwick                                  jdc at parodius.com |
| Parodius Networking                         http://www.parodius.com/ |
| UNIX Systems Administrator                     Mountain View, CA, US |
| Making life hard for others since 1977.                 PGP 4BD6C0CB |


Re: System freeze: Adaptec (aac) timeouts (releng 8)

2011-09-14 Thread Adrian Chadd

On 15 September 2011 00:38, Elliot Finley  wrote:
> I was having the exact same problem using an Adaptec 52445.  After
> downloading and using the latest driver from the adaptec website, the
> problems stopped.  I haven't had a single freeze since using the new
> code.  The newest driver from the website has source code with it, so
> it shouldn't be that big of a deal to incorporate it into the base
> system.    I emailed the authors of the aac driver (Mike Smith and
> Scott Long), but they have both retired.  So I'm not really sure how
> to get this code into the base.  If anyone knows, please take up the
> charge.

File a PR and hound people on freebsd-current until it gets done?


Adrian

Re: System freeze: Adaptec (aac) timeouts (releng 8)

2011-09-14 Thread Elliot Finley

I was having the exact same problem using an Adaptec 52445.  After
downloading and using the latest driver from the adaptec website, the
problems stopped.  I haven't had a single freeze since using the new
code.  The newest driver from the website has source code with it, so
it shouldn't be that big of a deal to incorporate it into the base
system.  I emailed the authors of the aac driver (Mike Smith and
Scott Long), but they have both retired.  So I'm not really sure how
to get this code into the base.  If anyone knows, please take up the
charge.

On Wed, Sep 14, 2011 at 2:08 AM, Dennis Koegel  wrote:
> Cheers,
>
> we have a reproducible system freeze due to Adaptec driver (aac) timeouts:
>
> Sep  3 05:26:44 foo kernel: aac0: COMMAND 0xff80005ae4c0 (TYPE 502) TIMEOUT AFTER 129 SECONDS
> Sep  3 05:26:44 foo kernel: aac0: COMMAND 0xff80005ac0e0 (TYPE 502) TIMEOUT AFTER 129 SECONDS
> Sep  3 05:26:44 foo kernel: aac0: COMMAND 0xff80005b0fa0 (TYPE 502) TIMEOUT AFTER 129 SECONDS
> 
>
> Once this happens, the userland seems to be alive, but the controller is
> completely dead. As soon as the disk subsystem is involved, any process
> hangs forever (e.g. SSH crypto-exchange still happens, but a shell won't
> even start anymore).
>
> We observe the same issue on two systems of (mostly) identical spec, so
> it's not a hardware issue.
>
> Apparently this only happens under heavy disk i/o and high cpu load.
> Notably high write throughput plus a 'zpool scrub' on a large
> GELI-backed zpool usually triggers the problem after a few hours.
> Without high activity, they run smooth for weeks.
>
> Both systems are amd64 with an Adaptec 5805 controller and 16 disks (of
> which two form a RAID-1 system volume (UFS), and the remaining 14 serve
> as JBOD for a large zpool -- a total of 15 "aacd" devices).
>
> Both were running 8.2R originally. I've taken them to 8-STABLE now and
> also applied svn r222951 (where the MFC was forgotten, it seems), but
> the problem remains.
>
> Any help is greatly appreciated.
>
> Thanks,
> - D.

System freeze: Adaptec (aac) timeouts (releng 8)

2011-09-14 Thread Dennis Koegel

Cheers,

we have a reproducible system freeze due to Adaptec driver (aac) timeouts:

Sep  3 05:26:44 foo kernel: aac0: COMMAND 0xff80005ae4c0 (TYPE 502) TIMEOUT AFTER 129 SECONDS
Sep  3 05:26:44 foo kernel: aac0: COMMAND 0xff80005ac0e0 (TYPE 502) TIMEOUT AFTER 129 SECONDS
Sep  3 05:26:44 foo kernel: aac0: COMMAND 0xff80005b0fa0 (TYPE 502) TIMEOUT AFTER 129 SECONDS
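
For anyone who wants to count or group these messages from a longer log,
here is a rough Python sketch. The regular expression and the function name
are invented for this example and only cover the message format shown above.
Whitespace is normalised first, so lines wrapped by a mail client still
match.

```python
import re

# Three messages from the log above, deliberately left wrapped as a mail
# client might deliver them.
SAMPLE = """\
Sep  3 05:26:44 foo kernel: aac0: COMMAND 0xff80005ae4c0 (TYPE 502) TIMEOUT
AFTER 129 SECONDS
Sep  3 05:26:44 foo kernel: aac0: COMMAND 0xff80005ac0e0 (TYPE 502) TIMEOUT
AFTER 129 SECONDS
Sep  3 05:26:44 foo kernel: aac0: COMMAND 0xff80005b0fa0 (TYPE 502) TIMEOUT
AFTER 129 SECONDS
"""

# Hypothetical pattern for this one kernel message; not part of any tool.
TIMEOUT_RE = re.compile(
    r"(?P<ctrl>aac\d+): COMMAND (?P<cmd>0x[0-9a-fA-F]+) "
    r"\(TYPE (?P<type>\d+)\) TIMEOUT AFTER (?P<secs>\d+) SECONDS"
)


def parse_timeouts(text):
    """Return (controller, command address, type, seconds) per timeout."""
    flat = " ".join(text.split())  # undo line wrapping
    return [
        (m.group("ctrl"), m.group("cmd"), int(m.group("type")), int(m.group("secs")))
        for m in TIMEOUT_RE.finditer(flat)
    ]


timeouts = parse_timeouts(SAMPLE)
print(len(timeouts), "timed-out commands on", {t[0] for t in timeouts})
```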


Once this happens, the userland seems to be alive, but the controller is
completely dead. As soon as the disk subsystem is involved, any process
hangs forever (e.g. SSH crypto-exchange still happens, but a shell won't
even start anymore).

We observe the same issue on two systems of (mostly) identical spec, so
it's not a hardware issue.

Apparently this only happens under heavy disk i/o and high cpu load.
Notably high write throughput plus a 'zpool scrub' on a large
GELI-backed zpool usually triggers the problem after a few hours.
Without high activity, they run smooth for weeks.

Both systems are amd64 with an Adaptec 5805 controller and 16 disks (of
which two form a RAID-1 system volume (UFS), and the remaining 14 serve
as JBOD for a large zpool -- a total of 15 "aacd" devices).

Both were running 8.2R originally. I've taken them to 8-STABLE now and
also applied svn r222951 (where the MFC was forgotten, it seems), but
the problem remains.

Any help is greatly appreciated.

Thanks,
- D.

zfsboot from 8.2RC1 freeze at boot time

2010-12-28 Thread Henri Hennebert


Hello and merry Xmas to everybody,

I upgraded a remote server from 8.1-RELEASE to 8.2-RC1.

This server has one disk:

[r...@tignes ~]# gpart show
=>       63  488397105  ada0  MBR  (233G)
         63   12583809     1  freebsd  (6.0G)
   12583872  475813296     2  freebsd  [active]  (227G)

=>       0  12583809  ada0s1  BSD  (6.0G)
         0   8388608       1  freebsd-ufs  (4.0G)
   8388608   4195201       2  freebsd-swap  (2.0G)

=>        0  475813296  ada0s2  BSD  (227G)
          0  475813296       1  freebsd-zfs  (227G)
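
As a sanity check on the layout above, the start and size columns can be
verified to tile each container without gaps or overlaps. This Python sketch
is only an illustration: the helper names are made up, and it assumes
512-byte sectors, which the gpart output itself does not state.

```python
SECTOR = 512  # assumption: 512-byte sectors


def tiles_exactly(container_start, container_size, parts):
    """True if the (start, size) pairs cover the container back to back."""
    pos = container_start
    for start, size in parts:
        if start != pos:
            return False  # gap or overlap before this partition
        pos = start + size
    return pos == container_start + container_size


def gib(sectors):
    """Size in GiB for a sector count."""
    return sectors * SECTOR / 2**30


# (start, size) values copied from the gpart show output above.
ada0 = tiles_exactly(63, 488397105, [(63, 12583809), (12583872, 475813296)])
ada0s1 = tiles_exactly(0, 12583809, [(0, 8388608), (8388608, 4195201)])
ada0s2 = tiles_exactly(0, 475813296, [(0, 475813296)])

print("ada0:", ada0, "ada0s1:", ada0s1, "ada0s2:", ada0s2)
print(f"ada0 is about {gib(488397105):.0f}G, ada0s2 about {gib(475813296):.0f}G")
```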


It boots with zfsboot from ada0s2, which contains a ZFS pool.

After upgrading zfsboot, just to be able to upgrade the pool to v15,
the server doesn't boot anymore.


It is a remote server, so I reproduced this configuration under VirtualBox.
The boot freezes after zfsboot displays "-".


I grabbed an old zfsboot from another server running 8.1-STABLE (r213582),
which boots fine.


I put the zfsboot from r213582 (zpool v15 aware) on ada0s2 and bingo,
the server boots normally.


Henri

Re: ichwd causes freeze instead of reset

2010-08-21 Thread Jeremy Chadwick

On Sat, Aug 21, 2010 at 11:09:04PM +0200, Stefan Bethke wrote:
> On 21.08.2010 at 23:02, Andriy Gapon wrote:
> 
> > on 21/08/2010 23:33 Stefan Bethke said the following:
> >> Hi,
> >> 
> >> somewhat foolishly, I activated watchdogd and ichwd on a remote box, and
> >> while testing it (by suspending watchdogd), apparently the watchdog
> >> triggered.  But instead of resetting, the machine does not react anymore on
> >> the serial console.  I will have to wait until Monday to get physical access,
> >> so it might be hanging or just switched itself off; I have no way of telling
> >> right now.
> >> 
> >> ichwd probes as:
> >> ichwd0: <Intel ICH7 watchdog timer> on isa0
> >> ichwd0: Intel ICH7 watchdog timer (ICH7 or equivalent)
> >> ppc0: cannot reserve I/O port range
> >> 
> >> (not sure why ppc0 is getting involved at that point.)
> >> 
> >> FreeBSD lokschuppen.zs64.net 8.1-PRERELEASE FreeBSD 8.1-PRERELEASE #30: Thu
> >> Jul 15 12:58:20 UTC 2010
> >> r...@lokschuppen.zs64.net:/usr/obj/usr/src/sys/EISENBOOT  amd64
> >> 
> >> Once the box is up again, is it worthwhile trying ichwd again, should I try
> >> and use SW_WATCHDOG, or forget about it?
> > 
> > Just test it more while having physical access before making any conclusions.
> > There could be a number of radically different possibilities ranging from
> > hardware peculiarities to configuration problems to pilot errors, etc.
> 
> I guess what I'm looking for is some confirmation that ichwd is working 
> properly on this particular hardware: Asus Pundit P4 P5G41 with a G41 chipset.
> 
> Below are pciconf -lvb and dmesg:
> 
> hos...@pci0:0:0:0:class=0x06 card=0x836d1043 chip=0x2e308086 rev=0x03 
> hdr=0x00
> vendor = 'Intel Corporation'
> class  = bridge
> subclass   = HOST-PCI
> vgap...@pci0:0:2:0:   class=0x03 card=0x836d1043 chip=0x2e328086 rev=0x03 
> hdr=0x00
> vendor = 'Intel Corporation'
> device = 'Intel G41 express graphics 
> (PCIVEN_8086&DEV_2E32&SUBSYS_31031565&REV_033&115)'
> class  = display
> subclass   = VGA
> bar   [10] = type Memory, range 64, base 0xfe40, size 4194304, enabled
> bar   [18] = type Prefetchable Memory, range 64, base 0xe000, size 
> 268435456, enabled
> bar   [20] = type I/O Port, range 32, base 0xbc00, size  8, enabled
> vgap...@pci0:0:2:1:   class=0x038000 card=0x836d1043 chip=0x2e338086 rev=0x03 
> hdr=0x00
> vendor = 'Intel Corporation'
> class  = display
> bar   [10] = type Memory, range 64, base 0xfe80, size 1048576, enabled
> no...@pci0:0:27:0:class=0x040300 card=0x82fe1043 chip=0x27d88086 rev=0x01 
> hdr=0x00
> vendor = 'Intel Corporation'
> device = 'IDT High Definition Audio Driver  (BA101897)'
> class  = multimedia
> subclass   = HDA
> bar   [10] = type Memory, range 64, base 0xfe3f8000, size 16384, enabled
> pc...@pci0:0:28:0:class=0x060400 card=0x81791043 chip=0x27d08086 rev=0x01 
> hdr=0x01
> vendor = 'Intel Corporation'
> device = '82801G (ICH7 Family) PCIe Root Port'
> class  = bridge
> subclass   = PCI-PCI
> pc...@pci0:0:28:2:class=0x060400 card=0x81791043 chip=0x27d48086 rev=0x01 
> hdr=0x01
> vendor = 'Intel Corporation'
> device = '82801G (ICH7 Family) PCIe Root Port'
> class  = bridge
> subclass   = PCI-PCI
> pc...@pci0:0:28:3:class=0x060400 card=0x81791043 chip=0x27d68086 rev=0x01 
> hdr=0x01
> vendor = 'Intel Corporation'
> device = '82801G (ICH7 Family) PCIe Root Port'
> class  = bridge
> subclass   = PCI-PCI
> uh...@pci0:0:29:0:class=0x0c0300 card=0x81791043 chip=0x27c88086 rev=0x01 
> hdr=0x00
> vendor = 'Intel Corporation'
> device = '82801G (ICH7 Family) USB Universal Host Controller'
> class  = serial bus
> subclass   = USB
> bar   [20] = type I/O Port, range 32, base 0xb400, size 32, enabled
> uh...@pci0:0:29:1:class=0x0c0300 card=0x81791043 chip=0x27c98086 rev=0x01 
> hdr=0x00
> vendor = 'Intel Corporation'
> device = '82801G (ICH7 Family) USB Universal Host Controller'
> class  = serial bus
> subclass   = USB
> bar   [20] = type I/O Port, range 32, base 0xb480, size 32, enabled
> uh...@pci0:0:29:2:class=0x0c0300 card=0x81791043 chip=0x27ca8086 rev=0x01 
> hdr=0x00
> vendor = 'Intel Corporation'
> device = '82801G (ICH7 Family) USB Universal Host Controller'
> class  = serial bus
> subclass   = USB
> bar   [20] = type I/O Port, range 32, base 0xb800, size 32, enabled
> uh...@pci0:0:29:3:class=0x0c0300 card=0x81791043 chip=0x27cb8086 rev=0x01 
> hdr=0x00
> vendor = 'Intel Corporation'
> device = '82801G (ICH7 Family) USB Universal Host Controller'
> class  = serial bus
> subclass   = USB
> bar   [20] = type I/O Port, range 32, base 0xb880, size 32, enabled
> eh...@pci0:0:29:7:class=0x0c0320 card=0x81791043 chip=0x27cc8086 rev=0

Re: ichwd causes freeze instead of reset

2010-08-21 Thread Stefan Bethke

Am 21.08.2010 um 23:24 schrieb Mike Tancsa:

> At 05:09 PM 8/21/2010, Stefan Bethke wrote:
> 
>> I guess what I'm looking for is some confirmation that ichwd is working 
>> properly on this particular hardware: Asus Pundit P4 P5G41 with a G41 
>> chipset.
>> 
> 
> Don't know about that particular MB implementation, but I have a number of
> various ICH7 based boards where ichwd works as expected.  The freeze could
> be something as simple as the box waiting for keyboard input at the BIOS
> prompt, or the BIOS option after a watchdog reset being set to power off.
> However, I have only seen that option in later boards.

Thanks, I'll check that out Monday morning.


Stefan

-- 
Stefan Bethke    Fon +49 151 14070811




Re: ichwd causes freeze instead of reset

2010-08-21 Thread Mike Tancsa


At 05:09 PM 8/21/2010, Stefan Bethke wrote:

> I guess what I'm looking for is some confirmation that ichwd is
> working properly on this particular hardware: Asus Pundit P4 P5G41
> with a G41 chipset.

Don't know about that particular MB implementation, but I have a
number of various ICH7 based boards where ichwd works as
expected.  The freeze could be something as simple as the box
waiting for keyboard input at the BIOS prompt, or the BIOS option
after a watchdog reset being set to power off.  However, I have only
seen that option in later boards.


---Mike





Mike Tancsa,                      tel +1 519 651 3400
Sentex Communications,            m...@sentex.net
Providing Internet since 1994     www.sentex.net
Cambridge, Ontario Canada         www.sentex.net/mike


Re: ichwd causes freeze instead of reset

2010-08-21 Thread Stefan Bethke

Am 21.08.2010 um 23:02 schrieb Andriy Gapon:

> on 21/08/2010 23:33 Stefan Bethke said the following:
>> Hi,
>> 
>> somewhat foolishly, I activated watchdogd and ichwd on a remote box, and
>> while testing it (by suspending watchdogd), apparently the watchdog
>> triggered.  But instead of resetting, the machine does not react anymore on
>> the serial console.  I will have to wait until Monday to get physical access,
>> so it might be hanging or just switched itself off; I have no way of telling
>> right now.
>> 
>> ichwd probes as:
>> ichwd0: <Intel ICH7 watchdog timer> on isa0
>> ichwd0: Intel ICH7 watchdog timer (ICH7 or equivalent)
>> ppc0: cannot reserve I/O port range
>> 
>> (not sure why ppc0 is getting involved at that point.)
>> 
>> FreeBSD lokschuppen.zs64.net 8.1-PRERELEASE FreeBSD 8.1-PRERELEASE #30: Thu
>> Jul 15 12:58:20 UTC 2010
>> r...@lokschuppen.zs64.net:/usr/obj/usr/src/sys/EISENBOOT  amd64
>> 
>> Once the box is up again, is it worthwhile trying ichwd again, should I try
>> and use SW_WATCHDOG, or forget about it?
> 
> Just test it more while having physical access before making any conclusions.
> There could be a number of radically different possibilities ranging from
> hardware peculiarities to configuration problems to pilot errors, etc.

I guess what I'm looking for is some confirmation that ichwd is working 
properly on this particular hardware: Asus Pundit P4 P5G41 with a G41 chipset.

Below are pciconf -lvb and dmesg:

hos...@pci0:0:0:0:  class=0x06 card=0x836d1043 chip=0x2e308086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = bridge
    subclass   = HOST-PCI
vgap...@pci0:0:2:0: class=0x03 card=0x836d1043 chip=0x2e328086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Intel G41 express graphics (PCIVEN_8086&DEV_2E32&SUBSYS_31031565&REV_033&115)'
    class      = display
    subclass   = VGA
    bar   [10] = type Memory, range 64, base 0xfe40, size 4194304, enabled
    bar   [18] = type Prefetchable Memory, range 64, base 0xe000, size 268435456, enabled
    bar   [20] = type I/O Port, range 32, base 0xbc00, size  8, enabled
vgap...@pci0:0:2:1: class=0x038000 card=0x836d1043 chip=0x2e338086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = display
    bar   [10] = type Memory, range 64, base 0xfe80, size 1048576, enabled
no...@pci0:0:27:0:  class=0x040300 card=0x82fe1043 chip=0x27d88086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'IDT High Definition Audio Driver  (BA101897)'
    class      = multimedia
    subclass   = HDA
    bar   [10] = type Memory, range 64, base 0xfe3f8000, size 16384, enabled
pc...@pci0:0:28:0:  class=0x060400 card=0x81791043 chip=0x27d08086 rev=0x01 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '82801G (ICH7 Family) PCIe Root Port'
    class      = bridge
    subclass   = PCI-PCI
pc...@pci0:0:28:2:  class=0x060400 card=0x81791043 chip=0x27d48086 rev=0x01 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '82801G (ICH7 Family) PCIe Root Port'
    class      = bridge
    subclass   = PCI-PCI
pc...@pci0:0:28:3:  class=0x060400 card=0x81791043 chip=0x27d68086 rev=0x01 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '82801G (ICH7 Family) PCIe Root Port'
    class      = bridge
    subclass   = PCI-PCI
uh...@pci0:0:29:0:  class=0x0c0300 card=0x81791043 chip=0x27c88086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801G (ICH7 Family) USB Universal Host Controller'
    class      = serial bus
    subclass   = USB
    bar   [20] = type I/O Port, range 32, base 0xb400, size 32, enabled
uh...@pci0:0:29:1:  class=0x0c0300 card=0x81791043 chip=0x27c98086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801G (ICH7 Family) USB Universal Host Controller'
    class      = serial bus
    subclass   = USB
    bar   [20] = type I/O Port, range 32, base 0xb480, size 32, enabled
uh...@pci0:0:29:2:  class=0x0c0300 card=0x81791043 chip=0x27ca8086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801G (ICH7 Family) USB Universal Host Controller'
    class      = serial bus
    subclass   = USB
    bar   [20] = type I/O Port, range 32, base 0xb800, size 32, enabled
uh...@pci0:0:29:3:  class=0x0c0300 card=0x81791043 chip=0x27cb8086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801G (ICH7 Family) USB Universal Host Controller'
    class      = serial bus
    subclass   = USB
    bar   [20] = type I/O Port, range 32, base 0xb880, size 32, enabled
eh...@pci0:0:29:7:  class=0x0c0320 card=0x81791043 chip=0x27cc8086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801G (ICH7 Family) USB 2.0 Enhanced Host Controller'
    class      = serial bus
    subclass   = USB
    bar   [10] = type Memory, range 32, base 0xfe3f7c00, size 1024, enabled
pc...@pci0:0:30:0:  class

Re: ichwd causes freeze instead of reset

2010-08-21 Thread Andriy Gapon

on 21/08/2010 23:33 Stefan Bethke said the following:
> Hi,
> 
> somewhat foolishly, I activated watchdogd and ichwd on a remote box, and
> while testing it (by suspending watchdogd), apparently the watchdog
> triggered.  But instead of resetting, the machine does not react anymore on
> the serial console.  I will have to wait until Monday to get physical access,
> so it might be hanging or just switched itself off; I have no way of telling
> right now.
> 
> ichwd probes as:
> ichwd0: <Intel ICH7 watchdog timer> on isa0
> ichwd0: Intel ICH7 watchdog timer (ICH7 or equivalent)
> ppc0: cannot reserve I/O port range
> 
> (not sure why ppc0 is getting involved at that point.)
> 
> FreeBSD lokschuppen.zs64.net 8.1-PRERELEASE FreeBSD 8.1-PRERELEASE #30: Thu
> Jul 15 12:58:20 UTC 2010
> r...@lokschuppen.zs64.net:/usr/obj/usr/src/sys/EISENBOOT  amd64
> 
> Once the box is up again, is it worthwhile trying ichwd again, should I try
> and use SW_WATCHDOG, or forget about it?

Just test it more while having physical access before making any conclusions.
There could be a number of radically different possibilities ranging from
hardware peculiarities to configuration problems to pilot errors, etc.
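
For such a test with physical access, a minimal configuration sketch might look like the following. The first line is meant for /boot/loader.conf and the other two for /etc/rc.conf; the 30 second timeout is an arbitrary example value, not something taken from this thread.

```shell
# /boot/loader.conf: load the Intel ICH watchdog driver, ichwd(4), at boot
ichwd_load="YES"

# /etc/rc.conf: start watchdogd(8) so the timer is patted regularly.
# "-t 30" asks for a 30 second timeout (example value, an assumption).
watchdogd_enable="YES"
watchdogd_flags="-t 30"
```

With this in place, suspending watchdogd as described above should let the timer expire, and what the board does next (reset, power off, or wait at a BIOS prompt) can be watched on the local console.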

-- 
Andriy Gapon

ichwd causes freeze instead of reset

2010-08-21 Thread Stefan Bethke

Hi,

somewhat foolishly, I activated watchdogd and ichwd on a remote box, and while
testing it (by suspending watchdogd), apparently the watchdog triggered.  But
instead of resetting, the machine does not react anymore on the serial console.
I will have to wait until Monday to get physical access, so it might be
hanging or just switched itself off; I have no way of telling right now.

ichwd probes as:
ichwd0: <Intel ICH7 watchdog timer> on isa0
ichwd0: Intel ICH7 watchdog timer (ICH7 or equivalent)
ppc0: cannot reserve I/O port range

(not sure why ppc0 is getting involved at that point.)

FreeBSD lokschuppen.zs64.net 8.1-PRERELEASE FreeBSD 8.1-PRERELEASE #30: Thu Jul
15 12:58:20 UTC 2010
r...@lokschuppen.zs64.net:/usr/obj/usr/src/sys/EISENBOOT  amd64

Once the box is up again, is it worthwhile trying ichwd again, should I try and
use SW_WATCHDOG, or forget about it?


Thanks,
Stefan

-- 
Stefan Bethke    Fon +49 151 14070811




Re: [HEADSUP]: Ports feature freeze for 8.1 now in effect

2010-06-18 Thread Ion-Mihai Tetcu

On Fri, 18 Jun 2010 14:10:28 +0200
Erwin Lansing  wrote:

> In preparation for 8.1-RELEASE, the ports tree is now in feature
> freeze.
> 
> Normal upgrade, new ports, and changes that only affect other branches
> are allowed without prior approval but with the extra Feature safe:
> yes tag in the commit message. Any commit that is sweeping, i.e.
> touches a large number of ports, infrastructural changes, commits to
> ports with unusually high number of dependent ports, and any other
> commit that requires the rebuilding of many packages is not allowed
> without prior explicit approval from portmgr after that date.
> 
> When in doubt, please do not hesitate to contact portmgr.


>>>> "any commit that requires the rebuilding of many packages"

And this time we will ask for instant back-out of everything that
should not have been committed in the first place.


If you have time, you can always help with unmaintained ports:
http://qat.tecnik93.com/index.php?action=failed_buildports&maintainer=ports%40freebsd.org&;
or even maintained ones:
http://qat.tecnik93.com/index.php?action=failed_buildports


Please help us get a good, stable package set for the release,

-- 
IOnut - Un^d^dregistered ;) FreeBSD "user"
  "Intellectual Property" is   nowhere near as valuable   as "Intellect"
FreeBSD committer -> ite...@freebsd.org, PGP Key ID 057E9F8B493A297B



[HEADSUP]: Ports feature freeze for 8.1 now in effect

2010-06-18 Thread Erwin Lansing

In preparation for 8.1-RELEASE, the ports tree is now in feature freeze.

Normal upgrade, new ports, and changes that only affect other branches
are allowed without prior approval but with the extra Feature safe: yes
tag in the commit message. Any commit that is sweeping, i.e. touches a
large number of ports, infrastructural changes, commits to ports with
unusually high number of dependent ports, and any other commit that
requires the rebuilding of many packages is not allowed without prior
explicit approval from portmgr after that date.

When in doubt, please do not hesitate to contact portmgr.

-- 
Erwin Lansing                                   http://droso.org
Prediction is very difficult
especially about the future                     er...@freebsd.org



Re: [HEADS UP] Ports feature freeze coming soon

2010-06-14 Thread Erwin Lansing

On Tue, Jun 08, 2010 at 02:20:53PM -0400, FreeBSD portmgr secretary wrote:
> In preparation for 8.1-RELEASE, the ports tree will be in feature freeze
> after release candidate 1 (RC1) is released, currently planned for June 11.

As you may have noticed, RC1 has not been released as yet, but the delay
is not expected to be more than a few days.  The ports feature freeze
will therefore be postponed until this Friday, June 18, 12pm UTC.  We do
still ask you to be conservative with your changes until then.

-erwin

> 
> If you have any commits with high impact planned, get them in the tree
> before then and if they require an experimental build, have a request for
> one in portmgr@ hands within the next few days.
> 
> Note that this again will be a feature freeze and not a full freeze.
> Normal upgrade, new ports, and changes that only affect other branches will
> be allowed without prior approval but with the extra Feature safe: yes tag
> in the commit message.  Any commit that is sweeping, i.e. touches a large
> number of ports, infrastructural changes, commits to ports with unusually
> high number of dependencies, and any other commit that requires the
> rebuilding of many packages will not be allowed without prior explicit
> approval from portmgr@ after that date.
> 
> Thomas
> with portmgr-secretary@ hat on
> 
> -- 
> Thomas Abthorpe   | FreeBSD Ports Management Team Secretary
> tabtho...@freebsd.org | portmgr-secret...@freebsd.org


-- 
Erwin Lansing                                   http://droso.org
Prediction is very difficult
especially about the future                     er...@freebsd.org



[HEADSUP] ports feature freeze starts soon

2010-06-08 Thread FreeBSD portmgr secretary

In preparation for 8.1-RELEASE, the ports tree will be in feature freeze
after release candidate 1 (RC1) is released, currently planned for June 11.

If you have any commits with high impact planned, get them in the tree
before then and if they require an experimental build, have a request for
one in portmgr@ hands within the next few days.

Note that this again will be a feature freeze and not a full freeze.
Normal upgrade, new ports, and changes that only affect other branches will
be allowed without prior approval but with the extra Feature safe: yes tag
in the commit message.  Any commit that is sweeping, i.e. touches a large
number of ports, infrastructural changes, commits to ports with unusually
high number of dependencies, and any other commit that requires the
rebuilding of many packages will not be allowed without prior explicit
approval from portmgr@ after that date.

Thomas
with portmgr-secretary@ hat on

-- 
Thomas Abthorpe | FreeBSD Ports Management Team Secretary
tabtho...@freebsd.org   | portmgr-secret...@freebsd.org



Re: Give freeze a chance

2010-05-18 Thread Renato Botelho

On Mon, May 17, 2010 at 5:29 PM, Thomas Abthorpe  wrote:
> The next wave of the challenge, fear, there is one more already
> composed to be released with 8.1!
>
> --
>
> Give Freeze a chance
>  with apologies to John Lennon et al
>
> Ev'rybody's talkin' 'bout
> portism, srcism, docism, cvsism, svnism, tagism
> This-ism, that-ism, ism ism ism
> All we are saying is give freeze a chance
> All we are saying is give freeze a chance
>
> C'mon
> Ev'rybody's talkin' 'bout
> re@, core@, doceng@, donations@, secteam@,
> marketing@, portmgr@, vendor-relations@
> All we are saying is give freeze a chance
> All we are saying is give freeze a chance
>
> Let me tell you now
> Ev'rybody's talkin' 'bout
> Revolution, evolution, i18n, l10n, documentation,
> Integration, administration, applications, congratulations
> All we are saying is give freeze a chance
> All we are saying is give freeze a chance
>
> Ev'rybody's talkin' 'bout
> Erwin Lansing, Mark Linimon, Martin Wilke,
> Pav Lucistnik, Florent Thoumie, Ion-Mihai Tetcu,
> Kris Kennaway, Joe Marcus Clarke, Thomas Abthorpe too
> All we are saying is give freeze a chance
> All we are saying is give freeze a chance

Nice, it reminds me of the old "Breaking the Ports" song...

http://www.mail-archive.com/freebsd-po...@freebsd.org/msg02907.html

-- 
Renato Botelho

Give freeze a chance

2010-05-17 Thread Thomas Abthorpe

The next wave of the challenge, fear, there is one more already
composed to be released with 8.1!

--

Give Freeze a chance
  with apologies to John Lennon et al

Ev'rybody's talkin' 'bout
portism, srcism, docism, cvsism, svnism, tagism
This-ism, that-ism, ism ism ism
All we are saying is give freeze a chance
All we are saying is give freeze a chance

C'mon
Ev'rybody's talkin' 'bout
re@, core@, doceng@, donations@, secteam@,
marketing@, portmgr@, vendor-relations@
All we are saying is give freeze a chance
All we are saying is give freeze a chance

Let me tell you now
Ev'rybody's talkin' 'bout
Revolution, evolution, i18n, l10n, documentation,
Integration, administration, applications, congratulations
All we are saying is give freeze a chance
All we are saying is give freeze a chance

Ev'rybody's talkin' 'bout
Erwin Lansing, Mark Linimon, Martin Wilke,
Pav Lucistnik, Florent Thoumie, Ion-Mihai Tetcu,
Kris Kennaway, Joe Marcus Clarke, Thomas Abthorpe too
All we are saying is give freeze a chance
All we are saying is give freeze a chance


Thomas

-- 
Thomas Abthorpe | FreeBSD Committer
tabtho...@freebsd.org   | http://people.freebsd.org/~tabthorpe



Re: Freeze on my laptop.

2010-04-14 Thread Masoom Shaikh

On Wed, Apr 14, 2010 at 3:31 AM, Demelier David
 wrote:
> Hi,
>
> I'm so sad because FreeBSD is the one system that runs almost perfectly on
> my laptop. But it freezes. Sometimes I am just doing something ordinary, like
> clicking a link in Firefox or opening a terminal, and then it freezes.
>
> There are no messages, no reboot, nothing. I can't tell where that comes
> from.
>
> I'm running 8.0-STABLE on an HP ProBook 4510s.
>
> Kind regards,
> --
> Demelier David

Do you have dumpdev set in your rc.conf? Try setting it to AUTO; you
might get a core dump on the next reboot:
dumpdev="AUTO"
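
A small sketch for checking whether that setting is already present. The helper name has_dumpdev and the RCCONF override variable are made up for illustration; the check only looks for an uncommented dumpdev= line and does not validate its value.

```shell
#!/bin/sh
# has_dumpdev is a hypothetical helper: it succeeds if the given
# rc.conf-style file contains an uncommented dumpdev= assignment.
has_dumpdev() {
    grep -q '^[[:space:]]*dumpdev=' "$1"
}

# RCCONF is an illustrative override; the default is the system rc.conf.
rcconf=${RCCONF:-/etc/rc.conf}
if [ -r "$rcconf" ] && has_dumpdev "$rcconf"; then
    echo "dumpdev is set in $rcconf"
else
    echo "dumpdev is not set in $rcconf"
fi
```

If a dump is taken, savecore(8) should store it under /var/crash by default during the next boot.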

There is very little in your mail to infer anything from. Do you use
wireless network access? Which graphics card do you have?

these threads might be of help to you

http://lists.freebsd.org/pipermail/freebsd-questions/2010-March/214339.html
http://lists.freebsd.org/pipermail/freebsd-stable/2010-April/056096.html
http://lists.freebsd.org/pipermail/freebsd-hackers/2006-April/016107.html

Re: Freeze on my laptop.

2010-04-13 Thread Demelier David

On Wed, Apr 14, 2010 at 09:43:22AM +1000, Andrew Snow wrote:
> Demelier David wrote:
> > I'm so sad because FreeBSD is the one system that runs almost perfectly on
> > my laptop. But it freezes. Sometimes I am just doing something ordinary, like
> > clicking a link in Firefox or opening a terminal, and then it freezes.
> 
> Sounds like a problem with the X graphics driver..  when it next 
> happens, can you press Alt+F1 or Ctrl+Alt+F1 to get back to a text console?
> 
> You might like to try upgrading your version of X to a newer version.
> 
> - Andrew

I'll try with vesa. Maybe you're right, but the last(1) command shows many
`crash' entries. And I can't switch back to the console.

The odd thing is that it often happens when I use GTK-based applications
(pidgin, firefox).

-- 
Demelier David

Re: Freeze on my laptop.

2010-04-13 Thread Andrew Snow


Demelier David wrote:
> I'm so sad because FreeBSD is the one system that runs almost perfectly on
> my laptop. But it freezes. Sometimes I am just doing something ordinary, like
> clicking a link in Firefox or opening a terminal, and then it freezes.


Sounds like a problem with the X graphics driver..  when it next 
happens, can you press Alt+F1 or Ctrl+Alt+F1 to get back to a text console?


You might like to try upgrading your version of X to a newer version.

- Andrew

Freeze on my laptop.

2010-04-13 Thread Demelier David

Hi,

I'm so sad because FreeBSD is the one system that runs almost perfectly on
my laptop. But it freezes. Sometimes I am just doing something ordinary, like
clicking a link in Firefox or opening a terminal, and then it freezes.

There are no messages, no reboot, nothing. I can't tell where that comes
from.

I'm running 8.0-STABLE on an HP ProBook 4510s.

Kind regards,
-- 
Demelier David

Re: Freeze on closing terminal that runs wpa_supplicant

2010-03-19 Thread Paul B Mahol

On 3/17/10, Mathias Sogorski  wrote:
> Hello!
> I am running 8.0-RELEASE on a notebook with the Intel 3945 WiFi. I usually
> start wpa_supplicant [...]& on a terminal when entering gnome followed by
> the dhcpcd call to use the WiFi connection. After having finished work and
> closing the terminal that runs wpa_supplicant, everything freezes and I have
> to turn the power off. Any suggestions?

That should not happen, so please report the bug. Did you manage to get a
backtrace? Did the kernel actually crash?