date:20120214

Re: freebsd 9-stable TOP problem from around Jan 10

2012-02-14 Thread Julian Elischer


On 2/14/12 4:20 PM, Jeremy Chadwick wrote:

On Tue, Feb 14, 2012 at 03:35:01PM -0800, Julian Elischer wrote:

On 2/14/12 10:38 AM, Kevin Oberman wrote:

On Tue, Feb 14, 2012 at 12:23 AM, Julian Elischer   wrote:

Has anyone else seen a problem with top -H -S?

After a short while the screen gets more and more corrupted.

Hitting ^L or turning off the S and H modes helps, for a while.

If this is a known fixed problem, let me know, but I need to co-ordinate with
others to upgrade the machine in question.

Not seeing it here on 9-stable. Could it be a display issue? I am
using gnome-terminal with TERM defined as 'xterm'.

Yeah, I'm on a Mac with iTerm, but running through 'screen'.

It's never been a problem before, only since we upgraded to 9-stable.

If you remove GNU screen from the picture does the problem go away?  If
so, I'm not surprised.  :-)

Make sure that when you're using GNU screen, that all shells launched
"under/within" screen have TERM=screen.  If they don't, then this is
almost certainly the problem -- GNU screen "translates" between terminal
types, meaning it translates its own terminal type ("screen") into
whatever TERM is currently attached ("xterm", "iterm", whatever).  See
the last 4 paragraphs of my post here to understand what exactly GNU
screen is doing:

http://lists.freebsd.org/pipermail/freebsd-stable/2011-June/063052.html

So, in general, make sure your dotfiles and so on don't mess about with
the $TERM environment variable and you should generally be okay.
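
The TERM advice above can be checked mechanically. The following is a minimal sketch, not part of the original post: it assumes GNU screen exports the STY variable to the shells it starts, and the function name check_term is made up for illustration.

```python
import os


def check_term(env):
    """Return a short diagnosis of TERM for a shell that may be inside GNU screen.

    Assumption: GNU screen exports STY to the processes it starts, so a
    non-empty STY is treated as "this shell runs inside screen".
    """
    term = env.get("TERM", "")
    inside_screen = bool(env.get("STY"))
    if inside_screen and not term.startswith("screen"):
        return "inside screen but TERM=%r; a dotfile may have overridden it" % term
    if not inside_screen and term.startswith("screen"):
        return "TERM=%r but STY is unset; not a local screen window" % term
    return "TERM=%r looks consistent" % term


if __name__ == "__main__":
    # Diagnose the environment of the current shell.
    print(check_term(os.environ))
```

Running it inside and outside a screen window shows whether a dotfile has changed TERM. A remote login made from a screen window normally carries TERM=screen without STY, so the second message is not necessarily an error there.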

It seems to have stopped doing it for no apparent reason.

I will keep an eye on it, and save this email away for when it does it
again.



If within GNU screen TERM=screen and you see the problem, but outside of
screen you use TERM=xterm (or something else) but don't see the problem,
then I would almost certainly blame GNU screen.  If you're looking for
something that simply keeps a terminal running in the background, try
nohup or tmux.

Alternately, possibly someone added a "screen" entry to /etc/termcap on
RELENG_9?  I don't use 9 so I have no way to confirm this, but on 8
there is no such entry.


SC|screen|VT 100/ANSI X3.64 virtual terminal:\

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: sysutils/pftop on 9.x+

2012-02-14 Thread Greg Rivers


On Tue, 14 Feb 2012, Florian Smeets wrote:


On 14.02.12 17:14, Fabian Keil wrote:

Greg Rivers  wrote:


sysutils/pftop was marked broken on 9.x and above last March[1].  Are
there any plans to fix it soon?  It's a really handy utility.

[1]
http://www.freebsd.org/cgi/cvsweb.cgi/ports/sysutils/pftop/Makefile?rev=1.17


Please have a look at:
http://www.freebsd.org/cgi/query-pr.cgi?pr=155938

Note that the currently working fix is in the audit trail;
the original fix stopped working after the PF update.


The PR was closed by mistake, I'll take care of it.



Thanks for committing the fix, Florian.  pftop now builds and runs fine; 
tested on recent 9.0-STABLE amd64.  Thanks also to Patrick for his input 
and especially to Fabian for creating the patches and filing the PR.


--
Greg Rivers

Re: disk devices speed is ugly

2012-02-14 Thread Adam Vande More

On Tue, Feb 14, 2012 at 10:50 PM, Scott Long  wrote:

>
> Any filesystem that uses bread/bwrite/cluster_read is already using the
> "generic caching subsystem" that you propose.  This includes UDF, CD9660,
> MSDOS, NTFS, XFS, ReiserFS, EXT2FS, and HPFS, i.e. every local storage
> filesystem in the tree except for ZFS.  Not all of them implement
> VOP_GETPAGES/VOP_PUTPAGES, but those are just optimizations for the vnode
> pager, not requirements for using buffer-cache services on block devices.
>  As Kostik pointed out in a parallel email, the only thing that was removed
> from FreeBSD was the userland interface to cached devices via /dev nodes.
>

Does this mean the Architecture Handbook page is wrong?:

http://www.freebsd.org/doc/en/books/arch-handbook/driverbasics-block.html

-- 
Adam Vande More

[releng_8 tinderbox] failure on ia64/ia64

2012-02-14 Thread FreeBSD Tinderbox

TB --- 2012-02-15 04:26:47 - tinderbox 2.9 running on freebsd-legacy2.sentex.ca
TB --- 2012-02-15 04:26:47 - starting RELENG_8 tinderbox run for ia64/ia64
TB --- 2012-02-15 04:26:47 - cleaning the object tree
TB --- 2012-02-15 04:27:08 - cvsupping the source tree
TB --- 2012-02-15 04:27:08 - /usr/bin/csup -z -r 3 -g -L 1 -h cvsup.sentex.ca 
/tinderbox/RELENG_8/ia64/ia64/supfile
TB --- 2012-02-15 04:32:32 - building world
TB --- 2012-02-15 04:32:32 - CROSS_BUILD_TESTING=YES
TB --- 2012-02-15 04:32:32 - MAKEOBJDIRPREFIX=/obj
TB --- 2012-02-15 04:32:32 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2012-02-15 04:32:32 - SRCCONF=/dev/null
TB --- 2012-02-15 04:32:32 - TARGET=ia64
TB --- 2012-02-15 04:32:32 - TARGET_ARCH=ia64
TB --- 2012-02-15 04:32:32 - TZ=UTC
TB --- 2012-02-15 04:32:32 - __MAKE_CONF=/dev/null
TB --- 2012-02-15 04:32:32 - cd /src
TB --- 2012-02-15 04:32:32 - /usr/bin/make -B buildworld
>>> World build started on Wed Feb 15 04:32:33 UTC 2012
>>> Rebuilding the temporary build tree
>>> stage 1.1: legacy release compatibility shims
>>> stage 1.2: bootstrap tools
>>> stage 2.1: cleaning up the object tree
>>> stage 2.2: rebuilding the object tree
>>> stage 2.3: build tools
>>> stage 3: cross tools
>>> stage 4.1: building includes
>>> stage 4.2: building libraries
>>> stage 4.3: make dependencies
>>> stage 4.4: building everything
>>> World build completed on Wed Feb 15 05:32:53 UTC 2012
TB --- 2012-02-15 05:32:53 - generating LINT kernel config
TB --- 2012-02-15 05:32:53 - cd /src/sys/ia64/conf
TB --- 2012-02-15 05:32:53 - /usr/bin/make -B LINT
TB --- 2012-02-15 05:32:53 - cd /src/sys/ia64/conf
TB --- 2012-02-15 05:32:53 - /usr/sbin/config -m LINT
TB --- 2012-02-15 05:32:53 - building LINT kernel
TB --- 2012-02-15 05:32:53 - CROSS_BUILD_TESTING=YES
TB --- 2012-02-15 05:32:53 - MAKEOBJDIRPREFIX=/obj
TB --- 2012-02-15 05:32:53 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2012-02-15 05:32:53 - SRCCONF=/dev/null
TB --- 2012-02-15 05:32:53 - TARGET=ia64
TB --- 2012-02-15 05:32:53 - TARGET_ARCH=ia64
TB --- 2012-02-15 05:32:53 - TZ=UTC
TB --- 2012-02-15 05:32:53 - __MAKE_CONF=/dev/null
TB --- 2012-02-15 05:32:53 - cd /src
TB --- 2012-02-15 05:32:53 - /usr/bin/make -B buildkernel KERNCONF=LINT
>>> Kernel build for LINT started on Wed Feb 15 05:32:53 UTC 2012
>>> stage 1: configuring the kernel
>>> stage 2.1: cleaning up the object tree
>>> stage 2.2: rebuilding the object tree
>>> stage 2.3: build tools
>>> stage 3.1: making dependencies
>>> stage 3.2: building everything
[...]
cc -c -O2 -pipe -fno-strict-aliasing  -std=c99  -Wall -Wredundant-decls 
-Wnested-externs -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith 
-Winline -Wcast-qual  -Wundef -Wno-pointer-sign -fformat-extensions -nostdinc  
-I. -I/src/sys -I/src/sys/contrib/altq -I/src/sys/contrib/ia64/libuwx/src 
-D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common 
-finline-limit=15000 --param inline-unit-growth=100 --param 
large-function-growth=1000 -fno-builtin -mconstant-gp -ffixed-r13 
-mfixed-range=f32-f127 -fpic -ffreestanding -Werror  
/src/sys/dev/mxge/mxge_ethp_z8e.c
cc -c -O2 -pipe -fno-strict-aliasing  -std=c99  -Wall -Wredundant-decls 
-Wnested-externs -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith 
-Winline -Wcast-qual  -Wundef -Wno-pointer-sign -fformat-extensions -nostdinc  
-I. -I/src/sys -I/src/sys/contrib/altq -I/src/sys/contrib/ia64/libuwx/src 
-D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common 
-finline-limit=15000 --param inline-unit-growth=100 --param 
large-function-growth=1000 -fno-builtin -mconstant-gp -ffixed-r13 
-mfixed-range=f32-f127 -fpic -ffreestanding -Werror  
/src/sys/dev/mxge/mxge_rss_eth_z8e.c
cc -c -O2 -pipe -fno-strict-aliasing  -std=c99  -Wall -Wredundant-decls 
-Wnested-externs -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith 
-Winline -Wcast-qual  -Wundef -Wno-pointer-sign -fformat-extensions -nostdinc  
-I. -I/src/sys -I/src/sys/contrib/altq -I/src/sys/contrib/ia64/libuwx/src 
-D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common 
-finline-limit=15000 --param inline-unit-growth=100 --param 
large-function-growth=1000 -fno-builtin -mconstant-gp -ffixed-r13 
-mfixed-range=f32-f127 -fpic -ffreestanding -Werror  
/src/sys/dev/mxge/mxge_rss_ethp_z8e.c
cc -c -O2 -pipe -fno-strict-aliasing  -std=c99  -Wall -Wredundant-decls 
-Wnested-externs -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith 
-Winline -Wcast-qual  -Wundef -Wno-pointer-sign -fformat-extensions -nostdinc  
-I. -I/src/sys -I/src/sys/contrib/altq -I/src/sys/contrib/ia64/libuwx/src 
-D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common 
-finline-limit=15000 --param inline-unit-growth=100 --param 
large-function-growth=1000 -fno-builtin -mconstant-gp -ffixed-r13 
-mfixed-range=f32-f127 -fpic -ffreestanding -Werror  /src/sys/dev/my/if_my.c
cc -c -O2 -pipe -fno-strict-aliasing  -std=c99  -Wall -Wredundant-decls 
-Wnested-externs -Wstrict-prototype

Re: 6.2-Release ..ish.. CF + ata == freeze?

2012-02-14 Thread jflemingeds

Two of the three CF cards are very new, less than six months old.

I think around 65-70 percent is in use. This number doesn't change unless the
user dumps data in a home dir, which hasn't been the case so far.

You are correct that only writes are failing. Msgbuf has more than what I
pasted, but I'm pretty sure it's just more of the same errors. I'll
double-check.

The other slices are very small. One is 35 MB, the other is 100-odd MB.
Partition h is 1.2 GB.

I don't know if I'll be able to try the dd test, for a few reasons, but I'll
check it out. Let me ask you this: say zeroing out the drive works without
error. Does that tell me anything?

I also don't have access to SMART tools, as this is basically a closed system
and the vendor would never give us access to a compiler. Granted, I haven't
tried just throwing on gcc from 6.2. I could play with that, or maybe, since
said vendor's dev team is keeping track of this thread, they could provide
said binary :).

I really don't like the idea of replacing hardware as I'm looking at around 200 
boxes. I really hope it doesn't come to that. 

Thanks for the reply!


-Original Message-
From: Jeremy Chadwick 
Date: Mon, 13 Feb 2012 21:18:28 
To: john fleming
Cc: freebsd-stable@freebsd.org
Subject: Re: 6.2-Release ..ish.. CF + ata == freeze?

On Mon, Feb 13, 2012 at 08:43:08PM -0800, john fleming wrote:
> Just thought i would post over here as i'm not getting a warm fuzzy from 
> checkpoint about being able to find the root cause of an issue. I have a 
> large install base of IPSO checkpoint firewalls, which are based on FreeBSD 
> 6.2. I've had 3 firewalls hang basically the same way, with something that 
> looks like a filesystem issue or an issue with a CF card. 

FreeBSD 6.2 was EOL'd in early-to-mid-2008.  The ATA driver has changed
significantly since then (present-day uses CAM).

> Does anyone happen to know of any bugs (i've been looking around) that could 
> cause something like that? Granted, it could be a batch of bad CF cards, but 
> its odd that i'm seeing the same thing on 3 different boxes and once rebooted 
> they seem ok.
>
> Also, is it possible to get useful info from the ATA controller when things go 
> south like this from the ddb prompt?

Not particularly.  What's shown below indicates that the driver had
issued some form of ATA write command (there are multiple kinds per ATA
specification), and either the underlying media (CF/disk) or controller
stalled/locked up/took too long.  I forget what the timeout value is in
6.2; I can't be bothered to remember such from 6 years ago.  :-)

> This is what shows in show msgbuf
> ad0: timeout waiting to issue command
> ad0: error issuing WRITE command
> ad0: timeout waiting to issue command
> ad0: error issuing WRITE command
> ad0: timeout waiting to issue command
> ad0: error issuing WRITE command
> ad0: timeout waiting to issue command
> ad0: error issuing WRITE command
> g_vfs_done():ad0s4h[WRITE(offset=33849344, length=131072)]error = 5 
> g_vfs_done():ad0s4h[WRITE(offset=33980416, length=131072)]error = 5 
> g_vfs_done():ad0s4h[WRITE(offset=34111488, length=131072)]error = 5
> g_vfs_done():ad0s4h[WRITE(offset=34242560, length=131072)]error = 5 
> g_vfs_done():ad0s4h[WRITE(offset=34373632, length=131072)]error = 5 

error 5 = EIO = Input/output error.  But this isn't too big of a
surprise given the timeouts you see prior.
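
To make the reading of these msgbuf lines concrete, here is a small sketch (an illustration added here, not something from the original exchange) that parses the g_vfs_done() lines quoted above, maps error 5 to its errno name, and checks whether the failed writes cover one contiguous region. The helper names parse_failures and is_contiguous are made up, and the offsets are assumed to be relative to ad0s4h rather than to the whole card.

```python
import errno
import re

# Matches lines such as:
#   g_vfs_done():ad0s4h[WRITE(offset=33849344, length=131072)]error = 5
PATTERN = re.compile(
    r"g_vfs_done\(\):(?P<dev>\w+)\[(?P<op>\w+)\(offset=(?P<off>\d+), "
    r"length=(?P<len>\d+)\)\]error = (?P<err>\d+)"
)


def parse_failures(text):
    """Return a list of (device, op, offset, length, errno_name) tuples."""
    rows = []
    for m in PATTERN.finditer(text):
        err = int(m.group("err"))
        rows.append((m.group("dev"), m.group("op"), int(m.group("off")),
                     int(m.group("len")), errno.errorcode.get(err, "?")))
    return rows


def is_contiguous(rows):
    """True if every failed request starts where the previous one ended."""
    return all(prev[2] + prev[3] == cur[2] for prev, cur in zip(rows, rows[1:]))


# The five lines from the quoted msgbuf output.
LOG = """\
g_vfs_done():ad0s4h[WRITE(offset=33849344, length=131072)]error = 5
g_vfs_done():ad0s4h[WRITE(offset=33980416, length=131072)]error = 5
g_vfs_done():ad0s4h[WRITE(offset=34111488, length=131072)]error = 5
g_vfs_done():ad0s4h[WRITE(offset=34242560, length=131072)]error = 5
g_vfs_done():ad0s4h[WRITE(offset=34373632, length=131072)]error = 5
"""

rows = parse_failures(LOG)
# Prints: EIO True 655360
print(rows[0][4], is_contiguous(rows), sum(r[3] for r in rows))
```

For the lines shown, the five failed writes are back-to-back 128 KiB requests, 640 KiB in total. That is what one large sequential write stalling would look like, but it does not by itself say whether the card or the controller is at fault.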

Are these CF cards brand new -- meaning, are they completely unused
(having never had any writes done to them), or have they been in use a
while?  I'm betting they've been in use a while, and have probably been
doing many writes over the years.

Two things to note here:

1) The errors you've shown are only happening on writes, not reads.  Of
course if you omitted information then this isn't an accurate statement.
2) Timeouts are seen when issuing writes to some LBA regions.

How full is the CF card, disk-space-wise?  Not just ad0s4h, I'm talking
about the entire card.  How much space is roughly available?  They're
very small CF cards (1.8GByte roughly), and the less space available,
the less effectiveness of wear levelling (and in some cases the slower
the writes are).

Reason I ask: given that these are CF cards, this smells of cards which
are simply "worn down".  CF cards have limited numbers of writes, and
the card may be "freaking out" internally when attempting to write to
some LBAs which map to CF sectors that are, in effect, "bad".  The CF
cards' ECC implementation may be buggy, or may simply be "spinning hard"
for too long.  You can read about this sort of behaviour on Wikipedia's
CompactFlash article.

You wouldn't be able to verify this with dd if=/dev/ad0, because those
are read operations.  You could zero the media (dd if=/dev/zero
of=/dev/ad0) as a form of verification if you wanted.

Do you happen to know if these CF cards support SMART?  If so,
installing smartmontools (version 5.42 or newer please) and providing
output from

Re: disk devices speed is ugly

2012-02-14 Thread Scott Long

On Feb 14, 2012, at 1:02 PM, Peter Jeremy wrote:

> On 2012-Feb-13 08:28:21 -0500, Gary Palmer  wrote:
>> The filesystem is the *BEST* place to do caching.  It knows what metadata
>> is most effective to cache and what other data (e.g. file contents) doesn't
>> need to be cached.
> 
> Agreed.
> 
>> Any attempt to do this in layers between the FS and
>> the disk won't achieve the same gains as a properly written filesystem. 
> 
> Agreed - but traditionally, Unix uses this approach via block devices.
> For various reasons, FreeBSD moved caching into UFS and removed block
> devices.  Unfortunately, this means that any FS that wants caching has
> to implement its own - and currently only UFS & ZFS do.
> 
> What would be nice is a generic caching subsystem that any FS can use
> - similar to the old block devices but with hooks to allow the FS to
> request read-ahead, advise of unwanted blocks and ability to flush
> dirty blocks in a requested order with the equivalent of barriers
> (request Y will not occur until preceding request X has been
> committed to stable media).  This would allow filesystems to regain
> the benefits of block devices with minimal effort and then improve
> performance & cache efficiency with additional work.
> 

Any filesystem that uses bread/bwrite/cluster_read is already using the 
"generic caching subsystem" that you propose.  This includes UDF, CD9660, 
MSDOS, NTFS, XFS, ReiserFS, EXT2FS, and HPFS, i.e. every local storage 
filesystem in the tree except for ZFS.  Not all of them implement 
VOP_GETPAGES/VOP_PUTPAGES, but those are just optimizations for the vnode 
pager, not requirements for using buffer-cache services on block devices.  As 
Kostik pointed out in a parallel email, the only thing that was removed from 
FreeBSD was the userland interface to cached devices via /dev nodes.  This has 
nothing to do with filesystems, though I suppose that could maybe sorta kinda 
be an issue for FUSE?

ZFS isn't in this list because it implements its own private buffer/cache (the 
ARC) that understands the special requirements of ZFS.  There are good and bad 
aspects to this, noted below.

> One downside of the "each FS does its own caching" is that the caches
> are all separate and need careful integration into the VM subsystem to
> prevent starvation (eg past problems with UFS starving ZFS L2ARC).
> 

I'm not sure what you mean here.  The ARC is limited by available wired memory; 
attempts to allocate such memory will evict pages from the buffer cache as 
necessary, until all available RAM is consumed.  If anything, ZFS starves the 
rest of the system, not the other way around, and that's simply because the ARC 
isn't integrated with the normal VM.  Such integration is extremely hard and 
has nothing to do with having a generic caching subsystem.

Scott


Re: 9-stable - ifmedia_set: no match for 0x0/0xfffffff

2012-02-14 Thread YongHyeon PYUN

On Sun, Jan 29, 2012 at 01:19:40PM +0900, Randy Bush wrote:
> > What happens if you set hw.bge.allow_asf to 0 and use auto-negotiation
> > on both sides?
> 
> it works!  the switch was already auto-neg, and i forced auto-neg on the
> server side.
> 

Apart from the suspend/resume issue, bge(4) still needs more code to
handle controllers with ASF/IPMI firmware.  This part is mostly
undocumented and hard to experiment with due to lack of hardware access.
The current IPMI/ASF handling code shows mixed results, and setting
hw.bge.allow_asf to 0 will break IPMI support.

> thanks.  this was not pleasant.  did i remember to whine that i am in
> tokyo and the server is on the beast coast of the states?  :)
> 
> i think a bit of a warning about hw.bge.allow_asf in UPDATING might help
> folk.
> 
> thank you *very* much for your help.
> 
> randy

Re: CARP carpdev

2012-02-14 Thread Bjoern A. Zeeb


On 14. Feb 2012, at 22:04 , Hugo Silva wrote:

> On 02/14/12 17:33, Freddie Cash wrote:
>> On Tue, Feb 14, 2012 at 8:56 AM, Hugo Silva  wrote:
>>> Looks like there's been conversations about porting this to FreeBSD since at
>>> least 2007.
>>> 
>>> Are there any plans to have ifconfig carpdev available in 9.0-STABLE?
>> 
>> CARP support has been redone in 10-CURRENT, removing the whole "carp0"
>> pseudo-interface support, and just enabling the CARP protocol on the
>> existing network interfaces. This includes the equivalent of "carpdev"
>> support.
>> 
>> Search the -current archives for more information, CFT, and so on.
>> 
>> I don't recall seeing anything about specific plans to MFC to
>> stable/9, but could be mis-remembering things.
>> 
> 
> 
> http://svnweb.freebsd.org/base?view=revision&revision=228571
> 
> The single IP limitation may be a problem in some locations..
> 
> Did not find anything about a possible MFC either. glebius@ is cc'd, perhaps 
> he can add something, but based on 
> http://svn.freebsd.org/base/stable/9/UPDATING, I don't think it's been MFCd 
> (there's a primer for the new carp in current's UPDATING)


There are no plans to MFC, given that it changes things significantly.

I do wonder, however, if someone wants to set up a user branch in SVN to
provide regular patchsets for stable/9 and maybe even stable/8 (8.3R),
to help people who are not going to HEAD?

-- 
Bjoern A. Zeeb You have to have visions!
   It does not matter how good you are. It matters what good you do!


[releng_8 tinderbox] failure on ia64/ia64

2012-02-14 Thread FreeBSD Tinderbox

TB --- 2012-02-15 00:31:23 - tinderbox 2.9 running on freebsd-legacy2.sentex.ca
TB --- 2012-02-15 00:31:23 - starting RELENG_8 tinderbox run for ia64/ia64
TB --- 2012-02-15 00:31:23 - cleaning the object tree
TB --- 2012-02-15 00:31:23 - cvsupping the source tree
TB --- 2012-02-15 00:31:23 - /usr/bin/csup -z -r 3 -g -L 1 -h cvsup.sentex.ca 
/tinderbox/RELENG_8/ia64/ia64/supfile
TB --- 2012-02-15 00:36:50 - building world
TB --- 2012-02-15 00:36:50 - CROSS_BUILD_TESTING=YES
TB --- 2012-02-15 00:36:50 - MAKEOBJDIRPREFIX=/obj
TB --- 2012-02-15 00:36:50 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2012-02-15 00:36:50 - SRCCONF=/dev/null
TB --- 2012-02-15 00:36:50 - TARGET=ia64
TB --- 2012-02-15 00:36:50 - TARGET_ARCH=ia64
TB --- 2012-02-15 00:36:50 - TZ=UTC
TB --- 2012-02-15 00:36:50 - __MAKE_CONF=/dev/null
TB --- 2012-02-15 00:36:50 - cd /src
TB --- 2012-02-15 00:36:50 - /usr/bin/make -B buildworld
>>> World build started on Wed Feb 15 00:36:50 UTC 2012
>>> Rebuilding the temporary build tree
>>> stage 1.1: legacy release compatibility shims
>>> stage 1.2: bootstrap tools
>>> stage 2.1: cleaning up the object tree
>>> stage 2.2: rebuilding the object tree
>>> stage 2.3: build tools
>>> stage 3: cross tools
>>> stage 4.1: building includes
>>> stage 4.2: building libraries
>>> stage 4.3: make dependencies
>>> stage 4.4: building everything
>>> World build completed on Wed Feb 15 01:37:12 UTC 2012
TB --- 2012-02-15 01:37:12 - generating LINT kernel config
TB --- 2012-02-15 01:37:12 - cd /src/sys/ia64/conf
TB --- 2012-02-15 01:37:12 - /usr/bin/make -B LINT
TB --- 2012-02-15 01:37:12 - cd /src/sys/ia64/conf
TB --- 2012-02-15 01:37:12 - /usr/sbin/config -m LINT
TB --- 2012-02-15 01:37:12 - building LINT kernel
TB --- 2012-02-15 01:37:12 - CROSS_BUILD_TESTING=YES
TB --- 2012-02-15 01:37:12 - MAKEOBJDIRPREFIX=/obj
TB --- 2012-02-15 01:37:12 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2012-02-15 01:37:12 - SRCCONF=/dev/null
TB --- 2012-02-15 01:37:12 - TARGET=ia64
TB --- 2012-02-15 01:37:12 - TARGET_ARCH=ia64
TB --- 2012-02-15 01:37:12 - TZ=UTC
TB --- 2012-02-15 01:37:12 - __MAKE_CONF=/dev/null
TB --- 2012-02-15 01:37:12 - cd /src
TB --- 2012-02-15 01:37:12 - /usr/bin/make -B buildkernel KERNCONF=LINT
>>> Kernel build for LINT started on Wed Feb 15 01:37:12 UTC 2012
>>> stage 1: configuring the kernel
>>> stage 2.1: cleaning up the object tree
>>> stage 2.2: rebuilding the object tree
>>> stage 2.3: build tools
>>> stage 3.1: making dependencies
>>> stage 3.2: building everything
[...]
cc -c -O2 -pipe -fno-strict-aliasing  -std=c99  -Wall -Wredundant-decls 
-Wnested-externs -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith 
-Winline -Wcast-qual  -Wundef -Wno-pointer-sign -fformat-extensions -nostdinc  
-I. -I/src/sys -I/src/sys/contrib/altq -I/src/sys/contrib/ia64/libuwx/src 
-D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common 
-finline-limit=15000 --param inline-unit-growth=100 --param 
large-function-growth=1000 -fno-builtin -mconstant-gp -ffixed-r13 
-mfixed-range=f32-f127 -fpic -ffreestanding -Werror  
/src/sys/dev/mxge/mxge_ethp_z8e.c
cc -c -O2 -pipe -fno-strict-aliasing  -std=c99  -Wall -Wredundant-decls 
-Wnested-externs -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith 
-Winline -Wcast-qual  -Wundef -Wno-pointer-sign -fformat-extensions -nostdinc  
-I. -I/src/sys -I/src/sys/contrib/altq -I/src/sys/contrib/ia64/libuwx/src 
-D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common 
-finline-limit=15000 --param inline-unit-growth=100 --param 
large-function-growth=1000 -fno-builtin -mconstant-gp -ffixed-r13 
-mfixed-range=f32-f127 -fpic -ffreestanding -Werror  
/src/sys/dev/mxge/mxge_rss_eth_z8e.c
cc -c -O2 -pipe -fno-strict-aliasing  -std=c99  -Wall -Wredundant-decls 
-Wnested-externs -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith 
-Winline -Wcast-qual  -Wundef -Wno-pointer-sign -fformat-extensions -nostdinc  
-I. -I/src/sys -I/src/sys/contrib/altq -I/src/sys/contrib/ia64/libuwx/src 
-D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common 
-finline-limit=15000 --param inline-unit-growth=100 --param 
large-function-growth=1000 -fno-builtin -mconstant-gp -ffixed-r13 
-mfixed-range=f32-f127 -fpic -ffreestanding -Werror  
/src/sys/dev/mxge/mxge_rss_ethp_z8e.c
cc -c -O2 -pipe -fno-strict-aliasing  -std=c99  -Wall -Wredundant-decls 
-Wnested-externs -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith 
-Winline -Wcast-qual  -Wundef -Wno-pointer-sign -fformat-extensions -nostdinc  
-I. -I/src/sys -I/src/sys/contrib/altq -I/src/sys/contrib/ia64/libuwx/src 
-D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common 
-finline-limit=15000 --param inline-unit-growth=100 --param 
large-function-growth=1000 -fno-builtin -mconstant-gp -ffixed-r13 
-mfixed-range=f32-f127 -fpic -ffreestanding -Werror  /src/sys/dev/my/if_my.c
cc -c -O2 -pipe -fno-strict-aliasing  -std=c99  -Wall -Wredundant-decls 
-Wnested-externs -Wstrict-prototype

Re: disk devices speed is ugly

2012-02-14 Thread Konstantin Belousov

On Wed, Feb 15, 2012 at 07:02:58AM +1100, Peter Jeremy wrote:
> On 2012-Feb-13 08:28:21 -0500, Gary Palmer  wrote:
> >The filesystem is the *BEST* place to do caching.  It knows what metadata
> >is most effective to cache and what other data (e.g. file contents) doesn't
> >need to be cached.
> 
> Agreed.
> 
> >  Any attempt to do this in layers between the FS and
> >the disk won't achieve the same gains as a properly written filesystem. 
> 
> Agreed - but traditionally, Unix uses this approach via block devices.
> For various reasons, FreeBSD moved caching into UFS and removed block
> devices.  Unfortunately, this means that any FS that wants caching has
> to implement its own - and currently only UFS & ZFS do.
Block caching is still there; only the user-accessible interface was removed.
UFS utilizes the buffer cache of the device that carries the volume
for metadata caching. There are some memory areas in UFS that can be
classified as caches of their own, but they exist mostly to support
operation, not caching (e.g. the inodeblock copy accompanying each
inode).

> 
> What would be nice is a generic caching subsystem that any FS can use
> - similar to the old block devices but with hooks to allow the FS to
> request read-ahead, advise of unwanted blocks and ability to flush
> dirty blocks in a requested order with the equivalent of barriers
> (request Y will not occur until preceding request X has been
> committed to stable media).  This would allow filesystems to regain
> the benefits of block devices with minimal effort and then improve
> performance & cache efficiency with additional work.
> 
> One downside of the "each FS does its own caching" is that the caches
> are all separate and need careful integration into the VM subsystem to
> prevent starvation (eg past problems with UFS starving ZFS L2ARC).
Other filesystems which use vfs_bio, like cd9660 or ufs, use the same
disk cache layer as UFS.



Re: ZFS + nullfs + Linuxulator = panic?

2012-02-14 Thread Konstantin Belousov

On Tue, Feb 14, 2012 at 09:38:18AM -0500, Paul Mather wrote:
> I have a problem with RELENG_8 (FreeBSD/amd64 running a GENERIC kernel, last 
> built 2012-02-08).  It will panic during the daily periodic scripts that run 
> at 3am.  Here is the most recent panic message:
> 
> Fatal trap 9: general protection fault while in kernel mode
> cpuid = 0; apic id = 00
> instruction pointer = 0x20:0x8069d266
> stack pointer       = 0x28:0xff8094b90390
> frame pointer       = 0x28:0xff8094b903a0
> code segment        = base 0x0, limit 0xf, type 0x1b
>                     = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags    = resume, IOPL = 0
> current process     = 72566 (ps)
> trap number         = 9
> panic: general protection fault
> cpuid = 0
> KDB: stack backtrace:
> #0 0x8062cf8e at kdb_backtrace+0x5e
> #1 0x805facd3 at panic+0x183
> #2 0x808e6c20 at trap_fatal+0x290
> #3 0x808e715a at trap+0x10a
> #4 0x808cec64 at calltrap+0x8
> #5 0x805ee034 at fill_kinfo_thread+0x54
> #6 0x805eee76 at fill_kinfo_proc+0x586
> #7 0x805f22b8 at sysctl_out_proc+0x48
> #8 0x805f26c8 at sysctl_kern_proc+0x278
> #9 0x8060473f at sysctl_root+0x14f
> #10 0x80604a2a at userland_sysctl+0x14a
> #11 0x80604f1a at __sysctl+0xaa
> #12 0x808e62d4 at amd64_syscall+0x1f4
> #13 0x808cef5c at Xfast_syscall+0xfc

Please look up the line number for the fill_kinfo_thread+0x54.
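
One way to act on this request, sketched here as an illustration rather than as what the poster had in mind: pull the frame listed right after calltrap out of the backtrace and turn it into a gdb-style "list" command. The sketch assumes kgdb and a kernel built with debug symbols are available; the helper names are made up, and the addresses are used exactly as they appear in this archive.

```python
import re

# One backtrace line looks like: "#5 0x805ee034 at fill_kinfo_thread+0x54"
FRAME = re.compile(r"#(\d+) (0x[0-9a-f]+) at (\w+)\+(0x[0-9a-f]+)")


def parse_frames(text):
    """Return (index, address, symbol, offset) for each backtrace line found."""
    return [(int(i), int(addr, 16), sym, int(off, 16))
            for i, addr, sym, off in FRAME.findall(text)]


def frame_after(frames, marker="calltrap"):
    """Return the frame listed right after `marker`, or None if absent."""
    for cur, nxt in zip(frames, frames[1:]):
        if cur[2] == marker:
            return nxt
    return None


def kgdb_list_command(frame):
    """Build a gdb 'list' command for symbol+offset of the given frame."""
    return "list *(%s+%#x)" % (frame[2], frame[3])


# A few frames copied from the quoted panic message.
TRACE = """\
> #3 0x808e715a at trap+0x10a
> #4 0x808cec64 at calltrap+0x8
> #5 0x805ee034 at fill_kinfo_thread+0x54
> #6 0x805eee76 at fill_kinfo_proc+0x586
"""

frame = frame_after(parse_frames(TRACE))
# Prints: list *(fill_kinfo_thread+0x54)
print(kgdb_list_command(frame))
```

Typing the printed command at a kgdb prompt on the matching kernel should show the source line in question, provided the debug symbols match the kernel that panicked.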



RE: Reducing the need to compile a custom kernel

2012-02-14 Thread Scott, Brian

>>  - CPU_SOEKRIS, CPU_GEODE, CPU_ELAN, NO_SWAPPING for embedded devices
>
>Embedded devices are out of the scope of this; normally you do a lot of
>other modifications to such systems anyway, so a custom kernel should
>not be a big problem.

Just as a quick data point here, I have just installed FreeBSD onto an
ALIX system and was hoping to keep everything very standard.

It turns out that I needed to rebuild the kernel to add CPU_GEODE to get a
few simple features added. Everything else is standard GENERIC because
I'm too lazy to fine tune. The geode code is very small and I would
expect it to be completely harmless if left enabled in GENERIC. The
overhead of including it for other systems would be a few extra compares
during startup and a k or so extra size in the kernel.

I would suggest that avoiding custom kernels to make trivial changes is
exactly what you should be looking at. Make features like this removable
for the people who want to fine tune their kernels, but include them for
people who are happy to have a little overhead as a trade-off for ease of
management.

The only other thing that regularly has me running custom kernels is
IPFIREWALL_FORWARD. As others have said, I'd be very happy if that was
the default but removable.

Brian Scott

Re: ZFS + nullfs + Linuxulator = panic?

2012-02-14 Thread Jeremy Chadwick

On Tue, Feb 14, 2012 at 09:38:18AM -0500, Paul Mather wrote:
> I have a problem with RELENG_8 (FreeBSD/amd64 running a GENERIC kernel, last 
> built 2012-02-08).  It will panic during the daily periodic scripts that run 
> at 3am.  Here is the most recent panic message:
> 
> Fatal trap 9: general protection fault while in kernel mode
> cpuid = 0; apic id = 00
> instruction pointer = 0x20:0x8069d266
> stack pointer       = 0x28:0xff8094b90390
> frame pointer       = 0x28:0xff8094b903a0
> code segment        = base 0x0, limit 0xf, type 0x1b
>                     = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags    = resume, IOPL = 0
> current process     = 72566 (ps)
> trap number         = 9
> panic: general protection fault
> cpuid = 0
> KDB: stack backtrace:
> #0 0x8062cf8e at kdb_backtrace+0x5e
> #1 0x805facd3 at panic+0x183
> #2 0x808e6c20 at trap_fatal+0x290
> #3 0x808e715a at trap+0x10a
> #4 0x808cec64 at calltrap+0x8
> #5 0x805ee034 at fill_kinfo_thread+0x54
> #6 0x805eee76 at fill_kinfo_proc+0x586
> #7 0x805f22b8 at sysctl_out_proc+0x48
> #8 0x805f26c8 at sysctl_kern_proc+0x278
> #9 0x8060473f at sysctl_root+0x14f
> #10 0x80604a2a at userland_sysctl+0x14a
> #11 0x80604f1a at __sysctl+0xaa
> #12 0x808e62d4 at amd64_syscall+0x1f4
> #13 0x808cef5c at Xfast_syscall+0xfc
> Uptime: 3d19h6m0s
> Dumping 1308 out of 2028 MB:..2%..12%..21%..31%..41%..51%..62%..71%..81%..91%
> Dump complete
> Automatic reboot in 15 seconds - press a key on the console to abort
> Rebooting...
> 
> 
> The reason for the subject line is that I have another RELENG_8 system that 
> uses ZFS + nullfs but doesn't panic, leading me to believe that ZFS + nullfs 
> is not the problem.  I am wondering if it is the combination of the three 
> that is deadly, here.
> 
> Both RELENG_8 systems are root-on-ZFS installs.  Each night there is a 
> separate backup script that runs and completes before the regular "periodic 
> daily" run.  This script takes a recursive snapshot of the ZFS pool and then 
> mounts these snapshots via mount_nullfs to provide a coherent view of the 
> filesystem under /backup.  The only difference between the two RELENG_8 
> systems is that one uses rsync to back up /backup to another machine and the 
> other uses the Linux Tivoli TSM client to back up /backup to a TSM server.  
> After the backup is completed, a script runs that unmounts the nullfs file 
> systems and then destroys the ZFS snapshot.
> 
> The first (rsync backup) RELENG_8 system does not panic.  It has been running 
> the ZFS + nullfs rsync backup job without incident for weeks now.  The second 
> (Tivoli TSM) RELENG_8 will reliably panic when the subsequent "periodic 
> daily" job runs.  (It is using the 32-bit TSM 6.2.4 Linux client running 
> "dsmc schedule" via the linux_base-f10-10_4 package.)  The actual ZFS + 
> nullfs Tivoli TSM backup job appears to run successfully, making me wonder if 
> perhaps it has some memory leak or other subtle corruption that sets up the 
> ensuing panic when the "periodic daily" job later gives the system a workout.
> 
> If I can provide more information about the panic, please let me know.  
> Despite the message about dumping in the panic output above, when the system 
> reboots I get a "No core dumps found" message during boot.  (I have 
> dumpdev="AUTO" set in /etc/rc.conf.)  My swap device is on separate 
> partitions but is mirrored using geom_mirror as /dev/mirror/swap.  Do crash 
> dumps to gmirror devices work on RELENG_8?

See gmirror(8) man page, section NOTES.  Read the full thing.

> Does anyone have any idea what is to blame for the panic, or how I can fix or 
> work around it?

Does the panic always happen when "ps" is run?  That's what's shown in
the above panic message.  Quoting:

> current process = 72566 (ps)

And I'm inclined to think it does, based on the backtrace:

> #5 0x805ee034 at fill_kinfo_thread+0x54
> #6 0x805eee76 at fill_kinfo_proc+0x586
> #7 0x805f22b8 at sysctl_out_proc+0x48
> #8 0x805f26c8 at sysctl_kern_proc+0x278

But if you can go through the previous panics and confirm that, it would
be helpful to developers in tracking down the problem.

Sorry I can't be of any more assistance than this.

-- 
| Jeremy Chadwick  jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |


Re: freebsd 9-stable TOP problem from around Jan 10

2012-02-14 Thread Jeremy Chadwick

On Tue, Feb 14, 2012 at 03:35:01PM -0800, Julian Elischer wrote:
> On 2/14/12 10:38 AM, Kevin Oberman wrote:
> >On Tue, Feb 14, 2012 at 12:23 AM, Julian Elischer  wrote:
> >>Has anyone else seen a  problem with top -H -S?
> >>
> >>after a short while the screen gets more and more corrupted..
> >>
> >>hitting ^L or turning off S&  H modes helps .. for a while.
> >>
> >>If this is a known fixed problem, let me know but I need to co-ordinate with
> >>others
> >>to upgrade the machine in question.
> >Not seeing it here on 9-stable. Could it be a display issue? I am
> >using gnome-terminal with TERM defined as 'xterm'.
> 
> yeah I'm on a mac with iterm, but running through 'screen' .
> 
> it's never been a problem before.. just since we upgraded to 9-stable.

If you remove GNU screen from the picture does the problem go away?  If
so, I'm not surprised.  :-)

Make sure that when you're using GNU screen, that all shells launched
"under/within" screen have TERM=screen.  If they don't, then this is
almost certainly the problem -- GNU screen "translates" between terminal
types, meaning it translates its own terminal type ("screen") into
whatever TERM is currently attached ("xterm", "iterm", whatever).  See
the last 4 paragraphs of my post here to understand what exactly GNU
screen is doing:

http://lists.freebsd.org/pipermail/freebsd-stable/2011-June/063052.html

So, in general, make sure your dotfiles and so on don't mess about with
the $TERM environment variable and you should generally be okay.

If within GNU screen TERM=screen and you see the problem, but outside of
screen you use TERM=xterm (or something else) but don't see the problem,
then I would almost certainly blame GNU screen.  If you're looking for
something that simply keeps a terminal running in the background, try
nohup or tmux.

Alternately, possibly someone added a "screen" entry to /etc/termcap on
RELENG_9?  I don't use 9 so I have no way to confirm this, but on 8
there is no such entry.

-- 
| Jeremy Chadwick  jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |


HEADS UP: Xen merge coming to stable/8

2012-02-14 Thread Kenneth D. Merry

Hi folks,

I'm planning to merge almost all of the Xen changes from FreeBSD/head into
stable/8 soon.

This should bring more features, stability, etc.

I've attached what will be the commit message.

If there are any objections, speak now.

Ken
-- 
Kenneth Merry
k...@freebsd.org
MFC r215818, r216405, r216437, r216448, r216956, r221827, r222975, r223059,
r225343, r225704, r225705, r225706, r225707, r225709, r226029, r220647,
r230183, r230587, r230916, r228526, r230879:

Bring Xen support in stable/8 up to parity with head.

  r215818 | cperciva | 2010-11-25 08:05:21 -0700 (Thu, 25 Nov 2010) | 5 lines
  
  Rename HYPERVISOR_multicall (which performs the multicall hypercall) to
  _HYPERVISOR_multicall, and create a new HYPERVISOR_multicall function which
  invokes _HYPERVISOR_multicall and checks that the individual hypercalls all
  succeeded.
  
  r216405 | rwatson | 2010-12-13 05:15:46 -0700 (Mon, 13 Dec 2010) | 7 lines
  
  Add options NO_ADAPTIVE_SX to the XENHVM kernel configuration, matching
  its similar disabling of adaptive mutexes and rwlocks.  The existing
  comment on why this is the case also applies to sx locks.
  
  MFC after:3 days
  Discussed with:   attilio
  
  r216437 | gibbs | 2010-12-14 10:23:49 -0700 (Tue, 14 Dec 2010) | 2 lines
  
  Remove spurious printf left over from debugging our XenStore support.
  
  r216448 | gibbs | 2010-12-14 13:57:40 -0700 (Tue, 14 Dec 2010) | 4 lines
  
  Fix a typo in a comment.
  
  Noticed by:   Attila Nagy 
  
  r216956 | rwatson | 2011-01-04 07:49:54 -0700 (Tue, 04 Jan 2011) | 8 lines
  
  Make "options XENHVM" compile for i386, not just amd64 -- a largely
  mechanical change.  This opens the door for using PV device drivers
  under Xen HVM on i386, as well as more general harmonisation of i386
  and amd64 Xen support in FreeBSD.
  
  Reviewed by:  cperciva
  MFC after:3 weeks
  
  r221827 | mav | 2011-05-12 21:40:16 -0600 (Thu, 12 May 2011) | 2 lines
  
  Fix msleep() usage in Xen balloon driver to not wake up on every HZ tick.
  
  r222975 | gibbs | 2011-06-10 22:59:01 -0600 (Fri, 10 Jun 2011) | 63 lines
  
  Monitor and emit events for XenStore changes to XenBus trees
  of the devices we manage.  These changes can be due to writes
  we make ourselves or due to changes made by the control domain.
  The goal of these changes is to insure that all state transitions
  can be detected regardless of their source and to allow common
  device policies (e.g. "onlined" backend devices) to be centralized
  in the XenBus bus code.
  
  sys/xen/xenbus/xenbusvar.h:
  sys/xen/xenbus/xenbus.c:
  sys/xen/xenbus/xenbus_if.m:
      Add a new method for XenBus drivers "localend_changed".
      This method is invoked whenever a write is detected to
      a device's XenBus tree.  The default implementation of
      this method is a no-op.
  
  sys/xen/xenbus/xenbus_if.m:
  sys/dev/xen/netfront/netfront.c:
  sys/dev/xen/blkfront/blkfront.c:
  sys/dev/xen/blkback/blkback.c:
      Change the signature of the "otherend_changed" method.
      This notification cannot fail, so it should return void.
  
  sys/xen/xenbus/xenbusb_back.c:
      Add "online" device handling to the XenBus Back Bus
      support code.  An online backend device remains active
      after a front-end detaches as a reconnect is expected
      to occur in the near future.
  
  sys/xen/interface/io/xenbus.h:
      Add comment block further explaining the meaning and
      driver responsibilities associated with the XenBus
      Closed state.
  
  sys/xen/xenbus/xenbusb.c:
  sys/xen/xenbus/xenbusb.h:
  sys/xen/xenbus/xenbusb_back.c:
  sys/xen/xenbus/xenbusb_front.c:
  sys/xen/xenbus/xenbusb_if.m:
      o Register a XenStore watch against the local XenBus tree
        for all devices.
      o Cache the string length of the path to our local tree.
      o Allow the xenbus front and back drivers to hook/filter both
        local and otherend watch processing.
      o Update the device ivar version of "state" when we detect
        a XenStore update of that node.
  
  sys/dev/xen/control/control.c:
  sys/xen/xenbus/xenbus.c:
  sys/xen/xenbus/xenbusb.c:
  sys/xen/xenbus/xenbusb.h:
  sys/xen/xenbus/xenbusvar.h:
  sys/xen/xenstore/xenstorevar.h:
      Allow clients of the XenStore watch mechanism to attach
      a single uintptr_t worth of client data to the watch.
      This removes the need to carefully place client watch
      data within enclosing objects so that a cast or offsetof
      calculation can be used to convert from watch to enclosing
      object.
  
  Sponsored by: Spectra Logic Corporation
  MFC after:    1 week
  
  r223059 | gibbs | 2011-06-13 14:36:29 -0600 (Mon, 13 Jun 2011) | 36 lines
  
  Several enhancements to the Xen block back driver.
  
  sys/dev/xen/blkback/blkback.c:
o Implement front-end request coalescing.  This greatly improves the
  performance of front-end clients that are unaware of the dynamic

Re: problems with AHCI on FreeBSD 8.2

2012-02-14 Thread Scott Long


On Feb 14, 2012, at 4:34 PM, Victor Balada Diaz wrote:

> On Tue, Feb 14, 2012 at 03:09:58PM -0800, Jeremy Chadwick wrote:
>> On Tue, Feb 14, 2012 at 11:15:27PM +0100, Victor Balada Diaz wrote:
>>> On Tue, Feb 14, 2012 at 06:17:19PM +0100, Harald Schmalzbauer wrote:
>>>> Jeremy Chadwick wrote on 14.02.2012 17:50 (localtime):
>>>>> On Tue, Feb 14, 2012 at 04:55:10PM +0100, Claudius Herder wrote:
>>>>>> Hello,
>>>>>> 
>>>>>> I have got a quite similar problem with AHCI on FreeBSD 8.2 and it still
>>>>>> persists on FreeBSD 9.0 release.
>>>>>> 
>>>>>> Switching from ahci to ataahci resolved the problem for me too.
>>>>>> 
>>>>>> I'm using gmirror for swap, system is on a zpool and the problem first
>>>>>> occurred during a zpool scrub, but it is easily reproducible with dd.
>>>>>> 
>>>>>> The timeouts only occur when writing to disks, dd if=/dev/ada{0|1}
>>>>>> of=/dev/null is not an issue.
>>>>>> Sometimes I need to power off the server because after a reboot one disk
>>>>>> is still missing.
>>>>>> 
>>>>>> I really would like to help in this issue, so let me know if you need
>>>>>> any more information.
>>>>> I find it interesting that, at least so far, the only people reporting
>>>>> problems of this type with the ahci.ko driver are people using Samsung
>>>>> disks.  The only difference is that your models are F1s while the OPs
>>>>> are F2s.
>>>>
>>>> I saw such timeouts long ago and mav@ had a look at my postings and he
>>>> mentioned it could be a NCQ problem.
>>>> I suspected the disks firmware.
>>>> I never tracked it down further, because after replacing the Samsung (F3
>>>> in that case) disks with hitachi ones solved all my problems and gave a
>>>> big performance kick as well (with zfs).
>>>> You can find the discussion here:
>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-February/055374.html
>>>>
>>> 
>>> You gave me a good idea: try to disable NCQ and see if that's the fault. So
>>> i went and applied the attached patch. After it, i can no longer reproduce
>>> the issue with ahci driver.
>>> 
>>> I know this is not a solution because it disables NCQ at controller level
>>> instead of disk level, but at least we know for sure where the problem is.
>>> 
>>> I think the solution would be to add a new quirk ADA_Q_NONCQ in 
>>> sys/cam/ata/ata_da.c.
>>> Quirks infrastructure is already built, so adding a new quirk for this
>>> seems easy.
>>> 
>>> Is someone interested? Do you think there is a better solution?
>>> 
>>> If someone is interested i can build a patch to add ADA_Q_NONCQ quirk and 
>>> add my drives
>>> to it.
>> 
>> I took a stab at this, but I don't feel confident this is the proper
>> solution/method.  I worry there's some sort of chicken-or-the-egg
>> condition here (quirk setup/matching comes *after* SATA capabilities
>> detection), or that it makes the code messier.  Need mav@'s
>> recommendations on this.
>> 
>> Below is for RELENG_8.  I should note I haven't tested if this works, or
>> even compiles -- normally I don't provide such patches without testing
>> so I apologise in advance / user beware.
> 
> You're amazingly fast. Thanks for all your help :)
> 
> You start applying the quirks before 
> 
>snprintf(announce_buf, sizeof(announce_buf),
>"kern.cam.ada.%d.quirks", periph->unit_number);
>quirks = softc->quirks;
>TUNABLE_INT_FETCH(announce_buf, &quirks);
> 
> So you're breaking quirk setting at boot time.
> 
> See my attached patch. I can confirm it works for me.
> 
> Regards.
> 

I don't think that disabling NCQ entirely is the right solution.  It's a tag 
starvation issue in the firmware, not a complete failure, and it can be dealt 
with in the CAM XPT scheduler fairly efficiently.  Alexander and I talked about 
this recently, and though we differ on the details, a tag hack is not in order, 
IMHO.  In the short term, try just using "camcontrol tags ada0 -N 1" to limit 
the concurrent commands to 1.

Scott



Re: problems with AHCI on FreeBSD 8.2

2012-02-14 Thread Jeremy Chadwick

On Wed, Feb 15, 2012 at 12:34:20AM +0100, Victor Balada Diaz wrote:
> On Tue, Feb 14, 2012 at 03:09:58PM -0800, Jeremy Chadwick wrote:
> > On Tue, Feb 14, 2012 at 11:15:27PM +0100, Victor Balada Diaz wrote:
> > > On Tue, Feb 14, 2012 at 06:17:19PM +0100, Harald Schmalzbauer wrote:
> > > >  Jeremy Chadwick wrote on 14.02.2012 17:50 (localtime):
> > > > > On Tue, Feb 14, 2012 at 04:55:10PM +0100, Claudius Herder wrote:
> > > > >> Hello,
> > > > >>
> > > > >> I have got a quite similar problem with AHCI on FreeBSD 8.2 and it 
> > > > >> still
> > > > >> persists on FreeBSD 9.0 release.
> > > > >>
> > > > >> Switching from ahci to ataahci resolved the problem for me too.
> > > > >>
> > > > >> I'm using gmirror for swap, system is on a zpool and the problem 
> > > > >> first
> > > > >> occurred during a zpool scrub, but it is easily reproducible with dd.
> > > > >>
> > > > >> The timeouts only occur when writing to disks, dd if=/dev/ada{0|1}
> > > > >> of=/dev/null is not an issue.
> > > > >> Sometimes I need to power off the server because after a reboot one 
> > > > >> disk
> > > > >> is still missing.
> > > > >>
> > > > >> I really would like to help in this issue, so let me know if you need
> > > > >> any more information.
> > > > > I find it interesting that, at least so far, the only people reporting
> > > > > problems of this type with the ahci.ko driver are people using Samsung
> > > > > disks.  The only difference is that your models are F1s while the OPs
> > > > > are F2s.
> > > > 
> > > > I saw such timeouts long ago and mav@ had a look at my postings and he
> > > > mentioned it could be a NCQ problem.
> > > > I suspected the disks firmware.
> > > > I never tracked it down further, because after replacing the Samsung (F3
> > > > in that case) disks with hitachi ones solved all my problems and gave a
> > > > big performance kick as well (with zfs).
> > > > You can find the discussion here:
> > > > http://lists.freebsd.org/pipermail/freebsd-stable/2010-February/055374.html
> > > > 
> > > 
> > > You gave me a good idea: try to disable NCQ and see if that's the fault. 
> > > So
> > > i went and applied the attached patch. After it, i can no longer reproduce
> > > the issue with ahci driver.
> > > 
> > > I know this is not a solution because it disables NCQ at controller level
> > > instead of disk level, but at least we know for sure where the problem is.
> > > 
> > > > I think the solution would be to add a new quirk ADA_Q_NONCQ in 
> > > > sys/cam/ata/ata_da.c.
> > > > Quirks infrastructure is already built, so adding a new quirk for this
> > > > seems easy.
> > > 
> > > Is someone interested? Do you think there is a better solution?
> > > 
> > > If someone is interested i can build a patch to add ADA_Q_NONCQ quirk and 
> > > add my drives
> > > to it.
> > 
> > I took a stab at this, but I don't feel confident this is the proper
> > solution/method.  I worry there's some sort of chicken-or-the-egg
> > condition here (quirk setup/matching comes *after* SATA capabilities
> > detection), or that it makes the code messier.  Need mav@'s
> > recommendations on this.
> > 
> > Below is for RELENG_8.  I should note I haven't tested if this works, or
> > even compiles -- normally I don't provide such patches without testing
> > so I apologise in advance / user beware.
> 
> You're amazingly fast. Thanks for all your help :)
> 
> You start applying the quirks before 
> 
> snprintf(announce_buf, sizeof(announce_buf),
> "kern.cam.ada.%d.quirks", periph->unit_number);
> quirks = softc->quirks;
> TUNABLE_INT_FETCH(announce_buf, &quirks);
> 
> So you're breaking quirk setting at boot time.

I'm too tired to quite understand (in full) what's wrong with my patch,
but I think you're referring to situations where someone would have
kern.cam.ada.X.quirks set in loader.conf?

If so, I believe that same situation would happen presently if someone
set kern.cam.ada.X.quirks in their loader.conf to a value that did not
contain bit #0 set to 1, and used one of the 4K sector disks listed in
ada_quirk_table -- what's in loader.conf looks like it would overwrite
whatever the kernel code bits chose automatically:

 910         match = cam_quirkmatch((caddr_t)&cgd->ident_data,
 911                                (caddr_t)ada_quirk_table,
 912                                sizeof(ada_quirk_table)/sizeof(*ada_quirk_table),
 913                                sizeof(*ada_quirk_table), ata_identify_match);
 914         if (match != NULL)
 915                 softc->quirks = ((struct ada_quirk_entry *)match)->quirks;
 916         else
 917                 softc->quirks = ADA_Q_NONE;
 ...
 931         snprintf(announce_buf, sizeof(announce_buf),
 932             "kern.cam.ada.%d.quirks", periph->unit_number);
 933         quirks = softc->quirks;
 934         TUNABLE_INT_FETCH(announce_buf, &quirks);
 935         softc->quirks = quirks;

I read this to mean:

Lines 910-917 -- if there's a device ID st

ZFS + nullfs + Linuxulator = panic?

2012-02-14 Thread Paul Mather

I have a problem with RELENG_8 (FreeBSD/amd64 running a GENERIC kernel, last 
built 2012-02-08).  It will panic during the daily periodic scripts that run at 
3am.  Here is the most recent panic message:

Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer = 0x20:0x8069d266
stack pointer   = 0x28:0xff8094b90390
frame pointer   = 0x28:0xff8094b903a0
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= resume, IOPL = 0
current process = 72566 (ps)
trap number = 9
panic: general protection fault
cpuid = 0
KDB: stack backtrace:
#0 0x8062cf8e at kdb_backtrace+0x5e
#1 0x805facd3 at panic+0x183
#2 0x808e6c20 at trap_fatal+0x290
#3 0x808e715a at trap+0x10a
#4 0x808cec64 at calltrap+0x8
#5 0x805ee034 at fill_kinfo_thread+0x54
#6 0x805eee76 at fill_kinfo_proc+0x586
#7 0x805f22b8 at sysctl_out_proc+0x48
#8 0x805f26c8 at sysctl_kern_proc+0x278
#9 0x8060473f at sysctl_root+0x14f
#10 0x80604a2a at userland_sysctl+0x14a
#11 0x80604f1a at __sysctl+0xaa
#12 0x808e62d4 at amd64_syscall+0x1f4
#13 0x808cef5c at Xfast_syscall+0xfc
Uptime: 3d19h6m0s
Dumping 1308 out of 2028 MB:..2%..12%..21%..31%..41%..51%..62%..71%..81%..91%
Dump complete
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...


The reason for the subject line is that I have another RELENG_8 system that 
uses ZFS + nullfs but doesn't panic, leading me to believe that ZFS + nullfs is 
not the problem.  I am wondering if it is the combination of the three that is 
deadly, here.

Both RELENG_8 systems are root-on-ZFS installs.  Each night there is a separate 
backup script that runs and completes before the regular "periodic daily" run.  
This script takes a recursive snapshot of the ZFS pool and then mounts these 
snapshots via mount_nullfs to provide a coherent view of the filesystem under 
/backup.  The only difference between the two RELENG_8 systems is that one uses 
rsync to back up /backup to another machine and the other uses the Linux Tivoli 
TSM client to back up /backup to a TSM server.  After the backup is completed, 
a script runs that unmounts the nullfs file systems and then destroys the ZFS 
snapshot.

The first (rsync backup) RELENG_8 system does not panic.  It has been running 
the ZFS + nullfs rsync backup job without incident for weeks now.  The second 
(Tivoli TSM) RELENG_8 will reliably panic when the subsequent "periodic daily" 
job runs.  (It is using the 32-bit TSM 6.2.4 Linux client running "dsmc 
schedule" via the linux_base-f10-10_4 package.)  The actual ZFS + nullfs Tivoli 
TSM backup job appears to run successfully, making me wonder if perhaps it has 
some memory leak or other subtle corruption that sets up the ensuing panic when 
the "periodic daily" job later gives the system a workout.

If I can provide more information about the panic, please let me know.  Despite 
the message about dumping in the panic output above, when the system reboots I 
get a "No core dumps found" message during boot.  (I have dumpdev="AUTO" set in 
/etc/rc.conf.)  My swap device is on separate partitions but is mirrored using 
geom_mirror as /dev/mirror/swap.  Do crash dumps to gmirror devices work on 
RELENG_8?

Does anyone have any idea what is to blame for the panic, or how I can fix or 
work around it?

Cheers,

Paul.

PS: The uptime of three days in the panic message is because I disabled the 
Tivoli TSM backup job on Friday so it would not run over the weekend.


RE: New BSD Installer

2012-02-14 Thread Devin Teske

> -Original Message-
> From: owner-freebsd-sta...@freebsd.org [mailto:owner-freebsd-
> sta...@freebsd.org] On Behalf Of Mike Andrews
> Sent: Tuesday, February 14, 2012 1:11 PM
> To: freebsd-stable@freebsd.org
> Subject: Re: New BSD Installer
> 
> On 2/14/2012 3:05 PM, Devin Teske wrote:
> > Please don't get rid of fdisk or bsdlabel as they are (and forever will be)
> > required to do things like:
> >
> > 1. scripted formatting of a thumb drive
> >
> > 2. automated probing of disk information (fdisk -p)
> >
> > 3. Other tasks that are not suitably handled by curses-based utilities
> >
> > For example, the following command will create a second Windows partition on
> a
> > thumb drive without user interaction:
> >
> > echo "p 2 0x0c * *" | fdisk -f - /dev/da0
> >
> > If you take away fdisk, how am I supposed to achieve the above?
> 
> /sbin/gpart add -t 12 -i 2 da0
> 

I stand corrected.

Ok, remove at-will but not before 10.0 please. Looking for 9.x to be the
transitional phase.
-- 
Devin


Re: problems with AHCI on FreeBSD 8.2

2012-02-14 Thread Victor Balada Diaz

On Tue, Feb 14, 2012 at 03:09:58PM -0800, Jeremy Chadwick wrote:
> On Tue, Feb 14, 2012 at 11:15:27PM +0100, Victor Balada Diaz wrote:
> > On Tue, Feb 14, 2012 at 06:17:19PM +0100, Harald Schmalzbauer wrote:
> > >  Jeremy Chadwick wrote on 14.02.2012 17:50 (localtime):
> > > > On Tue, Feb 14, 2012 at 04:55:10PM +0100, Claudius Herder wrote:
> > > >> Hello,
> > > >>
> > > >> I have got a quite similar problem with AHCI on FreeBSD 8.2 and it 
> > > >> still
> > > >> persists on FreeBSD 9.0 release.
> > > >>
> > > >> Switching from ahci to ataahci resolved the problem for me too.
> > > >>
> > > >> I'm using gmirror for swap, system is on a zpool and the problem first
> > > >> occurred during a zpool scrub, but it is easily reproducible with dd.
> > > >>
> > > >> The timeouts only occur when writing to disks, dd if=/dev/ada{0|1}
> > > >> of=/dev/null is not an issue.
> > > >> Sometimes I need to power off the server because after a reboot one 
> > > >> disk
> > > >> is still missing.
> > > >>
> > > >> I really would like to help in this issue, so let me know if you need
> > > >> any more information.
> > > > I find it interesting that, at least so far, the only people reporting
> > > > problems of this type with the ahci.ko driver are people using Samsung
> > > > disks.  The only difference is that your models are F1s while the OPs
> > > > are F2s.
> > > 
> > > I saw such timeouts long ago and mav@ had a look at my postings and he
> > > mentioned it could be a NCQ problem.
> > > I suspected the disks firmware.
> > > I never tracked it down further, because after replacing the Samsung (F3
> > > in that case) disks with hitachi ones solved all my problems and gave a
> > > big performance kick as well (with zfs).
> > > You can find the discussion here:
> > > http://lists.freebsd.org/pipermail/freebsd-stable/2010-February/055374.html
> > > 
> > 
> > You gave me a good idea: try to disable NCQ and see if that's the fault. So
> > i went and applied the attached patch. After it, i can no longer reproduce
> > the issue with ahci driver.
> > 
> > I know this is not a solution because it disables NCQ at controller level
> > instead of disk level, but at least we know for sure where the problem is.
> > 
> > I think the solution would be to add a new quirk ADA_Q_NONCQ in 
> > sys/cam/ata/ata_da.c.
> > Quirks infrastructure is already built, so adding a new quirk for this
> > seems easy.
> > 
> > Is someone interested? Do you think there is a better solution?
> > 
> > If someone is interested i can build a patch to add ADA_Q_NONCQ quirk and 
> > add my drives
> > to it.
> 
> I took a stab at this, but I don't feel confident this is the proper
> solution/method.  I worry there's some sort of chicken-or-the-egg
> condition here (quirk setup/matching comes *after* SATA capabilities
> detection), or that it makes the code messier.  Need mav@'s
> recommendations on this.
> 
> Below is for RELENG_8.  I should note I haven't tested if this works, or
> even compiles -- normally I don't provide such patches without testing
> so I apologise in advance / user beware.

You're amazingly fast. Thanks for all your help :)

You start applying the quirks before 

snprintf(announce_buf, sizeof(announce_buf),
"kern.cam.ada.%d.quirks", periph->unit_number);
quirks = softc->quirks;
TUNABLE_INT_FETCH(announce_buf, &quirks);

So you're breaking quirk setting at boot time.
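
To illustrate the ordering point, here is a minimal standalone sketch, not
the real ata_da.c code. The bit values, the struct and the helper names
(fetch_override, apply_quirks, setup_apply_first, setup_fetch_first) are
hypothetical stand-ins; fetch_override only imitates the effect of
TUNABLE_INT_FETCH replacing the quirks value when the loader sets one.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the ADA_Q_* and ADA_FLAG_* bits. */
enum { Q_NONE = 0x00, Q_4K = 0x01, Q_NONCQ = 0x02 };
enum { FLAG_CAN_NCQ = 0x01 };

struct softc_sketch {
	int quirks;	/* quirk bits: table match, then optional override */
	int flags;	/* capability bits detected from the drive */
};

/*
 * Imitates the tunable fetch: if the loader supplied a value it replaces
 * *quirks wholesale, otherwise *quirks is left untouched.
 */
static void
fetch_override(const int *tunable, int *quirks)
{
	if (tunable != NULL)
		*quirks = *tunable;
}

/* Act on the quirk bits: drop the NCQ capability if the quirk is set. */
static void
apply_quirks(struct softc_sketch *sc)
{
	if ((sc->quirks & Q_NONCQ) != 0)
		sc->flags &= ~FLAG_CAN_NCQ;
}

/* Quirks are acted on before the loader override is read. */
static int
setup_apply_first(int table_quirks, const int *tunable)
{
	struct softc_sketch sc = { table_quirks, FLAG_CAN_NCQ };

	apply_quirks(&sc);
	fetch_override(tunable, &sc.quirks);
	return (sc.flags);
}

/* The override is read first, then the final quirk bits are acted on. */
static int
setup_fetch_first(int table_quirks, const int *tunable)
{
	struct softc_sketch sc = { table_quirks, FLAG_CAN_NCQ };

	fetch_override(tunable, &sc.quirks);
	apply_quirks(&sc);
	return (sc.flags);
}
```

With a loader value of Q_NONCQ on a drive that is not in the table,
setup_apply_first() leaves NCQ enabled while setup_fetch_first() disables
it, which is the behaviour the tunable is meant to give.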

See my attached patch. I can confirm it works for me.

Regards.

-- 
La prueba más fehaciente de que existe vida inteligente en otros
planetas, es que no han intentado contactar con nosotros. 
--- ata_da.c	2012-02-14 22:17:54.0 +0100
+++ ata_da.c	2012-02-14 22:58:05.0 +0100
@@ -91,6 +91,7 @@
 typedef enum {
 	ADA_Q_NONE		= 0x00,
 	ADA_Q_4K		= 0x01,
+	ADA_Q_NONCQ		= 0x02,
 } ada_quirks;
 
 typedef enum {
@@ -162,6 +163,14 @@
 		/*quirks*/ADA_Q_4K
 	},
 	{
+		/* 
+		 * Samsung have NCQ broken:
+		 * http://lists.freebsd.org/pipermail/freebsd-stable/2012-February/066168.html
+		 */
+		{ T_DIRECT, SIP_MEDIA_FIXED, "*", "SAMSUNG HD154UI*", "*" },
+		/*quirks*/ADA_Q_NONCQ
+	},
+	{
 		/* Samsung Advanced Format (4k) drives */
 		{ T_DIRECT, SIP_MEDIA_FIXED, "*", "SAMSUNG HD155UI*", "*" },
 		/*quirks*/ADA_Q_4K
@@ -967,6 +976,10 @@
 	softc->disk->d_maxsize = maxio;
 	softc->disk->d_unit = periph->unit_number;
 	softc->disk->d_flags = 0;
+	/* Disable NCQ if needed */
+	if (softc->flags & ADA_FLAG_CAN_NCQ &&
+	softc->quirks & ADA_Q_NONCQ)
+	  softc->flags ^= ADA_FLAG_CAN_NCQ;
 	if (softc->flags & ADA_FLAG_CAN_FLUSHCACHE)
 		softc->disk->d_flags |= DISKFLAG_CANFLUSHCACHE;
 	if ((softc->flags & ADA_FLAG_CAN_TRIM) ||
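
A side note on the last hunk: the XOR there only runs when the NCQ bit is
already set, so it behaves as a clear. A mask-and-clear form is a common
alternative that gives the same result without needing the guard. The
sketch below is standalone and uses a hypothetical bit value and helper
names, not the real ADA_FLAG_CAN_NCQ definition.

```c
/* Hypothetical bit value; the real ADA_FLAG_CAN_NCQ lives in ata_da.c. */
#define SKETCH_FLAG_CAN_NCQ	0x0002u

/* Clears the bit; applying it again leaves the bit cleared. */
static unsigned
clear_with_mask(unsigned flags)
{
	return (flags & ~SKETCH_FLAG_CAN_NCQ);
}

/* Toggles the bit; applying it again sets the bit back. */
static unsigned
toggle_with_xor(unsigned flags)
{
	return (flags ^ SKETCH_FLAG_CAN_NCQ);
}
```

Calling clear_with_mask() twice still leaves the bit cleared, whereas
calling toggle_with_xor() twice restores the original flags.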

RE: Why won't 8.2 umount -f?

2012-02-14 Thread Devin Teske

> -Original Message-
> From: owner-freebsd...@freebsd.org [mailto:owner-freebsd...@freebsd.org]
> On Behalf Of Doug Barton
> Sent: Tuesday, February 14, 2012 1:05 PM
> To: Rick Macklem
> Cc: freebsd...@freebsd.org; freebsd-stable@FreeBSD.org
> Subject: Re: Why won't 8.2 umount -f?
> 
> On 02/14/2012 08:39, Rick Macklem wrote:
> > I took a look and they seem to have been MFC'd.
> 
> That's awesome! Thanks for your time on this. I guess we've got some
> upgrading to do.
> 

+1

Awaiting 8.3 with bated breath!
-- 
Devin


Re: freebsd 9-stable TOP problem from around Jan 10

2012-02-14 Thread Julian Elischer


On 2/14/12 10:38 AM, Kevin Oberman wrote:
> On Tue, Feb 14, 2012 at 12:23 AM, Julian Elischer  wrote:
>> Has anyone else seen a problem with top -H -S?
>>
>> after a short while the screen gets more and more corrupted..
>>
>> hitting ^L or turning off S & H modes helps .. for a while.
>>
>> If this is a known fixed problem, let me know but I need to co-ordinate with
>> others
>> to upgrade the machine in question.
> Not seeing it here on 9-stable. Could it be a display issue? I am
> using gnome-terminal with TERM defined as 'xterm'.

yeah I'm on a mac with iterm, but running through 'screen' .

it's never been a problem before.. just since we upgraded to 9-stable.




Re: problems with AHCI on FreeBSD 8.2

2012-02-14 Thread Jeremy Chadwick

On Tue, Feb 14, 2012 at 11:15:27PM +0100, Victor Balada Diaz wrote:
> On Tue, Feb 14, 2012 at 06:17:19PM +0100, Harald Schmalzbauer wrote:
> > Jeremy Chadwick wrote on 14.02.2012 17:50 (localtime):
> > > On Tue, Feb 14, 2012 at 04:55:10PM +0100, Claudius Herder wrote:
> > >> Hello,
> > >>
> > >> I have got a quite similar problem with AHCI on FreeBSD 8.2 and it still
> > >> persists on FreeBSD 9.0 release.
> > >>
> > >> Switching from ahci to ataahci resolved the problem for me too.
> > >>
> > >> I'm using gmirror for swap, system is on a zpool and the problem first
> > >> occurred during a zpool scrub, but it is easily reproducible with dd.
> > >>
> > >> The timeouts only occur when writing to disks, dd if=/dev/ada{0|1}
> > >> of=/dev/null is not an issue.
> > >> Sometimes I need to power off the server because after a reboot one disk
> > >> is still missing.
> > >>
> > >> I really would like to help in this issue, so let me know if you need
> > >> any more information.
> > > I find it interesting that, at least so far, the only people reporting
> > > problems of this type with the ahci.ko driver are people using Samsung
> > > disks.  The only difference is that your models are F1s while the OP's
> > > are F2s.
> > 
> > I saw such timeouts long ago and mav@ had a look at my postings and he
> > mentioned it could be an NCQ problem.
> > I suspected the disks' firmware.
> > I never tracked it down further, because replacing the Samsung (F3
> > in that case) disks with Hitachi ones solved all my problems and gave a
> > big performance kick as well (with zfs).
> > You can find the discussion here:
> > http://lists.freebsd.org/pipermail/freebsd-stable/2010-February/055374.html
> > 
> 
> You gave me a good idea: try to disable NCQ and see if that's the fault. So
> I went and applied the attached patch. After it, I can no longer reproduce
> the issue with the ahci driver.
> 
> I know this is not a solution because it disables NCQ at controller level
> instead of disk level, but at least we know for sure where the problem is.
> 
> I think the solution would be to add a new quirk, ADA_Q_NONCQ, in
> sys/cam/ata/ata_da.c.
> The quirks infrastructure is already built, so adding a new quirk for this
> seems easy.
> 
> Is someone interested? Do you think there is a better solution?
> 
> If someone is interested I can build a patch to add the ADA_Q_NONCQ quirk
> and add my drives to it.

I took a stab at this, but I don't feel confident this is the proper
solution/method.  I worry there's some sort of chicken-or-the-egg
condition here (quirk setup/matching comes *after* SATA capabilities
detection), or that it makes the code messier.  Need mav@'s
recommendations on this.

Below is for RELENG_8.  I should note I haven't tested if this works, or
even compiles -- normally I don't provide such patches without testing
so I apologise in advance / user beware.

-- 
| Jeremy Chadwick j...@parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |

diff -ruN /usr/src/sys/cam/ata/ata_da.c src/sys/cam/ata/ata_da.c
--- /usr/src/sys/cam/ata/ata_da.c	2012-02-10 17:22:25.0 -0800
+++ src/sys/cam/ata/ata_da.c	2012-02-14 15:07:07.988814133 -0800
@@ -90,7 +90,8 @@
 
 typedef enum {
 	ADA_Q_NONE		= 0x00,
-	ADA_Q_4K		= 0x01,
+	ADA_Q_4K		= 0x01,	/* 4k sectors */
+	ADA_Q_NONCQ		= 0x02,	/* device has flaky NCQ support */
 } ada_quirks;
 
 typedef enum {
@@ -162,6 +163,11 @@
 		/*quirks*/ADA_Q_4K
 	},
 	{
+		/* Samsung Spinpoint F2 EG (EcoGreen) drives */
+		{ T_DIRECT, SIP_MEDIA_FIXED, "*", "SAMSUNG HD154UI*", "*" },
+		/*quirks*/ADA_Q_NONCQ,
+	},
+	{
 		/* Samsung Advanced Format (4k) drives */
 		{ T_DIRECT, SIP_MEDIA_FIXED, "*", "SAMSUNG HD155UI*", "*" },
 		/*quirks*/ADA_Q_4K
@@ -887,9 +893,6 @@
 		softc->flags |= ADA_FLAG_CAN_FLUSHCACHE;
 	if (cgd->ident_data.support.command1 & ATA_SUPPORT_POWERMGT)
 		softc->flags |= ADA_FLAG_CAN_POWERMGT;
-	if (cgd->ident_data.satacapabilities & ATA_SUPPORT_NCQ &&
-	    (cgd->inq_flags & SID_DMA) && (cgd->inq_flags & SID_CmdQue))
-		softc->flags |= ADA_FLAG_CAN_NCQ;
 	if (cgd->ident_data.support_dsm & ATA_SUPPORT_DSM_TRIM) {
 		softc->flags |= ADA_FLAG_CAN_TRIM;
 		softc->trim_max_ranges = TRIM_MAX_RANGES;
@@ -916,6 +919,15 @@
 	else
 		softc->quirks = ADA_Q_NONE;
 
+	/*
+	 * Do not enable NCQ for devices which have the ADA_Q_NONCQ quirk.
+	 */
+	if (!(softc->quirks & ADA_Q_NONCQ)) {
+		if (cgd->ident_data.satacapabilities & ATA_SUPPORT_NCQ &&
+		    (cgd->i
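
For readers following the patch, the intended control flow can be modelled
outside the kernel. The Python sketch below is only an illustration, not the
kernel code: the constant names and the two model patterns are taken from the
diff, while lookup_quirks and can_use_ncq are hypothetical helper names, and
Python's fnmatchcase stands in for whatever matching the CAM quirk table
really performs.

```python
from fnmatch import fnmatchcase

# Quirk bits, mirroring the values in the patch.
ADA_Q_NONE = 0x00
ADA_Q_4K = 0x01
ADA_Q_NONCQ = 0x02

# (product pattern, quirk bits); the first matching entry wins.
QUIRK_TABLE = [
    ("SAMSUNG HD154UI*", ADA_Q_NONCQ),
    ("SAMSUNG HD155UI*", ADA_Q_4K),
]


def lookup_quirks(product):
    """Return the quirk bits for a drive's product string (hypothetical helper)."""
    for pattern, quirks in QUIRK_TABLE:
        if fnmatchcase(product, pattern):
            return quirks
    return ADA_Q_NONE


def can_use_ncq(product, reports_ncq, has_dma, has_cmdque):
    """Enable NCQ only if the drive advertises it and carries no NONCQ quirk."""
    if lookup_quirks(product) & ADA_Q_NONCQ:
        return False
    return reports_ncq and has_dma and has_cmdque
```

The point of the ordering concern in the mail is visible here: the quirk
lookup has to run before the NCQ decision, otherwise the flag is already set
by the time the quirk is known.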

Re: problems with AHCI on FreeBSD 8.2

2012-02-14 Thread Oscar Prieto

Thank you again Jeremy, sure it helps!

On Tue, Feb 14, 2012 at 9:31 PM, Jeremy Chadwick
 wrote:
> On Tue, Feb 14, 2012 at 09:19:02PM +0100, Oscar Prieto wrote:
>> Thank you Jeremy, I'm already checking your links.
>>
>> When I installed smartd I configured a daily short test and a weekly
>> long one for all the drives, run while the machine is mostly unused. From
>> reading the documentation and info around, I never thought it could be a
>> problem.
>>
>> # /usr/local/etc/smartd.conf
>> /dev/ada0 -a -o on -S on -s (S/../.././03|L/../../2/07)
>> /dev/ada1 -a -o on -S on -s (S/../.././04|L/../../3/07)
>> /dev/ada2 -a -o on -S on -s (S/../.././05|L/../../4/07)
>> /dev/ada3 -a -o on -S on -s (S/../.././06|L/../../5/07)
>
> The problem is that, quite honestly, these do you zero good.  All it does
> is make a mess (per se) of the SMART self-test log.
>
> Take for example your situation with ada3: smartd(8) told you that the
> number of pending sectors increased to 5, and uncorrected increased to
> 1.  That's really all you need to know at that point.  If you want to
> know the LBA numbers which are problematic, you can manually intervene.
>
> The point is: the drive itself is going to notice problematic or bad
> sectors quicker than periodic short or long or surface scan tests will.
> Let the drive do its thing normally and only use SMART tests when
> there's indication something is wrong.
>
>> I'll remove the checks. Do you advise removing the daemon altogether?
>
> smartd(8) is useful because it keeps track of attributes which change in
> value and logs data to syslog (if I remember right), thus you have an
> exact time/date when an attribute changed.  This is especially useful
> for things pertaining to sector/physical media problems.
>
> As such, I tend to recommend folks using smartd(8) properly tune their
> smartd.conf to only monitor specific attributes.  This varies from drive
> to drive, but the key ones are things like attributes 5, 10, 11, 192,
> 193, 194 (if you want temperature logging), 196, 197, 198, 199, and 200.
> I'm speaking strictly for Western Digital disks here.
>
> The stock defaults, if I remember right, are to "monitor everything",
> which really doesn't work well given that so many vendors encode their
> RAW_VALUE fields in proprietary/vendor-specific formats.  People will
> often monitor things like the Hardware_ECC_Recovered attribute and start
> "freaking out" one day when the value goes from 0 to 838938239 or
> something larger.  Attribute data formats are not part of the ATA
> standard, so vendors encode them however they choose.  Plus, not many admins that
> I've run into (honest) know what that attribute actually means
> disk-wise (hint: it's 100% normal for sector ECC to happen at all times;
> magnetic media is not perfect, that's what the per-sector ECC section is
> for!)
>
> However: people don't understand what SMART attribute acquisition
> actually does behind the scenes -- it results in the disk having to read
> from the HPA area (not user accessible or within LBA regions), which
> means seeking + moving the arms to an area, reading, then reporting all
> of this back.  Thus, it impacts I/O performance.  This is why I don't
> use smartd(8) on any of our systems.  But if I was to use it?  I would
> have it poll maybe every 120 minutes, rather than every 30.  It all
> depends on the system/load/etc..  I've seen people poll every 5 minutes
> (I think they're absolutely crazy/paranoid).  Their systems, their
> problem.  :-)
>
> Hope this helps.
>
> --
> | Jeremy Chadwick                                 j...@parodius.com |
> | Parodius Networking                     http://www.parodius.com/ |
> | UNIX Systems Administrator                 Mountain View, CA, US |
> | Making life hard for others since 1977.             PGP 4BD6C0CB |
>

Re: problems with AHCI on FreeBSD 8.2

2012-02-14 Thread Victor Balada Diaz

On Tue, Feb 14, 2012 at 06:17:19PM +0100, Harald Schmalzbauer wrote:
> Jeremy Chadwick wrote on 14.02.2012 17:50 (localtime):
> > On Tue, Feb 14, 2012 at 04:55:10PM +0100, Claudius Herder wrote:
> >> Hello,
> >>
> >> I have got a quite similar problem with AHCI on FreeBSD 8.2 and it still
> >> persists on FreeBSD 9.0 release.
> >>
> >> Switching from ahci to ataahci resolved the problem for me too.
> >>
> >> I'm using gmirror for swap, system is on a zpool and the problem first
> >> occurred during a zpool scrub, but it is easily reproducible with dd.
> >>
> >> The timeouts only occur when writing to disks, dd if=/dev/ada{0|1}
> >> of=/dev/null is not an issue.
> >> Sometimes I need to power off the server because after a reboot one disk
> >> is still missing.
> >>
> >> I really would like to help in this issue, so let me know if you need
> >> any more information.
> > I find it interesting that, at least so far, the only people reporting
> > problems of this type with the ahci.ko driver are people using Samsung
> > disks.  The only difference is that your models are F1s while the OP's
> > are F2s.
> 
> I saw such timeouts long ago and mav@ had a look at my postings and he
> mentioned it could be an NCQ problem.
> I suspected the disks' firmware.
> I never tracked it down further, because replacing the Samsung (F3
> in that case) disks with Hitachi ones solved all my problems and gave a
> big performance kick as well (with zfs).
> You can find the discussion here:
> http://lists.freebsd.org/pipermail/freebsd-stable/2010-February/055374.html
> 

You gave me a good idea: try to disable NCQ and see if that's the fault. So
I went and applied the attached patch. After it, I can no longer reproduce
the issue with the ahci driver.

I know this is not a solution because it disables NCQ at controller level
instead of disk level, but at least we know for sure where the problem is.

I think the solution would be to add a new quirk, ADA_Q_NONCQ, in
sys/cam/ata/ata_da.c.
The quirks infrastructure is already built, so adding a new quirk for this
seems easy.

Is someone interested? Do you think there is a better solution?

If someone is interested I can build a patch to add the ADA_Q_NONCQ quirk
and add my drives to it.

Regards.
-- 
The most reliable proof that intelligent life exists on other
planets is that they have not tried to contact us.

Re: CARP carpdev

2012-02-14 Thread Hugo Silva


On 02/14/12 17:33, Freddie Cash wrote:
> On Tue, Feb 14, 2012 at 8:56 AM, Hugo Silva wrote:
> > Looks like there's been conversations about porting this to FreeBSD since
> > at least 2007.
> >
> > Are there any plans to have ifconfig carpdev available in 9.0-STABLE?
>
> CARP support has been redone in 10-CURRENT, removing the whole "carp0"
> pseudo-interface support, and just enabling the CARP protocol on the
> existing network interfaces. This includes the equivalent of "carpdev"
> support.
>
> Search the -current archives for more information, CFT, and so on.
>
> I don't recall seeing anything about specific plans to MFC to
> stable/9, but could be mis-remembering things.

http://svnweb.freebsd.org/base?view=revision&revision=228571

The single IP limitation may be a problem in some locations.

I did not find anything about a possible MFC either. glebius@ is cc'd;
perhaps he can add something, but based on
http://svn.freebsd.org/base/stable/9/UPDATING, I don't think it's been
MFC'd (there's a primer for the new carp in current's UPDATING).



Re: LSI supported mps(4) driver in stable/9 and stable/8

2012-02-14 Thread Ollivier Robert

According to Kenneth D. Merry:
> So it is perfectly fine to run the driver in stable/9 or stable/8 without
> the CAM changes.

Excellent, thank you Ken.

-- 
Ollivier ROBERT -=- FreeBSD: The Power to Serve! -=- robe...@keltia.freenix.fr
In memoriam to Ondine : http://ondine.keltia.net/

Re: New BSD Installer

2012-02-14 Thread Mike Andrews


On 2/14/2012 3:05 PM, Devin Teske wrote:
> Please don't get rid of fdisk or bsdlabel as they are (and forever will be)
> required to do things like:
>
> 1. scripted formatting of a thumb drive
>
> 2. automated probing of disk information (fdisk -p)
>
> 3. Other tasks that are not suitably handled by curses-based utilities
>
> For example, the following command will create a second Windows partition on a
> thumb drive without user interaction:
>
> echo "p 2 0x0c * *" | fdisk -f - /dev/da0
>
> If you take away fdisk, how am I supposed to achieve the above?


/sbin/gpart add -t 12 -i 2 da0

(Untested, but that should work...)

gpart is very scriptable, and still handles MBR and bsdlabel partitions 
if you need to work with removable media or volumes that will never be 
larger than 2 TB.  "gpart list" and "gpart show" would get you all the 
machine-parsable stuff you'd ever need.


The 2 TB limit is *the* reason to move from MBR+bsdlabel to GPT though.  
Even without RAID, 3 TB disks exist already. :)  With FreeBSD's boot 
code, you don't even need an EFI-capable machine to boot from a 
GPT-partitioned device.  For non-removable media, it's time to move on.  
Really.  :)  Even on smaller 250 GB disks, I'm using GPT just because
there's no reason not to... it's just cleaner, and it was easier to write
gpart scripts than it was to script fdisk/bsdlabel anyway.
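
The 2 TB figure falls out of the MBR on-disk format: each partition entry
stores its start and its length as 32-bit sector counts. A quick sketch of
the arithmetic, assuming 512-byte sectors (drives with larger logical sectors
shift the limit); the function and constant names are made up for
illustration:

```python
SECTOR_SIZE = 512        # bytes per logical sector (assumption)
MAX_SECTORS = 2 ** 32    # largest count a 32-bit MBR field can address

MBR_LIMIT_BYTES = SECTOR_SIZE * MAX_SECTORS  # 2 TiB


def fits_in_mbr(disk_bytes):
    """True if a disk of this size is fully addressable through MBR fields."""
    return disk_bytes <= MBR_LIMIT_BYTES


print(MBR_LIMIT_BYTES)            # size of the limit in bytes
print(fits_in_mbr(250 * 10**9))   # a 250 GB disk
print(fits_in_mbr(3 * 10**12))    # a 3 TB disk
```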


Re: Why won't 8.2 umount -f?

2012-02-14 Thread Doug Barton

On 02/14/2012 08:39, Rick Macklem wrote:
> I took a look and they seem to have been MFC'd.

That's awesome! Thanks for your time on this. I guess we've got some
upgrading to do.


Doug

-- 

It's always a long day; 86400 doesn't fit into a short.

Breadth of IT experience, and depth of knowledge in the DNS.
Yours for the right price.  :)  http://SupersetSolutions.com/


Re: problems with AHCI on FreeBSD 8.2

2012-02-14 Thread Jeremy Chadwick

On Tue, Feb 14, 2012 at 09:19:02PM +0100, Oscar Prieto wrote:
> Thank you Jeremy, I'm already checking your links.
> 
> When I installed smartd I configured a daily short test and a weekly
> long one for all the drives, run while the machine is mostly unused. From
> reading the documentation and info around, I never thought it could be a
> problem.
> 
> # /usr/local/etc/smartd.conf
> /dev/ada0 -a -o on -S on -s (S/../.././03|L/../../2/07)
> /dev/ada1 -a -o on -S on -s (S/../.././04|L/../../3/07)
> /dev/ada2 -a -o on -S on -s (S/../.././05|L/../../4/07)
> /dev/ada3 -a -o on -S on -s (S/../.././06|L/../../5/07)

The problem is that, quite honestly, these do you zero good.  All it does
is make a mess (per se) of the SMART self-test log.

Take for example your situation with ada3: smartd(8) told you that the
number of pending sectors increased to 5, and uncorrected increased to
1.  That's really all you need to know at that point.  If you want to
know the LBA numbers which are problematic, you can manually intervene.

The point is: the drive itself is going to notice problematic or bad
sectors quicker than periodic short or long or surface scan tests will.
Let the drive do its thing normally and only use SMART tests when
there's indication something is wrong.
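
For anyone reading along who has not met the -s directive: as far as I
understand smartd.conf(5), smartd builds a string of the form T/MM/DD/d/HH
(test type, month, day of month, day of week with 1 = Monday, hour) for the
current hour and starts a test when the regular expression matches it. The
Python sketch below models that check for the ada0 schedule quoted above;
due_tests is a hypothetical name, and Python's re module stands in for the
regex engine smartd actually uses.

```python
import re
from datetime import datetime


def due_tests(schedule, when):
    """Return the test types (L, S, C, O) whose date string matches schedule."""
    due = []
    for test_type in "LSCO":
        # T/MM/DD/d/HH, with d = ISO weekday (Monday is 1, Sunday is 7).
        stamp = f"{test_type}/{when:%m/%d}/{when.isoweekday()}/{when:%H}"
        if re.fullmatch(schedule, stamp):
            due.append(test_type)
    return due


ADA0_SCHEDULE = r"(S/../.././03|L/../../2/07)"

# 2012-02-14 was a Tuesday (weekday 2).
print(due_tests(ADA0_SCHEDULE, datetime(2012, 2, 14, 3)))  # short test hour
print(due_tests(ADA0_SCHEDULE, datetime(2012, 2, 14, 7)))  # long test hour
```

Read this way, the four lines schedule a short test every day between 03:00
and 06:00 and a long test once a week, staggered across the disks.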

> I'll remove the checks. Do you advise removing the daemon altogether?

smartd(8) is useful because it keeps track of attributes which change in
value and logs data to syslog (if I remember right), thus you have an
exact time/date when an attribute changed.  This is especially useful
for things pertaining to sector/physical media problems.

As such, I tend to recommend folks using smartd(8) properly tune their
smartd.conf to only monitor specific attributes.  This varies from drive
to drive, but the key ones are things like attributes 5, 10, 11, 192,
193, 194 (if you want temperature logging), 196, 197, 198, 199, and 200.
I'm speaking strictly for Western Digital disks here.

The stock defaults, if I remember right, are to "monitor everything",
which really doesn't work well given that so many vendors encode their
RAW_VALUE fields in proprietary/vendor-specific formats.  People will
often monitor things like the Hardware_ECC_Recovered attribute and start
"freaking out" one day when the value goes from 0 to 838938239 or
something larger.  Attribute data formats are not part of the ATA
standard, so vendors encode them however they choose.  Plus, not many admins that
I've run into (honest) know what that attribute actually means
disk-wise (hint: it's 100% normal for sector ECC to happen at all times;
magnetic media is not perfect, that's what the per-sector ECC section is
for!)
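
To make the RAW_VALUE point concrete: the raw field is six bytes wide, and a
vendor may pack several counters into it, so the decimal number a tool prints
need not be a single count. The sketch below splits a raw value into three
16-bit words. That particular packing is purely a hypothetical example, not
the documented layout of any drive or of Hardware_ECC_Recovered, and
split_raw16 is a made-up name.

```python
def split_raw16(raw_value):
    """Split a 48-bit RAW_VALUE into three 16-bit words, most significant first."""
    if not 0 <= raw_value < 2 ** 48:
        raise ValueError("RAW_VALUE is a 48-bit field")
    return tuple((raw_value >> shift) & 0xFFFF for shift in (32, 16, 0))


# The alarming-looking number from the text, viewed as three small words.
print(split_raw16(838938239))
```

Whether such a split is meaningful depends entirely on the vendor, which is
the argument for not alerting on attributes whose encoding is unknown.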

However: people don't understand what SMART attribute acquisition
actually does behind the scenes -- it results in the disk having to read
from the HPA area (not user accessible or within LBA regions), which
means seeking + moving the arms to an area, reading, then reporting all
of this back.  Thus, it impacts I/O performance.  This is why I don't
use smartd(8) on any of our systems.  But if I was to use it?  I would
have it poll maybe every 120 minutes, rather than every 30.  It all
depends on the system/load/etc..  I've seen people poll every 5 minutes
(I think they're absolutely crazy/paranoid).  Their systems, their
problem.  :-)

Hope this helps.

-- 
| Jeremy Chadwick j...@parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |


Re: problems with AHCI on FreeBSD 8.2

2012-02-14 Thread Oscar Prieto

Thank you Jeremy, I'm already checking your links.

When I installed smartd I configured a daily short test and a weekly
long one for all the drives, run while the machine is mostly unused. From
reading the documentation and info around, I never thought it could be a
problem.

# /usr/local/etc/smartd.conf
/dev/ada0 -a -o on -S on -s (S/../.././03|L/../../2/07)
/dev/ada1 -a -o on -S on -s (S/../.././04|L/../../3/07)
/dev/ada2 -a -o on -S on -s (S/../.././05|L/../../4/07)
/dev/ada3 -a -o on -S on -s (S/../.././06|L/../../5/07)

I'll remove the checks. Do you advise removing the daemon altogether?


On Tue, Feb 14, 2012 at 8:51 PM, Martin Sugioarto  wrote:
> On Tue, 14 Feb 2012 20:24:32 +0100,
> Harald Schmalzbauer wrote:
>
>> I guess it's always the firmware of the EcoGreen models which cause
>> these problems. Your drive isn't EG...
>> I don't remember exactly the different model numbers, but I'm sure
>> they were all EcoGreen. The lower power consumption was the reason to
>> choose these specific drives (different capacities and F2/F3 series
>> tried), with acceptable performance loss - I thought. But it turned
>> out that EcoGreen and NCQ as well as RAIDZ demands don't fit
>> together...
>
> Hi,
>
> I intentionally did not buy any Eco or Green model because I don't like
> them (Load_Cycle_Count bugs and so on). I realized, I like to use 1 Watt
> more power but have the performance doubled.
>
> --
> Martin

Re: New BSD Installer

2012-02-14 Thread Jeremy Chadwick

On Tue, Feb 14, 2012 at 12:05:31PM -0800, Devin Teske wrote:
> Please don't get rid of fdisk or bsdlabel as they are (and forever will be)
> required to do things like:
> 
> 1. scripted formatting of a thumb drive

Can't this be done with gpart(8)?  There are scripts all over the web
and on the lists here showing people using it for that purpose.  It
doesn't require use of GPT either.

> 2. automated probing of disk information (fdisk -p)

Can't this be accomplished with "gpart list"?  Yes I know the man page
doesn't appear to have it documented, but it's there.  Furthermore,
fdisk -p shows silly things like C/H/S nomenclature; do you really use
this?  Do you have boards which don't support even the most basic 28-bit
LBA addressing?

> 3. Other tasks that are not suitably handled by curses-based utilities
>
> ...
>
> For example, the following command will create a second Windows partition on a
> thumb drive without user interaction:
> 
>   echo "p 2 0x0c * *" | fdisk -f - /dev/da0
> 
> If you take away fdisk, how am I supposed to achieve the above?

Again: gpart(8).  And before you complain: yes, I am in full agreement
that introduction of gpart into the fray should have probably been "more
public".  The syntax of the gpart commands takes some getting used to as
well (some things are hardly intuitive, but eventually make sense once
you see them in use).

I'm happy to use gpart for scripting, while fdisk/bsdlabel are like
pulling teeth.  That said, like others, I would be thrilled to see fdisk
and bsdlabel/disklabel disappear.  However, for that to happen, I really
expect gpart to be better documented.  Hell, all of the GEOM-based g*
utilities should be implemented slightly... differently.  It's hard to
explain what I mean by this.  Play with the geom(8) command sometime to
see what I mean.  "geom list" says to use "geom list list", etc..  Once
you delve into the code to see how it all works it then starts making
more sense why the utilities behave this way, but it's completely and
entirely non-intuitive to anyone not already familiar with it.

-- 
| Jeremy Chadwick j...@parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |


RE: New BSD Installer

2012-02-14 Thread Devin Teske



> -----Original Message-----
> From: Kevin Oberman [mailto:kob6...@gmail.com]
> Sent: Tuesday, February 14, 2012 11:51 AM
> To: Devin Teske
> Cc: Ian Smith; Bruce Cran; Alex Samorukov; Joe Holden; FreeBSD Stable Mailing
> List
> Subject: Re: New BSD Installer
> 
> On Tue, Feb 14, 2012 at 9:43 AM, Devin Teske 
> wrote:
> >
> >
> >> -----Original Message-----
> >> From: owner-freebsd-sta...@freebsd.org [mailto:owner-freebsd-
> >> sta...@freebsd.org] On Behalf Of Ian Smith
> >> Sent: Tuesday, February 14, 2012 9:15 AM
> >> To: Bruce Cran
> >> Cc: FreeBSD Stable Mailing List; Joe Holden; Alex Samorukov
> >> Subject: Re: New BSD Installer
> >>
> >> Strangely, the big push to GPT partitions was oft said to be because MBR
> >> slices provided too few partitions.
> >
> > That's part of it (no pun intended).
> >
> > The other big deal is that you can't exceed 2TB on a single primary
> > partition.
> >
> >
> >> I never found 4 * 6 much of a limit
> >> myself, and now the default install makes a Linux-like single partition,
> >> rendering dump & restore more or less unusable or at least impractical,
> >
> > I'm with you on this one. I really don't like the single-"/" setup.
> >
> >
> >> while booting multiple systems on GPT also seems to require Linux tools.
> >>
> >> I don't know whether this move away from BSD traditional filesystem
> >> partitioning (/, /var, /usr etc) to Linux-style came down from Core On
> >> High or is just the prerogative of installer-writers?  Jordan was both
> >> the latter and a big part of the former for many years, but I guess
> >> that's something that can be reverted if people feel to do so.
> >>
> >
> > Maybe a vote should be taken. There's about 12 votes in this office here
> > alone for putting the partition scheme back the way it was (Colin Percival
> > had a great formula for determining partition sizes).
> 
> I suggest that both be implemented, which looks to the untrained eye
> as a straight-forward thing to implement, and then the install ask if
> a single partition or a traditional multi-partition system should be
> installed. I prefer multi and use that on all of my systems.
> 
> I also really prefer GPT for a variety of reasons, but we need better
> tools to support things. I miss booteasy. Yes, you can get it to boot
> from a different partition, but it is a pain. I deal with it by
> putting FreeBSD on one disk and Windows on another when I want a
> dual-boot system. I put the MBR-formatted (Windows) disk first in the
> boot order, so I can just hit F5 to boot the FreeBSD disk.
> 
> This works for me, but I suspect that lots of people would prefer
> having multiple OSes on a single disk...especially when it's a single
> spindle laptop. (I suspect laptops are more commonly dual-boot than
> most any other platform.)
> 
> As for fdisk and bsdlabel, I'm happy to see both go. They have a
> horrid user interface and require a calculator to get right. Yes, I
> use them, but only because there is no other way to do some things.
> (sade(8) comes closer all of the time, though.)

Please don't get rid of fdisk or bsdlabel as they are (and forever will be)
required to do things like:

1. scripted formatting of a thumb drive

2. automated probing of disk information (fdisk -p)

3. Other tasks that are not suitably handled by curses-based utilities

For example, the following command will create a second Windows partition on a
thumb drive without user interaction:

echo "p 2 0x0c * *" | fdisk -f - /dev/da0

If you take away fdisk, how am I supposed to achieve the above?
-- 
Devin


Re: disk devices speed is ugly

2012-02-14 Thread Peter Jeremy

On 2012-Feb-13 08:28:21 -0500, Gary Palmer  wrote:
>The filesystem is the *BEST* place to do caching.  It knows what metadata
>is most effective to cache and what other data (e.g. file contents) doesn't
>need to be cached.

Agreed.

>  Any attempt to do this in layers between the FS and
>the disk won't achieve the same gains as a properly written filesystem. 

Agreed - but traditionally, Unix uses this approach via block devices.
For various reasons, FreeBSD moved caching into UFS and removed block
devices.  Unfortunately, this means that any FS that wants caching has
to implement its own - and currently only UFS & ZFS do.

What would be nice is a generic caching subsystem that any FS can use
- similar to the old block devices but with hooks to allow the FS to
request read-ahead, advise of unwanted blocks and ability to flush
dirty blocks in a requested order with the equivalent of barriers
(request Y will not occur until preceding request X has been
committed to stable media).  This would allow filesystems to regain
the benefits of block devices with minimal effort and then improve
performance & cache efficiency with additional work.
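
As a rough illustration of the barrier idea, here is a toy write-back cache
in Python. It is a sketch of the ordering guarantee only, not a proposal for
the kernel interface: the class and method names are invented, blocks are
plain integers, and "committing" is just a callback. Writes between two
barriers may be reordered (here, sorted by block number), but nothing issued
after a barrier is committed before everything issued ahead of it.

```python
class OrderedWriteCache:
    """Toy write-back cache that preserves ordering across barriers."""

    def __init__(self):
        # Each epoch maps block number -> data; epochs are flushed in order.
        self._epochs = [{}]

    def write(self, block, data):
        """Buffer a dirty block; a rewrite within one epoch coalesces."""
        self._epochs[-1][block] = data

    def barrier(self):
        """Later writes must not reach stable media before earlier ones."""
        if self._epochs[-1]:
            self._epochs.append({})

    def flush(self, commit):
        """Commit every dirty block, epoch by epoch, via commit(block, data)."""
        for epoch in self._epochs:
            for block in sorted(epoch):  # free to reorder inside an epoch
                commit(block, epoch[block])
        self._epochs = [{}]


cache = OrderedWriteCache()
cache.write(7, b"data")
cache.write(3, b"data")
cache.barrier()
cache.write(1, b"journal commit record")

order = []
cache.flush(lambda block, data: order.append(block))
print(order)
```

Block 1 has the lowest number but is flushed last, because it was issued
after the barrier.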

One downside of the "each FS does its own caching" is that the caches
are all separate and need careful integration into the VM subsystem to
prevent starvation (eg past problems with UFS starving ZFS L2ARC).

-- 
Peter Jeremy


Re: problems with AHCI on FreeBSD 8.2

2012-02-14 Thread Jeremy Chadwick

On Tue, Feb 14, 2012 at 08:31:23PM +0100, Oscar Prieto wrote:
> I used to have tons of ahci errors in my 4 disk raidz1 worth of
> HD154UIs when the rig was built a year ago or so (with 8.0 Release),
> but they disappeared after tuning ZFS.
> 
> Sadly I also got a new timeout days ago followed by smartctl errors I
> still keep unchecked, but I guess they could be legit; I still have to
> test/swap cables and give it a try.

About your ada3 disk:

The below SMART errors indicate your disk does in fact have physical
media problems -- 1 confirmed bad sector, and 5 which are "suspect".
"Suspect" LBAs are unreadable until writes are issued to them.  A write
will induce the drive to re-analyse the sector at that LBA and determine
if it's truly bad or not.  A single LBA can actually take quite a long
time to analyse (it depends on what the problem is), and may result in
30+ seconds of delay.  You can either let the drive figure it out over
normal usage patterns, or you can do it manually yourself time
permitting.  Your drive that shows read failures in the SMART self-test
log gives you the LBA numbers; try reading from those LBAs first.  I can
explain this procedure in another thread/offline/whatever.  (Does anyone
read what I write, re: don't hijack the thread?  :-) )
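
A minimal sketch of the "try reading from those LBAs" step, in Python. It
assumes 512-byte logical sectors and that the device node can be opened for
reading (root on a real system); read_lba is a hypothetical helper, and the
device path and LBA in the comment are placeholders, not values from the
SMART log. Reading is non-destructive; on a real disk an unreadable sector
would be expected to surface as an OSError, possibly after a long delay.

```python
SECTOR_SIZE = 512  # bytes per logical sector (assumption)


def read_lba(device_path, lba, count=1):
    """Read `count` sectors starting at `lba` and return the raw bytes."""
    with open(device_path, "rb", buffering=0) as dev:
        dev.seek(lba * SECTOR_SIZE)
        return dev.read(count * SECTOR_SIZE)


# Example call (placeholder path and LBA, not executed here):
#   try:
#       read_lba("/dev/ada3", 123456)
#   except OSError as err:
#       print("read failed:", err)
```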

About all of your disks:

All of your disks are undergoing regular/periodic SMART short and long
tests.  Please stop this; it really, truly does no good.  You will
experience performance hits during these tests.

About timeouts:

Timeouts seen on the controller and driver level can happen in this
situation; this is universal.  This is usually what features like
Western Digital's TLER and Hitachi + Samsung's CCTL can help alleviate,
but not fully solve.  I think the ada(4) default timeout of 30 seconds
is a decent value, to be quite honest, but I'm not sure what the AHCI
driver timeout is.  mav@ would need to clue me in, or I'd need to go
look at the source.  (Right now in my life is not a good time for me to
be reviewing source code or looking at commits, sadly.  Too much on my
mind recently.)

I can discuss the TLER/CCTL stuff more at length if needed, but to be
blatantly honest, I would rather not and here's why: people begin to
rely on these features to try and circumvent actual problems with their
drives.  Phrased differently: people on the Internet become incredibly
focused on all of these timeout durations (TLER/CCTL vs. controller vs.
driver vs. storage subsystem timeouts) and try to find some bizarre
"perfect harmony" between them all.  Instead, just leave them all alone
and watch your drives for problems.

Further details which pertain to Samsung drives:

In your case, you run smartd(8), which periodically hits the drive with
SMART requests, pulling attribute data down and parsing it.  I believe
your model is fine for this, but for similar Samsung models, I must
strongly advise against this.  There are well-documented problems with
Samsung firmwares and SMART behaviour which can result in data loss (yes
you read that right).  Please see smartmontools' Wiki page on the matter
for full details.  Just make sure you're running a fixed firmware:

http://sourceforge.net/apps/trac/smartmontools/wiki/SamsungF4EGBadBlocks

Regarding throughput of the drives being slow (30-40MBytes/sec across a
gigE link):

This sounds more like a Samba tuning problem, but ZFS raidz isn't known
for "amazing speed" per se.  Please see a post of mine from a while back
on how to tune Samba, which many followed up to with appreciation
stating their throughput increased dramatically:

http://lists.freebsd.org/pipermail/freebsd-stable/2011-February/061642.html

I should follow up to that post with the following entry, because I've
since updated my own smb.conf to tune things a bit better, and include
comments as to the justifications:

#
# The below options increase throughput substantially.  Be aware
# that AIO support requires the aio.ko kernel module loaded,
# and Samba to be built with AIO enabled.  Important notes:
#
# 1) We explicitly disable sendfile(2) because it has known
# problems on ZFS, including resulting in 2x the amount of memory
# used on the machine (VM cache + ZFS cache).  For further details,
# see freebsd-fs or freebsd-stable thread, subject "8.1-STABLE:
# zfs and sendfile: problem still exists".
#
# 2) (2011/10/03) socket options SO_SNDBUF and SO_RCVBUF do not
# appear to matter on FreeBSD, or our sysctls somehow take care of
# this (or maybe AIO?).  The performance is the same with or without
# these two socket options on 8.2-STABLE.
#
# 3) (2011/10/03) My previously-mentioned "aio write behind" option
# is incorrect; see the official smb.conf(5) man page for the syntax.
# It's not a yes/no toggle, thus serves no purpose.
#
socket options = TCP_NODELAY
use sendfile = no
min receivefile size = 16384
aio read size = 16384
aio write size = 16384

The rest is in the thread I linked.

Hope this helps.

-- 
| Jeremy Chadwick

Re: problems with AHCI on FreeBSD 8.2

2012-02-14 Thread Martin Sugioarto

On Tue, 14 Feb 2012 20:24:32 +0100,
Harald Schmalzbauer wrote:

> I guess it's always the firmware of the EcoGreen models which causes
> these problems. Your drive isn't EG...
> I don't remember exactly the different model numbers, but I'm sure
> they were all EcoGreen. The lower power consumption was the reason to
> choose these specific drives (different capacities and F2/F3 series
> tried), with acceptable performance loss - I thought. But it turned
> out that EcoGreen and NCQ as well as RAIDZ demands don't fit
> together...

Hi,

I intentionally did not buy any Eco or Green model because I don't like
them (Load_Cycle_Count bugs and so on). I realized I would rather use 1 watt
more power and have double the performance.

--
Martin
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: New BSD Installer

2012-02-14 Thread Kevin Oberman

On Tue, Feb 14, 2012 at 9:43 AM, Devin Teske  wrote:
>
>
>> -Original Message-
>> From: owner-freebsd-sta...@freebsd.org [mailto:owner-freebsd-
>> sta...@freebsd.org] On Behalf Of Ian Smith
>> Sent: Tuesday, February 14, 2012 9:15 AM
>> To: Bruce Cran
>> Cc: FreeBSD Stable Mailing List; Joe Holden; Alex Samorukov
>> Subject: Re: New BSD Installer
>>
>> Strangely, the big push to GPT partitions was oft said to be because MBR
>> slices provided too few partitions.
>
> That's part of it (no pun intended).
>
> The other big deal is that you can't exceed 2TB on a single primary partition.
>
>
>> I never found 4 * 6 much of a limit
>> myself, and now the default install makes a Linux-like single partition,
>> rendering dump & restore more or less unusable or at least impractical,
>
> I'm with you on this one. I really don't like the single-"/" setup.
>
>
>> while booting multiple systems on GPT also seems to require Linux tools.
>>
>> I don't know whether this move away from BSD traditional filesystem
>> partitioning (/, /var, /usr etc) to Linux-style came down from Core On
>> High or is just the prerogative of installer-writers?  Jordan was both
>> the latter and a big part of the former for many years, but I guess
>> that's something that can be reverted if people feel to do so.
>>
>
> Maybe a vote should be taken. There's about 12 votes in this office here alone
> for putting the partition scheme back the way it was (Colin Percival had a
> great formula for determining partition sizes).
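
For reference, the 2TB ceiling mentioned in the quoted text follows from the MBR partition entry storing its length as a 32-bit sector count. The Python sketch below just works through that arithmetic; the 512-byte sector size is an assumption, and the function name is illustrative only.

```python
# Sketch of where the MBR "2TB per partition" limit comes from.
# Assumption: 512-byte sectors. An MBR partition entry holds its length as a
# 32-bit sector count, so the largest length is 2**32 - 1 sectors.

MBR_LENGTH_BITS = 32


def mbr_max_partition_bytes(sector_size: int = 512) -> int:
    """Largest partition size, in bytes, that an MBR entry can describe."""
    max_sectors = 2 ** MBR_LENGTH_BITS - 1
    return max_sectors * sector_size


if __name__ == "__main__":
    limit = mbr_max_partition_bytes()
    print(limit, "bytes")
    print(limit / 2 ** 40, "TiB")
```

With 512-byte sectors this lands just under 2 TiB; GPT uses 64-bit LBAs and so does not hit this ceiling.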

I suggest that both be implemented, which looks to the untrained eye
like a straightforward thing to do, and that the installer then ask
whether a single partition or a traditional multi-partition system should
be installed. I prefer multi and use that on all of my systems.

I also really prefer GPT for a variety of reasons, but we need better
tools to support things. I miss booteasy. Yes, you can get it to boot
from a different partition, but it is a pain. I deal with it by
putting FreeBSD on one disk and Windows on another when I want a
dual-boot system. I put the MBR-formatted (Windows) disk first in the
boot order, so I can just hit F5 to boot the FreeBSD disk.

This works for me, but I suspect that lots of people would prefer
having multiple OSes on a single disk...especially when it's a single
spindle laptop. (I suspect laptops are more commonly dual-boot than
most any other platform.)

As for fdisk and bsdlabel, I'm happy to see both go. They have a
horrid user interface and require a calculator to get right. Yes, I
use them, but only because there is no other way to do some things.
(sade(8) comes closer all of the time, though.)
-- 
R. Kevin Oberman, Network Engineer
E-mail: kob6...@gmail.com

Re: problems with AHCI on FreeBSD 8.2

2012-02-14 Thread Harald Schmalzbauer

Martin Sugioarto wrote on 14.02.2012 19:23 (localtime):
> On Tue, 14 Feb 2012 18:17:19 +0100,
> Harald Schmalzbauer wrote:
>
>>> I find it interesting that, at least so far, the only people
>>> reporting problems of this type with the ahci.ko driver are people
>>> using Samsung disks.  The only difference is that your models are
>>> F1s while the OPs are F2s.
>> I saw such timeouts long ago and mav@ had a look at my postings and he
>> mentioned it could be an NCQ problem.
>> I suspected the disks' firmware.
>> I never tracked it down further, because replacing the Samsung
>> (F3 in that case) disks with Hitachi ones solved all my problems and
>> gave a big performance kick as well (with zfs).
>> You can find the discussion here:
>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-February/055374.html
> Hi,
>
> I just want to add here that I am using 2 drives of type "Samsung
> HD103SJ" (SpinPoint F3). And I did not have problems with ZFS and with
> UFS either (for several years now). Everything has been deployed on top of
> ada(4) since FreeBSD-8.
>
> Actually the speed is very good (sequential read at 140 MB/s and more).

I guess it's always the firmware of the EcoGreen models which causes
these problems. Your drive isn't EG...
I don't remember exactly the different model numbers, but I'm sure they
were all EcoGreen. The lower power consumption was the reason to choose
these specific drives (different capacities and F2/F3 series tried),
with acceptable performance loss - I thought. But it turned out that
EcoGreen and NCQ as well as RAIDZ demands don't fit together...

-Harry




Re: hang during dump (reproducible)

2012-02-14 Thread Andrew Boyer


On Feb 10, 2012, at 9:50 PM, Jake Holland wrote:
> 
> Many thanks to Attilio Rao, Kostik Belousov, and Andriy Gapon. And anybody 
> else involved.
> 
> However, when I looked at the commit I noticed this:
>> $ svn log -r228424 svn://svn.freebsd.org/base
>  ...
>> MFC after:  3 months (or never)
> 
> I'm not sure whether "never" is still considered an option, but it would be 
> useful for me if 8.3 release, when it comes, does not hang this way during 
> panic. But thanks for the patch, regardless.
> 

Agreed - if this commit could be MFC'd for 8.3 it would be much appreciated.

-Andrew

--
Andrew Boyer  abo...@averesystems.com





Re: 9-stable : geli + one-disk ZFS fails

2012-02-14 Thread Arno J. Klaassen


Hi,

Martin Simmons  writes:

> Some random ideas:
>
> 1) Can you dd the whole of ada0s3.eli without errors?

I just started it; will take some hours 

> 2) If you scrub a few more times, does it find the same number of errors each
> time and are they always in that XNAT.tar file?

I deleted the XNAT.tar; I also copied files by 'ssh tar -c | tar -xp' to
rule out NFS, same type of errors; Looks like multiple scrubs give the
same files but not the same number of chksum errors (to be confirmed)

> 3) Can you try zfs without geli?

sure, I will split the space into one partition with geli and one without

> 4) Is the slice/partition layout definitely correct?


I (still ???) use sysinstall to do the dirty computations for me.

This is what gpart says (looks OK to me ...):


[root@cc ~]# gpart list ada0
Geom name: ada0
modified: false
state: OK
fwheads: 16
fwsectors: 63
last: 976773167
first: 63
entries: 4
scheme: MBR
Providers:
1. Name: ada0s1
   Mediasize: 40802001408 (38G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 32256
   Mode: r0w0e0
   rawtype: 7
   length: 40802001408
   offset: 32256
   type: ntfs
   index: 1
   end: 79691471
   start: 63
2. Name: ada0s2
   Mediasize: 34359607296 (32G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 2147328000
   Mode: r3w3e5
   attrib: active
   rawtype: 165
   length: 34359607296
   offset: 40802033664
   type: freebsd
   index: 2
   end: 146800079
   start: 79691472
3. Name: ada0s3
   Mediasize: 424946221056 (395G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 2147196928
   Mode: r1w1e1
   rawtype: 165
   length: 424946221056
   offset: 75161640960
   type: freebsd
   index: 3
   end: 976773167
   start: 146800080
Consumers:
1. Name: ada0
   Mediasize: 500107862016 (465G)
   Sectorsize: 512
   Mode: r4w4e10
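
One way to double-check the "is the layout correct?" question is to recompute the byte offsets and lengths from the start/end sectors in the gpart output above. The Python sketch below does that, assuming 512-byte sectors (per the "Sectorsize: 512" lines). The 4096-byte alignment test is an extra illustration, not something gpart reports, and the function names are hypothetical.

```python
# Sketch: cross-check the gpart(8) numbers quoted above.
# Assumption: 512-byte sectors, as shown in the "Sectorsize: 512" lines.

SECTOR = 512

# (name, start sector, end sector, reported offset, reported length)
SLICES = [
    ("ada0s1", 63, 79691471, 32256, 40802001408),
    ("ada0s2", 79691472, 146800079, 40802033664, 34359607296),
    ("ada0s3", 146800080, 976773167, 75161640960, 424946221056),
]


def check_layout(slices, sector=SECTOR):
    """Return a list of inconsistencies; an empty list means none found."""
    problems = []
    prev_end = None
    for name, start, end, offset, length in slices:
        if start * sector != offset:
            problems.append(f"{name}: offset mismatch")
        if (end - start + 1) * sector != length:
            problems.append(f"{name}: length mismatch")
        if prev_end is not None and start <= prev_end:
            problems.append(f"{name}: overlaps previous slice")
        prev_end = end
    return problems


def is_4k_aligned(start, sector=SECTOR):
    """True if the slice begins on a 4096-byte boundary."""
    return (start * sector) % 4096 == 0


if __name__ == "__main__":
    print(check_layout(SLICES) or "layout is self-consistent")
    for name, start, *_ in SLICES:
        print(name, "4K-aligned:", is_4k_aligned(start))
```

On these numbers the offsets and lengths agree with the start/end sectors and the slices do not overlap, so the table itself looks internally consistent; that says nothing about what geli or ZFS do on top of it.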

  

Thanks,

Arno


> __Martin
>
>
>> On Mon, 13 Feb 2012 23:39:06 +0100, Arno J Klaassen said:
>> 
>> hello,
>> 
>> to eventually gain interest in this issue :
>> 
>>  I updated to today's -stable, tested with vfs.zfs.debug=1
>>  and vfs.zfs.prefetch_disable=0, no difference.
>> 
>>  I also tested to read the raw partition :
>> 
>>   [root@cc /usr/ports]# dd if=/dev/ada0s3 of=/dev/null bs=4096  conv=noerror
>>   103746636+0 records in
>>   103746636+0 records out
>>   424946221056 bytes transferred in 13226.346738 secs (32128768 bytes/sec)
>>   [root@cc /usr/ports]#
>> 
>>  Disk is brand new, looks ok, either my setup is not good or there is
>>  a bug somewhere; I can play around with this box for some more time,
>>  please feel free to provide me with some hints what to do to be useful
>>  for you.
>> 
>> Best,
>> 
>> Arno
>> 
>> 
>> "Arno J. Klaassen"  writes:
>> 
>> > Hello,
>> >
>> >
>> > I finally decided to 'play' a bit with ZFS on a notebook, some years
>> > old, but I installed a brand new disk and memtest passes OK.
>> >
>> > I installed base+ports on partition 2, using 'classical' UFS.
>> >
>> > I crypted partition 3 and created a single zpool on it containing
>> > 4 Z-"file-systems" :
>> >
>> >  [root@cc ~]# zfs list
>> >  NAME                      USED  AVAIL  REFER  MOUNTPOINT
>> >  zfiles                   10.7G   377G   152K  /zfiles
>> >  zfiles/home              10.6G   377G   119M  /zfiles/home
>> >  zfiles/home/arno         10.5G   377G  2.35G  /zfiles/home/arno
>> >  zfiles/home/arno/.priv    192K   377G   192K  /zfiles/home/arno/.priv
>> >  zfiles/home/arno/.scito  8.18G   377G  8.18G  /zfiles/home/arno/.scito
>> >
>> >
>> > I export the ZFS's via nfs and rsynced on the other machine some backup
>> > of my current note-book (geli + UFS, (almost) same 9-stable version, no
>> > problem) to the ZFS's.
>> >
>> >
>> > Quite fast, I see on the notebook :
>> >
>> >
>> >  [root@cc /usr/temp]# zpool status -v
>> >    pool: zfiles
>> >   state: ONLINE
>> >  status: One or more devices has experienced an error resulting in data
>> >          corruption.  Applications may be affected.
>> >  action: Restore the file in question if possible.  Otherwise restore the
>> >          entire pool from backup.
>> >     see: http://www.sun.com/msg/ZFS-8000-8A
>> >    scan: scrub repaired 0 in 0h1m with 11 errors on Sat Feb 11 14:55:34 2012
>> >  config:
>> >
>> >          NAME          STATE     READ WRITE CKSUM
>> >          zfiles        ONLINE       0     0    11
>> >            ada0s3.eli  ONLINE       0     0    23
>> >
>> >  errors: Permanent errors have been detected in the following files:
>> >
>> >          /zfiles/home/arno/.scito/contrib/XNAT.tar
>> >  [root@cc /usr/temp]# md5 /zfiles/home/arno/.scito/contrib/XNAT.tar
>> >  md5: /zfiles/home/arno/.scito/contrib/XNAT.tar: Input/output error
>> >  [root@cc /usr/temp]#
>> >
>> >
>> > As said, memtest is OK, nothing is logged to the console, UFS on the
>> > same disk works OK (I did some tests copying and comparing random data)
>> > and smartctl as well seems to trust the disk :
>> >
>> >  SMART Self-test log structure revision number 1
>> >  Num  Test_Description

Re: freebsd 9-stable TOP problem from around Jan 10

2012-02-14 Thread Kevin Oberman

On Tue, Feb 14, 2012 at 12:23 AM, Julian Elischer  wrote:
> Has anyone else seen a  problem with top -H -S?
>
> after a short while the screen gets more and more corrupted..
>
> hitting ^L or turning off S & H modes helps .. for a while.
>
> If this is a known fixed problem, let me know but I need to co-ordinate with
> others to upgrade the machine in question.

Not seeing it here on 9-stable. Could it be a display issue? I am
using gnome-terminal with TERM defined as 'xterm'.
-- 
R. Kevin Oberman, Network Engineer
E-mail: kob6...@gmail.com

Re: problems with AHCI on FreeBSD 8.2

2012-02-14 Thread Martin Sugioarto

On Tue, 14 Feb 2012 18:17:19 +0100,
Harald Schmalzbauer wrote:

> > I find it interesting that, at least so far, the only people
> > reporting problems of this type with the ahci.ko driver are people
> > using Samsung disks.  The only difference is that your models are
> > F1s while the OPs are F2s.
> 
> I saw such timeouts long ago and mav@ had a look at my postings and he
> mentioned it could be an NCQ problem.
> I suspected the disks' firmware.
> I never tracked it down further, because replacing the Samsung
> (F3 in that case) disks with Hitachi ones solved all my problems and
> gave a big performance kick as well (with zfs).
> You can find the discussion here:
> http://lists.freebsd.org/pipermail/freebsd-stable/2010-February/055374.html

Hi,

I just want to add here that I am using 2 drives of type "Samsung
HD103SJ" (SpinPoint F3). And I did not have problems with ZFS and with
UFS either (for several years now). Everything has been deployed on top of
ada(4) since FreeBSD-8.

Actually the speed is very good (sequential read at 140 MB/s and more).

--
Martin

Re: 6.2-Release ..ish.. CF + ata == freeze?

2012-02-14 Thread Ian Lepore

On Tue, 2012-02-14 at 00:12 -0500, Jason Hellenthal wrote:
> 
> On Mon, Feb 13, 2012 at 08:43:08PM -0800, john fleming wrote:
> > Just thought i would post over here as i'm not getting a warm fuzzy from 
> > checkpoint about being able to find the root cause of an issue. I have a 
> > large install base of IPSO checkpoint firewalls, which are based on FreeBSD 
> > 6.2. I've had 3 firewalls hang basically the same way, with something that 
> > looks like a filesystem issue or an issue with a CF card. 
> >  
> > Does anyone happen to know of any bugs (i've been looking around) that
> > could cause something like that? Granted, it could be a batch of bad CF
> > cards, but it's odd that i'm seeing the same thing on 3 different boxes and
> > once rebooted they seem ok.
> >
> > Also is it possible to get useful info from the ata controller when things
> > go south like this from the ddb prompt?
> >  
> > This is what shows in show msgbuf
> > ad0: timeout waiting to issue command
> > ad0: error issuing WRITE command
> > ad0: timeout waiting to issue command
> > ad0: error issuing WRITE command
> > ad0: timeout waiting to issue command
> > ad0: error issuing WRITE command
> > ad0: timeout waiting to issue command
> > ad0: error issuing WRITE command
> > g_vfs_done():ad0s4h[WRITE(offset=33849344, length=131072)]error = 5 
> > g_vfs_done():ad0s4h[WRITE(offset=33980416, length=131072)]error = 5 
> > g_vfs_done():ad0s4h[WRITE(offset=34111488, length=131072)]error = 5
> >  g_vfs_done():ad0s4h[WRITE(offset=34242560, length=131072)]error = 5 
> > g_vfs_done():ad0s4h[WRITE(offset=34373632, length=131072)]error = 5 
> >  
> > ad0: 1882MB  at ata0-master PIO4
> > atapci0:  port 
> > 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x5070-0x507f mem 0x80301000-0x803013ff 
> > at device 31.1 on pci0
> > ata0:  on atapci0
> > ata1:  on atapci0
> > atapci1:  port 
> > 0x5088-0x508f,0x50a4-0x50a7,0x5080-0x5087,0x50a0-0x50a3,0x5060-0x506f irq 
> > 15 at device 31.2 on pci0
> > ata2:  on atapci1
> > ata3:  on atapci1
> >
> > ad0s4h is basically a r/w ufs partition on the box where almost anything
> > that needs to be written goes.
> > trace
> > Tracing pid 1101 tid 100043 td 0x656d8460
> > kdb_enter(608cc388,6246,656d8460,64ba1400,6095d580,...) at kdb_enter+0x2b
> > siointr1(64ba1400) at siointr1+0xf0
> > siointr(64ba1400) at siointr+0x38
> > intr_execute_handler(6095d580,f0a4ab04,6,6095d580,f0a4aafc,...) at 
> > intr_execute_handler+0x61
> > intr_execute_handlers(6095d580,f0a4ab04,6,0,656d8460,...) at 
> > intr_execute_handlers+0x40
> > atpic_handle_intr(4) at atpic_handle_intr+0x96
> > Xatpic_intr4() at Xatpic_intr4+0x20
> > --- interrupt, eip = 0x606044af, esp = 0xf0a4ab48, ebp = 0xf0a4ab5c ---
> > lockmgr(e1456a04,6,0,656d8460) at lockmgr+0x58f
> > getdirtybuf(e14569a4,60a405e4,1) at getdirtybuf+0x2e2
> > flush_deplist(68b30850,1,f0a4abb8) at flush_deplist+0x30
> > flush_inodedep_deps(656fa28c,1f235) at flush_inodedep_deps+0xcf
> > softdep_sync_metadata(65964618) at softdep_sync_metadata+0x61
> > ffs_syncvnode(65964618,1) at ffs_syncvnode+0x3a2
> > ffs_fsync(f0a4ac74) at ffs_fsync+0x12
> > VOP_FSYNC_APV(60949260,f0a4ac74) at VOP_FSYNC_APV+0x38
> > fsync(656d8460,f0a4acb4) at fsync+0x170
> > syscall(805003b,806003b,5fbf003b,805,288be450,...) at syscall+0x2ee
> > Xint0x80_syscall() at Xint0x80_syscall+0x1f
> 
> This looks to be a problem with softupdates and CF cards. Can you get
> this to repeat on a brand new (good) card ?
> 

EIO errors on a write that lead to a panic nearly always backtrace into
the softupdates code, because that code pretty much has to panic if it
can't write things in the proper order.  That doesn't imply that the
softupdates code is at fault in any way, or that the errors would go
away if softupdates were turned off.  

In fact, I consider it important to have softupdates enabled on CF and
SDCard media.  The number of writes (and especially of repeated
re-writes of the same filesystem metadata sectors) goes way way up
without SU enabled, and that's bad for media with a limited number of
write cycles in its lifetime.

We've been using 6.2 with SU enabled on CF cards for many years at
Symmetricom; we're still shipping systems with that config.  Depending
on the motherboard or SBC, we often have to disable ata DMA, or limit it
to a max of WDMA2 mode.  The indication that you need to do so is
typically a lockup either trying to load the kernel and modules, or
sometimes that works but it locks up while initializing the ata driver.
[1]  If your systems have been running fine with DMA enabled, it's not
the sort of problem that suddenly appears out of the blue.  You find out
when you need to disable it pretty quickly on new hardware because it
doesn't boot reliably.

I tend to agree with Jeremy's assessment that you may have some CF cards
that have neared the end of their life, and especially if they're full
the automatic wear leveling can't find any un-worn cells to use.  If the
cards are old they may have primitive wear-leve

Re: 9-stable : geli + one-disk ZFS fails

2012-02-14 Thread Arno J. Klaassen


Hello Aleksandr,

>  Hello, Arno J. Klaassen!
>
> On Sat, Feb 11, 2012 at 04:53:10PM +0100
> a...@heho.snv.jussieu.fr wrote about "9-stable : geli + one-disk ZFS fails":
>> 
>> Hello,
>> 
>> 
>> I finally decided to 'play' a bit with ZFS on a notebook, some years
>> old, but I installed a brand new disk and memtest passes OK.
>> 
>> I installed base+ports on partition 2, using 'classical' UFS.
>> 
>> I crypted partition 3 and created a single zpool on it containing
>> 4 Z-"file-systems" :
>> 
>>  [root@cc ~]# zfs list
>>  NAME                      USED  AVAIL  REFER  MOUNTPOINT
>>  zfiles                   10.7G   377G   152K  /zfiles
>>  zfiles/home              10.6G   377G   119M  /zfiles/home
>>  zfiles/home/arno         10.5G   377G  2.35G  /zfiles/home/arno
>>  zfiles/home/arno/.priv    192K   377G   192K  /zfiles/home/arno/.priv
>>  zfiles/home/arno/.scito  8.18G   377G  8.18G  /zfiles/home/arno/.scito
>> 
>> 
>> I export the ZFS's via nfs and rsynced on the other machine some backup
>> of my current note-book (geli + UFS, (almost) same 9-stable version, no
>> problem) to the ZFS's.
>> 
>> 
>> Quite fast, I see on the notebook :
>> 
>> 
>>  [root@cc /usr/temp]# zpool status -v
>>    pool: zfiles
>>   state: ONLINE
>>  status: One or more devices has experienced an error resulting in data
>>          corruption.  Applications may be affected.
>>  action: Restore the file in question if possible.  Otherwise restore the
>>          entire pool from backup.
>>     see: http://www.sun.com/msg/ZFS-8000-8A
>>    scan: scrub repaired 0 in 0h1m with 11 errors on Sat Feb 11 14:55:34 2012
>>  config:
>>
>>          NAME          STATE     READ WRITE CKSUM
>>          zfiles        ONLINE       0     0    11
>>            ada0s3.eli  ONLINE       0     0    23
>>
>>  errors: Permanent errors have been detected in the following files:
>>
>>          /zfiles/home/arno/.scito/contrib/XNAT.tar
>>  [root@cc /usr/temp]# md5 /zfiles/home/arno/.scito/contrib/XNAT.tar
>>  md5: /zfiles/home/arno/.scito/contrib/XNAT.tar: Input/output error
>>  [root@cc /usr/temp]#
>> 
>> 
>> As said, memtest is OK, nothing is logged to the console, UFS on the
>> same disk works OK (I did some tests copying and comparing random data)
>> and smartctl as well seems to trust the disk :
>> 
>>  SMART Self-test log structure revision number 1
>>  Num  Test_Description    Status                   Remaining  LifeTime(hours)
>>  # 1  Extended offline    Completed without error        00%        388
>>  # 2  Short offline       Completed without error        00%        387
>> 
>> 
>> Am I doing something wrong and/or let me know what I could provide as
>> extra info to try to solve this (dmesg.boot at the end of this mail).
>> 
>> Thanx a lot in advance,
>> 
>> best, Arno
>
> Arno, you forgot to say how you created the geli partition.
> It is important.


  geli init /dev/ada0s3  (should I have used ' -s 4096 ' ???) 

I added later :

  geli  attach -k /tmp/ifmemoryfails.key1 -p /dev/ada0s3


In fact, on my regular laptop on which I now use UFS on top of GELI
I use /dev/ada0s3f, not the whole partition 

Hope this helps ;-)

thanx, best, Arno

RE: New BSD Installer

2012-02-14 Thread Devin Teske

> -Original Message-
> From: owner-freebsd-sta...@freebsd.org [mailto:owner-freebsd-
> sta...@freebsd.org] On Behalf Of Lars Engels
> Sent: Tuesday, February 14, 2012 9:28 AM
> To: Ian Smith
> Cc: Bruce Cran; Alex Samorukov; Joe Holden; FreeBSD Stable Mailing List
> Subject: Re: New BSD Installer
> 
> On Wed, Feb 15, 2012 at 04:15:17AM +1100, Ian Smith wrote:
> > On Sun, 12 Feb 2012 15:32:51 +, Bruce Cran wrote:
> >  > On 2/10/2012 7:47 PM, Alex Samorukov wrote:
> >  > > I am highly against reverting. Old installer is not GPT aware and in fact
> >  > > is unmaintained for a very long time.
> >  >
> >  > That's not really correct: quite a lot of work was done on it last year.
> >
> > Indeed.  Was it you working on the updated sade(8) adding GPT and ZFS?
> >
> > 
> >
> > I don't see it in terms of reverting.  Much other useful functionality
> > of sysinstall has yet to be reimplemented.
> 
> What exactly are you missing?
> There's sysutils/host-setup to configure your system like sysinstall
> did.

sysutils/host-setup (written/maintained by me) is only good for the following 
bits right now:

1. Time zone
2. Hostname/Domain
3. Network Interfaces
4. Default Router/Gateway
5. DNS nameservers

There's still quite a bit more that sysinstall(8) offered which isn't provided 
by anything yet (sysutils/host-setup included).

> There's sade and I am working on a tool to browse and add packages from
> the installation media and / or the ftp mirrors.

Ron McDowell and I are working on a new tool named "bsdconfig(8)" which is very 
modular and written in sh(1).

bsdconfig(8) is aimed squarely at reimplementing all of the sysinstall(8)
post-install bits so that we can cleanly whack sysinstall(8) without the prior
complaints.

The portion of bsdconfig(8) that will handle browsing and adding of packages 
from either the installation media or ftp mirrors is incomplete at the moment, 
and we'd love it if you were willing to either:

(a) download the preliminary framework for bsdconfig(8) and start working on 
the packages module, or

(b) join the SourceForge CVS project and start working on bsdconfig(8) in
realtime with Ron and me

NOTE: Choice of either option will result in further information being 
disbursed for your digestive pleasure.

So far, bsdconfig(8) has 8529 lines of code (counting all modules,
internationalization files, and Makefiles) with the following
modules/components (status listed for each):

1. Distributions
Description: Install additional distribution sets
Status: pending development

2. Documentation installation
Description: Install FreeBSD Documentation set
Status: Done. Links to "bsdinstall docsinstall"

3. Packages
Description: Install Pre-packaged Software
Status: pending development

4. Password
Description: Set Root Password
Status: pending development

5. Fdisk
Description: Fdisk Partition Editor
Status: pending development
Note: Could be linked directly to sade(8)

6. Disklabel
Description: Disk Label Editor
Status: pending development
Note: Could be linked directly to sade(8)

7. Login/Group Management
Description: Add user's login and group information
Status: Done (by Ron McDowell)

8. Console
Description: Console Settings
Status: pending development

9. Timezone
Description: Set up Time Zone
Status: Done (by Devin Teske; me)

NOTE: Functionality shamelessly ripped from my ports addition: sysutils/tzdialog

10. Media Selection
Description: Select Media to Install From
Status: pending development

11. Mouse
Description: Configure the Mouse
Status: pending development

12. Networking Management
Description: Setup Networking interfaces, services, etc.
Status: Done (by Devin Teske; me)

NOTE: Functionality shamelessly ripped from my ports addition: 
sysutils/host-setup

13. Security
Description: Set Security Parameters
Status: pending development

14. Startup
Description: Set Startup Parameters
Status: pending development

15. Ttys
Description: Configure Ttys
Status: pending development

I am currently working on the framework some more and then I'm going to jump 
over to working on #14 "Startup".

As you can see from the above list, we have quite a bit of functionality to
migrate from sysinstall(8) over to bsdconfig(8). However, the most difficult
bits (user management, network management, and timezone) have all been done, so
the rest should fall like a house of cards, especially since we have really
nice modular includes making the modules nice and light-weight.
-- 
Devin

[releng_9 tinderbox] failure on ia64/ia64

2012-02-14 Thread FreeBSD Tinderbox

TB --- 2012-02-14 15:28:02 - tinderbox 2.9 running on freebsd-stable.sentex.ca
TB --- 2012-02-14 15:28:02 - starting RELENG_9 tinderbox run for ia64/ia64
TB --- 2012-02-14 15:28:02 - cleaning the object tree
TB --- 2012-02-14 15:28:02 - cvsupping the source tree
TB --- 2012-02-14 15:28:02 - /usr/bin/csup -z -r 3 -g -L 1 -h cvsup.sentex.ca 
/tinderbox/RELENG_9/ia64/ia64/supfile
TB --- 2012-02-14 15:29:05 - building world
TB --- 2012-02-14 15:29:05 - CROSS_BUILD_TESTING=YES
TB --- 2012-02-14 15:29:05 - MAKEOBJDIRPREFIX=/obj
TB --- 2012-02-14 15:29:05 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2012-02-14 15:29:05 - SRCCONF=/dev/null
TB --- 2012-02-14 15:29:05 - TARGET=ia64
TB --- 2012-02-14 15:29:05 - TARGET_ARCH=ia64
TB --- 2012-02-14 15:29:05 - TZ=UTC
TB --- 2012-02-14 15:29:05 - __MAKE_CONF=/dev/null
TB --- 2012-02-14 15:29:05 - cd /src
TB --- 2012-02-14 15:29:05 - /usr/bin/make -B buildworld
>>> World build started on Tue Feb 14 15:29:06 UTC 2012
>>> Rebuilding the temporary build tree
>>> stage 1.1: legacy release compatibility shims
>>> stage 1.2: bootstrap tools
>>> stage 2.1: cleaning up the object tree
>>> stage 2.2: rebuilding the object tree
>>> stage 2.3: build tools
>>> stage 3: cross tools
>>> stage 4.1: building includes
>>> stage 4.2: building libraries
>>> stage 4.3: make dependencies
>>> stage 4.4: building everything
>>> World build completed on Tue Feb 14 17:15:08 UTC 2012
TB --- 2012-02-14 17:15:08 - generating LINT kernel config
TB --- 2012-02-14 17:15:08 - cd /src/sys/ia64/conf
TB --- 2012-02-14 17:15:08 - /usr/bin/make -B LINT
TB --- 2012-02-14 17:15:08 - cd /src/sys/ia64/conf
TB --- 2012-02-14 17:15:08 - /usr/sbin/config -m LINT
TB --- 2012-02-14 17:15:08 - building LINT kernel
TB --- 2012-02-14 17:15:08 - CROSS_BUILD_TESTING=YES
TB --- 2012-02-14 17:15:08 - MAKEOBJDIRPREFIX=/obj
TB --- 2012-02-14 17:15:08 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2012-02-14 17:15:08 - SRCCONF=/dev/null
TB --- 2012-02-14 17:15:08 - TARGET=ia64
TB --- 2012-02-14 17:15:08 - TARGET_ARCH=ia64
TB --- 2012-02-14 17:15:08 - TZ=UTC
TB --- 2012-02-14 17:15:08 - __MAKE_CONF=/dev/null
TB --- 2012-02-14 17:15:08 - cd /src
TB --- 2012-02-14 17:15:08 - /usr/bin/make -B buildkernel KERNCONF=LINT
>>> Kernel build for LINT started on Tue Feb 14 17:15:09 UTC 2012
>>> stage 1: configuring the kernel
>>> stage 2.1: cleaning up the object tree
>>> stage 2.2: rebuilding the object tree
>>> stage 2.3: build tools
>>> stage 3.1: making dependencies
>>> stage 3.2: building everything
[...]
/src/sys/modules/mps/../../dev/mps/mps_sas_lsi.c:329: warning: implicit declaration of function 'mpssas_find_target_by_handle'
/src/sys/modules/mps/../../dev/mps/mps_sas_lsi.c:329: warning: nested extern declaration of 'mpssas_find_target_by_handle' [-Wnested-externs]
/src/sys/modules/mps/../../dev/mps/mps_sas_lsi.c:329: warning: assignment makes pointer from integer without a cast
/src/sys/modules/mps/../../dev/mps/mps_sas_lsi.c:396: warning: implicit declaration of function 'mpssas_prepare_volume_remove'
/src/sys/modules/mps/../../dev/mps/mps_sas_lsi.c:396: warning: nested extern declaration of 'mpssas_prepare_volume_remove' [-Wnested-externs]
/src/sys/modules/mps/../../dev/mps/mps_sas_lsi.c:403: warning: assignment makes pointer from integer without a cast
/src/sys/modules/mps/../../dev/mps/mps_sas_lsi.c:469: warning: assignment makes pointer from integer without a cast
/src/sys/modules/mps/../../dev/mps/mps_sas_lsi.c:481: warning: assignment makes pointer from integer without a cast
*** Error code 1

Stop in /src/sys/modules/mps.
*** Error code 1

Stop in /src/sys/modules.
*** Error code 1

Stop in /obj/ia64.ia64/src/sys/LINT.
*** Error code 1

Stop in /src.
*** Error code 1

Stop in /src.
TB --- 2012-02-14 17:49:05 - WARNING: /usr/bin/make returned exit code  1 
TB --- 2012-02-14 17:49:05 - ERROR: failed to build LINT kernel
TB --- 2012-02-14 17:49:05 - 5814.42 user 834.07 system 8463.93 real


http://tinderbox.freebsd.org/tinderbox-releng_9-RELENG_9-ia64-ia64.full
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

RE: New BSD Installer

2012-02-14 Thread Devin Teske

> -----Original Message-----
> From: owner-freebsd-sta...@freebsd.org [mailto:owner-freebsd-
> sta...@freebsd.org] On Behalf Of Ian Smith
> Sent: Tuesday, February 14, 2012 9:15 AM
> To: Bruce Cran
> Cc: FreeBSD Stable Mailing List; Joe Holden; Alex Samorukov
> Subject: Re: New BSD Installer
> 
> On Sun, 12 Feb 2012 15:32:51 +, Bruce Cran wrote:
>  > On 2/10/2012 7:47 PM, Alex Samorukov wrote:
>  > > I am highly against reverting. Old installer is not GPT aware and in fact
>  > > is unmaintained for a very long time.
>  >
>  > That's not really correct: quite a lot of work was done on it last year.
> 
> Indeed.  Was it you working on the updated sade(8) adding GPT and ZFS?
> 
> 
> 
> I don't see it in terms of reverting.  Much other useful functionality
> of sysinstall has yet to be reimplemented.

Ron McDowell and I are working feverishly on bsdconfig(8), set to arrive in
10.0-CURRENT.

Highlights:
- It's modular
- It's easily extendable/maintained (written in sh(1))
- Its goal is to completely reimplement all missing functionality from
sysinstall(8)

However, it's still in the preliminary stages.

Discussions on bsdconfig are being held on -sysinstall@

Development work is being performed off the reservation (using SourceForge CVS
server) until we can agree on the structure prior to import to the base of HEAD
SVN tree.

Despite being preliminary code, there are currently 8529 lines of code.

I won't be posting links to the preliminary code (it's still preliminary) for
fear of getting too much feedback too early in the game (but if you're
interested, you can crawl the recent posts to -sysinstall@ and glean the links
both from Ron and myself).

>  Sure, I know, send code ..
> but it's not only the functionality lost, but the ability for new users
> to accomplish a good deal of initial server setup before they're skilled
> enough to do it all from the command line, which is where I was in '98.
> 

bsdconfig(8) will fill this gap as sysinstall(8) did in the past.

The current plan moving forward is:

1. RELENG_9 will continue to offer both sysinstall and bsdinstall in the
installed base

2. RELENG_10 will drop sysinstall(8) but bring in bsdconfig(8)

This much has been agreed upon in the discussions involving many.

> I also think much of the sometimes gratuitous deprecation of sysinstall
> is unwarranted.

Yes, it has been acknowledged by many that the scheduled deprecation is
aggressive.

>  I've used sysinstall post-installation regularly since
> '98 on 2.2.6 through 3.3, 4.4-10, 5.-5, 6.1, 7.0-4 and 8.0-2.  Since one
> small disaster on 3.3 about 12 years ago (installing to the wrong slice)
> I've had no major issues with it, mostly partitioning all sorts of disks
> but also browsing and adding useful packages at installation.
> 

When bsdconfig(8) reaches a usable state (is entered into HEAD), we encourage
you to be an avid tester in the early stages to make sure we "get it right" with
respect to replication of sysinstall(8) features.

bsdconfig(8) should work fine on RELENG_9 just as on 10.0-CURRENT.

> Strangely, the big push to GPT partitions was oft said to be because MBR
> slices provided too few partitions. 

That's part of it (no pun intended).

The other big deal is that you can't exceed 2TB on a single primary partition.
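For anyone wondering where the 2TB figure comes from, here is a rough worked example. It assumes the usual MBR layout, in which each partition entry stores its sector count in a 32-bit field, and the traditional 512-byte sector size; the function name is only illustrative.

```python
# Sketch of the MBR size limit, assuming a 32-bit sector count per
# partition entry and 512-byte sectors (both assumptions, see above).
SECTOR_SIZE = 512            # bytes per sector (traditional value)
MAX_SECTORS = 2**32 - 1      # largest value a 32-bit sector count can hold


def max_mbr_partition_bytes(sector_size=SECTOR_SIZE):
    """Return the largest partition size, in bytes, that fits in an MBR entry."""
    return MAX_SECTORS * sector_size


limit = max_mbr_partition_bytes()
print(limit)             # size limit in bytes
print(limit / 2**40)     # the same limit expressed in TiB, just under 2
```

With 4096-byte sectors the same field would cover eight times as much, which is why the limit is usually quoted together with the sector size.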

> I never found 4 * 6 much of a limit
> myself, and now the default install makes a Linux-like single partition,
> rendering dump & restore more or less unusable or at least impractical,

I'm with you on this one. I really don't like the single-"/" setup.

> while booting multiple systems on GPT also seems to require Linux tools.
> 
> I don't know whether this move away from BSD traditional filesystem
> partitioning (/, /var, /usr etc) to Linux-style came down from Core On
> High or is just the prerogative of installer-writers?  Jordan was both
> the latter and a big part of the former for many years, but I guess
> that's something that can be reverted if people feel to do so.
> 

Maybe a vote should be taken. There are about 12 votes in this office alone
for putting the partition scheme back the way it was (Colin Percival had a great
formula for determining partition sizes).

> I expect most developers run mostly the latest gear, and nowadays tend
> to use vbox images a good deal, but there will be many laptops and other
> systems using MBR slices and bsdlabel partitions for years to come, and
> I'd hate to see FreeBSD's longterm support for _slightly_ older hardware
> disappear, just because of having added better support for latest kit.
> 

Others will point out that if you try hard enough, you can create the old-style
MBR partitions with RELENG_9 (note: some minor bugs were documented in
9.0-RELEASE; the next release will not suffer these fallbacks).

> I for one will be screwed if sade, fdisk and bsdlabel disappear, as the
> release notes for 9 seem to indicate may be imminently on the cards.
> 

I too would be sad if those disappear. However, I do think

Re: CARP carpdev

2012-02-14 Thread Freddie Cash

On Tue, Feb 14, 2012 at 8:56 AM, Hugo Silva  wrote:
> Looks like there's been conversations about porting this to FreeBSD since at
> least 2007.
>
> Are there any plans to have ifconfig carpdev available in 9.0-STABLE?

CARP support has been redone in 10-CURRENT, removing the whole "carp0"
pseudo-interface support, and just enabling the CARP protocol on the
existing network interfaces. This includes the equivalent of "carpdev"
support.

Search the -current archives for more information, CFT, and so on.

I don't recall seeing anything about specific plans to MFC to
stable/9, but could be mis-remembering things.

-- 
Freddie Cash
fjwc...@gmail.com

Re: New BSD Installer

2012-02-14 Thread Lars Engels

On Wed, Feb 15, 2012 at 04:15:17AM +1100, Ian Smith wrote:
> On Sun, 12 Feb 2012 15:32:51 +, Bruce Cran wrote:
>  > On 2/10/2012 7:47 PM, Alex Samorukov wrote:
>  > > I am highly against reverting. Old installer is not GPT aware and in fact
>  > > is unmaintained for a very long time.
>  > 
>  > That's not really correct: quite a lot of work was done on it last year.
> 
> Indeed.  Was it you working on the updated sade(8) adding GPT and ZFS?
> 
> 
> 
> I don't see it in terms of reverting.  Much other useful functionality 
> of sysinstall has yet to be reimplemented. 

What exactly are you missing?
There's sysutils/host-setup to configure your system like sysinstall
did.
There's sade and I am working on a tool to browse and add packages from
the installation media and / or the ftp mirrors.



Re: problems with AHCI on FreeBSD 8.2

2012-02-14 Thread Harald Schmalzbauer

Jeremy Chadwick wrote on 14.02.2012 17:50 (localtime):
> On Tue, Feb 14, 2012 at 04:55:10PM +0100, Claudius Herder wrote:
>> Hello,
>>
>> I have got a quite similar problem with AHCI on FreeBSD 8.2 and it still
>> persists on FreeBSD 9.0 release.
>>
>> Switching from ahci to ataahci resolved the problem for me too.
>>
>> I'm using gmirror for swap, system is on a zpool and the problem first
>> occurred during a zpool scrub, but it is easily reproducible with dd.
>>
>> The timeouts only occur when writing to disks, dd if=/dev/ada{0|1}
>> of=/dev/null is not an issue.
>> Sometimes I need to power off the server because after a reboot one disk
>> is still missing.
>>
>> I really would like to help in this issue, so let me know if you need
>> any more information.
> I find it interesting that, at least so far, the only people reporting
> problems of this type with the ahci.ko driver are people using Samsung
> disks.  The only difference is that your models are F1s while the OPs
> are F2s.

I saw such timeouts long ago; mav@ had a look at my postings and
mentioned it could be an NCQ problem.
I suspected the disks' firmware.
I never tracked it down further, because replacing the Samsung (F3
in that case) disks with Hitachi ones solved all my problems and gave a
big performance kick as well (with ZFS).
You can find the discussion here:
http://lists.freebsd.org/pipermail/freebsd-stable/2010-February/055374.html

JFI

-Harry




Re: LSI supported mps(4) driver in stable/9 and stable/8

2012-02-14 Thread Kenneth D. Merry

On Mon, Feb 13, 2012 at 15:08:45 +0100, Ollivier Robert wrote:
> According to Kenneth D. Merry:
> > The LSI-supported version of the mps(4) driver that supports their 6Gb SAS
> > HBAs as well as WarpDrive controllers, is now in stable/9 and stable/8.
> 
> Thanks.
>  
> > Note that the CAM infrastructure changes that went into FreeBSD/head along
> > with this driver have not gone into either stable/9 or stable/8.  Only the
> > driver itself has been merged.
> > 
> > The CAM infrastructure changes depend on some other da(4) driver changes
> > that will need to get merged before they can go back.  If that merge
> > happens, it will probably only be into stable/9.
> 
> Got an ETA for this?  Saying differently, is it reasonable to run stable/9 
> with the new driver but w/o the CAM changes?  What do these changes bring 
> BTW?  Sorry, been out-of-touch these days :(
> 

No ETA for the CAM changes.  I need to talk with Alexander Motin about it,
and I haven't gotten around to that.  Too busy with other things.

The changes just allow the driver to get notification from CAM about read
capacity data instead of having the driver probe by itself.  The probe in
the driver for stable is kludgy, but does work.

So it is perfectly fine to run the driver in stable/9 or stable/8 without
the CAM changes.

The latest mps(4) driver changes have been merged into stable/9 and
stable/8, so this would be a good time to try it out.

Ken
-- 
Kenneth Merry
k...@freebsd.org

Re: New BSD Installer

2012-02-14 Thread Ian Smith

On Sun, 12 Feb 2012 15:32:51 +, Bruce Cran wrote:
 > On 2/10/2012 7:47 PM, Alex Samorukov wrote:
 > > I am highly against reverting. Old installer is not GPT aware and in fact
 > > is unmaintained for a very long time.
 > 
 > That's not really correct: quite a lot of work was done on it last year.

Indeed.  Was it you working on the updated sade(8) adding GPT and ZFS?

I don't see it in terms of reverting.  Much other useful functionality 
of sysinstall has yet to be reimplemented.  Sure, I know, send code .. 
but it's not only the functionality lost, but the ability for new users 
to accomplish a good deal of initial server setup before they're skilled 
enough to do it all from the command line, which is where I was in '98.

I also think much of the sometimes gratuitous deprecation of sysinstall 
is unwarranted.  I've used sysinstall post-installation regularly since 
'98 on 2.2.6 through 3.3, 4.4-10, 5.-5, 6.1, 7.0-4 and 8.0-2.  Since one 
small disaster on 3.3 about 12 years ago (installing to the wrong slice)
I've had no major issues with it, mostly partitioning all sorts of disks 
but also browsing and adding useful packages at installation.

Strangely, the big push to GPT partitions was oft said to be because MBR 
slices provided too few partitions.  I never found 4 * 6 much of a limit 
myself, and now the default install makes a Linux-like single partition, 
rendering dump & restore more or less unusable or at least impractical, 
while booting multiple systems on GPT also seems to require Linux tools.

I don't know whether this move away from BSD traditional filesystem 
partitioning (/, /var, /usr etc) to Linux-style came down from Core On 
High or is just the prerogative of installer-writers?  Jordan was both 
the latter and a big part of the former for many years, but I guess 
that's something that can be reverted if people feel to do so.

I expect most developers run mostly the latest gear, and nowadays tend 
to use vbox images a good deal, but there will be many laptops and other 
systems using MBR slices and bsdlabel partitions for years to come, and 
I'd hate to see FreeBSD's longterm support for _slightly_ older hardware 
disappear, just because of having added better support for latest kit.

I for one will be screwed if sade, fdisk and bsdlabel disappear, as the 
release notes for 9 seem to indicate may be imminently on the cards.

cheers, Ian

CARP carpdev

2012-02-14 Thread Hugo Silva

Looks like there's been conversations about porting this to FreeBSD 
since at least 2007.


Are there any plans to have ifconfig carpdev available in 9.0-STABLE?

Complete hang on 9.0-RELEASE

2012-02-14 Thread Arnaud Lacombe

Hi folks,

For the record, I was running some tests yesterday on top of a
9.0-RELEASE amd64 kernel when the box hung. At the time of the
hang, the box was running a process with about 2800 threads with heavy
IPC between 1400 writers and 1400 readers. The box was in single user
mode (/bin/sh coming from FreeBSD 7.4-STABLE). Here is the beginning
of the dmesg:

Copyright (c) 1992-2012 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 9.0-RELEASE #0: Tue Jan  3 07:46:30 UTC 2012
r...@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64
CPU: Intel(R) Atom(TM) CPU D510   @ 1.66GHz (1666.70-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x106ca  Family = 6  Model = 1c  Stepping = 10
  Features=0xbfebfbff
  Features2=0x40e31d
  AMD Features=0x2800
  AMD Features2=0x1
  TSC: P-state invariant, performance statistics
real memory  = 2137587712 (2038 MB)
avail memory = 2037841920 (1943 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table: <070611 APIC1125>
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s) x 2 HTT threads
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP/HT): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP/HT): APIC ID:  3

I will restart the test and see if this happens again.

regards,
 - Arnaud

Re: problems with AHCI on FreeBSD 8.2

2012-02-14 Thread Jeremy Chadwick

On Tue, Feb 14, 2012 at 04:55:10PM +0100, Claudius Herder wrote:
> 
> Hello,
> 
> I have got a quite similar problem with AHCI on FreeBSD 8.2 and it still
> persists on FreeBSD 9.0 release.
> 
> Switching from ahci to ataahci resolved the problem for me too.
> 
> I'm using gmirror for swap, system is on a zpool and the problem first
> occurred during a zpool scrub, but it is easily reproducible with dd.
> 
> The timeouts only occur when writing to disks, dd if=/dev/ada{0|1}
> of=/dev/null is not an issue.
> Sometimes I need to power off the server because after a reboot one disk
> is still missing.
> 
> I really would like to help in this issue, so let me know if you need
> any more information.

I find it interesting that, at least so far, the only people reporting
problems of this type with the ahci.ko driver are people using Samsung
disks.  The only difference is that your models are F1s while the OP's
are F2s.

The only difference I can think of is that the ahci.ko driver may have
more strict timeouts than the ata driver (ata driver includes ataahci;
ataahci.ko != ahci.ko, as you know).

You may be able to adjust these using loader.conf variables:

kern.cam.ada.default_timeout
kern.cam.ada.retry_count

I also imagine that hint.ahci.X.ccc might have some involvement here,
but it's something I am not familiar with.  mav@ would need to comment
on this -- it's outside of my familiarity scope.

Furthermore, in your case, your ada1 disk has serious CRC-related
problems, and your ada0 disk has seen similar just at a much lower rate.
ada1 should probably be replaced (along with cables, dusting out SATA
ports, etc.), but keeping ada0 is probably fine.  The statistics for
these are shown in the "smartctl -l sataphy" output, field labelled ID
0x0001, "Command failed due to ICRC error".  These are SATA-level
problems or physical problems which will manifest themselves as
anomalies during any kind of I/O.

The counters shown in ID 0x000a and 0x0009 are completely fine; these
don't indicate any problems.
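To make the counter check repeatable, a small parser along these lines could pull the ICRC counter (ID 0x0001) out of "smartctl -l sataphy" output. This is only a sketch: the sample text is modelled on the output quoted in this thread, the column spacing is an assumption, and parse_sataphy and icrc_errors are made-up helper names, not part of smartmontools.

```python
import re

# Sample modelled on the quoted "smartctl -l sataphy" output; the exact
# column spacing is an assumption.
SAMPLE = """\
SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x000a  2          155  Device-to-host register FISes sent due to a COMRESET
0x0001  2       65535+  Command failed due to ICRC error
0x0009  2          178  Transition from drive PhyRdy to drive PhyNRdy
"""

# ID, size, value (with an optional "+" when the counter has saturated),
# then the free-text description.
ROW = re.compile(r"^(0x[0-9a-fA-F]{4})\s+(\d+)\s+(\d+)(\+?)\s+(.*)$")


def parse_sataphy(text):
    """Map counter ID -> (value, saturated, description)."""
    counters = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            ident, _size, value, plus, desc = m.groups()
            counters[int(ident, 16)] = (int(value), bool(plus), desc)
    return counters


def icrc_errors(text):
    """Return (count, saturated) for counter 0x0001, or None if it is absent."""
    entry = parse_sataphy(text).get(0x0001)
    return None if entry is None else entry[:2]


print(icrc_errors(SAMPLE))
```

A saturated counter (the trailing "+") only says the real number is at least the value shown, so comparing two runs is more useful than reading a single one.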

Your drives don't support GP log region 0x04, which is why "smartctl -l
devstat" returns the errors it does.  The errors you see coming from the
kernel in this situation are 100% okay/acceptable; the drive itself is
literally returning ABRT status to the inquiry submitted to it.  Different
drives from different vendors behave differently in this regard.

So, what I'm trying to say is, your problem looks different from the
OP's.  Let's not start a big "I have this problem too" thread; that has
happened so many times over the years that when it happens I immediately
bow out + stop participating in the thread.

> smartctl -l sataphy /dev/ada0
> 
> SATA Phy Event Counters (GP Log 0x11)
> ID      Size     Value  Description
> 0x000a  2          150  Device-to-host register FISes sent due to a COMRESET
> 0x0001  2            3  Command failed due to ICRC error
> 0x0009  2          173  Transition from drive PhyRdy to drive PhyNRdy
> 
> smartctl -l sataphy /dev/ada1
> 
> SATA Phy Event Counters (GP Log 0x11)
> ID      Size     Value  Description
> 0x000a  2          155  Device-to-host register FISes sent due to a COMRESET
> 0x0001  2       65535+  Command failed due to ICRC error
> 0x0009  2          178  Transition from drive PhyRdy to drive PhyNRdy

-- 
| Jeremy Chadwick j...@parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |


Re: Custom kernel poll summary

2012-02-14 Thread Julian Elischer

On 2/14/12 7:43 AM, Ian Smith wrote:

On Tue, 14 Feb 2012 2:37:55 +0100, Alexander Leidinger wrote:
  >  Here is what I got, the first column is the number of requests, the second
  >  what is requested, and the 3rd my comments (basically it means, if there is a
  >  comment, it is not needed/possible to include in a modular kernel):
  >  ---snip---

[..]

  >  1 IPFIREWALL_FORWARD         ->  performance impact too big if unused (julian)

Well, it's not that big, but you will be running extra code for every
packet whether or not you want it.
When I made it an option I was mainly trying to placate the "just
say no" crowd.
I personally wouldn't mind having it on by default in GENERIC, as
long as we still make it an option so people who want every last drop
of CPU can remove it.

I expect Julian will object if I've mis-paraphrased or over-simplified
something I recall him saying at least a couple of years ago :)

[..]

  >  4 ALTQ*  ->  does add code to the pf module
  > other impact?

ipfw(8) can also apply ALTQ tags, but relies on pfctl(8) to setup the
queues - or so I read; I've not used it here.  From altq(4):

  ALTQ        Enable ALTQ.
  ALTQ_CBQ    Build the ``Class Based Queuing'' discipline.
  ALTQ_RED    Build the ``Random Early Detection'' extension.
  ALTQ_RIO    Build ``Random Early Drop'' for input and output.
  ALTQ_HFSC   Build the ``Hierarchical Packet Scheduler'' discipline.
  ALTQ_CDNR   Build the traffic conditioner.  This option is meaningless at
              the moment as the conditioner is not used by any of the
              available disciplines or consumers.
  ALTQ_PRIQ   Build the ``Priority Queuing'' discipline.
  ALTQ_NOPCC  Required if the TSC is unusable.
  ALTQ_DEBUG  Enable additional debugging facilities.

  Note that ALTQ-disciplines cannot be loaded as kernel modules.  In order
  to use a certain discipline you have to build it into a custom kernel.
  The pf(4) interface, that is required for the configuration process of
  ALTQ can be loaded as a module.

So which disciplines would one choose?  Seeming an unlikely candidate?

  >  1 IPSTEALTH  ->  changes ipfw module only?

I don't think this is specific to ipfw.  From /sys/conf/NOTES:

# IPSTEALTH enables code to support stealth forwarding (i.e., forwarding
# packets without touching the TTL).  This can be useful to hide firewalls
# from traceroute and similar tools.

But can it be disabled once added to kernel?  It's no good as a default.

  >  1 IPFIREWALL_VERBOSE_LIMIT=5 ->  changes ipfw module only?
  > loader tunable?
  >  1 IPFIREWALL_VERBOSE ->  changes ipfw module only?
  > loader tunable?

sysctl.conf: net.inet.ip.fw.verbose and net.inet.ip.fw.verbose_limit

cheers, Ian


Re: Why won't 8.2 umount -f?

2012-02-14 Thread Rick Macklem

Doug Barton wrote:
> On 02/13/2012 19:13, Rick Macklem wrote:
> > I just looked and at least some of the fixes were MFC'd to stable/8
> > about 8 months ago. So, they aren't in 8.2, but will be in 8.3.
> 
> Well 8.3 is about to enter code freeze, any way we can check to be
> sure
> all of the relevant fixes can be mfc'ed?
> 
I took a look and they seem to have been MFC'd.

rick

> 
> Doug
> 
> --
> 
> It's always a long day; 86400 doesn't fit into a short.
> 
> Breadth of IT experience, and depth of knowledge in the DNS.
> Yours for the right price. :) http://SupersetSolutions.com/

Re: sysutils/pftop on 9.x+

2012-02-14 Thread Florian Smeets

On 14.02.12 17:14, Fabian Keil wrote:
> Greg Rivers  wrote:
> 
>> sysutils/pftop was marked broken on 9.x and above last March[1].  Are 
>> there any plans to fix it soon?  It's a really handy utility.
>>
>> [1] 
>> http://www.freebsd.org/cgi/cvsweb.cgi/ports/sysutils/pftop/Makefile?rev=1.17
> 
> Please have a look at:
> http://www.freebsd.org/cgi/query-pr.cgi?pr=155938
> 
> Note that the currently working fix is in the audit trail,
> the original fix stopped working after the PF update.

The PR was closed by mistake, I'll take care of it.

Florian




Re: sysutils/pftop on 9.x+

2012-02-14 Thread Fabian Keil

Greg Rivers  wrote:

> sysutils/pftop was marked broken on 9.x and above last March[1].  Are 
> there any plans to fix it soon?  It's a really handy utility.
> 
> [1] 
> http://www.freebsd.org/cgi/cvsweb.cgi/ports/sysutils/pftop/Makefile?rev=1.17

Please have a look at:
http://www.freebsd.org/cgi/query-pr.cgi?pr=155938

Note that the currently working fix is in the audit trail,
the original fix stopped working after the PF update.

Fabian



Re: Custom kernel poll summary (was: Re: Reducing the need to compile a custom kernel)

2012-02-14 Thread Freddie Cash

On Tue, Feb 14, 2012 at 7:43 AM, Ian Smith  wrote:
> On Tue, 14 Feb 2012 2:37:55 +0100, Alexander Leidinger wrote:
>  > 1 IPSTEALTH                      -> changes ipfw module only?
>
> I don't think this is specific to ipfw.  From /sys/conf/NOTES:
>
> # IPSTEALTH enables code to support stealth forwarding (i.e., forwarding
> # packets without touching the TTL).  This can be useful to hide firewalls
> # from traceroute and similar tools.
>
> But can it be disabled once added to kernel?  It's no good as a default.

It's controllable via sysctl once it's compiled into the kernel.  If
it's not compiled into the kernel, then the sysctl doesn't exist.

>  > 1 IPFIREWALL_VERBOSE_LIMIT=5     -> changes ipfw module only?
>  >                                    loader tunable?

This is controllable via sysctl.  Not sure if it needs to be compiled
into the kernel before it's controllable via sysctl, though.  We have it
compiled into all our firewall kernels (with a default of 1000), then
change it via sysctl when needed.

>  > 1 IPFIREWALL_VERBOSE             -> changes ipfw module only?
>  >                                    loader tunable?
>
> sysctl.conf: net.inet.ip.fw.verbose and net.inet.ip.fw.verbose_limit

Ah, you list the sysctls that control the last two.  :)

-- 
Freddie Cash
fjwc...@gmail.com

Re: problems with AHCI on FreeBSD 8.2

2012-02-14 Thread Claudius Herder


Hello,

I have got a quite similar problem with AHCI on FreeBSD 8.2 and it still
persists on FreeBSD 9.0 release.

Switching from ahci to ataahci resolved the problem for me too.

I'm using gmirror for swap, system is on a zpool and the problem first
occurred during a zpool scrub, but it is easily reproducible with dd.

The timeouts only occur when writing to disks, dd if=/dev/ada{0|1}
of=/dev/null is not an issue.
Sometimes I need to power off the server because after a reboot one disk
is still missing.

I really would like to help in this issue, so let me know if you need
any more information.

--
Claudius

dmesg:
--cut--
Jan 14 01:33:57 server kernel: ahcich0: Timeout on slot 7 port 0
Jan 14 01:33:57 server kernel: ahcich0: is  cs 0080 ss
 rs 0080 tfd c0 serr  cmd 0004c717
Jan 14 01:33:57 server kernel: ahcich1: Timeout on slot 31 port 0
Jan 14 01:33:57 server kernel: ahcich1: is  cs 8000 ss
 rs 8000 tfd c0 serr  cmd 0004df17
Jan 14 01:33:57 server kernel: ahcich0: Timeout on slot 7 port 0
Jan 14 01:33:57 server kernel: ahcich0: is  cs f800 ss
ff80 rs ff80 tfd c0 serr  cmd 0004cb17
Jan 14 01:33:57 server kernel: ahcich1: Timeout on slot 31 port 0
Jan 14 01:33:57 server kernel: ahcich1: is  cs 00f8 ss
80ff rs 80ff tfd c0 serr  cmd 0004c317
Jan 14 01:33:57 server kernel: ahcich0: Timeout on slot 23 port 0
Jan 14 01:33:57 server kernel: ahcich0: is  cs 0180 ss
 rs 0180 tfd c0 serr  cmd 0004d717
Jan 14 01:33:57 server kernel: ahcich1: Timeout on slot 15 port 0
Jan 14 01:33:57 server kernel: ahcich1: is  cs 00018000 ss
 rs 00018000 tfd c0 serr  cmd 0004cf17
Jan 14 01:33:57 server kernel: ahcich1: Timeout on slot 17 port 0
Jan 14 01:33:57 server kernel: ahcich1: is  cs 01f8 ss
01fe rs 01fe tfd c0 serr  cmd 0004d317
Jan 14 01:33:57 server kernel: ahcich0: AHCI reset: device not ready
after 31000ms (tfd = 0080)
Jan 14 01:33:57 server kernel: ahcich1: Timeout on slot 31 port 0
Jan 14 01:33:57 server kernel: ahcich1: is  cs 8000 ss
 rs 8000 tfd c0 serr  cmd 0004df17
Jan 14 01:33:57 server kernel: ahcich0: Timeout on slot 24 port 0
--cut--
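
When comparing runs, it can help to count the timeouts per AHCI channel rather than eyeballing the log. The following is a minimal sketch, assuming the "ahcichN: Timeout on slot S port P" message format seen in the excerpt above; the sample lines are abbreviated from it and count_timeouts is a hypothetical helper name.

```python
import re
from collections import Counter

# A few lines abbreviated from the dmesg excerpt above.
SAMPLE_LOG = """\
Jan 14 01:33:57 server kernel: ahcich0: Timeout on slot 7 port 0
Jan 14 01:33:57 server kernel: ahcich1: Timeout on slot 31 port 0
Jan 14 01:33:57 server kernel: ahcich0: Timeout on slot 7 port 0
Jan 14 01:33:57 server kernel: ahcich0: AHCI reset: device not ready
"""

# Channel name, slot and port of each timeout message.
TIMEOUT = re.compile(r"\b(ahcich\d+): Timeout on slot (\d+) port (\d+)")


def count_timeouts(log_text):
    """Count 'Timeout on slot' messages per AHCI channel."""
    return Counter(m.group(1) for m in TIMEOUT.finditer(log_text))


print(count_timeouts(SAMPLE_LOG))
```

Feeding it the contents of /var/log/messages before and after a dd run would show whether one channel is affected more than the other.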

smartctl -a /dev/ada0
smartctl 5.42 2011-10-20 r3458 [FreeBSD 9.0-RELEASE amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint F1 DT
Device Model:     SAMSUNG HD753LJ
Serial Number:    S13UJDWS900110
LU WWN Device Id: 5 0024e9 0020d1bfa
Firmware Version: 1AA01118
User Capacity:    750,156,374,016 bytes [750 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 3b
Local Time is:    Tue Feb 14 16:32:58 2012 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                ( 9429) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 158) minutes.
Conveyance self-test routine
recommended polling time:        (  17) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Error Recovery Control supported.

Re: Custom kernel poll summary (was: Re: Reducing the need to compile a custom kernel)

2012-02-14 Thread Ian Smith

On Tue, 14 Feb 2012 2:37:55 +0100, Alexander Leidinger wrote:
 > Here is what I got, the first column is the number of requests, the second
 > what is requested, and the 3rd my comments (basically it means, if there is a
 > comment, it is not needed/possible to include in a modular kernel):
 > ---snip---

[..]

 > 1 IPFIREWALL_FORWARD-> performance impact too big if unused 
 > (julian)

I expect Julian will object if I've mis-paraphrased or over-simplified 
something I recall him saying at least a couple of years ago :)

[..]

 > 4 ALTQ*  -> does add code to the pf module
 >other impact?

ipfw(8) can also apply ALTQ tags, but relies on pfctl(8) to setup the 
queues - or so I read; I've not used it here.  From altq(4):

 ALTQ        Enable ALTQ.
 ALTQ_CBQ    Build the ``Class Based Queuing'' discipline.
 ALTQ_RED    Build the ``Random Early Detection'' extension.
 ALTQ_RIO    Build ``Random Early Drop'' for input and output.
 ALTQ_HFSC   Build the ``Hierarchical Packet Scheduler'' discipline.
 ALTQ_CDNR   Build the traffic conditioner.  This option is meaningless at
             the moment as the conditioner is not used by any of the
             available disciplines or consumers.
 ALTQ_PRIQ   Build the ``Priority Queuing'' discipline.
 ALTQ_NOPCC  Required if the TSC is unusable.
 ALTQ_DEBUG  Enable additional debugging facilities.

 Note that ALTQ-disciplines cannot be loaded as kernel modules.  In order
 to use a certain discipline you have to build it into a custom kernel.
 The pf(4) interface, that is required for the configuration process of
 ALTQ can be loaded as a module.

So which disciplines would one choose?  Seeming an unlikely candidate?

 > 1 IPSTEALTH  -> changes ipfw module only?

I don't think this is specific to ipfw.  From /sys/conf/NOTES:

# IPSTEALTH enables code to support stealth forwarding (i.e., forwarding
# packets without touching the TTL).  This can be useful to hide firewalls
# from traceroute and similar tools.

But can it be disabled once added to kernel?  It's no good as a default.

 > 1 IPFIREWALL_VERBOSE_LIMIT=5 -> changes ipfw module only?
 >                                 loader tunable?
 > 1 IPFIREWALL_VERBOSE         -> changes ipfw module only?
 >                                 loader tunable?

sysctl.conf: net.inet.ip.fw.verbose and net.inet.ip.fw.verbose_limit
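
A minimal sketch of the matching /etc/sysctl.conf entries, assuming the same values the two kernel options above would have set (logging on, limit of 5); adjust to taste:

```sh
# /etc/sysctl.conf (sketch; values are assumptions mirroring
# IPFIREWALL_VERBOSE and IPFIREWALL_VERBOSE_LIMIT=5)
net.inet.ip.fw.verbose=1
net.inet.ip.fw.verbose_limit=5
```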

cheers, Ian
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: problems with AHCI on FreeBSD 8.2

2012-02-14 Thread Victor Balada Diaz

On Tue, Feb 14, 2012 at 06:16:01AM -0800, Jeremy Chadwick wrote:

[..]

> 
> Thanks.  Both your drives look overall fine, sort-of.  I'll outline my
> concern points, and ask for some more info:
> 
> * ada0 has 28 CRC errors, while ada1 has 2.  These drives have been in
> use for 4688 hours and 4583 hours (respectively), which is roughly 6
> months for each drive.  CRC errors usually result in transparent
> retransmits, but this can sometimes cause I/O delays (especially if the
> CRC errors are repeated).
> 
> If the timeout messages recur in the future, please run the commands I
> gave you above once more and provide the output.  I can then compare the
> old to the new and see if there is anything of interest.

I can force the error any time I want. It's 100% reproducible in my environment,
so I'll do the tests and send you the smartctl -a output again.

> 
> * Both drives had 2 long tests run on them a few days ago ("Extended
> offline" tests).  Did you induce these manually?  If so, were these
> tests running at the time you witnessed AHCI timeout errors on ada0?
> Short, long, and selective surface scan tests are supposed to be
> non-intrusive, but given the nature of the tests sometimes they can
> stall the I/O subsystem.

I ran the tests, but they were not running during the timeout problems.
The only thing running on the disks was a newfs -J on a gjournal partition.
Otherwise, the disks are mostly idle.

> 
> If you do tests of this nature, you should write down the exact
> dates/times when you ran them (at least from now on).
> 
> If you didn't induce these, something must have, or possibly the drive
> itself did it (and if that's the case, convenient that it induces an
> entry in the self-test log!).
> 
> I do have some familiarity with drives doing internal tests -- the best
> example are old IBM Deskstar drives executing ADM on their own,
> resulting in the drives spinning down and performing internal tests,
> which would subsequently be interrupted by ATA I/O, drive spins back up,
> etc. -- but took too long resulting in ATA timeouts on FreeBSD and
> Linux.  I mailed IBM about this back in 2000 and got confirmation of the
> feature (which was also on their SCSI drives but defaulted to off); the
> feature was mysteriously removed in future drive models and still
> remains gone today:
> 
> http://jdc.parodius.com/freebsd/ibm_email_aware_of_adm.txt
> 
> I'm not saying your drives do this.  I'm simply saying that if there is
> some form of automated test that runs on these drives which is
> transparent to the underlying ATA layer, then there is really nothing
> you can do about it, and timeouts are possible.  The IBM ADM issue was
> only discovered after reviewing technical specifications/documentation
> and compared to their SCSI drives.

That's of course possible, but as the problem is 100% reproducible with
the AHCI driver and not with the ata driver, I guess this time it is not
the drive's fault.

We've also tested replacement disks and cables during the previous days. I
guess the problem is some bad interaction with the AHCI driver.

> 
> * Samsung has a notoriously bad reputation for firmware reliability on
> their SpinPoint drives, but I haven't read of anything bad about the F2
> series, just the F1, F3, and F4 models.  I have very little (almost
> none) experience with these drives.  I'm not boycotting their products,
> but I wouldn't be surprised if the timeout errors you saw were caused by
> something internal the drive was doing.  There is absolutely zero
> visibility into this kind of problem on any layer (even if you had an
> ATA protocol analyser hooked up); you're completely at the mercy of the
> firmware.  Just something to keep in mind when working with ANY kind of
> disk (MHDD, SSD, etc.).

I've seen reports on the FreeBSD lists and the smartmontools wiki about firmware
problems with F4 drives manufactured before December 2010, but according to
Samsung's web page, these drives are not affected. I hope
we're not hitting a new bug. More info:

http://sourceforge.net/apps/trac/smartmontools/wiki/SamsungF4EGBadBlocks

> 
> All that said, could you please provide output from the following
> commands as well?  These may return "not supported" errors, which is
> acceptable, but we have to check.
> 
> * smartctl -l devstat /dev/ada0
> * smartctl -l sataphy /dev/ada0
> * smartctl -l devstat /dev/ada1
> * smartctl -l sataphy /dev/ada1
> 

Thanks a lot for your help, Jeremy. Attached is the output of the commands:

fe09# smartctl -l devstat /dev/ada0
smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.2-STABLE amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

(pass0:ahcich0:0:0:0): READ_LOG_EXT. ACB: 2f 00 04 00 00 40 00 00 00 00 01 00
(pass0:ahcich0:0:0:0): CAM status: ATA Status Error
(pass0:ahcich0:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 04 (ABRT )
(pass0:ahcich0:0:0:0): RES: 51 04 04 00 00 40 00 00 00 01 00
ATA_READ_LOG_EXT (addr=0x04:0x00, page=0, n=1) failed: Unknown

Re: siisch1: Error while READ LOG EXT

2012-02-14 Thread Jeremy Chadwick

On Tue, Feb 14, 2012 at 09:30:29AM -0500, Mike Tancsa wrote:
> On 2/10/2012 8:43 PM, Mike Tancsa wrote:
> > On 2/10/2012 8:27 PM, Jeremy Chadwick wrote:
> >> Mike,
> >>
> >> I wanted to make you aware of this commit that just came through:
> >>
> >> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cam/ata/ata_da.c
> > 
> > Thanks, I did see that.  I was going to wait until Monday to csup up
> > once all the weekend level zeros are done.  The prior kernels from Nov
> > 28th never saw these READ LOG EXT errors on either of these 2 big zfs boxes
> 
> 
> So far so good. Unfortunately, I had to make 2 changes to the box
> showing the problem the most. I changed the cable (the new one does seem
> to fit more snug) as well as updated the code.  I haven't done many level
> 0 dumps to it (the real test will be the weekend), but so far so good.
> On the other box that did show the same READ LOG EXT error, I also
> updated the kernel, but made no hardware changes. It too has not yet
> shown any errors since the upgrade.
> 
> I changed the cable at 8am local time yesterday, and I take snapshots of
> smartctl at 5am

Cool.

> I did see this error increase in 24hrs, but that was on a disk that was
> off the motherboard.  Perhaps a new cable for it too.
> 
> < 0x000a  2   12  Device-to-host register FISes sent due to a COMRESET
> ---
> > 0x000a  26  Device-to-host register FISes sent due to a COMRESET

This ID tracks the number of times an actual communication reset command
was sent from the drive to the controller via a FIS packet.  This is at
the SATA layer, not the ATA command layer.  It's completely normal/okay
for a drive to have this number increase, especially if the machine is
shut off, force-reset (via reset button), or in some cases simply soft
rebooted.  Nothing to worry about here; no need to adjust cables or
otherwise.  Values 6, 12, etc. are all perfectly reasonable and will
vary from system to system based on use.
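
For anyone who wants to watch this counter between daily snapshots anyway, a rough sketch follows. It assumes the usual "smartctl -l sataphy" column layout (ID, Size, Value, Description); the function names and snapshot paths are made up for illustration.

```sh
#!/bin/sh
# Sketch only: function names and snapshot paths are hypothetical.

# Print the Value column of the 0x000a (COMRESET) counter from
# "smartctl -l sataphy" output read on stdin.
# Assumes the column layout: ID, Size, Value, Description.
comreset_count() {
    awk '$1 == "0x000a" { print $3 }'
}

# Print how much the counter grew between two saved snapshots
# (first argument: older snapshot, second argument: newer snapshot).
comreset_delta() {
    old=$(comreset_count < "$1")
    new=$(comreset_count < "$2")
    echo $((new - old))
}

# Example use (needs root and smartmontools):
#   smartctl -l sataphy /dev/ada0 > /var/tmp/sataphy.ada0.today
#   comreset_delta /var/tmp/sataphy.ada0.yesterday /var/tmp/sataphy.ada0.today
```

A small increase after a reboot or power cycle is expected, as described above; the sketch only makes the change easy to see.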

-- 
| Jeremy Chadwick j...@parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |


Re: siisch1: Error while READ LOG EXT

2012-02-14 Thread Mike Tancsa

On 2/10/2012 8:43 PM, Mike Tancsa wrote:
> On 2/10/2012 8:27 PM, Jeremy Chadwick wrote:
>> Mike,
>>
>> I wanted to make you aware of this commit that just came through:
>>
>> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cam/ata/ata_da.c
> 
> Thanks, I did see that.  I was going to wait until Monday to csup up
> once all the weekend level zeros are done.  The prior kernels from Nov
> 28th never saw these READ LOG EXT errors on either of these 2 big zfs boxes

So far so good. Unfortunately, I had to make 2 changes to the box
showing the problem the most. I changed the cable (the new one does seem
to fit more snug) as well as updated the code.  I haven't done many level
0 dumps to it (the real test will be the weekend), but so far so good.
On the other box that did show the same READ LOG EXT error, I also
updated the kernel, but made no hardware changes. It too has not yet
shown any errors since the upgrade.

I changed the cable at 8am local time yesterday, and I take snapshots of
smartctl at 5am

I did see this error increase in 24hrs, but that was on a disk that was
off the motherboard.  Perhaps a new cable for it too.

< 0x000a  2   12  Device-to-host register FISes sent due to a COMRESET
---
> 0x000a  26  Device-to-host register FISes sent due to a COMRESET

---Mike

-- 
---
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, m...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada   http://www.tancsa.com/

Re: problems with AHCI on FreeBSD 8.2

2012-02-14 Thread Jeremy Chadwick

On Tue, Feb 14, 2012 at 02:54:35PM +0100, Victor Balada Diaz wrote:
> On Tue, Feb 14, 2012 at 02:05:13AM -0800, Jeremy Chadwick wrote:
> > On Tue, Feb 14, 2012 at 10:19:09AM +0100, Victor Balada Diaz wrote:
> > > We're having some troubles with AHCI under FreeBSD 8.2 and 8-STABLE. The 
> > > error is:
> > > 
> > > ahcich0: Timeout on slot 8
> > > ahcich0: is  cs 0100 ss  rs 0100 tfd c0 serr 
> > > 
> > > ahcich0: AHCI reset...
> > > ahcich0: SATA connect time=0ms status=0123
> > > ahcich0: ready wait time=18ms
> > > ahcich0: AHCI reset done: device found
> > > (ada0:ahcich0:0:0:0): Request requeued
> > > (ada0:ahcich0:0:0:0): Retrying command
> > > (ada0:ahcich0:0:0:0): Command timed out
> > > (ada0:ahcich0:0:0:0): Retrying command
> > > ahcich0: Timeout on slot 8
> > > ahcich0: is  cs 007ff000 ss 007fff00 rs 007fff00 tfd c0 serr 
> > > 
> > > ahcich0: AHCI reset...
> > > ahcich0: SATA connect time=0ms status=0123
> > > ahcich0: ready wait time=84ms
> > > ahcich0: AHCI reset done: device found
> > > (ada0:ahcich0:0:0:0): Request requeued
> > > (ada0:ahcich0:0:0:0): Retrying command
> > > (ada0:ahcich0:0:0:0): Command timed out
> > > (ada0:ahcich0:0:0:0): Retrying command
> > > (ada0:ahcich0:0:0:0): Request requeued
> > > [...]
> > > 
> > > If we use the old ATA driver we have no problems. If we just use the
> > > first disk (ada0) with ahci, no problems either. If we use both disks
> > > (ada0 and ada1) in a gmirror setup with ahci, we get the above error.
> > > If we use both disks in gmirror with the old ata driver, no problems.
> > 
> > Please provide SMART statistics for both disks by installing
> > ports/sysutils/smartmontools (5.42 or newer please) and running
> > "smartctl -a" against both disks (ada0/ada1, or ad4/ad10 -- doesn't
> > matter which driver you're using).  I will review the output.
> 
> I forgot to say that from time to time, after the system hangs and I need
> to reboot, one of the disks is lost. It doesn't show up even after a few
> reboots, nor on a Linux live system.
> 
> You can see smartctl output here:
>
> ada0:
> 
> # smartctl -a /dev/ada0
> smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.2-STABLE amd64] (local build)
> Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
> 
> === START OF INFORMATION SECTION ===
> Model Family:     SAMSUNG SpinPoint F2 EG
> Device Model:     SAMSUNG HD154UI
> Serial Number:    S24EJ9BB200080
> LU WWN Device Id: 5 0024e9 2047cb78f
> Firmware Version: 1AG01118
> User Capacity:    1,500,301,910,016 bytes [1.50 TB]
> Sector Size:      512 bytes logical/physical
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   8
> ATA Standard is:  ATA-8-ACS revision 3b
> Local Time is:    Tue Feb 14 13:51:18 2012 CET
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> General SMART Values:
> Offline data collection status:  (0x00) Offline data collection activity
>                                         was never started.
>                                         Auto Offline Data Collection: Disabled.
> Self-test execution status:      (   0) The previous self-test routine completed
>                                         without error or no self-test has ever
>                                         been run.
> Total time to complete Offline
> data collection:                (18863) seconds.
> Offline data collection
> capabilities:                    (0x7b) SMART execute Offline immediate.
>                                         Auto Offline data collection on/off support.
>                                         Suspend Offline collection upon new
>                                         command.
>                                         Offline surface scan supported.
>                                         Self-test supported.
>                                         Conveyance Self-test supported.
>                                         Selective Self-test supported.
> SMART capabilities:            (0x0003) Saves SMART data before entering
>                                         power-saving mode.
>                                         Supports SMART auto save timer.
> Error logging capability:        (0x01) Error logging supported.
>                                         General Purpose Logging supported.
> Short self-test routine
> recommended polling time:        (   2) minutes.
> Extended self-test routine
> recommended polling time:        ( 255) minutes.
> Conveyance self-test routine
> recommended polling time:        (  33) minutes.
> SCT capabilities:              (0x003f) SCT Status supported.
>                                         SCT Error Recovery Control supported.
>                                         SCT Feature Control supported.
>

Re: sysutils/pftop on 9.x+

2012-02-14 Thread Patrick Lamaiziere

On Mon, 13 Feb 2012 14:09:25 -0600 (CST),
Greg Rivers wrote:

> sysutils/pftop was marked broken on 9.x and above last March[1].  Are 
> there any plans to fix it soon?  It's a really handy utility.
> 
> [1] 
> http://www.freebsd.org/cgi/cvsweb.cgi/ports/sysutils/pftop/Makefile?rev=1.17

Looks like there are some patches in pkgsrc to make it work with
DragonFlyBSD/NetBSD. I don't have the time to try them...

http://pkgsrc.se/sysutils/pftop

HTH
Regards.

Re: problems with AHCI on FreeBSD 8.2

2012-02-14 Thread Victor Balada Diaz

On Tue, Feb 14, 2012 at 02:05:13AM -0800, Jeremy Chadwick wrote:
> On Tue, Feb 14, 2012 at 10:19:09AM +0100, Victor Balada Diaz wrote:
> > We're having some troubles with AHCI under FreeBSD 8.2 and 8-STABLE. The 
> > error is:
> > 
> > ahcich0: Timeout on slot 8
> > ahcich0: is  cs 0100 ss  rs 0100 tfd c0 serr 
> > 
> > ahcich0: AHCI reset...
> > ahcich0: SATA connect time=0ms status=0123
> > ahcich0: ready wait time=18ms
> > ahcich0: AHCI reset done: device found
> > (ada0:ahcich0:0:0:0): Request requeued
> > (ada0:ahcich0:0:0:0): Retrying command
> > (ada0:ahcich0:0:0:0): Command timed out
> > (ada0:ahcich0:0:0:0): Retrying command
> > ahcich0: Timeout on slot 8
> > ahcich0: is  cs 007ff000 ss 007fff00 rs 007fff00 tfd c0 serr 
> > 
> > ahcich0: AHCI reset...
> > ahcich0: SATA connect time=0ms status=0123
> > ahcich0: ready wait time=84ms
> > ahcich0: AHCI reset done: device found
> > (ada0:ahcich0:0:0:0): Request requeued
> > (ada0:ahcich0:0:0:0): Retrying command
> > (ada0:ahcich0:0:0:0): Command timed out
> > (ada0:ahcich0:0:0:0): Retrying command
> > (ada0:ahcich0:0:0:0): Request requeued
> > [...]
> > 
> > If we use the old ATA driver we have no problems. If we just use the
> > first disk (ada0) with ahci, no problems either. If we use both disks
> > (ada0 and ada1) in a gmirror setup with ahci, we get the above error.
> > If we use both disks in gmirror with the old ata driver, no problems.
> 
> Please provide SMART statistics for both disks by installing
> ports/sysutils/smartmontools (5.42 or newer please) and running
> "smartctl -a" against both disks (ada0/ada1, or ad4/ad10 -- doesn't
> matter which driver you're using).  I will review the output.

I forgot to say that from time to time, after the system hangs and I need
to reboot, one of the disks is lost. It doesn't show up even after a few
reboots, nor on a Linux live system.

You can see smartctl output here:

ada0:

# smartctl -a /dev/ada0
smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.2-STABLE amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint F2 EG
Device Model:     SAMSUNG HD154UI
Serial Number:    S24EJ9BB200080
LU WWN Device Id: 5 0024e9 2047cb78f
Firmware Version: 1AG01118
User Capacity:    1,500,301,910,016 bytes [1.50 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 3b
Local Time is:    Tue Feb 14 13:51:18 2012 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (18863) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (  33) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate 0x000f

Re: Custom kernel poll summary (was: Re: Reducing the need to compile a custom kernel)

2012-02-14 Thread Alexander Leidinger

Quoting Attilio Rao (from Tue, 14 Feb 2012
12:38:17 +):

> 2012/2/14, Alexander Leidinger :
>> 2 SW_WATCHDOG
>
> This can become a module with very little effort I guess.

What's the TODO list for this?

Bye,
Alexander.

--
No man is lonely while eating spaghetti.

http://www.Leidinger.netAlexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org   netchild @ FreeBSD.org  : PGP ID = 72077137


Re: Reducing the need to compile a custom kernel

2012-02-14 Thread Nenhum_de_Nos


On Tue, February 14, 2012 08:31, Alexander Leidinger wrote:
> Quoting Paul Schenkeveld  (from Fri, 10 Feb 2012
> 15:44:50 +0100):
>
>> On Fri, Feb 10, 2012 at 02:56:04PM +0100, Alexander Leidinger wrote:
>>> Hi,
>>>
>>> during some big discussions in the last months on various lists, one of
>>> the problems was that some people would like to use freebsd-update but
>>> can't as they are using a custom kernel. With all the kernel modules
>>> we provide, the need for a custom kernel should be small, but on the
>>> other hand, we do not provide a small kernel-skeleton where you can
>>> load just the modules you need.
>>>
>>> This should be easy to change. As a first step I took the generic
>>> kernel and removed all devices which are available as modules, e.g.
>>> the USB section consists now only of the USB_DEBUG option (so that the
>>> module is built as with the current generic kernel). I also removed
>>> some storage drivers which are not available as a module. The
>>> rationale is, that I can not remove CAM from the kernel config if I
>>> let those drivers inside (if those drivers are important enough,
>>> someone will probably fix the problem and add the missing pieces to
>>> generate a module).
>>>
>>> Such a kernel would cover situations where people compile their own
>>> kernel because they want to get rid of some unused kernel code (and
>>> maybe even need the memory this frees up).
>>>
>>> The question is, is this enough? Or asked differently, why are you
>>> compiling a custom kernel in a production environment (so I rule out
>>> debug options which are not enabled in GENERIC)? Are there options
>>> which you add which you can not add as a module (SW_WATCHDOG comes to
>>> my mind)? If yes, which ones and how important are they for you?
>>
>>  - INET without INET6
>>  - SOFTUPDATES, UFS_ACL, AUDIT, SCTP (left out for embedded devices)
>>  - Björn may add INET6 without INET
>>  - SCHED_ULE vs. SCHED_4BSD
>>  - No vga console/atkbd/psm for embedded devices
>>  - CPU_SOEKRIS, CPU_GEODE, CPU_ELAN, NO_SWAPPING for embedded devices
>
> Embedded devices are out of the scope of this, normally you do a lot
> of other modifications to such systems anyway, so a custom kernel
> should be not a big problem.
>
> I will also not touch the dual-stack part of the kernel config (it
> shall still allow general-purpose computing like the GENERIC
> config).

I'm really curious why, since embedded devices are usually the worst
hardware to compile things on, whether because of access issues or
underpowered hardware (it's great to compile kernel+world on an i7, but
a pain to do so on my net5501-70).

It's a bummer to hear this :(

matheus

>>  - IPSTEALTH, IPSEC, IPSEC_FILTERTUNNEL, IPFILTER, ALTQ for firewalls
>
> Request noted.
>
>> I also always specify exactly one CPU type (on i386), know it made a
>> difference in the 386/486/586 era but am not sure how much difference
>> it makes nowadays.
>
> The 386 part (which we do not have anymore in GENERIC) made a
> difference, the rest doesn't hurt in the kernel.
>
> Bye,
> Alexander.
>
> --
> Smuggling... It's not just a job, it's an adventure!
>   -- paid for by your local Colombian recruiting office
>
> http://www.Leidinger.netAlexander @ Leidinger.net: PGP ID = B0063FE7
> http://www.FreeBSD.org   netchild @ FreeBSD.org  : PGP ID = 72077137
>
>


-- 
We will call you Cygnus,
The God of balance you shall be

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

http://en.wikipedia.org/wiki/Posting_style

Re: Custom kernel poll summary (was: Re: Reducing the need to compile a custom kernel)

2012-02-14 Thread Attilio Rao

2012/2/14, Alexander Leidinger :
> Quoting Alexander Leidinger  (from Fri, 10
> Feb 2012 14:56:04 +0100):
>
>> Such a kernel would cover situations where people compile their own
>> kernel because they want to get rid of some unused kernel code (and
>> maybe even need the memory this frees up).
>>
>> The question is, is this enough? Or asked differently, why are you
>> compiling a custom kernel in a production environment (so I rule out
>> debug options which are not enabled in GENERIC)? Are there options
>> which you add which you can not add as a module (SW_WATCHDOG comes
>> to my mind)? If yes, which ones and how important are they for you?
>
> Here is what I got, the first column is the number of requests, the
> second what is requested, and the 3rd my comments (basically it means,
> if there is a comment, it is not needed/possible to include in a
> modular kernel):

...
> 2 SW_WATCHDOG

This can become a module with very little effort I guess.

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein

[releng_9 tinderbox] failure on sparc64/sparc64

2012-02-14 Thread FreeBSD Tinderbox

TB --- 2012-02-14 10:58:05 - tinderbox 2.9 running on freebsd-stable.sentex.ca
TB --- 2012-02-14 10:58:05 - starting RELENG_9 tinderbox run for sparc64/sparc64
TB --- 2012-02-14 10:58:05 - cleaning the object tree
TB --- 2012-02-14 10:58:05 - cvsupping the source tree
TB --- 2012-02-14 10:58:05 - /usr/bin/csup -z -r 3 -g -L 1 -h cvsup.sentex.ca /tinderbox/RELENG_9/sparc64/sparc64/supfile
TB --- 2012-02-14 10:58:44 - building world
TB --- 2012-02-14 10:58:44 - CROSS_BUILD_TESTING=YES
TB --- 2012-02-14 10:58:44 - MAKEOBJDIRPREFIX=/obj
TB --- 2012-02-14 10:58:44 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2012-02-14 10:58:44 - SRCCONF=/dev/null
TB --- 2012-02-14 10:58:44 - TARGET=sparc64
TB --- 2012-02-14 10:58:44 - TARGET_ARCH=sparc64
TB --- 2012-02-14 10:58:44 - TZ=UTC
TB --- 2012-02-14 10:58:44 - __MAKE_CONF=/dev/null
TB --- 2012-02-14 10:58:44 - cd /src
TB --- 2012-02-14 10:58:44 - /usr/bin/make -B buildworld
>>> World build started on Tue Feb 14 10:58:46 UTC 2012
>>> Rebuilding the temporary build tree
>>> stage 1.1: legacy release compatibility shims
>>> stage 1.2: bootstrap tools
>>> stage 2.1: cleaning up the object tree
>>> stage 2.2: rebuilding the object tree
>>> stage 2.3: build tools
>>> stage 3: cross tools
>>> stage 4.1: building includes
>>> stage 4.2: building libraries
>>> stage 4.3: make dependencies
>>> stage 4.4: building everything
>>> World build completed on Tue Feb 14 12:05:49 UTC 2012
TB --- 2012-02-14 12:05:49 - generating LINT kernel config
TB --- 2012-02-14 12:05:49 - cd /src/sys/sparc64/conf
TB --- 2012-02-14 12:05:49 - /usr/bin/make -B LINT
TB --- 2012-02-14 12:05:49 - cd /src/sys/sparc64/conf
TB --- 2012-02-14 12:05:49 - /usr/sbin/config -m LINT
TB --- 2012-02-14 12:05:49 - building LINT kernel
TB --- 2012-02-14 12:05:49 - CROSS_BUILD_TESTING=YES
TB --- 2012-02-14 12:05:49 - MAKEOBJDIRPREFIX=/obj
TB --- 2012-02-14 12:05:49 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2012-02-14 12:05:49 - SRCCONF=/dev/null
TB --- 2012-02-14 12:05:49 - TARGET=sparc64
TB --- 2012-02-14 12:05:49 - TARGET_ARCH=sparc64
TB --- 2012-02-14 12:05:49 - TZ=UTC
TB --- 2012-02-14 12:05:49 - __MAKE_CONF=/dev/null
TB --- 2012-02-14 12:05:49 - cd /src
TB --- 2012-02-14 12:05:49 - /usr/bin/make -B buildkernel KERNCONF=LINT
>>> Kernel build for LINT started on Tue Feb 14 12:05:49 UTC 2012
>>> stage 1: configuring the kernel
>>> stage 2.1: cleaning up the object tree
>>> stage 2.2: rebuilding the object tree
>>> stage 2.3: build tools
>>> stage 3.1: making dependencies
[...]
/usr/bin/make -V CFILES -V SYSTEM_CFILES -V GEN_CFILES |  MKDEP_CPP="cc -E" 
CC="cc" xargs mkdep -a -f .newdep -O2 -pipe -fno-strict-aliasing  -std=c99  
-Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes  
-Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual  -Wundef 
-Wno-pointer-sign -fformat-extensions  -Wmissing-include-dirs 
-fdiagnostics-show-option   -nostdinc  -I. -I/src/sys -I/src/sys/contrib/altq 
-I/src/sys/contrib/ipfilter -I/src/sys/contrib/pf -I/src/sys/dev/ath 
-I/src/sys/dev/ath/ath_hal -I/src/sys/contrib/ngatm -I/src/sys/dev/twa 
-I/src/sys/gnu/fs/xfs/FreeBSD -I/src/sys/gnu/fs/xfs/FreeBSD/support 
-I/src/sys/gnu/fs/xfs -I/src/sys/dev/cxgb -I/src/sys/dev/cxgbe -D_KERNEL 
-DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common 
-finline-limit=15000 --param inline-unit-growth=100 --param 
large-function-growth=1000 -fno-builtin -mcmodel=medany -msoft-float 
-ffreestanding -fstack-protector
cc: /src/sys/dev/oce/oce_hw.c: No such file or directory
cc: /src/sys/dev/oce/oce_if.c: No such file or directory
cc: /src/sys/dev/oce/oce_mbox.c: No such file or directory
cc: /src/sys/dev/oce/oce_queue.c: No such file or directory
cc: /src/sys/dev/oce/oce_sysctl.c: No such file or directory
cc: /src/sys/dev/oce/oce_util.c: No such file or directory
mkdep: compile failed
*** Error code 1

Stop in /obj/sparc64.sparc64/src/sys/LINT.
*** Error code 1

Stop in /src.
*** Error code 1

Stop in /src.
TB --- 2012-02-14 12:07:30 - WARNING: /usr/bin/make returned exit code  1 
TB --- 2012-02-14 12:07:30 - ERROR: failed to build LINT kernel
TB --- 2012-02-14 12:07:30 - 2961.26 user 499.21 system 4165.45 real


http://tinderbox.freebsd.org/tinderbox-releng_9-RELENG_9-sparc64-sparc64.full

Custom kernel poll summary (was: Re: Reducing the need to compile a custom kernel)

2012-02-14 Thread Alexander Leidinger

Quoting Alexander Leidinger (from Fri, 10
Feb 2012 14:56:04 +0100):

> Such a kernel would cover situations where people compile their own
> kernel because they want to get rid of some unused kernel code (and
> maybe even need the memory this frees up).
>
> The question is, is this enough? Or asked differently, why are you
> compiling a custom kernel in a production environment (so I rule out
> debug options which are not enabled in GENERIC)? Are there options
> which you add which you can not add as a module (SW_WATCHDOG comes
> to my mind)? If yes, which ones and how important are they for you?

Here is what I got, the first column is the number of requests, the
second what is requested, and the 3rd my comments (basically it means,
if there is a comment, it is not needed/possible to include in a
modular kernel):

---snip---
5 IPSEC
4 ALTQ
2 VIMAGE                        -> not production ready (bz)
2 SW_WATCHDOG
2 IPSEC_FILTERTUNNEL            -> obsolete according to bz
2 IPFIREWALL_DEFAULT_TO_ACCEPT  -> loader.conf: net.inet.ip.fw.default_to_accept
2 IPFIREWALL                    -> loader.conf: ipfw_load='YES'
2 HZ=1000                       -> loader.conf: kern.hz
2 DEVICE_POLLING                -> ifconfig in 9.0 handles this at runtime?
1 enc
1 ZERO_COPY_SOCKETS             -> has known problems? can't find the reference,
                                   but I removed it from my kernels
1 SC_* options                  -> not a generic setting, will not include
1 ROUTETABLES=n                 -> bz is working on this
1 QUOTA
1 PF                            -> loader.conf: pf_load='YES'
1 MROUTING                      -> loader.conf: ip_mroute='YES'?
1 KTR                           -> rare use case, kernel recompile is OK
1 KDTRACE_HOOKS                 -> legal review needed
1 KDB_UNATTENDED                -> re@ wants this, but has reservations
1 KDB_TRACE                     -> re@ wants this, but has reservations
1 KDB                           -> re@ wants this, but has reservations
1 IPSTEALTH
1 IPSEC_NAT_T
1 IPFIREWALL_VERBOSE_LIMIT=5
1 IPFIREWALL_VERBOSE
1 IPFIREWALL_FORWARD            -> performance impact too big if unused (julian)
1 IPFILTER                      -> 2/3 firewalls can be loaded... and this one
                                   is not really maintained anymore
1 IPDIVERT                      -> loader.conf: ipdivert_load='YES'
1 GDB
1 FLOWTABLE
1 DUMMYNET                      -> loader.conf: dummynet_load='YES'
1 DIRECTIO
1 DDB_NUMSYM
1 DDB
1 BREAK_TO_DEBUGGER             -> loader.conf: debug.kdb.break_to_debugger
1 BPF_JITTER
1 ALT_BREAK_TO_DEBUGGER         -> loader.conf: debug.kdb.alt_break_to_debugger

---snip---

Yes, this poll is not representative...

So... what's the impact of including the following options in a
kernel which is intended to be modular? Or, put differently, are there
reasons to _not_ include one of the following?

---snip---
5 IPSEC                      -> we do not have a separate crypto
                                dist, so it should be possible
                                to include in a kernel now...
                                legal advice needed
4 ALTQ*                      -> does add code to the pf module
                                other impact?
2 SW_WATCHDOG                -> should not hurt if not enabled
                                in rc.conf
1 enc                        -> together with IPSEC
1 IPSTEALTH                  -> changes ipfw module only?
1 IPSEC_NAT_T
1 IPFIREWALL_VERBOSE_LIMIT=5 -> changes ipfw module only?
                                loader tunable?
1 IPFIREWALL_VERBOSE         -> changes ipfw module only?
                                loader tunable?
1 FLOWTABLE
1 DIRECTIO
1 BPF_JITTER
---snip---

Bye,
Alexander.

--
Q:  What is purple and concord the world?
A:  Alexander the Grape.

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: dhclient script adjustments

2012-02-14 Thread Dag-Erling Smørgrav

Jason Hellenthal  writes:
> After recent merges to stable/8 I am now seeing errors on bootup of
> the following for three interfaces that will never see the light of
> DHCP. ?
>
> /etc/rc.d/dhclient: ERROR: 'dc1' is not a DHCP-enabled interface

This is perfectly harmless.  Just ignore these messages.  They will go
away as soon as r230388 is MFCed.

DES
-- 
Dag-Erling Smørgrav - d...@des.no

Re: Reducing the need to compile a custom kernel

2012-02-14 Thread Alexander Leidinger


Quoting Paul Schenkeveld  (from Fri, 10 Feb 2012
15:44:50 +0100):


On Fri, Feb 10, 2012 at 02:56:04PM +0100, Alexander Leidinger wrote:

Hi,

during some big discussions in the last months on various lists, one of
the problems was that some people would like to use freebsd-update but
can't as they are using a custom kernel. With all the kernel modules
we provide, the need for a custom kernel should be small, but on the
other hand, we do not provide a small kernel-skeleton where you can
load just the modules you need.

This should be easy to change. As a first step I took the generic
kernel and removed all devices which are available as modules, e.g.
the USB section consists now only of the USB_DEBUG option (so that the
module is built as with the current generic kernel). I also removed
some storage drivers which are not available as a module. The
rationale is, that I can not remove CAM from the kernel config if I
let those drivers inside (if those drivers are important enough,
someone will probably fix the problem and add the missing pieces to
generate a module).

Such a kernel would cover situations where people compile their own
kernel because they want to get rid of some unused kernel code (and
maybe even need the memory this frees up).

The question is, is this enough? Or asked differently, why are you
compiling a custom kernel in a production environment (so I rule out
debug options which are not enabled in GENERIC)? Are there options
which you add which you cannot add as a module (SW_WATCHDOG comes to
my mind)? If yes, which ones and how important are they for you?


 - INET without INET6
 - SOFTUPDATES, UFS_ACL, AUDIT, SCTP (left out for embedded devices)
 - Björn may add INET6 without INET
 - SCHED_ULE vs. SCHED_4BSD
 - No vga console/atkbd/psm for embedded devices
 - CPU_SOEKRIS, CPU_GEODE, CPU_ELAN, NO_SWAPPING for embedded devices


Embedded devices are out of the scope of this. Normally you do a lot
of other modifications to such systems anyway, so a custom kernel
should not be a big problem.

I will also not touch the dual-stack part of the kernel config (it
shall still allow general-purpose computing like the GENERIC
config).


 - IPSTEALTH, IPSEC, IPSEC_FILTERTUNNEL, IPFILTER, ALTQ for firewalls


Request noted.


I also always specify exactly one CPU type (on i386), know it made a
difference in the 386/486/586 era but am not sure how much difference
it makes nowadays.


The 386 part (which we do not have anymore in GENERIC) made a
difference, the rest doesn't hurt in the kernel.

Bye,
Alexander.

--
Smuggling... It's not just a job, it's an adventure!
-- paid for by your local Colombian recruiting office

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137


Re: problems with AHCI on FreeBSD 8.2

2012-02-14 Thread Jeremy Chadwick

On Tue, Feb 14, 2012 at 10:19:09AM +0100, Victor Balada Diaz wrote:
> We're having some troubles with AHCI under FreeBSD 8.2 and 8-STABLE.
> The error is:
> 
> ahcich0: Timeout on slot 8
> ahcich0: is  cs 0100 ss  rs 0100 tfd c0 serr 
> ahcich0: AHCI reset...
> ahcich0: SATA connect time=0ms status=0123
> ahcich0: ready wait time=18ms
> ahcich0: AHCI reset done: device found
> (ada0:ahcich0:0:0:0): Request requeued
> (ada0:ahcich0:0:0:0): Retrying command
> (ada0:ahcich0:0:0:0): Command timed out
> (ada0:ahcich0:0:0:0): Retrying command
> ahcich0: Timeout on slot 8
> ahcich0: is  cs 007ff000 ss 007fff00 rs 007fff00 tfd c0 serr 
> ahcich0: AHCI reset...
> ahcich0: SATA connect time=0ms status=0123
> ahcich0: ready wait time=84ms
> ahcich0: AHCI reset done: device found
> (ada0:ahcich0:0:0:0): Request requeued
> (ada0:ahcich0:0:0:0): Retrying command
> (ada0:ahcich0:0:0:0): Command timed out
> (ada0:ahcich0:0:0:0): Retrying command
> (ada0:ahcich0:0:0:0): Request requeued
> [...]
> 
> If we use old ATA driver we have no problems. If we just use the first
> disk (ada0) with ahci, no problems either. If we use both disks (ada0 and
> ada1) in gmirror setup with ahci, we got the above error. If we use both
> disks in gmirror with old ata driver, no problems.

Please provide SMART statistics for both disks by installing
ports/sysutils/smartmontools (5.42 or newer please) and running
"smartctl -a" against both disks (ada0/ada1, or ad4/ad10 -- doesn't
matter which driver you're using).  I will review the output.

-- 
| Jeremy Chadwick j...@parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |


Re: problems with AHCI on FreeBSD 8.2

2012-02-14 Thread Alexander Motin


On 02/14/12 11:19, Victor Balada Diaz wrote:

We're having some troubles with AHCI under FreeBSD 8.2 and 8-STABLE.
The error is:

ahcich0: Timeout on slot 8
ahcich0: is  cs 0100 ss  rs 0100 tfd c0 serr 
ahcich0: AHCI reset...
ahcich0: SATA connect time=0ms status=0123
ahcich0: ready wait time=18ms
ahcich0: AHCI reset done: device found
(ada0:ahcich0:0:0:0): Request requeued
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): Command timed out
(ada0:ahcich0:0:0:0): Retrying command
ahcich0: Timeout on slot 8
ahcich0: is  cs 007ff000 ss 007fff00 rs 007fff00 tfd c0 serr 
ahcich0: AHCI reset...
ahcich0: SATA connect time=0ms status=0123
ahcich0: ready wait time=84ms
ahcich0: AHCI reset done: device found
(ada0:ahcich0:0:0:0): Request requeued
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): Command timed out
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): Request requeued
[...]

If we use old ATA driver we have no problems. If we just use the first
disk (ada0) with ahci, no problems either. If we use both disks (ada0 and
ada1) in gmirror setup with ahci, we got the above error. If we use both
disks in gmirror with old ata driver, no problems.


In both cases the controller reports the command status as 0xc0, which
means the device is busy with the command. For NCQ commands it means that
the device is in the stage of processing the command itself, not head
positioning or data transfer. Enabling AHCI enables NCQ for the devices.
That increases the load on both the devices and the controller, and it is
difficult to say whose fault it is here. SAMSUNG HD154UI disks AFAIR have
4k sectors, which may incur big performance penalties when accessing
small/misaligned data. I am not sure how big that penalty can be in the
worst case, especially since disks cache writes by default, hiding the
real load level. The relation to gmirror is harder to explain. Depending
on how you created it and the partitions, it could cause more misaligned
I/Os during rebuild. Using gmirror also doubles the concurrent load on the
controller, but at this point I have nothing to blame it for.
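
The "tfd c0" value in the log lines is the AHCI task file data, whose low
byte is the ATA status register. A minimal sketch of decoding it, assuming
the standard ATA status bit layout; the macro and function names below are
hypothetical and not taken from the FreeBSD sources:

```c
#include <stdint.h>
#include <stdio.h>

/* Standard ATA status register bits (names here are illustrative). */
#define ATA_ST_BSY   0x80 /* device busy */
#define ATA_ST_DRDY  0x40 /* device ready */
#define ATA_ST_DF    0x20 /* device fault */
#define ATA_ST_DRQ   0x08 /* data request */
#define ATA_ST_ERR   0x01 /* error */

/* Low byte of the task file data is the status, next byte the error. */
static uint8_t
tfd_status(uint32_t tfd)
{
    return (uint8_t)(tfd & 0xff);
}

static uint8_t
tfd_error(uint32_t tfd)
{
    return (uint8_t)((tfd >> 8) & 0xff);
}

static int
tfd_is_busy(uint32_t tfd)
{
    return (tfd_status(tfd) & ATA_ST_BSY) != 0;
}

/* Write the names of the set status bits into buf, space separated. */
static const char *
tfd_describe(uint32_t tfd, char *buf, size_t len)
{
    static const struct {
        uint8_t bit;
        const char *name;
    } bits[] = {
        { ATA_ST_BSY, "BSY" }, { ATA_ST_DRDY, "DRDY" },
        { ATA_ST_DF, "DF" }, { ATA_ST_DRQ, "DRQ" },
        { ATA_ST_ERR, "ERR" },
    };
    uint8_t st = tfd_status(tfd);
    size_t i, used = 0;

    if (len == 0)
        return buf;
    buf[0] = '\0';
    for (i = 0; i < sizeof(bits) / sizeof(bits[0]); i++) {
        int n;

        if ((st & bits[i].bit) == 0)
            continue;
        n = snprintf(buf + used, len - used, "%s%s",
            used ? " " : "", bits[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break; /* out of room: keep what fits */
        used += (size_t)n;
    }
    return buf;
}
```

For the value from the log, tfd_describe(0xc0, ...) yields "BSY DRDY" and
tfd_error(0xc0) is 0, which matches the reading that the device is still
busy with the command rather than reporting an error.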


--
Alexander Motin

Re: Reducing the need to compile a custom kernel

2012-02-14 Thread n j

On Sun, Feb 12, 2012 at 8:52 AM, Ian Smith  wrote:
> On Fri, 10 Feb 2012 16:12:00 +, Bjoern A. Zeeb wrote:
>  > > IPFIREWALL_FORWARD
>
> Unless something's changed, julian@ has pointed out (paraphrasing) that
> this adds bits of code to various parts of the stack and was thought to
> impact performance too much when unused to conditionalise each instance.
>
> I'm unsure if this is the only case ipfw still needs building in kernel?

If something's changed, I'd really love to hear it. IPFIREWALL_FORWARD
is the most common reason I need a custom kernel (usually to solve the
issues around asymmetric/source-based policy routing on multihomed
hosts).

Really miss Linux' "ip rule... table" functionality.

Regards,
-- 
Nino

Re: dhclient script adjustments

2012-02-14 Thread Baptiste Daroussin

On Tue, Feb 14, 2012 at 02:47:00AM -0500, Jason Hellenthal wrote:
> 
> Anyone ?
> 

Sorry for the mess, I'm working on figuring out why it does that.

Thanks for reporting,

regards,
Bapt



Re: Regression in 8.2-STABLE bge code (from 7.4-STABLE)

2012-02-14 Thread YongHyeon PYUN

On Sat, Jan 28, 2012 at 09:24:53PM -0500, Michael L. Squires wrote:

Sorry for the late reply.  I have been busy due to relocation.

> There is a bug in the Tyan S4881/S4882 PCI-X bridges that was fixed with a 
> patch in 7.x (thank you very much).  This patch is not present in the 
> 8.2-STABLE code and the symptoms (watchdog timeouts) have recurred.
> 

Hmm, I thought the mailbox reordering bug was avoided by limiting the
DMA address space to 32 bits, but it seems that was not the right
workaround for the AMD 8131 PCI-X bridge.

> The watchdog timeouts do not appear to be present after I switched to an 
> Intel gigabit PCI-X card.
> 
> I did a brute-force patch of the 8.2-STABLE bge code using the patches for
> 7.4-STABLE; the resulting code compiled and, other than odd behavior at
> startup, seems to be working normally.
> 
> This is using FreeBSD 8.2-STABLE amd64; I don't know what happens with 
> i386.
> 
> Given the age of the boards it may be easier if I just continue using the
> Intel gigabit card but am happy to test anything that comes my way.
> 

Try the attached patch and let me know how it goes.
I didn't enable 64-bit DMA addressing, though. I think the AMD-8131
PCI-X bridge needs both workarounds.

> Thanks,
> 
> Mike Squires
> mikes at siralan.org
Index: sys/dev/bge/if_bgereg.h
===
--- sys/dev/bge/if_bgereg.h	(revision 231621)
+++ sys/dev/bge/if_bgereg.h	(working copy)
@@ -2828,6 +2828,7 @@
 #define	BGE_FLAG_RX_ALIGNBUG	0x0400
 #define	BGE_FLAG_SHORT_DMA_BUG	0x0800
 #define	BGE_FLAG_4K_RDMA_BUG	0x1000
+#define	BGE_FLAG_MBOX_REORDER	0x2000
 	uint32_t		bge_phy_flags;
 #define	BGE_PHY_NO_WIRESPEED	0x0001
 #define	BGE_PHY_ADC_BUG		0x0002
Index: sys/dev/bge/if_bge.c
===
--- sys/dev/bge/if_bge.c	(revision 231621)
+++ sys/dev/bge/if_bge.c	(working copy)
@@ -380,6 +380,8 @@
 static int bge_dma_ring_alloc(struct bge_softc *, bus_size_t, bus_size_t,
 bus_dma_tag_t *, uint8_t **, bus_dmamap_t *, bus_addr_t *, const char *);
 
+static int bge_mbox_reorder(struct bge_softc *);
+
 static int bge_get_eaddr_fw(struct bge_softc *sc, uint8_t ether_addr[]);
 static int bge_get_eaddr_mem(struct bge_softc *, uint8_t[]);
 static int bge_get_eaddr_nvram(struct bge_softc *, uint8_t[]);
@@ -635,6 +637,8 @@
 		off += BGE_LPMBX_IRQ0_HI - BGE_MBX_IRQ0_HI;
 
 	CSR_WRITE_4(sc, off, val);
+	if ((sc->bge_flags & BGE_FLAG_MBOX_REORDER) != 0)
+		CSR_READ_4(sc, off);
 }
 
 /*
@@ -2609,8 +2613,8 @@
 		 * XXX
 		 * watchdog timeout issue was observed on BCM5704 which
 		 * lives behind PCI-X bridge(e.g AMD 8131 PCI-X bridge).
-		 * Limiting DMA address space to 32bits seems to address
-		 * it.
+		 * Both limiting DMA address space to 32bits and flushing
+		 * mailbox write seem to address the issue.
 		 */
 		if (sc->bge_flags & BGE_FLAG_PCIX)
 			lowaddr = BUS_SPACE_MAXADDR_32BIT;
@@ -2775,6 +2779,42 @@
 }
 
 static int
+bge_mbox_reorder(struct bge_softc *sc)
+{
+	/* Lists of PCI bridges that are known to reorder mailbox writes. */
+	static const struct mbox_reorder {
+		const uint16_t vendor;
+		const uint16_t device;
+		const char *desc;
+	} const mbox_reorder_lists[] = {
+		{ 0x1022, 0x7450, "AMD-8131 PCI-X Bridge" },
+	};
+	devclass_t pcib;
+	device_t dev;
+	int i, count, unit;
+
+	count = sizeof(mbox_reorder_lists) / sizeof(mbox_reorder_lists[0]);
+	pcib = devclass_find("pcib");
+	for (unit = 0; unit < devclass_get_maxunit(pcib); unit++) {
+		dev = devclass_get_device(pcib, unit);
+		if (dev == NULL)
+			continue;
+		for (i = 0; i < count; i++) {
+			if (pci_get_vendor(dev) ==
+			    mbox_reorder_lists[i].vendor &&
+			    pci_get_device(dev) ==
+			    mbox_reorder_lists[i].device) {
+				device_printf(sc->bge_dev,
+				    "enabling MBOX workaround for %s\n",
+				    mbox_reorder_lists[i].desc);
+				return (1);
+			}
+		}
+	}
+	return (0);
+}
+
+static int
 bge_attach(device_t dev)
 {
 	struct ifnet *ifp;
@@ -3094,6 +3134,14 @@
 	if (BGE_IS_5714_FAMILY(sc) && (sc->bge_flags & BGE_FLAG_PCIX))
 		sc->bge_flags |= BGE_FLAG_40BIT_BUG;
 	/*
+	 * Some PCI-X bridges are known to trigger write reordering to
+	 * the mailbox registers. Typical phenomena is watchdog timeouts
+	 * caused by out-of-order TX completions.  Enable workaround for
+	 * PCI-X devices that live behind these bridges.
+	 */
+	if (sc->bge_flags & BGE_FLAG_PCIX && bge_mbox_reorder(sc) != 0)
+		sc->bge_flags |= BGE_FLAG_MBOX_REORDER;
+	/*
 	 * Allocate the interrupt, using MSI if possible.  These devices
 	 * support 8 MSI messages, but only the first one is used in
 	 * normal operation.
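
The mailbox flush in the patch above follows a common pattern for posted
writes: read the register back right after writing it, so that the write
has to complete before anything issued later. A minimal userspace sketch of
that pattern, assuming a hypothetical struct fake_dev in place of the real
bge_softc and plain memory in place of a device register (so it shows the
control flow only, not real bus ordering):

```c
#include <stdint.h>

/* Hypothetical stand-in for BGE_FLAG_MBOX_REORDER. */
#define FAKE_FLAG_MBOX_REORDER 0x1u

struct fake_dev {
    uint32_t flags;
    volatile uint32_t mbox; /* stands in for a mailbox register */
    unsigned readbacks;     /* counts flush reads, illustration only */
};

/* Write the mailbox; read it back when the workaround flag is set. */
static void
fake_writembx(struct fake_dev *sc, uint32_t val)
{
    sc->mbox = val;
    if ((sc->flags & FAKE_FLAG_MBOX_REORDER) != 0) {
        (void)sc->mbox; /* volatile read: the flush */
        sc->readbacks++;
    }
}
```

Without the flag the function is a plain write; with it, every mailbox
write costs one extra read, which is why the patch enables the workaround
only behind bridges known to reorder.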

freebsd 9-stable TOP problem from around Jan 10

2012-02-14 Thread Julian Elischer


Has anyone else seen a  problem with top -H -S?

after a short while the screen gets more and more corrupted..

hitting ^L or turning off S & H modes helps .. for a while.

If this is a known fixed problem, let me know but I need to
co-ordinate with others to upgrade the machine in question.

Julian


Re: Reducing the need to compile a custom kernel

2012-02-14 Thread Alexander Leidinger

Quoting Volodymyr Kostyrko  (from Mon, 13 Feb 2012  
17:44:33 +0200):



Alexander Leidinger wrote:

Feasible: depends on your definition of "feasible". You would have to
add all keymaps statically into the kernel. No idea which parts exactly
we are talking about, but:
---snip---
% du -h /usr/share/syscons/
40k /usr/share/syscons/scrnmaps
570k /usr/share/syscons/fonts
1.1M /usr/share/syscons/keymaps
1.8M /usr/share/syscons/
---snip---

I wouldn't mind for 40k, but 1.8M looks more like the value to calculate
with. Anyway, this is out of the scope of the original question.


Correct me if I'm wrong, but zfs already fetches the plain file
/boot/zfs/zpool.cache on load. Can't this be:


 1. Postponed to later processing.
 2. After filesystems are mounted the keymap is loaded.


This is already the case. You can set the keymap in rc.conf.


Or even:

 1. Put all viable files on the / partition.
 2. Select and load correct one before kernel is fired.


This is not the same as compiling it into the kernel. Think about a
problem where parts of your FS are corrupt / damaged / overwritten
with nonsense. Yes, you can minimize the problem by loading it
earlier, but having it in the kernel removes the keyboard problem
completely.


Bye,
Alexander.

--
A lost ounce of gold may be found, a lost moment of time never.

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137

