Re: Traffic "corruption" in 12-stable

2020-08-04 Thread Joe Clarke


> On Aug 4, 2020, at 11:51, Mark Johnston  wrote:
> 
> On Mon, Aug 03, 2020 at 05:22:37PM -0400, Joe Clarke wrote:
>>> On Jul 27, 2020, at 15:41, Joe Clarke  wrote:
>>>> On Jul 27, 2020, at 15:01, Mark Johnston  wrote:
>>>> There are some fixes for vmx not present in stable/12 (yet).  I did a
>>>> merge of a number of outstanding revisions.  Would you be able to test
>>>> the patch?  I haven't observed any problems with it on a host using igb,
>>>> but I have no ability to test vmx at the moment.
>>> 
>>> I’m down to test anything.  I did notice quite a few vmxnet3 changes around 
>>> performance that appealed to me.  I tried a few of them on my last kernel.  
>>> That took much longer to exhibit the problem, but eventually did.
>>> 
>>> I can tell you I don’t have all of these patches in, though.  I’ll build 
>>> with this diff and start running it now.  I’ll let you know how it goes.
>> 
>> So it’s been just over a week of runtime with this full patch set.  I have 
>> seen no further issues with ingress packet “truncation”, and performance has 
>> been what I expect.  I’m going to keep running, but I think this seems like 
>> a good set to MFC.
> 
> Done in r363844, thanks.

Thank you.  On day 8, and still no issues.

Joe


---
PGP Key : http://www.marcuscom.com/pgp.asc




___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Traffic "corruption" in 12-stable

2020-08-03 Thread Joe Clarke


> On Jul 27, 2020, at 15:41, Joe Clarke  wrote:
> 
> 
> 
>> On Jul 27, 2020, at 15:01, Mark Johnston  wrote:
>> 
>> On Sun, Jul 26, 2020 at 06:16:07PM -0400, Joe Clarke wrote:
>>> About two weeks ago, I upgraded from the latest 11-stable to the latest 
>>> 12-stable.  After that, I periodically see the network throughput come to a 
>>> near standstill.  This FreeBSD machine is an ESXi VM with two interfaces.  
>>> It acts as a router.  It uses vmxnet3 interfaces for both LAN and WAN.  It 
>>> runs ipfw with in-kernel NAT.  The LAN side uses a bridge with vmx0 and a 
>>> tap0 L2 VPN interface.  My LAN side uses an MTU of 9000, and my vmx1 (WAN 
>>> side) uses the default 1500.
>>> 
>>> Besides seeing massive packet loss and huge latency (~ 200 ms for on-LAN 
>>> ping times), I know the problem has occurred because my lldpd reports:
>>> 
>>> Jul 26 15:47:03 namale lldpd[1126]: frame too short for tlv received on 
>>> bridge0
>>> 
>>> And if I turn on ipfw verbose messages, I see tons of:
>>> 
>>> Jul 26 16:02:23 namale kernel: ipfw: pullup failed
>>> 
>>> This leads me to believe packets are being corrupted on ingress.  I’ve 
>>> applied all the recent iflib changes, but the problem persists. What causes 
>>> it, I don’t know.
>>> 
>>> The only thing that changed (and yes, it’s a big one) is I upgraded to 
>>> 12-stable.  Meaning, the rest of the network infra and topology has 
>>> remained the same.  This did not happen at all in 11-stable.
>>> 
>>> I’m open to suggestions.
>> 
>> There are some fixes for vmx not present in stable/12 (yet).  I did a
>> merge of a number of outstanding revisions.  Would you be able to test
>> the patch?  I haven't observed any problems with it on a host using igb,
>> but I have no ability to test vmx at the moment.
> 
> I’m down to test anything.  I did notice quite a few vmxnet3 changes around 
> performance that appealed to me.  I tried a few of them on my last kernel.  
> That took much longer to exhibit the problem, but eventually did.
> 
> I can tell you I don’t have all of these patches in, though.  I’ll build with 
> this diff and start running it now.  I’ll let you know how it goes.

So it’s been just over a week of runtime with this full patch set.  I have seen 
no further issues with ingress packet “truncation”, and performance has been 
what I expect.  I’m going to keep running, but I think this seems like a good 
set to MFC.

Thanks again for your help.

Joe




Re: Traffic "corruption" in 12-stable

2020-07-27 Thread Joe Clarke


> On Jul 27, 2020, at 15:01, Mark Johnston  wrote:
> 
> On Sun, Jul 26, 2020 at 06:16:07PM -0400, Joe Clarke wrote:
>> About two weeks ago, I upgraded from the latest 11-stable to the latest 
>> 12-stable.  After that, I periodically see the network throughput come to a 
>> near standstill.  This FreeBSD machine is an ESXi VM with two interfaces.  
>> It acts as a router.  It uses vmxnet3 interfaces for both LAN and WAN.  It 
>> runs ipfw with in-kernel NAT.  The LAN side uses a bridge with vmx0 and a 
>> tap0 L2 VPN interface.  My LAN side uses an MTU of 9000, and my vmx1 (WAN 
>> side) uses the default 1500.
>> 
>> Besides seeing massive packet loss and huge latency (~ 200 ms for on-LAN 
>> ping times), I know the problem has occurred because my lldpd reports:
>> 
>> Jul 26 15:47:03 namale lldpd[1126]: frame too short for tlv received on 
>> bridge0
>> 
>> And if I turn on ipfw verbose messages, I see tons of:
>> 
>> Jul 26 16:02:23 namale kernel: ipfw: pullup failed
>> 
>> This leads me to believe packets are being corrupted on ingress.  I’ve 
>> applied all the recent iflib changes, but the problem persists. What causes 
>> it, I don’t know.
>> 
>> The only thing that changed (and yes, it’s a big one) is I upgraded to 
>> 12-stable.  Meaning, the rest of the network infra and topology has remained 
>> the same.  This did not happen at all in 11-stable.
>> 
>> I’m open to suggestions.
> 
> There are some fixes for vmx not present in stable/12 (yet).  I did a
> merge of a number of outstanding revisions.  Would you be able to test
> the patch?  I haven't observed any problems with it on a host using igb,
> but I have no ability to test vmx at the moment.

I’m down to test anything.  I did notice quite a few vmxnet3 changes around 
performance that appealed to me.  I tried a few of them on my last kernel.  
That took much longer to exhibit the problem, but eventually did.

I can tell you I don’t have all of these patches in, though.  I’ll build with 
this diff and start running it now.  I’ll let you know how it goes.

Thanks!

Joe





Re: Traffic "corruption" in 12-stable

2020-07-27 Thread Joe Clarke


> On Jul 27, 2020, at 01:00, Eugene Grosbein  wrote:
> 
> 27.07.2020 5:16, Joe Clarke wrote:
> 
>> About two weeks ago, I upgraded from the latest 11-stable to the latest 
>> 12-stable.  After that, I periodically see the network throughput come to a 
>> near standstill.  This FreeBSD machine is an ESXi VM with two interfaces.  
>> It acts as a router.  It uses vmxnet3 interfaces for both LAN and WAN.  It 
>> runs ipfw with in-kernel NAT.  The LAN side uses a bridge with vmx0 and a 
>> tap0 L2 VPN interface.  My LAN side uses an MTU of 9000, and my vmx1 (WAN 
>> side) uses the default 1500.
>> 
>> Besides seeing massive packet loss and huge latency (~ 200 ms for on-LAN 
>> ping times), I know the problem has occurred because my lldpd reports:
>> 
>> Jul 26 15:47:03 namale lldpd[1126]: frame too short for tlv received on 
>> bridge0
>> 
>> And if I turn on ipfw verbose messages, I see tons of:
>> 
>> Jul 26 16:02:23 namale kernel: ipfw: pullup failed
>> 
>> This leads me to believe packets are being corrupted on ingress.  I’ve 
>> applied all the recent iflib changes, but the problem persists. What causes 
>> it, I don’t know.
>> 
>> The only thing that changed (and yes, it’s a big one) is I upgraded to 
>> 12-stable.  Meaning, the rest of the network infra and topology has remained 
>> the same.  This did not happen at all in 11-stable.
>> 
>> I’m open to suggestions.
> 
> First, try: ifconfig $ifname -rxcsum -txcsum

Thanks for the suggestion.  I should have mentioned I’ve been initializing 
these two interfaces since 11-stable with:

ifconfig_vmx0="up mtu 9000 -tso -lro -vlanhwtso -rxcsum -txcsum -rxcsum6 -txcsum6 -tso4 -tso6 -vlanhwcsum"
ifconfig_vmx1="DHCP -tso -lro -vlanhwtso -rxcsum -txcsum -rxcsum6 -txcsum6 -tso4 -tso6 -vlanhwcsum"

And I’m running:

FreeBSD namale.marcuscom.com 12.1-STABLE FreeBSD 12.1-STABLE NAMALE  amd64 1201520 1201520

I most recently built this yesterday, but the previous kernel that exhibited 
the problem was built about a week ago.  It had the fragment fixes for iflib.c.

Joe
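
One way to confirm that the offload flags listed above really are off is to look at the "options=" line that ifconfig prints for the interface. The following is a minimal Python sketch of that check, not something taken from the thread. The sample ifconfig text and its hex value are hypothetical, and the capability names are the ones ifconfig normally shows for these features.

```python
import re

# Offload capabilities that the rc.conf lines above try to disable, using the
# names ifconfig prints inside the <...> list of its "options=" line.
OFFLOADS = {"RXCSUM", "TXCSUM", "RXCSUM_IPV6", "TXCSUM_IPV6",
            "TSO4", "TSO6", "LRO", "VLAN_HWTSO", "VLAN_HWCSUM"}


def enabled_offloads(ifconfig_text):
    """Return the offload flags still listed in the options=<...> line."""
    match = re.search(r"options=[0-9a-fA-F]+<([^>]*)>", ifconfig_text)
    if match is None:
        return set()
    flags = set(match.group(1).split(","))
    return flags & OFFLOADS


# Hypothetical sample output; it was not captured from the machine in this
# thread, and the hex value is made up for illustration.
SAMPLE = (
    "vmx0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 9000\n"
    "\toptions=800028<VLAN_MTU,JUMBO_MTU,LINKSTATE>\n"
)
SAMPLE_WITH_LRO = SAMPLE.replace("JUMBO_MTU", "JUMBO_MTU,LRO,TSO4")

print(sorted(enabled_offloads(SAMPLE)))           # []
print(sorted(enabled_offloads(SAMPLE_WITH_LRO)))  # ['LRO', 'TSO4']
```

On a live system the text would come from running ifconfig against vmx0 and vmx1. An empty result only shows that the flags are not advertised as enabled; it does not rule out a driver bug.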



Traffic "corruption" in 12-stable

2020-07-26 Thread Joe Clarke
About two weeks ago, I upgraded from the latest 11-stable to the latest 
12-stable.  After that, I periodically see the network throughput come to a 
near standstill.  This FreeBSD machine is an ESXi VM with two interfaces.  It 
acts as a router.  It uses vmxnet3 interfaces for both LAN and WAN.  It runs 
ipfw with in-kernel NAT.  The LAN side uses a bridge with vmx0 and a tap0 L2 
VPN interface.  My LAN side uses an MTU of 9000, and my vmx1 (WAN side) uses 
the default 1500.

Besides seeing massive packet loss and huge latency (~ 200 ms for on-LAN ping 
times), I know the problem has occurred because my lldpd reports:

Jul 26 15:47:03 namale lldpd[1126]: frame too short for tlv received on bridge0

And if I turn on ipfw verbose messages, I see tons of:

Jul 26 16:02:23 namale kernel: ipfw: pullup failed

This leads me to believe packets are being corrupted on ingress.  I’ve 
applied all the recent iflib changes, but the problem persists. What causes it, 
I don’t know.

The only thing that changed (and yes, it’s a big one) is I upgraded to 
12-stable.  Meaning, the rest of the network infra and topology has remained 
the same.  This did not happen at all in 11-stable.

I’m open to suggestions.

Thanks.

Joe
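
The two log messages quoted above are the only symptoms named in this report, so counting them per minute gives a rough timeline of when the problem starts and stops. Below is a small Python sketch of that idea. The label names are made up for illustration, and the second "pullup failed" line is repeated by hand to stand in for the "tons of" messages described.

```python
from collections import Counter

# Substrings taken from the log lines quoted in the report, mapped to short
# labels (the labels themselves are hypothetical).
SIGNATURES = {
    "ipfw: pullup failed": "ipfw_pullup",
    "frame too short for tlv": "lldpd_short_frame",
}


def count_symptoms(lines):
    """Count symptom lines per (label, 'Mon DD HH:MM') bucket."""
    counts = Counter()
    for line in lines:
        for needle, label in SIGNATURES.items():
            if needle in line:
                minute = line[:12]  # syslog prefix, e.g. "Jul 26 16:02"
                counts[(label, minute)] += 1
    return counts


LOG = [
    "Jul 26 15:47:03 namale lldpd[1126]: frame too short for tlv received on bridge0",
    "Jul 26 16:02:23 namale kernel: ipfw: pullup failed",
    "Jul 26 16:02:23 namale kernel: ipfw: pullup failed",  # repeated for illustration
]

for (label, minute), n in sorted(count_symptoms(LOG).items()):
    print(minute, label, n)
```

Feeding it the contents of the system log instead of the LOG list would show whether the bursts line up with the throughput stalls.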



Re: ZFS...

2019-05-07 Thread Joe Maloney
You might look at UFS Explorer.  It claims to have ZFS support now.  It costs 
money for a license, and I think it required Windows when I last used it.  I can 
attest that a previous version allowed me to recover all the data I needed from 
a lost UFS mirror almost a decade ago.

Sent from my iPhone

> On May 7, 2019, at 9:01 PM, Michelle Sullivan  wrote:
> 
> Karl Denninger wrote:
>>> On 5/7/2019 00:02, Michelle Sullivan wrote:
>>> The problem I see with that statement is that the zfs dev mailing lists 
>>> constantly and consistently follow the line that the data is always right 
>>> and there is no need for a “fsck” (which I actually get), but it’s used to 
>>> shut down every thread... the irony is I’m now installing windows 7 and SP1 on a 
>>> usb stick (well it’s actually installed, but sp1 isn’t finished yet) so I 
>>> can install a zfs data recovery tool which reports to be able to “walk the 
>>> data” to retrieve all the files...  the irony eh... install windows7 on a 
>>> usb stick to recover a FreeBSD installed zfs filesystem...  will let you 
>>> know if the tool works, but as it was recommended by a dev I’m hopeful... 
>>> have another array (with zfs I might add) loaded and ready to go... if the 
>>> data recovery is successful I’ll blow away the original machine and work 
>>> out what OS and drive setup will be safe for the data in the future.  I 
>>> might even put FreeBSD and zfs back on it, but if I do it won’t be in the 
>>> current Zraid2 config.
>> Meh.
>> 
>> Hardware failure is, well, hardware failure.  Yes, power-related
>> failures are hardware failures.
>> 
>> Never mind the potential for /software /failures.  Bugs are, well,
>> bugs.  And they're a real thing.  Never had the shortcomings of UFS bite
>> you on an "unexpected" power loss?  Well, I have.  Is ZFS absolutely
>> safe against any such event?  No, but it's safe*r*.
> 
> Yes and no ... I'll explain...
> 
>> 
>> I've yet to have ZFS lose an entire pool due to something bad happening,
>> but the same basic risk (entire filesystem being gone)
> 
> Every time I have seen this issue (and it's been more than once, though until 
> now recoverable, even if extremely painful), it's always been during a 
> resilver of a failed drive with something else happening: a panic, another 
> drive failure, power loss, etc. Any other time it's rock solid, which is the 
> yes and no. Under normal circumstances zfs is very, very good and seems as 
> safe as or safer than UFS, but my experience is that ZFS has one really bad 
> flaw: if there is corruption in the metadata, even if the stored data is 
> 100% correct, it will fault the pool and that's it, it's gone, barring some 
> luck and painful recovery (backups aside). Other file systems also suffer 
> from this, but there are tools that *the majority of the time* will get you 
> out of the s**t with little pain.  Barring this Windows-based tool I haven't 
> been able to run yet, zfs appears to have nothing.
> 
>> has occurred more
>> than once in my IT career with other filesystems -- including UFS, lowly
>> MSDOS and NTFS, never mind their predecessors all the way back to floppy
>> disks and the first 5Mb Winchesters.
> 
> Absolutely, been there done that.. and btrfs...*ouch* still as bad.. however 
> with the only one btrfs install I had (I didn't know it was btrfs 
> underneath, but netgear NAS...) I was still able to recover the data even 
> though it had screwed the file system so bad I vowed never to consider or use 
> it again on anything ever...
> 
>> 
>> I learned a long time ago that two is one and one is none when it comes
>> to data, and WHEN two becomes one you SWEAT, because that second failure
>> CAN happen at the worst possible time.
> 
> and does..
> 
>> 
>> As for RaidZ2 .vs. mirrored it's not as simple as you might think.
>> Mirrored vdevs can only lose one member per mirror set, unless you use
>> three-member mirrors.  That sounds insane but actually it isn't in
>> certain circumstances, such as very-read-heavy and high-performance-read
>> environments.
> 
> I know - this is why I don't use mirrored - because wear patterns will ensure 
> both sides of the mirror are closely matched.
> 
>> 
>> The short answer is that a 2-way mirrored set is materially faster on
>> reads but has no acceleration on writes, and can lose one member per
>> mirror.  If the SECOND one fails before you can resilver, and that
>> resilver takes quite a long while if the disks are large, you're dead.
>> However, if you do six drives as a 2x3 way mirror (that is, 3 vdevs each
>> of a 2-way mirror) you now have three parallel data paths going at once
>> and potentially six for reads -- and performance is MUCH better.  A
>> 3-way mirror can lose two members (and could be organized as 3x2) but
>> obviously requires lots of drive slots, 3x as much *power* per gigabyte
>> stored (and you pay for power twice; once to buy it and again to get the
>> heat out of the room where the machine is.)
> 
> my problem (as always) is slots not

Re: CFT: FreeBSD Package Base

2019-04-29 Thread Joe Maloney
With the CFT version you choose to build and package individual components such 
as sendmail with a port option.  That does not entirely solve the problem of 
being able to reinstall sendmail after the fact without a rebuild of the 
userland (base) port, but perhaps base flavors could solve that problem, 
assuming flavors could extend beyond python.

Joe Maloney
Quality Engineering Manager / iXsystems
Enterprise Storage & Servers Driven By Open Source

> On Apr 29, 2019, at 3:31 PM, Cy Schubert  wrote:
> 
> In message <201904291441.x3tefmid072...@gndrsh.dnsmgr.net>, "Rodney W. 
> Grimes"
> writes:
>>> On Mon, Apr 29, 2019 at 10:09 AM Rodney W. Grimes <
>>> freebsd-...@gndrsh.dnsmgr.net> wrote:
>>> 
>>>>> 
>>>>> Correct, this is ZFS only. And it's something we're using specific to
>>>>> FreeNAS / TrueOS, which is why I didn't originally mention it as a part of
>>>>> our CFT.
>>>> 
>>>> Then please it is "CFT: FreeNAS/TrueOS pkg base, ZFS only",
>>>> calling this FreeBSD pkg base when it is not was wrong,
>>>> and misleading.
>>>> 
>>> 
>>> Sorry, I disagree.
>> Which is fine.
>> 
>>> This pkg base is independent of the ZFS tool we're using
>>> to wrangle boot-environments. Hence why it wasn't mentioned in the CFT.
>>> These base packages work the same as existing in-tree pkg base on UFS, no
>>> difference. If anything are probably safer due to being able to update all
>>> of userland in single extract operation, so you don't have out of order
>>> extraction of libc or some such.
>> 
>> You missed the major string change and focused on the edge,
>> No comment on calling iXsystems :stuff: FreeBSD instead of FreeNAS/TrueOS?
>> 
>> That was the major point of my statement, you're misleading the user
>> community, you yourself said this would never be imported into FreeBSD
>> base, so I see no reason that it should be called "FreeBSD package Base",
>> as it is not, that is a different project.
> 
> Taking the last comment on this thread to ask a question and maybe 
> refocus a little.
> 
> The discussion about granularity begs the question, why pkgbase in the 
> first place? My impression was that it allowed people to select which 
> components they wanted to either create a lean installation or mix and 
> match base packages and ports (possibly with flavours to install in 
> /usr rather than $LOCALBASE) such that maybe person A wanted a stock 
> install while person B wanted to replace, picking a random example, BSD 
> tar with GNU tar. Isn't that the real advantage of pkgbase?
> 
> If OTOH it's binary updates V 2.0, what's the point? I'm a little 
> rhetorical here but you get my point. If I want ipfw instead pf or 
> ipfilter instead of the others I should have the freedom. Similarly if 
> I want vim instead of vi I should have the choice to install vim as 
> /usr/bin/vi. Otherwise all the effort to replace binary updates makes 
> no sense.
> 
> 
> -- 
> Cheers,
> Cy Schubert 
> FreeBSD UNIX: Web:  http://www.FreeBSD.org
> 
>   The need of the many outweighs the greed of the few.
> 
> 
> ___
> freebsd-curr...@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"



Re: Panic on 11-STABLE with Xen guest

2018-11-26 Thread Joe Clarke
On 11/26/18 13:31, John Baldwin wrote:
> On 11/22/18 12:39 PM, Joe Clarke wrote:
>> I believe after the commit 340016 for the dynamic IRQ layout, my Xen VM
>> started to panic.  I just upgraded the kernel today and saw this:
>>
>> xen: unable to map IRQ#2
>> panic: Unable to register interrupt override
>> cpuid = 0
>> KDB: stack backtrace:
>> #0 0x8060a4e7 at kdb_backtrace+0x67
>> #1 0x805c3787 at vpanic+0x177
>> #2 0x805c3603 at panic+0x43
>> #3 0x8093a766 at madt_parse_ints+0x96
>> #4 0x803353f9 at acpi_walk_subtables+0x29
>> #5 0x8093a5e6 at xenpv_register_pirqs+0x56
>> #6 0x80928296 at intr_init_sources+0x116
>> #7 0x8055eba8 at mi_startup+0x118
>> #8 0x8029902c at btext+0x2c
>>
>> The following kernel works:
>>
>> @(#)FreeBSD 11.2-STABLE #4: Thu Nov  1 02:24:07 EDT 2018
>> FreeBSD 11.2-STABLE #4: Thu Nov  1 02:24:07 EDT 2018
>> root@creme-brulee:/usr/obj/usr/src/sys/CREME-BRULEE
>>
>> The following kernel produces the panic above immediately on boot:
>>
>> @(#)FreeBSD 11.2-STABLE #5: Wed Nov 21 11:08:38 EST 2018
>> FreeBSD 11.2-STABLE #5: Wed Nov 21 11:08:38 EST 2018
>> root@creme-brulee:/usr/obj/usr/src/sys/CREME-BRULEE
>>
>> Attached is a screen grab of the console of the panic.
> 
> Hmm, I don't see any obvious candidates of Xen changes that weren't included
> in the MFC.  I've added royger@ (who maintains Xen in FreeBSD) to the cc to
> see if he has an idea.
> 
> Roger, the main changes that aren't MFC'd to 11 from 12/head seem to be some
> refcounting on event channels and PVHv2 vs PVHv1?

Thanks for the follow-up, John.  Apparently, there was an incomplete
MFC.  Roger added the missing bit today in r340982 which resolved the panic.

Joe



Re: Panic on 11-STABLE with Xen guest

2018-11-26 Thread Joe Clarke
On 11/25/18 18:22, Richard M.Timoney wrote:
> I have the same failure to boot 11-stable as a DomU host on xen_version:
> 4.4.1
> 
> 
> Kernel I was trying was recent, FreeBSD 11.2-STABLE (GENERIC) #23
> r334205:340834
> 
> 
> commit 340016 for the dynamic IRQ layout seems rather involved and I doubt I 
> could isolate the problem, but maybe it is in

Yep.  This is what I believe as well.  I'm using Xen 3.4 with RootBSD.

Joe

> 
> 
> 338631:
> xen: legacy PVH fixes for the new interrupt count
> 
> Register interrupts using the PIC pic_register_sources method instead
> of doing it in apic_setup_io. This is now required, since the internal
> interrupt structures are not yet setup when calling apic_setup_io.
> 
> -- 
> Richard M. Timoney
>   (richa...@maths.tcd.ie)   Tel. +353-1-896 1196
> School of Mathematics, Trinity College, Dublin 2, Ireland
> WWW https://www.maths.tcd.ie/~richardt  FAX  +353-1-896 2282
> 




Panic on 11-STABLE with Xen guest

2018-11-22 Thread Joe Clarke
I believe after the commit 340016 for the dynamic IRQ layout, my Xen VM
started to panic.  I just upgraded the kernel today and saw this:

xen: unable to map IRQ#2
panic: Unable to register interrupt override
cpuid = 0
KDB: stack backtrace:
#0 0x8060a4e7 at kdb_backtrace+0x67
#1 0x805c3787 at vpanic+0x177
#2 0x805c3603 at panic+0x43
#3 0x8093a766 at madt_parse_ints+0x96
#4 0x803353f9 at acpi_walk_subtables+0x29
#5 0x8093a5e6 at xenpv_register_pirqs+0x56
#6 0x80928296 at intr_init_sources+0x116
#7 0x8055eba8 at mi_startup+0x118
#8 0x8029902c at btext+0x2c

The following kernel works:

@(#)FreeBSD 11.2-STABLE #4: Thu Nov  1 02:24:07 EDT 2018
FreeBSD 11.2-STABLE #4: Thu Nov  1 02:24:07 EDT 2018
root@creme-brulee:/usr/obj/usr/src/sys/CREME-BRULEE

The following kernel produces the panic above immediately on boot:

@(#)FreeBSD 11.2-STABLE #5: Wed Nov 21 11:08:38 EST 2018
FreeBSD 11.2-STABLE #5: Wed Nov 21 11:08:38 EST 2018
root@creme-brulee:/usr/obj/usr/src/sys/CREME-BRULEE

Attached is a screen grab of the console of the panic.

Joe
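
For comparing this panic with other reports, it can help to reduce the backtrace above to its chain of function names. The Python sketch below does that for the frames exactly as they are printed in this message. The regular expression assumes the "#N 0xADDR at symbol+0xOFFSET" layout seen here and is not a general KDB parser.

```python
import re

# One frame per line, in the layout printed above:
#   #3 0x8093a766 at madt_parse_ints+0x96
FRAME_RE = re.compile(r"^#(\d+)\s+(0x[0-9a-f]+)\s+at\s+(\w+)\+(0x[0-9a-f]+)$")


def parse_backtrace(text):
    """Return a list of (frame_number, symbol, offset) tuples."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append((int(m.group(1)), m.group(3), int(m.group(4), 16)))
    return frames


TRACE = """\
#0 0x8060a4e7 at kdb_backtrace+0x67
#1 0x805c3787 at vpanic+0x177
#2 0x805c3603 at panic+0x43
#3 0x8093a766 at madt_parse_ints+0x96
#4 0x803353f9 at acpi_walk_subtables+0x29
#5 0x8093a5e6 at xenpv_register_pirqs+0x56
#6 0x80928296 at intr_init_sources+0x116
#7 0x8055eba8 at mi_startup+0x118
#8 0x8029902c at btext+0x2c
"""

frames = parse_backtrace(TRACE)
# Print the call path from the outermost frame down to the panic.
print(" -> ".join(symbol for _, symbol, _ in reversed(frames)))
```

Read from the outermost frame inward, the path goes through xenpv_register_pirqs and madt_parse_ints before reaching panic, which matches the "Unable to register interrupt override" message.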



Re: drm / drm2 removal in 12

2018-08-27 Thread Joe Maloney
Thanks for the drm-next efforts.  I could not, and would not be using
FreeBSD without it.

Joe Maloney

On Mon, Aug 27, 2018 at 5:58 AM Thomas Mueller  wrote:

> Excerpt from Oliver Pinter:
>
> > Let's take a few more steps backwards and see how graphics driver
> > development works from the corporation side.
> > They do not bother about any of the BSDs; they focus only on Windows and
> > Linux. If you want to use anything recent (haha recent, something after
> > 2014), you are forced to use new drivers from Linux.
> > The advantage on the Linux side is the zillions of corporately paid
> > kernel developers.
> > They can just focus on new hw support; on the FreeBSD side, there are no
> > corporately paid drm driver developers. Sadly.
> > In the Linux world their internal KPI (try a Google search for the words
> > "stable API nonsense") moves so fast that porting these drivers gets
> > non-trivial without a dedicated paid team.
>
> > If you want to change this situation, try to learn how you could help, or
> > send directed donations to the FreeBSD Foundation. ;)
>
> Linux and FreeBSD are not the only open-source OSes.
>
> There is also (Net, Open, DragonFly)BSD, Haiku, OpenIndiana and others.
>
> Maybe better would be for the hardware manufacturers to release more
> general specifications that could be adapted to any OS, by the NetBSD
> developers, Haiku developers, etc.  Certainly not to ignore Linux.
>
> Tom


Re: Jenkins build is still unstable: FreeBSD_stable_10 #302

2016-07-10 Thread Joe Shevland
Small qualifier, I have had trouble with that. But not since, my build 
issues have been without that.


https://www.youtube.com/watch?v=I9MZNEXrElw


On 10/07/2016 9:17 PM, Joe Shevland wrote:
(My foot-shooting moments have involved LibreSSL and tomcat-native. 
I've removed them since).


On 10/07/2016 8:30 PM, Joe Shevland wrote:
I'm wondering if it's my build process where I'm seeing issues. I 
have been tracking -stable on a spare machine lately, and I've had 
about 60% success rate on a full build world/kernel etc. (following 
UPDATING instructions) on the times I do it. Been a few foot-shooting 
moments, but those aside, still what look to be a few just broken 
builds.


Typically to resolve this, I'd just 'svnlite up' in /usr/src, and 
rebuild, and it works fine (this little Atom/Shuttle doesn't compile 
things too quickly, so that's a window of 6 hours at least).


Normally, I'm used to a gated commit system i.e. you commit changes, 
the change/s in question compiles successfully (with any other 
changes that have been committed by others), and only then those 
changes are promoted to another branch or tag (where they should 
compile w/o problems).


Is that what happens, or am I doing things wrong? I follow that 
little chunk down the bottom of UPDATING normally to do a full 
world/kernel build.


Cheers,
Joe



On 10/07/2016 5:59 PM, jenkins-ad...@freebsd.org wrote:

See <https://jenkins.FreeBSD.org/job/FreeBSD_stable_10/302/>



Re: Jenkins build is still unstable: FreeBSD_stable_10 #302

2016-07-10 Thread Joe Shevland
(My foot-shooting moments have involved LibreSSL and tomcat-native. I've 
removed them since).


On 10/07/2016 8:30 PM, Joe Shevland wrote:
I'm wondering if it's my build process where I'm seeing issues. I have 
been tracking -stable on a spare machine lately, and I've had about 
60% success rate on a full build world/kernel etc. (following UPDATING 
instructions) on the times I do it. Been a few foot-shooting moments, 
but those aside, still what look to be a few just broken builds.


Typically to resolve this, I'd just 'svnlite up' in /usr/src, and 
rebuild, and it works fine (this little Atom/Shuttle doesn't compile 
things too quickly, so that's a window of 6 hours at least).


Normally, I'm used to a gated commit system i.e. you commit changes, 
the change/s in question compiles successfully (with any other changes 
that have been committed by others), and only then those changes are 
promoted to another branch or tag (where they should compile w/o 
problems).


Is that what happens, or am I doing things wrong? I follow that little 
chunk down the bottom of UPDATING normally to do a full world/kernel 
build.


Cheers,
Joe



On 10/07/2016 5:59 PM, jenkins-ad...@freebsd.org wrote:

See <https://jenkins.FreeBSD.org/job/FreeBSD_stable_10/302/>



Re: Jenkins build is still unstable: FreeBSD_stable_10 #302

2016-07-10 Thread Joe Shevland
I'm wondering if it's my build process where I'm seeing issues. I have 
been tracking -stable on a spare machine lately, and I've had about 60% 
success rate on a full build world/kernel etc. (following UPDATING 
instructions) on the times I do it. Been a few foot-shooting moments, 
but those aside, still what look to be a few just broken builds.


Typically to resolve this, I'd just 'svnlite up' in /usr/src, and 
rebuild, and it works fine (this little Atom/Shuttle doesn't compile 
things too quickly, so that's a window of 6 hours at least).


Normally, I'm used to a gated commit system i.e. you commit changes, the 
change/s in question compiles successfully (with any other changes that 
have been committed by others), and only then those changes are promoted 
to another branch or tag (where they should compile w/o problems).


Is that what happens, or am I doing things wrong? I follow that little 
chunk down the bottom of UPDATING normally to do a full world/kernel build.


Cheers,
Joe



On 10/07/2016 5:59 PM, jenkins-ad...@freebsd.org wrote:

See <https://jenkins.FreeBSD.org/job/FreeBSD_stable_10/302/>



Re: ia64 stable/10 r286316: hang at Entering /boot/kernel/kernel

2015-08-28 Thread Joe Shevland
To add a very small (useless) data point to this, I have an atom device 
that, very occasionally, hangs before the boot stage (at the little 
slash, prior to the daemon boot menu offering you the chance to select 
another kernel etc).


I haven't worked out the rhyme or reason yet, so it's probably a red 
herring, but it's frustrated me when I have to dig out the monitor and 
keyboard again. At least it did with 10.1-release; I've yet to have it happen 
with stable.


Cheers,
Joe

On 28/08/2015 8:30 PM, Anton Shterenlikht wrote:

From kostik...@gmail.com Thu Aug 27 18:22:37 2015

On Thu, Aug 27, 2015 at 01:12:16PM +0100, Anton Shterenlikht wrote:

ia64 stable/10 r286315 boots, but
r286316 hangs at "Entering /boot/kernel/kernel".

Please advise

To state an obvious thing.  The commit which you pointed to, changes
the code which is not executed at that early kernel boot stage.  The
revision cannot cause the consequences you described.

yes, I'm surprised too.


I think that you either have a build-environment issue which randomly pops
up, or there is some other boot-time issue which is sporadic.  The only
suggestion I have is to try many boots with kernels which look either good
or bad; I would not be surprised if the statistics turned out completely
different from a binary good/bad outcome.

Otherwise, I do not have an idea.


I doubt it's a random or a sporadic issue.
I did a bisection, as suggested, during which
I built world/kernel on 7 revisions, and when I
narrowed it down to <50, a further 4 kernels.
All kernels <=286315 boot, all kernels >= 286316
do not. I think if it were something random,
it wouldn't be such a clear cut picture.
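
For anyone repeating this, the bisection described above can be sketched
as a binary search for the first bad revision. This is only an
illustration, not the procedure Anton actually ran: is_bad() is a
hypothetical callback standing in for "build world/kernel at this
revision, boot it, and report whether it hangs", and the sketch assumes
every revision after the first bad one is also bad.

```python
def first_bad_revision(good, bad, is_bad):
    """Binary-search for the first revision where is_bad() returns True.

    Assumes `good` is known to boot, `bad` is known to hang, and that
    the failure never disappears again after the first bad revision.
    `is_bad` is a hypothetical callback: build, boot, report the result.
    Returns (first bad revision, number of builds that were needed).
    """
    if good >= bad:
        raise ValueError("good must be an earlier revision than bad")
    builds = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        builds += 1
        if is_bad(mid):
            bad = mid      # the breakage is at mid or earlier
        else:
            good = mid     # the breakage is after mid
    return bad, builds
```

With a made-up starting range such as 286000 (good) to 286400 (bad), the
search needs at most nine builds to land on a single revision. If the
hang were sporadic, as suggested earlier in the thread, is_bad() would
have to boot each kernel several times before answering, otherwise the
search can converge on the wrong revision.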

What about my loader.conf:

# cat /boot/loader.conf
zfs_load="YES"
# soft limits
kern.dfldsiz=536748032  # default soft limit for process data
kern.dflssiz=536748032  # default soft limit for stack
# hard limits
kern.maxdsiz=536748032  # hard limit for process data
kern.maxssiz=536748032  # hard limit for stack
kern.maxtsiz=536748032  # hard limit for text size
 # processes may not exceed these limits.
#

My memory:

real memory  = 8589934592 (8192 MB)
avail memory = 8387649536 (7999 MB)

I'll try disabling all these settings in loader.conf
and see if it makes a difference.
But these settings have been there for a few years
with no problems.

Anton




mps in GENERIC in FreeBSD 9.2R i386

2013-10-03 Thread Joe Greco
Did nobody ever verify this for Ken?

> On Mon, Oct 01, 2012 at 23:38:33 +0530, Desai, Kashyap wrote:
> > 
> > 
> > > -Original Message-
> > > From: owner-freebsd-stable at freebsd.org [mailto:owner-freebsd-
> > > stable at freebsd.org] On Behalf Of Kenneth D. Merry
> > > Sent: Monday, October 01, 2012 8:58 PM
> > > To: John Baldwin
> > > Cc: Harald Schmalzbauer; freebsd-stable at freebsd.org
> > > Subject: Re: mps in GENERIC, only in amd64? (RELENG_9_1)
> > > 
> > > On Mon, Oct 01, 2012 at 08:49:36 -0400, John Baldwin wrote:
> > > > On Saturday, September 29, 2012 5:58:42 am Harald Schmalzbauer wrote:
> > > > >  Hello,
> > > > >
> > > > > > accidentally I saw that mps is included in sys/amd64/conf/GENERIC,
> > > > > > but not in sys/i386/conf/GENERIC.
> > > > > > Is this intended?
> > > >
> > > > > Have you tested it on i386?  From the log message, Ken (cc'd) only
> > > > > added it on amd64 as it hadn't been tested on i386.
> > > 
> > > That was certainly the case two years ago.  Since then, though, I think
> > > the LSI folks have tested it on i386.  If we get reports of success
> > > using it on i386, I don't see any issue with putting it in GENERIC.
> > 
> > YES, LSI has tested the i386 arch on different released FreeBSD versions in the
> > 7.x, 8.x and 9.x series.
> > 
> 
> That confirms it.  I'll go ahead and check it into head if someone with an
> i386 build environment can confirm that the driver in head builds properly
> on i386.
> 
> Thanks,
> 
> Ken

It seems to compile cleanly on i386.  I don't have an easy way to test
it though (only compiling in a VM).

... JG
-- 
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I
won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.


Re: 9.2-PRE: switch off that stupid "Nakatomi Socrates"

2013-09-30 Thread Joe Holden

On 30/09/2013 14:50, Matthieu Volat wrote:


On 30 Sept 2013, at 01:54, Ricardo Ferreira wrote:


On 29-09-2013 19:11, Charles Sprickman wrote:

On Sep 29, 2013, at 3:28 PM, C. P. Ghost wrote:


On 28.09.2013 11:32, Phil Regnauld wrote:

Teske, Devin (Devin.Teske) writes:

If you work seriously on serious issues long enough... you'll become burned-
out. Let me just come right out and say it...

I coded it.

And thanks, you got me chuckling - nice to see some humor once in a while.

To the offended poster: read the last line of tunefs(8) - there are probably
many more places where you could spend serious time looking for deviations
from corporate correctness.

Humor can even be etched in silicon, like e.g. on an IC created by Siemens:

http://micro.magnet.fsu.edu/creatures/pages/bunny.html

Cisco too, besides weird Star Wars ROM messages, you have stuff like the
"BFR" (Big F*cking Router, after Big F*cking Gun in Doom) screened on the PCB:

https://www.kumari.net/gallery/index.php/Technology/Networking/BFR_2_001
https://www.kumari.net/gallery/index.php/Technology/Networking/BFR_2

I have no idea what Sluggo and Nancy are doing on this board:

https://www.kumari.net/gallery/index.php/Technology/Networking/CIMG0988

Charles


;-)

-cpghost.

--
Cordula's Web. http://www.cordula.ws/



Keep it cool, you have others like:



man chmod...

BUGS
 There is no perm option for the naughty bits of a horse.

and so many others. So...



I find it strange nobody mentioned the one in make :)

% make love
Not War.

-- Mazhe


Alas, not for much longer as bmake doesn't handle that target:

root@build:/pseudosrc/misc # make love
make: don't know how to make love. Stop



Re: FreeBSD 9.1 ix driver vlan problem

2013-09-27 Thread Joe Holden

On 25/09/2013 22:10, Dmitry Morozovsky wrote:

On Wed, 25 Sep 2013, Rumen Telbizov wrote:


Thanks for the heads-up Oleg, although not the news that I was hoping for.

So what I am going to do right now is reinstall with 9.2 and recompile the
driver with your patch.
I'll come back to the list with my results.


FWIW, we're (with oleg@, yeah) using this patch on stable/9, so you're welcome
to test this on your 9

It's supposedly way too late to try to include this fix into 9.2-R, but maybe
it's worth the errata notice...


This happens on several other intel chipsets as well, no previous errata 
was ever noted (legacy em, for example) :(



Re: virtio for 9.1-R

2012-11-27 Thread Joe Holden

On 27/11/2012 23:22, Bryan Venteicher wrote:

Hi,

- Original Message -

From: "Joe Holden" 
To: "Sergey Kandaurov" 
Cc: freebsd-stable@freebsd.org
Sent: Tuesday, November 27, 2012 2:49:07 PM
Subject: Re: virtio for 9.1-R

On 27/11/2012 19:25, Sergey Kandaurov wrote:

On 27 November 2012 22:12, Joe Holden  wrote:

Hi guys,

I can't see virtio in releng/9.1, is there any particular reason why it
isn't going to be included given that it works reasonably well (and is
optional anyway, so not likely to be detrimental)?


virtio appeared in stable/9 a bit after the 9.1 cut-off,
and it is too late now regardless of virtio's shape.
Anyway, you can install it from ports.


Ah I see, doesn't really help all the people who can't install it in KVM
and such though unfortunately, seems silly making them wait even
longer and having to use Linux :)



Yes - it is long overdue and something I plan to fix in the next
month. There have been off-list patches floating around that do
just that.

I also plan to spend my spare time in Dec. to work on FreeBSD
VirtIO improvements/bugs/nags. I've been busy with $JOB and have
been busy finishing up a VMware vmxnet driver.

Bryan


cheers


Sounds good, FWIW I've been using it for a while and it works rather 
well (on 9.0-R), of course this requires that the KVM instance can be 
switched to ide mode first (or a custom iso/image uploaded which isn't 
always possible)



Re: virtio for 9.1-R

2012-11-27 Thread Joe Holden

On 27/11/2012 19:25, Sergey Kandaurov wrote:

On 27 November 2012 22:12, Joe Holden  wrote:

Hi guys,

I can't see virtio in releng/9.1, is there any particular reason why it
isn't going to be included given that it works reasonably well (and is
optional anyway, so not likely to be detrimental)?


virtio appeared in stable/9 a bit after the 9.1 cut-off,
and it is too late now regardless of virtio's shape.
Anyway, you can install it from ports.

Ah I see, doesn't really help all the people who can't install it in KVM 
and such though unfortunately, seems silly making them wait even
longer and having to use Linux :)

cheers


virtio for 9.1-R

2012-11-27 Thread Joe Holden

Hi guys,

I can't see virtio in releng/9.1, is there any particular reason why it 
isn't going to be included given that it works reasonably well (and is 
optional anyway, so not likely to be detrimental)?


Thanks,
Joe


Re: Checksum errors across ZFS array

2012-07-19 Thread Dr Joe Karthauser
Hi James,

It's almost definitely a memory problem. I'd change it ASAP if I were you.

I lost about 70 MB from my ZFS pool for this very reason just a few weeks ago. 
Luckily I had enough snapshots from before the rot set in to recover most of 
what I lost.

Joe

-- 
Dr Joe Karthauser

On 19 Jul 2012, at 16:29, James Snow  wrote:

> I have a ZFS server on which I've seen periodic checksum errors on
> almost every drive. While scrubbing the pool last night, it began to
> report unrecoverable data errors on a single file.
> 
> I compared an md5 of the supposedly corrupted file to an md5 of the
> original copy, stored on different media. They were the same, suggesting
> no corruption.
> 
> A large file was being written to the pool while the scrub was in
> progress, and the entire array became unresponsive. The OS was still up,
> but 'zpool status' showed the scrub progress stuck at the same spot,
> with the throughput rate falling. 'shutdown -r now' stalled. Eventually
> I hard power cycled the system.
> 
> Now, attempting to read the file that ZFS reports errors on yields
> "Input/output error." The scrub completed, with the following result:
> 
>     NAME         STATE     READ WRITE CKSUM
>     tank         ONLINE       0     0     7
>       mirror-0   ONLINE       0     0     0
>         aacd0p1  ONLINE       0     0     0
>         aacd4p1  ONLINE       0     0     1
>       mirror-1   ONLINE       0     0     0
>         aacd1p1  ONLINE       0     0     0
>         aacd5p1  ONLINE       0     0     0
>       mirror-2   ONLINE       0     0    14
>         aacd2p1  ONLINE       0     0    14
>         aacd6p1  ONLINE       0     0    14
>       mirror-3   ONLINE       0     0     0
>         aacd3p1  ONLINE       0     0     0
>         aacd7p1  ONLINE       0     0     0
> 
> The system configuration is as follows:
> 
> Controller:  Adaptec 2805 
> Motherboard: Supermicro X8STE
> Drive Cage:  2x Supermicro CSE-M35T-1
> Memory:  2x Kingston 12GB ECC (KVR1066D3E7SK3/12G)
> PSU: Nexus RX-7000
> OS:  9.0-RELEASE-p3
> ZFS: ZFS filesystem version 5, ZFS storage pool version 28
> 
> 
> The Adaptec card has 2 ports, each of which uses a 4-port fan-out cable.
> The cables are routed as shown:
> 
>  /--- aacd0 (ST1000DM003-9YN1 CC4D)
> / /-- aacd1 (ST1000DM003-9YN1 CC4D)
> p1-
> \ \-- aacd2 (WDC WD1001FALS-0 05.0)
>  \--- aacd3 (WDC WD1001FALS-0 05.0)
> 
>  /--- aacd4 (ST1000DM003-9YN1 CC4D)
> / /-- aacd5 (ST1000DM003-9YN1 CC4D)
> p2-
> \ \-- aacd6 (WDC WD1002FAEX-0 05.0)
>  \--- aacd7 (WDC WD1002FAEX-0 05.0)
> 
> You can see that each ZFS mirror device is comprised of one drive from
> each drive carrier, on separate ports, on separate cables.
> 
> Since I have seen periodic checksum errors on almost every drive but the
> only common components are the Adaptec controller and the motherboard, I
> suspect the controller. (Or the motherboard, but I'm starting with the
> controller since it's much simpler to swap out.)
> 
> Could it be something else? What else should I be looking at? Any input
> greatly appreciated.
> 
> 
> -Snow
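
The md5 comparison James describes (suspect file versus the original
copy on other media) can be sketched as below. This is a minimal
illustration, not what he actually ran; the function names are made up
for the example, and any paths passed in would be your own.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large files are never held in memory whole.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True if both files hash to the same MD5 digest."""
    return md5_of(path_a) == md5_of(path_b)
```

A read that fails the way the file later did ("Input/output error")
would surface here as an OSError rather than a mismatch. A matching
digest only says that this particular read returned good data; it does
not rule out an intermittent fault elsewhere in the I/O path.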
> 


Re: kern.eventtimer.periodic

2012-03-31 Thread Joe Holden

Joe Holden wrote:

Hey,

So I have another box that has time issues since being upgraded to 
9.0-REL, again kern.eventtimer.periodic=1 seems to be the fix.


Should this perhaps be a default in future releases?


Sigh... correct list this time.


Re: New BSD Installer

2012-02-10 Thread Joe Holden

Joe Holden wrote:

Alex Samorukov wrote:

On 02/10/2012 06:56 PM, Joe Holden wrote:

Guys,

This should really be reverted to sysinstall until the new installer 
is at least in a state where it consistently works... the most 
important part of a new user's experience is the installer, and for the few 
new installs I have done lately I've just installed 8.2 and upgraded 
from there as the new installer is terribly buggy.



Hi,

I am highly against reverting. Old installer is not GPT aware and in 
fact is unmaintained for a very long time.



True, there is that.

About ftp - it probably needs to be handled better, but most 
users, I think, use the CD/DVD image, so it is not an issue. And the new 
installer is written in shell, so I think it's better to fix the broken 
parts than to revert to outdated and unmaintained code.


True also perhaps, could be more user friendly though especially for 
people just installing it - I have been using my own install scripts and 
such since 5 but am giving the new installer a go at the moment...



P.S. I personally had no problems with the new installer, used it from DVD.

On a related note - does the new installer have any kind of config file 
for unattended installs a la sysinstall?



Re: New BSD Installer

2012-02-10 Thread Joe Holden

Alex Samorukov wrote:

On 02/10/2012 06:56 PM, Joe Holden wrote:

Guys,

This should really be reverted to sysinstall until the new installer 
is at least in a state where it consistently works... the most 
important part of a new user's experience is the installer, and for the few 
new installs I have done lately I've just installed 8.2 and upgraded 
from there as the new installer is terribly buggy.



Hi,

I am highly against reverting. Old installer is not GPT aware and in 
fact is unmaintained for a very long time.



True, there is that.

About ftp - it probably needs to be handled better, but most 
users, I think, use the CD/DVD image, so it is not an issue. And the new 
installer is written in shell, so I think it's better to fix the broken parts 
than to revert to outdated and unmaintained code.


True also perhaps, could be more user friendly though especially for 
people just installing it - I have been using my own install scripts and 
such since 5 but am giving the new installer a go at the moment...



P.S. I personally had no problems with the new installer, used it from DVD.


Re: New BSD Installer

2012-02-10 Thread Joe Holden

Joe Holden wrote:

Guys,

This should really be reverted to sysinstall until the new installer is 
at least in a state where it consistently works... the most important 
part of a new user's experience is the installer, and for the few new installs 
I have done lately I've just installed 8.2 and upgraded from there as 
the new installer is terribly buggy.


Few things:

- On my installs at least, if there is an unknown ftp connection problem 
the installer will just bail and say it has been aborted - this 
consistently happens when ftp.de is selected


- there is no method of stepping back through the install

- If a dhcp lease request times out manual configuration isn't offered

Another one I've just encountered several times:

For some reason the output for setting root password has new lines and 
lots of space between the various bits of text and isn't taking any 
input (see http://i.imgur.com/lTP5b.png)


The lack of installation progress or emergency shell on another terminal 
is also something that I think should be considered - being able to see 
what's going on and getting error output from the commands the installer 
is running is invaluable.




I realise that a lot of work has gone into it and it's nice and new, but 
really unless it's finished it shouldn't be the default.


Thanks,
J


New BSD Installer

2012-02-10 Thread Joe Holden

Guys,

This should really be reverted to sysinstall until the new installer is 
at least in a state where it consistently works... the most important 
part of a new user's experience is the installer, and for the few new installs 
I have done lately I've just installed 8.2 and upgraded from there as 
the new installer is terribly buggy.


Few things:

- On my installs at least, if there is an unknown ftp connection problem 
the installer will just bail and say it has been aborted - this 
consistently happens when ftp.de is selected


- there is no method of stepping back through the install

- If a dhcp lease request times out manual configuration isn't offered

I realise that a lot of work has gone into it and it's nice and new, but 
really unless it's finished it shouldn't be the default.


Thanks,
J


Re: Timekeeping in stable/9

2012-01-21 Thread Joe Holden

Ronald Klop wrote:
On Sat, 21 Jan 2012 14:11:51 +0100, Martin Sugioarto wrote:



On Sat, 21 Jan 2012 13:20:51 +0100, "Ronald Klop" wrote:


Hi,

As I understand it.
Host: FreeBSD 9
Guest: WinXP

Which one has troubles with its clock? The host or the guest or both?


Hi,

only inside VirtualBox, I think it's only an application problem and
my emails would be probably better addressed to ports@. ONLY the guest
is affected when host is loaded.

I noticed additionally:

You get better results with a desync'ed clock in the guest system, when
you start "openssl speed -multi 20" or similar. Within a few seconds the
clock gets a 20 seconds difference.


How many CPU's did you assign to the guest?
Did you install virtualbox guest additions to the guest?


Here a few details (guest additions are installed):

Memory size: 1600MB
Page Fusion: off
VRAM size:   256MB
HPET:on/off (tried both settings)
Chipset: piix3
Firmware:BIOS
Number of CPUs:  1
Synthetic Cpu:   off
CPUID overrides: None
[...]
ACPI:on
IOAPIC:  off
PAE: on
Time offset: 0 ms
RTC: local time
Hardw. virt.ext: on
Hardw. virt.ext exclusive: on
Nested Paging:   on
Large Pages: on
VT-x VPID:   on
[...]
3D Acceleration: off
2D Video Acceleration: on


Do you run NTP on the guest XP also? If yes, turn it off.


Windows XP default installation (synch'ed to time.windows.com).
Switching this off, does not have any influence. I think MS-Windows
does not do continuous synchronization, only at system start, I guess.


VBox guest additions can sync the guest clock with the host.


I'll try to deinstall them. But I somehow like my shared folder.


BTW: My experience with VBox is that it is nice for hobby stuff, but
not for heavy load server stuff. VMWare does a better job there.


Yes. I know. Still, VirtualBox is a nice and cheap solution.

--
Martin


BTW: I used VBox on Linux at work. Same problems. Different problems 
come and go with different versions of Linux in combination with 
different versions of VirtualBox. Using VMware ESXi solved it. If you 
search a lot on the VMware website you will find a free version.


Ronald.

In the extreme case I have here, the host isn't taxed at all, cpu, disk 
i/o and such are almost idle but the time is skewed dramatically regardless.


For reference the settings I have are:

4 VCPUS (4 physical cores)
1GB ram
ICH9, SAS controller

If I toggle the sysctl in my previous post the problem goes away, and 
doesn't return even if the sysctl is changed back... until a reboot of 
course.  None of the pre-9 guests (there are quite a few spread across a 
couple of identical machines) exhibit the behaviour, nor does this 
particular one when reverted to a pre-upgrade snapshot, so in this case 
it is certainly not the hardware but whatever is used to keep track of 
the "ticks" (terminology probably incorrect)


Thanks,
J


Re: Timekeeping in stable/9

2012-01-19 Thread Joe Holden

Joe Holden wrote:

Chuck Swiger wrote:

On Jan 19, 2012, at 12:18 PM, Joe Holden wrote:
Sounds like you were looking for commercial support, since unpaid 
volunteers don't have an obligation to promptly leap out and provide 
solutions within your ETA.
Not really, just an acknowledgement would be fine.  It is what it is, 
every day I try to argue in favour of the project, I still use it for 
myself everywhere but increasingly things happen that just don't on 
other volunteer projects... it's rather difficult to argue the case 
when they can install Ubuntu or whatever nonsense distro is the 
current favourite and it just works.  Just a bit more accurate info 
would solve it, if it doesn't do X reliably, or Y has changed, note it.


You asked a question and got two or three responses back in a day.  
You mentioned trying different timekeeping choices, but I don't recall 
seeing what your kern.timecounter sysctl values looked like; without 
that, folks are missing info that is likely to be relevant.


Ah, well

Regards,
Yeah my gripe isn't with having no responses, the handful of people that 
have responded have been helpful but ultimately no responses from anyone 
involved.  Just a one liner saying "we changed the timecounter stuff in 
9, look at sysctl tree X" would have been more than sufficient, this 
sort of thing should really be mentioned in the relnotes though...


For the record though, setting kern.eventtimer.periodic to 1 fixes the 
problem on all affected machines (returns my virtualbox guest to 
normality, reduces the drift on physical machines to 8.2 figures).


FWIW, I can't even see any notes relating to this in UPDATING either.
I should probably clarify here that some responses were received from 
the maintainers (eg: Qing for mpath) for a couple of bits of code but 
the wider issues weren't discussed further.  I'm not trying to say that 
no effort is made, but as a whole for the project to be comparable to 
the alternatives this sort of thing shouldn't happen.





Re: Timekeeping in stable/9

2012-01-19 Thread Joe Holden

Chuck Swiger wrote:

On Jan 19, 2012, at 12:18 PM, Joe Holden wrote:

Sounds like you were looking for commercial support, since unpaid volunteers 
don't have an obligation to promptly leap out and provide solutions within your 
ETA.

Not really, just an acknowledgement would be fine.  It is what it is, every day 
I try to argue in favour of the project, I still use it for myself everywhere 
but increasingly things happen that just don't on other volunteer projects... 
it's rather difficult to argue the case when they can install Ubuntu or 
whatever nonsense distro is the current favourite and it just works.  Just a 
bit more accurate info would solve it, if it doesn't do X reliably, or Y has 
changed, note it.


You asked a question and got two or three responses back in a day.  You 
mentioned trying different timekeeping choices, but I don't recall seeing what 
your kern.timecounter sysctl values looked like; without that, folks are 
missing info that is likely to be relevant.

Ah, well

Regards,
Yeah my gripe isn't with having no responses, the handful of people that 
have responded have been helpful but ultimately no responses from anyone 
involved.  Just a one liner saying "we changed the timecounter stuff in 
9, look at sysctl tree X" would have been more than sufficient, this 
sort of thing should really be mentioned in the relnotes though...


For the record though, setting kern.eventtimer.periodic to 1 fixes the 
problem on all affected machines (returns my virtualbox guest to 
normality, reduces the drift on physical machines to 8.2 figures).
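
Since the runtime setting is lost at reboot, one way to keep it would be
a name=value line in /etc/sysctl.conf. The helper below is a
hypothetical sketch of adding that line without duplicating it; the
thread itself only mentions changing the sysctl by hand, and
set_sysctl_line() is a name made up for this example.

```python
def set_sysctl_line(conf_text, name, value):
    """Return sysctl.conf-style text with `name=value` set exactly once.

    Existing entries for `name` are replaced, comments and unrelated
    lines are kept as they are, and the entry is appended if missing.
    """
    wanted = f"{name}={value}"
    out = []
    replaced = False
    for line in conf_text.splitlines():
        setting = line.split("#", 1)[0].strip()   # ignore trailing comments
        key = setting.split("=", 1)[0].strip()
        if setting and key == name:
            if not replaced:
                out.append(wanted)
                replaced = True
            # any further duplicate entries are dropped
        else:
            out.append(line)
    if not replaced:
        out.append(wanted)
    return "\n".join(out) + "\n"
```

Called with the contents of the file and ("kern.eventtimer.periodic", 1),
it returns the new text to write back; running it a second time changes
nothing.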


FWIW, I can't even see any notes relating to this in UPDATING either.


Re: Timekeeping in stable/9

2012-01-19 Thread Joe Holden

Chuck Swiger wrote:

On Jan 19, 2012, at 12:04 PM, Joe Holden wrote:

Looks like this is down to the dynamic/tickless changes in 9 (that aren't even 
noted in the release notes), the machines have now been switched to linux as 
the lack of responses/care given to my recent postings has been noted and it 
was deemed that using linux would be less hassle in the long run.


Sounds like you were looking for commercial support, since unpaid volunteers 
don't have an obligation to promptly leap out and provide solutions within your 
ETA.

Regards,
Not really, just an acknowledgement would be fine.  It is what it is, 
every day I try to argue in favour of the project, I still use it for 
myself everywhere but increasingly things happen that just don't on 
other volunteer projects... it's rather difficult to argue the case when 
they can install Ubuntu or whatever nonsense distro is the current 
favourite and it just works.  Just a bit more accurate info would solve 
it, if it doesn't do X reliably, or Y has changed, note it.



Re: Timekeeping in stable/9

2012-01-19 Thread Joe Holden
Looks like this is down to the dynamic/tickless changes in 9 (that 
aren't even noted in the release notes), the machines have now been 
switched to linux as the lack of responses/care given to my recent 
postings has been noted and it was deemed that using linux would be less 
hassle in the long run.


Unfortunate decision but I am inclined to agree.

Thanks,
J

Ian Lepore wrote:

On Tue, 2012-01-17 at 20:12 +0000, Joe Holden wrote:

Hi guys,

Has anyone else noticed the tendency for 9.0-R to be unable to 
accurately keep time?  I've got a couple of machines that have been 
upgraded from 8.2 that are struggling, in particular a VirtualBox guest 
that was fine on 8.2, but now that it's been upgraded to 9.0 counts 
anything from 2 to 20 seconds per 5-second sample; the result is similar 
with HPET, ACPI-fast and TSC.


I also have physical boxes which now seem to drift quite substantially; 
ntpd cannot keep up, and as these boxes need to be able to report the 
time relatively accurately, it is causing problems with log times and 
such...


Any suggestions most welcome!

Thanks,
J


I finally got a 9.0 generic build done today and I've been watching the
timekeeping on 3 systems and they're all doing just fine.  Two of the
systems are performing pretty much identically to how they did on 8.2;
the clock frequency correction calculated by ntpd differs by less than
1ppm.  On the other system the kernel timekeeping routines are now
choosing to use a different clock so I don't get a direct comparison of
the old vs new drift rate, but the drift is still reasonable  (100ppm
now, used to be around 88, on an old 300mhz MediaGx-based system).
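
To put those ppm figures in more familiar units, the conversion to
seconds of drift per day is plain arithmetic. The function name below
is made up for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def drift_seconds_per_day(ppm):
    """Seconds gained or lost per day by a clock that is off by `ppm`."""
    return ppm * SECONDS_PER_DAY / 1_000_000
```

By that arithmetic 100 ppm is about 8.6 seconds a day and 88 ppm about
7.6 seconds a day, which is the sort of error ntpd is meant to correct.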

I haven't had time yet to learn about the new eventtimer stuff in 9.0,
but I know you can get some info on the choices it made via sysctl
kern.eventtimer.  Before 9.0 I'd check sysctl kern.clockrate and vmstat
-i and make sure the chosen clock is interrupting at the right rate, but
now with the eventtimer stuff there's not an obvious correlation between
hz and profhz and stathz and any particular device's interrupt rate, at
least for some clock choices (on the old MediaGx system without ACPI or
HPET it seems to work more like it used to).

-- Ian






Timekeeping in stable/9

2012-01-17 Thread Joe Holden

Hi guys,

Has anyone else noticed the tendency for 9.0-R to be unable to 
accurately keep time?  I've got a couple of machines that have been 
upgraded from 8.2 that are struggling, in particular a VirtualBox guest 
that was fine on 8.2, but now that it's been upgraded to 9.0 counts 
anything from 2 to 20 seconds per 5-second sample; the result is similar 
with HPET, ACPI-fast and TSC.
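
A sample like the one described above (local seconds counted against a
5-second reference interval) can be turned into a rate error as
sketched below. This is an illustration only: local_clock,
reference_clock and wait are hypothetical callables supplied by the
caller, and the reference has to come from outside the affected
machine, since every clock inside the guest may be skewed the same way.

```python
def sample_clock(local_clock, reference_clock, wait):
    """Return (local_elapsed, reference_elapsed) across one wait().

    All three arguments are caller-supplied callables; the two clocks
    return seconds as numbers, and wait() blocks for the sample period.
    """
    local_start = local_clock()
    reference_start = reference_clock()
    wait()
    return local_clock() - local_start, reference_clock() - reference_start


def clock_error_ppm(local_elapsed, reference_elapsed):
    """Rate error of the local clock in parts per million.

    Positive means the local clock runs fast, negative means slow.
    """
    if reference_elapsed <= 0:
        raise ValueError("reference interval must be positive")
    return (local_elapsed - reference_elapsed) / reference_elapsed * 1_000_000
```

For the numbers reported here, 20 local seconds in a 5-second sample is
an error of 3,000,000 ppm, and 2 seconds in 5 is -600,000 ppm.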


I also have physical boxes which now seem to drift quite substantially; 
ntpd cannot keep up, and as these boxes need to be able to report the 
time relatively accurately, it is causing problems with log times and 
such...


Any suggestions most welcome!

Thanks,
J


Re: UFS corruption panic

2012-01-15 Thread Joe Holden
Actually, that would be a safe assumption especially now that the
installer rightly or wrongly defaults to a single / filesystem, but
perhaps if it could be tunable via mount flags that would be sensible
also...

Thanks,
J

On Sun, Jan 15, 2012 at 12:48 PM, Bruce Cran  wrote:
>
> On 15/01/2012 08:12, Stefan Bethke wrote:
>>
>> Yes, a panic is the correct action here.  While I agree that it's super 
>> annoying, the filesystem notices that something is *really* wrong.  Instead 
>> of letting the problem fester and continue to corrupt data, it stops the 
>> system.
>
>
> One could argue instead that for non-root filesystems the correct action is 
> to stop all operations on that filesystem but let the rest of the system 
> continue.
>
> --
> Bruce Cran


UFS corruption panic

2012-01-14 Thread Joe Holden
Guys

Is a panic **really** appropriate for a filesystem that isn't even in
fstab?

i.e.:
panic: ufs_dirbad: /mnt: bad dir ino 3229 at offset 0: mangled entry

This happened to be a file-backed md volume that got changed because I
forgot to unmount it beforehand. As a result there are now inconsistencies
and probably data corruption or even missing data on other important
filesystems (i.e. /, /var etc.) because there wasn't even a sync or any
other kind of sensible behaviour.

This is on a production box, which also has gmirror, so I now have no idea
what state it's going to be in when I can get a display attached.

Surely the appropriate response here for non-critical filesystems is to
warn and suggest manually inspecting it, as turning a working production box
into one that's dead in the water seems a little extreme.


J


Re: GENERIC make buildkernel error / fails - posix_fadvise

2012-01-12 Thread Joe Ennis
On Thu, 12 Jan 2012 19:11:54 -0800
Garrett Cooper  wrote:

> On Thu, Jan 12, 2012 at 5:52 PM, Doug Barton 
> wrote:
> >
> >>> chflags -R noschg /usr/obj/usr
> >>> rm -rf /usr/obj/usr
> >
> > It's much faster to do:
> >
> > /bin/rm -rf ${obj}/* 2> /dev/null || /bin/chflags -R 0 ${obj}/* &&
> > /bin/rm -rf ${obj}/*
> 
> +1. And it's faster yet when you can run parallel copies of rm on
> different portions of the directory tree (e.g. xargs, find [..] -exec)
> as rm is O(n).
> Cheers,
> -Garrett

What I've been doing just before I do a make buildworld/buildkernel
is:

mdmfs -s2g md1 /usr/obj

on a clean /usr/obj . If I need to recompile before a boot, just umount
and recreate.

Provides a little performance boost too.
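
A minimal sketch of that workflow, assuming md unit 1 and the 2 GB size from the command above. The function names are hypothetical, and the helpers only print the commands so they can be reviewed before being piped to sh as root:

```shell
#!/bin/sh
# Values mirror the mdmfs command above; adjust to taste.
MD_UNIT=1
MD_SIZE=2g
OBJDIR=/usr/obj

# Print the command that mounts a memory disk on /usr/obj
# (mdmfs creates a swap-backed md device by default).
objdir_md_create() {
    echo "mdmfs -s ${MD_SIZE} md${MD_UNIT} ${OBJDIR}"
}

# Print the commands that unmount it and detach the md device,
# discarding everything that was built into it.
objdir_md_destroy() {
    echo "umount ${OBJDIR}"
    echo "mdconfig -d -u ${MD_UNIT}"
}

# Recreate = destroy, then create again (for a rebuild before a reboot).
objdir_md_recreate() {
    objdir_md_destroy
    objdir_md_create
}
```

For example, "objdir_md_recreate | sh" would give a fresh, empty /usr/obj before the next buildworld.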

Regards,

--
joe


Re: FLAME - security advisories on the 23rd ? uncool idea is uncool

2011-12-23 Thread Joe Holden
The serious one (telnetd) is already being exploited in the wild, and if 
you're running telnetd anyway then you can always switch to ssh or acl 
the port, either way it is a relative non-issue to ignore the update for 
now...


Damien Fleuriot wrote:

My point (which may or may not be valid) was that if the vulnerabilities
remained *undisclosed*, they would have a much lower chance of being
exploited.



On 12/23/11 5:47 PM, Joe Holden wrote:

So don't update until Monday? The outcome will be the same :)

Damien Fleuriot wrote:

Hey up list,



Look, just a rant here.


Who in *HELL* thought it would be a cool idea to release no less than
FOUR security advisories today ?

I mean, couldn't this have waited and remained undisclosed until monday ?

I for one do *NOT* relish the idea of updating 50+ boxes this evening
and tomorrow !


Not to mention a whole lot of merchants and banks have toggled IT Freeze
a few weeks ago, to ensure xmas shopping doesn't get disturbed by
production changes.


Seriously, this is just irritating.


/flame


Re: FLAME - security advisories on the 23rd ? uncool idea is uncool

2011-12-23 Thread Joe Holden

So don't update until Monday? The outcome will be the same :)

Damien Fleuriot wrote:

Hey up list,



Look, just a rant here.


Who in *HELL* thought it would be a cool idea to release no less than
FOUR security advisories today ?

I mean, couldn't this have waited and remained undisclosed until monday ?

I for one do *NOT* relish the idea of updating 50+ boxes this evening
and tomorrow !


Not to mention a whole lot of merchants and banks have toggled IT Freeze
a few weeks ago, to ensure xmas shopping doesn't get disturbed by
production changes.


Seriously, this is just irritating.


/flame


Re: Benchmark (Phoronix): FreeBSD 9.0-RC2 vs. Oracle Linux 6.1 Server

2011-12-15 Thread Joe Holden

Arnaud Lacombe wrote:

Hi,

On Thu, Dec 15, 2011 at 2:32 AM, O. Hartmann
 wrote:

Just saw this short benchmark on Phoronix dot com today:

http://www.phoronix.com/scan.php?page=news_item&px=MTAyNzA


it might be worth highlighting that although Oracle Linux 6.1 Server is
using a kernel + compiler almost 2 years old, it still manages to
out-perform the bleeding edge FreeBSD :-)


serenity# gcc --version
gcc (GCC) 4.2.1 20070831 patched [FreeBSD]

serenity# uname -r
9.0-RC3


Now, from what I've read so far in this thread, it seems that a lot of
people are still in denial...

my 0.2c,
 - Arnaud


It may be worth discussing the sad performance of FBSD in some parts of
the benchmark. A difference of a factor of 10 or 100 is far beyond
disappointing; it is unacceptable, and just reading those
benchmarks makes me want to stop thinking of using FreeBSD even as a backend
server in scientific and business environments. In detail, some of the
SciMark benches look disappointing. The overall picture isn't helped much by
the fact that FreeBSD performs better in C-Ray.

From the compiler alone, I'd say there shouldn't be a drop of more than 10
- 15% in performance - but not 10 or 100 times.

I'm just thinking about the discussion of SCHED_ULE and all the sore
spots we discussed when I stumbled over the test.

Regards,
Oliver




Unable to attach USB disks at boot time

2011-07-03 Thread Joe Marcus Clarke
I have a VMware ESX 4.1 Update 1 server (underlying hardware is a Cisco
UCS C210) to which I have connected two WD My Book 1130 drives.  I have
allocated both drives to my FreeBSD RELENG_8 VM (amd64).  At boot time,
I see:

Root mount waiting for: usbus1
usb_alloc_device: set address 2 failed (USB_ERR_TIMEOUT, ignored)
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT,
ignored)
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT,
ignored)
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
ugen1.2:  at usbus1 (disconnected)
uhub_reattach_port: could not allocate new device
Root mount waiting for: usbus1
Root mount waiting for: usbus1
usb_alloc_device: set address 2 failed (USB_ERR_TIMEOUT, ignored)
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT,
ignored)
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT,
ignored)
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
ugen1.2:  at usbus1 (disconnected)
uhub_reattach_port: could not allocate new device

However, once FreeBSD is fully booted, I can detach then reattach the
drives (through VIC), and they attach just fine:

ugen1.2:  at usbus1
umass0:  on usbus1
umass0:  SCSI over Bulk-Only; quirks = 0x
umass0:1:0:-1: Attached to scbus1
da1 at umass-sim0 bus 0 scbus1 target 0 lun 0
da1:  Fixed Direct Access SCSI-6 device
da1: 40.000MB/s transfers
da1: 1907697MB (3906963456 512 byte sectors: 255H 63S/T 243197C)
ses0 at umass-sim0 bus 0 scbus1 target 0 lun 1
ses0:  Fixed Enclosure Services SCSI-6 device
ses0: 40.000MB/s transfers
ses0: SCSI-3 SES Device
ugen1.3:  at usbus1
umass1:  on usbus1
umass1:  SCSI over Bulk-Only; quirks = 0x
umass1:2:1:-1: Attached to scbus2
da2 at umass-sim1 bus 1 scbus2 target 0 lun 0
da2:  Fixed Direct Access SCSI-6 device
da2: 40.000MB/s transfers
da2: 1907697MB (3906963456 512 byte sectors: 255H 63S/T 243197C)
ses1 at umass-sim1 bus 1 scbus2 target 0 lun 1
ses1:  Fixed Enclosure Services SCSI-6 device
ses1: 40.000MB/s transfers
ses1: SCSI-3 SES Device

I'm running FreeBSD RELENG_8 from Sat Jul  2 17:40:20 EDT 2011.  I had
an older Maxtor drive connected to this VM previously, and it was
working fine.  These WD drives are USB 3, but operating under USB 2
mode.  Any advice?  Thanks.

Joe

-- 
Joe Marcus Clarke
FreeBSD GNOME Team  ::  gn...@freebsd.org
FreeNode / #freebsd-gnome
http://www.FreeBSD.org/gnome


LTO3 tape drive not detected

2011-06-23 Thread Joe in MPLS
This was originally posted on the freebsd-questions list. It was 
suggested that I post it here:


I have FreeBSD 8.2-RELEASE running on an HP DL360 G5. I recently added 
an (HP branded) LSI Logic single channel SCSI 320 card and attached an 
HP Ultrium 920 LTO3 tape drive.


The system sees the SCSI controller as mpt0, and it seems to know 
there's something at SCSI ID 4, but I get an "AutoSense Failed" for 
hba/id/lun 0:4:0 at boot and subsequent camcontrol rescans.


I checked the supported hardware doc for the release but it doesn't get 
very specific about tape drives. This is my first experience with LTO3 
tape. I was hoping that I'd automagically get a /dev/sa0 device like I 
always did with my old DLT drives but it wasn't to be this time.


Is there a way to make this drive work?




Re: HEADS UP: FreeBSD 6.4 and 8.0 EoLs coming soon

2010-09-21 Thread Joe Shevland

 On 21/09/2010 11:49 PM, Willem Jan Withagen wrote:

On 2010-09-21 15:16, Jeremy Chadwick wrote:

On Tue, Sep 21, 2010 at 02:59:46PM +0200, Willem Jan Withagen wrote:

On 2010-09-21 13:39, {some mysterious person :-)} wrote:
The Project is ultimately about the users, right? There are early signs
that some old FreeBSD users are getting tired of those changes, those
removals, lesser POLA adherence, marketing-not-technical-stuff for
time-not-feature-based releases, -STABLE not being as stable as it used
to be, and so on, and are migrating to other systems. And older users are
more valuable to the project than newer ones. Maybe it's time to revert
to some of the Good Old Things, if the decade-long project is mostly
ended, while those signs are still early and not a strong tendency? Given
this thread, I mentioned earlier about 12 messages in announce@ from 2002
with such public calls for volunteers - there have been several years
already without these.


Andriy wasn't the one who wrote this.  In fact, I'm not sure who the
quote actually came from because I never received the Email it came
from, but I'm under the impression it's from Vadim.  My mail spool:


My bad for not checking the included reference.
I was also very much under the impression that that quote was Vadim's,
since it was completely in line with his previous complaints/rants/whining.


And yes, you are smart to stay out of the discussion. But this old
fart just had too much urge to react. So now I'll just go back to my
old lurking state.


My thoughts are below - remembering it's a volunteer project, people
spend their precious time to make it happen, and
noneofthatwisthandingitsstilldamngood:


a) if you don't like it, fix it.
b) if you can't fix it, pay someone else to fix it
c) if you can't fix it or otherwise be helpful, remain silent

If you can't do a or b or c, and still have no options, below:

d) whinging never helps
e) those that whinge on volunteer projects are subject to the emperor's wrath
f) kill the heretic, the witch, the unbeliever. Recover the gene-seed at 
all costs.


Cheers
Joe



--WjW

___
freebsd-secur...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-security
To unsubscribe, send any mail to 
"freebsd-security-unsubscr...@freebsd.org"




Re: ZFS performance on 7.2-release/amd64 low compared to UFS2 + SoftUpdates

2009-06-17 Thread Joe Koberg
The difference in layout can easily explain a 2x difference in 
sequential transfer performance.


I seriously doubt your disk is really getting 23K seeks/s done in the 
UFS case - 100/s sounds much more reasonable for real hardware. Perhaps 
the results of caching?



Joe Koberg




Dan Naumov wrote:

I am wondering if the numbers I am seeing are expected or if
something is broken somewhere. Output of bonnie -s 1024:

on UFS2 + SoftUpdates:

              -------Sequential Output-------- ---Sequential Input--- ---Random----
              -Per Char- --Block--- -Rewrite-- -Per Char- ---Block--- ----Seeks----
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  K/sec %CPU    /sec  %CPU
         1024 56431 94.5 88407 38.9 77357 53.3 64042 98.6 644511 98.6 23603.8 243.3

on ZFS:

              -------Sequential Output-------- ---Sequential Input--- ---Random----
              -Per Char- --Block--- -Rewrite-- -Per Char- ---Block--- ----Seeks----
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  K/sec %CPU    /sec  %CPU
         1024 22591 53.7 45602 35.1 14770 13.2 45007 83.8  94595 28.0   102.2   1.2


atom# cat /boot/loader.conf
vm.kmem_size="1024M"
vm.kmem_size_max="1024M"
vfs.zfs.arc_max="96M"

The test isn't completely fair in that the test on UFS2 is done on a
partition that resides on the first 16 GB of a 2 TB disk while the ZFS
test is done on the enormous 1.9 TB ZFS pool that comes after that
partition (same disk). Can this difference in layout account for the
huge difference in performance or is there something else in play? The
system is an Intel Atom 330 dual-core, 2 GB RAM, Western Digital Green
2 TB disk. Also, what would be another good way to get good numbers for
comparing the performance of UFS2 vs ZFS on the same system?


Sincerely,
- Dan Naumov


Re: Sudden wierd SATA problem on RELENG_7 (Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up))

2009-06-01 Thread Joe Karthauser

on 23/05/2009 05:26 Alexander Motin said the following:

Hi.

Joe Karthauser wrote:

I spoke too soon. It must have just randomly booted, because it is now
hanging again. No amount of jiggling cables has made any difference.


Can you provide verbose boot messages of your system from the beginning
up to the problem? Especially, all related to the ATA.



Attached.

Do you have AHCI mode enabled in BIOS, or are you using legacy ATA emulation?



It's set up as AHCI in the bios.

What is strange is that it has now started working again. I can't make 
any sense of it. The machine boots up fine.  It was definitely hanging 
at the ata probes though, just after the ZFS messages are output.


Joe
Copyright (c) 1992-2009 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.2-STABLE #7: Fri May 22 23:10:15 BST 2009
r...@athenaeum.tao.org.uk:/usr/obj/usr/src/sys/ATHENAEUM
Preloaded elf kernel "/boot/kernel/kernel" at 0x80b47000.
Preloaded elf module "/boot/kernel/zfs.ko" at 0x80b4719c.
Preloaded elf module "/boot/kernel/opensolaris.ko" at 0x80b47244.
Preloaded elf module "/boot/kernel/geom_eli.ko" at 0x80b472f4.
Preloaded elf module "/boot/kernel/crypto.ko" at 0x80b473a4.
Preloaded elf module "/boot/kernel/zlib.ko" at 0x80b47450.
Preloaded elf module "/boot/kernel/geom_label.ko" at 0x80b474fc.
Preloaded elf module "/boot/kernel/geom_mirror.ko" at 0x80b475ac.
Preloaded /boot/zfs/zpool.cache "/boot/zfs/zpool.cache" at 0x80b4765c.
Preloaded elf module "/boot/kernel/acpi.ko" at 0x80b476b4.
module_register: module g_label already exists!
Module g_label failed to register: 17
Calibrating clock(s) ... i8254 clock: 1192003 Hz
CLK_USE_I8254_CALIBRATION not specified - using default frequency
Timecounter "i8254" frequency 1193182 Hz quality 0
Calibrating TSC clock ... TSC clock: 2402413236 Hz
CPU: Intel(R) Core(TM)2 Quad CPUQ6600  @ 2.40GHz (2402.41-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x6fb  Stepping = 11
  
Features=0xbfebfbff
  Features2=0xe3bd
  AMD Features=0x2010
  AMD Features2=0x1
  Cores per package: 4

Instruction TLB: 4 KB Pages, 4-way set associative, 128 entries
1st-level instruction cache: 32 KB, 8-way set associative, 64 byte line size
1st-level data cache: 32 KB, 8-way set associative, 64 byte line size
L2 cache: 4096 kbytes, 16-way associative, 64 bytes/line
real memory  = 3756916736 (3582 MB)
Physical memory chunk(s):
0x1000 - 0x0009dfff, 643072 bytes (157 pages)
0x0010 - 0x003f, 3145728 bytes (768 pages)
0x00c25000 - 0xdbf7, 3677728768 bytes (897883 pages)
avail memory = 3673681920 (3503 MB)
Table 'FACP' at 0xdfee30c0
Table 'HPET' at 0xdfee7e00
Table 'MCFG' at 0xdfee7e80
Table 'APIC' at 0xdfee7d00
MADT: Found table at 0xdfee7d00
MP Configuration Table version 1.4 found at 0x800f0d00
APIC: Using the MADT enumerator.
MADT: Found CPU APIC ID 0 ACPI ID 0: enabled
SMP: Added CPU 0 (AP)
MADT: Found CPU APIC ID 3 ACPI ID 1: enabled
SMP: Added CPU 3 (AP)
MADT: Found CPU APIC ID 2 ACPI ID 2: enabled
SMP: Added CPU 2 (AP)
MADT: Found CPU APIC ID 1 ACPI ID 3: enabled
SMP: Added CPU 1 (AP)
ACPI APIC Table: 
INTR: Adding local APIC 1 as a target
INTR: Adding local APIC 2 as a target
INTR: Adding local APIC 3 as a target
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
bios32: Found BIOS32 Service Directory header at 0x800fad30
bios32: Entry = 0xfb3f0 (800fb3f0)  Rev = 0  Len = 1
pcibios: PCI BIOS entry at 0xf+0xb420
pnpbios: Found PnP BIOS data at 0x800fbf90
pnpbios: Entry = f:bfc0  Rev = 1.0
Other BIOS signatures found:
APIC: CPU 0 has ACPI ID 0
APIC: CPU 1 has ACPI ID 3
APIC: CPU 2 has ACPI ID 2
APIC: CPU 3 has ACPI ID 1
ULE: setup cpu group 0
ULE: setup cpu 0
ULE: adding cpu 0 to group 0: cpus 1 mask 0x1
ULE: setup cpu group 1
ULE: setup cpu 1
ULE: adding cpu 1 to group 1: cpus 1 mask 0x2
ULE: setup cpu group 2
ULE: setup cpu 2
ULE: adding cpu 2 to group 2: cpus 1 mask 0x4
ULE: setup cpu group 3
ULE: setup cpu 3
ULE: adding cpu 3 to group 3: cpus 1 mask 0x8
This module (opensolaris) contains code covered by the
Common Development and Distribution License (CDDL)
see http://opensolaris.org/os/licensing/opensolaris_license/
ACPI: RSDP @ 0x0xf6c30/0x0014 (v  0 GBT   )
ACPI: RSDT @ 0x0xdfee3040/0x0034 (v  1 GBTGBTUACPI 0x42302E31 GBTU 
0x01010101)
ACPI: FACP @ 0x0xdfee30c0/0x0074 (v  1 GBTGBTUACPI 0x42302E31 GBTU 
0x01010101)
ACPI: DSDT @ 0x0xdfee3180/0x4B32 (v  1 GBTGBTUACPI 0x1000 MSFT 
0x010C)
ACPI: FACS @ 0x0xdfee/0x0040
ACPI: HPET @ 0x0xdfe

Re: Sudden wierd SATA problem on RELENG_7 (Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up))

2009-05-22 Thread Joe Karthauser
I spoke too soon. It must have just randomly booted, because it is now 
hanging again. No amount of jiggling cables has made any difference.


:(.

Joe

on 22/05/2009 20:40 Joe Karthauser said the following:

Hi Alexander,

I'd love it if you were able to provide some insight into this problem.

I'm going to try switching sata cables around next to see if the problem
goes away if I disconnect some combination of bays.

Thanks,
Joe

on 22/05/2009 19:39 Kip Macy said the following:

Motin is your best bet in tracking down ATA problems.

Cheers,
Kip


On Fri, May 22, 2009 at 10:40 AM, Joe Karthauser wrote:

Hi Kip,

I seriously don't understand what has happened. If I boot kernel.old I still
get the same problem. Very confusing. :(.

Joe

on 21/05/2009 19:28 Kip Macy said the following:

I have no idea what is happening. I think our best bet is having
someone with insight into ATA provide us with help in adding
diagnostics.

Sorry for the trouble. Perhaps you can just roll back to 7.2 for now.

Cheers,
Kip


On Thu, May 21, 2009 at 10:50 AM, Joe Karthauser wrote:

Hmm, I've had a bit of a miserable afternoon trying to fight my RELENG_7
server, which now doesn't boot. :(.

So, it's a RAIDZ2 pool with a ufs/gmirror root partition split over 5
disks (gmirror on a 500MB partition on each of the five disks, and RAIDZ2
over the rest of each drive).

What I did was to update the userland, and then reboot. I didn't upgrade
the kernel (but I've subsequently done that and have the same problem).

What happens is that the kernel hangs booting just after displaying a
LABEL message or ZFS pool/spool message. I _can_ get it to boot if I boot
single user with ACPI switched off. When I do that I can manually start
ZFS, and mount all the partitions. However, one of the disks is missing;
more on that next.

The machine is running a Gigabyte motherboard (domestic gamer P35 board,
similar to this

http://www.gigabyte.com.tw/Products/Motherboard/Products_Overview.aspx?ProductID=2533,

although it might be a DS4 variant). I've got 5 of the 6 SATA ports wired
to a 5-unit SATA hot swap bay (5 drives vertically mounted into 3 5-1/4"
bays kind of thing).

Now, because of the gmirror I can boot the system on any disk, or
combination of plugged in disks. I should be able to succeed with the
kernel probe up to the attempt to mount the root filesystem irrespective
of any ZFS pool, etc. And, indeed, this has been working fine for about
two years.

But, now it hangs in the same place no matter what disk I boot on (I've
tried every bay).

But, without ACPI enabled it does appear to boot ok... what's going on
here? Is it possible that the machine has developed a hardware fault?

Ok, finally, if I boot with ACPI disabled then one of the disks is
missing. If I unplug it I get a disconnect message from the ata device,
and a reconnect and reinit attempt when I plug it back in, but no device
appears on the bus. Usually I can do a 'atacontrol detach sata4; sleep 1;
atacontrol attach sata4' and the device reappears. This happens on the
other buses, but not on the last one. It's not the disk, because if I swap
it into another bay, it comes up and appears on the bus. On the other hand
it doesn't appear to be that controller or slot in the drive bay because
if I unplug all the other disks the system will boot that disk and get as
far as the hang... hmm.

Is this a consequence of disabling the ACPI?

Does anyone have a clue what might be going on?

Joe


Re: Sudden wierd SATA problem on RELENG_7 (Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up))

2009-05-22 Thread Joe Karthauser
This appears to have gone away now. I unplugged the bay that was causing 
the trouble, and the system booted just fine on the remaining 4 drives. 
Then I plugged the bay back in (live) and did an atacontrol 
detach/attach on that bus (I wonder why I always have to do that). The 
drive was seen, and ZFS resilvered itself. I'm doing a ZFS scrub now to 
make sure that everything is good, and I'll do a reboot and see if it's 
all ok after that.


Strange, so it looks like a cable might have got a little loose or 
something. I wonder why that would have hung the kernel probe though.


Joe



Sudden wierd SATA problem on RELENG_7 (Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up))

2009-05-22 Thread Joe Karthauser

Hi Alexander,

I'd love it if you were able to provide some insight into this problem.

I'm going to try switching sata cables around next to see if the problem 
goes away if I disconnect some combination of bays.


Thanks,
Joe

on 22/05/2009 19:39 Kip Macy said the following:

Motin is your best bet in tracking down ATA problems.

Cheers,
Kip


On Fri, May 22, 2009 at 10:40 AM, Joe Karthauser  wrote:

Hi Kip,

I seriously don't understand what has happened. If I boot kernel.old I still
get the same problem. Very confusing. :(.

Joe

on 21/05/2009 19:28 Kip Macy said the following:

I have no idea what is happening. I think our best bet is having
someone with insight into ATA provide us with help in adding
diagnostics.

Sorry for the trouble. Perhaps you can just roll back to 7.2 for now.

Cheers,
Kip


On Thu, May 21, 2009 at 10:50 AM, Joe Karthauser wrote:

Hmm, I've had a bit of a miserable afternoon trying to fight my RELENG_7
server, which now doesn't boot. :(.

So, it's a RAIDZ2 pool with a ufs/gmirror root partition split over 5
disks (gmirror on a 500MB partition on each of five disks, and raidz2 over
the rest of each drive).

What I did was to update the userland, and then reboot. I didn't upgrade
the kernel (but I've subsequently done that and have the same problem).

What happens is that the kernel hangs booting just after displaying a
LABEL message or ZFS pool/spool message. I _can_ get it to boot if I boot
single user with acpi switched off. When I do that I can manually start
zfs, and mount all the partitions. However, one of the disks is missing;
more on that next.

The machine is running a gigabyte motherboard (domestic gamer P35 board,
similar to this
http://www.gigabyte.com.tw/Products/Motherboard/Products_Overview.aspx?ProductID=2533,
although it might be a DS4 variant).  I've got 5 of the 6 sata ports
wired to a 5 unit SATA hot swap bay (5 drives vertically mounted into 3
5-1/4" bays kind of thing).

Now, because of the gmirror I can boot the system on any disk, or
combination of plugged in disks. I should be able to succeed with the
kernel probe up to the attempt to mount the root filesystem irrespective
of any zfs pool, etc. And, indeed, this has been working fine for about
two years.

But, now it hangs in the same place no matter what disk I boot on (I've
tried every bay).

But, without ACPI enabled it does appear to boot ok... what's going on
here? Is it possible that the machine has developed a hardware fault?

Ok, finally, if I boot with ACPI disabled then one of the disks is
missing. If I unplug it I get a disconnect message from the ata device,
and a reconnect and reinit attempt when I plug it back in, but no device
appears on the bus. Usually I can do an 'atacontrol detach sata4; sleep 1;
atacontrol attach sata4' and the device reappears. This happens on the
other buses, but not on the last one. It's not the disk, because if I swap
it into another bay, it comes up and appears on the bus. On the other hand
it doesn't appear to be that controller or slot in the drive bay, because
if I unplug all the other disks the system will boot that disk and get as
far as the hang... hmm.

Is this a consequence of disabling the ACPI?

Does anyone have a clue what might be going on?

Joe

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up)

2009-05-22 Thread Joe Karthauser

Hi Kip,

I seriously don't understand what has happened. If I boot kernel.old I 
still get the same problem. Very confusing. :(.


Joe

on 21/05/2009 19:28 Kip Macy said the following:

I have no idea what is happening. I think our best bet is having
someone with insight into ATA provide us with help in adding
diagnostics.

Sorry for the trouble. Perhaps you can just roll back to 7.2 for now.

Cheers,
Kip


On Thu, May 21, 2009 at 10:50 AM, Joe Karthauser  wrote:

Hmm, I've had a bit of a miserable afternoon trying to fight my RELENG_7
server, which now doesn't boot. :(.

So, it's a RAIDZ2 pool with a ufs/gmirror root partition split over 5 disks
(gmirror on a 500MB partition on each of five disks, and raidz2 over the rest
of each drive).

What I did was to update the userland, and then reboot. I didn't upgrade the
kernel (but I've subsequently done that and have the same problem).

What happens is that the kernel hangs booting just after displaying a LABEL
message or ZFS pool/spool message. I _can_ get it to boot if I boot single
user with acpi switched off. When I do that I can manually start zfs, and
mount all the partitions. However, one of the disks is missing; more on
that next.

The machine is running a gigabyte motherboard (domestic gamer P35 board,
similar to this
http://www.gigabyte.com.tw/Products/Motherboard/Products_Overview.aspx?ProductID=2533,
although it might be a DS4 variant).  I've got 5 of the 6 sata ports wired
to a 5 unit SATA hot swap bay (5 drives vertically mounted into 3 5-1/4" bays
kind of thing).

Now, because of the gmirror I can boot the system on any disk, or
combination of plugged in disks. I should be able to succeed with the
kernel probe up to the attempt to mount the root filesystem irrespective of
any zfs pool, etc. And, indeed, this has been working fine for about two
years.

But, now it hangs in the same place no matter what disk I boot on (I've
tried every bay).

But, without ACPI enabled it does appear to boot ok... what's going on here?
Is it possible that the machine has developed a hardware fault?

Ok, finally, if I boot with ACPI disabled then one of the disks is missing.
If I unplug it I get a disconnect message from the ata device, and a
reconnect and reinit attempt when I plug it back in, but no device appears
on the bus. Usually I can do an 'atacontrol detach sata4; sleep 1; atacontrol
attach sata4' and the device reappears. This happens on the other buses, but
not on the last one. It's not the disk, because if I swap it into another
bay, it comes up and appears on the bus. On the other hand it doesn't appear
to be that controller or slot in the drive bay because if I unplug all the
other disks the system will boot that disk and get as far as the hang...
hmm.

Is this a consequence of disabling the ACPI?

Does anyone have a clue what might be going on?

Joe

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up)

2009-05-21 Thread Joe Karthauser
Hmm, I've had a bit of a miserable afternoon trying to fight my RELENG_7 
server, which now doesn't boot. :(.


So, it's a RAIDZ2 pool with a ufs/gmirror root partition split over 5 
disks (gmirror on a 500MB partition on each of five disks, and raidz2 over 
the rest of each drive).


What I did was to update the userland, and then reboot. I didn't upgrade 
the kernel (but I've subsequently done that and have the same problem).


What happens is that the kernel hangs booting just after displaying a 
LABEL message or ZFS pool/spool message. I _can_ get it to boot if I 
boot single user with acpi switched off. When I do that I can manually 
start zfs, and mount all the partitions. However, one of the disks is 
missing; more on that next.


The machine is running a gigabyte motherboard (domestic gamer P35 board, 
similar to this 
http://www.gigabyte.com.tw/Products/Motherboard/Products_Overview.aspx?ProductID=2533, 
although it might be a DS4 variant).  I've got 5 of the 6 sata ports 
wired to a 5 unit SATA hot swap bay (5 drives vertically mounted into 3 
5-1/4" bays kind of thing).


Now, because of the gmirror I can boot the system on any disk, or 
combination of plugged in disks. I should be able to succeed with the
kernel probe up to the attempt to mount the root filesystem irrespective 
of any zfs pool, etc. And, indeed, this has been working fine for about 
two years.


But, now it hangs in the same place no matter what disk I boot on (I've 
tried every bay).


But, without ACPI enabled it does appear to boot ok... what's going on 
here? Is it possible that the machine has developed a hardware fault?


Ok, finally, if I boot with ACPI disabled then one of the disks is 
missing. If I unplug it I get a disconnect message from the ata device, 
and a reconnect and reinit attempt when I plug it back in, but no device 
appears on the bus. Usually I can do an 'atacontrol detach sata4; sleep 
1; atacontrol attach sata4' and the device reappears. This happens on 
the other buses, but not on the last one. It's not the disk, because if 
I swap it into another bay, it comes up and appears on the bus. On the 
other hand it doesn't appear to be that controller or slot in the drive 
bay because if I unplug all the other disks the system will boot that 
disk and get as far as the hang... hmm.


Is this a consequence of disabling the ACPI?

Does anyone have a clue what might be going on?

Joe
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Error message: run_interrupt_driven_hooks:...

2009-05-11 Thread Joe A.
Greetings...

Basic data on my experience with the xpt_config hang; I have more
detail if needed, but I doubt anyone will believe it. I'm not even
sure I do.

Some other reports:

http://lists.freebsd.org/pipermail/freebsd-questions/2009-April/196116.html
Seur Bors Thu Apr 9 14:43:34 UTC 2009.

http://lists.freebsd.org/pipermail/freebsd-stable/2009-May/049901.html
martinko gamato Mon May 11 22:05:56 UTC 2009

http://www.nabble.com/Freebsd-7.2-RC-boot-problem-tt23257632.html#a23257632

http://forums.pcbsd.org/viewtopic.php?f=1&t=13312

Here is the entire error for me during boot:

run_interrupt_driven_hooks: still waiting after BIGNUM seconds for xpt_config

It hangs after this point in the boot process:

pcm0: 
pcm0: 

the boot process does not continue, so the next normal thing does not
appear on the console:

SMP: AP CPU #1 Launched!

but during the hang, this scrolls past (punctuated by the BIGNUM
seconds wait) over and over on the console:

acpi_tz0: _TMP value is absurd, ignored (-269.4C)

Normally, that message is suppressed by this /etc/sysctl.conf entry:

hw.acpi.thermal.polling_rate=0

I suppose this means that /etc/sysctl.conf is not parsed and the
second CPU is not launched.
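
One possible reading of the `-269.4C` figure above, offered as a sketch rather than a diagnosis: ACPI reports `_TMP` in tenths of a kelvin, so a value that prints as -269.4C corresponds to a raw reading only a few kelvin above absolute zero, which would explain why the driver calls it absurd. The zero-point constant of 2732 and the helper names below are assumptions made for illustration; they are not taken from the report.

```python
# Sketch: map between raw ACPI _TMP readings (tenths of a kelvin) and the
# "NNN.NC" text printed on the console.
# ZERO_C_DECIKELVIN is an assumed offset; other code may use 2731 or 2731.5.
ZERO_C_DECIKELVIN = 2732


def format_celsius(raw_decikelvin: int) -> str:
    """Render a raw reading as text such as '-269.4C'."""
    tenths = raw_decikelvin - ZERO_C_DECIKELVIN
    sign = "-" if tenths < 0 else ""
    whole, frac = divmod(abs(tenths), 10)
    return f"{sign}{whole}.{frac}C"


def raw_from_celsius_text(text: str) -> int:
    """Invert format_celsius for text such as '-269.4C'."""
    number = text.rstrip("C")
    negative = number.startswith("-")
    whole, frac = number.lstrip("-").split(".")
    tenths = int(whole) * 10 + int(frac)
    return ZERO_C_DECIKELVIN + (-tenths if negative else tenths)


if __name__ == "__main__":
    raw = raw_from_celsius_text("-269.4C")
    # Under the assumed offset, the raw value is a few kelvin at most.
    print(raw, format_celsius(raw))
```

Under that assumption the thermal zone is returning a near-zero raw value, which points at the firmware's thermal method rather than at a real temperature.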

Hardware in question, as seen by dmesg, is attached; the vendor's
specs are:

Core 2 Duo (C) E6400 2.13 GHz 1066 MHz front side bus Socket 775
Chipset P965
Motherboard: Asus P5BW-LA
HP/Compaq motherboard name: Basswood-UL8E

There is RAID on the motherboard; I don't use it. I do use AHCI. BIOS
is current; there are no available updates. The onboard firewire is
disabled, since it began (prior to 7.1) causing unresolvable panics.

CAM is in my kernel:

# SCSI peripherals
# Added atapicam; apparently, cdparanoia requires it.
device  atapicam
device  scbus   # SCSI bus (required for SCSI)
device  da      # Direct Access (disks)
device  sa      # Sequential Access (tape etc)
device  cd      # CD
device  pass    # Passthrough device (direct SCSI access)

As of 9:30 PM EDT May 11, the issue has de-Heisenberged from my PC.

I'm not subscribed to the list; so you'll need to Cc: me if you think
I can help.
Copyright (c) 1992-2009 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.1-RELEASE-p5 #0: Sun May  3 06:43:50 EDT 2009
r...@whisperer.chthonixia.net:/usr/obj/usr/src/sys/WHISPERER
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Core(TM)2 CPU  6400  @ 2.13GHz (2135.55-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x6f6  Stepping = 6
  Features=0xbfebfbff
  Features2=0xe3bd
  AMD Features=0x2000
  AMD Features2=0x1
  Cores per package: 2
real memory  = 2146299904 (2046 MB)
avail memory = 2094936064 (1997 MB)
ACPI APIC Table: 
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
ioapic0: Changing APIC ID to 4
ioapic0  irqs 0-23 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
acpi0:  on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of 0, a (3) failed
acpi0: reservation of 10, 7fde (3) failed
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
acpi_hpet0:  iomem 0xfed0-0xfed003ff on acpi0
device_attach: acpi_hpet0 attach returned 12
acpi_button0:  on acpi0
pcib0:  port 0xcf8-0xcff on acpi0
pci0:  on pcib0
pcib1:  irq 16 at device 1.0 on pci0
pci1:  on pcib1
vgapci0:  port 0xde00-0xdeff mem 0xe000-0xefff,0xfddf-0xfddf irq 16 at device 0.0 on pci1
vgapci1:  mem 0xfdde-0xfdde at device 0.1 on pci1
uhci0:  port 0xff00-0xff1f irq 21 at device 26.0 on pci0
uhci0: [GIANT-LOCKED]
uhci0: [ITHREAD]
usb0:  on uhci0
usb0: USB revision 1.0
uhub0:  on usb0
uhub0: 2 ports with 2 removable, self powered
uhci1:  port 0xfe00-0xfe1f irq 18 at device 26.1 on pci0
uhci1: [GIANT-LOCKED]
uhci1: [ITHREAD]
usb1:  on uhci1
usb1: USB revision 1.0
uhub1:  on usb1
uhub1: 2 ports with 2 removable, self powered
ehci0:  mem 0xfdfff000-0xfdfff3ff irq 21 at device 26.7 on pci0
ehci0: [GIANT-LOCKED]
ehci0: [ITHREAD]
usb2: EHCI version 1.0
usb2: companion controllers, 2 ports each: usb0 usb1
usb2:  on ehci0
usb2: USB revision 2.0
uhub2:  on usb2
uhub2: 4 ports with 4 removable, self powered
pcm0:  mem 0xfdff4000-0xfdff7fff irq 22 at device 27.0 on pci0
pcm0: [ITHREAD]
uhci2:  port 0xfd00-0xfd1f irq 23 at device 29.0 on pci0
uhci2: [GIANT-LOCKED]
uhci2: [ITHREAD]
usb3:  on uhci2
usb3: USB revision 1.0
uhub3:  on usb3
uhub3: 2 ports with 2 removable, self powered
uhci3:  port 0xfc00-0xfc1f irq 17 at device 29.1 on pci0
uhci3: [GIANT-LOCKED]
uhci3: [ITHREAD]
usb4:  on uhci3
usb4: USB revision 1.0
uhub4:  on usb

Re: xorg 7.4 keyboard localisation (xorg.conf vs hal)

2009-02-03 Thread Joe Kelsey

Robert Noland wrote:

On Tue, 2009-02-03 at 15:07 +0100, Sebastien Chassot wrote:
  

On Mon, 2009-02-02 at 16:05 -0500, Robert Noland wrote:


On Mon, 2009-02-02 at 12:43 -0800, Joe Kelsey wrote:
  

Robert Noland wrote:


man xorg.conf search for Input...

  
  

This provides absolutely no help.

I look at my /var/log/Xorg.0.log and it tells me nothing.  If I remove 
the keyboard and mouse input devices from xorg.conf, the log tells me 
that it is disabling all input devices and never says anything else.  
There is no evidence that hal does anything that X wants to know about.  
How would I detect that my configuration file needs changing?  Is there 
a message in Xorg.0.log to look for?  Is there a help file somewhere 
which explains how to change your configuration file to allow hal to work?


I cannot find any information anywhere in the system to allow me to 
debug my problems in any way.  I want to have fully automatic 
configuration using whatever means will allow it.  Your explanations 
about the mysterious behavior of hal and xorg do not give me any 
information I can use in any way to solve my problems.




Set Options "AutoAddDevices" "off" and you have to configure everything
like you used to.
  
  
I WANT to use the new facilities.  Is it possible to debug my 
configuration problems?  Where do I start?  How do I enable this magical 
new world of letting hal do things for me?  What changes do I make to 
xorg.conf to allow this?


Ok, are you using gdm, xdm, or startx?

You need to ensure that dbus and hald are running first.  Set
dbus_enable="YES" and hald_enable="YES" in your rc.conf.
  

This FAQ says to replace dbus/hal with gnome_enable="YES"



Correct, if you are using gnome, that will enable hal and dbus.
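
The rc.conf requirement discussed above can be checked mechanically. This is a minimal sketch, assuming a plain `name="value"` rc.conf without shell continuation lines. The function names are hypothetical, and treating `gnome_enable="YES"` as equivalent to enabling both daemons follows the statement in this thread rather than anything verified here.

```python
import shlex


def parse_rc_conf(text: str) -> dict:
    """Return the name -> value assignments found in rc.conf-style text."""
    settings = {}
    for line in text.splitlines():
        try:
            # shlex strips the quotes and drops trailing "# comment" text.
            tokens = shlex.split(line, comments=True)
        except ValueError:
            continue  # unbalanced quote; this sketch simply skips the line
        if len(tokens) == 1 and "=" in tokens[0]:
            name, _, value = tokens[0].partition("=")
            settings[name] = value  # later assignments override earlier ones
    return settings


def hal_enabled(settings: dict) -> bool:
    """True if dbus and hald are both enabled, directly or via gnome_enable."""
    def yes(name: str) -> bool:
        return settings.get(name, "NO").upper() == "YES"

    return yes("gnome_enable") or (yes("dbus_enable") and yes("hald_enable"))
```

Calling `hal_enabled(parse_rc_conf(open("/etc/rc.conf").read()))` on the machine in question would confirm the knobs are set, though it says nothing about whether the daemons actually started.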

  

zircon.zircon.seattle.wa.us$ ps xa | egrep hal\|dbus
  789  ??  Is     0:00.12 /usr/local/bin/dbus-daemon --system
  946  ??  Ss     0:17.94 /usr/local/sbin/hald
  951  ??  IW     0:00.00 hald-runner
  968  ??  IW     0:00.00 hald-addon-mouse-sysmouse: /dev/ums0 (hald-addon-mous
  986  ??  S      0:09.15 hald-addon-storage: /dev/cd0 (hald-addon-storage)
 1027  ??  IW     0:00.00 /usr/local/bin/dbus-launch --exit-with-session
 1082  ??  IW     0:00.00 dbus-launch --exit-with-session /usr/local/bin/seahor
 1083  ??  Is     0:00.92 /usr/local/bin/dbus-daemon --fork --print-pid 7 --pri
42823  p1  DL+    0:00.00 egrep hal|dbus

Attached is /etc/rc.conf.

/Joe


robert.

  

http://www.freebsd.org/gnome/docs/faq2.html#full-gnome




# -- sysinstall generated deltas -- # Sun Oct 23 06:00:05 2005
# Created: Sun Oct 23 06:00:05 2005
# Enable network daemons for user convenience.
# Please make all changes to this file, not to /etc/defaults/rc.conf.
# This file now contains just the overrides from /etc/defaults/rc.conf.
defaultrouter="192.168.1.1"
hostname="zircon.zircon.seattle.wa.us"
ifconfig_sk0="inet 192.168.1.3 netmask 255.255.255.0"
linux_enable="YES"
nfs_server_enable="YES"
nfs_client_enable="YES"
rpcbind_enable="YES"
rpc_statd_enable="YES"
rpc_lockd_enable="YES"
sshd_enable="YES"
usbd_enable="YES"
svscan_enable="YES"
moused_enable="YES" # Run the mouse daemon.
moused_type="auto"  # See man page for rc.conf(5) for available settings.
moused_port="/dev/ums0" # Set to your mouse port.
moused_flags="-m 3=1 -m 1=3 -m 4=6 -m 6=4 -m 5=7 -m 7=5"
mysql_enable="YES"
sendmail_enable="NO"
sendmail_submit_enable="NO"
sendmail_outbound_enable="NO"
sendmail_map_queue_enable="NO"
gdm_enable="YES"
dumpdev="NO"
# This file now contains just the overrides from /etc/defaults/rc.conf.
# Please make all changes to this file, not to /etc/defaults/rc.conf.

# Enable network daemons for user convenience.
ntpdate_flags=140.142.16.34
ntpdate_enable="YES"
ntpd_enable=YES
#amd_enable="YES"

dbus_enable="YES"
polkitd_enable="YES"
hald_enable="YES"

# The Fish generated deltas - Sat May  5 14:27:39 2007
weak_mountd_authentication="YES"
# added by mergebase.sh
local_startup="/usr/local/etc/rc.d"
cupsd_enable="YES"

apache22_enable="YES"

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: xorg 7.4 keyboard localisation (xorg.conf vs hal)

2009-02-02 Thread Joe Kelsey

Robert Noland wrote:


man xorg.conf search for Input...

  

This provides absolutely no help.

I look at my /var/log/Xorg.0.log and it tells me nothing.  If I remove 
the keyboard and mouse input devices from xorg.conf, the log tells me 
that it is disabling all input devices and never says anything else.  
There is no evidence that hal does anything that X wants to know about.  
How would I detect that my configuration file needs changing?  Is there 
a message in Xorg.0.log to look for?  Is there a help file somewhere 
which explains how to change your configuration file to allow hal to work?


I cannot find any information anywhere in the system to allow me to 
debug my problems in any way.  I want to have fully automatic 
configuration using whatever means will allow it.  Your explanations 
about the mysterious behavior of hal and xorg do not give me any 
information I can use in any way to solve my problems.




Set Options "AutoAddDevices" "off" and you have to configure everything
like you used to.
  
I WANT to use the new facilities.  Is it possible to debug my 
configuration problems?  Where do I start?  How do I enable this magical 
new world of letting hal do things for me?  What changes do I make to 
xorg.conf to allow this?


/Joe

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: xorg 7.4 keyboard localisation (xorg.conf vs hal)

2009-02-02 Thread Joe Kelsey

Robert Noland wrote:

On Sun, 2009-02-01 at 14:10 -0800, Joe Kelsey wrote:
  

Sebastien Chassot wrote:


On Sun, 2009-02-01 at 17:19 +, Daniel Bye wrote:
  
  

On Sun, Feb 01, 2009 at 05:42:39PM +0100, Sebastien Chassot wrote:



Hi,

I've upgraded to xorg7.4 and apparently keyboard and mouse are now
working with hald.

In xorg.conf changing the "old" keyboard config has no effect and I can't
find how to change it with hal. I've got  /usr/local/etc/hal/fdi/* but no
*keymap* and I don't know how to build such a file.
  
  

This should get you started:

[XML policy example stripped by the list archive; the only surviving value
is the layout name, gb]


Change the `gb' in the example to your local keymap name, save the file
as /usr/local/etc/hal/fdi/policy/x11-input.fdi and restart hald.
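
The XML in the example above did not survive the list archive; only the layout value `gb` remains. As a hedged reconstruction, a HAL policy of this kind usually matches keyboard-capable devices and merges an XKB layout option. The sketch below generates such a file. The key names (`info.capabilities`, `input.keyboard`, `input.x11_options.XkbLayout`) follow common HAL usage and are assumptions about what the original example contained, and `build_keyboard_fdi` is a made-up helper name.

```python
import xml.etree.ElementTree as ET


def build_keyboard_fdi(layout: str) -> str:
    """Return the text of an fdi policy that sets the X keyboard layout."""
    root = ET.Element("deviceinfo", version="0.2")
    device = ET.SubElement(root, "device")
    # Assumed match rule: apply the setting to anything HAL calls a keyboard.
    match = ET.SubElement(device, "match",
                          key="info.capabilities", contains="input.keyboard")
    # Assumed option key: the XKB layout handed to the X input driver.
    merge = ET.SubElement(match, "merge",
                          key="input.x11_options.XkbLayout", type="string")
    merge.text = layout
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"


if __name__ == "__main__":
    # Redirect the output to x11-input.fdi under the policy directory.
    print(build_keyboard_fdi("gb"), end="")
```

The output would be saved under the path named in the message and followed by a hald restart, as the message describes.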


This seems to have a way to enable HAL to detect a keyboard and export 
it to X, but what about mice?  My Xorg log tells me that it is ignoring 
my USB mouse in addition to ignoring my keyboard, so what sort of HAL 
file do I add to enable it to find my mouse?



The above is only to set keyboard layout, everything to detect the
keyboard is already present.

  
Where in HAL documentation is this information found?  R. Noland seemed 
to think it was a trivial process to make HAL do keyboards and mice?  In 
fact it is not trivial but a pain in the ass!  If you intend to inflict 
broken software on unsuspecting users you had better think through all 
of the problems and come up with explicit solutions to all of those 
problems so that everyone has a chance to make their systems work.



We (marcus and I) have gone to great pains to try and ensure that hal
behaves correctly in pretty much all mice configurations with or without
sysmouse.  If you don't want to use hal, set AutoAddDevices off and
configure away.

  

I did my best to follow ALL of the posted directions to absolutely NO AVAIL.

When I start Xorg, it explicitly tells me it is disabling all automatic 
devices and refuses to use HAL or any other detectable methods to find 
the mouse and/or keyboard.


There is no documentation ANYWHERE about how HAL is supposed to help in 
any of this.  There is no documentation ANYWHERE about what exactly the 
new Xorg is supposed to do about it.  There is no documentation ANYWHERE 
about the new, secret, hidden options that you can put in your xorg.conf 
file to disable this whole HAL mess.


The only documentation available ANYWHERE is the skimpy little paragraph 
that says, it works or it doesn't.  No explanation about why it works or 
doesn't or how to determine exactly what might be wrong in your 
configuration to make it work or not work.


I would not complain if you actually documented what you are inflicting 
on us rather than just say, here it is, good luck!  I understand how 
difficult some of these port upgrades are, but you have to realize that 
you have not provided ANY resources that anyone else can use to help 
themselves figure out their issues.


I don't want to pay you with money I do not have from a job I do not 
have.  You have to realize how many people may or may not have problems 
due to your blithe posting of this complicated mess.  Either explain how 
to use HAL properly to configure X resources or disable the capability.


Thank you for all of your effort so far.  I really do appreciate it.

/Joe

There had better not be any more surprises waiting in the X 1.6 wings to 
surprise and confound everyone again!



Are you going to stop paying me?  You have no idea how many combinations
of hardware and configurations for X  exist, or the amount of work that
goes into making all of those combinations work.

robert.

  

I'll start with that

Thank you


Sebastien


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: xorg 7.4 keyboard localisation (xorg.conf vs hal)

2009-02-01 Thread Joe Kelsey

Sebastien Chassot wrote:

On Sun, 2009-02-01 at 17:19 +, Daniel Bye wrote:
  

On Sun, Feb 01, 2009 at 05:42:39PM +0100, Sebastien Chassot wrote:


Hi,

I've upgraded to xorg7.4 and apparently keyboard and mouse are now
working with hald.

In xorg.conf changing the "old" keyboard config has no effect and I can't
find how to change it with hal. I've got  /usr/local/etc/hal/fdi/* but no
*keymap* and I don't know how to build such a file.
  

This should get you started:

[XML policy example stripped by the list archive; the only surviving value
is the layout name, gb]


Change the `gb' in the example to your local keymap name, save the file
as /usr/local/etc/hal/fdi/policy/x11-input.fdi and restart hald.

This seems to have a way to enable HAL to detect a keyboard and export 
it to X, but what about mice?  My Xorg log tells me that it is ignoring 
my USB mouse in addition to ignoring my keyboard, so what sort of HAL 
file do I add to enable it to find my mouse?


Where in HAL documentation is this information found?  R. Noland seemed 
to think it was a trivial process to make HAL do keyboards and mice?  In 
fact it is not trivial but a pain in the ass!  If you intend to inflict 
broken software on unsuspecting users you had better think through all 
of the problems and come up with explicit solutions to all of those 
problems so that everyone has a chance to make their systems work.


There had better not be any more surprises waiting in the X 1.6 wings to 
surprise and confound everyone again!


I'll start with that

Thank you


Sebastien


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Western Digital hard disks and ATA timeouts

2008-11-09 Thread Joe Kelsey

Søren Schmidt wrote:

On 7Nov, 2008, at 20:12 , Peter Wemm wrote:

On Thu, Nov 6, 2008 at 11:17 PM, Jeremy Chadwick <[EMAIL PROTECTED]> wrote:

[..]

As stated, FreeBSD's ATA command timeout is hard-set to 5 seconds, and
is not adjustable without editing the ATA code yourself and increasing
the value.  The FreeNAS folks have made patches available to turn the
timeout value into a sysctl.

Soren and/or others, please increase this timeout value.  Five seconds
has now been deemed too aggressive a default.  And please consider
migrating the timeout value into a sysctl.


The 5 second timeout has been a problem for quite a while actually.
I've had a number of instances where I've had to increase it to 20 or
30 seconds when recovering from marginal drives.  The longest
"successful" recovery attempt I've seen was 26 seconds, I believe on a
Maxtor drive a few years ago.   ("successful" == the drive spent 26
seconds but eventually successfully read the sector).  Even the IBM
death star drives could take much longer than 5 seconds to do a
recovery 5 years ago.  5 seconds has never been a good default.

I think the timeout should be increased to at least 30 seconds.  My
windows box has a timeout that goes for several minutes.

If there is concern about FreeBSD appearing to hang, I could imagine
that a console warning message could be printed after 5 seconds.  But
just say "drive has not yet responded".  But give it more time.

In this day and age we're generally not playing games with udma33 vs
66, notched cables, poor CRC support etc.  SATA seems to have
eliminated all that.  Hmm, it might make sense to increase the timeout
on SATA connections to 2 or 3 minutes by default.


Actually I do have a patch around that logs the timeout on the console 
after the normal timeout (5secs), then just goes on to wait for double 
the timeout and log again etc etc, final timeout was IIRC 60 secs but 
could be anything.
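
The escalating timeout described above might follow a schedule like the one below. This is only a sketch of the idea as stated in the message (warn at 5 seconds, double the wait each time, give up at a final limit). The function name, and the choice to treat the doubling as applying to total elapsed time and to clamp the last step to the limit, are assumptions and not details of the actual patch.

```python
def warning_schedule(first_timeout: int = 5, final_timeout: int = 60) -> list:
    """Return the elapsed times (seconds) at which a warning would be logged,
    ending with the time at which the request is finally failed."""
    if first_timeout <= 0 or final_timeout < first_timeout:
        raise ValueError("timeouts must be positive and ordered")
    times = []
    elapsed = first_timeout
    while elapsed < final_timeout:
        times.append(elapsed)   # log "drive has not yet responded"
        elapsed *= 2            # then wait until double the previous deadline
    times.append(final_timeout)  # give up here
    return times


if __name__ == "__main__":
    print(warning_schedule())  # [5, 10, 20, 40, 60]
```

With a larger `final_timeout`, such as the 2 or 3 minutes suggested earlier in the thread for SATA, the same schedule simply gains a few more warning steps.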

I have a disk which I am finally getting rid of that produces READ_DMA 
and WRITE_DMA errors at a pretty high rate.  I did enable the extra ATA 
error reporting and it doesn't seem to indicate any sort of actual 
errors, just extra long timeouts.


At one time, I did change the system to extend the timeout, but I did 
not see any real improvement at 30 seconds.  I suspect that an even more 
extended timeout would be necessary to solve the problem.


I am removing the disk this week.  Does anyone want a disk that produces 
DMA timeouts at a regular rate?  Would it help actually solve this problem?


Please let me know if you want such a beast and I will ship it to you.

/Joe

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: kern.maxdsiz on amd63 with i386 binaries

2008-10-25 Thread Joe Marcus Clarke
Chris Peterson wrote:
> Thanks Joe, that did it.
> 
> Out of curiosity, I don't see any of the compat tree in
> /boot/defaults/loader.conf, is there any place this is documented
> besides kernel sources? If not then I guess I should give something back
> to the community and change that :)

Not that I found.  I typically find stuff like this by running sysctl -a
and grepping for familiar patterns.
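
The approach described here (dump every sysctl and search for a familiar name) can be scripted. A minimal sketch, assuming `sysctl -a` prints one `name: value` pair per line; both helper names are hypothetical.

```python
import subprocess


def matching_sysctls(sysctl_output: str, pattern: str) -> dict:
    """Return {name: value} for lines whose sysctl name contains pattern."""
    found = {}
    for line in sysctl_output.splitlines():
        name, sep, value = line.partition(": ")
        # Lines without "name: value" shape (e.g. multi-line values) are skipped.
        if sep and pattern in name:
            found[name] = value
    return found


def live_sysctls(pattern: str) -> dict:
    """Run `sysctl -a` on the local machine and filter its output."""
    result = subprocess.run(["sysctl", "-a"], capture_output=True,
                            text=True, check=False)
    return matching_sysctls(result.stdout, pattern)
```

On an amd64 machine, `live_sysctls("maxdsiz")` would be expected to list `kern.maxdsiz` alongside any compat variant, which is how a knob like `compat.ia32.maxdsiz` can be found without documentation.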

Joe

> 
> Regards,
> Chris Peterson
> 
> On Oct 24, 2008, at 9:03 PM, Joe Marcus Clarke wrote:
> 
>> Chris Peterson wrote:
>>> Hello,
>>>
>>> I've got a handful of i386 boxes, and a handful of amd64 boxes running a
>>> 32-bit application, the reasons for this exact configuration mystify me
>>> as well as the deployment predates my time in the environment. Now that
>>> the dataset the application is loading is rapidly approaching 512MB
>>> we're starting to tweak kern.maxdsiz and kern.dfldsiz to 1GB.
>>>
>>> The i386 boxes are doing great, but we hit an issue with the amd64
>>> machines in that 64bit apps seem to work fine, but the 32bit apps
>>> running on the amd64 machines fail to be able to use more than the i386
>>> default of 512MB no matter what we set kern.maxdsiz to. I've also tried
>>> compiling it into the kernel, which results in the same issue.
>>>
>>> I tried starting the app with "limits -d 1090519040", and it seems to
>>> fail as well. Limits does show the proper value for datasize of
>>> 1064960 kB.
>>>
>>> We're locked into 32-bit binaries for this app at the moment thanks to
>>> some uh... interesting libraries it uses, so the usual option of
>>> recompile isn't available. I'd like to avoid traveling from San Jose to
>>> Seattle, then Virginia, then Munich to reinstall the amd64 machines with
>>> i386 machines if at all possible.
>>>
>>> Uh... help?
>>
>> Have you tried setting compat.ia32.maxdsiz?  I believe this will do what
>> you want.
>>
>> Joe
>>
>> -- 
>> Joe Marcus Clarke
>> FreeBSD GNOME Team::[EMAIL PROTECTED]
>> FreeNode / #freebsd-gnome
>> http://www.FreeBSD.org/gnome
> 
> ___
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "[EMAIL PROTECTED]"
> 


-- 
Joe Marcus Clarke
FreeBSD GNOME Team  ::  [EMAIL PROTECTED]
FreeNode / #freebsd-gnome
http://www.FreeBSD.org/gnome
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: kern.maxdsiz on amd63 with i386 binaries

2008-10-24 Thread Joe Marcus Clarke
Chris Peterson wrote:
> Hello,
> 
> I've got a handful of i386 boxes, and a handful of amd64 boxes running a
> 32-bit application, the reasons for this exact configuration mystify me
> as well as the deployment predates my time in the environment. Now that
> the dataset the application is loading is rapidly approaching 512MB
> we're starting to tweak kern.maxdsiz and kern.dfldsiz to 1GB.
> 
> The i386 boxes are doing great, but we hit an issue with the amd64
> machines in that 64bit apps seem to work fine, but the 32bit apps
> running on the amd64 machines fail to be able to use more than the i386
> default of 512MB no matter what we set kern.maxdsiz to. I've also tried
> compiling it into the kernel, which results in the same issue.
> 
> I tried starting the app with "limits -d 1090519040", and it seems to
> fail as well. Limits does show the proper value for datasize of 1064960 kB.
> 
> We're locked into 32-bit binaries for this app at the moment thanks to
> some uh... interesting libraries it uses, so the usual option of
> recompile isn't available. I'd like to avoid traveling from San Jose to
> Seattle, then Virginia, then Munich to reinstall the amd64 machines with
> i386 machines if at all possible.
> 
> Uh... help?

Have you tried setting compat.ia32.maxdsiz?  I believe this will do what
you want.
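
As a quick sanity check on the numbers quoted above: the value passed to "limits -d" and the datasize that limits reports do agree, assuming the argument is taken as a byte count (which is what the quoted output suggests). A minimal sketch; the helper name is only for illustration:

```python
def bytes_to_kib(n_bytes: int) -> int:
    """Convert a byte count to whole kilobytes (1024-byte units)."""
    return n_bytes // 1024


# Value passed to "limits -d" in the quoted message.
requested = 1090519040

print(bytes_to_kib(requested))   # 1064960, the figure limits reported in kB
print(requested / 2**20)         # 1040.0, i.e. a little over 1GB in MiB
```

So the limit itself was set as intended; the 512MB ceiling seen by the 32-bit binary has to come from somewhere else, which is why the separate compat.ia32 knob is worth trying.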

Joe

-- 
Joe Marcus Clarke
FreeBSD GNOME Team  ::  [EMAIL PROTECTED]
FreeNode / #freebsd-gnome
http://www.FreeBSD.org/gnome
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: 6.4 RC1 locks up solid on first reboot

2008-10-22 Thread Joe Koberg

Jeremy Chadwick wrote:

On Thu, Oct 23, 2008 at 06:27:45AM +0200, Milan Obuch wrote:
  
I did not investigate this issue too much, but there is a workaround - 
copy older /boot/loader over newer one. In my case, I am rebuilding whole






I have experienced loader troubles in the past when using customized 
compiler options in /etc/make.conf .  Rebuilding without compiler 
options fixed the issue.


Joe Koberg
joe at osoft dot us


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: fxp performance with POLLING

2008-10-07 Thread Joe Koberg

Pete French wrote:
However, ethernet at 100Mbit is 4B5B coded at a 125 MHz rate. So the raw 



Errr, 4B5B *is* 10 bits per byte surely?
...
Gig ether is mainly 8B10, as is Firewire, SATA, FibreChannel and a

Mind you, it assumes that you know the real bit rate, which in the
case of 100baseT is, as you say, actually 125 Mbits/sec.
  


You are right. It definitely is 10 bits per byte clocked at a higher 
rate. I guess the "100mbit/s" rate is so strongly associated with the 
technology that I glossed right over that.
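
To put numbers on the 4B5B point, here is a small sketch of the arithmetic. The constant and function names are made up for illustration; the only inputs are the 4-data-bits-in-5-line-bits coding and the 125 Mbit/s line rate mentioned in this thread.

```python
DATA_BITS_PER_SYMBOL = 4   # payload bits carried by one 4B5B code group
LINE_BITS_PER_SYMBOL = 5   # bits actually sent on the wire per code group


def line_bits_per_byte() -> int:
    """Line bits needed to carry one 8-bit byte under 4B5B."""
    symbols_per_byte = 8 // DATA_BITS_PER_SYMBOL
    return symbols_per_byte * LINE_BITS_PER_SYMBOL


def data_rate(line_rate_bits: float) -> float:
    """Payload bit rate left over after 4B5B coding overhead."""
    return line_rate_bits * DATA_BITS_PER_SYMBOL / LINE_BITS_PER_SYMBOL


print(line_bits_per_byte())               # 10
print(data_rate(125e6))                   # 100000000.0
print(125e6 / line_bits_per_byte())       # 12500000.0 bytes/s
```

So "10 bits per byte" holds on the wire, but only against the 125 Mbit/s line rate, not against the nominal 100 Mbit/s.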



Joe






___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: fxp performance with POLLING

2008-10-07 Thread Joe Koberg

Pete French wrote:

1 megabit = 10^6 = 1,000,000 bits which is equal to 125,000 bytes.



you are assuming eight bits per byte - but this is a serial line so
you should use ten bits per byte instead.

-pete.
  


That was a rule of thumb in the heyday of async serial lines, which used 
a start and stop bit per byte.


However, ethernet at 100Mbit is 4B5B coded at a 125 MHz rate. So the raw 
synchronous data rate really is 12.5Mbytes/s.  Minus the sync preamble 
of 8 bytes per packet and the mandatory inter-frame-gap of 12 bytes 
that's a physical layer rate of (12.5M * (1500/(1500+20))) or 12.34Mbyte/s.
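
The estimate above can be reproduced with a few lines of Python. This is only a sketch of the same back-of-envelope calculation: it counts the 8-byte preamble and 12-byte inter-frame gap as overhead against a 1500-byte packet, exactly as in the text, and (like the text) ignores the Ethernet header and FCS. The function name is hypothetical.

```python
def payload_rate(raw_bytes_per_s: float, payload: int, overhead: int) -> float:
    """Fraction of the raw byte rate left for payload after per-packet overhead."""
    return raw_bytes_per_s * payload / (payload + overhead)


PREAMBLE = 8           # sync preamble, bytes per packet
INTERFRAME_GAP = 12    # mandatory idle time, expressed in bytes

rate = payload_rate(12.5e6, 1500, PREAMBLE + INTERFRAME_GAP)
print(f"{rate / 1e6:.2f} Mbyte/s")   # 12.34 Mbyte/s
```

Smaller packets pay the same 20 bytes each, so the achievable rate drops as the packet size shrinks.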


Even in the later days of modems this rule applied less and less, 
because the modulation schemes became synchronous.


Joe Koberg
joe_at_osoft_dot_us


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: FreeBSD 7.0 + Xen 3.1 + HVM: Success!

2008-06-29 Thread Joe Auty
T-LOCKED]
psm0: [ITHREAD]
psm0: model IntelliMouse Explorer, device ID 4
ppc0: parallel port not found.
sc0:  at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 8250 or not responding
sio0: [FILTER]
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0:  at port 0x3c0-0x3df iomem 0xa-0xb on isa0
Timecounter "TSC" frequency 2793128576 Hz quality 800
Timecounters tick every 1.000 msec
hptrr: no controller detected.
ad0: 102400MB  at ata0-master WDMA2
acd0: CDROM  at ata1-master PIO3
GEOM_LABEL: Label for provider acd0 is iso9660/FreeBSD_Install.
Trying to mount root from ufs:/dev/ad0s1a


FreeBSD 7.0 pciconf -vl:
[EMAIL PROTECTED]:0:0:0: class=0x06 card=0x chip=0x12378086
rev=0x02 hdr=0x00
vendor = 'Intel Corporation'
device = '82440/1FX 440FX (Natoma) System Controller'
class = bridge
subclass = HOST-PCI
[EMAIL PROTECTED]:0:1:0: class=0x060100 card=0x chip=0x70008086
rev=0x00 hdr=0x00
vendor = 'Intel Corporation'
device = '82371SB PIIX3 PCI-to-ISA Bridge (Triton II)'
class = bridge
subclass = PCI-ISA
[EMAIL PROTECTED]:0:1:1: class=0x010180 card=0x00015853 chip=0x70108086
rev=0x00 hdr=0x00
vendor = 'Intel Corporation'
device = '82371SB PIIX3 IDE Interface (Triton II)'
class = mass storage
subclass = ATA
[EMAIL PROTECTED]:0:2:0: class=0x03 card=0x00015853 chip=0x00b81013
rev=0x00 hdr=0x00
vendor = 'Cirrus Logic'
device = 'CL-GD5446 64-bit VisualMedia Accelerator'
class = display
subclass = VGA
[EMAIL PROTECTED]:0:3:0: class=0xff8000 card=0x00015853 chip=0x00015853
rev=0x01 hdr=0x00
[EMAIL PROTECTED]:0:4:0: class=0x02 card=0x00015853 chip=0x813910ec 
rev=0x20
hdr=0x00
vendor = 'Realtek Semiconductor'
device = 'RT8139 (A/B/C/810x/813x/C+) Fast Ethernet Adapter'
class = network
subclass = ethernet

--
Freddie Cash
[EMAIL PROTECTED]




--
Joe Auty
NetMusician: web publishing software for musicians
http://www.netmusician.org
[EMAIL PROTECTED]
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: DVD-RW doesn't write

2008-06-10 Thread Joe Kelsey

Jerahmy Pocott wrote:


On loading atapicam module it says:
acd0: FAILURE - INQUIRY ILLEGAL REQUEST asc=0x24 ascq=0x00

I have never managed to use burncd with any drive.

In order to use atapicam, you must enable the pass? devices.  My 
devfs.conf contains:



# Commonly used by many ports
link    acd0    cdrom
link    acd0    dvd

# Allow a user in the wheel group to query the smb0 device
#perm   smb0    0660
perm    xpt0    0660
perm    pass0   0660
perm    pass1   0660

The xpt0 is left over from other experiments.  The pass? entries are 
required so that growisofs can be used without root access.
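
If you want to double-check what a devfs.conf like the one above actually asks for, a throwaway parser is enough. This is a rough sketch, assuming the simple three-field "action device argument" layout used in my file; the function name is made up and it does not touch /dev at all.

```python
def parse_devfs_conf(text: str) -> list[tuple[str, str, str]]:
    """Return (action, device, argument) tuples, skipping comments and blanks."""
    rules = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"unexpected line: {raw!r}")
        rules.append((fields[0], fields[1], fields[2]))
    return rules


sample = """
# Commonly used by many ports
link    acd0    cdrom
#perm   smb0    0660
perm    pass0   0660
"""

for action, device, arg in parse_devfs_conf(sample):
    print(action, device, arg)
```

A line with the fields run together (no whitespace between them) will raise the ValueError, which is a quick way to spot a mangled file.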


/Joe

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Closing the Jo Rhett argument

2008-06-09 Thread Joe Kelsey
Jo Rhett has clearly stated (in offline reply) that they do not 
participate in the -BETA and -RC cycles leading up to -RELEASE, so they 
therefore do not have any issues with -RELEASE and EoL to raise.


Actually, they still have the same complaints to raise about EoL, but 
since they refuse to participate in the -RELEASE process, they do not 
have valid points to raise.


I ask that everyone please stop communicating with the persona known on 
this list as "Jo Rhett" unless and until they participate in the -BETA, 
-RC, and -RELEASE process.  You cannot raise any sort of valid complaint 
about -RELEASE, EoL or bugs if you do not participate in finding bugs 
during the -BETA and -RC stages prior to the -RELEASE.  If you instead 
choose to try to run -RELEASE and find bugs then, then complain about 
the bugs you found and continue to complain that these bugs were not 
found by someone else and fixed ahead of time, you have no issues and do 
not deserve an answer, no matter how much you try to frame it as a 
"policy" question.


/Joe

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


6.3-RELEASE versus 5.2-RELEASE

2008-06-09 Thread Joe Kelsey
I think I have finally decoded Jo Rhett's issue.  It is very hard to 
decipher because the poster refuses to exactly identify their problem.


The entire problem comes down to the definition of -RELEASE.  Jo 
apparently feels that they can ONLY run -RELEASE branded code at their 
workplace.  That means that they cannot run any form of -STABLE.  
Therefore, they can only ever run 6.3-RELEASE and then only if no bugs 
were fixed after the official branding of 6.3-RELEASE.


I cannot speak at all about the branding of 6.3-RELEASE.  I run 
7.0-STABLE here.  What Jo seems to think is that a certain sequence of 
events occurred during the 6.3-RELEASE branding.  6.3-RELEASE was marked 
in the tree.  Sometime after this marking event occurred, bugs were 
identified and subsequently fixed in the -STABLE branch.  These bugs have 
been identified by Jo as SHOWSTOPPER bugs which will prevent them from 
ever using 6.3-RELEASE, since by their definition, they can only ever 
use the exact thing identified by the cvs tag of 6.3-RELEASE.


Therefore, by Jo's definition, they can never run 6.3-anything at their 
shop and are forced to wait for 6.4-RELEASE, whenever that happens.  
Therefore, they must take on the onerous duty of examining all security 
fixes targeted for 6.3 and redo them for 6.2.


Basically, they do not wish to do this and protest the EoL status given 
to 6.2 because they are physically prevented from using 6.3.  They 
refuse to even try to identify whether or not 6.3-RELEASE actually has 
any bugs that affect them, they just assume that the presence of bugs 
fixed AFTER the tagging of 6.3-RELEASE in cvs certifies their inability 
to use the actual 6.3-RELEASE code, since they can apparently only run 
binary releases direct from FreeBSD and cannot "roll their own" for some 
unknown reason.  They are also, apparently, prohibited from testing any 
code locally due to some unknowable reason.


Can anyone verify that some number of bugs related to either a) 
gmirror, b) bge and/or c) twe were fixed after the release of 6.3?  That 
is as far as I can tell the reason that Jo objects to EoL of 6.2, the 
fact that 6.3 is unusable due to these late-fixed bugs.


/Joe

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: cvsup.uk.FreeBSD.org

2008-05-11 Thread Dr Joe Karthauser
--- Original message ---
From: Ollivier Robert <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Sent: 8.5.'08,  8:35

> Stefan Lambrev disait :
> > cvsup.uk.FreeBSD.org is outdated.
> > I know this is not the proper list, but which one is?
> 
> freebsd-hubs is, redirected.
> 
> I've noticed that recently but I should have sent a mail about it, sorry.
> -- 
> Ollivier ROBERT -=- FreeBSD: The Power to Serve! -=- 
[EMAIL PROTECTED]
> Darwin sidhe.keltia.net Version 9.2.0: Tue Feb  5 16:13:22 PST 2008; i386
> 

Hey guys. 

I have reclassified this faulty mirror as cvsup1 and made cvsup a cname to 
cvsup3, which is the most recent addition and best hardware available. In 
the future we will always point to the most available machine in this way.

Joe

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: HP ProLiant DL360 G5 success stories?

2008-03-12 Thread Joe Koberg

Johan Ström wrote:
But.. 
http://bizsupport.austin.hp.com/bc/docs/support/SupportManual/c00553302/c00553302.pdf seems 
to tell me that in basic mode I can only access BIOS (pre-OS) using 
the Remote Console feature, and that after POST I have to have the 
advanced licensed option?




I don't do the purchasing and we get all Advanced iLO, so I will take 
your word for it.  The older generations supported text console (I have 
a 360G2 that does so).  We use the HP Management agents under Windows 
for all SNMP reporting so I can't comment on the reporting method under 
other OS's.




Joe


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: HP ProLiant DL360 G5 success stories?

2008-03-12 Thread Joe Koberg
The iLO is a completely separate management processor with its own 
network port. It runs its own OS and has its own IP address. It runs an 
SSL webserver for access.  The iLO is accessible over the network any 
time the machine is plugged into power.  I am not sure about IPMI access 
to it.


The "normal" iLO option will give you exact textual console screen 
output and keyboard control from the moment of power-on.  It will also 
let you toggle power and hit the reset button. I believe it uses a java 
applet in the browser.


The "advanced" iLO option, which is license-key-unlocked, also provides 
graphical remote console, and virtual media. You can upload a CD or 
floppy image and then boot the server from it.  I suspect the 
compatibility issue appears here - the virtual media probably emulates 
USB mass storage, and the OS must be able to boot from it.


It has full reporting of hardware state and management log details, and 
the "home page" is a big summary with any faults outlined in red.


In this data center we probably have 1500 HP machines with iLO. I find 
it an effective and reliable remote access method.  We definitely prefer 
using it to our Avocent IP KVMs.




Joe Koberg
joe at osoft dot us





Johan Ström wrote:
First of all, nice with all these positive answers! Thank you all 
(without responding to each and every post:))!



On Mar 12, 2008, at 12:35 PM, Pete French wrote:


What I'm looking at is a DL360 G5, probably with one E5335 (quad 2.0)
and 4G of RAM and 4x 146Gb SAS disks on the Smart Array P400i card.

...
So.. Does anyone have any experience with this combo (DL360 G5 / 
P400i)?


We have around 20 machines like that and they work beautifully. We
run 7.0/amd64 on the machines now, but we have run 6.2/i386 in the past
and that worked fine - though you will only be able to use the first
3.5 gig of RAM.


I don't have any plans on running i386, running amd64 on the 
supermicro box now without any problems (that I can relate to that at 
least).


How long have you run 7.0 (before release)?  From all the other 
responses it seems lots of ppl use 7.0 on these without any problems 
at all.






Furthermore, anyone run 7.0 on this? Or should I still stick with


We run 7.0 on these machines and it works fine - I always prefer 7.0
to 6.3 on SMP machines as it performs better. Also 7.0 works well with
the iLO on these machines - I seem to recall when I installed 6.X that
it didn't work too well and I had to use boot floppy images. I'd say
go for 7.0 and amd64 if you can.


This is where I'm a bit curious. What OS interaction does iLO do? That 
needs to be "compatible" i mean.
On my current box I got a IPMI card that gives me (when its working..) 
SOL capabilities.. To what degree can I remote control with iLO? If 
I've understood correct, I get the exact console as on screen with kb 
access, over web/ssh/telnet. Is this working good? This is one of my 
important points for changing since its so crappy on my current box, 
and when the box is a couple of miles away its quite nice to have it 
working flawlessly..
iLO over internet? Possible, impossible? Encryption? (yes i know, not 
exactly freebsd related questions but.. )



Another thing, how is it with physical monitoring? 
Temperatures/fanspeeds/voltage?


Thank you (all)! :)

--
Johan



___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Analysis of disk file block with ZFS checksum error

2008-03-04 Thread Joe Peterson
Eric Anderson wrote:
> I'm starting to think there is a timing issue or some such problem with 
> ZFS, since I can use the same drives in a gmirror with UFS, and never 
> have any data problems (md5 checksums confirm it over-and-over).  I 
> highly doubt that everyone is seeing similar issues and it just is 
> because ZFS is so intense.  I've had plenty of systems under severe disk 
> load that have never exhibited corrupt files because of something like 
> this.

I also wondered this - i.e. if ZFS was triggering a certain timing
behavior that revealed the problem.  Still, if this is the case, it
seems to me that the problem lies in the ATA subsystem, since it should
prevent higher-level things like ZFS from being able to create bad timings
(or am I not thinking of this correctly?).

Also, I think there were some reports of problems with DMA/ATA when
*not* using ZFS.

> I wish we could get our hands on this issue..  Seems like some common 
> threads are ATA/SATA disks.  Is your setup running 32bit or 64bit 
> FreeBSD?  (if you already mentioned it, I'm sorry, I missed it)

This was on 32bit FreeBSD with PATA.  I am the one who had no SMART
issues and no DMA errors reported under Linux.  Changing the cable may
have "fixed" it, since I did not see errors in some further testing, but
even if so, my theory is that there is some edge case (timing?) that the
FreeBSD ATA drivers were sensitive to, and perhaps my change of cables
pushed the problem to the other side of the threshold.  Since I never
saw errors under Linux (and I've been using that cable for a couple of
years), I do not necessarily think the cable was actually "defective".

-Joe
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: 7.0-STABLE amd64 kernel trap during boot-time device probe

2008-03-01 Thread Joe Koberg


Jeff Blank wrote:

Hello,

I posted this around 3 months ago and never received a response.  the
problem still occurs with 7.0-STABLE (csup on 20080301).  I possibly
incorrectly referred to it as a panic last time, when the problem was
really a trap.

  


I also receive "Fatal trap 12: page fault while in kernel mode" while 
trying to boot a HP Proliant DL580G3 from the 7.0-RELEASE amd64 disc1 CD.


I can successfully boot with the verbose boot option from the boot CD, 
and I installed the system and got it all set up for ZFS root. As long as 
I booted verbose it worked.


But now I have recompiled the kernel to include SCHED_ULE and a few 
options and I cannot avoid the "Fatal trap 12"


It is annoying to troubleshoot on this machine because the BIOS takes 5 
minutes to finally get around to booting the OS after a reboot. But it has 
an iLO management controller that I might be able to arrange access to 
for anyone who has the skill to find/fix the issue.


Joe Koberg
joe at osoft dot us





Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x258
fault code  = supervisor read data, page not present
instruction pointer = 0x8:0x8047aa7e
stack pointer   = 0x10:0xa0677b40
frame pointer   = 0x10:0xa0677b60
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 23 (irq21: ohci0+)
[thread pid 23 tid 100029 ]
Stopped at  0x8047aa7e = _mtx_lock_sleep+0x4e:  movl
0x258(%rcx),%esi
db>
=== end panic ===

=== no panic ===
[...]
ums0:  
on uhub0
ums0: 5 buttons and Z dir.
ukbd0:  on uhub0
kbd2 at ukbd0
Timecounters tick every 1.000 msec
firewire0: 1 nodes, maxhop <= 0, cable IRM = 0 (me)
firewire0: bus manager 0 (me)
acd0: DMA limited to UDMA33, device found non-ATA66 cable
acd0: DVDR  at ata0-master UDMA33
ad4: 238475MB  at ata2-master SATA300
ad8: 157066MB  at ata4-master SATA300
ad10: 157066MB  at ata5-master SATA300
ar0: 314133MB  status: READY
ar0: disk0 READY using ad8 at ata4-master
ar0: disk1 READY using ad10 at ata5-master
SMP: AP CPU #1 Launched!
Trying to mount root from ufs:/dev/ad4s1a
[continue successful boot]
=== end no panic ===

- End forwarded message -

  


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: is there any raid5 in software in FreeBSD ?

2008-02-19 Thread Joe Peterson
ZFS has RAIDZ - very similar to RAID5 (with added features), if you
don't mind ZFS's current experimental state.

    -Joe


Nenhum_de_Nos wrote:
> i've seen RAID 0 through 3 (skip 2 ;) )
> 
> thanks,
> 
> matheus
> 
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Multiple key presses are hindered when repeat turned off

2008-02-19 Thread Joe Peterson
I have verified this on two machines, but it would be helpful if others
out there can reproduce it too.  Also, I do not know if it is Xorg or
the FreeBSD keyboard drivers, since I see no way to reproduce on the
console (i.e. turn off repeat).

In an xterm, type: "xset r off".  Then try some multiple-key
combinations (i.e. keep holding first key(s) when you type the next one):

po (o does not appear)
lk (k does not appear)
grep (e does not appear)

When you release the keys, the press events will show up.

Keyboards in general have limited multiple-key (rollover) capabilities,
but using "xset r off" reduces these to the point that you will often
mistype things, and it seems unique to FreeBSD.  I am using 7.0-RC2 at
the moment.

    Thanks, Joe
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Revisiting jerky/freezing mouse issue in 7.0

2008-02-18 Thread Joe Peterson
I spent some time looking again at a trace I posted last month showing
mouse "jerkiness/freezing" under load (note that I see it all of the
time under light load too, but it is harder to reproduce on demand).
Here's the trace:

http://www.skyrush.com/downloads/ktr_ule_4.out

The large stretches of yellow in the Xorg process are what trouble me.
Clearly, Xorg is yielding processor time mostly to, in this case, xtrs,
which is getting a whole lot of time.  If you look at the fairly regular
mouse events, you'll notice that moused runs for a short time on each
mouse event from psm0 and then sleeps.  This makes sense, and it appears
moused is acting correctly.  But many of these mouse events are
seemingly ignored by Xorg, which spends most of its time yielding
(yellow) and not getting "woken up" by the events to simply process
them.  I've noticed, also, that Xorg can "get behind" easily and spend
its time catching up on event processing for a while after I stop using
the mouse.  It just doesn't seem to be getting an appropriate amount of
CPU time, or at least it yields too long between runs, to make
interactivity smooth.  These yields, I believe, are the freezes I see.
Here's a question: does Xorg "respond" to mouse events, or does it just
wake up every now and then and check?

Note that even when Xorg runs, it only runs for a very short time.  If
the ULE scheduler is being fair, I would think this might give Xorg
*more* of a share of the CPU to use to service these events, since it is
running a lot less than xtrs.

One interesting point is at timestamp 1478223777518.  It looks like Xorg
*starts* to yield when moused runs.  Here's the line:

1478223777518 sched_add: 0xa7be1660(Xorg) prio 160 by 0xa5eb7aa0(moused)

Does this mean that moused *caused* Xorg to yield (or am I reading this
incorrectly)?  The yield then lasts through a series of mouse moves.  A
quick look through the graph shows that this happens quite a bit, which
seems like the reverse of what we'd like.
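
For anyone who wants to count how often this pattern shows up in a trace rather than eyeballing the graph, here is a rough sketch that pulls the fields out of a sched_add line. The regular expression is my own guess based only on the one line quoted above, so it may well need adjusting for other record types in the dump; the function name is made up.

```python
import re

# Pattern inferred from the single sched_add line quoted above (an assumption).
SCHED_ADD = re.compile(
    r"^(?P<ts>\d+)\s+sched_add:\s+0x[0-9a-f]+\((?P<thread>[^)]+)\)"
    r"\s+prio\s+(?P<prio>\d+)\s+by\s+0x[0-9a-f]+\((?P<by>[^)]+)\)$"
)


def parse_sched_add(line: str):
    """Return a dict of fields for a sched_add record, or None if no match."""
    m = SCHED_ADD.match(line.strip())
    if m is None:
        return None
    return {
        "timestamp": int(m["ts"]),
        "thread": m["thread"],
        "prio": int(m["prio"]),
        "added_by": m["by"],
    }


line = "1478223777518 sched_add: 0xa7be1660(Xorg) prio 160 by 0xa5eb7aa0(moused)"
print(parse_sched_add(line))
```

Filtering the parsed records for thread == "Xorg" and added_by == "moused" would give the timestamps to line up against the yellow stretches.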

This issue (especially since it does not even require continuous heavy
CPU use to see) is a constant distraction while using the system, and
again I want to volunteer my time to help track it down.  I am not sure
how to further delve into it, so if there is some additional data I can
gather, please let me know, and I'll gladly do it.

Thanks, Joe
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: mount of ext2fs volume stuck in "D+" state (disk uninterruptible wait)

2008-02-11 Thread Joe Peterson
New information: it looks as though this ext2fs was already mounted when
the mount was attempted.  I have reproduced the issue by simply trying
to mount the ext2fs volume more than once.  Given this, I'd expect the
mount to return an already mounted error rather than hanging, so this is
perhaps a straightforward bug.

    -Joe

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: mount of ext2fs volume stuck in "D+" state (disk uninterruptible wait)

2008-02-11 Thread Joe Peterson
Kris Kennaway wrote:
> Joe Peterson wrote:
>> I just tried (under FreeBSD 7.0-RC1) to mount an ext2fs volume - I've
>> mounted it before with no trouble on this same FreeBSD version.  This
>> time, mount appeared to hang.  I noticed that I can see the contents of
>> the volume under the mount point, so the mount seemed to "work", but the
>> process is stuck.  "ps" shows:
>>
>> root   1307  0.0  0.0  3156   792  p6  D+    5:21PM   0:00.00 mount
>> /mnt/linux-home
>>
>> The "ps" man page says that "D" means: "Marks a process in disk (or
>> other short term, uninterruptible) wait."
>>
>> Is there any way I can investigate what is going on?  I cannot umount
>> (device busy) or break out of the mount command...
> 
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html

But unfortunately I do not have KDB and DDB compiled into the kernel.
And, obviously, if I reboot, I will lose this opportunity.  I suspect
this to be an intermittent thing.  Is there anything I can extract while
the system is running that would be useful?

Thanks, Joe
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


mount of ext2fs volume stuck in "D+" state (disk uninterruptible wait)

2008-02-11 Thread Joe Peterson
I just tried (under FreeBSD 7.0-RC1) to mount an ext2fs volume - I've
mounted it before with no trouble on this same FreeBSD version.  This
time, mount appeared to hang.  I noticed that I can see the contents of
the volume under the mount point, so the mount seemed to "work", but the
process is stuck.  "ps" shows:

root   1307  0.0  0.0  3156   792  p6  D+    5:21PM   0:00.00 mount
/mnt/linux-home

The "ps" man page says that "D" means: "Marks a process in disk (or
other short term, uninterruptible) wait."

Is there any way I can investigate what is going on?  I cannot umount
(device busy) or break out of the mount command...
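
In case it helps anyone reproduce this, here is a small sketch for picking the "D" state processes out of ps output. It assumes three-column output (pid, state, command) with a header line, such as "ps -axo pid,state,command" should produce; the function name and sample text are made up for illustration, and the sketch only parses text rather than running ps itself.

```python
def uninterruptible(ps_output: str) -> list[tuple[int, str, str]]:
    """Return (pid, state, command) for processes whose state starts with D."""
    found = []
    for line in ps_output.splitlines()[1:]:      # skip the header line
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        pid, state, command = parts
        if state.startswith("D"):
            found.append((int(pid), state, command))
    return found


sample = """  PID STAT COMMAND
 1201 Ss   /usr/sbin/syslogd -s
 1307 D+   mount /mnt/linux-home
 1350 R+   ps -axo pid,state,command
"""

print(uninterruptible(sample))
```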

Thanks, Joe


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Analysis of disk file block with ZFS checksum error

2008-02-11 Thread Joe Peterson
Gavin Atkinson wrote:
> Are the datestamps (Thu Jan 24 23:20:58 2008) found within the corrupt
> block before or after the datestamp of the file it was found within?
> i.e. was the corrupt block on the disk before or after the mp3 was
> written there?

Hi Gavin, those dates are later than the original copy (I do not have
the file timestamps to prove this, but according to my email record, I
am pretty sure of this).  So the corrupt block is later than the
original write.

If this is the case, I assume that the block got written, by mistake,
into the middle of the mp3 file.  Someone else suggested that it could
be caused by a bad transfer block number or bad drive command (corrupted
on the way to the drive, since these are not checksummed in the
hardware).  If the block went to the wrong place, AND if it was a HW
glitch, I suppose the best ZFS could then do is retry the write (if its
failure was even detected - still not sure if ZFS does a re-check of the
disk data checksum after the disk write), not knowing until the later
scrub that the block had corrupted a file.

I think that anything is possible, but I know I was getting periodic DMA
timeouts, etc. around that time.  I hesitate, although it is tempting,
to use this evidence to focus blame purely on bad HW, given that others
seem to be seeing DMA problems too, and there is reasonable doubt
whether their problems are HW related or not.  In my case, I have been
free of DMA errors (cross your fingers) after re-installing FreeBSD
completely (giving it a larger boot partition and redoing the ZFS slice
too), and before this, I changed the IDE cable just to eliminate one
more variable.  Therefore, there are too many variables to reach a firm
conclusion, since even if the cable was "bad", I never saw one DMA error
or other indication of anything wrong with HW from the Linux side (and
I've been using that HW with both Linux and FreeBSD 6.2 for months now -
no apparent flakiness of any kind on either system).  So either it *was*
bad and FreeBSD 7.0 was being more "honest", FreeBSD's drivers and/or
ZFS was stressing the HW and revealing weaknesses in the cable, or it
was a SW issue that got cleared somehow when I re-installed.

Is it possible that the problem lies in the ATA drivers in FreeBSD or
even in ZFS and just looks like HW issues?  I do not have enough
info/expertise to know.  If not, then it may very well be true that HW
problems are pretty widespread (and that disk HW cannot, in fact, be
trusted), and there really *is* a strong need for ZFS *now* to protect
our data.  If there is a possibility that SW could be involved, any
hints on how to further debug this would be of great help to those still
experiencing recent DMA errors.  I just want to be more sure one way or
the other, but I know this issue is not an easy one (however, it's the
kind of problem that should receive the highest priority, IMHO).

        -Joe
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Analysis of disk file block with ZFS checksum error

2008-02-08 Thread Joe Peterson
Julian Elischer wrote:
> it could be an old file..
> what kind of disks?

It's a Seagate ST3500630A parallel ATA drive.

> I had a scenario where 3ware controllers were just failing to write to
> a drive in the array, so old data showed through.

I have an Intel ICH4 controller - nothing unusual.

> the filesystem and the partitions and the raids all were on different
> alignments so the only part of the system that had a boundary that 
> aligned with the bad data was the physical stripes laid down by the 
> controller.  It was 64k stripes and 64k data missing, exactly on
> stripe boundaries. Due to the fact that FreeBSD had partitioned the 
> drive starting at 63 blocks in, nothing else aligned with the problem.

Hmm, well this is a straight-forward disk situation - never used RAID on
this drive.  Given what is happening, I wonder about the chances of it
being a HW, OS, or filesystem issue.
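
The alignment argument in the quoted text is easy to check with a little arithmetic. The sketch below assumes 512-byte sectors and, purely for illustration, 16 KB filesystem blocks; neither figure is stated in the thread, only the 63-block partition offset and the 64k stripe size are.

```python
SECTOR = 512                 # assumed sector size in bytes
STRIPE = 64 * 1024           # 64k controller stripe, from the quoted text
FS_BLOCK = 16 * 1024         # assumed filesystem block size, illustration only

partition_start = 63 * SECTOR          # partition begins 63 blocks in
print(partition_start)                 # 32256
print(partition_start % STRIPE)        # 32256, so not on a stripe boundary

# No filesystem block in the partition starts exactly on a stripe boundary.
aligned = [
    k for k in range(10000)
    if (partition_start + k * FS_BLOCK) % STRIPE == 0
]
print(aligned)                         # []
```

That is why, in that scenario, damage that lined up exactly with 64k boundaries pointed at the controller rather than at anything the OS laid out.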

        -Joe
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Analysis of disk file block with ZFS checksum error

2008-02-08 Thread Joe Peterson
Chris Dillon wrote:
> That is a chunk of a Mozilla Mork-format database.  Perhaps the  
> Firefox URL history or address book from Thunderbird.

Interesting (thanks to all who recognized Mork).  I do use Firefox and
Thunderbird, so it's feasible, but how the heck would a piece of one of
those files find its way into 1/2 of a ZFS block in one of my mp3 files?
   I wonder if it could have been done on write when the file was copied
to the ZFS pool (maybe some write-caching issue?), but I thought ZFS
would have verified the block after write.  It seems unlikely that it
would get changed later - I never rewrote that file after the original
copy...

        -Joe


Re: Analysis of disk file block with ZFS checksum error

2008-02-08 Thread Joe Peterson
Mark Day wrote:
> Based on the subset of data you posted, the bad data looks like ASCII
> text.
> The bad data from offset a0000 to a000f is:
>
> ${138AFE{@
> @$$}1
>
> The bad data from offset af6c1 to af6c8 is:
>
> 392A9}@
>
> I don't recognize the content beyond that, but I'd guess that somehow
> the
> contents of some other file managed to overwrite that portion of the bad
> file.  As for how that happened, I don't know.  But if someone
> recognizes
> where the bad content came from, that might be a clue.


Gary/Mark,

Good eye!  Yes, it indeed does appear to be ASCII.  I *thought*
something in the repetition when I originally did an od -a looked
interesting.

I dumped the whole bad section as a string, and here's (partly) what I get:

${138AFE{@
@$$}138AFE}@

@$${138AFF{@
[A3:^80(^91^2146F)]
@$$}138AFF}@

@$${138B00{@
@$$}138B00}@

@$${138B01{@
[181:^80(^91^2146F)]
@$$}138B01}@

@$${138B02{@
@$$}138B02}@

@$${138B03{@
[2C:^80(^91^2146F)]
@$$}138B03}@

@$${138B04{@
@$$}138B04}@

.
.
.

@$${138B8B{@
<(21470=Thu Jan 24 23:20:58 2008)>
[117:^80(^91^21470)]
@$$}138B8B}@

.
.
.

@$${138C18{@
<(21472=1201242069)>[-2:^80(^82^85)(^83^1B5)(^84=b)(^85=1)(^86=0)(^87=0)
(^88=0)(^89^2146C)(^8A=)(^8B=40)(^8C=2e)(^8D^84)(^8E=0)(^90^21472)
(^91^21460)]
@$$}138C18}@

@$${138C19{@
<(21473=a72f78)>[2:^80(^89^21473)]
@$$}138C19}@

@$${138C1A{@
@$$}138C1A}@

.
.
.


and more of the same.  Note the date string.  There are several like
that.  Anyone recognize this text format?
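
Dumping a byte range as text, as described above, can be sketched in a few lines of Python.  This is only an illustration, not the tool actually used in this thread; the function name `printable_text` and the sample bytes are made up for the example.

```python
def printable_text(data, start, length):
    """Decode data[start:start+length] as ASCII, showing other bytes as '.'."""
    chunk = data[start:start + length]
    return "".join(
        chr(b) if 32 <= b < 127 or b in (9, 10) else "."
        for b in chunk
    )


# Made-up sample: two binary bytes, then text shaped like the dump above.
sample = b"\xff\xfb" + b"@$${138AFE{@\n@$$}138AFE}@\n" + b"\x00\x00"
print(printable_text(sample, 2, 26))
```

For a real file, the bytes would come from something like open(path, "rb").read(), with the start offset and length of the suspect region passed in.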

-Joe


Analysis of disk file block with ZFS checksum error

2008-02-08 Thread Joe Peterson
In my experimentation with the ZFS filesystem, I encountered one case of
a file block with a checksum mismatch.  Doing a "zpool scrub" revealed
it, and trying to read the file yielded an error - only the part of the
file before the bad block was read (ZFS aborts reading at this point,
which makes sense), resulting in a short file.  The reason the CKSUM
error is not fixable is because my ZFS pool contains only one device (no
mirror or RAIDZ), but I do have the original/good version of the file
affected.  Here's the output of zpool status (new scrub in process):

  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress, 64.36% done, 0h18m to go
config:

NAME        STATE     READ WRITE CKSUM
tank        ONLINE       0     0     2
  hda6      ONLINE       0     0     2

errors: Permanent errors have been detected in the following files:

/mnt/tank/fbsd/home/joe/music/jukebox/christmas/Esquivel/
Merry_XMas_from_the_SpaceAge_Bachelor_Pad/07-Snowfall.mp3


I was curious about what actually happened: was this a ZFS bug, trouble
with its metadata, or truly a bad block?  In order to determine this, I
modified ZFS's source code temporarily to ignore the checksum mismatch
and let the file read fully.  What I then got was the full-length file
and no errors, showing that there were no disk read errors associated
with the read (I already had assumed this from the fact that zpool
status showed only a non-zero CKSUM count).  However, I may have seen
other error counts previously (ZFS resets them to zero on, e.g.,
reboot).  I received no errors when originally copying this file *to*
the ZFS pool - only on subsequent reads/scrubs.

(Note that I have posted before about DMA errors in my log for the disk
I am using, but I have had nothing but successful SeaTools tests
(surface scans) of the drive.  Jeremy Chadwick had similar issues, as
did others, so I think it is worth investigating if there is some
OS/software cause rather than real HW issues.  This is one reason I
wanted to investigate my ZFS checksum issue more deeply.)

I also have a good backup of the file in question, so I now have two
copies of the file: one good, and one with a bad block.  The file is
3575936 bytes long, and recordsize (in ZFS) is 128K, making the file
about 27 blocks long.  Curiously, the bad section of the file is exactly
65536 bytes long (1/2 a block).  The bad block starts at exactly the 5th
128K block (byte 655360 or hex a0000).
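
The block arithmetic in this paragraph can be checked with a short Python sketch.  The file size, recordsize and bad-section length come from the message; treating "the 5th block" as zero-based index 5 is an assumption, chosen because it agrees with the file positions (0009fff0 and up) in the dump further down.

```python
FILE_SIZE = 3575936        # bytes, from the message
RECORDSIZE = 128 * 1024    # ZFS recordsize of 128K
BAD_BLOCK_INDEX = 5        # assumption: zero-based block index
BAD_LENGTH = 65536         # length of the bad section in bytes


def block_span(index, recordsize=RECORDSIZE):
    """Return (start, end) byte offsets of a block; end is exclusive."""
    start = index * recordsize
    return start, start + recordsize


blocks = FILE_SIZE / RECORDSIZE
start, end = block_span(BAD_BLOCK_INDEX)
bad_end = start + BAD_LENGTH

print(f"file spans about {blocks:.2f} blocks")
print(f"block {BAD_BLOCK_INDEX} covers {start:#x}..{end - 1:#x}")
print(f"bad half covers {start:#x}..{bad_end - 1:#x}")
```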

I wanted to see the characteristics of the bad data.  Was just one bit
flipped randomly?  No.  Is it just one bit or set of bits in the bytes
that are affected?  It doesn't seem so.  Were there any other strange
patterns here?  Well, yes, and maybe someone out there with more
knowledge/experience in disk modes of failure will recognize something
(I have included some data below).

For one thing (as I mentioned), only 65536 bytes are bad (and it's
exactly this many, with a few "good" bytes thrown in, but not far from
the number of matches random chance would produce).  Also, all bad bytes have a
zero in the high bit - interesting?  Also, near the end of the block,
the bad bytes all go to zero, strangely coincident with the first "good"
zero in that bad block - not sure if that's coincidence or not.  Also, I
calculated the number of "Bits same" (matching bits) in the good vs. bad
bytes, and it appears fairly random, so it appears that the bad bytes
are very random in nature and not correlated much at all with the good
bytes.
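
For anyone wanting to repeat this kind of comparison, here is a minimal Python sketch of the checks described above (matching bits per byte, and whether the bad bytes all have a clear high bit).  It is not the program used for this post; the function names and the demonstration bytes are invented for the example.

```python
def bits_same(good, bad):
    """Number of bit positions (0-8) at which two byte values agree."""
    return 8 - bin((good ^ bad) & 0xFF).count("1")


def format_row(pos, good, bad):
    """One report row: position, both bytes in hex and binary, matching bits."""
    match = "Yes" if good == bad else "No"
    return (f"{pos:08x}  {good:02x}  {bad:02x}  {match:<3}  "
            f"{good:08b}  {bad:08b}  {bits_same(good, bad)}")


def summarise(good_data, bad_data, start, length):
    """Summarise the differing bytes in [start, start + length)."""
    pairs = list(zip(good_data[start:start + length],
                     bad_data[start:start + length]))
    diff = [(g, b) for g, b in pairs if g != b]
    mean = sum(bits_same(g, b) for g, b in diff) / len(diff) if diff else 8.0
    return {
        "compared": len(pairs),
        "differing": len(diff),
        "bad_high_bit_clear": all(b < 0x80 for _, b in diff),
        "mean_bits_same": mean,
    }


# Tiny in-memory demonstration with made-up bytes, not data from the real file.
good = bytes([0xD9, 0x05, 0xC1, 0x81])
bad = bytes([0xD9, 0x24, 0x7B, 0x31])
for i, (g, b) in enumerate(zip(good, bad)):
    print(format_row(i, g, b))
print(summarise(good, bad, 0, len(good)))
```

Two unrelated, uniformly random bytes agree in 4 of 8 bits on average, so a mean near 4 over the differing bytes is what uncorrelated data would look like.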

So except for the fact that the 2nd half (65536 bytes) of the ZFS block
is good, the bad block seems to consist of random data, except for the
string of zero bytes near the end and the zero high-bit.  It's not as if
one bit on the disk flipped - it affects the whole (1/2) block.  Does
this seem like a disk error, a controller error/bug, or a cable problem
(I recently put a new cable on, so I doubt the last)?  It seems to me
something more systemic rather than a random bit error - opinions are
more than welcome.

Here is some info from a python program I wrote to look at the data
(I've left out spans of essentially uninteresting portions showing
similar stuff, but I can get you the whole thing if interested):

File pos  Good  Bad  Match  Good (bin)  Bad (bin)  Bits same
0009fff0  d9    d9   Yes    11011001    11011001   8
0009fff1  05    05   Yes    00000101    00000101   8
0009fff2  c1    c1   Yes    11000001    11000001   8
0009fff3  81    81   Yes    10000001    10000001   8
0009fff4  5f    5f   Yes    01011111    01011111   8
0009fff5  66    66   Yes

Re: Frequent USB mouse disconnections under load with RELENG_7

2008-02-01 Thread Joe Peterson
Wayne Sierke wrote:
> On Fri, 2008-01-25 at 01:59 +1030, Wayne Sierke wrote: 
>> I'm getting a lot of USB mouse disconnects on RELENG_7. I wondered
>> whether they might have been due to running with a KTR-enabled kernel
>> but in just the last 7 hours I've been running on stock GENERIC and
>> they're still happening.

Hey Wayne,

I'm not sure if you are associating the disconnects with the "jerky mouse"
behavior, but as an added datapoint, I have a PS/2 mouse, I see *no*
disconnects in the system logs (well, it's PS/2...), and I still get the
jerky mouse...

-Joe


Unexpected "resilver" after reboot (after scrub found CKSUM problems)

2008-01-30 Thread Joe Peterson
[...reposting to freebsd-stable - no response on freebsd-fs]

I had a strange thing happen on ZFS the other day, and I cannot find any
info about it on the web - thought you might have some ideas.  I am
using 7.0-RC1 at the moment.

I found a checksum error in ZFS during a scrub.  This is strange in
itself, since I believe the disk is OK (see below):



  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

NAME        STATE     READ WRITE CKSUM
tank        ONLINE       0     0     0
  ad0s1d    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:


/home/joe/music/jukebox/christmas/Esquivel/Merry_XMas_from_the_SpaceAge_Bachelor_Pad/07-Snowfall.mp3



This is how it appears after a recent reboot, however.  After a scrub, I
see a varying number of non-zero counts under CKSUM.  Not sure why it is
zero after reboot (maybe that's normal).

However, the strange thing is that after my first reboot after the scrub
found the issue, zpool status told me that "resilver completed with 0
errors", and there were no known errors.  Only trying to read the file
and/or rescrubbing returned the status to the error state and made the
CKSUM column non-zero.  Since I do not have a mirror or raid config, I'm
not sure why it would resilver at all, and I did nothing explicit to
cause a resilver (as far as I know)...

Any ideas?

As an aside, I, along with some others on freebsd-stable@freebsd.org,
have been seeing what "look" like disk errors in the system logs.  I
have a suspicion that there could be some other cause (lots of
discussion on that list, if you are interested).

Strangely, this disk checks out fine on both short and long tests in
Seatools, and smartctl shows it as OK.  Also, using Linux to do lots of
reads from it does not show any issue or error logs.  At this point, I
am not sure if the CKSUM issue is a real HW flaw or something else...

        Thanks, Joe




Re: ad8: TIMEOUT - WRITE_DMA errors UFS 7.0-RC1

2008-01-27 Thread Joe Peterson
Remco van Bekkum wrote:
> Well it looks like in my case it is hardware related after all. It failed to 
> read the boot
> block several times now. 2nd sort of DOA of this disk...

Have you tried reading the block in another OS or using SeaTools?  That would
at least verify that it's hardware.

-Joe



Re: ad8: TIMEOUT - WRITE_DMA errors UFS 7.0-RC1

2008-01-26 Thread Joe Peterson
Jeremy Chadwick wrote:
>> If this is widespread, I think the chances are slim that it is a
>> hardware problem in every case.
> 
> I'm in definite agreement here.  I think it might be worthwhile to note
> what hardware we're all using, in case there's something similar between
> our systems (chipset, disk vendor, etc.).
> 
> My system is as follows; timeouts were reported during an rsync of data
> from the ZFS stripe (ad8+ad10) to a UFS2 filesystem on ad6.  System
> eventually panic'd after remaining deadlocked (while kernel messages
> about timeouts kept printing on the console for ad6 only) for 10-15
> minutes.
>
> *   MB: Supermicro PDSMI+  (Intel ICH7-based)
> *  CPU: Intel Core 2 Duo E6600
> *  RAM: Corsair CM2X1024-6400 DDR2, 2GB
> *  ad4: WD Caviar SE WD2000JD (boot/OS)
> *  ad6: Seagate Barracuda 7200.10 ST3500630AS
> *  ad8: WD Caviar SE16 WD5000AAKS (ZFS stripe)
> * ad10: WD Caviar SE16 WD5000AAKS (ZFS stripe)
> * All drives are hooked up to the ICH7.
> * SMART stats showed no problems on any of the drives before or after.
> * RELENG_7, i386, ULE scheduler.

Mine is as follows:

*   MB: Tyan Trinity S2099
*  CPU: Pentium 4, 2.4GHz
*  RAM: Crucial DDR, ECC, CL2.5, Unbuffered 2GB (1/2 PC2100, 1/2 PC2700)
*  ad0: Seagate ST3500630A 3.AAE (1 UFS2 boot, 1 ZFS pool)
*  ad1: Seagate ST3160812A 3.AAH (not used by FreeBSD)
* Intel ICH4 UDMA100 controller
* ATI Radeon RV280 9250
* Intel PRO/1000 NIC
* 7.0-RC1, i386, ULE scheduler

-Joe


Re: ad8: TIMEOUT - WRITE_DMA errors UFS 7.0-RC1

2008-01-26 Thread Joe Peterson
Remco van Bekkum wrote:
> Same here. On an amd64 system with 1x sata disk (Western Digital Caviar
> Green Power) on an amd690G chipset, with UFS and intensive disk activity
> the system hangs and in the end it may panic. I've csupped today and
> rebuild world & generic kernel but still it's very unstable, sometimes it
> even hangs when activating geom volumes at boot time... 
> I must add that this is a new system so I'm not 100% sure the hardware is 
> sane.
> Using ZFS it also crashed when doing intensive I/O.

This is very interesting.  It seems there are several of us who are
experiencing something that *looks* like hardware (disk) issues when using 7.0.

Could this be related to the mouse freeze issue?  Could some process be
locking/grabbing the CPU at inopportune times and causing not only the
freezing symptoms but also read/write problems?

Can anyone else using 7.0 who hasn't already (especially those using ZFS)
check his/her /var/log/messages for disk TIMEOUTs or other disk error
messages?  If this is widespread, I think the chances are slim that it is a
hardware problem in every case.
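
For anyone checking their logs, a small Python sketch along these lines could pull out the relevant entries.  The regular expression is modelled only on the kernel lines quoted in this thread (for example "ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=54112319"); other message forms would need their own patterns, and the function names are made up for the example.

```python
import re
from collections import Counter

# Pattern based on the ata(4) messages quoted in this thread; an assumption,
# not an exhaustive description of what the kernel can print.
ATA_ERROR = re.compile(
    r"(?P<dev>ad\d+): (?P<kind>TIMEOUT|FAILURE) - (?P<cmd>\w+).*?LBA=(?P<lba>\d+)"
)


def scan_log(lines):
    """Return (device, kind, command, lba) for each matching log line."""
    events = []
    for line in lines:
        m = ATA_ERROR.search(line)
        if m:
            events.append((m["dev"], m["kind"], m["cmd"], int(m["lba"])))
    return events


def count_by_device(events):
    """Count how many events were logged against each device."""
    return Counter(dev for dev, _kind, _cmd, _lba in events)


sample = [
    "Jan 21 23:39:54 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying "
    "(1 retry left) LBA=54112319",
    "ad6: FAILURE - WRITE_DMA timed out LBA=13951071",
    "Jan 22 00:00:00 crater sshd[1]: some unrelated line",
]
events = scan_log(sample)
print(events)
print(count_by_device(events))
```

To run it against a real log, pass an open file instead of the sample list, e.g. scan_log(open("/var/log/messages")).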

-Joe


Re: "ad0: TIMEOUT - WRITE_DMA" type errors with 7.0-RC1

2008-01-26 Thread Joe Peterson
Ivan Voras wrote:
> Were both tests done in the same machine (actually, I mean the same PSU)?

Yes - I deliberately changed nothing (not even cables) before I ran the tests.
 I didn't want any variables.

    -Joe


Re: "ad0: TIMEOUT - WRITE_DMA" type errors with 7.0-RC1

2008-01-26 Thread Joe Peterson
Joe Peterson wrote:
> So I have started a "SeaTools" (disk scanner from Seagate) "long test" of the
> drive.  The short test passed already.  The results should be interesting.  If
> it finds nothing wrong, I am going to start to wonder if I am experiencing ZFS
> bugs that just happen to look like drive problems.  I already did a long read,
> under linux, of disk contents, and got no messages about anything wrong.

Update: both SHORT and LONG tests passed for this drive in SeaTools.
Hmph...  the mystery remains.
        -Joe


Re: "ad0: TIMEOUT - WRITE_DMA" type errors with 7.0-RC1

2008-01-26 Thread Joe Peterson
I performed a ZFS scrub, which finished yesterday, and no new
/var/log/messages errors were reported during that time.  However, the scrub
found something interesting:


crater# zpool status -v
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed with 1 errors on Fri Jan 25 12:52:32 2008
config:

NAME        STATE     READ WRITE CKSUM
tank        ONLINE       1     3     2
  ad0s1d    ONLINE       1     3     2

errors: Permanent errors have been detected in the following files:


/home/joe/music/jukebox/christmas/Esquivel/Merry_XMas_from_the_SpaceAge_
Bachelor_Pad/07-Snowfall.mp3



Note that I have not touched this file since copying it to this drive.

So, it seems one file failed a checksum check during the scrub.  I now
(expectedly) get errors trying to read this file - probably ZFS indicating the
condition.  When I just logged in tonight, I got two more /var/log/messages
disk messages about WRITE_DMA48 TIMEOUT/FAILURE - might be a coincidence (just
as I was typing my password).

Also, smartctl still shows PASSED, however, this is interesting:

195 Hardware_ECC_Recovered  0x001a   061   046   000    Old_age   Always       -       9070

The number is much *smaller* now!  It was "6" a few minutes before this...
wrap around?  Hmm, I'm really not sure, at this point, what is going on.

So I have started a "SeaTools" (disk scanner from Seagate) "long test" of the
drive.  The short test passed already.  The results should be interesting.  If
it finds nothing wrong, I am going to start to wonder if I am experiencing ZFS
bugs that just happen to look like drive problems.  I already did a long read,
under linux, of disk contents, and got no messages about anything wrong.

If I can turn on any debugging info to help determine if this is
software-related, let me know the magic keywords to use.  :)

        -Joe


Re: "ad0: TIMEOUT - WRITE_DMA" type errors with 7.0-RC1

2008-01-25 Thread Joe Peterson
Glad you got it back!  Yes, when I was first playing with ZFS, I noticed
that booting between single and multi user mode could make the pools
"invisible".  Import seemed to bring them back...

So, is the disk toast, or can you still read anything from it (part
table, etc.)?

    -Joe


Jeremy Chadwick wrote:
> On Fri, Jan 25, 2008 at 05:00:54PM -0800, Jeremy Chadwick wrote:
>> icarus# zfs list
>> no datasets available
>>
>> This doesn't bode well, and doesn't make me happy.  At all.
> 
> Pshew!  I was able to get ZFS to start seeing the pool again by doing
> the following:  (Supposedly "zpool import" by itself will show you a
> list of pools which it manages to see...)
> 
> icarus# zpool import -f storage
> icarus# df -k /storage
> Filesystem  1024-blocks  Used Avail Capacity  Mounted on
> storage       957873024 106124032 851748992    11%    /storage
> icarus# zfs list
> NAME  USED  AVAIL  REFER  MOUNTPOINT
> storage   101G   812G   101G  /storage
> icarus# zpool status
>   pool: storage
>  state: ONLINE
>  scrub: none requested
> config:
> 
> NAME        STATE     READ WRITE CKSUM
> storage     ONLINE       0     0     0
>   ad8       ONLINE       0     0     0
>   ad10      ONLINE       0     0     0
> 
> errors: No known data errors
> 
> Back to the drawing board.
> 


Re: "ad0: TIMEOUT - WRITE_DMA" type errors with 7.0-RC1

2008-01-25 Thread Joe Peterson
Jeremy Chadwick wrote:
> Joe, I wanted to send you a note about something that I'm still in the
> process of dealing with.  The timing couldn't be more ironic.
> 
> I decided it would be worthwhile to migrate from my two-disk ZFS stripe
> with a non-ZFS disk for nightly backups, to a RAIDZ pool of all 3
> disks combined (since they're all the same size).  I had another
> terminal with gstat -I500ms running in it, so I could see overall I/O.
> 
> All was going well until about the 81GB mark of the copy.  gstat started
> showing 0KB in/out on all the drives, and the rsync was stalled.  ^Z did
> nothing, which is usually a bad sign.  :-)  I ssh'd in and did a dmesg
> (summarised):
> 
> ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing 
> request directly
> ad6: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing 
> request directly
> ad6: WARNING - SET_MULTI taskqueue timeout - completing request directly
> ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951071
> ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951327
> ad6: FAILURE - WRITE_DMA timed out LBA=13951071
> ad6: FAILURE - WRITE_DMA timed out LBA=13951327
> ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951583
> ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951839
> ad6: FAILURE - WRITE_DMA timed out LBA=13951583
> ad6: FAILURE - WRITE_DMA timed out LBA=13951839
> ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13952095
> ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13952351
> g_vfs_done():ad6s1d[WRITE(offset=7142916096, length=131072)]error = 5
> g_vfs_done():ad6s1d[WRITE(offset=7143047168, length=131072)]error = 5
> g_vfs_done():ad6s1d[WRITE(offset=7143178240, length=131072)]error = 5
> g_vfs_done():ad6s1d[WRITE(offset=7143309312, length=131072)]error = 5
> g_vfs_done():ad6s1d[WRITE(offset=7143440384, length=131072)]error = 5
> 
> It appears my /dev/ad6 (a Seagate -- more irony) must have some bad
> blocks.  Actually, after letting things go for a while, I realised the
> box just locked up.  Probably kernel panic'd due to the I/O problem.
> I'll have to poke at SMART stats later to see what showed up.

Wow, pretty crazy!  Hmm, and yes, those LBAs do look close together.
Well, let me know how the smartctl output looks.  I'd be curious if your
bad sector count rises.  I had noticed that 1
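
How close together the quoted LBAs are can be worked out with a short Python sketch.  The LBA values are copied from the dmesg output above; the 512-byte sector size is an assumption, since the log does not state it.

```python
SECTOR_SIZE = 512  # assumption: bytes per sector

# LBAs from the WRITE_DMA messages quoted above, in order of first appearance.
LBAS = [13951071, 13951327, 13951583, 13951839, 13952095, 13952351]


def lba_gaps(lbas):
    """Return the distance in sectors between each consecutive pair of LBAs."""
    return [b - a for a, b in zip(lbas, lbas[1:])]


gaps = lba_gaps(LBAS)
gap_bytes = [g * SECTOR_SIZE for g in gaps]
print("gaps in sectors:", gaps)
print("gaps in bytes:  ", gap_bytes)
```

Under that assumption each gap works out to 131072 bytes, the same as the length= value in the g_vfs_done() lines, which would fit a run of consecutive 128K writes each timing out.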

BTW, I tried:

crater# dd if=/dev/ad1s4 of=/dev/null bs=64k
^C1408596+0 records in
1408596+0 records out
92313747456 bytes transferred in 1415.324362 secs (65224446 bytes/sec)

(I let it go for 92GB or so) - no messages about ad1.  So I wonder if
this points at either the cable connector on ad0 or the drive itself.  I
guess I'd rather have a failing drive than motherboard...

I originally was wondering if somehow something peculiar about ZFS's
disk access pattern was making it happen...

Thanks for the recommendations.  I'll keep an eye on it, and I'll let you
know what a cable change does for me.  Still, I have not had any ad0
messages since this morning (I haven't been using the system today much,
but maybe the cron processes are more likely to trigger it)...

-Joe


Re: New KTR trace for mouse freezing/stuttering in 7.0-RC1

2008-01-25 Thread Joe Peterson
John Baldwin wrote:
> Hmm, when I look at that graph using schedgraph from HEAD it just looks
> like xtrs is using up all the CPU.

Yeah, xtrs is eating a lot of CPU, but I've never seen this affect the
mouse movement (making it really jerky) the same way on, e.g., Linux.
And the xtrs test is just a way to *reliably* make it happen.  It
happens intermittently all of the time (at least every few minutes, and
often in small batches) even when the system is pretty idle...

        -Joe


Re: New KTR trace for mouse freezing/stuttering in 7.0-RC1

2008-01-25 Thread Joe Peterson
Sam Leffler wrote:
> Sigh, you are correct.  I backrev'd the machine where I ran schedgraph 
> to RELENG_7 and didn't notice the old version mis-parses the ktr file.  
> The graph is totally different w/ schedgraph from HEAD.
> 
> Sorry Joe for misleading you.

No problem, Sam, but the question I have for you now is: do you see
anything with the updated schedgraph that indicates any "freezes" that
look funny?  The length of the ones I saw with mouse movement were
mostly some portion of a second, from maybe 1/8 to 1/2 sec.  And there
should be a lot of them in quick succession.

    Thanks, Joe



Re: "ad0: TIMEOUT - WRITE_DMA" type errors with 7.0-RC1

2008-01-25 Thread Joe Peterson
Chuck Swiger wrote:
> On Jan 25, 2008, at 11:24 AM, Joe Peterson wrote:
>> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>>   1 Raw_Read_Error_Rate     0x000f   114   071   006    Pre-fail  Always       -       82422948
> [ ... ]
>>   7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       286126605
> [ ... ]
>> 195 Hardware_ECC_Recovered  0x001a   063   046   000    Old_age   Always       -       166181300
> 
> These numbers are quite worrisome-- they should be zero or nearly so  
> in a healthy drive.

It seems to depend on the drive manufacturer.  E.g. this is a Seagate.  Every
Seagate I've ever had (or heard about on the web via smartctl dumps) reports
very large numbers for these values.  I've heard it described that Seagate
shows you the raw numbers (and correctable errors do happen all the time in
all drives).

In Western Digital drives (IIRC), the numbers shown are the ones that *should*
be zero, thereby hiding the low-level errors.

Hard to say if my numbers are "too high", but these "corrected" error counts
are always frighteningly high in Seagates.
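
One way to look past the raw counts is to compare the normalized VALUE column against THRESH, which is how smartctl itself flags a failing attribute.  The Python sketch below assumes whitespace-separated rows in the ten-column layout smartctl prints (ID#, name, flag, value, worst, thresh, type, updated, when-failed, raw); the function name is made up for the example.

```python
def parse_attribute(line):
    """Parse one smartctl attribute row into a dict (10 columns expected)."""
    fields = line.split(None, 9)   # the raw value may itself contain spaces
    if len(fields) != 10:
        raise ValueError(f"expected 10 columns, got {len(fields)}")
    value, worst, thresh = (int(f) for f in fields[3:6])
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "value": value,
        "worst": worst,
        "thresh": thresh,
        "raw": fields[9].strip(),
        # An attribute counts as failing once VALUE drops to THRESH or below.
        "failing": value <= thresh,
    }


rows = [
    "  1 Raw_Read_Error_Rate     0x000f   114   071   006    Pre-fail  Always       -       82422948",
    "194 Temperature_Celsius     0x0022   035   044   000    Old_age   Always       -       35 (Lifetime Min/Max 0/15)",
]
for row in rows:
    attr = parse_attribute(row)
    print(attr["id"], attr["name"], attr["value"], attr["thresh"], attr["failing"])
```

By that measure the large raw numbers quoted here are not failures on their own: the normalized values (114, 084, 063) all sit above their thresholds.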

-Joe



Re: "ad0: TIMEOUT - WRITE_DMA" type errors with 7.0-RC1

2008-01-25 Thread Joe Peterson
0
  ad0s1d    ONLINE       1     3     0

errors: No known data errors

> Other things which have fixed problems in the past for others:
> 
> * BIOS updates
> * Change of motherboards (sometimes replacing board with same model,
>   other times going with a completely different vendor (implies weird
>   implementation issues or BIOS problems))

I've been using this same motherboard/BIOS for a long time (as well as
this drive), so no changes have happened to the HW recently.  The BIOS
is the newest available, I believe (it's a Tyan Trinity S2099, so it's
a few years old).

> * Changing SATA cables

I'm using regular ATA 80-pin cables.  Also, these seem to have been
working fine for quite a while now.  But, yes, I have also witnessed bad
cable issues on older systems in the past.  I certainly could try a new
cable and see if it helps.

> * Getting a larger power supply (usually when lots of disks are involved)

I only have two drives, so I think the PS has enough capacity in my case.

Anyway, thanks for the reply and further questions.  Let me know if
anything I've sent back is helpful!

Thanks, Joe
smartctl version 5.37 [i386-portbld-freebsd7.0] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.10 family
Device Model:     ST3500630A
Serial Number:    9QG0DG03
Firmware Version: 3.AAE
User Capacity:    500,107,862,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Fri Jan 25 09:55:13 2008 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status:  (   0) The previous self-test routine completed
without error or no self-test has ever 
been run.
Total time to complete Offline 
data collection: ( 430) seconds.
Offline data collection
capabilities:(0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off 
support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities:(0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability:(0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine 
recommended polling time:(   1) minutes.
Extended self-test routine
recommended polling time:( 163) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   114   071   006    Pre-fail  Always       -       82422948
  3 Spin_Up_Time            0x0003   093   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       56
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       1
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       286126605
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       5250
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       59
187 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
189 Unknown_Attribute       0x003a   100   100   000    Old_age   Always       -       0
190 Temperature_Celsius     0x0022   065   056   045    Old_age   Always       -       605749283
194 Temperature_Celsius     0x0022   035   044   000    Old_age   Always       -       35 (Lifetime Min/Max 0/15)
195 Hardware_ECC_Recovered  0x001a   063   046   000    Old_age   Always       -       166181300
197 Curren

Re: "ad0: TIMEOUT - WRITE_DMA" type errors with 7.0-RC1

2008-01-25 Thread Joe Peterson
0
  ad0s1dONLINE   1 3 0

errors: No known data errors

> Other things which have fixed problems in the past for others:
> 
> * BIOS updates
> * Change of motherboards (sometimes replacing board with same model,
>   other times going with a completely different vendor (implies weird
>   implementation issues or BIOS problems))

I've been using this same motherboard/BIOS for a long time (as well as
this drive), so no changes have happened to the HW recently.  The BIOS
is the newest, available, I believe (It's a Tyan Trinity S2099, so it's
a few years old)

> * Changing SATA cables

I'm using regular ATA 80-pin cables.  Also, these seem to have been
working fine for quite a while now.  But, yes, I have also witnessed bad
cable issues on older systems in the past.  I certainly could try a new
cable and see if it helps.

> * Getting a larger power supply (usually when lots of disk are involved)

I only have two drives, so I think the PS has enough capacity in my case.

Anyway, thanks for the reply and further questions.  Let me know if
anything I've sent back is helpful!

Thanks, Joe

smartctl version 5.37 [i386-portbld-freebsd7.0] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.10 family
Device Model: ST3500630A
Serial Number:9QG0DG03
Firmware Version: 3.AAE
User Capacity:500,107,862,016 bytes
Device is:In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:Fri Jan 25 09:55:13 2008 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status:  (   0) The previous self-test routine completed
without error or no self-test has ever 
been run.
Total time to complete Offline 
data collection: ( 430) seconds.
Offline data collection
capabilities:(0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off 
support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities:(0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability:(0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine 
recommended polling time:(   1) minutes.
Extended self-test routine
recommended polling time:( 163) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   114   071   006    Pre-fail  Always       -       82422948
  3 Spin_Up_Time            0x0003   093   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       56
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       1
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       286126605
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       5250
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       59
187 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
189 Unknown_Attribute       0x003a   100   100   000    Old_age   Always       -       0
190 Temperature_Celsius     0x0022   065   056   045    Old_age   Always       -       605749283
194 Temperature_Celsius     0x0022   035   044   000    Old_age   Always       -       35 (Lifetime Min/Max 0/15)
195 Hardware_ECC_Recovered  0x001a   063   046   000    Old_age   Always       -       166181300
197 Curren
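
For readers who want to skim an attribute table like the one above, here is a minimal Python sketch that parses complete rows and lists the Pre-fail attributes whose WORST value sits closest to THRESH. The function names, the field splitting, and the "margin" measure are illustrative assumptions, not part of smartmontools. Normalized values are vendor-specific, so a small margin is only a hint about where to look, not a diagnosis.

```python
def parse_attributes(text):
    """Parse 'smartctl -A'-style rows into dicts (assumed column layout)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A complete row has ID, name, flag, value, worst, thresh, type,
        # updated, when_failed and at least one raw-value token.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        rows.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "type": fields[6],
            "raw": " ".join(fields[9:]),
        })
    return rows


def tightest_margins(rows, count=3):
    """Return the Pre-fail rows with the smallest WORST - THRESH margin."""
    prefail = [r for r in rows if r["type"] == "Pre-fail"]
    return sorted(prefail, key=lambda r: r["worst"] - r["thresh"])[:count]


# A few rows copied from the report above.
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   114   071   006    Pre-fail  Always       -       82422948
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       1
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       286126605
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   035   044   000    Old_age   Always       -       35 (Lifetime Min/Max 0/15)
"""

rows = parse_attributes(sample)
for row in tightest_margins(rows, count=2):
    print(row["name"], row["worst"] - row["thresh"])
```

Incomplete rows, such as a truncated final line, are skipped rather than guessed at.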

"ad0: TIMEOUT - WRITE_DMA" type errors with 7.0-RC1

2008-01-25 Thread Joe Peterson
I've seen this kind of issue mentioned before, but I never saw a
solution, except that someone reported that a certain version of 6.x
seemed to make it go away; accounts of the problem are a bit vague.  I
am running 7.0-RC1 and see the errors periodically, and I am wondering
whether this is a known issue.  Note that smartctl reports no logged
errors and gives the drive a "PASSED".  The drive is running in UDMA100
mode.  Also, if it matters, I am using ZFS.

Attached is a grep of the /var/log/messages file.  Let me know if anyone
has suggestions.
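
For anyone who wants to summarize a log like this rather than read it line by line, here is a minimal Python sketch that counts TIMEOUT and FAILURE messages per command and collects the LBAs that ended in a FAILURE. The regular expressions and the `summarize` name are assumptions based only on the message format in this excerpt, and the sketch expects each message on a single line.

```python
import re
from collections import Counter

# Patterns inferred from the log excerpt; they are assumptions, not a
# specification of what the ata driver can print.
EVENT_RE = re.compile(r"(ad\d+): (TIMEOUT|FAILURE) - ([A-Z_0-9]+)")
LBA_RE = re.compile(r"LBA=(\d+)")


def summarize(lines):
    """Count events per (device, kind, command) and list failed LBAs."""
    counts = Counter()
    failed_lbas = set()
    for line in lines:
        event = EVENT_RE.search(line)
        if event is None:
            continue
        dev, kind, cmd = event.groups()
        counts[(dev, kind, cmd)] += 1
        lba = LBA_RE.search(line)
        if kind == "FAILURE" and lba:
            failed_lbas.add(int(lba.group(1)))
    return counts, sorted(failed_lbas)


# Lines taken from the attached log, one message per line.
sample = [
    "Jan 23 00:26:03 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=53588527",
    "Jan 23 00:26:26 crater kernel: ad0: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=764596887",
    "Jan 23 00:26:26 crater kernel: ad0: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=764596887",
    "Jan 23 00:26:26 crater kernel: ad0: FAILURE - WRITE_DMA48 status=51 error=10 LBA=764596887",
    "Jan 23 23:59:18 crater kernel: ad0: TIMEOUT - FLUSHCACHE retrying (1 retry left)",
]

counts, failed = summarize(sample)
for (dev, kind, cmd), n in sorted(counts.items()):
    print(dev, kind, cmd, n)
print("LBAs with a FAILURE:", failed)
```

Grouping by command makes it easier to see whether the timeouts are mostly writes, and the failed-LBA list shows whether the hard failures cluster in one region of the disk.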

        Thanks!  Joe
Jan 21 23:39:54 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=54112319
Jan 22 00:06:29 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=51610951
Jan 22 00:16:40 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=53031647
Jan 22 00:30:15 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=54243391
Jan 22 07:05:59 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=51768047
Jan 22 09:08:16 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=55890239
Jan 22 09:17:52 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=55919423
Jan 22 09:23:42 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=53470111
Jan 23 00:26:03 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=53588527
Jan 23 00:26:26 crater kernel: ad0: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=764596887
Jan 23 00:26:26 crater kernel: ad0: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=764596887
Jan 23 00:26:26 crater kernel: ad0: FAILURE - WRITE_DMA48 status=51 error=10 LBA=764596887
Jan 23 03:01:06 crater kernel: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=185819705
Jan 23 03:01:37 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=54837686
Jan 23 03:03:22 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=53472407
Jan 23 03:03:39 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=53627991
Jan 23 11:33:27 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=5747
Jan 23 12:30:31 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=55407234
Jan 23 13:20:06 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=57779519
Jan 23 17:30:18 crater kernel: ad0: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=453849407
Jan 23 17:30:19 crater kernel: ad0: FAILURE - READ_DMA48 status=51 error=10 LBA=453849407
Jan 23 17:30:29 crater kernel: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=187373078
Jan 23 18:34:50 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=1017919
Jan 23 18:35:00 crater kernel: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=54547647
Jan 23 18:35:12 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=56354060
Jan 23 18:35:20 crater kernel: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=53919167
Jan 23 23:59:18 crater kernel: ad0: TIMEOUT - FLUSHCACHE retrying (1 retry left)
Jan 24 00:00:27 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=237661119
Jan 24 00:00:27 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=237661119
Jan 24 00:00:27 crater kernel: ad0: FAILURE - WRITE_DMA timed out LBA=237661119
Jan 24 00:00:27 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=236239553
Jan 24 00:00:27 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=236239553
Jan 24 00:00:27 crater kernel: ad0: FAILURE - WRITE_DMA timed out LBA=236239553
Jan 24 00:00:27 crater kernel: ad0: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=764595671
Jan 24 00:00:27 crater kernel: ad0: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=764595671
Jan 24 00:00:27 crater kernel: ad0: FAILURE - WRITE_DMA48 timed out LBA=764595671
Jan 24 00:01:13 crater kernel: ad0: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=764595671
Jan 24 00:01:13 crater kernel: ad0: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=764595671
Jan 24 00:01:13 crater kernel: ad0: FAILURE - WRITE_DMA48 timed out LBA=764595671
Jan 24 00:01:13 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=236180175
Jan 24 00:01:13 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=236180175
Jan 24 00:01:13 crater kernel: ad0: FAILURE - WRITE_DMA timed out LBA=236180175
Jan 24 00:01:13 crater kernel: ad0: TIMEOUT - FLUSHCACHE retrying (1 retry left)
Jan 24 00:01:13 crater kernel: ad0: TIMEOUT - FLUSHCACHE retrying (0 retries left)
Jan 24 02:31:53 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=236191551
Jan 24 04:54:57 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=238068287
Jan 24 04:55:56 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=238068287
Jan 24 04:55:5

Re: New KTR trace for mouse freezing/stuttering in 7.0-RC1

2008-01-24 Thread Joe Peterson
Sam Leffler wrote:
>>  http://www.skyrush.com/downloads/ktr_ule_4.out
>>
> I don't see what it is 
> from the trace data.  It sort of looks like the last thing that ran is 
> the swi4 which is likely a callout (need to check the log file contents 
> to be certain).  If the callback function does something it wouldn't 
> necessarily be visible in the schedgraph plot.  If you could stick a 
> dmesg from booting out in the same spot it might be worthwhile.

OK, I just ran a dmesg and put it up there:

http://www.skyrush.com/downloads/dmesg_4.out

The WRITE_DMA messages are not time-correlated with this issue; I don't
like the looks of those either, but that's a different issue to look into...

> Also if 
> you rebuild the kernel with DIAGNOSTIC then softclock() will 
> complain about callouts that take longer than 2ms to run.

OK, recompiling now...  Will the new messages appear in dmesg, or in a
log file?

> This might 
> generate too much noise in which case you can adjust the threshold by 
> editing the code in sys/kern/kern_timeout.c.

Cool - thanks for looking at this, and I will let you know what I find!
 Do I need to make another trace concurrently, or should I just repeat
the test procedure and see if I get new messages?

-Thanks, Joe


New KTR trace for mouse freezing/stuttering in 7.0-RC1

2008-01-23 Thread Joe Peterson
In an attempt to track down this mouse freezing/stuttering (i.e. "jerky
mouse movement") behavior in FreeBSD 7.0-RC1, I have come up with a
reliable way to cause it to happen, and I have created a longer trace
showing the results.  Note that I am using the ULE scheduler.

In general, it becomes easier to see the effect if there is CPU
activity.  I have noticed it during kernel compiles, while at the same
time loading web pages in firefox that contain images (and moving the
mouse while this is happening).  But a more controlled way to see it is
to run something that uses some CPU and then generate lots of X events.

In my case, I start "xtrs" (TRS-80 emulator) in Model IV mode, which
happens to poll for input, using the CPU.  Then I move the mouse back
and forth quickly between windows in "focus under mouse" mode (in my
case, a KDE focus mode), which causes many focus events quickly.  In
about 15 or 20 seconds, the mouse reliably starts to show erratic
movement, not moving smoothly.

I really hope this can shed more light on what might be going on.  Here
is the trace:

http://www.skyrush.com/downloads/ktr_ule_4.out

        Thanks, Joe


