Re: Traffic "corruption" in 12-stable
> On Aug 4, 2020, at 11:51, Mark Johnston wrote:
>
> On Mon, Aug 03, 2020 at 05:22:37PM -0400, Joe Clarke wrote:
>>> On Jul 27, 2020, at 15:41, Joe Clarke wrote:
>>>> On Jul 27, 2020, at 15:01, Mark Johnston wrote:
>>>> There are some fixes for vmx not present in stable/12 (yet). I did a
>>>> merge of a number of outstanding revisions. Would you be able to test
>>>> the patch? I haven't observed any problems with it on a host using igb,
>>>> but I have no ability to test vmx at the moment.
>>>
>>> I’m down to test anything. I did notice quite a few vmxnet3 changes around
>>> performance that appealed to me. I tried a few of them on my last kernel.
>>> That took much longer to exhibit the problem, but eventually did.
>>>
>>> I can tell you I don’t have all of these patches in, though. I’ll build
>>> with this diff and start running it now. I’ll let you know how it goes.
>>
>> So it’s been just over a week of runtime with this full patch set. I have
>> seen no further issues with ingress packet “truncation”, and performance has
>> been what I expect. I’m going to keep running, but I think this seems like
>> a good set to MFC.
>
> Done in r363844, thanks.

Thank you. On day 8, and still no issues.

Joe

---
PGP Key : http://www.marcuscom.com/pgp.asc
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Traffic "corruption" in 12-stable
> On Jul 27, 2020, at 15:41, Joe Clarke wrote:
>
>> On Jul 27, 2020, at 15:01, Mark Johnston wrote:
>>
>> On Sun, Jul 26, 2020 at 06:16:07PM -0400, Joe Clarke wrote:
>>> About two weeks ago, I upgraded from the latest 11-stable to the latest
>>> 12-stable. After that, I periodically see the network throughput come to a
>>> near standstill. This FreeBSD machine is an ESXi VM with two interfaces.
>>> It acts as a router. It uses vmxnet3 interfaces for both LAN and WAN. It
>>> runs ipfw with in-kernel NAT. The LAN side uses a bridge with vmx0 and a
>>> tap0 L2 VPN interface. My LAN side uses an MTU of 9000, and my vmx1 (WAN
>>> side) uses the default 1500.
>>>
>>> Besides seeing massive packet loss and huge latency (~ 200 ms for on-LAN
>>> ping times), I know the problem has occurred because my lldpd reports:
>>>
>>> Jul 26 15:47:03 namale lldpd[1126]: frame too short for tlv received on
>>> bridge0
>>>
>>> And if I turn on ipfw verbose messages, I see tons of:
>>>
>>> Jul 26 16:02:23 namale kernel: ipfw: pullup failed
>>>
>>> This leads me to believe packets are being corrupted on ingress. I’ve
>>> applied all the recent iflib changes, but the problem persists. What causes
>>> it, I don’t know.
>>>
>>> The only thing that changed (and yes, it’s a big one) is I upgraded to
>>> 12-stable. Meaning, the rest of the network infra and topology has
>>> remained the same. This did not happen at all in 11-stable.
>>>
>>> I’m open to suggestions.
>>
>> There are some fixes for vmx not present in stable/12 (yet). I did a
>> merge of a number of outstanding revisions. Would you be able to test
>> the patch? I haven't observed any problems with it on a host using igb,
>> but I have no ability to test vmx at the moment.
>
> I’m down to test anything. I did notice quite a few vmxnet3 changes around
> performance that appealed to me. I tried a few of them on my last kernel.
> That took much longer to exhibit the problem, but eventually did.
>
> I can tell you I don’t have all of these patches in, though. I’ll build with
> this diff and start running it now. I’ll let you know how it goes.

So it’s been just over a week of runtime with this full patch set. I have seen no further issues with ingress packet “truncation”, and performance has been what I expect. I’m going to keep running, but I think this seems like a good set to MFC.

Thanks again for your help.

Joe
Re: Traffic "corruption" in 12-stable
> On Jul 27, 2020, at 15:01, Mark Johnston wrote:
>
> On Sun, Jul 26, 2020 at 06:16:07PM -0400, Joe Clarke wrote:
>> About two weeks ago, I upgraded from the latest 11-stable to the latest
>> 12-stable. After that, I periodically see the network throughput come to a
>> near standstill. This FreeBSD machine is an ESXi VM with two interfaces.
>> It acts as a router. It uses vmxnet3 interfaces for both LAN and WAN. It
>> runs ipfw with in-kernel NAT. The LAN side uses a bridge with vmx0 and a
>> tap0 L2 VPN interface. My LAN side uses an MTU of 9000, and my vmx1 (WAN
>> side) uses the default 1500.
>>
>> Besides seeing massive packet loss and huge latency (~ 200 ms for on-LAN
>> ping times), I know the problem has occurred because my lldpd reports:
>>
>> Jul 26 15:47:03 namale lldpd[1126]: frame too short for tlv received on
>> bridge0
>>
>> And if I turn on ipfw verbose messages, I see tons of:
>>
>> Jul 26 16:02:23 namale kernel: ipfw: pullup failed
>>
>> This leads me to believe packets are being corrupted on ingress. I’ve
>> applied all the recent iflib changes, but the problem persists. What causes
>> it, I don’t know.
>>
>> The only thing that changed (and yes, it’s a big one) is I upgraded to
>> 12-stable. Meaning, the rest of the network infra and topology has remained
>> the same. This did not happen at all in 11-stable.
>>
>> I’m open to suggestions.
>
> There are some fixes for vmx not present in stable/12 (yet). I did a
> merge of a number of outstanding revisions. Would you be able to test
> the patch? I haven't observed any problems with it on a host using igb,
> but I have no ability to test vmx at the moment.

I’m down to test anything. I did notice quite a few vmxnet3 changes around performance that appealed to me. I tried a few of them on my last kernel. That took much longer to exhibit the problem, but eventually did.

I can tell you I don’t have all of these patches in, though. I’ll build with this diff and start running it now. I’ll let you know how it goes.

Thanks!

Joe
Re: Traffic "corruption" in 12-stable
> On Jul 27, 2020, at 01:00, Eugene Grosbein wrote:
>
> 27.07.2020 5:16, Joe Clarke wrote:
>
>> About two weeks ago, I upgraded from the latest 11-stable to the latest
>> 12-stable. After that, I periodically see the network throughput come to a
>> near standstill. This FreeBSD machine is an ESXi VM with two interfaces.
>> It acts as a router. It uses vmxnet3 interfaces for both LAN and WAN. It
>> runs ipfw with in-kernel NAT. The LAN side uses a bridge with vmx0 and a
>> tap0 L2 VPN interface. My LAN side uses an MTU of 9000, and my vmx1 (WAN
>> side) uses the default 1500.
>>
>> Besides seeing massive packet loss and huge latency (~ 200 ms for on-LAN
>> ping times), I know the problem has occurred because my lldpd reports:
>>
>> Jul 26 15:47:03 namale lldpd[1126]: frame too short for tlv received on
>> bridge0
>>
>> And if I turn on ipfw verbose messages, I see tons of:
>>
>> Jul 26 16:02:23 namale kernel: ipfw: pullup failed
>>
>> This leads me to believe packets are being corrupted on ingress. I’ve
>> applied all the recent iflib changes, but the problem persists. What causes
>> it, I don’t know.
>>
>> The only thing that changed (and yes, it’s a big one) is I upgraded to
>> 12-stable. Meaning, the rest of the network infra and topology has remained
>> the same. This did not happen at all in 11-stable.
>>
>> I’m open to suggestions.
>
> First, try: ifconfig $ifname -rxcsum -txcsum

Thanks for the suggestion. I should have mentioned I’ve been initializing these two interfaces since 11-stable with:

ifconfig_vmx0="up mtu 9000 -tso -lro -vlanhwtso -rxcsum -txcsum -rxcsum6 -txcsum6 -tso4 -tso6 -vlanhwcsum"
ifconfig_vmx1="DHCP -tso -lro -vlanhwtso -rxcsum -txcsum -rxcsum6 -txcsum6 -tso4 -tso6 -vlanhwcsum"

And I’m running:

FreeBSD namale.marcuscom.com 12.1-STABLE FreeBSD 12.1-STABLE NAMALE amd64 1201520 1201520

I most recently built this yesterday, but the previous kernel that exhibited the problem was built about a week ago. It had the fragment fixes for iflib.c.

Joe
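A small sketch, not taken from the thread, for confirming that the offload flags in the rc.conf lines above actually took effect. It parses an ifconfig "options=" line and lists any checksum/TSO/LRO capabilities that are still enabled. The helper name enabled_offloads is hypothetical, and the capability names assume the usual FreeBSD ifconfig output format.

```shell
#!/bin/sh
# enabled_offloads LINE: given an ifconfig "options=...<A,B,C>" line,
# print the offload-related capabilities that are still switched on.
# (Hypothetical helper for illustration; not part of any FreeBSD tool.)
enabled_offloads() {
    # Extract the comma-separated capability list between < and >.
    caps=$(printf '%s\n' "$1" | sed -n 's/.*options=[0-9a-fA-F]*<\(.*\)>.*/\1/p')
    out=""
    old_ifs=$IFS
    IFS=,
    for cap in $caps; do
        case "$cap" in
            RXCSUM|TXCSUM|RXCSUM_IPV6|TXCSUM_IPV6|TSO4|TSO6|LRO|VLAN_HWTSO|VLAN_HWCSUM)
                out="${out:+$out }$cap" ;;
        esac
    done
    IFS=$old_ifs
    printf '%s\n' "$out"
}
```

On a live system this might be called as enabled_offloads "$(ifconfig vmx0 | grep '^[[:space:]]*options=')"; an empty result would suggest that none of the listed offloads is active on that interface.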
Traffic "corruption" in 12-stable
About two weeks ago, I upgraded from the latest 11-stable to the latest 12-stable. After that, I periodically see the network throughput come to a near standstill. This FreeBSD machine is an ESXi VM with two interfaces. It acts as a router. It uses vmxnet3 interfaces for both LAN and WAN. It runs ipfw with in-kernel NAT. The LAN side uses a bridge with vmx0 and a tap0 L2 VPN interface. My LAN side uses an MTU of 9000, and my vmx1 (WAN side) uses the default 1500.

Besides seeing massive packet loss and huge latency (~ 200 ms for on-LAN ping times), I know the problem has occurred because my lldpd reports:

Jul 26 15:47:03 namale lldpd[1126]: frame too short for tlv received on bridge0

And if I turn on ipfw verbose messages, I see tons of:

Jul 26 16:02:23 namale kernel: ipfw: pullup failed

This leads me to believe packets are being corrupted on ingress. I’ve applied all the recent iflib changes, but the problem persists. What causes it, I don’t know.

The only thing that changed (and yes, it’s a big one) is I upgraded to 12-stable. Meaning, the rest of the network infra and topology has remained the same. This did not happen at all in 11-stable.

I’m open to suggestions. Thanks.

Joe
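To put a number on the symptom described in this report, a minimal sketch along the following lines could count the two log messages quoted above. The function name count_symptoms is an assumption made for illustration, as is the idea that both messages end up in the same log file.

```shell
#!/bin/sh
# count_symptoms: read syslog-style lines on stdin and print how many
# contain either of the two messages quoted in the report.
# (Hypothetical helper; the name is illustrative only.)
count_symptoms() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so "|| true" keeps the function's exit status clean.
    grep -c -e 'ipfw: pullup failed' -e 'frame too short for tlv' || true
}
```

Assuming the messages are written to /var/log/messages, this could be run as count_symptoms < /var/log/messages before and after a kernel change to see whether the count keeps growing.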
Re: ZFS...
You might look at UFS Explorer. It claims to have ZFS support now. It costs money for a license and I think it required Windows when I last used it. I can attest that a previous version allowed me to recover all the data I needed from a lost UFS mirror almost a decade ago.

Sent from my iPhone

> On May 7, 2019, at 9:01 PM, Michelle Sullivan wrote:
>
> Karl Denninger wrote:
>>> On 5/7/2019 00:02, Michelle Sullivan wrote:
>>> The problem I see with that statement is that the zfs dev mailing lists
>>> constantly and consistently following the line of, the data is always right
>>> there is no need for a “fsck” (which I actually get) but it’s used to shut
>>> down every thread... the irony is I’m now installing windows 7 and SP1 on a
>>> usb stick (well it’s actually installed, but sp1 isn’t finished yet) so I
>>> can install a zfs data recovery tool which reports to be able to “walk the
>>> data” to retrieve all the files... the irony eh... install windows7 on a
>>> usb stick to recover a FreeBSD installed zfs filesystem... will let you
>>> know if the tool works, but as it was recommended by a dev I’m hopeful...
>>> have another array (with zfs I might add) loaded and ready to go... if the
>>> data recovery is successful I’ll blow away the original machine and work
>>> out what OS and drive setup will be safe for the data in the future. I
>>> might even put FreeBSD and zfs back on it, but if I do it won’t be in the
>>> current Zraid2 config.
>> Meh.
>>
>> Hardware failure is, well, hardware failure. Yes, power-related
>> failures are hardware failures.
>>
>> Never mind the potential for /software/ failures. Bugs are, well,
>> bugs. And they're a real thing. Never had the shortcomings of UFS bite
>> you on an "unexpected" power loss? Well, I have. Is ZFS absolutely
>> safe against any such event? No, but it's safe*r*.
>
> Yes and no ... I'll explain...
>
>>
>> I've yet to have ZFS lose an entire pool due to something bad happening,
>> but the same basic risk (entire filesystem being gone)
>
> Every time I have seen this issue (and it's been more than once - though until
> now recoverable - even if extremely painful) - it's always been during a
> resilver of a failed drive and something happening... panic, another drive
> failure, power etc.. any other time it's rock solid... which is the yes and
> no... under normal circumstances zfs is very very good and seems as safe as
> or safer than UFS... but my experience is ZFS has one really bad flaw.. if
> there is a corruption in the metadata - even if the stored data is 100%
> correct - it will fault the pool and that's it, it's gone barring some luck and
> painful recovery (backups aside) ... other file systems also suffer from this but
> there are tools that *majority of the time* will get you out of the s**t with
> little pain. Barring this windows based tool I haven't been able to run yet,
> zfs appears to have nothing.
>
>> has occurred more
>> than once in my IT career with other filesystems -- including UFS, lowly
>> MSDOS and NTFS, never mind their predecessors all the way back to floppy
>> disks and the first 5Mb Winchesters.
>
> Absolutely, been there done that.. and btrfs...*ouch* still as bad.. however
> with the only one btrfs install I had (I didn't know it was btrfs
> underneath, but netgear NAS...) I was still able to recover the data even
> though it had screwed the file system so bad I vowed never to consider or use
> it again on anything ever...
>
>>
>> I learned a long time ago that two is one and one is none when it comes
>> to data, and WHEN two becomes one you SWEAT, because that second failure
>> CAN happen at the worst possible time.
>
> and does..
>
>>
>> As for RaidZ2 .vs. mirrored it's not as simple as you might think.
>> Mirrored vdevs can only lose one member per mirror set, unless you use
>> three-member mirrors. That sounds insane but actually it isn't in
>> certain circumstances, such as very-read-heavy and high-performance-read
>> environments.
>
> I know - this is why I don't use mirrored - because wear patterns will ensure
> both sides of the mirror are closely matched.
>
>>
>> The short answer is that a 2-way mirrored set is materially faster on
>> reads but has no acceleration on writes, and can lose one member per
>> mirror. If the SECOND one fails before you can resilver, and that
>> resilver takes quite a long while if the disks are large, you're dead.
>> However, if you do six drives as a 2x3 way mirror (that is, 3 vdevs each
>> of a 2-way mirror) you now have three parallel data paths going at once
>> and potentially six for reads -- and performance is MUCH better. A
>> 3-way mirror can lose two members (and could be organized as 3x2) but
>> obviously requires lots of drive slots, 3x as much *power* per gigabyte
>> stored (and you pay for power twice; once to buy it and again to get the
>> heat out of the room where the machine is.)
>
> my problem (as always) is slots not
Re: CFT: FreeBSD Package Base
With CFT version you chose to build, and package individual components such as sendmail with a port option. That does entirely solve the problem of being able to reinstall sendmail after the fact without a rebuild of the userland (base) port but perhaps base flavors could solve that problem assuming flavors could extend beyond python.

Joe Maloney
Quality Engineering Manager / iXsystems
Enterprise Storage & Servers Driven By Open Source

> On Apr 29, 2019, at 3:31 PM, Cy Schubert wrote:
>
> In message <201904291441.x3tefmid072...@gndrsh.dnsmgr.net>, "Rodney W. Grimes"
> writes:
>>> On Mon, Apr 29, 2019 at 10:09 AM Rodney W. Grimes <
>>> freebsd-...@gndrsh.dnsmgr.net> wrote:
>>>
>>>>>
>>>>> Correct, this is ZFS only. And it's something we're using specific to
>>>>> FreeNAS / TrueOS, which is why I didn't originally mention it as a part of
>>>>> our CFT.
>>>>
>>>> Then please it is "CFT: FreeNAS/TrueOS pkg base, ZFS only",
>>>> calling this FreeBSD pkg base when it is not was wrong,
>>>> and misleading.
>>>>
>>>
>>> Sorry, I disagree.
>> Which is fine.
>>
>>> This pkg base is independent of the ZFS tool we're using
>>> to wrangle boot-environments. Hence why it wasn't mentioned in the CFT.
>>> These base packages work the same as existing in-tree pkg base on UFS, no
>>> difference. If anything are probably safer due to being able to update all
>>> of userland in single extract operation, so you don't have out of order
>>> extraction of libc or some such.
>>
>> You missed the major string change and focused on the edge,
>> No comment on calling iXsystems :stuff: FreeBSD instead of FreeNAS/TrueOS?
>>
>> That was the major point of my statement, you're misleading the user
>> community, you yourself said this would never be imported into FreeBSD
>> base, so I see no reason that it should be called "FreeBSD package Base",
>> as it is not, that is a different project.
>
> Taking the last comment on this thread to ask a question and maybe
> refocus a little.
>
> The discussion about granularity begs the question, why pkgbase in the
> first place? My impression was that it allowed people to select which
> components they wanted to either create a lean installation or mix and
> match base packages and ports (possibly with flavours to install in
> /usr rather than $LOCALBASE) such that maybe person A wanted a stock
> install while person B wanted to replace, picking a random example, BSD
> tar with GNU tar. Isn't that the real advantage of pkgbase?
>
> If OTOH it's binary updates V 2.0, what's the point? I'm a little
> rhetorical here but you get my point. If I want ipfw instead of pf or
> ipfilter instead of the others I should have the freedom. Similarly if
> I want vim instead of vi I should have the choice to install vim as
> /usr/bin/vi. Otherwise all the effort to replace binary updates makes
> no sense.
>
>
> --
> Cheers,
> Cy Schubert
> FreeBSD UNIX: Web: http://www.FreeBSD.org
>
> The need of the many outweighs the greed of the few.
>
>
> ___
> freebsd-curr...@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: Panic on 11-STABLE with Xen guest
On 11/26/18 13:31, John Baldwin wrote:
> On 11/22/18 12:39 PM, Joe Clarke wrote:
>> I believe after the commit 340016 for the dynamic IRQ layout, my Xen VM
>> started to panic. I just upgraded the kernel today and saw this:
>>
>> xen: unable to map IRQ#2
>> panic: Unable to register interrupt override
>> cpuid = 0
>> KDB: stack backtrace:
>> #0 0x8060a4e7 at kdb_backtrace+0x67
>> #1 0x805c3787 at vpanic+0x177
>> #2 0x805c3603 at panic+0x43
>> #3 0x8093a766 at madt_parse_ints+0x96
>> #4 0x803353f9 at acpi_walk_subtables+0x29
>> #5 0x8093a5e6 at xenpv_register_pirqs+0x56
>> #6 0x80928296 at intr_init_sources+0x116
>> #7 0x8055eba8 at mi_startup+0x118
>> #8 0x8029902c at btext+0x2c
>>
>> The following kernel works:
>>
>> @(#)FreeBSD 11.2-STABLE #4: Thu Nov 1 02:24:07 EDT 2018
>> FreeBSD 11.2-STABLE #4: Thu Nov 1 02:24:07 EDT 2018
>> root@creme-brulee:/usr/obj/usr/src/sys/CREME-BRULEE
>>
>> The following kernel produces the panic above immediately on boot:
>>
>> @(#)FreeBSD 11.2-STABLE #5: Wed Nov 21 11:08:38 EST 2018
>> FreeBSD 11.2-STABLE #5: Wed Nov 21 11:08:38 EST 2018
>> root@creme-brulee:/usr/obj/usr/src/sys/CREME-BRULEE
>>
>> Attached is a screen grab of the console of the panic.
>
> Hmm, I don't see any obvious candidates of Xen changes that weren't included
> in the MFC. I've added royger@ (who maintains Xen in FreeBSD) to the cc to
> see if he has an idea.
>
> Roger, the main changes that aren't MFC'd to 11 from 12/head seem to be some
> refcounting on event channels and PVHv2 vs PVHv1?

Thanks for the follow-up, John. Apparently, there was an incomplete MFC. Roger added the missing bit today in r340982 which resolved the panic.

Joe
Re: Panic on 11-STABLE with Xen guest
On 11/25/18 18:22, Richard M. Timoney wrote:
> I have the same failure to boot 11-stable as a DomU host on xen_version:
> 4.4.1
>
>
> Kernel I was trying was recent, FreeBSD 11.2-STABLE (GENERIC) #23
> r334205:340834
>
>
> commit 340016 for the dynamic IRQ layout seems rather involved and I doubt I
> could isolate the problem, but maybe it is in

Yep. This is what I believe as well. I'm using Xen 3.4 with RootBSD.

Joe

>
>
> 338631:
> xen: legacy PVH fixes for the new interrupt count
>
> Register interrupts using the PIC pic_register_sources method instead
> of doing it in apic_setup_io. This is now required, since the internal
> interrupt structures are not yet setup when calling apic_setup_io.
>
> --
> Richard M. Timoney
> (richa...@maths.tcd.ie) Tel. +353-1-896 1196
> School of Mathematics, Trinity College, Dublin 2, Ireland
> WWW https://www.maths.tcd.ie/~richardt FAX +353-1-896 2282
Panic on 11-STABLE with Xen guest
I believe after the commit 340016 for the dynamic IRQ layout, my Xen VM started to panic. I just upgraded the kernel today and saw this:

xen: unable to map IRQ#2
panic: Unable to register interrupt override
cpuid = 0
KDB: stack backtrace:
#0 0x8060a4e7 at kdb_backtrace+0x67
#1 0x805c3787 at vpanic+0x177
#2 0x805c3603 at panic+0x43
#3 0x8093a766 at madt_parse_ints+0x96
#4 0x803353f9 at acpi_walk_subtables+0x29
#5 0x8093a5e6 at xenpv_register_pirqs+0x56
#6 0x80928296 at intr_init_sources+0x116
#7 0x8055eba8 at mi_startup+0x118
#8 0x8029902c at btext+0x2c

The following kernel works:

@(#)FreeBSD 11.2-STABLE #4: Thu Nov 1 02:24:07 EDT 2018
FreeBSD 11.2-STABLE #4: Thu Nov 1 02:24:07 EDT 2018
root@creme-brulee:/usr/obj/usr/src/sys/CREME-BRULEE

The following kernel produces the panic above immediately on boot:

@(#)FreeBSD 11.2-STABLE #5: Wed Nov 21 11:08:38 EST 2018
FreeBSD 11.2-STABLE #5: Wed Nov 21 11:08:38 EST 2018
root@creme-brulee:/usr/obj/usr/src/sys/CREME-BRULEE

Attached is a screen grab of the console of the panic.

Joe
Re: drm / drm2 removal in 12
Thanks for the drm-next efforts. I could not, and would not be using FreeBSD without it.

Joe Maloney

On Mon, Aug 27, 2018 at 5:58 AM Thomas Mueller wrote:

> Excerpt from Oliver Pinter:
>
> > Let's do some more step backwards, and see how the graphics driver
> > developments works from the corporation side.
> > They not bother about any of the BSDs, they focus only to Windows and
> > Linux. If you want to use a recent (haha recent, something after 2014) you
> > are forced to use new drivers from linux.
> > The fore/advantage on the Linux side are the zillions of corporately paid
> > kernel developers.
> > They can just focus on a new hw supports, on freebsd side, there are no
> > corporately paid drm driver developer. Sadly.
> > In linux word their internal KPI (try a Google for a "stable API nonsense"
> > words) moves so fastly, that porting of these drivers gets non trivial
> > without a dedicated paid team.
>
> > If you want to change on this situation, try to learn for you could help or
> > send directed donations to freebsd foundation. ;)
>
> Linux and FreeBSD are not the only open-source OSes.
>
> There is also (Net, Open, DragonFly)BSD, Haiku, OpenIndiana and others.
>
> Maybe better would be for the hardware manufacturers to release more
> general specifications that could be adapted to any OS, by the NetBSD
> developers, Haiku developers, etc. Certainly not to ignore Linux.
>
> Tom
Re: Jenkins build is still unstable: FreeBSD_stable_10 #302
Small qualifier, I have had trouble with that. But not since, my build issues have been without that.

https://www.youtube.com/watch?v=I9MZNEXrElw

On 10/07/2016 9:17 PM, Joe Shevland wrote:

(My foot-shooting moments have involved LibreSSL and tomcat-native. I've removed them since).

On 10/07/2016 8:30 PM, Joe Shevland wrote:

I'm wondering if it's my build process where I'm seeing issues. I have been tracking -stable on a spare machine lately, and I've had about 60% success rate on a full build world/kernel etc. (following UPDATING instructions) on the times I do it. Been a few foot-shooting moments, but those aside, still what look to be a few just broken builds.

Typically to resolve this, I'd just 'svnlite up' in /usr/src, and rebuild, and it works fine (this little Atom/Shuttle doesn't compile things too quickly, so that's a window of 6 hours at least).

Normally, I'm used to a gated commit system i.e. you commit changes, the change/s in question compile successfully (with any other changes that have been committed by others), and only then those changes are promoted to another branch or tag (where they should compile w/o problems).

Is that what happens, or am I doing things wrong? I follow that little chunk down the bottom of UPDATING normally to do a full world/kernel build.

Cheers,
Joe

On 10/07/2016 5:59 PM, jenkins-ad...@freebsd.org wrote:

See <https://jenkins.FreeBSD.org/job/FreeBSD_stable_10/302/>
Re: Jenkins build is still unstable: FreeBSD_stable_10 #302
(My foot-shooting moments have involved LibreSSL and tomcat-native. I've removed them since).

On 10/07/2016 8:30 PM, Joe Shevland wrote:

I'm wondering if it's my build process where I'm seeing issues. I have been tracking -stable on a spare machine lately, and I've had about 60% success rate on a full build world/kernel etc. (following UPDATING instructions) on the times I do it. Been a few foot-shooting moments, but those aside, still what look to be a few just broken builds.

Typically to resolve this, I'd just 'svnlite up' in /usr/src, and rebuild, and it works fine (this little Atom/Shuttle doesn't compile things too quickly, so that's a window of 6 hours at least).

Normally, I'm used to a gated commit system i.e. you commit changes, the change/s in question compile successfully (with any other changes that have been committed by others), and only then those changes are promoted to another branch or tag (where they should compile w/o problems).

Is that what happens, or am I doing things wrong? I follow that little chunk down the bottom of UPDATING normally to do a full world/kernel build.

Cheers,
Joe

On 10/07/2016 5:59 PM, jenkins-ad...@freebsd.org wrote:

See <https://jenkins.FreeBSD.org/job/FreeBSD_stable_10/302/>
Re: Jenkins build is still unstable: FreeBSD_stable_10 #302
I'm wondering if it's my build process where I'm seeing issues. I have been tracking -stable on a spare machine lately, and I've had about 60% success rate on a full build world/kernel etc. (following UPDATING instructions) on the times I do it. Been a few foot-shooting moments, but those aside, still what look to be a few just broken builds.

Typically to resolve this, I'd just 'svnlite up' in /usr/src, and rebuild, and it works fine (this little Atom/Shuttle doesn't compile things too quickly, so that's a window of 6 hours at least).

Normally, I'm used to a gated commit system i.e. you commit changes, the change/s in question compile successfully (with any other changes that have been committed by others), and only then those changes are promoted to another branch or tag (where they should compile w/o problems).

Is that what happens, or am I doing things wrong? I follow that little chunk down the bottom of UPDATING normally to do a full world/kernel build.

Cheers,
Joe

On 10/07/2016 5:59 PM, jenkins-ad...@freebsd.org wrote:

See <https://jenkins.FreeBSD.org/job/FreeBSD_stable_10/302/>
Re: ia64 stable/10 r286316: hang at Entering /boot/kernel/kernel
To add a very small (useless) data point to this, I have an Atom device that, very occasionally, hangs before the boot stage (at the little slash, prior to the daemon boot menu offering you the chance to select another kernel etc). I haven't worked out the rhyme or reason yet, so it's probably a red herring, but it's frustrated me when I have to dig out the monitor and keyboard again. At least it did with 10.1-release, yet to have it happen with stable.

Cheers,
Joe

On 28/08/2015 8:30 PM, Anton Shterenlikht wrote:

>From kostik...@gmail.com Thu Aug 27 18:22:37 2015

On Thu, Aug 27, 2015 at 01:12:16PM +0100, Anton Shterenlikht wrote:

ia64 stable/10 r286315 boots, but r286316 hangs at "Entering /boot/kernel/kernel". Please advise

To state an obvious thing. The commit which you pointed to, changes the code which is not executed at that early kernel boot stage. The revision cannot cause the consequences you described.

yes, I'm surprised too.

I think that you either have build-environment issue which randomly pops up, or there is some other boot-time issue which is sporadic. The only suggestion I have, try many boots with kernels which look either good or bad, I would be not surprised if statistic would be completely different from binary good/bad outcome. Otherwise, I do not have an idea.

I doubt it's a random or a sporadic issue. I did a bisection, as suggested, during which I built world/kernel on 7 revisions, and when I narrowed it down to <50, a further 4 kernels. All kernels <=286315 boot, all kernels >= 286316 do not. I think if it were something random, it wouldn't be such a clear cut picture.

What about my loader.conf:

# cat /boot/loader.conf
zfs_load="YES"
# soft limits
kern.dfldsiz=536748032 # default soft limit for process data
kern.dflssiz=536748032 # default soft limit for stack
# hard limits
kern.maxdsiz=536748032 # hard limit for process data
kern.maxssiz=536748032 # hard limit for stack
kern.maxtsiz=536748032 # hard limit for text size
# processes may not exceed these limits.
#

My memory:

real memory = 8589934592 (8192 MB)
avail memory = 8387649536 (7999 MB)

I'll try disabling all these settings in loader.conf and see if makes a difference. But these settings have been there for a few years with no problems.

Anton
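The bisection Anton describes can be sketched as a simple halving loop over revision numbers. This is only an illustration under stated assumptions: first_bad and is_good are hypothetical names, and is_good is stubbed with the outcome reported in the thread, whereas a real check would mean building and booting a kernel at each revision.

```shell
#!/bin/sh
# is_good REV: succeed if the kernel built at REV boots.
# Stub only: it encodes the result reported in the thread
# (r286315 boots, r286316 hangs) instead of building anything.
is_good() {
    [ "$1" -le 286315 ]
}

# first_bad GOOD BAD: binary search for the first failing revision,
# assuming GOOD boots, BAD hangs, and every revision after the first
# failure also fails.
first_bad() {
    lo=$1
    hi=$2
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        if is_good "$mid"; then
            lo=$mid
        else
            hi=$mid
        fi
    done
    echo "$hi"
}
```

Called as first_bad 286000 286400 (both endpoints are made up for the example), the loop halves the range on each pass until a single good/bad boundary is left. A consistent boundary from such a search fits a deterministic regression better than a sporadic one, which is the argument made above.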
mps in GENERIC in FreeBSD 9.2R i386
Did nobody ever verify this for Ken?

> On Mon, Oct 01, 2012 at 23:38:33 +0530, Desai, Kashyap wrote:
> >
> >
> > > -Original Message-
> > > From: owner-freebsd-stable at freebsd.org [mailto:owner-freebsd-
> > > stable at freebsd.org] On Behalf Of Kenneth D. Merry
> > > Sent: Monday, October 01, 2012 8:58 PM
> > > To: John Baldwin
> > > Cc: Harald Schmalzbauer; freebsd-stable at freebsd.org
> > > Subject: Re: mps in GENERIC, only in amd64? (RELENG_9_1)
> > >
> > > On Mon, Oct 01, 2012 at 08:49:36 -0400, John Baldwin wrote:
> > > > On Saturday, September 29, 2012 5:58:42 am Harald Schmalzbauer wrote:
> > > > > Hello,
> > > > >
> > > > > accidentally I saw that mps is included in sys/amd64/conf/GENERIC, but
> > > > > not in sys/i386/conf/GENERIC.
> > > > > Is this intended?
> > > >
> > > > Have you tested it on i386? From the log message, Ken (cc'd) only added it
> > > > on amd64 as it hadn't been tested on i386.
> > >
> > > That was certainly the case two years ago. Since then, though, I think
> > > the LSI folks have tested it on i386. If we get reports of success
> > > using it on i386, I don't see any issue with putting it in GENERIC.
> >
> > YES LSI has tested i386 arch on different Released FreeBSDs of 7.x, 8.x and
> > 9.x series.
> >
>
> That confirms it. I'll go ahead and check it into head if someone with an
> i386 build environment can confirm that the driver in head builds properly
> on i386.
>
> Thanks,
>
> Ken

It seems to compile cleanly on i386. I don't have an easy way to test it though (only compiling in a VM).

... JG
--
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I
won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.
Re: 9.2-PRE: switch off that stupid "Nakatomi Socrates"
On 30/09/2013 14:50, Matthieu Volat wrote: Le 30 sept. 2013 à 01:54, Ricardo Ferreira a écrit : Em 29-09-2013 19:11, Charles Sprickman escreveu: On Sep 29, 2013, at 3:28 PM, C. P. Ghost wrote: On 28.09.2013 11:32, Phil Regnauld wrote: Teske, Devin (Devin.Teske) writes: If you work seriously on serious issues long enough... you'll become burned- out. Let me just come right out and say it... I coded it. And thanks, you got me chuckling - nice to see some humor once in a while. To the offended poster: read the last line of tunefs(8) - there's probably many more places you could use serious time looking for deviations from corporate correctnes. Humor can even be etched in silicon, like e.g. on an IC created by Siemens: http://micro.magnet.fsu.edu/creatures/pages/bunny.html Cisco too, besides weird Star Wars ROM messages, you have stuff like the "BFR" (Big F*cking Router, after Big F*cking Gun in Doom) screened on the PCB: https://www.kumari.net/gallery/index.php/Technology/Networking/BFR_2_001 https://www.kumari.net/gallery/index.php/Technology/Networking/BFR_2 I have no idea what Sluggo and Nancy are doing on this board: https://www.kumari.net/gallery/index.php/Technology/Networking/CIMG0988 Charles ;-) -cpghost. -- Cordula's Web. http://www.cordula.ws/ ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org" ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org" keep it cool u have others like: man chmod... BUGS There is no perm option for the naughty bits of a horse. and so many others. So... I find strange nobody mentioned the one in make :) % make love Not War. -- Mazhe Alas, not for much longer as bmake doesn't handle that target: root@build:/pseudosrc/misc # make love make: don't know how to make love. 
Stop
Re: FreeBSD 9.1 ix driver vlan problem
On 25/09/2013 22:10, Dmitry Morozovsky wrote:
> On Wed, 25 Sep 2013, Rumen Telbizov wrote:
>
> > Thanks for the heads-up Oleg, although not the news that I was hoping for.
> > So what I am going to do right now is reinstall with 9.2 and recompile the
> > driver with your patch. I'll come back to the list with my results.
>
> FWIW, we're (with oleg@, yeah) using this patch on stable/9, so you're
> welcome to test this on your 9.
>
> It's supposedly way too late to try to include this fix into 9.2-R, but
> maybe it's worth the errata notice...

This happens on several other Intel chipsets as well; no previous errata was ever noted (legacy em, for example) :(
Re: virtio for 9.1-R
On 27/11/2012 23:22, Bryan Venteicher wrote:
> Hi,
>
> - Original Message -
> > From: "Joe Holden"
> > To: "Sergey Kandaurov"
> > Cc: freebsd-stable@freebsd.org
> > Sent: Tuesday, November 27, 2012 2:49:07 PM
> > Subject: Re: virtio for 9.1-R
> >
> > On 27/11/2012 19:25, Sergey Kandaurov wrote:
> > > On 27 November 2012 22:12, Joe Holden wrote:
> > > > Hi guys,
> > > >
> > > > I can't see virtio in releng/9.1, is there any particular reason why
> > > > it isn't going to be included given that it works reasonably well (and
> > > > is optional anyway, so not likely to be detrimental)?
> > >
> > > virtio appeared in stable/9 a bit after 9.1 cut off, and it is too late
> > > now regardless of virtio shape. Anyway you can install it from ports.
> >
> > Ah I see, doesn't really help all the people who can't install it in KVM
> > and such though unfortunately, seems silly making them wait even longer
> > and having to use Linux :)
>
> Yes - it is long overdue and something I plan to fix in the next month.
> There have been off-list patches floating around that do just that.
>
> I also plan to spend my spare time in Dec. to work on FreeBSD VirtIO
> improvements/bugs/nags. I've been busy with $JOB and have been busy
> finishing up a VMware vmxnet driver.
>
> Bryan
>
> > cheers

Sounds good. FWIW I've been using it for a while and it works rather well (on 9.0-R). Of course this requires that the KVM instance can be switched to IDE mode first (or a custom iso/image uploaded, which isn't always possible).
Re: virtio for 9.1-R
On 27/11/2012 19:25, Sergey Kandaurov wrote:
> On 27 November 2012 22:12, Joe Holden wrote:
> > Hi guys,
> >
> > I can't see virtio in releng/9.1, is there any particular reason why
> > it isn't going to be included given that it works reasonably well (and
> > is optional anyway, so not likely to be detrimental)?
>
> virtio appeared in stable/9 a bit after 9.1 cut off, and it is too late
> now regardless of virtio shape. Anyway you can install it from ports.

Ah I see. That doesn't really help all the people who can't install it in KVM and such though, unfortunately; it seems silly making them wait even longer and having to use Linux :)

cheers
virtio for 9.1-R
Hi guys,

I can't see virtio in releng/9.1. Is there any particular reason why it isn't going to be included, given that it works reasonably well (and is optional anyway, so not likely to be detrimental)?

Thanks,
Joe
Re: Checksum errors across ZFS array
Hi James,

It's almost definitely a memory problem. I'd change it ASAP if I were you. I lost about 70 MB from my ZFS pool for this very reason just a few weeks ago. Luckily I had enough snapshots from before the rot set in to recover most of what I lost.

Joe
--
Dr Joe Karthauser

On 19 Jul 2012, at 16:29, James Snow wrote:

> I have a ZFS server on which I've seen periodic checksum errors on
> almost every drive. While scrubbing the pool last night, it began to
> report unrecoverable data errors on a single file.
>
> I compared an md5 of the supposedly corrupted file to an md5 of the
> original copy, stored on different media. They were the same, suggesting
> no corruption.
>
> A large file was being written to the pool while the scrub was in
> progress, and the entire array became unresponsive. The OS was still up,
> but 'zpool status' showed the scrub progress stuck at the same spot,
> with the throughput rate falling. 'shutdown -r now' stalled. Eventually
> I hard power cycled the system.
>
> Now, attempting to read the file that ZFS reports errors on yields
> "Input/output error." The scrub completed, with the following result:
>
>        NAME         STATE     READ WRITE CKSUM
>        tank         ONLINE       0     0     7
>          mirror-0   ONLINE       0     0     0
>            aacd0p1  ONLINE       0     0     0
>            aacd4p1  ONLINE       0     0     1
>          mirror-1   ONLINE       0     0     0
>            aacd1p1  ONLINE       0     0     0
>            aacd5p1  ONLINE       0     0     0
>          mirror-2   ONLINE       0     0    14
>            aacd2p1  ONLINE       0     0    14
>            aacd6p1  ONLINE       0     0    14
>          mirror-3   ONLINE       0     0     0
>            aacd3p1  ONLINE       0     0     0
>            aacd7p1  ONLINE       0     0     0
>
> The system configuration is as follows:
>
> Controller: Adaptec 2805
> Motherboard: Supermicro X8STE
> Drive Cage: 2x Supermicro CSE-M35T-1
> Memory: 2x Kingston 12GB ECC (KVR1066D3E7SK3/12G)
> PSU: Nexus RX-7000
> OS: 9.0-RELEASE-p3
> ZFS: ZFS filesystem version 5, ZFS storage pool version 28
>
>
> The Adaptec card has 2 ports, each of which uses a 4-port fan-out cable.
> The cables are routed as shown:
>
>      /--- aacd0 (ST1000DM003-9YN1 CC4D)
>     / /-- aacd1 (ST1000DM003-9YN1 CC4D)
> p1-
>     \ \-- aacd2 (WDC WD1001FALS-0 05.0)
>      \--- aacd3 (WDC WD1001FALS-0 05.0)
>
>      /--- aacd4 (ST1000DM003-9YN1 CC4D)
>     / /-- aacd5 (ST1000DM003-9YN1 CC4D)
> p2-
>     \ \-- aacd6 (WDC WD1002FAEX-0 05.0)
>      \--- aacd7 (WDC WD1002FAEX-0 05.0)
>
> You can see that each ZFS mirror device is comprised of one drive from
> each drive carrier, on separate ports, on separate cables.
>
> Since I have seen periodic checksum errors on almost every drive but the
> only common components are the Adaptec controller and the motherboard, I
> suspect the controller. (Or the motherboard, but I'm starting with the
> controller since it's much simpler to swap out.)
>
> Could it be something else? What else should I be looking at? Any input
> greatly appreciated.
>
>
> -Snow
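The md5 comparison James mentions (suspect file versus the original copy on other media) can be wrapped in a small helper. This is a rough sketch, not what he ran: the function names are made up here, and it assumes either the BSD md5 utility or GNU md5sum is installed.

```shell
#!/bin/sh
# digest FILE: print the MD5 digest of FILE with whichever tool exists.
# md5 -q is the FreeBSD base utility, md5sum the GNU coreutils one.
digest() {
    if command -v md5 >/dev/null 2>&1; then
        md5 -q "$1"
    else
        md5sum "$1" | cut -d ' ' -f 1
    fi
}

# same_content A B: succeed only if both files have the same digest.
same_content() {
    [ "$(digest "$1")" = "$(digest "$2")" ]
}
```

With hypothetical paths, `same_content /tank/data/big.iso /mnt/backup/big.iso && echo identical` would report a match. A matching digest only says the bytes that were read agree; it does not rule out errors on a later read, which is what the Input/output error after the power cycle suggests.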
Re: kern.eventtimer.periodic
Joe Holden wrote:
> Hey,
>
> So I have another box that has time issues since being upgraded to
> 9.0-REL, again kern.eventtimer.periodic=1 seems to be the fix.
>
> Should this perhaps be a default in future releases?

Sigh... correct list this time.
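For anyone wanting to keep the workaround across reboots, one way is to record the setting in a sysctl.conf-style file. The helper below is a minimal sketch under assumptions: set_conf_value is a name invented here, and whether /etc/sysctl.conf is the right place on a given system is for the admin to decide. It rewrites the file so the key appears exactly once.

```shell
#!/bin/sh
# set_conf_value FILE KEY VALUE: make sure FILE ends up with exactly one
# "KEY=VALUE" line, replacing any earlier setting of KEY.
set_conf_value() {
    file=$1
    key=$2
    value=$3
    tmp="$file.tmp.$$"
    if [ -f "$file" ]; then
        # Keep every line that does not start with "KEY=" (literal match).
        awk -v k="$key" 'index($0, k "=") != 1' "$file" > "$tmp"
    else
        : > "$tmp"
    fi
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    mv "$tmp" "$file"
}
```

On the affected box this would be used as `set_conf_value /etc/sysctl.conf kern.eventtimer.periodic 1`, with `sysctl kern.eventtimer.periodic=1` applying the same value to the running system.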
Re: New BSD Installer
Joe Holden wrote:
> Alex Samorukov wrote:
>> On 02/10/2012 06:56 PM, Joe Holden wrote:
>>> Guys,
>>>
>>> This should really be reverted to sysinstall until the new installer
>>> is at least in a state where it consistently works... the most
>>> important part of a new user's experience is the installer, and for the
>>> few new installs I have done lately I've just installed 8.2 and
>>> upgraded from there as the new installer is terribly buggy.
>>
>> Hi, I am highly against reverting. The old installer is not GPT aware
>> and in fact has been unmaintained for a very long time.
>
> True, there is that.
>
>> About ftp - it probably needs to be handled better, but most users, I
>> think, are using the cd/dvd image, so it is not an issue. And the new
>> installer is written in shell, so I think it's better to fix broken
>> parts than to revert it to outdated and unmaintained code.
>
> True also perhaps, could be more user friendly though especially for
> people just installing it - I have been using my own install scripts
> and such since 5 but am giving the new installer a go at the moment...
>
>> P.S. I personally had no problems with the new installer, used it from
>> DVD.

On a related note - does the new installer have any kind of config file for unattended installs a la sysinstall?
Re: New BSD Installer
Alex Samorukov wrote:
> On 02/10/2012 06:56 PM, Joe Holden wrote:
>> Guys,
>>
>> This should really be reverted to sysinstall until the new installer
>> is at least in a state where it consistently works... the most
>> important part of a new user's experience is the installer, and for the
>> few new installs I have done lately I've just installed 8.2 and
>> upgraded from there as the new installer is terribly buggy.
>
> Hi, I am highly against reverting. The old installer is not GPT aware
> and in fact has been unmaintained for a very long time.

True, there is that.

> About ftp - it probably needs to be handled better, but most users, I
> think, are using the cd/dvd image, so it is not an issue. And the new
> installer is written in shell, so I think it's better to fix broken
> parts than to revert it to outdated and unmaintained code.

True also perhaps, could be more user friendly though especially for people just installing it - I have been using my own install scripts and such since 5 but am giving the new installer a go at the moment...

> P.S. I personally had no problems with the new installer, used it from
> DVD.
Re: New BSD Installer
Joe Holden wrote:
> Guys,
>
> This should really be reverted to sysinstall until the new installer is
> at least in a state where it consistently works... the most important
> part of a new user's experience is the installer, and for the few new
> installs I have done lately I've just installed 8.2 and upgraded from
> there as the new installer is terribly buggy.
>
> Few things:
> - On my installs at least, if there is an unknown ftp connection
>   problem the installer will just bail and say it has been aborted -
>   this consistently happens when ftp.de is selected
> - there is no method of stepping back through the install
> - If a dhcp lease request times out manual configuration isn't offered

Another one I've just encountered several times: for some reason the output for setting the root password has new lines and lots of space between the various bits of text, and it isn't taking any input (see http://i.imgur.com/lTP5b.png).

The lack of installation progress or an emergency shell on another terminal is also something that I think should be considered - being able to see what's going on and getting error output from the commands the installer is running is invaluable.

> I realise that a lot of work has gone into it and it's nice and new,
> but really unless it's finished it shouldn't be the default.
>
> Thanks,
> J
New BSD Installer
Guys,

This should really be reverted to sysinstall until the new installer is at least in a state where it consistently works... The most important part of a new user's experience is the installer, and for the few new installs I have done lately I've just installed 8.2 and upgraded from there, as the new installer is terribly buggy.

Few things:
- On my installs at least, if there is an unknown ftp connection problem the installer will just bail and say it has been aborted - this consistently happens when ftp.de is selected
- there is no method of stepping back through the install
- If a dhcp lease request times out manual configuration isn't offered

I realise that a lot of work has gone into it and it's nice and new, but really unless it's finished it shouldn't be the default.

Thanks,
J
Re: Timekeeping in stable/9
Ronald Klop wrote: On Sat, 21 Jan 2012 14:11:51 +0100, Martin Sugioarto wrote: Am Sat, 21 Jan 2012 13:20:51 +0100 schrieb "Ronald Klop" : Hi, As I understand it. Host: FreeBSD 9 Guest: WinXP Which one has troubles with its clock? The host or the guest or both? Hi, only inside VirtualBox, I think it's only an application problem and my emails would be probably better addressed to ports@. ONLY the guest is affected when host is loaded. I noticed additionally: You get better results with a desync'ed clock in the guest system, when you start "openssl speed -multi 20" or similar. Within a few seconds the clock gets a 20 seconds difference. How many CPU's did you assign to the guest? Did you install virtualbox guest additions to the guest? Here a few details (guest additions are installed): Memory size: 1600MB Page Fusion: off VRAM size: 256MB HPET:on/off (tried both settings) Chipset: piix3 Firmware:BIOS Number of CPUs: 1 Synthetic Cpu: off CPUID overrides: None [...] ACPI:on IOAPIC: off PAE: on Time offset: 0 ms RTC: local time Hardw. virt.ext: on Hardw. virt.ext exclusive: on Nested Paging: on Large Pages: on VT-x VPID: on [...] 3D Acceleration: off 2D Video Acceleration: on Do you run NTP on the guest XP also? If yes, turn it off. Windows XP default installation (synch'ed to time.windows.com). Switching this off, does not have any influence. I think MS-Windows does not do continuous synchronization, only at system start, I guess. VBox guest additions can sync the guest clock with the host. I'll try to deinstall them. But I somehow like my shared folder. BTW: My experience with VBox is that it is nice for hobby stuff, but not for heavy load server stuff. VMWare does a better job there. Yes. I know. Still VirtualBox ist nice and cheap solution. -- Martin BTW: I used VBox on Linux at work. Same problems. Different problems come and go with different versions of Linux in combination with different versions of VirtualBox. Using VmWare ESXI solved it. 
If you search a lot on the vmware website you will find a free version. Ronald. In the extreme case I have here, the host isn't taxed at all, cpu, disk i/o and such are almost idle but the time is skewed dramatically regardless. For reference the settings I have are: 4 VCPUS (4 physical cores) 1GB ram ICH9, SAS controller If I toggle the sysctl in my previous post the problem goes way, and doesn't return even if the sysctl is changed back... until a reboot of course. None of the pre-9 guests (there are quite a few spread across a couple of identical machines) exhibit the behaviour, nor does this particular one when reverted to a pre-upgrade snapshot, so in this case it is certainly not the hardware but whatever is used to keep track of the "ticks" (terminology probably incorrect) Thanks, J ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Timekeeping in stable/9
Joe Holden wrote: Chuck Swiger wrote: On Jan 19, 2012, at 12:18 PM, Joe Holden wrote: Sounds like you were looking for commercial support, since unpaid volunteers don't have an obligation to promptly leap out and provide solutions within your ETA. Not really, just an acknowledgement would be fine. It is what it is, everyday I try to argue in favour of the project, I still use it for myself everywhere but increasingly things happen that just don't on other volunteer projects... it's rather difficult to argue the case when they can install Ubuntu or whatever nonsense distro is the current favourite and it just works. Just a bit more accurate info would solve it, if it doesn't do X reliably, or Y has changed, note it. You asked a question and got two or three responses back in a day. You mentioned trying different timekeeping choices, but I don't recall seeing what your kern.timecounter sysctl values looked like; without that, folks are missing info that is likely to be relevant. Ah, well Regards, Yeah my gripe isn't with having no responses, the handful of people that have responded have been helpful but ultimately no responses from anyone involved. Just a one liner saying "we changed the timecounter stuff in 9, look at sysctl tree X" would have been more than sufficient, this sort of thing should really be mentioned in the relnotes though... For the record though, setting kern.eventtimer.periodic to 1 fixes the problem on all affected machines (returns my virtualbox guest to normality, reduces the drift on physical machines to 8.2 figures). FWIW, I can't even see any notes relating to this in UPDATING either. I should probably clarify here that some responses were received from the maintainers (eg: Qing for mpath) for a couple of bits of code but the wider issues weren't discussed further. I'm not trying to say that no effort is made, but as a whole for the project to be comparable to the alternatives this sort of thing shouldn't happen. 
Re: Timekeeping in stable/9
Chuck Swiger wrote: On Jan 19, 2012, at 12:18 PM, Joe Holden wrote: Sounds like you were looking for commercial support, since unpaid volunteers don't have an obligation to promptly leap out and provide solutions within your ETA. Not really, just an acknowledgement would be fine. It is what it is, everyday I try to argue in favour of the project, I still use it for myself everywhere but increasingly things happen that just don't on other volunteer projects... it's rather difficult to argue the case when they can install Ubuntu or whatever nonsense distro is the current favourite and it just works. Just a bit more accurate info would solve it, if it doesn't do X reliably, or Y has changed, note it. You asked a question and got two or three responses back in a day. You mentioned trying different timekeeping choices, but I don't recall seeing what your kern.timecounter sysctl values looked like; without that, folks are missing info that is likely to be relevant. Ah, well Regards, Yeah my gripe isn't with having no responses, the handful of people that have responded have been helpful but ultimately no responses from anyone involved. Just a one liner saying "we changed the timecounter stuff in 9, look at sysctl tree X" would have been more than sufficient, this sort of thing should really be mentioned in the relnotes though... For the record though, setting kern.eventtimer.periodic to 1 fixes the problem on all affected machines (returns my virtualbox guest to normality, reduces the drift on physical machines to 8.2 figures). FWIW, I can't even see any notes relating to this in UPDATING either. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Timekeeping in stable/9
Chuck Swiger wrote:
> On Jan 19, 2012, at 12:04 PM, Joe Holden wrote:
>> Looks like this is down to the dynamic/tickless changes in 9 (that
>> aren't even noted in the release notes), the machines have now been
>> switched to linux as the lack of responses/care given to my recent
>> postings has been noted and it was deemed that using linux would be
>> less hassle in the long run.
>
> Sounds like you were looking for commercial support, since unpaid
> volunteers don't have an obligation to promptly leap out and provide
> solutions within your ETA.
>
> Regards,

Not really, just an acknowledgement would be fine. It is what it is; every day I try to argue in favour of the project, and I still use it for myself everywhere, but increasingly things happen that just don't on other volunteer projects... It's rather difficult to argue the case when they can install Ubuntu or whatever nonsense distro is the current favourite and it just works.

Just a bit more accurate info would solve it: if it doesn't do X reliably, or Y has changed, note it.
Re: Timekeeping in stable/9
Looks like this is down to the dynamic/tickless changes in 9 (that aren't even noted in the release notes), the machines have now been switched to linux as the lack of responses/care given to my recent postings has been noted and it was deemed that using linux would be less hassle in the long run. Unfortunate decision but I am inclined to agree. Thanks, J Ian Lepore wrote: On Tue, 2012-01-17 at 20:12 +0000, Joe Holden wrote: Hi guys, Has anyone else noticed the tendency for 9.0-R to be unable to accurately keep time? I've got a couple of machines that have been upgraded from 8.2 that are struggling, in particular a Virtual box guest that was fine on 8.2, but now that's its been upgraded to 9.0 counts at anything from 2 to 20 seconds per 5 second sample, the result is similar with HPET, ACPI-fast and TSC. I also have physical boxes which new seem to drift quite substantially, ntpd cannot keep up and as these boxes need to be able to report the time relatively accurately, it is causing problems with log times and such... Any suggestions most welcome! Thanks, J I finally got a 9.0 generic build done today and I've been watching the timekeeping on 3 systems and they're all doing just fine. Two of the systems are performing pretty much identically to how they did on 8.2; the clock frequency correction calculated by ntpd differs by less than 1ppm. On the other system the kernel timekeeping routines are now choosing to use a different clock so I don't get a direct comparison of the old vs new drift rate, but the drift is still reasonable (100ppm now, used to be around 88, on an old 300mhz MediaGx-based system). I haven't had time yet to learn about the new eventtimer stuff in 9.0, but I know you can get some info on the choices it made via sysctl kern.eventtimer. 
Before 9.0 I'd check sysctl kern.clockrate and vmstat -i and make sure the chosen clock is interrupting at the right rate, but now with the eventtimer stuff there's not an obvious correlation between hz and profhz and stathz and any particular device's interrupt rate, at least for some clock choices (on the old MediaGx system without ACPI or HPET it seems to work more like it used to). -- Ian ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
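To put Ian's drift figures in perspective, a frequency error in ppm converts directly to seconds per day. The snippet below is just that arithmetic (ppm x 86400 / 1,000,000); the function name is made up for illustration, and the result is the uncorrected drift, i.e. what the clock would gain or lose if ntpd were not steering it.

```shell
#!/bin/sh
# ppm_to_seconds_per_day PPM: seconds gained or lost per day at a
# constant frequency error of PPM parts per million.
ppm_to_seconds_per_day() {
    awk -v ppm="$1" 'BEGIN { printf "%.2f\n", ppm * 86400 / 1000000 }'
}

ppm_to_seconds_per_day 100   # prints 8.64
ppm_to_seconds_per_day 88    # prints 7.60
```

So the old MediaGx box at 100ppm is off by under nine seconds a day before correction, which is a very different scale from a guest counting 2 to 20 seconds in a 5 second sample.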
Timekeeping in stable/9
Hi guys,

Has anyone else noticed the tendency for 9.0-R to be unable to accurately keep time? I've got a couple of machines that have been upgraded from 8.2 that are struggling, in particular a VirtualBox guest that was fine on 8.2 but, now that it's been upgraded to 9.0, counts at anything from 2 to 20 seconds per 5 second sample. The result is similar with HPET, ACPI-fast and TSC.

I also have physical boxes which now seem to drift quite substantially; ntpd cannot keep up, and as these boxes need to be able to report the time relatively accurately, it is causing problems with log times and such...

Any suggestions most welcome!

Thanks,
J
Re: UFS corruption panic
Actually, that would be a safe assumption, especially now that the installer rightly or wrongly defaults to a single / filesystem, but perhaps it would also be sensible to make it tunable via mount flags...

Thanks,
J

On Sun, Jan 15, 2012 at 12:48 PM, Bruce Cran wrote:
>
> On 15/01/2012 08:12, Stefan Bethke wrote:
>>
>> Yes, a panic is the correct action here. While I agree that it's super
>> annoying, the filesystem notices that something is *really* wrong. Instead
>> of letting the problem fester and continue to corrupt data, it stops the
>> system.
>
>
> One could argue instead that for non-root filesystems the correct action is
> to stop all operations on that filesystem but let the rest of the system
> continue.
>
> --
> Bruce Cran
UFS corruption panic
Guys,

Is a panic **really** appropriate for a filesystem that isn't even in fstab? i.e.:

panic: ufs_dirbad: /mnt: bad dir ino 3229 at offset 0: mangled entry

This happened to be a file-backed md volume that got changed because I forgot to unmount it beforehand. However, as a result there are now inconsistencies and probably data corruption or even missing data on other important filesystems (i.e. /, /var etc.), because there wasn't even a sync or any other kind of sensible behaviour. This is on a production box, which also has gmirror, so I now have no idea what state it's going to be in when I can get a display attached.

Surely the appropriate response here for non-critical filesystems is to warn and suggest manually inspecting it, as turning a working production box into one that's dead in the water seems a little extreme.

J
Re: GENERIC make buildkernel error / fails - posix_fadvise
On Thu, 12 Jan 2012 19:11:54 -0800 Garrett Cooper wrote:

> On Thu, Jan 12, 2012 at 5:52 PM, Doug Barton wrote:
> >
> >>> chflags -R noschg /usr/obj/usr
> >>> rm -rf /usr/obj/usr
> >
> > It's much faster to do:
> >
> > /bin/rm -rf ${obj}/* 2> /dev/null || /bin/chflags -R 0 ${obj}/* &&
> > /bin/rm -rf ${obj}/*
>
> +1. And it's faster yet when you can run parallel copies of rm on
> different portions of the directory tree (e.g. xargs, find [..] -exec)
> as rm is O(n).
> Cheers,
> -Garrett

What I've been doing just before I do a make buildworld/buildkernel is:

mdmfs -s2g md1 /usr/obj

on a clean /usr/obj. If I need to recompile before a boot, I just umount and recreate. It provides a little performance boost too.

Regards,
--
joe
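Doug's "try rm first, clear flags only if that fails" idea can be written as a small function. This is a sketch, not his exact one-liner: clean_objdir is a made-up name, the chflags call is skipped where the tool is not installed, and, like the original glob, it leaves dotfiles at the top level alone.

```shell
#!/bin/sh
# clean_objdir DIR: empty DIR, clearing file flags only if a plain rm fails.
clean_objdir() {
    obj=$1
    [ -d "$obj" ] || return 1
    # Fast path: most of an obj tree carries no schg flags at all.
    if ! rm -rf "${obj:?}"/* 2>/dev/null; then
        # chflags is the BSD tool; "0" clears every flag on the leftovers.
        if command -v chflags >/dev/null 2>&1; then
            chflags -R 0 "$obj"/*
        fi
        rm -rf "${obj:?}"/*
    fi
}
```

Usage would be `clean_objdir /usr/obj/usr`. The `${obj:?}` expansion aborts instead of expanding to `/*` if the argument is somehow empty, which is cheap insurance for anything that runs rm -rf as root.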
Re: FLAME - security advisories on the 23rd ? uncool idea is uncool
The serious one (telnetd) is already being exploited in the wild, and if you're running telnetd anyway then you can always switch to ssh or ACL the port; either way it is a relative non-issue to ignore the update for now...

Damien Fleuriot wrote:
My point (which may or may not be valid) was that if the vulnerabilities remained *undisclosed*, they would have a much lower chance of being exploited.

On 12/23/11 5:47 PM, Joe Holden wrote:
So don't update until Monday? The outcome will be the same :)

Damien Fleuriot wrote:
Hey up list,

Look, just a rant here. Who in *HELL* thought it would be a cool idea to release no less than FOUR security advisories today? I mean, couldn't this have waited and remained undisclosed until Monday? I for one do *NOT* relish the idea of updating 50+ boxes this evening and tomorrow! Not to mention a whole lot of merchants and banks have toggled IT Freeze a few weeks ago, to ensure xmas shopping doesn't get disturbed by production changes. Seriously, this is just irritating.

/flame
Re: FLAME - security advisories on the 23rd ? uncool idea is uncool
So don't update until Monday? The outcome will be the same :)

Damien Fleuriot wrote:
Hey up list,

Look, just a rant here. Who in *HELL* thought it would be a cool idea to release no less than FOUR security advisories today? I mean, couldn't this have waited and remained undisclosed until Monday? I for one do *NOT* relish the idea of updating 50+ boxes this evening and tomorrow! Not to mention a whole lot of merchants and banks have toggled IT Freeze a few weeks ago, to ensure xmas shopping doesn't get disturbed by production changes. Seriously, this is just irritating.

/flame
Re: Benchmark (Phoronix): FreeBSD 9.0-RC2 vs. Oracle Linux 6.1 Server
Arnaud Lacombe wrote:
Hi,

On Thu, Dec 15, 2011 at 2:32 AM, O. Hartmann wrote:
Just saw this short benchmark on Phoronix dot com today: http://www.phoronix.com/scan.php?page=news_item&px=MTAyNzA

It might be worth highlighting that, despite Oracle Linux 6.1 Server using a kernel + compiler almost 2 years old, it still manages to out-perform the bleeding-edge FreeBSD :-)

serenity# gcc --version
gcc (GCC) 4.2.1 20070831 patched [FreeBSD]
serenity# uname -r
9.0-RC3

Now, from what I've read so far in this thread, it seems that a lot of people are still in abnegation...

my 0.2c,
- Arnaud

It may be worth discussing the sad performance of FreeBSD in some parts of the benchmark. A difference of a factor of 10 or 100 is simply far beyond disappointing; it is more than unacceptable, and just from reading those benchmarks I'd like to drop the idea of using FreeBSD even as a backend server in scientific and business environments. In detail, some of the SciMark benches look disappointing. The fact that FreeBSD performs better in C-Ray can't rescue the overall picture. As for the compiler, I'd say it could account for a drop of no more than 10-15% in performance, but not a factor of 10 or 100. I'm just thinking about the discussion of SCHED_ULE and all the sore spots we discussed when I stumbled over the test.

Regards,
Oliver
Unable to attach USB disks at boot time
I have a VMware ESX 4.1 Update 1 server (underlying hardware is a Cisco UCS C210) to which I have connected two WD My Book 1130 drives. I have allocated both drives to my FreeBSD RELENG_8 VM (amd64). At boot time, I see: Root mount waiting for: usbus1 usb_alloc_device: set address 2 failed (USB_ERR_TIMEOUT, ignored) Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT, ignored) Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 usbd_req_re_enumerate: addr=2, set address failed! 
(USB_ERR_TIMEOUT, ignored) Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 ugen1.2: at usbus1 (disconnected) uhub_reattach_port: could not allocate new device Root mount waiting for: usbus1 Root mount waiting for: usbus1 usb_alloc_device: set address 2 failed (USB_ERR_TIMEOUT, ignored) Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT, ignored) Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 usbd_req_re_enumerate: addr=2, set address failed! 
(USB_ERR_TIMEOUT, ignored) Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 Root mount waiting for: usbus1 ugen1.2: at usbus1 (disconnected) uhub_reattach_port: could not allocate new device

However, once FreeBSD is fully booted, I can detach then reattach the drives (through VIC), and they attach just fine:

ugen1.2: at usbus1
umass0: on usbus1
umass0: SCSI over Bulk-Only; quirks = 0x
umass0:1:0:-1: Attached to scbus1
da1 at umass-sim0 bus 0 scbus1 target 0 lun 0
da1: Fixed Direct Access SCSI-6 device
da1: 40.000MB/s transfers
da1: 1907697MB (3906963456 512 byte sectors: 255H 63S/T 243197C)
ses0 at umass-sim0 bus 0 scbus1 target 0 lun 1
ses0: Fixed Enclosure Services SCSI-6 device
ses0: 40.000MB/s transfers
ses0: SCSI-3 SES Device
ugen1.3: at usbus1
umass1: on usbus1
umass1: SCSI over Bulk-Only; quirks = 0x
umass1:2:1:-1: Attached to scbus2
da2 at umass-sim1 bus 1 scbus2 target 0 lun 0
da2: Fixed Direct Access SCSI-6 device
da2: 40.000MB/s transfers
da2: 1907697MB (3906963456 512 byte sectors: 255H 63S/T 243197C)
ses1 at umass-sim1 bus 1 scbus2 target 0 lun 1
ses1: Fixed Enclosure Services SCSI-6 device
ses1: 40.000MB/s transfers
ses1: SCSI-3 SES Device

I'm running FreeBSD RELENG_8 from Sat Jul 2 17:40:20 EDT 2011. I had an older Maxtor drive connected to this VM previously, and it was working fine. These WD drives are USB 3, but operating in USB 2 mode.

Any advice? Thanks.

Joe
--
Joe Marcus Clarke
FreeBSD GNOME Team :: gn...@freebsd.org
FreeNode / #freebsd-gnome
http://www.FreeBSD.org/gnome
LTO3 tape drive not detected
This was originally posted on the freebsd-questions list. It was suggested that I post it here: I have FreeBSD 8.2-RELEASE running on an HP DL360 G5. I recently added an (HP branded) LSI Logic single channel SCSI 320 card and attached an HP Ultrium 920 LTO3 tape drive. The system sees the SCSI controller as mpt0, and it seems to know there's something at SCSI ID 4, but I get an "AutoSense Failed" for hba/id/lun 0:4:0 at boot and subsequent camcontrol rescans. I checked the supported hardware doc for the release but it doesn't get very specific about tape drives. This is my first experience with LTO3 tape. I was hoping that I'd automagically get a /dev/sa0 device like I always did with my old DLT drives but it wasn't to be this time. Is there a way to make this drive work?
Re: HEADS UP: FreeBSD 6.4 and 8.0 EoLs coming soon
On 21/09/2010 11:49 PM, Willem Jan Withagen wrote:
On 2010-09-21 15:16, Jeremy Chadwick wrote:
On Tue, Sep 21, 2010 at 02:59:46PM +0200, Willem Jan Withagen wrote:
On 2010-09-21 13:39, {some mysterious person :-)} wrote:
The Project is ultimately about the users, right? There are early signs that some old FreeBSD users are getting tired of those changes, those removals, lesser POLA adherence, marketing-not-technical stuff for time-not-feature-based releases, a -STABLE that is not as stable as it used to be, and so on, and are migrating to other systems. And older users are more valuable to the project than newer ones. Maybe it's time to revert to some of the Good Old Things, if the decade-long project is mostly ended, while those signs are still early and not a strong tendency? Given this thread, I mentioned earlier about 12 messages in announce@ from 2002 with such public calls for volunteers; there have been several years already without these.

Andriy wasn't the one who wrote this. In fact, I'm not sure who the quote actually came from because I never received the email it came from, but I'm under the impression it's from Vadim. My mail spool:

My bad for not checking the included reference. I was also very much under the impression that that quote was Vadim's, since it was completely in line with his previous complaints/rants/whining. And yes, you are smart to stay out of the discussion. But this old fart just had too much urge to react. So now I'll just go back to my old lurking state.

My thoughts are below, remembering it's a volunteer project, people spend their precious time to make it happen, and, all of that notwithstanding, it's still damn good:

a) if you don't like it, fix it.
b) if you can't fix it, pay someone else to fix it
c) if you can't fix it or otherwise be helpful, remain silent

If you can't do a or b or c, and still have no options, below:

d) whinging never helps
e) those that whinge on volunteer projects are subject to the emperor's wrath
f) kill the heretic, the witch, the unbeliever. Recover the gene-seed at all costs.

Cheers
Joe

--WjW
___ freebsd-secur...@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-security To unsubscribe, send any mail to "freebsd-security-unsubscr...@freebsd.org"
Re: ZFS performance on 7.2-release/amd64 low compared to UFS2 + SoftUpdates
The difference in layout can easily explain a 2x difference in sequential transfer performance. I seriously doubt your disk is really getting 23K seeks/s done in the UFS case; 100/s sounds much more reasonable for real hardware. Perhaps the result of caching?

Joe Koberg

Dan Naumov wrote:
I am wondering whether the numbers I am seeing are expected or whether something is broken somewhere. Output of bonnie -s 1024:

on UFS2 + SoftUpdates:

             ----------Sequential Output----------- ----Sequential Input----- ---Random----
             --Per Char-- ---Block---- --Rewrite--- --Per Char-- ---Block---- ----Seeks----
Machine   MB  K/sec  %CPU  K/sec  %CPU  K/sec  %CPU  K/sec  %CPU  K/sec  %CPU    /sec  %CPU
        1024  56431  94.5  88407  38.9  77357  53.3  64042  98.6 644511  98.6 23603.8 243.3

on ZFS:

             ----------Sequential Output----------- ----Sequential Input----- ---Random----
             --Per Char-- ---Block---- --Rewrite--- --Per Char-- ---Block---- ----Seeks----
Machine   MB  K/sec  %CPU  K/sec  %CPU  K/sec  %CPU  K/sec  %CPU  K/sec  %CPU    /sec  %CPU
        1024  22591  53.7  45602  35.1  14770  13.2  45007  83.8  94595  28.0   102.2   1.2

atom# cat /boot/loader.conf
vm.kmem_size="1024M"
vm.kmem_size_max="1024M"
vfs.zfs.arc_max="96M"

The test isn't completely fair in that the test on UFS2 is done on a partition that resides on the first 16 GB of a 2 TB disk, while the ZFS test is done on the enormous 1.9 TB ZFS pool that comes after that partition (same disk). Can this difference in layout account for the huge difference in performance, or is there something else in play? The system is an Intel Atom 330 dual-core, 2 GB RAM, Western Digital Green 2 TB disk.

Also, what would be another good way to get good numbers for comparing the performance of UFS2 vs ZFS on the same system?

Sincerely,
- Dan Naumov
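To put the "2x" remark next to the quoted figures, the UFS/ZFS ratios can be worked out from the two bonnie runs. The short sketch below uses only numbers copied from the output above; the dictionary keys and helper names are labels invented for this example. It shows block writes differing by roughly a factor of two, while block reads and seeks differ by far more, and the UFS block-read figure works out to roughly 629 MB/s, which a single disk of this class is unlikely to sustain with a 1024 MB test file on a machine with 2 GB of RAM. That fits the caching suspicion raised in the reply, though it does not prove it.

```python
# Throughput figures copied from the two bonnie runs quoted above
# (K/sec for the transfer tests, seeks per second for the random test).
UFS = {"block_write": 88407, "rewrite": 77357, "block_read": 644511, "seeks": 23603.8}
ZFS = {"block_write": 45602, "rewrite": 14770, "block_read": 94595, "seeks": 102.2}


def ratios(a, b):
    """Return a[test] / b[test] for every test present in a."""
    return {test: a[test] / b[test] for test in a}


def k_per_sec_to_mb(k_per_sec):
    """Convert bonnie's K/sec (units of 1024 bytes) to MB/s."""
    return k_per_sec / 1024


if __name__ == "__main__":
    for test, ratio in ratios(UFS, ZFS).items():
        print(f"{test:12s} UFS/ZFS = {ratio:6.1f}x")
    print(f"UFS block read = {k_per_sec_to_mb(UFS['block_read']):.0f} MB/s")
```

A fairer comparison would use a test file several times larger than RAM so that neither filesystem can serve the reads from cache.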
Re: Sudden weird SATA problem on RELENG_7 (Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up))
on 23/05/2009 05:26 Alexander Motin said the following: Hi. Joe Karthauser wrote: I spoke too soon. It must have just randomly booted, because it is now hanging again. No amount of jiggling cables has made any difference. Can you provide verbose boot messages of your system from the beginning up to the problem? Especially, all related to the ATA. Attached. > Do you have AHCI mode enabled in BIOS, or you using legacy ATA emulation? It's set up as AHCI in the bios. What is strange is that it has now started working again. I can't make any sense of it. The machine boots up fine. It was definitely hanging at the ata probes though, just after the ZFS messages are output. Joe Copyright (c) 1992-2009 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 7.2-STABLE #7: Fri May 22 23:10:15 BST 2009 r...@athenaeum.tao.org.uk:/usr/obj/usr/src/sys/ATHENAEUM Preloaded elf kernel "/boot/kernel/kernel" at 0x80b47000. Preloaded elf module "/boot/kernel/zfs.ko" at 0x80b4719c. Preloaded elf module "/boot/kernel/opensolaris.ko" at 0x80b47244. Preloaded elf module "/boot/kernel/geom_eli.ko" at 0x80b472f4. Preloaded elf module "/boot/kernel/crypto.ko" at 0x80b473a4. Preloaded elf module "/boot/kernel/zlib.ko" at 0x80b47450. Preloaded elf module "/boot/kernel/geom_label.ko" at 0x80b474fc. Preloaded elf module "/boot/kernel/geom_mirror.ko" at 0x80b475ac. Preloaded /boot/zfs/zpool.cache "/boot/zfs/zpool.cache" at 0x80b4765c. Preloaded elf module "/boot/kernel/acpi.ko" at 0x80b476b4. module_register: module g_label already exists! Module g_label failed to register: 17 Calibrating clock(s) ... i8254 clock: 1192003 Hz CLK_USE_I8254_CALIBRATION not specified - using default frequency Timecounter "i8254" frequency 1193182 Hz quality 0 Calibrating TSC clock ... 
TSC clock: 2402413236 Hz CPU: Intel(R) Core(TM)2 Quad CPUQ6600 @ 2.40GHz (2402.41-MHz 686-class CPU) Origin = "GenuineIntel" Id = 0x6fb Stepping = 11 Features=0xbfebfbff Features2=0xe3bd AMD Features=0x2010 AMD Features2=0x1 Cores per package: 4 Instruction TLB: 4 KB Pages, 4-way set associative, 128 entries 1st-level instruction cache: 32 KB, 8-way set associative, 64 byte line size 1st-level data cache: 32 KB, 8-way set associative, 64 byte line size L2 cache: 4096 kbytes, 16-way associative, 64 bytes/line real memory = 3756916736 (3582 MB) Physical memory chunk(s): 0x1000 - 0x0009dfff, 643072 bytes (157 pages) 0x0010 - 0x003f, 3145728 bytes (768 pages) 0x00c25000 - 0xdbf7, 3677728768 bytes (897883 pages) avail memory = 3673681920 (3503 MB) Table 'FACP' at 0xdfee30c0 Table 'HPET' at 0xdfee7e00 Table 'MCFG' at 0xdfee7e80 Table 'APIC' at 0xdfee7d00 MADT: Found table at 0xdfee7d00 MP Configuration Table version 1.4 found at 0x800f0d00 APIC: Using the MADT enumerator. MADT: Found CPU APIC ID 0 ACPI ID 0: enabled SMP: Added CPU 0 (AP) MADT: Found CPU APIC ID 3 ACPI ID 1: enabled SMP: Added CPU 3 (AP) MADT: Found CPU APIC ID 2 ACPI ID 2: enabled SMP: Added CPU 2 (AP) MADT: Found CPU APIC ID 1 ACPI ID 3: enabled SMP: Added CPU 1 (AP) ACPI APIC Table: INTR: Adding local APIC 1 as a target INTR: Adding local APIC 2 as a target INTR: Adding local APIC 3 as a target FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 cpu2 (AP): APIC ID: 2 cpu3 (AP): APIC ID: 3 bios32: Found BIOS32 Service Directory header at 0x800fad30 bios32: Entry = 0xfb3f0 (800fb3f0) Rev = 0 Len = 1 pcibios: PCI BIOS entry at 0xf+0xb420 pnpbios: Found PnP BIOS data at 0x800fbf90 pnpbios: Entry = f:bfc0 Rev = 1.0 Other BIOS signatures found: APIC: CPU 0 has ACPI ID 0 APIC: CPU 1 has ACPI ID 3 APIC: CPU 2 has ACPI ID 2 APIC: CPU 3 has ACPI ID 1 ULE: setup cpu group 0 ULE: setup cpu 0 ULE: adding cpu 0 to group 0: cpus 1 mask 0x1 ULE: setup cpu group 1 ULE: setup 
cpu 1 ULE: adding cpu 1 to group 1: cpus 1 mask 0x2 ULE: setup cpu group 2 ULE: setup cpu 2 ULE: adding cpu 2 to group 2: cpus 1 mask 0x4 ULE: setup cpu group 3 ULE: setup cpu 3 ULE: adding cpu 3 to group 3: cpus 1 mask 0x8 This module (opensolaris) contains code covered by the Common Development and Distribution License (CDDL) see http://opensolaris.org/os/licensing/opensolaris_license/ ACPI: RSDP @ 0x0xf6c30/0x0014 (v 0 GBT ) ACPI: RSDT @ 0x0xdfee3040/0x0034 (v 1 GBTGBTUACPI 0x42302E31 GBTU 0x01010101) ACPI: FACP @ 0x0xdfee30c0/0x0074 (v 1 GBTGBTUACPI 0x42302E31 GBTU 0x01010101) ACPI: DSDT @ 0x0xdfee3180/0x4B32 (v 1 GBTGBTUACPI 0x1000 MSFT 0x010C) ACPI: FACS @ 0x0xdfee/0x0040 ACPI: HPET @ 0x0xdfe
Re: Sudden weird SATA problem on RELENG_7 (Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up))
I spoke too soon. It must have just randomly booted, because it is now hanging again. No amount of jiggling cables has made any difference. :(

Joe

on 22/05/2009 20:40 Joe Karthauser said the following:
Hi Alexander,
I'd love it if you were able to provide some insight into this problem. I'm going to try switching SATA cables around next to see if the problem goes away if I disconnect some combination of bays.
Thanks,
Joe

on 22/05/2009 19:39 Kip Macy said the following:
Motin is your best bet in tracking down ATA problems.
Cheers,
Kip

On Fri, May 22, 2009 at 10:40 AM, Joe Karthauser wrote:
Hi Kip,
I seriously don't understand what has happened. If I boot kernel.old I still get the same problem. Very confusing. :(
Joe

on 21/05/2009 19:28 Kip Macy said the following:
I have no idea what is happening. I think our best bet is having someone with insight into ATA provide us with help in adding diagnostics. Sorry for the trouble. Perhaps you can just roll back to 7.2 for now.
Cheers,
Kip

On Thu, May 21, 2009 at 10:50 AM, Joe Karthauser wrote:
Hmm, I've had a bit of a miserable afternoon trying to fight my RELENG_7 server, which now doesn't boot. :(

So, it's a RAIDZ2 pool with a ufs/gmirror root partition split over 5 disks (gmirror on a 500 MB partition on each of the five disks, and raidz2 over the rest of each drive). What I did was to update the userland, and then reboot. I didn't upgrade the kernel (but I've subsequently done that and have the same problem).

What happens is that the kernel hangs while booting, just after displaying a LABEL message or ZFS pool/spool message. I _can_ get it to boot if I boot single user with ACPI switched off. When I do that I can manually start zfs, and mount all the partitions. However, one of the disks is missing; more on that next.

The machine is running a Gigabyte motherboard (domestic gamer P35 board, similar to this http://www.gigabyte.com.tw/Products/Motherboard/Products_Overview.aspx?ProductID=2533, although it might be a DS4 variant). I've got 5 of the 6 SATA ports wired to a 5-unit SATA hot-swap bay (5 drives vertically mounted into 3 5-1/4" bays kind of thing).

Now, because of the gmirror I can boot the system on any disk, or combination of plugged-in disks. I should be able to succeed with the kernel probe up to the attempt to mount the root filesystem irrespective of any zfs pool, etc. And, indeed, this has been working fine for about two years. But now it hangs in the same place no matter what disk I boot on (I've tried every bay). But without ACPI enabled it does appear to boot ok... what's going on here? Is it possible that the machine has developed a hardware fault?

Ok, finally, if I boot with ACPI disabled then one of the disks is missing. If I unplug it I get a disconnect message from the ata device, and a reconnect and reinit attempt when I plug it back in, but no device appears on the bus. Usually I can do an 'atacontrol detach sata4; sleep 1; atacontrol attach sata4' and the device reappears. This happens on the other buses, but not on the last one. It's not the disk, because if I swap it into another bay, it comes up and appears on the bus. On the other hand it doesn't appear to be that controller or slot in the drive bay, because if I unplug all the other disks the system will boot that disk and get as far as the hang. Hmm. Is this a consequence of disabling ACPI? Does anyone have a clue what might be going on?

Joe
Re: Sudden weird SATA problem on RELENG_7 (Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up))
This appears to have gone away now. I unplugged the bay that was causing the trouble, and the system booted just fine on the remaining 4 drives. Then I plugged the bay back in (live) and did an atacontrol detach/attach on that bus (I wonder why I always have to do that). The drive was seen, and ZFS resilvered itself. I'm doing a ZFS scrub now to make sure that everything is good, and I'll do a reboot and see if it's all ok after that.

Strange, so it looks like a cable might have got a little loose or something. I wonder why that would have hung the kernel probe though.

Joe

on 22/05/2009 20:40 Joe Karthauser said the following:
Hi Alexander,
I'd love it if you were able to provide some insight into this problem. I'm going to try switching SATA cables around next to see if the problem goes away if I disconnect some combination of bays.
Thanks,
Joe

on 22/05/2009 19:39 Kip Macy said the following:
Motin is your best bet in tracking down ATA problems.
Cheers,
Kip

On Fri, May 22, 2009 at 10:40 AM, Joe Karthauser wrote:
Hi Kip,
I seriously don't understand what has happened. If I boot kernel.old I still get the same problem. Very confusing. :(
Joe

on 21/05/2009 19:28 Kip Macy said the following:
I have no idea what is happening. I think our best bet is having someone with insight into ATA provide us with help in adding diagnostics. Sorry for the trouble. Perhaps you can just roll back to 7.2 for now.
Cheers,
Kip

On Thu, May 21, 2009 at 10:50 AM, Joe Karthauser wrote:
Hmm, I've had a bit of a miserable afternoon trying to fight my RELENG_7 server, which now doesn't boot. :(

So, it's a RAIDZ2 pool with a ufs/gmirror root partition split over 5 disks (gmirror on a 500 MB partition on each of the five disks, and raidz2 over the rest of each drive). What I did was to update the userland, and then reboot. I didn't upgrade the kernel (but I've subsequently done that and have the same problem).

What happens is that the kernel hangs while booting, just after displaying a LABEL message or ZFS pool/spool message. I _can_ get it to boot if I boot single user with ACPI switched off. When I do that I can manually start zfs, and mount all the partitions. However, one of the disks is missing; more on that next.

The machine is running a Gigabyte motherboard (domestic gamer P35 board, similar to this http://www.gigabyte.com.tw/Products/Motherboard/Products_Overview.aspx?ProductID=2533, although it might be a DS4 variant). I've got 5 of the 6 SATA ports wired to a 5-unit SATA hot-swap bay (5 drives vertically mounted into 3 5-1/4" bays kind of thing).

Now, because of the gmirror I can boot the system on any disk, or combination of plugged-in disks. I should be able to succeed with the kernel probe up to the attempt to mount the root filesystem irrespective of any zfs pool, etc. And, indeed, this has been working fine for about two years. But now it hangs in the same place no matter what disk I boot on (I've tried every bay). But without ACPI enabled it does appear to boot ok... what's going on here? Is it possible that the machine has developed a hardware fault?

Ok, finally, if I boot with ACPI disabled then one of the disks is missing. If I unplug it I get a disconnect message from the ata device, and a reconnect and reinit attempt when I plug it back in, but no device appears on the bus. Usually I can do an 'atacontrol detach sata4; sleep 1; atacontrol attach sata4' and the device reappears. This happens on the other buses, but not on the last one. It's not the disk, because if I swap it into another bay, it comes up and appears on the bus. On the other hand it doesn't appear to be that controller or slot in the drive bay, because if I unplug all the other disks the system will boot that disk and get as far as the hang. Hmm. Is this a consequence of disabling ACPI? Does anyone have a clue what might be going on?

Joe
Sudden weird SATA problem on RELENG_7 (Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up))
Hi Alexander,

I'd love it if you were able to provide some insight into this problem. I'm going to try switching SATA cables around next to see if the problem goes away if I disconnect some combination of bays.

Thanks,
Joe

on 22/05/2009 19:39 Kip Macy said the following:
Motin is your best bet in tracking down ATA problems.
Cheers,
Kip

On Fri, May 22, 2009 at 10:40 AM, Joe Karthauser wrote:
Hi Kip,
I seriously don't understand what has happened. If I boot kernel.old I still get the same problem. Very confusing. :(
Joe

on 21/05/2009 19:28 Kip Macy said the following:
I have no idea what is happening. I think our best bet is having someone with insight into ATA provide us with help in adding diagnostics. Sorry for the trouble. Perhaps you can just roll back to 7.2 for now.
Cheers,
Kip

On Thu, May 21, 2009 at 10:50 AM, Joe Karthauser wrote:
Hmm, I've had a bit of a miserable afternoon trying to fight my RELENG_7 server, which now doesn't boot. :(

So, it's a RAIDZ2 pool with a ufs/gmirror root partition split over 5 disks (gmirror on a 500 MB partition on each of the five disks, and raidz2 over the rest of each drive). What I did was to update the userland, and then reboot. I didn't upgrade the kernel (but I've subsequently done that and have the same problem).

What happens is that the kernel hangs while booting, just after displaying a LABEL message or ZFS pool/spool message. I _can_ get it to boot if I boot single user with ACPI switched off. When I do that I can manually start zfs, and mount all the partitions. However, one of the disks is missing; more on that next.

The machine is running a Gigabyte motherboard (domestic gamer P35 board, similar to this http://www.gigabyte.com.tw/Products/Motherboard/Products_Overview.aspx?ProductID=2533, although it might be a DS4 variant). I've got 5 of the 6 SATA ports wired to a 5-unit SATA hot-swap bay (5 drives vertically mounted into 3 5-1/4" bays kind of thing).

Now, because of the gmirror I can boot the system on any disk, or combination of plugged-in disks. I should be able to succeed with the kernel probe up to the attempt to mount the root filesystem irrespective of any zfs pool, etc. And, indeed, this has been working fine for about two years. But now it hangs in the same place no matter what disk I boot on (I've tried every bay). But without ACPI enabled it does appear to boot ok... what's going on here? Is it possible that the machine has developed a hardware fault?

Ok, finally, if I boot with ACPI disabled then one of the disks is missing. If I unplug it I get a disconnect message from the ata device, and a reconnect and reinit attempt when I plug it back in, but no device appears on the bus. Usually I can do an 'atacontrol detach sata4; sleep 1; atacontrol attach sata4' and the device reappears. This happens on the other buses, but not on the last one. It's not the disk, because if I swap it into another bay, it comes up and appears on the bus. On the other hand it doesn't appear to be that controller or slot in the drive bay, because if I unplug all the other disks the system will boot that disk and get as far as the hang. Hmm. Is this a consequence of disabling ACPI? Does anyone have a clue what might be going on?

Joe
Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up)
Hi Kip, I seriously don't understand what has happened. If I boot kernel.old I still get the same problem. Very confusing. :(. Joe on 21/05/2009 19:28 Kip Macy said the following: I have no idea what is happening. I think our best bet is having someone with insight into ATA provide us with help in adding diagnostics. Sorry for the trouble. Perhaps you can just roll back to 7.2 for now. Cheers, Kip On Thu, May 21, 2009 at 10:50 AM, Joe Karthauser wrote: Hmm, I've had a bit of a miserable afternoon trying to fight my RELENG_7 server, which now doesn't boot. :(. So, it's a ZRAID2 pool with a ufs/gmirror root partition split over 5 disks (gmirror on 500Mb partition on each of five disks, and zraid2 over the rest of each drive). What I did was to update the userland, and then reboot. I didn't upgrade the kernel (but I've subsequently done that and have the same problem). What happens is that the kernel hangs booting just after displaying a LABEL message or ZFS pool/spool message. I _can_ get it to boot if I boot single user with acpi switched off. When I do that I can manually start zfs, and mount all the partitions. However, one of the disks is missing more on that next. The machine is running a gigabyte motherboard (domestic gamer P35 board, similar to this http://www.gigabyte.com.tw/Products/Motherboard/Products_Overview.aspx?ProductID=2533, although it might be a DS4 variant). I've got 5 of the 6 sata ports wired to a 5 unit SATA hot swap bay (5 drives vertially mounted into 3 5-1/4" bays kind of thing). Now, because of the gmirror I can boot the system on any disk, or combination of plugged in disks. I should be able to succeed with the kernel probe up to the attempt to mount the root filesystem irrespective of any zfs pool, etc. And, indeed, this has been working fine for about two years. But, now it hangs in the same place no matter what disk I boot on (I've tried every bay). But, without ACPI enabled it does appear to boot ok... what's going on here? 
Is it possible that the machine has developed a hardware fault? Ok, finally, if I boot with ACPI disabled then one of the disks is missing. If I unplug it I get a disconnect message from the ata device, and a reconnect and reinit attempt when I plug it back in, but no device appears on the bus. Usually I can do an 'atacontrol detach sata4; sleep 1; atacontrol attach sata4' and the device reappears. This happens on the other buses, but not on the last one. It's not the disk, because if I swap it into another bay, it comes up and appears on the bus. On the other hand, it doesn't appear to be that controller or slot in the drive bay, because if I unplug all the other disks the system will boot that disk and get as far as the hang. Hmm. Is this a consequence of disabling the ACPI? Does anyone have a clue what might be going on? Joe ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up)
Hmm, I've had a bit of a miserable afternoon trying to fight my RELENG_7 server, which now doesn't boot. :(. So, it's a ZRAID2 pool with a ufs/gmirror root partition split over 5 disks (gmirror on 500Mb partition on each of five disks, and zraid2 over the rest of each drive). What I did was to update the userland, and then reboot. I didn't upgrade the kernel (but I've subsequently done that and have the same problem). What happens is that the kernel hangs booting just after displaying a LABEL message or ZFS pool/spool message. I _can_ get it to boot if I boot single user with acpi switched off. When I do that I can manually start zfs, and mount all the partitions. However, one of the disks is missing more on that next. The machine is running a gigabyte motherboard (domestic gamer P35 board, similar to this http://www.gigabyte.com.tw/Products/Motherboard/Products_Overview.aspx?ProductID=2533, although it might be a DS4 variant). I've got 5 of the 6 sata ports wired to a 5 unit SATA hot swap bay (5 drives vertially mounted into 3 5-1/4" bays kind of thing). Now, because of the gmirror I can boot the system on any disk, or combination of plugged in disks. I should be able to succeed with the kernel probe up to the attempt to mount the root filesystem irrespective of any zfs pool, etc. And, indeed, this has been working fine for about two years. But, now it hangs in the same place no matter what disk I boot on (I've tried every bay). But, without ACPI enabled it does appear to boot ok... what's going on here? Is it possible that the machine has developed a hardware fault? Ok, finally, if I boot with ACPI disabled then one of the disks is missing. If I unplug it I get a disconnect message from the ata device, and a reconnect and reinit attempt when I plug it back in, but no device appears on the bus. Usually I can do a 'atacontrol detach sata4; sleep 1; atacontrol attach sata4' and the device reappears. This happens on the other buses, but not on the last one. 
It's not the disk, because if I swap it into another bay, it comes up and appears on the bus. On the other hand, it doesn't appear to be that controller or slot in the drive bay, because if I unplug all the other disks the system will boot that disk and get as far as the hang. Hmm. Is this a consequence of disabling the ACPI? Does anyone have a clue what might be going on? Joe ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Error message: run_interrupt_driven_hooks:...
Greetings...

Basic data on my experience with the xpt_config hang; I have more detail if needed, but I doubt anyone will believe it. I'm not even sure I do.

Some other reports:

http://lists.freebsd.org/pipermail/freebsd-questions/2009-April/196116.html
Seur Bors Thu Apr 9 14:43:34 UTC 2009

http://lists.freebsd.org/pipermail/freebsd-stable/2009-May/049901.html
martinko gamato Mon May 11 22:05:56 UTC 2009

http://www.nabble.com/Freebsd-7.2-RC-boot-problem-tt23257632.html#a23257632
http://forums.pcbsd.org/viewtopic.php?f=1&t=13312

Here is the entire error for me during boot:

run_interrupt_driven_hooks: still waiting after BIGNUM seconds for xpt_config

It hangs after this point in the boot process:

pcm0:
pcm0:

The boot process does not continue, so the next normal thing does not appear on the console:

SMP: AP CPU #1 Launched!

But during the hang, this scrolls past (punctuated by the BIGNUM seconds wait) over and over on the console:

acpi_tz0: _TMP value is absurd, ignored (-269.4C)

Normally, that message is suppressed by this /etc/sysctl.conf entry:

hw.acpi.thermal.polling_rate=0

I suppose this means that /etc/sysctl.conf is not parsed and the second CPU is not launched.

Hardware in question, as seen by dmesg, is attached; the vendor's specs are:

Core 2 Duo (C) E6400 2.13 GHz
1066 MHz front side bus
Socket 775
Chipset P965
Motherboard: Asus P5BW-LA
HP/Compaq motherboard name: Basswood-UL8E

There is RAID on the motherboard; I don't use it. I do use AHCI. BIOS is current; there are no available updates. The onboard firewire is disabled, since it began (prior to 7.1) causing unresolvable panics.

CAM is in my kernel:

# SCSI peripherals
#Added atapicam; apparently, cdparanoia requires it.
device atapicam
device scbus # SCSI bus (required for SCSI)
device da # Direct Access (disks)
device sa # Sequential Access (tape etc)
device cd # CD
device pass # Passthrough device (direct SCSI access)

As of 9:30 PM EDT May 11, the issue has de-Heisenberged from my PC.
I'm not subscribed to the list; so you'll need to Cc: me if you think I can help. Copyright (c) 1992-2009 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 7.1-RELEASE-p5 #0: Sun May 3 06:43:50 EDT 2009 r...@whisperer.chthonixia.net:/usr/obj/usr/src/sys/WHISPERER Timecounter "i8254" frequency 1193182 Hz quality 0 CPU: Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz (2135.55-MHz 686-class CPU) Origin = "GenuineIntel" Id = 0x6f6 Stepping = 6 Features=0xbfebfbff Features2=0xe3bd AMD Features=0x2000 AMD Features2=0x1 Cores per package: 2 real memory = 2146299904 (2046 MB) avail memory = 2094936064 (1997 MB) ACPI APIC Table: FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 ioapic0: Changing APIC ID to 4 ioapic0 irqs 0-23 on motherboard kbd1 at kbdmux0 ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413) acpi0: on motherboard acpi0: [ITHREAD] acpi0: Power Button (fixed) acpi0: reservation of 0, a (3) failed acpi0: reservation of 10, 7fde (3) failed Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000 acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0 acpi_hpet0: iomem 0xfed0-0xfed003ff on acpi0 device_attach: acpi_hpet0 attach returned 12 acpi_button0: on acpi0 pcib0: port 0xcf8-0xcff on acpi0 pci0: on pcib0 pcib1: irq 16 at device 1.0 on pci0 pci1: on pcib1 vgapci0: port 0xde00-0xdeff mem 0xe000-0xefff,0xfddf-0xfddf irq 16 at device 0.0 on pci1 vgapci1: mem 0xfdde-0xfdde at device 0.1 on pci1 uhci0: port 0xff00-0xff1f irq 21 at device 26.0 on pci0 uhci0: [GIANT-LOCKED] uhci0: [ITHREAD] usb0: on uhci0 usb0: USB revision 1.0 uhub0: on usb0 uhub0: 2 ports with 2 removable, self powered uhci1: port 0xfe00-0xfe1f irq 18 at device 26.1 on pci0 uhci1: [GIANT-LOCKED] uhci1: [ITHREAD] usb1: on uhci1 usb1: USB revision 1.0 
uhub1: on usb1 uhub1: 2 ports with 2 removable, self powered ehci0: mem 0xfdfff000-0xfdfff3ff irq 21 at device 26.7 on pci0 ehci0: [GIANT-LOCKED] ehci0: [ITHREAD] usb2: EHCI version 1.0 usb2: companion controllers, 2 ports each: usb0 usb1 usb2: on ehci0 usb2: USB revision 2.0 uhub2: on usb2 uhub2: 4 ports with 4 removable, self powered pcm0: mem 0xfdff4000-0xfdff7fff irq 22 at device 27.0 on pci0 pcm0: [ITHREAD] uhci2: port 0xfd00-0xfd1f irq 23 at device 29.0 on pci0 uhci2: [GIANT-LOCKED] uhci2: [ITHREAD] usb3: on uhci2 usb3: USB revision 1.0 uhub3: on usb3 uhub3: 2 ports with 2 removable, self powered uhci3: port 0xfc00-0xfc1f irq 17 at device 29.1 on pci0 uhci3: [GIANT-LOCKED] uhci3: [ITHREAD] usb4: on uhci3 usb4: USB revision 1.0 uhub4: on usb
Re: xorg 7.4 keyboard localisation (xorg.conf vs hal)
Robert Noland wrote: On Tue, 2009-02-03 at 15:07 +0100, Sebastien Chassot wrote: On Mon, 2009-02-02 at 16:05 -0500, Robert Noland wrote: On Mon, 2009-02-02 at 12:43 -0800, Joe Kelsey wrote: Robert Noland wrote: man xorg.conf search for Input... This provides absolutely no help. I look at my /var/log/Xorg.0.log and it tells me nothing. If I remove the keyboard and mouse input devices from xorg.conf, the log tells me that it is disabling all input devices and never says anything else. There is no evidence that hal does anything that X want to know about. How would I detect that my configuration file needs changing? Is there a message in Xorg.0.log to look for? Is there a help file somehwere which explains how to change your configuration file to allow hal to work? I cannot find any information anywhere in the system to allow me to debug my problems in any way. I want to have fully automatic configuration using whatever means will allow it. You explanations about the mysterious behavior of hal and xorg do not give me any information I can use in any way to solve my problems. Set Options "AutoAddDevices" "off" and you have to configure everything like you used to. I WANT to use the new facilities. Is it possible to debug my configuration problems? Where do I start? How do I enable this magical new world of letting hal do things for me? What changes do I make to xorg.conf to allow this? Ok, are you using gdm, xdm, or startx? You need to ensure that dbus and hald are running first. Set dbus_enable="YES" and hald_enable="YES" in your rc.conf. This FAQ says to remplace dbus/hal by gnome_enable="YES" Correct, if you using gnome, that will enable hal and dbus. zircon.zircon.seattle.wa.us$ ps xa | egrep hal\|dbus 789 ?? Is 0:00.12 /usr/local/bin/dbus-daemon --system 946 ?? Ss 0:17.94 /usr/local/sbin/hald 951 ?? IW 0:00.00 hald-runner 968 ?? IW 0:00.00 hald-addon-mouse-sysmouse: /dev/ums0 (hald-addon-mous 986 ?? 
S 0:09.15 hald-addon-storage: /dev/cd0 (hald-addon-storage) 1027 ?? IW 0:00.00 /usr/local/bin/dbus-launch --exit-with-session 1082 ?? IW 0:00.00 dbus-launch --exit-with-session /usr/local/bin/seahor 1083 ?? Is 0:00.92 /usr/local/bin/dbus-daemon --fork --print-pid 7 --pri 42823 p1 DL+0:00.00 egrep hal|dbus Attached is /etc/rc.conf. /Joe robert. http://www.freebsd.org/gnome/docs/faq2.html#full-gnome # -- sysinstall generated deltas -- # Sun Oct 23 06:00:05 2005 # Created: Sun Oct 23 06:00:05 2005 # Enable network daemons for user convenience. # Please make all changes to this file, not to /etc/defaults/rc.conf. # This file now contains just the overrides from /etc/defaults/rc.conf. defaultrouter="192.168.1.1" hostname="zircon.zircon.seattle.wa.us" ifconfig_sk0="inet 192.168.1.3 netmask 255.255.255.0" linux_enable="YES" nfs_server_enable="YES" nfs_client_enable="YES" rpcbind_enable="YES" rpc_statd_enable="YES" rpc_lockd_enable="YES" sshd_enable="YES" usbd_enable="YES" svscan_enable="YES" moused_enable="YES" # Run the mouse daemon. moused_type="auto" # See man page for rc.conf(5) for available settings. moused_port="/dev/ums0" # Set to your mouse port. moused_flags="-m 3=1 -m 1=3 -m 4=6 -m 6=4 -m 5=7 -m 7=5" mysql_enable="YES" sendmail_enable="NO" sendmail_submit_enable="NO" sendmail_outbound_enable="NO" sendmail_map_queue_enable="NO" gdm_enable="YES" dumpdev="NO" # This file now contains just the overrides from /etc/defaults/rc.conf. # Please make all changes to this file, not to /etc/defaults/rc.conf. # Enable network daemons for user convenience. 
ntpdate_flags=140.142.16.34 ntpdate_enable="YES" ntpd_enable=YES #amd_enable="YES" dbus_enable="YES" polkitd_enable="YES" hald_enable="YES" # The Fish generated deltas - Sat May 5 14:27:39 2007 weak_mountd_authentication="YES" # added by mergebase.sh local_startup="/usr/local/etc/rc.d" cupsd_enable="YES" apache22_enable="YES" ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: xorg 7.4 keyboard localisation (xorg.conf vs hal)
Robert Noland wrote:
> man xorg.conf
> search for Input...

This provides absolutely no help. I look at my /var/log/Xorg.0.log and it tells me nothing. If I remove the keyboard and mouse input devices from xorg.conf, the log tells me that it is disabling all input devices and never says anything else. There is no evidence that hal does anything that X wants to know about. How would I detect that my configuration file needs changing? Is there a message in Xorg.0.log to look for? Is there a help file somewhere which explains how to change your configuration file to allow hal to work? I cannot find any information anywhere in the system to allow me to debug my problems in any way. I want to have fully automatic configuration using whatever means will allow it. Your explanations about the mysterious behavior of hal and xorg do not give me any information I can use in any way to solve my problems.

> Set Options "AutoAddDevices" "off" and you have to configure everything like you used to.

I WANT to use the new facilities. Is it possible to debug my configuration problems? Where do I start? How do I enable this magical new world of letting hal do things for me? What changes do I make to xorg.conf to allow this? /Joe ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: xorg 7.4 keyboard localisation (xorg.conf vs hal)
Robert Noland wrote: On Sun, 2009-02-01 at 14:10 -0800, Joe Kelsey wrote: Sebastien Chassot wrote: On Sun, 2009-02-01 at 17:19 +, Daniel Bye wrote: On Sun, Feb 01, 2009 at 05:42:39PM +0100, Sebastien Chassot wrote: Hi, I've upgrade to xorg7.4 and apparently keyboard and mouse are now working with hald. In xorg.conf changing "old" keybord config as no effect and I can't find how change it with hal. I've got /usr/local/etc/hal/fdi/* but no *keymap* and I don't know how build such a file. This should get you started: gb Change the `gb' in the example to your local keymap name, save the file as /usr/local/etc/hal/fdi/policy/x11-input.fdi and restart hald. This seems to have a way to enable HAL to detect a keyboard and export it to X, but what about mice? My Xorg log tells me that it is ignoring my USB mouse in addition to ignoring my keyboard, so what sort of HAL file do I add to enable it to find my mouse? The above is only to set keyboard layout, everything to detect the keyboard is already present. Where in HAL documentation is this information found? R. Noland seemed to think it was a trivial process to make HAL do keyboards and mice? In fact it is not trivial but a pain in the ass! If you intend to inflict broken software on unsuspecting users you had better think through all of the problems and come up with explicit solutions to all of those problems so that everyone has a chance to make their systems work. We (marcus and I) have gone to great pains to try and ensure that hal behaves correctly in pretty much all mice configurations with or without sysmouse. If you don't want to use hal, set AutoAddDevices off and configure away. I did my best to follow ALL of the posted directions to absolutely NO AVAIL. When I start Xorg, it explicitly tells me it is disabling all automatic devices and refuses to use HAL or any other detectable methosd to find the mouse and/or keyboard. There is no documentation ANYWHERE about how HAL is supposed to help in any of this. 
There is no documentation ANYWHERE about what exactly the new Xorg is supposed to do about it. There is no documentation ANYWHERE about the new, secret, hidden options that you can put in your xorg.conf file to disable this whole HAL mess. The only documentation available ANYWHERE is the skimpy little paragraph that says, it works or it doesn't. No explanation about why it works or doesn't or how to determine exactly what might be wrong in your configuration to make it work or not work. I would not complain if you actually documented what you are inflicting on us rather than just say, here it is, good luck! I understand how difficult some of these port upgrades are, but you have to realize that you have not provided ANY resources that anyone else can use to help themselves figure out their issues. I don't want to pay you with money I do not have from a job I do not have. You have to realize how many people may or may not have problems due to your blithe posting of this complicated mess. Either explain how to use HAL properly to configure X resources or disable the capability. Thank you for all of your effort so far. I really do appreciate it. /Joe There had better not be any more surprises waiting in the X 1.6 wings to surprise and confound everyone again! Are you going to stop paying me? You have no idea how many combinations of hardware and configurations for X exist, or the amount of work that goes into making all of those combinations work. robert. 
I'll start with that Thank you Sebastien ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: xorg 7.4 keyboard localisation (xorg.conf vs hal)
Sebastien Chassot wrote: On Sun, 2009-02-01 at 17:19 +, Daniel Bye wrote: On Sun, Feb 01, 2009 at 05:42:39PM +0100, Sebastien Chassot wrote: Hi, I've upgrade to xorg7.4 and apparently keyboard and mouse are now working with hald. In xorg.conf changing "old" keybord config as no effect and I can't find how change it with hal. I've got /usr/local/etc/hal/fdi/* but no *keymap* and I don't know how build such a file. This should get you started: gb Change the `gb' in the example to your local keymap name, save the file as /usr/local/etc/hal/fdi/policy/x11-input.fdi and restart hald. This seems to have a way to enable HAL to detect a keyboard and export it to X, but what about mice? My Xorg log tells me that it is ignoring my USB mouse in addition to ignoring my keyboard, so what sort of HAL file do I add to enable it to find my mouse? Where in HAL documentation is this information found? R. Noland seemed to think it was a trivial process to make HAL do keyboards and mice? In fact it is not trivial but a pain in the ass! If you intend to inflict broken software on unsuspecting users you had better think through all of the problems and come up with explicit solutions to all of those problems so that everyone has a chance to make their systems work. There had better not be any more surprises waiting in the X 1.6 wings to surprise and confound everyone again! I'll start with that Thank you Sebastien ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org" ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Western Digital hard disks and ATA timeouts
Søren Schmidt wrote: On 7Nov, 2008, at 20:12 , Peter Wemm wrote: On Thu, Nov 6, 2008 at 11:17 PM, Jeremy Chadwick <[EMAIL PROTECTED]> wrote: [..] As stated, FreeBSD's ATA command timeout is hard-set to 5 seconds, and is not adjustable without editing the ATA code yourself and increasing the value. The FreeNAS folks have made patches available to turn the timeout value into a sysctl. Soren and/or others, please increase this timeout value. Five seconds has now been deemed too aggressive a default. And please consider migrating the timeout value into a sysctl. The 5 second timeout has been a problem for quite a while actually. I've had a number of instances where I've had to increase it to 20 or 30 seconds when recovering from marginal drives. The longest "successful" recovery attempt I've seen was 26 seconds, I believe on a Maxtor drive a few years ago. ("successful" == the drive spent 26 seconds but eventually successfully read the sector). Even the IBM death star drives could take much longer than 5 seconds to do a recovery 5 years ago. 5 seconds has never been a good default. I think the timeout should be increased to at least 30 seconds. My windows box has a timeout that goes for several minutes. If there is concern about FreeBSD appearing to hang, I could imagine that a console warning message could be printed after 5 seconds. But just say "drive has not yet responded". But give it more time. In this day and age we're generally not playing games with udma33 vs 66, notched cables, poor CRC support etc. SATA seems to have eliminated all that. Hmm, it might make sense to increase the timeout on SATA connections to 2 or 3 minutes by default. Actually I do have a patch around that logs the timeout on the console after the normal timeout (5secs), then just goes on to wait for double the timeout and log again etc etc, final timeout was IIRC 60 secs but could be anything. 
I have a disk which I am finally getting rid of that produces READ_DMA and WRITE_DMA errors at a pretty high rate. I did enable the extra ATA error reporting and it doesn't seem to indicate any sort of actual errors, just extra long timeouts. At one time, I did change the system to extend the timeout, but I did not see any real improvement at 30 seconds. I suspect that an even more extended timeout would be necessary to solve the problem. I am removing the disk this week. Does anyone want a disk that produces DMA timeouts at a regular rate? Would it help actually solve this problem? Please let me know if you want such a beast and I will ship it to you. /Joe ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
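The escalating timeout described in the quoted text (warn on the console after the normal 5 seconds, then keep doubling the wait and warning again until a final limit of roughly 60 seconds) can be sketched as follows. This is only an illustration in Python, not the actual ATA driver patch: the function names are hypothetical, and reading "double the timeout" as doubling the total elapsed-time mark (5, 10, 20, 40 seconds) is an assumption.

```python
def warning_marks(initial=5.0, final=60.0):
    """Elapsed-time marks (seconds) at which a console warning is logged.

    Assumption: each mark is double the previous one, and no warning is
    logged at or beyond the final timeout, where the command is failed.
    """
    marks = []
    mark = initial
    while mark < final:
        marks.append(mark)
        mark *= 2
    return marks


def simulate(completion_time, initial=5.0, final=60.0):
    """Return (succeeded, warnings_logged) for a command that needs
    completion_time seconds to finish."""
    warnings = [m for m in warning_marks(initial, final) if m < completion_time]
    return completion_time <= final, warnings


# The 26-second recovery mentioned in the thread would succeed after
# three warnings instead of failing at the 5-second mark.
print(simulate(26.0))
# A drive that never answers within the final limit is still failed.
print(simulate(90.0))
```

With a hard 5-second limit the 26-second recovery would be reported as a failure; under this scheme it only produces warnings, which matches the "drive has not yet responded" behaviour suggested earlier in the thread.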
Re: kern.maxdsiz on amd64 with i386 binaries
Chris Peterson wrote: > Thanks Joe, that did it. > > Out of curiosity, I don't see any of the compat tree in > /boot/defaults/loader.conf, is there any place this is documented > besides kernel sources? If not then I guess I should give something back > to the community and change that :) Not that I found. I typically find stuff like this by running sysctl -a and grepping for familiar patterns. Joe > > Regards, > Chris Peterson > > On Oct 24, 2008, at 9:03 PM, Joe Marcus Clarke wrote: > >> Chris Peterson wrote: >>> Hello, >>> >>> I've got a handful of i386 boxes, and a handful of amd64 boxes running a >>> 32-bit application, the reasons for this exact configuration mystify me >>> as well as the deployment predates my time in the environment. Now that >>> the dataset the application is loading is rapidly approaching 512MB >>> we're starting to tweak kern.maxdsiz and kern.dfldsiz to 1GB. >>> >>> The i386 boxes are doing great, but we hit an issue with the amd64 >>> machines in that 64bit apps seem to work fine, but the 32bit apps >>> running on the amd64 machines fail to be able to use more than the i386 >>> default of 512MB no matter what we set kern.maxdsiz to. I've also tried >>> compiling it into the kernel, which results in the same issue. >>> >>> I tried starting the app with "limits -d 1090519040", and it seems to >>> fail as well. Limits does show the proper value for datasize of >>> 1064960 kB. >>> >>> We're locked into 32-bit binaries for this app at the moment thanks to >>> some uh... interesting libraries it uses, so the usual option of >>> recompile isn't available. I'd like to avoid traveling from San Jose to >>> Seattle, then Virginia, then Munich to reinstall the amd64 machines with >>> i386 machines if at all possible. >>> >>> Uh... help? >> >> Have you tried setting compat.ia32.maxdsiz? I believe this will do what >> you want. 
>> >> Joe >> >> -- >> Joe Marcus Clarke >> FreeBSD GNOME Team::[EMAIL PROTECTED] >> FreeNode / #freebsd-gnome >> http://www.FreeBSD.org/gnome > > ___ > freebsd-stable@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "[EMAIL PROTECTED]" > -- Joe Marcus Clarke FreeBSD GNOME Team :: [EMAIL PROTECTED] FreeNode / #freebsd-gnome http://www.FreeBSD.org/gnome ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: kern.maxdsiz on amd64 with i386 binaries
Chris Peterson wrote: > Hello, > > I've got a handful of i386 boxes, and a handful of amd64 boxes running a > 32-bit application, the reasons for this exact configuration mystify me > as well as the deployment predates my time in the environment. Now that > the dataset the application is loading is rapidly approaching 512MB > we're starting to tweak kern.maxdsiz and kern.dfldsiz to 1GB. > > The i386 boxes are doing great, but we hit an issue with the amd64 > machines in that 64bit apps seem to work fine, but the 32bit apps > running on the amd64 machines fail to be able to use more than the i386 > default of 512MB no matter what we set kern.maxdsiz to. I've also tried > compiling it into the kernel, which results in the same issue. > > I tried starting the app with "limits -d 1090519040", and it seems to > fail as well. Limits does show the proper value for datasize of 1064960 kB. > > We're locked into 32-bit binaries for this app at the moment thanks to > some uh... interesting libraries it uses, so the usual option of > recompile isn't available. I'd like to avoid traveling from San Jose to > Seattle, then Virginia, then Munich to reinstall the amd64 machines with > i386 machines if at all possible. > > Uh... help? Have you tried setting compat.ia32.maxdsiz? I believe this will do what you want. Joe -- Joe Marcus Clarke FreeBSD GNOME Team :: [EMAIL PROTECTED] FreeNode / #freebsd-gnome http://www.FreeBSD.org/gnome ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
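As a consistency check on the numbers quoted in this message, the value passed to "limits -d" and the datasize that limits reports can be compared directly. The sketch below is plain arithmetic in Python; it assumes the bare number was taken as bytes and that "kB" in the output means 1024-byte units, which is inferred from the two figures in the message rather than from the limits(1) manual. The function name is hypothetical.

```python
KIB = 1024
MIB = 1024 * 1024


def describe_limit(limit_bytes):
    """Return the limit as (kB in 1024-byte units, MiB)."""
    return limit_bytes // KIB, limit_bytes / MIB


requested = 1090519040  # value given to "limits -d" in the message
kb, mib = describe_limit(requested)
print(f"{requested} bytes = {kb} kB = {mib:.0f} MiB")

# Headroom over the 512 MB i386 default mentioned in the message.
i386_default = 512 * MIB
print(f"headroom over the 512 MB default: {(requested - i386_default) // MIB} MiB")
```

Under those assumptions the two figures agree, so the limit itself was set as intended; what capped the 32-bit process was a separate ceiling, which is what the compat.ia32.maxdsiz suggestion addresses.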
Re: 6.4 RC1 locks up solid on first reboot
Jeremy Chadwick wrote:
> On Thu, Oct 23, 2008 at 06:27:45AM +0200, Milan Obuch wrote:
>> I did not investigate on this issue too much, but there is a workaround - copy older /boot/loader over newer one. In my case, I am rebuilding whole

I have experienced loader troubles in the past when using customized compiler options in /etc/make.conf. Rebuilding without compiler options fixed the issue. Joe Koberg joe at osoft dot us ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: fxp performance with POLLING
Pete French wrote:
>> However, ethernet at 100Mbit is 4B5B coded at a 125mhz rate. So the raw
> Errr, 4B5B *is* 10 bits per byte surely?
> ...
> Gig ether is mainly 8B10, as is Firewire, SATA, FibreChannel and a
> Mind you, it assumes that you know the real bit rate, which in the case of 100baseT is, as you say, actually 125mbits/sec.

You are right. It definitely is 10 bits per byte clocked at a higher rate. I guess the "100mbit/s" rate is so strongly associated with the technology that I glossed right over that. Joe ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: fxp performance with POLLING
Pete French wrote:
>> 1 megabit = 10^6 = 1,000,000 bits which is equal to 125,000 bytes.
> you are assuming eight bits per byte - but this is a serial line so you should use ten bits per byte instead.
> -pete.

That was a rule of thumb in the heyday of async serial lines, which used a start and stop bit per byte. However, ethernet at 100Mbit is 4B5B coded at a 125 MHz rate. So the raw synchronous data rate really is 12.5 Mbyte/s. Minus the sync preamble of 8 bytes per packet and the mandatory inter-frame gap of 12 bytes, that's a physical layer rate of (12.5M * (1500/(1500+20))) or 12.34 Mbyte/s. Even in the later days of modems this rule applied less and less, because the modulation schemes became synchronous. Joe Koberg joe_at_osoft_dot_us ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
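The arithmetic in the message above can be reproduced with a few lines of Python. This is a sketch using the message's own figures (125 MHz line rate, 4B5B coding, 8-byte preamble, 12-byte inter-frame gap, 1500 bytes per packet); the function name is hypothetical. The second calculation, which also counts the 14-byte Ethernet header and 4-byte frame check sequence, is an addition not made in the message and is shown only for comparison.

```python
def payload_rate(data_rate_bytes, payload, overhead):
    """Bytes per second of payload when every packet carries `overhead`
    extra bytes of line time in addition to `payload` bytes."""
    return data_rate_bytes * payload / (payload + overhead)


LINE_RATE_BITS = 125_000_000              # 125 MHz signalling rate
DATA_RATE_BITS = LINE_RATE_BITS * 4 // 5  # 4B5B: 4 data bits per 5 line bits
DATA_RATE_BYTES = DATA_RATE_BITS // 8     # 12.5 Mbyte/s

PREAMBLE = 8          # bytes, as stated in the message
INTERFRAME_GAP = 12   # bytes, as stated in the message

rate = payload_rate(DATA_RATE_BYTES, 1500, PREAMBLE + INTERFRAME_GAP)
print(f"message's figure: {rate / 1e6:.2f} Mbyte/s")

# Assumption beyond the message: also count the Ethernet header and FCS.
HEADER_AND_FCS = 14 + 4
rate_full = payload_rate(DATA_RATE_BYTES, 1500,
                         PREAMBLE + INTERFRAME_GAP + HEADER_AND_FCS)
print(f"with header and FCS: {rate_full / 1e6:.2f} Mbyte/s")
```

The first figure matches the 12.34 Mbyte/s quoted in the message; the second is slightly lower because it treats the 1500 bytes as payload only.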
Re: FreeBSD 7.0 + Xen 3.1 + HVM: Success!
T-LOCKED] psm0: [ITHREAD] psm0: model IntelliMouse Explorer, device ID 4 ppc0: parallel port not found. sc0: at flags 0x100 on isa0 sc0: VGA <16 virtual consoles, flags=0x300> sio0: configured irq 4 not in bitmap of probed irqs 0 sio0: port may not be enabled sio0: configured irq 4 not in bitmap of probed irqs 0 sio0: port may not be enabled sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0 sio0: type 8250 or not responding sio0: [FILTER] sio1: configured irq 3 not in bitmap of probed irqs 0 sio1: port may not be enabled vga0: at port 0x3c0-0x3df iomem 0xa-0xb on isa0 Timecounter "TSC" frequency 2793128576 Hz quality 800 Timecounters tick every 1.000 msec hptrr: no controller detected. ad0: 102400MB at ata0-master WDMA2 acd0: CDROM at ata1-master PIO3 GEOM_LABEL: Label for provider acd0 is iso9660/FreeBSD_Install. Trying to mount root from ufs:/dev/ad0s1a FreeBSD 7.0 pciconf -vl: [EMAIL PROTECTED]:0:0:0: class=0x06 card=0x chip=0x12378086 rev=0x02 hdr=0x00 vendor = 'Intel Corporation' device = '82440/1FX 440FX (Natoma) System Controller' class = bridge subclass = HOST-PCI [EMAIL PROTECTED]:0:1:0: class=0x060100 card=0x chip=0x70008086 rev=0x00 hdr=0x00 vendor = 'Intel Corporation' device = '82371SB PIIX3 PCI-to-ISA Bridge (Triton II)' class = bridge subclass = PCI-ISA [EMAIL PROTECTED]:0:1:1: class=0x010180 card=0x00015853 chip=0x70108086 rev=0x00 hdr=0x00 vendor = 'Intel Corporation' device = '82371SB PIIX3 IDE Interface (Triton II)' class = mass storage subclass = ATA [EMAIL PROTECTED]:0:2:0: class=0x03 card=0x00015853 chip=0x00b81013 rev=0x00 hdr=0x00 vendor = 'Cirrus Logic' device = 'CL-GD5446 64-bit VisualMedia Accelerator' class = display subclass = VGA [EMAIL PROTECTED]:0:3:0: class=0xff8000 card=0x00015853 chip=0x00015853 rev=0x01 hdr=0x00 [EMAIL PROTECTED]:0:4:0: class=0x02 card=0x00015853 chip=0x813910ec rev=0x20 hdr=0x00 vendor = 'Realtek Semiconductor' device = 'RT8139 (A/B/C/810x/813x/C+) Fast Ethernet Adapter' class = network subclass = ethernet -- 
Freddie Cash [EMAIL PROTECTED] ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]" -- Joe Auty NetMusician: web publishing software for musicians http://www.netmusician.org [EMAIL PROTECTED] ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: DVD-RW doesn't write
Jerahmy Pocott wrote:
> On loading atapicam module it says:
> acd0: FAILURE - INQUIRY ILLEGAL REQUEST asc=0x24 ascq=0x00

I have never managed to use burncd with any drive. In order to use atapicam, you must enable the pass? devices. My devfs.conf contains:

# Commonly used by many ports
link acd0 cdrom
link acd0 dvd

# Allow a user in the wheel group to query the smb0 device
#perm smb0 0660
perm xpt0 0660
perm pass0 0660
perm pass1 0660

The xpt0 is left over from other experiments. The pass? is required to allow general access to use of growisofs. /Joe ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Closing the Jo Rhett argument
Jo Rhett has clearly stated (in offline reply) that they do not participate in the -BETA and -RC cycles leading up to -RELEASE, so they therefore do not have any issues with -RELEASE and EoL to raise. Actually, they still have the same complaints to raise about EoL, but since they refuse to participate in the -RELEASE process, they do not have valid points to raise.

I ask that everyone please stop communicating with the persona known on this list as "Jo Rhett" unless and until they participate in the -BETA, -RC, and -RELEASE process.

You cannot raise any sort of valid complaint about -RELEASE, EoL or bugs if you do not participate in finding bugs during the -BETA and -RC stages prior to the -RELEASE. If you instead choose to try to run -RELEASE and find bugs then, then complain about the bugs you found and continue to complain that these bugs were not found by someone else and fixed ahead of time, you have no issues and do not deserve an answer, no matter how much you try to frame it as a "policy" question.

/Joe
6.3-RELEASE versus 5.2-RELEASE
I think I have finally decoded Jo Rhett's issue. It is very hard to decipher because the poster refuses to exactly identify their problem.

The entire problem comes down to the definition of -RELEASE. Jo apparently feels that they can ONLY run -RELEASE branded code at their workplace. That means that they cannot run any form of -STABLE. Therefore, they can only ever run 6.3-RELEASE and then only if no bugs were fixed after the official branding of 6.3-RELEASE.

I cannot speak at all about the branding of 6.3-RELEASE. I run 7.0-STABLE here.

What Jo seems to think is that a certain sequence of events occurred during the 6.3-RELEASE branding. 6.3-RELEASE was marked in the tree. Sometime after this marking event occurred, bugs were identified and subsequently fixed in the -STABLE branch. These bugs have been identified by Jo as SHOWSTOPPER bugs which will prevent him from ever using 6.3-RELEASE, since by their definition, they can only ever use the exact thing identified by the cvs tag of 6.3-RELEASE.

Therefore, by Jo's definition, they can never run 6.3-anything at their shop and are forced to wait for 6.4-RELEASE, whenever that happens. Therefore, they must take on the onerous duty of examining all security fixes targeted for 6.3 and redo them for 6.2. Basically, they do not wish to do this and protest the EoL status given to 6.2 because they are physically prevented from using 6.3.

They refuse to even try to identify whether or not 6.3-RELEASE actually has any bugs that affect them, they just assume that the presence of bugs fixed AFTER the tagging of 6.3-RELEASE in cvs certifies their inability to use the actual 6.3-RELEASE code, since they can apparently only run binary releases direct from FreeBSD and cannot "roll their own" for some unknown reason. They are also, apparently, prohibited from testing any code locally due to some unknowable reason.

Can anyone verify that some number of bugs related to either a) gmirror, b) bge and/or c) twe were fixed after the release of 6.3? That is as far as I can tell the reason that Jo objects to EoL of 6.2, the fact that 6.3 is unusable due to these late-fixed bugs.

/Joe
Re: cvsup.uk.FreeBSD.org
--- Original message ---
From: Ollivier Robert <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Sent: 8.5.'08, 8:35

> Stefan Lambrev disait :
> > cvsup.uk.FreeBSD.org is outdated.
> > I know this is not the proper list, but which one is?
>
> freebsd-hubs is, redirected.
>
> I've noticed that recently but I should have send a mail about it, sorry.
> --
> Ollivier ROBERT -=- FreeBSD: The Power to Serve! -=- [EMAIL PROTECTED]
> Darwin sidhe.keltia.net Version 9.2.0: Tue Feb 5 16:13:22 PST 2008; i386

Hey guys.

I have reclassified this faulty mirror as cvsup1 and made cvsup a cname to cvsup3, which is the most recent addition and best hardware available. In the future we will always point to the most available machine in this way.

Joe
Re: HP ProLiant DL360 G5 success stories?
Johan Ström wrote:
> But.. http://bizsupport.austin.hp.com/bc/docs/support/SupportManual/c00553302/c00553302.pdf seems to tell me that in basic mode I can only access BIOS (pre-OS) using the Remote Console feature, and that after POST I have to have the advanced licensed option?

I don't do the purchasing and we get all Advanced iLO, so I will take your word for it. The older generations supported text console (I have a 360G2 that does so).

We use the HP Management agents under Windows for all SNMP reporting so I can't comment on the reporting method under other OS's.

Joe
Re: HP ProLiant DL360 G5 success stories?
The iLO is a completely separate management processor with its own network port. It runs its own OS and has its own IP address. It runs an SSL webserver for access. The iLO is accessible over the network any time the machine is plugged into power. I am not sure about IPMI access to it.

The "normal" iLO option will give you exact textual console screen output and keyboard control from the moment of power-on. It will also let you toggle power and hit the reset button. I believe it uses a java applet in the browser.

The "advanced" iLO option, which is license-key-unlocked, also provides graphical remote console, and virtual media. You can upload a CD or floppy image and then boot the server from it. I suspect the compatibility issue appears here - the virtual media probably emulates USB mass storage, and the OS must be able to boot from it.

It has full reporting of hardware state and management log details, and the "home page" is a big summary with any faults outlined in red.

In this data center we probably have 1500 HP machines with iLO. I find it an effective and reliable remote access method. We definitely prefer using it to our Avocent IP KVMs.

Joe Koberg
joe at osoft dot us

Johan Ström wrote:

First of all, nice with all these positive answers! Thank you all (without responding to each and every post:))!

On Mar 12, 2008, at 12:35 PM, Pete French wrote:

What I'm looking at is a DL360 G5, probably with one E5335 (quad 2.0) and 4G of RAM and 4x 146Gb SAS disks on the Smart Array P400i card. ... So.. Does anyone have any experience with this combo (DL360 G5 / P400i)?

We have around 20 machines like that and they work beautifully. We run 7.0/amd64 on the machines now, but we have run 6.2/i386 in the past and that worked fine - though you will only be able to use the first 3.5 gig of RAM.

I don't have any plans on running i386, running amd64 on the supermicro box now without any problems (that I can relate to that at least). How long have you run 7.0 (before release)? From all the other responses it seems lots of ppl use 7.0 on these without any problems at all.

Furthermore, anyone run 7.0 on this? Or should I still stick with

We run 7.0 on these machines and it works fine - I always prefer 7.0 to 6.3 on SMP machines as it performs better. Also 7.0 works well with the iLO on these machines - I seem to recall when I installed 6.X that it didn't work too well and I had to use boot floppy images. I'd say go for 7.0 and amd64 if you can.

This is where I'm a bit curious. What OS interaction does iLO do? That needs to be "compatible" i mean. On my current box I got a IPMI card that gives me (when its working..) SOL capabilities.. To what degree can I remote control with iLO? If I've understood correct, I get the exact console as on screen with kb access, over web/ssh/telnet. Is this working good? This is one of my important points for changing since its so crappy on my current box, and when the box is a couple of miles away its quite nice to have it working flawlessly.. iLO over internet? Possible, impossible? Encryption? (yes i know, not exactly freebsd related questions but.. )

Another thing, how is it with physical monitoring? Temperatures/fanspeeds/voltage?

Thank you (all)! :)
--
Johan
Re: Analysis of disk file block with ZFS checksum error
Eric Anderson wrote:
> I'm starting to think there is a timing issue or some such problem with
> ZFS, since I can use the same drives in a gmirror with UFS, and never
> have any data problems (md5 checksums confirm it over-and-over). I
> highly doubt that everyone is seeing similar issues and it just is
> because ZFS is so intense. I've had plenty of systems under severe disk
> load that have never exhibited corrupt files because of something like
> this.

I also wondered this - i.e. if ZFS was triggering a certain timing behavior that revealed the problem. Still, if this is the case, it seems to me that the problem lies in the ATA subsystem, since it should prevent higher-level things like ZFS from being able to create bad timings (or am I not thinking of this correctly?). Also, I think there were some reports of problems with DMA/ATA when *not* using ZFS.

> I wish we could get our hands on this issue.. Seems like some common
> threads are ATA/SATA disks. Is your setup running 32bit or 64bit
> FreeBSD? (if you already mentioned it, I'm sorry, I missed it)

This was on 32bit FreeBSD with PATA. I am the one who had no SMART issues and no DMA errors reported under Linux. Changing the cable may have "fixed" it, since I did not see errors in some further testing, but even if so, my theory is that there is some edge case (timing?) that the FreeBSD ATA drivers were sensitive to, and perhaps my change of cables pushed the problem to the other side of the threshold. Since I never saw errors under Linux (and I've been using that cable for a couple of years), I do not necessarily think the cable was actually "defective".

-Joe
Re: 7.0-STABLE amd64 kernel trap during boot-time device probe
Jeff Blank wrote:

Hello, I posted this around 3 months ago and never received a response. the problem still occurs with 7.0-STABLE (csup on 20080301). I possibly incorrectly referred to it as a panic last time, when the problem was really a trap.

I also receive "Fatal trap 12: page fault while in kernel mode" while trying to boot a HP Proliant DL580G3 from the 7.0-RELEASE amd64 disc1 CD.

I can successfully boot with the verbose boot option from the boot CD, and I installed the system and got it all setup for ZFS root. As long as I booted verbose it worked. But now I have recompiled the kernel to include SCHED_ULE and a few options and I cannot avoid the "Fatal trap 12".

It is annoying to troubleshoot on this machine because the BIOS takes 5 minutes to finally get around to booting the OS after a reboot. But it has an iLO management controller that I might be able to arrange access to for anyone who has the skill to find/fix the issue.

Joe Koberg
joe at osoft dot us

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x258
fault code = supervisor read data, page not present
instruction pointer = 0x8:0x8047aa7e
stack pointer = 0x10:0xa0677b40
frame pointer = 0x10:0xa0677b60
code segment = base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 23 (irq21: ohci0+)
[thread pid 23 tid 100029 ]
Stopped at 0x8047aa7e = _mtx_lock_sleep+0x4e: movl 0x258(%rcx),%esi
db>
=== end panic ===

=== no panic ===
[...]
ums0: on uhub0
ums0: 5 buttons and Z dir.
ukbd0: on uhub0
kbd2 at ukbd0
Timecounters tick every 1.000 msec
firewire0: 1 nodes, maxhop <= 0, cable IRM = 0 (me)
firewire0: bus manager 0 (me)
acd0: DMA limited to UDMA33, device found non-ATA66 cable
acd0: DVDR at ata0-master UDMA33
ad4: 238475MB at ata2-master SATA300
ad8: 157066MB at ata4-master SATA300
ad10: 157066MB at ata5-master SATA300
ar0: 314133MB status: READY
ar0: disk0 READY using ad8 at ata4-master
ar0: disk1 READY using ad10 at ata5-master
SMP: AP CPU #1 Launched!
Trying to mount root from ufs:/dev/ad4s1a
[continue successful boot]
=== end no panic ===

- End forwarded message -
Re: is there any raid5 in software in FreeBSD ?
ZFS has RAIDZ - very similar to RAID5 (with added features), if you don't mind ZFS's current experimental state.

-Joe

Nenhum_de_Nos wrote:
> i've seen RAID 0 through 3 (skip 2 ;) )
>
> thanks,
>
> matheus
Multiple key presses are hindered when repeat turned off
I have verified this on two machines, but it would be helpful if others out there can reproduce it too. Also, I do not know if it is Xorg or the FreeBSD keyboard drivers, since I see no way to reproduce on the console (i.e. turn off repeat).

In an xterm, type: "xset r off". Then try some multiple-key combinations (i.e. keep holding first key(s) when you type the next one):

po (o does not appear)
lk (k does not appear)
grep (e does not appear)

When you release the keys, the press events will show up. Keyboards in general have limited multiple-key (rollover) capabilities, but using "xset r off" reduces these to the point that you will often mistype things, and it seems unique to FreeBSD. I am using 7.0-RC2 at the moment.

Thanks, Joe
Revisiting jerky/freezing mouse issue in 7.0
I spent some time looking again at a trace I posted last month showing mouse "jerkiness/freezing" under load (note that I see it all of the time under light load too, but it is harder to reproduce on demand). Here's the trace:

http://www.skyrush.com/downloads/ktr_ule_4.out

The large stretches of yellow in the Xorg process are what trouble me. Clearly, Xorg is yielding processor time mostly to, in this case, xtrs, which is getting a whole lot of time. If you look at the fairly regular mouse events, you'll notice that moused runs for a short time on each mouse event from psm0 and then sleeps. This makes sense, and it appears moused is acting correctly. But many of these mouse events are seemingly ignored by Xorg, which spends most of its time yielding (yellow) and not getting "woken up" by the events to simply process them.

I've noticed, also, that Xorg can "get behind" easily and spend its time catching up on event processing for a while after I stop using the mouse. It just doesn't seem to be getting an appropriate amount of CPU time, or at least it yields too long between runs, to make interactivity smooth. These yields, I believe, are the freezes I see.

Here's a question: does Xorg "respond" to mouse events, or does it just wake up every now and then and check? Note that even when Xorg runs, it only runs for a very short time. If the ULE scheduler is being fair, I would think this might give Xorg *more* of a share of the CPU to use to service these events, since it is running a lot less than xtrs.

One interesting point is at timestamp 1478223777518. It looks like Xorg *starts* to yield when moused runs. Here's the line:

1478223777518 sched_add: 0xa7be1660(Xorg) prio 160 by 0xa5eb7aa0(moused)

Does this mean that moused *caused* Xorg to yield (or am I reading this incorrectly)? The yield then lasts through a series of mouse moves. A quick look through the graph shows that this happens quite a bit, which seems like the reverse of what we'd like.

This issue (especially since it does not even require continuous heavy CPU use to see) is a constant distraction while using the system, and again I want to volunteer my time to help track it down. I am not sure how to further delve into it, so if there is some additional data I can gather, please let me know, and I'll gladly do it.

Thanks, Joe
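One way to check how often this pattern occurs without reading the graph by eye is to tally the sched_add records from the text dump. The following is only a sketch in Python: the line format is assumed from the single record quoted in this message, and `parse_sched_add` and `wakeups_by` are hypothetical helper names, not part of any FreeBSD tool. As far as I understand it, a sched_add record is logged when a thread is placed on a run queue, with the "by" field naming the thread that did so; under that reading, the quoted line would show moused making Xorg runnable rather than causing it to yield.

```python
import re
from collections import Counter

# Assumed format, taken from the one line quoted in the message:
#   1478223777518 sched_add: 0xa7be1660(Xorg) prio 160 by 0xa5eb7aa0(moused)
SCHED_ADD = re.compile(
    r"^(?P<ts>\d+)\s+sched_add:\s+0x[0-9a-f]+\((?P<thread>[^)]+)\)\s+"
    r"prio\s+(?P<prio>\d+)\s+by\s+0x[0-9a-f]+\((?P<by>[^)]+)\)"
)


def parse_sched_add(line):
    """Return (timestamp, thread, prio, by) for a sched_add line, else None."""
    m = SCHED_ADD.match(line.strip())
    if m is None:
        return None
    return int(m["ts"]), m["thread"], int(m["prio"]), m["by"]


def wakeups_by(lines, thread="Xorg"):
    """Count, per acting thread, how often `thread` was added to a run queue."""
    counts = Counter()
    for line in lines:
        rec = parse_sched_add(line)
        if rec is not None and rec[1] == thread:
            counts[rec[3]] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "1478223777518 sched_add: 0xa7be1660(Xorg) prio 160 by 0xa5eb7aa0(moused)",
    ]
    print(wakeups_by(sample))
```

Feeding it the lines of the trace file would show whether moused is the usual thread behind Xorg's sched_add records.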
Re: mount of ext2fs volume stuck in "D+" state (disk uninterruptible wait)
New information: it looks as though this ext2fs was already mounted when the mount was attempted. I have reproduced the issue by simply trying to mount the ext2fs volume more than once. Given this, I'd expect the mount to return an already mounted error rather than hanging, so this is perhaps a straightforward bug.

-Joe
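Until the second mount returns an error instead of hanging, a script that mounts this volume could guard against the double mount itself. A minimal sketch in Python, assuming the volume has an fstab entry (as with the "mount /mnt/linux-home" invocation in the original report); `mount_once` is a hypothetical helper name. Note that `os.path.ismount` only tells you that something is mounted at the path, not which device it is.

```python
import os
import subprocess


def mount_once(mount_point, dry_run=False):
    """Mount the fstab entry for mount_point unless it is already a mount point.

    Returns True if a mount was attempted (or would have been, with
    dry_run=True) and False if the call was skipped.
    """
    if os.path.ismount(mount_point):
        # Something is already mounted here; do not issue a second mount.
        return False
    if not dry_run:
        # Relies on an fstab entry for mount_point (an assumption).
        subprocess.run(["mount", mount_point], check=True)
    return True


if __name__ == "__main__":
    # dry_run avoids touching the system; "/" is always a mount point.
    print(mount_once("/", dry_run=True))
```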
Re: mount of ext2fs volume stuck in "D+" state (disk uninterruptible wait)
Kris Kennaway wrote:
> Joe Peterson wrote:
>> I just tried (under FreeBSD 7.0-RC1) to mount an ext2fs volume - I've
>> mounted it before with no trouble on this same FreeBSD version. This
>> time, mount appeared to hang. I noticed that I can see the contents of
>> the volume under the mount point, so the mount seemed to "work", but the
>> process is stuck. "ps" shows:
>>
>> root 1307 0.0 0.0 3156 792 p6 D+ 5:21PM 0:00.00 mount
>> /mnt/linux-home
>>
>> The "ps" man page says that "D" means: "Marks a process in disk (or
>> other short term, uninterruptible) wait."
>>
>> Is there any way I can investigate what is going on? I cannot umount
>> (device busy) or break out of the mount command...
>
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html

But unfortunately I do not have KDB and DDB compiled into the kernel. And, obviously, if I reboot, I will lose this opportunity. I suspect this to be an intermittent thing. Is there anything I can extract while the system is running that would be useful?

Thanks, Joe
mount of ext2fs volume stuck in "D+" state (disk uninterruptible wait)
I just tried (under FreeBSD 7.0-RC1) to mount an ext2fs volume - I've mounted it before with no trouble on this same FreeBSD version. This time, mount appeared to hang. I noticed that I can see the contents of the volume under the mount point, so the mount seemed to "work", but the process is stuck. "ps" shows:

root 1307 0.0 0.0 3156 792 p6 D+ 5:21PM 0:00.00 mount /mnt/linux-home

The "ps" man page says that "D" means: "Marks a process in disk (or other short term, uninterruptible) wait."

Is there any way I can investigate what is going on? I cannot umount (device busy) or break out of the mount command...

Thanks, Joe
Re: Analysis of disk file block with ZFS checksum error
Gavin Atkinson wrote:
> Are the datestamps (Thu Jan 24 23:20:58 2008) found within the corrupt
> block before or after the datestamp of the file it was found within?
> i.e. was the corrupt block on the disk before or after the mp3 was
> written there?

Hi Gavin, those dates are later than the original copy (I do not have the file timestamps to prove this, but according to my email record, I am pretty sure of this). So the corrupt block is later than the original write. If this is the case, I assume that the block got written, by mistake, into the middle of the mp3 file. Someone else suggested that it could be caused by a bad transfer block number or bad drive command (corrupted on the way to the drive, since these are not checksummed in the hardware). If the block went to the wrong place, AND if it was a HW glitch, I suppose the best ZFS could then do is retry the write (if its failure was even detected - still not sure if ZFS does a re-check of the disk data checksum after the disk write), not knowing until the later scrub that the block had corrupted a file.

I think that anything is possible, but I know I was getting periodic DMA timeouts, etc. around that time. I hesitate, although it is tempting, to use this evidence to focus blame purely on bad HW, given that others seem to be seeing DMA problems too, and there is reasonable doubt whether their problems are HW related or not. In my case, I have been free of DMA errors (cross your fingers) after re-installing FreeBSD completely (giving it a larger boot partition and redoing the ZFS slice too), and before this, I changed the IDE cable just to eliminate one more variable. Therefore, there are too many variables to reach a firm conclusion, since even if the cable was "bad", I never saw one DMA error or other indication of anything wrong with HW from the Linux side (and I've been using that HW with both Linux and FreeBSD 6.2 for months now - no apparent flakiness of any kind on either system). So either it *was* bad and FreeBSD 7.0 was being more "honest", FreeBSD's drivers and/or ZFS was stressing the HW and revealing weaknesses in the cable, or it was a SW issue that got cleared somehow when I re-installed.

Is it possible that the problem lies in the ATA drivers in FreeBSD or even in ZFS and just looks like HW issues? I do not have enough info/expertise to know. If not, then it may very well be true that HW problems are pretty widespread (and that disk HW cannot, in fact, be trusted), and there really *is* a strong need for ZFS *now* to protect our data. If there is a possibility that SW could be involved, any hints on how to further debug this would be of great help to those still experiencing recent DMA errors. I just want to be more sure one way or the other, but I know this issue is not an easy one (however, it's the kind of problem that should receive the highest priority, IMHO).

-Joe
Re: Analysis of disk file block with ZFS checksum error
Julian Elischer wrote:
> it could be an old file..
> what kind of disks?

It's a Seagate ST3500630A parallel ATA drive.

> I had a scenario where 3ware controllers were just failing to write to
> a drive in the array, so old data showed through.

I have an Intel ICH4 controller - nothing unusual.

> the filesystem and the partitions and the raids all were on different
> alignments so the only part of the system that had a boundary that
> aligned with the bad data was the physical stripes laid down by the
> controller. It was 64k stripes and 64k data missing, exactly on
> stripe boundaries. Due to the fact that FreeBSD had partitioned the
> drive starting at 63 blocks in, nothing else aligned with the problem.

Hmm, well this is a straight-forward disk situation - never used RAID on this drive. Given what is happening, I wonder about the chances of it being a HW, OS, or filesystem issue.

-Joe
Re: Analysis of disk file block with ZFS checksum error
Chris Dillon wrote:
> That is a chunk of a Mozilla Mork-format database. Perhaps the
> Firefox URL history or address book from Thunderbird.

Interesting (thanks to all who recognized Mork). I do use Firefox and Thunderbird, so it's feasible, but how the heck would a piece of one of those files find its way into 1/2 of a ZFS block in one of my mp3 files? I wonder if it could have been done on write when the file was copied to the ZFS pool (maybe some write-caching issue?), but I thought ZFS would have verified the block after write. It seems unlikely that it would get changed later - I never rewrote that file after the original copy...

-Joe
Re: Analysis of disk file block with ZFS checksum error
Mark Day wrote:
> Based on the subset of data you posted, the bad data looks like ASCII
> text.
> The bad data from offset a0000 to a000f is:
>
> ${138AFE{@
> @$$}1
>
> The bad data from offset af6c1 to af6c8 is:
>
> 392A9}@
>
> I don't recognize the content beyond that, but I'd guess that somehow
> the
> contents of some other file managed to overwrite that portion of the bad
> file. As for how that happened, I don't know. But if someone
> recognizes
> where the bad content came from, that might be a clue.

Gary/Mark,

Good eye! Yes, it indeed does appear to be ASCII. I *thought* something in the repetition when I originally did an od -a looked interesting. I dumped the whole bad section as a string, and here's (partly) what I get:

${138AFE{@
@$$}138AFE}@
@$${138AFF{@
[A3:^80(^91^2146F)]
@$$}138AFF}@
@$${138B00{@
@$$}138B00}@
@$${138B01{@
[181:^80(^91^2146F)]
@$$}138B01}@
@$${138B02{@
@$$}138B02}@
@$${138B03{@
[2C:^80(^91^2146F)]
@$$}138B03}@
@$${138B04{@
@$$}138B04}@
. . .
@$${138B8B{@
<(21470=Thu Jan 24 23:20:58 2008)>
[117:^80(^91^21470)]
@$$}138B8B}@
. . .
@$${138C18{@
<(21472=1201242069)>[-2:^80(^82^85)(^83^1B5)(^84=b)(^85=1)(^86=0)(^87=0)
(^88=0)(^89^2146C)(^8A=)(^8B=40)(^8C=2e)(^8D^84)(^8E=0)(^90^21472)
(^91^21460)]
@$$}138C18}@
@$${138C19{@
<(21473=a72f78)>[2:^80(^89^21473)]
@$$}138C19}@
@$${138C1A{@
@$$}138C1A}@
. . .

and more of the same. Note the date string. There are several like that. Anyone recognize this text format?

-Joe
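For anyone wanting to repeat the "dump the bad section as a string" step on their own damaged file, something along these lines should do. This is only a sketch, not the commands actually used in this thread: `read_region`, `printable_runs` and `is_seven_bit` are hypothetical helper names, and the offset (0xa0000) and length (65536) in the usage note are the values reported for this particular file.

```python
import re


def read_region(path, offset, length):
    """Read `length` bytes starting at byte `offset` of the file at `path`."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def printable_runs(data, min_len=4):
    """Return runs of printable ASCII at least min_len bytes long,
    in the spirit of strings(1)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def is_seven_bit(data):
    """True if no byte has its high bit set, i.e. the data could be ASCII."""
    return all(b < 0x80 for b in data)


if __name__ == "__main__":
    sample = b"\x00\x01@$${138AFF{@\xff\x02<(21470=Thu Jan 24 23:20:58 2008)>"
    for run in printable_runs(sample):
        print(run)
```

With the damaged copy at hand, `printable_runs(read_region("07-Snowfall.mp3", 0xa0000, 65536))` would list the text fragments, and `is_seven_bit` on the same region gives a quick test of whether the region is text-like at all.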
Analysis of disk file block with ZFS checksum error
In my experimentation with the ZFS filesystem, I encountered one case of a file block with a checksum mismatch. Doing a "zpool scrub" revealed it, and trying to read the file yielded an error - only the part of the file before the bad block was read (ZFS aborts reading at this point, which makes sense), resulting in a short file. The reason the CKSUM error is not fixable is that my ZFS pool contains only one device (no mirror or RAIDZ), but I do have the original/good version of the file affected.

Here's the output of zpool status (new scrub in process):

  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress, 64.36% done, 0h18m to go
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     2
          hda6      ONLINE       0     0     2

errors: Permanent errors have been detected in the following files:

        /mnt/tank/fbsd/home/joe/music/jukebox/christmas/Esquivel/Merry_XMas_from_the_SpaceAge_Bachelor_Pad/07-Snowfall.mp3

I was curious about what actually happened: was this a ZFS bug, trouble with its metadata, or truly a bad block? In order to determine this, I modified ZFS's source code temporarily to ignore the checksum mismatch and let the file read fully. What I then got was the full-length file and no errors, showing that there were no disk read errors associated with the read (I already had assumed this from the fact that zpool status showed only a non-zero CKSUM count). However, I may have seen other error counts previously (ZFS resets them to zero on, e.g., reboot). I received no errors when originally copying this file *to* the ZFS pool - only on subsequent reads/scrubs.

(Note that I have posted before about DMA errors in my log for the disk I am using, but I have had nothing but successful SeaTools tests (surface scans) of the drive. Jeremy Chadwick had similar issues, as did others, so I think it is worth investigating if there is some OS/software cause rather than real HW issues. This is one reason I wanted to investigate my ZFS checksum issue more deeply.)

I also have a good backup of the file in question, so I now have two copies of the file: one good, and one with a bad block. The file is 3575936 bytes long, and recordsize (in ZFS) is 128K, making the file about 27 blocks long. Curiously, the bad section of the file is exactly 65536 bytes long (1/2 a block). The bad block starts at exactly the 5th 128K block (byte 655360 or hex a0000).

I wanted to see the characteristics of the bad data. Was just one bit flipped randomly? No. Is it just one bit or set of bits in the bytes that is affected? It doesn't seem so. Were there any other strange patterns here? Well, yes, and maybe someone out there with more knowledge/experience in disk modes of failure will recognize something (I have included some data below).

For one thing (as I mentioned), only 65536 bytes are bad (and it's exactly this many), with a few "good" bytes thrown in, but not far from the number of matches random chance would produce. Also, all bad bytes have a zero in the high bit - interesting? Also, near the end of the block, the bad bytes all go to zero, strangely coincident with the first "good" zero in that bad block - not sure if that's coincidence or not. Also, I calculated the number of "Bits same" (matching bits) in the good vs. bad bytes, and it appears fairly random, so it appears that the bad bytes are very random in nature and not correlated much at all with the good bytes.

So except for the fact that the 2nd half (65536 bytes) of the ZFS block is good, the bad block seems to consist of random data, except for the string of zero bytes near the end and the zero high-bit. It's not as if one bit on the disk flipped - it affects the whole (1/2) block.
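For readers who want to run the same kind of comparison on their own good/bad file pair, here is a minimal sketch in Python. It is not the program used for the numbers in this message, just an illustration of the "Match" and "Bits same" calculation described above; `compare_bytes`, `bad_extent` and `format_row` are hypothetical names, and the sample bytes in the demo are made up apart from the 128K recordsize and the 5th-block offset taken from the text.

```python
RECORDSIZE = 128 * 1024  # ZFS recordsize reported in the message


def compare_bytes(good, bad, base_offset=0):
    """Yield (offset, good_byte, bad_byte, match, bits_same) per byte pair."""
    for i, (g, b) in enumerate(zip(good, bad)):
        # XOR has a 1 bit wherever the two bytes differ, so 8 minus the
        # number of 1 bits is the number of matching bit positions.
        bits_same = 8 - bin(g ^ b).count("1")
        yield base_offset + i, g, b, g == b, bits_same


def bad_extent(good, bad):
    """Return (first, last) index of mismatching bytes, or None if identical."""
    diffs = [i for i, (g, b) in enumerate(zip(good, bad)) if g != b]
    return (diffs[0], diffs[-1]) if diffs else None


def format_row(offset, g, b, match, bits_same):
    """Format one row: file pos, good, bad, match, good (bin), bad (bin), bits same."""
    flag = "Yes" if match else "No"
    return f"{offset:08x} {g:02x} {b:02x} {flag:3} {g:08b} {b:08b} {bits_same}"


if __name__ == "__main__":
    # Made-up sample straddling the start of the 5th 128K block (0xa0000).
    good = bytes([0xD9, 0x05, 0xC1, 0x81])
    bad = bytes([0xD9, 0x05, 0x24, 0x7B])
    for row in compare_bytes(good, bad, base_offset=5 * RECORDSIZE - 2):
        print(format_row(*row))
```

Reading both files fully with `open(path, "rb").read()` and passing the results to `bad_extent` gives the first and last differing offsets, which is how the 65536-byte extent could be confirmed.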
Does this seem like a disk error, controller error/bug, or cable problem (I recently put a new cable on, so I doubt this)? It seems to me something more systemic rather than a random bit error - opinions are more than welcome.

Here is some info from a python program I wrote to look at the data (I've left out spans of essentially uninteresting portions showing similar stuff, but I can get you the whole thing if interested):

File pos  Good  Bad  Match  Good (bin)  Bad (bin)  Bits same
0009fff0  d9    d9   Yes    11011001    11011001   8
0009fff1  05    05   Yes    00000101    00000101   8
0009fff2  c1    c1   Yes    11000001    11000001   8
0009fff3  81    81   Yes    10000001    10000001   8
0009fff4  5f    5f   Yes    01011111    01011111   8
0009fff5  66    66   Yes
Re: Frequent USB mouse disconnections under load with RELENG_7
Wayne Sierke wrote:
> On Fri, 2008-01-25 at 01:59 +1030, Wayne Sierke wrote:
>> I'm getting a lot of USB mouse disconnects on RELENG_7. I wondered
>> whether they might have been due to running with a KTR-enabled kernel
>> but in just the last 7 hours I've been running on stock GENERIC and
>> they're still happening.

Hey Wayne, I'm not sure if you are associating the disconnects with the "jerky mouse" behavior, but as an added datapoint, I have a PS/2 mouse, I see *no* disconnects in the system logs (well, it's PS/2...), and I still get the jerky mouse...

-Joe
Unexpected "resilver" after reboot (after scrub found CKSUM problems)
[...reposting to freebsd-stable - no response on freebsd-fs]

I had a strange thing happen on ZFS the other day, and I cannot find any info about it on the web - thought you might have some ideas. I am using 7.0-RC1 at the moment.

I found a checksum error in ZFS during a scrub. This is strange in itself, since I believe the disk is OK (see below):

  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          ad0s1d    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /home/joe/music/jukebox/christmas/Esquivel/Merry_XMas_from_the_SpaceAge_Bachelor_Pad/07-Snowfall.mp3

This is how it appears after a recent reboot, however. After a scrub, I see a varying number of non-zero counts under CKSUM. Not sure why it is zero after reboot (maybe that's normal).

However, the strange thing is that after my first reboot after the scrub found the issue, zpool status told me that "resilver completed with 0 errors", and there were no known errors. Only trying to read the file and/or rescrubbing returned the status to the error state and made the CKSUM column non-zero. Since I do not have a mirror or raid config, I'm not sure why it would resilver at all, and I did nothing explicit to cause a resilver (as far as I know)... Any ideas?

As an aside, I, along with some others on freebsd-stable@freebsd.org, have been seeing what "look" like disk errors in the system logs. I have a suspicion that there could be some other cause (lots of discussion on that list, if you are interested). Strangely, this disk checks out fine on both short and long tests in Seatools, and smartctl shows it as OK. Also, using Linux to do lots of reads from it does not show any issue or error logs. At this point, I am not sure if the CKSUM issue is a real HW flaw or something else...

Thanks, Joe
Re: ad8: TIMEOUT - WRITE_DMA errors UFS 7.0-RC1
Remco van Bekkum wrote:
> Well it looks like in my case it is hardware related after all. It failed to
> read the boot block several times now. 2nd sort of DOA of this disk...

Have you tried reading the block in another OS or using SeaTools? That would at least verify that it's hardware.

-Joe
Re: ad8: TIMEOUT - WRITE_DMA errors UFS 7.0-RC1
Jeremy Chadwick wrote:
>> If this is widespread, I think the chances are slim that it is a
>> hardware problem in every case.
>
> I'm in definite agreement here. I think it might be worthwhile to note
> what hardware we're all using, in case there's something similar between
> our systems (chipset, disk vendor, etc.).
>
> My system is as follows; timeouts were reported during an rsync of data
> from the ZFS stripe (ad8+ad10) to a UFS2 filesystem on ad6. System
> eventually panic'd after remaining deadlocked (while kernel messages
> about timeouts kept printing on the console for ad6 only) for 10-15
> minutes.
>
> * MB: Supermicro PDSMI+ (Intel ICH7-based)
> * CPU: Intel Core 2 Duo E6600
> * RAM: Corsair CM2X1024-6400 DDR2, 2GB
> * ad4: WD Caviar SE WD2000JD (boot/OS)
> * ad6: Seagate Barracuda 7200.10 ST3500630AS
> * ad8: WD Caviar SE16 WD5000AAKS (ZFS stripe)
> * ad10: WD Caviar SE16 WD5000AAKS (ZFS stripe)
> * All drives are hooked up to the ICH7.
> * SMART stats showed no problems on any of the drives before or after.
> * RELENG_7, i386, ULE scheduler.

Mine is as follows:

* MB: Tyan Trinity S2099
* CPU: Pentium 4, 2.4GHz
* RAM: Crucial DDR, ECC, CL2.5, Unbuffered 2GB (1/2 PC2100, 1/2 PC2700)
* ad0: Seagate ST3500630A 3.AAE (1 UFS2 boot, 1 ZFS pool)
* ad1: Seagate ST3160812A 3.AAH (not used by FreeBSD)
* Intel ICH4 UDMA100 controller
* ATI Radeon RV280 9250
* Intel PRO/1000 NIC
* 7.0-RC1, i386, ULE scheduler

-Joe
Re: ad8: TIMEOUT - WRITE_DMA errors UFS 7.0-RC1
Remco van Bekkum wrote:
> Same here. On an amd64 system with 1x sata disk (Western Digital Caviar
> Green Power) on an amd690G chipset, with UFS and intensive disk activity
> the system hangs and in the end it may panic. I've csupped today and
> rebuild world & generic kernel but still it's very unstable, sometimes it
> even hangs when activating geom volumes at boot time...
> I must add that this is a new system so I'm not 100% sure the hardware is
> sane.
> Using ZFS it also crashed when doing intensive I/O.

This is very interesting. It seems there are several of us who are experiencing something that *looks* like hardware (disk) issues when using 7.0. Could this be related to the mouse freeze issue? Could some process be locking/grabbing the CPU at inopportune times and causing not only the freezing symptoms but also read/write problems?

Can anyone else using 7.0 who hasn't already (especially those using ZFS) check his/her /var/log/messages for disk TIMEOUTs or other disk error messages? If this is widespread, I think the chances are slim that it is a hardware problem in every case.

-Joe
Re: "ad0: TIMEOUT - WRITE_DMA" type errors with 7.0-RC1
Ivan Voras wrote:
> Were both tests done in the same machine (actually, I mean the same PSU)?

Yes - I deliberately changed nothing (not even cables) before I ran the tests. I didn't want any variables.

-Joe
Re: "ad0: TIMEOUT - WRITE_DMA" type errors with 7.0-RC1
Joe Peterson wrote:
> So I have started a "SeaTools" (disk scanner from Seagate) "long test" of the
> drive. The short test passed already. The results should be interesting. If
> it finds nothing wrong, I am going to start to wonder if I am experiencing ZFS
> bugs that just happen to look like drive problems. I already did a long read,
> under linux, of disk contents, and got no messages about anything wrong.

Update: both SHORT and LONG tests passed for this drive in SeaTools. Hmph... the mystery remains.

-Joe
Re: "ad0: TIMEOUT - WRITE_DMA" type errors with 7.0-RC1
I performed a ZFS scrub, which finished yesterday, and no new /var/log/messages errors were reported during that time. However, the scrub found something interesting:

crater# zpool status -v
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed with 1 errors on Fri Jan 25 12:52:32 2008
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       1     3     2
          ad0s1d    ONLINE       1     3     2

errors: Permanent errors have been detected in the following files:

        /home/joe/music/jukebox/christmas/Esquivel/Merry_XMas_from_the_SpaceAge_Bachelor_Pad/07-Snowfall.mp3

Note that I have not touched this file since copying it to this drive. So, it seems one file failed a checksum check during the scrub. I now (expectedly) get errors trying to read this file - probably ZFS indicating the condition.

When I just logged in tonight, I got two more /var/log/messages disk messages about WRITE_DMA48 TIMEOUT/FAILURE - might be a coincidence (just as I was typing my password). Also, smartctl still shows PASSED, however, this is interesting:

195 Hardware_ECC_Recovered  0x001a   061   046   000    Old_age   Always       -       9070

The number is much *smaller* now! It was "6" a few minutes before this... wrap around?

Hmm, I'm really not sure, at this point, what is going on. So I have started a "SeaTools" (disk scanner from Seagate) "long test" of the drive. The short test passed already. The results should be interesting. If it finds nothing wrong, I am going to start to wonder if I am experiencing ZFS bugs that just happen to look like drive problems. I already did a long read, under linux, of disk contents, and got no messages about anything wrong.

If I can turn on any debugging info to help determine if this is software-related, let me know the magic keywords to use. :)

-Joe
Re: "ad0: TIMEOUT - WRITE_DMA" type errors with 7.0-RC1
Glad you got it back! Yes, when I was first playing with ZFS, I noticed that booting between single and multi user mode could make the pools "invisible". Import seemed to bring them back...

So, is the disk toast, or can you still read anything from it (part table, etc.)?

-Joe

Jeremy Chadwick wrote:
> On Fri, Jan 25, 2008 at 05:00:54PM -0800, Jeremy Chadwick wrote:
>> icarus# zfs list
>> no datasets available
>>
>> This doesn't bode well, and doesn't make me happy. At all.
>
> Pshew! I was able to get ZFS to start seeing the pool again by doing
> the following: (Supposedly "zpool import" by itself will show you a
> list of pools which it manages to see...)
>
> icarus# zpool import -f storage
> icarus# df -k /storage
> Filesystem  1024-blocks      Used     Avail Capacity  Mounted on
> storage       957873024 106124032 851748992    11%    /storage
> icarus# zfs list
> NAME      USED  AVAIL  REFER  MOUNTPOINT
> storage   101G   812G   101G  /storage
> icarus# zpool status
>   pool: storage
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         storage     ONLINE       0     0     0
>           ad8       ONLINE       0     0     0
>           ad10      ONLINE       0     0     0
>
> errors: No known data errors
>
> Back to the drawing board.
Re: "ad0: TIMEOUT - WRITE_DMA" type errors with 7.0-RC1
Jeremy Chadwick wrote:
> Joe, I wanted to send you a note about something that I'm still in the
> process of dealing with. The timing couldn't be more ironic.
>
> I decided it would be worthwhile to migrate from my two-disk ZFS stripe
> with a non-ZFS disk for nightly backups, to a RAIDZ pool of all 3
> disks combined (since they're all the same size). I had another
> terminal with gstat -I500ms running in it, so I could see overall I/O.
>
> All was going well until about the 81GB mark of the copy. gstat started
> showing 0KB in/out on all the drives, and the rsync was stalled. ^Z did
> nothing, which is usually a bad sign. :-) I ssh'd in and did a dmesg
> (summarised):
>
> ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
> ad6: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
> ad6: WARNING - SET_MULTI taskqueue timeout - completing request directly
> ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951071
> ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951327
> ad6: FAILURE - WRITE_DMA timed out LBA=13951071
> ad6: FAILURE - WRITE_DMA timed out LBA=13951327
> ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951583
> ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951839
> ad6: FAILURE - WRITE_DMA timed out LBA=13951583
> ad6: FAILURE - WRITE_DMA timed out LBA=13951839
> ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13952095
> ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13952351
> g_vfs_done():ad6s1d[WRITE(offset=7142916096, length=131072)]error = 5
> g_vfs_done():ad6s1d[WRITE(offset=7143047168, length=131072)]error = 5
> g_vfs_done():ad6s1d[WRITE(offset=7143178240, length=131072)]error = 5
> g_vfs_done():ad6s1d[WRITE(offset=7143309312, length=131072)]error = 5
> g_vfs_done():ad6s1d[WRITE(offset=7143440384, length=131072)]error = 5
>
> It appears my /dev/ad6 (a Seagate -- more irony) must have some bad
> blocks. Actually, after letting things go for a while, I realised the
> box just locked up. Probably kernel panic'd due to the I/O problem.
> I'll have to poke at SMART stats later to see what showed up.

Wow, pretty crazy! Hmm, and yes, those LBAs do look close together. Well, let me know how the smartctl output looks. I'd be curious if your bad sector count rises. I had noticed that 1

BTW, I tried:

crater# dd if=/dev/ad1s4 of=/dev/null bs=64k
^C1408596+0 records in
1408596+0 records out
92313747456 bytes transferred in 1415.324362 secs (65224446 bytes/sec)

(I let it go for 92GB or so) - no messages about ad1. So I wonder if this points at either the cable connector on ad0 or the drive itself. I guess I'd rather have a failing drive than motherboard... I originally was wondering if somehow something peculiar about ZFS's disk access pattern was making it happen...

Thanks for the recommendations. I'll keep an eye on it, and I'll let you know what a cable change does for me. Still, I have not had any ad0 messages since this morning (I haven't been using the system today much, but maybe the cron processes are more likely to trigger it...)

-Joe
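The remark that the failing LBAs "look close together" can be checked against the g_vfs_done() offsets in the quoted dmesg. The following is a minimal sketch, assuming 512-byte sectors (the thread does not state the sector size); the function and variable names are illustrative only, and the numbers are copied from the quoted output.

```python
# Correlate the g_vfs_done() byte offsets with the WRITE_DMA LBAs quoted above.
# Assumption: 512-byte sectors (not stated anywhere in the thread).
SECTOR_SIZE = 512

# First five distinct LBAs and the five offsets, in order of appearance.
lbas = [13951071, 13951327, 13951583, 13951839, 13952095]
offsets = [7142916096, 7143047168, 7143178240, 7143309312, 7143440384]
write_length = 131072  # bytes per failed write, from g_vfs_done()


def lba_deltas(values):
    """Return the differences between consecutive LBAs, in sectors."""
    return [b - a for a, b in zip(values, values[1:])]


def partition_skew(lba_list, offset_list, sector_size=SECTOR_SIZE):
    """Return LBA minus (byte offset // sector size) for each pair."""
    return [lba - off // sector_size for lba, off in zip(lba_list, offset_list)]


print(lba_deltas(lbas))               # spacing between failing LBAs
print(write_length // SECTOR_SIZE)    # sectors covered by one failed write
print(partition_skew(lbas, offsets))  # constant if offsets and LBAs line up
```

Under that assumption the spacing is 256 sectors, exactly one 131072-byte write, and every LBA sits the same 63 sectors above its offset-derived sector. That pattern is what a run of consecutive sequential writes timing out would look like; by itself it does not distinguish a drive fault from a controller, cable, or driver problem.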
Re: New KTR trace for mouse freezing/stuttering in 7.0-RC1
John Baldwin wrote:
> Hmm, when I look at that graph using schedgraph from HEAD it just looks
> like xtrs is using up all the CPU.

Yeah, xtrs is eating a lot of CPU, but I've never seen this affect the mouse movement (making it really jerky) the same way on, e.g., Linux. And the xtrs test is just a way to *reliably* make it happen. It happens intermittently all of the time (at least every few minutes, and often in small batches) even when the system is pretty idle...

-Joe
Re: New KTR trace for mouse freezing/stuttering in 7.0-RC1
Sam Leffler wrote:
> Sigh, you are correct. I backrev'd the machine where I ran schedgraph
> to RELENG_7 and didn't notice the old version mis-parses the ktr file.
> The graph is totally different w/ schedgraph from HEAD.
>
> Sorry Joe for misleading you.

No problem, Sam, but the question I have for you now is: do you see anything with the updated schedgraph that indicates any "freezes" that look funny? The length of the ones I saw with mouse movement were mostly some portion of a second, from maybe 1/8 to 1/2 sec. And there should be a lot of them in quick succession.

Thanks, Joe
Re: "ad0: TIMEOUT - WRITE_DMA" type errors with 7.0-RC1
Chuck Swiger wrote:
> On Jan 25, 2008, at 11:24 AM, Joe Peterson wrote:
>> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>>   1 Raw_Read_Error_Rate     0x000f   114   071   006    Pre-fail  Always       -       82422948
> [ ... ]
>>   7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       286126605
> [ ... ]
>> 195 Hardware_ECC_Recovered  0x001a   063   046   000    Old_age   Always       -       166181300
>
> These numbers are quite worrisome-- they should be zero or nearly so
> in a healthy drive.

It seems to depend on the drive manufacturer. E.g. this is a Seagate. Every Seagate I've ever had (or heard about on the web via smartctl dumps) reports very large numbers for these values. I've heard it described that Seagate shows you the raw numbers (and correctable errors do happen all the time in all drives). In Western Digital drives (IIRC), the numbers shown are the ones that *should* be zero, thereby hiding the low-level errors.

Hard to say if my numbers are "too high", but these "corrected" error counts are always frighteningly high in Seagates.

-Joe
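Since the raw values are vendor-specific, one way to read the three quoted attributes is to compare the normalized VALUE and WORST columns with THRESH instead. A minimal sketch follows; the tuple layout and the helper name are hypothetical, the numbers are copied from the quoted table, and treating a threshold of 0 as "cannot fail" is an assumption made here.

```python
# Compare normalized SMART values with their thresholds for the three
# attributes quoted above. An attribute counts as failed when its normalized
# value is at or below its threshold; raw values are vendor-specific.

# (id, name, value, worst, thresh, raw) copied from the quoted smartctl output.
attributes = [
    (1, "Raw_Read_Error_Rate", 114, 71, 6, 82422948),
    (7, "Seek_Error_Rate", 84, 60, 30, 286126605),
    (195, "Hardware_ECC_Recovered", 63, 46, 0, 166181300),
]


def attribute_status(value, worst, thresh):
    """Classify one attribute from its normalized columns (hypothetical helper).

    Assumption: a threshold of 0 means the attribute can never trip.
    """
    if thresh > 0 and value <= thresh:
        return "FAILING_NOW"
    if thresh > 0 and worst <= thresh:
        return "In_the_past"
    return "ok"


for attr_id, name, value, worst, thresh, raw in attributes:
    status = attribute_status(value, worst, thresh)
    print(f"{attr_id:3d} {name:24s} value={value:3d} worst={worst:3d} "
          f"thresh={thresh:3d} raw={raw} -> {status}")
```

By this reading none of the three attributes has reached its threshold, which matches the "-" in the WHEN_FAILED column despite the large raw numbers. It says nothing about whether the drive is actually healthy.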
Re: "ad0: TIMEOUT - WRITE_DMA" type errors with 7.0-RC1
0
          ad0s1d    ONLINE       1     3     0

errors: No known data errors

> Other things which have fixed problems in the past for others:
>
> * BIOS updates
> * Change of motherboards (sometimes replacing board with same model,
>   other times going with a completely different vendor (implies weird
>   implementation issues or BIOS problems))

I've been using this same motherboard/BIOS for a long time (as well as this drive), so no changes have happened to the HW recently. The BIOS is the newest available, I believe (it's a Tyan Trinity S2099, so it's a few years old).

> * Changing SATA cables

I'm using regular ATA 80-pin cables. Also, these seem to have been working fine for quite a while now. But, yes, I have also witnessed bad cable issues on older systems in the past. I certainly could try a new cable and see if it helps.

> * Getting a larger power supply (usually when lots of disk are involved)

I only have two drives, so I think the PS has enough capacity in my case.

Anyway, thanks for the reply and further questions. Let me know if anything I've sent back is helpful!

Thanks, Joe

smartctl version 5.37 [i386-portbld-freebsd7.0] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.10 family
Device Model:     ST3500630A
Serial Number:    9QG0DG03
Firmware Version: 3.AAE
User Capacity:    500,107,862,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Fri Jan 25 09:55:13 2008 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 163) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   114   071   006    Pre-fail  Always       -       82422948
  3 Spin_Up_Time            0x0003   093   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       56
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       1
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       286126605
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       5250
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       59
187 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
189 Unknown_Attribute       0x003a   100   100   000    Old_age   Always       -       0
190 Temperature_Celsius     0x0022   065   056   045    Old_age   Always       -       605749283
194 Temperature_Celsius     0x0022   035   044   000    Old_age   Always       -       35 (Lifetime Min/Max 0/15)
195 Hardware_ECC_Recovered  0x001a   063   046   000    Old_age   Always       -       166181300
197 Curren
"ad0: TIMEOUT - WRITE_DMA" type errors with 7.0-RC1
I've seen mention of this kind of issue before, but I never saw a solution, except that someone reported that a certain version of 6.x seemed to make it go away - accounts of this problem are a bit vague.

I am running 7.0-RC1, and I am seeing the errors periodically, and I am wondering if this is a known issue. Note that smartctl does not report errors logged and gives a "PASSED" to the drive. I am running at UDMA100 ATA. Also, if it matters, I am using ZFS. Attached is a grep of the /var/log/messages file. Let me know if anyone has suggestions.

Thanks! Joe

Jan 21 23:39:54 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=54112319
Jan 22 00:06:29 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=51610951
Jan 22 00:16:40 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=53031647
Jan 22 00:30:15 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=54243391
Jan 22 07:05:59 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=51768047
Jan 22 09:08:16 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=55890239
Jan 22 09:17:52 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=55919423
Jan 22 09:23:42 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=53470111
Jan 23 00:26:03 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=53588527
Jan 23 00:26:26 crater kernel: ad0: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=764596887
Jan 23 00:26:26 crater kernel: ad0: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=764596887
Jan 23 00:26:26 crater kernel: ad0: FAILURE - WRITE_DMA48 status=51 error=10 LBA=764596887
Jan 23 03:01:06 crater kernel: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=185819705
Jan 23 03:01:37 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=54837686
Jan 23 03:03:22 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=53472407
Jan 23 03:03:39 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=53627991
Jan 23 11:33:27 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=5747
Jan 23 12:30:31 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=55407234
Jan 23 13:20:06 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=57779519
Jan 23 17:30:18 crater kernel: ad0: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=453849407
Jan 23 17:30:19 crater kernel: ad0: FAILURE - READ_DMA48 status=51 error=10 LBA=453849407
Jan 23 17:30:29 crater kernel: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=187373078
Jan 23 18:34:50 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=1017919
Jan 23 18:35:00 crater kernel: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=54547647
Jan 23 18:35:12 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=56354060
Jan 23 18:35:20 crater kernel: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=53919167
Jan 23 23:59:18 crater kernel: ad0: TIMEOUT - FLUSHCACHE retrying (1 retry left)
Jan 24 00:00:27 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=237661119
Jan 24 00:00:27 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=237661119
Jan 24 00:00:27 crater kernel: ad0: FAILURE - WRITE_DMA timed out LBA=237661119
Jan 24 00:00:27 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=236239553
Jan 24 00:00:27 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=236239553
Jan 24 00:00:27 crater kernel: ad0: FAILURE - WRITE_DMA timed out LBA=236239553
Jan 24 00:00:27 crater kernel: ad0: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=764595671
Jan 24 00:00:27 crater kernel: ad0: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=764595671
Jan 24 00:00:27 crater kernel: ad0: FAILURE - WRITE_DMA48 timed out LBA=764595671
Jan 24 00:01:13 crater kernel: ad0: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=764595671
Jan 24 00:01:13 crater kernel: ad0: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=764595671
Jan 24 00:01:13 crater kernel: ad0: FAILURE - WRITE_DMA48 timed out LBA=764595671
Jan 24 00:01:13 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=236180175
Jan 24 00:01:13 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=236180175
Jan 24 00:01:13 crater kernel: ad0: FAILURE - WRITE_DMA timed out LBA=236180175
Jan 24 00:01:13 crater kernel: ad0: TIMEOUT - FLUSHCACHE retrying (1 retry left)
Jan 24 00:01:13 crater kernel: ad0: TIMEOUT - FLUSHCACHE retrying (0 retries left)
Jan 24 02:31:53 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=236191551
Jan 24 04:54:57 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=238068287
Jan 24 04:55:56 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=238068287
Jan 24 04:55:5
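A log like the one above is easier to compare across machines once it is tallied by message type and command. The following is a rough sketch that assumes the message layout seen in this grep output; the regular expression and the `summarize` helper are illustrative, not part of any FreeBSD tool, and the sample lines are copied from the log.

```python
import re
from collections import Counter

# Matches ata messages of the form seen in the log above, e.g.
# "ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=54112319".
# The LBA is optional because FLUSHCACHE lines carry none.
LINE_RE = re.compile(
    r"(?P<dev>ad\d+): (?P<kind>TIMEOUT|FAILURE) - (?P<cmd>[A-Z_0-9]+)"
    r".*?(?:LBA=(?P<lba>\d+))?$"
)


def summarize(lines):
    """Count (kind, command) pairs and collect the LBAs that appear."""
    counts = Counter()
    lbas = []
    for line in lines:
        match = LINE_RE.search(line.strip())
        if match is None:
            continue  # not an ata timeout/failure message
        counts[(match.group("kind"), match.group("cmd"))] += 1
        if match.group("lba") is not None:
            lbas.append(int(match.group("lba")))
    return counts, lbas


# Four lines copied from the log above.
sample = [
    "Jan 21 23:39:54 crater kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=54112319",
    "Jan 23 00:26:26 crater kernel: ad0: FAILURE - WRITE_DMA48 status=51 error=10 LBA=764596887",
    "Jan 23 17:30:18 crater kernel: ad0: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=453849407",
    "Jan 23 23:59:18 crater kernel: ad0: TIMEOUT - FLUSHCACHE retrying (1 retry left)",
]

counts, lbas = summarize(sample)
for (kind, cmd), n in sorted(counts.items()):
    print(kind, cmd, n)
print(sorted(lbas))
```

To run it over a real log, pass an open file object instead of `sample`, for example `summarize(open("/var/log/messages"))`. Sorting the collected LBAs makes it easier to see whether the timeouts cluster in a few regions of the disk or are spread out.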
Re: New KTR trace for mouse freezing/stuttering in 7.0-RC1
Sam Leffler wrote:
>> http://www.skyrush.com/downloads/ktr_ule_4.out
>>
> I don't see what it is
> from the trace data. It sort of looks like the last thing that ran is
> the swi4 which is likely a callout (need to check the log file contents
> to be certain). If the callback function does something it wouldn't
> necessarily be visible in the schedgraph plot. If you could stick a
> dmesg from booting out in the same spot it might be worthwhile.

OK, I just ran a dmesg and put it up there:

http://www.skyrush.com/downloads/dmesg_4.out

The WRITE_DMA messages are not time-correlated with this issue; I don't like the looks of those either, but that's a different issue to look into...

> Also if
> you rebuild the kernel with DIAGNOSTIC then softclock() will
> complain about callouts that take longer than 2ms to run.

OK, recompiling now... Will the new messages appear in dmesg, or in a log file?

> This might
> generate too much noise in which case you can adjust the threshold by
> editing the code in sys/kern/kern_timeout.c.

Cool - thanks for looking at this, and I will let you know what I find! Do I need to make another trace concurrently, or should I just repeat the test procedure and see if I get new messages?

-Thanks, Joe
New KTR trace for mouse freezing/stuttering in 7.0-RC1
In an attempt to track down this mouse freezing/stuttering (i.e. "jerky mouse movement") behavior in FreeBSD 7.0-RC1, I have come up with a reliable way to cause it to happen, and I have created a longer trace showing the results. Note that I am using the ULE scheduler.

In general, it becomes easier to see the effect if there is CPU activity. I have noticed it during kernel compiles, while at the same time loading web pages in firefox that contain images (and moving the mouse while this is happening). But a more controlled way to see it is to run something that uses some CPU and then generate lots of X events.

In my case, I start "xtrs" (TRS-80 emulator) in Model IV mode, which happens to poll for input, using the CPU. Then I move the mouse back and forth quickly between windows in "focus under mouse" mode (in my case, a KDE focus mode), which causes many focus events quickly. In about 15 or 20 seconds, the mouse reliably starts to show erratic movement, not moving smoothly.

I really hope this can shed more light on what might be going on. Here is the trace:

http://www.skyrush.com/downloads/ktr_ule_4.out

Thanks, Joe