Re: Any success stories for HAST + ZFS?
> Everything is detected correctly, everything comes up correctly. See
> a new option (reload) in the RC script for hast.

Same here - have patched the master database machines, all came up fine, everything running perfectly, have flip-flopped between the two machines with no ill effects whatsoever, and all looking very good.

cheers,

-pete.
Re: Any success stories for HAST + ZFS?
On Mon, 11 Apr 2011 11:26:15 -0700 Freddie Cash wrote:

FC> On Sun, Apr 10, 2011 at 12:36 PM, Mikolaj Golub wrote:
>> On Mon, 4 Apr 2011 11:08:16 -0700 Freddie Cash wrote:
>>
>> FC> Once the deadlock patches above are MFC'd to -STABLE, I can do an
>> FC> upgrade cycle and test them.
>>
>> Committed to STABLE.

FC> Updated src tree to r220537. Recompiled world, kernel, etc.
FC> Installed world, kernel, etc. ZFSv28 patch was not affected.
FC> Everything is detected correctly, everything comes up correctly. See
FC> a new option (reload) in the RC script for hast.
FC> Can create/change role for 24 hast devices simultaneously.
FC> Can switch between master/slave modes.
FC> Have 5 rsyncs running in parallel without any issues, transferring
FC> 80-120 Mbps over the network (just under 100 Mbps seems to be the
FC> average right now).
FC> Switching roles while the rsyncs are running succeeds without
FC> deadlocking (obviously, rsync complains a whole bunch while the switch
FC> happens as the pool disappears out from underneath it, but it picks up
FC> again when the pool is back in place).
FC> Hitting the reset switch on the box while the rsyncs are running
FC> doesn't affect the hast devices or the pool, beyond losing the last 5
FC> seconds of writes.
FC> It's only been a couple of hours of testing and hammering, but so far
FC> things are much more stable/performant than before.

Cool! Thanks for reporting!

FC> Anything else I should test?

Nothing particular, but any tests and reports are appreciated. E.g. some of the recent features Pawel has added are checksum and compression. You could try different options and compare :-)

--
Mikolaj Golub
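For reference, such a test configuration might look like the fragment below. This is a sketch only: the checksum and compression keywords (with values such as crc32/sha256 and hole/lzf) are as documented in hast.conf(5) of this vintage, and the resource name reuses disk-a1 from elsewhere in this thread - verify against the man page before use:

    resource disk-a1 {
        # Assumed option values; see hast.conf(5).
        checksum sha256
        compression lzf
        local /dev/label/disk-a1
        on omegadrive {
            remote tcp4://10.20.0.102
        }
        on alphadrive {
            remote tcp4://10.20.0.101
        }
    }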
Re: Any success stories for HAST + ZFS?
On Sun, Apr 10, 2011 at 12:36 PM, Mikolaj Golub wrote:
> On Mon, 4 Apr 2011 11:08:16 -0700 Freddie Cash wrote:
>
> FC> Once the deadlock patches above are MFC'd to -STABLE, I can do an
> FC> upgrade cycle and test them.
>
> Committed to STABLE.

Updated src tree to r220537. Recompiled world, kernel, etc. Installed world, kernel, etc. ZFSv28 patch was not affected.

Everything is detected correctly, everything comes up correctly. See a new option (reload) in the RC script for hast.

Can create/change role for 24 hast devices simultaneously. Can switch between master/slave modes.

Have 5 rsyncs running in parallel without any issues, transferring 80-120 Mbps over the network (just under 100 Mbps seems to be the average right now). Switching roles while the rsyncs are running succeeds without deadlocking (obviously, rsync complains a whole bunch while the switch happens as the pool disappears out from underneath it, but it picks up again when the pool is back in place).

Hitting the reset switch on the box while the rsyncs are running doesn't affect the hast devices or the pool, beyond losing the last 5 seconds of writes.

It's only been a couple of hours of testing and hammering, but so far things are much more stable/performant than before.

Anything else I should test?

--
Freddie Cash
fjwc...@gmail.com
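For reference, creating all 24 providers and switching them to primary, with the disk-a1 through disk-d6 naming used elsewhere in this thread, could be scripted roughly as follows (a sketch, not the actual procedure used above):

    #!/bin/sh
    # Create each of the 24 hast providers, then make this node primary.
    for set in a b c d; do
        for n in 1 2 3 4 5 6; do
            hastctl create disk-${set}${n}
        done
    done
    hastctl role primary all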
Re: Any success stories for HAST + ZFS?
On Mon, 4 Apr 2011 11:08:16 -0700 Freddie Cash wrote:

FC> Once the deadlock patches above are MFC'd to -STABLE, I can do an
FC> upgrade cycle and test them.

Committed to STABLE.

--
Mikolaj Golub
Re: Any success stories for HAST + ZFS?
On Tue, Apr 5, 2011 at 5:05 AM, Mikolaj Golub wrote:
> On Mon, 4 Apr 2011 11:08:16 -0700 Freddie Cash wrote:
>
> FC> On Sat, Apr 2, 2011 at 1:44 AM, Pawel Jakub Dawidek wrote:
> >>
> >> I just committed a fix for a problem that might look like a deadlock.
> >> With trociny@ patch and my last fix (to GEOM GATE and hastd) do you
> >> still have any issues?
>
> FC> Just to confirm, this is commit r220264, 220265, 220266 to -CURRENT?
>
> Yes, r220264 and 220266. As it is stated in the commit log MFC is planned
> after 1 week.

Okay. I'll keep an eye out next week for the MFC of those patches to hit -STABLE, and do an upgrade/test cycle after that point.

--
Freddie Cash
fjwc...@gmail.com
Re: Any success stories for HAST + ZFS?
On Mon, 4 Apr 2011 11:08:16 -0700 Freddie Cash wrote:

FC> On Sat, Apr 2, 2011 at 1:44 AM, Pawel Jakub Dawidek wrote:
>>
>> I just committed a fix for a problem that might look like a deadlock.
>> With trociny@ patch and my last fix (to GEOM GATE and hastd) do you
>> still have any issues?

FC> Just to confirm, this is commit r220264, 220265, 220266 to -CURRENT?

Yes, r220264 and 220266. As it is stated in the commit log MFC is planned after 1 week.

--
Mikolaj Golub
Re: Any success stories for HAST + ZFS?
On Sat, Apr 2, 2011 at 1:44 AM, Pawel Jakub Dawidek wrote:
> On Thu, Mar 24, 2011 at 01:36:32PM -0700, Freddie Cash wrote:
>> [Not sure which list is most appropriate since it's using HAST + ZFS
>> on -RELEASE, -STABLE, and -CURRENT. Feel free to trim the CC: on
>> replies.]
>>
>> I'm having a hell of a time making this work on real hardware, and am
>> not ruling out hardware issues as yet, but wanted to get some
>> reassurance that someone out there is using this combination (FreeBSD
>> + HAST + ZFS) successfully, without kernel panics, without core dumps,
>> without deadlocks, without issues, etc. I need to know I'm not
>> chasing a dead rabbit.
>
> I just committed a fix for a problem that might look like a deadlock.
> With trociny@ patch and my last fix (to GEOM GATE and hastd) do you
> still have any issues?

Just to confirm, this is commit r220264, 220265, 220266 to -CURRENT?

Looking through the commit logs, I don't see any of these MFC'd to -STABLE yet, so I can't test them directly. The storage box that was having the issues is running 8-STABLE r219754 at the moment (with ZFSv28 and Mikolaj's ggate patches).

I see there have been a lot of hast/ggate-related MFCs in the past week, but they don't include the deadlock patches.

Once the deadlock patches above are MFC'd to -STABLE, I can do an upgrade cycle and test them. I do have the previous 9-CURRENT install saved, just nothing to run it on atm.

--
Freddie Cash
fjwc...@gmail.com
Re: Any success stories for HAST + ZFS?
On Thu, Mar 24, 2011 at 01:36:32PM -0700, Freddie Cash wrote:
> [Not sure which list is most appropriate since it's using HAST + ZFS
> on -RELEASE, -STABLE, and -CURRENT. Feel free to trim the CC: on
> replies.]
>
> I'm having a hell of a time making this work on real hardware, and am
> not ruling out hardware issues as yet, but wanted to get some
> reassurance that someone out there is using this combination (FreeBSD
> + HAST + ZFS) successfully, without kernel panics, without core dumps,
> without deadlocks, without issues, etc. I need to know I'm not
> chasing a dead rabbit.

I just committed a fix for a problem that might look like a deadlock. With trociny@ patch and my last fix (to GEOM GATE and hastd) do you still have any issues?

--
Pawel Jakub Dawidek    http://www.wheelsystems.com
FreeBSD committer      http://www.FreeBSD.org
Am I Evil? Yes, I Am!  http://yomoli.com
Re: Any success stories for HAST + ZFS?
On Fri, Apr 1, 2011 at 4:22 AM, Pete French wrote:
>> The other 5% of the time, the hastd crashes occurred either when
>> importing the ZFS pool, or when running multiple parallel rsyncs to
>> the pool. hastd was always shown as the last running process in the
>> backtrace onscreen.
>
> This is what I am seeing - did you manage to reproduce this with the patch,
> or does it fix the issue for you? Am doing more tests now, with only a single
> hast device to see if it is stable. Am OK to run without mirroring across
> hast devices for now, but wouldn't like to do so long term!

I have not been able to crash or hang the box since applying Mikolaj's patch. I've tried the following:

- destroy pool
- create pool
- destroy hast providers
- create hast providers
- switch from master to slave via hastctl using "role secondary all"
- switch from slave to master via hastctl using "role primary all"
- switch roles via hast-carp-switch, which does one provider per second
- import/export pool

I've been running 6 parallel rsyncs for the past 48 hours, getting a consistent 200 Mbps of transfers, with just under 2 TB of deduped data in the pool, without any lockups.

So far, so good.

--
Freddie Cash
fjwc...@gmail.com
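Assembled from the commands listed above, a complete failover between the two nodes would look roughly like the sketch below; the pool name "tank" is assumed, not taken from the thread:

    # On the current master: hand the providers over cleanly.
    zpool export tank
    hastctl role secondary all

    # On the new master: take over the providers and bring the pool up.
    hastctl role primary all
    zpool import tank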
Re: Any success stories for HAST + ZFS?
> This looks like a different problem. If you have this again please provide the
> output of 'procstat -kka'.

Will do...

-pete.
Re: Any success stories for HAST + ZFS?
On Fri, 01 Apr 2011 11:40:11 +0100 Pete French wrote:

>> Yes, you may hit it only on hast devices creation. The workaround is to avoid
>> using 'hastctl role primary all', start providers one by one instead.

PF> Interesting to note that I just hit a lockup in hast (the discs froze
PF> up - could not run hastctl or zpool import, and could not kill
PF> them). I have two hast devices instead of one, but I am starting them
PF> individually instead of using 'all'. The code includes all the latest
PF> patches which have gone into STABLE over the last few days, none of which
PF> look particularly controversial!

PF> I haven't tried your patch yet, nor been able to reproduce the lockup, but
PF> thought you might be interested to know that I also had problems with
PF> multiple providers.

This looks like a different problem. If you have this again please provide the output of 'procstat -kka'.

--
Mikolaj Golub
Re: Any success stories for HAST + ZFS?
> The other 5% of the time, the hastd crashes occurred either when
> importing the ZFS pool, or when running multiple parallel rsyncs to
> the pool. hastd was always shown as the last running process in the
> backtrace onscreen.

This is what I am seeing - did you manage to reproduce this with the patch, or does it fix the issue for you? Am doing more tests now, with only a single hast device to see if it is stable. Am OK to run without mirroring across hast devices for now, but wouldn't like to do so long term!

-pete.
Re: Any success stories for HAST + ZFS?
> Yes, you may hit it only on hast devices creation. The workaround is to avoid
> using 'hastctl role primary all', start providers one by one instead.

Interesting to note that I just hit a lockup in hast (the discs froze up - could not run hastctl or zpool import, and could not kill them). I have two hast devices instead of one, but I am starting them individually instead of using 'all'. The code includes all the latest patches which have gone into STABLE over the last few days, none of which look particularly controversial!

I haven't tried your patch yet, nor been able to reproduce the lockup, but thought you might be interested to know that I also had problems with multiple providers.

cheers,

-pete.
Re: Any success stories for HAST + ZFS?
On Sun, Mar 27, 2011 at 5:16 AM, Mikolaj Golub wrote:
> On Sat, 26 Mar 2011 10:52:08 -0700 Freddie Cash wrote:
>
> FC> hastd backtrace is here:
> FC> http://www.sd73.bc.ca/downloads/crash/hast-backtrace.png
>
> It is not a hastd crash, but a kernel crash triggered by hastd process.

Ah, interesting.

> I am not sure I got the same crash as you but apparently the race is possible
> in g_gate on device creation.

95% of the time that it would crash would be when creating the /dev/hast/* devices (switching to primary role). Most of the crashes happened when doing "hastctl role primary all", but would occasionally happen when doing it manually for each resource. Creating the resources by hand, one every 2 seconds or so, would usually create them all without crashing.

The other 5% of the time, the hastd crashes occurred either when importing the ZFS pool, or when running multiple parallel rsyncs to the pool. hastd was always shown as the last running process in the backtrace onscreen.

> I got the following crash starting many hast providers simultaneously:
>
> fault virtual address = 0x0
>
> #8  0xc0c11adc in calltrap () at /usr/src/sys/i386/i386/exception.s:168
> #9  0xc086ac6b in g_gate_ioctl (dev=0xc6a24300, cmd=3374345472, addr=0xc9fec000 "\002", flags=3, td=0xc7ff0b80) at /usr/src/sys/geom/gate/g_gate.c:410
> #10 0xc0853c5b in devfs_ioctl_f (fp=0xc9b9e310, com=3374345472, data=0xc9fec000, cred=0xc8c9c200, td=0xc7ff0b80) at /usr/src/sys/fs/devfs/devfs_vnops.c:678
> #11 0xc09210cd in kern_ioctl (td=0xc7ff0b80, fd=3, com=3374345472, data=0xc9fec000 "\002") at file.h:262
> #12 0xc0921254 in ioctl (td=0xc7ff0b80, uap=0xf5edbcec) at /usr/src/sys/kern/sys_generic.c:679
> #13 0xc0916616 in syscallenter (td=0xc7ff0b80, sa=0xf5edbce4) at /usr/src/sys/kern/subr_trap.c:315
> #14 0xc0c2b9ff in syscall (frame=0xf5edbd28) at /usr/src/sys/i386/i386/trap.c:1086
> #15 0xc0c11b71 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:266
>
> Or just creating many ggate devices simultaneously:
>
> for i in `jot 100`; do
>     ./ggiocreate $i &
> done
>
> ggiocreate.c is attached.
>
> In my case the kernel crashes in g_gate_create() when checking for name
> collisions in strcmp():
>
>     /* Check for name collision. */
>     for (unit = 0; unit < g_gate_maxunits; unit++) {
>         if (g_gate_units[unit] == NULL)
>             continue;
>         if (strcmp(name, g_gate_units[unit]->sc_provider->name) != 0)
>             continue;
>         mtx_unlock(&g_gate_units_lock);
>         mtx_destroy(&sc->sc_queue_mtx);
>         free(sc, M_GATE);
>         return (EEXIST);
>     }
>
> I think the issue is the following. When preparing sc we take
> g_gate_units_lock, check for name collision, fill sc fields except
> sc->sc_provider, and register sc in g_gate_units[unit]. sc_provider is filled
> later, when g_gate_units_lock is released. So the scenario is possible:
>
> 1) Thread A registers sc in g_gate_units[unit] with
> g_gate_units[unit]->sc_provider still null and releases g_gate_units_lock.
>
> 2) Thread B traverses g_gate_units[] when checking for name collision and
> crashes accessing g_gate_units[unit]->sc_provider->name.
>
> The attached patch fixes the issue in my case.

Patch applied cleanly to 8-STABLE with the ZFSv28 patch also applied. Just to be safe, did a full buildworld/kernel cycle, running a GENERIC kernel.

So far, I have not been able to produce a crash in hastd, through several reboots, switching from primary to secondary and back, and just switching from primary to init and back. So far, so good.

Now to see if I can reproduce any of the ZFS crashes I had earlier.
--
Freddie Cash
fjwc...@gmail.com
Re: Any success stories for HAST + ZFS?
On Mon, 28 Mar 2011 10:47:22 +0100 Pete French wrote:

>> It is not a hastd crash, but a kernel crash triggered by hastd process.
>>
>> I am not sure I got the same crash as you but apparently the race is
>> possible in g_gate on device creation.
>>
>> I got the following crash starting many hast providers simultaneously:

PF> This is very interesting to me - my successful ZFS+HAST only had
PF> a single drive, but in my new setup I am intending to use two
PF> HAST processes and then mirror across them under ZFS, so I am
PF> likely to hit this bug. Are the processes stable once launched?

Yes, you may hit it only on hast devices creation. The workaround is to avoid using 'hastctl role primary all', start providers one by one instead.

--
Mikolaj Golub
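A sketch of that workaround, promoting each provider individually with a short pause rather than using 'all' (the resource names reuse the disk-a1/disk-a2 examples from elsewhere in the thread):

    #!/bin/sh
    # Promote providers one at a time to avoid the g_gate creation race.
    for res in disk-a1 disk-a2; do
        hastctl role primary ${res}
        sleep 1
    done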
Re: Any success stories for HAST + ZFS?
> It is not a hastd crash, but a kernel crash triggered by hastd process.
>
> I am not sure I got the same crash as you but apparently the race is possible
> in g_gate on device creation.
>
> I got the following crash starting many hast providers simultaneously:

This is very interesting to me - my successful ZFS+HAST only had a single drive, but in my new setup I am intending to use two HAST processes and then mirror across them under ZFS, so I am likely to hit this bug. Are the processes stable once launched?

I don't have a system on which to try your patch at the moment, but will do so when I get the opportunity!
Re: Any success stories for HAST + ZFS?
On Sun, 27 Mar 2011 15:16:15 +0300 Mikolaj Golub wrote to Freddie Cash:

MG> The attached patch fixes the issue in my case.

The patch is committed to current.

--
Mikolaj Golub
Re: Any success stories for HAST + ZFS?
On Sat, 26 Mar 2011 10:52:08 -0700 Freddie Cash wrote:

FC> hastd backtrace is here:
FC> http://www.sd73.bc.ca/downloads/crash/hast-backtrace.png

It is not a hastd crash, but a kernel crash triggered by hastd process.

I am not sure I got the same crash as you but apparently the race is possible in g_gate on device creation.

I got the following crash starting many hast providers simultaneously:

fault virtual address = 0x0

#8  0xc0c11adc in calltrap () at /usr/src/sys/i386/i386/exception.s:168
#9  0xc086ac6b in g_gate_ioctl (dev=0xc6a24300, cmd=3374345472, addr=0xc9fec000 "\002", flags=3, td=0xc7ff0b80) at /usr/src/sys/geom/gate/g_gate.c:410
#10 0xc0853c5b in devfs_ioctl_f (fp=0xc9b9e310, com=3374345472, data=0xc9fec000, cred=0xc8c9c200, td=0xc7ff0b80) at /usr/src/sys/fs/devfs/devfs_vnops.c:678
#11 0xc09210cd in kern_ioctl (td=0xc7ff0b80, fd=3, com=3374345472, data=0xc9fec000 "\002") at file.h:262
#12 0xc0921254 in ioctl (td=0xc7ff0b80, uap=0xf5edbcec) at /usr/src/sys/kern/sys_generic.c:679
#13 0xc0916616 in syscallenter (td=0xc7ff0b80, sa=0xf5edbce4) at /usr/src/sys/kern/subr_trap.c:315
#14 0xc0c2b9ff in syscall (frame=0xf5edbd28) at /usr/src/sys/i386/i386/trap.c:1086
#15 0xc0c11b71 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:266

Or just creating many ggate devices simultaneously:

    for i in `jot 100`; do
        ./ggiocreate $i &
    done

ggiocreate.c is attached.

In my case the kernel crashes in g_gate_create() when checking for name collisions in strcmp():

	/* Check for name collision. */
	for (unit = 0; unit < g_gate_maxunits; unit++) {
		if (g_gate_units[unit] == NULL)
			continue;
		if (strcmp(name, g_gate_units[unit]->sc_provider->name) != 0)
			continue;
		mtx_unlock(&g_gate_units_lock);
		mtx_destroy(&sc->sc_queue_mtx);
		free(sc, M_GATE);
		return (EEXIST);
	}

I think the issue is the following. When preparing sc we take g_gate_units_lock, check for name collision, fill sc fields except sc->sc_provider, and register sc in g_gate_units[unit]. sc_provider is filled later, when g_gate_units_lock is released. So the scenario is possible:

1) Thread A registers sc in g_gate_units[unit] with g_gate_units[unit]->sc_provider still null and releases g_gate_units_lock.

2) Thread B traverses g_gate_units[] when checking for name collision and crashes accessing g_gate_units[unit]->sc_provider->name.

The attached patch fixes the issue in my case.

--
Mikolaj Golub

Index: sys/geom/gate/g_gate.c
===
--- sys/geom/gate/g_gate.c	(revision 220050)
+++ sys/geom/gate/g_gate.c	(working copy)
@@ -407,13 +407,14 @@ g_gate_create(struct g_gate_ctl_create *ggio)
 	for (unit = 0; unit < g_gate_maxunits; unit++) {
 		if (g_gate_units[unit] == NULL)
 			continue;
-		if (strcmp(name, g_gate_units[unit]->sc_provider->name) != 0)
+		if (strcmp(name, g_gate_units[unit]->sc_name) != 0)
 			continue;
 		mtx_unlock(&g_gate_units_lock);
 		mtx_destroy(&sc->sc_queue_mtx);
 		free(sc, M_GATE);
 		return (EEXIST);
 	}
+	sc->sc_name = name;
 	g_gate_units[sc->sc_unit] = sc;
 	g_gate_nunits++;
 	mtx_unlock(&g_gate_units_lock);
@@ -432,6 +433,9 @@ g_gate_create(struct g_gate_ctl_create *ggio)
 	sc->sc_provider = pp;
 	g_error_provider(pp, 0);
 	g_topology_unlock();
+	mtx_lock(&g_gate_units_lock);
+	sc->sc_name = sc->sc_provider->name;
+	mtx_unlock(&g_gate_units_lock);
 
 	if (sc->sc_timeout > 0) {
 		callout_reset(&sc->sc_callout, sc->sc_timeout * hz,
Index: sys/geom/gate/g_gate.h
===
--- sys/geom/gate/g_gate.h	(revision 220050)
+++ sys/geom/gate/g_gate.h	(working copy)
@@ -76,6 +76,7 @@
  * 'P:' means 'Protected by'.
  */
 struct g_gate_softc {
+	char *sc_name;				/* P: (read-only) */
 	int sc_unit;				/* P: (read-only) */
 	int sc_ref;				/* P: g_gate_list_mtx */
 	struct g_provider *sc_provider;		/* P: (read-only) */
@@ -96,7 +97,6 @@ struct g_gate_softc {
 	LIST_ENTRY(g_gate_softc) sc_next;	/* P: g_gate_list_mtx */
 	char sc_info[G_GATE_INFOSIZE];		/* P: (read-only) */
 };
-#define sc_name	sc_provider->geom->name
 
 #define G_GATE_DEBUG(lvl, ...)	do {				\
 	if (g_gate_debug >= (lvl)) {				\
Re: Any success stories for HAST + ZFS?
On Fri, Mar 25, 2011 at 12:55 AM, Pawel Jakub Dawidek wrote:
> On Thu, Mar 24, 2011 at 01:36:32PM -0700, Freddie Cash wrote:
>> I've tried with FreeBSD 8.2-RELEASE, 8-STABLE, 8-STABLE w/ZFSv28
>> patches, and 9-CURRENT (after the ZFSv28 commit). Things work well
>> until I start hastd. Then either the system locks up, or hastd causes
>> a kernel panic, or hastd dumps core.
>
> The minimum amount of information (as always) would be backtrace from
> the kernel and also hastd backtrace when it coredumps. There is really
> decent logging in hast, so I'm also sure it does log something
> interesting on primary or secondary. Another useful thing would be to
> turn on debugging in hast (single -d option for hastd).
>
> The best you can do is to give me the simplest and quickest procedure to
> reproduce the issue, eg. configure two hast resources, put ZFS mirror on
> top, start rsync /usr/src to the file system on top of hast and switch
> roles. The simpler the better.

FreeBSD 8-STABLE r219754 with the ZFSv28 patches applied.

hast.conf:

    resource disk-a1 {
        local /dev/label/disk-a1
        on omegadrive {
            remote tcp4://10.20.0.102
        }
        on alphadrive {
            remote tcp4://10.20.0.101
        }
    }

    resource disk-a2 {
        local /dev/label/disk-a2
        on omegadrive {
            remote tcp4://10.20.0.102
        }
        on alphadrive {
            remote tcp4://10.20.0.101
        }
    }

The following will crash hastd:

    service hastd onestart
    hastctl create disk-a1
    hastctl create disk-a2
    hastctl role primary all

hastd backtrace is here:
http://www.sd73.bc.ca/downloads/crash/hast-backtrace.png

I'll try running it with -d to see if there's anything interesting there.

Running it with -d and -F, with output to a log file, everything works well using 2 disks. Hmm, running it with all 24 disks, I can't make it crash now. However, I did change the kernel hz from 100 to 1000. I'll see if I can switch it back to 100 and try the tests again using -dF. The backtrace listed above is with kern.hz=100.

--
Freddie Cash
fjwc...@gmail.com
Re: Any success stories for HAST + ZFS?
Hi,

2011/3/24 Freddie Cash:
> The hardware is fairly standard fare:
> - SuperMicro H8DGi-F motherboard
> - AMD Opteron 6100-series CPU (8-cores @ 2.0 GHz)
> - 8 GB DDR3 SDRAM
> - 64 GB Kingston V-Series SSD for the OS install (using ahci(4) and
>   the motherboard SATA controller)
> - 3x SuperMicro AOC-USAS2-8Li SATA controllers with IT firmware
> - 6x 1.5 TB Seagate 7200.11 drives (1x raidz2 vdev)
> - 12x 1.0 TB Seagate 7200.12 drives (2x raidz2 vdev)
> - 6x 0.5 TB WD RE3 drives (1x raidz2 vdev)

Just for info, Sun recommends 1 GB of RAM per terabyte of data. I see here ~16 TB of available data (roughly: the 6x1.5 TB raidz2 vdev gives ~6 TB usable, the two 6x1.0 TB raidz2 vdevs give ~8 TB, and the 6x0.5 TB raidz2 vdev gives ~2 TB), so I would recommend 16 GB for arc_size and 24 or 32 GB for the host.
Re: Any success stories for HAST + ZFS?
On Thu, Mar 24, 2011 at 01:36:32PM -0700, Freddie Cash wrote:
> I've tried with FreeBSD 8.2-RELEASE, 8-STABLE, 8-STABLE w/ZFSv28
> patches, and 9-CURRENT (after the ZFSv28 commit). Things work well
> until I start hastd. Then either the system locks up, or hastd causes
> a kernel panic, or hastd dumps core.

The minimum amount of information (as always) would be a backtrace from the kernel and also a hastd backtrace when it coredumps. There is really decent logging in hast, so I'm also sure it does log something interesting on primary or secondary. Another useful thing would be to turn on debugging in hast (a single -d option for hastd).

The best you can do is to give me the simplest and quickest procedure to reproduce the issue, e.g. configure two hast resources, put a ZFS mirror on top, start an rsync of /usr/src to the file system on top of hast, and switch roles. The simpler the better.

--
Pawel Jakub Dawidek    http://www.wheelsystems.com
FreeBSD committer      http://www.FreeBSD.org
Am I Evil? Yes, I Am!  http://yomoli.com
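A minimal sketch of the procedure Pawel describes; the resource names res0/res1 and the pool name "test" are assumed for illustration, and the resources would also need entries in hast.conf:

    # Bring up two hast resources and mirror them with ZFS.
    service hastd onestart
    hastctl create res0
    hastctl create res1
    hastctl role primary all
    zpool create test mirror /dev/hast/res0 /dev/hast/res1

    # Generate load, then switch roles while it runs.
    rsync -a /usr/src /test &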
Re: Any success stories for HAST + ZFS?
> So, please, someone, somewhere, share a success story, where you're
> using FreeBSD, ZFS, and HAST. Let me know that it does work. I'm
> starting to lose faith in my abilities here. :(

I ran our main database for the old company using ZFS on top of HAST without any problems at all. Had a single HAST disc with a zpool on top of it, and MySQL on top of that. All worked perfectly for us.

Am not running that currently as the company went under and we lost the hardware. But am working for a new business and am about to deploy the same configuration for the main database as it's "tried and tested" as far as I am concerned. Will be slightly different, as I will have a pair of HAST drives and do mirroring over the top with ZFS. But I shall report back how well, or not, it works.

-pete.
Any success stories for HAST + ZFS?
[Not sure which list is most appropriate since it's using HAST + ZFS on -RELEASE, -STABLE, and -CURRENT. Feel free to trim the CC: on replies.]

I'm having a hell of a time making this work on real hardware, and am not ruling out hardware issues as yet, but wanted to get some reassurance that someone out there is using this combination (FreeBSD + HAST + ZFS) successfully, without kernel panics, without core dumps, without deadlocks, without issues, etc. I need to know I'm not chasing a dead rabbit.

In tests using VirtualBox and FreeBSD 8-STABLE from when HAST was first MFC'd, everything worked wonderfully. The HAST-based pool would come up, data would sync to the slave node, fail-over worked nicely, bringing the other box back online as the slave worked, data synced back, etc. It was a thing of beauty.

Now, on real hardware, I cannot get the system to stay online for more than an hour. :( hastd causes kernel panics with "bufwrite: buffer not busy" errors. ZFS pools get corrupted. The system deadlocks (no log messages, no onscreen errors, not even the NumLock key works) at random points.

The hardware is fairly standard fare:
- SuperMicro H8DGi-F motherboard
- AMD Opteron 6100-series CPU (8-cores @ 2.0 GHz)
- 8 GB DDR3 SDRAM
- 64 GB Kingston V-Series SSD for the OS install (using ahci(4) and the motherboard SATA controller)
- 3x SuperMicro AOC-USAS2-8Li SATA controllers with IT firmware
- 6x 1.5 TB Seagate 7200.11 drives (1x raidz2 vdev)
- 12x 1.0 TB Seagate 7200.12 drives (2x raidz2 vdevs)
- 6x 0.5 TB WD RE3 drives (1x raidz2 vdev)

The motherboard BIOS is up-to-date. I do not see any way to update the firmware on the SATA controllers. Using the onboard IPMI-based sensors, CPU, motherboard, and RAM temps and voltages are in the nominal range.

I've tried with FreeBSD 8.2-RELEASE, 8-STABLE, 8-STABLE w/ZFSv28 patches, and 9-CURRENT (after the ZFSv28 commit). Things work well until I start hastd. Then either the system locks up, or hastd causes a kernel panic, or hastd dumps core.

Each harddrive is glabel'd as "disk-a1" through "disk-d6". hast.conf has 24 resources listed, one for each glabel'd device. The pool is created using the /dev/hast/* devices, with disk-a1 through disk-a6 being one raidz2 vdev, and so on through disk-b*, disk-c*, and disk-d*, for a total of 4 raidz2 vdevs of 6 drives each. A fairly standard setup, I would think.

Even using a GENERIC kernel, I can't keep things stable and running.

So, please, someone, somewhere, share a success story, where you're using FreeBSD, ZFS, and HAST. Let me know that it does work. I'm starting to lose faith in my abilities here. :( Or point out where I'm doing things wrong so I can correct the issues.

Thanks.

--
Freddie Cash
fjwc...@gmail.com
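For concreteness, the pool layout described above corresponds to something like the following command - a sketch only, as the pool name "tank" is assumed and not given in the original message:

    zpool create tank \
        raidz2 /dev/hast/disk-a1 /dev/hast/disk-a2 /dev/hast/disk-a3 \
               /dev/hast/disk-a4 /dev/hast/disk-a5 /dev/hast/disk-a6 \
        raidz2 /dev/hast/disk-b1 /dev/hast/disk-b2 /dev/hast/disk-b3 \
               /dev/hast/disk-b4 /dev/hast/disk-b5 /dev/hast/disk-b6 \
        raidz2 /dev/hast/disk-c1 /dev/hast/disk-c2 /dev/hast/disk-c3 \
               /dev/hast/disk-c4 /dev/hast/disk-c5 /dev/hast/disk-c6 \
        raidz2 /dev/hast/disk-d1 /dev/hast/disk-d2 /dev/hast/disk-d3 \
               /dev/hast/disk-d4 /dev/hast/disk-d5 /dev/hast/disk-d6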