[OmniOS-discuss] For developers - the dmake-in-illumos flag day
Hello folks who use OmniOS to build illumos-{omnios,gate}!

Y'all have likely seen Rich Lowe's flag day announcement. The developer/build/make package is now in illumos-gate (and the master branch of illumos-omnios).

Sometime in the next 48 hours I plan to update the bloody IPS repo server with new bits that embrace this flag day. What used to be developer/build/make will now be built in illumos-omnios, and contain fewer bits. To that end, two new packages have been created:

    developer/as
    developer/versioning/sccs

"developer/as" is a BIT of a misnomer, as it includes other closed-source binaries as well:

bloody(~)[0]% pkg contents developer/as
PATH
usr/bin/as
usr/lib/amd64
usr/lib/amd64/libtdf.so
usr/lib/amd64/libtdf.so.1
usr/lib/amd64/libxprof.so
usr/lib/amd64/libxprof.so.1
usr/lib/amd64/libxprof_audit.so
usr/lib/amd64/libxprof_audit.so.1
usr/lib/libtdf.so
usr/lib/libtdf.so.1
usr/lib/libxprof.so
usr/lib/libxprof.so.1
usr/lib/libxprof_audit.so
usr/lib/libxprof_audit.so.1
bloody(~)[0]%

I may choose to split those out further, but for the next bloody build, it'll stay like that.

IF you are attempting to build the world of omnios-build or all of OOod, you will be unpleasantly surprised. There are chicken & egg problems to overcome. I've done it for you, and when the next update of bloody happens, you'll be just fine building the new world order of things.

Thanks, and watch this space for an update to bloody.

Dan

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss
Re: [OmniOS-discuss] ZFS crash/reboot loop
On 7/13/15 12:02 PM, Dan McDonald wrote:
>
>> On Jul 13, 2015, at 11:56 AM, Derek Yarnell wrote:
>>
>> I don't need to hot patch (cold patch would be fine) so any update that
>> I can apply and reboot would be fine. We have a second OmniOS r14 copy
>> running that we are happy to patch in any way possible to get it mounted rw.
>
> IF (and only if) it's the bug I mentioned that's the problem.
>
> I want ZFS experts to take a look as well. It's on the ZFS list now, so
> we'll see what happens. If you're REALLY feeling brave, I can build a
> replacement ZFS module with 6033 in place for you to try, but I can't promise
> it'll work.

Hi Dan,

I would be happy to test a build with 6033 on it to see if that is my issue. We have secured all the critical data and only have scratch data left. So at this point I would be happy to take a chance to see if this will fix the issue.

Thanks,
derek

-- 
Derek T. Yarnell
University of Maryland
Institute for Advanced Computer Studies
Re: [OmniOS-discuss] pkgsrc-current OmniOS 170cea2/i386 2015-07-09 21:35
> On Jul 13, 2015, at 5:11 PM, Sevan / Venture37 wrote:
>
> Where can I get the release number (the r151014) from the system? (I'm
> currently unable to reach the OmniOS zone I'm using to check for
> myself)

/etc/release. (Yes, this even works in zones.)

>> Is there anything we can fix to help these move along?
>
> I'd like to propose a fix for the gettext issue but I've not been able
> to get very far with the buildctl tool in omnios-build. It appears to
> be incapable of generating a repo or utilising a pre-prepared repo for
> me (I used a different path for the location of my repo if that makes
> any difference).

To rebuild gettext from omnios-build:

Set the PKGSRVR environment variable to be where your repo lives, e.g. file:///data/builder/builder.repo/

    cd $PATH_TO/omnios-build/build
    ./buildctl build gettext

Say no to pkglint and say yes to publication, verifying that PKGSRVR was set in your environment.

> I wanted to evaluate the solution proposed previously in this thread,
> that is, to rebuild gettext with an -rpath that includes the lib
> directory of the GCC we are building with, so that libgomp can be
> found.
> Happy to get a pull request in for the change necessary once I've
> worked out what's needed.

We take 'em in omnios-build.

> The gettext issue is not present in SmartOS and is specific to OmniOS.

Weird... as I don't see any changes in the libc callers:

bloody(~/ws)[0]% diff -r illumos-{gate,joyent}/usr/src/lib/libc/port/i18n/
bloody(~/ws)[0]% diff -r illumos-{omnios,joyent}/usr/src/lib/libc/port/i18n/
bloody(~/ws)[0]%

> Building specifically on the various distros is of benefit to pkgsrc
> because it builds against another version of the toolchain on each distro.
> One of the proposed solutions has been to ignore everything & rebuild
> from scratch using components in pkgsrc. This would cause a
> substantial delay in bootstrap time to get everything built.
> Another proposal was to patch pkgsrc so that the bundled gettext is
> ignored and replaced with the version in pkgsrc; there's an effort to
> generate a patch for that.
>
> I personally would rather get gettext fixed upstream (in OmniOS), so
> that there is no need for a workaround in the pkgsrc tree or for
> recreating the userland components to work around a single component.

I'm game for any changes. Your best bet is to modify $PATH_TO/omnios-build/build/gettext/build.sh.

Dan
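The rebuild steps in the message above can be sketched as a small shell function. This is only a sketch under stated assumptions: the name rebuild_gettext and its single argument (the path to an omnios-build checkout) are made up for illustration and are not part of omnios-build, and the pkglint and publication prompts are still answered by hand.

```shell
#!/bin/sh
# Sketch: wrap the "set PKGSRVR, cd, ./buildctl build gettext" steps.
# rebuild_gettext and its argument are hypothetical names, not part of
# omnios-build itself.
rebuild_gettext() {
    build_root=$1    # path to your omnios-build checkout (assumption)

    # The repo location comes from PKGSRVR, so refuse to run without it,
    # e.g. PKGSRVR=file:///data/builder/builder.repo/
    if [ -z "${PKGSRVR:-}" ]; then
        echo "PKGSRVR is not set; export it first" >&2
        return 1
    fi

    # Bail out early if this does not look like an omnios-build checkout.
    if [ ! -x "$build_root/build/buildctl" ]; then
        echo "no executable buildctl under $build_root/build" >&2
        return 2
    fi

    # Subshell so the caller's working directory is left alone.
    ( cd "$build_root/build" && ./buildctl build gettext )
}
```

Checking PKGSRVR up front mirrors the advice to verify that it was set in the environment before saying yes to publication.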
Re: [OmniOS-discuss] pkgsrc-current OmniOS 170cea2/i386 2015-07-09 21:35
On Mon, 13 Jul 2015 22:11:05 +0100 "Sevan / Venture37" wrote:
>
> This was only recently implemented at pkgsrcCon and is pending review, so
> it's not something that's in the pkgsrc tree yet. Previously it was
> just marked as Solaris 11 and I was adding illumos version
> info manually, which was clumsy. It can be changed, not a problem.
> Where can I get the release number (the r151014) from the system? (I'm
> currently unable to reach the OmniOS zone I'm using to check for
> myself)
>

pkg info kernel |perl -ne 'print "$1\n" if /Branch:\s+\d\.(\d+)/'

or, to add the 'r':

pkg info kernel |perl -ne 'print "r$1\n" if /Branch:\s+\d\.(\d+)/'

-- 
Hilsen/Regards
Michael Rasmussen

Get my public GnuPG keys:
michael rasmussen cc
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
mir datanom net
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
mir miras org
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917
--
/usr/games/fortune -es says:
No discipline is ever requisite to force attendance upon lectures which
are really worth the attending.
-- Adam Smith, "The Wealth of Nations"
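For boxes where perl is not handy, roughly the same extraction can be sketched with sed alone. The function name branch_to_release is hypothetical, and the sample Branch line below is an assumed shape inferred from the regular expression in the one-liners above, not captured output.

```shell
#!/bin/sh
# Sketch: same idea as the perl one-liner, using only sed.
# Reads `pkg info kernel`-style text on stdin and prints r<digits>
# for the part of the Branch field after the first dot.
# branch_to_release is a hypothetical helper name.
branch_to_release() {
    sed -n 's/^[[:space:]]*Branch:[[:space:]]*[0-9]\.\([0-9][0-9]*\).*$/r\1/p'
}

# Sample input; the exact Branch value is an assumption for illustration.
# On a real system: pkg info kernel | branch_to_release
printf '        Branch: 0.151014\n' | branch_to_release
```

Unlike the perl version, this sketch anchors the match to the start of the line and expects a single digit before the dot.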
Re: [OmniOS-discuss] pkgsrc-current OmniOS 170cea2/i386 2015-07-09 21:35
Hi Dan,

On 13 July 2015 at 05:51, Dan McDonald wrote:
>
>
> I also saw you mention this indirectly on twitter.
>
> Generally, the OmniOS release should mention which release. 170cea2 is
> r151014. It's good to mention that alongside the uname as that's how most of
> us lock in on a release.

This was only recently implemented at pkgsrcCon and is pending review, so it's not something that's in the pkgsrc tree yet. Previously it was just marked as Solaris 11 and I was adding illumos version info manually, which was clumsy. It can be changed, not a problem.

Where can I get the release number (the r151014) from the system? (I'm currently unable to reach the OmniOS zone I'm using to check for myself)

> Is there anything we can fix to help these move along?

I'd like to propose a fix for the gettext issue but I've not been able to get very far with the buildctl tool in omnios-build. It appears to be incapable of generating a repo or utilising a pre-prepared repo for me (I used a different path for the location of my repo if that makes any difference).

I wanted to evaluate the solution proposed previously in this thread, that is, to rebuild gettext with an -rpath that includes the lib directory of the GCC we are building with, so that libgomp can be found. Happy to get a pull request in for the change necessary once I've worked out what's needed.

> Also, you ARE aware that pkgsrc's Jonathan Perkin works for Joyent, and does
> work to make sure pkgsrc bits build on all illumos distros, right?

Yes, I'm aware that Jonathan Perkin works for Joyent on pkgsrc, which runs on SmartOS and should in theory work on the other illumos distros provided the other distros are in sync with the changes Joyent made to the source. :)

The gettext issue is not present in SmartOS and is specific to OmniOS. Building specifically on the various distros is of benefit to pkgsrc because it builds against another version of the toolchain on each distro.

One of the proposed solutions has been to ignore everything & rebuild from scratch using components in pkgsrc. This would cause a substantial delay in bootstrap time to get everything built. Another proposal was to patch pkgsrc so that the bundled gettext is ignored and replaced with the version in pkgsrc; there's an effort to generate a patch for that.

I personally would rather get gettext fixed upstream (in OmniOS), so that there is no need for a workaround in the pkgsrc tree or for recreating the userland components to work around a single component.

Sevan / Venture37
Re: [OmniOS-discuss] ZFS crash/reboot loop
> On Jul 13, 2015, at 11:56 AM, Derek Yarnell wrote:
>
> I don't need to hot patch (cold patch would be fine) so any update that
> I can apply and reboot would be fine. We have a second OmniOS r14 copy
> running that we are happy to patch in any way possible to get it mounted rw.

IF (and only if) it's the bug I mentioned that's the problem.

I want ZFS experts to take a look as well. It's on the ZFS list now, so we'll see what happens. If you're REALLY feeling brave, I can build a replacement ZFS module with 6033 in place for you to try, but I can't promise it'll work.

Dan
Re: [OmniOS-discuss] ZFS crash/reboot loop
>> ff0d4071ca98::print arc_buf_t b_hdr |::print arc_buf_hdr_t b_size
> b_size = 0
>>
>
> Ouch. There's your zero.
>
> I'm going to forward this very note to the illumos ZFS list. I see ONE
> possible bugfix post-r151014 that might help:
>
> commit 31c46cf23cd1cf4d66390a983dc5072d7d299ba2
> Author: Alek Pinchuk
> Date: Tue Jun 30 09:44:11 2015 -0700
>
>     6033 arc_adjust() should search MFU lists for oldest buffer when
>     adjusting MFU size
>     Reviewed by: Saso Kiselkov
>     Reviewed by: Xin Li
>     Reviewed by: Prakash Surya
>     Approved by: Matthew Ahrens
>
> It's a small bug, and I shudder to say this, even hot-patchable on a running
> system if you're desperate. :)
>

Hi Dan,

I don't need to hot patch (cold patch would be fine) so any update that I can apply and reboot would be fine. We have a second OmniOS r14 copy running that we are happy to patch in any way possible to get it mounted rw.

Thanks,
derek

-- 
Derek T. Yarnell
University of Maryland
Institute for Advanced Computer Studies
Re: [OmniOS-discuss] ZFS crash/reboot loop
> On Jul 13, 2015, at 11:29 AM, Dan McDonald wrote:
>
>
>> On Jul 13, 2015, at 11:25 AM, Derek Yarnell wrote:
>>
>> https://obj.umiacs.umd.edu/derek_support/vmdump.0
>
> Yeah, that's what I'm seeking. Downloading it now to an r151014 box (you are
> running r151014 according to the first mail). My normal '014 box is
> otherwise indisposed at the moment, so this dump may take a bit longer to
> analyze. I can forward it along to the ZFS folks once I've done my initial
> analysis.
>
> For bugs like these, I usually have to engage the illumos ZFS list. If
> anyone here wants to follow along, I'll Cc: you on anything I report to them.

Okay, it's a VERIFY() failure in zio_buf_alloc(). It's passed a size of 0 by its caller. Observe this MDB interaction:

    > $c
    vpanic()
    0xfba8b13d()
    zio_buf_alloc+0x49(0)
    arc_get_data_buf+0x12b(ff0d4071ca98)
    arc_buf_alloc+0xd2(ff0d4dfec000, 0, 0, 1)
    ...

0xff0d4071ca98 is an arc_buf_t, read off of disk. The code in arc_get_data_buf starts with:

    static void
    arc_get_data_buf(arc_buf_t *buf)
    {
            arc_state_t             *state = buf->b_hdr->b_l1hdr.b_state;
            uint64_t                size = buf->b_hdr->b_size;
            arc_buf_contents_t      type = arc_buf_type(buf->b_hdr);

So let's look at that size:

    > ff0d4071ca98::print arc_buf_t b_hdr |::print arc_buf_hdr_t b_size
    b_size = 0
    >

Ouch. There's your zero.

I'm going to forward this very note to the illumos ZFS list. I see ONE possible bugfix post-r151014 that might help:

    commit 31c46cf23cd1cf4d66390a983dc5072d7d299ba2
    Author: Alek Pinchuk
    Date: Tue Jun 30 09:44:11 2015 -0700

        6033 arc_adjust() should search MFU lists for oldest buffer when adjusting MFU size
        Reviewed by: Saso Kiselkov
        Reviewed by: Xin Li
        Reviewed by: Prakash Surya
        Approved by: Matthew Ahrens

It's a small bug, and I shudder to say this, even hot-patchable on a running system if you're desperate. :)

Thanks,
Dan
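For anyone wanting to repeat the b_size check against their own copy of the dump, the dcmd pipeline from the message above can be generated for any arc_buf_t address. This is a minimal sketch: arc_buf_size_dcmd is a hypothetical helper name, and the unix.0/vmcore.0 file names in the comment are assumptions about how the dump was expanded.

```shell
#!/bin/sh
# Sketch: build the mdb pipeline shown above for a given arc_buf_t address.
# arc_buf_size_dcmd is a hypothetical helper name.
arc_buf_size_dcmd() {
    addr=${1#0x}    # tolerate an optional 0x prefix

    # Accept bare hex only, so nothing unexpected is fed to mdb.
    case $addr in
        ''|*[!0-9a-fA-F]*)
            echo "not a hex address: $1" >&2
            return 1
            ;;
    esac

    printf '%s::print arc_buf_t b_hdr |::print arc_buf_hdr_t b_size\n' "$addr"
}

# On a box holding the expanded dump (file names are assumptions):
#   arc_buf_size_dcmd ff0d4071ca98 | mdb unix.0 vmcore.0
arc_buf_size_dcmd ff0d4071ca98
```

Run as-is, the block only prints the dcmd text; nothing touches mdb until the output is piped into it.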
Re: [OmniOS-discuss] ZFS crash/reboot loop
> On Jul 13, 2015, at 11:25 AM, Derek Yarnell wrote:
>
> https://obj.umiacs.umd.edu/derek_support/vmdump.0

Yeah, that's what I'm seeking. Downloading it now to an r151014 box (you are running r151014 according to the first mail). My normal '014 box is otherwise indisposed at the moment, so this dump may take a bit longer to analyze. I can forward it along to the ZFS folks once I've done my initial analysis.

For bugs like these, I usually have to engage the illumos ZFS list. If anyone here wants to follow along, I'll Cc: you on anything I report to them.

Thanks!
Dan
Re: [OmniOS-discuss] ZFS crash/reboot loop
Hi Dan,

Sorry, I have not dealt with dumpadm/savecore that much, but it looks like this is what you want.

https://obj.umiacs.umd.edu/derek_support/vmdump.0

Thanks,
derek

On 7/13/15 12:55 AM, Dan McDonald wrote:
>
>> On Jul 12, 2015, at 9:18 PM, Richard Elling wrote:
>>
>> Dan, if you're listening, Matt would be the best person to weigh-in on this.
>
> Yes he would be, Richard.
>
> The panic in the arc_get_data_buf() paths is similar to older problems we'd
> seen in r151006.
>
> Derek, do you have a kernel coredump from these? I know you've been
> panic-and-reboot-and-panic-ing, but if you can get savecore(1M) to do its
> thing, having that dump would be useful.
>
> Thanks,
> Dan
>

-- 
Derek T. Yarnell
University of Maryland
Institute for Advanced Computer Studies
Re: [OmniOS-discuss] big zfs storage?
Liam,

This report is encouraging. Please share some details of your configuration. What disk failure parameters have you set? Which JBODs and disks are you running?

I have mostly DataON JBODs and some Supermicro. DataON has PMC SAS expanders and Supermicro has LSI; both setups have pretty much the same behavior with disk failures. All my servers are Supermicro with LSI HBAs.

If there's a magic combination of hardware and OS config out there that solves the disk failure panic problem, I will certainly change my builds going forward.

-Chip

On Fri, Jul 10, 2015 at 1:04 PM, Liam Slusser wrote:

> I have two 800T ZFS systems on OmniOS and a bunch of smaller <50T
> systems. Things generally work very well. We lose a disk here and there
> but it's never resulted in downtime. They're all on Dell hardware with LSI
> or Dell PERC controllers.
>
> Putting in smaller disk failure parameters, so disks fail quicker, was a
> big help when something does go wrong with a disk.
>
> thanks,
> liam
>
>
> On Fri, Jul 10, 2015 at 10:31 AM, Schweiss, Chip wrote:
>
>> Unfortunately, for the past couple of years panics on disk failure have been
>> the norm. All my production systems are HA with RSF-1, so at least things
>> come back online relatively quickly. There are quite a few open tickets in
>> the Illumos bug tracker related to mpt_sas panics.
>>
>> Most of the work to fix these problems has been committed in the past
>> year, though problems still exist. For example, my systems are dual path
>> SAS; however, mpt_sas will panic if you pull a cable instead of dropping a
>> path to the disks. Dan McDonald is actively working to resolve this. He
>> is also pushing a bug fix in genunix from Nexenta that appears to fix a lot
>> of the panic problems. I'll know for sure in a few months after I see a
>> disk or two drop if it truly fixes things. Hans Rosenfeld at Nexenta is
>> responsible for most of the updates to mpt_sas including support for 3008
>> (12G SAS).
>>
>> I haven't run any 12G SAS yet, but plan to on my next build in a couple of
>> months. This will be about 300TB using an 84 disk JBOD. All the code
>> from Nexenta to support the 3008 appears to be in Illumos now, and they
>> fully support it so I suspect it's pretty stable now. From what I
>> understand there may be some 12G performance fixes coming sometime.
>>
>> The fault manager is nice when the system doesn't panic. When it panics,
>> the fault manager never gets a chance to take action. It is still the
>> consensus that it is better to run pools without hot spares because there
>> are situations where the fault manager will do bad things. I witnessed this
>> myself when building a system and the fault manager replaced 5 disks in a
>> raidz2 vdev inside 1 minute, trashing the pool. I haven't completely
>> yielded to the "best practice". I now run one hot spare per pool. I figure
>> with raidz2, the odds of the fault manager causing something catastrophic
>> are much lower.
>>
>> -Chip
>>
>>
>>
>> On Fri, Jul 10, 2015 at 11:37 AM, Linda Kateley wrote:
>>
>>> I have to build and maintain my own system. I usually help others
>>> build (I teach zfs and freenas classes/consulting). I really love fault
>>> management in solaris and miss it. Just thought since it's my system and I
>>> get to choose I would use omni. I have 20+ years using solaris and only 2
>>> on freebsd.
>>>
>>> I like freebsd for how well tuned it is for zfs oob. I miss the network, v12n
>>> and resource controls in solaris.
>>>
>>> Concerned about panics on disk failure. Is that common?
>>>
>>> linda
>>>
>>>
>>> On 7/9/15 9:30 PM, Schweiss, Chip wrote:
>>>
>>> Linda,
>>>
>>> I have 3.5 PB running under OmniOS. All my systems have LSI 2108 HBAs
>>> which is considered the best choice for HBAs.
>>>
>>> Illumos leaves a bit to be desired with handling faults from disks or
>>> SAS problems, but things under OmniOS have been improving, much thanks to
>>> Dan McDonald and OmniTI. We have paid support on all of our production
>>> systems with OmniTI. Their response and dedication have been very good.
>>> Other than the occasional panic and restart from a disk failure, OmniOS has
>>> been solid. ZFS of course never has lost a single bit of information.
>>>
>>> I'd be curious why you're looking to move; have there been specific
>>> problems under BSD or ZoL? I've been slowly evaluating FreeBSD ZFS, but of
>>> course the skeletons in the closet never seem to come out until you do
>>> something big.
>>>
>>> -Chip
>>>
>>> On Thu, Jul 9, 2015 at 4:21 PM, Linda Kateley wrote:
>>>
>>>> Hey is there anyone out there running big zfs on omni?
>>>>
>>>> I have been doing mostly zol and freebsd for the last year but have to
>>>> build a 300+TB box and I want to come back home to roots (solaris).
>>>> Feeling kind of hesitant :) Also, if you had to do over, is there
>>>> anything you would do differently?
>>>>
>>>> Also, what is the go to HBA these days? Seems