[OmniOS-discuss] For developers - the dmake-in-illumos flag day

2015-07-13 Thread Dan McDonald
Hello folks who use OmniOS to build illumos-{omnios,gate}!

Y'all have likely seen Rich Lowe's flag day announcement.  The 
developer/build/make package is now in illumos-gate (and the master branch of 
illumos-omnios).  Sometime in the next 48 hours I plan to update the bloody IPS 
repo server with new bits that embrace this flag day.

What used to be developer/build/make will now be built in illumos-omnios, and 
contain fewer bits.  To that end, two new packages:

developer/as

developer/versioning/sccs

have been created.  "developer/as" is a BIT of a misnomer, as it includes other 
closed-source binaries as well:

bloody(~)[0]% pkg contents developer/as
PATH
usr/bin/as
usr/lib/amd64
usr/lib/amd64/libtdf.so
usr/lib/amd64/libtdf.so.1
usr/lib/amd64/libxprof.so
usr/lib/amd64/libxprof.so.1
usr/lib/amd64/libxprof_audit.so
usr/lib/amd64/libxprof_audit.so.1
usr/lib/libtdf.so
usr/lib/libtdf.so.1
usr/lib/libxprof.so
usr/lib/libxprof.so.1
usr/lib/libxprof_audit.so
usr/lib/libxprof_audit.so.1
bloody(~)[0]% 

I may choose to split those out further, but for the next bloody build, it'll 
stay like that.

IF you are attempting to build the world of omnios-build or all of OOod, you 
will be unpleasantly surprised.  There are chicken & egg problems to overcome.  
I've done it for you, and when the next update of bloody happens, you'll be 
just fine building the new world order of things.

Thanks, and watch this space for an update to bloody.

Dan

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] ZFS crash/reboot loop

2015-07-13 Thread Derek Yarnell


On 7/13/15 12:02 PM, Dan McDonald wrote:
> 
>> On Jul 13, 2015, at 11:56 AM, Derek Yarnell  wrote:
>>
>> I don't need to hot patch (cold patch would be fine) so any update that
>> I can apply and reboot would be fine.  We have a second OmniOS r14 copy
>> running that we are happy to patch in any way possible to get it mounted rw.
> 
> IF (and only if) it's the bug I mentioned that's the problem.
> 
> I want ZFS experts to take a look as well.  It's on the ZFS list now, so 
> we'll see what happens.  If you're REALLY feeling brave, I can build a 
> replacement ZFS module with 6033 in place for you to try, but I can't promise 
> it'll work.

Hi Dan,

I would be happy to test a build with 6033 in it to see if that
is my issue.  We have secured all the critical data and only have
scratch data left.  So at this point I would be happy to take a chance to
see if this will fix the issue.

Thanks,
derek

-- 
Derek T. Yarnell
University of Maryland
Institute for Advanced Computer Studies


Re: [OmniOS-discuss] pkgsrc-current OmniOS 170cea2/i386 2015-07-09 21:35

2015-07-13 Thread Dan McDonald

> On Jul 13, 2015, at 5:11 PM, Sevan / Venture37  wrote:
> 
> Where can I get the release number (the r151014) from the system? (I'm
> currently unable to reach the OmniOS zone I'm using to check for
> myself)

/etc/release.  (Yes, this even works in zones.)
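
As a minimal sketch of pulling just the release number out of that file: the function name omnios_release is hypothetical, and the file layout (a line such as "OmniOS v11 r151014") is an assumption rather than something stated in this thread.

```shell
# Sketch: print the first r<digits> token found in a release file.
# Assumes the file contains a line such as "  OmniOS v11 r151014".
# omnios_release is a hypothetical helper name, not an OmniOS tool.
omnios_release() {
    # Defaults to /etc/release; pass another path to test elsewhere.
    sed -n 's/.*\(r[0-9][0-9]*\).*/\1/p' "${1:-/etc/release}" | head -n 1
}

# Usage:
#   omnios_release              # reads /etc/release
#   omnios_release /tmp/sample  # reads a copy of the file
```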

>> Is there anything we can fix to help these move along?
> 
> I'd like to propose a fix for the gettext issue but I've not been able
> to get very far with the buildctl tool in omnios-build. It appears to
> be incapable of generating a repo or utilising a pre-prepared repo for
> me (I used  a different path for the location of my repo if that makes
> any difference).

To rebuild gettext from omnios-build:

Set the PKGSRVR environment variable to be where your repo lives.  e.g. 
file:///data/builder/builder.repo/

cd $PATH_TO/omnios-build/build

./buildctl build gettext

Say no to pkglint and say yes to publication, verifying that PKGSRVR was set in 
your environment.
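
A rough sketch of the steps above as a shell session, with a guard for the PKGSRVR check. The helper name check_pkgsrvr is hypothetical, and $PATH_TO is a placeholder for wherever omnios-build is checked out.

```shell
# Hypothetical guard: refuse to go on unless PKGSRVR is set in the environment.
check_pkgsrvr() {
    if [ -z "${PKGSRVR:-}" ]; then
        echo "PKGSRVR is not set" >&2
        return 1
    fi
    echo "publishing to $PKGSRVR"
}

# Usage (repo path taken from the example above; adjust to your own repo):
#   export PKGSRVR=file:///data/builder/builder.repo/
#   check_pkgsrvr && cd "$PATH_TO/omnios-build/build" && ./buildctl build gettext
```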

> I wanted to evaluate the solution proposed previously in this thread,
> which is to rebuild gettext with -rpath specified so that it includes the
> lib directory of the GCC being used for the build and libgomp can be
> found.
> Happy to get a pull request in for the change necessary once I've
> worked out what's needed.

We take 'em in omnios-build.

> The gettext issue is not present in SmartOS and is specific to OmniOS.

Weird... as I don't see any changes in the libc callers:

bloody(~/ws)[0]% diff -r illumos-{gate,joyent}/usr/src/lib/libc/port/i18n/
bloody(~/ws)[0]% diff -r illumos-{omnios,joyent}/usr/src/lib/libc/port/i18n/
bloody(~/ws)[0]% 


> Building specifically on the various distros is of benefit to pkgsrc
> because it builds against another version of toolchain on each distro.
> One of the proposed solutions has been to ignore everything & rebuild
> from scratch using components in pkgsrc. This would cause a
> substantial delay in bootstrap time to get everything built.
> Another proposal was to patch pkgsrc so that the bundled gettext is
> ignored and replaced with the version in pkgsrc; there's an effort to
> generate a patch for that.
> 
> I personally would rather get the gettext fixed upstream (in OmniOS)
> so that no workaround is needed in the pkgsrc tree and the userland
> components don't have to be recreated to work around a single component.

I'm game for any changes.  Your best bet is to modify 
$PATH_TO/omnios-build/build/gettext/build.sh.

Dan



Re: [OmniOS-discuss] pkgsrc-current OmniOS 170cea2/i386 2015-07-09 21:35

2015-07-13 Thread Michael Rasmussen
On Mon, 13 Jul 2015 22:11:05 +0100
"Sevan / Venture37"  wrote:

> 
> This was only recently implemented at pkgsrcCon and is pending review, so
> it's not something that's in the pkgsrc tree yet. Previously it was
> just marked as Solaris 11 and I was adding illumos version
> info manually, which was clumsy. It can be changed, not a problem.
> Where can I get the release number (the r151014) from the system? (I'm
> currently unable to reach the OmniOS zone I'm using to check for
> myself)
> 
pkg info kernel |perl -ne 'print "$1\n" if /Branch:\s+\d\.(\d+)/'

or to add 'r'

pkg info kernel |perl -ne 'print "r$1\n" if /Branch:\s+\d\.(\d+)/'

-- 
Hilsen/Regards
Michael Rasmussen

Get my public GnuPG keys:
michael  rasmussen  cc
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
mir  datanom  net
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
mir  miras  org
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917
--
/usr/games/fortune -es says:
No discipline is ever requisite to force attendance upon lectures which
are really worth the attending.
-- Adam Smith, "The Wealth of Nations"




Re: [OmniOS-discuss] pkgsrc-current OmniOS 170cea2/i386 2015-07-09 21:35

2015-07-13 Thread Sevan / Venture37
Hi Dan,

On 13 July 2015 at 05:51, Dan McDonald  wrote:
> 
>
> I also saw you mention this indirectly on twitter.
>
> Generally, the OmniOS release should mention which release.  170cea2 is 
> r151014.  It's good to mention that alongside the uname as that's how most of 
> us lock in on a release.

This was only recently implemented at pkgsrcCon and is pending review, so
it's not something that's in the pkgsrc tree yet. Previously it was
just marked as Solaris 11 and I was adding illumos version
info manually, which was clumsy. It can be changed, not a problem.
Where can I get the release number (the r151014) from the system? (I'm
currently unable to reach the OmniOS zone I'm using to check for
myself)

> Is there anything we can fix to help these move along?

I'd like to propose a fix for the gettext issue but I've not been able
to get very far with the buildctl tool in omnios-build. It appears to
be incapable of generating a repo or utilising a pre-prepared repo for
me (I used  a different path for the location of my repo if that makes
any difference).
I wanted to evaluate the solution proposed previously in this thread,
which is to rebuild gettext with -rpath specified so that it includes the
lib directory of the GCC being used for the build and libgomp can be
found.
Happy to get a pull request in for the change necessary once I've
worked out what's needed.
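
One way to check whether a rebuilt binary still fails to find libgomp is to filter ldd output for unresolved libraries. This is only a sketch: missing_libs is a hypothetical helper, the binary path is a placeholder, and it assumes ldd marks unresolved libraries with the words "not found".

```shell
# Hypothetical helper: read ldd output on stdin and print the name of each
# library whose line contains "not found".
missing_libs() {
    awk '/not found/ { print $1 }'
}

# Usage (path is a placeholder for the binary being checked):
#   ldd /path/to/binary | missing_libs
```

An empty result after rebuilding with the extra -rpath entry would suggest the runtime linker can now resolve every dependency.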

> Also, you ARE aware that pkgsrc's Jonathan Perkin works for Joyent, and does 
> work to make sure pkgsrc bits build on all illumos distros, right?

Yes I'm aware that Jonathan Perkin works for Joyent on pkgsrc which
runs on SmartOS and should in theory work on the other illumos distros
provided the other distros are in sync with the changes Joyent made to
the source. :)
The gettext issue is not present in SmartOS and is specific to OmniOS.
Building specifically on the various distros is of benefit to pkgsrc
because it builds against another version of toolchain on each distro.
One of the proposed solutions has been to ignore everything & rebuild
from scratch using components in pkgsrc. This would cause a
substantial delay in bootstrap time to get everything built.
Another proposal was to patch pkgsrc so that the bundled gettext is
ignored and replaced with the version in pkgsrc; there's an effort to
generate a patch for that.

I personally would rather get the gettext fixed upstream (in OmniOS)
so that no workaround is needed in the pkgsrc tree and the userland
components don't have to be recreated to work around a single component.



Sevan / Venture37


Re: [OmniOS-discuss] ZFS crash/reboot loop

2015-07-13 Thread Dan McDonald

> On Jul 13, 2015, at 11:56 AM, Derek Yarnell  wrote:
> 
> I don't need to hot patch (cold patch would be fine) so any update that
> I can apply and reboot would be fine.  We have a second OmniOS r14 copy
> running that we are happy to patch in any way possible to get it mounted rw.

IF (and only if) it's the bug I mentioned that's the problem.

I want ZFS experts to take a look as well.  It's on the ZFS list now, so we'll 
see what happens.  If you're REALLY feeling brave, I can build a replacement 
ZFS module with 6033 in place for you to try, but I can't promise it'll work.

Dan



Re: [OmniOS-discuss] ZFS crash/reboot loop

2015-07-13 Thread Derek Yarnell
>> ff0d4071ca98::print arc_buf_t b_hdr |::print arc_buf_hdr_t b_size
> b_size = 0
>>
> 
> Ouch.  There's your zero.
> 
> I'm going to forward this very note to the illumos ZFS list.  I see ONE 
> possible bugfix post-r151014 that might help:
> 
> commit 31c46cf23cd1cf4d66390a983dc5072d7d299ba2
> Author: Alek Pinchuk 
> Date:   Tue Jun 30 09:44:11 2015 -0700
> 
> 6033 arc_adjust() should search MFU lists for oldest buffer when 
> adjusting MFU size
> Reviewed by: Saso Kiselkov 
> Reviewed by: Xin Li 
> Reviewed by: Prakash Surya 
> Approved by: Matthew Ahrens 
> 
> It's a small bug, and I shudder to say this, even hot-patchable on a running 
> system if you're desperate.  :)
> 

Hi Dan,

I don't need to hot patch (cold patch would be fine) so any update that
I can apply and reboot would be fine.  We have a second OmniOS r14 copy
running that we are happy to patch in any way possible to get it mounted rw.

Thanks,
derek

-- 
Derek T. Yarnell
University of Maryland
Institute for Advanced Computer Studies


Re: [OmniOS-discuss] ZFS crash/reboot loop

2015-07-13 Thread Dan McDonald

> On Jul 13, 2015, at 11:29 AM, Dan McDonald  wrote:
> 
> 
>> On Jul 13, 2015, at 11:25 AM, Derek Yarnell  wrote:
>> 
>> https://obj.umiacs.umd.edu/derek_support/vmdump.0
> 
> Yeah, that's what I'm seeking.  Downloading it now to an r151014 box (you are 
> running r151014 according to the first mail).  My normal '014 box is 
> otherwise indisposed at the moment, so this dump may take a bit longer to 
> analyze.  I can forward it along to the ZFS folks once I've done my initial 
> analysis.
> 
> For bugs like these, I usually have to engage the illumos ZFS list.  If 
> anyone here wants to follow along, I'll Cc: you on anything I report to them.


Okay, it's a VERIFY() failure in zio_buf_alloc().  It's passed a size of 0 by 
its caller.  Observe this MDB interaction:

> $c
vpanic()
0xfba8b13d()
zio_buf_alloc+0x49(0)
arc_get_data_buf+0x12b(ff0d4071ca98)
arc_buf_alloc+0xd2(ff0d4dfec000, 0, 0, 1)
...


0xff0d4071ca98 is an arc_buf_t, read off of disk.  The code in 
arc_get_data_buf starts with:

static void
arc_get_data_buf(arc_buf_t *buf)
{
    arc_state_t         *state = buf->b_hdr->b_l1hdr.b_state;
    uint64_t            size = buf->b_hdr->b_size;
    arc_buf_contents_t  type = arc_buf_type(buf->b_hdr);


So let's look at that size:

> ff0d4071ca98::print arc_buf_t b_hdr |::print arc_buf_hdr_t b_size
b_size = 0
> 

Ouch.  There's your zero.
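
For anyone who wants to repeat that lookup non-interactively, here is a small sketch that builds the same dcmd pipeline for a given address. The helper name arc_bsize_dcmd is hypothetical, and the file names unix.0 and vmcore.0 assume the dump has already been expanded with savecore.

```shell
# Hypothetical helper: print the mdb dcmd pipeline used above for one
# arc_buf_t address, so it can be piped into mdb.
arc_bsize_dcmd() {
    printf '%s::print arc_buf_t b_hdr |::print arc_buf_hdr_t b_size\n' "$1"
}

# Usage on an illumos host (file names are assumptions, see above):
#   arc_bsize_dcmd ff0d4071ca98 | mdb unix.0 vmcore.0
```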

I'm going to forward this very note to the illumos ZFS list.  I see ONE 
possible bugfix post-r151014 that might help:

commit 31c46cf23cd1cf4d66390a983dc5072d7d299ba2
Author: Alek Pinchuk 
Date:   Tue Jun 30 09:44:11 2015 -0700

6033 arc_adjust() should search MFU lists for oldest buffer when adjusting 
MFU size
Reviewed by: Saso Kiselkov 
Reviewed by: Xin Li 
Reviewed by: Prakash Surya 
Approved by: Matthew Ahrens 

It's a small bug, and I shudder to say this, even hot-patchable on a running 
system if you're desperate.  :)

Thanks,
Dan


Re: [OmniOS-discuss] ZFS crash/reboot loop

2015-07-13 Thread Dan McDonald

> On Jul 13, 2015, at 11:25 AM, Derek Yarnell  wrote:
> 
> https://obj.umiacs.umd.edu/derek_support/vmdump.0

Yeah, that's what I'm seeking.  Downloading it now to an r151014 box (you are 
running r151014 according to the first mail).  My normal '014 box is otherwise 
indisposed at the moment, so this dump may take a bit longer to analyze.  I can 
forward it along to the ZFS folks once I've done my initial analysis.

For bugs like these, I usually have to engage the illumos ZFS list.  If anyone 
here wants to follow along, I'll Cc: you on anything I report to them.

Thanks!
Dan



Re: [OmniOS-discuss] ZFS crash/reboot loop

2015-07-13 Thread Derek Yarnell
Hi Dan,

Sorry I have not dealt with dumpadm/savecore that much but it looks like
this is what you want.

https://obj.umiacs.umd.edu/derek_support/vmdump.0

Thanks,
derek

On 7/13/15 12:55 AM, Dan McDonald wrote:
> 
>> On Jul 12, 2015, at 9:18 PM, Richard Elling 
>>  wrote:
>>
>> Dan, if you're listening, Matt would be the best person to weigh-in on this.
> 
> Yes he would be, Richard..
> 
> The panic in the arc_get_data_buf() paths is similar to older problems we'd 
> seen in r151006.
> 
> Derek, do you have a kernel coredump from these?  I know you've been 
> panic-and-reboot-and-panic-ing, but if you can get savecore(1M) to do its 
> thing, having that dump would be useful.
> 
> Thanks,
> Dan
> 
> 

-- 
Derek T. Yarnell
University of Maryland
Institute for Advanced Computer Studies


Re: [OmniOS-discuss] big zfs storage?

2015-07-13 Thread Schweiss, Chip
Liam,

This report is encouraging.  Please share some details of your
configuration.   What disk failure parameters have you set?   Which
JBODs and disks are you running?

I have mostly DataON JBODs and some Supermicro.   DataON has PMC SAS
expanders and Supermicro has LSI, both setups have pretty much the same
behavior with disk failures.   All my servers are Supermicro with LSI HBAs.

If there's a magic combination of hardware and OS config out there that
solves the disk failure panic problem, I will certainly change my builds
going forward.

-Chip

On Fri, Jul 10, 2015 at 1:04 PM, Liam Slusser  wrote:

> I have two 800T ZFS systems on OmniOS and a bunch of smaller <50T
> systems.  Things generally work very well.  We lose a disk here and there
> but it's never resulted in downtime.  They're all on Dell hardware with LSI
> or Dell PERC controllers.
>
> Putting in smaller disk failure parameters, so disks fail quicker, was a
> big help when something does go wrong with a disk.
>
> thanks,
> liam
>
>
> On Fri, Jul 10, 2015 at 10:31 AM, Schweiss, Chip 
> wrote:
>
>> Unfortunately for the past couple years panics on disk failure have been
>> the norm.   All my production systems are HA with RSF-1, so at least things
>> come back online relatively quickly.  There are quite a few open tickets in
>> the Illumos bug tracker related to mpt_sas panics.
>>
>> Most of the work to fix these problems has been committed in the past
>> year, though problems still exist.  For example, my systems are dual path
>> SAS, however, mpt_sas will panic if you pull a cable instead of dropping a
>> path to the disks.  Dan McDonald is actively working to resolve this.   He
>> is also pushing a bug fix in genunix from Nexenta that appears to fix a lot
>> of the panic problems.   I'll know for sure in a few months after I see a
>> disk or two drop if it truly fixes things.  Hans Rosenfeld at Nexenta is
>> responsible for most of the updates to mpt_sas including support for 3008
>> (12G SAS).
>>
>> I haven't run any 12G SAS yet, but plan to on my next build in a couple
>> months.   This will be about 300TB using an 84 disk JBOD.  All the code
>> from Nexenta to support the 3008 appears to be in Illumos now, and they
>> fully support it so I suspect it's pretty stable now.  From what I
>> understand there may be some 12G performance fixes coming sometime.
>>
>> The fault manager is nice when the system doesn't panic.  When it panics,
>> the fault manager never gets a chance to take action.  It is still the
>> consensus that it is better to run pools without hot spares because there
>> are situations where the fault manager will do bad things.   I witnessed this
>> myself when building a system and the fault manager replaced 5 disks in a
>> raidz2 vdev inside 1 minute, trashing the pool.   I haven't completely
>> yielded to the "best practice".  I now run one hot spare per pool.  I figure
>> with raidz2, the odds of the fault manager causing something catastrophic
>> are much lower.
>>
>> -Chip
>>
>>
>>
>> On Fri, Jul 10, 2015 at 11:37 AM, Linda Kateley 
>> wrote:
>>
>>>  I have to build and maintain my own system. I usually help others
>>> build(i teach zfs and freenas classes/consulting). I really love fault
>>> management in solaris and miss it. Just thought since it's my system and I
>>> get to choose I would use omni. I have 20+ years using solaris and only 2
>>> on freebsd.
>>>
>>> I like freebsd for how well tuned it is for zfs oob. I miss the network, v12n
>>> and resource controls in solaris.
>>>
>>> Concerned about panics on disk failure. Is that common?
>>>
>>>
>>> linda
>>>
>>>
>>> On 7/9/15 9:30 PM, Schweiss, Chip wrote:
>>>
>>>   Linda,
>>>
>>>  I have 3.5 PB running under OmniOS.  All my systems have LSI 2108 HBAs,
>>> which are considered the best choice for HBAs.
>>>
>>> Illumos leaves a bit to be desired with handling faults from disks or
>>> SAS problems, but things under OmniOS have been improving, much thanks to
>>> Dan McDonald and OmniTI.   We have a paid support on all of our production
>>> systems with OmniTI.  Their response and dedication has been very good.
>>> Other than the occasional panic and restart from a disk failure, OmniOS has
>>> been solid.   ZFS of course never has lost a single bit of information.
>>>
>>>  I'd be curious why you're looking to move, have there been specific
>>> problems under BSD or ZoL?  I've been slowly evaluating FreeBSD ZFS, but of
>>> course the skeletons in the closet never seem to come out until you do
>>> something big.
>>>
>>>  -Chip
>>>
>>> On Thu, Jul 9, 2015 at 4:21 PM, Linda Kateley 
>>> wrote:
>>>
>>>> Hey is there anyone out there running big zfs on omni?
>>>>
>>>> I have been doing mostly zol and freebsd for the last year but have to
>>>> build a 300+TB box and i want to come back home to roots(solaris). Feeling
>>>> kind of hesitant :) Also, if you had to do over, is there anything you
>>>> would do different.
>>>>
>>>> Also, what is the go to HBA these days? Seems