from:"Saso Kiselkov"

Re: [OmniOS-discuss] Status of the OmniOS project

2017-09-08 Thread Saso Kiselkov

On 9/8/17 2:38 PM, Sylvain Leroux wrote:
> Hi everyone,
> 
> After Oracle laid off the core Solaris developers, I'm writing an
> article for https://itsfoss.com about the various
> OpenSolaris-based/Illumos-based distributions.
> 
> I've already heard about OmniOS previously and wanted to mention it as
> the de-facto Illumos server distribution.
> However, while doing some research, I found this article:
> 
> https://www.theregister.co.uk/2017/04/25/oracle_free_solaris_project_stops/
> 
> Here is the headline:
> """
> Development of OmniOS – an Oracle-free open-source variant of Solaris –
> is being killed after five years of work.
> """
> 
> The author then reports several quotes from OmniTI chief executive
> Robert Treat to support that assertion.
> On the other hand, from what I saw on the mailing list, the project still
> _seems_ to be active.
> 
> 
> Could you give more information about the current state of the project?
> 
> 
> Thanks in advance for your time,

While "OmniOS" is no longer being developed by OmniTI, its development
has been taken over by the community (at OmniTI's request, actually) and
can be found under http://www.omniosce.org with new releases still ongoing.

Cheers,
-- 
Saso



___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss

Re: [OmniOS-discuss] r151014 on top of Linux KVM spontaneously reboots

2016-10-20 Thread Saso Kiselkov

On 10/19/16 11:02 PM, Dan McDonald wrote:
> Not sure how hard it'd be, but can you see if 018 has similar problems?

Managed to get this resolved by upping the amount of vRAM. My suspicion was
roused because I saw a shell command return with "fork: not enough
space". After I went from 2G to 4G of vRAM and swap from 1G to 2G, a full
build w/ lint went through without a hitch, twice. Maybe just increasing
swap would have been enough, at the expense of the speed of the build.

Cheers,
-- 
Saso

Re: [OmniOS-discuss] r151014 on top of Linux KVM spontaneously reboots

2016-10-19 Thread Saso Kiselkov

On 10/19/16 11:02 PM, Dan McDonald wrote:
> Not sure how hard it'd be, but can you see if 018 has similar problems?

Tried it, still happens. After about 10 minutes of building, the VM
spontaneously reboots.

-- 
Saso

>> On Oct 19, 2016, at 4:55 PM, Saso Kiselkov  wrote:
>>
>> So I've been trying to get illumos-gate built on a fresh r151014 install
>> in a Linux KVM host (Ubuntu 16.04 64-bit). It's running on a standard
>> qcow2 16GB DATA disk, 2GB of vRAM, 4 cores on the Core i5 CPU and
>> console redirected to serial. Other than that, no tuning. The problem
>> is, when I start the gate build process, randomly after about 15-30
>> minutes of building, the machine just spontaneously reboots. No panic,
>> no log message in libvirt, nothing (even when kmdb was loaded). Out of
>> nothing, the VM just hits a reset and reappears in the bootloader.
>>
>> Has anybody else seen this?
>>
>> Cheers,
>> -- 
>> Saso


Re: [OmniOS-discuss] r151014 on top of Linux KVM spontaneously reboots

2016-10-19 Thread Saso Kiselkov

On 10/19/16 11:02 PM, Dan McDonald wrote:
> Not sure how hard it'd be, but can you see if 018 has similar problems?

I'm gonna try to give it a go.

-- 
Saso

>> On Oct 19, 2016, at 4:55 PM, Saso Kiselkov  wrote:
>>
>> So I've been trying to get illumos-gate built on a fresh r151014 install
>> in a Linux KVM host (Ubuntu 16.04 64-bit). It's running on a standard
>> qcow2 16GB DATA disk, 2GB of vRAM, 4 cores on the Core i5 CPU and
>> console redirected to serial. Other than that, no tuning. The problem
>> is, when I start the gate build process, randomly after about 15-30
>> minutes of building, the machine just spontaneously reboots. No panic,
>> no log message in libvirt, nothing (even when kmdb was loaded). Out of
>> nothing, the VM just hits a reset and reappears in the bootloader.
>>
>> Has anybody else seen this?
>>
>> Cheers,
>> -- 
>> Saso


[OmniOS-discuss] r151014 on top of Linux KVM spontaneously reboots

2016-10-19 Thread Saso Kiselkov

So I've been trying to get illumos-gate built on a fresh r151014 install
in a Linux KVM host (Ubuntu 16.04 64-bit). It's running on a standard
qcow2 16GB DATA disk, 2GB of vRAM, 4 cores on the Core i5 CPU and
console redirected to serial. Other than that, no tuning. The problem
is, when I start the gate build process, randomly after about 15-30
minutes of building, the machine just spontaneously reboots. No panic,
no log message in libvirt, nothing (even when kmdb was loaded). Out of
nothing, the VM just hits a reset and reappears in the bootloader.

Has anybody else seen this?

Cheers,
-- 
Saso

Re: [OmniOS-discuss] [discuss] Re: [networking] rge_intr troubles

2016-10-02 Thread Saso Kiselkov

But we already do an rge_receive on receiving NO_RXDESC_INT. The code
sequence is:

 1) get interrupt status word (line 1479)
 2) if interrupt is RX_FIFO_OVERFLOW_INT, modify int_mask variable to
exclude further RX_FIFO_OVERFLOW_INT (line 1488)
 3) otherwise if interrupt contains NO_RXDESC_INT (among other things),
modify int_mask to include RX_FIFO_OVERFLOW_INT (line 1495)
 4) if interrupt contains NO_RXDESC_INT (among other things), do
rge_receive (line 1510)
 ...
 5) update interrupt mask according to steps 2 & 3.

So after receiving the interrupt, the first thing we pretty much do is
rge_receive anyway. I'm sorry I can't be of more help here, but this is
all completely foreign to me. I suspect the problem has more to do with
the fact that we never really properly dequeue packets from the RX ring,
so once the NIC runs out of descriptors, it just keeps on notifying us
that it needs more RX descriptors. The reason I think so is that this
problem appears to be triggered strictly by the number of packets received.
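
The five-step sequence above can be sketched in C roughly as follows. This is a simplified model and not the actual rge_intr() code: the bit values, the `struct nic_model` type and the `model_intr()` function are assumptions made for illustration, "is RX_FIFO_OVERFLOW_INT" is read as "has that bit set", and rge_receive() is replaced by a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit values, chosen for illustration only. */
#define RX_OK_INT            0x0001u
#define RX_ERR_INT           0x0002u
#define NO_RXDESC_INT        0x0010u
#define RX_FIFO_OVERFLOW_INT 0x0040u
#define RGE_RX_INT           (RX_OK_INT | RX_ERR_INT | NO_RXDESC_INT)

/* Hypothetical stand-in for the driver's per-device state. */
struct nic_model {
	uint16_t int_mask;	/* interrupts currently enabled */
	unsigned rx_calls;	/* how often the receive path ran */
};

/* Model of steps 1-5; int_status is what step 1 read from the chip. */
static void
model_intr(struct nic_model *nic, uint16_t int_status)
{
	uint16_t new_mask;

	assert(nic != NULL);
	new_mask = nic->int_mask;

	if (int_status & RX_FIFO_OVERFLOW_INT) {
		/* step 2: stop listening for further overflow interrupts */
		new_mask &= (uint16_t)~RX_FIFO_OVERFLOW_INT;
	} else if (int_status & RGE_RX_INT) {
		/* step 3: RX activity seen, re-enable overflow interrupts */
		new_mask |= RX_FIFO_OVERFLOW_INT;
	}

	if (int_status & RGE_RX_INT) {
		/* step 4: stand-in for rge_receive() */
		nic->rx_calls++;
	}

	/* step 5: write back the mask chosen in steps 2 and 3 */
	nic->int_mask = new_mask;
}
```

In this model, a status of RX_FIFO_OVERFLOW_INT | NO_RXDESC_INT takes the overflow branch for masking purposes, yet the receive path still runs in step 4, which is the point being made: the receive call already happens for this interrupt.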

-- 
Saso

On 10/2/16 10:37 PM, Garrett D'Amore wrote:
> so do the rx logic first even if the fifo overflow is set.  im not in front 
> of the code but refer to the if-else block I referenced earlier. 
> 
> Sent from my iPhone
> 
>> On Oct 2, 2016, at 1:32 PM, Saso Kiselkov  wrote:
>>
>> I'm willing to test anything you suggest.
>>
>> -- 
>> Saso
>>
>>> On 10/2/16 10:09 PM, Garrett D'Amore wrote:
>>> ah so maybe we need to change the logic so that the exhaustion of 
>>> descriptors takes precedence over the rx fifo overrun.  
>>>
>>> Sent from my iPhone
>>>
>>>> On Oct 2, 2016, at 12:45 PM, Saso Kiselkov  wrote:
>>>>
>>>> Thanks for the suggestions! Results below:
>>>>
>>>>> On 10/2/16 7:20 PM, Garrett D'Amore wrote:
>>>>> But as a first test, you can try calling rge_receive().  The simplest
>>>>> way I can see to do that is to OR in the value of RGE_NO_RXDESC_INT in
>>>>> the check at 1495. (Btw the ordering of the checks at 1488 and 1495 is
>>>>> suboptimal, as the rx interrupt should be the *hot* code path.)
>>>>
>>>> Sadly, this won't help. RGE_RX_INT is already a composite of
>>>> RX_OK_INT | RX_ERR_INT | NO_RXDESC_INT
>>>>
>>>>> One thing you might also try doing is changing the value in rge.h for
>>>>> RGE_RECV_COPY_SIZE from 256 to something much larger — 8000 will be
>>>>> larger than the largest possible rge frame.  I have a theory that part
>>>>> of the problem you are encountering may be due to being out of buffers
>>>>> to loan up, and the screwy handling for that case.
>>>>
>>>> Tried this, but to no avail. It still goes into the sad place. Sadly,
>>>> this stupid box doesn't even have a serial port with which I could
>>>> provide you access. And I'm out of PCI-e slots to shove in an Intel NIC,
>>>> so the on-board Realtek POS is my last option. Well that, or using Linux
>>>> with Illumos as a KVM on top of it, but even saying that out loud leaves
>>>> a bad taste in my mouth...
>>>>
>>>> Cheers,
>>>> --
>>>> Saso
>>
>>
> 
> 
> ---
> illumos-discuss
> Archives: https://www.listbox.com/member/archive/182180/=now
> RSS Feed: https://www.listbox.com/member/archive/rss/182180/22722377-e9306e56
> Modify Your Subscription: 
> https://www.listbox.com/member/?member_id=22722377&id_secret=22722377-08ac87bf
> Powered by Listbox: http://www.listbox.com
> 


Re: [OmniOS-discuss] [discuss] Re: [networking] rge_intr troubles

2016-10-02 Thread Saso Kiselkov

I'm willing to test anything you suggest.

-- 
Saso

On 10/2/16 10:09 PM, Garrett D'Amore wrote:
> ah so maybe we need to change the logic so that the exhaustion of descriptors 
> takes precedence over the rx fifo overrun.  
> 
> Sent from my iPhone
> 
>> On Oct 2, 2016, at 12:45 PM, Saso Kiselkov  wrote:
>>
>> Thanks for the suggestions! Results below:
>>
>>> On 10/2/16 7:20 PM, Garrett D'Amore wrote:
>>> But as a first test, you can try calling rge_receive().  The simplest
>>> way I can see to do that is to OR in the value of RGE_NO_RXDESC_INT in
>>> the check at 1495. (Btw the ordering of the checks at 1488 and 1495 is
>>> suboptimal, as the rx interrupt should be the *hot* code path.)
>>
>> Sadly, this won't help. RGE_RX_INT is already a composite of
>> RX_OK_INT | RX_ERR_INT | NO_RXDESC_INT
>>
>>> One thing you might also try doing is changing the value in rge.h for
>>> RGE_RECV_COPY_SIZE from 256 to something much larger — 8000 will be
>>> larger than the largest possible rge frame.  I have a theory that part
>>> of the problem you are encountering may be due to being out of buffers
>>> to loan up, and the screwy handling for that case.
>>
>> Tried this, but to no avail. It still goes into the sad place. Sadly,
>> this stupid box doesn't even have a serial port with which I could
>> provide you access. And I'm out of PCI-e slots to shove in an Intel NIC,
>> so the on-board Realtek POS is my last option. Well that, or using Linux
>> with Illumos as a KVM on top of it, but even saying that out loud leaves
>> a bad taste in my mouth...
>>
>> Cheers,
>> --
>> Saso
>>
> 
> 
> 


Re: [OmniOS-discuss] [discuss] Re: [networking] rge_intr troubles

2016-10-02 Thread Saso Kiselkov

Thanks for the suggestions! Results below:

On 10/2/16 7:20 PM, Garrett D'Amore wrote:
> But as a first test, you can try calling rge_receive().  The simplest
> way I can see to do that is to OR in the value of RGE_NO_RXDESC_INT in
> the check at 1495. (Btw the ordering of the checks at 1488 and 1495 is
> suboptimal, as the rx interrupt should be the *hot* code path.)

Sadly, this won't help. RGE_RX_INT is already a composite of
RX_OK_INT | RX_ERR_INT | NO_RXDESC_INT
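
To make the point concrete: OR-ing a bit into a mask that already contains it changes nothing. A minimal sketch, in which only the composition RX_OK_INT | RX_ERR_INT | NO_RXDESC_INT comes from the statement above; the numeric bit values and the `or_changes_mask()` helper are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit values, for illustration only. */
#define RX_OK_INT     0x0001u
#define RX_ERR_INT    0x0002u
#define NO_RXDESC_INT 0x0010u
#define RGE_RX_INT    (RX_OK_INT | RX_ERR_INT | NO_RXDESC_INT)

/* Returns 1 if adding `bit` to `mask` would change which statuses match. */
static int
or_changes_mask(uint16_t mask, uint16_t bit)
{
	assert(bit != 0);	/* OR-ing in zero is never meaningful */
	return ((uint16_t)(mask | bit) != mask);
}
```

Here `or_changes_mask(RGE_RX_INT, NO_RXDESC_INT)` is 0, so a check against RGE_RX_INT already fires on NO_RXDESC_INT and the suggested change would be a no-op.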

> One thing you might also try doing is changing the value in rge.h for
> RGE_RECV_COPY_SIZE from 256 to something much larger — 8000 will be
> larger than the largest possible rge frame.  I have a theory that part
> of the problem you are encountering may be due to being out of buffers
> to loan up, and the screwy handling for that case.

Tried this, but to no avail. It still goes into the sad place. Sadly,
this stupid box doesn't even have a serial port with which I could
provide you access. And I'm out of PCI-e slots to shove in an Intel NIC,
so the on-board Realtek POS is my last option. Well that, or using Linux
with Illumos as a KVM on top of it, but even saying that out loud leaves
a bad taste in my mouth...

Cheers,
-- 
Saso

Re: [OmniOS-discuss] [discuss] Re: [networking] rge_intr troubles

2016-10-02 Thread Saso Kiselkov

Well what I know so far is that after a fixed number of packets
(probably after filling up the RX ring), we get a storm of
RX_FIFO_OVERFLOW_INT | NO_RXDESC_INT interrupts, probably because the
adapter is telling us that it filled up the RX ring and we didn't let it
know that we dequeued the old packets. Unfortunately, for the life of
me, I can't figure out how we're supposed to let it know that. I've been
staring at the drivers (both ours and FreeBSD's) for hours and to me it's
all just a jumble of "DMA sync this, write reg that".
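
For readers unfamiliar with the pattern under discussion: on many NICs each RX descriptor carries an ownership flag, and the driver "lets the adapter know" by flipping that flag back once it has consumed the packet. The sketch below models that generic pattern only. The structure layout, the `DESC_OWN_NIC` flag and all function names are hypothetical; none of it is taken from the rge driver or from FreeBSD's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE    4			/* hypothetical ring length */
#define DESC_OWN_NIC 0x80000000u	/* hypothetical "NIC owns slot" flag */

struct rx_desc { uint32_t flags; };

struct rx_ring {
	struct rx_desc desc[RING_SIZE];
	size_t drv_next;	/* next slot the driver will inspect */
	size_t nic_next;	/* next slot the modelled NIC will fill */
};

/* Hand every slot to the NIC so it has somewhere to put packets. */
static void
ring_init(struct rx_ring *r)
{
	assert(r != NULL);
	for (size_t i = 0; i < RING_SIZE; i++)
		r->desc[i].flags = DESC_OWN_NIC;
	r->drv_next = 0;
	r->nic_next = 0;
}

/* Modelled NIC receiving one packet; returns 0 if no slot is free. */
static int
nic_fill(struct rx_ring *r)
{
	struct rx_desc *d = &r->desc[r->nic_next];

	if ((d->flags & DESC_OWN_NIC) == 0)
		return (0);	/* the "no RX descriptor" condition */
	d->flags &= ~DESC_OWN_NIC;	/* slot now belongs to the driver */
	r->nic_next = (r->nic_next + 1) % RING_SIZE;
	return (1);
}

/* Driver side: consume filled slots and give each one back to the NIC. */
static size_t
driver_reap(struct rx_ring *r)
{
	size_t reaped = 0;

	while (reaped < RING_SIZE &&
	    (r->desc[r->drv_next].flags & DESC_OWN_NIC) == 0) {
		/* ...the packet would be passed up the stack here... */
		r->desc[r->drv_next].flags |= DESC_OWN_NIC;
		r->drv_next = (r->drv_next + 1) % RING_SIZE;
		reaped++;
	}
	return (reaped);
}
```

If the reap step never returns slots, every later `nic_fill()` fails once the ring is full. Whether that is what happens on this hardware is exactly the open question in this message.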

-- 
Saso

On 10/2/16 3:40 AM, Garrett D'Amore wrote:
> probably we should do something.  like reap the descriptors. i am afk but the 
> usual strategy is to treat these kinds of interrupts just like normal rx. 
> after that you should ack the interrupt of course. 
> 
> Sent from my iPhone
> 
>> On Oct 1, 2016, at 6:31 PM, Saso Kiselkov  wrote:
>>
>>> On 10/2/16 12:23 AM, Robert Mustacchi wrote:
>>>> On 10/1/16 15:15 , Saso Kiselkov wrote:
>>>>> On 10/1/16 11:45 PM, Dale Ghent wrote:
>>>>>
>>>>>> On Oct 1, 2016, at 3:36 PM, Saso Kiselkov  wrote:
>>>>>>
>>>>>> So I'm playing around with a box that has an on-board Realtek NIC and
>>>>>> periodically, about once every 2-5 minutes, the network just goes out to
>>>>>> lunch and stops responding to ping or attempts to send anything from
>>>>>> the box. I noticed that while doing so, the box is getting floored by
>>>>>> interrupts from the NIC, so I see tons of rge_intr activity and one CPU
>>>>>> core receiving about 16 interrupts per second (other cores are idle).
>>>>>
>>>>> One core getting all the interrupts is expected, as both these chips and 
>>>>> the driver do not support RSS.
>>>>>
>>>>> The key thing here is to see what rge_intr() is actually doing. It has 2 
>>>>> outcomes: It identifies the interrupt type, processes it, then returns to 
>>>>> the DDI that it was claimed. IF it doesn't identify the interrupt, 
>>>>> rge_intr() returns and reports unclaimed to the DDI.
>>>>>
>>>>> Knowing this info would be a good first step in figuring out what's going 
>>>>> on.
>>>>
>>>> Gah, I'm an idiot, it's actually a bitmask of two things:
>>>>
>>>> RX_FIFO_OVERFLOW_INT | NO_RXDESC_INT
>>>>
>>>> Apparently, we don't give it enough rx descriptors. Trying to now figure
>>>> out where to change that...
>>>
>>> There'll always be cases where we don't have enough rx descriptors for
>>> devices. Presumably we shouldn't actually care about receiving that
>>> interrupt. Do you happen to have a specification for the device handy?
>>>
>>> Given that we're not doing anything with the NO_RXDESC_INT, we probably
>>> should just mask it on the device if possible.
>>
>> Just as a general FYI, I'm dealing with 8168G version of the MAC.
>> FreeBSD does have a driver that supports it, but since the driver there
>> appears home-grown (similar to ours), trying to transplant it would be a
>> major undertaking. I'll try to identify the major differences between
>> the versions we support and the 8168G, but of course, this being
>> hardware, they are many and few of them make any logical sense.
>>
>> --
>> Saso
>>
> 
> 
> ---
> illumos-networking
> Archives: https://www.listbox.com/member/archive/182193/=now
> RSS Feed: https://www.listbox.com/member/archive/rss/182193/22721964-fe287663
> Modify Your Subscription: 
> https://www.listbox.com/member/?member_id=22721964&id_secret=22721964-d1c6dd60
> Powered by Listbox: http://www.listbox.com
> 


Re: [OmniOS-discuss] [discuss] Re: [networking] rge_intr troubles

2016-10-01 Thread Saso Kiselkov

On 10/1/16 11:45 PM, Dale Ghent wrote:
> 
>> On Oct 1, 2016, at 3:36 PM, Saso Kiselkov  wrote:
>>
>> So I'm playing around with a box that has an on-board Realtek NIC and
>> periodically, about once every 2-5 minutes, the network just goes out to
>> lunch and stops responding to ping or attempts to send anything from
>> the box. I noticed that while doing so, the box is getting floored by
>> interrupts from the NIC, so I see tons of rge_intr activity and one CPU
>> core receiving about 16 interrupts per second (other cores are idle).
> 
> One core getting all the interrupts is expected, as both these chips and the 
> driver do not support RSS.
> 
> The key thing here is to see what rge_intr() is actually doing. It has 2 
> outcomes: It identifies the interrupt type, processes it, then returns to the 
> DDI that it was claimed. IF it doesn't identify the interrupt, rge_intr() 
> returns and reports unclaimed to the DDI.
> 
> Knowing this info would be a good first step in figuring out what's going on.

Gah, I'm an idiot, it's actually a bitmask of two things:

RX_FIFO_OVERFLOW_INT | NO_RXDESC_INT

Apparently, we don't give it enough rx descriptors. Trying to now figure
out where to change that...
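
The value 0x50 that appears elsewhere in this thread is consistent with this reading if NO_RXDESC_INT is 0x10 and RX_FIFO_OVERFLOW_INT is 0x40. Those two values are assumptions inferred from the arithmetic (0x10 | 0x40 = 0x50), and `decode_int_status()` is a hypothetical helper; check the driver's headers before relying on either.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values: chosen so that the two bits together give 0x50. */
#define NO_RXDESC_INT        0x0010u
#define RX_FIFO_OVERFLOW_INT 0x0040u

/*
 * Write the names of the known bits set in `status` into `buf` and
 * return how many known bits were found.
 */
static int
decode_int_status(uint16_t status, char *buf, size_t len)
{
	int known = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	if (status & NO_RXDESC_INT) {
		strncat(buf, "NO_RXDESC_INT ", len - strlen(buf) - 1);
		known++;
	}
	if (status & RX_FIFO_OVERFLOW_INT) {
		strncat(buf, "RX_FIFO_OVERFLOW_INT ", len - strlen(buf) - 1);
		known++;
	}
	return (known);
}
```

Decoding the status as a set of bits rather than matching it against a single constant avoids mistaking a combination for an unrelated value that happens to share the same number.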

-- 
Saso

Re: [OmniOS-discuss] [discuss] Re: [networking] rge_intr troubles

2016-10-01 Thread Saso Kiselkov

On 10/1/16 11:45 PM, Dale Ghent wrote:
> 
>> On Oct 1, 2016, at 3:36 PM, Saso Kiselkov  wrote:
>>
>> So I'm playing around with a box that has an on-board Realtek NIC and
>> periodically, about once every 2-5 minutes, the network just goes out to
>> lunch and stops responding to ping or attempts to send anything from
>> the box. I noticed that while doing so, the box is getting floored by
>> interrupts from the NIC, so I see tons of rge_intr activity and one CPU
>> core receiving about 16 interrupts per second (other cores are idle).
> 
> One core getting all the interrupts is expected, as both these chips and the 
> driver do not support RSS.
> 
> The key thing here is to see what rge_intr() is actually doing. It has 2 
> outcomes: It identifies the interrupt type, processes it, then returns to the 
> DDI that it was claimed. IF it doesn't identify the interrupt, rge_intr() 
> returns and reports unclaimed to the DDI.
> 
> Knowing this info would be a good first step in figuring out what's going on.

It's a whole buttload of RT_93c46_COMMOND_REG (0x50) interrupts.

-- 
Saso

Re: [OmniOS-discuss] [discuss] Re: [networking] rge_intr troubles

2016-10-01 Thread Saso Kiselkov

On 10/1/16 11:45 PM, Dale Ghent wrote:
> 
>> On Oct 1, 2016, at 3:36 PM, Saso Kiselkov  wrote:
>>
>> So I'm playing around with a box that has an on-board Realtek NIC and
>> periodically, about once every 2-5 minutes, the network just goes out to
>> lunch and stops responding to ping or attempts to send anything from
>> the box. I noticed that while doing so, the box is getting floored by
>> interrupts from the NIC, so I see tons of rge_intr activity and one CPU
>> core receiving about 16 interrupts per second (other cores are idle).
> 
> One core getting all the interrupts is expected, as both these chips and the 
> driver do not support RSS.
> 
> The key thing here is to see what rge_intr() is actually doing. It has 2 
> outcomes: It identifies the interrupt type, processes it, then returns to the 
> DDI that it was claimed. IF it doesn't identify the interrupt, rge_intr() 
> returns and reports unclaimed to the DDI.
> 
> Knowing this info would be a good first step in figuring out what's going on.

Btw: currently working on modifying the driver to give us int_status
through dtrace, so I know what kind of interrupt we're dealing with.

-- 
Saso


Re: [OmniOS-discuss] rge_intr troubles

2016-10-01 Thread Saso Kiselkov

On 10/1/16 11:45 PM, Michael Rasmussen wrote:
> On Sat, 1 Oct 2016 23:32:32 +0200
> Saso Kiselkov  wrote:
> 
>>
>> Unfortunately, setting ip:dohwcksum=0 in /etc/system didn't help.
>>
> You don't have a left-over intel nic to plug in?

Sadly no. I might end up buying one, but first I'd like to try getting
this to work.

-- 
Saso





Re: [OmniOS-discuss] [discuss] Re: [networking] rge_intr troubles

2016-10-01 Thread Saso Kiselkov

On 10/1/16 11:45 PM, Dale Ghent wrote:
> 
>> On Oct 1, 2016, at 3:36 PM, Saso Kiselkov  wrote:
>>
>> So I'm playing around with a box that has an on-board Realtek NIC and
>> periodically, about once every 2-5 minutes, the network just goes out to
>> lunch and stops responding to ping or attempts to send anything from
>> the box. I noticed that while doing so, the box is getting floored by
>> interrupts from the NIC, so I see tons of rge_intr activity and one CPU
>> core receiving about 16 interrupts per second (other cores are idle).
> 
> One core getting all the interrupts is expected, as both these chips and the 
> driver do not support RSS.
> 
> The key thing here is to see what rge_intr() is actually doing. It has 2 
> outcomes: It identifies the interrupt type, processes it, then returns to the 
> DDI that it was claimed. IF it doesn't identify the interrupt, rge_intr() 
> returns and reports unclaimed to the DDI.
> 
> Knowing this info would be a good first step in figuring out what's going on.

Every time, we're returning through the bottom of the function,
returning DDI_INTR_CLAIMED. So the interrupt is meant for us.
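
The two outcomes Dale describes can be sketched like this. DDI_INTR_CLAIMED and DDI_INTR_UNCLAIMED are real illumos DDI return codes, but they are redefined locally here so the sketch compiles outside the kernel. `sketch_intr()` and its rule that "no enabled bit pending means not ours" are simplifying assumptions, not the rge driver's actual logic.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the kernel's DDI return codes. */
#define DDI_INTR_UNCLAIMED 0
#define DDI_INTR_CLAIMED   1

static unsigned
sketch_intr(uint16_t int_status, uint16_t int_mask)
{
	/* A zero mask would mean the device has no interrupts enabled. */
	assert(int_mask != 0);

	/* No enabled interrupt bit is pending: not this device's interrupt. */
	if ((int_status & int_mask) == 0)
		return (DDI_INTR_UNCLAIMED);

	/* ...identify and process the interrupt here... */

	return (DDI_INTR_CLAIMED);
}
```

Always falling through to the claimed return, as reported here, means the device really is asserting one of its enabled interrupt bits each time; the storm is not a shared-interrupt artifact.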

-- 
Saso

Re: [OmniOS-discuss] rge_intr troubles

2016-10-01 Thread Saso Kiselkov

On 10/1/16 11:32 PM, Michael Rasmussen wrote:
> On Sat, 1 Oct 2016 23:21:40 +0200
> Saso Kiselkov  wrote:
> 
>> On 10/1/16 11:13 PM, Michael Rasmussen wrote:
>>> hardware offloading  
>>
>> Just found the article that mentions dohwcksum. Set it to 0 and testing
>> now. Any other tunables I should be aware of?
>>
> LRO: https://docs.oracle.com/cd/E53394_01/html/E54790/goryb.html
>

LRO isn't in Illumos' rge driver, at least the option doesn't appear to
be in rge's dladm show-linkprop.

-- 
Saso




Re: [OmniOS-discuss] rge_intr troubles

2016-10-01 Thread Saso Kiselkov

On 10/1/16 11:13 PM, Michael Rasmussen wrote:
> On Sat, 1 Oct 2016 21:36:23 +0200
> Saso Kiselkov  wrote:
> 
>>
>> Not being a driver person myself, I have no idea where to look next.
>> I've attached prtconf -v and lspci -vvvxx output for PCI IDs and config
>> info. Also attached are screenshots of mpstat and dtrace profile at
>> 997hz running on the CPU that's bogged down with interrupts.
>>
> Have you tried to disable csum and hardware offloading?

Unfortunately, setting ip:dohwcksum=0 in /etc/system didn't help.

-- 
Saso




Re: [OmniOS-discuss] rge_intr troubles

2016-10-01 Thread Saso Kiselkov

On 10/1/16 11:13 PM, Michael Rasmussen wrote:
> hardware offloading

Just found the article that mentions dohwcksum. Set it to 0 and testing
now. Any other tunables I should be aware of?

Cheers,
-- 
Saso




Re: [OmniOS-discuss] Direct SAS or Expander?

2016-08-18 Thread Saso Kiselkov

On 8/19/16 1:02 AM, Bob Friesenhahn wrote:
> I am looking at a 16-bay SuperMicro chassis.  There is the option of
> using a SAS HBA with 16 channels (e.g. Avago SAS 9300-16i) and no
> expander or a SAS HBA with 4 channels and an expander.  Most drives
> would be SAS but I might want to fit a couple of SATA SSDs.
> 
> Is there a strong technical reason (performance, reliability, uptime) to
> prefer one or the other with OmniOS and zfs?
> 
> The main technical issue I am already aware of is that one should not
> put SATA devices behind an expander.
> 
> I am leaning toward the 16 channels and no expander solution since it
> feels better from a failure-mode standpoint, because it lessens
> contention, and because it should allow use of SATA SSDs.
> 
> Bob

Direct. SATA SSDs and SAS drives on the same expander is asking for
trouble. I've had experience with locked-up expanders, requiring a hard
power cycle of the enclosure. Avoid like the plague.

-- 
Saso

Re: [OmniOS-discuss] zfs compression limits write throughput to 100MB/sec

2015-09-14 Thread Saso Kiselkov

On 9/14/15 9:40 PM, Matthew Lagoe wrote:
> Also I believe the compression is not threaded as well as it could be so you
> may be limited by the single core performance of your machine.

It is multi-threaded.

Cheers,
-- 
Saso


Re: [OmniOS-discuss] zfs compression limits write throughput to 100MB/sec

2015-09-14 Thread Saso Kiselkov

On 9/14/15 9:18 PM, Doug Hughes wrote:
> That does seem to keep performance at much closer to parity. It still
> seems about 70-80% of peak vs what I was seeing before, but not that
> 100MB/sec bottleneck.

Well, that's the reality of compression. Even the compressibility check
is not free, but it's a lot less of an impact with lz4 than with lzjb.

Cheers,
-- 
Saso

Re: [OmniOS-discuss] zfs compression limits write throughput to 100MB/sec

2015-09-14 Thread Saso Kiselkov

On 9/14/15 9:05 PM, Doug Hughes wrote:
> Probably something for Illumos, but you guys may have seen this or may
> like to know.
> 
> I've got a 10g connected Xyratex box running OmniOS, and I noticed that
> no matter how many streams (1, 2, 3) I only get 100MB/sec write
> throughput and it just tops out. Even with 1 stream. This is with the
> default lzjb compression on (fast option).
> 
> I turned off compression and have 2 streams running now and am getting
> about 250-600MB/sec in aggregate. Much better!
> 
> The compress ratio was only 1.02x - 1.03x so it's no great loss on this
> data. I just thought the 100MB/sec speed limit was interesting.

Try setting compression=lz4. It should perform much, much better than
lzjb on incompressible data.

-- 
Saso

Re: [OmniOS-discuss] CPU advice

2015-06-08 Thread Saso Kiselkov

On 6/8/15 8:08 PM, Michael Rasmussen wrote:
> Hi all,
> 
> I am considering to build a new small ZFS storage server which
> will host 4 to 8 SATA disks connected on a HBA and SSD for log and
> cache connected through on-board SATA 3. Motherboard is Asrock
> E3C224 with 2x8 GB ECC RAM.
> 
> I have these 3 CPUs in consideration:
> 1) Pentium G3450T
> 2) Core i3 I3-4130T
> 3) Xeon E3-1220V3
> 
> Taking the price tag and power consumption into consideration what 
> would be the optimal choice?
> 
> What performance can I expect from the above listed CPU's?
> 
> If performance is also taken into consideration will that change
> the optimal choice?

Any one of these is a fine choice, even the G3450T is plenty fast for
most "small user" needs and will happily saturate a couple of 1Gbit/s
links with data. And all of the models you listed support ECC - wise
choice. For me, I'd get the G3450T and spend the extra money on an
extra 16GB of memory, or maybe a decent SAS HBA.

Cheers,
-- 
Saso

Re: [OmniOS-discuss] What repos do people use to build a *AMP server?

2015-05-08 Thread Saso Kiselkov

On 5/8/15 6:56 PM, Schweiss, Chip wrote:
> I've done really well with the OpenCSW packages on OmniOS.

Thanks, seems to be working pretty well. Still, lamentable that there are
no IPS mirrors around (although, given how IPS can be obnoxious, I'm not
surprised).

Cheers,
-- 
Saso


[OmniOS-discuss] What repos do people use to build a *AMP server?

2015-05-08 Thread Saso Kiselkov

I've decided to try and update my r151006 box to something newer, seeing
as r151014 just came out and it's supposed to be LTS. Trouble is, I'm
trying to build a *AMP box and I can't find any prebuilt packages for it
in any of these repos:
http://omnios.omniti.com/wiki.php/Packaging
What do you guys use for getting pre-built software? Do all people here
just roll their own?

Also, allow me to say, I *hate* consolidations and the way they lock
accessible package versions. Where are the days when OSes used to be
backwards-compatible?

Cheers,
-- 
Saso

Re: [OmniOS-discuss] Active-Active vSphere

2014-11-28 Thread Saso Kiselkov

On 11/28/14 3:34 PM, Rune Tipsmark wrote:
> Okay, I noticed ALUA support is not enabled by default on OmniOS. How do I
> enable that?

The process is complicated, not very well documented and it isn't meant
for your scenario anyway. If you only have one storage node, then you
don't need ALUA, even if you have multiple connections to the node. So
no need to worry about it.

Cheers,
-- 
Saso


Re: [OmniOS-discuss] Active-Active vSphere

2014-11-27 Thread Saso Kiselkov

On 11/27/14 11:40 PM, Rune Tipsmark wrote:
> so to simplify, say we have one esxi host that has two FC ports, one 
> omnios/zfs server has two FC ports as well.
> Should it run ALUA with Round Robin instead of the default ALUA with MRU 
> (most recently used). RR has traffic on both paths (says Active I/O on both) 
> and MRU only on one...

Ah, now this is a different story altogether, if it's just one storage
head node. In that case, Round-Robin will of course work exactly as
expected, be it over FC or iSCSI and any number of links. The important
thing is that the backing ZFS pool is only imported on one machine.

Cheers,
-- 
Saso

Re: [OmniOS-discuss] Active-Active vSphere

2014-11-27 Thread Saso Kiselkov

On 11/27/14 3:35 PM, Rune Tipsmark wrote:
> Hi guys,
> 
> Does anyone know if Active/Active and Round Robin is supported from
> vSphere towards OmniOS ZFS on Fiber Channel?

The short answer is: yes, but you wouldn't want to employ Round-Robin on
it. A ZFS pool can only be imported on a single node, never
simultaneously on two or more. However, using ALUA it's possible to make
the LUs on it visible from two OmniOS nodes - this is NOT round robin,
though. The initiator will see two paths to the LU, but only one should
be active at any one time (the one that has the pool holding the LU
imported). Access to the LU over the secondary target will be possible,
but slow. Upon failover, the secondary would grab the pool and become
the preferred ALUA path, so it all works out OK. Google "ALUA" for more
on the theory behind it.

It is complicated to set up, though, so be aware of that.

Cheers,
-- 
Saso

Re: [OmniOS-discuss] SolarFlare SFXGE with OmniOS

2014-09-25 Thread Saso Kiselkov

On 9/25/14, 9:35 PM, Doug Hughes wrote:
> With non-debug driver:
> 9.71 gbits/sec inbound
> 8.54 gbits/sec outbound
> 
> 512k tcp window size. same 10g switch stack, but I can't guarantee that
> it's not traversing the stack 40g link between the two stack members.
> one side is a Sol10U10 box, the other is this x4440 running OmniOS. 
> Both have Solarflare cards setup in LACP to the switch stack.
> 
> I'd say this is ok to release/bundle!
> 
> (I'd be happy to volunteer a sfxge.conf for this, also we should use the
> Solaris postinstall or a variant of it to do the correct add-drv and
> driver_aliases stuff.)

Great. If you have customizations to sfxge.conf that are supposed to be
included, post 'em. Illumos doesn't really deal with packaging, so that bit
is going to be dealt with by the distro maintainers.

Cheers,
-- 
Saso

Re: [OmniOS-discuss] SolarFlare SFXGE with OmniOS

2014-09-19 Thread Saso Kiselkov

On 9/20/14, 1:23 AM, Mark wrote:
> On 20/09/2014 5:13 a.m., Dan McDonald wrote:
>>
>> On Sep 19, 2014, at 11:44 AM, Doug Hughes  wrote:
>>
>>> Is anybody using Solarflare 10G cards with OmniOS. I have a 2010
>>> vintage of Openindiana (yeah, I should really upgrade that - it's a
>>> test box) running with the Sol10 production sfxge driver and working
>>> flawlessly for years. I tried the sfxge package for Sol11 and for
>>> Sol10 on OmniOS r151012 (yeah, I know it's not quite ready) and it
>>> crashes and core dumps as soon as I send any traffic on the nic, even
>>> a ping.
>>>
>>> Curious if anybody else is using them and having luck?
>>
>> Without source you'll be out of luck.
> 
> 
> Source at http://cr.illumos.org/~webrev/rincebrain/illumos-sfxge/
> 

Ok, got it to build cleanly, code at:

https://github.com/skiselkov/illumos-gate/tree/sfxge

Unique commits in that branch:

1c3fe595 - initial commit of the sfxge pretty much as-is from the webrev
(plus a few formatting fixes to get git to shut up about
trailing whitespace)

efb39dd8 - build & lint cleaning to get it to build with our gcc, plus
one bugfix noted below

I'm not going to bother with cstyle fixes for foreign code, there's 1200
offending lines and it would jeopardize upstream-mergeability anyway.

Willing testers with sfxge hardware who don't wanna muck around with
manual building can grab a pre-built version at:
http://37.153.99.61/sfxge.tar.gz. To install & use, do:

# tar xzf sfxge.tar.gz
# beadm create sfxge_test
# beadm mount sfxge_test
Mounted successfully on: ''
# cp -r sfxge/sfxge.conf sfxge/debug/* /kernel/drv
# bootadm update-archive -R 
# reboot -fe sfxge_test

And take her out for a good spin. If testing proves this card to work
well, I can post a code review & RTI.

==

Lint found one potential bug (which I've fixed) that might be worth
reporting back to Solarflare:

sfxge_gld_v3.c:680: contains a line like this:

if ((rc = sfxge_ev_moderation_set(sp, (unsigned int) val) != 0))

The closing parentheses are misplaced: "rc" will get assigned the
result of the "sfxge_ev_moderation_set() != 0" comparison. I'm fairly
confident this is not what the author intended.
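
To make the precedence issue concrete, here is a small sketch in POSIX shell
arithmetic, which follows C's operator precedence. It is only an analogy for
the C line quoted above: the value 22 and the variable names are stand-ins
and are not taken from the driver.

```shell
#!/bin/sh
# Stand-in for a non-zero error code returned by a function (assumed value).
ret=22

# Same shape as the flagged line: "!=" binds tighter than "=",
# so rc_buggy receives the result of the comparison (1), not 22.
: $(( rc_buggy = ret != 0 ))

# Presumably intended shape: assign first, then compare.
: $(( (rc_fixed = ret) != 0 ))

echo "buggy rc=$rc_buggy, fixed rc=$rc_fixed"
```

With the misplaced parenthesis the caller can still tell that something
failed, but the actual error code is lost.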

Cheers,
-- 
Saso

Re: [OmniOS-discuss] SolarFlare SFXGE with OmniOS

2014-09-19 Thread Saso Kiselkov

On 9/20/14, 1:14 AM, Mark wrote:
> On 20/09/2014 5:13 a.m., Dan McDonald wrote:
>>
>> On Sep 19, 2014, at 11:44 AM, Doug Hughes  wrote:
>>
>>> Is anybody using Solarflare 10G cards with OmniOS. I have a 2010
>>> vintage of Openindiana (yeah, I should really upgrade that - it's a
>>> test box) running with the Sol10 production sfxge driver and working
>>> flawlessly for years. I tried the sfxge package for Sol11 and for
>>> Sol10 on OmniOS r151012 (yeah, I know it's not quite ready) and it
>>> crashes and core dumps as soon as I send any traffic on the nic, even
>>> a ping.
>>>
>>> Curious if anybody else is using them and having luck?
>>
>> Without source you'll be out of luck.
>>
>> The Generic Lan Driver (GLDv3) interface is *slightly* different on
>> recent-vintage illumos.  A modern OI (like hipster) would likely
>> exhibit the same problems.  If you want Solarflare cards, you'll need
>> to get them to cough up the source.
>>
>> Dan
>>
> See https://www.illumos.org/issues/4057

Appears to have stalled despite an offer of help from Robert. Maybe it
wasn't production ready? Does anybody know if Rich published the source?
Maybe I can get it built so people with sfxge hardware can get testing.

Cheers,
-- 
Saso

Re: [OmniOS-discuss] announcement znapzend a new zfs backup tool

2014-07-29 Thread Saso Kiselkov

On 7/29/14, 5:50 PM, Tobias Oetiker wrote:
> Just out:
> 
>  ZnapZend a Multilevel Backuptool for ZFS
> 
> It is on Github. Check out
> 
>  http://www.znapzend.org

Neat, especially the feature that the backup config is part of a
dataset's properties. Very cool.

-- 
Saso


Re: [OmniOS-discuss] HP Microserver G8

2014-06-25 Thread Saso Kiselkov

On 6/25/14, 1:33 PM, Olaf Marzocchi wrote:
> If you want to increase the lifespan, just write less data, without worrying 
> about the leveling.

I'd second that. Just set compress=lz4 on your rpool and be done with it.
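
For reference, a minimal sketch of that setting. The pool name rpool is the
one mentioned above; the set_lz4 helper and its DRY_RUN switch are made up
for illustration so the command can be previewed before it is run.

```shell
#!/bin/sh
# set_lz4 DATASET: hypothetical helper that prints the command by default
# and only executes it when DRY_RUN=0 is set in the environment.
set_lz4() {
    dataset=$1
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "zfs set compression=lz4 $dataset"
    else
        zfs set compression=lz4 "$dataset"
    fi
}

set_lz4 rpool
```

"compress" is accepted as an alias for the "compression" property. The
setting only applies to blocks written after it is changed; existing data
stays as it was.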

-- 
Saso


Re: [OmniOS-discuss] HP Microserver G8

2014-06-25 Thread Saso Kiselkov

On 6/25/14, 1:12 PM, Nicolas Di Gregorio wrote:
> I was able to install to the microsd card too. Does anyone have a kind of
> best practice to maximize the lifetime of it? Like moving logs somewhere
> else etc.

With ZFS and its copy-on-write nature, I wouldn't really worry about it
all that much. ZFS doesn't spread the load completely evenly for sure,
but it does help somewhat.

-- 
Saso

Re: [OmniOS-discuss] Status of TRIM support?

2014-06-05 Thread Saso Kiselkov

On 6/5/14, 5:06 AM, Dan Swartzendruber wrote:
> 
>> Second this. The DC S3700 are very good.
> 
> Okay, so far so good.  100MB s3700 came today.  threw it in the tank pool
> as log device, and set sync=standard.  Re-ran crystaldiskmark and get
> 93MB/sec writes.  Given that reads are running 106MB/sec, I think it's
> time to call this a win...  Thanks all for the advice...

Glad to hear that. Just remember to regularly do zpool scrub since a log
device is pretty much write-only until it is needed to save your behind
after a crash/power outage, so if you don't regularly check its health,
you may not know if it's any good.

Cheers,
-- 
Saso

Re: [OmniOS-discuss] Status of TRIM support?

2014-05-29 Thread Saso Kiselkov

On 5/29/14, 5:48 PM, Dan McDonald wrote:
> 
> On May 29, 2014, at 11:25 AM, Doug Hughes  wrote:
>>
>> The higher price is the reason I tend to prefer the 320 series that come in 
>> around $1/GB and have smaller sizes available. I use them for OS + slog.
> 
> What about the S3500?  I've heard that's more the drop-in replacement for the 
> 320 series.
> 
> (ObDisclosure:  I use a pair as part-rpool/part-mirrored-slog for my home 
> server.  Blog post about HDC2.0 coming RSN.)

The DC S3500 is reported to have only about 1/3 - 1/2 the performance of
the DC S3700, see:
http://www.anandtech.com/show/7065/intel-ssd-dc-s3500-review-480gb-part-1/3
May be more than enough for SOHO, though.

Cheers,
-- 
Saso

Re: [OmniOS-discuss] Status of TRIM support?

2014-05-28 Thread Saso Kiselkov

On 5/28/14, 4:08 PM, Schweiss, Chip wrote:
> Intel has several SATA SSDs with proper super-cap protected caches that
> make good log devices.

I'd recommend looking at an Intel DC S3700. The 200 GB or 400 GB
varieties promise ~3 4k random write IOPS and actually seem to deliver:
http://www.anandtech.com/show/7065/intel-ssd-dc-s3500-review-480gb-part-1/3
They're also not so expensive that it'll break your bank:
http://www.amazon.com/S3700-Internal-Solid-State-Drive/dp/B00A8NWD68

> If you want to do HA, you need to look at an all SAS solution, which
> is not cheap.  

I've had good results with SATA SSDs sitting behind LSI SAS interposers
in dual-path SAS JBODs. YMMV though.

Cheers,
-- 
Saso

Re: [OmniOS-discuss] Status of TRIM support?

2014-05-28 Thread Saso Kiselkov

On 5/28/14, 3:51 PM, Dan Swartzendruber wrote:
> (merging comments to Saso and Jim)
> 
> I don't think I mentioned my environment - if not, my apologies.  This is
> a SOHO/Lab setup, so things like zeusram are non-starters.  The basic
> network infrastructure is gigabit, so iSCSI ZIL would suck badly, I
> suspect.  As far as over-provisioning the 840PRO, I have it sliced for
> 16GB.  Once it's been running for awhile, I will re-run the disk
> benchmark.  I understand the 840PRO doesn't have a supercap - this was
> basically just a performance analysis to see how it stacks up compared to
> sync=disabled and on-pool ZIL.  If I go this route, I will need to look
> for a decent/affordable unit with supercap.  One other test I can try is
> with a 15K 76GB SAS 2.5-inch drive I salvaged from a dead server.  It
> should have about 1/2 the latency of a 7200rpm sata drive, and if so would
> get me up to about 40MB/sec, which is still not good, but better than
> on-pool ZIL.  I'll find out later.  I have googled a fair amount and there
> seems to be 'work in progress' for TRIM support for ZoL and illumos, but
> no real indication I could find as to when either might support it.

If you want IOPS performance out of HDDs, short-stroke the hell out of
them. The 76 GB drive should be reduced to something like 2-4 GB so that
only the outermost tracks are used. But a good power-protected SSD can
nowadays be had for relatively little and will absolutely crush the
short-stroked HDD in throughput.

Cheers,
-- 
Saso

Re: [OmniOS-discuss] Status of TRIM support?

2014-05-28 Thread Saso Kiselkov

On 5/28/14, 3:11 AM, Dan Swartzendruber wrote:
> 
> So I've been running with sync=disabled on my vsphere NFS datastore.  I've
> been willing to do so because I have a big-ass UPS, and do hourly backups.
>  But, I'm thinking of going to an active/passive connection to my JBOD,
> using Saso's blog post on zfs zfs-create.blogspot.com.  Here's why I think
> I can't keep using sync=disabled (I would love to have my logic sanity
> checked.)  If you switch manually from host A to B, all is well, since
> before host A exports the pool, any pending writes will be completed (so
> even though we lied to vsphere, it's okay.)  On the other hand, if host A
> crashes/hangs and host B takes over, forcibly importing the pool, you
> could end up with the following scenario: vsphere issues writes for blocks
> A, B, C, D and E.  A and B have been written.  C and D were sent to host
> A, and ACKed, so vsphere thinks all is well.  Host A has not yet committed
> blocks C and D to disk.  Host B imports the pool, assumes the virtual IP
> for the NFS share and vsphere reconnects to the datastore.  Since it
> thinks it has written blocks A-D, it then issues a write for block E. 
> Host B commits that to disk.  vsphere thinks blocks A-E were written to
> disk, when in fact, blocks C and D were not.  Silent data corruption, and
> as far as I can tell, no way to know this happened, so if I ever did have
> a forced failover, I would have to rollback every single VM to the last
> known, good snapshot.  Anyway, I decided to see what would happen
> write-wise with an SLOG SSD.  I took a samsung 840PRO used for l2arc and
> made that a log device.  I ran crystaldiskmark before and after.  Prior to
> the SLOG, I was getting about 90MB/sec (gigabit enet), which is pretty
> good.  Afterward, it went down to 8MB/sec!  I pulled the SSD and plugged
> it into my windows 7 workstation, formatted it and deleted the partition,
> which should have TRIM'ed it.  I reinserted it as SLOG and re-ran the
> test.  50MB/sec.  Still not great, but this is after all an MLC device,
> not SLC, and that's probably 'good enough'.  Looking at open-zfs.org, it
> looks like out of illumos, freebsd and ZoL, only freebsd has TRIM now.  I
> don't want to have to re-TRIM the thing every few weeks (or however long
> it takes).  Does over-provisioning help?

Hi Dan,

First off, the Samsung 840 Pro apparently doesn't have power loss
protection, so DON'T use it for slog (ZIL). Use some enterprise-class
SSD that has proper protection of its DRAM contents. Even better, if you
have the cash to spend, get a ZeusRAM - these are true NVRAM devices
with extremely low latency.

If you use an SSD for slog, do a secure erase on it and then partition
it so that you leave something like 1/3 of it unused and untouched by
the OS. Evidence suggests that that might dramatically improve write
IOPS consistency:
http://www.anandtech.com/show/6489/playing-with-op
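
As a rough worked example of the "leave about 1/3 unused" advice, the
arithmetic below computes how large the partition may be. The 240000 MiB
capacity is an assumed figure, not the size of any drive discussed here, and
usable_mib is a made-up helper name.

```shell
#!/bin/sh
# usable_mib TOTAL_MIB: partition size that leaves about one third of the
# device unallocated (integer arithmetic, rounds down).
usable_mib() {
    echo $(( $1 * 2 / 3 ))
}

total=240000                      # assumed raw capacity in MiB
part=$(usable_mib "$total")
echo "partition: $part MiB, left untouched: $(( total - part )) MiB"
```

The untouched remainder only helps if it has never been written since the
secure erase, which is why the erase comes first.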

Cheers,
-- 
Saso

Re: [OmniOS-discuss] OmniOS Panic on high ZFS Write Load

2014-05-15 Thread Saso Kiselkov

On 5/15/14, 11:07 AM, Rune Tipsmark wrote:
> My server panics on high write load using VMware to provision thick disk
> to the LU over infiniband.
> 
> I get this error here http://i.imgur.com/fxk79zJ.png every time I put
> over 1.5GB/sec load on my ZFS box.
> 
> Tried various disks, controllers, omnios distributions, OI distributions
> etc.
> 
> Always the same, easy to reproduce.
> 
> Googled for ever to find anything, but nothing.
> 
> Does anyone have any idea? I don’t really want to abandon ZFS just yet.

Your system stored a crash dump on your dump device. Have a look here on
what to do with it and how to extract some meaningful info from it so
that developers can help you:
http://wiki.illumos.org/display/illumos/How+To+Report+Problems

Cheers,
-- 
Saso

Re: [OmniOS-discuss] Overheating faults with ST4000NM0023

2014-04-22 Thread Saso Kiselkov

On 4/22/14, 10:31 PM, Schweiss, Chip wrote:
> 
> On Tue, Apr 22, 2014 at 3:17 PM, Saso Kiselkov wrote:
> 
> 
> I know, but if I understand it correctly, I need to not only disable a
> particular path, I need to disable mpath support entirely to get
> sg_write_buffer to talk to mpt_sas directly, instead of going through
> the scsi_vhci glob in the middle (which, presumably, is what's causing
> this problem). If I'm misunderstanding this, please do set me straight.
> 
> Cheers,
> --
> Saso
> 
> 
> Actually no.  Disabling a physical path works too.   That is how I
> stumbled upon the MP issue.  I plugged one of my paths into a second
> server to attempt using Linux to flash the firmware.   When the flash
> started working from the primary server, I never loaded Linux in the
> second server.
> 
> I think the problem is actually in the disk accepting firmware via
> multipath not so much the OS.  The OS throws the error when a message
> down a second path gets rejected by the drive. 

Still no luck, though it's possible I'm doing it wrong:

# mpathadm disable path -l /dev/rdsk/c9t5000C500578F774Bd0s2 \
  -i w5b8ca3a0e5029c00 -t w5000c500578f774a

# mpathadm show lu /dev/rdsk/c9t5000C500578F774Bd0s2
Logical Unit:  /dev/rdsk/c9t5000C500578F774Bd0s2
mpath-support:  libmpscsi_vhci.so
Vendor:  SEAGATE
Product:  ST2000NM0023
Revision:  0003
Name Type:  unknown type
Name:  5000c500578f774b
Asymmetric:  no
Current Load Balance:  round-robin
Logical Unit Group ID:  NA
Auto Failback:  on
Auto Probing:  NA

Paths:
Initiator Port Name:  w5b8ca3a0e5029c00
Target Port Name:  w5000c500578f774a
Override Path:  NA
Path State:  OK
Disabled:  yes

Initiator Port Name:  w5b8ca3a0e5029c00
Target Port Name:  w5000c500578f7749
Override Path:  NA
Path State:  OK
Disabled:  no

Target Ports:
Name:  w5000c500578f774a
Relative ID:  0

Name:  w5000c500578f7749
Relative ID:  0

# sg_write_buffer -v --in=MegalodonES3-SAS-STD-0004.LOD \
  --length=1625600 --mode=5 /dev/rdsk/c9t5000C500578F774Bd0
Write buffer cmd: 3b 05 00 00 00 00 18 ce 00 00
ioctl(USCSICMD) failed with os_err (errno) = 22
write buffer: pass through os error: Invalid argument
Write buffer failed res=-1

The situation is the same regardless of which path I disable. At the
point of the sg_write_buffer, I also get a single SCSI error logged by
"iostat -E", so it's clear there's something wrong going on on the SCSI
bus. I suspect it might have something to do with what you mentioned,
but I'm just no SCSI guru to figure this out.
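
For what it's worth, the CDB that sg_write_buffer -v prints can be decoded
by hand. The sketch below assumes the standard 10-byte WRITE BUFFER layout
(opcode, mode, buffer ID, 3-byte buffer offset, 3-byte parameter list
length, control). It only does the arithmetic and does not talk to the
drive.

```shell
#!/bin/sh
# CDB bytes as printed in the verbose output above.
set -- 3b 05 00 00 00 00 18 ce 00 00

opcode=$1                       # 0x3b = WRITE BUFFER
mode=$(( 0x$2 & 0x1f ))         # low 5 bits of byte 1
# Bytes 6..8 hold the parameter list length, big-endian.
length=$(( (0x$7 << 16) | (0x$8 << 8) | 0x$9 ))

echo "opcode=0x$opcode mode=$mode length=$length"
```

The decoded length matches --length=1625600 and the mode matches --mode=5,
so the command itself appears to be encoded as requested; the EINVAL would
then be coming from somewhere further down the stack.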

Cheers,
-- 
Saso

Re: [OmniOS-discuss] Overheating faults with ST4000NM0023

2014-04-22 Thread Saso Kiselkov

On 4/22/14, 10:08 PM, Richard Elling wrote:
> On Apr 22, 2014, at 10:58 AM, Saso Kiselkov wrote:
> 
>> On 4/22/14, 5:03 PM, Schweiss, Chip wrote:
>>> Are you sure you have SAS multipath disabled on the disk you are trying
>>> to flash?
>>>
>>> I couldn't get these to flash at all with MP enabled.  I too kept
>>> getting OS related errors.
>>>
>>> For one system I did an stmsboot -d, for another I just pulled one of
>>> the SAS cables to each JBOD.
>>
>> Oh, you're right, hadn't considered that. I'll have to try this out,
>> even though it means downtime.
> 
> mpathadm(1m) allows you to enable/disable paths on the fly, without
> pulling cables.

I know, but if I understand it correctly, I need to not only disable a
particular path, I need to disable mpath support entirely to get
sg_write_buffer to talk to mpt_sas directly, instead of going through
the scsi_vhci glob in the middle (which, presumably, is what's causing
this problem). If I'm misunderstanding this, please do set me straight.

Cheers,
-- 
Saso

Re: [OmniOS-discuss] Overheating faults with ST4000NM0023

2014-04-22 Thread Saso Kiselkov

On 4/22/14, 5:03 PM, Schweiss, Chip wrote:
> Are you sure you have SAS multipath disabled on the disk you are trying
> to flash?
> 
> I couldn't get these to flash at all with MP enabled.  I too kept
> getting OS related errors.
> 
> For one system I did an stmsboot -d, for another I just pulled one of
> the SAS cables to each JBOD.

Oh, you're right, hadn't considered that. I'll have to try this out,
even though it means downtime.

Cheers,
-- 
Saso

Re: [OmniOS-discuss] Overheating faults with ST4000NM0023

2014-04-22 Thread Saso Kiselkov

On 4/22/14, 11:53 AM, Michael Rasmussen wrote:
> On Tue, 22 Apr 2014 11:36:03 +0200
> Saso Kiselkov  wrote:
> 
>>
>> Any ideas on what to do next?
>>
> Could you boot the system from a live linux distro and run the tools
> from it? Maybe support for linux is better.
> 
> I can recommend systemrescuecd (based on gentoo).

I can't, the system is in production.

-- 
Saso


Re: [OmniOS-discuss] Overheating faults with ST4000NM0023

2014-04-22 Thread Saso Kiselkov

On 4/18/14, 10:49 PM, Schweiss, Chip wrote:
> I used Santools, which is a licensed product.  
> 
> From what I understand lsiutil and sg_buffer_write from sg3-utils can do
> it too.  The mode for sg_buffer_write may need to be set to 7 instead of
> 5 as stated in the firmware docs.
>

Sadly, I had no luck with either lsiutil or sg_write_buffer from
sg3-utils. lsiutil is only for older MPT HBAs (I have an MPT 2.0 one)
and sg_write_buffer fails with the following error:

# sg_write_buffer -v --in=MegalodonES3-SAS-STD-0004.LOD --length=1625600 \
  --mode=5 /dev/rdsk/c9t5000C500578F774Bd0
Write buffer cmd: 3b 05 00 00 00 00 18 ce 00 00
ioctl(USCSICMD) failed with os_err (errno) = 22
write buffer: pass through os error: Invalid argument
Write buffer failed res=-1

I also tried the following device names:
  /dev/rdsk/c9t5000C500578F774Bd0p0
  /dev/dsk/c9t5000C500578F774Bd0
  /dev/dsk/c9t5000C500578F774Bd0p0

The OS also printed the following error:

WARNING: mpt_sas: coding error detected, the driver is using
ddi_dma_attr(9S) incorrectly. There is a small risk of data corruption
in particular with large I/Os. The driver should be replaced with a
corrected version for proper system operation. To disable this warning,
add 'set rootnex:rootnex_bind_warn=0' to /etc/system(4).

Staring at the code near usr/src/uts/i86pc/io/rootnex.c:3305, this means
that the driver can't submit a DMA job this large, which means that I
can't really fix this at all (this is really way outside of my field).

Any ideas on what to do next?

Cheers,
-- 
Saso

Re: [OmniOS-discuss] Overheating faults with ST4000NM0023

2014-04-18 Thread Saso Kiselkov

On 4/18/14, 10:49 PM, Schweiss, Chip wrote:
> I used Santools, which is a licensed product.  
> 
> From what I understand lsiutil and sg_buffer_write from sg3-utils can do
> it too.  The mode for sg_buffer_write may need to be set to 7 instead of
> 5 as stated in the firmware docs.

Hey cool, didn't know sg3_utils was compilable on non-Linux systems.
Will try it out, thanks!

-- 
Saso


Re: [OmniOS-discuss] Overheating faults with ST4000NM0023

2014-04-18 Thread Saso Kiselkov

On 4/18/14, 9:23 PM, Schweiss, Chip wrote:
> I've flashed 0004 to some of my Constellations so far.   The drives are
> now set at a reference temperature of 60C which is much better than 40C.  
> 
> I had to disable multipathing to get these disks to flash.   I'm not
> sure if this is an issue with the drive or the Supermicro JBOD.  
> 
> I disabled multipathing and I'm getting them to flash.

I'm still trying to figure out how to flash them, as the flashing tools
only seem to be available for Linux :(

Guess I'm gonna have to ask the customer to take the machine offline for
a while.

Cheers,
-- 
Saso

Re: [OmniOS-discuss] Overheating faults with ST4000NM0023

2014-04-17 Thread Saso Kiselkov

On 4/17/14, 6:27 PM, Schweiss, Chip wrote:
> 
> Use the short form of the S/N: Z1Y18H7V

Ok, thanks, didn't know there were two forms... (FMA only prints one).

-- 
Saso


Re: [OmniOS-discuss] Overheating faults with ST4000NM0023

2014-04-17 Thread Saso Kiselkov

On 4/17/14, 5:40 PM, Schweiss, Chip wrote:
> You can get the Seagate firmwares from this link:
> 
> https://apps1.seagate.com/downloads/request.html
> 
> Seems they don't link to this on their site any more; I found it in an
> old email from their site.

I found the same form, but the damn thing can't find my drive by S/N
(Z1Y18H7VC4196NRF).

Cheers,
-- 
Saso

Re: [OmniOS-discuss] Overheating faults with ST4000NM0023

2014-04-15 Thread Saso Kiselkov

Hi,

I've hit this exact same issue on my recent SEAGATE ST2000NM0023 drives.
Can you please direct me to where I can get the firmware package?
Perhaps we could also post the link publicly, so that people can find it
through google or some such method.

Thanks!

Best wishes,
-- 
Saso

On 2/13/14, 11:18 AM, Thibault VINCENT wrote:
> On 02/12/2014 09:59 PM, Steamer wrote:
>> Did you ever find a solution to the overheating faults with the
>> ST4000NM0023?
>>  
>> I'm currently having the exact same issue with ST1000NM0023 drives,
>> seems like seagate has the user temp probe set at 40'C. The manual
>> states that the temperature settings are programmable via smart, but I
>> haven't found a way to do that.
> 
> Hello Emile,
> 
> I've found a workaround but the definitive fix should be handled by
> Illumos I guess. There is no open ticket, first I was waiting for
> something to happen with #4051 before going back to using that distro
> and kernel.
> 
> Here's the story:
> The SCSI specification defines two registers to store the temperature
> thresholds in SMART data. One contains the recommended maximum operation
> temperature for best MTBF, and the other register is for the absolute
> maximum rating. Usually the industry has always put the same value in
> both, and that is the absolute maximum. That's why we always see
> something like 60/65°C from SMART. But recently Seagate has changed that
> because it was asked by a large OS company to comply with the
> specification for better hardware monitoring integration. The change did
> not only occur in newer products but in a firmware update for existing
> disks and that was applied to the production line which explains some
> disks may or may not expose this problem although they are the same
> model. Our disks are of the Megalodon series and all share the same
> firmware basecode.
> 
> So any Seagate disk will now trigger faults in FMA if they have a
> firmware with the newer policy. Also I think other brands will follow
> the same path.
> 
> Like other members suggested in that thread, maybe nothing should change
> in FMA but let's face it, you can't maintain a temperature steadily
> under 40°C in a JBOD of hundreds of busy disks. Especially in
> eco-friendly datacenters. IMHO we should not trigger a fault on the
> lower threshold, and certainly not a drive retirement. It breaks storage
> servers on reboot or before a pool import, also spare disks could
> disappear with the retirement triggered.
> 
> The workaround is to downgrade firmware to the last version before the
> change, and to reset the register with an SCSI command. It is not
> possible to set the register to a user specified value like the
> documentation suggests, they confirmed it.
> 
> I'm sending a working firmware to you in a private mail. I'm not aware
> of any issue working with that older version and hopefully it should
> upload to 1TB drives as well.
> I'm applying it like this but from Linux not OmniOS:
> # ./dl_sea_fw-0.2.3_32 -f Megalodon_StdOEM_SAS_0002+C84C.lod -m ST4000NM0023
> # ./dl_sea_fw-0.2.3_32 -i
> 
> Then you should reset the drives so they reload the firmware.
> Here's our example for 4TB drives:
> -
> for i in $(lsscsi | grep 'ST4000NM0023' | awk '{print $6}') ; do
>   sg_reset -d $i
> done
> -
> 
> And reset the register that contains value from the previous firmware.
> It doesn't work well so we've got this script to run a few times until
> all disks got it. Again it matches 4TB Megalodon.
> -
> for i in $(lsscsi | grep 'ST4000NM0023' | awk '{print $6}') ; do
>   echo -n "$i "
>   if sg_logs $i --page=0x0d | grep 'Reference temperature = 68 C' > /dev/null ; then
>     echo 'ok'
>   else
>     sg_logs $i --page=0x0d --reset
>     echo 'reset'
>   fi
> done
> -
> 
> 
> Cheers
> 


Re: [OmniOS-discuss] OmniOS OpenSSL 1.0.1g and CVE-2014-0160

2014-04-08 Thread Saso Kiselkov

On 4/8/14, 3:35 PM, Jim Klimov wrote:
> On 2014-04-08 03:51, Theo Schlossnagle wrote:
>> Today was an unfortunate day for the Internet as a particularly
> devastating and quite longstanding bug was revealed in OpenSSL 1.0.1.
> 
> Thanks for the heads-up!
> 
> Can anyone please elaborate on this question, though: some of the
> legacy systems (i.e. Solaris 10 based) out in the field have not,
> in fact, seen or used OpenSSL past 0.9.8-something; and ran some
> SSL-protected email, openvpn, web or ldap services (though the
> latter is probably using some java security layer). It is however
> not known what SSL implementations and versions were used by the
> users of these systems. Are such setups vulnerable (given that
> the server side had no heartbeat handshake code with the bug) to
> the extent that everything should be urgently upgraded or not?

OpenSSL 1.0.0 and anything older isn't vulnerable to this. (Most
legacy systems, including OI, still run on the OpenSSL 0.9.8
release train)
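
As a quick illustration of that version boundary, the helper below
classifies an OpenSSL version string. It is a sketch under the assumption
that the string is an unpatched upstream release number: the affected range
for CVE-2014-0160 is 1.0.1 through 1.0.1f (plus 1.0.2-beta1), with 1.0.1g
carrying the fix. Vendor builds with backported fixes are not handled, and
heartbleed_status is a made-up name.

```shell
#!/bin/sh
# heartbleed_status VERSION: hypothetical helper; prints "vulnerable" for
# upstream releases 1.0.1 through 1.0.1f and 1.0.2-beta1, and
# "not affected" for everything else.
heartbleed_status() {
    case "$1" in
        1.0.1|1.0.1[a-f]|1.0.2-beta1) echo "vulnerable" ;;
        *)                            echo "not affected" ;;
    esac
}

for v in 0.9.8y 1.0.0l 1.0.1f 1.0.1g; do
    echo "$v: $(heartbleed_status "$v")"
done
```

The version a server reports can be obtained with "openssl version", but
what matters is the library the clients and daemons are actually linked
against.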

Cheers,
-- 
Saso

Re: [OmniOS-discuss] crash

2014-04-07 Thread Saso Kiselkov

On 4/7/14, 11:19 AM, Johan Kragsterman wrote:
> 
> Hej!
> 
> 
> Got a crash here, that I would like someone to have a look at.
> 
> [..snip..]
> 
>> ::stack
> vpanic()
> vdev_deadman+0x10b(ff0a277f0540)
> vdev_deadman+0x4a(ff0a1eea6040)
> vdev_deadman+0x4a(ff0a1dfea580)
> spa_deadman+0xad(ff0a1cd8a580)
> cyclic_softint+0xf3(fbc30d20, 0)
> cbe_low_level+0x14()
> av_dispatch_softvect+0x78(2)
> dispatch_softint+0x39(0, 0)
> switch_sp_and_call+0x13()
> dosoftint+0x44(ff0045805a50)
> do_interrupt+0xba(ff0045805a50, 1)
> _interrupt+0xba()
> acpi_cpu_cstate+0x11b(ff0a1ce9e670)
> cpu_acpi_idle+0x8d()
> cpu_idle_adaptive+0x13()
> idle+0xa7()
> thread_start+8()
> [..snip..]
> WARNING: ahci0: watchdog port 1 satapkt 0xff0a5a545088 timed out
> 
> WARNING: ahci0: watchdog port 2 satapkt 0xff0a5dc38160 timed out
> 
> WARNING: ahci0: watchdog port 0 satapkt 0xff0a5dc642e0 timed out
> 
> WARNING: ahci0: watchdog port 1 satapkt 0xff0a57020388 timed out
> 
> WARNING: ahci0: watchdog port 1 satapkt 0xff0a57020388 timed out
> 
> WARNING: ahci0: watchdog port 1 satapkt 0xff0a57020388 timed out
> 
> WARNING: ahci0: watchdog port 1 satapkt 0xff0a57020388 timed out
> 
> WARNING: ahci0: watchdog port 2 satapkt 0xff0a57020388 timed out
> 
> WARNING: ahci0: watchdog port 2 satapkt 0xff0a57020388 timed out
> 
> WARNING: ahci0: watchdog port 2 satapkt 0xff0a57020388 timed out
> 
> WARNING: ahci0: watchdog port 0 satapkt 0xff0a5fe32b90 timed out
> 
> WARNING: ahci0: watchdog port 0 satapkt 0xff0a5fe32b90 timed out
> 
> WARNING: ahci0: watchdog port 0 satapkt 0xff0a5fe32b90 timed out
> 
> WARNING: ahci0: watchdog port 1 satapkt 0xff0a5fe32b90 timed out
> 
> WARNING: ahci0: watchdog port 1 satapkt 0xff0a5fe32b90 timed out
> 
> WARNING: ahci0: watchdog port 1 satapkt 0xff0a5fe32b90 timed out
> 
> WARNING: ahci0: watchdog port 2 satapkt 0xff0a5fe32b90 timed out
> 
> WARNING: ahci0: watchdog port 2 satapkt 0xff0a5fe32b90 timed out
> 
> WARNING: ahci0: watchdog port 2 satapkt 0xff0a5fe32b90 timed out
> 
> NOTICE: SUNW-MSG-ID: SUNOS-8000-0G, TYPE: Error, VER: 1, SEVERITY: Major
> 
> 
> panic[cpu0]/thread=ff00458cbc40: 
> I/O to pool 'mainpool' appears to be hung.
> 
> 
> ff00458cba20 zfs:vdev_deadman+10b ()
> ff00458cba70 zfs:vdev_deadman+4a ()
> ff00458cbac0 zfs:vdev_deadman+4a ()
> ff00458cbaf0 zfs:spa_deadman+ad ()
> ff00458cbb90 genunix:cyclic_softint+f3 ()
> ff00458cbba0 unix:cbe_low_level+14 ()
> ff00458cbbf0 unix:av_dispatch_softvect+78 ()
> ff00458cbc20 unix:dispatch_softint+39 ()
> ff00458059a0 unix:switch_sp_and_call+13 ()
> ff00458059e0 unix:dosoftint+44 ()
> ff0045805a40 unix:do_interrupt+ba ()
> ff0045805a50 unix:cmnint+ba ()
> ff0045805bc0 unix:acpi_cpu_cstate+11b ()
> ff0045805bf0 unix:cpu_acpi_idle+8d ()
> ff0045805c00 unix:cpu_idle_adaptive+13 ()
> ff0045805c20 unix:idle+a7 ()  
> ff0045805c30 unix:thread_start+8 ()
> 
> syncing file systems...   
>  done
> dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
> NOTICE: ahci0: ahci_tran_reset_dport port 0 reset port
> 
> Would be nice to get some info about this from someone that got some more 
> clues than I got...

Essentially, this says that your SATA controller hung in a bad state
that isn't recoverable:
https://github.com/illumos/illumos-gate/blob/master/usr/src/uts/common/fs/zfs/spa_misc.c#L256-L261

I'd suspect the SATA controller. If this panic comes with any
regularity, try working around the SATA controller by using a substitute
HBA and disabling the old one to see if it goes away.

Cheers,
-- 
Saso

Re: [OmniOS-discuss] L2ARC actual size and zpool iostat -v output discrepancy?

2014-02-12 Thread Saso Kiselkov

Okay, I think I have a convincing analysis for this:

TL;DR Sadly, the zpool vdev space stats are less than useless for L2ARC
here and they're confusing you.

L2ARC devices are rotors, so they start writing at low offsets and
progress higher until they wrap around. What your vdev space stats
(zpool iostat -v) show is really the difference between the maximum and
the minimum buffer offsets. It doesn't actually say how much of the
space in between is occupied by usable buffers.  l2asize, however, does.

I asked for the l2ad_first measurement because I wanted to understand
why your space usage was showing ~59GB of free space on each device:
your L2ARC devices have not yet completed even a single pass through
the rotor. Had they already completed the first pass, that much free
space would have indicated a serious flaw in the wrap-around logic. So
this is currently the situation:

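                                  write hand
 L2ARC device                             \
+------------------------------------------V----------------------+
|  |lowest| ... buffers & gaps ... |highest|  (nothing here yet)  |
+-----------------------------------------------------------------+

   |              "allocated"              |        "free"        |
   |<--------------- 313G ---------------->|<------- 59G -------->|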

The gaps in between the "lowest" and "highest" L2ARC buffers get created
when the buffers previously written there are evicted from L2ARC for
whatever reason (e.g. the filesystem holding them is destroyed, or they
are written to in ARC, invalidating the contents cached in L2ARC).
Unfortunately, the "allocated" vdev space stat is not altered when this
happens, so your L2ARC could really just hold two buffers and still
appear as full in the vdev stats.
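
To make the difference concrete, here is a toy model of the situation
described above. It is only a sketch: the function names and the
(offset, size) tuples are made up for illustration and are not ZFS
identifiers, and it assumes the vdev figure is simply the span between
the lowest and highest buffer offsets.

```python
# Toy model of an L2ARC device: buffers are written at increasing offsets
# (the "write hand") and may later be evicted, leaving gaps behind the hand.
# All names are illustrative assumptions, not ZFS code.

def vdev_allocated(buffers):
    """Span from the lowest to the highest buffer offset (vdev-stat view)."""
    if not buffers:
        return 0
    lowest = min(off for off, _ in buffers)
    highest_end = max(off + size for off, size in buffers)
    return highest_end - lowest

def actual_size(buffers):
    """Sum of the sizes of the buffers that are still valid (l2asize view)."""
    return sum(size for _, size in buffers)

# Write ten 8-unit buffers back to back, starting at offset 0.
written = [(i * 8, 8) for i in range(10)]
# Evict everything except the first and the last buffer.
surviving = [written[0], written[-1]]

print(vdev_allocated(written), actual_size(written))      # 80 80
print(vdev_allocated(surviving), actual_size(surviving))  # 80 16
```

After the evictions the span-based figure is unchanged while the amount
of valid data has dropped to two buffers' worth.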

The reason why l2asize is different is because it *does* take eviction
into account. So as the vdev space allocated metric was growing due to
the write hand moving to the right, the actual amount of data stored in
the L2ARC didn't grow nearly as quickly since buffers that had
previously been written had to be evicted.

As for why the (uncompressed) l2size is roughly equal to vdev space
allocated, I'd say it's due to a rounding error in the reporting tools
and some luck. The numbers don't actually even add up to the same result:

l2size: 1374605708288 bytes / 2^30 = 1280 GB
vdev space: 313 + 314 + 314 + 313 = 1254 GB
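
The same arithmetic as a quick sanity check. The byte count is the
l2_size kstat from the original post and the four figures are the
"alloc" column of zpool iostat -v; the variable names are just for
illustration.

```python
GIB = 2 ** 30

l2_size_bytes = 1374605708288           # zfs:0:arcstats:l2_size
vdev_alloc_gb = [313, 314, 314, 313]    # "alloc" per cache device

l2_size_gib = l2_size_bytes / GIB
vdev_total_gb = sum(vdev_alloc_gb)

print(f"l2size    : {l2_size_gib:.1f} GB")
print(f"vdev space: {vdev_total_gb} GB")
print(f"difference: {l2_size_gib - vdev_total_gb:.1f} GB")
```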

As for general ZFS design, I think we should either fix the vdev space
stats on L2ARC devices so that they precisely correspond to l2asize, or
get rid of them altogether. Right now, the discrepancy there just
confuses people.

Hope this helps.

Cheers,
-- 
Saso


Re: [OmniOS-discuss] L2ARC actual size and zpool iostat -v output discrepancy?

2014-02-12 Thread Saso Kiselkov

On 2/12/14, 3:31 PM, wuffers wrote:
> So I only upgraded to r151008 recently, and was wondering whether the
> new L2ARC compression was working. After getting an updated arcstat
> script which added the l2asize option (which returned a 0), and a few
> rounds in IRC which lead me to the correct kstat
> (zfs:0:arcstats:l2_asize), and an even more updated arcstat to fix the 0
> result..
> 
> Now both kstat and arcstat are outputting the same info:
> zfs:0:arcstats:l2_asize 864682956800
> zfs:0:arcstats:l2_size  1374605708288
> 
> arcstat:
> read  hits  miss  hit%  l2read  l2hits  l2miss  l2hit%  arcsz  l2size  l2asize
> 2.7K  2.6K    53    98      53      44       9      83   229G    1.3T     806G
> 5.1K  4.8K   282    94     282      17     265       6   229G    1.3T     806G
> 7.3K  7.3K    10    99      10       4       6      40   229G    1.3T     806G
> ...
> 
> But.. why is zpool iostat -v showing me my cache devices using up ~1.25T
> (314Gx4), which is close to the 1.3T l2size?
> 
>                             capacity     operations    bandwidth
> pool                      alloc   free   read  write   read  write
> ------------------------  -----  -----  -----  -----  -----  -----
> [snip]
> 
> cache                         -      -      -      -      -      -
>   c2t500117310015D579d0    313G  59.4G     19     15   711K   833K
>   c2t50011731001631FDd0    314G  58.1G     18     15   712K   836K
>   c12t500117310015D59Ed0   314G  58.8G     19     15   710K   835K
>   c12t500117310015D54Ed0   313G  59.7G     18     15   709K   832K
> ------------------------  -----  -----  -----  -----  -----  -----
> 
> What's with the discrepancy? Is zpool iostat calculating the free
> capacity incorrectly now (my cache drives are 400GB)?

Can you also try running this piece of dtrace on the machine? I have a
hypothesis that I'd like to test:

dtrace -n 'fbt::l2arc_evict:entry{printf("dev=%p; l2ad_first=%d",
args[0], args[0]->l2ad_first)}'

(let it run for about 5-10s and post the output)

-- 
Saso

Re: [OmniOS-discuss] L2ARC actual size and zpool iostat -v output discrepancy?

2014-02-12 Thread Saso Kiselkov

On 2/12/14, 3:31 PM, wuffers wrote:
> So I only upgraded to r151008 recently, and was wondering whether the
> new L2ARC compression was working. After getting an updated arcstat
> script which added the l2asize option (which returned a 0), and a few
> rounds in IRC which lead me to the correct kstat
> (zfs:0:arcstats:l2_asize), and an even more updated arcstat to fix the 0
> result..
> 
> Now both kstat and arcstat are outputting the same info:
> zfs:0:arcstats:l2_asize 864682956800
> zfs:0:arcstats:l2_size  1374605708288
> 
> arcstat:
> read  hits  miss  hit%  l2read  l2hits  l2miss  l2hit%  arcsz  l2size  l2asize
> 2.7K  2.6K    53    98      53      44       9      83   229G    1.3T     806G
> 5.1K  4.8K   282    94     282      17     265       6   229G    1.3T     806G
> 7.3K  7.3K    10    99      10       4       6      40   229G    1.3T     806G
> ...
> 
> But.. why is zpool iostat -v showing me my cache devices using up ~1.25T
> (314Gx4), which is close to the 1.3T l2size?
> 
>                             capacity     operations    bandwidth
> pool                      alloc   free   read  write   read  write
> ------------------------  -----  -----  -----  -----  -----  -----
> [snip]
> 
> cache                         -      -      -      -      -      -
>   c2t500117310015D579d0    313G  59.4G     19     15   711K   833K
>   c2t50011731001631FDd0    314G  58.1G     18     15   712K   836K
>   c12t500117310015D59Ed0   314G  58.8G     19     15   710K   835K
>   c12t500117310015D54Ed0   313G  59.7G     18     15   709K   832K
> ------------------------  -----  -----  -----  -----  -----  -----
> 
> What's with the discrepancy? Is zpool iostat calculating the free
> capacity incorrectly now (my cache drives are 400GB)?

What's the block size of your SSDs and the average recordsize of the
data on them?

-- 
Saso

Re: [OmniOS-discuss] omnios : r151008 : iso : download : not happening!

2014-02-05 Thread Saso Kiselkov

On 2/5/14, 2:06 PM, Mayuresh Kathe wrote:
> On Wed, Feb 05, 2014 at 01:30:40PM +0000, Saso Kiselkov wrote:
>> On 2/5/14, 1:09 PM, Mayuresh Kathe wrote:
>>> hi,
>>>
>>> i have been trying to download the r151008 iso since afternoon.
>>> fails after downloading around 120mb.
>>>
>>> this has happened thrice.
>>> and inspite of using 'wget -c' it fails to restart.
>>>
>>> is there any problem with the omniti network connectivity?
>>> have successfully downloaded crux linux after that.
>>
>> Works for me.
> 
> yeah, worked at my end too, done in less than 40 minutes.
> strange...

I guess it would be kind of nice if OmniTI would publish torrents, so
that the community can help out with the traffic situation, or at
least have them as a backup.

-- 
Saso


Re: [OmniOS-discuss] omnios : r151008 : iso : download : not happening!

2014-02-05 Thread Saso Kiselkov

On 2/5/14, 1:09 PM, Mayuresh Kathe wrote:
> hi,
> 
> i have been trying to download the r151008 iso since afternoon.
> fails after downloading around 120mb.
> 
> this has happened thrice.
> and inspite of using 'wget -c' it fails to restart.
> 
> is there any problem with the omniti network connectivity?
> have successfully downloaded crux linux after that.

Works for me.

-- 
Saso


Re: [OmniOS-discuss] Where to Run Omni

2014-02-03 Thread Saso Kiselkov

On 2/3/14, 9:30 PM, Paul B. Henson wrote:
>> From: Saso Kiselkov
>> Sent: Monday, February 03, 2014 7:23 AM
>> something like 10GB and turn on compression. Although the standard
>> installer doesn't let you set compression on the root pool before
>> starting installation, there's a trick to doing it. Before firing up the
> 
> With a little more effort:
> 
> http://omnios.omniti.com/wiki.php/ISOrpoolCustomize
> 
> You can configure the size of the rpool partition, provide arbitrary pool
> creation options including compression, and also size swap/dump explicitly
> rather than use the installer defaults.

Oh wow, configure network, download script, make executable, execute,
bootstrap pre-install environment, edit perl script, then cross fingers
that it works as expected (be sure to read the log!). I mean I
understand that this is more flexible than just running stuff in a
screen, but if you were willing to go to such lengths to create this
script, what's the problem with fixing the installer to simply prompt
for some of these options in the first place? You know, things like
swap/dump size, fs options, etc - little things. Doesn't have to be a
full installer rewrite.

Cheers,
-- 
Saso

Re: [OmniOS-discuss] Where to Run Omni

2014-02-03 Thread Saso Kiselkov

On 2/3/14, 3:06 PM, Matthew Mabis wrote:
> 
> Hello,
> 
> I have a few questions about a project that i am trying to work on, needless 
> to say i have 1 80GB SSD Drive that i can fit in this case, the rest of the 
> drives are a RaidZ2 configuration and there cannot be anymore added Disks to 
> this system period!
> 
> My goal is to make the SSD capable of making NFS run better (Slog/ZIL/L2ARC) 
> whatever... but i need the OS to go somewhere!  
> 
> My Questions are
> 
> 1) Splitting the SSD into partitions and running is that bad i always heard 
> it was my question is why?
> 2) Can Omni be run from say a 32GB USB3 key directly?
> 3) What options do i have here outside of adding a drive.
> 
> I have SATA Slots on the mobo just no place to add an additional drive!

Yes, you can partition the SSD and it works okay. For the OS, set aside
something like 10GB and turn on compression. Although the standard
installer doesn't let you set compression on the root pool before
starting installation, there's a trick to doing it. Before firing up the
installation, get a shell from the installer menu and run the following
commands:

# screen
# while ! zfs set compression=lz4 rpool; do sleep 1; done

(after this hit 'Ctrl-A' + 'D' and type 'exit' to return to the
installer menu and leave the loop running)

The loop will try to set compression on the newly created root pool as
soon as the installer creates it, so that the OS's data itself will be
installed compressed.

For slog (aka ZIL) you don't need much - usually around 1-2GB is more
than enough, since it only needs to hold the sync write data from the
last transaction (by default: the last 5 seconds).
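
As a back-of-the-envelope check on that figure, here is a rough sizing
sketch. The function name, the 125 MB/s ingest rate (roughly one
saturated gigabit link) and the factor of two for transactions in
flight are assumptions for illustration only; the 5-second interval is
the default mentioned above.

```python
def slog_estimate_gb(ingest_mb_per_s, txg_seconds=5, txgs_in_flight=2):
    """Rough upper bound on the sync-write data a slog must hold, in GB.

    txgs_in_flight is an assumed safety factor, not a ZFS tunable.
    """
    return ingest_mb_per_s * txg_seconds * txgs_in_flight / 1000.0

# Assumed workload: ~125 MB/s of sync writes over a single gigabit link.
print(slog_estimate_gb(125))   # 1.25
```

Even with the safety factor this lands inside the 1-2GB range.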

Use the rest of the SSD for L2ARC. Depending on the SSD type you may
also want to keep around 10-20% of the SSD unallocated, so that the
garbage collector on the controller always has blocks available it can
pre-erase - this can boost long-term slog performance quite a bit. YMMV
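
Put together for the 80GB SSD in question, the split could look
roughly like the sketch below. The helper name and the 15% spare
fraction (the middle of the 10-20% range) are illustrative choices,
not a recommendation for any particular drive.

```python
def split_ssd(total_gb, os_gb=10, slog_gb=2, spare_fraction=0.15):
    """Split an SSD into OS, slog, unallocated spare and L2ARC (all in GB)."""
    spare_gb = total_gb * spare_fraction
    l2arc_gb = total_gb - os_gb - slog_gb - spare_gb
    if l2arc_gb <= 0:
        raise ValueError("SSD too small for this layout")
    return {"os": os_gb, "slog": slog_gb, "spare": spare_gb, "l2arc": l2arc_gb}

layout = split_ssd(80)
for name, gb in layout.items():
    print(f"{name:6s} {gb:5.1f} GB")
```

With these assumptions about 56GB is left over for L2ARC.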

Cheers,
-- 
Saso

Re: [OmniOS-discuss] iscsi timeouts

2014-02-03 Thread Saso Kiselkov

On 2/3/14, 10:51 AM, Tobias Oetiker wrote:
> a short update on the matter for anyone browsing the ML archives:
> 
> The affected system runs on an S2600CP motherboard with RMM4 remote
> management.  RMM comes with the ability to use any of the existing
> Ethernet ports on the MB for its communication needs ...  we have
> configured it with a separate hw port, but it seems that this
> ability the access the other ports can interfere with omnios
> operation.
> 
> 10 days ago, we have upgraded the bios to
> version SE5C600.86B.02.01.0002.082220131453 08/22/2013
> 
> since then we have not seen any issues ...
> 
> I am not 100% sure that this is the solution to the problem, as we
> only found the behaviour after several weeks of uptime ...  in any
> event, for now things look good.

Interesting observation, thanks for keeping the list updated!

Best wishes,
-- 
Saso

Re: [OmniOS-discuss] Illumos and Infiniband

2014-01-29 Thread Saso Kiselkov

On 1/29/14, 6:38 PM, Chris Zembower wrote:
> Please don't lecture me on the merits of 10GbE/FC over Infiniband, I've
> heard it all. :-)

You're not alone in your appreciation of IB's brute strength. As for why
stuff isn't happening, my guess is lack of manpower - we're a really
small community and people work on what they like, plus what they can
get their hands on (and IB gear is a little exotic). That being said,
I'd love to have a mature and up-to-date IB stack on Illumos that could
be used in production. It's sorely needed.

Cheers,
-- 
Saso

Re: [OmniOS-discuss] iscsi timeouts

2014-01-21 Thread Saso Kiselkov

On 1/21/14, 10:16 PM, Saso Kiselkov wrote:
> On 1/21/14, 10:09 PM, Saso Kiselkov wrote:
>> On 1/21/14, 10:01 PM, Tobias Oetiker wrote:
>>> Hi Nld,
>>>
>>> Today Narayan Desai wrote:
>>>
>>>> Sorry, I should have given the requisite "yes, I know that this is a recipe
>>>> for sadness, for I too have experienced said sadness".
>>>>
>>>> That said, we've seen this kind of problem when there was a device in a
>>>> vdev that was dying a slow death. There wouldn't necessarily be any sign,
>>>> aside from insanely high service times on an individual device in the pool.
>>>> From this, I assume that ZFS is still sensitive to variation in underlying
>>>> drive performance.
>>>>
>>>> Tobi, what do your drive service times look like?
>>>>  -nld
>>>
>>> the drives seem fine, smart is not reporting anything out of the
>>> ordinary and also iostat -En shows 0 on all counts
>>>
>>> I don't think it is a disk issue, but rather something connected
>>> with the network ...
>>>
>>> On times the machine becomes unreachable for some time, and then it
>>> is possible to login via console and all seems well internally.
>>> setting the network interface offline and then online again using
>>> the dladm tool brings the connectivity back immediatly. waiting
>>> helps as well ... since the problem sorts itself out after a few
>>> seconds to minutes ...
>>>
>>> we just had another 'off the net' periode for 30 minutes
>>>
>>> unfortunately omnios itself does not seem to realize that something
>>> is off, at least dmesg does not show any kernel messages about this
>>> problem ...
>>>
>>> we have several systems running on the S2600CP MB ... this is the
>>> only one showing problems ...
>>>
>>> the next thing I intend todo is to upgrade the MB firmware since I
>>> found that this box has an older version than the other ones ...
>>>
>>> System Configuration: Intel Corporation S2600CP
>>> BIOS Configuration: Intel Corp. SE5C600.86B.01.06.0002.110120121539 
>>> 11/01/2012
>>>
>>> other ideas, most welcome !
>>
>> You mentioned a couple of e-mails back that you're using Intel I350s.
>> Can you verify that your kernel has:
>>
>> commit 43ae55058ad99c869a9ae39d039490e8a3680520
>> Author: Dan McDonald 
>> Date:   Thu Feb 7 19:27:18 2013 -0500
>>
>> 3534 Disable EEE support in igb for I350
>> Reviewed by: Robert Mustacchi 
>> Reviewed by: Jason King 
>> Reviewed by: Marcel Telka 
>> Reviewed by: Sebastien Roy 
>> Approved by: Richard Lowe 
>>
>> I guess you can check for this string at runtime:
>> $ strings /kernel/drv/amd64/igb | grep _eee_support
>>
>> If it is missing, then it could be the buggy EEE support that's throwing
>> your link out of whack here.
> 
> Nevermind, missed your description of the KVM guests being reachable
> while only the host goes offline... Did snoop show anything arriving at
> the host while it is offline?

However, on second thought, you did mention that you're running
crossover between two hosts, which would match the description of the
EEE issue:

https://illumos.org/issues/3534
"The energy efficient Ethernet (EEE) support in Intel's I350 GigE NIC
drops link on directly-attached link cases."

Anyhow, make sure you're running the EEE fix.

-- 
Saso

Re: [OmniOS-discuss] iscsi timeouts

2014-01-21 Thread Saso Kiselkov

On 1/21/14, 10:09 PM, Saso Kiselkov wrote:
> On 1/21/14, 10:01 PM, Tobias Oetiker wrote:
>> Hi Nld,
>>
>> Today Narayan Desai wrote:
>>
>>> Sorry, I should have given the requisite "yes, I know that this is a recipe
>>> for sadness, for I too have experienced said sadness".
>>>
>>> That said, we've seen this kind of problem when there was a device in a
>>> vdev that was dying a slow death. There wouldn't necessarily be any sign,
>>> aside from insanely high service times on an individual device in the pool.
>>> From this, I assume that ZFS is still sensitive to variation in underlying
>>> drive performance.
>>>
>>> Tobi, what do your drive service times look like?
>>>  -nld
>>
>> the drives seem fine, smart is not reporting anything out of the
>> ordinary and also iostat -En shows 0 on all counts
>>
>> I don't think it is a disk issue, but rather something connected
>> with the network ...
>>
>> On times the machine becomes unreachable for some time, and then it
>> is possible to login via console and all seems well internally.
>> setting the network interface offline and then online again using
>> the dladm tool brings the connectivity back immediatly. waiting
>> helps as well ... since the problem sorts itself out after a few
>> seconds to minutes ...
>>
>> we just had another 'off the net' periode for 30 minutes
>>
>> unfortunately omnios itself does not seem to realize that something
>> is off, at least dmesg does not show any kernel messages about this
>> problem ...
>>
>> we have several systems running on the S2600CP MB ... this is the
>> only one showing problems ...
>>
>> the next thing I intend todo is to upgrade the MB firmware since I
>> found that this box has an older version than the other ones ...
>>
>> System Configuration: Intel Corporation S2600CP
>> BIOS Configuration: Intel Corp. SE5C600.86B.01.06.0002.110120121539 
>> 11/01/2012
>>
>> other ideas, most welcome !
> 
> You mentioned a couple of e-mails back that you're using Intel I350s.
> Can you verify that your kernel has:
> 
> commit 43ae55058ad99c869a9ae39d039490e8a3680520
> Author: Dan McDonald 
> Date:   Thu Feb 7 19:27:18 2013 -0500
> 
> 3534 Disable EEE support in igb for I350
> Reviewed by: Robert Mustacchi 
> Reviewed by: Jason King 
> Reviewed by: Marcel Telka 
> Reviewed by: Sebastien Roy 
> Approved by: Richard Lowe 
> 
> I guess you can check for this string at runtime:
> $ strings /kernel/drv/amd64/igb | grep _eee_support
> 
> If it is missing, then it could be the buggy EEE support that's throwing
> your link out of whack here.

Nevermind, missed your description of the KVM guests being reachable
while only the host goes offline... Did snoop show anything arriving at
the host while it is offline?

Cheers,
-- 
Saso

Re: [OmniOS-discuss] iscsi timeouts

2014-01-21 Thread Saso Kiselkov

On 1/21/14, 10:01 PM, Tobias Oetiker wrote:
> Hi Nld,
> 
> Today Narayan Desai wrote:
> 
>> Sorry, I should have given the requisite "yes, I know that this is a recipe
>> for sadness, for I too have experienced said sadness".
>>
>> That said, we've seen this kind of problem when there was a device in a
>> vdev that was dying a slow death. There wouldn't necessarily be any sign,
>> aside from insanely high service times on an individual device in the pool.
>> From this, I assume that ZFS is still sensitive to variation in underlying
>> drive performance.
>>
>> Tobi, what do your drive service times look like?
>>  -nld
> 
> the drives seem fine, smart is not reporting anything out of the
> ordinary and also iostat -En shows 0 on all counts
> 
> I don't think it is a disk issue, but rather something connected
> with the network ...
> 
> On times the machine becomes unreachable for some time, and then it
> is possible to login via console and all seems well internally.
> setting the network interface offline and then online again using
> the dladm tool brings the connectivity back immediatly. waiting
> helps as well ... since the problem sorts itself out after a few
> seconds to minutes ...
> 
> we just had another 'off the net' periode for 30 minutes
> 
> unfortunately omnios itself does not seem to realize that something
> is off, at least dmesg does not show any kernel messages about this
> problem ...
> 
> we have several systems running on the S2600CP MB ... this is the
> only one showing problems ...
> 
> the next thing I intend todo is to upgrade the MB firmware since I
> found that this box has an older version than the other ones ...
> 
> System Configuration: Intel Corporation S2600CP
> BIOS Configuration: Intel Corp. SE5C600.86B.01.06.0002.110120121539 11/01/2012
> 
> other ideas, most welcome !

You mentioned a couple of e-mails back that you're using Intel I350s.
Can you verify that your kernel has:

commit 43ae55058ad99c869a9ae39d039490e8a3680520
Author: Dan McDonald 
Date:   Thu Feb 7 19:27:18 2013 -0500

3534 Disable EEE support in igb for I350
Reviewed by: Robert Mustacchi 
Reviewed by: Jason King 
Reviewed by: Marcel Telka 
Reviewed by: Sebastien Roy 
Approved by: Richard Lowe 

I guess you can check for this string at runtime:
$ strings /kernel/drv/amd64/igb | grep _eee_support

If it is missing, then it could be the buggy EEE support that's throwing
your link out of whack here.

Cheers,
-- 
Saso

Re: [OmniOS-discuss] Not a UFS magic Number... Boot OmniOS installation on new Supermicro X10SAE

2014-01-12 Thread Saso Kiselkov

On 1/12/14, 3:03 PM, Svavar Örn Eysteinsson wrote:
> Well, I've changed the disk from a brand new one, just opened it up from a 
> static bag
> inserted it, and the exact same error messages…. 
> 
> :S
> 
> so two disks, no boot….

Scratch what I said earlier. I think I understand what the problem is.
The mount scripts on the installer USB are confusing the USB disk with
the SATA drive and are trying to mount the SATA disk as the installer
root. This is obviously a bug in the install image, but you should be
able to work around it by booting from an ISO (that motherboard should
have an on-board IP KVM with media redirection, so you can use that if
you don't want to burn install media).

Cheers,
-- 
Saso
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss

Re: [OmniOS-discuss] Not a UFS magic Number... Boot OmniOS installation on new Supermicro X10SAE

2014-01-12 Thread Saso Kiselkov

On 1/12/14, 3:03 PM, Svavar Örn Eysteinsson wrote:
> Well, I've changed the disk from a brand new one, just opened it up from a 
> static bag
> inserted it, and the exact same error messages…. 
> 
> :S
> 
> so two disks, no boot….

It's not that I think the disks are bad; it's that maybe there's
something written on them that the kernel is misinterpreting as a UFS
filesystem. If you zero out the initial portion of the disk, then we
know that there are only zeros there, and we will know to look
somewhere else.

-- 
Saso

Re: [OmniOS-discuss] Not a UFS magic Number... Boot OmniOS installation on new Supermicro X10SAE

2014-01-12 Thread Saso Kiselkov

On 1/12/14, 2:46 PM, Svavar Örn Eysteinsson wrote:
> Hello.
> 
> I'm currently trying to install OmniOS on a brand new server.
> Using the latest OmniOS build on USB stick with a SuperMicro X10SAE 
> motherboard, and a XEON CPU.
> 
> When I have no SATA disk connected to the Intel SATA controller the 
> installation boots up fine.
> 
> As soon as I plug a disk into the controller, it gives me the following 
> errors at boot :
> 
> 
> Probing for device nodes…
> Preparing image for use
> NOTICE : mount: not a UFS magic number (0x1c798c2e)
> NOTICE : mount: not a UFS magic number (0x1c798c2e)
> NOTICE : mount: not a UFS magic number (0x0)
> NOTICE : mount: not a UFS magic number (0x1c798c2e)
> Requesting System Maintenance Mode
> Console login services cannot run.
> 
> Enter username for system maintenance….
> 
> 
> My SATA disk is empty.
> 
> Does anyone know what the hell is going on ?
> Are there any BOOT options on OmniOS installation? Can't find any information 
> regarding boot options.
> 
> Any help would be much appreciated.
> 
> Thanks allot people.

Can you try dd'ing over the disk with zeros and trying again?
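
If no dd is at hand, the same idea can be sketched in a few lines of
Python. This is only an illustration: zero_head is a made-up helper,
the demo below runs against a scratch file rather than a real disk, and
pointing it at a raw device node needs the right path and privileges
and destroys whatever is stored there.

```python
import os
import tempfile

def zero_head(path, nbytes, chunk=1 << 20):
    """Overwrite the first nbytes of an existing file or device with zeros."""
    zeros = bytes(chunk)
    remaining = nbytes
    with open(path, "r+b") as f:
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())

# Demo on a scratch file: fill 4 KiB with 0xff, then zero the first 1 KiB.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\xff" * 4096)
zero_head(tmp.name, 1024)
with open(tmp.name, "rb") as f:
    data = f.read()
os.remove(tmp.name)

print(data[:1024] == bytes(1024), data[1024:] == b"\xff" * 3072)
```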

-- 
Saso


Re: [OmniOS-discuss] the right card for a server

2014-01-09 Thread Saso Kiselkov

On 1/9/14, 8:23 PM, Entfernt wrote:
> This question covers OmniOS and the usual Supermicro hardware base.  I want 
> to add a four-port gigabit ethernet card to a 5017C-LF:
> 
> http://www.supermicro.com/products/system/1u/5017/sys-5017c-lf.cfm
> 
> So that's PCI Express x8.
> 
> Most cards for the machine are two-port but this one:
> 
> http://www.iphase.com/products/product.cfm/PCI%20Express/399
> 
> appears to fit the bill.  In particular, it has the Broadcom® BCM5704C 
> chipset which appears on the HCL:
> 
> https://www.illumos.org/hcl/
> 
> but I'm a programmer rather than a sysadmin or hardware person.  Can anyone 
> see any obvious reason for the card not working ?  Thanks in expectation.
> 
> 

Oh and one last thing: what do you *need* the extra 2 ports for? As in,
do you need them for the physically separate ports, or do you need the
extra performance? If it's for the performance, it might make sense to
look at 10-gig NICs. Dual-port 10-gig can be had in low-profile for just
a little more money than 4x 1-gig. But of course, if it doesn't fit your
workload requirements, then ignore me. Just a thought...

-- 
Saso

Re: [OmniOS-discuss] the right card for a server

2014-01-09 Thread Saso Kiselkov

On 1/9/14, 11:48 PM, Michael Mounteney wrote:
>> Hello,
>>
>> you need a Riser-Card (RSC-RR1U-E8) too.
>>
>> Take a look at Supermicro add-on Cards:
>> http://www.supermicro.nl/products/accessories/index.cfm
>>
>> There you can find a 4 Port Intel Nic
>> http://www.supermicro.nl/products/accessories/addon/AOC-UG-i4.cfm
> 
> I appreciate that you are trying to help, Alexandre, but there are two
> points that I find puzzling in your advice.
> 
> Firstly that I've never seen any other reference to needing a riser for a
> PCI card.  SAS, yes;   PCI, no.

Of course you need a riser. How else do you think the card's gonna make
that turn from vertical to horizontal to fit in that chassis? PCI risers
are simply (most commonly) passive mechanical adapters that allow you to
route PCI ports to other physical locations within the chassis.

And SAS risers? Never heard of that. Don't you mean expanders?

> Secondly, Supermicro's own compatibility matrix for the product, at
> http://www.supermicro.nl/support/resources/AOC/UIO_AOC_Compatibility.cfm ,
> excludes the NIC you mention from my hardware;  it is a full-height card
> and the 5017C-LF only accommodates low-profile.

Interesting, because from the pictures it *looks* like a full-height
slot. But yeah, if it's low-profile-only, then you'd wanna get something
that fits into low-profile (like the I350-T4: http://tinyurl.com/ncwa92s )

-- 
Saso

Re: [OmniOS-discuss] the right card for a server

2014-01-09 Thread Saso Kiselkov

On 1/9/14, 11:26 PM, Michael Mounteney wrote:
> Thanks for your comments Dan.
> 
> The board I found is a PCI Express x8 whereas the I350 is x4.  What does
> that mean, and does it matter at all ?  The server has an x8 slot.

As both cards use PCI-Express 2.0, four lanes are able to deliver up to
4x 4Gbit/s full duplex. Given that the card is a 4x 1Gbit/s Ethernet
part, I wouldn't worry about it.
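
For the curious, the numbers work out as follows. This is a quick
sketch; the function name is made up, and it assumes the usual PCIe 2.0
figures of 5 GT/s per lane with 8b/10b encoding, ignoring protocol
overhead above the link layer.

```python
def pcie2_usable_gbit(lanes):
    """Usable bandwidth per direction of a PCIe 2.0 link, in Gbit/s."""
    raw_gt_per_lane = 5.0                   # 5 GT/s signalling rate per lane
    return lanes * raw_gt_per_lane * 8 / 10  # 8b/10b: 8 data bits per 10 sent

nic_demand_gbit = 4 * 1.0                   # four 1 Gbit/s Ethernet ports

print("x4 link:", pcie2_usable_gbit(4), "Gbit/s")
print("x8 link:", pcie2_usable_gbit(8), "Gbit/s")
print("NIC    :", nic_demand_gbit, "Gbit/s")
```

Either link width leaves plenty of headroom over what the four ports
can push.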

-- 
Saso

Re: [OmniOS-discuss] OmniOS random freezes

2014-01-06 Thread Saso Kiselkov

On 1/6/14, 10:50 PM, Niels Goossens wrote:
> 3. Is the pool healthy?
> A. The drives are consumer grade sata drives and about 3 years old. They
> are not really used that much - they used to be in my Opensolaris based
> NAS before I upgraded that to something bigger. Smartctl tells me SMART
> status of all drives is OK. There are no other log entries that lead me
> to believe a drive is bad. Zpool status is OK.

Just to perform a test, you could try loading up the pool with as much
test data as you can (some repetitive incompressible test pattern would
be best, e.g. a movie file) and then run "zpool scrub" to verify all the
checksums.
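
A minimal sketch of that test, assuming a pool called "tank" with a
scratch directory at /tank/scratch (both names are made up, substitute
your own). It uses /dev/urandom as the source of incompressible data
instead of a movie file.

```shell
#!/bin/sh
# Fill part of a pool with incompressible data, then scrub it.
# The pool and directory names passed in are placeholders, not names from this thread.
scrub_test() {
    pool=$1      # pool name
    dir=$2       # directory on that pool to write into
    size_mb=$3   # how much test data to write, in megabytes

    # Random data doesn't compress, so every block really hits the disks.
    dd if=/dev/urandom of="$dir/scrubtest.bin" bs=1024k count="$size_mb" || return 1

    zpool scrub "$pool" || return 1   # starts the scrub; it runs in the background
    zpool status -v "$pool"           # shows scrub progress and any checksum errors
}
```

Run it as e.g. "scrub_test tank /tank/scratch 10240", then re-run
"zpool status -v tank" until the scrub completes and look at the CKSUM
column.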

> 4. Is there anything in Supermicro IPMI?
> A. The following, which has occurred only twice now:
> 
> 2013/11/23 19:53:28 Correctable Memory ECC @ DIMM2A(CPU1) - Asserted
> 2013/11/23 19:53:29 Uncorrectable Memory ECC @ DIMM2A(CPU1) - Asserted
> 2013/11/23 22:28:56 Correctable Memory ECC @ DIMM2A(CPU1) - Asserted
> 2013/11/23 22:28:57 Uncorrectable Memory ECC @ DIMM2A(CPU1) - Asserted
> 
> Even though I'd rather not see this error, I'm not alarmed considering
> it has not occurred since.

This could indicate degrading or failing ECC memory. Try running
memtest86+ on the machine for a while to see if it reports anything
useful. You can grab a pre-built ISO at http://www.memtest.org/#downiso
Alternatively, grab a bootable binary from that site, gunzip it and put
it somewhere on your root filesystem, e.g.
/platform/i86pc/memtest86+-4.20.bin. Then boot it from GRUB by
entering the following GRUB commands:

findroot (pool_rpool,0,a)      <- partition number + slice
bootfs rpool/ROOT/omnios       <- see "beadm list" for the exact name
kernel /platform/i86pc/memtest86+-4.20.bin
boot

> 5. Are core dump or crash files available?
> A. I've setup dumpadm and coreadm only today. There are no core files in
> /, or crash files in /var/crash. There are no log entries in /var/log.
> There is nothing in /var/adm/messages, the last entry there is hours
> before the machine freezes.

System crash dumps are usually saved on the dump device (rpool/dump)
until you manually retrieve them. If you run "savecore" without any
arguments it will try to extract the crash dump from the dump device and
save it to /var/crash/.

Cheers,
-- 
Saso

Re: [OmniOS-discuss] Dell R720 & Broadcom ethernet

2013-12-26 Thread Saso Kiselkov

On 12/26/13, 8:44 PM, Lauri Tirkkonen wrote:
> On Thu, Dec 26 2013 20:12:18 +0000, Saso Kiselkov wrote:
>> On 12/26/13, 8:10 PM, Saso Kiselkov wrote:
>>> On 12/26/13, 7:50 PM, Saso Kiselkov wrote:
>>>> On 12/26/13, 7:45 PM, Scott Roberts wrote:
>>>>> Saso,
>>>>>
>>>>> It has the following:
>>>>>
>>>>> Broadcom 57800 2x10Gb DA/SFP+ + 2x1Gb BT Network Daughter Card
>>>>>
>>>>> Broadcom 57810 Dual Port 10Gb Direct Attach/SFP+ Network Adapter
>>>>>
>>>>> Total of (2) GigE and (4) 10Gb SFP+ ports.
>>>>
>>>> Sorry to disappoint, but those 10Gb parts aren't and probably never will
>>>> be supported. Unfortunately, Broadcom has a bad habit of shipping binary
>>>> blobs only, so there's no way we could even port over support from other
>>>> platforms (as we're doing with Intel hardware).
>>>
>>> Minor correction here. It appears the tg3 Linux driver *is* open-source,
>>> so provided that you can get Broadcom to give it a liberal-enough
>>> license to let us include it in Illumos, you might be able to port it to
>>> Illumos. FreeBSD, which is the source of many drivers that Illumos ports
>>> over, sadly, doesn't seem to support it either.
>>
>> Damn, correction #2 here: FreeBSD does include support for this since
>> 10-current, so its bxe(4) driver is a potential target for porting:
>> http://www.freebsd.org/cgi/man.cgi?query=bxe&apropos=0&sektion=4&manpath=FreeBSD+10-current&arch=default&format=html
>>
>> Now just to find the person with enough spare time on their hands to do
>> the actual grunt work...
> 
> Interestingly, Broadcom's GLDv3 driver package for these cards from
> http://www.broadcom.com/support/license.php?file=NXII_10/solaris_ev-7.8.11.zip
> includes the source code and even has an illumos compilation target (but
> no illumos binaries). We tried at work and the driver attaches to our
> 57810 NICs, but we haven't tried actually using them yet. The license
> doesn't seem very permissive though...
> 

Neat. If anybody knows the relevant people at Broadcom, it would be
interesting to ask them if they'd be interested in having this work
folded into the Illumos mainline code. My guess is the answer will be
"no", but it would be interesting to ask nonetheless.

Cheers,
-- 
Saso

Re: [OmniOS-discuss] Dell R720 & Broadcom ethernet

2013-12-26 Thread Saso Kiselkov

On 12/26/13, 8:33 PM, Michael Rasmussen wrote:
> On Thu, 26 Dec 2013 20:27:55 +
> Saso Kiselkov  wrote:
> 
>>
>> If you can get them swapped out, the added benefit will be that igb and
>> ixgbe (the Intel drivers) support fast reboot, so you'll be able to
>> reboot your machine without having to go through the lengthy BIOS boot
>> checks.
> Is this the case for all Intel drivers or are there some Intel drivers
> which does not support fast reboot?

Anything handled by e1000g, igb and ixgbe supports it. Don't know if this
covers *all* Intel NICs we support, but these three drivers certainly
handle all Intel NICs available for the R720.

Cheers,
-- 
Saso

Re: [OmniOS-discuss] Dell R720 & Broadcom ethernet

2013-12-26 Thread Saso Kiselkov

On 12/26/13, 8:22 PM, Scott Roberts wrote:
> Thanks to all for the quick replies and suggestions.  This list is awesome.

Glad we could get this finally sorted to some kind of happy ending.
Again, sorry for giving you bad info initially.

> I just finished burning the driver update CDs for the GigE and 10G drivers
> and will re-install OmniOS using the DUs.  I will let you all know how it
> goes.
> 
> I will also question Dell about swapping out with Intel hardware.  I
> somehow missed the switch to Broadcom hardware in their last quote... my
> fault.

If you can get them swapped out, the added benefit will be that igb and
ixgbe (the Intel drivers) support fast reboot, so you'll be able to
reboot your machine without having to go through the lengthy BIOS boot
checks.

Cheers,
-- 
Saso

Re: [OmniOS-discuss] Dell R720 & Broadcom ethernet

2013-12-26 Thread Saso Kiselkov

On 12/26/13, 8:19 PM, Dale Ghent wrote:
> On Dec 26, 2013, at 2:59 PM, Saso Kiselkov  wrote:
> 
>> On 12/26/13, 7:55 PM, Dale Ghent wrote:
>>>
>>>
>>> Broadcomm does have GLDv3 drivers for "Solaris" - whether it'll work on 
>>> Illumos derivatives would be a quick and easy experiment with hopefully 
>>> favorable results.
>>>
>>> http://www.broadcom.com/support/ethernet_nic/netxtremeii10.php
>>
>> The Dell R720 doesn't use NetExtreme II NICs. They use the newer NetLink
>> NICs, for which even Broadcom remarks on their driver site:
>> "Note: Broadcom does not offer UnixWare, SCO and Solaris drivers for
>> NetLink Ethernet controllers."
> 
> The driver I linked purports to cover the 578xx chips, which are the chipsets 
> that Scott quoted as being the ones he has on his server…

You're right, I got the numbers mixed up. My bad.

-- 
Saso


Re: [OmniOS-discuss] Dell R720 & Broadcom ethernet

2013-12-26 Thread Saso Kiselkov

On 12/26/13, 7:59 PM, Saso Kiselkov wrote:
> On 12/26/13, 7:55 PM, Dale Ghent wrote:
>>
>>
>> Broadcomm does have GLDv3 drivers for "Solaris" - whether it'll work on 
>> Illumos derivatives would be a quick and easy experiment with hopefully 
>> favorable results.
>>
>> http://www.broadcom.com/support/ethernet_nic/netxtremeii10.php
> 
> The Dell R720 doesn't use NetExtreme II NICs. They use the newer NetLink
> NICs, for which even Broadcom remarks on their driver site:
> "Note: Broadcom does not offer UnixWare, SCO and Solaris drivers for
> NetLink Ethernet controllers."

Sorry for the false info; the 578xx parts do indeed appear to be
NetXtreme II NICs, so the driver *should* work. I got the NIC series
numbers mixed up on the driver page.

Cheers,
-- 
Saso

Re: [OmniOS-discuss] Dell R720 & Broadcom ethernet

2013-12-26 Thread Saso Kiselkov

On 12/26/13, 8:10 PM, Saso Kiselkov wrote:
> On 12/26/13, 7:50 PM, Saso Kiselkov wrote:
>> On 12/26/13, 7:45 PM, Scott Roberts wrote:
>>> Saso,
>>>
>>> It has the following:
>>>
>>> Broadcom 57800 2x10Gb DA/SFP+ + 2x1Gb BT Network Daughter Card
>>>
>>> Broadcom 57810 Dual Port 10Gb Direct Attach/SFP+ Network Adapter
>>>
>>> Total of (2) GigE and (4) 10Gb SFP+ ports.
>>
>> Sorry to disappoint, but those 10Gb parts aren't and probably never will
>> be supported. Unfortunately, Broadcom has a bad habit of shipping binary
>> blobs only, so there's no way we could even port over support from other
>> platforms (as we're doing with Intel hardware).
> 
> Minor correction here. It appears the tg3 Linux driver *is* open-source,
> so provided that you can get Broadcom to give it a liberal-enough
> license to let us include it in Illumos, you might be able to port it to
> Illumos. FreeBSD, which is the source of many drivers that Illumos ports
> over, sadly, doesn't seem to support it either.

Damn, correction #2 here: FreeBSD does include support for this since
10-current, so its bxe(4) driver is a potential target for porting:
http://www.freebsd.org/cgi/man.cgi?query=bxe&apropos=0&sektion=4&manpath=FreeBSD+10-current&arch=default&format=html

Now just to find the person with enough spare time on their hands to do
the actual grunt work...

-- 
Saso

Re: [OmniOS-discuss] Dell R720 & Broadcom ethernet

2013-12-26 Thread Saso Kiselkov

On 12/26/13, 7:50 PM, Saso Kiselkov wrote:
> On 12/26/13, 7:45 PM, Scott Roberts wrote:
>> Saso,
>>
>> It has the following:
>>
>> Broadcom 57800 2x10Gb DA/SFP+ + 2x1Gb BT Network Daughter Card
>>
>> Broadcom 57810 Dual Port 10Gb Direct Attach/SFP+ Network Adapter
>>
>> Total of (2) GigE and (4) 10Gb SFP+ ports.
> 
> Sorry to disappoint, but those 10Gb parts aren't and probably never will
> be supported. Unfortunately, Broadcom has a bad habit of shipping binary
> blobs only, so there's no way we could even port over support from other
> platforms (as we're doing with Intel hardware).

Minor correction here. It appears the tg3 Linux driver *is* open-source,
so provided that you can get Broadcom to give it a liberal-enough
license to let us include it in Illumos, you might be able to port it to
Illumos. FreeBSD, which is the source of many drivers that Illumos ports
over, sadly, doesn't seem to support it either.

Cheers,
-- 
Saso

Re: [OmniOS-discuss] Dell R720 & Broadcom ethernet

2013-12-26 Thread Saso Kiselkov

On 12/26/13, 7:55 PM, Dale Ghent wrote:
> 
> 
> Broadcomm does have GLDv3 drivers for "Solaris" - whether it'll work on 
> Illumos derivatives would be a quick and easy experiment with hopefully 
> favorable results.
> 
> http://www.broadcom.com/support/ethernet_nic/netxtremeii10.php

The Dell R720 doesn't use NetXtreme II NICs. They use the newer NetLink
NICs, for which even Broadcom remarks on their driver site:
"Note: Broadcom does not offer UnixWare, SCO and Solaris drivers for
NetLink Ethernet controllers."

Cheers,
-- 
Saso

Re: [OmniOS-discuss] Dell R720 & Broadcom ethernet

2013-12-26 Thread Saso Kiselkov

On 12/26/13, 7:45 PM, Scott Roberts wrote:
> Saso,
> 
> It has the following:
> 
> Broadcom 57800 2x10Gb DA/SFP+ + 2x1Gb BT Network Daughter Card
> 
> Broadcom 57810 Dual Port 10Gb Direct Attach/SFP+ Network Adapter
> 
> Total of (2) GigE and (4) 10Gb SFP+ ports.

Talk to your Dell sales rep; if it's newly purchased stuff, you might be
able to get those swapped out for Intel parts.

Cheers,
-- 
Saso

Re: [OmniOS-discuss] Dell R720 & Broadcom ethernet

2013-12-26 Thread Saso Kiselkov

On 12/26/13, 7:50 PM, Scott Roberts wrote:
> Saso,
> 
> No worries.  I'll grab the Broadcom GigE driver and install that first.
> Thanks!
> 
> WRT the 10Gig, it just flat out isn't compatible?  That's a real problem
> for my project.

Sadly, that's the situation, it seems. If you can spare a PCI-e slot,
though, you can get an Intel X520-DA2 very cheaply nowadays. Talk to
your Dell sales rep, they can easily knock the price down to ~$250.

Cheers,
-- 
Saso


Re: [OmniOS-discuss] Dell R720 & Broadcom ethernet

2013-12-26 Thread Saso Kiselkov

On 12/26/13, 7:45 PM, Scott Roberts wrote:
> Saso,
> 
> It has the following:
> 
> Broadcom 57800 2x10Gb DA/SFP+ + 2x1Gb BT Network Daughter Card
> 
> Broadcom 57810 Dual Port 10Gb Direct Attach/SFP+ Network Adapter
> 
> Total of (2) GigE and (4) 10Gb SFP+ ports.

Sorry to disappoint, but those 10Gb parts aren't and probably never will
be supported. Unfortunately, Broadcom has a bad habit of shipping binary
blobs only, so there's no way we could even port over support from other
platforms (as we're doing with Intel hardware). For Illumos I'd
recommend always going with Intel NICs whenever possible. I know it's
not much consolation, just a piece of forward-looking advice.

-- 
Saso

Re: [OmniOS-discuss] Dell R720 & Broadcom ethernet

2013-12-26 Thread Saso Kiselkov

On 12/26/13, 6:34 PM, Scott Roberts wrote:
> Hello all,
> 
> I have a new Dell R720 server and unfortunately the network interfaces do
> not show up.  "dladm show-phys" and "dladm show-link" don't return
> anything.  I can see e1000g0 and e1000g1 under /dev but when I try to
> plumb the interfaces I receive the error message "ifconfig: cannot plumb
> e1000g0: Could not open DLPI link".
> 
> Any thoughts on how I can get the network up and running?  This is the
> first time I have encountered this particular error and I have run OmniOS
> on a variety of hardware.

What does lspci say?

# lspci | grep Ethernet

(if you don't have it installed, "pkg install pciutils")

Cheers,
-- 
Saso


Re: [OmniOS-discuss] OT: Similar hardware like MIcroserver Gen8

2013-12-24 Thread Saso Kiselkov

On 12/24/13, 9:21 PM, Paul B. Henson wrote:
> On Tue, Dec 24, 2013 at 09:13:26PM +0000, Saso Kiselkov wrote:
> 
>> True, I had always assumed Intellipower meant "auto-throttle"
> 
> I think Intellipower means "if we called them 5400rpm drives nobody
> would buy them so let's be vague as hell" ;).
> 
>> invariable. I guess if you placed a microphone near to it and ran some
>> FT on it you could easily determine that, but then, I've got better
>> things to do with my life.
> 
> Seems some people don't:
> 
> http://www.silentpcreview.com/article1285-page5.html
> 
> "The two Red drives produced a slight tone at ~90 Hz, confirming that
> their motors spin at about 5,400 RPM."

Well, silentpcreview is a special brand of people who are obsessed about
stuff that I don't care about one bit. For me, the best solution to get
rid of hardware-induced noise is to simply shove the hardware in a
separate room and run cables to my workstation. Anyway, I digress.

Even if the drives are 5400rpm, they are plenty snappy enough for me and a
1Gbit/s network (not running any transactional workloads on them, of course).
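
For what it's worth, the conversion that review relies on is trivial:
assuming the tone is the fundamental of the spindle rotation, it's one
revolution per cycle, so RPM is just the frequency in Hz times 60. A quick
shell sketch (the function name is made up):

```shell
#!/bin/sh
# Convert a spindle tone frequency (Hz, i.e. revolutions per second) to RPM.
hz_to_rpm() {
    echo $(( $1 * 60 ))
}

echo "90 Hz  -> $(hz_to_rpm 90) RPM"
echo "120 Hz -> $(hz_to_rpm 120) RPM"
```

By the same arithmetic a 7200rpm drive would show up at 120 Hz.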

-- 
Saso

Re: [OmniOS-discuss] OT: Similar hardware like MIcroserver Gen8

2013-12-24 Thread Saso Kiselkov

On 12/24/13, 9:17 PM, Dale Ghent wrote:
> 
> On Dec 24, 2013, at 4:03 PM, Paul B. Henson  wrote:
> 
>> On 12/23/2013 3:37 AM, Saso Kiselkov wrote:
>>
>>> I'm quite fond of WD Red drives. They are just marginally more expensive
>>> than your el-cheapo home drive, but are built for 8760 on hours per year
>>> (24x7) and are capable for throttling down to 5400rpm when not in use
>>> (so they suck less power). WD markets them as designed for use in NAS.
>>
>> Do you have an authoritative source for the WD Red drives actually spinning 
>> at variable rates? From what I've read, while the actual rpm might vary 
>> between models, any given model only spins at a constant rpm, thought to be 
>> somewhere between 5000-5900. WD isn't exactly clear as to what exactly 
>> "Intellipower" means 8-/...
> 
> http://www.wdc.com/wdproducts/library/SpecSheet/ENG/2879-771442.pdf
> 
> "A fine-tuned balance of spin speed, transfer rate and caching algorithms 
> designed to deliver both significant power savings and solid performance. For 
> each drive model, WD may use a different, invariable RPM."
> 
> So much for technical details, but that's the gist of it.

Yeah, I read that too, but you have to realize that that's just
marketing speak for "I'm not tellin'".

Cheers,
-- 
Saso

Re: [OmniOS-discuss] OT: Similar hardware like MIcroserver Gen8

2013-12-24 Thread Saso Kiselkov

On 12/24/13, 9:03 PM, Paul B. Henson wrote:
> On 12/23/2013 3:37 AM, Saso Kiselkov wrote:
> 
>> I'm quite fond of WD Red drives. They are just marginally more expensive
>> than your el-cheapo home drive, but are built for 8760 on hours per year
>> (24x7) and are capable for throttling down to 5400rpm when not in use
>> (so they suck less power). WD markets them as designed for use in NAS.
> 
> Do you have an authoritative source for the WD Red drives actually
> spinning at variable rates? From what I've read, while the actual rpm
> might vary between models, any given model only spins at a constant rpm,
> thought to be somewhere between 5000-5900. WD isn't exactly clear as to
> what exactly "Intellipower" means 8-/...
> 
> I have been pretty happy with my 3TB WD reds so far…

True, I had always assumed Intellipower meant "auto-throttle", but they
do mention in their datasheets that it can be anything-to-anything, even
invariable. I guess if you placed a microphone near it and ran a
Fourier transform on the recording you could easily determine that, but
then, I've got better things to do with my life.

So yeah, they *may* or *may not* auto-throttle :)

-- 
Saso

Re: [OmniOS-discuss] OT: Similar hardware like MIcroserver Gen8

2013-12-23 Thread Saso Kiselkov

On 12/23/13, 2:36 PM, Michael Rasmussen wrote:
> On Mon, 23 Dec 2013 12:26:47 +
> Saso Kiselkov  wrote:
> 
>>
>> That's one damn sweet looking piece of hardware, though it's probably
>> going to cost more than the entire HP MicroServer (seeing as the CPU by
>> itself is $171 MSRP). Still, it looks like one kick-ass toy to play with.
>>
> Quad core available in Germany for €259
> http://www.servershop-bayern.de/index.php?page=product&info=9583
> Amazon.co.uk for £267
> http://www.amazon.co.uk/C2550D4I-ASROCK-MOTHERBOARD-MINI-ITX-FCBGA1283/dp/B00GG94YDS
> 
> servethehome.com has a review:
> http://www.servethehome.com/Server-detail/asrock-c2750d4i-atom-c2750-storage-platform-review/

Well, if I were in the market looking for a new home server build (which
I'm not, at least not yet), the deciding factor for me would be not so
much the price as the presence of a fully-featured IP KVM with remote media.
That would enable me to freely experiment with new ZFS development
without having to worry about getting my machine into a non-bootable
state (especially when I'm thousands of miles away from it).

I can definitely see a market for this board.

Cheers,
-- 
Saso

Re: [OmniOS-discuss] OT: Similar hardware like MIcroserver Gen8

2013-12-23 Thread Saso Kiselkov

On 12/23/13, 2:30 PM, Michael Rasmussen wrote:
> On Mon, 23 Dec 2013 12:26:47 +
> Saso Kiselkov  wrote:
> 
>>
>> That's one damn sweet looking piece of hardware, though it's probably
>> going to cost more than the entire HP MicroServer (seeing as the CPU by
>> itself is $171 MSRP). Still, it looks like one kick-ass toy to play with.
>>
> On Newegg it is listed for a price of: $392.99
> http://www.newegg.com/Product/Product.aspx?Item=N82E16813157475
> 
> Combining the individual components on the board it seems like a fair
> price.
> 
> There is also a quad core (otherwise same specs) available for the price
> of: $289.99
> http://www.asrock.com/server/overview.asp?Model=C2550D4I

Well, a brand new Gen 7 MicroServer, with case, CPU, PSU and basic
equipment (2GB RAM, 1 250GB HDD) will set you back about $280-290:
http://www.amazon.co.uk/gp/aag/main/ref=olp_merch_name_3?ie=UTF8&asin=B00AHQUX86&isAmazonFulfilled=0&seller=A2JOOGMTVUOES
But you get what you pay for. That Asrock board has a lot more
horsepower and if that's what you need, go for it. Ultimately it really
comes down to choosing the right tool for the job. The MicroServer works
for me.

-- 
Saso

Re: [OmniOS-discuss] OT: Similar hardware like MIcroserver Gen8

2013-12-23 Thread Saso Kiselkov

On 12/23/13, 2:35 PM, Eric Sproul wrote:
> On Mon, Dec 23, 2013 at 7:26 AM, Saso Kiselkov  wrote:
>> On 12/23/13, 12:09 PM, Michael Rasmussen wrote:
>>> On Mon, 23 Dec 2013 11:30:50 +
>>> "C. L. Martinez"  wrote:
>>>
>>>>
>>>> I will add Microserver to my basket, but any more hardware
>>>> recommendations for OmniOS??
>>>> ___
>>> While we are at it. Have somebody tried this motherboard?
>>> http://www.asrock.com/server/overview.asp?Model=C2750D4I
>>
>> That's one damn sweet looking piece of hardware, though it's probably
>> going to cost more than the entire HP MicroServer (seeing as the CPU by
>> itself is $171 MSRP). Still, it looks like one kick-ass toy to play with.
> 
> The Marvell SATA ports probably won't work, but you'll presumably
> still have 6 from the Intel controller, if ahci(7D) will attach.

Don't worry, I didn't plan on buying it, though by the looks of it, it
would make a formidable home/SOHO server if you need something with more
oomph than a MicroServer.

-- 
Saso

Re: [OmniOS-discuss] OT: Similar hardware like MIcroserver Gen8

2013-12-23 Thread Saso Kiselkov

On 12/23/13, 12:09 PM, Michael Rasmussen wrote:
> On Mon, 23 Dec 2013 11:30:50 +
> "C. L. Martinez"  wrote:
> 
>>
>> I will add Microserver to my basket, but any more hardware
>> recommendations for OmniOS??
>> ___
> While we are at it. Have somebody tried this motherboard?
> http://www.asrock.com/server/overview.asp?Model=C2750D4I

That's one damn sweet looking piece of hardware, though it's probably
going to cost more than the entire HP MicroServer (seeing as the CPU by
itself is $171 MSRP). Still, it looks like one kick-ass toy to play with.

-- 
Saso

Re: [OmniOS-discuss] OT: Similar hardware like MIcroserver Gen8

2013-12-23 Thread Saso Kiselkov

On 12/23/13, 11:30 AM, C. L. Martinez wrote:
> On Fri, Dec 20, 2013 at 3:36 PM, Saso Kiselkov  wrote:
>> On 12/20/13, 3:22 PM, C. L. Martinez wrote:
>>> On Fri, Dec 20, 2013 at 2:53 PM, Saso Kiselkov  
>>> wrote:
>>>> On Thu, Dec 19, 2013 at 11:02 AM, C. L. Martinez 
>>>> wrote:
>>>>> Thanks jimklimov, but my idea is not to use a microserver box due to
>>>>> problems with SATA disks in AHCI mode under Gen8 series ... And Gen7
>>>>> is too slowly machines ...
>>>>
>>>> Gen7 too slow for you? Didn't you say you're doing a home NAS? Don't
>>>> underestimate the venerable MicroServer, it's plenty fast enough to
>>>> saturate its Gigabit NIC, even while transparently compressing:
>>>> https://www.illumos.org/attachments/822/lz4_compression_bench.ods
>>>> (This is a set of ZFS LZ4 compression benchmarks I ran on my old Gen6
>>>> MicroServer box, so a Gen7 is even faster than this.)
>>>>
>>>> Unless you're doing some serious compute or web serving I wouldn't worry
>>>> about it. It's only 25W TDP, so you won't break the bank running the
>>>> thing 24x7 at home, and it's multi-core and has HVM and ECC memory support.
>>>>
>>>> Cheers,
>>>> --
>>>> Saso
>>>
>>> Thanks Saso ... I have discarded Microserver N56L due to the reports
>>> provided by people in their blogs about poor performance with AMD cpu
>>> using it with ZFS (most of them with FreeNAS).
>>
>> That's strange. Can you point me to these reviews? Sure it's not fast
>> enough to do a full Illumos nightly build on the spot, but it's plenty
>> fast enough for serving files and even standard web server usage (been
>> toying with the idea of replacing some old power-hungry clunkers with it).
>>
>>> But, if I buy a Microserver N56L, do I need to buy some additonal
>>> storage adapter or it is ok with the default??
>>
>> Never had any trouble with the on-board AHCI one, though you may want to
>> reflash the BIOS if you want to use the 5th SATA port in AHCI mode. If
>> you plan on using SAS drives, you can just get an OEM-branded LSI card
>> (they're much cheaper than buying LSI-original ones, despite having the
>> same components:
>> http://accessories.euro.dell.com/sna/productdetail.aspx?c=uk&l=en&s=bsd&sku=405-11540
>> ) and just replug the SFF-8087 multi-link drive-bay connector from the
>> motherboard into your HBA. Alternatively and if you're going for some
>> really crazy deployments, I've used an external micro-JBOD
>> (http://www.amazon.co.uk/Icy-Box-IB-545SSK-5-Bay-Channel/dp/B006BQYSFA)
>> with two MicroServers talking to it over external LSI HBAs
>> (http://accessories.euro.dell.com/sna/productdetail.aspx?c=uk&l=en&s=bsd&cs=ukbsdt1&sku=405-11482&ref=2531xC)
>>
>>> And another point, I need to use this NAS with OmniOS to serve iscsi
>>> disks for one KVM host (CentOS).
>>
> 
> I will add Microserver to my basket, but any more hardware
> recommendations for OmniOS??

I'm quite fond of WD Red drives. They are just marginally more expensive
than your el-cheapo home drive, but are built for 8760 power-on hours per
year (24x7) and are capable of throttling down to 5400rpm when not in use
(so they suck less power). WD markets them as designed for use in NAS.
I've got 4 of these in my MicroServer and I've never had any issues with
them.

Cheers,
-- 
Saso

Re: [OmniOS-discuss] OT: Similar hardware like MIcroserver Gen8

2013-12-20 Thread Saso Kiselkov

On 12/20/13, 3:22 PM, C. L. Martinez wrote:
> On Fri, Dec 20, 2013 at 2:53 PM, Saso Kiselkov  wrote:
>> On Thu, Dec 19, 2013 at 11:02 AM, C. L. Martinez 
>> wrote:
>>> Thanks jimklimov, but my idea is not to use a microserver box due to
>>> problems with SATA disks in AHCI mode under Gen8 series ... And Gen7
>>> is too slowly machines ...
>>
>> Gen7 too slow for you? Didn't you say you're doing a home NAS? Don't
>> underestimate the venerable MicroServer, it's plenty fast enough to
>> saturate its Gigabit NIC, even while transparently compressing:
>> https://www.illumos.org/attachments/822/lz4_compression_bench.ods
>> (This is a set of ZFS LZ4 compression benchmarks I ran on my old Gen6
>> MicroServer box, so a Gen7 is even faster than this.)
>>
>> Unless you're doing some serious compute or web serving I wouldn't worry
>> about it. It's only 25W TDP, so you won't break the bank running the
>> thing 24x7 at home, and it's multi-core and has HVM and ECC memory support.
>>
>> Cheers,
>> --
>> Saso
> 
> Thanks Saso ... I have discarded Microserver N56L due to the reports
> provided by people in their blogs about poor performance with AMD cpu
> using it with ZFS (most of them with FreeNAS).

That's strange. Can you point me to these reviews? Sure it's not fast
enough to do a full Illumos nightly build on the spot, but it's plenty
fast enough for serving files and even standard web server usage (been
toying with the idea of replacing some old power-hungry clunkers with it).

> But, if I buy a Microserver N56L, do I need to buy some additonal
> storage adapter or it is ok with the default??

Never had any trouble with the on-board AHCI one, though you may want to
reflash the BIOS if you want to use the 5th SATA port in AHCI mode. If
you plan on using SAS drives, you can just get an OEM-branded LSI card
(they're much cheaper than buying LSI-original ones, despite having the
same components:
http://accessories.euro.dell.com/sna/productdetail.aspx?c=uk&l=en&s=bsd&sku=405-11540
) and just replug the SFF-8087 multi-link drive-bay connector from the
motherboard into your HBA. Alternatively, if you're going for some
really crazy deployments, I've used an external micro-JBOD
(http://www.amazon.co.uk/Icy-Box-IB-545SSK-5-Bay-Channel/dp/B006BQYSFA)
with two MicroServers talking to it over external LSI HBAs
(http://accessories.euro.dell.com/sna/productdetail.aspx?c=uk&l=en&s=bsd&cs=ukbsdt1&sku=405-11482&ref=2531xC)

> And another point, I need to use this NAS with OmniOS to serve iscsi
> disks for one KVM host (CentOS).

Just use the COMSTAR iSCSI target as usual.
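
In case "as usual" isn't obvious, here's a rough sketch of the steps,
assuming the COMSTAR and iSCSI target packages are already installed. The
function name and the volume name in the usage note are made up, and it
assumes "stmfadm create-lu" prints the new GUID as the last word of its
output, so treat it as a starting point rather than a recipe.

```shell
#!/bin/sh
# Export a ZFS volume as an iSCSI LUN via COMSTAR.
# Volume name and size are supplied by the caller; nothing here is from this thread.
export_iscsi_lun() {
    vol=$1     # e.g. tank/kvmdisk0 (placeholder)
    size=$2    # e.g. 50G

    # Make sure the COMSTAR framework and the iSCSI target service are running.
    svcadm enable -s svc:/system/stmf:default || return 1
    svcadm enable -rs svc:/network/iscsi/target:default || return 1

    # Create the backing zvol and register it as a SCSI logical unit.
    zfs create -V "$size" "$vol" || return 1
    # Assumption: the GUID is the last field of the "Logical unit created" line.
    guid=$(stmfadm create-lu "/dev/zvol/rdsk/$vol" | awk '{print $NF}')
    [ -n "$guid" ] || return 1

    # Expose the LU to all initiators (no host/target groups) and create a target.
    stmfadm add-view "$guid" || return 1
    itadm create-target
}
```

Run it as root, e.g. "export_iscsi_lun tank/kvmdisk0 50G", then point the
CentOS initiator at the box. For anything beyond a home setup you'd want
host groups or CHAP instead of a view that's open to everyone.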

Cheers,
-- 
Saso

Re: [OmniOS-discuss] OT: Similar hardware like MIcroserver Gen8

2013-12-20 Thread Saso Kiselkov

On Thu, Dec 19, 2013 at 11:02 AM, C. L. Martinez 
wrote:
> Thanks jimklimov, but my idea is not to use a microserver box due to
> problems with SATA disks in AHCI mode under Gen8 series ... And Gen7
> is too slowly machines ...

Gen7 too slow for you? Didn't you say you're doing a home NAS? Don't
underestimate the venerable MicroServer, it's plenty fast enough to
saturate its Gigabit NIC, even while transparently compressing:
https://www.illumos.org/attachments/822/lz4_compression_bench.ods
(This is a set of ZFS LZ4 compression benchmarks I ran on my old Gen6
MicroServer box, so a Gen7 is even faster than this.)

Unless you're doing some serious compute or web serving I wouldn't worry
about it. It's only 25W TDP, so you won't break the bank running the
thing 24x7 at home, and it's multi-core and has HVM and ECC memory support.

Cheers,
-- 
Saso

Re: [OmniOS-discuss] OT: Similar hardware like MIcroserver Gen8

2013-12-20 Thread Saso Kiselkov

On 12/20/13, 1:50 PM, Johan Kragsterman wrote:
> Sorry, forgot the list:
> 
> Well, this depends a lot of what you're going to use it for...
> 
> If you're concerned about performance, or if you're concerned about how much 
> space you need, or both or neither...
> 
> I would buy a second hand HP DL380 G6 for performance needs, or a DL180 G6 
> for space needs.
> 
> If you would want both perfermance and space, I'd buy a DL380 G6 and an 
> expansion jbod, perhaps HP MSA60 if you like HP gear.
> 
> In this way you will get much more for your buck.

And you get the added benefit of watching your electricity meter turn
into a blower as it spins itself to death and eats your wallet in the
process. By a quick first-order approximation, a server drawing 250W
around the clock, all year, will set you back about $300 a year - that's
more than the cost of a brand new HP MicroServer. By comparison, an HP
MicroServer's draw is more in the 25-35W region.
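
The estimate above can be checked with a few lines of arithmetic. This is
only a sketch: the tariff of $0.14 per kWh is an assumed, illustrative
rate that does not come from this thread, and the 30W figure is simply
the middle of the 25-35W range quoted above. Substitute your own numbers.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def annual_cost_usd(watts, usd_per_kwh):
    """Yearly electricity cost of a constant load running around the clock."""
    kwh_per_year = watts * HOURS_PER_YEAR / 1000.0
    return kwh_per_year * usd_per_kwh


# ASSUMPTION: illustrative tariff, not a figure from the thread.
ASSUMED_USD_PER_KWH = 0.14

if __name__ == "__main__":
    for label, watts in (("250W server", 250), ("MicroServer (~30W)", 30)):
        cost = annual_cost_usd(watts, ASSUMED_USD_PER_KWH)
        print(f"{label}: ${cost:.0f}/year")
```

With that assumed tariff, the 250W box lands in the neighbourhood of the
$300 a year mentioned above, and the MicroServer at roughly an eighth of
that.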

Cheers,
-- 
Saso

Re: [OmniOS-discuss] SSH over HTTPS

2013-12-17 Thread Saso Kiselkov

On 12/17/13, 2:14 PM, John D Groenveld wrote:
> In message <52b03a8d.8090...@gmail.com>, Saso Kiselkov writes:
>> Minor side-note, unless the proxy is trying to brutally MITM the session
>> (forged certificates and all), then there's absolutely no way for it to
>> know if a particular TLS session is carrying HTTPS traffic or something
>> else (short of doing some kind of statistical analysis of the traffic
>> flow, that is).
> 
> I believe Palo Alto Networks' product combines stateful firewall and
> application proxy inspection.
> https://www.paloaltonetworks.com/content/dam/paloaltonetworks-com/en_US/assets/pdf/tech-briefs/paloaltonetworks-vs-proxy.pdf

Which does exactly that by using statistical analysis, as I mentioned.
That having been said, it's trivial to break through that by simply
encapsulating your SSH traffic using HTTP tunneling software. Then, for
all intents and purposes, your traffic looks like regular HTTPS (because
it is). Of course, they may choose to filter anything that exchanges
small HTTP requests too frequently, but that would probably break a
fair number of AJAX-based web apps such as GMail (which can be rather
chatty over the line, frequently exchanging tiny XML blobs as you type
messages, etc.).

Cheers,
-- 
Saso

Re: [OmniOS-discuss] SSH over HTTPS

2013-12-17 Thread Saso Kiselkov

On 12/17/13, 8:39 AM, Jim Klimov wrote:
> Possibly, your work's firewall
> is smart enough to probe the port you requested and find out that
> there is no HTTP(S) on it, so it denies the connection.

Minor side-note, unless the proxy is trying to brutally MITM the session
(forged certificates and all), then there's absolutely no way for it to
know if a particular TLS session is carrying HTTPS traffic or something
else (short of doing some kind of statistical analysis of the traffic
flow, that is).

Cheers,
-- 
Saso

Re: [OmniOS-discuss] [zfs] Re: [SPAM] Bizarre zfs-related hang in omnios r151008 on 1-CPU VM

2013-12-06 Thread Saso Kiselkov

On 12/6/13, 8:19 PM, Eric Sproul wrote:
> Saso,
> Here is an updated zfs kernel module from a build with the "4347 ZPL
> can use dmu_tx_assign(TXG_WAIT)" fix applied:
> 
> http://omnios.omniti.com/media/zfs-driver-with-txg_wait-fix.amd64.r151008
> 
> sha1 (zfs-driver-with-txg_wait-fix.amd64.r151008) =
> 06d9451ad6f157fd61877bf24d475b14f913e964
> 
> If you need additional files or anything else, just let us know.
> 

Thanks Eric, the new module works like a charm. Here's the installation
I did:

root@omnios:~# beadm create -e omnios-r151008 omnios-r151008-fix
Created successfully
root@omnios:~# beadm mount omnios-r151008-fix
Mounted successfully on: '/tmp/tmp.JAaqdb'
root@omnios:~# cd /tmp/tmp.JAaqdb/
root@omnios:/tmp/tmp.JAaqdb# cp ~/zfs-driver-with-txg_wait-fix.amd64.r151008 kernel/fs/amd64/zfs
root@omnios:/tmp/tmp.JAaqdb# cd
root@omnios:~# bootadm update-archive -R /tmp/tmp.JAaqdb
Creating boot_archive for /tmp/tmp.JAaqdb
updating /tmp/tmp.JAaqdb/platform/i86pc/boot_archive
updating /tmp/tmp.JAaqdb/platform/i86pc/amd64/boot_archive
root@omnios:~# beadm umount omnios-r151008-fix
Unmounted successfully
root@omnios:~# reboot -p
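
Before copying a replacement kernel module into a boot environment, it
is probably worth comparing the download against the sha1 quoted above.
The following is only a sketch of such a check; the helper names are
made up for illustration, and the script takes the file path and the
expected digest as its two arguments.

```python
import hashlib
import sys


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_sha1(path, expected_hex):
    """Compare a file's SHA-1 against a published hex digest."""
    return sha1_of_file(path) == expected_hex.strip().lower()


# Hypothetical usage: script.py <module file> <expected sha1 hex>
if __name__ == "__main__" and len(sys.argv) == 3:
    ok = matches_published_sha1(sys.argv[1], sys.argv[2])
    print("sha1 OK" if ok else "sha1 MISMATCH")
    sys.exit(0 if ok else 1)
```

For the module in this thread, the expected digest would be the
06d9451ad6f157fd61877bf24d475b14f913e964 value from the quoted message.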

And here's the test:

dd if=/dev/urandom of=test bs=1M count=128 ; sync

On the original BE this hangs; on the new BE it works without a hitch.
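
For anyone who wants to script the same kind of write test, here is a
rough Python equivalent of the dd line. It is a sketch under stated
assumptions: the function name is made up for illustration, and it calls
fsync on the one file rather than running a system-wide sync, so it is
not an exact substitute.

```python
import os
import sys


def write_random_file(path, block_size=1 << 20, count=128):
    """Write `count` blocks of random bytes to `path`, then flush to disk.

    The defaults mirror bs=1M count=128 from the dd command.
    """
    written = 0
    with open(path, "wb") as f:
        for _ in range(count):
            written += f.write(os.urandom(block_size))
        f.flush()
        # ASSUMPTION: fsync of this file stands in for the global `sync`.
        os.fsync(f.fileno())
    return written


# Hypothetical usage: script.py <output file on the pool under test>
if __name__ == "__main__" and len(sys.argv) == 2:
    total = write_random_file(sys.argv[1])
    print(f"wrote {total} bytes to {sys.argv[1]}")
```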

I'd say ship it!

Best wishes,
-- 
Saso

Re: [OmniOS-discuss] [SPAM] Bizarre zfs-related hang in omnios r151008 on 1-CPU VM

2013-12-06 Thread Saso Kiselkov

On 12/6/13, 5:39 AM, Rob Logan wrote:
> 
>> on the latest stable omnios release. When I'm running in VMware Fusion
>> on a 1-CPU VM and doing any significant write IO to the pool (e.g. just
>>   atomic_dec_32_nv+8()
>>   dbuf_read+0x179(ff00d2393600, ff00c72f98f0, a)
>>   dmu_tx_check_ioerr+0x76(ff00c72f98f0, ff00d2279cf0, 0, 1e0)
>>   dmu_tx_count_write+0x395(ff00ce0536e0, 3c04000, 4000)
>>   dmu_tx_hold_write+0x5a(ff00d1a55300, 4009, 3c04000, 4000)
>>   zfs_write+0x3e3(ff00d09ef540, ff00028e7e60, 0, ff00cd511748, 0)
>>   fop_write+0x5b(ff00d09ef540, ff00028e7e60, 0, ff00cd511748, 0)
>>   write+0x250(1, 440660, 4000)
>>   sys_syscall+0x17a()
> 
> 
> doing the normal re-write of root in r151008 three times 
> into lz4 didn’t have any issues on my 2cpu 2G vbox
> 
> root@OmniOS:~# lspci
> 00:02.0 VGA compatible controller: InnoTek Systemberatung GmbH VirtualBox Graphics Adapter
> 00:04.0 System peripheral: InnoTek Systemberatung GmbH VirtualBox Guest Service
> 00:05.0 Multimedia audio controller: Intel Corporation 82801AA AC'97 Audio Controller (rev 01)
> 00:07.0 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08)
> 00:11.0 Ethernet controller: Intel Corporation 82545EM Gigabit Ethernet Controller (Copper) (rev 02)
> 00:18.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev f2)
> 00:19.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev f2)
> 00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 02)
> 00:1f.2 SATA controller: Intel Corporation 82801HM/HEM (ICH8M/ICH8M-E) SATA Controller [AHCI mode] (rev 02)
> 00:1f.4 USB controller: Apple Inc. KeyLargo/Intrepid USB
> 
> root@OmniOS:~# zfs get all | grep refcompressratio
> rpool                                refcompressratio  1.00x  -
> rpool/ROOT                           refcompressratio  1.00x  -
> rpool/ROOT/start                     refcompressratio  1.85x  -
> rpool/ROOT/work                      refcompressratio  1.98x  -
> rpool/ROOT/work@2013-12-05-19:24:16  refcompressratio  1.85x  -
> 
> not sure how to reproduce.

You need a 1-CPU system. As Matt pointed out, the hang is most probably
caused by a deadlock that was resolved in e722410.

OmniTI: I believe rolling this into the next weekly patch cycle might be
kind of important?

Cheers,
-- 
Saso

Re: [OmniOS-discuss] [zfs] Bizarre zfs-related hang in omnios r151008 on 1-CPU VM

2013-12-06 Thread Saso Kiselkov

On 12/6/13, 6:31 AM, Matthew Ahrens wrote:
> Be sure you have the following fix; without it I recall seeing spins
> from the ZPL similar to that stack trace.  With only 1 cpu, if a kernel
> thread spins, it can be very hard to get other threads to run.
> 
> commit e722410c49fe67cbf0f639cbcc288bd6cbcf7dd1
> Author: Matthew Ahrens <mahr...@delphix.com>
> Date:   Tue Nov 26 13:47:33 2013 -0500
> 
>     4347 ZPL can use dmu_tx_assign(TXG_WAIT)
>     Reviewed by: George Wilson
>     Reviewed by: Adam Leventhal <a...@delphix.com>
>     Reviewed by: Dan McDonald
>     Reviewed by: Boris Protopopov
>     Approved by: Dan McDonald

That sounds like pretty much exactly what I hit. Gonna ask the omnios
maintainers to reroll a new zfs module and retest. All of my custom
kernels are newer than this, so it's likely that that saved my bacon.

Cheers,
-- 
Saso


[OmniOS-discuss] Bizarre zfs-related hang in omnios r151008 on 1-CPU VM

2013-12-05 Thread Saso Kiselkov

I'm investigating a bizarre hang situation which I noticed by accident
on the latest stable omnios release. When I'm running in VMware Fusion
on a 1-CPU VM and doing any significant write IO to the pool (e.g. just
dd'ing something around is enough to trigger this), the VM will, with
100% certainty, hang. Console input works, but all userspace programs
are stopped and nothing responds (e.g. attempting to telnet to sshd over
the network establishes the socket, but then sshd doesn't print the
version string).

Using some dtrace foo and kmdb I was able to trace it (roughly, the
exact stack trace changes between hangs, which is mighty weird in itself):

atomic_dec_32_nv+8()
dbuf_read+0x179(ff00d2393600, ff00c72f98f0, a)
dmu_tx_check_ioerr+0x76(ff00c72f98f0, ff00d2279cf0, 0, 1e0)
dmu_tx_count_write+0x395(ff00ce0536e0, 3c04000, 4000)
dmu_tx_hold_write+0x5a(ff00d1a55300, 4009, 3c04000, 4000)
zfs_write+0x3e3(ff00d09ef540, ff00028e7e60, 0, ff00cd511748, 0)
fop_write+0x5b(ff00d09ef540, ff00028e7e60, 0, ff00cd511748, 0)
write+0x250(1, 440660, 4000)
sys_syscall+0x17a()

(usually the trace is identical up to dmu_tx_hold_write)

I can definitely confirm that this doesn't happen on omnios r151006 and
it doesn't happen on my vanilla kernels either. My suspicion is that
something got botched in the "OMNIOS#72 Integrate Joyent updated zone
write throttle" commit, but I can't put my finger on it.

Can somebody please confirm this?

Cheers,
-- 
Saso

Re: [OmniOS-discuss] 2x acctual disk quantity

2013-12-04 Thread Saso Kiselkov

Just for the record and as a follow-up, I got the variable name wrong,
the correct name should have been:

scsi-vhci-failover-override =
   "HP  EG0300", "f_sym",
   "HP  DG0300", "f_sym";

With this setting, the disks got correctly discovered and set up by
scsi_vhci.

-- 
Saso
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss
