[OmniOS-discuss] OmniOS Panic on high ZFS Write Load

2014-05-15 Thread Rune Tipsmark
My server panics under high write load when VMware provisions a thick disk to the 
LU over InfiniBand.

I get the error shown here (http://i.imgur.com/fxk79zJ.png) every time I put more 
than 1.5 GB/sec of load on my ZFS box.

I've tried various disks, controllers, OmniOS releases, OpenIndiana releases, etc.

Always the same, easy to reproduce.

I've Googled at length for anything relevant, but found nothing.

Does anyone have any idea? I don't really want to abandon ZFS just yet.

Venlig hilsen / Best regards,
Rune Tipsmark

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] OmniOS Panic on high ZFS Write Load

2014-05-16 Thread Rune Tipsmark
Hi guys,

After having tried various distros as mentioned, and after having tried SLC and 
MLC PCI-E devices as well as SSD disks, I think I have actually found the issue.

Previously I had a bunch of SATA disks connected to my SAS controller as well 
as a bunch of SAS disks... now that I have removed the SATA disks and only have SAS 
disks left, I have not been able to reproduce the issue (regardless of the fact that 
I didn't even use the SAS controller for some of the tests that crashed). Very weird, 
and what a waste of a few hundred hours of reinstalling/testing, swapping cables, 
switches and memory, messing with BIOS settings and what have you.

I now have two stable pools which each write a reasonable ~430 MB/sec with 
sync=always on without crashing.

Lesson - stay far away from SATA disks on LSI 9207-4i4e

Thanks for all the feedback.

Br,
Rune

-Original Message-
From: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com] On Behalf 
Of Dan McDonald
Sent: Thursday, May 15, 2014 7:44 AM
To: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] OmniOS Panic on high ZFS Write Load


On May 15, 2014, at 10:41 AM, Dan McDonald  wrote:
> 
> What OmniOS version are you running?  Also, how much memory do you have on 
> this system, and have you done any crazy tunings to increase kernel memory 
> usage?

Sorry, you said you tried this on many versions.

If you can, stick with r151010 (our latest stable) and get a system dump from 
this box.

Also, as Narayan points out, checking for hardware errors would be helpful.

Also, I may ask you to reproduce this bug with kernel memory debugging enabled. 
 If something is using freed memory, that'd be nice to know.  And finally, are 
you using 3rd-party binary drivers?  Or the native ones in your distro?

Dan
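
For reference, a minimal sketch of what that usually involves on an illumos box
(dumpadm, /etc/system and savecore are the standard facilities; the kmem_flags value
shown is the usual "all debugging" setting and is an assumption, not something Dan
specified):

dumpadm                                   # verify the dump device and savecore directory
echo "set kmem_flags=0xf" >> /etc/system  # kernel memory debugging; takes effect after a reboot
# reboot, reproduce the panic, then collect the dump so it can be examined with mdb:
savecore -v

Note that kmem_flags=0xf costs memory and CPU, so it is normally only left on while
chasing a suspected use-after-free like the one Dan mentions.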

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] OmniOS Panic on high ZFS Write Load

2014-05-16 Thread Rune Tipsmark
SAS expander and 9 Western Digital WD4003FZEX drives.
Now, with 10 Seagate ST4000NM0023 drives instead, things seem to work much better.

/Rune


-Original Message-
From: Dan McDonald [mailto:dan...@omniti.com] 
Sent: Friday, May 16, 2014 10:41 AM
To: Rune Tipsmark
Cc: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] OmniOS Panic on high ZFS Write Load


On May 16, 2014, at 12:57 PM, Rune Tipsmark  wrote:

> Hi guys,
> 
> After having tried various distros as mentioned and after having tried SLC 
> and MLC PCI-E devices as well as SSD disks I think I actually found the issue.
> 
> Previously I had a bunch of SATA disks connected to my SAS controller as well 
> as a bunch of SAS disks... now that I removed the SATA disks and only have 
> SAS disks left I have not been able to reproduce the issue (regardless the 
> fact I didn't even use the SAS controller for some tests that crashes). Very 
> weird and what a waste of a few hundred hours of reinstalling/testing, 
> swapping cables, switches, memory, messing with bios settings and what have 
> we.
> 
> I now have two stable pools which each write a reasonable ~430 MB/sec with 
> sync=always on without crashing.
> 
> Lesson - stay far away from SATA disks on LSI 9207-4i4e

Were you using a JBOD or other expander?  I've *heard* you can directly attach 
SATA disks to an mpt_sas board if you're careful.

But generally speaking, it's operationally foolish to attach SATA drives 
anywhere other than to dedicated SATA ports.

Thanks,
Dan

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Slow write performance

2014-05-16 Thread Rune Tipsmark
Not sure if it's expected, but for reference here are some numbers from my two 
pools. The drives are Seagate ST4000NM0023 and the controller is an LSI 9207-4i4e.


NAME   STATE READ WRITE CKSUM
pool01 ONLINE   0 0 0
  mirror-0 ONLINE   0 0 0
c1t5000C50055FC9533d0  ONLINE   0 0 0
c1t5000C50055FE6A63d0  ONLINE   0 0 0
  mirror-1 ONLINE   0 0 0
c1t5000C5005708296Fd0  ONLINE   0 0 0
c1t5000C5005708351Bd0  ONLINE   0 0 0
  mirror-2 ONLINE   0 0 0
c1t5000C500570858EFd0  ONLINE   0 0 0
c1t5000C50057085A6Bd0  ONLINE   0 0 0
logs
  c9d0 ONLINE   0 0 0
cache
  c7d0 ONLINE   0 0 0
  c11d0ONLINE   0 0 0

root@zfs10:~# time dd if=/dev/zero of=/pool02/dd.tst bs=1024000 count=20000
20000+0 records in
20000+0 records out
20480000000 bytes (20 GB) copied, 50.6698 s, 404 MB/s

And the other pool

NAME   STATE READ WRITE CKSUM
pool02 ONLINE   0 0 0
  mirror-0 ONLINE   0 0 0
c1t5000C50057086307d0  ONLINE   0 0 0
c1t5000C50057086B67d0  ONLINE   0 0 0
  mirror-1 ONLINE   0 0 0
c1t5000C500570870D3d0  ONLINE   0 0 0
c1t5000C50057089753d0  ONLINE   0 0 0
logs
  c10d0ONLINE   0 0 0
cache
  c12d0ONLINE   0 0 0
  c8d0 ONLINE   0 0 0

root@zfs10:~# time dd if=/dev/zero of=/pool01/dd.tst bs=1024000 count=20000
20000+0 records in
20000+0 records out
20480000000 bytes (20 GB) copied, 50.2413 s, 408 MB/s

Maybe try creating a pool from disks on one of the controllers and test.
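
A minimal sketch of that per-controller test (the disk names are placeholders; pick
two drives that sit behind the same HBA, and destroy the test pool afterwards):

zpool create testpool mirror c4t<diskA>d0 c4t<diskB>d0
time dd if=/dev/zero of=/testpool/dd.tst bs=1024000 count=20000
zpool destroy testpool

Repeating this for each HBA should show whether one controller or path is the
bottleneck.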


Venlig hilsen / Best regards,
Rune Tipsmark

From: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com] On Behalf 
Of Matthew Lagoe
Sent: Friday, May 16, 2014 9:15 PM
To: omnios-discuss@lists.omniti.com
Subject: [OmniOS-discuss] Slow write performance

I have a system that I am building: 2 x E5-2620 with 3 x LSI 9207-8e, one drive 
from each mirror plugged into one HBA. The configuration is below.

On reads I get about 653 MB/s (which is great) and on writes 263 MB/s.

A single drive gets around 120 MB/s read and 90 MB/s write and is a Seagate 
ST3000NM0023 SAS drive. I would assume write performance should go up with the 
number of vdevs; however, it seems to only increase by ~60%.

Is this expected?

NAME   STATE READ WRITE CKSUM
tank   ONLINE 0 0 0
  mirror-0 ONLINE   0 0 0
c4t5000C50057B946C7d0  ONLINE   0 0 0
c4t5000C50057B9792Bd0  ONLINE   0 0 0
c4t5000C50057BA72B3d0  ONLINE   0 0 0
  mirror-1 ONLINE   0 0 0
c4t5000C50057BA7F0Bd0  ONLINE   0 0 0
c4t5000C50057BFA69Bd0  ONLINE   0 0 0
c4t5000C50057C1A177d0  ONLINE   0 0 0
  mirror-2 ONLINE   0 0 0
c4t5000C50057C3CDF3d0  ONLINE   0 0 0
c4t5000C5005815632Fd0  ONLINE   0 0 0
c4t5000C5005815650Fd0  ONLINE   0 0 0
  mirror-3 ONLINE   0 0 0
c4t5000C5005817ECF7d0  ONLINE   0 0 0
c4t5000C50058185583d0  ONLINE   0 0 0
c4t5000C500581C8397d0  ONLINE   0 0 0
  mirror-4 ONLINE   0 0 0
c4t5000C500581CB967d0  ONLINE   0 0 0
c4t5000C500581CD21Bd0  ONLINE   0 0 0
c4t5000C500581F147Fd0  ONLINE   0 0 0


___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] Onboard Intel X540-T2 10gbe NIC shows 1gbit

2014-06-09 Thread Rune Tipsmark
As stated above,

Got a Super Micro X9DRE-TF+ with onboard Intel X540 10Gbase-T with a cat6 cable 
straight into another server (with Windows) using a PCI-E Intel X540-T2 
10Gbase-T as well.

Both ends are set to autosense; however, only 1 Gbit is detected on OmniOS. I can 
force 10 Gbit on Windows, but the link won't come up because the other end reports 
a maximum of 1 Gbit.

I tried changing the link speed manually in OmniOS; no luck, it says not supported.

Why does it not detect a proper 10 Gbit link on the OmniOS end of things? Any 
suggestions as to how to get this working?

Version is r151010; I already tried a reinstall - no difference.
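
A quick sketch of the usual places to look on the OmniOS side (standard dladm
commands; the capability property names are an assumption, since they vary by
driver):

dladm show-phys                  # negotiated speed/duplex per physical link
dladm show-linkprop ixgbe0       # every link property the driver exposes
# if the driver exposes the en_*_cap properties, the advertised speeds can be
# constrained, e.g. stop advertising 1G so only 10G can be negotiated:
dladm set-linkprop -p en_1000fdx_cap=0 ixgbe0

If the set-linkprop call is what returned "not supported", that by itself points at
the driver/firmware combination rather than the cable.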


Venlig hilsen / Best regards,
Rune Tipsmark
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Onboard Intel X540-T2 10gbe NIC shows 1gbit

2014-06-09 Thread Rune Tipsmark
/var/adm/messages shows:

Jun  7 01:43:22 zfs10 mac: [ID 469746 kern.info] NOTICE: ixgbe0 registered
Jun  7 01:43:22 zfs10 ixgbe: [ID 611667 kern.info] NOTICE: ixgbe0: Intel 10Gb 
Ethernet, ixgbe 1.1.7
Jun  7 01:43:23 zfs10 mac: [ID 435574 kern.info] NOTICE: ixgbe0 link up, 1000 
Mbps, full duplex
Jun  7 01:43:24 zfs10 mac: [ID 469746 kern.info] NOTICE: ixgbe1 registered
Jun  7 01:43:24 zfs10 ixgbe: [ID 611667 kern.info] NOTICE: ixgbe1: Intel 10Gb 
Ethernet, ixgbe 1.1.7
Jun  7 01:43:26 zfs10 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 1000 
Mbps, full duplex

I tried looking for new firmware from both Intel and Supermicro, but neither seems 
to have any from what I can find.

I tried connecting only one port on each end, since they were running at different 
speeds (one port directly to another server, one port to a 1 Gbit switch) - no 
difference. Will check the BIOS setting too.

Br,
Rune


-Original Message-
From: Dan McDonald [mailto:dan...@omniti.com] 
Sent: Monday, June 09, 2014 4:26 PM
To: Rune Tipsmark
Cc: omnios-discuss
Subject: Re: [OmniOS-discuss] Onboard Intel X540-T2 10gbe NIC shows 1gbit


On Jun 9, 2014, at 7:14 PM, Rune Tipsmark  wrote:

> As stated above,
>  
> Got a Super Micro X9DRE-TF+ with onboard Intel X540 10Gbase-T with a cat6 
> cable straight into another server (with Windows) using a PCI-E Intel X540-T2 
> 10Gbase-T as well.
>  
> Both ends are set to autosense, however only 1gbit is detected on OmniOS, I 
> can force 10Gbit on Windows, but it won't come up because the other end 
> supports max 1Gbit.
>  
> I tried changing link speed manually in OmniOS, no luck, says not supported.
>  
> Why does it not detect proper 10Gbit on the OmniOS end of things? Any 
> suggestions as how to get this working?
>  
> Version is 151010 , tried reinstall already - no difference.

10Gig on here should work.  I've not had hands-on experience with built-in 
ones, but I've used add-on cards with X540 to great success.

Did you inspect /var/adm/messages for any complaints from "ixgbe"?  You're 
using cat6, which is good.  One silly question --> on the BIOS on the OmniOS 
box, did you enable PCIe mappings for only the low 32-bits?  That's a longshot, 
I know, but it may be an issue.

ALSO, we recently had someone else on-list post about X540, and it turned out 
they needed a firmware upgrade to get it to work.  Please make sure your BIOS 
and device firmware are up to date.

Dan

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Onboard Intel X540-T2 10gbe NIC shows 1gbit

2014-06-09 Thread Rune Tipsmark
Thanks but it is already the latest version, I don’t know if there is a 
specific firmware available for the onboard LAN only.
Br,
Rune

From: Chih-Hung Hsieh [mailto:flight@gmail.com]
Sent: Monday, June 09, 2014 5:29 PM
To: Rune Tipsmark
Cc: Dan McDonald; omnios-discuss
Subject: Re: [OmniOS-discuss] Onboard Intel X540-T2 10gbe NIC shows 1gbit

If your X9DRE-TF+ BIOS revision is not R 3.0a, you can use the following file to 
update.

Mother Board: Super Micro X9DRE-TF+
BIOS Revision: R 3.0a
File Name: X9DR7P3_C04.zip

http://www.supermicro.com/about/policies/disclaimer.cfm?url=/support/resources/getfile.aspx?ID=2526



2014-06-10 7:41 GMT+08:00 Rune Tipsmark <r...@steait.net>:
Var/adm/messages shows

Jun  7 01:43:22 zfs10 mac: [ID 469746 kern.info] NOTICE: ixgbe0 registered
Jun  7 01:43:22 zfs10 ixgbe: [ID 611667 kern.info] NOTICE: ixgbe0: Intel 10Gb 
Ethernet, ixgbe 1.1.7
Jun  7 01:43:23 zfs10 mac: [ID 435574 kern.info] NOTICE: ixgbe0 link up, 1000 
Mbps, full duplex
Jun  7 01:43:24 zfs10 mac: [ID 469746 kern.info] NOTICE: ixgbe1 registered
Jun  7 01:43:24 zfs10 ixgbe: [ID 611667 kern.info] NOTICE: ixgbe1: Intel 10Gb 
Ethernet, ixgbe 1.1.7
Jun  7 01:43:26 zfs10 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 1000 
Mbps, full duplex

I tried looking for new FW from both Intel and SM but none seem to have any 
from what I can find.

I tried connecting only one port on each end as they were different speeds (one 
port directly to another server, one port to a 1gbit switch) - no difference.
Will check the BIOS setting too.

Br,
Rune


-Original Message-
From: Dan McDonald [mailto:dan...@omniti.com]
Sent: Monday, June 09, 2014 4:26 PM
To: Rune Tipsmark
Cc: omnios-discuss
Subject: Re: [OmniOS-discuss] Onboard Intel X540-T2 10gbe NIC shows 1gbit


On Jun 9, 2014, at 7:14 PM, Rune Tipsmark <r...@steait.net> wrote:

> As stated above,
>
> Got a Super Micro X9DRE-TF+ with onboard Intel X540 10Gbase-T with a cat6 
> cable straight into another server (with Windows) using a PCI-E Intel X540-T2 
> 10Gbase-T as well.
>
> Both ends are set to autosense, however only 1gbit is detected on OmniOS, I 
> can force 10Gbit on Windows, but it won't come up because the other end 
> supports max 1Gbit.
>
> I tried changing link speed manually in OmniOS, no luck, says not supported.
>
> Why does it not detect proper 10Gbit on the OmniOS end of things? Any 
> suggestions as how to get this working?
>
> Version is 151010 , tried reinstall already - no difference.

10Gig on here should work.  I've not had hands-on experience with built-in 
ones, but I've used add-on cards with X540 to great success.

Did you inspect /var/adm/messages for any complaints from "ixgbe"?  You're 
using cat6, which is good.  One silly question --> on the BIOS on the OmniOS 
box, did you enable PCIe mappings for only the low 32-bits?  That's a longshot, 
I know, but it may be an issue.

ALSO, we recently had someone else on-list post about X540, and it turned out 
they needed a firmware upgrade to get it to work.  Please make sure your BIOS 
and device firmware are up to date.

Dan

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Onboard Intel X540-T2 10gbe NIC shows 1gbit

2014-06-16 Thread Rune Tipsmark
OK, I had a chance to loop the cables now; the second I loop the two ports on my 
Windows server, it brings the link up at 10 Gbit as expected.
OmniOS still brings it up at 1 Gbit; I tried changing cables as well.

Next stop must be a support case with Supermicro, I suppose.

Did you have the exact same motherboard, Ian?

Br,
Rune


-Original Message-
From: Ian Collins [mailto:i...@ianshome.com] 
Sent: Monday, June 09, 2014 6:57 PM
To: Rune Tipsmark
Cc: omnios-discuss
Subject: Re: [OmniOS-discuss] Onboard Intel X540-T2 10gbe NIC shows 1gbit

Rune Tipsmark wrote:
>
> As stated above,
>
> Got a Super Micro X9DRE-TF+ with onboard Intel X540 10Gbase-T with a
> cat6 cable straight into another server (with Windows) using a PCI-E 
> Intel X540-T2 10Gbase-T as well.
>

I have the same combination of hardware (about a year old) and 
everything works fine as 10G with Solaris or Illumos at either end.

 From the OmniOS end:

ixgbe1   Ethernet up 10000  full  ixgbe1

There's a SmartOS box at the other end:

ixgbe0   Ethernet up 10000  full  ixgbe0

-- 
Ian.

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] ZFS pool allocation remains after removing all files

2014-10-07 Thread Rune Tipsmark
Hi guys, wondering if someone might know why my pool still shows 1.45T allocated 
after I removed the files on the LUs provisioned onto that pool.


 pool: pool02
 state: ONLINE
  scan: scrub in progress since Wed Oct  8 02:31:41 2014
29.1G scanned out of 1.45T at 52.4M/s, 7h54m to go
0 repaired, 1.96% done
config:

NAME   STATE READ WRITE CKSUM
pool02 ONLINE   0 0 0
  mirror-0 ONLINE   0 0 0
c4t5000C500570858EFd0  ONLINE   0 0 0
c4t5000C50057085A6Bd0  ONLINE   0 0 0
  mirror-1 ONLINE   0 0 0
c4t5000C500570870D3d0  ONLINE   0 0 0
c4t5000C50057089753d0  ONLINE   0 0 0
logs
  c11d0ONLINE   0 0 0
cache
  c13d0ONLINE   0 0 0
  c15d0ONLINE   0 0 0


It supports 3 LUs mounted in VMware; I formatted the LUNs again and no change...

Do I really need to delete the pool and then re-create it?

br,
Rune



___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] ZFS pool allocation remains after removing all files

2014-10-09 Thread Rune Tipsmark
Thanks for the input Filip,

I think I found the answer though... no VAAI in OmniOS/ZFS

VAAI provides, amongst other features:

Thin Provisioning in ESXi 5.x and later hosts, which allows the ESXi host to 
tell the array when the space previously occupied by a virtual machine (whether 
it is deleted or migrated to another datastore) can be reclaimed on thin 
provisioned LUNs.

The next question is: will VAAI be supported in OmniOS/ZFS?

Br,
Rune

-Original Message-
From: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com] On Behalf 
Of Filip Marvan
Sent: Wednesday, October 08, 2014 7:53 AM
To: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] ZFS pool allocation remains after removing all 
files

Hello,

maybe you will have to enable compression on that zvol and rewrite the disk with 
zeroes from the client, because your pool probably doesn't even know, from the 
VMware client's side, that you deleted your data.

Filip
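
A minimal sketch of that suggestion (the zvol name is a placeholder; with
compression enabled, blocks of zeros are not allocated, so overwriting stale data
with zeros frees the space on the pool):

zfs set compression=lz4 pool02/<zvol-backing-the-LU>
zfs get compressratio,used,referenced pool02/<zvol-backing-the-LU>
# then, from a guest on the VMware side, fill the free space with zeros and remove
# the fill file again, e.g. on a Linux guest:
dd if=/dev/zero of=/mnt/fill bs=1M ; rm /mnt/fill ; sync

How long this takes depends on how much free space has to be rewritten.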

--
Date: Tue, 7 Oct 2014 23:00:47 +
From: Rune Tipsmark 
To: omnios-discuss 
Subject: [OmniOS-discuss] ZFS pool allocation remains after removing
all files
Message-ID: <3dde7c3ae1994829b588d3df09855...@ex1301.steait.net>
Content-Type: text/plain; charset="us-ascii"

hi guys, wondering if someone might know why my pool is still allocated 1.45T 
after I removed the files on the LU's provisioned onto that pool.


 pool: pool02
 state: ONLINE
  scan: scrub in progress since Wed Oct  8 02:31:41 2014
29.1G scanned out of 1.45T at 52.4M/s, 7h54m to go
0 repaired, 1.96% done
config:

NAME   STATE READ WRITE CKSUM
pool02 ONLINE   0 0 0
  mirror-0 ONLINE   0 0 0
c4t5000C500570858EFd0  ONLINE   0 0 0
c4t5000C50057085A6Bd0  ONLINE   0 0 0
  mirror-1 ONLINE   0 0 0
c4t5000C500570870D3d0  ONLINE   0 0 0
c4t5000C50057089753d0  ONLINE   0 0 0
logs
  c11d0ONLINE   0 0 0
cache
  c13d0ONLINE   0 0 0
  c15d0ONLINE   0 0 0


Supporting 3 LU's mounted in Vmware, I formatted the LUNs again and no change...

Do I really  need to delete the pool, then re-create it?

br,
Rune

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] ZFS pool allocation remains after removing all files

2014-10-09 Thread Rune Tipsmark
Yeah, I searched and found a few threads about it, seems like it won't happen 
anytime soon.

I would actually pay for that feature in OmniOS.

Rune

-Original Message-
From: Filip Marvan [mailto:filip.mar...@aira.cz] 
Sent: Thursday, October 09, 2014 3:21 AM
To: omnios-discuss@lists.omniti.com
Cc: Rune Tipsmark
Subject: Re: [OmniOS-discuss] ZFS pool allocation remains after removing all 
files

Hello,

Yes, VAAI could probably be the answer for VMware, and NexentaStor, for example, 
has some VAAI support as far as I know, but there were many problems with it (based 
on posts from users on Nexenta's forum), like freezing under heavy load and so on.

I guess it will be very complicated to bring VAAI support to OmniOS.

Filip

>-
>Date: Thu, 9 Oct 2014 10:02:51 +0000
>From: Rune Tipsmark 
>To: Filip Marvan ,
>   "omnios-discuss@lists.omniti.com"   
> 
>Subject: Re: [OmniOS-discuss] ZFS pool allocation remains after
>   removing all files
>Message-ID: <84b71eba9b5f43828f944cddb42f0...@ex1301.steait.net>
>Content-Type: text/plain; charset="us-ascii"
>
>Thanks for the input Filip,
>
>I think I found the answer though... no VAAI in OmniOS/ZFS
>
>VAAI provides amongst other features:
>
>Thin Provisioning in ESXi 5.x and later hosts, which allows the ESXi host to 
>tell the array when the >space previously occupied by a virtual machine 
>(whether it is deleted or migrated to another >datastore) can be reclaimed on 
>thin provisioned LUNs.
>
>Next questions is, will VAAI be supported in OmniOS/ZFS
>
>Br,
>Rune

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] ZFS pool allocation remains after removing all files

2014-10-09 Thread Rune Tipsmark
I am happy to test in our environment; we have some semi-live demo servers (10+ 
virtual servers) that are being used as in a real production environment, with 
load on them, etc.

If some VAAI features work, should I be able to see that in VMware? Does it make 
any difference that I use SRP to connect the LUs?

My VAAI status for each LUN is as follows:

naa.600144f0908abf5d5391079d0008
   VAAI Plugin Name:
   ATS Status: unsupported
   Clone Status: unsupported
   Zero Status: supported
   Delete Status: unsupported

Br,
Rune
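
For reference, a per-LUN listing like the one above can typically be pulled on an
ESXi 5.x host with something along these lines (the device ID is a placeholder):

esxcli storage core device vaai status get -d naa.<device-id>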

-Original Message-
From: Dan McDonald [mailto:dan...@omniti.com] 
Sent: Thursday, October 09, 2014 8:46 AM
To: Richard Elling
Cc: Rune Tipsmark; Filip Marvan; omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] ZFS pool allocation remains after removing all 
files


On Oct 9, 2014, at 11:21 AM, Richard Elling  
wrote:

> DanMcD will know for sure, but vols do support SCSI UNMAP over comstar.

AND there are improvements to this for ZFS compliments of Delphix.

> The "missing" support is for ZFS to issue SCSI UNMAP commands to the disks.

That is true.

Dan

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] ZFS pool allocation remains after removing all files

2014-10-09 Thread Rune Tipsmark
On OmniOS v11 r151010



-Original Message-
From: Dan McDonald [mailto:dan...@omniti.com] 
Sent: Thursday, October 09, 2014 11:11 AM
To: Rune Tipsmark
Cc: Richard Elling; Filip Marvan; omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] ZFS pool allocation remains after removing all 
files


On Oct 9, 2014, at 1:51 PM, Rune Tipsmark  wrote:

> I am happy to test in our environment, we have some semi-live demo servers 
> (10+ virtual servers) that are being used as in a real production environment 
> with load on them etc.
> 
> If some VAAI works, should I be able to see it in VMware?

As "hardware assisted erase", I believe.

> Does it make any difference I use SRP to connect the LU's?

My VMware is weak.

> My VAAI status for each LUN is as follows:
> 
> naa.600144f0908abf5d5391079d0008
>   VAAI Plugin Name:
>   ATS Status: unsupported

Corresponds to ATS, which is in Nexenta only for now.

>   Clone Status: unsupported

Corresponds to XCOPY, which is in Nexenta only for now.

>   Zero Status: supported

Corresponds to WRITE_SAME, which is in all illumos distros.

>   Delete Status: unsupported

Corresponds to UNMAP, which is in all illumos distros.

I'm surprised "delete status" isn't marked as supported, but I do seem to 
recall a recent push disabled UNMAP (delete status) by default.  Let me look...

... Yeah... there's a global "zvol_unmap_enabled" now.  It *should* be true by 
default, though.  Which OmniOS revision are you on?

Dan
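
A hedged way to check (and, if necessary, flip) that tunable on a running box,
assuming the symbol is the 32-bit flag Dan describes:

echo "zvol_unmap_enabled/D" | mdb -k      # print the current value (1 = enabled)
echo "zvol_unmap_enabled/W 1" | mdb -kw   # set it to 1 without a reboot

A corresponding 'set' line in /etc/system would make the change persistent, if that
turns out to be needed.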

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] ZFS pool allocation remains after removing all files

2014-10-09 Thread Rune Tipsmark
So if I just upgrade to latest it should be supported?

Rune

-Original Message-
From: Dan McDonald [mailto:dan...@omniti.com] 
Sent: Thursday, October 09, 2014 11:37 AM
To: Rune Tipsmark
Cc: Richard Elling; Filip Marvan; omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] ZFS pool allocation remains after removing all 
files

I meant to say:

In theory, the "HW Delete" should be available.  I'm not sure off the 
top of my head why it's not available in practice.

Dan

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] ZFS pool allocation remains after removing all files

2014-10-09 Thread Rune Tipsmark
Is there a command I can run to check?

Rune

-Original Message-
From: Dan McDonald [mailto:dan...@omniti.com] 
Sent: Thursday, October 09, 2014 11:51 AM
To: Rune Tipsmark
Cc: omnios-discuss
Subject: Re: [OmniOS-discuss] ZFS pool allocation remains after removing all 
files


On Oct 9, 2014, at 2:38 PM, Rune Tipsmark  wrote:

> So if I just upgrade to latest it should be supported?

It should be available in r151010!  That's why I'm surprised.

Dan

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] ZFS pool allocation remains after removing all files

2014-10-09 Thread Rune Tipsmark
Just updated to the latest version, r151012.

Still the same... I checked the vdev settings; is there another place I can check?

root@zfs10:/root# echo "::zfs_params" | mdb -k | grep vdev
zfs_vdev_max_active = 0x3e8
zfs_vdev_sync_read_min_active = 0xa
zfs_vdev_sync_read_max_active = 0xa
zfs_vdev_sync_write_min_active = 0xa
zfs_vdev_sync_write_max_active = 0xa
zfs_vdev_async_read_min_active = 0x1
zfs_vdev_async_read_max_active = 0x3
zfs_vdev_async_write_min_active = 0x1
zfs_vdev_async_write_max_active = 0xa
zfs_vdev_scrub_min_active = 0x1
zfs_vdev_scrub_max_active = 0x2
zfs_vdev_async_write_active_min_dirty_percent = 0x1e
zfs_vdev_async_write_active_max_dirty_percent = 0x3c
mdb: variable reference_tracking_enable not found: unknown symbol name
mdb: variable reference_history not found: unknown symbol name
zfs_vdev_cache_max = 0x4000
zfs_vdev_cache_size = 0x0
zfs_vdev_cache_bshift = 0x10
vdev_mirror_shift = 0x15
zfs_vdev_aggregation_limit = 0x20000

Rune

-Original Message-
From: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com] On Behalf 
Of Rune Tipsmark
Sent: Thursday, October 09, 2014 3:33 PM
To: Dan McDonald
Cc: omnios-discuss
Subject: Re: [OmniOS-discuss] ZFS pool allocation remains after removing all 
files

Is there a command I can run to check?

Rune

-Original Message-
From: Dan McDonald [mailto:dan...@omniti.com] 
Sent: Thursday, October 09, 2014 11:51 AM
To: Rune Tipsmark
Cc: omnios-discuss
Subject: Re: [OmniOS-discuss] ZFS pool allocation remains after removing all 
files


On Oct 9, 2014, at 2:38 PM, Rune Tipsmark  wrote:

> So if I just upgrade to latest it should be supported?

It should be available in r151010!  That's why I'm surprised.

Dan

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] ZFS pool allocation remains after removing all files

2014-10-10 Thread Rune Tipsmark
Same acceleration on iSCSI

naa.600144f0908abf5d539106e40001
   VAAI Plugin Name:
   ATS Status: unsupported
   Clone Status: unsupported
   Zero Status: supported
   Delete Status: unsupported

Rune

-Original Message-
From: Richard Elling [mailto:richard.ell...@richardelling.com] 
Sent: Friday, October 10, 2014 10:01 AM
To: Rune Tipsmark
Cc: Dan McDonald; omnios-discuss
Subject: Re: [OmniOS-discuss] ZFS pool allocation remains after removing all 
files


On Oct 9, 2014, at 4:58 PM, Rune Tipsmark  wrote:

> Just updated to latest version r151012
> 
> Still same... I checked for vdev settings, is there another place I can check?

It won't be a ZFS feature. On the initiator, use something like sg3_utils 
thusly:

[root@congo ~]# sg_opcodes /dev/rdsk/c0t5000C50030117C3Bd0
  SEAGATE   ST800FM0043   0005
  Peripheral device type: disk

Opcode  Service    CDB   Name
(hex)   action(h)  size
-----------------------------------------------------------------
 00                 6    Test Unit Ready
 01                 6    Rezero Unit
 03                 6    Request Sense
 04                 6    Format Unit
 07                 6    Reassign Blocks
 08                 6    Read(6)
 0a                 6    Write(6)
 0b                 6    Seek(6)
 12                 6    Inquiry
 15                 6    Mode select(6)
 16                 6    Reserve(6)
 17                 6    Release(6)
 1a                 6    Mode sense(6)
 1b                 6    Start stop unit
 1c                 6    Receive diagnostic results
 1d                 6    Send diagnostic
 25                10    Read capacity(10)
 28                10    Read(10)
 2a                10    Write(10)
 2b                10    Seek(10)
 2e                10    Write and verify(10)
 2f                10    Verify(10)
 35                10    Synchronize cache(10)
 37                10    Read defect data(10)
 3b      0         10    Write buffer, combined header and data [or multiple modes]
 3b      2         10    Write buffer, data
 3b      4         10    Write buffer, download microcode and activate
 3b      5         10    Write buffer, download microcode, save, and activate
 3b      6         10    Write buffer, download microcode with offsets and activate
 3b      7         10    Write buffer, download microcode with offsets, save, and activate
 3b      a         10    Write buffer, write data to echo buffer
 3b      d         10    Write buffer, download microcode with offsets, select activation events, save and defer activate
 3b      e         10    Write buffer, download microcode with offsets, save and defer activate
 3b      f         10    Write buffer, activate deferred microcode
 3b      1a        10    Write buffer, enable expander comms protocol and echo buffer
 3b      1c        10    Write buffer, download application client error history
 3c      0         10    Read buffer, combined header and data [or multiple modes]
 3c      2         10    Read buffer, data
 3c      3         10    Read buffer, descriptor
 3c      a         10    Read buffer, read data from echo buffer
 3c      b         10    Read buffer, echo buffer descriptor
 3c      1c        10    Read buffer, error history
 3e                10    Read long(10)
 3f                10    Write long(10)
 41                10    Write same(10)
 42                10    Unmap
 48      2         10    Sanitize, block erase
 48      1f        10    Sanitize, exit failure mode
 4c                10    Log select
 4d                10    Log sense
 55                10    Mode select(10)
 56                10    Reserve(10)
 57                10    Release(10)
 5a                10    Mode sense(10)
 5e      0         10    Persistent reserve in, read keys
 5e      1         10    Persistent reserve in, read reservation
 5e      2         10    Persistent reserve in, report capabilities
 5e      3         10    Persistent reserve in, read full status
 5f      0         10    Persistent reserve out, register
 5f      1         10    Persistent reserve out, reserve
 5f      2         10    Persistent reserve out, release
 5f      3         10    Persistent reserve out, clear
 5f      4         10    Persistent reserve out, preempt
 5f      5         10    Persistent reserve out, preempt and abort
 5f      6         10    Persistent reserve out, register and ignore existing key
 5f      7         10    Persistent reserve out, register and move
 7f      9         32    Read(32)
 7f      a         32    Verify(32)
 7f      b         32    Write(32)
 7f      c         32    Write and verify(32)
 7f      d         32    Write same(32)
 88                16    Read(16)
 8a                16    Write(16)
 8e                16    Write and verify(16)
 8f                16    Verify(16)
 91                16    Synchronize cache(16)
 93                16    Write same(16)
 9e

[OmniOS-discuss] zfs pool 100% busy, disks less than 10%

2014-10-30 Thread Rune Tipsmark
Hi all,

Hope someone can help me get this pool running as it should; I am seeing something 
like 200-300 MB/sec max, which is much, much less than I want to see...

11 mirrored vdevs, 2 spares and 2 SLOG devices, 192 GB RAM in the host...

Why is this pool showing near 100% busy when the underlying disks are doing almost 
nothing at all?


  pool: pool04
state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Fri Oct 31 06:37:30 2014
config:

NAME   STATE READ WRITE CKSUM
pool04 ONLINE   0 0 0
  mirror-0 ONLINE   0 0 0
c4t5000C50055FC9533d0  ONLINE   0 0 0
c4t5000C50055FE6A63d0  ONLINE   0 0 0
  mirror-1 ONLINE   0 0 0
c4t5000C5005708296Fd0  ONLINE   0 0 0
c4t5000C5005708351Bd0  ONLINE   0 0 0
  mirror-2 ONLINE   0 0 0
c4t5000C500570858EFd0  ONLINE   0 0 0
c4t5000C50057085A6Bd0  ONLINE   0 0 0
  mirror-3 ONLINE   0 0 0
c4t5000C50057086307d0  ONLINE   0 0 0
c4t5000C50057086B67d0  ONLINE   0 0 0
  mirror-4 ONLINE   0 0 0
c4t5000C500570870D3d0  ONLINE   0 0 0
c4t5000C50057089753d0  ONLINE   0 0 0
  mirror-5 ONLINE   0 0 0
c4t5000C500625B7EA7d0  ONLINE   0 0 0
c4t5000C500625B8137d0  ONLINE   0 0 0
  mirror-6 ONLINE   0 0 0
c4t5000C500625B8427d0  ONLINE   0 0 0
c4t5000C500625B86E3d0  ONLINE   0 0 0
  mirror-7 ONLINE   0 0 0
c4t5000C500625B886Fd0  ONLINE   0 0 0
c4t5000C500625BB773d0  ONLINE   0 0 0
  mirror-8 ONLINE   0 0 0
c4t5000C500625BC2C3d0  ONLINE   0 0 0
c4t5000C500625BD3EBd0  ONLINE   0 0 0
  mirror-9 ONLINE   0 0 0
c4t5000C50062878C0Bd0  ONLINE   0 0 0
c4t5000C50062878C43d0  ONLINE   0 0 0
  mirror-10ONLINE   0 0 0
c4t5000C50062879687d0  ONLINE   0 0 0
c4t5000C50062879707d0  ONLINE   0 0 0
logs
  c10d0ONLINE   0 0 0
  c11d0ONLINE   0 0 0
spares
  c4t5000C50062879723d0AVAIL
  c4t5000C50062879787d0AVAIL


extended device statistics
    r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  337.0     0.0 21899.1  0.0  0.0    0.0    0.0   0   0 c10d0
    0.0  337.0     0.0 21835.1  0.0  0.0    0.0    0.0   0   0 c11d0
  149.0  249.0  7723.2  7032.7  0.0  0.0    0.0    0.0   0   0 c12d0
  144.0  295.0  6766.2  6955.7  0.0  0.0    0.0    0.0   0   0 c14d0
  154.0  311.0  9241.6  6445.7  0.0  0.0    0.0    0.0   0   0 c15d0
  150.0  263.0  7939.2  6096.7  0.0  0.0    0.0    0.0   0   0 c13d0
  597.0 1118.0 31670.2 26530.9  5.8  0.4    3.4    0.2   5   8 pool03
    0.0 1382.9     0.0 78460.7 11.4  1.7    8.3    1.3   5  97 pool04
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 rpool
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c7t0d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c7t1d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c7t4d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c7t5d0
    0.0   25.0     0.0  1043.5  0.0  0.0    0.0    1.3   0   3 c4t5000C50055FE6A63d0
    0.0   69.0     0.0  1478.4  0.0  0.0    0.0    0.7   0   4 c4t5000C5005708351Bd0
    0.0   41.0     0.0  1261.9  0.0  0.0    0.0    0.7   0   3 c4t5000C50057086B67d0
    0.0   37.0     0.0  2146.4  0.0  0.0    0.0    0.8   0   3 c4t5000C500570858EFd0
    0.0   37.0     0.0  2146.4  0.0  0.0    0.0    1.2   0   4 c4t5000C50057085A6Bd0
    0.0   69.0     0.0  1478.4  0.0  0.1    0.0    0.8   0   5 c4t5000C5005708296Fd0
    0.0   26.0     0.0  1868.9  0.0  0.0    0.0    1.2   0   3 c4t5000C50057089753d0
    0.0   27.0     0.0  1868.9  0.0  0.0    0.0    1.0   0   3 c4t5000C500570870D3d0
    0.0   39.0     0.0  1261.9  0.0  0.0    0.0    0.7   0   3 c4t5000C50057086307d0
    0.0   25.0     0.0  1043.5  0.0  0.0    0.0    1.3   0   3 c4t5000C50055FC9533d0
    0.0   26.0     0.0  2103.9  0.0  0.0    0.0    0.8   0   2 c4t5000C50062878C43d0
    0.0   15.0     0.0  1074.0  0.0  0.0    0.0    1.9   0   3 c4t5000C500625B886Fd0
    0.0   15.0     0.0  1044.0  0.0  0.0    0.0    1.6   0   2 c4t5000C500625BD3EBd0
    0.0   15.0     0.0  1044.0  0.0  0.0    0.0    2.6   0

Re: [OmniOS-discuss] zfs pool 100% busy, disks less than 10%

2014-10-31 Thread Rune Tipsmark
OK, makes sense.
What other kinds of indicators can I look at?

I get decent results from dd, but it still feels a bit slow...

lz4 compression should not slow it down, right? The CPU is not doing much when 
copying data over, maybe 15% busy or so... 

Sync=always, block size 1M
204800000000 bytes (205 GB) copied, 296.379 s, 691 MB/s
real4m56.382s
user0m0.461s
sys 3m12.662s

Sync=disabled, block size 1M
204800000000 bytes (205 GB) copied, 117.774 s, 1.7 GB/s
real1m57.777s
user0m0.237s
sys 1m57.466s

... while doing this I was looking at my FIO cards; I think the reason is that the 
SLCs need more power to deliver higher performance. They are supposed to deliver 
1.5 GB/sec but only deliver around 350 MB/sec each.

Now looking for aux power cables and will retest...
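
For reference, a sketch of how a sync comparison like the one above is usually run
(the dataset name is a placeholder; the block size matches the 1M mentioned above):

zfs set sync=always pool04/bench
time dd if=/dev/zero of=/pool04/bench/dd.tst bs=1024000 count=200000
zfs set sync=disabled pool04/bench
time dd if=/dev/zero of=/pool04/bench/dd.tst bs=1024000 count=200000
zfs inherit sync pool04/bench               # back to the default (sync=standard)
# watch the log devices during the sync=always run:
iostat -xn 1 | egrep 'device|c10d0|c11d0'

With sync=always the run is largely bounded by the SLOG devices, which would explain
why the card power issue shows up there first.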

Br,
Rune

-Original Message-
From: Richard Elling [mailto:richard.ell...@richardelling.com] 
Sent: Friday, October 31, 2014 9:03 AM
To: Eric Sproul
Cc: Rune Tipsmark; omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] zfs pool 100% busy, disks less than 10%


On Oct 31, 2014, at 7:14 AM, Eric Sproul  wrote:

> On Fri, Oct 31, 2014 at 2:33 AM, Rune Tipsmark  wrote:
> 
>> Why is this pool showing near 100% busy when the underlying disks are 
>> doing nothing at all
> 
> Simply put, it's just how the accounting works in iostat.  It treats 
> the pool like any other device, so if there is even one outstanding 
> request to the pool, it counts towards the busy%.  Keith W. from 
> Joyent explained this recently on the illumos-zfs list:
> http://www.listbox.com/member/archive/182191/2014/10/sort/time_rev/pag
> e/3/entry/18:93/20141017161955:F3E11AB2-563A-11E4-8EDC-D0C677981E2F/
> 
> The TL;DR is: if your pool has more than one disk in it, the pool-wide 
> busy% is useless.

FWIW, we use %busy as an indicator that we can ignore a device/subsystem when 
looking for performance problems. We don't use it as an indicator of problems. 
In other words, if the device isn't > 10% busy, forgetabouddit. If it is more 
busy, look in more detail at the meaningful performance indicators.
 -- richard

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] zfs pool 100% busy, disks less than 10%

2014-10-31 Thread Rune Tipsmark
So I actually started storage vMotions on 3 hosts, 6 concurrent, and am getting 
about 1 GB/sec.
I guess I need more hosts to really push this; the disks are not more than 20-25% 
busy, so in theory I could push a bit more.

I think this is resolved for now; the CPU is sitting at 30-40% usage while moving 
1 GB/sec.

Iostat -xn 1
pool04   396G  39.5T  9  15.9K   325K  1.01G
pool04   396G  39.5T  7  17.0K   270K  1.03G
pool04   396G  39.5T 12  17.4K   558K  1.10G
pool04   396G  39.5T 10  16.9K   442K  1.03G
pool04   397G  39.5T  6  16.9K   332K  1021M
pool04   397G  39.5T  1  16.3K  74.9K  1.01G
pool04   397G  39.5T  8  17.0K   433K  1.05G
pool04   397G  39.5T 20  17.1K   716K  1023M
pool04   397G  39.5T 11  18.3K   425K  1.14G
pool04   398G  39.5T  0  18.3K  65.9K  1.11G
pool04   398G  39.5T 16  17.9K   551K  1.06G
pool04   398G  39.5T  0  16.8K   105K  1.03G
pool04   398G  39.5T  1  18.2K   124K  1.11G
pool04   398G  39.5T  0  17.1K  45.9K  1.05G
pool04   399G  39.5T  6  17.3K   454K  1.08G
pool04   399G  39.5T  0  17.9K  0  1.06G
pool04   399G  39.5T  2  16.9K   116K  1.04G
pool04   399G  39.5T  2  18.8K   130K  1.09G
pool04   399G  39.5T  0  17.6K  0  1.03G
pool04   400G  39.5T  3  17.5K   155K  1.04G
pool04   400G  39.5T  0  17.6K  31.5K  1.03G

-Original Message-
From: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com] On Behalf 
Of Rune Tipsmark
Sent: Friday, October 31, 2014 12:38 PM
To: Richard Elling; Eric Sproul
Cc: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] zfs pool 100% busy, disks less than 10%

Ok, makes sense.
What other kind of  indicators can I look at?

I get decent results from DD but still feels a bit slow...

Compression lz4 should not slow it down right? Cpu is not doing much when 
copying data over, maybe 15% busy or so... 

Sync=always, block size 1M
204800000000 bytes (205 GB) copied, 296.379 s, 691 MB/s
real4m56.382s
user0m0.461s
sys 3m12.662s

Sync=disabled, block size 1M
204800000000 bytes (205 GB) copied, 117.774 s, 1.7 GB/s
real1m57.777s
user0m0.237s
sys 1m57.466s

... while doing this I was looking at my FIO cards, I think the reason is that 
the SLC's need more power to deliver higher performance, they are supposed to 
deliver 1.5GB/sec but only delivers around 350MB/sec each

Now looking for aux power cables and will retest...

Br,
Rune

-Original Message-
From: Richard Elling [mailto:richard.ell...@richardelling.com] 
Sent: Friday, October 31, 2014 9:03 AM
To: Eric Sproul
Cc: Rune Tipsmark; omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] zfs pool 100% busy, disks less than 10%


On Oct 31, 2014, at 7:14 AM, Eric Sproul  wrote:

> On Fri, Oct 31, 2014 at 2:33 AM, Rune Tipsmark  wrote:
> 
>> Why is this pool showing near 100% busy when the underlying disks are 
>> doing nothing at all
> 
> Simply put, it's just how the accounting works in iostat.  It treats 
> the pool like any other device, so if there is even one outstanding 
> request to the pool, it counts towards the busy%.  Keith W. from 
> Joyent explained this recently on the illumos-zfs list:
> http://www.listbox.com/member/archive/182191/2014/10/sort/time_rev/pag
> e/3/entry/18:93/20141017161955:F3E11AB2-563A-11E4-8EDC-D0C677981E2F/
> 
> The TL;DR is: if your pool has more than one disk in it, the pool-wide 
> busy% is useless.

FWIW, we use %busy as an indicator that we can ignore a device/subsystem when 
looking for performance problems. We don't use it as an indicator of problems. 
In other words, if the device isn't > 10% busy, forgetabouddit. If it is more 
busy, look in more detail at the meaningful performance indicators.
 -- richard

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] zfs send via SRP or other RDMA enabled protocol

2014-11-01 Thread Rune Tipsmark
Hi  all,

Is it possible to do zfs send/recv via SRP or some other RDMA-enabled protocol? 
IPoIB is really slow, about 50 MB/sec between the two boxes, and no disks are more 
than 10-15% busy.

If not, is there a way I can aggregate say 8 or 16 IPoIB partitions and push the 
throughput to a more reasonable speed...

Br,
Rune
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] zfs send via SRP or other RDMA enabled protocol

2014-11-01 Thread Rune Tipsmark
Sounds sensible; how do I do that?
I tried creating a view for a thin LU for my other ZFS box, but how do I detect it 
on that box?

I also stumbled across something else interesting; wondering if it's possible to 
set up two identical boxes and create a pool with local/remote disks as per this 
article 
http://www.ssec.wisc.edu/~scottn/Lustre_ZFS_notes/lustre_zfs_srp_mirror.html

Br,
Rune

From: David Bomba [mailto:turbo...@gmail.com]
Sent: Saturday, November 01, 2014 6:01 PM
To: Rune Tipsmark
Cc: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] zfs send via SRP or other RDMA enabled protocol

I usually mount an iSER target and perform the ZFS send to that target. This was 
the best way to exploit the RDMA bandwidth to its full potential.
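
A minimal sketch of that pattern using the SRP/COMSTAR pieces discussed elsewhere
in this thread (names, sizes and GUIDs are placeholders):

# on the receiving box: carve out a zvol and export it as an LU
zfs create -V 2t pool03/sendbuf
stmfadm create-lu /dev/zvol/rdsk/pool03/sendbuf
stmfadm add-view <guid-printed-by-create-lu>
# on the sending box: once the LU shows up in format, build a pool on it and
# receive into that pool locally
zpool create recvpool c0t<lu-disk>d0
zfs snapshot -r pool04/vm@backup1
zfs send -R pool04/vm@backup1 | zfs recv -F recvpool/vm

The send then runs against what looks like a local pool, so the data path stays on
RDMA instead of IPoIB.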


On 2 Nov 2014, at 11:45 am, Rune Tipsmark <r...@steait.net> wrote:

Hi  all,

Is it possible to do zfs send/recv via SRP or some other RMDA enabled protocol? 
IPoIB is really slow, about 50 MB/sec between two boxes, no disks are more than 
10-15% busy.

If not, is there a way I can aggregate say 8 or 16  IPoIB partitions and push 
throughput to a more reasonable speed…

Br,
Rune
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] zfs send via SRP or other RDMA enabled protocol

2014-11-02 Thread Rune Tipsmark
 24. c7t0d0 
  /pci@0,0/pci15d9,704@1f,2/disk@0,0
  25. c7t1d0 
  /pci@0,0/pci15d9,704@1f,2/disk@1,0
  26. c7t4d0 
  /pci@0,0/pci15d9,704@1f,2/disk@4,0
  27. c7t5d0 
  /pci@0,0/pci15d9,704@1f,2/disk@5,0
  28. c10d0 
  /pci@0,0/pci8086,3c08@3/pci103c,178b@0
  29. c11d0 
  /pci@0,0/pci8086,3c0a@3,2/pci103c,178b@0
  30. c12d0 
  /pci@79,0/pci8086,3c04@2/pci10b5,8616@0/pci10b5,8616@5/pci103c,178e@0
  31. c13d0 
  /pci@79,0/pci8086,3c04@2/pci10b5,8616@0/pci10b5,8616@6/pci103c,178e@0
  32. c14d0 
  /pci@79,0/pci8086,3c08@3/pci10b5,8616@0/pci10b5,8616@5/pci103c,178e@0
  33. c15d0 
  /pci@79,0/pci8086,3c08@3/pci10b5,8616@0/pci10b5,8616@6/pci103c,178e@0
Specify disk (enter its number):



-Original Message-
From: Johan Kragsterman [mailto:johan.kragster...@capvert.se] 
Sent: Sunday, November 02, 2014 9:56 AM
To: Rune Tipsmark
Cc: David Bomba; omnios-discuss@lists.omniti.com
Subject: Ang: Re: [OmniOS-discuss] zfs send via SRP or other RDMA enabled 
protocol


-"OmniOS-discuss"  wrote: -
To: David Bomba 
From: Rune Tipsmark 
Sent by: "OmniOS-discuss" 
Date: 2014-11-02 07:55
Cc: "omnios-discuss@lists.omniti.com" 
Subject: Re: [OmniOS-discuss] zfs send via SRP or other RDMA enabled protocol

Sounds sensible, how do I do that?

I tried creating a view for a thin lu with my other zfs box, but how do I 
detect it?





First of all, if you run IB, forget the iSCSI stuff; it only creates an unnecessary 
IP layer that you don't need, and it adds latency to your application.

Did you create SRP target service using COMSTAR?

# svcadm enable -r ibsrp/target

What's the output of "srptadm list-target" (on the storage box)? You can also use 
"stmfadm list-target -v".

Do you have all the necessary IB pieces, like a subnet manager (OpenSM), in place? 
Do the HCA ports show up in dladm show-link?

If so, your host system should discover it as a local disk, just with "format", 
provided you have created a view for the right eui. of the initiator 
HCA.
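
A minimal sketch of the view side of that on the target box (host group name,
initiator EUI and LU GUID are placeholders):

stmfadm create-hg OmniOS
stmfadm add-hg-member -g OmniOS eui.<initiator-hca-eui>
stmfadm add-view -h OmniOS -n 0 <lu-guid>
stmfadm list-view -l <lu-guid>
# a session from that eui should then show up in:
stmfadm list-target -v
# and on the initiator box the LU should appear as a disk in:
format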


Rgrds Johan




 

I also stumbled across something else interesting, wondering if its possible to 
set up two identical boxes and create a pool with local/remote disks as per 
this article 
http://www.ssec.wisc.edu/~scottn/Lustre_ZFS_notes/lustre_zfs_srp_mirror.html

 

Br,

Rune

 

From: David Bomba [mailto:turbo...@gmail.com] 
Sent: Saturday, November 01, 2014 6:01 PM
To: Rune Tipsmark
Cc: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] zfs send via SRP or other RDMA enabled protocol

 

I usually mount a iSer target and perform ZFS send to the target. This was the 
best way to exploit the rdma bandwidth to its full potential. 

 

 

On 2 Nov 2014, at 11:45 am, Rune Tipsmark  wrote:

 

Hi  all,

 

Is it possible to do zfs send/recv via SRP or some other RMDA enabled protocol? 
IPoIB is really slow, about 50 MB/sec between two boxes, no disks are more than 
10-15% busy.

 

If not, is there a way I can aggregate say 8 or 16  IPoIB partitions and push 
throughput to a more reasonable speed…

 

Br,

Rune













___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss

 

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] zfs send via SRP or other RDMA enabled protocol

2014-11-02 Thread Rune Tipsmark
I know, but how do I initiate a session from ZFS10?

Br,
Rune

From: Johan Kragsterman [mailto:johan.kragster...@capvert.se]
Sent: Sunday, November 02, 2014 10:33 AM
To: Rune Tipsmark
Cc: David Bomba; omnios-discuss@lists.omniti.com
Subject: Ang: RE: Re: [OmniOS-discuss] zfs send via SRP or other RDMA enabled 
protocol



-Rune Tipsmark <r...@steait.net> wrote: -
To: Johan Kragsterman <johan.kragster...@capvert.se>
From: Rune Tipsmark <r...@steait.net>
Date: 2014-11-02 19:11
Cc: David Bomba <turbo...@gmail.com>, "omnios-discuss@lists.omniti.com" 
<omnios-discuss@lists.omniti.com>
Subject: RE: Re: [OmniOS-discuss] zfs send via SRP or other RDMA enabled protocol
Hi Johan

Got two ZFS boxes (ZFS00 recv, ZFS10 send), both with IB, everything configured and 
views made for vSphere, which works just fine.

What I can't figure out is how to share a LUN with the other ZFS box... see the 
pasted info below...

ZFS00 (the box where I want to receive my ZFS snapshot):
The initiators below are all ESX servers; I cannot see the other ZFS box.

root@zfs00:/pool03# stmfadm list-target -v
Target: eui.0002C90300095E7C
Operational Status: Online
Provider Name : srpt
Alias : -
Protocol  : SRP
Sessions  : 8
Initiator: eui.0002C903000F397C
Alias: 8102c90300095e7e:0002c903000f397c
Logged in since: Sun Nov  2 02:09:41 2014
Initiator: eui.0002C903000F397B
Alias: 8102c90300095e7d:0002c903000f397b
Logged in since: Sun Nov  2 02:09:40 2014
Initiator: eui.0002C903000D3D04
Alias: 8102c90300095e7e:0002c903000d3d04
Logged in since: Sat Nov  1 21:14:47 2014
Initiator: eui.0002C90300104F47
Alias: 8102c90300095e7d:0002c90300104f47
Logged in since: Sat Nov  1 21:12:54 2014
Initiator: eui.0002C903000D3D03
Alias: 8102c90300095e7d:0002c903000d3d03
Logged in since: Sat Nov  1 21:12:32 2014
Initiator: eui.0002C90300104F48
Alias: 8102c90300095e7e:0002c90300104f48
Logged in since: Sat Nov  1 21:10:45 2014
Initiator: eui.0002C903000A48FA
Alias: 8102c90300095e7e:0002c903000a48fa
Logged in since: Sat Nov  1 21:10:40 2014
Initiator: eui.0002C903000D3CA0
Alias: 8102c90300095e7e:0002c903000d3ca0
Logged in since: Sat Nov  1 21:10:39 2014
Target: iqn.2010-09.org.napp-it:1394106801
Operational Status: Online
Provider Name : iscsit
Alias : 03.06.2014
Protocol  : iSCSI
Sessions  : 0

root@zfs00:/pool03# stmfadm list-lu -v
LU Name: 600144F007780B7F5455EDD50002
Operational Status: Online
Provider Name : sbd
Alias : /pool03/LU11
View Entry Count  : 1
Data File : /pool03/LU11
Meta File : not set
Size  : 219902322
Block Size: 512
Management URL: not set
Vendor ID : SUN
Product ID: COMSTAR
Serial Num: not set
Write Protect : Disabled
Writeback Cache   : Disabled
Access State  : Active

root@zfs00:/pool03# stmfadm list-hg -v
Host Group: ESX
Host Group: Windows
Host Group: ESX-iSER
Host Group: OmniOS
Member: eui.0002C903000923E6 <<--- The other ZFS box
Member: iqn.2010-09.org.napp-it:1402013225
Member: iqn.1986-03.com.sun:01:58cfb38a32ff.5390f58d

root@zfs00:/pool03# stmfadm list-view -l 600144F007780B7F5455EDD50002
View Entry: 0
Host group   : OmniOS
Target group : All
LUN  : 0

ZFS10 (the sending box, where I want to see the LUN from ZFS00):
No disk from ZFS00 shows up...



 Hi!


You got eui.0002C903000923E6 in host group OmniOS, but you don't have a session 
from that eui to the target.

Rgrds Johan












root@zfs10:/root# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
   0. c4t5000C50055FC9533d0 
  /scsi_vhci/disk@g5000c50055fc9533
   1. c4t5000C50055FE6A63d0 
  /scsi_vhci/disk@g5000c50055fe6a63
   2. c4t5000C500625B7EA7d0 
  /scsi_vhci/disk@g5000c500625b7ea7
   3. c4t5000C500625B86E3d0 
  /scsi_vhci/disk@g5000c500625b86e3
   4. c4t5000C500625B886Fd0 
  /scsi_vhci/disk@g5000c500625b886f
   5. c4t5000C500625B8137d0 
  /scsi_vhci/disk@g5000c500625b8137
   6. c4t5000C500625B8427d0 
  /scsi_vhci/disk@g5000c500625b8427
   7. c4t5000C500625BB773d0 
  /scsi_vhci/disk@g5000c500625bb773
   8. c4t5000C500625BC2C3d0 
  /scsi_vhci/disk@g5000c500625bc2c3
   9. c4t5000C500625BD3EBd0 
  /scsi_vhci/disk@g5000c500625bd3eb
  10. c4t5000C50057085A6Bd0 
  /scsi_vhci/disk@g5000c50057085a6b
  11. c4t5000C50057086B67d0 
  /sc

Re: [OmniOS-discuss] zfs pool 100% busy, disks less than 10%

2014-11-02 Thread Rune Tipsmark
Looking a bit more at these numbers, am I seeing twice the actual rate due to 
mirroring?
How does compression affect the numbers?

Say I have one mirrored vdev and I see the pool writing 100 MB/sec; is that 
50 MB/sec to each disk, but only 50 MB/sec total from the client side? And if I 
compress at the same time at, say, a 1.50 ratio, will the pool show 100 MB/sec 
while the client is actually writing 75 MB/sec?
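
As a rough sketch of the arithmetic (hedged, since it depends on which layer the
tool samples): if the client writes L MB/sec of logical data, the compression ratio
is r and each vdev is an m-way mirror, the physical rate summed across the data
disks is roughly m * L / r (plus the separate ZIL copy when sync=always). With
L = 75, r = 1.5 and m = 2 that is 2 * 75 / 1.5 = 100 MB/sec across the two disks,
i.e. about 50 MB/sec per disk, which matches the numbers in the question.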

Br,
Rune

-Original Message-
From: Richard Elling [mailto:richard.ell...@richardelling.com] 
Sent: Sunday, November 02, 2014 6:07 PM
To: Rune Tipsmark
Cc: Eric Sproul; omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] zfs pool 100% busy, disks less than 10%


On Oct 31, 2014, at 6:07 PM, Rune Tipsmark  wrote:

> So actually started storage vmotions on 3 host, 6 concurrent and am 
> getting about 1GB/sec Guess I need more hosts to really push this, the disk 
> are not more than 20-25% busy, so in theory I could push a bit more.
> 
> I think this is resolved for now cpu sitting at 30-40% usage while 
> moving 1GB/sec

Yes, that seems about right.
 -- richard

> 
> Iostat -xn 1
> pool04   396G  39.5T  9  15.9K   325K  1.01G
> pool04   396G  39.5T  7  17.0K   270K  1.03G
> pool04   396G  39.5T 12  17.4K   558K  1.10G
> pool04   396G  39.5T 10  16.9K   442K  1.03G
> pool04   397G  39.5T  6  16.9K   332K  1021M
> pool04   397G  39.5T  1  16.3K  74.9K  1.01G
> pool04   397G  39.5T  8  17.0K   433K  1.05G
> pool04   397G  39.5T 20  17.1K   716K  1023M
> pool04   397G  39.5T 11  18.3K   425K  1.14G
> pool04   398G  39.5T  0  18.3K  65.9K  1.11G
> pool04   398G  39.5T 16  17.9K   551K  1.06G
> pool04   398G  39.5T  0  16.8K   105K  1.03G
> pool04   398G  39.5T  1  18.2K   124K  1.11G
> pool04   398G  39.5T  0  17.1K  45.9K  1.05G
> pool04   399G  39.5T  6  17.3K   454K  1.08G
> pool04   399G  39.5T  0  17.9K  0  1.06G
> pool04   399G  39.5T  2  16.9K   116K  1.04G
> pool04   399G  39.5T  2  18.8K   130K  1.09G
> pool04   399G  39.5T  0  17.6K  0  1.03G
> pool04   400G  39.5T  3  17.5K   155K  1.04G
> pool04   400G  39.5T  0  17.6K  31.5K  1.03G
> 
> -Original Message-
> From: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com] 
> On Behalf Of Rune Tipsmark
> Sent: Friday, October 31, 2014 12:38 PM
> To: Richard Elling; Eric Sproul
> Cc: omnios-discuss@lists.omniti.com
> Subject: Re: [OmniOS-discuss] zfs pool 100% busy, disks less than 10%
> 
> Ok, makes sense.
> What other kind of  indicators can I look at?
> 
> I get decent results from DD but still feels a bit slow...
> 
> Compression lz4 should not slow it down right? Cpu is not doing much when 
> copying data over, maybe 15% busy or so... 
> 
> Sync=always, block size 1M
> 204800000000 bytes (205 GB) copied, 296.379 s, 691 MB/s
> real4m56.382s
> user0m0.461s
> sys 3m12.662s
> 
> Sync=disabled, block size 1M
> 204800000000 bytes (205 GB) copied, 117.774 s, 1.7 GB/s
> real1m57.777s
> user0m0.237s
> sys 1m57.466s
> 
> ... while doing this I was looking at my FIO cards, I think the reason is 
> that the SLC's need more power to deliver higher performance, they are 
> supposed to deliver 1.5GB/sec but only delivers around 350MB/sec each
> 
> Now looking for aux power cables and will retest...
> 
> Br,
> Rune
> 
> -Original Message-
> From: Richard Elling [mailto:richard.ell...@richardelling.com]
> Sent: Friday, October 31, 2014 9:03 AM
> To: Eric Sproul
> Cc: Rune Tipsmark; omnios-discuss@lists.omniti.com
> Subject: Re: [OmniOS-discuss] zfs pool 100% busy, disks less than 10%
> 
> 
> On Oct 31, 2014, at 7:14 AM, Eric Sproul  wrote:
> 
>> On Fri, Oct 31, 2014 at 2:33 AM, Rune Tipsmark  wrote:
>> 
>>> Why is this pool showing near 100% busy when the underlying disks 
>>> are doing nothing at all
>> 
>> Simply put, it's just how the accounting works in iostat.  It treats 
>> the pool like any other device, so if there is even one outstanding 
>> request to the pool, it counts towards the busy%.  Keith W. from 
>> Joyent explained this recently on the illumos-zfs list:
>> http://www.listbox.com/member/archive/182191/2014/10/sort/time_rev/pa
>> g 
>> e/3/entry/18:93/20141017161955:F3E11AB2-563A-11E4-8EDC-D0C677981E2F/
>> 
>> The TL;DR is: if your pool has more than one disk in it, the 
>> pool-wide busy% is useless.
> 
> FWIW, we use %busy as an indicator that we can ignore a device/subsystem when 
> looking for performance problems. We don't use it as an indicator o

Re: [OmniOS-discuss] zfs send via SRP or other RDMA enabled protocol

2014-11-02 Thread Rune Tipsmark
ConnectX-2, and the drivers are loaded; both OmniOS servers have LUNs I can access 
from both ESX and Windows... it's just the connection between them that I can't 
figure out.
Br,
Rune

From: Johan Kragsterman [mailto:johan.kragster...@capvert.se]
Sent: Sunday, November 02, 2014 10:49 PM
To: Rune Tipsmark
Cc: omnios-discuss@lists.omniti.com
Subject: Ang: RE: RE: RE: Re: [OmniOS-discuss] zfs send via SRP or other RDMA 
enabled protocol


Hi!

I can see I forgot the list in the previous mail...

Anyway, are you using ConnectX adapters?

Have you checked whether the driver is loaded on the host system?


Best regards from/Med vänliga hälsningar från

Johan Kragsterman

Capvert


-Rune Tipsmark <r...@steait.net> wrote: -
To: Johan Kragsterman <johan.kragster...@capvert.se>
From: Rune Tipsmark <r...@steait.net>
Date: 2014-11-03 00:07
Subject: RE: RE: RE: Re: [OmniOS-discuss] zfs send via SRP or other RDMA enabled 
protocol
It doesn't show up in stmfadm list-target; only my other servers, such as ESX and 
Windows, show up...

The subnet manager is OpenSM running on the ESX hosts; it has been working OK ever 
since I installed them. Prior to that I had a couple of Cisco (Topspin) DDR 
switches with a built-in SM.

Any other ideas? Everything I have configured so far on both ZFS boxes, Windows and 
ESX has worked as expected... it's just this darn thing that's not doing what it's 
supposed to...

It should be enough to just add the HCA from ZFS10 to the host group on ZFS00 for 
ZFS10 to see a LUN on ZFS00, granted that there is a view configured on ZFS00 as 
well... or am I missing a step?

Br
Rune

-Original Message-
From: Johan Kragsterman [mailto:johan.kragster...@capvert.se]
Sent: Sunday, November 02, 2014 11:58 AM
To: Rune Tipsmark
Subject: Ang: RE: RE: Re: [OmniOS-discuss] zfs send via SRP or other RDMA 
enabled protocol


-Rune Tipsmark <r...@steait.net> wrote: -
To: Johan Kragsterman <johan.kragster...@capvert.se>
From: Rune Tipsmark <r...@steait.net>
Date: 2014-11-02 20:05
Cc: David Bomba <turbo...@gmail.com>, "omnios-discuss@lists.omniti.com" 
<omnios-discuss@lists.omniti.com>
Subject: RE: Re: [OmniOS-discuss] zfs send via SRP or other RDMA enabled 
protocol

I know, but how do I initiate a session from ZFS10?



If a session doesn't show up in "stmfadm list-target -v", then you've got 
something wrong in the IB fabric, assuming the view is right. Do you use a switch? 
Where do you have your IB Subnet Manager?








Br,

Rune



From: Johan Kragsterman [mailto:johan.kragster...@capvert.se]
Sent: Sunday, November 02, 2014 10:33 AM
To: Rune Tipsmark
Cc: David Bomba; 
omnios-discuss@lists.omniti.com<mailto:omnios-discuss@lists.omniti.com>
Subject: Ang: RE: Re: [OmniOS-discuss] zfs send via SRP or other RDMA enabled 
protocol






-Rune Tipsmark mailto:r...@steait.net>> skrev: -

Till: Johan Kragsterman 
mailto:johan.kragster...@capvert.se>>
Från: Rune Tipsmark mailto:r...@steait.net>>
Datum: 2014-11-02 19:11
Kopia: David Bomba mailto:turbo...@gmail.com>>, 
"omnios-discuss@lists.omniti.com<mailto:omnios-discuss@lists.omniti.com>" 
mailto:omnios-discuss@lists.omniti.com>>
Ärende: RE: Re: [OmniOS-discuss] zfs send via SRP or other RDMA enabled protocol

Hi Johan

Got two ZFS boxes (ZFS00 recv, ZFS10 send), both with IB and all configured and 
views made for vSphere which works just fine..

What I can't  figure out is how to share a LUN with the other zfs box... see 
pasted info below...

ZFS00: (the box I want to receive my zfs snapshot) The below are all ESX 
servers, cannot see the other ZFS box

root@zfs00:/pool03# stmfadm list-target -v
Target: eui.0002C90300095E7C
Operational Status: Online
Provider Name : srpt
Alias : -
Protocol  : SRP
Sessions  : 8
Initiator: eui.0002C903000F397C
Alias: 8102c90300095e7e:0002c903000f397c
Logged in since: Sun Nov  2 02:09:41 2014
Initiator: eui.0002C903000F397B
Alias: 8102c90300095e7d:0002c903000f397b
Logged in since: Sun Nov  2 02:09:40 2014
Initiator: eui.0002C903000D3D04
Alias: 8102c90300095e7e:0002c903000d3d04
Logged in since: Sat Nov  1 21:14:47 2014
Initiator: eui.0002C90300104F47
Alias: 8102c90300095e7d:0002c90300104f47
Logged in since: Sat Nov  1 21:12:54 2014
Initiator: eui.0002C903000D3D03
Alias: 8102c90300095e7d:0002c903000d3d03
Logged in since: Sat Nov  1 21:12:32 2014
Initiator: eui.0002C90300104F48
Alias: 8102c90300095e7e:0002c90300104f48
Logged in since: Sat Nov  1 21:10:45 2014
Initiator: eui.0002C903000A48FA
Alias: 8102c90300095e7e:0002c903000a48fa

Re: [OmniOS-discuss] zfs send via SRP or other RDMA enabled protocol

2014-11-02 Thread Rune Tipsmark
Tjenare,

Maybe it's not possible; that would explain why I cannot see the shared LUN on the 
other system.
I don't have any spare PCI-E slots on either, all filled with IODrives etc.

Br,
Rune

From: Johan Kragsterman [mailto:johan.kragster...@capvert.se]
Sent: Sunday, November 02, 2014 11:24 PM
To: Rune Tipsmark
Cc: omnios-discuss@lists.omniti.com
Subject: Ang: RE: RE: RE: RE: Re: [OmniOS-discuss] zfs send via SRP or other 
RDMA enabled protocol


Hej!


 Hmm, how about the target/initiator configuration of the HCAs?

 When I think about it, I have never done what you're trying: to use SRP 
both as target and initiator on the same system. I don't even know if it is 
possible. Since an HCA is entirely (with all ports) a target, you need to set 
one HCA as a target and one HCA as an initiator. I've done it on Fibre 
Channel, but never on IB/SRP.


Best regards from/Med vänliga hälsningar från

Johan Kragsterman

Capvert


-Rune Tipsmark mailto:r...@steait.net>> skrev: -
Till: Johan Kragsterman 
mailto:johan.kragster...@capvert.se>>
Från: Rune Tipsmark mailto:r...@steait.net>>
Datum: 2014-11-03 08:09
Kopia: 
"omnios-discuss@lists.omniti.com<mailto:omnios-discuss@lists.omniti.com>" 
mailto:omnios-discuss@lists.omniti.com>>
Ärende: RE: RE: RE: RE: Re: [OmniOS-discuss] zfs send via SRP or other RDMA 
enabled protocol
ConnectX-2 and drivers are loaded, both OmniOS servers have LUNs I can access 
from both ESX and Windows... it's just the connection between them that I can't 
figure out.
Br,
Rune

From: Johan Kragsterman [mailto:johan.kragster...@capvert.se]
Sent: Sunday, November 02, 2014 10:49 PM
To: Rune Tipsmark
Cc: omnios-discuss@lists.omniti.com<mailto:omnios-discuss@lists.omniti.com>
Subject: Ang: RE: RE: RE: Re: [OmniOS-discuss] zfs send via SRP or other RDMA 
enabled protocol


Hej!
I can see I forgot the list in previous mail...

Anyway, are you using connectx adapters?

Have you checked if the driver is loaded in the host system?


Best regards from/Med vänliga hälsningar från

Johan Kragsterman

Capvert


-Rune Tipsmark mailto:r...@steait.net>> skrev: -----
Till: Johan Kragsterman 
mailto:johan.kragster...@capvert.se>>
Från: Rune Tipsmark mailto:r...@steait.net>>
Datum: 2014-11-03 00:07
Ärende: RE: RE: RE: Re: [OmniOS-discuss] zfs send via SRP or other RDMA enabled 
protocol
It doesn't show up in stmfadm list-target only my other servers such as ESX 
and Windows show up...

Subnet Manager is OpenSM running on the ESX hosts, has been working OK ever 
since I installed then, prior to that I had a couple of Cisco(Topspin) DDR 
Switches with inbuilt SM.

Any other idea? All things I configured so far on both ZFS boxes,windows and 
esx has worked as expected... just this darn thing that's not doing what its 
supposed to...

It should be enough to just add the HCA from ZFS10 onto the host-group on ZFS00 
for ZFS10 to see a LUN on ZFS00 granted that there is a view configured on 
ZFS00 as well... or am I missing a step?

Br
Rune

-Original Message-
From: Johan Kragsterman [mailto:johan.kragster...@capvert.se]
Sent: Sunday, November 02, 2014 11:58 AM
To: Rune Tipsmark
Subject: Ang: RE: RE: Re: [OmniOS-discuss] zfs send via SRP or other RDMA 
enabled protocol


-Rune Tipsmark mailto:r...@steait.net>> skrev: -
Till: Johan Kragsterman 
mailto:johan.kragster...@capvert.se>>
Från: Rune Tipsmark mailto:r...@steait.net>>
Datum: 2014-11-02 20:05
Kopia: David Bomba mailto:turbo...@gmail.com>>, 
"omnios-discuss@lists.omniti.com<mailto:omnios-discuss@lists.omniti.com>" 
mailto:omnios-discuss@lists.omniti.com>>
Ärende: RE: RE: Re: [OmniOS-discuss] zfs send via SRP or other RDMA enabled 
protocol

I know, but how do I initiate a session from ZFS10?



If a session doesn't show up in "stmfadm list-target -v", then you've got 
something wrong in the IB fabric, if the view is right. Do you use a switch? 
Where do you have your IB Storage Manager?








Br,

Rune



From: Johan Kragsterman [mailto:johan.kragster...@capvert.se]
Sent: Sunday, November 02, 2014 10:33 AM
To: Rune Tipsmark
Cc: David Bomba; 
omnios-discuss@lists.omniti.com<mailto:omnios-discuss@lists.omniti.com>
Subject: Ang: RE: Re: [OmniOS-discuss] zfs send via SRP or other RDMA enabled 
protocol






-Rune Tipsmark mailto:r...@steait.net>> skrev: -

Till: Johan Kragsterman 
mailto:johan.kragster...@capvert.se>>
Från: Rune Tipsmark mailto:r...@steait.net>>
Datum: 2014-11-02 19:11
Kopia: David Bomba mailto:turbo...@gmail.com>>, 
"omnios-discuss@lists.omniti.com<mailto:omnios-discuss@lists.omniti.com>" 
mailto:omnios-discuss@lists.omniti.com>>
Ärende: RE: Re: [OmniOS-discuss] zfs send via SRP or other RDMA enabled protocol

Hi Johan

Got two ZFS boxes (ZFS00 recv, ZFS10 send), both with IB and all configured and 
views ma

Re: [OmniOS-discuss] zfs send via SRP or other RDMA enabled protocol

2014-11-03 Thread Rune Tipsmark
That was a secondary thought, maybe worth testing one day.
Primarily I was looking at a way of speeding up zfs send/recv.

Guess it's a no-go on a single HCA...
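(If the goal is mainly faster zfs send/recv between the two boxes, one option that 
needs no SRP initiator at all is to stream over the existing IPoIB links; a minimal 
sketch, with the pool, snapshot and address values as placeholders - replacing ssh 
with an unencrypted pipe such as mbuffer or netcat, if installed, usually helps 
throughput further:)

zfs send -R pool02@snap-today | ssh <ZFS00-IPoIB-address> zfs receive -Fv pool03/replica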

From: Johan Kragsterman [mailto:johan.kragster...@capvert.se]
Sent: Monday, November 03, 2014 12:41 AM
To: Rune Tipsmark
Cc: omnios-discuss@lists.omniti.com
Subject: Ang: RE: RE: RE: RE: RE: Re: [OmniOS-discuss] zfs send via SRP or 
other RDMA enabled protocol


Hej!


If you want to mirror a local volume with a remote LU, I don't think it is a 
good idea. Asking for trouble...

I understand you want to get an HA solution. But I believe HA solutions need 
other strategies than that.


Best regards from/Med vänliga hälsningar från

Johan Kragsterman

Capvert


-Rune Tipsmark mailto:r...@steait.net>> skrev: -
Till: Johan Kragsterman 
mailto:johan.kragster...@capvert.se>>
Från: Rune Tipsmark mailto:r...@steait.net>>
Datum: 2014-11-03 08:37
Kopia: 
"omnios-discuss@lists.omniti.com<mailto:omnios-discuss@lists.omniti.com>" 
mailto:omnios-discuss@lists.omniti.com>>
Ärende: RE: RE: RE: RE: RE: Re: [OmniOS-discuss] zfs send via SRP or other RDMA 
enabled protocol
Tjenare,

Maybe its not possible, would explain why I cannot see the shared lun on the 
other system.
I don't have any spare PCI-E slots on either, all filled with IODrives etc.

Br,
Rune

From: Johan Kragsterman [mailto:johan.kragster...@capvert.se]
Sent: Sunday, November 02, 2014 11:24 PM
To: Rune Tipsmark
Cc: omnios-discuss@lists.omniti.com<mailto:omnios-discuss@lists.omniti.com>
Subject: Ang: RE: RE: RE: RE: Re: [OmniOS-discuss] zfs send via SRP or other 
RDMA enabled protocol


Hej!


 Hmm, how about the target/initiator configuration of the HCAs?

 When I think about it, I have never done this that you're trying: To use SRP 
both as target and initiator on the same system. I don't even know if it is 
possible. Since an HCA is entirely(with all ports) a target, you need to set 
one HCA as a target, and one HCA as an initiator. I've done it on fibre 
channel, but never on IB/SRP.


Best regards from/Med vänliga hälsningar från

Johan Kragsterman

Capvert


-Rune Tipsmark mailto:r...@steait.net>> skrev: -
Till: Johan Kragsterman 
mailto:johan.kragster...@capvert.se>>
Från: Rune Tipsmark mailto:r...@steait.net>>
Datum: 2014-11-03 08:09
Kopia: 
"omnios-discuss@lists.omniti.com<mailto:omnios-discuss@lists.omniti.com>" 
mailto:omnios-discuss@lists.omniti.com>>
Ärende: RE: RE: RE: RE: Re: [OmniOS-discuss] zfs send via SRP or other RDMA 
enabled protocol
ConnectX-2 and drivers are loaded, both OmniOS servers have LUNs I can access 
from both ESX and Windows... it's just the connection between them that I can't 
figure out.
Br,
Rune

From: Johan Kragsterman [mailto:johan.kragster...@capvert.se]
Sent: Sunday, November 02, 2014 10:49 PM
To: Rune Tipsmark
Cc: omnios-discuss@lists.omniti.com<mailto:omnios-discuss@lists.omniti.com>
Subject: Ang: RE: RE: RE: Re: [OmniOS-discuss] zfs send via SRP or other RDMA 
enabled protocol


Hej!
I can see I forgot the list in previous mail...

Anyway, are you using connectx adapters?

Have you checked if the driver is loaded in the host system?


Best regards from/Med vänliga hälsningar från

Johan Kragsterman

Capvert


-----Rune Tipsmark mailto:r...@steait.net>> skrev: -
Till: Johan Kragsterman 
mailto:johan.kragster...@capvert.se>>
Från: Rune Tipsmark mailto:r...@steait.net>>
Datum: 2014-11-03 00:07
Ärende: RE: RE: RE: Re: [OmniOS-discuss] zfs send via SRP or other RDMA enabled 
protocol
It doesn't show up in stmfadm list-target only my other servers such as ESX 
and Windows show up...

Subnet Manager is OpenSM running on the ESX hosts, has been working OK ever 
since I installed then, prior to that I had a couple of Cisco(Topspin) DDR 
Switches with inbuilt SM.

Any other idea? All things I configured so far on both ZFS boxes,windows and 
esx has worked as expected... just this darn thing that's not doing what its 
supposed to...

It should be enough to just add the HCA from ZFS10 onto the host-group on ZFS00 
for ZFS10 to see a LUN on ZFS00 granted that there is a view configured on 
ZFS00 as well... or am I missing a step?

Br
Rune

-Original Message-
From: Johan Kragsterman [mailto:johan.kragster...@capvert.se]
Sent: Sunday, November 02, 2014 11:58 AM
To: Rune Tipsmark
Subject: Ang: RE: RE: Re: [OmniOS-discuss] zfs send via SRP or other RDMA 
enabled protocol


-Rune Tipsmark mailto:r...@steait.net>> skrev: -
Till: Johan Kragsterman 
mailto:johan.kragster...@capvert.se>>
Från: Rune Tipsmark mailto:r...@steait.net>>
Datum: 2014-11-02 20:05
Kopia: David Bomba mailto:turbo...@gmail.com>>, 
"omnios-discuss@lists.omniti.com<mailto:omnios-discuss@lists.omniti.com>" 
mailto:omnios-discuss@lists.omniti.com>>
Ärende: RE: RE: Re: [OmniOS-

Re: [OmniOS-discuss] infiniband

2014-11-09 Thread Rune Tipsmark
What network throughput were you looking at before the tweaking?

Br,
Rune

-Original Message-
From: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com] On Behalf 
Of Michael Rasmussen
Sent: Sunday, November 09, 2014 5:21 PM
To: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] infiniband

On Mon, 10 Nov 2014 00:06:26 +0100
Michael Rasmussen  wrote:

> 
> This explains the difference in speed since 2.5GT/s x2 is roughly 3.5 
> gbps over IPoIB.
> 
> Can anybody explain why OmniOS only uses 2 lanes when the PCIe slot and 
> HCA are capable of 8 lanes?
> 
Found the "bug". All that was required was to change the slot ;-)

01:00.0 InfiniBand: Mellanox Technologies MT25408 [ConnectX VPI - IB SDR / 
10GigE] (rev a0) Subsystem: Mellanox Technologies Device 0003
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr+ Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF-
FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-

-- 
Hilsen/Regards
Michael Rasmussen

Get my public GnuPG keys:
michael  rasmussen  cc
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
mir  datanom  net
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
mir  miras  org
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917
--
/usr/games/fortune -es says:
ROMEO:  Courage, man; the hurt cannot be much.
MERCUTIO:   No, 'tis not so deep as a well, nor so wide
as a church-door; but 'tis enough, 'twill serve.
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] No space left on device - upgrade failed

2014-11-10 Thread Rune Tipsmark
Hi all,

Hoping someone can help here.

root@zfs00:~# /usr/bin/pkg update --be-name=omnios-r151012 
entire@11,5.11-0.151012
Creating Plan |pkg: An error was encountered while attempting to store 
information about the
current operation in client history.
pkg: [Errno 28] No space left on device: 
'/var/pkg/history/20141110T181542Z-01.xml'

root@zfs00:~# zpool list
NAME SIZE  ALLOC   FREE  EXPANDSZCAP  DEDUP  HEALTH  ALTROOT
pool03  10.9T  5.03G  10.9T - 0%  1.00x  ONLINE  -
rpool   14.9G  10.5G  4.35G -70%  1.00x  ONLINE  -
root@zfs00:~#

Seems like there is enough space?

How do I fix this? And how do I keep a tidy rpool at all times?

Br,
Rune

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] No space left on device - upgrade failed

2014-11-10 Thread Rune Tipsmark
>What is the dataset breakout (zfs list)?
>Maybe you have reservations like swap and dump (volumes in general) - their 
>unused space is not available for other datasets and not allocated on >backend 
>storage either (what zpool list reflects).

root@zfs00:~# zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
pool03 189G  10.7T  38.5K  /pool03
pool03/ISO  34K  10.5T34K  /pool03/ISO
pool03/vol0116K  10.5T16K  -
rpool 13.7G   992M  38.5K  /rpool
rpool/ROOT3.55G   992M31K  legacy
rpool/ROOT/napp-it-0.9e1  3.55G   992M  3.25G  /
rpool/ROOT/omniosvar31K   992M31K  legacy
rpool/dump5.99G   992M  5.99G  -
rpool/export63K   992M32K  /export
rpool/export/home   31K   992M31K  /export/home
rpool/swap4.13G  5.09G  4.44M  -

I managed to delete some snapshots and got back on track, now 922 MB free...  
how much space is OmniOS supposed to take up? An rpool of 13.7G with only 922M 
free... looks like it's filling up worse than Windows! :)
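(For keeping rpool tidy, the usual suspects are old boot environments, crash dumps 
and pkg history; a rough cleanup sketch - the BE name is an example, check beadm 
list first:)

beadm list                        # see which boot environments still exist
beadm destroy -F omnios-old-be    # remove a BE you no longer boot from (example name)
ls /var/crash                     # old crash dumps can be deleted once analyzed
du -sh /var/pkg/history           # pkg history can also grow over time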

Br,
Rune
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] No space left on device - upgrade failed

2014-11-10 Thread Rune Tipsmark
No idea what dump contains, but I ran zfs set volsize=4G rpool/dump and it's now 
4G.
I also found a crash dump in /var/crash/unknown and that freed up another 
couple of gigs.

Can I reduce swap somehow?

root@zfs00:/root# zfs list
NAMEUSED  AVAIL  REFER  MOUNTPOINT
pool03  189G  10.5T  38.5K  /pool03
pool03/ISO   27K  10.4T27K  /pool03/ISO
pool03/vol01 16K  10.4T16K  -
rpool  11.5G  2.88G  31.5K  /rpool
rpool/ROOT 3.40G  2.88G31K  legacy
rpool/ROOT/omnios-r151012  3.40G  2.88G  2.37G  /
rpool/dump 4.00G  2.88G  4.00G  -
rpool/export 63K  2.88G32K  /export
rpool/export/home31K  2.88G31K  /export/home
rpool/swap 4.13G  7.00G  4.44M  -
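(To the swap question: yes - the swap zvol can be shrunk once it is taken out of use; 
a sketch assuming the rpool/swap and rpool/dump layout shown above, with 2G as an 
example size:)

swap -l                                   # list active swap devices
swap -d /dev/zvol/dsk/rpool/swap          # remove the zvol from swap
zfs set volsize=2G rpool/swap             # shrink it
swap -a /dev/zvol/dsk/rpool/swap          # add it back
dumpadm -d /dev/zvol/dsk/rpool/dump       # (re)point crash dumps at the dump zvol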

-Original Message-
From: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com] On Behalf 
Of Michael Rasmussen
Sent: Monday, November 10, 2014 3:47 PM
To: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] No space left on device - upgrade failed

On Mon, 10 Nov 2014 23:32:14 +
Rune Tipsmark  wrote:

> 
> root@zfs00:~# zfs list
> NAME   USED  AVAIL  REFER  MOUNTPOINT
> pool03 189G  10.7T  38.5K  /pool03
> pool03/ISO  34K  10.5T34K  /pool03/ISO
> pool03/vol0116K  10.5T16K  -
> rpool 13.7G   992M  38.5K  /rpool
> rpool/ROOT3.55G   992M31K  legacy
> rpool/ROOT/napp-it-0.9e1  3.55G   992M  3.25G  /
> rpool/ROOT/omniosvar31K   992M31K  legacy
> rpool/dump5.99G   992M  5.99G  -
> rpool/export63K   992M32K  /export
> rpool/export/home   31K   992M31K  /export/home
> rpool/swap4.13G  5.09G  4.44M  -
> 
> I managed to delete some snapshots and got back on track, now 922 MB free...  
> how much spare is OmniOS supposed to take up? Rpool of 13.7G where only 922M 
> free... looks like its filling up worse than Windows! :)
> 
What does rpool/dump contain? It is roughly reserving half of the pool.

-- 
Hilsen/Regards
Michael Rasmussen

Get my public GnuPG keys:
michael  rasmussen  cc
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
mir  datanom  net
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
mir  miras  org
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917
--
/usr/games/fortune -es says:
No one may kill a man.  Not for any purpose.  It cannot be condoned.
-- Kirk, "Spock's Brain", stardate 5431.6
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] ZFS pool allocation remains after removing all files

2014-11-11 Thread Rune Tipsmark
So just to revive an older thread

Thin Provisioned LU's have only one VAAI option supported:

naa.600144f007780b7f5462ef090002
   VAAI Plugin Name:
   ATS Status: unsupported
   Clone Status: unsupported
   Zero Status: supported
   Delete Status: unsupported

But if I create a  thin provisioned Volume two VAAI options are supported:

naa.600144f007780b7f5462eb3d0001
   VAAI Plugin Name:
   ATS Status: unsupported
   Clone Status: unsupported
   Zero Status: supported
   Delete Status: supported

-
How can this be? 

Anyone know if there are any news on the two remaining unsupported options?
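(For reference, the per-LU VAAI status quoted above can be pulled on the ESXi 5.x side 
with something like the following - the naa identifier is one of the ones listed above:)

esxcli storage core device vaai status get -d naa.600144f007780b7f5462ef090002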

Br,
Rune

-Original Message-
From: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com] On Behalf 
Of Rune Tipsmark
Sent: Friday, October 10, 2014 1:58 PM
To: Richard Elling
Cc: omnios-discuss
Subject: Re: [OmniOS-discuss] ZFS pool allocation remains after removing all 
files

Same acceleration on iSCSI

naa.600144f0908abf5d539106e40001
   VAAI Plugin Name:
   ATS Status: unsupported
   Clone Status: unsupported
   Zero Status: supported
   Delete Status: unsupported

Rune

-Original Message-
From: Richard Elling [mailto:richard.ell...@richardelling.com] 
Sent: Friday, October 10, 2014 10:01 AM
To: Rune Tipsmark
Cc: Dan McDonald; omnios-discuss
Subject: Re: [OmniOS-discuss] ZFS pool allocation remains after removing all 
files


On Oct 9, 2014, at 4:58 PM, Rune Tipsmark  wrote:

> Just updated to latest version r151012
> 
> Still same... I checked for vdev settings, is there another place I can check?

It won't be a ZFS feature. On the initiator, use something like sg3_utils 
thusly:

[root@congo ~]# sg_opcodes /dev/rdsk/c0t5000C50030117C3Bd0
  SEAGATE   ST800FM0043   0005
  Peripheral device type: disk

Opcode  ServiceCDBName
(hex)   action(h)  size   
---
 00  6Test Unit Ready
 01  6Rezero Unit
 03  6Request Sense
 04  6Format Unit
 07  6Reassign Blocks
 08  6Read(6)
 0a  6Write(6)
 0b  6Seek(6)
 12  6Inquiry
 15  6Mode select(6)
 16  6Reserve(6)
 17  6Release(6)
 1a  6Mode sense(6)
 1b  6Start stop unit
 1c  6Receive diagnostic results
 1d  6Send diagnostic
 25 10Read capacity(10)
 28 10Read(10)
 2a 10Write(10)
 2b 10Seek(10)
 2e 10Write and verify(10)
 2f 10Verify(10)
 35 10Synchronize cache(10)
 37 10Read defect data(10)
 3b010Write buffer, combined header and data [or multiple 
modes]
 3b210Write buffer, data
 3b410Write buffer, download microcode and activate
 3b510Write buffer, download microcode, save, and activate
 3b610Write buffer, download microcode with offsets and 
activate
 3b710Write buffer, download microcode with offsets, save, 
and activate
 3ba10Write buffer, write data to echo buffer
 3bd10Write buffer, download microcode with offsets, select 
activation events, save and defer activate
 3be10Write buffer, download microcode with offsets, save 
and defer activate
 3bf10Write buffer, activate deferred microcode
 3b   1a10Write buffer, enable expander comms protocol and echo 
buffer
 3b   1c10Write buffer, download application client error 
history
 3c010Read buffer, combined header and data [or multiple 
modes]
 3c210Read buffer, data
 3c310Read buffer, descriptor
 3ca10Read buffer, read data from echo buffer
 3cb10Read buffer, echo buffer descriptor
 3c   1c10Read buffer, error history
 3e 10Read long(10)
 3f 10Write long(10)
 41 10Write same(10)
 42 10Unmap
 48210Sanitize, block erase
 48   1f10Sanitize, exit failure mode
 4c 10Log select
 4d 10Log sense
 55 10Mode select(10)
 56 10Reserve(10)
 57 10Release(10)
 5a 10Mode sense(10)
 5e010Persistent reserve in, read keys
 5e110Persistent reserve in, read reservation
 5e210Persistent reserve in, report capabilities
 5e310Persistent reserve in, read full status
 5f010Persistent reserv

Re: [OmniOS-discuss] infiniband

2014-11-12 Thread Rune Tipsmark
Just applied it on my server; it doubled my write speeds over iSER from ESX - 
basically they are on par with SRP now, just with worse latency.
Now I just have to find out if there are tweaks to be done to ESX to improve 
further.

Did you try creating more partitions say 4 partitions to see if it performs 
better than 2? I see about twice the speed using round robin with two paths 
instead of just a single path.
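(On the ESXi 5.x side, the path policy and the IO operation limit that often matters 
for round-robin throughput can be set per device roughly like this - the device name 
is a placeholder and iops=1 is just a commonly tried starting point, not something 
established in this thread:)

esxcli storage nmp device set -d naa.<device-id> --psp VMW_PSP_RR
esxcli storage nmp psp roundrobin deviceconfig set -d naa.<device-id> --type iops --iops 1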

Br,
Rune

-Original Message-
From: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com] On Behalf 
Of Michael Rasmussen
Sent: Sunday, November 09, 2014 11:02 PM
To: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] infiniband

On Mon, 10 Nov 2014 04:38:23 +
Rune Tipsmark  wrote:

> What network throughput were you looking at before the tweaking?
> 
Raised my performance from 5.2 gbps to 7.9 gbps (50% performance
increase)

-- 
Hilsen/Regards
Michael Rasmussen

Get my public GnuPG keys:
michael  rasmussen  cc
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
mir  datanom  net
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
mir  miras  org
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917
--
/usr/games/fortune -es says:
Who needs friends when you can sit alone in your room and drink?
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] slog limits write speed more than it should

2014-11-12 Thread Rune Tipsmark
Hi all,

Got a problem... with my pool using sync=always I see a max write speed of 
about 6000 IOPS (64KB block size) during storage vMotion.
It doesn't matter if I have one, two or three SLOGs: if I use one it will just 
do ~6000 w/s at 0% busy (SLC IO Drive), if I use two of these each will do ~3000 
w/s or so... if I use 3 each will do ~2000 w/s or so... the underlying disks 
in the pool are maybe 10-15% busy tops...
If I disable sync I see tasks such as vMotion for a 40GB VM drop from 3-4 
minutes to less than 1 minute.

I can't see where the problem is: each SLOG device can easily handle much more 
than 6000 IOPS, and vSphere can easily get vMotion done faster when sync is 
disabled. So essentially the SLOGs aren't busy, the disks aren't busy... sync just 
kills the speed... What parameters can I tweak to push this performance up 
significantly? I tried 512B, 4KB, 64KB and 128KB block sizes; 64KB (the default) 
seems to have the best performance.

Any ideas welcome.
Br,
Rune



___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] slog limits write speed more than it should

2014-11-12 Thread Rune Tipsmark
http://www.fusionio.com/products/iodrive

160GB SLC

-Original Message-
From: Bob Friesenhahn [mailto:bfrie...@simple.dallas.tx.us] 
Sent: Wednesday, November 12, 2014 3:17 PM
To: Rune Tipsmark
Cc: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] slog limits write speed more than it should

On Wed, 12 Nov 2014, Rune Tipsmark wrote:
> 
> each SLOG device can easily handle much more than 6000 IOPS

Where may we find the specifications for the SSDs you are using?

6000 IOPS sounds like it might be quite a lot, depending on the SSD used.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] infiniband

2014-11-12 Thread Rune Tipsmark
Nope, just a subnet manager...

I would do something like this

dladm create-part -l ibp0 -P 0x p.ibp0
dladm create-part -l ibp0 -P 0x p.ibp2
dladm create-part -l ibp0 -P 0x p.ibp4
dladm create-part -l ibp0 -P 0x p.ibp6
dladm create-part -l ibp1 -P 0x p.ibp1
dladm create-part -l ibp1 -P 0x p.ibp3
dladm create-part -l ibp1 -P 0x p.ibp5
dladm create-part -l ibp1 -P 0x p.ibp7

ipadm create-if p.ibp0
ipadm create-if p.ibp1
ipadm create-if p.ibp2
ipadm create-if p.ibp3
ipadm create-if p.ibp4
ipadm create-if p.ibp5
ipadm create-if p.ibp6
ipadm create-if p.ibp7

ipadm create-addr -T static -a 10.98.0.10 p.ibp0/ipv4
ipadm create-addr -T static -a 10.99.0.10 p.ibp1/ipv4
ipadm create-addr -T static -a 10.98.0.12 p.ibp2/ipv4
ipadm create-addr -T static -a 10.99.0.12 p.ibp3/ipv4
ipadm create-addr -T static -a 10.98.0.14 p.ibp4/ipv4
ipadm create-addr -T static -a 10.99.0.14 p.ibp5/ipv4
ipadm create-addr -T static -a 10.98.0.16 p.ibp6/ipv4
ipadm create-addr -T static -a 10.99.0.16 p.ibp7/ipv4


-Original Message-
From: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com] On Behalf 
Of Michael Rasmussen
Sent: Wednesday, November 12, 2014 3:48 PM
To: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] infiniband

How exactly do you configure this?
Is a switch required?

On Wed, 12 Nov 2014 23:19:24 +
Rune Tipsmark  wrote:

> Multipath, round robin.
> You can create multiple IPoIB partitions... if you want 4 on each port
> 
> dladm create-part -l ibp0 -P 0x p.ibp0
> dladm create-part -l ibp0 -P 0x p.ibp2
> dladm create-part -l ibp0 -P 0x p.ibp4
> dladm create-part -l ibp0 -P 0x p.ibp6
> dladm create-part -l ibp1 -P 0x p.ibp1
> dladm create-part -l ibp1 -P 0x p.ibp3
> dladm create-part -l ibp1 -P 0x p.ibp5
> dladm create-part -l ibp1 -P 0x p.ibp7
> 
> 
> -Original Message-
> From: Michael Rasmussen [mailto:m...@miras.org]
> Sent: Wednesday, November 12, 2014 3:00 PM
> To: Rune Tipsmark
> Subject: Re: [OmniOS-discuss] infiniband
> 
> On Wed, 12 Nov 2014 22:35:46 +
> Rune Tipsmark  wrote:
> 
> > 
> > Did you try creating more partitions say 4 partitions to see if it performs 
> > better than 2? I see about twice the speed using round robin with two paths 
> > instead of just a single path.
> > 
> What do you mean by partitions?
> By two paths do you mean multipath or bonding?
> 


--
Hilsen/Regards
Michael Rasmussen

Get my public GnuPG keys:
michael  rasmussen  cc
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
mir  datanom  net
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
mir  miras  org
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917
--
/usr/games/fortune -es says:
index, n.:
Alphabetical list of words of no possible interest where an
alphabetical list of subjects with references ought to be.
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] infiniband

2014-11-12 Thread Rune Tipsmark
Yes, /24, and I also do the same as you: different subnets for each port, else it's 
game over...
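(The /24 can also be made explicit in the address itself rather than relying on the 
default netmask - same ipadm commands as earlier in the thread, just with a prefix 
length added:)

ipadm create-addr -T static -a 10.98.0.10/24 p.ibp0/ipv4
ipadm create-addr -T static -a 10.99.0.10/24 p.ibp1/ipv4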

-Original Message-
From: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com] On Behalf 
Of Michael Rasmussen
Sent: Wednesday, November 12, 2014 4:45 PM
To: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] infiniband

On Thu, 13 Nov 2014 00:22:40 +
Rune Tipsmark  wrote:

> 
> ipadm create-addr -T static -a 10.98.0.10 p.ibp0/ipv4
> ipadm create-addr -T static -a 10.99.0.10 p.ibp1/ipv4
> ipadm create-addr -T static -a 10.98.0.12 p.ibp2/ipv4
> ipadm create-addr -T static -a 10.99.0.12 p.ibp3/ipv4
> ipadm create-addr -T static -a 10.98.0.14 p.ibp4/ipv4
> ipadm create-addr -T static -a 10.99.0.14 p.ibp5/ipv4
> ipadm create-addr -T static -a 10.98.0.16 p.ibp6/ipv4
> ipadm create-addr -T static -a 10.99.0.16 p.ibp7/ipv4
> 
What is the subnet, /16 or /24

From my findings you cannot have both ports on the same subnet, so I guess this 
means /24, right?

--
Hilsen/Regards
Michael Rasmussen

Get my public GnuPG keys:
michael  rasmussen  cc
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
mir  datanom  net
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
mir  miras  org
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917
--
/usr/games/fortune -es says:
You will receive a legacy which will place you above want.
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] infiniband

2014-11-12 Thread Rune Tipsmark
Not sure about IPMP, I would just use the same number of partitions on my ESX 
and round robin the whole thing...

Do you have more info on the network tweaks you did earlier? With iSER I see a 
slightly higher response time than with SRP but lower spikes.
Throughput is 10% higher and CPU 50% lower, so I might just stick with iSER 
from now on.


-Original Message-
From: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com] On Behalf 
Of Michael Rasmussen
Sent: Wednesday, November 12, 2014 4:56 PM
To: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] infiniband

On Thu, 13 Nov 2014 00:22:40 +
Rune Tipsmark  wrote:

> 
> ipadm create-addr -T static -a 10.98.0.10 p.ibp0/ipv4
> ipadm create-addr -T static -a 10.99.0.10 p.ibp1/ipv4
> ipadm create-addr -T static -a 10.98.0.12 p.ibp2/ipv4
> ipadm create-addr -T static -a 10.99.0.12 p.ibp3/ipv4
> ipadm create-addr -T static -a 10.98.0.14 p.ibp4/ipv4
> ipadm create-addr -T static -a 10.99.0.14 p.ibp5/ipv4
> ipadm create-addr -T static -a 10.98.0.16 p.ibp6/ipv4
> ipadm create-addr -T static -a 10.99.0.16 p.ibp7/ipv4
> 
What are the next steps? IPMP? or something else?

--
Hilsen/Regards
Michael Rasmussen

Get my public GnuPG keys:
michael  rasmussen  cc
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
mir  datanom  net
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
mir  miras  org
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917
--
/usr/games/fortune -es says:
Win95 is not a virus; a virus does something.
-- unknown source
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] slog limits write speed more than it should

2014-11-14 Thread Rune Tipsmark
Looking at the first link, he clearly states that the ZIL is written to in a round 
robin fashion; I was under the impression that 2 log devices would then be 
faster than 1… unless mirrored, of course.

If this is not true, what is the point of allowing more than 1 vdev as log 
device at all?

How can I test the max speed of my log device directly in OmniOS?
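(One crude way to get a ceiling figure without special tools is to give the log device 
a throwaway pool of its own and force every write through the ZIL; a sketch - the 
device name is whichever Fusion-io drive is currently free, and this destroys whatever 
is on it:)

zpool create testzil c14d0            # throwaway pool on the candidate log device
zfs set sync=always testzil
dd if=/dev/zero of=/testzil/f bs=64k count=50000 &
iostat -xn c14d0 1                    # watch w/s and kw/s while the dd runs
zpool destroy testzil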

Br,
Rune

From: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com] On Behalf 
Of Nicholas George
Sent: Wednesday, November 12, 2014 9:24 PM
To: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] slog limits write speed more than it should

From what I understand, ZFS writes to the intent log in a single threaded 
synchronous fashion. Adding three drives just means that the first block is 
written to the first drive synchronously, then the second block to the second 
drive synchronously, then the third to the third drive and then back around to 
the first. You don't actually increase your throughput beyond the performance 
of a single drive.

http://nex7.blogspot.com/2013/04/zfs-intent-log.html
http://www.nexentastor.org/boards/5/topics/6179

On Wed, Nov 12, 2014 at 6:21 PM, Rune Tipsmark 
mailto:r...@steait.net>> wrote:
http://www.fusionio.com/products/iodrive

160GB SLC

-Original Message-
From: Bob Friesenhahn 
[mailto:bfrie...@simple.dallas.tx.us<mailto:bfrie...@simple.dallas.tx.us>]
Sent: Wednesday, November 12, 2014 3:17 PM
To: Rune Tipsmark
Cc: omnios-discuss@lists.omniti.com<mailto:omnios-discuss@lists.omniti.com>
Subject: Re: [OmniOS-discuss] slog limits write speed more than it should

On Wed, 12 Nov 2014, Rune Tipsmark wrote:
>
> each SLOG device can easily handle much more than 6000 IOPS

Where may we find the specifications for the SSDs you are using?

6000 IOPS sounds like it might be quite a lot, depending on the SSD used.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us<mailto:bfrie...@simple.dallas.tx.us>, 
http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com<mailto:OmniOS-discuss@lists.omniti.com>
http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] slog limits write speed more than it should

2014-11-14 Thread Rune Tipsmark
Well that sucks... I guess one more reason to move to NV-Dimms to replace slow 
SLC cards.
Br,
Rune

-Original Message-
From: Bob Friesenhahn [mailto:bfrie...@simple.dallas.tx.us] 
Sent: Friday, November 14, 2014 6:48 AM
To: Rune Tipsmark
Cc: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] slog limits write speed more than it should

On Fri, 14 Nov 2014, Rune Tipsmark wrote:

> 
> Looking at the first link he clearly states that ZIL is written to in 
> a round robin fashion, I was under the impression that 2 log devices would 
> then be faster than 1…unless mirrored of course.

This could be a mistaken impression.  The slog commits each write before 
proceeding to the next write.  If the involved SSDs have a fixed minimum write 
latency, then this would limit the maximum transaction rate regardless of the 
number of SSDs involved.  There could be an advantage to more SSDs if the 
additional time between writes allows the SSD to more effectively prepare for 
the next write and reduce the write latency.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] slog limits write speed more than it should

2014-11-14 Thread Rune Tipsmark
Is there a way to stripe two block devices and use them as log?
I tried following this one 
https://blogs.oracle.com/bilke/entry/raid_0_stripe_on_solaris
But I cannot use the device in ZFS - getting the following error: cannot use 
'/dev/md/rdsk/zil1d0': must be a block device or regular file
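(There should be no need for SVM here: if more than one log vdev is added without the 
mirror keyword, ZFS already treats them as independent top-level log devices - whether 
that actually raises ZIL throughput is exactly the question above, but the syntax, with 
device names from earlier mails as examples, is simply:)

zpool add pool02 log c14d0 c15d0      # two separate (unmirrored) log vdevs
zpool status pool02                   # both show up under "logs"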

Br,
Rune

-Original Message-
From: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com] On Behalf 
Of Rune Tipsmark
Sent: Friday, November 14, 2014 9:47 AM
To: Bob Friesenhahn
Cc: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] slog limits write speed more than it should

Well that sucks... I guess one more reason to move to NV-Dimms to replace slow 
SLC cards.
Br,
Rune

-Original Message-
From: Bob Friesenhahn [mailto:bfrie...@simple.dallas.tx.us]
Sent: Friday, November 14, 2014 6:48 AM
To: Rune Tipsmark
Cc: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] slog limits write speed more than it should

On Fri, 14 Nov 2014, Rune Tipsmark wrote:

> 
> Looking at the first link he clearly states that ZIL is written to in 
> a round robin fashion, I was under the impression that 2 log devices would 
> then be faster than 1…unless mirrored of course.

This could be a mistaken impression.  The slog commits each write before 
proceeding to the next write.  If the involved SSDs have a fixed minimum write 
latency, then this would limit the maximum transaction rate regardless of the 
number of SSDs involved.  There could be an advantage to more SSDs if the 
additional time between writes allows the SSD to more effectively prepare for 
the next write and reduce the write latency.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] slog limits write speed more than it should

2014-11-14 Thread Rune Tipsmark
I get only about half the bandwidth with Sync=Always compared to Sync=Disabled.

Using an SLC device it should perform better; it's rated at 750 MB/sec and I only 
get something like 60% of that at the best of times. If there is a way to stripe a 
ZIL it would be great. I have enough devices to mirror two stripes and get the speed I 
want.



Copying ~60GB from one LUN to another on same ZFS box.



Sync=Always: (inline screenshot not preserved in the archive)

Sync=Disabled: (inline screenshot not preserved in the archive)



-Original Message-
From: Rune Tipsmark
Sent: Friday, November 14, 2014 11:53 AM
To: Rune Tipsmark; Bob Friesenhahn
Cc: omnios-discuss@lists.omniti.com
Subject: RE: [OmniOS-discuss] slog limits write speed more than it should



Is there a way to stripe two block devices and use them as log?

I tried following this one 
https://blogs.oracle.com/bilke/entry/raid_0_stripe_on_solaris

But I cannot use the device in ZFS - getting the following error: cannot use 
'/dev/md/rdsk/zil1d0': must be a block device or regular file



Br,

Rune



-Original Message-

From: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com] On Behalf 
Of Rune Tipsmark

Sent: Friday, November 14, 2014 9:47 AM

To: Bob Friesenhahn

Cc: omnios-discuss@lists.omniti.com<mailto:omnios-discuss@lists.omniti.com>

Subject: Re: [OmniOS-discuss] slog limits write speed more than it should



Well that sucks... I guess one more reason to move to NV-Dimms to replace slow 
SLC cards.

Br,

Rune



-Original Message-

From: Bob Friesenhahn [mailto:bfrie...@simple.dallas.tx.us]

Sent: Friday, November 14, 2014 6:48 AM

To: Rune Tipsmark

Cc: omnios-discuss@lists.omniti.com<mailto:omnios-discuss@lists.omniti.com>

Subject: Re: [OmniOS-discuss] slog limits write speed more than it should



On Fri, 14 Nov 2014, Rune Tipsmark wrote:



>

> Looking at the first link he clearly states that ZIL is written to in

> a round robin fashion, I was under the impression that 2 log devices would 
> then be faster than 1…unless mirrored of course.



This could be a mistaken impression.  The slog commits each write before 
proceeding to the next write.  If the involved SSDs have a fixed minimum write 
latency, then this would limit the maximum transaction rate regardless of the 
number of SSDs involved.  There could be an advantage to more SSDs if the 
additional time between writes allows the SSD to more effectively prepare for 
the next write and reduce the write latency.



Bob

--

Bob Friesenhahn

bfrie...@simple.dallas.tx.us<mailto:bfrie...@simple.dallas.tx.us>, 
http://www.simplesystems.org/users/bfriesen/

GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___

OmniOS-discuss mailing list

OmniOS-discuss@lists.omniti.com<mailto:OmniOS-discuss@lists.omniti.com>

http://lists.omniti.com/mailman/listinfo/omnios-discuss
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] need to change c17d0 to c15d0

2014-11-19 Thread Rune Tipsmark
I moved one of my PCI-E IOdrives and the disks changed from c14d0 and  c15d0 to 
c16d0 and c17d0
How do I change it back so I can get my pool back online?

32. c16d0 
  /pci@79,0/pci8086,3c02@1/pci10b5,8616@0/pci10b5,8616@5/pci103c,178e@0
33. c17d0 
  /pci@79,0/pci8086,3c02@1/pci10b5,8616@0/pci10b5,8616@6/pci103c,178e@0

Br,
Rune

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] need to change c17d0 to c15d0

2014-11-19 Thread Rune Tipsmark
Ok, no other option? I am physically not near it, and it's “just” a test pool so I 
can kill it and start over.
If I try the import option, should it  recognize the metadata of the devices 
and reuse it?
Br,
Rune

From: Scott LeFevre [mailto:slefe...@indy.rr.com]
Sent: Wednesday, November 19, 2014 6:47 AM
To: Rune Tipsmark
Cc: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] need to change c17d0 to c15d0

Put the drive/hba back and then export your pool (zpool export pool-name). Then 
move the drive/hba and import the pool (zpool import pool-name).
If putting the drive/hba back isn't an option, you could try a forced import 
with 'zpool import -f pool-name'.

Cheers,
--
Scott LeFevre
317-696-1010

On Wed, 2014-11-19 at 14:36 +, Rune Tipsmark wrote:
I moved one of my PCI-E IOdrives and the disks changed from c14d0 and  c15d0 to 
c16d0 and c17d0

How do I change it back so I can get my pool back online?



32. c16d0 

  /pci@79,0/pci8086,3c02@1/pci10b5,8616@0/pci10b5,8616@5/pci103c,178e@0

33. c17d0 

  /pci@79,0/pci8086,3c02@1/pci10b5,8616@0/pci10b5,8616@6/pci103c,178e@0



Br,

Rune






___

OmniOS-discuss mailing list

OmniOS-discuss@lists.omniti.com<mailto:OmniOS-discuss@lists.omniti.com>

http://lists.omniti.com/mailman/listinfo/omnios-discuss
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] need to change c17d0 to c15d0

2014-11-19 Thread Rune Tipsmark
Ok makes sense, but can I just destroy my current pool in the failed state and 
then import it with the -f option and it should show up with all 4 drives...

pool: pool02
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-3C
  scan: none requested
config:

NAMESTATE READ WRITE CKSUM  CAPProduct 
/napp-it   IOstat mess
pool02  UNAVAIL  0 0 0  insufficient replicas
  c12d0 ONLINE   0 0 0   
  c13d0 ONLINE   0 0 0   
  c14d0 UNAVAIL  0 0 0  cannot open 
  
  c15d0 UNAVAIL  0 0 0  cannot open


-Original Message-
From: Bob Friesenhahn [mailto:bfrie...@simple.dallas.tx.us] 
Sent: Wednesday, November 19, 2014 7:00 AM
To: Rune Tipsmark
Cc: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] need to change c17d0 to c15d0

On Wed, 19 Nov 2014, Rune Tipsmark wrote:

> 
> Ok, no other option? I am physically not near and its “just” a test pool so I 
> can kill it and start over.
> 
> If I try the import option, should it  recognize the metadata of the devices 
> and reuse it?

Zfs puts identifying information on all the pool disks so that they may be 
properly imported by pool name without regard for device identifiers.  'zpool 
import' scans all of the available devices, searching for devices which are in 
that pool.

If you don't want to physically move the drives again, then just re-import the 
pool.

It is always a good idea to export a pool before moving its drives around and 
then import the pool afterward.  Usually it just takes a few seconds to import 
the pool.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] need to change c17d0 to c15d0

2014-11-19 Thread Rune Tipsmark
Zpool destroy pool02 and then zpool import -f pool02 worked.
Br,
Rune

-Original Message-
From: Bob Friesenhahn [mailto:bfrie...@simple.dallas.tx.us] 
Sent: Wednesday, November 19, 2014 7:24 AM
To: Rune Tipsmark
Cc: omnios-discuss@lists.omniti.com
Subject: RE: [OmniOS-discuss] need to change c17d0 to c15d0

On Wed, 19 Nov 2014, Rune Tipsmark wrote:

> Ok makes sense, but can I just destroy my current pool in the failed 
> state and then import it with the -f option and it should show up with 
> all 4 drives...

That is actually a bit more work for you since two of the drives would be 
orphaned and look like they come from the former pool.  They will not be 
"destroyed" while they are still in state UNAVAIL.

It will take a bit more work to re-use these orphaned drives since zfs has 
precautions to avoid accidentally overwriting drives which are already in a 
pool.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] Active-Active vSphere

2014-11-27 Thread Rune Tipsmark
Hi guys,
Does anyone know if Active/Active and Round Robin is supported from vSphere 
towards OmniOS ZFS on Fiber Channel?
Br
Rune
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Active-Active vSphere

2014-11-27 Thread Rune Tipsmark
So to simplify, say we have one ESXi host that has two FC ports, and one OmniOS/ZFS 
server that has two FC ports as well.
Should it run ALUA with Round Robin instead of the default ALUA with MRU (most 
recently used)? RR has traffic on both paths (it says Active I/O on both) and MRU 
only on one...


-Original Message-
From: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com] On Behalf 
Of Saso Kiselkov
Sent: Thursday, November 27, 2014 12:47 PM
To: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] Active-Active vSphere

On 11/27/14 3:35 PM, Rune Tipsmark wrote:
> Hi guys,
> 
> Does anyone know if Active/Active and Round Robin is supported from 
> vSphere towards OmniOS ZFS on Fiber Channel?

The short answer is: yes, but you wouldn't want to employ Round-Robin on it. A 
ZFS pool can only be imported on a single node, never simultaneously on two or 
more. However, using ALUA it's possible to make the LUs on it visible from two 
OmniOS nodes - this is NOT round robin, though. The initiator will see two 
paths to the LU, but only one should be active at any one time (the one that 
has the pool holding the LU imported). Access to the LU over the secondary 
target will be possible, but slow. Upon failover, the secondary would grab the 
pool and become the preferred ALUA path, so it all works out OK. Google "ALUA" 
for more on the theory behind it.

It is a complicated to set up, though, so be aware of that.

Cheers,
--
Saso
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Active-Active vSphere

2014-11-28 Thread Rune Tipsmark
Okay, I noticed ALUA support is not enabled by default on OmniOS. How do I 
enable that?
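(The current state can at least be checked from the CLI; actually switching ALUA on is 
normally done by HA/cluster software through libstmf rather than by a documented 
stmfadm subcommand, so treat that part as an assumption to verify for your setup:)

stmfadm list-state        # shows Operational Status, Config Status and the ALUA state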



> On Nov 27, 2014, at 11:29 PM, Saso Kiselkov  wrote:
> 
>> On 11/27/14 11:40 PM, Rune Tipsmark wrote:
>> so to simplify, say we have one esxi host that has two FC ports, one 
>> omnios/zfs server has two FC ports as well.
>> Should it run ALUA with Round Robin instead of the default ALUA with MRU 
>> (most recently used). RR has traffic on both paths (says Active I/O on both) 
>> and MRU only on one...
> 
> Ah, now this is a different story altogether, if it's just one storage
> head node. In that case, Round-Robin will of course work exactly as
> expected, be it over FC or iSCSI and any number of links. The important
> thing is that the backing ZFS pool is only imported on one machine.
> 
> Cheers,
> -- 
> Saso
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] PCIe dedicated device for ZIL

2014-12-05 Thread Rune Tipsmark
It's a good idea. I use Fusion-io SLC drives but I still see a limit of about 
750 MB/sec, which is too little; also, I need to run multiple streams to achieve 
this - if I only use a single data stream I only get around 350 MB/sec. It makes 
no difference if I use 1 or 4 SLC drives, the speed remains pretty much the same 
+/- 5%.


After testing some mSATA drives in one of Intel's NUCs it became interesting to 
investigate whether it would be possible to get a motherboard with, say, 10 or 20 
mSATA slots. I even asked SuperMicro's engineers and they said no, because it's not an 
enterprise product.


However, there are some PCIe cards available where you can attach mSATA drives 
and even RAID them, and while I haven't tried this yet, I think it could deliver 
some nice cheap SLOG performance.


check http://www.addonics.com/products/ad4mspx2.php


Maybe there are better products out there, this was just what I came across 
initially.


Maybe even completely skip a dedicated log device, set the pool to 
sync=always and base it on, for example, 8 of the above PCIe cards each with 4 
1TB drives: you could have 32TB of raw SSD ultrafast storage in a 3U unit, and 
with compression and possibly dedup you could easily store a lot of data 
even when mirroring.


This is probably my next project anyway so interested to see what you find out 
in terms of upping the speed and lowering the latency.


br,

Rune




From: OmniOS-discuss  on behalf of 
Angel Angelov 
Sent: Tuesday, December 2, 2014 3:57 PM
To: omnios-discuss@lists.omniti.com
Subject: [OmniOS-discuss] PCIe dedicated device for ZIL

Hi guys,

does anyone played with some of these PCIe devices on the market as a cheaper 
ZIL dedicated device instead of this 
one
 for example that would work with OmniOS out of the box?

Here's one of the products I have found in the Internet googling around but 
it's way too big for the purpose:
http://www.kingspec.com/en/product_xx187.html

I am trying to find a good price/performance balance when full SSD storage 
based pool is in use.

Thanks in advance.



___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] hangs on reboot

2014-12-11 Thread Rune Tipsmark
hi all,



I got a bunch (3) installations of omnios on SuperMicro hardware and all 3 have 
issues rebooting. They simply hang and never ever reboot.



The install is latest version and I only added the storage-server package and 
installed napp-it and changed the fibre channel setting in 
/kernel/drv/emlxs.conf target-mode=1



two nics igb0 and igb1 configured as aggregation (aggr0)



besides this its 100% default install, not 10 minutes old...



Any ideas?

br,

Rune
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] hangs on reboot

2014-12-11 Thread Rune Tipsmark
NIC's are not shared with IPMI.



I will try the target-mode first, however on the 3rd box I actually have 
Infiniband and it also caused the same issue without touching the target-mode 
for FC.



I have a 4th system on an old HP server and it reboots perfectly every time - no 
target-mode, and it has InfiniBand as well... I am leaning towards something with 
the SuperMicro hardware but can't really pinpoint it.

br,

Rune


From: Dan McDonald 
Sent: Thursday, December 11, 2014 11:39 PM
To: Rune Tipsmark
Cc: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] hangs on reboot

Nothing printed out on the console?

And try eliminating the target mode first - just in case.

Also, is one of your NICs a dual IPMI/host NIC?  If so, disable the IPMI 
portion, we don't cope with shared NIC ports like that.

Dan

Sent from my iPhone (typos, autocorrect, and all)

On Dec 11, 2014, at 5:25 PM, Rune Tipsmark 
mailto:r...@steait.net>> wrote:


hi all,



I got a bunch (3) installations of omnios on SuperMicro hardware and all 3 have 
issues rebooting. They simply hang and never ever reboot.



The install is latest version and I only added the storage-server package and 
installed napp-it and changed the fibre channel setting in 
/kernel/drv/emlxs.conf target-mode=1



two nics igb0 and igb1 configured as aggregation (aggr0)



besides this its 100% default install, not 10 minutes old...



Any ideas?

br,

Rune

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com<mailto:OmniOS-discuss@lists.omniti.com>
http://lists.omniti.com/mailman/listinfo/omnios-discuss
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] hangs on reboot

2014-12-11 Thread Rune Tipsmark
still same... output can be seen here:

http://i.imgur.com/BuwaGGn.png




From: Dan McDonald 
Sent: Thursday, December 11, 2014 11:39 PM
To: Rune Tipsmark
Cc: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] hangs on reboot

Nothing printed out on the console?

And try eliminating the target mode first - just in case.

Also, is one of your NICs a dual IPMI/host NIC?  If so, disable the IPMI 
portion, we don't cope with shared NIC ports like that.

Dan

Sent from my iPhone (typos, autocorrect, and all)

On Dec 11, 2014, at 5:25 PM, Rune Tipsmark 
mailto:r...@steait.net>> wrote:


hi all,



I got a bunch (3) installations of omnios on SuperMicro hardware and all 3 have 
issues rebooting. They simply hang and never ever reboot.



The install is latest version and I only added the storage-server package and 
installed napp-it and changed the fibre channel setting in 
/kernel/drv/emlxs.conf target-mode=1



two nics igb0 and igb1 configured as aggregation (aggr0)



besides this its 100% default install, not 10 minutes old...



Any ideas?

br,

Rune

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com<mailto:OmniOS-discuss@lists.omniti.com>
http://lists.omniti.com/mailman/listinfo/omnios-discuss
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] hangs on reboot

2014-12-11 Thread Rune Tipsmark
didn't even see this one in the thread, reboot -p did it for me.

I have a system where I had done the below but it still hangs, will try -p on 
it during the weekend.

thx,
br
Rune

From: Paul Henson  on behalf of Paul B. Henson 

Sent: Thursday, December 11, 2014 11:32 PM
To: Rune Tipsmark; omnios-discuss@lists.omniti.com
Subject: RE: [OmniOS-discuss] hangs on reboot

> Rune Tipsmark
> Sent: Thursday, December 11, 2014 2:26 PM
>
> I got a bunch (3) installations of omnios on SuperMicro hardware and all 3
> have issues rebooting. They simply hang and never ever reboot.

Disable fast reboot:

svccfg -s "system/boot-config:default" setprop config/fastreboot_default=false
svccfg -s "system/boot-config:default" setprop config/fastreboot_onpanic=false
svcadm refresh svc:/system/boot-config:default

Some systems (it seems particularly common on supermicro hardware) tend to
wedge up with fast reboot enabled. If you want to verify this is the issue
before changing the configuration, try 'reboot -p' which should do a one
time regular reboot without changing the default.

Dan, there was some talk of making fast reboot disabled the default, and
having people who wanted it need to enable it, rather than the other way
around? Any thoughts?


___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] latency spikes ~every hour

2014-12-14 Thread Rune Tipsmark
hi all,



All my vSphere (ESXi5.1) hosts experience a big spike in latency every hour or 
so.

I tested on Infiniband iSER and SRP and also 4Gbit FC and 8Gbit FC. All exhibit 
the same behavior, so I don't think it's the connection that is causing this.

When I set arc_shrink_shift to 10 (192GB RAM in the system) it helps a bit; the 
spikes still come with the same regularity, but latency peaks at about ~5000ms 
for the most part. If I leave arc_shrink_shift at the default value they can be 
higher - up to 15000ms.
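
For reference, this is roughly how I change it (a sketch - it assumes the stock
illumos arc_shrink_shift variable in the zfs module, so double-check the symbol
on your build before poking at a live kernel):

echo "arc_shrink_shift/D" | mdb -k         # read the current value
echo "arc_shrink_shift/W 0t10" | mdb -kw   # set it to 10 on the fly
# or persistently in /etc/system (needs a reboot):
#   set zfs:arc_shrink_shift = 10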

Looking at the latency as seen from the vSphere hosts the average is usually 
below 1ms for most datastores.



Any ideas what can cause this or what can be done to fix?

br,

Rune
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Fibre Target problems

2014-12-14 Thread Rune Tipsmark
did you ever find a solution?
I have the same problem on a SuperMicro-based system... FC drops and it causes 
Windows to lose the connection, and copying files fails... 
br,
Rune

From: OmniOS-discuss  on behalf of 
Mark 
Sent: Sunday, September 14, 2014 11:30 AM
To: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] Fibre Target problems

On 11/09/2014 9:25 p.m., OSN | Marian Fischer wrote:
> Hi,
>
> do you have Sync=disabled in ZFS / ZPOOL settings?
> If not, this can cause the slow speed ...

No, but I didn't with OI either, and it has no issues achieving 400+
Mbytes/sec and doesn't have link loss issues.

>
> Mit besten Gruessen
>
> Marian Fischer
> --
> OSN Online Service Nuernberg GmbH http://www.osn.de
> Bucher Str. 78Tel: 0911/39905-0
> 90408 Nuernberg   Fax: 0911/39905-55
> HRB 15022 Nuernberg, Ust-Id: DE189301263  GF: Joerg Goltermann
>
> -Ursprüngliche Nachricht-
> Von: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com] Im
> Auftrag von Mark
> Gesendet: Donnerstag, 11. September 2014 09:57
> An: omnios-discuss@lists.omniti.com
> Betreff: [OmniOS-discuss] Fibre Target problems
>
> I have a Supermicro X7DWN+ based server with 24 x 4Tb SAS disks on a LSI
> 6G IT mode hba.
>
> After configuring a bunch of 10Tb Luns and presenting to a Win2012
> server, write is very slow < 6 Mb/sec, and writing causes the Fc link to
> drop repeatedly.
> Reads reach about 400Mb/sec.
>
> OmniOS version is r151006_057.
>
> I've tried two different 4G and an 8G QLogic adapters, via switch or
> direct, but there is no change in behaviour. Only 1 HBA and path.
>
> The odd thing is the exact same hardware and os setup (fct, zpool etc.)
> works well with OI or Solaris 11/11, with writes getting around 4-500
> Mb/sec.
> I had to start with OI147 and work up, as the later text installer is
> very buggy. Upgrading OI with qlt installed fails.
>
> I'm at a bit of a loss as to a likely cause.
>
> Anyone have any pointers ?
>
>
> Some details and log:
>
> HBA Port WWN: 211b320aba27
>   Port Mode: Target
>   Port ID: 10100
>   OS Device Name: Not Applicable
>   Manufacturer: QLogic Corp.
>   Model: QLE2460
>   Firmware Version: 5.2.1
>   FCode/BIOS Version: N/A
>   Serial Number: not available
>   Driver Name: COMSTAR QLT
>   Driver Version: 20100505-1.05
>   Type: F-port
>   State: online
>   Supported Speeds: 1Gb 2Gb 4Gb
>   Current Speed: 4Gb
>   Node WWN: 201b320aba27
>   Link Error Statistics:
>   Link Failure Count: 0
>   Loss of Sync Count: 0
>   Loss of Signal Count: 0
>   Primitive Seq Protocol Error Count: 0
>   Invalid Tx Word Count: 4
>   Invalid CRC Count: 0
>
>
>
> prtconf
>
> value='ISP2432-based 4Gb Fibre Channel to PCI Express HBA'
> name='subsystem-name' type=string items=1
> value='unknown subsystem'
> Device Minor Nodes:dev=(3,1)
> dev_path=/pci@0,0/pci8086,4027@7/pci8086,3500@0/pci8086,3510@0/pci1077,137@0
> :qlt1
>
> echo "*stmf_trace_buf/s" | mdb -k
>
> 0xff090f7c: qlt1,0:0001318: iport is ff090f1921b8
>
> :0003718: Imported the LU 600144f09cdd922453f56a5c0005
> :0003719: Imported the LU 600144f09cdd922453f56a5d0006
> :0003721: Imported the LU 600144f09cdd922453f56a5d0007
> :0003722: Imported the LU 600144f09cdd922453f56a5e0008
> :0003725: Imported the LU 600144f09cdd922453f56a5e0009
> :0003726: Imported the LU 600144f09cdd922453f56a5f000a
> :0003728: Imported the LU 600144f09cdd922453f56a5f000b
> qlt1,0:0003815: Async event 8010 mb1=f8e8 mb2=c108, mb3=0, mb5=3362, mb6=0
> qlt1,0:0003815: Async event 8011 mb1=3 mb2=c108, mb3=0, mb5=3362, mb6=0
> qlt1,0:0003815: port state change from 0 to e
> qlt1,0:0003815: Async event 8014 mb1= mb2=6, mb3=0, mb5=3362, mb6=0
> qlt1,0:0003815: Posting unsol ELS 3 (PLOGI) rp_id=e8 lp_id=ef
> qlt1,0:0003815: Rcvd PLOGI with wrong lportid ef, expecting 0. Killing ELS.
> qlt1,0:0003815: port state change from e to 4
> qlt1,0:0004216: Posting unsol ELS 3 (PLOGI) rp_id=e8 lp_id=ef
> qlt1,0:0004216: Processing unsol ELS 3 (PLOGI) rp_id=e8
> qlt1,0:0004216: Posting unsol ELS 20 (PRLI) rp_id=e8 lp_id=ef
> qlt1,0:0004216: Processing unsol ELS 20 (PRLI) rp_id=e8
> qlt1,0:7247736: Posting unsol ELS 5 (LOGO) rp_id=e8 lp_id=ef
> qlt1,0:7247736: Processing unsol ELS 5 (LOGO) rp_id=e8
> qlt1,0:7247736: handling LOGO rp_id e8. Triggering cleanup
> :7248836: fct_port_shutdown: port-ff090f1920b8, fct_process_logo:
> unable to clean up I/O. iport-ff090f1921b8, icmd-ff0931089a00
> qlt1,0:7248836: port state change from 4 to 11
> :7248836: fct_port_shutdown: port-ff090f1920b8, fct_process_logo:
> unable to clean up I/O. iport-ff090f1921b8, icmd-

Re: [OmniOS-discuss] latency spikes ~every hour

2014-12-15 Thread Rune Tipsmark
ok I removed some of my SLOG devices and currently I am only using a single 
SLOG (no mirror or anything), and no spikes have been seen since.

I wonder why multiple SLOG devices would cause this.

br.

Rune


From: OmniOS-discuss  on behalf of 
Rune Tipsmark 
Sent: Sunday, December 14, 2014 2:27 PM
To: omnios-discuss@lists.omniti.com
Subject: [OmniOS-discuss] latency spikes ~every hour


hi all,



All my vSphere (ESXi5.1) hosts experience a big spike in latency every hour or 
so.

I tested on Infiniband iSER and SRP and also 4Gbit FC and 8GBit FC. All exhibit 
the same behavior so I don't think its the connection that is causing this.

When I modify the arc_shrink_shift 10 (192GB Ram in the System) it helps a bit, 
the spikes are still with the same regularity but latency peaks at about 
~5000ms for the most part. if I leave arc_shrink_shift at the default value 
they can be higher - up to 15000ms.

Looking at the latency as seen from the vSphere hosts the average is usually 
below 1ms for most datastores.



Any ideas what can cause this or what can be done to fix?

br,

Rune
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Fibre Target problems

2014-12-15 Thread Rune Tipsmark
where do you check that?
br,
Rune

From: Mark 
Sent: Monday, December 15, 2014 7:19 AM
To: Rune Tipsmark; omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] Fibre Target problems

On 15/12/2014 4:44 a.m., Rune Tipsmark wrote:
> did you ever find a solution?
> I have the same problem on a SuperMicro based system... FC drops and it 
> causes Windows to loose connection and copying files fails...
> br,
> Rune

I installed an older OpenIndiana version to work around the issue.

One thing I didn't check was if the fc target cache was disabled (may be
the current default), so may be worth checking that.

Mark.

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] dedup causes zfs/omnios to drop connections.

2014-12-15 Thread Rune Tipsmark
hi all,



got a new system I was intending on using as backup repository. Whenever dedup 
is enabled it dies after anywhere between 5 and 30 minutes. I need to reboot 
OmniOS to get it back online.

The files being copied onto the ZFS volumes are rather large, about ~2TB each... 
if I copy smaller files, say 400GB or so, it takes longer for it to crash.



What can be done to fix this? After Windows (the initiator) loses the connection 
(both Fibre Channel and iSCSI) I still see a lot of disk activity in iostat - the 
disks remain active for minutes after the copy has died... it's like ZFS cannot 
handle dedup of large files.

br,

Rune
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] dedup causes zfs/omnios to drop connections.

2014-12-15 Thread Rune Tipsmark
We have only 24GB of RAM on this system...
I was under the impression it would not require much when the block size is 
larger; we are on 64KB, so I would expect around 2GB per 1TB,
yet we cannot even get more than a few TB onto the system before it dies.

The main purpose of this system was dedup, and slow speed is no problem... it's 
for backup only, so I could not care less if it's rather slow - but not responding 
is not acceptable.
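
For context, a rough back-of-the-envelope DDT calculation (assuming the commonly
quoted ~320 bytes of core per unique block - a rule of thumb, not an exact figure):

echo $(( (1024*1024*1024*1024) / (64*1024) ))    # 16777216 blocks per TiB at 64K
echo $(( 16777216 * 320 / 1024 / 1024 / 1024 ))  # ~5 GiB of DDT per TiB of unique data

At roughly 5GB of dedup table per TB, 24GB of RAM would be exhausted after only a
few TB, which is consistent with what we are seeing.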
br,
Rune

From: OmniOS-discuss  on behalf of 
Dominik Hassler 
Sent: Monday, December 15, 2014 10:11 PM
To: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] dedup causes zfs/omnios to drop connections.

Hi,

we used dedup on a production machine w/ 256 GB RAM, but disabled it
after a couple of days due to huge performance impact.

I would not recommend to use dedup even when having "enough" RAM.

On 12/15/2014 09:53 PM, Dan McDonald wrote:
>
>> On Dec 15, 2014, at 3:43 PM, Rune Tipsmark  wrote:
>>
>> hi all,
>>
>> got a new system I was intending on using as backup repository. Whenever 
>> dedup is enabled it dies after anywhere between 5 and 30 minutes. I need to 
>> reboot OmniOS to get it back online.
>> the files being copied onto the zfs vols are rather large, about ~2TB 
>> each... if I copy smaller files, say 400GB or so, it takes longer for it to 
>> crash.
>>
>> what can be done to fix this? after Windows (initiator) looses the 
>> connection (both Fibre Channel and iSCSI) I still see a lot of disk activity 
>> using iostat - disks remain active for minutes after the copying has died... 
>> its like ZFS cannot handle dedup of large files..
>
> Dedup is a memory pig and not very well implemented in ZFS.  I'd highly 
> recommend against it in production.  Either that, or really increase your 
> memory for your system in question.  There was some work going on at Nexenta 
> to perhaps put the dedup tables (DDTs) onto a dedicated slog-like device, but 
> I believe that work stalled.
>
> Sorry,
> Dan
>
>
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] mount/create volume lu from snapshot

2014-12-22 Thread Rune Tipsmark
hi all,



I have two omnios boxes and zfs replication going between the two every 30 min.



I am replicating a volume lu pool01/vol01 from hostA to hostB



how can I mount this or create a volume lu out of it on my destination box?



br,

Rune
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] mount/create volume lu from snapshot

2014-12-22 Thread Rune Tipsmark
hi Dan,

tried that, get data file error.

I tried zfs list and found my snapshot

pool01/PHTVOL01@1419269859_repli_zfs_zfs20_nr_4

then stmfadm create-lu pool01/PHTVOL01@1419269859_repli_zfs_zfs20_nr_4

doesn't work...  also tried pool01/PHTVOL01, no luck either.

which file do I need to use?
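
For the archive, the usual approach here (a sketch - it assumes PHTVOL01 is a 
zvol, and the clone name is made up) is to clone the snapshot and point stmfadm 
at the zvol device node rather than the dataset name:

zfs clone pool01/PHTVOL01@1419269859_repli_zfs_zfs20_nr_4 pool01/PHTVOL01_restore
stmfadm create-lu /dev/zvol/rdsk/pool01/PHTVOL01_restore
stmfadm add-view <GUID printed by create-lu>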

br,
Rune


From: Dan McDonald 
Sent: Monday, December 22, 2014 8:07 PM
To: Rune Tipsmark
Cc: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] mount/create volume lu from snapshot

> On Dec 22, 2014, at 1:59 PM, Rune Tipsmark  wrote:
>
> hi all,
>
> I have two omnios boxes and zfs replication going between the two every 30 
> min.
>
> I am replicating a volume lu pool01/vol01 from hostA to hostB
>
> how can I mount this or create a volume lu out of it on my destination box?
>

"stmfadm create-lu" does this for a given zvol.

Read the stmfadm(1M) man page for more details.

Dan

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] zfs destroy takes forever, how do I set async_destroy?

2014-12-29 Thread Rune Tipsmark
hi all,

as stated above, I have a server that I am syncing some 70 or so TB from, and it 
has some very large snapshots; a destroy takes forever... it has run for 
hours and hours now, 100% disk busy... I read there is a feature flag 
async_destroy but I don't seem to be able to find it.
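
A couple of commands that may help confirm what is going on (a sketch; the pool
name is an example):

zpool get feature@async_destroy pool01   # is the feature enabled or active?
zpool get freeing pool01                 # space still queued for async freeing
zpool iostat -v pool01 5                 # watch the destroy churn through the vdevs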



Any ideas? the box is more or less useless and I have more rather large volumes 
I need to sync to a remote device.



br,

Rune
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] zfs destroy takes forever, how do I set async_destroy?

2014-12-30 Thread Rune Tipsmark
I found out the feature is already enabled; I guess destroying very large 
snapshots just takes a very long time regardless...

br,

Rune


From: OmniOS-discuss  on behalf of 
Rune Tipsmark 
Sent: Monday, December 29, 2014 10:59 PM
To: omnios-discuss
Subject: [OmniOS-discuss] zfs destroy takes forever, how do I set async_destroy?


hi all,

as stated above, I have a server where I am syncing some 70 or so TB from and 
it has some very large snapshots and a destroy takes forever... has run for 
hours and hours now, 100% disk busy...I read there was a feature flag 
async_destroy but I don't seem to be able to find it.



Any ideas? the box is more or less useless and I have more rather large volumes 
I need to sync to a remote device.



br,

Rune
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] offline dedup

2015-01-05 Thread Rune Tipsmark
hi all,

does anyone know if offline dedup is something we can expect in the future of 
ZFS?

I have some backup boxes with 50+TB on them and only 32GB of RAM, and even zdb -S 
crashes due to lack of memory. It seems like complete overkill to put 256+GB of RAM 
in a slow backup box... and if I enable dedup as-is, it will crash after writing a 
few TB - reboot required.

br,

Rune
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] ZFS Volumes and vSphere Disks - Storage vMotion Speed

2015-01-19 Thread Rune Tipsmark
hi all,



just in case there are other people out there using their ZFS box against 
vSphere 5.1 or later... I found my storage vMotions were slow... really slow... 
not much info is available, so after a while of trial and error I found a nice 
combo that works very well in terms of performance, latency, throughput and 
storage vMotion.



- Use ZFS volumes instead of thin provisioned LUs - volumes support two of the 
VAAI features (see the sketch after this list).

- Use thick provisioned, lazy zeroed disks in VMware; in my case this reduced 
storage vMotion time by 90% or so - machine 1 dropped from 8½ minutes to 23 
seconds and machine 2 dropped from ~7 minutes to 54 seconds... a rather nice 
improvement simply by changing from thin to thick provisioning.

- I dropped my QLogic HBA max queue depth from the default 64 to 16 on all ESXi 
hosts and now I see an average latency of less than 1ms per datastore (on 8G 
Fibre Channel). Of course there are spikes when doing storage vMotion at these 
speeds, but it's well worth it.
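
For completeness, this is roughly how I create them (a sketch - the volume name,
size and block size are just examples):

zfs create -V 2T -b 64k -o sync=always pool01/vsphere-vol01
stmfadm create-lu /dev/zvol/rdsk/pool01/vsphere-vol01
stmfadm add-view <GUID printed by create-lu>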



I am getting to the point where I am almost happy with my ZFS backend for 
vSphere.



br,

Rune
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] ZFS Volumes and vSphere Disks - Storage vMotion Speed

2015-01-19 Thread Rune Tipsmark

From: Richard Elling 
Sent: Monday, January 19, 2015 1:57 PM
To: Rune Tipsmark
Cc: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] ZFS Volumes and vSphere Disks - Storage vMotion 
Speed


On Jan 19, 2015, at 3:55 AM, Rune Tipsmark 
mailto:r...@steait.net>> wrote:


hi all,



just in case there are other people out there using their ZFS box against 
vSphere 5.1 or later... I found my storage vmotion were slow... really slow... 
not much info available and so after a while of trial and error I found a nice 
combo that works very well in terms of performance, latency as well as 
throughput and storage vMotion.



- Use ZFS volumes instead of thin provisioned LU's - Volumes support two of the 
VAAI features


AFAIK, ZFS is not available in VMware. Do you mean run iSCSI to connect the ESX 
box to
the server running ZFS? If so...
>> I run 8G Fibre Channel

- Use thick provisioning disks, lazy zeroed disks in my case reduced storage 
vMotion by 90% or so - machine 1 dropped from 8½ minutes to 23 seconds and 
machine 2 dropped from ~7 minutes to 54 seconds... a rather nice improvement 
simply by changing from thin to thick provisioning.


This makes no difference in ZFS. The "thick provisioned" volume is simply a 
volume with a reservation.
All allocations are copy-on-write. So the only difference between a "thick" and 
"thin" volume occurs when
you run out of space in the pool.
>> I am talking thick provisioning in VMware, that's where it makes a huge 
>> difference

- I dropped my Qlogic HBA max queue depth from default 64 to 16 on all ESXi 
hosts and now I see an average latency of less than 1ms per data store (on 8G 
fibre channel).  Of course there are spikes when doing storage vMotion at these 
speeds but its well worth it.


I usually see storage vmotion running at wire speed for well configured 
systems. When you get
into the 2GByte/sec range this can get tricky, because maintaining that flow 
through the RAM
and disks requires nontrivial amounts of hardware.
>> I don't even get close to wire speed; unfortunately my SLOGs can only do 
>> around 500-600 MByte/sec with sync=always.
More likely, you're seeing the effects of caching, which is very useful for 
storage vmotion and
allows you to hit line rate.

>> Not sure this is the case with using sync=always?


I am getting to the point where I am almost happy with my ZFS backend for 
vSphere.


excellent!
 -- richard


___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] VAAI Testing

2015-01-20 Thread Rune Tipsmark
I would be able to help test whether it's stable in my environment as well. I can't 
program though.

br,

Rune


From: OmniOS-discuss  on behalf of W 
Verb 
Sent: Tuesday, January 20, 2015 3:59 AM
To: omnios-discuss@lists.omniti.com
Subject: [OmniOS-discuss] VAAI Testing

Hi All,

After seeing the recent message regarding ZFS, iSCSI, zvols and ESXi, I decided 
to follow up on where full VAAI support is.

I found Dan's message from August: 
http://lists.omniti.com/pipermail/omnios-discuss/2014-August/002957.html

Is anyone working on his points 1 and 2?

Is anyone keeping track of the testing offers for #3?

I do a fair amount of SQA, and am willing to organize and write tests if 
needed. I also have a reasonable lab environment with which to test the code.

-Warren V
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] iostat skip first output

2015-01-24 Thread Rune Tipsmark
hi all, I am just writing some scripts to gather performance data from 
iostat... or at least trying... I would like to completely skip the first 
output from iostat (the since-boot summary) and get right to the interval I 
specified, with the data current for that interval. Is this possible at all?



br,

Rune
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] iostat skip first output

2015-01-24 Thread Rune Tipsmark
nevermind, I just made it into tokens and counted my way through it... maybe not 
the best way but it works...



root@zfs10:/usr/lib/check_mk_agent/local# cat disk_iostat.sh
# check_mk local check: per-device busy% and latency from iostat
varInterval=5
# take two samples; the second half of the output is the current interval
varOutput=$(iostat -xn $varInterval 2 | grep 'c[0-9]')
tokens=( $varOutput )
tokenCount=$(echo ${tokens[*]} | wc -w)
tokenStart=$(((tokenCount/2)-1))   # skip the first (since-boot) sample
tokenInterval=11                   # iostat -xn prints 11 columns per device line
tokenEnd=$((tokenCount))
for i in $(eval echo {$tokenStart..$tokenEnd..$tokenInterval}); do
  echo 0 disk_busy_${tokens[$i]} percent=${tokens[$i-1]} ${tokens[$i-1]} % average disk utilization last $varInterval seconds
  echo 0 disk_latency_${tokens[$i]} ms=${tokens[$i-3]} ${tokens[$i-3]} ms response time average last $varInterval seconds
done


From: OmniOS-discuss  on behalf of 
Rune Tipsmark 
Sent: Saturday, January 24, 2015 6:25 PM
To: omnios-discuss@lists.omniti.com
Subject: [OmniOS-discuss] iostat skip first output


hi all, I am just writing some scripts to gather performance data from 
iostat... or at least trying... I would like to completely skip the first 
output since boot from iostat output and just get right to the period I 
specified with the data current from that period. Is this possible at all?



br,

Rune
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] iostat skip first output

2015-01-24 Thread Rune Tipsmark
hi Richard,

thanks for that input, will see what I can do with it.

I do store data and graph it so I can keep track of things :)

br,

Rune


From: Richard Elling 
Sent: Sunday, January 25, 2015 1:02 AM
To: Rune Tipsmark
Cc: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] iostat skip first output


On Jan 24, 2015, at 9:25 AM, Rune Tipsmark 
mailto:r...@steait.net>> wrote:


hi all, I am just writing some scripts to gather performance data from 
iostat... or at least trying... I would like to completely skip the first 
output since boot from iostat output and just get right to the period I 
specified with the data current from that period. Is this possible at all?


iostat -xn 10 2 | awk '$1 == "extended" && NR > 2 {show=1} show == 1'

NB, this is just a derivative of a sample period. A better approach is to store
long-term trends in a database intended for such use. If that is too much work,
then you should consider storing the raw data that iostat uses for this:
kstat -p 'sd::/sd[0-9]+$/'

or in JSON:
kstat -jp 'sd::/sd[0-9]+$/'

insert shameless plug for Circonus here :-)
 -- richard

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] Windows crashes my ZFS box

2015-02-01 Thread Rune Tipsmark
hi all,



I've got some major problems... when using Windows and Fibre Channel I am able to 
kill my ZFS box completely for at least 15 minutes... it simply drops all 
connections to all hosts connected via FC. This happens under load, for example 
when backups are written to the ZFS box, or when running IOMeter against it...



I have looked at queue depth/length and other things; I just cannot seem to find 
out why this happens... I have tested on 3 different Windows machines and 3 
different ZFS boxes - I have ESXi servers connected to these ZFS boxes as well 
and there is no problem there no matter how much load I put on the LUNs.



I tried sync=always, sync=disabled, with and without log devices, dedup=on, 
dedup=off, volume based lun, thin provisioned lun, zfs based thin provisioned 
lun... you name it, I tried it...



I tried all kinds of queue lengths from Windows, the default being 65535 - tried 
16, 32, 64, 256 etc... if I put enough stress on the LUN presented to Windows it 
will cause the ZFS box to drop all FC connections for up to 15 minutes... a reboot 
is not possible as it will hang for about the same number of minutes... might as 
well wait for it to come back...



Latest FW on all items, HBA, Switch etc. Monitoring shows a distributed load on 
the ports as expected using Round Robin and MPIO.



One thing that irritates me is that I don't get more than ~80-120 MB/sec 
(sync=always) throughput when writing to this LUN from Windows, whereas I get 
600-700 MB/sec (sync=always) when writing from a VM on ESXi... The abysmal 
performance is a pain, but the fact that I can downright crash or hang my ZFS box 
just by running IOMeter is disturbing...



Any ideas why this might happen? It seems to me like a queue problem but I can't 
really get any closer than that... maybe Windows is just crappy at handling 
Fibre Channel... however there are no problems against HP EVA storage with the 
same machine and the same tests.
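
Next time it wedges I will try to grab some state from the target side while it
is hung (a sketch; the trace-buffer trick appears elsewhere on this list):

echo "*stmf_trace_buf/s" | mdb -k   # COMSTAR/qlt trace: logins, LOGOs, aborted I/O cleanup
fcinfo hba-port -l                  # link state and error counters per FC port
stmfadm list-target -v              # which initiators are still logged in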



br,

Rune




___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] ZFS Slog - force all writes to go to Slog

2015-02-18 Thread Rune Tipsmark
hi all,



I found an entry about zil_slog_limit here: 
http://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSWritesAndZILII

it basically explains how writes larger than 1MB by default hit the main pool 
rather than my slog device - I could not find much further information, nor the 
equivalent setting in OmniOS. I also read 
http://nex7.blogspot.ca/2013/04/zfs-intent-log.html but it didn't truly help me 
understand how I can force every byte written to my ZFS box to go through the ZIL 
regardless of size; I never ever want anything to go directly to my main pool.



I have sync=always and disabled write back cache on my volume based LU's.
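
Concretely that is (a sketch - the LU GUID below is a placeholder):

zfs set sync=always pool01/vol01
stmfadm modify-lu -p wcd=true 600144f0XXXXXXXXXXXXXXXXXXXXXXXX   # write-back cache disabled
stmfadm list-lu -v | grep -i writeback                           # verify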



Testing with zfs_txg_timeout set to 30 or 60 seconds seems to make no 
difference when I write large files to my LUs - I don't see the write speed 
being consistent with the performance of the slog devices. It looks as if the 
writes go straight to disk, and hence the performance is less than great, to say 
the least.



How do I ensure 100% that all writes always goes to my Slog devices - no 
exceptions.



br,

Rune
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] ZFS Slog - force all writes to go to Slog

2015-02-18 Thread Rune Tipsmark

From: Richard Elling 
Sent: Thursday, February 19, 2015 1:27 AM
To: Rune Tipsmark
Cc: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] ZFS Slog - force all writes to go to Slog


On Feb 18, 2015, at 12:04 PM, Rune Tipsmark 
mailto:r...@steait.net>> wrote:


hi all,



I found an entry about zil_slog_limit here: 
http://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSWritesAndZILII

it basically explains how writes larger than 1MB per default hits the main pool 
rather than my Slog device - I could not find much further information nor the 
equivalent setting in OmniOS. I also read 
http://nex7.blogspot.ca/2013/04/zfs-intent-log.html but it didn't truly help me 
understand just how I can force every written byte to my ZFS box to go the ZIL 
regardless of size, I never ever want anything to go directly to my man pool 
ever.


"never ever want anything to go to main pool" is not feasible. The ZIL is a ZFS 
Intent Log
http://en.wikipedia.org/wiki/Intent_log and, unless you overwrite prior to txg 
commit, everything
ends up in the main pool.

>> yeah I know, I meant everything needs to go thru the ZIL before hitting the 
>> main pool...

I have sync=always and disabled write back cache on my volume based LU's.



Testing with zfs_txg_timeout set to 30 or 60 seconds seems to make no 
difference if I write large files to my LU's - I don't seem the write speed 
being consistent with the performance of the Slog devices. It looks as if it 
goes straight to disk and hence the performance is less than great to say the 
least.


Ultimately, the pool must be able to sustain the workload, or it will have to 
throttle.

>> it should be OK to take in some hundreds MB/sec (11 SAS mirrors each can do 
>> ~150MB/Sec sequential)
The comment for zil_slog_limit is concise:
/*
 * Use the slog as long as the logbias is 'latency' and the current commit size
 * doesn't exceed the limit or the total list size doesn't exceed its limit.
 * Limit checking is disabled by setting zil_slog_limit to UINT64_MAX.
 */
uint64_t zil_slog_limit = (1024 * 1024);
uint64_t zil_slog_list_limit = (1024 * 1024 * 200);

and you can change this on the fly using mdb to experiment.

>> how do I do this? I could not find any property matching
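>> (noting for the archive - a sketch of the mdb approach, using the variable
>> names from the snippet above; double-check them against your build before
>> writing to a live kernel)
echo "zil_slog_limit/J" | mdb -k                        # read the current value
echo "zil_slog_limit/Z 0xffffffffffffffff" | mdb -kw    # UINT64_MAX = disable the limit
echo "zil_slog_list_limit/Z 0xffffffffffffffff" | mdb -kw
# or persistently in /etc/system:  set zfs:zil_slog_limit = 0xffffffffffffffff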



How do I ensure 100% that all writes always goes to my Slog devices - no 
exceptions.


The question really isn't "how" the question is "why"? Now that you know what an
Intent Log is, and how the performance of the pool is your ultimate limit, 
perhaps you
can explain what you are really trying to accomplish?

>> consistent fast write speeds at all times rather than yo-yo write speeds... I 
>> get fine benchmarks but rather less fine file copy performance... and I 
>> never see the disks being particularly busy during file copy tests
 -- richard

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?

2015-03-05 Thread Rune Tipsmark
Same problem here… have noticed I can cause this easily by using Windows as 
initiator… I cannot cause this using VMware as initiator…
No idea how to fix, but a big problem.
Br,
Rune


From: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com] On Behalf 
Of Nate Smith
Sent: Thursday, March 05, 2015 6:01 AM
To: omnios-discuss@lists.omniti.com
Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?

I’ve had this problem for a while, and I have no way to diagnose what is going 
on, but occasionally when system IO gets high (I’ve seen it happen especially 
on backups), I will lose connectivity with my Fibre Channel cards which serve 
up fibre channel LUNS to a VM cluster. All hell breaks loose, and then 
connectivity gets restored. I don’t get an error that it’s dropped, at least 
not on the Omnios system, but I get notice when it’s restored (which makes no 
sense). I’m wondering if the cards are just overheating, and if heat sinks with 
a fan would help on the io chip.

Mar  5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, 
portid 2, topology Fabric Pt-to-Pt,speed 8G
Mar  5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, 
portid 20100, topology Fabric Pt-to-Pt,speed 8G
Mar  5 02:00:13 newstorm last message repeated 1 time
Mar  5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, 
portid 1, topology Fabric Pt-to-Pt,speed 8G
Mar  5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, 
portid 2, topology Fabric Pt-to-Pt,speed 8G
Mar  5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, 
portid 10100, topology Fabric Pt-to-Pt,speed 8G
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?

2015-03-05 Thread Rune Tipsmark
Haven’t tried iSCSI but had similar issues with Infiniband… more frequent due 
to higher io load, but no console error messages.

This only happened on my SuperMicro server and never on my HP server… what 
brand are you running?

Br,
Rune


From: Nate Smith [mailto:nsm...@careyweb.com]
Sent: Thursday, March 05, 2015 8:10 AM
To: Rune Tipsmark; omnios-discuss@lists.omniti.com
Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?

Do you see the same problem with Windows and iSCSI as an initiator? I wish 
there was a way to turn up debugging to figure this out.

From: Rune Tipsmark [mailto:r...@steait.net]
Sent: Thursday, March 05, 2015 11:08 AM
To: 'Nate Smith'; 
omnios-discuss@lists.omniti.com<mailto:omnios-discuss@lists.omniti.com>
Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?

Same problem here… have noticed I can cause this easily by using Windows as 
initiator… I cannot cause this using VMware as initiator…
No idea how to fix, but a big problem.
Br,
Rune


From: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com] On Behalf 
Of Nate Smith
Sent: Thursday, March 05, 2015 6:01 AM
To: omnios-discuss@lists.omniti.com<mailto:omnios-discuss@lists.omniti.com>
Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?

I’ve had this problem for a while, and I have no way to diagnose what is going 
on, but occasionally when system IO gets high (I’ve seen it happen especially 
on backups), I will lose connectivity with my Fibre Channel cards which serve 
up fibre channel LUNS to a VM cluster. All hell breaks loose, and then 
connectivity gets restored. I don’t get an error that it’s dropped, at least 
not on the Omnios system, but I get notice when it’s restored (which makes no 
sense). I’m wondering if the cards are just overheating, and if heat sinks with 
a fan would help on the io chip.

Mar  5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, 
portid 2, topology Fabric Pt-to-Pt,speed 8G
Mar  5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, 
portid 20100, topology Fabric Pt-to-Pt,speed 8G
Mar  5 02:00:13 newstorm last message repeated 1 time
Mar  5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, 
portid 1, topology Fabric Pt-to-Pt,speed 8G
Mar  5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, 
portid 2, topology Fabric Pt-to-Pt,speed 8G
Mar  5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, 
portid 10100, topology Fabric Pt-to-Pt,speed 8G
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?

2015-03-05 Thread Rune Tipsmark
Pls see below >>


-Original Message-
From: Johan Kragsterman [mailto:johan.kragster...@capvert.se] 
Sent: Thursday, March 05, 2015 9:00 AM
To: Rune Tipsmark
Cc: 'Nate Smith'; omnios-discuss@lists.omniti.com
Subject: Ang: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?

Hi!

-"OmniOS-discuss"  skrev: -
Till: "'Nate Smith'" , "omnios-discuss@lists.omniti.com" 

Från: Rune Tipsmark 
Sänt av: "OmniOS-discuss" 
Datum: 2015-03-05 17:15
Ärende: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?

Haven’t tried iSCSI but had similar issues with Infiniband… more 
frequent due to higher io load, but no console error messages.

 

This only happened on my SuperMicro server and never on my HP server… 
what brand are you running?



This is interesting, only on Supermicro, and never on HP? I'd like to know some 
more details here...

First, when you say "server", do you mean the SAN head? Not the hosts?
>> SAN Head yes

Second: Can you specify the exakt model of the Supermicro and the HP?
>>SuperMicro X9DRE-TF+ and the HP is actually an HP P4500 SAN I removed the 
>>shitty HP software and controller from and replaced with  an LSI 9207 and 
>>installed OmniOS on. I have tested on other HP and SM servers too, all 
>>exhibit the same behavior (3 SM and 2 HP tested)

Third: Did you pay attention to bios settings on the two different servers? 
Like C-states, and other settings...how about IRQ settings? And how about the 
physical PCIe buses the HBA's are sitting on? This is often causing problems, 
if you don't know the layout of the PCIe-buses.
>> both are set to max performance in BIOS and afaik all C-states are disabled. 
>> I am 100% sure on HP and 99.99% sure on SM. I didn't touch the IRQ settings 
>> but I actually have two 8Gbit FC Cards in the SM server and both exhibit the 
>> problem. I have tried to swap things around too with no luck. I do use every 
>> available PCI-E slot though.. L2ARC, SLOG etc.

Fourth: When you say you can cause it with windows as initiator, do you mean 
windows on hardware, and not windows as a VM? And when you say you can NOT 
cause it on VmWare, you mean you can run a windows VM on VmWare with direct LUN 
access without problems? And is this true for both hardwares, HP and Supermicro?
>>Windows on hardware yes, all I have to do is zone a block device over to 
>>Windows and copy say 100 4GB ISO Files onto it or a 200GB backup file... no 
>>doubt it will cause this issue... when I say I cannot cause this on VMware 
>>then I mean any VM hosted on my ESXi host (5.1 or 5.5) and a guest operating 
>>system without raw device mapping - I have not tested if I can cause this 
>>using RDM. 
It is also true for both HP and SM - both behave just fine using VMware and 
FibreChannel - however VMware can cause issues with Infiniband on the SM but I 
think that's a different issue and has to do with Mellanox and their terrible 
drivers that are never ever updated and half-beta etc.

Since it appears on one hardware and not another, it is difficult to blame any 
specific sofware, but we just had a discussion here about iScsi/comstar, where 
Garrret suspected comstar to handle certain things bad. I don't know wether 
that has anything to do with this.
>> I think it could be a mix of both; it would be interesting to see if something 
>> in COMSTAR could be fixed...

Bacically:

ESX+FC+SM = problem
ESX+FC+HP = no problem
Win+FC+SM = problem
Win+FC+HP = not tested
ESX+IB+SM = problem
ESX+IB+HP = no problem
Win+IB+SM = not tested, SRP not supported in Win
Win+IB+HP = not tested, SRP not supported in Win

Anyway it all lead me to some information on 8Gbit FC - in particular 
portCfgFillWord
Maybe this can affect some of this... google will reveal a great bit of info 
and also found some links.. 
http://virtualkenneth.com/2012/05/21/random-esxi-lun-loss-in-a-8gb-fc-brocade-infrastructure/

What about tuning the emlxs.conf? can anything be done there to get better 
performance?

Br,
Rune


Rgrds Johan





Br,

Rune

 

 

From: Nate Smith [mailto:nsm...@careyweb.com] 
Sent: Thursday, March 05, 2015 8:10 AM
To: Rune Tipsmark; omnios-discuss@lists.omniti.com
Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?

 

Do you see the same problem with Windows and iSCSI as an initiator? I wish 
there was a way to turn up debugging to figure this out.

 

From: Rune Tipsmark [mailto:r...@steait.net] 
Sent: Thursday, March 05, 2015 11:08 AM
To: 'Nate Smith'; omnios-discuss@lists.omniti.com
Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?

 

Same problem here… have noticed I can cause this easily by using Windows 
as initiator… I cannot cause this using VMware as initiator…

No idea how to fix, but a big problem.

Br,

Rune

 

 

From: 

Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?

2015-03-05 Thread Rune Tipsmark
They are QLogic QMH2562 across the board... I just figured emlxs.conf had 
something to do with it since I had to edit it to get COMSTAR into target mode.
Br,
Rune


-Original Message-
From: Johan Kragsterman [mailto:johan.kragster...@capvert.se] 
Sent: Thursday, March 05, 2015 11:07 AM
To: Rune Tipsmark
Cc: 'Nate Smith'; omnios-discuss@lists.omniti.com
Subject: Ang: RE: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?


Hi!




-Rune Tipsmark  skrev: -
Till: 'Johan Kragsterman' 
Från: Rune Tipsmark 
Datum: 2015-03-05 19:38
Kopia: 'Nate Smith' , "omnios-discuss@lists.omniti.com" 

Ärende: RE: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?

Pls see below >>
: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?

Haven’t tried iSCSI but had similar issues with Infiniband… more 
frequent due to higher io load, but no console error messages.

 

This only happened on my SuperMicro server and never on my HP server… 
what brand are you running?



This is interesting, only on Supermicro, and never on HP? I'd like to know some 
more details here...

First, when you say "server", do you mean the SAN head? Not the hosts?
>> SAN Head yes

Second: Can you specify the exakt model of the Supermicro and the HP?
>>SuperMicro X9DRE-TF+ and the HP is actually an HP P4500 SAN I removed 
>>the shitty HP software and controller from and replaced with  an LSI 
>>9207 and installed OmniOS on. I have tested on other HP and SM servers 
>>too, all exhibit the same behavior (3 SM and 2 HP tested)

Third: Did you pay attention to bios settings on the two different servers? 
Like C-states, and other settings...how about IRQ settings? And how about the 
physical PCIe buses the HBA's are sitting on? This is often causing problems, 
if you don't know the layout of the PCIe-buses.
>> both are set to max performance in BIOS and afaik all C-states are disabled. 
>> I am 100% sure on HP and 99.99% sure on SM. I didn't touch the IRQ settings 
>> but I actually have two 8Gbit FC Cards in the SM server and both exhibit the 
>> problem. I have tried to swap things around too with no luck. I do use every 
>> available PCI-E slot though.. L2ARC, SLOG etc.

Fourth: When you say you can cause it with windows as initiator, do you mean 
windows on hardware, and not windows as a VM? And when you say you can NOT 
cause it on VmWare, you mean you can run a windows VM on VmWare with direct LUN 
access without problems? And is this true for both hardwares, HP and Supermicro?
>>Windows on hardware yes, all I have to do is zone a block device over to 
>>Windows and copy say 100 4GB ISO Files onto it or a 200GB backup file... no 
>>doubt it will cause this issue... when I say I cannot cause this on VMware 
>>then I mean any VM hosted on my ESXi host (5.1 or 5.5) and a guest operating 
>>system without raw device mapping - I have not tested if I can cause this 
>>using RDM. 
It is also true for both HP and SM - both behave just fine using VMware and 
FibreChannel - however VMware can cause issues with Infiniband on the SM but I 
think that's a different issue and has to do with Mellanox and their terrible 
drivers that are never ever updated and half-beta etc.

Since it appears on one hardware and not another, it is difficult to blame any 
specific sofware, but we just had a discussion here about iScsi/comstar, where 
Garrret suspected comstar to handle certain things bad. I don't know wether 
that has anything to do with this.
>> I think it could be a mx of both, would be interesting to see if something 
>> in Comstar could be fixed,...

Bacically:

ESX+FC+SM = problem
ESX+FC+HP = no problem
Win+FC+SM = problem
Win+FC+HP = not tested
ESX+IB+SM = problem
ESX+IB+HP = no problem
Win+IB+SM = not tested, SRP not supported in Win
Win+IB+HP = not tested, SRP not supported in Win

Anyway it all lead me to some information on 8Gbit FC - in particular 
portCfgFillWord Maybe this can affect some of this... google will reveal a 
great bit of info and also found some links.. 
http://virtualkenneth.com/2012/05/21/random-esxi-lun-loss-in-a-8gb-fc-brocade-infrastructure/

What about tuning the emlxs.conf? can anything be done there to get better 
performance?



Are you using Emulex HBA's? That would explain thingsI have never used 
Emulex in production. Tried some times in lab env, but always turned out to 
behave strangly...






Br,
Rune


Rgrds Johan





Br,

Rune

 

 

From: Nate Smith [mailto:nsm...@careyweb.com]
Sent: Thursday, March 05, 2015 8:10 AM
To: Rune Tipsmark; omnios-discuss@lists.omniti.com
Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?

 

Do you see the same problem with Windows and iSCSI as an initiator? I wish 
there was a way to turn up debugging to figure this out.

 

From: Rune Tipsmark [mailto:r

Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?

2015-03-05 Thread Rune Tipsmark
Ah ok, so just loading the qlt driver is enough. I followed a guide from 
napp-it when I first learned about Solaris a year or so ago, and it had the 
emlxs.conf target-mode=1 change described, so I have just followed it ever since. 

Any other files that can be used to tweak the target driver or comstar?
Br,
Rune


-Original Message-
From: Johan Kragsterman [mailto:johan.kragster...@capvert.se] 
Sent: Thursday, March 05, 2015 12:12 PM
To: Rune Tipsmark
Cc: 'Nate Smith'; omnios-discuss@lists.omniti.com
Subject: Ang: RE: RE: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?



-Rune Tipsmark  skrev: -
Till: 'Johan Kragsterman' 
Från: Rune Tipsmark 
Datum: 2015-03-05 20:44
Kopia: 'Nate Smith' , "omnios-discuss@lists.omniti.com" 

Ärende: RE: RE: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?

They are qLogic qmh2562 across the board... just figured the emlxs.conf had 
something to say since I had to edit it to get comstar into target mode.
Br,
Rune







COMSTAR is target only, so you don't get COMSTAR into target mode; you get the 
HBA into target mode with a target driver, to give COMSTAR an interface to work 
with. If you are using the QMH2562, you need the qlt driver, which I suppose you 
already use. emlxs is the driver for Emulex HBAs, and is of no use when you're 
using QLogic HBAs.
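
If you want to double-check which driver actually owns the ports (a sketch; both
commands show up elsewhere in this archive):

fcinfo hba-port | grep -i 'driver name'   # should read "COMSTAR QLT" for target-mode ports
stmfadm list-target -v                    # the targets COMSTAR is presenting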

Rgrds Johan











-Original Message-
From: Johan Kragsterman [mailto:johan.kragster...@capvert.se] 
Sent: Thursday, March 05, 2015 11:07 AM
To: Rune Tipsmark
Cc: 'Nate Smith'; omnios-discuss@lists.omniti.com
Subject: Ang: RE: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?


Hi!




-Rune Tipsmark  skrev: -
Till: 'Johan Kragsterman' 
Från: Rune Tipsmark 
Datum: 2015-03-05 19:38
Kopia: 'Nate Smith' , "omnios-discuss@lists.omniti.com" 

Ärende: RE: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?

Pls see below >>
: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?

Haven’t tried iSCSI but had similar issues with Infiniband… more 
frequent due to higher io load, but no console error messages.

 

This only happened on my SuperMicro server and never on my HP server… 
what brand are you running?



This is interesting, only on Supermicro, and never on HP? I'd like to know some 
more details here...

First, when you say "server", do you mean the SAN head? Not the hosts?
>> SAN Head yes

Second: Can you specify the exakt model of the Supermicro and the HP?
>>SuperMicro X9DRE-TF+ and the HP is actually an HP P4500 SAN I removed 
>>the shitty HP software and controller from and replaced with  an LSI 
>>9207 and installed OmniOS on. I have tested on other HP and SM servers 
>>too, all exhibit the same behavior (3 SM and 2 HP tested)

Third: Did you pay attention to bios settings on the two different servers? 
Like C-states, and other settings...how about IRQ settings? And how about the 
physical PCIe buses the HBA's are sitting on? This is often causing problems, 
if you don't know the layout of the PCIe-buses.
>> both are set to max performance in BIOS and afaik all C-states are disabled. 
>> I am 100% sure on HP and 99.99% sure on SM. I didn't touch the IRQ settings 
>> but I actually have two 8Gbit FC Cards in the SM server and both exhibit the 
>> problem. I have tried to swap things around too with no luck. I do use every 
>> available PCI-E slot though.. L2ARC, SLOG etc.

Fourth: When you say you can cause it with windows as initiator, do you mean 
windows on hardware, and not windows as a VM? And when you say you can NOT 
cause it on VmWare, you mean you can run a windows VM on VmWare with direct LUN 
access without problems? And is this true for both hardwares, HP and Supermicro?
>>Windows on hardware yes, all I have to do is zone a block device over to 
>>Windows and copy say 100 4GB ISO Files onto it or a 200GB backup file... no 
>>doubt it will cause this issue... when I say I cannot cause this on VMware 
>>then I mean any VM hosted on my ESXi host (5.1 or 5.5) and a guest operating 
>>system without raw device mapping - I have not tested if I can cause this 
>>using RDM. 
It is also true for both HP and SM - both behave just fine using VMware and 
FibreChannel - however VMware can cause issues with Infiniband on the SM but I 
think that's a different issue and has to do with Mellanox and their terrible 
drivers that are never ever updated and half-beta etc.

Since it appears on one hardware and not another, it is difficult to blame any 
specific sofware, but we just had a discussion here about iScsi/comstar, where 
Garrret suspected comstar to handle certain things bad. I don't know wether 
that has anything to do with this.
>> I think it could be a mix of both, would be interesting to see if something 
>> in COMSTAR could be fixed...

Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?

2015-03-06 Thread Rune Tipsmark
No idea to be honest; even if there is, it's scary that it can cause these kinds of 
problems…
Br,
Rune

From: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com] On Behalf 
Of Nate Smith
Sent: Friday, March 06, 2015 8:57 AM
To: 'Richard Elling'
Cc: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?

Yeah, there is on R720s, I think. What about on the Supermicro and HP servers?

From: Richard Elling [mailto:richard.ell...@richardelling.com]
Sent: Friday, March 06, 2015 11:39 AM
To: Nate Smith
Cc: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?


On Mar 5, 2015, at 6:00 AM, Nate Smith 
mailto:nsm...@careyweb.com>> wrote:

I’ve had this problem for a while, and I have no way to diagnose what is going 
on, but occasionally when system IO gets high (I’ve seen it happen especially 
on backups), I will lose connectivity with my Fibre Channel cards which serve 
up fibre channel LUNS to a VM cluster. All hell breaks loose, and then 
connectivity gets restored. I don’t get an error that it’s dropped, at least 
not on the Omnios system, but I get notice when it’s restored (which makes no 
sense). I’m wondering if the cards are just overheating, and if heat sinks with 
a fan would help on the io chip.

Is there a PCI bridge in the data path? These can often be found on mezzanine 
or riser cards.
 -- richard


Mar  5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, 
portid 2, topology Fabric Pt-to-Pt,speed 8G
Mar  5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, 
portid 20100, topology Fabric Pt-to-Pt,speed 8G
Mar  5 02:00:13 newstorm last message repeated 1 time
Mar  5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, 
portid 1, topology Fabric Pt-to-Pt,speed 8G
Mar  5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, 
portid 2, topology Fabric Pt-to-Pt,speed 8G
Mar  5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, 
portid 10100, topology Fabric Pt-to-Pt,speed 8G
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?

2015-03-07 Thread Rune Tipsmark
ok, so the HP has a riser and the FC cards are sitting in the riser. The SM has no 
riser and all cards are inserted directly into the motherboard.

Also, I just remembered... when I had the Infiniband ConnectX-2 installed in the 
SM it would not reboot and I always had to reset it via IPMI. 

The more I think about it, the more I lean towards the SM having an issue... and 
Dell uses essentially SM, so same same.

br,
Rune


From: Johan Kragsterman 
Sent: Saturday, March 7, 2015 4:24 PM
To: Rune Tipsmark
Cc: 'Nate Smith'; 'Richard Elling'; omnios-discuss@lists.omniti.com
Subject: Ang: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?

-"OmniOS-discuss"  skrev: -
Till: "'Nate Smith'" , "'Richard Elling'" 

Från: Rune Tipsmark
Sänt av: "OmniOS-discuss"
Datum: 2015-03-07 04:06
Kopia: "omnios-discuss@lists.omniti.com" 
Ärende: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?

No idea to be honest, even if there is its scary if it can cause these kinds of 
problems…

Br,

Rune






You don't know wether these systems got risers or not?

That can't be difficult to find out: Are the HBA's located directly in PCIe 
slots on the system board, or are they instead located in riser boards that 
sits in the PCIe slots?

It would be very interesting to find out

If Richards theory is correct, you got HBA's sitting in risers on the 
Supermicro, but on the HP you got the HBA's directly in the PCIe slots on the 
system board.

 Rgrds Johan






From: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com] On Behalf 
Of Nate Smith
Sent: Friday, March 06, 2015 8:57 AM
To: 'Richard Elling'
Cc: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?



Yeah, there is on R720s, I think. What about on the Supermicro and HP servers?



From: Richard Elling [mailto:richard.ell...@richardelling.com]
Sent: Friday, March 06, 2015 11:39 AM
To: Nate Smith
Cc: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?





On Mar 5, 2015, at 6:00 AM, Nate Smith  wrote:



I’ve had this problem for a while, and I have no way to diagnose what is 
going on, but occasionally when system IO gets high (I’ve seen it happen 
especially on backups), I will lose connectivity with my Fibre Channel cards 
which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and 
then connectivity gets restored. I don’t get an error that it’s 
dropped, at least not on the Omnios system, but I get notice when it’s 
restored (which makes no sense). I’m wondering if the cards are just 
overheating, and if heat sinks with a fan would help on the io chip.



Is there a PCI bridge in the data path? These can often be found on mezzanine 
or riser cards.

 -- richard





Mar  5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, 
portid 2, topology Fabric Pt-to-Pt,speed 8G

Mar  5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, 
portid 20100, topology Fabric Pt-to-Pt,speed 8G

Mar  5 02:00:13 newstorm last message repeated 1 time

Mar  5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, 
portid 1, topology Fabric Pt-to-Pt,speed 8G

Mar  5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, 
portid 2, topology Fabric Pt-to-Pt,speed 8G

Mar  5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, 
portid 10100, topology Fabric Pt-to-Pt,speed 8G

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss



___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] crash dump analysis help

2015-04-18 Thread Rune Tipsmark
hi guys,

my OmniOS ZFS server crashed today and I got a complete core dump; I was 
wondering if I am on the right track...

here is what I did so far...

root@zfs10:/root# fmdump -Vp -u 
775e0fc1-dcd2-4cb2-b800-88a1b9910f94
TIME   UUID SUNW-MSG-ID
Apr 17 2015 22:48:13.667749000 775e0fc1-dcd2-4cb2-b800-88a1b9910f94 
SUNOS-8000-KL
  TIME CLASS ENA
  Apr 17 22:48:13.6544 ireport.os.sunos.panic.dump_available 0x
  Apr 17 22:45:46.3335 ireport.os.sunos.panic.dump_pending_on_device 
0x
nvlist version: 0
version = 0x0
class = list.suspect
uuid = 775e0fc1-dcd2-4cb2-b800-88a1b9910f94
code = SUNOS-8000-KL
diag-time = 1429303693 655062
de = fmd:///module/software-diagnosis
fault-list-sz = 0x1
fault-list = (array of embedded nvlists)
(start fault-list[0])
nvlist version: 0
version = 0x0
class = defect.sunos.kernel.panic
certainty = 0x64
asru = 
sw:///:path=/var/crash/unknown/.775e0fc1-dcd2-4cb2-b800-88a1b9910f94
resource = 
sw:///:path=/var/crash/unknown/.775e0fc1-dcd2-4cb2-b800-88a1b9910f94
savecore-succcess = 1
dump-dir = /var/crash/unknown
dump-files = vmdump.1
os-instance-uuid = 775e0fc1-dcd2-4cb2-b800-88a1b9910f94
panicstr = BAD TRAP: type=e (#pf Page fault) 
rp=ff01701bb960 addr=ec6093a0 occurred in module "unix" due to an illegal 
access to a user address
panicstack = unix:die+df () | unix:trap+db3 () | 
unix:cmntrap+e6 () | unix:bzero+184 () | zfs:l2arc_write_buffers+1f8 () | 
zfs:l2arc_feed_thread+240 () | unix:thread_start+8 () |
crashtime = 1429299093
panic-time = Fri Apr 17 21:31:33 2015 CEST
(end fault-list[0])
fault-status = 0x1
severity = Major
__ttl = 0x1
__tod = 0x5531718d 0x27cd0a88

//then extract the dump file:

savecore: not enough space in /var/crash/unknown (14937 MB avail, 27154 MB needed)
root@zfs10:/var/crash/unknown# savecore -f /pool01/ISO/vmdump.1 /pool01/ISO/
savecore: System dump time: Fri Apr 17 21:31:33 2015
savecore: saving system crash dump in /pool01/ISO//{unix,vmcore}.1
Constructing namelist /pool01/ISO//unix.1
Constructing corefile /pool01/ISO//vmcore.1
 3:33 100% done: 6897249 of 6897249 pages saved
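
// side note: the earlier "not enough space" complaint can be avoided for future panics
// by pointing savecore at a roomier directory up front (the path is only an example):

root@zfs10:/root# dumpadm -s /pool01/crash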

// then mdb and $c to see last process before the crash...

root@zfs10:/pool01/ISO# mdb unix.1 vmcore.1
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix 
scsi_vhci zfs sata sd ip hook neti sockfs arp usba stmf stmf_sbd fctl md lofs 
mpt_sas random ufs idm smbsrv nfs crypto ptm cpc kvm fcp fcip logindmux nsmb 
nsctl sdbc ii sv rdc ]
> $c
bzero+0x184()
l2arc_write_buffers+0x1f8(ff328f86, ff331782d8d8, 80, 
ff01701bbbec)
l2arc_feed_thread+0x240()
thread_start+8()
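
// a few more dcmds against the same dump can help dig further (a sketch -- these are
// the stock illumos mdb dcmds, exact output will vary per system):

> ::status          // panic summary for this dump
> ::panicinfo       // register state at panic time, including the faulting address
> ::msgbuf          // kernel messages leading up to the panic
> ::arc             // ARC/L2ARC counters captured in the dump
> ::spa -v          // pool and vdev state, including the cache devices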

// based on this I believe my mSATA Samsung SSD drives used for L2ARC in the 
zpool are ready to be thrown into the bin

Is there some way I can gather more info and confirm I am on the right track?


br,
Rune
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] crash dump analysis help

2015-04-20 Thread Rune Tipsmark
root@zfs10:/root# uname -a
SunOS zfs10 5.11 omnios-10b9c79 i86pc i386 i86pc

Any idea how I can troubleshoot further?
br,
Rune

From: Dan McDonald 
Sent: Monday, April 20, 2015 3:58 AM
To: Rune Tipsmark
Cc: omnios-discuss; Dan McDonald
Subject: Re: [OmniOS-discuss] crash dump analysis help

What version of OmniOS are you running (uname -a, or cat /etc/release)?  
Clearly a bad pointer was passed into bzero().  The pointer was 0xec6093a0, 
according to your panic screen.

It COULD be bad HW, but it could also be something more sinister.

Thanks,
Dan

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] crash dump analysis help

2015-04-20 Thread Rune Tipsmark
it's nearly 30 gigs... not sure anyone would download it :)

maybe I can compress it or something.
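
For what it's worth, the vmdump.N file that savecore expands is already compressed, so
sharing that is usually much smaller than the unix.N/vmcore.N pair. A sketch (paths are
only examples):

root@zfs10:/root# dumpadm -z on                    # keep future dumps as compressed vmdump.N
root@zfs10:/root# gzip -9 /pool01/ISO/vmcore.1     # or compress the expanded files before uploading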

br,
Rune


From: Dan McDonald 
Sent: Monday, April 20, 2015 2:40 PM
To: Rune Tipsmark
Cc: omnios-discuss
Subject: Re: [OmniOS-discuss] crash dump analysis help

> On Apr 20, 2015, at 7:02 AM, Rune Tipsmark  wrote:
>
> root@zfs10:/root# uname -a
> SunOS zfs10 5.11 omnios-10b9c79 i86pc i386 i86pc

That's r151012 (modulo today's update... read the list in a bit).

> Any idea how I can troubleshoot further?

Share the dump so people can see it.

Dan


___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] disk failure causing reboot?

2015-05-19 Thread Rune Tipsmark
Same issue here around two months ago when an L2ARC device failed… failmode was 
the default and the device was actually an mSATA SSD mounted in a PCI-E mSATA adapter card:

http://www.addonics.com/products/ad4mspx2.php  and the disk was one of four of 
these http://www.samsung.com/us/computer/memory-storage/MZ-MTE1T0BW

Can these reboots be avoided in any way?

Br,
Rune


From: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com] On Behalf 
Of Schweiss, Chip
Sent: Monday, May 18, 2015 10:31 PM
To: Paul B. Henson
Cc: omnios-discuss
Subject: Re: [OmniOS-discuss] disk failure causing reboot?

I had the exact same failure mode last week.  With over 1000 spindles I see 
this about once a month.

I can publish my dump also if anyone actually wants to try to fix this 
problem, but I think there are already several dumps showing the same thing linked to 
tickets in illumos-gate.
Pools for the most part should be set to failmode=panic or wait, but a failed 
disk should not cause a panic.  On the system where this happened to me, failmode was 
set to wait.  It is also on r151012, waiting on a window to upgrade to r151014. 
My pool is raidz3, so there is no reason not to kick a bad disk.
All my disks are SAS in DataON JBODs, dual connected across two LSI HBAs.
BTW, pull a SAS cable and you get a panic too, not degraded multipath.
Illumos seems to panic on just about any SAS event these days regardless of 
redundancy.
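
When one of these hits, it is also worth checking whether FMA had already been flagging the
device before the box went down -- standard commands, nothing system-specific:

# fmdump -e        # error telemetry leading up to the event
# fmadm faulty     # anything already diagnosed or retired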
-Chip









On Mon, May 18, 2015 at 3:08 PM, Paul B. Henson 
mailto:hen...@acm.org>> wrote:
On Mon, May 18, 2015 at 06:25:34PM +, Jeff Stockett wrote:
> A drive failed in one of our supermicro 5048R-E1CR36L servers running
> omnios r151012 last night, and somewhat unexpectedly, the whole system
> seems to have panicked.

You don't happen to have failmode set to panic on the pool?
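
Checking (and, if needed, changing) the property is quick; the pool name below is only an
example:

# zpool get failmode tank
# zpool set failmode=wait tank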

From the zpool manpage:

   failmode=wait | continue | panic
   Controls the system behavior in the event of catastrophic pool
   failure. This condition is typically a result of a loss of
   connectivity to the underlying storage device(s) or a failure of
   all devices within the pool. The behavior of such an event is
   determined as follows:

   wait
   Blocks all I/O access until the device connectivity is
   recovered and the errors are cleared. This is the
   default behavior.

   continue
   Returns EIO to any new write I/O requests but allows
   reads to any of the remaining healthy devices. Any
   write requests that have yet to be committed to disk
   would be blocked.

   panic
   Prints out a message to the console and generates a
   system crash dump.

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] ZIL TXG commits happen very frequently - why?

2015-10-13 Thread Rune Tipsmark
Hi all.

Wondering if anyone could shed some light on why my ZFS pool would perform TXG 
commits up to 5 times per second. It's set to the default 5 second interval and 
occasionally it does wait 5 seconds between commits, but only when nearly idle.

I'm not sure if this impacts my performance but I would suspect it doesn't 
improve it. I force sync on all data.

I have 11 mirrors (7200rpm SAS disks), two SLOG devices, two L2ARC devices 
and a pair of spare disks.

Each log device can hold 150GB of data so plenty for 2 TXG commits. The system 
has 384GB memory.

Below is a bit of output from zilstat during a near-idle time this morning, so 
you won't see 4-5 commits per second, but under load later today it will 
happen.

root@zfs10:/tmp# ./zilstat.ksh -M -t -p pool01 txg
waiting for txg commit...
TIMEtxg   N-MB N-MB/s N-Max-Rate   B-MB 
B-MB/s B-Max-Rateops  <=4kB 4-32kB >=32kB
2015 Oct 14 06:21:19   10872771  3  3  0 21 
21  2234 14 19201
2015 Oct 14 06:21:22   10872772 10  3  3 70 
23 24806  0 84725
2015 Oct 14 06:21:24   10872773 12  6  5 56 
28 26682 17107558
2015 Oct 14 06:21:25   10872774 13 13  2 75 
75 14651  0 10641
2015 Oct 14 06:21:25   10872775  0  0  0  0 
 0  0  1  0  0  1
2015 Oct 14 06:21:26   10872776 11 11  6 53 
53 29645  2136507
2015 Oct 14 06:21:30   10872777 11  2  4 81 
20 32873 11 60804
2015 Oct 14 06:21:30   10872778  0  0  0  0 
 0  0  1  0  1  0
2015 Oct 14 06:21:31   10872779 12 12 11 56 
56 52631  0  8623
2015 Oct 14 06:21:33   10872780 11  5  4 74 
37 27858  0 44814
2015 Oct 14 06:21:36   10872781 14  4  6 79 
26 30977 12 82883
2015 Oct 14 06:21:39   10872782 11  3  4 78 
26 25957 18 55884
2015 Oct 14 06:21:43   10872783 13  3  4 80 
20 24930  0135795
2015 Oct 14 06:21:46   10872784 13  4  4 81 
27 29965 13 95857
2015 Oct 14 06:21:49   10872785 11  3  6 80 
26 41   1077 12215850
2015 Oct 14 06:21:53   10872786  9  3  2 67 
22 18870  1 74796
2015 Oct 14 06:21:56   10872787 12  3  5 72 
18 26909 17163729
2015 Oct 14 06:21:58   10872788 12  6  3 53 
26 21530  0 33497
2015 Oct 14 06:21:59   10872789 26 26 24 72 
72 62882 12 60810
2015 Oct 14 06:22:02   10872790  9  3  5 57 
19 28777  0 70708
2015 Oct 14 06:22:07   10872791 11  2  3 96 
24 22   1044 12 46986
2015 Oct 14 06:22:10   10872792 13  3  4 78 
19 22911 12 38862
2015 Oct 14 06:22:14   10872793 11  2  4 79 
19 26930 10 94826
2015 Oct 14 06:22:17   10872794 11  3  5 73 
24 26   1054 17151886
2015 Oct 14 06:22:17   10872795  0  0  0  0 
 0  0  2  0  0  2
2015 Oct 14 06:22:18   10872796 40 40 38 78 
78 60707  0 28680
2015 Oct 14 06:22:22   10872797 10  3  3 66 
22 21937 14164759
2015 Oct 14 06:22:25   10872798  9  2  2 66 
16 21821 11 92718
2015 Oct 14 06:22:28   10872799 24 12 14 80 
40 43750  0 23727
2015 Oct 14 06:22:28   10872800  0  0  0  0 
 0  0  2  0  0  2
2015 Oct 14 06:22:29   10872801 15  7  9 49 
24 24526 11 25490
2015 Oct 14 06:22:33   10872802 10  2  3 79 
19 24939  0 638

Re: [OmniOS-discuss] ZIL TXG commits happen very frequently - why?

2015-10-14 Thread Rune Tipsmark
 4103
   16384 |@@@  2400
   32768 |@@   1401
   65536 |@@   1504
  131072 |@897
  262144 |@427
  524288 | 39
 1048576 | 1
 2097152 | 0

         avg latency    stddev      iops    throughput
write         496us     3136us    9809/s     450773k/s
read        22633us    59917us     492/s      17405k/s

I also happen to monitor how busy each disk is and I don't see any significant 
load there either... here is an example

[inline image: per-disk busy/utilization graph, not preserved in the archive]

so I'm a bit lost as to what to do next. I don't see any stress on the system in 
terms of writes, but I still cannot max out the 8Gbit FC... reads, however, are 
doing fairly well, getting just over 700MB/sec, which is acceptable over 8Gbit 
FC. Writes tend to be between 350 and 450 MB/sec... they should get up to 
700MB/sec as well.

Any ideas where to start?
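
One place to look while reproducing the load is per-vdev behaviour, including the SLOG
devices -- a sketch, using the pool name from above:

# zpool iostat -v pool01 1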

br,
Rune






From: Schweiss, Chip 
Sent: Wednesday, October 14, 2015 2:44 PM
To: Rune Tipsmark
Cc: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] ZIL TXG commits happen very frequently - why?

It all has to do with the write throttle and the dirty data buffers filling.  Here are a 
couple of great blog posts on how it works and how it's tuned:

http://dtrace.org/blogs/ahl/2014/02/10/the-openzfs-write-throttle/

http://dtrace.org/blogs/ahl/2014/08/31/openzfs-tuning/
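
In practice the usual trigger for sub-5-second commits is dirty data reaching the
zfs_dirty_data_sync threshold, which pushes a txg out early. The current values can be read
from the live kernel -- a sketch, assuming the stock illumos tunable names (the first two
are in bytes, the timeout is in seconds):

# echo zfs_dirty_data_max/E | mdb -k
# echo zfs_dirty_data_sync/E | mdb -k
# echo zfs_txg_timeout/D | mdb -k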

-Chip


On Wed, Oct 14, 2015 at 12:45 AM, Rune Tipsmark 
mailto:r...@steait.net>> wrote:
Hi all.

Wondering if anyone could shed some light on why my ZFS pool would perform TXG 
commits up to 5 times per second. It’s set to the default 5 second interval and 
occasionally it does wait 5 seconds between commits, but only when nearly idle.

I’m not sure if this impacts my performance but I would suspect it doesn’t 
improve it. I force sync on all data.

I got 11 mirrors (7200rpm sas disks) two SLOG devices and two L2 ARC devices 
and a pair of spare disks.

Each log device can hold 150GB of data so plenty for 2 TXG commits. The system 
has 384GB memory.

Below is a bit of output from zilstat during a near idle time this morning so 
you wont see 4-5 commits per second, but during load later today it will 
happen..

root@zfs10:/tmp# ./zilstat.ksh -M -t -p pool01 txg
waiting for txg commit...
TIMEtxg   N-MB N-MB/s N-Max-Rate   B-MB 
B-MB/s B-Max-Rateops  <=4kB 4-32kB >=32kB
2015 Oct 14 06:21:19   10872771  3  3  0 21 
21  2234 14 19201
2015 Oct 14 06:21:22   10872772 10  3  3 70 
23 24806  0 84725
2015 Oct 14 06:21:24   10872773 12  6  5 56 
28 26682 17107558
2015 Oct 14 06:21:25   10872774 13 13  2 75 
75 14651  0 10641
2015 Oct 14 06:21:25   10872775  0  0  0  0 
 0  0  1  0  0  1
2015 Oct 14 06:21:26   10872776 11 11  6 53 
53 29645  2136507
2015 Oct 14 06:21:30   10872777 11  2  4 81 
20 32873 11 60804
2015 Oct 14 06:21:30   10872778  0  0  0  0 
 0  0  1  0  1  0
2015 Oct 14 06:21:31   10872779 12 12 11 56 
56 52631  0  8623
2015 Oct 14 06:21:33   10872780 11  5  4 74 
37 27858  0 44814
2015 Oct 14 06:21:36   10872781 14  4  6 79 
26 30977 12 82883
2015 Oct 14 06:21:39   10872782 11  3  4 78 
26 25957 18 55884
2015 Oct 14 06:21:43   10872783 13  3  4 80 
20 24930  0135795
2015 Oct 14 06:21:46   10872784 13  4  4 81 
27 29965 13 95857
2015 Oct 14 06:21:49   10872785 11  3  6 80 
26 41   1077 12215850
2015 Oct 14 06:21:53   10872786  9  3  2 67 
22 18870  1 74796
2015 Oct 14 06:21:56   10872787 12  3  5 72 
18 26909 17163729
2015 Oct 14 06:21:58