Re: [OpenIndiana-discuss] VMware

2013-08-11 Thread James Relph
Hi Ed, Chip,

Thanks for the responses, it was basically to see whether people had been 
having any compatibility issues with Oi as backend storage.  We've seen 
datastore disconnects in the ESXi hosts over both iSCSI and NFS, and it seemed 
odd that there'd be the same problems across both protocols.  Didn't really 
show up in testing and I've seen other people running this kind of setup 
without issue, so it was really a question to see if there were any other 
people seeing the same thing.  At the same time as the hosts were seeing 
disconnects we had other machines using the same iSCSI targets without any 
errors at all, so it is all a bit odd.

Thanks,

James


On 10 Aug 2013, at 14:32, Edward Ned Harvey (openindiana) 
openindi...@nedharvey.com wrote:

 From: James Relph [mailto:ja...@themacplace.co.uk]
 Sent: Saturday, August 10, 2013 6:12 AM
 
 Is anybody using Oi as a data store for VMware using NFS or iSCSI?
 
 I have done both.  What do you want to know?
 
 I couldn't measure any performance difference between NFS and iSCSI.  
 Theoretically, iSCSI should be more reliable, since the backing zvol gets a 
 refreservation by default, which supposedly guarantees there will always be 
 disk space available for writes, but I haven't found that to be the reality.  
 I have bumped into full-disk problems with iSCSI just as much as with NFS, so 
 it's important to simply monitor and manage intelligently.  And the COMSTAR 
 stuff seems to be kind of unreliable, not to mention confusing.  NFS seems to 
 be considerably easier to manage.  So I would recommend NFS rather than iSCSI.
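 
 For anyone weighing the two, a minimal sketch of what each path looks like on 
 the OI box (assuming a pool called "tank" and the stock NFS/COMSTAR services; 
 the names, sizes and network below are placeholders, not anything from this 
 thread):
 
 # NFS datastore: share a dataset read/write to the ESXi subnet
 zfs create -o sharenfs=rw=@10.0.0.0/24,root=@10.0.0.0/24 tank/vmware-nfs
 
 # iSCSI datastore: a zvol (non-sparse zvols get a refreservation by default),
 # exported through COMSTAR and formatted as VMFS by the ESXi host
 zfs create -V 500g tank/vmware-iscsi
 sbdadm create-lu /dev/zvol/rdsk/tank/vmware-iscsi
 stmfadm add-view <GUID printed by sbdadm>
 itadm create-target
 svcadm enable -r svc:/network/iscsi/target:default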
 



Re: [OpenIndiana-discuss] VMware

2013-08-11 Thread Jim Klimov

On 2013-08-11 11:13, James Relph wrote:

Hi Ed, Chip,

Thanks for the responses, it was basically to see whether people had been 
having any compatibility issues with Oi as backend storage.  We've seen 
datastore disconnects in the ESXi hosts over both iSCSI and NFS, and it seemed 
odd that there'd be the same problems across both protocols.  Didn't really 
show up in testing and I've seen other people running this kind of setup 
without issue, so it was really a question to see if there were any other 
people seeing the same thing.  At the same time as the hosts were seeing 
disconnects we had other machines using the same iSCSI targets without any 
errors at all, so it is all a bit odd.


Maybe something with networking? Like trunked connections with some
links going down (temporarily), so that hash-routed packets to them are
not delivered properly (until the failure is detected or the link comes
back up)? Possibly, if the master (first) interface on an aggregation
is lost, there may also be fun with MAC address changes...
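
A few things that could help confirm or rule that out on the OI side (a sketch;
"aggr0" stands in for whatever the aggregation is actually called):

# per-port state of the aggregation (attached/standby, speed, duplex)
dladm show-aggr -x aggr0

# LACP protocol state per port (partner MAC, sync/collecting/distributing flags)
dladm show-aggr -L aggr0

# per-port traffic distribution, sampled every 5 seconds - a port that never
# carries traffic, or suddenly stops, is suspect
dladm show-aggr -s -i 5 aggr0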

Wild shots in the dark, though not completely without practical basis ;)

HTH,
//Jim

___
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss


Re: [OpenIndiana-discuss] VMware

2013-08-11 Thread James Relph

I'll pass that on, thanks, although would we lose pings with that? (We had 
pings running to test for a network issue and never saw any packet loss.)  
It's a bit of a puzzler!

James. 

Sent from my iPhone

 On 11 Aug 2013, at 10:43, Jim Klimov jimkli...@cos.ru wrote:
 
 Maybe something with networking? Like trunked connections with some
 links going down (temporarily), so that hash-routed packets to them are
 not delivered properly (until the failure is detected or the link comes
 back up)? Possibly, if the master (first) interface on an aggregation
 is lost, there may also be fun with MAC address changes...
 
 Wild shots in the dark, though not completely without practical basis ;)
 
 HTH,
 //Jim
 




Re: [OpenIndiana-discuss] VMware

2013-08-11 Thread Jim Klimov

On 2013-08-11 16:59, James Relph wrote:


I'll pass that on to someone actually, thanks, although would we lose pings 
with that (had pings running to test for a network issue and never had packet 
loss)?  It's a bit of a puzzler!


Also, does your host use ipfilter to filter and/or NAT access to the
iSCSI and NFS services? It might be that you are running out of the buckets
needed to track sessions. I am not sure what the defaults are now,
but I remember needing to bump them a lot on an OpenSolaris SXCE 129
firewall.

There was this patch to /lib/svc/method/ipfilter:

configure_firewall()
{
        create_global_rules || exit $SMF_EXIT_ERR_CONFIG
        create_global_ovr_rules || exit $SMF_EXIT_ERR_CONFIG
        create_services_rules || exit $SMF_EXIT_ERR_CONFIG

        [ ! -f ${IPFILCONF} -a ! -f ${IPNATCONF} ] && exit 0

        ### Enforce and display state-table sizing
        ### Jim Klimov, 2009-2010
        ipf -D -T fr_statemax=72901,fr_statesize=104147,fr_statemax,fr_statesize -E -T fr_statemax,fr_statesize

        # ipf -E

        load_ippool || exit $SMF_EXIT_ERR_CONFIG
        load_ipf || exit $SMF_EXIT_ERR_CONFIG
        load_ipnat || exit $SMF_EXIT_ERR_CONFIG
}


Again, I have no idea if any of this (the fr_* line) is needed on today's
systems; the defaults in SXCE were pretty much too low, as contemporary
blogs and forums helpfully pointed out...
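
If ipfilter is in the path here, it may be worth checking whether the state
table is actually filling up before tuning anything (a sketch; the exact
counter names can vary between releases):

# current values of the state-table tunables
ipf -T fr_statemax,fr_statesize

# state-table statistics; a climbing "maximum" or "no memory" count under
# load would point at the buckets running out
ipfstat -s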

HTH,
//Jim Klimov




Re: [OpenIndiana-discuss] VMware

2013-08-11 Thread Gary Driggs
On Aug 11, 2013, at 9:59 AM, James Relph ja...@themacplace.co.uk wrote:

 Nope, dedicated physical 10Gb network for iSCSI/NFS traffic, with 4x 10Gb 
 links (in an LACP bond) per device.  Should be pretty solid really.

If I recall correctly, you can set LACP parameters that determine how
fast the switch-over occurs between ports, the interval at which the
interfaces send LACP packets, and more. These can be set on the OS
side, the switch side, or both, depending on the vendor. So if you've
determined that there is nothing wrong at either the physical layer or
the network layer and above, then the link layer is your most likely
culprit. Working through it by process of elimination (or some other
methodology) is advisable in this kind of troubleshooting.
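
On the OI side those knobs map roughly onto dladm's LACP mode and timer (a
sketch; "aggr0" and the ixgbe link names are placeholders, and the switch
needs matching settings):

# build the aggregation with active LACP and the short (fast) timer
dladm create-aggr -l ixgbe0 -l ixgbe1 -l ixgbe2 -l ixgbe3 -L active -T short aggr0

# or adjust an existing aggregation in place
dladm modify-aggr -L active -T short aggr0

# confirm what was actually negotiated with the switch
dladm show-aggr -L aggr0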

-Gary



Re: [OpenIndiana-discuss] VMware

2013-08-11 Thread James Relph
 If I recall correctly, you can set LACP parameters that determine how
 fast the switch-over occurs between ports, the interval at which the
 interfaces send LACP packets, and more. These can be set on the OS
 side, the switch side, or both, depending on the vendor. So if you've
 determined that there is nothing wrong at either the physical layer or
 the network layer and above, then the link layer is your most likely
 culprit. Working through it by process of elimination (or some other
 methodology) is advisable in this kind of troubleshooting.

I'll have to have a look, but the thing is that we were seeing these datastore 
drops while at the same time we were running pings showing no dropped packets 
and no significant network latency.  If it was an LACP issue (ports dropping 
etc.) causing iSCSI issues, wouldn't we see dropped packets at the same time?

Thanks,

James.



Re: [OpenIndiana-discuss] VMware

2013-08-11 Thread Gary
On Sun, Aug 11, 2013 at 10:57 AM, James Relph ja...@themacplace.co.uk wrote:

 If it was an LACP issue (ports dropping etc.) causing iSCSI issues, wouldn't 
 we see dropped packets at the same time?

The protocol calls for strict ordering of packets, so one would think
ICMP would be useful in troubleshooting this. I remember using ping to
test switch-over, and I didn't put anything live on it until I could see
no noticeable outage after tuning my parameters. So if you're certain
layer two isn't at fault, have you covered the other four? For
example, is your hypervisor fully patched?

-Gary



Re: [OpenIndiana-discuss] VMware

2013-08-11 Thread Jim Klimov

On 2013-08-11 19:57, James Relph wrote:

If I recall correctly, you can set LACP parameters that determine how
fast the switch-over occurs between ports, the interval at which the



I'll have to have a look, but the thing is that we were seeing these datastore 
drops while at the same time we were running pings showing no dropped packets 
and no significant network latency.  If it was an LACP issue (ports dropping 
etc.) causing iSCSI issues, wouldn't we see dropped packets at the same time?


I think it may depend on the hashing policy you use on the LACP trunk.
Basically, in LACP, every logical connection uses one of the offered
links and maxes out at that link's speed. When you have many connections
they can, on average, cover all links more or less equally, and sum
up to a larger bandwidth than one link. Selection of a link for a
particular connection can depend on several factors - for example, an
L2 hash might sum the DST and SRC MAC addresses, divide by the
number of links, and use the remainder as the link number. Other
algorithms bring IP addresses and port numbers into the mix, so
that, in particular, if there are only two hosts communicating over
a direct link, they still have a chance to utilize all the links.

So it is possible that in your case ICMP went over a working link
and data failed over a flaky link, for example. My gut-feeling in
this area would be that one of the links does not perform well, but
is not kicked out of the team (at least for a while), so connections
scheduled onto it are lost or at least lag. One idea is to verify
that LACP does not cause more trouble than benefit (perhaps by
leaving only one physical link active and trying to reproduce the
problem); a recent discussion on this subject suggested that maybe
independent (non-trunked) links and application-layer MPxIO to the
targets might be better than generic network-level aggregation.
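
A quick way to experiment with both ideas on the OI end (a sketch; "aggr0"
and the ixgbe link names are placeholders for your own):

# see which hashing policy the trunk currently uses
dladm show-aggr aggr0

# hash on IP addresses and ports instead of (or as well as) MACs
dladm modify-aggr -P L3,L4 aggr0

# for the single-link test, temporarily pull the other members out of the team
dladm remove-aggr -l ixgbe1 aggr0
dladm remove-aggr -l ixgbe2 aggr0
dladm remove-aggr -l ixgbe3 aggr0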

HTH,
//Jim



Re: [OpenIndiana-discuss] VMware

2013-08-11 Thread Bentley, Dain
Been using it as a datastore for both my VMware and Hyper-V clusters.  No 
complaints, great performance and reliability.  I'm using iSCSI. 

Sent from my iPhone

On Aug 10, 2013, at 6:12 AM, James Relph ja...@themacplace.co.uk wrote:

 
 Hi all,
 
 Is anybody using Oi as a data store for VMware using NFS or iSCSI?
 
 Thanks,
 
 James. 
 
 Sent from my iPhone
 
 
