Re: [OpenIndiana-discuss] VMware
Hi Ed, Chip,

Thanks for the responses; it was basically to see whether people had been having any compatibility issues with OI as backend storage. We've seen datastore disconnects in the ESXi hosts over both iSCSI and NFS, and it seemed odd that there'd be the same problems across both protocols. It didn't really show up in testing, and I've seen other people running this kind of setup without issue, so it was really a question to see whether anyone else was seeing the same thing. At the same time as the hosts were seeing disconnects, we had other machines using the same iSCSI targets without any errors at all, so it is all a bit odd.

Thanks,
James

On 10 Aug 2013, at 14:32, Edward Ned Harvey (openindiana) <openindi...@nedharvey.com> wrote:

>> From: James Relph [mailto:ja...@themacplace.co.uk]
>> Sent: Saturday, August 10, 2013 6:12 AM
>>
>> Is anybody using Oi as a data store for VMware using NFS or iSCSI?
>
> I have done both. What do you want to know?
>
> I couldn't measure any performance difference NFS vs. iSCSI. Theoretically, iSCSI should be more reliable: by default it sets the refreservation, supposedly guaranteeing there will always be disk space available for writes. But I haven't found that to be reality; I have bumped into full-disk problems with iSCSI just as much as with NFS, so it's important to simply monitor and manage intelligently. And the COMSTAR stuff seems to be kind of unreliable, not to mention confusing. NFS seems to be considerably easier to manage.
>
> So I would recommend NFS rather than iSCSI.

___
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss
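Ed's point about the refreservation can be checked directly on the storage host. A minimal sketch, assuming a ZFS zvol backs the iSCSI LU (the dataset name `tank/esx-lun` is hypothetical):

```shell
# Inspect whether the zvol backing an iSCSI LU carries a refreservation:
zfs get refreservation,volsize,used tank/esx-lun

# A zvol created the default (non-sparse) way reserves its full size up front:
zfs create -V 100G tank/esx-lun

# ...whereas a sparse zvol (-s) sets refreservation=none, and can run the
# pool out of space under the datastore just like an NFS share can:
zfs create -s -V 100G tank/esx-lun-sparse
```

Note that even a full refreservation only protects the zvol itself; snapshots and other datasets in the pool can still consume the space the guests expect to write into, which may be what Ed ran into.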
Re: [OpenIndiana-discuss] VMware
On 2013-08-11 11:13, James Relph wrote:
> Hi Ed, Chip, Thanks for the responses, it was basically to see whether people had been having any compatibility issues with Oi as backend storage. We've seen datastore disconnects in the ESXi hosts over both iSCSI and NFS, and it seemed odd that there'd be the same problems across both protocols. At the same time as the hosts were seeing disconnects we had other machines using the same iSCSI targets without any errors at all, so it is all a bit odd.

Maybe something with networking? Like trunked connections and some links going down (temporarily), so that packets hash-routed onto them are not delivered properly (until the failure is detected or the link comes back up)? Possibly, if a master (first) interface on an aggregation is lost, there may also be fun with MAC address changes...

Wild shots in the dark, though not completely without practical basis ;)

HTH,
//Jim
Re: [OpenIndiana-discuss] VMware
I'll pass that on to someone actually, thanks, although would we lose pings with that? (We had pings running to test for a network issue and never saw packet loss.) It's a bit of a puzzler!

James.

Sent from my iPhone

On 11 Aug 2013, at 10:43, Jim Klimov <jimkli...@cos.ru> wrote:

> On 2013-08-11 11:13, James Relph wrote:
>> [...]
>
> Maybe something with networking? Like trunked connections and some links going down (temporarily), so that packets hash-routed onto them are not delivered properly (until the failure is detected or the link comes back up)? Possibly, if a master (first) interface on an aggregation is lost, there may also be fun with MAC address changes...
>
> Wild shots in the dark, though not completely without practical basis ;)
Re: [OpenIndiana-discuss] VMware
On 2013-08-11 16:59, James Relph wrote:
> I'll pass that on to someone actually, thanks, although would we lose pings with that (had pings running to test for a network issue and never had packet loss)? It's a bit of a puzzler!

Also, does your host use ipfilter to filter and/or NAT access to the iSCSI and NFS services? It might be that you run out of the buckets needed to track sessions. I am not sure what the defaults are now, but I remember needing to bump them a lot on an OpenSolaris SXCE 129 firewall. There was this patch to /lib/svc/method/ipfilter :

configure_firewall()
{
        create_global_rules || exit $SMF_EXIT_ERR_CONFIG
        create_global_ovr_rules || exit $SMF_EXIT_ERR_CONFIG
        create_services_rules || exit $SMF_EXIT_ERR_CONFIG

        [ ! -f ${IPFILCONF} -a ! -f ${IPNATCONF} ] && exit 0

        ### Enforce and display state-table sizing
        ### Jim Klimov, 2009-2010
        ipf -D -T fr_statemax=72901,fr_statesize=104147,fr_statemax,fr_statesize -E -T fr_statemax,fr_statesize
        # ipf -E

        load_ippool || exit $SMF_EXIT_ERR_CONFIG
        load_ipf || exit $SMF_EXIT_ERR_CONFIG
        load_ipnat || exit $SMF_EXIT_ERR_CONFIG
}

Again, I have no idea if any of this (the fr_* line) is needed on today's systems; the defaults in SXCE were pretty much too low, as contemporary blogs and forums helpfully pointed out...

HTH,
//Jim Klimov
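If ipfilter is in the path, the state-table pressure can be checked before patching anything. A sketch, assuming the stock ipf/ipfstat tools on the storage host:

```shell
# Query the current state-table tunables without changing any rules
# (listing names without "=value" prints their current settings):
ipf -T fr_statemax,fr_statesize

# Show state-table statistics; watch the maximum-entries and lost/expired
# counters here while the disconnects are happening -- a table bumping
# against its maximum would drop exactly the long-lived iSCSI/NFS sessions:
ipfstat -s
```

If ipfilter is not enabled at all (`svcs ipfilter` reports disabled), this whole avenue can be ruled out quickly.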
Re: [OpenIndiana-discuss] VMware
On Aug 11, 2013, at 9:59 AM, James Relph <ja...@themacplace.co.uk> wrote:
> Nope, dedicated physical 10Gb network for iSCSI/NFS traffic, with 4x 10Gb links (in an LACP bond) per device. Should be pretty solid really.

If I recall correctly, you can set LACP parameters that determine how fast the switch-over occurs between ports, the interval at which the interfaces send LACP packets, and more. These can be set on either the OS or the switch side, depending on the vendor. So if you've determined that there is nothing wrong at either the physical layer or at the network layer and above, then the link layer is your most likely culprit. Applying the process of elimination, or some other methodology, is most advisable for these types of troubleshooting situations.

-Gary
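On the OpenIndiana side these LACP knobs live on the aggregation itself and can be inspected and tuned with dladm. A sketch; the aggregation name `aggr0` is hypothetical:

```shell
# Show per-port LACP state for the aggregation (-L = LACP information),
# including whether each port's partner is actually in sync:
dladm show-aggr -L aggr0

# Switch the LACP timer from the default long interval (30 s) to short
# (1 s), so a dead or flaky link is detected and dropped much faster:
dladm modify-aggr -T short aggr0

# Ensure LACP is actively negotiating rather than passive or off:
dladm modify-aggr -L active aggr0
```

The equivalent timer usually has to be set on the switch side too, since detection speed is governed by whichever end is slower to notice.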
Re: [OpenIndiana-discuss] VMware
> If I recall correctly, you can set LACP parameters that determine how fast the switch-over occurs between ports, the interval at which the interfaces send LACP packets, and more. These can be set on either the OS or switch side depending on the vendor. So if you've determined that there is nothing wrong at either the physical layer or network and above, then the link layer is your most likely culprit.

I'll have to have a look, but the thing is that we were seeing these datastore drops while, at the same time, running pings that showed no dropped packets and no significant network latency. If it was an LACP issue (ports dropping, etc.) causing the iSCSI issues, wouldn't we see dropped packets at the same time?

Thanks,
James.
Re: [OpenIndiana-discuss] VMware
On Sun, Aug 11, 2013 at 10:57 AM, James Relph <ja...@themacplace.co.uk> wrote:
> If it was an LACP issue (ports dropping etc.) causing iSCSI issues, wouldn't we see dropped packets at the same time?

The protocol calls for strict ordering of packets, so one would think ICMP would be useful in troubleshooting this. I remember using ping to test switch-over, and I didn't put anything live on it until I could see no noticeable outage after tuning my parameters. So if you're certain layer two isn't at fault, have you covered the other layers? For example, is your hypervisor fully patched?

-Gary
Re: [OpenIndiana-discuss] VMware
On 2013-08-11 19:57, James Relph wrote:
>> If I recall correctly, you can set LACP parameters that determine how fast the switch-over occurs between ports, the interval at which the interfaces send LACP packets, and more.
>
> I'll have to have a look, but the thing is that we were seeing these datastore drops while at the same time we were running pings showing no dropped packets and no significant network latency. If it was an LACP issue (ports dropping etc.) causing iSCSI issues, wouldn't we see dropped packets at the same time?

I think it may depend on the hashing type you use in the LACP trunk. Basically, in LACP every logical connection uses one of the offered links and maxes out at that link's speed. When you have many connections they can, on average, cover all links more or less equally and sum up to a larger aggregate bandwidth than a single link provides.

Selection of a link for a particular connection can depend on several factors. For example, L2 hashing might sum the DST and SRC MAC addresses, divide by the number of links, and use the remainder as the number of the link to use. Other algorithms bring IP addresses and port numbers into the mix so that, in particular, if there are only two hosts communicating over a direct link, they still have a chance to utilize all links.

So it is possible that in your case ICMP went over a working link while the data failed over a flaky link. My gut feeling here would be that one of the links does not perform well but is not kicked out of the team (at least for a while), so connections scheduled onto it are lost, or at least lag.

One idea is to verify that LACP does not cause more trouble than benefit (perhaps by leaving only one physical link active and trying to reproduce the problem); a recent discussion on this subject suggested that independent (non-trunked) links with application-layer MPxIO to the targets might be better than generic network-level aggregation.

HTH,
//Jim
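Jim's L2-hash example can be made concrete with a toy sketch. This is an illustration only: real hash policies vary by driver and switch vendor, and typically hash full MAC/IP/port tuples rather than single octets.

```shell
# Toy model of L2-based LACP link selection: combine the source and
# destination MAC addresses (here reduced to their last octets) and take
# the remainder modulo the number of links in the trunk.
select_link() {
    src=$1 dst=$2 nlinks=$3
    echo $(( (src + dst) % nlinks ))
}

# Two flows between different NIC pairs can land on different physical
# links of a 4-link trunk:
select_link 0x0a 0x1b 4    # flow A
select_link 0x0c 0x1b 4    # flow B
```

Because the inputs are fixed per flow, every packet of a given flow (e.g. all ICMP between one pair of hosts) stays pinned to the same physical link, which is exactly why a ping can look clean while iSCSI traffic hashed onto a different, flaky link suffers.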
Re: [OpenIndiana-discuss] VMware
Been using it as a datastore for both my VMware and Hyper-V clusters. No complaints: great performance and reliability. I'm using iSCSI.

Sent from my iPhone

On Aug 10, 2013, at 6:12 AM, James Relph <ja...@themacplace.co.uk> wrote:

> Hi all,
>
> Is anybody using Oi as a data store for VMware using NFS or iSCSI?
>
> Thanks,
> James.
>
> Sent from my iPhone