RE: Frequent NFS not responding, timed out issue

2014-05-05 Thread Hung Nguyen The
Hi Shanker,
Thanks for your suggestion; I will give it a try.
Isolating VM traffic from storage traffic is the best practice. I wish I 
had had the chance to make recommendations about the design from the start.

I am also thinking about upgrading openvswitch, as there seem to be a lot of 
performance improvements. But there is not much information out there, 
especially on compatibility and which versions are supported on XenServer.

Btw, we are now focusing on controlling the VMs that are flooding packets. The 
system can now sustain a lot more flows going through.

Best Regards,
The Hung


-Original Message-
From: Shanker Balan [mailto:shanker.ba...@shapeblue.com] 
Sent: Tuesday, April 29, 2014 4:44 PM
To: CloudStack-Users
Subject: Re: Frequent NFS not responding, timed out issue

Hi,

Comments inline.

On 28-Apr-2014, at 1:45 pm, Hung Nguyen The wrote:

> Thanks Shanker. I don't have total control over the system and it lacks a 
> proper monitoring system.

Do give collectd + SNMP a try. Works very well for me.

> After some more investigation we believe the high latency is due to 
> openvswitch, and some VMs are flooding packets.

Ouch.

You could have a separate storage network to isolate VM traffic from storage 
traffic.

>
> Some tuning has been done, such as increasing flow-eviction-threshold, and the 
> network is now much more stable.
> It would be great if someone could point me to information on improving the 
> performance of Open vSwitch on XenServer 6.2.
> Btw, OVS is version 1.4.6.

Have you considered upgrading openvswitch to a newer version? I personally have 
not tried updating openvswitch on XenServer, but it should be possible using the 
DDK. YMMV!

http://git.openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=blob_plain;f=INSTALL.XenServer;hb=HEAD


--
@shankerbalan

M: +91 98860 60539 | O: +91 (80) 67935867 shanker.ba...@shapeblue.com | 
www.shapeblue.com | Twitter:@shapeblue ShapeBlue Services India LLP, 22nd 
floor, Unit 2201A, World Trade Centre, Bangalore - 560 055

Need Enterprise Grade Support for Apache CloudStack?
Our CloudStack Infrastructure 
Support<http://shapeblue.com/cloudstack-infrastructure-support/> offers the 
best 24/7 SLA for CloudStack Environments.

Apache CloudStack Bootcamp training courses

**NEW!** CloudStack 4.2.1 training<http://shapeblue.com/cloudstack-training/>
28th-29th May 2014, Bangalore. 
Classroom<http://shapeblue.com/cloudstack-training/>
16th-20th June 2014, Region A. Instructor led, 
On-line<http://shapeblue.com/cloudstack-training/>
23rd-27th June 2014, Region B. Instructor led, 
On-line<http://shapeblue.com/cloudstack-training/>
15th-20th September 2014, Region A. Instructor led, 
On-line<http://shapeblue.com/cloudstack-training/>
22nd-27th September 2014, Region B. Instructor led, 
On-line<http://shapeblue.com/cloudstack-training/>
1st-6th December 2014, Region A. Instructor led, 
On-line<http://shapeblue.com/cloudstack-training/>
8th-12th December 2014, Region B. Instructor led, 
On-line<http://shapeblue.com/cloudstack-training/>

This email and any attachments to it may be confidential and are intended 
solely for the use of the individual to whom it is addressed. Any views or 
opinions expressed are solely those of the author and do not necessarily 
represent those of Shape Blue Ltd or related companies. If you are not the 
intended recipient of this email, you must neither take any action based upon 
its contents, nor copy or show it to anyone. Please contact the sender if you 
believe you have received this email in error. Shape Blue Ltd is a company 
incorporated in England & Wales. ShapeBlue Services India LLP is a company 
incorporated in India and is operated under license from Shape Blue Ltd. Shape 
Blue Brasil Consultoria Ltda is a company incorporated in Brasil and is 
operated under license from Shape Blue Ltd. ShapeBlue is a registered trademark.


RE: Frequent NFS not responding, timed out issue

2014-04-28 Thread Hung Nguyen The
Thanks Shanker. I don't have total control over the system and it lacks a proper 
monitoring system.

After some more investigation we believe the high latency is due to 
openvswitch, and some VMs are flooding packets. 
Some tuning has been done, such as increasing flow-eviction-threshold, and the 
network is now much more stable.
It would be great if someone could point me to information on improving the 
performance of Open vSwitch on XenServer 6.2.
Btw, OVS is version 1.4.6.
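
For anyone wanting to try the same tuning, here is a minimal sketch of how the flow eviction threshold might be raised. It assumes the setting is the flow-eviction-threshold key in the bridge's other-config column, as I recall it being documented for OVS 1.x releases, so please check the documentation for your version first. The bridge name xenbr0 and the value 10000 are placeholders, not values from this thread. The script only prints the command; nothing changes until you run it yourself.

```python
import shlex


def eviction_threshold_command(bridge, threshold):
    """Build (but do not run) an ovs-vsctl command that sets the threshold."""
    if threshold <= 0:
        raise ValueError("threshold must be a positive number of flows")
    return [
        "ovs-vsctl", "set", "bridge", bridge,
        # Assumed key name; verify against the docs for your OVS version.
        "other-config:flow-eviction-threshold=%d" % threshold,
    ]


# Placeholder bridge name and value; list your own bridges with `ovs-vsctl list-br`.
command = eviction_threshold_command("xenbr0", 10000)
print(" ".join(shlex.quote(part) for part in command))
```

Printing rather than executing keeps the sketch safe to run on any machine, including one without OVS installed.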





-Original Message-
From: Shanker Balan [mailto:shanker.ba...@shapeblue.com] 
Sent: Sunday, April 27, 2014 12:25 PM
To: CloudStack-Users
Subject: Re: Frequent NFS not responding, timed out issue

Comments inline.

On 26-Apr-2014, at 3:06 pm, Hung Nguyen The <hung...@saobacdau.vn> wrote:

> Hi Shanker,
> Thanks for your comment. I am trying to figure out why the network latency is 
> high.

I would like to know how you determined that the network latency is high. Did 
you do some kind of ping test? If so, does the latency change during peak / 
non-peak times?

Real metrics would be useful to identify patterns.

> It would be good to know whether the issue with 10Gbps is no longer the case.

Goes back to the availability of metrics. :)

> I am not sure whether there is a problem with the physical switch (IBM Flex 
> switch) or whether it has something to do with the network card driver.

At the risk of sounding like a broken record, do you have metrics for your 
network?

I usually deploy a central collectd poller to pull SNMP information for all 
devices and XenServers.

> I have attached the NIC info and driver version. I don’t know if this is the 
> latest version or not.

You will need to check XenServer site / Citrix for the details I guess.

Regards.
@shankerbalan

> With thanks,
> Nguyễn Thế Hùng



-Original Message-
From: Shanker Balan [mailto:shanker.ba...@shapeblue.com]
Sent: Friday, April 25, 2014 12:27 PM
To: CloudStack-Users
Subject: Re: Frequent NFS not responding, timed out issue

Hi Hung,

Comments inline.

On 25-Apr-2014, at 2:14 am, Hùng Nguyễn Thế <thehung.ngu...@live.com> wrote:

> Hi all,
> We are experiencing intermittent NFS connectivity from our XenServer 6.2 SP1 
> hosts with the latest patch.
> /var/log/kern.log shows exactly the same messages as the known issue in the 
> XenServer 6.0 Release Notes [1]:
> "kernel: nfs: server 10.0.0.1 not responding, timed out"
> Our environment is indeed 10Gbps; the storage is an IBM Storwize V7000.
> Hope someone can give us some hints on this matter.


Do you have any monitoring of the network itself running? SNMP information 
collected from the switches and the physical network interfaces of the various 
devices on the network would help to narrow down the issue.

Most likely there is saturation somewhere.
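
As a rough illustration of how SNMP data can show saturation, the sketch below turns two polls of an interface octet counter into an average utilisation figure. It assumes the 64-bit IF-MIB counters (ifHCInOctets / ifHCOutOctets) are being polled; the function name and the sample numbers are made up for illustration and are not measurements from this thread.

```python
COUNTER64_MODULUS = 2 ** 64  # ifHCInOctets / ifHCOutOctets are 64-bit counters


def utilisation_percent(octets_before, octets_after, interval_s, link_bps):
    """Average link utilisation between two counter samples, as a percentage."""
    if interval_s <= 0 or link_bps <= 0:
        raise ValueError("interval and link speed must be positive")
    # The modulo handles a counter that wrapped between the two polls.
    delta_octets = (octets_after - octets_before) % COUNTER64_MODULUS
    return delta_octets * 8 * 100.0 / (interval_s * link_bps)


# Illustrative numbers only: 60 s between polls on a 10 Gbps port.
print(utilisation_percent(1000000000, 61000000000, 60, 10 * 10 ** 9))
```

Averages over a long polling interval can hide short bursts, so a shorter interval is worth considering on the storage-facing ports.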



> We are in the process of buying a Citrix support subscription.
> Some additional information:
> nfsstat -m:
> rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,acregmin=0,
> acregmax=0,acdirmin=0,acdirmax=0,soft,proto=tcp,port=65535,timeo=133,
> retrans=0,sec=sys,mountport=65535,local_lock=none,addr=192.168.x.x
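
To make the mount options above easier to read, here is a small sketch that splits the option string into a dictionary and converts timeo to seconds. Per nfs(5), timeo is expressed in tenths of a second, so timeo=133 means the client waits 13.3 seconds for a reply. The helper names are my own and not part of any NFS tool.

```python
def parse_nfs_options(option_string):
    """Split a comma-separated NFS option string into a dict."""
    options = {}
    for item in option_string.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        # Flags such as "soft" or "rw" carry no value; store them as True.
        options[key] = value if sep else True
    return options


def timeo_seconds(options):
    """timeo is given in deciseconds (tenths of a second)."""
    return int(options["timeo"]) / 10


# Option string as posted in the thread (address left masked as in the original).
MOUNT_OPTIONS = (
    "rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,acregmin=0,"
    "acregmax=0,acdirmin=0,acdirmax=0,soft,proto=tcp,port=65535,timeo=133,"
    "retrans=0,sec=sys,mountport=65535,local_lock=none,addr=192.168.x.x"
)

opts = parse_nfs_options(MOUNT_OPTIONS)
print("soft mount:", opts.get("soft", False))
print("timeo (s):", timeo_seconds(opts))
print("retrans:", opts["retrans"])
```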

> I have captured a tcpdump and seen some symptoms such as TCP retransmissions 
> and "TCP ACKed unseen segment".
> The network seems okay; the only unusual thing is that the latency from 
> XenServer to the storage fluctuates, from 1 ms to several XX ms.
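
One way to put numbers on the fluctuating latency described above is to summarise the round-trip times from a ping run. The sketch below assumes Linux-style ping output ("time=1.02 ms"); the sample lines, the address 192.0.2.10 and the spike threshold of ten times the median are all illustrative choices, not data from this thread.

```python
import re
import statistics

RTT_PATTERN = re.compile(r"time[=<]([\d.]+) ?ms")


def parse_rtts(ping_output):
    """Extract round-trip times in milliseconds from Linux ping output."""
    return [float(match) for match in RTT_PATTERN.findall(ping_output)]


def summarise(rtts_ms, spike_factor=10.0):
    """Return min, median, max and the samples far above the median."""
    median = statistics.median(rtts_ms)
    spikes = [rtt for rtt in rtts_ms if rtt > spike_factor * median]
    return {"min": min(rtts_ms), "median": median,
            "max": max(rtts_ms), "spikes": spikes}


# Made-up sample output for illustration; not captured from the system in this thread.
SAMPLE = """\
64 bytes from 192.0.2.10: icmp_seq=1 ttl=64 time=1.02 ms
64 bytes from 192.0.2.10: icmp_seq=2 ttl=64 time=0.98 ms
64 bytes from 192.0.2.10: icmp_seq=3 ttl=64 time=35.4 ms
64 bytes from 192.0.2.10: icmp_seq=4 ttl=64 time=1.10 ms
"""

print(summarise(parse_rtts(SAMPLE)))
```

Running the same summary during peak and off-peak hours would show whether the spikes track load.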

Try to get access to the network-level metrics to debug further.

Regards.


--
@shankerbalan


RE: Frequent NFS not responding, timed out issue

2014-04-26 Thread Hung Nguyen The
Hi Shanker,

Thanks for your comment. I am trying to figure out why the network latency is 
high.

It would be good to know whether the issue with 10Gbps is no longer the case.

I am not sure whether there is a problem with the physical switch (IBM Flex 
switch) or whether it has something to do with the network card driver.

I have attached the NIC info and driver version. I don’t know if this is the 
latest version or not.



With thanks,

Nguyễn Thế Hùng






