Re: [Openstack] inter vm communication issue

2012-06-29 Thread Tom Sante
Hi,

I am a colleague of Bram working with him on these same systems. 
We are now experiencing other issues related to networking on our nodes:

- we gave OpenStack eth0 as the VLAN interface
- eth0 and eth1 are still slaves in bond0 (mode 6)
==> we are seeing a large number of dropped packets on the eth1 interface; 
under heavy load this makes the network on our VMs unstable

My guess is that using eth0 directly as the VLAN interface while it is still a 
slave in a bond is what creates these issues.
Or should this not create issues?
 
If so, while we managed to avoid the inter-VM communication issues by using eth0 
as the VLAN interface instead of our bond0 (= eth0+eth1), this still leaves open 
the question of why a bond interface would not function as the OpenStack VLAN 
interface.
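
To compare drop counts per slave interface, something like the following could 
help. It is only a minimal sketch, assuming the usual Linux /proc/net/dev 
layout (two header lines, then 8 receive and 8 transmit columns per 
interface); the function names are made up for illustration.

```python
def parse_drop_counters(proc_net_dev_text):
    """Return {interface: (rx_dropped, tx_dropped)} from /proc/net/dev text."""
    counters = {}
    # The first two lines of /proc/net/dev are column headers.
    for line in proc_net_dev_text.splitlines()[2:]:
        if ":" not in line:
            continue
        name, stats = line.split(":", 1)
        fields = stats.split()
        if len(fields) < 16:
            continue
        # Receive columns come first (bytes, packets, errs, drop, ...),
        # followed by the transmit columns in the same order.
        counters[name.strip()] = (int(fields[3]), int(fields[11]))
    return counters


def read_drop_counters(path="/proc/net/dev"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as handle:
        return parse_drop_counters(handle.read())
```

Calling read_drop_counters() twice, some seconds apart under load, and 
comparing the values for eth0, eth1 and bond0 should show which slave is 
actually dropping packets.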

Regards,

Tom

On Friday, 1 June 2012, at 15:28, Bram De Wilde wrote:
> The bond was the culprit!
> 
> As we have been breaking our heads over this for close to 2 days it seems 
> important enough to report here:
> 
> On our Ubuntu 12.04 systems we had 2 bonded interfaces configured with an IP 
> in 10.0.0.0/24 in adaptive load balancing mode. We used this mode 6 type of 
> bonding because switch-based bonding is not supported by the switch 
> administrator. This appears not to be compatible with VLAN-tagged multi-host 
> networking. @Vish: thanks for the suggestion, any idea where we would have to 
> post this issue as a bug? I guess not OpenStack but rather the ifenslave 
> people?
> I suspect this would not occur with other, switch-based bonding modes, but as 
> we have no support for those I am unable to test...
> This explains why the inter-VM communication was really unreliable and 
> dropped out after a while. With the eth0 interface instead of bond0 as the 
> VLAN interface, the network is now as stable as ever.
> 
> As happy OpenStack users we will now be configuring our private cloud for 
> stable operation in our department. Thanks all!
> 
> We will be working on a solution for the name resolution in vlan tagged 
> multi-host configurations, I will keep you posted as we progress.
> 
> Kind regards,
> 
> Bram
> 
> On 1-jun-2012, at 10:02, Vishvananda Ishaya wrote:
> 
> > 
> > On Jun 1, 2012, at 12:46 AM, Bram De Wilde wrote:
> > 
> > > Thanx Vish,
> > > 
> > > On the name resolution: would you consider this a bug (I can file one if 
> > > you would like) or a feature?
> > 
> > Bug if it is an easy fix :)
> > 
> > > Could this be fixed by changing the /usr/bin/nova-dhcpbridge script to 
> > > load all MAC, hostname, IP combinations from the database instead of just 
> > > those of the physical host? Or would this create other issues?
> > 
> > We would have to do some investigation into special settings. We want to 
> > make sure that the host doesn't respond to dhcp requests from other hosts. 
> > If it is possible to set up dnsmasq to do name resolution for the other 
> > hosts without handing ip addresses then we could do it this way. Someone 
> > will have to look into it. It might have to be something a little more 
> > complicated like writing out a hosts file in addition to the dhcp file and 
> > telling dnsmasq to use it. If you want to investigate the easiest way to 
> > configure dnsmasq to do this, that would be a big help.
> > 
> > Vish
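
The "hosts file in addition to the dhcp file" idea could look roughly like 
this. It is a sketch only: the function name and the sample addresses are made 
up, and how the (IP, hostname) pairs would be pulled from the nova database is 
not shown. dnsmasq has an --addn-hosts=<file> option for reading an extra 
hosts file; whether that fits the way nova-network starts dnsmasq is an 
untested assumption.

```python
def render_hosts_file(instances):
    """Render /etc/hosts-style lines from (ip_address, hostname) pairs."""
    # One "IP<TAB>hostname" entry per instance, sorted for stable output.
    return "".join(
        f"{ip_address}\t{hostname}\n"
        for ip_address, hostname in sorted(instances)
    )


# Hypothetical instances living on other compute hosts.
example = render_hosts_file([("10.0.0.5", "vm-b"), ("10.0.0.4", "vm-a")])
```

Because such a file only carries names and addresses, dnsmasq would answer 
DNS queries for those hosts without being given DHCP host entries for them.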
> 
> 
> ___
> Mailing list: https://launchpad.net/~openstack
> Post to : openstack@lists.launchpad.net (mailto:openstack@lists.launchpad.net)
> Unsubscribe : https://launchpad.net/~openstack
> More help : https://help.launchpad.net/ListHelp






Re: [Openstack] instances loosing IP address while running, due to No DHCPOFFER

2012-06-27 Thread Tom Sante
Hey,

I seem to have the same issue with our VMs. I commented (comment #7) on a bug 
report that seems to correspond to our DHCP issues: 
https://bugs.launchpad.net/nova/+bug/887162

Please report if you are still affected by this issue on the bug page so the 
developers can look into a fix.

Regards,


On Saturday, 16 June 2012, at 01:19, Christian Parpart wrote:

> Hey all,
> 
> It just happened twice again, both times today, the last at 22:00 UTC, with 
> the following in the nova-network node's syslog:
> 
> root@gw1:/var/log# grep 'dnsmasq.*10889' daemon.log
> Jun 15 17:39:32 cesar1 dnsmasq[10889]: started, version v2.62-7-g4ce4f37 cachesize 150
> Jun 15 17:39:32 cesar1 dnsmasq[10889]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack
> Jun 15 17:39:32 cesar1 dnsmasq-dhcp[10889]: DHCP, static leases only on 10.10.40.3, lease time 3d
> Jun 15 17:39:32 cesar1 dnsmasq[10889]: reading /etc/resolv.conf
> Jun 15 17:39:32 cesar1 dnsmasq[10889]: using nameserver 4.2.2.1#53
> Jun 15 17:39:32 cesar1 dnsmasq[10889]: using nameserver 178.63.26.173#53
> Jun 15 17:39:32 cesar1 dnsmasq[10889]: using nameserver 192.168.2.122#53
> Jun 15 17:39:32 cesar1 dnsmasq[10889]: using nameserver 192.168.2.121#53
> Jun 15 17:39:32 cesar1 dnsmasq[10889]: read /etc/hosts - 519 addresses
> Jun 15 17:39:32 cesar1 dnsmasq-dhcp[10889]: read /var/lib/nova/networks/nova-br100.conf
> Jun 15 21:59:41 cesar1 dnsmasq-dhcp[10889]: DHCPREQUEST(br100) 10.10.40.16 fa:16:3e:3d:ff:f3
> Jun 15 21:59:41 cesar1 dnsmasq-dhcp[10889]: DHCPACK(br100) 10.10.40.16 fa:16:3e:3d:ff:f3 redis-appdata1
> 
> It seems that this one VM was the only one that sent a DHCP request over the 
> past 5 hours, and that first one was answered with a DHCP ACK, and that is it.
> That was the time the host behind that IP (redis-appdata1) stopped 
> functioning.
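
To check which instances got an answer and which did not, the dnsmasq-dhcp 
lines can be grouped per MAC address. A rough sketch, assuming the log format 
shown above (message type, interface in parentheses, then IP/MAC/hostname); 
the function name is made up for illustration.

```python
import re
from collections import defaultdict

# Matches e.g. "dnsmasq-dhcp[10889]: DHCPACK(br100) 10.10.40.16 fa:16:... name"
EVENT_RE = re.compile(r"dnsmasq-dhcp\[\d+\]: (DHCP[A-Z]+)\((\S+)\) (.*)$")
MAC_RE = re.compile(r"\b(?:[0-9a-f]{2}:){5}[0-9a-f]{2}\b", re.IGNORECASE)


def dhcp_events_by_mac(log_lines):
    """Map each MAC address to the ordered list of DHCP message types seen."""
    events = defaultdict(list)
    for line in log_lines:
        match = EVENT_RE.search(line.rstrip())
        if not match:
            continue  # startup and config lines carry no DHCP message type
        message_type, _interface, rest = match.groups()
        mac = MAC_RE.search(rest)
        if mac:
            events[mac.group(0).lower()].append(message_type)
    return dict(events)
```

Fed with the lines of daemon.log, a MAC whose list ends in a DHCPDISCOVER or 
DHCPREQUEST with no following DHCPOFFER or DHCPACK would be the first place to 
look.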
> 
> However, I have now updated dnsmasq on our gateway node to the latest trunk 
> of the dnsmasq git repository, killed dnsmasq, and restarted nova-network 
> (which auto-starts dnsmasq per device).
> 
> I really hoped that the bug addressed by this one particular fix was the 
> cause of the downtime, but apparently there MIGHT be another factor.
> 
> There is unfortunately nothing to read in the VM's syslog.
> What else could cause the VM to forget its IP?
> Can this also be caused by send_arp_for_ha=True?
> 
> Regards,
> Christian.
> 
> On Fri, Jun 15, 2012 at 2:50 AM, Nathanael Burton wrote:
> > FWIW I haven't run across the dnsmasq bug in our environment using EPEL 
> > packages. 
> > Nate
> > On Jun 14, 2012 7:20 PM, "Vishvananda Ishaya" wrote:
> > > Are you running in VLAN mode? If so, you probably need to update to a new 
> > > version of dnsmasq. See this message for reference:
> > > 
> > > http://osdir.com/ml/openstack-cloud-computing/2012-05/msg00785.html 
> > > 
> > > Vish
> > > 
> > > On Jun 14, 2012, at 1:41 PM, Christian Parpart wrote:
> > > > Hey all,
> > > > 
> > > > I feel really sad saying this: now that we have had quite a few 
> > > > instances in production for at least about 5 days, I have encountered 
> > > > the second instance losing its IP address due to "No DHCPOFFER" 
> > > > (according to the syslog in the instance).
> > > > 
> > > > I checked the logs on the central nova-network and gateway node and 
> > > > found that dnsmasq still replies to requests from all the other 
> > > > instances. As far as I can tell so far (I'm investigating and will post 
> > > > logs asap), it even got the request from the instance in question and 
> > > > sent an OFFER. But while dnsmasq seems to send an offer, the instance 
> > > > says it didn't receive one - wtf?
> > > > 
> > > > Please tell me what I can do to actually *fix* this issue, since this 
> > > > is a really critical problem.
> > > > 
> > > > One option I see (as a workaround) is to let a newly created instance 
> > > > retrieve its IP via DHCP, but then reconfigure /etc/network/interfaces 
> > > > to continue with a static networking setup. However, I'd rather just 
> > > > get the DHCP issue fixed.
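
For the static workaround, the stanza could be generated from the address the 
instance was handed via DHCP. A minimal sketch, assuming the Debian/Ubuntu 
ifupdown format of /etc/network/interfaces; the function name, the /24 prefix 
and the gateway address in the usage note are assumptions, not values taken 
from the setup described here.

```python
import ipaddress


def static_interfaces_stanza(interface, cidr, gateway):
    """Build a static ifupdown stanza from an address in CIDR notation."""
    # ip_interface keeps both the host address and its network,
    # so the dotted netmask can be derived from the prefix length.
    iface = ipaddress.ip_interface(cidr)
    return (
        f"auto {interface}\n"
        f"iface {interface} inet static\n"
        f"    address {iface.ip}\n"
        f"    netmask {iface.network.netmask}\n"
        f"    gateway {gateway}\n"
    )
```

For example, static_interfaces_stanza("eth0", "10.10.40.16/24", "10.10.40.1") 
would give a stanza for the redis-appdata1 address with a hypothetical gateway. 
The instance would then no longer depend on lease renewals, at the cost of 
nova no longer controlling its address.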
> > > > 
> > > > I'm very open to any kind of helpful comments :)
> > > > 
> > > > So long,
> > > > Christian.
> > > > 