Re: [openstack-dev] [Infra][nova][magnum] Jenkins failed quite often for "Cannot set up guest memory 'pc.ram': Cannot allocate memory"

2015-12-13 Thread pcrews

Hi,

OVH is a new cloud provider for openstack-infra nodes:
http://www.openstack.org/blog/2015/12/announcing-a-new-cloud-provider-for-openstacks-ci-system-ovh/

It appears that which provider a job's nodes come from is simply a matter of
luck:
"When a developer uploads a proposed change to an OpenStack project, 
available instances from any of our contributing cloud providers will be 
used interchangeably to test it."


You might want to ping people in #openstack-infra to find a point of 
contact for them (OVH) and/or to work with the infra folks directly to 
see about troubleshooting this further.



On 12/12/2015 02:16 PM, Hongbin Lu wrote:

Hi,

As Kai Qiang mentioned, the magnum gate recently had a bunch of random
failures, which occur when creating a nova instance with 2G of RAM.
According to the error message, it seems that the hypervisor tried to
allocate memory for the nova instance but couldn't find enough free
memory on the host. However, by adding a few “nova hypervisor-show XX”
calls before, during, and right after the test, it showed that the host had 6G
of free RAM, which is far more than 2G. Here is a snapshot of the output
[1]. You can find the full log here [2].
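
(For anyone who wants to run the same check by hand, it was roughly the
following -- the hypervisor name is a placeholder and the grep just pulls
out the memory fields:)

    # find the hypervisor backing the test node, then check its memory counters
    nova hypervisor-list
    nova hypervisor-show <hypervisor-hostname> | grep -E 'memory_mb|memory_mb_used|free_ram_mb'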

Another observation is that most of the failures happened on nodes named
“devstack-trusty-ovh-*” (you can verify this by entering the query [3]
at http://logstash.openstack.org/ ). It seems that the jobs are fine
if they are allocated to a node other than “ovh”.
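
(The exact query is in [3]; as an illustration, something along these lines
pasted into the logstash.openstack.org search box shows the failure, and
faceting on the node/provider field shows the ovh correlation -- field names
as used by the infra Elasticsearch index may differ slightly:)

    message:"Cannot set up guest memory 'pc.ram': Cannot allocate memory"
    AND build_name:"gate-functional-dsvm-magnum*"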

Any hints to debug this issue further? Suggestions are greatly appreciated.



[1] http://paste.openstack.org/show/481746/

[2]
http://logs.openstack.org/48/256748/1/check/gate-functional-dsvm-magnum-swarm/56d79c3/console.html

[3] https://review.openstack.org/#/c/254370/2/queries/1521237.yaml

Best regards,

Hongbin

*From:* Kai Qiang Wu [mailto:wk...@cn.ibm.com]
*Sent:* December-09-15 7:23 AM
*To:* openstack-dev@lists.openstack.org
*Subject:* [openstack-dev] [Infra][nova][magnum] Jenkins failed quite
often for "Cannot set up guest memory 'pc.ram': Cannot allocate memory"

Hi All,

I am not sure what has changed recently, but we have found that Jenkins now
fails quite often with:


http://logs.openstack.org/07/244907/5/check/gate-functional-dsvm-magnum-k8s/5305d7a/logs/libvirt/libvirtd.txt.gz

2015-12-09 08:52:27.892+0000: 22957: debug : qemuMonitorJSONCommandWithFd:264 : Send command '{"execute":"qmp_capabilities","id":"libvirt-1"}' for write with FD -1
2015-12-09 08:52:27.892+0000: 22957: debug : qemuMonitorSend:959 : QEMU_MONITOR_SEND_MSG: mon=0x7fa66400c6f0 msg={"execute":"qmp_capabilities","id":"libvirt-1"} fd=-1
2015-12-09 08:52:27.941+0000: 22951: debug : virNetlinkEventCallback:347 : dispatching to max 0 clients, called from event watch 6
2015-12-09 08:52:27.941+0000: 22951: debug : virNetlinkEventCallback:360 : event not handled.
2015-12-09 08:52:27.941+0000: 22951: debug : virNetlinkEventCallback:347 : dispatching to max 0 clients, called from event watch 6
2015-12-09 08:52:27.941+0000: 22951: debug : virNetlinkEventCallback:360 : event not handled.
2015-12-09 08:52:27.941+0000: 22951: debug : virNetlinkEventCallback:347 : dispatching to max 0 clients, called from event watch 6
2015-12-09 08:52:27.941+0000: 22951: debug : virNetlinkEventCallback:360 : event not handled.
2015-12-09 08:52:28.070+0000: 22951: error : qemuMonitorIORead:554 : Unable to read from monitor: Connection reset by peer
2015-12-09 08:52:28.070+0000: 22951: error : qemuMonitorIO:690 : internal error: early end of file from monitor: possible problem:
Cannot set up guest memory 'pc.ram': Cannot allocate memory

Re: [openstack-dev] [heat] heat delete woes in Juno

2015-03-26 Thread pcrews

Regarding item #3:
I have mainly seen this issue on stacks that have been snapshotted:
https://bugs.launchpad.net/heat/+bug/1412965

In such cases, the only way to avoid it (afaik) is for the owner to 
manually delete the snapshots prior to deleting the stack.  Heat otherwise 
tries to auto-delete the snapshots and hangs.
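
A rough sketch of that manual cleanup, assuming the snapshot subcommands 
in your python-heatclient version (the stack name and snapshot ID are 
placeholders):

    # delete any snapshots first, then delete the stack itself
    heat snapshot-list my-stack
    heat snapshot-delete my-stack <snapshot-id>
    heat stack-delete my-stack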


On 03/26/2015 11:17 AM, Matt Fischer wrote:

Nobody on the operators list had any ideas on this, so re-posting here.

We've been having some issues with heat delete-stack in Juno. The issues
generally fall into three categories:

1) It takes multiple calls to heat to delete a stack, presumably due
to heat being unable to figure out the deletion ordering while
resources are still in use.

2) Undeletable stacks: stacks that refuse to delete and get stuck in the
DELETE_FAILED state. In this case they show up in stack-list and
stack-show, yet resource-list and stack-delete deny their existence.
This makes it hard to be sure whether they still hold any real resources.

3) As a corollary to item 1, stacks for which heat can never unwind the
dependencies and which stay in DELETE_IN_PROGRESS forever.

Does anyone have any work-arounds for these or recommendations on
cleanup? My main worry is removing a stack from the database while it is
still consuming the customer's resources. I also don't want to just
remove stacks and leave orphaned records behind in the database.


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [qa] How to delete a VM which is in ERROR state?

2014-12-12 Thread pcrews

On 12/09/2014 03:54 PM, Ken'ichi Ohmichi wrote:

Hi,

This case is always tested by Tempest on the gate.

https://github.com/openstack/tempest/blob/master/tempest/api/compute/servers/test_delete_server.py#L152

So I guess this problem wouldn't happen on the latest version at least.

Thanks
Ken'ichi Ohmichi

---

2014-12-10 6:32 GMT+09:00 Joe Gordon joe.gord...@gmail.com:



On Sat, Dec 6, 2014 at 5:08 PM, Danny Choi (dannchoi) dannc...@cisco.com
wrote:


Hi,

I have a VM which is in ERROR state.


+--------------------------------------+----------------------------------------------+--------+------------+-------------+----------+
| ID                                   | Name                                         | Status | Task State | Power State | Networks |
+--------------------------------------+----------------------------------------------+--------+------------+-------------+----------+
| 1cb5bf96-619c-4174-baae-dd0d8c3d40c5 | cirros--1cb5bf96-619c-4174-baae-dd0d8c3d40c5 | ERROR  | -          | NOSTATE     |          |
+--------------------------------------+----------------------------------------------+--------+------------+-------------+----------+


I tried in both CLI “nova delete” and Horizon “terminate instance”.
Both accepted the delete command without any error.
However, the VM never got deleted.

Is there a way to remove the VM?



What version of nova are you using? This is definitely a serious bug; you
should be able to delete an instance in the ERROR state. Can you file a bug
that includes steps to reproduce it along with all relevant logs?

bugs.launchpad.net/nova




Thanks,
Danny



Hi,

I've encountered this in my own testing and have found that it appears 
to be tied to libvirt.


When I hit this, reset-state as the admin user reports success (and the 
state is set), *but* things aren't really working as advertised, and 
subsequent attempts to do anything with the errant VMs will send them 
right back into 'FLAIL' / can't-delete / endless-DELETING mode.


Restarting libvirt-bin on my machine fixes this - after the restart, the 
deleting VMs are properly wiped without any further user input to 
nova/horizon and all seems right in the world.
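
For reference, the sequence that worked for me looks roughly like this 
(Ubuntu 14.04 service name; not a guaranteed fix elsewhere, and the 
instance ID is a placeholder):

    # as admin: clear the stuck state, bounce libvirt, then retry the delete if needed
    nova reset-state --active <instance-id>
    sudo service libvirt-bin restart
    nova delete <instance-id>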


using:
devstack
ubuntu 14.04
libvirtd (libvirt) 1.2.2

triggered via:
lots of random create/reboot/resize/delete requests of varying validity 
and sanity.
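
(Purely as an illustration of the kind of loop involved -- a much tamer 
sketch than the real thing, with placeholder image/flavor names, and with 
races and failures being part of the point:)

    # hammer nova with overlapping lifecycle operations; many calls will race or fail by design
    for i in $(seq 1 50); do
        nova boot --image cirros --flavor m1.tiny stress-$i
        nova reboot stress-$i
        nova resize stress-$i m1.small
        nova delete stress-$i
    done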


Am in the process of cleaning up my test code so as not to hurt anyone's 
brain with the ugly and will file a bug once done, but thought this 
worth sharing.


Thanks,
Patrick

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Proposal new hacking rules

2014-11-24 Thread pcrews

On 11/24/2014 09:40 AM, Ben Nemec wrote:

On 11/24/2014 08:50 AM, Matthew Gilliard wrote:

1/ assertFalse() vs assertEqual(x, False) - these are semantically
different because of python's notion of truthiness, so I don't think
we ought to make this a rule.

2/ expected/actual - incorrect failure messages have cost me more time
than I should admit to. I don't see any reason not to try to improve
in this area, even if it's difficult to automate.


Personally I'd rather kill the expected, actual ordering and just have
first, second or something that doesn't imply which value is which.
Because it can't be automatically enforced, we'll _never_ fix all of the
expected, actual mistakes (and will continually introduce new ones), so
I'd prefer to eliminate the confusion by not requiring a specific ordering.


++.  It should be a part of review to ensure that the test (including 
error messages) makes sense.  Simply having a (seemingly costly to 
implement and enforce) rule stating that something must adhere to a 
pattern does not guarantee that.


assertEqual(expected, actual, msg="nom nom nom cookie cookie yum") 
matches the pattern, but the message still doesn't necessarily provide 
much of value.


Focusing on making tests informative and clear about what is thought to 
be broken on failure seems to be the better target (imo).




Alternatively I suppose we could require kwargs for expected and actual
in assertEqual.  That would at least make it more obvious when someone
has gotten it backward, but again that's a ton of code churn for minimal
gain IMHO.



3/ warn{ing} - 
https://github.com/openstack/nova/blob/master/nova/hacking/checks.py#L322

On the overarching point: There is no way to get started with
OpenStack, other than starting small.  My first ever patch (a tidy-up)
was rejected for being trivial, and that was confusing and
disheartening. Nova has a lot on its plate, sure, and plenty of
pending code reviews.  But there is also a lot of inconsistency and
unloved code which *is* worth fixing, because a tidy codebase is a joy
to work with, *and* these changes are ideal to bring new reviewers and
developers into the project.

Linus' post on this from the LKML is almost a decade old (!) but worth reading.
https://lkml.org/lkml/2004/12/20/255

   MG





___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] database hoarding

2014-10-30 Thread pcrews

On 10/30/2014 03:30 PM, Abel Lopez wrote:

It seems that every release there is more and more emphasis on upgradability. 
This is a good thing; I'd love to see production users easily go from old to 
new.

As an operator, I've seen first hand the results of neglecting the databases 
that openstack creates. If we intend to have users go year-over-year with 
upgrades, we're going to expect them to carry huge databases around.

Just in my lab, I have over 10 deleted instances in the last two months.
Frankly, I'm not convinced that simply archiving deleted rows is the best idea. 
Sure, it gets your production databases and tables to a manageable size, but 
you're simply hoarding old data.

As an operator, I'd prefer time-based criteria over a number of rows, too.
I envision something like `nova-manage db purge [days]`, where we leave it 
up to the administrator to decide how much of their old data (if any) they'd be 
OK losing.

Think about data destruction guidelines too, some companies require old data be 
destroyed when not needed, others require maintaining it.
We can easily provide both here.

I've drafted a simple blueprint:
https://blueprints.launchpad.net/nova/+spec/database-purge

I'd love some input from the community.




HP's LBaaS code (libra) uses something similar for the reasons you 
state - 
http://libra.readthedocs.org/en/latest/admin_api/schedulers.html#expunge-scheduler


The admin-api code would go through and wipe any records that were older 
than the --expire-days parameter, although this was more of an automated 
process vs. a user-triggered function.


++ on the notion that this would be a useful and integrated 
quality-of-life tool for operations. Am in favor.
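
To make that concrete, the kind of thing an operator could wire up would be 
roughly the following. Note that `nova-manage db purge` here is the command 
*proposed* in the blueprint and does not exist yet; `archive_deleted_rows` 
does exist today:

    # existing tooling: move soft-deleted rows into shadow tables in batches
    nova-manage db archive_deleted_rows --max_rows 10000
    # proposed in the blueprint (not yet implemented): drop data older than 90 days
    nova-manage db purge 90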


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Contribution work flow

2014-09-13 Thread pcrews

Shar,

Hi!

1)  Install git-review and set it up (poke around the OpenStack docs).
2)  After crafting your patch on a new branch (git checkout -b 
name-of-branch-you-are-working-on), commit the changes (git commit -a), 
craft a commit message, save it, and then run git review.
If everything is set up correctly, the change will be pushed to OpenStack's 
Gerrit (review.openstack.org), where you can track CI testing + reviews and such.
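
If it helps, the whole flow in one place looks roughly like this (the branch 
name is just an example):

    # one-time setup in your local clone of the project
    pip install git-review
    git review -s                      # sets up the gerrit remote and the Change-Id commit hook

    # per-change workflow
    git checkout -b my-topic-branch    # do your work on a topic branch
    # ... edit files ...
    git commit -a                      # write a good commit message; the hook appends a Change-Id
    git review                         # pushes the change to review.openstack.org for CI and review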


Also - cool to see you working on OpenStack!

Cheers,
Patrick

On 09/13/2014 03:09 AM, Sharan Kumar M wrote:


Hi,

I am about to submit my first patch. I saw the contribution guidelines
in the documentation. Just to make it clear, is it that I issue a pull
request on GitHub, which automatically pushes my patch to Gerrit? Also,
I found something called Change-Id in the commit message. Is it the hash
code for the git commit? If yes, should we prefix an 'I' at the beginning
of the hash code?

Thanks,
Sharan Kumar M






___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][neutron][mysql] IMPORTANT: MySQL Galera does *not* support SELECT ... FOR UPDATE

2014-05-20 Thread pcrews

On 05/20/2014 10:07 AM, Jay Pipes wrote:

On 05/19/2014 02:32 PM, sridhar basam wrote:

On Mon, May 19, 2014 at 1:30 PM, Jay Pipes jaypi...@gmail.com
mailto:jaypi...@gmail.com wrote:

Stackers,

On Friday in Atlanta, I had the pleasure of moderating the database
session at the Ops Meetup track. We had lots of good discussions and
heard important feedback from operators on DB topics.

For the record, I would not bring this point up so publicly unless I
believed it was a serious problem affecting a large segment of
users. When doing an informal survey of the users/operators in the
room at the start of the session, out of approximately 200 people in
the room, only a single person was using PostgreSQL, about a dozen
were using standard MySQL master/slave replication, and the rest
were using MySQL Galera clustering. So, this is a real issue for a
large segment of the operators -- or at least the ones at the
session. :)


​We are one of those operators that use Galera for replicating our mysql
databases. We used to see issues with deadlocks when having multiple
mysql writers in our mysql cluster. As a workaround we have our haproxy
configuration in an active-standby configuration for our mysql VIP.

I seem to recall we had a lot of the deadlocks happen through Neutron.
When we go through our Icehouse testing, we will redo our multimaster
mysql setup and provide feedback on the issues we see.


Thanks very much, Sridhar, much appreciated.

This issue was raised at the Neutron IRC meeting yesterday, and we've
agreed to take a staged approach. We will first work on documentation to
add to the operations guide that explains the issues (and the tradeoffs
of going to a single-writer cluster configuration vs. just having the
clients retry some request). Later stages will work on a non-locking
quota-management algorithm, possibly in conjunction with Climate, and
looking into how to use coarser-grained file locks or a distributed lock
manager for handling cross-component deterministic reads in Neutron.

Best,
-jay



Am late to this topic, but wanted to share this in case anyone wants to 
read further on this behavior with Galera - 
http://www.mysqlperformanceblog.com/2012/08/17/percona-xtradb-cluster-multi-node-writing-and-unexpected-deadlocks/
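
For illustration, the active-standby arrangement Sridhar describes usually 
amounts to marking all but one Galera node as a backup server in the haproxy 
backend. A minimal sketch with placeholder addresses -- not anyone's actual 
config:

    listen mysql-galera
        bind 192.0.2.10:3306
        mode tcp
        option tcpka
        option mysql-check user haproxy_check
        server db1 10.0.0.11:3306 check
        server db2 10.0.0.12:3306 check backup
        server db3 10.0.0.13:3306 check backup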


--
patrick


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev