Hi,

Our [Fault-Genes WG] has been working on defining fault classifications for key OpenStack projects, in an effort to support OpenStack fault management and self-healing. We have been using unsupervised machine learning to look into all the bugs and issues submitted by the community, and it has proven very challenging to have the machine define the classification entirely on its own. We have therefore decided to go with a supervised data set. To do this, we first need to build our training data.
We need your help to generate the training data set. Basically, we only need 2 or 3 unique fault classifications, each with a short description and the associated mitigations, from each member who is familiar with OpenStack design and operation. This way we can build a focused library of faults and mitigations for each project. Once this data is accumulated, we will develop our own specific algorithms that can be applied to all future OpenStack issues. Thanks in advance for your support.

No.  Project  Fault Classification  Description  Root Cause  Mitigation
1
2
3

Below are examples of what a couple of Neutron developers have provided. I am sure there are other types of fault classifications in Neutron that have not been captured in this table.

Fault Classification: Network Connectivity Issues
  Root Cause: Virtual interface in the VM admin down
  Mitigation: Un-shut the virtual interface

  Root Cause: Virtual interface does not have IP address via DHCP
  Mitigation: Depends on lower level root cause

  Root Cause: Virtual network does not have interface to the router
  Mitigation: Add virtual network as one of the router interfaces

  Root Cause: vNIC port of VM not active (stuck in build)
  Mitigation: Depends on lower level root cause

  Root Cause: Security group lock in traffic
  Mitigation: Fix the security group to allow relevant traffic

Fault Classification: Unable to Add Port to Bridge
  Root Cause: Libvirtd in AppArmor is blocking
  Mitigation: Allow Libvirtd profile in AppArmor

Fault Classification: No Valid Host Found/insufficient hypervisor resources
  Root Cause: Compute nodes do not have sufficient resources
  Mitigation: Free up required compute, storage and memory resources on compute node

Fault Classification: No Resource
  Root Cause: Configuration issues
  Mitigation: Change config setting

Fault Classification: Authentication/permissions error
  Root Cause: Configuration error such as port # or password
  Mitigation: Make sure end points are properly configured

Fault Classification: Gateway access
  Root Cause: Not reachable
  Mitigation: Use custom keep-alive health-check

Fault Classification: Design issue of OpenStack
  Root Cause: Network node
  Mitigation: Out of band health checking mechanism

Fault Classification: Security Group Mis-configuration
  Root Cause: The security group
  Mitigation: Change security rules/Programming the security group

Fault Classification: DNS
  Root Cause: Attack
  Mitigation: Implement CERT alerts updates

Fault Classification: Network design issue
  Root Cause: Network storm
  Mitigation: Reduce L2 broadcast domain

Nemat
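
For anyone who would rather submit entries in machine-readable form, here is a minimal sketch of how one labeled training record might look. The field names simply mirror the template columns in this message; the class name FaultRecord, the helper to_csv, and the CSV layout are assumptions made for illustration, not a format the WG has defined. The two sample rows are copied from the Neutron examples, with the description left empty because those examples do not include one.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class FaultRecord:
    """One labeled training example (hypothetical structure)."""
    project: str
    fault_classification: str
    description: str
    root_cause: str
    mitigation: str


def to_csv(records):
    """Serialize records to CSV text: a header row, then one row per fault."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=[f.name for f in fields(FaultRecord)],
        lineterminator="\n",
    )
    writer.writeheader()
    for rec in records:
        writer.writerow(asdict(rec))
    return buf.getvalue()


# Sample rows taken from the Neutron examples in this message.
examples = [
    FaultRecord(
        project="Neutron",
        fault_classification="Network Connectivity Issues",
        description="",
        root_cause="Virtual network does not have interface to the router",
        mitigation="Add virtual network as one of the router interfaces",
    ),
    FaultRecord(
        project="Neutron",
        fault_classification="No Resource",
        description="",
        root_cause="Configuration issues",
        mitigation="Change config setting",
    ),
]

print(to_csv(examples))
```

Keeping the fault classification as its own column means it can later serve directly as the label for supervised training, with the description and root cause as the input text.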
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
