Re: [rdo-dev] tripleo cluster failure

Rahul Pathak Tue, 28 Jul 2020 04:58:23 -0700

Reminder!! Please help. 
------------------------------- 
Hi, 

Is there a script I can run to find out exactly which resource is running
short?


Below is the memory and disk utilization of controller: 

[heat-admin@overcloud-controller-0 ~]$ free -m
              total        used        free      shared  buff/cache   available
Mem:         128722       25632       92042          70       11047      102377
Swap:             0           0           0
[heat-admin@overcloud-controller-0 ~]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs         63G     0   63G   0% /dev
tmpfs            63G   39M   63G   1% /dev/shm
tmpfs            63G   27M   63G   1% /run
tmpfs            63G     0   63G   0% /sys/fs/cgroup
/dev/sda2       1.9T   12G  1.9T   1% /
tmpfs            13G     0   13G   0% /run/user/0
tmpfs            13G     0   13G   0% /run/user/1000

The same memory and disk are available on all three controllers.
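
As a rough way to turn the `free -m` output above into a single number, the
sketch below parses the table and reports the fraction of RAM still available.
The function names (`parse_free_m`, `memory_headroom`) are made up for this
illustration and are not part of any TripleO tooling; the sample text is the
output pasted above.

```python
def parse_free_m(output):
    """Parse `free -m` output into {row_label: {column: MiB}}."""
    lines = [ln for ln in output.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    table = {}
    for line in lines[1:]:
        label, *values = line.split()
        # The Swap row has fewer columns; zip() stops at the shorter sequence.
        table[label.rstrip(":")] = dict(zip(header, map(int, values)))
    return table


def memory_headroom(output):
    """Return the fraction of total RAM reported as available."""
    mem = parse_free_m(output)["Mem"]
    return mem["available"] / mem["total"]


SAMPLE = """\
              total        used        free      shared  buff/cache   available
Mem:         128722       25632       92042          70       11047      102377
Swap:             0           0           0
"""

if __name__ == "__main__":
    print(f"available memory: {memory_headroom(SAMPLE):.1%}")
```

For the figures above, this works out to roughly 80% of RAM still available,
which suggests memory was not under pressure at the moment the snapshot was
taken. It says nothing about usage at the time of the failure.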

On the same environment, when I installed the overcloud with the Red Hat repos
and Red Hat overcloud images, I did not face this issue. I tested almost 500
projects and 500 networks (one network per project) on the same environment
with Red Hat, and it worked without any issue or cluster failure. But when I
use the CentOS 7 TripleO repos, it happens again and again.


Regards 
Rahul Pathak 
i2k2 Networks (P) Ltd. | Spring Meadows Business Park 
A61-B4 & 4A First Floor, Sector 63, Noida - 201 301 
ISO/IEC 27001:2005 & ISO 9001:2008 Certified 

----- Original Message -----

From: "Alfredo Moralejo Alonso" <[email protected]> 
To: "Arkady Shtempler" <[email protected]> 
Cc: "Rahul Pathak" <[email protected]>, "RDO Developmen List" 
<[email protected]> 
Sent: Thursday, July 23, 2020 5:40:42 PM 
Subject: Re: [rdo-dev] tripleo cluster failure 



On Thu, Jul 23, 2020 at 2:05 PM Arkady Shtempler < [email protected] > wrote: 



Hi all! 

Rahul - there is nothing relevant in the attached file. You probably
executed LogTool on a "working environment", so there is nothing interesting in
it.
I think you should have mentioned the error we detected in an already
"crashed" environment, just as I suggested.




Yes, nothing interesting in the attached logs. 

<blockquote>


Alfredo - this error was logged on almost every OC node at the same time the
problems started.


    * hyp-0

    * ------------------------------ LogPath:
      /var/log/containers/neutron/openvswitch-agent.log.1
      ------------------------------

    * IsTracebackBlock:False

    * UniqueCounter:1

    * AnalyzedBlockLinesSize:18

    * 26712-2020-07-21 20:07:54.604 54410 INFO
      neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
      [req-a2173b45-f18b-45f9-90d8-cb8ab1754332 - -...<--LogTool-LINE IS TOO LONG!

    * 26713-2020-07-21 20:07:54.605 54410 INFO
      neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
      [req-a2173b45-f18b-45f9-90d8-cb8ab1754332 - -...<--LogTool-LINE IS TOO LONG!

    * 26714:2020-07-21 20:07:56.363 54410 ERROR oslo.messaging._drivers.impl_rabbit
      [-] [2f214f0b-84d0-49d4-bcf4-477565903585] AMQP server
      overcloud-control...<--LogTool-LINE IS TOO LONG!

    * 26715:2020-07-21 20:07:56.364 54410 ERROR oslo.messaging._drivers.impl_rabbit
      [-] [664475bc-3b39-4ce5-a60e-f010b8d5201d] AMQP server
      overcloud-control...<--LogTool-LINE IS TOO LONG!

    * 26716:2020-07-21 20:07:56.364 54410 ERROR oslo.messaging._drivers.impl_rabbit
      [-] [68a0ab43-a216-43fe-aa58-860d3dc5e69e] AMQP server
      overcloud-control...<--LogTool-LINE IS TOO LONG!

    * ...

    * ...

    * ...

    * LogTool --> THIS BLOCK IS TOO LONG!

    * LogTool --> POTENTIAL BLOCK'S ISSUES:

    * 26714:2020-07-21 20:07:56.363 54410 ERROR oslo.messaging._drivers.impl_rabbit
      [-] [2f214f0b-84d0-49d4-bcf4-477565903585] AMQP server
      overcloud-controller-2.inter...

    * ection. Check login credentials: Socket closed: IOError: Socket closed

    * no 111] ECONNREFUSED. Trying again in 1 seconds.: error: [Errno 111]
      ECONNREFUSED

    * rnalapi.i2k2cloud02.com:5672 is unreachable: <AMQPError: unknown error>. Trying
      again in 1 seconds.: RecoverableConnectionError: <AMQPError: unknown error>



</blockquote>
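
To see when such errors start and which logger emits them, one option is to
count ERROR lines per minute in the raw log file. The sketch below is only an
illustration: it assumes the "date time pid LEVEL logger" line layout seen in
the excerpt above (without the line-number prefix that LogTool adds), and the
names `LINE_RE` and `errors_per_minute` are hypothetical.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above:
#   2020-07-21 20:07:56.363 54410 ERROR oslo.messaging._drivers.impl_rabbit ...
LINE_RE = re.compile(
    r"^(?P<minute>\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\.\d+ "
    r"\d+ (?P<level>[A-Z]+) (?P<logger>\S+)"
)


def errors_per_minute(lines):
    """Count ERROR lines, keyed by (minute, logger name)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("level") == "ERROR":
            counts[(match.group("minute"), match.group("logger"))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python errors_per_minute.py <path-to-log-file>
    with open(sys.argv[1], errors="replace") as handle:
        for (minute, logger), n in sorted(errors_per_minute(handle).items()):
            print(minute, logger, n)
```

Run against the same log on each node, the first minute with a non-zero count
gives a timestamp to line up with the rabbitmq and mariadb logs.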


I'd suggest checking the rabbitmq and mariadb logs.

AFAIK, there is no configuration that limits the number of networks or
projects, but the deployment may be hitting a resource shortage that affects
the running services. What is the memory sizing and usage of the controllers?
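
Since the quoted errors are ECONNREFUSED against the AMQP port (5672), a quick
reachability check from an affected node can help tell "service down" apart
from "service slow". This is a minimal sketch using only the Python standard
library; `port_open` is a hypothetical helper, and the host name must be
supplied by the operator because it is truncated in the log excerpt.

```python
import socket
import sys


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    # Usage: python port_check.py <host> [port]   (port defaults to 5672, AMQP)
    host = sys.argv[1]
    port = int(sys.argv[2]) if len(sys.argv) > 2 else 5672
    state = "reachable" if port_open(host, port) else "NOT reachable"
    print(f"{host}:{port} is {state}")
```

A successful connect only shows that something is listening on the port; it
does not confirm that the rabbitmq cluster itself is healthy.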


<blockquote>


You can find more Error Blocks in the attached file. 

Thanks! 
        

</blockquote>

_______________________________________________
dev mailing list
[email protected]
http://lists.rdoproject.org/mailman/listinfo/dev

To unsubscribe: [email protected]
