Some of the alarms are transient (they should still generate a syslog trap, though), and they raise a chassis alarm when they occur (e.g. a PFE<>Fabric plane takes a hit of CRC errors and then recovers through fabric healing). Sometimes the chassis alarm is not cleared when the transient condition goes away, and clearing it then requires rebooting the RE (yes, the Routing Engine :)
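To make the suspected failure mode concrete, here is a toy Python model. It is hypothetical and does not reflect how chassisd is actually implemented: the alarm is latched when the fault is reported and only goes away if a separate clear notification is delivered. `AlarmManager`, `simulate`, and the alarm name `fabric-plane-crc` are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AlarmManager:
    """Hypothetical model of a daemon that latches alarms until told to clear them."""
    active: set = field(default_factory=set)

    def raise_alarm(self, alarm_id: str) -> None:
        # A fault event (e.g. CRC errors on a PFE<>Fabric plane link) raises the alarm.
        self.active.add(alarm_id)

    def clear_alarm(self, alarm_id: str) -> None:
        # The alarm clears only if a clear notification actually arrives.
        self.active.discard(alarm_id)


def simulate(clear_delivered: bool) -> set:
    mgr = AlarmManager()
    mgr.raise_alarm("fabric-plane-crc")
    # Fabric healing recovers the link; the owning process should now signal a clear.
    if clear_delivered:
        mgr.clear_alarm("fabric-plane-crc")
    return mgr.active


print(simulate(clear_delivered=True))   # set()
print(simulate(clear_delivered=False))  # {'fabric-plane-crc'}
```

In this model the hardware has recovered in both runs; only the lost clear notification leaves the alarm standing, which matches the "stale alarm until the RE is rebooted" symptom.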
I think chassisd (the chassis daemon) is not getting signaled by the other relevant processes when those states clear.

On Sun, Feb 14, 2016 at 6:39 PM, Alex K. <nsp.li...@gmail.com> wrote:
> Hello everyone,
>
> For some time now, one of my customers has been getting "major alarms" from the
> MPC mentioned above on one of their MX960s.
>
> The issue is that nothing more than that message (+alarm) seems to be
> present. Nothing precedes that error, neither in "log messages" nor in
> "chassisd". There seems to be an output rate drop at the time of those
> incidents until the MPC gets restarted (by the appropriate network team), and
> then everything gets back to normal.
>
> It's worth mentioning that they have a second MX960 serving the other half
> of their end-users, configured exactly the same, which never had that
> issue (therefore it's probably not traffic related).
>
> They are running 12.3R6.6. The linecard was already replaced. There
> seem to be no trace options available for monitoring MPCs and their
> internal status, and the Juniper web site lacks potential explanations and
> leads, therefore I'm addressing the community: any advice for getting to
> the bottom of this will be welcome! Additionally, any experience with
> troubleshooting similar hardware issues might be as helpful as any advice.
>
> Thank you.
> _______________________________________________
> juniper-nsp mailing list juniper-nsp@puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp