On Thu, Mar 25, 2010 at 12:31:15AM +0300, Pavel Lunin wrote:
> Richard, one more thing. What do you do with the crash dumps
> untarzipping them on the router/switch itself? I have never done
> anything with them but sending to JTA. I believe it can have a lot of
> sense to pick them and discover yourself (though I've never tried),
> but why on the switch itself? Am I missing something important?

You can run gdb on the coredump files locally and get a pretty good idea of what blew up and where, which is often quite helpful in working around the original problem. Also, JTAC is far too often surprisingly bad at working with coredumps. Without the ability to independently verify things myself and tell them they were confused, I've had some cases which would probably never have been solved.

The story that was explained to me was that JTAC has some point-and-click tool that they load the core into, which parses it and searches their PR database to find matching backtraces. The problem is that, at this point, I'm convinced nobody in JTAC actually knows what a backtrace is or how to read one. They just match it to whatever their tool tells them, and surprisingly often their tool is very, very wrong.

The other big problem, of course, is file size and compression. Apparently their tool only works with .zip files, not .tgz files (which is a bit of a problem, seeing as the router only has gzip :P), so they have to uncompress it locally first before they can load it. I've had JTAC not know what a .tgz file was. I've had Advanced JTAC spend days trying to figure out why they couldn't get any data out of a coredump, when the problem turned out to be that their local filesystem quota wasn't big enough to work with a large core file, etc, etc. Even when things work "right" it seems to take them 12-72 hours to parse a coredump, even on a P1 case, and a healthy percentage of the time their analysis is just flat out wrong. Without the ability to look at the dump yourself, you'd never know they were barking up the wrong tree.

Because EX uses PowerPC, it isn't even particularly easy to find a FreeBSD ppc box where you can actually do any useful analysis of the coredumps.

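For what it's worth, the .tgz to .zip repackaging described above is easy to do off-box. Here is a minimal Python sketch of the idea, using only the standard tarfile and zipfile modules. The function name and the file names are hypothetical, and I am only assuming that a plain .zip of the same files is acceptable on the receiving end; nothing here is a Juniper or JTAC tool.

```python
import shutil
import tarfile
import zipfile


def tgz_to_zip(tgz_path, zip_path):
    """Repack the regular files of a .tgz archive into a .zip archive.

    Hypothetical helper for illustration. Returns the member names written.
    """
    written = []
    with tarfile.open(tgz_path, "r:gz") as tar, zipfile.ZipFile(
        zip_path, "w", compression=zipfile.ZIP_DEFLATED, allowZip64=True
    ) as zf:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, symlinks and device nodes
            src = tar.extractfile(member)
            # Stream in chunks so a multi-GB core is never held in memory;
            # force_zip64 allows entries larger than 4 GB.
            with src, zf.open(member.name, "w", force_zip64=True) as dst:
                shutil.copyfileobj(src, dst)
            written.append(member.name)
    return written


# Example call (file names are made up):
# tgz_to_zip("rpd.core-tarball.0.tgz", "rpd.core-tarball.0.zip")
```

Streaming member by member means the scratch space needed is roughly the size of the output .zip, rather than the fully unpacked core plus the archive.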
That assumes, of course, that you have working connectivity on the box in question and can quickly copy the sometimes very large files off, which is often not the case because of the original problem that caused the crash.

And where do they plan on writing a 2GB core dump when there is an EX kernel panic and you only have 600MB of free space on an "empty" box?

You can bet there will be, I run into them at least 2 or 3 times a year on MX easily, it's just a fact of life.

I mean seriously, what does 32GB of flash cost, $100? Think about the amount of grief that will be caused by this in comparison, and tell me it was a smart move on their part. :)

-- 
Richard A Steenbergen <r...@e-gerbil.net> http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp