Dustin:

We're using "amgtar" to dump, so I believe that amgtar is also used for the validation. At some point amgtar calls gtar.

I've run 'truss' on amcheckdump and will send a much-reduced version of the output as a .zip file attachment in a follow-up email. The truss output covers the period between seeing the following in the window where amcheckdump was started:

  Validating image thor1:/root datestamp 20100803111814 level 2 part 1/1 on tape HBK1_17 file #2

and:

  Image was not successfully validated: Error writing data to validation command: Error writing fd 9: Broken pipe.

It looks to me like:

- process 10344 (amcheckdump) creates a pipe with file descriptors 8 and 9 at each end.
- the process forks off a child (10350)
- the parent closes fd8
- the child closes fd9 and eventually dups fd8 onto fd0 (stdin)

So we end up with the parent and child at opposite ends of a pipe, with fd9 on the parent side and fd0 on the child side.

Both the parent and child write/read successfully for a while. Eventually (line 4461) the child seems to exit normally with an exit status of 0. It looks to me like this is an orderly exit with normal cleanup, closing of file descriptors, etc. Unfortunately, as soon as the child exits, the next parent write gets a SIGPIPE signal (lines 4462 and 5488).

This doesn't happen with every DLE checked - about 20% seem to check out okay.

Is it possible that this is a race condition, with it being random chance whether the parent finishes writing before the child exits? Could there be a missing "flush" command somewhere?

Any insight you can provide would be appreciated! Do you know if anyone else has successfully used amcheckdump with the mod 3306 changes on Solaris (or on any other platform)?

Thanks,

Sean

>
>On Mon, Aug 2, 2010 at 12:56 AM, Sean Walmsley
><s...@fpp.nuclearsafetysolutions.com> wrote:
>> Since this error is coming from just a few lines after the line
>> that caused the hanging problem, and since amcheckdump should only
>> be writing to the /dev/null device that wasn't being opened properly
>> before, we thought that the problems might be related.
>
>I assume that the validation command is gtar? Is it giving any kind
>of error message? Can you use 'truss' to see if you can figure out
>why it's exiting while there's still data flowing?
>
>Dustin
>
=================================================================
Sean Walmsley        sean at fpp . nuclearsafetysolutions dot com
Nuclear Safety Solutions Ltd.  416-592-4608 (V)  416-592-5528 (F)
700 University Ave M/S H04 J19, Toronto, Ontario, M5G 1X6, CANADA
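
P.S. In case it helps to reproduce this outside of Amanda, here is a minimal sketch of the sequence I think I'm seeing in the truss output: pipe, fork, child dups the read end onto stdin, child exits with status 0 while the parent is still writing. It is not Amanda's code. The function name write_until_broken and its parameters are made up for illustration, and the child here simply reads a fixed number of chunks where the real validation command would be gtar.

```python
import os


def write_until_broken(child_reads, chunk=b"x" * 4096, max_chunks=10000):
    """Fork a child that reads `child_reads` chunks from a pipe and exits 0.

    Returns (chunks_written, child_exit_status, error_name), where
    error_name is "EPIPE" if a parent write failed with a broken pipe.
    All names here are hypothetical; this only imitates the truss trace.
    """
    r, w = os.pipe()                 # like fds 8 (read) and 9 (write)
    pid = os.fork()
    if pid == 0:
        # Child: stands in for the validation command.
        os.close(w)                  # child closes the write end
        if r != 0:
            os.dup2(r, 0)            # read end becomes stdin
            os.close(r)
        for _ in range(child_reads):
            os.read(0, len(chunk))
        os._exit(0)                  # clean exit, data still in flight

    # Parent: stands in for amcheckdump.
    os.close(r)                      # parent closes the read end
    written = 0
    error = None
    try:
        for _ in range(max_chunks):
            os.write(w, chunk)       # blocks when the pipe is full
            written += 1
    except BrokenPipeError:
        # Python ignores SIGPIPE by default, so the failed write shows
        # up as EPIPE instead of killing the process.
        error = "EPIPE"
    finally:
        os.close(w)
    _, status = os.waitpid(pid, 0)
    return written, os.WEXITSTATUS(status), error


if __name__ == "__main__":
    print(write_until_broken(child_reads=3))
```

In this sketch the child's exit status is 0 and the parent still gets a broken pipe, because the child stopped reading before the parent stopped writing. If the child instead read until end-of-file, the parent would not see the error, which is why I wonder about the timing of the child's exit.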