Hi Charles,

> I have been unable to figure out the problem, but I can replicate your
> results here.  The previous commit works, while this one fails.
>
> I did try running under valgrind and with MALLOC_CHECK_ set to 3, but
> didn't get anything useful (to my eyes, anyway).  Interestingly, I did
> seem to get slightly different behavior when running under valgrind
> with the memorytest, but nothing that really points a finger at what
> might be wrong.  It's possible there's some timing related problem,
> but I suspect what's happening is valgrind is changing the system
> behavior with respect to the task that crashes.
>
> Mostly I just wanted to report that I can replicate John's results on
> Debian, even though I haven't made any other progress.

Interesting tests.  Here's something new I'm having trouble explaining.
Maybe someone else can figure out what's going on.

I've been running gdb on halcmd.  While running latency-test, make a 
copy of the /tmp/hal.lat.foo directory into /tmp/hal.lat, and stop the 
test.  Then:

. scripts/env-environment
cd /tmp/hal.lat
halcmd -f lat.hal
# if you're running the pre-problem version, stop the test and
halcmd unload all
# if not, go kill -9 the mess

This works nicely to reproduce the problem, except it does something 
weird for me:

Run this in the pre-problem version.  Should work well.

Then fix your PATH to point at the problem version.

halcmd -f lat.hal

Now go clean up the mess.

Fix your PATH again to point at the pre-problem version.

halcmd -f lat.hal

At this point the pre-problem version stops working for me as well.  I'm
probably missing something obvious, but I've carefully compared the
environments and ps lists before/after every run and there are no
differences that seem important.  If I kill the shell and start over, the
pre-problem version begins working again.

On the debugging side, neither of my tacks has gotten very far.  I've
never done anything terribly difficult with C, gdb or assembly before, so
I've been reading docs at every step.  The two tacks:

- Use the debugger to understand why test_and_set_bit returns 0.  I 
don't know what the 'tsbbl' instruction is, and haven't figured out how 
to examine memory yet.

- Find some way to log function calls and argument values as the 
execution progresses.  I hope to (dis)confirm that the pre-problem and 
problem versions are taking the same path up to the point where the 
problem version starts spinning.  Stepping through with gdb is, of 
course, too painful.  I read that valgrind might be able to help with 
this, but it doesn't seem to be a common usage.  Charles, have you heard 
of using valgrind for that purpose?
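
For reference on the first tack, here is a minimal sketch of what a
test-and-set primitive conventionally does.  It assumes the usual
convention that the function returns the bit's previous value, and it is
not the HAL/RTAPI source: every sketch_* name is made up for illustration,
and it uses the GCC/Clang __atomic builtins rather than inline assembly.

```c
/* Illustration only, not the HAL/RTAPI code.  All sketch_* names are
 * hypothetical.  Assumes GCC or Clang (for the __atomic builtins). */
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Atomically set bit `nr` in the bitmap at `addr`.
 * Returns the bit's PREVIOUS value: 0 if it was clear, 1 if already set. */
static inline int sketch_test_and_set_bit(unsigned int nr, unsigned long *addr)
{
    assert(addr != NULL);
    unsigned long mask = 1UL << (nr % SKETCH_BITS_PER_WORD);
    unsigned long old = __atomic_fetch_or(&addr[nr / SKETCH_BITS_PER_WORD],
                                          mask, __ATOMIC_SEQ_CST);
    return (old & mask) != 0;
}

/* Atomically clear bit `nr`; also returns the bit's previous value. */
static inline int sketch_test_and_clear_bit(unsigned int nr, unsigned long *addr)
{
    assert(addr != NULL);
    unsigned long mask = 1UL << (nr % SKETCH_BITS_PER_WORD);
    unsigned long old = __atomic_fetch_and(&addr[nr / SKETCH_BITS_PER_WORD],
                                           ~mask, __ATOMIC_SEQ_CST);
    return (old & mask) != 0;
}

/* A lock built on bit 0: returns 1 if the caller acquired the lock
 * (the bit was clear before), 0 if somebody else already holds it. */
static inline int sketch_try_lock(unsigned long *lock)
{
    return sketch_test_and_set_bit(0, lock) == 0;
}

/* Release the lock by clearing bit 0. */
static inline void sketch_unlock(unsigned long *lock)
{
    (void)sketch_test_and_clear_bit(0, lock);
}
```

Under that convention, a return of 0 means the bit was clear just before
the call, and a caller that loops while the result is 1 keeps spinning
until someone clears the bit.  On x86 this operation is often written as a
bts instruction followed by sbb, which turns the carry flag into the
return value; whether that is what the disassembly here shows is
unverified.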

That's all I've got.

        John

_______________________________________________
Emc-developers mailing list
Emc-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/emc-developers
