Hi, all
After a lot of experimenting and thinking, I finally got rid of the
annoying "invalid read/write of size 8" errors reported by valgrind.
Fixing them took only a few minutes once I had tracked down the cause,
so I decided to describe this interesting process in detail:
- After some googling, I learned that such errors are normally caused by
invalid memory accesses, such as reading from already freed memory or
indexing an array out of range. So the first thing I did was review the
coroutine module for a possible double free() or an incorrect allocation
size passed to malloc(). I found none: the memory management of the
coroutine module is too simple to hide such a problem.
- I compiled the code without any optimization, followed the backtrace,
and found that the errors were raised after a coroutine finished
executing and tried to free itself.
    static void _coro_entry_point(scheduler *sch)
    {
        ...
        co->func(sch, co->data); // main logic of a coroutine
        _coro_release(co);       // release the coroutine, where errors occur :-(
        sch->co[id] = NULL;      // after releasing, `sch` became invalid
        sch->n_coro--;
        sch->running_id = -1;
    }
It confused me for several days while I tried to explain why a single
call that frees a coroutine would result in an invalid memory access. I
came up with some possible reasons: maybe a user context switch changed
the address of sch, or maybe some memory corruption ruined the coroutines
maintained by the given sch. After some experiments, I ruled all of these
out.
- GDB didn't complain about anything while executing the code. valgrind
told me something was wrong without telling me what it was. Maybe
something was wrong with valgrind or libc? I asked my mentors for help.
acidx said that was unlikely because all these tools and libraries are
well tested, taught me how to write my own malloc/free wrapper, and
mentioned a tool named MemorySanitizer. edsiper advised me to try
another memory management library, jemalloc.
- I tried every method I could think of, but code doesn't lie: the errors
remained. I was down to my last bullet, MemorySanitizer, which is part of
the Clang project. I had heard that clang is known for its more
human-friendly error explanations. After reading the introduction to
MemorySanitizer <http://clang.llvm.org/docs/MemorySanitizer.html>, I
added the corresponding flags and replaced gcc with clang. After
compiling and executing, I got only one simple warning saying that sch
became uninitialized after the release, which really drove me crazy.
- I started to rethink the model of the context switch and found that I
had misunderstood the stack allocated for a context. I thought it was
only used for saving the context information when the given coroutine is
switched out; however, it is also the stack the coroutine executes on.
That explains the errors: _coro_entry_point runs on the stack of the
coroutine co, so once co was released, that stack became invalid, but
_coro_entry_point had not finished yet and still accessed its local
variables on that stack, which caused the problems.
- Aha, now we have narrowed down the origin of the errors: a coroutine
must not be released from within itself; we have to free it somewhere
else. A coroutine is now marked as CORO_DEAD after its execution, and the
next time a new coroutine requires the same slot, the dead one is
released and replaced with the new one. A better solution would be to
reuse the dead coroutine. For the details, please refer to the source
code <https://github.com/swpd/coroutine/tree/coro>.
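
To make the deferred release in the last item concrete, here is a minimal
sketch of the bookkeeping only, with no real context switching. It is not
the code from the repository: the struct layouts, MAX_CORO, and the helper
names coro_mark_dead, coro_new and coro_release are hypothetical, and only
sch->co, n_coro, running_id and CORO_DEAD are taken from the post above.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_CORO 4 /* hypothetical number of slots per scheduler */

enum coro_status { CORO_READY, CORO_RUNNING, CORO_DEAD };

typedef struct coroutine {
    enum coro_status status;
    char *stack; /* private stack the coroutine would execute on */
} coroutine;

typedef struct scheduler {
    coroutine *co[MAX_CORO];
    int n_coro;     /* number of live (not dead) coroutines */
    int running_id; /* slot currently running, or -1 */
} scheduler;

/* Frees a coroutine and its stack. Must only be called from a context
 * that is NOT executing on co->stack. */
static void coro_release(coroutine *co)
{
    free(co->stack);
    free(co);
}

/* What the end of the entry point would do while still running on the
 * coroutine's own stack: flag the slot as dead and free nothing. */
static void coro_mark_dead(scheduler *sch, int id)
{
    assert(id >= 0 && id < MAX_CORO && sch->co[id] != NULL);
    sch->co[id]->status = CORO_DEAD;
    sch->n_coro--;
    sch->running_id = -1;
}

/* Runs in the scheduler's context. Takes the first empty or dead slot,
 * releasing a dead occupant first. Returns the slot id, or -1. */
static int coro_new(scheduler *sch, size_t stack_size)
{
    for (int id = 0; id < MAX_CORO; id++) {
        coroutine *old = sch->co[id];
        if (old != NULL && old->status != CORO_DEAD)
            continue; /* slot still in use */

        coroutine *co = malloc(sizeof(*co));
        if (co == NULL)
            return -1;
        co->stack = malloc(stack_size);
        if (co->stack == NULL) {
            free(co);
            return -1;
        }
        co->status = CORO_READY;

        if (old != NULL)
            coro_release(old); /* safe: we are not on old's stack here */
        sch->co[id] = co;
        sch->n_coro++;
        return id;
    }
    return -1;
}
```

In this sketch the entry point would end with coro_mark_dead(sch, id)
instead of _coro_release(co), so nothing is freed while the code is still
running on the stack that is about to go away.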
Next week, I will be redesigning the coroutine module to make it more
general and trying to integrate it with Duda I/O.
Blog Post:
http://blog-swpd.rhcloud.com/gsoc-2014-update-duda-io-coroutines-week-6/
Github Repo: https://github.com/swpd/coroutine
Best Regards,
swpd