Hi, all

After a lot of experimenting and thinking, I finally got rid of the
annoying "invalid read/write of size 8" errors reported by valgrind.

It took only a few minutes to fix the errors once I had tracked down their
cause, so I decided to describe this interesting process in detail:

   - After some googling, I learned that such errors are normally caused by
   invalid memory accesses, such as reading from already freed memory or
   indexing an array out of range. So the first thing I did was review the
   coroutine module for a possible double free() or an incorrect
   allocation size passed to malloc(). There was none: the memory
   management of the coroutine module is too simple to have such a problem.
   - I compiled the code without any optimization, followed the backtrace,
   and found that the errors were raised after a coroutine finished
   executing and tried to free itself.

   static void _coro_entry_point(scheduler *sch)
   {
       ...
       co->func(sch, co->data); // main logic of a coroutine
       _coro_release(co);       // release the coroutine, where errors occur :-(
       sch->co[id] = NULL;      // after releasing, `sch` became invalid
       sch->n_coro--;
       sch->running_id = -1;
   }

   It confused me for several days as I tried to explain why a single
   call to free a coroutine would result in an invalid memory access. I
   came up with some possible causes: maybe a user context switch changed
   the address of sch, or maybe some memory corruption ruined the coroutines
   maintained by the given sch. After some experiments, I ruled all of
   them out.
   - GDB didn't complain about anything while executing the code, and
   valgrind told me something was wrong without telling me what it was.
   Maybe something was wrong with valgrind or libc? I asked my mentors for
   help. acidx said that was unlikely because these tools and libraries are
   well tested, taught me how to write my own malloc/free wrapper, and
   mentioned a tool named MemorySanitizer. edsiper advised me to try another
   memory management library, jemalloc.
   - I tried every method I could think of, but code won't lie: the errors
   remained. My last bullet was MemorySanitizer, which is part of the
   Clang project. I had heard that clang is known for its more
   human-friendly error explanations. After reading the introduction to
   MemorySanitizer <http://clang.llvm.org/docs/MemorySanitizer.html>, I
   added the corresponding flags and replaced gcc with clang. After
   compiling and executing, I got only one simple warning, saying that sch
   became uninitialized after the release. This really drove me crazy.
   - I started to rethink my model of the context switch and found that I
   had misunderstood the stack allocated for a context. I thought it was
   only used for saving the context information while the given coroutine
   was switched out. However, it is also the stack the coroutine executes
   on, which explains the errors: _coro_entry_point runs on the stack of
   the coroutine co, so once co was released, that stack became invalid.
   But _coro_entry_point had not finished yet and still accessed its local
   variables on that stack, which caused the problems.
   - Aha, now we have narrowed down the origin of the errors: a coroutine
   must not be released from within itself, so we have to free it somewhere
   else. A coroutine is now marked as CORO_DEAD after its execution, and the
   next time a new coroutine requires the same slot, the dead one is
   released and replaced with the new one. A better solution would be to
   reuse the dead coroutine. For the details, please refer to the source code
   <https://github.com/swpd/coroutine/tree/coro>.
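
To make the last point concrete, here is a minimal sketch of the deferred
release idea. It is not the code from the repository: it assumes Linux with
glibc's <ucontext.h>, it has no yield support, and every name in it
(coro_new, coro_resume, coro_release, bump, coro_demo, n_released, the
STACK_SIZE and MAX_CORO values) is made up for illustration. The entry
point only marks the coroutine as CORO_DEAD, and the memory is freed later
from the scheduler's own stack.

```c
/* Sketch only: assumes Linux/glibc <ucontext.h>; all names are hypothetical. */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)
#define MAX_CORO   4

enum coro_status { CORO_READY, CORO_RUNNING, CORO_DEAD };

typedef struct scheduler scheduler;
typedef void (*coro_func)(scheduler *sch, void *data);

typedef struct {
    ucontext_t ctx;
    char *stack;              /* the coroutine *executes* on this memory */
    coro_func func;
    void *data;
    enum coro_status status;
} coroutine;

struct scheduler {
    ucontext_t main_ctx;
    coroutine *co[MAX_CORO];
    int running_id;
    int n_released;           /* demo-only counter of released coroutines */
};

/* Free a coroutine. Must be called from outside that coroutine. */
static void coro_release(scheduler *sch, int id)
{
    assert(id != sch->running_id);   /* never free the stack we are running on */
    free(sch->co[id]->stack);
    free(sch->co[id]);
    sch->co[id] = NULL;
    sch->n_released++;
}

/* makecontext() passes int arguments, so the pointer arrives in two halves. */
static void coro_entry_point(unsigned int lo, unsigned int hi)
{
    scheduler *sch = (scheduler *)(uintptr_t)(((uint64_t)hi << 32) | lo);
    coroutine *co = sch->co[sch->running_id];

    co->func(sch, co->data);         /* main logic of the coroutine */
    co->status = CORO_DEAD;          /* only mark it: we still run on co->stack */
    sch->running_id = -1;
}                                    /* returning resumes uc_link (main_ctx) */

/* Take the first usable slot; a dead occupant is released here, on the caller's stack. */
static int coro_new(scheduler *sch, coro_func func, void *data)
{
    for (int id = 0; id < MAX_CORO; id++) {
        if (sch->co[id] != NULL && sch->co[id]->status == CORO_DEAD)
            coro_release(sch, id);
        if (sch->co[id] == NULL) {
            coroutine *co = calloc(1, sizeof *co);
            if (co == NULL)
                return -1;
            co->stack = malloc(STACK_SIZE);
            if (co->stack == NULL) {
                free(co);
                return -1;
            }
            co->func = func;
            co->data = data;
            co->status = CORO_READY;
            sch->co[id] = co;
            return id;
        }
    }
    return -1;
}

/* Run a ready coroutine to completion (this sketch has no yield). */
static void coro_resume(scheduler *sch, int id)
{
    coroutine *co = sch->co[id];
    uint64_t p = (uint64_t)(uintptr_t)sch;

    if (co == NULL || co->status != CORO_READY)
        return;
    getcontext(&co->ctx);
    co->ctx.uc_stack.ss_sp = co->stack;
    co->ctx.uc_stack.ss_size = STACK_SIZE;
    co->ctx.uc_link = &sch->main_ctx;
    makecontext(&co->ctx, (void (*)(void))coro_entry_point, 2,
                (unsigned int)(p & 0xffffffffu), (unsigned int)(p >> 32));
    co->status = CORO_RUNNING;
    sch->running_id = id;
    swapcontext(&sch->main_ctx, &co->ctx);
}

static void bump(scheduler *sch, void *data) { (void)sch; ++*(int *)data; }

/* Returns 0 if the dead coroutine survived until its slot was requested again. */
static int coro_demo(void)
{
    scheduler sch;
    int counter = 0;

    memset(&sch, 0, sizeof sch);
    sch.running_id = -1;
    int a = coro_new(&sch, bump, &counter);
    if (a < 0) return 1;
    coro_resume(&sch, a);
    if (sch.co[a] == NULL || sch.co[a]->status != CORO_DEAD) return 2;
    int b = coro_new(&sch, bump, &counter);   /* releases the dead one first */
    if (b != a || sch.n_released != 1) return 3;
    coro_resume(&sch, b);
    coro_release(&sch, b);                    /* final cleanup from the main stack */
    return counter == 2 ? 0 : 4;
}
```

Calling coro_demo() from main() should return 0. If the free() calls were
moved back into coro_entry_point, the function would keep running on memory
that has already been freed, which is the kind of access valgrind was
complaining about.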

Next week, I will redesign the coroutine module to make it more general
and try to integrate it with Duda I/O.

Blog Post:
http://blog-swpd.rhcloud.com/gsoc-2014-update-duda-io-coroutines-week-6/

Github Repo: https://github.com/swpd/coroutine

Best Regards,

swpd
_______________________________________________
Monkey mailing list
[email protected]
http://lists.monkey-project.com/listinfo/monkey