I agree that a mutex should never have a null owner and a nonzero level.

Unfortunately, my first guess is some form of memory corruption:
it seems like a null value accidentally got written to `mu_owner`.  I
could be missing it, but I don't see any logic error in the mutex code
which could cause this.

Getting to the bottom of this is probably going to be difficult,
especially if it is not easy to reproduce.  I don't know how valuable
they are, but my two suggestions are:

1. Look at the `.lst` file that newt generates during a build to
determine what object immediately follows the mutex in RAM.  Maybe an
errant write intended for this object is clearing the owner field.

2. Instrument the code with a bunch of asserts and logs.  Maybe you can
catch the problem shortly after it happens.
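
As a rough illustration of suggestion 2, here is a minimal sketch of the kind
of check I have in mind.  The `sketch_` types and the `SKETCH_MUTEX_CHECK`
macro are hypothetical stand-ins, not the kernel's own definitions; only the
field names `mu_prio`, `mu_level`, and `mu_owner` come from the gdb dump
quoted below.  It tests the one invariant mentioned above: a nonzero level
with a null owner.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures; only the fields
 * needed for the check are modeled, and the field widths are assumed. */
struct sketch_task {
    uint8_t t_prio;
};

struct sketch_mutex {
    uint8_t mu_prio;
    uint16_t mu_level;
    struct sketch_task *mu_owner;
};

/* Returns 1 if the mutex looks consistent, 0 if it is held
 * (mu_level != 0) but has no owner recorded. */
int
sketch_mutex_is_sane(const struct sketch_mutex *mu)
{
    if (mu->mu_level != 0 && mu->mu_owner == NULL) {
        return 0;
    }
    return 1;
}

/* Assumed helper: stop as close to the corruption as possible. */
#define SKETCH_MUTEX_CHECK(mu) assert(sketch_mutex_is_sane(mu))
```

Sprinkling a check like this before and after each pend and release, and
around any code that writes to objects placed near the mutex, should narrow
down the window in which the owner field gets cleared.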

Like I said, probably not the most helpful advice, but I don't think
this is going to be an easy one to solve!

Chris

On Mon, Nov 06, 2017 at 03:16:06PM -0800, Jitesh Shah wrote:
> Hey Will,
> Are you saying that because "mu_level" is set to 1?
> 
> It is set to 1 because the last call to os_mutex_release() failed on
> account of "mu_owner" not matching. Thus, the task that got the mutex
> failed to release it. That explains t_lockcnt and mu_level, right?
> 
> Jitesh
> 
> On Mon, Nov 6, 2017 at 7:56 AM, will sanfilippo <wi...@runtime.io> wrote:
> 
> > What this looks like to me is that there was a nested pend without the
> > same number of releases. Maybe some path out of some code that is rarely
> > hit where a mutex is granted but not released?
> >
> > Just a guess...
> >
> > > On Nov 5, 2017, at 8:26 PM, Jitesh Shah <jit...@liveathos.com> wrote:
> > >
> > > Hey Guys,
> > > I am running v1.0.0 branch (0db6321a75deda126943aa187842da6b977cd1c1).
> > > Seeing some strange mutex behaviour.
> > >
> > > So once in a bazillion times, a mutex fails to release. Here is what the
> > > structure looks like when it fails:
> > >
> > >> (gdb) p/x send_mutex
> > >> $1 = {mu_head = {slh_first = 0x0}, _pad = 0x0, mu_prio = 0x1, mu_level =
> > >> 0x1, mu_owner = 0x0}
> > >
> > >
> > > Why is mu_owner set to 0? That causes the os_mutex_release call to fail
> > > since the current task doesn't match the owner task anymore.
> > >
> > > The task which holds the mutex looks like this:
> > >
> > >> (gdb) p/x cent_task
> > >> $3 = {t_stackptr = 0x20008a28, t_stacktop = 0x20008ac8, t_stacksize =
> > >> 0x80, t_taskid = 0x6, t_prio = 0x1, t_state = 0x1, t_flags = 0x10,
> > >> t_lockcnt = 0x1, t_pad = 0x0,
> > >>  t_name = 0x22378, t_func = 0x90ad, t_arg = 0x0, t_obj = 0x0,
> > >> t_sanity_check = {sc_checkin_last = 0x0, sc_checkin_itvl = 0x0, sc_func =
> > >> 0x0, sc_arg = 0x0, sc_next = {
> > >>      sle_next = 0x0}}, t_next_wakeup = 0x0, t_run_time = 0x0,
> > >> t_ctx_sw_cnt = 0x213d, t_os_task_list = {stqe_next = 0x0}, t_os_list =
> > >> {tqe_next = 0x20001338,
> > >>    tqe_prev = 0x200001a8}, t_obj_list = {sle_next = 0x0}}
> > >
> > >
> > > Comparing t_prio and mu_prio, this confirms that this task is indeed
> > > holding the mutex (no other task is waiting on the mutex).
> > >
> > > What could set mu_owner to 0? My original theory was that if
> > > mutex_pend was called from an interrupt context, mu_owner would be 0. But
> > > in this case, the only task that uses the mutex is running an eventq, so
> > > that is unlikely.
> > >
> > > Any ideas?
> > >
> > > Jitesh
> > >
> > > --
> > > This email including attachments contains Mad Apparel, Inc. DBA Athos
> > > privileged, confidential, and proprietary information solely for the use
> > > for the addressed recipients. If you are not the intended recipient, please
> > > be aware that any review, disclosure, copying, distribution, or use of the
> > > contents of this message is strictly prohibited. If you have received this
> > > in error, please delete it immediately and notify the sender. All rights
> > > reserved by Mad Apparel, Inc. 2012. The information contained herein is the
> > > exclusive property of Mad Apparel, Inc. and should not be used,
> > > distributed, reproduced, or disclosed in whole or in part without prior
> > > written permission of Mad Apparel, Inc.
> >
> >
> 
