Re: [riot-devel] Odd problems with xtimer

2016-02-24 Thread malo
Hello Michael,

that is a perfect bug report - sorry I missed it.

wbr
malo

On 24 February 2016 at 22:13, Michael Andersen 
wrote:

> I did actually put a check in, and break and get back the details. I put
> the stacktrace on an issue:
>
> https://github.com/RIOT-OS/RIOT/issues/4841
>
> I have not made a reproducer because I've been pretty busy, but also
> because if the problem is due to the network stack, it might be hard to
> reproduce without all the other chatty nodes around sending ND, RPL and UDP
> traffic.
>
> On Wed, Feb 24, 2016 at 1:09 PM, malo  wrote:
>
>> Hello Michael,
>>
>> this is ugly hack indeed...
>> could you rather add some check into the _add_timer functions, put
>> breakpoint there and send the backtrace back?
>>
>> Anyway majority of the authors are probably still in the list, so
>> everybody just scratch the top of the head and tries to remember if setting
>> timer from two threads somewhere?
>>
>> wbr
>> malo
>>
>>
>> On 24 February 2016 at 21:02, Michael Andersen 
>> wrote:
>>
>>>

>>>> Would you be willing to share this hack with us? Maybe it gives us more
>>>> insights.


>>> Sure! Here is what I did:
>>>
>>> This is the "lightweight" remove
>>> +int _xtimer_ensure_remove(xtimer_t *timer)
>>> +{
>>> +  int res = 0;
>>> +  if (timer_list_head == timer) {
>>> +  uint32_t next;
>>> +  timer_list_head = timer->next;
>>> +  if (timer_list_head) {
>>> +  next = timer_list_head->target - XTIMER_OVERHEAD;
>>> +  }
>>> +  else {
>>> +  next = _lltimer_mask(0x);
>>> +  }
>>> +  _lltimer_set(next);
>>> +  }
>>> +  else {
>>> +  res = _remove_timer_from_list(&timer_list_head, timer) ||
>>> +  _remove_timer_from_list(&overflow_list_head, timer) ||
>>> +  _remove_timer_from_list(&long_list_head, timer);
>>> +  }
>>> +
>>> +  return res;
>>> +}
>>>
>>> And then I call it from these two places inside the existing critical
>>> sections:
>>>
>>> @@ -104,6 +106,7 @@ void _xtimer_set64(xtimer_t *timer, uint32_t offset,
>>> uint32_t long_offset)
>>>  }
>>>
>>>  int state = disableIRQ();
>>> +_xtimer_ensure_remove(timer);
>>>  _add_timer_to_long_list(&long_list_head, timer);
>>>  restoreIRQ(state);
>>>  DEBUG("xtimer_set64(): added longterm timer (long_target=%"
>>> PRIu32 " target=%" PRIu32 ")\n",
>>> @@ -179,6 +182,7 @@ int _xtimer_set_absolute(xtimer_t *timer, uint32_t
>>> target)
>>>  }
>>>
>>>  unsigned state = disableIRQ();
>>> +_xtimer_ensure_remove(timer);
>>>  if ( !_this_high_period(target) ) {
>>>  DEBUG("xtimer_set_absolute(): the timer doesn't fit into the
>>> low-level timer's mask.\n");
>>>  _add_timer_to_long_list(&long_list_head, timer);
>>>
>>> I then ran it for a week without problems (previously it would
>>> definitely crash within two hours). It's a bad hack though because we spend
>>> the effort to remove the timer earlier in the call chain so I really don't
>>> want to duplicate that work. Also, from my understanding, the timer should
>>> never actually be there so I don't like hiding the fact that somewhere else
>>> someone is doing something ugly.
___
devel mailing list
devel@riot-os.org
https://lists.riot-os.org/mailman/listinfo/devel


Re: [riot-devel] Odd problems with xtimer

2016-02-24 Thread Michael Andersen
I did actually put a check in, hit the breakpoint, and got the details back. I put
the stack trace in an issue:

https://github.com/RIOT-OS/RIOT/issues/4841

I have not made a reproducer because I've been pretty busy, but also
because if the problem is due to the network stack, it might be hard to
reproduce without all the other chatty nodes around sending ND, RPL and UDP
traffic.

On Wed, Feb 24, 2016 at 1:09 PM, malo  wrote:

> Hello Michael,
>
> this is ugly hack indeed...
> could you rather add some check into the _add_timer functions, put
> breakpoint there and send the backtrace back?
>
> Anyway majority of the authors are probably still in the list, so
> everybody just scratch the top of the head and tries to remember if setting
> timer from two threads somewhere?
>
> wbr
> malo
>
>
> On 24 February 2016 at 21:02, Michael Andersen 
> wrote:
>
>>
>>>
>>> Would you be willing to share this hack with us? Maybe it gives us more
>>> insights.
>>>
>>>
>> Sure! Here is what I did:
>>
>> This is the "lightweight" remove
>> +int _xtimer_ensure_remove(xtimer_t *timer)
>> +{
>> +  int res = 0;
>> +  if (timer_list_head == timer) {
>> +  uint32_t next;
>> +  timer_list_head = timer->next;
>> +  if (timer_list_head) {
>> +  next = timer_list_head->target - XTIMER_OVERHEAD;
>> +  }
>> +  else {
>> +  next = _lltimer_mask(0x);
>> +  }
>> +  _lltimer_set(next);
>> +  }
>> +  else {
>> +  res = _remove_timer_from_list(&timer_list_head, timer) ||
>> +  _remove_timer_from_list(&overflow_list_head, timer) ||
>> +  _remove_timer_from_list(&long_list_head, timer);
>> +  }
>> +
>> +  return res;
>> +}
>>
>> And then I call it from these two places inside the existing critical
>> sections:
>>
>> @@ -104,6 +106,7 @@ void _xtimer_set64(xtimer_t *timer, uint32_t offset,
>> uint32_t long_offset)
>>  }
>>
>>  int state = disableIRQ();
>> +_xtimer_ensure_remove(timer);
>>  _add_timer_to_long_list(&long_list_head, timer);
>>  restoreIRQ(state);
>>  DEBUG("xtimer_set64(): added longterm timer (long_target=%"
>> PRIu32 " target=%" PRIu32 ")\n",
>> @@ -179,6 +182,7 @@ int _xtimer_set_absolute(xtimer_t *timer, uint32_t
>> target)
>>  }
>>
>>  unsigned state = disableIRQ();
>> +_xtimer_ensure_remove(timer);
>>  if ( !_this_high_period(target) ) {
>>  DEBUG("xtimer_set_absolute(): the timer doesn't fit into the
>> low-level timer's mask.\n");
>>  _add_timer_to_long_list(&long_list_head, timer);
>>
>> I then ran it for a week without problems (previously it would definitely
>> crash within two hours). It's a bad hack though because we spend the effort
>> to remove the timer earlier in the call chain so I really don't want to
>> duplicate that work. Also, from my understanding, the timer should never
>> actually be there so I don't like hiding the fact that somewhere else
>> someone is doing something ugly.


Re: [riot-devel] Odd problems with xtimer

2016-02-24 Thread malo
Hello Michael,

this is an ugly hack indeed...
Could you instead add a check to the _add_timer functions, put a
breakpoint there, and send the backtrace back?

Anyway, the majority of the authors are probably still on the list, so
everybody, please scratch your head and try to remember whether you set a
timer from two threads somewhere.

wbr
malo


On 24 February 2016 at 21:02, Michael Andersen 
wrote:

>
>>
>> Would you be willing to share this hack with us? Maybe it gives us more
>> insights.
>>
>>
> Sure! Here is what I did:
>
> This is the "lightweight" remove
> +int _xtimer_ensure_remove(xtimer_t *timer)
> +{
> +  int res = 0;
> +  if (timer_list_head == timer) {
> +  uint32_t next;
> +  timer_list_head = timer->next;
> +  if (timer_list_head) {
> +  next = timer_list_head->target - XTIMER_OVERHEAD;
> +  }
> +  else {
> +  next = _lltimer_mask(0x);
> +  }
> +  _lltimer_set(next);
> +  }
> +  else {
> +  res = _remove_timer_from_list(&timer_list_head, timer) ||
> +  _remove_timer_from_list(&overflow_list_head, timer) ||
> +  _remove_timer_from_list(&long_list_head, timer);
> +  }
> +
> +  return res;
> +}
>
> And then I call it from these two places inside the existing critical
> sections:
>
> @@ -104,6 +106,7 @@ void _xtimer_set64(xtimer_t *timer, uint32_t offset,
> uint32_t long_offset)
>  }
>
>  int state = disableIRQ();
> +_xtimer_ensure_remove(timer);
>  _add_timer_to_long_list(&long_list_head, timer);
>  restoreIRQ(state);
>  DEBUG("xtimer_set64(): added longterm timer (long_target=%"
> PRIu32 " target=%" PRIu32 ")\n",
> @@ -179,6 +182,7 @@ int _xtimer_set_absolute(xtimer_t *timer, uint32_t
> target)
>  }
>
>  unsigned state = disableIRQ();
> +_xtimer_ensure_remove(timer);
>  if ( !_this_high_period(target) ) {
>  DEBUG("xtimer_set_absolute(): the timer doesn't fit into the
> low-level timer's mask.\n");
>  _add_timer_to_long_list(&long_list_head, timer);
>
> I then ran it for a week without problems (previously it would definitely
> crash within two hours). It's a bad hack though because we spend the effort
> to remove the timer earlier in the call chain so I really don't want to
> duplicate that work. Also, from my understanding, the timer should never
> actually be there so I don't like hiding the fact that somewhere else
> someone is doing something ugly.


Re: [riot-devel] Odd problems with xtimer

2016-02-24 Thread Michael Andersen
>
>
>
> Would you be willing to share this hack with us? Maybe it gives us more
> insights.
>
>
Sure! Here is what I did:

This is the "lightweight" remove
+int _xtimer_ensure_remove(xtimer_t *timer)
+{
+    int res = 0;
+    if (timer_list_head == timer) {
+        uint32_t next;
+        timer_list_head = timer->next;
+        if (timer_list_head) {
+            next = timer_list_head->target - XTIMER_OVERHEAD;
+        }
+        else {
+            next = _lltimer_mask(0x);
+        }
+        _lltimer_set(next);
+    }
+    else {
+        res = _remove_timer_from_list(&timer_list_head, timer) ||
+              _remove_timer_from_list(&overflow_list_head, timer) ||
+              _remove_timer_from_list(&long_list_head, timer);
+    }
+
+    return res;
+}

And then I call it from these two places inside the existing critical
sections:

@@ -104,6 +106,7 @@ void _xtimer_set64(xtimer_t *timer, uint32_t offset, uint32_t long_offset)
 }

 int state = disableIRQ();
+_xtimer_ensure_remove(timer);
 _add_timer_to_long_list(&long_list_head, timer);
 restoreIRQ(state);
 DEBUG("xtimer_set64(): added longterm timer (long_target=%" PRIu32 " target=%" PRIu32 ")\n",
@@ -179,6 +182,7 @@ int _xtimer_set_absolute(xtimer_t *timer, uint32_t target)
 }

 unsigned state = disableIRQ();
+_xtimer_ensure_remove(timer);
 if ( !_this_high_period(target) ) {
     DEBUG("xtimer_set_absolute(): the timer doesn't fit into the low-level timer's mask.\n");
     _add_timer_to_long_list(&long_list_head, timer);

I then ran it for a week without problems (previously it would definitely
crash within two hours). It's a bad hack though because we spend the effort
to remove the timer earlier in the call chain so I really don't want to
duplicate that work. Also, from my understanding, the timer should never
actually be there so I don't like hiding the fact that somewhere else
someone is doing something ugly.




Re: [riot-devel] Odd problems with xtimer

2016-02-24 Thread Oleg Hahm
Hey Kaspar!

On Wed, Feb 24, 2016 at 08:45:45PM +0100, Kaspar Schleiser wrote:
> On 02/24/2016 07:50 PM, Oleg Hahm wrote:
> > On Wed, Feb 24, 2016 at 02:56:32PM +0100, malo wrote:
> >> - return an error (cleanest, but currently xtimer_set() never returns an
> >> error)
> > 
> > This somehow rings a bell, doesn't it? ;-)
> 
> No?

I was referring to the discussion about return values for xtimer_set() in
https://github.com/RIOT-OS/RIOT/pull/2926#discussion_r37421725 but it's
probably better for me that you don't remember. ;-)

Cheers,
Oleg
-- 
panic("CPU too expensive - making holiday in the ANDES!");
linux-2.2.16/arch/mips/kernel/traps.c




Re: [riot-devel] Odd problems with xtimer

2016-02-24 Thread Kaspar Schleiser
Hey,

On 02/24/2016 07:50 PM, Oleg Hahm wrote:
> On Wed, Feb 24, 2016 at 02:56:32PM +0100, malo wrote:
>> - return an error (cleanest, but currently xtimer_set() never returns an
>> error)
> 
> This somehow rings a bell, doesn't it? ;-)

No?

Kaspar


Re: [riot-devel] Odd problems with xtimer

2016-02-24 Thread Oleg Hahm
Hi Michael!

On Wed, Feb 24, 2016 at 10:20:12AM -0800, Michael Andersen wrote:
> > the current xtimer is "just" not thread safe and should not be used/set 
> > from multiple threads (hope we agree on this). the question is, if there is 
> > somewhere in the riot core single xtimer_t* set from multiple threads?
> > Or was just userspace misuse...
> > Hope the second is the case:)
> >
> It was my code that initially discovered it. I was quite deliberate about
> not accessing the same object from two threads. In fact the timer that was
> found to be in the list twice was one created by riot to wait for message
> with timeout. Not ruling out user error, but it would have to be a more
> subtle user error :-)

Do I understand this correctly: your application was not using multiple
threads accessing the same timer (i.e. xtimer_t struct), but you still had
concurrency problems with the timer? Can you elaborate on this?

> I've been busy on other things, so I have not had time to dig into where
> this is coming from, but a temporary hotfix that stopped my mesh from
> crashing was to put a modified version of remove_timer  inside the critical
> section of add timer. At least that way the critical section does not
> contain the spin wait. Not advocating this as the solution, it's just a
> hack until I get time to improve it.

Would you be willing to share this hack with us? Maybe it gives us more
insights.

Cheers,
Oleg
-- 
panic("This never returns");
linux-2.6.6/kernel/power/swsusp.c




Re: [riot-devel] Odd problems with xtimer

2016-02-24 Thread Oleg Hahm
Hi Kaspar!

On Wed, Feb 24, 2016 at 02:56:32PM +0100, malo wrote:
> - return an error (cleanest, but currently xtimer_set() never returns an
> error)

This somehow rings a bell, doesn't it? ;-)

Cheers,
Oleg
-- 
gur orfg guvat nobhg EBG13 wbxrf vf, rirelbar unf gb qvt hc gurve 20 lrne byq
pbairegref




Re: [riot-devel] Odd problems with xtimer

2016-02-24 Thread Oleg Hahm
Hi!

On Wed, Feb 24, 2016 at 02:56:32PM +0100, malo wrote:
> the question is, if there is somewhere in the riot core single xtimer_t* set
> from multiple threads?

If by core you mean just the kernel (as in the folder structure), then we
can be pretty sure about this: the kernel works completely without any timer.
But if you mean the "whole thing" (including e.g. network stack or
drivers), then we have to analyze this carefully.

Cheers,
Oleg
-- 
A SQL Query walks into a bar, approaches two tables and asks 'Can I join you?'




Re: [riot-devel] Odd problems with xtimer

2016-02-24 Thread Michael Andersen
On Wed, Feb 24, 2016 at 5:56 AM, malo  wrote:

> Hello Kaspar,
> sorry for "out out order" message, but I was not in the list when this thread 
> was started.
>
> Adding my two cents:
> the current xtimer is "just" not thread safe and should not be used/set from 
> multiple threads (hope we agree on this). the question is, if there is 
> somewhere in the riot core single xtimer_t* set from multiple threads?
> Or was just userspace misuse...
> Hope the second is the case:)
>
>
It was my code that initially discovered it. I was quite deliberate about
not accessing the same object from two threads. In fact the timer that was
found to be in the list twice was one created by riot to wait for message
with timeout. Not ruling out user error, but it would have to be a more
subtle user error :-)

I've been busy with other things, so I have not had time to dig into where
this is coming from, but a temporary hotfix that stopped my mesh from
crashing was to put a modified version of remove_timer inside the critical
section of add timer. At least that way the critical section does not
contain the spin wait. I'm not advocating this as the solution; it's just a
hack until I get time to improve it.

>
> As for returning error on xtimer_set - what you would do with the return 
> value?
> maybe something like?:
> while (true)
> {
>   if (!xtimer_set())
>   {
> yield();
>   }
>   else
>   {
> break;
>   }
> }
> quite ugly...
>
>
> Sure there should be check in _add_timer_to_long_list and _add_timer_to_list 
> for duplicates and assert in the case that trying to add pointer which is 
> already in - just to warn user that is doing something bad.
>
> wbr
> malo
>
> > 
> >  would
> > end with list_head equal to the timer (assuming no other timer has the
> > same time), and then the next two lines would basically link the timer
> > to itself.
> >
> > I could be wrong though, that is just a guess.
>
> I think your analysis is correct, I managed to create a test case that
> shows pretty much the behaviour you're describing.
>
> Guarding most of xtimer_set() (using disableIRQ/restoreIRQ) fixes the
> problem, but disables interrupts for the backoff spin loop.
>
> While hanging within xtimer is probably the worst, I'm not sure what
> would be the best semantic for concurrently setting the same timer object:
>
> - return an error (cleanest, but currently xtimer_set() never returns an
> error)
> - first xtimer_set() wins (easy to implement by somehow tagging the
> timer object, but probably unexpected)
> - second xtimer_set() wins (very hard to do as xtimer_set() can be
> called in ISR context, and there's no way to wait for, e.g., a mutex)
> - guard the whole timer setting procedure (would disable interrupts
> while spinning)
> - ?
>
> Opinions?
>
> Kaspar


[riot-devel] Odd problems with xtimer

2016-02-24 Thread malo
Hello Kaspar,
sorry for the out-of-order message, but I was not on the list when this
thread was started.

Adding my two cents:
the current xtimer is "just" not thread safe and should not be
used/set from multiple threads (hope we agree on this). The question
is whether somewhere in the RIOT core a single xtimer_t* is set from
multiple threads.
Or was it just userspace misuse...
Hope the second is the case :)


As for returning an error from xtimer_set - what would you do with the
return value? Maybe something like this?
while (true)
{
    if (!xtimer_set())
    {
        yield();
    }
    else
    {
        break;
    }
}
Quite ugly...


Sure, there should be a check for duplicates in _add_timer_to_long_list
and _add_timer_to_list, with an assert that fires when the pointer being
added is already in the list - just to warn the user that they are doing
something bad.
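
The duplicate check suggested above could look roughly like the following.
This is only a minimal sketch, assuming a singly linked list sorted by target
time. The demo_timer_t type and the demo_list_contains and
demo_add_timer_to_list names are hypothetical stand-ins, not the actual
xtimer internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for xtimer_t: only the fields the list needs. */
typedef struct demo_timer {
    struct demo_timer *next;
    uint32_t target;
} demo_timer_t;

/* Return 1 if `timer` is already linked into the list, 0 otherwise. */
static int demo_list_contains(const demo_timer_t *head, const demo_timer_t *timer)
{
    for (; head != NULL; head = head->next) {
        if (head == timer) {
            return 1;
        }
    }
    return 0;
}

/* Sorted insert that refuses to link the same object twice. */
static void demo_add_timer_to_list(demo_timer_t **list_head, demo_timer_t *timer)
{
    /* Fail loudly instead of silently corrupting the list. */
    assert(!demo_list_contains(*list_head, timer));

    while (*list_head != NULL && (*list_head)->target <= timer->target) {
        list_head = &((*list_head)->next);
    }
    timer->next = *list_head;
    *list_head = timer;
}
```

The walk costs O(n) per insert, so it is probably something to enable only in
debug builds.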

wbr
malo

> 
>  would
> end with list_head equal to the timer (assuming no other timer has the
> same time), and then the next two lines would basically link the timer
> to itself.
>
> I could be wrong though, that is just a guess.

I think your analysis is correct, I managed to create a test case that
shows pretty much the behaviour you're describing.

Guarding most of xtimer_set() (using disableIRQ/restoreIRQ) fixes the
problem, but disables interrupts for the backoff spin loop.

While hanging within xtimer is probably the worst, I'm not sure what
would be the best semantic for concurrently setting the same timer object:

- return an error (cleanest, but currently xtimer_set() never returns an
error)
- first xtimer_set() wins (easy to implement by somehow tagging the
timer object, but probably unexpected)
- second xtimer_set() wins (very hard to do as xtimer_set() can be
called in ISR context, and there's no way to wait for, e.g., a mutex)
- guard the whole timer setting procedure (would disable interrupts
while spinning)
- ?

Opinions?

Kaspar


Re: [riot-devel] Odd problems with xtimer

2016-02-19 Thread Kaspar Schleiser
Hey Michael,

On 02/10/2016 11:08 PM, Michael Andersen wrote:
> One question, in the network stacks, are there ever two threads possibly
> using the same timer object? I ask because the timer_remove and the
> insert are in two different critical sections, and if there are
> concurrent calls with the same timer object then it might be possible to
> interrupt between the critical sections and insert a timer that is
> already in the list. What would then happen is that this loop would
> end with list_head equal to the timer (assuming no other timer has the
> same time), and then the next two lines would basically link the timer
> to itself.
> 
> I could be wrong though, that is just a guess.

I think your analysis is correct, I managed to create a test case that
shows pretty much the behaviour you're describing.
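
To illustrate the failure mode described above, here is a minimal sketch,
assuming a sorted singly linked list with an insert loop similar to the one
under discussion. The demo_timer_t type and the demo_insert and
demo_double_insert_self_links names are hypothetical and are not the xtimer
source or the actual test case.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for xtimer_t. */
typedef struct demo_timer {
    struct demo_timer *next;
    uint32_t target;
} demo_timer_t;

/* Sorted insert without any duplicate check (an assumption modelled on
 * the loop discussed in this thread, not the verbatim source). */
static void demo_insert(demo_timer_t **list_head, demo_timer_t *timer)
{
    while (*list_head != NULL && (*list_head)->target <= timer->target) {
        list_head = &((*list_head)->next);
    }
    timer->next = *list_head;
    *list_head = timer;
}

/* Insert the same object twice and report whether it now points to itself. */
static int demo_double_insert_self_links(void)
{
    demo_timer_t early = { NULL, 100 };
    demo_timer_t dup = { NULL, 200 };
    demo_timer_t *head = NULL;

    demo_insert(&head, &early);
    demo_insert(&head, &dup);
    demo_insert(&head, &dup);   /* second insert, no remove in between */

    /* The walk stops at &dup.next, so the final store links dup to itself. */
    return dup.next == &dup;
}
```

Any later walk over such a list never reaches NULL, which matches the endless
loop reported at the start of the thread.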

Guarding most of xtimer_set() (using disableIRQ/restoreIRQ) fixes the
problem, but disables interrupts for the backoff spin loop.

While hanging within xtimer is probably the worst, I'm not sure what
would be the best semantic for concurrently setting the same timer object:

- return an error (cleanest, but currently xtimer_set() never returns an
error)
- first xtimer_set() wins (easy to implement by somehow tagging the
timer object, but probably unexpected)
- second xtimer_set() wins (very hard to do as xtimer_set() can be
called in ISR context, and there's no way to wait for, e.g., a mutex)
- guard the whole timer setting procedure (would disable interrupts
while spinning)
- ?

Opinions?
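
As a rough illustration of the first two options combined (the first set wins,
the loser gets an error), here is a sketch that tags the timer object with a
flag. It assumes C11 atomics are available on the target. The busy field, the
demo_timer_t type and demo_timer_set are hypothetical and not part of the
xtimer API.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical timer object carrying a "set in progress" tag. */
typedef struct demo_timer {
    struct demo_timer *next;
    uint32_t target;
    atomic_flag busy;
} demo_timer_t;

/* Returns 0 on success, -1 if another context is already setting this
 * timer. Nothing here blocks, so it could also be called from an ISR. */
static int demo_timer_set(demo_timer_t *timer, uint32_t target)
{
    if (atomic_flag_test_and_set(&timer->busy)) {
        return -1;  /* somebody else got here first */
    }

    timer->target = target;
    /* ... remove from the list, spin for short offsets, insert ... */

    atomic_flag_clear(&timer->busy);
    return 0;
}
```

The caller still has to decide what to do with the -1, which is the open
question raised in the list above.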

Kaspar


Re: [riot-devel] Odd problems with xtimer

2016-02-12 Thread Michael Andersen
Hi

I am using other IPC messages, yes. There is a thread waiting with
xtimer_msg_receive_timeout that gets messages either from xtimer_set_msg or
from the network stack on packet reception.

Incidentally, if I decrease the load on the MCU by increasing the sensor
sampling interval, the problem seems to go away (or at least it has not
shown again in the past day).

I inserted a trap in the timer code that stops everything and lets me
debug if a timer with the same address as the timer to be inserted is
already in the linked list. I'll let you know with more info when I
reproduce it.

Thanks
Michael



On Thu, Feb 11, 2016 at 2:05 AM, Joakim Nohlgård 
wrote:

> Also, you can use the .map file to find out if there are any buffers
> or other things nearby which may have overflowed and messed up your
> state.
>
> Are you using any IPC messages other than the xtimer functions?
> (I wonder if there might be a race between the timer ISR callbacks and
> the message reception in xtimer)
>
> Regards, Joakim

Re: [riot-devel] Odd problems with xtimer

2016-02-11 Thread Joakim Nohlgård
Also, you can use the .map file to find out if there are any buffers
or other things nearby which may have overflowed and messed up your
state.

Are you using any IPC messages other than the xtimer functions?
(I wonder if there might be a race between the timer ISR callbacks and
the message reception in xtimer)

Regards, Joakim

On Wed, Feb 10, 2016 at 11:21 PM, Martine Lenders
 wrote:
> Hi,
> normally you can guess where the timer came from by looking at the address
> (or the debugger straight tells you). Is this somehow possible for your case
> (i.e. 0x200010a4)? That might be helpful for the timer people.
>
> Regards,
> Martine
>
> 2016-02-10 23:08 GMT+01:00 Michael Andersen :
>>
>> Hi
>>
>> Thanks for the reply. I am on a platform essentially equal to a
>> samr21xpro.
>>
>> The short answers:
>>  - samr21xpro
>> - only one declared xtimer_t object that is used more than once. I use it
>> with xtimer_set_msg for a thread to send itself a message. Both the timer
>> and the msg object are statically allocated. On the other hand, I have RPL
>> and all sorts of network things going and I have no doubt there are a ton of
>> timers involved. In terms of ephemeral timers, I call xtimer_usleep a LOT
>> with intervals of between 1ms and 100ms from multiple threads. I also send
>> packets every 200ms or so and receive them every 500ms or so.
>>  -The interrupt load might be pretty steep if the radio is interrupting on
>> every packet (promiscuous mode). I don't think it is. Otherwise I would
>> imagine that other than the timers it is less than ten per second.
>>
>> As for memory corruption, that may well be the case. I will double check
>> my code. I thought it was somewhat unusual that multiple boards would all
>> get a timer pointing to itself, but I suppose not all corruption is
>> non-deterministic and they all run identical firmware, so it might be
>> corruption.
>>
>> One question, in the network stacks, are there ever two threads possibly
>> using the same timer object? I ask because the timer_remove and the insert
>> are in two different critical sections, and if there are concurrent calls
>> with the same timer object then it might be possible to interrupt between
>> the critical sections and insert a timer that is already in the list. What
>> would then happen is that this loop would end with list_head equal to the
>> timer (assuming no other timer has the same time), and then the next two
>> lines would basically link the timer to itself.
>>
>> I could be wrong though, that is just a guess.
>>
>>
>> On Wed, Feb 10, 2016 at 2:45 AM, Kaspar Schleiser 
>> wrote:
>>>
>>> Hey Michael,
>>>
>>> On 02/10/2016 07:57 AM, Michael Andersen wrote:
>>> > it seems that one of the nodes in the list points to itself, hence the
>>> > endless loop.
>>> >
>>> > My first question is: when is this possible? It seems at first glance
>>> > that all code paths that lead here call remove_timer to prevent this
>>> > sort of problem.
>>> It should not be possible (tm).
>>>
>>> I took another look at the code, it seems to me that timer->next gets
>>> overwritten whenever a timer is set, so there can't be some outdated
>>> value.
>>>
>>> It might be that the list logic has a bug somewhere, but I remember
>>> testing them quite rigourously.
>>>
>>> > I don't access a the same timer object from two
>>> > different threads. My code using xtimer functions is not reentered.
>>> >
>>> > I don't use that many timer operations in my application code, but I do
>>> > assume that the following functions don't require any freeing or
>>> > removing afterwards, am I wrong?
>>> Completely right.
>>>
>>> Could you tell us more on how you are using timers?
>>>
>>> Interesting would be things like
>>>
>>> - what platform are you on
>>> - how many timers are simultaneously active
>>> - how are the intervals
>>> - how is the interrupt load
>>>
>>> ... that might help corner the issue.
>>>
>>> You should consider xtimer just showing a problem which might be caused
>>> by memory corruption.
>>>
>>> Kaspar
___
devel mailing list
devel@riot-os.org
https://lists.riot-os.org/mailman/listinfo/devel


Re: [riot-devel] Odd problems with xtimer

2016-02-10 Thread Martine Lenders
Hi,
normally you can guess where the timer came from by looking at its address
(or the debugger tells you directly). Is this somehow possible in your
case (i.e. 0x200010a4)? That might be helpful for the timer people.

Regards,
Martine

2016-02-10 23:08 GMT+01:00 Michael Andersen :

> Hi
>
> Thanks for the reply. I am on a platform essentially equal to a
> samr21xpro.
>
> The short answers:
> - samr21xpro
> - Only one declared xtimer_t object that is used more than once. I use it
>   with xtimer_set_msg for a thread to send itself a message. Both the
>   timer and the msg object are statically allocated. On the other hand, I
>   have RPL and all sorts of network things going and I have no doubt there
>   are a ton of timers involved. In terms of ephemeral timers, I call
>   xtimer_usleep a LOT with intervals of between 1ms and 100ms from
>   multiple threads. I also send packets every 200ms or so and receive them
>   every 500ms or so.
> - The interrupt load might be pretty steep if the radio is interrupting on
>   every packet (promiscuous mode). I don't think it is. Otherwise I would
>   imagine that other than the timers it is less than ten per second.
>
> As for memory corruption, that may well be the case. I will double check
> my code. I thought it was somewhat unusual that multiple boards would all
> get a timer pointing to itself, but I suppose not all corruption is
> non-deterministic and they all run identical firmware, so it might be
> corruption.
>
> One question, in the network stacks, are there ever two threads possibly
> using the same timer object? I ask because the timer_remove and the insert
> are in two different critical sections, and if there are concurrent calls
> with the same timer object then it might be possible to interrupt between
> the critical sections and insert a timer that is already in the list. What
> would then happen is that this loop would end with list_head equal to the
> timer (assuming no other timer has the same time), and then the next two
> lines would basically link the timer to itself.
>
> I could be wrong though, that is just a guess.
>
>
> On Wed, Feb 10, 2016 at 2:45 AM, Kaspar Schleiser 
> wrote:
>
>> Hey Michael,
>>
>> On 02/10/2016 07:57 AM, Michael Andersen wrote:
>> > it seems that one of the nodes in the list points to itself, hence the
>> > endless loop.
>> >
>> > My first question is: when is this possible? It seems at first glance
>> > that all code paths that lead here call remove_timer to prevent this
>> > sort of problem.
>> It should not be possible (tm).
>>
>> I took another look at the code, it seems to me that timer->next gets
>> overwritten whenever a timer is set, so there can't be some outdated
>> value.
>>
>> It might be that the list logic has a bug somewhere, but I remember
>> testing them quite rigorously.
>>
>> > I don't access the same timer object from two
>> > different threads. My code using xtimer functions is not reentered.
>> >
>> > I don't use that many timer operations in my application code, but I do
>> > assume that the following functions don't require any freeing or
>> > removing afterwards, am I wrong?
>> Completely right.
>>
>> Could you tell us more on how you are using timers?
>>
>> Interesting would be things like
>>
>> - what platform are you on
>> - how many timers are simultaneously active
>> - how are the intervals
>> - how is the interrupt load
>>
>> ... that might help corner the issue.
>>
>> You should consider xtimer just showing a problem which might be caused
>> by memory corruption.
>>
>> Kaspar


Re: [riot-devel] Odd problems with xtimer

2016-02-10 Thread Michael Andersen
Hi

Thanks for the reply. I am on a platform essentially equal to a samr21xpro.

The short answers:
- samr21xpro
- Only one declared xtimer_t object that is used more than once. I use it
  with xtimer_set_msg for a thread to send itself a message. Both the timer
  and the msg object are statically allocated. On the other hand, I have RPL
  and all sorts of network things going and I have no doubt there are a ton
  of timers involved. In terms of ephemeral timers, I call xtimer_usleep a
  LOT with intervals of between 1ms and 100ms from multiple threads. I also
  send packets every 200ms or so and receive them every 500ms or so.
- The interrupt load might be pretty steep if the radio is interrupting on
  every packet (promiscuous mode). I don't think it is. Otherwise I would
  imagine that other than the timers it is less than ten per second.

As for memory corruption, that may well be the case. I will double check my
code. I thought it was somewhat unusual that multiple boards would all get
a timer pointing to itself, but I suppose not all corruption is
non-deterministic and they all run identical firmware, so it might be
corruption.

One question, in the network stacks, are there ever two threads possibly
using the same timer object? I ask because the timer_remove and the insert
are in two different critical sections, and if there are concurrent calls
with the same timer object then it might be possible to interrupt between
the critical sections and insert a timer that is already in the list. What
would then happen is that this loop would end with list_head equal to the
timer (assuming no other timer has the same time), and then the next two
lines would basically link the timer to itself.

I could be wrong though, that is just a guess.
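
A minimal sketch of the scenario described above, assuming a sorted insert
that walks the list and then links the timer in with two assignments. The
type demo_timer_t and the demo_* functions are hypothetical stand-ins, not
the real xtimer code; the loop condition is an assumption as well.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for xtimer_t (not the real RIOT type). */
typedef struct demo_timer {
    struct demo_timer *next;
    uint32_t target;
} demo_timer_t;

/* Sorted insert. The loop shape is an assumption modelled on the loop
 * discussed above, not a copy of xtimer_core. */
static void demo_add_to_list(demo_timer_t **list_head, demo_timer_t *timer)
{
    while (*list_head && (*list_head)->target <= timer->target) {
        list_head = &((*list_head)->next);
    }
    timer->next = *list_head;
    *list_head = timer;
}

/* Unlinks timer from the list if it is present. */
static void demo_remove_from_list(demo_timer_t **list_head, demo_timer_t *timer)
{
    while (*list_head) {
        if (*list_head == timer) {
            *list_head = timer->next;
            timer->next = NULL;
            return;
        }
        list_head = &((*list_head)->next);
    }
}

/* Builds a(10) -> t(20) -> b(30), then inserts t again.
 * remove_first != 0: t is removed before the second insert (normal path).
 * remove_first == 0: the remove is skipped, as if another caller had
 * re-inserted t between the two critical sections.
 * Returns 1 if t ends up pointing to itself, 0 otherwise. */
int demo_reinsert(int remove_first)
{
    demo_timer_t a = { NULL, 10 };
    demo_timer_t t = { NULL, 20 };
    demo_timer_t b = { NULL, 30 };
    demo_timer_t *head = NULL;

    demo_add_to_list(&head, &a);
    demo_add_to_list(&head, &t);
    demo_add_to_list(&head, &b);

    if (remove_first) {
        demo_remove_from_list(&head, &t);
    }
    demo_add_to_list(&head, &t);

    return t.next == &t;
}
```

In this model demo_reinsert(0) leaves t.next pointing at t itself, while
demo_reinsert(1) leaves the list intact. Whether the real code can reach the
first case depends on whether two contexts ever set the same timer object.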


On Wed, Feb 10, 2016 at 2:45 AM, Kaspar Schleiser 
wrote:

> Hey Michael,
>
> On 02/10/2016 07:57 AM, Michael Andersen wrote:
> > it seems that one of the nodes in the list points to itself, hence the
> > endless loop.
> >
> > My first question is: when is this possible? It seems at first glance
> > that all code paths that lead here call remove_timer to prevent this
> > sort of problem.
> It should not be possible (tm).
>
> I took another look at the code, it seems to me that timer->next gets
> overwritten whenever a timer is set, so there can't be some outdated value.
>
> It might be that the list logic has a bug somewhere, but I remember
> testing them quite rigorously.
>
> > I don't access the same timer object from two
> > different threads. My code using xtimer functions is not reentered.
> >
> > I don't use that many timer operations in my application code, but I do
> > assume that the following functions don't require any freeing or
> > removing afterwards, am I wrong?
> Completely right.
>
> Could you tell us more on how you are using timers?
>
> Interesting would be things like
>
> - what platform are you on
> - how many timers are simultaneously active
> - how are the intervals
> - how is the interrupt load
>
> ... that might help corner the issue.
>
> You should consider xtimer just showing a problem which might be caused
> by memory corruption.
>
> Kaspar


Re: [riot-devel] Odd problems with xtimer

2016-02-10 Thread Kaspar Schleiser
Hey Michael,

On 02/10/2016 07:57 AM, Michael Andersen wrote:
> it seems that one of the nodes in the list points to itself, hence the
> endless loop.
> 
> My first question is: when is this possible? It seems at first glance
> that all code paths that lead here call remove_timer to prevent this
> sort of problem. 
It should not be possible (tm).

I took another look at the code, it seems to me that timer->next gets
overwritten whenever a timer is set, so there can't be some outdated value.

It might be that the list logic has a bug somewhere, but I remember
testing them quite rigorously.

> I don't access the same timer object from two
> different threads. My code using xtimer functions is not reentered.
> 
> I don't use that many timer operations in my application code, but I do
> assume that the following functions don't require any freeing or
> removing afterwards, am I wrong?
Completely right.

Could you tell us more on how you are using timers?

Interesting would be things like

- what platform are you on
- how many timers are simultaneously active
- how are the intervals
- how is the interrupt load

... that might help corner the issue.

You should consider xtimer just showing a problem which might be caused
by memory corruption.

Kaspar


[riot-devel] Odd problems with xtimer

2016-02-09 Thread Michael Andersen
Hi

I am new to RIOT, so I hope this is user error, but I am having some grief
inside xtimer. I am running a mesh of nodes with a full RPL stack (in case
that's relevant) and the biggest problem I have at the moment is nodes
hanging in _add_timer_to_list in xtimer_core.

Using GDB, it seems that one of the nodes in the list points to itself,
hence the endless loop.

(gdb) print/x *list_head
$3 = 0x200010a4
(gdb) print/x (*list_head)->next
$4 = 0x200010a4
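
A cycle check on the list can turn this kind of hang into a condition that a
breakpoint can catch. The following is a minimal sketch using Floyd's
tortoise-and-hare; dbg_node_t and the dbg_* functions are hypothetical names
for illustration and are not part of xtimer.

```c
#include <stddef.h>

/* Hypothetical minimal node; only the next pointer matters here. */
typedef struct dbg_node {
    struct dbg_node *next;
} dbg_node_t;

/* Floyd's tortoise-and-hare: returns 1 if following next pointers from
 * head loops forever (including a node that points to itself), 0 if the
 * walk reaches NULL. */
int dbg_list_has_cycle(const dbg_node_t *head)
{
    const dbg_node_t *slow = head;
    const dbg_node_t *fast = head;

    while (fast && fast->next) {
        slow = slow->next;          /* one step */
        fast = fast->next->next;    /* two steps */
        if (slow == fast) {
            return 1;               /* the pointers met: there is a cycle */
        }
    }
    return 0;
}

/* Small self-check: a -> b -> c has no cycle; after b is made to point
 * to itself, the cycle is reported. Returns 1 if both checks hold. */
int dbg_self_check(void)
{
    dbg_node_t c = { NULL };
    dbg_node_t b = { &c };
    dbg_node_t a = { &b };

    int ok_before = (dbg_list_has_cycle(&a) == 0);
    b.next = &b;
    int ok_after = (dbg_list_has_cycle(&a) == 1);

    return ok_before && ok_after;
}
```

Called at the start of an insert routine in a debug build, such a check
would flag the corrupted list at the point where it is first observed
rather than spinning in the loop.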

My first question is: when is this possible? It seems at first glance that
all code paths that lead here call remove_timer to prevent this sort of
problem. I don't access the same timer object from two different threads.
My code using xtimer functions is not reentered.

I don't use that many timer operations in my application code, but I do
assume that the following functions don't require any freeing or removing
afterwards, am I wrong?

xtimer_now
xtimer_set_msg (the msg is statically allocated)
xtimer_msg_receive_timeout
xtimer_usleep_until

Any help would be appreciated.