Hi Gavin,

Thanks for the answer.

On 09/29/2016 01:21 AM, Gavin Lambert wrote:
Given that stack trace, and that it works on default but not 1.5.2, then
most likely the commit that worked around the issue for you was
https://sourceforge.net/p/etherlabmaster/code/ci/3affe9cd0b66fe55ef8e8060778ef9461a8204a0.

Having said that, given that the only reason I can think of that this would
segfault is if strerror returned NULL or an invalid pointer, it suggests
that you might have a broken or badly configured libc.  If you're building
the libc yourself, make sure that you're using an up-to-date version and
haven't excluded the strerror text.

Another possibility is that if you were concurrently calling strerror() on
another thread (and your libc doesn't implement strerror in a thread-local
manner) then it could have corrupted the buffer.  Most likely another patch
would be required to resolve this "properly", although one workaround for
this is to avoid calling ecrt_* APIs from more than one thread.
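
As an illustration of the thread-safety concern above, here is a minimal
sketch of a wrapper around strerror_r() that writes into a caller-supplied
buffer instead of relying on strerror()'s shared one. The name
safe_strerror is hypothetical and not part of the EtherCAT master; the
sketch assumes a POSIX libc and handles both the XSI and the GNU flavour of
strerror_r().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* request strerror_r() from <string.h> */
#endif

#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of the EtherCAT master): stores the
 * message for errnum in the caller's buffer and returns that buffer. */
static const char *safe_strerror(int errnum, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
#if defined(__GLIBC__) && defined(_GNU_SOURCE)
    /* GNU variant: returns a pointer that may or may not be buf. */
    const char *msg = strerror_r(errnum, buf, len);
    if (msg != buf) {
        snprintf(buf, len, "%s", msg ? msg : "unknown error");
    }
#else
    /* XSI variant: returns 0 on success and fills buf. */
    if (strerror_r(errnum, buf, len) != 0) {
        snprintf(buf, len, "unknown error %d", errnum);
    }
#endif
    return buf;
}
```

A caller would keep the buffer on its own stack, for example
fprintf(stderr, "Failed: %s\n", safe_strerror(errno, buf, sizeof buf)), so
that no two threads ever share the message storage.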

Although I suppose since you're linking to RTDM it's possible that
strerror() is coming from there rather than the libc; I'm not exactly sure
how RTAI/Xenomai work.  Or possibly that in that context it could be that
the fprintf(stderr) itself is failing -- but this isn't new code so I would
have thought the problem would have come up earlier if that were the case.

I'm not sure exactly which commit 1.5.2 is based on, but it will be one of
the ones in the "stable-1.5" branch.  Everything on "default" is newer than
that.
My test application has only one Xenomai task (thread), like the Xenomai
example, so I don't think this is a concurrency problem unless a thread of
the master itself is involved. My libc is rather old, though (2.13).
Unfortunately, no newer version has been backported to Debian wheezy, and I
don't want to install one from source since it won't be available in our
working environments anyway. I will leave this matter for now since it
seems to be fixed, or at least worked around, already. I needed to know
which commit fixed it so that I can add the fix as a patch to our Debian
package of the EtherCAT master.

#2.)
I did some minor tests with the patch queue and got some bad system
freezes with the Xenomai example. I was able to identify the patch that
seems to cause the freezes:
0011-Master-locks-to-avoid-corrupted-datagram-queue.patch
The only notable thing I could see in the kernel log is that the slaves went
back to PREOP. The Xenomai task was still alive but hung at some point in
the cycle (I had placed an rt_printf in the cycle that should have printed
the cycle_counter value every other second).
The patch series seems to work if I apply the patches only up to
0010-Sdo-directory-now-only-fetched-on-request.patch. Is this reproducible
for you?
I'm not sure about this as I don't use Xenomai myself.  That particular
patch was authored by Knud Baastrup, so I've added him to the email chain
directly just in case.  If I recall correctly I think he, like myself, was
using PREEMPT_RT so it's possible that this has not been tested with
Xenomai.

Do you have locking on the Xenomai side as well?  Do you call ecrt APIs from
multiple Xenomai tasks?  I believe the patch assumes that there is no
external locking between tasks, so you might be running into deadlocks
depending on the order in which things happen.

Using Linux locks between Xenomai tasks is probably not ideal, but I would
have expected that it ought to work as this occurs in other places as well.
This problem occurred with the Xenomai example (./examples/xenomai in the
master's source code) as well. There is only one Xenomai task and no
explicit locking on the application side. I am new to Xenomai, but as far
as I understand it, Xenomai uses a 'dual kernel' configuration called
'cobalt core', which has higher priority than the normal kernel and does
all the scheduling of real-time tasks (see
https://xenomai.org/start-here/#How_does_Xenomai_deliver_real-time). A
Xenomai task should therefore keep every task in normal kernel space from
running until the Xenomai task itself has run. My guess is that the Xenomai
task waits indefinitely for a master component to be unlocked by another
thread in kernel space, which never happens because that thread is never
scheduled due to the higher priority of the Xenomai task.
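
To make the guess above easier to check, here is a minimal sketch of how a
cyclic task could notice a lock that never becomes free instead of hanging
on it. It uses a plain POSIX mutex purely for illustration and does not
touch the master's internal locks, which live in the kernel module; the
names lock_probe_t and probe_lock are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical diagnostic state: counts consecutive cycles in which
 * the lock could not be taken. */
typedef struct {
    pthread_mutex_t *lock;
    unsigned busy_cycles;
} lock_probe_t;

/* Try to take the lock once without blocking. Returns true if the lock
 * was acquired (the caller must unlock it), false otherwise. */
static bool probe_lock(lock_probe_t *p)
{
    assert(p != NULL && p->lock != NULL);
    int rc = pthread_mutex_trylock(p->lock);
    if (rc == 0) {
        p->busy_cycles = 0;   /* lock was free, reset the counter */
        return true;
    }
    if (rc == EBUSY) {
        p->busy_cycles++;     /* still held by someone else */
    }
    return false;
}
```

If busy_cycles keeps growing from one cycle to the next, the holder of the
lock is apparently never scheduled, which would match the starvation
scenario described above.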


Best regards,
Christoph




________________________________

Helmholtz-Zentrum Berlin für Materialien und Energie GmbH

Member of the Hermann von Helmholtz-Gemeinschaft Deutscher Forschungszentren e.V.

Supervisory Board: Chair Dr. Karl Eugen Huthmacher, Deputy Chair Dr. Jutta Koch-Unterseher
Management: Prof. Dr. Anke Rita Kaysser-Pyzalla, Thomas Frederking

Registered office: Berlin, AG Charlottenburg, 89 HRB 5583

Postal address:
Hahn-Meitner-Platz 1
D-14109 Berlin

http://www.helmholtz-berlin.de
