Hi Gavin, thanks for the answer.
On 09/29/2016 01:21 AM, Gavin Lambert wrote:
> Given that stack trace, and that it works on default but not 1.5.2, then most likely the commit that worked around the issue for you was https://sourceforge.net/p/etherlabmaster/code/ci/3affe9cd0b66fe55ef8e8060778ef9461a8204a0.
>
> Having said that, given that the only reason I can think of that this would segfault is if strerror returned NULL or an invalid pointer, it suggests that you might have a broken or badly configured libc. If you're building the libc yourself, make sure that you're using an up-to-date version and haven't excluded the strerror text.
>
> Another possibility is that if you were concurrently calling strerror() on another thread (and your libc doesn't implement strerror in a thread-local manner) then it could have corrupted the buffer. Most likely another patch would be required to resolve this "properly", although one workaround for this is to avoid calling ecrt_* APIs from more than one thread.
>
> Although I suppose since you're linking to RTDM it's possible that strerror() is coming from there rather than the libc; I'm not exactly sure how RTAI/Xenomai work. Or possibly that in that context it could be that the fprintf(stderr) itself is failing -- but this isn't new code so I would have thought the problem would have come up earlier if that were the case.
>
> I'm not sure exactly which commit 1.5.2 is based on, but it will be one of the ones in the "stable-1.5" branch. Everything on "default" is newer than that.
My test application has only one Xenomai task (thread), like the xenomai example, so I don't think this is a concurrency problem unless a thread of the master itself is involved. My libc is rather old, though (2.13). Unfortunately there is no newer version backported for Debian wheezy, and I don't want to install it from source since it won't be available in our working environments anyway.

I will leave this matter for now since it seems to be fixed, or at least avoided, already. I needed to know when this was fixed so that I can add the fix as a patch to our Debian package of the EtherCAT master.
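For reference, the two failure modes mentioned above (strerror() returning an unusable pointer, or its shared buffer being overwritten by another thread) can both be sidestepped by formatting the message into a caller-owned buffer. The following is only a minimal sketch and not the change made in the commit linked above; safe_strerror is a hypothetical helper name, and the preprocessor check assumes that glibc with _GNU_SOURCE provides the GNU variant of strerror_r while everything else provides the XSI variant.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of the EtherCAT master): writes the message
 * for errnum into the caller's buffer instead of relying on a buffer that
 * strerror() may share between threads. Always returns buf, never NULL. */
const char *safe_strerror(int errnum, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    buf[0] = '\0';
#if defined(__GLIBC__) && defined(_GNU_SOURCE)
    /* GNU variant: returns a pointer that may or may not be buf. */
    const char *msg = strerror_r(errnum, buf, len);
    if (msg != NULL && msg != buf)
        snprintf(buf, len, "%s", msg);
#else
    /* XSI variant: returns 0 on success and fills buf. */
    if (strerror_r(errnum, buf, len) != 0)
        buf[0] = '\0';
#endif
    /* Fall back to the plain number if no text could be obtained. */
    if (buf[0] == '\0')
        snprintf(buf, len, "error %d", errnum);
    return buf;
}
```

A caller would then pass a local buffer, e.g. `char msg[128]; fprintf(stderr, "Failed: %s\n", safe_strerror(errno, msg, sizeof msg));`, so that no state is shared between threads.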
>> #2.) I did some minor tests with the patch queue and got some bad system freezes with the xenomai example. I could locate the patch that seems to cause the system freezes:
>>
>> 0011-Master-locks-to-avoid-corrupted-datagram-queue.patch
>>
>> The only notable thing I could see in the kernel log is that the slaves went back to PREOP. The Xenomai task was still running and hanging at some point of the cycle (I placed an rt_printf in the cycle which should have printed the cycle_counter value every other second). The patch series seems to work if I apply the patches up to 0010-Sdo-directory-now-only-fetched-on-request.patch.
>>
>> Is this reproducible for you?
>
> I'm not sure about this as I don't use Xenomai myself. That particular patch was authored by Knud Baastrup, so I've added him to the email chain directly just in case. If I recall correctly I think he, like myself, was using PREEMPT_RT so it's possible that this has not been tested with Xenomai.
>
> Do you have locking on the Xenomai side as well? Do you call ecrt APIs from multiple Xenomai tasks? I believe the patch assumes that there is no external locking between tasks, so you might be running into deadlocks depending on the order in which things happen. Using Linux locks between Xenomai tasks is probably not ideal, but I would have expected that it ought to work as this occurs in other places as well.
This problem occurred with the xenomai example (./examples/xenomai in the master's source code) as well. There is only one Xenomai task and no explicit locking on the application side.

I am new to Xenomai, but as far as I understand it, Xenomai uses a 'dual kernel' configuration called 'cobalt core', which has higher priority than the normal kernel and does all the scheduling of realtime tasks (see https://xenomai.org/start-here/#How_does_Xenomai_deliver_real-time). A Xenomai task should therefore keep every task in normal kernel space from running until the Xenomai task itself has finished or suspends. My guess is that the task waits forever for a master component to be unlocked by another thread in kernel space, which never happens because that thread is never scheduled while the higher-priority Xenomai task is runnable.

Best regards,
Christoph
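P.S. The starvation scenario guessed at above can be illustrated with a toy model. This is only a sketch under strong assumptions (one CPU, strict priorities, discrete ticks) and says nothing about how Cobalt or the master patch are actually implemented; the function name and parameters are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: a low-priority thread holds a lock from tick 0 and needs
 * hold_ticks ticks of CPU time before it releases it. A high-priority RT
 * task wants the same lock. Returns the tick at which the RT task obtains
 * the lock, or -1 if it does not get it within max_ticks. */
int rt_lock_acquired_at(int max_ticks, int hold_ticks,
                        bool rt_sleeps_while_waiting)
{
    assert(max_ticks >= 0 && hold_ticks > 0);
    int remaining = hold_ticks;   /* CPU time the lock holder still needs */
    bool locked = true;

    for (int tick = 0; tick < max_ticks; tick++) {
        if (!locked)
            return tick;          /* RT task runs first and takes the lock */
        if (rt_sleeps_while_waiting) {
            /* RT task is suspended, so the low-priority holder gets the CPU. */
            if (--remaining == 0)
                locked = false;
        }
        /* Otherwise the RT task spins, stays runnable, and the holder
         * never gets any CPU time to release the lock. */
    }
    return -1;
}
```

In this model, `rt_lock_acquired_at(100, 3, true)` yields 3, whereas `rt_lock_acquired_at(100, 3, false)` yields -1 no matter how large max_ticks is, which would match a task that keeps running but never gets past one point in its cycle.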