I'm not entirely sure, but I don't think simply changing that would be safe.

The whole "on interrupt return -EINTR" thing assumes that it's safe to simply
make the exact same call again to "resume" the operation.  This is true for the
first wait because the request is still only sitting in the queue, and on
interrupt it is simply dequeued again.  However, after that there's a race where
the request might have already been sent and be waiting for a response, and in
that case it's not safe to return -EINTR because the request might end up being
sent a second time, which could cause incorrect behavior of the slave.  (It
would probably also confuse the mailbox FSM.)
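
To make that distinction concrete, here is a small userspace model of the
decision.  It is not the actual master.c code, just a sketch: the type and
function names (request_t, request_interrupted, the REQ_* states) are made up
for illustration.  The point is only that a signal may turn into -EINTR while
the request is still queued, and not afterwards.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical request states, loosely modelled on the description above. */
typedef enum { REQ_QUEUED, REQ_BUSY, REQ_DONE } req_state_t;

typedef struct {
    req_state_t state;
    bool in_queue;   /* true while the master has not picked it up yet */
} request_t;

/*
 * Called when the waiting thread is hit by a signal.
 * Returns -EINTR if the call can be abandoned and safely repeated later,
 * or 0 if the caller has to keep waiting for the response.
 */
int request_interrupted(request_t *req)
{
    assert(req);

    if (req->state == REQ_QUEUED) {
        /* Nothing has gone out on the wire yet: dequeue and bail out. */
        req->in_queue = false;
        return -EINTR;
    }

    /* Already sent (or finished): repeating the call could send it twice. */
    return 0;
}
```

With this model, a request that is still queued yields -EINTR and is removed
from the queue, while one in the busy state yields 0 and the wait has to
continue.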

It might be possible to abort the request on interrupt instead, but that would 
be annoying as thread signals can cause spurious interrupts.  (And might still 
end up meaning the slave will receive requests twice, if the app then 
explicitly retries.)


If you instead explicitly close(masterfd) (aka ecrt_release_master) in your 
problem case, this should abort all pending requests and wake up the threads - 
you can see the code that does this in ec_slave_clear and 
ec_master_clear_slaves.

(The OS will automatically do this when your process actually terminates, but 
not while you still have a live thread.  So you will have to use an 
exception/signal handler to intercept the crash in progress.)

Another option is to use the non-blocking SDO request APIs instead.  Using 
these (on the cyclic thread) is better anyway for regular transfers done while 
the master is activated, as it avoids ping-ponging the master locks between 
multiple threads, which can increase cycle latency.
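
The polling pattern looks roughly like the sketch below.  To keep it
self-contained it uses a mock request object (mock_sdo_request_t,
mock_request_read and poll_sdo are hypothetical names) rather than the real
library.  In the real API the corresponding calls are, as far as I recall,
ecrt_slave_config_create_sdo_request(), ecrt_sdo_request_state() and
ecrt_sdo_request_read(); check ecrt.h for the exact signatures.

```c
#include <assert.h>
#include <stdint.h>

/* Mock stand-ins for the real request object (hypothetical names). */
typedef enum { MOCK_UNUSED, MOCK_BUSY, MOCK_SUCCESS, MOCK_ERROR } mock_state_t;

typedef struct {
    mock_state_t state;
    uint16_t data;        /* last value delivered by the "slave" */
} mock_sdo_request_t;

/* Start a read; the real API would queue the request for the master. */
static void mock_request_read(mock_sdo_request_t *req)
{
    req->state = MOCK_BUSY;
}

/*
 * Call once per cycle from the cyclic thread.  Never blocks.
 * Returns 1 and stores the value in *out when a fresh result is available,
 * otherwise returns 0.
 */
int poll_sdo(mock_sdo_request_t *req, uint16_t *out)
{
    assert(req && out);

    switch (req->state) {
    case MOCK_UNUSED:
        mock_request_read(req);   /* trigger the first read */
        return 0;
    case MOCK_BUSY:
        return 0;                 /* still in flight, look again next cycle */
    case MOCK_SUCCESS:
        *out = req->data;
        mock_request_read(req);   /* schedule the next read */
        return 1;
    case MOCK_ERROR:
        mock_request_read(req);   /* retry */
        return 0;
    }
    return 0;
}
```

Because nothing in the cycle ever sleeps waiting for the slave, there is no
thread left stuck in an uninterruptible wait if the app crashes.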


Gavin Lambert
Senior Software Developer

From: Geller, Nir
Sent: Wednesday, 29 January 2020 23:31
To: etherlab-dev@etherlab.org
Subject: [etherlab-dev] wait_event() causes uninterruptible_sleep

Hi There,

we are working with EtherLab's EtherCAT master and recently we've encountered a
problem that is related to a non-interruptible wait_event().

The scenario:
A multi-threaded user space app cyclically reads an SDO from an EtherCAT slave.
The user space app then crashes.
All the threads end except the one that performs the SDO read:

.....
1022  1022 TS       -   0  19   0  0.0 Zl   task_dead                abcde <defunct>
1022  1202 RR       2   -  42   0  0.6 Dl   ecrt_master_sdo_upload   abcde1
.....

This situation interferes with debugging the app, and prevents a core dump from 
being generated.

In master.c, in ecrt_master_sdo_upload(), I see a call to
wait_event_interruptible() followed by a call to wait_event().

After changing wait_event() to wait_event_interruptible() the app can 
successfully crash, and it is now easier to debug.

Needless to say, we need a core dump to be generated when the app crashes at
the customer's site.

The question is: what is the reason behind using wait_event() instead of
wait_event_interruptible()?

Is it safe for us to change the code?

Thanks,

Nir.