A few more questions if I may ...

My understanding of an unmatched datagram is that the master doesn't recognize the frame it sent from the frame received, not that they're not making it all the way around, though I'm sure that could very well be the case, but what I'm looking at is as follows.

- datagram 0x21 is identified as being sent at 17401.923410.
- datagram 0x21 is shown received at 17401.923614.

And at 17401.923646, datagram 0x21 is shown as unmatched at 17401.923647.

[17401.923395] EtherCAT DEBUG 0: ec_master_send_datagrams(device_index = 0)
[17401.923396] EtherCAT DEBUG 0: Adding datagram 0x21
[17401.923397] EtherCAT DEBUG 0: frame size: 46
[17401.923397] EtherCAT DEBUG 0: Sending frame:
[17401.923398] EtherCAT DEBUG: FF FF FF FF FF FF 00 60 E0 7A 63 E0 88 A4 18 10 [17401.923410] EtherCAT DEBUG: 0A 21 00 00 00 00 0C 00 00 00 00 00 00 00 00 00 [17401.923421] EtherCAT DEBUG: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[17401.923433] EtherCAT DEBUG: 00 00 00 00 00 00 00 00 00 00 00 00
[17401.923501] EtherCAT DEBUG 0: Datagram 00000000965a32dd already queued (skipping).
[17401.923503] EtherCAT DEBUG 0: ec_master_send_datagrams(device_index = 0)
[17401.923504] EtherCAT DEBUG 0: Adding datagram 0x22
[17401.923505] EtherCAT DEBUG 0: frame size: 46
[17401.923506] EtherCAT DEBUG 0: Sending frame:
[17401.923507] EtherCAT DEBUG: FF FF FF FF FF FF 00 60 E0 7A 63 E0 88 A4 18 10 [17401.923518] EtherCAT DEBUG: 0A 22 00 00 00 00 0C 00 00 00 00 00 00 00 00 00 [17401.923529] EtherCAT DEBUG: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[17401.923541] EtherCAT DEBUG: 00 00 00 00 00 00 00 00 00 00 00 00
[17401.923600] EtherCAT DEBUG 0: Received frame:
[17401.923602] EtherCAT DEBUG: FF FF FF FF FF FF 02 60 E0 7A 63 E0 88 A4 18 10 [17401.923614] EtherCAT DEBUG: 0A 21 00 00 00 00 0C 00 00 00 00 00 00 00 00 00 [17401.923625] EtherCAT DEBUG: 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00
[17401.923637] EtherCAT DEBUG: 00 00 00 00 00 00 00 00 00 00 00 00
[17401.923646] EtherCAT DEBUG 0: UNMATCHED datagram:
[17401.923647] EtherCAT DEBUG: 0A 21 00 00 00 00 0C 00 00 00 00 00 00 00 00 00
[17401.923659] EtherCAT DEBUG: 00 00 00 00 00 00 01 00
[17401.923665] EtherCAT DEBUG 0: Received frame:
[17401.923666] EtherCAT DEBUG: FF FF FF FF FF FF 02 60 E0 7A 63 E0 88 A4 18 10 [17401.923678] EtherCAT DEBUG: 0A 22 00 00 00 00 0C 00 00 00 00 00 00 00 00 00 [17401.923689] EtherCAT DEBUG: 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00
[17401.923700] EtherCAT DEBUG: 00 00 00 00 00 00 00 00 00 00 00 00
[17401.923719] EtherCAT DEBUG 0: ec_master_send_datagrams(device_index = 0)
[17401.923720] EtherCAT DEBUG 0: Adding datagram 0x23
[17401.923721] EtherCAT DEBUG 0: frame size: 46
[17401.923721] EtherCAT DEBUG 0: Sending frame:
[17401.923722] EtherCAT DEBUG: FF FF FF FF FF FF 00 60 E0 7A 63 E0 88 A4 18 10 [17401.923733] EtherCAT DEBUG: 0A 23 00 00 00 00 0C 00 00 00 00 00 00 00 00 00 [17401.923745] EtherCAT DEBUG: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[17401.923756] EtherCAT DEBUG: 00 00 00 00 00 00 00 00 00 00 00 00
[17401.923825] EtherCAT DEBUG 0: Datagram 00000000965a32dd already queued (skipping).

From my reading, the master did send and receive this datagram. Yet I'm still not quite clear on why this is unmatched. I can only assume that something (which doesn't show in the log), is different and the master doesn't like and we're saying it's some sort of network corruption? Am I mis-interpretting things?

WRT calling ecrt_master_receive() too soon after ecrt_master_send(), I've timed the calls by using the following before each call.

{
  static uint8_t tictoc = 0;
  tictoc = !tictoc;

  clock_gettime(CLOCK_REALTIME, &recv[tictoc]);
  printf("recv[%d]: %lld.%.9ld\n", tictoc, (long long) recv[tictoc].tv_sec, recv[tictoc].tv_nsec);
  ecrt_master_receive(master);
  // ...
}

The time diff between ecrt_master_send() and the next iteration of the task ecrt_master_recv() is common in the following:
1) 7400ns
2) 7304ns
3) 7289ns

recv[0]: 1576621482.827606007
proc[0]: 1576621482.827611921
queue[0]: 1576621482.827618270
send[0]: 1576621482.827622648
after[0]: 1576621482.827627484 <--- 1

recv[1]: 1576621482.827634884 <--- 1
proc[1]: 1576621482.827641278
queue[1]: 1576621482.827647525
send[1]: 1576621482.827652114
after[1]: 1576621482.827656902 <--- 2

recv[0]: 1576621482.827664206 <--- 2
proc[0]: 1576621482.827670504
queue[0]: 1576621482.827676664
send[0]: 1576621482.827681030
after[0]: 1576621482.827685919 <--- 3

recv[1]: 1576621482.827693208 <--- 3
proc[1]: 1576621482.827699553
queue[1]: 1576621482.827706048
send[1]: 1576621482.827710468
after[1]: 1576621482.827715254

This was running at 500Hz with pretty much an empty task doing nothing other than the calls listed in the OP using the ec_e1000e. With those numbers, it looks as if there's plenty of time to receive properly. I've also removed the following 3 lines which I place just before ecrt_domain_queue(). If I add those back, the number of mismatches triples.

    ecrt_master_application_time(master, TIMESPEC2NS(wakeup_time));
    ecrt_master_sync_reference_clock(master);
    ecrt_master_sync_slave_clocks(master);

Thoughts?


On 12/16/19 4:26 PM, Gavin Lambert wrote:
Another possible cause is if your cyclic loop timing is off somehow, and you're 
calling ecrt_master_receive too soon after ecrt_master_send so that the packet 
hasn't had time to make it back to the master yet.

The timing code you have below looks correct, but bear in mind that the way 
wakeup_time is incremented is designed to smooth latency jitter but can result 
in a zero-time sleep if the task is already running behind for some reason.  So 
adding some logging of your wakeup_time vs. current time and other performance 
metrics could be useful as well.

Also, double check the code you're using to change the process priority; you 
didn't include that part below.  Using SCHED_FIFO requires root permissions but 
even then there can be restrictions.  Verify it's not failing.


Gavin Lambert
Senior Software Developer


COMPAC SORTING EQUIPMENT LTD | 4 Henderson Pl | Onehunga | Auckland 1061 | New 
Zealand
Switchboard: +49 2630 96520 | https://www.tomra.com

The information contained in this communication and any attachment is 
confidential and may be legally privileged. It should only be read by the 
person(s) to whom it is addressed. If you have received this communication in 
error, please notify the sender and delete the communication.
-----Original Message-----
From: Gavin Lambert
Sent: Tuesday, 17 December 2019 11:09
To: Gary Grobe <ggr...@houstonmechatronics.com>; etherlab-users@etherlab.org
Subject: Re: [etherlab-users] Unmatched/Skipped Datagrams

There's a 4.19 version of all drivers (except r8169, because reasons), including igb, in 
the unofficial patchset 
(https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fsourceforge.net%2Fu%2Fuecasm%2Fetherlab-patches%2Fci%2Fdefault%2Ftree%2F&amp;data=02%7C01%7Cgavin.lambert%40tomra.com%7Cd8107ac890f84bfecff508d782748ed6%7C4308d118edd143008a37cfeba8ad5898%7C0%7C0%7C637121309444435878&amp;sdata=zYsu8AzSjFAjmgkqI8MS9xAa0r4Rsowadhwiu0yuel4%3D&amp;reserved=0).

It can be normal to see a small number of unmatched datagrams when you're 
starting your app up (and it's transitioning from background mode to realtime 
mode), and when you're replugging the network, but otherwise no, they should 
never appear.  It means the EtherCAT packets aren't getting all the way around 
the network.

It's possible to watch the flow of packets, either by using one of the built-in 
debug interfaces or (usually easier) by connecting a small dumb switch between 
the master and the first slave, and then connecting a third laptop/PC running 
Wireshark to the switch as well (preferably in such a way that the laptop can't 
send packets, but it's not critical).  This can sometimes give you more hints 
about what's going on.

Using "ethercat debug 1" and observing the syslog can also help.


Gavin Lambert
Senior Software Developer

  -----Original Message-----
From: Gary Grobe
Sent: Tuesday, 17 December 2019 10:30
To: etherlab-users@etherlab.org
Subject: Re: [etherlab-users] Unmatched/Skipped Datagrams

The ec_igb doesn't seem to be available for 4.19. If nothing else works, I may 
try to roll back the kernel as a last resort, but according to recent mailings, 
it would appear that others are running 4.19 LTS w/o issue.

I did just give things a try on new hardware:
- Linux 4.19.82-rt30 #1 SMP PREEMPT RT x86_64 GNU/Linux
- IgH EtherCAT master 1.5.2 6258f7d85afb+
- Intel(R) Core(TM) i7-7700T CPU @ 2.90GHz
- ec_e1000e using the following NIC:

00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection 
I219-LM [8086:156f] (rev 31)
          Subsystem: Intel Corporation Ethernet Connection I219-LM [8086:0000]
          Kernel driver in use: ec_e1000e
          Kernel modules: e1000e

Also replaced cables, but even at 50Hz, I'm still seeing the following with the 
e1000e.

[11576.944102] EtherCAT WARNING 0: 7 datagrams UNMATCHED!
[11577.940744] EtherCAT WARNING: Datagram 000000004e9cc6e1
(domain0-0-main) was SKIPPED 13 times.
[11577.944111] EtherCAT WARNING 0: 13 datagrams UNMATCHED!
[11578.944115] EtherCAT WARNING 0: 13 datagrams UNMATCHED!
[11578.944153] EtherCAT WARNING: Datagram 000000004e9cc6e1
(domain0-0-main) was SKIPPED 20 times.
[11579.944090] EtherCAT WARNING 0: 17 datagrams UNMATCHED!
[11580.940728] EtherCAT WARNING: Datagram 000000004e9cc6e1
(domain0-0-main) was SKIPPED 25 times.
[11580.944071] EtherCAT WARNING 0: 9 datagrams UNMATCHED!
[11581.944077] EtherCAT WARNING 0: 31 datagrams UNMATCHED!
[11581.944091] EtherCAT WARNING: Datagram 000000004e9cc6e1
(domain0-0-main) was SKIPPED 32 times.
[11582.944104] EtherCAT WARNING 0: 4 datagrams UNMATCHED!
[11583.940692] EtherCAT WARNING: Datagram 000000004e9cc6e1
(domain0-0-main) was SKIPPED 18 times.
[11583.944069] EtherCAT WARNING 0: 19 datagrams UNMATCHED!
[11584.944048] EtherCAT WARNING 0: 3 datagrams UNMATCHED!
[11584.944179] EtherCAT WARNING: Datagram 000000004e9cc6e1
(domain0-0-main) was SKIPPED 8 times.
[11585.944076] EtherCAT WARNING 0: 30 datagrams UNMATCHED!

I've also replaced the controllers w/ a few in-house as well as Elmo motion 
controllers, but to no avail. Is it common/ok to have a small percentage of 
these messages to appear?

Any other thoughts?

On 12/15/19 3:55 PM, Gavin Lambert wrote:
1kHz cycle is typically too fast for the generic driver.  Try using the ec_igb 
driver instead, or reduce your cycle to 250 Hz or less.

Otherwise, there may be some other issue with your network wiring, or your 
hardware.  Have you tried it on other hardware?


Gavin Lambert
Senior Software Developer


COMPAC SORTING EQUIPMENT LTD | 4 Henderson Pl | Onehunga | Auckland
1061 | New Zealand
Switchboard: +49 2630 96520 | https://www.tomra.com

The information contained in this communication and any attachment is 
confidential and may be legally privileged. It should only be read by the 
person(s) to whom it is addressed. If you have received this communication in 
error, please notify the sender and delete the communication.
-----Original Message-----
From: Gary Grobe
Sent: Saturday, 14 December 2019 11:19
To: etherlab-users@etherlab.org
Subject: [etherlab-users] Unmatched/Skipped Datagrams

My configuration is:
- Linux 4.19.82-rt30 #1 SMP PREEMPT RT x86_64 GNU/Linux
- IgH EtherCAT master 1.5.2 6258f7d85afb+
- Intel(R) Celeron(R) CPU  J1900  @ 1.99GHz
- ec_generic based on the following NIC:

03:00.0 Ethernet controller [0200]: Intel Corporation I211 Gigabit Network 
Connection [8086:1539] (rev 03)
           Subsystem: Intel Corporation I211 Gigabit Network Connection 
[8086:0000]
           Kernel driver in use: igb

The app is a user-space client. The following is the cyclic task cut down to a 
minimum, and yet still shows the issue.

{
       // ...

       struct sched_param param = { };

       param.sched_priority = sched_get_priority_max(SCHED_FIFO);

       if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1) {
           std::cout << "Warning: Failed to lock memory: " <<
strerror(errno) <<
               std::endl;
       }

       stack_prefault();

       clock_gettime(CLOCK_MONOTONIC, &wakeup_time);
       wakeup_time.tv_sec += 1;
       wakeup_time.tv_nsec = 0;

       while (running) {
           ret = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &wakeup_time,
               NULL);

           if (ret) {
               std::cerr << "clock_nanosleep(): " << strerror(ret) <<
std::endl;

               break;
           }

           task_cycle();

           wakeup_time.tv_nsec += PERIOD_NS;

           while (wakeup_time.tv_nsec >= NSEC_PER_SEC) {
               wakeup_time.tv_nsec -= NSEC_PER_SEC;
               wakeup_time.tv_sec++;
           }
       }
}

inline void TestChain::task_cycle()
{
       ecrt_master_receive(master);
       ecrt_domain_process(domain);

       ecrt_master_application_time(master, TIMESPEC2NS(wakeup_time));
       ecrt_master_sync_reference_clock(master);
       ecrt_master_sync_slave_clocks(master);

       if (counter) {
           counter--;
       }
       else {
           counter = FREQUENCY;
       }

       ecrt_domain_queue(domain);
       ecrt_master_send(master);
}

The following messages from dmesg were taken at 1kHz. Doesn't matter if it's 
doing 10Hz or 1kHz, the UNMATCHED and SKIPPED numbers are quite large. I've 
only seen 1Hz work w/o issue.

[ 4132.457210] EtherCAT 0: Domain 0: 268 working counter changes - now 1/2.
[ 4133.373743] EtherCAT WARNING 0: 657 datagrams UNMATCHED!
[ 4133.461587] EtherCAT WARNING: Datagram 000000002f1447da
(domain0-0-main) was SKIPPED 736 times.
[ 4133.461593] EtherCAT 0: Domain 0: 395 working counter changes - now 0/2.
[ 4134.374131] EtherCAT WARNING 0: 643 datagrams UNMATCHED!
[ 4134.465961] EtherCAT WARNING: Datagram 000000002f1447da
(domain0-0-main) was SKIPPED 776 times.
[ 4134.465968] EtherCAT 0: Domain 0: 232 working counter changes - now 0/2.
[ 4135.374332] EtherCAT WARNING 0: 850 datagrams UNMATCHED!
[ 4135.470468] EtherCAT WARNING: Datagram 000000002f1447da
(domain0-0-main) was SKIPPED 722 times.
[ 4135.470476] EtherCAT 0: Domain 0: 331 working counter changes - now 1/2.
[ 4136.374860] EtherCAT WARNING 0: 649 datagrams UNMATCHED!
[ 4136.475180] EtherCAT WARNING: Datagram 000000002f1447da
(domain0-0-main) was SKIPPED 858 times.
[ 4136.475187] EtherCAT 0: Domain 0: 282 working counter changes - now 1/2.
[ 4137.375178] EtherCAT WARNING 0: 757 datagrams UNMATCHED!
[ 4138.319869] EtherCAT WARNING: Datagram 000000002f1447da
(domain0-0-main) was SKIPPED 745 times.
[ 4138.319877] EtherCAT 0: Domain 0: 208 working counter changes - now 1/2.
[ 4138.375593] EtherCAT WARNING 0: 972 datagrams UNMATCHED!
[ 4139.324269] EtherCAT WARNING: Datagram 000000002f1447da
(domain0-0-main) was SKIPPED 911 times.

--
_The contents of this email message and any attachments are intended solely for 
the addressee(s).  This message (including any attachments) may contain 
confidential, proprietary, privileged and/or private information. The 
information is intended to be for the use of the individual or entity 
designated above. If you are not the intended recipient of this message, please 
notify the sender immediately, and delete the message and any attachments.  
_Any disclosure, reproduction, distribution or other use of this message or any 
attachments by an individual or entity other than the intended recipient is 
prohibited. __ _______________________________________________
etherlab-users mailing list
etherlab-users@etherlab.org
https://eur02.safelinks.protection.outlook.com/?url=http%3A%2F%2Flists.etherlab.org%2Fmailman%2Flistinfo%2Fetherlab-users&amp;data=02%7C01%7Cgavin.lambert%40tomra.com%7Cd8107ac890f84bfecff508d782748ed6%7C4308d118edd143008a37cfeba8ad5898%7C0%7C0%7C637121309444445873&amp;sdata=R2bB6zC6lYaR7mcWdONQRGmrG%2FvZICVl3b0ef6aBXHQ%3D&amp;reserved=0
_______________________________________________
etherlab-users mailing list
etherlab-users@etherlab.org
https://eur02.safelinks.protection.outlook.com/?url=http%3A%2F%2Flists.etherlab.org%2Fmailman%2Flistinfo%2Fetherlab-users&amp;data=02%7C01%7Cgavin.lambert%40tomra.com%7Cd8107ac890f84bfecff508d782748ed6%7C4308d118edd143008a37cfeba8ad5898%7C0%7C0%7C637121309444445873&amp;sdata=R2bB6zC6lYaR7mcWdONQRGmrG%2FvZICVl3b0ef6aBXHQ%3D&amp;reserved=0

--
_The contents of this email message and any attachments are intended solely for the addressee(s).  This message (including any attachments) may contain confidential, proprietary, privileged and/or private information. The information is intended to be for the use of the individual or entity designated above. If you are not the intended recipient of this message, please notify the sender immediately, and delete the message and any attachments.  _Any disclosure, reproduction, distribution or other use of this message or any attachments by an individual or entity other than the intended recipient is prohibited. __
_______________________________________________
etherlab-users mailing list
etherlab-users@etherlab.org
http://lists.etherlab.org/mailman/listinfo/etherlab-users

Reply via email to