Re: [etherlab-dev] Ethercat master module make fails

2020-02-25 Thread Gavin Lambert
Note that the standard distribution of the Etherlab master does not support 
kernel 4.19.  You can either downgrade your kernel or try using the unofficial 
patchset, which includes some compatibility patches for 4.19.


Gavin Lambert
Senior Software Developer


COMPAC SORTING EQUIPMENT LTD | 4 Henderson Pl | Onehunga | Auckland 1061 | New 
Zealand
Switchboard: +64 96 34 00 88 | tomra.com<http://www.tomra.com>

The information contained in this communication and any attachment is 
confidential and may be legally privileged. It should only be read by the 
person(s) to whom it is addressed. If you have received this communication in 
error, please notify the sender and delete the communication.

From: Steih, Martin
Sent: Tuesday, 25 February 2020 02:10
To: etherlab-dev@etherlab.org
Subject: [etherlab-dev] Ethercat master module make fails

Hello,

I am trying to make the ethercat master module as described in the 
documentation, but it fails when compiling examples/mini/mini.c. The compiler 
complains about an implicit function declaration (init_timer, line 496) as well 
as an incompatible pointer type on the following line. I am using kernel 
version 4.19.xx with the rt patch. Does anyone have any suggestions?
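
Both errors are consistent with the kernel timer API change: init_timer() no longer exists in newer kernels, and timer callbacks now receive a struct timer_list * instead of an unsigned long, so assigning an old-style callback produces a pointer type mismatch. The user-space sketch below only illustrates the new callback shape. The struct timer_list, mock_timer_setup(), mock_timer_fire() and cyclic_ctx shown here are simplified stand-ins (assumptions for illustration), not the kernel definitions and not the code from the patchset.

```c
#include <stddef.h>

/* Stand-in for the kernel's struct timer_list; NOT the real definition. */
struct timer_list {
    void (*function)(struct timer_list *);
};

/* Stand-in for the kernel's timer_setup(); the real one also takes a
 * flags argument and initialises internal fields. */
static void mock_timer_setup(struct timer_list *t,
                             void (*fn)(struct timer_list *))
{
    t->function = fn;
}

/* Hypothetical application state that embeds its timer. */
struct cyclic_ctx {
    unsigned int counter;
    struct timer_list timer;
};

/* Same idea as the kernel's container_of()/from_timer(): recover the
 * enclosing structure from a pointer to its timer member. */
#define ctx_from_timer(ptr) \
    ((struct cyclic_ctx *)((char *)(ptr) - offsetof(struct cyclic_ctx, timer)))

/* New-style callback: receives the timer itself, not an unsigned long. */
static void cyclic_task(struct timer_list *t)
{
    struct cyclic_ctx *ctx = ctx_from_timer(t);

    ctx->counter++;
}

/* Fires the mock timer once, as the kernel would on expiry. */
static void mock_timer_fire(struct timer_list *t)
{
    t->function(t);
}
```

In a real module the equivalent change is to replace init_timer() plus the manual function/data assignment with a single timer_setup() call and to adjust the callback signature accordingly.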

i. A. Martin Steih
Entwicklung


Lachmann & Rink GmbH
Hommeswiese 129
57258 Freudenberg

Telefon: +49 2734 2817 430
Telefax: +49 2734 2817 20
E-Mail: martin.st...@lachmann-rink.de
Internet: https://www.lachmann-rink.de

Geschäftsführer: Dipl.-Ing. Arjan Bijlard, Dipl.-Inf. Claudius Rink

Amtsgericht Siegen, HRB 2600
___
etherlab-dev mailing list
etherlab-dev@etherlab.org
http://lists.etherlab.org/mailman/listinfo/etherlab-dev


Re: [etherlab-dev] wait_event() causes uninterruptible_sleep

2020-01-29 Thread Gavin Lambert
I'm not entirely sure, but I don't think simply changing that would be safe.

The whole "on interrupt return -EINTR" thing assumes that it's safe to simply 
make the exact same call again to "resume" the operation.  This is true in the 
first case because it's just waiting for the request to be enqueued, and on 
interrupt it simply dequeues it again.  However after that there's a race where 
it might have already been sent and is waiting for a response, and in that case 
it's not safe to return -EINTR because it might end up being sent a second 
time, which could cause incorrect behavior of the slave.  (And would probably 
also confuse the mailbox FSM.)

It might be possible to abort the request on interrupt instead, but that would 
be annoying as thread signals can cause spurious interrupts.  (And might still 
end up meaning the slave will receive requests twice, if the app then 
explicitly retries.)


If you instead explicitly close(masterfd) (aka ecrt_release_master) in your 
problem case, this should abort all pending requests and wake up the threads - 
you can see the code that does this in ec_slave_clear and 
ec_master_clear_slaves.

(The OS will automatically do this when your process actually terminates, but 
not while you still have a live thread.  So you will have to use an 
exception/signal handler to intercept the crash in progress.)

Another option is to use the non-blocking SDO request APIs instead.  Using 
these (on the cyclic thread) is better anyway for regular transfers done while 
the master is activated, as it avoids ping-ponging the master locks between 
multiple threads, which can increase cycle latency.
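
A rough sketch of the non-blocking pattern, polled once per cycle, might look like the following. It uses mock types so that it compiles on its own; in a real application the request would come from ecrt_slave_config_create_sdo_request(), the state from ecrt_sdo_request_state(), and a read would be started with ecrt_sdo_request_read(). The mock_* names, the three-cycle delay and the value 0x1234 are assumptions for illustration only.

```c
#include <stdint.h>

/* Stand-ins for ec_request_state_t and ec_sdo_request_t from ecrt.h. */
typedef enum { REQ_UNUSED, REQ_BUSY, REQ_SUCCESS, REQ_ERROR } req_state_t;

typedef struct {
    req_state_t state;
    uint32_t value;
    int cycles_left;            /* mock only: cycles until the reply */
} mock_sdo_request_t;

/* Mock of starting an upload (real API: ecrt_sdo_request_read()). */
static void mock_request_read(mock_sdo_request_t *req)
{
    req->state = REQ_BUSY;
    req->cycles_left = 3;
}

/* Mock of the master progressing the transfer in the background. */
static void mock_master_step(mock_sdo_request_t *req)
{
    if (req->state == REQ_BUSY && --req->cycles_left == 0) {
        req->value = 0x1234;
        req->state = REQ_SUCCESS;
    }
}

/* Called once per cycle from the cyclic thread; never blocks.
 * Returns 1 when a fresh value was stored in *out, otherwise 0. */
static int poll_sdo(mock_sdo_request_t *req, uint32_t *out)
{
    switch (req->state) {
    case REQ_SUCCESS:
        *out = req->value;
        mock_request_read(req);     /* start the next read */
        return 1;
    case REQ_BUSY:
        return 0;                   /* still in flight, try next cycle */
    case REQ_UNUSED:
    case REQ_ERROR:
    default:
        mock_request_read(req);     /* (re)start the transfer */
        return 0;
    }
}
```

Because nothing sleeps inside the kernel on behalf of the application thread, there is no uninterruptible wait left to block a crash dump.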


Gavin Lambert
Senior Software Developer


From: Geller, Nir
Sent: Wednesday, 29 January 2020 23:31
To: etherlab-dev@etherlab.org
Subject: [etherlab-dev] wait_event() causes uninterruptible_sleep

Hi There,

we are working with EtherLab's EtherCAT master and recently we've encountered a 
problem that is related to a non-interruptible wait_event().

The scenario:
A multi-threaded user space app cyclically reads SDO from some ecat slave.
The user space app then crashes.
All the threads end besides the one that performs the SDO read:

.
1022  1022 TS   -   0  19   0  0.0 Zl   task_deadabcde 

1022  1202 RR   2   -  42   0  0.6 Dl   ecrt_master_sdo_upload   abcde1
.

This situation interferes with debugging the app, and prevents a core dump from 
being generated.

In master.c, in ecrt_master_sdo_upload(), I see a call to 
wait_event_interruptible() followed by a call to wait_event().

After changing wait_event() to wait_event_interruptible() the app can 
successfully crash, and it is now easier to debug.

Needless to say, we need a core dump to be generated when the app crashes at 
a customer's site.

The question is: what is the reason for using wait_event() instead of 
wait_event_interruptible()?

Is it safe for us to change the code?

Thanks,

Nir.


Re: [etherlab-dev] Hot plugged modules failing to read DC register

2019-09-24 Thread Gavin Lambert
When the slave goes to safeop+error it should also output an AL error code 
which might give a hint as to why.  This should be logged to the syslog when 
the master acknowledges the error.

AL error 0x001B, for example, indicates that the slave stopped receiving SM 
frames (typical of a comms interruption) - and features/quick-op in the 
patchset tries to do a quicker recovery for this case by trying to go straight 
back to OP instead of going through a full PREOP reconfiguration.  It's 
possible that some slaves may need the full reconfigure, so you can disable 
this behaviour at configure time.

Other AL error codes mean other things, such as your DC cycle being poorly 
synced and frames not occurring in a strict SYNC0-SM-SYNC0-SM ordering.
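
When reading the syslog it can help to translate the AL status code into text. The lookup below is a minimal sketch with a deliberately partial table; the message strings are written from memory of the commonly published AL status code list and should be checked against the ETG specification before being relied on. The names al_status and al_status_text are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct al_status {
    uint16_t code;
    const char *text;
};

/* Partial table only; extend from the ETG AL status code list. */
static const struct al_status al_table[] = {
    { 0x0000, "No error" },
    { 0x001A, "Synchronization error" },
    { 0x001B, "Sync manager watchdog" },
    { 0x002C, "Fatal sync error" },
    { 0x0032, "PLL error" },
};

/* Returns a description for the given AL status code, or a fallback
 * string when the code is not in the (partial) table. */
static const char *al_status_text(uint16_t code)
{
    size_t i;

    for (i = 0; i < sizeof(al_table) / sizeof(al_table[0]); i++) {
        if (al_table[i].code == code)
            return al_table[i].text;
    }
    return "Unknown AL status code";
}
```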


But I wouldn't normally expect any standard registers to fail WC when this 
occurs, unless perhaps the slave was performing a full power reset (or 
otherwise holding the slave's ESC in reset).  Though this would interrupt comms 
to any downstream slaves as well, so it's not something that slaves are 
supposed to do of their own accord.  (And it shouldn't stay in safeop+error in 
that case, it should revert to Init, although that's up to the slave 
implementation.)

90ms seems a bit slow for just an ESC power-on SII read, although it's possible 
that it's doing something more complicated.

I'm not really familiar with those modules, however; you're probably best off 
asking Beckhoff directly.


Gavin Lambert
Senior Software Developer


From: Graeme Foot
Sent: Tuesday, 24 September 2019 17:20
To: etherlab-dev@etherlab.org
Subject: [etherlab-dev] Hot plugged modules failing to read DC register

Hi,

I've had occasional issues with EL7332 and EL7342 modules where they will go to 
SafeOp + Error if you try and use them in DC mode.  I've finally had some time 
to look into it a little further.

When the modules go to SafeOp + Error the master outputs the message "Slave has 
no System Time register; delay measurement only." (with debug level 1).  This 
occurs due to the datagram reading register 0x0910 returning a working counter 
of zero.

I created a quick hack to retry reading the register up to 100 times before 
failing.  After approx. 90ms the EL7342 module I'm testing with successfully 
returned the datagram and the slave entered Op state successfully.

In my test setup I also have an EL5101 module that was doing the exact same 
thing (and taking around the same time), but I've never really had issues with 
them before.  I suspect the difference is that if you have incorrect settings 
on the EL7342 module and try to run a motor it can error out and reset itself, 
causing a situation equivalent to a hot plug.

Without my hack both modules need to wait for the SII read to complete for a 
similar length of time, so it looks like the slaves do not respond to the 
0x0910 register request until the EEPROM read is complete.  Does anyone know if 
this is expected behaviour, or know of a better solution than to retry reading 
the register (up to 200ms ???)?
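
The retry idea can be sketched as a bounded loop around a single read attempt. This is not the actual hack, only an illustration of the shape: read_attempt_fn, read_with_retry() and the mock slave are hypothetical names, and a real master would issue one datagram per cycle rather than spin in a loop.

```c
/* Hypothetical callback: performs one read attempt and returns the
 * working counter of the reply (0 means the slave did not answer). */
typedef unsigned int (*read_attempt_fn)(void *ctx);

/* Tries up to max_attempts times.  Returns 0 on success and -1 when
 * every attempt came back with a working counter of zero. */
static int read_with_retry(read_attempt_fn attempt, void *ctx,
                           unsigned int max_attempts,
                           unsigned int *attempts_used)
{
    unsigned int i;

    for (i = 1; i <= max_attempts; i++) {
        if (attempt(ctx) > 0) {
            *attempts_used = i;
            return 0;
        }
        /* A real master would wait for the next datagram cycle here. */
    }
    *attempts_used = max_attempts;
    return -1;
}

/* Mock slave that ignores the first busy_attempts reads, standing in
 * for a module that is still loading its EEPROM. */
struct mock_slave {
    unsigned int busy_attempts;
};

static unsigned int mock_read_0x0910(void *ctx)
{
    struct mock_slave *s = ctx;

    if (s->busy_attempts > 0) {
        s->busy_attempts--;
        return 0;
    }
    return 1;
}
```

Keeping the limit explicit makes the trade-off visible: a slave that really has no System Time register costs max_attempts cycles before the master falls back to delay measurement only.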


Regards,
Graeme.


[etherlab-dev] Unofficial patchset update 20190904

2019-09-09 Thread Gavin Lambert
     *   Modifies some internals of the kernel/tool interface to fix a race 
         condition that could cause "ethercat pcap" to report an error if the 
         log is not yet full.
     *   Note that the race on the clear operation is still present.
 *   features/pcap/0004-high-precision.patch
     *   Uses microsecond-precision timestamps in the pcap files instead of 
         jiffy-precision (typically 1-10ms).
     *   This is disabled (and jiffy precision is still used) when compiling 
         for RTDM, due to Graeme Foot's note that do_gettimeofday is 
         troublesome under RTAI.
 *   features/mbg/0001-mailbox-gateway.patch
     *   EtherCAT Mailbox Gateway server, from 
         http://lists.etherlab.org/pipermail/etherlab-dev/2019/000706.html
     *   I had to add some missing includes and fix a typo to get this to 
         compile.  I have not verified its behaviour myself.



Dropped patches:

 *   devices/0008-linux-4.13.patch
     *   Patch dropped because 4.13 didn't end up being an LTS release.  Use 
         4.14 instead.
 *   base/0004-dc_user-tabs.patch
     *   Patch dropped because stable patches fix the original problem.



Patches not taken:

 *   TTY support for newer kernels -- 
     http://lists.etherlab.org/pipermail/etherlab-users/2018/003516.html
     *   This appears to be incomplete.  And the proposed changes seemed 
         unnecessary when compiling for 4.19.  Another related patch was 
         accepted, though.
 *   Typo in igb_main for 3.18 -- 
     http://lists.etherlab.org/pipermail/etherlab-dev/2019/000690.html
     *   This appears to be a bug in the upstream kernel, which is still not 
         fixed today in 3.18 - it was not fixed until 4.1, by 
         https://github.com/torvalds/linux/commit/2439fc4d71f71b47c.  As such, 
         since there might be other non-EtherCAT-related bugs, it seems best 
         to recommend using a newer kernel if you want to use IGB, rather than 
         trying to patch just this one thing.



I think this addresses all patches sent to myself or to the mailing lists since 
the last patchset release.  There was a period where I wasn't receiving 
messages from the list, however, so if I've missed one then I apologise.  
Please let me know if there are any other changes that ought to be included.



Gavin Lambert
Senior Software Developer




Re: [etherlab-dev] EtherCAT Mailbox Gateway Server patch

2019-07-18 Thread Gavin Lambert
Sounds interesting.

To address one of your questions, standard slave behaviour is to report 
"subindex not existing" (0x11) only if the requested subindex is higher than 
the maximum subindex that exists.  When accessing a "gap" subindex for which no 
data is available, it reports "data cannot be read or stored" (0x18) instead.  
(There's a few other variations of errors for cases where data is written when 
read-only, or can only be read/written in a different AL state, etc.)
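
The rule can be sketched as a small decision function. The two codes are the ones quoted above; the object model (a maximum subindex plus a bitmask of populated subindices) and all names are hypothetical and only exist to make the example self-contained.

```c
#include <stdint.h>

#define ERR_NONE                          0x00
#define ERR_SUBINDEX_NOT_EXISTING         0x11
#define ERR_DATA_CANNOT_BE_READ_OR_STORED 0x18

/* Hypothetical object description.  Bit n of present_mask is set when
 * subindex n holds data; this sketch assumes max_subindex <= 31. */
struct mock_object {
    uint8_t max_subindex;
    uint32_t present_mask;
};

/* Picks the error a slave following the rule above would report. */
static uint8_t check_subindex(const struct mock_object *obj, uint8_t subindex)
{
    /* Beyond the highest subindex: the subindex does not exist. */
    if (subindex > obj->max_subindex)
        return ERR_SUBINDEX_NOT_EXISTING;

    /* Within range but a "gap": no data is available there. */
    if (!((obj->present_mask >> subindex) & 1u))
        return ERR_DATA_CANNOT_BE_READ_OR_STORED;

    return ERR_NONE;
}
```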

Also, a slave's module profile should be readable from SDO 0x1000.  Most 
general-purpose modular devices will report 0x1389 here (others will report 
something else, of course); it's a mandatory object for any slave that supports 
CoE.


And yes, I'm currently in the process of integrating and updating the patchset.  
It's nearly done, but I've hit a bit of a brick wall at present where due to a 
recent kernel patch (which appears to be in recent versions of 4.4+) the e1000e 
driver fails to recover from loss of link when using a motherboard-based 
adapter.  Frustratingly, it's a code change outside of the e1000e driver itself 
which is affecting its operation - although it does appear to operate correctly 
when used with ec_generic.  The good news is that the igb driver appears to be 
unaffected.  I'm hoping to figure out a workaround before release, though if it 
takes much longer then I might just release it as-is.


Gavin Lambert
Senior Software Developer


From: Graeme Foot 
Sent: Friday, 19 July 2019 17:57
To: etherlab-dev@etherlab.org
Cc: Gavin Lambert 
Subject: EtherCAT Mailbox Gateway Server patch

Hi,

I have attached a patch to implement an EtherCAT Mailbox Gateway server.  
Florian you may be interested in this as it is on the EtherLab TODO list.

The server provides for UDP and up to 16 TCP connections.  The UDP and TCP 
connections are not multi-threaded.

It is based on the specification:
https://www.ethercat.org/memberarea/download/ETG8200_V1i0i0_G_R_MailboxGateway.pdf

It is designed to be used with tools such as:
https://download.beckhoff.com/download/document/automation/twinsafe/twinsafe_loader_en.pdf

Note: the TwinSAFE Loader is a new program that allows you to apply TwinSAFE 
programs to your TwinSAFE modules without having the modules connected to 
TwinCAT.  You are still required to use TwinCAT to create the safety programs.  
Last I looked there was no download link for the TwinSAFE loader as it is new 
and Beckhoff are still deciding on licensing, but I got a copy from our local 
rep.


The server is built on top of the GavinL Etherlab master patchset 20171108.  It 
integrates into the multiple mailbox protocol patches functionality.  Mailbox 
datagrams from the Mailbox Gateway server are recognized due to the Mailbox 
Header address not matching the datagram ADP address.

The server is a user space program named "ethercat_mbg" loosely based on the 
"ethercat" user space tool (MBG stands for Mailbox Gateway).  The program can 
be run as a foreground or background task depending on whether you want to run 
it temporarily or as a daemon.  There is no security incorporated into the 
protocol so the server will give full access to the slave mailboxes while it is 
running.  The mailbox gateway listens on port 0x88A4 (34980) should you want to 
firewall the port.


The Mailbox Gateway protocol requires the EtherCAT master to provide a Master 
Object Dictionary as specified by ETG.5001.3 (Modular Device Profile Part 3: 
Fieldbus Gateway Profile Specifications), Annex A:
https://www.ethercat.org/memberarea/download/ETG5001_3_V0i1i2_S_D_MDP_Gateways.pdf

I have implemented all of the optional items, along with A.2.1.3 Diagnosis Data 
(index 0xAnnn) which is supposed to be optional but seems to be required by the 
TwinSAFE loader.  I'm not massively happy with the master object dictionary 
function (ec_master_obj_

Re: [etherlab-dev] EoE IP command patch

2019-07-11 Thread Gavin Lambert
Regarding patch 0002, I'm curious why the callbacks are being disabled for the 
RTDM case.  (master/ioctl.c)  (And note that EC_EOE is enabled by default, so 
this will affect most RTDM users.)

As I understand it (although I might be wrong), the callbacks are expressly 
intended for use with RTAI/Xenomai apps, so that you can make it use an 
RTAI/Xenomai lock instead of a Linux lock (or defer sending/receiving entirely 
to another cycle, if you're busy and don't want to lock).  It does require 
either a kernel-space app or at least a stub that implements a suitable locking 
model -- user-space applications will always have NULL callbacks and would thus 
behave the same as if this part of your patch had never been applied.

Since the ec_ioctl_* locks are automatically disabled for RTDM, the change 
you're suggesting in this patch completely disables any possibility of 
locking/deferral at all for RTDM apps, which seems a bit odd, since AFAIK 
that's the only reason that the callbacks exist in the first place.


Gavin Lambert
Senior Software Developer


From: Graeme Foot
Sent: Thursday, 7 February 2019 10:35
To: etherlab-dev@etherlab.org
Subject: [etherlab-dev] EoE IP command patch

Hi,

FYI, I've updated my EoE patches and added a new one.

0001-eoe-addif-delif-tools.patch

This has been updated so that if the "eoe_autocreate" flag is 1 (true) then 
static "eoe_interfaces" can still be used, resulting in a combination of static 
and dynamic EoE ifaces.  I'm doing this so I can have static iface ports for my 
switch devices (EL6601, EL6614 modules) and dynamic iface ports for my Yaskawa 
sigma 7 amps which support configuration via EoE.


0002-eoe-via-rtdm.patch

Line number changes due to the above patch.


0003-eoe-ip.patch

This is a new patch to fix some EoE bugs to do with the "ethercat ip" command.  
The ip command allows you to set the MAC, IP address, subnet mask, gateway, DNS 
server and name on an EoE device.  This command was returning errors saying it 
had timed out.  This was due to the EoE frame thread receiving and dropping the 
reply from the ip command.  I'm not sure if this became a problem due to the 
mailbox patches or whether it would have been a problem anyway.  Note: the ip 
command would also drop mailbox replies that were meant to go to the EoE frame 
thread handler.

To resolve this I have created two EoE mailbox reply caches.  One for the EoE 
frame thread and one for the ip command.

Secondly, the EtherCAT master was packing the ip command data: if an item was 
not being set, subsequent items did not leave a space for it, and the data 
structure was sized dynamically.  The ETG.1000.6 standard shows that the ip 
command (EoE Init Request) data structure requires the data to have fixed 
positions and to leave space for unused items.  It is also a fixed size.  I 
have confirmed this with Beckhoff.
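
The difference between packed and fixed-position encoding can be sketched as follows. The field order follows the list given for the ip command (MAC, IP address, subnet mask, gateway, DNS server, name), but the exact offsets, the 4-byte flags word, the 32-byte name field and all identifiers are assumptions for illustration and should be checked against ETG.1000.6. Byte order within each field is not addressed here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed "item present" flag bits. */
enum {
    F_MAC = 1u << 0, F_IP = 1u << 1, F_SUBNET = 1u << 2,
    F_GATEWAY = 1u << 3, F_DNS = 1u << 4, F_NAME = 1u << 5
};

/* Assumed fixed offsets; unused items still occupy their slot. */
enum {
    OFF_FLAGS = 0, OFF_MAC = 4, OFF_IP = 10, OFF_SUBNET = 14,
    OFF_GATEWAY = 18, OFF_DNS = 22, OFF_NAME = 26, REQ_SIZE = 58
};

struct ip_params {
    uint32_t flags;
    uint8_t mac[6], ip[4], subnet[4], gateway[4], dns[4];
    char name[32];
};

/* Writes a fixed-size request into buf (at least REQ_SIZE bytes).
 * Items whose flag is clear are left as zeros in their slot. */
static size_t pack_fixed(const struct ip_params *p, uint8_t *buf)
{
    memset(buf, 0, REQ_SIZE);

    /* Flags word, written little-endian. */
    buf[OFF_FLAGS + 0] = (uint8_t)(p->flags & 0xFF);
    buf[OFF_FLAGS + 1] = (uint8_t)((p->flags >> 8) & 0xFF);
    buf[OFF_FLAGS + 2] = (uint8_t)((p->flags >> 16) & 0xFF);
    buf[OFF_FLAGS + 3] = (uint8_t)((p->flags >> 24) & 0xFF);

    if (p->flags & F_MAC)     memcpy(buf + OFF_MAC, p->mac, 6);
    if (p->flags & F_IP)      memcpy(buf + OFF_IP, p->ip, 4);
    if (p->flags & F_SUBNET)  memcpy(buf + OFF_SUBNET, p->subnet, 4);
    if (p->flags & F_GATEWAY) memcpy(buf + OFF_GATEWAY, p->gateway, 4);
    if (p->flags & F_DNS)     memcpy(buf + OFF_DNS, p->dns, 4);
    if (p->flags & F_NAME)    memcpy(buf + OFF_NAME, p->name, 32);

    return REQ_SIZE;
}
```

With the packed approach, setting only the IP address and gateway would place the gateway immediately after the IP address; with fixed positions the subnet slot between them stays in place and is simply zero.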


Regards,
Graeme Foot.


Re: [etherlab-dev] pcap logging patch

2019-06-10 Thread Gavin Lambert
Sounds interesting, although a rolling buffer would probably be more generally 
useful (for all but the startup case).  This is basically what EC_DEBUG_IF 
does; other than being disabled by default, why wasn't that suitable?

The method that I usually use to debug traffic issues is to insert a dumb 
hub/switch between the master and first slave, and additionally connect another 
PC running Wireshark to spy on the traffic.  (For best results, disable the 
TCP/IP bindings on the monitoring PC to avoid injecting non-EtherCAT packets, 
although EtherCAT nodes will ignore these anyway.)  And it's reasonably 
portable; you just need an extra network cable and some power, no software 
changes at all.

You can go even better by adding a dedicated network monitoring device (which 
guarantees not to accept packets from the monitoring PC), but I find that the 
above is sufficient for most purposes, especially since EtherCAT packets are 
sent as broadcasts.


Gavin Lambert
Senior Software Developer


From: etherlab-dev  On Behalf Of Graeme Foot
Sent: Tuesday, 11 June 2019 13:41
To: etherlab-dev@etherlab.org
Subject: [etherlab-dev] pcap logging patch

Hi,

In case anyone is interested I've attached a patch for an EtherCAT comms 
logging function:

/features/pcap/0001-pcap-logging.patch

This will cache the first 30 MB (defined under PCAP_SIZE) of EtherCAT comms 
traffic to memory in pcap format.  It adds a pcap command to the ethercat tool 
utility, which also has a reset option to clear the cache and continue logging.
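
The "first N bytes, then stop" behaviour can be sketched in user space as below. This is not the patch itself: the struct and function names are hypothetical, the buffer is supplied by the caller rather than sized by PCAP_SIZE, and the headers are written in host byte order (pcap readers use the magic number to detect the byte order).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PCAP_MAGIC        0xa1b2c3d4u
#define LINKTYPE_ETHERNET 1u

/* Classic pcap file header (24 bytes) and per-record header (16 bytes). */
struct pcap_global_hdr {
    uint32_t magic;
    uint16_t version_major, version_minor;
    int32_t  thiszone;
    uint32_t sigfigs, snaplen, linktype;
};

struct pcap_rec_hdr {
    uint32_t ts_sec, ts_usec, incl_len, orig_len;
};

struct capture {
    uint8_t *buf;
    size_t size, used;
};

/* Writes the file header and resets the log; calling it again acts
 * like a "reset" that clears the cache and resumes logging. */
static void capture_init(struct capture *c, uint8_t *buf, size_t size)
{
    struct pcap_global_hdr h = { PCAP_MAGIC, 2, 4, 0, 0, 65535,
                                 LINKTYPE_ETHERNET };

    c->buf = buf;
    c->size = size;
    c->used = 0;
    if (size >= sizeof h) {
        memcpy(buf, &h, sizeof h);
        c->used = sizeof h;
    }
}

/* Appends one frame.  Returns 1 if stored, 0 if the buffer is full
 * (the frame is dropped and logging effectively stops). */
static int capture_frame(struct capture *c, uint32_t sec, uint32_t usec,
                         const uint8_t *frame, uint32_t len)
{
    struct pcap_rec_hdr r = { sec, usec, len, len };

    if (c->used < sizeof(struct pcap_global_hdr))
        return 0;                       /* header never fitted */
    if (c->size - c->used < sizeof r + len)
        return 0;                       /* not enough room left */

    memcpy(c->buf + c->used, &r, sizeof r);
    c->used += sizeof r;
    memcpy(c->buf + c->used, frame, len);
    c->used += len;
    return 1;
}
```

Dumping the first c->used bytes of the buffer to a file should give something Wireshark can open directly.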

I know there are already other debug options, i.e.:
- Debug level 2, will print the EtherCAT comms to syslog direct
- EC_DEBUG_IF, which creates a local IFACE port that gets the EtherCAT comms 
  traffic mirrored to it (to be logged in wireshark locally or from a remote 
  computer if the debug IFACE is bridged to a real IFACE)
- EC_DEBUG_RING, will print the EtherCAT comms to syslog if Debug level > 0
  Warning: EC_DEBUG_RING uses the do_gettimeofday() method.  This is not safe 
  to be called from an RTAI realtime thread.  It will freeze your system if 
  you only have one CPU.  It should use jiffies instead.

None of the options above really suited my situation as I wanted to track down 
intermittent startup issues at client sites.  The Syslog rotates too quickly 
and has other information in it and the Debug IFace option was not suitable to 
set up at a client site.


Regards,

Graeme Foot
Kinetic Engineering Design Ltd.



Re: [etherlab-dev] Missing Vendor ID / Product Code

2019-06-10 Thread Gavin Lambert
Did you have a look at 
base/0026-Prevent-abandoning-the-mailbox-state-machines-early-.patch?  Because 
that does something similar.

(It was base/0019-Support-for-multiple-mailbox-protocols.patch which added the 
handling of the INVALID datagram state for the mailbox state machines.  The one 
above was a bugfix for this patch, essentially.)


Gavin Lambert
Senior Software Developer


From: Graeme Foot 
Sent: Tuesday, 11 June 2019 11:52
To: etherlab-dev@etherlab.org
Cc: Gavin Lambert 
Subject: RE: Missing Vendor ID / Product Code

Hi,

Unfortunately "0008-fsm_sii-loading-check.patch" (below) didn't fix my main 
problem.  It turns out it is an inherent problem with how the master's external 
datagram ring works.  I have attached a patch that plugs the hole causing the 
problem I was having but there may be other cases where issues could occur.

Patch: 
/features/parallel-slave/0009-ec_master_exec_slave_fsms-external-datagram-fix.patch


The guts of the problem:

ec_master_exec_slave_fsms() calls ec_master_get_external_datagram() to get a 
datagram from the external datagram ring.  The datagram is then passed to 
ec_fsm_slave_exec() of the slaves with some work to do.  This call will then 
return either 1 if the FSM is still in progress or 0 if the FSM is complete.  
The master assumes that if the FSM is still in progress then the datagram has 
been consumed and is in use, but there are various cases where this is not 
true.  If any of these cases occur then, in the first loop of 
ec_master_exec_slave_fsms(), these slaves' FSMs may be executed multiple times 
while another slave's FSM is waiting on its datagram to return.

If too many slaves, or cycles, occur during this time then the waiting slave's 
datagram either gets its state set to EC_DATAGRAM_INVALID or gets reused by 
another slave.  This can lead to "cancelled" datagram replies or the two slaves 
getting the results from the second slave's datagram (as the first datagram's 
index will be replaced and its reply is lost).


In my case this was occurring due to using the "0001-load-sii-from-file.patch" 
patch.  During the SII config stage of a slave this patch will create a kthread 
to attempt to read the SII file from disk.  In the meantime the 
ec_fsm_slave_exec() command will continue returning a value of 1 (fsm in 
progress) but will not be using the presented datagrams (setting the datagram 
state to EC_DATAGRAM_INVALID).

During initial startup and configuration of the master the 
ec_master_exec_slave_fsms() call is made from ec_master_idle_thread() in a loop 
with (in my configuration) a call to schedule() before resuming the loop.  This 
means that multiple loops may occur before a reply to a slave's datagram 
returns, leaving plenty of time for the in-use datagrams to be recycled, 
resulting in their state or data being overwritten.


The patch I have attached now also tests the datagram's state for 
EC_DATAGRAM_INVALID before incrementing the external datagram ring index.  This 
solves my problem where the datagram's state is being set to EC_DATAGRAM_INVALID 
while waiting for the kthread to complete.

I suspect there may be other instances where this problem could occur.  One 
case I have thought of, but haven't been able to confirm, is when multiple 
protocols try to access a slave's mailbox at the same time (e.g. CoE, EoE, FoE 
etc).  Only one protocol is allowed to communicate at a time.  The other 
protocols will be offered a datagram from the ring, but they aren't able to use 
it until their turn comes up.  In these cases if ec_read_mbox_locked() fails 
the datagram state is also set to EC_DATAGRAM_INVALID so the patch should also 
cover this case.


Regards,
Graeme.


From: etherlab-dev <etherlab-dev-boun...@etherlab.org> 
On Behalf Of Graeme Foot
Sent: Monday, 4 March 2019 2:36 PM
To: etherlab-dev@etherlab.org<mailto:etherlab-dev@etherlab.org>
Subject: Re: [etherlab-dev] Missing Vend

Re: [etherlab-dev] Install in Kernel 4.15? Exclude examples?

2018-09-10 Thread Gavin Lambert
The unofficial patchset 
(https://sourceforge.net/u/uecasm/etherlab-patches/ci/default/tree/#readme) 
contains driver and other patches to let it build on kernel 4.14.  I haven’t 
personally tried anything newer than this.

Usually it’s better to stick to one of the supported versions (which tend to 
track popular LTS distributions, rather than bleeding edge – 4.15 isn’t even 
considered a stable version) unless you desperately need to use a newer kernel 
for some other reason.  In which case, you’re welcome to submit additional 
patches.

If you want to skip compiling all the examples, you can comment out the 
following line in Makefile.am (though it’s not recommended, since if you can’t 
compile examples then you’ll probably have problems with applications as well):
SUBDIRS += examples

If the version of your automake tools changes then you should run the bootstrap 
script again.

From: Chris Grigg
Sent: Tuesday, 11 September 2018 06:13
Subject: [etherlab-dev] Install in Kernel 4.15? Exclude examples?

Hello everyone,

Hope the summer is treating you all well. I have a multi-parter:

1. Has anyone attempted to install under Kernel 4.15? One of the examples is 
complaining about `init_timer`, which was removed in 4.15.

2. Alternatively/additionally, is it possible to remove the examples completely? 
I am unsure if there will be other breaking changes, but this is something I'd 
be interested in either way. Removing references from the `configure` and 
`configure.ac` scripts doesn't do the trick.

Finally, if anyone attempts to install and receives an error about 
`aclocal-1.14` not existing, you might need to modify a variable in `configure` 
to use 1.15 or whichever version is installed on your system. We were working 
fine until recently; I'm guessing that Ubuntu is now installing 1.15 
automatically.

Appreciate the help.

Thanks,

Chris
___
etherlab-dev mailing list
etherlab-dev@etherlab.org
http://lists.etherlab.org/mailman/listinfo/etherlab-dev


[etherlab-dev] Unofficial patchset update 20180622

2018-06-24 Thread Gavin Lambert
Hi all,



I've just updated the unofficial patchset to version 20180622.  It is still 
based on the same upstream default commit as before: 33b922.



https://sourceforge.net/u/uecasm/etherlab-patches/ci/default/tree/#readme



Notable changes since the last release (20171108):



  *   Refreshed all patches; they now contain a little more context.
  *   devices/0009-cx2100-2.6.patch and 0010-cx2100-4.9.patch: incorporated 
logging changes provided by Graeme Foot.
  *   base/-version-magic.patch: bumped version numbers since new patches 
have introduced new API.
  *   base/0032-signal-4.11.patch: due to stable/0017 this patch was rewritten 
(but is still required because stable is still missing an include).
  *   features/rt-slave/0001-allow-app-to-process-slave-requests-from-realtime.patch: 
incorporated changes provided by Graeme Foot.



New patches:



  *   Pulled stable/0017 through 0023 from stable-1.5 branch into default 
branch.
  *   devices/0011-linux-4.14.patch: Update device drivers for Linux 4.14.
  *   devices/0012-e1000-unused-variable.patch: Avoid uninitialized variable 
warning in e1000, provided by Graeme Foot.
  *   base/0033-dc-sync1-offset.patch: use both sync1_cycle and sync1_offset to 
determine SYNC1 register value; resolves issue with using SYNC1 shifts.  
Provided by Graeme Foot.
      *   Note that I've modified this slightly from the version posted in the 
ML; it now forces SYNC1 to 0 if you try to set SYNC1 values without a valid 
SYNC0 time, instead of setting up an incorrect SYNC0 cycle.
  *   features/eoe-rtdm/0001-eoe-addif-delif-tools.patch: explicit EoE 
interfaces, provided by Graeme Foot.
      *   Also modified slightly to resolve some compile errors and incorrect 
printfs.
  *   features/eoe-rtdm/0002-eoe-via-rtdm.patch: application-controlled EoE for 
RTDM, provided by Graeme Foot.



Note that other than a quick glance through and verifying it compiles I haven't 
done much vetting on the EoE patches, as I don't use EoE or RTDM myself.  
Please let me know if there are any issues, or if I've overlooked some other 
patches.


Re: [etherlab-dev] Cannot compile for kernel 4.14

2018-06-21 Thread Gavin Lambert
On 2 May 2018 17:34, quoth Joye Laurent:
> I'm using the code from etherlabmaster (default branch) + your own patches
> (Patchset 20171108 based on default branch 33b922ec1871).
> 
> I'm trying to compile a kernel 4.14 with rtmutex enabled at configure time.
> The compilation fails because an include directive is missing. If I add, in
> the file master/locks.h, the line "include " right after the line
> "#ifdef EC_USE_RTMUTEX", it works.

My apologies for the late reply.  I've tried this myself (without the extra 
#include) and it compiles ok.

Are you using an -rt patched kernel?  There is an assumption that if you want 
rtmutexes you should be using an -rt patched kernel as well, as otherwise they 
would provide less benefit.



Re: [etherlab-dev] Alias Addressing

2018-06-10 Thread Gavin Lambert
Note that this sort of question belongs on the users list, not the dev list.

When you are commissioning a particular network, you first run "ethercat 
slaves" to view the list of existing slaves on the network.  This will show 
both their position-based address and their alias-based address.

You can optionally run "ethercat alias" with additional parameters to specify a 
specific slave (usually using -p alone, but optionally with some combination of 
-a and -p if you want to reassign the alias of a slave, or use an existing 
alias as a base position).  This will set the alias of the specified slave, 
allowing it to be used in alias-based addressing later.

See "ethercat slaves --help" and "ethercat alias --help" for more information.

Giving slaves an alias is optional, but it can be useful where you have a 
non-linear network or if devices can be reordered without recommissioning.  
Either way your application needs to use matching addressing.

From application code, you use ecrt_master_slave_config() to begin defining 
the slave configuration.  This works with either position-based or alias-based 
addressing; see the documentation for that method for more information.
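
To make the (alias, position) pair a little more concrete, here is a small 
stand-alone sketch of how such a pair can be resolved to a bus position.  It is 
my reading of the addressing rules described above (alias 0 means plain 
position addressing; a nonzero alias acts as the base from which the position 
is counted), not code from the master; slave_t and find_slave are hypothetical 
names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, minimal view of a scanned slave. */
typedef struct {
    uint16_t alias;   /* 0 means "no alias assigned" */
} slave_t;

/* Resolve an (alias, position) pair to an index on the bus.
 *
 * alias == 0: position is the absolute ring position.
 * alias != 0: position is counted from the first slave carrying that
 *             alias, so (alias, 0) is the aliased slave itself.
 *
 * Returns the bus index, or -1 if no such slave exists. */
static int find_slave(const slave_t *bus, size_t count,
                      uint16_t alias, uint16_t position)
{
    size_t base = 0;

    if (alias != 0) {
        while (base < count && bus[base].alias != alias)
            base++;
        if (base == count)
            return -1;          /* alias not present on the bus */
    }

    if (base + position >= count)
        return -1;              /* runs off the end of the bus */

    return (int)(base + position);
}
```

For a bus whose slaves carry the aliases {0, 0x1000, 0, 0x2000, 0}, the pair 
(0, 2) refers to the third slave by position, while (0x2000, 1) refers to the 
slave directly after the one aliased 0x2000, wherever that group ends up being 
plugged in.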

From: lingjie_k...@amat.com
Sent: Friday, 1 June 2018 05:35
To: etherlab-dev@etherlab.org
Subject: [etherlab-dev] Alias Addressing

Hi,

I have a question regarding alias addressing. In page 12 of the 1.5.2 ethercat 
manual, it says that slave position can be specified by position addressing or 
alias addressing and refer to 7.1.2.

In 7.1.2, it says setting alias address by doing
ethercat alias [option] 


  1.  Does this command actually address a slave by its alias number and 
configure the network to use alias addressing for all slaves?
  2.  Does etherlab actually support alias addressing?
  3.  If so, how do I address a number of slaves by their aliases?
  4.  If each slave has a different PDO mapping, how do I specify the PDO 
mapping for each slave based on its alias?

Thanks,

-Lingjie


Re: [etherlab-dev] ec_lock_* vs. ec_ioctl_lock in master/ioctl.c

2018-03-15 Thread Gavin Lambert
On 5 March 2018 22:58, quoth Esben Haabendal:
> The multiple process world can be different though.  You basically end up
> with a new EtherCAT API.  A combination of the etherlabmaster API and a
> custom named semaphore API.  Without this, applications will not work
> properly together.  Why not include such a feature directly in
> etherlabmaster?  Without it, I think we are making the user-space
> applications (non-RTDM) into a second-class citizen.

If you have multiple independent processes then they must (of necessity) either 
operate on separate masters (in which case no locking beyond what the kernel 
already does is required) or they must all communicate (through some mechanism 
of your own devising) with a single process that "owns" the master.  The master 
library does not allow you to reserve or activate a single master concurrently 
in separate processes.

If you have multiple tasks within a single process operating on the same master 
(eg. multiple cycles with different intervals) then they _should_ operate on 
different domains, and *must* coordinate their calls to the ECRT APIs in some 
fashion.  In upstream Etherlab, this requires application-level locking.  In 
the current patchset, the locking is done for you (except for RTDM), but I'm 
not entirely convinced this is the correct design choice (see my other reply).

If you have one process that is running a realtime application loop and another 
process that only performs non-realtime tasks (eg. injecting CoE requests), 
even on the same master, then all versions of Etherlab handle this for you 
without requiring additional locking.  This is how you can still use the 
"ethercat" command line tool while running a realtime application.



Re: [etherlab-dev] Community contribution

2018-03-15 Thread Gavin Lambert
My apologies for the delayed response; I've recently switched email providers 
and this mailing list always seems to annoy our servers for some reason.

I'm also based in New Zealand and don't really travel all that much, so I'm not 
likely to get to a physical conference.


Having said that, I would love to get more of the patches in the patchset 
included upstream where possible -- the main reason why it's based on the 
default branch rather than the stable branch is because some of the patches 
from past versions of the patchset have already been merged to default.  This 
is also why I've structured it the way I have, with bugfixes separate from new 
features, and (theoretically at least) ordered by importance and intended 
merging order.  (Though I've perhaps been less aggressive in reordering patches 
than I should have, to reduce churn.)

I've also strived to introduce configure options or other easily togglable 
settings for some of the more niche or controversial functionality.  This is 
also the reason why it's a patchset rather than a full fork, so that it should 
be easier to merge into upstream Mercurial.  (And at least one other person has 
made a full fork available for the people who prefer that.)

I've tried to minimise compatibility breakage and indeed in the current version 
of the patchset I don't think there is any hard breakage (beyond unavoidable 
changes to ioctls, which the readme and version detection addresses).

As always, I welcome suggestions for changes to or alternate orderings of the 
patches that might make them easier to merge upstream, either in whole or in 
part.


For the record, regarding the patches base/0017 and base/0018 that seem 
particularly in contention at the moment: they were originally submitted by 
Knud Baastrup (included in his series of patches to mailbox functionality, 
which I feel is critically important) -- I'm not sure if he's still around, but 
perhaps he could further explain the motivation?

I do have some reservations about them as they do contradict the core Etherlab 
documented policy of requiring application-level locks when running multiple 
realtime tasks -- and my own applications use just one realtime task and don't 
use EoE so they do not require any additional locking.  But they seemed 
valuable to retain specifically for the case of a pure userspace application 
that wants to use EoE, since in this scenario it is not possible to supply 
callbacks to provide the necessary locking for EoE (unless I've missed 
something?).  This is the main reason that I have retained them, although I did 
rewrite them a couple of times to make the locking optional for RTDM 
applications, where it is counterproductive.  I welcome alternative suggestions 
for handling these cases as well.

I have not yet had time to fully review Graeme Foot's latest EoE patches, so I 
haven't integrated them yet, but they do seem like a step towards treating EoE 
as a first-class task explicitly managed by the application (as if they were a 
separate domain), which in turn could perhaps be extended to pure userspace 
applications.  This might perhaps be another path towards integrating EoE in a 
way that doesn't require these additional lock patches.

> -Original Message-
> From: Graeme Foot
> Sent: Tuesday, 6 March 2018 18:27
> To: Florian Pose ; etherlab-dev@etherlab.org
> Subject: Re: [etherlab-dev] Community contribution
> 
> Hi,
> 
> I would also be interested in an etherlab related ethercat conference, but
> it's unlikely that I would be able to get to it.  I'm based in New Zealand.
> 
> My best chance of getting to that part of the world (though very small) is
> that I could be able to go to Hannover Messe in 2019, though that is probably
> too far away.  There is an even smaller chance I could get to the 2018
> Hannover Messe, but that may be too soon (23-27 April).
> 
> Regards,
> Graeme.
> 
> 
> -Original Message-
> From: Florian Pose
> Sent: Monday, 5 March 2018 10:18 PM
> To: etherlab-dev@etherlab.org
> Subject: Re: [etherlab-dev] Community contribution
> 
> Hello all,
> 
> On Fri, Mar 02, 2018 at 10:58:54AM +0100, Esben Haabendal wrote:
> > > Do main contributors (Florian Pose, Philipp Weyer, Gavin ...) have
> > > an opinion on that ?
> >
> > I really hope they do, and hope they will participate in the
> > discussion here.
> 
> sure we have. ;-)
> 
> The goal must be to maintain one source that as-many-as-possible users can
> live with. I understand that the current situation (with different versions of
> patchsets) is not satisfactory.
> 
> From the IgH point-of-view we have the stable-1.5 branch that we see as
> matured software and that we use heavily in our everyday projects. This
> branch nearly contains everything *we* need (except for some native
> drivers that are more up-to-date).
> 
> The other side is (and this is what was the goal from beginning) that the
> master (and moreover all other software within the EtherLab project) sh

Re: [etherlab-dev] EoE patchs and questions

2018-02-15 Thread Gavin Lambert
Those sound like great changes to have.

 

I suspect the EoE-OP thing came from an assumption that the slave had to be
in OP to transfer EoE frames; there was previously a similar assumption
regarding the DC reference clock that was fixed in [559f2f]
<https://sourceforge.net/p/etherlabmaster/code/ci/559f2f9c5b08700f2e4722f498799236a2c9f78a/>.
I don't have any experience with EoE myself but a quick glance through the
manual for EL6614 does suggest that it will happily do EoE in PREOP and
above.  Do you think there could be any older slaves that might need OP for
that?

 

The register write to 0x808 as a recovery from that condition seems a bit
peculiar - most of those registers are read-only while SM1 is enabled -
though you're writing 0 to 0x80E, which should disable the SM, which then
ought to stop it working entirely, unless something reconfigures it.

 

Perhaps inspecting other SM registers might be interesting?  Or see if
there's anything noticeable around that time in a Wireshark trace (if you
have some way to detect exactly when it stops)?  Does the problem still
happen with fewer patches applied?

 

From: Graeme Foot
Sent: Friday, 16 February 2018 19:01
To: etherlab-dev@etherlab.org
Subject: [etherlab-dev] EoE patchs and questions

 

Hi,

 

I've been setting up my system to use EoE (Ethernet over EtherCAT) with an
RTAI user space application.

 

I've updated my master to revision 33b922ec1871 (default branch) and applied
the gavinl (Gavin Lambert) patch set 20171108.

Linux 2.6.32.11

RTAI 3.8.1

 

 

Firstly I have a bit of a different use case for my EoE.  The current
implementation auto creates and removes the eoe interfaces as the EoE
capable slaves are configured and removed.  This means the interface is not
available until the slave is scanned, and is not available if it is removed.
The eoe interface is also temporarily destroyed on a bus rescan.  In my use
case I want to bridge the eoe interface to a real Ethernet interface.  So I
want the eoe interface to always exist whether the slave is plugged in or
not.

 

So the first patch does a few things:

1) adds explicit eoe_addif and eoe_delif tool functions so that you can
manually add/remove an eoe iface without the slave existing

2) no longer deletes an eoe iface if the slave disappears

3) will relink a slave to an eoe iface when it is configured

4) will let you configure eoe ifaces via the sysconfig/ethercat config file

5) will let you turn off auto creation of eoe ifaces via the
sysconfig/ethercat config file

6) no longer keeps slaves with EoE capability in OP mode when the master is
deactivated

 

The above is made possible by using the netif_carrier_on() and
netif_carrier_off() functions of the iface.  (The same as having a normal
network interface up, but not plugged in.)

 

The other thing the patch does is fix a race condition bug in the eoe iface
code.  The current implementation uses a struct list_head queue with a
semaphore to protect it between the iface tx callback and the ethercat
thread.  Sleeps are not allowed in the iface's tx callback as it is in an
interrupt context.  To fix this I have changed the queue to a ring buffer so
that it no longer needs a lock.
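
For readers unfamiliar with the technique, a single-producer/single-consumer
ring buffer of that kind can be sketched as below.  This is a user-space
illustration using C11 atomics, not the code from the patch (kernel code would
use the kernel's own barrier primitives); txq_t, txq_init, txq_push, txq_pop
and TXQ_SIZE are hypothetical names.  It relies on exactly one producer (the
tx callback) and one consumer (the ethercat thread).

```c
#include <stdatomic.h>
#include <stddef.h>

#define TXQ_SIZE 8   /* hypothetical; must be a power of two */

typedef struct {
    void *slot[TXQ_SIZE];
    atomic_size_t head;   /* written only by the producer (tx callback)     */
    atomic_size_t tail;   /* written only by the consumer (ethercat thread) */
} txq_t;

static void txq_init(txq_t *q)
{
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Producer side: never blocks or sleeps.
 * Returns 1 on success, 0 if the queue is full (caller drops or retries). */
static int txq_push(txq_t *q, void *frame)
{
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);

    if (head - tail == TXQ_SIZE)
        return 0;                                   /* full */

    q->slot[head % TXQ_SIZE] = frame;
    /* Publish the slot contents before making the new head visible. */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return 1;
}

/* Consumer side: returns the oldest frame, or NULL if the queue is empty. */
static void *txq_pop(txq_t *q)
{
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    void *frame;

    if (head == tail)
        return NULL;                                /* empty */

    frame = q->slot[tail % TXQ_SIZE];
    /* Release the slot only after its contents have been read. */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return frame;
}
```

Because each index has a single writer, neither side ever has to wait for the
other, which is what makes this safe to call from a context that must not
sleep.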

 

FYI, when the race condition occurred I was getting:

BUG: scheduling while atomic

Call Trace:

[] ? ktime_get_real+0x0/0x29

[] ? ktime_get+0x0/0x88

 

Florian you may be interested in this patch, especially the bug fix part.

 

 

The second patch is so that I can run the EoE pump without callbacks.  As I
am using a user space RTAI application I cannot use callbacks as they would
need to call back from a kernel context to the user space context.  Instead
I am running a thread in my application that makes calls into EtherCAT in a
similar fashion to the master's EoE thread.  I have created two functions
(ecrt_master_eoe_is_open() and ecrt_master_eoe_process()) to call without
application locks as the locks only need to be around the
ecrt_master_receive() and ecrt_master_send_ext() calls.

 

 

Now for the question.  I have been hammering my test rig pretty hard with
various communications (pings with multiple fragments multiple times a
second from both directions, SDO calls to the EoE slave without a pause
approx. 100 per second).  Every now and then (after around 10 to 30 minutes
with the above tests) the receive mailbox (SM1) of the EoE slave stops
responding (slave to master).  CoE reads to the slave also fail.  The
transmit mailbox still continues to function.  The RX SM1 status register
continually returns a zero value.  I have found that if I send the command
below the receive mailbox starts to function again (until it doesn't):

 

  ethercat reg_write -p3 0x808 -tuint64 0

 

Has anyone else come across this?  At the moment I suspect a slave
firmware bug (EL6614).  Does anyone have any other ideas?

 

 

Regards,

Graeme Foot.

 

 

 


Re: [etherlab-dev] fsm_slave buggy?

2018-02-08 Thread Gavin Lambert
The documentation is not entirely up to date, so it can be misleading at
times.

 

Returning 1 does not indicate success or failure, it indicates whether the
datagram has been populated and needs to be sent (and thus whether the state
machine needs to continue).  Success or failure of the request is indicated
by the state member of the request itself.
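
To illustrate the distinction, here is a deliberately simplified, stand-alone
sketch.  It is not the fsm_slave code; request_t, datagram_t, fsm_exec and the
REQ_* states are hypothetical stand-ins.  The point is only that the return
value answers "is there a datagram to send?", while the outcome lives in the
request.

```c
/* Hypothetical request states, loosely modelled on a request life cycle. */
typedef enum { REQ_QUEUED, REQ_BUSY, REQ_SUCCESS, REQ_FAILURE } req_state_t;

typedef struct {
    req_state_t state;   /* outcome of the request is reported here */
} request_t;

typedef struct {
    int populated;       /* 1 if the datagram holds data to send */
    int reply_ok;        /* filled in once a reply has come back */
} datagram_t;

/* One step of a toy state machine.
 * Returns 1 if the datagram was populated and must be sent (the state
 * machine is still running), 0 otherwise.  The return value says nothing
 * about whether the request succeeded. */
static int fsm_exec(request_t *req, datagram_t *dg)
{
    switch (req->state) {
    case REQ_QUEUED:
        dg->populated = 1;              /* build the request datagram */
        req->state = REQ_BUSY;
        return 1;
    case REQ_BUSY:
        dg->populated = 0;              /* reply consumed, nothing to send */
        req->state = dg->reply_ok ? REQ_SUCCESS : REQ_FAILURE;
        return 0;
    default:
        dg->populated = 0;              /* idle or already finished */
        return 0;
    }
}
```

A request that ends in REQ_FAILURE still produced a return value of 1 on its
first step, so a caller has to look at req->state to learn the result.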

 

Having said that, there are some inconsistencies in these state machines,
which have been addressed in the unofficial patchset.  Have a look at
https://sourceforge.net/u/uecasm/etherlab-patches/ci/default/tree/#readme.

 

From: William Ledda
Sent: Friday, 9 February 2018 05:54
To: etherlab-dev@etherlab.org
Subject: [etherlab-dev] fsm_slave buggy?

 

Dear developers, 

I was looking into the code of the stable-1.5 branch and it seems there are
some bugs in fsm_slave. As far as I understand from the documentation of the
following functions: 

 

*   ec_fsm_slave_action_process_sdo, 
*   ec_fsm_slave_action_process_reg, 
*   ec_fsm_slave_action_process_foe, 
*   ec_fsm_slave_action_process_soe

 

they should return 1 in case of success (i.e. request processed) and 0
otherwise. Instead, they return 1 even when requests are aborted. I see
new commits on the stable version branch but nothing about this. In the
default branch they were instead fixed a long time ago (2597:0e145bb05859
2014-11-12 14:42 ?) 

 

Which is the correct one? 

 

Kind regards

 

William



Re: [etherlab-dev] Explicit Device ID

2017-11-28 Thread Gavin Lambert
On 29 November 2017 12:33, quoth Matthieu Bec:
> in my use case the devices are all the same and serviceable. We'd like to
> avoid reprogramming IDs if a device fails and gets swapped by another, so
> we intend to use an ID from an EEPROM dongle (that will be permanently
> mounted) rather than SII (that will change with the new device).

The replacement-of-identical-devices case is what the default positional 
addressing is designed for -- you take a device away and put a replacement 
device at the same relative position in the network so that it assumes the same 
duties.  As long as your network is mostly designed as a chain and you only 
remove and replace devices when the master application is not running (or the 
master application can detect that devices are in an abnormal state and waits 
for the network to be corrected before resuming operation), this goes a long 
way.

Explicit device addressing is more for the case where you have a tree-structure 
network where people might plug things in with a different order, or with 
groups of devices that can appear or disappear at different times.


What I do is to hold the ESC in reset on boot briefly so that I can access the 
SII EEPROM before the ESC can (I have it wired as dual-master for exactly this 
reason); during this time it checks the stored station alias against the 
external ID and if it's zero or otherwise unprogrammed then it replaces it with 
the external id.  (It can also recover from blank or outdated SII contents, 
making one less thing to reprogram for a firmware update.)

In your case you could do something similar, but always overwrite the station 
alias with the external ID.  Or allow the ESC to boot and update the alias 
register in memory only -- I don't really recommend that approach though as it 
produces a race condition with being visible on the EtherCAT network.  You 
could possibly also write it back to the SII (if the master has been configured 
with --enable-sii-assign) so that it will already be correct on next boot.

(If you do update the SII EEPROM, then don't forget that the alias participates 
in the checksum, so you have to recalculate that too.)
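
As a sketch of that recalculation: to the best of my knowledge the checksum is
a CRC-8 (polynomial x^8 + x^2 + x + 1, initial value 0xFF) over the first 14
bytes of the SII, stored in the low byte of word 7, with the configured
station alias in word 4.  Please verify those offsets and parameters against
the ESC datasheet before relying on them; sii_crc8 and sii_set_alias are
hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offsets within the SII image (assumed, see note above). */
#define SII_ALIAS_OFFSET     8   /* word 4: configured station alias */
#define SII_CHECKSUM_OFFSET 14   /* word 7, low byte: checksum       */

/* CRC-8, polynomial 0x07, initial value 0xFF, MSB first. */
static uint8_t sii_crc8(const uint8_t *data, size_t len)
{
    uint8_t crc = 0xFF;

    while (len--) {
        crc ^= *data++;
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x80)
                crc = (uint8_t)((crc << 1) ^ 0x07);
            else
                crc = (uint8_t)(crc << 1);
        }
    }
    return crc;
}

/* Write a new alias into an in-memory SII image (at least 15 bytes long)
 * and refresh the checksum that covers it. */
static void sii_set_alias(uint8_t *sii, uint16_t alias)
{
    sii[SII_ALIAS_OFFSET]     = (uint8_t)(alias & 0xFF);   /* little-endian */
    sii[SII_ALIAS_OFFSET + 1] = (uint8_t)(alias >> 8);
    sii[SII_CHECKSUM_OFFSET]  = sii_crc8(sii, SII_CHECKSUM_OFFSET);
}
```

The image would then be written back to the EEPROM by whatever means the
device firmware already uses; that part is outside the scope of this sketch.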

> Basically, what you describe in your last paragraph poking the AL Ctrl/Status
> Code Registers. Can you explain why this is less useful than station alias?

Mostly just the lack of present support in the master itself.  With station 
aliases, you can use alias-based addressing (such that all your 
ecrt_slave_configs are addressed as (alias, 0)), which means that the master 
application can remain at realtime and the library will automatically recognise 
network changes and reconfigure the slaves correctly even if they get 
reordered.  If you set up your domains correctly or otherwise adapt to devices 
going missing, you can keep talking to the other devices this way.

(Station aliases can also optionally be used directly as station ids in the 
low-level EtherCAT datagrams, but I don't think Etherlab makes use of this.)

With explicit ids, you can poke the registers to discover the "real" locations 
of devices but if you discover that something has changed you'll have to bring 
the whole master out of realtime and generate new configurations based on their 
current positions before you can resume operation.

Unless you want to try adding support for it to the master.

The basic functionality is simple enough (just looking up a different value and 
pretending it was the alias), but the complication is knowing whether a slave 
supports it or not.  ETG1020 specifies some ESI XML parameters 
(IdentificationAdo and IdentificationReg134) to specify this, but AFAIK they 
are not encoded in the SII EEPROM, which is all that Etherlab has access to.




Re: [etherlab-dev] Explicit Device ID

2017-11-28 Thread Gavin Lambert
On 29 November 2017 10:10, quoth Matthieu Bec:
> I have an application that requires uniquely identifying all our slaves
> (~180 that are essentially one same model) in the fieldbus.
> Using "configured station alias" seems it could work, since it means
> updating all the SII individually.
> Those are custom built slave built around micro-processor and we have a
> way to expose unique id (from a location eeprom dongle) I was looking at
> "Explicit Device Identification" from ETG1020 protocol enhancement.
> 
> Has anyone experience with that? I don't see it being currently supported
> in etherlab master.

The method I use is to program the EEPROM with a unique serial/alias address
during production, so in the field they're preprogrammed with an appropriate
station alias.

The alias can alternatively be set during commissioning using the "ethercat
alias" command, or other means.

These are probably the simplest approaches that just work.


The explicit device id method is mostly intended for things like hardware
dipswitches.  There's a procedure outlined in ETG1020 to let the slave
configure the alias on its own on bootup (the short version is to use the
value in the SII EEPROM if nonzero, then the id selector if nonzero, or
signal an error if both are nonzero).  This process is invisible to the
master and again would just work, given a slave that supports it.
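
As slave-side firmware logic, the short version above might look something
like the following.  This only encodes the rule as I summarised it (it is not
taken from the ETG1020 text), and choose_alias is a hypothetical name.

```c
#include <stdint.h>

/* Pick the station alias at boot from the two possible sources.
 *
 * sii_alias:   alias value stored in the SII EEPROM (0 = unprogrammed)
 * id_selector: value of the hardware ID selector, e.g. dipswitches
 *              (0 = not set)
 *
 * Returns 0 and stores the chosen alias in *alias, or -1 if both sources
 * are nonzero, which is treated as an error to be signalled. */
static int choose_alias(uint16_t sii_alias, uint16_t id_selector,
                        uint16_t *alias)
{
    if (sii_alias != 0 && id_selector != 0)
        return -1;                        /* conflicting sources */

    *alias = (sii_alias != 0) ? sii_alias : id_selector;
    return 0;
}
```

If both sources are zero the slave simply ends up with alias 0, i.e. without
an alias, and can still be reached by position.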

ETG1020 also specifies a command the master can send to request the slave
provide a hardware address as a one-off request.  This is not directly
supported by the Etherlab master (although you can use register requests to
do it yourself in the application or a commissioning tool).  This is
independent of the station alias but that also makes it less useful --
especially in a network where devices could be connected in alternate orders
(which is usually the reason why you want individual addressing) you should
be using the alias addresses to define your slave_configs, so that things
will Just Work™ when the network is reconfigured.




[etherlab-dev] Unofficial patchset update 20171108

2017-11-07 Thread Gavin Lambert
Hi all,

I've just updated the unofficial patchset to version 20171108.  It is still
based on the same upstream default commit as before: 33b922.

https://sourceforge.net/u/uecasm/etherlab-patches/ci/default/tree/#readme

Notable changes since the last release (just a few days ago!):

* New patch devices/0008-linux-4.13.patch.  This adds device patches for
kernel 4.13.  Just because.  Note that I haven't actually tested any of
these (other than verifying compilation); I'm still using 4.9 myself.

* New patch base/0031-debugif-3.17.patch.  This is Ricardo Delgado's fix for
kernel 3.17 and later when --enable-debug-if is specified.

* New patch base/0032-signal-4.11.patch.  This is Ricardo Delgado's fix for
kernel 4.11 and later.




[etherlab-dev] Unofficial patchset update 20171102

2017-11-02 Thread Gavin Lambert
Hi all,

I've just updated the unofficial patchset to version 20171102.  It is still
based on the same upstream default commit as before: 33b922.

https://sourceforge.net/u/uecasm/etherlab-patches/ci/default/tree/#readme

Notable changes since the last release:

* I've added a new "stable" directory of patches; these contain commits that
have been made upstream to the stable-1.5 branch but not yet to the default
branch.  I only looked at recent history for these so it's entirely possible
that I've missed a few.  But notable inclusions among these are an update to
the CCAT driver and the new IGB driver.

* I've dropped the old driver patches for Linux 4.9 and replaced them with
new patches for all drivers (including IGB) for Linux 3.18, 4.1, 4.4, and
4.9.  As usual since I don't have the hardware myself all I can guarantee is
that the orig code matches what was in the kernel sources (specifically the
vanilla+rt sources, in case it makes a difference) and that the EtherCAT
versions do compile.  I've made my best effort to forward-port the patches
but I can't make any promises that they'll work or have no bugs or memory
leaks.

* New patch base/0030-ext-timeout.patch: if you have a large domain, a fast
cycle, and many concurrent slave requests in progress, it can happen that
there is too much data to safely fit into the cycle, so the master wants to
defer some of the slave requests to the next cycle.  When it does this, it
tries to check when they were originally queued and time out the requests
beyond a short interval to prevent them being stuck forever -- however the
time it checks against is not actually the time they were queued but rather
the time that the *previous* datagram using the same "slot" was actually
sent, which is obviously silly and tends to always time out the request
datagrams even when not necessary.  This patch fixes that.
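
To show the difference in isolation, here is a small stand-alone sketch.  It
is not the patch itself; ext_datagram_t, its fields, the two helper functions
and the timeout value are all hypothetical.

```c
#include <stdint.h>

#define EXT_TIMEOUT_US 10000u   /* hypothetical deferral limit */

typedef struct {
    uint64_t queued_us;      /* when the current request was queued      */
    uint64_t last_sent_us;   /* when the slot's PREVIOUS datagram was sent */
} ext_datagram_t;

/* Old behaviour: measures from the previous use of the slot, which may be
 * arbitrarily long ago, so fresh requests tend to look expired. */
static int timed_out_old(const ext_datagram_t *dg, uint64_t now_us)
{
    return now_us - dg->last_sent_us > EXT_TIMEOUT_US;
}

/* Intended behaviour: measures from when this request was queued. */
static int timed_out_fixed(const ext_datagram_t *dg, uint64_t now_us)
{
    return now_us - dg->queued_us > EXT_TIMEOUT_US;
}
```

For a slot last sent at t = 500 ms whose new request was queued at t = 1000 ms,
a check at t = 1000.1 ms reports a timeout with the old comparison but not
with the fixed one.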

Additionally, there's one further patch which I have *not* included in the
patchset, but merely attached to this message for review, because I can't
decide whether it's a good idea to include it or not.  The issue is that
when you call ecrt_slave_config_state it performs no locking and returns
some cached state inside both sc and sc->slave (and patch
features/status/0001 adds a few more things from inside sc->slave).  There's
even a comment indicating that no locks are required to protect sc, which is
true (at least as long as the application is not being dumb).  The trouble
is that if a rescan is in progress (or just about to start) then it is
possible for sc->slave to become NULL inside the method, which in turn
causes a kernel BUG report -- but after this everything continues running
correctly.  The patch adds a lock to resolve this race condition, but I'm a
little hesitant about including it as I don't like the idea of potentially
slowing down ecrt_slave_config_state (as it might be called a *lot* by the
application on large networks), and rescans should be quite rare during
normal operation.  (I noticed this when I had made an experimental change
which caused rescanning to occur nearly constantly.)  Feedback is welcomed.



sc-state-lock.patch
Description: Binary data


Re: [etherlab-dev] foe_write for copley firmware update

2017-10-30 Thread Gavin Lambert
They are not the same.  The "p/etherlabmaster" repository is the mainline 
repository, while the "u/uecasm/etherlab-patches" repository contains 
unofficial patches not yet in mainline.

However, that particular patch is included in the mainline default branch, just 
not in the stable-1.5 branch.

The unofficial patchset does contain some additional FoE patches but they are 
new features rather than bugfixes; at least in this regard you should be fine 
using the mainline default branch alone.
 
> -Original Message-
> From: lingjie_k...@amat.com
> Sent: Tuesday, 31 October 2017 06:23
> To: ricardo.riba...@gmail.com
> Cc: etherlab-dev@etherlab.org
> Subject: Re: [etherlab-dev] foe_write for copley firmware update
> 
> Hi Ricardo,
> 
> Thanks for the information. Meanwhile, I just want to double-check: is
> the patchset version that you are talking about the same as the EtherLAB
> EtherCAT Master code in the Mercurial repository?
> https://sourceforge.net/p/etherlabmaster/code/ci/default/tree/
> 
> I ask because I see this entry in the NEWS file:
> 
> Changes since 1.5.2:
> 
> * Fixed FoE timeout calculation bug.
> 
> Best regards,
> 
> Lingjie (Kimi) Kong
> Software Engineer – Servo Control Engineering || Common Solution Group
> || Applied Materials lingjie_k...@amat.com || Office: (408)563-4400
> 
> 
> -Original Message-
> From: Ricardo Ribalda Delgado [mailto:ricardo.riba...@gmail.com]
> Sent: Monday, October 30, 2017 5:37 AM
> To: Lingjie Kong --TR 
> Cc: etherlab-dev@etherlab.org
> Subject: Re: [etherlab-dev] foe_write for copley firmware update
> 
> Have you tried the unofficial patchset? It solves an issue with the FoE
> timeout calculation
> 
> https://sourceforge.net/u/uecasm/etherlab-patches/ci/default/tree/
> 
> 
> regards!
> 
> On Wed, Oct 25, 2017 at 6:33 PM,   wrote:
> > Hi,
> >
> > Has anybody else noticed the following issue when downloading firmware
> > to a Copley drive with the foe_write command?
> >
> > First, I put the Copley drive into BOOT mode:
> >
> > ethercat -m0 states -p0 BOOT
> >
> > Then I send the Copley .cff file to download the firmware:
> >
> > ethercat -m0 foe_write -p0 /var/ftp/gather/BE2_2.99.cff
> >
> > However, I consistently get FOE_TIMEOUT_ERROR and FOE_WC_ERROR.
> >
> > Copley told me the following:
> >
> > “It looks like your EtherCAT FoE command timed out during the firmware
> > download.  Can you increase the timeout that it uses? At the start of
> > the firmware download the drive needs to erase the old firmware before
> > it can start writing the new file.  This can take several seconds.
> > During that time the drive will indicate that it's busy and the master
> > should keep trying.  It's possible that the master program you're
> > using quit before the drive finished erasing the flash.”
> >
> > Has anybody had a similar problem downloading firmware through
> > foe_write, and if so, how did you resolve it? Any suggestions would be
> > much appreciated.
> >
> > Best regards,
> >
> > Lingjie (Kimi) Kong
> > Software Engineer – Servo Control Engineering || Common Solution Group
> > || Applied Materials
> > lingjie_k...@amat.com || Office: (408)563-4400
> >
> >
> 
> 
> 
> --
> Ricardo Ribalda
> 
> 
> 



Re: [etherlab-dev] Etherlab master looses mailbox configuration during client connection loss

2017-09-20 Thread Gavin Lambert
The size in ecrt_slave_config_create_voe_handler specifies the maximum size of 
the VoE request/reply, which is different from (but must be equal to or smaller 
than) the actual mailbox size.

The mailbox size itself is specified by the slave's SII data, as read from its 
EEPROM during the INIT -> PREOP transition.  If the slave reboots or once 
communications are restored it will be reconfigured (which involves redoing the 
INIT -> PREOP transition), but that shouldn't change the sizes unless there's a 
bug in the slave itself.

It's possible you're running into some kind of timing error, where it's somehow 
trying to execute the VoE request before the slave has properly re-entered 
PREOP or higher, although I thought that such cases are prevented in the 
unofficial patchset at least.  My application code does perform an additional 
sanity check before executing requests though so it's possible I missed a 
corner case if that is omitted.

Try running your test again with "ethercat debug 1" in effect and look at the 
syslog.  In particular look for log messages starting with "Mailbox 
configuration"; this is where it reports the detected mailbox sizes that are 
later used in the error message below.  Also check in what order things are 
happening with regard to when you're interrupting the slave and when the VoE 
request tries to execute and generates the error.
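
As a rough illustration of why a detected mailbox size of 0 produces the "does not fit in mailbox" error quoted below, here is a small user-space sketch of the two size limits involved. The function name is hypothetical and the 6-byte mailbox header is an assumption of the sketch; it is not copied from the master source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: the mailbox header occupies 6 bytes. */
#define MBOX_HEADER_SIZE 6

/* Returns true if a VoE request of data_size bytes is acceptable:
 * it must not exceed the size reserved when the handler was created,
 * and header plus data must fit in the mailbox size read from the SII. */
static bool voe_request_fits(size_t handler_size, size_t data_size,
                             size_t mbox_size)
{
    if (data_size > handler_size)
        return false; /* larger than the handler's reserved buffer */
    return MBOX_HEADER_SIZE + data_size <= mbox_size;
}
```

With a mailbox size of 0 (i.e. not yet re-read after the slave came back), any non-empty request fails the second check regardless of the size passed when the handler was created.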
 
> -Original Message-
> From: Christoph Schröder
> Sent: Thursday, 21 September 2017 03:09
> To: etherlab-dev@etherlab.org
> Subject: [etherlab-dev] Etherlab master looses mailbox configuration during
> client connection loss
> 
> Hi All,
> 
> I encountered a problem with the recovery abilities of the Etherlab master
> after connection loss (e.g. pulling out the cable of one slave and plugging
> it in again). The master seems to reset the mailbox configuration. If I
> start a VoE request I get the following kernel message:
> [132256.054043] EtherCAT ERROR 0-main-0: Data size (24) does not fit in mailbox (0)!
> 
> The mailbox size configured through ecrt_slave_config_create_voe_handler
> seems to be lost, not only for the disconnected slave but also for the
> slave that never lost its connection (tested with 2). This happens with and
> without the newest unofficial patchset (20170914).
> 
> This seems to be a bug, as ecrt_slave_config_create_voe_handler has to be
> called before ecrt_master_activate, so recreating the config after
> recovery of the connection is not possible.
> 
> Without connection loss everything works fine, but we would like to make
> the system as robust as possible without the need to restart the application.
> Does anyone have an idea how to fix this, or can someone at least explain
> what happens during a connection loss and recovery, and which functions are
> called by the master?
> 
> 
> Thanks and best regards,
> Christoph
> 
> 
> 



[etherlab-dev] Unofficial patchset update 20170914

2017-09-14 Thread Gavin Lambert
I've just pushed a new update to the unofficial patchset; it is now at
version 20170914 and is based on the latest upstream default (33b922 at this
moment).


https://sourceforge.net/u/uecasm/etherlab-patches/ci/default/tree/#readme

Feedback is welcomed.

Notable changes:

  * Now ahead by one additional upstream commit.

  * Added one patch base/0029-kern-cont.patch.

      * This fixes an issue on recent kernels where parts of logged
        messages (especially at debug 1 or higher) end up on separate lines
        in the logs instead of the same line as intended.  (Upstream Linux
        would probably prefer that the values are buffered and only printed
        as complete lines, possibly replacing some usage with print_hex_dump
        calls, but that seemed like a larger change than I wanted to make at
        the moment.)

      * This probably should be higher up the list but I didn't want to
        renumber everything for such a minor change.
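
For reference, the "buffer first, print once" alternative mentioned above could look roughly like the following user-space sketch. It is not part of the patch; the helper name is made up, and in the kernel the finished line would be passed to a single log call rather than printed with printf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format up to len bytes as "XX XX ..." into out, so that the whole line
 * can be logged with one call instead of one call per byte.
 * Stops early if the buffer is too small for another byte.
 * Returns the number of characters written (excluding the terminator). */
static size_t format_hex_line(char *out, size_t out_size,
                              const unsigned char *data, size_t len)
{
    size_t pos = 0;

    assert(out != NULL && out_size > 0);
    out[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        /* a byte needs at most 3 characters plus the terminator */
        if (out_size - pos < 4)
            break;
        pos += (size_t)snprintf(out + pos, out_size - pos,
                                i ? " %02X" : "%02X", data[i]);
    }
    return pos;
}
```

Because the line is complete before it is logged, it cannot be split across log records the way a sequence of continuation fragments can.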


___
etherlab-dev mailing list
etherlab-dev@etherlab.org
http://lists.etherlab.org/mailman/listinfo/etherlab-dev


Re: [etherlab-dev] foe_write result

2017-09-14 Thread Gavin Lambert
That's odd, it definitely should be returning exit code 1 on any error
according to the code.  Are you sure you're testing it correctly?

But no, it does check whether the slave reports successfully receiving the
file or not and will return 1 in case of error on that end as well.
(Depending on the slave, this may or may not also indicate whether it was
successfully saved to the slave's internal memory or not.)

There is no documentation on the ioctls other than the source itself;
they're an internal interface mostly intended to be used only by the command
line tool and userspace library.  You will need to look at the source of the
tool for inspiration if you want to use them directly.  It's safer and
easier to use the tool commands if that suits your usage requirements
however; they just provide less detail for programmatic use.

 

From: lingjie_k...@amat.com [mailto:lingjie_k...@amat.com] 
Sent: Friday, 15 September 2017 12:07
To: Gavin Lambert ; etherlab-dev@etherlab.org
Subject: RE: foe_write result

 

Hi Gavin,

Thanks for your advice; I really appreciate it. I still have a couple of
questions that I want to double-check with you.

First, I assume that the non-zero exit code in case of error only reflects
whether the Linux command itself executed successfully. It does not
actually query the status of the slave to see whether it received the file
successfully.

For example, I tried to send a file from a folder over EtherCAT while
leaving the folder empty, by issuing the command

ethercat -m0 foe_write -p0 /var/ftp/gather/Drive_configuration.ccd

It prints "Failed to open '/var/ftp/gather/Drive_configuration.ccd'" on
stderr. However, the exit code is still 0, which means no error.

Second, is there a document that describes in detail how to use ioctl to
send a file over FoE? I am not sure where to start in the source code.

 

Best regards,

 

Lingjie (Kimi) Kong

Software Engineer - Servo Control Engineering || Common Solution Group ||
Applied Materials

lingjie_k...@amat.com <mailto:lingjie_k...@amat.com>  || Office:
(408)563-4400 || Cell: (858)242-9076


 




Re: [etherlab-dev] foe_write result

2017-09-14 Thread Gavin Lambert
The "foe_write" ethercat command will print an error on stderr and return a
non-zero exit code in case of error.  It's intended to be used interactively
by whoever is commissioning the system, although it can be used in a script.

Alternatively if you want to call it programmatically you could use the
equivalent master ioctl call directly, which provides the specific FoE error
code.

The unofficial patchset also adds some additional ecrt.h APIs that you can
call for FoE, although these require more "plumbing" to use.
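
If you just want to act on the exit code from your own program rather than a shell script, a minimal sketch using only standard POSIX calls might look like this. The helper name is hypothetical, and it only reports the exit code; it does not give you the specific FoE error code that the ioctl provides.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Run a shell command and return its exit code, or -1 if it could not be
 * started or did not exit normally (e.g. it was killed by a signal). */
static int run_command(const char *cmd)
{
    assert(cmd != NULL);
    int status = system(cmd);
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For example, run_command("ethercat -m0 foe_write -p0 <file>") with your own file path in place of <file> would be treated as successful only if it returns 0.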

 

From: lingjie_k...@amat.com
Sent: Friday, 15 September 2017 10:25
To: etherlab-dev@etherlab.org
Subject: [etherlab-dev] foe_write result

 

Hi,

I am working on File over EtherCAT (FoE) to send a motor driver's
configuration file. From the ethercat 1.5.2 documentation, it looks like I
can use foe_write to send the file. However, what can I check to see
whether the file transfer was successful?

 

Best regards,

 

Lingjie (Kimi) Kong

Software Engineer - Servo Control Engineering || Common Solution Group ||
Applied Materials

lingjie_k...@amat.com   || Office:
(408)563-4400



[etherlab-dev] Unofficial patchset update 20170727

2017-07-31 Thread Gavin Lambert
uded ecrt_slave_config_alloc_sdo_request because it's
not really any different from the other two create methods.  (You can always
pass 0 for the index/subindex if you want that.)
   While I still personally prefer the single-API version, this makes it
a non-ECRT-API-breaking change, although still an ABI-breaking change.

- Patch 0066 has been rejected.  I don't think it's worthwhile adding an
entirely separate API for this; just verify that your app was compiled
against the correct version. I've added some notes in the readme about doing
that, and added a patch (base/) to make it easier to distinguish patched
and unpatched sources and binaries.  (And there are plenty of other patches
that break ABI; it doesn't seem useful singling this one out.)

From http://lists.etherlab.org/pipermail/etherlab-users/2016/003112.html
(Dr.-Ing. Wilhelm Hagemeister):

- I have not applied this patch, since it limits performance, it's a bit
device-specific, and it came from IgH themselves.

From http://lists.etherlab.org/pipermail/etherlab-dev/2017/000573.html
(Patrick Brünn):

- Imported as base/0005-support-vm_fault-kernel-v4.10.

From http://lists.etherlab.org/pipermail/etherlab-dev/2017/000581.html
(Gavin Lambert):

- The changes suggested in this post (and the later correction) have
been folded into base/0019-Support-for-multiple-mailbox-protocols (formerly
patch 0005).

From http://lists.etherlab.org/pipermail/etherlab-dev/2017/000583.html
(Graeme Foot):

- Imported as devices/0005-cx2100-2.6, although it only supports Linux
2.6.32.
- Theoretically forward-ported to Linux 4.9 as devices/0006-cx2100-4.9
-- it compiles, at least; I don't have the hardware to test it.
- I assume that this is intended only for 32-bit systems, as there is
some code that generates suspicious warnings when compiled for 64-bit.  I
haven't tried to correct this.

From http://lists.etherlab.org/pipermail/etherlab-dev/2017/000587.html
(Nir Geller):

- I'm not sure what to do about this patch.  AFAIK in theory the change
shouldn't be needed (there's a separate EoE thread which should be running
in Linux kernel mode, so app code shouldn't need to do anything in
particular other than getting lock callbacks correct if not in vanilla
usermode) but as I don't use EoE myself I don't know enough about it to say
for sure.  Any chance someone else can chime in on this one?

From http://lists.etherlab.org/pipermail/etherlab-dev/2017/000592.html
(Steffen Dirkwinkel):

- I haven't applied the suggested patch, but since ssize_t is only used
in one place, and it really didn't need to be (especially since no existing
callers appear to use the return value anyway), I've opted to remove this
usage instead, as patch base/0006-avoid-ssize_t.

From http://lists.etherlab.org/pipermail/etherlab-dev/2017/000595.html
(Graeme Foot):

- Imported patch 0001 as
base/0007-replace-fprintf-calls-with-EC_PRINT_ERR.

- Imported patch 0008 as
base/0008-read-reference-slave-clock-64bit-time.  (Perhaps this should have
been a feature patch instead, but it seems harmless enough.)

- Imported patch 0010 as
features/rt-slave/0001-allow-app-to-process-slave-requests-from-realtime.

From http://lists.etherlab.org/pipermail/etherlab-dev/2017/000596.html
(Ricardo Ribalda Delgado):

- This patch is already included in the default branch.

From http://lists.etherlab.org/pipermail/etherlab-dev/2017/000600.html
(Gavin Lambert):

- I have elected to not make the change mentioned here, as my analysis
wasn't quite correct and it shouldn't make any practical difference.  And
the change would introduce busy cycles, which is undesirable.

Let me know if I've missed any patches, or if you think some of them were
mishandled.




Re: [etherlab-dev] Problem using userspace library and EoE

2017-05-29 Thread Gavin Lambert
On 26 May 2017 19:02, quoth Gregor Beck:
> reading and writing SDO's from userspace while an EoE interface is up (but
> otherwise unused) occasionally failed with:
> 
> Failed to execute SDO upload: Input/output error
[...]
> Skimming the documentation suggests using ecrt_master_callbacks() but it
> isn't provided in userspace.
> 
> How is this supposed to work?

You might want to have a look at the unofficial patchset
(http://sourceforge.net/u/uecasm/etherlab-patches/ci/default/tree/#readme).

This includes a number of patches specifically intended to improve
interoperability of CoE and EoE, among other things.

I'm planning a new release of it in the near-ish future, but have been
delayed by some other things so it'll probably still be a few weeks away.




Re: [etherlab-dev] Pre-announcement: unofficial patchset update (Gavin Lambert)

2017-05-07 Thread Gavin Lambert
On 7 May 2017 20:33, quoth Graeme Foot:
> I don't use ec_master_set_send_interval() or --enable-hrtimer since my
> master's operational thread has always run at 10ms (100Hz) anyway.  (I
> probably should so will look at adding that in at some stage.)  My Linux
> kernel is configured to run at 100Hz and the master thread is not realtime
> so is scheduled by the Linux scheduler (RTAI).  Because Linux is set to
> 100Hz, it only runs the master's operation thread once every 10ms.
> 
> Prior to your SDO patch, ec_fsm_master_action_process_sdo() was being
> called by the master's fsm, but from its idle state.  So it would only be
> called after all other processing and housekeeping was complete and was
> only being fired approx once every 800ms on my setup with 50 odd slaves.
> After the patch ec_master_exec_slave_fsms() is now called every time the
> master's operational thread fires.  All good except that on my system that
> is still only once every 10ms.

Right, but what I was saying is that prior to my patch it would actually
service the requests faster than 10ms even on your system due to the way the
re-scheduling is done (the master isn't idle until it finishes any
outstanding requests, so it calls schedule() instead of schedule_timeout(1)
-- ie. if there's no other work for the kernel to do it will reschedule
immediately instead of waiting for the next 10ms time slice).  After my
patch the master is idle even while requests are in progress, so the
condition it checks is no longer sufficient and it calls schedule_timeout(1)
too soon, making it slower than it should be.

I didn't notice this regression because I *am* using --enable-hrtimer, which
does not have the same issue.  So what I was suggesting is that *you* try
using --enable-hrtimer, which I'm reasonably certain will solve that
performance issue without needing to try to exec slave FSMs from the
realtime context.

If you can't use --enable-hrtimer for some reason, then the most likely
solution to the above issue is to find the two lines where it checks for
ec_fsm_master_idle (in ec_master_idle_thread and ec_master_operation_thread)
and change the condition from this:

    if (ec_fsm_master_idle(&master->fsm)) {

to this:

    if (ec_fsm_master_idle(&master->fsm)
            && !master->fsm_exec_count) {

This is just air code and I haven't tested it, but it seems reasonably
likely to solve the issue and restore performance without --enable-hrtimer
to pre-patch levels or better.  There might be a risk that it will make the
kernel do some busy-waiting in some cases, but that shouldn't bother an RTAI
application.
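
To spell out the intent of that condition, here is a small user-space model of the rescheduling decision. The enum and function names are hypothetical, and it assumes fsm_exec_count is non-zero while slave requests are still being processed, which is what the suggested condition relies on.

```c
#include <assert.h>
#include <stdbool.h>

/* What the master thread does at the end of one loop iteration. */
typedef enum {
    RESCHED_IMMEDIATE, /* models schedule(): run again as soon as possible */
    RESCHED_NEXT_TICK  /* models schedule_timeout(1): sleep until next tick */
} resched_t;

/* Current condition: only the master FSM's idle state is consulted. */
static resched_t resched_current(bool master_fsm_idle)
{
    return master_fsm_idle ? RESCHED_NEXT_TICK : RESCHED_IMMEDIATE;
}

/* Suggested condition: also stay awake while slave FSMs are executing. */
static resched_t resched_suggested(bool master_fsm_idle,
                                   unsigned fsm_exec_count)
{
    return (master_fsm_idle && fsm_exec_count == 0)
        ? RESCHED_NEXT_TICK : RESCHED_IMMEDIATE;
}
```

The interesting case is an idle master FSM with slave requests in flight: the current condition sleeps for a full tick (10ms at 100Hz), while the suggested one reschedules immediately.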

> So my new patch allows ec_master_exec_slave_fsms() to be called from my
> realtime context.  As you pointed out the master_sem lock would cause a
> deadlock, so I don't use it.  Because I don't use the lock I have instead
> added some flags to track whether it is currently safe to make the
> ec_master_exec_slave_fsms() call.  It's generally just the rescan that's a
> problem.

I haven't looked at your patch in detail, but it makes me nervous to pull
code outside of a lock like that; there are a lot of data structures that it
protects, and some of them might be more subtle than rescan.  Also, while
this probably isn't a problem with RTAI (since it can pre-empt the Linux
kernel), this API probably would be unsafe to use with regular kernel or
userspace code due to the inverse problem -- what if the app code is in the
middle of executing ec_master_exec_slave_fsms() when the master thread
decides to start a rescan (or otherwise mutate data structures it depends
on)?

> I don't know if the patch will be useful for anyone else, but it is useful
> if Linux is configured for 100Hz.  It may also be useful on short cycle
> time systems, e.g. 100 - 250us cycle times, where you want to process the
> SDOs faster.  Even if Linux is set to 1000Hz it will only schedule the
> master operational thread at 1ms.  The master thread may also be delayed if
> the Linux side gets some heavy CPU usage.

SDOs by design are intended to be slower-than-cycle tasks.  They're for
occasional configuration, diagnostic, or slow acyclic tasks, not for rapid
activity, so if you're trying to get 1ms or higher response rates out of
them, you're probably doing it wrong.  (Recommended timeouts for SDO tasks
are generally measured in *seconds*, not milliseconds.)




Re: [etherlab-dev] Pre-announcement: unofficial patchset update (Gavin Lambert)

2017-05-04 Thread Gavin Lambert
Mere moments ago, quoth I:
> On 28 April 2017 11:59, quoth Graeme Foot:
> > 2) Your 0054 patch successfully moved the slave sdo request processing
> > from the master fsm's idle state to be called every cycle of the master
> > thread.  However, I have my Linux environment set to run at 100Hz, so
> > this thread would only fire once every 10ms.  Each SDO read/write request
> > would take approx. 30ms to complete.  I didn't want to change Linux to
> > run at 1000Hz so I created patch
> > "etherlabmaster-0010-allow_app_to_process_sdo_requests_from_realtime.patch".
> 
> Maybe I'm missing something about how the RTAI version works, but all I
> did was to move the logic from inside fsm_master to inside fsm_slave; AFAIK
> both of these execute on the same thread (either the master IDLE or master
> OPERATION thread).  This is independent of the application realtime loop
> either way, so this should not have changed how frequently they run; it
> just allows multiple slave requests to run in parallel rather than forcing
> them to execute sequentially, so it should be a net performance gain.
> 
> Though it does seem reasonable in your use case to want to run the slave
> FSMs more often.  I don't see how you could do that safely, though -- you
> can't run ec_master_exec_slave_fsms outside of the master_sem lock, and
> your RTAI task might interrupt Linux while it's still holding that lock,
> which will deadlock your RTAI task.

Are you configuring with --enable-hrtimer?  After having a quick look at the
code, I can see a potential performance degradation from the patch if you
aren't using it (or did enable it but didn't call
ec_master_set_send_interval with an appropriate value after activating the
master).  Perhaps you could try enabling that instead of using your patch
and see if it helps?




Re: [etherlab-dev] Pre-announcement: unofficial patchset update (Gavin Lambert)

2017-05-04 Thread Gavin Lambert
On 28 April 2017 11:59, quoth Graeme Foot:
> First off, I also prefer #3.  I use buildroot to create my Linux
> environment and buildroot applies patches in alphabetical order (at least
> in my version which is now pretty old), so the number at the front is
> important.  Buildroot also requires that the patch starts with the name of
> the package (and optionally a revision number), but that is easy for me to
> prefix.

FWIW, it looks like buildroot does follow a series file if one is present,
so you could just take the series file that's supplied with the patchset and
comment or delete from it the patches you're uninterested in, and add
additional ones of your own.

> When I tried to use your patchset (and the EtherCAT revision they were
> for) the computer would freeze just after starting my application and
> going realtime.  We use RTAI and I read your notes re that it wasn't tested
> with RTAI.  I didn't have much time to look into the problem but I suspect
> there may have been a lock that ended up in a call from the realtime
> context that was blocked due to being held in the master thread.

I suspect this is probably either patch 0007, patch 0011, or patch 0030.  If
you do get a chance to look into it, it'd be good to confirm that.  I'm
considering wrapping one or more of these in a configure --enable (or
otherwise disabling them when RTDM is enabled).

> I ended up cherry picking the changes I needed (patch 0054 (sdo requests 
> via slave fsm) and 0038 (sdo write request)).

That could be a little dangerous; the 005x patches are highly dependent on
prior patches and it's probably very bad to run 0054 without 0050-0053.

> As to other potential patches (Note: my patches are against 2526
> (2eff7c993a63) on the stable-1.5 branch):

Thanks, I'll look at including these too.

> 2) Your 0054 patch successfully moved the slave sdo request processing
> from the master fsm's idle state to be called every cycle of the master
> thread.  However, I have my Linux environment set to run at 100Hz, so this
> thread would only fire once every 10ms.  Each SDO read/write request would
> take approx. 30ms to complete.  I didn't want to change Linux to run at
> 1000Hz so I created patch
> "etherlabmaster-0010-allow_app_to_process_sdo_requests_from_realtime.patch".

Maybe I'm missing something about how the RTAI version works, but all I did
was to move the logic from inside fsm_master to inside fsm_slave; AFAIK both
of these execute on the same thread (either the master IDLE or master
OPERATION thread).  This is independent of the application realtime loop
either way, so this should not have changed how frequently they run; it just
allows multiple slave requests to run in parallel rather than forcing them
to execute sequentially, so it should be a net performance gain.

Though it does seem reasonable in your use case to want to run the slave
FSMs more often.  I don't see how you could do that safely, though -- you
can't run ec_master_exec_slave_fsms outside of the master_sem lock, and your
RTAI task might interrupt Linux while it's still holding that lock, which
will deadlock your RTAI task.




[etherlab-dev] Pre-announcement: unofficial patchset update

2017-04-27 Thread Gavin Lambert
Hi all,

I've recently been doing some more work on my local Etherlab-related code,
and as a result I'm planning to "shortly" (by which I mean still probably a
few weeks away) release a new version of my default-branch unofficial
patchset.

I thought it would be a good idea to let everyone know in advance both to
gather any new patches people might want to add (or feedback on or suggested
changes to existing patches), but also since this is the first update since
publishing it as a repository I'd like to know people's preferences with
regard to re-ordering patches from the existing set.  For example, I could:

1. Retain existing patches exactly as is (barring fuzz updates) and only add
new patches (modifications to existing patches are added as a new patch).
2. Retain existing patches with the same numbering but allow both new
patches (with strictly higher numbers) and modifying existing patches.
3. Renumber existing patches as needed to insert new patches in a logical
place (either grouping patches by related function or putting the simplest
patches first so they have fewer dependencies and are thus hopefully easier
to get included into upstream).
4. Abandon the idea of numbered patches entirely and just rely on consistent
names plus the series file to maintain order.  (Thus also allowing new
patches to be inserted wherever seems sensible.)

If I don't get any feedback, I'm probably inclined towards #3 at this point,
but I might go to #4 if it looks tricky to maintain patch history properly
with #3.  I could be persuaded towards #2 (though it's not my preference)
but am disinclined towards #1.  Or maybe there's some alternate method I
haven't considered.

I might see if I can group related patches into subdirectories -- I know
quilt/Debian patchsets support this, though I'm not sure about HG patchsets.

(FYI, I've already made a note of the patch updates Knud Baastrup submitted
in December, and a few other patches posted to the list since then, though I
haven't integrated them yet.)

Regards,
Gavin Lambert




Re: [etherlab-dev] Is AoE supported by IgH EtherCAT Master or is there a work around?

2017-01-24 Thread Gavin Lambert
On 25 January 2017 10:55, quoth Rusu Valerian:
> Is AoE supported by IgH EtherCAT Master or is there a work around? E.g. if
> it is not supported is it possible to get the card configured in a
> different setup that supports AoE then use it with IgH EtherCAT Master in
> my setup?

AoE is not supported.

As for whether there's a workaround, I don't know, as I'm not familiar with 
that terminal.  If it's only used for configuration and there is a way to 
persistently save that configuration to internal non-volatile memory (which 
would be unusual but not unprecedented) then it might be possible.  If it's 
intended for use during normal operation then you're probably out of luck, 
unless it provides a way to use CoE or PDOs as an alternative.




Re: [etherlab-dev] Problem to get VoE Data (Etherlab master patchset 20160804)

2017-01-19 Thread Gavin Lambert
On 20 January 2017 04:29, quoth Christoph Schroeder:
> On 01/19/2017 01:09 AM, Gavin Lambert wrote:
> > Try the following changes (sorry, can't generate a patch right now):
> >
> >* voe_handler.c:372:  "mbox_coe_data" should be "mbox_voe_data".
> >
> >* voe_handler.c:500:  add:
> >   memcpy(voe->datagram.data + EC_MBOX_HEADER_SIZE, data,
> > data_size);
> >
> >* voe_handler.c:608:  same here.
> >
> that worked, I had to add some minor changes:
> memcpy(voe->datagram.data + EC_MBOX_HEADER_SIZE, data, data_size);
> => memcpy(voe->datagram.data + EC_MBOX_HEADER_SIZE, data,
>        voe->data_size + EC_VOE_HEADER_SIZE);
> The VoE header is taken into account here by ecrt_voe_handler_data and
> there is also ecrt_voe_handler_received_header so the VoE header has to be
> copied too.

Sorry, I meant rec_size rather than data_size.

rec_size == voe->data_size + EC_VOE_HEADER_SIZE already, since voe->data_size 
isn't actually assigned until that line.
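
To make the size relationship concrete, here is a small user-space sketch of
the copy.  It is not the actual voe_handler.c code: voe_sketch_t and
voe_sketch_store are names made up for this example, and the two header sizes
are values I'm assuming here, so check them against the master headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values for this sketch; verify against the master sources. */
#define SKETCH_MBOX_HEADER_SIZE 6
#define SKETCH_VOE_HEADER_SIZE  6

/* Hypothetical stand-in for the parts of the VoE handler used here. */
typedef struct {
    uint8_t datagram_data[256]; /* plays the role of voe->datagram.data */
    size_t data_size;           /* payload size, excluding the VoE header */
} voe_sketch_t;

/* Copy a received mailbox payload (VoE header + data, rec_size bytes in
 * total) to the place where the library-side accessors expect it.
 * Returns 0 on success, -1 if rec_size is out of range. */
static int voe_sketch_store(voe_sketch_t *voe, const uint8_t *data,
                            size_t rec_size)
{
    assert(voe != NULL && data != NULL);

    if (rec_size < SKETCH_VOE_HEADER_SIZE ||
        rec_size > sizeof(voe->datagram_data) - SKETCH_MBOX_HEADER_SIZE)
        return -1;

    /* Copy rec_size bytes so that the VoE header is included. */
    memcpy(voe->datagram_data + SKETCH_MBOX_HEADER_SIZE, data, rec_size);

    /* data_size is only assigned here, which is why it cannot serve as
     * the memcpy length above. */
    voe->data_size = rec_size - SKETCH_VOE_HEADER_SIZE;
    return 0;
}
```

With a 10-byte reception this copies all 10 bytes and leaves data_size at 4,
which is the same thing as copying voe->data_size + EC_VOE_HEADER_SIZE.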




Re: [etherlab-dev] Problem to get VoE Data (Etherlab master patchset 20160804)

2017-01-18 Thread Gavin Lambert
On 19 January 2017 05:26, quoth Christoph Schroeder:
> I am currently testing the patchset 20160804 and encountered a problem
> introduced by 0005-Support-for-multiple-mailbox-protocols.patch. It's no
> longer possible to retrieve the data via the library's
> "ecrt_voe_handler_data" function after a successful ecrt_voe_handler_read.
> I still get the correct data size but there is no data found at the
> returned pointer address.
> 
> I guess this is the problem:
> "The mailbox state machines will check and fetch the data from their own
> buffer instead of the datagram buffer (that is no longer used for mailbox
> read data)."
> 
> The data seems to be stored somewhere else now and the function still
> returns a pointer to a datagram buffer. Communication still works and the
> data is there inside the master as debuglevel=1 prints the correct data into
> the kernel log. I found the lines where this is done in the newly introduced
> "ec_voe_handler_state_read_response_data", but how can I access the
> data from the library without using ioctl calls from my user application? I
> would appreciate any hint or an update of the patch.

Try the following changes (sorry, can't generate a patch right now):

  * voe_handler.c:372:  "mbox_coe_data" should be "mbox_voe_data".

  * voe_handler.c:500:  add:
 memcpy(voe->datagram.data + EC_MBOX_HEADER_SIZE, data, data_size);

  * voe_handler.c:608:  same here.

Give this a try and let me know if it helps, or if further changes are 
required, and I can update the patchset accordingly.




Re: [etherlab-dev] EoE in OP mode

2017-01-18 Thread Gavin Lambert
Note that the patchset has only really been tested with RT_PREEMPT or
otherwise standard user mode usage.

In particular, there are some patches that change locks and callbacks in
ways that I don't think are entirely compatible with RTAI / Xenomai; there
have previously been reported problems using those with this patchset.

As I was neither the author of those patches nor do I use Xenomai (or EoE)
myself, I don't really know what needs to be done to resolve the issues
(except just dropping them and possibly breaking the scenario they were
originally authored to fix); additionally, I don't have much time at the
moment to work on EtherCAT.  I welcome assistance in correcting this
situation. :)

As far as I understand, ec_master_send/receive are only ever supposed to be
invoked on one thread at a time; when you're using the userspace library
this is enforced by a Linux lock in the corresponding ioctl, but this
doesn't apply or is insufficient when using a kernel-mode application or
RTAI/Xenomai.  In those, you need to register callbacks and use your own
appropriate locking mechanism to ensure that the send/receive are not called
concurrently.

In particular note that both the send callback and the receive callback are
permitted to do nothing if called in a context where they can't wait on a
lock but something else is concurrently busy doing the same thing.  So if
you're calling send/receive from an interrupt thread, you will need to keep
track of this and force the EoE thread callback to block until the interrupt
is done, and also to make the interrupt thread avoid send/receive without
blocking if the EoE thread is already in the middle of it.  Alternately you
could probably make the interrupt handler responsible to do both of these
things and have the EoE callbacks always do nothing, which might be better
for your application performance.  (Though like I said, I haven't looked at
the code much in this area so take these suggestions with a grain of salt; I
could have something incorrect.)
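
As a rough illustration of that arrangement, here is a user-space sketch in
C11.  It is not code from the master or the patchset: bus_guard_t,
bus_exchange, irq_side_cycle and eoe_side_cycle are all names invented for
this example, the atomic flag stands in for whatever lock is appropriate on
RTAI/Xenomai, and bus_exchange stands in for the real receive/send calls.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical guard shared by the interrupt side and the EoE callbacks. */
typedef struct {
    atomic_flag busy;   /* set while somebody is inside receive/send */
    int exchanges;      /* how many times the bus was actually serviced */
} bus_guard_t;

/* Stand-in for the real receive/send pair; it only counts calls. */
static void bus_exchange(bus_guard_t *g)
{
    g->exchanges++;
}

/* Try to claim the bus without waiting; returns 1 on success. */
static int bus_try_enter(bus_guard_t *g)
{
    return !atomic_flag_test_and_set_explicit(&g->busy, memory_order_acquire);
}

static void bus_leave(bus_guard_t *g)
{
    atomic_flag_clear_explicit(&g->busy, memory_order_release);
}

/* Interrupt side: must never wait, so it skips the cycle when busy.
 * Returns 1 if the exchange was done, 0 if it was skipped. */
static int irq_side_cycle(bus_guard_t *g)
{
    assert(g != NULL);
    if (!bus_try_enter(g))
        return 0;
    bus_exchange(g);
    bus_leave(g);
    return 1;
}

/* EoE callback side: allowed to wait until the other side is done.
 * A real implementation would sleep or yield instead of spinning. */
static void eoe_side_cycle(bus_guard_t *g)
{
    assert(g != NULL);
    while (!bus_try_enter(g))
        ;
    bus_exchange(g);
    bus_leave(g);
}
```

The point is only the asymmetry: one side does a "try" and gives up, the other
side is allowed to wait.  Which side gets which role depends on your
application.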

 

From: etherlab-dev [mailto:etherlab-dev-boun...@etherlab.org] On Behalf Of Geller, Nir
Sent: Wednesday, 18 January 2017 23:38
To: etherlab-dev@etherlab.org; Slutsker, Rasty
Subject: [etherlab-dev] EoE in OP mode

Hi,

I recently upgraded ethercat master to Gavin Patchset 20160804, adding to
that, patch 0061.

EoE seems to be working fine while the master is idle, with heavy SDO
traffic in parallel.

When the master is active our realtime application invokes
ecrt_master_receive(master); and ecrt_master_send(master); from interrupt
context, and NOT from ec_master_operation_thread() thread context.

The problem comes up when the master is active.

Just as I issue

ifconfig eoe0a1 up

I get a bunch of UNMATCHED DATAGRAMS in the kernel log, and the master is
released.

[   73.324525] EtherCAT DEBUG 0: UNMATCHED datagram:
[   73.324528] EtherCAT DEBUG: 0D 83 01 00 10 09 08 80 00 00 68 5A 4A 84 9C 9B
[   73.324539] EtherCAT DEBUG: 84 11 01 00
[   73.324544] EtherCAT DEBUG 0: UNMATCHED datagram:
[   73.324547] EtherCAT DEBUG: 04 84 01 00 90 09 08 80 00 00 B0 3D 4C 84 9C 9B
[   73.324557] EtherCAT DEBUG: 84 11 01 00
[   73.324562] EtherCAT DEBUG 0: UNMATCHED datagram:
[   73.324565] EtherCAT DEBUG: 0C 85 00 00 00 00 10 80 00 00 00 00 70 FF FF FF
[   73.324575] EtherCAT DEBUG: 50 52 70 FF FF FF 00 00 31 00 03 00
[   73.324584] EtherCAT DEBUG 0: UNMATCHED datagram:
[   73.324587] EtherCAT DEBUG: 07 86 01 00 30 01 02 00 00 00 08 00 01 00
[   73.324838] EtherCAT 0: fsm->slaves_responding[fsm->dev_idx]=1
[   73.324843] EtherCAT 0: 0 slave(s) responding on main device.
[   73.324846] EtherCAT 0: datagram->working_counter=0  <-  In wireshark capture WC is 1
[   73.324850] EtherCAT 0: datagram->state=4
[   73.324853] EtherCAT 0: datagram->device_index=0
[   73.324856] EtherCAT 0: datagram->device_origin=0
[   73.324860] EtherCAT 0: datagram->index=134
[   73.324863] EtherCAT 0: datagram->type=7
[   73.324866] EtherCAT DEBUG 0: Rescanning the bus

This happens due to a timeout. When the EoE thread invokes
master->receive_cb(master->cb_data); which in turn calls
ecrt_master_receive(master); it somehow messes up

master->devices[EC_DEVICE_MAIN].cycles_poll

which leads to a negative time delta in the calculation
master->devices[EC_DEVICE_MAIN].cycles_poll - datagram->cycles_sent.

Attempting to bypass that in the EoE thread, I commented out
master->receive_cb(master->cb_data); and master->send_cb(master->cb_data);
and once I invoke

ifconfig eoe0a1 up

I get an explosion of

[  123.529911] EtherCAT WARNING 0-main-0: Failed to receive mbox check datagram for eoe0a1.
[  123.529918] EtherCAT WARNING 0-main-0: Failed to receive mbox check datagram for eoe0a1.
[  123.529925] EtherCAT WARNING 0-main-0: Failed to receive mbox check datagram for eoe0a1.
[  123.529932] EtherCAT WARNING 0-main-0: Failed to receiv

Re: [etherlab-dev] Problems with Xenomai

2016-09-28 Thread Gavin Lambert
On 29 September 2016 03:07 quoth Christoph Schröder,
> #1.)
> Starting with the tarball release 1.5.2 and encountered a problem with
> ecrt_master_reference_clock_time which led to a segmentation fault. My
> DC config here is basically the same as in the rtai_rtdm_dc example with
> minor fixes since I am not using RTAI. The rest is based on the xenomai
> example. The problem seems to be fixed in the mercurial repo (tested
> 5a70ffc4644b for later tests of the patch queue) and I would like to know
> which commit fixed this issue. Unfortunately I can't find the point where
> the release 1.5.2 was taken from since the changelog messages do not
> correspond to the commit messages and there is no Label for the release.
> 
> This is my debugging output:
> 
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x77fd8700 (LWP 4389)]
> 0x768d53ca in vfprintf () from /lib/x86_64-linux-gnu/libc.so.6
> (gdb) backtrace
> #0  0x768d53ca in vfprintf () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x768daa00 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #2  0x768d553e in vfprintf () from /lib/x86_64-linux-gnu/libc.so.6
> #3  0x768e0188 in fprintf () from /lib/x86_64-linux-gnu/libc.so.6
> #4  0x77bd8944 in ecrt_master_reference_clock_time (
>  master=, time=) at master.c:717

Given that stack trace, and that it works on default but not 1.5.2, then
most likely the commit that worked around the issue for you was
https://sourceforge.net/p/etherlabmaster/code/ci/3affe9cd0b66fe55ef8e8060778ef9461a8204a0.

Having said that, given that the only reason I can think of that this would
segfault is if strerror returned NULL or an invalid pointer, it suggests
that you might have a broken or badly configured libc.  If you're building
the libc yourself, make sure that you're using an up-to-date version and
haven't excluded the strerror text.

Another possibility is that if you were concurrently calling strerror() on
another thread (and your libc doesn't implement strerror in a thread-local
manner) then it could have corrupted the buffer.  Most likely another patch
would be required to resolve this "properly", although one workaround for
this is to avoid calling ecrt_* APIs from more than one thread.
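
For what it's worth, the usual way to sidestep a shared strerror() buffer in
user-space code is strerror_r() with a caller-supplied buffer.  The sketch
below only illustrates that idea and is not a patch for the library;
sketch_errno_text is a made-up name, and it assumes a POSIX system where the
XSI variant of strerror_r (the one returning int) is selected.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* ask for the XSI strerror_r on glibc */
#endif

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Write a description of errnum into the caller's buffer and return it.
 * No shared static buffer is involved, so concurrent callers cannot
 * overwrite each other's text. */
static const char *sketch_errno_text(int errnum, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    /* Assumption: XSI semantics, where a nonzero result means failure.
     * Fall back to a plain number in that case. */
    if (strerror_r(errnum, buf, len) != 0)
        snprintf(buf, len, "error %d", errnum);

    return buf;
}
```

Each thread would pass its own (typically stack-allocated) buffer, e.g.
fprintf(stderr, "...: %s\n", sketch_errno_text(errno, buf, sizeof(buf))).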

Although I suppose since you're linking to RTDM it's possible that
strerror() is coming from there rather than the libc; I'm not exactly sure
how RTAI/Xenomai work.  Or possibly in that context the fprintf(stderr)
itself is failing -- but this isn't new code so I would have thought the
problem would have come up earlier if that were the case.

I'm not sure exactly which commit 1.5.2 is based on, but it will be one of
the ones in the "stable-1.5" branch.  Everything on "default" is newer than
that.

> #2.)
> I did some minor tests with the patch queue and got some bad system
> freezes with the xenomai example. I could locate the patch that seems to
> cause the system freezes:
> 0011-Master-locks-to-avoid-corrupted-datagram-queue.patch
> The only notable thing I could see in the kernel log is that the slaves
> went back to PREOP. The Xenomai task was still running and hanging at some
> point of the cycle (I placed an rt_printf in the cycle which should have
> printed the cycle_counter value every other second).
> The patch series seems to work if I apply the patches up to
> 0010-Sdo-directory-now-only-fetched-on-request.patch. Is this reproducible
> for you?

I'm not sure about this as I don't use Xenomai myself.  That particular
patch was authored by Knud Baastrup, so I've added him to the email chain
directly just in case.  If I recall correctly I think he, like myself, was
using PREEMPT_RT so it's possible that this has not been tested with
Xenomai.

Do you have locking on the Xenomai side as well?  Do you call ecrt APIs from
multiple Xenomai tasks?  I believe the patch assumes that there is no
external locking between tasks, so you might be running into deadlocks
depending on the order in which things happen.

Using Linux locks between Xenomai tasks is probably not ideal, but I would
have expected that it ought to work as this occurs in other places as well.

> #3.)
> In both versions (1.5.2 and repository 5a70ffc4644b) I get a lost frame at
> startup. Is this anything to worry about?
> [Wed Sep 28 15:24:51 2016] EtherCAT 0: Master thread exited.
> [Wed Sep 28 15:24:51 2016] EtherCAT 0: Starting EtherCAT-OP thread.
> [Wed Sep 28 15:24:51 2016] EtherCAT WARNING 0: 1 datagram UNMATCHED!
> [Wed Sep 28 15:24:52 2016] EtherCAT 0: Domain 0: Working counter changed
> to 2/3.
> [Wed Sep 28 15:24:52 2016] EtherCAT 0: Slave states on main device: OP.

I don't think this is anything to worry about; it's probably just that the
idle thread sent a request and then exited before the reply came back; the
reply then sat in the buffers until the OP thread started but it had either
timed out or reset the state machines in the meantime so it was no longer

Re: [etherlab-dev] ethercat master 1.5.1 - EoE & CoE in parallel

2016-08-23 Thread Gavin Lambert
On 24 August 2016 01:32, quoth Nir Geller:
> According to the implementation of the master and posts in etherlab-users
> and etherlab-dev I understand that a mail box protocol demultiplexer isn't
> implemented in ethercat master 1.5.1.
> 
> Is there a patch that implements such a thing on 1.5.1?

I've previously sent you a link to my unofficial patchset.  This includes a
mailbox demultiplexer and all sorts of other new features and other fixes.
You can choose to apply only a subset of patches if you wish, though you
might need to deal with conflicts and fuzz that way.  But the latest version
applies against the default branch, which is newer than 1.5.2.

The patch repository also includes older versions of the patchset, some of
which were intended to apply against 1.5.2.  But the older versions lack the
handy readme file, so you'd have to look at the list archives for more
details.

1.5.1 is sufficiently old now that you shouldn't be bothering with it.  But
if you *really* want to, there's no particular reason why you couldn't
backport some of the patches.  (Though if you go ahead with that, then I
recommend looking at a prior version of the patchset, either from the
repository or the mailing list archives -- there were several patches
dropped because they were incorporated into default, but are still absent
from 1.5.2 and older.)

> Would you recommend to migrate to 1.5.2, or implement such functionality
> on the current code of 1.5.1?

I would recommend migrating to default.  Even 1.5.2 was a long long time
ago.




Re: [etherlab-dev] e1000e link detection issues with kernel 3.14

2016-07-06 Thread Gavin Lambert
On Wednesday, 6 July 2016 22:04, quoth Christoph Permes:
> After checking the e1000e code I noticed that there has been a change in the
> original driver between kernel 3.8 and 3.10 affecting link detection in the
> e1000_watchdog function (the change has been reverted in kernel 3.16).
> With these changes a check for adapter->ecdev got lost in the EtherCAT
> capable drivers starting with kernel 3.10.

I haven't yet looked into your patches but this sounds similar to something 
that I fixed in patch 0024 of my patchset, the latest version of which you can 
find here:  http://lists.etherlab.org/pipermail/etherlab-dev/2016/000553.html 
(original post 
http://lists.etherlab.org/pipermail/etherlab-dev/2015/000475.html).  Perhaps it 
may be of interest.

Although according to my notes this was to fix something changed in 2.6.37 and 
later, where the watchdog is called on the wrong thread (and often not at all). 
 So they might be independent despite being related.




Re: [etherlab-dev] ECAT Mailbox Repeat Request

2016-07-06 Thread Gavin Lambert
On Thursday, 7 July 2016 02:15, quoth Christoph Schroeder:
> EtherCAT slaves support a repeat request for mailbox protocols which is
> initiated by toggling bit 1 in the 'Activate SyncManager Register'
> (0x0806+y*8 where y = number of SyncManager). Is this actually supported by
> the master, especially for VoE? I already looked through the master code
> and couldn't find any reg write to these registers.

Currently no, it has no facility to trigger that.  Most of the mailbox handling 
is fairly crude at present and relies on being able to retry sending the 
original request to get the same (or equivalent more recent) answer.
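
For reference, the register arithmetic described in the question works out as
in the sketch below.  This is only an illustration of the address and bit
calculation as quoted above; the names are made up, and nothing like this
exists in the master at present.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the description quoted above. */
#define SKETCH_SM_ACTIVATE_BASE 0x0806u /* Activate register of SyncManager 0 */
#define SKETCH_SM_STRIDE        8u      /* register block size per SyncManager */
#define SKETCH_SM_REPEAT_REQ    0x02u   /* bit 1: repeat request */

/* Address of the Activate register for SyncManager number sm_index. */
static uint16_t sketch_sm_activate_addr(unsigned int sm_index)
{
    assert(sm_index < 16);
    return (uint16_t)(SKETCH_SM_ACTIVATE_BASE + sm_index * SKETCH_SM_STRIDE);
}

/* New register value with the repeat request bit toggled; all other bits
 * are left as they were. */
static uint8_t sketch_toggle_repeat(uint8_t activate_value)
{
    return (uint8_t)(activate_value ^ SKETCH_SM_REPEAT_REQ);
}
```

An actual implementation would still have to read the current register value,
write the toggled value back, and then wait for the slave to acknowledge it.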

The stable and default branches don't really have good support for mixing 
multiple mailbox types on a single slave either, though there are some patches 
floating around the dev list that improve on this.  None of them add repeat 
support, however.  But they might resolve the issue that prompted you to ask 
about it. :)

I suspect that supporting repeat requests would also require making the master 
properly support mailbox service counting, which is not currently the case 
either.

(In principle, repeat requests are only useful for the case when the mailbox 
fetch datagram from the master successfully reaches the slave but is then lost 
on the way back to the master -- at this point the slave thinks it has 
successfully delivered the mailbox content and will clear it, but the master 
does not receive it, and will typically then either retry the original request 
after a timeout or fail the request and let higher-level code decide whether to 
retry or not.  So one alternate goal is to improve your network robustness so 
that datagrams don't get lost in the first place.)




Re: [etherlab-dev] [PATCH] Default branch patchset re-applied on 5a70ff

2016-06-14 Thread Gavin Lambert
Thanks.

Just as an FYI, I've discovered a memory leak in patch 0042 in more recent
kernel versions, and I decided that it makes more sense to lift the
file-loading part to a separate source file, so I'm in the process of
rewriting patches 0041 and 0042 again.  (Mainly rewriting 0042, but in the
next version I'll fold them together as it will make things more readable.)

I'm still interested in any feedback on their current implementation, but
the new version is going to look much cleaner, I think.  I'm hoping to have
a new version of the patch ready in a few hours, but it might take a little
longer. :)

Regarding patch 0034, two potential improvements that occurred to me after
looking through it:

* A successful SII write should update the SII cache, so that an SII
read will read it back without a rescan.

* It might be useful to have an option to sii_read that does cause
it to read the actual device SII without a full rescan; particularly for
devices that store calibration or other custom settings in the SII.  Or
alternately a way to force a rescan to not use the cached SII.  (Having said
that, it's reasonably easy to clear the cache via a "service ethercat
restart", which should suffice in most cases.)

These go beyond the scope of the original patch, of course; they're just
ideas, and I don't mean to imply any sort of obligation.  Just some thoughts
if you're rewriting it anyway. :)

From: Knud Baastrup [mailto:k...@deif.com] 
Sent: Wednesday, 15 June 2016 00:51
To: Gavin Lambert ; Florian Pose ;
'David Page' 
Cc: etherlab-dev@etherlab.org
Subject: RE: [PATCH] Default branch patchset re-applied on 5a70ff

Gavin, we really appreciate the time you put into reviewing and improving
the patches.

I have done some more testing with focus on the EoE part and in one setup I
have now observed that the EtherCAT-EoE thread can end up in a permanent
uninterruptible sleep. I can reproduce this in both the patch series from
20160502/20160610 and in the newest patch series from 20160613. I will
investigate this further.

A few comments on the patches below:

Patch 0037:

The versioning is a bit tricky and to be safe we have to use version 3.17
where the detect_deadlock argument for sure is dropped. If using the RT
kernel patch, the argument must be dropped from 3.14.34-rt32, but it is not
possible to do a check on the rt32 version part that is just given by a tag.
I will update the patch to use version 3.17 and maybe also allow the
detect_deadlock argument to be dropped via configure.ac so we can support
the Linux Real time versions from 3.14.34-rt32.
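
The version test described here can be sketched as follows. This is a
user-space model only: SKETCH_KERNEL_VERSION mirrors the encoding of the
kernel's KERNEL_VERSION() macro so that the example builds outside a kernel
tree, and sketch_takes_detect_deadlock is a name made up for illustration.

```c
#include <assert.h>

/* Same encoding as the classic KERNEL_VERSION() macro, redefined locally
 * so that this sketch compiles outside the kernel tree. */
#define SKETCH_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Returns 1 if, judging by the version number alone, the kernel is older
 * than 3.17 and rt_mutex_lock_interruptible() is therefore assumed to
 * still take the detect_deadlock argument. */
static int sketch_takes_detect_deadlock(long version_code)
{
    assert(version_code > 0);
    return version_code < SKETCH_KERNEL_VERSION(3, 17, 0);
}

/* In the master sources the same decision would be a preprocessor check,
 * roughly:
 *
 *   #if LINUX_VERSION_CODE < KERNEL_VERSION(3, 17, 0)
 *       ret = rt_mutex_lock_interruptible(&lock, 0);
 *   #else
 *       ret = rt_mutex_lock_interruptible(&lock);
 *   #endif
 */
```

Note that 3.14.34-rt32 still reports a version code below 3.17, so this test
alone gives the wrong answer there; that is the case a configure-time
override would have to cover.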

 

Patch 0032:

Yes, now I recall that we discussed this about a year ago and it is actually
the commit message that is a bit wrong as we do actually enter INIT + ERROR
after the Master has requested PREOP. I will update the commit message to
make this a bit more clear.

Patch 0034:

Yes, I should have known that this is of course unnecessary for the read part
(as I have previously worked with the SII cache part). I will update the
patch and remove this part again.

Thanks,

Knud

From: Gavin Lambert [mailto:gav...@compacsort.com] 
Sent: 13. juni 2016 08:34
To: Knud Baastrup; Florian Pose; 'David Page'
Cc: etherlab-dev@etherlab.org
Subject: RE: [PATCH] Default branch patchset re-applied on 5a70ff

Ok, I've looked through the new patches now.  Attached is my refresh of
them.  Mostly it's just inclusion of a series file and cleaning up the
commit messages to be HG-safe (HG doesn't quite like some of the things that
git adds, such as diffstats and a few other artefacts).  Other changes
include:

* Adopted the filenames from Knud's set.

* Replaced
0037-Breaking-change-rt_mutx_lock_interruptible-calls-for.patch with a
version that isn't a breaking change (assuming the version numbers in the
commit message were correct; I haven't verified this).  This could probably
be folded into patch 0004, but I left it separate for clarity.

* Added 0040-rescan-check-revision.patch.  This modifies patch 0013
in four ways:

1.   The SII cache-and-reuse behaviour can be disabled via
--disable-sii-cache at configure time, rather than requiring modifying a
header file.

2.   The revision number is also verified before using the cached
version (this resolves some issues when the device firmware is upgraded).

3.   If both the alias and serial are read as 0, it will no longer
bother reading the vendor/product/revision, as it is now known that the SII
is not in the cache.

4.   Several similar states are consolidated into one.

* Added 0041-load-sii-from-file.patch, from Graeme Foot's recent
patch; but I've made the following modifications:

1.   The functionality is disabled by default.

2.   At configure time, you can use -ena

Re: [etherlab-dev] ESL protocol

2016-04-13 Thread Gavin Lambert
On Thursday, 14 April 2016 04:29, quoth Matthieu Bec:
> I was wondering if anyone looked at implementing the Ethercat Switch Link
> protocol for devices like BH-CU2508 ?
> It does not look terribly complicated conceptually - but probably quite
> involved to implement in the kernel, setup virtual network interfaces, etc.

I looked at it briefly (as a "would be nice", since it's a neat little toy),
and yes, the protocol format itself is fairly straightforward.

The problem is that as far as I can tell it requires foreknowledge of the
network configuration rather than auto-detection and would need some sort of
virtual layer between the "real" NIC and the master stack, both of which
seem like major architectural changes.

These probably aren't insurmountable but at the time it was enough for me to
file it in the "too hard" basket.  I'm not an "official" dev though, just
someone who has submitted a few patches, and I could be wrong about that.
If it's something you want, you could always try adding support yourself.




Re: [etherlab-dev] Possible Realtime Issues with Ethercat Master and RT Preempt Kernel

2016-02-03 Thread Gavin Lambert
Well, I guess that would work too, but I was thinking of a different 
arrangement.

I have the "real" output values stored in scattered memory locations (in an 
object graph related to their functions; not structured like the domain memory 
at all) and then the cyclic task uses EC_WRITE_* to copy the individual values 
from the objects to the domain memory.

It's not really any different from having a secondary cache that can be 
memcpy'd, I guess, but it "feels" like less copying.  (Well, I suppose 
technically it might be slightly slower when doing the actual copy, but 
conversely it'd be faster at doing the calculations, so I think that's a wash.)

OTOH I'm not controlling precision motors, so calculation latency probably 
doesn't bother me as much as it does some others. :) 

> -Original Message-
> From: Graeme Foot [mailto:graeme.f...@touchcut.com]
> Sent: Thursday, 4 February 2016 12:21
> To: Gavin Lambert ; 'Tillman, Scott'
> ; Dr.-Ing. Matthias Schöpfer
> ; etherlab-dev@etherlab.org
> Subject: RE: [etherlab-dev] Possible Realtime Issues with Ethercat Master and
> RT Preempt Kernel
> 
> Hi,
> 
> Yes, the EC_WRITE_* macros should still be used when writing to the cached
> write memory, but then a straight memcpy from the cache to the domain
> memory is fine.
> 
> Graeme.
> 
> 

Re: [etherlab-dev] Possible Realtime Issues with Ethercat Master and RT Preempt Kernel

2016-02-03 Thread Gavin Lambert
On 3 February 2016 21:02, quoth Tillman, Scott:
> Since you brought up the typical process cycle: I have been using a process
> similar to the second one you describe.  I was very surprised when I was
> doing my initial development that the output frame and the return frame were
> overlaid, requiring double buffering of the output data.  It seems like you
> should be able to configure the domain to place the return data in a separate
> (possibly neighboring) memory area.  As it is the double buffering is the
> same idea, but causes an extra memcpy just prior to sending the domain data.

The expectation is that you'll use the EC_WRITE_* macros to insert values into 
the domain memory; this takes care of byte-swapping to little-endian for you if 
you happen to be running on a big-endian machine.  You can usually only get 
away with a blanket memcpy if you know your master code will only ever run on 
little-endian machines.
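
To show what "byte-swapping to little-endian" means for the domain memory,
here is a portable sketch of a 16-bit store and load.  This is not how the
EC_WRITE_* / EC_READ_* macros are implemented; the sketch_* functions are
made-up names that only demonstrate the byte order that ends up in the
process data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 16-bit value at dst in little-endian byte order, whatever the
 * byte order of the host CPU is. */
static void sketch_write_u16_le(uint8_t *dst, uint16_t value)
{
    assert(dst != NULL);
    dst[0] = (uint8_t)(value & 0xFF); /* low byte first */
    dst[1] = (uint8_t)(value >> 8);   /* high byte second */
}

/* Read a 16-bit little-endian value back from src. */
static uint16_t sketch_read_u16_le(const uint8_t *src)
{
    assert(src != NULL);
    return (uint16_t)(src[0] | ((uint16_t)src[1] << 8));
}
```

In application code you would simply write EC_WRITE_U16(domain_pd + offset,
value) and let the macro worry about this.  A blanket memcpy of a host-order
struct only produces the same bytes on a little-endian host.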

> More problematic is the absence of any way to block (in user-space) waiting
> for the domain's return packet.  As it is I am setting up my clock at 0.5ms
> to handle a 1ms frame time:
[...]
> Are these two things there somewhere and I've just missed them, or is there a
> good reason they haven't been implemented?  It seems like these two items
> would minimize the overhead and maximize the processing time available for
> most applications.

There isn't really a way to do that; it's a fundamental design choice of the 
master.  The EtherCAT-custom drivers disable interrupts and operate purely in 
polled mode in order to reduce the latency of handling an interrupt and 
subsequent context-switching to a kernel thread and then a user thread.  What 
gets sacrificed along the way is any ability to wake up a thread when the 
packet arrives, since nothing actually knows that the packet has arrived until 
polled.

To put it another way, when the datagram arrives back from the slaves, it just 
sits in the network card's hardware buffer until the buffer read is triggered 
by an explicit call to ec_master_receive().

The generic drivers have interrupts enabled (so the packets will be immediately 
read out of the hardware buffer into a kernel buffer) but the master still 
treats it as a polled device and won't react until explicitly asked to receive.

With some patches (such that ec_master_receive will tell you if it has received 
all the datagrams back, or similar) you could call this repeatedly (perhaps 
with short sleeps) shortly after sending the datagrams to detect as soon as 
they're back again, but obviously this will increase the processor load and 
give the system less time to do non-realtime things.  If you have some idle 
cores then this may not be a problem, however, and the quicker reaction may be 
worth it.
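
To make the idea concrete, a bounded polling loop might look roughly like the sketch below.  Everything in it is hypothetical: the stock receive call does not report whether all datagrams are back, so the poll_fn callback stands in for a patched variant, and demo_poll is just a stand-in for the hardware.

```c
#include <stdbool.h>

/* Hypothetical callback type: polls the link once and returns true once
 * every outstanding datagram has come back.  The stock master does not
 * offer this; it stands in for a patched receive call. */
typedef bool (*poll_fn)(void *ctx);

/* Poll up to max_polls times.  Returns the number of polls it took, or 0
 * if the datagrams never came back within the budget. */
unsigned poll_until_received(poll_fn poll, void *ctx, unsigned max_polls)
{
    unsigned i;

    for (i = 1; i <= max_polls; i++) {
        if (poll(ctx))
            return i;
        /* A short sleep or busy-wait would go here; it is omitted to
         * keep the sketch portable. */
    }
    return 0;
}

/* Demo stand-in for the hardware: the frame "arrives" once the counter
 * that ctx points to has been decremented to zero. */
bool demo_poll(void *ctx)
{
    unsigned *remaining = ctx;

    if (*remaining == 0)
        return true;
    (*remaining)--;
    return false;
}
```

The poll budget is what keeps this from eating the whole cycle if a frame is lost; how tight you make it is the trade-off against processor load mentioned above.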

Having said that, as long as your calculation time is fairly constant, it's 
probably better to use the "classic" cycle structure than to do this -- the 
exact same input values will be read either way, as they're captured at the 
"input latch time" of the slave, which is typically either just after the last 
or in anticipation of the next datagram exchange.


___
etherlab-dev mailing list
etherlab-dev@etherlab.org
http://lists.etherlab.org/mailman/listinfo/etherlab-dev


[etherlab-dev] CoE FSM dictionary entry requests

2015-08-09 Thread Gavin Lambert
Hi,

I was looking through the CoE code recently (specifically trying to decide
whether to implement ETG1004 (unit specification) in a slave -- given that
neither Etherlab nor the SSC appears to support it though, I'm leaning
towards not) and noticed something a little odd.

In fsm_coe.c's ec_fsm_coe_dict_prepare_entry, the value info byte is
specified as 0x01 for "access rights only".

According to ETG1000.6 5.6.3.6, that bit is "reserved" and the access rights
are supplied in the response regardless.  So shouldn't this be 0x00 instead
of 0x01?  Or is there some historic reason for this that I'm unaware of?

Regards,
Gavin Lambert




Re: [etherlab-dev] [PATCH] My patchset roundup July 2015

2015-08-09 Thread Gavin Lambert
I'm not sure that's correct either.  Generally the error indication is to
provide information about a refusal to enter a higher state, and can be
acknowledged.  So the "correct" flow would be:

1. Slave is in BOOT and has received invalid firmware; if possible, signal
an FoE error or log a Diagnostics message at this time.
2. Master requests INIT.
3. Slave acknowledges INIT with no error.
4. Master requests PREOP.
5. Slave refuses PREOP by replying with INIT + ERROR.
6. Master acknowledges ERROR.
7. Slave returns to INIT (clears ERROR).
8. Repeat 4-7 if master re-requests PREOP.

Again, have a look at ETG1000.6 6.4.1.4.
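
For what it's worth, the flow above can be sketched as a toy slave model.  This is only an illustration of the sequence, not code from any real slave stack: the struct and function names are invented, and the handling of the status code on acknowledge is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* AL states used in this flow (AL control/status register encoding). */
enum al_state { AL_INIT = 0x01, AL_PREOP = 0x02, AL_BOOT = 0x03 };

#define AL_STATUS_NO_VALID_FIRMWARE 0x0014

/* Hypothetical, heavily simplified slave model. */
struct slave_model {
    enum al_state state;
    bool error;            /* error indication flag */
    uint16_t status_code;  /* last AL status code reported */
    bool firmware_valid;
};

/* Master requests a state (steps 2 and 4). */
void slave_request(struct slave_model *s, enum al_state requested)
{
    if (requested == AL_PREOP && !s->firmware_valid) {
        /* Step 5: refuse PREOP by reporting INIT + ERROR. */
        s->state = AL_INIT;
        s->error = true;
        s->status_code = AL_STATUS_NO_VALID_FIRMWARE;
        return;
    }
    /* Step 3: other requests in this toy model are simply accepted. */
    s->state = requested;
}

/* Steps 6 and 7: master acknowledges, slave clears the error flag. */
void slave_ack_error(struct slave_model *s)
{
    s->error = false;
}
```

Walking it through: BOOT to INIT succeeds without error, a PREOP request leaves the model in INIT with the error flag set and status 0x0014, and the acknowledge clears the flag again so that the request can be repeated.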
            
> -Original Message-
> From: Knud Baastrup [mailto:k...@deif.com]
> Sent: Thursday, 6 August 2015 20:26
> To: Gavin Lambert 
> Cc: etherlab-dev@etherlab.org
> Subject: RE: [etherlab-dev] [PATCH] My patchset roundup July 2015
> 
> Hi Gavin,
> 
> We do enter state INIT, but with the Error indication flag set (due to
> invalid firmware) and that is why I wrote "prevented to enter state INIT",
> which I agree could be misunderstood.
> 
> BR, Knud
> 
> 
> -Original Message-
> From: Gavin Lambert [mailto:gav...@compacsort.com]
> Sent: 6. august 2015 09:42
> To: Knud Baastrup
> Cc: etherlab-dev@etherlab.org
> Subject: RE: [etherlab-dev] [PATCH] My patchset roundup July 2015
> 
> Hi Knud,
> 
> Thanks for sharing the updates.
> 
> Regarding the description of the update for patch 0008, though, this puzzles
> me a little.
> 
> According to ETG1000.5 (specifically 6.2.1.3.8 "Stop Bootstrap Mode"
> service, and also ETG1000.6 6.4.1.4 #51 and #57), a slave is not permitted
> to refuse the transition from BOOT to INIT (this generally applies to all of
> the other "upwards" transitions as well).
> 
> Of course the master still needs to be able to cope with it anyway (a
> crashed slave may fail to respond to all transitions until power cycled,
> although this is not the same as a refusal), but it's not something that's
> supposed to happen.
> 
> I assume this is one of your own slaves?  My reading of the standards
> suggests that in the case of invalid firmware a more appropriate response
> would be to allow the transition to INIT (and back to BOOT), but refuse
> transition to PREOP with AL status 0x0014 ("no valid firmware").
> 
> Having said that, there's very little information in the standards (at least
> that I've found) about how firmware updates are recommended to work
> (presumably because this is very hardware-specific), so it wouldn't hurt to
> ask about it on the ETG forums, if you haven't already.
> 
> Regards,
> Gavin Lambert
> 
> > -Original Message-
> > From: Knud Baastrup [mailto:k...@deif.com]
> > Sent: Thursday, 6 August 2015 19:05
> > To: Gavin Lambert
> > Cc: etherlab-dev@etherlab.org
> > Subject: RE: [etherlab-dev] [PATCH] My patchset roundup July 2015
> >
> > Hi Gavin,
> >
> > I look forward to try out the complete access support that you have
> > implemented. We have several cases where we believe this feature can
> > be very useful.
> >
> >
> > I have done a few updates on some of the knud-xxx patches as well,
> > which includes:
> >
> > Correction to 0003-Eoe-mac-address-now-derived-from-unique-mac.patch:
> > Added new line to EC_SLAVE_INFO
> >
> > Correction to 0008-Clear-slave-mailboxes-after-a-re-scan.patch:
> > Now assuming that the slave's mailbox data is valid even if the slave
> > scanning skipped the clear mailbox state, e.g. if the slave refused to
> > enter state INIT.  This corrects an introduced bug where invalid
> > application firmware (that prevented the slave module to enter state INIT)
> > could prevent mailbox communication (FoE) in state BOOT.
> >
> > New 0017-EoE-processing-is-now-only-allowed-in-state-PREOP-SA.patch
> > The patch ensures that EoE fragments are only forwarded in states PREOP,
> > SAFEOP and OP and not in state BOOT, which is mainly (only?) intended
> > for bootstrapping using FoE.  The patch corrects an issue where EoE
> > fragments can interrupt a successful FoE firmware upgrade in state
> > BOOT even though the slave code is restricted to only use FoE in
> > state BOOT.  In our case we use a common bootloader for all I/O modules
> > that only supports FoE.  The bootloader will however still use the mailbox
> > buffer to receive EoE fragments (and additionally to inform the master
> > that EoE is not supported), which in some cases can prevent the execution
> > of an FoE request.
> >
> > Mvh. Knud
> &

Re: [etherlab-dev] [PATCH] My patchset roundup July 2015

2015-08-06 Thread Gavin Lambert
Hi Knud,

Thanks for sharing the updates.

Regarding the description of the update for patch 0008, though, this puzzles
me a little.

According to ETG1000.5 (specifically 6.2.1.3.8 "Stop Bootstrap Mode"
service, and also ETG1000.6 6.4.1.4 #51 and #57), a slave is not permitted
to refuse the transition from BOOT to INIT (this generally applies to all of
the other "upwards" transitions as well).

Of course the master still needs to be able to cope with it anyway (a
crashed slave may fail to respond to all transitions until power cycled,
although this is not the same as a refusal), but it's not something that's
supposed to happen.

I assume this is one of your own slaves?  My reading of the standards
suggests that in the case of invalid firmware a more appropriate response
would be to allow the transition to INIT (and back to BOOT), but refuse
transition to PREOP with AL status 0x0014 ("no valid firmware").

Having said that, there's very little information in the standards (at least
that I've found) about how firmware updates are recommended to work
(presumably because this is very hardware-specific), so it wouldn't hurt to
ask about it on the ETG forums, if you haven't already.

Regards,
Gavin Lambert
            
> -Original Message-
> From: Knud Baastrup [mailto:k...@deif.com]
> Sent: Thursday, 6 August 2015 19:05
> To: Gavin Lambert
> Cc: etherlab-dev@etherlab.org
> Subject: RE: [etherlab-dev] [PATCH] My patchset roundup July 2015
> 
> Hi Gavin,
> 
> I look forward to try out the complete access support that you have
> implemented. We have several cases where we believe this feature can be
> very useful.
> 
> 
> I have done a few updates on some of the knud-xxx patches as well,
> which includes:
> 
> Correction to 0003-Eoe-mac-address-now-derived-from-unique-mac.patch:
> Added new line to EC_SLAVE_INFO
> 
> Correction to 0008-Clear-slave-mailboxes-after-a-re-scan.patch:
> Now assuming that the slave's mailbox data is valid even if the slave
> scanning skipped the clear mailbox state, e.g. if the slave refused to enter
> state INIT.  This corrects an introduced bug where invalid application
> firmware (that prevented the slave module to enter state INIT) could prevent
> mailbox communication (FoE) in state BOOT.
> 
> New 0017-EoE-processing-is-now-only-allowed-in-state-PREOP-SA.patch
> The patch ensures that EoE fragments are only forwarded in states PREOP,
> SAFEOP and OP and not in state BOOT, which is mainly (only?) intended for
> bootstrapping using FoE.  The patch corrects an issue where EoE fragments
> can interrupt a successful FoE firmware upgrade in state BOOT even though
> the slave code is restricted to only use FoE in state BOOT.  In our case we
> use a common bootloader for all I/O modules that only supports FoE.  The
> bootloader will however still use the mailbox buffer to receive EoE
> fragments (and additionally to inform the master that EoE is not
> supported), which in some cases can prevent the execution of an FoE request.
> 
> Mvh. Knud
> 
> 
> -Original Message-
> From: etherlab-dev [mailto:etherlab-dev-boun...@etherlab.org] On Behalf
> Of Gavin Lambert
> Sent: 10. juli 2015 09:14
> To: etherlab-dev@etherlab.org; Florian Pose
> Subject: [etherlab-dev] [PATCH] My patchset roundup July 2015
> 
> Hi all,
> 
> It's been a while since I last posted my current patchset for Etherlab, so
> it seemed like a good time to share the newest ones.
> 
> As with the last set, this is based on stable-1.5 (specifically
> 4b0b906df1b40a1b5610282117b2c22581890575) and contains both my own
> patches and patches from others in the community, and I'm sharing them in
> case people find them useful and in the hope that they'll make it into
> mainline.
> 
> Having said that, it appears that some of the patches in here have made it
> into the default branch already; I haven't had time yet to inspect them and
> rebase my patchset onto that branch, but it's on my TODO list. :)
> 
> The series file included in the archive shows the intended application order
> (though you'll probably want to skip some of them if you're on default
> already).
> 
> Short descriptions (commit notes) are at the top of each patch; detailed
> descriptions for the older patches can be found in previous emails, but I'll
> give a bit more background on the newer patches below.  The older patches
> are unchanged except for possible defuzzing.
> 
> Before that though, just a reminder that patch 0006 "dc_sync_vs_sys_time"
> is only a partial fix; it works for slaves that have AssignActivate bits
> 0x3000 set, and that don't mind the offset between SM events and SYNC events
> changing, but won'

Re: [etherlab-dev] Beckhoff AX5000 SoE drive: The PDI has no access rights to the ESC eeprom.

2015-07-12 Thread Gavin Lambert
On 11 July 2015 12:38, quoth Nuno Gonçalves:
> I'm running etherlab master default branch and it is making the Beckhoff
> AX5000 SoE drive unhappy.
> 
> When changing state to PREOP the drive complains with the following error:
> 
> External Periphery - Control card: Reading the ESC-eeprom failed: The PDI
> has no access rights to the ESC eeprom.
> 
> This happens before any IDN and DC configuration are due (transition to
> SAFEOP).
> 
> Trusting the error message, what kind of ESC-eeprom reading are we
> requesting on the transition to PREOP that can be causing this issue?
> 
> I can try to Wireshark this and TwinCAT initialization to compare both, and
> see if I can spot the issue, but I don't think that will work very easily...

It's not Etherlab trying to read the EEPROM, it's the slave itself trying
to.  ("PDI" is the slave's interface to its own ESC.)

By default, Etherlab reserves SII (EEPROM) access to ECAT only; the PDI can
only access its EEPROM before it brings the ESC out of reset.

Try passing "--enable-sii-assign" to configure; this tells Etherlab to grant
the PDI access to its EEPROM at specific times, including during PREOP.

At the moment the implementation of this is not standards-compliant, but it
should hopefully be sufficient for most cases; so it should resolve your
issue.




Re: [etherlab-dev] [PATCH] My patchset roundup July 2015

2015-07-12 Thread Gavin Lambert
On 13 July 2015 10:20, quoth Graeme Foot:
> Re: gavinl-2004-sdo_write_size.
> 
> What I have been doing to allow one SDO Request object for all data sizes is
> to call ecrt_sdo_request_read() every time before calling
> ecrt_sdo_request_write().  This sets the size specific to the object you are
> about to write at the expense of extra time and processing.  It does however
> ensure the size is matched correctly to the object you are about to write to.

Sure, that would work too, given the way that reading changes the next write
size.  And it's true that most of the time I do a read first in any case to
check the current value (which may affect whether I do the write at all).

But writing without a prior read still seems like something you should be
able to do, and the read changing the write size is technically an
implementation bug that I wouldn't want to rely on.

> Also as you say, the SDO Request must be created with the largest expected
> data size.  The largest I have come across so far is a STRING[16] (16 bytes).
> Has anyone seen larger?

Strings can be longer than that of course, but the other time that you may
get larger values is when using Complete Access (via the other patches).
The motivating example was reading an entire array at once, which was about
40 bytes.




Re: [etherlab-dev] EC_FIND_SLAVE when alias != 0 is incorrect

2015-04-27 Thread Gavin Lambert
No, the original code is correct.

EtherCAT slaves can be addressed in one of two ways:

1. 0, abs_ring_pos => where abs_ring_pos is the absolute position in the 
   virtual ring (0 is the first slave).

2. alias, rel_ring_pos => where alias is the alias of some slave and 
   rel_ring_pos is the *relative* offset from that slave (0 is the slave with 
   that alias, 1 is the first slave after that one, etc).

The change you posted below breaks both of these.  If it is “working” for you, 
then it is only by coincidence as a result of you passing incorrect parameters.
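
To illustrate the two addressing modes, here is a small standalone sketch of the lookup.  It is not the actual EC_FIND_SLAVE macro; the struct and the function name find_slave are simplified stand-ins invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the master's slave structure. */
struct slave {
    uint16_t alias;
};

/* Returns the slave addressed by (alias, position), or NULL if none.
 * alias == 0: position is the absolute ring position.
 * alias != 0: position is relative to the first slave with that alias. */
const struct slave *find_slave(const struct slave *slaves, size_t count,
                               uint16_t alias, uint16_t position)
{
    size_t base = 0;

    if (alias != 0) {
        while (base < count && slaves[base].alias != alias)
            base++;
        if (base == count)
            return NULL; /* no slave carries this alias */
    }

    /* The offset is applied in both cases; an alias lookup that skipped
     * this step would ignore the position entirely. */
    if (position >= count - base)
        return NULL;
    return &slaves[base + position];
}
```

With a four-slave ring where only the second slave has alias 7, (0, 2) and (7, 1) both resolve to the third slave, and (7, 0) resolves to the second.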

 

From: etherlab-dev [mailto:etherlab-dev-boun...@etherlab.org] On Behalf Of Raz
Sent: Monday, 27 April 2015 21:02
To: etherlab-dev@etherlab.org
Subject: [etherlab-dev] EC_FIND_SLAVE when alias != 0 is incorrect

 

when alias is used EC_FIND_SLAVE fails to find slave at position greater than 0


From 1349cabe197e84208a196d9186111551f24e25f5 Mon Sep 17 00:00:00 2001
From: raz <razi...@gmail.com>
Date: Mon, 27 Apr 2015 11:51:23 +0300
Subject: [PATCH] alias search incorrect

---
 drivers/misc/ethercat_master/master/master.c | 6 +++---

diff --git a/drivers/misc/ethercat_master/master/master.c b/drivers/misc/ethercat_master/master/master.c
index 34cc1b5..d9ec5b3 100644
--- a/drivers/misc/ethercat_master/master/master.c
+++ b/drivers/misc/ethercat_master/master/master.c
@@ -1569,9 +1569,9 @@ void ec_master_attach_slave_configs(
 } \
 if (slave == master->slaves + master->slave_count) \
 return NULL; \
-} \
-\
-slave += position; \
+} else{ \
+   slave += position; \
+   } \
 if (slave < master->slaves + master->slave_count) { \
 return slave; \
 } else { \
-- 
1.9.1


-- 

https://sites.google.com/site/ironspeedlinux/



Re: [etherlab-dev] [PATCH] A whole lotta patchin' goin' on

2015-03-17 Thread Gavin Lambert
On 13 March 2015 16:12, quoth I:
> gavinl-1011-e1000e_watchdog:
>   This resolves an issue with the e1000e driver that I mentioned earlier --
> when wired for cable redundancy, the second port didn't establish link until
> the network broke, causing an unacceptable delay in failover to the
> redundant connection.  Turns out the problem was that the port watchdog has
> the job of detecting link up/down and the watchdog was not run if the port
> was receiving packets, even if it didn't think it had a link.  (With
> redundant wiring, it would transmit on the main link and receive back on the
> backup link, resetting the backup link's watchdog each time so that it never
> ran.)  This patch removes the reset of watchdog on receive, so that the
> watchdog runs every 2 seconds regardless.
>   I haven't checked the other network drivers to see if they're similarly
> afflicted.

After a bit more testing, I need to revise this patch.  It causes ~450us of
extra delay inside ecrt_master_receive whenever the 2 second timer hits,
which I think we can all agree is a bad thing.

On looking closer at the older kernel versions, I noticed that in 2.6.35
and earlier the watchdog task was scheduled to a kernel worker thread,
while in 2.6.37 and later it was changed to run directly on the
master application thread.  Does anyone recall what the reason for this
change was, or whether it was accidental?  It seems to have happened in
commit c350fc89afd7ac6bb64b706bbc333df5e53e3d2f.

(Note that prior to this patch on all versions it would simply never execute
the watchdog task as long as it was receiving packets, meaning that the
stats calculations and other housekeeping tasks that seem to be part of this
don't get performed; I'm not familiar enough with the driver/hardware
internals to know whether this is a good thing or not.  Given the cyclic
nature of EtherCAT, there is rarely a time that ports stop receiving
packets.)

In the revised patch (attached), I've chosen to continue running the
watchdog every 2 seconds even if RX happens (which fixes redundancy) but
I've moved the watchdog work back to the worker thread (on 2.6.37+) to avoid
holding up ecrt_master_receive.  There is a slight race with the timer reset
as a result (it doesn't take the time required to run the watchdog task into
account) but as this is 2 seconds vs. ~500us that seems reasonably safe --
and it's what happened in the older kernel versions as well.

I did consider an alternate patch which still avoids calling the watchdog if
the port is receiving data, but I'm not convinced there's value in avoiding
the "link_up" work in the watchdog task, especially when it's being done on
a worker thread.  Perhaps someone more familiar with this could enlighten
me?



gavinl-1011-e1000e_watchdog.patch
Description: Binary data


Re: [etherlab-dev] [PATCH] A whole lotta patchin' goin' on

2015-03-12 Thread Gavin Lambert
Attached two more patches, one of which is the missing one from the prior
message (despite the numbers, they can basically be applied in any order,
but that's the minimum-fuzz locations).

gavinl-1005-quick_op_watchdog:
  When some kind of comms error occurs and a slave watchdogs back to
SAFEOP+ERROR, this patch will detect that the error case was specifically
due to a watchdog timeout and will transition it straight back to OP instead
of going all the way back to PREOP and fully reconfiguring it.  As a side
benefit it also caches the last AL status code from the slave, which could
potentially be made available to the application (although this patch does
not do so).

gavinl-1011-e1000e_watchdog:
  This resolves an issue with the e1000e driver that I mentioned earlier --
when wired for cable redundancy, the second port didn't establish link until
the network broke, causing an unacceptable delay in failover to the
redundant connection.  Turns out the problem was that the port watchdog has
the job of detecting link up/down and the watchdog was not run if the port
was receiving packets, even if it didn't think it had a link.  (With
redundant wiring, it would transmit on the main link and receive back on the
backup link, resetting the backup link's watchdog each time so that it never
ran.)  This patch removes the reset of watchdog on receive, so that the
watchdog runs every 2 seconds regardless.
  I haven't checked the other network drivers to see if they're similarly
afflicted.


On 11 March 2015 13:45, I quoth:
> I've attached my current patch series from stable-1.5.  This fixes various
> issues that I've encountered while using the master library.  I've posted
> some of these before but it seemed best to repost the full bundle.
> 
> Note that this bundle includes a few select patches from Frank Heckenbach's
> and Knud Baastrup's previously posted patch series.  There are a few
> included patches that I probably haven't tested very thoroughly (notably
> anything to do with EoE, since I don't use that myself), and conversely
> there are probably patches that I haven't included that are perfectly good
> and should get merged to mainline.  This particular bundle mostly uses
> Knud's patches for mailbox queuing and out-of-order replies, since they seem
> "simpler" -- I have an alternate bundle that uses Frank's patches if you'd
> prefer to see that.
> 
> The bundle is formatted as an HG MQ patch queue (which also works with quilt
> and other similar utilities), so the series file defines the intended
> application order.  But the short version is that there are a few
> low-numbered patches to be applied first, followed by Frank & Knud's
> patches, then the high-numbered gavinl patches at the end.
> 
> Each patch has a brief description at the top (intended as a commit
> message), but I'll go through them all here as well:
> 
> gavinl-0001-deactivate_unmap:
>   Deactivating the master from userspace did not release its process data
> memory mapping -- only releasing it did.  This means that if a master is
> deactivated, reconfigured, and reactivated, it becomes impossible to
> "really" release without terminating the application, due to a dangling
> handle use.  This patch moves the deallocation to where it belongs.
> 
> gavinl-0002-dc_refclk_not_op:
>   The reference clock defaults to the first slave on the network.  If this
> is not actually included in the application config, then it may fail to
> transition to OP when requested, causing syslog spam.  Since I can't think
> of any reason why a refclock would not function correctly when left in
> PREOP, this patch changes that default behaviour -- however just in case
> there's some weird slave out there that needs this, it can be enabled again
> via configure.
> 
> gavinl-0003-refclk_nxio:
>   When using ecrt_master_reference_clock_time in a loop (eg. as part of
> synchronising the master to the refclock rather than the reverse), you're
> very likely to call it before a reference clock has been selected.  When
> you're doing this from a userspace app, this generates a lot of pointless
> stderr spam.  This patch removes this, as any app calling this should be
> prepared to deal with failure in a sensible way anyway.  (Arguably all the
> other stderr prints in this file should be made optional as well, but that's
> a separate issue.)
> 
> gavinl-0004-abort_slave_config_reg_requests:
>   If a slave_config register request is in progress and the corresponding
> slave goes offline, the request is aborted but left permanently in BUSY
> state.  This patch marks the request with FAILURE/ERROR in this case.
> 
> gavinl-0005-abort_detached_requests:
>   Similarly slave_config SDO and register requests that are queued for
> slaves that are offline will never be started and will simply remain BUSY
> forever from the perspective of the application.  This patch aborts (and
> marks as FAILURE/ERROR) these requests.
> 
> gavinl-0006-dc_sync_vs_sys_time:
>   When adjusting the 

Re: [etherlab-dev] Multiple mailbox protocols and other issues

2015-03-02 Thread Gavin Lambert
On 3 March 2015 01:49, quoth Knud Baastrup:
> Thanks, attached updated patches. See inline comments.

Looks good, as far as I can see. :)

Although speaking of syslog spam, I'm getting quite a lot of "Busy -
processing internal SDO request!" now.

>> Although speaking of the EC_REGALIAS code, if that's enabled and if the
>> register 0x0012 alias is different from the SII alias, then this patch
>> might malfunction (it should probably skip reading the SII alias and go
>> straight for the register).  Having said that, normally the two should be
>> the same, unless someone is in the process of changing the alias (in which
>> case rebooting the slave afterwards should "fix" everything).  There might
>> be some odd slaves out there though, which could be why EC_REGALIAS was
>> added in the first place..?
>
> The patch should not be malfunctioning, but yes if alias (or a serial
> number) is updated after a re-scan the stored sii_image cannot be matched in
> the coming re-scan and a new sii_image will be created for that particular
> module.

The issue would be if some slave always had some wrong value in its SII but
loaded some other value to register 0x0012 on startup (eg. from onboard
dipswitches).  This is not as unlikely as it sounds as it can be quite
awkward for the slave to access its own SII, especially with the default
Etherlab configuration (EC_SII_ASSIGN is not defined by default, and I'm
fairly sure it's not implemented correctly anyway).  This could either work
by coincidence (if the "wrong" value was still unique), or cause either a
cache miss or in the worst case a hit on the wrong data (if that alias value
is shared with another slave).

Fortunately the standards require that in this case the "wrong SII value"
must be zero, which would just make your patch ignore the alias instead of
getting an invalid cache hit, but it's always possible there's some slave
that violates this.  (Also the standard says that in case they're both
non-zero it doesn't need to signal an error until the INIT->PREOP
transition, and the scan may occur before this.)

So I was thinking that in the EC_REGALIAS case your patch should just read
register 0x0012 sooner instead of reading the SII alias at first and then
reading 0x0012 later (but not using the latter for the SII lookup).  It'd
save several network cycles too.

Having said that, I don't know how common use of EC_REGALIAS is (I don't use
it myself).  Maybe it doesn't really matter.




Re: [etherlab-dev] Multiple mailbox protocols and other issues

2015-03-01 Thread Gavin Lambert
On 27 February 2015 21:44, quoth Knud Baastrup:
> 
> Added one additional patch
> (0015-Internal-SDO-requests-now-synchronized-with-external.patch) that
> solves an issue with input/output errors when executing the ethercat sdos
> command (that now fetch directory) while configured SDO requests are
> executed from user application. Can also be observed with ethercat
> upload/download from EtherCAT Tool together with the execution of
> configured SDO requests. See the documentation in the patch itself for
> more information.

I just noticed that patch "17_remove_prints_to_avoid_syslog_spam.patch" from
the 02022015 patch series appears to have vanished from later series.  I
assume this was intentional, as I don't recall seeing the spam it referred
to, but I thought I'd mention it just in case.

Also, some compiler warnings are still present from patch 0013:

master/fsm_slave_scan.c: In function 'ec_fsm_slave_scan_enter_attach_sii':
master/fsm_slave_scan.c:494:17: warning: format '%zu' expects argument of type 'size_t', but argument 5 has type 'int' [-Wformat=]
 EC_SLAVE_DBG(slave, 1, "Slave can re-use SII image data stored."
 ^
master/fsm_slave_scan.c:502:17: warning: format '%zu' expects argument of type 'size_t', but argument 5 has type 'uint32_t' [-Wformat=]
 EC_SLAVE_DBG(slave, 1, "Slave can re-use SII image data stored."
 ^
master/fsm_slave_scan.c:502:17: warning: format '%zu' expects argument of type 'size_t', but argument 6 has type 'uint32_t' [-Wformat=]
master/fsm_slave_scan.c:502:17: warning: format '%zu' expects argument of type 'size_t', but argument 7 has type 'uint32_t' [-Wformat=]
master/fsm_slave_scan.c: In function 'ec_fsm_slave_scan_state_sii_alias':
master/fsm_slave_scan.c:721:5: warning: format '%zu' expects argument of type 'size_t', but argument 5 has type 'int' [-Wformat=]
 EC_SLAVE_DBG(slave, 1, "Alias: %zu\n", slave->effective_alias);
 ^
master/fsm_slave_scan.c: In function 'ec_fsm_slave_scan_state_sii_serial':
master/fsm_slave_scan.c:759:5: warning: format '%zu' expects argument of type 'size_t', but argument 5 has type 'uint32_t' [-Wformat=]
 EC_SLAVE_DBG(slave, 1, "Serial Number: %zu\n", slave->effective_serial_number);
 ^
master/fsm_slave_scan.c: In function 'ec_fsm_slave_scan_state_sii_vendor':
master/fsm_slave_scan.c:792:5: warning: format '%zu' expects argument of type 'size_t', but argument 5 has type 'uint32_t' [-Wformat=]
 EC_SLAVE_DBG(slave, 1, "Vendor ID: %zu\n", slave->effective_vendor_id);
 ^
master/fsm_slave_scan.c: In function 'ec_fsm_slave_scan_state_sii_product':
master/fsm_slave_scan.c:825:5: warning: format '%zu' expects argument of type 'size_t', but argument 5 has type 'uint32_t' [-Wformat=]
 EC_SLAVE_DBG(slave, 1, "Product code: %zu\n", slave->effective_product_code);
 ^

The ones complaining about "int" probably need casts to "unsigned" (or
uint32_t if you prefer) due to default argument promotion, and the %zu
should just be %u for all of them.

Also vendor ids and product codes are usually printed in hex.  Not sure
about serial numbers, but "ethercat slaves -v" displays those in hex too, so
that seems reasonable.
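
Something along these lines is what I have in mind.  This is only a userspace sketch: it uses snprintf and the <inttypes.h> macros rather than EC_SLAVE_DBG, the function name format_identity is made up, and the field widths are just my assumption about what looks sensible.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format identity fields with matching specifiers.
 * The alias is cast explicitly because a 16-bit value is promoted to int
 * when passed through a variadic call; the IDs are printed in hex. */
int format_identity(char *buf, size_t len, uint16_t alias,
                    uint32_t vendor_id, uint32_t product_code)
{
    return snprintf(buf, len,
            "Alias: %u, Vendor ID: 0x%08" PRIX32
            ", Product code: 0x%08" PRIX32,
            (unsigned) alias, vendor_id, product_code);
}
```

In the kernel code itself a plain %u or %X should do for these fields, since uint32_t is an unsigned int there.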

On a somewhat related note, I'm not sure "effective_serial" etc are good
variable names.  "Effective alias" is phrased that way because there are
several different kinds of alias, but this contains the one that's currently
in use (eg. see the EC_REGALIAS code, which allows the effective alias to
come from a register rather than the SII); but that isn't really true for
the vendor/product/serial values.  This is just a minor quibble of course. :)

Although speaking of the EC_REGALIAS code, if that's enabled and if the
register 0x0012 alias is different from the SII alias, then this patch might
malfunction (it should probably skip reading the SII alias and go straight
for the register).  Having said that, normally the two should be the same,
unless someone is in the process of changing the alias (in which case
rebooting the slave afterwards should "fix" everything).  There might be
some odd slaves out there though, which could be why EC_REGALIAS was added
in the first place..?

Finally, this is one of those "probably not strictly necessary but it makes
things tidier just in case" changes, but I recommend adding the following
hunk to patch 0005:

--- a/master/datagram.c
+++ b/master/datagram.c
@@ -586,6 +586,9 @@
 case EC_DATAGRAM_ERROR:
 printk("error");
 break;
+case EC_DATAGRAM_INVALID:
+printk("invalid");
+break;
 default:
 printk("???");
 }




Re: [etherlab-dev] Multiple mailbox protocols and other issues

2015-02-15 Thread Gavin Lambert
On 13 February 2015 21:39, quoth Knud Baastrup:
>> Nice!  Although there still seem to be some funny things going on with the
>> whitespace, eg. see patch 0013's master/fsm_slave_config.c's second hunk
>> (ec_fsm_slave_config_enter_mbox_sync).
>
> I guess I need more help to figure this out. I cannot (with my current
> knowledge of patch management) see anything wrong in this specific hunk
> (line 374 to 476). Do you get some kind of warning when applying the patch
> or how do you observe the issue?

The second hunk covers lines 467 to 524 in the patched file.

There's no patching errors or anything like that, it's just that the
inserted lines have only four spaces instead of eight, so the indentation
appears wrong when compared to the surrounding code.

I didn't examine the patches with a fine-toothed comb (though I did spend a
bit of time looking through them, of course), so I don't know if there are
other instances of this or if this was the only one, but I happened to
notice this case so I thought I'd mention it.  Obviously it doesn't affect
the actual operation of the patch, it's just a code style issue.




Re: [etherlab-dev] Multiple mailbox protocols and other issues

2015-02-12 Thread Gavin Lambert
On 13 February 2015 03:30, quoth Knud Baastrup:
> I have attached a new set of patches (now by using git format-patch, which
> also implies that the patch names are given by the commit text).
>
> I believe that I have addressed the issues you highlighted in prior mails
> including the alias support that might be relevant for us as well sometime
> in the future.

Nice!  Although there still seem to be some funny things going on with the
whitespace, eg. see patch 0013's master/fsm_slave_config.c's second hunk
(ec_fsm_slave_config_enter_mbox_sync).

> The locks that conflict with RTAI could be removed with a define guard,
> e.g. re-use the EC_RTDM define already available?

They're not conflicts, and I wasn't suggesting any specific changes (as I
said, I don't use RTDM myself so I don't really know specifics).  It was
just a comment to indicate why the master originally didn't do any locking
there, and that your original problem *could* theoretically have been solved
by doing the locking in the application code instead, as it's only an issue
with concurrent realtime tasks, which are likely to need some
application-level locking anyway.  It's a possible reason that Florian might
not want to accept the patch, but that doesn't mean that you should modify
or withdraw it -- that's something he can decide.


I'll integrate your new patchset into my build and do a bit more testing;
I'm hoping to post my full patch queue in a few days.  This will include
your patches as well -- I hope you don't mind?  (They'll be clearly
attributed, of course.)


Just some further thoughts on patch 0010 (deferring the sdo dictionary
fetch): one of the interesting things about fsm_slave over fsm_master is
that the former can run in parallel while the latter only in series.  In
principle, this means that if someone issues "ethercat sdos" with no filters
on a large network, the fetch time could be reduced considerably.  (It won't
reduce the time to that of a single slave, as it has caps on the number of
slave FSMs it can run in parallel to prevent blowing out the number of
frames and causing latency.)  Currently your patch forces this to still run
sequentially anyway, because the ioctl is blocking and it only does one
slave at a time.

I was thinking about having a go at trying to make that change myself, but
having said that, given that running "ethercat sdos" on multiple slaves is
not particularly useful (since networks usually contain many duplicates) and
that this is generally only used during development or commissioning, I'm
not sure whether it's really worth it.

I was also wondering if it should do a more limited dictionary fetch by
default (just of the PDOs), which could improve some of the logging, but
even then those messages only appear when the debug level is increased, so
most of the time it wouldn't be all that useful.  And someone can just look
at the slave docs (or a local dictionary scan) if they want to interpret
logs to find out what a particular SDO entry is called from its
index:subindex.




Re: [etherlab-dev] Sanity check: default send interval

2015-02-11 Thread Gavin Lambert
On 11 February 2015 23:42, quoth Frank Heckenbach:
> > In master/master.c's ec_master_init (line 213 in my patched version;
> > line 211 in stable-1.5) there's the following line:
> >
> > ec_master_set_send_interval(master, 1000000 / HZ);
> >
> > According to the definition further down and the docs, that second
> > parameter is supposed to be the time between master application cycles
> > in microseconds, which is used to calculate a few queue sizes and also
> > to control the master thread sleep time if EC_USE_HRTIMER is defined
> > (via configure --enable-hrtimer).
> >
> > Doesn't the use of HZ above mean that this is actually calculating
> > "how many seconds is 1000000 jiffies"
>
> Which is the same as "how many microseconds is 1 jiffy". So sleeping for
> this many microseconds is as close as possible to the non-hrtimer code
> which sleeps for 1 jiffy.

Thanks, that makes more sense now.
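
The equivalence Frank points out can be checked with plain integer
arithmetic.  The sketch below is illustrative only: usecs_per_jiffy() is a
hypothetical helper (not part of the master), the tick rate is passed in as
an argument instead of using the kernel's HZ, and it assumes the expression
in question divides one million by the tick rate.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the EtherLab master: length of one
 * jiffy in microseconds for a given tick rate, computed as 1000000 / hz
 * with integer division (the same form as the expression discussed). */
static uint32_t usecs_per_jiffy(uint32_t hz)
{
    return 1000000u / hz;
}
```

With a tick rate of 250, for example, this yields 4000, so an hrtimer build
would sleep for about 4 ms per iteration, which is one jiffy on such a
kernel.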




[etherlab-dev] Sanity check: default send interval

2015-02-10 Thread Gavin Lambert
Just noticed something that I think is incorrect, but I thought I'd ask for
a sanity check from the group in case I'm missing something.

In master/master.c's ec_master_init (line 213 in my patched version; line
211 in stable-1.5) there's the following line:

ec_master_set_send_interval(master, 1000000 / HZ);

According to the definition further down and the docs, that second parameter
is supposed to be the time between master application cycles in
microseconds, which is used to calculate a few queue sizes and also to
control the master thread sleep time if EC_USE_HRTIMER is defined (via
configure --enable-hrtimer).

Doesn't the use of HZ above mean that this is actually calculating "how many
seconds is 1000000 jiffies" rather than an actual microseconds value?
What's the intended default send interval?


On a somewhat related note, is there some guidance for when using
--enable-hrtimer is good or bad?  Does it relate just to whether the hrtimer
is present/trusted or is there something more subtle?  In terms of the code,
it just seems to be the difference between a "sleep for X time" vs. a "sleep
for 1 quantum", so it *seems* like --enable-hrtimer is good for reducing CPU
usage.




Re: [etherlab-dev] Multiple mailbox protocols and other issues

2015-02-10 Thread Gavin Lambert
On 10 February 2015 20:47:
>> 4. Regarding 13_domain_lock.patch, I believe the original rationale of
>> the master is that locking between concurrent application tasks is the
>> responsibility of the application, not the master -- that's why in
>> kernelspace it has send/receive callbacks (formerly lock/unlock
>> callbacks) so that eg. RTAI locks can be substituted, or locking can be
>> avoided if the application has some other way to schedule things to
>> avoid actual concurrency (or if it's only running a single task).  See
>> the "Concurrent Master Access" section in the docs.  I don't have any
>> personal objections to this patch though.
>
> Yes, we just faced some cases where the application developers did not
> include the necessary locking, which has quite a severe impact on the
> complete system. We have not used RTAI, but I am not sure I understand why
> the extra locks become a problem for RTAI?

Not a problem as such, but not necessarily sufficient to protect anything.
RTAI/Xenomai is a separate kernel, so concurrent tasks would be using RTAI
locks instead of regular kernel locks, so they would make the kernel locking
redundant.  It does trivially hurt performance to take a lock that is never
contended, but it's usually not worth worrying about that unless it's in a
tight loop.

Having said that, I don't use RTAI *or* concurrent tasks, so it doesn't
really affect me either way. :)

>> 6. Regarding 16_improved_ethercat_rescan_performance.patch, it looks like
>> a stray temporary file was included in the patch.  Also, I'm not sure
>> it's safe to retrieve the data only by serial number.  Serial numbers are
>> not guaranteed unique between vendors, or even between product lines -- I
>> think at minimum you should include the vendor id and product code in the
>> index.  Also, possibly this should have a #define config guard to disable
>> this functionality in case the master will be used at a site with
>> pathological slaves (eg. multiple slaves with identical non-zero
>> vendor/product/serial triplets, since *technically* they're not
>> guaranteed unique at all -- although any slave vendor who does this
>> deserves a kick).
>
> Yes sorry, my mistake with the temporary file. I can only agree with
> you that vendor and product code should be included in the index in order
> for this patch to be used in large scale, I will add this. I can also
> agree with the #define guard.

To reduce network cycles a bit, I suggest trying the alias (if nonzero)
first, as this is required to be network-unique if defined (meaning that you
wouldn't need to check vendor/product/serial); falling back to reading
serial, check if nonzero, and only then read the vendor and product codes.
(I'm not sure if the alias is already known at that point or if that
requires a network cycle to read as well, but even if the latter it means
one read instead of at least three.)
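
As a rough illustration of the ordering suggested above, the matching
decision could look something like the following.  This is a sketch only:
struct slave_ident, its field names and ident_matches() are hypothetical and
do not come from the patch or from the master source, and a zero alias or
serial is assumed to mean "not set".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical identity record; field names are illustrative only. */
struct slave_ident {
    uint16_t alias;        /* 0 is taken to mean "no alias configured" */
    uint32_t vendor_id;
    uint32_t product_code;
    uint32_t serial;       /* 0 is taken to mean "no serial number" */
};

/* Decide whether cached data may be re-attached to a freshly scanned slave.
 * Order: alias first (network-unique when nonzero), then serial, and only
 * then vendor id and product code. */
static bool ident_matches(const struct slave_ident *cached,
                          const struct slave_ident *scanned)
{
    if (scanned->alias != 0)
        return cached->alias == scanned->alias;
    if (scanned->serial == 0)
        return false;   /* nothing reliable to match on */
    if (cached->serial != scanned->serial)
        return false;
    return cached->vendor_id == scanned->vendor_id
        && cached->product_code == scanned->product_code;
}
```

In a real scan each field would cost a read from the slave, so returning
early on the alias is what saves the network cycles; the comparison itself
is trivial.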

Of course, I might be a little biased since as I mentioned before I usually
configure aliases on all slaves. :)

>> I haven't had a chance to test things locally yet, but at least
>> everything is compiling ok with these patches. :)

I've given it some minimal testing now, and I'm happy to report that in a
network with about 10 slaves (all with serial numbers) this reduces the
total rescan time from about 45 seconds to about 2, at least for subsequent
scans.  (Numbers are vague because the test conditions weren't quite
identical in each case.)

Although the SDO dictionary patch means that I'm not really testing your
mailbox patches anymore, because dictionary vs. other SDO requests were the
main cause of mailbox conflicts that I see with an unpatched master.  (I'm
not using EoE, which is the other main source of conflicts.)  That's also a
good patch, as the dictionary scan of those 10 slaves takes about two
minutes, and normally the information isn't required for standard running,
only when commissioning.  (It's a little disconcerting to see "ethercat
sdos" just sit apparently dead for a few minutes though.  Maybe it needs
some kind of progress reporting.  Although it's not as bad when limited to a
single slave, which is probably the more common use case.)




Re: [etherlab-dev] Multiple mailbox protocols and other issues

2015-02-08 Thread Gavin Lambert
On 2 February 2015 20:18, you quoth:
> 
> I will just update you on some additional patches I have prepared for
> EtherCAT Master. I have attached the complete set of patches that we
> currently use, but only the below patches have been updated or added.

I'm just having a more detailed look through these patches now, and there's
a few niggles. ;)

1. There are several files that appear to have tabs in them; it's usually a
good idea when sharing patches with others to use spaces only, as different
people/editors use different tab sizes.

2. There's several diffs in various files (eg. 12_sdo_directory.patch's
master/ioctl.h) that contain only whitespace changes on various lines for no
readily apparent reason.  These sorts of things can cause unnecessary
conflicts and hide the true intent of the patch.  This may have been the
result of a space -> tab -> space conversion gone wrong.

3. This is optional, but I think it's good style to include a short text
description at the top of each patch file, which can act as a commit
message, and helps people reading the patch later without having to hunt
down the original email.  (If you're using HG MQ it does this automatically;
I think git format-patch will also do this for you if you have a branch
structured with one commit per patch, though it also adds quite a bit of
extra email-header junk.)

4. Regarding 13_domain_lock.patch, I believe the original rationale of the
master is that locking between concurrent application tasks is the
responsibility of the application, not the master -- that's why in
kernelspace it has send/receive callbacks (formerly lock/unlock callbacks)
so that eg. RTAI locks can be substituted, or locking can be avoided if the
application has some other way to schedule things to avoid actual
concurrency (or if it's only running a single task).  See the "Concurrent
Master Access" section in the docs.  I don't have any personal objections to
this patch though.

5. Regarding 14_fix_string_handling.patch, I don't think this is the "right"
fix.  I've attached Frank's 04 patch which fixes this a different way.

6. Regarding 16_improved_ethercat_rescan_performance.patch, it looks like a
stray temporary file was included in the patch.  Also, I'm not sure it's
safe to retrieve the data only by serial number.  Serial numbers are not
guaranteed unique between vendors, or even between product lines -- I think
at minimum you should include the vendor id and product code in the index.
Also, possibly this should have a #define config guard to disable this
functionality in case the master will be used at a site with pathological
slaves (eg. multiple slaves with identical non-zero vendor/product/serial
triplets, since *technically* they're not guaranteed unique at all --
although any slave vendor who does this deserves a kick).

I haven't had a chance to test things locally yet, but at least everything
is compiling ok with these patches. :)



frank_04-string-download.patch
Description: Binary data


Re: [etherlab-dev] Multiple mailbox protocols and other issues

2015-02-02 Thread Gavin Lambert
On 3 February 2015 20:08, quoth Knud Baastrup:
> Yes, I decided not to use the alias as matching criteria for the slave
> data due to the "tree points" intended use of aliases. Currently we do not
> use aliases on our modules, but we are planning to introduce this in the
> near future to prevent that some modules get the wrong configuration if a
> rack of modules (with same vendor and product code as the following rack)
> are disconnected due to wire/power break. What do you mean by the "master
> configurator" ?

Depending on context, either the software that sets up the network layout
(assigning aliases, saving persistent parameters, etc), or the person in
charge of doing that work for a particular installation.  Depending on how
your application and network operate, sometimes that's an explicit step, and
sometimes it's part of the application.

Since I'm a lazy person, when our modules are assigned a serial number
during production they get that assigned as their alias as well (although
they can also be explicitly assigned a different alias if the network
configurator wishes), which simplifies network configuration quite a bit.
But then, these are discrete units rather than "racks" so there's a higher
chance they'll get wired in an unexpected order, so I think this provides
the best compromise in network flexibility.  YMMV.




Re: [etherlab-dev] Multiple mailbox protocols and other issues

2015-02-02 Thread Gavin Lambert
On 3 February 2015 13:38, quoth Graeme Foot:
> > On 2 February 2015 20:18, quoth Knud Baastrup:
> > > 16_improved_ethercat_rescan_performance.patch:
> > > The SII data and PDOs will now be stored when the EtherCAT master is
> > > in its operation phase. The stored SII data and PDOs will be detached
> > > from the slaves prior to a scanning and re-attached during the
> > > scanning without the need to fetch the SII data and PDOs once again.
> > > The SII data and PDOs will however only be stored if the slave has a
> > > serial number defined as this serial number will be used when
> > > re-attaching the SII data and PDOs.
> >
> > Ooh, thanks for that one.  That's one of the performance holes that I
> > was planning on investigating myself soonish.
> >
>
> None of my modules seem to have serial numbers (Beckhoff IO and Yaskawa
> amps), or is that a bug that's got a patch?  I'm running the original
> 1.5.2 (+ misc patches).

It's not a bug, or really related to the master software at all.

It's up to the device manufacturer whether a given device has a serial
number or not.  In my case they do, but I'm mostly using in-house hardware.
Beckhoff modules as a general rule don't seem to have serial numbers.

This is separate from the "alias" which is used in network addressing and is
typically (but optionally) set by the network designer via the master
configurator.  The EtherLab master lacks specific commands to set aliases
but it can recognise aliases set by other masters, and some slaves support a
CoE download to reconfigure their alias (which can be accessed via the
command line), and some others have dipswitches.  Generally aliases are only
needed at "tree points" in the network graph, so if you're only using simple
chains they're less useful.

Technically the serial number can be altered by the master / network
designer as well via an EEPROM download, but this is less "encouraged".




Re: [etherlab-dev] Multiple mailbox protocols and other issues

2015-02-02 Thread Gavin Lambert
On 2 February 2015 20:18, quoth Knud Baastrup:
> 16_improved_ethercat_rescan_performance.patch:
> The SII data and PDOs will now be stored when the EtherCAT master is in
> its operation phase. The stored SII data and PDOs will be detached from
> the slaves prior to a scanning and re-attached during the scanning without
> the need to fetch the SII data and PDOs once again. The SII data and PDOs
> will however only be stored if the slave has a serial number defined as
> this serial number will be used when re-attaching the SII data and PDOs.

Ooh, thanks for that one.  That's one of the performance holes that I was
planning on investigating myself soonish.

(A somewhat related one is that when a slave drops to SAFEOP+ERROR as a
result of a comms watchdog error, it *should* be safe for the master to
bring it straight up to OP, especially when the slave has a serial and can
be unambiguously identified, but currently it's doing a full back-to-PREOP
reconfigure.)

I've been meaning to post the full patchset that I'm using at the moment (in
the hopes that a few pieces at least can get integrated), but I'm still
investigating a few issues (and working on unrelated things), so it's not
quite ready yet.  Although (at least partly due to inertia) I'm currently
using Frank's mailbox patches rather than yours. :)




Re: [etherlab-dev] Userspace fork of Etherlab

2015-01-18 Thread Gavin Lambert
On 17 January 2015 02:00, quoth Frank Heckenbach:
> My result was that with very small packets, I can get cycle times of
> 1500ms without overruns. As the size of the packet (or packets if larger
> than MTU) per cycle increases, so does the cycle time to run reliably, and
> with really large packets I can get cycle times such that a bit more than
> 50% of theoretical bandwidth is used in either direction (which seems
> quite reasonable since it's what I've experienced and seen recommended for
> other communication protocols, and it also proves full-duplex works,
> otherwise no more than 50% would be possible).
>
> Our project runs at 2000ms (500Hz), and at that rate, I could get packets
> of 5KB/cycle without overruns which is way above what we require. So the
> userspace port will only be for "slow" cycles (up to 500Hz), but that's
> what we need. Maybe in a few years, the standard kernel's RT features will
> improve and the userspace code will allow for faster cycle times without
> many changes.

I assume you meant microseconds here, which are usually shortened to µs or
us, not ms (which is milliseconds).  Cycle times of 1500ms would be quite
bad for most applications. :)

> I know about the userspace library. But moving to userspace is not my main
> goal. My main goal is better maintainability for my application, and
> getting rid of kernel dependencies is a step towards this goal. Using the
> kernelspace Etherlab code with a userspace application wouldn't help in
> any significant way since I'd have the same amount of kernel dependencies
> (my application does not contain many).

Given that one of your requirements (judging from your previous patches) is
EoE support, that might be tricky without kernel support.




Re: [etherlab-dev] Master re-request race with slave mailboxes

2014-08-13 Thread Gavin Lambert
On 13 August 2014, quoth Frank Heckenbach:
> > In my application, on startup it requests the master and then uses
> > ecrt_master_sdo_upload to fetch certain information from slaves (eg.
> > profile, version, etc), both for diagnostics and to help ensure the
> > config is sane.  While this normally works fine, there can be problems
> > if it occurs too soon after the master service is started or after it
> > was last released.
> 
> Just a quick thought, did you try waiting until the dictionaries are
> completely fetched (cf. my patch #28)?

That does help with half of it (the initial startup after starting the
service), but it doesn't help with the release-rerequest race (because the
dictionaries aren't re-fetched during that time).

Basically at some unknown-to-the-app point (following a deactivate/release)
it will pause any in-progress requests, bump the slave back to INIT and then
to PREOP (clearing its mailboxes along the way), and then resume the
in-progress requests -- possibly at a point where it will now futilely poll
the mailbox for a reply that will never come, because the slave already
replied and then erased it at the master's request.  (It's a race, so most
of the time it gets lucky and this only happens sometimes.)

Though you've reminded me that if the application does wait for dictionaries
to be fetched, then implementing #4 in the master should be sufficient to
solve this.  (It's a slight variation on #4 + #1.)

Regards,
Gavin Lambert




[etherlab-dev] Master re-request race with slave mailboxes

2014-08-12 Thread Gavin Lambert
Hi,

Is it expected that after ecrt_request_master(), all online slaves are in
PREOP (or possibly stuck in INIT with error_flag=1)?  Or is an application
expected to explicitly verify the state of all slaves before trying to do
anything?

In the continuing saga of fun with mailbox SDOs, I've found that even with
Frank's or Knud's patches to reduce mailbox contention, there are still some
issues that stem from the slave state not being as expected.

In my application, on startup it requests the master and then uses
ecrt_master_sdo_upload to fetch certain information from slaves (eg.
profile, version, etc), both for diagnostics and to help ensure the config
is sane.  While this normally works fine, there can be problems if it occurs
too soon after the master service is started or after it was last released.

In particular, when the master is deactivated or released it will internally
schedule a transition back to PREOP for all slaves.  If the master is
re-requested too quickly, then this may not have even started yet, and since
SDO requests are disallowed (and the request state machines not processed)
during slave reconfiguration, it can end up doing two consecutive writes
(first the upload request from the application, then a retry or occasionally
something involving 0x1C12 and 0x1C13).  Firstly, this can result in the
second request to fail due to an unexpected response and consequently fail
the entire slave configuration (unless retried as in Frank's patches), and
secondly this will result in the application request timing out (because the
request machine is paused in a state where it just sent the request and then
resumed thinking that it just needs to wait for the reply, but in the
meantime the mailbox has been reset out from under it).

And of course this also means that currently when ecrt_request_master()
returns, some slaves may still be in a non-PREOP state pending transition to
PREOP, so it is not possible to rely on accessing SDOs that are "preop only"
- although this probably isn't a big problem as most of those will probably
be used with ecrt_slave_config_sdo* instead, which is safer.

Another interesting quirk that I noticed along the way is that
ecrt_request_master() will internally wait on master->config_busy - but this
is toggled (and waitqueue released) in between each slave, so even if slave
configuration has started, ecrt_request_master() will block only until it
finishes configuring the current slave and then return to the application
while configuration continues in the background; this seems of dubious
usefulness to me.  ("slave configuration" here refers to returning the
slaves to PREOP.)

I'm happy to look at writing some patches to resolve this behaviour, but
before I do that it seemed like a good idea to ask which behaviour is more
correct (in the view of the community):

1. Everything is working as expected (no patches are required), and it's the
application's responsibility to wait for the slave to return to PREOP before
using ecrt_master_sdo_{down,up}load.

2. ecrt_request_master() should block until all slaves finish returning to
PREOP, not just whichever one slave happens to be in progress at the time.
(Sub-decision: should it be the open or the reserve that blocks?  Currently
it's only the latter.)

3. ecrt_master_deactivate() (and consequently ecrt_release_master() too)
should block until all slaves finish returning to PREOP.  (This won't help
with initial startup happening too early.)

4. Don't allow configuration to start while a request is still in progress,
but then do the configuration before starting the *next* request.  (This
won't help with ensuring it's in PREOP before requesting, but will prevent
the mailbox mixup and timeout.)

5. Something else that I did not think of.

(Note that where I say "return to PREOP" above, this also applies to the
initial change to PREOP if the application is started too soon after the
master module is loaded.)
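
For reference, option 1 amounts to the application polling slave states
before it issues any SDO transfer.  A minimal sketch of the check itself is
below.  It assumes the application has already collected one AL state byte
per slave (the slave info returned by ecrt_master_get_slave() should carry
such a value, but check ecrt.h for your version); the constant names and
all_slaves_in_preop() are hypothetical.  The state encoding used (low nibble
is the state, 0x02 is PREOP, bit 0x10 is the error indication) is the
standard EtherCAT AL status layout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define AL_STATE_PREOP 0x02u  /* EtherCAT AL state code for PREOP */
#define AL_STATE_MASK  0x0Fu  /* low nibble carries the state */
#define AL_ERROR_FLAG  0x10u  /* error indication bit */

/* Hypothetical helper: true only if every slave reports PREOP with no
 * error indication.  An empty list counts as "all in PREOP". */
static bool all_slaves_in_preop(const uint8_t *al_states, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if ((al_states[i] & AL_ERROR_FLAG) != 0)
            return false;
        if ((al_states[i] & AL_STATE_MASK) != AL_STATE_PREOP)
            return false;
    }
    return true;
}
```

An application would call this in a loop with a timeout after requesting the
master.  Note that it does nothing about the mailbox mixup described in
option 4; it only avoids starting a transfer too early.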

Thoughts?

(Hopefully this doesn't bias the responses too much, but I'm slightly
leaning towards #4, as this would uniformly apply to all types of requests
from all sources [command-line, blocking API, asynch API], and is likely to
be a step closer to structural improvement of the state machines.  It's a
little weaker in not assuring PREOP, but *usually* SDOs are always readable
and the write-in-PREOP-only SDOs should be handled via
ecrt_slave_config_sdo* as noted above.  The main problem with this [and why
one of the other options might be better] is that it could still try [and
fail] to transfer while the slave is in INIT, in the case when the app is
started too soon after the master, so #1 or #2 may be needed anyway.)

Regards,
Gavin Lambert




Re: [etherlab-dev] Multiple mailbox protocols and other issues

2014-08-06 Thread Gavin Lambert
On 11 July 2014, quoth Knud Baastrup:
> Patch 2 (ethercat_152_patch_2_foe.patch):
> Wrong datagram used for timeout calculation.

You'll be happy to learn that this patch is not required if you use the
latest code; it was fixed in 8bb574da5da2.

> Patch 4 (ethercat_152_patch_4_eoe_mac.patch)
> Change the MAC address for eoe0sY devices to real locally
> administered MAC addresses based on the NIC part of eth0 and
> the EoE slave's ring position.

In the line "if (ETH_ALEN >= 5)", shouldn't that be > 5 or >= 6?  Also, if
this test fails (which seems unlikely, but then why test if it can't
happen?), this will leave the address uninitialized, which seems
undesirable.  Maybe it should fall back to the prior code in that case?
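
To make the off-by-one concrete: a scheme that writes six address bytes
needs a length check of at least 6, and the failure path should still leave
the address in a defined state.  The following is a sketch under assumptions,
not the code from the patch: make_eoe_mac(), MAC_LEN and the byte layout
(0x02 prefix, three NIC-specific bytes, 16-bit ring position) are all
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6  /* stands in for the kernel's ETH_ALEN */

/* Hypothetical sketch: derive a locally administered unicast MAC from the
 * NIC-specific half of a base address plus the slave's ring position.
 * Returns false, leaving out[] zero-filled rather than uninitialised, if
 * the buffer is too short for a full address. */
static bool make_eoe_mac(uint8_t *out, size_t out_len,
                         const uint8_t base[MAC_LEN], uint16_t ring_pos)
{
    memset(out, 0, out_len);
    if (out_len < MAC_LEN)
        return false;                 /* caller falls back to its old scheme */
    out[0] = 0x02;                    /* locally administered bit set, unicast */
    out[1] = base[3];                 /* NIC-specific bytes of the base MAC */
    out[2] = base[4];
    out[3] = base[5];
    out[4] = (uint8_t)(ring_pos >> 8);
    out[5] = (uint8_t)(ring_pos & 0xFF);
    return true;
}
```

The return value gives the caller an explicit point at which to fall back to
the previous address scheme instead of using a half-initialised address.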

> Patch 5 (ethercat_152_patch_5_priority_inheritance.patch)
> Replaced semaphores with mutexes to utilize priority inheritance 
> and limit impact from lower priority tasks (EtherCAT-EOE) running 
> as sched_other task. 

While I like the idea of using RT-mutexes, they do have a minimum kernel
version (I'm not sure exactly what that is except that it's somewhere in the
2.6.x series) and currently Etherlab provides drivers for some kernel
versions that are before that cutoff, I suspect.  (And do rt_mutexes and
RTAI play nicely together or not?)  So this might break compatibility, and
so possibly should be a configure option.  Or maybe nobody cares about those
older kernel versions any more?  (I don't personally, I'm just wondering if
it might be a concern.)

Also I found a few cases where "down" and "up" hadn't been changed to
"rt_mutex_lock" etc.  Not sure if this was the result of a patch application
failure or if this was code added since the patches' base version.

> Patch 6 (ethercat_152_patch_6_mailbox.patch)
> Alternative solution to Patch 9-10-11 provided by Frank Heckenbach for 
> 1.5.0 (that I did not succeed to get up running on 1.5.2).

Did you look at the updated-to-1.5.2 patches that I posted?

> In this solution I accept that a mailbox read request (e.g. FP-RD) for 
> a given mailbox protocol can return data from any other mailbox protocol 
> running at the same time.  [...]

In master/master.c near line 1240, on receipt of a datagram you're searching
through the slave list to check its mailbox settings.  This code appears
unsafe in the case when this search fails (which presumably could occur eg.
if a datagram with a corrupted address arrives or if slave scanning has just
started and cleared master->slaves).
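
The kind of guard meant here is simply a lookup that reports failure and a
caller that checks for it.  A minimal userspace sketch follows; struct slave,
find_slave() and lookup_mbox_offset() are hypothetical stand-ins and do not
reflect the master's actual data structures.

```c
#include <stddef.h>
#include <stdint.h>

struct slave {               /* hypothetical, minimal stand-in */
    uint16_t station_address;
    uint16_t mbox_rx_offset; /* configured mailbox address */
};

/* Returns NULL if no slave has the given address (eg. corrupted datagram,
 * or the list was just cleared by a rescan). */
static const struct slave *find_slave(const struct slave *slaves,
                                      size_t count, uint16_t address)
{
    for (size_t i = 0; i < count; i++) {
        if (slaves[i].station_address == address)
            return &slaves[i];
    }
    return NULL;
}

/* Returns 1 and stores the offset on success, 0 if the slave is unknown. */
static int lookup_mbox_offset(const struct slave *slaves, size_t count,
                              uint16_t address, uint16_t *offset)
{
    const struct slave *s = find_slave(slaves, count, address);
    if (s == NULL)
        return 0;   /* skip the mailbox handling instead of dereferencing */
    *offset = s->mbox_rx_offset;
    return 1;
}
```

On the failure path the output parameter is left untouched, so the caller
must test the return value before using it.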

Also this seems to generate quite a lot of "Await configured mailbox
address" spam at debug level 1 even when no mailbox activity is taking
place..?

When compiling with a recent kernel version (3.13) I needed to #include
 in master/slave.h in order to compile several files (the
first is master/fmmu_config.c).  This doesn't appear to be needed in 3.2
however (or by some of the other .c files); I guess the indirect includes
have changed?

> Patch 10 (ethercat_152_patch_10_scan_skip_stats.patch)
> No reason to write output statistics in syslog when issuing a slave
> scanning where UNMATCHED datagrams are expected behavior.

I'm not sure I follow this one.  Why are unmatched datagrams expected?
Also, this patch isn't going to stop them printing, just delay it until the
scan completes.

Otherwise, this patchset seems to work ok.  I've only given it fairly basic
testing thus far though.

However there was a bit of fuzz and other conflicts when trying to apply
these to the latest HG source, so in the interests of making it easier for
Florian and anyone else using the latest source to examine and test these
patches, I've attached updated versions.  These have only had the following
changes:
 - de-fuzzed based on 8dd49f6f6d32 (close to stable-1.5 tip).
 - various tabs converted to spaces and related minor whitespace
   changes.
   - I noticed some inconsistent newline-brace styles but I left those
     alone.
 - omitted patch 2 as specified above (it was empty after defuzzing).
 - converted a few extra down/ups in patch 5 as specified above.
 - added #include to master/slave.h in patch 6 as specified above.

In particular I've basically just included changes required to compile; I
haven't tried to fix any of the other possible issues mentioned above.

I've also included a series file, so in theory it should immediately be
MQ-compatible if extracted into the .hg dir.  (You might need to "hg qq -c
knud" first.)

Regards,
Gavin Lambert



patches-knud.tar.gz
Description: Binary data
___
etherlab-dev mailing list
etherlab-dev@etherlab.org
http://lists.etherlab.org/mailman/listinfo/etherlab-dev


Re: [etherlab-dev] Support for multiple mailbox protocols

2014-07-01 Thread Gavin Lambert
On 1 July 2014, quoth Knud Baastrup:
> I have tested a bit on below scenario in my current setup and 
> prepared this small attached patch that await the SDO dictionary 
> fetching to be completed for a given slave (if SDO Info is 
> supported by the slave) before the slave is set ready for 
> external SDO requests. It is simple and works very well in my setup.

I haven't tested it yet, but it looks good.  It will delay the SDO (and other) 
transfers a bit (the dictionary scan can be quite slow on complex slaves) but 
by their nature SDO transfers tend to be less time-critical so I don't see this 
being a real problem.

> I have also reproduced scenario 1 from the bug report 
> http://lists.etherlab.org/pipermail/etherlab-dev/2014/000377.html

I've reproduced all of them.  But the combination of the subset of Frank's 
patches (with my 1.5.2 updates) and one other patch has squashed them all.  
There may be better ways to do some of the things, and further things that 
could be improved, but they do work:

 - http://lists.etherlab.org/pipermail/etherlab-dev/2014/000401.html
 - http://lists.etherlab.org/pipermail/etherlab-dev/2014/000411.html

(Note that if anyone wants to actually use the second one, let me know; similar 
changes are needed in fsm_foe and fsm_soe for completeness, and I have another 
version of the patch that does all three.)

> The new scanning will use an internal SDO request to fetch the assigned 
> PDOs (1c12), but the slave will return the data it had prepared just 
> before it was disrupted. I believe we need to fix this in the master 
> and ensure that the master “empties” any full mailboxes before it starts 
> to fetch the assigned PDOs. I think this can be done by sending a check 
> datagram and then fetching and discarding the data for any slaves left with a 
> written mailbox. This operation could be done with a new state in the 
> fsm_slave_scan FSM. Any other suggestions?

Have a look at Frank's patch #26 "clear-mailbox".

Regards,
Gavin Lambert



Re: [etherlab-dev] Support for multiple mailbox protocols

2014-06-30 Thread Gavin Lambert
On 30 June 2014, quoth Jun Yuan:
> The slave's CoE FSM instance is not controlled by the master FSM, but 
> by the slave FSM itself. The slave FSM is responsible for the CoE 
> requests in ec_slave_t issued directly by the user via the function 
> ecrt_master_sdo_download, ecrt_master_sdo_download_complete, and 
> ecrt_master_sdo_upload. These functions can be executed either in 
> the user application before the master's activation, or in the 
> terminal at any time.
>
> The master's CoE FSM instance, on the other hand, is responsible for 
> the CoE requests which should be executed in the background by the 
> master FSM, such as automatically fetching the slaves' CoE dictionaries while 
> the master is idle, those in the slave->config->sdo_requests, which could 
> be issued via the ecrt_slave_config_sdo functions (which should be 
> configured during the master's activation), or via the function 
> ecrt_slave_config_create_sdo_request (which can be issued afterwards 
> in the user application's RT thread)

Actually the ecrt_slave_config_sdo* functions set up slave->config->sdo_configs 
(not sdo_requests); sending of these is managed by fsm_slave_config, which in 
turn is managed by fsm_master in such a way that it can't occur concurrently 
with anything else, AFAIK.  (fsm_slave is disabled until set_ready is called, 
which occurs later, and fsm_master waits for fsm_slave_config to complete 
before entering the idle state, which is where the other SDO requests are 
processed.)

It does seem odd that fsm_master processes slave->config->sdo_requests (from 
ecrt_slave_config_create_sdo_request), while fsm_slave does slave->sdo_requests 
(from ecrt_master_sdo_*) and also all the *other* non-SDO 
ecrt_slave_config_create_*_requests.

The fsm_master code seems older; maybe it was just never moved once fsm_slave 
was created?  It does mean that "realtime" SDO requests don't have to wait for 
set_ready, but I can't think why that would be desirable given that it doesn't 
apply to the other types of realtime request, and given that AFAIK the 
set_ready calls are all made before fsm_master enters idle anyway.

Actually I'm not sure why fsm_slave shouldn't be made responsible for both of 
those things (config->sdo_requests and SDO dictionary scanning).  Doing that 
would avoid the CoE concurrency issue altogether.  (One downside is that all 
kinds of requests would then be delayed until the dictionary scan completed, 
unless this was made less monolithic.  One advantage of Frank's locking patch 
over this is that it would still allow other requests [except 
create_sdo_requests, unless they were moved to fsm_slave] to interleave with 
dictionary scanning, albeit at a slower rate.)

Regards,
Gavin Lambert



Re: [etherlab-dev] Support for multiple mailbox protocols

2014-06-30 Thread Gavin Lambert
.  But 
I think the Etherlab CoE state machine is already prepared for that.)

> While reading the documentation and another open source ethercat project 
> SOEM, I found there is a mailbox service "counter" besides the service 
> type in the mailbox header. It says, "Counter of the mailbox services 
> (0 is start value, next value after 7 is 1)". I wonder what this 
> counter is used for, how it is implemented in your slave example code, and 
> whether it could be useful for us in the multiple conversation situation.

As I understand it, mostly it's intended as a way to avoid the situation above 
with repeated requests causing duplicated responses.  The idea is that when the 
master is sending a request it picks a value from 1-7 to go in there (this 
should increment with each unique request according to the spec, but slaves 
shouldn't be overly picky about it).  If the send gets WC=0, then it can repeat 
the request *with the same counter*.  If the slave receives two *consecutive* 
requests with the same counter value, the second is ignored, which would have 
meant the scenario above would have resulted in only one reply, and everyone 
would have been happy.  Note that the counter is global and not per-protocol.  
Also note that this does mean that even sends for different protocols need to 
be aware of each other at the lower level in order to set the correct counter 
and retry if necessary before sending the subsequent request.

When the master uses a counter value of 0 (which Etherlab currently always 
does) then this is bypassed and all requests are processed.  Similarly when the 
slave generates responses into the receive mailbox it may either always use 0 
for the counter or it may increment 1-7, but this is independent from the send 
mailbox counter.

So it's not really intended to deal with multiple conversation threads.  There 
are some other fields in the mailbox header that do look like they're intended 
for that sort of thing (channel and priority) but currently they're reserved in 
the spec and not actually implemented AFAIK.

There's also a mechanism for getting a slave to repeat a response without 
re-sending the request (which might have side effects), which could be useful 
if a check indicated something in the read mailbox but then the subsequent 
fetch timed out (after actually succeeding at clearing the read mailbox).  This 
involves a register write to 0x080E, but I haven't looked too closely at the 
specifics, or how likely slaves are to implement it.

Regards,
Gavin Lambert



[etherlab-dev] Meaning of FSM return values and "datagram_used"

2014-06-30 Thread Gavin Lambert
Hi Florian,

In 9cdd7669dc0b, the return value of the protocol-specific FSMs
(fsm_coe_exec, fsm_foe_exec, etc) was apparently changed from returning 0
when the FSM had completed to returning 0 when the datagram was not used
(which could also include when the datagram was still in INIT, QUEUED, and
SENT states).

However most of the places that actually call the FSMs seem to fairly
uniformly assume that returning zero indicates completion of the FSM.  Many
of the FSMs that call fsm_coe in particular themselves still claim to return
zero only on completion of their FSM.  As a result, some operations are not
actually completed.

In particular, I found a case (intermittently but repeatedly reproducible)
where the SDO dictionary scan was in progress, but a particular datagram was
delayed as a result of the master being released (and consequent transfer of
sending responsibility from operation thread to idle thread).  This caused
the scan to be abandoned for that particular slave and then resumed for the
next one.  I had Frank's "coe-lock" patch applied at the time (since this
scenario also demonstrated CoE concurrency), so since the CoE FSM was
abandoned and restarted instead of being run to completion it resulted in a
dangling lock and thus deadlocked shortly thereafter.  (I have a more
detailed analysis of this if you're interested.)

(Note that Frank's original patches are not vulnerable to this as the issue
was introduced between 1.5.0 and 1.5.2.)


As a temporary hack I tried making fsm_coe_exec also return 1 in the case
where the new datagram was not used as a result of the old datagram still
being INIT/QUEUED/SENT (ie. only return 0 when actually finished), and this
appears to have resolved the problem (but presumably not in the "right
way").  I didn't notice anything obviously bad happen as a result of this
change (according to behaviour and logs) but I didn't check everything.
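
For readers without the patch to hand, the idea of the hack is roughly the following. This is a simplified sketch and not the patch itself; dg_state_t, datagram_t, fsm_t and fsm_exec are hypothetical stand-ins for the real datagram states and fsm_coe_exec.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified datagram states (hypothetical names). */
typedef enum {
    DG_INIT, DG_QUEUED, DG_SENT, DG_RECEIVED, DG_TIMED_OUT, DG_ERROR
} dg_state_t;

typedef struct {
    dg_state_t state;
} datagram_t;

typedef struct {
    datagram_t *datagram; /* datagram used by the previous step, if any */
    int finished;         /* set once the FSM reached a terminal state */
} fsm_t;

/* True while the previous datagram has not come back yet. */
static int datagram_in_flight(const datagram_t *dg)
{
    return dg != NULL && (dg->state == DG_INIT ||
                          dg->state == DG_QUEUED ||
                          dg->state == DG_SENT);
}

/* Returns 0 only when the FSM has finished. Returns 1 both when it
 * wants a datagram sent and when it is merely waiting for the old one. */
static int fsm_exec(fsm_t *fsm)
{
    assert(fsm != NULL);
    if (datagram_in_flight(fsm->datagram))
        return 1; /* still waiting: must not look like completion */
    /* ... the real state function would run here ... */
    return fsm->finished ? 0 : 1;
}
```

With this convention a caller that treats 0 as "done" no longer abandons the FSM while a datagram is merely delayed.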

Just for comparison, I've attached the patch I used for the hack.  It's
based on the patch series I posted last week (a subset of Frank's patches
updated for 1.5.2), and the most important change is near the top where it
checks fsm->datagram->state and uses the opposite return value.  Obviously a
complete patch should probably change the other FSMs (eg. FoE) similarly,
but like I said before I'm not sure this is the best way to do it, and also
the CoE FSM is the only one that gets called from so many different places,
so the others are probably less vulnerable.


This was for a separate reason, but both Frank's and Knud's patches have
introduced an INVALID state for datagrams in order to be able to return 1
from FSMs (to indicate non-completion) but not actually send a new datagram,
which again might not be the "right" way to do it but does seem to be a
common need.

Regards,
Gavin Lambert



coe-finish.patch
Description: Binary data

Re: [etherlab-dev] Support for multiple mailbox protocols

2014-06-29 Thread Gavin Lambert
On 30 June 2014, quoth Knud Baastrup:
> I am still not able to identify a scenario where the EtherCAT master uses
> concurrent CoE requests towards the same slave. I know that both master
> and slave FSM have CoE FSM instances, but their usage is controlled by
> the master FSM in dedicated states that should prevent concurrent
> access. Does any of you have a concrete scenario in which this can happen?

The case when it does happen with current master code is the same as the one 
Frank outlined when he provided his original patches.

Specifically, after doing a network scan and basic configuration (either to 
PREOP or to OP depending on the application state), fsm_master goes into an 
idle loop where one of the things it will do is to read the SDO dictionaries of 
each slave that it hasn't previously read.  This will execute concurrently with 
any other SDO request, whether that's coming from the command line tool or an 
application using ecrt_master_sdo_{down,up}load, or a realtime SDO request 
object.

Most of the time you can get away with that, as you're probably accessing 
different slaves, or you don't start the application until the network is 
stable and fully scanned (though bear in mind that it'll trigger a rescan if 
the network is disrupted at all).  Sometimes you don't get away with it though 
and you'll get "unexpected response" errors and other issues.

Regards,
Gavin Lambert



Re: [etherlab-dev] ethercat-1.5: Various issues

2014-06-29 Thread Gavin Lambert
On 27 June 2014, quoth Frank Heckenbach:
> The issues regarding io_sem (#18, #19, #20) apply to all code, RTAI or
> not.

True, but like I said the current 1.5.2 locking code appears quite different
and I wasn't sure how to integrate it properly, so it seemed safest to leave
it alone for now and let Florian examine that part.

> (I also didn't take most of your reformatting changes. In my patches, I
> tried to stick to the apparent coding style in 1.5.0; maybe it has
> changed in 1.5.2, but since my patches are, and possibly might remain,
> for 1.5.0, I'll stick to that style. I only took a few changes where my
> version indeed didn't match that style.)

That's funny, that's exactly the reason why I did the reformatting in the
first place. :)

> Sorry, I meant the reverse check (but again, only as a last resort), so 
> datagrams that are just a little "too new" aren't timed out:
>
>  jiffies_sent - jiffies_poll > timeout_jiffies

See discussion below, but I don't think this is appropriate.

> > 11-mailbox-buffer:
> >   Omits your change to time out QUEUED datagrams as well as SENT ones.
> > [Remainder of fix for invalid timeout values.]
> 
> I'm not sure that's the fix, or why this would produce such values (did
> you try to debug it as I described in my last mail?), so for now I'll
> stick to my code which works for me.

The previous code starts tracking timeouts only after the packets are SENT;
as such it assumes that {jiffies,cycles}_sent is always <=
{jiffies,cycles}_poll (in modulo space, ie. allowing for wraparound).  The
cases where I was getting odd durations were when that assumption was
violated, specifically when (a) _sent/_received were set to a value higher
than _poll at the time when a datagram was fetched from a buffer instead of
from the network (because "now" was higher than _poll), and (b) when
QUEUED-but-not-SENT datagrams were evaluated for timeout, and thus the _sent
value is not yet set (and could be undefined).

Case (a) was fixed by using _poll instead of the current time when fetching
datagrams from the buffer.  Case (b) was fixed by not trying to time out
QUEUED datagrams.
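
As an illustration of why the violated assumption produces such odd durations, here is a small sketch of the modulo arithmetic with 32-bit counters. elapsed_ticks and timed_out are hypothetical helper names, not functions from the master.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed ticks in modulo-2^32 space. This is correct across counter
 * wraparound, but only if "sent" is not later than "poll". */
static uint32_t elapsed_ticks(uint32_t poll, uint32_t sent)
{
    return (uint32_t)(poll - sent);
}

/* Wraparound-safe timeout test built on the subtraction above. */
static int timed_out(uint32_t poll, uint32_t sent, uint32_t timeout)
{
    return elapsed_ticks(poll, sent) > timeout;
}
```

With sent = 990 and poll = 1000 this gives 10 ticks, and it still gives 10 when the counter wrapped in between (sent = 0xFFFFFFFB, poll = 5). If sent is stamped just one tick after poll, however, the result is 0xFFFFFFFF ticks and the datagram appears to have timed out by an enormous margin.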

I removed that change because I don't think even in light of patch #16 it is
useful to test QUEUED datagrams for timeout, as by existing definition they
cannot start to time out until they are SENT.  (Yes, patch #16 may extend
the time they're queued for another cycle, but I don't think that is
important in practice unless they can somehow get "lost" and never sent,
which did not appear to be the case.)

In any case, if you want to be able to time out QUEUED datagrams as well,
you'll have to set the _sent time to the time the datagram is queued, not
the time that it is actually sent.  (Or possibly both, depending on how you
want the existing timeout values to be affected.)

> > 28-dictionary-fetched:
> >   Also added to ec_slave_info_t (for ecrt_master_get_slave).
> >   Bumped EC_IOCTL_VERSION_MAGIC to allow userspace apps to detect
> > incompatible versions.
> 
> OK, but I think you should also add the line to ecrt_master_get_slave in
> master/master.c (not only to ecrt_master_get_slave in lib/master.c).

Fair point.  I will do that.

(Incidentally, while testing another issue that I'll make a post about
shortly, I found it useful to be able to detect when SDO reading of a
particular slave was in progress in addition to when it was completed, so I
extended this a little.  I'm not currently planning to keep those extra
changes though.)

> > 30-debug-level:
> >   New patch; alters the debug level checks from patch 11 to level 2
> > from level 1 for the messages printed in the common case, to reduce
> > log flooding at level 1.  (I made this a separate patch because the
> > original behaviour can be useful when testing.)
> 
> In my version, I made this change in #11 directly. (When testing, you
> can increase the debug level; having more different versions of the
> patches makes them even more unmaintainable than they apparently already
> are.)

I meant that given how much more verbose debug level 2 is than 1, it can be
useful when testing the behaviour of patch #11 itself to have it print the
information at debug level 1.  But then once you're happy that it's working
as expected, the "normal case" messages should be moved to level 2 to make
level 1 more useful.  That's why I put it into separate patches, so that the
intermediate state can be tested.  (You may note that I left some of the
messages at level 1 still, since they indicate that something unusual
happened and this is useful to know in operation.)  Maybe we just need some
additional debug levels in the end. :)

I already have the issue at debug level 1 that the kernel log's ringbuffer
sometimes fills up and messages get dropped.  Trying to make sense of things
at debug level 2 where that would happen even more frequently would quickly
be too painful to be viable.



Re: [etherlab-dev] Support for multiple mailbox protocols

2014-06-29 Thread Gavin Lambert
On 28 June 2014, quoth Jun Yuan:
> Are you sure that some slaves will choke with multiple CoE requests?
> Do these slaves then support simultaneous mailbox requests in 
> different protocols, i.e. CoE and EoE, or CoE and SoE in parallel? 
> Do we always need to wait until the slave has a response mail for 
> the last mailbox request, before another mailbox request can be 
> sent? I didn't find the answer in the ethercat documentation yet. 

As far as I am aware, all of those things are unspecified and left up to the 
actual slave implementation, so the answers may vary.

But for what it's worth, the example slave code that I've seen lets the vendor 
choose whether to implement internal send/receive queues (such that the mailbox 
accepts subsequent requests before finishing processing, and will *typically* 
respond in order -- but asynchronous replies such as CoE emergencies and EoE 
packets can be injected at any time), or whether to save memory and skip the 
queues (in which case only one thing can be processed at a time).

Really it mostly depends on whether the slaves internally process mailbox 
requests synchronously or asynchronously, which is also left up to the vendor 
to decide.  And again, looking at slave example code suggests that most 
commonly processing is synchronous, but in some cases there is some 
asynchronous plumbing that supports only one pending conversation per protocol.

The master can sort of tell the difference between these cases; a slave without 
queues will clear the send mailbox (allowing a subsequent send to succeed) 
while it is processing a request, but until it finishes and fetches the next 
request any further attempts to send will fail (as the send mailbox is still 
full).  Conversely a slave that implements a queue is likely to clear the send 
mailbox fairly quickly and repeatedly (even if it's still synchronously 
processing only the first request) and might eventually either stop pulling 
requests from the send mailbox or pulling them anyway and sending error 
responses (out of order relative to processing), if the master is sending 
requests faster than they can be queued.

I *believe* that in general it is only safe to rely on having one in-progress 
conversation per protocol, such that the protocol can be used to determine 
which conversation thread the reply applies to, but that the master must be 
prepared to accept specific protocols in a different order to when they were 
sent (and again, asynchronous replies at any time).
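
A minimal sketch of that "one conversation per protocol" rule, assuming the protocol is identified by the 4-bit type field of the mailbox header; mbox_router_t, router_begin and router_reply are hypothetical names and nothing here is taken from the master or from Knud's patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PROTO_COUNT 16 /* assumes a 4-bit protocol type field */

typedef struct {
    int pending[PROTO_COUNT]; /* 1 while a conversation is in progress */
} mbox_router_t;

/* Start a conversation. Returns 0 if one is already in progress for
 * this protocol, so the caller has to wait instead of sending. */
static int router_begin(mbox_router_t *r, uint8_t proto)
{
    assert(r != NULL && proto < PROTO_COUNT);
    if (r->pending[proto])
        return 0;
    r->pending[proto] = 1;
    return 1;
}

/* Route a reply by protocol. Returns 1 if it completes a pending
 * conversation, 0 if it is unsolicited (an asynchronous reply). */
static int router_reply(mbox_router_t *r, uint8_t proto)
{
    assert(r != NULL && proto < PROTO_COUNT);
    if (!r->pending[proto])
        return 0;
    r->pending[proto] = 0;
    return 1;
}
```

Because replies are matched by protocol rather than by arrival order, it does not matter which of two different protocols the slave answers first.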

It's likely that many slaves will be able to cope with receiving multiple CoE 
conversations in parallel (even when internal processing is only synchronous), 
but then the master logic required to route the replies to the appropriate FSM 
becomes more complex (in case internal processing is asynchronous).  And I'm 
dubious that this should be relied on by the master, which is why I'm 
uncomfortable with the way this patch would work -- it would try to send both 
requests and then *hope* that the mailbox read lock is acquired by the one that 
the slave responds to first; if it gets it wrong, it will treat both replies as 
failures because they were routed incorrectly.

But no, I can't point to anything concrete in the standards.  This is mostly 
just a feeling, partly based on slave example code.

Regards,
Gavin Lambert



Re: [etherlab-dev] Support for multiple mailbox protocols

2014-06-25 Thread Gavin Lambert
On 26 June 2014, quoth Knud Baastrup:
>> Additionally it doesn't look like you have any protection against 
>> concurrent CoE access (which TBH I'm not entirely sure whether this 
>> occurs, but Frank's patch 27 suggests it does), and I'm definitely 
>> not a fan of allocating/deallocating memory on each mailbox transfer, 
>> which is what it looks like you're doing.
>
> I believe that the check_mbox flag should work ok for concurrent CoE 
> access as well (however, I also cannot see how this can happen) 
> as the check_mbox flag will ensure only one ongoing read request per 
> slave no matter which mailbox protocol.

The issue, as I understand it, is that both fsm_master and fsm_slave have
their own separate fsm_coe instances.  (Several other state machines have
references to an fsm_coe but it's always handed down from one or the other
of these parents.)  So it's just a question of whether fsm_master and
fsm_slave can execute (their CoE related parts) concurrently or not.  Which
I'm not entirely certain about from looking at the code, but I should add
that after adding Frank's coe-lock patch I have observed cases where it has
reported concurrent CoE access.  (I haven't been able to get it to happen in
my bench testing but it has occurred in field tests; as a result I'm not
sure exactly where it's coming from.)

If there really is concurrent CoE going on, it's not a good idea to send two
CoE requests in parallel to the same slave -- some slaves can cope with that
(and send both replies) but some may choke, and the order in which they
reply is not guaranteed.  So for one thing, your patch doesn't attempt to
control sending, only receiving; this could result in both requests being
sent, but the FSM that "wins" the check-lock might not be the one whose
answer first arrives.  And the non-atomic check I mentioned before could
result in both checks being active at once if they're coming from separate
threads (which is less likely than sequentially concurrent access, but if
you didn't want to protect against threads you wouldn't have used a lock).

> Each fetch data datagram is already allocating memory corresponding to 
> the mailbox size so memory allocation is already heavily used.

I don't believe so.  Each time it does call ec_datagram_prealloc, yes, but
this will only allocate memory if the datagram isn't already large enough;
it might take a few calls to fully expand the whole datagram ring buffer but
after that it should be able to exchange datagrams of any equal or smaller
size without reallocation.  (That's why it's a prealloc, not an alloc.)
Conversely your version will always free/realloc on every transfer, which is
what I'm objecting to.
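
The grow-only behaviour I'm describing looks roughly like this. It is a userspace sketch using realloc as a stand-in for the kernel allocator, and buffer_t, buffer_prealloc and buffer_clear are hypothetical names rather than the actual ec_datagram_prealloc implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    unsigned char *data;
    size_t mem_size;      /* bytes currently allocated */
    unsigned allocations; /* counts real allocations, for illustration */
} buffer_t;

/* Ensure the buffer holds at least "size" bytes. Memory is only
 * allocated when the existing buffer is too small.
 * Returns 0 on success, -1 if the allocation failed. */
static int buffer_prealloc(buffer_t *buf, size_t size)
{
    unsigned char *data;

    assert(buf != NULL);
    if (size <= buf->mem_size)
        return 0; /* already large enough: no allocation */
    data = realloc(buf->data, size);
    if (data == NULL)
        return -1;
    buf->data = data;
    buf->mem_size = size;
    buf->allocations++;
    return 0;
}

/* Release the buffer when it is no longer needed. */
static void buffer_clear(buffer_t *buf)
{
    assert(buf != NULL);
    free(buf->data);
    buf->data = NULL;
    buf->mem_size = 0;
}
```

After the first transfer of the largest size, repeated transfers of equal or smaller size cost no further allocations.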

Regards,
Gavin Lambert



Re: [etherlab-dev] ethercat-1.5: Various issues

2014-06-25 Thread Gavin Lambert
On 26 June 2014, quoth Frank Heckenbach:
> OK. (I suppose you mean ec_slave_datagram_from_buffer instead of
> ec_slave_buffer_to_datagram.)

Yes, sorry.

> - Switching from wrap-around to real arithmetic makes things clearer
>   (calculating with wrap-around is very confusing; writing them out
>   allows for normal mathematical reasoning):

It is indeed very confusing, but your explanation makes sense.  Thank you.

> - In your previous mail you said the timeout was 790269982us most
>   of the time?! You must have done something different there,
>   because with the above assumptions (using jiffies, HZ=250, 32 bit)
>   the maximum possible value of that calculation is
>   (2^32 - 1) / 250 = 17179869.

I did see both values; I don't recall at the moment exactly which
circumstances triggered each one.

> But I'm not sure how this can happen: Either due to a race condition
> e.g. with ecrt_master_send called by ec_cdev_ioctl_send or
> ec_master_send_ext etc. (which may have been the case when I made this
> particular change; I don't remember if I fixed the locking bugs before
> or after it; but you have applied my locking patches and done proper
> locking in your application, haven't you?) or indirectly through
> device->poll() (that's some complicated code paths, can't see it right
> now). To check the latter, you could debug-print jiffies again after
> device->poll() in ec_device_poll.

I have not applied your locking patches as yet, partly because the 1.5.2
code seems quite different and I'm not sure how to correctly apply them, and
because most of them seem to be focused on locking for RTAI applications,
and (at least at the moment) I'm using a userspace app, which has a somewhat
different locking model.

I've also left out most of your EoE-focused patches, because I don't have
any EoE devices at the moment so I couldn't test them anyway.

> If it's something else (which I don't see ATM), you could make it wrap-
> around-safe by changing
> 
>   jiffies_poll > jiffies_sent
> 
> to
> 
>   jiffies_poll - jiffies_sent > timeout_jiffies
> 
> and likewise with cycles. (But I wouldn't recommend this without finding
> the root cause of the problem.)

That's exactly what the existing (pre-patch) code already does though, which
is why I said that your added check was redundant with the wraparound-safe
version.  (Removing it did, however, cause those weird timeout warnings,
until I made those other changes I've already mentioned, and detailed
below.)

> One reason would be patch #16 which means datagrams can be queued and
> not sent for a while (instead of corrupting other frame data).
> If so, the change would be in the wrong patch, of course. I'm not 100%
> sure at the moment if something in #11 doesn't also require it, but I
> think #16 is reason enough, so I don't have to check #11 in detail, do
> I?

It didn't seem to me that #16 needed it, but maybe I missed something.

I've attached the complete set of your patches that I currently have
applied, including my modifications both for bringing them up to 1.5.2 and
to try to resolve some of the issues that I encountered, along with a few
cosmetic changes (mostly spacing).  It's possible that one of the patches I
didn't apply contained the "missing link", but they did seem reasonably
consistent on a read-through at least.

The differences from your patches:

04-string-download:
  Just cosmetic.

07-sdo-up-download:
  Also made similar changes to ecrt_master_sdo_download_complete.

08-mrproper:
  No changes.

09-mailbox-tag:
  Temporarily added a definition of EC_MBOX_NO_PROTOCOL to allow it to
compile without future patches.

10-mailbox-allocate-buffer:
  Removed the temporary definition since this patch has the "real" one.

11-mailbox-buffer:
  Changed ec_slave_datagram_to_buffer and ec_slave_datagram_from_buffer to
use uint8_t protocol.
  When fetching a mailbox response from the buffer, sets the sent/received
timestamps to {cycles,jiffies}_poll instead of the current time. [Partially
fixes an issue with invalid values printed during timeout checks.]
  Omits your change to time out QUEUED datagrams as well as SENT ones.
[Remainder of fix for invalid timeout values.]
  Omits {cycles,jiffies}_poll > {cycles,jiffies}_sent comparison when
checking timeouts.  [Not wraparound-safe.]
  Removed declaration of last_index variable, as not actually used up to
this patch.

12-fetch-check:
  Just moved due to 1.5.0/1.5.2 differences.

13-send-retry:
  Just cosmetic.

14-index-reuse:
  Added back declaration of last_index variable, as it's needed now.
  Otherwise just cosmetic/version differences.

16-frame-corruption:
  Just cosmetic/version differences.

25-output-stats:
  Just version differences.

26-clear-mailbox:
  Just cosmetic/version differences.

27-coe-lock:
  Just version differences.

28-dictionary-fetched:
  Also added to ec_slave_info_t (for ecrt_master_get_slave).
  Bumped EC_IOCTL_VERSION_MAGIC to allow userspace apps to detect
incompatible versions.

29-init

Re: [etherlab-dev] i met problems when i try to communicate with slaves.

2014-06-25 Thread Gavin Lambert
On 25 June 2014, quoth taotao:
> I am trying to build EtherCAT on Ubuntu. The EtherCAT master has now been 
> successfully set up.

These sorts of questions really belong on the users list.

> Q1: How can I find a slave's alias and position (ec_slave_config_t)?
> I tried the command "/opt/xxx/ethercat sl -v", but it does not show this 
> information.  

It does.  If you run "ethercat slaves" without the -v, you'll see a number on 
the left (this is the absolute ring position, starting from 0) and in the 
second column a N:M pair (this is the relative position from the nearest 
previous device with an alias; if it's showing :0 then that's the alias for the 
device itself).

> Q2: If I have only one slave (ET1100), then I need just one slave alias and 
> position, right?
> The example C file always has four defines (slave alias and position).

You can address any slave either as "0, absolute" or as "alias, relative".  
Both work; which one you use just depends on what the requirements of your 
application and network configuration are.
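
As a rough illustration of how the two addressing forms map onto ring positions, here is a small sketch. resolve_position is a hypothetical helper, not part of the master's API, and it assumes that an alias of 0 means "no alias".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Resolve an (alias, position) pair to an absolute ring position.
 * aliases[i] is the alias of the slave at ring position i (0 = none).
 * With alias 0 the position is absolute; otherwise it is relative to
 * the slave carrying that alias. Returns -1 if nothing matches. */
static int resolve_position(const uint16_t *aliases, size_t count,
                            uint16_t alias, uint16_t position)
{
    size_t i, base = 0;

    assert(aliases != NULL || count == 0);
    if (alias != 0) {
        for (i = 0; i < count; i++) {
            if (aliases[i] == alias)
                break;
        }
        if (i == count)
            return -1; /* no slave carries this alias */
        base = i;
    }
    if (base + position >= count)
        return -1; /* beyond the end of the ring */
    return (int)(base + position);
}
```

On a ring of four slaves where only the third has alias 5, both (0, 3) and (5, 1) resolve to the last slave.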

> Q3: struct ec_pdo_entry_reg_t:
> What is the slave alias address? Is it the XML keyword "startaddress" 
> (from Beckhoff's slave configuration XML)?
> What is the slave direction? How can I get this information?

No.  You need to specify which PDOs your application is interested in 
retrieving.  You should already know that from the slave's datasheet, or you 
can look at the defaults using the "ethercat pdos" command.



Re: [etherlab-dev] Support for multiple mailbox protocols

2014-06-24 Thread Gavin Lambert
Hi Knud,

I haven't reviewed or tested the whole patch yet, but I like the idea in
concept.  One thing that made me pause though is the way that the check_mbox
flag is handled.  You have it protected by a semaphore (implying you're
expecting it to get used concurrently), but you have non-atomic
test-and-then-set actions which will mean that if concurrent access is
attempted multiple users might call prepare_check and check_mbox_set
(causing some of them to be "lost", which may or may not be a problem), and
then on failure one might call check_mbox_clear while another is still
waiting for a check to occur (which seems like it would be a problem).

 

Again, I haven't followed the logic all the way through yet so possibly this
isn't a real problem, but it bothers me. :)

 

If the intent is to have only one concurrent state machine trigger a check
datagram at a time (which it seems like it is), then you should probably be
using an atomic test-and-set operation instead.
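
A minimal userspace sketch of that idea, using C11 atomics. The struct and function names are hypothetical and this is not code from the patch; in the kernel the equivalent would more likely be a bit operation such as test_and_set_bit().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-slave state for this sketch. */
typedef struct {
    atomic_flag check_mbox;   /* set while a mailbox check is in flight */
} mbox_state_t;

void mbox_state_init(mbox_state_t *s)
{
    atomic_flag_clear(&s->check_mbox);
}

/* Returns true only for the single caller that flips the flag from clear
 * to set; every concurrent caller gets false and must not start a check. */
bool mbox_check_try_begin(mbox_state_t *s)
{
    return !atomic_flag_test_and_set(&s->check_mbox);
}

/* Called by the owner of the check once it has completed or failed. */
void mbox_check_end(mbox_state_t *s)
{
    atomic_flag_clear(&s->check_mbox);
}
```

Because the test and the set happen in one step, only the state machine that actually started the check ever clears the flag, which avoids one user clearing it while another is still waiting.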

 

Additionally it doesn't look like you have any protection against concurrent
CoE access (which TBH I'm not entirely sure whether this occurs, but Frank's
patch 27 suggests it does), and I'm definitely not a fan of
allocating/deallocating memory on each mailbox transfer, which is what it
looks like you're doing.

 

Regards,

Gavin Lambert

 

From: etherlab-dev-boun...@etherlab.org
[mailto:etherlab-dev-boun...@etherlab.org] On Behalf Of Knud Baastrup
Sent: Wednesday, 25 June 2014 01:13
To: etherlab-dev@etherlab.org
Subject: Re: [etherlab-dev] Support for multiple mailbox protocols

 

Hi!

I just discovered that the provided patch included a hardcoded mailbox size,
which I have now replaced with a dynamically allocated buffer. I have attached
a new patch (ethercat_152_stable_mailbox_1.patch) that fully replaces the
prior patch (ethercat_152_stable_mailbox.patch).

Thanks,

Knud Baastrup

 

 

From: etherlab-dev-boun...@etherlab.org
[mailto:etherlab-dev-boun...@etherlab.org] On Behalf Of Knud Baastrup
Sent: 23. juni 2014 14:27
To: etherlab-dev@etherlab.org
Subject: [etherlab-dev] Support for multiple mailbox protocols

 

Hello Florian, Gavin, Frank (and others facing the lack of support for
multiple mailbox protocols),

Like Frank Heckenbach and Gavin, I have also struggled with the lack of
support for multiple mailbox protocols, and I have come up with an
alternative solution to the one provided by Frank in patches 9, 10 and 11.

I have attached the patch, which is based on the stable-1.5 branch. The patch
should support all the mailbox protocols, but has only been tested with CoE,
EoE and FoE.

I will try to summarize the patch in a few lines:

In this patch I accept that a mailbox read request (e.g. FP-RD) for a given
mailbox protocol can return data from any other mailbox protocol running at
the same time. The data returned by a read datagram is therefore stored in a
separate buffer for each mailbox protocol instead of the datagram data
buffer. The mailbox state machines check and fetch the data from their
own buffer instead of the datagram buffer (which is no longer used for
mailbox read data). A check_mbox flag is introduced to track when a given
slave has an ongoing mailbox read request. In the normal case, when no
mailbox read request is ongoing, the mailbox state machine runs as before.
If a mailbox read request is ongoing (check_mbox flag is set), the state
machine checks its own mailbox buffer (as the ongoing request might have
returned its data); if that buffer is still empty, it waits until the read
request is done and it gets the opportunity to reserve the mailbox for its
own read request.
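
The buffering scheme described above can be sketched roughly as follows. This is not the code from the patch: all names and the buffer size are hypothetical, and it assumes the protocol type sits in the low four bits of the sixth byte of the 6-byte mailbox header (0x02 for EoE matches the check in ethernet.c; 0x03 for CoE is an assumption).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MBOX_HEADER_SIZE 6
#define MBOX_PROTO_COUNT 16     /* the type field is 4 bits wide */
#define MBOX_BUF_SIZE    256    /* arbitrary size for this sketch */

typedef struct {
    uint8_t data[MBOX_BUF_SIZE];
    size_t  size;               /* 0 means "nothing pending" */
} mbox_buffer_t;

typedef struct {
    mbox_buffer_t per_protocol[MBOX_PROTO_COUNT];
} mbox_store_t;

/* File a received mailbox frame under its protocol, whoever asked for it.
 * Returns the protocol type, or -1 if the frame cannot be stored. */
int mbox_store_put(mbox_store_t *store, const uint8_t *frame, size_t frame_size)
{
    if (frame_size < MBOX_HEADER_SIZE || frame_size > MBOX_BUF_SIZE)
        return -1;

    uint8_t proto = frame[5] & 0x0F;
    mbox_buffer_t *buf = &store->per_protocol[proto];

    memcpy(buf->data, frame, frame_size);
    buf->size = frame_size;
    return proto;
}

/* Called by a protocol state machine to collect its own data.
 * Returns the number of bytes copied, or 0 if nothing is pending. */
size_t mbox_store_take(mbox_store_t *store, uint8_t proto,
                       uint8_t *out, size_t out_size)
{
    mbox_buffer_t *buf = &store->per_protocol[proto & 0x0F];
    size_t n = buf->size;

    if (n == 0 || n > out_size)
        return 0;

    memcpy(out, buf->data, n);
    buf->size = 0;              /* mark the buffer as consumed */
    return n;
}
```

With this layout an EoE read that happens to fetch a CoE response no longer drops it; the CoE state machine finds the data in its own slot on its next pass.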

 


Venlig hilsen / Best regards, 

Knud Baastrup 
DEIF Wind Power Technology
SW Developer

Direct.: +45 9614 8458
E-mail:  <mailto:k...@deif.com> k...@deif.com



Re: [etherlab-dev] ethercat-1.5: Various issues

2014-06-17 Thread Gavin Lambert
A few hours ago, quoth I:
> So I thought I'd try asking Florian which method would be preferred for
> submitting code changes (ie. which would be most likely to actually get
> merged to mainline):
>
> 1. Stick with posting patches to the ML and hope they don't fall through
> the cracks.
> 
> 2. Fork, branch, commit, merge-request.  (At least branches can be
> closed later to reduce the clutter a little, although that doesn't help
> SF's repository browser at the moment.)
> 
> 3. Fork, tag, commit, merge-request.  (Tags can kinda be deleted but
> they seem a little awkward for this workflow since they require extra
> commits to add/remove/update.)
> 
> 4. Fork, bookmark, commit, mention on ML (because SF's merge-requests
> don't handle bookmarks yet).  (From what I can tell, bookmarks are the
> closest Mercurial equivalent to branches that can be deleted after their
> changes are merged, but SF's UI doesn't yet know they exist.)
> 
> 5. Something else.  (hg bundle maybe?)

6. Fork, commit, merge-request.  (ie. an entire fork/clone per "feature", no
explicit branching)

This actually seems like the easiest model using the existing tools, and
seems to be recommended by quite a few people.  But it makes me cringe
inside.  Especially if you want to submit multiple independent features.




Re: [etherlab-dev] ethercat-1.5: Various issues

2014-06-17 Thread Gavin Lambert
A few days ago, quoth I:
> I'm planning to set up a forked repository on SF consisting of the
> current 1.5.2 plus several of the patches I've submitted in the past, 
> in the hopes that maybe it'll be easier for IgH to do an hg pull 
> rather than applying a patch from a mailing list

I've been playing around with it for a little bit, but I'm not sure the
merge-request tools on SF are really up to the job yet.

Also I'm more familiar with Git than Mercurial, and the Git code-sharing
pattern of "fork, branch, commit, pull request, delete" doesn't seem to map
well to Mercurial, since it has very permanent branches that refuse to die
after they're merged.

So I thought I'd try asking Florian which method would be preferred for
submitting code changes (ie. which would be most likely to actually get
merged to mainline):

1. Stick with posting patches to the ML and hope they don't fall through the
cracks.

2. Fork, branch, commit, merge-request.  (At least branches can be closed
later to reduce the clutter a little, although that doesn't help SF's
repository browser at the moment.)

3. Fork, tag, commit, merge-request.  (Tags can kinda be deleted but they
seem a little awkward for this workflow since they require extra commits to
add/remove/update.)

4. Fork, bookmark, commit, mention on ML (because SF's merge-requests don't
handle bookmarks yet).  (From what I can tell, bookmarks are the closest
Mercurial equivalent to branches that can be deleted after their changes are
merged, but SF's UI doesn't yet know they exist.)

5. Something else.  (hg bundle maybe?)

Another possibility might be to use the SF-provided Wiki pages or Tickets to
keep track of contributed patches, although that might require some
permission changes to let non-members edit the wiki page.




Re: [etherlab-dev] ethercat-1.5: Various issues

2014-06-15 Thread Gavin Lambert
On 13 June 2014, quoth I:
> Ah.  I found a spot in ec_master_queue_datagram where I had incorrectly
> applied patch 11 (and jiffies_sent would have been 0).  I've been
> sidetracked a little and haven't had a chance to re-test this, but I
> expect it will solve the issue; thanks for the hint!

Unfortunately that wasn't it.  But the bisection search I did definitely
suggested it was something in this patch that introduced the behaviour.

I did make some other intentional changes from the patch:
  - I changed the "protocol" parameter of ec_slave_datagram_to_buffer and
ec_slave_buffer_to_datagram to uint8_t from uint16_t (since that seems more
consistent).
  - I left out the cycles_poll > cycles_sent and jiffies_poll > jiffies_sent
checks in the timeout checking, since as I noted before these would not be
safe against wraparound.
  - Some of the logging levels were changed.
  - Otherwise it's only minor formatting changes and nothing that should
affect functionality.

If I put the jiffies_poll > jiffies_sent check back in, I do not get these
17171869us timeouts; but I'm unconvinced it's safe to leave this check in.

Incidentally, reversing the calculation:
  time_us = (unsigned int)((jiffies_poll - jiffies_sent) * 1000000 / HZ);
where time_us = 17171869 and HZ = 250, suggests that (poll-sent) is -2
jiffies (17171869 * 250 / 1000000 ~= -2), but then plugging that forwards
through the formula I'm not sure why time_us != 4294959296 (aka -8000)
instead.  (I did notice that (uint32_t)-2 * (uint64_t)(1000000 / 250) =
17179869176000, which at least has the right digits but they're at the wrong
magnitude.)  I suspect I'm missing something fundamental (and obvious).
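
One possible reading of these numbers, offered as a sketch rather than a confirmed diagnosis: if the jiffies values are 32 bits wide on the target and the multiplier is 1000000 (microseconds per second, as the 17179869176000 figure implies), then the product wraps modulo 2^32 before the division by HZ takes place. The function names and HZ_SKETCH below are made up for the illustration.

```c
#include <stdint.h>

#define HZ_SKETCH 250u   /* stands in for the kernel's HZ in this sketch */

/* All-32-bit arithmetic: the multiplication wraps modulo 2^32 and only
 * then is the wrapped product divided by HZ. */
uint32_t elapsed_us_32(uint32_t jiffies_poll, uint32_t jiffies_sent)
{
    return (jiffies_poll - jiffies_sent) * 1000000u / HZ_SKETCH;
}

/* Widening the difference first keeps the full product. */
uint64_t elapsed_us_64(uint32_t jiffies_poll, uint32_t jiffies_sent)
{
    return (uint64_t)(jiffies_poll - jiffies_sent) * 1000000u / HZ_SKETCH;
}
```

With poll two jiffies behind sent, the difference is 4294967294; times 1000000 modulo 2^32 that becomes 4292967296, and dividing by 250 gives 17171869, which is the timeout value seen in the log. The 64-bit variant yields 17179869176000 for the same inputs.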

In any case, I changed the code where it assigned the current time to
datagrams pulled out of the buffer to assign {cycles,jiffies}_poll instead,
and that seemed to resolve the issue without the need for the dubious
comparison.  Do you agree that this is a reasonable solution?

> (Part of the side-track suggested that patch 26 might not be sufficient
> to solve that problem, but I haven't confirmed that yet, and it'll
> probably be a few days before I get a chance to check it again.  And of
> course it's possible that this was just another error on my part, or
> affected by the above goof.)

This did work in the end; when it was appearing not to work, it was on a
bisection build that didn't have that patch applied yet.




Re: [etherlab-dev] ethercat-1.5: Various issues

2014-06-11 Thread Gavin Lambert
Quoth Frank Heckenbach:
> Perhaps you can at least make sure they compile with the new version.
> Ultimately, Florian will have to integrate the patches into the
> development sources, or not.

Well, they won't all merge/compile cleanly without some changes.  (Also
there's a few cases where variables introduced in one patch aren't used
until a later patch, and the reverse, so some of the intermediate states
won't compile.)  But *most* of the changes are fairly minor.

> Indeed, it doesn't, but the development code as obtained by the "hg
> clone" command as listed on http://etherlab.org/de/ethercat/index.php
> does. Though it uses master->io_mutex instead of master->io_sem now.
> From what I can tell (seeing that 1.5.0 doesn't have io_mutex at all), an
> intermediate version introduced the code with io_sem (which I then took)
> and was later changed to io_mutex along with other changes. (This all
> might not be very relevant except to explain where I got the code from;
> otherwise you can treat it like it was one of my patches, I guess.)

I'm not really sure what's going on with the default branch, but as best I
can tell it's outdated and should be ignored.  All the new changes are on
the stable-1.5 branch.  (There hasn't been anything committed to "default"
since 2011.)

> I admit I'm not very proficient with hg, so I probably mixed up the
> commits. I'd have to read up on it and dig deeper, maybe you're faster
> at it. In any case, the cloned code (see above) does contain lock_cb and
> unlock_cb in place of send_cb and receive_cb as in 1.5.0 and 1.5.2. So I
> figure, Florian made this change, but hasn't pushed it into a release
> yet.

I haven't traced the history, but I suspect that (as this was on the default
branch), this was the code before send_cb and receive_cb were introduced in
the first place.  So changing it back would presumably be a regression.  (I
don't have particularly strong feelings about it though because I'm not
using RTAI.)

> I don't know about your application, but in my case, I only want to run
> with the correct (configured) number of slaves anyway.
> Therefore, in my cyclic code I always call ecrt_master_state and check
> that in the result link_up is set and slaves_responding is equal to the
> number I expect; otherwise I abort immediately because something's
> fishy.

Mine is a bit more flexible; I'm trying to get it to support hotplugging as
cleanly as possible if I can, along with a bit of autodetection.

> ec_fsm_coe_exec could return 0 in 1.5.0 already. The return value is
> only used by those callers that go on to do other things after calling
> it (to return early if 0); those that call it at their end don't (need
> to) check the result, and I think that's ok.
> 
> One of my changes was that ec_fsm_master_exec doesn't always return
> 1 after executing a state (it still does in 1.5.2). It's a consequence
> of my previous changes: If CoE is reserved by the slave FSM, the master
> FSM must wait. In order to do that I set datagram->state to
> EC_DATAGRAM_INVALID, and prevent it from being queued. Returning 0 from
> ec_fsm_master_exec is an easy way to achieve this (for both callers)
> since the 0 return (the first branch in ec_fsm_master_exec) was already
> there in case the FSM is still waiting for a reply and doesn't execute
> a state at all (and therefore obviously doesn't send a new datagram
> either).

But in 1.5.2 ec_fsm_coe_exec has "datagram_used" and could return 0 when the
datagram was in INIT, QUEUED, or SENT, which I assumed would occur in some
of the middle states while it was waiting for something to happen, which
didn't sound that different from what you were trying to do.  But you're
right that previously this didn't make any difference to what
ec_fsm_master_exec returned, which seems a little odd.  I wonder if this
might be why I sometimes see "datagram initialised" errors.




Re: [etherlab-dev] ethercat-1.5: Various issues

2014-06-10 Thread Gavin Lambert
On Friday, 6 June 2014 02:23, quoth Frank Heckenbach:
> I attach my complete set of patches, including the patches I've sent in
> previous mails (02-* to 08-*, slightly adjusted;
> 01-ethercat-1.5-header.patch was applied in your code already).

I've been experimenting with merging most of these into my local code
(ignoring some of the EoE-specific patches as I don't have any way to test
that at the moment), with the intent to re-release as 1.5.2-compatible
patches once it seems to be behaving itself.  While merging I came across a
few oddities that I was hoping you might be able to clarify:

>(09-ethercat-1.5-mailbox-tag.patch and
>10-ethercat-1.5-mailbox-allocate-buffer.patch contain the boring
>parts (preparations, new data structures) that shouldn't change
>the behaviour. The main change is in
>11-ethercat-1.5-mailbox-buffer.patch.)

In patch 11, you have a change to master.c that checks for jiffies_poll >
jiffies_sent in addition to the original (jiffies_poll - jiffies_sent) >
timeout_jiffies.

What was the motivation for this?  It doesn't seem like it will be
wraparound-safe, and the safe version of it would be redundant with the
original code.
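
To make the wraparound concern concrete, here is a small sketch with 32-bit counters. The function names are hypothetical and the code is not taken from master.c or from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Original style: the unsigned difference is still correct when the
 * counter wraps between `sent` and `poll`. */
bool timed_out(uint32_t poll, uint32_t sent, uint32_t timeout)
{
    return (uint32_t)(poll - sent) > timeout;
}

/* With the extra ordering check: once the counter has wrapped, `poll` is
 * numerically smaller than `sent`, so a real timeout is never reported. */
bool timed_out_guarded(uint32_t poll, uint32_t sent, uint32_t timeout)
{
    return poll > sent && (uint32_t)(poll - sent) > timeout;
}
```

For sent = 0xFFFFFFF0, poll = 0x00000010 and a timeout of 10 ticks, 32 ticks have really elapsed: the first form reports the timeout and the guarded form does not. The flip side is that the first form also reports a timeout when poll is genuinely a tick or two earlier than sent, since the difference is then a huge unsigned value.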

>After I backported code from your repository to add locking in
>ec_master_clear_slaves()
>(18-ethercat-1.5-locking-fix-backport.patch)

I'm curious where this was backported from.  The current 1.5.2 code doesn't
have this.

>I see that in newer versions (e.g. commit 53b5128e1313), you
>apparently reverted the callback mechanism from send/receive
>callbacks back to lock/unlock callbacks as it was in 1.4. I also
>prefer the latter since they can be used more generally.

The specified commit does not seem to be related to callbacks, was made in
2011, and the latest 1.5.2 code still has send/receive callbacks.  So this
confuses me.

>  To avoid this, I fetch the mailbox once before using it for the
>  first time, ignoring any result, whether empty or not.
>  (26-ethercat-1.5-clear-mailbox.patch)

In this patch, you log a message saying that data was cleared if the fetch
datagram working counter != 1.  Shouldn't that be != 0 or == 1 instead?
AFAIK the working counter will be 0 if the mailbox is already empty and 1 if
it fetched and discarded the mailbox contents.

Also, this blindly clears the mailbox whenever the slave is rescanned.  It's
possible for the slave to be rescanned during operation (eg. if the number
of responding slaves on the network changes).  I'm not sure if this will
have any negative consequences for pending CoE/FoE/EoE requests (and
presumably unsolicited EoE received packets) or if these are just
abandoned/reset on rescan anyway (which might also be a problem, but at
least not a new one).  I haven't looked closely enough at the code in
question to be sure.

>  The next problem then is that some code (e.g.
>  ec_fsm_master_exec()) just assumes that the FSM has a datagram to
>  send out in every state, so it always returns 1 unless it's
>  waiting for a reply. With my previous change, this isn't the case
>  anymore, and it cannot be -- unless I'd block the FSM completely
[...]
>  (27-ethercat-1.5-coe-lock.patch)

This one was also a bit tricky.  Since the patch was made, it looks like
ec_fsm_coe_exec had already been changed to include the concept of not
sending a datagram and returning 0 -- except that most of the places that it
gets called just ignore the return value and then return 1 from the
higher-level state machine anyway, and in other places assume that not using
the datagram means that the FSM has completed.  So I'm not sure what's up
with that, although I suspect the original issue you were trying to resolve
remains.




[etherlab-dev] [Patch] User-mode deactivate bug

2014-05-26 Thread Gavin Lambert
Hi,

When using the user-mode library, I found that calling
ecrt_master_deactivate does not properly release the domain memory mapping.
In particular, the following sequence:

 - ecrt_request_master
 - (domain set up)
 - ecrt_master_activate
 - ecrt_master_deactivate
 - (domain set up)
 - ecrt_master_activate
 - ecrt_master_deactivate
 - ecrt_master_release

Does *not* result in actually releasing the master kernel module -- the
master thread remains running and the master module cannot be unloaded until
the application process terminates.

It's especially noticeable when the first domain is non-empty and the second
is empty -- during ecrt_master_release the call to munmap will fail with
EINVAL because master->process_data is non-NULL but
master->process_data_size is 0.

The attached patch resolves this by moving the munmap to
ecrt_master_deactivate instead of ecrt_master_release.
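
The shape of that fix can be modelled in plain userspace C. This is only a stand-in, not the library code or the attached patch: model_master_t and both functions are hypothetical, and an anonymous mapping takes the place of the mapping onto the master device.

```c
#define _DEFAULT_SOURCE     /* for MAP_ANONYMOUS under strict -std modes */
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical stand-in for the library's master handle. */
typedef struct {
    void  *process_data;
    size_t process_data_size;
} model_master_t;

/* Map `size` bytes in place of the real process data mapping. */
int model_activate(model_master_t *m, size_t size)
{
    m->process_data = NULL;
    m->process_data_size = size;
    if (size == 0)
        return 0;                   /* empty domain: nothing to map */

    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        m->process_data_size = 0;
        return -1;
    }
    m->process_data = p;
    return 0;
}

/* Unmap at deactivation, so pointer and size are always released as the
 * matching pair that was set by the corresponding activation. */
int model_deactivate(model_master_t *m)
{
    int ret = 0;

    if (m->process_data != NULL)
        ret = munmap(m->process_data, m->process_data_size);
    m->process_data = NULL;
    m->process_data_size = 0;
    return ret;
}
```

If the unmap is instead deferred to release, a second activation with an empty domain can overwrite the size while the old pointer survives, and munmap() with a length of 0 is then rejected with EINVAL.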

Regards,
Gavin Lambert




lib-deactivate.patch
Description: Binary data


Re: [etherlab-dev] [PATCH] ethercat tool xml command

2014-04-21 Thread Gavin Lambert
On Saturday, 19 April 2014 06:53, quoth Henry Bausley:
> When using the command ethercat xml and a junction box ie. EK1152  or
> likely anything else that does not have syncs the command will report
> incorrectly the subsequent device's slave position and vendor id in hex.
[...]
> -<< in << "  " <<
> endl
> +<< in << "  " << endl

I think you have the patch reversed.




Re: [etherlab-dev] compile and use ethercat driver under linux 3.10.33

2014-04-08 Thread Gavin Lambert
On 9 April 2014, quoth dries geentjens:
> Still, when I try to do "ethercat slaves" for example, the raspberry pi
> immediately gives me the following line: "IOCTL() version magic difference.
> EtherCAT0:28 ethercat:13".

This means that your installed kernel modules do not match your userland
versions.

> The second computer gives "Module ec_master not found. Failed".
> What I saw is that I don't have "ec_master.ko" in
> /lib/modules//ethercat.  This ethercat directory is just empty.
> What could be causing this?

This suggests that you have not installed the kernel modules.

Both of these problems most likely stem from the same source: you are not
building the system properly.  Read the INSTALL file.




Re: [etherlab-dev] Missing mailbox handler

2014-03-31 Thread Gavin Lambert
Hi,

 

I've previously reported this issue and stated in general what needs to be
done to resolve it (essentially, have a central dispatcher that's
responsible for retrieving mailbox contents from slaves and hand them off to
the protocol-specific state machines from there, possibly making use of the
mailbox state PDOs on those slaves that support it), but I don't have any
sort of implementation or plan for this at the moment, and it's low on my
list of things to look at because the only slaves I'm working with at the
moment only support CoE (or only FoE in boot mode), so there's no conflict.

 

Regards,

Gavin Lambert

 

From: etherlab-dev-boun...@etherlab.org
[mailto:etherlab-dev-boun...@etherlab.org] On Behalf Of Knud Baastrup
Sent: Tuesday, 1 April 2014 03:57
To: etherlab-dev@etherlab.org
Subject: [etherlab-dev] Missing mailbox handler

 

Hi

 

I often get the SDO "Input/output error" shown below when using the EtherCAT
command line tool while an EoE handler is running as well.

 

~ # ethercat upload -p1 0xf101 1 -t uint8

Failed to upload SDO: Input/output error

 

 

Below is some debug info from the syslog. It looks like the response to the
SDO upload requested via the EtherCAT command line tool was received by the
EoE handler, which simply dropped the datagram because of the wrong mailbox
protocol.


 

2000-01-06T22:14:11.140760+00:00 PCM51-sn214 kernel: [171153.189812]
EtherCAT DEBUG 0: ecrt_master_sdo_upload(master = 0xc786e000, slave_position
= 1, index = 0xF101, subindex = 0x01, target = 0xc7198720, target_size = 1,
result_size = 0xc7909dc8, abort_code = 0xc7909dcc)

2000-01-06T22:14:11.140845+00:00 PCM51-sn214 kernel: [171153.189865]
EtherCAT DEBUG 0-1: Scheduling SDO upload request.

2000-01-06T22:14:11.140877+00:00 PCM51-sn214 kernel: [171153.191475]
EtherCAT DEBUG 0-1: Processing SDO request...

2000-01-06T22:14:11.140900+00:00 PCM51-sn214 kernel: [171153.191507]
EtherCAT DEBUG 0-1: Uploading SDO 0xF101:01.

2000-01-06T22:14:11.140922+00:00 PCM51-sn214 kernel: [171153.191529]
EtherCAT DEBUG 0-1: Upload request:

2000-01-06T22:14:11.140945+00:00 PCM51-sn214 kernel: [171153.191544]
EtherCAT DEBUG: 00 20 40 01 F1 01 00 00 00 00 

2000-01-06T22:14:11.152670+00:00 PCM51-sn214 kernel: [171153.196476]
EtherCAT WARNING 0-1: Other mailbox protocol response for eoe0s1.

2000-01-06T22:14:11.164709+00:00 PCM51-sn214 kernel: [171153.204544]
EtherCAT ERROR 0-1: Reception of CoE upload response failed: No response.

2000-01-06T22:14:11.164791+00:00 PCM51-sn214 kernel: [171153.212760]
EtherCAT ERROR 0-1: Failed to process SDO request.

 

 

I can see that somebody has added a FIXME comment in the code (ethernet.c),
but I wonder whether anybody has already given some thought to a possible
implementation, or whether anybody is already working on a solution?

 

if (mbox_prot != 0x02) { // EoE FIXME mailbox handler necessary
    eoe->stats.rx_errors++;
#if EOE_DEBUG_LEVEL >= 1
    EC_SLAVE_WARN(eoe->slave, "Other mailbox protocol response for %s.\n",
            eoe->dev->name);
#endif

 

 

BR,  Knud Baastrup



Re: [etherlab-dev] "ecrt_master_write_idn(...)" and "invalid opcode: 0000"

2014-02-17 Thread Gavin Lambert
Quoth Koch Daniel:
> Every time I try to reset a slave error on a Bosch Rexroth servo drive by
> sending an SoE message (S-0-0099) via ecrt_master_write_idn, I am facing a
> kernel issue (see below) on my RTAI system (version 3.6.1 on a patched
> Linux 2.6.24 with ethercat-master 1.5.2). Has anybody ever faced this
> issue as well? Hopefully someone has a hint for getting rid of it.

I can't speak to this exactly, but:

> Feb 13 15:15:20 pc-wt1 kernel: Call Trace:
> Feb 13 15:15:20 pc-wt1 kernel:  [] __kmalloc+0x9c/0xd5
> Feb 13 15:15:20 pc-wt1 kernel:  [] vscnprintf+0x14/0x20
> Feb 13 15:15:20 pc-wt1 kernel:  []
> ec_soe_request_alloc+0x23/0x52 [ec_master]
> Feb 13 15:15:20 pc-wt1 kernel:  []
> ecrt_master_write_idn+0x62/0x2d1 [ec_master]
> Feb 13 15:15:20 pc-wt1 kernel:  [] calc_idn+0x52/0x5c [r7912]

This trace indicates that inside ec_soe_request_alloc, it was trying to call
vscnprintf.  The only path that would do this is when kernel memory
allocation fails.

This suggests that you are either out of memory or something was making it
request an allocation for a silly number of bytes (perhaps an incorrect
parameter value somewhere).

Given that the allocation inside vscnprintf also appears to have failed
(with a crash), it's likely that you are out of memory.




Re: [etherlab-dev] [PATCH] FoE omnibus patch

2014-01-22 Thread Gavin Lambert
Quoth Dave Page:
>  The attached patch against 51ad16e57f8f includes Gavin Lambert's
> FoE patches as follows:
[...]
>  And includes a one-liner PacketNo read busy sequence patch as well
> as a FoE spurious timeout patch.

There's an error in the spurious timeout part of the patch.

In ec_fsm_foe_state_ack_check, the patched code uses datagram->jiffies_received 
instead of fsm->datagram->jiffies_received.  The latter is the received packet, 
while the former is a "recyclable" packet intended for the outgoing request.

(This is actually a regression from commit 8bb574, which fixed the bug a 
different way; so 51ad16 should have worked as-is, although I think I prefer 
use of "time_after" as it keeps the multiplies away.)
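
For illustration, a deadline check in the time_after style could look like the sketch below. The helper mirrors the idea behind the kernel macro (judging by the sign of the difference) but is written out here as a plain function; ack_timed_out and its parameters are hypothetical and are not the code from either commit.

```c
#include <stdbool.h>

/* Same idea as the kernel's time_after(a, b): true if a is later than b,
 * decided by the sign of the difference so counter wraparound is handled. */
static inline bool sketch_time_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* The timeout is converted to jiffies once, up front; each check is then
 * just an addition and a comparison, with no scaling of the elapsed time
 * into milliseconds or microseconds. */
bool ack_timed_out(unsigned long now, unsigned long jiffies_received,
                   unsigned long timeout_jiffies)
{
    return sketch_time_after(now, jiffies_received + timeout_jiffies);
}
```

Because no multiplication by a time unit is involved in the comparison, there is also no intermediate product that could overflow.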


I've attached a modified version of the omnibus patch which fixes that, plus 
some minor reformatting.  I've verified that the code compiles with the patch 
but not against slave hardware as yet.  (Also, my patch was against f8b779, but 
it should still apply cleanly to 51ad16 as that file hasn't been changed since 
then.)


Florian, if you'd prefer that these patches be separated (either as a series or 
independent), let me know; I can do that pretty quickly.



foe_omnibus.patch
Description: Binary data


Re: [etherlab-dev] [PATCH] FoE support for read BUSY (ETG.1020 Section 15)

2014-01-21 Thread Gavin Lambert
Mere moments ago, quoth I:
> Did you see my other patch specifically for FoE busy?  My slave is
> working fine for busy reads with both of my patches applied.  (There was
> a third patch included in the same email as the busy patch, but that's
> optional and just increases debug logging.)
> 
> My patch also fixes handling of error packets.

These are at
http://lists.etherlab.org/pipermail/etherlab-dev/2013/000324.html, if you're
having trouble finding it.

> As for incrementing the packet number or not, it's been a while since I
> looked at that so I don't remember whether it increments or not with my
> patches applied, but I think I remember making it do whatever the SSC
> was expecting.

I had a quick look at my SSC patches, so now I think I remember -- my
version would continue incrementing the packet number (which is partly why I
ran into the 256 packet limit that prompted the other patch).

You're correct that ETG.1020 indicates that the packet number should not be
increasing though; I missed that part.  (It worries me a bit when both
master AND slave code have to be changed to make something fundamental like
this work, and it makes me wonder about existing devices -- but I guess FoE
read is not implemented all that often.)

One of us should probably make a combined patch set that fixes all three
issues, for ease of future players (and maybe even merging, one of these
days).  I'll probably get to it in a few days when I (hopefully) get back to
EtherCAT-related work, if you don't beat me to it. :)




Re: [etherlab-dev] [PATCH] FoE support for read BUSY (ETG.1020 Section 15)

2014-01-21 Thread Gavin Lambert
Quoth Dave Page:
>  A small change was required to make the BUSY OpCode transition to
> the correct state in fsm_foe.c, and have the subsequent FoE Ack.req show
> the correct PacketNo IAW ETG.1020 Section 15 "FoE Extension."
> 
>  The attached patch includes the FoE PacketNo length fix from Gavin
> Lambert

Did you see my other patch specifically for FoE busy?  My slave is working fine 
for busy reads with both of my patches applied.  (There was a third patch 
included in the same email as the busy patch, but that's optional and just 
increases debug logging.)

My patch also fixes handling of error packets.


As for incrementing the packet number or not, it's been a while since I looked 
at that so I don't remember whether it increments or not with my patches 
applied, but I think I remember making it do whatever the SSC was expecting.




Re: [etherlab-dev] Large PDOs

2013-12-11 Thread Gavin Lambert
I'm not sure how the former would help you, since you're still constrained
by the wire delays of the large data block.

 

And the second just sounds like "I want non-standard CoE"... so why not just
use CoE?  It uses zero extra FPGA resources (assuming you already have a
general-purpose CPU) and not all that much software - probably little more
than you'd need to write something custom.  (Though if you want something
even more slimmed down, you can use VoE - but then it might be more of a
hassle to use from the master side.)

 

From: Jeroen Van den Keybus [mailto:jeroen.vandenkey...@gmail.com] 
Sent: Wednesday, 11 December 2013 22:14
To: Gavin Lambert
Cc: etherlab-dev@etherlab.org
Subject: Re: [etherlab-dev] Large PDOs

 

Gavin,

 

 

Thanks for your reply. Currently this slave does not support SDOs since it
is a 'resource constrained' FPGA based design.

 

Both fast and small, and slow and large PDOs are, unfortunately, on the same
slave.

 

I would have liked to stay away from using SDOs in the control process. So
currently I'm considering two options: either bring out some form of freely
programmable datagram service in the API, or construct a mailbox-like
protocol over a limited-size protocol. The latter isn't exactly good for the
'resource constrained' part of my design.

 

 

J.

 

 

J.

 

 

 


___
etherlab-dev mailing list
etherlab-dev@etherlab.org
http://lists.etherlab.org/mailman/listinfo/etherlab-dev


Re: [etherlab-dev] Large PDOs

2013-12-10 Thread Gavin Lambert
How often do you need to access the large value?  If it's at a reasonably
slow rate (and if you have the freedom to change the slave, or at least
unmap the PDO, which it sounds like you do from the below) then you might
want to consider accessing it as an SDO instead.  If it's an array or record
type then you should be able to access it in small enough chunks to not
upset your high-speed domain.  (Provided that the slave supports CoE, of
course.)

 

For the high-speed data, if it's on a separate slave you could consider
using a separate EtherCAT network for it. If it's on the same slave then you
might need to either break up access to the large chunk as above, or batch
up multiple values (assuming it's unidirectional) as in oversampling so that
you can have a slower cycle rate.

 

From: etherlab-dev-boun...@etherlab.org
[mailto:etherlab-dev-boun...@etherlab.org] On Behalf Of Jeroen Van den
Keybus
Sent: Wednesday, 11 December 2013 11:02
To: etherlab-dev@etherlab.org
Subject: [etherlab-dev] Large PDOs

 

Hi,

 

 

I want to use a single, quite large (1,024 byte) PDO.

 

Does anyone know how to specify such a large PDO in the XML description file
/ SII EEPROM content? It seems that only base types are allowed as DataType
in a PDO entry (basically any common data type up to 64 bits). Even if using
64-bit ULINTs, that still means I need 128 PDO entries. That's very unwieldy,
especially since the data only make sense as an array. I would also like to
use a single pointer in IgH master to access it.

 

Another issue is that there's a second domain of 20 bytes that's being
accessed at 100 us intervals. Obviously, the large PDO, already requiring
more than 100 us on the line for a data exchange, is going to prevent the
small one from being delivered on time, although the large one only needs to
be exchanged once per second. Is there a common way of solving this (perhaps
splitting the domain transfer)?
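
A rough back-of-envelope check of the timing concern above, assuming a
100 Mbit/s link and one datagram per frame. The overhead figures are the
usual Ethernet/EtherCAT framing sizes; the function name is made up for
illustration, and slave forwarding and cable delays are not included.

```python
def frame_time_us(payload_bytes, link_mbps=100.0):
    """Approximate time on the wire for one EtherCAT frame carrying a
    single datagram with `payload_bytes` of process data."""
    ETH_HEADER = 14        # destination MAC, source MAC, EtherType
    ECAT_HEADER = 2        # EtherCAT frame header
    DGRAM_OVERHEAD = 12    # datagram header (10) + working counter (2)
    MIN_BODY = 60          # minimum Ethernet frame size, excluding FCS
    FCS, PREAMBLE, IFG = 4, 8, 12

    body = ETH_HEADER + ECAT_HEADER + DGRAM_OVERHEAD + payload_bytes
    body = max(body, MIN_BODY)          # short frames are padded
    total_bytes = body + FCS + PREAMBLE + IFG
    return total_bytes * 8 / link_mbps  # bits / (Mbit/s) = microseconds


large = frame_time_us(1024)   # the 1,024-byte PDO
small = frame_time_us(20)     # the 20-byte fast domain
print(f"large: {large:.2f} us, small: {small:.2f} us")
```

Under these assumptions the large frame alone occupies roughly 86 us of
wire time, so very little of a 100 us cycle is left once per-slave
forwarding delays and the small domain's own frame are added.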

I was thinking of accessing the large PDO directly (outside whatever is
defined in the XML, directly to a configured SM) and use a series of FPxx
commands to exchange data. Is there a way to do this in IgH master ? 

Thanks,



J. 

 

 

 



[etherlab-dev] [PATCH] FoE: fix read packet number check

2013-11-28 Thread Gavin Lambert
The attached patch fixes an issue with FoE reads where the read spuriously
fails if the number of packets required to perform it exceeds 255.

There are still some other issues remaining with FoE, some of which I'm
going to look at shortly, although it doesn't look like my prior fixes have
been merged yet.
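
The patch is attached rather than inlined, so the following is only a
generic illustration of how a packet-number check can start failing past
255 packets. It assumes (hypothetically) that one side of the comparison is
truncated to 8 bits while the packet number itself is a 32-bit field. It is
not the actual master code, and all names are made up.

```python
def check_truncated(received_no, expected_no):
    """Buggy variant: the expected counter is only kept in 8 bits."""
    return received_no == (expected_no & 0xFF)


def check_full(received_no, expected_no):
    """Fixed variant: compare the full 32-bit packet numbers."""
    return (received_no & 0xFFFFFFFF) == (expected_no & 0xFFFFFFFF)


def first_failure(check, packets):
    """Return the first packet number rejected by `check`, or None."""
    for n in range(1, packets + 1):
        if not check(n, n):
            return n
    return None


print(first_failure(check_truncated, 300))  # first rejected packet number
print(first_failure(check_full, 300))       # None: every packet accepted
```

With the truncated comparison the transfer is fine for small files and only
breaks once the packet counter no longer fits in one byte, which matches the
symptom described above.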



foe_readpacket.patch
Description: Binary data


Re: [etherlab-dev] Mailbox handling of interleaving responses

2013-11-19 Thread Gavin Lambert
Well, I don't know if it's typical of "real" slaves, but the example slave
code certainly appears (via code inspection) capable of unsolicited posts to
the mailbox, in particular for CoE emergencies.  I haven't checked EoE
specifically but it certainly seems like the sort of thing that ought to be
able to generate unsolicited posts.  And certainly any of the services can
produce delayed responses, during which waiting time it could be possible to
complete a different service request (though it's possible some slaves might
not support this, and it's likely parallel requests would always have to be
for different service types).

 

There are also provisions for multi-fragment responses for all the service
types.  I haven't checked further whether multiple replies can normally be
generated without master action but the slave code seems capable of tracking
and posting (in arbitrary order) queued messages from multiple services to
the mailbox whenever it gets released by a read from the master.

 

I'm fairly certain that the standard specifies that you're only supposed to
have two mailboxes, one for outgoing messages and one for incoming ones; not
one per service type.  (I think it also specifies that they're optional, but
when they exist they're required to be on SM 0 and 1.)

 

In any case, I brought up this topic in the hope that Florian would have a
look at it - even if the master doesn't go to the extent of interleaving
requests itself, I do think that it's a bug if it can't cope with receiving
interleaved responses from slaves.  It's possible that this could be an
explanation for some of the EoE issues that people have been reporting from
time to time.  (I don't use EoE myself so I can't confirm or deny.)

 

It's notable that the CoE state machine appears to be littered with checks
for "wait, was that reply unexpectedly an emergency response", which
suggests that these at least are asynchronous - but I'm not sure that this
behaviour is correct as it stands either, as a CoE emergency could be sent
just after the master posted an FoE or EoE message, for example, and in this
case it looks like it will be discarded, as I mentioned in my original
email.  (It also looks like it won't pick up emergencies unless some other
CoE request is in progress, or until the next request is made, which also
seems wrong.)

 

I must admit that I haven't looked too closely at the datagram-level parts
of the standards, so I don't know if there's an easier way to ask the
network if there's a slave with something in their mailbox short of
individually polling every slave.  Though I think there's supposed to be
some sort of network-based interrupt mechanism for per-slave events, related
to slave registers 0x0200/0x0210.  (Appears to be related to the bit that
you mentioned too.)

 

But I think that's what's needed - some sort of central dispatch that's in
charge of detecting (ideally via the normal domain datagram) and fetching
mailbox data from all slaves and then posting it to the appropriate
slave+service FSM, rather than leaving it up to the individual service FSMs
to explicitly post datagrams to check and read data.
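
To make the idea a bit more concrete, here is a minimal sketch of such a
central dispatch, in Python for brevity rather than the master's C. It
assumes the usual 6-byte mailbox header with the protocol type in the low
nibble of the last header byte; the class and function names are invented
for illustration and do not exist in the master.

```python
from collections import defaultdict, deque

# Mailbox protocol type codes (low nibble of header byte 5).
MBOX_TYPES = {0x01: "AoE", 0x02: "EoE", 0x03: "CoE", 0x04: "FoE",
              0x05: "SoE", 0x0F: "VoE"}
MBOX_HEADER_SIZE = 6


class MailboxDispatcher:
    """Routes fetched mailbox messages to per-(slave, protocol) queues."""

    def __init__(self):
        self.queues = defaultdict(deque)

    def dispatch(self, slave, raw):
        """Queue one message read from a slave's slave->master mailbox."""
        if len(raw) < MBOX_HEADER_SIZE:
            raise ValueError("short mailbox message")
        length = int.from_bytes(raw[0:2], "little")
        proto = MBOX_TYPES.get(raw[5] & 0x0F, "unknown")
        payload = raw[MBOX_HEADER_SIZE:MBOX_HEADER_SIZE + length]
        self.queues[(slave, proto)].append(payload)
        return proto

    def next_for(self, slave, proto):
        """Called by a protocol FSM; returns None if nothing is pending."""
        queue = self.queues[(slave, proto)]
        return queue.popleft() if queue else None


def make_mbox(mbox_type, payload):
    """Build a test message: length, address, channel/priority, type."""
    header = len(payload).to_bytes(2, "little") + bytes([0, 0, 0, mbox_type & 0x0F])
    return header + payload


d = MailboxDispatcher()
d.dispatch(0, make_mbox(0x03, b"emergency"))  # unsolicited CoE arrives first
d.dispatch(0, make_mbox(0x04, b"foe-ack"))    # then the FoE response
print(d.next_for(0, "FoE"), d.next_for(0, "CoE"))
```

The point of the design is that the FoE FSM only ever sees FoE payloads,
and the CoE emergency that arrived first is kept for the CoE FSM instead of
being discarded.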

 

From: Jeroen Van den Keybus [mailto:jeroen.vandenkey...@gmail.com] 
Sent: Wednesday, 20 November 2013 09:52
To: Gavin Lambert
Cc: etherlab-dev@etherlab.org
Subject: Re: [etherlab-dev] Mailbox handling of interleaving responses

 

Dear Gavin,

 

> I think a more robust solution would be to always scan for and fetch data
> out of the slave->master mailbox, and then queue these to the appropriate
> protocol-specific FSM to handle as they arrive, according to the type
> specified in the data itself (so that while FoE was waiting for a response
> it could successfully process a CoE or EoE response, for example).
>
> Does that make sense, or have I missed something?

 

I think it does. Incidentally, the EtherCAT standard specifies using the
Sync Manager (SM) write flag (SM offset 0x5 bit 0) for precisely that (or
trying to read the buffer and observing the WKC).

 

But I also think that any protocol available in slaves (...oE) does not post
into the mailbox on its own initiative. Therefore, if the master does not
initiate any EoE, it should not fear encountering EoE traffic. Keeps things
simple, especially at the slave side.

 

To make matters more complicated, I long believed that a separate mailbox
(sync manager) pair was needed per type of mailbox. The standard is
inconveniently unclear about that kind of detail. But I have not found a
single example of a multi-mailbox configuration.

 

 

J.

 



[etherlab-dev] Mailbox handling of interleaving responses

2013-11-13 Thread Gavin Lambert
Since there's been no response to this other than "maybe you should send
this to the dev list", I'm resending it to the dev list. :)

-----Original Message-----
Sent: Tuesday, 20 August 2013 15:46
To: etherlab-us...@etherlab.org

Hi,

I was having a look through the master source and I think I've found a
potential problem, if I'm reading it correctly.

It looks like the master FSMs are expecting that when they poke an FoE
datagram (for example) into the master->slave mailbox then when data appears
in the slave->master mailbox it can only possibly be the corresponding FoE
response, and they'll discard any protocol type that they're not expecting.
(And I'm not just picking on FoE -- CoE does the same thing.)

This doesn't seem correct.  Mailboxes are inherently an asynchronicity
mechanism and I would think there's no particular reason why a slave
couldn't post a CoE, EoE, or emergency response to the mailbox before it
posts the FoE response.

While the master could control these to some extent (eg. not posting FoE
requests while waiting for CoE responses) I think there are still some
things (such as emergencies) that the slave is allowed to post unsolicited.
I haven't looked into how things like EoE work but I would think that they
could also be unsolicited.

I think a more robust solution would be to always scan for and fetch data
out of the slave->master mailbox, and then queue these to the appropriate
protocol-specific FSM to handle as they arrive, according to the type
specified in the data itself (so that while FoE was waiting for a response
it could successfully process a CoE or EoE response, for example).

And once this was done the master could then potentially intentionally
interleave its requests to improve performance, eg. to exchange SDOs while
waiting for an FoE or EoE transfer to complete.  (Though this might need a
per-slave setting to enable/disable, as some slaves might not support
interleaved requests.)

Does that make sense, or have I missed something?

Regards,
Gavin Lambert




[etherlab-dev] [Patch] Fix for FoE busy response during read and error handling

2013-08-25 Thread Gavin Lambert
Attached are two patches to fix issues I encountered while testing FoE.
(Incidentally, from prior posts to this list it seems like you've had
trouble finding a slave to test FoE on.  While I can't get you one of your
own, I do presently have access to a slave that I can modify, if that would
help with anything else you would like tested.)

Patch 1 is the "main" bugfix patch, and fixes the following issues:
  - if a slave returns an ERROR response to an FoE write, the error was
treated as a generic process error (ACK_ERROR) instead of being reported
properly.
  - if a slave returns a BUSY response to an FoE read, the state machine got
hung up and started treating its own transmitted datagram as having been
received, causing general confusion all around.
  - if a slave returns a single null byte as its FoE ERROR text, the master
would try to log it using the verbose path, which is undesirable.  (The FoE
standard is unclear whether a slave that does not want to return error text
should omit the field entirely or transmit a null byte.  The former is
probably the preferred behaviour but the slave implementation example code
will do the latter, so it should be handled gracefully.)
  - if a slave returns an error message longer than 256 bytes, use of
strncpy will result in an unterminated buffer and the error log will go
astray.  (Unusual but not impossible, depending on the size of the slave's
mailbox.)
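
Since the patches are attachments, here is only a sketch of the error-text
handling described in the last two bullets, written in Python for brevity
rather than the master's C. It assumes the FoE ERROR layout of a one-byte
opcode, a reserved byte and a 32-bit error code followed by optional text;
the function name and the 255-byte limit are my own choices for the example.

```python
FOE_OPCODE_ERROR = 5   # FoE opcodes: 1 RRQ, 2 WRQ, 3 DATA, 4 ACK, 5 ERR, 6 BUSY
FOE_HEADER_SIZE = 6    # opcode (1) + reserved (1) + 32-bit error code
MAX_ERROR_TEXT = 255   # assumption: leave room for a terminator in 256 bytes


def parse_foe_error(payload):
    """Return (error_code, text) from an FoE ERROR payload.

    `text` is "" when the slave omits the field or sends only a null byte,
    and is truncated to MAX_ERROR_TEXT bytes if it is over-long.
    """
    if len(payload) < FOE_HEADER_SIZE or payload[0] != FOE_OPCODE_ERROR:
        raise ValueError("not an FoE ERROR packet")
    code = int.from_bytes(payload[2:6], "little")
    raw = payload[FOE_HEADER_SIZE:]
    raw = raw.split(b"\x00", 1)[0]   # stop at the first null byte, if any
    raw = raw[:MAX_ERROR_TEXT]       # bound the length explicitly
    return code, raw.decode("ascii", errors="replace")


header = bytes([FOE_OPCODE_ERROR, 0]) + (0x8001).to_bytes(4, "little")
print(parse_foe_error(header))                  # text field omitted
print(parse_foe_error(header + b"\x00"))        # single null byte
print(parse_foe_error(header + b"disk full\x00"))
```

Both "no text" cases come out as an empty string, so the caller can decide
once whether there is anything worth logging, and an over-long message is
cut to a known length instead of relying on the copy routine to terminate
it.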

Patch 2 is optional and just consists of extra logging information (gated on
FOE_DEBUG) that helped me to track down the above issues.  It does make
transfers quite verbose, but only when FOE_DEBUG is defined.

Both patches are generated independently from
f8b779c9794edceab56cfd0085bfb99970044745, so they should be able to apply
cleanly in either order against the latest version.


On a somewhat related note, if you're not looking closely at the users list,
I posted a message there last week titled "Mailbox handling of interleaving
responses" (2013-08-20), which I would appreciate developer feedback on.
(Reposting the whole thing here seems silly/spammy though.)



foe_busy_1.patch
Description: Binary data


foe_busy_2.patch
Description: Binary data


[etherlab-dev] CoE SII enable_pdo_assign etc

2013-03-26 Thread Gavin Lambert
I was recently looking at the behaviour of the coe_details flags in
master/slave.c:ec_slave_fetch_sii_general and master/fsm_pdo.c, and I'm not
entirely sure that it's correct.  (I'm also not entirely sure that it's
not.)

 

Currently it appears that the master is treating these as capabilities - if
the PDO Assign flag is not set, for example, then it will refuse to write to
SDO 1C12/1C13 during device configuration, even if they are writable and the
application code does pass in specific PDO configurations it wants.

 

Maybe my interpretation is faulty or the standard is worded incorrectly, but
I don't read that meaning from the description in ETG2000 table 40 - my
interpretation of this is that these define whether the master is expected
(or required) to send the configuration to the slave, but does not imply
that the master is not allowed to do so if these flags are not set.  (That's
determined by the access rights on the mapping objects themselves.)

 

(I'm sure there ought to be a description somewhere in ETG1000 too, but I
can't find it.  The ETG1000 documents are really hard to follow.)

 

So I would think that the startup configuration process should raise an
error if PDO Assign is set but the application has not supplied PDO
configuration data, but if the application has supplied PDO config data then
it should try to write it regardless of the flag (and cope with the error
return if it was read-only).  Or if SDO Info is enabled and it's previously
queried for the available PDOs (which I'm not certain of but seems likely)
then it should already know if the mapping objects are writable or not.
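
A small sketch of the policy I'm proposing, in Python for brevity. The bit
order follows the CoE details byte as I understand it from the SII general
category; the function names and the "write"/"error"/"skip" results are
purely illustrative and not taken from the master source.

```python
COE_DETAIL_BITS = [
    "enable_sdo",
    "enable_sdo_info",
    "enable_pdo_assign",
    "enable_pdo_configuration",
    "enable_upload_at_startup",
    "enable_sdo_complete_access",
]


def decode_coe_details(byte):
    """Decode the CoE details byte into named boolean flags."""
    return {name: bool((byte >> bit) & 1)
            for bit, name in enumerate(COE_DETAIL_BITS)}


def pdo_assign_action(flags, app_has_config):
    """The flag says whether configuration is *expected*, not whether the
    master is *allowed* to write it."""
    if app_has_config:
        return "write"   # try regardless; cope with an abort if read-only
    if flags["enable_pdo_assign"]:
        return "error"   # slave expects an assignment, none was supplied
    return "skip"


flags = decode_coe_details(0x01)         # SDO enabled, PDO Assign not set
print(pdo_assign_action(flags, True))    # application supplied a config
print(pdo_assign_action(flags, False))   # nothing to do
```

The current behaviour, as I read it, corresponds to returning "skip"
whenever the flag is clear, even if the application supplied a
configuration.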

 



Re: [etherlab-dev] PDO entry with Index 0x0000, missing PDO registration

2013-03-12 Thread Gavin Lambert
IIRC, a PDO entry 0x0000:00 means that it’s just padding, not data.  As such
you probably shouldn’t include it in your PDO mapping structure.  I think the
master code is smart enough to align it correctly anyway, but I must admit I
haven’t really experimented with that sort of layout.

 

From: etherlab-dev-boun...@etherlab.org
[mailto:etherlab-dev-boun...@etherlab.org] On Behalf Of Jürgen Kunz
Sent: Wednesday, 13 March 2013 06:34
To: etherlab-dev@etherlab.org
Subject: [etherlab-dev] PDO entry with Index 0x0000, missing PDO
registration

 

Hello,

I had a problem with a new slave type where only the first of four slaves
worked (OP); all others were in SAFEOP + ERROR. In the debugging output of
the master (ethercat debug 1) I saw that only some of the PDOs of the first
slave had been registered in the domain; the PDOs of the other slaves had
not been registered.
The cause is that one of the PDOs has an entry index of 0x0000 (see
pdos.txt), which causes ecrt_domain_reg_pdo_entry_list to quit, so all
other PDO entries after that are ignored.
With the attached patch (diff.txt) I get all slaves to work.
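
For readers without the attachments: the registration list is terminated by
an entry whose index is zero, which is why a padding entry with index
0x0000 ends it early. A minimal sketch of that behaviour follows, in Python
for brevity; it is not the master's C code or the attached patch, and the
indices are made up.

```python
def register_entries(regs):
    """Mimics a registration loop that stops at the first zero index."""
    registered = []
    for index, subindex in regs:
        if index == 0:   # treated as the end-of-list sentinel
            break
        registered.append((index, subindex))
    return registered


regs = [
    (0x6000, 1),
    (0x0000, 0),   # padding entry copied from the slave's PDO mapping
    (0x6010, 1),   # never reached
    (0x0000, 0),   # intended terminator
]

# One possible workaround on the application side: drop padding entries
# and re-append a single terminator.
cleaned = [r for r in regs[:-1] if r[0] != 0] + [(0x0000, 0)]

print(len(register_entries(regs)), len(register_entries(cleaned)))
```

With the padding entry left in, only the entries before it are registered;
with it filtered out, both real entries are.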

Regards,
Jürgen Kunz

-- 
Dipl.-Inform. Jürgen Kunz

Technische Universität Darmstadt  
FG Simulation, Systemoptimierung und Robotik
 
Hochschulstr. 10
64289 Darmstadt

Tel.: ++49 (0) 6151-16-70383
Fax: ++49 (0) 6151-16-6648
E-Mail: kunz(at)sim.tu-darmstadt.de
Homepage: http://www.sim.tu-darmstadt.de



Re: [etherlab-dev] e1000e driver for 3.2.x kernel

2012-10-24 Thread Gavin Lambert
Quoth myself:
> (The in-kernel e1000e driver plus the "generic" driver appear to work
> fine, but I'm presuming this has less functionality or performance.)

Although one possibly-unrelated problem that I'm still having with either
the generic+e1000e driver or my patched-up ec_e1000e driver is that if it
had a link at some point during or after startup, and then I disconnect the
cable or power-down the first slave device, it doesn't seem to recognise the
loss of link and just keeps reporting datagram timeouts.  I don't remember
this being a problem with the previous master hardware I was using, which
was using the r8169 driver (and a 2.6 kernel) instead.

Any hints where to look to try to resolve this?




Re: [etherlab-dev] e1000e driver for 3.2.x kernel

2012-10-24 Thread Gavin Lambert
Quoth Jürgen Kunz:
> Attached you'll find the e1000e driver for 3.2.x kernel versions. 
> I've tested it on a 64-bit Debian Squeeze with 3.2.0-rt backports 
> kernel (=RT-Preempt).

As recently mentioned over on the users list, I'm having a problem with this
driver.

I'm using the same distribution (Debian Squeeze x64) but I'm trying to use
the vanilla 3.2.31 kernel (plus PREEMPT_RT rt47 patches).

(The in-kernel e1000e driver plus the "generic" driver appear to work fine,
but I'm presuming this has less functionality or performance.)

The 3.2.31 in-kernel sources for the driver don't quite match the -orig
sources provided in the hg tip, so there have been some minor changes but
nothing that seems directly relevant to my problem.

I've found two separate issues so far; the first is that initialisation
aborts due to an interrupt failure:
> ec_e1000e 0000:01:00.0: irq 41 for MSI/MSI-X
> ec_e1000e 0000:01:00.0: (unregistered net_device): MSI interrupt
> test failed, using legacy interrupt.

I've traced this to netdev-3.2-ethercat.c line 3869 (inside e1000_open),
where it calls e1000_request_irq (which does nothing for EC devices for some
reason) and then tries to test that the interrupt works, which of course it
won't.  I've tried working around this by putting this whole block into an
"if (!adapter->ecdev)" block; this seems to avoid the error but I'm not sure
if it has any negative consequences.

(Note that my line numbers will probably not quite match yours as I'm
looking at a repatched version of the 3.2.31 sources; but these problems
happen with the original hg tip driver too.)

The second issue is that I get this every couple of seconds:
> ec_e1000e 0000:01:00.0: (unregistered net_device): Reset adapter

And as a result, communication is impossible since it keeps resetting.  I've
traced this to e1000_watchdog_task, specifically line 4563 just after the
"we've lost link" comment block.  The link-test logic here looks wrong;
specifically, it is only checking the tx_ring in the !ecdev case.  Rewriting
this condition "properly" appears to get me past this issue, at least to the
point at which low-speed bus scans etc seem to be working again.  I haven't
yet tested under high-speed conditions.

I can send through patches for the fixes and for the 3.2.0 -> 3.2.31 update;
how would you prefer them?
  1. entire file contents
  2. all-in-one patch from current hg tip to my modified version
  3. separate patches from current hg tip to 3.2.31 basic, and from that to
"fixed" version



