Last summer, I mentioned upcoming work 
<https://groups.google.com/d/msg/machinekit/_zrDtP9ZGeU/OOU_aQz8eN0J> to 
study the latency of different components, including I/O, in Machinekit. 
The objectives were somewhat ambitious for the summer intern, given that I 
was away most of the summer. We were recently able to get back to this and 
obtained interesting results. With Preempt-RT, we are getting maximum latencies 
between 50 and 100us on various ARM platforms (BBB, Zynq, NXP, Raspberry Pi 
3...). We are getting much better latencies on Intel systems at less than 
15us. It is fairly easy to find other results of cyclictest on the web for 
different configurations (Preempt-RT or not, with CPU isolation, under 
stress test...).

Then, we set out to measure the I/O latency. The idea is to look at the 
impact of drivers and hardware on the latency. The common Mesa FPGA card on 
PCIe setup offers excellent performance but at some cost in hardware, 
volume and wiring complexity. For example, my CNC has a computer with a 
Mesa 5i20 and a couple of 50-pin cables going to the controller (power 
supply and power electronics). If we can run a similar setup but with the 
FPGA inside the controller, and only an Ethernet or USB cable linking the 
computer to the controller, this would be much better. Note that you can 
already achieve this with an external PCIe connection and locating a Mesa 
card inside the controller.

I programmed a new component (io_latency.icomp) to measure the time 
required to loop back a GPIO signal. To simplify the interfacing with the 
existing drivers, a simple approach was taken: update (toggle) an output, 
write the updated output to the hardware, wait some number of ns to 
ensure propagation, read the inputs from the hardware, and check that the 
toggled value had time to propagate. I even set up a loop with varying 
delays: 1000 measurements with 0 delay, 1000 measurements with 100ns 
delay, and so on, each time checking how often the loopback value was 
propagated properly. This varying-delay loop turned out to be unnecessary, 
as the FPGA and 1cm loopback jumpers were always "instantaneous" compared 
to the time required for writing to and reading from the hardware. With 0 
delay inserted, all values were already reflected through the loopback 
when the jumper was present, and only half when it was absent (the 
floating-high input was read as correct propagation of a TRUE output). In 
summary, you can pretty much get the same 
results simply by measuring the time to execute the write and read 
functions from the hardware drivers. We ran the tests with an "ASRock Q1900M 
PRO3 Intel Celeron J1900 2.0 GHz 
<http://www.newegg.ca/Product/Product.aspx?Item=N82E16813157565>" board.

The 5i20 and 5i24 gave almost identical results at about 15ns to write and 
20ns to read for a total of 35ns. The parallel port requires 7us on 
average, 14us maximum. However, it carries much less data (only a few 
GPIO pins, compared to dozens on the 5i20 and 5i24). Finally, the 7i80 
connected through Ethernet was tested (Gigabit Ethernet on the host, but 
the card is only 100BASE-T). We got 200us in total: 35us writing (one 
packet sent) and 165us reading (one request sent and one reply packet 
received). We also have a 7i61 USB card, but there appear to be no 
LinuxCNC / Machinekit drivers for it.

I will be very curious to see what type of latency is involved in the new 
SoC FPGA chips such as the Zynq. Perhaps the read and write times to the 
FPGA are lower than when going through the PCI bus, closer to what we have 
with the parallel port. The solution proposed by Machinekit, with an 
inexpensive SoC FPGA in the controller itself and a remote viewer 
connected by Ethernet or WiFi, would then be an excellent solution with 
very good 
performance, simple wiring and low cost!

In a related area, our tools for tracing long latencies with LTTng have 
progressed nicely. We are thus capable of understanding very precisely the 
various causes of longer latencies in Preempt-RT and other setups. One 
interesting technique is to couple hardware tracing (e.g. Intel PT or ARM 
CoreSight), or even kernel tracing, in in-memory flight-recorder mode, with 
a program that detects long latencies (e.g. cyclictest) and then triggers a 
trace snapshot to disk.

-- 
website: http://www.machinekit.io blog: http://blog.machinekit.io github: 
https://github.com/machinekit