Last summer, I mentioned upcoming work <https://groups.google.com/d/msg/machinekit/_zrDtP9ZGeU/OOU_aQz8eN0J> to study the latency of various subsystems, including I/O, in Machinekit. The objectives were somewhat ambitious for a summer intern, given that I was away most of the summer. We were recently able to get back to this and obtained interesting results. With Preempt-RT, we get maximum latencies between 50 and 100us on various ARM platforms (BBB, Zynq, NXP, Raspberry Pi 3...). We get much better latencies, under 15us, on Intel systems. It is fairly easy to find other cyclictest results on the web for different configurations (Preempt-RT or not, with CPU isolation, under stress tests...).
Then, we set out to measure the I/O latency. The idea is to look at the impact of drivers and hardware on latency. The common setup of a Mesa FPGA card on PCIe offers excellent performance, but at some cost in hardware, volume and wiring complexity. For example, my CNC has a computer with a Mesa 5i20 and a couple of 50-pin cables going to the controller (power supply and power electronics). If we could run a similar setup but with the FPGA inside the controller, and only an Ethernet or USB cable linking the computer to the controller, this would be much better. Note that you can already achieve this with an external PCIe connection and a Mesa card located inside the controller. I programmed a new component (io_latency.icomp) to measure the time required to loop back a GPIO signal. To simplify interfacing with the existing drivers, a simple approach was taken: update (toggle) an output, write the updated output to the hardware, wait some number of ns to ensure propagation, read the inputs from the hardware, and check that the toggled value had time to propagate. I even set up a loop with a varying delay (1000 measurements with 0 delay, 1000 measurements with a 100ns delay, and so on), each time checking how many times the loopback value propagated properly. This loop with varying delays turned out to be useless, as the FPGA and 1cm loopback jumpers were always "instantaneous" compared to the time required to write to and read from the hardware. With 0 delay inserted, all values were already reflected through the loopback when the jumper was inserted, and only half when no jumper was present (a floating high value was taken as correct propagation of a TRUE from the output). In summary, you can get pretty much the same results simply by measuring the time taken to execute the write and read functions of the hardware drivers.
We ran the tests with an "ASRock Q1900M PRO3 Intel Celeron J1900 2.0 GHz <http://www.newegg.ca/Product/Product.aspx?Item=N82E16813157565>" board. The 5i20 and 5i24 gave almost identical results, at about 15us to write and 20us to read, for a total of 35us. The parallel port requires 7us on average, 14us maximum. However, it carries much less data (only a few GPIO pins, as compared to several tens on the 5i20 and 5i24). Finally, the 7i80 connected through Ethernet was tested (Gigabit Ethernet on the host, but the card is only 100BASE-T). We got 200us in total: 35us writing (one packet sent) and 165us reading (one request sent and one reply packet received). We also have a 7i61 USB card, but there appear to be no LinuxCNC / Machinekit drivers for it.

I will be very curious to see what type of latency is involved with the new SoC FPGA chips such as the Zynq. Perhaps the read and write times to the FPGA are lower than when going through the PCI bus, closer to what we have with the parallel port. Then, the solution proposed by Machinekit, with an inexpensive SoC FPGA in the controller itself and a remote viewer connected by Ethernet or Wifi, will be an excellent one, with very good performance, simple wiring and low cost!

In a related area, our tools for tracing long latencies with LTTng have progressed nicely. We are thus capable of understanding very precisely the various causes of longer latencies in Preempt-RT and other setups. One interesting approach is to couple hardware tracing (i.e. Intel PT or ARM CoreSight), or even kernel tracing, in in-memory flight-recorder mode, with a program that detects long latencies (e.g. cyclictest) and then triggers a trace snapshot to disk.

--
website: http://www.machinekit.io blog: http://blog.machinekit.io github: https://github.com/machinekit
