Re: [RFC] generic CAN/CAN FD susbsytem for RTEMS from scratch - online documentation

Christian MAUDERER Tue, 14 May 2024 01:10:43 -0700

On 2024-05-13 17:40, Christian MAUDERER wrote:

Hello Pavel and Michal,


sorry for the late reply. I was on vacation last week.

On 2024-05-06 11:27, Pavel Pisa wrote:

Dear Christian,

On Tuesday 30 of April 2024 08:40:43 Christian MAUDERER wrote:

For others: the code under review is hosted on the CTU university GitLab
server
    https://gitlab.fel.cvut.cz/otrees/rtems/rtems-canfd
Documentation

https://otrees.pages.fel.cvut.cz/rtems/rtems-canfd/doc/can/can-html/can.html
https://otrees.pages.fel.cvut.cz/rtems/rtems-canfd/doc/doxygen/html/index.html


The main developer behind the extension to CAN FD and the switch to RTEMS
is Michal Lenc.

The intention is to (hopefully) reach a state where it meets the criteria
for mainlining into the RTEMS CPU kit under

    cpukit/dev/can

...

I agree that it is a compromise. But adding yet another file-descriptor-like
multiplexer for queues to each file descriptor seems to me like
too much complexity. It can be added even later, as an IOCTL to remove
individual queues based on CAN ID matches, or on queue IDs if create
is modified to return internal queue IDs...


I somehow missed that you can open the device multiple times and get
independent queues. With that, it's completely OK and should be flexible
enough for most applications.

It's great that you already have put some thought into how it could be
extended later if some application needs more flexibility.

...

Did you check with
some other hardware controller, whether the whole structures / defines
/ flags close to the hardware do work well for other controllers too?


The code/concept is based on my previous LinCAN and OrtCAN work

https://ortcan.sourceforge.net/lincan/

...

I didn't want to doubt your competence. Like I said, it's a trap that
I have fallen into often enough myself (like when guiding Prashanth's
GSoC project). But it's clear that you have put a lot of thought into
that. So I would expect that there shouldn't be much trouble with most
controllers. Maybe except for the ones where a semiconductor vendor
thought it would be a good idea to create a completely different
concept. But these are always difficult.


I agree with discussion and searching for hard arguments.

The solution is a compromise. In general, the CAN bus concept
is optimized for direct replacement of the wires in a car
running between distinct units, and its use as a general
communication solution has some difficulties and requires
some compromises.

For small devices with a predefined purpose and for Autosar,
it is ideal to allocate one communication object on the controller
for each CAN ID (wire signal) to be sent.
The same holds for each received signal value, or for their set in a
single frame. Most controllers are equipped with filters
and a mechanism to do so, including selection of the
Tx message object for physical bus-link arbitration
according to priority. The sending side then updates the
signal value in the corresponding Tx object, and the receiving
side sees the most recent one, usually on a best-effort basis:
older unread frames are overwritten by the updated value.

But even in a simple ECU, there are obstacles to using this
principle for all kinds of communication. The CAN bus is used
for firmware updates and general configuration. In this
case, reliable delivery of all messages with a given
CAN ID is required, because the whole sequence has to be
received and processed and the state evolution is tied
to the sequence. If a single message is lost, then all
data are unusable. Because the sequence requires exact ordering,
typically only a single Tx object is used. On the Rx
side it can be a problem to capture all frames without
overwrites using a single Rx object, so some controllers add a FIFO
which can be attached to each object, or some mechanism
to allocate more Rx objects and pass them to the user
in FIFO order.

That works for small ECUs with single-purpose firmware.
But on a general-purpose operating system, which should
allow even complete monitoring of the CAN bus, dynamically
started applications and even whole virtual
CAN/CANopen nodes, allocating the controller Tx/Rx message
objects for each specific purpose is impossible.

That is why all generic CAN subsystems that I know
(CAN4Linux, LinCAN, SocketCAN, NuttX char device CAN,
Peak's Windows drivers etc.) define an API based on
opening the driver and presenting received messages
to the application in FIFO order (with options for software
filtering, which is usually not propagated to the controller
HW; LinCAN has an option to union the user FIFOs into a
mask and ID propagated to HW, but you usually end up
needing to receive everything anyway, and it has not been
used in the end). FIFO order on Tx is required for messages
with the same ID, and sometimes even within the same stream of messages
with changing IDs, for correct realization of some higher
level protocols.

The result is that even on hardware equipped with multiple
Tx objects, but without a special cyclic queue that preserves Tx FIFO
order, only a single Tx object is used to realize
transmission of all messages, for example SocketCAN on the
XCAN controller. So only part of the CAN bus media
bandwidth can be utilized by a single node. Maybe that is sometimes
lucky, because CAN IDs are not correctly allocated according
to priority even in cars' critical subsystems. On the Rx side, the original
buffers approach is hard to use in an order-preserving FIFO concept,
but most of today's controllers add some option to keep order
and leave processing and distribution to the software side.
See the evolution from CCAN to DCAN to overcome that problem.
We even made LinCAN for CCAN many, many years ago,
which somehow kept the required properties, but it was a headache.

So back to generic OS CAN interfaces: all I know are FIFO(s)
based. Most of them keep strict FIFO order on the Tx side,
which results in HoL (head-of-line) blocking and priority
inversion on a bus loaded by middle-priority traffic from another node.

That is why SocketCAN adds the alloc_candev_mqs (multiple queues) alternative
for drivers

https://elixir.bootlin.com/linux/latest/source/drivers/net/can/dev/dev.c#L249

but as far as I know, no mainline kernel driver is using that.
We have done some work to research, and even slightly extend, the
Linux networking QoS subsystem to solve buffer bloat by old
messages for traffic requiring best effort (most up-to-date
data for control) for given IDs, and to limit the bandwidth
of others or of virtual guests connected through QEMU to the
physical bus etc., many years ago, at a time when multi-queue
was not yet available on the Linux side. I have a long-term plan
to extend the CTU CAN FD mainline Linux driver with this support,
probably making it the first example of how to overcome HoL/priority
inversion in the Linux CAN subsystem. It was planned in the original
LinCAN before SocketCAN, and it is now implemented in the proposed
RTEMS CAN/FD framework, where an application can set up multiple
queues, even for a single open instance, with different Tx priority
classes; when used and mapped correctly to CAN IDs, this can
prevent priority inversion. It is not generic, because it is
quite expensive for deeper FIFOs, and even the mutual order of
Tx messages has to be preserved for many protocols, as discussed
earlier. The CTU CAN FD IP core interface to software was architected
by me to allow maximal utilization of the Tx buffers and their
reallocation when needed for a higher-priority message.
Wait for DTP processing and publication of our international CAN
Conference 2024 article, or come and meet us next week in Baden-Baden

   https://www.can-cia.org/icc/

There are two branches of thought from this point:

1) how it maps to other controllers

For those equipped with only a single Tx object (e.g. SJA1000),
it maps well, because the attempt to repeat Tx and arbitration
can be disabled when a higher-priority queue becomes ready,
and our CAN infrastructure allows pushing back the lower-priority
message and scheduling the higher one to be sent.

For more complex ones, if they do not allow control of the Tx object
order, then only a single Tx object can be used. That is bad, the link
is underutilized, but it is what is standard in SocketCAN and other CAN
solutions for general-purpose operating systems today. All controllers
that I know allow stopping the Tx attempt repeat, and I hope to
see in all of them an option to check whether the latest attempt was
successful or not. So the new RTEMS CAN can use them the same way
as on SJA1000. On the Rx side, most have a FIFO-preserving
option to use multiple buffers, sometimes partially
broken, burdened by errata etc. (like iMX RT, where
we overcame these problems in the NuttX drivers).
When the number of Tx priority classes is limited (3 by default for the
proposed system, but compile-time configurable), then
we can allocate one Tx buffer for each class, which is easy and
prevents HoL priority inversion even on simple controllers.
If there is an option to order Tx according to the buffer
index in the controller, then there is an option for a slightly
more performant solution, where multiple Tx buffers are allocated
for each class and they are filled sequentially until the highest
allocated buffer index is filled. Then there is some gap until
all these buffers in the given priority are sent, because
cyclic refilling from the minimal index would result in reordering,
possibly breaking some protocol requirements.
Some controllers allow attaching DMA-realized FIFOs to more
Tx objects; in such a case it would map to the proposed design well
too. Some newer controllers add local priority bits above the
CAN ID ones (e.g. the new NXP FlexCAN). This could allow cyclic
use of some Tx objects/buffers, similar to CTU CAN FD.
There will be problems, because the priorities of multiple Tx buffers
cannot be updated by a single atomic operation as in the CTU CAN FD
case. But I have some idea how to implement sequential
updates to ensure order within the class. There would be the problem
that most controllers do not allow updating this information
on objects actively participating in arbitration. So it would
lead to a much more delicate balancing act, and some gap time
where no message is offered in the link arbitration, even though
there are pending user requests, will be inevitable in some
scenarios after some number of messages sent. On the bus side, that
cannot be worse than using a fixed order according
to index. Maybe it will turn out that the overhead is not worth
it. But we preserve the API in all variants...
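
The simplest variant mentioned above (one FIFO per Tx priority class, with the highest non-empty class served first) can be sketched as follows. This is only an illustration under stated assumptions: the names `example_tx_class`, `example_push`, `example_pick_next` and the FIFO depth are hypothetical and not taken from the proposed framework; only the default of three priority classes comes from the text.

```c
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_TX_CLASSES 3   /* default number of classes named in the text */
#define EXAMPLE_FIFO_DEPTH 4   /* hypothetical depth, for illustration only */

/* Hypothetical per-class FIFO; frames are reduced to their CAN ID here. */
struct example_tx_class {
  uint32_t ids[EXAMPLE_FIFO_DEPTH];
  size_t head;   /* index of the oldest queued frame */
  size_t count;  /* number of queued frames */
};

/* Append a frame to one class; returns 0 on success, -1 if the FIFO is full. */
static int example_push(struct example_tx_class *q, uint32_t id)
{
  if (q->count == EXAMPLE_FIFO_DEPTH) {
    return -1;
  }
  q->ids[(q->head + q->count) % EXAMPLE_FIFO_DEPTH] = id;
  q->count++;
  return 0;
}

/*
 * Take the oldest frame from the highest-priority non-empty class
 * (class 0 is assumed to be the highest). FIFO order is kept inside a
 * class, while a lower class cannot block a higher one.
 * Returns 0 and stores the ID, or -1 if all classes are empty.
 */
static int example_pick_next(struct example_tx_class classes[EXAMPLE_TX_CLASSES],
                             uint32_t *id)
{
  for (size_t c = 0; c < EXAMPLE_TX_CLASSES; c++) {
    struct example_tx_class *q = &classes[c];
    if (q->count > 0) {
      *id = q->ids[q->head];
      q->head = (q->head + 1) % EXAMPLE_FIFO_DEPTH;
      q->count--;
      return 0;
    }
  }
  return -1;
}
```

A frame queued later in class 0 is picked before older frames waiting in class 2, which is the property that avoids head-of-line blocking between classes while still preserving order inside each class.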

2) use of the CAN bus in applications requiring maximal bus
transparency with minimal latency and SW load. This is the
total opposite of a general CAN bus subsystem for a
general-purpose RTOS. The API in this case should allocate
Tx and Rx controller objects for the individual purposes/CAN IDs.
Rx-side SW processing can be considered as an alternative, and the proposed
framework allows setting up queues, but it has overhead, and under
extreme load it can lose some messages if the HW is not performant enough.
On the Tx side it is even more problematic.

But if this type of use of RTEMS, for example for Autosar or Simulink
generated code, is considered, then it is possible to extend the currently
proposed API by IOCTLs which allow reserving some controller
objects for specific purposes and either accessing them directly,
for minimal overhead and use under direct application control, or attaching
a separate controller-side "canque_ends_dev_t" to such objects and
propagating them to some clients through the standard CAN read and write API.

So I think that the proposed framework provides what is expected
by most users of a general-purpose CAN/CAN FD framework, tries to
prepare a little even for the coming of CAN XL, and solves problems which
may still be practically unsolved by all other generic approaches.
And we have some clue how to extend support to most/all other
controllers, and even some open doors to offer an ECU-style
API for applications which benefit from direct controller
buffer use/allocation, which is possible on controllers
with an abundant number of buffers (not the case for SJA1000,
and very limited on CTU CAN FD: max 8 can be configured
in silicon under the current register map).

I understand that the text is long, but you have in fact asked for
it, and I provide a complete thought dump
to analyze.

Thanks for the (very) detailed explanation. My intention was to express that I'm completely OK with only one driver because you clearly have thought about other hardware too. Your explanation just makes it even more clear how much thought you put into it ;)


I would be happy if you and/or others find time to look
into the actual code implementation to identify what could
be an issue for mainlining as soon as possible, because
after May 24 changes will not propagate into Michal Lenc's
thesis text, which can be an alternative and more in-depth
documentation and analysis than what fits into the official
RTEMS one. The full document already has 47 pages,
34 of them actual text without contents and appendices.
The document includes benchmarks under RTEMS load by HTTP
traffic, priority inversion prevention confirmed
by measurements with performance data etc.
It will be published at CTU in May or June
   https://dspace.cvut.cz/
and links will be added to
   https://canbus.pages.fel.cvut.cz/
same as for the much shorter iCC article and presentation.

Code review without patches or a review system is always a bit more effort because there is nothing to attach comments to directly. It seems that I can't register on the gitlab instance that you provided. So let's try it here.

I'll mainly take a look at the headers because they define the interface. That's the most important part if you ask me. Bugs in the code should be fixable later.

I'll try to categorize my comments a bit. If it has a *Style* or *Typo* in front of it, you can just ignore it. It's not really important. It's just something that I noted while reading through the code. *Question* or *Note* are more important.

And please note: You know CAN a lot better than me. So it is quite possible that I don't see a lot of stuff and that I might have some odd questions.

### First the ones that I plan to more or less ignore so that I can concentrate on the important parts:


Test or demo apps. So most likely not relevant for a review:

./rtems_can_test/can_test.h
./rtems_can_test/app_def.h
./rtems_can_test/system.h
./rtems_can_test/can_register.h
./zynq_template/app_def.h
./zynq_template/zynq_reg.h
./zynq_template/system.h

Seems to be left over from some tests:

./lib/libbar/bar.h

Driver specific files. I think these are not that high priority either:

./lib/candrv/ctucanfd/ctucanfd_txb.h
./lib/candrv/ctucanfd/ctucanfd_kframe.h
./lib/candrv/ctucanfd/ctucanfd_kregs.h
./lib/candrv/dev/can/ctucanfd.h


### Now the more important ones: The interfaces

#### ./lib/candrv/dev/can/can.h

*Style*: I would suggest to group defines a bit more. You already used prefixes like RTEMS_CAN_QUEUE_* which is great. You can improve that a bit more if you use Doxygen "@name" and "@{" ... "@}". For an example take a look at


https://gitlab.rtems.org/rtems/rtos/rtems/-/blob/main/cpukit/include/dev/i2c/i2c.h?ref_type=heads#L80

Which leads to a group in the doxygen output:

https://docs.rtems.org/doxygen/branches/master/cpukit_2include_2dev_2i2c_2i2c_8h.html

The same is true for some other defines in other files. I won't mention it every time.
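
As a minimal sketch of such a Doxygen group: the macro names and values below are hypothetical and only follow the RTEMS_CAN_QUEUE_* prefix style, they are not copied from can.h.

```c
/**
 * @name Example CAN Queue Flags
 *
 * Hypothetical names and values; only the prefix style follows the
 * reviewed header.
 *
 * @{
 */

/** @brief Hypothetical flag: queue is used for transmission. */
#define EXAMPLE_CAN_QUEUE_TX (1u << 0)

/** @brief Hypothetical flag: queue is used for reception. */
#define EXAMPLE_CAN_QUEUE_RX (1u << 1)

/** @} */
```

Everything between "@{" and "@}" is listed under the "@name" title in the generated output.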

*Question*: Why do you prefix some defines with RTEMS (like RTEMS_CAN_CHIP_MODE) and others don't have that prefix (like CAN_CTRLMODE_LOOPBACK)? The same is true for some other defines in other files. I won't mention it every time.

*Style*: Sometimes you use Doxygen @brief. Sometimes \ref. I think it works, but it's a bit odd.

*Question*: You have ioctls like RTEMS_CAN_DISCARD_QUEUES. According to the description, that ioctl has a parameter. Why is it an _IO and not an _IOW? The same is true for some more of the _IO ioctls.
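
To illustrate the difference, here is a small sketch with hypothetical command names and numbers (not the ones from can.h), assuming the usual BSD-style macros from `<sys/ioctl.h>`:

```c
#include <sys/ioctl.h>

/* Hypothetical magic and command numbers, not the ones used in can.h. */
#define EXAMPLE_CAN_MAGIC 'c'

/* No argument: _IO encodes no parameter. */
#define EXAMPLE_CAN_FLUSH _IO(EXAMPLE_CAN_MAGIC, 1)

/* The caller passes an int to the driver: _IOW encodes direction and size. */
#define EXAMPLE_CAN_DISCARD_QUEUES _IOW(EXAMPLE_CAN_MAGIC, 2, int)
```

With _IOW the request code itself carries the information that data flows from the application to the driver and how large it is, so tools and generic ioctl layers can see that the command takes a parameter.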

*Typo*: The description of RTEMS_CAN_GET_BITTIMING has a "geets" in its comment.

*Note*: struct can_chip_ops doesn't have a description. I'm not entirely sure what every member should do. For example: check_bittiming: From the name I would expect that it only checks a bit timing. From the ctucanfd.c it also sets a bit timing. I think it would be good if you would add short descriptions here like "Check and set a bittiming. Returns 0 on success or negated errno on error."
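
A sketch of how such a member description could look. The types, the member name and the stub are hypothetical and simplified; the real struct can_chip_ops has different members and signatures.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified types; the real can_chip_ops differs. */
struct example_bittiming {
  uint32_t bitrate; /**< Requested bit rate in bit/s. */
};

struct example_chip;

/**
 * @brief Hypothetical chip operations with documented members.
 */
struct example_chip_ops {
  /**
   * @brief Check and set a bit timing.
   *
   * @retval 0 Successful operation.
   * @retval -EINVAL The timing is not supported (negated errno).
   */
  int (*check_and_set_bittiming)(struct example_chip *chip,
                                 const struct example_bittiming *bt);
};

/* Stub implementation that only accepts a non-zero bit rate. */
static int example_check_and_set_bittiming(struct example_chip *chip,
                                           const struct example_bittiming *bt)
{
  (void) chip;
  return (bt->bitrate == 0) ? -EINVAL : 0;
}
```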

*Detail*: can_bus_register(...) and can_bitrate2bittiming are missing a description. If they are a public interface, it would be good if they would have one.



#### ./lib/candrv/dev/can/can-frame.h

*Detail*: Description of the can_frame_header. You have field descriptions like "This member holds the CAN ID value". My first thought was that it is some kind of address. But with taking a look at the code, it seems that it is a bit mask that is combined out of CAN_ERR_ID_* defines. If a field is expected to contain a certain group of defines: Can you add a note regarding that?

*Style*: You have a group of defines like CAN_ERR_*_DATA_BYTE. On the first glance, I thought it would be the same group as CAN_ERR_ID_*. I think I would have used CAN_ERR_DATA_BYTE_* instead. Of course that's a style question and there are always good arguments for any style. Again: A doxygen group using @{ and @} might achieve the same.



*Typo*: You have a CAN_ERR_PROT_LOC_DARA_BYTE instead of ..._DATA_BYTE.

*Question*: There are a lot of defines in can-frame.h like CAN_ERR_PROT_LOC_DATA or CAN_ERR_TRX_UNSPEC. Is it clear for someone more used to CAN, how to use these? Or would a description like "Possible values for the Byte xyz in a can message" help?



#### ./lib/candrv/dev/can/can-filter.h

*Detail*: Again: I'm not happy with the descriptions of the fields of the structure. A field "flags" that is described as "This member holds CAN flags" isn't really helpful. Which values can I assign to that field? Is it a bit mask? Is it a field defined according to some standard? In that case even a "Holds standard CAN flags" would be useful because then I know that I just have to take a look at any CAN documentation.



#### ./lib/candrv/dev/can/can-devcommon.h

OK.


#### ./lib/candrv/dev/can/can-helpers.h

*Note*: I don't like global defines like MAX_MSGOBJS without a prefix. That's polluting the name space. Is there a reason that it doesn't have the CAN or RTEMS_CAN prefix like all other defines?

Similar: There are defines like "BIT". Is there a reason for using such a generic name? If it (for example) helps porting existing drivers from another stack, that's great. Otherwise, I don't like these names.


Some more in this file are: len2dlc, GENMASK, FIELD_PREP, FIELD_GET
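
A sketch of prefixed variants of such helpers. The EXAMPLE_CAN_ prefix and the exact macro bodies are assumptions for illustration, not the definitions from can-helpers.h.

```c
#include <stdint.h>

/* Hypothetical prefixed variants of generic helper macros. */
#define EXAMPLE_CAN_BIT(n) (UINT32_C(1) << (n))

/* Mask with bits h..l set (0 <= l <= h <= 31). */
#define EXAMPLE_CAN_GENMASK(h, l) \
  ((UINT32_MAX << (l)) & (UINT32_MAX >> (31 - (h))))

/* Lowest set bit of a non-zero mask, used as a multiplier/divisor. */
#define EXAMPLE_CAN_MASK_LSB(mask) \
  ((uint32_t) (mask) & (~(uint32_t) (mask) + 1u))

/* Shift a value into the field described by mask. */
#define EXAMPLE_CAN_FIELD_PREP(mask, val) \
  (((uint32_t) (val) * EXAMPLE_CAN_MASK_LSB(mask)) & (uint32_t) (mask))

/* Extract the field described by mask from a register value. */
#define EXAMPLE_CAN_FIELD_GET(mask, reg) \
  (((uint32_t) (reg) & (uint32_t) (mask)) / EXAMPLE_CAN_MASK_LSB(mask))
```

With a prefix, these names cannot collide with a BIT or GENMASK that an application or another driver stack defines for itself.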

Now I'm running out of time. I'll try to take a look at the following files later or tomorrow:


Like promised:

#### ./lib/candrv/dev/can/can-queue.h

*Note*: The doxygen documentation of struct canqueue_slot_t didn't work as expected: You used for example @next to describe the "next" field. That clearly didn't work:
https://otrees.pages.fel.cvut.cz/rtems/rtems-canfd/doc/doxygen/html/structcanque__slot__t.html
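
For reference, Doxygen picks up member documentation from a comment block placed directly before the member or from a trailing `/**< ... */` comment. A minimal sketch with a hypothetical structure (not the real slot type):

```c
#include <stddef.h>

/**
 * @brief Hypothetical queue slot, only to illustrate member documentation.
 */
struct example_slot {
  /** @brief Pointer to the next slot in the list. */
  struct example_slot *next;

  unsigned int flags; /**< Trailing form: state flags of this slot. */
};
```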

*Question*: You use the Atomic_Uint from rtems/score/atomic.h for the slot_flags. For new code, I would suggest using the atomic_uint from stdatomic.h instead (C11). You have included that file already, so it shouldn't be a problem.
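
A minimal sketch of flag handling with the C11 type. The structure, the flag name and the helper functions are hypothetical; only the field name slot_flags is taken from the header.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bit, not taken from can-queue.h. */
#define EXAMPLE_SLOT_FLAG_BUSY 0x1u

struct example_flag_slot {
  atomic_uint slot_flags; /* C11 atomic instead of Atomic_Uint */
};

/* Atomically set the busy flag; returns true if it was already set. */
static bool example_slot_test_and_set_busy(struct example_flag_slot *slot)
{
  unsigned int old = atomic_fetch_or(&slot->slot_flags, EXAMPLE_SLOT_FLAG_BUSY);
  return (old & EXAMPLE_SLOT_FLAG_BUSY) != 0;
}

/* Atomically clear the busy flag. */
static void example_slot_clear_busy(struct example_flag_slot *slot)
{
  atomic_fetch_and(&slot->slot_flags, ~EXAMPLE_SLOT_FLAG_BUSY);
}
```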

*Question*: Why is string.h included in that header? I don't see any str*, mem* or stp* functions used.

*Question*: Some of the functions have a bit of a short description. For example the canqueue_filter_match: The brief description basically just tells me exactly what the name of the function already told me. But how and in what situation would I use that function? How do these filters work? I can't tell that easily from the description or from the implementation of the function. Is it even thought as an interface that a user (in this case: someone writing a driver or an application) has to understand?

Maybe I have a basic problem here: Which headers are (more or less) public ones (used to write drivers and applications) and which ones are internal only? Or in other words: Which headers will be installed?

For this file, I strongly suspect that it is not user-facing. You have a lot of undocumented functions in it (like canqueue_next_inedge or canque_for_each_inedge).

The files that are really relevant for review at the current point are mainly the ones that a user (again: someone writing a driver or an application) can see.

If it is a public facing header: Did I miss some general description how a filter works and how a user should use it? It's quite possible that I missed that. I'm still only scratching the surface of your work at the moment.



#### ./lib/candrv/dev/can/can-stats.h

Looks OK.

#### ./lib/candrv/dev/can/can-virtual.h

OK.

#### ./lib/candrv/dev/can/can-bittiming.h

*Detail*: can_bittiming_const has a name with a fixed 32 char length. In ctucanfd.c you initialize that with a constant string. Is there some reason to have a fixed length string in RAM instead of using a pointer to a constant string that can have an arbitrary length and can be (for example) in the Flash?
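
The two variants side by side, as a sketch. The structures are hypothetical and reduced; tseg1_max and its value are placeholders and not the real CTU CAN FD limits.

```c
#include <stdint.h>

/* Hypothetical, reduced variants of a bit timing constants structure. */
struct example_btc_array {
  char name[32];      /* fixed-size copy stored inside every instance */
  uint32_t tseg1_max;
};

struct example_btc_pointer {
  const char *name;   /* points to a string literal of arbitrary length */
  uint32_t tseg1_max;
};

/* A const object with a pointer member can live entirely in read-only memory. */
static const struct example_btc_pointer example_btc = {
  .name = "example_controller",
  .tseg1_max = 190
};
```

With the pointer variant, each instance only stores the pointer, and the string itself is no longer limited to 31 characters plus terminator.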



Best regards

Christian


Best wishes,

                 Pavel
--
                 Pavel Pisa

     phone:      +420 603531357
     e-mail:     [email protected]
     Department of Control Engineering FEE CVUT
     Karlovo namesti 13, 121 35, Prague 2
     university: http://control.fel.cvut.cz/
     personal:   http://cmp.felk.cvut.cz/~pisa
     company:    https://pikron.com/ PiKRON s.r.o.
     Kankovskeho 1235, 182 00 Praha 8, Czech Republic
     projects:   https://www.openhub.net/accounts/ppisa
     social:     https://social.kernel.org/ppisa
     CAN related:http://canbus.pages.fel.cvut.cz/
     RISC-V education: https://comparch.edu.cvut.cz/
     Open Technologies Research Education and Exchange Services
     https://gitlab.fel.cvut.cz/otrees/org/-/wikis/home


--
--------------------------------------------
embedded brains GmbH & Co. KG
Herr Christian MAUDERER
Dornierstr. 4
82178 Puchheim
Germany
email:  [email protected]
phone:  +49-89-18 94 741 - 18
mobile: +49-176-152 206 08

Registergericht: Amtsgericht München
Registernummer: HRA 117265
Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
Unsere Datenschutzerklärung finden Sie hier:
https://embedded-brains.de/datenschutzerklaerung/