Hello Christian, thank you for the thorough review. I am currently at the international CAN Conference at Baden-Baden, so I will address this once I return by the end of the week.
Best regards, Michal Lenc On 14. 05. 24 10:10, Christian MAUDERER wrote: > On 2024-05-13 17:40, Christian MAUDERER wrote: >> Hello Pavel and Michal, >> >> sorry for the late reply. I was on vacation last week. >> >> On 2024-05-06 11:27, Pavel Pisa wrote: >>> Dear Christian, >>> >>> On Tuesday 30 of April 2024 08:40:43 Christian MAUDERER wrote: >>>>> For others, code under review hosted in CTU university GitLab >>>>> server >>>>> https://gitlab.fel.cvut.cz/otrees/rtems/rtems-canfd >>>>> Documentation >>>>> >>>>> https://otrees.pages.fel.cvut.cz/rtems/rtems-canfd/doc/can/can-html/can.html >>>>> >>>>> https://otrees.pages.fel.cvut.cz/rtems/rtems-canfd/doc/doxygen/html/index.html >>>>> >>>>> Main developer behind extension to CAN FD and switch to RTEMS >>>>> is Michal Lenc. >>>>> >>>>> The intention is to (hopefully) reach state when it meets criteria >>>>> to mainlining int RTEMS CPU kit under >>>>> >>>>> cpukit/dev/can >>> ... >>>>> I agree, that it is compromise. But adding yet another file >>>>> descriptor >>>>> like multiplexor for queues to each file descriptor seems to me as >>>>> too much complexity. But it can be added. even later as IOCTL to >>>>> remove >>>>> individual queues based on CAN ID matches or queues IDs if create >>>>> is modified to return internal queue IDs... >>>> >>>> I somehow missed that you can open the device multiple times and get >>>> independent queues. With that, it's completely OK and should be >>>> flexible >>>> enough for most applications. >>>> >>>> It's great that you already have put some thought into how it could be >>>> extended later if some application needs more flexibility. >>> ... >>>>>> Did you check with >>>>>> some other hardware controller, whether the whole structures / >>>>>> defines >>>>>> / flags close to the hardware do work well for other controllers >>>>>> too? >>>>> >>>>> The code/concept is based on my previous LinCAN and OrtCAN work >>>>> >>>>> https://ortcan.sourceforge.net/lincan/ >>> ... >>>> I didn't want to doubt your competence. Like I said it's some trap >>>> that >>>> I have fallen into often enough myself (like when guiding Prashanths >>>> GSoC project). But it's clear that you have put a lot of thought into >>>> that. So I would expect that there shouldn't be much trouble with most >>>> controllers. Maybe except for the ones where a semiconductor vendor >>>> thought it would be a good idea to create a completely different >>>> concept. But these are always difficult. >>> >>> I agree with discussion and searching for hard arguments. >>> >>> The solution is compromise and in general CAN bus concept >>> is optimized for direct replacement of wires in car >>> going between distinc units and its use as general >>> communication solution has some difficulties and requires >>> some compromises. >>> >>> For small devices with predefined purpose and Autosar, >>> it is ideal to allocate for each CAN ID (wire signal) >>> to be sent one communication object on the controller. >>> Same for each received signal value or their set in the >>> single frame. The most controllers are equipped by filters >>> and mechanism to do so including selection of the >>> Tx message object for physical bus-link arbitration >>> according to the priority. Then sending side updates >>> signal value in corresponding Tx object and receiving >>> side sees most actual one usually on the best effort basis, >>> older unread frames are overwritten by updated value. >>> >>> But even in simple ECU, there are obstacles to use this >>> principle in all kind of the communication. CAN bus is used >>> for firmware updates and general configuration. In this >>> case, the reliable delivery of all messages with given >>> CAN ID is required because whole sequence has to be >>> received and processed and the state evolution is associated >>> to the sequence. If a single message is lost, then all >>> data are unusable. Because sequence requires exact ordering >>> it is typical that only single Tx object is used. On Rx >>> side there can be problem to capture all frames without >>> overwrite by single Rx object so some controllers ad FIFO >>> which can be attached to each object or some mechanism >>> how to allocate more Rx objects and pass them to the user >>> in FIFO order. >>> >>> That works for small ECUs with single purpose firmware. >>> But on general purpose operating system which should >>> allow even complete monitoring of the CAN bus, allows >>> dynamically started applications and even whole virtual >>> CAN/CANopen nodes, allocation the controller Tx/Rx message >>> objects for each specific purpose is impossible. >>> >>> That is why all generic CAN subsystems which I know >>> (CAN4Linux, LinCAN, SockteCAN, NuttX char device CAN, >>> windows Peak's drivers etc.) define API based on >>> opening driver and presenting received messages >>> in FIFO order to application (with options for software >>> filtering but usually not propagated to controller, >>> HW - LinCAN has some option to union user FIFOs to >>> mask and ID propagated to HW, but you usually end with >>> fully end with need to receve all anyway and it has not been >>> used at the end). The Tx FIFO order is required for messages >>> with same ID or even sometimes between same stream of mesages >>> even wit altering ID for correct realization of some higher >>> level protocols. >>> >>> The result is that even on hardware equipped with multiple >>> Tx objects but without special Tx FIFO order preserving >>> cyclic queue only single Tx object is used to realize >>> transmission of all messages, for example SocketCAN on >>> XCAN controller. So only part of the CAN bus media >>> badwidth can be utilized by single node. May it be, it is sometimes >>> a luck, because CAN IDs are not correctly allocated according >>> to priority even on cars critical subsystems. On the Rx side original >>> buffers approach is hard to use in order preserving FIFO concept, >>> but the most of today controllers add some option to keep order >>> and leave processing and distribution on software side. >>> See evolution from CCAN to DCAN to overcome that problem. >>> We have even made LinCAN for CCAN many many years ago >>> which somehow kept required properties but it was headache. >>> >>> So back to generic OS can interfaces, all I know are FIFO(s) >>> based. Most of them keep strict FIFO order on Tx side >>> which results in HoL (head-of-the-line) blocking and priority >>> inversion on bus loaded by middle priority from other node. >>> >>> That is why SocketCAN adds alloc_candev_mqs (multiple-queues) >>> alternative >>> for drivers >>> >>> https://elixir.bootlin.com/linux/latest/source/drivers/net/can/dev/dev.c#L249 >>> >>> >>> but as I know, no mainline kernel driver is using that. >>> We have done some work to research and even a little extend >>> Linux networking QoS subsystem to solve buffer bloat by old >>> messages for traffic requiring best effort (most up to date >>> data for control) for given IDs and to limit badwidth >>> of others or virtual guests connected through QEMU to >>> physical bus etc. may years ago at time when multi-queue >>> has not been available on Linux side. I have long time plan >>> to extend CTU CAN FD mainline Linux driver for this support >>> and probably to be the first example how to overcome HoL/priority >>> inversion in Linux CAN subsystem. It has been planned in original >>> LinCAN before SoketCAN and it is now implemented in proposed >>> RTEMS CAN/FD framework where application can setup multiple >>> queues even for single open instance with different Tx priority >>> class and when used and mapped correctly to CAN IDs, it can >>> prevent priority inversion. It is not generic, because it is >>> quite expensive for deeper FIFOs and even mutual order of >>> Tx messages has to be preserved for many protocols as discussed >>> earlier. CTU CAN FD IP core interface to software has been architected >>> by me to allow maximal utilization of the Tx buffers and their >>> reallocation when needed for higher priority message. >>> Wait for DTP processing and publication of our international CAN >>> Conference 2024 article or come and meet next week in Baden-Baden >>> >>> https://www.can-cia.org/icc/ >>> >>> There are two branches of the thought from this point >>> >>> 1) how it maps to other controllers >>> >>> For these equipped by single Tx object only (i.e. SJA1000), >>> it maps well because attempt to repeat Tx and arbitration >>> can be disabled when higher priority queue becomes ready >>> and our CAN infrastructure allows to push back lower >>> priority message and schedule higher one to be sent. >>> >>> For more complex one, if they do not allow to control Tx objects >>> order then only single Tx object can be used. Bad, link >>> underutilization, >>> but it is what is standard in SocketCAN and other CAN solutions >>> for general purpose operating systems today. All controllers >>> which I know allows to stop Tx attempt repeat and I hope to >>> seen at all option t check if the latest attempt has been >>> successful or not. So newt RTEMS CAN can use them same >>> as on SJA1000. On Rx side, most have FIFO preserving >>> option to use multiple buffers. Sometimes partially >>> broken, burdened by erratas etc. (like iMX RT where >>> we overcome these problems in NuttX drivers). >>> When number of Tx priority classes is limited (for proposed >>> system by default 3 but compile time configurable) then >>> we can allocate one Tx buffer for each class, easy and >>> preserves HoL priority inversion even on simple controllers. >>> If there is option to order Tx according to the buffer >>> index in the controller, then there is option for a little >>> more performant solution when multiple Tx buffers are allocated >>> for each class and they are sequentially filled till highest >>> allocated buffer index is filled. Then there is some gap till >>> all these buffers in given priority are sent because >>> cyclic filling of the minimal index would result in reordering >>> with possible break of some protocol requirements. >>> Some controllers allows to attach DMA realized FIFOs to more >>> Tx objects, in such case it would map to proposed design well >>> too. Some newer controllers adds local priority bits above >>> CAN ID ones (i.e. new NXP FlexCAN). This could allow cyclic >>> use of some Tx objects/buffers similar to CTU CAN FD. >>> There will be problems because multiple Tx buffers priorities >>> are not reachable by single atomic operation like in CTU CAN FD >>> case. But I have some idea how to implement sequential >>> updates to ensure order in the class. There would be problem, >>> that most controllers do not allow to update this information >>> on the objects participating actively in arbitration. So it would >>> lead to much more acrobation between eggs and some gap time, >>> where none message is offered in the link arbitration even that >>> there are pending user requests will be inevitable in some >>> scenarios after some number of messages sent. That cannot >>> be on the bus side worse that considering fixed order according >>> to index. May be, it can be found that overhead does not worth >>> that. But we preserve API in variants in all cases... >>> >>> 2) use of the CAN bus in applications requiring maximal bus >>> transparency with minimal latency and SW load. This is >>> totally opposite of the general CAN bus subsystem for >>> general purpose RTOS. The API in this case should allocated >>> Tx and Rx controller objects for the individual purposes/CAN IDs. >>> Rx side SW processing can be considered as alternative and proposed >>> framework allows to setup queues, but it has overhead and under >>> extreme load it can lost some messages if HW is not performant enough. >>> On Tx side it is even more problematic. >>> >>> But if this type of use of RTEMS for example for Autosar or Simulink >>> generated code is considered then it is possible to extend actual >>> proposed API by IOCTLs which allows to reserve some controller >>> objects for specific purposes and allows to access them directly >>> for minimal overhead and use under direct application control or attach >>> separated controller side "canque_ends_dev_t" to such objects and >>> propagate them to some clients to standard CAN read and write API. >>> >>> So I think that the proposed framework provides what is expected >>> bu most of general purpose CAN/CAN FD framework users, tries to >>> perpare a little even for come of CAN XL, solves problems which >>> may be practically unsolved by all other generic approaches still. >>> And we have some clue how to extend support for most/all other >>> controllers and even some open doors to offer even ECU style >>> API for applications which benefit from direct controller >>> buffers use/allocation which is possible on controllers >>> with abundant number of buffers (not case of SJA1000 >>> and very limited on CTU CAN FD - max 8 can be configured >>> to silicon under actual registers map). >>> >>> I understand that the text is long but you have asked for >>> it in the fact and I provide complete thought dump >>> to analyes it. >> >> Thanks for the (very) detailed explanation. My intention was to >> express that I'm completely OK with only one driver because you >> clearly have thought about other hardware too. Your explanation just >> makes it even more clear how much thought you put into it ;) >> >>> >>> I would be happy if you and or others find time to look >>> into actual code implementation to identify what could >>> be issue for mainlining as soon as possible because >>> after May 24 changes do not propagate into Michal Lenc's >>> thesis text which can be alternative and more in depth >>> documentation and analysis than what fits into official >>> RTEMS one. The full document has already 47 pages and >>> 34 of the actual text without content and appendices. >>> Document includes benchmarks under RTEMS load by HTTP >>> traffic, priority inversion prevention confirmation >>> by measurements with performance data etc. >>> It will be published on CTU in May or June >>> https://dspace.cvut.cz/ >>> and links will be added to >>> https://canbus.pages.fel.cvut.cz/ >>> same as for much shorter iCC article and presentation. >>> >> >> Code review without patches or a review system is always a bit more >> effort because there is nothing to add comments directly. It seems >> that I can't register on the gitlab instance that you provided. So >> let's try it here. >> >> I'll mainly take a look at the headers because they define the >> interface. That's the most important part if you ask me. Bugs in the >> code should be fixable later. >> >> I'll try to categorize my comments a bit. If it has a *Style* or >> *Typo* in front of it, you can just ignore it. It's not really >> important. It's just something that I noted while reading through the >> code. *Question* or *Note* are more important. >> >> And please note: You know CAN a lot better than me. So quite possible >> that I don't see a lot of stuff and that I might have some odd odd >> questions. >> >> >> ### First the ones that I plan to more or less ignore so that I can >> concentrate on the important parts: >> >> Test or demo apps. So most likely not relevant for a review: >> >> ./rtems_can_test/can_test.h >> ./rtems_can_test/app_def.h >> ./rtems_can_test/system.h >> ./rtems_can_test/can_register.h >> ./zynq_template/app_def.h >> ./zynq_template/zynq_reg.h >> ./zynq_template/system.h >> >> Seems to be left over from some tests: >> >> ./lib/libbar/bar.h >> >> Driver specific files. I think these are not that high priority either: >> >> ./lib/candrv/ctucanfd/ctucanfd_txb.h >> ./lib/candrv/ctucanfd/ctucanfd_kframe.h >> ./lib/candrv/ctucanfd/ctucanfd_kregs.h >> ./lib/candrv/dev/can/ctucanfd.h >> >> >> ### Now the more important ones: The interfaces >> >> #### ./lib/candrv/dev/can/can.h >> >> *Style*: I would suggest to group defines a bit more. You already >> used prefixes like RTEMS_CAN_QUEUE_* which is great. You can improve >> that a bit more if you use Doxygen "@name" and "@{" ... "@}". For an >> example take a look at >> >> https://gitlab.rtems.org/rtems/rtos/rtems/-/blob/main/cpukit/include/dev/i2c/i2c.h?ref_type=heads#L80 >> >> >> Which leads to a group in the doxygen output: >> >> https://docs.rtems.org/doxygen/branches/master/cpukit_2include_2dev_2i2c_2i2c_8h.html >> >> >> The same is true for some other defines in other files. I won't >> mention it every time. >> >> >> *Question*: Why do you prefix some defines with RTEMS (like >> RTEMS_CAN_CHIP_MODE) and others don't have that prefix (like >> CAN_CTRLMODE_LOOPBACK)? The same is true for some other defines in >> other files. I won't mention it every time. >> >> >> *Style*: Sometimes you use Doxygen @brief. Sometimes \ref. I think it >> works, but it's a bit odd. >> >> >> *Question*: You have ioctls like RTEMS_CAN_DISCARD_QUEUES. According >> to the description, that ioctl has a parameter. Why is it an _IO and >> not an _IOW? The same is true for some more of the _IO ioctls. >> >> >> *Typo*: The description of RTEMS_CAN_GET_BITTIMING has a "geets" in >> it's comment. >> >> >> *Note*: struct can_chip_ops doesn't have a description. I'm not >> entirely sure what every member should do. For example: >> check_bittiming: From the name I would expect that it only checks a >> bit timing. From the ctucanfd.c it also sets a bit timing. I think it >> would be good if you would add short descriptions here like "Check >> and set a bittiming. Returns 0 on success or negated errno on error." >> >> >> *Detail*: can_bus_register(...) and can_bitrate2bittiming are missing >> a description. If they are a public interface, it would be good if >> they would have one. >> >> >> #### ./lib/candrv/dev/can/can-frame.h >> >> *Detail*: Description of the can_frame_header. You have field >> descriptions like "This member holds the CAN ID value". My first >> thought was that it is some kind of address. But with taking a look >> at the code, it seems that it is a bit mask that is combined out of >> CAN_ERR_ID_* defines. If a field is expected to contain a certain >> group of defines: Can you add a note regarding that? >> >> >> *Style*: You have a group of defines like CAN_ERR_*_DATA_BYTE. On the >> first glance, I thought it would be the same group as CAN_ERR_ID_*. I >> think I would have used CAN_ERR_DATA_BYTE_* instead. Of course that's >> a style question and there are always good arguments for any style. >> Again: A doxygen group using @{ and @} might achieve the same. >> >> >> *Typo*: You have a CAN_ERR_PROT_LOC_DARA_BYTE instead of ..._DATA_BYTE. >> >> >> *Question*: There are a lot of defines in can-frame.h like >> CAN_ERR_PROT_LOC_DATA or CAN_ERR_TRX_UNSPEC. Is it clear for someone >> more used to CAN, how to use these? Or would a description like >> "Possible values for the Byte xyz in a can message" help? >> >> >> #### ./lib/candrv/dev/can/can-filter.h >> >> *Detail*: Again: I'm not happy with the descriptions of the fields of >> the structure. A field "flags" that is described as "This member >> holds CAN flags" isn't really helpful. Which values can I assign to >> that field? Is it a bit mask? Is it a field defined according to some >> standard? In that case even a "Holds standard CAN flags" would be >> useful because then I know that I just have to take a look at any CAN >> documentation. >> >> >> #### ./lib/candrv/dev/can/can-devcommon.h >> >> OK. >> >> >> #### ./lib/candrv/dev/can/can-helpers.h >> >> *Note*: I don't like global defines like MAX_MSGOBJS without a >> prefix. That's polluting the name space. Is there a reason that it >> doesn't have the CAN or RTEMS_CAN prefix like all other defines? >> >> Similar: There are defines like "BIT". Is there a reason for using >> such a generic name? If it (for example) helps porting existing >> drivers from another stack, that's great. Otherwise, I don't like >> these names. >> >> Some more in this file are: len2dlc, GENMASK, FIELD_PREP, FIELD_GET >> >> >> Now I'm running out of time. I'll try to take a look at the following >> files later or tomorrow: >> > > Like promised: > > #### ./lib/candrv/dev/can/can-queue.h > > *Note*: The doxygen documentation of struct canqueue_slot_t didn't > work as expected: You used for example @next to describe the "next" > field. That clearly didn't work: > https://otrees.pages.fel.cvut.cz/rtems/rtems-canfd/doc/doxygen/html/structcanque__slot__t.html > > > *Question*: You use the Atomic_Uint from rtems/score/atomic.h for the > slot_flags. For new code, I would suggest using the atomic_uint from > stdatomic.h instead (C11). You have included that file already, so it > shouldn't be a problem. > > > *Question*: Why is string.h included in that header? I don't see any > str*, mem* or stp* functions used. > > > *Question*: Some of the functions have a bit of a short description. > For example the canqueue_filter_match: The brief description basically > just tells me exactly what the name of the function already told me. > But how and in what situation would I use that function? How do these > filters work? I can't tell that easily from the description or from > the implementation of the function. Is it even thought as an interface > that a user (in this case: someone writing a driver or an application) > has to understand? > > Maybe I have a basic problem here: Which headers are (more or less) > public ones (used to write drivers and applications) and which ones > are internal only? Or in other words: Which headers will be installed? > > For this file, I strongly suspect that it is not user-facing. You have > a lot of undocumented functions in it (like canqueue_next_inedge or > canque_for_each_inedge). > > The files that are really relevant for review at the current point are > mainly the ones that a user (again: someone writing a driver or an > application) can see. > > If it is a public facing header: Did I miss some general description > how a filter works and how a user should use it? It's quite possible > that I missed that. I'm still only scratching the surface of your work > at the moment. > > > #### ./lib/candrv/dev/can/can-stats.h > > Looks OK. > > #### ./lib/candrv/dev/can/can-virtual.h > > OK. > > #### ./lib/candrv/dev/can/can-bittiming.h > > *Detail*: can_bittiming_const has a name with a fixed 32 char length. > In ctucanfd.c you initialize that with a constant string. Is there > some reason to have a fixed length string in RAM instead of using a > pointer to a constant string that can have an arbitrary length and can > be (for example) in the Flash? > > > Best regards > > Christian > >> >> Best regards >> >> Christian >> >>> Best wishes, >>> >>> Pavel >>> -- >>> Pavel Pisa >>> >>> phone: +420 603531357 >>> e-mail: p...@cmp.felk.cvut.cz >>> Department of Control Engineering FEE CVUT >>> Karlovo namesti 13, 121 35, Prague 2 >>> university: http://control.fel.cvut.cz/ >>> personal: http://cmp.felk.cvut.cz/~pisa >>> company: https://pikron.com/ PiKRON s.r.o. >>> Kankovskeho 1235, 182 00 Praha 8, Czech Republic >>> projects: https://www.openhub.net/accounts/ppisa >>> social: https://social.kernel.org/ppisa >>> CAN related:http://canbus.pages.fel.cvut.cz/ >>> RISC-V education: https://comparch.edu.cvut.cz/ >>> Open Technologies Research Education and Exchange Services >>> https://gitlab.fel.cvut.cz/otrees/org/-/wikis/home >> > _______________________________________________ devel mailing list devel@rtems.org http://lists.rtems.org/mailman/listinfo/devel