Re: CAN TX fail handling

2023-08-10 Thread Alan C. Assis
Hi Nathan,

On 8/10/23, Nathan Hartman  wrote:
> On Thu, Aug 10, 2023 at 4:38 AM Tim Hardisty 
> wrote:
>
>> I like your idea of IOCTLs - I will be revisiting this issue in the next
>> few weeks and will look to see what's involved in implementing this as it
>> "feels" right.
>>
>
> snip
>
>> In trying to cover potential board faults, I have found that if
>> there's something that prevents a CAN message reaching an
>> endpoint/destination, the CAN transmitter (of course, as I
>> understand it) is continuously retrying the message send, meaning
>> the test app hangs when you try and close the file once the test has
>> been deemed to fail. That is "by design" in the higher (i.e.
>> non-arch specific) can code as it waits for the TX FIFO/queue to empty
>> until the close is allowed.
>>
>> What is the correct POSIX way to handle this error condition?
>>
>>
> Sounds like in CAN we need the equivalent of tcflush() / tcdrain() as found
> in termios. (Try looking up the man page for these functions on your system
> or at online manpages.) In NuttX, at least for serial ports (i.e., UARTs),
> these functions call IOCTLs which (if I remember correctly) are partly
> implemented in the upper half driver (to clear the software buffer) and
> partly passed to the lower half driver (to flush the hardware FIFO, if
> applicable in the arch in question).
>
> I am not sure whether actual *termios* and its tc family of functions like
> tcflush() / tcdrain() are a good fit for CAN. Maybe they are and you can
> just adopt the same IOCTLs they use. But even if not, you can follow along
> how these are implemented in NuttX and do something very similar.
>

I think can4linux could be a good model to follow. It was used on
Linux before SocketCAN (and is still an option there).

https://en.wikipedia.org/wiki/Can4linux

https://gitlab.com/hjoertel/can4linux

BR,

Alan


Re: CAN TX fail handling

2023-08-10 Thread Nathan Hartman
On Thu, Aug 10, 2023 at 4:38 AM Tim Hardisty 
wrote:

> I like your idea of IOCTLs - I will be revisiting this issue in the next
> few weeks and will look to see what's involved in implementing this as it
> "feels" right.
>

snip

> In trying to cover potential board faults, I have found that if
> there's something that prevents a CAN message reaching an
> endpoint/destination, the CAN transmitter (of course, as I
> understand it) is continuously retrying the message send, meaning
> the test app hangs when you try and close the file once the test has
> been deemed to fail. That is "by design" in the higher (i.e.
> non-arch specific) can code as it waits for the TX FIFO/queue to empty
> until the close is allowed.
>
> What is the correct POSIX way to handle this error condition?
>
>
Sounds like in CAN we need the equivalent of tcflush() / tcdrain() as found
in termios. (Try looking up the man page for these functions on your system
or at online manpages.) In NuttX, at least for serial ports (i.e., UARTs),
these functions call IOCTLs which (if I remember correctly) are partly
implemented in the upper half driver (to clear the software buffer) and
partly passed to the lower half driver (to flush the hardware FIFO, if
applicable in the arch in question).

I am not sure whether actual *termios* and its tc family of functions like
tcflush() / tcdrain() are a good fit for CAN. Maybe they are and you can
just adopt the same IOCTLs they use. But even if not, you can follow along
how these are implemented in NuttX and do something very similar.
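
For reference, a minimal sketch of the termios calls mentioned above, as they would be used on a serial port. tcdrain() and tcflush() are standard POSIX functions; the wrapper names serial_wait_tx_done() and serial_discard_tx() are made up for illustration and are not NuttX APIs. Whether a CAN driver should expose the same semantics is exactly the open question here.

```c
#include <errno.h>
#include <termios.h>
#include <unistd.h>

/* Block until everything written to fd has been transmitted.
 * This is the "wait for TX to finish" behavior, so it can block for as
 * long as the hardware cannot send.
 * Returns 0 on success or a negated errno value on failure.
 */

int serial_wait_tx_done(int fd)
{
  return tcdrain(fd) == 0 ? 0 : -errno;
}

/* Discard output that was written but not yet transmitted.
 * This is the "give up on pending TX" behavior.
 * Returns 0 on success or a negated errno value on failure.
 */

int serial_discard_tx(int fd)
{
  return tcflush(fd, TCOFLUSH) == 0 ? 0 : -errno;
}
```

An application that has decided a test failed would call the discard variant before close(), so that close() has nothing left to wait for.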

Hope this helps,
Nathan


Re: CAN TX fail handling

2023-08-10 Thread Tim Hardisty
Thanks David - whilst I perhaps could have searched for that, it is why I
asked here as I was sure someone else was likely to have seen this.


I like your idea of IOCTLs - I will be revisiting this issue in the next 
few weeks and will look to see what's involved in implementing this as 
it "feels" right.


On 10/08/2023 09:04, David Sidrane wrote:

Tim,

See https://github.com/apache/nuttx/issues/3927

David

-----Original Message-----
From: Alan C. Assis
Sent: Wednesday, August 9, 2023 3:47 PM
To: dev@nuttx.apache.org
Cc: Pavel Pisa
Subject: Re: CAN TX fail handling

Hi Tim,

Agree! This behavior could be implemented in the driver, for example using
some elapsed time. But again, it needs to be analyzed carefully to avoid
introducing something too specific to one user's needs.

Currently can_close() tries to wait for TX to complete, which may never
happen because of this issue.

If you implement the idea of resetting the CAN controller in can_close(),
you need to guarantee that it will be reinitialized correctly, because
can_open() expects the CAN controller to be in a working state.

BR,

Alan

On 8/9/23, Tim Hardisty  wrote:

Thanks Alan,

I can see that a timeout/retry in detail would be hardware dependent.
But in the absence of "something," code can send a message, but have
no idea that it hasn't actually been sent, then try and close the "file"
and the thread will hang indefinitely. I think we need something that
reports the fail so some kind of recovery/reset can be attempted?

Perhaps the "close" could be wrapped with something to deal with this?
Or the open mode needs to be different somehow?

POSIX/Linux type programming is new to me, after decades of bare-metal
type software dev where I'm in total control albeit unique to a
given/chosen processor, so any suggestions would be very welcome.

On 09/08/2023 19:56, Alan C. Assis wrote:

Hi Tim,

I think the default behavior of a CAN controller is to keep trying to send
a message indefinitely; some HW can define a retry limit.

Please take a look:
https://forum.pjrc.com/threads/67435-FlexCAN-Infinite-Endless-TX-Retries

So, I'm not sure it makes sense to implement a CAN TX timeout on the
NuttX side, since this behavior could be HW dependent.

BR,

Alan

On 8/9/23, Tim Hardisty   wrote:

I am now cracking on with the app for my custom board, and in
parallel writing a production board-test app.

In trying to cover potential board faults, I have found that if
there's something that prevents a CAN message reaching an
endpoint/destination, the CAN transmitter (of course, as I
understand it) is continuously retrying the message send, meaning
the test app hangs when you try and close the file once the test has
been deemed to fail. That is "by design" in the higher (i.e.
non-arch specific) can code as it waits for the TX FIFO/queue to empty
until the close is allowed.

What is the correct POSIX way to handle this error condition?

Might it be better to use Socket CAN, for example, assuming it has
better error handling by design, or is the NuttX CAN "system"
fundamentally missing something to handle this (or, more likely,
I've just missed it )?



--

Regards,

Tim Hardisty

+44 (0) 1305 534535

JTi.uk.com <https://www.jti.uk.com/>

JTinnovations <https://www.facebook.com/JTinnovations/>

JT Innovations Ltd.
Registered office: 36 East St, Weymouth, Dorset, DT3 4DT, UK.
Company number 7619086
VAT Registration GB 111 7906 35




RE: CAN TX fail handling

2023-08-10 Thread David Sidrane
Tim,

See https://github.com/apache/nuttx/issues/3927

David

-----Original Message-----
From: Alan C. Assis 
Sent: Wednesday, August 9, 2023 3:47 PM
To: dev@nuttx.apache.org
Cc: Pavel Pisa 
Subject: Re: CAN TX fail handling

Hi Tim,

Agree! This behavior could be implemented in the driver, for example using
some elapsed time. But again, it needs to be analyzed carefully to avoid
introducing something too specific to one user's needs.

Currently can_close() tries to wait for TX to complete, which may never
happen because of this issue.

If you implement the idea of resetting the CAN controller in can_close(),
you need to guarantee that it will be reinitialized correctly, because
can_open() expects the CAN controller to be in a working state.

BR,

Alan

On 8/9/23, Tim Hardisty  wrote:
> Thanks Alan,
>
> I can see that a timeout/retry in detail would be hardware dependent.
> But in the absence of "something," code can send a message, but have
> no idea that it hasn't actually been sent, then try and close the "file"
> and the thread will hang indefinitely. I think we need something that
> reports the fail so some kind of recovery/reset can be attempted?
>
> Perhaps the "close" could be wrapped with something to deal with this?
> Or the open mode needs to be different somehow?
>
> POSIX/Linux type programming is new to me, after decades of bare-metal
> type software dev where I'm in total control albeit unique to a
> given/chosen processor, so any suggestions would be very welcome.
>
> On 09/08/2023 19:56, Alan C. Assis wrote:
>> Hi Tim,
>>
>> I think the default behavior of a CAN controller is to keep trying to send
>> a message indefinitely; some HW can define a retry limit.
>>
>> Please take a look:
>> https://forum.pjrc.com/threads/67435-FlexCAN-Infinite-Endless-TX-Retries
>>
>> So, I'm not sure it makes sense to implement a CAN TX timeout on the
>> NuttX side, since this behavior could be HW dependent.
>>
>> BR,
>>
>> Alan
>>
>> On 8/9/23, Tim Hardisty  wrote:
>>> I am now cracking on with the app for my custom board, and in
>>> parallel writing a production board-test app.
>>>
>>> In trying to cover potential board faults, I have found that if
>>> there's something that prevents a CAN message reaching an
>>> endpoint/destination, the CAN transmitter (of course, as I
>>> understand it) is continuously retrying the message send, meaning
>>> the test app hangs when you try and close the file once the test has
>>> been deemed to fail. That is "by design" in the higher (i.e.
>>> non-arch specific) can code as it waits for the TX FIFO/queue to empty
>>> until the close is allowed.
>>>
>>> What is the correct POSIX way to handle this error condition?
>>>
>>> Might it be better to use Socket CAN, for example, assuming it has
>>> better error handling by design, or is the NuttX CAN "system"
>>> fundamentally missing something to handle this (or, more likely,
>>> I've just missed it )?
>>>
>>>


Re: CAN TX fail handling

2023-08-09 Thread Alan C. Assis
Hi Tim,

Agree! This behavior could be implemented in the driver, for example
using some elapsed time. But again, it needs to be analyzed carefully to
avoid introducing something too specific to one user's needs.

Currently can_close() tries to wait for TX to complete, which may
never happen because of this issue.
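
As a rough illustration of the "elapsed time" idea, here is a minimal sketch of a bounded wait, assuming the driver has some way to ask whether the TX queue is empty. The names wait_tx_empty(), tx_empty_fn and fake_tx_empty() are hypothetical and do not come from the NuttX CAN driver; the sketch uses plain POSIX clock_gettime() and nanosleep() so that it can be tried on a host machine.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical callback: returns true once the TX queue/FIFO is empty. */

typedef bool (*tx_empty_fn)(void *arg);

static long long now_ms(void)
{
  struct timespec ts;

  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Poll tx_empty() until it reports true or timeout_ms has elapsed.
 * Returns 0 if the queue drained, -ETIMEDOUT otherwise.
 */

int wait_tx_empty(tx_empty_fn tx_empty, void *arg, unsigned int timeout_ms)
{
  const struct timespec nap = { 0, 1000000 }; /* 1 ms between polls */
  long long deadline = now_ms() + timeout_ms;

  assert(tx_empty != NULL);

  while (!tx_empty(arg))
    {
      if (now_ms() >= deadline)
        {
          return -ETIMEDOUT;
        }

      nanosleep(&nap, NULL);
    }

  return 0;
}

/* Stand-in for real hardware: reports "empty" after *arg more polls. */

bool fake_tx_empty(void *arg)
{
  int *polls_left = arg;

  if (*polls_left > 0)
    {
      (*polls_left)--;
      return false;
    }

  return true;
}
```

With a helper like this, a close path could return after the timeout instead of blocking forever. What it should then do with the frames still queued is a separate design decision.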

If you implement the idea of resetting the CAN controller in
can_close(), you need to guarantee that it will be reinitialized
correctly, because can_open() expects the CAN controller to be in a
working state.
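
The reset/reinitialize constraint can be sketched as a small state model. Everything below (struct can_dev, the state names, sketch_can_open(), sketch_can_close()) is hypothetical and only illustrates the invariant; it is not how the NuttX upper-half driver is structured.

```c
#include <stddef.h>

/* Hypothetical controller states, for illustration only. */

enum can_state
{
  CAN_UNINIT,   /* hardware must be initialized before use */
  CAN_READY     /* hardware is in a known working state */
};

struct can_dev
{
  enum can_state state;
  int pending_tx;   /* frames still queued for transmission */
  int init_count;   /* how many times the hardware was (re)initialized */
};

/* Bring the controller into a known working state. */

static void can_hw_init(struct can_dev *dev)
{
  dev->pending_tx = 0;
  dev->state = CAN_READY;
  dev->init_count++;
}

/* open(): never assume the previous close left the hardware usable. */

void sketch_can_open(struct can_dev *dev)
{
  if (dev->state != CAN_READY)
    {
      can_hw_init(dev);
    }
}

/* close(): if frames are still pending, reset instead of waiting forever.
 * The reset drops the queue and records that init is required again.
 */

void sketch_can_close(struct can_dev *dev)
{
  if (dev->pending_tx > 0)
    {
      dev->pending_tx = 0;
      dev->state = CAN_UNINIT;
    }
}
```

The point of the sketch is that the reset in close must leave a marker that open checks, so that open restores the working state instead of assuming it.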

BR,

Alan

On 8/9/23, Tim Hardisty  wrote:
> Thanks Alan,
>
> I can see that a timeout/retry in detail would be hardware dependent.
> But in the absence of "something," code can send a message, but have no
> idea that it hasn't actually been sent, then try and close the "file"
> and the thread will hang indefinitely. I think we need something that
> reports the fail so some kind of recovery/reset can be attempted?
>
> Perhaps the "close" could be wrapped with something to deal with this?
> Or the open mode needs to be different somehow?
>
> POSIX/Linux type programming is new to me, after decades of bare-metal
> type software dev where I'm in total control albeit unique to a
> given/chosen processor, so any suggestions would be very welcome.
>
> On 09/08/2023 19:56, Alan C. Assis wrote:
>> Hi Tim,
>>
>> I think the default behavior of a CAN controller is to keep trying to send
>> a message indefinitely; some HW can define a retry limit.
>>
>> Please take a look:
>> https://forum.pjrc.com/threads/67435-FlexCAN-Infinite-Endless-TX-Retries
>>
>> So, I'm not sure it makes sense to implement a CAN TX timeout on the
>> NuttX side, since this behavior could be HW dependent.
>>
>> BR,
>>
>> Alan
>>
>> On 8/9/23, Tim Hardisty  wrote:
>>> I am now cracking on with the app for my custom board, and in parallel
>>> writing a production board-test app.
>>>
>>> In trying to cover potential board faults, I have found that if there's
>>> something that prevents a CAN message reaching an endpoint/destination,
>>> the CAN transmitter (of course, as I understand it) is continuously
>>> retrying the message send, meaning the test app hangs when you try and
>>> close the file once the test has been deemed to fail. That is "by
>>> design" in the higher (i.e. non-arch specific) can code as it waits for
>>> the TX FIFO/queue to empty until the close is allowed.
>>>
>>> What is the correct POSIX way to handle this error condition?
>>>
>>> Might it be better to use Socket CAN, for example, assuming it has
>>> better error handling by design, or is the NuttX CAN "system"
>>> fundamentally missing something to handle this (or, more likely, I've
>>> just missed it )?
>>>
>>>


Re: CAN TX fail handling

2023-08-09 Thread Tim Hardisty

Thanks Alan,

I can see that a timeout/retry in detail would be hardware dependent. 
But in the absence of "something," code can send a message, but have no 
idea that it hasn't actually been sent, then try and close the "file" 
and the thread will hang indefinitely. I think we need something that 
reports the fail so some kind of recovery/reset can be attempted?
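
One application-side way to get a failure report, sketched under the assumption that the device supports poll() and signals POLLOUT only while there is room in its TX queue (not verified here for the NuttX CAN character driver). write_with_timeout() is a made-up helper name; poll() and write() are standard POSIX calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Write len bytes to fd, but give up if fd does not become writable
 * within timeout_ms. Returns the number of bytes written, or a negated
 * errno value (-ETIMEDOUT if the device never became writable).
 */

ssize_t write_with_timeout(int fd, const void *buf, size_t len,
                           int timeout_ms)
{
  struct pollfd pfd = { .fd = fd, .events = POLLOUT, .revents = 0 };
  ssize_t nwritten;
  int ready;

  assert(buf != NULL);

  ready = poll(&pfd, 1, timeout_ms);
  if (ready < 0)
    {
      return -errno;
    }

  if (ready == 0)
    {
      return -ETIMEDOUT;   /* queue never drained: tell the caller */
    }

  if (pfd.revents & (POLLERR | POLLHUP | POLLNVAL))
    {
      return -EIO;
    }

  nwritten = write(fd, buf, len);
  return nwritten < 0 ? -errno : nwritten;
}
```

Note the limitation: this only reports a problem once the queue has filled up, and it does nothing about the blocking close(), so it would complement a driver-level fix rather than replace it.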


Perhaps the "close" could be wrapped with something to deal with this? 
Or the open mode needs to be different somehow?


POSIX/Linux type programming is new to me, after decades of bare-metal 
type software dev where I'm in total control albeit unique to a 
given/chosen processor, so any suggestions would be very welcome.


On 09/08/2023 19:56, Alan C. Assis wrote:

Hi Tim,

I think the default behavior of a CAN controller is to keep trying to send
a message indefinitely; some HW can define a retry limit.

Please take a look:
https://forum.pjrc.com/threads/67435-FlexCAN-Infinite-Endless-TX-Retries

So, I'm not sure it makes sense to implement a CAN TX timeout on the
NuttX side, since this behavior could be HW dependent.

BR,

Alan

On 8/9/23, Tim Hardisty  wrote:

I am now cracking on with the app for my custom board, and in parallel
writing a production board-test app.

In trying to cover potential board faults, I have found that if there's
something that prevents a CAN message reaching an endpoint/destination,
the CAN transmitter (of course, as I understand it) is continuously
retrying the message send, meaning the test app hangs when you try and
close the file once the test has been deemed to fail. That is "by
design" in the higher (i.e. non-arch specific) can code as it waits for
the TX FIFO/queue to empty until the close is allowed.

What is the correct POSIX way to handle this error condition?

Might it be better to use Socket CAN, for example, assuming it has
better error handling by design, or is the NuttX CAN "system"
fundamentally missing something to handle this (or, more likely, I've
just missed it )?





Re: CAN TX fail handling

2023-08-09 Thread Alan C. Assis
Hi Tim,

I think the default behavior of a CAN controller is to keep trying to send
a message indefinitely; some HW can define a retry limit.

Please take a look:
https://forum.pjrc.com/threads/67435-FlexCAN-Infinite-Endless-TX-Retries

So, I'm not sure it makes sense to implement a CAN TX timeout on the
NuttX side, since this behavior could be HW dependent.
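
To illustrate why retries can go on forever when nothing acknowledges a frame, here is a small model of the transmit error counter (TEC), based on the CAN 2.0 fault-confinement rules as I understand them: a transmit error adds 8, a node is error passive above 127 and bus-off above 255, and an error-passive transmitter that only sees a missing ACK does not increment further. The function names are made up, and real controllers may add retry limits or one-shot modes on top of this.

```c
#include <stdbool.h>

enum can_err_state
{
  ERROR_ACTIVE,
  ERROR_PASSIVE,
  BUS_OFF
};

/* Classify a transmit error counter value. */

enum can_err_state classify_tec(unsigned int tec)
{
  if (tec > 255)
    {
      return BUS_OFF;
    }

  if (tec > 127)
    {
      return ERROR_PASSIVE;
    }

  return ERROR_ACTIVE;
}

/* TEC after one failed transmission.
 * A transmit error normally adds 8. Exception modeled here: an
 * error-passive node whose only error is a missing ACK leaves the
 * counter unchanged.
 */

unsigned int tec_after_tx_error(unsigned int tec, bool ack_error_only)
{
  if (ack_error_only && classify_tec(tec) == ERROR_PASSIVE)
    {
      return tec;
    }

  return tec + 8;
}
```

In this model a lone node with no receiver climbs to error passive and stays there, so it never reaches bus-off and the frame remains pending. That matches the endless-retry behavior described in the linked forum thread.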

BR,

Alan

On 8/9/23, Tim Hardisty  wrote:
> I am now cracking on with the app for my custom board, and in parallel
> writing a production board-test app.
>
> In trying to cover potential board faults, I have found that if there's
> something that prevents a CAN message reaching an endpoint/destination,
> the CAN transmitter (of course, as I understand it) is continuously
> retrying the message send, meaning the test app hangs when you try and
> close the file once the test has been deemed to fail. That is "by
> design" in the higher (i.e. non-arch specific) can code as it waits for
> the TX FIFO/queue to empty until the close is allowed.
>
> What is the correct POSIX way to handle this error condition?
>
> Might it be better to use Socket CAN, for example, assuming it has
> better error handling by design, or is the NuttX CAN "system"
> fundamentally missing something to handle this (or, more likely, I've
> just missed it )?
>
>


CAN TX fail handling

2023-08-09 Thread Tim Hardisty
I am now cracking on with the app for my custom board, and in parallel 
writing a production board-test app.


In trying to cover potential board faults, I have found that if there's 
something that prevents a CAN message reaching an endpoint/destination, 
the CAN transmitter (of course, as I understand it) is continuously 
retrying the message send, meaning the test app hangs when you try and 
close the file once the test has been deemed to fail. That is "by
design" in the higher (i.e. non-arch specific) CAN code, as it waits for
the TX FIFO/queue to empty before the close is allowed.


What is the correct POSIX way to handle this error condition?

Might it be better to use Socket CAN, for example, assuming it has 
better error handling by design, or is the NuttX CAN "system" 
fundamentally missing something to handle this (or, more likely, I've
just missed it)?