Re: Sleep Resolution

2021-03-24 Thread Xiang Xiao
Another way to avoid the calibration is to reuse the hardware timer in the
busy loop:
https://github.com/apache/incubator-nuttx/blob/master/drivers/timers/arch_alarm.c#L60-L74
https://github.com/apache/incubator-nuttx/blob/master/drivers/timers/arch_timer.c#L122-L144
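
The idea above, polling a free-running timer instead of counting calibrated loop iterations, can be sketched as follows. This is only an illustration: it uses POSIX clock_gettime(CLOCK_MONOTONIC) as a stand-in for the hardware counter, and busy_wait_us() and now_ns() are hypothetical names, not the NuttX implementation linked above.

```c
#ifndef _POSIX_C_SOURCE
#  define _POSIX_C_SOURCE 200809L
#endif

#include <stdint.h>
#include <time.h>

/* Read a free-running monotonic counter in nanoseconds.
 * Assumption: clock_gettime() stands in for the hardware timer here.
 */

static uint64_t now_ns(void)
{
  struct timespec ts;

  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Spin until at least 'us' microseconds have elapsed.  No calibration
 * constant is needed because the loop exit depends on the timer, not on
 * how fast the CPU executes the loop body.
 */

static void busy_wait_us(uint32_t us)
{
  uint64_t start = now_ns();
  uint64_t wait  = (uint64_t)us * 1000ull;

  while (now_ns() - start < wait)
    {
      /* Busy loop: just keep polling the timer */
    }
}
```

The shortest delay this can produce is bounded by the cost of reading the timer, so it suits microsecond delays better than the 50 ns case discussed in this thread.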

On Thu, Mar 25, 2021 at 11:42 AM Gregory Nutt  wrote:

>
> > Why not call up_udelay or up_mdelay? The arch/soc should provide a best
> > implementation for you.
>
> I was wondering that too.
>
> Also, as a side note, it is very important to calibrate the delay loop
> used in those functions.  If the delay loop is properly calibrated,
> these can be very accurate (but I suspect most people no longer
> calibrate the delay loop).
>
> There is an app at apps/examples/calib_udelay that can be used to do that.
>
>


Re: Sleep Resolution

2021-03-24 Thread Gregory Nutt




Why not call up_udelay or up_mdelay? The arch/soc should provide a best
implementation for you.


I was wondering that too.

Also, as a side note, it is very important to calibrate the delay loop
used in those functions.  If the delay loop is properly calibrated,
these can be very accurate (but I suspect most people no longer 
calibrate the delay loop).


There is an app at apps/examples/calib_udelay that can be used to do that.



Re: Sleep Resolution

2021-03-24 Thread Xiang Xiao
Why not call up_udelay or up_mdelay? The arch/soc should provide a best
implementation for you.

On Thu, Mar 25, 2021 at 2:00 AM Fotis Panagiotopoulos 
wrote:

> If you are using an ARM MCU you may find the following helpful.
> You must ensure that it cannot be scheduled out in any way though...
>
> (Directly copy-pasting from one of our HAL libraries... You will need to
> fine-tune it to your needs.)
>
> /**
>  * Multiplier value for the ASM delay function. The delay cycles will be
>  * the requested uSecs delay multiplied by this value. The ASM loop needs
>  * two instructions per loop, which need 2 cycles to execute (4 if branch
>  * predictor misses, but that would not be the case for most executions).
>  * Assuming a 180MHz clock we need 60 loops per microsecond.
>  * @note The value is verified to be accurate after oscilloscope
> measurements.
>  */
> #define DELAY_ASM_MULTIPLIER    ( 60 )
>
> void delayASM(uint32_t us)
> {
>     us *= DELAY_ASM_MULTIPLIER;
>
>     //This implementation is taking care of the loop in assembly
>     //instructions, making the result much more predictable than
>     //a standard for loop with "nop" instructions, and it is
>     //independent of the compiler, and its optimizations.
>     asm volatile("   mov r0, %[us]  \n\t"
>                  "1: subs r0, #1    \n\t"
>                  "   bhi 1b         \n\t"
>                  :
>                  : [us] "r" (us)
>                  : "r0");
> }
>
> On Wed, 24 Mar 2021 at 7:53 PM, Grr wrote:
>
> > This is a SocketCAN driver for MCP2515 with a new SPI system that
> > streamlines the select->write->read->deselect process and so exposes the
> > need to guarantee hold time between read and deselect and disable time
> > between deselect and next select
> >
> > Looking at David's code, it seems the loop is the right answer. The DWT
> > cannot be used for a portable solution but maybe an inline function.
> > Thanks for the idea
> >
> > I believe NOPs are optimized away but it seems asm("") or something close
> > to that is not
> >
> > It would be nice to incorporate a general solution for this problem to the
> > NuttX toolbox
> >
> > On Wed, 24 Mar 2021 at 11:24, David Sidrane wrote:
> >
> > > What HW is this on?
> > >
> > > -Original Message-
> > > From: Grr [mailto:gebbe...@gmail.com]
> > > Sent: Wednesday, March 24, 2021 10:09 AM
> > > To: dev@nuttx.apache.org
> > > Subject: Re: Sleep Resolution
> > >
> > > Thank you very much for your response
> > >
> > > What I'm trying to do is to generate hold and disable times for SPI CS,
> > > which should be about 50 ns
> > >
> > > I started by an empty for loop but it seems optimization gets rid of it (I
> > > haven't researched the issue properly). Then I thought a proper function
> > > would be better but got stuck in that expression "sleep resolution"
> > >
> > > For that scale (10 SYSCLK cycles), a loop is probably OK but I wanted to
> > > make sure there's not a more appropriate system tool
> > >
> > > On Wed, 24 Mar 2021 at 10:46, Sara da Cunha Monteiro de Souza
> > > (saramonteirosouz...@gmail.com) wrote:
> > >
> > > > Hi Grr,
> > > >
> > > > I have never needed to use this function or this range (ns).
> > > > But I have used the usleep function, whose resolution is defined by
> > > > CONFIG_USEC_PER_TICK.
> > > > But maybe, in your case, for such a range, you should consider using a
> > > > hardware timer or a Timer Hook.
> > > > Take a look at this wiki:
> > > > https://cwiki.apache.org/confluence/display/NUTTX/Short+Time+Delays
> > > >
> > > > Sara
> > > >
> > > > On Wed, 24 Mar 2021 at 13:37, Grr wrote:
> > > >
> > > > > Hello to all.
> > > > >
> > > > > Looking for the right way to create a _very_ short delay (10-100 ns), I
> > > > > found clock_nanosleep, whose description says:
> > > > >
> > > > > "The suspension time caused by this function may be longer than
> > > > > requested because the argument value is rounded up to an integer
> > > > > multiple of the sleep resolution"
> > > > >
> > > > > What is the sleep resolution and where/how is it defined?
> > > > >
> > > > > TIA
> > > > > Grr
> > > > >
> > > >
> > >
> >
>


Re: avoiding pitfal of reuse of globals in FLAT mode?

2021-03-24 Thread Gregory Nutt





What I'm thinking is that, besides the TLS based solution, adding a 
non-standard getopt() seems to be a good option anyway, since it is a 
lightweight solution to this particular function.


Why do you think TLS is not lightweight?  It is very lightweight. The 
non-standard, non-portable approach is essentially the same 
computationally.


TLS is simply a little chunk of memory that lies at the "bottom" of 
the stack ("bottom" meaning the lowest address when the push-down 
stack memory was allocated).  Get the bottom of the stack and you have 
the TLS data.


TLS is more efficient if you can align stacks.  Then the TLS pointer 
can be obtained by just ANDing the current stack pointer. That is 
trivial.
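
The masking trick can be sketched in a few lines. This is an illustration only: STACK_ALIGN_LOG2 and tls_base_from_sp() are hypothetical names, and it assumes every stack is allocated with a size and alignment of 2^STACK_ALIGN_LOG2 bytes, with the TLS block at the lowest address.

```c
#include <stdint.h>

/* Assumption for this sketch: every stack is 2^13 = 8192 bytes and is
 * allocated on an 8192-byte boundary.
 */

#define STACK_ALIGN_LOG2 13
#define STACK_ALIGN_MASK (((uintptr_t)1 << STACK_ALIGN_LOG2) - 1)

/* Clear the low bits of any address inside the stack to get the lowest
 * address of the stack allocation, which is where the TLS data lives.
 */

static inline uintptr_t tls_base_from_sp(uintptr_t sp)
{
  return sp & ~STACK_ALIGN_MASK;
}
```

With these example values, a stack pointer of 0x20004abc maps to a TLS base of 0x20004000, with no call into the OS.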


If the stack is not aligned, then we have to ask the OS where the 
stack allocation begins.


Since each thread has its own stack, this provides a nearly 
instantaneous way to get thread-specific data.


So I would say that the complexity is higher only because this is not 
standard, familiar C programming, but in terms of light- vs 
heavy-weight, I do not see a real difference.  No decisions should be 
made based on that weighty-ness dimension.  The significant, important 
criteria are standard vs. non-standard and portable vs. non-portable.  
Those matter.


And given that there are a dozen or more cases where thread-safety of 
globals is needed, TLS is a more reasonable general solution than trying to 
rewrite re-entrant versions of all of the functions that rely on 
globals and damaging the OS.






Re: avoiding pitfal of reuse of globals in FLAT mode?

2021-03-24 Thread Gregory Nutt





What I'm thinking is that, besides the TLS based solution, adding a 
non-standard getopt() seems to be a good option anyway, since it is a 
lightweight solution to this particular function.


Why do you think TLS is not lightweight?  It is very lightweight. The 
non-standard, non-portable approach is essentially the same 
computationally.


TLS is simply a little chunk of memory that lies at the "bottom" of the 
stack ("bottom" meaning the lowest address when the push-down stack 
memory was allocated).  Get the bottom of the stack and you have the TLS 
data.


TLS is more efficient if you can align stacks.  Then the TLS pointer can 
be obtained by just ANDing the current stack pointer. That is trivial.


If the stack is not aligned, then we have to ask the OS where the stack 
allocation begins.


Since each thread has its own stack, this provides a nearly 
instantaneous way to get thread-specific data.


So I would say that the complexity is higher only because this is not 
standard, familiar C programming, but in terms of light- vs 
heavy-weight, I do not see a real difference.  No decisions should be 
made based on that weighty-ness dimension.  The significant, important 
criteria are standard vs. non-standard and portable vs. non-portable.  
Those matter.







Re: avoiding pitfal of reuse of globals in FLAT mode?

2021-03-24 Thread Gregory Nutt




What I'm thinking is that, besides the TLS based solution, adding a 
non-standard getopt() seems to be a good option anyway, since it is a 
lightweight solution to this particular function.


Why do you think TLS is not lightweight?  It is very lightweight.  The 
non-standard, non-portable approach is essentially the same computationally.





Re: avoiding pitfal of reuse of globals in FLAT mode?

2021-03-24 Thread Gregory Nutt

Thanks for all answers. I don't entirely understand most of them though as I'm 
not really familiar with the implications of TLS or how to use it correctly. 
Also, do we need per-thread or per-task data here?
You would expect getopt() to be used only on the main thread since that
is the only thread that receives argc and argv.

So if it is only used in one thread there would only be a copy of the data? 
What if I spawn multiple threads and call getopt only on one?


Yes there is only one copy of the data for each thread that uses 
getopt().  In the normal case, only one thread, the main thread, uses 
getopt().


I had considered a single allocation for the main thread.  If other 
threads in the same task call getopt(), it could share that single 
per-task allocation (rather than creating another copy), i.e., each 
thread's TLS data would refer to the same memory.  That is how other 
Unix systems work.





Re: avoiding pitfal of reuse of globals in FLAT mode?

2021-03-24 Thread Matias N.

On Wed, Mar 24, 2021, at 19:27, Byron Ellacott wrote:
> Here's what I found in libc that would need task (thread) specific data:
> 
>   - libs/libc/misc/lib_umask.c has g_mask
>   - libs/libc/libgen/lib_dirname.c and libs/libc/libgen/lib_basename each
> have a g_retchar
>   - libs/libc/syslog/lib_setlogmask.c has g_syslog_mask (and a comment
> describing this issue)
>   - libs/libc/pwd/* uses either g_passwd and g_passwd_buffer or g_pwd and
> g_buf
>   - libs/libc/grp/* uses a similar pair for group data
>   - libs/libc/unistd/lib_getopt.c we know of, it has four words of global
> data
>   - libs/libc/time/lib_localtime.c uses g_tm and may need per-task timezone
> settings
>   - libs/libc/netdb/lib_netdb.c specifies h_errno as a global
>   - libs/libc/netdb/lib_gethostbyname2.c  and lib_gethostbyaddr.c use
> g_hostent and g_hostbuffer
>   - libs/libc/stdlib/lib_srand.c uses a variety of globals depending on
> build options
>   - libs/libc/string/lib_strtok.c uses g_saveptr

Thanks for this list, I will update the issue and make it into a task list.

> 
> Statically allocating a TLS key for each module would consume around 11
> keys in each task. Dynamically allocated TLS keys cannot ever be released,
> because these are globals handed over to user code with no indication when
> they're no longer needed. It may be better to have an additional static
> element in tls_info_s pointing to a heap-allocated structure containing the
> libc globals. Functionally this is the same as a statically reserved TLS
> key, but it's clearer what it's for.

I think that is more or less the idea discussed in the PR. It will be done 
later on.

Best,
Matias

> -- 
> Byron
> 
> On Thu, Mar 25, 2021 at 12:51 AM Gregory Nutt wrote:
> 
> > On 3/24/2021 8:38 AM, Matias N. wrote:
> > > So, if I follow correctly, we could maybe have one TLS pointer pointing
> > > to a struct of pointers, one for each group of globals (one of these
> > > groups would be the set of variables used by getopt()), like:
> > >
> > > struct task_globals_s
> > > {
> > >   struct getopt_globals_s *getopt_globals;
> > >   /* ...others */
> > > };
> > >
> > > Then getopt globals would only be allocated once for each task, and only
> > > when getopt() is called.
> > >
> > > Something like that?
> >
> > Yes, that is a possibility.  But that is already implemented just as you
> > describe as POSIX thread-specific data.
> >
> > The TLS data structure is defined in include/nuttx/tls.h as follows.
> > It is just an array of pointer-sized things and the errno variable.
> >
> > struct tls_info_s
> > {
> > #if CONFIG_TLS_NELEM > 0
> >   uintptr_t tl_elem[CONFIG_TLS_NELEM]; /* TLS elements */
> > #endif
> >   int tl_errno;                        /* Per-thread error number */
> > };
> >
> > This structure lies at the "bottom" of the stack of every thread in user space.
> >
> > The standard pthread_getspecific() is then implemented as:
> >
> > FAR void *pthread_getspecific(pthread_key_t key)
> > {
> >   return (FAR void *)tls_get_value((int)key);
> > }
> >
> > Where
> >
> > uintptr_t tls_get_value(int tlsindex)
> > {
> >   FAR struct tls_info_s *info;
> >   uintptr_t ret = 0;
> >
> >   DEBUGASSERT(tlsindex >= 0 && tlsindex < CONFIG_TLS_NELEM);
> >   if (tlsindex >= 0 && tlsindex < CONFIG_TLS_NELEM)
> >     {
> >       /* Get the TLS info structure from the current threads stack */
> >
> >       info = up_tls_info();
> >       DEBUGASSERT(info != NULL);
> >
> >       /* Get the element value from the TLS info. */
> >
> >       ret = info->tl_elem[tlsindex];
> >     }
> >
> >   return ret;
> > }
> >
> > The POSIX interface supports a pthread_key_create() to manage the indexing.
> >
> >
> >
> >
> 


Re: avoiding pitfal of reuse of globals in FLAT mode?

2021-03-24 Thread Byron Ellacott
On Thu, Mar 25, 2021 at 8:27 AM Byron Ellacott 
wrote:

> Hi,
>
> Since the basic problem is that `getopt` doesn't have a per-task value it
> can use, how would it keep track of which TLS key it's been allocated?
>

This question, at least, I understand the answer to having looked at the PR
- the TLS key is shared across all threads (of course it would need to be)
so can be stored in a single global.
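
As a rough illustration of that point using only the standard POSIX calls: the key itself lives in one ordinary global, created once, while the value stored under it is per-thread. The names g_getopt_key, getopt_state_s, and getopt_state() are made up for this sketch and are not taken from the PR.

```c
#include <pthread.h>
#include <stdlib.h>

struct getopt_state_s      /* Hypothetical per-thread getopt() state */
{
  int optind;
};

static pthread_key_t  g_getopt_key;   /* One key, shared by all threads */
static pthread_once_t g_getopt_once = PTHREAD_ONCE_INIT;

static void getopt_key_init(void)
{
  /* free() is called on each thread's value when that thread exits */

  pthread_key_create(&g_getopt_key, free);
}

/* Return the calling thread's state, allocating it on first use */

static struct getopt_state_s *getopt_state(void)
{
  struct getopt_state_s *state;

  pthread_once(&g_getopt_once, getopt_key_init);

  state = pthread_getspecific(g_getopt_key);
  if (state == NULL)
    {
      state = calloc(1, sizeof(*state));
      if (state != NULL)
        {
          state->optind = 1;
          pthread_setspecific(g_getopt_key, state);
        }
    }

  return state;
}
```

Because the key is never deleted, this also shows the cost raised elsewhere in the thread: once created, the key stays reserved for the lifetime of the program.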

-- 
Byron


Re: avoiding pitfal of reuse of globals in FLAT mode?

2021-03-24 Thread Byron Ellacott
Hi,

Since the basic problem is that `getopt` doesn't have a per-task value it
can use, how would it keep track of which TLS key it's been allocated?

Here's what I found in libc that would need task (thread) specific data:

  - libs/libc/misc/lib_umask.c has g_mask
  - libs/libc/libgen/lib_dirname.c and libs/libc/libgen/lib_basename each
have a g_retchar
  - libs/libc/syslog/lib_setlogmask.c has g_syslog_mask (and a comment
describing this issue)
  - libs/libc/pwd/* uses either g_passwd and g_passwd_buffer or g_pwd and
g_buf
  - libs/libc/grp/* uses a similar pair for group data
  - libs/libc/unistd/lib_getopt.c we know of, it has four words of global
data
  - libs/libc/time/lib_localtime.c uses g_tm and may need per-task timezone
settings
  - libs/libc/netdb/lib_netdb.c specifies h_errno as a global
  - libs/libc/netdb/lib_gethostbyname2.c  and lib_gethostbyaddr.c use
g_hostent and g_hostbuffer
  - libs/libc/stdlib/lib_srand.c uses a variety of globals depending on
build options
  - libs/libc/string/lib_strtok.c uses g_saveptr

Statically allocating a TLS key for each module would consume around 11
keys in each task. Dynamically allocated TLS keys cannot ever be released,
because these are globals handed over to user code with no indication when
they're no longer needed. It may be better to have an additional static
element in tls_info_s pointing to a heap-allocated structure containing the
libc globals. Functionally this is the same as a statically reserved TLS
key, but it's clearer what it's for.
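
That last suggestion might look roughly like the following. Everything here is hypothetical and simplified for illustration: tls_info_sketch_s stands in for the real tls_info_s, libc_globals_s holds just two of the listed globals, and the caller is assumed to already have a pointer to its own TLS block.

```c
#include <stdlib.h>

/* Hypothetical container for libc state that is currently global */

struct libc_globals_s
{
  char *strtok_saveptr;   /* Would replace g_saveptr */
  int   getopt_optind;    /* Would replace part of the getopt() state */
};

/* Simplified stand-in for tls_info_s with one extra, dedicated slot */

struct tls_info_sketch_s
{
  struct libc_globals_s *tl_libc;   /* NULL until first use */
  int tl_errno;
};

/* Return the libc globals for this TLS block, allocating them lazily so
 * that threads which never call these functions pay nothing.
 */

static struct libc_globals_s *libc_globals(struct tls_info_sketch_s *info)
{
  if (info->tl_libc == NULL)
    {
      info->tl_libc = calloc(1, sizeof(struct libc_globals_s));
      if (info->tl_libc != NULL)
        {
          info->tl_libc->getopt_optind = 1;
        }
    }

  return info->tl_libc;
}
```

The allocation would still have to be released when the owning thread or task exits, which this sketch leaves out.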

-- 
Byron

On Thu, Mar 25, 2021 at 12:51 AM Gregory Nutt  wrote:

> On 3/24/2021 8:38 AM, Matias N. wrote:
> > So, if I follow correctly, we could maybe have one TLS pointer pointing
> > to a struct of pointers, one for each group of globals (one of these groups
> > would be the set of variables used by getopt()), like:
> >
> > struct task_globals_s
> > {
> >   struct getopt_globals_s *getopt_globals;
> >   /* ...others */
> > };
> >
> > Then getopt globals would only be allocated once for each task, and only
> > when getopt() is called.
> >
> > Something like that?
>
> Yes, that is a possibility.  But that is already implemented just as you
> describe as POSIX thread-specific data.
>
> The TLS data structure is defined in include/nuttx/tls.h as follows.
> It is just an array of pointer-sized things and the errno variable.
>
> struct tls_info_s
> {
> #if CONFIG_TLS_NELEM > 0
>   uintptr_t tl_elem[CONFIG_TLS_NELEM]; /* TLS elements */
> #endif
>   int tl_errno;                        /* Per-thread error number */
> };
>
> This structure lies at the "bottom" of the stack of every thread in user space.
>
> The standard pthread_getspecific() is then implemented as:
>
> FAR void *pthread_getspecific(pthread_key_t key)
> {
>   return (FAR void *)tls_get_value((int)key);
> }
>
> Where
>
> uintptr_t tls_get_value(int tlsindex)
> {
>   FAR struct tls_info_s *info;
>   uintptr_t ret = 0;
>
>   DEBUGASSERT(tlsindex >= 0 && tlsindex < CONFIG_TLS_NELEM);
>   if (tlsindex >= 0 && tlsindex < CONFIG_TLS_NELEM)
>     {
>       /* Get the TLS info structure from the current threads stack */
>
>       info = up_tls_info();
>       DEBUGASSERT(info != NULL);
>
>       /* Get the element value from the TLS info. */
>
>       ret = info->tl_elem[tlsindex];
>     }
>
>   return ret;
> }
>
> The POSIX interface supports a pthread_key_create() to manage the indexing.
>
>
>
>


Re: Sleep Resolution

2021-03-24 Thread Nathan Hartman
On Wed, Mar 24, 2021 at 4:49 PM Grr  wrote:
>
> Since afterstart = 0, there should be no loop to optimize out except ONE
> value test
>
> One hundred cycles for that would seem excessive for me
>
> TWENTY THOUSAND CYCLES for a _zero_ loop?!?
>
> Maybe in Java

We didn't see the value of transfer->dev->afterstart.

Could this be exposing a bug elsewhere? E.g., afterstart has a
different value than you expect?

Alternately could it be that afterstart had the correct value but the
task was interrupted during the loop and a big portion of the 20,000
cycles were spent doing something else?

Nathan


Re: Sleep Resolution

2021-03-24 Thread Grr
for(delay = 0; delay < transfer->dev->afterstart; delay++);

afterstart is loop's limit, which determines the desired delay length,
known by definition

On Wed, 24 Mar 2021 at 14:57, Johnny Billquist wrote:

> Well. There was nothing in there that showed me that afterstart == 0. Is
> this a known fact, or an assumption?
>
>Johnny
>
> On 2021-03-24 21:47, Grr wrote:
> > Since afterstart = 0, there should be no loop to optimize out except ONE
> > value test
> >
> > One hundred cycles for that would seem excessive for me
> >
> > TWENTY THOUSAND CYCLES for a _zero_ loop?!?
> >
> > Maybe in Java
> >
> >
> >
> > On Wed, 24 Mar 2021 at 14:30, Gregory Nutt wrote:
> >
> >>
> >>> Weird behavior:
> >>>
> >>> Simply changing loop counter variable from uint16_t to volatile uint16_t
> >>> causes initial delay (with variable delay = 0) going from ~500 ns to ~120 us
> >>>
> >>> The code is
> >>>
> >>> uint16_t delay;
> >>>
> >>> select_function();
> >>> for(delay = 0; delay < transfer->dev->afterstart; delay++);
> >>>
> >>> Any ideas?
> >>>
> >> I imagine that the delay loop is no longer being optimized out. That is
> >> what volatile is supposed to do (people often don't understand that, it
> >> is a great interview question).
> >>
> >
>
> --
> Johnny Billquist  || "I'm on a bus
>||  on a psychedelic trip
> email: b...@softjar.se ||  Reading murder books
> pdp is alive! ||  tryin' to stay hip" - B. Idol
>


Re: Sleep Resolution

2021-03-24 Thread Johnny Billquist
Well. There was nothing in there that showed me that afterstart == 0. Is 
this a known fact, or an assumption?


  Johnny

On 2021-03-24 21:47, Grr wrote:

Since afterstart = 0, there should be no loop to optimize out except ONE
value test

One hundred cycles for that would seem excessive for me

TWENTY THOUSAND CYCLES for a _zero_ loop?!?

Maybe in Java



On Wed, 24 Mar 2021 at 14:30, Gregory Nutt wrote:




Weird behavior:

Simply changing loop counter variable from uint16_t to volatile uint16_t
causes initial delay (with variable delay = 0) going from ~500 ns to ~120 us


The code is

uint16_t delay;

select_function();
for(delay = 0; delay < transfer->dev->afterstart; delay++);

Any ideas?


I imagine that the delay loop is no longer being optimized out. That is
what volatile is supposed to do (people often don't understand that, it
is a great interview question).





--
Johnny Billquist  || "I'm on a bus
  ||  on a psychedelic trip
email: b...@softjar.se ||  Reading murder books
pdp is alive! ||  tryin' to stay hip" - B. Idol


Re: Sleep Resolution

2021-03-24 Thread Grr
Since afterstart = 0, there should be no loop to optimize out except ONE
value test

One hundred cycles for that would seem excessive for me

TWENTY THOUSAND CYCLES for a _zero_ loop?!?

Maybe in Java



On Wed, 24 Mar 2021 at 14:30, Gregory Nutt wrote:

>
> > Weird behavior:
> >
> > Simply changing loop counter variable from uint16_t to volatile uint16_t
> > causes initial delay (with variable delay = 0) going from ~500 ns to ~120 us
> >
> > The code is
> >
> > uint16_t delay;
> >
> > select_function();
> > for(delay = 0; delay < transfer->dev->afterstart; delay++);
> >
> > Any ideas?
> >
> I imagine that the delay loop is no longer being optimized out. That is
> what volatile is supposed to do (people often don't understand that, it
> is a great interview question).
>


Re: USB Host and Bluetooth "dongle".

2021-03-24 Thread Matias N.
Not entirely sure about your scenario here, but if the dongle simply expects 
HCI communication and you manage
to get a /dev/ttyUSB0 or whatever on NuttX side, then you should be able to use 
the bt_uart bridge to expose the controller
to NuttX. From then on, you can choose to use NuttX host layer or an external 
one such as nimBLE.

Best,
Matias

On Wed, Mar 24, 2021, at 14:34, Tim wrote:
> I hope this is not too much of a newbie/dumb question as I have tried to
> research this first.
>
> I've been playing with NuttX on a SAMA5D2-XULT board ahead of porting to my
> custom board (SAMA5D27C-5M, actually), especially in regard to USB Host
> functionality as that's what's drawn me to NuttX in the first place.
>
> USB memory sticks are recognised and I can mount them (not got automount to
> work, yet - I'm still learning!; and date/time stamping of files written
> doesn't seem to work either) - nice to have things like this work "out of
> the box"!
>
> I can also build in Bluetooth support and the demo app, but as far as I can
> tell there are no actual Bluetooth host drivers in NuttX to properly support
> a detected Bluetooth device. My custom board actually has a SiLabs Bluetooth
> module on it (connected to the SAM via UART and proven to work) but I quite
> like the idea of just using plug-in Bluetooth devices instead :)
>
> I can see the device is detected (it gets powered, and dmesg reports stuff)
> but no drivers are loaded as best as I can tell. Is that correct or have I
> missed something?
>
> Drivers are not my strong point but, if I'm right and a driver will be
> needed, is a generic Linux driver a good place to start were I to look to
> add/create one to NuttX? Or maybe someone has already done this?
>
> Thanks,
>
> Tim.


Re: Sleep Resolution

2021-03-24 Thread Gregory Nutt




Weird behavior:

Simply changing loop counter variable from uint16_t to volatile uint16_t
causes initial delay (with variable delay = 0) going from ~500 ns to ~120 us

The code is

uint16_t delay;

select_function();
for(delay = 0; delay < transfer->dev->afterstart; delay++);

Any ideas?

I imagine that the delay loop is no longer being optimized out. That is 
what volatile is supposed to do (people often don't understand that, it 
is a great interview question).


Re: Sleep Resolution

2021-03-24 Thread Nathan Hartman
On Wed, Mar 24, 2021 at 3:53 PM Johnny Billquist  wrote:
>
> Perfectly expected.
> With volatile, the compiler is not allowed to optimize away the memory
> accesses for updating the loop variable. So you are suddenly getting a
> lot of memory read/write cycles that probably didn't happen before.
> I would even have expected that prior to the volatile, that loop would
> be totally optimized away.

Yes, that's exactly right.

Be careful with volatile:
https://blog.regehr.org/archives/28

Enjoy,
Nathan


Re: USB Host and Bluetooth "dongle".

2021-03-24 Thread Frank-Christian Kruegel

Am 24.03.2021 um 18:34 schrieb Tim:


I can also build in Bluetooth support and the demo app, but as far as I can
tell there are no actual Bluetooth host drivers in NuttX to properly support
a detected Bluetooth device. My custom board actually has a SiLabs Bluetooth
module on it (connected to the SAM via UART and proven to work) but I quite
like the idea of just using plug-in Bluetooth devices instead :)


Bluetooth is quite complex. You will have to deal with several layers:

1. Hardware driver (like Ethernet Hardware driver)
2. Low Level Protocol Stack (HCI, like network TCP/IP layer)
3. High Level Protocol Stack ("Profiles", like network mail, web, ntp, 
nfs application protocols)


A complete stack from top to bottom has a code size of several 1 
lines of code.


USB dongles usually implement the hardware driver and the low-level 
protocol stack (HCI). The computer has to run the high level protocol stack.


Other modules, especially the serial ones, implement the whole stack: 
hardware driver, low-level stack, and one or more profiles (SPP, 
audio,...). If you only need SPP, you really should use these modules. 
This allows you to write only 100 lines of code instead of 1.


fchk



Re: Sleep Resolution

2021-03-24 Thread Johnny Billquist

Perfectly expected.
With volatile, the compiler is not allowed to optimize away the memory 
accesses for updating the loop variable. So you are suddenly getting a 
lot of memory read/write cycles that probably didn't happen before.
I would even have expected that prior to the volatile, that loop would 
be totally optimized away.
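
A minimal sketch of the difference, in plain C. spin() is a hypothetical helper; the point is only that every increment and comparison of a volatile counter must be performed as a real memory access, so the loop survives optimization but each iteration costs more than a register-only loop would.

```c
#include <stdint.h>

/* Busy-wait for 'count' iterations.  Because 'i' is volatile, the compiler
 * must load and store it on every pass and cannot delete the loop, even
 * though the body is empty.  Returns the final counter value.
 */

static uint16_t spin(uint16_t count)
{
  volatile uint16_t i;

  for (i = 0; i < count; i++)
    {
    }

  return i;
}
```

The time per iteration still depends on the CPU clock and memory system, so the count has to be calibrated for each target.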


  Johnny

On 2021-03-24 20:47, Grr wrote:

Weird behavior:

Simply changing loop counter variable from uint16_t to volatile uint16_t
causes initial delay (with variable delay = 0) going from ~500 ns to ~120 us

The code is

uint16_t delay;

select_function();
for(delay = 0; delay < transfer->dev->afterstart; delay++);

Any ideas?

On Wed, 24 Mar 2021 at 12:21, Gregory Nutt wrote:




What I'm trying to do is to generate hold and disable times for SPI CS,
which should be about 50 ns

That resolution is too high for any system timer.

I started by an empty for loop but it seems optimization gets rid of it (I
haven't researched the issue properly). Then I thought a proper function
would be better but got stuck in that expression "sleep resolution"


Add volatile to the loop counter variable and the optimizer will not
remove it.







--
Johnny Billquist  || "I'm on a bus
  ||  on a psychedelic trip
email: b...@softjar.se ||  Reading murder books
pdp is alive! ||  tryin' to stay hip" - B. Idol


Re: Sleep Resolution

2021-03-24 Thread Grr
Weird behavior:

Simply changing loop counter variable from uint16_t to volatile uint16_t
causes initial delay (with variable delay = 0) going from ~500 ns to ~120 us

The code is

uint16_t delay;

select_function();
for(delay = 0; delay < transfer->dev->afterstart; delay++);

Any ideas?

On Wed, 24 Mar 2021 at 12:21, Gregory Nutt wrote:

>
> > What I'm trying to do is to generate hold and disable times for SPI CS,
> > which should be about 50 ns
> That resolution is too high for any system timer.
> > I started by an empty for loop but it seems optimization gets rid of it (I
> > haven't researched the issue properly). Then I thought a proper function
> > would be better but got stuck in that expression "sleep resolution"
>
> Add volatile to the loop counter variable and the optimizer will not
> remove it.
>
>
>


Re: Sleep Resolution

2021-03-24 Thread Gregory Nutt




> Is the Tickless mode considered stable enough for production use now?
> IIRC it had some caveats when I last looked into it and I haven't had
> a chance to study it again.

I believe so.  I am not aware of any issues.  It has been around for 
several years and has been used by a lot of NuttX users.  So from what I 
know the answer is "Yes."  But I don't do products myself so I would 
welcome the feedback and experiences of anyone else.


Re: Sleep Resolution

2021-03-24 Thread Nathan Hartman
On Wed, Mar 24, 2021 at 2:53 PM Gregory Nutt  wrote:
>
>
> > The way the logic in clock_nanosleep() is written, the minimum delay
> > ends up being 2 such ticks. I don't remember why and I can't seem to
> > find it in the code right now, but I know this because I checked into
> > it recently and found out that that's how it works.
>
> See https://cwiki.apache.org/confluence/display/NUTTX/Short+Time+Delays
>
> This is a translation.  It does not affect the accuracy, it affects the
> mean delay.  The accuracy is still 10 ms.  The quantization error will
> lie in the range of 0 to +10 ms.  If you did not add one tick, the error
> would be in the range of -1 to 0 ms which is unacceptable.

Thanks

> > It does not make sense to change the tick interval to a higher
> > resolution (shorter time) because then the OS will spend a
> > significantly increasing amount of time in useless interrupts etc.
>
> Unless you use Tickless mode then it is easy to get very high resolution
> (1 uS range) with no CPU overhead.

Is the Tickless mode considered stable enough for production use now?
IIRC it had some caveats when I last looked into it and I haven't had
a chance to study it again.

Nathan


Re: Sleep Resolution

2021-03-24 Thread Gregory Nutt




The way the logic in clock_nanosleep() is written, the minimum delay
ends up being 2 such ticks. I don't remember why and I can't seem to
find it in the code right now, but I know this because I checked into
it recently and found out that that's how it works.


See https://cwiki.apache.org/confluence/display/NUTTX/Short+Time+Delays

This is a translation.  It does not affect the accuracy, it affects the 
mean delay.  The accuracy is still 10 ms.  The quantization error will 
lie in the range of 0 to +10 ms.  If you did not add one tick, the error 
would be in the range of -1 to 0 ms which is unacceptable.
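
The arithmetic behind this can be sketched as below. USEC_PER_TICK mirrors the default CONFIG_USEC_PER_TICK of 10,000 mentioned in this thread, while usec_to_wait_ticks() is a hypothetical name for illustration, not the actual NuttX function.

```c
#include <stdint.h>

#define USEC_PER_TICK 10000u   /* Assumed: 10 ms per tick */

/* Convert a requested delay to the number of ticks to wait.  The request
 * is rounded up to whole ticks, then one extra tick is added because the
 * current tick is already partly elapsed when the wait begins.
 */

static uint32_t usec_to_wait_ticks(uint32_t usec)
{
  uint64_t ticks = ((uint64_t)usec + USEC_PER_TICK - 1) / USEC_PER_TICK;

  return (uint32_t)ticks + 1;
}
```

A 1 us request therefore waits 2 ticks, which is somewhere between 10 and 20 ms of real time depending on where in the current tick the call lands.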



It does not make sense to change the tick interval to a higher
resolution (shorter time) because then the OS will spend a
significantly increasing amount of time in useless interrupts etc.


Unless you use Tickless mode then it is easy to get very high resolution 
(1 uS range) with no CPU overhead.


50 ns is still probably out of reach of the system timer.



Re: Sleep Resolution

2021-03-24 Thread Nathan Hartman
On Wed, Mar 24, 2021 at 12:37 PM Grr  wrote:
> Looking for the right way to create a _very_ short delay (10-100 ns), I
> found clock_nanosleep, whose description says:
>
> "The suspension time caused by this function may be longer than requested
> because the argument value is rounded up to an integer multiple of the
> sleep resolution"
>
> What is the sleep resolution and where/how is defined?

I'm a little late to the party, but...

The resolution would likely be 20 milliseconds, which is far more than you want.

Here's how I get that 20 millisecond value:

If you do not configure CONFIG_USEC_PER_TICK to a custom value, it is
10,000 microseconds (10 milliseconds) per "tick" by default.

The way the logic in clock_nanosleep() is written, the minimum delay
ends up being 2 such ticks. I don't remember why and I can't seem to
find it in the code right now, but I know this because I checked into
it recently and found out that that's how it works.

Note that all the sleep functions, whether sleep(), usleep(),
nanosleep(), etc., promise to delay for AT LEAST the length of time
you specify. There is no upper limit to the length of the delay as
it's subject to scheduling, the resolution of the clock, and whatever
else is going on in the system.
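A quick way to see the "at least" behaviour is to time the call. The sketch below uses only standard POSIX calls (nanosleep, clock_gettime); the helper name measure_sleep_ns is made up for illustration, and the numbers you get depend entirely on the system it runs on.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: sleep for req_ns nanoseconds and return the elapsed
 * time actually measured on the monotonic clock, in nanoseconds.
 */

static int64_t measure_sleep_ns(long req_ns)
{
  struct timespec req;
  struct timespec t0;
  struct timespec t1;

  assert(req_ns >= 0 && req_ns < 1000000000L);
  req.tv_sec  = 0;
  req.tv_nsec = req_ns;

  clock_gettime(CLOCK_MONOTONIC, &t0);

  /* If a signal interrupts the sleep, continue with the remaining time */

  while (nanosleep(&req, &req) != 0 && errno == EINTR)
    {
    }

  clock_gettime(CLOCK_MONOTONIC, &t1);

  return (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL +
         (int64_t)(t1.tv_nsec - t0.tv_nsec);
}
```

Calling measure_sleep_ns(100) is a simple experiment: the result should never be below 100, but it will typically be far above it, which is the whole point of this thread.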

It does not make sense to change the tick interval to a higher
resolution (shorter time) because then the OS will spend a
significantly increasing amount of time in useless interrupts etc.

Also, it does not make sense to use these functions for such short
delays in the nanosecond range. Just the processing overhead of
calling one of those functions is much more than 10 to 100 ns.

When I need such a short delay (e.g., when you need a delay between
setting Chip Select of a SPI peripheral and actual start of
communication), I measure how long a NOP instruction takes on the
microcontroller in question and I insert that many NOPs. If the delay
would require a lot of NOPs, you can use a for-loop with a volatile
loop counter ensuring that the compiler doesn't just optimize away the
loop. Note that if there's a task switch or interrupt during that
time, the delay will likely be much longer than you intend. If the
timing is critical, a simple hack is to use a critical section around
it, but I would use that as a last resort; first, I would look into
doing whatever needs that fine delay, e.g., waveform shaping, with
hardware instead.
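The for-loop variant described above could look roughly like this. It is a sketch under assumptions: the names are invented, and the time per iteration is something you have to measure on your own MCU and clock setup (with a scope or a cycle counter); nothing here is calibrated.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of loop iterations needed for a delay,
 * given a measured cost per iteration.  Rounds up so the delay is never
 * shorter than requested.
 */

static uint32_t iterations_for_ns(uint32_t delay_ns, uint32_t ns_per_iter)
{
  assert(ns_per_iter > 0);
  return (uint32_t)(((uint64_t)delay_ns + ns_per_iter - 1) / ns_per_iter);
}

/* Busy loop with a volatile counter.  Every access to 'i' must really be
 * performed, so the compiler cannot remove the loop.  Returns the number
 * of iterations executed.
 */

static uint32_t busy_loop(uint32_t iterations)
{
  volatile uint32_t i;

  for (i = 0; i < iterations; i++)
    {
    }

  return i;
}
```

For example, if one iteration was measured at 12 ns, a 50 ns hold time would be busy_loop(iterations_for_ns(50, 12)), i.e. 5 iterations. An interrupt or context switch in the middle still stretches the delay.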

These solutions are obviously very closely tied to the specific
microcontroller and its clock speed, so they're very non-portable. If
someone has a better suggestion, I'd love to learn about it!

Nathan


Re: Sleep Resolution

2021-03-24 Thread Gregory Nutt




What I'm trying to do is to generate hold and disable times for SPI CS,
which should be about 50 ns

That resolution is too high for any system timer.

I started by an empty for loop but it seems optimization gets rid of it (I
haven't researched the issue properly). Then I thought a proper function
would be better but got stuck in that expression "sleep resolution"


Add volatile to the loop counter variable and the optimizer will not 
remove it.





RE: Sleep Resolution

2021-03-24 Thread David Sidrane
I asked about HW because some of the new SPI controller IP have the delays
programmable in HW. One from CS active to clock/data and the other is inter
data delays.

With HW SS and the timers, this is built in.

The issue is on a shared bus. We would need to extend the SPI API to support
the settings.

David

-Original Message-
From: Grr [mailto:gebbe...@gmail.com]
Sent: Wednesday, March 24, 2021 10:52 AM
To: dev@nuttx.apache.org
Subject: Re: Sleep Resolution

This is a SocketCAN driver for MCP2515 with a new SPI system that
streamlines the select->write->read->deselect process and so exposes the
need to guarantee hold time between read and deselect and disable time
between deselect and next select

Looking at David's code, it seems the loop is the right answer. The DWT
cannot be used for a portable solution but maybe an inline function. Thanks
for the idea

I believe NOPs are optimized away but it seems asm("") or something close
to that is not

It would be nice to incorporate a general solution for this problem to the
Nuttx toolbox

On Wed, Mar 24, 2021 at 11:24, David Sidrane wrote:

> What HW is this on?
>
> -Original Message-
> From: Grr [mailto:gebbe...@gmail.com]
> Sent: Wednesday, March 24, 2021 10:09 AM
> To: dev@nuttx.apache.org
> Subject: Re: Sleep Resolution
>
> Thank you very much for your response
>
> What I'm trying to do is to generate hold and disable times for SPI CS,
> which should be about 50 ns
>
> I started by an empty for loop but it seems optimization gets rid of it (I
> haven't researched the issue properly). Then I thought a proper function
> would be better but got stuck in that expression "sleep resolution"
>
> For that scale (10 SYSCLK cycles), a loop is probably OK but I wanted to
> make sure there's not a more appropriate system tool
>
> El mié, 24 mar 2021 a las 10:46, Sara da Cunha Monteiro de Souza (<
> saramonteirosouz...@gmail.com>) escribió:
>
> > Hi Grr,
> >
> > I have never needed to use this function neither this range (ns).
> > But I used the usleep function which resolution is defined as
> > CONFIG_USEC_PER_TICK.
> > But maybe, in your case, for such range, you should consider using a
> > hardware timer or a Timer Hook.
> > Take a look at this wiki:
> > https://cwiki.apache.org/confluence/display/NUTTX/Short+Time+Delays
> >
> > Sara
> >
> > Em qua., 24 de mar. de 2021 às 13:37, Grr  escreveu:
> >
> > > Hello to all.
> > >
> > > Looking for the right way to create a _very_ short delay (10-100 ns),
> > > I
> > > found clock_nanosleep, whose description says:
> > >
> > > "The suspension time caused by this function may be longer than
> > > requested
> > > because the argument value is rounded up to an integer multiple of the
> > > sleep resolution"
> > >
> > > What is the sleep resolution and where/how is defined?
> > >
> > > TIA
> > > Grr
> > >
> >
>


Re: Sleep Resolution

2021-03-24 Thread Fotis Panagiotopoulos
If you are using an ARM MCU you may find the following helpful.
You must ensure that it cannot be scheduled out in any way though...

(Directly copy-pasting from one of our HAL libraries... You will need to
fine-tune it to your needs.)

/**
 * Multiplier value for the ASM delay function. The delay cycles will be
 * the requested uSecs delay multiplied by this value. The ASM loop needs
 * two instructions per loop, which need 2 cycles to execute (4 if branch
 * predictor misses, but that would not be the case for most executions).
 * Assuming a 180MHz clock we need 60 loops per microsecond.
 * @note The value is verified to be accurate after oscilloscope
 * measurements.
 */
#include <stdint.h>

#define DELAY_ASM_MULTIPLIER    ( 60 )

void delayASM(uint32_t us)
{
    us *= DELAY_ASM_MULTIPLIER;

    //This implementation is taking care of the loop in assembly
    //instructions, making the result much more predictable than
    //a standard for loop with "nop" instructions, and it is
    //independent of the compiler, and its optimizations.
    //The "cc" clobber tells the compiler that subs changes the flags.
    asm volatile("   mov r0, %[us]  \n\t"
                 "1: subs r0, #1    \n\t"
                 "   bhi 1b         \n\t"
                 :
                 : [us] "r" (us)
                 : "r0", "cc");
}
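For anyone adapting this to another clock, the multiplier can be derived instead of hard-coded. The sketch below is an assumption-laden illustration: the helper names are invented, and the value 60 at 180 MHz is consistent with about three CPU cycles per loop iteration, which you should confirm on a scope as the original author did.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: loop iterations per microsecond for a given core
 * clock and a measured (or assumed) cycle cost per loop iteration.
 */

static uint32_t delay_loops_per_us(uint32_t cpu_hz, uint32_t cycles_per_loop)
{
  assert(cycles_per_loop > 0);
  return cpu_hz / 1000000u / cycles_per_loop;
}

/* Hypothetical helper: total loop count for a delay in microseconds.
 * Saturates at UINT32_MAX instead of silently wrapping around.
 */

static uint32_t delay_loop_count(uint32_t us, uint32_t loops_per_us)
{
  uint64_t n = (uint64_t)us * loops_per_us;

  return n > UINT32_MAX ? UINT32_MAX : (uint32_t)n;
}
```

Note that a plain 32-bit multiply by 60 wraps for requests above roughly 71 seconds; that does not matter for microsecond delays, but the saturating version makes the limit explicit.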

On Wed, Mar 24, 2021 at 7:53 PM, Grr wrote:

> This is a SocketCAN driver for MCP2515 with a new SPI system that
> streamlines the select->write->read->deselect process and so exposes the
> need to guarantee hold time between read and deselect and disable time
> between deselect and next select
>
> Looking at David's code, it seems the loop is the right answer. The DWT
> cannot be used for a portable solution but maybe an inline function. Thanks
> for the idea
>
> I believe NOPs are optimized away but it seems asm("") or something close
> to that is not
>
> It would be nice to incorporate a general solution for this problem to the
> Nuttx toolbox
>
> El mié, 24 mar 2021 a las 11:24, David Sidrane ()
> escribió:
>
> > What HW is this on?
> >
> > -Original Message-
> > From: Grr [mailto:gebbe...@gmail.com]
> > Sent: Wednesday, March 24, 2021 10:09 AM
> > To: dev@nuttx.apache.org
> > Subject: Re: Sleep Resolution
> >
> > Thank you very much for your response
> >
> > What I'm trying to do is to generate hold and disable times for SPI CS,
> > which should be about 50 ns
> >
> > I started by an empty for loop but it seems optimization gets rid of it
> (I
> > haven't researched the issue properly). Then I thought a proper function
> > would be better but got stuck in that expression "sleep resolution"
> >
> > For that scale (10 SYSCLK cycles), a loop is probably OK but I wanted to
> > make sure there's not a more appropriate system tool
> >
> > El mié, 24 mar 2021 a las 10:46, Sara da Cunha Monteiro de Souza (<
> > saramonteirosouz...@gmail.com>) escribió:
> >
> > > Hi Grr,
> > >
> > > I have never needed to use this function neither this range (ns).
> > > But I used the usleep function which resolution is defined as
> > > CONFIG_USEC_PER_TICK.
> > > But maybe, in your case, for such range, you should consider using a
> > > hardware timer or a Timer Hook.
> > > Take a look at this wiki:
> > > https://cwiki.apache.org/confluence/display/NUTTX/Short+Time+Delays
> > >
> > > Sara
> > >
> > > Em qua., 24 de mar. de 2021 às 13:37, Grr 
> escreveu:
> > >
> > > > Hello to all.
> > > >
> > > > Looking for the right way to create a _very_ short delay (10-100
> ns), I
> > > > found clock_nanosleep, whose description says:
> > > >
> > > > "The suspension time caused by this function may be longer than
> > > > requested
> > > > because the argument value is rounded up to an integer multiple of
> the
> > > > sleep resolution"
> > > >
> > > > What is the sleep resolution and where/how is defined?
> > > >
> > > > TIA
> > > > Grr
> > > >
> > >
> >
>


Re: Sleep Resolution

2021-03-24 Thread Grr
This is a SocketCAN driver for MCP2515 with a new SPI system that
streamlines the select->write->read->deselect process and so exposes the
need to guarantee hold time between read and deselect and disable time
between deselect and next select

Looking at David's code, it seems the loop is the right answer. The DWT
cannot be used for a portable solution but maybe an inline function. Thanks
for the idea

I believe NOPs are optimized away but it seems asm("") or something close
to that is not

It would be nice to incorporate a general solution for this problem to the
Nuttx toolbox
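As a sketch of the asm("") idea: with GCC or Clang, an empty volatile asm statement that claims to modify the loop counter is enough to keep the loop in place, without any target-specific instruction. This relies on the GNU extended asm extension (not standard C), the function name is made up, and the time per iteration still has to be measured on the actual hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical inline delay: spin 'count' times.  The empty asm statement
 * tells the compiler that 'n' may have been changed, so the loop cannot be
 * collapsed or removed.  No instruction is emitted for the asm itself.
 * Returns the number of iterations executed.
 */

static inline uint32_t spin_cycles(uint32_t count)
{
  uint32_t n = count;

  while (n != 0)
    {
      __asm__ volatile("" : "+r"(n));
      n--;
    }

  assert(n == 0);
  return count - n;
}
```

Because it is a plain inline function with no architecture-specific opcodes, something along these lines could serve as a portable fallback where no hardware-assisted delay is available.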

On Wed, Mar 24, 2021 at 11:24, David Sidrane wrote:

> What HW is this on?
>
> -Original Message-
> From: Grr [mailto:gebbe...@gmail.com]
> Sent: Wednesday, March 24, 2021 10:09 AM
> To: dev@nuttx.apache.org
> Subject: Re: Sleep Resolution
>
> Thank you very much for your response
>
> What I'm trying to do is to generate hold and disable times for SPI CS,
> which should be about 50 ns
>
> I started by an empty for loop but it seems optimization gets rid of it (I
> haven't researched the issue properly). Then I thought a proper function
> would be better but got stuck in that expression "sleep resolution"
>
> For that scale (10 SYSCLK cycles), a loop is probably OK but I wanted to
> make sure there's not a more appropriate system tool
>
> El mié, 24 mar 2021 a las 10:46, Sara da Cunha Monteiro de Souza (<
> saramonteirosouz...@gmail.com>) escribió:
>
> > Hi Grr,
> >
> > I have never needed to use this function neither this range (ns).
> > But I used the usleep function which resolution is defined as
> > CONFIG_USEC_PER_TICK.
> > But maybe, in your case, for such range, you should consider using a
> > hardware timer or a Timer Hook.
> > Take a look at this wiki:
> > https://cwiki.apache.org/confluence/display/NUTTX/Short+Time+Delays
> >
> > Sara
> >
> > Em qua., 24 de mar. de 2021 às 13:37, Grr  escreveu:
> >
> > > Hello to all.
> > >
> > > Looking for the right way to create a _very_ short delay (10-100 ns), I
> > > found clock_nanosleep, whose description says:
> > >
> > > "The suspension time caused by this function may be longer than
> > > requested
> > > because the argument value is rounded up to an integer multiple of the
> > > sleep resolution"
> > >
> > > What is the sleep resolution and where/how is defined?
> > >
> > > TIA
> > > Grr
> > >
> >
>


USB Host and Bluetooth "dongle".

2021-03-24 Thread Tim
I hope this is not too much of a newbie/dumb question as I have tried to
research this first.

 

I've been playing with Nuttx on a SAMA5D2-XULT board ahead of porting to my
custom board (SAMA5D27C-5M, actually), especially in regard to USB Host
functionality as that's what's drawn me to NuttX in the first place.

 

USB memory sticks are recognised and I can mount them (not got automount to
work, yet - I'm still learning!; and date/time stamping of files written
doesn't seem to work either) - nice to have things like this work "out of
the box"! 

 

I can also build in Bluetooth support and the demo app, but as far as I can
tell there are no actual Bluetooth host drivers in NuttX to properly support
a detected Bluetooth device. My custom board actually has a SiLabs Bluetooth
module on it (connected to the SAM via UART and proven to work) but I quite
like the idea of just using plug-in Bluetooth devices instead :)

 

I can see the device is detected (it gets powered, and dmesg reports stuff)
but no drivers are loaded as best as I can tell. Is that correct or have I
missed something?

 

Drivers are not my strong point but, if I'm right and a driver will be
needed, is a generic Linux driver a good place to start were I to look to
add/create one to NuttX? Or maybe someone has already done this?

 

 

Thanks,

 

Tim.

 



Re: Sleep Resolution

2021-03-24 Thread Barbiani
How about adding a few nops with the interrupts disabled?

A context switch would take longer than this delay.

On Wed, Mar 24, 2021, 14:24 David Sidrane  wrote:

> What HW is this on?
>
> -Original Message-
> From: Grr [mailto:gebbe...@gmail.com]
> Sent: Wednesday, March 24, 2021 10:09 AM
> To: dev@nuttx.apache.org
> Subject: Re: Sleep Resolution
>
> Thank you very much for your response
>
> What I'm trying to do is to generate hold and disable times for SPI CS,
> which should be about 50 ns
>
> I started by an empty for loop but it seems optimization gets rid of it (I
> haven't researched the issue properly). Then I thought a proper function
> would be better but got stuck in that expression "sleep resolution"
>
> For that scale (10 SYSCLK cycles), a loop is probably OK but I wanted to
> make sure there's not a more appropriate system tool
>
> El mié, 24 mar 2021 a las 10:46, Sara da Cunha Monteiro de Souza (<
> saramonteirosouz...@gmail.com>) escribió:
>
> > Hi Grr,
> >
> > I have never needed to use this function neither this range (ns).
> > But I used the usleep function which resolution is defined as
> > CONFIG_USEC_PER_TICK.
> > But maybe, in your case, for such range, you should consider using a
> > hardware timer or a Timer Hook.
> > Take a look at this wiki:
> > https://cwiki.apache.org/confluence/display/NUTTX/Short+Time+Delays
> >
> > Sara
> >
> > Em qua., 24 de mar. de 2021 às 13:37, Grr  escreveu:
> >
> > > Hello to all.
> > >
> > > Looking for the right way to create a _very_ short delay (10-100 ns), I
> > > found clock_nanosleep, whose description says:
> > >
> > > "The suspension time caused by this function may be longer than
> > > requested
> > > because the argument value is rounded up to an integer multiple of the
> > > sleep resolution"
> > >
> > > What is the sleep resolution and where/how is defined?
> > >
> > > TIA
> > > Grr
> > >
> >
>


RE: Sleep Resolution

2021-03-24 Thread David Sidrane
What HW is this on?

-Original Message-
From: Grr [mailto:gebbe...@gmail.com]
Sent: Wednesday, March 24, 2021 10:09 AM
To: dev@nuttx.apache.org
Subject: Re: Sleep Resolution

Thank you very much for your response

What I'm trying to do is to generate hold and disable times for SPI CS,
which should be about 50 ns

I started by an empty for loop but it seems optimization gets rid of it (I
haven't researched the issue properly). Then I thought a proper function
would be better but got stuck in that expression "sleep resolution"

For that scale (10 SYSCLK cycles), a loop is probably OK but I wanted to
make sure there's not a more appropriate system tool

On Wed, Mar 24, 2021 at 10:46, Sara da Cunha Monteiro de Souza (<
saramonteirosouz...@gmail.com>) wrote:

> Hi Grr,
>
> I have never needed to use this function neither this range (ns).
> But I used the usleep function which resolution is defined as
> CONFIG_USEC_PER_TICK.
> But maybe, in your case, for such range, you should consider using a
> hardware timer or a Timer Hook.
> Take a look at this wiki:
> https://cwiki.apache.org/confluence/display/NUTTX/Short+Time+Delays
>
> Sara
>
> Em qua., 24 de mar. de 2021 às 13:37, Grr  escreveu:
>
> > Hello to all.
> >
> > Looking for the right way to create a _very_ short delay (10-100 ns), I
> > found clock_nanosleep, whose description says:
> >
> > "The suspension time caused by this function may be longer than
> > requested
> > because the argument value is rounded up to an integer multiple of the
> > sleep resolution"
> >
> > What is the sleep resolution and where/how is defined?
> >
> > TIA
> > Grr
> >
>


RE: Sleep Resolution

2021-03-24 Thread David Sidrane
Have a look at

https://github.com/PX4/PX4-Autopilot/blob/3ef93823f4b8f870b056549d321473a02fb69b1f/platforms/nuttx/src/px4/common/srgbled/srgbled.cpp#L112-L115


-Original Message-
From: Grr [mailto:gebbe...@gmail.com]
Sent: Wednesday, March 24, 2021 9:36 AM
To: dev@nuttx.apache.org
Subject: Sleep Resolution

Hello to all.

Looking for the right way to create a _very_ short delay (10-100 ns), I
found clock_nanosleep, whose description says:

"The suspension time caused by this function may be longer than requested
because the argument value is rounded up to an integer multiple of the
sleep resolution"

What is the sleep resolution and where/how is defined?

TIA
Grr


Re: Sleep Resolution

2021-03-24 Thread Alan Carvalho de Assis
Hi Grr,

It is not a simple challenge even using a microcontroller running
bare-metal busy-loop code, let alone using an RTOS. More info:

https://forum.allaboutcircuits.com/threads/programming-a-1-ns-delay-for-microcontroller-processor.90227/

It is better to have dedicated hardware take care of the ns-resolution
delays and use the RTOS to control it and gather the resulting data.

What are you trying to achieve? (Of course, only if you can share or talk about it.)

BR,

Alan

On 3/24/21, Grr  wrote:
> Hello to all.
>
> Looking for the right way to create a _very_ short delay (10-100 ns), I
> found clock_nanosleep, whose description says:
>
> "The suspension time caused by this function may be longer than requested
> because the argument value is rounded up to an integer multiple of the
> sleep resolution"
>
> What is the sleep resolution and where/how is defined?
>
> TIA
> Grr
>


Re: Sleep Resolution

2021-03-24 Thread Grr
Thank you very much for your response

What I'm trying to do is to generate hold and disable times for SPI CS,
which should be about 50 ns

I started by an empty for loop but it seems optimization gets rid of it (I
haven't researched the issue properly). Then I thought a proper function
would be better but got stuck in that expression "sleep resolution"

For that scale (10 SYSCLK cycles), a loop is probably OK but I wanted to
make sure there's not a more appropriate system tool

On Wed, Mar 24, 2021 at 10:46, Sara da Cunha Monteiro de Souza (<
saramonteirosouz...@gmail.com>) wrote:

> Hi Grr,
>
> I have never needed to use this function neither this range (ns).
> But I used the usleep function which resolution is defined as
> CONFIG_USEC_PER_TICK.
> But maybe, in your case, for such range, you should consider using a
> hardware timer or a Timer Hook.
> Take a look at this wiki:
> https://cwiki.apache.org/confluence/display/NUTTX/Short+Time+Delays
>
> Sara
>
> Em qua., 24 de mar. de 2021 às 13:37, Grr  escreveu:
>
> > Hello to all.
> >
> > Looking for the right way to create a _very_ short delay (10-100 ns), I
> > found clock_nanosleep, whose description says:
> >
> > "The suspension time caused by this function may be longer than requested
> > because the argument value is rounded up to an integer multiple of the
> > sleep resolution"
> >
> > What is the sleep resolution and where/how is defined?
> >
> > TIA
> > Grr
> >
>


Re: Sleep Resolution

2021-03-24 Thread Sara da Cunha Monteiro de Souza
Hi Grr,

I have never needed to use this function or this range (ns).
But I have used the usleep function, whose resolution is defined by
CONFIG_USEC_PER_TICK.
But maybe, in your case, for such range, you should consider using a
hardware timer or a Timer Hook.
Take a look at this wiki:
https://cwiki.apache.org/confluence/display/NUTTX/Short+Time+Delays

Sara

On Wed, Mar 24, 2021 at 13:37, Grr wrote:

> Hello to all.
>
> Looking for the right way to create a _very_ short delay (10-100 ns), I
> found clock_nanosleep, whose description says:
>
> "The suspension time caused by this function may be longer than requested
> because the argument value is rounded up to an integer multiple of the
> sleep resolution"
>
> What is the sleep resolution and where/how is defined?
>
> TIA
> Grr
>


Sleep Resolution

2021-03-24 Thread Grr
Hello to all.

Looking for the right way to create a _very_ short delay (10-100 ns), I
found clock_nanosleep, whose description says:

"The suspension time caused by this function may be longer than requested
because the argument value is rounded up to an integer multiple of the
sleep resolution"

What is the sleep resolution and where/how is defined?

TIA
Grr


Re: avoiding pitfal of reuse of globals in FLAT mode?

2021-03-24 Thread Matias N.
Great, thanks!
I was just writing an issue to have this noted somewhere.

Best,
Matias

On Wed, Mar 24, 2021, at 13:23, Gregory Nutt wrote:
> I think it is not very much work to implement.  Perhaps I will submit a 
> draft PR for your review.
> 
> 
> On 3/24/2021 9:34 AM, Matias N. wrote:
> > Yes, you're right, TLS is the way to go.
> > I only wonder how to minimize the impact. Could this array inside the TLS 
> > struct be grown as needed during runtime? That way if no application calls 
> > to getopt() (or any other function requiring similar solution), no extra 
> > space on TLS is used.
> >
> > On Wed, Mar 24, 2021, at 12:32, Gregory Nutt wrote:
> >>>> So we can either add something special just as for errno or use
> >>>> entries in that array (which would require establishing a minimum
> >>>> number of entries to satisfy the case of getopt and potentially
> >>>> others). I think it is better to somehow "reserve" space for the
> >>>> known required cases.
> >>>>
> >>>> What I'm worried about is: how many other cases like this could there
> >>>> be? Maybe there will be a considerable number of these entries added
> >>>> to the TLS structure (yes, four bytes each, but they can add up
> >>>> quickly). I would personally prefer to use reentrant versions when
> >>>> they are available, instead of increasing the memory use of every
> >>>> thread. Not sure what is really best here...
> >>> Standardization is certainly the highest value of the OS and the thing
> >>> that makes NuttX what it is.  Sacrificing standardization sacrifices
> >>> the core value of the OS.
> >> Standardization supports portability.  If we bring in code from Linux,
> >> it will not use getopt_r(), it will use getopt() or getopt_long() and
> >> may not work as expected without the TLS-based change.  Similarly, if we
> >> write applications that depend on the non-standard getop_r(), that code
> >> will not compile or build under Linux.  We will have lost portability.
> >>
> >> Many people support code components that operate under either Linux or
> >> NuttX and they depend on having this compatibility.  Why break it?  It
> >> is not consistent with the principles set out in INVIOLABLES.md
> >>
> >>
> >>
> >>
> 
> 


Re: avoiding pitfal of reuse of globals in FLAT mode?

2021-03-24 Thread Gregory Nutt
I think it is not very much work to implement.  Perhaps I will submit a 
draft PR for your review.



On 3/24/2021 9:34 AM, Matias N. wrote:

Yes, you're right, TLS is the way to go.
I only wonder how to minimize the impact. Could this array inside the TLS 
struct be grown as needed during runtime? That way if no application calls to 
getopt() (or any other function requiring similar solution), no extra space on 
TLS is used.

On Wed, Mar 24, 2021, at 12:32, Gregory Nutt wrote:

Se we can either add something special just as for errno or use
entries in that array (which would require establishing a minimum
number of entries to satisfy the case of getopt en potentially
others). I think it is better to somehow "reserve" space for the
known required cases.

What i'm worried about is: how many other cases like this there could
be? Maybe there will be a considerable number of this entries added
to TLS structure (yes, four bytes, but they can add up quickly). I
would personally prefer to use reentrant versions when they are
available, instead of increasing memory use of every thread. Not sure
what is really best here...

Standardization is certainly the highest value of the OS and the thing
that makes NuttX what it is.  Sacrificing standardization sacrifices
the core value of the OS.

Standardization supports portability.  If we bring in code from Linux,
it will not use getopt_r(), it will use getopt() or getopt_long() and
may not work as expected without the TLS-based change.  Similarly, if we
write applications that depend on the non-standard getop_r(), that code
will not compile or build under Linux.  We will have lost portability.

Many people support code components that operate under either Linux or
NuttX and they depend on having this compatibility.  Why break it?  It
is not consistent with the principles set out in INVIOLABLES.md








Re: avoiding pitfal of reuse of globals in FLAT mode?

2021-03-24 Thread Matias N.
Yes, you're right, TLS is the way to go.
I only wonder how to minimize the impact. Could this array inside the TLS 
struct be grown as needed during runtime? That way if no application calls to 
getopt() (or any other function requiring similar solution), no extra space on 
TLS is used.

On Wed, Mar 24, 2021, at 12:32, Gregory Nutt wrote:
> 
> >
> >> Se we can either add something special just as for errno or use 
> >> entries in that array (which would require establishing a minimum 
> >> number of entries to satisfy the case of getopt en potentially 
> >> others). I think it is better to somehow "reserve" space for the 
> >> known required cases.
> >>
> >> What i'm worried about is: how many other cases like this there could 
> >> be? Maybe there will be a considerable number of this entries added 
> >> to TLS structure (yes, four bytes, but they can add up quickly). I 
> >> would personally prefer to use reentrant versions when they are 
> >> available, instead of increasing memory use of every thread. Not sure 
> >> what is really best here...
> > Standardization is certainly the highest value of the OS and the thing 
> > that makes NuttX what it is.  Sacrificing standardization sacrifices 
> > the core value of the OS.
> 
> Standardization supports portability.  If we bring in code from Linux, 
> it will not use getopt_r(), it will use getopt() or getopt_long() and 
> may not work as expected without the TLS-based change.  Similarly, if we 
> write applications that depend on the non-standard getop_r(), that code 
> will not compile or build under Linux.  We will have lost portability.
> 
> Many people support code components that operate under either Linux or 
> NuttX and they depend on having this compatibility.  Why break it?  It 
> is not consistent with the principles set out in INVIOLABLES.md
> 
> 
> 
> 


Re: avoiding pitfal of reuse of globals in FLAT mode?

2021-03-24 Thread Gregory Nutt





Se we can either add something special just as for errno or use 
entries in that array (which would require establishing a minimum 
number of entries to satisfy the case of getopt en potentially 
others). I think it is better to somehow "reserve" space for the 
known required cases.


What i'm worried about is: how many other cases like this there could 
be? Maybe there will be a considerable number of this entries added 
to TLS structure (yes, four bytes, but they can add up quickly). I 
would personally prefer to use reentrant versions when they are 
available, instead of increasing memory use of every thread. Not sure 
what is really best here...
Standardization is certainly the highest value of the OS and the thing 
that makes NuttX what it is.  Sacrificing standardization sacrifices 
the core value of the OS.


Standardization supports portability.  If we bring in code from Linux, 
it will not use getopt_r(); it will use getopt() or getopt_long() and 
may not work as expected without the TLS-based change.  Similarly, if we 
write applications that depend on the non-standard getopt_r(), that code 
will not compile or build under Linux.  We will have lost portability.


Many people support code components that operate under either Linux or 
NuttX and they depend on having this compatibility.  Why break it?  It 
is not consistent with the principles set out in INVIOLABLES.md






Re: avoiding pitfal of reuse of globals in FLAT mode?

2021-03-24 Thread Gregory Nutt




Se we can either add something special just as for errno or use entries in that array 
(which would require establishing a minimum number of entries to satisfy the case of 
getopt en potentially others). I think it is better to somehow "reserve" space 
for the known required cases.

What i'm worried about is: how many other cases like this there could be? Maybe 
there will be a considerable number of this entries added to TLS structure 
(yes, four bytes, but they can add up quickly). I would personally prefer to 
use reentrant versions when they are available, instead of increasing memory 
use of every thread. Not sure what is really best here...
Standardization is certainly the highest value of the OS and the thing 
that makes NuttX what it is.  Sacrificing standardization sacrifices the 
core value of the OS.


Re: avoiding pitfal of reuse of globals in FLAT mode?

2021-03-24 Thread Matias N.
So we can either add something special just as for errno or use entries in that 
array (which would require establishing a minimum number of entries to satisfy 
the case of getopt and potentially others). I think it is better to somehow 
"reserve" space for the known required cases.

What I'm worried about is: how many other cases like this could there be? Maybe 
there will be a considerable number of these entries added to the TLS structure 
(yes, four bytes each, but they can add up quickly). I would personally prefer to 
use reentrant versions when they are available, instead of increasing the memory 
use of every thread. Not sure what is really best here...

On Wed, Mar 24, 2021, at 11:51, Gregory Nutt wrote:
> On 3/24/2021 8:38 AM, Matias N. wrote:
> > So, if I follow correctly, we could maybe have one TLS pointer pointing to 
> > a struct of pointers, one per each group of globals (one of this groups, 
> > would be the set of variables used by getopt()), like:
> >
> > struct task_globals_s
> > {
> >struct getopt_globals_s *getopt_globals;
> >/* ...others */
> > };
> >
> > Then getopt globals would only be allocated once for each task, and only 
> > when getopt() is called.
> >
> > Something like that?
> 
> Yes, that is a possibility.  But that is already implemented just as you 
> describe as POSIX thread-specific data.
> 
> The TLS data structure is defined in include/nuttx/tls.h as following.  
> it is just an array of pointer size things and the errno variable.
> 
> struct tls_info_s
> {
> #if CONFIG_TLS_NELEM > 0
>uintptr_t tl_elem[CONFIG_TLS_NELEM]; /* TLS elements */
> #endif
>int tl_errno;/* Per-thread error number */
> };
> 
> This structure lies at the "bottom" of stack of every thread in user space.
> 
> The standard pthread_getspecific() is then implemented as:
> 
> FAR void *pthread_getspecific(pthread_key_t key)
> {
>return (FAR void *)tls_get_value((int)key);
> }
> 
> Where
> 
> uintptr_t tls_get_value(int tlsindex)
> {
>FAR struct tls_info_s *info;
>uintptr_t ret = 0;
> 
>DEBUGASSERT(tlsindex >= 0 && tlsindex < CONFIG_TLS_NELEM);
>if (tlsindex >= 0 && tlsindex < CONFIG_TLS_NELEM)
>  {
>/* Get the TLS info structure from the current threads stack */
> 
>info = up_tls_info();
>DEBUGASSERT(info != NULL);
> 
>/* Get the element value from the TLS info. */
> 
>ret = info->tl_elem[tlsindex];
>  }
> 
>return ret;
> }
> 
> The POSIX interface supports a pthread_key_create() to manage the indexing.
> 
> 
> 
> 


Re: avoiding pitfal of reuse of globals in FLAT mode?

2021-03-24 Thread Gregory Nutt

On 3/24/2021 8:38 AM, Matias N. wrote:

So, if I follow correctly, we could maybe have one TLS pointer pointing to a 
struct of pointers, one for each group of globals (one of these groups would be 
the set of variables used by getopt()), like:

struct task_globals_s
{
   struct getopt_globals_s *getopt_globals;
   /* ...others */
};

Then getopt globals would only be allocated once for each task, and only when 
getopt() is called.

Something like that?


Yes, that is a possibility.  But that is already implemented just as you 
describe as POSIX thread-specific data.


The TLS data structure is defined in include/nuttx/tls.h as follows.  
It is just an array of pointer-sized elements and the errno variable.


   struct tls_info_s
   {
   #if CONFIG_TLS_NELEM > 0
     uintptr_t tl_elem[CONFIG_TLS_NELEM]; /* TLS elements */
   #endif
     int tl_errno;                        /* Per-thread error number */
   };

This structure lies at the "bottom" of the stack of every thread in user space.

The standard pthread_getspecific() is then implemented as:

   FAR void *pthread_getspecific(pthread_key_t key)
   {
     return (FAR void *)tls_get_value((int)key);
   }

Where

   uintptr_t tls_get_value(int tlsindex)
   {
     FAR struct tls_info_s *info;
     uintptr_t ret = 0;

     DEBUGASSERT(tlsindex >= 0 && tlsindex < CONFIG_TLS_NELEM);
     if (tlsindex >= 0 && tlsindex < CONFIG_TLS_NELEM)
       {
         /* Get the TLS info structure from the current thread's stack */

         info = up_tls_info();
         DEBUGASSERT(info != NULL);

         /* Get the element value from the TLS info. */

         ret = info->tl_elem[tlsindex];
       }

     return ret;
   }

The POSIX interface supports a pthread_key_create() to manage the indexing.
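
As a rough illustration of the allocate-on-demand idea discussed in this thread, the sketch below keeps getopt()-style state behind a POSIX thread-specific data key. It uses only standard pthread calls; the structure, its field names, and the getopt_globals() accessor are hypothetical names invented for this example and are not taken from the NuttX sources. Only one pointer-sized slot is consumed per thread, and the state itself is allocated the first time a thread asks for it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical container for getopt()-style state (illustrative names) */

struct getopt_globals_s
{
  char *go_optarg;  /* Would back optarg */
  int   go_optind;  /* Would back optind */
  int   go_opterr;  /* Would back opterr */
  int   go_optopt;  /* Would back optopt */
};

static pthread_key_t  g_getopt_key;
static pthread_once_t g_getopt_once = PTHREAD_ONCE_INIT;
static bool           g_getopt_keyok;

/* Runs exactly once.  free() is registered as the destructor, so the
 * per-thread state is released when a thread that allocated it exits.
 */

static void getopt_key_init(void)
{
  g_getopt_keyok = (pthread_key_create(&g_getopt_key, free) == 0);
}

/* Return the calling thread's state, allocating it on first use.
 * Returns NULL if the key or the memory could not be obtained.
 */

struct getopt_globals_s *getopt_globals(void)
{
  struct getopt_globals_s *go;

  pthread_once(&g_getopt_once, getopt_key_init);
  assert(g_getopt_keyok);
  if (!g_getopt_keyok)
    {
      return NULL;
    }

  go = pthread_getspecific(g_getopt_key);
  if (go == NULL)
    {
      go = calloc(1, sizeof(*go));
      if (go != NULL)
        {
          go->go_optind = 1;  /* Same initial values as the standard */
          go->go_opterr = 1;  /* optind and opterr globals */

          if (pthread_setspecific(g_getopt_key, go) != 0)
            {
              free(go);
              go = NULL;
            }
        }
    }

  return go;
}
```

A thread that never calls getopt_globals() pays nothing beyond the key slot, which is the property asked about earlier in the thread. Note that this gives per-thread scoping, not the per-task scoping that a bug-for-bug implementation would need.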





Re: avoiding pitfal of reuse of globals in FLAT mode?

2021-03-24 Thread Matias N.
So, if I follow correctly, we could maybe have one TLS pointer pointing to a 
struct of pointers, one for each group of globals (one of these groups would be 
the set of variables used by getopt()), like:

struct task_globals_s
{
  struct getopt_globals_s *getopt_globals;
  /* ...others */
};

Then getopt globals would only be allocated once for each task, and only when 
getopt() is called.

Something like that?

On Wed, Mar 24, 2021, at 11:24, Gregory Nutt wrote:
> 
> > Of course, I would only call getopt() once. My question was if we use TLS, 
> > would the memory use scale with the number of threads? Or would this memory 
> > for getopt() only be allocated on getopt() calls?
> 
> Yes and yes, but the memory use might be as small as a single pointer.  
> Per task data would be better and exists now in the OS, but we would 
> have to implement some internal OS APIs that libc could use to access it.
> 
> Another thing to consider is that the current per-task-data is protected 
> and can only be accessed in supervisor mode.  That is not a problem for 
> the FLAT build but might add complication to the PROTECTED build (or at 
> least more system call overhead).
> 
> KERNEL build mode does not need per-task data at all.  It would be 
> better if the data could just be kept in globals for KERNEL build, just 
> as with Linux.
> 
> 
> 


Re: avoiding pitfal of reuse of globals in FLAT mode?

2021-03-24 Thread Gregory Nutt




Of course, I would only call getopt() once. My question was if we use TLS, 
would the memory use scale with the number of threads? Or would this memory for 
getopt() only be allocated on getopt() calls?


Yes and yes, but the memory use might be as small as a single pointer.  
Per task data would be better and exists now in the OS, but we would 
have to implement some internal OS APIs that libc could use to access it.


Another thing to consider is that the current per-task-data is protected 
and can only be accessed in supervisor mode.  That is not a problem for 
the FLAT build but might add complication to the PROTECTED build (or at 
least more system call overhead).


KERNEL build mode does not need per-task data at all.  It would be 
better if the data could just be kept in globals for KERNEL build, just 
as with Linux.





Re: avoiding pitfal of reuse of globals in FLAT mode?

2021-03-24 Thread Matias N.
Of course, I would only call getopt() once. My question was if we use TLS, 
would the memory use scale with the number of threads? Or would this memory for 
getopt() only be allocated on getopt() calls?

On Wed, Mar 24, 2021, at 10:56, Gregory Nutt wrote:
> 
> >> You would expect getopt() to be used only on the main thread since that
> >> is the only thread that receives argc and argv.
> > So if it is only used in one thread there would only be a copy of the data? 
> > What if I spawn multiple threads and call getopt only on one?
> 
> It is hard to imagine how you could call getopt() on any pthread 
> created with pthread_create():  The thread has no argc and argv inputs, 
> so how could a pthread use getopt() unless you contrive some very 
> artificial situation?  pthreads do not receive argument lists and, 
> hence, don't need to parse argument lists.  The single thread that 
> starts with main() is the only thread that has argc and argv and the 
> only thread that can normally call getopt().
> 
> 
> 


Re: avoiding pitfal of reuse of globals in FLAT mode?

2021-03-24 Thread Gregory Nutt




You would expect getopt() to be used only on the main thread since that
is the only thread that receives argc and argv.

So if it is only used in one thread there would only be a copy of the data? 
What if I spawn multiple threads and call getopt only on one?


It is hard to imagine how you could call getopt() on any pthread 
created with pthread_create():  The thread has no argc and argv inputs, 
so how could a pthread use getopt() unless you contrive some very 
artificial situation?  pthreads do not receive argument lists and, 
hence, don't need to parse argument lists.  The single thread that 
starts with main() is the only thread that has argc and argv and the 
only thread that can normally call getopt().





Re: avoiding pitfal of reuse of globals in FLAT mode?

2021-03-24 Thread Matias N.


On Wed, Mar 24, 2021, at 10:37, Gregory Nutt wrote:
> 
> > Thanks for all answers. I don't entirely understand most of them though as 
> > I'm not really familiar with the implications of TLS or how to use it 
> > correctly. Also, do we need per-thread or per-task data here?
> 
> You would expect getopt() to be used only on the main thread since that 
> is the only thread that receives argc and argv.

So if it is only used in one thread there would only be a copy of the data? 
What if I spawn multiple threads and call getopt only on one?

> 
> A faithful, bug-for-bug implementation would require per-task data, but 
> AFAIK there would be no real problem with per-thread data either.

Yes, my thinking is that getopt() does not provide thread safety guarantees but 
it is not wrong to provide them.
I think that some obscure case of changing getopt() globals from different 
threads is worse to support than just not doing anything in our case.

> 
> >
> > What I'm thinking is that, besides the TLS based solution, adding a 
> > non-standard getopt() seems to be a good option anyway, since it is a 
> > lightweight solution to this particular function.
> 
> Except that NuttX is a standards based OS and we avoid non-standard 
> interfaces like the plague.  Using TLS is 100% transparent and 100% 
> compatible.  Why would you adopt a non-standard solution when a better, 
> fully compliant implementation is readily available?

My thinking is that this could be a case of something that in FLAT mode would 
give the wrong results if just used as is, and the non-standard function allows 
us to overcome this. But of course if the TLS solution is the right approach and 
does not incur extra resource usage, the extra function would not be needed. 
My concern was that "the right approach" would incur too much 
resource usage. But again, I don't understand how TLS works yet.

Best,
Matias

Re: avoiding pitfal of reuse of globals in FLAT mode?

2021-03-24 Thread Gregory Nutt




Thanks for all answers. I don't entirely understand most of them though as I'm 
not really familiar with the implications of TLS or how to use it correctly. 
Also, do we need per-thread or per-task data here?


You would expect getopt() to be used only on the main thread since that 
is the only thread that receives argc and argv.


A faithful, bug-for-bug implementation would require per-task data, but 
AFAIK there would be no real problem with per-thread data either.




What I'm thinking is that, besides the TLS based solution, adding a 
non-standard getopt() seems to be a good option anyway, since it is a 
lightweight solution to this particular function.


Except that NuttX is a standards based OS and we avoid non-standard 
interfaces like the plague.  Using TLS is 100% transparent and 100% 
compatible.  Why would you adopt a non-standard solution when a better, 
fully compliant implementation is readily available?






Re: avoiding pitfal of reuse of globals in FLAT mode?

2021-03-24 Thread Matias N.
Thanks for all answers. I don't entirely understand most of them though as I'm 
not really familiar with the implications of TLS or how to use it correctly. 
Also, do we need per-thread or per-task data here? 

What I'm thinking is that, besides the TLS based solution, adding a 
non-standard getopt() seems to be a good option anyway, since it is a 
lightweight solution to this particular function.

So, how should we proceed to address this somehow? 

Best,
Matias

On Wed, Mar 24, 2021, at 10:22, Gregory Nutt wrote:
> 
> >> The custom handler isn't enough here, because the real problem is we need
> >> the global variables per task/process.
> >> As Greg suggests, we need something like TLS but per task/process not per
> >> thread(e.g. task_getspecific/task_setspecific).
> >> Once the mechanism is done, getopt can be converted to conform to the
> >> standard trivially.
> >>
> > I was looking at this exact issue last week (see comment in
> > https://github.com/apache/incubator-nuttx/pull/3054).
> >
> > The basis for this mechanism exists in the way errno is handled. Perhaps a
> > structure defined for all libc globals added to TLS and a call from the
> > task creation code to initialise it?
> 
> That would, of course, make the stack usage much larger.  Perhaps an 
> allocate-on-demand approach would make that doable.
> 
> POSIX thread-specific data has a good interface for such 
> allocate-on-demand usage.  POSIX thread-specific data is built on top of 
> TLS.
> 
> 
> 


Re: avoiding pitfal of reuse of globals in FLAT mode?

2021-03-24 Thread Gregory Nutt




The custom handler isn't enough here, because the real problem is we need
the global variables per task/process.
As Greg suggests, we need something like TLS but per task/process not per
thread(e.g. task_getspecific/task_setspecific).
Once the mechanism is done, getopt can be converted to conform to the standard
trivially.


I was looking at this exact issue last week (see comment in
https://github.com/apache/incubator-nuttx/pull/3054).

The basis for this mechanism exists in the way errno is handled. Perhaps a
structure defined for all libc globals added to TLS and a call from the
task creation code to initialise it?


That would, of course, make the stack usage much larger.  Perhaps an 
allocate-on-demand approach would make that doable.


POSIX thread-specific data has a good interface for such 
allocate-on-demand usage.  POSIX thread-specific data is built on top of 
TLS.





Re: avoiding pitfal of reuse of globals in FLAT mode?

2021-03-24 Thread Gregory Nutt




The custom handler isn't enough here, because the real problem is we need
the global variables per task/process.
As Greg suggests, we need something like TLS but per task/process not per
thread(e.g. task_getspecific/task_setspecific).
Once the mechanism is done, getopt can be converted to conform to the standard
trivially.


The mechanism is there to emulate per-process global variables: it is the 
task group.  Data added to the task group structure is per 
task/process.  Examples are file descriptors and signal handlers, which 
must follow that scoping.

The transparent/standard solution is to switch to ELF binaries (note: this
doesn't depend on KERNEL mode), but that loses the XIP benefit (a huge memory
penalty). It's doable to get XIP back by combining the ELF loader and ROMFS.


ELF can never be XIP from ROMFS because it requires relocations inside 
of the ELF image to link to the base system.


In Linux, ELF is separated into a text region that holds the shared code 
and a RAM region (the Global Offset Table "GOT") that is positioned 
right after the text region using the MMU.  The GOT is not useful in the 
NuttX model because we cannot force the GOT to a known position with the 
MMU.  So the GOT is not included in the link and relocations must be 
performed directly into the ELF text region.


NxFLAT will run XIP from ROMFS because it does all relocations in a 
"thunk" layer outside of the text region.  But NxFLAT has some other 
limitations with regard to pointers.



When using globals, best practice is to make it really clear that the
variables are global. Many programmers do this by prefixing global
variable names with g_*.


This prefix is required by the coding standard.



Yes, my concern is about functions such as getopt(). If you just follow the
description of the API and use it as normal you reach this pitfall. I was
looking for some approach to avoid this as much as possible. For getopt() I see
there's even no standard getopt_r(), so we would have to provide our own, which
may not be a bad idea.
Still, this issue will probably present in many other places.



People seldom call getopt_r in Linux, because each process gets its own new,
clean copy of the globals, but it is crucial for NuttX to work correctly.
Yes, getopt_r isn't standardized by committee, but it follows the
convention used by other similar functions (e.g. strtok_r) and implemented
by glibc.


getopt_r is unnecessary in Linux since it naturally has per-process 
global variables.  Using getopt_r would serve no purpose in a true Unix 
environment and, hence, it is not included in any standard.




RE: avoiding pitfal of reuse of globals in FLAT mode?

2021-03-24 Thread David Sidrane
> For getopt() I see there's even no standard getopt_r(), so we would have to
> provide our own, which may not be a bad idea.

Here is the one we have been using.

https://github.com/PX4/PX4-Autopilot/commit/eab32572f42f8e3e715b952512b6f5df9041f848

https://github.com/PX4/PX4-Autopilot/blob/master/platforms/common/px4_getopt.c


David

-Original Message-
From: Matias N. [mailto:mat...@imap.cc]
Sent: Tuesday, March 23, 2021 6:18 PM
To: dev@nuttx.apache.org
Subject: Re: avoiding pitfal of reuse of globals in FLAT mode?



On Tue, Mar 23, 2021, at 22:09, Nathan Hartman wrote:
> On Tue, Mar 23, 2021 at 8:39 PM Matias N. wrote:
>
> > Hi,
> > while using getopt() from a task started from NSH I realized subsequent
> > calls reused the global optind and similar variables resulting in different
> > results each time. I'm aware this is expected in FLAT mode and is related
> > to the issue of static C++ constructors (they would only be called once,
> > not every time the task is started).
> >
> > What I wonder is what could we do to avoid this common pitfall:
> > - document it somewhere (a common issues/troubleshooting section in the
> > docs would be good to have anyways) and just accept the issue
> > - religiously initialize globals myself before being used (a pain, error
> > prone, and a bit ad hoc, working only for FLAT mode)
>
>
> When using globals, best practice is to make it really clear that the
> variables are global. Many programmers do this by prefixing global variable
> names with g_*.
>
> I take a different approach: A long time ago, I started grouping all
> globals in a struct, which has one global instance called Global. It makes
> it easy to find all globals, and furthermore at the start of the program, as
> a matter of policy, the first thing I do is memset() the Global struct to 0.
> Yes, I know that is often redundant to the startup code, but in some
> situations the startup code doesn't initialize globals. The FLAT model is
> one example of this (from the 2nd invocation onwards). I've seen other
> examples of this over the years. By memset()ing your globals at the start
> of main() you can rest assured that the globals are in fact zeroed,
> regardless of whatever else happened before main(). It has another side
> benefit: with globals grouped this way, it becomes trivial to take a
> standalone program and turn it into a component of a larger program.
> tl;dr, this approach has worked great for me for a long time.

That sounds like a good approach.
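
A minimal sketch of the approach described above, assuming a program whose entry point may be invoked more than once without its globals being re-initialized (as happens in FLAT mode). The struct layout, the counter_main() entry point, and the demo function are hypothetical names made up for this illustration; only the single instance name Global comes from the description.

```c
#include <assert.h>
#include <string.h>

/* All of the program's globals live in one struct (fields are examples) */

struct app_globals_s
{
  unsigned int run_count;  /* Work done during this invocation */
  int          verbose;    /* Example option flag */
  char         name[16];   /* Example buffer */
};

static struct app_globals_s Global;

/* Hypothetical entry point; it may be started many times */

int counter_main(int argc, char *argv[])
{
  (void)argc;
  (void)argv;

  /* First thing, as a matter of policy: zero every global.  Nothing
   * left over from a previous invocation can survive this.
   */

  memset(&Global, 0, sizeof(Global));

  Global.run_count++;
  return (int)Global.run_count;
}

/* Simulates starting the program twice in the same address space */

void demo_two_invocations(void)
{
  assert(counter_main(0, NULL) == 1);
  assert(counter_main(0, NULL) == 1);  /* Would be 2 without the memset() */
}
```

As the caveat that follows points out, this only covers globals the program owns; it does nothing for state held inside library functions such as getopt().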

>
> Caveat: It won't help if your program (or any API called by it) uses
> globals that are outside your control, and therefore, not initialized by
> you. :-/

Yes, my concern is about functions such as getopt(). If you just follow the
description of the API and use it as normal you reach this pitfall. I was
looking for some approach to avoid this as much as possible. For getopt() I see
there's even no standard getopt_r(), so we would have to provide our own, which
may not be a bad idea.
Still, this issue will probably present in many other places.

>
> Nathan
>

Best,
Matias


Re: avoiding pitfal of reuse of globals in FLAT mode?

2021-03-24 Thread Byron Ellacott
On Wed, Mar 24, 2021 at 2:08 PM Xiang Xiao wrote:

> On Wed, Mar 24, 2021 at 9:18 AM Matias N.  wrote:
>
> >
> > > > - devise a mechanism to mimic what would be done by OS in KERNEL mode
> > > > (add some custom handler to APIs internally using globals, such as
> > > > getopt, that can be called either manually by user or by the OS itself
> > > > when the task is started?)
>
>
> The custom handler isn't enough here, because the real problem is we need
> the global variables per task/process.
> As Greg suggests, we need something like TLS but per task/process not per
> thread(e.g. task_getspecific/task_setspecific).
> Once the mechanism is done, getopt can be converted to conform to the standard
> trivially.
>

I was looking at this exact issue last week (see comment in
https://github.com/apache/incubator-nuttx/pull/3054).

The basis for this mechanism exists in the way errno is handled. Perhaps a
structure defined for all libc globals added to TLS and a call from the
task creation code to initialise it?


> >
> > > - other?
>
> The transparent/standard solution is to switch to ELF binaries (note: this
> doesn't depend on KERNEL mode), but that loses the XIP benefit (a huge memory
> penalty). It's doable to get XIP back by combining the ELF loader and ROMFS.
>

Switching to ELF doesn't necessarily help - I encountered the problem
loading nsh as an ELF binary in a FLAT build. The globals optarg and optind
aren't in libc.csv. They'll be picked up and included in symtab_apps, but
those symbols will simply point to the (system) global variables, not a
per-task variable.

  Byron