Re: UDP send delay

2020-11-09 Thread Gregory Nutt



There are two threads involved:

1. The user thread that calls the UDP sendto() interface:

  * Lock the network.
  * Call netdev_txnotify_dev() to inform the driver that TX data is
available.  The driver should schedule the TX poll on the LP work queue.
  * If CONFIG_NET_UDP_WRITE_BUFFERS is enabled, the UDP sendto() will
copy the UDP packet into a write buffer, unlock the network,
and return to the caller immediately.
  * If CONFIG_NET_UDP_WRITE_BUFFERS is NOT enabled, the UDP sendto()
will unlock the network and wait for the driver TX poll.

2. The other thread is the LP work queue thread.  Work was scheduled
here when netdev_txnotify_dev() was called:


  * Lock the network (perhaps waiting for the user thread to unlock it).
  * Perform the TX poll.
  * If CONFIG_NET_UDP_WRITE_BUFFERS is enabled, it will copy the
buffered UDP packet into the driver packet buffer.
  * If CONFIG_NET_UDP_WRITE_BUFFERS is NOT enabled, it will copy the
user data directly into the driver packet buffer.
  * When the packet buffer is filled, the Ethernet driver will send
(or schedule to send) the packet, as sketched below.
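
To make the two configurations concrete, here is a schematic sketch of
the user-thread side in C.  net_lock(), net_unlock(), and
netdev_txnotify_dev() are real NuttX interfaces, but the two helpers
marked "hypothetical" stand in for the write-buffer and rendezvous
logic -- this is a sketch, not the actual psock_udp_sendto()
implementation:

/* Schematic only: headers, error paths, and connection handling omitted */

ssize_t udp_sendto_sketch(FAR struct net_driver_s *dev,
                          FAR const void *buf, size_t len)
{
  net_lock();                /* Serialize access to the network stack */

  netdev_txnotify_dev(dev);  /* Tell the driver that TX data is
                              * available; it schedules the TX poll on
                              * the LP work queue */

#ifdef CONFIG_NET_UDP_WRITE_BUFFERS
  copy_to_write_buffer(buf, len);  /* Hypothetical helper: one extra copy */
  net_unlock();
  return len;                /* Return before the data hits the wire */
#else
  wait_for_tx_poll(buf, len);      /* Hypothetical helper: block until the
                                    * TX poll copies the user data into
                                    * the driver packet buffer */
  net_unlock();
  return len;                /* Return once the driver accepted the data */
#endif
}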

For single packet transfers, I would think that the latency would be a 
little less if CONFIG_NET_UDP_WRITE_BUFFERS were disabled.  That would 
save one packet copy with the side effect of making the user 
application wait until the data is accepted by the driver.
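
In defconfig terms (using the same Kconfig comment convention as the
config file attached later in the thread), the lower-latency, blocking
variant is simply:

# CONFIG_NET_UDP_WRITE_BUFFERS is not set

and the buffered, return-immediately variant is:

CONFIG_NET_UDP_WRITE_BUFFERS=y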


LP worker thread priority could have some effect in certain situations.

See also the slide entitled /Tx Event Handler “Rendezvous”/ in 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=139629397




Re: UDP send delay

2020-11-09 Thread Gregory Nutt



> I've already tested network throughput with the 'udpblaster' tool and
> was pleasantly surprised by NuttX's ability to flood the full Ethernet
> bandwidth in both directions simultaneously. That was one of the
> reasons why we chose NuttX for our project. "If it's so strong at
> flooding, it should have low latency," we thought... but no, the real
> latency turns out to be not so great.

In a throughput test like udpblaster, the latency that you are concerned
with now is hidden by the overlap in sending queued data packets.  The
next packet is always in place and ready to be sent when the previous
packet is sent.
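
As a rough sanity check on the numbers: the wire time for a 1000-byte
payload is about 8000 bits / 100 Mbit/s = 80 usec (plus preamble,
headers, and the inter-frame gap -- hence the "80+ us" figure Jukka
quotes below).  With packets queued back to back, the driver can keep
the wire busy continuously, so a per-packet software latency of ~200
usec never shows up in a throughput measurement even though every
individual packet still experiences it.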



Re: UDP send delay

2020-11-09 Thread Andrey Shetov
Thanks, Gregory, for that wiki page.

I've already tested network throughput with the 'udpblaster' tool and was
pleasantly surprised by NuttX's ability to flood the full Ethernet
bandwidth in both directions simultaneously. That was one of the reasons
why we chose NuttX for our project. "If it's so strong at flooding, it
should have low latency," we thought... but no, the real latency turns out
to be not so great.

I suppose the delay hides somewhere in the Ethernet driver, because
packets from RAW sockets suffer just as UDP packets do. But I need some
clue to help me choose the right direction for further digging.

On Mon, Nov 9, 2020 at 8:43 PM Gregory Nutt  wrote:

> Probably worth taking a look at
> https://cwiki.apache.org/confluence/display/NUTTX/TCP+Network+Performance
>
> That deals specifically with TCP, but the discussion related to the
> Ethernet driver also applies.  NuttX TCP is capable of performing at
> full network speed; the performance bottleneck is always the Ethernet
> driver... usually the way in which the Ethernet driver buffers packets.
>
> If you want to optimize UDP performance, you might need to improve the
> Ethernet driver.
>
>
>


Re: UDP send delay

2020-11-09 Thread Gregory Nutt
Probably worth taking a look at 
https://cwiki.apache.org/confluence/display/NUTTX/TCP+Network+Performance


That deals specifically with TCP, but the discussion related to the
Ethernet driver also applies.  NuttX TCP is capable of performing at
full network speed; the performance bottleneck is always the Ethernet
driver... usually the way in which the Ethernet driver buffers packets.


If you want to optimize UDP performance, you might need to improve the 
Ethernet driver.





Re: UDP send delay

2020-11-09 Thread Andrey Shetov
Here is my current config file.

Thanks for the advice about the work queue priority; I found and set the
CONFIG_STM32H7_ETHMAC_HPWORK option.
But mysteriously, this made the latency worse - the average ping-pong time
increased from ~375 to ~400 usec.
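
For anyone comparing configs, these are the work queue options involved.
The priority values shown are the usual NuttX defaults quoted from
memory, so treat them as illustrative rather than authoritative:

# Run the STM32H7 Ethernet work on the high priority work queue
CONFIG_STM32H7_ETHMAC_HPWORK=y
CONFIG_SCHED_HPWORK=y
CONFIG_SCHED_HPWORKPRIORITY=224

# Alternatively, raise the low priority queue if the TX polls stay there
CONFIG_SCHED_LPWORK=y
CONFIG_SCHED_LPWORKPRIORITY=50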

On Mon, Nov 9, 2020 at 8:04 PM Maciej Wójcik  wrote:

> In the case of the board I was working with, the network traffic was
> handled by the low priority work queue. It is probably the same with your
> network driver.
>
> Such a work queue has configuration options. Maybe you need to change
> some of its parameters.
>
> It might help if you send your configuration file here, or part of it.
>
> On Mon, 9 Nov 2020, 13:55 Andrey Shetov,  wrote:
>
> > Hi, Jukka.
> >
> > I'm working with the thread author on a low latency audio processor
> > based on the STM32H7 chip and can provide some technical details about
> > the issue. We've used the Ethernet config from your repo
> > bitbucket.org/jlaitin/nuttx/ and integrated it into the master branch
> > of NuttX. After that, we ported the CONFIG_SCHED_CRITMONITOR
> > functionality to the H7 platform and measured the sending and
> > receiving times of UDP packets by inserting a bunch of tracepoints
> > into the driver. The results are a little strange - packet receiving
> > is OK, but the actual time of packet _sending_ at 100 Mbit/s is about
> > 200 usec. By "actual time" I mean the delay between the sendto() and
> > stm32_txdone() functions.
> >
> > For now we are looking for the reasons for this delay and ways to
> > reduce it; any help will be much appreciated.
> > PS We've already tried using a raw (packet) socket; it reduces the
> > delay a little, but not significantly.
> >
> > -> sendto_start_time: 0 usec
> > stm32_dopoll: 58 usec
> > stm32_transmit: 93 usec
> > -> sendto_done_time: 120 usec
> > stm32_interrupt: 186 usec
> > ETH_DMACSR_TI: 187 usec
> > stm32_interrupt_work: 205 usec
> > stm32_txdone: 207 usec
> >
> >
> > On Sat, Nov 7, 2020 at 8:29 AM Jukka Laitinen wrote:
> >
> > > Hi,
> > >
> > > I am not quite sure where you measured from (or what your PHY config
> > > is), but sending out 1000 bytes will take, in theory, about 800+ us
> > > at 10 Mbps and 80+ us at 100 Mbps, so you probably wouldn't want it
> > > to be synchronous (you'd like to be able to receive in full duplex
> > > at the same time a transmit happens). The Ethernet driver uses the
> > > STM32H7 ETH block's internal DMA for sending. Other Ethernet
> > > traffic, hubs, switches, network cards, etc. may cause additional
> > > delays.
> > >
> > > Handling the interrupt should be fast, but the actual completion
> > > happens in your thread (waking up from a mutex), so this can be
> > > delayed by some higher priority thread running in your system.
> > >
> > > - Jukka
> > >
> > >
> > > S D wrote on Tuesday, 3 November 2020:
> > > > Hello!
> > > > Could you please help a NuttX newbie?
> > > > I'm using an STM32H7 for my project.
> > > > I'm trying to communicate with a "big" Linux machine via direct
> > > > Ethernet (UDP), and I see relatively big delays for outgoing
> > > > packets (with a 1000 byte payload).
> > > > I see that sendto() is executed in about 120 usecs, but the actual
> > > > transmit seems to be executed much later, and I see the actual
> > > > transmit completion (interrupt) about 100 more usecs after the
> > > > return from the sendto() call.
> > > >
> > > > I spent a few days reading the stm32_ethernet.c driver (and all
> > > > the other stuff it works with) and it seems the system is heavily
> > > > asynchronous.
> > > > Could you please tell me, is there a way to send/receive data
> > > > immediately/synchronously if all I need is a direct Ethernet
> > > > connection to another device (maybe even with raw Ethernet
> > > > frames)?
> > > >
> > > > Thanks!
> > > >
> >
>
#
# Automatically generated file; DO NOT EDIT.
# NuttX/x86_64 Configuration
#

#
# Build Setup
#
CONFIG_EXPERIMENTAL=y
# CONFIG_DEFAULT_SMALL is not set
CONFIG_DEFAULT_TASK_STACKSIZE=2048
CONFIG_HOST_LINUX=y
# CONFIG_HOST_MACOS is not set
# CONFIG_HOST_WINDOWS is not set
# CONFIG_HOST_OTHER is not set

#
# Build Configuration
#
CONFIG_APPS_DIR="../apps"
CONFIG_BUILD_FLAT=y
# CONFIG_BUILD_2PASS is not set

#
# Binary Output Formats
#
CONFIG_INTELHEX_BINARY=y
# CONFIG_MOTOROLA_SREC is not set
CONFIG_RAW_BINARY=y
# CONFIG_UBOOT_UIMAGE is not set
# CONFIG_DFU_BINARY is not set

#
# Customize Header Files
#
# CONFIG_ARCH_HAVE_STDINT_H is not set
# CONFIG_ARCH_HAVE_STDBOOL_H is not set
# CONFIG_ARCH_HAVE_MATH_H is not set
# CONFIG_ARCH_FLOAT_H is not set
CONFIG_ARCH_HAVE_STDARG_H=y
# CONFIG_ARCH_STDARG_H is not set
CONFIG_ARCH_HAVE_SETJMP=y
# CONFIG_ARCH_SETJMP_H is not set
# CONFIG_ARCH_DEBUG_H is not set

#
# Debug Options
#
CONFIG_DEBUG_ALERT=y
# CONFIG_DEBUG_FEATURES is not set
CONFIG_ARCH_HAVE_STACKCHECK=y
# CONFIG_STACK_COLORATION is not set
# CONFIG_STACK_CANARIES is not set
# CONFIG_ARCH_HAVE_HEAPCHECK is not set
CONFIG_DEBUG_SYMBOLS=y
CONFIG_ARCH_HAVE_CUSTOMOPT=y
CONFIG_DEBUG_NOOPT=y
# CONFIG_DEBUG_CUSTOMOPT is not set
# CONFIG_DEBUG_FULLOPT is not set

Re: UDP send delay

2020-11-09 Thread Maciej Wójcik
In the case of the board I was working with, the network traffic was
handled by the low priority work queue. It is probably the same with your
network driver.

Such a work queue has configuration options. Maybe you need to change some
of its parameters.

It might help if you send your configuration file here, or part of it.

On Mon, 9 Nov 2020, 13:55 Andrey Shetov,  wrote:

> Hi, Jukka.
>
> I'm working with the thread author on a low latency audio processor based
> on the STM32H7 chip and can provide some technical details about the
> issue. We've used the Ethernet config from your repo
> bitbucket.org/jlaitin/nuttx/ and integrated it into the master branch of
> NuttX. After that, we ported the CONFIG_SCHED_CRITMONITOR functionality
> to the H7 platform and measured the sending and receiving times of UDP
> packets by inserting a bunch of tracepoints into the driver. The results
> are a little strange - packet receiving is OK, but the actual time of
> packet _sending_ at 100 Mbit/s is about 200 usec. By "actual time" I mean
> the delay between the sendto() and stm32_txdone() functions.
>
> For now we are looking for the reasons for this delay and ways to reduce
> it; any help will be much appreciated.
> PS We've already tried using a raw (packet) socket; it reduces the delay
> a little, but not significantly.
>
> -> sendto_start_time: 0 usec
> stm32_dopoll: 58 usec
> stm32_transmit: 93 usec
> -> sendto_done_time: 120 usec
> stm32_interrupt: 186 usec
> ETH_DMACSR_TI: 187 usec
> stm32_interrupt_work: 205 usec
> stm32_txdone: 207 usec
>
>
> On Sat, Nov 7, 2020 at 8:29 AM Jukka Laitinen wrote:
>
> > Hi,
> >
> > I am not quite sure where you measured from (or what your PHY config
> > is), but sending out 1000 bytes will take, in theory, about 800+ us at
> > 10 Mbps and 80+ us at 100 Mbps, so you probably wouldn't want it to be
> > synchronous (you'd like to be able to receive in full duplex at the
> > same time a transmit happens). The Ethernet driver uses the STM32H7
> > ETH block's internal DMA for sending. Other Ethernet traffic, hubs,
> > switches, network cards, etc. may cause additional delays.
> >
> > Handling the interrupt should be fast, but the actual completion
> > happens in your thread (waking up from a mutex), so this can be
> > delayed by some higher priority thread running in your system.
> >
> > - Jukka
> >
> >
> > S D wrote on Tuesday, 3 November 2020:
> > > Hello!
> > > Could you please help a NuttX newbie?
> > > I'm using an STM32H7 for my project.
> > > I'm trying to communicate with a "big" Linux machine via direct
> > > Ethernet (UDP), and I see relatively big delays for outgoing packets
> > > (with a 1000 byte payload).
> > > I see that sendto() is executed in about 120 usecs, but the actual
> > > transmit seems to be executed much later, and I see the actual
> > > transmit completion (interrupt) about 100 more usecs after the
> > > return from the sendto() call.
> > >
> > > I spent a few days reading the stm32_ethernet.c driver (and all the
> > > other stuff it works with) and it seems the system is heavily
> > > asynchronous.
> > > Could you please tell me, is there a way to send/receive data
> > > immediately/synchronously if all I need is a direct Ethernet
> > > connection to another device (maybe even with raw Ethernet frames)?
> > >
> > > Thanks!
> > >
>


Re: UDP send delay

2020-11-09 Thread Andrey Shetov
Hi, Jukka.

I'm working with the thread author on a low latency audio processor based
on the STM32H7 chip and can provide some technical details about the
issue. We've used the Ethernet config from your repo
bitbucket.org/jlaitin/nuttx/ and integrated it into the master branch of
NuttX. After that, we ported the CONFIG_SCHED_CRITMONITOR functionality
to the H7 platform and measured the sending and receiving times of UDP
packets by inserting a bunch of tracepoints into the driver. The results
are a little strange - packet receiving is OK, but the actual time of
packet _sending_ at 100 Mbit/s is about 200 usec. By "actual time" I mean
the delay between the sendto() and stm32_txdone() functions.

For now we are looking for the reasons for this delay and ways to reduce
it; any help will be much appreciated.
PS We've already tried using a raw (packet) socket; it reduces the delay a
little, but not significantly.

-> sendto_start_time: 0 usec
stm32_dopoll: 58 usec
stm32_transmit: 93 usec
-> sendto_done_time: 120 usec
stm32_interrupt: 186 usec
ETH_DMACSR_TI: 187 usec
stm32_interrupt_work: 205 usec
stm32_txdone: 207 usec
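
For reference, a minimal sketch of the raw-socket variant mentioned in
the PS, using the Linux-style AF_PACKET API that NuttX's CONFIG_NET_PKT
packet sockets mimic.  The interface name is a placeholder and error
handling is trimmed; treat it as an assumption-laden sketch rather than
the code we actually ran:

#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <net/if.h>
#include <netpacket/packet.h>

int send_raw_frame(const uint8_t *frame, size_t len)
{
  /* A raw packet socket bypasses the UDP/IP layers entirely */

  int sd = socket(AF_PACKET, SOCK_RAW, 0);
  if (sd < 0)
    {
      return -1;
    }

  struct sockaddr_ll addr;
  memset(&addr, 0, sizeof(addr));
  addr.sll_family  = AF_PACKET;
  addr.sll_ifindex = if_nametoindex("eth0");  /* Placeholder interface */

  /* The frame buffer must already contain the Ethernet header */

  ssize_t nsent = sendto(sd, frame, len, 0,
                         (struct sockaddr *)&addr, sizeof(addr));
  close(sd);
  return nsent < 0 ? -1 : 0;
}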


On Sat, Nov 7, 2020 at 8:29 AM Jukka Laitinen  wrote:

> Hi,
>
> I am not quite sure where you measured from (or what your PHY config is),
> but sending out 1000 bytes will take, in theory, about 800+ us at 10 Mbps
> and 80+ us at 100 Mbps, so you probably wouldn't want it to be
> synchronous (you'd like to be able to receive in full duplex at the same
> time a transmit happens). The Ethernet driver uses the STM32H7 ETH
> block's internal DMA for sending. Other Ethernet traffic, hubs, switches,
> network cards, etc. may cause additional delays.
>
> Handling the interrupt should be fast, but the actual completion happens
> in your thread (waking up from a mutex), so this can be delayed by some
> higher priority thread running in your system.
>
> - Jukka
>
>
> S D wrote on Tuesday, 3 November 2020:
> > Hello!
> > Could you please help a NuttX newbie?
> > I'm using an STM32H7 for my project.
> > I'm trying to communicate with a "big" Linux machine via direct
> > Ethernet (UDP), and I see relatively big delays for outgoing packets
> > (with a 1000 byte payload).
> > I see that sendto() is executed in about 120 usecs, but the actual
> > transmit seems to be executed much later, and I see the actual transmit
> > completion (interrupt) about 100 more usecs after the return from the
> > sendto() call.
> >
> > I spent a few days reading the stm32_ethernet.c driver (and all the
> > other stuff it works with) and it seems the system is heavily
> > asynchronous.
> > Could you please tell me, is there a way to send/receive data
> > immediately/synchronously if all I need is a direct Ethernet connection
> > to another device (maybe even with raw Ethernet frames)?
> >
> > Thanks!
> >


Using multiple SPI devices on samd2l1

2020-11-09 Thread Bernd Walter
I am supposed to call an SPI bus init function to get the SPI device
that I then use to register my device with.

This is the code in sam34:
/****************************************************************************
 * Public Functions
 ****************************************************************************/

/****************************************************************************
 * Name: sam_spibus_initialize
 *
 * Description:
 *   Initialize the selected SPI port
 *
 * Input Parameters:
 *   cs - Chip select number (identifying the "logical" SPI port)
 *
 * Returned Value:
 *   Valid SPI device structure reference on success; a NULL on failure
 *
 ****************************************************************************/

struct spi_dev_s *sam_spibus_initialize(int port)
{
  struct sam_spidev_s *spi;
  struct sam_spics_s *spics;
  int csno  = (port & __SPI_CS_MASK) >> __SPI_CS_SHIFT;
  int spino = (port & __SPI_SPI_MASK) >> __SPI_SPI_SHIFT;

...


The delivered port is the bus number and the chip select number
combined, and I get back an SPI device specific to that bus and chip
select.
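
For illustration, a sketch of how such a combined port ID can be encoded
and decoded.  The EX_* macros and their values are made up for the
example and are not the actual sam34 definitions:

/* Hypothetical encoding: low bits select the chip select, the next
 * bits select the physical SPI bus.
 */

#define EX_SPI_CS_SHIFT   0
#define EX_SPI_CS_MASK    (3 << EX_SPI_CS_SHIFT)   /* Bits 0-1: chip select */
#define EX_SPI_BUS_SHIFT  2
#define EX_SPI_BUS_MASK   (3 << EX_SPI_BUS_SHIFT)  /* Bits 2-3: bus number */

#define EX_SPI_PORT(bus, cs) \
  ((((bus) << EX_SPI_BUS_SHIFT) & EX_SPI_BUS_MASK) | \
   (((cs)  << EX_SPI_CS_SHIFT)  & EX_SPI_CS_MASK))

/* Two devices on the same physical bus, different chip selects: */

struct spi_dev_s *dev0 = sam_spibus_initialize(EX_SPI_PORT(0, 0));
struct spi_dev_s *dev1 = sam_spibus_initialize(EX_SPI_PORT(0, 1));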

Not so for samd2l1: its sam_spibus_initialize() takes the port number
directly and uses it to select the physical bus.  There is no chip
select part.  Additionally, it seems to initialize the physical bus
regardless of whether it had already been set up, so my interpretation
is that I can't call this function twice.

There is a similar problem I've noticed with the i2c_master setup.

I don't know if the chip select handling is really required, but at
least neither driver supports being called multiple times.

-- 
B.Walter  http://www.bwct.de
Modbus/TCP Ethernet I/O modules, ARM-based FreeBSD computers, and more.