Hi Jerin,

> >
> > Hi Jerin,
> 
> Hi Konstantin,
> 
> >
> > > > > >
> > > >
> > > > [...]
> > > >
> > > > > > +
> > > > > > +#ifdef RTE_ETHDEV_TX_PREP
> > > > >
> > > > > Sorry for being a bit late on that discussion, but what the
> > > > > point of having that config macro (RTE_ETHDEV_TX_PREP ) at all?
> > > > > As I can see right now, if driver doesn't setup tx_pkt_prep,
> > > > > then nb_pkts would be return anyway...
> > > > >
> > > > > BTW, there is my another question - should it be that way?
> > > > > Shouldn't we return 0 (and set rte_errno=ENOTSUP) here if
> > > > > dev->tx_pk_prep == NULL?
> > > > >
> > > >
> > > > It's an answer to the Jerin's request discussed here:
> > > > http://dpdk.org/ml/archives/dev/2016-September/046437.html
> > > >
> > > > When driver doesn't support tx_prep, default behavior is "we don't
> > > > know requirements, so we have nothing to do here". It will
> > > > simplify
> > > application logic and improve performance for these drivers, I think. 
> > > Catching this error with every burst may be problematic.
> > > >
> > > > As for RTE_ETHDEV_TX_PREP macro, suggested by Jerin in the same
> > > > thread, I still don't think It's the best solution of the problem
> > > described by him. I have added it here for further discussion.
> > > >
> > > > Jerin, have you something to add?
> > >
> > > Nothing very specific to add here. I think, I have tried to share
> > > the rational in, http://dpdk.org/ml/archives/dev/2016-
> > > September/046437.html
> > >
> >
> > Ok, not sure I am fully understand your intention here.
> > I think I understand why you propose rte_eth_tx_prep() to do:
> >     if (!dev->tx_pkt_prep)
> >             return nb_pkts;
> >
> > That allows drivers to NOOP the tx_prep functionality without paying
> > the price for callback invocation.
> 
> In true sense, returning the nb_pkts makes it functional NOP as well(i.e The 
> PMD does not have any limitation on Tx side, so all
> packets are _good_(no preparation is required))
> 
> 
> > What I don't understand, why with that in place we also need a NOOP
> > for the whole rte_eth_tx_prep():
> > +static inline uint16_t
> > +rte_eth_tx_prep(uint8_t port_id __rte_unused, uint16_t queue_id 
> > __rte_unused,
> > +           struct rte_mbuf **tx_pkts __rte_unused, uint16_t nb_pkts) {
> > +   return nb_pkts;
> > +}
> > +
> > +#endif
> >
> > What are you trying to save here: just reading ' dev->tx_pkt_prep'?
> > If so, then it seems not that performance critical for me.
> > Something else?
> 
> The proposed scheme can make it as true NOOP from compiler perspective too if 
> a target decided to do that, I have checked the
> instruction generation with arm Assembly, a non true compiler NOOP has 
> following instructions overhead at minimum.
> 
>       # 1 load
>       # 1  mov
>       if (!dev->tx_pkt_prep)
>               return nb_pkts;

Yep.

> 
>       # compile can't predict this function needs be executed or not so
>       # pressure on register allocation and mostly likely it call for
>       # some stack push and pop based load on outer function(as it is an
>       # inline function)


Well, I suppose compiler wouldn't try to fill function argument registers 
before the branch above. 

> 
>       return (*dev->tx_pkt_prep)(dev->data->tx_queues[queue_id], tx_pkts, 
> nb_pkts);
> 
>       # 1 branch
>       if (unlikely(nb_prep < nb_rx)) {
>               # bunch of code not executed, but pressure on i cache
>               int i;
>               for (i = nb_prep; i < nb_rx; i++)
>                        rte_pktmbuf_free(pkts_burst[i]);
>       }
> 
> From a server target(IA or high-end armv8) with external PCIe based system 
> makes sense to have RTE_ETHDEV_TX_PREP option
> enabled(which is the case in proposed patch) but the low end arm platforms 
> with
> - limited cores
> - less i cache
> - IPC == 1
> - running around 1GHz
> - most importantly, _integrated_ nic controller with no external PCIE
>   support
> does not make much sense to waste the cycles/time for it.
> cycle saved is cycle earned.

Ok, so it is all to save one memory de-refrence and a comparison plus branch.
Do you have aby estimation how badly it would hit low-end cpu performance?
The reason I am asking: obviously I would prefer to avoid to introduce new 
build config option
(that's the common dpdk coding practice these days) unless it is really 
important.  

> 
> Since DPDK compilation is based on _target_, I really don't see any issue 
> with this approach nor it does not hurt anything on server
> target.
> So, IMO, It should be upto the target to decide what works better for the 
> target.
> 
> Jerin
> 
> > From my point of view NOOP on the driver level is more than enough.
> > Again I would prefer to introduce new config option, if possible.
> >
> > Konstantin
> >

Reply via email to