Re: [dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet
On Thu, 13 Oct 2016 14:35:06 +, Oleg Kuporosov wrote: > The hard requirement of financial services industry is accurate > timestamping aligned with the packet itself. This patch is to satisfy > this requirement: > > - include uint64_t timestamp field into rte_mbuf with minimal impact > to throughput/latency. Keep it just simple uint64_t in ns (more than > 580 years) would be enough for immediate needs while using full > struct timespec with twice bigger size would have much stronger > performance impact as missed cacheline0. > > - it is possible as there is 6-bytes gap in 1st cacheline (fast path) > and moving uint16_t vlan_tci_outer field to 2nd cacheline. > > - such move will only impact for pretty rare usable VLAN RX stripping > mode for outer TCI (it used only for one NIC i40e from the whole > set and allows to keep minimal performance impact for RX/TX > timestamps. > > Signed-off-by: Oleg Kuporosov > --- > lib/librte_mbuf/rte_mbuf.h | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h > index 23b7bf8..1e1f2ed 100644 > --- a/lib/librte_mbuf/rte_mbuf.h > +++ b/lib/librte_mbuf/rte_mbuf.h > @@ -851,8 +851,7 @@ struct rte_mbuf { > > uint32_t seqn; /**< Sequence number. See also > rte_reorder_insert() */ > - /** Outer VLAN TCI (CPU order), valid if > PKT_RX_QINQ_STRIPPED is set. */ > - uint16_t vlan_tci_outer; > + uint64_t timestamp; /**< Packet's timestamp, usually > in ns */ > /* second cache line - fields only used in slow path or on > TX */ MARKER cacheline1 __rte_cache_min_aligned; > @@ -885,6 +884,9 @@ struct rte_mbuf { > }; > }; > > + /** Outer VLAN TCI (CPU order), valid if > PKT_RX_QINQ_STRIPPED is set. */ > + uint16_t vlan_tci_outer; > + > /** Size of the application private data. In case of an > indirect >* mbuf, it stores the direct mbuf private data size. */ > uint16_t priv_size; FYI, I posted a RFC patchset that introduces the timestamp field in the mbuf for v17.05: http://dpdk.org/ml/archives/dev/2017-January/056187.html Regards, Olivier
[dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet
Hi, > -Original Message- > From: Oleg Kuporosov [mailto:olegk at mellanox.com] > Sent: Wednesday, October 19, 2016 6:40 PM > To: Pattan, Reshma ; Olivier Matz > > Cc: Richardson, Bruce ; Ananyev, Konstantin > ; dev at dpdk.org > Subject: RE: [dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the > packet > > Hello Reshma > > > I just read this mail chain, to make every one aware again, I am > > emphasizing on the point that I am also adding new "time_arraived" > > field to mbuf struct as part of below 17.02 Road map item. > > Thanks for your work for extending statistics support in DPDK. > > "time_arrived" points here for mostly RX path, while more common term used > "timestamp" > Allows also using it for TX path as well in the future. > I will take a note of this for next revision. Thanks, Reshma
[dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet
Hello Konstantin, > > My vote also would be to have timestamp in the second cache line. > About moving seqn to the 2-nd cache line too - that's probably a fair point. It may impact throughput till ~6% for applications required such embedded Timestamps. > About the rest of the patch: > Do you really need to put that code into the PMDs itself? > Can't the same result be achieved by using RX callbacks? > Again that approach would work with any PMD and there would be no need > to modify PMD code itself. Correct, the approach with using callbacs (rte_eth_timesync_read_[r|t]x_timestamp()) Has also some Cons for this use case: - FSI needs the most accurate stamping as possible by reasons were described in Cover letter - callback will be called from user app and so you have to take into account Difference between time when packet was released by NIC and callback call - such difference is not easy to estimate correctly due to dependency on CPU type, Its frequency and current load conditions - even estimated it would be hard without additional performance penalty to align Packet with timestamp, taking account that call may actually placed from Different thread or even process. It looks the least impacting and correct way is to have timestamp in rte_mbuf and fill It in Rx burst procedure. > Another thing, that I am in doubt why to use system time? > Wouldn't raw HW TSC value (rte_rdtsc()) do here? System time is being used for periodic clock synchronization between wall clock and NIC to estimate NIC clock deviation. It is in assumption the system itself is synchronized by PTP from master clock. It is run on context of control thread. Thanks, Oleg.
[dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet
On Thu, Oct 20, 2016 at 4:03 AM, Oleg Kuporosov wrote: > Hello Konstantin, > >> >> My vote also would be to have timestamp in the second cache line. >> About moving seqn to the 2-nd cache line too - that's probably a fair point. > > It may impact throughput till ~6% for applications required such embedded > Timestamps. > >> About the rest of the patch: >> Do you really need to put that code into the PMDs itself? >> Can't the same result be achieved by using RX callbacks? >> Again that approach would work with any PMD and there would be no need >> to modify PMD code itself. > > Correct, the approach with using callbacs > (rte_eth_timesync_read_[r|t]x_timestamp()) > Has also some Cons for this use case: > - FSI needs the most accurate stamping as possible by reasons were described > in > Cover letter >From my experience this is only true if there is near-zero performance impact. From my perspective this is only relevant if the used hardware supports offloading of writing the timestamps. Everything else is a huge impact if its unconditionally enabled. The regulatory requirements are already covered by the exchange protocols which means that timestamps are already present in the network packet payload (generated by the exchange trading system and/or the trading application itself). In the end it is the exchange itself and its members that are regulated. I can see that this might be interesting for exchange members allowing sponsored naked access (for non-exchange members) to generate data that they are not front-running their clients. I doubt that this non-functional requirement is important enough to sacrifice the functional requirement of supporting QinQ. > - callback will be called from user app and so you have to take into account > Difference between time when packet was released by NIC and callback call Have you looked at using dedicated preallocated trace buffers that are filled with timestamps values? This should work fine for getting some inside into the latency between application readiness and the actual time the burst happened. Thanks, Jan > - such difference is not easy to estimate correctly due to dependency on CPU > type, > Its frequency and current load conditions > - even estimated it would be hard without additional performance penalty to > align > Packet with timestamp, taking account that call may actually placed from > Different thread or even process. > > It looks the least impacting and correct way is to have timestamp in rte_mbuf > and fill > It in Rx burst procedure. > >> Another thing, that I am in doubt why to use system time? >> Wouldn't raw HW TSC value (rte_rdtsc()) do here? > > System time is being used for periodic clock synchronization between wall > clock and NIC to estimate NIC clock deviation. It is in assumption the system > itself is > synchronized by PTP from master clock. It is run on context of control thread. > > Thanks, > Oleg.
[dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet
Hello Reshma > I just read this mail chain, to make every one aware again, I am emphasizing > on the point that I am also adding new "time_arraived" field to mbuf struct as > part of below 17.02 Road map item. Thanks for your work for extending statistics support in DPDK. "time_arrived" points here for mostly RX path, while more common term used "timestamp" Allows also using it for TX path as well in the future. Best regards, Oleg.
[dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet
Hello Oliver Great thanks for your review and con > > - include uint64_t timestamp field into rte_mbuf with minimal impact to > > throughput/latency. Keep it just simple uint64_t in ns (more than 580 > > years) would be enough for immediate needs while using full > > struct timespec with twice bigger size would have much stronger > > performance impact as missed cacheline0. > > > > - it is possible as there is 6-bytes gap in 1st cacheline (fast path) > > and moving uint16_t vlan_tci_outer field to 2nd cacheline. > > > > - such move will only impact for pretty rare usable VLAN RX stripping > > mode for outer TCI (it used only for one NIC i40e from the whole set and > > allows to keep minimal performance impact for RX/TX timestamps. > > This argument is difficult to accept. One can say you are adding a field for a > pretty rare case used by only one NIC :) It may be looks so, but in fact not only for one NIC. Absence of timestamp there Required from developers implement its support in out of DPDK data path with Local DPDK patching which also may lead some penalty in accuracy. Good example here is open source implementation of tracing library - https://github.com/wanduow/libtrace/tree/libtrace4 These folks patched DPDK to have timestamp for every packet (Intel DPDK Patches folder) And used it in out of band to store and later processing (lib/format_dpdk.c). That was actually the starting point of my investigations. Such approach is working for 1GBb case but not for 10-100 cases. > > Honestly, I'm not able to judge whether timestamp is more important than > vlan_tci_outer. As room is tight in the first cache line, your patch > submission > is the occasion to raise the question: how to decide what should be in the > first part of the mbuf? There are also some other candidates for moving: m- > >seqn is only used in librte_reorder and it is not set in the RX part of a > >driver. Agree, it is difficult to decide, my thoughts were: - there is hole (6 bytes) which wasn't marked as reserved for any planned extensions; -vlan_tci_outer is being used by only one NIC (i40e) so far, 2nd NIC (fm10k) is using it With comment: * mbuf->vlan_tci_outer is an idle field in fm10k driver, * so it can be selected to store sglort value. To store some another value under some specific "if". Also for i40e it is under #ifndef RTE_LIBRTE_I40E_16BYTE_RX_DESC which per doc should be Enabled for high throughput of small packets. So in default case (disabled) it anyway has some performance penalty with using 32 bytes descriptor and moving it to 2nd CL would not hit big additional penalty. Unfortunately I have no such NIC to measure. Is there any data how often double tagging in being used in DPDK applications? Another my thought was to have at the end of 1st CL enum which may hold Reserved fields per specific use cases and data widths (uint8, 2xuint4, 4xuint2, 8xbytes). > > About the timestamp, it would be valuable to have other opinions, not only > about the placement of the field in the structure, but also to check that this > API is also usable for other NICs. Sure, but I didn't change timesync/timestamping API itself. > Have you measured the impact of having the timestamp in the second part > of the mbuf? Yes, the worst case with prefetching of 2nd CL it is 5.7-5.9 % penalty for combined RX+TX And expectedly much worse without prefetching. In the best case it is 0.3..0.5 % for RX only. It can be explained by much harder cache trashing when TX is "on". > Changing the mbuf structure should happen as rarely as possible, and we > have to make sure we take the correct decisions. I think we will discuss this > at > dpdk userland 2016. Oh, yes, please discuss, I would not be able to join. :( > Apart from that, I wonder if an ol_flag should be added to tell that the > timestamp field is valid in the mbuf. Oliver, there is PKT_RX_IEEE1588_TMST flag already. Best regards, Oleg.
[dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet
Hi lads, > > > > On 10/13/2016 04:35 PM, Oleg Kuporosov wrote: > > The hard requirement of financial services industry is accurate > > timestamping aligned with the packet itself. This patch is to satisfy > > this requirement: > > > > - include uint64_t timestamp field into rte_mbuf with minimal impact to > > throughput/latency. Keep it just simple uint64_t in ns (more than 580 > > years) would be enough for immediate needs while using full > > struct timespec with twice bigger size would have much stronger > > performance impact as missed cacheline0. > > > > - it is possible as there is 6-bytes gap in 1st cacheline (fast path) > > and moving uint16_t vlan_tci_outer field to 2nd cacheline. > > > > - such move will only impact for pretty rare usable VLAN RX stripping > > mode for outer TCI (it used only for one NIC i40e from the whole set and > > allows to keep minimal performance impact for RX/TX timestamps. > > This argument is difficult to accept. One can say you are adding > a field for a pretty rare case used by only one NIC :) > > Honestly, I'm not able to judge whether timestamp is more important than > vlan_tci_outer. As room is tight in the first cache line, your patch > submission is the occasion to raise the question: how to decide what > should be in the first part of the mbuf? There are also some other > candidates for moving: m->seqn is only used in librte_reorder and it > is not set in the RX part of a driver. > > About the timestamp, it would be valuable to have other opinions, > not only about the placement of the field in the structure, but also > to check that this API is also usable for other NICs. > > Have you measured the impact of having the timestamp in the second part > of the mbuf? My vote also would be to have timestamp in the second cache line. About moving seqn to the 2-nd cache line too - that's probably a fair point. About the rest of the patch: Do you really need to put that code into the PMDs itself? Can't the same result be achieved by using RX callbacks? Again that approach would work with any PMD and there would be no need to modify PMD code itself. Another thing, that I am in doubt why to use system time? Wouldn't raw HW TSC value (rte_rdtsc()) do here? Konstantin > > Changing the mbuf structure should happen as rarely as possible, and we > have to make sure we take the correct decisions. I think we will > discuss this at dpdk userland 2016. > > > Apart from that, I wonder if an ol_flag should be added to tell that > the timestamp field is valid in the mbuf. > > Regards, > Olivier
[dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier Matz > Sent: Tuesday, October 18, 2016 4:44 PM > To: dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the > packet > > > > On 10/13/2016 04:35 PM, Oleg Kuporosov wrote: > > The hard requirement of financial services industry is accurate > > timestamping aligned with the packet itself. This patch is to satisfy > > this requirement: > > > > - include uint64_t timestamp field into rte_mbuf with minimal impact to > > throughput/latency. Keep it just simple uint64_t in ns (more than 580 > > years) would be enough for immediate needs while using full > > struct timespec with twice bigger size would have much stronger > > performance impact as missed cacheline0. > > > > - it is possible as there is 6-bytes gap in 1st cacheline (fast path) > > and moving uint16_t vlan_tci_outer field to 2nd cacheline. > > > > - such move will only impact for pretty rare usable VLAN RX stripping > > mode for outer TCI (it used only for one NIC i40e from the whole set and > > allows to keep minimal performance impact for RX/TX timestamps. > > This argument is difficult to accept. One can say you are adding a field for a > pretty rare case used by only one NIC :) > > Honestly, I'm not able to judge whether timestamp is more important than > vlan_tci_outer. As room is tight in the first cache line, your patch > submission is > the occasion to raise the question: how to decide what should be in the first > part > of the mbuf? There are also some other candidates for moving: m->seqn is only > used in librte_reorder and it is not set in the RX part of a driver. > > About the timestamp, it would be valuable to have other opinions, not only > about the placement of the field in the structure, but also to check that > this API > is also usable for other NICs. > > Have you measured the impact of having the timestamp in the second part of > the mbuf? > > Changing the mbuf structure should happen as rarely as possible, and we have > to > make sure we take the correct decisions. I think we will discuss this at dpdk > userland 2016. > > I just read this mail chain, to make every one aware again, I am emphasizing on the point that I am also adding new "time_arraived" field to mbuf struct as part of below 17.02 Road map item. " Extended Stats (Latency and Bit Rate Statistics): Enhance the Extended NIC Stats (Xstats) implementation to support the collection and reporting of latency and bit rate measurements. Latency statistics will include min, max and average latency, and jitter. Bit rate statistics will include peak and average bit rate aggregated over a user-defined time period. This will be implemented for IXGBE and I40E." Adding new field " time_arrived" to the 2nd cache line of rte_mbuf struct to mark the packet arrival time for latency measurements. Adding this new filed to second cache line will not break the ABI. I implemented a new library to mark the time as part of rte_eth_rx callbacks and use that time stamp inside rte_eth_tx callback for measuring the latency. Below is the latest v3 RFC patch for reference. http://dpdk.org/dev/patchwork/patch/16655/ Comments are welcome. Thanks, Reshma
[dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet
On 10/13/2016 04:35 PM, Oleg Kuporosov wrote: > The hard requirement of financial services industry is accurate > timestamping aligned with the packet itself. This patch is to satisfy > this requirement: > > - include uint64_t timestamp field into rte_mbuf with minimal impact to > throughput/latency. Keep it just simple uint64_t in ns (more than 580 > years) would be enough for immediate needs while using full > struct timespec with twice bigger size would have much stronger > performance impact as missed cacheline0. > > - it is possible as there is 6-bytes gap in 1st cacheline (fast path) > and moving uint16_t vlan_tci_outer field to 2nd cacheline. > > - such move will only impact for pretty rare usable VLAN RX stripping > mode for outer TCI (it used only for one NIC i40e from the whole set and > allows to keep minimal performance impact for RX/TX timestamps. This argument is difficult to accept. One can say you are adding a field for a pretty rare case used by only one NIC :) Honestly, I'm not able to judge whether timestamp is more important than vlan_tci_outer. As room is tight in the first cache line, your patch submission is the occasion to raise the question: how to decide what should be in the first part of the mbuf? There are also some other candidates for moving: m->seqn is only used in librte_reorder and it is not set in the RX part of a driver. About the timestamp, it would be valuable to have other opinions, not only about the placement of the field in the structure, but also to check that this API is also usable for other NICs. Have you measured the impact of having the timestamp in the second part of the mbuf? Changing the mbuf structure should happen as rarely as possible, and we have to make sure we take the correct decisions. I think we will discuss this at dpdk userland 2016. Apart from that, I wonder if an ol_flag should be added to tell that the timestamp field is valid in the mbuf. Regards, Olivier
[dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet
The hard requirement of financial services industry is accurate timestamping aligned with the packet itself. This patch is to satisfy this requirement: - include uint64_t timestamp field into rte_mbuf with minimal impact to throughput/latency. Keep it just simple uint64_t in ns (more than 580 years) would be enough for immediate needs while using full struct timespec with twice bigger size would have much stronger performance impact as missed cacheline0. - it is possible as there is 6-bytes gap in 1st cacheline (fast path) and moving uint16_t vlan_tci_outer field to 2nd cacheline. - such move will only impact for pretty rare usable VLAN RX stripping mode for outer TCI (it used only for one NIC i40e from the whole set and allows to keep minimal performance impact for RX/TX timestamps. Signed-off-by: Oleg Kuporosov --- lib/librte_mbuf/rte_mbuf.h | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h index 23b7bf8..1e1f2ed 100644 --- a/lib/librte_mbuf/rte_mbuf.h +++ b/lib/librte_mbuf/rte_mbuf.h @@ -851,8 +851,7 @@ struct rte_mbuf { uint32_t seqn; /**< Sequence number. See also rte_reorder_insert() */ - /** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ_STRIPPED is set. */ - uint16_t vlan_tci_outer; + uint64_t timestamp; /**< Packet's timestamp, usually in ns */ /* second cache line - fields only used in slow path or on TX */ MARKER cacheline1 __rte_cache_min_aligned; @@ -885,6 +884,9 @@ struct rte_mbuf { }; }; + /** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ_STRIPPED is set. */ + uint16_t vlan_tci_outer; + /** Size of the application private data. In case of an indirect * mbuf, it stores the direct mbuf private data size. */ uint16_t priv_size; -- 1.8.3.1