[dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet
Hello Konstantin, > > My vote also would be to have timestamp in the second cache line. > About moving seqn to the 2-nd cache line too - that's probably a fair point. It may impact throughput till ~6% for applications required such embedded Timestamps. > About the rest of the patch: > Do you really need to put that code into the PMDs itself? > Can't the same result be achieved by using RX callbacks? > Again that approach would work with any PMD and there would be no need > to modify PMD code itself. Correct, the approach with using callbacs (rte_eth_timesync_read_[r|t]x_timestamp()) Has also some Cons for this use case: - FSI needs the most accurate stamping as possible by reasons were described in Cover letter - callback will be called from user app and so you have to take into account Difference between time when packet was released by NIC and callback call - such difference is not easy to estimate correctly due to dependency on CPU type, Its frequency and current load conditions - even estimated it would be hard without additional performance penalty to align Packet with timestamp, taking account that call may actually placed from Different thread or even process. It looks the least impacting and correct way is to have timestamp in rte_mbuf and fill It in Rx burst procedure. > Another thing, that I am in doubt why to use system time? > Wouldn't raw HW TSC value (rte_rdtsc()) do here? System time is being used for periodic clock synchronization between wall clock and NIC to estimate NIC clock deviation. It is in assumption the system itself is synchronized by PTP from master clock. It is run on context of control thread. Thanks, Oleg.
[dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet
Hello Reshma > I just read this mail chain, to make every one aware again, I am emphasizing > on the point that I am also adding new "time_arraived" field to mbuf struct as > part of below 17.02 Road map item. Thanks for your work for extending statistics support in DPDK. "time_arrived" points here for mostly RX path, while more common term used "timestamp" Allows also using it for TX path as well in the future. Best regards, Oleg.
[dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet
Hello Oliver Great thanks for your review and con > > - include uint64_t timestamp field into rte_mbuf with minimal impact to > > throughput/latency. Keep it just simple uint64_t in ns (more than 580 > > years) would be enough for immediate needs while using full > > struct timespec with twice bigger size would have much stronger > > performance impact as missed cacheline0. > > > > - it is possible as there is 6-bytes gap in 1st cacheline (fast path) > > and moving uint16_t vlan_tci_outer field to 2nd cacheline. > > > > - such move will only impact for pretty rare usable VLAN RX stripping > > mode for outer TCI (it used only for one NIC i40e from the whole set and > > allows to keep minimal performance impact for RX/TX timestamps. > > This argument is difficult to accept. One can say you are adding a field for a > pretty rare case used by only one NIC :) It may be looks so, but in fact not only for one NIC. Absence of timestamp there Required from developers implement its support in out of DPDK data path with Local DPDK patching which also may lead some penalty in accuracy. Good example here is open source implementation of tracing library - https://github.com/wanduow/libtrace/tree/libtrace4 These folks patched DPDK to have timestamp for every packet (Intel DPDK Patches folder) And used it in out of band to store and later processing (lib/format_dpdk.c). That was actually the starting point of my investigations. Such approach is working for 1GBb case but not for 10-100 cases. > > Honestly, I'm not able to judge whether timestamp is more important than > vlan_tci_outer. As room is tight in the first cache line, your patch > submission > is the occasion to raise the question: how to decide what should be in the > first part of the mbuf? There are also some other candidates for moving: m- > >seqn is only used in librte_reorder and it is not set in the RX part of a > >driver. Agree, it is difficult to decide, my thoughts were: - there is hole (6 bytes) which wasn't marked as reserved for any planned extensions; -vlan_tci_outer is being used by only one NIC (i40e) so far, 2nd NIC (fm10k) is using it With comment: * mbuf->vlan_tci_outer is an idle field in fm10k driver, * so it can be selected to store sglort value. To store some another value under some specific "if". Also for i40e it is under #ifndef RTE_LIBRTE_I40E_16BYTE_RX_DESC which per doc should be Enabled for high throughput of small packets. So in default case (disabled) it anyway has some performance penalty with using 32 bytes descriptor and moving it to 2nd CL would not hit big additional penalty. Unfortunately I have no such NIC to measure. Is there any data how often double tagging in being used in DPDK applications? Another my thought was to have at the end of 1st CL enum which may hold Reserved fields per specific use cases and data widths (uint8, 2xuint4, 4xuint2, 8xbytes). > > About the timestamp, it would be valuable to have other opinions, not only > about the placement of the field in the structure, but also to check that this > API is also usable for other NICs. Sure, but I didn't change timesync/timestamping API itself. > Have you measured the impact of having the timestamp in the second part > of the mbuf? Yes, the worst case with prefetching of 2nd CL it is 5.7-5.9 % penalty for combined RX+TX And expectedly much worse without prefetching. In the best case it is 0.3..0.5 % for RX only. It can be explained by much harder cache trashing when TX is "on". > Changing the mbuf structure should happen as rarely as possible, and we > have to make sure we take the correct decisions. I think we will discuss this > at > dpdk userland 2016. Oh, yes, please discuss, I would not be able to join. :( > Apart from that, I wonder if an ol_flag should be added to tell that the > timestamp field is valid in the mbuf. Oliver, there is PKT_RX_IEEE1588_TMST flag already. Best regards, Oleg.
[dpdk-dev] [PATCH 3/3] net/mlx5: implementation of Rx packet timestamping support
Accurate RX timestaping is minimal requirement for several important financial services industry requirement for several applications like packet capture. Support of periodic time synchronization with system time and adjusting clock deviation was also added. We assume the system is synchronized by PTP itself, so there is no links to PTP client implementation in DPDK. Time synchronization is run by control thread by rte_alarm for each 5 seconds. New synchronization object is copied to each configured RXQ for TS calculation. RX timestamp is calculated in nanoseconds and stored in rte_mbuf. RX timestamp doesn't require additional API as simply can be read from timestamp field of rte_mbuf. TX timestamps were not added due to several reasons - there is no API yet, issue with burst mode as TS will be provided only for the latest packet in the burst and rte_mbuf will be discarded immediately after completion. Enabling/disabling time synchronization DPDK API was added. Signed-off-by: Oleg Kuporosov --- drivers/net/mlx5/mlx5.c | 7 +- drivers/net/mlx5/mlx5.h | 10 +- drivers/net/mlx5/mlx5_defs.h | 4 + drivers/net/mlx5/mlx5_ethdev.c | 222 ++- drivers/net/mlx5/mlx5_rxq.c | 2 + drivers/net/mlx5/mlx5_rxtx.c | 19 ++- drivers/net/mlx5/mlx5_rxtx.h | 7 +- drivers/net/mlx5/mlx5_time.h | 53 drivers/net/mlx5/mlx5_trigger.c | 1 + lib/librte_eal/common/include/rte_time.h | 45 +++ 10 files changed, 360 insertions(+), 10 deletions(-) create mode 100644 drivers/net/mlx5/mlx5_time.h diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index 83bdf65..64dab28 100644 --- a/drivers/net/mlx5/mlx5.c +++ b/drivers/net/mlx5/mlx5.c @@ -1,8 +1,8 @@ /*- * BSD LICENSE * - * Copyright 2015 6WIND S.A. - * Copyright 2015 Mellanox. + * Copyright 2015-2016 6WIND S.A. + * Copyright 2015-2016 Mellanox. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions @@ -122,6 +122,7 @@ mlx5_dev_close(struct rte_eth_dev *dev) (void *)dev, ((priv->ctx != NULL) ? priv->ctx->device->name : "")); /* In case mlx5_dev_stop() has not been called. */ + mlx5_timesync_disable(dev); priv_dev_interrupt_handler_uninstall(priv, dev); priv_special_flow_disable_all(priv); priv_mac_addrs_disable(priv); @@ -219,6 +220,8 @@ static const struct eth_dev_ops mlx5_dev_ops = { .rss_hash_update = mlx5_rss_hash_update, .rss_hash_conf_get = mlx5_rss_hash_conf_get, .filter_ctrl = mlx5_dev_filter_ctrl, + .timesync_enable = mlx5_timesync_enable, + .timesync_disable = mlx5_timesync_disable, }; static struct { diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index d4fb5ff..49447c4 100644 --- a/drivers/net/mlx5/mlx5.h +++ b/drivers/net/mlx5/mlx5.h @@ -1,8 +1,8 @@ /*- * BSD LICENSE * - * Copyright 2015 6WIND S.A. - * Copyright 2015 Mellanox. + * Copyright 2015-2016 6WIND S.A. + * Copyright 2015-2016 Mellanox. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions @@ -54,6 +54,7 @@ #ifdef PEDANTIC #pragma GCC diagnostic ignored "-pedantic" #endif +#include #include #include #include @@ -64,6 +65,7 @@ #endif #include "mlx5_utils.h" +#include "mlx5_time.h" #include "mlx5_rxtx.h" #include "mlx5_autoconf.h" #include "mlx5_defs.h" @@ -113,6 +115,7 @@ struct priv { unsigned int mps:1; /* Whether multi-packet send is supported. */ unsigned int cqe_comp:1; /* Whether CQE compression is enabled. */ unsigned int pending_alarm:1; /* An alarm is pending. */ + unsigned int timesync_en:1; /* Timesync (timestamping) enabled */ unsigned int txq_inline; /* Maximum packet size for inlining. */ unsigned int txqs_inline; /* Queue number threshold for inlining. */ /* RX/TX queues. */ @@ -120,6 +123,7 @@ struct priv { unsigned int txqs_n; /* TX queues array size. */ struct rxq *(*rxqs)[]; /* RX queues. */ struct txq *(*txqs)[]; /* TX queues. */ + struct mlx5_timesync timesync; /* time synronization object */ /* Indirection tables referencing all RX WQs. */ struct ibv_exp_rwq_ind_table *(*ind_tables)[]; unsigned int ind_tables_n; /* Number of indirection tables. */ @@ -203,6 +207,8 @@ int mlx5_set_link_up(struct rte_eth_dev *dev); struct priv *mlx5_secondary_data_setup(struct priv *priv); void priv_select_tx_function(struct priv *); void priv_select_rx_function(struct priv *); +int mlx5_timesync_enable(struct rte_eth_dev *dev); +int mlx5_timesync_disable(struct rte_eth_dev *dev)
[dpdk-dev] [PATCH 2/3] app/testpmd: enabled control for packet timestamps
Implemented two methods of control - by --enable-timestamps CL testpmd application we can enable timestamping for all ports; - in interactive mode port config timestamps on|off is able to configure timestamping per specific port. The control doesn't interact with IEEE1588 PTP implementation there as it is under macro compilation but can be extended in the future. This feature is required for debugging/testing purposes for real time HW packet timestamping. Signed-off-by: Oleg Kuporosov --- app/test-pmd/cmdline.c | 122 app/test-pmd/parameters.c | 4 + app/test-pmd/testpmd.c | 5 ++ app/test-pmd/testpmd.h | 1 + doc/guides/testpmd_app_ug/run_app.rst | 4 + doc/guides/testpmd_app_ug/testpmd_funcs.rst | 7 ++ 6 files changed, 143 insertions(+) diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index 17d238f..9b202ce 100644 --- a/app/test-pmd/cmdline.c +++ b/app/test-pmd/cmdline.c @@ -561,6 +561,9 @@ static void cmd_help_long_parsed(void *parsed_result, "Set crc-strip/scatter/rx-checksum/hardware-vlan/drop_en" " for ports.\n\n" + "port config (port_id|all) timestamps (on|off)\n" + "Enable/disable packet timestamps.\n\n" + "port config all rss (all|ip|tcp|udp|sctp|ether|port|vxlan|geneve|nvgre|none)\n" "Set the RSS mode.\n\n" @@ -10188,6 +10191,123 @@ cmdline_parse_inst_t cmd_config_l2_tunnel_en_dis_specific = { }, }; +/* Configure port timestamping */ +struct cmd_config_timestamps_result { + cmdline_fixed_string_t port; + cmdline_fixed_string_t config; + cmdline_fixed_string_t all; + uint8_t id; + cmdline_fixed_string_t timestamps; + cmdline_fixed_string_t mode; +}; + +cmdline_parse_token_string_t cmd_config_timestamps_port = + TOKEN_STRING_INITIALIZER + (struct cmd_config_timestamps_result, +port, "port"); +cmdline_parse_token_string_t cmd_config_timestamps_config = + TOKEN_STRING_INITIALIZER + (struct cmd_config_timestamps_result, +config, "config"); +cmdline_parse_token_string_t cmd_config_timestamps_all_str = + TOKEN_STRING_INITIALIZER + (struct cmd_config_timestamps_result, +all, "all"); +cmdline_parse_token_num_t cmd_config_timestamps_id = + TOKEN_NUM_INITIALIZER + (struct cmd_config_timestamps_result, +id, UINT8); +cmdline_parse_token_string_t cmd_config_timestamps_ts_str = + TOKEN_STRING_INITIALIZER + (struct cmd_config_timestamps_result, + timestamps, "timestamps"); +cmdline_parse_token_string_t cmd_config_timestamps_path = + TOKEN_STRING_INITIALIZER + (struct cmd_config_timestamps_result, + mode, "on#off"); + +/* enable/disable timestamps (on/off) for a port */ +static void +cmd_config_timestamps_all_parsed( + void *parsed_result, + __attribute__((unused)) struct cmdline *cl, + __attribute__((unused)) void *data) +{ + struct cmd_config_timestamps_result *res = parsed_result; + portid_t pid; + uint8_t mode = 0; + + if (!strcmp("on", res->mode)) + mode = 1; + else if (!strcmp("off", res->mode)) + mode = 0; + else { + printf("Unknown timestamps mode parameter\n"); + return; + } + FOREACH_PORT(pid, ports) { + if (mode) + rte_eth_timesync_enable(pid); + else + rte_eth_timesync_disable(pid); + } +} + +cmdline_parse_inst_t cmd_config_timestamps_all = { + .f = cmd_config_timestamps_all_parsed, + .data = NULL, + .help_str = "port config all timestamps on|off", + .tokens = { + (void *)_config_timestamps_port, + (void *)_config_timestamps_config, + (void *)_config_timestamps_all_str, + (void *)_config_timestamps_ts_str, + (void *)_config_timestamps_path, + NULL, + }, +}; + +/* enable/disable timestamps (rx/tx/both) for a port */ +static void +cmd_config_timestamps_specific_parsed( + void *parsed_result, + __attribute__((unused)) struct cmdline *cl, + __attribute__((unused)) void *data) +{ + struct cmd_config_timestamps_result *res = + parsed_result; + uint8_t mode = 0; + + if (port_id_is_invalid(res->id, ENABLED_WARN)) + return; + if (!strcmp("on", res->mode)) + mode = 1; + else if (!strcmp(&
[dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet
The hard requirement of financial services industry is accurate timestamping aligned with the packet itself. This patch is to satisfy this requirement: - include uint64_t timestamp field into rte_mbuf with minimal impact to throughput/latency. Keep it just simple uint64_t in ns (more than 580 years) would be enough for immediate needs while using full struct timespec with twice bigger size would have much stronger performance impact as missed cacheline0. - it is possible as there is 6-bytes gap in 1st cacheline (fast path) and moving uint16_t vlan_tci_outer field to 2nd cacheline. - such move will only impact for pretty rare usable VLAN RX stripping mode for outer TCI (it used only for one NIC i40e from the whole set and allows to keep minimal performance impact for RX/TX timestamps. Signed-off-by: Oleg Kuporosov --- lib/librte_mbuf/rte_mbuf.h | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h index 23b7bf8..1e1f2ed 100644 --- a/lib/librte_mbuf/rte_mbuf.h +++ b/lib/librte_mbuf/rte_mbuf.h @@ -851,8 +851,7 @@ struct rte_mbuf { uint32_t seqn; /**< Sequence number. See also rte_reorder_insert() */ - /** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ_STRIPPED is set. */ - uint16_t vlan_tci_outer; + uint64_t timestamp; /**< Packet's timestamp, usually in ns */ /* second cache line - fields only used in slow path or on TX */ MARKER cacheline1 __rte_cache_min_aligned; @@ -885,6 +884,9 @@ struct rte_mbuf { }; }; + /** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ_STRIPPED is set. */ + uint16_t vlan_tci_outer; + /** Size of the application private data. In case of an indirect * mbuf, it stores the direct mbuf private data size. */ uint16_t priv_size; -- 1.8.3.1
[dpdk-dev] [PATCH 0/3] Improvements in packet timestamps support
Hello DPDK Developers, Financial Services Industry which is pretty eager for several DPDK features especially low latency while high throughput. The major issue so far for increasing DPDK adoption there is requirement for several applications (capturers, some trading algorithms) to support of accurate timestamping. The requirement mostly came from regulatory and customers need strictly follow it. Current state of timestamping support in DPDK looks pretty good: - there is API to enable/disable timestamping acquisition by rte_eth_timesync_enable/disable - get timestamps itself by timesync_read_rx/tx_timestamp - and even implementation of NTP IEEE 1588 for time synchronization by rte_eth_timesync_adjust_read/write_time APIs. But it misses the most important feature there ? embedding timestamp in rte_mbuf aligning it with packet. We would like to change this to increase DPDK adoption for several new DPDK-based applications for FSI segment. It also might be very applicable for several RT media and debugging purposes of network flows/streams in other segments like HPC. There are several thoughts how to improve there: - include uint64_t timestamp field into rte_mbuf with minimal impact to throughput/latency. Keep it just simple uint64_t in ns (more than 580 years) would be enough for immediate needs while using full struct timespec with twice bigger size would have much stronger performance impact as missed cacheline0. It is possible as there is 6-bytes gap in 1st cacheline (fast path) and moving uint16_t vlan_tci_outer field to 2nd cacheline. - such move will only impact for pretty rare usable VLAN RX stripping mode for outer TCI (it used only for one NIC i40e from the whole set and keep minimal performance impact for timestamps. - PMD can fill the field in their callback completion routines depending on enabling this feature in runtime. We evaluated other possible options but looks it will have even worse performance impact. Oleg Kuporosov (3): mbuf: embedding timestamp into the packet app/testpmd: enabled control for packet timestamps net/mlx5: implementation of Rx packet timestamping support app/test-pmd/cmdline.c | 122 +++ app/test-pmd/parameters.c | 4 + app/test-pmd/testpmd.c | 5 + app/test-pmd/testpmd.h | 1 + doc/guides/testpmd_app_ug/run_app.rst | 4 + doc/guides/testpmd_app_ug/testpmd_funcs.rst | 7 + drivers/net/mlx5/mlx5.c | 7 +- drivers/net/mlx5/mlx5.h | 10 +- drivers/net/mlx5/mlx5_defs.h| 4 + drivers/net/mlx5/mlx5_ethdev.c | 222 +++- drivers/net/mlx5/mlx5_rxq.c | 2 + drivers/net/mlx5/mlx5_rxtx.c| 19 ++- drivers/net/mlx5/mlx5_rxtx.h| 7 +- drivers/net/mlx5/mlx5_time.h| 53 +++ drivers/net/mlx5/mlx5_trigger.c | 1 + lib/librte_eal/common/include/rte_time.h| 45 ++ lib/librte_mbuf/rte_mbuf.h | 6 +- 17 files changed, 507 insertions(+), 12 deletions(-) create mode 100644 drivers/net/mlx5/mlx5_time.h Thanks, Oleg. -- 1.8.3.1