> > We have tested the effect of turbo mode on TSC and there is none.
> > The TSC frequency remains at the nominal clock speed, no matter if
> > the core is clocked down or up. So, I believe for PMD threads (where
> > performance matters) TSC would be an adequate and efficient clock.
>
> It's highly platform dependent and testing on a few systems doesn't
> guarantee anything.
> On the other hand, POSIX guarantees the monotonic characteristics for
> CLOCK_MONOTONIC.
TSC is also monotonic on a given core. Does CLOCK_MONOTONIC guarantee
any better accuracy than TSC for PMD threads?

> > On PMDs I am a bit concerned about the overhead/latency introduced
> > with the clock_gettime() system call, but I haven't done any
> > measurements to check the actual impact. Have you?
>
> Have you seen my incremental patches?
> There is no overhead, because we're just replacing 'time_msec' with
> 'time_usec'.
> No difference except converting timespec to usec instead of msec.

I did look at your incremental patches and we will test their
performance. I was already concerned about the system call cost on
master before. Perhaps I'm paranoid, but I would like to double-check
by testing.

> > If we go for CLOCK_MONOTONIC in microsecond resolution, we should
> > make sure that the clock is read not more than once every iteration
> > (and cache the us value as now in the pmd ctx struct as suggested in
> > your other patch). But then for consistency also the XPS feature
> > should use the PMD time in us resolution.
>
> Again, please, look at my incremental patches.

As far as I could see, you did not, for example, consistently adapt
tx_port->last_used to microsecond resolution.

> > For non-PMD thread we could actually skip time-based output batching
> > completely. The packet rates and the frequency of calls to
> > dpif_netdev_run() in the main ovs-vswitchd thread are so low that
> > time-based flushing doesn't seem to make much sense.

Have you considered this option?

> > Below you can find an alternative incremental patch on top of your
> > RFC 4/4 that uses TSC on PMD. We will be comparing the two
> > alternatives for performance both with non-PMD guests (iperf3) as
> > well as PMD guests (DPDK testpmd).
>
> In your version you need to move all the output_batching related code
> under #ifdef DPDK_NETDEV because it will break userspace networking if
> compiled without dpdk and output-max-latency != 0.

Not sure.
Batching should implicitly be disabled because cycles_counter() and
cycles_per_microsecond() would both return zero. But I agree that would
be a fairly cryptic design. If we used TSC in PMDs, we should explicitly
not do time-based tx batching on the non-PMD thread.

Anyway, if the cost of the clock_gettime() system call proves
insignificant and our performance tests comparing our TSC-based
implementation with your CLOCK_MONOTONIC-based one show equivalent
results, we can go for your approach.

BR, Jan
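
For reference, the per-call cost of clock_gettime(CLOCK_MONOTONIC) and
the timespec-to-microsecond conversion discussed above could be checked
with something like the rough sketch below. It is not taken from either
patch: timespec_to_usec(), monotonic_usec() and clock_gettime_cost_ns()
are hypothetical helper names, and the measurement is a plain average
over a tight loop, so it says nothing about behaviour inside a real PMD
loop.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper (not the OVS one): convert a timespec to
 * microseconds, truncating the sub-microsecond remainder. */
static inline int64_t
timespec_to_usec(const struct timespec *ts)
{
    return (int64_t) ts->tv_sec * 1000 * 1000 + ts->tv_nsec / 1000;
}

/* Hypothetical helper: read CLOCK_MONOTONIC once, in microseconds.
 * A PMD loop would call this at most once per iteration and cache
 * the result. */
static int64_t
monotonic_usec(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return timespec_to_usec(&ts);
}

/* Hypothetical helper: average cost in nanoseconds of one
 * clock_gettime(CLOCK_MONOTONIC) call, measured over 'n' calls. */
static double
clock_gettime_cost_ns(int n)
{
    struct timespec start, end, tmp;
    int64_t elapsed_ns;

    assert(n > 0);

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < n; i++) {
        clock_gettime(CLOCK_MONOTONIC, &tmp);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    elapsed_ns = (int64_t) (end.tv_sec - start.tv_sec) * 1000000000
                 + (end.tv_nsec - start.tv_nsec);
    return (double) elapsed_ns / n;
}
```

Calling clock_gettime_cost_ns() with a large n (say, a few million) and
printing the result gives a first-order number to compare against the
cost of a TSC read on the same machine. On Linux the call is often
served through the vDSO without entering the kernel, but that depends
on the platform and clocksource, so it still needs to be measured on
the target system.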