On Mon, 16 Feb 2015 22:44:31 +0000 "Dumitrescu, Cristian" <cristian.dumitrescu at intel.com> wrote:
> Hi Stephen, > > Sorry, NACK. > > 1. Overflow issue > As you declare cycles_per_byte as uint32_t, for a CPU frequency of 2-3 GHz, > the line of code below results in overflow: > port->cycles_per_byte = (rte_get_tsc_hz() << RTE_SCHED_TIME_SHIFT) / > params->rate; > Therefore, there is most likely a significant accuracy loss, which might > result in more packets allowed to go out than it should. The tsc shifted is still 64 bits. and rate is 32 bits bytes/sec. I chose scale such that if clock = 3 Ghz then min rate = 715 bytes/sec = 5722 bits/sec > 2. Integer division has a higher cost than floating point division > My understanding is we are considering a performance improvement by replacing > the double precision floating point division in: > double bytes_diff = ((double) cycles_diff) / port->cycles_per_byte; > with an integer division: > uint64_t bytes_diff = (cycles_diff << RTE_SCHED_TIME_SHIFT) / > port->cycles_per_byte; > I don't think this is going to have the claimed benefit, as acording to > "Intel 64 and IA-32 Architectures Optimization Reference Manual" (Appendix > C), the latency of the integer division instruction is significantly bigger > than the latency of integer division: > Instruction FDIV double precision: latency = 38-40 cycles > Instruction IDIV: latency = 56 - 80 cycles I observed that performance when from 5Gbit/sec to 10Gbit/sec. Mostly because the floating point engages more instruction units and does not pipeline. Cycle count is not everything. This was on Ivy Bridge processor. > 3. Alternative > I hear though your suggestion about replacing the floating point division > with a more performant construction. One suggestion would be to replace it > with an integer multiplication followed by a shift right, probably by using a > uint64_t bytes_per_cycle_scaled_up (the inverse of cycles_per_bytes). I need > to prototype this code myself. Would you be OK to look into providing an > alternative implementation? > I looked into multiplative integer method, and will do it in future. But it has more scaling issues since it would require that the values both be 32 bits.