ppisa commented on code in PR #17360:
URL: https://github.com/apache/nuttx/pull/17360#discussion_r2550306318
##########
drivers/can/can.c:
##########
@@ -702,6 +717,13 @@ static ssize_t can_write(FAR struct file *filep, FAR const
char *buffer,
/* Increment the number of bytes that were sent */
nsent += msglen;
+
+ if (msgalign > 1)
+ {
+ nsent = powerof2(msgalign)
Review Comment:
Even on powerful platforms as is x86, the integer division has significant
latency, really significant.
For Zen5
([https://www.agner.org/optimize/instruction_tables.pdf](https://www.agner.org/optimize/instruction_tables.pdf))
you read for 32-bit division (our case) if we do not switch to size_t which
would be 64-bit and even worse, you read
|Instruction |Operands |Ops |Latency |Reciprocal throughput|
|
-----------------|---------------|-------|---------|-------------------------------|
|DIV | r32/m32 |2 | 11-14 | 6
|
|IDIV | r32/m32 |2 | 11-14 | 6
|
This means that the latency for the availability of result is up to 14
cycles and only the single division can be issued once per 6 cycles. If you
compare it with 64-bit multiplication which has 3 cycles latency and can be
started once per cycle or even shifts (SHL, SHR, SAR) which has latency 1 cycle
and two can be started per cycle then the division is horrible. The misdirected
branch has penalty equivalent up to around 100 sequentially executed simple
instructions but it would be kept in branch predictor for this case and the
latency is masked. So I think that the check for power of two and branch should
be there.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]