Looking around in the code, I notice that the implementation of
ddi_swap32() is fairly inefficient. In particular, it uses macros that
swap the bytes around with shift operations.
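
For reference, a shift-based swap of the kind described above looks
roughly like the sketch below. This is not the actual ddi_swap32()
source; the names SWAP32_SHIFT and swap32_shift are made up for
illustration.

```c
#include <stdint.h>

/*
 * Hypothetical shift-and-mask byte swap (illustrative only, not the
 * real ddi_swap32() implementation). Each byte is masked out and
 * shifted to its mirrored position, then the four pieces are OR'ed.
 */
#define	SWAP32_SHIFT(x) \
	((((uint32_t)(x) & 0x000000ffU) << 24) | \
	(((uint32_t)(x) & 0x0000ff00U) << 8) | \
	(((uint32_t)(x) & 0x00ff0000U) >> 8) | \
	(((uint32_t)(x) & 0xff000000U) >> 24))

/* Function wrapper so the argument is evaluated exactly once. */
static inline uint32_t
swap32_shift(uint32_t x)
{
	return (SWAP32_SHIFT(x));
}
```

Unless the compiler recognizes the pattern, this costs several shift,
mask, and OR operations per swap.
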
By contrast, htonl() on i386 systems is implemented using the native
x86 bswap instruction. (This is true for both the kernel and libc.)
A little bit of testing *on my system* shows that the bswap
implementation is considerably faster: a short test loop runs almost
twice as fast using htonl() as it does using ddi_swap32().
On UltraSPARC systems, there are also some UltraSPARC-specific
extensions which add little-endian direct access, potentially saving a
lot of time. (I'm thinking of code that does PIOs into little-endian
PCI devices, for example. Endian swapping of data such as audio data
is another good example.) It appears that the little-endian ASIs are
not available on generic V9, but only on the UltraSPARC variants. (Not
sure whether Niagara family CPUs have them or not.)
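
To make the cost concrete, here is a portable sketch of what a
little-endian access has to do on a big-endian CPU without such an
extension. The name le32_load is hypothetical, and this only
illustrates the semantics on an in-memory buffer (audio samples, say);
it is not how a driver would actually issue a PIO.

```c
#include <stdint.h>

/*
 * Hypothetical helper: read a 32-bit little-endian value from a byte
 * buffer, independent of host byte order. A little-endian load
 * performed by the hardware would produce the same result in a single
 * access instead of four byte loads plus shifts.
 */
static inline uint32_t
le32_load(const volatile uint8_t *p)
{
	return ((uint32_t)p[0] |
	    ((uint32_t)p[1] << 8) |
	    ((uint32_t)p[2] << 16) |
	    ((uint32_t)p[3] << 24));
}
```
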
What do folks think about replacing ddi_swap16/32 with
processor-specific variants that could use a much more efficient CPU
instruction?
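
As a rough sketch of what I have in mind (the names swap32_generic,
swap32_fast and swap16_fast are placeholders, and the inline assembly
assumes a GCC-compatible compiler; the real thing might well live in
per-platform assembly files instead):

```c
#include <stdint.h>

/* Portable fallback: plain shifts and masks. */
static inline uint32_t
swap32_generic(uint32_t x)
{
	return ((x << 24) | ((x & 0x0000ff00U) << 8) |
	    ((x >> 8) & 0x0000ff00U) | (x >> 24));
}

/*
 * Processor-specific variant: on x86 with a GCC-compatible compiler,
 * use the bswap instruction; everywhere else, fall back to the
 * generic version.
 */
static inline uint32_t
swap32_fast(uint32_t x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
	__asm__("bswap %0" : "+r" (x));
	return (x);
#else
	return (swap32_generic(x));
#endif
}

/* 16-bit swap: exchange the two bytes. */
static inline uint16_t
swap16_fast(uint16_t x)
{
	return ((uint16_t)((x << 8) | (x >> 8)));
}
```

Callers would see the same results either way; only the instruction
sequence underneath changes.
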
- Garrett