On Wed, 14 Feb 2007, Alan wrote:

> > > My comment is not very good, in fact on some cameras I need to swap the
> > > bytes to have correct JPEG data (so this is not an endianness issue I think).
> > > Maybe there is a macro to swap bytes in a buffer? I cannot find it.
> >
> > Sorry, there's a swab32, but no swab16. I misremembered.
>
> It's just called "swab" for 16bit values and is a gcc builtin/string
> function.
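Swapping the bytes of a whole buffer only takes a short loop. Below is a minimal userspace sketch, assuming the data is a byte buffer holding an even number of bytes. swab16_buf() and my_swab16() are hypothetical names used only for illustration; they are not kernel or libc APIs. The loop goes through memcpy() so it does not depend on the buffer being 2-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace stand-in for the kernel's swab16(). */
static inline uint16_t my_swab16(uint16_t x)
{
	return (uint16_t)((x << 8) | (x >> 8));
}

/*
 * Hypothetical helper: byte-swap each of the nwords 16-bit words in buf,
 * in place. memcpy() avoids unaligned 16-bit loads and stores.
 */
static void swab16_buf(void *buf, size_t nwords)
{
	unsigned char *p = buf;
	size_t i;

	for (i = 0; i < nwords; i++, p += 2) {
		uint16_t w;

		memcpy(&w, p, sizeof(w));
		w = my_swab16(w);
		memcpy(p, &w, sizeof(w));
	}
}
```

For example, calling swab16_buf(buf, 2) on the bytes {0x12, 0x34, 0x56, 0x78} leaves {0x34, 0x12, 0x78, 0x56}.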
The C library function swab() isn't usable in the kernel, as it's not part of
the kernel's C lib. Gcc doesn't have a builtin swab/bswap16 yet; maybe it will
someday:
http://gcc.gnu.org/ml/gcc-patches/2006-07/msg00496.html

The kernel does have swab64, swab32, and yes, swab16 macros! They're all
defined in the same place, in asm/byteorder.h. There are architecture-optimized
versions for some cases, but not for swab16 on x86, since gcc supposedly does
ok with it (or does it? *).

There are three versions of each swabXX function: a normal one that takes the
value, one that takes a pointer to the data, and one that swaps the data in
place. The more specialized versions might be faster in some cases. I don't see
any version that swaps an array of data, like the C library's swab(), which
would be a lot more useful than swab16 vs. swab16p vs. swab16s, IMHO.

Given a uint16_t *p:

	*p = swab16(*p);	// one way
	*p = swab16p(p);	// maybe better
	swab16s(p);		// best

>> +	/* swap to good indian if camera needs it */
>> +	if (cam->method == 0)
>> +		for (i = 0; i < BUFFER_SIZE; i += 2) {
>> +			swap = cam->buffer[i];
>> +			cam->buffer[i] = cam->buffer[i + 1];
>> +			cam->buffer[i + 1] = swap;
>> +		}

+	/* swap to good endian if camera needs it */
+	if (cam->method == 0)
+		for (i = 0; i < BUFFER_SIZE/2; i++) {
+			swab16s((uint16_t *)cam->buffer + i);
+		}

or

+	/* swap to good native american if camera needs it */
+	if (cam->method == 0) {
+		uint16_t *buf = (uint16_t *)cam->buffer;
+		for (i = 0; i < BUFFER_SIZE/2; i++)
+			swab16s(buf++);
+	}

* Does gcc really optimize swab16() well? I compiled this with gcc 4.0.1 for
athlon (using 2.6.20's compiler options):

void bar(uint16_t *p)
{
	int i;
	for (i = 0; i < 127; i++)
		swab16s(p + i);
}

The resulting asm code does not look that good to me. gcc does a copy, two
shifts, and then an OR to effect the swab16. Surely rotating a 16-bit register
would be faster? There shouldn't be any partial register stalls. I also don't
see why gcc decides to add two to the pointer, then offset it by -2 when it
uses it. What's the point of that?
bar:
	pushl	%ebx			#
	movl	$1, %ebx		# %ebx = i
	leal	2(%eax), %ecx		# %ecx = p+2, why add 2? just use eax
	.p2align 4,,7
.L21:
	movzwl	-2(%ecx), %eax		# have to offset by -2
	incl	%ebx			#
	movl	%eax, %edx		# do the swab16
	sall	$8, %edx		#
	shrl	$8, %eax		#
	orl	%eax, %edx		#
	movw	%dx, -2(%ecx)		#
	addl	$2, %ecx		# why not skip this and use (%ecx,%ebx,2)
	cmpl	$128, %ebx		# counting from -127...0 would avoid this
	jne	.L21
	popl	%ebx			# used too many registers
	ret

Anyway, surely this would be faster:

bar:
	movl	$-127, %ecx		# start at -127, count up to 0
	add	$254, %eax		# (p+127)[-127] == p[0]
	.p2align 4,,7
.L21:
	movzwl	(%eax,%ecx,2), %edx
	rorw	$8, %dx			# all that's needed for swab16
	movw	%dx, (%eax,%ecx,2)
	inc	%ecx
	jnz	.L21
	ret

Ok, the loop optimization is a little hard for gcc, but isn't it supposed to be
able to figure out "rorw $8, %reg"?
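The reason a single "rorw $8" can do the whole job is that byte-swapping a 16-bit value is exactly a rotate by 8 bits. A small userspace check of that, as a sketch: swab16_bytes() follows the mask-and-shift form of the kernel's generic swab16 fallback, and rotr16() is an ordinary 16-bit rotate; both names are hypothetical. The function compares the two over all 65536 inputs.

```c
#include <stdint.h>

/* Mask-and-shift byte swap, in the style of the generic swab16 macro. */
static uint16_t swab16_bytes(uint16_t x)
{
	return (uint16_t)(((x & 0x00ffU) << 8) | ((x & 0xff00U) >> 8));
}

/* Plain 16-bit rotate right by n bits, valid for 0 < n < 16. */
static uint16_t rotr16(uint16_t x, unsigned int n)
{
	return (uint16_t)((x >> n) | (x << (16 - n)));
}

/* Returns 1 if swab16_bytes(x) == rotr16(x, 8) for every 16-bit x. */
static int swab_is_rotate_by_8(void)
{
	uint32_t x;

	for (x = 0; x <= 0xffffU; x++)
		if (swab16_bytes((uint16_t)x) != rotr16((uint16_t)x, 8))
			return 0;
	return 1;
}
```

swab_is_rotate_by_8() returns 1. Whether a particular gcc version actually turns either expression into a rorw/rolw instruction is a separate question that only the generated asm can answer.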