The checksum routine should really be written in assembly. In assembly you can take advantage of the carry flag, which C gives you no direct access to.
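For comparison, C code usually works around the missing carry flag by summing into a wider accumulator and folding the overflow back in afterwards. The following is only a minimal sketch under stated assumptions: 16-bit words in host byte order, and a buffer short enough that the 32-bit accumulator cannot overflow. `sum16_fold` is a made-up name for illustration, not lwIP's lwip_chksum().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of lwIP): one's-complement sum of
 * nwords 16-bit words. The carries that assembly would pick up with
 * "add with carry" collect in the upper half of acc instead.
 * Assumes nwords <= 65536 so the 32-bit accumulator cannot overflow. */
static uint16_t sum16_fold(const uint16_t *data, size_t nwords)
{
    uint32_t acc = 0;

    while (nwords--) {
        acc += *data++;
    }

    /* Fold the deferred carries back into the low 16 bits. */
    while (acc >> 16) {
        acc = (acc & 0xFFFFu) + (acc >> 16);
    }
    return (uint16_t)acc;
}
```

The final inversion of the sum and any byte-order conversion are left to the caller here.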
A very efficient assembly version first loads a big chunk of data into registers using a "load multiple" instruction, then adds all the 16- or 32-bit registers using an "add with carry" instruction, looping as many times as necessary. Processors with 32-bit "add with carry" instructions can compute the checksum very quickly this way, but even 16-bit "add with carry" instructions yield good results.

If you are looking for other things to optimise: make sure routines such as memcopy and setmem are performed using either DMA or "load/store multiple" assembly instructions.

/Timmy Brolin

-----Original Message-----
From: "Ashutosh Srivastava" <[EMAIL PROTECTED]>
To: "Mailing list for lwIP users" <[email protected]>
Date: Tue, 15 Nov 2005 12:26:13 +0530
Subject: Re: [lwip-users] lwIP Checksum routine

Thanks for this optimization info. I have already started coding the checksum computation in my processor's assembly. Can anyone suggest any other critical part of lwIP that gives a performance improvement when optimized in assembly?

Thanks,
Ashutosh

----- Original Message -----
From: Jim Gibbons
To: Mailing list for lwIP users
Sent: Tuesday, November 15, 2005 4:52 AM
Subject: Re: [lwip-users] lwIP Checksum routine

We did an optimization for one port (NiosII). This is very CPU dependent. In our particular case, we did better with 16-bit accesses owing to a slow shifter. We did best by handling 8 half-words in one pass of an outer loop. This allowed us to use small constant offsets that could be encoded in the load instructions, e.g., acc += data[0]; acc += data[1]; etc. The loop overhead and the pointer update (data += 8) became a much smaller fraction of the CPU time taken.

But, as I said, this stuff is very CPU dependent. Considering that, I think that the core code is as it should be. It's a simple thing to change for your particular CPU, so I would urge you to do so.
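The unrolled loop described above might look roughly like the following. This is a sketch only, not the actual NiosII port code: the function name `sum16_unrolled`, the 32-bit accumulator, and the tail handling are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch (not the NiosII port): sums 16-bit words,
 * 8 per pass of the outer loop, so each load can use a small
 * constant offset and the pointer is updated once per pass.
 * Assumes nwords <= 65536 so acc cannot overflow. */
static uint16_t sum16_unrolled(const uint16_t *data, size_t nwords)
{
    uint32_t acc = 0;

    while (nwords >= 8) {
        acc += data[0];
        acc += data[1];
        acc += data[2];
        acc += data[3];
        acc += data[4];
        acc += data[5];
        acc += data[6];
        acc += data[7];
        data += 8;
        nwords -= 8;
    }

    /* Remaining 0..7 half-words. */
    while (nwords--) {
        acc += *data++;
    }

    /* Fold the accumulated carries into the low 16 bits. */
    while (acc >> 16) {
        acc = (acc & 0xFFFFu) + (acc >> 16);
    }
    return (uint16_t)acc;
}
```

Whether 8 is the right unroll factor, or whether half-word loads beat word loads at all, would need to be measured on the target CPU.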
I would also urge you to try a couple of different things and measure your results. We were surprised when we found that full word accesses weren't good for us, and you may find some surprising things with your CPU.

You might also want to check your ethernet chip. Some of the newer ones can assist you at the time of transmission.

Good luck!

Sathya Thammanur wrote:

Hi all,

The lwip_chksum() function in lwip/src/core/inet.c seems to be unoptimized. It does halfword reads and additions. Wouldn't it be better to do word accesses, and hence word additions? There would be some prologue and epilogue code to bring the buffer from a halfword boundary to a word boundary. Has anyone tried doing this for any of their ports? Or am I missing something here?

Thanks,
Sathya

_______________________________________________
lwip-users mailing list
[email protected]
http://lists.nongnu.org/mailman/listinfo/lwip-users

--
Jim Gibbons
[EMAIL PROTECTED]
Gibbons and Associates, Inc.               TEL: (408) 984-1441
900 Lafayette, Suite 704, Santa Clara, CA  FAX: (408) 247-6395
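For reference, the word-access idea from Sathya's original question, with a prologue and epilogue around a 32-bit main loop, could be sketched as below. This is an illustration under assumptions, not lwIP code: `sum16_by_words` is a made-up name, the buffer is assumed to be at least 2-byte aligned, and memcpy is used for the loads so the sketch stays within the C aliasing rules (compilers typically turn these into plain loads).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch (not lwIP's lwip_chksum()): one's-complement sum
 * of nhalf 16-bit half-words, read mostly as 32-bit words.
 * Assumes buf is at least 2-byte aligned. */
static uint16_t sum16_by_words(const void *buf, size_t nhalf)
{
    const unsigned char *p = buf;
    uint64_t acc = 0;
    uint16_t h;
    uint32_t w;

    /* Prologue: consume one half-word to reach a 4-byte boundary. */
    if (nhalf > 0 && ((uintptr_t)p & 3u) != 0) {
        memcpy(&h, p, sizeof h);
        acc += h;
        p += sizeof h;
        nhalf--;
    }

    /* Main loop: one 32-bit load covers two half-words. */
    while (nhalf >= 2) {
        memcpy(&w, p, sizeof w);
        acc += w;
        p += sizeof w;
        nhalf -= 2;
    }

    /* Epilogue: at most one trailing half-word. */
    if (nhalf) {
        memcpy(&h, p, sizeof h);
        acc += h;
    }

    /* Fold the 64-bit total down to 16 bits. */
    while (acc >> 16) {
        acc = (acc & 0xFFFFu) + (acc >> 16);
    }
    return (uint16_t)acc;
}
```

As Jim notes in the thread, whether this beats plain half-word accesses depends on the CPU, so it is worth measuring both.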
