Hi all,

I have tested memcpy() and __copy_tofrom_user() on the MPC8313. The main conclusion is that __copy_tofrom_user() is more efficient than memcpy(), sometimes by about 40%.
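
As a quick sanity check on the "about 40%" figure, here is a minimal user-space sketch (not part of the kernel module) that recomputes throughput and relative gain from the size and time columns of the results further down in this message. The helper names throughput() and gain_percent() are made up for this illustration.

```c
#include <assert.h>

/* Throughput in MB/s for `mbytes` megabytes moved in `seconds` seconds. */
static double throughput(double mbytes, double seconds)
{
    assert(seconds > 0.0);
    return mbytes / seconds;
}

/* Relative gain of rate `b` over rate `a`, in percent. */
static double gain_percent(double a, double b)
{
    assert(a > 0.0);
    return (b / a - 1.0) * 100.0;
}
```

Feeding in the 64KB rows (128 MB in 1.849727 s for memcpy(), 1.315155 s for __copy_tofrom_user()) gives roughly 69.2 and 97.3 MB/s, a gain of about 40.6%. At 8B chunks the sign flips: memcpy() is the faster one there (47.4 vs 41.5 MB/s).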

If I understand correctly, memcpy() just copies the data, while __copy_tofrom_user() also has to handle the case where the memory has been swapped out. So memcpy() should be faster than __copy_tofrom_user(). Am I right? Can anybody here confirm these results, and perhaps improve memcpy()?

Let's talk about the test. I prepared two 64KB buffers and made sure that this memory is not swapped out (necessary for memcpy() later). Then I run one of the copy functions to transfer 32MB and measure the time. The memory is copied in chunks ranging from 64KB down to 8B. I take care of the cache by calling flush_dcache_range() whenever the whole 64KB has been used.

I know that memcpy() at the kernel level is not intended for copying memory blocks in userspace, and that __copy_tofrom_user() is not intended for copying data purely between two user blocks, but for a performance test this doesn't matter.

Below is the relevant piece of code from the kernel module.

    #define TEST_BUF_SIZE (64*1024)

    int function;
    char *buf1, *buf2, *buf1_bis, *buf2_bis;
    unsigned int size, cnt;
    unsigned long ret;    /* bytes left uncopied by __copy_tofrom_user() */

    get_user(function, &((TEST_ARG*)(arg))->function);
    get_user(buf1, &((TEST_ARG*)(arg))->buf1);
    get_user(buf2, &((TEST_ARG*)(arg))->buf2);
    get_user(size, &((TEST_ARG*)(arg))->size);

    /* how many copies are needed to transfer 32MB? */
    cnt = (32*1024*1024)/size;
    buf1_bis = buf1;
    buf2_bis = buf2;

    switch (function) {
    case MEMCPY_TEST:
        while (cnt-- > 0) {
            if (buf1_bis >= buf1+TEST_BUF_SIZE) {
                /* flush the data cache as seldom as possible;
                 * this single range assumes buf1 lies below buf2 */
                buf1_bis = buf1;
                buf2_bis = buf2;
                flush_dcache_range((unsigned long)buf1,
                                   (unsigned long)(buf2+TEST_BUF_SIZE));
            }
            if (buf1_bis != memcpy(buf1_bis, buf2_bis, size))
                break;
            buf1_bis += size;
            buf2_bis += size;
        }
        break;

    case COPY_TOFROM_USER_TEST:
        while (cnt-- > 0) {
            if (buf1_bis >= buf1+TEST_BUF_SIZE) {
                /* flush the data cache as seldom as possible;
                 * this single range assumes buf1 lies below buf2 */
                buf1_bis = buf1;
                buf2_bis = buf2;
                flush_dcache_range((unsigned long)buf1,
                                   (unsigned long)(buf2+TEST_BUF_SIZE));
            }
            ret = __copy_tofrom_user(buf1_bis, buf2_bis, size);
            if (ret != 0)
                break;
            buf1_bis += size;
            buf2_bis += size;
        }
        break;
    }

Below are the results:

memcpy()
chunk: 65536 [B] | transfer:  69.2 [MB/s] | time: 1.849727 [s] | size: 128.000 [MB]
chunk: 32768 [B] | transfer:  69.2 [MB/s] | time: 1.849700 [s] | size: 128.000 [MB]
chunk: 16384 [B] | transfer:  69.2 [MB/s] | time: 1.849845 [s] | size: 128.000 [MB]
chunk:  8192 [B] | transfer:  69.2 [MB/s] | time: 1.850535 [s] | size: 128.000 [MB]
chunk:  4096 [B] | transfer:  69.1 [MB/s] | time: 1.853405 [s] | size: 128.000 [MB]
chunk:  2048 [B] | transfer:  69.1 [MB/s] | time: 1.852877 [s] | size: 128.000 [MB]
chunk:  1024 [B] | transfer:  69.2 [MB/s] | time: 1.849963 [s] | size: 128.000 [MB]
chunk:   512 [B] | transfer:  69.0 [MB/s] | time: 1.853793 [s] | size: 128.000 [MB]
chunk:   256 [B] | transfer:  68.6 [MB/s] | time: 1.866222 [s] | size: 128.000 [MB]
chunk:   128 [B] | transfer:  68.0 [MB/s] | time: 1.883002 [s] | size: 128.000 [MB]
chunk:    64 [B] | transfer:  67.2 [MB/s] | time: 1.904073 [s] | size: 128.000 [MB]
chunk:    32 [B] | transfer:  64.7 [MB/s] | time: 1.978109 [s] | size: 128.000 [MB]
chunk:    16 [B] | transfer:  54.5 [MB/s] | time: 2.348682 [s] | size: 128.000 [MB]
chunk:     8 [B] | transfer:  47.4 [MB/s] | time: 2.698635 [s] | size: 128.000 [MB]

__copy_tofrom_user()
chunk: 65536 [B] | transfer:  97.3 [MB/s] | time: 1.315155 [s] | size: 128.000 [MB]
chunk: 32768 [B] | transfer:  97.3 [MB/s] | time: 1.315762 [s] | size: 128.000 [MB]
chunk: 16384 [B] | transfer:  97.2 [MB/s] | time: 1.316946 [s] | size: 128.000 [MB]
chunk:  8192 [B] | transfer:  96.8 [MB/s] | time: 1.321705 [s] | size: 128.000 [MB]
chunk:  4096 [B] | transfer:  96.6 [MB/s] | time: 1.325516 [s] | size: 128.000 [MB]
chunk:  2048 [B] | transfer:  96.6 [MB/s] | time: 1.325570 [s] | size: 128.000 [MB]
chunk:  1024 [B] | transfer:  96.8 [MB/s] | time: 1.322599 [s] | size: 128.000 [MB]
chunk:   512 [B] | transfer:  97.8 [MB/s] | time: 1.308186 [s] | size: 128.000 [MB]
chunk:   256 [B] | transfer: 100.2 [MB/s] | time: 1.277788 [s] | size: 128.000 [MB]
chunk:   128 [B] | transfer:  91.5 [MB/s] | time: 1.398216 [s] | size: 128.000 [MB]
chunk:    64 [B] | transfer:  87.0 [MB/s] | time: 1.471784 [s] | size: 128.000 [MB]
chunk:    32 [B] | transfer:  75.0 [MB/s] | time: 1.706426 [s] | size: 128.000 [MB]
chunk:    16 [B] | transfer:  47.8 [MB/s] | time: 2.678039 [s] | size: 128.000 [MB]
chunk:     8 [B] | transfer:  41.5 [MB/s] | time: 3.084689 [s] | size: 128.000 [MB]

Regards,
Dominik Bozek

BTW, memcpy() could perhaps be optimized, as it is on i32, for the case where the block size is known at compile time.
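
To make the BTW idea concrete, here is a rough user-space sketch of dispatching on a compile-time-constant size. It is only an illustration, not the kernel's i32 code: the names sketch_memcpy(), copy_fixed() and copy_generic() are hypothetical, and it relies on the GCC/Clang builtins __builtin_constant_p() and __builtin_memcpy().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the out-of-line, size-agnostic copy routine. */
static void *copy_generic(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Hypothetical fixed-size path: when n is a constant, the compiler is free
 * to expand __builtin_memcpy() inline instead of emitting a call. */
static inline void *copy_fixed(void *dst, const void *src, size_t n)
{
    return __builtin_memcpy(dst, src, n);
}

/* Pick the path at compile time. __builtin_constant_p() does not evaluate
 * its argument, and only one branch of the conditional is executed. */
#define sketch_memcpy(dst, src, n)              \
    (__builtin_constant_p(n)                    \
        ? copy_fixed((dst), (src), (n))         \
        : copy_generic((dst), (src), (n)))
```

With this, sketch_memcpy(a, b, 8) takes the fixed-size path, while a call with a run-time size falls back to the generic loop. Whether such a split would actually help on the MPC8313 would have to be measured.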