Re: [Maria-developers] uint6korr optimization
Hi,

I couldn't do this earlier; please excuse the late reply.

I played with the test program and modified it a bit. You can find my version here:
https://drive.google.com/file/d/0B4h65dJSL95DXzdpaUNGYTQ5cVk/edit?usp=sharing

I also modified the definitions in myisampack.h and ran some tests against a table with 10 million records (1 GB in size) to see the effect of the modifications. You will find my results at the end of the source code.

At first glance the uintXkorr macros look like a waste of time, but the modified versions do not have much effect on the whole process.

Regards
AugustQ

On Sun, 2014-01-26 at 19:28 +0400, Alexey Botchkov wrote:
> Thanks for the suggestions, Kristian.
> For some reason I didn't notice those __builtin_bswap things.
>
> Best regards.
> HF
>
> 23.01.2014 19:51, Kristian Nielsen wrote:
> > Kristian Nielsen writes:
> >
> >> Do it like this:
> >> static inline ulonglong
> >> mi_uint6korr(const void *p)
> >> {
> >>   uint32 a= *(uint32 *)p;
> >>   uint16 b= *(uint16 *)(4+(char *)p);
> >>   ulonglong v= ((ulonglong)a | ((ulonglong)b << 32)) << 16;
> >>   asm ("bswapq %0" : "=r" (v) : "0" (v));
> >>   return v;
> >> }
> > Note that GCC also has __builtin_bswap64() (and __builtin_bswap32()). They
> > also generate the bswap instruction, but would also work on other platforms...
> >
> > - Kristian.

___
Mailing list: https://launchpad.net/~maria-developers
Post to     : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp
Re: [Maria-developers] uint6korr optimization
Thanks for the suggestions, Kristian.
For some reason I didn't notice those __builtin_bswap things.

Best regards.
HF

23.01.2014 19:51, Kristian Nielsen wrote:
> Kristian Nielsen writes:
>
>> Do it like this:
>> static inline ulonglong
>> mi_uint6korr(const void *p)
>> {
>>   uint32 a= *(uint32 *)p;
>>   uint16 b= *(uint16 *)(4+(char *)p);
>>   ulonglong v= ((ulonglong)a | ((ulonglong)b << 32)) << 16;
>>   asm ("bswapq %0" : "=r" (v) : "0" (v));
>>   return v;
>> }
> Note that GCC also has __builtin_bswap64() (and __builtin_bswap32()). They
> also generate the bswap instruction, but would also work on other platforms...
>
> - Kristian.
Re: [Maria-developers] uint6korr optimization
Kristian Nielsen writes:

> Do it like this:
> static inline ulonglong
> mi_uint6korr(const void *p)
> {
>   uint32 a= *(uint32 *)p;
>   uint16 b= *(uint16 *)(4+(char *)p);
>   ulonglong v= ((ulonglong)a | ((ulonglong)b << 32)) << 16;
>   asm ("bswapq %0" : "=r" (v) : "0" (v));
>   return v;
> }

Note that GCC also has __builtin_bswap64() (and __builtin_bswap32()). They also generate the bswap instruction, but would also work on other platforms...

 - Kristian.
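The builtin mentioned above can stand in for the inline asm. The following is only a sketch under stated assumptions: the names mi_uint6korr_bytes, mi_uint6korr_bswap and mi_uint6korr_agree are hypothetical and do not come from the MariaDB sources, the host is assumed to be little-endian (as the asm version also assumes), and memcpy is swapped in for the pointer casts to sidestep alignment and aliasing questions. The byte-wise reader is included only as a reference to compare against.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-wise reference: read 6 bytes as a big-endian integer
   (p[0] is the most significant byte), like the mi_uint6korr macro. */
static uint64_t mi_uint6korr_bytes(const unsigned char *p)
{
  uint64_t v = 0;
  for (int i = 0; i < 6; i++)
    v = (v << 8) | p[i];
  return v;
}

/* Hypothetical variant using the GCC/Clang builtin instead of inline asm.
   Assumes a little-endian host. */
static uint64_t mi_uint6korr_bswap(const void *p)
{
  uint32_t a;
  uint16_t b;
  uint64_t v;

  assert(p != NULL);
  memcpy(&a, p, sizeof a);                     /* bytes 0..3 */
  memcpy(&b, (const char *) p + 4, sizeof b);  /* bytes 4..5 */
  /* Move the 6 payload bytes to the top of the word, then reverse all 8. */
  v = ((uint64_t) a | ((uint64_t) b << 32)) << 16;
  return __builtin_bswap64(v);
}

/* Returns 1 if both readers agree on the given 6-byte buffer. */
static int mi_uint6korr_agree(const unsigned char *p)
{
  return mi_uint6korr_bytes(p) == mi_uint6korr_bswap(p);
}
```

For the bytes 01 02 03 04 05 06, both readers should yield 0x010203040506 on a little-endian machine. Whether the builtin version is as fast as the asm one would have to be measured with the benchmark from this thread.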
Re: [Maria-developers] uint6korr optimization
Alexey Botchkov writes:

> The only question I have with this is whether the
> hf_mi_uint6korr(dest,src) macro can be made a function,
> so that we can write, as usual,
> dest= hf_mi_uint6korr(src)
>
> I didn't find a way to do that nicely with the assembler code.

Do it like this:

static inline ulonglong
uint6korr(const void *p)
{
  uint32 a= *(uint32 *)p;
  uint16 b= *(uint16 *)(4+(char *)p);
  return (ulonglong)a | ((ulonglong)b << 32);
}

static inline ulonglong
mi_uint6korr(const void *p)
{
  uint32 a= *(uint32 *)p;
  uint16 b= *(uint16 *)(4+(char *)p);
  ulonglong v= ((ulonglong)a | ((ulonglong)b << 32)) << 16;
  asm ("bswapq %0" : "=r" (v) : "0" (v));
  return v;
}

I get:

elapsed 25 seconds on korr6-1
elapsed 31 seconds on korr6-2
elapsed 35 seconds on hf_korr6-1
elapsed 44 seconds on hf_korr6-2
elapsed 22 seconds on mi6-1
elapsed 38 seconds on mi6-2
elapsed 27 seconds on hf_mi6B-1
elapsed 42 seconds on hf_mi6-2

So they are even a bit faster than the macros.

 - Kristian.
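To check that a word-load function of this shape returns the same value as the byte-by-byte macro, the two can be compared on a buffer of packed 6-byte records. This is a sketch only: the names uint6korr_bytes, uint6korr_load and uint6korr_mismatches are hypothetical, a little-endian host is assumed, and memcpy replaces the pointer casts used in the message above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-wise reference, equivalent in result to the uint6korr macro:
   p[0] is the least significant byte. */
static uint64_t uint6korr_bytes(const unsigned char *p)
{
  uint64_t v = 0;
  for (int i = 5; i >= 0; i--)
    v = (v << 8) | p[i];
  return v;
}

/* Word-load variant: one 4-byte and one 2-byte load.
   Assumes a little-endian host. */
static uint64_t uint6korr_load(const void *p)
{
  uint32_t a;
  uint16_t b;

  memcpy(&a, p, sizeof a);                     /* bytes 0..3 */
  memcpy(&b, (const char *) p + 4, sizeof b);  /* bytes 4..5 */
  return (uint64_t) a | ((uint64_t) b << 32);
}

/* Counts the records in buf (n packed 6-byte records) on which
   the two readers disagree; 0 means they are equivalent on this input. */
static size_t uint6korr_mismatches(const unsigned char *buf, size_t n)
{
  size_t bad = 0;

  assert(buf != NULL);
  for (size_t i = 0; i < n; i++)
    if (uint6korr_bytes(buf + i * 6) != uint6korr_load(buf + i * 6))
      bad++;
  return bad;
}
```

Because the reader is a function, it can be used in the usual form, for example dest= uint6korr_load(src). For the bytes 01 02 03 04 05 06 both readers should give 0x060504030201 on a little-endian machine.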
Re: [Maria-developers] uint6korr optimization
Hi all.

The only question I have with this is whether the hf_mi_uint6korr(dest,src) macro can be made a function, so that we can write, as usual,
dest= hf_mi_uint6korr(src)

I didn't find a way to do that nicely with the assembler code.

Best regards.
HF

23.01.2014 16:20, Michael Widenius wrote:
> Hi!
>
> "Alexey" == Alexey Botchkov writes:
>
> Alexey> Hi, Monty.
> Alexey> I've looked at that part of the code, and tried to make improved
> Alexey> versions of uint6korr and mi_uint6korr.
> Alexey> The small program attached benchmarks these.
> Alexey> Depending on the enabled macros it loops uint6korr, hf_uint6korr,
> Alexey> mi_uint6korr and hf_mi_uint6korr respectively. It performs 2 loops on
> Alexey> each function: the first runs it once, the second twice, so we can
> Alexey> calculate how much time was spent on the operation itself.
>
> Alexey> The results I got so far are:
> Alexey> elapsed 103 seconds on korr6-1
> Alexey> elapsed 190 seconds on korr6-2
> Alexey> elapsed 50 seconds on hf_korr6-1
> Alexey> elapsed 79 seconds on hf_korr6-2
> Alexey> elapsed 106 seconds on mi6-1
> Alexey> elapsed 195 seconds on mi6-2
> Alexey> elapsed 56 seconds on hf_mi6B-1
> Alexey> elapsed 88 seconds on hf_mi6-2
>
> Alexey> So the
> Alexey> hf_uint6korr is 3 times faster than uint6korr.
> Alexey> hf_mi_uint6korr is 2.8 times faster than mi_uint6korr.
>
> Alexey> You're welcome to check the code out.
>
> Thanks.
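The "3 times" and "2.8 times" figures follow from the two-loop design: the "-2" loop does one more conversion per iteration than the "-1" loop, so subtracting the two timings removes the loop overhead and leaves the cost of the conversion itself. A minimal sketch of that arithmetic, with hypothetical helper names op_seconds and speedup:

```c
#include <assert.h>

/* Seconds attributable to one conversion per iteration:
   (loop doing the operation twice) - (loop doing it once). */
static int op_seconds(int one_op_loop, int two_op_loop)
{
  assert(two_op_loop >= one_op_loop);
  return two_op_loop - one_op_loop;
}

/* How many times faster the new version is than the baseline,
   comparing only the per-operation cost. */
static double speedup(int base1, int base2, int new1, int new2)
{
  int base_cost = op_seconds(base1, base2);
  int new_cost = op_seconds(new1, new2);

  assert(new_cost > 0);
  return (double) base_cost / new_cost;
}
```

With the numbers quoted above, uint6korr costs 190 - 103 = 87 seconds against 79 - 50 = 29 for hf_uint6korr, so speedup(103, 190, 50, 79) is 3.0. For the mi variants it is (195 - 106) / (88 - 56) = 89 / 32, about 2.78. The timings come from time(0), which has one-second resolution, so small differences should be read with care.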
Re: [Maria-developers] uint6korr optimization
Hi!

> "Alexey" == Alexey Botchkov writes:

Alexey> Hi, Monty.
Alexey> I've looked at that part of the code, and tried to make improved
Alexey> versions of uint6korr and mi_uint6korr.
Alexey> The small program attached benchmarks these.
Alexey> Depending on the enabled macros it loops uint6korr, hf_uint6korr,
Alexey> mi_uint6korr and hf_mi_uint6korr respectively. It performs 2 loops on
Alexey> each function: the first runs it once, the second twice, so we can
Alexey> calculate how much time was spent on the operation itself.

Alexey> The results I got so far are:
Alexey> elapsed 103 seconds on korr6-1
Alexey> elapsed 190 seconds on korr6-2
Alexey> elapsed 50 seconds on hf_korr6-1
Alexey> elapsed 79 seconds on hf_korr6-2
Alexey> elapsed 106 seconds on mi6-1
Alexey> elapsed 195 seconds on mi6-2
Alexey> elapsed 56 seconds on hf_mi6B-1
Alexey> elapsed 88 seconds on hf_mi6-2

Alexey> So the
Alexey> hf_uint6korr is 3 times faster than uint6korr.
Alexey> hf_mi_uint6korr is 2.8 times faster than mi_uint6korr.

Alexey> You're welcome to check the code out.

Thanks.

What is important is to get fast versions of

mi_uint3korr (Used a lot in ma_dynrec.c and gis)
mi_uint4korr
mi_uint5korr
mi_uint6korr
mi_uint7korr
mi_uint8korr (Used for some variables)

uint5korr
uint6korr (Used a lot in Aria)
uint8korr

(I am including the full test so that anyone can comment upon this)

--

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define TEST_KORR6
#define TEST_HF_KORR6
#define TEST_MI6
#define TEST_HF_MI6

/* Original macro: little-endian 6-byte read, one byte at a time */
#define uint6korr(A) ((ulonglong)(((uint32) ((uchar) (A)[0])) + \
                                  (((uint32) ((uchar) (A)[1])) << 8) + \
                                  (((uint32) ((uchar) (A)[2])) << 16) + \
                                  (((uint32) ((uchar) (A)[3])) << 24)) + \
                      (((ulonglong) ((uchar) (A)[4])) << 32) + \
                      (((ulonglong) ((uchar) (A)[5])) << 40))

/* Original macro: big-endian 6-byte read, one byte at a time */
#define mi_uint6korr(A) ((ulonglong)(((uint32) (((const uchar*) (A))[5])) +\
                                     (((uint32) (((const uchar*) (A))[4])) << 8) +\
                                     (((uint32) (((const uchar*) (A))[3])) << 16) +\
                                     (((uint32) (((const uchar*) (A))[2])) << 24)) +\
                         (((ulonglong) (((uint32) (((const uchar*) (A))[1])) +\
                                        (((uint32) (((const uchar*) (A))[0]) << 8)))) <<\
                          32))

/* Improved version: one 4-byte load plus one 2-byte load */
#define hf_uint6korr(A) (((ulonglong) ((uint32 *) (A))[0]) + (((ulonglong) ((uint16 *) (A))[2]) << 32))

/* Improved version: load as above, shift up by 16 bits, then byte-swap */
#define hf_mi_uint6korr(src, dest) \
  __asm__ ( \
    "bswapq %1;" \
    "mov %1, %0;" \
    :"=r"(dest) \
    :"r"(hf_uint6korr(src)<<16)\
    : \
  )

typedef unsigned long long int ulonglong;
typedef unsigned int uint32;
typedef unsigned char uchar;
typedef unsigned short uint16;

time_t t0, t1;
ulonglong i;

#define GM 100LL
#define BAS 200

int main()
{
  ulonglong *pb, *pb2;
  char *art, *art2;
  ulonglong ci;

  art= malloc(6*BAS);
  pb= malloc(sizeof(ulonglong)*BAS);
  for (i=0; i<6*BAS; i++)
    art[i]= (char)i;

  art2= malloc(6*BAS);
  pb2= malloc(sizeof(ulonglong)*BAS);
  for (i=0; i<6*BAS; i++)
    art2[i]= (char)i;

#ifdef TEST_KORR6
  ci= 0;
  t0= time(0);
  for (i=0; i<[...]; i++)
  {
    if (ci >= BAS)
      ci= 0;
    pb[ci]= uint6korr(art+ci*6);
    ci++;
  }
  t1= time(0);
  printf("elapsed %d seconds on korr6-1\n", (int) (t1 - t0));

  ci= 0;
  t0= time(0);
  for (i=0; i<[...]; i++)
  {
    if (ci >= BAS)
      ci= 0;
    pb[ci]= uint6korr(art+ci*6);
    pb2[ci]= uint6korr(art2+ci*6);
    ci++;
  }
  t1= time(0);
  printf("elapsed %d seconds on korr6-2\n", (int) (t1 - t0));
#endif /*KORR6*/

#ifdef TEST_HF_KORR6
  ci= 0;
  t0= time(0);
  for (i=0; i<[...]; i++)
  {
    if (ci >= BAS)
      ci= 0;
    pb[ci]= hf_uint6korr(art+ci*6);
    ci++;
  }
  t1= time(0);
  printf("elapsed %d seconds on hf_korr6-1\n", (int) (t1 - t0));

  ci= 0;
  t0= time(0);
  for (i=0; i<[...]; i++)
  {
    if (ci >= BAS)
      ci= 0;
    pb[ci]= hf_uint6korr(art+ci*6);
    pb2[ci]= hf_uint6korr(art2+ci*6);
    ci++;
  }
  t1= time(0);
  printf("elapsed %d seconds on hf_korr6-2\n", (int) (t1 - t0));
#endif /*HF_KORR6*/

#ifdef TEST_MI6
  ci= 0;
  t0= time(0);
  for (i=0; i<[...]; i++)
  {
    if (ci >= BAS)
      ci= 0;
    pb[ci]= mi_uint6korr(art+ci*6);
    ci++;
  }
  t1= time(0);
  printf("elapsed %d seconds on mi6-1\n", (int) (t1 - t0));

  ci= 0;
  t0= time(0);
  for (i=0; i<[...]; i++)
  {
    if (ci >= BAS)
      ci= 0;
    pb[ci]= mi_uint6korr(art+ci*6);
    pb2[ci]= mi_uint6korr(art2+ci*6);
    ci++;
  }
  t1= time(0);
  printf("elapsed %d seconds on mi6-2\n", (int) (t1 - t0));
#endif /*MI6*/

#ifdef TEST_HF_MI6
  ci= 0;
  t0= time(0);
  for (i=0; i<[...]; i++)
  {
    if (ci >= BAS)
      ci= 0;
    hf_mi_uint6korr(art+ci*6, pb[ci]);