Re: [Maria-developers] uint6korr optimization

2014-02-16 Thread AugustQ
Hi,

couldn't do this earlier, please excuse for being so late.

I played with the test-pgm and modified it a bit. You can find my
version here:
https://drive.google.com/file/d/0B4h65dJSL95DXzdpaUNGYTQ5cVk/edit?usp=sharing

I also modified the definitions in myisampack.h and did some tests with
a table with 10 mio records in it (1 GB in size) to see the effect of
the modifications. You will find my results at the end of the
source-code.

At a first look the uintXkorr looks like a waste of time but the
modified versions do not have much effect on the whole process.

Regards
AugustQ


On So, 2014-01-26 at 19:28 +0400, Alexey Botchkov wrote:
> Thanks for the suggestions, Kristian.
> I for some reason didn't notice that __builtin_bswap things.
> 
> Best regards.
> HF
> 
> 
> 23.01.2014 19:51, Kristian Nielsen wrote:
> > Kristian Nielsen  writes:
> >
> >> Do it like this:
> >> static inline ulonglong
> >> mi_uint6korr(const void *p)
> >> {
> >>uint32 a= *(uint32 *)p;
> >>uint16 b= *(uint16 *)(4+(char *)p);
> >>ulonglong v= ((ulonglong)a | ((ulonglong)b << 32)) << 16;
> >>asm ("bswapq %0" : "=r" (v) : "0" (v));
> >>return v;
> >> }
> > Note that GCC also has __builtin_bswap64() (and __builtin_bswap32()). They
> > also generate bswap instruction, but would also work on other platforms...
> >
> >   - Kristian.
> 
> 
> ___
> Mailing list: https://launchpad.net/~maria-developers
> Post to : maria-developers@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~maria-developers
> More help   : https://help.launchpad.net/ListHelp


___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] uint6korr optimization

2014-01-26 Thread Alexey Botchkov

Thanks for the suggestions, Kristian.
I for some reason didn't notice that __builtin_bswap things.

Best regards.
HF


23.01.2014 19:51, Kristian Nielsen wrote:

Kristian Nielsen  writes:


Do it like this:
static inline ulonglong
mi_uint6korr(const void *p)
{
   uint32 a= *(uint32 *)p;
   uint16 b= *(uint16 *)(4+(char *)p);
   ulonglong v= ((ulonglong)a | ((ulonglong)b << 32)) << 16;
   asm ("bswapq %0" : "=r" (v) : "0" (v));
   return v;
}

Note that GCC also has __builtin_bswap64() (and __builtin_bswap32()). They
also generate bswap instruction, but would also work on other platforms...

  - Kristian.



___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] uint6korr optimization

2014-01-23 Thread Kristian Nielsen
Kristian Nielsen  writes:

> Do it like this:

> static inline ulonglong
> mi_uint6korr(const void *p)
> {
>   uint32 a= *(uint32 *)p;
>   uint16 b= *(uint16 *)(4+(char *)p);
>   ulonglong v= ((ulonglong)a | ((ulonglong)b << 32)) << 16;
>   asm ("bswapq %0" : "=r" (v) : "0" (v));
>   return v;
> }

Note that GCC also has __builtin_bswap64() (and __builtin_bswap32()). They
also generate bswap instruction, but would also work on other platforms...

 - Kristian.

___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] uint6korr optimization

2014-01-23 Thread Kristian Nielsen
Alexey Botchkov  writes:

> Only question i have with this is if there's a possibility to make the
> hf_mi_uint6korr(dest,src) macro a function.
> So we can write as usual
> dest= hf_mi_uint6korr(src)
>
> Didn't find how that can nicely be done with the assembler code.

Do it like this:

static inline ulonglong
uint6korr(const void *p)
{
  uint32 a= *(uint32 *)p;
  uint16 b= *(uint16 *)(4+(char *)p);
  return (ulonglong)a | ((ulonglong)b << 32);
}

static inline ulonglong
mi_uint6korr(const void *p)
{
  uint32 a= *(uint32 *)p;
  uint16 b= *(uint16 *)(4+(char *)p);
  ulonglong v= ((ulonglong)a | ((ulonglong)b << 32)) << 16;
  asm ("bswapq %0" : "=r" (v) : "0" (v));
  return v;
}

I get:

elapsed 25 seconds on korr6-1
elapsed 31 seconds on korr6-2
elapsed 35 seconds on hf_korr6-1
elapsed 44 seconds on hf_korr6-2
elapsed 22 seconds on mi6-1
elapsed 38 seconds on mi6-2
elapsed 27 seconds on hf_mi6B-1
elapsed 42 seconds on hf_mi6-2

So they are even a bit faster than the macros.

 - Kristian.

___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] uint6korr optimization

2014-01-23 Thread Alexey Botchkov

Hi all.

Only question i have with this is if there's a possibility to make the
hf_mi_uint6korr(dest,src) macro a function.
So we can write as usual
dest= hf_mi_uint6korr(src)

Didn't find how that can nicely be done with the assembler code.

Best regards.
HF

23.01.2014 16:20, Michael Widenius wrote:

Hi!


"Alexey" == Alexey Botchkov  writes:

Alexey> Hi, Monty.
Alexey> I've looked at that part of the code, and tried to make the improved
Alexey> versions of uint6korr and mi_uint6korr.

Alexey> The small program attached that benchmarks these.
Alexey> Depending on enabled macros it loops uint6korr, hf_uint6korr,
Alexey> mi_uint6korr and hf_mi_uint6korr respectively. It performs 2 loops on
Alexey> each functions first runs it once, the second - twice, so we can
Alexey> calculate how much time was spent on the operation itself.

Alexey> The results i got so far are:
Alexey> elapsed 103 seconds on korr6-1
Alexey> elapsed 190 seconds on korr6-2
Alexey> elapsed 50 seconds on hf_korr6-1
Alexey> elapsed 79 seconds on hf_korr6-2
Alexey> elapsed 106 seconds on mi6-1
Alexey> elapsed 195 seconds on mi6-2
Alexey> elapsed 56 seconds on hf_mi6B-1
Alexey> elapsed 88 seconds on hf_mi6-2

Alexey> So the
Alexey>   hf_uint6korr is 3 times faster than uint6korr.
Alexey>   hf_mi_uint6korr is 2.8 times faster than mi_uint6korr.

Alexey> You're welcome to check the code out.

Thanks.

What is important is to get fast versions of

mi_uint3korr  (Used a lot in ma_dynrec.c and gis)
mi_uint4korr
mi_uint5korr
mi_uint6korr
mi_uint7korr
mi_uint8korr  (Used for some variables)

uint5korr
uint6korr (Used a lot in Aria)
uint8korr

(I am including the full test so that anyone can comment upon this)

--
#include 
#include 
#include 

#define TEST_KORR6
#define TEST_HF_KORR6
#define TEST_MI6
#define TEST_HF_MI6

#define uint6korr(A)((ulonglong)(((uint32)((uchar) (A)[0]))  + \
(((uint32)((uchar) (A)[1])) << 
8)   + \
(((uint32)((uchar) (A)[2])) << 
16)  + \
(((uint32)((uchar) (A)[3])) << 
24)) + \
  (((ulonglong) ((uchar) (A)[4])) << 32) +   \
  (((ulonglong) ((uchar) (A)[5])) << 40))
#define mi_uint6korr(A) ((ulonglong)(((uint32) (((const uchar*) (A))[5])) +\
   (((uint32) (((const uchar*) (A))[4])) 
<< 8) +\
   (((uint32) (((const uchar*) (A))[3])) 
<< 16) +\
   (((uint32) (((const uchar*) (A))[2])) 
<< 24)) +\
 (((ulonglong) (((uint32) (((const uchar*) 
(A))[1])) +\
(((uint32) 
(((const uchar*) (A))[0]) << 8 <<\
   32))

#define hf_uint6korr(A) (((ulonglong) ((uint32 *) (A))[0]) + (((ulonglong) ((uint16 
*) (A))[2]) << 32))
#define hf_mi_uint6korr(src, dest) \
 __asm__ (   \
   "bswapq %1;"  \
   "mov %1, %0;" \
   :"=r"(dest) \
   :"r"(hf_uint6korr(src)<<16)\
   : \
 )

typedef unsigned long long int ulonglong;
typedef unsigned int uint32;
typedef unsigned char uchar;
typedef unsigned short uint16;

time_t t0, t1;
ulonglong i;

#define GM 100LL
#define BAS 200

int main()
{
   ulonglong *pb, *pb2;
   char *art, *art2;
   ulonglong ci;

   art= malloc(6*BAS);
   pb= malloc(sizeof(ulonglong)*BAS);
   for (i=0; i<6*BAS; i++)
 art[i]= (char)i;

   art2= malloc(6*BAS);
   pb2= malloc(sizeof(ulonglong)*BAS);
   for (i=0; i<6*BAS; i++)
 art2[i]= (char)i;

#ifdef TEST_KORR6
   ci= 0;
   t0= time(0);
   for (i=0; i= BAS)
   ci= 0;

 pb[ci]= uint6korr(art+ci*6);
 ci++;
   }
   t1= time(0);
   printf("elapsed %d seconds on korr6-1\n", t1 - t0);

   ci= 0;
   t0= time(0);
   for (i=0; i= BAS)
   ci= 0;
 pb[ci]= uint6korr(art+ci*6);
 pb2[ci]= uint6korr(art2+ci*6);
 ci++;
   }
   t1= time(0);
   printf("elapsed %d seconds on korr6-2\n", t1 - t0);
#endif /*KORR6*/

#ifdef TEST_HF_KORR6
   ci= 0;
   t0= time(0);
   for (i=0; i= BAS)
   ci= 0;

 pb[ci]= hf_uint6korr(art+ci*6);
 ci++;
   }
   t1= time(0);
   printf("elapsed %d seconds on hf_korr6-1\n", t1 - t0);

   ci= 0;
   t0= time(0);
   for (i=0; i= BAS)
   ci= 0;
 pb[ci]= hf_uint6korr(art+ci*6);
 pb2[ci]= hf_uint6korr(art2+ci*6);
 ci++;
   }
   t1= time(0);
   printf("elapsed %d seconds on hf_korr6-2\n", t1 - t0);
#endif /*HF_KORR6*/

#ifdef TEST_MI6
   ci= 0;
   t0= time(0);
   for (i=0; i= BAS)
   ci= 0;

 pb[ci]= mi_uint6korr(art+ci*6);
 ci++;
   }
   t1= time(0);
   printf("elapsed 

Re: [Maria-developers] uint6korr optimization

2014-01-23 Thread Michael Widenius

Hi!

> "Alexey" == Alexey Botchkov  writes:

Alexey> Hi, Monty.
Alexey> I've looked at that part of the code, and tried to make the improved
Alexey> versions of uint6korr and mi_uint6korr.

Alexey> The small program attached that benchmarks these.
Alexey> Depending on enabled macros it loops uint6korr, hf_uint6korr,
Alexey> mi_uint6korr and hf_mi_uint6korr respectively. It performs 2 loops on
Alexey> each functions first runs it once, the second - twice, so we can
Alexey> calculate how much time was spent on the operation itself.

Alexey> The results i got so far are:
Alexey> elapsed 103 seconds on korr6-1
Alexey> elapsed 190 seconds on korr6-2
Alexey> elapsed 50 seconds on hf_korr6-1
Alexey> elapsed 79 seconds on hf_korr6-2
Alexey> elapsed 106 seconds on mi6-1
Alexey> elapsed 195 seconds on mi6-2
Alexey> elapsed 56 seconds on hf_mi6B-1
Alexey> elapsed 88 seconds on hf_mi6-2

Alexey> So the
Alexey>   hf_uint6korr is 3 times faster than uint6korr.
Alexey>   hf_mi_uint6korr is 2.8 times faster than mi_uint6korr.

Alexey> You're welcome to check the code out.

Thanks.

What is important is to get fast versions of

mi_uint3korr  (Used a lot in ma_dynrec.c and gis)
mi_uint4korr
mi_uint5korr
mi_uint6korr
mi_uint7korr
mi_uint8korr  (Used for some variables)

uint5korr
uint6korr (Used a lot in Aria)
uint8korr

(I am including the full test so that anyone can comment upon this)

--
#include 
#include 
#include 

#define TEST_KORR6
#define TEST_HF_KORR6
#define TEST_MI6
#define TEST_HF_MI6

#define uint6korr(A)((ulonglong)(((uint32)((uchar) (A)[0]))  + \
   (((uint32)((uchar) (A)[1])) << 
8)   + \
   (((uint32)((uchar) (A)[2])) << 
16)  + \
   (((uint32)((uchar) (A)[3])) << 
24)) + \
 (((ulonglong) ((uchar) (A)[4])) << 32) +   \
 (((ulonglong) ((uchar) (A)[5])) << 40))
#define mi_uint6korr(A) ((ulonglong)(((uint32) (((const uchar*) (A))[5])) +\
  (((uint32) (((const uchar*) (A))[4])) 
<< 8) +\
  (((uint32) (((const uchar*) (A))[3])) 
<< 16) +\
  (((uint32) (((const uchar*) (A))[2])) 
<< 24)) +\
(((ulonglong) (((uint32) (((const uchar*) (A))[1])) 
+\
   
(((uint32) (((const uchar*) (A))[0]) << 8 <<\
  32))

#define hf_uint6korr(A) (((ulonglong) ((uint32 *) (A))[0]) + (((ulonglong) 
((uint16 *) (A))[2]) << 32))
#define hf_mi_uint6korr(src, dest) \
__asm__ (   \
  "bswapq %1;"  \
  "mov %1, %0;" \
  :"=r"(dest) \
  :"r"(hf_uint6korr(src)<<16)\
  : \
)

typedef unsigned long long int ulonglong;
typedef unsigned int uint32;
typedef unsigned char uchar;
typedef unsigned short uint16;

time_t t0, t1;
ulonglong i;

#define GM 100LL
#define BAS 200

int main()
{
  ulonglong *pb, *pb2;
  char *art, *art2;
  ulonglong ci;

  art= malloc(6*BAS);
  pb= malloc(sizeof(ulonglong)*BAS);
  for (i=0; i<6*BAS; i++)
art[i]= (char)i;

  art2= malloc(6*BAS);
  pb2= malloc(sizeof(ulonglong)*BAS);
  for (i=0; i<6*BAS; i++)
art2[i]= (char)i;

#ifdef TEST_KORR6
  ci= 0;
  t0= time(0);
  for (i=0; i= BAS)
  ci= 0;

pb[ci]= uint6korr(art+ci*6);
ci++;
  }
  t1= time(0);
  printf("elapsed %d seconds on korr6-1\n", t1 - t0);

  ci= 0;
  t0= time(0);
  for (i=0; i= BAS)
  ci= 0;
pb[ci]= uint6korr(art+ci*6);
pb2[ci]= uint6korr(art2+ci*6);
ci++;
  }
  t1= time(0);
  printf("elapsed %d seconds on korr6-2\n", t1 - t0);
#endif /*KORR6*/

#ifdef TEST_HF_KORR6
  ci= 0;
  t0= time(0);
  for (i=0; i= BAS)
  ci= 0;

pb[ci]= hf_uint6korr(art+ci*6);
ci++;
  }
  t1= time(0);
  printf("elapsed %d seconds on hf_korr6-1\n", t1 - t0);

  ci= 0;
  t0= time(0);
  for (i=0; i= BAS)
  ci= 0;
pb[ci]= hf_uint6korr(art+ci*6);
pb2[ci]= hf_uint6korr(art2+ci*6);
ci++;
  }
  t1= time(0);
  printf("elapsed %d seconds on hf_korr6-2\n", t1 - t0);
#endif /*HF_KORR6*/

#ifdef TEST_MI6
  ci= 0;
  t0= time(0);
  for (i=0; i= BAS)
  ci= 0;

pb[ci]= mi_uint6korr(art+ci*6);
ci++;
  }
  t1= time(0);
  printf("elapsed %d seconds on mi6-1\n", t1 - t0);

  ci= 0;
  t0= time(0);
  for (i=0; i= BAS)
  ci= 0;
pb[ci]= mi_uint6korr(art+ci*6);
pb2[ci]= mi_uint6korr(art2+ci*6);
ci++;
  }
  t1= time(0);
  printf("elapsed %d seconds on mi6-2\n", t1 - t0);
#endif /*MI6*/

#ifdef TEST_HF_MI6
  ci= 0;
  t0= time(0);
  for (i=0; i= BAS)
  ci= 0;
hf_mi_uint6korr(art+ci*6, pb[ci]);