Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-09-02 Thread Alik Khilazhev
Hello Fabien,

Thank you for detailed review. I hope I have fixed all the issues you mentioned 
in your letter.



pgbench-zipf-08v.patch
Description: Binary data

—
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-08-22 Thread Alik Khilazhev
Hello, Fabien

I am attaching patch v7.

> Yes, I agree. a >= 1 does not make much sense... If you want uniform you 
> should use random(), not call random_zipfian with a = 1. Basically it 
> suggests that too large values of "a" should be rejected. Not sure where to 
> put the limit, though.

I set upper bound for parameter to be equal to 1000.
> 
> Yes, as a general principle I think that the documentation should reflect the 
> implementation.

Documentation have been updated, I have removed algorithms descriptions and put 
references to them there.



pgbench-zipf-07v.patch
Description: Binary data
—
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-08-13 Thread Alik Khilazhev
Hello Fabien,

> 
> I think that this method should be used for a>1, and the other very rough one 
> can be kept for parameter a in [0, 1), a case which does not make much sense 
> to a mathematician as it diverges if unbounded.

Now “a” does not have upper bound, that’s why on using iterative algorithm with 
a >= 1 program will stuck on infinite loop because of following line of 
code:
double b = pow(2.0, s - 1.0); 
Because after overflow “b” becomes “+Inf”.

So should upper bound for “a" be set? 

Should I mention in docs that there are two algorithms are used depending on 
values of a(s/theta)?

In attaching patch, I have added computeIterativeZipfian method and it’s usage 
in getZipfianRand.
Is it better to move code of computing via cache to new method, so that 
getZipfianRand will contain only 2 computeXXXZipfian method calls?


pgbench-zipf-06v.patch
Description: Binary data
—
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-08-06 Thread Alik Khilazhev
Hello Fabien,


> On 5 Aug 2017, at 12:15, Fabien COELHO <coe...@cri.ensmp.fr> wrote:
> 
> 
> Hello Alik,
> 
> I've done some math investigations, which consisted in spending one hour with 
> Christian, a statistician colleague of mine. He took an old book out of a 
> shelf, opened it to page 550 (roughly in the middle), and explained to me how 
> to build a real zipfian distribution random generator.
> 
> The iterative method is for parameter a>1 and works for unbounded values. It 
> is simple to add a bound. In practice the iterative method is quite 
> effective, i.e. number of iterations is typically small, at least if the 
> bound is large and if parameter a is not too close to 1.

> I've attached a python3 script which implements the algorithm. It looks like 
> magic. Beware that a C implementation should take care of float and int 
> overflows.
> 
Thank you for the script. I will rewrite it to C and add to the patch soon. 

—
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-21 Thread Alik Khilazhev
Hello!

I realized that I was sending emails as HTML and latest patch is not visible in 
the archive now.
That’s why I am attaching it again.

I am sorry for that.



pgbench-zipf-05v.patch
Description: Binary data

—
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-21 Thread Alik Khilazhev
Hmmm. On second thought, maybe one or the other is enough, either restrict the parameter to values where the approximation is good, or put out a clear documentation about when the approximation is not very good, but it may be still useful even if not precise.So I would be in favor of expanding the documentation but not restricting the parameter beyond avoiding value 1.0.I have removed restriction and expanded documentation in attaching patch v5. Also I have recorded patch to CF 2017-09 —  https://commitfest.postgresql.org/14/1206/. 

pgbench-zipf-05v.patch
Description: Binary data
—Thanks and Regards,Alik KhilazhevPostgres Professional:http://www.postgrespro.comThe Russian Postgres Company



Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-20 Thread Alik Khilazhev



> I think that developping a test would be much simpler with the improved tap 
> test infrastructure, so I would suggest to wait to know the result of the 
> corresponding patch.

Ok, I will wait then.

> Also, could you recod the patch to CF 2017-09?
> https://commitfest.postgresql.org/14/ <https://commitfest.postgresql.org/14/>

Yea, I will send it.

—
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-20 Thread Alik Khilazhev
Hello Fabien,I am attaching patch v4. On 19 Jul 2017, at 17:21, Fabien COELHO  wrote:About the maths: As already said, I'm not at ease with a random_zipfian function which does not display a (good) zipfian distribution. At the minimum the documentation should be clear about the approximations implied depending on the parameter value.I add one more sentence to documentation to emphasize that degree of proximity depends on parameter . And also I made restriction on parameter, now it can be only in range (0; 1)In the litterature the theta parameter seems to be often called alphaor s (eg see https://en.wikipedia.org/wiki/Zipf%27s_law). I would suggest tostick to "s" instead of "theta”?I have renamed it to “s”.Functions zipfZeta(n, theta) does not really computes the zeta(n) function,so I think that a better name should be chosen. It seems to computeH_{n,theta}, the generalized harmonic number. Idem "thetan" field in struct.Renamed zipfZeta to zipfGeneralizedHarmonic, zetan to harmonicn.The handling of cache overflow by randomly removing one entry looks likea strange idea. Rather remove the oldest entry?Replaced with Least Recently Used replacement algorithm.ISTM that it should print a warning once if the cache array overflows as performance would drop heavily.Now it prints warning message if array overflowed. To print message only one time, it uses global flag, which is available for all threads. And theoretically message can be printed more than one time. It could be solved easily using pg_atomic_test_set_flag() from src/include/port/atomics.h but it can not be used in pgbench because of following lines of code there:#ifdef FRONTEND#error "atomics.h may not be included from frontend code"#endifOr it can be fixed by using mutexes from pthread, but I think code become less readable and more complex in this case.So, should I spend time on solving this issue? If the zipf cache is constant size, there is no point in using dynamic allocation, just declare an array…Fixed. Does ZIPF_CACHE_SIZE = 15 is ok? There should be non regression tests somehow. If the "improve pgbenchtap test infrastructure" get through, things should be added there.I will send tests later, as separate patch.

pgbench-zipf-04v.patch
Description: Binary data
—Thanks and Regards,Alik KhilazhevPostgres Professional:http://www.postgrespro.comThe Russian Postgres Company

Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-17 Thread Alik Khilazhev

> On 17 Jul 2017, at 13:51, Fabien COELHO <coe...@cri.ensmp.fr> wrote:
> 
> 
> Is this bias expected from the drawing method, say because it is approximated 
> and the approximation is weak at some points, or is there an issue with its 
> implementation, says some shift which gets smoothed down for higher indexes?
> 

I have checked paper where such implementation was proposed and there theta 
allowed only on range between 0 and 1. It seems like it is not guaranteed that 
it should work well when theta is more than 1.

I am attaching paper, see page 23.


syntheticdatagen.pdf
Description: Adobe PDF document

—
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-17 Thread Alik Khilazhev
Hello Fabien,On 14 Jul 2017, at 17:51, Fabien COELHO  wrote:Ok, so you did not get the large bias for i=3. Strange.I got large bias for i=3 and theta > 1 even with a million outcomes, but for theta < 1 (I have tested on theta = 0.1 and 0.3) it showed quite good results.I am attaching patch v3. Among other things I fixed small typo in description of random_exponential function in pgbench.sgml file.

pgbench-zipf-03v.patch
Description: Binary data
—Thanks and Regards,Alik KhilazhevPostgres Professional:http://www.postgrespro.comThe Russian Postgres Company

Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-14 Thread Alik Khilazhev
00 00 00 00 00
 pgbench_accounts_pkey | 0 | 72 |62 |61 |63 |   
   1 | l|367 |  0 |16 |  8192 |   808 | 
37 57 00 00 00 00 00 00
 pgbench_accounts_pkey | 0 | 73 |63 |62 |64 |   
   1 | l|367 |  0 |16 |  8192 |   808 | 
a5 58 00 00 00 00 00 00
 pgbench_accounts_pkey | 0 | 74 |64 |63 |65 |   
   1 | l|367 |  0 |16 |  8192 |   808 | 
13 5a 00 00 00 00 00 00
 pgbench_accounts_pkey | 0 | 75 |65 |64 |66 |   
   1 | l|367 |  0 |16 |  8192 |   808 | 
81 5b 00 00 00 00 00 00
 pgbench_accounts_pkey | 0 | 76 |66 |65 |67 |   
   1 | l|367 |  0 |16 |  8192 |   808 | 
ef 5c 00 00 00 00 00 00
 pgbench_accounts_pkey | 0 | 77 |67 |66 |68 |   
   1 | l|367 |  0 |16 |  8192 |   808 | 
5d 5e 00 00 00 00 00 00
 pgbench_accounts_pkey | 0 | 78 |68 |67 |69 |   
   1 | l|367 |  0 |16 |  8192 |   808 | 
cb 5f 00 00 00 00 00 00
 pgbench_accounts_pkey | 0 | 79 |69 |68 |70 |   
   1 | l|367 |  0 |16 |  8192 |   808 | 
39 61 00 00 00 00 00 00
 pgbench_accounts_pkey | 0 | 80 |70 |69 |71 |   
   1 | l|367 |  0 |16 |  8192 |   808 | 
a7 62 00 00 00 00 00 00
 pgbench_accounts_pkey | 0 | 81 |71 |70 |72 |   
   1 | l|367 |  0 |16 |  8192 |   808 | 
15 64 00 00 00 00 00 00
 pgbench_accounts_pkey | 0 | 82 |72 |71 |73 |   
   1 | l|367 |  0 |16 |  8192 |   808 | 
83 65 00 00 00 00 00 00
 pgbench_accounts_pkey | 0 | 83 |73 |72 |74 |   
   1 | l|367 |  0 |16 |  8192 |   808 | 
f1 66 00 00 00 00 00 00
 pgbench_accounts_pkey | 0 | 84 |74 |73 |75 |   
   1 | l|367 |  0 |16 |  8192 |   808 | 
5f 68 00 00 00 00 00 00
 pgbench_accounts_pkey | 0 | 85 |75 |74 |76 |   
   1 | l|367 |  0 |16 |  8192 |   808 | 
cd 69 00 00 00 00 00 00
 pgbench_accounts_pkey | 0 | 86 |76 |75 |77 |   
   1 | l|367 |  0 |16 |  8192 |   808 | 
3b 6b 00 00 00 00 00 00
 pgbench_accounts_pkey | 0 | 87 |77 |76 |78 |   
   1 | l|367 |  0 |16 |  8192 |   808 | 
a9 6c 00 00 00 00 00 00

— 
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-14 Thread Alik Khilazhev

> On 13 Jul 2017, at 19:14, Fabien COELHO <coe...@cri.ensmp.fr> wrote:
> 
> Documentation says that the closer theta is from 0 the flatter the 
> distribution
> but the implementation requires at least 1, including strange error messages:
> 
>  zipfian parameter must be greater than 1.00 (not 1.00)
> 
> Could theta be allowed between 0 & 1 ? I've tried forcing with theta = 0.1
> and it worked well, so I'm not sure that I understand the restriction.
> I also tried with theta=0.001 but it seemed less good.

Algorithm works with theta less than 1. The only problem here is that theta can 
not be 1, because of next line of code

cell->alpha = 1. / (1 - theta);

That’s why I put such restriction. Now I see 2 possible solutions for that:
1) Exclude 1, and allow everything in range (0;+∞).
2) Or just increase/decrease theta by very small number if it is 1. 

> I have also tried to check the distribution wrt the explanations, with the 
> attached scripts, n=100, theta=1.01/1.5/3.0: It does not seem to work, 
> there is repeatable +15% bias on i=3 and repeatable -3% to -30% bias for 
> values in i=10-100, this for different values of theta (1.01,1.5, 3.0).
> 
> If you try the script, beware to set parameters (theta, n) consistently.

I've executed scripts that you attached with different theta and number of 
outcomes(not n, n remains the same = 100) and I found out that for theta = 0.1 
and big number of outcomes it gives distribution very similar to zipfian(for 
number of outcomes = 100 000, bias -6% to 8% in whole range and for NOO = 1000 
000, bias is -2% to 2%).

By, number of outcomes(NOO) I mean how many times random_zipfian was called. 
For example:
pgbench -f compte_bench.sql -t 10

So, I think it works but works worse for small number of outcomes. And also we 
need to find optimal theta for better results.

— 
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com <http://www.postgrespro.com/>
The Russian Postgres Company

Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-13 Thread Alik Khilazhev
On 13 Jul 2017, at 00:20, Peter Geoghegan  wrote:Actually, I mean that I wonder how much of a difference it would makeif this entire block was commented out within _bt_doinsert():if (checkUnique != UNIQUE_CHECK_NO){    …}I am attaching results of test for 32 and 128 clients for original and patched(_bt_doinsert) variants.— Thanks and Regards,Alik KhilazhevPostgres Professional:http://www.postgrespro.comThe Russian Postgres Company  idx  | level | l_item | blkno | btpo_prev | btpo_next | 
btpo_flags | type | live_items | dead_items | avg_item_size | page_size | 
free_size | highkey 
---+---++---+---+---++--+++---+---+---+-
 pgbench_accounts_pkey | 2 |  1 |   290 | 0 | 0 |   
   2 | r| 10 |  0 |15 |  8192 |  7956 | 
 pgbench_accounts_pkey | 1 |  1 | 3 | 0 |   289 |   
   0 | i|298 |  0 |15 |  8192 |  2196 | 
09 96 01 00 00 00 00 00
 pgbench_accounts_pkey | 1 |  2 |   289 | 3 |   575 |   
   0 | i|285 |  0 |15 |  8192 |  2456 | 
11 2c 03 00 00 00 00 00
 pgbench_accounts_pkey | 1 |  3 |   575 |   289 |   860 |   
   0 | i|285 |  0 |15 |  8192 |  2456 | 
19 c2 04 00 00 00 00 00
 pgbench_accounts_pkey | 1 |  4 |   860 |   575 |  1145 |   
   0 | i|285 |  0 |15 |  8192 |  2456 | 
21 58 06 00 00 00 00 00
 pgbench_accounts_pkey | 1 |  5 |  1145 |   860 |  1430 |   
   0 | i|285 |  0 |15 |  8192 |  2456 | 
29 ee 07 00 00 00 00 00
 pgbench_accounts_pkey | 1 |  6 |  1430 |  1145 |  1715 |   
   0 | i|285 |  0 |15 |  8192 |  2456 | 
31 84 09 00 00 00 00 00
 pgbench_accounts_pkey | 1 |  7 |  1715 |  1430 |  2000 |   
   0 | i|285 |  0 |15 |  8192 |  2456 | 
39 1a 0b 00 00 00 00 00
 pgbench_accounts_pkey | 1 |  8 |  2000 |  1715 |  2285 |   
   0 | i|285 |  0 |15 |  8192 |  2456 | 
41 b0 0c 00 00 00 00 00
 pgbench_accounts_pkey | 1 |  9 |  2285 |  2000 |  2570 |   
   0 | i|285 |  0 |15 |  8192 |  2456 | 
49 46 0e 00 00 00 00 00
 pgbench_accounts_pkey | 1 | 10 |  2570 |  2285 | 0 |   
   0 | i|177 |  0 |15 |  8192 |  4616 | 
 pgbench_accounts_pkey | 0 |  1 | 1 | 0 |  2751 |   
  65 | l| 21 |225 |16 |  8192 |  3228 | 
14 00 00 00 00 00 00 00
 pgbench_accounts_pkey | 0 |  2 |  2751 | 1 |  2746 |   
  65 | l| 40 |182 |16 |  8192 |  3708 | 
3b 00 00 00 00 00 00 00
 pgbench_accounts_pkey | 0 |  3 |  2746 |  2751 |  2750 |   
  65 | l| 43 |240 |16 |  8192 |  2488 | 
65 00 00 00 00 00 00 00
 pgbench_accounts_pkey | 0 |  4 |  2750 |  2746 |  2745 |   
  65 | l| 51 | 57 |16 |  8192 |  5988 | 
97 00 00 00 00 00 00 00
 pgbench_accounts_pkey | 0 |  5 |  2745 |  2750 |  2756 |   
  65 | l| 42 | 47 |16 |  8192 |  6368 | 
c0 00 00 00 00 00 00 00
 pgbench_accounts_pkey | 0 |  6 |  2756 |  2745 |  2748 |   
  65 | l| 54 |139 |16 |  8192 |  4288 | 
f5 00 00 00 00 00 00 00
 pgbench_accounts_pkey | 0 |  7 |  2748 |  2756 |  2755 |   
  65 | l| 57 |333 |16 |  8192 |   348 | 
2d 01 00 00 00 00 00 00
 pgbench_accounts_pkey | 0 |  8 |  2755 |  2748 | 2 |   
  65 | l| 67 |308 |16 |  8192 |   648 | 
6f 01 00 00 00 00 00 00
 pgbench_accounts_pkey | 0 |  9 | 2 |  2755 |  2753 |   
  65 | l| 75 |280 |16 |  8192 |  1048 | 
b9 01 00 00 00 00 00 00
 pgbench_accounts_pkey | 0 | 10 |  2753 | 2 |  2747 |   
  65 | l| 83 |260 |16 |  8192 |  1288 | 
0b 02 00 00 00 00 00 00
 pgbench_accounts_pkey | 0 | 11 |  2747 |  2753 |  2754 |   
  65 | l| 91 |192 |16 |  8192 |  2488 | 
65 02 00 00 00 00 00 00
 pgbench_accounts_pkey | 0 | 12 |  2754 |  2747 | 4 |   
  65 | l|121 |196 |16 

Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-12 Thread Alik Khilazhev
Hello!

I want to say that our company is already engaged in the search for the causes 
of the problem and their solution. And also we have few experimental patches 
that increases performance for 1000 clients by several times.
 
In addition, I have fixed threadsafety issues and implemented per-thread cache 
for zeta values. See attached patch.


pgbench-zipf-02v.patch
Description: Binary data
—
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-12 Thread Alik Khilazhev
On 7 Jul 2017, at 21:53, Peter Geoghegan  wrote:Is it possible for you to instrument the number of B-Tree pageaccesses using custom instrumentation for pgbench_accounts_pkey?If that seems like too much work, then it would still be interestingto see what the B-Tree keyspace looks like before and after varyingthe "nclient" count from, say, 32 to 128. Maybe there is a significantdifference in how balanced or skewed it is in each case. Or, the indexcould simply be more bloated.There is a query that I sometimes use, that itself uses pageinspect,to summarize the keyspace quickly. It shows you the highkey for everyinternal page, starting from the root and working down to the lowestinternal page level (the one just before the leaf level -- level 1),in logical/keyspace order. You can use it to visualize thedistribution of values. It could easily include the leaf level, too,but that's less interesting and tends to make the query take ages. Iwonder what the query will show here.Here is the query:…I am attaching results of query that you sent. It shows that there is nothing have changed after executing tests. ...before 128

  idx  | level | l_item | blkno | btpo_prev | btpo_next | 
btpo_flags | type | live_items | dead_items | avg_item_size | page_size | 
free_size | highkey 
-—-+---++---+---+---++--+++---+---+---+-
 pgbench_accounts_pkey | 2 |  1 |   290 | 0 | 0 |   
   2 | r| 10 |  0 |15 |  8192 |  7956 | 
 pgbench_accounts_pkey | 1 |  1 | 3 | 0 |   289 |   
   0 | i|285 |  0 |15 |  8192 |  2456 | 
09 96 01 00 00 00 00 00
 pgbench_accounts_pkey | 1 |  2 |   289 | 3 |   575 |   
   0 | i|285 |  0 |15 |  8192 |  2456 | 
11 2c 03 00 00 00 00 00
 pgbench_accounts_pkey | 1 |  3 |   575 |   289 |   860 |   
   0 | i|285 |  0 |15 |  8192 |  2456 | 
19 c2 04 00 00 00 00 00
 pgbench_accounts_pkey | 1 |  4 |   860 |   575 |  1145 |   
   0 | i|285 |  0 |15 |  8192 |  2456 | 
21 58 06 00 00 00 00 00
 pgbench_accounts_pkey | 1 |  5 |  1145 |   860 |  1430 |   
   0 | i|285 |  0 |15 |  8192 |  2456 | 
29 ee 07 00 00 00 00 00
 pgbench_accounts_pkey | 1 |  6 |  1430 |  1145 |  1715 |   
   0 | i|285 |  0 |15 |  8192 |  2456 | 
31 84 09 00 00 00 00 00
 pgbench_accounts_pkey | 1 |  7 |  1715 |  1430 |  2000 |   
   0 | i|285 |  0 |15 |  8192 |  2456 | 
39 1a 0b 00 00 00 00 00
 pgbench_accounts_pkey | 1 |  8 |  2000 |  1715 |  2285 |   
   0 | i|285 |  0 |15 |  8192 |  2456 | 
41 b0 0c 00 00 00 00 00
 pgbench_accounts_pkey | 1 |  9 |  2285 |  2000 |  2570 |   
   0 | i|285 |  0 |15 |  8192 |  2456 | 
49 46 0e 00 00 00 00 00
 pgbench_accounts_pkey | 1 | 10 |  2570 |  2285 | 0 |   
   0 | i|177 |  0 |15 |  8192 |  4616 | 
(11 rows)

latency average = 1.375 ms
tps = 93085.250384 (including connections establishing)
tps = 93125.724773 (excluding connections establishing)
SQL script 1: /home/nglukhov/ycsb_read_zipf.sql
 - weight: 1 (targets 50.0% of total)
 - 2782999 transactions (49.8% of total, tps = 46364.447705)
 - latency average = 0.131 ms
 - latency stddev = 0.087 ms
SQL script 2: /home/nglukhov/ycsb_update_zipf.sql
 - weight: 1 (targets 50.0% of total)
 - 2780197 transactions (49.8% of total, tps = 46317.766703)
 - latency average = 2.630 ms
 - latency stddev = 14.092 ms

after 128

  idx  | level | l_item | blkno | btpo_prev | btpo_next | 
btpo_flags | type | live_items | dead_items | avg_item_size | page_size | 
free_size | highkey 
—--+---++---+---+---++--+++---+---+---+-
 pgbench_accounts_pkey | 2 |  1 |   290 | 0 | 0 |   
   2 | r| 10 |  0 |15 |  8192 |  7956 | 
 pgbench_accounts_pkey | 1 |  1 | 3 | 0 |   289 |   
   0 | i|353 |  0 |15 |  8192 |  1096 | 
09 96 01 00 00 00 00 00
 pgbench_accounts_pkey | 1 |  2 |   289 | 3 |   575 |   
   0 | i|285 |  0 |15 |  8192 |  2456 | 
11 2c 03 00 00 00 00 

Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-10 Thread Alik Khilazhev
Hello, Fabien!

> Your description is not very precise. What version of Postgres is used? If 
> there is a decline, compared to which version? Is there a link to these 
> results?

Benchmark have been done in master v10. I am attaching image with results:
.

> Indeed, the function computation is over expensive, and the numerical 
> precision of the implementation is doubtful.
> 
> If there is no better way to compute this function, ISTM that it should be 
> summed in reverse order to accumulate small values first, from (1/n)^s + ... 
> + (1/2)^ s. As 1/1 == 1, the corresponding term is 1, no point in calling pow 
> for this one, so it could be:
> 
>   double ans = 0.0;
>   for (i = n; i >= 2; i--)
> ans += pow(1. / i, theta);
>   return 1.0 + ans;

You are right, it’s better to reverse order.

> If the functions when actually used is likely to be called with different 
> parameters, then some caching beyond the last value would seem in order. 
> Maybe a small fixed size array?
> 
> However, it should be somehow thread safe, which does not seem to be the case 
> with the current implementation. Maybe a per-thread cache? Or use a lock only 
> to update a shared cache? At least it should avoid locking to read values…

Yea, I forget about thread-safety. I will implement per-thread cache with small 
fixed array.

> Given the explanations, the random draw mostly hits values at the beginning 
> of the interval, so when the number of client goes higher one just get 
> locking contention on the updated row?

Yes, exactly. 

> ISTM that also having the tps achieved with a flat distribution would allow 
> to check this hypothesis.

On Workload A with uniform distribution PostgreSQL shows better results than 
MongoDB and MySQL(see attachment). Also you can notice that for small number of 
clients  type of distribution does not affect on tps on MySQL. 



And it’s important to mention that postgres run with option 
synchronous_commit=off, to satisfy  durability MongoDB 
writeConcern=1=false. In this mode there is possibility to lose all 
changes in the last second. If we run postgres with max durability MongoDB will 
lag far behind. 
---
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com <http://www.postgrespro.com/>
The Russian Postgres Company



[HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-07 Thread Alik Khilazhev
Hello!

PostgreSQL shows very bad results in YCSB Workload A (50% SELECT and 50% UPDATE 
of random row by PK) on benchmarking with big number of clients using Zipfian 
distribution. MySQL also has decline but it is not significant as it is in 
PostgreSQL. MongoDB does not have decline at all. And if pgbench would have 
Zipfian distribution random number generator, everyone will be able to make 
research on this topic without using YCSB. 
 
This is the reason why I am currently working on random_zipfian function.

The bottleneck of algorithm that I use is that it calculates zeta function (it 
has linear complexity - https://en.wikipedia.org/wiki/Riemann_zeta_function). 
It my cause problems on generating huge amount of big numbers. 

That’s why I added caching for zeta value. And it works good for cases when 
random_zipfian called with same parameters in script. For example:

… 
\set a random_zipfian(1, 100, 1.2)
\set b random_zipfian(1, 100, 1.2)
…

In other case, second call will override cache of first and caching does not 
make any sense:
…
\set a random_zipfian(1, 100, 1.2)
\set b random_zipfian(1, 200, 1.4)
… 

That’s why I have a question: should I implement support of caching zeta values 
for calls with different parameters, or not? 

P.S. I attaching patch and script - analogue of YCSB Workload A.
Run benchmark with command:
$ pgbench -f  ycsb_read_zipf.sql -f  ycsb_update_zipf.sql

On scale = 10(1 million rows) it gives following results on machine with 144 
cores(with synchronous_commit=off):
nclientstps
1   8842.401870
2   18358.140869
4   45999.378785
8   88713.743199
16  170166.998212
32  290069.221493
64  178128.030553
128 88712.825602
256 38364.937573
512 13512.765878
10006188.136736


ycsb_read_zipf.sql
Description: application/sql


pgbench-zipf-01v.patch
Description: Binary data


ycsb_update_zipf.sql
Description: application/sql

—
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers