Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-06-08 Thread Mengxing Liu
Thank you very much! I followed your advice, and here is the result:

SerializableXactHashLock 73
predicate_lock_manager 605
WALWriteLock 3
SerializableFinishedListLock 665

More than 90 wait events were captured in each sample.
SerializableXactHashLock and SerializableFinishedListLock are both used in SSI.
I think that's why PG is so slow in a high-contention environment.
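For reference, a tally like the one above can be produced from the script(1)
capture with something like this rough sketch ("watch.log" is a hypothetical
name for the capture file):

grep '|' watch.log | grep -v wait_event_type \
  | awk -F'|' '{gsub(/ /, "", $2); print $2}' \
  | sort | uniq -c | sort -rn | head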


> -Original Messages-
> From: "Robert Haas" <robertmh...@gmail.com>
> Sent Time: 2017-06-08 01:30:58 (Thursday)
> To: "Mengxing Liu" <liu-m...@mails.tsinghua.edu.cn>
> Cc: kgrittn <kgri...@gmail.com>, "Alvaro Herrera" <alvhe...@2ndquadrant.com>, 
> "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
> Subject: Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from 
> rw-conflict tracking in serializable transactions
> 
> On Tue, Jun 6, 2017 at 12:16 PM, Mengxing Liu
> <liu-m...@mails.tsinghua.edu.cn> wrote:
> > I think disk I/O is not the bottleneck in our experiment, but the global 
> > lock is.
> 
> A handy way to figure this kind of thing out is to run a query like
> this repeatedly during the benchmark:
> 
> SELECT wait_event_type, wait_event FROM pg_stat_activity;
> 
> I often do this by using psql's \watch command, often \watch 0.5 to
> run it every half-second.  I save all the results collected during the
> benchmark using 'script' and then analyze them to see which wait
> events are most frequent.  If your theory is right, you ought to see
> that SerializableXactHashLock occurs as a wait event very frequently.
> 
> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
> 
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


--
Mengxing Liu

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-06-07 Thread Mengxing Liu


> From: "Kevin Grittner" 
>  wrote:
> 
> > "vmstat 1" output is as follow. Because I used only 30 cores (1/4 of all),  
> > cpu user time should be about 12*4 = 48.
> > There seems to be no process blocked by IO.
> >
> > procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
> > 28  0  0 981177024 315036 7084376000 0 900  1  0 99  0  0
> > 21  1  0 981178176 315036 7084378400 0 0 25482 329020 12  3 85  0  0
> > 18  1  0 981179200 315036 7084379200 0 0 26569 323596 12  3 85  0  0
> > 17  0  0 981175424 315036 7084380800 0 0 25374 322992 12  4 85  0  0
> > 12  0  0 981174208 315036 7084382400 0 0 24775 321577 12  3 85  0  0
> >  8  0  0 981179328 315036 7084533600 0 0 13115 199020  6  2 92  0  0
> > 13  0  0 981179200 315036 7084579200 0 0 22893 301373 11  3 87  0  0
> > 11  0  0 981179712 315036 7084580800 0 0 26933 325728 12  4 84  0  0
> > 30  0  0 981178304 315036 7084582400 0 0 23691 315821 11  4 85  0  0
> > 12  1  0 981177600 315036 7084583200 0 0 29485 320166 12  4 84  0  0
> > 32  0  0 981180032 315036 7084584800 0 0 25946 316724 12  4 84  0  0
> > 21  0  0 981176384 315036 7084586400 0 0 24227 321938 12  4 84  0  0
> > 21  0  0 981178880 315036 7084588000 0 0 25174 326943 13  4 83  0  0
> 
> This machine has 120 cores?  Is hyperthreading enabled?  If so, what
> you are showing might represent a total saturation of the 30 cores.
> Context switches of about 300,000 per second is pretty high.  I can't
> think of when I've seen that except when there is high spinlock
> contention.
> 

Yes, and hyper-threading is disabled.

> Just to put the above in context, how did you limit the test to 30
> cores?  How many connections were open during the test?
> 

I used numactl to restrict the test to the first two sockets (15 cores in
each socket), and there were 90 concurrent connections.
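Roughly, the binding looks like this -- a sketch, where the exact node
numbers depend on the topology reported by "numactl --hardware":

numactl --cpunodebind=0,1 --membind=0,1 pg_ctl -D $PGDATA start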

> > The flame graph is attached. I used 'perf' to generate it. Only the CPUs
> > running the PG server were profiled.
> > I'm not familiar with other parts of PG. Can you find anything unusual in
> > the graph?
> 
> Two SSI functions stand out:
> 10.86% PredicateLockTuple
>  3.51% CheckForSerializableConflictIn
> 
> In both cases, most of that seems to go to lightweight locking.  Since
> you said this is a CPU graph, that again suggests spinlock contention
> issues.
> 
> -- 

Yes. Are there other kinds of locks besides spinlocks? I'm reading up on
locks in PG now. If all the locks were spinlocks, CPU usage should be 100%,
but only 50% of the CPU is being used.
I'm afraid extra time is being spent waiting on mutexes or semaphores.
These SSI functions will cost more time than reported, because perf doesn't
record the time spent sleeping while waiting for locks.
CheckForSerializableConflictIn takes 10% of the running time (see my last
email).

--
Mengxing Liu

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-06-07 Thread Kevin Grittner
On Sun, Jun 4, 2017 at 11:27 AM, Mengxing Liu
 wrote:

> "vmstat 1" output is as follow. Because I used only 30 cores (1/4 of all),  
> cpu user time should be about 12*4 = 48.
> There seems to be no process blocked by IO.
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
> 28  0  0 981177024 315036 7084376000 0 900  1  0 99  0  0
> 21  1  0 981178176 315036 7084378400 0 0 25482 329020 12  3 85  0  0
> 18  1  0 981179200 315036 7084379200 0 0 26569 323596 12  3 85  0  0
> 17  0  0 981175424 315036 7084380800 0 0 25374 322992 12  4 85  0  0
> 12  0  0 981174208 315036 7084382400 0 0 24775 321577 12  3 85  0  0
>  8  0  0 981179328 315036 7084533600 0 0 13115 199020  6  2 92  0  0
> 13  0  0 981179200 315036 7084579200 0 0 22893 301373 11  3 87  0  0
> 11  0  0 981179712 315036 7084580800 0 0 26933 325728 12  4 84  0  0
> 30  0  0 981178304 315036 7084582400 0 0 23691 315821 11  4 85  0  0
> 12  1  0 981177600 315036 7084583200 0 0 29485 320166 12  4 84  0  0
> 32  0  0 981180032 315036 7084584800 0 0 25946 316724 12  4 84  0  0
> 21  0  0 981176384 315036 7084586400 0 0 24227 321938 12  4 84  0  0
> 21  0  0 981178880 315036 7084588000 0 0 25174 326943 13  4 83  0  0

This machine has 120 cores?  Is hyperthreading enabled?  If so, what
you are showing might represent a total saturation of the 30 cores.
Context switches of about 300,000 per second is pretty high.  I can't
think of when I've seen that except when there is high spinlock
contention.

Just to put the above in context, how did you limit the test to 30
cores?  How many connections were open during the test?

> The flame graph is attached. I used 'perf' to generate it. Only the CPUs
> running the PG server were profiled.
> I'm not familiar with other parts of PG. Can you find anything unusual in
> the graph?

Two SSI functions stand out:
10.86% PredicateLockTuple
 3.51% CheckForSerializableConflictIn

In both cases, most of that seems to go to lightweight locking.  Since
you said this is a CPU graph, that again suggests spinlock contention
issues.

-- 
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-06-07 Thread Robert Haas
On Tue, Jun 6, 2017 at 12:16 PM, Mengxing Liu
 wrote:
> I think disk I/O is not the bottleneck in our experiment, but the global lock 
> is.

A handy way to figure this kind of thing out is to run a query like
this repeatedly during the benchmark:

SELECT wait_event_type, wait_event FROM pg_stat_activity;

I often do this by using psql's \watch command, often \watch 0.5 to
run it every half-second.  I save all the results collected during the
benchmark using 'script' and then analyze them to see which wait
events are most frequent.  If your theory is right, you ought to see
that SerializableXactHashLock occurs as a wait event very frequently.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-06-06 Thread Mengxing Liu
Hi, Kevin and Alvaro. 

I think disk I/O is not the bottleneck in our experiment, but the global lock 
is. 

For disk I/O, there are two pieces of evidence:
1) The total throughput is no more than 10 Ktps, and only half of the
transactions are updates. An update transaction modifies 20 tuples, each
about 100 B in size, so the data written to disk should be less than
5,000 tx/s * 20 tuples * 100 B = 10 MB per second. Even taking the
write-ahead log into consideration (just double it), writes should stay
below 20 MB/s.
I replaced the ramdisk with an SSD, and "iostat" shows the same result,
while our SSD's sequential write speed is more than 700 MB/s.
2) I changed the isolation level from "serializable" to "read committed".
As the isolation requirement became looser, throughput increased, and CPU
utilization rose to nearly 100% (it's about 50% in serializable mode).
Therefore, disk I/O is not the bottleneck.

For the lock:
I read the source code in predicate.c and found that many functions use a
global lock, SerializableXactHashLock. So only one process can be working
in these code paths at any time!
As the problem of the CPU not being fully used appeared only after I changed
the isolation level to "serializable", this global lock should be the
bottleneck.
Unfortunately, "perf" seems unable to record time spent waiting for locks,
so I measured it by hand: I called "gettimeofday" just before acquiring the
locks and just after releasing them.
This way I found that CheckForSerializableConflictIn/Out take more than 10%
of the running time, which is far more than what perf reported in my last
email.

If my analysis is right, that sounds like good news, as we have found where
the problem is.
Kevin has mentioned that the lock is used to protect the linked list. So I
want to replace the linked list with a hash table in the next step. After
that I will try to remove this lock carefully.
But in that case our purpose has changed: O(N^2) tracking is not the
bottleneck, the global lock is.

By the way, using "gettimeofday" to profile is really ugly.
"perf lock" can only record kernel mutexes and requires kernel support, so I
didn't use it.
Do you have any good ideas for profiling time spent waiting on locks?


> -Original Messages-
> From: "Mengxing Liu" <liu-m...@mails.tsinghua.edu.cn>
> Sent Time: 2017-06-05 00:27:51 (Monday)
> To: "Kevin Grittner" <kgri...@gmail.com>
> Cc: "Alvaro Herrera" <alvhe...@2ndquadrant.com>, 
> "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
> Subject: Re: Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from 
> rw-conflict tracking in serializable transactions
> 
> 
> 
> 
> > -Original Messages-
> > From: "Kevin Grittner" <kgri...@gmail.com>
> 
> > > I tried 30 cores. But the CPU utilization is only about 45%~70%.
> > > How can we determine where the problem is? Is it disk I/O or locking?
> > 
> > A simple way is to run `vmstat 1` for a bit during the test.  Can
> > you post a portion of the output of that here?  If you can configure
> > the WAL directory to a separate mount point (e.g., use the --waldir
> > option of initdb), a snippet of `iostat 1` output would be even
> > better.
> 
> "vmstat 1" output is as follow. Because I used only 30 cores (1/4 of all),  
> cpu user time should be about 12*4 = 48. 
> There seems to be no process blocked by IO. 
> 
> procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
> 28  0  0 981177024 315036 7084376000 0 900  1  0 99  0  0
> 21  1  0 981178176 315036 7084378400 0 0 25482 329020 12  3 85  0  0
> 18  1  0 981179200 315036 7084379200 0 0 26569 323596 12  3 85  0  0
> 17  0  0 981175424 315036 7084380800 0 0 25374 322992 12  4 85  0  0
> 12  0  0 981174208 315036 7084382400 0 0 24775 321577 12  3 85  0  0
>  8  0  0 981179328 315036 7084533600 0 0 13115 199020  6  2 92  0  0
> 13  0  0 981179200 315036 7084579200 0 0 22893 301373 11  3 87  0  0
> 11  0  0 981179712 315036 7084580800 0 0 26933 325728 12  4 84  0  0
> 30  0  0 981178304 315036 7084582400 0 0 23691 315821 11  4 85  0  0
> 12  1  0 981177600 315036 7084583200 0 0 29485 320166 12  4 84  0  0
> 32  0  0 981180032 315036 7084584800 0 0 25946 316724 12  4 84  0  0
> 21  0  0 981176384 315036 7084586400 0 0 24227 321938 12  4 84  0  0
>

Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-06-03 Thread Kevin Grittner
On Sat, Jun 3, 2017 at 1:51 AM, Mengxing Liu
 wrote:

> I tried 30 cores. But the CPU utilization is only about 45%~70%.
> How can we determine where the problem is? Is it disk I/O or locking?

A simple way is to run `vmstat 1` for a bit during the test.  Can
you post a portion of the output of that here?  If you can configure
the WAL directory to a separate mount point (e.g., use the --waldir
option of initdb), a snippet of `iostat 1` output would be even
better.

I think the best thing may be if you can generate a CPU flame graph
of the worst case you can make for these lists:
http://www.brendangregg.com/flamegraphs.html  IMO, such a graph
highlights the nature of the problem better than anything else.
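For concreteness, the usual recipe with perf and the FlameGraph scripts from
that page is roughly the following (a sketch; the process selection and the
60-second sampling window are placeholders to adjust):

perf record -F 99 -g -p "$(pgrep -d, -x postgres)" -- sleep 60
perf script > out.perf
./stackcollapse-perf.pl out.perf > out.folded
./flamegraph.pl out.folded > ssi.svg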

> Does "traversing these list" means the function "RWConflictExists"?
> And "10% waiting on the corresponding kernel mutexes" means the
> lock in the function CheckForSerializableIn/out ?

Since they seemed to be talking specifically about the conflict
list, I had read that as SerializableXactHashLock, although the
wording is a bit vague -- they may mean something more inclusive.

> With 30 cores for the server and 90 clients, RWConflictExists
> consumes 1.0% of CPU cycles, and CheckForSerializableConflictIn/Out
> take 5% in total.
> But the total CPU utilization of PG is only about 50%, so the
> results seem to match.
> If we can solve this problem, we can use this benchmark for future
> work.

If you can get a flame graph of CPU usage on this load, that would
be ideal.  At that point, we can discuss what is best to attack.
Reducing something that is 10% of the total PostgreSQL CPU load in a
particular workload sounds like it could still have significant
value, although if you see a way to attack the other 90% (or some
portion of it larger than 10%) instead, I think we could adjust the
scope based on those results.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-06-03 Thread Mengxing Liu


> -Original Messages-
> From: "Kevin Grittner" <kgri...@gmail.com>
> Sent Time: 2017-06-03 01:44:16 (Saturday)
> To: "Alvaro Herrera" <alvhe...@2ndquadrant.com>
> Cc: "Mengxing Liu" <liu-m...@mails.tsinghua.edu.cn>, 
> "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
> Subject: Re: Re: Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from 
> rw-conflict tracking in serializable transactions
> 
> > Mengxing Liu wrote:
> 
> >> The CPU utilization of CheckForSerializableConflictOut/In is
> >> 0.71%/0.69%.
> 
> How many cores were on the system used for this test?  The paper
> specifically said that they didn't see performance degradation on
> the PostgreSQL implementation until 32 concurrent connections with
> 24 or more cores.  The goal here is to fix *scaling* problems.  Be
> sure you are testing at an appropriate scale.  (The more sockets,
> cores, and clients, the better, I think.)
> 
> 

I used 15 cores for the server and 50 clients.
I also tried 30 cores, but the CPU utilization was only about 45%~70%.
How can we determine where the problem is? Is it disk I/O or locking?

> On Fri, Jun 2, 2017 at 10:08 AM, Alvaro Herrera
> <alvhe...@2ndquadrant.com> wrote:
> 
> > Kevin mentioned during PGCon that there's a paper by some group in
> > Sydney that developed a benchmark on which this scalability
> > problem showed up very prominently.
> 
> https://pdfs.semanticscholar.org/6c4a/e427e6f392b7dec782b7a60516f0283af1f5.pdf
> 
> "[...] we see a much better scalability of pgSSI than the
> corresponding implementations on InnoDB. Still, the overhead of
> pgSSI versus standard SI increases significantly with higher MPL
> than one would normally expect, reaching around 50% with MPL 128."
> 

Actually, I implemented the benchmark described in the paper at first, and I
reported the result in a previous email.
The problem is that a transaction with a longer conflict list is more likely
to be aborted, so the conflict lists cannot grow very long.
I modified the benchmark to use only update-only and read-only transactions
to get around this problem. That way, a dangerous structure is never
generated.

> "Our profiling showed that PostgreSQL spend 2.3% of the overall
> runtime in traversing these list, plus 10% of its runtime waiting on
> the corresponding kernel mutexes."
> 

Does "traversing these list" means the function "RWConflictExists"? 
And "10% waiting on the corresponding kernel mutexes" means the lock in the 
function CheckForSerializableIn/out ?

> If you cannot duplicate their results, you might want to contact the
> authors for more details on their testing methodology.
> 

With 30 cores for the server and 90 clients, RWConflictExists consumes 1.0%
of CPU cycles, and CheckForSerializableConflictIn/Out take 5% in total.
But the total CPU utilization of PG is only about 50%, so the results seem
to match.
If we can solve this problem, we can use this benchmark for future work.
 

Sincerely

--
Mengxing Liu

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-06-02 Thread Kevin Grittner
> Mengxing Liu wrote:

>> The CPU utilization of CheckForSerializableConflictOut/In is
>> 0.71%/0.69%.

How many cores were on the system used for this test?  The paper
specifically said that they didn't see performance degradation on
the PostgreSQL implementation until 32 concurrent connections with
24 or more cores.  The goal here is to fix *scaling* problems.  Be
sure you are testing at an appropriate scale.  (The more sockets,
cores, and clients, the better, I think.)


On Fri, Jun 2, 2017 at 10:08 AM, Alvaro Herrera
 wrote:

> Kevin mentioned during PGCon that there's a paper by some group in
> Sydney that developed a benchmark on which this scalability
> problem showed up very prominently.

https://pdfs.semanticscholar.org/6c4a/e427e6f392b7dec782b7a60516f0283af1f5.pdf

"[...] we see a much better scalability of pgSSI than the
corresponding implementations on InnoDB. Still, the overhead of
pgSSI versus standard SI increases significantly with higher MPL
than one would normally expect, reaching around 50% with MPL 128."

"Our profiling showed that PostgreSQL spend 2.3% of the overall
runtime in traversing these list, plus 10% of its runtime waiting on
the corresponding kernel mutexes."

If you cannot duplicate their results, you might want to contact the
authors for more details on their testing methodology.

Note that the locking around access to the list is likely to be a
bigger problem than the actual walking and manipulation of the list
itself.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-06-02 Thread Alvaro Herrera
Mengxing Liu wrote:
> Hi,  Alvaro and Kevin.
> 
> > Anyway, this is just my analysis. 
> > So I want to hack the PG and count the conflict lists' size of 
> > transactions. That would be more accurate.
> 
> Over the last week, I hacked PG to add an additional thread that counts
> RWConflict list lengths, and tuned the benchmark to generate more
> conflicts. But the result is still not good.

Kevin mentioned during PGCon that there's a paper by some group in
Sydney that developed a benchmark on which this scalability problem
showed up very prominently.  I think your first step should be to
reproduce their results -- my recollection is that Kevin says you
already know that paper, so please dedicate some time to analyze it and
reproduce their workload.

-- 
Álvaro Herrera        https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-06-02 Thread Mengxing Liu
Hi,  Alvaro and Kevin.

> Anyway, this is just my analysis. 
> So I want to hack the PG and count the conflict lists' size of transactions. 
> That would be more accurate.

Over the last week, I hacked PG to add an additional thread that counts
RWConflict list lengths, and I tuned the benchmark to generate more
conflicts. But the result is still not good.

> 
> > 
> > Yeah, you need a workload that generates a longer conflict list -- if
> > you can make the tool generate a conflict list with a configurable
> > length, that's even better (say, 100 conflicts vs. 1000 conflicts).
> > Then we can see how the conflict list processing scales.
> > 
> 
> Yes, I tried to increase the read set to create more conflicts.
> However, the abort ratio also increases, and the CPU cycles consumed by
> conflict tracking are still less than 1%.
> By PG's design, a transaction will be aborted if there is an
> rw-antidependency.
> Consequently, a transaction with a longer conflict list is more likely to
> be aborted.
> That means a conflict list doesn't have much chance to grow very long.
> 

To solve this problem, I use just two kinds of transactions: read-only
transactions and update-only transactions.
In this case, no transaction can have both an in-RWConflict and an
out-RWConflict at the same time.
Thus transactions are not aborted by the conflict checking.

Specifically, the benchmark is like this:
The table has 10K rows.
Read-only transactions read 1K rows, and update-only transactions update 20
random rows of the table.

In this benchmark, about 91% of conflict lists are shorter than 10, the
lengths of 6% of them are between 10 and 20, and only 2% are longer than 20.
The CPU utilization of CheckForSerializableConflictOut/In is 0.71%/0.69%.
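A rough sketch of this workload as custom pgbench scripts could look like
the following (table, column, and file names are made up, and
update_only.sql would repeat the UPDATE pattern for 20 random ids):

cat > read_only.sql <<'EOF'
\set lo random(1, 9000)
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM bench_table WHERE id BETWEEN :lo AND :lo + 999;
COMMIT;
EOF

cat > update_only.sql <<'EOF'
\set id1 random(1, 10000)
\set id2 random(1, 10000)
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE bench_table SET val = val + 1 WHERE id = :id1;
UPDATE bench_table SET val = val + 1 WHERE id = :id2;
COMMIT;
EOF

pgbench -n -c 90 -j 30 -T 300 -f read_only.sql -f update_only.sql postgres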

I tried to increase the write set. As a result, the conflict lists become
longer, but the total CPU utilization decreases (to about 50%).
CPU is not the bottleneck anymore. I'm not familiar with the other parts of
PG. Is it caused by locking? Is there any chance to get rid of this problem?

By the way, I found that email is not very convenient, especially when I
have a problem and want to discuss it with you.
Would you mind scheduling a meeting every week on Skype, or another instant
messaging tool you like?
It would not take too much of your time. Maybe one hour is enough.


Sincerely.
--
Mengxing Liu

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-05-09 Thread Alvaro Herrera
Hi Mengxing,

Mengxing Liu wrote:
> Hi, Alvaro and Kevin. I'm  Mengxing.  
> 
> This is a “synchronization” email to  tell you what I've done and my next 
> plan. I'm looking forward to your advice. 

Welcome!

> According to my proposal, I want to prepare the experimental environment 
> during the community bonding period. 
> 
> As this is the first time I discuss with Alvaro, here I will introduce the 
> environment again. 
> 
> My lab have a Lenovo System x3950 X6 machine. 
> 
> https://www.lenovo.com/images/products/system-x/pdfs/datasheets/x3950_x6_ds.pdf
> 
> More specifically, there are 8 sockets, each has 15 Intel(R) Xeon(R) CPU 
> E7-8870 v2 @ 2.30GHz. 
> 
> Thus we have 120 cores in total. The storage is a 1 TB SSD, with SAS 
> interface, 500MB write bandwidth. 

OK, having a single disk and 120 CPU cores sounds unbalanced.

> Because we have too many cores, the SSD becomes the bottleneck. In my test,
> pgbench can scale to 36 connections (18 Ktps throughput). CPU utilization
> is below 30%.
>
> Therefore:
>
> 1. Is there anything wrong with my tuning parameters? For example, should I
> disable "fsync"? Because we don't care about recovery here.

While it's true that we don't care about recovery, I'm not sure that
benchmark results would still be valid with fsync=off.  I would try
synchronous_commit=off instead, and perhaps also enlarge wal_buffers.
What do you have shared_buffers set to?
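For illustration, something along these lines in postgresql.conf (the values
are placeholders to tune for the machine, not recommendations):

synchronous_commit = off   # commit without waiting for WAL flush
wal_buffers = 64MB
shared_buffers = 32GB      # some fraction of the 1 TB of RAM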

> 2. Can we use just two sockets (30 physical cores) to run the database
> server? Then the CPU can be the bottleneck, so that it shows the problem we
> are trying to solve.

Sure -- it's not a Postgres option, but an operating system option:
you'd set the "maxcpus" parameter in GRUB when booting Linux.
Alternatively, you could use "numactl" to bind the postgres server to a
subset of CPUs.  (And you could put pgbench on a different CPU set).

> >  What method of communication will be used among the mentors and with 
> > Mengxing.
> 
> What method do you prefer?

Mailing list is fine.

> >  Frequency of status updates (must be at least once a week and more often 
> > is encouraged).
> 
> How about reporting my status once a week?

Once a week sounds good to me.

> >  What steps will be taken next during the community bonding period.
> 
> As I wrote in the proposal, I want to establish the environment and read the 
> related source code. Do you have any suggestions for me?


-- 
Álvaro Herrera        https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-03-15 Thread Kevin Grittner
On Wed, Mar 15, 2017 at 11:35 AM, Mengxing Liu
 wrote:

>> On a NUMA machine It is not at all unusual to see bifurcated results
>> -- with each run coming in very close to one number or a second
>> number, often at about a 50/50 rate, with no numbers falling
>> anywhere else.  This seems to be based on where the processes and
>> memory allocations happen to land.
>>
>
> Do you mean that for a NUMA machine, there usually exists two
> different results of its performance?
> Just two? Neither three nor four?

In my personal experience, I have often seen two timings that each
run randomly matched; I have not seen nor heard of more, but that
doesn't mean it can't happen.  ;-)

> At first, I will compile and install PostgreSQL by myself and try
> the profile tools (perf or oprofile).

perf is newer, and generally better if you can use it.  Don't try to
use either on HP hardware -- the BIOS uses some of the same hardware
registers that other manufacturers leave for use of profilers; an HP
machine is likely to freeze or reboot if you try to run either of
those profilers under load.

> Then I will run one or two benchmarks using different config,
> where I may need your help to ensure that my tests are close to the
> practical situation.

Yeah, we should talk about OS and PostgreSQL configuration before
you run any benchmarks.  Neither tends to come configured as I would
run a production system.

> PS: Disable NUMA in BIOS means that CPU can use its own memory
> controller when accessing local memory to reduce hops.

NUMA means that each CPU chip directly controls some of the RAM
(possibly with other, non-CPU controllers for some RAM).  The
question is whether the BIOS or the OS controls the memory
allocation.  The OS looks at what processes are on what cores and
tries to use "nearby" memory for allocations.  This can be pessimal
if the amount of RAM that is under contention is less than the size
of one memory segment, since all CPU chips need to ask the one
managing that RAM for each access.  In such a case, you actually get
best performance using a cpuset which just uses one CPU package and
the memory segments directly managed by that CPU package.  Without
the cpuset you may actually see better performance for this workload
by letting the BIOS interleave allocations, which spreads the RAM
allocations around to memory managed by all CPUs, and no one CPU
becomes the bottleneck.  The access is still not uniform, but you
dodge a bottleneck -- albeit less efficiently than using a custom
cpuset.

> On the contrary, "enable" means UMA.

No.  Think about this: regardless of this BIOS setting each RAM
package is directly connected to one CPU package, which functions as
its memory controller.  Most of the RAM used by PostgreSQL is for
disk buffers -- shared buffers in shared memory and OS cache.
Picture processes running on different CPU packages accessing a
single particular shared buffer.  Also picture processes running on
different CPU packages using the same spinlock at the same time.  No
BIOS setting can avoid the communications and data transfer among
the 8 CPU packages, and the locking of the cache lines.

> Therefore, I think Tony is right, we should disable this setting.
>
> I got the information from here.
> http://frankdenneman.nl/2010/12/28/node-interleaving-enable-or-disable/

Ah, that explains it.  There is no such thing as "disabling NUMA" --
you can have the BIOS force interleaving, or you can have the BIOS
leave the NUMA memory assignment to the OS.  I was assuming that by
"disabling NUMA" you meant to have BIOS control it through
interleaving.  You meant the opposite -- disabling the BIOS override
of OS NUMA control.  I agree that we should leave NUMA scheduling to
the OS. There are, however, some non-standard OS configuration
options that allow NUMA to behave better with PostgreSQL than the
defaults allow.  We will need to tune a little.

The author of that article seems to be assuming that the usage will
be with applications like word processing, spreadsheets, or browsers
-- where the OS can place all the related processes on a single CPU
package and all (or nearly all) memory allocations can be made from
associated memory -- yielding a fairly uniform and fast access when
the BIOS override is disabled.  On a database product which wants to
use all the cores and almost all of the memory, with heavy
contention on shared memory, the situation is very different.
Shared resource access time is going to be non-uniform no matter
what you do.  The difference is that the OS can still make *process
local* allocations from nearby memory segments, while BIOS cannot.

--
Kevin Grittner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-03-15 Thread Mengxing Liu



> -Original Messages-
> From: "Kevin Grittner" <kgri...@gmail.com>
> Sent Time: 2017-03-15 23:20:07 (Wednesday)
> To: DEV_OPS <dev...@ww-it.cn>
> Cc: "Mengxing Liu" <liu-m...@mails.tsinghua.edu.cn>, 
> "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
> Subject: Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from 
> rw-conflict tracking in serializable transactions
> 
> On Tue, Mar 14, 2017 at 3:45 PM, Kevin Grittner <kgri...@gmail.com> wrote:
> >> On 3/14/17 17:34, Mengxing Liu wrote:
> >>> There are several alternative benchmarks. Tony suggested that we
> >>> should use TPC-E and TPC-DS.
> >
> > More benchmarks is better, all other things being equal.  Keep in
> > mind that good benchmarking practice with PostgreSQL generally
> > requires a lot of setup time (so that we're starting from the exact
> > same conditions for every run), a lot of run time (so that the
> > effects of vacuuming, bloat, and page splitting all comes into play,
> > like it would in the real world), and a lot of repetitions of each
> > run (to account for variation).  In particular, on a NUMA machine it
> > is not at all unusual to see bifurcated
> 
> Sorry I didn't finish that sentence.
> 
> On a NUMA machine It is not at all unusual to see bifurcated results
> -- with each run coming in very close to one number or a second
> number, often at about a 50/50 rate, with no numbers falling
> anywhere else.  This seems to be based on where the processes and
> memory allocations happen to land.
> 

Do you mean that for a NUMA machine there usually exist two different
performance results?
Just two? Neither three nor four?

Anyway, I think I should first get familiar with PostgreSQL and the tools
you recommended to me.
Then I will have a better understanding of it, which should make our
discussion more efficient.

Recently, I have been busy preparing for a presentation at ASPLOS'17 and
other lab work.
But I promise I can finish all the preparation work before May. Here is my
working plan:
First, I will compile and install PostgreSQL by myself and try the profiling
tools (perf or oprofile).
Then I will run one or two benchmarks using different configs, where I may
need your help to ensure that my tests are close to the practical situation.
 
PS: Disabling NUMA in the BIOS means that each CPU can use its own memory
controller when accessing local memory, to reduce hops.
On the contrary, "enable" means UMA. Therefore, I think Tony is right: we
should disable this setting.
I got the information from here. 
http://frankdenneman.nl/2010/12/28/node-interleaving-enable-or-disable/

Regards.

--
Mengxing Liu

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-03-15 Thread Kevin Grittner
On Tue, Mar 14, 2017 at 3:45 PM, Kevin Grittner  wrote:
>> On 3/14/17 17:34, Mengxing Liu wrote:
>>> There are several alternative benchmarks. Tony suggested that we
>>> should use TPC-E and TPC-DS.
>
> More benchmarks is better, all other things being equal.  Keep in
> mind that good benchmarking practice with PostgreSQL generally
> requires a lot of setup time (so that we're starting from the exact
> same conditions for every run), a lot of run time (so that the
> effects of vacuuming, bloat, and page splitting all comes into play,
> like it would in the real world), and a lot of repetitions of each
> run (to account for variation).  In particular, on a NUMA machine it
> is not at all unusual to see bifurcated

Sorry I didn't finish that sentence.

On a NUMA machine It is not at all unusual to see bifurcated results
-- with each run coming in very close to one number or a second
number, often at about a 50/50 rate, with no numbers falling
anywhere else.  This seems to be based on where the processes and
memory allocations happen to land.

Different people have dealt with this in different ways.  If you can
get five or more runs of a given test, perhaps the best is to throw
away the high and low values and average the rest.  Other approaches
I've seen are to average all, take the median, or take the best
result.  The worst result of a set of runs is rarely interesting, as
it may be due to some periodic maintenance kicking in (perhaps at
the OS level and not related to database activity at all).

--
Kevin Grittner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-03-14 Thread Kevin Grittner
On Tue, Mar 14, 2017 at 6:00 AM, DEV_OPS  wrote:
> On 3/14/17 17:34, Mengxing Liu wrote:

>>>>> The worst problems have been
>>>>> seen with 32 or more cores on 4 or more sockets with a large number
>>>>> of active connections.  I don't know whether you have access to a
>>>>> machine capable of putting this kind of stress on it (perhaps at
>>>>> your university?), but if not, the community has access to various
>>>>> resources we should be able to schedule time on.
>>>> There is a NUMA machine ( 120 cores, 8 sockets) in my lab.
>>> Fantastic!  Can you say a bit more about the architecture and OS?
>>
>> Intel(R) Xeon(R) CPU at 2.3GHz, with 1TB physical DRAM and 1.5 TB
>> SSD, running Ubuntu 14.04, Kernel 3.19.
>> I guess NUMA is disabled in BIOS, but I will check that.

I'm not sure what it would mean to "disable" NUMA -- if the CPU
chips are each functioning as memory controller for a subset of the
RAM you will have non-uniform memory access speeds from any core to
different RAM locations.  You can switch it to "interleaved" access,
so the speed of access from a core to any logical memory address
will be rather random, rather than letting the OS try to arrange
things so that processes do most of their access to memory that is
faster for them.  It is generally better to tune NUMA in the OS than
to randomize things at the BIOS level.

>> However, there is only one NIC, so network could be the
>> bottleneck if we use too many cores?

Well, if we run the pgbench client on the database server box, the
NIC won't matter at all.  If we move the client side to another box,
I still think that when we hit this problem, it will dwarf any
impact of the NIC throughput.

> The configuration is really cool, for the SSD, is it SATA interface?
> NVMe interface, or is PCIe Flash? different SSD interface had different
> performance characters.

Yeah, knowing model of SSD, as well as which particular Xeon we're
using, would be good.

> for the NUMA, because the affinity issue will impact PostgreSQL
> performance. so, Please disabled it if possible

I disagree.  (see above)

>> There are several alternative benchmarks. Tony suggested that we
>> should use TPC-E and TPC-DS.

More benchmarks is better, all other things being equal.  Keep in
mind that good benchmarking practice with PostgreSQL generally
requires a lot of setup time (so that we're starting from the exact
same conditions for every run), a lot of run time (so that the
effects of vacuuming, bloat, and page splitting all comes into play,
like it would in the real world), and a lot of repetitions of each
run (to account for variation).  In particular, on a NUMA machine it
is not at all unusual to see bifurcated

>> Personally, I am more familiar with TPC-C.

Unfortunately, the TPC-C benchmark does not create any cycles in the
transaction dependencies, meaning that it is not a great tool for
benchmarking serializable transactions.  I know there are variations
on TPC-C floating around that add a transaction type to do so, but
on a quick search right now, I was unable to find one.

>> And pgbench seems to have only a TPC-B-like benchmark built in.

You can feed it your own custom queries, instead.

>> Well, I think we can easily find the implementations of these
>> benchmarks for PostgreSQL.

Reportedly, some of the implementations of TPC-C are not very good.
There is DBT-2, but I've known a couple of people to look at that
and find that it needed work before they could use it.  Based on the
PostgreSQL versions mentioned on the Wiki page, it has been
neglected for a while:

https://wiki.postgresql.org/wiki/DBT-2

>> The paper you recommended to me used a special benchmark defined
>> by themselves. But it is quite simple and easy to implement.

It also has the distinct advantage that we *know* they created a
scenario where the code we want to tune was using most of the CPU on
the machine.

>> For me, the challenge is profiling the execution. Are there any
>> tools for PostgreSQL to analyze where the CPU cycles are consumed?
>> Or do I have to instrument and time it myself?

Generally oprofile or perf is used if you want to know where the
time is going.  This creates a slight dilemma -- if you configure
your build with --enable-cassert, you get the best stack traces and
you can more easily break down execution profiles.  That's especially
true if you disable optimization and don't omit frame pointers.  But
all of those things distort the benchmarks -- adding a lot of time,
and not necessarily in proportion to where the time goes with an
optimized build.
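In practice, a compromise that keeps profiles readable without distorting
the timings too much is something like this sketch (the install prefix is
arbitrary):

./configure --prefix=$HOME/pgsql --enable-debug \
    CFLAGS="-O2 -fno-omit-frame-pointer"
make -j"$(nproc)" && make install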

--
Kevin Grittner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-03-14 Thread DEV_OPS
Hi Mengxing

Please read my comments :


On 3/14/17 17:34, Mengxing Liu wrote:
> I send this email to Tony, too. Because he promised to help me with testing 
> and benchmarking.
>
>>>> The worst problems have been
>>>> seen with 32 or more cores on 4 or more sockets with a large number
>>>> of active connections.  I don't know whether you have access to a
>>>> machine capable of putting this kind of stress on it (perhaps at
>>>> your university?), but if not, the community has access to various
>>>> resources we should be able to schedule time on.
>>> There is a NUMA machine ( 120 cores, 8 sockets) in my lab.
>> Fantastic!  Can you say a bit more about the architecture and OS?
>>
> Intel(R) Xeon(R) CPU at 2.3GHz, with 1TB physical DRAM and 1.5 TB SSD, 
> running Ubuntu 14.04, Kernel 3.19.
> I guess NUMA is disabled in BIOS, but I will check that. 
> However, there is only one NIC, so network could be the bottleneck if we use 
> too many cores?
The configuration is really cool. For the SSD, is it a SATA interface, an
NVMe interface, or PCIe flash? Different SSD interfaces have different
performance characteristics.

As for NUMA: because affinity issues will impact PostgreSQL performance,
please disable it if possible.
>
>>> I think it's enough to put this kind of stress.
>> The researchers who saw this bottleneck reported that performance
>> started to dip at 16 cores and the problem was very noticeable at 32
>> cores.  A stress test with 120 cores on 8 sockets will be great!
>>
>> I think perhaps the first milestone on the project should be to
>> develop a set of benchmarks we want to compare to at the end.  That
>> would need to include a stress test that clearly shows the problem
>> we're trying to solve, along with some cases involving 1 or two
>> connections -- just to make sure we don't harm performance for
>> low-contention situations.
>>
> Thanks for your advice! It's really helpful for my proposal.
>
> There are several alternative benchmarks. Tony suggested that we should use
> TPC-E and TPC-DS.
> Personally, I am more familiar with TPC-C, and pgbench seems to have only a
> TPC-B-like benchmark built in.
> Well, I think we can easily find implementations of these benchmarks for
> PostgreSQL.
> The paper you recommended to me used a special benchmark the authors
> defined themselves, but it is quite simple and easy to implement.
For the benchmark tool, TPC-E is the latest spec; you could use it to stress
PG in OLTP mode.
For PostgreSQL specifically, pgbench is developed by the community and is
used to benchmark PG performance.
However, you can use whichever you like; it depends on your situation. ^_^
>
> For me, the challenge is profiling the execution. Are there any tools for
> PostgreSQL to analyze where the CPU cycles are consumed?
> Or do I have to instrument and time it myself?
On Solaris systems there is a powerful probing framework available named
DTrace; you might use it for profiling.
For how to use it, you could consult illumos (a fork of
Solaris/OpenSolaris); the IRC channel is #illumos on freenode.net. (But I
think there are DTrace experts on this mailing list, so you could start
another thread to ask.)
>
>
> Regards.
>
> --
> Mengxing Liu
>



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-03-14 Thread Mengxing Liu

I'm sending this email to Tony, too, because he promised to help me with
testing and benchmarking.

> 
> >> The worst problems have been
> >> seen with 32 or more cores on 4 or more sockets with a large number
> >> of active connections.  I don't know whether you have access to a
> >> machine capable of putting this kind of stress on it (perhaps at
> >> your university?), but if not, the community has access to various
> >> resources we should be able to schedule time on.
> >
> > There is a NUMA machine ( 120 cores, 8 sockets) in my lab.
> 
> Fantastic!  Can you say a bit more about the architecture and OS?
> 

Intel(R) Xeon(R) CPU at 2.3 GHz, with 1 TB of physical DRAM and a 1.5 TB
SSD, running Ubuntu 14.04, kernel 3.19.
I guess NUMA is disabled in the BIOS, but I will check that.
However, there is only one NIC, so could the network become the bottleneck
if we use too many cores?

> > I think it's enough to put this kind of stress.
> 
> The researchers who saw this bottleneck reported that performance
> started to dip at 16 cores and the problem was very noticeable at 32
> cores.  A stress test with 120 cores on 8 sockets will be great!
> 
> I think perhaps the first milestone on the project should be to
> develop a set of benchmarks we want to compare to at the end.  That
> would need to include a stress test that clearly shows the problem
> we're trying to solve, along with some cases involving 1 or two
> connections -- just to make sure we don't harm performance for
> low-contention situations.
> 

Thanks for your advice! It's really helpful for my proposal.

There are several alternative benchmarks. Tony suggested that we should use
TPC-E and TPC-DS.
Personally, I am more familiar with TPC-C, and pgbench seems to have only a
TPC-B-like benchmark built in.
Well, I think we can easily find implementations of these benchmarks for
PostgreSQL.
The paper you recommended to me used a special benchmark the authors defined
themselves, but it is quite simple and easy to implement.

For me, the challenge is profiling the execution. Are there any tools for
PostgreSQL to analyze where the CPU cycles are consumed?
Or do I have to instrument and time it myself?


Regards.

--
Mengxing Liu

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-03-13 Thread Kevin Grittner
On Sat, Mar 11, 2017 at 8:39 PM, Mengxing Liu
 wrote:

>> The worst problems have been
>> seen with 32 or more cores on 4 or more sockets with a large number
>> of active connections.  I don't know whether you have access to a
>> machine capable of putting this kind of stress on it (perhaps at
>> your university?), but if not, the community has access to various
>> resources we should be able to schedule time on.
>
> There is a NUMA machine ( 120 cores, 8 sockets) in my lab.

Fantastic!  Can you say a bit more about the architecture and OS?

> I think it's enough to put this kind of stress.

The researchers who saw this bottleneck reported that performance
started to dip at 16 cores and the problem was very noticeable at 32
cores.  A stress test with 120 cores on 8 sockets will be great!

I think perhaps the first milestone on the project should be to
develop a set of benchmarks we want to compare to at the end.  That
would need to include a stress test that clearly shows the problem
we're trying to solve, along with some cases involving 1 or two
connections -- just to make sure we don't harm performance for
low-contention situations.

--
Kevin Grittner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-03-11 Thread Mengxing Liu



> -Original Messages-
> From: "Kevin Grittner" 
> Sent Time: 2017-03-12 04:24:29 (Sunday)
> To: "Mengxing Liu" 
> Cc: "pgsql-hackers@postgresql.org" 
> Subject: Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in 
> serializable transactions
> 
> On Fri, Mar 10, 2017 at 9:12 PM, Mengxing Liu
>  wrote:
> 
> > My name is Mengxing Liu. I am interested in the project "Eliminate
> > O(N^2) scaling from rw-conflict tracking in serializable
> > transactions”. After discussing with Kevin off-list, I think it's
> > time to post the discussion here. There are two things
> > that I need your help with. Thank you very much.
> 
> Welcome to the -hackers list!  This is a key part of how development
> happens in the community.  Don't be shy about posting questions and
> ideas, but also expect fairly blunt discussion to occur at times;
> don't let that put you off.
> 

Thanks, Kevin. I will keep that in mind. 

> >>> So the task is clear. We can use a tree-like or hash-like data
> >>> structure to speed up this function.
> >>
> >> Right -- especially with a large number of connections holding a
> >> large number of conflicts.  In one paper with high concurrency, they
> >> found over 50% of the CPU time for PostgreSQL was going to these
> >> functions (including functions called by them).  This seems to me to
> >> be due to the O(N^2) (or possibly worse) performance from the number
> >> of connections.
> >
> > Does anyone know the title of this paper? I want to reproduce its
> > workloads.
> 
> I seem to remember there being a couple other papers or talks, but
> this is probably the most informative:
> 
> http://sydney.edu.au/engineering/it/research/tr/tr693.pdf
> 

Thanks, it's exactly what I need.

> >> Remember, I think most of the work here is going to be in
> >> benchmarking.  We not only need to show improvements in simple test
> >> cases using readily available tools like pgbench, but think about
> >> what types of cases might be worst for the approach taken and show
> >> that it still does well -- or at least not horribly.  It can be OK
> >> to have some slight regression in an unusual case if the common
> >> cases improve a lot, but any large regression needs to be addressed
> >> before the patch can be accepted.  There are some community members
> >> who are truly diabolical in their ability to devise "worst case"
> >> tests, and we don't want to be blind-sided by a bad result from one
> >> of them late in the process.
> >>
> >
> > Are there any documents or links introducing how to test and
> > benchmark PostgreSQL? I may need some time to learn about it.
> 
> There is pgbench:
> 
> https://www.postgresql.org/docs/devel/static/pgbench.html
> 
> A read-only load and a read/write mix should both be tested,
> probably with a few different scales and client counts to force the
> bottleneck to be in different places.  The worst problems have been
> seen with 32 or more cores on 4 or more sockets with a large number
> of active connections.  I don't know whether you have access to a
> machine capable of putting this kind of stress on it (perhaps at
> your university?), but if not, the community has access to various
> resources we should be able to schedule time on.
> 

There is a NUMA machine (120 cores, 8 sockets) in my lab. I think it's
enough to generate this kind of stress.
Anyway, I will ask for help if necessary.

> It may pay for you to search through the archives of the last year
> or two to look for other benchmarks and see what people have
> previously done.
> 

I will try.



--
Mengxing Liu

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-03-11 Thread Kevin Grittner
On Fri, Mar 10, 2017 at 9:12 PM, Mengxing Liu
 wrote:

> My name is Mengxing Liu. I am interested in the project "Eliminate
> O(N^2) scaling from rw-conflict tracking in serializable
> transactions”. After discussing with Kevin off-list, I think it's
> time to post the discussion here. There are two things
> that I need your help with. Thank you very much.

Welcome to the -hackers list!  This is a key part of how development
happens in the community.  Don't be shy about posting questions and
ideas, but also expect fairly blunt discussion to occur at times;
don't let that put you off.

>>> So the task is clear. We can use a tree-like or hash-like data
>>> structure to speed up this function.
>>
>> Right -- especially with a large number of connections holding a
>> large number of conflicts.  In one paper with high concurrency, they
>> found over 50% of the CPU time for PostgreSQL was going to these
>> functions (including functions called by them).  This seems to me to
>> be due to the O(N^2) (or possibly worse) performance from the number
>> of connections.
>
> Does anyone know the title of this paper? I want to reproduce its
> workloads.

I seem to remember there being a couple other papers or talks, but
this is probably the most informative:

http://sydney.edu.au/engineering/it/research/tr/tr693.pdf

>> Remember, I think most of the work here is going to be in
>> benchmarking.  We not only need to show improvements in simple test
>> cases using readily available tools like pgbench, but think about
>> what types of cases might be worst for the approach taken and show
>> that it still does well -- or at least not horribly.  It can be OK
>> to have some slight regression in an unusual case if the common
>> cases improve a lot, but any large regression needs to be addressed
>> before the patch can be accepted.  There are some community members
>> who are truly diabolical in their ability to devise "worst case"
>> tests, and we don't want to be blind-sided by a bad result from one
>> of them late in the process.
>>
>
> Are there any documents or links introducing how to test and
> benchmark PostgreSQL? I may need some time to learn about it.

There is pgbench:

https://www.postgresql.org/docs/devel/static/pgbench.html

A read-only load and a read/write mix should both be tested,
probably with a few different scales and client counts to force the
bottleneck to be in different places.  The worst problems have been
seen with 32 or more cores on 4 or more sockets with a large number
of active connections.  I don't know whether you have access to a
machine capable of putting this kind of stress on it (perhaps at
your university?), but if not, the community has access to various
resources we should be able to schedule time on.
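Concretely, a first pass could look something like this (the scale factor,
client counts, and database name are just placeholders to vary):

pgbench -i -s 1000 bench              # initialize at scale factor 1000
pgbench -S -c 64 -j 16 -T 300 bench   # read-only (SELECT-only) load
pgbench -c 64 -j 16 -T 300 bench      # default TPC-B-like read/write mix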

It may pay for you to search through the archives of the last year
or two to look for other benchmarks and see what people have
previously done.

--
Kevin Grittner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers