On Fri, Mar 10, 2017 at 9:12 PM, Mengxing Liu <liu-m...@mails.tsinghua.edu.cn> wrote:
> My name is Mengxing Liu. I am interested in the project "Eliminate > O(N^2) scaling from rw-conflict tracking in serializable > transactions”. After discussing with Kevin off-list, I think it's > time to post discussion here. I am afraid that there are two things > that I need your help. Thank you very much. Welcome to the -hackers list! This is a key part of how development happens in the community. Don't be shy about posting questions and ideas, but also expect fairly blunt discussion to occur at times; don't let that put you off. >>> So the task is clear. We can use a tree-like or hash-like data >>> structure to speed up this function. >> >> Right -- especially with a large number of connections holding a >> large number of conflicts. In one paper with high concurrency, they >> found over 50% of the CPU time for PostgreSQL was going to these >> functions (including functions called by them). This seems to me to >> be due to the O(N^2) (or possibly worse) performance from the number >> of connections. > > Anyone knows the title of this paper? I want to reproduce its > workloads. I seem to remember there being a couple other papers or talks, but this is probably the most informative: http://sydney.edu.au/engineering/it/research/tr/tr693.pdf >> Remember, I think most of the work here is going to be in >> benchmarking. We not only need to show improvements in simple test >> cases using readily available tools like pgbench, but think about >> what types of cases might be worst for the approach taken and show >> that it still does well -- or at least not horribly. It can be OK >> to have some slight regression in an unusual case if the common >> cases improve a lot, but any large regression needs to be addressed >> before the patch can be accepted. There are some community members >> who are truly diabolical in their ability to devise "worst case" >> tests, and we don't want to be blind-sided by a bad result from one >> of them late in the process. >> > > Are there any documents or links introducing how to test and > benchmark PostgreSQL? I may need some time to learn about it. There is pgbench: https://www.postgresql.org/docs/devel/static/pgbench.html A read-only load and a read/write mix should both be tested, probably with a few different scales and client counts to force the bottleneck to be in different places. The worst problems have been seen with 32 or more cores on 4 or more sockets with a large number of active connections. I don't know whether you have access to a machine capable of putting this kind of stress on it (perhaps at your university?), but if not, the community has access to various resources we should be able to schedule time on. It may pay for you to search through the archives of the last year or two to look for other benchmarks and see what people have previously done. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers