Re: [PERFORM] Benchmark Data requested
Simon Riggs wrote:

> There are some results here that show PostgreSQL is slower in some cases than Monet and MySQL. Of course these results were published immediately prior to 8.2 being released, and run out-of-the-box, without even basic performance tuning. Would anybody like to repeat these tests with the latest production versions of these databases (i.e. with PGSQL 8.3), and with some sensible tuning settings for the hardware used? It would be useful to get some blind tests with more sensible settings.
>
> http://monetdb.cwi.nl/projects/monetdb//SQL/Benchmark/TPCH/
>
> Multiple runs from different people on different hardware are useful, since they help to iron out differences in hardware and test methodology. So don't worry if you see somebody else doing this as well.

Here is another graph, without MonetDB though:

http://tweakers.net/reviews/649/7

--
regards
Claus

When lenity and cruelty play for a kingdom, the gentlest gamester is the soonest winner.
 -- Shakespeare

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Benchmark Data requested
Hi Simon,

Note that MonetDB/X100 does not have a SQL optimizer; they ran raw hand-coded plans. As a consequence, these comparisons should be taken as an executor-versus-executor test, and we/you should make sure that the PG planner has generated the best possible plan.

That said, we've already done the comparisons internally, and they've got a good point to make about L2 cache use and the removal of unnecessary abstractions in the executor. We've been aware of this since 2005/6 and have been slowly working these ideas into our/PG executor.

Bottom line: it's a good thing to work toward getting close to the X100/Monet executor with a more general-purpose DB. PG is a looong way from being comparable, mostly due to poor L2 D-cache locality and I-cache thrashing in the executor. The only way to cure this is to work on more rows than one at a time.

- Luke

On 2/4/08 10:37 AM, Simon Riggs [EMAIL PROTECTED] wrote:

> Can I ask for some help with benchmarking?
>
> There are some results here that show PostgreSQL is slower in some cases than Monet and MySQL. Of course these results were published immediately prior to 8.2 being released, plus run out-of-the-box, so without even basic performance tuning. Would anybody like to repeat these tests with the latest production versions of these databases (i.e. with PGSQL 8.3), and with some sensible tuning settings for the hardware used? It will be useful to get some blind tests with more sensible settings.
>
> http://monetdb.cwi.nl/projects/monetdb//SQL/Benchmark/TPCH/
>
> Multiple runs from different people/different hardware is useful since they help to iron-out differences in hardware and test methodology. So don't worry if you see somebody else doing this also.
>
> Thanks,
[PERFORM] Benchmark Data requested
Can I ask for some help with benchmarking?

There are some results here that show PostgreSQL is slower in some cases than Monet and MySQL. Of course these results were published immediately prior to 8.2 being released, and run out-of-the-box, without even basic performance tuning.

Would anybody like to repeat these tests with the latest production versions of these databases (i.e. with PGSQL 8.3), and with some sensible tuning settings for the hardware used? It would be useful to get some blind tests with more sensible settings.

http://monetdb.cwi.nl/projects/monetdb//SQL/Benchmark/TPCH/

Multiple runs from different people on different hardware are useful, since they help to iron out differences in hardware and test methodology. So don't worry if you see somebody else doing this as well.

Thanks,

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
Re: [PERFORM] Benchmark Data requested
On Mon, 2008-02-04 at 10:47 -0800, Luke Lonergan wrote:

> Note that MonetDB/X100 does not have a SQL optimizer, they ran raw hand-coded plans. As a consequence, these comparisons should be taken as an executor-executor test and we/you should be sure that the PG planner has generated the best possible plan.

If it doesn't, then I'd regard that as a performance issue in itself.

> That said, we've already done the comparisons internally and they've got a good point to make about L2 cache use and removal of unnecessary abstractions in the executor. We've been aware of this since 2005/6 and have been slowly working these ideas into our/PG executor. Bottom line: it's a good thing to work to get close to the X100/Monet executor with a more general purpose DB. PG is a looong way from being comparable, mostly due to poor L2 D-cache locality and I-cache thrashing in the executor.

You may be right, but I want to see where it hurts us the most.

> The only way to cure this is to work on more rows than one at a time.

Do you have any results to show that's true, or are you just referring to the Cray paper? (Which used fixed-length tuples and specific vector hardware.)

(With regard to benchmarks, I'd rather not download Monet at all. That helps avoid legal issues around did-you-look-at-the-code questions.)

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
Re: [PERFORM] Benchmark Data requested
On Mon, 4 Feb 2008, Simon Riggs wrote:

> Would anybody like to repeat these tests with the latest production versions of these databases (i.e. with PGSQL 8.3)

Do you have any suggestions on how people should run TPC-H? It looked like a bit of work to sort through how to even start this exercise.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [PERFORM] Benchmark Data requested
Doing it at low scale factors is not attractive. Commercial databases are publishing at scale factor 1000 (about 1TB) up to 10000 (10TB), with one publication in the 30TB space. So ideally, tuning right now should start at the 1000 scale factor. Unfortunately, I have tried that before with PostgreSQL, and a few of the problems are as follows:

A single-stream load into PostgreSQL takes hours (a single-stream load wastes all the extra cores out there). Multiple table loads (one per table) spawned via script are a bit better, but hit WAL problems. To avoid the WAL problems, I created the tables and issued the load statements within the same transaction; that is faster, but then you cannot create indexes before the load, or it starts writing to WAL again... AND if the indexes are created after the load, it takes about a day or so to create all the indexes required. (Index creation is single-threaded, and creating multiple indexes at the same time could result in running out of temporary DISK space, since the tables are so big. Which means one thread at a time is the answer for tables that are really big. It is slow.)

Boy, by this time most people doing TPC-H at the high end have given up on PostgreSQL. And I have not even started partitioning the tables yet, since within the current framework you have to load each partition child table separately, which means that for the TPC-H data you need extra logic to take the table data and split it into each partition child table. Not stuff that many people want to do by hand.

Then for the power run, which is essentially running one query at a time, you should be able to utilize the full system (especially on multi-core systems); unfortunately PostgreSQL can use only one core. (Plus, since this is read-only and there is no separate disk reader, all other processes are idle.) So the system is running at 1/Nth capacity, where N is the number of cores/threads. (I am not sure here: with partitioned tables, do you get N processes running in the system when you scan the partitioned table?)
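The "extra logic" for splitting generated data into partition child tables can be scripted before loading. A minimal sketch, assuming range partitioning on a date column in dbgen-style pipe-delimited .tbl output; the column position and partition bounds here are illustrative assumptions, not from any published TPC-H kit:

```python
# Split a dbgen-style pipe-delimited .tbl stream into per-partition row
# groups, so each child table can be loaded with its own COPY stream.
# Partition bounds and the date-column position are illustrative.

PARTITIONS = [            # (child-table suffix, inclusive lower bound)
    ("y1992", "1992-01-01"),
    ("y1994", "1994-01-01"),
    ("y1996", "1996-01-01"),
]

def partition_for(date_value):
    """Return the child-table suffix whose range contains date_value.
    ISO dates compare correctly as strings."""
    chosen = PARTITIONS[0][0]
    for suffix, lower in PARTITIONS:
        if date_value >= lower:
            chosen = suffix
    return chosen

def split_rows(rows, date_col):
    """Group pipe-delimited rows by partition: {suffix: [row, ...]}."""
    out = {suffix: [] for suffix, _ in PARTITIONS}
    for row in rows:
        fields = row.rstrip("\n").split("|")
        out[partition_for(fields[date_col])].append(row)
    return out
```

Each resulting group would be written to its own file and fed to a separate COPY, one per child table, which also gives back the per-table load parallelism described above.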
Even off-loading work like fetching data into the buffer pool to separate processes would be a big win for this type of workload.

I would be happy to help out if folks here want to do work related to it. In fact, if you have time, I can request a project in one of the Sun benchmarking centers to see what we can learn, together with community members interested in understanding where PostgreSQL performs and where it fails.

Regards,
Jignesh

Greg Smith wrote:

> On Mon, 4 Feb 2008, Simon Riggs wrote:
>> Would anybody like to repeat these tests with the latest production versions of these databases (i.e. with PGSQL 8.3)
>
> Do you have any suggestions on how people should run TPC-H? It looked like a bit of work to sort through how to even start this exercise.
Re: [PERFORM] Benchmark Data requested
On Mon, 2008-02-04 at 15:09 -0500, Greg Smith wrote:

> On Mon, 4 Feb 2008, Simon Riggs wrote:
>> Would anybody like to repeat these tests with the latest production versions of these databases (i.e. with PGSQL 8.3)
>
> Do you have any suggestions on how people should run TPC-H? It looked like a bit of work to sort through how to even start this exercise.

The link referred to a few different scale factors, so you could try those. But anything that uses the hardware you have to its full potential is valuable. Everybody's test method is going to be different, whatever I say...

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
Re: [PERFORM] Benchmark Data requested
Hi Simon,

I have some insight into TPC-H and how it works. First of all, I think it is a violation of TPC rules to publish numbers without auditing them first. So even if I do the test to show the better performance of PostgreSQL 8.3, I cannot post it here or in any public forum without going through that process. (It is only a partial benchmark anyway, as they are just doing the equivalent of the PowerRun of TPC-H.) Maybe the PostgreSQL PR team should email [EMAIL PROTECTED] about those numbers and see what the TPC has to say about that comparison.

On the technical side:

Remember that all the TPC-H queries, when run sequentially on PostgreSQL, use only one core or virtual CPU, so it is a very bad benchmark for a multi-core system running PostgreSQL (same for MySQL too). Also, very important: unless you are running the UPDATE FUNCTIONS, which are separate queries, all of the Q1-Q22 queries are pure READ-ONLY queries.

Traditionally, I think PostgreSQL lacks read speed, especially since it is bottlenecked by the size of the reads it does (BLOCKSIZE). Major databases provide multi-block parameters to do reads/writes in multiples of the block size to reduce IOPS, and for read-only work they also have READ-AHEAD or prefetch sizes, generally bigger than the multi-block or extent sizes, to aid reads.

Scale factor is in terms of gigabytes, so using a max scale of 5 (5GB) is pretty useless, since most of the rows can be cached in modern-day systems. And comparing with 0.01 — what is that, 10MB? The L2 cache of a recent Intel chip is probably bigger than that.

If you are doing tuning for the TPC-H queries, focus on a few of them. For example, Query 1 is very join-intensive, and if your CPU is not 100% used then you have a problem in your I/O to solve before tuning it. Another example: Query 16 is literally an I/O scan-speed test; many people use it to see if the database can scan at the line speed of the storage, and ending up at 100% CPU means the database cannot process that many rows (just to bring them in).
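The sizing argument above is simple arithmetic and easy to check; a minimal sketch, where the cache and RAM capacities are illustrative assumptions for a 2008-era machine, not measurements:

```python
# TPC-H scale factor ~= raw data size in GB. Check whether a given
# scale factor even escapes the caches. Capacity figures below are
# illustrative assumptions, not measurements of any specific system.

def dataset_bytes(scale_factor):
    """Approximate raw TPC-H data size: scale factor ~ gigabytes."""
    return int(scale_factor * 1024**3)

def fits_in(scale_factor, capacity_bytes):
    """True if the whole dataset fits in the given capacity."""
    return dataset_bytes(scale_factor) <= capacity_bytes

L2_CACHE = 12 * 1024**2   # assume a 12MB L2, plausible for a 2008 Intel CPU
RAM = 16 * 1024**3        # assume 16GB of main memory

# Scale 0.01 (~10MB) is smaller than the CPU's L2 cache, and scale 5
# (~5GB) still fits comfortably in RAM -- neither exercises the I/O path.
```

This is why the post argues that meaningful tuning starts at scale factor 1000 and up: only there does the dataset decisively exceed main memory.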
In essence, each query exercises some combination of system features to highlight performance. However, since it is an old benchmark, database companies end up re-engineering their technologies to gain an advantage in it (hence a successor is in the works, called TPC-DS, which will have more than 100 such queries).

A few of the technologies that have really helped gain ground in the TPC-H world:

* Hash and/or range partitioning of tables (PostgreSQL 8.3 can do that, but the setup cost of writing the schema is high, especially since the data has to be loaded into separate tables)
* Automated aggregated views, used by optimizers: database technology to keep the more frequently used aggregations up to date in smaller views
* Cube view indexes: like bitmap indexes but multidimensional (I think... but not sure)

That said, would it be useful for regression testing in the PostgreSQL farms? I would think yes, especially Q16.

Hope this helps.

Regards,
Jignesh

Simon Riggs wrote:

> Can I ask for some help with benchmarking?
>
> There are some results here that show PostgreSQL is slower in some cases than Monet and MySQL. Of course these results were published immediately prior to 8.2 being released, plus run out-of-the-box, so without even basic performance tuning. Would anybody like to repeat these tests with the latest production versions of these databases (i.e. with PGSQL 8.3), and with some sensible tuning settings for the hardware used? It will be useful to get some blind tests with more sensible settings.
>
> http://monetdb.cwi.nl/projects/monetdb//SQL/Benchmark/TPCH/
>
> Multiple runs from different people/different hardware is useful since they help to iron-out differences in hardware and test methodology. So don't worry if you see somebody else doing this also.
>
> Thanks,
Re: [PERFORM] Benchmark Data requested
On Mon, 4 Feb 2008, Jignesh K. Shah wrote:

> Doing it at low scales is not attractive. Commercial databases are publishing at scale factor of 1000 (about 1TB) to 10000 (10TB), with one in the 30TB space. So ideally right now tuning should start at 1000 scale factor.

I think what Simon was trying to get at is some sort of debunking of Monet's benchmarks, which were run in-memory while not giving PostgreSQL any real memory to work with. What you're talking about is a completely separate discussion, which is well worth having in addition to that. I'm well aware of how painful it is to generate+load+index even a single TB worth of data with PostgreSQL right now, because I've been doing just that for weeks (there are also two processing phases in there for me that take even longer, but the raw operations are still a significant portion of the total time).

> I would be happy to help out if folks here want to do work related to it. Infact if you have time, I can request a project in one of the Sun Benchmarking center to see what we can learn with community members interested in understanding where PostgreSQL performs and fails.

Sounds like a good 8.4 project. Maybe pick this topic back up at the East convention next month; we could talk about it then.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [PERFORM] Benchmark Data requested
Jignesh K. Shah [EMAIL PROTECTED] writes:

> Then for the power run that is essentially running one query at a time should essentially be able to utilize the full system (specially multi-core systems), unfortunately PostgreSQL can use only one core. (Plus since this is read only and there is no separate disk reader all other processes are idle) and system is running at 1/Nth capacity (where N is the number of cores/threads)

Is the whole benchmark like this, or is this just one part of it? And is the I/O system really able to saturate the CPU, though?

--
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning
Re: [PERFORM] Benchmark Data requested
Hi Simon,

On 2/4/08 11:07 AM, Simon Riggs [EMAIL PROTECTED] wrote:

>> executor-executor test and we/you should be sure that the PG planner has generated the best possible plan.
>
> If it doesn't then I'd regard that as a performance issue in itself.

Agreed, though that makes two problems to investigate. I think the Monet/X100 work is clean in that it's a pure executor test.

> You maybe right, but I want to see where it hurts us the most.

You'll see :-)

>> The only way to cure this is to work on more rows than one at a time.
>
> Do you have any results to show that's true, or are you just referring to the Cray paper? (Which used fixed length tuples and specific vector hardware).

No paper referenced, just inference from the results and from their (and others') conclusions about locality and re-use. The situation is similar enough to scientific programming on vector processors versus cache-based superscalar machines that these are the right conclusions. We've also done the profiling to look at cache misses and have some data to back it up.

> (With regard to benchmarks, I'd rather not download Monet at all. Helps avoid legal issues around did-you-look-at-the-code questions.)

None of us have looked at the code or downloaded it. There are a number of presentations out there for Monet/X100 that show what their results are.

- Luke
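The "more rows than one at a time" idea being debated above can be illustrated with a toy sketch. This is not PostgreSQL's executor or X100's; the function names and batch size are illustrative, and in Python the cache effects obviously don't show — the point is only the shape of the two interfaces:

```python
# Toy contrast of tuple-at-a-time vs. block-at-a-time execution.
# Volcano-style executors pull one row per iterator call; vectorized
# (X100-style) executors process a whole batch per call, amortizing
# per-call overhead and keeping the batch hot in the CPU cache.

def scan_tuple_at_a_time(rows, predicate):
    """One predicate dispatch per row, interleaved with iterator plumbing."""
    return [r for r in rows if predicate(r)]

def scan_block_at_a_time(rows, predicate, batch=1024):
    """Filter a whole batch per call; same result, different access pattern."""
    out = []
    for i in range(0, len(rows), batch):
        block = rows[i:i + batch]
        out.extend(r for r in block if predicate(r))
    return out
```

Both produce identical results; the claimed win is purely in instruction- and data-cache behavior, which is what the L2/I-cache discussion above is about.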
Re: [PERFORM] Benchmark Data requested
Hi Greg,

On 2/4/08 12:09 PM, Greg Smith [EMAIL PROTECTED] wrote:

> Do you have any suggestions on how people should run TPC-H? It looked like a bit of work to sort through how to even start this exercise.

Running TPC-H proper requires a license to publish, etc. However, I think you can use their published data and query generation kit to run the queries, which aren't the benchmark per se. That's what the Monet/X100 people did.

- Luke
Re: [PERFORM] Benchmark Data requested
On Mon, 4 Feb 2008, Luke Lonergan wrote:

> However, I think you can use their published data and query generation kit to run the queries, which aren't the benchmark per-se. That's what the Monet/X100 people did.

Right; I was just hoping someone might suggest some relatively standardized way to do that with PostgreSQL. I read Simon's original note and was afraid that multiple people might end up duplicating a non-trivial amount of work just to get the kits set up and running, or would get frustrated not expecting that part and give up on the whole idea. I'm very interested in this particular topic (non-trivial database micro-benchmarks) but have no time to spare this week to hack on it myself.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [PERFORM] Benchmark Data requested
Gregory Stark wrote:

> Incidentally we found some cases that Solaris was particularly bad at. Is there anybody in particular that would be interested in hearing about them? (Not meant to be a knock on Solaris, I'm sure there are other cases Linux or BSD handle poorly too)

Send me the details; I can file bugs for Solaris on behalf of the community. Since I am involved in a lot of PostgreSQL testing on Solaris this year, I have a small list myself (mostly related to file-system stuff, though). I know of the one regarding bonnie rewriting blocks that you sent out. (I still haven't done anything about it, but I finally have some test machines for such work instead of using my workstation. :-) But I am really interested in seeing which ones hit PostgreSQL performance/usability.

Thanks in advance.

Regards,
Jignesh
Re: [PERFORM] Benchmark Data requested
TPC-H has two runs: the PowerRun, which is a single stream (Q1-22 plus RF1, RF2), and the Throughput Run, which has N streams (N depends on scale) running simultaneously, each a mixed sequence of the same queries plus the two update functions. During the throughput run you can expect to max out the CPUs. But commercial databases generally have PowerRuns performing quite well even on multi-core systems (Oracle, without RAC, has published with 144 cores on Solaris).

As for the I/O system saturating the CPU, it is two-fold: the kernel fetching in the data, which saturates at some value, and in this case PostgreSQL reading the data and putting it in its buffer pool.

An example of how I use this is as follows: do a SELECT query on a table such that it results in a table scan without actually returning any rows, then keep throwing hardware (better storage) at it until it saturates the CPU. That's the practical maximum you can get from that CPU/OS combination (assuming unlimited storage bandwidth). This is primarily used in guessing how fast one of the TPC-H queries will complete. In my tests with PostgreSQL, I generally reach the CPU limit without even reaching the bandwidth of the underlying storage.

Just to give numbers: a single 2Gb Fibre Channel port can practically go up to 180MB/sec, and single 4Gb ports have proven to go up to 360-370MB/sec. So to saturate one FC port, PostgreSQL has to be able to scan 370MB/sec without saturating the CPU. Then comes software striping, which stripes over multiple ports, increasing the available bandwidth... now scanning has to be able to drive Nx370MB/sec (all on a single core). I had some numbers, and some limitations based on CPU frequency, block size, etc., but those were from the 8.1 days or so. I think to take PostgreSQL toward the high end, we first have to scale out these numbers.

Doing some sort of test in PostgreSQL farms for every release would actually help people see the amount of data that it can drive through.
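The back-of-the-envelope math above is easy to mechanize. A minimal sketch, using the port throughput figures quoted in this post; the 100GB table size in the usage example is an illustrative assumption:

```python
# Estimate the scan rate needed to saturate striped Fibre Channel ports,
# and the resulting best-case scan time for a table of a given size.
# 370 MB/sec is the practical 4Gb FC figure quoted above; the table
# size used in the test below is an illustrative assumption.

FC_4GB_MBPS = 370  # practical throughput of one 4Gb FC port, MB/sec

def required_scan_rate(num_ports, port_mbps=FC_4GB_MBPS):
    """MB/sec the executor must sustain to keep num_ports striped ports busy."""
    return num_ports * port_mbps

def best_case_scan_seconds(table_mb, num_ports, port_mbps=FC_4GB_MBPS):
    """Lower bound on table-scan time if the CPU never becomes the bottleneck."""
    return table_mb / required_scan_rate(num_ports, port_mbps)
```

For instance, four striped 4Gb ports demand 1480MB/sec of single-core scan throughput; if the scan code saturates a core below that rate, the extra storage bandwidth is wasted, which is exactly the failure mode described above.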
We could also work out some database-operation metrics to gauge how much each release improves over older ones. I have ideas for a few of them.

Regards,
Jignesh

Gregory Stark wrote:

> Jignesh K. Shah [EMAIL PROTECTED] writes:
>> Then for the power run that is essentially running one query at a time should essentially be able to utilize the full system (specially multi-core systems), unfortunately PostgreSQL can use only one core. (Plus since this is read only and there is no separate disk reader all other processes are idle) and system is running at 1/Nth capacity (where N is the number of cores/threads)
>
> Is the whole benchmark like this or is this just one part of it? Is the i/o system really able to saturate the cpu though?