[PERFORM] static virtual columns as result?
Hello, I have these two tables:

Table Cat:
 id | A | B
----+---+---
  1 | 3 | 5
  2 | 5 | 8
  3 | 6 | 9

Table Pr:
 id | Catid
----+------
  1 | 3
  2 | 2
  3 | 1

I need to replace the Catid column with the corresponding A and B values (from Table Cat) in Table Pr. A virtual table like this:

Table Pr:
 id | Catid | A | B
----+-------+---+---
  1 | 3     | 6 | 9
  2 | 2     | 5 | 8
  3 | 1     | 3 | 5

Something like this, but that works:

SELECT * FROM pr WHERE pr.a > 1 AND pr.b < 10;

With subqueries it is too slow:

SELECT * FROM Pr AS p,
  (SELECT id AS cid FROM Cat WHERE lft > 1 AND rgt < 10) AS c
  WHERE p.Cat = c.cid AND (...)
  ORDER BY Catid ASC OFFSET 0 LIMIT 40

Any suggestion?
Re: [PERFORM] static virtual columns as result?
PV wrote:
> Any suggestion?

You provided too little information to suggest much beyond using JOIN instead of a subquery. Something like:

SELECT pr.id, pr.catid, cat.a, cat.b
  FROM pr
  JOIN cat ON (cat.id = pr.catid)
  WHERE lft > 1 AND rgt < 10
    AND (...)
  ORDER BY cat.id
  OFFSET 0 LIMIT 40;

We can provide more specific suggestions if you follow the advice here:

http://wiki.postgresql.org/wiki/SlowQueryQuestions

And please format your queries to make them more readable -- something like I did above.

-Kevin

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] static virtual columns as result?
On 03/07/12 15:44, Kevin Grittner wrote:
> You provided too little information to suggest much beyond using
> JOIN instead of a subquery. Something like:

I think that adding new columns lft and rgt to Product, with an index, should be fast. But it does not seem a good design.

Tables:

-- Table: Category
CREATE TABLE Category (
  id serial NOT NULL,
  ...
  lft integer,
  rgt integer,
  ...
  path ltree,
  description text NOT NULL,
  idxfti tsvector,
  ...
  CONSTRAINT Category_pkey PRIMARY KEY (id)
) WITH (OIDS=FALSE);
ALTER TABLE Category OWNER TO root;
CREATE INDEX Category_idxfti_idx ON Category USING gist (idxfti);
CREATE INDEX Category_lftrgt_idx ON Category USING btree (lft, rgt);
CREATE TRIGGER categorytsvectorupdate BEFORE INSERT OR UPDATE ON Category
  FOR EACH ROW EXECUTE PROCEDURE tsearch2('idxfti', 'description');

-- Table: Product
CREATE TABLE Product (
  id serial NOT NULL,
  ...
  description text NOT NULL,
  Category integer NOT NULL,
  ...
  creationtime integer NOT NULL,
  ...
  idxfti tsvector,
  ...
  CONSTRAINT product_pkey PRIMARY KEY (id),
  CONSTRAINT product_creationtime_check CHECK (creationtime >= 0)
) WITH (OIDS=FALSE);
CREATE INDEX Product_Category_idx ON Product USING btree (Category);
CREATE INDEX Product_creationtime ON Product USING btree (creationtime);
CREATE INDEX Product_idxfti_idx ON Product USING gist (idxfti);
CREATE TRIGGER producttsvectorupdate BEFORE INSERT OR UPDATE ON Product
  FOR EACH ROW EXECUTE PROCEDURE tsearch2('idxfti', 'description');

Queries:

EXPLAIN ANALYZE SELECT * FROM Product AS p
  JOIN Category ON (Category.id = p.Category)
  WHERE lft BETWEEN 1 AND 792
  ORDER BY creationtime ASC OFFSET 0 LIMIT 40

Limit  (cost=2582.87..2582.97 rows=40 width=1688) (actual time=4306.209..4306.328 rows=40 loops=1)
  ->  Sort  (cost=2582.87..2584.40 rows=615 width=1688) (actual time=4306.205..4306.246 rows=40 loops=1)
        Sort Key: p.creationtime
        Sort Method: top-N heapsort  Memory: 69kB
        ->  Nested Loop  (cost=31.21..2563.43 rows=615 width=1688) (actual time=0.256..3257.310 rows=122543 loops=1)
              ->  Index Scan using Category_lftrgt_idx on Category  (cost=0.00..12.29 rows=2 width=954) (actual time=0.102..18.598 rows=402 loops=1)
                    Index Cond: ((lft >= 1) AND (lft <= 792))
              ->  Bitmap Heap Scan on Product p  (cost=31.21..1270.93 rows=371 width=734) (actual time=0.561..6.125 rows=305 loops=402)
                    Recheck Cond: (Category = Category.id)
                    ->  Bitmap Index Scan on Product_Category_idx  (cost=0.00..31.12 rows=371 width=0) (actual time=0.350..0.350 rows=337 loops=402)
                          Index Cond: (Category = Category.id)
Total runtime: 4306.706 ms

EXPLAIN ANALYZE SELECT * FROM Product AS p
  WHERE (p.idxfti @@ to_tsquery('simple', 'vpc'))
  ORDER BY creationtime ASC OFFSET 0 LIMIT 40

Limit  (cost=471.29..471.39 rows=40 width=734) (actual time=262.854..262.971 rows=40 loops=1)
  ->  Sort  (cost=471.29..471.57 rows=113 width=734) (actual time=262.850..262.890 rows=40 loops=1)
        Sort Key: creationtime
        Sort Method: top-N heapsort  Memory: 68kB
        ->  Bitmap Heap Scan on Product p  (cost=49.62..467.72 rows=113 width=734) (actual time=258.502..262.322 rows=130 loops=1)
              Recheck Cond: (idxfti @@ '''vpc'''::tsquery)
              ->  Bitmap Index Scan on Product_idxfti_idx  (cost=0.00..49.60 rows=113 width=0) (actual time=258.340..258.340 rows=178 loops=1)
                    Index Cond: (idxfti @@ '''vpc'''::tsquery)
Total runtime: 263.177 ms

And here is the big problem:

EXPLAIN ANALYZE SELECT * FROM Product AS p
  JOIN Category ON (Category.id = p.Category)
  WHERE lft BETWEEN 1 AND 792
    AND (p.idxfti @@ to_tsquery('simple', 'vpc'))
  ORDER BY creationtime ASC OFFSET 0 LIMIT 40

Limit  (cost=180.09..180.09 rows=1 width=1688) (actual time=26652.316..26652.424 rows=40 loops=1)
  ->  Sort  (cost=180.09..180.09 rows=1 width=1688) (actual time=26652.312..26652.350 rows=40 loops=1)
        Sort Key: p.creationtime
        Sort Method: top-N heapsort  Memory: 96kB
        ->  Nested Loop  (cost=85.27..180.08 rows=1 width=1688) (actual
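The denormalization the poster mentions (copying lft onto Product so that both the nested-set range filter and the full-text filter can be evaluated against one table) could be sketched roughly as follows. This is only an untested sketch against the schema above; the index name is made up, and a trigger would also be needed to keep the copied column in sync when a Product's Category (or the Category tree) changes:

```sql
-- Sketch: copy lft onto Product so the combined query can skip the join.
ALTER TABLE Product ADD COLUMN lft integer;

UPDATE Product p
   SET lft = c.lft
  FROM Category c
 WHERE c.id = p.Category;

CREATE INDEX Product_lft_idx ON Product USING btree (lft);

-- The slow combined query can then filter Product directly:
SELECT *
  FROM Product p
 WHERE p.lft BETWEEN 1 AND 792
   AND p.idxfti @@ to_tsquery('simple', 'vpc')
 ORDER BY creationtime ASC
 OFFSET 0 LIMIT 40;
```

As the poster notes, this trades design cleanliness for speed: the planner no longer has to intersect a Category index scan with a Product bitmap scan per category.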
Re: [PERFORM] Drop statistics?
On Fri, Jun 22, 2012 at 11:04:36AM -0700, David Kerr wrote:
> On Fri, Jun 22, 2012 at 01:27:51PM -0400, Tom Lane wrote:
> > David Kerr <d...@mr-paradox.net> writes:
> > > I'm trying to work through a root cause on a performance problem. I'd like
> > > to be able to show that a problem was fixed by analyzing the table.
> > > What I've done is:
> > >   set default_statistics_target=1
> > >   analyze Table
> > > That gets rid of most of the rows in pg_stats, but I'm still getting decent
> > > performance.
> >
> > I usually do something like
> >
> >   DELETE FROM pg_statistic WHERE starelid = 'foo'::regclass;
> >
> > (you need to be superuser to be allowed to do this).
> >
> > You may need to keep an eye on whether auto-analyze is coming along and
> > undoing what you did, too.
> >
> >   regards, tom lane
>
> Awesome, thanks!

One cool trick I have seen is to do the DELETE FROM pg_statistic in a multi-statement transaction, then run your query, and roll it back. This preserves the statistics, while only your query sees empty pg_statistic values for the table.

--
Bruce Momjian  <br...@momjian.us>  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ It's impossible for everything to be true. +
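Bruce's rollback trick can be sketched as follows (a minimal example run as superuser; the table name "foo" and the test query are placeholders):

```sql
BEGIN;
-- Wipe the stats for just this table, inside the transaction.
DELETE FROM pg_statistic WHERE starelid = 'foo'::regclass;
-- The planner in this session now sees no statistics for foo.
EXPLAIN ANALYZE SELECT * FROM foo WHERE some_col = 42;
-- Undo the delete: the statistics are back, and other sessions
-- never saw them missing (they would block on the row locks briefly).
ROLLBACK;
```

This avoids the auto-analyze race Tom mentions, since nothing is permanently changed.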
Re: [PERFORM] MemSQL the world's fastest database?
It sounds like a lot of marketing BS :) But I like the fact that they use a modern language like C++. It is a pain to try doing any development on PostgreSQL. A transition to C++ would be nice (I know it's been debated on #hackers a lot).
[PERFORM] Introducing the TPC-V benchmark, and its relationship to PostgreSQL
Hello PostgreSQL fans,

I would like to introduce myself and the TPC-V benchmark to the PostgreSQL community. I would then like to ask the community to help us make the TPC-V reference benchmarking kit a success, and establish PostgreSQL as a common DBMS used in measuring the performance of enterprise servers.

I am VMware's rep to the TPC, and chair the TPC's virtualization benchmark development subcommittee. For those of you who don't know the TPC, it is an industry standards consortium, and its benchmarks are the main performance tests for enterprise-class database servers. For external (marketing) use, these benchmarks are the gold standard for comparing different servers, processors, databases, etc. For internal use, they are typically the biggest hammers an organization can use for performance stress testing of its products. TPC benchmarks are one of the workloads (if not the main workload) that processor vendors use to design their products. So the benchmarks are in much heavier use internal to companies than there are official disclosures.

TPC-V is a new benchmark under development for virtualized databases. A TPC-V configuration has:

- multiple virtual machines running a mix of DSS, OLTP, and business logic apps
- VMs running with throughputs ranging from 10% to 40% of the total system
- load elasticity emulating a cloud characteristic: the benchmark maintains a constant overall tpsV load level, but the proportion directed to each VM changes every 10 minutes

A paper in the TPC Technical Conference track of VLDB 2010 described the initial motivation and architecture of TPC-V. A paper that has been accepted to the TPC TC track of VLDB 2012 describes in detail the current status of the benchmark.

All TPC results up to now have been on commercial databases. The majority of active results are on Oracle or Microsoft SQL Server, followed by DB2, Sybase, and other players. Again, keep in mind that these benchmarks aren't meant only to compare DBMS products. In fact the majority of results are sponsored by server hardware companies. The server hardware, processor, storage, OS, etc. all contribute to the performance. But you can't have a database server benchmark result without a good DBMS! And that's where PostgreSQL comes in.

The TPC-V development subcommittee followed the usual path of TPC benchmarks by writing a functional specification, and looking to TPC members to develop benchmarking kits to implement the spec. TPC-V uses the schema and transactions of TPC-E, but the transaction mixes and the way the benchmark is run are totally new and virtualization-specific. We chose to start from TPC-E to accelerate the benchmark development phase: the specification would be easier to write, and DBMS vendors could create TPC-V kits starting from their existing TPC-E kits.

Until now, benchmarking kits for various TPC benchmarks have typically been developed by DBMS vendors, and offered to their partners for internal testing or disclosures. So our expectation was that one or more DBMS companies that owned existing TPC-E benchmarking kits would allocate resources to modify their kits to execute the TPC-V transactions, and supply kits to subcommittee members for prototyping. This did not happen (let's not get into the internal politics of the TPC!!), so the subcommittee moved forward with developing its own reference kit. The reference kit has been developed to run on PostgreSQL, and we are focusing our development efforts and testing on PostgreSQL.

The reference kit will be a first for the TPC, which until now has only published paper functional specifications. This kit will be publicly available to anyone who wants to run TPC-V, whether for internal testing, academic studies, or official publications. Commercial DBMS vendors are allowed to develop their own kits and publish with them. Even if commercial DBMS vendors decide later on to develop TPC-V kits, we expect official TPC-V publications with this reference kit using PostgreSQL, and of course a lot of academic use of the kit. I think this will be a boost for the PostgreSQL community (correct me if I am wrong!!). The most frequent question to the TPC is "do you offer a kit to run one of your benchmarks?". There will finally be such a kit, and it will run on PGSQL.

But TPC benchmarks are where the big boys play. If we want the reference kit to be credible, it has to have good performance. We don't expect it to beat the commercial databases, but it has to be in the ballpark. We have started our work running the kit in a simple, single-VM, TPC-E type configuration, since TPC-E is a known animal with official publications available. We have compared our performance to Microsoft SQL Server results published on a similar platform. After waving our hands through a number of small differences between the platforms, we have calculated a CPU cost of around 3.2ms/transaction for the published MS SQL results, versus a
Re: [PERFORM] Introducing the TPC-V benchmark, and its relationship to PostgreSQL
On 07/04/2012 07:08 AM, Reza Taheri wrote:
> ... so the subcommittee moved forward with developing its own
> reference kit. The reference kit has been developed to run on
> PostgreSQL, and we are focusing our development efforts and testing
> on PostgreSQL.

That's a very positive step. The TPC seems to me to have a pretty poor reputation among open source database users and vendors. I think that's largely because the schema and tools are typically very closed and restrictively licensed, though the prohibition against publishing benchmarks by big commercial vendors doesn't help. This sounds like a promising change.

The TPC benchmarks are really good for load-testing and regression testing, so having one that's directly PostgreSQL friendly will be a big plus, especially if it is appropriately licensed. The opportunity to audit the schema, queries, and test setup before the tool is finalized would certainly be appealing.

What can you publish in draft form now? What license terms does the TPC plan to release the schema, queries, and data for TPC-V under?

I've cc'd Greg Smith and Dave Page, both of whom I suspect will be interested in this development but could easily miss your message. If you haven't read Greg's book PostgreSQL High Performance, it's probably a good idea to do so.

--
Craig Ringer
Re: [PERFORM] The need for clustered indexes to boost TPC-V performance
On 07/04/2012 07:13 AM, Reza Taheri wrote:
> Following the earlier email introducing the TPC-V benchmark, and that
> we are developing an industry standard benchmarking kit for TPC-V
> using PostgreSQL, here is a specific performance issue we have run
> into.

Which version of PostgreSQL are you using? How has it been tuned beyond the defaults - autovacuum settings, shared_buffers, effective_cache_size, WAL settings, etc? How much RAM is on the blade? What OS and version are on the blade?

> Comparing the table sizes, we are close to 2X larger (more on this in
> a later note). But the index size is what stands out. Our overall
> index usage (again, after accounting for different numbers of rows)
> is 4.8X larger. 35% of our I/Os are to the index space. I am guessing
> that the 4.8X ballooning has something to do with this, and that in
> itself explains a lot about our high I/O rate, as well as higher
> CPU/tran cycles compared to MS SQL (we are 2.5-3 times slower).

This is making me wonder about bloat issues and whether proper vacuuming is being done. If the visibility map and free space map aren't maintained by proper vacuum operation, everything gets messy, fast.

> Well, MS SQL used a clustered index for CT, i.e., the data is held in
> the leaf pages of the index B-Tree. The data and index are in one
> data structure. Once you look up the index, you also have the data at
> zero additional cost.

[snip]

> Is the PGSQL community willing to invest in a feature that a) has
> been requested by many others already; and b) can make a huge
> difference in a benchmark that can lend substantial credibility to
> PGSQL performance?

While PostgreSQL doesn't support covering indexes or clustered indexes at this point, 9.2 has added support for index-only scans, which are a half-way point of sorts.
See:

http://rhaas.blogspot.com.au/2011/10/index-only-scans-weve-got-em.html
http://rhaas.blogspot.com.au/2010/11/index-only-scans.html
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a2822fb9337a21f98ac4ce850bb4145acf47ca27

If at all possible please see how your test is affected by this PostgreSQL 9.2 enhancement. It should make a big difference, and if it doesn't, it's important to know why. (CC'd Robert Haas)

I'm not sure what the best option for getting a 9.2 beta build for Windows is.

As for the "invest" side - that's really a matter for EnterpriseDB, Command Prompt, Red Hat, and the other backers who're employing people to work on the DB. Consider asking on pgsql-hackers, too; if nothing else you'll get a good explanation of the current state and progress toward clustered indexes.

Some links that may be useful to you are:

http://wiki.postgresql.org/wiki/Todo
Things that it'd be good to support/implement at some point. Surprisingly, covering/clustered indexes aren't on there, or at least aren't easily found. It's certainly a much-desired feature despite its apparent absence from the TODO.

http://wiki.postgresql.org/wiki/PostgreSQL_9.2_Development_Plan
http://wiki.postgresql.org/wiki/PostgreSQL_9.2_Open_Items

--
Craig Ringer
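As a rough illustration of the index-only-scan point, the idea is to build an index that contains every column a hot query touches, so 9.2 can answer the query from the index alone. This is only a sketch under assumptions: the table and index names are hypothetical (loosely modeled on a TPC-E-style trade table), and it requires PostgreSQL 9.2+ with a well-vacuumed table so the visibility map is current:

```sql
-- Index covering all columns the query below needs:
CREATE INDEX trade_cover_idx ON trade (customer_id, trade_date, price);

-- If the visibility map is current, EXPLAIN should show
-- "Index Only Scan using trade_cover_idx on trade",
-- i.e. no heap fetches -- approximating one benefit of a clustered index.
EXPLAIN
SELECT trade_date, price
  FROM trade
 WHERE customer_id = 42
 ORDER BY trade_date;
```

Unlike a true clustered index, the heap row is still stored separately, so writes pay for the extra index and visibility-map maintenance matters a great deal.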