[PERFORM] static virtual columns as result?

2012-07-03 Thread PV

Hello,

I have these two tables:


Table Cat:
id | A | B
---+---+---
 1 | 3 | 5
 2 | 5 | 8
 3 | 6 | 9

Table Pr:
id | Catid
---+------
 1 | 3
 2 | 2
 3 | 1


I need to replace the Catid column in Table Pr with the corresponding A and B
values from Table Cat.


A virtual table like this:

Table Pr:
id | Catid | A | B
---+-------+---+---
 1 |     3 | 6 | 9
 2 |     2 | 5 | 8
 3 |     1 | 3 | 5
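
(For reference, that mapping corresponds to a plain join of Pr to Cat; a
sketch using the toy tables above:)

```sql
-- Pull A and B from Cat alongside each Pr row, matching on Catid
SELECT p.id, p.Catid, c.A, c.B
FROM Pr AS p
JOIN Cat AS c ON c.id = p.Catid
ORDER BY p.id;
```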


Something like this, but in a form that actually works:

SELECT * FROM pr WHERE pr.a > 1 AND pr.b < 10;

(I was also looking at BETWEEN.)


With subqueries it is too slow:

SELECT * FROM Pr AS p,
  (SELECT id AS cid FROM Cat WHERE lft > 1 AND rgt < 10) AS c
WHERE p.Catid = c.cid AND (...)
ORDER BY Catid ASC OFFSET 0 LIMIT 40



Any suggestion?



Re: [PERFORM] static virtual columns as result?

2012-07-03 Thread Kevin Grittner
PV wrote:

> Any suggestion?

You provided too little information to suggest much beyond using JOIN
instead of a subquery.  Something like:
 
SELECT pr.id, pr.catid, cat.a, cat.b
  FROM pr JOIN cat ON (cat.id = pr.catid)
  WHERE lft > 1 AND rgt < 10 AND (...)
  ORDER BY cat.id
  OFFSET 0 LIMIT 40;
 
We can provide more specific suggestions if you follow the advice
here:
 
http://wiki.postgresql.org/wiki/SlowQueryQuestions
 
And please format your queries to make them more readable --
something like I did above.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] static virtual columns as result?

2012-07-03 Thread PV

On 03/07/12 15:44, Kevin Grittner wrote:

> You provided too little information to suggest much beyond using JOIN
> instead of a subquery.  Something like:

I think that adding new lft and rgt columns to Product, with an index,
should be fast. But that does not seem like good design.



Tables:
#
#
-- Table: Category
CREATE TABLE Category
(
  id serial NOT NULL,
...
  lft integer,
  rgt integer,
...
  path ltree,
  description text NOT NULL,
  idxfti tsvector,
...
  CONSTRAINT Category_pkey PRIMARY KEY (id)
)
WITH (OIDS=FALSE);
ALTER TABLE Category  OWNER TO root;

CREATE INDEX Category_idxfti_idx
ON Category
USING gist  (idxfti );
CREATE INDEX Category_lftrgt_idx
ON Category
USING btree (lft , rgt );


CREATE TRIGGER categorytsvectorupdate
  BEFORE INSERT OR UPDATE
  ON Category
  FOR EACH ROW
  EXECUTE PROCEDURE tsearch2('idxfti', 'description');


-- Table: Product

CREATE TABLE Product
(
  id serial NOT NULL,
...
  description text NOT NULL,
  Category integer NOT NULL,
...
  creationtime integer NOT NULL,
...
  idxfti tsvector,
...
  CONSTRAINT product_pkey PRIMARY KEY (id),
  CONSTRAINT product_creationtime_check CHECK (creationtime >= 0)
)
WITH (
  OIDS=FALSE
);

CREATE INDEX Product_Category_idx
  ON Product
  USING btree
  (Category );

CREATE INDEX Product_creationtime
  ON Product
  USING btree
  (creationtime );

CREATE INDEX Product_idxfti_idx
  ON Product
  USING gist
  (idxfti );

CREATE TRIGGER producttsvectorupdate
  BEFORE INSERT OR UPDATE
  ON Product
  FOR EACH ROW
  EXECUTE PROCEDURE tsearch2('idxfti','description');

#
#

Query
#

EXPLAIN ANALYZE
SELECT * FROM Product AS p
JOIN Category
ON (Category.id = p.Category)
WHERE lft BETWEEN 1 AND 792
ORDER BY creationtime ASC
OFFSET 0 LIMIT 40


Limit  (cost=2582.87..2582.97 rows=40 width=1688) (actual time=4306.209..4306.328 rows=40 loops=1)
  ->  Sort  (cost=2582.87..2584.40 rows=615 width=1688) (actual time=4306.205..4306.246 rows=40 loops=1)
        Sort Key: p.creationtime
        Sort Method: top-N heapsort  Memory: 69kB
        ->  Nested Loop  (cost=31.21..2563.43 rows=615 width=1688) (actual time=0.256..3257.310 rows=122543 loops=1)
              ->  Index Scan using Category_lftrgt_idx on Category  (cost=0.00..12.29 rows=2 width=954) (actual time=0.102..18.598 rows=402 loops=1)
                    Index Cond: ((lft >= 1) AND (lft <= 792))
              ->  Bitmap Heap Scan on Product p  (cost=31.21..1270.93 rows=371 width=734) (actual time=0.561..6.125 rows=305 loops=402)
                    Recheck Cond: (Category = Category.id)
                    ->  Bitmap Index Scan on Product_Category_idx  (cost=0.00..31.12 rows=371 width=0) (actual time=0.350..0.350 rows=337 loops=402)
                          Index Cond: (Category = Category.id)
Total runtime: 4306.706 ms


#

EXPLAIN ANALYZE
SELECT * FROM Product AS p
WHERE (p.idxfti @@ to_tsquery('simple', 
'vpc'))

ORDER BY creationtime ASC OFFSET 0 LIMIT 40


Limit  (cost=471.29..471.39 rows=40 width=734) (actual time=262.854..262.971 rows=40 loops=1)
  ->  Sort  (cost=471.29..471.57 rows=113 width=734) (actual time=262.850..262.890 rows=40 loops=1)
        Sort Key: creationtime
        Sort Method: top-N heapsort  Memory: 68kB
        ->  Bitmap Heap Scan on Product p  (cost=49.62..467.72 rows=113 width=734) (actual time=258.502..262.322 rows=130 loops=1)
              Recheck Cond: (idxfti @@ '''vpc'''::tsquery)
              ->  Bitmap Index Scan on Product_idxfti_idx  (cost=0.00..49.60 rows=113 width=0) (actual time=258.340..258.340 rows=178 loops=1)
                    Index Cond: (idxfti @@ '''vpc'''::tsquery)
Total runtime: 263.177 ms

#

And here is a big problem:


EXPLAIN ANALYZE
SELECT * FROM Product AS p
JOIN Category
ON (Category.id = p.Category)
WHERE lft BETWEEN 1 AND 792  AND 
(p.idxfti @@ to_tsquery('simple', 'vpc'))

ORDER BY creationtime ASC
OFFSET 0 LIMIT 40



Limit  (cost=180.09..180.09 rows=1 width=1688) (actual time=26652.316..26652.424 rows=40 loops=1)
  ->  Sort  (cost=180.09..180.09 rows=1 width=1688) (actual time=26652.312..26652.350 rows=40 loops=1)
        Sort Key: p.creationtime
        Sort Method: top-N heapsort  Memory: 96kB
        ->  Nested Loop  (cost=85.27..180.08 rows=1 width=1688) (actual

Re: [PERFORM] Drop statistics?

2012-07-03 Thread Bruce Momjian
On Fri, Jun 22, 2012 at 11:04:36AM -0700, David Kerr wrote:
> On Fri, Jun 22, 2012 at 01:27:51PM -0400, Tom Lane wrote:
> > David Kerr d...@mr-paradox.net writes:
> > > I'm trying to work through a root cause on a performance problem. I'd
> > > like to be able to show that a problem was fixed by analyzing the table.
> > >
> > > What I've done is
> > >   set default_statistics_target=1
> > >   analyze Table
> > >
> > > That gets rid of most of the rows in pg_stats, but I'm still getting
> > > decent performance.
> >
> > I usually do something like
> >
> > DELETE FROM pg_statistic WHERE starelid = 'foo'::regclass;
> >
> > (you need to be superuser to be allowed to do this).
> >
> > You may need to keep an eye on whether auto-analyze is coming along and
> > undoing what you did, too.
> >
> > regards, tom lane
>
> Awesome, thanks!

One cool trick I have seen is to do the DELETE FROM pg_statistic in a
multi-statement transaction, run your query, and then roll it back.  This
preserves the statistics, while only your query sees empty pg_statistic
values for the table.
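
A sketch of that trick ('foo' stands in for the table under test, and the
query itself is whatever you are investigating):

```sql
BEGIN;
-- Superuser only: wipe the optimizer statistics for just this table
DELETE FROM pg_statistic WHERE starelid = 'foo'::regclass;
-- The query now plans against empty statistics
EXPLAIN ANALYZE SELECT ...;
-- Undo the delete; the statistics come back untouched
ROLLBACK;
```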

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB                            http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [PERFORM] MemSQL the world's fastest database?

2012-07-03 Thread Gregg Jaskiewicz
It sounds like a lot of marketing BS :)

But I like the fact that they use a modern language like C++. It is a
pain to try doing any development on PostgreSQL. A transition to C++
would be nice (I know it's been debated on -hackers a lot).



[PERFORM] Introducing the TPC-V benchmark, and its relationship to PostgreSQL

2012-07-03 Thread Reza Taheri
Hello PostgreSQL fans,
I would like to introduce myself and the TPC-V benchmark to the PostgreSQL 
community. I would then like to ask the community to help us make the TPC-V 
reference benchmarking kit a success, and establish PostgreSQL as a common DBMS 
used in measuring the performance of enterprise servers.

I am VMware's rep to the TPC, and chair the TPC's virtualization benchmark 
development subcommittee. For those of you who don't know the TPC, it is an 
industry standards consortium, and its benchmarks are the main performance 
tests for enterprise-class database servers. For external (marketing) use, 
these benchmarks are the gold standard of comparing different servers, 
processors, databases, etc. For internal use, they are typically the biggest 
hammers an organization can use for performance stress testing of their 
products. TPC benchmarks are one of the workloads (if not the main workload) 
that processor vendors use to design their products. So the benchmarks are in 
much heavier use internal to companies than there are official disclosures.

TPC-V is a new benchmark under development for virtualized databases. A TPC-V 
configuration has:
- multiple virtual machines running a mix of DSS, OLTP, and business logic apps
- VMs running with throughputs ranging from 10% to 40% of the total system
- load elasticity emulating cloud characteristics: the benchmark maintains a
constant overall tpsV load level, but the proportion directed to each VM
changes every 10 minutes

A paper in the TPC Technical Conference track of VLDB 2010 described the 
initial motivation and architecture of TPC-V. A paper that has been accepted to 
the TPC TC track of VLDB 2012 describes in detail the current status of the 
benchmark.

All TPC results up to now have been on commercial databases. The majority of 
active results are on Oracle or Microsoft SQL Server, followed by DB2, Sybase, 
and other players. Again, keep in mind that these benchmarks aren't meant to 
only compare DBMS products. In fact the majority of results are sponsored by 
server hardware companies. The server hardware, processor, storage, OS, etc. 
all contribute to the performance. But you can't have database server
benchmark results without a good DBMS!

And that's where PostgreSQL comes in. The TPC-V development subcommittee 
followed the usual path of TPC benchmarks by writing a functional 
specification, and looking to TPC members to develop benchmarking kits to 
implement the spec. TPC-V uses the schema and transactions of TPC-E, but the
transaction mixes and the way the benchmark is run are totally new and
virtualization-specific. We chose to start from TPC-E to accelerate the
benchmark development phase: the specification would be easier to write, and 
DBMS vendors could create TPC-V kits starting from their existing TPC-E kits. 
Until now, benchmarking kits for various TPC benchmarks have been typically 
developed by DBMS vendors, and offered to their partners for internal testing 
or disclosures. So our expectation was that one or more DBMS companies that 
owned existing TPC-E benchmarking kits would allocate resources to modify their 
kits to execute the TPC-V transactions, and supply kits to subcommittee members 
for prototyping. This did not happen (let's not get into the internal politics 
of the TPC!!), so the subcommittee moved forward with developing its own 
reference kit. The reference kit has been developed to run on PostgreSQL, and 
we are focusing our development efforts and testing on PostgreSQL.

The reference kit will be a first for the TPC, which until now has only 
published paper functional specifications. This kit will be publicly
available to anyone who wants to run TPC-V, whether for internal testing, 
academic studies, or official publications. Commercial DBMS vendors are allowed 
to develop their own kits and publish with them. Even if commercial DBMS 
vendors decide later on to develop TPC-V kits, we expect official TPC-V 
publications with this reference kit using PostgreSQL, and of course a lot of 
academic use of the kit. I think this will be a boost for the PostgreSQL 
community (correct me if I am wrong!!).

The most frequent question to the TPC is "do you offer a kit to run one of
your benchmarks?" There will finally be such a kit, and it will run on PGSQL.

But TPC benchmarks are where the big boys play. If we want the reference kit to
be credible, it has to have good performance. We don't expect it to beat the 
commercial databases, but it has to be in the ballpark. We have started our 
work running the kit in a simple, single-VM, TPC-E type configuration since 
TPC-E is a known animal with official publications available. We have compared 
our performance to Microsoft SQL results published on a similar platform. After 
waving our hands through a number of small differences between the platforms, 
we have calculated a CPU cost of around 3.2ms/transaction for the published MS 
SQL results, versus a 

Re: [PERFORM] Introducing the TPC-V benchmark, and its relationship to PostgreSQL

2012-07-03 Thread Craig Ringer

On 07/04/2012 07:08 AM, Reza Taheri wrote:

... so the subcommittee moved forward with developing its own 
reference kit. The reference kit has been developed to run on 
PostgreSQL, and we are focusing our development efforts and testing on 
PostgreSQL.
That's a very positive step. The TPC seems to me to have a pretty poor 
reputation among open source database users and vendors. I think that's 
largely because the schema and tools are typically very closed and 
restrictively licensed, though the prohibition against publishing 
benchmarks by big commercial vendors doesn't help.


This sounds like a promising change. The TPC benchmarks are really good 
for load-testing and regression testing, so having one that's directly 
PostgreSQL friendly will be a big plus, especially if it is 
appropriately licensed.


The opportunity to audit the schema, queries, and test setup before the 
tool is finalized would certainly be appealing. What can you publish in 
draft form now?


What license terms does the TPC plan to release the schema, queries, and 
data for TPC-V under?


I've cc'd Greg Smith and Dave Page, both of whom I suspect will be 
interested in this development but could easily miss your message. If 
you haven't read Greg's book "PostgreSQL High Performance", it's probably
a good idea to do so.


--
Craig Ringer




Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-03 Thread Craig Ringer

On 07/04/2012 07:13 AM, Reza Taheri wrote:


Following the earlier email introducing the TPC-V benchmark, and that 
we are developing an industry standard benchmarking kit for TPC-V 
using PostgreSQL, here is a specific performance issue we have run into.




Which version of PostgreSQL are you using?

How has it been tuned beyond the defaults - autovacuum settings, 
shared_buffers, effective_cache_size, WAL settings, etc?


How much RAM is on the blade? What OS and version are on the blade?

Comparing the table sizes, we are close to 2X larger (more on this in 
a later note). But the index size is what stands out. Our overall 
index usage (again, after accounting for different numbers of rows) is 
4.8X times larger. 35% of our I/Os are to the index space. I am 
guessing that the 4.8X ballooning has something to do with this, and 
that in itself explains a lot about our high I/O rate, as well as 
higher CPU/tran cycles compared to MS SQL (we are ~2.5-3 times slower).


This is making me wonder about bloat issues and whether proper vacuuming
is being done. If the visibility map and free space map aren't
maintained by proper vacuum operation, everything gets messy, fast.
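
A quick way to check for that (a sketch; relation names are lower-cased by
PostgreSQL unless quoted, and the names below come from the schema posted
earlier in the thread) is to compare heap and index sizes, then vacuum:

```sql
-- How big are the Product heap and its indexes on disk?
SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
WHERE relname IN ('product', 'product_category_idx', 'product_idxfti_idx');

-- Reclaim dead space, update the visibility map, refresh statistics
VACUUM ANALYZE Product;
```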


Well, MS SQL used a clustered index for CT, i.e., the data is held 
in the leaf pages of the index B-Tree. The data and index are in one 
data structure. Once you lookup the index, you also have the data at 
zero additional cost.


[snip]

Is the PGSQL community willing to invest in a feature that a) has been 
requested by many others already; and b) can make a huge difference in 
a benchmark that can lend substantial credibility to PGSQL performance?




While PostgreSQL doesn't support covering indexes or clustered indexes
at this point, 9.2 has added support for index-only scans, which are a 
half-way point of sorts. See:


http://rhaas.blogspot.com.au/2011/10/index-only-scans-weve-got-em.html
http://rhaas.blogspot.com.au/2010/11/index-only-scans.html
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a2822fb9337a21f98ac4ce850bb4145acf47ca27
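
A minimal sketch of the idea (the index name here is hypothetical; an
index-only scan requires that the index cover every column the query
references, and that the visibility map be reasonably current, e.g. after
a VACUUM):

```sql
-- "Covering" index: both columns the query needs live in the index itself
CREATE INDEX product_cat_ctime_idx ON Product (Category, creationtime);
VACUUM ANALYZE Product;

-- On 9.2 this can be answered entirely from the index, skipping the heap
EXPLAIN SELECT Category, creationtime
FROM Product
WHERE Category = 42
ORDER BY creationtime;
```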

If at all possible please see how your test is affected by this 
PostgreSQL 9.2 enhancement. It should make a big difference, and if it 
doesn't it's important to know why.


(CC'd Robert Haas)

I'm not sure what the best option for getting a 9.2 beta build for 
Windows is.



As for the invest side - that's really a matter for EnterpriseDB, 
Command Prompt, Red Hat, and the other backers who're employing people 
to work on the DB. Consider asking on pgsql-hackers, too; if nothing 
else you'll get a good explanation of the current state and progress 
toward clustered indexes.


Some links that may be useful to you are:

http://wiki.postgresql.org/wiki/Todo
  Things that it'd be good to support/implement at some point. 
Surprisingly, covering/clustered indexes aren't on there or at least 
aren't easily found. It's certainly a much-desired feature despite its 
apparent absence from the TODO.


http://wiki.postgresql.org/wiki/PostgreSQL_9.2_Development_Plan
http://wiki.postgresql.org/wiki/PostgreSQL_9.2_Open_Items

--
Craig Ringer