Re: [PERFORM] significant slow down with various LIMIT
I wrote:
> ALTER TABLE ALTER plugins_guide_address
>   ALTER COLUMN city_id SET STATISTICS 1000;

One too many ALTERs in there. Should be:

ALTER TABLE plugins_guide_address
  ALTER COLUMN city_id SET STATISTICS 1000;

-Kevin
Re: [JDBC] SOLVED ... Re: Getting rid of a cursor from JDBC .... Re: [PERFORM] Re: HELP: How to tame the 8.3.x JDBC driver with a big query result set
On Tue, Apr 20, 2010 at 5:05 PM, Kris Jurka wrote:
> The reason this is not done is that the mechanism used for fetching a piece
> of the results at a time can change the query plan used if using a
> PreparedStatement. There are three ways to plan a PreparedStatement:
>
> a) Using the exact parameter values by substituting them directly into the
> query. This isn't really "planned" as you can't re-use it at all. This is
> only available using the V2 protocol.
>
> b) Using the parameter values for statistics, but not making any stronger
> guarantees about them. So the parameters will be used for evaluating the
> selectivity, but not to perform other optimizations like
> constraint_exclusion or transforming a LIKE operation to a range query.
> This is the default plan type the JDBC driver uses.

Hmm. I didn't think this was possible. How are you doing this?

> c) Planning the query with no regard for the parameters passed to it. This
> is the plan type the JDBC driver uses when it sees the same
> PreparedStatement being re-used multiple times or when it is respecting
> setFetchSize and allowing for partial results.
>
> We must use (c) for partial results instead of (b) because of some
> limitations of the server. Currently you cannot have two statements of type
> (b) open on the same connection. So since the driver can't know if the user
> will issue another query before fetching the remainder of the first query's
> results, it must set up the first query to be of type (c) so that multiple
> statements can exist simultaneously.
>
> Switching the default plan type to (c) will cause a significant number of
> complaints as performance on some queries will go into the tank. Perhaps we
> could have a default fetchSize for plain Statements as it won't affect the
> plan. I could also see making this a URL parameter though so it could be
> set as the default with only a configuration, not a code change.

...Robert
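For readers following along, here is a minimal sketch of the partial-result
fetching under discussion. The connection URL, credentials, and the
big_table query are placeholders; setAutoCommit, setFetchSize, and
executeQuery are standard JDBC calls. Note that, per Kris's explanation,
requesting partial results this way is exactly what pushes the driver from
plan type (b) to plan type (c):

    // Sketch: fetching a large result set a piece at a time.
    // Connection details and the query are placeholders.
    import java.sql.*;

    public class FetchSizeExample {
        public static void main(String[] args) throws SQLException {
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/mydb", "user", "secret");
            // Partial-result fetching only works outside autocommit, because
            // the underlying portal must survive between fetches.
            conn.setAutoCommit(false);
            PreparedStatement ps = conn.prepareStatement(
                    "SELECT * FROM big_table WHERE flag = ?");
            ps.setBoolean(1, true);
            ps.setFetchSize(1000);  // fetch 1000 rows per round trip, not all at once
            ResultSet rs = ps.executeQuery();
            while (rs.next()) {
                // process each row without holding the whole result set in memory
            }
            rs.close();
            ps.close();
            conn.commit();
            conn.close();
        }
    }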
Re: [JDBC] SOLVED ... Re: Getting rid of a cursor from JDBC .... Re: [PERFORM] Re: HELP: How to tame the 8.3.x JDBC driver with a big query result set
On Wed, Apr 21, 2010 at 10:41 AM, Robert Haas wrote:
> On Tue, Apr 20, 2010 at 5:05 PM, Kris Jurka wrote:
>> The reason this is not done is that the mechanism used for fetching a
>> piece of the results at a time can change the query plan used if using a
>> PreparedStatement. There are three ways to plan a PreparedStatement:
>>
>> a) Using the exact parameter values by substituting them directly into
>> the query. This isn't really "planned" as you can't re-use it at all.
>> This is only available using the V2 protocol.
>>
>> b) Using the parameter values for statistics, but not making any stronger
>> guarantees about them. So the parameters will be used for evaluating the
>> selectivity, but not to perform other optimizations like
>> constraint_exclusion or transforming a LIKE operation to a range query.
>> This is the default plan type the JDBC driver uses.
>
> Hmm. I didn't think this was possible. How are you doing this?

More to the point, is there some option that can shift you into method (a)?
I'm thinking of warehousing-type applications where you want to re-plan a
good portion of your queries.
Re: [JDBC] SOLVED ... Re: Getting rid of a cursor from JDBC .... Re: [PERFORM] Re: HELP: How to tame the 8.3.x JDBC driver with a big query result set
>> On Tue, Apr 20, 2010 at 5:05 PM, Kris Jurka wrote:
>>> ... There are three ways to plan a PreparedStatement:

FWIW, I think there is some consensus to experiment (in the 9.1 cycle) with
making the server automatically try replanning of parameterized queries with
the actual parameter values substituted. It'll keep doing so if it finds
that that produces a significantly better plan than the generic
parameterized plan; which is what you'd expect if there's a chance to
optimize a LIKE search, eliminate partitions, etc.

			regards, tom lane
Re: [JDBC] SOLVED ... Re: Getting rid of a cursor from JDBC .... Re: [PERFORM] Re: HELP: How to tame the 8.3.x JDBC driver with a big query result set
On Wed, Apr 21, 2010 at 11:30 AM, Tom Lane wrote:
>>> On Tue, Apr 20, 2010 at 5:05 PM, Kris Jurka wrote:
>>>> ... There are three ways to plan a PreparedStatement:
>
> FWIW, I think there is some consensus to experiment (in the 9.1 cycle)
> with making the server automatically try replanning of parameterized
> queries with the actual parameter values substituted. It'll keep doing
> so if it finds that that produces a significantly better plan than the
> generic parameterized plan; which is what you'd expect if there's a
> chance to optimize a LIKE search, eliminate partitions, etc.
>
>			regards, tom lane

That'd be wonderful.
Re: [JDBC] SOLVED ... Re: Getting rid of a cursor from JDBC .... Re: [PERFORM] Re: HELP: How to tame the 8.3.x JDBC driver with a big query result set
On Wed, 21 Apr 2010, Robert Haas wrote:

> On Tue, Apr 20, 2010 at 5:05 PM, Kris Jurka wrote:
>> b) Using the parameter values for statistics, but not making any stronger
>> guarantees about them. So the parameters will be used for evaluating the
>> selectivity, but not to perform other optimizations like
>> constraint_exclusion or transforming a LIKE operation to a range query.
>> This is the default plan type the JDBC driver uses.
>
> Hmm. I didn't think this was possible. How are you doing this?

This is only possible at the protocol level; it's not available using SQL
commands only. You do this by creating an unnamed instead of a named
statement:

http://www.postgresql.org/docs/8.4/static/protocol-flow.html#PROTOCOL-FLOW-EXT-QUERY

	Query planning for named prepared-statement objects occurs when the
	Parse message is processed. If a query will be repeatedly executed
	with different parameters, it might be beneficial to send a single
	Parse message containing a parameterized query, followed by multiple
	Bind and Execute messages. This will avoid replanning the query on
	each execution.

	The unnamed prepared statement is likewise planned during Parse
	processing if the Parse message defines no parameters. But if there
	are parameters, query planning occurs during Bind processing instead.
	This allows the planner to make use of the actual values of the
	parameters provided in the Bind message when planning the query.

Kris Jurka
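A side note for JDBC users: as I understand the driver's documentation, the
switch from the unnamed statement to a named server-side statement is
governed by the driver's prepareThreshold setting. A sketch under that
assumption (connection details are placeholders; check the driver
documentation for the exact threshold semantics in your version):

    // Sketch: keeping a PreparedStatement on the unnamed statement so that
    // planning happens at Bind time with real parameter values, rather than
    // switching to a generically planned named statement.
    import java.sql.*;
    import org.postgresql.PGStatement;

    public class PrepareThresholdExample {
        public static void main(String[] args) throws SQLException {
            // prepareThreshold can be set connection-wide via the URL...
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/mydb?prepareThreshold=5",
                    "user", "secret");
            PreparedStatement ps = conn.prepareStatement(
                    "SELECT * FROM orders WHERE customer_id = ?");
            // ...or per statement. The intent of 0 here is to avoid ever
            // switching to a named statement; verify against your driver docs.
            ((PGStatement) ps).setPrepareThreshold(0);
            ps.setInt(1, 42);
            ResultSet rs = ps.executeQuery();
            while (rs.next()) { /* ... */ }
            rs.close();
            ps.close();
            conn.close();
        }
    }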
Re: [JDBC] SOLVED ... Re: Getting rid of a cursor from JDBC .... Re: [PERFORM] Re: HELP: How to tame the 8.3.x JDBC driver with a big query result set
On Wed, 21 Apr 2010, Nikolas Everett wrote:

> More to the point, is there some option that can shift you into method (a)?
> I'm thinking of warehousing-type applications where you want to re-plan a
> good portion of your queries.

This can be done by connecting to the database using the V2 protocol (use
URL option protocolVersion=2). This does remove some functionality of the
driver that is only available for the V3 protocol, but will work just fine
for query execution.

Kris Jurka
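Concretely, that is just a connection-URL change. A minimal sketch, with
placeholder host, database, and credentials:

    // Sketch: forcing the V2 protocol so parameter values are substituted
    // directly into the query text (method (a) above).
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    public class V2ProtocolExample {
        public static void main(String[] args) throws SQLException {
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/warehouse?protocolVersion=2",
                    "user", "secret");
            // Every statement on this connection is now re-planned with its
            // actual parameter values, at the cost of the V3-only features
            // Kris mentions.
            conn.close();
        }
    }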
[PERFORM] Replacing Cursors with Temporary Tables
I have previously discussed my very long PL/PGSQL stored procedure on this
list. However, without getting into too many details, I have another
performance-related question.

The procedure currently uses cursors to return multiple result sets to the
program executing the procedure. Basically, I do this:

BEGIN;
SELECT * FROM stored_proc();
FETCH ALL FROM cursor1;
FETCH ALL FROM cursor2;
FETCH ALL FROM cursor3;
-- etc.
COMMIT;

However, there are some cases in the stored procedure where some of the
result sets returned by these cursors are also needed as inputs to
additional queries. To use them, I am currently doing:

FOR temp IN cursorX LOOP
  -- Some code that pushes the current temp record onto the end of an array
END LOOP;
OPEN cursorX;
MOVE FIRST FROM cursorX;

Then, when I need to use the results in a query, I do something like:

SELECT * FROM table1
INNER JOIN (SELECT * FROM unnest(result_array)) AS table2 ON ( blah blah )
WHERE blah

This seems extremely inefficient to me. First, I'm not sure of the penalty
for unnesting an array into a SET OF object. Second, the set of records
returned from unnesting would not be indexed for the join, which means a
sequential scan. Third, building the array in the first place using
array_append seems extremely inefficient. Fourth, opening the cursor twice
seems like it would execute the query twice, though given the placement and
context, it's probably got it cached somewhere (hopefully). I'm sure there
are probably other things I am overlooking.

Instead of doing things this way, I think using temporary tables is really
what I want. I am thinking that instead of doing this cursor BS, I can do
something like:

CREATE TEMPORARY TABLE table2 WITH (OIDS=FALSE) ON COMMIT DROP AS (
  SELECT * FROM blah blah blah -- whatever the cursor is defined as doing
);
ALTER TABLE table2 ADD PRIMARY KEY (id);
CREATE INDEX table2_blah_idx ON table2 USING btree (blah);
ANALYZE table2;

Then, when I need to use the results in another query, I could do:

SELECT * FROM table1 INNER JOIN table2 ON ( blah blah ) WHERE blah

This would use the indexes and the primary key appropriately. I could also
ensure that the order of the information in the temporary table is such that
it facilitates any joining, where clauses, or order by clauses on the
additional queries. Finally, to get results into my application, I would
then do:

BEGIN;
SELECT * FROM stored_procedure();
SELECT * FROM temp_table1;
SELECT * FROM temp_table2;
SELECT * FROM temp_table3;
-- etc.
COMMIT;

However, this is a fairly major re-write of how things are done. Before I
spend the time to do all that re-writing, can anyone share some insight on
where / how I might expect to gain performance from this conversion, and
also speak to some of the overhead (if any) in using temporary tables like
this (building them, creating indexes on them, analyzing them, then dropping
them on commit)?

It is worth mentioning that the data being stored in these temporary tables
is probably <1,000 records for all tables involved. Most will probably be
<100 records. Some of these temporary tables will be joined to other tables
up to 4 more times throughout the rest of the stored procedure. Most will be
generated and then retrieved only from outside the stored procedure.
Obviously, I would not create indexes on or analyze the temporary tables
being retrieved only from outside the stored procedure. Indexes and primary
keys will only be created on the tables that are joined to other tables and
have WHERE conditions applied to them.
I have done a lot of Googling on temporary tables and cursors in PostgreSQL,
but I have found only very limited discussion as to performance differences
with respect to how I'm planning on using them, and I am unsure about the
quality of the information given that most of it is 4+ years out of date and
posted on various expert exchanges and not on this pgsql-performance list.

One final question:

In this conversion to temporary table use, there are a couple of cases where
I would prefer to do something like:

PREPARE blah(blah blah) AS SELECT blah;

Then, I want to call this prepared statement multiple times, passing a
different argument value each time. The only reason to do this would be to
save writing code and to ensure that updating the select statement in one
place covers all places where it is used. However, I am concerned it might
incur a performance hit by re-preparing the query, since I assume that
having this inside the PL/PGSQL procedure means it is already prepared once.
Can anyone speak to this? I know that I could put it in a separate stored
procedure, but then the question becomes, does that add extra overhead? Or,
in different words, is it similar to the difference between an inlined
function and a non-inlined function in C?

I would greatly appreciate any insights to these questions/issues.

Thanks in advance for any assistance anyone can provide.

--
Eliot Gable

"We do not inherit the Earth from our ancestors: we borrow it from our
children." ~David Brower
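For concreteness, here is a hedged sketch of the BEGIN / SELECT
stored_procedure() / SELECT temp_tableN flow described above, as it might
look from a JDBC client; all table and procedure names are the placeholders
from the message. The essential point is that the procedure call and the
follow-up SELECTs must share one transaction, since the tables are ON
COMMIT DROP:

    // Sketch: consuming ON COMMIT DROP temp tables produced by a procedure.
    // All table and procedure names are placeholders.
    import java.sql.*;

    public class TempTableConsumer {
        public static void main(String[] args) throws SQLException {
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/mydb", "user", "secret");
            conn.setAutoCommit(false);  // BEGIN
            Statement st = conn.createStatement();
            st.execute("SELECT * FROM stored_procedure()");
            String[] tables = {"temp_table1", "temp_table2", "temp_table3"};
            for (String table : tables) {
                ResultSet rs = st.executeQuery("SELECT * FROM " + table);
                while (rs.next()) {
                    // consume each result set
                }
                rs.close();
            }
            st.close();
            conn.commit();  // COMMIT -- the temp tables are dropped here
            conn.close();
        }
    }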
[PERFORM] Re: [PERFORM] Dbt2 with postgres issues on CentOS-5.3
2010/4/20 MUHAMMAD ASIF :
> Hi,
>
> I am using dbt2 on Linux 64 (CentOS release 5.3 (Final)). I have compiled
> the latest postgresql-8.4.3 code on the machine and run dbt2 against it. I
> am a little confused about the results. I ran dbt2 with the following
> configuration, i.e.
>
> DBT2 Options:
>     WAREHOUSES=75
>     DB_CONNECTIONS=20
>     REGRESS_DURATION=1 #HOURS
>     REGRESS_DURATION_SEC=$((60*60*$REGRESS_DURATION))
>
> DBT2 Commands:
>     ./dbt2-pgsql-create-db
>     ./dbt2-pgsql-build-db -d $DBDATA -g -r -w $WAREHOUSES
>     ./dbt2-run-workload -a pgsql -c $DB_CONNECTIONS -d $REGRESS_DURATION_SEC -w $WAREHOUSES -o $OUTPUT_DIR
>     ./dbt2-pgsql-stop-db
>
> I am not able to understand the sar-related graphs. Iostat, mpstat and
> vmstat results are similar, but sar results are strange. I tried to
> explore the dbt2 source code to find out how the graphs are drawn and why
> the sar results differ. DBT2.pm:189 reads sar.out, parses it, and assumes
> 1 minute of elapsed time between each record, i.e.

That is certainly a weakness in the logic of the perl modules in plotting
the charts accurately. I wouldn't be surprised if the other stat tools
suffer the same problem.

Regards,
Mark
Re: [PERFORM] Replacing Cursors with Temporary Tables
I think it's really tough to say how this is going to perform. I'd recommend
constructing a couple of simplified test cases and benchmarking the heck out
of it.

One of the problems with temporary tables is that every time you create a
temporary table, it creates a (temporary) record in pg_class; that can get
to be a problem if you do it a lot. Another is that for non-trivial queries
you may need to do a manual ANALYZE on the table to get good stats for the
rest of the query to perform well. But on the flip side, as you say, nesting
and unnesting of arrays and function calls are not free either. I am going
to hazard a SWAG that the array implementation is faster UNLESS the lack of
good stats on the contents of the arrays is hosing the performance somewhere
down the road. But that is really just a total shot in the dark.

Another possible implementation might be to have a couple of permanent
tables where you store the results. Give each such table a "batch id"
column, and return the batch id from your stored procedure. This would
probably avoid a lot of the overhead associated with temp tables while
retaining many of the benefits.

...Robert
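A hedged sketch of that batch-id pattern from the client side; all names
here (stored_proc, results, batch_id) are hypothetical:

    // Sketch: the procedure writes results into a permanent table tagged
    // with a batch id and returns that id; the caller reads only its batch.
    import java.sql.*;

    public class BatchIdExample {
        public static void main(String[] args) throws SQLException {
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/mydb", "user", "secret");
            conn.setAutoCommit(false);

            Statement st = conn.createStatement();
            ResultSet rs = st.executeQuery("SELECT stored_proc()");
            rs.next();
            long batchId = rs.getLong(1);  // the batch id the procedure used
            rs.close();
            st.close();

            PreparedStatement ps = conn.prepareStatement(
                    "SELECT * FROM results WHERE batch_id = ?");
            ps.setLong(1, batchId);
            rs = ps.executeQuery();
            while (rs.next()) { /* consume this batch only */ }
            rs.close();
            ps.close();

            // The table is permanent, so clean up the batch when done.
            ps = conn.prepareStatement(
                    "DELETE FROM results WHERE batch_id = ?");
            ps.setLong(1, batchId);
            ps.executeUpdate();
            ps.close();

            conn.commit();
            conn.close();
        }
    }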
Re: [PERFORM] Very high effective_cache_size == worse performance?
On Apr 20, 2010, at 12:22 PM, Scott Marlowe wrote:
> On Tue, Apr 20, 2010 at 12:47 PM, David Kerr wrote:
>> On Tue, Apr 20, 2010 at 02:15:19PM -0400, Robert Haas wrote:
>>> On Tue, Apr 20, 2010 at 2:03 PM, David Kerr wrote:
>>>> that thought occurred to me while I was testing this. I ran a
>>>> vacuumdb -z on my database during the load and it didn't impact
>>>> performance at all.
>>>
>>> The window to run ANALYZE usefully is pretty short. If you run it
>>> before the load is complete, your stats will be wrong. If you run it
>>> after the select statements that hit the table are planned, the
>>> updated stats won't arrive in time to do any good.
>>
>> right, but i'm loading 20 million records in 1000 record increments. so
>> the analyze should affect all subsequent increments, no?
>
> I keep thinking FK checks are taking a long time because they aren't
> cached because in import they went through the ring buffer in pg or
> some other way aren't in a buffer, but large effective cache size says
> it's a 99.99% chance or better that it's in cache, and chooses a poor
> plan to look them up. Just a guess.

Yeah, I was thinking the same thing. If possible, make sure the table either
has no indexes and FKs, or only the minimum required (PK?), while doing the
load, then add the indexes and FKs later. Whether this is possible depends
on what the schema is and what must be known by the app to load the data,
but if you can do it it's a huge win. Of course, if it's not all in one
transaction and there is any other concurrency going on, that could be a bad
idea. Or, if this is not a load on a fresh table but an append/update, it
may not be possible to drop some of the indexes first.

Generally speaking, a load on a table without an index followed by index
creation is at least twice as fast, and often 5x as fast or more. This is
less true if each row is an individual insert and batching or
'insert into foo values (a, b, c, ...), (a2, b2, c2, ...)' multiple-row
syntax is not used.
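A hedged sketch of that load pattern over JDBC, assuming a hypothetical
table foo(a, b, c): rows go in batched and index-less, and the index is
built only afterwards:

    // Sketch: batched inserts into an index-less table, then index creation.
    import java.sql.*;
    import java.util.List;

    public class BulkLoadExample {
        public static void load(Connection conn, List<int[]> rows)
                throws SQLException {
            conn.setAutoCommit(false);
            PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO foo (a, b, c) VALUES (?, ?, ?)");
            int n = 0;
            for (int[] row : rows) {
                ps.setInt(1, row[0]);
                ps.setInt(2, row[1]);
                ps.setInt(3, row[2]);
                ps.addBatch();
                if (++n % 1000 == 0) {
                    ps.executeBatch();  // send 1000 inserts per round trip
                }
            }
            ps.executeBatch();  // flush the remainder
            ps.close();

            Statement st = conn.createStatement();
            st.execute("CREATE INDEX foo_a_idx ON foo (a)");  // index after load
            st.close();
            conn.commit();
        }
    }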
Re: [PERFORM] Replacing Cursors with Temporary Tables
On Apr 21, 2010, at 1:16 PM, Eliot Gable wrote:
> I have previously discussed my very long PL/PGSQL stored procedure on this
> list. However, without getting into too many details, I have another
> performance-related question.
>
> The procedure currently uses cursors to return multiple result sets to the
> program executing the procedure. Basically, I do this:
>
> CREATE TEMPORARY TABLE table2 WITH (OIDS=FALSE) ON COMMIT DROP AS (
>   SELECT * FROM blah blah blah -- whatever the cursor is defined as doing
> );
> ALTER TABLE table2 ADD PRIMARY KEY (id);
> CREATE INDEX table2_blah_idx ON table2 USING btree (blah);
> ANALYZE table2;
>
> Then, when I need to use the results in another query, I could do:
>
> SELECT * FROM table1 INNER JOIN table2 ON ( blah blah ) WHERE blah
>
> This would use the indexes and the primary key appropriately. I could also
> ensure that the order of the information in the temporary table is such
> that it facilitates any joining, where clauses, or order by clauses on the
> additional queries. Finally, to get results into my application, I would
> then do:

I have had good luck with temp tables, but beware -- there isn't anything
special about them performance-wise: they do as much I/O as a real table,
with no optimizations that exploit the fact that a table dropped on commit
doesn't have to be as fail-safe as an ordinary one. Even so, a quick

CREATE TABLE foo ON COMMIT DROP AS (SELECT ...);
ANALYZE foo;
SELECT FROM foo JOIN bar ... ;

can be very effective for performance. However, creating the indexes above
is going to slow it down a lot. Most likely, the join with a seqscan will be
faster than an index build followed by the join. After all, in order to
build the index it has to seqscan! If you are consuming these tables for
many later select queries rather than just one or two, building the index
might help. Otherwise it's just a lot of extra work. I suggest you
experiment with the performance differences using psql on a specific use
case on real data.

> One final question:
>
> In this conversion to temporary table use, there are a couple of cases
> where I would prefer to do something like:
>
> PREPARE blah(blah blah) AS SELECT blah;
>
> Then, I want to call this prepared statement multiple times, passing a
> different argument value each time. The only reason to do this would be to
> save writing code and to ensure that updating the select statement in one
> place covers all places where it is used. However, I am concerned it might
> incur a performance hit by re-preparing the query, since I assume that
> having this inside the PL/PGSQL procedure means it is already prepared
> once. Can anyone speak to this? I know that I could put it in a separate
> stored procedure, but then the question becomes, does that add extra
> overhead? Or, in different words, is it similar to the difference between
> an inlined function and a non-inlined function in C?

I can't speak for the details in your question, but it brings up a different
issue I can speak to: prepared statements usually cause the planner to
create a generic query plan for all possible inputs. For some queries where
the parameters can significantly influence the query plan, this can be a big
performance drop. For other queries (particularly inserts or simple selects
on PKs) the cached plan saves time.

> I would greatly appreciate any insights to these questions/issues.
>
> Thanks in advance for any assistance anyone can provide.