Re: [PERFORM] significant slow down with various LIMIT
I wrote:
> ALTER TABLE ALTER plugins_guide_address
>   ALTER COLUMN city_id SET STATISTICS 1000;

One too many ALTERs in there. Should be:

ALTER TABLE plugins_guide_address
  ALTER COLUMN city_id SET STATISTICS 1000;

-Kevin
Re: [JDBC] SOLVED ... Re: Getting rid of a cursor from JDBC .... Re: [PERFORM] Re: HELP: How to tame the 8.3.x JDBC driver with a big query result set
On Tue, Apr 20, 2010 at 5:05 PM, Kris Jurka wrote:
> The reason this is not done is that the mechanism used for fetching a piece
> of the results at a time can change the query plan used if using a
> PreparedStatement. There are three ways to plan a PreparedStatement:
>
> a) Using the exact parameter values by substituting them directly into the
> query. This isn't really "planned" as you can't re-use it at all. This is
> only available using the V2 protocol.
>
> b) Using the parameter values for statistics, but not making any stronger
> guarantees about them. So the parameters will be used for evaluating the
> selectivity, but not to perform other optimizations like
> constraint_exclusion or transforming a LIKE operation to a range query.
> This is the default plan type the JDBC driver uses.

Hmm. I didn't think this was possible. How are you doing this?

> c) Planning the query with no regard for the parameters passed to it. This
> is the plan type the JDBC driver uses when it sees the same
> PreparedStatement being re-used multiple times or when it is respecting
> setFetchSize and allowing for partial results.
>
> We must use (c) for partial results instead of (b) because of some
> limitations of the server. Currently you cannot have two statements of type
> (b) open on the same connection. So since the driver can't know if the user
> will issue another query before fetching the remainder of the first query's
> results, it must set up the first query to be of type (c) so that multiple
> statements can exist simultaneously.
>
> Switching the default plan type to (c) will cause a significant number of
> complaints as performance on some queries will go into the tank. Perhaps we
> could have a default fetchSize for plain Statements as it won't affect the
> plan. I could also see making this a URL parameter though so it could be
> set as the default with only a configuration, not a code change.

...Robert
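For readers following along, here is a minimal sketch of the partial-result
fetching under discussion. The connection URL, credentials, and the
big_table query are placeholders; setAutoCommit, setFetchSize, and
executeQuery are standard JDBC calls. Note that, per Kris's explanation,
requesting partial results this way is exactly what pushes the driver from
plan type (b) to plan type (c):

    // Sketch: fetching a large result set a piece at a time.
    // Connection details and the query are placeholders.
    import java.sql.*;

    public class FetchSizeExample {
        public static void main(String[] args) throws SQLException {
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/mydb", "user", "secret");
            // Partial-result fetching only works outside autocommit, because
            // the underlying portal must survive between fetches.
            conn.setAutoCommit(false);
            PreparedStatement ps = conn.prepareStatement(
                    "SELECT * FROM big_table WHERE flag = ?");
            ps.setBoolean(1, true);
            ps.setFetchSize(1000);  // fetch 1000 rows per round trip, not all at once
            ResultSet rs = ps.executeQuery();
            while (rs.next()) {
                // process each row without holding the whole result set in memory
            }
            rs.close();
            ps.close();
            conn.commit();
            conn.close();
        }
    }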
Re: [JDBC] SOLVED ... Re: Getting rid of a cursor from JDBC .... Re: [PERFORM] Re: HELP: How to tame the 8.3.x JDBC driver with a big query result set
On Wed, Apr 21, 2010 at 10:41 AM, Robert Haas wrote:
> On Tue, Apr 20, 2010 at 5:05 PM, Kris Jurka wrote:
>> The reason this is not done is that the mechanism used for fetching a
>> piece of the results at a time can change the query plan used if using a
>> PreparedStatement. There are three ways to plan a PreparedStatement:
>>
>> a) Using the exact parameter values by substituting them directly into
>> the query. This isn't really "planned" as you can't re-use it at all.
>> This is only available using the V2 protocol.
>>
>> b) Using the parameter values for statistics, but not making any stronger
>> guarantees about them. So the parameters will be used for evaluating the
>> selectivity, but not to perform other optimizations like
>> constraint_exclusion or transforming a LIKE operation to a range query.
>> This is the default plan type the JDBC driver uses.
>
> Hmm. I didn't think this was possible. How are you doing this?

More to the point, is there some option that can shift you into method (a)?
I'm thinking of warehousing-type applications where you want to re-plan a
good portion of your queries.
Re: [JDBC] SOLVED ... Re: Getting rid of a cursor from JDBC .... Re: [PERFORM] Re: HELP: How to tame the 8.3.x JDBC driver with a big query result set
>> On Tue, Apr 20, 2010 at 5:05 PM, Kris Jurka wrote:
>>> ... There are three ways to plan a PreparedStatement:

FWIW, I think there is some consensus to experiment (in the 9.1 cycle) with
making the server automatically try replanning of parameterized queries with
the actual parameter values substituted. It'll keep doing so if it finds
that that produces a significantly better plan than the generic
parameterized plan; which is what you'd expect if there's a chance to
optimize a LIKE search, eliminate partitions, etc.

			regards, tom lane
Re: [JDBC] SOLVED ... Re: Getting rid of a cursor from JDBC .... Re: [PERFORM] Re: HELP: How to tame the 8.3.x JDBC driver with a big query result set
On Wed, Apr 21, 2010 at 11:30 AM, Tom Lane wrote:
>>> On Tue, Apr 20, 2010 at 5:05 PM, Kris Jurka wrote:
>>>> ... There are three ways to plan a PreparedStatement:
>
> FWIW, I think there is some consensus to experiment (in the 9.1 cycle)
> with making the server automatically try replanning of parameterized
> queries with the actual parameter values substituted. It'll keep doing
> so if it finds that that produces a significantly better plan than the
> generic parameterized plan; which is what you'd expect if there's a
> chance to optimize a LIKE search, eliminate partitions, etc.
>
>			regards, tom lane

That'd be wonderful.
Re: [JDBC] SOLVED ... Re: Getting rid of a cursor from JDBC .... Re: [PERFORM] Re: HELP: How to tame the 8.3.x JDBC driver with a big query result set
On Wed, 21 Apr 2010, Robert Haas wrote:

> On Tue, Apr 20, 2010 at 5:05 PM, Kris Jurka wrote:
>> b) Using the parameter values for statistics, but not making any stronger
>> guarantees about them. So the parameters will be used for evaluating the
>> selectivity, but not to perform other optimizations like
>> constraint_exclusion or transforming a LIKE operation to a range query.
>> This is the default plan type the JDBC driver uses.
>
> Hmm. I didn't think this was possible. How are you doing this?

This is only possible at the protocol level; it's not available using SQL
commands only. You do this by creating an unnamed instead of a named
statement:

http://www.postgresql.org/docs/8.4/static/protocol-flow.html#PROTOCOL-FLOW-EXT-QUERY

	Query planning for named prepared-statement objects occurs when the
	Parse message is processed. If a query will be repeatedly executed
	with different parameters, it might be beneficial to send a single
	Parse message containing a parameterized query, followed by multiple
	Bind and Execute messages. This will avoid replanning the query on
	each execution.

	The unnamed prepared statement is likewise planned during Parse
	processing if the Parse message defines no parameters. But if there
	are parameters, query planning occurs during Bind processing instead.
	This allows the planner to make use of the actual values of the
	parameters provided in the Bind message when planning the query.

Kris Jurka
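A side note for JDBC users: as I understand the driver's documentation, the
switch from the unnamed statement to a named server-side statement is
governed by the driver's prepareThreshold setting. A sketch under that
assumption (connection details are placeholders; check the driver
documentation for the exact threshold semantics in your version):

    // Sketch: keeping a PreparedStatement on the unnamed statement so that
    // planning happens at Bind time with real parameter values, rather than
    // switching to a generically planned named statement.
    import java.sql.*;
    import org.postgresql.PGStatement;

    public class PrepareThresholdExample {
        public static void main(String[] args) throws SQLException {
            // prepareThreshold can be set connection-wide via the URL...
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/mydb?prepareThreshold=5",
                    "user", "secret");
            PreparedStatement ps = conn.prepareStatement(
                    "SELECT * FROM orders WHERE customer_id = ?");
            // ...or per statement. The intent of 0 here is to avoid ever
            // switching to a named statement; verify against your driver docs.
            ((PGStatement) ps).setPrepareThreshold(0);
            ps.setInt(1, 42);
            ResultSet rs = ps.executeQuery();
            while (rs.next()) { /* ... */ }
            rs.close();
            ps.close();
            conn.close();
        }
    }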
Re: [JDBC] SOLVED ... Re: Getting rid of a cursor from JDBC .... Re: [PERFORM] Re: HELP: How to tame the 8.3.x JDBC driver with a big query result set
On Wed, 21 Apr 2010, Nikolas Everett wrote:

> More to the point, is there some option that can shift you into method (a)?
> I'm thinking of warehousing-type applications where you want to re-plan a
> good portion of your queries.

This can be done by connecting to the database using the V2 protocol (use
URL option protocolVersion=2). This does remove some functionality of the
driver that is only available for the V3 protocol, but will work just fine
for query execution.

Kris Jurka
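Concretely, that is just a connection-URL change. A minimal sketch, with
placeholder host, database, and credentials:

    // Sketch: forcing the V2 protocol so parameter values are substituted
    // directly into the query text (method (a) above).
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    public class V2ProtocolExample {
        public static void main(String[] args) throws SQLException {
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/warehouse?protocolVersion=2",
                    "user", "secret");
            // Every statement on this connection is now re-planned with its
            // actual parameter values, at the cost of the V3-only features
            // Kris mentions.
            conn.close();
        }
    }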
[PERFORM] Replacing Cursors with Temporary Tables
I have previously discussed my very long PL/PGSQL stored procedure on this
list. However, without getting into too many details, I have another
performance-related question.

The procedure currently uses cursors to return multiple result sets to the
program executing the procedure. Basically, I do this:

BEGIN;
SELECT * FROM stored_proc();
FETCH ALL FROM cursor1;
FETCH ALL FROM cursor2;
FETCH ALL FROM cursor3;
-- etc.
COMMIT;

However, there are some cases in the stored procedure where some of the
result sets returned by these cursors are also needed as inputs to
additional queries. To use them, I am currently doing:

FOR temp IN cursorX LOOP
  -- Some code that pushes the current temp record onto the end of an array
END LOOP;
OPEN cursorX;
MOVE FIRST FROM cursorX;

Then, when I need to use the results in a query, I do something like:

SELECT * FROM table1
INNER JOIN (SELECT * FROM unnest(result_array)) AS table2 ON ( blah blah )
WHERE blah

This seems extremely inefficient to me. First, I'm not sure of the penalty
for unnesting an array into a SET OF object. Second, the set of records
returned from unnesting would not be indexed for the join, which means a
sequential scan. Third, building the array in the first place using
array_append seems extremely inefficient. Fourth, opening the cursor twice
seems like it would execute the query twice, though given the placement and
context, it's probably got it cached somewhere (hopefully). I'm sure there
are probably other things I am overlooking.

Instead of doing things this way, I think using temporary tables is really
what I want. I am thinking that instead of doing this cursor BS, I can do
something like:

CREATE TEMPORARY TABLE table2 WITH (OIDS=FALSE) ON COMMIT DROP AS (
  SELECT * FROM blah blah blah -- whatever the cursor is defined as doing
);
ALTER TABLE table2 ADD PRIMARY KEY (id);
CREATE INDEX table2_blah_idx ON table2 USING btree (blah);
ANALYZE table2;

Then, when I need to use the results in another query, I could do:

SELECT * FROM table1 INNER JOIN table2 ON ( blah blah ) WHERE blah

This would use the indexes and the primary key appropriately. I could also
ensure that the order of the information in the temporary table is such that
it facilitates any joining, where clauses, or order by clauses on the
additional queries. Finally, to get results into my application, I would
then do:

BEGIN;
SELECT * FROM stored_procedure();
SELECT * FROM temp_table1;
SELECT * FROM temp_table2;
SELECT * FROM temp_table3;
-- etc.
COMMIT;

However, this is a fairly major re-write of how things are done. Before I
spend the time to do all that re-writing, can anyone share some insight on
where / how I might expect to gain performance from this conversion, and
also speak to some of the overhead (if any) in using temporary tables like
this (building them, creating indexes on them, analyzing them, then dropping
them on commit)?

It is worth mentioning that the data being stored in these temporary tables
is probably <1,000 records for all tables involved. Most will probably be
<100 records. Some of these temporary tables will be joined to other tables
up to 4 more times throughout the rest of the stored procedure. Most will be
generated and then retrieved only from outside the stored procedure.
Obviously, I would not create indexes on or analyze the temporary tables
being retrieved only from outside the stored procedure. Indexes and primary
keys will only be created on the tables that are joined to other tables and
have WHERE conditions applied to them.
I have done a lot of Googling on temporary tables and cursors in PostgreSQL,
but I have found only very limited discussion as to performance differences
with respect to how I'm planning on using them, and I am unsure about the
quality of the information given that most of it is 4+ years out of date and
posted on various expert exchanges and not on this pgsql-performance list.

One final question:

In this conversion to temporary table use, there are a couple of cases where
I would prefer to do something like:

PREPARE blah(blah blah) AS SELECT blah;

Then, I want to call this prepared statement multiple times, passing a
different argument value each time. The only reason to do this would be to
save writing code and to ensure that updating the select statement in one
place covers all places where it is used. However, I am concerned it might
incur a performance hit by re-preparing the query, since I assume that
having this inside the PL/PGSQL procedure means it is already prepared once.
Can anyone speak to this? I know that I could put it in a separate stored
procedure, but then the question becomes, does that add extra overhead? Or,
in different words, is it similar to the difference between an inlined
function and a non-inlined function in C?

I would greatly appreciate any insights to these questions/issues.

Thanks in advance for any assistance anyone can provide.

--
Eliot Gable

"We do not inherit the Earth from our ancestors: we borrow it from our
children." ~David Brower
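For concreteness, here is a hedged sketch of the BEGIN / SELECT
stored_procedure() / SELECT temp_tableN flow described above, as it might
look from a JDBC client; all table and procedure names are the placeholders
from the message. The essential point is that the procedure call and the
follow-up SELECTs must share one transaction, since the tables are ON
COMMIT DROP:

    // Sketch: consuming ON COMMIT DROP temp tables produced by a procedure.
    // All table and procedure names are placeholders.
    import java.sql.*;

    public class TempTableConsumer {
        public static void main(String[] args) throws SQLException {
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/mydb", "user", "secret");
            conn.setAutoCommit(false);  // BEGIN
            Statement st = conn.createStatement();
            st.execute("SELECT * FROM stored_procedure()");
            String[] tables = {"temp_table1", "temp_table2", "temp_table3"};
            for (String table : tables) {
                ResultSet rs = st.executeQuery("SELECT * FROM " + table);
                while (rs.next()) {
                    // consume each result set
                }
                rs.close();
            }
            st.close();
            conn.commit();  // COMMIT -- the temp tables are dropped here
            conn.close();
        }
    }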
[PERFORM] Re: [PERFORM] Dbt2 with postgres issues on CentOS-5.3
2010/4/20 MUHAMMAD ASIF :
> Hi,
>
> I am using dbt2 on Linux 64 (CentOS release 5.3 (Final)). I have compiled
> the latest postgresql-8.4.3 code on the machine and run dbt2 against it. I
> am a little confused about the results. I ran dbt2 with the following
> configuration, i.e.
>
> DBT2 Options:
>     WAREHOUSES=75
>     DB_CONNECTIONS=20
>     REGRESS_DURATION=1 #HOURS
>     REGRESS_DURATION_SEC=$((60*60*$REGRESS_DURATION))
>
> DBT2 Commands:
>     ./dbt2-pgsql-create-db
>     ./dbt2-pgsql-build-db -d $DBDATA -g -r -w $WAREHOUSES
>     ./dbt2-run-workload -a pgsql -c $DB_CONNECTIONS -d $REGRESS_DURATION_SEC -w $WAREHOUSES -o $OUTPUT_DIR
>     ./dbt2-pgsql-stop-db
>
> I am not able to understand the sar-related graphs. Iostat, mpstat and
> vmstat results are similar, but sar results are strange. I tried to
> explore the dbt2 source code to find out how the graphs are drawn and why
> the sar results differ. DBT2.pm:189 reads sar.out, parses it, and assumes
> 1 minute of elapsed time between each record, i.e.

That is certainly a weakness in the logic of the perl modules in plotting
the charts accurately. I wouldn't be surprised if the other stat tools
suffer the same problem.

Regards,
Mark
Re: [PERFORM] Replacing Cursors with Temporary Tables
I think it's really tough to say how this is going to perform. I'd recommend
constructing a couple of simplified test cases and benchmarking the heck out
of it.

One of the problems with temporary tables is that every time you create a
temporary table, it creates a (temporary) record in pg_class; that can get
to be a problem if you do it a lot. Another is that for non-trivial queries
you may need to do a manual ANALYZE on the table to get good stats for the
rest of the query to perform well. But on the flip side, as you say, nesting
and unnesting of arrays and function calls are not free either. I am going
to hazard a SWAG that the array implementation is faster UNLESS the lack of
good stats on the contents of the arrays is hosing the performance somewhere
down the road. But that is really just a total shot in the dark.

Another possible implementation might be to have a couple of permanent
tables where you store the results. Give each such table a "batch id"
column, and return the batch id from your stored procedure. This would
probably avoid a lot of the overhead associated with temp tables while
retaining many of the benefits.

...Robert
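A hedged sketch of that batch-id pattern from the client side; all names
here (stored_proc, results, batch_id) are hypothetical:

    // Sketch: the procedure writes results into a permanent table tagged
    // with a batch id and returns that id; the caller reads only its batch.
    import java.sql.*;

    public class BatchIdExample {
        public static void main(String[] args) throws SQLException {
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/mydb", "user", "secret");
            conn.setAutoCommit(false);

            Statement st = conn.createStatement();
            ResultSet rs = st.executeQuery("SELECT stored_proc()");
            rs.next();
            long batchId = rs.getLong(1);  // the batch id the procedure used
            rs.close();
            st.close();

            PreparedStatement ps = conn.prepareStatement(
                    "SELECT * FROM results WHERE batch_id = ?");
            ps.setLong(1, batchId);
            rs = ps.executeQuery();
            while (rs.next()) { /* consume this batch only */ }
            rs.close();
            ps.close();

            // The table is permanent, so clean up the batch when done.
            ps = conn.prepareStatement(
                    "DELETE FROM results WHERE batch_id = ?");
            ps.setLong(1, batchId);
            ps.executeUpdate();
            ps.close();

            conn.commit();
            conn.close();
        }
    }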
Re: [PERFORM] Very high effective_cache_size == worse performance?
On Apr 20, 2010, at 12:22 PM, Scott Marlowe wrote:
> On Tue, Apr 20, 2010 at 12:47 PM, David Kerr wrote:
>> On Tue, Apr 20, 2010 at 02:15:19PM -0400, Robert Haas wrote:
>>> On Tue, Apr 20, 2010 at 2:03 PM, David Kerr wrote:
>>>> that thought occurred to me while I was testing this. I ran a
>>>> vacuumdb -z on my database during the load and it didn't impact
>>>> performance at all.
>>>
>>> The window to run ANALYZE usefully is pretty short. If you run it
>>> before the load is complete, your stats will be wrong. If you run it
>>> after the select statements that hit the table are planned, the
>>> updated stats won't arrive in time to do any good.
>>
>> right, but i'm loading 20 million records in 1000 record increments. so
>> the analyze should affect all subsequent increments, no?
>
> I keep thinking FK checks are taking a long time because they aren't
> cached because in import they went through the ring buffer in pg or
> some other way aren't in a buffer, but large effective cache size says
> it's a 99.99% chance or better that it's in cache, and chooses a poor
> plan to look them up. Just a guess.

Yeah, I was thinking the same thing. If possible, make sure the table either
has no indexes and FKs, or only the minimum required (PK?), while doing the
load, then add the indexes and FKs later. Whether this is possible depends
on what the schema is and what must be known by the app to load the data,
but if you can do it it's a huge win. Of course, if it's not all in one
transaction and there is any other concurrency going on, that could be a bad
idea. Or, if this is not a load on a fresh table but an append/update, it
may not be possible to drop some of the indexes first.

Generally speaking, a load on a table without an index followed by index
creation is at least twice as fast, and often 5x as fast or more. This is
less true if each row is an individual insert and batching or
'insert into foo values (a, b, c, ...), (a2, b2, c2, ...)' multiple-row
syntax is not used.
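A hedged sketch of that load pattern over JDBC, assuming a hypothetical
table foo(a, b, c): rows go in batched and index-less, and the index is
built only afterwards:

    // Sketch: batched inserts into an index-less table, then index creation.
    import java.sql.*;
    import java.util.List;

    public class BulkLoadExample {
        public static void load(Connection conn, List<int[]> rows)
                throws SQLException {
            conn.setAutoCommit(false);
            PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO foo (a, b, c) VALUES (?, ?, ?)");
            int n = 0;
            for (int[] row : rows) {
                ps.setInt(1, row[0]);
                ps.setInt(2, row[1]);
                ps.setInt(3, row[2]);
                ps.addBatch();
                if (++n % 1000 == 0) {
                    ps.executeBatch();  // send 1000 inserts per round trip
                }
            }
            ps.executeBatch();  // flush the remainder
            ps.close();

            Statement st = conn.createStatement();
            st.execute("CREATE INDEX foo_a_idx ON foo (a)");  // index after load
            st.close();
            conn.commit();
        }
    }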
Re: [PERFORM] Replacing Cursors with Temporary Tables
On Apr 21, 2010, at 1:16 PM, Eliot Gable wrote:
> I have previously discussed my very long PL/PGSQL stored procedure on this
> list. However, without getting into too many details, I have another
> performance-related question.
>
> The procedure currently uses cursors to return multiple result sets to the
> program executing the procedure. Basically, I do this:
>
> CREATE TEMPORARY TABLE table2 WITH (OIDS=FALSE) ON COMMIT DROP AS (
>   SELECT * FROM blah blah blah -- whatever the cursor is defined as doing
> );
> ALTER TABLE table2 ADD PRIMARY KEY (id);
> CREATE INDEX table2_blah_idx ON table2 USING btree (blah);
> ANALYZE table2;
>
> Then, when I need to use the results in another query, I could do:
>
> SELECT * FROM table1 INNER JOIN table2 ON ( blah blah ) WHERE blah
>
> This would use the indexes and the primary key appropriately. I could also
> ensure that the order of the information in the temporary table is such
> that it facilitates any joining, where clauses, or order by clauses on the
> additional queries. Finally, to get results into my application, I would
> then do:

I have had good luck with temp tables, but beware -- there isn't anything
special about them performance-wise: they do as much I/O as a real table,
with no optimizations that exploit the fact that a table dropped on commit
doesn't have to be as fail-safe as an ordinary one. Even so, a quick

CREATE TABLE foo ON COMMIT DROP AS (SELECT ...);
ANALYZE foo;
SELECT FROM foo JOIN bar ... ;

can be very effective for performance. However, creating the indexes above
is going to slow it down a lot. Most likely, the join with a seqscan will be
faster than an index build followed by the join. After all, in order to
build the index it has to seqscan! If you are consuming these tables for
many later select queries rather than just one or two, building the index
might help. Otherwise it's just a lot of extra work. I suggest you
experiment with the performance differences using psql on a specific use
case on real data.

> One final question:
>
> In this conversion to temporary table use, there are a couple of cases
> where I would prefer to do something like:
>
> PREPARE blah(blah blah) AS SELECT blah;
>
> Then, I want to call this prepared statement multiple times, passing a
> different argument value each time. The only reason to do this would be to
> save writing code and to ensure that updating the select statement in one
> place covers all places where it is used. However, I am concerned it might
> incur a performance hit by re-preparing the query, since I assume that
> having this inside the PL/PGSQL procedure means it is already prepared
> once. Can anyone speak to this? I know that I could put it in a separate
> stored procedure, but then the question becomes, does that add extra
> overhead? Or, in different words, is it similar to the difference between
> an inlined function and a non-inlined function in C?

I can't speak for the details in your question, but it brings up a different
issue I can speak to: prepared statements usually cause the planner to
create a generic query plan for all possible inputs. For some queries where
the parameters can significantly influence the query plan, this can be a big
performance drop. For other queries (particularly inserts or simple selects
on PKs) the cached plan saves time.

> I would greatly appreciate any insights to these questions/issues.
>
> Thanks in advance for any assistance anyone can provide.