[HACKERS] Table and Index compression
With the talk about adding compression to pg_dump lately, I've been wondering if tables and indexes could be compressed too. So I've implemented a quick on-the-fly compression patch for postgres. Sorry for the long email, but I hope you find this interesting.

Why compress?

1- To save disk space? Disks are so cheap now that this alone is not a valid reason. Besides, anyone with a database big enough to fill a modern hard disk probably also has a monster RAID to get more IO throughput.

2- To make the database faster? This would be a valid reason. If the database is made smaller through compression, this can have several interesting effects:
- you need less RAM to cache it entirely, so random IO hits the cache more often than before
- if the CPU can decompress faster than the disk can read, even seq scans can be faster (this will not be the case on monster RAID setups, but for cheap servers, it could be)

So, why not? I coded it.

I've benchmarked lzo, lzf, quicklz, lzjb, and fastLZ. The best for this purpose is lzo: very fast decompression, a good compression ratio on a sample of postgres tables and indexes, and a license that could work. QuickLZ compresses faster and better, but is not free.

Compression works on each page independently, so a single page is fast to decompress. This means 8k pages are a bit small; I used 32k pages to get better compression.

I've checked various table and index files:
- your average table with many columns, some TEXT, etc., compresses about 2x
- when the row header is significant relative to the row data, it can reach 3x
- indexes compress well too, 2-4x
- gist indexes could sometimes be compressed up to 6x

Roughly, 2-3x compression is achievable on a complete database.

Implementation

This lives in md.c. The implementation is about 100 lines of code (!). Instead of calling FileWrite and FileRead, md.c calls special functions that read and write compressed blocks.

* Writing: The block is compressed into a temporary memory buffer.
A header (prepended to the compressed data) gives the length of this data. The block is then written at its location in the disk file.

* Reading: The first 4k of compressed data is read. From the length header, we know how much more to read. The rest of the data (if any) is read, and the data is decompressed.

Since a compressed block can be larger than the original block, I have enlarged the block size in the files by 4k, so that there is a block every 40k instead of every 32k (with 32k postgres pages). That's it, very simple.

Now, the reason it works is that the underlying file is handled as sparse by the OS. The holes between compressed blocks are removed: not recorded on disk, and never cached either. However, sparse files can have big performance problems if you do this:
- write a small block in the middle of the file, surrounded by holes
- later enlarge this block

When the block is enlarged and needs an extra filesystem page, that page will not be allocated contiguously. When it is read later, it will need an extra seek, which is really bad.

So, looking at the compression statistics:
- gist index for geometric coordinates search: a 32k page compresses to 1-2 4k-pages, very rarely 3
- btree indexes: a 32k page compresses to 2-3 4k-pages
- large table: a 32k page compresses to 2-4 4k-pages

Therefore, on write, I pre-allocate some space in the sparse file by writing more than needed: currently I write 5 4k-blocks. Whatever is written after the compressed data is garbage previously in the buffer; it is ignored on reads. This means the disk space savings are less than full compression would give, but access is much smoother, in fact much like a regular non-sparse file, since the blocks between the holes almost never need to grow. Without pre-allocating, performance is abysmal, not even worth talking about.

Pre-allocated but never used blocks are never read, except perhaps by OS readahead during a seq scan.
On a heavily random-access database they will not be touched, so they waste no space in the OS cache. shared_buffers thus contains decompressed blocks: a row that is updated very often will not go through decompression-compression cycles each time. The OS cache contains compressed data.

Some tests: It appears to behave as expected. It hasn't crashed (yet...). Basically it looks like RAM has doubled and CPU speed has halved.

Random access queries are faster, even on a cold cache, and of course much better cached afterwards, since the amount of data the OS cache can hold is at least doubled.

Seq scans on a huge table, reading data from disk, are a tiny bit slower, which is strange: on my test box, the disks are slow (50 MB/s) and lzo can decompress much faster than that. At least it isn't slower.

Seq scans on a cached table that would fit in RAM anyway are slower, because the data needs to be decompressed.
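The write and read paths described above can be sketched in a few lines. This is a hypothetical stand-in, not the patch itself: it uses Python and zlib in place of C and lzo, assumes a little-endian 4-byte length header, and uses the same 40k on-disk stride and 5x4k pre-allocation the post describes.

```python
import struct, zlib

PAGE = 32 * 1024          # uncompressed postgres page size used in the patch
SLOT = 40 * 1024          # on-disk stride: 32k page + 4k of slack per block
PREALLOC = 5 * 4096       # pre-allocate 5 4k-blocks to avoid fragmentation

def write_page(f, blockno, page):
    z = zlib.compress(page)               # zlib stands in for lzo here
    data = struct.pack("<I", len(z)) + z  # 4-byte length header, then payload
    if len(data) < PREALLOC:
        # pad out to the pre-allocated size; the padding is never read back
        data = data.ljust(PREALLOC, b"\0")
    f.seek(blockno * SLOT)
    f.write(data)

def read_page(f, blockno):
    f.seek(blockno * SLOT)
    head = f.read(4096)                   # first 4k: header plus start of data
    (clen,) = struct.unpack("<I", head[:4])
    data = head[4:4 + clen]
    if len(data) < clen:                  # the header tells how much more to read
        data += f.read(clen - len(data))
    return zlib.decompress(data)
```

A real implementation would punch holes (or simply never write) between slots so the OS stores the file sparsely, as the post explains.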
Re: [HACKERS] More thoughts on sorting
PFC writes: - for short strings (average 12 bytes), sort is CPU-bound in strcoll() - for longer strings (average 120 bytes), sort is even more CPU-bound in strcoll() No news there. If you are limited by the speed of text comparisons, consider using C locale. regards, tom lane Actually, I think (see the bottom of my last email) that this would be a good argument for the per-column COLLATE patch... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] More thoughts on sorting
There was a thread some time ago about sorting... it kind of died... I did some tests on a desktop (Postgres 8.3.7, kubuntu, Core 2 dual core, 4GB RAM, RAID1 of 2 SATA disks).

Quick conclusions:
- grabbing the stuff to sort can be IO-bound, of course (not here)
- for short strings (average 12 bytes), sort is CPU-bound in strcoll()
- for longer strings (average 120 bytes), sort is even more CPU-bound in strcoll()
- strcoll() time seems to depend on the length of the strings, not on where the first difference occurs (grokking the glibc source code confirms this)

See the detailed test procedure below. Locale is fr_FR.UTF-8 and the database is UNICODE. All strings are ASCII; they are mostly alphanumeric references. There are 391469 strings: min length 6 chars, max length 80 chars, avg length 11.82 chars.

We have a table test with (id INTEGER PRIMARY KEY, TEXT, BYTEA), and the contents of TEXT and BYTEA are identical. We have a table test2 which contains the same thing as test, except that the id and a 100-character constant are appended to the strings to make them longer.

Test procedure: Grab test data from: http://home.peufeu.com/pg/test_data_references.copy.gz

Sorting with Python:
- sorting all strings converted to unicode (from utf8) using strcoll() and the correct locale => 5.8 s
- with longer strings (as in table test2 below) => 8 s

Postgres: To get query timings, I use \t and "SELECT * FROM test ORDER BY id OFFSET 391468;" which avoids EXPLAIN ANALYZE overhead; it just prints the last row of the results. Timings are a bit shorter than what EXPLAIN ANALYZE gives, and I checked the plans, they are all sorts.
-- Create test table and load it
BEGIN;
CREATE TABLE test1 (t TEXT NOT NULL);
\copy test1 FROM test_data_references.copy
CREATE TABLE test (id SERIAL PRIMARY KEY, t TEXT NOT NULL, b BYTEA NOT NULL );
INSERT INTO test (t,b) SELECT t,t::BYTEA FROM test1;
DROP TABLE test1;
ALTER TABLE test DROP CONSTRAINT test_pkey;
CREATE TABLE test2 (id INTEGER NOT NULL, t TEXT NOT NULL, b BYTEA NOT NULL );
INSERT INTO test2 SELECT id,
    (t || id || 'This is a dummy text of length 100 bytes') AS t,
    (t || id || 'This is a dummy text of length 100 bytes')::BYTEA AS b
    FROM test;
COMMIT;
\d test
SHOW work_mem;              --> 16MB
SHOW maintenance_work_mem;  --> 512MB
\timing

-- cache it really well
SELECT count(*) FROM test;
SELECT count(*) FROM test;
SELECT count(*) FROM test;
--> 391469
--> Temps : 87,033 ms

SELECT * FROM test ORDER BY id OFFSET 391468;
--> Temps : 918,893 ms
SELECT id FROM test ORDER BY id OFFSET 391468;
--> Temps : 948,015 ms

Interpretation:
- Time for hauling around extra data (SELECT * instead of SELECT id) is not significant.
- Sorting by integers is quite fast (not THAT fast though; MySQL below is 3x faster when selecting just 'id' and 2x slower on SELECT *, hum.)

SELECT * FROM test ORDER BY b OFFSET 391468;
--> Temps : 2145,555 ms
SELECT id FROM test ORDER BY b OFFSET 391468;
--> Temps : 2152,273 ms

Interpretation:
- Time for hauling around extra data is not significant.
- Sorting by BYTEA is just a memcmp(); it is strange that it is 2x slower than ints. Probably the varlena stuff, I guess.
- See the ridiculous MySQL results using a BLOB below, which are 10x slower.

SELECT * FROM test ORDER BY t OFFSET 391468;
--> Temps : 7305,373 ms
SELECT id FROM test ORDER BY t OFFSET 391468;
--> Temps : 7345,234 ms

Interpretation:
- Time for hauling around extra data is not significant.
- Sorting localized TEXT really is SLOW!
- The little test above calling strcoll() from Python confirms the slowness is in strcoll().
- MySQL (see below) seems to be much faster (about equal to postgres) on VARCHAR, and 2x slower on TEXT (hum...)

BEGIN;
CREATE INDEX test_id ON test( id );
--> Temps : 555,718 ms
CREATE INDEX test_b ON test( b );
--> Temps : 1762,263 ms
CREATE INDEX test_t ON test( t );
--> Temps : 6274,624 ms
ROLLBACK;

Interpretation:
- maintenance_work_mem is much higher than work_mem, so the sorts are faster, but the slowness of localized text sorting persists...

SELECT count(*) FROM test2;
--> 391469
--> Temps : 114,669 ms

SELECT * FROM test2 ORDER BY id OFFSET 391468;
--> Temps : 1788,246 ms
SELECT id FROM test2 ORDER BY id OFFSET 391468;
--> Temps : 989,238 ms

Interpretation:
- Time for hauling around extra data (SELECT * instead of SELECT id) IS significant this time, due to the extra string lengths.

SELECT * FROM test2 ORDER BY b OFFSET 391468;
--> Temps : 2906,108 ms
SELECT id FROM test2 ORDER BY b OFFSET 391468;
--> Temps : 2554,931 ms
SELECT * FROM test2 ORDER BY t OFFSET 391468;
--> Temps : 10637,649 ms
SELECT id FROM test2 ORDER BY t OFFSET 391468;
--> Temps : 10322,480 ms

Interpretation:
- Note: the strings are longer, however they are sorta
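The finding that the sort is CPU-bound in strcoll() suggests the classic remedy of computing a collation key once per string with strxfrm(), so that the O(n log n) comparisons become plain byte comparisons. This is only a sketch of that general technique, not something the post claims postgres does; it uses the "C" locale for portability, whereas the tests above used fr_FR.UTF-8:

```python
import functools, locale, random, string

# the post's tests used fr_FR.UTF-8; "C" keeps this sketch portable
locale.setlocale(locale.LC_COLLATE, "C")

random.seed(0)
# short alphanumeric references, roughly like the post's test data
refs = ["".join(random.choices(string.ascii_uppercase + string.digits, k=12))
        for _ in range(1000)]

# comparison-based sort: strcoll() is called O(n log n) times
by_coll = sorted(refs, key=functools.cmp_to_key(locale.strcoll))

# key-based sort: strxfrm() is called once per string,
# after which comparisons are memcmp-like
by_xfrm = sorted(refs, key=locale.strxfrm)
```

The trade-off is the memory for the strxfrm keys, which for some locales are several times longer than the source strings.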
Re: [HACKERS] Protection from SQL injection
Sure, modifying the WHERE clause is still possible, but the attacker is a lot more limited in what he can do if he can't tack on a whole new command.

I hacked into a site like that one day to show a guy that you shouldn't trust magicquotes (especially when you switch hosting providers and it's not installed at the new one, lol). Binary search on the password field by adding some stuff to the WHERE... You could still wipe out tables (just add a "' OR 1;--" to the id in the url to delete something)... But it's true that preventing multi-statements adds a layer of idiot-proofness... a rather thin layer...

The important aspects of this that I see are: 1. Inexpensive to implement; 2. Unlikely to break most applications; 3. Closes off a fairly large class of injection attacks. The cost/benefit ratio looks pretty good (unlike the idea that started this thread...) regards, tom lane
Re: [HACKERS] Protection from SQL injection
Could we also get a mode where PREPARE would only be allowed for queries of the form "SELECT * FROM func(?,?,?,?,?);"? :)

Actually, that is similar to the concept of "global prepared statements" that I proposed some time ago, but I will not have time to write the patch, alas... The idea was that the DBA could create a list of SQL statements (with privileges about who can execute them, just like functions) which are prepared on demand at the first EXECUTE by the client. This would enhance performance (though for performance I like the idea of caching plans better). It would be pretty cumbersome, though, to execute dynamic SQL like the typical search query...
Re: [HACKERS] Protection from SQL injection
For example, some applications need to replace whole phrases: $criteria = "WHERE $var1 = '$var2'". This is a very common approach for dynamic search screens, and really not covered by placeholder approaches.

Python, again:

params = {
    'column1': 10,
    'column2': "a st'ring",
}
where = " AND ".join( "%s=%%(%s)s" % (key, key) for key in params )
cursor.execute( "SELECT * FROM table WHERE " + where, params )

I use the same approach (albeit more complicated) in PHP. For complex expressions you can play with arrays etc.; it is not that difficult. Or you just do:

$criteria = db_quote_query( "WHERE $var1 = %s", array( $var2 ));

using the function I posted earlier. This supposes, of course, that $var1, which is the column name, comes from a known source and not from user input. In that case, $var1 will probably be the form field name, which means it is specified by the programmer a few lines earlier in the code.
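The dict-driven WHERE builder above can be fleshed out with two common safety refinements: named placeholders, so the values still travel as parameters, and a column whitelist, since column names cannot be parameterized. A sketch under those assumptions; the ALLOWED set, build_where name, and table1 are illustrative, not from the original post:

```python
# whitelist: column names can't be sent as parameters, so they must
# come from a known set, never from user input
ALLOWED = {"column1", "column2"}

def build_where(criteria):
    for col in criteria:
        if col not in ALLOWED:
            raise ValueError("unknown column: %r" % col)
    # named pyformat placeholders: the values travel as parameters,
    # never pasted into the SQL string
    return " AND ".join("%s = %%(%s)s" % (col, col) for col in criteria)

params = {"column1": 10, "column2": "a st'ring"}
sql = "SELECT * FROM table1 WHERE " + build_where(params)
# cursor.execute(sql, params)   # with a DB-API driver such as psycopg2
```

Only the column names are interpolated, and only after the whitelist check; the quoting of values is left entirely to the driver.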
Re: [HACKERS] Protection from SQL injection
zero developer pain

Actually it's not zero pain, but the main problem is: there is no way to enforce using it.

Sure, there is no way to enforce it (apart from grepping the source for pg_query() and flogging someone if it is found), but is it really necessary when the right solution is easier to use than the wrong one? Capitalizing on developer laziness is a win IMHO, lol.

The problem is not only quotes. The problem is all kinds of user input. For example: sql = "SELECT * FROM ORDERS WHERE ORDER_ID = " + orderId; This is not a problem if orderId is a number. But what if it's a String? For example "1 AND (SELECT * FROM USERS WHERE NAME='admin' AND PASSWORD LIKE 'm%')". An attacker could then retrieve the admin password quite quickly.

IMHO this is an example of what should never be done.

// very bad (especially in PHP where you never know the type of your variables)
sql = "SELECT * FROM ORDERS WHERE ORDER_ID = " + orderId;

// slightly better (and safe)
sql = "SELECT * FROM ORDERS WHERE ORDER_ID = " + int( orderId );

// correct (PHP syntax)
pg_query_params( "SELECT * FROM ORDERS WHERE ORDER_ID = $1", array( $orderId ));
db_query( "SELECT * FROM ORDERS WHERE ORDER_ID = %s", array( $orderId ));

// correct (Python syntax)
cursor.execute( "SELECT * FROM ORDERS WHERE ORDER_ID = %s", ( orderId, ))

The last two don't complain if orderId is a string: it will be correctly quoted, and postgres will complain only if it is a string that does not contain a number. This is useful in PHP, where you never know what type you actually have. The little function in my previous mail is also useful for mysql, which has no support for parameterized queries.
Re: [HACKERS] Protection from SQL injection
On Tue, 29 Apr 2008 01:03:33 +0200, Brendan Jurd <[EMAIL PROTECTED]> wrote:
On Tue, Apr 29, 2008 at 7:00 AM, PFC <[EMAIL PROTECTED]> wrote:

I have found that the little bit of code posted afterwards did eliminate SQL holes in my PHP applications with zero developer pain; actually it is MORE convenient to use than randomly pasting strings into queries. You just call:

db_query( "SELECT * FROM table WHERE column1=%s AND column2=%s", array( $var1, $var2 ));

Implementing this for yourself is crazy; PHP's Postgres extension has already done this for you since 5.1.0:

$result = pg_query_params("SELECT foo FROM bar WHERE baz = $1", array($baz));

http://www.php.net/manual/en/function.pg-query-params.php
Cheers, BJ

pg_query_params is actually quite a bit slower...
Re: [HACKERS] Protection from SQL injection
As you know, "SQL injection" is the main security problem of databases today. I think I found a solution: 'disabling literals'. Or you may call it 'enforcing the use of parameterized statements'. This means that SQL statements with embedded user input are rejected at runtime. My solution goes beyond saying "developers ~should~ use parameterized statements". That is not a solution, because developers are lazy. My solution is: "developers MUST use parameterized statements". It goes like this: literals are disabled using the SQL statement:

I have found that the little bit of code posted afterwards did eliminate SQL holes in my PHP applications with zero developer pain; actually it is MORE convenient to use than randomly pasting strings into queries. You just call:

db_query( "SELECT * FROM table WHERE column1=%s AND column2=%s", array( $var1, $var2 ));

It is inspired by the Python interface, which does the same (but slightly more elegantly). I have removed the logging features for clarity.

function db_quote_query( $sql, $params=false )
{
    // if no params, send query raw
    if( $params === false )
        return $sql;
    if( !is_array( $params ))
        $params = array( $params );
    // quote params
    foreach( $params as $key => $val )
    {
        if( is_array( $val ))
            $params[$key] = implode( ', ', array_map( 'intval', $val ));
        else
            $params[$key] = is_null( $val ) ? 'NULL' : ("'".pg_escape_string( $val )."'");
    }
    return vsprintf( $sql, $params );
}

function db_query( $sql, $params=false )
{
    // it's already a query
    if( is_resource( $sql ))
        return $sql;
    $sql = db_quote_query( $sql, $params );
    $r = pg_query( $sql );
    if( !$r )
    {
        echo "PostgreSQL error:<br />".htmlspecialchars(pg_last_error())."<br />Query:<br />".$sql."<br />Traceback:";
        foreach( debug_backtrace() as $t )
            xdump( $t );
        die();
    }
    return $r;
}

SET ALLOW_LITERALS NONE;

Afterwards, SQL statements containing literals are not allowed any more for this session.
That means an SQL statement of the form "SELECT * FROM USERS WHERE PASSWORD='qerkllkj'" will fail with the exception 'Literals are not allowed, please use parameters'. It is as if the database did not know what ='qerkllkj' means. Only statements of the secure form are allowed, for example "SELECT * FROM USERS WHERE PASSWORD=?". This solves the problem, because SQL injection is almost impossible if user input is not directly embedded in SQL statements.

The 'ALLOW_LITERALS NONE' mode is enabled by the developers themselves, or by an administrator. It is still possible to generate SQL statements dynamically, and to use the same APIs as before, as long as the SQL statements don't include literals. Literals can still be used in query tools, or in applications considered 'safe'. To ease converting an application to parameterized queries, there should be a second mode where number literals are allowed: SET ALLOW_LITERALS NUMBERS. To allow all literals, execute SET ALLOW_LITERALS ALL (this is the default setting).

So far this feature is implemented in my little database H2. More information about this feature is described here: http://www.h2database.com/html/advanced.html#sql_injection

I know about the Perl taint mode, but that is only for Perl. I also know about disabling multi-statement commands (this only solves part of the problem). PostgreSQL should also support database-level 'constants' similar to constants in other programming languages, otherwise application-level constants (such as 'active') can't be used in queries directly (I propose adding new SQL statements CREATE CONSTANT ... VALUE ... and DROP CONSTANT ..., for example: CREATE CONSTANT STATE_ACTIVE VALUE 'active').

I also know that the 'disabling literals' feature does not solve SQL injection completely: for example, 'ORDER BY injection', where an application dynamically adds the column to sort on based on a hidden 'sort column' field in a web app.
To solve that, I suggest supporting a parameterized ORDER BY: ORDER BY ?, where ? is an integer. Also, instead of using SET ALLOW_LITERALS NONE, the use of literals should probably be two access rights (REVOKE LITERAL_TEXT, LITERAL_NUMBER FROM APP_ROLE). Those are details that still need to be discussed.

What do you think about it? Do you think it makes sense to implement this security feature in PostgreSQL as well? If not, why not? Does PostgreSQL have another solution or a plan to solve the SQL injection problem?

Regards, Thomas

P.S. I sent this proposal to [EMAIL PROTECTED] first and got replies, but I would like to get some feedback from the PostgreSQL developers as well.
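As a toy illustration of what ALLOW_LITERALS enforcement might look like, here is a naive checker in Python. This is emphatically not H2's actual implementation (a real database would enforce this on the parser's token stream, not with regexes), and the function and constant names are made up for this sketch:

```python
import re

# crude approximations of SQL literal tokens; a real implementation
# would work on proper parser tokens, not regexes
STRING_LIT = re.compile(r"'(?:[^']|'')*'")
NUMBER_LIT = re.compile(r"(?<![\w$])\d+(?:\.\d+)?(?![\w$])")  # skips $1-style params

def check_no_literals(sql, allow="NONE"):
    """Reject statements embedding literals. allow: NONE | NUMBERS | ALL."""
    if allow == "ALL":
        return True
    if STRING_LIT.search(sql):
        raise ValueError("Literals are not allowed, please use parameters")
    if allow == "NONE" and NUMBER_LIT.search(sql):
        raise ValueError("Literals are not allowed, please use parameters")
    return True
```

Even this toy shows the migration path the proposal describes: start at ALL, move to NUMBERS while converting string literals, and finish at NONE.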
Re: [HACKERS] Per-table random_page_cost for tables that we know are always cached
Example: let's imagine a "cache priority" setting.

Which we can presume the DBA will set incorrectly, because the tools needed to set it right aren't easy to use.

LOL, yes.

Jim threw out that you can just look at the page hit percentages instead. That's not completely true. If you've had some nasty query blow out your buffer cache, or if the server has been up a looong time and the total stats don't really reflect recent reality, what's in the buffer cache and what the stats say has historically been cached can diverge.

Yes:
- perform a huge query on table A
- table A is now in cache
- perform a huge query on table B
- table B is now in cache, A isn't
- perform a huge query on table A again
- postgres still thinks table A is cached and chooses a bad plan

This would not examine whatever is in the OS' cache, though.

Yeah, but now that shared_buffers can be set to a large part of physical RAM, does that still matter? The point is, postgres knows what is in shared_buffers, so it can make a good decision. Postgres doesn't know what the OS has in cache, so it could only make a wild guess. I would rather err on the side of safety...

I don't know that it's too unrealistic to model the OS cache as just an extrapolated bigger version of the buffer cache. I can think of a couple of ways the two can diverge:
1) Popular pages that get high usage counts can end up with a higher representation in shared_buffers than in the OS cache
2) If you've been doing something like a bulk update, you can have lots of recently written pages in the OS cache that aren't really accounted for in shared_buffers, because they never get a high enough usage count to stay there (used only once), but they can fill the OS cache as they're spooled up to be written. Especially on CHECKPOINT.
[HACKERS] Per-table random_page_cost for tables that we know are always cached
It started with this query:

EXPLAIN ANALYZE SELECT * FROM relations r JOIN nodes n ON (n.id=r.child_id) WHERE r.parent_id=16330;

Hash Join (cost=370.96..496.29 rows=543 width=273) (actual time=18.887..21.164 rows=543 loops=1)
  Hash Cond: (r.child_id = n.id)
  -> Index Scan using relations_unique on relations r (cost=0.00..111.75 rows=543 width=58) (actual time=0.022..0.519 rows=543 loops=1)
       Index Cond: (parent_id = 16330)
  -> Hash (cost=243.76..243.76 rows=10176 width=215) (actual time=18.830..18.830 rows=10176 loops=1)
       -> Seq Scan on nodes n (cost=0.00..243.76 rows=10176 width=215) (actual time=0.006..5.135 rows=10176 loops=1)
Total runtime: 21.453 ms

SET enable_hashjoin TO 0;
EXPLAIN ANALYZE SELECT * FROM relations r JOIN nodes n ON (n.id=r.child_id) WHERE r.parent_id=16330;

Nested Loop (cost=0.00..514.50 rows=543 width=273) (actual time=0.037..4.412 rows=543 loops=1)
  -> Index Scan using relations_unique on relations r (cost=0.00..111.75 rows=543 width=58) (actual time=0.023..0.476 rows=543 loops=1)
       Index Cond: (parent_id = 16330)
  -> Index Scan using nodes_pkey on nodes n (cost=0.00..0.73 rows=1 width=215) (actual time=0.004..0.005 rows=1 loops=543)
       Index Cond: (n.id = r.child_id)
Total runtime: 4.638 ms

In order to have it use the fast plan, I must set random_page_cost to 1, which I absolutely don't want to do. Setting effective_cache_size to huge values has no effect. If I select a value of parent_id that has many fewer children, the index will be used, but in this case I think the threshold is misplaced; it should be slightly higher. Here we have about 5% of the values selected. Hash join becomes better at about 15%, because the table is cached.

This is 8.3. Perhaps there is a need for a per-object setting (object = table, index, partition) to alter the aggressiveness/laziness of the page flushing and how long the pages for this object are kept in shared_buffers...
This would be used to modify random_page_cost on a per-table/index/partition basis.

Example: let's imagine a "cache priority" setting.
- "cache priority" set to the minimum means this table is mostly write-only
- "cache priority" set to the default gives the current behaviour (which is correct in most cases)
- "cache priority" set to a high value tells Postgres "I know this table/index/partition is small and often accessed rather randomly, so I want you to keep it in shared_buffers; purge it if you must, but otherwise keep it in memory, and flush something else instead which has a lower cache_priority"

The optimizer could then use a different (much lower) value of random_page_cost for tables with the highest "cache priority", since it would know they are cached.

An alternative would be for the background writer to keep some stats and do the work for us:
- begin bgwriter scan
- set up a hashtable of [relid => page count]
- for each page that is scanned, increment the "page count" for its relation (uses very little CPU)
- end bgwriter scan
- for each relation, compare the number of pages found in shared_buffers with the number of pages in the relation, and draw conclusions about how well cached the relation is
- update random_page_cost accordingly for this relation

This would not examine whatever is in the OS' cache, though.
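The bgwriter accounting steps above can be modeled in a few lines. This is purely illustrative: the linear interpolation between a fully-cached cost of 1.0 and the default 4.0 is an assumption of this sketch, since the post only says "update random_page_cost accordingly", and the function names are invented:

```python
from collections import Counter

def scan_buffers(buffer_relids):
    """One pass over shared_buffers: count cached pages per relation.
    buffer_relids is the relation id of each buffer page seen."""
    pages_in_cache = Counter()
    for relid in buffer_relids:
        pages_in_cache[relid] += 1
    return pages_in_cache

def derived_random_page_cost(cached_pages, total_pages,
                             cached_cost=1.0, default_cost=4.0):
    """Interpolate the per-relation random_page_cost by cache fraction:
    fully cached -> cached_cost, not cached at all -> default_cost."""
    frac = min(cached_pages / total_pages, 1.0)
    return default_cost - (default_cost - cached_cost) * frac
```

As the post notes, this only sees shared_buffers; pages cached by the OS are invisible to the scan, so the derived cost would err toward the default.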
Re: [HACKERS] Plan targetlists in EXPLAIN output
On Thu, 17 Apr 2008 20:42:49 +0200, Simon Riggs <[EMAIL PROTECTED]> wrote:
On Thu, 2008-04-17 at 12:34 -0400, Tom Lane wrote:

I'm tempted to propose redefining the currently-nearly-useless EXPLAIN VERBOSE option as doing this.

Yes please. Sounds like a good home for other useful things also. I'd like to have an EXPLAIN mode that displayed the plan without *any* changeable info (i.e. no costs, row counts etc). This would then allow more easy determination of whether plans had changed over time. (But EXPLAIN TERSE sounds silly).

Plan = Tree, Tree = XML?

EXPLAIN ANALYZE SELECT * FROM test NATURAL JOIN test2 WHERE id =ANY('{3,666,975,521}');

Nested Loop (cost=17.04..65.13 rows=1 width=8) (actual time=51.835..51.835 rows=0 loops=1)
  Join Filter: (test.value = test2.value)
  -> Bitmap Heap Scan on test (cost=17.04..31.96 rows=4 width=8) (actual time=16.622..16.631 rows=4 loops=1)
       Recheck Cond: (id = ANY ('{3,666,975,521}'::integer[]))
       -> Bitmap Index Scan on test_pkey (cost=0.00..17.04 rows=4 width=0) (actual time=16.613..16.613 rows=4 loops=1)
            Index Cond: (id = ANY ('{3,666,975,521}'::integer[]))
  -> Index Scan using test2_pkey on test2 (cost=0.00..8.28 rows=1 width=8) (actual time=8.794..8.795 rows=1 loops=4)
       Index Cond: (test2.id = test.id)

EXPLAIN XML ...

Nicely parsable, and displayable in all its glory in pgadmin ;)
Re: [HACKERS] count(*) performance improvement ideas
My wife has a snake phobia; besides, I've just started learning Scala.

Just had a look at Scala, it looks nice. Slightly Lispish (like all good languages)...

txid_current()

No... hold on, it is per session, and a session can't have two or more transactions active at once, can it? It could be used to detect rollback.

So the problem is that other functions may be using GD themselves, and your own code is at the mercy of those other functions. Conversely, you shouldn't clear GD, as some other function may be using it.

Exactly.

So you're better off using a single function for everything, and using SD within it?

Since the purpose is to store counts for rows matching certain criteria in a set of tables, you could build a hashtable of hashtables, like:

GD[table name][criteria name][criteria value] = count

This would add complexity: about half a line of code.

But you'd have to create lots of plpgsql trigger functions to wrap it. There isn't any way of telling whether the function is being called for the first time in a transaction. You don't know when to clear it.

The first time in a session, GD will be empty. Clearing it at the start of a transaction would not be useful (clearing it at ROLLBACK would).

It is updating the "real" summary table with the contents of this hash that is the problem, also.

So, basically, if you connect, do one insert, and disconnect, this would be useless. But if you do a zillion inserts, caching the count deltas in RAM would be faster. And if you use persistent connections, you could update the counts in the real table only every N minutes, for instance, but this would need some complicity from the backend.

Regards, Stephen Denne.
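The GD[table name][criteria name][criteria value] idea maps naturally onto nested dictionaries. A sketch in plain Python, with a plain dict standing in for plpythonu's session-global GD; flush_deltas is a hypothetical helper for the "update the real summary table" step discussed above, not code from the thread:

```python
from collections import defaultdict

# stands in for plpythonu's GD: one session-global, non-transactional dict
GD = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

def update_count(table, criteria, value, delta):
    """Accumulate a count delta for rows of `table` matching criteria=value."""
    GD[table][criteria][value] += delta
    return GD[table][criteria][value]

def flush_deltas():
    """Hypothetical COMMIT-time step: drain the deltas that would be
    applied to the real summary table."""
    deltas = [(t, c, v, n)
              for t, crit in GD.items()
              for c, vals in crit.items()
              for v, n in vals.items() if n]
    GD.clear()
    return deltas
```

Note the caveat from the thread applies here too: nothing clears this cache on ROLLBACK, so the deltas could include changes from aborted transactions unless the backend cooperates.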
Re: [HACKERS] count(*) performance improvement ideas
On Thu, 17 Apr 2008 02:48:37 +0200, Stephen Denne <[EMAIL PROTECTED]> wrote:
PFC wrote:

Let's try this quick & dirty implementation of a local count-delta cache using a local in-memory hashtable (ie. {}).

CREATE OR REPLACE FUNCTION update_count( key TEXT, delta INTEGER ) RETURNS INTEGER AS $$
    if key in GD:
        GD[key] += delta
    else:
        GD[key] = delta
    return GD[key]
$$ LANGUAGE plpythonu;

Thanks for the code, this seems to be very much what I was looking for. I don't know plpythonu (nor python), just read a few docs now:

Learn Python, it is a really useful language ;)

"The global dictionary SD is available to store data between function calls. This variable is private static data. The global dictionary GD is public data, available to all Python functions within a session. Use with care."

Does session == transaction or connection? I don't understand the difference between SD and GD, private and public. Where are the context boundaries?

There is no sharing between processes, so:
- both SD and GD are limited to the current session (connection, postgres process); no shared memory is involved
- GD is global between all python functions (global)
- SD is specific to each python function (static)

The big gotcha is that these are all non-transactional: if you roll back, GD and SD stay the same, and when you issue a query, you should assume the state of SD and GD is random (due to previous queries) unless you initialize them to a known value.
Re: [HACKERS] count(*) performance improvement ideas
The whole thing is a bit of an abuse of what the mechanism was intended for, and so I'm not sure we should rejigger GUC's behavior to make it more pleasant, but on the other hand if we're not ready to provide a better substitute ...

In my experiments with materialized views, I identified these problems as "minor" difficulties. Resolving them would allow further abuse ;)

Let's try this quick & dirty implementation of a local count-delta cache using a local in-memory hashtable (ie. {}). Writing the results to stable storage in an ON COMMIT trigger is left as an exercise to the reader ;) Performance isn't that bad: calling the trigger takes about 50 us. An oldskool implementation with a table is at the end; it's about 10x slower.

Example :

    INSERT INTO victim1 (key) VALUES ('one'),('two'),('two');
    INSERT 0 3
    Time : 1.320 ms

    test=# SELECT * FROM get_count();
     key | cnt
    -----+-----
     two |   2
     one |   1

    CREATE OR REPLACE FUNCTION clear_count( ) RETURNS VOID AS $$
        GD.clear()
    $$ LANGUAGE plpythonu;

    CREATE OR REPLACE FUNCTION update_count( key TEXT, delta INTEGER ) RETURNS INTEGER AS $$
        if key in GD:
            GD[key] += delta
        else:
            GD[key] = delta
        return GD[key]
    $$ LANGUAGE plpythonu;

    CREATE TYPE count_data AS ( key TEXT, cnt INTEGER );

    CREATE OR REPLACE FUNCTION get_count( ) RETURNS SETOF count_data AS $$
        return GD.iteritems()
    $$ LANGUAGE plpythonu;

    CREATE TABLE victim( id SERIAL PRIMARY KEY, key TEXT NOT NULL );
    INSERT INTO victim (key) SELECT (random() * 300)::INTEGER::TEXT FROM generate_series( 1,10 );

    CREATE TABLE victim1( id SERIAL PRIMARY KEY, key TEXT NOT NULL );

    \timing

    INSERT INTO victim1 SELECT * FROM victim;
    TRUNCATE TABLE victim1;

    SELECT clear_count();
    INSERT INTO victim1 SELECT * FROM victim RETURNING update_count( key, 1 );
    SELECT * FROM get_count();
    TRUNCATE TABLE victim1;

    CREATE OR REPLACE FUNCTION counter_trigger_f() RETURNS TRIGGER LANGUAGE plpgsql AS $$
    DECLARE
    BEGIN
        IF TG_OP = 'INSERT' THEN
            PERFORM update_count( NEW.key, 1 );
            RETURN NEW;
        ELSEIF TG_OP = 'UPDATE' THEN
            -- update topic
            IF NEW.key != OLD.key THEN
                PERFORM update_count( OLD.key, -1 ), update_count( NEW.key, 1 );
            END IF;
            RETURN NEW;
        ELSE -- DELETE
            PERFORM update_count( OLD.key, -1 );
            RETURN OLD;
        END IF;
    END;
    $$;

    CREATE TRIGGER count_trigger BEFORE INSERT OR UPDATE OR DELETE ON victim1
        FOR EACH ROW EXECUTE PROCEDURE counter_trigger_f();

    SELECT clear_count();
    INSERT INTO victim1 SELECT * FROM victim;
    SELECT * FROM get_count();

    SELECT clear_count();
    TRUNCATE TABLE victim1;
    INSERT INTO victim1 (key) VALUES ('one'),('two'),('two');
    SELECT * FROM get_count();
    DELETE FROM victim1 WHERE key='two';
    SELECT * FROM get_count();
    UPDATE victim1 SET key='three' WHERE key='one';
    SELECT * FROM get_count();
    DELETE FROM victim1;
    SELECT * FROM get_count();

    CREATE TABLE counts( key TEXT PRIMARY KEY, total INTEGER NOT NULL DEFAULT 0 );

    CREATE OR REPLACE FUNCTION table_counter_trigger_f() RETURNS TRIGGER LANGUAGE plpgsql AS $$
    DECLARE
    BEGIN
        IF TG_OP = 'INSERT' THEN
            UPDATE counts SET total=total+1 WHERE key=NEW.key;
            IF NOT FOUND THEN
                INSERT INTO counts (key,total) VALUES (NEW.key,1);
            END IF;
            RETURN NEW;
        ELSEIF TG_OP = 'UPDATE' THEN
            -- update topic
            IF NEW.key != OLD.key THEN
                UPDATE counts SET total=total-1 WHERE key=OLD.key;
                UPDATE counts SET total=total+1 WHERE key=NEW.key;
                IF NOT FOUND THEN
                    INSERT INTO counts (key,total) VALUES (NEW.key,1);
                END IF;
            END IF;
            RETURN NEW;
        ELSE -- DELETE
            UPDATE counts SET total=total-1 WHERE key=OLD.key;
            RETURN OLD;
        END IF;
    END;
    $$;

    CREATE TABLE victim2( id SERIAL PRIMARY KEY, key TEXT NOT NULL );

    CREATE TRIGGER table_count_trigger BEFORE INSERT OR UPDATE OR DELETE ON victim2
        FOR EACH ROW EXECUTE PROCEDURE table_counter_trigger_f();

    SELECT * FROM counts;
    TRUNCATE TABLE victim2;
    INSERT INTO victim2 SELECT * FROM victim;

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
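For illustration, the same count-delta idea can be sketched in plain Python (this is a hypothetical standalone model, not the plpythonu code above; the CountDeltaCache class and its commit() merge step are invented names): per-key deltas accumulate in a dict during the transaction, then get merged into the stable totals "on commit".

```python
from collections import defaultdict

class CountDeltaCache:
    """In-memory per-transaction count deltas, merged into totals on commit."""
    def __init__(self):
        self.deltas = defaultdict(int)

    def update(self, key, delta):
        # What the INSERT/UPDATE/DELETE trigger would call.
        self.deltas[key] += delta
        return self.deltas[key]

    def commit(self, totals):
        # Merge the deltas into the stable counts (here: a plain dict),
        # dropping keys whose count falls to zero.
        for key, d in self.deltas.items():
            totals[key] = totals.get(key, 0) + d
            if totals[key] == 0:
                del totals[key]
        self.deltas.clear()

totals = {}
cache = CountDeltaCache()
for key in ('one', 'two', 'two'):
    cache.update(key, 1)          # INSERT trigger: +1 per row
cache.commit(totals)
print(totals)                     # {'one': 1, 'two': 2}
```

The merge-on-commit step is exactly what the "write the results to stable storage in an ON COMMIT trigger" exercise would do against the counts table.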
Re: [HACKERS] Cached Query Plans (was: global prepared statements)
On Mon, 14 Apr 2008 16:17:18 +0200, Csaba Nagy <[EMAIL PROTECTED]> wrote:

On Mon, 2008-04-14 at 16:10 +0200, Csaba Nagy wrote: ... or plan the query with the actual parameter value you get, and also record the range of parameter values you expect the plan to be valid for. If at execution time the parameter happens to be out of that range, replan, and possibly add a new subplan covering the extra range. This could still work with prepared queries (where you don't get any parameter values to start with) by estimating the most probable parameter range (whatever that could mean), and planning for that.

More on that: recording the presumptions under which the (cached!) plan is thought to be valid would also facilitate setting up dependencies against statistics, to be checked when you analyze tables... and if the key value which your query depends on changed, the analyze process could possibly replan it in the background. LOL, it started with the idea to make small queries faster, and now the brain juice is pouring.

Those "Decision" nodes could potentially lead to lots of decisions (ahem). What if you have 10 conditions in the WHERE, plus some joined ones ? That would make lots of possibilities...

Consider several types of queries :
- The small, quick query which returns one or a few rows : in this case, planning overhead is large relative to execution time, but I would venture to guess that the plans always end up being the same.
- The query that takes a while : in this case, planning overhead is nil compared to execution time; better to replan every time with the params.
- The complex query that still executes fast because it doesn't process a lot of rows and postgres finds a good plan (for instance, a well optimized search query). These would benefit from reduced planning overhead, but they also typically end up having many different plans depending on the search parameters. Besides, such queries are likely to be dynamically generated.
So, would it be worth it to add all those features just to optimize those ? I don't know...
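Csaba's replan-on-out-of-range idea above can be sketched as a toy model (everything here is invented for illustration: the RangePlanCache class, the planner callback, and the ranges themselves; a real planner would derive the validity range from the statistics it used):

```python
class RangePlanCache:
    """Cache a plan together with the parameter range it was planned for;
    replan when a later parameter falls outside the recorded range."""
    def __init__(self, planner):
        self.planner = planner   # planner(param) -> (plan, (lo, hi))
        self.entries = {}        # sql -> (plan, lo, hi)
        self.replans = 0

    def get_plan(self, sql, param):
        entry = self.entries.get(sql)
        if entry is None or not (entry[1] <= param <= entry[2]):
            # Cache miss, or parameter outside the plan's validity range.
            plan, (lo, hi) = self.planner(param)
            self.entries[sql] = (plan, lo, hi)
            self.replans += 1
            return plan
        return entry[0]

# Toy planner: an index scan for selective parameters, a seq scan otherwise,
# each plan assumed valid for a band around the given value.
def planner(param):
    if param < 100:
        return 'index scan', (0, 99)
    return 'seq scan', (100, 10**9)

cache = RangePlanCache(planner)
assert cache.get_plan('q', 5) == 'index scan'
assert cache.get_plan('q', 50) == 'index scan'   # in range: cached plan reused
assert cache.get_plan('q', 500) == 'seq scan'    # out of range: replanned
assert cache.replans == 2
```

The "add a new subplan covering the extra range" refinement would keep several (plan, range) entries per query instead of overwriting the single one.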
Re: [HACKERS] Cached Query Plans (was: global prepared statements)
Bind message behaviour was modified : if the user asks for execution of a named prepared statement, and the named statement does not exist in PG's prepared statements cache, instead of issuing an error and borking the transaction, it Binds to an empty statement that takes no parameters and returns no result. Parameters sent by the user are consumed but not used.

You mentioned the need for a wire protocol change to allow this. Why can't this be controlled with a server variable, like SET auto_prepare = 'true'?

Actually, thanks to the hack, the wire protocol doesn't change. Explanation :

- Send Parse(SQL) to unnamed statement + Bind unnamed statement => works as usual (no cache)
- Send only Bind (named statement) with a statement name that is not found in the cache => doesn't raise an error; instead it informs the application that the statement does not exist. The application can then prepare the statement (send a Parse message with SQL and a name) and give it a name. I used the SQL itself as the name, but you can use anything else. The application can then send the Bind again, which will (hopefully) work.

So, here, the information ("cache" or "don't cache") is passed from the client to the server in a hidden way : it depends on what function you use to send the query (unnamed statements are not cached, named statements are cached). There is no protocol change, but new information is provided to the server nonetheless.

The downside is that the application needs to be modified (only a little, though), and applications that expect exceptions on "Statement does not exist" will break, hence the necessity of a GUC to control it. It was just a quick & dirty test to see if this way of doing it was an option to consider or not. Apparently it works, but whether it is The Right Way remains to be seen...
Re: [HACKERS] Cached Query Plans (was: global prepared statements)
If cached plans would be implemented, the dependence on parameter values could be solved too: use special "fork" nodes in the plan which execute different sub-plans depending on special parameter values/ranges, possibly looking up the stats at runtime, so that the plan is in a compiled state with the "decision points" wired in. This of course would mean a lot heavier planning and possibly a lot bigger plans, but you could afford that if you cache the plan. You could even have a special command to plan a query this way.

And, the "fork node" could mutter to itself "Strange, I'm getting 1 row instead of the 2 for which I was planned, perhaps I should switch to a different plan..."

I have made another very simple hack to test another option :

Bind message behaviour was modified : if the user asks for execution of a named prepared statement, and the named statement does not exist in PG's prepared statements cache, instead of issuing an error and borking the transaction, it Binds to an empty statement that takes no parameters and returns no result. Parameters sent by the user are consumed but not used.

The application was modified thusly : calls to pg_query_params were changed to calls to the following function :

    function pg_query_cached( $sql, $params )
    {
        // Try to execute it, using the query string as statement name.
        $q = pg_execute( $sql, $params );
        if( !$q ) die( pg_last_error() );

        // If it worked, return result to caller.
        if( pg_result_status( $q, PGSQL_STATUS_STRING ) != "" )
            return $q;

        // If we got an empty query result (not a result with 0 rows, which is
        // valid) then prepare the query, using the query string as statement name
        $q = pg_prepare( $sql, $sql );
        if( !$q ) die( pg_last_error() );

        // and execute it again
        $q = pg_execute( $sql, $params );
        if( !$q ) die( pg_last_error() );
        return $q;
    }

Pros :
- It works
- It is very very simple
- The user can choose between caching plans or not by calling pg_query_params() (no cached plans) or pg_query_cached() (cached plans)
- It works with persistent connections

Cons :
- It is too simple
- Plans are cached locally, so memory use is proportional to the number of connections
- It is still vulnerable to search_path problems
Re: [HACKERS] Cached Query Plans (was: global prepared statements)
Why limit ourselves to Oracle? This is how all major proprietary RDBMSs do it.

Thanks for the links. Very interesting. The DB2 document especially mentions an important point : in order to make their planner/optimizer smarter, they had to make it slower, hence it became crucial to cache the plans. Contrast this with MySQL, where using prepared statements gains nothing : the "optimizer" does so little work that it actually doesn't matter.

So, basically, Oracle :
- Parses the query every time (identifies tables, permissions etc) (soft parse)
- From that parsed query it looks up a cached plan (the lookup key could then be different depending on the schema etc)
- If there is none, it must plan the query (hard parse).

Also the Oracle doc mentions that the soft parsing should be avoided by using prepared statements in the application (ie Parse once and Bind lots of times). So, Oracle will redo the parsing + permissions check each time, unless prepared statements are used, in which case it's direct execution.

And DB2 : er, the document is not very clear about what it actually does, but the stats look nice ;)

I liked your global prepared statements idea much better. Naming the statements is no problem: DB frontends do that for you anyway sometimes.

Hm. The "global statements" and the cache would complement each other actually. Why not. When the user wants to name the statements, he can do so (and perhaps control who can execute what, etc, like with stored procs). Permission checking overhead will be there at each execution. Should the plan be cached locally ? (RAM consumption times N backends...) Cached per user once permissions have been checked ? (avoids the overhead of rechecking permissions) What about the search path ? (I'd force the global statements to use the default search path no matter what; being explicit is better than "why does it stop working ?") Can the application or the database library name the statements ? I'm not so sure.
This could work for compiled languages (but what about when you run several applications ? or several versions of the same application ? do we need uniqueness of statement names from all developers all over the world ?) Solution : make each application use a different user name, with global prepared statements only visible to the user that created them, perhaps. This conflicts with some desirable features, though. It needs more thinking.

What about non-compiled languages ? It will not be possible to generate a list of statements beforehand... And queries are also constructed dynamically by frameworks such as Rails, which makes naming them impossible, but caching the plans would work well. So, some situations would benefit from a plan cache.

Frankly, I think you're better off storing them in a table. Shared memory is a limited resource and you cannot change how much you've allocated after the server has started.

I'd say that unless you have a perverse application that will try all the permutations of column names just to make sure the query is different every time, how many different queries would you want to cache ?... probably less than 1000... so it wouldn't take more than a couple of megabytes...

It does mean you'll have to serialise/deserialise them, but this will be cheaper than replanning, right?

What would be the overhead of a catalog lookup to get a cached plan for a statement that returns 1 row ? Would the catalog cache make it fast enough ? And what about deserialization ?...

I am not too sure that plans and statistical counters should be stored together... Not sure either. Probably plans should go in one place, and statistics should go to the stats collector (I know he's not quite ready for this ;)). That's the problem...

Hm, a limit on how much memory can be used for plans (query_plan_cache_size GUC?), and a LRU/LFU expiration of old plans? Now it gets hairy ;) Yes, memory size should be limited.
But how to make a LRU cleaner which doesn't create lots of contention ?... Luckily, with a hash having a fixed number of buckets, it is easier (clean a bucket every N seconds, for instance).

Perhaps a GUC for controlling the query cache should have three values:
- none -- don't cache any statement
- smart -- use heuristics for deciding whether to cache it
- all -- force caching of all queries (for uncommon/statistical/testing purposes)

I would not volunteer to write that heuristic ;) Although there would be a very simple solution : if time to parse > some percentage of time to execute, then cache. The hairiness is in the plan's dependence (or independence) on parameter values; ideally we only want to cache plans that would be good for all parameter values, and only the user knows that precisely. Although it could be possible to examine the column histograms...

(like mysql, /* flags */ SELECT blah ) I don't like the hint flags. They tend to ha
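The "clean a bucket every N seconds" eviction idea could look roughly like this minimal sketch (the class, bucket count, and max_age parameter are all invented; a real shared-memory version would take one lock per bucket per sweep instead of a global LRU lock):

```python
import time

class BucketedPlanCache:
    """Fixed number of buckets; instead of maintaining a global LRU list
    (a contention point), sweep one bucket per call and evict entries
    that have not been used recently."""
    NBUCKETS = 8

    def __init__(self, max_age=300.0):
        self.buckets = [dict() for _ in range(self.NBUCKETS)]
        self.max_age = max_age          # seconds before an unused plan expires
        self.next_bucket = 0

    def _bucket(self, key):
        return self.buckets[hash(key) % self.NBUCKETS]

    def put(self, key, plan, now=None):
        self._bucket(key)[key] = (plan, now if now is not None else time.time())

    def get(self, key, now=None):
        b = self._bucket(key)
        if key in b:
            plan, _ = b[key]
            # Refresh the last-used timestamp on every hit.
            b[key] = (plan, now if now is not None else time.time())
            return plan
        return None

    def sweep(self, now=None):
        """Called periodically; only touches (and would only lock) one bucket."""
        now = now if now is not None else time.time()
        b = self.buckets[self.next_bucket]
        for key in [k for k, (_, ts) in b.items() if now - ts > self.max_age]:
            del b[key]
        self.next_bucket = (self.next_bucket + 1) % self.NBUCKETS

cache = BucketedPlanCache(max_age=300.0)
cache.put('SELECT 1', 'plan1', now=0.0)
assert cache.get('SELECT 1', now=10.0) == 'plan1'
for _ in range(BucketedPlanCache.NBUCKETS):     # one full sweep cycle
    cache.sweep(now=400.0)
assert cache.get('SELECT 1', now=400.0) is None  # expired and evicted
```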
Re: [HACKERS] Cached Query Plans (was: global prepared statements)
On Fri, Apr 11, 2008 at 12:34 PM, PFC <[EMAIL PROTECTED]> wrote: Well, I realized the idea of global prepared statements actually sucked, so I set on another approach thanks to ideas from this list, this is caching query plans. Well, that's a blatantly bad realization. Perhaps you should do more research. No, what I meant is that the "global prepared statements" as I tried to implement them before weren't that good... I think simple caching based on the query text itself is preferable to having to name each of your queries, extract them from your programs and replace them by executes, issue a "create statement" command for each of them, etc. Few people would actually use that feature because it would mean lots of modifications to the application, so all the applications that have to be compatible with other databases would not use the feature (*) It could be useful for permissions and fine access control, though, but views and stored procs already provide that functionality... (*) = Note that caching the plans based on the query text (with $ params) from a parse message will not provide caching for oldskool queries with params inside in the form of escaped strings. This is good, because it means the safer solution (using $-quoted params) will also be the faster solution. And in the application, only a very small part of the code needs to be changed, that's the DB abstraction layer. Doesn't Oracle do this now transparently to clients? Of course it does, and it has since the late 80's I believe. Oracle keeps a statement/plan cache in its shared memory segment (SGA) that greatly improves its performance at running queries that don't change very often. Can we have more details on how Oracle does it ? For "inspiration"... 
Here is what I'm thinking about : don't flame me too much about implementation issues, this is just throwing ideas in the air to see where they'll fall ;)

* global plan cache in shared memory, implemented as a hashtable, the hash key being (search_path, query_string)

Doubt : can a plan be stored in shared memory ? Will it have to be copied to local memory before being executed ?

This stores :
- the plans (not for all keys, see below)
- the stats :
  - number of times this query has been executed
  - total, min and max wallclock time and CPU time spent planning this query
  - total, min and max wallclock time, CPU time and RAM spent executing this query
  - total, min and max number of rows returned
  - last timestamp of execution of this query

There should be separate GUCs to control this :
- should the whole thing be activated ?
- should the cache be active ? or just the stats ? and what stats ?

There should also be a way to query this to display the statistics (ie "what query is killing my server ?"), and a way to purge old plans.

* every time a Parse message comes up :
- look if the (search_path, query_string) is in the cache
- if it is in the cache :
  - if there is a cached plan, make the unnamed statement point to it, and we're done.
  - if there is no cached plan, prepare the query, and put it in the unnamed statement. Now, the query has been parsed, so we can decide if it is cacheable. Should this be done in Parse, in Bind, or somewhere else ? I have no idea. For instance, queries which contain VALUES() or IN( list of consts ) should not be cached, since the IN() is likely to change all the time; it would just thrash the cache. Using =ANY( $1 ) instead will work with cached plans.

Also, will a plan to be cached have to be prepared with or without the parameters ? That's also an interesting question... Perhaps the user should also be able to specify whether to cache a plan or not, or whether to use the params or not, with hint flags in the query string ?
(like mysql, /* flags */ SELECT blah )

Now, if the query is cacheable, store it in the cache, and update the stats. If we decided to store the plan, do that too. For instance, we might decide to store the plan only if this query has been executed a certain number of times, etc.

* In the Execute message, if a cached plan was used, execute it and update the stats (time spent, etc).

Now, about contention : since this is one shared hashtable for everyone, it will be fought for... However, the lock on it is likely to be held for a very small time (much less than a microsecond), so would it be that bad ? Also, GUCs can be used to mitigate the contention; for instance, if the user is not interested in the stats, the thing becomes mostly read-only.
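The per-entry statistics listed above could be modelled like this (a hypothetical sketch: the entry fields mirror the stats enumerated in the proposal, but the class and function names are mine, and a real implementation would live in shared memory, not a Python dict):

```python
import dataclasses

@dataclasses.dataclass
class PlanCacheEntry:
    """One entry of the proposed plan cache, keyed on
    (search_path, query_string): a plan plus execution stats."""
    plan: object = None
    n_executions: int = 0
    total_time: float = 0.0
    min_time: float = float('inf')
    max_time: float = 0.0

    def record(self, elapsed):
        # Update the running totals after each Execute message.
        self.n_executions += 1
        self.total_time += elapsed
        self.min_time = min(self.min_time, elapsed)
        self.max_time = max(self.max_time, elapsed)

cache = {}

def on_execute(search_path, sql, elapsed):
    entry = cache.setdefault((search_path, sql), PlanCacheEntry())
    entry.record(elapsed)
    return entry

e = on_execute('public', 'SELECT * FROM test WHERE id=$1', 0.000067)
on_execute('public', 'SELECT * FROM test WHERE id=$1', 0.000125)
assert e.n_executions == 2
assert e.min_time == 0.000067 and e.max_time == 0.000125
```

A "what query is killing my server ?" view would then just be a sort of these entries by total_time.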
Re: [HACKERS] Cached Query Plans
Well if you're caching per-connection then it doesn't really matter whether you do it on the client side or the server side, it's pretty much exactly the same problem.

Actually I thought about doing it on the server since it would then also work with connection pooling. Doing it on the client means the client has to maintain state, which is not possible in a pool...

Unsurprisingly most drivers do precisely what you're describing. In Perl DBI for example you just change $dbh->prepare("") into $dbh->prepare_cached("") and it does exactly what you want. I would expect the PHP drivers to have something equivalent.

Well, you clearly have "expectations" about PHP, lol. PHP takes pride in always aiming below your expectations, not above ;) It has no such feature. Also pg_query_params() is SLOWER than pg_query(), which makes you choose between clean & slow, and string quoting hell. Perhaps I should patch PHP instead... Or perhaps this feature should be implemented in pgpool or pgbouncer.

But, using prepared statements with persistent connections is messy, because you never know if the connection is new or not. If you were to fix *that* then both this problem and others (such as setting up desired SET-parameter values) would go away.

True. Languages that keep a long-running context (like application servers etc) can do this easily. Although in the newer versions of PHP it's not so bad; pconnect seems to work (ie. it will issue ROLLBACKs when the script dies, reset session variables like enable_indexscan, etc), so the only remaining problem seems to be prepared statements. And again, adding a method for the application to know if the persistent connection is new or not will not work in a connection pool...

Perhaps a GUC flag saying EXECUTE should raise an error but not kill the current transaction if the requested prepared statement does not exist ? Then the application would issue a PREPARE.
It could also raise a non-fatal error when the tables have changed (column added, for instance) so the application can re-issue a PREPARE. But I still think it would be cleaner to do it in the server. Also, I rethought about what Gregory Stark said : The contention on the shared cache is likely to negate much of the planning savings but I think it would still be a win. If a shared plan cache is implemented, it will mostly be read-only, ie. when the application is started, new queries will come, so the plans will have to be written to the cache, but then once the cache contains everything it needs, it will not be modified that often, so I wouldn't think contention would be such a problem... It's not so easy as all that. Consider search_path. Consider temp tables. Temp tables : I thought plan revalidation took care of this ? (After testing, it does work, if a temp table is dropped and recreated, PG finds it, although of course if a table is altered by adding a column for instance, it logically fails). search_path: I suggested to either put the search_path in the cache key along with the SQL string, or force queries to specify schema.table for all tables. It is also possible to shoot one's foot with the current PREPARE (ie. search_path is used to PREPARE but of course not for EXECUTE), and also with plpgsql functions (ie. the search path used to compile the function is the one that is active when it is compiled, ie at its first call in the current connection, and not the search path that was active when the function was defined)... 
    SET search_path TO DEFAULT;
    CREATE SCHEMA a;
    CREATE SCHEMA b;
    CREATE TABLE a.test( v TEXT );
    CREATE TABLE b.test( v TEXT );
    INSERT INTO a.test VALUES ('This is schema a');
    INSERT INTO b.test VALUES ('This is schema b');

    CREATE OR REPLACE FUNCTION test_search_path() RETURNS SETOF TEXT LANGUAGE plpgsql AS $$
    DECLARE
        x TEXT;
    BEGIN
        FOR x IN SELECT v FROM test LOOP
            RETURN NEXT x;
        END LOOP;
    END;
    $$;

    test=> SET search_path TO a,public;
    test=> SELECT * FROM test_search_path();
     test_search_path
    ------------------
     This is schema a

    test=> \q
    $ psql test
    test=> SET search_path TO b,public;
    test=> SELECT * FROM test_search_path();
     test_search_path
    ------------------
     This is schema b

    test=> SET search_path TO a,public;
    test=> SELECT * FROM test_search_path();
     test_search_path
    ------------------
     This is schema b
Re: [HACKERS] Cached Query Plans
I think what he's referring to is persistently caching plans so that new connections can use them. That makes a lot more sense if you have lots of short-lived connections like a stock php server without persistent connections turned on or a connection pooler. You can prepare queries but they only live for a single web page so you don't get any benefit. Let me explain a little further. Persistent database connections are the way to go for web applications, because the connection is only going to get used for a few queries, and the time needed to start the postgres process and establish the connection is often significant compared to the time used for the actual queries. Connection pooling can also be used, you get the idea. So, using persistent database connections, it makes sense to use prepared statements to speed up execution of simple queries, like those returning a few rows with a few joins and no complicated WHERE clauses, which is actually most of the queries on your average website. As shown in my previous message, the CPU time spent planning the query can be as much or even a lot more than CPU time spent actually executing the query. But, using prepared statements with persistent connections is messy, because you never know if the connection is new or not, if it contains already prepared statements or not, you'd have to maintain a list of those statements (named) for every query in your application, and when someone changes a query, it's a mess, not to mention queries generated by the ORM like Rails etc. The idea in this "proof of concept" was : Wouldn't it be nice if Postgres could just say "Hey, I already planned that query, I'll reuse that plan". And it is very easy to recognize a query we've seen before, since $-params takes the parameters out of the equation, and eliminates parsing time and string quoting hell. 
Storing the cached plans as prepared statements in the connection-local hashtable makes sense : it doesn't use that much memory anyway, and there are no locking and contention problems. Just like PREPARE and EXECUTE. Personally I would like to see this, not primarily for the performance gains, but for the possibility of managing when plans change -- ie, plan stability. Unfortunately, this isn't compatible with a non-shared memory approach... But there is resistance from other quarters about the reliability hit of having the plan data structures in shared memory. I agree. Hence the idea to put them in non-shared memory, local to a process. Perfectly useless when using non-persistent connections, but very powerful when using persistent connections. I still don't see why you would need a wire protocol change. Because I'd think that sometimes the client will not want to use a cached plan, when the query is rarely used (no need to waste memory to cache the plan), or it is complex and needs to be replanned according to parameter values every time. Sure, the client could use the oldskool "send query as text with parameters inside" but that's back to string escaping hell, and it's ugly. It would be nicer to have a bool "cache_plan". You would just have clients prepare plans normally and stash them in shared memory for other backends in a hash table keyed by, well, something, perhaps the original query text. Query text seems to be the simplest, better not ask the user to come up with distinct names when the query text will be a perfect key. Besides, hand-generated names might turn out not to be so distinct after all... Then whenever you're asked to prepare a query you go check if someone else has already done it for you and find an already generated plan in the shared memory hash table. The contention on the shared cache is likely to negate much of the planning savings but I think it would still be a win. 
But what's really interesting to me is then providing an interface to see and manipulate that cache. Then you could see what plans other backends are using for queries, mark plans as being acceptable or not, and even revoke users' permissions to execute queries which aren't already present and marked as being acceptable.

If it can be made to work with a shared cache, why not, but that would be more complex. You'd also have to deal with permissions, different users with different privileges, etc. But local would probably be simplest (and faster). Also, there will be problems with the schema search path. Perhaps a query should be required to specify the fully qualified table names (schema.table) for all tables in order to be cacheable.
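The connection-local cache keyed on query text amounts to something like this toy model (the Connection class and execute_cached wrapper are illustrative stand-ins, not a real driver API; this is the same idea as Perl DBI's prepare_cached):

```python
class Connection:
    """Toy connection: statements must be prepared (planned) before execute."""
    def __init__(self):
        self.prepared = {}    # statement name -> "plan"
        self.plan_count = 0   # how many times we paid for parse + plan

    def prepare(self, name, sql):
        self.plan_count += 1              # the expensive part: parse + plan
        self.prepared[name] = ('plan for', sql)

    def execute(self, name, params):
        return (self.prepared[name], params)

def execute_cached(conn, sql, params):
    """Use the query text itself as the statement name, so repeated
    executions of the same $-parameterized text skip planning."""
    if sql not in conn.prepared:
        conn.prepare(sql, sql)
    return conn.execute(sql, params)

conn = Connection()
execute_cached(conn, 'SELECT * FROM test WHERE id=$1', (1,))
execute_cached(conn, 'SELECT * FROM test WHERE id=$1', (2,))
assert conn.plan_count == 1   # planned once, executed twice
```

Because the $-params are outside the query text, the same text recurs exactly, which is what makes it a usable cache key with no naming scheme needed.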
[HACKERS] Cached Query Plans (was: global prepared statements)
Well, I realized the idea of global prepared statements actually sucked, so I set off on another approach thanks to ideas from this list: caching query plans.

First, let's see if there is low hanging fruit with the typical small, often-executed queries that are so frequent on websites. Tables test, test2 and test3 contain id (integer primary key) and another integer field. There are 100K rows in each.

First, the simplest query :

    SELECT * FROM test WHERE id = $1

    110 us : Send query as text (PHP:pg_query -> PQexec)
    125 us : Parse+Bind (PHP:pg_query_params -> PQexecParams)
     67 us : Execute a previously prepared statement (PHP:pg_execute -> PQexecPrepared)

A slightly more complex one, but still pretty classic :

    SELECT * FROM (SELECT * FROM test WHERE id>$1 ORDER BY id LIMIT 5) AS a
    NATURAL LEFT JOIN test2 NATURAL LEFT JOIN test3 ORDER BY id

    523 us : Send query as text (PHP:pg_query -> PQexec)
    580 us : Parse+Bind (PHP:pg_query_params -> PQexecParams)
    148 us : Execute a previously prepared statement (PHP:pg_execute -> PQexecPrepared)

OK, so there is low hanging fruit, since the parsing+planning time of those is longer than doing the query itself. Since the Parse message includes a $-parameterized query that is to be prepared, it seems logical to put the caching logic there : the query string (without parameters) makes a nice cache key.

So I made a really quick and really dirty experiment without changing the wire protocol between client and server. This is only a "proof of concept".

Try #1 : in exec_parse_message(), if the statement is named, look it up in the prepared statements cache; if it is found, return at once and do nothing else. To exploit this, I issue a pg_prepare() followed by pg_execute() for every query, whether or not the statement exists. If it already exists, pg_prepare() now does nothing (except losing a little time).

Results :

     88 us : simple query
    173 us : complex query

So, the timings are between a simple execute and a plan+execute.
It provides a nice performance gain versus replanning every time, but it's not perfect.

Try #2 : again, in exec_parse_message(), if the statement is unnamed, I use the query string as the statement name and search for the plan in the prepared statements hash table. If it is not found, it is prepared. Then I make the unnamed statement plan point to this. Of course, this is dangerous since it probably introduces a dozen crash bugs, but for this proof of concept, it's OK. Client code is unchanged; PQexecParams will benefit from the plan caching, since it always sends a Parse+Bind message using the unnamed statement.

Results are identical to executing an execute on a prepared statement, modulo a few microseconds. This means the overhead of sending the Parse message, and of the server ignoring it when the statement is cached, is negligible.

So, where to go from that ? I don't see a way to implement this without a (backwards-compatible) change to the wire protocol, because the clients will want to specify when a plan should be cached or not. Since the user should not have to name each and every one of the statements for which they want plan caching, I see the following choices :

- Add a new Parse+Bind command, which gets the $-parameterized SQL and the parameters. If the plan is cached, grab it and execute, else prepare and execute. Add a flag to allow the client to specify if he wants caching or not.
  Pros : only one message, faster.
  Cons : SQL is transmitted in full, useless most of the time, but this overhead is rather small.
- Send the SQL with Bind as statement name, add a flag to Bind telling it to report a cache miss instead of raising an error, then have the client send a Parse and Bind again.
- Should there be one common hashtable for named prepared statements and cached plans, or two hashtables ? Using the SQL string as the statement name is not clean.
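The first option (a combined Parse+Bind carrying a caching flag) might behave like this sketch (a toy server model; the Session class is invented, and is_cacheable is a deliberately crude stand-in for the real "no VALUES(), no IN( list of consts )" test the message describes):

```python
class Session:
    """Toy backend session with a plan cache."""
    def __init__(self):
        self.cache = {}
        self.plans_built = 0

    def plan(self, sql):
        self.plans_built += 1         # the hard parse
        return ('plan', sql)

    def execute(self, plan, params):
        return (plan, params)

def is_cacheable(sql):
    # Crude heuristic: queries with inlined constant lists would just
    # thrash the cache; =ANY($1) works with cached plans, IN (1,2,3) doesn't.
    u = sql.upper()
    return ' IN (' not in u and 'VALUES' not in u

def exec_parse_bind(session, sql, params, want_cache):
    """Combined Parse+Bind: look the $-parameterized SQL up in the plan
    cache; on a miss, plan it and cache it if allowed."""
    plan = session.cache.get(sql) if want_cache else None
    if plan is None:
        plan = session.plan(sql)
        if want_cache and is_cacheable(sql):
            session.cache[sql] = plan
    return session.execute(plan, params)

s = Session()
exec_parse_bind(s, 'SELECT * FROM t WHERE id=$1', (1,), True)
exec_parse_bind(s, 'SELECT * FROM t WHERE id=$1', (2,), True)
assert s.plans_built == 1             # second call hit the cache
exec_parse_bind(s, 'SELECT * FROM t WHERE id IN (1,2,3)', (), True)
exec_parse_bind(s, 'SELECT * FROM t WHERE id IN (1,2,3)', (), True)
assert s.plans_built == 3             # not cacheable: replanned each time
```

The second option (Bind reporting a cache miss) differs only in that the miss path goes back to the client instead of being resolved server-side.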
[HACKERS] Dumb Micro-Optimization
* Dumb Optimization #1:

- Add an executorFunc function pointer to struct PlanState
- In ExecProcNode.c -> ExecProcNode() :
  - upon first execution, set executorFunc to the function corresponding to the node type
  - next calls use the function pointer

Effect : removes a switch (nodeTag(node)) which otherwise executes for every tuple returned by every node.

Gain :
- 4% CPU time on SELECT sum(an integer column) FROM a table of one million rows
- nil on selects returning few rows, obviously
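A Python analogue of the trick (resolve the per-node dispatch once, cache the function pointer on the node, and call it directly thereafter) looks like this; the node layout and executor function names are invented for illustration:

```python
def exec_seq_scan(node):
    return ('seqscan tuple', node['relation'])

def exec_index_scan(node):
    return ('indexscan tuple', node['relation'])

# The equivalent of the switch (nodeTag(node)) table.
DISPATCH = {'SeqScan': exec_seq_scan, 'IndexScan': exec_index_scan}

def exec_proc_node(node):
    func = node.get('executor_func')
    if func is None:
        # First execution: resolve the node type once and cache the pointer.
        func = DISPATCH[node['type']]
        node['executor_func'] = func
    # Every later call is a direct call, no per-tuple type dispatch.
    return func(node)

node = {'type': 'SeqScan', 'relation': 'test'}
assert exec_proc_node(node) == ('seqscan tuple', 'test')
assert node['executor_func'] is exec_seq_scan   # resolved and cached
```

The win comes from executing the type lookup once per node instead of once per returned tuple, which is why it only shows up on queries that pump millions of tuples through the executor.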
Re: [HACKERS] Free Space Map data structure
PFC wrote: About the FSM : Would it be possible to add a flag marking pages where all tuples are visible to all transactions ? (kinda like frozen I think)

Ah, the visibility map. That's another line of discussion. The current plan is to not tie that to the FSM, but implement it separately. There are some infrastructure changes that are needed for both, like the "map forks" (see recent FSM discussions), which is why we need to have a design for the FSM as well before we start implementing the visibility map. It's definitely something I want to do for 8.4.

Ahh, yes, yes, yes ;) yes !

Here's my rough plan:
1. Common "map forks" support
2. Rewrite FSM
3. Implement visibility map, to allow partial vacuums
4. Implement index-only scans, using the visibility map.

Throwing in another idea that is related to partial vacuums (perhaps ?) :

Consider a table that is often inserted into (archive, forum posts, log, whatever) and that we want to CLUSTER. In that case it would be beneficial to only cluster the "tail" of the table, where all the recently inserted rows are.

Example :
- Table is clustered.
- Insert rows in random order, update some, delete some, etc, supposing inserts happen at the end.
- Table now looks like "head":[clustered part with some holes] plus "tail":[rows in random order]
- As long as the "tail" fits in disk cache, the random order isn't a problem.
- So, when the "tail" reaches a certain size :
  - Grab it, sort it by cluster order, write it again in the heap
  - Update the indexes in a manner similar to VACUUM (ie. bulk update)
- Table now looks like "head":[clustered part with some holes] plus "tail":[clustered]

This does not remove the holes in the "head", but is this really a problem ? In this usage scenario, I don't think so. Regular CLUSTER could also be run, much less frequently than before, and it will also be much faster since the rows are approximately in order already.
This approach is complementary to the "auto-cluster" approach where the index is asked "where should I insert that row ?" (this will be used to fill the holes). Auto-cluster will work well on tables that are updated very often. But starting from an empty table, or an already clustered table, or in a mostly-insert scenario, the index will have no idea where to put that row...

The goodness of this approach is that :
- As long as the "tail" fits in RAM, sorting it is going to be very fast (unlike the current CLUSTER).
- Bulk index updates will also be fast as long as the list of changes to apply to the index fits in memory.
- Therefore it will block the table for much less time than good old CLUSTER.
- Therefore it will get more use ;)

How to make it non-locking ? By doing something like this in pseudo-SQL :

INSERT INTO table
SELECT * FROM (DELETE FROM table WHERE date > last time we did this RETURNING *)
ORDER BY cluster_columns;
VACUUM;

That is, take the "tail" of the table (as above), sort it, insert it back in big chunks, and mark the old rows as deleted just like a regular delete would have done. Then VACUUM. In this case you now have :

"head":[clustered part with some holes] + "big hole" + "tail":[clustered rows]

Is the "big hole" a problem ? Probably not, it will be marked as free space by VACUUM and used for new inserts. A week later we get this :

"head":[clustered part with some holes] + [rows in random order] + "tail":[clustered rows]

Repeating the process above will make the "tail" grow again and the "hole" will stay more or less in the same place.

Another way to do it is to use partitions :
- "archive" table
- "current" table
Periodically the rows from "current" are transferred to the "archive" table and sorted in the process. Then "current" is truncated. This works, but it is blocking, and you have the overhead from partitioning...
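The key property of the scheme — only the tail ever needs sorting, the clustered head stays put — is easy to state in code. A toy sketch, assuming rows are represented by their cluster-key values in a plain array; the real operation would of course rewrite heap pages and bulk-update indexes.

```c
#include <assert.h>
#include <stdlib.h>

/* Ordering for the cluster key (plain ints in this toy model). */
static int cmp_key(const void *a, const void *b)
{
    int x = *(const int *) a, y = *(const int *) b;
    return (x > y) - (x < y);
}

/*
 * Re-cluster only rows[tail_start .. n-1]: the head of the heap is
 * already in cluster order from the previous pass, so a full CLUSTER
 * (sorting all n rows) is unnecessary.
 */
void recluster_tail(int *rows, size_t tail_start, size_t n)
{
    qsort(rows + tail_start, n - tail_start, sizeof(int), cmp_key);
}
```

Since the tail is the only part sorted, the work (and the lock time) is proportional to the recently inserted data, not the whole table.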
Re: [HACKERS] Free Space Map data structure
About the FSM : Would it be possible to add a flag marking pages where all tuples are visible to all transactions ? (kinda like frozen I think) This could be useful to implement index-only scans, for count(), or to quickly skip rows when OFFSET is used, or to use only the index when the selected columns are all in the index. Of course if the page is flagged as "may contain updated tuples", then it would have to look in the heap. But, for tables that are not randomly updated (for instance tables that are mostly appended to, like forum posts, or logs, or the huge archive table, etc) it could save a lot of heap lookups and IO.
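The proposed flag amounts to a per-page bitmap consulted before touching the heap. A minimal sketch of the idea, with made-up names (`all_visible`, `check_visibility`) and a counter standing in for the heap I/O that gets skipped:

```c
#include <assert.h>
#include <stdbool.h>

#define NPAGES 8

/* One "all tuples visible to everyone" bit per heap page. */
static bool all_visible[NPAGES];
static int  heap_fetches;   /* heap page reads the scan had to do */

/*
 * An index scan asking "is this tuple visible?" can answer from the
 * map alone when the page is flagged; otherwise it must charge one
 * heap fetch to check tuple visibility the usual way.
 */
bool tuple_is_visible(int page)
{
    if (all_visible[page])
        return true;      /* index-only: heap never touched */
    heap_fetches++;       /* may contain updated tuples: look in heap */
    return true;          /* (toy model: assume visible after the check) */
}
```

On an append-mostly table where almost every page ends up flagged, nearly all of those heap fetches disappear, which is exactly the IO saving described above.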
Re: [HACKERS] modules
On Sat, 05 Apr 2008 02:17:10 +0100 Gregory Stark <[EMAIL PROTECTED]> wrote: I was inclined to dismiss it myself but I think the point that's come up here is interesting. The ISP has to not just install an RPM or type make install in some source tree -- but actually log into each customer's database and run an SQL script. That does seem like more work and more risk than a lot of ISPs will be willing to take on. On (k)Ubuntu you can apt-get install postgresql-contrib-8.3 which puts everything in the right places, all you have to do then is to run the sql scripts in /usr/share/postgresql/8.3/contrib/ as user postgres... But of course you need the ISP to do it for you if you are not superuser. Some will bother to run a few commands for a user, some won't...
Re: [HACKERS] COPY Transform support
INSERT INTO mytable (id, date, ...) SELECT id, NULLIF( date, '0000-00-00' ), ... FROM mydump WHERE (FKs check and drop the broken records);

What do we gain against the current way of doing it, which is :

COPY loadtable FROM 'dump.txt' WITH ...
INSERT INTO destination_table(...) SELECT ... FROM loadtable;

You read and write the data only once instead of twice (faster) if you want to import all of it. If you just want to compute some aggregates and store the results in a table, you just read the data once and don't write it at all. The advantages are the same as your proposed transformations to COPY, except I feel this way of doing it opens more options (like, you can combine columns, check FKs at load, do queries on data without loading it, don't necessarily have to insert the data in a table, don't have to invent a new syntax to express the transformations, etc).
Re: [HACKERS] COPY Transform support
On Thu, 03 Apr 2008 16:57:53 +0200, Csaba Nagy <[EMAIL PROTECTED]> wrote: On Thu, 2008-04-03 at 16:44 +0200, PFC wrote: CREATE FLATFILE READER mydump ( id INTEGER, date TEXT, ... ) FROM file 'dump.txt' (followed by delimiter specification syntax identical to COPY, etc) ;

Very cool idea, but why would you need to create a reader object first ? You should be able to use COPY directly with the target table being omitted, meaning the copy will not pump its result into the target but be equivalent to a select... and be usable in any place where a select can be used. This would have absolutely no new syntax, just the rules changed...

Now that I had a second look you actually need the field definitions to meaningfully interpret the file,

Yeah, you need to tell Postgres the field names, types, and NULLness before it can parse them...

or else it's just a plain flat text file which makes no sense... but then why not use a record specification instead of the table in the normal COPY command ? I'm not sure if there's any existing syntax for that but I would guess yes...

Hm, yeah, that's even simpler : just create a type for the row (or just use table%ROWTYPE if you have a table that fits the description), and tell COPY to parse according to the row type definition... smart... Like :

CREATE TYPE import_rowtype AS (id INTEGER, date TEXT);
INSERT INTO mytable (id, date, ...)
SELECT id, NULLIF( date, '0000-00-00' )::DATE
FROM (COPY AS import_rowtype FROM 'mysql_trash.txt') AS foo
WHERE (FKs check and drop the broken records);

Looks clean... Obviously, in this case (and also in my proposal's case) you must use COPY and not \copy since it is the database server which will be reading the file. This could probably be hacked so the client sends the file via the \copy interface, too...

In any case, such a feature would help a lot in processing input files based also on other existing data in the DB.

Yeah, it would be cool.
Also, since COPY TO can use a SELECT as a data source, you could use postgres to read from a file/pipe, process data, and write to a file/pipe (kinda better than sed, lol)
Re: [HACKERS] COPY Transform support
Data transformation while doing a data load is a requirement now and then. Considering that users will have to do mass updates *after* the load completes to mend the data to their liking should be reason enough to do this while the loading is happening. I think to go about it the right way we should support the following:

* The ability to provide per-column transformation expressions
* The ability to use any kind of expression while doing the transformation. The transformation expression should be any expression (basically ExecEvalExpr) that can be evaluated to give a resulting value and obviously a corresponding is_null value too. It could use built-in functions (e.g. UPPER, TRIM, TO_CHAR, TO_NUMBER etc.) or user defined functions too
* The transformation expression can refer to other columns involved in the load, so that when the current row is extracted from the input file, the current values are used to generate the new resultant values before doing a heap_form_tuple. E.g. (col1 transform "col1 + 10", col2 transform "col1 * col2", col3 transform "UPPER(col1 || col3)", ...)

I have spent some thought on how to do this and will be happy to share it if the list is interested. Personally, I think data transformation using such expressions is a pretty powerful and important activity while doing the data load itself.

Well, since COPY is about as fast as INSERT INTO ... SELECT plus the parsing overhead, I suggest adding a special SELECT form that can read from a file instead of a table, which returns tuples, and which therefore can be used and abused to the user's liking. This is a much more powerful feature because :
- there is almost no new syntax
- it is much simpler for the user
- lots of existing stuff can be leveraged

EXAMPLE : Suppose I want to import a MySQL dump file (gasp !) which obviously contains lots of crap like 0000-00-00 dates, '' instead of NULL, broken foreign keys, etc.
Let's have a new command :

CREATE FLATFILE READER mydump ( id INTEGER, date TEXT, ... ) FROM file 'dump.txt' (followed by delimiter specification syntax identical to COPY, etc) ;

This command would create a set-returning function which is basically a wrapper around the existing parser in COPY. The column definition gives a name and type to the fields in the text file, and tells the parser what to expect and what to return. It looks like a table definition, and this is actually pretty normal : it is, after all, very close to a table.

INSERT INTO mytable (id, date, ...) SELECT id, NULLIF( date, '0000-00-00' ), ... FROM mydump WHERE (FKs check and drop the broken records);

Now I can import data and transform it at will using a simple SELECT. The advantage is that everybody will know what to do without learning a new command, no awkward syntax (transform...), you can combine columns in expressions, JOIN to check FKs, use ORDER to get a clustered table, anything you want, without any extension to the Postgres engine besides the creation of this file-parsing set-returning function, which should be pretty simple.

Or, if I have a few gigabytes of logs, but I am absolutely not interested in inserting them into a table, instead I want to make some statistics, or perhaps I want to insert into my table some aggregate computation from this data, I would just :

CREATE FLATFILE READER accesses_dump ( date TEXT, ip INET, ... ) FROM file 'web_server_logtxt';

And I can do some stats without even loading the data :

SELECT ip, count(*) FROM accesses_dump GROUP BY ip HAVING count(*) > 1000 ORDER BY count(*);

Much better than having to load those gigabytes just to make a query on them...
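The row-parsing core that such a FLATFILE READER would wrap is simple: split each delimited line into fields according to the declared column list, applying a NULLIF-style rule for MySQL's zero dates. A toy sketch in C; `ImportRow`, `parse_row`, and the two-column layout are illustrative assumptions, not anything from the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One row of the hypothetical reader: ( id INTEGER, date TEXT ). */
typedef struct {
    int  id;
    char date[11];     /* "YYYY-MM-DD" */
    int  date_is_null; /* set when NULLIF(date, '0000-00-00') fires */
} ImportRow;

/* Parse "id<TAB>date" into a typed row; returns 0 on success. */
int parse_row(const char *line, ImportRow *row)
{
    char buf[64];
    strncpy(buf, line, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    char *tab = strchr(buf, '\t');
    if (tab == NULL)
        return -1;               /* malformed line: wrong field count */
    *tab = '\0';

    row->id = atoi(buf);
    /* MySQL dumps use '0000-00-00' where they mean NULL. */
    row->date_is_null = (strcmp(tab + 1, "0000-00-00") == 0);
    if (!row->date_is_null) {
        strncpy(row->date, tab + 1, sizeof row->date - 1);
        row->date[sizeof row->date - 1] = '\0';
    }
    return 0;
}
```

A set-returning function built around this loop would hand each `ImportRow` back as a tuple, and the surrounding SELECT does all the filtering and transformation.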
Re: [HACKERS] first time hacker ;) messing with prepared statements
The MAJOR benefit of Microsoft's approach is that it works on existing application, Yes, that is a nice benefit ! Is there a way to turn it on/off ? Or is it smart enough to only cache plans for cases where it is relevant ? For instance, I absolutely want some queries to be planned according to real parameters (makes huge difference on some search queries, as expected), whereas most simple queries like the proverbial select by ID etc could be cached without problems... and, most importantly makes NO assumptions on the "volatile" server state. A few cases where the Microsoft solution works, while yours will fail is: * Server restart and assorted like failover (you need to redo a global prepare). * Cleanup and instantiation of a prepared statement. Hehe, actually, mine does work after restart since the statements are stored in a database-specific system catalog which is persistent. Actually, what I store is not the result of PREPARE (a plan) but the text of the SQL query "PREPARE foo.", that is I just cut the GLOBAL from "GLOBAL PREPARE" and store the rest. The actual PREPARE is realized by each connection when it encounters an EXECUTE request and doesn't find the cached plan. It is actually extremely simple ;) did you expect a fancy shared memory cache (ahem...) ? No, no, it's very basic. This way, if a table was dropped and recreated, or whatever other stuff that can invalidate a plan since the GLOBAL PREPARE was issued, no problem, since there was no global stored plan anyway, just some SQL text. Also if a needed table was dropped, the user will get the same error message as he would have got issuing a PREPARE for the associated SQL query string. The overhead of each connection doing its own PREPARE is negligible, since, if you use that feature, you intend to issue this query many, many times during the life of the persistent connection. 
What you are doing for a global query cache is already under consideration, and having a plan invalidation mechanism on schema changes or, maybe, statistics updates was a step in that direction. Your code mostly contributed the other parts already.

As I said it is much simpler than that : I store no plans ;) Of course this means it only works with persistent connections.

Another consideration is whether most tasks are getting CPU bound or IO bound. A better, per query, plan might reduce IO load due to better use of statistics on that single case, while for CPU bound it is very nice to reduce the planning overhead significantly.

Well, if it is IO bound, then this thing is useless. However, since the purpose is to optimize often-used, simple queries, the likes of which abound in web applications, it is relevant... because, if this kind of simple select becomes IO bound, and you have a few on each page, you're really in trouble...

Another possible implementation would be to use a connection pooler which, when opening a new connection, can be configured to send a SQL script containing all the PREPARE statements. This is, IMHO, an application side feature that might be a good addition to PHP and other languages that provide the "persistent connection" feature. On second thought, if it is not in Postgres, I'd rather put this in the connection pooler, because this way it can be used by different applications.

But then you have to use a connection pooler. Also, by putting it in Postgres, statements are only prepared as needed, whereas the pooler would have to issue a lot of PREPAREs at connection startup, making new connection startup slower.
Re: [JDBC] [HACKERS] How embarrassing: optimization of a one-shot query doesn't work
On Tue, 01 Apr 2008 16:06:01 +0200, Tom Lane <[EMAIL PROTECTED]> wrote: Dave Cramer <[EMAIL PROTECTED]> writes: Was the driver ever changed to take advantage of the above strategy? Well, it's automatic as long as you use the unnamed statement. About all that might need to be done on the client side is to use unnamed statements more often in preference to named ones, and I believe that something like that did get done in JDBC. regards, tom lane

PHP is also affected if you use pg_query_params... Syntax : pg_query_params( "SQL with $ params", array( parameters ) ). Note that value is TEXT, indexed, and there are 100K rows in the table.

pg_query( "SELECT * FROM test WHERE id = 12345" );
1 rows in 0.15931844711304 ms
pg_query( "SELECT * FROM test WHERE value LIKE '1234%'" );
11 rows in 0.26795864105225 ms
pg_query_params( "SELECT * FROM test WHERE id = $1", array( 12345 ) );
1 rows in 0.16618013381958 ms
pg_query_params( "SELECT * FROM test WHERE value LIKE $1", array( '1234%' ));
11 rows in 40.66633939743 ms

The last query does not use the index. However, since no one uses pg_query_params in PHP (PHP coders just LOVE to manually escape their strings, or worse, use magic_quotes), no one should notice ;)
Re: [HACKERS] first time hacker ;) messing with prepared statements
* Server restart and assorted like failover (you need to redo a global prepare).

Hmm? He's proposing storing the info in a system catalog. That hardly seems "volatile"; it'll certainly survive a server restart.

Yes, it's in a system catalog.

I agree with the point that this isn't completely transparent to applications, but if an app is already using named prepared statements it would surely be a pretty small matter to make it use this feature. The app code would likely get simpler instead of more complex, since you'd stop worrying about whether a given statement had been prepared yet in the current session.

Thanks. That was the idea behind this hack...

I'm having a problem with the terminology here, since AFAICT what your patch does is exactly not a global "prepare" --- there is no permanently stored cached plan. That's a good thing probably, but it seems like the feature needs to be described differently.

Sure, but I couldn't come up with a suitable name at the time... perhaps CREATE STATEMENT (and DROP STATEMENT) ? This would describe it better (especially the DROP, because GLOBAL DEALLOCATE is a rather bad name, since it doesn't actually deallocate anything...)

I'm also pretty dubious about storing raw text in that catalog. In the first place, while I've not looked at your patch, I expect you are pulling the raw text from debug_query_string. That won't work in cases where multiple SQL commands were submitted in one query string.

LOL, you are right : I had tested with multiple queries on the same line from psql, but psql apparently splits the queries ; when I feed multiple queries from PHP, one of them being GLOBAL PREPARE, it fails.

In the second place, raw-text SQL commands will be subject to a whole lot of ambiguity at parse time. If for instance another session tries to use the command with a different search_path or standard_conforming_strings setting, it'll get different results.
While I can think of use-cases for that sort of behavior, it seems like mostly a bad idea.

You're right.

I'm thinking that a more appropriate representation would use stored parse trees, the same as we do in pg_rewrite, and with the same dependency information so that a stored statement couldn't outlive the objects it depends on.

Does the parse tree store fully qualified "schema.table" or "schema.function" ? I mean, if table T is mentioned in a parse tree which is stored, and the table is later dropped and recreated... or a column dropped... what happens ? Dropping the statement would seem more logical, since it would probably no longer be valid...

Another area that could do with more thought is the hard-wired association between statement ownership and accessibility. That's likely to be pretty inconvenient in a lot of cases, particularly systems that use role membership heavily.

Yes, I need to think about that. I also wonder whether statements should belong to schemas... Since they are basically an extremely simple form of a function, why not ? (but since part of the goodness of prepared statements is that they are stored in a fast hash cache, wouldn't that add too much overhead ?)

Thanks for the helpful advice.
[HACKERS] Patch : Global Prepared Statements
Here is the result of my little experiment, for your viewing pleasure, lol. Now it works. Well, it hasn't crashed yet... so I guess I can show it to people ;)

- Purpose : Allow PHP (or other languages) users to use prepared statements (pg_exec()) together with persistent connections, while not bothering about preparing all those statements when the persistent connection is first created (you have no way of knowing, from PHP, if your persistent connection is new or second-hand, by the way).

- What it does : New commands :
GLOBAL PREPARE does the same as PREPARE, except for all current and future sessions FROM THE SAME USER instead of just the current session.
GLOBAL DEALLOCATE does almost what you'd expect.

- Results : Here is a comparison between sending the query as a string (oldskool) with parameters in it, and executing a prepared statement via pg_exec() (which emits a BIND, not a SQL EXECUTE).

CREATE TABLE test( id SERIAL PRIMARY KEY, value INTEGER );
INSERT INTO test (value) SELECT random()*100 FROM generate_series( 1,10 );
CREATE INDEX test_value ON test( value );
CREATE TABLE test2 (id INTEGER PRIMARY KEY REFERENCES test( id ) ON DELETE CASCADE, value2 TEXT );
INSERT INTO test2 SELECT id, (random()*100)::TEXT FROM test;
CREATE TABLE test3 (id INTEGER PRIMARY KEY REFERENCES test( id ) ON DELETE CASCADE, value3 INTEGER );
INSERT INTO test3 SELECT id, random()*100 FROM test;

- simple SELECT : SELECT * FROM test WHERE id = $1
=> Server CPU load : -46% (almost cut in half)
=> Queries/s from PHP including PHP overhead : +71%

- three table JOIN with order by/limit : SELECT * FROM test a NATURAL JOIN test2 b NATURAL JOIN test3 c WHERE a.value < $1 ORDER BY value DESC LIMIT 1
=> Server CPU load : -84% (ie uses 6 times less CPU)
=> Queries/s from PHP including PHP overhead : +418% (ie 5 times faster)

(By the way, with pg_exec on those simple queries, Postgres beats InnoDB and MyISAM thoroughly in both queries/s (seen by client) and server CPU used by query).
Note that the patch has no influence on these results whatsoever. It just allows a PHP coder to easily use the pure goodness of pg_exec() without hassle.

- Implementation :

* a new system catalog (I snipped the query texts, but they are stored entirely).

test=# SELECT * FROM pg_global_prepared ;
 stmt_owner | stmt_name         | stmt_sql
------------+-------------------+---------------------------------------------
         10 | test_plan_order_3 | PREPARE test_plan_order_3 (INTEGER) AS SEL
         10 | test_plan_pk_3    | PREPARE test_plan_pk_3 (INTEGER) AS SELECT
         10 | test_plan_pk      | PREPARE test_plan_pk (INTEGER) AS SELECT *
         10 | test_plan_order   | PREPARE test_plan_order (INTEGER) AS SELEC

* GLOBAL PREPARE foo AS SELECT ...
Checks syntax, does a PREPARE, and if it works, inserts a row in pg_global_prepared with the user OID, statement name, and PREPARE command.

* GLOBAL DEALLOCATE foo
Deletes the line from pg_global_prepared. Note that this does not make all other open connections forget the plan ;)

* In the code : FetchPreparedStatement() gets a new sister, FetchOrCreatePreparedStatement() :
- try to find the statement in the session cache (usual behaviour) ; found -> ok, we're done
- if the transaction is aborted, byebye
- look in pg_global_prepared for the requested statement name and the current user ID ; not found -> byebye
- SPI_execute the PREPARE statement
- return the prepared statement to the caller

SQL EXECUTE, the BIND handler in postgres.c, and some others got calls to FetchOrCreatePreparedStatement(). So if you pg_exec() or EXECUTE or EXPLAIN EXECUTE, etc, a statement that does not exist, the backend will PREPARE it for you if it can find it in pg_global_prepared.

The reason for the user OID is that independent users must not be able to modify each other's prepared statements...

Patch is here (against 8.3.1) : http://home.peufeu.com/pg/

This is certainly not ready for production, lol. But since I had fun writing this, I'd rather share it.
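The lookup logic of FetchOrCreatePreparedStatement() can be sketched outside the backend. This is a toy model, not the patch itself: fixed-size arrays stand in for the session hash table and the pg_global_prepared catalog, and "preparing" is reduced to flagging an entry.

```c
#include <assert.h>
#include <string.h>

#define MAXSTMT 8

typedef struct { char name[32]; } SessionStmt;            /* session cache */
typedef struct { char name[32]; char sql[128]; } CatalogRow; /* stored PREPARE text */

static SessionStmt session[MAXSTMT]; static int n_session;
static CatalogRow  catalog[MAXSTMT]; static int n_catalog;

/*
 * Fetch-or-create: returns 1 if the statement is usable after the
 * call, 0 if it is unknown in both the session cache and the catalog.
 */
int fetch_or_create(const char *name)
{
    /* 1. usual behaviour: look in the session cache */
    for (int i = 0; i < n_session; i++)
        if (strcmp(session[i].name, name) == 0)
            return 1;

    /* 2. miss: look for a stored template for this statement name */
    for (int i = 0; i < n_catalog && n_session < MAXSTMT; i++)
        if (strcmp(catalog[i].name, name) == 0) {
            /* 3. found: run its PREPARE text, then cache it locally */
            strncpy(session[n_session].name, name,
                    sizeof session[n_session].name - 1);
            n_session++;
            return 1;
        }

    return 0;   /* no session entry and no template: genuine error */
}
```

Note that only SQL text lives in the "catalog": each session re-prepares on first use, which is what makes dropped-and-recreated tables harmless here.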
Regards, Pierre
Re: [HACKERS] first time hacker ;) messing with prepared statements
PFC wrote: Hello, So, I embarked (yesterday) on a weekend project to add a new feature to Postgres... I use PHP with persistent connections and always have been bothered that those very small AJAX queries (usually simple selects returning 1 row) take more CPU in postgres to parse & plan than to actually execute. Microsoft's answer to this issue with SQLServer appears to have been to introduce a smart cache for all statement plans. It seems to be very effective. I guess you're doing much the same thing but with more user intervention, in effect. Actually, the main purpose was to 1) have fun hacking Postgres, and 2) perhaps something useful would come of it... And I did find it very interesting, probably due to the fact that Postgres source code is so... impressively clean... well organized... readable... it took two hours from downloading the source to having an extra functional system catalog, and it worked at the first compile, all due to the concise but to the point comments in include/catalog, I couldn't believe it. Anyway, Microsoft's solution is cool, too, but you need to reparse the entire query to then detect "I've planned this query before, with other parameters, so I'll reuse that prepared plan", so it adds another parsing step, which is less efficient. Postgres could also do that with a small modification, by the way : like by using the entire string (with $1 style parameters) instead of the statement name, use that as a cache key, and send parameters separately, but I think it would be less clean than, say, a statement called "get_user_by_id" or something. Also I like the idea of named prepared queries, which feel a bit like procedures, because many of those small, often-used queries would end up being defined in the same place, which makes schema changes (and having to modify queries in your application) slightly less painful. 
PHP can also use pg_exec() which sends the parameters separately, automagically converted to postgres format, so you save time and hassle on quoting and de-quoting versus a SQL EXECUTE. Since the performance of pg_exec() is almost twice as fast as plain SQL, and PHP scripts tend to use quite a lot of these queries, it also means a free database server performance upgrade (ie. the same DB server can handle more webservers, for instance). Another possible implementation would be to use a connection pooler which, when opening a new connection, can be configured to send a SQL script containing all the PREPARE statements. Are you sure that you application wouldn't benefit more from a MOM solution with persisted database connections? Have you looked at http://safmq.sourceforge.net/? Dunno. Is this related to Postgres ?
[HACKERS] first time hacker ;) messing with prepared statements
Hello, So, I embarked (yesterday) on a weekend project to add a new feature to Postgres... I use PHP with persistent connections and always have been bothered that those very small AJAX queries (usually simple selects returning 1 row) take more CPU in postgres to parse & plan than to actually execute. Since those small queries usually come in great numbers, I would like to PREPARE them beforehand and use php's pg_exec() (faster than SQL EXECUTE). This saves about 50% CPU time on the server for those small queries. However, with persistent connections there is a problem : you never know if the query has already been prepared or not. Ideally a PHP process would open a persistent connection and find all queries already prepared, ready to execute... So :

- Added a system catalog "pg_global_prepared" (one per database actually) which contains :
  - oid of the user who created the row
  - name of the statement
  - SQL command for preparing the statement

example :

test=# SELECT * FROM pg_global_prepared ;
 stmt_owner | stmt_name       | stmt_sql
------------+-----------------+---------------------------------------------
         10 | test            | PREPARE test (INTEGER) AS SELECT $1+3;
         10 | test_plan_pk    | PREPARE test_plan_pk (INTEGER) AS SELECT * FROM test WHERE id = $1;
         10 | test_plan_order | PREPARE test_plan_order (INTEGER) AS SELECT * FROM test WHERE value < $1 ORDER BY value DESC LIMIT 1;

- Added sql command GLOBAL PREPARE foo (arg types) AS sql query
  This inserts a row in the above catalog after having run a standard "prepare" on the query to test its validity
- Added sql command GLOBAL DEALLOCATE
  This removes row(s) from the above catalog (only those owned by the current user)
- Messed with EXECUTE (ExecuteQuery) so that :
  - if the requested statement is found in the session cache, use it (as usual)
  - if not, look into pg_global_prepared to see if there is one of the same name created by the same user
  - if found, use this to PREPARE, then store in the session cache, then execute it

After that I put this logic in FetchPreparedStatement instead, so if it is asked to fetch a non-existing
statement for which there is a row in pg_global_prepared, it will create it.

test=# EXPLAIN ANALYZE EXECUTE test_plan_pk(1);
NOTICE: prepared statement "test_plan_pk" does not exist in local session cache, now searching pg_global_prepared for a template to create it.
NOTICE: found template for requested statement, executing : "test_plan_pk" :
NOTICE: PREPARE test_plan_pk (INTEGER) AS SELECT * FROM test WHERE id = $1;
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Index Scan using test_pkey on test  (cost=0.00..8.28 rows=1 width=8) (actual time=19.476..19.478 rows=1 loops=1)
   Index Cond: (id = $1)
 Total runtime: 0.079 ms
(3 rows)

So, you take whatever persistent connection from a pool and issue an EXECUTE without worries.

* Now, the problem :
- EXECUTE, EXPLAIN EXECUTE, EXPLAIN ANALYZE EXECUTE all work
- pg_exec() from php makes it crash

Actually pg_exec() does not use SQL EXECUTE ; I think it uses the new extended query protocol and just sends a message to execute a named prepared query. In that case, my code in FetchPreparedStatement crashes :

NOTICE: prepared statement "test_plan_pk" does not exist in local session cache, now searching pg_global_prepared for a template to create it.
LOG: server process (PID 30692) was terminated by signal 11: Segmentation fault
LOG: terminating any other active server processes
LOG: all server processes terminated; reinitializing

GDB says it is because CurrentResourceOwner is NULL. Did I forget to initialize something ? lol. I'll post more details and a complete traceback this afternoon, but here is the problematic bit of code ; this is the code that finds the SQL to prepare a statement.
Thanks for any suggestion ;)

    Relation        mycatalog;
    HeapTuple       tup;
    TupleDesc       dsc;
    NameData        stmt_name_data;
    ScanKeyData     skey[2];
    SysScanDesc     scan;
    Datum           datum;
    bool            found = false;
    bool            isnull;
    const char     *sql = "";

    namestrcpy(&stmt_name_data, stmt_name);
    mycatalog = heap_open(GlobalPreparedRelationId, RowExclusiveLock);   <---- crashes here
    dsc = RelationGetDescr(mycatalog);
    ScanKeyInit(&skey[0],
                Anum_pg_global_prepared_stmt_owner,
                BTEqualStrategyNumber, F_OIDEQ,
                GetUserId());
    ScanKeyInit(&skey[1],
                Anum_pg_global_prepared_stmt_name,
                BTEqualStrategyNumber, F_NAMEEQ,
                NameGetDatum(&stmt_name_data));
    scan = systable_beginscan(mycatalog, GlobalPreparedIndexId, true,
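[Aside: setting the crash itself apart, the lookup order the patch gives FetchPreparedStatement — session cache first, then pg_global_prepared as a template to recreate the statement in this session — can be modeled in a few lines of Python. Names and the stand-in "plan" object are invented for illustration only:]

```python
# Toy model of the patched FetchPreparedStatement lookup order.
# (Invented names; a real backend runs PREPARE instead of building a tuple.)

session_cache = {}                       # statement name -> prepared plan
pg_global_prepared = {                   # (owner oid, name) -> PREPARE sql
    (10, "test_plan_pk"):
        "PREPARE test_plan_pk (INTEGER) AS SELECT * FROM test WHERE id = $1;",
}

def fetch_prepared_statement(user, name):
    # 1. found in session cache: use it, as usual
    if name in session_cache:
        return session_cache[name]
    # 2. not found: search pg_global_prepared for a template (same user)
    sql = pg_global_prepared.get((user, name))
    if sql is None:
        raise KeyError("prepared statement %r does not exist" % name)
    # 3. found a template: PREPARE it, store in session cache, return it
    plan = ("plan-for", sql)             # stand-in for an actual plan
    session_cache[name] = plan
    return plan

plan = fetch_prepared_statement(10, "test_plan_pk")   # prepared from template
assert fetch_prepared_statement(10, "test_plan_pk") is plan  # now a cache hit
```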
Re: [HACKERS] Worries about delayed-commit semantics
On Fri, 22 Jun 2007 16:43:00 +0200, Bruce Momjian <[EMAIL PROTECTED]> wrote:

> Simon Riggs wrote:
>> On Fri, 2007-06-22 at 14:29 +0100, Gregory Stark wrote:
>>> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
>>>> Tom Lane wrote:
>>>>> untrustworthy disk hardware, for instance. I'd much rather use names
>>>>> derived from "deferred commit" or "delayed commit" or some such.
>>>> Honestly, I prefer these names as well as it seems directly related
>>>> versus transaction guarantee which sounds to be more like us saying,
>>>> if we turn it off our transactions are bogus.
>> That was the intention..., but name change accepted.
>>> Hm, another possibility: "synchronous_commit = off"
>> Ooo, I like that. Any other takers?
> Yea, I like that too but I am now realizing that we are not really
> deferring or delaying the "COMMIT" command but rather the recovery of
> the commit. GUC as full_commit_recovery?

commit_waits_for_fsync =
    force_yes : makes all commits "hard"
    yes       : commits are "hard" unless specified otherwise [default]
    no        : commits are "soft" unless specified otherwise [should replace fsync=off use case]
    force_no  : makes all commits "soft" (controller with write cache "emulator")

The force_yes and force_no are for benchmarking purposes mostly, ie. once your app is tuned to specify which commits have to be guaranteed ("hard") and which don't ("soft"), you can then bench it with force_yes and force_no to see how much you gained, and how much you'd gain by buying a write cache controller... ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Worries about delayed-commit semantics
So now we're poking a hole in that but we certainly have to ensure that any transactions that do see the results of our deferred commit themselves don't record any visible effects until both their commit and ours hit WAL. The essential point in Simon's approach that guarantees that is that when you fsync you fsync all work that came before you. So committing a transaction also commits all deferred commits that you might depend on.

BTW: I really dislike the name "transaction guarantee" for the feature; it sounds like marketing-speak, not to mention overpromising what we can deliver. Postgres can't "guarantee" anything in the face of untrustworthy disk hardware, for instance. I'd much rather use names derived from "deferred commit" or "delayed commit" or some such.

Well, from an implementation point of view we're delaying or deferring the commit. But from a user's point of view the important thing for them to realize is that a committed record could be lost. Perhaps we should just not come up with a new name and reuse the fsync variable. That way users of old installs which have fsync=off silently get this new behaviour. I'm not sure I like that idea since I use fsync=off to run cpu overhead tests here. But from a user's point of view it's probably the "right" thing. This is really what fsync=off should always have been doing.

Say you call them SOFT COMMIT and HARD COMMIT... HARD COMMIT fsyncs, obviously. Does SOFT COMMIT fflush() the WAL (so it's postgres-crash-safe) or not ? (just in case some user C function misbehaves and crashes) Do we get a config param to set default_commit_mode=hard or soft ?

By the way, InnoDB has a similar option where you set innodb_flush_log_on_commit (or something). However you cannot set it on a per-transaction basis. So, on an e-commerce site, for instance, most transactions will be "unimportant" (ie.
no need to fsync, ACI only, like incrementing products view counts, add to cart, etc) but some transactions will have to be guaranteed (full ACID) like recording that an order has been submitted / paid / shipped. But with InnoDB you can't choose this on a per-transaction basis, so it's all or nothing. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
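[The key property from the discussion above — an fsync flushes all WAL that precedes it, so one hard commit also makes every earlier soft (deferred) commit durable — can be sketched with a minimal Python model. Everything here is invented for illustration; a real WAL tracks byte positions (LSNs), not record counts:]

```python
# Minimal model: soft commits append to the WAL without flushing;
# a hard commit's fsync flushes the whole WAL tail, covering every
# soft commit that was appended before it.

class WAL:
    def __init__(self):
        self.records = []        # committed xids, in append order
        self.flushed_upto = 0    # count of records known durable on disk

    def commit(self, xid, hard):
        self.records.append(xid)
        if hard:
            # fsync: everything appended so far becomes durable at once
            self.flushed_upto = len(self.records)

    def durable(self, xid):
        return xid in self.records[:self.flushed_upto]

wal = WAL()
wal.commit("t1", hard=False)     # soft: acknowledged, not yet durable
wal.commit("t2", hard=False)
assert not wal.durable("t1")
wal.commit("t3", hard=True)      # hard: its fsync also covers t1 and t2
assert wal.durable("t1") and wal.durable("t2") and wal.durable("t3")
```

This is exactly why a transaction that saw the results of a deferred commit cannot become durable without the deferred commit becoming durable too.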
Re: [HACKERS] Load Distributed Checkpoints test results
On Fri, 15 Jun 2007 22:28:34 +0200, Gregory Maxwell <[EMAIL PROTECTED]> wrote: On 6/15/07, Gregory Stark <[EMAIL PROTECTED]> wrote: While in theory spreading out the writes could have a detrimental effect I think we should wait until we see actual numbers. I have a pretty strong suspicion that the effect would be pretty minimal. We're still doing the same amount of i/o total, just with a slightly less chance for the elevator algorithm to optimize the pattern. ..and the sort patching suggests that the OS's elevator isn't doing a great job for large flushes in any case. I wouldn't be shocked to see load distributed checkpoints cause an unconditional improvement since they may do better at avoiding the huge burst behavior that is overrunning the OS elevator in any case. ...also consider that if someone uses RAID5, sorting the writes may produce more full-stripe writes, which don't need the read-then-write RAID5 performance killer... ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Controlling Load Distributed Checkpoints
> If we extended relations by more than 8k at a time, we would know a lot
> more about disk layout, at least on filesystems with a decent amount of
> free space.

I doubt it makes that much difference. If there was a significant amount of fragmentation, we'd hear more complaints about seq scan performance. The issue here is that we don't know which relations are on which drives and controllers, how they're striped, mirrored etc.

Actually, isn't pre-allocation one of the tricks that Greenplum uses to get its seqscan performance?

My tests here show that, at least on reiserfs, after a few hours of benchmark torture (this represents several million write queries), table files become significantly fragmented. I believe the table and index files get extended more or less simultaneously and end up somehow a bit mixed up on disk. Seq scan perf suffers. reiserfs doesn't have excellent fragmentation behaviour... NTFS is worse than hell in this respect. So, pre-alloc could be a good idea. Brutal defrag (cp /var/lib/postgresql to somewhere and back) gets seq scan perf back to disk throughput.

Also, by the way, InnoDB uses a BTree-organized table. The advantage is that data is always clustered on the primary key (which means you have to use something as your primary key that isn't necessarily "natural"; you have to choose it to get good clustering, and you can't always do it right, so it somehow, in the end, sucks rather badly). Anyway, seq scan on InnoDB is very slow because, as the btree grows (just like postgres indexes), pages are split and scanning the pages in btree order becomes a mess of seeks. So, seq scan in InnoDB is very very slow unless periodic OPTIMIZE TABLE is applied. (caveat to the postgres TODO item "implement automatic table clustering"...) ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] vacuum, performance, and MVCC
My idea is that if an UPDATE places the new tuple on the same page as the old tuple, it will not create new index entries for any indexes where the key doesn't change.

Basically the idea behind preventing index bloat by updates is to have one index tuple point to several actual tuples having the same value. So:

Index entry -> list of tuples having the same value -> actual tuples
(-> represents an indirection)

I proposed to put the list of tuples in the index; you propose to put it in data pages. I think both solutions have pros and cons:

* List of tuples in the index:
  + reduces index size, makes cacheability in RAM more likely
  + speeds up index scans
  - complexity
  - slows down modifications to the index (a bit)

* List of tuples in the page:
  + simpler to implement
  + reduces index size, but less so than the previous solution
  - useless if UPDATE puts the new tuple on a different page

I guess the best solution would be a mix of both.

Also, I insist (again) that there is a lot to gain by using a bit of compression on the data pages, even if it's very simple compression like storing the new version of a row as a difference from the previous version (ie. only store the columns that changed). I think DB2 stores the latest version entirely, and stores the previous versions as deltas. This is more efficient.

In the case of tables containing TEXT values, these could also get TOASTed. When an update does not modify the TOASTed columns, it would be nice to simply be able to keep the reference to the TOASTed data instead of decompressing it and recompressing it. Or is it already the case ? ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] vacuum, performance, and MVCC, and compression
What about increasing the size of an existing index entry? Can that be done easily when a new row is added?

I'd say it looks pretty much like inserting a new index tuple... Say "value" is the indexed column. Find the first page in the index featuring "value".

1. If there is space on the page, add the tuple id to the list of the corresponding index entry (just like creating a new index tuple, but it uses less space);
   else look at the next page. If the next page has an index tuple with the same indexed value, goto 1; else insert a new page and create an index tuple on it.

I would be worried about the overhead of doing that on compression and decompression.

The compression methods mentioned in the article which was passed on the list seemed pretty fast. From IO-limited, the test database became CPU-limited (and a lot faster). ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
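[For illustration, a toy Python model of the insertion rule above. The structure is entirely invented (dict-based pages, a tiny capacity); a real index page holds many more entries and stores tuple ids compactly:]

```python
# Toy model: index pages map indexed value -> list of tuple ids.
# A tuple id is appended to an existing entry when the page has room;
# otherwise we try the next page with the same value, or start a new page.

PAGE_CAPACITY = 4  # max tuple ids per page, kept tiny for illustration

def index_insert(pages, value, tid):
    for page in pages:
        if value in page:
            if sum(len(tids) for tids in page.values()) < PAGE_CAPACITY:
                page[value].append(tid)   # room on this page: extend entry
                return
            # page full: fall through to the next page with the same value
    pages.append({value: [tid]})          # no room anywhere: new page

pages = []
for tid in range(6):
    index_insert(pages, "apple", tid)
# 6 tuple ids with a capacity of 4 per page -> entries spill to a second page
assert pages[0]["apple"] == [0, 1, 2, 3]
assert pages[1]["apple"] == [4, 5]
```

Note how six tuples with the same indexed value need only two index entries instead of six — this is where the index-size reduction comes from.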
Re: [HACKERS] vacuum, performance, and MVCC, and compression
There were some talks lately about compression. With a bit of lateral thinking I guess this can be used to contain the bloat induced by updates. Of course this is just my hypothesis. Compression in indexes : Instead of storing (value, tuple identifier) keys in the indexes, store (value, [tuple identifier list]) ; ie. all tuples which have the same indexed value are referenced by the same index tuple, instead of having one index tuple per actual tuple. The length of the list would of course be limited to the space actually available on an index page ; if many rows have the same indexed value, several index tuples would be generated so that index tuples fit on index pages. This would make the index smaller (more likely to fit in RAM) at the cost of a little CPU overhead for index modifications, but would make the index scans actually use less CPU (no need to compare the indexed value on each table tuple). Compression in data pages : The article that circulated on the list suggested several types of compression, offset, dictionary, etc. The point is that several row versions on the same page can be compressed well because these versions probably have similar column values. Just a thought... ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] vacuum, performance, and MVCC
What I see in this discussion is a huge amount of "the grass must be greener on the other side" syndrome, and hardly any recognition that every technique has its downsides and complications.

Sure ;)

MVCC generates dead rows, by its very nature; however I see two trends in this:

1* A large transaction that updates/deletes many rows.
For instance, suppose you UPDATE an entire table whose size is larger than memory. Old row versions have to be kept somewhere until commit, be it in the table itself or in some accessory undo-log. So, there will be a lot of harddisk grinding anyway, be it MVCC or Oracle-style, or whatever. MVCC will bloat the table and indexes, then VACUUM will shrink them. Update-in-place systems will bloat an undo log. It seems to me the current MVCC+VACUUM is the right tool for this job, requiring about the same amount of IO as the others. Vacuum scans sequentially, so it's the best way to process large volumes of data.

2* Many short transactions update rows in a table.
Like the sessions problem, for instance. Current VACUUM sucks for this case, I guess that's known.

So, we have two different problems, and one tool which is adapted to one problem only. Should the tool (Vacuum) be fixed to handle both problems, making it more complex and difficult to maintain, or should another tool be created specifically for the second problem? Problem 2 is very different from problem 1. The only case where they meet is when there is a continuous stream of small updates running concurrently with a long transaction.

So, what is the ideal tool for case 2? We'd want a vacuuming machine that can be run very often, or even better, continuously. The table can be large, larger than the disk cache, so scanning it is not an option. The updates are probably randomly distributed in the table. Therefore, VACUUMing a long time after these transactions are committed and the pages are no longer in cache would require a lot of random seeks, which is also bad.
Besides, we want this vacuum to be continuous and transparent. The best time to vacuum pages is thus when they are still in the background writer's memory, or in the disk cache, waiting to be flushed to disk. There, they can be re-read, vacuumed and re-written with no seek penalty, only additional WAL traffic. However, the amount of WAL traffic in bytes/s is less important than the frequency of WAL syncs. Emitting more WAL data shouldn't be a problem if those sync writes are coalesced with the sync writes of current transactions.

So, I guess the best solution for case 2 is to have the background writer perform on-the-fly VACUUM:

An UPDATE or DELETE transaction hands over dirty pages to be written to the bgwriter. It also tells the bgwriter the ID of the current transaction, and flags specifying if the pages contain candidate dead rows. The bgwriter should have a sufficiently large buffer in order to be able to keep these pages in memory until all the transactions that can see the dead rows in these pages are finished. Then, the pages are vacuumed and written.

The key is the size of the buffer. It should be large enough to contain enough pages so that it is actually possible to vacuum something out of them before writing them. However, if the buffer capacity is exceeded (for instance, because there is a long running transaction), this is not a problem: the pages are simply written to disk normally, they will contain dead rows, which will need to be handled later by the standard VACUUM.

I think this would maximize code reuse by using the current bgwriter's architecture... did I miss something ? ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
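[A rough sketch of the bgwriter behaviour proposed above, in pure Python. The page representation and the "oldest active xid" visibility rule are invented simplifications — real visibility checks involve snapshots, not a single xid comparison:]

```python
# Sketch: the bgwriter buffers dirty pages along with the xid that
# dirtied them; when it flushes, a page whose dead rows are no longer
# visible to any active transaction is vacuumed in memory first.

def flush(buffer, oldest_active_xid, disk):
    """buffer: list of (page_xid, rows); each row is (value, dead_since_xid)
    where dead_since_xid is None for a live row."""
    for page_xid, rows in buffer:
        if page_xid < oldest_active_xid:
            # no live transaction can still see the dead rows:
            # vacuum while the page is in memory, no seek penalty
            rows = [(v, d) for v, d in rows if d is None]
        disk.append(rows)  # else write as-is; standard VACUUM handles it later
    buffer.clear()

disk = []
buffer = [
    (5, [("a", None), ("old-a", 5)]),   # dead row created by xid 5
    (9, [("b", None), ("old-b", 9)]),   # dead row created by xid 9
]
flush(buffer, oldest_active_xid=7, disk=disk)
assert disk[0] == [("a", None)]                  # vacuumed on the fly
assert disk[1] == [("b", None), ("old-b", 9)]    # possibly still visible
```

The overflow case in the text maps onto the `else` path: pages the buffer cannot hold long enough are simply written with their dead rows intact.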
Re: [HACKERS] vacuum, performance, and MVCC
Well, then please help me find a better design cause I can't see one... what we have here is a big "membership" table of email lists. When there's a sendout then the memberships of the affected group are heavily read/updated, otherwise they are idle. None of the memberships is archive data, they are all active data... the only problem is that they are so many. Is it so hard to believe that >100 million rows is all active data, but only used in bursts once per week (that's an example, some groups are more active, others less) ? I suppose you have a table memberships (user_id, group_id) or something like it ; it should have as few columns as possible ; then try regularly clustering on group_id (maybe once a week) so that all the records for a particular group are close together. Getting the members of a group to send them an email should be faster (less random seeks). For tables with very few small fields (like a few integers) the 26-something bytes row overhead is significant ; MySQL can be faster because MyISAM tables have no transaction support and thus have very little things to store besides actual row data, and the table can then fit in RAM... ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] vacuum, performance, and MVCC
What you seem not to grasp at this point is a large web-farm, about 10 or more servers running PHP, Java, ASP, or even perl. The database is usually the most convenient and, aside from the particular issue we are talking about, best suited. The answer is sticky sessions : each user is assigned to one and only one webserver in the cluster and his session is maintained locally, in RAM. No locks, no need to manage distributed session... I actually have a good number of years of experience in this topic, and memcached or file system files are NOT the best solutions for a server farm. If sessions are distributed, certainly, but if sessions are sticky to their own server ? ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] vacuum, performance, and MVCC
As you can see, in about a minute at high load, this very simple table lost about 10% of its performance, and I've seen worse based on update frequency. Before you say this is an obscure problem, I can tell you it isn't. I have worked with more than a few projects that had to switch away from PostgreSQL because of this behavior. Obviously this is not a problem with small sites, but this is a real problem with an enterprise level web site with millions of visitors and actions a day. Quite frankly it is a classic example of something that does not scale. The more and more updates there are, the higher the load becomes. You can see it on "top" as the footest program runs.

I believe sessions should not be stored in a SQL database. It makes no sense. Updates and Inserts to the database should only be done where there is an interesting thing to record, when the user does an action like posting to a forum, making a purchase, sending a message, etc.

I believe sessions should be stored in the memory of the application server, as native objects of whatever language the application is written in. This way, sessions incur no serializing overhead and can be quite large and complex, which allows storage of interesting things, like the result of a complicated search query which is to be later paginated, for instance. It really makes sense to use native language objects too, as these have a lot more power and versatility than a database row. Think about rights management, for instance.

When the framework used lacks this power (most do and this is sad), then sessions incur serializing overhead; but they should be serialized to filesystem files, or better, to memory using memcached, for instance. It makes no sense to pay the performance penalty of a COMMIT (disk seek delay etc) and the database overhead for sessions, which are by nature volatile data changing all the time. I don't think postgres should be tweaked to allow better handling of this.
It would only make a better foot-gun. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Rethinking stats communication mechanisms
Great minds think alike ;-) ... I just committed exactly that protocol. I believe it is correct, because AFAICS there are only four possible risk cases: Congrats ! For general culture you might be interested in reading this : http://en.wikipedia.org/wiki/Software_transactional_memory http://libcmt.sourceforge.net/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Rethinking stats communication mechanisms
It strikes me that we are using a single communication mechanism to handle what are really two distinct kinds of data:

Interesting. I recently read a paper on how to get rid of locks for this kind of pattern.

* For the Command String:
- Problem: need to display the currently executing command in the ps list.
- Will only be of use if the command is taking a long, long time. So, it need not be realtime; no problem if the data comes with a little delay, or not at all if the command executes quickly. People are only interested in the currently executing command to answer questions like "what query has my server grinding ?"

Point: the currently executing query is only interesting to display if it's currently executing. If it's not, it's in the log (if enabled).

So, the proposal:

Each backend has a shared memory area in which to store:
- the currently executing command (like in your proposal)
- a timestamp
- a counter

On executing a command, the backend stores the command string, then overwrites the counter with (counter + 1) and the timestamp with the timestamp of command start.

Periodically, like every N seconds, a separate process reads the counter, then reads the data, then reads the counter again. If the counter value changed, the process is repeated. If the counter value did not change, the command string did not change either, so it's valid, and can be used.

Other thoughts:

If the backend process itself should update its process title, and this operation is costly, it should only be done if the current query has been running for more than T seconds. However, syscalls for getting the current time are costly. A separate process can update a counter in shared memory with the current time every N seconds, and the backend can check it.

The main point is that if this value is written every few seconds but read often by only one process, or written often but read seldom, there will not be a lot of interprocessor cache thrashing on it.
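[The read protocol above is essentially a sequence counter ("seqlock"). A minimal sequential model in Python — no real shared memory or concurrency here, and all names are invented for illustration:]

```python
# Sequential model of the counter protocol: the reader retries until
# it observes the same counter value before and after reading the data.

class SharedSlot:
    def __init__(self):
        self.counter = 0
        self.command = ""

    def write(self, command):            # backend side
        self.command = command
        self.counter += 1                # bump once the data is in place

    def read(self):                      # stats-reader side
        while True:
            before = self.counter
            data = self.command
            if self.counter == before:   # unchanged: data is consistent
                return data
            # counter moved: a write raced us, retry

slot = SharedSlot()
slot.write("SELECT 1")
assert slot.read() == "SELECT 1"
```

A real shared-memory implementation would additionally need memory barriers, and would typically bump the counter both before and after the write so a reader can detect an in-progress update, not just a completed one.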
---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] CSV mode option for pg_dump
From what I gather, the CSV format dump would only contain data.

I think pg_dump is the friend of pg_restore. It dumps everything including user defined functions, types, schemas etc. CSV does not fit with this. Besides, people will probably want to dump into CSV the result of any query, to load it into excel, not just the full contents of a table.

So, why not create a separate tool; someone suggested pg_query for that, I second it. This tool would take a query and format options, and would output a file in whatever format chosen by the user (CSV, COPY format, xml, whatever).

A script language (python) can be used, which will significantly shorten development times and allow easy modularity, as it is easier to add a module to a python program than to a C program. I would vote for Python because I love it and it has a very good postgres adapter (psycopg2) which knows how to convert every postgres type to a native language type (yes, even multidimensional arrays of BOX get converted). And it's really fast at retrieving large volumes of data.

So you have a stable, fast tool for backup and restore (pg_dump) and a rapidly evolving, user-friendly and extendable tool for exporting data, and everyone is happy.

Mr Momjian talks about adding modular functionality to pg_dump. Is it really necessary? What is the objective? Is it to reuse code in pg_dump? I guess not; if a user wants to dump, for instance, all the tables in a schema, implementing this logic in python is only a few lines of code (select from information_schema...).

To be realistic, output format modules should be written in script languages. No one sane is eager to do string manipulation in C. Thus these modules would have to somehow fit with pg_dump, maybe with a pipe or something. This means designing another protocol. Reimplementing in a scripting language the parts of pg_dump which will be reused by this project (mainly, enumerating tables and stuff) will be far easier. Just look.
Python 2.4.2 (#1, Mar 30 2006, 14:34:35)
[GCC 3.4.4 (Gentoo 3.4.4-r1, ssp-3.4.4-1.0, pie-8.7.8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
...opens a db connection...
>>> c.execute( "SELECT * FROM test.csv" )
>>> data = c.fetchall()
>>> data
[[1, datetime.date(2006, 6, 13), 'this\tcontains\ttabulations'],
 [2, datetime.date(2006, 6, 13), "this'contains'quotes"],
 [3, datetime.date(2006, 6, 13), 'this"contains"double quotes']]
>>> import csv, sys
>>> c = csv.writer( sys.stdout, dialect = csv.excel )
>>> c.writerows( data )
1,2006-06-13,this	contains	tabulations
2,2006-06-13,this'contains'quotes
3,2006-06-13,"this""contains""double quotes"

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[HACKERS] Faster Updates
Hello,

Sometimes people complain that UPDATE is slow in postgres. UPDATE...
- generates dead tuples which must be vacuumed.
- needs to hit all indexes even if only one column was modified.

From what I know, UPDATE creates a new copy of the old row with the relevant C/TIDs, then indexes it. On COMMIT the old version becomes dead but stays in the table and indexes until VACUUM.

I propose a simple idea, which may be idiotic, but who knows.

When a row is UPDATEd, instead of storing a new copy of the entire row, only a differential is stored. The old row stays in the page anyway, so we might as well only store the binary encoded equivalent of "Use the row version number X and change column A to value Y". This is possible only if the differential fits in the free space on the page. In this case, a lot less dead space is generated. VACUUM would consolidate the differentials for committed transactions into a new base value for this row.

While reading the page looking for a specific version of a row, all differences would need to be consolidated. This adds overhead, but it might be a win.

With this method, it could be possible to avoid updating the indexes for unmodified columns. This is a big win.

What do you think ? ---(end of broadcast)--- TIP 6: explain analyze is your friend
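[A toy Python model of the differential idea above. The representation is entirely invented — the real thing would be binary-encoded in the page's free space — but it shows how a reader reconstructs a row version by replaying diffs on the base row:]

```python
# Toy model: a page stores a base row plus a list of differentials
# ("start from version X, set these columns"); reading version N
# replays the first N diffs onto the base.

def update(page, base_version, changes):
    # an UPDATE appends only the changed columns, not a full row copy
    page["diffs"].append((base_version, changes))

def read(page, version):
    row = dict(page["base"])
    for _base, changes in page["diffs"][:version]:
        row.update(changes)              # consolidate differences
    return row

page = {"base": {"id": 1, "a": "old", "b": 10}, "diffs": []}
update(page, 0, {"a": "new"})            # only column a changed
update(page, 1, {"b": 20})               # only column b changed
assert read(page, 0) == {"id": 1, "a": "old", "b": 10}
assert read(page, 2) == {"id": 1, "a": "new", "b": 20}
```

The VACUUM consolidation step in the text would amount to folding the committed diffs into `page["base"]` and emptying `page["diffs"]`, and the `id` column never changing is what lets the index entry be left alone.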
Re: [HACKERS] COPY (query) TO file
MySQL already does this for INSERT : INSERT INTO x (a,b) VALUES (1,2), (3,4), (5,6)...; Does MySQL really let you stream that? Trying to do syntax like that in Postgres wouldn't work because the parser would try to build up a parse tree for the whole statement before running the command. Hehe, I don't know, but I suppose it's parsed in one-shot then executed, and not streamed, because : - you can append modifiers at the end of the statement (IGNORE...), - mysql barfs if the complete SQL including data is larger than the query buffer specified in the config file. The second point leads to an interesting fact, ie. dumps generated by phpmyadmin and mysqldump need a parameter specifying how long, in bytes, the insert commands can be ; so that hopefully they can be reloaded later. If one of the inserted values violates a "constraint", it is substituted by "some other default value". Still, it's useful ; and one interesting part is that everything happens in the same SQL command (wrt concurrency). ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] COPY (query) TO file
I was also vaguely pondering whether all the DDL commands could be generalized to receive or send COPY formatted data for repeated execution. It would be neat to be able to prepare an UPDATE with placeholders and stream data in COPY format as parameters to the UPDATE to execute it thousands or millions of times without any protocol overhead or network pipeline stalls. MySQL already does this for INSERT : INSERT INTO x (a,b) VALUES (1,2), (3,4), (5,6)...; allowing arbitrary SELECT statements as a COPY source seems much more powerful and flexible than just supporting COPY FROM VIEW. MySQL already does this : SELECT INTO OUTFILE blah FROM table... Now in both cases the MySQL syntax sucks but it's still quite practical, and the INSERT saves some overhead (parsing, acquiring locks...) and is quite a bit faster than regular INSERT. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Generalized concept of modules
Think about version API compatibility. Suppose you have a working database on server A which uses module foo version 1. Some time passes, you buy another server B and install postgres on it. Meanwhile the module foo has evolved into version 2 which is cooler, but has some minor API incompatibilities. You dump the database on server A and reload it on server B. pg_dump issues an INSTALL MODULE which installs foo version 2 on the new server. Due to the "minor API incompatibilities", your database breaks. It's really cool not to pollute the dumps (and the global namespace...) with all the module functions, however implementing module functionality can be tricky. So don't forget about versions and possible incompatibilities ; also versions means you might need an UPGRADE MODULE which does more than uninstall + reinstall. Suppose a module has created some tables for its use, these shouldn't be dumped when upgrading to a new version ; however maybe the new version will want to add a column... Think gentoo portage, for instance. This excellent package system is a lot more evolved than the module system needs to be, but having a look at the feature list would be a good inspiration maybe. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] pg_proc probin misuse
Hm, thinking again, I guess Tom Lane is right:

> Surely the initialization code would have to be run anyway ... and if the
> function does import a pile of modules, do you really want to cache all
> that in its pg_proc entry? What happens if some of the modules get
> updated later?

Besides, what happens if you store compiled bytecode in a table, then upgrade the python interpreter to a new version... would it be compatible ? I suppose so, but I don't really know... Persistent connections should be used anyway; this makes the RAM caching good... ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] pg_proc probin misuse
If it were really expensive to derive bytecode from source text then maybe it'd make sense to do what you're doing, but surely that's not all that expensive. Everyone else manages to parse prosrc on the fly and cache the result in memory; why isn't plpython doing that? It depends on the number of imported modules in the function. If it imports a lot of modules, it can take some time to compile a python function (especially if the modules have some initialisation code which must be run on import). ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal
> 0.450 ms INSERT INTO tmp SELECT * FROM bookmarks ORDER BY annonce_id DESC LIMIT 20
> 0.443 ms ANALYZE tmp
> 0.365 ms SELECT * FROM tmp
> 0.310 ms DROP TABLE tmp
> 32.918 ms COMMIT

The 32 milliseconds for commit can hardly be catalog related. It seems the file is fsynced before it is dropped.

I'd hope that wasn't what's happening... is the backend smart enough to know not to fsync anything involved with the temp table? ISTM that that transaction shouldn't actually be creating any WAL traffic at all. Though on the other hand there's no reason that DROP should be in the transaction at all; maybe that's gumming things up during the commit.

I included the DROP to make it clear that the time was spent in COMMITting, not in DROPping the table. Also, you can't use CREATE TEMP TABLE AS SELECT ... and at the same time make it ON COMMIT DROP. You have to CREATE and INSERT. With an ON COMMIT DROP temp table, the global timings are the same whether or not it is dropped before commit: it is always the COMMIT which takes all the milliseconds. I still bet on system catalog updates being the main cause of the time spent in COMMIT... (because ANALYZE changes this time) ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal
> On Tue, May 09, 2006 at 06:29:31PM +0200, PFC wrote:
> > You mean the cursors' storage is in fact the same internal machinery as a temporary table ?
> Use the source, Luke...

LOL, yeah, I should have, sorry.

> See tuplestore_begin_heap in backend/utils/sort/tuplestore.c and heap_create_with_catalog in backend/catalog/heap.c. You'll find that creating a tuplestore is far easier than creating a temp table.

I had used intuition (instead of the source) to come to the same conclusion regarding the level of complexity of these two... But I'll look at the source ;)

> Perhaps it would be worth creating a class of temporary tables that used a tuplestore, although that would greatly limit what could be done with that temp table.

Just selecting from it I guess, but that's all that's needed. Anything more would duplicate the functionality of a temp table. I find cursors awkward. The application can FETCH from them, but postgres itself can't do it in SQL, unless using FOR.. IN in plpgsql... It would be a powerful addition to be able to split queries, factor out common parts between multiple queries, etc, using this system ; it can even be used to execute an inner part of a query, then plan the rest according to the results and execute it... without the overhead of a temp table.
Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal
> Have you tried getting a profile of what exactly PostgreSQL is doing that takes so long when creating a temp table?

Nope, I'm not proficient in the use of these tools (I stopped using C some time ago).

> BTW, I suspect catalogs might be the answer,

Probably, because :
- Temp tables don't use fsync (I hope)
- Catalogs do
- fsync=off makes COMMIT fast
- fsync=on makes COMMIT slow
- fsync=on and using ANALYZE makes COMMIT slower (more updates to the catalogs I guess)

> which is why Oracle has you define a temp table once (which does all the work of putting it in the catalog) and then you just use it accordingly in each individual session.

Interesting (except for the ANALYZE bit...)
Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal
> Speaking of which, if a temp table is defined as ON COMMIT DROP or DELETE ROWS, there shouldn't be any need to store xmin/xmax, only cmin/cmax, correct?

Yes, that's the type of table I was thinking about... You can't ROLLBACK a transaction on such a table. You can however rollback a savepoint and use "INSERT INTO tmp SELECT FROM tmp", which implies MVCC (I think ?)

I was suggesting to be able to use FETCH (from a cursor) in the same way as SELECT, effectively using a named cursor (DECLARE...) as a simpler, faster version of a temporary table ; but there is another (better ?) option : if rowcount estimates for functions are implemented, then a set-returning function can be written, which takes as argument a named cursor, and returns its rows. It would have accurate rowcount estimation (if the cursor is WITH SCROLL, which is the case here, rows are stored, so we know their number). Then you could do :

DECLARE my_cursor ... AS (query that we only want to do once)
SELECT ... FROM table1 JOIN fetch_cursor( my_cursor ) ON ...
SELECT ... FROM table2 JOIN fetch_cursor( my_cursor ) ON ...
SELECT ... FROM table3 JOIN fetch_cursor( my_cursor ) ON ...

No need to redefine the FETCH keyword. An interesting functionality with minimal hassle.
Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal
The problem is that you need a set-returning function to retrieve the values. SRFs don't have rowcount estimates, so the plans suck.

What about adding some way of rowcount estimation to SRFs, in the way of :

CREATE FUNCTION foo (para, meters) RETURNS SETOF bar AS
$$ ... function code ... $$ LANGUAGE plpgsql
ROWCOUNT_ESTIMATOR
$$ ... estimation code ... $$ ;

Internally, this could create two functions, foo (para, meters) and estimate_foo(para, meters), that are the same language and coupled together (just like a SERIAL column and its sequence). The estimator functions have an implicit return parameter of int8. Parameters may be NULL when they are not known at query planning time. What do you think about this idea?

It would be very useful. A few thoughts... You need to do some processing to know how many rows the function would return. Often, this processing will be repeated in the function itself. Sometimes it's very simple (i.e. the function will RETURN NEXT each element in an array, you know the array length...). Sometimes, for functions returning few rows, it might be faster to compute the entire result set in the cost estimator. So, it might be a bit hairy to find a good compromise.

Ideas on how to do this (clueless hand-waving mode) :

1- Add new attributes to set-returning functions ; basically a list of functions, each returning an estimation parameter (rowcount, cpu tuple cost, etc). This is just like you said.

2- Add an "estimator" to a function, which would just be another function, returning one row, a record, containing the estimations in several columns (rowcount, cpu tuple cost, etc).
Pros : only one function call to estimate, easier and faster ; the estimator just leaves the unknown columns NULL. The estimator need not be in the same language as the function itself. It's just another function.
3- The estimator could be a set-returning function itself, which would return rows mimicking pg_statistics.
Pros : planner-friendly ; the planner would SELECT from the SRF instead of looking in pg_statistics, and the estimator could tell the planner that, for instance, the function will return unique values.
Cons : complex, maybe slow

4- Add simple flags to a function, like :
- returns unique values
- returns sorted values (no need to sort my results)
- please execute me and store my results in a temporary storage, count the rows returned, and plan the outer query accordingly
- etc.
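Option 2 above can be sketched outside the backend in ordinary Python, to show how the coupling would work. Everything here (the names, the registry dict, the fallback default) is hypothetical illustration, not backend code or a proposed pg_proc layout :

```python
# Sketch of option 2: each set-returning function is paired with an
# estimator that returns whatever statistics it can figure out, leaving
# the unknown fields None (like NULL columns).

def unnest_array(arr):
    """A set-returning function: yields each element of an array."""
    for x in arr:
        yield x

def estimate_unnest_array(arr):
    """Estimator: one call, one record of estimates."""
    return {"rowcount": len(arr) if arr is not None else None,
            "cpu_tuple_cost": None}

# Planner-side lookup: function -> its estimator, coupled together
# (just like a SERIAL column and its sequence).
ESTIMATORS = {unnest_array: estimate_unnest_array}

def planned_rowcount(func, *args, default=1000):
    """What the planner would do: ask the estimator if there is one,
    fall back to a fixed default (today's SRF behaviour) otherwise."""
    est = ESTIMATORS.get(func)
    if est is None:
        return default
    rc = est(*args)["rowcount"]
    return rc if rc is not None else default
```

With this, `planned_rowcount(unnest_array, [1, 2, 3])` yields an exact 3 instead of a blind default, which is the whole point of the proposal.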
Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal
> SELECT * FROM somewhere WHERE id IN (SELECT id FROM result)

Well, you can equivalently write :

SELECT * FROM somewhere JOIN (SELECT id FROM result GROUP BY id) AS a USING (id);

It's the same thing (and postgres knows it).

> You might want to use PL to store values, say PLperl, or even C, say:

I tried. The problem is that you need a set-returning function to retrieve the values. SRFs don't have rowcount estimates, so the plans suck.

> Should work faster than an in-application solution :)

Should, but doesn't, because of what I said above... With the version in CVS tip, supporting a fast =ANY( array ), this should be doable, though.
Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal
Does the time for commit change much if you leave out the analyze?

Yes, when I don't ANALYZE the temp table, commit time changes from 30 ms to about 15 ms ; but the queries get horrible plans (see below). Fun thing is, the rowcount from a temp table (which is the problem here) should be available without ANALYZE ; as the temp table is not concurrent, it would be simple to inc/decrement a counter on INSERT/DELETE... I like the temp table approach : it can replace a large, complex query with a batch of smaller and easier to optimize queries...

EXPLAIN ANALYZE SELECT a.* FROM tmp t, annonces_display a WHERE a.id=t.id ORDER BY t.sort;

 Sort (cost=3689.88..3693.15 rows=1310 width=940) (actual time=62.327..62.332 rows=85 loops=1)
   Sort Key: t.sort
   ->  Merge Join (cost=90.93..3622.05 rows=1310 width=940) (actual time=5.595..61.373 rows=85 loops=1)
         Merge Cond: ("outer".id = "inner".id)
         ->  Index Scan using annonces_pkey on annonces (cost=0.00..3451.39 rows=10933 width=932) (actual time=0.012..6.620 rows=10916 loops=1)
         ->  Sort (cost=90.93..94.20 rows=1310 width=12) (actual time=0.098..0.105 rows=85 loops=1)
               Sort Key: t.id
               ->  Seq Scan on tmp t (cost=0.00..23.10 rows=1310 width=12) (actual time=0.004..0.037 rows=85 loops=1)
 Total runtime: 62.593 ms

EXPLAIN ANALYZE SELECT * FROM contacts WHERE id IN (SELECT contact_id FROM tmp);

 Hash Join (cost=28.88..427.82 rows=200 width=336) (actual time=0.156..5.019 rows=45 loops=1)
   Hash Cond: ("outer".id = "inner".contact_id)
   ->  Seq Scan on contacts (cost=0.00..349.96 rows=9396 width=336) (actual time=0.009..3.373 rows=9396 loops=1)
   ->  Hash (cost=28.38..28.38 rows=200 width=4) (actual time=0.082..0.082 rows=46 loops=1)
         ->  HashAggregate (cost=26.38..28.38 rows=200 width=4) (actual time=0.053..0.064 rows=46 loops=1)
               ->  Seq Scan on tmp (cost=0.00..23.10 rows=1310 width=4) (actual time=0.001..0.015 rows=85 loops=1)
 Total runtime: 5.092 ms

ANALYZE tmp;

annonces=> EXPLAIN ANALYZE SELECT a.* FROM tmp t, annonces_display a WHERE a.id=t.id ORDER BY t.sort;

 Sort (cost=508.63..508.84 rows=85 width=940) (actual time=1.830..1.832 rows=85 loops=1)
   Sort Key: t.sort
   ->  Nested Loop (cost=0.00..505.91 rows=85 width=940) (actual time=0.040..1.188 rows=85 loops=1)
         ->  Seq Scan on tmp t (cost=0.00..1.85 rows=85 width=12) (actual time=0.003..0.029 rows=85 loops=1)
         ->  Index Scan using annonces_pkey on annonces (cost=0.00..5.89 rows=1 width=932) (actual time=0.003..0.004 rows=1 loops=85)
               Index Cond: (annonces.id = "outer".id)
 Total runtime: 2.053 ms
(7 lines)

annonces=> EXPLAIN ANALYZE SELECT * FROM contacts WHERE id IN (SELECT contact_id FROM tmp);

 Nested Loop (cost=2.06..139.98 rows=36 width=336) (actual time=0.072..0.274 rows=45 loops=1)
   ->  HashAggregate (cost=2.06..2.51 rows=45 width=4) (actual time=0.052..0.065 rows=46 loops=1)
         ->  Seq Scan on tmp (cost=0.00..1.85 rows=85 width=4) (actual time=0.003..0.016 rows=85 loops=1)
   ->  Index Scan using contacts_pkey on contacts (cost=0.00..3.04 rows=1 width=336) (actual time=0.003..0.004 rows=1 loops=46)
         Index Cond: (contacts.id = "outer".contact_id)
 Total runtime: 0.341 ms
Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal
> Creating cursors for a simple plan like a single sequential scan is fast because it's using the original data from the table.

I used the following query :

SELECT * FROM bookmarks ORDER BY annonce_id DESC LIMIT 20

It's a backward index scan + limit... not a seq scan. And it's damn fast :

0.042 ms BEGIN
0.153 ms DECLARE tmp SCROLL CURSOR WITHOUT HOLD FOR SELECT * FROM bookmarks ORDER BY annonce_id DESC LIMIT 20
0.246 ms FETCH ALL FROM tmp
0.048 ms MOVE FIRST IN tmp
0.246 ms FETCH ALL FROM tmp
0.048 ms CLOSE tmp
0.084 ms COMMIT

> But your example was predicated on this part of the job being a complex query. If it's a complex query involving joins and groupings, etc, then it will have to be materialized and there's no (good) reason for that to be any faster than a temporary table which is effectively the same thing.

You mean the cursors' storage is in fact the same internal machinery as a temporary table ? In that case, this raises an interesting question : why is the cursor faster ?

Let's try a real-life example from my website : it is a search query (quite complex) which is then joined to a lot of tables to resolve FKeys. To that query I must add an application-made join using a big IN() clause extracted from the data. Timings include the time to fetch the results into Python. The "running total" column is the sum of all timings since the BEGIN.

query_time  running_total  rows  query
 0.061 ms    0.061 ms       -1   BEGIN
23.420 ms   23.481 ms       85   SELECT * FROM (huge query with a lot of joins)
 4.318 ms   27.799 ms        2   SELECT l.*, u.login, u.bg_color FROM annonces_log l, users u WHERE u.id=l.user_id AND l.annonce_id IN (list of ids from previous query) ORDER BY annonce_id, added
 0.241 ms   28.040 ms       -1   COMMIT

(Just in case you want to hurt yourself, here's the EXPLAIN ANALYZE output : http://peufeu.com/temp/big_explain.txt)

Using a cursor takes about the same time. Also, doing just the search query takes about 12 ms ; the joins take up the rest.
Now, I'll rewrite my query eliminating the joins and using a temp table. Storing the whole result in the temp table would be too slow, because there are too many columns. Therefore I will only store the primary and foreign key columns, and join again to the main table to get the full records.

query_time  running_total  rows  query
 0.141 ms    0.141 ms       -1   BEGIN

Do the search :
 8.229 ms    8.370 ms       -1   CREATE TEMPORARY TABLE tmp AS SELECT id, city_id, zipcode, contact_id, contact_group_id, price/terrain as sort FROM (stripped down search query)
 0.918 ms    9.287 ms       -1   ANALYZE tmp

Fetch the main data to display :
 7.663 ms   16.951 ms       85   SELECT a.* FROM tmp t, annonces_display a WHERE a.id=t.id ORDER BY t.sort

Fetch log entries associated with each row (one row to many log entries) :
 1.021 ms   17.972 ms        2   SELECT l.*, u.login, u.bg_color FROM annonces_log l, users u, tmp t WHERE u.id=l.user_id AND l.annonce_id = t.id ORDER BY annonce_id, added
 3.468 ms   21.440 ms      216   SELECT annonce_id, array_accum(list_id) AS list_ids, array_accum(COALESCE(user_id,0)) AS list_added_by, max(added) AS added_to_list FROM bookmarks GROUP BY annonce_id

Resolve foreign key relations :
 1.034 ms   22.474 ms       37   SELECT r.annonce_id FROM read_annonces r, tmp t WHERE r.annonce_id = t.id
 0.592 ms   23.066 ms        9   SELECT * FROM cities_dist_zipcode WHERE zipcode IN (SELECT zipcode FROM tmp)
 0.716 ms   23.782 ms       11   SELECT * FROM cities_dist WHERE id IN (SELECT city_id FROM tmp)
 1.125 ms   24.907 ms       45   SELECT * FROM contacts WHERE id IN (SELECT contact_id FROM tmp)
 0.799 ms   25.705 ms       42   SELECT * FROM contact_groups WHERE id IN (SELECT contact_group_id FROM tmp)
 0.463 ms   26.169 ms       -1   DROP TABLE tmp
32.208 ms   58.377 ms       -1   COMMIT

From this we see : using a temporary table is FASTER than doing the large query with all the joins (26 ms versus 28 ms). It's also nicer and cleaner. However the COMMIT takes as much time as all the queries together !
Let's run with fsync=off :

query_time  running_total  rows  query
 0.109 ms    0.109 ms       -1   BEGIN
 8.321 ms    8.430 ms       -1   CREATE TEMPORARY TABLE tmp AS SELECT id, city_id, zipcode, contact_id, contact_group_id, price/terrain as sort FROM (stripped down search query)
 0.849 ms    9.280 ms       -1   ANALYZE tmp
 7.360 ms   16.640 ms       85   SELECT a.* FROM tmp t, annonces_display a WHERE a.id=t.id ORDER BY t.sort
 1.067 ms   17.707 ms        2   SELECT l.*, u.login, u.bg_color FROM annonces_log l, users u, t
Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal
> It would be interesting to know what the bottleneck is for temp tables for you. They do not go via the buffer-cache, they are stored in private memory in the backend, they are not xlogged. Nor flushed to disk on backend exit. They're about as close to in-memory tables as you're going to get...

Hum... Timings are a mean over 100 queries, including roundtrip to localhost, via a python script.

0.038 ms BEGIN
0.057 ms SELECT 1
0.061 ms COMMIT

0.041 ms BEGIN
0.321 ms SELECT count(*) FROM bookmarks
0.080 ms COMMIT

(this test table contains about 250 rows)

0.038 ms BEGIN
0.378 ms SELECT * FROM bookmarks ORDER BY annonce_id DESC LIMIT 20
0.082 ms COMMIT

(the ORDER BY uses an index)

0.042 ms BEGIN
0.153 ms DECLARE tmp SCROLL CURSOR WITHOUT HOLD FOR SELECT * FROM bookmarks ORDER BY annonce_id DESC LIMIT 20
0.246 ms FETCH ALL FROM tmp
0.048 ms MOVE FIRST IN tmp
0.246 ms FETCH ALL FROM tmp
0.048 ms CLOSE tmp
0.084 ms COMMIT

(the CURSOR is about as fast as a simple query)

0.101 ms BEGIN
1.451 ms CREATE TEMPORARY TABLE tmp ( a INTEGER NOT NULL, b INTEGER NOT NULL, c TIMESTAMP NOT NULL, d INTEGER NOT NULL ) ON COMMIT DROP
0.450 ms INSERT INTO tmp SELECT * FROM bookmarks ORDER BY annonce_id DESC LIMIT 20
0.443 ms ANALYZE tmp
0.365 ms SELECT * FROM tmp
0.310 ms DROP TABLE tmp
32.918 ms COMMIT

CREATING the table is OK, but what happens on COMMIT ? I hear the disk seeking frantically. With fsync=off, I get this :

0.090 ms BEGIN
1.103 ms CREATE TEMPORARY TABLE tmp ( a INTEGER NOT NULL, b INTEGER NOT NULL, c TIMESTAMP NOT NULL, d INTEGER NOT NULL ) ON COMMIT DROP
0.439 ms INSERT INTO tmp SELECT * FROM bookmarks ORDER BY annonce_id DESC LIMIT 20
0.528 ms ANALYZE tmp
0.364 ms SELECT * FROM tmp
0.313 ms DROP TABLE tmp
0.688 ms COMMIT

Getting closer ? I'm betting on system catalog updates. I get the same timings with ROLLBACK instead of COMMIT. Temp tables have a row in pg_class...
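For reference, per-statement timings like the ones above come from a small python script issuing one statement at a time and measuring the round trip. A minimal version of such a harness might look like this ; it is shown against the stdlib sqlite3 module purely so the sketch runs standalone, while the measurements in this mail were of course made against PostgreSQL over a localhost connection :

```python
import sqlite3
import time

def timed(cur, sql):
    """Execute one statement and report its wall-clock time in ms,
    in the spirit of the per-query timings quoted above."""
    t0 = time.perf_counter()
    cur.execute(sql)
    # Fetch results only for statements that return rows.
    rows = cur.fetchall() if cur.description else None
    dt = (time.perf_counter() - t0) * 1000.0
    print(f"{dt:8.3f} ms  {sql}")
    return rows

con = sqlite3.connect(":memory:")
cur = con.cursor()
timed(cur, "CREATE TEMPORARY TABLE tmp (a INTEGER NOT NULL, b INTEGER NOT NULL)")
timed(cur, "INSERT INTO tmp VALUES (1, 2), (3, 4)")
rows = timed(cur, "SELECT * FROM tmp")
timed(cur, "DROP TABLE tmp")
con.commit()
```

The real script averaged each statement over 100 runs ; that is just a loop around `timed()`.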
Another temporary table wart :

BEGIN;
CREATE TEMPORARY TABLE tmp ( a INTEGER NOT NULL, b INTEGER NOT NULL, c TIMESTAMP NOT NULL, d INTEGER NOT NULL ) ON COMMIT DROP;
INSERT INTO tmp SELECT * FROM bookmarks ORDER BY annonce_id DESC LIMIT 20;

EXPLAIN ANALYZE SELECT * FROM tmp;
 Seq Scan on tmp (cost=0.00..25.10 rows=1510 width=20) (actual time=0.003..0.006 rows=20 loops=1)
 Total runtime: 0.030 ms
(2 lines)

ANALYZE tmp;
EXPLAIN ANALYZE SELECT * FROM tmp;
 Seq Scan on tmp (cost=0.00..1.20 rows=20 width=20) (actual time=0.003..0.008 rows=20 loops=1)
 Total runtime: 0.031 ms

We see that the temp table has a very wrong estimated rowcount until it has been ANALYZEd. However, temporary tables do not support concurrent access (obviously) ; and in the case of on-commit-drop tables, inserts can't be rolled back (obviously), so an accurate rowcount could be maintained via a simple counter...
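The counter idea above is simple enough to sketch : since an on-commit-drop temp table sees no concurrent access, an exact rowcount can be kept as a plain counter bumped on INSERT/DELETE, so the planner never needs a stale estimate. This is a pure-Python illustration of the bookkeeping, not backend code :

```python
# Hypothetical illustration: a single-session table whose rowcount is
# maintained inline, making the "estimate" always exact without ANALYZE.

class TempTable:
    def __init__(self):
        self.rows = []
        self.rowcount = 0          # maintained on every INSERT/DELETE

    def insert(self, row):
        self.rows.append(row)
        self.rowcount += 1

    def delete(self, pred):
        kept = [r for r in self.rows if not pred(r)]
        self.rowcount -= len(self.rows) - len(kept)
        self.rows = kept

    def estimated_rows(self):
        # What the planner would read instead of the default guess
        # (the rows=1510 seen in the un-ANALYZEd plan above).
        return self.rowcount

t = TempTable()
for i in range(20):
    t.insert((i,))
t.delete(lambda r: r[0] % 2 == 0)   # delete the even half
```

After the delete, `t.estimated_rows()` is exactly 10, with no ANALYZE pass needed.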
Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal
Additionally to your query you are already transferring the whole result set multiple times. First you copy it to the result table. Then you read it again. Your subsequent queries will also have to read over all the unneeded tuples just to get your primary key. Considering that the result set is not very large and will be cached in RAM, this shouldn't be a problem. then why useth thy not the DISTINCT clause when building thy result table and thou shalt have no duplicates. Because the result table contains no duplicates ;) I need to remove duplicates in this type of queries : -- get object owners info SELECT * FROM users WHERE id IN (SELECT user_id FROM results); And in this case I find IN() easier to read than DISTINCT (what I posted was a simplification of my real use case...) which is a perfect reason to use a temp table. Another variation on the temp table scheme is use a result table and add a query_id. True. Doesn't solve my problem though : it's still complex, doesn't have good rowcount estimation, bloats a table (I only need these records for the duration of the transaction), etc. We do something like this in our web application when users submit complex queries. For each query we store tuples of (query_id,result_id) in a result table. It's then easy for the web application to page the result set. Yes, that is about the only sane way to page big result sets. A cleaner solution usually pays off in the long run whereas a hackish or overly complex solution will bite you in the behind for sure as time goes by. Yes, but in this case temp tables add too much overhead. I wish there were RAM based temp tables like in mysql. However I guess the current temp table slowness comes from the need to mark their existence in the system catalogs or something. That's why I proposed using cursors... 
Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal
> You might consider just selecting your primary key or a set of primary keys to involved relations in your search query. If you currently use "select *" this can make your result set very large. Copying all the result set to the temp table costs you additional IO that you probably don't need.

It is a bit of a catch : I need this information, because the purpose of the query is to retrieve these objects. I can first store the ids, then retrieve the objects, but it's one more query.

> Also you might try: SELECT * FROM somewhere JOIN result USING (id) Instead of: SELECT * FROM somewhere WHERE id IN (SELECT id FROM result)

Yes, you're right in this case ; however the query to retrieve the owners needs to eliminate duplicates, which IN() does.

> On the other hand if your search query runs in 10ms it seems to be fast enough for you to run it multiple times. There's probably no point in optimizing anything in such case.

I don't think so :
- 10 ms is a mean time ; sometimes it can take much more time, sometimes it's faster.
- Repeating the query might yield different results if records were added or deleted in the meantime.
- Complex search queries have imprecise rowcount estimates ; hence the joins that I would add to them will get suboptimal plans.

Using a temp table is really the cleanest solution now ; but it's too slow, so I reverted to generating big IN() clauses in the application.
[HACKERS] Big IN() clauses etc : feature proposal
> The moral of the story is that you're probably better off running a bunch of small selects than in trying to optimize things with one gargantuan select. Ever experiment with loading the parameters into a temp table and joining to that? Also, it might be worth re-testing that conclusion with PG CVS tip (or 8.2 when it comes out). The reimplementation of IN as = ANY that I did a couple months ago might well change the results.

Long mail, but I think it's interesting... I think this is a generic problem, which is often encountered : selecting a bunch of records based on a list of primary keys (or other indexed, unique field) ; said list being anything from very short to quite large.

Here are a few occurrences of this need :

1- The application supplies a list of id's (the case of the OP of this thread)

2- A query Q1 yields a list of selected objects, that we wish to use in several subsequent queries. And Q1 is a query we don't wish to do several times, either because it's slow, complicated (advanced search, for instance), or it acts on a constantly moving dataset, so the results would be different each time. So we store the result of Q1 in the application, or in a temp table, or in an array in a plpgsql variable, whatever, to reuse them. Then, for each of these objects, often we will make more queries to resolve foreign keys (get category name, owner name, from categories and users tables, etc).

I have encountered both cases quite often, and they both pose a few problems. I think it would be a good opportunity for a new feature (see below).

A typical use case for point 2 : consider an "objects" table. Each object...
- is related to one or several rows from the "categories" table via an "objects_categories" link table.
- has an owner_id referencing the "users" table

I do an "advanced search" query on "objects", which returns a list of objects.
I can join directly to "users" to get the owner's name, but joining to "categories" is already problematic because of the many-to-many relationship. I wish to do this : fetch all objects matching the search criteria ; fetch the owner users ; fetch the categories ; build in my application object space a clean and straightforward data representation for all this.

Also :
- I do not wish to complicate the search query.
- The row estimates for the search query results are likely to be "not so good" (because it's a complex query) ; so the joins to users and categories are likely to use suboptimal plans based on "not so good" estimates.
- The rows from "objects" are large ; so moving them around through a lot of small joins hurts performance.

The obvious solution is this :

BEGIN;
CREATE TEMPORARY TABLE results ON COMMIT DROP AS SELECT * FROM advanced search query;
ANALYZE results;
-- get the results to the application
SELECT * FROM results;
-- get object owners info
SELECT * FROM users WHERE id IN (SELECT user_id FROM results);
-- get category info
SELECT * FROM categories WHERE id IN (SELECT category_id FROM objects_to_categories WHERE object_id IN (SELECT id FROM results));
-- get object/category relations (the ORM will use this to link objects in the application)
SELECT * FROM objects_to_categories WHERE object_id IN (SELECT id FROM results);
COMMIT;

You might wonder why I do it this way on the "categories" table. This is because I use an Object-Relational mapper which will instantiate a User or Category class object for each row I fetch from these tables. I do not want to fetch just the username, using a simple join, but I want the full object, because :
- I want to instantiate these objects (they have useful methods to process rights etc)
- I do not want to mix columns from "objects" and "users"

And I do not wish to instantiate each category more than once.
This would waste memory, but more importantly, it is a lot cleaner to have only one instance per row, because my ORM then translates the foreign key relations into object relations (pointers). Each instantiated category will contain a list of Object instances ; each Object instance will contain a list of the categories it belongs to, and point to its owner user.

Back to the point : I can't use the temp table method, because temp tables are too slow. Creating a temp table, filling it, analyzing it and then dropping it takes about 100 ms. The search query, on average, takes 10 ms. So I have to move this logic to the application, or to plpgsql, and jump through hoops and use big IN() clauses ; which has the following drawbacks :
- slow
- ugly
- very hard for the ORM to auto-generate

*** Feature proposal :

A way to store query results in a named buffer and reuse them in the next queries. This should be as fast as possible, store results in RAM
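The access pattern behind the proposal is easy to show in plain Python : run the expensive search once, keep its rows in RAM under a name, and let every follow-up lookup read from the buffer. This is a pure illustration of the pattern ; the names and the dict-based "buffer" are hypothetical, not a proposed backend design :

```python
# Hypothetical sketch of the "named buffer" proposal.

buffers = {}

def store(name, rows):
    """Materialize a result set once (like a WITH SCROLL cursor)."""
    buffers[name] = list(rows)
    return buffers[name]

def fetch(name):
    """Reuse the stored rows in any later query, no re-execution."""
    return buffers[name]

# The expensive "advanced search" runs exactly once...
results = store("results", [{"id": 1, "user_id": 10, "category_id": 7},
                            {"id": 2, "user_id": 10, "category_id": 8}])

# ...then each follow-up query resolves foreign keys from the buffer,
# with duplicates eliminated just as the IN() subqueries would do.
owner_ids = sorted({r["user_id"] for r in fetch("results")})
category_ids = sorted({r["category_id"] for r in fetch("results")})
```

The point of the feature is that `fetch("results")` costs a buffer read, not a catalog entry, a disk file, and a 30 ms COMMIT.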
Re: [HACKERS] [PERFORM] A Better External Sort?
> Bulk loading speed is irrelevant here - that is dominated by parsing, which we have covered copiously (har har) previously and have sped up by 500%, which still makes Postgres < 1/2 the loading speed of MySQL.

Let's ask MySQL 4.0 :

LOAD DATA INFILE blah
0 errors, 666 warnings

SHOW WARNINGS;
not implemented. upgrade to 4.1

duh
Re: [HACKERS] [PERFORM] A Better External Sort?
Just to add a little anarchy in your nice debate...

Who really needs all the results of a sort on your terabyte table ? I guess not many people do a SELECT from such a table and want all the results. So, this leaves :
- Really wanting all the results, to fetch using a cursor,
- CLUSTER type things, where you really want everything in order,
- Aggregates (Sort->GroupAggregate), which might really need to sort the whole table,
- Complex queries where the whole dataset needs to be examined, in order to return a few values,
- Joins (again, the whole table is probably not going to be selected),
- And the ones I forgot.

However, most likely you only want to SELECT N rows, in some ordering :
- the first N (ORDER BY x LIMIT N)
- last N (ORDER BY x DESC LIMIT N)
- WHERE x>value ORDER BY x LIMIT N
- WHERE x<value ORDER BY x DESC LIMIT N

Or, you are doing a Merge JOIN against some other table ; in that case, yes, you might need the whole sorted terabyte table, but most likely there are WHERE clauses in the query that restrict the set, and thus, maybe we can get some conditions or limit values on the column to sort. Also the new, optimized hash join, which is more memory efficient, might cover this case.

Point is, sometimes, you only need part of the results of your sort. And the bigger the sort, the more likely it becomes that you only want part of the results. So, while we're in the fun hand-waving, new-algorithm-trying mode, why not consider this right from the start ? (I know I'm totally in hand-waving mode right now, so slap me if needed.)

I'd say your new, fancy sort algorithm needs a few more input values :
- Range of values that must appear in the final result of the sort : none, minimum, maximum, both, or even a set of values from the other side of the join, hashed, or sorted.
- LIMIT information (first N, last N, none)
- Enhanced LIMIT information (first/last N values of the second column to sort, for each value of the first column) (the infamous "top 10 by category" query)
- etc.
With this, the amount of data that needs to be kept in memory is dramatically reduced, from the whole table (even using your compressed keys, that's big) to something more manageable, which will be closer to the size of the final result set returned to the client, and a lot of effort is avoided. So, this would not be useful in all cases, but when it applies, it would be really useful.

Regards !
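The "first N with bounded memory" case above can be sketched with a bounded heap : instead of sorting the whole input, only N candidates are ever held in memory, which is exactly ORDER BY x LIMIT N without the full sort. A generic illustration of the idea (using the stdlib heapq), not the proposed executor change :

```python
import heapq
import random

# Keep only the N smallest rows by key: memory is O(N), not O(table).
def top_n(rows, n, key):
    return heapq.nsmallest(n, rows, key=key)

# A stand-in for a big table of (sort_key, payload) rows.
random.seed(0)
rows = [(random.randrange(10**6), "payload") for _ in range(10000)]

# ORDER BY sort_key LIMIT 20, without materializing a full sort.
first_20 = top_n(rows, 20, key=lambda r: r[0])
```

`heapq.nsmallest(n, it, key)` is documented as equivalent to `sorted(it, key=key)[:n]`, but it streams the input while retaining only n items, which is the memory win the mail is pointing at.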
Re: [HACKERS] [PERFORM] Avoiding tuple construction/deconstruction during joining
On my machine (Laptop with Pentium-M 1.6 GHz and 512MB DDR333) I get the following timings :

Big join query with all the fields and no ORDER BY (I just put a SELECT * on the first table), yielding about 6k rows :
=> 12136.338 ms

Replacing the SELECT * from the table with many fields by just a SELECT of the foreign key columns :
=> 1874.612 ms

I felt like playing a bit, so I implemented a hash join in python (download the file, it works on Miroslav's data). All timings do not include time to fetch the data from the database. Fetching all the tables takes about 1.1 secs.

* With something that looks like the current implementation (copying tuples around) and fetching all the fields from the big table :
=> Joining : 4.3 secs

* Fetching only the integer fields (fetching all the tables : 0.4 secs) :
=> Joining : 1.7 secs

* A smarter join which copies nothing and updates the rows as they are processed, adding fields :
=> Joining : 0.4 secs

With the just-in-time compiler activated, it goes down to about 0.25 seconds.

First thing, this confirms what Tom said. It also means that doing this query in the application can be a lot faster than doing it in postgres, including fetching all of the tables. There's a problem somewhere ! It should be the other way around ! The python mappings (dictionaries : { key : value }) are optimized like crazy, but they store column names for each row. And it's a dynamic script language ! Argh.

Note : run the program like this :
python test.py | less -S
so that the time spent scrolling your terminal does not spoil the measurements.

Download test program : http://boutiquenumerique.com/pf/miroslav/test.py
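For readers who don't want to download test.py, the general technique is simple : build a hash on the smaller table's key, probe it with the bigger table, and extend the probe rows in place instead of copying tuples around. This is a minimal generic sketch of such a hash join (with hypothetical sample data), not the actual test.py :

```python
# Minimal in-memory hash join: build on the small side, probe with the
# big side, merge fields into the probe row (the "smarter join" variant
# that adds fields instead of copying whole tuples).

def hash_join(build_rows, probe_rows, build_key, probe_key):
    table = {}
    for row in build_rows:                      # build phase
        table.setdefault(row[build_key], []).append(row)
    out = []
    for row in probe_rows:                      # probe phase
        for match in table.get(row[probe_key], ()):
            joined = dict(row)                  # start from the probe row
            joined.update(match)                # add the build side's fields
            out.append(joined)
    return out

# Hypothetical sample data shaped like the annonces_log/users join above.
users = [{"user_id": 1, "login": "alice"}, {"user_id": 2, "login": "bob"}]
logs = [{"annonce_id": 7, "user_id": 1},
        {"annonce_id": 8, "user_id": 2},
        {"annonce_id": 9, "user_id": 1}]

joined = hash_join(users, logs, "user_id", "user_id")
```

One pass over each input, O(1) lookups on the probe side ; which is why, as noted above, hash joins stay fast even in an interpreted language.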
Re: [HACKERS] [PERFORM] Avoiding tuple construction/deconstruction during joining
I have asked him for the data and played with his queries, and obtained massive speedups with the following queries :

http://boutiquenumerique.com/pf/miroslav/query.sql
http://boutiquenumerique.com/pf/miroslav/query2.sql
http://boutiquenumerique.com/pf/miroslav/materialize.sql

Note that my optimized version of the Big Joins is not much faster than the materialized view without index (hash joins are damn fast in postgres), but of course using an index...