Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal
1.451 ms = 1.451 milliseconds
1451.0 ms = 1.451 seconds

... so 32.918 ms for a commit seems perhaps reasonable ?

Greg Williamson
DBA
GlobeXplorer LLC

-----Original Message-----
From: [EMAIL PROTECTED] on behalf of Zeugswetter Andreas DCP SD
Sent: Thu 5/11/2006 12:55 AM
To: Jim C. Nasby; PFC
Cc: Greg Stark; Tom Lane; pgsql-performance@postgresql.org; pgsql-hackers@postgresql.org
Subject: Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal

> Something else worth considering is not using the normal catalog
> methods for storing information about temp tables, but hacking that
> together would probably be a rather large task.

But the timings suggest that it cannot be the catalogs in the worst case
he showed.

> 0.101 ms BEGIN
> 1.451 ms CREATE TEMPORARY TABLE tmp ( a INTEGER NOT NULL, b INTEGER NOT
>   NULL, c TIMESTAMP NOT NULL, d INTEGER NOT NULL ) ON COMMIT DROP

1.4 seconds is not great for create table, is that what we expect ?

> 0.450 ms INSERT INTO tmp SELECT * FROM bookmarks ORDER BY annonce_id DESC
>   LIMIT 20
> 0.443 ms ANALYZE tmp
> 0.365 ms SELECT * FROM tmp
> 0.310 ms DROP TABLE tmp
> 32.918 ms COMMIT
>
> CREATING the table is OK, but what happens on COMMIT ? I hear the disk
> seeking frantically.

The 32 seconds for commit can hardly be catalog related. It seems the
file is fsynced before it is dropped.

Andreas
Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal
On Thu, May 11, 2006 at 06:08:36PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> > I'd hope that wasn't what's happening... is the backend smart enough to
> > know not to fsync anything involved with the temp table?
>
> The catalog entries required for it have to be fsync'd, unless you enjoy
> putting your entire database at risk (a bad block in pg_class, say,
> would probably take out more than one table).

Yeah, thought about that after sending... :(

> It's interesting to speculate about keeping such catalog entries in
> child tables of pg_class etc that are themselves temp tables. Resolving
> the apparent circularity of this is left as an exercise for the reader.

Well, since it'd be a system table with a fixed OID there could
presumably be a special case in the recovery code for it, though that's
pretty fugly sounding.

Another alternative would be to support global temp tables... I think
that would handle all the complaints of the OP except for the cost of
ANALYZE. I suspect this would be easier to do than creating a special
type of temp table that used a tuplestore instead of the full table
framework, and it'd certainly be more general-purpose.

--
Jim C. Nasby, Sr. Engineering Consultant       [EMAIL PROTECTED]
Pervasive Software        http://pervasive.com     work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf          cell: 512-569-9461
Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal
"Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> I'd hope that wasn't what's happening... is the backend smart enough to
> know not to fsync anything involved with the temp table?

The catalog entries required for it have to be fsync'd, unless you enjoy
putting your entire database at risk (a bad block in pg_class, say,
would probably take out more than one table).

It's interesting to speculate about keeping such catalog entries in
child tables of pg_class etc that are themselves temp tables.  Resolving
the apparent circularity of this is left as an exercise for the reader.

			regards, tom lane
Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal
>>> 0.450 ms INSERT INTO tmp SELECT * FROM bookmarks ORDER BY annonce_id DESC
>>>   LIMIT 20
>>> 0.443 ms ANALYZE tmp
>>> 0.365 ms SELECT * FROM tmp
>>> 0.310 ms DROP TABLE tmp
>>> 32.918 ms COMMIT

>> The 32 seconds for commit can hardly be catalog related. It seems the
>> file is fsynced before it is dropped.

> I'd hope that wasn't what's happening... is the backend smart enough to
> know not to fsync anything involved with the temp table? ISTM that that
> transaction shouldn't actually be creating any WAL traffic at all.
> Though on the other hand there's no reason that DROP should be in the
> transaction at all; maybe that's gumming things up during the commit.

I included the DROP to make it clear that the time was spent in
COMMITting, not in DROPping the table.

Also, you can't use CREATE TEMP TABLE AS SELECT ... and at the same time
make it ON COMMIT DROP. You have to CREATE and INSERT.

With an ON COMMIT DROP temp table, the global timings are the same
whether or not it is dropped before commit : it is always the COMMIT
which takes all the milliseconds.

I still bet on system catalog updates being the main cause of the time
spent in COMMIT... (because ANALYZE changes this time)
Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal
On Thu, May 11, 2006 at 09:55:15AM +0200, Zeugswetter Andreas DCP SD wrote:
> > Something else worth considering is not using the normal catalog
> > methods for storing information about temp tables, but hacking that
> > together would probably be a rather large task.
>
> But the timings suggest that it cannot be the catalogs in the worst case
> he showed.
>
> > 0.101 ms BEGIN
> > 1.451 ms CREATE TEMPORARY TABLE tmp ( a INTEGER NOT NULL, b INTEGER NOT
> >   NULL, c TIMESTAMP NOT NULL, d INTEGER NOT NULL ) ON COMMIT DROP
>
> 1.4 seconds is not great for create table, is that what we expect ?

milliseconds... :) Given the amount of code and locking that it looks
like is involved in creating a table, that might not be unreasonable...

> > 0.450 ms INSERT INTO tmp SELECT * FROM bookmarks ORDER BY annonce_id DESC
> >   LIMIT 20
> > 0.443 ms ANALYZE tmp
> > 0.365 ms SELECT * FROM tmp
> > 0.310 ms DROP TABLE tmp
> > 32.918 ms COMMIT
> >
> > CREATING the table is OK, but what happens on COMMIT ? I hear the disk
> > seeking frantically.
>
> The 32 seconds for commit can hardly be catalog related. It seems the
> file is fsynced before it is dropped.

I'd hope that wasn't what's happening... is the backend smart enough to
know not to fsync anything involved with the temp table? ISTM that that
transaction shouldn't actually be creating any WAL traffic at all.

Though on the other hand there's no reason that DROP should be in the
transaction at all; maybe that's gumming things up during the commit.

--
Jim C. Nasby, Sr. Engineering Consultant       [EMAIL PROTECTED]
Pervasive Software        http://pervasive.com     work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf          cell: 512-569-9461
Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal
On Thu, May 11, 2006 at 08:43:46PM +0200, Martijn van Oosterhout wrote:
> On Thu, May 11, 2006 at 11:35:34AM -0400, Greg Stark wrote:
> > I can say that I've seen plenty of instances where the ability to
> > create temporary tables very quickly with no overhead over the
> > original query would be useful.
>
> I wonder if this requires what the standard refers to as a global
> temporary table. As I read it (which may be wrong, I find the language
> obtuse), a global temporary table is a temporary table whose structure
> is predefined. So, you'd define it once, updating the catalog only
> once, but still get a table that is emptied at each startup.
>
> Of course, it may not be what the standard means, but it still seems
> like a useful idea, to cut down on schema bloat.

IIRC that's the exact syntax Oracle uses:

CREATE GLOBAL TEMPORARY TABLE ...

I always found it a bit odd, since it always seemed to me like a global
temporary table would be one that every backend could read... something
akin to a real table that doesn't worry about fsync or any of that (and
is potentially not backed on disk at all).

--
Jim C. Nasby, Sr. Engineering Consultant       [EMAIL PROTECTED]
Pervasive Software        http://pervasive.com     work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf          cell: 512-569-9461
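For comparison, a sketch of the Oracle flavour (hypothetical table and
column names ; the definition persists in the catalog while each session
sees only its own rows) :

    CREATE GLOBAL TEMPORARY TABLE search_results (
        id       INTEGER NOT NULL,
        user_id  INTEGER NOT NULL
    ) ON COMMIT DELETE ROWS;  -- contents vanish at commit, definition stays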
Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal
On Thu, May 11, 2006 at 11:35:34AM -0400, Greg Stark wrote:
> I can say that I've seen plenty of instances where the ability to
> create temporary tables very quickly with no overhead over the original
> query would be useful.

I wonder if this requires what the standard refers to as a global
temporary table. As I read it (which may be wrong, I find the language
obtuse), a global temporary table is a temporary table whose structure
is predefined. So, you'd define it once, updating the catalog only once,
but still get a table that is emptied at each startup.

Of course, it may not be what the standard means, but it still seems
like a useful idea, to cut down on schema bloat.

Have a nice day,
--
Martijn van Oosterhout     http://svana.org/kleptog/
> From each according to his ability. To each according to his ability
> to litigate.
Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal
"Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> Perhaps it would be worth creating a class of temporary tables that
> used a tuplestore, although that would greatly limit what could be done
> with that temp table.

I can say that I've seen plenty of instances where the ability to create
temporary tables very quickly with no overhead over the original query
would be useful.

For instance, in one site I had to do exactly what I always advise
others against: use offset/limit to implement paging. So first I have to
execute the query with a count(*) aggregate to get the total, then
execute the same query a second time to fetch the actual page of
interest. This would be (or could be arranged to be) within the same
transaction, and doesn't require the ability to execute any DML against
the tuplestore, which I imagine would be the main issue?

For bonus points, what would be real neat would be if the database could
notice shared plan segments, keep around the materialized tuple store,
and substitute it instead of re-executing that segment of the plan. Of
course this requires keeping track of transaction snapshot states and
making sure it's still correct.

> Something else worth considering is not using the normal catalog
> methods for storing information about temp tables, but hacking that
> together would probably be a rather large task.

It would be nice if using this feature didn't interact poorly with
preplanning all your queries and using the cached plans. Perhaps if you
had some way to create a single catalog entry that defined all the
column names and types, and then simply pointed it at a new tuplestore
each time without otherwise altering the catalog entry?

--
greg
Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal
On Thu, May 11, 2006 at 09:55:15AM +0200, Zeugswetter Andreas DCP SD wrote:
> > 0.101 ms BEGIN
> > 1.451 ms CREATE TEMPORARY TABLE tmp ( a INTEGER NOT NULL, b INTEGER NOT
> >   NULL, c TIMESTAMP NOT NULL, d INTEGER NOT NULL ) ON COMMIT DROP
>
> 1.4 seconds is not great for create table, is that what we expect ?

Hmm, I'm hoping ms means milliseconds...

--
Martijn van Oosterhout     http://svana.org/kleptog/
> From each according to his ability. To each according to his ability
> to litigate.
Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal
> Something else worth considering is not using the normal catalog
> methods for storing information about temp tables, but hacking that
> together would probably be a rather large task.

But the timings suggest that it cannot be the catalogs in the worst case
he showed.

> 0.101 ms BEGIN
> 1.451 ms CREATE TEMPORARY TABLE tmp ( a INTEGER NOT NULL, b INTEGER NOT
>   NULL, c TIMESTAMP NOT NULL, d INTEGER NOT NULL ) ON COMMIT DROP

1.4 seconds is not great for create table, is that what we expect ?

> 0.450 ms INSERT INTO tmp SELECT * FROM bookmarks ORDER BY annonce_id DESC
>   LIMIT 20
> 0.443 ms ANALYZE tmp
> 0.365 ms SELECT * FROM tmp
> 0.310 ms DROP TABLE tmp
> 32.918 ms COMMIT
>
> CREATING the table is OK, but what happens on COMMIT ? I hear the disk
> seeking frantically.

The 32 seconds for commit can hardly be catalog related. It seems the
file is fsynced before it is dropped.

Andreas
Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal
> On Tue, May 09, 2006 at 06:29:31PM +0200, PFC wrote:
> > You mean the cursors' storage is in fact the same internal machinery
> > as a temporary table ?
>
> Use the source, Luke...

LOL, yeah, I should have, sorry.

> See tuplestore_begin_heap in backend/utils/sort/tuplestore.c and
> heap_create_with_catalog in backend/catalog/heap.c. You'll find that
> creating a tuplestore is far easier than creating a temp table.

I had used intuition (instead of the source) to come to the same
conclusion regarding the level of complexity of these two... But I'll
look at the source ;)

> Perhaps it would be worth creating a class of temporary tables that
> used a tuplestore, although that would greatly limit what could be done
> with that temp table.

Just selecting from it, I guess, but that's all that's needed. Anything
more would duplicate the functionality of a temp table.

I find cursors awkward. The application can FETCH from them, but
postgres itself can't do it in SQL, unless using FOR .. IN in plpgsql...

It would be a powerful addition to be able to split queries, factor out
common parts between multiple queries, etc., using this system ; it
could even be used to execute an inner part of a query, then plan the
rest according to the results and execute it... without the overhead of
a temp table.
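The plpgsql route PFC alludes to looks roughly like this (a sketch with
hypothetical names ; OPEN/FETCH is the portable spelling) :

    CREATE OR REPLACE FUNCTION walk_bookmarks() RETURNS void AS $$
    DECLARE
        c CURSOR FOR SELECT * FROM bookmarks ORDER BY annonce_id DESC LIMIT 20;
        r RECORD;
    BEGIN
        OPEN c;
        LOOP
            FETCH c INTO r;
            EXIT WHEN NOT FOUND;
            -- rows are visible here, but not to plain SQL statements
            RAISE NOTICE 'annonce %', r.annonce_id;
        END LOOP;
        CLOSE c;
    END;
    $$ LANGUAGE plpgsql;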
Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal
On Tue, May 09, 2006 at 06:29:31PM +0200, PFC wrote:
> You mean the cursors' storage is in fact the same internal machinery
> as a temporary table ?

Use the source, Luke... See tuplestore_begin_heap in
backend/utils/sort/tuplestore.c and heap_create_with_catalog in
backend/catalog/heap.c. You'll find that creating a tuplestore is far
easier than creating a temp table.

Perhaps it would be worth creating a class of temporary tables that used
a tuplestore, although that would greatly limit what could be done with
that temp table.

Something else worth considering is not using the normal catalog methods
for storing information about temp tables, but hacking that together
would probably be a rather large task.

--
Jim C. Nasby, Sr. Engineering Consultant       [EMAIL PROTECTED]
Pervasive Software        http://pervasive.com     work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf          cell: 512-569-9461
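A rough way to see the catalog side of that difference (a sketch ; exact
row counts vary by version) :

    BEGIN;
    CREATE TEMPORARY TABLE tmp (a INTEGER, b INTEGER) ON COMMIT DROP;
    -- a pg_class row, a pg_type row, dependency rows, plus one
    -- pg_attribute row per column (user and system columns alike) :
    SELECT count(*) FROM pg_attribute WHERE attrelid = 'tmp'::regclass;
    COMMIT;

A DECLARE ... CURSOR backed by a tuplestore touches none of these
catalogs.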
Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal
> Creating cursors for a simple plan like a single sequential scan is
> fast because it's using the original data from the table.

I used the following query :

SELECT * FROM bookmarks ORDER BY annonce_id DESC LIMIT 20

It's a backward index scan + limit... not a seq scan. And it's damn fast :

0.042 ms BEGIN
0.153 ms DECLARE tmp SCROLL CURSOR WITHOUT HOLD FOR SELECT * FROM bookmarks
         ORDER BY annonce_id DESC LIMIT 20
0.246 ms FETCH ALL FROM tmp
0.048 ms MOVE FIRST IN tmp
0.246 ms FETCH ALL FROM tmp
0.048 ms CLOSE tmp
0.084 ms COMMIT

> But your example was predicated on this part of the job being a complex
> query. If it's a complex query involving joins and groupings, etc, then
> it will have to be materialized and there's no (good) reason for that
> to be any faster than a temporary table which is effectively the same
> thing.

You mean the cursors' storage is in fact the same internal machinery as
a temporary table ? In that case, this raises an interesting question :
why is the cursor faster ?

Let's try a real-life example from my website : it is a search query
(quite complex) which is then joined to a lot of tables to resolve
FKeys. To that query I must add an application-made join using a big
IN() clause extracted from the data. Timings include the time to fetch
the results into Python. The "running total" column is the sum of all
timings since the BEGIN.

query_time   running_total   rows   query
0.061 ms     0.061 ms        -1     BEGIN
23.420 ms    23.481 ms       85     SELECT * FROM (huge query with a lot of joins)
4.318 ms     27.799 ms       2      SELECT l.*, u.login, u.bg_color FROM annonces_log l,
                                    users u WHERE u.id=l.user_id AND l.annonce_id IN
                                    (list of ids from previous query)
                                    ORDER BY annonce_id, added
0.241 ms     28.040 ms       -1     COMMIT

(Just in case you want to hurt yourself, here's the EXPLAIN ANALYZE
output : http://peufeu.com/temp/big_explain.txt)

Using a cursor takes about the same time. Also, doing just the search
query takes about 12 ms ; the joins take up the rest.

Now, I'll rewrite my query eliminating the joins and using a temp table.
Storing the whole result in the temp table will be too slow, because
there are too many columns. Therefore I will only store the primary and
foreign key columns, and join again to the main table to get the full
records.
query_time   running_total   rows   query
0.141 ms     0.141 ms        -1     BEGIN

Do the search :

8.229 ms     8.370 ms        -1     CREATE TEMPORARY TABLE tmp AS SELECT id, city_id,
                                    zipcode, contact_id, contact_group_id,
                                    price/terrain as sort
                                    FROM (stripped down search query)
0.918 ms     9.287 ms        -1     ANALYZE tmp

Fetch the main data to display :

7.663 ms     16.951 ms       85     SELECT a.* FROM tmp t, annonces_display a
                                    WHERE a.id=t.id ORDER BY t.sort

Fetch log entries associated with each row (one row to many log entries) :

1.021 ms     17.972 ms       2      SELECT l.*, u.login, u.bg_color FROM annonces_log l,
                                    users u, tmp t WHERE u.id=l.user_id AND
                                    l.annonce_id = t.id ORDER BY annonce_id, added
3.468 ms     21.440 ms       216    SELECT annonce_id, array_accum(list_id) AS list_ids,
                                    array_accum(COALESCE(user_id,0)) AS list_added_by,
                                    max(added) AS added_to_list FROM bookmarks
                                    GROUP BY annonce_id

Resolve foreign key relations :

1.034 ms     22.474 ms       37     SELECT r.annonce_id FROM read_annonces r, tmp t
                                    WHERE r.annonce_id = t.id
0.592 ms     23.066 ms       9      SELECT * FROM cities_dist_zipcode WHERE zipcode IN
                                    (SELECT zipcode FROM tmp)
0.716 ms     23.782 ms       11     SELECT * FROM cities_dist WHERE id IN
                                    (SELECT city_id FROM tmp)
1.125 ms     24.907 ms       45     SELECT * FROM contacts WHERE id IN
                                    (SELECT contact_id FROM tmp)
0.799 ms     25.705 ms       42     SELECT * FROM contact_groups WHERE id IN
                                    (SELECT contact_group_id FROM tmp)
0.463 ms     26.169 ms       -1     DROP TABLE tmp
32.208 ms    58.377 ms       -1     COMMIT

From this we see :

Using a temporary table is FASTER than doing the large query with all
the joins (26 ms versus 28 ms). It's also nicer and cleaner. However,
the COMMIT takes as much time as all the queries together !

Let's run with fsync=off :

query_time   running_total   rows   query
0.109 ms     0.109 ms        -1     BEGIN
8.321 ms     8.430 ms        -1     CREATE TEMPORARY TABLE tmp AS SELECT id, city_id,
                                    zipcode, contact_id, contact_group_id,
                                    price/terrain as sort
                                    FROM (stripped down search query)
0.849 ms     9.280 ms        -1     ANALYZE tmp
7.360 ms     16.640 ms       85     SELECT a.* FROM tmp t, annonces_display a
                                    WHERE a.id=t.id ORDER BY t.sort
1.067 ms     17.707 ms       2      SELECT l.*, u.login, u.bg_color FROM annonces_log l,
                                    users u, tmp t WHERE u.id=l.user_id AND
                                    l.annonce_id = t.id ORDER BY annonce_id, added
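As an aside, fsync is a server-wide setting that cannot be changed with
a per-session SET ; a run like the one above presumably means editing
postgresql.conf (fsync = off) and reloading :

    SHOW fsync;           -- check the current value
    -- SET fsync TO off;  -- rejected : not changeable per session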
Re: [HACKERS] Big IN() clauses etc : feature proposal
PFC <[EMAIL PROTECTED]> writes:
> > I really like this. It's clean, efficient, and easy to use. This
> > would be a lot faster than using temp tables.
>
> Creating cursors is very fast so we can create two, and avoid doing
> twice the same work (ie. hashing the ids from the results to grab
> categories only once).

Creating cursors for a simple plan like a single sequential scan is fast
because it's using the original data from the table.

But your example was predicated on this part of the job being a complex
query. If it's a complex query involving joins and groupings, etc, then
it will have to be materialized and there's no (good) reason for that to
be any faster than a temporary table, which is effectively the same
thing.

--
greg
[HACKERS] Big IN() clauses etc : feature proposal
> The moral of the story is that you're probably better off running a
> bunch of small selects than in trying to optimize things with one
> gargantuan select. Ever experiment with loading the parameters into a
> temp table and joining to that?

> Also, it might be worth re-testing that conclusion with PG CVS tip (or
> 8.2 when it comes out). The reimplementation of IN as = ANY that I did
> a couple months ago might well change the results.

Long mail, but I think it's interesting...

I think this is a generic problem, which is often encountered :
selecting a bunch of records based on a list of primary keys (or other
indexed, unique field) ; said list being anything from very short to
quite large.

Here are a few occurrences of this need :

1- The application supplies a list of id's (the case of the OP of this
thread).

2- A query Q1 yields a list of selected objects, that we wish to use in
several subsequent queries. And Q1 is a query we don't wish to do
several times, either because it's slow, complicated (advanced search,
for instance), or it acts on a constantly moving dataset, so the results
would be different each time. So we store the result of Q1 in the
application, or in a temp table, or in an array in a plpgsql variable,
whatever, to reuse them. Then, for each of these objects, often we will
make more queries to resolve foreign keys (get category name, owner
name, from categories and users tables, etc).

I have encountered both cases quite often, and they both pose a few
problems. I think it would be a good opportunity for a new feature (see
below).

A typical use case for point 2 :

Consider an "objects" table. Each object...
- is related to one or several rows from the "categories" table via an
  "objects_categories" link table.
- has an owner_id referencing the "users" table.

I do an "advanced search" query on "objects", which returns a list of
objects. I can join directly to "users" to get the owner's name, but
joining to "categories" is already problematic because of the
many-to-many relationship.

I wish to do this : fetch all objects matching the search criteria ;
fetch the owner users ; fetch the categories ; build in my application
object space a clean and straightforward data representation for all
this.

Also :
- I do not wish to complicate the search query.
- The row estimates for the search query results are likely to be "not
  so good" (because it's a complex query) ; so the joins to users and
  categories are likely to use suboptimal plans based on "not so good"
  estimates.
- The rows from "objects" are large ; so moving them around through a
  lot of small joins hurts performance.

The obvious solution is this :

BEGIN;
CREATE TEMPORARY TABLE results ON COMMIT DROP
    AS SELECT * FROM (advanced search query);
ANALYZE results;

-- get the results to the application
SELECT * FROM results;

-- get object owners info
SELECT * FROM users WHERE id IN (SELECT user_id FROM results);

-- get category info
SELECT * FROM categories WHERE id IN
    (SELECT category_id FROM objects_to_categories
      WHERE object_id IN (SELECT id FROM results));

-- get object/category relations (the ORM will use this to link objects
-- in the application)
SELECT * FROM objects_to_categories
 WHERE object_id IN (SELECT id FROM results);

COMMIT;

You might wonder why I do it this way on the "categories" table. This is
because I use an Object-Relational mapper which will instantiate a User
or Category class object for each row I fetch from these tables.
I do not want to fetch just the username, using a simple join, but I
want the full object, because :
- I want to instantiate these objects (they have useful methods to
  process rights etc).
- I do not want to mix columns from "objects" and "users".

And I do not wish to instantiate each category more than once. This
would waste memory, but more importantly, it is a lot cleaner to have
only one instance per row, because my ORM then translates the foreign
key relations into object relations (pointers). Each instantiated
category will contain a list of Object instances ; each Object instance
will contain a list of the categories it belongs to, and point to its
owner user.

Back to the point : I can't use the temp table method, because temp
tables are too slow. Creating a temp table, filling it, analyzing it and
then dropping it takes about 100 ms. The search query, on average, takes
10 ms.

So I have to move this logic to the application, or to plpgsql, and jump
through hoops and use big IN() clauses ; which has the following
drawbacks :
- slow
- ugly
- very hard for the ORM to auto-generate

***

Feature proposal :

A way to store query results in a named buffer and reuse them in the
next queries. This should be as fast as possible, store results in RAM
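Purely as illustration, invented syntax for such a named result buffer
might look like this (none of these keywords exist ; they only make the
intent concrete) :

    BEGIN;
    -- hypothetical : materialize once, into RAM, with no catalog entries
    DECLARE results BUFFER FOR SELECT * FROM (advanced search query) q;
    SELECT * FROM BUFFER results;
    SELECT * FROM users WHERE id IN (SELECT user_id FROM BUFFER results);
    COMMIT;  -- the buffer simply evaporates : nothing to fsync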