[SQL] DROP IF ...

2005-05-24 Thread CG
PostgreSQL 7.4 ...

I'm trying to find a way to drop a table via SQL if it contains 0 rows. Here
was my thought:

CREATE OR REPLACE FUNCTION dropif(text, bool)
  RETURNS bool AS
'DECLARE
  tblname ALIAS FOR $1;
  condition ALIAS FOR $2;
BEGIN
  IF (condition) THEN
    EXECUTE(\'DROP TABLE "\' || tblname || \'";\');
  END IF;
  RETURN \'t\'::bool;
END;'
  LANGUAGE 'plpgsql' VOLATILE;

... then ...

BEGIN;
CREATE TABLE testtbl (i int4);
SELECT dropif('testtbl',(SELECT count(*)::int4 FROM testtbl) = 0);

ERROR:  relation 286000108 is still open
CONTEXT:  PL/pgSQL function "dropif" line 6 at execute statement

... It makes sense: the outer SELECT still has the table open when the function
tries to drop it. I need a different strategy.

Please advise!

CG
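
One possible workaround, sketched here rather than taken from the thread: do the
row-count check inside the function itself, so the calling SELECT never references
the table being dropped. In 7.4-era PL/pgSQL that means capturing the count with a
FOR ... IN EXECUTE loop and issuing the DROP only after the loop has closed its
cursor (the function name below is new):

CREATE OR REPLACE FUNCTION drop_if_empty(text)
  RETURNS bool AS
'DECLARE
  tblname ALIAS FOR $1;
  r RECORD;
  n int4 := 1;
BEGIN
  -- capture the row count via dynamic SQL; the cursor closes when the loop ends
  FOR r IN EXECUTE \'SELECT count(*)::int4 AS n FROM "\' || tblname || \'"\' LOOP
    n := r.n;
  END LOOP;
  -- the calling query never touches the table, so the drop can proceed
  IF n = 0 THEN
    EXECUTE \'DROP TABLE "\' || tblname || \'"\';
  END IF;
  RETURN (n = 0);
END;'
  LANGUAGE 'plpgsql' VOLATILE;

-- usage: SELECT drop_if_empty('testtbl');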



[SQL] Clustering problem

2005-07-08 Thread CG
I have what I call a "dictionary" table which supports a "master table". 

This dictionary table is designed to hold generic data : 

  CREATE TABLE sup_data (
    link_id    uniqueidentifier,
    field_name varchar(255),
    field_data text
  ) WITH OIDS; ...

It works well when you're digging into it to pull the supplementary information
for a small number of rows in the master table. It uses an index on the
link_id, and can jump right to the few pages for the supplemental data. That
was the design. 

Now "the powers that be" want to do some aggregate inquiries on subsets of the
generic data, based on many rows from the master table. This doesn't work so
well... It's having to pull many pages to create the result set to aggregate on.
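
Roughly, the two access patterns look like this (a sketch; the master table and its
filter column are stand-in names, and $1/$2 stand in for the lookup values):

-- fast: supplementary data for one master row, served by the index on link_id
SELECT field_name, field_data
  FROM sup_data
 WHERE link_id = $1;

-- slow: aggregate over the generic data for many master rows; the matching
-- link_ids are scattered across the heap, so many pages have to be pulled
SELECT s.field_name, count(*)
  FROM master_table m
  JOIN sup_data s ON s.link_id = m.link_id
 WHERE m.some_filter = $2
 GROUP BY s.field_name;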

If I could cluster the generic data to match the clustering on the "master
table" it would reduce the number of pulled pages considerably and the speedup
would make it work well.

I'm trying to avoid replicating the column and index used to cluster the main
table in this dictionary table.

Is it even possible to cluster a table based on the clustering scheme (which is
not the link_id ...) from the master table? 

Can you gurus think of a better strategy? :) (Please??) :)

CG







Re: [SQL] Clustering problem

2005-07-11 Thread CG

Why would you then drop the index? Performance and storage issues? I imagine
that I would cluster the table at regular intervals to maintain the ordering,
so I'd need to keep the index around, yes?

--- PFC <[EMAIL PROTECTED]> wrote:

> 
> 
> > Is it even possible to cluster a table based on the clustering scheme  
> > (which is
> > not the link_id ...) from the master table?
> >
> > Can you gurus think of a better strategy? :) (Please??) :)
> 
>   You can create a functional index on a function which returns the desired
> order by looking in the main table, cluster it, then drop the index...
> 
> 
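
Roughly, that suggestion would look like the sketch below (master_table, sort_key,
and the function name are stand-ins; the function has to be declared IMMUTABLE for
the index to be accepted, even though it really reads the master table):

CREATE OR REPLACE FUNCTION master_order(uniqueidentifier)
  RETURNS int4 AS
'SELECT sort_key FROM master_table WHERE link_id = $1;'
  LANGUAGE sql IMMUTABLE;

CREATE INDEX sup_data_master_order_idx
  ON sup_data (master_order(link_id));

-- physically reorder sup_data to match the master table's ordering
CLUSTER sup_data_master_order_idx ON sup_data;

DROP INDEX sup_data_master_order_idx;
DROP FUNCTION master_order(uniqueidentifier);

Presumably the index is dropped afterwards because the IMMUTABLE marking is a fib:
as the master table changes, the index contents would silently go stale, and the
index would also add overhead on every write to sup_data. Re-creating it just for
each periodic re-CLUSTER sidesteps both problems.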







[SQL] empty set

2006-06-08 Thread CG
PostgreSQL 8.1

I've been trying to write a SQL prepare routine. One of the challenging
elements I'm running into is an empty set ...

"select foo from bar where foo in ? ;"

What if "?" is a set with zero elements? What is the proper value to use to
replace "?" indicating an empty set?

CG



Re: [SQL] empty set

2006-06-09 Thread CG

"select 1 where false" does indeed indicate an empty set. I was hoping for
something more elegant, but I'll take what I can get. :)

--- Bruno Wolff III <[EMAIL PROTECTED]> wrote:

> On Thu, Jun 08, 2006 at 14:40:12 -0700,
>   CG <[EMAIL PROTECTED]> wrote:
> > PostgreSQL 8.1
> > 
> > I've been trying to write a SQL prepare routine. One of the challenging
> > elements I'm running into is an empty set ...
> > 
> > "select foo from bar where foo in ? ;"
> > 
> > What if "?" is a set with zero elements? What is the proper value to use to
> > replace "?" indicating an empty set?
> 
> Something like the following suggests you can put a subquery there that
> returns 0 rows. I don't think that will work for prepared queries though; so
> it may not help you.
> 
> bruno=> select * from test where test in (select 1 where false);
>  test
> --
> (0 rows)
> 
> 
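
For reference, the substitution might look like this (a sketch, with bar/foo as in
the original question; the dummy subquery's literal should be of a type comparable
to foo):

-- non-empty parameter set: expand the placeholder into an explicit list
SELECT foo FROM bar WHERE foo IN (1, 2, 3);

-- empty parameter set: substitute a subquery that provably returns no rows
SELECT foo FROM bar WHERE foo IN (SELECT 1 WHERE false);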




[SQL] Nested loops are killing throughput

2006-09-19 Thread CG
Postgresql 8.1

I've tried turning off nested loops to see what the query planner would choose 
instead. It chose a hash join, and it still had a nested loop in the mix! How can 
I entice the query planner to use a simpler join scheme? What criteria are used 
to determine whether or not a merge join will be used?
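
For reference, that experiment amounts to toggling the planner's session-local
switches (a sketch; the settings only affect the current session, so they are safe
to flip while testing):

SET enable_nestloop = off;
-- run EXPLAIN ANALYZE on the query under test here, then put the setting back
RESET enable_nestloop;

-- related switches: enable_mergejoin, enable_hashjoin, enable_seqscan,
-- enable_indexscan, enable_bitmapscan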

Here's an explain analyze...

Subquery Scan foo  (cost=762.21..762.50 rows=1 width=12) (actual time=10784.849..10786.992 rows=9 loops=1)
  ->  GroupAggregate  (cost=762.21..762.24 rows=1 width=30) (actual time=10767.702..10768.539 rows=9 loops=1)
        ->  Sort  (cost=762.21..762.21 rows=1 width=30) (actual time=10767.525..10767.989 rows=360 loops=1)
              Sort Key: (dpdl.field_value)::timestamp with time zone
              ->  Nested Loop  (cost=2.01..762.20 rows=1 width=30) (actual time=283.824..10749.007 rows=360 loops=1)
                    ->  Nested Loop  (cost=0.00..744.28 rows=1 width=16) (actual time=31.227..6980.765 rows=5436 loops=1)
                          ->  Nested Loop  (cost=0.00..135.29 rows=101 width=16) (actual time=25.514..273.660 rows=5436 loops=1)
                                ->  Index Scan using ut_company_name_idx on user_table ut  (cost=0.00..21.96 rows=5 width=11) (actual time=6.596..6.649 rows=9 loops=1)
                                      Index Cond: ((company_name)::text = 'Acme, Inc.'::text)
                                ->  Index Scan using packet_user_idx on packet dp  (cost=0.00..19.89 rows=222 width=27) (actual time=16.939..28.025 rows=604 loops=9)
                                      Index Cond: ((dp.username)::text = ("outer".username)::text)
                                      Filter: (trans_date > (date_trunc('month'::text, (now() - '1 mon'::interval)) - '1 year 6 mons'::interval))
                          ->  Index Scan using packet_status_puuid_pkey on packet_status dps  (cost=0.00..6.02 rows=1 width=16) (actual time=1.226..1.228 rows=1 loops=5436)
                                Index Cond: (dps.packet_uuid = "outer".packet_uuid)
                                Filter: ((status & 2) = 0)
                    ->  Bitmap Heap Scan on packet_datalink dpdl  (cost=2.01..17.87 rows=4 width=30) (actual time=0.685..0.688 rows=0 loops=5436)
                          Recheck Cond: (dpdl.packet_uuid = "outer".packet_uuid)
                          Filter: (((field_name)::text = 'event_date'::text) AND ((field_value)::date >= ('now'::text)::date))
                          ->  Bitmap Index Scan on packet_dl_puuid_idx  (cost=0.00..2.01 rows=4 width=0) (actual time=0.402..0.402 rows=4 loops=5436)
                                Index Cond: (dpdl.packet_uuid = "outer".packet_uuid)
Total runtime: 10787.198 ms

Also, no one here can figure out why the row count prediction is off on the 
packet table either. It's vacuumed and analyzed.
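
One thing that sometimes helps with an estimate that far off (a sketch, not a
diagnosis): raise the statistics target on the columns involved and re-analyze, so
the planner's histograms see more of the username and trans_date distributions.

ALTER TABLE packet ALTER COLUMN username SET STATISTICS 200;    -- default is 10
ALTER TABLE packet ALTER COLUMN trans_date SET STATISTICS 200;
ANALYZE packet;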

...



Re: [SQL] Nested loops are killing throughput

2006-09-19 Thread CG
- Original Message 
From: Tom Lane <[EMAIL PROTECTED]>
To: CG <[EMAIL PROTECTED]>
Cc: pgsql-sql@postgresql.org
Sent: Tuesday, September 19, 2006 11:03:07 AM
Subject: Re: [SQL] Nested loops are killing throughput


CG <[EMAIL PROTECTED]> writes:
> ->  Nested Loop  (cost=0.00..744.28 rows=1 width=16) (actual time=31.227..6980.765 rows=5436 loops=1)
>   ->  Nested Loop  (cost=0.00..135.29 rows=101 width=16) (actual time=25.514..273.660 rows=5436 loops=1)
> ...
>   ->  Index Scan using packet_status_puuid_pkey on packet_status dps  (cost=0.00..6.02 rows=1 width=16) (actual time=1.226..1.228 rows=1 loops=5436)
>         Index Cond: (dps.packet_uuid = "outer".packet_uuid)
>         Filter: ((status & 2) = 0)

>One problem you've got is that the planner has no stats about the
>selectivity of that status condition.  My advice would be to forget the
>cute bitmask and store one or more plain boolean columns.  Easier to
>write queries against and way more tractable statistically.

n/p ... I'll ditch it. 
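
Roughly, ditching the bitmask might look like this (a sketch; the column name and
what bit 2 means are assumptions):

ALTER TABLE dpo.packet_status ADD COLUMN is_flagged boolean NOT NULL DEFAULT false;
UPDATE dpo.packet_status SET is_flagged = ((status & 2) <> 0);
ANALYZE dpo.packet_status;
-- the filter then becomes: WHERE NOT is_flagged
-- and the planner has real statistics to estimate its selectivity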
 
Even if I simplify that query down to a straightforward example...
 
select dp.*, dps.status from dpo.packet dp, dpo.packet_status dps where 
dp.packet_uuid = dps.packet_uuid and dp.username='joeuser';
 
I'm still getting the nested loop for a join.
 
Nested Loop  (cost=1.00..100013378.98 rows=2206 width=145) (actual time=46.743..18202.318 rows=2225 loops=1)
  ->  Index Scan using packet_user_idx on packet dp  (cost=0.00..88.03 rows=2206 width=125) (actual time=42.263..124.519 rows=2225 loops=1)
        Index Cond: ((username)::text = 'joeuser'::text)
  ->  Index Scan using packet_status_puuid_pkey on packet_status dps  (cost=0.00..6.01 rows=1 width=20) (actual time=8.115..8.117 rows=1 loops=2225)
        Index Cond: ("outer".packet_uuid = dps.packet_uuid)
Total runtime: 18205.880 ms

 
These indexes are being used, and the tables are freshly vacuum-analyzed...
 
CREATE UNIQUE INDEX packet_puuid_idx
  ON dpo.packet
  USING btree
  (packet_uuid);
 
CREATE INDEX packet_user_idx
  ON dpo.packet
  USING btree
  (username);
 
CREATE UNIQUE INDEX packet_status_puuid_pkey
  ON dpo.packet_status
  USING btree
  (packet_uuid);
 
I have no idea why I'm having such a difficult time digging through this data. 
It should be very straightforward.
 
CG
 



Re: [SQL] Nested loops are killing throughput

2006-09-19 Thread CG
>If packet_status is large, that seems like a perfectly reasonable plan
>to me.  If not ... what data type is packet_uuid?  Is its equality op
>marked mergeable or hashable?

It is of type uniqueidentifier ... 
 
CREATE OPERATOR =(
  PROCEDURE = uniqueidentifier_eq,
  LEFTARG = uniqueidentifier,
  RIGHTARG = uniqueidentifier,
  COMMUTATOR = =,
  RESTRICT = eqsel,
  JOIN = eqjoinsel,
  HASHES,
  MERGES,
  SORT1 = <,
  SORT2 = <,
  LTCMP = <,
  GTCMP = >);
 
It looks like it is marked both mergeable and hashable.
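
For what it's worth, a way to double-check what the planner actually sees (a
sketch; in this era merge-joinability shows up as a nonzero oprlsortop rather than
a dedicated flag column, and the underlying function's volatility and strictness
live in pg_proc):

SELECT o.oprcanhash,
       o.oprlsortop <> 0 AS mergejoinable,
       p.provolatile,   -- 'i' = immutable, 's' = stable, 'v' = volatile
       p.proisstrict
  FROM pg_operator o
  JOIN pg_proc p ON p.oid = o.oprcode
 WHERE o.oprname = '='
   AND o.oprleft  = 'uniqueidentifier'::regtype
   AND o.oprright = 'uniqueidentifier'::regtype;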
 
So what size counts as "large"? (I have no idea what units to use to think about 
table size... kilobytes? pages? rows?) It seems like we recently stepped over 
that imaginary line between medium and large.
 
We could re-tune PostgreSQL... What settings would make sense to tweak? Sort 
Memory? 
 
We could partition the tables... I understand the theory behind table 
partitioning and partial indexes, but I've never come across a howto or even a 
best practices list.
 
We could remove data from the table... I don't like this as much, but if that's 
the only way, that's the only way.



Re: [SQL] Nested loops are killing throughput

2006-09-19 Thread CG
>Is that the one off gborg?  It's broken because the equality function is
>marked volatile, and so the planner is afraid to try to use it for
>merging or hashing.  (It's also not marked strict, which means you can
>trivially crash the backend by passing it a null ...)

It is the one from gborg. That's good to know.
 
So to fix, I should alter the supporting functions and set the ones that should 
not be VOLATILE to IMMUTABLE (newid() should be VOLATILE and that's about it) 
and set them to STRICT as well. I haven't had a problem with it crashing my 
backend in the three years we've been using it, but I'd rather not take any 
chances.
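
The fix described above, roughly (a sketch; it assumes this release's ALTER
FUNCTION accepts volatility and strictness changes; if not, the type's CREATE OR
REPLACE FUNCTION statements can be re-run with the flags added; the names of the
other support functions follow whatever the gborg module calls them):

ALTER FUNCTION uniqueidentifier_eq(uniqueidentifier, uniqueidentifier) IMMUTABLE STRICT;
-- ... likewise for the module's other comparison/support functions ...
-- newid() stays VOLATILE, since it must return a new value on every call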
 
I know the hacker list has been batting back and forth the idea of a built-in 
UUID type. I'll be interested to see if you end up with a standard UUID type, a 
fixed field type with multiple generators for differing levels of paranoia, or 
the status quo.
 
CG



Re: [SQL] Nested loops are killing throughput

2006-09-20 Thread CG
>Is that the one off gborg?  It's broken because the equality function is
>marked volatile, and so the planner is afraid to try to use it for
>merging or hashing.  (It's also not marked strict, which means you can
>trivially crash the backend by passing it a null ...)

Tom, you are a genius. I changed the supporting functions and started getting 
what I expected. 
 
You were right on the money and saved my bacon.
 
I don't say it often enough: Thanks!
