[SQL] PL/PGSQL TUTORIAL

2003-10-11 Thread Muhyiddin A.M Hayat


Where can I find a complete, finished database schema in SQL? I want to see a
real, complete database schema with views, triggers, etc.
 


Re: [SQL] PL/PGSQL TUTORIAL

2003-10-11 Thread Richard Huxton
On Saturday 11 October 2003 08:57, Muhyiddin A.M Hayat wrote:
> Where can i find a complete full terminated database schema in SQL?
>
> I want to see a real complete database schema with views, triggers..
> etc,...

Good places to start are:
  http://freshmeat.net/
  http://sourceforge.net/
That's not to say all the projects are high-quality, but you can probably find 
something that will suit your needs.

There's a lot of good information linked from http://techdocs.postgresql.org/ 
too.

-- 
  Richard Huxton
  Archonet Ltd



Re: [SQL] PL/PGSQL TUTORIAL

2003-10-11 Thread Richard Huxton
On Saturday 11 October 2003 08:57, Muhyiddin A.M Hayat wrote:
> Where can i find a complete full terminated database schema in SQL?
>
> I want to see a real complete database schema with views, triggers..
> etc,...

Oh - while I'm thinking of it, a couple of applications it might be worth 
looking at are OpenACS or Bricolage, both web-based content-management 
systems.

-- 
  Richard Huxton
  Archonet Ltd



[SQL] sql performance and cache

2003-10-11 Thread Chris Faulkner
Hello all

I have two very similar queries which I need to execute. They both have
exactly the same FROM/WHERE conditions. When I execute the first, it takes
about 16 seconds. The second is executed almost immediately afterwards and takes
13 seconds. In short, I'd like to know why the query result isn't being
cached, and I'd welcome any ideas on how to improve the execution.

The first query attempts to find the maximum size of an array in the result
set - the field is called "level". It contains anywhere between 1 and 10
integers, and I just need to know the largest size that occurs. To find the
maximum size of the "level" array I use:

"max(replace(split_part(array_dims(level),':',2),']','')::int)"

I know this is big and ugly, but is there any better way of doing it?

The second query just returns the result set - it has exactly the same
FROM/WHERE clause.

OK - so I could execute the query once and get the maximum size of the
array and the result set in one go. I know what I am doing is less than optimal,
but I had expected the query results to be cached, so that the second execution
would be very quick. So why aren't they? I have increased my cache size -
shared_buffers is 2000 and I have doubled the default max_fsm... settings
(although I am not sure what they do). sort_mem is 8192.

The FROM/WHERE is:

FROM oscar_node N, oscar_point P
where N."GEOM_ID_OF_POINT" = P."POINT_ID"
and N."TILE_REF" = P."TILE_REF"
and N."TILE_REF" in ('TQ27NE','TQ28SE','TQ37NW','TQ38SW')
and P."TILE_REF" in ('TQ27NE','TQ28SE','TQ37NW','TQ38SW')
and P."FEAT_CODE" = 3500
and P.wkb_geometry && GeometryFromText('BOX3D(529540.0 179658.88,530540.0 180307.12)'::box3d,-1)

oscar_node and oscar_point both have about 3m rows. The PK on oscar_node is a
composite of "TILE_REF" and "NODE_ID"; the PK on oscar_point is "TILE_REF" and
"POINT_ID". The tables are indexed on feat_code and I have an index on
wkb_geometry (a GiST index). I have increased the statistics size and run the
ANALYZE command.
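
For reference, the keys and indexes are roughly as follows (a sketch from
memory - I've invented names for the indexes that don't appear in the explain
plan below, and left out the PostGIS operator class on the GiST index):

ALTER TABLE oscar_node  ADD PRIMARY KEY ("TILE_REF", "NODE_ID");
ALTER TABLE oscar_point ADD PRIMARY KEY ("TILE_REF", "POINT_ID");

CREATE INDEX idx_on_tile_ref     ON oscar_node  ("TILE_REF");
CREATE INDEX idx_node_feat_code  ON oscar_node  ("FEAT_CODE");
CREATE INDEX idx_point_feat_code ON oscar_point ("FEAT_CODE");
CREATE INDEX gidx_oscar_point    ON oscar_point USING GIST (wkb_geometry);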

Here is my explain plan

 Nested Loop  (cost=0.00..147.11 rows=1 width=148)
   Join Filter: ("inner"."GEOM_ID_OF_POINT" = "outer"."POINT_ID")
   ->  Index Scan using gidx_oscar_point on oscar_point p  (cost=0.00..61.34 rows=1 width=57)
         Index Cond: (wkb_geometry && 'SRID=-1;BOX3D(529540 179658.88 0,530540 180307.12 0)'::geometry)
         Filter: ((("TILE_REF" = 'TQ27NE'::bpchar) OR ("TILE_REF" = 'TQ28SE'::bpchar) OR ("TILE_REF" = 'TQ37NW'::bpchar) OR ("TILE_REF" = 'TQ38SW'::bpchar)) AND ("FEAT_CODE" = 3500))
   ->  Index Scan using idx_on_tile_ref on oscar_node n  (cost=0.00..85.74 rows=2 width=91)
         Index Cond: (n."TILE_REF" = "outer"."TILE_REF")
         Filter: (("TILE_REF" = 'TQ27NE'::bpchar) OR ("TILE_REF" = 'TQ28SE'::bpchar) OR ("TILE_REF" = 'TQ37NW'::bpchar) OR ("TILE_REF" = 'TQ38SW'::bpchar))


I am seeing this message in my logs.

"bt_fixroot: not valid old root page"

Maybe this is relevant to my performance problems.

I know this has been a long message but I would really appreciate any
performance tips.

Thanks


Chris





Re: [SQL] sql performance and cache

2003-10-11 Thread Richard Huxton
On Saturday 11 October 2003 10:43, Chris Faulkner wrote:
> Hello all
>
> I have two very similar queries which I need to execute. They both have
> exactly the same from / where conditions. When I execute the first, it
> takes about 16 seconds. The second is executed almost immediately after, it
> takes 13 seconds. In short, I'd like to know why the query result isn't
> being cached and any ideas on how to improve the execution.

The short answer is that PG doesn't cache query results. The only way it could 
do so safely is to lock all the tables you access, to make sure that no other 
process changes them. That would effectively turn PG into a single-user DB in 
short order.

> The first query attempts to find the maximum size of an array in the result
> set- the field is called "level". IT contains anything between 1 and 10
> integers. I just need to know what the largest size is. I do this to find
> out the maximum size of the "level" array.
>
> "max(replace(split_part(array_dims(level),':',2),']','')::int)"
>
> I know this is big and ugly but is there any better way of doing it ?
>
> The second query just returns the result set - it has exactly the same
> FROM/Where clause.

I assume these two queries are linked? If you rely on the max size being 
unchanged and have more than one process using the database, you should make 
sure you lock the rows in question.
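
Something along these lines would do it (just a sketch - the WHERE clause should
mirror the one in your queries, and it needs to run inside the same transaction
as the two SELECTs):

BEGIN;
-- lock the matching oscar_point rows so the max array size can't change under us
SELECT 1
FROM oscar_point P
WHERE P."TILE_REF" IN ('TQ27NE','TQ28SE','TQ37NW','TQ38SW')
  AND P."FEAT_CODE" = 3500
FOR UPDATE;
-- ... run the max() query and the main query here ...
COMMIT;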

> OK - so I could execute the query once, and get the maximum size of the
> array and the result set in one. I know what I am doing is less than
> optimal but I had expected the query results to be cached. So the second
> execution would be very quick. So why aren't they ? I have increased my
> cache size - shared_buffers is 2000 and I have doubled the default
> max_fsm... settings (although I am not sure what they do). sort_mem is
> 8192.

PG will cache the underlying data, but not the results. The settings you are 
changing control the memory used to hold table/index rows etc. This means the second query 
shouldn't need to touch the disk if the rows it requires are already cached.

There is a discussion of the postgresql.conf file and how to tune it at:
  http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php
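
For reference, the settings you mention live in postgresql.conf and look
something like this (the first two echo the values you quoted; the max_fsm
figures are just placeholders - check the defaults for your version):

# postgresql.conf (sketch only - tune to your machine)
shared_buffers = 2000        # 8KB pages, so roughly 16MB of shared buffer cache
sort_mem = 8192              # KB available per sort
max_fsm_pages = 20000        # free-space map sizing (placeholder values)
max_fsm_relations = 1000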

Given the explain attached, 16 secs seems slow. Could you post an EXPLAIN 
ANALYSE of either/both queries to the performance list? I'd drop the sql list 
when we're just talking about performance.
-- 
  Richard Huxton
  Archonet Ltd



Re: [SQL] sql performance and cache

2003-10-11 Thread Chris Faulkner
Hello

Thanks for the reply.

> The short answer is that PG doesn't cache query results. The only
> way it could
> do so safely is to lock all tables you access to make sure that no other
> process changes them. That would effectively turn PG into a
> single-user DB in
> short order.

I am not sure I agree with you. I have done similar things with Oracle and
found that the second query will execute much more quickly than the first.
It could be made to work in at least two scenarios:

- from a user/application perspective - you accept that the result might not
be up-to-date and take what comes back. This would be acceptable in my case
because I know that the tables will not change.
OR
- the database could cache the result set. If some of the data is changed by
another query or session, then the database flushes the result set out of
the cache.

> I assume these two queries are linked? If you rely on the max size being
> unchanged and have more than one process using the database, you
> should make
> sure you lock the rows in question.

I can rely on the max size remaining the same. As I mentioned above, the
tables are entirely read only. The data will not be updated or deleted by
anyone - I don't need to worry about that. The data will be updated en masse
once every 3 months.

> There is a discussion of the postgresql.conf file and how to tune it at:
>   http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php

Thanks for that.

> Given the explain attached, 16 secs seems slow. Could you post an EXPLAIN
> ANALYSE of either/both queries to the performance list. I'd drop
> the sql list
> when we're just talking about performance.

To be honest, my main concern was about the cache. If the second one could
use a cache and execute in 2 seconds, that would be better than reducing the
execution of each individual query by 30% or so.

Thanks for the offer of help on this one. EXPLAIN ANALYZE gives me the same
plan as in my last message - did you want VERBOSE?

 Nested Loop  (cost=0.00..147.11 rows=1 width=148) (actual time=84.00..12323.00 rows=67 loops=1)
   Join Filter: ("inner"."GEOM_ID_OF_POINT" = "outer"."POINT_ID")
   ->  Index Scan using gidx_oscar_point on oscar_point p  (cost=0.00..61.34 rows=1 width=57) (actual time=0.00..9.00 rows=67 loops=1)
         Index Cond: (wkb_geometry && 'SRID=-1;BOX3D(529540 179658.88 0,530540 180307.12 0)'::geometry)
         Filter: ((("TILE_REF" = 'TQ27NE'::bpchar) OR ("TILE_REF" = 'TQ28SE'::bpchar) OR ("TILE_REF" = 'TQ37NW'::bpchar) OR ("TILE_REF" = 'TQ38SW'::bpchar)) AND ("FEAT_CODE" = 3500))
   ->  Index Scan using idx_on_tile_ref on oscar_node n  (cost=0.00..85.74 rows=2 width=91) (actual time=0.06..150.07 rows=4797 loops=67)
         Index Cond: (n."TILE_REF" = "outer"."TILE_REF")
         Filter: (("TILE_REF" = 'TQ27NE'::bpchar) OR ("TILE_REF" = 'TQ28SE'::bpchar) OR ("TILE_REF" = 'TQ37NW'::bpchar) OR ("TILE_REF" = 'TQ38SW'::bpchar))
 Total runtime: 12325.00 msec
(9 rows)

Thanks


Chris





Re: [SQL] sql performance and cache

2003-10-11 Thread Richard Huxton
On Saturday 11 October 2003 12:12, Chris Faulkner wrote:
> Hello
>
> Thanks for the reply.
>
> > The short answer is that PG doesn't cache query results. The only
> > way it could
> > do so safely is to lock all tables you access to make sure that no other
> > process changes them. That would effectively turn PG into a
> > single-user DB in
> > short order.
>
> I am not sure I agree with you. I have done similar things with Oracle and
> found that the second query will execute much more quickly than the first.
> It could be made to work in at least two scenarios

I'm guessing that's because the underlying rows, and perhaps the plan, are cached 
rather than the results. If you cached the results of the first query you'd 
only have the max length, not your other data anyway.

[snip]

> > I assume these two queries are linked? If you rely on the max size being
> > unchanged and have more than one process using the database, you
> > should make
> > sure you lock the rows in question.
>
> I can rely on the max size remaining the same. As I mentioned above, the
> tables are entirely read only. The data will not be updated or deleted by
> anyone - I don't need to worry about that. The data will be updated en
> masse once every 3 months.

Hmm - it might be worth adding a column for your array length and pre-calculating 
it, if your data is basically static.
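
Something like this, say (a sketch only - I'm reusing your "ugly" expression to
fill it in, and guessing that "level" lives in oscar_node; substitute the real
table):

ALTER TABLE oscar_node ADD COLUMN level_len int;

UPDATE oscar_node
SET level_len = replace(split_part(array_dims(level), ':', 2), ']', '')::int;

Then the first query just becomes max(level_len) over the same FROM/WHERE.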

> > There is a discussion of the postgresql.conf file and how to tune it at:
> >   http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php
>
> Thanks for that.
>
> > Given the explain attached, 16 secs seems slow. Could you post an EXPLAIN
> > ANALYSE of either/both queries to the performance list. I'd drop
> > the sql list
> > when we're just talking about performance.
>
> To be honest, my main concern was about the cache. If the second one could
> use a cache and execute in 2 seconds, that would be better than reducing
> the execution of each individual query by 30% or so.

I'm puzzled as to why they aren't both below 2 seconds to start with - you're 
not dealing with that many rows.

> Thanks for the offer of help on this one. explain analyze gives me the same
> as the last message - did you want verbose ?

Nope, this is what I need. Verbose prints pages of stuff that only the 
developers would be interested in. This one actually runs the query and gives 
you a second set of figures showing times.

>  Nested Loop  (cost=0.00..147.11 rows=1 width=148) (actual time=84.00..12323.00 rows=67 loops=1)
>    Join Filter: ("inner"."GEOM_ID_OF_POINT" = "outer"."POINT_ID")
>    ->  Index Scan using gidx_oscar_point on oscar_point p  (cost=0.00..61.34 rows=1 width=57) (actual time=0.00..9.00 rows=67 loops=1)
>          Index Cond: (wkb_geometry && 'SRID=-1;BOX3D(529540 179658.88 0,530540 180307.12 0)'::geometry)
>          Filter: ((("TILE_REF" = 'TQ27NE'::bpchar) OR ("TILE_REF" = 'TQ28SE'::bpchar) OR ("TILE_REF" = 'TQ37NW'::bpchar) OR ("TILE_REF" = 'TQ38SW'::bpchar)) AND ("FEAT_CODE" = 3500))

This next bit is the issue. It's joining on TILE_REF and then filtering by 
your four static values. That's taking 67 * 150ms = 10.05 secs.

>    ->  Index Scan using idx_on_tile_ref on oscar_node n  (cost=0.00..85.74 rows=2 width=91) (actual time=0.06..150.07 rows=4797 loops=67)
>          Index Cond: (n."TILE_REF" = "outer"."TILE_REF")
>          Filter: (("TILE_REF" = 'TQ27NE'::bpchar) OR ("TILE_REF" = 'TQ28SE'::bpchar) OR ("TILE_REF" = 'TQ37NW'::bpchar) OR ("TILE_REF" = 'TQ38SW'::bpchar))

Now if you look at the first set of figures, it's estimating 2 rows rather 
than the 4797 you're actually getting. That's probably why it's chosen to 
join then filter rather than the other way around.

I'd suggest the following:
1. VACUUM FULL on the table in question if you haven't done so since the last 
update/reload. If you aren't doing this after every bulk upload, you probably 
should be.
2. VACUUM ANALYSE/ANALYSE the table (there's a short sketch of steps 1 and 2 after this list).
3. Check the tuning document I mentioned and make sure your settings are at 
least reasonable. They don't have to be perfect - that last 10% takes 
forever, but if they are badly wrong it can cripple you.
4. PG should now have up-to-date stats and a reasonable set of config 
settings. If it's still getting its row estimates wrong, we'll have to look 
at the statistics it's got.
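
In concrete terms, steps 1 and 2 are just the following, run against whichever
tables are involved (I'll use the two from your query):

VACUUM FULL oscar_node;
VACUUM FULL oscar_point;
VACUUM ANALYSE oscar_node;
VACUUM ANALYSE oscar_point;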

If we reach the statistics tinkering stage, it might be better to wait til 
Monday if you can - more people on the list then.
-- 
  Richard Huxton
  Archonet Ltd



[SQL] Running tally

2003-10-11 Thread Christopher Kings-Lynne
Hi guys,

If I have a table that is just a single column full of numbers, how can 
I select all the rows from the table with a second column that is the 
running tally so far down the result set?

eg:

Num Tally so far
0.3 0.3
1.2 1.5
2.0 3.5
...
Does this require PL/PgSQL coding?  If so, how do you actually construct 
an arbitrary row for returning?  The docs are somewhat unclear on this.

Chris





Re: [SQL] Running tally

2003-10-11 Thread Richard Huxton
On Saturday 04 October 2003 10:56, Christopher Kings-Lynne wrote:
> Hi guys,
>
> If I have a table that is just a single column full of numbers, how can
> I select all the rows from the table with a second column that is the
> running tally so far down the result set?
>
> eg:
>
> Num Tally so far
> 0.3   0.3
> 1.2   1.5
> 2.0   3.5
> ...
>
> Does this require PL/PgSQL coding?  If so, how do you actually construct
> an arbitrary row for returning?  The docs are somewhat unclear on this.

Did you post this days ago, or is your/my clock wrong - it reads 2003-10-04 in 
my mailer?

Anyway, you could do it with a sub-query, but performance will be less than 
great with a large result-set.

SELECT * FROM runtot;
 idx | num
-----+-----
   1 |  10
   2 |  20
   3 |  30

CREATE FUNCTION runtot_sum(int4) RETURNS int4 AS
  'SELECT sum(num)::int4 FROM runtot WHERE idx <= $1' LANGUAGE 'SQL';

SELECT idx, num, runtot_sum(idx) FROM runtot ORDER BY idx;
 idx | num | runtot_sum
-----+-----+------------
   1 |  10 |         10
   2 |  20 |         30
   3 |  30 |         60
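
For what it's worth, the same thing can be written inline as a correlated
sub-query, without the helper function (with the same less-than-great
performance on a big result set):

SELECT r.idx, r.num,
       (SELECT sum(r2.num)::int4 FROM runtot r2 WHERE r2.idx <= r.idx) AS run_tot
FROM runtot r
ORDER BY r.idx;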


Procedural is the way to go if you have a large result set. Stephan Szabo has 
written some good notes on set-returning functions:
http://techdocs.postgresql.org/guides/SetReturningFunctions

Basically, define a type:

CREATE TYPE run_tot_type AS (
  a whatever,
  b whatever,
  run_tot whatever
);

Then:

CREATE FUNCTION my_run_tot() RETURNS SETOF run_tot_type AS...

Accumulate your values in a record-type variable and use RETURN NEXT to issue 
each row.
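
A bare-bones sketch of the whole thing for your single-column case (I've
invented a table tally(num numeric) and ordered by num purely for illustration -
use whatever ordering defines your result set):

CREATE TYPE run_tot_type AS (
  num     numeric,
  run_tot numeric
);

CREATE OR REPLACE FUNCTION my_run_tot() RETURNS SETOF run_tot_type AS '
DECLARE
  r     run_tot_type;
  src   RECORD;
  total numeric := 0;
BEGIN
  FOR src IN SELECT num FROM tally ORDER BY num LOOP
    total := total + src.num;   -- accumulate the running tally
    r.num := src.num;
    r.run_tot := total;
    RETURN NEXT r;              -- emit one output row
  END LOOP;
  RETURN;
END;
' LANGUAGE 'plpgsql';

SELECT * FROM my_run_tot();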

HTH
-- 
  Richard Huxton
  Archonet Ltd



Re: [SQL] Running tally

2003-10-11 Thread elein
You can use plpythonu (or Tcl or C or R) to do running
sums. For plpythonu, you must initialize the SD dictionary
by calling the function first with the proper argument.

create or replace function runsum(int,int)
returns int as
'
  if args[0] == 1:
    SD["currval"] = 0
    return SD["currval"]
  else:
    try:
      SD["currval"] += args[1]
    except:
      SD["currval"] = args[1]
  return SD["currval"]
' language 'plpython';

select runsum(1,0);
select num, runsum(0,num) from tallytable;

Variations on this technique are discussed in General Bits
(http://www.varlena.com/GeneralBits), under the Tidbits area listing talks
from OSCON 2003.

[EMAIL PROTECTED]


On Sat, Oct 04, 2003 at 05:56:38PM +0800, Christopher Kings-Lynne wrote:
> Hi guys,
> 
> If I have a table that is just a single column full of numbers, how can 
> I select all the rows from the table with a second column that is the 
> running tally so far down the result set?
> 
> eg:
> 
> Num Tally so far
> 0.3   0.3
> 1.2   1.5
> 2.0   3.5
> ...
> 
> Does this require PL/PgSQL coding?  If so, how do you actually construct 
> an arbitrary row for returning?  The docs are somewhat unclear on this.
> 
> Chris
> 
> 
> 


Re: [SQL] PL/PGSQL TUTORIAL

2003-10-11 Thread Roberto Mello
On Sat, Oct 11, 2003 at 10:36:10AM +0100, Richard Huxton wrote:
> 
> Oh - while I'm thinking of it, a couple of applications it might be worth 
> looking at are OpenACS or Bricolage, both web-based content-management 
> systems.

OpenACS is not itself a content management system, but it provides a couple of
such systems. OpenACS is a toolkit for building community-based web sites, handling
all the "low-level" stuff (sessions, authentication, content storage,
etc.) and providing many applications that use the toolkit, from content
management systems, to e-commerce, news, FAQs, and blogs, to an entire online
education vertical application: dotLRN (www.dotlrn.org).

-Roberto

-- 
+|Roberto Mello   -http://www.brasileiro.net/  |--+
+   Computer Science Graduate Student, Utah State University  +
+   USU Free Software & GNU/Linux Club - http://fslc.usu.edu/ +
