[GENERAL] oracle listener intercept

2006-09-13 Thread David Link
Hi, I'm looking at replacing an Oracle database that only has a few tables but that is accessed by many programs in the company via Oracle JDBC drivers, Oracle ODBC drivers, and the Python database interface. Is there a way to intercept the calls to the Oracle listener and redirect them to Postgres without changing the calling applications?
Thanks, David


Re: [GENERAL] ident authentication with named localhost

2006-05-12 Thread David Link

Tom Lane wrote:
> David Link <[EMAIL PROTECTED]> writes:
>> Does anyone know how I can test ident?
>
> I'd try sniffing the IP traffic to and from it with a packet sniffer
> and/or tracing the daemon's system calls with strace.  Manually invoking
> the daemon isn't going to prove a lot, you want to watch its reaction
> to Postgres.

Thanks for your suggestion.  I'm new to the concept of packet sniffing 
and tracing.  Can you suggest where I should go or what I should read to 
better understand this?

> I believe some flavors of identd have debug tracing options, too
> ... check the man page ...

Too bad no one else has reported this and already found an answer.
Maybe I should move to md5 authentication; however, I wanted to avoid
having to type passwords.
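
(In case it helps anyone else: a minimal sketch of the sniffing/tracing Tom
describes -- the interface name and the assumption that identd is spawned by
xinetd are specific to my setup:)

  # watch the ident (auth, tcp/113) traffic while a psql connection is attempted
  tcpdump -i eth0 -n port 113

  # trace the ident daemon's system calls; under inetd/xinetd it is started per
  # connection, so trace the parent with -f to catch the children
  strace -f -p $(pidof xinetd)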


Thanks,





[GENERAL] ident authentication with named localhost

2006-05-11 Thread David Link

Hi,

I am having trouble with ident authentication.  Everything is working
fine except when specifying a host for connections from the local machine.


pg_hba.conf:

local   all     all                         ident wp
host    all     all     10.97.8.0/24        ident wp

pg_ident.conf:

wp      dlink       dlink
wp      dlink       firstalert
wp      dlink       postgres
wp      dlink       video
wp      postgres    postgres
wp      wwwrun      firstalert
wp      wwwrun      video

If the db is on mach1 and the Unix user is dlink, the following work:

[EMAIL PROTECTED] psql -d mydb -U postgres
[EMAIL PROTECTED] psql -d mydb -U postgres -h mach1 # from remote machine

While the following do not (nor do the same connections via Perl DBI):

[EMAIL PROTECTED] psql -d mydb -U postgres -h mach1
[EMAIL PROTECTED] psql -d mydb -U postgres -h localhost
[EMAIL PROTECTED] psql -d mydb -U postgres -h 10.97.8.244
[EMAIL PROTECTED] psql -d mydb -U postgres -h 127.0.0.1

If I add the following to pg_hba.conf it works of course:

host    all     all     10.97.8.244/32      trust

But this does not:

host    all     all     10.97.8.244/32      ident wp

If I try as the postgres Unix user then it works:

[EMAIL PROTECTED] psql -d mydb -U postgres -h mach1

We are using:
SUSE 9 / Linux 2.6.5-7
Postgresql 8.1
And LDAP.

The problem might be due to how identd works on localhost with LDAP.
The postgres user is found in /etc/passwd, while the dlink user is not
(it comes from LDAP).

Incidentally, get this: on a second machine (with the same software), what's
described here as not working works intermittently.  Now it worked, now it
didn't, for the dlink user.  Weird.


Does anyone know how I can test ident? I can telnet 10.97.8.244 113.  
The server port I know is 5432, but what's the client port to give?
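
(In case it helps anyone later: the ident query is "<client port>, <server port>",
where the client port is the ephemeral port of an open psql connection, visible
with netstat.  If I read RFC 1413 right, identd only answers about connections to
the host that is asking, so run this from the database server.  The port numbers
below are made up:)

  # with a psql -h session open, find its local (client-side) port:
  netstat -tn | grep 5432
  #  tcp  0  0 10.97.8.244:34017  10.97.8.244:5432  ESTABLISHED

  # then ask identd on the client host about that port pair:
  telnet 10.97.8.244 113
  34017, 5432
  #  34017, 5432 : USERID : UNIX : dlink     <- the answer Postgres expects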


Any and all help greatly appreciated. 
Thanks.

David Link







Re: [GENERAL] EXPLAIN SELECT .. does not return

2006-05-09 Thread David Link

Tom Lane wrote:
> David Link <[EMAIL PROTECTED]> writes:
>> The following large EXPLAIN SELECT Statement fails to return, but
>> continues to take up processing time until it is killed.
>>
>> [ 52-way join... ]
>
> Am I right in guessing that all the sales_xxx tables are the same size
> and have similar statistics?  I think the problem is that the planner is
> faced with a large set of essentially equivalent plans and isn't pruning
> the list aggressively enough.  That's something we fixed in 8.0.

Correct.

>> Postgresql 7.4.8
>
> You really oughta try something newer.  On my machine, 7.4.12 plans a
> 52-way join in about a minute, and 8.0 and 8.1 in under a second.

We just completed our upgrade to 8.1.3.  And we are happy campers!  Our 
Explain plan problem has gone away and everything runs faster.  I 
especially notice improved caching of repeated queries.  Hats off to you 
postgres folks.  Thank you very much.  Postgres rocks!



> I wonder also if there's not a better way to design the query...
> maybe a UNION ALL would work better than nested joins.

We need the info in separate columns.  I don't think we can do it with 
UNION.  That's why the many joins.


I understand, though, that with the new tablespace and inheritance features in
8.0/8.1 I could put all those sales tables back into one table and keep the
data in separate files.
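
(A rough sketch of what I mean -- the column types and the tablespace name are
invented, and as I understand it 8.1's constraint_exclusion setting has to be
turned on for the planner to skip children whose CHECK constraint rules them
out:)

  CREATE TABLE sale (
      tid     integer,
      channel integer,
      week    integer,
      units   integer
  );

  CREATE TABLE sale_200401 ( CHECK (week = 200401) )
      INHERITS (sale) TABLESPACE sales_2004;
  CREATE TABLE sale_200402 ( CHECK (week = 200402) )
      INHERITS (sale) TABLESPACE sales_2004;
  -- ... one child table per week ...

  -- queries then go against the parent:
  SELECT tid, week, units FROM sale WHERE week = 200402 AND channel = 100;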

>   regards, tom lane

Thanks again for all your help.

David Link
Nielsen Entertainment, White Plains, NY





[GENERAL] EXPLAIN SELECT .. does not return

2006-03-29 Thread David Link

Hi,

The following large EXPLAIN SELECT Statement fails to return, but 
continues to take up processing time until it is killed.


I've straightened up the postgresql.conf, as per Tom Lane's suggestions 
since last time I wrote about this.

See: http://archives.postgresql.org/pgsql-general/2005-12/msg00259.php

Any and all help, greatly appreciated.  Thank you.
David Link, White Plains, NY


Environment:

Linux 2.6.5-7.191 SMP
Postgresql 7.4.8
Database size: 110 Gb on disk.
vacuum analyze done

Processes:

postgres 15687     1  0 16:12 pts/1    00:00:00 /usr/bin/postmaster -D /db/pgsql
postgres 15693 15687  0 16:12 pts/1    00:00:00 postgres: stats buffer process
postgres 15695 15693  0 16:12 pts/1    00:00:00 postgres: stats collector process
postgres 17485 15687 99 16:17 pts/1    00:18:17 postgres: dlink usbkup [local] EXPLAIN


Configuration:

tcpip_socket = true
max_connections = 200
shared_buffers = 8000
sort_mem = 262144 
vacuum_mem = 65536
max_fsm_pages = 20
fsync = false
wal_sync_method = fsync
checkpoint_segments = 30
effective_cache_size = 131072
random_page_cost = 4
geqo = true

geqo_threshold = 13
default_statistics_target = 100
from_collapse_limit = 10
join_collapse_limit = 10
log_timestamp = true

stats_start_collector = true
stats_command_string = true
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'



The SQL:

explain
select
   t.tid,
   t.title,
   to_char (t.retail_reldate, 'mm-dd-yy') as retail_reldate,
   coalesce (s200401.units, 0) as s200401u,
   coalesce (s200402.units, 0) as s200402u,
   coalesce (s200403.units, 0) as s200403u,
   coalesce (s200404.units, 0) as s200404u,
   coalesce (s200405.units, 0) as s200405u,
   coalesce (s200406.units, 0) as s200406u,
   coalesce (s200407.units, 0) as s200407u,
   coalesce (s200408.units, 0) as s200408u,
   coalesce (s200409.units, 0) as s200409u,
   coalesce (s200410.units, 0) as s200410u,
   coalesce (s200411.units, 0) as s200411u,
   coalesce (s200412.units, 0) as s200412u,
   coalesce (s200413.units, 0) as s200413u,
   coalesce (s200414.units, 0) as s200414u,
   coalesce (s200415.units, 0) as s200415u,
   coalesce (s200416.units, 0) as s200416u,
   coalesce (s200417.units, 0) as s200417u,
   coalesce (s200418.units, 0) as s200418u,
   coalesce (s200419.units, 0) as s200419u,
   coalesce (s200420.units, 0) as s200420u,
   coalesce (s200421.units, 0) as s200421u,
   coalesce (s200422.units, 0) as s200422u,
   coalesce (s200423.units, 0) as s200423u,
   coalesce (s200424.units, 0) as s200424u,
   coalesce (s200425.units, 0) as s200425u,
   coalesce (s200426.units, 0) as s200426u,
   coalesce (s200427.units, 0) as s200427u,
   coalesce (s200428.units, 0) as s200428u,
   coalesce (s200429.units, 0) as s200429u,
   coalesce (s200430.units, 0) as s200430u,
   coalesce (s200431.units, 0) as s200431u,
   coalesce (s200432.units, 0) as s200432u,
   coalesce (s200433.units, 0) as s200433u,
   coalesce (s200434.units, 0) as s200434u,
   coalesce (s200435.units, 0) as s200435u,
   coalesce (s200436.units, 0) as s200436u,
   coalesce (s200437.units, 0) as s200437u,
   coalesce (s200438.units, 0) as s200438u,
   coalesce (s200439.units, 0) as s200439u,
   coalesce (s200440.units, 0) as s200440u,
   coalesce (s200441.units, 0) as s200441u,
   coalesce (s200442.units, 0) as s200442u,
   coalesce (s200443.units, 0) as s200443u,
   coalesce (s200444.units, 0) as s200444u,
   coalesce (s200445.units, 0) as s200445u,
   coalesce (s200446.units, 0) as s200446u,
   coalesce (s200447.units, 0) as s200447u,
   coalesce (s200448.units, 0) as s200448u,
   coalesce (s200449.units, 0) as s200449u,
   coalesce (s200450.units, 0) as s200450u,
   coalesce (s200451.units, 0) as s200451u,
   coalesce (s200452.units, 0) as s200452u
from
   title t
   left outer join sale_200401 s200401 on t.tid=s200401.tid and 
s200401.channel=100
   left outer join sale_200402 s200402 on t.tid=s200402.tid and 
s200402.channel=100
   left outer join sale_200403 s200403 on t.tid=s200403.tid and 
s200403.channel=100
   left outer join sale_200404 s200404 on t.tid=s200404.tid and 
s200404.channel=100
   left outer join sale_200405 s200405 on t.tid=s200405.tid and 
s200405.channel=100
   left outer join sale_200406 s200406 on t.tid=s200406.tid and 
s200406.channel=100
   left outer join sale_200407 s200407 on t.tid=s200407.tid and 
s200407.channel=100
   left outer join sale_200408 s200408 on t.tid=s200408.tid and 
s200408.channel=100
   left outer join sale_200409 s200409 on t.tid=s200409.tid and 
s200409.channel=100
   left outer join sale_200410 s200410 on t.tid=s200410.tid and 
s200410.channel=100
   left outer join sale_200411 s200411 on t.tid=s200411.tid and 
s200411.channel=100
   left oute

Re: [GENERAL] EXPLAIN SELECT .. does not return

2005-12-05 Thread David Link

Tom Lane wrote:
> David Link <[EMAIL PROTECTED]> writes:
>> Certain SQL Queries, I believe those with many table joins, when run as
>> EXPLAIN plans, never return.
>
> I'd guess that one or all of these settings are excessive:
>
>> geqo_threshold = 14
>> from_collapse_limit = 13
>> join_collapse_limit = 13
>
> Keep in mind that the planning cost is exponential in these limits,
> eg geqo_threshold = 14 probably allows planning times about 14 times
> greater than geqo_threshold = 13.
>
> While I'm looking:
>
>> shared_buffers = 2000
>
> That seems extremely low for modern machines.
>
>> sort_mem = 1048576
>
> That, on the other hand, is almost certainly way too high for a system-wide
> setting.  You're promising you have 1Gb available for *each* sort.
>
>> max_fsm_pages = 10
>
> And this way too low for a 100Gb database, unless most of the tables
> never see any UPDATEs or DELETEs.
>
>> wal_buffers = 800
>
> Seems a bit high, especially considering you have fsync disabled and
> thus there is no benefit whatever to buffering WAL.
>
>> commit_delay = 100
>> commit_siblings = 50
>
> Have you measured any benefit to having this turned on?
>
> All in all it looks like your configuration settings were chosen by
> throwing darts :-(
>
>   regards, tom lane

Thanks for your reply, Tom.  Different folks have made different
suggestions.  Can you suggest more reasonable values for these?  But more
importantly, do you think the problem I am having is due to these
configuration shortcomings?
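
(For later readers: no specific numbers came back in this thread, but the
general direction of Tom's comments would be something like the sketch below.
These values are purely illustrative -- they assume a 7.4 server with a few GB
of RAM and a mostly read-only 100 GB database; the right numbers depend on the
hardware and workload:)

  shared_buffers = 20000          # 8 kB pages, ~160 MB; was 2000
  sort_mem = 16384                # kB, per sort and per backend; was 1 GB
  max_fsm_pages = 2000000         # enough slots to track dead space in ~100 GB
  wal_buffers = 8                 # back to the default; fsync is off anyway
  effective_cache_size = 262144   # 8 kB pages, ~2 GB, roughly what the OS caches
  geqo_threshold = 12
  from_collapse_limit = 8
  join_collapse_limit = 8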


Thanks much.
David






Re: [GENERAL] EXPLAIN SELECT .. does not return

2005-12-05 Thread David Link

Jaime Casanova wrote:
> when you have those cases, you can take a look in pg_stat_activity
> to find the offending query...
>
> or simply log all the queries

Thanks for the advice.  I also turned on stats_command_string.
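
(For the record, the query I run against it -- column names as of the 7.4/8.x
pg_stat_activity view:)

  -- stats_command_string must be on for current_query to be populated
  SELECT procpid, usename, current_query
  FROM pg_stat_activity
  WHERE current_query NOT LIKE '<IDLE>%';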




[GENERAL] EXPLAIN SELECT .. does not return

2005-12-05 Thread David Link
Hi, This has become a major problem for us.  Thank you in advance for 
your help.


OS: SUSE Linux 2.6.5-7.191-bigsmp
PostgreSQL: 7.4.8
Application: ModPerl Web application using DBI.pm
Database size: 100 Gb, 1025 Tables.

Problem: EXPLAIN SELECT ... does not return.

Description:

The application uses the EXPLAIN cost estimate to determine whether a client's
dynamic request for data is too demanding for the server, so it can gracefully
deny it.  (Currently, anything over cost=0.00..50.00 is denied.)
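
(For context, roughly how that check works -- the query, plan line, and numbers
here are only illustrative:)

  EXPLAIN SELECT t.tid, t.title FROM title t WHERE t.distributor != 'CONTROL LABEL';
  --  Seq Scan on title t  (cost=0.00..2344.75 rows=68775 width=38)
  --
  -- The application parses the upper bound of the cost interval (2344.75 here)
  -- and refuses to run anything whose estimate exceeds the 50.00 ceiling.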


The system gets about 3000 page requests a day.

Certain SQL queries, I believe those with many table joins, never return when
run as EXPLAIN plans.  As a result they seem to stay churning in the system.
Once that happens, other queries build up and the performance of the whole
database server grinds to a halt.  Postgresql never dies, but eventually user
requests start timing out.

This happens on average two or three times a week.  I kill an offending
process and all's well again.  I have not been able to identify an offending
SQL statement with certainty.

Config params, that have changed from default:

tcpip_socket = true
max_connections = 200
shared_buffers = 2000
sort_mem = 1048576
vacuum_mem = 65536
max_fsm_pages = 10
max_fsm_relations = 1000
max_files_per_process = 1000
fsync = false
wal_sync_method = fsync
wal_buffers = 800  
checkpoint_segments = 30   
commit_delay = 100

commit_siblings = 50
effective_cache_size = 1000
random_page_cost = 4 
geqo = true

geqo_threshold = 14
default_statistics_target = 100
from_collapse_limit = 13
join_collapse_limit = 13   

Note: we load lumps of data each week.  Beyond that it is primarily a
read-only database.





[GENERAL] Question about backing up partial Database

2005-06-01 Thread David Link

Hi,

Can pg_dump be used to back up a selected set of tables (rather than all of
them or just one)?


We have a case where we need to backup tables of the form sale_2001xx, 
or sale_2002xx, etc.


Our current solution is the following:

  pg_dump -d dbname -t sale_200101 >  sales_2001_dump
  pg_dump -d dbname -t sale_200102 >>  sales_2001_dump
  pg_dump -d dbname -t sale_200103 >>  sales_2001_dump
  ..
  pg_dump -d dbname -t sale_200152 >>  sales_2001_dump
  gzip sales_2001_dump

However this requires more disk space and more db connection overhead than:

  pg_dump -d dbname | gzip > sales_2001_dump.gz

Currently all the tables are in the public namespace.  We've had trouble in
the past getting our Perl programs to work with multiple schema names.
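
(One middle ground, assuming the weekly table names can be generated in the
shell: pipe all the per-table dumps through a single gzip.  That avoids the
intermediate file, though it still opens one connection per table.  For what
it's worth, newer pg_dump releases also accept several -t switches in one run.)

  ( for wk in $(seq -w 1 52); do
      pg_dump -t "sale_2001$wk" dbname
    done ) | gzip > sales_2001_dump.gz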


Thank you for your help and any suggestions.
Sincerely,
David Link
White Plains, NY








Re: [GENERAL] diskspace usage recovered on db rebuild

2005-04-04 Thread David Link
Thank you Scott and Lonni for your replies ...

On Fri, 2005-04-01 at 11:21, David Link wrote:
> I have a question regarding filesystem disk space usage.
> We have a production database containing 5 years of sales data.
> Linux 2.6.5; Postgresql 7.4.7.  We VACUUM ANALYZE the entire database
> every night (about 40 min).
> Its size, SUM(pg_class.relpages) * 8192 bytes, is ...
>   About 66 Gigabytes on disk.
> When I rebuild the database (using pg_dump and psql), the new
> resultant database is ...
>   About 48 Gigabytes on disk.
> A 27% space savings.
> Can someone tell me why that is?

Scott Marlowe wrote:
> There's a certain amount of left over space that's ok.  PostgreSQL,
> under normal operation, with regular, non-full vacuums, grows until it
> reaches a "steady state" size where some percentage of each file is
> freed tuple space that can be reused by later inserts / updates.
> This is a "good thing" as long as it doesn't go too far.
>
> Since inserts and updates can be placed in already allocated space, they
> should be added faster than if each one had to allocate more space in a
> file and then tack the tuple on the end each time.
>
> However, if there's too much free space in the table files, then the
> database will be less efficient, because each sequential read of the
> tables has to read a lot of "blank" space.
>
> It's all about balance.
>
> You might want to look at running the pg_autovacuum daemon and letting
> it decide when a vacuum is necessary, or scheduling regular vacuums to
> run more often than every night.  You might also want to look at adding
> a vacuum, or possibly a vacuum full, after updating large tables, to
> individually clean up afterwards.
>
> Next time, try a vacuum full first to see how much space it can
> reclaim.  And lastly, use vacuum verbose to get an idea whether your fsm
> settings are high enough.

Lonni J Friedman wrote:
> Are you doing a vacuum full each night?  What is the specific
> command(s) that you are using for vacuum, pg_dump and the import?

The answer to Lonni's question:

$ vacuumdb --analyze $database 2>&1 | tee -a $log
$ time (pg_dump $database | gzip > $backup_dir/$dump_file) 2>&1 | tee -a $log
$ gunzip -c $dump_file | psql -h $host -d $database >/dev/null
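
(Following up on Scott's fsm point, for the archives: the tail end of a
database-wide VACUUM VERBOSE reports how many free-space-map pages are actually
needed.  The wording and numbers below are approximate, from memory:)

  VACUUM VERBOSE;
  -- ...
  -- INFO:  free space map: 812 relations, 156742 pages stored; 240016 total pages needed
  -- DETAIL:  Allocated FSM size: 1000 relations + 100000 pages = 625 kB shared memory.
  --
  -- If "total pages needed" exceeds max_fsm_pages, dead space is going untracked
  -- and the tables will keep growing between vacuums.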



[GENERAL] diskspace usage recovered on db rebuild

2005-04-01 Thread David Link
Greetings, worthy men.
I have a question regarding filesystem disk space usage.
We have a production database containing 5 years of sales data.
Linux 2.6.5; Postgresql 7.4.7.  We VACUUM ANALYZE the entire database
every night (about 40 min).
Its size, SUM(pg_class.relpages) * 8192 bytes, is ...

 About 66 Gigabytes on disk.
When I rebuild the database (using pg_dump and psql), the new
resultant database is ...

 About 48 Gigabytes on disk.
A 27% space savings.
Can someone tell me why that is?
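
(The exact query behind those size figures, for reference:)

  SELECT sum(relpages) * 8192 AS bytes,
         sum(relpages) * 8192 / (1024.0 * 1024 * 1024) AS gigabytes
  FROM pg_class;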

Thank you,
David Link
White Plains, NY
Long live Postgres.



Re: [GENERAL] A conditional DROP TABLE function

2003-10-13 Thread David Link
Hi Tino,

--- Tino Wildenhain <[EMAIL PROTECTED]> wrote:
> David Link wrote:
> > Hi All,
> >
> > Here's a Conditional drop_table func for those interested.  There was a
> > thread on this a long time back.
> >
> > We do this all the time :
> >
> >   DELETE TABLE sales;
> >   CREATE TABLE sales (...);
> >
> Hm. "all the time" enables all the warning lights -
> what are you doing to have to delete and create
> the tables every time?

By 'all the time' I mean once a week.  (Incidentally, it is DROP and not
DELETE TABLE, of course.)

We are working with weekly loads of data.  Because of the volumes I'm
using a separate sales table per week (i.e. sale_200301, sale_200302,
etc.), because when it was all in one big happy table (i.e. sale) it was
slower -- especially recreating indexes.

Now you can see, to make the weekly data load process rerunnable (and
we do rerun it often), I drop and [re]create this week's collection of
sales-related tables.  The logs contain statistics, warnings and
errors, which we share with the non-geeks who use the data.
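
(Roughly what one week's rerunnable load looks like -- the column list and the
file path here are simplified/invented:)

  select drop_table('sale_200340');
  CREATE TABLE sale_200340 (
      upc     varchar(14),
      week    integer,
      channel integer,
      region  integer,
      units   integer
  );
  COPY sale_200340 FROM '/data/loads/sale_200340.dat';
  CREATE INDEX sale_200340_upc_wk_chl_reg_ind
      ON sale_200340 (upc, week, channel, region);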

Also, standard procedure (I believe) for maintaining a data model is
creating and using database creation scripts (with DROP and CREATE) --
so every time you set up a brand new system, you can get those (I
believe) unnecessary messages: ERROR: table does not exist.

-Thanks.

> 
> > But nobody likes
> > 
> >   ERROR:  table "sales" does not exist
> > 
> > which we see all the time in the logs.  I want to show the logs to
> > non-db folk -- so we can't have those error messages in it.
> 
> grep -v "ERROR" should do it.

Yes, but then you've got to wrap things in a log filter to generate
reports rather than just using the processing log as it comes out.

> 
> > 
> > (There must be some explanation why postgresql (and Oracle as well) do
> > not have CREATE OR REPLACE TABLE as it does for VIEWs and FUNCTIONs.
> > Anybody know?)
> 
> Nobody needs this?
> There is:
> 
> 1) delete from table;
> 2) truncate table;
> 
> to remove all the data
> 
> 3) alter table ...
> 
> to change tables layout.

These do not create the table should it not yet exist, and I needed
that -- so I wrote the simple stored procedure to make myself happy.  I
noticed others have asked for a solution to this problem as well.  (Are
you being a stickler?)

Also, in the name of consistency, CREATE OR REPLACE exists for stored
procedures (and views?).

PS: I LOVE Postgres.  It has made my life (as an Oracle DBA) charming
rather than hell!  Thanks.

> 
> HTH
> Tino Wildenhain
> 





[GENERAL] A conditional DROP TABLE function

2003-09-25 Thread David Link
Hi All,

Here's a Conditional drop_table func for those interested.  There was a
thread on this a long time back.

We do this all the time :

  DELETE TABLE sales;
  CREATE TABLE sales (...);

But nobody likes

  ERROR:  table "sales" does not exist

which we see all the time in the logs.  I want to show the logs to
non-db folk -- so we can't have those error messages in them.

(There must be some explanation why postgresql (and Oracle as well) do
not have CREATE OR REPLACE TABLE as it does for VIEWs and FUNCTIONs.
Anybody know?)

Anyway here's drop_table ():


CREATE or REPLACE function drop_table (varchar) returns varchar as '
DECLARE
    tablename   alias for $1;
    cnt         int4;
BEGIN
    SELECT into cnt count(*) from pg_class where relname = tablename::name;
    if cnt > 0 then
        execute \'DROP TABLE \' || tablename;
        return tablename || \' DROPPED\';
    end if;
    return tablename || \' does not exist\';
END;'
language 'plpgsql' ;


And here's its usage in an SQL script:

\set QUIET
\pset format unaligned
\pset tuples_only
\unset QUIET

select drop_table('sale');
CREATE TABLE sale ( ... );
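
(For readers of the archive: a later PostgreSQL release -- 8.2, if I recall
correctly -- added DROP TABLE IF EXISTS, which does the same job without a
helper function:)

DROP TABLE IF EXISTS sale;
CREATE TABLE sale ( ... );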

Regards, David





Re: [GENERAL] Why does adding SUM and GROUP BY destroy performance?

2003-09-18 Thread David Link
Thanks Ron, Thanks Christopher for your excellent feedback.

I guess it's back to the drawing board.  This is a very late-hour
business requirement change, and we need quick, real-time results.
Two things are being considered:

   1. Loading the non-aggregate query results entirely into memory (using
Perl CGI currently, but looking at the possibility of doing this in the
database with either PL/Perl or PL/pgSQL, though I don't know what would
be gained by doing it that way), and handling the summing and then the
sort ourselves in the program, -- or --

   2. Preprocessing a lot of the data at predetermined times, essentially
doubling the size of our database (sketched below).

I'd be open to any other suggestions.
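
(For concreteness, option 2 might look roughly like this -- table and index
names invented.  The page query then joins title to one narrow, pre-summed
table per sales column and can apply its LIMIT cheaply:)

CREATE TABLE sale_summary_200331 AS
SELECT u.tid,
       SUM(COALESCE(s.units, 0)) AS units
FROM upc u
LEFT OUTER JOIN sale_200331 s
    ON s.upc = u.upc
   AND s.week = 200331 AND s.channel = 100 AND s.region = 100
GROUP BY u.tid;

CREATE INDEX sale_summary_200331_tid_idx ON sale_summary_200331 (tid);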

Thanks again. very much.
-David






[GENERAL] Why does adding SUM and GROUP BY destroy performance?

2003-09-17 Thread David Link
Hi,

Why does adding SUM and GROUP BY destroy performance?
Details follow.
Thanks, David Link

s1.sql:
SELECT 
t.tid, t.title,
COALESCE(s0c100r100.units, 0) as w0c100r100units, 
(COALESCE(r1c2r100.units, 0) + COALESCE(y0c2r100.units, 0)) 
as r0c2r100units 
FROM 
title t 
JOIN upc u1 ON t.tid = u1.tid 
LEFT OUTER JOIN sale_200331 s0c100r100 ON u1.upc = s0c100r100.upc 
AND s0c100r100.week = 200331 AND s0c100r100.channel = 100 
AND s0c100r100.region = 100 
LEFT OUTER JOIN rtd r1c2r100 ON u1.upc = r1c2r100.upc 
AND r1c2r100.year = 2002 AND r1c2r100.channel = 2 
AND r1c2r100.region = 100 
LEFT OUTER JOIN ytd_200331 y0c2r100 ON u1.upc = y0c2r100.upc 
AND y0c2r100.week = 200331 AND y0c2r100.channel = 2 
AND y0c2r100.region = 100 
LEFT OUTER JOIN media m ON t.media = m.key 
LEFT OUTER JOIN screen_format sf ON t.screen_format = sf.key 
WHERE 
t.distributor != 'CONTROL LABEL' 
ORDER BY 
t.title ASC 
LIMIT 50
;


s2.sql:
SELECT
t.tid, t.title,
SUM(COALESCE(s0c100r100.units, 0)) as w0c100r100units, 
SUM((COALESCE(r1c2r100.units, 0) + COALESCE(y0c2r100.units, 0))) 
as r0c2r100units 
FROM 
title t 
JOIN upc u1 ON t.tid = u1.tid 
LEFT OUTER JOIN sale_200331 s0c100r100 ON u1.upc = s0c100r100.upc 
AND s0c100r100.week = 200331 AND s0c100r100.channel = 100 
AND s0c100r100.region = 100 
LEFT OUTER JOIN rtd r1c2r100 ON u1.upc = r1c2r100.upc 
AND r1c2r100.year = 2002 AND r1c2r100.channel = 2 
AND r1c2r100.region = 100 
LEFT OUTER JOIN ytd_200331 y0c2r100 ON u1.upc = y0c2r100.upc 
AND y0c2r100.week = 200331 AND y0c2r100.channel = 2 
AND y0c2r100.region = 100 
LEFT OUTER JOIN media m ON t.media = m.key 
LEFT OUTER JOIN screen_format sf ON t.screen_format = sf.key 
WHERE 
t.distributor != 'CONTROL LABEL' 
GROUP BY 
t.tid, t.title
ORDER BY 
t.title ASC 
LIMIT 50
;


Times:
 s1.sql takes 0m0.124s
 s2.sql takes 1m1.450s

Stats:
 title table: 68,000 rows
 sale_200331 table: 150,000 rows
 ytd_200331 table: 0 rows
 rtd table: 650,000 rows

Indexes are in place.

s1 explain plan:

                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=0.00..65105.51 rows=50 width=132)
   ->  Nested Loop  (cost=0.00..91726868.54 rows=70445 width=132)
         Join Filter: ("outer".screen_format = "inner"."key")
         ->  Nested Loop  (cost=0.00..91651668.74 rows=70445 width=127)
               Join Filter: ("outer".media = "inner"."key")
               ->  Nested Loop  (cost=0.00..91578053.95 rows=70445 width=122)
                     ->  Nested Loop  (cost=0.00..91236359.89 rows=70445 width=98)
                           ->  Nested Loop  (cost=0.00..90894665.82 rows=70445 width=74)
                                 ->  Nested Loop  (cost=0.00..90539626.76 rows=70445 width=50)
                                       ->  Index Scan using title_title_ind on title t  (cost=0.00..193051.67 rows=68775 width=38)
                                             Filter: (distributor <> 'CONTROL LABEL'::character varying)
                                       ->  Index Scan using davids_tid_index on upc u1  (cost=0.00..1309.24 rows=353 width=12)
                                             Index Cond: ("outer".tid = u1.tid)
                                 ->  Index Scan using sale_200331_upc_wk_chl_reg_ind on sale_200331 s0c100r100  (cost=0.00..5.02 rows=1 width=24)
                                       Index Cond: (("outer".upc = s0c100r100.upc) AND (s0c100r100.week = 200331) AND (s0c100r100.channel = 100) AND (s0c100r100.region = 100))
                           ->  Index Scan using rtd_upc_year_chl_reg_ind on rtd r1c2r100  (cost=0.00..4.83 rows=1 width=24)
                                 Index Cond: (("outer".upc = r1c2r100.upc) AND (r1c2r100."year" = 2002) AND (r1c2r100.channel = 2) AND (r1c2r100.region = 100))
                     ->  Index Scan using ytd_200331_upc_wkchlreg_ind on ytd_200331 y0c2r100  (cost=0.00..4.83 rows=1 width=24)
                           Index Cond: (("outer".upc = y0c2r100.upc) AND (y0c2r100.week = 200331) AND (y0c2r100.channel = 2) AND (y0c2r100.region = 100

[GENERAL] Understanding explain costs

2001-10-12 Thread David Link

Hi,
Trying to understand the planner's estimated costs ... one index scan
seems to be much more expensive than another.  Here are the facts:

  tiger=# \d bk_inv
             Table "bk_inv"
   Attribute |     Type     | Modifier
  -----------+--------------+----------
   store     | varchar(5)   |
   isbn      | varchar(10)  | not null
   qty       | numeric(5,0) |
   week      | numeric(6,0) |
  Indices: bk_inv_isbn_idx,
           bk_inv_store_idx

  tiger=# \d bk_inv_isbn_idx
   Index "bk_inv_isbn_idx"
   Attribute |    Type
  -----------+-------------
   isbn      | varchar(10)
  btree

  tiger=# \d bk_inv_store_idx
  Index "bk_inv_store_idx"
   Attribute |    Type
  -----------+------------
   store     | varchar(5)
  btree
 
There are about 50,000,000 rows and you can see a full scan on the table
is expensive:

  tiger=# explain select * from bk_inv;
  NOTICE:  QUERY PLAN:

  Seq Scan on bk_inv  (cost=0.00..999623.77 rows=46790877 width=48)


However using the isbn index costs are much less:

  tiger=# explain select * from bk_inv where isbn = 'foo';
  NOTICE:  QUERY PLAN:

  Index Scan using bk_inv_isbn_idx on bk_inv  (cost=0.00..53.13 
  rows=13 width=48)


However, using the store index, the cost is still rather high --
why is that?

  tiger=# explain select * from bk_inv where store = 'foo';
  NOTICE:  QUERY PLAN:

  Index Scan using bk_inv_store_idx on bk_inv  (cost=0.00..53456.09
  rows=13488 width=48)

Incidentally, the store index is slightly smaller than the isbn index ...

[postgres@roma tiger]$ ls -l bk_inv*
-rw-------    1 postgres postgres 1073741824 Oct 10 14:28 bk_inv
-rw-------    1 postgres postgres 1073741824 Oct 10 10:15 bk_inv.1
-rw-------    1 postgres postgres 1073741824 Oct 10 10:17 bk_inv.2
-rw-------    1 postgres postgres 1073741824 Oct 10 10:19 bk_inv.3
-rw-------    1 postgres postgres   60841984 Oct 11 15:51 bk_inv.4
-rw-------    1 postgres postgres 1073741824 Oct 10 13:37 bk_inv_isbn_idx
-rw-------    1 postgres postgres  566288384 Oct 10 14:31 bk_inv_isbn_idx.1
-rw-------    1 postgres postgres 1073741824 Oct 11 13:13 bk_inv_store_idx
-rw-------    1 postgres postgres   65921024 Oct 11 13:13 bk_inv_store_idx.1


Am I missing certain fundamentals about the planner/executor?

Thanks, David Link




[GENERAL] Performance problem with 50,000,000 rows

2001-10-10 Thread David Link

I'm new to PG but this just seems wrong.  Can someone take a look:

  .-----------.           .-----------.
  | bk_inv    |           | bk_title  |
  |-----------|           |-----------|
  | isbn      |<--------->| isbn      |
  | store     |           | vendor    |
  | qty       |           |           |
  | week      |           `-----------'
  |           |            2,000,000 recs
  `-----------'
   50,000,000 recs

  Actual record numbers:
bk_inv  : 46,790,877 
bk_title:  2,311,710 


VENDOR REPORT

  A list of inventory items for any one given vendor (e.g. 01672708)

select  i.isbn,
        t.vendor,
        i.store,
        i.qty
from    bk_inv i,
        bk_title t
where   i.isbn = t.isbn
and     t.vendor = '01672708' ;

This query should be instantaneous.  Granted, that's 50 million rows, but I
have created an index on the isbn column of both tables.  After about 25
minutes (on a 4-processor Dell 6300 with 1 GB memory) it spits out:

ERROR:  Write to hashjoin temp file failed

tiger=# explain select * from bk_inv i, bk_title t where i.isbn = t.isbn
and t.vendor = '5029';
NOTICE:  QUERY PLAN:

Merge Join  (cost=0.00..11229637.06 rows=2172466 width=72)
  ->  Index Scan using bk_title_isbn_idx on bk_title t  (cost=0.00..390788.08 rows=107331 width=24)
  ->  Index Scan using bk_inv_isbn_idx on bk_inv i  (cost=0.00..10252621.38 rows=46790877 width=48)

BIG COST!


These explain queries show the existence of the indexes and give small
costs:

tiger=# explain select * from bk_title where isbn = '5029';
NOTICE:  QUERY PLAN:

Index Scan using bk_title_isbn_idx on bk_title  (cost=0.00..4.90 rows=1
width=24)

tiger=# explain select * from bk_inv where isbn = '0897474228';
NOTICE:  QUERY PLAN:

Index Scan using bk_inv_isbn_idx on bk_inv  (cost=0.00..225.53 rows=55
width=48)


Note: the same tables and the same query return instantaneously with Oracle
8.1.  What I am hoping to show is that Postgres can do our job too.

Any help on this would be much obliged.  (Yes, I ran VACUUM ANALYZE.)

David Link
White Plains, NY
