Re: [PERFORM] UNSUBSCRIBE

2006-05-10 Thread Michael Glaesemann


On May 10, 2006, at 14:42 , Tom Lane wrote:


Chris [EMAIL PROTECTED] writes:


Maybe :) The php-general list has



To unsubscribe, visit: http://www.php.net/unsub.php



at the bottom of every email, and there are still random unsubscribe
requests.


That will *always* happen. It's just human nature and the number of
subscribers. However, a one-liner that either points to the webpage
for unsubscribing (probably easiest) or gives a brief description of
how to unsubscribe (To unsubscribe, send an email to
[EMAIL PROTECTED] with the body "unsub pgsql-performance" (without
the quotes)) may intercept a few more. Is there a way to configure
Majordomo to make it even easier to unsubscribe? Just sending to
pgsql-[EMAIL PROTECTED] or some such? I've seen other
mailing lists that do this. Requiring a specific command (what's the
command? does it go in the subject or the body?) is one more place a
person can make a mistake. (I've recently switched mail accounts and
unsubbed/subbed from the lists I'm on. This latter style does make it
a lot easier.)


(And are there mail readers out there that can pick up those subscribe/
unsubscribe headers from the list emails? Now *that'd* be sweet.)


Michael Glaesemann
grzm seespotcode net




---(end of broadcast)---
TIP 6: explain analyze is your friend


[PERFORM] in memory views

2006-05-10 Thread Thomas Vatter
Is there a possibility of creating views or temp tables in memory to
avoid disk I/O when a user runs select operations?


regards
tom


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] in memory views

2006-05-10 Thread Tino Wildenhain

Thomas Vatter schrieb:
is there a possibility for creating views or temp tables in memory to 
avoid disk io when user makes select operations?


No need. The data will be available in the OS and database caches if
it is really required often. If not, tune up the caches and
do a regular pre-select.

Regards
Tino

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] in memory views

2006-05-10 Thread Thomas Vatter

Tino Wildenhain wrote:


Thomas Vatter schrieb:

is there a possibility for creating views or temp tables in memory to 
avoid disk io when user makes select operations?



No need. The data will be available in OS and database caches if
they are really required often. If not, tune up the caches and
do a regular pre select.

Regards
Tino




Hmm, I am selecting a result set of 1300 rows joined from 12 tables.
With JDBC I wait 40 seconds until the first row appears. The
following rows appear really fast, but the 40 seconds are a problem.


regards
tom


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] in memory views

2006-05-10 Thread Tino Wildenhain

Thomas Vatter schrieb:

Tino Wildenhain wrote:


Thomas Vatter schrieb:

is there a possibility for creating views or temp tables in memory to 
avoid disk io when user makes select operations?




No need. The data will be available in OS and database caches if
they are really required often. If not, tune up the caches and
do a regular pre select.

Regards
Tino




hmm, I am selecting a resultset with 1300 rows joined from 12 tables. 
with jdbc I am waiting 40 seconds until the first row appears. The 
following rows appear really fast but the 40 seconds are a problem.


Well, you will equally need the 40 seconds to fill your hypothetical
in-memory table (even a bit more, due to the creation of a data structure).

So you can take the approach of semi-materialized views (which in fact
write into a shadow table), or just prefetch your data ahead of time -
at the times you would refill your in-memory tables if they existed.
A cron job with select/fetch should do.

Regards
Tino

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] in memory views

2006-05-10 Thread Thomas Vatter

Tino Wildenhain wrote:


Thomas Vatter schrieb:


Tino Wildenhain wrote:


Thomas Vatter schrieb:

is there a possibility for creating views or temp tables in memory 
to avoid disk io when user makes select operations?





No need. The data will be available in OS and database caches if
they are really required often. If not, tune up the caches and
do a regular pre select.

Regards
Tino




hmm, I am selecting a resultset with 1300 rows joined from 12 tables. 
with jdbc I am waiting 40 seconds until the first row appears. The 
following rows appear really fast but the 40 seconds are a problem.



Well you will need the equally 40 seconds to fill your hypothetical
in memory table. (even a bit more due to the creation of a 
datastructure).


So you can do the aproaches of semi materialized views (that are in fact
writing into a shadow table) or just prefetch your data at time - just
at the times you would refill your memory tables if they existed.
A cronjob with select/fetch should do.

Regards
Tino




If the in-memory table is created at boot time of the DBMS, it is
already present when the user selects the data. Of course the challenge is
to keep the in-memory table up to date if the data changes. What do you
mean by semi-materialized views? I have tried select * from this_view
with the same result. Also, if I repeat the query it does not run faster.


regards
tom

--
Mit freundlichen Grüßen / Regards
Vatter

Network Inventory Software
Sun Microsystems Principal Partner

www.network-inventory.de
Tel. 030-79782510
E-Mail [EMAIL PROTECTED]


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] in memory views

2006-05-10 Thread Hakan Kocaman
Hi,

there was a similar discussion about a ramdisk:
http://archives.postgresql.org/pgsql-hackers/2005-11/msg01058.php

You need to populate the data on server start, of course.

But as Tino mentioned, it's maybe not worth the trouble.

Maybe there is a way to speed up the query itself.

To analyze this, you should post the query and table definitions
and the output of explain analyze for the offending query.

Best regards

Hakan Kocaman
Software-Development

digame.de GmbH
Richard-Byrd-Str. 4-8
50829 Köln

Tel.: +49 (0) 221 59 68 88 31
Fax: +49 (0) 221 59 68 88 98
Email: [EMAIL PROTECTED]



 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 Thomas Vatter
 Sent: Wednesday, May 10, 2006 12:43 PM
 To: Tino Wildenhain
 Cc: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] in memory views
 
 
 Tino Wildenhain wrote:
 
  Thomas Vatter schrieb:
 
  Tino Wildenhain wrote:
 
  Thomas Vatter schrieb:
 
  is there a possibility for creating views or temp tables 
 in memory 
  to avoid disk io when user makes select operations?
 
 
 
 
  No need. The data will be available in OS and database caches if
  they are really required often. If not, tune up the caches and
  do a regular pre select.
 
  Regards
  Tino
 
 
 
  hmm, I am selecting a resultset with 1300 rows joined from 
 12 tables. 
  with jdbc I am waiting 40 seconds until the first row appears. The 
  following rows appear really fast but the 40 seconds are a problem.
 
 
  Well you will need the equally 40 seconds to fill your hypothetical
  in memory table. (even a bit more due to the creation of a 
  datastructure).
 
  So you can do the aproaches of semi materialized views 
 (that are in fact
  writing into a shadow table) or just prefetch your data at 
 time - just
  at the times you would refill your memory tables if they existed.
  A cronjob with select/fetch should do.
 
  Regards
  Tino
 
 
 
 If the in memory table is created a bootup time of the dbms it is 
 already present when user selects the data. Of course the 
 challenge is 
 to keep the in memory table up to date if data are changed. 
 What do you 
 mean with semi materialized views, I have tried select * from 
 this_view 
 with the same result. Also, if I repeat the query it does not 
 run faster.
 
 regards
 tom
 
 -- 
 Mit freundlichen Grüßen / Regards
 Vatter
  
 Network Inventory Software
 Sun Microsystems Principal Partner
 
 www.network-inventory.de
 Tel. 030-79782510
 E-Mail [EMAIL PROTECTED]
 
 
 ---(end of 
 broadcast)---
 TIP 3: Have you checked our extensive FAQ?
 
http://www.postgresql.org/docs/faq
 

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PERFORM] Question about explain-command...

2006-05-10 Thread Clemens Eisserer

Hello,

I just discovered the explain command and, well... have some (for you
of course very stupid) questions.

I do a quite large (for my taste) join; the query looks like the following:
SELECT DISTINCT customer.email AS cemail, customer.key AS ckey,
customer.anrede AS canrede, customer.strasse AS cstrasse, customer.plz
AS cplz, customer.ort AS cort, customer.vorname AS cvorname,
customer.nachname AS cnachname , custtype.name AS tname, customer.land
AS cland, customer.datanotvalid AS cdatanvalid FROM customer LEFT JOIN
sells ON customer.key=sells.custid LEFT JOIN goods ON
sells.goodsid=goods.key LEFT JOIN custtype ON
customer.custgroup=custtype.key LEFT JOIN prodtype ON
prodtype.key=goods.prodgroup WHERE customer.nachname LIKE  '%name%';

All primary keys are indexed, and this is what explain tells me:
Unique  (cost=15.67..16.69 rows=34 width=115)
  ->  Sort  (cost=15.67..15.75 rows=34 width=115)
        Sort Key: customer.email, customer.key, customer.anrede, customer.strasse, customer.plz, customer.ort, customer.vorname, customer.nachname, custtype.name, customer.land, customer.datanotvalid
        ->  Hash Left Join  (cost=6.16..14.80 rows=34 width=115)
              Hash Cond: ("outer".prodgroup = "inner".key)
              ->  Hash Left Join  (cost=4.97..13.10 rows=34 width=119)
                    Hash Cond: ("outer".custgroup = "inner".key)
                    ->  Hash Left Join  (cost=3.88..11.49 rows=34 width=111)
                          Hash Cond: ("outer".goodsid = "inner".key)
                          ->  Hash Left Join  (cost=1.98..9.08 rows=34 width=111)
                                Hash Cond: ("outer".key = "inner".custid)
                                ->  Seq Scan on customer  (cost=0.00..6.10 rows=34 width=107)
                                      Filter: ((nachname)::text ~~ '%au%'::text)
                                ->  Hash  (cost=1.78..1.78 rows=78 width=8)
                                      ->  Seq Scan on sells  (cost=0.00..1.78 rows=78 width=8)
                          ->  Hash  (cost=1.72..1.72 rows=72 width=8)
                                ->  Seq Scan on goods  (cost=0.00..1.72 rows=72 width=8)
                    ->  Hash  (cost=1.08..1.08 rows=8 width=16)
                          ->  Seq Scan on custtype  (cost=0.00..1.08 rows=8 width=16)
              ->  Hash  (cost=1.15..1.15 rows=15 width=4)
                    ->  Seq Scan on prodtype  (cost=0.00..1.15 rows=15 width=4)


What do the hash lines mean? Does that mean my query does not use
the indices at all?
Why are some table names and some column names surrounded by quotes?
Are they treated as text columns?
I have to admit that the tables are just filled with test data, so the
analyzer may take a very simple path since almost no data is in...

lg Clemens

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] in memory views

2006-05-10 Thread Tino Wildenhain

Thomas Vatter schrieb:

Tino Wildenhain wrote:

...

Well you will need the equally 40 seconds to fill your hypothetical
in memory table. (even a bit more due to the creation of a 
datastructure).


So you can do the aproaches of semi materialized views (that are in fact
writing into a shadow table) or just prefetch your data at time - just
at the times you would refill your memory tables if they existed.
A cronjob with select/fetch should do.

Regards
Tino




If the in memory table is created a bootup time of the dbms it is 
already present when user selects the data. Of course the challenge is 
to keep the in memory table up to date if data are changed. What do you 
mean with semi materialized views, I have tried select * from this_view 
with the same result. Also, if I repeat the query it does not run faster.



Semi-materialized views are just views with additional rules and some
triggers which copy data to another table. There are several recipes
if you google accordingly.
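The shadow-table idea is easy to make concrete. Below is a self-contained sketch using SQLite (via Python's stdlib) rather than PostgreSQL, just to show the mechanics; the table and trigger names are invented, and a real PostgreSQL setup would use CREATE TRIGGER / CREATE RULE in the same shape:

```python
import sqlite3

def demo_shadow_table():
    con = sqlite3.connect(":memory:")
    cur = con.cursor()
    cur.execute("CREATE TABLE base (id INTEGER PRIMARY KEY, val TEXT)")
    # shadow table holding a pre-computed copy that queries can read cheaply
    cur.execute("CREATE TABLE base_shadow (id INTEGER PRIMARY KEY, val TEXT)")
    # triggers keep the shadow in sync on every write to the base table
    cur.execute("""
        CREATE TRIGGER base_ins AFTER INSERT ON base
        BEGIN
            INSERT INTO base_shadow (id, val) VALUES (NEW.id, NEW.val);
        END
    """)
    cur.execute("""
        CREATE TRIGGER base_del AFTER DELETE ON base
        BEGIN
            DELETE FROM base_shadow WHERE id = OLD.id;
        END
    """)
    cur.executemany("INSERT INTO base (val) VALUES (?)", [("a",), ("b",)])
    cur.execute("DELETE FROM base WHERE val = 'a'")
    return cur.execute("SELECT val FROM base_shadow").fetchall()

print(demo_shadow_table())  # → [('b',)]
```

The point is only that the "view" is kept up to date by write-side triggers instead of being recomputed on every select.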

I do not know what you mean by bootup time - do you really reboot
your database server? *hehe* just kidding ;)

In your first email you told me your query indeed runs faster the 2nd
time (due to the caching); now you are telling me that it is not.

Btw, judging from your analyze output you are using very cryptic
table and column names - you can use aliasing in the query and don't
have to resort to tiny tags when you actually name the objects ;)

Maybe others have comments on your query. Btw, better use
explain analyze to get realistic results.

Regards
Tino

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid

2006-05-10 Thread Douglas McNaught
Greg Stark [EMAIL PROTECTED] writes:

 Douglas McNaught [EMAIL PROTECTED] writes:

 Correct me if I'm wrong, but I've never heard of a 15kRPM SATA drive.

 Well, dollar for dollar you would get the best performance from slower drives
 anyways since it would give you more spindles. 15kRPM drives are *expensive*.

Depends on your power, heat and rack space budget too...  If you need
max performance out of a given rack space (rather than max density),
SCSI is still the way to go.  I'll definitely agree that SATA is
becoming much more of a player in the server storage market, though.

-Doug

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] in memory views

2006-05-10 Thread Thomas Vatter

Tino Wildenhain wrote:


Thomas Vatter schrieb:


Tino Wildenhain wrote:


...


Well you will need the equally 40 seconds to fill your hypothetical
in memory table. (even a bit more due to the creation of a 
datastructure).


So you can do the aproaches of semi materialized views (that are in 
fact

writing into a shadow table) or just prefetch your data at time - just
at the times you would refill your memory tables if they existed.
A cronjob with select/fetch should do.

Regards
Tino




If the in memory table is created a bootup time of the dbms it is 
already present when user selects the data. Of course the challenge 
is to keep the in memory table up to date if data are changed. What 
do you mean with semi materialized views, I have tried select * from 
this_view with the same result. Also, if I repeat the query it does 
not run faster.



Semi materialized views are just views with aditional rules and some
triggers which copy data to another table. There are several receipes
if you google accordingly.

I do not know what you mean by bootup time - do you really reboot
your database server? *hehe* just kidding ;)

In your first email you told me your query indeed runs faster the 2nd
time (due to the caching) now you are telling me that it is not.

Btw, judging from your analyze output you are using very cryptic
table and column names - you can use aliasing in the query and dont
have to resort to tiny tags when you actually name the objects ;)

Maybe others have comments on your query. Btw, better use
explain analyze to get realistic results.

Regards
Tino

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match




The subsequent rows are shown faster, not the subsequent queries - if
you really read my first e-mail ;-). Yes, I did run analyze
yesterday and the database has not changed; AFAIK it is only necessary
when the database contents are changing.


regards
tom


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Arguments Pro/Contra Software Raid

2006-05-10 Thread Florian Weimer
* Hannes Dorbath:

 + Hardware Raids might be a bit easier to manage, if you never spend a
 few hours to learn Software Raid Tools.

I disagree.  RAID management is complicated, and once there is a disk
failure, all kinds of oddities can occur which can make it quite a
challenge to get back a non-degraded array.

With some RAID controllers, monitoring is difficult because they do not
use the system's logging mechanism for reporting.  In some cases, it
is not possible to monitor the health status of individual disks.

 + Using SATA drives is always a bit of risk, as some drives are lying
 about whether they are caching or not.

You can usually switch off caching.

 + Using hardware controllers, the array becomes locked to a particular
 vendor. You can't switch controller vendors as the array meta
 information is stored proprietary. In case the Raid is broken to a
 level the controller can't recover automatically this might complicate
 manual recovery by specialists.

It's even more difficult these days.  3ware controllers enable drive
passwords, so you can't access the drive from other controllers at all
(even if you could interpret the on-disk data).

 + Even battery backed controllers can't guarantee that data written to
 the drives is consistent after a power outage, neither that the drive
 does not corrupt something during the involuntary shutdown / power
 irregularities. (This is theoretical as any server will be UPS backed)

UPS failures are not unheard of. 8-/ Apart from that, you can address
a large class of shutdown failures if you replay a log stored in the
BBU on the next reboot (partial sector writes come to mind).

It is very difficult to check if the controller does this correctly,
though.

A few other things to note: You can't achieve significant port density
with non-RAID controllers, at least with SATA.  You need to buy a RAID
controller anyway.  You can't quite achieve what a BBU does (even if
you've got a small, fast persistent storage device) because there's
no host software support for such a configuration.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal

2006-05-10 Thread Markus Schaber
Hi, PFC,

PFC wrote:

 The problem is that you need a set-returning function to retrieve
 the  values. SRFs don't have rowcount estimates, so the plans suck.

What about adding some way of rowcount estimation to SRFs, in the way of:

CREATE FUNCTION foo (para, meters) RETURNS SETOF bar AS
$$ ... function code ... $$ LANGUAGE plpgsql
ROWCOUNT_ESTIMATOR $$ ... estimation code ... $$ ;

Internally, this could create two functions, foo (para, meters) and
estimate_foo(para, meters) that are the same language and coupled
together (just like a SERIAL column and its sequence). The estimator
functions have an implicit return parameter of int8. Parameters may be
NULL when they are not known at query planning time.

What do you think about this idea?

The same scheme could be used to add a CPUCOST_ESTIMATOR to expensive
functions.
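To make the proposed coupling concrete, here is a hypothetical Python sketch of a set-returning function paired with an estimator function. All names are invented; this only mimics the semantics described above (including NULL/None parameters at planning time), not any actual PostgreSQL API:

```python
def srf_with_estimator(estimator):
    """Attach a row-count estimator to a set-returning function,
    the way the proposed ROWCOUNT_ESTIMATOR clause would couple
    foo() and estimate_foo()."""
    def wrap(fn):
        fn.estimate_rows = estimator
        return fn
    return wrap

def estimate_get_range(start, stop):
    # parameters may be None when they are not known at query planning time
    if start is None or stop is None:
        return 1000  # fall back to a default guess
    return max(stop - start, 0)

@srf_with_estimator(estimate_get_range)
def get_range(start, stop):
    yield from range(start, stop)

# the "planner" consults the estimator before executing the function
print(get_range.estimate_rows(0, 50))     # → 50
print(get_range.estimate_rows(None, 50))  # → 1000
```

The estimator is cheap and separate from the function body, which is exactly the trade-off discussed later in the thread.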

HTH,
Markus
-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid

2006-05-10 Thread Markus Schaber
Hi, Scott & all,

Scott Lamb wrote:

 I don't know the answer to this question, but have you seen this tool?
 
 http://brad.livejournal.com/2116715.html

We had a simpler tool inhouse, which wrote a file byte-for-byte, and
called fsync() after every byte.

If the number of fsyncs/min is higher than your rotations per minute
value of your disks, they must be lying.

It does not find as many liars as the script above, but it is less
intrusive (it can be run on any low-I/O machine without crashing it), and
it found some liars in-house (some notebook disks, one external
USB/FireWire-to-IDE case, and an older Linux cryptoloop implementation,
IIRC).

If you're interested, I can dig for the C source...
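The method itself is easy to sketch even without that source. Here is an approximate Python version of the byte-by-byte fsync test (the in-house tool was written in C and is not shown in the thread, so treat this as an illustration of the method, not the original):

```python
import os
import tempfile
import time

def fsync_rate(n_writes=100):
    """Write one byte at a time, calling fsync() after each write,
    and return the measured fsyncs per second."""
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        for _ in range(n_writes):
            os.write(fd, b"x")
            os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return n_writes / elapsed

# A 7200 rpm disk turns only 120 times per second; if the measured rate is
# far above the spindle's rotations per second, some layer below us is
# acknowledging writes before they reach the platter.
print(f"{fsync_rate():.0f} fsyncs/sec")
```

Note the caveats Markus raises later in the thread: drives that block on a full write queue, or that write immediately after acknowledging, will not be caught by this test.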

HTH,
Markus




-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid

2006-05-10 Thread Bruce Momjian
Markus Schaber wrote:
 Hi, Scott & all,
 
 Scott Lamb wrote:
 
  I don't know the answer to this question, but have you seen this tool?
  
  http://brad.livejournal.com/2116715.html
 
 We had a simpler tool inhouse, which wrote a file byte-for-byte, and
 called fsync() after every byte.
 
 If the number of fsyncs/min is higher than your rotations per minute
 value of your disks, they must be lying.
 
 It does not find as much liers as the script above, but it is less

Why does it find fewer liars?

---

 intrusive (can be ran on every low-io machine without crashing it), and
 it found some liers in-house (some notebook disks, one external
 USB/FireWire to IDE case, and an older linux cryptoloop implementations,
 IIRC).
 
 If you're interested, I can dig for the C source...
 
 HTH,
 Markus
 
 
 
 
 -- 
 Markus Schaber | Logical Tracking&Tracing International AG
 Dipl. Inf. | Software Development GIS
 
 Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
 
 ---(end of broadcast)---
 TIP 6: explain analyze is your friend
 

-- 
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid

2006-05-10 Thread Vivek Khera


On May 10, 2006, at 12:41 AM, Greg Stark wrote:

Well, dollar for dollar you would get the best performance from  
slower drives
anyways since it would give you more spindles. 15kRPM drives are  
*expensive*.


Personally, I don't care that much for dollar for dollar; I just
need performance.  If it is within a factor of 2 or 3 in price, then
I'll go for absolute performance over bang for the buck.






Re: [PERFORM] PostgreSQL VACCUM killing CPU

2006-05-10 Thread Wu Fengguang
On Tue, May 09, 2006 at 03:19:08AM -0700, [EMAIL PROTECTED] wrote:
 I have got such a problem.
 I'm running PostgreSQL 7.3.2 on Linux 2.6.13.
 What I see when VACUUM is running and killing my CPU is:
 
 Cpu(s):  3.2% us,  0.0% sy,  0.0% ni,  0.0% id, 96.8% wa,  0.0% hi,
 0.0% si
 
 what I am worried about is the 96.8% wa - why is it like that?

It's killing your disk drives instead of your CPU (which is mostly _idle_,
waiting for I/O completion).

Run this command to get an idea of the I/O activities:
iostat -x 3 3

[AD]Running a kernel patched with adaptive read-ahead may help it:
http://www.vanheusden.com/ara/adaptive-readahead-11.1-2.6.16.5.patch.gz

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid

2006-05-10 Thread Bruce Momjian
Vivek Khera wrote:
 
 On May 10, 2006, at 12:41 AM, Greg Stark wrote:
 
  Well, dollar for dollar you would get the best performance from  
  slower drives
  anyways since it would give you more spindles. 15kRPM drives are  
  *expensive*.
 
 Personally, I don't care that much for dollar for dollar I just  
 need performance.  If it is within a factor of 2 or 3 in price then  
 I'll go for absolute performance over bang for the buck.

That is really the issue.  You can buy lots of consumer-grade stuff and
it will work just fine if your performance/reliability tolerance is high
enough.

However, don't fool yourself that consumer and server-grade hardware are
internally the same, or get the same testing.

I just had a Toshiba laptop drive replaced last week (new, not
refurbished), only to have it fail this week.  Obviously there isn't
sufficient burn-in done by Toshiba, and I don't fault them because it is
a consumer laptop --- it fails, they replace it.  For servers, the
downtime usually can't be tolerated, while consumers usually can
tolerate significant downtime.

I have always purchased server-grade hardware for my home server, and I
think I have had one day of hardware downtime in the past ten years. 
Consumer hardware just couldn't do that.

As one data point, most consumer-grade IDE drives are designed to be run
only 8 hours a day.  The engineering doesn't anticipate 24-hour
operation, and that trade-off passes all the way through the selection
of components for the drive, which generates significant cost savings.

-- 
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal

2006-05-10 Thread PFC



The problem is that you need a set-returning function to retrieve
the  values. SRFs don't have rowcount estimates, so the plans suck.


What about adding some way of rowcount estimation to SRFs, in the way of:

CREATE FUNCTION foo (para, meters) RETURNS SETOF bar AS
$$ ... function code ... $$ LANGUAGE plpgsql
ROWCOUNT_ESTIMATOR $$ ... estimation code ... $$ ;

Internally, this could create two functions, foo (para, meters) and
estimate_foo(para, meters) that are the same language and coupled
together (just like a SERIAL column and its sequence). The estimator
functions have an implicit return parameter of int8. Parameters may be
NULL when they are not known at query planning time.

What do you think about this idea?


It would be very useful.
A few thoughts...

You need to do some processing to know how many rows the function would
return.

Often, this processing will be repeated in the function itself.
Sometimes it's very simple (ie. the function will RETURN NEXT each
element in an array, and you know the array length...)
Sometimes, for functions returning few rows, it might be faster to
compute the entire result set in the cost estimator.

So, it might be a bit hairy to find a good compromise.

Ideas on how to do this (clueless hand-waving mode):

1- Add new attributes to set-returning functions; basically a list of
functions, each returning an estimation parameter (rowcount, cpu tuple
cost, etc).

This is just like you said.

2- Add an estimator to a function, which would just be another
function, returning one row, a record, containing the estimations in
several columns (rowcount, cpu tuple cost, etc).
Pros: only one function call to estimate, easier and faster; the
estimator just leaves the unknown columns NULL.
The estimator need not be in the same language as the function itself.
It's just another function.

3- The estimator could be a set-returning function itself which would
return rows mimicking pg_statistics.
Pros: planner-friendly; the planner would SELECT from the SRF instead of
looking in pg_statistics, and the estimator could tell the planner that,
for instance, the function will return unique values.
Cons: complex, maybe slow.

4- Add simple flags to a function, like:
- returns unique values
- returns sorted values (no need to sort my results)
- please execute me and store my results in temporary storage, count
the rows returned, and plan the outer query accordingly
- etc.


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Question about explain-command...

2006-05-10 Thread Dave Dutcher
 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 Clemens Eisserer
 Sent: Wednesday, May 10, 2006 6:50 AM
 To: pgsql-performance@postgresql.org
 Subject: [PERFORM] Question about explain-command...
 
 
 What does the hash-lines mean, does that mean my query does not use
 the indices at all?
 Why are some table-names and some column-names surrounded by '  '?
 Are they threated as text-columns?
 I have to admit that the tables are just filled with test-data so the
 analyzer may take just a very simple way since almost no data is in...
 

For small tables, it is faster to do a sequential scan than an index
scan.  You probably don't have enough test data to make the planner
choose an index scan.

I don't think the quotes really mean anything.  They are just used as
delimiters.

The hash lines mean your tables are being joined by hash joins.  You
should read this page for more info:

http://www.postgresql.org/docs/8.1/interactive/performance-tips.html
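For intuition, a hash join builds an in-memory hash table on one input and then probes it with rows from the other, which is what the Hash and Hash Join nodes in the plan above are doing. A toy sketch of the mechanism (illustrative only; the row data is made up, this performs a plain inner join, and it is not how PostgreSQL implements it internally):

```python
def hash_join(outer_rows, inner_rows, outer_key, inner_key):
    """Join two lists of dicts the way a Hash Join plan node does:
    build a hash table on the inner input, then probe it once per
    outer row instead of using an index."""
    table = {}
    for row in inner_rows:
        table.setdefault(row[inner_key], []).append(row)
    result = []
    for row in outer_rows:
        for match in table.get(row[outer_key], []):
            result.append({**row, **match})
    return result

customers = [{"key": 1, "name": "Mueller"}, {"key": 2, "name": "Schmidt"}]
sells = [{"custid": 1, "goodsid": 7}]
print(hash_join(customers, sells, "key", "custid"))
# → [{'key': 1, 'name': 'Mueller', 'custid': 1, 'goodsid': 7}]
```

For small inputs this one-pass build-and-probe is cheaper than maintaining and traversing an index, which is why the planner prefers it here.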



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid

2006-05-10 Thread Markus Schaber
Hi, Bruce,

Bruce Momjian wrote:


It does not find as much liers as the script above, but it is less
 
 Why does it find fewer liers?

It won't find liars that have a small lie-queue length, so that their
internal buffers get full and they have to block. After a small burst at
start, which usually hides in other latencies, they don't get more
throughput than spindle turns.

It won't find liars that first acknowledge to the host and then
immediately write the block before accepting other commands. This
improves latency (which is what some benchmarks measure), but not the
syncs/write rate.

Both of them can be captured by the other script, but not by my tool.

HTH,
Markus


-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid

2006-05-10 Thread Scott Marlowe
On Tue, 2006-05-09 at 20:02, Bruce Momjian wrote:
 Scott Marlowe wrote:
  Actually, in the case of the Escalades at least, the answer is yes. 
  Last year (maybe a bit more) someone was testing an IDE escalade
  controller with drives that were known to lie, and it passed the power
  plug pull test repeatedly.  Apparently, the escalades tell the drives to
  turn off their cache.  While most all IDEs and a fair number of SATA
  drives lie about cache fsyncing, they all seem to turn off the cache
  when you ask.
  
  And, since a hardware RAID controller with bbu cache has its own cache,
  it's not like it really needs the one on the drives anyway.
 
 You do if the controller thinks the data is already on the drives and
 removes it from its cache.

Bruce, re-read what I wrote.  The escalades tell the drives to TURN OFF
THEIR OWN CACHE.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid

2006-05-10 Thread Douglas McNaught
Scott Marlowe [EMAIL PROTECTED] writes:

 On Tue, 2006-05-09 at 20:02, Bruce Momjian wrote:

 You do if the controller thinks the data is already on the drives and
 removes it from its cache.

 Bruce, re-read what I wrote.  The escalades tell the drives to TURN OFF
 THEIR OWN CACHE.

Some ATA drives would lie about that too IIRC.  Hopefully they've
stopped doing it in the SATA era.

-Doug

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal

2006-05-10 Thread Martijn van Oosterhout
On Wed, May 10, 2006 at 04:38:31PM +0200, PFC wrote:
   You need to do some processing to know how many rows the function 
   would  return.
   Often, this processing will be repeated in the function itself.
   Sometimes it's very simple (ie. the function will RETURN NEXT each  
 element in an array, you know the array length...)
   Sometimes, for functions returning few rows, it might be faster to  
 compute the entire result set in the cost estimator.

I think the best would probably be to assign a constant. An SRF will
generally return on the order of 1-10, 10-100, 100-1000, etc. rows. You
don't need an exact number, you just need to get within an order of
magnitude, and a constant will work fine for that.

How many functions sometimes return one and sometimes a million rows?
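Later PostgreSQL releases turned this idea into syntax: from 8.3 on, a constant row estimate can be attached to the function itself with the ROWS clause. A minimal sketch (the function name is illustrative; the bookmarks table is borrowed from elsewhere in the thread):

```sql
-- Sketch, PostgreSQL 8.3+ syntax: ROWS gives the planner a constant
-- row-count estimate for this SRF. Function/table names illustrative.
CREATE FUNCTION recent_bookmarks(lim integer)
RETURNS SETOF integer
LANGUAGE sql STABLE
ROWS 100                    -- planner assumes ~100 rows per call
AS $$
    SELECT annonce_id FROM bookmarks ORDER BY annonce_id DESC LIMIT $1;
$$;
```

Without such a clause, the planner falls back to one flat default estimate for every SRF.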

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid

2006-05-10 Thread Markus Schaber
Hi, Bruce,

Markus Schaber wrote:

It does not find as much liars as the script above, but it is less
Why does it find fewer liars?
 
 It won't find liars that have a small lie-queue length, so their
 internal buffers get full and they have to block. After a small burst at
 start, which usually hides in other latencies, they don't get more
 throughput than spindle turns.

I just reread my mail, and must admit that I would not understand what I
wrote above, so I'll explain a little more:

My test program writes byte-for-byte. Let's say our FS/OS has 4k page
and block size; that means 4096 writes that all hit the same disk block.

Intelligent liars will see that the 2nd and all further writes
obsolete the former writes that still reside in the internal cache, and
drop those former writes from the cache, effectively going up to 4k
writes per spindle turn.

Dumb liars will keep the obsolete writes in the write cache/queue, and
so won't be caught by my program. (Note that I have no proof that such
disks actually exist, but I have enough experience with hardware that I
wouldn't be surprised.)


HTH,
Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal

2006-05-10 Thread Markus Schaber
Hi, PFC,

PFC wrote:

 You need to do some processing to know how many rows the function
 would  return.
 Often, this processing will be repeated in the function itself.
 Sometimes it's very simple (ie. the function will RETURN NEXT each 
 element in an array, you know the array length...)
 Sometimes, for functions returning few rows, it might be faster to 
 compute the entire result set in the cost estimator.

I know, but we only have to estimate the number of rows to give a hint
to the query planner, so we can use lots of simplifications.

E. G. for generate_series we return ($2-$1)/$3, and for some functions
even constant estimates will be good enough.
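The generate_series case is exactly what later PostgreSQL releases (12 and up) wired in via planner support functions, where the planner derives the ($2-$1)/$3-style estimate itself from the call's actual arguments:

```sql
-- On PostgreSQL 12+, the row estimate for this call comes from a
-- support function attached to generate_series, so it tracks the
-- arguments; older releases use a flat default (1000 rows) instead.
EXPLAIN SELECT * FROM generate_series(1, 1000, 10);
```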

 - please execute me and store my results in a temporary storage,
 count  the rows returned, and plan the outer query accordingly

That's an interesting idea.

Markus


-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid

2006-05-10 Thread Scott Marlowe
On Wed, 2006-05-10 at 09:51, Douglas McNaught wrote:
 Scott Marlowe [EMAIL PROTECTED] writes:
 
  On Tue, 2006-05-09 at 20:02, Bruce Momjian wrote:
 
  You do if the controller thinks the data is already on the drives and
  removes it from its cache.
 
  Bruce, re-read what I wrote.  The escalades tell the drives to TURN OFF
  THEIR OWN CACHE.
 
 Some ATA drives would lie about that too IIRC.  Hopefully they've
 stopped doing it in the SATA era.

Ugh.  Now that would make for a particularly awful bit of firmware
implementation.  I'd think that if I found a SATA drive doing that I'd
be likely to strike the manufacturer off of the list for possible future
purchases...

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] in memory views

2006-05-10 Thread Scott Marlowe
On Wed, 2006-05-10 at 04:55, Thomas Vatter wrote:
 Tino Wildenhain wrote:
 
  Thomas Vatter schrieb:
 
  is there a possibility for creating views or temp tables in memory to 
  avoid disk io when user makes select operations?
 
 
  No need. The data will be available in OS and database caches if
  they are really required often. If not, tune up the caches and
  do a regular pre select.
 
  Regards
  Tino
 
 
 
 hmm, I am selecting a resultset with 1300 rows joined from 12 tables. 
 with jdbc I am waiting 40 seconds until the first row appears. The 
 following rows appear really fast but the 40 seconds are a problem.

Are you selecting the whole set at once?  Or are you placing it into a
cursor?

What happens if you do this by declaring it as a cursor and then
fetching the first row?
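In plain SQL, that test looks like the following sketch (the cursor name is a placeholder, and some_table stands in for the real 12-table join):

```sql
BEGIN;                                 -- cursors live inside a transaction
DECLARE first_row_test CURSOR FOR
    SELECT * FROM some_table;          -- placeholder for the real join
FETCH 1 FROM first_row_test;           -- measures time-to-first-row only
CLOSE first_row_test;
COMMIT;
```

If the FETCH returns quickly while the full query takes 40 seconds, the bottleneck is materializing the whole result set, not producing the first row.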

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Question about explain-command...

2006-05-10 Thread Nis Jorgensen
I will try answering your questions. Please note that I am a newbie myself.

Clemens Eisserer wrote

 All primary keys are indexed, and this is what explain tells me:

 Unique  (cost=15.67..16.69 rows=34 width=115)
   ->  Sort  (cost=15.67..15.75 rows=34 width=115)
         Sort Key: customer.email, customer.key, customer.anrede,
 customer.strasse, customer.plz, customer.ort, customer.vorname,
 customer.nachname, custtype.name, customer.land, customer.datanotvalid
         ->  Hash Left Join  (cost=6.16..14.80 rows=34 width=115)
               Hash Cond: ("outer".prodgroup = "inner".key)
               ->  Hash Left Join  (cost=4.97..13.10 rows=34 width=119)
                     Hash Cond: ("outer".custgroup = "inner".key)
                     ->  Hash Left Join  (cost=3.88..11.49 rows=34 width=111)
                           Hash Cond: ("outer".goodsid = "inner".key)
                           ->  Hash Left Join  (cost=1.98..9.08 rows=34 width=111)
                                 Hash Cond: ("outer".key = "inner".custid)
                                 ->  Seq Scan on customer  (cost=0.00..6.10 rows=34 width=107)
                                       Filter: ((nachname)::text ~~ '%au%'::text)
                                 ->  Hash  (cost=1.78..1.78 rows=78 width=8)
                                       ->  Seq Scan on sells  (cost=0.00..1.78 rows=78 width=8)
                           ->  Hash  (cost=1.72..1.72 rows=72 width=8)
                                 ->  Seq Scan on goods  (cost=0.00..1.72 rows=72 width=8)
                     ->  Hash  (cost=1.08..1.08 rows=8 width=16)
                           ->  Seq Scan on custtype  (cost=0.00..1.08 rows=8 width=16)
               ->  Hash  (cost=1.15..1.15 rows=15 width=4)
                     ->  Seq Scan on prodtype  (cost=0.00..1.15 rows=15 width=4)


 What does the hash-lines mean, does that mean my query does not use
 the indices at all?

Yes. Probably each table fits nicely into a single disk read, so reading
both the index AND the table is going to be twice as expensive.

 Why are some table-names and some column-names surrounded by '  '?
 Are they threated as text-columns?

They are either names generated by postgres ("outer" and "inner") or
field names which are also reserved words in SQL ("key"). You can always
use double quotes around a field name - you have to in some cases if
they are reserved words, and always if they contain special characters
(not sure from memory exactly which these are - at least spaces). I
recommend not using either of these, even if a reserved word is the
best description of your field.

Postgres seems to be a bit better than some other dbms's in allowing
unquoted reserved words as field names if there is no ambiguity. This
may mean that you get a problem if your application is ever ported to a
different dbms.

 I have to admit that the tables are just filled with test-data so the
 analyzer may take just a very simple way since almost no data is in...

Try loading your tables with a realistic number of customers, and you
should see a change in the query plan to use your precious indexes.
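generate_series makes it easy to produce such a load. A sketch, assuming the customer table from the plan above and that the columns not listed are nullable:

```sql
-- Hypothetical test load: with 34 rows a seq scan wins; with 100,000
-- rows the planner should switch to the primary key index.
INSERT INTO customer (key, nachname, email)
SELECT i, 'name_' || i, 'user_' || i || '@example.com'
FROM generate_series(1, 100000) AS s(i);

ANALYZE customer;
EXPLAIN SELECT * FROM customer WHERE key = 4711;
```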

/Nis


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] in memory views

2006-05-10 Thread Thomas Vatter




Scott Marlowe wrote:

 On Wed, 2006-05-10 at 04:55, Thomas Vatter wrote:
  Tino Wildenhain wrote:
   Thomas Vatter schrieb:
    is there a possibility for creating views or temp tables in memory to
    avoid disk io when user makes select operations?
  
   No need. The data will be available in OS and database caches if
   they are really required often. If not, tune up the caches and
   do a regular "pre select".
  
   Regards
   Tino
 
  hmm, I am selecting a resultset with 1300 rows joined from 12 tables.
  with jdbc I am waiting 40 seconds until the first row appears. The
  following rows appear really fast but the 40 seconds are a problem.

 Are you selecting the whole set at once?  Or are you placing it into a
 cursor?

 What happens if you do this by declaring it as a cursor and then
 fetching the first row?

 ---(end of broadcast)---
 TIP 4: Have you searched our list archives?

    http://archives.postgresql.org

I do executeQuery(), for the resultSet I do next() and return one row,
but wait, I have to review the logic in this area, I can tell you
tomorrow

regards
tom







Re: [PERFORM] in memory views

2006-05-10 Thread Scott Marlowe
On Wed, 2006-05-10 at 10:41, Thomas Vatter wrote:
 Scott Marlowe wrote: 

  What happens if you do this by declaring it as a cursor and then
  fetching the first row?


 
 I do executeQuery(), for the resultSet I do next() and return one row,
 but wait, I have to review the logic in this area, I can tell you
 tomorrow


A good short test is to run explain analyze on the query from the psql
command line.  If it shows an execution time of significantly less than
what you get from your application, then it is likely that the real
problem is that your application is receiving the whole result set via
libpq and waiting for that.  A cursor will solve that problem.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] in memory views

2006-05-10 Thread me
is there a possibility for creating views or temp tables in memory to 
avoid disk io when user makes select operations?


you might also want to look into materialized views:
http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html
http://www.varlena.com/varlena/GeneralBits/64.php

this helped us a lot when we had slow queries involving many tables.
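In PostgreSQL of this era a materialized view is hand-rolled along the lines those articles describe (a built-in CREATE MATERIALIZED VIEW only arrived in 9.3). A minimal snapshot-style sketch, with illustrative names; slow_report stands in for the expensive many-table query:

```sql
-- Build the snapshot once from the expensive query.
CREATE TABLE matview_report AS SELECT * FROM slow_report;

-- Refresh = rebuild; run it from cron or a trigger-driven function.
BEGIN;
TRUNCATE matview_report;
INSERT INTO matview_report SELECT * FROM slow_report;
COMMIT;
```

Application queries then hit matview_report (optionally with its own indexes) instead of re-running the join every time.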

cheers,
thomas


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal

2006-05-10 Thread Nis Jorgensen
Martijn van Oosterhout wrote:
 On Wed, May 10, 2006 at 04:38:31PM +0200, PFC wrote:
  You need to do some processing to know how many rows the function 
  would  return.
  Often, this processing will be repeated in the function itself.
  Sometimes it's very simple (ie. the function will RETURN NEXT each  
 element in an array, you know the array length...)
  Sometimes, for functions returning few rows, it might be faster to  
 compute the entire result set in the cost estimator.
 
 I think the best would probably be to assign a constant. An SRF will
 generally return between one of 1-10, 10-100, 100-1000, etc. You don't
 need exact number, you just need to get within an order of magnitude
 and a constant will work fine for that.
 
 How many functions sometimes return one and sometimes a million rows?

It will probably be quite common for the number to depend on the number
of rows in other tables. Even if this is fairly constant within one db
(some assumption), it is likely to be different in others using the same
function definition. Perhaps a better solution would be to cache the
result of the estimator function.

/Nis



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal

2006-05-10 Thread Markus Schaber
Hi, Nis,

Nis Jorgensen wrote:

 It will probably be quite common for the number to depend on the number
 of rows in other tables. Even if this is fairly constant within one db
 (some assumption), it is likely to be different in others using the same
 function definition. Perhaps a better solution would be to cache the
 result of the estimator function.

Sophisticated estimator functions are free to use the pg_statistics
views for their row count estimation.
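For instance, a cheap cardinality input is available straight from the catalog:

```sql
-- reltuples is the planner's own row-count estimate, maintained by
-- ANALYZE/VACUUM; no table scan needed. Schema/table names illustrative.
SELECT c.reltuples::bigint AS estimated_rows
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public' AND c.relname = 'customer';
```

while pg_stats exposes per-column distribution data for finer estimates.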


HTH,
Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] UNSUBSCRIBE

2006-05-10 Thread Bruno Wolff III
On Wed, May 10, 2006 at 01:15:11 -0400,
  Tom Lane [EMAIL PROTECTED] wrote:
 
 Maybe the real problem is at the other end of the process, ie we should
 require some evidence of a greater-than-room-temp IQ to subscribe in the
 first place?

I suspect it is more laziness than smarts. They had to at least figure out
how to respond to the confirm message in the first place in order to get
subscribed.
My theory is that they don't want to take the trouble to figure out how to
unsubscribe when they (think that they) can just send a message to the list
(not even the admin) asking to be unsubscribed and it will (well actually won't
on these lists) happen.

Maybe posts with unsubscribe in the subject could be held for moderation
and/or get an automated reply with instructions for unsubscribing.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal

2006-05-10 Thread Jim C. Nasby
On Tue, May 09, 2006 at 11:33:42AM +0200, PFC wrote:
   - Repeating the query might yield different results if records were 
   added  or deleted in the meantime.

BTW, SET TRANSACTION ISOLATION LEVEL SERIALIZABLE or BEGIN ISOLATION
LEVEL SERIALIZABLE would cure that.
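Scoped to one transaction, that looks like the following (the table name is borrowed from the examples elsewhere in the thread):

```sql
BEGIN ISOLATION LEVEL SERIALIZABLE;
-- every statement in this transaction now sees the same snapshot, so
-- repeating a query cannot pick up concurrent inserts or deletes
SELECT count(*) FROM bookmarks;
SELECT count(*) FROM bookmarks;   -- same result within the transaction
COMMIT;
```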
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal

2006-05-10 Thread Jim C. Nasby
On Tue, May 09, 2006 at 01:29:56PM +0200, PFC wrote:
 0.101 ms BEGIN
 1.451 ms CREATE TEMPORARY TABLE tmp ( a INTEGER NOT NULL, b INTEGER NOT  
 NULL, c TIMESTAMP NOT NULL, d INTEGER NOT NULL ) ON COMMIT DROP
 0.450 ms INSERT INTO tmp SELECT * FROM bookmarks ORDER BY annonce_id DESC  
 LIMIT 20
 0.443 ms ANALYZE tmp
 0.365 ms SELECT * FROM tmp
 0.310 ms DROP TABLE tmp
 32.918 ms COMMIT
 
   CREATING the table is OK, but what happens on COMMIT ? I hear the 
   disk  seeking frantically.
 
 With fsync=off, I get this :
 
 0.090 ms BEGIN
 1.103 ms CREATE TEMPORARY TABLE tmp ( a INTEGER NOT NULL, b INTEGER NOT  
 NULL, c TIMESTAMP NOT NULL, d INTEGER NOT NULL ) ON COMMIT DROP
 0.439 ms INSERT INTO tmp SELECT * FROM bookmarks ORDER BY annonce_id DESC  
 LIMIT 20
 0.528 ms ANALYZE tmp
 0.364 ms SELECT * FROM tmp
 0.313 ms DROP TABLE tmp
 0.688 ms COMMIT
 
   Getting closer ?
   I'm betting on system catalogs updates. I get the same timings with  
 ROLLBACK instead of COMMIT. Temp tables have a row in pg_class...

Have you tried getting a profile of what exactly PostgreSQL is doing
that takes so long when creating a temp table?

BTW, I suspect catalogs might be the answer, which is why Oracle has you
define a temp table once (which does all the work of putting it in the
catalog) and then you just use it accordingly in each individual
session.
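One way to get most of that in PostgreSQL, assuming the table shape is fixed per session: create the temp table once with ON COMMIT DELETE ROWS, so the catalog rows are written only on first use and COMMIT merely truncates. A sketch reusing the table definition from the timings above:

```sql
-- Pay the catalog cost once per session:
CREATE TEMPORARY TABLE tmp (
    a integer   NOT NULL,
    b integer   NOT NULL,
    c timestamp NOT NULL,
    d integer   NOT NULL
) ON COMMIT DELETE ROWS;

-- Every later transaction reuses it; COMMIT empties it instead of
-- dropping it:
BEGIN;
INSERT INTO tmp SELECT * FROM bookmarks ORDER BY annonce_id DESC LIMIT 20;
SELECT * FROM tmp;
COMMIT;   -- rows gone, table and its catalog entries remain
```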
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal

2006-05-10 Thread Jim C. Nasby
On Tue, May 09, 2006 at 06:29:31PM +0200, PFC wrote:
   You mean the cursors' storage is in fact the same internal machinery
   as a temporary table?

Use the source, Luke...

See tuplestore_begin_heap in backend/utils/sort/tuplestore.c and
heap_create_with_catalog in backend/catalog/heap.c. You'll find that
creating a tuplestore is far easier than creating a temp table.

Perhaps it would be worth creating a class of temporary tables that used
a tuplestore, although that would greatly limit what could be done with
that temp table.

Something else worth considering is not using the normal catalog methods
for storing information about temp tables, but hacking that together
would probably be a rather large task.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal

2006-05-10 Thread PFC



On Tue, May 09, 2006 at 06:29:31PM +0200, PFC wrote:

You mean the cursors' storage is in fact the same internal machinery
as a temporary table?


Use the source, Luke...


LOL, yeah, I should have, sorry.


See tuplestore_begin_heap in backend/utils/sort/tuplestore.c and
heap_create_with_catalog in backend/catalog/heap.c. You'll find that
creating a tuplestore is far easier than creating a temp table.


	I had used intuition (instead of the source) to come to the same
conclusion regarding the level of complexity of these two...

But I'll look at the source ;)


Perhaps it would be worth creating a class of temporary tables that used
a tuplestore, although that would greatly limit what could be done with
that temp table.


	Just selecting from it I guess, but that's all that's needed. Anymore  
would duplicate the functionality of a temp table.
	I find cursors awkward. The application can FETCH from them, but postgres  
itself can't do it in SQL, unless using FOR.. IN in plpgsql...
	It would be a powerful addition to be able to split queries, factor out  
common parts between multiple queries, etc, using this system, it can even  
be used to execute an inner part of a query, then plan the rest according  
to the results and execute it... without the overhead of a temp table.
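The plpgsql escape hatch mentioned above, as a sketch (function, parameter, and table names are illustrative):

```sql
-- plpgsql can consume a query row-by-row, which plain SQL cannot:
CREATE OR REPLACE FUNCTION count_matches(pat text) RETURNS bigint
LANGUAGE plpgsql AS $$
DECLARE
    r record;
    n bigint := 0;
BEGIN
    FOR r IN SELECT * FROM customer WHERE nachname LIKE pat LOOP
        n := n + 1;        -- arbitrary per-row work goes here
    END LOOP;
    RETURN n;
END;
$$;
```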





---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] in memory views

2006-05-10 Thread Thomas Vatter




Scott Marlowe wrote:

 On Wed, 2006-05-10 at 10:41, Thomas Vatter wrote:
  Scott Marlowe wrote:
   What happens if you do this by declaring it as a cursor and then
   fetching the first row?
 
  I do executeQuery(), for the resultSet I do next() and return one row,
  but wait, I have to review the logic in this area, I can tell you
  tomorrow

 A good short test is to run explain analyze on the query from the psql
 command line.  If it shows an execution time of significantly less than
 what you get from your application, then it is likely that the real
 problem is that your application is receiving the whole result set via
 libpq and waiting for that.  A cursor will solve that problem.

 ---(end of broadcast)---
 TIP 4: Have you searched our list archives?

    http://archives.postgresql.org


Yes, the difference between psql command line and application is 6
seconds to 40 seconds. It is exactly the step resultSet =
executeQuery() that needs 40 seconds. I use next() as a cursor through
the resultSet, but I fear this is not enough, do I have to use
createStatement(resultSetType, resultSetConcurrency) respectively
prepareStatement(resultSetType, resultSetConcurrency) to achieve the
cursor behaviour?

regards
tom





Re: [PERFORM] in memory views

2006-05-10 Thread Scott Marlowe
On Wed, 2006-05-10 at 15:54, Thomas Vatter wrote:


 Yes, the difference between psql command line and application is 6
 seconds to 40 seconds. It is exactly the step resultSet =
 executeQuery() that needs 40 seconds. I use next() as a cursor through
 the resultSet, but I fear this is not enough, do I have to use
 createStatement(resultSetType, resultSetConcurrency) respectively
 prepareStatement(resultSetType, resultSetConcurrency) to achieve the
 cursor behaviour?

Not sure.  I don't use a lot of prepared statements.  I tend to build
queries and throw them at the database.  In that instance, it's done
like:

declare cursorname cursor for select (rest of query here);
fetch from cursorname;

You can find more on cursors here:

http://www.postgresql.org/docs/8.1/interactive/sql-declare.html

Not sure if you can use them with prepared statements, or if prepared
statements have their own kind of implementation.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] in memory views

2006-05-10 Thread Dave Dutcher
Are you using the Postgres JDBC driver? Or are you using an ODBC JDBC
driver? The Postgres-specific driver is usually faster.

-----Original Message-----
From: [EMAIL PROTECTED] On Behalf Of Thomas Vatter
Sent: Wednesday, May 10, 2006 3:54 PM
To: Scott Marlowe
Cc: Tino Wildenhain; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] in memory views

Scott Marlowe wrote:
 On Wed, 2006-05-10 at 10:41, Thomas Vatter wrote:
  Scott Marlowe wrote:
   What happens if you do this by declaring it as a cursor and then
   fetching the first row?
 
  I do executeQuery(), for the resultSet I do next() and return one row,
  but wait, I have to review the logic in this area, I can tell you
  tomorrow

 A good short test is to run explain analyze on the query from the psql
 command line.  If it shows an execution time of significantly less than
 what you get from your application, then it is likely that the real
 problem is that your application is receiving the whole result set via
 libpq and waiting for that.  A cursor will solve that problem.

Yes, the difference between psql command line and application is 6
seconds to 40 seconds. It is exactly the step resultSet =
executeQuery() that needs 40 seconds. I use next() as a cursor through
the resultSet, but I fear this is not enough, do I have to use
createStatement(resultSetType, resultSetConcurrency) respectively
prepareStatement(resultSetType, resultSetConcurrency) to achieve the
cursor behaviour?

regards
tom


Re: [PERFORM] in memory views

2006-05-10 Thread Thomas Vatter
Dave Dutcher wrote:

 Are you using the Postgres JDBC driver? Or are you using an ODBC JDBC
 driver? The Postgres-specific driver is usually faster.


I'm using the postgres driver

regards
tom





  
  
  
  



-- 
Mit freundlichen Grüßen / Regards
Vatter
 
Network Inventory Software
Sun Microsystems Principal Partner

www.network-inventory.de
Tel. 030-79782510
E-Mail [EMAIL PROTECTED]





Re: [PERFORM] in memory views

2006-05-10 Thread Thomas Vatter




Scott Marlowe wrote:

 On Wed, 2006-05-10 at 15:54, Thomas Vatter wrote:
  Yes, the difference between psql command line and application is 6
  seconds to 40 seconds. It is exactly the step resultSet =
  executeQuery() that needs 40 seconds. I use next() as a cursor through
  the resultSet, but I fear this is not enough, do I have to use
  createStatement(resultSetType, resultSetConcurrency) respectively
  prepareStatement(resultSetType, resultSetConcurrency) to achieve the
  cursor behaviour?

 Not sure.  I don't use a lot of prepared statements.  I tend to build
 queries and throw them at the database.  In that instance, it's done
 like:

 declare cursorname cursor for select (rest of query here);
 fetch from cursorname;

 You can find more on cursors here:

 http://www.postgresql.org/docs/8.1/interactive/sql-declare.html

 Not sure if you can use them with prepared statements, or if prepared
 statements have their own kind of implementation.


Yes, I have used embedded sql and declare cursor / fetch before I started
with jdbc; seems that I have to find out if jdbc has a better way than
simply resultSet = statement.executeQuery().

regards
tom







Re: [PERFORM] Lot'sa joins - performance tip-up, please?

2006-05-10 Thread Jim C. Nasby
On Thu, May 04, 2006 at 04:45:57PM +0200, Mario Splivalo wrote:
Well, here's the problem...

   ->  Nested Loop  (cost=0.00..176144.30 rows=57925 width=26)
       (actual time=1074.984..992536.243 rows=57925 loops=1)
         ->  Seq Scan on ticketing_codes_played
             (cost=0.00..863.25 rows=57925 width=8) (actual
             time=74.479..2047.993 rows=57925 loops=1)
         ->  Index Scan using ticketing_codes_pk on
             ticketing_codes  (cost=0.00..3.01 rows=1 width=18) (actual
             time=17.044..17.052 rows=1 loops=57925)
               Index Cond: (ticketing_codes.code_id = "outer".code_id)

Anyone have any idea why on earth it's doing that instead of a hash or
merge join?

In any case, try swapping the order of ticketing_codes_played and
ticketing_codes. Actually, that'd probably make it worse.

Try SET enable_nestloop = off;
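To avoid disabling nested loops for the whole session, the setting can be scoped to a single transaction; the join below is a sketch based on the plan above:

```sql
BEGIN;
SET LOCAL enable_nestloop = off;   -- reverts at COMMIT/ROLLBACK
-- re-run the problem query here, e.g.:
EXPLAIN ANALYZE
SELECT *
FROM ticketing_codes_played tcp
JOIN ticketing_codes tc ON tc.code_id = tcp.code_id;
COMMIT;
```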
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] in memory views

2006-05-10 Thread Kris Jurka



On Wed, 10 May 2006, Thomas Vatter wrote:

Yes, the difference between psql command line and application is 6
seconds to 40 seconds. It is exactly the step resultSet =
executeQuery() that needs 40 seconds. I use next() as a cursor through
the resultSet, but I fear this is not enough, do I have to use
createStatement(resultSetType, resultSetConcurrency) respectively
prepareStatement(resultSetType, resultSetConcurrency) to achieve the
cursor behaviour?


http://jdbc.postgresql.org/documentation/81/query.html#query-with-cursor
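Under the hood, the cursor-based retrieval described on that page (enabled by turning autocommit off and setting a non-zero fetch size on the statement) amounts to ordinary DECLARE/FETCH at the SQL level, roughly:

```sql
BEGIN;                               -- autocommit off = explicit transaction
DECLARE jdbc_portal CURSOR FOR
    SELECT * FROM some_table;        -- placeholder for the 12-table join
FETCH FORWARD 50 FROM jdbc_portal;   -- one batch per fetch-size refill
FETCH FORWARD 50 FROM jdbc_portal;
CLOSE jdbc_portal;
COMMIT;
```

so the first rows arrive after the first FETCH rather than after the whole result set has been materialized in the client.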

Kris Jurka


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] UNSUBSCRIBE

2006-05-10 Thread Jim C. Nasby
On Wed, May 10, 2006 at 11:10:37AM -0400, Tom Lane wrote:
 Michael Glaesemann [EMAIL PROTECTED] writes:
  (And are there mail readers out there that can pick those subscribe/ 
  unsubscribe headers from the list emails? Now *that'd* be sweet.)
 
 Well, in my fairly ancient copy of exmh, any message with such headers
 causes an additional menu to appear:

Based on the constantly broken threading in the lists, I'd bet that less
than 20% of posters use something more sophisticated than MS LookOut!,
and I'm sure that the stats for subscribers are far worse.

Does majordomo have an option to automagically handle such posts that
are sent to the post address instead of the admin address? I know
mailman can do that...
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] Same query - Slow in production

2006-05-10 Thread Brian Wipf
I'm trying to determine why an identical query is running  
approximately 500 to 1000 times slower on our production database  
compared to our backup database server.


Both database servers are dual 2.3 GHz G5 Xserves running PostgreSQL  
8.1.3; both are configured with 8GB of RAM with identical shared  
memory settings; both postgresql.conf files are identical; both  
databases have identical indexes defined.


The three relevant tables are all clustered the same, although I'm  
not sure when clustering was last performed on either server. All  
three tables have recently been analyzed on both servers.


The different explain plans for this query seem to be consistent on  
both servers regardless of category and the production server is  
consistently and drastically slower than the backup server.


If anyone has any ideas on how to have the production server generate  
the same explain plan as the backup server, or can suggest anything I  
might want to try, I would greatly appreciate it.


Brian Wipf
ClickSpace Interactive Inc.
[EMAIL PROTECTED]

Here's the query:

SELECT  ac.attribute_id
FROMattribute_category ac
WHERE   is_browsable = 'true' AND
category_id = 1000962 AND
EXISTS  (   SELECT  'X'
FROMproduct_attribute_value pav,
category_product cp
WHERE   pav.attribute_id = ac.attribute_id AND
pav.status_code is null AND
pav.product_id = cp.product_id AND
cp.category_id = ac.category_id AND
cp.product_is_active = 'true' AND
cp.product_status_code = 'complete'
)

Explain plans:

Fast (backup server):
 Index Scan using attribute_category__category_id_fk_idx on
attribute_category ac  (cost=0.00..47943.34 rows=7 width=4) (actual
time=0.110..0.263 rows=5 loops=1)
   Index Cond: (category_id = 1000962)
   Filter: (((is_browsable)::text = 'true'::text) AND (subplan))
   SubPlan
     ->  Nested Loop  (cost=0.00..7983.94 rows=3 width=0) (actual
time=0.043..0.043 rows=1 loops=5)
           ->  Index Scan using
category_product__category_id_is_active_and_status_idx on
category_product cp  (cost=0.00..4362.64 rows=1103 width=4) (actual
time=0.013..0.015 rows=2 loops=5)
                 Index Cond: ((category_id = $1) AND
((product_is_active)::text = 'true'::text) AND
((product_status_code)::text = 'complete'::text))
           ->  Index Scan using
product_attribute_value__product_id_fk_idx on product_attribute_value
pav  (cost=0.00..3.27 rows=1 width=4) (actual time=0.016..0.016
rows=1 loops=8)
                 Index Cond: (pav.product_id = "outer".product_id)
                 Filter: ((attribute_id = $0) AND (status_code IS NULL))

 Total runtime: 0.449 ms
(11 rows)

Slow (production server):
 Index Scan using attribute_category__category_id_fk_idx on  
attribute_category ac  (cost=0.00..107115.90 rows=7 width=4) (actual  
time=1.472..464.437 rows=5 loops=1)

   Index Cond: (category_id = 1000962)
   Filter: (((is_browsable)::text = 'true'::text) AND (subplan))
   SubPlan
         ->  Nested Loop  (cost=18.33..23739.70 rows=4 width=0) (actual time=92.870..92.870 rows=1 loops=5)
               ->  Bitmap Heap Scan on product_attribute_value pav  (cost=18.33..8764.71 rows=2549 width=4) (actual time=10.191..45.672 rows=5869 loops=5)
                     Recheck Cond: (attribute_id = $0)
                     Filter: (status_code IS NULL)
                     ->  Bitmap Index Scan on product_attribute_value__attribute_id_fk_idx  (cost=0.00..18.33 rows=2952 width=0) (actual time=9.160..9.160 rows=0 loops=5)
                           Index Cond: (attribute_id = $0)
               ->  Index Scan using x_category_product_pk on category_product cp  (cost=0.00..5.86 rows=1 width=4) (actual time=0.007..0.007 rows=0 loops=29345)
                     Index Cond: ((cp.category_id = $1) AND ("outer".product_id = cp.product_id))
                     Filter: (((product_is_active)::text = 'true'::text) AND ((product_status_code)::text = 'complete'::text))

Total runtime: 464.667 ms
(14 rows)

Table Descriptions:

\d attribute_category;
     Table "public.attribute_category"
     Column      |         Type         | Modifiers
-----------------+----------------------+-----------
 attribute_id    | integer              | not null
 category_id     | integer              | not null
 is_browsable    | character varying(5) |
 is_required     | character varying(5) |
 sort_order      | integer              |
 default_unit_id | integer              |
Indexes:
    "attribute_category_pk" PRIMARY KEY, btree (attribute_id, category_id)
    "attribute_category__attribute_id_fk_idx" btree (attribute_id)
    "attribute_category__category_id_fk_idx" btree (category_id) CLUSTER
Foreign-key constraints:
    "attribute_category_attribute_fk" FOREIGN KEY (attribute_id) 

Re: [PERFORM] in memory views

2006-05-10 Thread Thomas Vatter

Kris Jurka wrote:




On Wed, 10 May 2006, Thomas Vatter wrote:

Yes, the difference between the psql command line and the application is 
6 seconds versus 40 seconds. It is exactly the step resultSet = 
executeQuery() that needs 40 seconds. I use next() as a cursor 
through the resultSet, but I fear this is not enough; do I have to 
use createStatement(resultSetType, resultSetConcurrency) or 
prepareStatement(resultSetType, resultSetConcurrency) to achieve the 
cursor behaviour?



http://jdbc.postgresql.org/documentation/81/query.html#query-with-cursor

Kris Jurka



I was just returning to my mailbox to report success; you were just a bit 
faster than my e-mail. I have found the setFetchSize function, and it 
reduces the delay to 6 seconds. Thanks a lot to all who helped; this was 
really great support, and I am glad the problem is solved.


tom
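For anyone hitting the same issue, here is a minimal sketch of cursor-based fetching following the JDBC documentation linked above. The connection URL, credentials, table name, and fetch size are placeholders, not from the original thread; note that cursor mode requires autocommit off and a forward-only result set (the createStatement default).

```java
import java.sql.*;

public class CursorFetch {
    public static void main(String[] args) throws SQLException {
        // Placeholder connection details -- adjust for your environment.
        Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost/mydb", "user", "password");

        // Cursor-based fetching requires autocommit off; otherwise the
        // driver fetches the whole result set up front.
        conn.setAutoCommit(false);

        Statement st = conn.createStatement();
        st.setFetchSize(1000);  // rows fetched per round trip to the server

        ResultSet rs = st.executeQuery("SELECT * FROM big_table");
        while (rs.next()) {
            // Process one row at a time; only ~1000 rows are
            // held in client memory at any moment.
        }
        rs.close();
        st.close();
        conn.commit();
        conn.close();
    }
}
```

Without setFetchSize (or with autocommit on), executeQuery() materializes the entire result set client-side before returning, which matches the 40-second delay described above.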



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Same query - Slow in production

2006-05-10 Thread Brian Wipf
I added the qualifier AND cp.category_id = 1000962 to the EXISTS  
subquery (in addition to the existing cp.category_id = ac.category_id).
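In context, the modified EXISTS subquery looks like this (abridged; only the marked line is new relative to the original query):

```sql
EXISTS (SELECT 'X'
        FROM   product_attribute_value pav,
               category_product cp
        WHERE  pav.attribute_id = ac.attribute_id AND
               pav.status_code IS NULL AND
               pav.product_id = cp.product_id AND
               cp.category_id = ac.category_id AND
               cp.category_id = 1000962 AND  -- added literal qualifier
               cp.product_is_active = 'true' AND
               cp.product_status_code = 'complete')
```

The literal gives the planner a constant to estimate against, instead of the correlated ac.category_id reference alone.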


Now I am getting a much better query plan on our production server:

Index Scan using attribute_category__category_id_fk_idx on  
attribute_category ac  (cost=0.00..485.71 rows=7 width=4) (actual  
time=0.104..0.351 rows=5 loops=1)

   Index Cond: (category_id = 1000962)
   Filter: (((is_browsable)::text = 'true'::text) AND (subplan))
   SubPlan
         ->  Nested Loop  (cost=0.00..24.77 rows=1 width=0) (actual time=0.058..0.058 rows=1 loops=5)
               ->  Index Scan using x_category_product__category_id_fk_idx on category_product cp  (cost=0.00..6.01 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=5)
                     Index Cond: ((category_id = $1) AND (category_id = 1000962))
                     Filter: (((product_is_active)::text = 'true'::text) AND ((product_status_code)::text = 'complete'::text))
               ->  Index Scan using product_attribute_value__product_id_fk_idx on product_attribute_value pav  (cost=0.00..18.75 rows=1 width=4) (actual time=0.041..0.041 rows=1 loops=5)
                     Index Cond: (pav.product_id = "outer".product_id)
                     Filter: ((attribute_id = $0) AND (status_code IS NULL))

Total runtime: 0.558 ms
(12 rows)

It is using the x_category_product__category_id_fk_idx on  
category_product instead of the  
category_product__category_id_is_active_and_status_idx index as on  
our backup server. Still not sure what's causing the differences in  
query execution between the servers, but at least the query is fast  
again.


Brian

On 10-May-06, at 4:39 PM, Brian Wipf wrote:

I'm trying to determine why an identical query is running  
approximately 500 to 1000 times slower on our production database  
compared to our backup database server.


Both database servers are dual 2.3 GHz G5 Xserves running  
PostgreSQL 8.1.3; both are configured with 8GB of RAM with  
identical shared memory settings; both postgresql.conf files are  
identical; both databases have identical indexes defined.


The three relevant tables are all clustered the same, although I'm  
not sure when clustering was last performed on either server. All  
three tables have recently been analyzed on both servers.


The different explain plans for this query seem to be consistent on  
both servers regardless of category and the production server is  
consistently and drastically slower than the backup server.


If anyone has any ideas on how to have the production server  
generate the same explain plan as the backup server, or can suggest  
anything I might want to try, I would greatly appreciate it.


Brian Wipf
ClickSpace Interactive Inc.
[EMAIL PROTECTED]

Here's the query:

SELECT  ac.attribute_id
FROMattribute_category ac
WHERE   is_browsable = 'true' AND
category_id = 1000962 AND
EXISTS  (   SELECT  'X'
FROMproduct_attribute_value pav,
category_product cp
WHERE   pav.attribute_id = ac.attribute_id AND
pav.status_code is null AND
pav.product_id = cp.product_id AND
cp.category_id = ac.category_id AND
cp.product_is_active = 'true' AND
cp.product_status_code = 'complete'
)

Explain plans:

Fast (backup server):
 Index Scan using attribute_category__category_id_fk_idx on  
attribute_category ac  (cost=0.00..47943.34 rows=7 width=4) (actual  
time=0.110..0.263 rows=5 loops=1)

   Index Cond: (category_id = 1000962)
   Filter: (((is_browsable)::text = 'true'::text) AND (subplan))
   SubPlan
         ->  Nested Loop  (cost=0.00..7983.94 rows=3 width=0) (actual time=0.043..0.043 rows=1 loops=5)
               ->  Index Scan using category_product__category_id_is_active_and_status_idx on category_product cp  (cost=0.00..4362.64 rows=1103 width=4) (actual time=0.013..0.015 rows=2 loops=5)
                     Index Cond: ((category_id = $1) AND ((product_is_active)::text = 'true'::text) AND ((product_status_code)::text = 'complete'::text))
               ->  Index Scan using product_attribute_value__product_id_fk_idx on product_attribute_value pav  (cost=0.00..3.27 rows=1 width=4) (actual time=0.016..0.016 rows=1 loops=8)
                     Index Cond: (pav.product_id = "outer".product_id)
                     Filter: ((attribute_id = $0) AND (status_code IS NULL))

Total runtime: 0.449 ms
(11 rows)

Slow (production server):
 Index Scan using attribute_category__category_id_fk_idx on  
attribute_category ac  (cost=0.00..107115.90 rows=7 width=4)  
(actual time=1.472..464.437 rows=5 loops=1)

   Index Cond: (category_id = 1000962)
   Filter: (((is_browsable)::text = 'true'::text) AND (subplan))
   SubPlan
         ->  Nested Loop  (cost=18.33..23739.70 rows=4 width=0) (actual time=92.870..92.870 rows=1 

Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal

2006-05-10 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes:
 On Tue, May 09, 2006 at 03:13:01PM -0400, Tom Lane wrote:
 PFC [EMAIL PROTECTED] writes:
 Fun thing is, the rowcount from a temp table (which is the problem here)  
 should be available without ANALYZE ; as the temp table is not concurrent,  
 it would be simple to inc/decrement a counter on INSERT/DELETE...
 
 No, because MVCC rules still apply.

 But can anything ever see more than one version of what's in the table?

Yes, because there can be more than one active snapshot within a single
transaction (think about volatile functions in particular).

 Speaking of which, if a temp table is defined as ON COMMIT DROP or
 DELETE ROWS, there shouldn't be any need to store xmin/xmax, only
 cmin/cmax, correct?

No; you forgot about subtransactions.
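A sketch of the subtransaction case Tom is referring to (the table and savepoint names are made up): even with an ON COMMIT DELETE ROWS temp table, everything inside one transaction, a rolled-back subtransaction forces per-row visibility tracking beyond cmin/cmax.

```sql
BEGIN;
CREATE TEMP TABLE scratch (v int) ON COMMIT DELETE ROWS;

INSERT INTO scratch VALUES (1);   -- inserted by the top-level transaction
SAVEPOINT sp;
INSERT INTO scratch VALUES (2);   -- inserted by a subtransaction
ROLLBACK TO sp;

-- The row inserted under the aborted subtransaction must now be
-- invisible, which is tracked via its xmin (the subtransaction's id),
-- not via command counters alone.
SELECT * FROM scratch;            -- returns only the row (1)
COMMIT;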

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Same query - Slow in production

2006-05-10 Thread Tom Lane
Brian Wipf [EMAIL PROTECTED] writes:
 I'm trying to determine why an identical query is running  
 approximately 500 to 1000 times slower on our production database  
 compared to our backup database server.

It looks to me like it's pure luck that the query is fast on the backup
server.  The outer side of the EXISTS' join is being badly misestimated:

  ->  Index Scan using category_product__category_id_is_active_and_status_idx on category_product cp  (cost=0.00..4362.64 rows=1103 width=4) (actual time=0.013..0.015 rows=2 loops=5)
        Index Cond: ((category_id = $1) AND ((product_is_active)::text = 'true'::text) AND ((product_status_code)::text = 'complete'::text))

If there actually had been 1100 matching rows instead of 2, the query
would have run 550 times slower, putting it in the same ballpark as
the other plan.  So what I'm guessing is that the planner sees these
two plans as being nearly the same cost, and small differences in the
stats between the two databases are enough to tip its choice in one
direction or the other.
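As a sanity check, the "550 times slower" figure follows directly from the estimated-versus-actual row counts in the quoted plan:

```python
# Numbers taken from the explain plan quoted earlier in this thread.
estimated_rows = 1103   # planner's estimate for the category_product scan
actual_rows = 2         # rows the scan actually returned per loop

slowdown = estimated_rows / actual_rows
print(slowdown)  # 551.5 -- roughly the "550 times slower" figure
```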

So what you want, of course, is to improve that rowcount estimate.
I suppose the reason it's so bad is that we don't have multicolumn
statistics ... is there a strong correlation between product_is_active
and product_status_code?  If so, it might be worth your while to find a
way to merge them into one column.
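One possible way to act on that suggestion (the merged column name and separator are illustrative, not from the thread):

```sql
-- Combine the two correlated flags into one column so the planner's
-- per-column statistics capture their joint distribution.
ALTER TABLE category_product
    ADD COLUMN active_status text;

UPDATE category_product
   SET active_status = product_is_active || '/' || product_status_code;

-- The EXISTS predicate would then test a single column:
--   cp.active_status = 'true/complete'
ANALYZE category_product;
```

With a single column, ANALYZE gathers accurate frequencies for the combined value instead of multiplying two independent (and here strongly correlated) selectivities.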

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster