Re: [PERFORM] Join on incompatible types
Laurent Martelli wrote:
>>> Should I understand that a join on incompatible types (such as
>>> integer and varchar) may lead to bad performance?
>
> Shridhar> Conversely, you should enforce strict type compatibility
> Shridhar> in comparisons for getting any good plans..:-)
>
> Ha ha, now I understand why a query of mine was so sluggish. Is there
> a chance I could achieve the good performance without having the same
> types? I've tried a CAST in the query, but it's even a little worse
> than without it. However, using a view to cast integers into varchar
> gives acceptable results (see at the end). I'm using PostgreSQL 7.3.4.

I am stripping the analyze outputs and jumping directly to the end.

Can you try the following?

1. Make all the fields integer in all the tables.

2. Try this query:

   EXPLAIN ANALYZE SELECT * FROM lists JOIN classes
     ON classes.id = lists.value
    WHERE lists.id = '16'::integer;

How does it affect the runtime?

Shridhar

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your
       message can get through to the mailing list cleanly
[PERFORM] TEXT column and indexing
I have this table:

db=> \d object_property_value
                Table "db.object_property_value"
        Column         |  Type   |         Modifiers
-----------------------+---------+----------------------------
 obj_property_value_id | integer | not null default nextval(...
 obj_property_id       | integer | not null
 value                 | text    |
Indexes:
    "object_property_value_pkey" primary key, btree (obj_property_value_id)
    "opv_obj_property_id_ix" btree (obj_property_id)
    "opv_v_ix" btree (substr(value, 1, 128))
Foreign-key constraints:
    "object_property_fkey" FOREIGN KEY (obj_property_id)
        REFERENCES object_property(obj_property_id)
        ON UPDATE CASCADE ON DELETE CASCADE

(long lines edited for readability).

The table contains about 250,000 records and will grow at regular
intervals. The 'value' column contains text of various lengths. The
table is VACUUMed and ANALYZEd regularly and waxed on Sunday mornings.
Database encoding is Unicode. Server is 7.4RC1 or 7.4RC2 and will be
7.4 ASAP.

I want to query this table to match a specific value along the lines of:

SELECT obj_property_id
  FROM object_property_value opv
 WHERE opv.value = 'foo'

There will only be a few (at the moment 2 or 3) rows exactly matching
'foo'. This query will only be performed with values containing less
than around 100 characters, which account for ca. 10% of all rows in
the table. The performance is of course lousy:

db=> EXPLAIN
db-> SELECT obj_property_id
db->   FROM object_property_value opv
db->  WHERE opv.value = 'foo';
                                QUERY PLAN
---------------------------------------------------------------------------
 Seq Scan on object_property_value opv  (cost=0.00..12258.26 rows=2 width=4)
   Filter: (value = 'foo'::text)
(2 rows)

However, if I create a VARCHAR field containing the first 128
characters of the text field and index that, an index scan is used:

db=> EXPLAIN
db-> SELECT obj_property_id
db->   FROM object_property_value opv
db->  WHERE opv.opv_vc = 'foo';
                                QUERY PLAN
---------------------------------------------------------------------------
 Index Scan using opv_vc_ix on object_property_value opv
   (cost=0.00..6.84 rows=2 width=4)
   Index Cond: ((opv_vc)::text = 'foo'::text)

The question is therefore: can I get an index to work on the TEXT
column? It is currently indexed with:

    "opv_v_ix" btree (substr(value, 1, 128))

which doesn't appear to have any effect. I am probably missing
something obvious though. I can live with maintaining an extra VARCHAR
column but would like to keep the table as simple as possible.

(For anyone wondering: yes, I can access the data using tsearch2 - via
a different table in this case - but this is not always appropriate).

Thanks for any hints.

Ian Barwick
[EMAIL PROTECTED]
Re: [PERFORM] Join on incompatible types
> "Shridhar" == Shridhar Daithankar <[EMAIL PROTECTED]> writes:

Shridhar> Laurent Martelli wrote:
>> Should I understand that a join on incompatible types (such as
>> integer and varchar) may lead to bad performance?

Shridhar> Conversely, you should enforce strict type compatibility
Shridhar> in comparisons for getting any good plans..:-)

>> Ha ha, now I understand why a query of mine was so sluggish. Is
>> there a chance I could achieve the good performance without having
>> the same types? I've tried a CAST in the query, but it's even a
>> little worse than without it. However, using a view to cast
>> integers into varchar gives acceptable results (see at the end).
>> I'm using PostgreSQL 7.3.4.

Shridhar> I am stripping the analyze outputs and jumping directly to
Shridhar> the end.

Shridhar> Can you try the following?

Shridhar> 1. Make all the fields integer in all the tables.

I can't do this because lists.value contains non-integer data which
does not refer to a classes.id value. It may sound weird; this is
because it's a generic schema for a transparent persistence framework.
The solution for me would rather be to have varchar everywhere.

Shridhar> 2. Try this query: EXPLAIN ANALYZE SELECT * FROM lists
Shridhar> JOIN classes ON classes.id = lists.value WHERE
Shridhar> lists.id = '16'::integer;

Shridhar> How does it affect the runtime?

-- 
Laurent Martelli
[EMAIL PROTECTED]    Java Aspect Components
http://www.aopsys.com/    http://jac.aopsys.com
Re: [PERFORM] Join on incompatible types
Laurent Martelli wrote:
> Shridhar> 1. Make all the fields integer in all the tables.
>
> I can't do this because lists.value contains non-integer data which
> does not refer to a classes.id value. It may sound weird; this is
> because it's a generic schema for a transparent persistence framework.

Fine, I understand. So instead of using the field value, can you use an
integer version of that field? (Was it one of your queries that used
that? I deleted the OP.)

> The solution for me would rather be to have varchar everywhere.

You would need to cast every occurrence of that varchar field
appropriately, to start with. The performance might suffer as well for
numbers.

Shridhar> 2. Try this query: EXPLAIN ANALYZE SELECT * FROM lists
Shridhar> JOIN classes ON classes.id = lists.value WHERE
Shridhar> lists.id = '16'::integer;

Make that classes.id = lists.value::integer. Try that. The aim is
absolute type compatibility. If the types aren't exactly the same, the
plan is effectively dead.

I would say PostgreSQL enforces good habits in its application
developers, from a cultural POV. Had C refused to compile without such
strict type compatibility, we wouldn't have to worry about
16-bit/32-bit and 64-bit software. Just upgrade the compiler and
everything is damn good..:-) I doubt C would have been so popular with
such strict type checking, but that is another issue. I think Pascal
enforces such strict checking.. not sure though..

Shridhar
Re: [PERFORM] Join on incompatible types
Laurent Martelli wrote:
> Shridhar> classes.id = lists.value::integer.
>
> With classes.id of type integer and lists.value of type varchar, I
> get "ERROR: Cannot cast type character varying to integer", which is
> not such a surprise.

Try the to_number function to get a number out of the string, then cast
it to integer.

http://developer.postgresql.org/docs/postgres/functions-formatting.html

I hope that works. I don't have a PostgreSQL installation handy here..

Shridhar
Re: [PERFORM] Join on incompatible types
> "Shridhar" == Shridhar Daithankar <[EMAIL PROTECTED]> writes:

[...]

Shridhar> 2. Try this query: EXPLAIN ANALYZE SELECT * FROM lists
Shridhar> JOIN classes ON classes.id = lists.value WHERE
Shridhar> lists.id = '16'::integer;

Shridhar> classes.id = lists.value::integer.

With classes.id of type integer and lists.value of type varchar, I get
"ERROR: Cannot cast type character varying to integer", which is not
such a surprise.

Thanks for your help anyway.

-- 
Laurent Martelli
[EMAIL PROTECTED]    Java Aspect Components
http://www.aopsys.com/    http://jac.aopsys.com
Re: [PERFORM] constant vs function param differs in performance
Dear Tom,

Thanks for your early response. An addition: the nastier difference
increased by adding an index (it was an essential index for this query):

  func with param     improved from 2700ms to 2300ms
  func with constant  improved from  400ms to   31ms
  inline query        improved from  390ms to    2ms

So am I reading this correctly, and it is completely normal and can't
be helped? (I couldn't try 7.4 yet.)

In case it reveals something:

--- cut here ---
SELECT field
  FROM (SELECT field, sum(something)=0 AS boolvalue
          FROM (SELECT * FROM subselect1 NATURAL LEFT JOIN subselect2
                UNION
                SELECT * FROM subselect3 NATURAL LEFT JOIN subselect4
               ) AS u
         GROUP BY field) AS t
 WHERE NOT boolvalue
 ORDER BY simple_sql_func_returns_bool(field) DESC
 LIMIT 1;
--- cut here ---

G.

- Original Message -
From: "Tom Lane" <[EMAIL PROTECTED]>
Sent: Friday, November 14, 2003 9:59 PM

> "SZŰCS Gábor" <[EMAIL PROTECTED]> writes:
> > I have two SQL functions that produce different times and I can't
> > understand why.
>
> The planner often produces different plans when there are constants
> in WHERE clauses than when there are variables, because it can get
> more accurate ideas of how many rows will be retrieved.
>
> regards, tom lane
Re: [PERFORM] problem with select count(*) ..
On Thu, Nov 20, 2003 at 07:07:30 +0530, Rajesh Kumar Mallah
<[EMAIL PROTECTED]> wrote:
>
> If I dump and reload, the performance improves and it takes < 1 sec.
> This is what I have been doing since the upgrade. But it's not a
> solution.
>
> The VACUUM FULL is at the end of a loading batch SQL file which makes
> lots of inserts, deletes and updates.

If a dump and reload fixes your problem, most likely you have a lot of
dead tuples in the table. You might need to run vacuum more often. You
might also have an open transaction that is preventing vacuum full from
cleaning up the table.
Re: [PERFORM] More detail on settings for pgavd?
Shridhar,

> Will look into it. Give me a day or so. I am planning a couple of
> other patches as well. Maybe over the weekend.

Thanks, appreciated. As I said, I don't think the settings themselves
are wrong, I think the documentation is.

What are you patching?

-- 
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] TEXT column and indexing
On Wed, 19 Nov 2003, Ian Barwick wrote:

> I have this table:
>
> db=> \d object_property_value
>                Table "db.object_property_value"
>        Column         |  Type   | Modifiers
> -----------------------+---------+----------------------------
> obj_property_value_id | integer | not null default nextval(...
> obj_property_id       | integer | not null
> value                 | text    |
> Indexes:
>     "object_property_value_pkey" primary key, btree (obj_property_value_id)
>     "opv_obj_property_id_ix" btree (obj_property_id)
>     "opv_v_ix" btree (substr(value, 1, 128))
> Foreign-key constraints:
>     "object_property_fkey" FOREIGN KEY (obj_property_id)
>        REFERENCES object_property(obj_property_id)
>        ON UPDATE CASCADE ON DELETE CASCADE
>
> I want to query this table to match a specific value along the lines
> of:
>
> SELECT obj_property_id
>   FROM object_property_value opv
>  WHERE opv.value = 'foo'
>
> The question is therefore: can I get an index to work on the TEXT
> column? It is currently indexed with:
>     "opv_v_ix" btree (substr(value, 1, 128))
>
> which doesn't appear to have any effect. I am probably missing
> something obvious though. I can live with maintaining an extra
> VARCHAR column but ...

You probably need to be querying like:

  WHERE substr(value, 1, 128) = 'foo';

in order to use that index. While substr(txtcol, 1, 128) happens to
have the property that it would probably be useful in a search against
a short constant string, that's an internal property of that function
which the planner does not know about; the index is only considered
when the query uses the indexed expression itself.
Re: [PERFORM] TEXT column and indexing
On Wed, 19 Nov 2003 10:18:18 +0100, Ian Barwick <[EMAIL PROTECTED]>
wrote:

> Indexes:
> [...]
>     "opv_v_ix" btree (substr(value, 1, 128))
>
> SELECT obj_property_id
>   FROM object_property_value opv
>  WHERE opv.value = 'foo'

Try

  ... WHERE substr(opv.value, 1, 128) = 'foo'

HTH.

Servus
 Manfred
Re: [PERFORM] duration logging setting in 7.4
Ryszard Lach wrote:
> If I turn on log_min_duration_statement (i.e. set it to 0),
> log_statement and log_duration, then I receive something like this:
>
> Nov 17 22:33:27 postgres[22945]: [29231-1] LOG: statement:
> Nov 17 22:33:27 postgres[22945]: [29232-1] LOG: duration: 0.198 ms
> Nov 17 22:33:27 postgres[22945]: [29233-1] LOG: duration: 0.198 ms statement:
> Nov 17 22:33:27 postgres[22946]: [29231-1] LOG: statement:
> Nov 17 22:33:27 postgres[22946]: [29232-1] LOG: duration: 0.191 ms
> Nov 17 22:33:27 postgres[22946]: [29233-1] LOG: duration: 0.191 ms statement:
> Nov 17 22:33:27 postgres[22678]: [147134-1] LOG: statement: select * from cms where id=1465
> Nov 17 22:33:27 postgres[22679]: [154907-1] LOG: statement:
> Nov 17 22:33:27 postgres[22679]: [154908-1] LOG: duration: 0.867 ms
> Nov 17 22:33:27 postgres[22679]: [154909-1] LOG: duration: 0.867 ms statement:
> Nov 17 22:33:27 postgres[22678]: [147135-1] LOG: duration: 1.458 ms
> Nov 17 22:33:27 postgres[22678]: [147136-1] LOG: duration: 1.458 ms statement: select * from cms where id=1465
> Nov 17 22:33:27 postgres[22680]: [158366-1] LOG: statement:
> Nov 17 22:33:27 postgres[22680]: [158367-1] LOG: duration: 0.620 ms
> Nov 17 22:33:27 postgres[22680]: [158368-1] LOG: duration: 0.620 ms statement:
> Nov 17 22:33:27 postgres[22681]: [161294-1] LOG: statement:
> Nov 17 22:33:27 postgres[22681]: [161295-1] LOG: duration: 0.650 ms
>
> It seems that log_duration is responsible only for the "duration:"
> lines, log_statement for the "statement:" ones, and
> log_min_duration_statement for the "duration: .* statement:" ones. I
> think that the above output should exclude loss of data by syslog
> from further deliberations. Do you think that could be a bug?

Yes, the problem is not related to syslog. Are you using prepared
queries, perhaps? I don't think those show the query, but it seems we
should display something better than blanks.

> There is one other thing: logs from the same database running on 7.3
> and the same application contained lines like 'select
> getdatabaseencoding()', 'select datestyle()' and similar (not used by
> the application explicitly, probably added by the JDBC driver); now
> they are missing - maybe this is the problem?

No, those are missing because the new 7.4 wire protocol doesn't require
those queries anymore --- the data is sent automatically.

-- 
Bruce Momjian                   | http://candle.pha.pa.us
[EMAIL PROTECTED]               | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup.    | Newtown Square, Pennsylvania 19073
Re: [PERFORM] TEXT column and indexing
On Wednesday 19 November 2003 17:35, Stephan Szabo wrote:
> On Wed, 19 Nov 2003, Ian Barwick wrote:
> > I have this table: (...)
>
> You probably need to be querying like:
>   WHERE substr(value, 1, 128) = 'foo';
> in order to use that index.
>
> While substr(txtcol, 1, 128) happens to have the property that it
> would probably be useful in a search against a short constant string,
> that's an internal property of that function.

That's the one :-). Thanks!

Ian Barwick
[EMAIL PROTECTED]
Re: [PERFORM] TEXT column and indexing
On Wednesday 19 November 2003 17:26, you wrote:
> On Wed, 19 Nov 2003 10:18:18 +0100, Ian Barwick <[EMAIL PROTECTED]>
> wrote:
> > Indexes:
> > [...]
> >     "opv_v_ix" btree (substr(value, 1, 128))
> >
> > SELECT obj_property_id
> >   FROM object_property_value opv
> >  WHERE opv.value = 'foo'
>
> Try
>   ... WHERE substr(opv.value, 1, 128) = 'foo'
>
> HTH.

Yup:

db=> explain
db-> SELECT obj_property_id
db->   FROM object_property_value opv
db->  WHERE substr(opv.value, 1, 128) = 'foo';
                                QUERY PLAN
---------------------------------------------------------------------------
 Index Scan using opv_v_ix on object_property_value opv
   (cost=0.00..4185.78 rows=1101 width=4)
   Index Cond: (substr(value, 1, 128) = 'foo'::text)
(2 rows)

Many thanks

Ian Barwick
[EMAIL PROTECTED]
Re: [PERFORM] duration logging setting in 7.4
On Tue, Nov 18, 2003 at 10:07:48AM -0500, Bruce Momjian wrote:
>
> Wow, that is strange. If you don't use syslog, do you see the proper
> output?

I've just checked this. It behaves exactly the same way.

> If you turn on log_statement, do you see the statements?

If I turn on log_min_duration_statement (i.e. set it to 0),
log_statement and log_duration, then I receive something like this:

Nov 17 22:33:27 postgres[22945]: [29231-1] LOG: statement:
Nov 17 22:33:27 postgres[22945]: [29232-1] LOG: duration: 0.198 ms
Nov 17 22:33:27 postgres[22945]: [29233-1] LOG: duration: 0.198 ms statement:
Nov 17 22:33:27 postgres[22946]: [29231-1] LOG: statement:
Nov 17 22:33:27 postgres[22946]: [29232-1] LOG: duration: 0.191 ms
Nov 17 22:33:27 postgres[22946]: [29233-1] LOG: duration: 0.191 ms statement:
Nov 17 22:33:27 postgres[22678]: [147134-1] LOG: statement: select * from cms where id=1465
Nov 17 22:33:27 postgres[22679]: [154907-1] LOG: statement:
Nov 17 22:33:27 postgres[22679]: [154908-1] LOG: duration: 0.867 ms
Nov 17 22:33:27 postgres[22679]: [154909-1] LOG: duration: 0.867 ms statement:
Nov 17 22:33:27 postgres[22678]: [147135-1] LOG: duration: 1.458 ms
Nov 17 22:33:27 postgres[22678]: [147136-1] LOG: duration: 1.458 ms statement: select * from cms where id=1465
Nov 17 22:33:27 postgres[22680]: [158366-1] LOG: statement:
Nov 17 22:33:27 postgres[22680]: [158367-1] LOG: duration: 0.620 ms
Nov 17 22:33:27 postgres[22680]: [158368-1] LOG: duration: 0.620 ms statement:
Nov 17 22:33:27 postgres[22681]: [161294-1] LOG: statement:
Nov 17 22:33:27 postgres[22681]: [161295-1] LOG: duration: 0.650 ms

It seems that log_duration is responsible only for the "duration:"
lines, log_statement for the "statement:" ones, and
log_min_duration_statement for the "duration: .* statement:" ones. I
think that the above output should exclude loss of data by syslog from
further deliberations. Do you think that could be a bug?

There is one other thing: logs from the same database running on 7.3
and the same application contained lines like 'select
getdatabaseencoding()', 'select datestyle()' and similar (not used by
the application explicitly, probably added by the JDBC driver); now
they are missing - maybe this is the problem?

Richard.

-- 
"First they ignore you. Then they laugh at you. Then they fight you.
Then you win." - Mohandas Gandhi.
Re: [PERFORM] duration logging setting in 7.4
On Wed, Nov 19, 2003 at 01:58:27PM -0500, Bruce Momjian wrote:
> Ryszard Lach wrote:
>
> > There is one other thing: logs from the same database running on
> > 7.3 and the same application contained lines like 'select
> > getdatabaseencoding()', 'select datestyle()' and similar (not used
> > by the application explicitly, probably added by the JDBC driver);
> > now they are missing - maybe this is the problem?
>
> No, those are missing because the new 7.4 wire protocol doesn't
> require those queries anymore --- the data is sent automatically.

Maybe this is the explanation? Because of some charset-related problems
we are still using an old (AFAIR modified) version of the JDBC driver.
I'm not a Java programmer and don't know what the JDBC driver does, but
maybe it sends those queries from the client side and the server
doesn't know what to do with them? I'll ask our programmers to try the
7.4 driver and tell you about the results.

Richard.

-- 
"First they ignore you. Then they laugh at you. Then they fight you.
Then you win." - Mohandas Gandhi.
Re: [PERFORM] duration logging setting in 7.4
Ryszard Lach wrote:
> On Wed, Nov 19, 2003 at 01:58:27PM -0500, Bruce Momjian wrote:
> > Ryszard Lach wrote:
> > > [...] maybe this is the problem?
> >
> > No, those are missing because the new 7.4 wire protocol doesn't
> > require those queries anymore --- the data is sent automatically.
>
> Maybe this is the explanation? Because of some charset-related
> problems we are still using an old (AFAIR modified) version of the
> JDBC driver. I'm not a Java programmer and don't know what the JDBC
> driver does, but maybe it sends those queries from the client side
> and the server doesn't know what to do with them? I'll ask our
> programmers to try the 7.4 driver and tell you about the results.

Also, try plain psql and issue a query and see if it appears.

-- 
Bruce Momjian                   | http://candle.pha.pa.us
[EMAIL PROTECTED]               | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup.    | Newtown Square, Pennsylvania 19073
Re: [PERFORM] More detail on settings for pgavd?
Josh Berkus wrote:
> I was looking at the -V/-v and -A/-a settings in pgavd, and really
> don't understand how the calculation works. According to the readme,
> if I set -v to 1000 and -V to 2 (the defaults) for a table with
> 10,000 rows, pgavd would only vacuum after 21,000 rows had been
> updated. This seems wrong.

No, that is correct. It is calculated as

  threshold = base + scale * number of current rows

which translates to

  21,000 = 1000 + 2 * 10,000

However, I do not agree with this logic entirely. It pegs the next
vacuum w.r.t. the current table size, which is not always a good thing.
I would rather vacuum the table at 2000 updates, which is what you
probably want.

Furthermore, the analyze threshold depends upon inserts + updates. I
think it should also depend upon deletes, for obvious reasons.

> Can you clear this up a little? I'd like to tweak these settings but
> can't without being better acquainted with the calculation.

What did you expect in the above example? It is not difficult to tweak
pg_autovacuum's calculations. For testing we can play around.

> Also, you may want to reverse your default ratio for vacuum/analyze
> frequency. True, analyze is a less expensive operation than vacuum,
> but it's also needed less often -- only when the *distribution* of
> data changes. I've seen databases where the optimal vacuum/analyze
> frequency was every 10 min/once per day.

OK. Vacuum and analyze thresholds are calculated with the same formula
as shown above, but with different parameters, as follows:

  vacthresh = vacbase + vacscale * ntuples
  anathresh = anabase + anascale * ntuples

What you are asking for is

  vacthresh = vacbase * vacscale
  anathresh = anabase + anascale * ntuples

Would that tilt the balance the way you want? I.e. an analyze is
triggered when a fixed *percentage* of the table changes, but a vacuum
is triggered when a fixed *number of rows* are changed.

I am all for experimentation. If you have real-life data to play with,
I can give you some patches to play around with.

And BTW, this is all the brainchild of Matthew O'Connor (correct? I am
not good at either names or spellings). The way I wrote pgavd
originally, each table got a separate threshold..:-). That was rather a
brute-force approach.

Shridhar
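The threshold arithmetic above is simple enough to sketch as a tiny calculation (the function and variable names here are mine, not pg_autovacuum's):

```python
def vacuum_threshold(base, scale, ntuples):
    """pg_autovacuum-style trigger point: a table is vacuumed once
    base + scale * (current row count) tuples have changed."""
    return base + scale * ntuples

# The defaults discussed in the thread: -v 1000 (base), -V 2 (scale),
# applied to a 10,000-row table.
print(vacuum_threshold(1000, 2, 10000))  # 21000
```

This makes the behavior Josh observed concrete: with the default scale of 2, the trigger point always exceeds the table's own row count, so the table must turn over more than twice before the first vacuum fires.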
Re: [PERFORM] More detail on settings for pgavd?
Shridhar,

> However I do not agree with this logic entirely. It pegs the next
> vacuum w.r.t. the current table size, which is not always a good
> thing.

No, I think the logic's fine, it's the numbers which are wrong. We want
to vacuum when updates reach between 5% and 15% of total rows. NOT when
updates reach 110% of total rows ... that's much too late.

Hmmm ... I also think the threshold level needs to be lowered; I guess
the purpose was to prevent continuous re-vacuuming of small tables?
Unfortunately, in the current implementation, the result is that small
tables never get vacuumed at all.

So for defaults, I would peg -V at 0.1 and -v at 100, so our default
calculation for a table with 10,000 rows is:

  100 + (0.1 * 10,000) = 1,100 rows.

> I would rather vacuum the table at 2000 updates, which is what you
> probably want.

Not necessarily. This would be painful if the table has 10,000,000
rows. It *should* be based on a % of rows.

> Furthermore, the analyze threshold depends upon inserts + updates. I
> think it should also depend upon deletes, for obvious reasons.

Yes. The vacuum threshold is counting deletes, I hope?

> What did you expect in the above example? It is not difficult to
> tweak pg_autovacuum's calculations. For testing we can play around.

Can I set the settings to decimals, or are they integers?

> vacthresh = vacbase * vacscale
> anathresh = anabase + anascale * ntuples

Nope, see above.

My comment about the frequency of vacuums vs. analyzes is that
currently the *default* is to analyze twice as often as you vacuum.
Based on my experience as a PG admin on a variety of databases, I
believe that the default should be to analyze half as often as you
vacuum.

> I am all for experimentation. If you have real-life data to play
> with, I can give you some patches to play around with.

I will have real data very soon.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco
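To see why the proposed defaults behave so differently from the current ones, here is a small comparison (a sketch of the same base + scale * rows formula; the helper name is mine):

```python
def threshold(base, scale, ntuples):
    # pgavd-style trigger: act after base + scale * rows have changed
    return base + scale * ntuples

rows = 10000
current = threshold(1000, 2, rows)    # current defaults: -v 1000, -V 2
proposed = threshold(100, 0.1, rows)  # proposed defaults: -v 100, -V 0.1

print(current)   # 21000 -- more than the whole table must change first
print(proposed)  # 1100.0 -- roughly 11% of the table
```

The proposed numbers keep the trigger proportional to table size (the scale term dominates for large tables) while the small base term stops tiny tables from being vacuumed continuously.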
Re: [PERFORM] Backup/restore of pg_statistics
Joel Jacobson <[EMAIL PROTECTED]> writes:
> I understand that it is not possible to occasionally re-plan the
> queries in a PL/pgSQL function without dropping and re-creating the
> function.

Huh? You only need to start a fresh connection.

regards, tom lane
Re: [PERFORM] High Processor consumption
Benjamin Bostow wrote:
> I haven't modified any of the settings. I did try changing shmmax
> from 32MB to 256MB but didn't see much change in the processor usage.
> The init script that runs to start the server uses the following:
>
> su -l postgres -s /bin/sh -c "/usr/bin/pg_ctl -D $PGDATA -p
> /usr/bin/postmaster -o \"-i -N 128 -B 256\" start > /dev/null 2>&1"
> < /dev/null

Lol.. -B 256 does not mean 256MB of buffers. It means 2MB of buffers.
Each buffer is 8K. Try upping it to 1000 or 2000.

> I haven't modified postgresql.conf yet but am going through a book on
> performance tuning the server. If you can provide any suggestions or
> help, as I am new to postgres, it would be appreciated.

http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

HTH

Shridhar
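The -B arithmetic is easy to misread, so here is a quick sanity check (assuming PostgreSQL's default 8 KB page size; the helper name is mine):

```python
BLCKSZ = 8192  # PostgreSQL's default block size: each shared buffer is one 8 KB page

def shared_buffers_mb(nbuffers):
    """Total shared buffer cache, in MB, for 'postmaster -B <nbuffers>'."""
    return nbuffers * BLCKSZ / (1024 * 1024)

print(shared_buffers_mb(256))   # 2.0    -- the -B 256 above is only 2 MB
print(shared_buffers_mb(2000))  # 15.625 -- -B 2000 is roughly 16 MB
```

In other words, to actually dedicate 256 MB to shared buffers you would need -B 32768, not -B 256, though whether that much is advisable depends on the workload and available RAM.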