[PERFORM] view of view

2005-12-07 Thread Keith Worthington

Hi All,

I am working on an application that uses PostgreSQL.  One of the
functions of the application is to generate reports.  In order to keep
the code in the application simple, we create a view of the required data
in the database and then simply execute a SELECT * FROM
view_of_the_data;  All of the manipulation, and most of the time even the
ordering, is handled in the view.


My question is: how much performance degradation, if any, is there in
creating a view of a view?


IOW, if I have a view that ties together a couple of tables and
manipulates some data, which will perform better: a view that filters,
manipulates, and orders the data from the first view, or a view that
performs all the necessary calculations on the original tables?
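
For what it's worth, PostgreSQL expands a view's defining query into any
query that references it at rewrite time, so a view built on another view
is generally planned much like the equivalent flat query. A minimal sketch
with invented names, checkable with EXPLAIN:

    CREATE TABLE orders (id int, placed date, total numeric);

    CREATE VIEW recent_orders AS
        SELECT id, placed, total FROM orders
        WHERE placed > CURRENT_DATE - 30;

    CREATE VIEW big_recent_orders AS
        SELECT id, total FROM recent_orders
        WHERE total > 100;

    -- The plan should show a single scan of orders with both filters
    -- applied, much as if the two WHERE clauses lived in one flat view.
    EXPLAIN SELECT * FROM big_recent_orders ORDER BY total DESC;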


--
Kind Regards,
Keith



Re: [PERFORM] Performance degradation after successive UPDATE's

2005-12-07 Thread Bruno Wolff III
On Wed, Dec 07, 2005 at 14:14:31 +0200,
  Assaf Yaari <[EMAIL PROTECTED]> wrote:
> Hi Jan,
> 
> As I'm a novice with PostgreSQL, can you elaborate on the term FSM and
> the recommended settings?
http://developer.postgresql.org/docs/postgres/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-FSM
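
For illustration only (setting names as in 8.0; size them from the
"free space map" numbers printed at the end of a database-wide VACUUM
VERBOSE, not from this sketch):

    # postgresql.conf
    max_fsm_pages = 200000     # should cover the pages dirtied between vacuums
    max_fsm_relations = 1000   # >= number of tables and indexes with churn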

> BTW: I'm issuing VACUUM ANALYZE every 15 minutes (using cron) and have
> also changed the setting of fsync to false in postgresql.conf, but the
> time still seems to be growing.

You generally don't want fsync set to false.

> Also no other transactions are open.

Have you given us explain analyse samples yet?




Re: [PERFORM] Join the same row

2005-12-07 Thread Richard Huxton

Edison Azzi wrote:
> Richard Huxton escreveu:
>> However, even if you removed the condition on origem, I don't think
>> the planner will notice that it can eliminate the join. It's just too
>> unusual a case for the planner to have a rule for it.
>>
>> I might be wrong about the planner - I'm just another user. One of
>> the developers may correct me.
>
> You are right, the planner will not eliminate the join; see:
>
> select * from cta_pag a, cta_pag p where a.nrlancto=p.nrlancto and
> p.nrlancto = 21861;
>
> EXPLAIN:
> Nested Loop  (cost=0.00..11.48 rows=1 width=816)
>   ->  Index Scan using cta_pag_pk on cta_pag a  (cost=0.00..5.74 rows=1 width=408)
>         Index Cond: (21861::numeric = nrlancto)
>   ->  Index Scan using cta_pag_pk on cta_pag p  (cost=0.00..5.74 rows=1 width=408)
>         Index Cond: (nrlancto = 21861::numeric)
>
> I know this is too unusual a case, but I hoped that the planner could
> deal with this condition. I'm trying to speed things up without having
> to rewrite a bunch of queries. Now I'll have to think of another way
> to work around this issue.

Is the performance really so bad? All the data is guaranteed to be
cached for the second index-scan.


--
  Richard Huxton
  Archonet Ltd




Re: [PERFORM] First query is slow, subsequent queries fast

2005-12-07 Thread Stephan Vollmer
Hi Oleg, thanks for your quick reply!

Oleg Bartunov wrote:

> your cache is too low :) Try increasing shared_buffers; for example,
> for 2 GB I'd set it to 100,000

Ok, I set shared_buffers to 100,000 and indeed it makes a big
difference. Other queries than the ones I mentioned are faster, too.
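
(For reference: in 8.1, shared_buffers counts 8 kB buffers rather than
bytes, so a sketch of the resulting postgresql.conf line for a 2 GB
machine like this one would be roughly:

    shared_buffers = 100000    # 100000 x 8 kB = ~800 MB of shared cache

A server restart is needed for the change to take effect.)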

Thanks very much for your help,

- Stephan



Re: [PERFORM] High context switches occurring

2005-12-07 Thread Scott Marlowe
On Tue, 2005-12-06 at 22:49, Tom Lane wrote:
> Bruce Momjian  writes:
> > Tom Lane wrote:
> >> 1. You don't want number of clients (-c) much higher than scaling factor
> >> (-s in the initialization step).
> 
> > Should we throw a warning when someone runs the test this way?
> 
> Not a bad idea (though of course only for the "standard" scripts).
> Tatsuo, what do you think?

Just to clarify, I think the pgbench program should throw the warning,
not PostgreSQL itself.  Not sure if that's what you meant or not.
Maybe even have it require a switch to run in such a mode, like a
--yes-i-want-to-run-a-meaningless-test switch or something.



Re: [PERFORM] High context switches occurring

2005-12-07 Thread Anjan Dave
Thanks for your inputs, Tom. I was going for a high number of concurrent
clients, but should have read this carefully:

-s scaling_factor
        this should be used with -i (initialize) option.
        number of tuples generated will be multiple of the
        scaling factor. For example, -s 100 will imply 10M
        (10,000,000) tuples in the accounts table.
        default is 1.  NOTE: scaling factor should be at least
        as large as the largest number of clients you intend
        to test; else you'll mostly be measuring update
        contention.

I'll rerun the tests.

Thanks,
Anjan


-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, December 06, 2005 6:45 PM
To: Anjan Dave
Cc: Vivek Khera; Postgresql Performance
Subject: Re: [PERFORM] High context switches occurring 

"Anjan Dave" <[EMAIL PROTECTED]> writes:
> -bash-3.00$ time pgbench -c 1000 -t 30 pgbench
> starting vacuum...end.
> transaction type: TPC-B (sort of)
> scaling factor: 1
> number of clients: 1000
> number of transactions per client: 30
> number of transactions actually processed: 30000/30000
> tps = 45.871234 (including connections establishing)
> tps = 46.092629 (excluding connections establishing)

I can hardly think of a worse way to run pgbench :-(.  These numbers are
about meaningless, for two reasons:

1. You don't want number of clients (-c) much higher than scaling factor
(-s in the initialization step).  The number of rows in the "branches"
table will equal -s, and since every transaction updates one
randomly-chosen "branches" row, you will be measuring mostly row-update
contention overhead if there's more concurrent transactions than there
are rows.  In the case -s 1, which is what you've got here, there is no
actual concurrency at all --- all the transactions stack up on the
single branches row.

2. Running a small number of transactions per client means that
startup/shutdown transients overwhelm the steady-state data.  You should
probably run at least a thousand transactions per client if you want
repeatable numbers.

Try something like "-s 10 -c 10 -t 3000" to get numbers reflecting test
conditions more like what the TPC council had in mind when they designed
this benchmark.  I tend to repeat such a test 3 times to see if the
numbers are repeatable, and quote the middle TPS number as long as
they're not too far apart.
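
A minimal sketch of such a run against the same "pgbench" database
(standard pgbench scripts assumed; output omitted):

    pgbench -i -s 10 pgbench        # reinitialize with scaling factor 10
    pgbench -c 10 -t 3000 pgbench   # 10 clients, 3000 transactions each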

regards, tom lane




Re: [PERFORM] postgresql performance tuning

2005-12-07 Thread Vivek Khera


On Dec 6, 2005, at 5:03 PM, Ameet Kini wrote:

> table with only 1 index, the time to do a vacuum (without full) went
> down from 45 minutes to under 3 minutes.  Maybe that's not bloat, but
> it's surely surprising.  And this was after running vacuum periodically.

I'll bet either your FSM settings are too low and/or you don't vacuum
often enough for your data churn rate.

Without more data, it is hard to solve the right problem.
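
One cheap way to gather that data (a sketch; the exact INFO wording
varies across 8.x releases):

    -- Run database-wide as a superuser; the final INFO lines report how
    -- many free-space-map slots are needed versus the max_fsm_pages /
    -- max_fsm_relations currently allocated.
    VACUUM VERBOSE;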




Re: [PERFORM] Performance degradation after successive UPDATE's

2005-12-07 Thread Assaf Yaari
Hi Jan,

As I'm a novice with PostgreSQL, can you elaborate on the term FSM and
the recommended settings?
BTW: I'm issuing VACUUM ANALYZE every 15 minutes (using cron) and have
also changed the setting of fsync to false in postgresql.conf, but the
time still seems to be growing.
Also, no other transactions are open.

Thanks,
Assaf.

> -Original Message-
> From: Jan Wieck [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, December 06, 2005 2:35 PM
> To: Assaf Yaari
> Cc: Bruno Wolff III; pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Performance degradation after 
> successive UPDATE's
> 
> On 12/6/2005 4:08 AM, Assaf Yaari wrote:
> > Thanks Bruno,
> > 
> > Issuing VACUUM FULL seems to have no influence on the time.
> > I've added VACUUM ANALYZE every 100 UPDATEs to my script and run the
> > test again (on a different record), and the time still increases.
> 
> I think he meant
> 
>  - run VACUUM FULL once,
>  - adjust FSM settings to the database size and turnover ratio,
>  - run VACUUM ANALYZE more frequently from there on.
> 
> 
> Jan
> 
> > 
> > Any other ideas?
> > 
> > Thanks,
> > Assaf. 
> > 
> >> -Original Message-
> >> From: Bruno Wolff III [mailto:[EMAIL PROTECTED]
> >> Sent: Monday, December 05, 2005 10:36 PM
> >> To: Assaf Yaari
> >> Cc: pgsql-performance@postgresql.org
> >> Subject: Re: Performance degradation after successive UPDATE's
> >> 
> >> On Mon, Dec 05, 2005 at 19:05:01 +0200,
> >>   Assaf Yaari <[EMAIL PROTECTED]> wrote:
> >> > Hi,
> >> >  
> >> > I'm using PostgreSQL 8.0.3 on Linux RedHat WS 3.0.
> >> >  
> >> > My application updates counters in the DB. I left a test running
> >> > overnight that increased the counter of a specific record. After a
> >> > night of running (several hundred thousand updates), I found that
> >> > the time spent on one UPDATE had increased to more than 1.5 seconds
> >> > (at the beginning it was less than 10 ms)! Issuing VACUUM ANALYZE
> >> > and even a reboot didn't seem to solve the problem.
> >> 
> >> You need to be running vacuum more often to get rid of the deleted
> >> rows (update is essentially insert + delete). Once you get too many,
> >> plain vacuum won't be able to clean them up without raising the value
> >> you use for FSM. By now the table is really bloated and you probably
> >> want to use vacuum full on it.
> >> 
> > 
> 
> 
> --
> #=
> =#
> # It's easier to get forgiveness for being wrong than for 
> being right. #
> # Let's break this rule - forgive me. 
>  #
> #== 
> [EMAIL PROTECTED] #
> 



Re: [PERFORM] First query is slow, subsequent queries fast

2005-12-07 Thread Oleg Bartunov

Stephan,

your cache is too low :) Try increasing shared_buffers; for example,
for 2 GB I'd set it to 100,000



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83



[PERFORM] First query is slow, subsequent queries fast

2005-12-07 Thread Stephan Vollmer
Hi everybody!

This is my first posting to this list and I'm quite a PostgreSQL
newbie. My question is:

The first time I execute a query, it is very slow, but subsequent
queries are as fast as expected. I would be very glad if somebody
could explain why the first query is so slow and what I could do to
speed it up.

The query operates on a tsearch2 indexed column, but I experienced
the same issue on other tables as well, so I don't think it's a
tsearch2 issue.

To get a better overview of the queries and EXPLAIN outputs, I've
put them on a temporary website, together with the table definition
and my postgresql.conf:



I'm running PostgreSQL 8.1 on Windows XP SP2, Athlon64 3000+, 2 GB
RAM, 400 GB SATA HDD, 120 GB ATA HDD. The data reside on the first
HDD, the indexes in an index tablespace on the second HDD.

In the example below, the first query is still quite fast compared
to others. Sometimes the first query takes up to 9000 ms (see
website). I've run VACUUM FULL, but it didn't seem to solve the problem.

Thanks very much in advance,

- Stephan



Query:

SELECT keyword, overview
FROM publications
WHERE idx_fti @@ to_tsquery('default', 'linux & kernel')
ORDER BY rank_cd(idx_fti, 'linux & kernel') DESC;



EXPLAIN for first query:

Sort  (cost=859.89..860.48 rows=237 width=299) (actual time=1817.962..1817.971 rows=10 loops=1)
  Sort Key: rank_cd(idx_fti, '''linux'' & ''kernel'''::tsquery)
  ->  Bitmap Heap Scan on publications  (cost=3.83..850.54 rows=237 width=299) (actual time=1817.839..1817.914 rows=10 loops=1)
        Filter: (idx_fti @@ '''linux'' & ''kernel'''::tsquery)
        ->  Bitmap Index Scan on idx_fti_idx  (cost=0.00..3.83 rows=237 width=0) (actual time=1817.792..1817.792 rows=10 loops=1)
              Index Cond: (idx_fti @@ '''linux'' & ''kernel'''::tsquery)
Total runtime: 1818.068 ms



EXPLAIN for second query:

Sort  (cost=859.89..860.48 rows=237 width=299) (actual time=4.817..4.826 rows=10 loops=1)
  Sort Key: rank_cd(idx_fti, '''linux'' & ''kernel'''::tsquery)
  ->  Bitmap Heap Scan on publications  (cost=3.83..850.54 rows=237 width=299) (actual time=4.727..4.769 rows=10 loops=1)
        Filter: (idx_fti @@ '''linux'' & ''kernel'''::tsquery)
        ->  Bitmap Index Scan on idx_fti_idx  (cost=0.00..3.83 rows=237 width=0) (actual time=4.675..4.675 rows=10 loops=1)
              Index Cond: (idx_fti @@ '''linux'' & ''kernel'''::tsquery)
Total runtime: 4.914 ms



[PERFORM] table partitioning: effects of many sub-tables (was COPY too slow...)

2005-12-07 Thread Rick Schumeyer








Based on a suggestion on the postgis list, I partitioned my 80 million
(for now) record table into subtables of about 230k records (the amount
of data collected in five minutes).  At the moment I have 350 subtables.

Everything seems to be great: COPY time is ok, building a geometric
index on "only" 230k records is ok, query performance is ok.

I'm a little concerned about having so many subtables.  350 tables is
not bad, but what happens if the number of subtables grows into the
thousands?  Is there a practical limit to the effectiveness of
partitioning?
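
For readers following along, a minimal sketch of what such a layout can
look like with 8.1-style inheritance partitioning (table and column
names are invented stand-ins for the poster's PostGIS table):

    CREATE TABLE readings (
        captured_at timestamp NOT NULL
        -- plus the geometry/payload columns
    );

    -- One subtable per five-minute window, with a CHECK constraint the
    -- planner can use to prove which subtables a query cannot touch:
    CREATE TABLE readings_p1 (
        CHECK (captured_at >= '2005-12-07 00:00'
           AND captured_at <  '2005-12-07 00:05')
    ) INHERITS (readings);

    -- New in 8.1: skip subtables whose CHECK constraints rule them out.
    SET constraint_exclusion = on;
    SELECT count(*) FROM readings
    WHERE captured_at >= '2005-12-07 00:00'
      AND captured_at <  '2005-12-07 00:05';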

Re: [PERFORM] Join the same row

2005-12-07 Thread Richard Huxton

Edison Azzi wrote:
> Hi,
>
> I'm trying to optimize some selects between 2 tables, and the best way
> I found was to alter the first table and add the fields of the 2nd
> table. I adjusted the contents, and now I have only one table with all
> the info that I need. Now my problem begins: because of legacy queries,
> I decided to make a rule that replaces the 2nd table.
>
> Until now all worked well, but I found that when I make a join between
> the result table and the rule, even though it is the same row in the
> same table, the optimizer generates two accesses to the same row.
>
> cta_pag is the table and ctapag_adm is the rule:
>
> CREATE OR REPLACE RULE "_RETURN" AS
>     ON SELECT TO ctapag_adm DO INSTEAD  SELECT cta_pag.nrlancto,
>        cta_pag.codconta, cta_pag.frequencia, cta_pag.nrlanctopai
>       FROM cta_pag
>      WHERE cta_pag.origem = 'A'::bpchar;
>
> This is one of the legacy queries:
>
> select * from cta_pag p, ctapag_adm a where a.nrlancto = p.nrlancto
> and p.nrlancto = 21861;

OK - and you get a self-join (which is what you asked for, but you'd
like the planner to notice that it might not be necessary).

> Resulting in twice the time for accessing.
>
> Accessing the same row just once:
>
> select * from cta_pag p where p.nrlancto = 21861

This isn't the same query though. Your rule has an additional condition
origem = 'A'. This means it wouldn't be correct to eliminate the
self-join even if the planner could.

> Is there a way to force the optimizer to understand that it is the
> same row?

However, even if you removed the condition on origem, I don't think the
planner will notice that it can eliminate the join. It's just too
unusual a case for the planner to have a rule for it.

I might be wrong about the planner - I'm just another user. One of the
developers may correct me.
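
If the double access does prove painful, one workaround (a sketch, not
something proposed in the thread) is to fold the rule's predicate into
the legacy query by hand, so only one scan of cta_pag is needed:

    SELECT p.*
      FROM cta_pag p
     WHERE p.nrlancto = 21861
       AND p.origem = 'A';   -- the condition the ctapag_adm rule adds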

--
  Richard Huxton
  Archonet Ltd




Re: [PERFORM] TSearch2 vs. Apache Lucene

2005-12-07 Thread Michael Riess

Christopher Kings-Lynne schrieb:
>> No, my problem is that using TSearch2 interferes with other core
>> components of postgres like (auto)vacuum or dump/restore.
>
> That's nonsense...seriously.
>
> The only trick with dump/restore is that you have to install the
> tsearch2 shared library before restoring.  That's the same as all
> contribs though.
>
> Chris

Well, then it has changed since I last read the documentation. That was
about a year ago, and since then we have been using Lucene ... and as it
works quite nicely, I see no reason to switch to TSearch2. Including it
in the pgsql core would make it much more attractive to me, as it seems
to me that once included in the core, features seem to be more stable.
Call me paranoid, if you must ... ;-)




Re: [PERFORM] TSearch2 vs. Apache Lucene

2005-12-07 Thread Christopher Kings-Lynne
> No, my problem is that using TSearch2 interferes with other core
> components of postgres like (auto)vacuum or dump/restore.

That's nonsense...seriously.

The only trick with dump/restore is that you have to install the
tsearch2 shared library before restoring.  That's the same as all
contribs though.
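
A sketch of that restore order (paths are illustrative and vary by
installation; this assumes a contrib build that installed tsearch2.sql):

    createdb newdb
    psql -d newdb -f /usr/local/pgsql/share/contrib/tsearch2.sql
    pg_restore -d newdb mydb.dump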


Chris

