[SQL] REINDEX DATABASE

2005-07-26 Thread Eugene E.

Hello

Would you like to advice to use REINDEX DATABASE on regular basis ?

if (yes)
how it should be connected with VACUUM FULL ANALYZE which is run 
regularly ?  (reindex before vacuum or vacuum before reindex?)


else
haw to determine _when_ to run REINDEX ?


many thanks

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [SQL] REINDEX DATABASE

2005-07-26 Thread Christopher Browne
> Hello
>
> Would you like to advice to use REINDEX DATABASE on regular basis ?
>
> if (yes)
> how it should be connected with VACUUM FULL ANALYZE which is run
> regularly ?  (reindex before vacuum or vacuum before reindex?)
>
> else
> haw to determine _when_ to run REINDEX ?

If you are doing ordinary VACUUM ANALYZE frequently enough, it
shouldn't be necessary to either VACUUM FULL or REINDEX.

Back in the 7.2 days, there were sorts of update patterns that would
mandate reindexing every so often, as you could get cases where index
pages would be very sparsely populated.  That was alleviated in
version 7.3, I believe, and was clearly evident in 7.4.

You know you need to REINDEX if analysis of an index shows that it is
sparsely populated.  This generally shows up if you do an analyze on
the table and find an index has more pages than tuples.

But if you run VACUUM reasonably frequently, this shouldn't be
necessary...
-- 
let name="cbbrowne" and tld="acm.org" in String.concat "@" [name;tld];;
http://cbbrowne.com/info/rdbms.html
Rules of  the Evil  Overlord #153.  "My Legions of  Terror will  be an
equal-opportunity employer. Conversely, when  it is prophesied that no
man  can defeat  me, I  will  keep in  mind the  increasing number  of
non-traditional gender roles." 

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [SQL] REINDEX DATABASE

2005-07-26 Thread Achilleus Mantzios
O Christopher Browne έγραψε στις Jul 26, 2005 :

> > Hello
> >
> > Would you like to advice to use REINDEX DATABASE on regular basis ?
> >
> > if (yes)
> > how it should be connected with VACUUM FULL ANALYZE which is run
> > regularly ?  (reindex before vacuum or vacuum before reindex?)
> >
> > else
> > haw to determine _when_ to run REINDEX ?
> 
> If you are doing ordinary VACUUM ANALYZE frequently enough, it
> shouldn't be necessary to either VACUUM FULL or REINDEX.
> 
> Back in the 7.2 days, there were sorts of update patterns that would
> mandate reindexing every so often, as you could get cases where index
> pages would be very sparsely populated.  That was alleviated in
> version 7.3, I believe, and was clearly evident in 7.4.
> 
> You know you need to REINDEX if analysis of an index shows that it is
> sparsely populated.  This generally shows up if you do an analyze on
> the table and find an index has more pages than tuples.
> 
> But if you run VACUUM reasonably frequently, this shouldn't be
> necessary...

I vacuum analyze every hour, however after reindexdb 
my (currently) 2.47 GB db is reduced to 2.37 GB,
thus helping both db-wise and freebsd_cache-wise boost performance.

All the above in 7.4.6.

I am not arguing that i have done detailed analysis of the 
situation, i'm just saying that i have witnessed a performance gain after
running contrib/reindexdb every month (or so).

> 

-- 
-Achilleus


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [SQL] REINDEX DATABASE

2005-07-26 Thread Chris Browne
[EMAIL PROTECTED] (Achilleus Mantzios) writes:
> O Christopher Browne έγραψε στις Jul 26, 2005 :
>
>> > Hello
>> >
>> > Would you like to advice to use REINDEX DATABASE on regular basis ?
>> >
>> > if (yes)
>> > how it should be connected with VACUUM FULL ANALYZE which is run
>> > regularly ?  (reindex before vacuum or vacuum before reindex?)
>> >
>> > else
>> > haw to determine _when_ to run REINDEX ?
>> 
>> If you are doing ordinary VACUUM ANALYZE frequently enough, it
>> shouldn't be necessary to either VACUUM FULL or REINDEX.
>> 
>> Back in the 7.2 days, there were sorts of update patterns that would
>> mandate reindexing every so often, as you could get cases where index
>> pages would be very sparsely populated.  That was alleviated in
>> version 7.3, I believe, and was clearly evident in 7.4.
>> 
>> You know you need to REINDEX if analysis of an index shows that it is
>> sparsely populated.  This generally shows up if you do an analyze on
>> the table and find an index has more pages than tuples.
>> 
>> But if you run VACUUM reasonably frequently, this shouldn't be
>> necessary...
>
> I vacuum analyze every hour, however after reindexdb 
> my (currently) 2.47 GB db is reduced to 2.37 GB,
> thus helping both db-wise and freebsd_cache-wise boost performance.
>
> All the above in 7.4.6.
>
> I am not arguing that i have done detailed analysis of the 
> situation, i'm just saying that i have witnessed a performance gain after
> running contrib/reindexdb every month (or so).

That doesn't strike me as being a material improvement, and it comes
at a pretty high cost.

You can get a savings of about 4% of the space, but at the cost of
taking an appreciable outage during which the database is not usable.

I wouldn't expect the 4% savings in space to lead to a particularly
measurable improvement in performance, certainly not one worth the
outage.
-- 
(format nil "[EMAIL PROTECTED]" "cbbrowne" "acm.org")
http://www.ntlug.org/~cbbrowne/sap.html
Rules of the Evil Overlord #78.  "I will not tell my Legions of Terror
"And he must  be taken alive!" The command will be:  ``And try to take
him alive if it is reasonably practical.''"


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[SQL] Joining two large tables on a tiny subset of rows

2005-07-26 Thread Dmitri Bichko
Hello,

I have two tables, one has a foreing key from the other (only showing
the relevant columns and indices here):

 Table "expresso.probes"
   Column|  Type  | Modifiers
-++---
 platform_id | integer| not null
 probe_num   | integer| not null
 mrna_acc| character varying(50)  |
Indexes:
"idx_probes_mrna_acc" btree (mrna_acc, platform_id) WHERE mrna_acc
IS NOT NULL

   Table "expresso.mrna_info"
   Column| Type  | Modifiers
-+---+---
 mrna_acc| character varying(25) | not null
 symbol  | character varying(50) | not null
Indexes:
"idx_mrna_info_symbol" btree (upper(symbol::text)) WHERE symbol IS
NOT NULL

I want to select all the records in "probes" which have an "mrna_acc"
that is associated with a specific symbol in "mrna_info".

This works fine if I do it in two steps: select from "mrna_info" (12
records out of ~250 thousand) and paste those into an IN clause to
select from "probes" (83 records out of ~4.5 million).

But if I try to do this as one query I always end up with a seq scan on
"probes".  I've tried doing it with joins, subselects, joining with a
dynamic table (select from () as foo), nothing seems to get the desired
result.

Here are the explains for the two step process:

expression=> EXPLAIN ANALYZE SELECT mrna_acc FROM mrna_info WHERE
upper(symbol) = upper('pparg') AND symbol IS NOT NULL;
  QUERY PLAN

---
 Index Scan using idx_mrna_info_symbol on mrna_info  (cost=0.00..2934.78
rows=930 width=12) (actual time=0.038..0.089 rows=12 loops=1)
   Index Cond: (upper((symbol)::text) = 'PPARG'::text)
   Filter: (symbol IS NOT NULL)
 Total runtime: 0.123 ms
(4 rows)

EXPLAIN ANALYZE SELECT platform_id, probe_num FROM probes WHERE mrna_acc
IN
('U10374','U09138','U01841','U01664','NM_015869','NM_013124','NM_011146'
,'NM_005037','D83233','BC021798','BC006811','AB011365') AND mrna_acc IS
NOT NULL;
 
QUERY PLAN








 Index Scan using idx_probes_mrna_acc, idx_probes_mrna_acc,
idx_probes_mrna_acc, idx_probes_mrna_acc, idx_probes_mrna_acc,
idx_probes_mrna_acc, idx_probes_mrna_acc, idx_probes_mrna_acc,
idx_probes_mrna_acc, idx_probes_mrna_acc, idx_probes_mrna_acc,
idx_probes_mrna_acc on probes  (cost=0.00..14710.63 rows=4151 width=8)
(actual time=0.040..0.719 rows=142 loops=1)
   Index Cond: (((mrna_acc)::text = 'U10374'::text) OR ((mrna_acc)::text
= 'U09138'::text) OR ((mrna_acc)::text = 'U01841'::text) OR
((mrna_acc)::text = 'U01664'::text) OR ((mrna_acc)::text =
'NM_015869'::text) OR ((mrna_acc)::text = 'NM_013124'::text) OR
((mrna_acc)::text = 'NM_011146'::text) OR ((mrna_acc)::text =
'NM_005037'::text) OR ((mrna_acc)::text = 'D83233'::text) OR
((mrna_acc)::text = 'BC021798'::text) OR ((mrna_acc)::text =
'BC006811'::text) OR ((mrna_acc)::text = 'AB011365'::text))
   Filter: (mrna_acc IS NOT NULL)
 Total runtime: 0.844 ms
(4 rows)

And here is the explain of the join (it's essentially the same plan as
the subselect and all the other ways I've tried):

expression=> explain SELECT platform_id, probe_num FROM mrna_info m,
probes p WHERE m.mrna_acc = p.mrna_acc and p.mrna_acc is not null and
UPPER(symbol) = UPPER('pparg') AND m.symbol IS NOT NULL;
QUERY PLAN

--
 Hash Join  (cost=2937.10..133392.04 rows=9939 width=8)
   Hash Cond: (("outer".mrna_acc)::text = ("inner".mrna_acc)::text)
   ->  Seq Scan on probes p  (cost=0.00..115434.22 rows=2984265
width=19)
 Filter: (mrna_acc IS NOT NULL)
   ->  Hash  (cost=2934.78..2934.78 rows=930 width=12)
 ->  Index Scan using idx_mrna_info_symbol on mrna_info m
(cost=0.00..2934.78 rows=930 width=12)
   Index Cond: (upper((symbol)::text) = 'PPARG'::text)
   Filter: (symbol IS NOT NULL)
(8 rows)

What am I doing wrong?

Dmitri

The information transmitted is intended only for the person or entity to which 
it is addressed and may contain confidential and/or privileged material. Any 
review, retransmission, dissemination or other use of, or taking of any action 
in reliance upon, this information by persons or

Re: [SQL] Joining two large tables on a tiny subset of rows

2005-07-26 Thread Gnanavel S
On 7/27/05, Dmitri Bichko <[EMAIL PROTECTED]> wrote:
Hello,I have two tables, one has a foreing key from the other (only showingthe relevant columns and indices here): Table "expresso.probes"  
Column|  Type  |
Modifiers-++--- platform_id | integer| not null
probe_num   |
integer|
not null mrna_acc| character varying(50)  |Indexes:"idx_probes_mrna_acc" btree (mrna_acc, platform_id) WHERE mrna_accIS NOT NULL   Table "expresso.mrna_info"
  
Column|
Type  |
Modifiers-+---+--- mrna_acc| character varying(25) | not null symbol  | character varying(50) | not nullIndexes:"idx_mrna_info_symbol" btree (upper(symbol::text)) WHERE symbol IS
NOT NULLI want to select all the records in "probes" which have an "mrna_acc"that is associated with a specific symbol in "mrna_info".This works fine if I do it in two steps: select from "mrna_info" (12
records out of ~250 thousand) and paste those into an IN clause toselect from "probes" (83 records out of ~4.5 million).But if I try to do this as one query I always end up with a seq scan on
"probes".  I've tried doing it with joins, subselects, joining with adynamic table (select from () as foo), nothing seems to get the desiredresult.Here are the explains for the two step process:
_expression_=> EXPLAIN ANALYZE SELECT mrna_acc FROM mrna_info WHEREupper(symbol) = upper('pparg') AND symbol IS NOT NULL;  QUERY
PLAN--- Index Scan using idx_mrna_info_symbol on mrna_info  (cost=0.00..2934.78
rows=930 width=12) (actual time=0.038..0.089 rows=12 loops=1)   Index Cond: (upper((symbol)::text) = 'PPARG'::text)   Filter: (symbol IS NOT NULL) Total runtime: 0.123 ms(4 rows)
Whether any records are returned by this query. Moreover "symbol is not null" condition is not needed as that column cannot be null by table definition.
EXPLAIN ANALYZE SELECT platform_id, probe_num FROM probes WHERE mrna_accIN
('U10374','U09138','U01841','U01664','NM_015869','NM_013124','NM_011146','NM_005037','D83233','BC021798','BC006811','AB011365') AND mrna_acc ISNOT NULL;QUERY PLAN


 Index Scan using idx_probes_mrna_acc, idx_probes_mrna_acc,idx_probes_mrna_acc, idx_probes_mrna_acc, idx_probes_mrna_acc,idx_probes_mrna_acc, idx_probes_mrna_acc, idx_probes_mrna_acc,idx_probes_mrna_acc, idx_probes_mrna_acc, idx_probes_mrna_acc,
idx_probes_mrna_acc on probes  (cost=0.00..14710.63 rows=4151 width=8)(actual time=0.040..0.719 rows=142 loops=1)   Index Cond: (((mrna_acc)::text = 'U10374'::text) OR ((mrna_acc)::text= 'U09138'::text) OR ((mrna_acc)::text = 'U01841'::text) OR
((mrna_acc)::text = 'U01664'::text) OR ((mrna_acc)::text ='NM_015869'::text) OR ((mrna_acc)::text = 'NM_013124'::text) OR((mrna_acc)::text = 'NM_011146'::text) OR ((mrna_acc)::text ='NM_005037'::text) OR ((mrna_acc)::text = 'D83233'::text) OR
((mrna_acc)::text = 'BC021798'::text) OR ((mrna_acc)::text ='BC006811'::text) OR ((mrna_acc)::text = 'AB011365'::text))   Filter: (mrna_acc IS NOT NULL) Total runtime: 0.844 ms(4 rows)And here is the explain of the join (it's essentially the same plan as
the subselect and all the other ways I've tried):_expression_=> explain SELECT platform_id, probe_num FROM mrna_info m,probes p WHERE m.mrna_acc = p.mrna_acc and p.mrna_acc is not null andUPPER(symbol) = UPPER('pparg') AND 
m.symbol IS NOT NULL;
Here m.mrna_acc cannot be null, so condition on p.mrna_acc is not null
is useless as m.mrna_acc = p.mrna_acc (for null) will not give any
records
Try to  use outer join. 
QUERY
PLAN-- Hash Join  (cost=2937.10..133392.04 rows=9939 width=8)   Hash Cond: (("outer".mrna_acc)::text = ("inner".mrna_acc)::text)
   ->  Seq Scan on probes p  (cost=0.00..115434.22 rows=2984265width=19) Filter: (mrna_acc IS NOT NULL)   ->  Hash  (cost=2934.78..2934.78 rows=930 width=12) ->  Index Scan using idx_mrna_info_symbol on mrna_info m
(cost=0.00..2934.78 rows=930 width=12)  
Index Cond: (upper((symbol)::text) = 'PPARG'::text)   Filter: (symbol IS NOT NULL)(8 rows)What am I doing wrong?D

Re: [SQL] REINDEX DATABASE

2005-07-26 Thread Achilleus Mantzios
O Chris Browne έγραψε στις Jul 26, 2005 :

> [EMAIL PROTECTED] (Achilleus Mantzios) writes:
> > O Christopher Browne έγραψε στις Jul 26, 2005 :
> >
> >> > Hello
> >> >
> >> > Would you like to advice to use REINDEX DATABASE on regular basis ?
> >> >
> >> > if (yes)
> >> > how it should be connected with VACUUM FULL ANALYZE which is run
> >> > regularly ?  (reindex before vacuum or vacuum before reindex?)
> >> >
> >> > else
> >> > haw to determine _when_ to run REINDEX ?
> >> 
> >> If you are doing ordinary VACUUM ANALYZE frequently enough, it
> >> shouldn't be necessary to either VACUUM FULL or REINDEX.
> >> 
> >> Back in the 7.2 days, there were sorts of update patterns that would
> >> mandate reindexing every so often, as you could get cases where index
> >> pages would be very sparsely populated.  That was alleviated in
> >> version 7.3, I believe, and was clearly evident in 7.4.
> >> 
> >> You know you need to REINDEX if analysis of an index shows that it is
> >> sparsely populated.  This generally shows up if you do an analyze on
> >> the table and find an index has more pages than tuples.
> >> 
> >> But if you run VACUUM reasonably frequently, this shouldn't be
> >> necessary...
> >
> > I vacuum analyze every hour, however after reindexdb 
> > my (currently) 2.47 GB db is reduced to 2.37 GB,
> > thus helping both db-wise and freebsd_cache-wise boost performance.
> >
> > All the above in 7.4.6.
> >
> > I am not arguing that i have done detailed analysis of the 
> > situation, i'm just saying that i have witnessed a performance gain after
> > running contrib/reindexdb every month (or so).
> 
> That doesn't strike me as being a material improvement, and it comes
> at a pretty high cost.

The numbers above are just the numbers reported after doing
a live reindex on the spot after deciding to answer to this email.

In the past (when the db was smaller) and for the same version, 
i have come across much "striking" numbers, 
and much more striking boost gain.
(e.g. a 2+ GB db directory turning to a ~1.5 GB dir).
BTW whats the limit of kern.ipc.shmmax for 32-bit intel?
~ 2^31 = ~ 2 GB.
In those cases i noticed severe boost gain.
(And of course the 1 hour vacuum analyze schedule was always there)

> 
> You can get a savings of about 4% of the space, but at the cost of
> taking an appreciable outage during which the database is not usable.

1st not all database is unusable, during the whole reindexdb run,
2nd outage outside office hours is acceptable for those who apply.

> 
> I wouldn't expect the 4% savings in space to lead to a particularly
> measurable improvement in performance, certainly not one worth the
> outage.
> 

I am just saying that the common saying "reindex is not needed for 7.4+"
maybe is not true in all circumstances.

-- 
-Achilleus


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org