Re: [PERFORM] REVISIT specific query (not all) on Pg8 MUCH slower than Pg7

2007-05-10 Thread Susan Russo
Hello again - 
vacuum analyze of db did the trick, thanks!
longer procedure went from over 6 hours to ~11 minutesquite dramatic.

Reindexing wasn't necessary (did test on one db -slog-slog-, though).

Regards,
Susan

>From [EMAIL PROTECTED] Thu May 10 09:47:38 2007
X-Spam-Checker-Version: SpamAssassin 3.1.4 (2006-07-25) on borise.harvard.edu
X-Spam-Status: No, score=-0.0 required=3.0 tests=AWL,BAYES_50 
autolearn=unavailable version=3.1.4
X-Spam-Level: 
X-Greylist: from auto-whitelisted by SQLgrey-1.7.5
Date: Thu, 10 May 2007 09:47:02 -0400
From: Bill Moran <[EMAIL PROTECTED]>
To: Susan Russo <[EMAIL PROTECTED]>
Cc: pgsql-performance@postgresql.org, [EMAIL PROTECTED]
Subject: Re: [PERFORM] REVISIT specific query (not all) on Pg8 MUCH slower
 than Pg7
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: Maia Mailguard 1.0.1
X-Mailing-List: pgsql-performance
List-Archive: 
List-Help: 
List-ID: 
List-Owner: 
List-Post: 
List-Subscribe: 
List-Unsubscribe: 

In response to Susan Russo <[EMAIL PROTECTED]>:

> 
> 
> Hi again,
> 
> Very mixed news to report...
> 
> Recap:
> 
> 
> I'd reported:
> > Despite numerous efforts, we're unable to solve a severe performance 
> >limitation between Pg 7.3.2
> > and Pg 8.1.4.
> >
> > The query and 'explain analyze' plan below, runs in 
> > 26.20 msec on Pg 7.3.2, and 
> > 2463.968 ms on Pg 8.1.4, 
> >
> 
> 
> Tom Lane responded:
> >You're not getting the indexscan optimization of the LIKE clause, which
> >is most likely due to having initdb'd the 8.1 installation in something
> >other than C locale.  You can either redo the initdb in C locale (which
> >might be a good move to fix other inconsistencies from the 7.3 behavior
> >you're used to) or create a varchar_pattern_ops index on the column(s)
> >you're using LIKE with.
> 
> 
> Steinar H. Gunderson suggested:
> >You could always try
> >
> >  CREATE INDEX test_index ON dbxref (accession varchar_pattern_ops);
> 
> 
> I'd responded:
> >>You could always try
> >>
> >>  CREATE INDEX test_index ON dbxref (accession varchar_pattern_ops);
> >
> >WOW!  we're now at runtime 0.367ms on Pg8
> >
> >Next step is to initdb w/C Locale (tonight) (Thanks Tom et al.!).
> >
> >Thanks again - will report back soon.
> 
> 
> Alvaro Herrera pointed out:
> >> Next step is to initdb w/C Locale (tonight) (Thanks Tom et al.!).
> >>
> >That's alternative to the pattern_ops index; it won't help you obtain a
> >plan faster than this one.
> 
> 
> 
> Tom concurred:
> >>> Next step is to initdb w/C Locale (tonight) (Thanks Tom et al.!).
> >>
> >> That's alternative to the pattern_ops index; it won't help you obtain
> >> a plan faster than this one.
> >
> >No, but since their old DB was evidently running in C locale, this
> >seems like a prudent thing to do to avoid other surprising 
> >changes in behavior.
> 
> =
> 
> We reconfigured the server, as follows:  
> 
> initdb -D /var/lib/pgsql/data --encoding=UTF8 --locale=C
> 
>  -I'm wondering if this was incorrect (?). our Pg7 servers encode 
> SQL_ASCII -
> 
> 
> NEXT, loaded db, and the good news is the query showed:
>   Total runtime:  0.372 ms
> 
> 
> As mentioned in original post, this query is just part of a longer procedure.
> reminder:  
>The longer procedure was taking >10 *hours* to run on Pg8.1.4
> This same longer procedure runs in ~22 minutes on Pg7.3.2 
> server.
> 
> 
> =
> 
> Before redoing the initdb with C-locale, I did a CREATE INDEX on the 8.1.4
> server, which resulted not only in much faster query times, but in a drastic
> improvement in the time of the overall/longer procedure (<11mins).
> 
> With the initdb locale C Pg8.1.4 server, it ran for 6 hours before I killed 
> it (and output
> file was <.25 expected  end size).

Quick reminders:
*) Did you recreate all the indexes on the new system after the initdb?
*) Did you vacuum and analyze after loading your data?

> 
> ==
> 
> 
> I'm perplexed we're not seeing better performance on Pg8.1.4 server given the 
> 22 minutes runtime we're seeing on the Pg7.3.2 servers (on older hardware and 
> OS).
> 
> 
> So, while initdb  C locale helped the initial query, it seems to have had no 
> positive affect
> on the longer procedure.
> 
> 
> Is there some other difference between 7.3.2 and 8.1.4  we're missing?

I suggest you provide "explain analyze" output for the query on both versions.

> 
> 
> Thanks for any help.
> Regards,
> Susan Russo
> 
> 
> ===
> I enclose the db calls (selects) contained in the 'overall procedure' 
> referred to above (taken directly
> from a perl script):  THOUGH THIS RUNS IN 22 mins on Pg7.3.2, and >10 hours 
> on Pg8.1.4...
> 
>   my $aq = $dbh->prepare(sprintf("SELECT * from dbxref dx, db where 
> accession = '%s' and dx.db_

Re: [PERFORM] Background vacuum

2007-05-10 Thread Ron Mayer
Dan Harris wrote:
> Daniel Haensse wrote:
>> Has anybody a nice
>> solution to change process priority? A shell script, maybe even for java?

One way is to write astored procedure that sets it's own priority.
An example is here:
http://weblog.bignerdranch.com/?p=11


> While this may technically work, I think it lacks a key point.  'nice' (
> at least the versions I'm familiar with ) do not adjust I/O priority. 
> VACUUM is bogging things down because of the extra strain on I/O.  CPU
> usage shouldn't really be much of a factor.

Actually, CPU priorities _are_ an effective way of indirectly scheduling
I/O priorities.

This paper studied both CPU and lock priorities on a variety
of databases including PostgreSQL.

http://www.cs.cmu.edu/~bianca/icde04.pdf

" By contrast, for PostgreSQL, lock scheduling is not as
  effective as CPU scheduling (see Figure 4(c)).
  ...
  The effectiveness of CPU-Prio for TPC-C on
  PostgreSQL is surprising, given that I/O (I/O-related
  lightweight locks) is its bottleneck. Due to CPU prioritization,
  high-priority transactions are able to request I/O resources
  before low-priority transactions can. As a result,
  high-priority transactions wait fewer times (52% fewer) for
  I/O, and when they do wait, they wait behind fewer transactions
  (43% fewer). The fact that simple CPU prioritization
  is able to improve performance so significantly suggests that
  more complicated I/O scheduling is not always necessary.
  ...
  For TPC-C on MVCC DBMS, and in particular PostgreSQL,
  CPU scheduling is most effective, due to its ability
  to indirectly schedule the I/O bottleneck.
  ...
  For TPC-C running on PostgreSQL, the simplest CPU scheduling
  policy (CPU-Prio) provides a factor of 2 improvement
  for high-priority transactions, while adding priority inheritance
  (CPU-Prio-Inherit) provides a factor of 6 improvement
  while hardly penalizing low-priority transactions.
  Preemption (P-CPU) provides no appreciable benefit
  over CPU-Prio-Inherit
  "

> Instead, I would recommend looking at vacuum_cost_delay and the related
> settings to make vacuum lower priority than the queries you care about. 
> This should be a cleaner solution for you.

Yeah, that's still true.


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Nested loops overpriced

2007-05-10 Thread Tom Lane
Peter Eisentraut <[EMAIL PROTECTED]> writes:
> Am Mittwoch, 9. Mai 2007 19:40 schrieb Tom Lane:
>> Hmmm ... I see at least part of the problem, which is that email_header
>> is joined twice in this query, which means that it's counted twice in
>> figuring the total volume of pages competing for cache space. So the
>> thing thinks cache space is oversubscribed nearly 3X when in reality
>> the database is fully cached.

> I should add that other, similar queries in this database that do not
> involve joining the same table twice produce seemingly optimal plans.
> (It picks hash joins which are actually faster than nested loops.)

It strikes me that in a situation like this, where the same table is
being scanned twice by concurrent indexscans, we ought to amortize the
fetches across *both* scans rather than treating them independently;
so there are actually two different ways in which we're being too
pessimistic about the indexscanning cost.

Difficult to see how to fix that in the current planner design however;
since it's a bottom-up process, we have to cost the individual scans
without any knowledge of what approach will be chosen for other scans.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Nested loops overpriced

2007-05-10 Thread Peter Eisentraut
Am Mittwoch, 9. Mai 2007 19:40 schrieb Tom Lane:
> Hmmm ... I see at least part of the problem, which is that email_header
> is joined twice in this query, which means that it's counted twice in
> figuring the total volume of pages competing for cache space.  So the
> thing thinks cache space is oversubscribed nearly 3X when in reality
> the database is fully cached.

I should add that other, similar queries in this database that do not involve 
joining the same table twice produce seemingly optimal plans.  (It picks hash 
joins which are actually faster than nested loops.)

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] Nested loops overpriced

2007-05-10 Thread Peter Eisentraut
Am Mittwoch, 9. Mai 2007 19:40 schrieb Tom Lane:
> I remember having dithered about whether
> to try to avoid counting the same physical relation more than once in
> total_table_pages, but this example certainly suggests that we
> shouldn't.  Meanwhile, do the estimates get better if you set
> effective_cache_size to 1GB or so?

Yes, that makes the plan significantly cheaper (something like 500,000 instead 
of 5,000,000), but still a lot more expensive than the hash join (about 
100,000).

> To return to your original comment: if you're trying to model a
> situation with a fully cached database, I think it's sensible
> to set random_page_cost = seq_page_cost = 0.1 or so.  You had
> mentioned having to decrease them to 0.02, which seems unreasonably
> small to me too, but maybe with the larger effective_cache_size
> you won't have to go that far.

Heh, when I decrease these parameters, the hash join gets cheaper as well.  I 
can't actually get it to pick the nested-loop join.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] REVISIT specific query (not all) on Pg8 MUCH slower than Pg7

2007-05-10 Thread Susan Russo
>Quick reminders:
>*) Did you recreate all the indexes on the new system after the initdb?
>*) Did you vacuum and analyze after loading your data?

No, I didn't - am reindexing db now and will run vacuum analyze afterwards. 
 

>I suggest you provide "explain analyze" output for the query on both versions.

Pg8:

-
 Merge Join  (cost=151939.73..156342.67 rows=10131 width=1585) (actual 
time=0.129..0.129 rows=0 loops=1)
   Merge Cond: ("outer".cvterm_id = "inner".type_id)
   ->  Index Scan using cvterm_pkey on cvterm cvt  (cost=0.00..4168.22 
rows=32478 width=520) (actual time=0.044..0.044 rows=1 loops=1)
 Filter: (((name)::text <> 'gene'::text) AND ((name)::text <> 
'protein'::text) AND ((name)::text <>
'natural_transposable_element'::text) AND ((name)::text <> 
'chromosome_structure_variation'::text) AND ((name)::text <> 
'chromosome_arm'::text) AND ((name)::text <> 'repeat_region'::text))
   ->  Sort  (cost=151939.73..151965.83 rows=10441 width=1073) (actual 
time=0.079..0.079 rows=0 loops=1)
 Sort Key: f.type_id
->  Nested Loop  (cost=17495.27..151242.80 rows=10441 width=1073) 
(actual time=0.070..0.070 rows=0 loops=1)
   ->  Hash Join  (cost=17495.27..88325.38 rows=10441 width=525) 
(actual time=0.068..0.068 rows=0 loops=1)
 Hash Cond: ("outer".dbxref_id = "inner".dbxref_id)
->  Seq Scan on feature_dbxref fd  (cost=0.00..34182.71 
rows=2088171 width=9) (actual time=0.008..0.008 rows=1 loops=1)
 ->  Hash  (cost=17466.34..17466.34 rows=11572 width=524) 
(actual time=0.042..0.042 rows=0 loops=1)
   ->  Bitmap Heap Scan on dbxref dx  
(cost=117.43..17466.34 rows=11572 width=524) (actual time=0.041..0.041 rows=0 
loops=1)
 Filter: ((accession)::text ~~ 
'AY851043%'::text)
 ->  Bitmap Index Scan on dbxref_idx2  
(cost=0.00..117.43 rows=11572 width=0) (actual time=0.037..0.037 rows=0 loops=1)
   Index Cond: (((accession)::text >= 
'AY851043'::character varying) AND ((accession)::text < 'AY851044'::character 
varying))
   ->  Index Scan using feature_pkey on feature f  (cost=0.00..6.01 
rows=1 width=556) (never executed)
 Index Cond: ("outer".feature_id = f.feature_id)
 Total runtime: 0.381 ms
(18 rows)


===

Pg7:

---
 Nested Loop  (cost=0.00..23.45 rows=1 width=120) (actual time=0.08..0.08 
rows=0 loops=1)
   ->  Nested Loop  (cost=0.00..17.49 rows=1 width=82) (actual time=0.08..0.08 
rows=0 loops=1)
 ->  Nested Loop  (cost=0.00..11.93 rows=1 width=30) (actual 
time=0.08..0.08 rows=0 loops=1)
   ->  Index Scan using dbxref_idx2 on dbxref dx  (cost=0.00..5.83 
rows=1 width=21) (actual time=0.08..0.08 rows=0 loops=1)
 Index Cond: ((accession >= 'AY851043'::character varying) 
AND (accession < 'AY851044'::character varying))
 Filter: (accession ~~ 'AY851043%'::text)
   ->  Index Scan using feature_dbxref_idx2 on feature_dbxref fd  
(cost=0.00..6.05 rows=5 width=9) (never executed)
 Index Cond: (fd.dbxref_id = "outer".dbxref_id)
->  Index Scan using feature_pkey on feature f  (cost=0.00..5.54 rows=1 
width=52) (never executed)
   Index Cond: ("outer".feature_id = f.feature_id)
   ->  Index Scan using cvterm_pkey on cvterm cvt  (cost=0.00..5.94 rows=1 
width=38) (never executed)
 Index Cond: ("outer".type_id = cvt.cvterm_id)
Filter: ((name <> 'gene'::character varying) AND (name <> 
'protein'::character varying) AND (name <> 
'natural_transposable_element'::character varying) AND (name <> 
'chromosome_structure_variation'::character varying)
AND (name <> 'chromosome_arm'::character varying) AND (name <> 
'repeat_region'::character varying))
 Total runtime: 0.36 msec
(14 rows)


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] REVISIT specific query (not all) on Pg8 MUCH slower than Pg7

2007-05-10 Thread Bill Moran
In response to Susan Russo <[EMAIL PROTECTED]>:

> 
> 
> Hi again,
> 
> Very mixed news to report...
> 
> Recap:
> 
> 
> I'd reported:
> > Despite numerous efforts, we're unable to solve a severe performance 
> >limitation between Pg 7.3.2
> > and Pg 8.1.4.
> >
> > The query and 'explain analyze' plan below, runs in 
> > 26.20 msec on Pg 7.3.2, and 
> > 2463.968 ms on Pg 8.1.4, 
> >
> 
> 
> Tom Lane responded:
> >You're not getting the indexscan optimization of the LIKE clause, which
> >is most likely due to having initdb'd the 8.1 installation in something
> >other than C locale.  You can either redo the initdb in C locale (which
> >might be a good move to fix other inconsistencies from the 7.3 behavior
> >you're used to) or create a varchar_pattern_ops index on the column(s)
> >you're using LIKE with.
> 
> 
> Steinar H. Gunderson suggested:
> >You could always try
> >
> >  CREATE INDEX test_index ON dbxref (accession varchar_pattern_ops);
> 
> 
> I'd responded:
> >>You could always try
> >>
> >>  CREATE INDEX test_index ON dbxref (accession varchar_pattern_ops);
> >
> >WOW!  we're now at runtime 0.367ms on Pg8
> >
> >Next step is to initdb w/C Locale (tonight) (Thanks Tom et al.!).
> >
> >Thanks again - will report back soon.
> 
> 
> Alvaro Herrera pointed out:
> >> Next step is to initdb w/C Locale (tonight) (Thanks Tom et al.!).
> >>
> >That's alternative to the pattern_ops index; it won't help you obtain a
> >plan faster than this one.
> 
> 
> 
> Tom concurred:
> >>> Next step is to initdb w/C Locale (tonight) (Thanks Tom et al.!).
> >>
> >> That's alternative to the pattern_ops index; it won't help you obtain
> >> a plan faster than this one.
> >
> >No, but since their old DB was evidently running in C locale, this
> >seems like a prudent thing to do to avoid other surprising 
> >changes in behavior.
> 
> =
> 
> We reconfigured the server, as follows:  
> 
> initdb -D /var/lib/pgsql/data --encoding=UTF8 --locale=C
> 
>  -I'm wondering if this was incorrect (?). our Pg7 servers encode 
> SQL_ASCII -
> 
> 
> NEXT, loaded db, and the good news is the query showed:
>   Total runtime:  0.372 ms
> 
> 
> As mentioned in original post, this query is just part of a longer procedure.
> reminder:  
>The longer procedure was taking >10 *hours* to run on Pg8.1.4
> This same longer procedure runs in ~22 minutes on Pg7.3.2 
> server.
> 
> 
> =
> 
> Before redoing the initdb with C-locale, I did a CREATE INDEX on the 8.1.4
> server, which resulted not only in much faster query times, but in a drastic
> improvement in the time of the overall/longer procedure (<11mins).
> 
> With the initdb locale C Pg8.1.4 server, it ran for 6 hours before I killed 
> it (and output
> file was <.25 expected  end size).

Quick reminders:
*) Did you recreate all the indexes on the new system after the initdb?
*) Did you vacuum and analyze after loading your data?

> 
> ==
> 
> 
> I'm perplexed we're not seeing better performance on Pg8.1.4 server given the 
> 22 minutes runtime we're seeing on the Pg7.3.2 servers (on older hardware and 
> OS).
> 
> 
> So, while initdb  C locale helped the initial query, it seems to have had no 
> positive affect
> on the longer procedure.
> 
> 
> Is there some other difference between 7.3.2 and 8.1.4  we're missing?

I suggest you provide "explain analyze" output for the query on both versions.

> 
> 
> Thanks for any help.
> Regards,
> Susan Russo
> 
> 
> ===
> I enclose the db calls (selects) contained in the 'overall procedure' 
> referred to above (taken directly
> from a perl script):  THOUGH THIS RUNS IN 22 mins on Pg7.3.2, and >10 hours 
> on Pg8.1.4...
> 
>   my $aq = $dbh->prepare(sprintf("SELECT * from dbxref dx, db where 
> accession = '%s' and dx.db_id = db.db_id and db.name = 'GB_protein'",$rec));
> 
> 
> my $pq = $dbh->prepare(sprintf("SELECT o.genus, o.species, 
> f.feature_id, f.uniquename, f.name, accession, is_current from feature f, 
> feature_dbxref fd, dbxref d, cvterm cvt, organism o where accession = '%s' 
> and d.dbxref_id = fd.dbxref_id and fd.feature_id = f.feature_id and 
> f.uniquename like '%s' and f.organism_id = o.organism_id and f.type_id = 
> cvt.cvterm_id and cvt.name = 'gene'",$rec,$fbgnwc));
> 
> 
> my $uq = $dbh2->prepare(sprintf("SELECT db.name, accession, 
> version, is_current from feature_dbxref fd, dbxref dx, db where fd.feature_id 
> = %d and fd.dbxref_id = dx.dbxref_id and dx.db_id = db.db_id and db.name = 
> '%s'",$pr{feature_id},$uds{$uh{$rec}{stflag}}));
> 
> 
> 
>my $cq = $dbh2->prepare(sprintf("SELECT f.uniquename, f.name, cvt.name 
> as ntype, dx.db_id, dx.accession, fd.is_current from dbxref dx, feature f, 
> feature_dbxref fd, cvte
> rm cvt where accession like '%s' and dx.dbxref_id = fd.dbxref_id and 
> fd.feature_id = f.feature_id and f.type_id = cvt.cvterm_id and cvt.name not 
> in 
> ('gene','protein','natural_transposable_element','chromoso

Re: [PERFORM] REVISIT specific query (not all) on Pg8 MUCH slower than Pg7

2007-05-10 Thread Steinar H. Gunderson
On Thu, May 10, 2007 at 09:23:03AM -0400, Susan Russo wrote:
>   my $aq = $dbh->prepare(sprintf("SELECT * from dbxref dx, db where 
> accession = '%s' and dx.db_id = db.db_id and db.name = 'GB_protein'",$rec));

This is not related to your performance issues, but it usually considered bad
form to use sprintf like this (mainly for security reasons). The usual way of
doing this would be:

  my $aq = $dbh->prepare("SELECT * from dbxref dx, db where accession = ? and 
dx.db_id = db.db_id and db.name = 'GB_protein'");
  $aq->execute($rec);

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[PERFORM] REVISIT specific query (not all) on Pg8 MUCH slower than Pg7

2007-05-10 Thread Susan Russo


Hi again,

Very mixed news to report...

Recap:


I'd reported:
> Despite numerous efforts, we're unable to solve a severe performance 
>limitation between Pg 7.3.2
> and Pg 8.1.4.
>
> The query and 'explain analyze' plan below, runs in 
>   26.20 msec on Pg 7.3.2, and 
>   2463.968 ms on Pg 8.1.4, 
>


Tom Lane responded:
>You're not getting the indexscan optimization of the LIKE clause, which
>is most likely due to having initdb'd the 8.1 installation in something
>other than C locale.  You can either redo the initdb in C locale (which
>might be a good move to fix other inconsistencies from the 7.3 behavior
>you're used to) or create a varchar_pattern_ops index on the column(s)
>you're using LIKE with.


Steinar H. Gunderson suggested:
>You could always try
>
>  CREATE INDEX test_index ON dbxref (accession varchar_pattern_ops);


I'd responded:
>>You could always try
>>
>>  CREATE INDEX test_index ON dbxref (accession varchar_pattern_ops);
>
>WOW!  we're now at runtime 0.367ms on Pg8
>
>Next step is to initdb w/C Locale (tonight) (Thanks Tom et al.!).
>
>Thanks again - will report back soon.


Alvaro Herrera pointed out:
>> Next step is to initdb w/C Locale (tonight) (Thanks Tom et al.!).
>>
>That's alternative to the pattern_ops index; it won't help you obtain a
>plan faster than this one.



Tom concurred:
>>> Next step is to initdb w/C Locale (tonight) (Thanks Tom et al.!).
>>
>> That's alternative to the pattern_ops index; it won't help you obtain
>> a plan faster than this one.
>
>No, but since their old DB was evidently running in C locale, this
>seems like a prudent thing to do to avoid other surprising 
>changes in behavior.

=

We reconfigured the server, as follows:  

initdb -D /var/lib/pgsql/data --encoding=UTF8 --locale=C

 -I'm wondering if this was incorrect (?). our Pg7 servers encode SQL_ASCII 
-


NEXT, loaded db, and the good news is the query showed:
Total runtime:  0.372 ms


As mentioned in original post, this query is just part of a longer procedure.
reminder:  
 The longer procedure was taking >10 *hours* to run on Pg8.1.4
This same longer procedure runs in ~22 minutes on Pg7.3.2 
server.


=

Before redoing the initdb with C-locale, I did a CREATE INDEX on the 8.1.4
server, which resulted not only in much faster query times, but in a drastic
improvement in the time of the overall/longer procedure (<11mins).

With the initdb locale C Pg8.1.4 server, it ran for 6 hours before I killed it 
(and output
file was <.25 expected  end size).

==


I'm perplexed we're not seeing better performance on Pg8.1.4 server given the 
22 minutes runtime we're seeing on the Pg7.3.2 servers (on older hardware and 
OS).


So, while initdb  C locale helped the initial query, it seems to have had no 
positive affect
on the longer procedure.


Is there some other difference between 7.3.2 and 8.1.4  we're missing?


Thanks for any help.
Regards,
Susan Russo


===
I enclose the db calls (selects) contained in the 'overall procedure' referred 
to above (taken directly
from a perl script):  THOUGH THIS RUNS IN 22 mins on Pg7.3.2, and >10 hours on 
Pg8.1.4...

my $aq = $dbh->prepare(sprintf("SELECT * from dbxref dx, db where 
accession = '%s' and dx.db_id = db.db_id and db.name = 'GB_protein'",$rec));


my $pq = $dbh->prepare(sprintf("SELECT o.genus, o.species, 
f.feature_id, f.uniquename, f.name, accession, is_current from feature f, 
feature_dbxref fd, dbxref d, cvterm cvt, organism o where accession = '%s' and 
d.dbxref_id = fd.dbxref_id and fd.feature_id = f.feature_id and f.uniquename 
like '%s' and f.organism_id = o.organism_id and f.type_id = cvt.cvterm_id and 
cvt.name = 'gene'",$rec,$fbgnwc));


my $uq = $dbh2->prepare(sprintf("SELECT db.name, accession, 
version, is_current from feature_dbxref fd, dbxref dx, db where fd.feature_id = 
%d and fd.dbxref_id = dx.dbxref_id and dx.db_id = db.db_id and db.name = 
'%s'",$pr{feature_id},$uds{$uh{$rec}{stflag}}));



   my $cq = $dbh2->prepare(sprintf("SELECT f.uniquename, f.name, cvt.name 
as ntype, dx.db_id, dx.accession, fd.is_current from dbxref dx, feature f, 
feature_dbxref fd, cvte
rm cvt where accession like '%s' and dx.dbxref_id = fd.dbxref_id and 
fd.feature_id = f.feature_id and f.type_id = cvt.cvterm_id and cvt.name not in 
('gene','protein','natural_transposable_element','chromosome_structure_variation','chromosome_arm','repeat_region')",$nacc));




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Cannot make GIN intarray index be used by the planner

2007-05-10 Thread Valentine Gogichashvili

Hello again,

I got the opclass for the index and it looks like it is a default one

myvideoindex=# select pg_opclass.*, pg_type.typname
myvideoindex-#   from pg_index, pg_opclass, pg_type
myvideoindex-#  where pg_index.indexrelid =
'idx_nonnulls_myintarray_int4_gin'::regclass
myvideoindex-#and pg_opclass.oid = any (pg_index.indclass::oid[] )
myvideoindex-#and pg_type.oid = pg_opclass.opcintype;

opcamid |  opcname  | opcnamespace | opcowner | opcintype | opcdefault |
opckeytype | typname
-+---+--+--+---+++-
   2742 | _int4_ops |   11 |   10 |  1007 | t
| 23 | _int4
(1 row)

The search_path is set to the following

myvideoindex=# show search_path;
   search_path

"versionA", public
(1 row)

With best regards,

-- Valentine

On 5/9/07, Tom Lane <[EMAIL PROTECTED]> wrote:


[cc'ing to pgsql-hackers since this is looking like a contrib/intarray
bug]

"Valentine Gogichashvili" <[EMAIL PROTECTED]> writes:
> here is the DT

That works fine for me in 8.2:

regression=#  explain SELECT id, (myintarray_int4)
  FROM myintarray_table_nonulls
WHERE ARRAY[8] <@ myintarray_int4;
QUERY PLAN

--
Index Scan using idx_nonnulls_myintarray_int4_gin on
myintarray_table_nonulls  (cost=0.00..8.27 rows=1 width=36)
   Index Cond: ('{8}'::integer[] <@ myintarray_int4)
(2 rows)

What I am betting is that you've installed contrib/intarray in this
database and that's bollixed things up somehow.  In particular, intarray
tries to take over the position of "default" gin opclass for int4[],
and the opclass that it installs as default has operators named just
like the built-in ones.  If somehow your query is using pg_catalog.<@
instead of intarray's public.<@, then the planner wouldn't think the
index is relevant.

In a quick test your example still works with intarray installed, because
what it's really created is public.<@ (integer[], integer[]) which is
an exact match and therefore takes precedence over the built-in
pg_catalog.<@ (anyarray, anyarray).  But if for example you don't have
public in your search_path then the wrong operator would be chosen.

Please look at the pg_index entry for your index, eg

select * from pg_index where indexrelid =
'"versionA".idx_nonnulls_myintarray_int4_gin'::regclass;

and see whether the index opclass is the built-in one or not.

Note to hackers: we've already discussed that intarray shouldn't be
trying to take over the default gin opclass, but I am beginning to
wonder if it still has a reason to live at all.  We should at least
consider removing the redundant operators to avoid risks like this one.

regards, tom lane





--
ვალენტინ გოგიჩაშვილი
Valentine Gogichashvili