[PERFORM] Test...

2003-09-29 Thread David Griffiths



I've posted several emails, and have yet to see one
show up (this one might not either).

Is there a size limit on emails? (Mine had a big
EXPLAIN ANALYZE and schema information.)
David


[PERFORM] Tuning/performance issue...

2003-09-30 Thread David Griffiths




We're having a problem with a query during our
investigation into Postgres (as an Oracle replacement). Postgres takes
20-40 seconds on this query (multiple runs). Tom Lane recommended I post it here, with an
explain-analyze.
 
Here's the query:
 
EXPLAIN ANALYZE SELECT company_name, address_1, address_2, address_3, city,
    address_list.state_province_id, state_province_short_desc, country_desc,
    zip_code, address_list.country_id, contact_info.email,
    commercial_entity.user_account_id, phone_num_1, phone_num_fax, website,
    boats_website
FROM commercial_entity, country, user_account, address_list
    LEFT JOIN state_province ON address_list.state_province_id = state_province.state_province_id
    LEFT JOIN contact_info ON address_list.contact_info_id = contact_info.contact_info_id
WHERE address_list.address_type_id = 101
    AND commercial_entity.commercial_entity_id = 225528
    AND commercial_entity.commercial_entity_id = address_list.commercial_entity_id
    AND address_list.country_id = country.country_id
    AND commercial_entity.user_account_id = user_account.user_account_id
    AND user_account.user_role_id IN (101, 101);
 
Here's the explain:
 
 Nested Loop  (cost=0.00..64570.33 rows=1 width=385) (actual time=42141.08..42152.06 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..64567.30 rows=1 width=361) (actual time=42140.80..42151.77 rows=1 loops=1)
         ->  Nested Loop  (cost=0.00..64563.97 rows=1 width=349) (actual time=42140.31..42151.27 rows=1 loops=1)
               Join Filter: ("outer".commercial_entity_id = "inner".commercial_entity_id)
               ->  Index Scan using commercial_entity_pkey on commercial_entity  (cost=0.00..5.05 rows=1 width=94) (actual time=0.57..0.58 rows=1 loops=1)
                     Index Cond: (commercial_entity_id = 225528::numeric)
               ->  Materialize  (cost=63343.66..63343.66 rows=97221 width=255) (actual time=41741.96..41901.17 rows=90527 loops=1)
                     ->  Merge Join  (cost=0.00..63343.66 rows=97221 width=255) (actual time=1.44..41387.68 rows=90527 loops=1)
                           Merge Cond: ("outer".contact_info_id = "inner".contact_info_id)
                           ->  Nested Loop  (cost=0.00..830457.52 rows=97221 width=222) (actual time=0.95..39178.32 rows=90527 loops=1)
                                 Join Filter: ("outer".state_province_id = "inner".state_province_id)
                                 ->  Index Scan using addr_list_ci_id_i on address_list  (cost=0.00..586676.65 rows=97221 width=205) (actual time=0.49..2159.90 rows=90527 loops=1)
                                       Filter: (address_type_id = 101::numeric)
                                 ->  Seq Scan on state_province  (cost=0.00..1.67 rows=67 width=17) (actual time=0.00..0.21 rows=67 loops=90527)
                           ->  Index Scan using contact_info_pkey on contact_info  (cost=0.00..3366.76 rows=56435 width=33) (actual time=0.44..395.75 rows=55916 loops=1)
         ->  Index Scan using user_account_pkey on user_account  (cost=0.00..3.32 rows=1 width=12) (actual time=0.46..0.46 rows=1 loops=1)
               Index Cond: ("outer".user_account_id = user_account.user_account_id)
               Filter: (user_role_id = 101::numeric)
   ->  Index Scan using country_pkey on country  (cost=0.00..3.01 rows=1 width=24) (actual time=0.25..0.25 rows=1 loops=1)
         Index Cond: ("outer".country_id = country.country_id)
 Total runtime: 42165.44 msec
(21 rows)
 
 
I will post the schema in a separate email - the list has rejected one big
email 3 times now.
 
David


[PERFORM] Tuning/performance issue (part 2)

2003-09-30 Thread David Griffiths




Here's the schema:
 
    
              Table "public.address_list"
        Column        |          Type          | Modifiers
----------------------+------------------------+-----------
 address_list_id      | numeric(10,0)          | not null
 address_1            | character varying(100) |
 address_2            | character varying(100) |
 address_3            | character varying(100) |
 city                 | character varying(100) |
 zip_code             | character varying(20)  |
 phone_num_1          | character varying(100) |
 phone_num_2          | character varying(100) |
 phone_num_fax        | character varying(100) |
 state_province_id    | numeric(10,0)          |
 user_account_id      | numeric(10,0)          |
 marina_id            | numeric(10,0)          |
 commercial_entity_id | numeric(10,0)          |
 address_type_id      | numeric(10,0)          | not null
 distributor_id       | numeric(10,0)          |
 contact_info_id      | numeric(10,0)          |
 country_id           | numeric(10,0)          |
 lang_id              | numeric(10,0)          |
 boat_listing_id      | numeric(10,0)          |
Indexes: address_list_pkey primary key btree (address_list_id),
         addr_list_addr_type_id_i btree (address_type_id),
         addr_list_bl_id_i btree (boat_listing_id),
         addr_list_bl_sp_count_i btree (boat_listing_id, state_province_id, country_id),
         addr_list_ce_sp_c_at_c_i btree (commercial_entity_id, state_province_id, country_id, address_type_id, city),
         addr_list_ce_sp_countr_addr_type_i btree (commercial_entity_id, state_province_id, country_id, address_type_id),
         addr_list_ci_id_i btree (contact_info_id),
         addr_list_comm_ent_id_i btree (commercial_entity_id),
         addr_list_count_lang_i btree (country_id, lang_id),
         addr_list_country_id_i btree (country_id),
         addr_list_cty_bl_count_i btree (city, boat_listing_id, country_id),
         addr_list_cty_i btree (city),
         addr_list_distrib_id_i btree (distributor_id),
         addr_list_marina_id_i btree (marina_id),
         addr_list_sp_id_i btree (state_province_id),
         addr_list_ua_id_i btree (user_account_id)
Foreign Key constraints:
         $1 FOREIGN KEY (address_type_id) REFERENCES address_type(address_type_id) ON UPDATE NO ACTION ON DELETE NO ACTION,
         $2 FOREIGN KEY (commercial_entity_id) REFERENCES commercial_entity(commercial_entity_id) ON UPDATE NO ACTION ON DELETE NO ACTION,
         $3 FOREIGN KEY (contact_info_id) REFERENCES contact_info(contact_info_id) ON UPDATE NO ACTION ON DELETE NO ACTION,
         $4 FOREIGN KEY (user_account_id) REFERENCES user_account(user_account_id) ON UPDATE NO ACTION ON DELETE NO ACTION,
         $5 FOREIGN KEY (state_province_id) REFERENCES state_province(state_province_id) ON UPDATE NO ACTION ON DELETE NO ACTION
 
   
                   Table "public.commercial_entity"
          Column           |            Type             | Modifiers
---------------------------+-----------------------------+-----------
 commercial_entity_id      | numeric(10,0)               | not null
 company_name              | character varying(100)      | not null
 website                   | character varying(200)      |
 modify_date               | timestamp without time zone |
 user_account_id           | numeric(10,0)               |
 source_id                 | numeric(10,0)               | not null
 commercial_entity_type_id | numeric(10,0)               |
 boats_website             | character varying(200)      |
 updated_on                | timestamp without time zone | not null default ('now'::text)::timestamp(6) with time zone
 dealer_level_id           | numeric(10,0)               |
 lang_id                   | numeric(10,0)               | default '100'
 yw_account_id             | numeric(10,0)               |
 keybank_dealer_code       | numeric(10,0)               |
 dnetaccess_id             | numeric(10,0)               | not null default 0
 interested_in_dns         | numeric(10,0)               | not null default 0
 parent_office_id          | numeric(10,0)               |
 marinesite_welcome_msg    | character varying(500)      |
 alt_marinesite_homepage   | character varying(256)      |
 comments                  | character varying(4000)     |
 show_finance_yn           | character varying(1)        | not null default 'Y'
 show_insurance_yn         | character varying(1)        | not null default 'Y'
 show_shipping_yn          | character varying(1)        | not null default 'Y'
 yw_account_id_c           | character varying(11)       |
 sales_id                  | numeric(10,0)               |
Indexes: commercial_entity_pkey primary key btree (commercial_entity_id),

[PERFORM] Tuning/performance issue....

2003-09-30 Thread David Griffiths



And finally,
 

Here's the contents of the postgresql.conf file
(I've been playing with these settings the last couple of days, and using the
guide at http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html to
make sure I didn't have it mis-tuned):
 
tcpip_socket = true
max_connections = 500        # We will need quite a few connections;
                             # currently only one connection to database, however
port = 5432
shared_buffers = 5000        # I've tried 5000 to 80,000 with no apparent difference
wal_buffers = 16
sort_mem = 256               # decreased this due to the large # of connections
effective_cache_size = 5     # read that this can improve performance; hasn't done anything
 
The machine is a dual Pentium 3 933 MHz, with 2
gigabytes of RAM and a 3Ware RAID-5 card.

As a reference, our production Oracle database
(exactly the same hardware, but RAID mirroring), with far more load, can handle
the query in 1-2 seconds. I have MySQL 4.0.14 with InnoDB on the same machine
(shut down when I am testing Postgres, and vice versa) and it does the query in
0.20 seconds.
 
Thanks for any insight.
David.


Re: [PERFORM] Tuning/performance issue...

2003-09-30 Thread David Griffiths
> The most efficient way to handle this query would probably be to join
> the three tables with restrictions first, and then join the other tables
> to those.  You could force this with not too much rewriting using
> something like (untested, but I think it's right)
>
> ... FROM commercial_entity CROSS JOIN user_account CROSS JOIN
> address_list LEFT JOIN state_province ON address_list.state_province_id
> = state_province.state_province_id
> LEFT JOIN contact_info ON address_list.contact_info_id =
> contact_info.contact_info_id
> CROSS JOIN country
> WHERE ...
>
> The explicit JOINs associate left-to-right, so this gives the intended
> join order.  (In your original query, explicit JOIN binds more tightly
> than commas do.)

Ok - that's interesting - I'll have to do some reading and more testing.
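
Putting that suggestion together with the original query, the rewrite would
look roughly like this (an untested sketch on my part - the SELECT list and
WHERE clause are copied unchanged from the original query):

```sql
SELECT company_name, address_1, address_2, address_3, city,
       address_list.state_province_id, state_province_short_desc,
       country_desc, zip_code, address_list.country_id, contact_info.email,
       commercial_entity.user_account_id, phone_num_1, phone_num_fax,
       website, boats_website
FROM commercial_entity
    CROSS JOIN user_account
    CROSS JOIN address_list
    LEFT JOIN state_province ON address_list.state_province_id = state_province.state_province_id
    LEFT JOIN contact_info ON address_list.contact_info_id = contact_info.contact_info_id
    CROSS JOIN country
WHERE address_list.address_type_id = 101
    AND commercial_entity.commercial_entity_id = 225528
    AND commercial_entity.commercial_entity_id = address_list.commercial_entity_id
    AND address_list.country_id = country.country_id
    AND commercial_entity.user_account_id = user_account.user_account_id
    AND user_account.user_role_id IN (101, 101);
```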

> The reason PG's planner doesn't discover this join order for itself
> is that it's written to not attempt to re-order outer joins from the
> syntactically defined ordering.  In general, such reordering would
> change the results.  It is possible to analyze the query and prove that
> certain reorderings are valid (don't change the results), but we don't
> currently have code to do that.

Not sure I follow. Are you saying that, depending on when the outer join is
applied to the rows found at the time, you may end up with a different set
of rows? I would have expected the optimizer to do the outer joins last, as
the extra data retrieved by the outer joins is not mandatory, and won't
affect the rows that were retrieved by joining user_account, address_list,
and commercial_entity.

An outer join would *never* be the most restrictive
join in a query. I thought (from my readings on Oracle query tuning) that
finding the most restrictive table/index was the first task of an optimizer:
reduce the result set as quickly as possible. That query has the line,

"AND commercial_entity.commercial_entity_id=225528",

which uses an index (primary key) and uses an "=". I would have expected
that to be done first, then joined with the other inner-join tables, and
finally have the outer joins applied to the final result set to fill in the
"might be there" data.

Anyway, if the optimizer does the outer joins first (address_list with
state_province and contact_info), then it's picking the table with the most
rows (address_list has 200K+ rows, where the other 3 big tables have
70K-90K). Would re-ordering the FROM clause (and LEFT JOIN portions) help?

Could you give an example where applying an outer join at a different time
could produce different results? I think I can see a situation where you use
part of the outer-join results in the WHERE clause, but I am not sure.
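
Actually, while writing this I tried to construct a small made-up case
(hypothetical tables a, b, c - nothing to do with our schema), and I think
it shows the order mattering:

```sql
-- a has one row; b is empty; c has one row.
CREATE TABLE a (x int);
CREATE TABLE b (x int, y int);
CREATE TABLE c (y int);
INSERT INTO a VALUES (1);
INSERT INTO c VALUES (1);

-- Outer join first: the LEFT JOIN preserves a's row with NULLs for b,
-- but the later inner join on b.y = c.y then discards it (NULL never
-- matches), so the query returns 0 rows.
SELECT * FROM (a LEFT JOIN b ON a.x = b.x) JOIN c ON b.y = c.y;

-- Inner join first: b JOIN c is empty, and the LEFT JOIN then preserves
-- a's row, NULL-padded, so the query returns 1 row: (1, NULL, NULL, NULL).
SELECT * FROM a LEFT JOIN (b JOIN c ON b.y = c.y) ON a.x = b.x;
```

So, if that's right, the optimizer can't freely move an outer join past an
inner join on the nullable side without proving it safe.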

> I'm prepared to believe that Oracle contains code that actually does the
> analysis about which outer-join reorderings are valid, and is then able
> to find the right join order by deduction.

I'm not sure about Oracle (other than what I stated above). In fact, about
half the time, updating table stats to try to get the Oracle optimizer to
do a better job on a query results in even worse performance.

> ... FROM commercial_entity CROSS JOIN country) CROSS JOIN
>  user_account) CROSS JOIN address_list)
>  LEFT JOIN state_province ON ...)
>  LEFT JOIN contact_info ON ...
> WHERE ...
>
> This is clearly at odds with the SQL spec's syntactically defined join
> order semantics.  It's possible that it always yields the same results
> as the spec requires, but I'm not at all sure about that.

Again, I don't know. On the 3 queries based on these tables, Postgres
and MySQL return the exact same data (they use the same data set).

Do you have a link to the SQL spec's join-order requirements?

> In any case
> this strategy is certainly not "better" than ours, it just performs
> poorly on a different set of queries.  Would I be out of line to
> speculate that your query was previously tuned to work well in MySQL?

The query was pulled from our codebase (written for Oracle). I added a bit
to it to make it slower, and then ported it to MySQL and tested there first
(just re-wrote the outer-join syntax). I found that re-ordering the tables
in the FROM clause on MySQL changed the time by 45-ish% (0.36 seconds to
0.20 seconds), but that's because I had forgotten to re-analyze the tables
after refreshing the dataset. Now, table order doesn't make a difference in
speed (or results).

If anything, I've done more tuning for Postgres - added some extra indexes
to try to help (country.country_id had a composite index with another
column, but not an index for just it), etc.
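
For the record, the single-column index I added was along these lines (the
index name here is made up; I don't have the exact DDL in front of me):

```sql
-- Hypothetical name; country_id previously appeared only in a
-- composite index, so this gives the planner a single-column option.
CREATE INDEX country_country_id_i ON country (country_id);
ANALYZE country;
```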

The dataset and schema are pure Oracle. I extracted them out of the
database, removed all Oracle-specific extensions, changed the column types,
and migrated the indexes and foreign keys to MySQL and Postgres. Nothing
more (other than an extra index or two for Postgres - nada for MySQL).

This is all part of a "migrate away from Oracle" project. We are looking at
3 databases -
MySQL (InnoDB)