Re: [PERFORM] 7.4 vs 7.3 ( hash join issue )
Dennis Bjorklund wrote:
> On Wed, 22 Sep 2004, Gaetano Mendola wrote:
>> Now my question is: why does 7.4 choose the hash join? :-(
>
> It looks to me like the merge join is faster because there wasn't really
> anything to merge; it resulted in 0 rows. Maybe the hash join that is
> chosen in 7.4 would have been faster had there been a couple of result
> rows (just a guess).
>
> It would be interesting to compare the plans in 7.4 with and without
> hash_join active and see what costs it estimates for a merge join
> compared to a hash join.

Here they are:

hash_join = on

                                 QUERY PLAN
------------------------------------------------------------------------
 Limit  (cost=10.21..10.21 rows=1 width=24) (actual time=0.885..0.885 rows=0 loops=1)
   ->  Sort  (cost=10.21..10.21 rows=1 width=24) (actual time=0.880..0.880 rows=0 loops=1)
         Sort Key: e.id_evento
         ->  Hash Join  (cost=9.02..10.21 rows=1 width=24) (actual time=0.687..0.687 rows=0 loops=1)
               Hash Cond: ("outer".id_tipo_evento = "inner".id_tipo_evento)
               ->  Seq Scan on lookup_tipo_evento le  (cost=0.00..1.16 rows=16 width=32) (actual time=0.017..0.038 rows=16 loops=1)
               ->  Hash  (cost=9.02..9.02 rows=1 width=16) (actual time=0.212..0.212 rows=0 loops=1)
                     ->  Index Scan using t_evento_id_pratica_key on t_evento e  (cost=0.00..9.02 rows=1 width=16) (actual time=0.208..0.208 rows=0 loops=1)
                           Index Cond: (id_pratica = 5)
                           Filter: (((id_tipo_evento)::text = '5'::text) OR ((id_tipo_evento)::text = '6'::text) OR ((id_tipo_evento)::text = '7'::text) OR ((id_tipo_evento)::text = '8'::text))
 Total runtime: 1.244 ms
(11 rows)

hash_join = off

                                 QUERY PLAN
------------------------------------------------------------------------
 Limit  (cost=10.28..10.28 rows=1 width=24) (actual time=0.429..0.429 rows=0 loops=1)
   ->  Sort  (cost=10.28..10.28 rows=1 width=24) (actual time=0.425..0.425 rows=0 loops=1)
         Sort Key: e.id_evento
         ->  Merge Join  (cost=10.25..10.27 rows=1 width=24) (actual time=0.218..0.218 rows=0 loops=1)
               Merge Cond: ("outer".id_tipo_evento = "inner".id_tipo_evento)
               ->  Sort  (cost=9.02..9.02 rows=1 width=16) (actual time=0.214..0.214 rows=0 loops=1)
                     Sort Key: e.id_tipo_evento
                     ->  Index Scan using t_evento_id_pratica_key on t_evento e  (cost=0.00..9.02 rows=1 width=16) (actual time=0.110..0.110 rows=0 loops=1)
                           Index Cond: (id_pratica = 5)
                           Filter: (((id_tipo_evento)::text = '5'::text) OR ((id_tipo_evento)::text = '6'::text) OR ((id_tipo_evento)::text = '7'::text) OR ((id_tipo_evento)::text = '8'::text))
               ->  Sort  (cost=1.22..1.23 rows=16 width=32) (never executed)
                     Sort Key: le.id_tipo_evento
                     ->  Seq Scan on lookup_tipo_evento le  (cost=0.00..1.16 rows=16 width=32) (never executed)
 Total runtime: 0.721 ms
(14 rows)

Regards
Gaetano Mendola

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
       http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] 7.4 vs 7.3 ( hash join issue )
On Wed, 22 Sep 2004, Gaetano Mendola wrote:

> Limit  (cost=10.21..10.21 rows=1 width=24) (actual time=0.885..0.885 rows=0 loops=1)
> Limit  (cost=10.28..10.28 rows=1 width=24) (actual time=0.429..0.429 rows=0 loops=1)

These estimated costs are almost the same, but the runtimes differ quite a bit. This means you may need to alter settings like random_page_cost, effective_cache_size and maybe some others so that the estimated costs reflect the actual runtimes better.

Since the costs are so close to each other, very small changes can make the planner choose the other plan. It's also very hard to make an estimate that is correct in all situations; that's why it's called an estimate, after all.

-- 
/Dennis Björklund
Re: [PERFORM] 7.4 vs 7.3 ( hash join issue )
Dennis Bjorklund wrote:
> On Wed, 22 Sep 2004, Gaetano Mendola wrote:
>
>> Limit  (cost=10.21..10.21 rows=1 width=24) (actual time=0.885..0.885 rows=0 loops=1)
>> Limit  (cost=10.28..10.28 rows=1 width=24) (actual time=0.429..0.429 rows=0 loops=1)
>
> These estimated costs are almost the same, but the runtimes differ a bit.
> This means that maybe you need to alter settings like random_page_cost,
> effective_cache_size and maybe some others to make the cost reflect the
> runtime better.
>
> Since the costs are so close to each other very small changes can make it
> choose the other plan. It's also very hard to make an estimate that is
> correct in all situations. That's why it's called an estimate after all.

That's not feasible. Those values were obtained with random_page_cost = 2, effective_cache_size = 2, cpu_tuple_cost = 0.01. Increasing or decreasing random_page_cost increases or decreases both costs:

random_page_cost = 1.5
hashjoin on  => 8.47
hashjoin off => 8.53

random_page_cost = 3
hashjoin on  => 13.70
hashjoin off => 13.76

so the hash join method is chosen in both cases. On the other hand, effective_cache_size doesn't affect these costs at all. Decreasing cpu_tuple_cost has the same effect:

cpu_tuple_cost = 0.005
hashjoin on  => 10.11
hashjoin off => 10.17

cpu_tuple_cost = 0.001
hashjoin on  => 10.03
hashjoin off => 10.03

cpu_tuple_cost = 0.0005
hashjoin on  => 10.01
hashjoin off => 10.01

And when the two costs are the same, the hash join path is chosen. I don't think a cpu_tuple_cost below 0.001 is a good idea, so I think the only way is to set hashjoin = off. Any other suggestions?

Regards
Gaetano Mendola
Re: [PERFORM] 7.4 vs 7.3 ( hash join issue )
Gaetano Mendola <[EMAIL PROTECTED]> writes:

> hash_join = on
>   ->  Seq Scan on lookup_tipo_evento le  (cost=0.00..1.16 rows=16 width=32) (actual time=0.017..0.038 rows=16 loops=1)
>
> hash_join = off
>   ->  Seq Scan on lookup_tipo_evento le  (cost=0.00..1.16 rows=16 width=32) (never executed)

Actually this looks like it's arguably a bug to me. Why does the hash join execute the sequential scan at all? Shouldn't it, like the merge join, recognize that the other (hashed) relation is empty and skip the sequential scan entirely?

-- 
greg
Re: [PERFORM] 7.4 vs 7.3 ( hash join issue )
On 22 Sep 2004, Greg Stark wrote:

> Actually this looks like it's arguably a bug to me. Why does the hash
> join execute the sequential scan at all? Shouldn't it, like the merge
> join, recognize that the other (hashed) relation is empty and skip the
> sequential scan entirely?

I'm not sure you can classify that as a bug. It's just that one of the plans started with the empty scan and because of that didn't need the other, while the hash join started with the table that had 16 rows and then got to the empty one. While I haven't checked, I assume that if it had started with the empty table it would have skipped the other.

I don't know what criteria are used to select which part to start with when doing a hash join. It looks like it started with the one that had the higher row estimate here; doing it the other way around might be a good idea, because in some cases you are lucky enough to find an empty scan and can omit the other.

The above are just observations of the behaviour; I've not seen the source at all.

-- 
/Dennis Björklund
Re: [PERFORM] 7.4 vs 7.3 ( hash join issue )
Dennis Bjorklund <[EMAIL PROTECTED]> writes:

> On 22 Sep 2004, Greg Stark wrote:
>
>> Actually this looks like it's arguably a bug to me. Why does the hash
>> join execute the sequential scan at all? Shouldn't it, like the merge
>> join, recognize that the other (hashed) relation is empty and skip the
>> sequential scan entirely?
>
> I'm not sure you can classify that as a bug. It's just that one of the
> plans started with the empty scan and because of that didn't need the
> other, but with the hash join it started with the table that had 16
> rows and then got to the empty one.

No, postgres didn't do things in reverse order. It hashed the empty table and then went ahead and checked every record of the non-empty table against the empty hash table.

Reading the code, there's no check for this, and it seems like it would be a useful low-cost little optimization. I think postgres normally hashes the table it thinks is smaller, so if you join against an empty relation it should end up on the hash side of the hash join, allowing postgres to avoid the scan of the outer table.

-- 
greg
Re: [PERFORM] 7.4 vs 7.3 ( hash join issue )
Greg Stark <[EMAIL PROTECTED]> writes:

> No, postgres didn't do things in reverse order. It hashed the empty table
> and then went ahead and checked every record of the non-empty table
> against the empty hash table.
>
> Reading the code there's no check for this, and it seems like it would be
> a useful low-cost little optimization.

Yeah, I was just looking at doing that. It would also be interesting to prefetch one row from the outer table and fall out immediately (without building the hash table) if the outer table is empty. This seems to require some contortion of the code though :-(

> I think postgres normally hashes the table it thinks is smaller,

Right, it will prefer to put the physically smaller table (estimated width*rows) on the inside.

			regards, tom lane
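The optimization under discussion can be sketched outside the executor. Below is a toy hash join in Python — purely illustrative, with invented names; it is not PostgreSQL's C code — showing the early exit: build the hash table from the inner relation, and skip the outer scan entirely when the hash table turns out to be empty.

```python
# Toy model of a hash join with the empty-inner short-circuit
# discussed above. Relations are lists of dicts; this is a sketch,
# not the real executor.

def hash_join(outer, inner, outer_key, inner_key):
    """Join two relations on outer[outer_key] == inner[inner_key]."""
    # Build phase: hash the (presumed smaller) inner relation.
    table = {}
    for row in inner:
        table.setdefault(row[inner_key], []).append(row)

    # Early exit: an empty hash table can match nothing, so the
    # outer relation need not be scanned at all.
    if not table:
        return []

    # Probe phase: check each outer row against the hash table.
    result = []
    for row in outer:
        for match in table.get(row[outer_key], []):
            result.append({**row, **match})
    return result
```

Tom's further idea would be the symmetric check: fetch one outer row before building the hash table, and bail out just as early when the outer side is empty.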
Re: [PERFORM] 7.4 vs 7.3 ( hash join issue )
Tom Lane <[EMAIL PROTECTED]> writes:

> Yeah, I was just looking at doing that.

Well, I imagine it takes you as long to read my patch as it would for you to write it. But anyway, it's still useful to me as an exercise.

> It would also be interesting to prefetch one row from the outer table and
> fall out immediately (without building the hash table) if the outer table
> is empty. This seems to require some contortion of the code though :-(

Why is it any more complicated than just moving the hash build down lower? There's one small special case needed in ExecHashJoinOuterGetTuple, but it's pretty non-intrusive. It seems to work for me, but I can't test multiple batches easily. I think I've convinced myself that they would work fine, but...

test=# explain analyze select * from a natural join b;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Hash Join  (cost=22.50..345.00 rows=5000 width=4) (actual time=0.005..0.005 rows=0 loops=1)
   Hash Cond: ("outer".a = "inner".a)
   ->  Seq Scan on a  (cost=0.00..20.00 rows=1000 width=4) (actual time=0.002..0.002 rows=0 loops=1)
   ->  Hash  (cost=20.00..20.00 rows=1000 width=4) (never executed)
         ->  Seq Scan on b  (cost=0.00..20.00 rows=1000 width=4) (never executed)
 Total runtime: 0.070 ms
(6 rows)

Index: backend/executor/nodeHash.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/executor/nodeHash.c,v
retrieving revision 1.86
diff -c -r1.86 nodeHash.c
*** backend/executor/nodeHash.c	29 Aug 2004 04:12:31 -0000	1.86
--- backend/executor/nodeHash.c	22 Sep 2004 18:37:40 -0000
***************
*** 232,237 ****
--- 232,238 ----
  	hashtable->buckets = NULL;
  	hashtable->nbatch = nbatch;
  	hashtable->curbatch = 0;
+ 	hashtable->ntup = 0;
  	hashtable->innerBatchFile = NULL;
  	hashtable->outerBatchFile = NULL;
  	hashtable->innerBatchSize = NULL;
***************
*** 493,498 ****
--- 494,501 ----
  							   heapTuple->t_len);
  		hashTuple->next = hashtable->buckets[bucketno];
  		hashtable->buckets[bucketno] = hashTuple;
+ 
+ 		hashtable->ntup++;
  	}
  	else
  	{
Index: backend/executor/nodeHashjoin.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/executor/nodeHashjoin.c,v
retrieving revision 1.64
diff -c -r1.64 nodeHashjoin.c
*** backend/executor/nodeHashjoin.c	29 Aug 2004 05:06:42 -0000	1.64
--- backend/executor/nodeHashjoin.c	22 Sep 2004 18:37:41 -0000
***************
*** 109,143 ****
  	ResetExprContext(econtext);
  
  	/*
- 	 * if this is the first call, build the hash table for inner relation
- 	 */
- 	if (!node->hj_hashdone)
- 	{
- 		/*
- 		 * create the hash table
- 		 */
- 		Assert(hashtable == NULL);
- 		hashtable = ExecHashTableCreate((Hash *) hashNode->ps.plan,
- 										node->hj_HashOperators);
- 		node->hj_HashTable = hashtable;
- 
- 		/*
- 		 * execute the Hash node, to build the hash table
- 		 */
- 		hashNode->hashtable = hashtable;
- 		(void) ExecProcNode((PlanState *) hashNode);
- 
- 		/*
- 		 * Open temp files for outer batches, if needed. Note that file
- 		 * buffers are palloc'd in regular executor context.
- 		 */
- 		for (i = 0; i < hashtable->nbatch; i++)
- 			hashtable->outerBatchFile[i] = BufFileCreateTemp(false);
- 
- 		node->hj_hashdone = true;
- 	}
- 
- 	/*
  	 * Now get an outer tuple and probe into the hash table for matches
  	 */
  	outerTupleSlot = node->js.ps.ps_OuterTupleSlot;
--- 109,114 ----
***************
*** 163,171 ****
--- 134,180 ----
  		node->hj_MatchedOuter = false;
  
  		/*
+ 		 * if this is the first call, build the hash table for inner relation
+ 		 */
+ 		if (!node->hj_hashdone)
+ 		{
+ 			/*
+ 			 * create the hash table
+ 			 */
+ 			Assert(hashtable == NULL);
+ 			hashtable = ExecHashTableCreate((Hash *) hashNode->ps.plan,
+ 											node->hj_HashOperators);
+ 			node->hj_HashTable = hashtable;
+ 
+ 			/*
+ 			 * execute the H
Re: [PERFORM] 7.4 vs 7.3 ( hash join issue )
Greg Stark <[EMAIL PROTECTED]> writes:

>> It would also be interesting to prefetch one row from the outer table
>> and fall out immediately (without building the hash table) if the outer
>> table is empty. This seems to require some contortion of the code though :-(
>
> Why is it any more complicated than just moving the hash build down lower?

Having to inject the consideration into ExecHashJoinOuterGetTuple seems messy to me. On reflection I'm not sure it would be a win anyway, for a couple of reasons.

(1) Assuming that the planner has gotten things right and put the larger relation on the outside, the case of an empty outer relation and a nonempty inner one should rarely arise.

(2) Doing this would lose some of the benefit from the optimization to detect an empty inner relation. If the outer subplan is a slow-start one (such as another hashjoin), it would lose a lot of the benefit :-(

			regards, tom lane
[PERFORM] Caching of Queries
I couldn't find anything in the docs or in the mailing list on this, but it is something that Oracle appears to do, as does MySQL. The idea, I believe, is to do a quick (hash) string lookup of the query and, if it's exactly the same as another query that was executed recently, re-use the old parse tree. That would save the time spent parsing the SQL and looking up the objects in the system tables.

It should probably still go through the planner, because values passed as parameters may have changed. Although, for extra points, it could look at the previous query plan as a hint. On the surface it looks like an easy enhancement, but what do I know?

I suppose it would mostly benefit programs that issue a lot of PQexecParams() calls with simple queries, where a greater percentage of the time is spent parsing the SQL rather than building the execution plan.

What do you think?
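The idea above can be sketched in a few lines of Python. This is a toy model with invented names — a real server-side cache would also need invalidation on schema or statistics changes — but it shows the key point: the parse tree is cached by a hash of the query text, while planning still re-runs on every execution.

```python
import hashlib

class ParseTreeCache:
    """Toy model: cache parse trees keyed by the query text,
    but re-plan every execution since parameters may change."""

    def __init__(self):
        self._cache = {}
        self.hits = 0

    def parse(self, sql):
        # Stand-in for a real SQL parser: just tokenize.
        return tuple(sql.split())

    def plan(self, tree, params):
        # Planning always re-runs; parameter values may have changed.
        return ("plan", tree, tuple(params))

    def execute(self, sql, params=()):
        key = hashlib.sha1(sql.encode()).hexdigest()
        if key in self._cache:
            self.hits += 1          # parse step skipped
            tree = self._cache[key]
        else:
            tree = self.parse(sql)
            self._cache[key] = tree
        return self.plan(tree, params)
```

Repeating the same query text skips the parse step but still produces a fresh plan for the new parameters, which is exactly the trade-off proposed here.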
Re: [PERFORM] Caching of Queries
Scott Kirkwood <[EMAIL PROTECTED]> writes:

> What do you think?

I think this would allow the problems of cached plans to bite applications that were previously not subject to them :-(. An app that wants plan re-use can use PREPARE to identify the queries that are going to be re-executed.

			regards, tom lane
[PERFORM] Infinite CPU loop due to field ::type casting
Gentlefolk,

I'm not sure if this is the proper forum for this question, and it might have been answered in a previous thread, but I'm new to PostgreSQL and the research I did in the archives did not turn up anything addressing this issue. Please direct me to the proper forum if this is not the correct venue.

Environment: Red Hat Enterprise Linux 3 Workstation, PostgreSQL V7.3.6 (stock with the RHEL distribution)

The two tables I used in the example are tbl_device and tbl_sad_event:

vsa=# \d vsa.tbl_device;
                 Table "vsa.tbl_device"
     Column     |           Type           | Modifiers
----------------+--------------------------+-----------
 id             | integer                  | not null default nextval('vsa.tbl_device_id_seq'::text)
 name           | character varying(100)   | not null
 account_id     | bigint                   | not null
 vss_site_id    | bigint                   | not null
 org_site_id    | bigint                   | not null default 0
 device_type_id | integer                  | not null default 1
 os_family_id   | integer                  | not null default 0
 status_id      | integer                  | not null default 0
 timezone       | character varying(80)    |
 clientkey      | character varying(2048)  | not null
 record_created | timestamp with time zone | default now()
Indexes: pk_tbl_device primary key btree (id),
         idx_d_uniq_name_site_account_key unique btree (name, vss_site_id, account_id, clientkey),
         tbl_device_clientkey_key unique btree (clientkey),
         idx_d_account_id btree (account_id),
         idx_d_account_site_name btree (account_id, vss_site_id, name),
         idx_d_device_type_id btree (device_type_id),
         idx_d_name btree (name),
         idx_d_org_site_id btree (org_site_id),
         idx_d_os_family_id btree (os_family_id),
         idx_d_status_id btree (status_id),
         idx_d_vss_site_id btree (vss_site_id)
Foreign Key constraints: fk_d_va FOREIGN KEY (account_id) REFERENCES vsa.tbl_vsa_account(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
                         fk_d_vs FOREIGN KEY (vss_site_id) REFERENCES vsa.tbl_vss_site(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
                         fk_d_dof FOREIGN KEY (os_family_id) REFERENCES vsa.enum_device_os_family(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
                         fk_d_dsc FOREIGN KEY (status_id) REFERENCES vsa.enum_device_status_code(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
                         fk_d_dt FOREIGN KEY (device_type_id) REFERENCES vsa.enum_device_type(id) ON UPDATE NO ACTION ON DELETE NO ACTION
Triggers: trg_clean_device_name

vsa=# \d vsa.tbl_sad_event
              Table "vsa.tbl_sad_event"
     Column     |            Type             | Modifiers
----------------+-----------------------------+-----------
 id             | integer                     | not null default nextval('vsa.tbl_sad_event_id_seq'::text)
 device_id      | bigint                      | not null
 log_type       | integer                     |
 severity       | character varying(20)       |
 time_logged    | timestamp without time zone |
 user_name      | character varying(50)       |
 remote_user    | character varying(50)       |
 remote_host    | character varying(100)      |
 source_tag     | character varying(30)       |
 event_code     | character varying(50)       |
 type           | character varying(6)        |
 record_created | timestamp with time zone    | default now()
Indexes: pk_tbl_sad_event primary key btree (id),
         idx_se_dev_time_type btree (device_id, time_logged, "type"),
         idx_se_device_id btree (device_id),
         idx_se_time_logged btree (time_logged),
         idx_se_type btree ("type"),
         sjr_se_id_time_type btree (device_id, time_logged, "type")
Foreign Key constraints: fk_sade_d FOREIGN KEY (device_id) REFERENCES vsa.tbl_device(id) ON UPDATE NO ACTION ON DELETE CASCADE

Here is my original query, and the query plan generated by the planner:

vsa=# explain
SELECT dev.name, dev.vss_site_id, tbl.log_type, tbl.severity, tbl.count
FROM vsa.tbl_device AS dev
LEFT OUTER JOIN
  (SELECT stbl.device_id, stbl.log_type, stbl.severity, count(*)
   FROM vsa.dtbl_logged_event_20040922 AS stbl
   WHERE stbl.log_type IN (2, 3, 4, 5)
   GROUP BY stbl.device_id, stbl.log_type, stbl.severity) AS tbl
ON (dev.id=tbl.device_id)
ORDER BY dev.name;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Sort  (cost=40893.18..40960.93 rows=27100 width=79)
   Sort Key: dev.name
   ->  Merge Join  (cost=38417.13..38897.77 rows=27100 width=79)
         Merge Con
Re: [PERFORM] NAS, SAN or any alternate solution ?
Rod Taylor wrote:
| I've used both NetApp and Hitachi based SANs with PostgreSQL. Both
| work as well as expected, but do require some tweaking as they normally
| are not optimized for the datablock size that PostgreSQL likes to deal
| with (8k by default) -- this can make as much as a 50% difference in
| performance levels.

I'm looking for documentation about the datablock size you mentioned above. My goal is to tune the disk / filesystem on our prototype system. It's an EMC disk array, so sectors on disk are 512 bytes of usable space. We've decided to go with RAID 10, since the goal is to maximize performance. Currently the RAID element size is set at 16 sectors, which is 8192 bytes of payload.

I've got a sysadmin working on getting XFS going with 8192-byte blocks. My next task will be to calculate the amount of space used by XFS for headers etc. to find out how much of those 8192 bytes can be used for the postgres payload, and then configure postgres to use datablocks of that size. So I'm looking for details on how to manipulate the size of the datablock. I'm also not entirely sure how to make the datablocks line up with the filesystem blocks. Any suggestions on this would be greatly appreciated.

-- 
Andrew Hammond    416-673-4138    [EMAIL PROTECTED]
Database Administrator, Afilias Canada Corp.
CB83 2838 4B67 D40F D086 3568 81FC E7E5 27AF 4A9A
[PERFORM] Fw: Infinite CPU loop due to field ::type casting, Take II :-)
I just realized in my haste to send this email out that I provided the wrong table in my example. Below is the same email, but with vsa.dtbl_logged_event_20040922 substituted for vsa.tbl_sad_event. Sorry for the inconvenience.

--- Steve

Gentlefolk,

I'm not sure if this is the proper forum for this question, and it might have been answered in a previous thread, but I'm new to PostgreSQL and the research I did in the archives did not turn up anything addressing this issue. Please direct me to the proper forum if this is not the correct venue.

Environment: Red Hat Enterprise Linux 3 Workstation, PostgreSQL V7.3.6 (stock with the RHEL distribution)

The two tables I used in the example are tbl_device and dtbl_logged_event_20040922:

vsa=# \d vsa.tbl_device;
                 Table "vsa.tbl_device"
     Column     |           Type           | Modifiers
----------------+--------------------------+-----------
 id             | integer                  | not null default nextval('vsa.tbl_device_id_seq'::text)
 name           | character varying(100)   | not null
 account_id     | bigint                   | not null
 vss_site_id    | bigint                   | not null
 org_site_id    | bigint                   | not null default 0
 device_type_id | integer                  | not null default 1
 os_family_id   | integer                  | not null default 0
 status_id      | integer                  | not null default 0
 timezone       | character varying(80)    |
 clientkey      | character varying(2048)  | not null
 record_created | timestamp with time zone | default now()
Indexes: pk_tbl_device primary key btree (id),
         idx_d_uniq_name_site_account_key unique btree (name, vss_site_id, account_id, clientkey),
         tbl_device_clientkey_key unique btree (clientkey),
         idx_d_account_id btree (account_id),
         idx_d_account_site_name btree (account_id, vss_site_id, name),
         idx_d_device_type_id btree (device_type_id),
         idx_d_name btree (name),
         idx_d_org_site_id btree (org_site_id),
         idx_d_os_family_id btree (os_family_id),
         idx_d_status_id btree (status_id),
         idx_d_vss_site_id btree (vss_site_id)
Foreign Key constraints: fk_d_va FOREIGN KEY (account_id) REFERENCES vsa.tbl_vsa_account(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
                         fk_d_vs FOREIGN KEY (vss_site_id) REFERENCES vsa.tbl_vss_site(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
                         fk_d_dof FOREIGN KEY (os_family_id) REFERENCES vsa.enum_device_os_family(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
                         fk_d_dsc FOREIGN KEY (status_id) REFERENCES vsa.enum_device_status_code(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
                         fk_d_dt FOREIGN KEY (device_type_id) REFERENCES vsa.enum_device_type(id) ON UPDATE NO ACTION ON DELETE NO ACTION
Triggers: trg_clean_device_name

vsa=# \d vsa.dtbl_logged_event_20040922
         Table "vsa.dtbl_logged_event_20040922"
     Column      |            Type             | Modifiers
-----------------+-----------------------------+-----------
 id              | integer                     | not null default nextval('vsa.dtbl_logged_event_20040922_id_seq'::text)
 device_id       | bigint                      | not null
 report_datetime | timestamp without time zone |
 time_logged     | timestamp without time zone |
 log_type        | integer                     | not null
 type            | character varying(50)       |
 severity        | character varying(30)       |
 source_tag      | character varying(30)       |
 remote_host     | character varying(100)      |
 user_name       | character varying(50)       |
 event_code      | character varying(10)       |
 description     | text                        |
 record_created  | timestamp with time zone    | default now()
 event_code_new  | character varying(30)       |
 remote_user     | character varying(50)       |
Indexes: pk_dtbl_logged_event_20040922 primary key btree (id),
         idx_le_device_id_20040922 btree (device_id),
         idx_le_id_source_event_20040922 btree (device_id, source_tag, event_code),
         idx_le_id_src_20040922 btree (device_id, source_tag),
         idx_le_id_type_severity_evtcode_20040922 btree (device_id, log_type, severity, event_code),
         idx_le_log_type_20040922 btree (log_type),
         idx_le_source_tag_20040922 btree (source_tag),
         idx_le_time_logged_20040922 btree (time_logged),
         idx_le_time_type_20040922 btree (time_logged, log_type)
Foreign Key constraints: fk_le_lelt_20040922 FOREIGN KEY (log_type) REFERENCES vsa.enum_le_log_type(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
                         fk_le_d_20040922 FOREIGN KEY (device_id) REFERENCES vsa.tbl_device(id) ON UPDATE NO ACTION ON DELETE CA
Re: [PERFORM] NAS, SAN or any alternate solution ?
> Rod Taylor wrote:
> | I've used both NetApp and Hitachi based SANs with PostgreSQL. Both
> | work as well as expected, but do require some tweaking as they normally
> | are not optimized for the datablock size that PostgreSQL likes to deal
> | with (8k by default) -- this can make as much as a 50% difference in
> | performance levels.

> I'm also not entirely sure how to make the datablocks line up with the
> filesystem blocks. Any suggestions on this would be greatly appreciated.

We just played with Veritas settings while running pg_bench on a 200GB database. I no longer have access to the NetApp, but the settings for the Hitachi are below.

In tunefstab we have:

read_pref_io=8192,read_nstream=4,write_pref_io=8192,write_nstream=2

In fstab it's:

defaults,mincache=tmpcache,noatime

If you have better settings, please shoot them over so we can try them out. Perhaps we could even get someone over there to write a new SAN section in the Tuning Chapter.
[PERFORM] SAN performance
Hello, I’ll be moving a DB from internal RAID-10 SCSI storage to an EMC CX300 FC RAID-10 LUN, bound to the host. I’ve setup a test host machine and a test LUN. The /var/lib/pgsql/data folder is sym-linked to a partition on the LUN. Other than the shared_buffers, effective cache size, and sort memory, I am not sure if I need to change any other parameters in the postgresql.conf file for getting maximum performance from the EMC box. Is there a general guideline for setting up postgres database and the tunable parameters on a SAN, especially for EMC? Appreciate any help, Thanks, Anjan
Re: [PERFORM] Fw: Infinite CPU loop due to field ::type casting, Take II :-)
Steven Rosenstein <[EMAIL PROTECTED]> writes:

> Environment: Red Hat Enterprise Linux 3 Workstation, PostgreSQL V7.3.6

> vsa=# explain
> SELECT dev.name, dev.vss_site_id, tbl.log_type, tbl.severity, tbl.count
> FROM vsa.tbl_device AS dev
> LEFT OUTER JOIN
>   (SELECT stbl.device_id, stbl.log_type, stbl.severity, count(*)
>    FROM vsa.dtbl_logged_event_20040922 AS stbl
>    WHERE stbl.log_type IN (2, 3, 4, 5)
>    GROUP BY stbl.device_id, stbl.log_type, stbl.severity) AS tbl
> ON (dev.id=tbl.device_id::int)
> ORDER BY dev.name;
>                                   QUERY PLAN
> --------------------------------------------------------------------------------
>  Nested Loop  (cost=0.00..45848850.65 rows=27100 width=79)
>    Join Filter: ("outer".id = ("inner".device_id)::integer)
>    ->  Index Scan using idx_d_name on tbl_device dev  (cost=0.00..1490.19 rows=1275 width=26)
>    ->  Subquery Scan tbl  (cost=0.00..35552.21 rows=27100 width=26)
>          ->  Aggregate  (cost=0.00..35552.21 rows=27100 width=26)
>                ->  Group  (cost=0.00..34874.70 rows=271005 width=26)
>                      ->  Index Scan using idx_le_id_type_severity_evtcode_20040922 on dtbl_logged_event_20040922 stbl  (cost=0.00..32842.16 rows=271005 width=26)
>                            Filter: ((log_type = 2) OR (log_type = 3) OR (log_type = 4) OR (log_type = 5))
> (8 rows)
> Time: 1.62 ms

> Notice that the query plan changes completely when I cast device_id as int.
> What is worse (and why I'm writing) is that when I run the second query, it
> goes into an infinite CPU loop.

"Bad plan" and "infinite loop" are two very different things.

In 7.3 you'd be better off without the cast, as you just found out. The 7.3 backend can only handle merge or hash joins whose join clause has the form "variable = variable" --- anything more complicated falls back to a nested-loop join. It does handle mergejoins between unlike data types, though, so you were doing okay with the undecorated query. 7.4 is smarter; dunno if you want to upgrade at this point.

			regards, tom lane
Re: [PERFORM] Caching of Queries
On Thu, 2004-09-23 at 05:59, Tom Lane wrote:
> I think this would allow the problems of cached plans to bite
> applications that were previously not subject to them :-(.
> An app that wants plan re-use can use PREPARE to identify the
> queries that are going to be re-executed.

I agree; if you want to do some work in this area, making improvements to PREPARE would IMHO be the best bet. For example, some people have talked about having PREPARE store queries in shared memory. Another idea would be to improve the quality of the plan we generate at PREPARE time: for instance, you could generate 'n' plans for various combinations of input parameters, and then choose the best query plan at EXECUTE time. It's a difficult problem to solve, however (consider multiple parameters to PREPARE, for example).

-Neil
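The 'n plans' idea can be illustrated with a toy sketch. Everything below is invented for illustration — the plan names, cost formulas, and selection rule are not from the PostgreSQL planner — but it captures the shape of the proposal: candidate plans are built once at PREPARE time, and the cheapest one for the actual parameter is picked at EXECUTE time.

```python
class MultiPlanPrepared:
    """Toy model: several candidate plans built at PREPARE time;
    one is chosen at EXECUTE time from the parameter's estimated
    selectivity. Cost formulas here are made up for illustration."""

    def __init__(self):
        # Imagine each entry was produced by planning under a
        # different assumed selectivity of the parameter.
        self.plans = {
            "index_scan": lambda sel: 10 + 1000 * sel,  # cheap when selective
            "seq_scan": lambda sel: 400,                # flat cost
        }

    def execute(self, selectivity):
        # EXECUTE time: pick the cheapest plan for this parameter.
        name, _cost_fn = min(self.plans.items(),
                             key=lambda kv: kv[1](selectivity))
        return name
```

A highly selective parameter picks the index-scan plan, a non-selective one the sequential scan; the hard part Neil alludes to is doing this estimation well with multiple interacting parameters.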
Re: [PERFORM] NAS, SAN or any alternate solution ?
Andrew Hammond <[EMAIL PROTECTED]> writes:

> My goal is to tune the disk / filesystem on our prototype system. It's
> an EMC disk array, so sectors on disk are 512 bytes of usable space.
> We've decided to go with RAID 10 since the goal is to maximize
> performance. Currently the raid element size is set at 16 sectors which
> is 8192 bytes of payload.

Do people find it works well to have a stripe size that small? It seems like it would be better to have it be at least a few filesystem/postgres blocks, so that subsequent reads stand a chance of being sequential and not causing another spindle to have to seek. Does this depend on whether it's a DSS load vs. an OLTP load? If it's a single-query-at-a-time DSS system, perhaps small block sizes work best to get maximum throughput.

> I've got a sysadmin working on getting XFS going with 8192 byte blocks.

Having your filesystem block size match postgres's block size is probably a good idea, so 8k blocks are good.

> My next task will be to calculate the amount of space used by XFS for
> headers etc. to find out how much of those 8192 bytes can be used for the
> postgres payload.

No filesystem that I know of uses up space in every block. The overhead is all stored elsewhere, in blocks that exclusively contain such overhead data. So just leaving postgres at the 8k default would work well.

> Then configure postgres to use datablocks that size. So I'm looking for
> details on how to manipulate the size of the datablock.

Look in pg_config_manual.h in src/include. Postgres has to be recompiled to change it and the database has to be reinitialized, but it could be set to 16k or 32k, in which case you would probably want to adjust your filesystem to match. But unless you do experiments you won't know whether changing it would be of any benefit.

> I'm also not entirely sure how to make the datablocks line up with the
> filesystem blocks. Any suggestions on this would be greatly appreciated.

They just will. The files start on a block boundary, so every 8k is a new block. Postgres stores 8k at a time, always.

-- 
greg
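The alignment arithmetic running through this thread is simple enough to write down. The helper below is hypothetical (just illustrating the numbers discussed, not any real tool): with 512-byte sectors, a 16-sector RAID element holds exactly one default 8 KB PostgreSQL block, so blocks and elements line up with no remainder.

```python
SECTOR_BYTES = 512       # usable bytes per disk sector, per the thread
PG_BLOCK_BYTES = 8192    # PostgreSQL's default block size (BLCKSZ)

def stripe_payload(sectors_per_element):
    """Bytes of payload in one RAID element of the given size in sectors."""
    return sectors_per_element * SECTOR_BYTES

def blocks_aligned(sectors_per_element, block_bytes=PG_BLOCK_BYTES):
    """True when database blocks tile the RAID element exactly."""
    payload = stripe_payload(sectors_per_element)
    return payload % block_bytes == 0 or block_bytes % payload == 0
```

For the 16-sector element discussed above, `stripe_payload(16)` is 8192 bytes, exactly one default block; a 16 KB or 32 KB BLCKSZ (as Greg suggests trying) would also tile evenly across such elements.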
Re: [PERFORM] O_DIRECT setting
On Mon, 2004-09-20 at 17:57, Guy Thornley wrote:
> According to the manpage, O_DIRECT implies O_SYNC:
>
>     File I/O is done directly to/from user space buffers. The I/O is
>     synchronous, i.e., at the completion of the read(2) or write(2)
>     system call, data is guaranteed to have been transferred.

This seems like it would be a rather large net loss. PostgreSQL already structures writes so that the writes we need to hit disk immediately (WAL records) are fsync()'ed -- the kernel is given more freedom to schedule how other writes are flushed from the cache. Also, my recollection is that O_DIRECT disables readahead -- if that's correct, that's not what we want either.

BTW, using O_DIRECT has been discussed a few times in the past. Have you checked the list archives? (for both -performance and -hackers)

> Would people be interested in a performance benchmark?

Sure -- I'd definitely be curious, although as I said I'm skeptical it's a win.

> I need some benchmark tips :)

Some people have noted that it can be difficult to use contrib/pgbench to get reproducible results -- you might want to look at Jan's TPC-W implementation or the OSDL database benchmarks:

http://pgfoundry.org/projects/tpc-w-php/
http://www.osdl.org/lab_activities/kernel_testing/osdl_database_test_suite/

> Incidentally, postgres heap files suffer really, really bad fragmentation,
> which affects sequential scan operations (VACUUM, ANALYZE, REINDEX ...)
> quite drastically. We have in-house patches that somewhat alleviate this,
> but they are not release quality.

Can you elaborate on these "in-house patches"?

-Neil
Re: [PERFORM] Caching of Queries
On 22 Sep 2004 at 15:59, Tom Lane wrote:

> Scott Kirkwood <[EMAIL PROTECTED]> writes:
>> What do you think?
>
> I think this would allow the problems of cached plans to bite
> applications that were previously not subject to them :-(.
> An app that wants plan re-use can use PREPARE to identify the
> queries that are going to be re-executed.
>
> 			regards, tom lane

And then there are the people who would like to upgrade and get a performance gain without having to change their programs. A simple conf flag could turn query/plan caching off for all those that rely on each statement being re-planned. This is where SQL Server etc. tend to get big wins. I know from direct comparisons that SQL Server often takes quite a bit longer to parse/plan a select statement than Postgres, but it wins out overall from its query/plan caching.

Regards,
Gary.