Re: [PERFORM] 7.4 vs 7.3 ( hash join issue )

2004-09-22 Thread Gaetano Mendola
Dennis Bjorklund wrote:
On Wed, 22 Sep 2004, Gaetano Mendola wrote:

Now my question is: why does 7.4 choose the hash join?  :-(

It looks to me that the merge join is faster because there wasn't really
anything to merge; it resulted in 0 rows. Maybe the hash join that is
chosen in 7.4 would have been faster had there been a couple of result
rows (just a guess).

It would be interesting to compare the plans in 7.4 with and without 
hash_join active and see what costs it estimates for a merge join compared 
to a hash join.
Here they are:
hash_join = on

                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Limit  (cost=10.21..10.21 rows=1 width=24) (actual time=0.885..0.885 rows=0 loops=1)
   ->  Sort  (cost=10.21..10.21 rows=1 width=24) (actual time=0.880..0.880 rows=0 loops=1)
         Sort Key: e.id_evento
         ->  Hash Join  (cost=9.02..10.21 rows=1 width=24) (actual time=0.687..0.687 rows=0 loops=1)
               Hash Cond: ("outer".id_tipo_evento = "inner".id_tipo_evento)
               ->  Seq Scan on lookup_tipo_evento le  (cost=0.00..1.16 rows=16 width=32) (actual time=0.017..0.038 rows=16 loops=1)
               ->  Hash  (cost=9.02..9.02 rows=1 width=16) (actual time=0.212..0.212 rows=0 loops=1)
                     ->  Index Scan using t_evento_id_pratica_key on t_evento e  (cost=0.00..9.02 rows=1 width=16) (actual time=0.208..0.208 rows=0 loops=1)
                           Index Cond: (id_pratica = 5)
                           Filter: (((id_tipo_evento)::text = '5'::text) OR ((id_tipo_evento)::text = '6'::text) OR ((id_tipo_evento)::text = '7'::text) OR ((id_tipo_evento)::text = '8'::text))
 Total runtime: 1.244 ms
(11 rows)
hash_join = off

                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Limit  (cost=10.28..10.28 rows=1 width=24) (actual time=0.429..0.429 rows=0 loops=1)
   ->  Sort  (cost=10.28..10.28 rows=1 width=24) (actual time=0.425..0.425 rows=0 loops=1)
         Sort Key: e.id_evento
         ->  Merge Join  (cost=10.25..10.27 rows=1 width=24) (actual time=0.218..0.218 rows=0 loops=1)
               Merge Cond: ("outer".id_tipo_evento = "inner".id_tipo_evento)
               ->  Sort  (cost=9.02..9.02 rows=1 width=16) (actual time=0.214..0.214 rows=0 loops=1)
                     Sort Key: e.id_tipo_evento
                     ->  Index Scan using t_evento_id_pratica_key on t_evento e  (cost=0.00..9.02 rows=1 width=16) (actual time=0.110..0.110 rows=0 loops=1)
                           Index Cond: (id_pratica = 5)
                           Filter: (((id_tipo_evento)::text = '5'::text) OR ((id_tipo_evento)::text = '6'::text) OR ((id_tipo_evento)::text = '7'::text) OR ((id_tipo_evento)::text = '8'::text))
               ->  Sort  (cost=1.22..1.23 rows=16 width=32) (never executed)
                     Sort Key: le.id_tipo_evento
                     ->  Seq Scan on lookup_tipo_evento le  (cost=0.00..1.16 rows=16 width=32) (never executed)
 Total runtime: 0.721 ms
(14 rows)


Regards
Gaetano Mendola




---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] 7.4 vs 7.3 ( hash join issue )

2004-09-22 Thread Dennis Bjorklund
On Wed, 22 Sep 2004, Gaetano Mendola wrote:

>   Limit  (cost=10.21..10.21 rows=1 width=24) (actual time=0.885..0.885 rows=0 loops=1)
>   Limit  (cost=10.28..10.28 rows=1 width=24) (actual time=0.429..0.429 rows=0 loops=1)

These estimated costs are almost the same, but the runtime differs a bit.
This means that maybe you need to alter settings like random_page_cost,
effective_cache_size and some others to make the cost reflect the runtime
better.

Since the costs are so close to each other, very small changes can make it
choose the other plan. It's also very hard to make an estimate that is
correct in all situations. That's why it's called an estimate, after all.

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] 7.4 vs 7.3 ( hash join issue )

2004-09-22 Thread Gaetano Mendola
Dennis Bjorklund wrote:
> On Wed, 22 Sep 2004, Gaetano Mendola wrote:
>
>
>>  Limit  (cost=10.21..10.21 rows=1 width=24) (actual time=0.885..0.885 rows=0 loops=1)
>>  Limit  (cost=10.28..10.28 rows=1 width=24) (actual time=0.429..0.429 rows=0 loops=1)
>
>
> These estimated costs are almost the same, but the runtime differs a bit.
> This means that maybe you need to alter settings like random_page_cost,
> effective_cache_size and some others to make the cost reflect the runtime
> better.
>
> Since the costs are so close to each other, very small changes can make it
> choose the other plan. It's also very hard to make an estimate that is
> correct in all situations. That's why it's called an estimate, after all.
It's not feasible.
Those values were obtained with random_page_cost = 2, effective_cache_size = 2,
cpu_tuple_cost = 0.01.
Increasing or decreasing random_page_cost increases or decreases both
costs:

random_page_cost = 1.5
hashjoin on  =>  8.47
hashjoin off =>  8.53

random_page_cost = 3
hashjoin on  =>  13.70
hashjoin off =>  13.76

so the hash join method is chosen in both cases.
On the other side, effective_cache_size doesn't affect these costs at all.
Decreasing cpu_tuple_cost has the same effect:

cpu_tuple_cost = 0.005
hashjoin on  =>  10.11
hashjoin off =>  10.17

cpu_tuple_cost = 0.001
hashjoin on  =>  10.03
hashjoin off =>  10.03

cpu_tuple_cost = 0.0005
hashjoin on  =>  10.01
hashjoin off =>  10.01

And when the two costs are the same, the hash join path is chosen.
I think a cpu_tuple_cost of less than 0.001 is not a good idea.
I think the only way is to set enable_hashjoin = off, as sketched below.
Any other suggestions?
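
For completeness, the workaround scoped to a single session would look
something like this (the SELECT here is only a stand-in for the query at
the top of this thread):

SET enable_hashjoin = off;    -- planner stops considering hash joins
EXPLAIN ANALYZE SELECT 1;     -- stand-in: substitute the actual query
RESET enable_hashjoin;        -- restore the default afterwards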
Regards
Gaetano Mendola


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] 7.4 vs 7.3 ( hash join issue )

2004-09-22 Thread Greg Stark

Gaetano Mendola <[EMAIL PROTECTED]> writes:

> hash_join = on
>   ->  Seq Scan on lookup_tipo_evento le  (cost=0.00..1.16 rows=16 width=32) (actual time=0.017..0.038 rows=16 loops=1)
>
> hash_join = off
>   ->  Seq Scan on lookup_tipo_evento le  (cost=0.00..1.16 rows=16 width=32) (never executed)


Actually this looks like it's arguably a bug to me. Why does the hash join
execute the sequential scan at all? Shouldn't it, like the merge join,
recognize that the hashed relation is empty and skip the sequential scan
entirely?


-- 
greg


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] 7.4 vs 7.3 ( hash join issue )

2004-09-22 Thread Dennis Bjorklund
On 22 Sep 2004, Greg Stark wrote:

> Actually this looks like it's arguably a bug to me. Why does the hash
> join execute the sequential scan at all? Shouldn't it, like the merge
> join, recognize that the hashed relation is empty and skip the
> sequential scan entirely?

I'm not sure you can classify that as a bug. It's just that in one of
the plans it started with the empty scan and because of that didn't need
the other, but with the hash join it started with the table that had 16
rows and then got to the empty one.

While I haven't checked, I assume that if it had started with the empty
table it would have skipped the other.

I don't know what criterion is used to select which side to start with
when doing a hash join. It looks like it started with the one that had the
higher row estimate here; doing it the other way around might be a good
idea, because in some cases you are lucky enough to find an empty scan and
can omit the other.

The above are just observations of the behaviour; I've not looked at the
source at all.

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] 7.4 vs 7.3 ( hash join issue )

2004-09-22 Thread Greg Stark

Dennis Bjorklund <[EMAIL PROTECTED]> writes:

> On 22 Sep 2004, Greg Stark wrote:
> 
> > Actually this looks like it's arguably a bug to me. Why does the hash
> > join execute the sequential scan at all? Shouldn't it, like the merge
> > join, recognize that the hashed relation is empty and skip the
> > sequential scan entirely?
> 
> I'm not sure you can classify that as a bug. It's just that in one of
> the plans it started with the empty scan and because of that didn't need
> the other, but with the hash join it started with the table that had 16
> rows and then got to the empty one.

No, postgres didn't do things in reverse order. It hashed the empty table and
then went ahead and checked every record of the non-empty table against the
empty hash table.

Reading the code there's no check for this, and it seems like it would be a
useful low-cost little optimization.

I think postgres normally hashes the table it thinks is smaller, so if you
join against an empty relation it should end up on the hash side of the
hash join, allowing postgres to avoid the scan of the outer table.

-- 
greg


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] 7.4 vs 7.3 ( hash join issue )

2004-09-22 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes:
> No, postgres didn't do things in reverse order. It hashed the empty table and
> then went ahead and checked every record of the non-empty table against the
> empty hash table.

> Reading the code there's no check for this, and it seems like it would be a
> useful low-cost little optimization.

Yeah, I was just looking at doing that.

It would also be interesting to prefetch one row from the outer table and fall
out immediately (without building the hash table) if the outer table is
empty.  This seems to require some contortion of the code though :-(

> I think postgres normally hashes the table it thinks is smaller,

Right, it will prefer to put the physically smaller table (estimated
width*rows) on the inside.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] 7.4 vs 7.3 ( hash join issue )

2004-09-22 Thread Greg Stark

Tom Lane <[EMAIL PROTECTED]> writes:

> Yeah, I was just looking at doing that.

Well I imagine it takes you as long to read my patch as it would for you to
write it. But anyway, it's still useful to me as an exercise.

> It would also be interesting to prefetch one row from the outer table and fall
> out immediately (without building the hash table) if the outer table is
> empty.  This seems to require some contortion of the code though :-(

Why is it any more complicated than just moving the hash build down lower?
There's one small special case needed in ExecHashJoinOuterGetTuple but it's
pretty non-intrusive.

It seems to work for me but I can't test multiple batches easily. I think I've
convinced myself that they would work fine but...

test=# explain analyze select * from a natural join b;
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Hash Join  (cost=22.50..345.00 rows=5000 width=4) (actual time=0.005..0.005 rows=0 loops=1)
   Hash Cond: ("outer".a = "inner".a)
   ->  Seq Scan on a  (cost=0.00..20.00 rows=1000 width=4) (actual time=0.002..0.002 rows=0 loops=1)
   ->  Hash  (cost=20.00..20.00 rows=1000 width=4) (never executed)
         ->  Seq Scan on b  (cost=0.00..20.00 rows=1000 width=4) (never executed)
 Total runtime: 0.070 ms
(6 rows)


Index: backend/executor/nodeHash.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/executor/nodeHash.c,v
retrieving revision 1.86
diff -c -r1.86 nodeHash.c
*** backend/executor/nodeHash.c 29 Aug 2004 04:12:31 -  1.86
--- backend/executor/nodeHash.c 22 Sep 2004 18:37:40 -
***************
*** 232,237 ****
--- 232,238 ----
hashtable->buckets = NULL;
hashtable->nbatch = nbatch;
hashtable->curbatch = 0;
+   hashtable->ntup = 0;
hashtable->innerBatchFile = NULL;
hashtable->outerBatchFile = NULL;
hashtable->innerBatchSize = NULL;
***************
*** 493,498 ****
--- 494,501 ----
   heapTuple->t_len);
hashTuple->next = hashtable->buckets[bucketno];
hashtable->buckets[bucketno] = hashTuple;
+ 
+   hashtable->ntup ++;
}
else
{
Index: backend/executor/nodeHashjoin.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/executor/nodeHashjoin.c,v
retrieving revision 1.64
diff -c -r1.64 nodeHashjoin.c
*** backend/executor/nodeHashjoin.c 29 Aug 2004 05:06:42 -  1.64
--- backend/executor/nodeHashjoin.c 22 Sep 2004 18:37:41 -
***************
*** 109,143 ****
ResetExprContext(econtext);
  
/*
-* if this is the first call, build the hash table for inner relation
-*/
-   if (!node->hj_hashdone)
-   {
-   /*
-* create the hash table
-*/
-   Assert(hashtable == NULL);
-   hashtable = ExecHashTableCreate((Hash *) hashNode->ps.plan,
-                                   node->hj_HashOperators);
-   node->hj_HashTable = hashtable;
- 
-   /*
-* execute the Hash node, to build the hash table
-*/
-   hashNode->hashtable = hashtable;
-   (void) ExecProcNode((PlanState *) hashNode);
- 
-   /*
-* Open temp files for outer batches, if needed. Note that file
-* buffers are palloc'd in regular executor context.
-*/
-   for (i = 0; i < hashtable->nbatch; i++)
-   hashtable->outerBatchFile[i] = BufFileCreateTemp(false);
- 
-   node->hj_hashdone = true;
-   }
- 
-   /*
 * Now get an outer tuple and probe into the hash table for matches
 */
outerTupleSlot = node->js.ps.ps_OuterTupleSlot;
--- 109,114 ----
***************
*** 163,171 ****
--- 134,180 ----
node->hj_MatchedOuter = false;
  
/*
+* if this is the first call, build the hash table for inner 
relation
+*/
+   if (!node->hj_hashdone)
+   {
+   /*
+* create the hash table
+*/
+   Assert(hashtable == NULL);
+   hashtable = ExecHashTableCreate((Hash *) hashNode->ps.plan,
+                                   node->hj_HashOperators);
+   node->hj_HashTable = hashtable;
+ 
+   /*
+* execute the H

Re: [PERFORM] 7.4 vs 7.3 ( hash join issue )

2004-09-22 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes:
>> It would also be interesting to prefetch one row from the outer table and fall
>> out immediately (without building the hash table) if the outer table is
>> empty.  This seems to require some contortion of the code though :-(

> Why is it any more complicated than just moving the hash build down lower?

Having to inject the consideration into ExecHashJoinOuterGetTuple seems
messy to me.

On reflection I'm not sure it would be a win anyway, for a couple of reasons.
(1) Assuming that the planner has gotten things right and put the larger
relation on the outside, the case of an empty outer relation and a
nonempty inner one should rarely arise.
(2) Doing this would lose some of the benefit from the optimization to
detect an empty inner relation.  If the outer subplan is a slow-start
one (such as another hashjoin), it would lose a lot of the benefit :-(

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


[PERFORM] Caching of Queries

2004-09-22 Thread Scott Kirkwood
I couldn't find anything in the docs or in the mailing list on this,
but it is something that Oracle appears to do, as does MySQL.
The idea, I believe, is to do a quick (hash) string lookup of the
query and, if it's exactly the same as another query that was run
recently, re-use the old parse tree.
That should save the time of parsing the SQL and looking up the
objects in the system tables.
It should probably go through the planner again because values passed
as parameters may have changed, although for extra points it could
look at the previous query plan as a hint.
On the surface it looks like an easy enhancement, but what do I know?
I suppose it would benefit mostly those programs that use a lot of
PQexecParams() with simple queries, where a greater percentage of the
time is spent parsing the SQL rather than building the execution plan.
What do you think?

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Caching of Queries

2004-09-22 Thread Tom Lane
Scott Kirkwood <[EMAIL PROTECTED]> writes:
> What do you think?

I think this would allow the problems of cached plans to bite
applications that were previously not subject to them :-(.
An app that wants plan re-use can use PREPARE to identify the
queries that are going to be re-executed.
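
A minimal sketch of that approach, borrowing a table from another thread
in this digest (the statement name is made up):

PREPARE recent_events (int) AS
SELECT * FROM t_evento WHERE id_pratica = $1;  -- parsed and planned once

EXECUTE recent_events(5);  -- re-executed without re-parsing or re-planning
EXECUTE recent_events(6);
DEALLOCATE recent_events;  -- drop the saved plan when done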

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] Infinite CPU loop due to field ::type casting

2004-09-22 Thread Steven Rosenstein
Gentlefolk,

I'm not sure if this is the proper forum for this question, and it might
have been answered in a previous thread, but I'm new to PostgreSQL and the
research I did in the archives did not turn up anything addressing this
issue.  Please direct me to the proper forum if this is not the correct
venue.

Environment: Red Hat Enterprise Linux 3 Workstation, PostgreSQL V7.3.6
(stock with the RHEL distribution)

The two tables I used in the example are tbl_device and tbl_sad_event:

vsa=# \d vsa.tbl_device;
   Table "vsa.tbl_device"
     Column     |           Type           |                       Modifiers
----------------+--------------------------+------------------------------------------------
 id | integer  | not null default
nextval('vsa.tbl_device_id_seq'::text)
 name   | character varying(100)   | not null
 account_id | bigint   | not null
 vss_site_id| bigint   | not null
 org_site_id| bigint   | not null default 0
 device_type_id | integer  | not null default 1
 os_family_id   | integer  | not null default 0
 status_id  | integer  | not null default 0
 timezone   | character varying(80)|
 clientkey  | character varying(2048)  | not null
 record_created | timestamp with time zone | default now()
Indexes: pk_tbl_device primary key btree (id),
 idx_d_uniq_name_site_account_key unique btree (name, vss_site_id,
account_id, clientkey),
 tbl_device_clientkey_key unique btree (clientkey),
 idx_d_account_id btree (account_id),
 idx_d_account_site_name btree (account_id, vss_site_id, name),
 idx_d_device_type_id btree (device_type_id),
 idx_d_name btree (name),
 idx_d_org_site_id btree (org_site_id),
 idx_d_os_family_id btree (os_family_id),
 idx_d_status_id btree (status_id),
 idx_d_vss_site_id btree (vss_site_id)
Foreign Key constraints: fk_d_va FOREIGN KEY (account_id) REFERENCES
vsa.tbl_vsa_account(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
 fk_d_vs FOREIGN KEY (vss_site_id) REFERENCES
vsa.tbl_vss_site(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
 fk_d_dof FOREIGN KEY (os_family_id) REFERENCES
vsa.enum_device_os_family(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
 fk_d_dsc FOREIGN KEY (status_id) REFERENCES
vsa.enum_device_status_code(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
 fk_d_dt FOREIGN KEY (device_type_id) REFERENCES
vsa.enum_device_type(id) ON UPDATE NO ACTION ON DELETE NO ACTION
Triggers: trg_clean_device_name

vsa=# \d vsa.tbl_sad_event
 Table "vsa.tbl_sad_event"
     Column     |            Type             |                       Modifiers
----------------+-----------------------------+------------------------------------------------
 id | integer | not null default
nextval('vsa.tbl_sad_event_id_seq'::text)
 device_id  | bigint  | not null
 log_type   | integer |
 severity   | character varying(20)   |
 time_logged| timestamp without time zone |
 user_name  | character varying(50)   |
 remote_user| character varying(50)   |
 remote_host| character varying(100)  |
 source_tag | character varying(30)   |
 event_code | character varying(50)   |
 type   | character varying(6)|
 record_created | timestamp with time zone| default now()
Indexes: pk_tbl_sad_event primary key btree (id),
 idx_se_dev_time_type btree (device_id, time_logged, "type"),
 idx_se_device_id btree (device_id),
 idx_se_time_logged btree (time_logged),
 idx_se_type btree ("type"),
 sjr_se_id_time_type btree (device_id, time_logged, "type")
Foreign Key constraints: fk_sade_d FOREIGN KEY (device_id) REFERENCES
vsa.tbl_device(id) ON UPDATE NO ACTION ON DELETE CASCADE


Here is my original query, and the query plan generated by the planner:

vsa=# explain
SELECT dev.name, dev.vss_site_id, tbl.log_type, tbl.severity, tbl.count
FROM vsa.tbl_device AS dev
LEFT OUTER JOIN
  (SELECT stbl.device_id, stbl.log_type, stbl.severity, count(*)
   FROM vsa.dtbl_logged_event_20040922 AS stbl
   WHERE stbl.log_type IN (2, 3, 4, 5)
   GROUP BY stbl.device_id, stbl.log_type, stbl.severity) AS tbl
  ON (dev.id=tbl.device_id)
ORDER BY dev.name;
                               QUERY PLAN
--------------------------------------------------------------------------
 Sort  (cost=40893.18..40960.93 rows=27100 width=79)
   Sort Key: dev.name
   ->  Merge Join  (cost=38417.13..38897.77 rows=27100 width=79)
         Merge Con

Re: [PERFORM] NAS, SAN or any alternate solution ?

2004-09-22 Thread Andrew Hammond
Rod Taylor wrote:
| I've used both a NetApp and Hitachi based SANs with PostgreSQL. Both
| work as well as expected, but do require some tweaking as they normally
| are not optimized for the datablock size that PostgreSQL likes to deal
| with (8k by default) -- this can make as much as a 50% difference in
| performance levels.
I'm looking for documentation about the datablock size you mentioned above.
My goal is to tune the disk / filesystem on our prototype system. It's
an EMC disk array, so sectors on disk are 512 bytes of usable space.
We've decided to go with RAID 10 since the goal is to maximize
performance. Currently the raid element size is set at 16 sectors which
is 8192 bytes of payload. I've got a sysadmin working on getting XFS
going with 8192 byte blocks. My next task will be to calculate the
amount of space used by XFS for headers etc. to find out how much of
those 8192 bytes can be used for the postgres payload. Then configure
postgres to use datablocks that size. So I'm looking for details on how
to manipulate the size of the datablock.
I'm also not entirely sure how to make the datablocks line up with the
filesystem blocks. Any suggestions on this would be greatly appreciated.
--
Andrew Hammond   416-673-4138   [EMAIL PROTECTED]
Database Administrator, Afilias Canada Corp.
CB83 2838 4B67 D40F D086 3568 81FC E7E5 27AF 4A9A
---(end of broadcast)---
TIP 8: explain analyze is your friend


[PERFORM] Fw: Infinite CPU loop due to field ::type casting, Take II :-)

2004-09-22 Thread Steven Rosenstein
I just realized that in my haste to send this email out I provided the
wrong table in my example.  Below is the same email, but with
vsa.dtbl_logged_event_20040922 substituted for vsa.tbl_sad_event.

Sorry for the inconvenience.

--- Steve


Gentlefolk,

I'm not sure if this is the proper forum for this question, and it might
have been answered in a previous thread, but I'm new to PostgreSQL and the
research I did in the archives did not turn up anything addressing this
issue.  Please direct me to the proper forum if this is not the correct
venue.

Environment: Red Hat Enterprise Linux 3 Workstation, PostgreSQL V7.3.6
(stock with the RHEL distribution)

The two tables I used in the example are tbl_device and
dtbl_logged_event_20040922:

vsa=# \d vsa.tbl_device;
   Table "vsa.tbl_device"
     Column     |           Type           |                       Modifiers
----------------+--------------------------+------------------------------------------------
 id | integer  | not null default
nextval('vsa.tbl_device_id_seq'::text)
 name   | character varying(100)   | not null
 account_id | bigint   | not null
 vss_site_id| bigint   | not null
 org_site_id| bigint   | not null default 0
 device_type_id | integer  | not null default 1
 os_family_id   | integer  | not null default 0
 status_id  | integer  | not null default 0
 timezone   | character varying(80)|
 clientkey  | character varying(2048)  | not null
 record_created | timestamp with time zone | default now()
Indexes: pk_tbl_device primary key btree (id),
 idx_d_uniq_name_site_account_key unique btree (name, vss_site_id,
account_id, clientkey),
 tbl_device_clientkey_key unique btree (clientkey),
 idx_d_account_id btree (account_id),
 idx_d_account_site_name btree (account_id, vss_site_id, name),
 idx_d_device_type_id btree (device_type_id),
 idx_d_name btree (name),
 idx_d_org_site_id btree (org_site_id),
 idx_d_os_family_id btree (os_family_id),
 idx_d_status_id btree (status_id),
 idx_d_vss_site_id btree (vss_site_id)
Foreign Key constraints: fk_d_va FOREIGN KEY (account_id) REFERENCES
vsa.tbl_vsa_account(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
 fk_d_vs FOREIGN KEY (vss_site_id) REFERENCES
vsa.tbl_vss_site(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
 fk_d_dof FOREIGN KEY (os_family_id) REFERENCES
vsa.enum_device_os_family(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
 fk_d_dsc FOREIGN KEY (status_id) REFERENCES
vsa.enum_device_status_code(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
 fk_d_dt FOREIGN KEY (device_type_id) REFERENCES
vsa.enum_device_type(id) ON UPDATE NO ACTION ON DELETE NO ACTION
Triggers: trg_clean_device_name

vsa=# \d vsa.dtbl_logged_event_20040922
 Table
"vsa.dtbl_logged_event_20040922"
     Column      |            Type             |                          Modifiers
-----------------+-----------------------------+--------------------------------------------------------
 id  | integer | not null default
nextval('vsa.dtbl_logged_event_20040922_id_seq'::text)
 device_id   | bigint  | not null
 report_datetime | timestamp without time zone |
 time_logged | timestamp without time zone |
 log_type| integer | not null
 type| character varying(50)   |
 severity| character varying(30)   |
 source_tag  | character varying(30)   |
 remote_host | character varying(100)  |
 user_name   | character varying(50)   |
 event_code  | character varying(10)   |
 description | text|
 record_created  | timestamp with time zone| default now()
 event_code_new  | character varying(30)   |
 remote_user | character varying(50)   |
Indexes: pk_dtbl_logged_event_20040922 primary key btree (id),
 idx_le_device_id_20040922 btree (device_id),
 idx_le_id_source_event_20040922 btree (device_id, source_tag,
event_code),
 idx_le_id_src_20040922 btree (device_id, source_tag),
 idx_le_id_type_severity_evtcode_20040922 btree (device_id,
log_type, severity, event_code),
 idx_le_log_type_20040922 btree (log_type),
 idx_le_source_tag_20040922 btree (source_tag),
 idx_le_time_logged_20040922 btree (time_logged),
 idx_le_time_type_20040922 btree (time_logged, log_type)
Foreign Key constraints: fk_le_lelt_20040922 FOREIGN KEY (log_type)
REFERENCES vsa.enum_le_log_type(id) ON UPDATE NO ACTION ON DELETE NO
ACTION,
 fk_le_d_20040922 FOREIGN KEY (device_id)
REFERENCES vsa.tbl_device(id) ON UPDATE NO ACTION ON DELETE CA

Re: [PERFORM] NAS, SAN or any alternate solution ?

2004-09-22 Thread Rod Taylor
> Rod Taylor wrote:
> | I've used both a NetApp and Hitachi based SANs with PostgreSQL. Both
> | work as well as expected, but do require some tweaking as they normally
> | are not optimized for the datablock size that PostgreSQL likes to deal
> | with (8k by default) -- this can make as much as a 50% difference in
> | performance levels.

> I'm also not entirely sure how to make the datablocks line up with the
> filesystem blocks. Any suggestions on this would be greatly appreciated.

We just played with Veritas settings while running pg_bench on a 200GB
database. I no longer have access to the NetApp, but the settings for
the Hitachi are below.

In tunefstab we have:

read_pref_io=8192,read_nstream=4,write_pref_io=8192,write_nstream=2

In fstab it's:
defaults,mincache=tmpcache,noatime


If you have better settings, please shoot them over so we can try them
out. Perhaps even get someone over there to write a new SAN section in
the Tuning Chapter.


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


[PERFORM] SAN performance

2004-09-22 Thread Anjan Dave
Hello,

I'll be moving a DB from internal RAID-10 SCSI storage to an EMC CX300 FC
RAID-10 LUN, bound to the host. I've set up a test host machine and a test
LUN. The /var/lib/pgsql/data folder is sym-linked to a partition on the LUN.

Other than shared_buffers, effective_cache_size, and sort_mem, I am not
sure if I need to change any other parameters in the postgresql.conf file
for getting maximum performance from the EMC box.

Is there a general guideline for setting up the postgres database and the
tunable parameters on a SAN, especially for EMC?

Appreciate any help,

Thanks,
Anjan


Re: [PERFORM] Fw: Infinite CPU loop due to field ::type casting, Take II :-)

2004-09-22 Thread Tom Lane
Steven Rosenstein <[EMAIL PROTECTED]> writes:
> Environment: Red Hat Enterprise Linux 3 Workstation, PostgreSQL V7.3.6

> vsa=# explain
> SELECT dev.name, dev.vss_site_id, tbl.log_type, tbl.severity, tbl.count
> FROM vsa.tbl_device AS dev
> LEFT OUTER JOIN
>   (SELECT stbl.device_id, stbl.log_type, stbl.severity, count(*)
>FROM vsa.dtbl_logged_event_20040922 AS stbl
>WHERE stbl.log_type IN (2, 3, 4, 5) GROUP BY stbl.device_id,
> stbl.log_type, stbl.severity) AS tbl
>   ON (dev.id=tbl.device_id::int)
> ORDER BY dev.name;
>                                         QUERY PLAN
> ------------------------------------------------------------------------------------------
>  Nested Loop  (cost=0.00..45848850.65 rows=27100 width=79)
>    Join Filter: ("outer".id = ("inner".device_id)::integer)
>    ->  Index Scan using idx_d_name on tbl_device dev  (cost=0.00..1490.19 rows=1275 width=26)
>    ->  Subquery Scan tbl  (cost=0.00..35552.21 rows=27100 width=26)
>          ->  Aggregate  (cost=0.00..35552.21 rows=27100 width=26)
>                ->  Group  (cost=0.00..34874.70 rows=271005 width=26)
>                      ->  Index Scan using idx_le_id_type_severity_evtcode_20040922 on dtbl_logged_event_20040922 stbl  (cost=0.00..32842.16 rows=271005 width=26)
>                            Filter: ((log_type = 2) OR (log_type = 3) OR (log_type = 4) OR (log_type = 5))
> (8 rows)

> Time: 1.62 ms


> Notice that the query plan changes completely when I cast device_id as int.
> What is worse (and why I'm writing) is that when I run the second query, it
> goes into an infinite CPU loop.

"Bad plan" and "infinite loop" are two very different things.

In 7.3 you'd be better off without the cast, as you just found out.  The
7.3 backend can only handle merge or hash joins that use a join clause
of the form "variable = variable" --- anything more complicated falls
back to a nested loop join.  It does handle mergejoins between unlike
data types, though, so you were doing okay with the undecorated query.
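
Side by side, the two forms of the join clause from the quoted query (the
cast is the only difference between the mergejoinable query and the one
that falls back to a nested loop in 7.3):

SELECT dev.name, dev.vss_site_id, tbl.log_type, tbl.severity, tbl.count
FROM vsa.tbl_device AS dev
LEFT OUTER JOIN
  (SELECT stbl.device_id, stbl.log_type, stbl.severity, count(*)
   FROM vsa.dtbl_logged_event_20040922 AS stbl
   WHERE stbl.log_type IN (2, 3, 4, 5)
   GROUP BY stbl.device_id, stbl.log_type, stbl.severity) AS tbl
  ON (dev.id = tbl.device_id)        -- variable = variable: mergejoinable in 7.3
--ON (dev.id = tbl.device_id::int)   -- expression on one side: nested loop in 7.3
ORDER BY dev.name;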

7.4 is smarter; dunno if you want to upgrade at this point.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Caching of Queries

2004-09-22 Thread Neil Conway
On Thu, 2004-09-23 at 05:59, Tom Lane wrote:
> I think this would allow the problems of cached plans to bite
> applications that were previously not subject to them :-(.
> An app that wants plan re-use can use PREPARE to identify the
> queries that are going to be re-executed.

I agree; if you want to do some work in this area, making improvements
to PREPARE would IMHO be the best bet. For example, some people have
talked about having PREPARE store queries in shared memory. Another idea
would be to improve the quality of the plan we generate at PREPARE time:
for instance you could generate 'n' plans for various combinations of
input parameters, and then choose the best query plan at EXECUTE time.
It's a difficult problem to solve, however (consider multiple parameters
to PREPARE, for example).
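
To illustrate why EXECUTE-time plan choice could matter, a hypothetical
sketch (the statement name and the skew of the values are made up; the
table is borrowed from another thread in this digest):

PREPARE by_type (int) AS
SELECT * FROM vsa.dtbl_logged_event_20040922 WHERE log_type = $1;

-- Both of these run the one plan chosen before $1 was known, even if a
-- rare value would favour an index scan and a common one a seq scan:
EXECUTE by_type(5);
EXECUTE by_type(2);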

-Neil



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] NAS, SAN or any alternate solution ?

2004-09-22 Thread Greg Stark

Andrew Hammond <[EMAIL PROTECTED]> writes:

> My goal is to tune the disk / filesystem on our prototype system. It's
> an EMC disk array, so sectors on disk are 512 bytes of usable space.
> We've decided to go with RAID 10 since the goal is to maximize
> performance. Currently the raid element size is set at 16 sectors which
> is 8192 bytes of payload. 

Do people find it works well to have a stripe size that small? It seems like
it would be better to have it be at least a few filesystem/postgres blocks so
that subsequent reads stand a chance of being sequential and not causing
another spindle to have to seek. Does this depend on whether it's a DSS load
vs an OLTP load? If it's a single-query-at-a-time DSS system, perhaps small
block sizes work best to get maximum throughput?

> I've got a sysadmin working on getting XFS going with 8192 byte blocks.

Having your filesystem block size match postgres's block size is probably a
good idea. So 8k blocks is good.

> My next task will be to calculate the amount of space used by XFS for
> headers etc. to find out how much of those 8192 bytes can be used for the
> postgres payload.

No filesystem that I know of uses up space in every block. The overhead is
all stored elsewhere, in blocks that exclusively contain such overhead data.
So just leaving postgres at 8k, which is the default, would work well.

> Then configure postgres to use datablocks that size. So I'm looking for
> details on how to manipulate the size of the datablock.

Look in pg_config_manual.h in src/include. Postgres has to be recompiled to
change it and the database has to be reinitialized, but it could be set to
16k or 32k, in which case you would probably want to adjust your filesystem
to match. But unless you do experiments you won't know if changing it would
be of any benefit.

> I'm also not entirely sure how to make the datablocks line up with the
> filesystem blocks. Any suggestions on this would be greatly appreciated.

They just will. The files start on a block boundary, so every 8k is a new
block. Postgres stores 8k at a time always.

-- 
greg


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] O_DIRECT setting

2004-09-22 Thread Neil Conway
On Mon, 2004-09-20 at 17:57, Guy Thornley wrote:
> According to the manpage, O_DIRECT implies O_SYNC:
> 
> File I/O is done directly to/from user space buffers.  The I/O is
> synchronous, i.e., at the completion of the read(2) or write(2)
> system call, data is guaranteed to have been transferred.

This seems like it would be a rather large net loss. PostgreSQL already
structures writes so that the writes we need to hit disk immediately
(WAL records) are fsync()'ed -- the kernel is given more freedom to
schedule how other writes are flushed from the cache. Also, my
recollection is that O_DIRECT also disables readahead -- if that's
correct, that's not what we want either.
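
For context, WAL flushing is already configurable to use O_SYNC-style
writes instead of explicit fsync() calls; a quick sketch of checking the
current method (defaults vary by platform):

SHOW wal_sync_method;  -- e.g. fdatasync; open_sync opens WAL files with O_SYNC
SHOW fsync;            -- whether forced flushes happen at all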

BTW, using O_DIRECT has been discussed a few times in the past. Have you
checked the list archives? (for both -performance and -hackers)

> Would people be interested in a performance benchmark?

Sure -- I'd definitely be curious, although as I said I'm skeptical it's
a win.

> I need some benchmark tips :)

Some people have noted that it can be difficult to use contrib/pgbench
to get reproducible results -- you might want to look at Jan's TPC-W
implementation or the OSDL database benchmarks:

http://pgfoundry.org/projects/tpc-w-php/
http://www.osdl.org/lab_activities/kernel_testing/osdl_database_test_suite/

> Incidentally, postgres heap files suffer really, really bad fragmentation,
> which affects sequential scan operations (VACUUM, ANALYZE, REINDEX ...)
> quite drastically. We have in-house patches that somewhat alleviate this,
> but they are not release quality.

Can you elaborate on these "in-house patches"?

-Neil



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Caching of Queries

2004-09-22 Thread Gary Doades
On 22 Sep 2004 at 15:59, Tom Lane wrote:

> Scott Kirkwood <[EMAIL PROTECTED]> writes:
> > What do you think?
> 
> I think this would allow the problems of cached plans to bite
> applications that were previously not subject to them :-(.
> An app that wants plan re-use can use PREPARE to identify the
> queries that are going to be re-executed.
> 
>   regards, tom lane
> 

And then there are the people that would like to upgrade and get a 
performance gain without having to change their programs. A simple 
conf flag could turn query/plan caching off for all those that rely on each 
statement being re-planned.

This is where SQLServer etc. tend to get big wins. I know from direct 
comparisons that SQLServer often takes quite a bit longer to parse/plan 
a select statement than Postgres, but wins out overall from its 
query/plan caching.

Regards,
Gary.


---(end of broadcast)---
TIP 8: explain analyze is your friend