Re: [PERFORM] PostgreSQL publishes first real benchmark

2007-07-10 Thread Heiko W.Rupp

Hi,

I think this result will be useful for performance discussions of  
postgresql against other databases.


http://www.spec.org/jAppServer2004/results/res2007q3/

More on Josh Berkus's blog:

http://blogs.ittoolbox.com/database/soup/archives/postgresql- 
publishes-first-real-benchmark-17470



Congrats to everyone that worked to make this happen.
While I will never get customers to buy that nice hardware (and I  
would recommend the JBoss Appserver anyway :-),
it really is a big sign telling yes postgres can be really fast -  
as oposed to the urban legends around.


  Heiko

--
   Reg. Adresse: Red Hat GmbH, Hauptstätter Strasse 58, 70178 Stuttgart
   Handelsregister: Amtsgericht Stuttgart HRB 153243
   Geschäftsführer: Brendan Lane, Charlie Peters, Michael  
Cunningham, Werner Knoblich




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Filesystem Direct I/O and WAL sync option

2007-07-10 Thread Dimitri

Yes, I tried all WAL sync methods, but there was no difference...
However, there was a huge difference when I run the same tests under
Solaris10 - 'fdatasync' option gave the best performance level. On the
same time direct I/O did not make difference on Solaris 10 :)

So the main rule - there is no universal rule :)
just adapt system options according your workload...

Direct I/O will generally speed-up write operation due avoiding buffer
flashing overhead as well concurrent writing (breaking POSIX
limitation of single writer per given file on the same time). But on
the same time it may slow-down your read operations, and you may need
64bit PG version to use big cache to still keep same performance level
on SELECT queries. And again, there are other file systems like QFS
(for ex.) which may give you the best of both worlds: direct write and
buffered read on the same time! etc. etc. etc. :)

Rgds,
-Dimitri

On 7/9/07, Jonah H. Harris [EMAIL PROTECTED] wrote:

On 7/9/07, Jim C. Nasby [EMAIL PROTECTED] wrote:
 BTW, it might be worth trying the different wal_sync_methods. IIRC,
 Jonah's seen some good results from open_datasync.

On Linux, using ext3, reiser, or jfs, I've seen open_sync perform
quite better than fsync/fdatasync in most of my tests.  But, I haven't
done significant testing with direct I/O lately.

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


[PERFORM] Query Analyser

2007-07-10 Thread Gauri Kanekar

Hi List,

Is there anyway so as to indicate the Query Analyser not to use the
plan which it is using regularly, and use a new plan ?


From where do the Query Analyser gets the all info to prepare a plan?

Is it only from the pg_statistics table or are there anyother tables
which have this info. stored?

And can we change the statistic??

Thanx in advance
--
Regards
Gauri

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


[PERFORM] tuning a function to insert/retrieve values from a reference table

2007-07-10 Thread Marc Mamin

Hello,

I have a simple table id/value, and a function that returns the id of a
given value, inserting the later if not yet present. The probability
that a value already exist within the reference table is very high.

Different db users may have their own reference table with different
content, but as the table definition is identical, I've defined a public
function to maintain these tables. 

Can I optimize this function with:

a) remove the EXCEPTION clause (Is there an underlying lock that prevent
concurrent inserts ?)

b) declare the function being IMMUTABLE ?
   
   - although it may insert a new raw, the returned id is invariant for
a given user
 (I don't really understand the holdability ov immutable functions;
are the results cached only for the livetime of a prepared statement ?,
or can they be shared by different sessions ?)


Thanks,

Marc




--Table definition:

create table ref_table (
  id serial NOT NULL, 
  v varchar NOT NULL, 
  constraint ref_table_pk primary key  (id)
) without oids;

create unique index ref_table_uk on ref_table(v);


-- Function:

CREATE OR REPLACE FUNCTION public.get_or_insert_value(varchar) RETURNS
INT AS 
$BODY$

DECLARE
  id_value INT;

BEGIN

  SELECT INTO id_value id FROM ref_table WHERE v =  $1;

  IF FOUND THEN

RETURN id_value;

  ELSE  --new value to be inserted

DECLARE
  rec record;

BEGIN

 FOR rec in INSERT INTO ref_table (v) VALUES ($1) RETURNING id
 LOOP
  return rec.id;  
 END LOOP;

 EXCEPTION --concurrent access ?
   WHEN unique_violation THEN
 RETURN(SELECT id FROM ref_table WHERE v =  $1);

END;

  END IF;
END;
$BODY$
  LANGUAGE 'plpgsql' VOLATILE;



Re: [PERFORM] tuning a function to insert/retrieve values from a reference table

2007-07-10 Thread Tom Lane
Marc Mamin [EMAIL PROTECTED] writes:
 Can I optimize this function with:

 a) remove the EXCEPTION clause (Is there an underlying lock that prevent
 concurrent inserts ?)

No.

 b) declare the function being IMMUTABLE ?

Certainly not --- it's got side-effects.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] Equivalent queries produce different plans

2007-07-10 Thread Craig James

The two queries below produce different plans.

select r.version_id, r.row_num, m.molkeys from my_rownum r
join my_molkeys m on (r.version_id = m.version_id)
where r.version_id = 320
and   r.version_id   330
order by r.version_id;


select r.version_id, r.row_num, m.molkeys from my_rownum r
join my_molkeys m on (r.version_id = m.version_id)
where r.version_id = 320
and   r.version_id   330
and   m.version_id = 320
and   m.version_id   330
order by r.version_id;

I discovered this while looking at the plans for the first query.  It seemed to be ignoring the fact that it 
could push the between condition along to the second table, since the condition and the join are 
on the same indexed columns.  So, I added a redundant condition, and bingo, it was a lot faster.  In the 
analysis shown below, the timing (about 1.0 and 1.5 seconds respectively) are for a hot database 
that's been queried a couple of times.  In real life on a cold database, the times are more like 
10 seconds and 21 seconds, so it's quite significant.

Thanks,
Craig



db= explain analyze 
db- select r.version_id, r.row_num, m.molkeys from my_rownum r

db- join my_molkeys m on (r.version_id = m.version_id)
db- where r.version_id = 320
db- and   r.version_id   330
db- order by r.version_id;

Sort  (cost=264979.51..265091.06 rows=44620 width=366) (actual 
time=1424.126..1476.048 rows=46947 loops=1)
  Sort Key: r.version_id
  -  Nested Loop  (cost=366.72..261533.64 rows=44620 width=366) (actual 
time=41.649..1186.331 rows=46947 loops=1)
-  Bitmap Heap Scan on my_rownum r  (cost=366.72..41168.37 rows=44620 
width=8) (actual time=41.616..431.783 rows=46947 loops=1)
  Recheck Cond: ((version_id = 320) AND (version_id  330))
  -  Bitmap Index Scan on i_chm_rownum_version_id_4998  
(cost=0.00..366.72 rows=44620 width=0) (actual time=21.244..21.244 rows=46947 
loops=1)
Index Cond: ((version_id = 320) AND (version_id  
330))
-  Index Scan using i_chm_molkeys_version_id on my_molkeys m  
(cost=0.00..4.93 rows=1 width=362) (actual time=0.009..0.010 rows=1 loops=46947)
  Index Cond: (outer.version_id = m.version_id)
Total runtime: 1534.638 ms
(10 rows)


db= explain analyze 
db- select r.version_id, r.row_num, m.molkeys from my_rownum r

db- join my_molkeys m on (r.version_id = m.version_id)
db- where r.version_id = 320
db- and r.version_id 330
db- and m.version_id =   320
db- and m.version_id 330
db- order by r.version_id;

Sort  (cost=157732.20..157732.95 rows=298 width=366) (actual 
time=985.383..1037.423 rows=46947 loops=1)
  Sort Key: r.version_id
  -  Hash Join  (cost=41279.92..157719.95 rows=298 width=366) (actual 
time=502.875..805.402 rows=46947 loops=1)
Hash Cond: (outer.version_id = inner.version_id)
-  Index Scan using i_chm_molkeys_version_id on my_molkeys m  
(cost=0.00..115717.85 rows=47947 width=362) (actual time=0.023..117.270 rows=46947 
loops=1)
  Index Cond: ((version_id = 320) AND (version_id  330))
-  Hash  (cost=41168.37..41168.37 rows=44620 width=8) (actual 
time=502.813..502.813 rows=46947 loops=1)
  -  Bitmap Heap Scan on my_rownum r  (cost=366.72..41168.37 
rows=44620 width=8) (actual time=41.621..417.508 rows=46947 loops=1)
Recheck Cond: ((version_id = 320) AND (version_id  
330))
-  Bitmap Index Scan on i_chm_rownum_version_id_4998  
(cost=0.00..366.72 rows=44620 width=0) (actual time=21.174..21.174 rows=46947 
loops=1)
  Index Cond: ((version_id = 320) AND (version_id 
 330))
Total runtime: 1096.031 ms
(12 rows)

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Equivalent queries produce different plans

2007-07-10 Thread Craig James

Sorry, I forgot to mention: This is 8.1.4, with a fairly ordinary configuration 
on a 4 GB system.

Craig


Craig James wrote:

The two queries below produce different plans.

select r.version_id, r.row_num, m.molkeys from my_rownum r
join my_molkeys m on (r.version_id = m.version_id)
where r.version_id = 320
and   r.version_id   330
order by r.version_id;


select r.version_id, r.row_num, m.molkeys from my_rownum r
join my_molkeys m on (r.version_id = m.version_id)
where r.version_id = 320
and   r.version_id   330
and   m.version_id = 320
and   m.version_id   330
order by r.version_id;

I discovered this while looking at the plans for the first query.  It 
seemed to be ignoring the fact that it could push the between 
condition along to the second table, since the condition and the join 
are on the same indexed columns.  So, I added a redundant condition, and 
bingo, it was a lot faster.  In the analysis shown below, the timing 
(about 1.0 and 1.5 seconds respectively) are for a hot database that's 
been queried a couple of times.  In real life on a cold database, the 
times are more like 10 seconds and 21 seconds, so it's quite significant.


Thanks,
Craig



db= explain analyze db- select r.version_id, r.row_num, m.molkeys from 
my_rownum r

db- join my_molkeys m on (r.version_id = m.version_id)
db- where r.version_id = 320
db- and   r.version_id   330
db- order by r.version_id;

Sort  (cost=264979.51..265091.06 rows=44620 width=366) (actual 
time=1424.126..1476.048 rows=46947 loops=1)

  Sort Key: r.version_id
  -  Nested Loop  (cost=366.72..261533.64 rows=44620 width=366) (actual 
time=41.649..1186.331 rows=46947 loops=1)
-  Bitmap Heap Scan on my_rownum r  (cost=366.72..41168.37 
rows=44620 width=8) (actual time=41.616..431.783 rows=46947 loops=1)
  Recheck Cond: ((version_id = 320) AND (version_id  
330))
  -  Bitmap Index Scan on i_chm_rownum_version_id_4998  
(cost=0.00..366.72 rows=44620 width=0) (actual time=21.244..21.244 
rows=46947 loops=1)
Index Cond: ((version_id = 320) AND (version_id 
 330))
-  Index Scan using i_chm_molkeys_version_id on my_molkeys m  
(cost=0.00..4.93 rows=1 width=362) (actual time=0.009..0.010 rows=1 
loops=46947)

  Index Cond: (outer.version_id = m.version_id)
Total runtime: 1534.638 ms
(10 rows)


db= explain analyze db- select r.version_id, r.row_num, m.molkeys from 
my_rownum r

db- join my_molkeys m on (r.version_id = m.version_id)
db- where r.version_id = 320
db- and r.version_id 330
db- and m.version_id =   320
db- and m.version_id 330
db- order by r.version_id;

Sort  (cost=157732.20..157732.95 rows=298 width=366) (actual 
time=985.383..1037.423 rows=46947 loops=1)

  Sort Key: r.version_id
  -  Hash Join  (cost=41279.92..157719.95 rows=298 width=366) (actual 
time=502.875..805.402 rows=46947 loops=1)

Hash Cond: (outer.version_id = inner.version_id)
-  Index Scan using i_chm_molkeys_version_id on my_molkeys m  
(cost=0.00..115717.85 rows=47947 width=362) (actual time=0.023..117.270 
rows=46947 loops=1)
  Index Cond: ((version_id = 320) AND (version_id  
330))
-  Hash  (cost=41168.37..41168.37 rows=44620 width=8) (actual 
time=502.813..502.813 rows=46947 loops=1)
  -  Bitmap Heap Scan on my_rownum r  
(cost=366.72..41168.37 rows=44620 width=8) (actual time=41.621..417.508 
rows=46947 loops=1)
Recheck Cond: ((version_id = 320) AND 
(version_id  330))
-  Bitmap Index Scan on 
i_chm_rownum_version_id_4998  (cost=0.00..366.72 rows=44620 width=0) 
(actual time=21.174..21.174 rows=46947 loops=1)
  Index Cond: ((version_id = 320) AND 
(version_id  330))

Total runtime: 1096.031 ms
(12 rows)





---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Equivalent queries produce different plans

2007-07-10 Thread Tom Lane
Craig James [EMAIL PROTECTED] writes:
 The two queries below produce different plans.

 select r.version_id, r.row_num, m.molkeys from my_rownum r
 join my_molkeys m on (r.version_id = m.version_id)
 where r.version_id = 320
 and   r.version_id   330
 order by r.version_id;

 select r.version_id, r.row_num, m.molkeys from my_rownum r
 join my_molkeys m on (r.version_id = m.version_id)
 where r.version_id = 320
 and   r.version_id   330
 and   m.version_id = 320
 and   m.version_id   330
 order by r.version_id;

Yeah, the planner does not make any attempt to infer implied
inequalities, so it will not generate the last two clauses for you.
There is machinery in there to infer implied *equalities*, which
is cheaper (fewer operators to consider) and much more useful across
typical queries such as multiway joins on the same keys.  I'm pretty
dubious that it'd be worth the cycles to search for implied
inequalities.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate