Re: [PERFORM] Slow query in JDBC

2017-09-29 Thread Dave Cramer
Good catch Jeff.

As for which version: we always recommend the latest version, 42.1.4.

Dave Cramer

da...@postgresintl.com
www.postgresintl.com

On 29 September 2017 at 06:44, Subramaniam C 
wrote:

> Yes, you are right. The timestamp the application was providing was in
> seconds, whereas the query that used the index had a timestamp in
> milliseconds. That is why the query was taking so long from the application.
>
> On Fri, Sep 29, 2017 at 12:19 PM, Jeff Janes  wrote:
>
>> On Thu, Sep 28, 2017 at 2:59 AM, Subramaniam C <
>> subramaniam31...@gmail.com> wrote:
>>
>>> The first output shows the plan when the query is executed from the SQL
>>> command line. The second output shows the plan when it is executed from the
>>> application. As per the output it is clear that when the query is executed
>>> through JDBC it is not using the index (health_index); instead it is doing a
>>> sequential scan. Please let us know how this issue can be resolved from JDBC?
>>>
>>> 1.)
>>>
>>>
>>>         ->  Index Only Scan using health_index on health_timeseries_table  (cost=0.56..421644.56 rows=1558800 width=24)
>>>               Index Cond: (("timestamp" >= '1505989186834'::bigint) AND ("timestamp" <= '1505990086834'::bigint))
>>>
>>>
>>
>>> 2.)
>>>
>>>
>>>         ->  Seq Scan on health_timeseries_table  (cost=0.00..267171.00 rows=1005634 width=24)
>>>               Filter: (("timestamp" >= '150598950'::bigint) AND ("timestamp" <= '150599040'::bigint))
>>>
>>
>>
>> Those are different queries, so it is not terribly surprising it might
>> choose a different plan.
>>
>> For this type of comparison, you need to compare identical queries,
>> including the parameters.
>>
>> Cheers,
>>
>> Jeff
>>
>
>


Re: [PERFORM] Slow query in JDBC

2017-09-28 Thread Dave Cramer
Why are you using such an old version of the driver ?

Either way, the driver is going to use a prepared statement to run this; that
is the difference between running it through JDBC and running it in psql.


If you want to see the same plan in psql you will need to do

prepare foo as <your query>;

then explain execute foo;
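
For example, a minimal sketch using the tables from this thread (the parameter
types and the exact statement the application sends are my assumptions, not
taken from the application):

    PREPARE foo (bigint, bigint) AS
        SELECT mobid, health, "timestamp"
          FROM health_timeseries_table
         WHERE "timestamp" >= $1 AND "timestamp" <= $2;

    EXPLAIN EXECUTE foo (1505989186834, 1505990086834);
    DEALLOCATE foo;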

FWIW upgrading the driver won't help this situation but there's still no
reason not to upgrade.

Dave Cramer

da...@postgresintl.com
www.postgresintl.com

On 28 September 2017 at 12:32, Subramaniam C 
wrote:

> The JDBC version is 9.4-1201-jdbc41.
>
> Query :-
>
> select count(*) OVER() AS count, uuid, availability, objectname, datasourcename,
>        datasourcetype, objecttype, health
> from (select distinct on (health_timeseries_table.mobid) mobid,
>              health_timeseries_table.health, health_timeseries_table.timestamp
>       from health_timeseries_table
>       where timestamp >= 1505989186834 and timestamp <= 1505990086834
>       ORDER BY health_timeseries_table.mobid DESC,
>                health_timeseries_table.timestamp DESC,
>                health_timeseries_table.health ASC) t
> right join (SELECT DISTINCT ON (object_table.uuid) uuid, object_table.timestamp,
>                    object_table.availability, object_table.objectname,
>                    object_table.datasourcename, object_table.datasourcetype,
>                    object_table.objecttype
>             FROM object_table
>             where object_table.timestamp >= 0 and object_table.timestamp <= 1505990086834
>               and object_table.tenantid = 'perspica'
>             ORDER BY object_table.uuid DESC, object_table.timestamp DESC) u
>   on (t.mobid = u.uuid)
> order by health asc limit 20 offset 0;
>
>
> Please let us know any other details?
>
>
> Thanks and Regards
>
> Subramaniam
>
> On Thu, Sep 28, 2017 at 7:29 PM, Dave Cramer  wrote:
>
>> What version of the driver are you using?
>>
>> The driver does not automatically use a cursor, but it does use prepared
>> statements which can be slower.
>>
>>
>> Can you provide the query and the jdbc query ?
>>
>>
>>
>> Dave Cramer
>>
>> da...@postgresintl.com
>> www.postgresintl.com
>>
>> On 28 September 2017 at 05:59, Subramaniam C 
>> wrote:
>>
>>> The first output shows the plan when the query is executed from the SQL
>>> command line. The second output shows the plan when it is executed from the
>>> application. As per the output it is clear that when the query is executed
>>> through JDBC it is not using the index (health_index); instead it is doing a
>>> sequential scan. Please let us know how this issue can be resolved from JDBC?
>>>
>>> 1.)
>>>
>>>
>>> Limit  (cost=510711.53..510711.58 rows=20 width=72)
>>>   ->  Sort  (cost=510711.53..511961.53 rows=50 width=72)
>>>         Sort Key: health_timeseries_table.health
>>>         ->  WindowAgg  (cost=0.98..497406.71 rows=50 width=72)
>>>               ->  Merge Left Join  (cost=0.98..491156.71 rows=50 width=64)
>>>                     Merge Cond: (object_table.uuid = health_timeseries_table.mobid)
>>>                     ->  Unique  (cost=0.42..57977.00 rows=50 width=64)
>>>                           ->  Index Scan Backward using object_table_pkey on object_table  (cost=0.42..56727.00 rows=50 width=64)
>>>                                 Index Cond: (("timestamp" >= 0) AND ("timestamp" <= '1505990086834'::bigint))
>>>                                 Filter: (tenantid = 'perspica'::text)
>>>                     ->  Materialize  (cost=0.56..426235.64 rows=55526 width=16)
>>>                           ->  Unique  (cost=0.56..425541.56 rows=55526 width=24)
>>>                                 ->  Index Only Scan using health_index on health_timeseries_table  (cost=0.56..421644.56 rows=1558800 width=24)
>>>                                       Index Cond: (("timestamp" >= '1505989186834'::bigint) AND ("timestamp" <= '1505990086834'::bigint))
>>>
>>> LOG:  duration: 1971.697 ms
>>>
>>>
>>>
>>>
>>>
>>> 2.)
>>>
>>>
>>> Limit  (cost=457629.21..457629.26 rows=20 width=72)
>>>   ->  Sort  (cost=457629.21..458879.21 rows=50 width=72)
>>>         Sort Key: health_timeseries_table.health

Re: [PERFORM] Slow query in JDBC

2017-09-28 Thread Dave Cramer
What version of the driver are you using?

The driver does not automatically use a cursor, but it does use prepared
statements which can be slower.


Can you provide the query and the jdbc query ?



Dave Cramer

da...@postgresintl.com
www.postgresintl.com

On 28 September 2017 at 05:59, Subramaniam C 
wrote:

> The first output shows the plan when the query is executed from the SQL
> command line. The second output shows the plan when it is executed from the
> application. As per the output it is clear that when the query is executed
> through JDBC it is not using the index (health_index); instead it is doing a
> sequential scan. Please let us know how this issue can be resolved from JDBC?
>
> 1.)
>
>
> Limit  (cost=510711.53..510711.58 rows=20 width=72)
>   ->  Sort  (cost=510711.53..511961.53 rows=50 width=72)
>         Sort Key: health_timeseries_table.health
>         ->  WindowAgg  (cost=0.98..497406.71 rows=50 width=72)
>               ->  Merge Left Join  (cost=0.98..491156.71 rows=50 width=64)
>                     Merge Cond: (object_table.uuid = health_timeseries_table.mobid)
>                     ->  Unique  (cost=0.42..57977.00 rows=50 width=64)
>                           ->  Index Scan Backward using object_table_pkey on object_table  (cost=0.42..56727.00 rows=50 width=64)
>                                 Index Cond: (("timestamp" >= 0) AND ("timestamp" <= '1505990086834'::bigint))
>                                 Filter: (tenantid = 'perspica'::text)
>                     ->  Materialize  (cost=0.56..426235.64 rows=55526 width=16)
>                           ->  Unique  (cost=0.56..425541.56 rows=55526 width=24)
>                                 ->  Index Only Scan using health_index on health_timeseries_table  (cost=0.56..421644.56 rows=1558800 width=24)
>                                       Index Cond: (("timestamp" >= '1505989186834'::bigint) AND ("timestamp" <= '1505990086834'::bigint))
>
> LOG:  duration: 1971.697 ms
>
>
>
>
>
> 2.)
>
>
> Limit  (cost=457629.21..457629.26 rows=20 width=72)
>   ->  Sort  (cost=457629.21..458879.21 rows=50 width=72)
>         Sort Key: health_timeseries_table.health
>         ->  WindowAgg  (cost=367431.49..444324.39 rows=50 width=72)
>               ->  Merge Left Join  (cost=367431.49..438074.39 rows=50 width=64)
>                     Merge Cond: (object_table.uuid = health_timeseries_table.mobid)
>                     ->  Unique  (cost=0.42..57977.00 rows=50 width=64)
>                           ->  Index Scan Backward using object_table_pkey on object_table  (cost=0.42..56727.00 rows=50 width=64)
>                                 Index Cond: (("timestamp" >= '0'::bigint) AND ("timestamp" <= '150599040'::bigint))
>                                 Filter: (tenantid = 'perspica'::text)
>                     ->  Materialize  (cost=367431.07..373153.32 rows=55526 width=16)
>                           ->  Unique  (cost=367431.07..372459.24 rows=55526 width=24)
>                                 ->  Sort  (cost=367431.07..369945.16 rows=1005634 width=24)
>                                       Sort Key: health_timeseries_table.mobid DESC, health_timeseries_table."timestamp" DESC, health_timeseries_table.health
>                                       ->  Seq Scan on health_timeseries_table  (cost=0.00..267171.00 rows=1005634 width=24)
>                                             Filter: (("timestamp" >= '150598950'::bigint) AND ("timestamp" <= '150599040'::bigint))
>
> On Thu, Sep 28, 2017 at 2:56 PM, Pavy Philippe <
> philippe.p...@worldline.com> wrote:
>
>> https://www.postgresql.org/docs/current/static/auto-explain.html
>>
>>
>> -Message d'origine-
>> De : pgsql-performance-ow...@postgresql.org [mailto:
>> pgsql-performance-ow...@postgresql.org] De la part de Julien Rouhaud
>> Envoyé : jeudi 28 septembre 2017 11:21
>> À : Subramaniam C
>> Cc : pgsql-performance@postgresql.org
>> Objet : Re: [PERFORM] Slow query in JDBC
>>
>> On Thu, Sep 28, 2017 at 10:58 AM, Subramaniam C <
>> subramaniam31...@gmail.com> wrote:
>> > I configured cursor_tuple_fraction to 1 but still I am facing the same
>> > issue.

Re: [PERFORM] degrading inser performance

2015-09-17 Thread Dave Cramer
Nobody has asked what kind of machine this is ???

Hard disks, memory, etc.

What are your relevant settings in postgresql.conf ? Shared buffers,
checkpoints, etc.

Also how big are the inserts ? What else is this machine doing ? Is it bare
hardware, or a VM ?
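
A quick way to gather those settings for posting (generic catalog query,
nothing assumed about this particular installation):

    SELECT name, setting, unit
      FROM pg_settings
     WHERE name IN ('shared_buffers', 'checkpoint_segments',
                    'checkpoint_completion_target', 'wal_buffers');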

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca

On 17 September 2015 at 10:41, Eildert Groeneveld <
eildert.groenev...@fli.bund.de> wrote:

> Thanks for your input!
> On Do, 2015-09-17 at 11:21 -0300, Matheus de Oliveira wrote:
>
>
> On Thu, Sep 17, 2015 at 9:19 AM, Eildert Groeneveld <
> eildert.groenev...@fli.bund.de> wrote:
>
> >  * one COPY per bulk (20 000 rows)
> copy does not fit so well, as it is not only initial populating.
>
>
> Why do you say COPY doesn't fit? It seems to me that COPY fits perfectly
> for your case, and would certainly make the load faster.
>
> well, more than one table needs to get populated and data is not really
> available in one file.
>
>
> I suspect (not sure though) that the degradation is most because you are
> inserting one row at a time, and, it needs to verify FSM (Free Space Map)
> for each tuple inserted, when the table start to get more populated, this
> verification starts to become slower. If that is really the case, COPY
> would certainly improve that, or even INSERT with many rows at once.
>
> allright, sounds reasonable.
>
> But what is your experience: is it possible that
> inserting the first lot of 20 000 records takes 29 seconds while inserting lot 20
> (i.e. 19 lots of 20 000 records later) takes
> 186.9 sec? After all, we are talking about only 20 000 records per lot. That takes 6
> times longer!!
>
> odd, anyone has an idea?
>
> greetings
>
> Eildert
>
>
> Regards,
> --
> Matheus de Oliveira
>
>
> --
> Eildert Groeneveld
> ===
> Institute of Farm Animal Genetics (FLI)
> Mariensee 31535 Neustadt Germany
> Tel : (+49)(0)5034 871155 Fax : (+49)(0)5034 871143
> e-mail: eildert.groenev...@fli.bund.de
> web: http://vce.tzv.fal.de
> ==
>
>


Re: [PERFORM] Unable to allocate 2G of shared memory on wheezy

2014-06-18 Thread Dave Cramer
Problem solved... a runaway process (puppet) had consumed all available
real memory.

Dave Cramer


On 18 June 2014 15:24, Dave Cramer  wrote:

> 2014-06-18 13:37:15 EDT FATAL:  could not map anonymous shared memory:
> Cannot allocate memory
> 2014-06-18 13:37:15 EDT HINT:  This error usually means that PostgreSQL's
> request for a shared memory segment exceeded available memory or swap
> space. To reduce the request size (currently 8826445824 bytes), reduce
> PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or
> max_connections.
>
> ipcs -m
>
> ------ Shared Memory Segments --------
> key        shmid      owner      perms      bytes      nattch     status
>
>
>
> Dave Cramer
>
>
> On 18 June 2014 15:15, Tom Lane  wrote:
>
>> Dave Cramer  writes:
>> > To reduce the request size [FAILently 2232950784 bytes), reduce
>> > PostgreSQL's shared memory usage,
>>
>> This error message is a bit garbled :-(.  It would've been useful
>> to know the specific errno, but you've trimmed out that info.
>>
>> Perhaps it's failing because you already have ~10G in shared memory
>> segments?  "sudo ipcs -m" might be illuminating.
>>
>> regards, tom lane
>>
>
>


Re: [PERFORM] Unable to allocate 2G of shared memory on wheezy

2014-06-18 Thread Dave Cramer
2014-06-18 13:37:15 EDT FATAL:  could not map anonymous shared memory:
Cannot allocate memory
2014-06-18 13:37:15 EDT HINT:  This error usually means that PostgreSQL's
request for a shared memory segment exceeded available memory or swap
space. To reduce the request size (currently 8826445824 bytes), reduce
PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or
max_connections.

ipcs -m

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status



Dave Cramer


On 18 June 2014 15:15, Tom Lane  wrote:

> Dave Cramer  writes:
> > To reduce the request size [FAILently 2232950784 bytes), reduce
> > PostgreSQL's shared memory usage,
>
> This error message is a bit garbled :-(.  It would've been useful
> to know the specific errno, but you've trimmed out that info.
>
> Perhaps it's failing because you already have ~10G in shared memory
> segments?  "sudo ipcs -m" might be illuminating.
>
> regards, tom lane
>


[PERFORM] Unable to allocate 2G of shared memory on wheezy

2014-06-18 Thread Dave Cramer
Here are my kernel settings

kernel.shmmax = 10737418240

# Maximum total size of all shared memory segments in pages (normally 4096
bytes)
kernel.shmall = 2621440
kernel.sem = 250 32000 32 1200

They are actually set...

sysctl -a | grep shm
kernel.shmmax = 10737418240
kernel.shmall = 2621440
kernel.shmmni = 4096
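
For what it's worth, the two values are consistent: shmall is counted in
4096-byte pages, so it describes the same 10 GB ceiling as shmmax. A trivial
sanity check, written as SQL only for convenience:

    SELECT 2621440::bigint * 4096 AS shmall_bytes;   -- 10737418240, i.e. kernel.shmmax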

To reduce the request size [FAILently 2232950784 bytes), reduce
PostgreSQL's shared memory usage,


Dave Cramer


[PERFORM] Testing strategies

2014-04-15 Thread Dave Cramer
I have a client wanting to test PostgreSQL on ZFS running Linux.

Other than pg_bench are there any other benchmarks that are easy to test?

One of the possible concerns is fragmentation over time. Any ideas on how
to fragment the database before running pg_bench ?

Also there is some concern about fragmentation of the WAL logs. I am
looking at testing with and without the WAL logs on ZFS. Any other specific
concerns ?
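
One crude way to pre-fragment the database before a timed pg_bench run (a
sketch of my own, assuming the standard pgbench schema is already loaded; the
table is called pgbench_accounts on 8.4 and later, accounts on older
releases, and 100000 should be adjusted to the -s scale factor in use) is a
few rounds of scattered updates so copy-on-write relocates blocks across the pool:

    UPDATE pgbench_accounts
       SET abalance = abalance + 1
     WHERE aid IN (SELECT (random() * 100000)::int + 1
                     FROM generate_series(1, 10000));
    VACUUM pgbench_accounts;   -- repeat the update/vacuum pair several times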


Dave Cramer
credativ ltd (Canada)

78 Zina St
Orangeville, ON
Canada. L9W 1E8

Office: +1 (905) 766-4091
Mobile: +1 (519) 939-0336

===
Canada:  http://www.credativ.ca
USA: http://www.credativ.us
Germany: http://www.credativ.de
Netherlands: http://www.credativ.nl
UK:  http://www.credativ.co.uk
India:   http://www.credativ.in
===


Re: [PERFORM] performance regression with 9.2

2012-11-12 Thread Dave Cramer
Tom,

Will try to get one ASAP.

Dave

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca



On Mon, Nov 12, 2012 at 3:43 PM, Tom Lane  wrote:

> Dave Cramer  writes:
> > This query is a couple orders of magnitude slower the first result is
> > 9.2.1, the second 9.1
>
> Hm, the planner's evidently doing the wrong thing inside the recursive
> union, but not obvious why.  Can you extract a self-contained test case?
>
> regards, tom lane
>


[PERFORM] performance regression with 9.2

2012-11-12 Thread Dave Cramer
ws=16
width=53) (actual time=0.004..0.005 rows=16 loops=1)
 Total runtime: 14555.254 ms
(29 rows)

CareCloud_Prod=#



#  SELECT note_sets."id" AS t0_r0, note_sets."note_set_source_id" AS t0_r1,
note_sets."parent_id" AS t0_r2, note_sets."business_entity_id" AS t0_r3,
note_sets."created_at" AS t0_r4, note_sets."updated_at" AS t0_r5,
note_sets."created_by" AS t0_r6, note_sets."updated_by" AS t0_r7,
note_set_sources."id" AS t1_r0, note_set_sources."name" AS t1_r1,
note_set_sources."code" AS t1_r2, note_set_sources."description" AS t1_r3,
note_set_sources."status" AS t1_r4, note_set_sources."created_at" AS t1_r5,
note_set_sources."updated_at" AS t1_r6, note_set_sources."created_by" AS
t1_r7, note_set_sources."updated_by" AS t1_r8, notes."id" AS t2_r0,
notes."note_set_id" AS t2_r1, notes."subject" AS t2_r2, notes."text" AS
t2_r3, notes."status" AS t2_r4, notes."is_dissmissable" AS t2_r5,
notes."is_home" AS t2_r6, notes."created_at" AS t2_r7, notes."updated_at"
AS t2_r8, notes."created_by" AS t2_r9, notes."updated_by" AS t2_r10 FROM
note_sets  LEFT OUTER JOIN note_set_sources ON note_set_sources.id =
note_sets.note_set_source_id  LEFT OUTER JOIN notes ON notes.note_set_id =
note_sets.id AND notes."status" = E'A' WHERE (note_sets.id IN (WITH
RECURSIVE parent_noteset as (SELECT id FROM note_sets where id = 8304085
UNION SELECT note_sets.id FROM parent_noteset parent_noteset, note_sets
note_sets WHERE note_sets.parent_id = parent_noteset.id) SELECT id FROM
parent_noteset));

QUERY PLAN
--
 Hash Left Join  (cost=249847.72..670928.98 rows=10180954 width=1512)
(actual time=692.423..829.258 rows=48 loops=1)
   Hash Cond: (public.note_sets.note_set_source_id = note_set_sources.id)
   ->  Hash Right Join  (cost=249846.36..530939.50 rows=10180954 width=718)
(actual time=692.382..829.180 rows=48 loops=1)
 Hash Cond: (notes.note_set_id = public.note_sets.id)
 ->  Seq Scan on notes  (cost=0.00..32981.14 rows=878550 width=678)
(actual time=0.027..413.972 rows=878529 loops=1)
   Filter: (status = 'A'::bpchar)
 ->  Hash  (cost=43045.44..43045.44 rows=10180954 width=40) (actual
time=22.904..22.904 rows=46 loops=1)
   Buckets: 524288  Batches: 4  Memory Usage: 2kB
   ->  Nested Loop  (cost=41106.18..43045.44 rows=10180954
width=40) (actual time=12.319..22.738 rows=46 loops=1)
 ->  HashAggregate  (cost=41106.18..41108.18 rows=200
width=4) (actual time=11.873..11.889 rows=46 loops=1)
   ->  CTE Scan on parent_noteset
 (cost=40459.39..40857.41 rows=19901 width=4) (actual time=0.492..11.843
rows=46 loops=1)
 CTE parent_noteset
   ->  Recursive Union
 (cost=0.00..40459.39 rows=19901 width=4) (actual time=0.489..11.822
rows=46 loops=1)
 ->  Index Scan using
note_sets_pkey on note_sets  (cost=0.00..10.50 rows=1 width=4) (actual
time=0.484..0.485 rows=1 loops=1)
   Index Cond: (id = 8304085)
 ->  Nested Loop
 (cost=0.00..4005.09 rows=1990 width=4) (actual time=1.534..3.764 rows=15
loops=3)
   ->  WorkTable Scan on
parent_noteset  (cost=0.00..0.20 rows=10 width=4) (actual time=0.000..0.001
rows=15 loops=3)
   ->  Index Scan using
note_sets_parent_id_idx on note_sets  (cost=0.00..398.00 rows=199 width=8)
(actual time=0.216..0.244 rows=1 loops=46)
 Index Cond: (parent_id
= parent_noteset.id)
 ->  Index Scan using note_sets_pkey on note_sets
 (cost=0.00..9.67 rows=1 width=40) (actual time=0.234..0.234 rows=1
loops=46)
   Index Cond: (id = parent_noteset.id)
   ->  Hash  (cost=1.16..1.16 rows=16 width=794) (actual time=0.020..0.020
rows=16 loops=1)
 Buckets: 1024  Batches: 1  Memory Usage: 2kB
 ->  Seq Scan on note_set_sources  (cost=0.00..1.16 rows=16
width=794) (actual time=0.012..0.014 rows=16 loops=1)
 Total runtime: 829.657 ms
(25 rows)


Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca


Re: [PERFORM] JDBC 5 million function insert returning Single Transaction Lock Access Exclusive Problem

2012-08-31 Thread Dave Cramer
Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca


On Thu, Aug 30, 2012 at 2:34 AM, Eileen  wrote:
> Hi,
>
> I have written some Java code which builds a postgresql function.  That
> function calls approximately 6 INSERT statements with a RETURNING clause.  I
> recreate and re-run the function about 900,000 times.  I use JDBC to execute
> these functions on postgresql 8.3 on Windows.  When I tried running this on
> a single Connection of Postgresql, it failed (some kind of memory error).
> So I split the JDBC connections up into chunks of 5000.  I reran and
> everything was fine.  It took about 1 hour to execute all the updates.


>
> Since it took so long to perform the update, I wanted to prevent other users
> from querying the data during that time.  So I read about the LOCK command.
> It seemed like I should LOCK all the tables in the database with an ACCESS
> EXCLUSIVE mode.  That would prevent anyone from getting data while the
> database was making its updates.

Do you understand how MVCC works? Do you really need to lock out users ?
>
> Since a LOCK is only valid for 1 transaction, I set autocommit to FALSE.  I
> also removed the code which chunked up the inserts.  I had read that a
> single transaction ought to have better performance than committing after
> each insert, but that was clearly not what ended up happening in my case.

We would need more information as to what you are doing.
>
> In my case, a few problems occurred.  Number 1, the process ran at least 8
> hours and never finished.  It did not finish because the hard drive was
> filled up.  After running a manual vacuum (VACUUM FULL), no space was freed
> up.  I think this has cost me 20 GB of space.  Is there any way to free this
> space up?  I even dropped the database to no avail.
>
> Secondly, why did this process take over 8 hours to run?  While reading the
> performance mailing list, it seems like recommendations are to run lots of
> INSERTS in a single commit.  Is 5 million too many?  Is redefining a
> function over and over inside a transaction a problem?  Does the RETURNING
> clause present a problem during a single transaction?

VACUUM FULL on 8.3 is not a good idea
>
> If anyone has any suggestions for me, I would really appreciate it.
>

Can you explain at a high level what you are trying to do ?

> Tina




Re: [PERFORM] JDBC question for PG 8.3.9

2010-04-15 Thread Dave Cramer
On Wed, Apr 14, 2010 at 7:10 PM, Craig Ringer
 wrote:
> On 15/04/10 04:49, Dave Crooke wrote:
>>
>> Hi folks
>>
>> I am using PG 8.3 from Java. I am considering a performance tweak which
>> will involve holding about 150 java.sql.PreparedStatment objects open
>> against a single PGSQL connection. Is this safe?
>>
>> I know that MySQL does not support prepared statements /per se/, and so
>> their implementation of PreparedStatement is nothing more than some
>> client-side convenience code that knows how to escape and format
>> constants for you. Is this the case for PG, or does the PG JDBC driver
>> do the real thing?
>
> Pg supports real server-side prepared statements, as does the JDBC driver.
>
> IIRC (and I can't say this with 100% certainty without checking the sources
> or a good look at TFM) the PostgreSQL JDBC driver initially does only a
> client-side prepare. However, if the PreparedStatement is re-used more than
> a certain number of times (five by default?) it switches to server-side
> prepared statements.
>
This is partially true. The driver uses an unnamed prepared statement
on the server.

> This has actually caused a bunch of performance complaints on the jdbc list,
> because the query plan may change at that switch-over point, since with a
> server-side prepared statement Pg no longer has a specific value for each
> parameter and may pick a more generic plan.

This is a limitation of the server, not the driver



Re: [PERFORM] Explain Analyze returns faster than psql or JDBC calls.

2009-10-09 Thread Dave Cramer
You should also keep in mind that JDBC uses prepared statements, so you have
to explain analyze accordingly.

Dave

On Thu, Oct 8, 2009 at 5:18 PM, Tom Lane  wrote:

> G B  writes:
> > How can  explain-analyze return significantly much faster than other
> means?
>
> If the returned data is large or takes a lot of work to convert to text,
> this could happen, since EXPLAIN ANALYZE doesn't bother to format the
> tuples for display.  How big are those bytea columns, on average?
>
>regards, tom lane
>
>


Re: [PERFORM] I have a fusion IO drive available for testing

2009-03-31 Thread Dave Cramer
On Fri, Mar 27, 2009 at 4:33 PM, David Rees  wrote:

> On Fri, Mar 27, 2009 at 10:30 AM,   wrote:
> > On Thu, 26 Mar 2009, Dave Cramer wrote:
> >> So far using dd I am seeing around 264MB/s on ext3, 335MB/s on ext2
> write
> >> speed. So the question becomes what is the best filesystem for this
> drive?
> >
> > until the current mess with ext3 and fsync gets resolved, i would say it
> > would probably be a bad choice. I consider ext4 too new, so I would say
> XFS
> > or ext2 (depending on if you need the journal or not)
>
> If you're worried about the performance implications of ext3 in
> data=ordered mode, the best thing to do is to mount the filesystem in
> data=writeback mode instead.
>
> If you're only using the filesystem for PostgreSQL data or logs, your
> data will be just as safe except now that data and metadata won't be
> forced to disk in the order it was written.
>
> And you still get the benefit of a journal so fsck's after a crash will be
> fast.
>
> XFS probably is a decent choice, but I don't have much experience with
> it except on a desktop system where I can tell you that having write
> barriers on absolutely kills performance of anything that does a lot
> of filesystem metadata updates.  Again, not a big concern if the
> filesystem is only being used for PostgreSQL data or logs.
>
> -Dave
>

So I tried writing directly to the device, gets around 250MB/s, reads at
around 500MB/s

The client is using redhat so xfs is not an option.

Dave


[PERFORM] I have a fusion IO drive available for testing

2009-03-26 Thread Dave Cramer
So far using dd I am seeing around 264MB/s on ext3, 335MB/s on ext2 write
speed. So the question becomes what is the best filesystem for this drive?

Anyone want me to run anything on it ?

Dave


Re: [JDBC] [PERFORM] Query much slower when run from postgres function

2009-03-09 Thread Dave Cramer
On Mon, Mar 9, 2009 at 1:16 PM, Tom Lane  wrote:

> Guillaume Smet  writes:
> > Unnamed prepared statements are planned after binding the values,
> > starting with 8.3, or more precisely starting with 8.3.2 as early 8.3
> > versions were partially broken on this behalf.
>
> No, 8.2 did it too (otherwise we wouldn't have considered 8.3.0 to be
> broken...).  The thing I'm not too clear about is what "use of an
> unnamed statement" translates to for a JDBC user.
>

Tom,

The driver will use unnamed statements for all statements until it sees the
same statement N times, where N is 5 I believe; after that it uses a named
statement.

Dave


[PERFORM] Planner won't use composite index if there is an order by ????

2008-04-24 Thread Dave Cramer

Any idea why it wouldn't choose the right index ?

This is 8.3


# \d battles
 Table "public.battles"
        Column        |            Type             |                      Modifiers
----------------------+-----------------------------+------------------------------------------------------

id  | integer | not null default
nextval('battles_id_seq'::regclass)
user_id | integer | not null
contest_id  | integer | not null
entry_1_id  | integer | not null
entry_2_id  | integer | not null
new_entry_1_score   | integer |
new_entry_2_score   | integer |
score   | integer |
scored_at   | timestamp without time zone |
created_at  | timestamp without time zone | not null
function_profile_id | integer |
battle_type | integer | default 0
Indexes:
   "battles_pkey" PRIMARY KEY, btree (id)
   "unique_with_type" UNIQUE, btree (user_id, entry_1_id, entry_2_id,
battle_type)
   "battles_by_contest_and_type" btree (contest_id, battle_type)
   "battles_by_time" btree (scored_at)
Foreign-key constraints:
   "fk_battles_contests" FOREIGN KEY (contest_id) REFERENCES  
contests(id)

   "fk_battles_lefty" FOREIGN KEY (entry_1_id) REFERENCES entries(id)
   "fk_battles_righty" FOREIGN KEY (entry_2_id) REFERENCES entries(id)
   "fk_battles_users" FOREIGN KEY (user_id) REFERENCES users(id)


Here is the EXPLAIN ANALYZE of the query we want, but it takes forever because
it's using the index for the sort instead of restricting the number of
battles by user_id:

ourstage_production=# explain analyze SELECT * FROM battles WHERE
user_id = 196698 and scored_at is not null and score in (-3,3) ORDER  
BY

id DESC LIMIT 5;

QUERY PLAN

---
Limit  (cost=0.00..8381.61 rows=5 width=56) (actual
time=124421.499..183659.404 rows=2 loops=1)
  ->  Index Scan Backward using battles_pkey on battles
(cost=0.00..670528.67 rows=400 width=56) (actual
time=124421.495..183659.394 rows=2 loops=1)
Filter: ((scored_at IS NOT NULL) AND (score = ANY
('{-3,3}'::integer[])) AND (user_id = 196698))
Total runtime: 183659.446 ms
(4 rows)


If you remove the ORDER BY then it runs in 4 ms:

ourstage_production=# explain analyze SELECT * FROM battles WHERE
user_id = 196698 and scored_at is not null and score in (-3,3) LIMIT  
5;
 QUERY  
PLAN


---
Limit  (cost=0.00..126.65 rows=5 width=56) (actual time=4.607..4.621
rows=2 loops=1)
  ->  Index Scan using unique_with_type on battles
(cost=0.00..10131.66 rows=400 width=56) (actual time=4.603..4.611
rows=2 loops=1)
Index Cond: (user_id = 196698)
Filter: ((scored_at IS NOT NULL) AND (score = ANY
('{-3,3}'::integer[])))
Total runtime: 4.660 ms
(5 rows)


Here we tried to limit the table scan by time so that it would scan  
far

fewer records.  But what ended up happening is that it flipped it over
to using the right index.  The one that is based on user_id is much
preferred:


ourstage_production=# explain analyze SELECT * FROM battles WHERE
user_id = 196698 and scored_at is not null and score in (-3,3) and
scored_at > now() - INTERVAL '6 month' ORDER BY id DESC LIMIT 5;
QUERY
PLAN
-
Limit  (cost=10158.16..10158.18 rows=5 width=56) (actual
time=0.097..0.106 rows=2 loops=1)
  ->  Sort  (cost=10158.16..10158.92 rows=302 width=56) (actual
time=0.094..0.096 rows=2 loops=1)
Sort Key: id
Sort Method:  quicksort  Memory: 25kB
->  Index Scan using unique_with_type on battles
(cost=0.00..10153.15 rows=302 width=56) (actual time=0.069..0.078
rows=2 loops=1)
  Index Cond: (user_id = 196698)
  Filter: ((scored_at IS NOT NULL) AND (score = ANY
('{-3,3}'::integer[])) AND (scored_at > (now() - '6 mons'::interval)))
Total runtime: 0.152 ms
(8 rows)


Notice that we added a time restriction and it now chooses not to use the
time index and goes after the index based on user_id.  Why?  We don't know.
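
One thing that might be worth trying (a suggestion of mine, not something
raised in the thread) is an index whose leading column matches the user_id
filter and whose second column matches the ORDER BY, so the LIMIT 5 can stop
after a handful of rows instead of walking the primary key backwards:

    CREATE INDEX battles_by_user_and_id ON battles (user_id, id DESC);
    -- hypothetical index; 8.3 supports DESC index columns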





[PERFORM] using like in a prepare doesnt' use the right index

2008-03-27 Thread Dave Cramer

I have a query which is

prepare s_18 as select uid from user_profile where name like  
$1::varchar and isactive=$2 order by name asc limit 250;


explain analyze execute s_18 ('atxchery%','t');
   QUERY 
 PLAN

-
 Limit  (cost=0.00..7965.22 rows=250 width=14) (actual  
time=301.714..3732.269 rows=1 loops=1)
   ->  Index Scan using user_profile_name_key on user_profile   
(cost=0.00..404856.37 rows=12707 width=14) (actual  
time=301.708..3732.259 rows=1 loops=1)

 Filter: (((name)::text ~~ $1) AND (isactive = $2))
 Total runtime: 3732.326 ms

without prepared statements we get

explain analyze select uid from user_profile where name like 'foo%'  
and isactive='t' order by name asc limit 250;
  
QUERY PLAN

-
 Limit  (cost=293.89..294.08 rows=73 width=14) (actual  
time=5.947..6.902 rows=250 loops=1)
   ->  Sort  (cost=293.89..294.08 rows=73 width=14) (actual  
time=5.943..6.258 rows=250 loops=1)

 Sort Key: name
 Sort Method:  top-N heapsort  Memory: 38kB
 ->  Bitmap Heap Scan on user_profile  (cost=5.36..291.64  
rows=73 width=14) (actual time=0.394..2.481 rows=627 loops=1)

   Filter: (isactive AND ((name)::text ~~ 'foo%'::text))
   ->  Bitmap Index Scan on user_profile_name_idx   
(cost=0.00..5.34 rows=73 width=0) (actual time=0.307..0.307 rows=628  
loops=1)
 Index Cond: (((name)::text ~>=~ 'foo'::text) AND  
((name)::text ~<~ 'fop'::text))



There are two indexes on it

"user_profile_name_idx" UNIQUE, btree (name varchar_pattern_ops)
"user_profile_name_key" UNIQUE, btree (name)

one for equality, one for like

So how do we get the prepare to use the right index?
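
For what it's worth, one possible reading of the difference (my explanation,
not from the thread): the varchar_pattern_ops index can only be used when the
planner can extract a fixed prefix from the pattern at plan time; with a
prepared statement the pattern is a parameter, so no prefix is known and the
plan falls back to the plain name index plus a filter. A small illustration:

    -- Constant pattern: the planner rewrites LIKE 'foo%' into a range scan
    -- on the pattern_ops index (name ~>=~ 'foo' AND name ~<~ 'fop').
    EXPLAIN SELECT uid FROM user_profile
     WHERE name LIKE 'foo%' AND isactive = 't'
     ORDER BY name ASC LIMIT 250;

    -- Parameterized pattern: the prefix is unknown when the plan is built,
    -- so the rewrite cannot happen and the pattern index is not usable.
    PREPARE s_like (varchar, boolean) AS
        SELECT uid FROM user_profile
         WHERE name LIKE $1 AND isactive = $2
         ORDER BY name ASC LIMIT 250;
    EXPLAIN EXECUTE s_like ('foo%', 't');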

Dave




Re: [PERFORM] Benchmark: Dell/Perc 6, 8 disk RAID 10

2008-03-17 Thread Dave Cramer


On 17-Mar-08, at 2:50 PM, Justin wrote:





Just out of curiosity: Last time I did research, the word seemed to  
be that xfs was better than ext2 or ext3.  Is that not true?  Why  
use ext2/3 at all if xfs is faster for Postgres?


Craig


Ext2 vs XFS: on my setup there is a difference in performance between the
two file systems, but it's not an "OMG, let's switch" difference. XFS did
better than Ext2 only one time; otherwise Ext2 won out by a small margin,
at best 6%. The other tests ran 3 to 4% better than XFS.


XFS has journaling so it should be safer. I think I may stick with
XFS as it has journaling.


One thing I think is clear: don't use ext3, it just kills performance
by factors, not small percents.


Here is an article I found on XFS:
http://linux-xfs.sgi.com/projects/xfs/papers/xfs_white/xfs_white_paper.html

I hope this is helpful to people.  I know the process has taught me  
new things, and thanks to those that helped me out.


Before I throw this server into production, does anyone else want
performance numbers?


C:\Program Files\PostgreSQL\8.3\bin>pgbench -c 10 -t 4 -v -h  
192.168.1.9 -U

postgres play
Password:
starting vacuum...end.
starting vacuum accounts...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 10
number of transactions per client: 4
number of transactions actually processed: 40/40
tps = 2181.512770 (including connections establishing)
tps = 2187.107004 (excluding connections establishing)




2000 tps ??? do you have fsync turned off ?
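
A quick check from psql (generic commands, nothing assumed about the box
beyond the 8.3 install shown above):

    SHOW fsync;
    SHOW synchronous_commit;   -- present from 8.3 on; off also inflates tps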

Dave




Re: [PERFORM] Benchmark: Dell/Perc 6, 8 disk RAID 10

2008-03-16 Thread Dave Cramer


On 16-Mar-08, at 3:04 PM, Craig James wrote:


Dave Cramer wrote:

On 16-Mar-08, at 2:19 AM, Justin wrote:


I decided to reformat the raid 10 into ext2 to see if there was  
any real big difference in performance as some people have noted
here is the test results


please note the WAL files are still on the raid 0 set which is  
still in ext3 file system format.  these test where run with the  
fsync as before.   I made sure every thing was the same as with  
the first test.


This is opposite to the way I run things. I use ext2 on the WAL and  
ext3 on the data. I'd also suggest RAID 10 on the WAL it is mostly  
write.


Just out of curiosity: Last time I did research, the word seemed to  
be that xfs was better than ext2 or ext3.  Is that not true?  Why  
use ext2/3 at all if xfs is faster for Postgres?


I would like to see the evidence of this. I doubt that it would be  
faster than ext2. There is no journaling on ext2.


Dave



Re: [PERFORM] Benchmark: Dell/Perc 6, 8 disk RAID 10

2008-03-16 Thread Dave Cramer


On 16-Mar-08, at 2:19 AM, Justin wrote:



I decided to reformat the raid 10 into ext2 to see if there was any  
real big difference in performance as some people have noted   here  
is the test results


please note the WAL files are still on the raid 0 set which is still  
in ext3 file system format.  these test where run with the fsync as  
before.   I made sure every thing was the same as with the first test.


This is opposite to the way I run things. I use ext2 on the WAL and  
ext3 on the data. I'd also suggest RAID 10 on the WAL it is mostly  
write.


Dave




Re: [PERFORM] how many index can have????

2008-03-11 Thread Dave Cramer


On 11-Mar-08, at 8:28 AM, petchimuthu lingam wrote:

I have one table with 30 fields and more than 60 million records. If I use
a larger number of indexes, will it affect the insertion performance? And
select performance?

Yes, and yes, but without more information about what you are trying  
to do, those answers are useless.



Shall I have more than one partial index for the same field?

--
With Best Regards,
Petchimuthulingam S





Re: [PERFORM] Why the difference in plans ?

2008-03-07 Thread Dave Cramer


On 6-Mar-08, at 9:30 PM, Stephen Denne wrote:


The strange thing of course is that the data is exactly the same for
both runs, the tables have not been changed between runs, and I did
them right after another. Even more strange is that the seq scan is
faster than the index scan.


It is not strange at all, since both queries read ALL the rows in  
your table, checking each and every row to see whether it matched  
your predicates.


The sequential scan read them in the order they are on the disk,  
meaning your disk didn't have to seek as much (assuming low file  
fragmentation).


The index scan again reads all the rows in your table, but reads  
them in the order they were in the index, which is probably quite  
different from the order that they are on the disk, so the disk had  
to seek a lot. In addition, it had to read the index.



OK, that makes sense.

So given that the predicates are essentially the same why would the  
planner decide to use or not use the index ?










Re: [PERFORM] Why the difference in plans ?

2008-03-07 Thread Dave Cramer

Josh,

On 6-Mar-08, at 12:26 PM, Josh Berkus wrote:


Dave,


Below I have two almost identical queries.  Strangely enough the one
that uses the index is slower ???


My first guess would be that records are highly correlated by DOB  
and not at

all by name.  However, it would help if you supplied both the index
definitions and what changed between the two queries to cause the  
index to be

used.


Indexes:
"user_profile_pkey" PRIMARY KEY, btree (uid) CLUSTER
"user_profile_name_idx" UNIQUE, btree (name varchar_pattern_ops)
"user_profile_name_key" UNIQUE, btree (name)
"user_profile_uploadcode_key" UNIQUE, btree (uploadcode)
"user_profile_active_idx" btree (isactive)
"user_profile_areacode_index" btree (areacode)
"user_profile_gender_idx" btree (gender)

and nothing changed between runs.

Dave



--
Josh Berkus
PostgreSQL @ Sun
San Francisco






Re: [PERFORM] Why the difference in plans ?

2008-03-06 Thread Dave Cramer


On 6-Mar-08, at 5:10 PM, Stephen Denne wrote:


Dave Cramer wrote:

I have two almost identical queries.  Strangely enough the one
that uses the index is slower ???


The index scan is being used so that it can retrieve the rows in the  
name order.
It expects that if it was to retrieve every row via the index, it  
would get about 1010 rows that matched the filter, and it knows it  
can stop after 250, so assuming the matching rows are evenly  
distributed it thinks it can stop after having read only a quarter  
of the rows.


However only 129 rows matched. Consequently it had to read every row  
in the table anyway, seeking a fair bit as the read order was  
specified by the index rather than in sequential order, and it also  
had to read the index. These extra costs were much larger than  
reading the lot sequentially, and sorting 129 resulting rows.


The first query picked a sequential scan as it thought it was only  
going to get 11 results, so was expecting that the limit wasn't  
going to come into play, and that every row would have to be read  
anyway.


The strange thing of course is that the data is exactly the same for  
both runs, the tables have not been changed between runs, and I did  
them right after another. Even more strange is that the seq scan is  
faster than the index scan.


Dave

Regards,
Stephen Denne.








Re: [PERFORM] Why the difference in plans ?

2008-03-06 Thread Dave Cramer


On 6-Mar-08, at 12:26 PM, Josh Berkus wrote:


Dave,


Below I have two almost identical queries.  Strangely enough the one
that uses the index is slower ???


My first guess would be that records are highly correlated by DOB  
and not at

all by name.  However, it would help if you supplied both the index
definitions and what changed between the two queries to cause the  
index to be

used.


The two queries were run 2 seconds apart, there were no changes  
between. I'll get the index definitions.


Dave



--
Josh Berkus
PostgreSQL @ Sun
San Francisco






Re: [PERFORM] count * performance issue

2008-03-06 Thread Dave Cramer

Hi,

On 6-Mar-08, at 6:58 AM, sathiya psql wrote:


The only thing that is going to help you is really fast disks, and
more memory, and you should consider moving to 8.3 for all the other
performance benefits.
Is 8.3 a stable version, or what is the latest stable version of
postgres ??



Yes it is the latest stable version.

Moving my database from 7.4 to 8.3, will it do any harm ??


You will have to test this yourself. There may be issues

what are all the advantages of moving from 7.4 to 8.3

Every version of postgresql has improved performance, and robustness;  
so you will get better overall performance. However I want to caution  
you this is not a panacea. It will NOT solve your seq scan problem.




Dave





Re: [PERFORM] count * performance issue

2008-03-06 Thread Dave Cramer


On 6-Mar-08, at 1:43 AM, sathiya psql wrote:


is there any way to explicitly force the postgres to use index scan




If you want to count all the rows in the table there is only one way
to do it (without keeping track yourself with a trigger): a seq scan.


An index will not help you.

The only thing that is going to help you is really fast disks, and  
more memory, and you should consider moving to 8.3 for all the other  
performance benefits.
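
For completeness, a minimal sketch of the "keep track yourself with a
trigger" idea mentioned above; the table and trigger names here are
illustrative assumptions, not from the original poster's schema:

    CREATE TABLE t_rowcount (n bigint NOT NULL);
    INSERT INTO t_rowcount SELECT count(*) FROM t;

    CREATE OR REPLACE FUNCTION t_count_trig() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            UPDATE t_rowcount SET n = n + 1;
        ELSIF TG_OP = 'DELETE' THEN
            UPDATE t_rowcount SET n = n - 1;
        END IF;
        RETURN NULL;   -- AFTER trigger, return value is ignored
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER t_count AFTER INSERT OR DELETE ON t
        FOR EACH ROW EXECUTE PROCEDURE t_count_trig();

    SELECT n FROM t_rowcount;   -- instead of SELECT count(*) FROM t

Note that a single counter row serializes concurrent writers, which is a
trade-off of its own.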


Dave 




[PERFORM] Why the difference in plans ?

2008-03-05 Thread Dave Cramer
Below I have two almost identical queries.  Strangely enough the one  
that uses the index is slower ???


explain analyze select uid from user_profile where  
lower(firstname)='angie' and extract(year from age('2008-02-26  
02:50:31.382', dob)) >= 18 and extract(year from age('2008-02-26  
02:50:31.382', dob)) <= 68 and image1 is not null and profileprivacy=1  
and isactive='t' order by name asc limit 250;
QUERY 
 PLAN

---
 Limit  (cost=166423.90..166423.93 rows=11 width=17) (actual  
time=1033.634..1034.137 rows=129 loops=1)
   ->  Sort  (cost=166423.90..166423.93 rows=11 width=17) (actual  
time=1033.631..1033.811 rows=129 loops=1)

 Sort Key: name
 ->  Seq Scan on user_profile  (cost=0.00..166423.71 rows=11  
width=17) (actual time=46.730..1032.994 rows=129 loops=1)
   Filter: ((lower((firstname)::text) = 'angie'::text)  
AND (date_part('year'::text, age('2008-02-26 02:50:31.382'::timestamp  
without time zone, dob)) >= 18::double precision) AND  
(date_part('year'::text, age('2008-02-26 02:50:31.382'::timestamp  
without time zone, dob)) <= 68::double precision) AND (image1 IS NOT  
NULL) AND (profileprivacy = 1) AND isactive)

 Total runtime: 1034.334 ms
(6 rows)

jnj=# explain analyze select uid from user_profile where  
lower(firstname)='angie' and dob <= '1990-03-05 15:17:29.537' and dob  
>= '1940-03-05 15:17:29.537' and image1 is not null and  
profileprivacy=1 and isactive='t' order by name asc limit 250;
 QUERY 
 PLAN

-
 Limit  (cost=0.00..113963.92 rows=250 width=17) (actual  
time=230.326..4688.607 rows=129 loops=1)
   ->  Index Scan using user_profile_name_key on user_profile   
(cost=0.00..460414.23 rows=1010 width=17) (actual  
time=230.322..4688.174 rows=129 loops=1)
 Filter: ((lower((firstname)::text) = 'angie'::text) AND (dob  
<= '1990-03-05 15:17:29.537'::timestamp without time zone) AND (dob >=  
'1940-03-05 15:17:29.537'::timestamp without time zone) AND (image1 IS  
NOT NULL) AND (profileprivacy = 1) AND isactive)

 Total runtime: 4688.906 ms
(4 rows)



Re: [PERFORM] PostgreSQL performance on a virtual host

2008-03-05 Thread Dave Cramer

Hi,

I've run it on Xen; it works OK. Of course this is all predicated upon your
expectations. If you expect it to be as fast as a dedicated machine,
you will be disappointed.


Dave
On 5-Mar-08, at 3:54 AM, Moritz Onken wrote:


We have had very good experiences with OpenVZ as a virtualizer.
Since it's not a paravirtualization like Xen it's very fast, almost
as fast as the host.


www.openvz.org

Am 04.03.2008 um 16:43 schrieb Theo Kramer:


Hi

We are thinking of running a PostgreSQL instance on a virtual host  
under

Xen.

Any thoughts for/against running PostgreSQL on a virtual host would  
be

much appreciated.

--
Regards
Theo










Re: [PERFORM] 4s query want to run faster

2008-02-21 Thread Dave Cramer


On 21-Feb-08, at 6:16 PM, Scott Marlowe wrote:

On Thu, Feb 21, 2008 at 4:59 PM, Mark Kirkwood  
<[EMAIL PROTECTED]> wrote:
The other parameter you might want to look at is  
effective_cache_size -

increasing it will encourage index use. On a machine with 16GB the
default is probably too small (there are various recommendations  
about
how to set this ISTR either Scott M or Greg Smith had a page  
somewhere

that covered this quite well - guys?).


The default is always too small in my experience.

What are the rest of the configuration values ?
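
As a rough illustration (the value is my assumption for the 16GB machine
mentioned above, not a recommendation from the thread), effective_cache_size
only affects planning, so it can even be tried per session before touching
postgresql.conf:

    SHOW effective_cache_size;
    SET effective_cache_size = '12GB';   -- roughly RAM minus shared_buffers and OS overhead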

Dave




Re: [PERFORM] Question about shared_buffers and cpu usage

2008-02-21 Thread Dave Cramer


On 21-Feb-08, at 12:13 AM, bh yuan wrote:


Hi

I am using Postgres 8.3 on an 8G memory, Xeon X5355 Quad Core x 2
processor RH5 machine with 10G of data (with some tables which have
about 2,000,000~5,000,000 rows).

I have two questions.
1. How to set shared_buffers and the other postgresql.conf parameters
for best performance?
I only run Postgres 8.3 on the machine so I set shared_buffers
= 7168MB (7G),
but somebody said it is too big, so I am confused.


Yes, it is too big! Make it 2G to start.







Re: [PERFORM] 7 hrs for a pg_restore?

2008-02-19 Thread Dave Cramer


On 19-Feb-08, at 2:35 PM, Douglas J Hunley wrote:


On Tuesday 19 February 2008 14:28:54 Dave Cramer wrote:

shared buffers is *way* too small as is effective cache
set them to 2G/6G respectively.

Dave


pardon my ignorance, but is this in the context of a restore only?  
or 'in

general'?


This is the "generally accepted" starting point for a pg db for  
production.





--
Douglas J Hunley (doug at hunley.homeip.net) - Linux User #174778
http://doug.hunley.homeip.net

Don't let Kirk show you what he affectionately calls the "Captain's  
Log"







Re: [PERFORM] 7 hrs for a pg_restore?

2008-02-19 Thread Dave Cramer

shared buffers is *way* too small as is effective cache
set them to 2G/6G respectively.

Dave




Re: [PERFORM] 7 hrs for a pg_restore?

2008-02-19 Thread Dave Cramer


On 19-Feb-08, at 1:12 PM, Joshua D. Drake wrote:



On Tue, 19 Feb 2008 13:03:58 -0500
Douglas J Hunley <[EMAIL PROTECTED]> wrote:


I spent a whopping seven hours restoring a database late Fri nite for
a client. We stopped the application, ran pg_dump -v -Ft -b -o $db >
~/pre_8.3.tar on the 8.2.x db, and then upgrading the software to
8.3. I then did a pg_restore -v -d $db ./pre_8.3.tar and watched it
positively crawl. I'll grant you that it's a 5.1G tar file, but 7
hours seems excessive.

Is that kind of timeframe 'abnormal' or am I just impatient? :) If
the former, I can provide whatever you need, just ask for it.
Thanks!


7 hours for 5.1 G is excessive. It took me 11 hours to do 220G :). It
would be helpful if we knew what the machine was doing. Was it IO
bound? How much ram does it have? Is it just a single HD drive? What
are your settings for postgresql?


Yeah, I did a 9G in about 20min. Did you optimize the new one ?

Joshua D. Drake


- --
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL SPI Liaison | SPI Director |  PostgreSQL political pundit







Re: [PERFORM] Example web access to Postgres DB

2008-02-17 Thread Dave Cramer


On 17-Feb-08, at 10:18 AM, Bill Moran wrote:


"Mohamed Ali JBELI" <[EMAIL PROTECTED]> wrote:


Hi members

I am looking for an example of a web application (WAR) which executes
Postgres actions. This aims to test the performance of Postgres in
Web mode.


I shall be grateful if someone gives me a link where I can find a  
WAR file.


I think you're going to have to be more specific.  I don't know what
technology uses WAR files, and based on the tepid response, it doesn't
seem like anyone else on the list does either.


Java uses WAR files, it's a specific JAR file layout.

To answer your question.

http://blogs.sun.com/tomdaly/entry/sun_pushing_price_performance_curve

was done using a java web application.

Dave



Re: [PERFORM] Dell Perc/6

2008-02-13 Thread Dave Cramer


On 13-Feb-08, at 5:02 AM, Tore Halset wrote:


Hello.

I think I started that discussion. We ended up buying a Dell 2900  
with PERC 6/i and 10 * 145GB SAS 3,5" 15KRpm discs. 6 of the SAS  
discs are in a raid 10 for the database, 2 in a mirror for the wal  
and the last 2 in a mirror for the OS. We get 350MB/s writing and  
380MB/s reading to/from the raid 10 area using dd. The OS is Ubuntu  
and the filesystem for the raid 10 is ext3.


Wow that's fantastic. Just to be sure, did you make sure that you read  
and wrote 2x memory to take the cache out of the measurement ?


Dave



Re: [PERFORM] difference between a unique constraint and a unique index ???

2007-11-12 Thread Dave Cramer


On 12-Nov-07, at 11:37 AM, Tom Lane wrote:


Alvaro Herrera <[EMAIL PROTECTED]> writes:
Well, AFAIK the index with varchar_pattern_ops is used for LIKE  
queries,
whereas the other one is going to be used for = queries.  So you  
need to

keep both indexes.


Given the current definition of text equality, it'd be possible to  
drop

~=~ and have the standard = operator holding the place of equality in
both the regular and pattern_ops opclasses.  Then it'd be possible to
support regular equality queries, as well as LIKE, with only the
pattern_ops index.

That would be ideal. Having two indexes on the same column isn't  
optimal.


Dave



Re: [PERFORM] difference between a unique constraint and a unique index ???

2007-11-12 Thread Dave Cramer


On 12-Nov-07, at 9:56 AM, Alvaro Herrera wrote:


Dave Cramer wrote:
In order to get like queries to use an index with database  
initialized with

a UTF-8 character set I added a unique index to a table with a
varchar_pattern_ops

This table already had a unique constraint on the column so I  
dropped the

unique constraint.

I can't give exact measurements however this caused my application  
to slow

down considerably.

The only thing I can figure is that the varchar_pattern_ops  
operator is

significantly slower ???

Is there some other piece of the puzzle to fill in ?


Well, AFAIK the index with varchar_pattern_ops is used for LIKE  
queries,
whereas the other one is going to be used for = queries.  So you  
need to

keep both indexes.


You would be correct, thanks for the quick answer.

Dave



[PERFORM] difference between a unique constraint and a unique index ???

2007-11-12 Thread Dave Cramer
In order to get LIKE queries to use an index in a database initialized
with a UTF-8 character set, I added a unique index to a table with
varchar_pattern_ops.


This table already had a unique constraint on the column so I dropped  
the unique constraint.


I can't give exact measurements however this caused my application to  
slow down considerably.


The only thing I can figure is that the varchar_pattern_ops operator  
is significantly slower ???


Is there some other piece of the puzzle to fill in ?

Dave





Re: [PERFORM] Attempting to disable count triggers on cleanup

2007-09-25 Thread Dave Cramer

hubert depesz lubaczewski wrote:

On Tue, Sep 25, 2007 at 07:08:42AM -0400, Dave Cramer wrote:
  

ERROR:  deadlock detected
DETAIL:  Process 23063 waits for ExclusiveLock on tuple (20502,48) of 
relation 48999028 of database 14510214; blocked by process 23110.
Process 23110 waits for ShareLock on transaction 1427023217; blocked by 
process 23098.

...
CONTEXT:  SQL statement "update user_profile_count set 
buddycount=buddycount-1 where user_profile_count.uid= $1 "

PL/pgSQL function "user_buddy_count" line 11 at SQL statement
SQL statement "DELETE FROM ONLY "public"."user_buddies" WHERE 
"buddyuserid" = $1"



take a look at:
http://www.depesz.com/index.php/2007/09/12/objects-in-categories-counters-with-triggers/

and if you want to temporarily disable trigger, simply do appropriate
"alter table disable trigger".

  
Well, that doesn't work inside a transaction; I've tried it. This has 
been fixed in 8.3.
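
For reference, depesz's suggestion amounts to something like this
(the trigger name below is made up):

   alter table user_buddies disable trigger user_buddy_count_trg;
   -- ... run the bulk delete ...
   alter table user_buddies enable trigger user_buddy_count_trg;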


Dave

depesz

  





[PERFORM] Attempting to disable count triggers on cleanup

2007-09-25 Thread Dave Cramer

I have a database that we want to keep track of counts of rows.

We have triggers on the rows which increment and decrement a count 
table. In order to speed up deleting many rows we have added the following


  if user != 'mocospace_cleanup' then
      update user_profile_count set buddycount = buddycount - 1
          where user_profile_count.uid = OLD.userid;
  end if;



However, in the logs we can see the following. I have checked that the 
user really is the mocospace_cleanup user, and verified manually (by 
logging in as that user) that the code above does what it purports to.


ERROR:  deadlock detected
DETAIL:  Process 23063 waits for ExclusiveLock on tuple (20502,48) of 
relation 48999028 of database 14510214; blocked by process 23110.
Process 23110 waits for ShareLock on transaction 1427023217; blocked by 
process 23098.

...
CONTEXT:  SQL statement "update user_profile_count set 
buddycount=buddycount-1 where user_profile_count.uid= $1 "

PL/pgSQL function "user_buddy_count" line 11 at SQL statement
SQL statement "DELETE FROM ONLY "public"."user_buddies" WHERE 
"buddyuserid" = $1"


Dave



Re: [PERFORM] Dell Hardware Recommendations

2007-08-13 Thread Dave Cramer


On 13-Aug-07, at 9:50 AM, Vivek Khera wrote:



On Aug 10, 2007, at 4:36 PM, Merlin Moncure wrote:


I'm not so sure I agree.  They are using LSI firmware now (and so is
everyone else).  The servers are well built (highly subjective, I
admit) and configurable.  I have had some bad experiences with IBM
gear (adaptec controller though), and white box parts 3ware, etc.  I
can tell you that dell got us the storage and the server in record
time

do agree on adaptec however


Ok, perhaps you got luckier... I have two PowerVault 220 rack  
mounts with U320 SCSI drives in them. With an LSI 320-2X  
controller, it *refuses* to recognize some of the drives (channel 1  
on either array).  Dell blames LSI, LSI blames dell's backplane.   
This is consistent across multiple controllers we tried, and two  
different Dell disk arrays.  Dropping the SCSI speed to 160 is the  
only way to make them work.  I tend to believe LSI here.


This is the crux of the argument here. Perc/5 is a dell trademark.  
They can ship any hardware they want and call it a Perc/5.


Dave
The Adaptec 2230SLP controller recognizes the arrays fine, but  
tends to "drop" devices at inopportune moments.  Re-seating dropped  
devices starts a rebuild, but the speed is recognized as "1" and  
the rebuild takes two lifetimes to complete unless you insert a  
reboot of the system in there.  Totally unacceptable.  Again,  
dropping the scsi rate to 160 seems to make it more stable.







Re: [PERFORM] best use of an EMC SAN

2007-07-11 Thread Dave Cramer


On 11-Jul-07, at 2:35 PM, Greg Smith wrote:


On Wed, 11 Jul 2007, Jim Nasby wrote:

I suppose an entirely in-memory database might be able to swamp a  
2 drive WAL as well.


You can really generate a whole lot of WAL volume on an EMC SAN if  
you're doing UPDATEs fast enough on data that is mostly in-memory.   
Takes a fairly specific type of application to do that though, and  
whether you'll ever find it outside of a benchmark is hard to say.


Well, this is such an application. The db fits entirely in memory,  
and the site is doing over 12M page views a day (I'm not exactly sure  
what this translates to in transactions).
The main thing I would add as a consideration here is that you can  
configure PostgreSQL to write WAL data using the O_DIRECT path,  
bypassing the OS buffer cache, and greatly improve performance into  
SAN-grade hardware like this.  That can be a big win if you're  
doing writes that dirty lots of WAL, and the benefit is  
straightforward to measure if the WAL is a dedicated section of  
disk (just change the wal_sync_method and do benchmarks with each  
setting).  If the WAL is just another section on an array, how well  
those synchronous writes will mesh with the rest of the activity on  
the system is not as straightforward to predict.  Having the WAL  
split out provides a logical separation that makes figuring all  
this out easier.
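
For concreteness, the setting in question is wal_sync_method. Which
value wins is very platform- and hardware-dependent, so treat these
purely as candidates to benchmark, not a recommendation:

   wal_sync_method = open_sync    # also try open_datasync, fdatasync, fsync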


Just to throw out a slightly different spin on the suggestions  
going by here:  consider keeping the WAL separate, starting as a  
RAID-1 volume, but keep 2 disks in reserve so that you could easily  
upgrade to a 0+1 set if you end up discovering you need to double  
the write bandwidth.  Since there's never much actual data on the  
WAL disks that would a fairly short downtime operation.  If you  
don't reach a wall, the extra drives might serve as spares to help  
mitigate concerns about the WAL drives burning out faster than  
average because of their high write volume.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com  
Baltimore, MD




Re: [PERFORM] best use of an EMC SAN

2007-07-11 Thread Dave Cramer


On 11-Jul-07, at 10:05 AM, Gregory Stark wrote:


"Dave Cramer" <[EMAIL PROTECTED]> writes:

Assuming we have 24 73G drives, is it better to make one big metalun
and carve it up and let the SAN manage where everything is, or is it
better to specify which spindles are where.


This is quite a controversial question with proponents of both  
strategies.


I would suggest having one RAID-1 array for the WAL and throw the
rest of the drives at a single big array for the data files. That
wastes space since the WAL isn't big but the benefit is big.

This is quite unexpected. Since the WAL is primarily all writes,
isn't a RAID 1 the slowest of all for writing ?

If you have a battery backed cache you might not need even that.
Just throwing them all into a big raid might work just as well.

Any ideas on how to test this before we install the database ?


--
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com






[PERFORM] best use of an EMC SAN

2007-07-11 Thread Dave Cramer
Assuming we have 24 73G drives, is it better to make one big metalun
and carve it up and let the SAN manage where everything is, or is it
better to specify which spindles are where?


Currently we would require 3 separate disk arrays.

one for the main database, second one for WAL logs, third one we use  
for the most active table.


Problem with dedicating the spindles to each array is that we end up  
wasting space. Are the SAN's smart enough to do a better job if I  
create one large metalun and cut it up ?


Dave



Re: [PERFORM] How much ram is too much

2007-06-14 Thread Dave Cramer

Hi Andrew
On 11-Jun-07, at 11:34 AM, Andrew Sullivan wrote:


On Mon, Jun 11, 2007 at 11:09:42AM -0400, Dave Cramer wrote:

and set them to anything remotely close to 128GB.


Well, we'd give 25% of it to postgres, and the rest to the OS.


Are you quite sure that PostgreSQL's management of the buffers is
efficient with such a large one?


No, I'm not sure of this.

In the past, that wasn't the case
for relatively small buffers; with the replacement of single-pass
LRU, that has certainly changed, but I'd be surprised if anyone
tested a buffer as large as 32G.


So does anyone have experience above 32G ?

Dave


A

--  
Andrew Sullivan  | [EMAIL PROTECTED]

The whole tendency of modern prose is away from concreteness.
--George Orwell



Re: [PERFORM] How much ram is too much

2007-06-11 Thread Dave Cramer

Hi Andrew
On 11-Jun-07, at 11:34 AM, Andrew Sullivan wrote:


On Mon, Jun 11, 2007 at 11:09:42AM -0400, Dave Cramer wrote:

and set them to anything remotely close to 128GB.


Well, we'd give 25% of it to postgres, and the rest to the OS.


Are you quite sure that PostgreSQL's management of the buffers is
efficient with such a large one?


No, I'm not sure of this.

In the past, that wasn't the case
for relatively small buffers; with the replacement of single-pass
LRU, that has certainly changed, but I'd be surprised if anyone
tested a buffer as large as 32G.


So does anyone have experience above 32G ?

Dave


A

-- Andrew Sullivan  | [EMAIL PROTECTED]
The whole tendency of modern prose is away from concreteness.
--George Orwell



Re: [PERFORM] How much ram is too much

2007-06-11 Thread Dave Cramer


On 10-Jun-07, at 11:11 PM, Jim Nasby wrote:


On Jun 8, 2007, at 11:31 AM, Dave Cramer wrote:
Is it possible that providing 128G of ram is too much ? Will other  
systems in the server bottleneck ?


Providing to what? PostgreSQL? The OS? My bet is that you'll run  
into issues with how shared_buffers are managed if you actually try  
and set them to anything remotely close to 128GB.


Well, we'd give 25% of it to postgres, and the rest to the OS.

What is it specifically you are referring to ?

Dave

--
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)





Re: [PERFORM] How much ram is too much

2007-06-11 Thread Dave Cramer

Actually this one is an opteron, so it looks like it's all good.

Dave
On 8-Jun-07, at 3:41 PM, Guy Rouillier wrote:


Dave Cramer wrote:

It's an IBM x3850 using linux redhat 4.0


I had to look that up, web site says it is a 4-processor, dual-core  
(so 8 cores) Intel Xeon system.  It also says "Up to 64GB DDR II  
ECC memory", so are you sure you can even get 128 GB RAM?


If you could, I'd expect diminishing returns from the Xeon  
northbridge memory access.  If you are willing to spend that kind  
of money on memory, you'd be better off with Opteron or Sparc.


--
Guy Rouillier



Re: [OT] Re: [PERFORM] How much ram is too much

2007-06-08 Thread Dave Cramer


On 8-Jun-07, at 2:10 PM, Joshua D. Drake wrote:


Zoltan Boszormenyi wrote:

Dave Cramer wrote:

It's an IBM x3850 using linux redhat 4.0

Isn't that a bit old? I have a RedHat 4.2 somewhere
that was bundled with Applixware 3. :-)


He means redhat ES/AS 4 I assume.


Yes AS4

J




--  


  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/ 
donate

PostgreSQL Replication: http://www.commandprompt.com/products/







Re: [PERFORM] How much ram is too much

2007-06-08 Thread Dave Cramer

It's an IBM x3850 using linux redhat 4.0


On 8-Jun-07, at 12:46 PM, Guy Rouillier wrote:


Dave Cramer wrote:
Is it possible that providing 128G of ram is too much ? Will other  
systems in the server bottleneck ?


What CPU and OS are you considering?

--
Guy Rouillier



[PERFORM] How much ram is too much

2007-06-08 Thread Dave Cramer
Is it possible that providing 128G of ram is too much ? Will other  
systems in the server bottleneck ?


Dave



[PERFORM] postgresql running on a virtual cluster

2007-06-04 Thread Dave Cramer
Does anyone have any experience running pg on multiple IBM 3950's set  
up as a single machine ?


Dave



Re: [PERFORM] PITR performance costs

2007-05-28 Thread Dave Cramer

Heikki,

Don't the archived logs have to be copied as well, as the regular WAL
logs get recycled ?
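
By archiving I mean the usual setup where every completed WAL segment
is copied away by archive_command before it can be recycled, e.g.
something along these lines (the destination path is made up):

   archive_command = 'cp %p /mnt/archive/wal/%f'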


Dave
On 28-May-07, at 12:31 PM, Heikki Linnakangas wrote:


Dave Cramer wrote:
Since PITR has to enable archiving does this not increase the  
amount of disk I/O required ?


There's no difference in normal DML operations, but some bulk  
operations like CREATE INDEX that don't otherwise generate WAL,  
need to be WAL logged when archiving is enabled.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com





[PERFORM] PITR performance costs

2007-05-28 Thread Dave Cramer
Since PITR has to enable archiving does this not increase the amount  
of disk I/O required ?


Dave



Re: [PERFORM] 500 requests per second

2007-05-21 Thread Dave Cramer

* set reasonable statement timeout
* backup with pitr.  pg_dump is a headache on extremely busy servers.


Where do you put your pitr wal logs so that they don't take up extra  
I/O ?

* get good i/o system for your box.  start with 6 disk raid 10 and go
from there.
* spend some time reading about bgwriter settings, commit_delay, etc.
* keep an eye out for postgresql hot (hopefully coming with 8.3) and
make allowances for it in your design if possible.
* normalize your database and think of vacuum as dangerous enemy.

good luck! :-)

merlin



Re: [PERFORM] Intermitent slow queries

2007-05-02 Thread Dave Cramer


On 2-May-07, at 11:24 AM, Parks, Aaron B. wrote:

My pg 8.1 install on an AMD-64 box (4 processors) with 9 gigs of  
ram running RHEL4 is acting kind of odd and I thought I would see  
if anybody has any hints.




I have Java program using postgresql-8.1-409.jdbc3.jar to connect  
over the network.  In general it works very well.  I have run batch  
updates with several thousand records repeatedly that has worked fine.




The Program pulls a summation of the DB and does some processing  
with it.  It starts off wonderfully running a query every .5  
seconds.  Unfortunately, after a while it will start running  
queries that take 20 to 30 seconds.




Looking at the EXPLAIN for the query no sequential scans are going  
on and everything has an index that points directly at its search  
criteria.




Example:



Select sum(whatever) from a inner join b on a.something=b.something  
WHERE b.day=1 and b.hour=1


Select sum(whatever) from a inner join b on a.something=b.something  
WHERE b.day=1 and b.hour=2


Select sum(whatever) from a inner join b on a.something=b.something  
WHERE b.day=1 and b.hour=3


.

.

Select sum(whatever) from a inner join b on a.something=b.something  
WHERE b.day=1 and b.hour=23


Select sum(whatever) from a inner join b on a.something=b.something  
WHERE b.day=1 and b.hour=24


Select sum(whatever) from a inner join b on a.something=b.something  
WHERE b.day=2 and b.hour=1


Select sum(whatever) from a inner join b on a.something=b.something  
WHERE b.day=2 and b.hour=2


.

.

.





This query runs fine for a while (up to thousands of times). But  
what happens is that it starts to have really nasty pauses when you  
switch the day condition.  After the first query with the day it  
runs like a charm for 24 iterations, then slows back down again




My best guess was that an index never finished running, but REINDEX  
on the table (b in this case) didn’t seem to help.
I'd think it has more to do with caching data. The first query caches  
the day's data, then the next day's data has to be read from disk.



Ideas?



AP






Re: [PERFORM] What`s wrong with JFS configuration?

2007-04-25 Thread Dave Cramer


On 25-Apr-07, at 4:54 AM, Heikki Linnakangas wrote:


Paweł Gruszczyński wrote:
To test I use pgBench with the default database schema, run for 25,
50, 75 users at one time. Every test I run 5 times to take an average.
Unfortunately my results show that ext2 is fastest; ext3 and jfs are
very similar. I can understand that ext2 without journaling is
faster than ext3, but it is said that jfs is 40-60% faster and I can't
see the difference. Part of my results: (transaction type |
scaling factor | num of clients | tpl | num of transactions | tps
including connection time | tps excluding connection time)

EXT2:
TPC-B (sort of),50,75,13,975|975,338.286682,358.855582
...
Can anyone tell me what`s wrong with my test? Or maybe it is normal?


With a scaling factor of 50, your database size is ~ 1 GB, which  
fits comfortably in your RAM. You're not exercising your drives or  
filesystem much. Assuming you haven't disabled fsync, the  
performance of that test is bound by the speed your drives can  
flush WAL commit records to disk.


I wouldn't expect the filesystem to make a big difference anyway,  
but you'll see..


If you really believe that jfs is 40-60% faster (which I highly  
doubt) you should see this by simply reading/writing a very large  
file (2x your memory size) with dd.


Just curious but what data do you have that suggests this 40-60%  
number ?


Dave


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PERFORM] Foreign Key Deadlocking

2007-04-19 Thread Dave Cramer

Hi Csaba,

I have a similar problem.

In an attempt to avoid the overhead of select count(*) from mailbox  
where uid = somuid I've implemented triggers on insert and delete.


So there is a

user table which refers to to an inbox table,

so when people insert into the inbox there is an RI trigger grabbing  
the shared lock, then the count triggers try to grab an exclusive  
lock resulting in a deadlock.


Can we safely remove the shared locks ?

Is there a right way to implement the count triggers. I've tried  
before triggers, and after triggers, both result in different kinds  
of deadlocks.
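
For reference, the count triggers are essentially of this shape (table
and column names here are simplified stand-ins for the real ones):

   create or replace function inbox_count_trg() returns trigger as $$
   begin
     if TG_OP = 'INSERT' then
       update user_count set inbox_count = inbox_count + 1 where uid = NEW.uid;
     else  -- DELETE
       update user_count set inbox_count = inbox_count - 1 where uid = OLD.uid;
     end if;
     return null;
   end;
   $$ language plpgsql;

   create trigger inbox_count_trigger
     after insert or delete on inbox
     for each row execute procedure inbox_count_trg();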


Dave
On 18-Apr-07, at 11:36 AM, Csaba Nagy wrote:


Can someone confirm that I've identified the right fix?


I'm pretty sure that won't help you... see:

http://archives.postgresql.org/pgsql-general/2006-12/msg00029.php

The deadlock will be there if you update/insert the child table and
update/insert the parent table in the same transaction (even if you
update some other field on the parent table than the key referenced by
the child table). If your transactions always update/insert only  
one of
those tables, it won't deadlock (assuming you order the inserts/ 
updates

properly per PK).

Cheers,
Csaba.





Re: [PERFORM] High Load on Postgres 7.4.16 Server

2007-04-05 Thread Dave Cramer
The problem with this is that it doesn't leverage shared buffers and  
kernel buffers well.


Anyways, my bet is that your SAN isn't performing as you expect on  
the new hardware.


Dave
On 5-Apr-07, at 4:13 PM, John Allgood wrote:

We run multiple postmasters because we can shutdown one postmaster/ 
database

without affecting the other postmasters/databases. Each database is a
division in our company. If we had everything under one postmaster if
something happened to the one the whole company would be down.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Dave  
Cramer

Sent: Thursday, April 05, 2007 4:01 PM
To: John Allgood
Cc: 'Jeff Frost'; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] High Load on Postgres 7.4.16 Server


On 5-Apr-07, at 3:33 PM, John Allgood wrote:


The hard thing about running multiple postmasters is that you have
to tune
each one separate. Most of the databases I have limited the max-
connections
to 30-50 depending on the database. What would reasonable values for
effective_cache_size and random_page_cost. I think I have these
default.
Also what about kernel buffers on RHEL4.


random_page_cost should be left alone

Why do you run multiple postmasters ? I don't think this is the
most efficient way to utilize your hardware.

Dave


Thanks

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jeff
Frost
Sent: Thursday, April 05, 2007 3:24 PM
To: John Allgood
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] High Load on Postgres 7.4.16 Server

On Thu, 5 Apr 2007, John Allgood wrote:


Hello All

I sent this message to the admin list and it never got through so I
am trying the performance list.
We moved our application to a new machine last night. It is a Dell
PowerEdge 6950 2X Dual Core. AMD Opteron 8214 2.2Ghz. 8GB Memory.  
The

machine is running Redhat AS 4 Upd 4 and Redhat Cluster Suite. The
SAN is

an

EMC SAS connected via fibre. We are using Postgres 7.4.16. We have

recently

had some major hardware issues and replaced the hardware with
brand new

Dell
equipment. We expected a major performance increase over the  
previous

being

the old equipment was nearly three years old
I will try and explain how things are configured. We have 10
separate postmasters running 5 on each node. Each of the
postmasters is a
single instance of each database. Each database is separated by
division

and

also we have them separate so we can restart an postmaster with
needing to
restart all databases My largest database is about 7 GB. And the
others

run

anywhere from 100MB - 1.8GB.
The other configuration was RHEL3 and Postgres 7.4.13 and Redhat
Cluster Suite. The application seemed to run much faster on the  
older

equipment.
My thoughts on the issues are that it could be something with the OS
tuning. Here is what my kernel.shmmax, kernel.shmall =  
1073741824. Is

there
something else that I could tune in the OS. My max_connections=35  
and

shared

buffers=8192 for my largest database.


John,

Was the SAN connected to the previous machine or is it also a new
addition
with the Dell hardware?  We had a fairly recent post regarding a
similar
upgrade in which the SAN ended up being the problem, so the first
thing I
would do is test the SAN with bonnie-++ and/or move your
application to use
a
local disk and test again.  With 8GB of RAM, I'd probably set the
shared_buffers to at least 5...If I remember correctly, this
was the
most
you could set it to on 7.4.x and continue benefitting from it.  I'd
strongly

encourage you to upgrade to at least 8.1.8 (and possibly 8.2.3) if
you can,
as
it has much better shared memory management.  You might also want
to double
check your effective_cache_size and random_page_cost to see if they
are set
to
reasonable values.  Did you just copy the old postgresql.conf over?

This is the beginning of the thread I mentioned above:

http://archives.postgresql.org/pgsql-performance/2007-03/msg00104.php

--
Jeff Frost, Owner   <[EMAIL PROTECTED]>
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954


Re: [PERFORM] High Load on Postgres 7.4.16 Server

2007-04-05 Thread Dave Cramer


On 5-Apr-07, at 3:33 PM, John Allgood wrote:

The hard thing about running multiple postmasters is that you have  
to tune
each one separate. Most of the databases I have limited the max- 
connections

to 30-50 depending on the database. What would reasonable values for
effective_cache_size and random_page_cost. I think I have these  
default.

Also what about kernel buffers on RHEL4.


random_page_cost should be left alone

Why do you run multiple postmasters ? I don't think this is the  
most efficient way to utilize your hardware.


Dave


Thanks

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jeff  
Frost

Sent: Thursday, April 05, 2007 3:24 PM
To: John Allgood
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] High Load on Postgres 7.4.16 Server

On Thu, 5 Apr 2007, John Allgood wrote:


Hello All

I sent this message to the admin list and it never got through so I
am trying the performance list.
We moved our application to a new machine last night. It is a Dell
PowerEdge 6950 2X Dual Core. AMD Opteron 8214 2.2Ghz. 8GB Memory. The
machine is running Redhat AS 4 Upd 4 and Redhat Cluster Suite. The  
SAN is

an

EMC SAS connected via fibre. We are using Postgres 7.4.16. We have

recently
had some major hardware issues and replaced the hardware with  
brand new

Dell

equipment. We expected a major performance increase over the previous

being

the old equipment was nearly three years old
I will try and explain how things are configured. We have 10
separate postmasters running 5 on each node. Each of the  
postmasters is a
single instance of each database. Each database is separated by  
division

and
also we have them separate so we can restart an postmaster with  
needing to
restart all databases My largest database is about 7 GB. And the  
others

run

anywhere from 100MB - 1.8GB.
The other configuration was RHEL3 and Postgres 7.4.13 and Redhat
Cluster Suite. The application seemed to run much faster on the older
equipment.
My thoughts on the issues are that it could be something with the OS
tuning. Here is what my kernel.shmmax, kernel.shmall = 1073741824. Is

there

something else that I could tune in the OS. My max_connections=35 and

shared

buffers=8192 for my largest database.


John,

Was the SAN connected to the previous machine or is it also a new  
addition
with the Dell hardware?  We had a fairly recent post regarding a  
similar
upgrade in which the SAN ended up being the problem, so the first  
thing I
would do is test the SAN with bonnie-++ and/or move your  
application to use

a
local disk and test again.  With 8GB of RAM, I'd probably set the
shared_buffers to at least 5...If I remember correctly, this  
was the

most
you could set it to on 7.4.x and continue benefitting from it.  I'd  
strongly


encourage you to upgrade to at least 8.1.8 (and possibly 8.2.3) if  
you can,

as
it has much better shared memory management.  You might also want  
to double
check your effective_cache_size and random_page_cost to see if they  
are set

to
reasonable values.  Did you just copy the old postgresql.conf over?

This is the beginning of the thread I mentioned above:

http://archives.postgresql.org/pgsql-performance/2007-03/msg00104.php

--
Jeff Frost, Owner   <[EMAIL PROTECTED]>
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954



Re: [PERFORM] Scaling SELECT:s with the number of disks on a stripe

2007-04-04 Thread Dave Cramer


On 4-Apr-07, at 2:01 AM, Peter Schuller wrote:


Hello,

The next question then is whether anything in your postgres  
configuration
is preventing it getting useful performance from the OS. What  
settings

have you changed in postgresql.conf?


The only options not commented out are the following (it's not even
tweaked for buffer sizes and such, since in this case I am not
interested in things like sort performance and cache locality other
than as an afterthought):

hba_file = '/etc/postgresql/8.1/main/pg_hba.conf'
ident_file = '/etc/postgresql/8.1/main/pg_ident.conf'
external_pid_file = '/var/run/postgresql/8.1-main.pid'
listen_addresses = '*'
port = 5432
max_connections = 100
unix_socket_directory = '/var/run/postgresql'
ssl = true
shared_buffers = 1000
This is way too low. If this is 8.x, set it to 25% of available  
memory, and effective_cache_size should be about 3x that.

log_line_prefix = '%t '
stats_command_string = on
stats_row_level = on
autovacuum = on
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'


Are you using any unusual settings within the OS itself?


No. It's a pretty standard kernel. The only local tweaking done is
enabling/disabling various things; there are no special patches used
or attempts to create a minimalistic kernel or anything like that.


You're forgetting the LIMIT clause. For the straight index scan, the
query aborts when the LIMIT is reached having scanned only the  
specified
number of index rows (plus any index entries that turned out to be  
dead
in the heap). For the bitmap scan case, the limit can be applied  
only after
the heap scan is under way, therefore the index scan to build the  
bitmap
will need to scan ~50k rows, not the 10k specified in the limit,  
so the
amount of time spent scanning the index is 50 times larger than in  
the

straight index scan case.


Ok - makes sense that it has to scan the entire subset of the index
for the value in question. I will have to tweak the CPU/disk costs
settings (which I have, on purpose, not yet done).

However, I do suspect you have a problem here somewhere, because  
in my
tests the time taken to do the bitmap index scan on 50k rows, with  
the

index in cache, is on the order of 30ms (where the data is cached in
shared_buffers) to 60ms (where the data is cached by the OS).  
That's on

a 2.8GHz xeon.


This is on a machine with 2.33GHz xeons and I wasn't trying to
exaggerate. I timed it and it is CPU bound (in userspace; next to no
system CPU usage at all) for about 15 seconds for the case of
selecting with a limit of 1.

Given that there is no disk activity I can't imagine any buffer sizes
or such affecting this other than userspace vs. kernelspace CPU
concerns (since obviously the data being worked on is in RAM). Or am I
missing something?

It is worth noting that the SELECT of fewer entries is entirely disk
bound; there is almost no CPU usage whatsoever. Even taking the
cumulative CPU usage into account (gut feeling calculation, nothing
scientific) and multiplying by 50 you are nowhere near 15 seconds of
CPU boundness. So it is indeed strange.

--
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller  
<[EMAIL PROTECTED]>'

Key retrieval: Send an E-Mail to [EMAIL PROTECTED]
E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org






Re: [PERFORM] scalablility problem

2007-04-01 Thread Dave Cramer
I may have missed this but have  you tuned your postgresql  
configuration ?


8.2 tuning guidelines are significantly different than 7.3

Dave
On 1-Apr-07, at 1:51 PM, Xiaoning Ding wrote:


I repeated the test again. It took 0.92 second under 8.2.3.
I checked system load using top and ps. There is no other
active processes.

Xiaoning

Ron Mayer wrote:

Xiaoning Ding wrote:

Postgresql is 7.3.18. [...]
1 process takes 0.65 second to finish.
I update PG to 8.2.3. The results are [...] now.
1 process :0.94 second

You sure about your test environment?  Anything else
running at the same time, perhaps?
I'm a bit surprised that 8.2.3 would be 40% slower than 7.3.18
even in the single process case.


Re: [PERFORM] Optimization postgresql 8.1.4 FC 6 X64 ?

2007-03-24 Thread Dave Cramer

I also think there have been changes in pgbench itself.

Make sure you run the same pgbench on both servers.

Dave
On 24-Mar-07, at 6:44 AM, Heikki Linnakangas wrote:


amrit angsusingh wrote:
I try to change my database server from the older one, i.e. 2 CPU Xeon
2.4 32-bit, 4 GB SDRAM, SCSI RAID 5 HDD, FC 3 ix86 with PG 7.4.7, to the
newer one with 2 CPU Xeon 3.0 64-bit, 4 GB DDR RAM, SCSI RAID 5, FC 6 x64
and PG 8.1.4, and try to use rather the same parameters from the previous
postgresql.conf :-

...
I use pgbench to test the speed of my older database server and  
the result

is
bash-3.00$ pgbench  test -t 20 -c 30 -s 50
...


-t 20 is not enough to give repeatable results. Try something like -t 1000.


The speed of pgbench in that configuration (scaling factor 50,  
fsync enabled) is limited by the speed you can fsync the WAL. There  
isn't much you can do in postgresql.conf for that. If you get  
similar results with higher -t setting, it may be because your new  
RAID and drives have slightly higher latency.


You're better off testing with real queries with your real database.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PERFORM] SATA RAID: Promise vs. 3ware

2007-03-20 Thread Dave Cramer


On 20-Mar-07, at 1:53 PM, Benjamin Arai wrote:

This is a little biased but I would stay away from areca only  
because they have fans on the card.  At some point down the line  
that card is going to die.  When it does there is really no telling  
what it will do to your data.  I personally use 3Ware cards, they  
work well but I have had one die before (1/10).


Well, they are also the only one of the bunch that I am aware of that  
will sell you 1G of cache. Plus if you use battery backup sooner or  
later you have to replace the batteries. I use areca  all the time  
and I've never had a fan die, but I admit it is a point of failure.


Dave




Re: [PERFORM] SATA RAID: Promise vs. 3ware

2007-03-20 Thread Dave Cramer


On 20-Mar-07, at 9:23 AM, Ireneusz Pluta wrote:


Hello all,

I sent a similar post to a FreeBSD group, but thought I'd might try  
here too.


I am completing a box for PostgreSQL server on FreeBSD. Selecting a  
RAID controller I decided to go with 3ware SE9650-16, following  
good opinions about 3ware controllers found on FreeBSD and  
PostgreSQL groups.


However my dealer suggest me not to go with 3ware, and take Promise  
SuperTrak EX16350, instead. This suggestion does not have any  
technical background and it comes generally from the fact of  
limited availability of 16x 3ware controllers on the local market  
and immediate availability of Promise.


Is this technically a good idea to take Promise instead of 3ware or  
rather I definitely should insist on 3ware and wait for it?


The reality is that most dealers have no idea what is "good" for a  
database application. It is likely that this card is better for him  
somehow (more margin, easier to get, etc.).


I'd stick with 3ware, Areca, or LSI. And even then I'd check it when  
I got it to make sure it lived up to its reputation.

Dave

Thank you

Ireneusz Pluta




Re: [PERFORM] [JDBC] does prepareThreshold work? forced to use old driver

2007-02-26 Thread Dave Cramer


On 26-Feb-07, at 11:12 AM, Gene wrote:


hi!

I've been having some serious performance issues with
postgresql8.2/hibernate/jdbc due to postgres reusing bad cached query
plans. It doesn't look at the parameter values and therefore does not
use any partial indexes.

After trying to set prepareThreshold=0 in the connection string which
didnt work, even modifying the jdbc driver and forcing it to 0 and not
working I realized that it must be being ignored. After giving up
pretty much I tried a much older driver which doesn't use server
prepared statements at all the problem has gone away and it is once
again using the partial indexes. How can I get this to work properly
on the new jdbc driver? I don't really like having to use a 2 year old
driver to get good performance as you can imagine :)

Could someone point me to a jdbc src file where I could just disable
server-side prepared statements entirely?

you can just add protocolVersion=2 to the url and it will not use  
prepared statements.
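
For example (host and database name are placeholders):

   jdbc:postgresql://dbhost:5432/mydb?protocolVersion=2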


Setting prepareThreshold=0 just tells it not to use named statements.  
It will still use prepared statements, but won't cache them.


Are you sure the problem is with cached statements ? There are issues  
where prepared statements won't use the index if you don't use the  
correct type.


Dave

--
thanks, G



Re: [PERFORM] Tuning

2007-01-26 Thread Dave Cramer


On 26-Jan-07, at 6:28 AM, John Parnefjord wrote:



Hi!

I'm planning to move from mysql to postgresql as I believe the latter
performs better when it comes to complex queries. The mysql database
that I'm running is about 150 GB in size, with 300 million rows in the
largest table. We do quite a lot of statistical analysis on the data
which means heavy queries that run for days. Now that I've got two new
servers with 32GB of ram I'm eager to switch to postgresql to improve
perfomance. One database is to be an analysis server and the other an
OLTP server feeding a web site with pages.

I'm setting for Postgresql 8.1 as it is available as a package in  
Debian

Etch AMD64.

As I'm new to postgresql I've googled to find some tips and found some
interesting links how configure and tune the database manager. Among
others I've found the PowerPostgresql pages with a performance  
checklist

and annotated guide to postgresql.conf
[http://www.powerpostgresql.com/]. And of course the postgresql site
itself is a good way to start. RevSys have a short guide as well
[http://www.revsys.com/writings/postgresql-performance.html]

I just wonder if someone on this list have some tips from the real  
world

how to tune postgresql and what is to be avoided. AFAIK the following
parameters seems important to adjust to start with are:

-work_mem
-maintenance_work_mem - 50% of the largest table?

Isn't it possible for this to be larger than memory ?

-shared_buffers - max value 5
Where does this shared_buffers maximum come from ? It's wrong; it
should be 1/4 of available memory (8 GB here) to start and tuned from there.
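
As a rough illustration only (assuming the 32GB machines described
above, and remembering that 8.1 takes these settings in 8kB pages):

   shared_buffers = 1048576        # ~8GB, about 1/4 of RAM
   effective_cache_size = 3145728  # ~24GB, about 2/3 of RAM

(kernel.shmmax has to be at least as large as the shared memory
segment this implies.)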



-effective_cache_size - max 2/3 of available ram, ie 24GB on the
hardware described above
-shmmax - how large dare I set this value on dedicated postgres  
servers?

as big as required by shared buffer setting above

-checkpoint_segments - this is crucial as one of the server is
transaction heavy
-vacuum_cost_delay

Of course some values can only be estimated after database has been  
feed

data and queries have been run in a production like manner.

Cheers
// John

Ps. I sent to list before but the messages where withheld as I'm  
not "a

member of any of the restrict_post groups". This is perhaps due to the
fact that we have changed email address a few weeks ago and there  
was a
mismatch between addresses. So I apologize if any similar messages  
show

up from me, just ignore them.



Re: [PERFORM] Configuration Advice

2007-01-17 Thread Dave Cramer


On 17-Jan-07, at 3:41 PM, Steve wrote:


Hey there;

I've been lurking on this list awhile, and I've been working with  
postgres for a number of years so I'm not exactly new to this.  But  
I'm still having trouble getting a good balance of settings and I'd  
like to see what other people think.  We may also be willing to  
hire a contractor to help tackle this problem if anyone is interested.


I've got an application here that runs large (in terms of length --  
the queries have a lot of conditions in them) queries that can  
potentially return millions of rows but on average probably return  
tens of thousands of rows.  It's read only for most of the day, and  
pretty much all the queries except one are really fast.


However, each night we load data from a legacy cobol system into  
the SQL system and then we summarize that data to make the reports  
faster.  This load process is intensely insert/update driven but  
also has a hefty amount of selects as well.  This load process is  
taking ever longer to complete.



SO ... our goal here is to make this load process take less time.   
It seems the big part is building the big summary table; this big  
summary table is currently 9 million rows big.  Every night, we  
drop the table, re-create it, build the 9 million rows of data (we  
use COPY to put the data in when it's prepared, not INSERT), and  
then build the indexes on it -- of which there are many.   
Unfortunately this table gets queried in a lot of different ways  
and needs these indexes; also unfortunately, we have operator class  
indexes to support both ASC and DESC sorting on columns so these  
are for all intents and purposes duplicate but required under  
Postgres 8.1 (we've recently upgraded to Postgres 8.2, is this  
still a requirement?)


Building these indexes takes forever!  It's a long grind through  
inserts and then building the indexes takes a hefty amount of time  
too.  (about 9 hours).  Now, the application is likely part at  
fault, and we're working to make it more efficient, but it has  
nothing to do with the index building time.  I'm wondering what we  
can do to make this better if anything; would it be better to leave  
the indexes on?  It doesn't seem to be.  Would it be better to use  
INSERTs instead of copies?  Doesn't seem to be.



Anyway -- ANYTHING we can do to make this go faster is  
appreciated :) Here's some vital statistics:


- Machine is a 16 GB, 4 actual CPU dual-core opteron system using  
SCSI discs.  The disc configuration seems to be a good one, it's  
the best of all the ones we've tested so far.


The basic problem here is simply writing all the data to disk. You  
are building 9M rows of data plus numerous indexes. How much data are  
you actually writing to the disk? Try looking at iostat while this is  
going on.

My guess is you are maxing out the disk write speed.
- The load process itself takes about 6 gigs of memory, the rest is  
free for postgres because this is basically all the machine does.


- If this was your machine and situation, how would you lay out the  
memory settings?  What would you set the FSM to?  Would you leave  
the bgwriter on or off?  We've already got FSYNC off because "data  
integrity" doesn't matter -- this stuff is religiously backed up  
and we've got no problem reinstalling it.  Besides, in order for  
this machine to go down, data integrity of the DB is the least of  
the worries :)


Do wal_buffers/full_page_writes matter if FSYNC is off?  If so,  
what settings?  What about checkpoints?



Not really; I'd have the WAL buffers write to a RAM disk.

Any finally, any ideas on planner constants?  Here's what I'm using:

seq_page_cost = 0.5 # measured on an arbitrary  
scale

random_page_cost = 1.0  # same scale as above
cpu_tuple_cost = 0.001  # same scale as above
cpu_index_tuple_cost = 0.0001   # same scale as above
cpu_operator_cost = 0.00025 # same scale as above
effective_cache_size = 679006



As a general rule, make shared buffers about 25% of free mem and
effective cache 75%, but with a write-intensive load like you have I
think the first thing to look at is write speed.

I really don't remember how I came up with that
effective_cache_size number.



Anyway... any advice would be appreciated :)


Steve



Re: [PERFORM] PG8.2.1 choosing slow seqscan over idx scan

2007-01-17 Thread Dave Cramer


On 17-Jan-07, at 9:37 AM, Jeremy Haile wrote:

I still keep wondering if this table is bloated with dead tuples.   
Even

if you vacuum often if there's a connection with an idle transaction,
the tuples can't be reclaimed and the table would continue to grow.


I used to vacuum once an hour, although I've switched it to autovacuum
now.  It definitely could be bloated with dead tuples.  I'll paste the
"vacuum analyze verbose" output at the bottom of this e-mail.  Would a
vacuum full be a good idea?



Assuming the table's NOT bloated, you may do well to increase the
effective_cache_size, which doesn't allocate anything,



try setting it to something like 512MB or so.


It's currently set to 1000MB.


How much memory does the box have?




If your table is bloating, and you don't have idle transactions  
hanging

of the database, it could be that your fsm settings are too low.


fsm is currently set to 200.  Is there any harm in setting it too
high? =)


Yes, it takes up space
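
A quick, rough way to eyeball bloat (the figures are only as fresh as
the last VACUUM/ANALYZE) is to compare pages against live rows:

   select relname, relpages, reltuples
   from pg_class
   where relname like 'transaction_facts%';

If relpages keeps climbing while reltuples stays roughly constant, the
table (or index) is accumulating dead space.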


Here's the vacuum analyze verbose output:

INFO:  vacuuming "public.transaction_facts"
INFO:  scanned index "transaction_facts_pkey" to remove 759969 row
versions
DETAIL:  CPU 7.20s/2.31u sec elapsed 315.31 sec.
INFO:  scanned index "transaction_facts_dcms_dim_id_idx" to remove
759969 row versions
DETAIL:  CPU 1.29s/2.15u sec elapsed 146.98 sec.
INFO:  scanned index "transaction_facts_merchant_dim_id_idx" to remove
759969 row versions
DETAIL:  CPU 1.10s/2.10u sec elapsed 126.09 sec.
INFO:  scanned index "transaction_facts_transaction_date_idx" to  
remove

759969 row versions
DETAIL:  CPU 1.65s/2.40u sec elapsed 259.25 sec.
INFO:  scanned index "transaction_facts_transaction_id_idx" to remove
759969 row versions
DETAIL:  CPU 7.48s/2.85u sec elapsed 371.98 sec.
INFO:  scanned index "transaction_facts_product_date_idx" to remove
759969 row versions
DETAIL:  CPU 2.32s/2.10u sec elapsed 303.83 sec.
INFO:  scanned index "transaction_facts_merchant_product_date_idx" to
remove 759969 row versions
DETAIL:  CPU 2.48s/2.31u sec elapsed 295.19 sec.
INFO:  scanned index "transaction_facts_merchant_date_idx" to remove
759969 row versions
DETAIL:  CPU 8.10s/3.35u sec elapsed 398.73 sec.
INFO:  scanned index "transaction_facts_success_idx" to remove 759969
row versions
DETAIL:  CPU 5.01s/2.84u sec elapsed 192.73 sec.
INFO:  scanned index "transaction_facts_failed_idx" to remove  
759969 row

versions
DETAIL:  CPU 1.03s/1.90u sec elapsed 123.00 sec.
INFO:  scanned index "transaction_facts_archived_idx" to remove 759969
row versions
DETAIL:  CPU 1.03s/1.39u sec elapsed 104.42 sec.
INFO:  scanned index "transaction_facts_response_code_idx" to remove
759969 row versions
DETAIL:  CPU 0.75s/2.17u sec elapsed 36.71 sec.
INFO:  scanned index "transaction_facts_transaction_source_idx" to
remove 759969 row versions
DETAIL:  CPU 0.60s/1.75u sec elapsed 42.29 sec.
INFO:  scanned index "transaction_facts_transaction_id_source_idx" to
remove 759969 row versions
DETAIL:  CPU 1.14s/1.84u sec elapsed 44.75 sec.
INFO:  "transaction_facts": removed 759969 row versions in 14360 pages
DETAIL:  CPU 0.57s/0.23u sec elapsed 45.28 sec.
INFO:  index "transaction_facts_pkey" now contains 2274280 row  
versions

in 152872 pages
DETAIL:  759969 index row versions were removed.
134813 index pages have been deleted, 134813 are currently reusable.
CPU 0.00s/0.01u sec elapsed 0.01 sec.
INFO:  index "transaction_facts_dcms_dim_id_idx" now contains 2274280
row versions in 85725 pages
DETAIL:  759323 index row versions were removed.
75705 index pages have been deleted, 73721 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "transaction_facts_merchant_dim_id_idx" now contains
2274280 row versions in 80023 pages
DETAIL:  759969 index row versions were removed.
71588 index pages have been deleted, 69210 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "transaction_facts_transaction_date_idx" now contains
2274280 row versions in 144196 pages
DETAIL:  759969 index row versions were removed.
126451 index pages have been deleted, 126451 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "transaction_facts_transaction_id_idx" now contains  
2274280

row versions in 150529 pages
DETAIL:  759969 index row versions were removed.
130649 index pages have been deleted, 130649 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "transaction_facts_product_date_idx" now contains 2274280
row versions in 202248 pages
DETAIL:  759969 index row versions were removed.
174652 index pages have been deleted, 174652 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "transaction_facts_merchant_product_date_idx" now  
contains

2274280 row versions in 202997 pages
DETAIL:  759969 index row versions were removed.
175398 index pages have been deleted, 175398 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "transaction_facts_merchant_date_idx" now contains  
2

Re: [PERFORM] Problem with grouping, uses Sort and GroupAggregate, HashAggregate is better(?)

2007-01-14 Thread Dave Cramer


On 14-Jan-07, at 10:34 AM, Rolf Østvik (HA/EXA) wrote:


Computer:
Dell PowerEdge 2950
openSUSE Linux 10.1
Intel(R) Xeon 3.00GHz
4GB memory
xfs filesystem on SAS disks


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Rolf Østvik (HA/EXA)
Sent: Sunday, January 14, 2007 1:44 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Problem with grouping, uses Sort and
GroupAggregate, HashAggregate is better(?)

(now with a more sensible subject)

I have been trying to change many parameters on server versions
7.4.15, 8.1.4, 8.2.0 and 8.2.1. I still hope I have managed to keep
my head straight and that I do not present too much faulty
information.


The cost estimates generated by the different server versions differ.
I have a query which (as far as I can tell) has some strange
differences between 8.2.0 and 8.2.1. I can provide information about
that if anyone wants it.

Generally these parameters are used.
default_statistics_target = 10
(4 selected columns is set to 1000)
(I have tested with 1000 as default value
   but that did not have an impact)
(analyzed whenever value was changed)
shared_buffers = 64000 (512MB)


double shared_buffers

work_mem/sort_mem = variable, see different run's
effective_cache_size = 128000 (1G)


triple effective_cache (which does not actually use memory but tells  
the planner what it should expect to see in the buffers)



random_page_cost = 2
cpu_index_tuple_cost = 0.001
cpu_operator_cost= 0.025
cpu_tuple_cost   = 0.01

I have tested with different values for random_page_cost and
cpu_*_cost but it have not made a difference.
I have tried with random_page cost between 1 and 8,
and cpu_*_cost with standard value and 50x bigger)


This is a dubious setting to play with. random_page_cost is the ratio  
of the cost of a random page fetch to a sequential one; 4 is generally  
the right number, unless you are using a *very* fast disk or a RAM disk.






Query is:
explain
 analyze
select
  ur.id as ur_id,
  ur.unit_ref,
  ur.execution_time,
  u.serial_number,
  to_char(ur.start_date_time, '-MM-DD'),
  count(*) as num_test
from
  uut_result as ur
   inner join units as u
on ur.unit_ref=u.ref
   inner join step_result as sr
on ur.id=sr.uut_result
where
  ur.id between 174000 and 174000+999
group by
  ur.id,
  ur.unit_ref,
  ur.execution_time,
  u.serial_number,
  ur.start_date_time
--  order by
--ur.start_date_time
;
NB: order by clause is used in some results below.

=== Run 1:
Detect work_mem setting influence (See also Run 2)
 - server version 8.2.1
 - Query executed without "order by" clause
 - work_mem = 8600;
QUERY PLAN

-
 GroupAggregate  (cost=44857.70..47976.79 rows=95972
width=37) (actual time=1802.716..2017.337 rows=1000 loops=1)
   ->  Sort  (cost=44857.70..45097.63 rows=95972 width=37)
(actual time=1802.461..1892.743 rows=138810 loops=1)
 Sort Key: ur.id, ur.unit_ref, ur.execution_time,
u.serial_number, ur.start_date_time
 ->  Nested Loop  (cost=0.00..36915.87 rows=95972
width=37) (actual time=0.063..268.186 rows=138810 loops=1)
   ->  Nested Loop  (cost=0.00..5017.65 rows=981
width=37) (actual time=0.047..11.919 rows=1000 loops=1)
 ->  Index Scan using uut_result_pkey on
uut_result ur  (cost=0.00..1538.77 rows=1000 width=24)
(actual time=0.029..1.727 rows=1000 loops=1)
   Index Cond: ((id >= 174000) AND
(id <= 174999))
 ->  Index Scan using units_pkey on units
u  (cost=0.00..3.47 rows=1 width=17) (actual
time=0.006..0.007 rows=1 loops=1000)
   Index Cond: (ur.unit_ref = u.ref)
   ->  Index Scan using uut_result_key on
step_result sr  (cost=0.00..30.82 rows=136 width=4) (actual
time=0.011..0.125 rows=139 loops=1000)
 Index Cond: (ur.id = sr.uut_result)
Total runtime: 2021.833 ms
(12 rows)


=== Run 2:
Detect work_mem setting influence (See also Run 1)
 - server version 8.2.1
 - Query executed without "order by" clause
 - work_mem = 8700;
QUERY PLAN

-
 HashAggregate  (cost=38355.45..39795.03 rows=95972 width=37)
(actual time=436.406..439.867 rows=1000 loops=1)
   ->  Nested Loop  (cost=0.00..36915.87 rows=95972 width=37)
(actual time=0.066..256.235 rows=138810 loops=1)
 ->  Nested Loop  (cost=0.00..5017.65 rows=981
width=37) (actual time=0.049..10.858 rows=1000 loops=1)
   ->  Index Scan using uut_result_pkey on
uut_result ur  (cost=0.00..1538.77 rows=1000 width=24)
(actual time=0.031..1.546 rows=1000 loops=1)
 Index Cond: ((id >= 174000) AND (id <= 174999))
   ->  Index Scan using units_pkey on units u
(cost=0.00..3.47 rows=1 width=17) (actual time=0.005..0.006
rows=1 loops=1000)
   

Re: [PERFORM]

2007-01-14 Thread Dave Cramer


On 14-Jan-07, at 7:31 AM, Rolf Østvik (HA/EXA) wrote:


I have been trying to change many parameters on server versions
7.4.15, 8.1.4, 8.2.0 and 8.2.1. I still hope I have managed to keep
my head straight and that I do not present too much faulty information.

The cost estimates generated by the different server versions differ.
I have a query which (as far as I can tell) shows some strange
differences
between 8.2.0 and 8.2.1. I can provide information about that if anyone
wants it.

Generally these parameters are used.
default_statistics_target = 10
(4 selected columns is set to 1000)
(I have tested with 1000 as default value
   but that did not have an impact)
(analyzed whenever value was changed)
shared_buffers = 64000 (512MB)
work_mem/sort_mem = variable, see different run's
effective_cache_size = 128000 (1G)
random_page_cost = 2
cpu_index_tuple_cost = 0.001
cpu_operator_cost= 0.025
cpu_tuple_cost   = 0.01


Can you tell us how big the machine is? How much memory does it have? Not
that it is terribly important, but it's a data point for me.




I have tested with different values for random_page_cost and
cpu_*_cost but it has not made a difference.
(I have tried random_page_cost between 1 and 8,
and cpu_*_cost between the standard value and 50x bigger.)

Query is:
explain
 analyze
select
  ur.id as ur_id,
  ur.unit_ref,
  ur.execution_time,
  u.serial_number,
  to_char(ur.start_date_time, '-MM-DD'),
  count(*) as num_test
from
  uut_result as ur
   inner join units as u
on ur.unit_ref=u.ref
   inner join step_result as sr
on ur.id=sr.uut_result
where
  ur.id between 174000 and 174000+999
group by
  ur.id,
  ur.unit_ref,
  ur.execution_time,
  u.serial_number,
  ur.start_date_time
--  order by
--ur.start_date_time
;
NB: order by clause is used in some results below.

=== Run 1:
Detect work_mem setting influence (See also Run 2)
 - server version 8.2.1
 - Query executed without "order by" clause
 - work_mem = 8600;
QUERY PLAN
-
 GroupAggregate  (cost=44857.70..47976.79 rows=95972 width=37)  
(actual time=1802.716..2017.337 rows=1000 loops=1)
   ->  Sort  (cost=44857.70..45097.63 rows=95972 width=37) (actual  
time=1802.461..1892.743 rows=138810 loops=1)
 Sort Key: ur.id, ur.unit_ref, ur.execution_time,  
u.serial_number, ur.start_date_time
 ->  Nested Loop  (cost=0.00..36915.87 rows=95972 width=37)  
(actual time=0.063..268.186 rows=138810 loops=1)
   ->  Nested Loop  (cost=0.00..5017.65 rows=981  
width=37) (actual time=0.047..11.919 rows=1000 loops=1)
 ->  Index Scan using uut_result_pkey on  
uut_result ur  (cost=0.00..1538.77 rows=1000 width=24) (actual  
time=0.029..1.727 rows=1000 loops=1)
   Index Cond: ((id >= 174000) AND (id <=  
174999))
 ->  Index Scan using units_pkey on units u   
(cost=0.00..3.47 rows=1 width=17) (actual time=0.006..0.007 rows=1  
loops=1000)

   Index Cond: (ur.unit_ref = u.ref)
   ->  Index Scan using uut_result_key on step_result  
sr  (cost=0.00..30.82 rows=136 width=4) (actual time=0.011..0.125  
rows=139 loops=1000)
                 Index Cond: (ur.id = sr.uut_result)
 Total runtime: 2021.833 ms

(12 rows)


=== Run 2:
Detect work_mem setting influence (See also Run 1)
 - server version 8.2.1
 - Query executed without "order by" clause
 - work_mem = 8700;
QUERY PLAN
-
 HashAggregate  (cost=38355.45..39795.03 rows=95972 width=37)  
(actual time=436.406..439.867 rows=1000 loops=1)
   ->  Nested Loop  (cost=0.00..36915.87 rows=95972 width=37)  
(actual time=0.066..256.235 rows=138810 loops=1)
 ->  Nested Loop  (cost=0.00..5017.65 rows=981 width=37)  
(actual time=0.049..10.858 rows=1000 loops=1)
   ->  Index Scan using uut_result_pkey on uut_result  
ur  (cost=0.00..1538.77 rows=1000 width=24) (actual  
time=0.031..1.546 rows=1000 loops=1)

 Index Cond: ((id >= 174000) AND (id <= 174999))
   ->  Index Scan using units_pkey on units u   
(cost=0.00..3.47 rows=1 width=17) (actual time=0.005..0.006 rows=1  
loops=1000)

 Index Cond: (ur.unit_ref = u.ref)
 ->  Index Scan using uut_result_key on step_result sr   
(cost=0.00..30.82 rows=136 width=4) (actual time=0.011..0.123  
rows=139 loops=1000)
           Index Cond: (ur.id = sr.uut_result)
 Total runtime: 441.193 ms
(10 rows)


=== Comment on Run 1 versus Run 2 (adjusted work_mem) ===
The difference in setup is the value of work_mem. Bigger work_mem gave
different cost estimates and selected HashAggregate instead of
GroupAggregate.
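
(For anyone wanting to reproduce this interactively, a rough sketch:
work_mem can be set per session, in kB, and the same EXPLAIN ANALYZE rerun
with each value. The date format string is assumed to be YYYY-MM-DD.)

SET work_mem = 8600;      -- rerun the EXPLAIN below: expect GroupAggregate
-- SET work_mem = 8700;   -- with this value instead: expect HashAggregate

EXPLAIN ANALYZE
select
  ur.id as ur_id,
  ur.unit_ref,
  ur.execution_time,
  u.serial_number,
  to_char(ur.start_date_time, 'YYYY-MM-DD'),
  count(*) as num_test
from
  uut_result as ur
   inner join units as u
    on ur.unit_ref=u.ref
   inner join step_result as sr
    on ur.id=sr.uut_result
where
  ur.id between 174000 and 174000+999
group by
  ur.id,
  ur.unit_ref,
  ur.execution_time,
  u.serial_number,
  ur.start_date_time;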

Re: [PERFORM] Performance of Parser?

2007-01-13 Thread Dave Cramer


On 13-Jan-07, at 7:24 PM, Tom Lane wrote:


Jignesh Shah <[EMAIL PROTECTED]> writes:
The appserver is basically using bunch of prepared statements that  
the

server should be executing directly without doing the parsing again.


Better have another look at that theory, because you're clearly  
spending
a lot of time in parsing (operator resolution to be specific).  I  
think

your client code is failing to re-use prepared statements the way you
think it is.


This is exactly what is happening. The driver needs to cache  
statements for this to work.


Dave
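
(A minimal SQL-level sketch of what a cached server-side prepared statement
buys: the parse and operator-resolution work is paid once at PREPARE time,
and each EXECUTE only binds parameters and runs. Table and column names
here are made up for illustration.)

PREPARE get_order(int) AS
    SELECT * FROM orders WHERE order_id = $1;   -- hypothetical table

EXECUTE get_order(1);   -- no re-parse on each call
EXECUTE get_order(2);
DEALLOCATE get_order;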


regards, tom lane



Re: [PERFORM] Large table performance

2007-01-12 Thread Dave Cramer


On 12-Jan-07, at 7:31 PM, Mark Dobbrow wrote:


Hello -

I have a fairly large table (3 million records), and am fetching  
10,000 non-contigous records doing a simple select on an indexed  
column ie


select grades from large_table where teacher_id = X

This is a test database, so the number of records is always 10,000  
and i have 300 different teacher ids.


The problem is, sometimes fetching un-cached records takes 0.5 secs  
and sometimes (more often) it takes more like 10.0 seconds


(fetching the same records for a given teacher_id a second time  
takes about 0.25 secs)


Has anyone seen similar behavior or know what the solution might be?

any help much appreciated,
Mark



ps. My shared_buffers is set at 5000 (kernal max), and work_mem=8192


5000 is pretty low, you need at least 1/4 of memory for an 8.1.x or  
newer server.

effective cache should be 3/4 of available memory

Dave
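
(Rough arithmetic for that rule of thumb, assuming for illustration a 2GB
machine on 8.1.x, where both settings are given in 8kB pages:)

-- shared_buffers       ~ 1/4 of RAM = 512MB  -> 512 * 1024 / 8  = 65536 pages
-- effective_cache_size ~ 3/4 of RAM = 1.5GB  -> 1536 * 1024 / 8 = 196608 pages
-- (both are postgresql.conf settings; shared_buffers needs a restart)
SHOW shared_buffers;
SHOW effective_cache_size;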








Re: [PERFORM] Horribly slow query/ sequential scan

2007-01-09 Thread Dave Cramer


On 9-Jan-07, at 7:50 AM, Nörder-Tuitje, Marcus wrote:


Forget about "IN". It's horribly slow.


I think that statement above was historically correct, but is now  
incorrect.  IN has been optimized quite significantly since 7.4


Dave
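
(To make that concrete, a sketch using the tables from this thread: on 7.4
and later the planner turns an IN (subquery) into a join, so the old manual
rewrite to EXISTS is generally no longer needed.)

-- IN form (fine on 7.4 and later)
select w.appid
  from bill_rpt_work w
 where w.report_id in (select b.report_id
                         from billing_reports b
                        where b.report_s_date = '2006-09-30');

-- pre-7.4 style EXISTS rewrite of the same condition
select w.appid
  from bill_rpt_work w
 where exists (select 1
                 from billing_reports b
                where b.report_id = w.report_id
                  and b.report_s_date = '2006-09-30');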


try :

select w.appid,
   w.rate,
   w.is_subscribed,
   sum(w.hits) AS Hits,
   sum(w.sius) AS IUs,
   sum(w.total_amnt) AS Total,
   sum(w.hits) * w.rate AS ByHits,
   sum(w.sius) * w.rate AS BYIUS
  from bill_rpt_work w
 where exists (select 1 from billing_reports b where  
b.report_s_date = '2006-09-30' and w.report_id = b.report_id)

   and w.client_id IN ('22741','227410')
group by 1,2,3
order by 1,2,3;



should by faster;

assuming : index on report_id in b; index on report_id, client_id in w

to force usage of indexes for the grouping (depends on result size),  
consider extending the index on w with cols 1,2,3.



regards,
marcus

-Ursprüngliche Nachricht-
Von: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Auftrag von
[EMAIL PROTECTED]
Gesendet: Dienstag, 9. Januar 2007 13:36
An: Gregory S. Williamson
Cc: pgsql-performance@postgresql.org
Betreff: Re: [PERFORM] Horribly slow query/ sequential scan


I don't think I understand the idea behind this query. Do you  
really need

billing_reports twice?


The query:
explain analyze select
w.appid,w.rate,w.is_subscribed,sum(w.hits) AS Hits ,sum(w.sius) AS  
IUs,

sum(w.total_amnt) AS Total,sum(w.hits) * w.rate AS ByHits,
sum(w.sius) * w.rate AS BYIUS
from bill_rpt_work w, billing_reports b
where w.report_id in
(select b.report_id from billing_reports where b.report_s_date =
'2006-09-30')
and (w.client_id = '22741' or w.client_id = '227410')
group by 1,2,3
order by 1,2,3;


Maybe this is the query you want instead?

select w.appid,
   w.rate,
   w.is_subscribed,
   sum(w.hits) AS Hits,
   sum(w.sius) AS IUs,
   sum(w.total_amnt) AS Total,
   sum(w.hits) * w.rate AS ByHits,
   sum(w.sius) * w.rate AS BYIUS
  from bill_rpt_work w
 where w.report_id in
   (select b.report_id from billing_reports b where  
b.report_s_date =

'2006-09-30')
   and (w.client_id = '22741' or w.client_id = '227410')
group by 1,2,3
order by 1,2,3;

/Dennis




Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-07 Thread Dave Cramer


On 6-Jan-07, at 11:32 PM, Guy Rouillier wrote:


Dave Cramer wrote:



The box has 3 GB of memory.  I would think that BigDBMS would be  
hurt by this more than PG.  Here are the settings I've modified  
in postgresql.conf:
As I said, you need to set shared_buffers to at least 750MB; this is  
the starting point, and it can actually go higher. Additionally,  
effective_cache_size should be set to 2.25 GB. Turning fsync off is not a  
real-world situation. Additional tuning of file systems can  
provide some gain; however, as Craig pointed out, some queries may  
need to be tweaked.


Dave, thanks for the hard numbers, I'll try them.  I agree turning  
fsync off is not a production option.  In another reply to my  
original posting, Alex mentioned that BigDBMS gets an advantage  
from its async IO.  So simply as a test, I turned fsync off in an  
attempt to open wide all the pipes.


Regarding shared_buffers=750MB, the last discussions I remember on  
this subject said that anything over 10,000 (8K buffers = 80 MB)  
had unproven benefits.  So I'm surprised to see such a large value  
suggested.  I'll certainly give it a try and see what happens.


That is 25% of your available memory. This is just a starting point.  
There are reports that going as high as 50% can be advantageous,  
however you need to measure it yourself.






autovacuum=on
stats_row_level = on
max_connections = 10
listen_addresses = 'db01,localhost'
shared_buffers = 128MB
work_mem = 16MB
maintenance_work_mem = 64MB
temp_buffers = 32MB
max_fsm_pages = 204800
checkpoint_segments = 30
redirect_stderr = on
log_line_prefix = '%t %d'

--
Guy Rouillier



Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-05 Thread Dave Cramer


On 5-Jan-07, at 9:51 PM, Guy Rouillier wrote:

I've got back access to my test system.  I ran another test run  
with the same input data set.  This time I put pg_xlog on a  
different RAID volume (the unused one that I suspect is a software  
RAID), and I turned fsync=off in postgresql.conf.  I left the rest  
of the configuration alone (all foreign keys removed), etc.   
Unfortunately, this only dropped elapsed time down to about 28000  
seconds (from 3), still significantly more than BigDBMS.   
Additional info inline below.


Shoaib Mir wrote:

Here are my few recommendations that might help you:
- You will need to do table partitioning (http:// 
www.postgresql.org/docs/current/static/ddl-partitioning.html  
) as you are storing quite a lot of data in one  
table per day.


I'm focusing on the detailed perspective for now.  The 144 files  
I'm processing represent not even two hours of data, so that surely  
wouldn't be split up.


- You are using a RAID5 setup which is something that can also  
affect performance so switching to RAID1 might help you there, but  
again you have a RAID5 with 12 disks so hmm that shouldn't be that  
much of a problem.


Agreed.

- Have you done the tuning for postgresql.conf parameters? if not  
then you really need to do this for like checkpoint segments,  
random page cost, shared buffers, cache size, fsm pages, vacuum  
cost delay, work_mem, bgwriter etc etc. You can get good advice  
for tuning these parameters at --> http://www.powerpostgresql.com/ 
PerfList/


The box has 3 GB of memory.  I would think that BigDBMS would be  
hurt by this more than PG.  Here are the settings I've modified in  
postgresql.conf:


As I said, you need to set shared_buffers to at least 750MB; this is  
the starting point, and it can actually go higher. Additionally,  
effective_cache_size should be set to 2.25 GB. Turning fsync off is not a  
real-world situation. Additional tuning of file systems can provide some  
gain; however, as Craig pointed out, some queries may need to be tweaked.


autovacuum=on
stats_row_level = on
max_connections = 10
listen_addresses = 'db01,localhost'
shared_buffers = 128MB
work_mem = 16MB
maintenance_work_mem = 64MB
temp_buffers = 32MB
max_fsm_pages = 204800
checkpoint_segments = 30
redirect_stderr = on
log_line_prefix = '%t %d'

- For autovacuuming you need to properly tune the thresholds so  
that the vacuum and analyze is done at the right time not  
affecting the database server performance. (You can find help for  
this at http://www.postgresql.org/docs/current/static/routine- 
vacuuming.html under "22.1.4. The auto-vacuum daemon")


The real-life load on this database would be fairly constant  
throughout the day.  Stats from network devices are received every  
15 minutes from each device, but they are staggered.  As a result,  
the database is almost constantly being updated, so there is no  
dead time to do vacuums.


- You will need to separate your transactional logs i.e. pg_xlog  
folder to a different drive other then your database server drive.  
This can be done by creating symlinks for pg_xlog folder.


Done, see opening remarks.  Unfortunately minor impact.

- I hope you are doing proper connection pool management, because  
good use of database connections can be really effect the overall  
performance, connections can be expensive to create, and consume  
memory if they are not properly exited.


I probably should have mentioned this originally but was afraid of  
information overload.  The application runs on JBoss and uses JBoss  
connection pools.  So connections are pooled, but I don't know how  
they would compare to native PG connection pools.  Essentially,  
JBoss gets native JDBC connections, and the pools simply allow them  
to be re-used without opening and closing each time.  So if the  
native PG connection pools provide any pooling optimizations beyond  
that, those advantages are not being realized.


the PG Connection pools will not help, they do not currently provide  
any extra optimization.


Dave



Hope that helps your tests...


Thanks to everyone for providing suggestions, and I apologize for  
my delay in responding to each of them.




Shoaib Mir
EnterpriseDB (www.enterprisedb.com )
On 12/28/06, *Guy Rouillier* <[EMAIL PROTECTED] > wrote:

I don't want to violate any license agreement by discussing
performance,
so I'll refer to a large, commercial PostgreSQL-compatible  
DBMS only as

BigDBMS here.
I'm trying to convince my employer to replace BigDBMS with  
PostgreSQL
for at least some of our Java applications.  As a proof of  
concept, I

started with a high-volume (but conceptually simple) network data
collection application.  This application collects files of 5- 
minute
usage statistics from our network devices, and stores a raw  
form of
 

Re: [PERFORM] Worse perfomance on 8.2.0 than on 7.4.14

2006-12-31 Thread Dave Cramer


On 31-Dec-06, at 6:33 AM, Rolf Østvik wrote:



Hi

I have a simple query which uses 32ms on 7.4.14 and 1015ms on 8.2.0.
I guess 7.4.14 creates a better execution plan than 8.2.0 for this  
query, but
I don't know how to get it to select a better one.
i don't know how to get it to select a better one.
Explain analyse output will be found near the end of the e-mail.

(I have simplified my real query to get it as simple as possible.  
The original query
contains 6 tables and was acceptable on 7.4.2, but took far too long  
on 8.1.4.)


I have made a test setup to compare 7.4.14, 8.1.4 and 8.2.0.
8.1.4 and 8.2.0 uses the same execution plan and same time to execute.

postgresql.conf values i changed is
7.4.14
Raised shared_buffers from 32MB to 128MB
Raised temp_buffers from 8MB to 32MB
8.2.0
Raised shared_buffers from 32MB to 128MB
Raised temp_buffers from 8MB to 32MB
Raised work_mem from 1MB to 8MB


Set effective_cache_size to 3G.
shared_buffers should be 1G on this computer for 8.2.

Dave

(It did however not have any influence of speed for
the view_subset query shown below.)

vacuum analyze has been executed.

Computer:
Dell PowerEdge 2950
openSUSE Linux 10.1
Intel(R) Xeon 3.00GHz
4GB memory
xfs filesystem on SAS disks

 Table "public.step_result_subset"
   Column|  Type   | Modifiers
-+-+---
 id  | integer | not null
 uut_result  | integer |
 step_parent | integer |
Indexes:
"step_result_subset_pkey" PRIMARY KEY, btree (id)
"step_result_subset_parent_key" btree (step_parent)
"step_result_uut_result_idx" btree (uut_result)
Table contain 17 179 506 rows, and is ~400M when exported to file

 Table "public.uut_result_subset"
 Column  |Type | Modifiers
-+-+---
 id  | integer | not null
 start_date_time | timestamp without time zone |
Indexes:
"uut_result_subset_pkey" PRIMARY KEY, btree (id)
"uut_result_subset_start_date_time_idx" btree (start_date_time)
Table contain ~176 555 rows, and is ~4.7M when exportd to file

Query is defined as view:

create view view_subset as
select
  ur.id as ur_id,
  sr.id as sr_id
from
  uut_result_subset as ur
   inner join step_result_subset as sr
on ur.id=sr.uut_result
where
  ur.start_date_time > '2006-12-11'
  and sr.step_parent=0;

Explain analyze is run several times to get a stable result
so i guess the numbers presented is with as much as possible
data in memory buffers.

Column step_result_subset.step_parent contains 0 in as many rows as  
there are rows in table uut_result_subset.
(In my data set this will be 176 500 rows. Other values for  
step_result_subset.step_parent are present 1003 times or fewer.)


Query: "select * from view_subset;" run against 7.4.14 server.
QUERY PLAN
-- 
--
 Nested Loop  (cost=0.00..1400.86 rows=17 width=8) (actual  
time=0.161..26.287 rows=68 loops=1)
   ->  Index Scan using uut_result_subset_start_date_time_idx on  
uut_result_subset ur  (cost=0.00..63.28 rows=18 width=4) (actual  
time=0.052..0.195 rows=68 loops=1)
 Index Cond: (start_date_time > '2006-12-11  
00:00:00'::timestamp without time zone)
   ->  Index Scan using step_result_uut_result_idx on  
step_result_subset sr  (cost=0.00..74.28 rows=2 width=8) (actual  
time=0.149..0.379 rows=1 loops=68)

 Index Cond: ("outer".id = sr.uut_result)
 Filter: (step_parent = 0)
 Total runtime: 26.379 ms

Query: "select * from view_subset;" run against 8.4.0 server.

QUERY PLAN
--
 Hash Join  (cost=339.61..77103.61 rows=96 width=8) (actual  
time=5.249..1010.669 rows=68 loops=1)

   Hash Cond: (sr.uut_result = ur.id)
   ->  Index Scan using step_result_subset_parent_key on  
step_result_subset sr  (cost=0.00..76047.23 rows=143163 width=8)  
(actual time=0.082..905.326 rows=176449 loops=1)

 Index Cond: (step_parent = 0)
   ->  Hash  (cost=339.31..339.31 rows=118 width=4) (actual  
time=0.149..0.149 rows=68 loops=1)
 ->  Bitmap Heap Scan on uut_result_subset ur   
(cost=4.90..339.31 rows=118 width=4) (actual time=0.060..0.099  
rows=68 loops=1)
   Recheck Cond: (start_date_time > '2006-12-11  
00:00:00'::timestamp without time zone)
   ->  Bitmap Index Scan on  
uut_result_subset_start_date_time_idx  (cost=0.00..4.90 rows=118  
width=0) (actual time=0.050..0.050 rows=68 loops=1)
 Index Cond: (start_date_time > '2006-12-11  
00:00:00'::timestamp without time zone)

 Total runtime: 1010.775 ms

Thanks for tips.

Best regards
Rolf Østvik
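
(One quick diagnostic for a regression like this: disable the plan node
type the newer server is choosing, just for the session, and compare the
EXPLAIN ANALYZE output. This is a test, not a fix.)

SET enable_hashjoin = off;   -- optionally also: SET enable_mergejoin = off;
EXPLAIN ANALYZE select * from view_subset;
RESET enable_hashjoin;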


Re: [PERFORM] Postgresql Configuration and overflow

2006-12-28 Thread Dave Cramer

Hi,

On 28-Dec-06, at 8:58 PM, fabrix peñuelas wrote:


Good day,

I have been reading about the configuration of postgresql, but I  
have a server that does not give me the performance it should. The  
tables are indexed and vacuumed regularly; I monitor with top,  
ps and pg_stat_activity, and when I checked it was slow even without a  
heavy load average.


Before, the server reached 2000 connections to postgresql (with  
max_connections=3000 in it for future workflow).

Why would you need 2000 connections ?


I divided the load with another server for better performance, and  
now reach 500 connections, but it is still overloaded.



My question is about how much memory should i configure in  
shared_buffers and effective_cache_size.


start with 25% of your 12G as shared buffers, and 75% of 12G for  
effective cache


You can go higher for shared buffers, but only do so with testing.

Dave
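
(The same rule of thumb in 8.1.x units, which are 8kB pages, assuming the
12GB box described below; shared_buffers needs a restart and a large
enough kernel shmmax.)

-- shared_buffers       ~ 25% of 12GB = 3GB -> 3 * 1024 * 1024 / 8 = 393216 pages
-- effective_cache_size ~ 75% of 12GB = 9GB -> 9 * 1024 * 1024 / 8 = 1179648 pages
SHOW shared_buffers;
SHOW effective_cache_size;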


Features:

- 4 Processsors Intel Xeon Dual 3.0Ghz
- 12 GB RAM
- 2 discos en RAID 1 for OS
- 4 discs RAID 5 for DB
- S.O Slackware 11.0 Linux 2.6.17.7
- Postgres 8.1.4


= On the internet I found this:

Tuning PostgreSQL for performance
2 Some basic parameters
2.1 Shared buffers

#  Start at 4MB (512) for a workstation
# Medium size data set and 256-512MB available RAM: 16-32MB  
(2048-4096)
# Large dataset and lots of available RAM (1-4GB): 64-256MB  
(8192-32768)

==


My postgresql.conf configuration is:

#---------------------------------------------------------------------------
# FILE LOCATIONS
#---------------------------------------------------------------------------


# The default values of these variables are driven from the -D  
command line
# switch or PGDATA environment variable, represented here as  
ConfigDir.


#data_directory = 'ConfigDir'# use data in another directory
#hba_file = 'ConfigDir/pg_hba.conf'# host-based authentication  
file

#ident_file = 'ConfigDir/pg_ident.conf'# IDENT configuration file

# If external_pid_file is not explicitly set, no extra pid file is  
written.

#external_pid_file = '(none)'# write an extra pid file


#---------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#---------------------------------------------------------------------------


# - Connection Settings -

listen_addresses = '*'# what IP address(es) to listen on;
# comma-separated list of addresses;
# defaults to 'localhost', '*' = all
port = 5432
max_connections = 3000
# note: increasing max_connections costs ~400 bytes of shared  
memory per
# connection slot, plus lock space (see  
max_locks_per_transaction).  You

# might also need to raise shared_buffers to support more connections.
#superuser_reserved_connections = 2
#unix_socket_directory = ''
#unix_socket_group = ''
#unix_socket_permissions = 0777# octal
#bonjour_name = ''# defaults to the computer name



#---------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#---------------------------------------------------------------------------


# - Memory -


shared_buffers = 81920# min 16 or max_connections*2,  
8KB each

temp_buffers = 5000# min 100, 8KB each
max_prepared_transactions = 1000# can be 0 or more

# note: increasing max_prepared_transactions costs ~600 bytes of  
shared memory


# per transaction slot, plus lock space (see  
max_locks_per_transaction).

work_mem = 10240# min 64, size in KB
maintenance_work_mem = 253952# min 1024, size in KB
max_stack_depth = 4096# min 100, size in KB

# - Free Space Map -

#max_fsm_pages = 2# min max_fsm_relations*16, 6  
bytes each

#max_fsm_relations = 1000# min 100, ~70 bytes each

# - Kernel Resource Usage -

#max_files_per_process = 1000# min 25
#preload_libraries = ''

# - Cost-Based Vacuum Delay -

#vacuum_cost_delay = 0# 0-1000 milliseconds
#vacuum_cost_page_hit = 1# 0-1 credits
#vacuum_cost_page_miss = 10# 0-1 credits
#vacuum_cost_page_dirty = 20# 0-1 credits
#vacuum_cost_limit = 200# 0-1 credits

# - Background writer -

#bgwriter_delay = 200# 10-1 milliseconds between  
rounds
#bgwriter_lru_percent = 1.0# 0-100% of LRU buffers scanned/ 
round

#bgwriter_lru_maxpages = 5# 0-1000 buffers max written/round
#bgwriter_all_percent = 0.333# 0-100% of all buffers  
scanned/round

#bgwriter_all_maxpages = 5# 0-1000 buffers max written/round


#---------------------------------------------------------------------------
# WRITE AHEAD LOG
#---------------------------------------------------------------------------


# - Settings -

#fsync = on# turns forced synchronization on or off
#wal_sync_method = fsync# the default is the first option
# supported by the ope

Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2006-12-28 Thread Dave Cramer

Guy,

Did you tune postgresql ? How much memory does the box have? Have you  
tuned postgresql ?


Dave
On 28-Dec-06, at 12:46 AM, Guy Rouillier wrote:

I don't want to violate any license agreement by discussing  
performance, so I'll refer to a large, commercial PostgreSQL- 
compatible DBMS only as BigDBMS here.


I'm trying to convince my employer to replace BigDBMS with  
PostgreSQL for at least some of our Java applications.  As a proof  
of concept, I started with a high-volume (but conceptually simple)  
network data collection application.  This application collects  
files of 5-minute usage statistics from our network devices, and  
stores a raw form of these stats into one table and a normalized  
form into a second table. We are currently storing about 12 million  
rows a day in the normalized table, and each month we start new  
tables.  For the normalized data, the app inserts rows initialized  
to zero for the entire current day first thing in the morning, then  
throughout the day as stats are received, executes updates against  
existing rows.  So the app has very high update activity.


In my test environment, I have a dual-x86 Linux platform running  
the application, and an old 4-CPU Sun Enterprise 4500 running  
BigDBMS and PostgreSQL 8.2.0 (only one at a time.)  The Sun box has  
4 disk arrays attached, each with 12 SCSI hard disks (a D1000 and 3  
A1000, for those familiar with these devices.)  The arrays are set  
up with RAID5.  So I'm working with a consistent hardware platform  
for this comparison.  I'm only processing a small subset of files  
(144.)


BigDBMS processed this set of data in 2 seconds, with all  
foreign keys in place.  With all foreign keys in place, PG took  
54000 seconds to complete the same job.  I've tried various  
approaches to autovacuum (none, 30-seconds) and it doesn't seem to  
make much difference.  What does seem to make a difference is  
eliminating all the foreign keys; in that configuration, PG takes  
about 3 seconds.  Better, but BigDBMS still has it beat  
significantly.


I've got PG configured so that that the system database is on disk  
array 2, as are the transaction log files.  The default table space  
for the test database is disk array 3.  I've got all the reference  
tables (the tables to which the foreign keys in the stats tables  
refer) on this array.  I also store the stats tables on this  
array.  Finally, I put the indexes for the stats tables on disk  
array 4.  I don't use disk array 1 because I believe it is a  
software array.


I'm out of ideas how to improve this picture any further.  I'd  
appreciate some suggestions.  Thanks.


--
Guy Rouillier




Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-11 Thread Dave Cramer


On 11-Dec-06, at 5:36 AM, Daniel van Ham Colchete wrote:


Hi Dave,

On 12/11/06, Dave Cramer <[EMAIL PROTECTED]> wrote:

Hi Daniel
On 10-Dec-06, at 8:02 PM, Daniel van Ham Colchete wrote:

> Hi Gene,
>
> at my postgresql.conf, the only non-comented lines are:
> fsync = off
This can, and will result in lost data.


I know... If there is a power failure things can happen. I know, but
the performance difference is really, really big; I just have to decide if I'm
willing to take that chance or not.


> lc_messages = 'C'
> lc_monetary = 'C'
> lc_numeric = 'C'
> lc_time = 'C'

How much memory does this machine have and what version of postgresql
are you using?

It's only a test server with 512MB RAM; I only used it to see how well
PostgreSQL would do in an ugly case.


Given that optimal performance for postgresql can require up to 50%  
of available memory, you are going to leave the OS with 256MB of  
memory ?


Dave


Daniel



Re: [PERFORM] Locking in PostgreSQL?

2006-12-06 Thread Dave Cramer


On 6-Dec-06, at 8:20 AM, Jens Schipkowski wrote:

On Wed, 06 Dec 2006 13:29:37 +0100, Dave Cramer <[EMAIL PROTECTED]>  
wrote:


Unless you specifically ask for it postgresql doesn't lock any  
rows when you update data.


That's not right. UPDATE will take an exclusive row-level lock on the rows  
matching the WHERE clause, or on all rows if none is specified.
@Joost Kraaijeveld: Yes, because there is no column-level lock or  
anything like that. Roughly you can say each UPDATE statement  
iterates through the affected table and locks the rows matching the  
WHERE clause exclusively, to prevent data inconsistency  
during the UPDATE. After that your rows are updated and the  
locks are released.
You can see this during a long-lasting UPDATE by querying  
pg_locks joined with pg_stat_activity (stats_command_string must be enabled).
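
(A sketch of that pg_locks query for 8.1/8.2; stats_command_string must be
on for current_query to show anything useful.)

SELECT l.locktype, l.relation::regclass, l.mode, l.granted, a.current_query
  FROM pg_locks l
  JOIN pg_stat_activity a ON a.procpid = l.pid;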


Apparently I've completely misunderstood MVCC then. My  
understanding is that unless you do a select ... for update and then  
update, the rows will not be locked.


Dave



Dave
On 6-Dec-06, at 2:04 AM, Joost Kraaijeveld wrote:


Does PostgreSQL lock the entire row in a table if I update only 1
column?


--Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
web: www.askesis.nl





CU,
Jens

--
**
Jens Schipkowski



Re: [PERFORM] Locking in PostgreSQL?

2006-12-06 Thread Dave Cramer
Unless you specifically ask for it postgresql doesn't lock any rows  
when you update data.


Dave
On 6-Dec-06, at 2:04 AM, Joost Kraaijeveld wrote:


Does PostgreSQL lock the entire row in a table if I update only 1
column?


--
Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
web: www.askesis.nl



Re: [PERFORM] Configuration settings for 32GB RAM server

2006-12-04 Thread Dave Cramer


On 4-Dec-06, at 12:10 PM, Mark Lonsdale wrote:








Hi



We are migrating our Postgres 7.3.4 application to postgres 8.1.5  
and also moving it to a server with a much larger hardware  
configuration as well.The server will have the following  
specification.




- 4 physical CPUs (hyperthreaded to 8)
Try both hyperthreaded and not, there's been some evidence that HT  
helps us now

- 32 GB RAM

- x86_64 architecture

- RedHat AS 4

- postgres 8.1.5



Ive been taking a look at the various postgres tuning parameters,  
and have come up with the following settings.




shared_buffers – 50,000 - From what I'd read, increasing this  
number higher than this won't have any advantages?
This is no longer true; 25% of available memory is a good starting  
place, and go up from there.



effective_cache_size = 524288 - My logic was I thought I'd give  
the DB 16GB of the 32, and based this number on 25% of that;  
does that sound okay?




this should be around 3/4 of available memory or 24G
work_mem – 32768 - I only have up to 30 connections in parallel,  
and more likely less than ½ that number. My SQL is relatively  
simple, so I figured even if there were 5 sorts per query and 30  
queries in parallel, 32768 would use up around 4GB of memory. Does this  
number sound too high?




Maintenance_work_mem = 1048576 – Figured Id allocate 1GB for this.



fsm_relations = 2000  - I have about 200 tables plus maybe 4 or 5  
indexes on each, and didn’t want to have to worry about this number  
in future so doubled it.




fsm_pages = 200,000 – Based this on some statistics about the  
number of pages freed by a vacuum on the older server. Not sure if  
it's fair to calculate this based on vacuum stats of the 7.3.4 server?

this is dependent on your application
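
(One way to get a number to base max_fsm_pages on, rather than guessing
from the old server: a database-wide verbose vacuum on 8.1.x prints, at the
end, how many page slots the free space map currently needs.)

VACUUM VERBOSE;
-- read the final INFO/DETAIL lines about the free space map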



Do these numbers look reasonable given the machine above? Any  
other settings that I should be paying particular attention to?


autovacuum settings.





Thanks


Mark








Re: [PERFORM] shared_buffers > 284263 on OS X

2006-11-18 Thread Dave Cramer

Brian,
On 16-Nov-06, at 7:03 PM, Brian Wipf wrote:

I'm trying to optimize a PostgreSQL 8.1.5 database running on an  
Apple G5 Xserve (dual G5 2.3 GHz w/ 8GB of RAM), running Mac OS X  
10.4.8 Server.


The queries on the database are mostly reads, and I know a larger  
shared memory allocation will help performance (also by comparing  
it to the performance of the same database running on a SUSE Linux  
box, which has a higher shared_buffers setting).


When I set shared_buffers above 284263 (~ 2.17 GB) in the  
postgresql.conf file, I get the standard error message when trying  
to start the db:


FATAL:  could not create shared memory segment: Cannot allocate memory
DETAIL:  Failed system call was shmget(key=5432001,  
size=3289776128, 03600).


shmmax and shmall are set to 4GB, as can be seen by the output from  
sysctl:

hw.physmem = 2147483648
hw.usermem = 1885794304
hw.memsize = 8589934592
kern.sysv.shmmax: 4294967296
kern.sysv.shmmin: 1
kern.sysv.shmmni: 32
kern.sysv.shmseg: 8
kern.sysv.shmall: 1048576

Has anyone else noticed this limitation on OS X? Any ideas on how I  
might get shared_buffers higher than 284263?


My guess is something else has taken shared memory ahead of you. OS X  
seems to be somewhat strange in how it deals with shared memory. Try  
allocating more to shmmax ?


Dave


Brian Wipf
<[EMAIL PROTECTED]>




Re: [PERFORM] Jdbc/postgres performance

2006-10-22 Thread Dave Cramer
On 17-Oct-06, at 3:05 PM, Behl, Rohit ((Infosys)) wrote:

Hi

We are facing performance problems in postgres while executing a query.
When I execute this query on the server it takes 5-10 seconds. I also get
good performance when executing this query from my Java code with
hard-coded values. I face severe performance problems when I run it using
a prepared statement.

The query is as follows:

Select events.event_id, ctrl.real_name, events.tsds, events.value,
events.lds, events.correction, ctrl.type, ctrl.freq
from iso_midw_data_update_events events, iso_midw_control ctrl
where events.obj_id = ctrl.obj_id and events.event_id > 68971124
order by events.event_id limit 2000

The above query executes in 5-10 seconds.

However the below query executes in 8 mins:

Select events.event_id, ctrl.real_name, events.tsds, events.value,
events.lds, events.correction, ctrl.type, ctrl.freq
from table events, iso_midw_control ctrl
where events.obj_id = ctrl.obj_id and events.event_id > ?::bigint
order by events.event_id limit ?

setLong(1, 68971124);
setInt(2, 2000);

The table has close to 5 million rows. The table has the following indexes:

iso_midw_data_update_events_event_id_key
iso_midw_data_update_events_lds_idx
iso_midw_data_update_events_obj_id_idx

The table is described as follows:

Column_name     data_type  type_name  column_size
lds             2          numeric    13
obj_id          2          numeric    6
tsds            2          numeric    13
value           12         varchar    22
correction      2          numeric    1
delta_lds_tsds  2          numeric    13
event_id        -5         bigserial  8

Please tell me what I am missing while setting the prepared statement. I am
using postgres 7.4.2 and postgresql-8.1-407.jdbc3.jar.

Try the same query with protocolVersion=2. There are some issues with
prepared statements being slower if the parameters are not the same type
as the column being compared to. Protocol version 2 will issue the query
exactly the same as psql does.

Also note that your two queries are not identical. In the prepared query
you cast to bigint?

Version 8.1.x handles this better I think.

Thanks
Regards
Rohit
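
(For reference, the driver's server-side prepared statement can be
reproduced from psql so the slow plan can be inspected directly. Parameter
types below are chosen to match the setLong/setInt calls above, and a plain
parameter in LIMIT should be accepted on 7.4.)

PREPARE evt(bigint, integer) AS
  select events.event_id, ctrl.real_name, events.tsds, events.value,
         events.lds, events.correction, ctrl.type, ctrl.freq
    from iso_midw_data_update_events events, iso_midw_control ctrl
   where events.obj_id = ctrl.obj_id and events.event_id > $1
   order by events.event_id limit $2;

EXPLAIN EXECUTE evt(68971124, 2000);
DEALLOCATE evt;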

Re: [PERFORM] New hardware thoughts

2006-10-20 Thread Dave Cramer

Ben,

On 20-Oct-06, at 3:49 AM, Ben Suffolk wrote:


Hello all,

I am currently working out the best type of machine for a high  
volume pgsql database that I am going to need for a project. I will be  
purchasing a new server specifically for the database, and it won't  
be running any other applications. I will be using FreeBSD 6.1 Stable.


I think it may be beneficial if I give a brief overview of the  
types of database access. There are several groups of tables and  
associated accesses to them.


The first can be thought of as user details and configuration  
tables. They will have low read and write access (say around 10 -  
20 a min). Sized at around 1/2 million rows.


The second part is logging; this will be used occasionally for  
reads when reports are run, but I will probably back that off to  
more aggregated data tables, so you can probably think of these as  
write-only tables. Several tables will each have around 200-300  
inserts a second. They can be archived on a regular basis to keep  
the size down, maybe once a day, or once a week. Not sure yet.


The third part will be transactional and will have around 50  
transaction a second. A transaction is made up of a query followed  
by an update, followed by approx 3 inserts. In addition some of  
these tables will be read out of the transactions at approx once  
per second.


There will be around 50 simultaneous connections.

I hope that overview is a) enough and b) useful background to this  
discussion.


I have some thoughts but I need them validating / discussing. If I  
had the money I could buy the hardware and spend time testing  
different options; the thing is I need to get this pretty much right on  
the hardware front first time. I'll almost certainly be buying Dell  
kit, but could go for HP as an alternative.


Processor : I understand that pgsql is not CPU intensive, but that  
each connection uses its own process. The HW has an option of up to  
4 dual-core Xeon processors. My thought would be that more lower-spec  
processors would be better than fewer higher-spec ones. But the  
question is: are 4 (8 cores) wasted because there will be so much  
blocking on I/O? Are 2 processors (4 cores) enough? I was thinking 2  
x 2.6G dual-core Xeons would be enough.


Memory : I know this is very important for pgsql, and the more you  
have the more of the tables can reside in memory. I was thinking of  
around 8 - 12G, but the machine can hold a lot more. The thing is  
memory is still quite expensive, so I don't want to over-spec it if  
it's not going to get used.


Disk : Ok so this is the main bottleneck of the system. And the  
thing I know least about, so need the most help with. I understand  
you get good improvements if you keep the transaction log on a  
different disk from the database, and that raid 5 is not as good as  
people think unless you have lots of disks.


My option in disks is either 5 x 15K rpm disks or 8 x 10K rpm disks  
(all SAS), or if I pick a different server I can have 6 x 15K rpm  
or 8 x 10K rpm (again SAS). In each case controlled by a PERC 5/i  
(which I think is an LSI Mega Raid SAS 8408E card).



You mentioned a "Perc" controller, so I'll assume this is a Dell.

My advice is to find another supplier. check the archives for Dell.

Basically you have no idea what the Perc controller is since it is  
whatever Dell decides to ship that day.


In general though you are going down the right path here. Disks  
first, memory second, cpu third


Dave

So the question here is: will more disks at a slower speed be better  
than fewer disks at a higher speed?


Assuming I was going to have a mirrored pair for the O/S and  
transaction logs that would leave me with 3 or 4 15K rpm for the  
database, 3 would mean raid 5 (not great at 3 disks), 4 would give  
me raid 10 option if I wanted it.  Or I could have raid 5 across  
all 5/6 disks and not separate the transaction and database onto  
different disks. Better performance from raid 5 with more disks,  
but does having the transaction logs and database on the same disks  
counteract / worsen the performance?


If I had the 8 10K disks, I could have 2 as a mirrored pair for O/S  
Transaction, and still have 6 for raid 5. But the disks are slower.


Anybody have any good thoughts on my disk predicament, and which  
options will serve me better.


Your thoughts are much appreciated.

Regards

Ben









Re: [PERFORM] Opteron vs. Xeon "benchmark"

2006-09-23 Thread Dave Cramer


On 23-Sep-06, at 9:49 AM, Guido Neitzer wrote:


On 9/23/06, Dave Cramer <[EMAIL PROTECTED]> wrote:


1) The database fits entirely in memory, so this is really only
testing CPU, not I/O which should be taken into account IMO


I don't think this really is the reason that MySQL broke down at ten or
more concurrent connections. The RAM might be, but I don't think so
in this case either, as it matches exactly what we have seen in similar
tests. MySQL performs quite well on easy queries with not much
concurrency. We don't have that case very often in my company ... we
have at least ten to twenty connections to the db performing
statements. And we have some fairly complex statements running very
often.

Nevertheless - a benchmark is a benchmark. Nothing else. We prefer
PostgreSQL for other reasons then higher performance (which it has for
lots of situations).


I should make myself clear. I like the results of the benchmark. But  
I wanted to keep things in perspective.


Dave


cug

--
PostgreSQL Bootcamp, Big Nerd Ranch Europe, Nov 2006
http://www.bignerdranch.com/news/2006-08-21.shtml



Re: [PERFORM] Opteron vs. Xeon "benchmark"

2006-09-23 Thread Dave Cramer


On 23-Sep-06, at 9:00 AM, Guido Neitzer wrote:


I find the benchmark much more interesting in comparing PostgreSQL to
MySQL than Intel to AMD. It might be as biased as other "benchmarks"
but it shows clearly something that a lot of PostgreSQL user always
thought: MySQL gives up on concurrency ... it just doesn't scale well.

cug

Before you get too carried away with this benchmark, you should  
review the previous comments on this thread.

Not that I don't agree, but lets put things in perspective.

1) The database fits entirely in memory, so this is really only  
testing CPU, not I/O which should be taken into account IMO

2) The machines were not "equal" The AMD boxes did not have as much ram.


DAVE


On 9/23/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

Yep. From what I understand, Intel is 8 to 10 times the size of AMD.

It's somewhat amazing that AMD even competes, and excellent for  
us, the

consumer, that they compete well, ensuring that we get very fast
computers, for amazingly low prices.

But Intel isn't crashing down any time soon. Perhaps they became a  
little

lazy, and made a few mistakes. AMD is forcing them to clean up.

May the competition continue... :-)

Cheers,
mark




--
PostgreSQL Bootcamp, Big Nerd Ranch Europe, Nov 2006
http://www.bignerdranch.com/news/2006-08-21.shtml



Re: [PERFORM] Vacuums on large busy databases

2006-09-14 Thread Dave Cramer


On 14-Sep-06, at 7:50 PM, Francisco Reyes wrote:


Dave Cramer writes:

Personally, I'd set this to about 6G. This doesn't actually  
consume memory; it is just a setting to tell postgresql how much  
memory is being used for cache and kernel buffers.


Gotcha. Will increase further.


regarding shared buffers I'd make this much bigger, like 2GB or more


Will do 2GB on the weekend. From what I read this requires shared  
memory so have to restart my machine (FreeBSD).


if I plan to give shared buffers 2GB, how much more over that  
should I give the total shared memory kern.ipc.shmmax? 2.5GB?


I generally make it slightly bigger. Is shmmax the size of the  
maximum chunk allowed, or the total?


Also will shared buffers impact inserts/updates at all?
I wish the postgresql.org site docs would mention what will be  
impacted.
Yes, it will, however not as dramatically as what you are seeing with  
effective_cache


Comments like: This setting must be at least 16, as well as at  
least twice the value of max_connections; however, settings  
significantly higher than the minimum are usually needed for good  
performance.


Are useful, but could use some improvement. Increase what? All  
performance? Inserts? Updates? Selects?


For instance, increasing effective_cache_size has made a noticeable  
difference in selects. However, as I talk to the developers, we are  
still doing only marginally on the inserts: about 150/min.
The reason is that with effective_cache_size the select plans changed (for  
the better); it's unlikely that the insert plans will change.


There are spare CPU cycles, and both RAID cards are doing considerably  
less than they can do, so next I am going to research what  
parameters I need to bump to increase inserts. Today I increased  
checkpoint_segments from the default to 64. Now looking at  
wal_buffers.
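
(For what it's worth, the current write-path settings can be checked from
psql; changing them means editing postgresql.conf, and wal_buffers
additionally needs a restart.)

SHOW checkpoint_segments;
SHOW wal_buffers;
SHOW fsync;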


It would be most helpful to have something in the docs to specify  
what each setting affects most, such as reads, writes, updates,  
inserts, etc.

It's an art unfortunately.




Dave




Re: [PERFORM] Vacuums on large busy databases

2006-09-14 Thread Dave Cramer

Francisco
On 14-Sep-06, at 4:30 PM, Francisco Reyes wrote:


Dave Cramer writes:


What is a reasonable number?
I estimate I have at least 1 to 2 GB free of memory.

You are using 6G of memory for something else ?


Right now, adding up from ps, the memory in use is about 2GB.
I have an occasional program which uses up to 2GB.

Then I want to give some breathing room for when we have more  
connections so that work_mem doesn't make the machine hit swap.
At 32MB, say worst case scenario, I may have 50 operations using  
those 32MB; that's about 1.5GB.


2+2+1.5 = 5.5
So I believe I have free about 2.5GB

effective cache should be set to 75% of free memory


So I will increase to 1.5GB then.
Personally, I'd set this to about 6G. This doesn't actually consume  
memory; it is just a setting to tell postgresql how much memory is  
being used for cache and kernel buffers.


I may have more memory, but this is likely a safe value.

Thanks for the feedback.. Didn't even know about this setting.

regarding shared buffers I'd make this much bigger, like 2GB or more

dave




Re: [PERFORM] Vacuums on large busy databases

2006-09-14 Thread Dave Cramer

Francisco
On 14-Sep-06, at 1:36 PM, Francisco Reyes wrote:


Dave Cramer writes:


What is effective_cache set to ?


Increasing this seems to have helped a web app significantly. Load  
times seem orders of magnitude faster.


Increased it to effective_cache_size = 12288 # 96MB

What is a reasonable number?
I estimate I have at least 1 to 2 GB free of memory.

You are using 6G of memory for something else ?

effective cache should be set to 75% of free memory


I don't want to get too carried away right now with too many  
changes, because right now we have very few connections to that  
database (usually less than 10), but I expect it to go to a norm of  
20+, so I need to make sure I won't make changes that will be a  
problem in that scenario.


So far I only see one setting that can be an issue: work_mem, so I have  
it set to only 32768.




Re: [PERFORM] Vacuums on large busy databases

2006-09-14 Thread Dave Cramer


On 14-Sep-06, at 11:23 AM, Francisco Reyes wrote:


My setup:
Freebsd 6.1
Postgresql 8.1.4
Memory: 8GB
SATA Disks
Raid 1 10 spindles (2 as hot spares)
500GB disks (16MB buffer), 7200 rpm
Raid 10

Raid 2 4 spindles
150GB 10K rpm disks
Raid 10

shared_buffers = 1
shared buffers  should be considerably more, depending on what else  
is running

temp_buffers = 1500
work_mem = 32768# 32MB
maintenance_work_mem = 524288   # 512MB

checkpoint_segments = 64
Just increased to 64 today.. after reading this may help. Was 5  
before.


What is effective_cache set to ?


pg_xlog on second raid (which sees very little activity)

Database sizes: 1 200GB+ Db and 2 100GB+

I run 3 daily "vacuumdb -azv". The vacuums were taking 2 to 3 hours

why not just let autovac do its thing?

.
Recently we have started to do some data mass loading and now the  
vacuums are taking close to 5 hours AND it seems they may be  
slowing down the loads.


These are not bulk loads in the sense that we don't have a big file  
that we can do a copy.. instead it is data which several programs  
are processing from some temporary tables so we have lots of  
inserts. There are also updates to keep track of some totals.


I am looking to either improve the time of the vacuum or decrease  
its impact on the loads.

Are the variables:
#vacuum_cost_delay = 0  # 0-1000 milliseconds
#vacuum_cost_page_hit = 1   # 0-1 credits
#vacuum_cost_page_miss = 10 # 0-1 credits
#vacuum_cost_page_dirty = 20# 0-1 credits
#vacuum_cost_limit = 200# 0-1 credits

Is that the way to go to decrease impact?
Or should I try increasing maintenance_work_mem to 1GB?
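
(A sketch of the first option: the cost-based delay can be set just for the
session that runs the manual vacuum, so the regular load is disturbed less.
The values are illustrative, and 0 keeps the current run-flat-out
behaviour.)

SET vacuum_cost_delay = 20;    -- ms to sleep per vacuum_cost_limit of work
SET vacuum_cost_limit = 200;
VACUUM ANALYZE;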

A sum of all running processes from "ps auxw" shows about 3.5GB in  
"VSZ" and 1.5GB in "RSS".


I am also going to check if I have enough space to move the stage  
DB to the second raid which shows very little activity in iostat.


Any other suggestions?



Re: [PERFORM] sql-bench

2006-09-14 Thread Dave Cramer

Have you tuned postgresql ?

You still haven't told us what the machine is, or the tuning  
parameters. If you follow Merlin's links you will find his properly  
tuned postgres outperforms mysql in every case.


--dc--
On 14-Sep-06, at 2:55 AM, yoav x wrote:

You can use the test with InnoDB by giving the --create- 
options=engine=innodb option in the
command line. Even with InnoDB, in some specific tests PG looks  
very bad compared to InnoDB.


--- Tom Lane <[EMAIL PROTECTED]> wrote:


yoav x <[EMAIL PROTECTED]> writes:

Are there any tuning parameters that can be changed to speed these
queries? Or are these queries especially tuned to show MySQL's
strengths?


The latter.  I've ranted about this before --- there are both obvious
and subtle biases in that benchmark.  The last time I spent any time
with it, I ended up testing with these nondefault settings:

shared_buffers = 1
work_mem = 10
maintenance_work_mem = 10
fsync = false
checkpoint_segments = 30
max_locks_per_transaction = 128

(fsync = false is pretty bogus for production purposes, but if you're
comparing to mysql using myisam tables, I think it's a reasonably  
fair

basis for comparison, as myisam is certainly not crash-safe.  It'd be
interesting to see what mysql's performance looks like on this test
using innodb tables, which should be compared against fsync = true
... but I don't know how to change it to get all the tables to be
innodb.)

Also, on some of the tests it makes a material difference whether you
are using C locale or some other one --- C is faster.  And make  
sure you
have a recent version of DBD::Pg --- a year or two back I recall  
seeing
the perl test program eating more CPU than the backend in some of  
these

tests, because of inefficiencies in DBD::Pg.

IIRC, with these settings PG 8.0 seemed to be about half the speed of
mysql 5.0 w/myisam, which is probably somewhere in the ballpark of  
the
truth for tests of this nature, ie, single query stream of fairly  
simple

queries.  If you try concurrent-update scenarios or something that
stresses planning ability you may arrive at different results though.
I have not retested with more recent versions.

regards, tom lane



Re: [PERFORM] sql-bench

2006-09-13 Thread Dave Cramer
First of all you are going to have to show us what these queries are  
exactly, what machine you are running on (CPU, memory, and  
disk), and how you have tuned it.


Slow is a relative term; we need information to determine what  
"slow" means.


Dave
On 13-Sep-06, at 8:50 AM, yoav x wrote:


So why are these queries so slow in PG?


--- Dave Cramer <[EMAIL PROTECTED]> wrote:


All of the tuning parameters would affect all queries

shared buffers, wal buffers, effective cache, to name a few

--dc--
On 13-Sep-06, at 8:24 AM, yoav x wrote:


Hi

I am trying to run sql-bench against PostgreSQL 8.1.4 on Linux.
Some of the insert tests seem to be very slow.

For example: select_join_in

Are there any tuning parameters that can be changed to speed these
queries? Or are these queries especially tuned to show MySQL's
strengths?






Re: [PERFORM] sql-bench

2006-09-13 Thread Dave Cramer

All of the tuning parameters would affect all queries

shared buffers, wal buffers, effective cache, to name a few
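
As a very rough starting point, something like the following in
postgresql.conf (a sketch only; sensible values depend entirely on your
RAM and workload, and 8.1 takes most of these in 8kB pages or kB rather
than with unit suffixes):

    shared_buffers = 50000          # ~400MB, in 8kB pages
    wal_buffers = 64                # in 8kB pages
    effective_cache_size = 262144   # ~2GB, in 8kB pages
    work_mem = 16384                # in kB (16MB per sort/hash)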

--dc--
On 13-Sep-06, at 8:24 AM, yoav x wrote:


Hi

I am trying to run sql-bench against PostgreSQL 8.1.4 on Linux.
Some of the insert tests seem to be very slow.

For example: select_join_in

Are there any tuning parameters that can be changed to speed these
queries? Or are these queries especially tuned to show MySQL's
strengths?






Re: [PERFORM] Poor performance on seq scan

2006-09-13 Thread Dave Cramer


On 13-Sep-06, at 6:16 AM, Laszlo Nagy wrote:



I have had extremely bad performance historically with onboard SATA
chipsets on Linux.  The one exception has been with the Intel based
chipsets (not the CPU, the I/O chipset).

This board has an Intel chipset. I cannot remember the exact type but
it was not in the low-end category.

dmesg says:


kernel: ad4: 152626MB  at ata2-master SATA150
kernel: ad4: 152627MB  at ata3-master SATA150


It is very likely that you are having problems with the driver for the
chipset.

Are you running RAID1 in hardware?  If so, turn it off and see what the
performance is.  The onboard hardware RAID is worse than useless, it
actually slows the I/O down.


I'm using software raid, namely gmirror:

GEOM_MIRROR: Device gm0 created (id=2574033628).
GEOM_MIRROR: Device gm0: provider ad4 detected.
GEOM_MIRROR: Device gm0: provider ad6 detected.
GEOM_MIRROR: Device gm0: provider ad4 activated.
GEOM_MIRROR: Device gm0: provider ad6 activated.

#gmirror list
Geom name: gm0
State: COMPLETE
Components: 2
Balance: round-robin
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 1
ID: 2574033628
Providers:
1. Name: mirror/gm0
  Mediasize: 160040803328 (149G)
  Sectorsize: 512
  Mode: r5w5e6
Consumers:
1. Name: ad4
  Mediasize: 160040803840 (149G)
  Sectorsize: 512
  Mode: r1w1e1
  State: ACTIVE
  Priority: 0
  Flags: DIRTY
  GenID: 0
  SyncID: 1
  ID: 1153981856
2. Name: ad6
  Mediasize: 160041885696 (149G)
  Sectorsize: 512
  Mode: r1w1e1
  State: ACTIVE
  Priority: 0
  Flags: DIRTY
  GenID: 0
  SyncID: 1
  ID: 3520427571


I tried to do:

#sysctl vfs.read_max=32
vfs.read_max: 6 -> 32

but I could not reach better disk read performance.
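
(A crude way to see what the mirror actually delivers on sequential
reads, as a sketch; the device names are just examples taken from the
gmirror output above:)

    # raw sequential read from the mirror vs. a single member disk
    dd if=/dev/mirror/gm0 of=/dev/null bs=1m count=4096
    dd if=/dev/ad4 of=/dev/null bs=1m count=4096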

Thank you for your suggestions. Looks like I need to buy SCSI disks.


Well, before you go do that, try the Areca SATA RAID card.


Regards,

  Laszlo




Re: [PERFORM] Xeon Woodcrest/Dempsey vs Opteron Socket F/940 with postgresql and some SAS raid-figures

2006-09-08 Thread Dave Cramer


On 8-Sep-06, at 8:44 AM, Arjen van der Meijden wrote:


Dave Cramer wrote:

Hi, Arjen,

The Woodcrest is quite a bit faster than the Opterons. Actually...
With Hyperthreading *enabled* the older Dempsey processor is also
faster than the Opterons with PostgreSQL. But then again, it is the
top-model Dempsey and not a top-model Opteron so that isn't a clear
win. Of course it's clear that even a top Opteron wouldn't beat the
Dempseys as easily as it would have beaten the older Xeons before
that.

Why wouldn't you use a top-of-the-line Opteron?


What do you mean by this question? Why we didn't test the Opteron  
285 instead of the 280?

Yes, that is the question.


Well, it's not that you can just go up to a hardware supplier and pick
exactly the system you want to review/benchmark... especially not with
pre-production hardware that (at the time) wasn't very widely
available. Normally, you just get whatever system they have available
at their marketing or pre-sales department.

Understandable.


The Opteron 280 was from an earlier review and was fitted in the "Try
and Buy" version of the Sun Fire X4200. In that system you only have a
few options, and the 280 was the fastest available at the time.




But then again, systems with the Woodcrest 5150 (the subtop one) and
the Opteron 280 (also the subtop one) are about equal in price, so it's
not a bad comparison from a bang-for-bucks point of view. The Dempsey
was added to show how both the Opteron and the newer Woodcrest would
compete against it.


Did I read this correctly, that one of the Opterons in the test only
had 4GB of RAM vs 7GB in the Intel boxes? If so, that is a severely
limiting factor, for PostgreSQL at least.


Dave


Best regards,

Arjen






Re: [PERFORM] Xeon Woodcrest/Dempsey vs Opteron Socket F/940 with postgresql and some SAS raid-figures

2006-09-08 Thread Dave Cramer

Hi, Arjen,


On 8-Sep-06, at 1:51 AM, Arjen van der Meijden wrote:


Hi,

We've been running our "webapp database" benchmark again on mysql and
postgresql, this time using a Fujitsu-Siemens RX300 S3 machine equipped
with a 2.66GHz Woodcrest (5150) and a 3.73GHz Dempsey (5080), and
compared those results to our earlier Opteron benchmarks on the 2.4GHz
Socket F and Socket 940 versions (2216, 280).


You can see the English translation here:
http://tweakers.net/reviews/646

The Woodcrest is quite a bit faster than the Opterons. Actually...
With Hyperthreading *enabled* the older Dempsey processor is also
faster than the Opterons with PostgreSQL. But then again, it is the
top-model Dempsey and not a top-model Opteron so that isn't a clear
win. Of course it's clear that even a top Opteron wouldn't beat the
Dempseys as easily as it would have beaten the older Xeons before
that.


Why wouldn't you use a top-of-the-line Opteron?


Again PostgreSQL shows very good scalability, so good that even
HyperThreading adds extra performance with 4 cores enabled... while
MySQL in every version we tested (5.1.9 is not displayed, but showed
similar performance) was slower with HT enabled.


Furthermore, we received the Dell MD1000 SAS enclosure we had ordered,
which has 15 SAS Fujitsu MAX3036RC disks and is controlled by a Dell
PERC 5/e.
We've done some benchmarks with it (unfortunately everything is in
Dutch).


We tested varying numbers of disks in RAID10 (sets of 4, 5, 6 and 7
two-disk mirrors, striped), RAID50 and RAID5. The interface to display
the results is in a Google-style beta state, but here is a list of all
benchmarks done:

http://tweakers.net/benchdb/search?query=md1000&ColcomboID=5

Hover over the titles on the left to see how many disks and what RAID
level were used. Here is a comparison of 14-disk RAID5/50/10 setups:
http://tweakers.net/benchdb/testcombo/wide/?TestcomboIDs%5B1156%5D=1&TestcomboIDs%5B1178%5D=1&TestcomboIDs%5B1176%5D=1&DB=Nieuws&Query=Keyword


For RAID5 we have some graphs:
http://tweakers.net/benchdb/testcombo/1156
Scroll down to see how adding disks improves its performance. The Areca
1280 with WD Raptors is a very good alternative (or even better), as
you can see in most benchmarks, but it is beaten as soon as the
relative weight of random I/O increases (the I/O-meter fileserver and
database benchmarks); the processor on the 1280 is faster than the one
on the Dell controller, so it's faster in sequential I/O.
These benchmarks were not done using postgresql, so you shouldn't read
them as absolute for all your situations ;-) But you can get a good
impression, I think.


Best regards,

Arjen van der Meijden
Tweakers.net






