[PERFORM] Re: [pgsql-cluster-hackers][performance] fast reads on a busy server

2012-06-27 Thread Hannu Krosing
On Wed, 2012-06-27 at 00:16 +0200, Willy-Bas Loos wrote:
> Hi,
> 
> I've read this:
> http://wiki.postgresql.org/wiki/Prioritizing_databases_by_separating_into_multiple_clusters
> 
> But it doesn't really say anything about memory.
> If I can fit an extra cluster into its shared buffers, it should have
> fast reads, right?
> Even if I don't have separate spindles and the disks are busy.

Check if you are CPU-bound. On a database which fits fully in RAM you may
already be.
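
One rough way to check how often reads are already being served from
PostgreSQL's own cache is the standard statistics view -- just a sketch;
a hit ratio close to 1.0 suggests you are CPU- rather than disk-bound:

select datname,
       blks_hit::float / nullif(blks_hit + blks_read, 0) as hit_ratio
  from pg_stat_database
 order by hit_ratio;

Note that blks_read also counts reads satisfied from the OS page cache,
so this shows shared_buffers misses, not necessarily physical disk reads.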

> This is on a Debian server, postgres 8.4

And if possible, upgrade to the latest PostgreSQL (9.1). On some operations
this alone may give you a considerable performance boost.

> Cheers,
> 
> WBL
> -- 
> "Quality comes from focus and clarity of purpose" -- Mark Shuttleworth
> 

-- 
---
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/




Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-08-04 Thread Hannu Krosing
On Wed, 2010-08-04 at 22:03 +0300, Hannu Krosing wrote:
> On Wed, 2010-08-04 at 21:41 +0300, Hannu Krosing wrote:
> > On Wed, 2010-08-04 at 14:00 -0400, Tom Lane wrote:
> 
> > > regression=# select name, setting from pg_settings where name like '%cost';
> > >          name         | setting 
> > > ----------------------+---------
> > >  cpu_index_tuple_cost | 0.005
> > >  cpu_operator_cost    | 0.0025
> > >  cpu_tuple_cost       | 0.01
> > >  random_page_cost     | 4
> > >  seq_page_cost        | 1
> > > (5 rows)
> > > 
> > > To model an all-in-RAM database, you can either dial down both
> > > random_page_cost and seq_page_cost to 0.1 or so, or set random_page_cost
> > > to 1 and increase all the CPU costs.  The former is less effort ;-)
> > > 
> > > It should be noted also that there's not all that much evidence backing
> > > up the default values of the cpu_xxx_cost variables.  In the past those
> > > didn't matter much because I/O costs always swamped CPU costs anyway.
> > > But I can foresee us having to twiddle those defaults and maybe refine
> > > the CPU cost model more, as all-in-RAM cases get more common.
> > 
> > Especially the context switch + copy between shared buffers and system
> > disk cache will become noticeable at these speeds.
> > 
> > An easy way to test it is loading a table with a few indexes, once with
> > a shared_buffers value which is enough for only the main table and
> > once with one that fits both the table and the indexes.

I re-ran the test and checked idx_blks_read for the 28MB case:

hannu=# select * from pg_statio_user_indexes where relname = 'sbuf_test';
 schemaname |  relname  | indexrelname | idx_blks_read | idx_blks_hit
------------+-----------+--------------+---------------+--------------
 hannu      | sbuf_test | sbuf_test1   |         71376 |      1620908
 hannu      | sbuf_test | sbuf_test2   |         71300 |      1620365
 hannu      | sbuf_test | sbuf_test3   |         71436 |      1619619


This means that a total of 214112 index blocks were read back in from the
disk cache (and obviously at least some of them had to be copied the other
way as well).

This seems to indicate about 1 ms for moving a page over the user/system
boundary. (Intel Core2 Duo T7500 @ 2.20GHz, Ubuntu 9.10, 4GB RAM)

For 128MB shared buffers the total idx_blks_read over the 3 indexes was
about 6300.
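
(Back-of-envelope, assuming roughly as many index pages were written back
as were read in: ~208000 extra reads in the 28MB case plus a similar
number of writes is on the order of 400000 page movements, and the extra
~330-420 seconds compared to the 14 second run, divided by that, gives
roughly 1 ms per movement.)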


> ok, just to back this up I ran the following test with 28MB and 128MB
> shared buffers.
> 
> create table sbuf_test(f1 float, f2 float, f3 float);
> create index sbuf_test1 on sbuf_test(f1);
> create index sbuf_test2 on sbuf_test(f2);
> create index sbuf_test3 on sbuf_test(f3);
> 
> and then did 3 times the following for each shared_buffers setting
> 
> truncate sbuf_test;
> insert into sbuf_test 
> select random(), random(), random() from generate_series(1,60);
> 
> the main table size was 31MB and the indexes were 18MB each, for a total
> size of 85MB
> 
> in the case of 128MB shared buffers, the insert ran in 14 sec (+/- 1 sec)
> 
> in the case of 28MB shared buffers, the insert ran in between 346 and 431
> sec, that is 20-30 _times_ slower.
> 
> There was ample space for keeping the indexes in linux cache (it has 1GB
> cached currently) though the system may have decided to start writing it
> to disk, so I suspect that most of the time was spent copying random
> index pages back and forth between shared buffers and disk cache.
> 
> I did not verify this, so there may be some other factors involved, but
> this seems like the most obvious suspect.
> 
> -- 
> Hannu Krosing   http://www.2ndQuadrant.com
> PostgreSQL Scalability and Availability 
>Services, Consulting and Training
> 
> 
> 





Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-08-04 Thread Hannu Krosing
On Wed, 2010-08-04 at 15:16 -0400, Greg Smith wrote:
> Hannu Krosing wrote:
> > There was ample space for keeping the indexes in linux cache (it has 1GB
> > cached currently) though the system may have decided to start writing it
> > to disk, so I suspect that most of the time was spent copying random
> > index pages back and forth between shared buffers and disk cache.
> >   
> 
> Low shared_buffers settings will result in the same pages more often 
> being written multiple times per checkpoint,

Do you mean "written to disk", or written out from shared_buffers to
disk cache ?

>  particularly index pages, 
> which is less efficient than keeping in the database cache and updating 
> them there.  This is a slightly different issue than just the overhead 
> of copying them back and forth; by keeping them in cache, you actually 
> reduce writes to the OS cache. 

That's what I meant. Both writes to and reads from the OS cache take a
significant amount of time once you are not doing real disk I/O.

> What I do to quantify that is...well, 
> the attached shows it better than I can describe; only works on 9.0 or 
> later as it depends on a feature I added for this purpose there.  It 
> measures exactly how much buffer cache churn happened during a test, in 
> this case creating a pgbench database.
> 
> -- 
> Greg Smith  2ndQuadrant US  Baltimore, MD
> PostgreSQL Training, Services and Support
> g...@2ndquadrant.com   www.2ndQuadrant.us
> 
> 






Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-08-04 Thread Hannu Krosing
On Wed, 2010-08-04 at 21:41 +0300, Hannu Krosing wrote:
> On Wed, 2010-08-04 at 14:00 -0400, Tom Lane wrote:

> > regression=# select name, setting from pg_settings where name like '%cost';
> >          name         | setting 
> > ----------------------+---------
> >  cpu_index_tuple_cost | 0.005
> >  cpu_operator_cost    | 0.0025
> >  cpu_tuple_cost       | 0.01
> >  random_page_cost     | 4
> >  seq_page_cost        | 1
> > (5 rows)
> > 
> > To model an all-in-RAM database, you can either dial down both
> > random_page_cost and seq_page_cost to 0.1 or so, or set random_page_cost
> > to 1 and increase all the CPU costs.  The former is less effort ;-)
> > 
> > It should be noted also that there's not all that much evidence backing
> > up the default values of the cpu_xxx_cost variables.  In the past those
> > didn't matter much because I/O costs always swamped CPU costs anyway.
> > But I can foresee us having to twiddle those defaults and maybe refine
> > the CPU cost model more, as all-in-RAM cases get more common.
> 
> Especially the context switch + copy between shared buffers and system
> disk cache will become noticeable at these speeds.
> 
> An easy way to test it is loading a table with a few indexes, once with
> a shared_buffers value which is enough for only the main table and
> once with one that fits both the table and the indexes.

OK, just to back this up, I ran the following test with 28MB and 128MB
shared buffers.

create table sbuf_test(f1 float, f2 float, f3 float);
create index sbuf_test1 on sbuf_test(f1);
create index sbuf_test2 on sbuf_test(f2);
create index sbuf_test3 on sbuf_test(f3);

and then, for each shared_buffers setting, did the following 3 times:

truncate sbuf_test;
insert into sbuf_test 
select random(), random(), random() from generate_series(1,60);

The main table size was 31MB and the indexes were 18MB each, for a total
size of 85MB.

In the case of 128MB shared buffers, the insert ran in 14 sec (+/- 1 sec).

In the case of 28MB shared buffers, the insert ran in between 346 and 431
sec, that is 20-30 _times_ slower.

There was ample space for keeping the indexes in the Linux cache (it has
1GB cached currently), though the system may have decided to start writing
them to disk, so I suspect that most of the time was spent copying random
index pages back and forth between shared buffers and the disk cache.

I did not verify this, so there may be some other factors involved, but
this seems like the most obvious suspect.

-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training





Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-08-04 Thread Hannu Krosing
On Wed, 2010-08-04 at 14:00 -0400, Tom Lane wrote:
> Hannu Krosing  writes:
> > Of course there are more variables than just *_page_cost, so if you nail
> > down any other one, you may end with less than 1 for both page costs.
> 
> > I have always used seq_page_cost = 1 in my thinking and adjusted others
> > relative to it.
> 
> Right, seq_page_cost = 1 is sort of the traditional reference point,
> but you don't have to do it that way.  The main point here is that for
> an all-in-RAM database, the standard page access costs are too high
> relative to the CPU effort costs:
> 
> regression=# select name, setting from pg_settings where name like '%cost';
>          name         | setting 
> ----------------------+---------
>  cpu_index_tuple_cost | 0.005
>  cpu_operator_cost    | 0.0025
>  cpu_tuple_cost       | 0.01
>  random_page_cost     | 4
>  seq_page_cost        | 1
> (5 rows)
> 
> To model an all-in-RAM database, you can either dial down both
> random_page_cost and seq_page_cost to 0.1 or so, or set random_page_cost
> to 1 and increase all the CPU costs.  The former is less effort ;-)
> 
> It should be noted also that there's not all that much evidence backing
> up the default values of the cpu_xxx_cost variables.  In the past those
> didn't matter much because I/O costs always swamped CPU costs anyway.
> But I can foresee us having to twiddle those defaults and maybe refine
> the CPU cost model more, as all-in-RAM cases get more common.

Especially the context switch + copy between shared buffers and system
disk cache will become noticeable at these speeds.

An easy way to test it is loading a table with a few indexes, once with
a shared_buffers value which is enough for only the main table and
once with one that fits both the table and the indexes.


>   regards, tom lane


-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training





Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-08-04 Thread Hannu Krosing
On Wed, 2010-08-04 at 09:14 -0400, Robert Haas wrote:
> On Tue, Aug 3, 2010 at 3:03 AM, Hannu Krosing  wrote:
> > In case of fully cached database it is closer to 1.
> 
> In the case of a fully cached database I believe the correct answer
> begins with a decimal point.

The number 1 here was suggested in relation to seq_page_cost, which is
1. 

For a fully cached db there is no additional seek time for random access,
so seq_page_cost == random_page_cost.

Of course there are more variables than just *_page_cost, so if you nail
down any other one, you may end up with less than 1 for both page costs.

I have always used seq_page_cost = 1 in my thinking and adjusted others
relative to it.
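
For illustration, a minimal sketch of what that looks like in practice
("mydb" is just a placeholder database name):

-- experiment per session first
SET random_page_cost = 1;
-- then make it stick for the whole database
ALTER DATABASE mydb SET seq_page_cost = 1;
ALTER DATABASE mydb SET random_page_cost = 1;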

-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training





Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-08-04 Thread Hannu Krosing
On Mon, 2010-08-02 at 14:23 -0700, Peter Hussey wrote:
> I already had effective_cache_size set to 500MB.
> 
> I experimented with lowering  random_page_cost to 3 then 2.  

In the case of a fully cached database it is closer to 1.

> 2) Why is the setting of work_mem something left to the admin and/or
> developer?  Couldn't the optimizer say how much it thinks it needs to
> build a hash table based on size of the keys and estimated number of
> rows?

Yes, it can say how much it thinks it needs to build a hash table; the
part it can't figure out is how much it can afford, based on things like
the number of concurrent queries and how much work_mem those are using.
Any work_mem used is subtracted from the total memory pool, which also
affects how much of the files the system can cache.

> It is difficult for a software development platform like ours to take
> advantage of suggestions to set work_mem, or to change the cost
> function, or turn on/off join strategies for individual queries.  The
> SQL we issue is formed by user interaction with the product and rarely
> static.  How would we know when to turn something on or off?  That's
> why I'm looking for a configuration solution that I can set on a
> database-wide basis and have it work well for all queries.

Keep trying. The closer you get with your conf to the real conditions, the
better choices the optimiser can make ;)
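
For illustration only -- one way to give a single heavy query more
sort/hash memory without touching the global setting (the query here is
just a stand-in):

BEGIN;
SET LOCAL work_mem = '256MB';
EXPLAIN ANALYZE
  SELECT g % 1000 AS k, count(*)
    FROM generate_series(1, 1000000) g
   GROUP BY 1;
COMMIT;

SET LOCAL only lasts until the end of the transaction, so the rest of the
session keeps the configured default.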



-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training





Re: [PERFORM] Testing Sandforce SSD

2010-08-04 Thread Hannu Krosing
On Tue, 2010-08-03 at 10:40 +0200, Yeb Havinga wrote:
> Please note that the 10% was on a slower CPU. On a more recent CPU the 
> difference was 47%, based on tests that ran for an hour.

I am not surprised at all that reading and writing almost twice as much
data from/to disk takes 47% longer. When less time is spent on seeking,
the amount of data starts playing a bigger role.

>  That's why I 
> absolutely agree with Merlin Moncure that more testing in this 
> department is welcome, preferably by others since after all I could be 
> on the pay roll of OCZ :-)

:)


> I looked a bit into Bonnie++ but fail to see how I could do a test that 
> somehow matches the PostgreSQL setup during the pgbench tests (db that 
> fits in memory, 

Did it fit in shared_buffers, or system cache ?

Once we are in high-tps territory, the time it takes to move pages between
userspace and the system cache starts to play a bigger role.

I first noticed this several years ago, when doing a COPY to a large
table with indexes took noticeably longer (2-3 times longer) when the
indexes were in the system cache than when they were in shared_buffers.
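
If you want to see where a relation's pages actually sit, the contrib
module pg_buffercache gives a rough picture of the shared_buffers side --
a sketch, assuming the module is installed:

SELECT c.relname, count(*) AS buffers
  FROM pg_buffercache b
  JOIN pg_class c ON b.relfilenode = c.relfilenode
 WHERE b.reldatabase = (SELECT oid FROM pg_database
                         WHERE datname = current_database())
 GROUP BY c.relname
 ORDER BY buffers DESC
 LIMIT 10;

(The OS cache side is harder to inspect; tools like fincore can help there.)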

> so the test is actually how fast the ssd can capture 
> sequential WAL writes and fsync without barriers, mixed with an 
> occasional checkpoint with random write IO on another partition). Since 
> the WAL writing is the same for both block_size setups, I decided to 
> compare random writes to a file of 5GB with Oracle's Orion tool:

Are you sure that you are not writing full WAL pages ?

Do you have any stats on how much WAL is written for 8kb and 4kb test
cases ?

And for other disk i/o during the tests ?



-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training





Re: [PERFORM] Testing Sandforce SSD

2010-07-27 Thread Hannu Krosing
On Mon, 2010-07-26 at 14:34 -0400, Greg Smith wrote:
> Matthew Wakeling wrote:
> > Yeb also made the point - there are far too many points on that graph 
> > to really tell what the average latency is. It'd be instructive to 
> > have a few figures, like "only x% of requests took longer than y".
> 
> Average latency is the inverse of TPS.  So if the result is, say, 1200 
> TPS, that means the average latency is 1 / (1200 transactions/second) = 
> 0.83 milliseconds/transaction. 

This is probably only true if you run all transactions sequentially in
one connection? 

If you run 10 parallel threads and get 1200 tps, the average transaction
time (latency?) is probably closer to 8.3 ms ?
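
(I.e. average latency ~= number of concurrent clients / TPS, so 10
clients at 1200 tps would be 10/1200 s ~= 8.3 ms per transaction, not
0.83 ms.)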

>  The average TPS figure is normally on a 
> more useful scale as far as being able to compare them in ways that make 
> sense to people.
> 
> pgbench-tools derives average, worst-case, and 90th percentile figures 
> for latency from the logs.  I have 37MB worth of graphs from a system 
> showing how all this typically works for regular hard drives I've been 
> given permission to publish; just need to find a place to host it at 
> internally and I'll make the whole stack available to the world.  So far 
> Yeb's data is showing that a single SSD is competitive with a small 
> array on average, but with better worst-case behavior than I'm used to 
> seeing.
> 
> -- 
> Greg Smith  2ndQuadrant US  Baltimore, MD
> PostgreSQL Training, Services and Support
> g...@2ndquadrant.com   www.2ndQuadrant.us
> 
> 


-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training





Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.

2010-07-27 Thread Hannu Krosing
On Sat, 2010-07-24 at 14:36 +0800, Craig Ringer wrote:
> On 24/07/10 13:23, Greg Smith wrote:
> > Joshua Tolley wrote:
> >> Relatively minor, but it would be convenient to avoid having to query
> >> $external_pooler to determine the client_addr of an incoming connection.
> >>   
> > 
> > You suggest this as a minor concern, but I consider it to be one of the
> > most compelling arguments in favor of in-core pooling.  A constant pain
> > with external poolers is the need to then combine two sources of data in
> > order to track connections fully, which is something that everyone runs
> > into eventually and finds annoying.  It's one of the few things that
> > doesn't go away no matter how much fiddling you do with pgBouncer, it's
> > always getting in the way a bit.  And it seems to seriously bother
> > systems administrators and developers, not just the DBAs.
> 
> 
> Putting a pooler in core won't inherently fix this, and won't remove the
> need to solve it for cases where the pooler can't be on the same machine.
> 
> 9.0 has application_name to let apps identify themselves. Perhaps a
> "pooled_client_ip", to be set by a pooler rather than the app, could be
> added to address this problem in a way that can be used by all poolers
> new and existing, not just any new in-core pooling system.
> 
> If a privileged set of pooler functions was considered, as per my
> other recent mail, the pooler could use a management connection to set
> the client ip before handing the connection to the client, so the client
> couldn't change pooled_client_ip itself by accident or through malice.
> But even without that, it'd be awfully handy.

Or maybe we can add some command extensions to the protocol for passing
extra info, so that instead of sending just the "(run_query:query)"
command over the socket we could send the extra info along with it, e.g.
"(set_params:(proxy_client_ip:a.b.c.d)(proxy_client_port:n)(something
else))(run_query:query)" in one packet (for performance), and have these
things be available in logging and pg_stat_activity.

I see no need to try to somehow restrict these if you can always be sure
that they are set by the direct client. The proxy can decide to pass some
of these on from the real client, but it would be a decision made by the
proxy, not mandated by some proxying rules.






-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training





Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.

2010-07-27 Thread Hannu Krosing
On Fri, 2010-07-23 at 09:52 -0700, Joshua D. Drake wrote:
> On Thu, 2010-07-22 at 20:56 +0100, Hannu Krosing wrote:
> >  
> > > Let's extend this shall we:
> > > 
> > > Avoid adding yet another network hop
> > 
> > postgreSQL is multi-process, so you either have a separate "pooler
> > process" or need to put pooler functionality in postmaster; both ways
> > you still have a two-hop scenario for connect. You may be able to pass
> > the socket to the child process and also keep it, but doing this for
> > both client and db sides seems really convoluted. 
> 
> Which means, right now there is three hops. Reducing one is good.

No, it is still two, as postmaster passes the socket to the spawned child
postgresql process after login. 

The process is as follows:

Client --connects--> postmaster --spawns--> postgreSQL server process

then the socket is passed to be used directly, so the use is


Client --talks-to---> postgreSQL server process

When using a pooler it becomes


Client --connects-to--> Pooler --passes-requests-to-->  postgreSQL 

I see no way to have the pooler select the postgreSQL process, pass the
client connection in a way that talks directly to the postgreSQL server
process AND be able to get the server connection back once the client is
finished with either the request, transaction or connection (depending on
the pooling mode).



> 
> > Or is there a portable way to pass sockets back and forth between
> > parent and child processes ?
> > 
> > If so, then pgbouncer could use it as well.
> > 
> > > Remove of a point of failure
> > 
> > rather move the point of failure from external pooler to internal
> > pooler ;)
> 
> Yes but at that point, it doesn't matter. 
> 
> > 
> > > Reduction of administrative overhead
> > 
> > Possibly. But once you start actually using it, you still need to
> > configure and monitor it and do other administrator-y tasks.
> 
> Yes, but it is inclusive.
> 
> > 
> > > Integration into our core authentication mechanisms
> > 
> > True, although for example having SSL on client side connection will be
> > so slow that it hides any performance gains from pooling, at least for
> > short-lived connections.
> 
> Yes, but right now you can't use *any* pooler with LDAP for example. We
> could if pooling was in core. Your SSL argument doesn't really work
> because its true with or without pooling.

As the main slowdown in SSL is connection setup, you can get both the
network security and the pooling speedup if you run the pool on the client
side and make the pooler-to-server connection over SSL.


> > > Greater flexibility in connection control
> > 
> > Yes, poolers can be much more flexible than default postgresql. See for
> > example pgbouncers PAUSE , RECONFIGURE and RESUME commands 
> 
> :D
> 
> > 
> > > And, having connection pooling in core does not eliminate the use of an
> > > external pool where it makes since.
> > 
> > Probably the easiest way to achieve "pooling in core" would be adding an
> > option to start pgbouncer under postmaster control.
> 
> Yeah but that won't happen. 

I guess it could happen as part of opening up the "postgresql controlled
process" part to be configurable and able to run third party stuff. 

Another thing to run under postmaster control would be pgqd . 

> Also I think we may have a libevent
> dependency that we have to work out.
> 
> > 
> > You probably can't get much leaner than pgbouncer.
> 
> Oh don't get me wrong. I love pgbouncer. It is my recommended pooler but
> even it has limitations (such as auth).

As pgbouncer is single-threaded and the main goal has been performance,
there is not much enthusiasm about having _any_ auth method included
which can't be completed in a few CPU cycles. It may be possible to add
threads to wait for LDAP/Kerberos/... responses or to do SSL handshakes,
but I have not seen any interest from Marko in doing it himself.

Maybe there is a way to modularise the auth part of postmaster in a way
that could be used from third-party products through some nice API, which
a postmaster-controlled pgbouncer could then start using.


-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training





Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.

2010-07-23 Thread Hannu Krosing
On Thu, 2010-07-22 at 20:57 -0400, Robert Haas wrote:
> On Thu, Jul 22, 2010 at 3:15 PM, Hannu Krosing  wrote:
> > On Thu, 2010-07-22 at 14:36 -0400, Robert Haas wrote:
> >> On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer
> >>  wrote:
> >> > So rather than asking "should core have a connection pool" perhaps
> >> > what's needed is to ask "what can an in-core pool do that an external
> >> > pool cannot do?"
> >>
> >> Avoid sending every connection through an extra hop.
> >
> > not really. in-core != magically-in-right-backend-process
> 
> Well, how about if we arrange it so it IS in the right backend
> process?  I don't believe magic is required.

Do you have any design in mind, how you can make it so ?

---
Hannu






Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.

2010-07-22 Thread Hannu Krosing
On Thu, 2010-07-22 at 12:15 -0700, Joshua D. Drake wrote:
> On Thu, 2010-07-22 at 14:36 -0400, Robert Haas wrote: 
> > On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer
> >  wrote:
> > > So rather than asking "should core have a connection pool" perhaps
> > > what's needed is to ask "what can an in-core pool do that an external
> > > pool cannot do?"
> > 
> > Avoid sending every connection through an extra hop.
> 
> Let's extend this shall we:
> 
> Avoid adding yet another network hop

postgreSQL is multi-process, so you either have a separate "pooler
process" or need to put pooler functionality in postmaster; both ways
you still have a two-hop scenario for connect. You may be able to pass
the socket to the child process and also keep it, but doing this for both
client and db sides seems really convoluted. 

Or is there a portable way to pass sockets back and forth between
parent and child processes ?

If so, then pgbouncer could use it as well.

> Remove of a point of failure

rather move the point of failure from external pooler to internal
pooler ;)

> Reduction of administrative overhead

Possibly. But once you start actually using it, you still need to
configure and monitor it and do other administrator-y tasks.

> Integration into our core authentication mechanisms

True, although for example having SSL on client side connection will be
so slow that it hides any performance gains from pooling, at least for
short-lived connections.

> Greater flexibility in connection control

Yes, poolers can be much more flexible than default postgresql. See for
example pgbouncers PAUSE , RECONFIGURE and RESUME commands 

> And, having connection pooling in core does not eliminate the use of an
> external pool where it makes since.

Probably the easiest way to achieve "pooling in core" would be adding an
option to start pgbouncer under postmaster control.

You probably can't get much leaner than pgbouncer.

> -- 
> PostgreSQL.org Major Contributor
> Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
> Consulting, Training, Support, Custom Development, Engineering
> http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
> 
> 
> 


-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training





Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.

2010-07-22 Thread Hannu Krosing
On Thu, 2010-07-22 at 14:36 -0400, Robert Haas wrote:
> On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer
>  wrote:
> > So rather than asking "should core have a connection pool" perhaps
> > what's needed is to ask "what can an in-core pool do that an external
> > pool cannot do?"
> 
> Avoid sending every connection through an extra hop.

Not really. in-core != magically-in-right-backend-process


There will still be "an extra hop", only it will be local, between the
pooler and the backend process.

This is similar to what currently happens with pgbouncer when you deploy
it on the same server and use unix sockets.

> 
> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise Postgres Company
> 





Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.

2010-07-18 Thread Hannu Krosing
On Sun, 2010-07-18 at 21:48 +0530, Rajesh Kumar Mallah wrote:
> Hi,
> 
> Sorry, if posting here was not proper instead of starting new thread
> (I am really not sure if its bad thing to do)
> 
> I would like to share my recent experience on implementation of  
> client side pooling using  pgbouncer. By client side i mean that 
> the the pgbouncer process in not on same machine as postgresql server.
> In the first trial pgbouncer and postgresql were on the same machine and
> pgbouncer was connecting to postgresql using unix domain sockets. But I
> shifted it later owing to high CPU usage > 50% (using top).
> Now i have shifted pgbouncer into a virtual machine (openvz container)
> in the application server hardware 

Why in a VM (openvz container) ?

Did you also try it in the same OS as your appserver ?

Perhaps even connecting from the appserver via unix sockets ?

> and all my applications on other virtual machines
> (web applications) connect to pgbouncer on this virtual machine.
> 
> I tested the setup with pgbench in two scenarios 
> 
> 1. connecting to DB server directly
> 2. connecting to DB via pgbouncer
> 
> the no of clients was 10 ( -c 10)  carrying out 1 transactions
> each (-t 1) .
> pgbench db was initilised with scaling  factor -s 100. 
> 
> since client count was less there was no queuing of requests in pgbouncer,
> i would prefer to say it was in 'passthrough' mode. 
> 
> the result was that 
> 
> 1. direct ~ 2000 tps
> 2. via pgbouncer ~ 1200 tps

Are you sure you are not measuring how much sunning pgbouncer slows down
pgbench directly, by competing for CPU resources and not by adding
latency to requests ?


> --
> Experience on deploying to production environment with real world
> load/usage pattern
> --
> 
> Pgbouncer was put in same machine as postgresql connecting via unix
> domain
> to server and tcp sockets with clients.
> 
> 1. There was drastic reduction in CPU loads  from  30 to 10 ldavg
> 2. There were no clients waiting, pool size was 150 and the number of
> active connections was 100-120.
> 3. Application performance was worse (inspite of 0 clients waiting ) 
> 
> 
> I am still waiting to see what is the effect of shifting out pgbounce
> from dbserver
> to appserver, but with pgbench results i am not very hopeful. I am
> curious why inspite of 0 clients waiting pgbounce introduces a drop in
> tps.

If you have fewer clients than pgbouncer connections, you can't have any
clients waiting in pgbouncer, as each of them is allocated its own
connection right away.

What you were measuring was 

1. pgbench and pgbouncer competing for the same CPU
2. the overhead of 2 hops to the db (app-proxy-db) instead of 1 (app-db)

> Warm Regds
> Rajesh Kumar Mallah.
> CTO - tradeindia.com.
> 
> 
> 
> Keywords: pgbouncer performance
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> On Mon, Jul 12, 2010 at 6:11 PM, Kevin Grittner
>  wrote:
> Craig Ringer  wrote:
> 
> 
> > So rather than asking "should core have a connection pool"
> perhaps
> > what's needed is to ask "what can an in-core pool do that an
> > external pool cannot do?"
> 
> 
> (1)  It can prevent the most pessimal performance problems resulting
> from lack of an external connection pool (or a badly configured one)
> by setting a single GUC.  Configuration tools could suggest a good
> value during initial setup.
> 
> (2)  It can be used without installing and configuring a more
> sophisticated and complex product.
> 
> (3)  It might reduce latency because it avoids having to receive,
> parse, and resend data in both directions -- eliminating one "hop".
> I know the performance benefit would usually accrue to the external
> connection pooler, but there might be some circumstances where a
> built-in pool could win.
> 
> (4)  It's one more checkbox which can be ticked off on some RFPs.
> 
> That said, I fully agree that if we can include good documentation
> on the external poolers and we can get packagers to include poolers
> in their distribution, that gets us a much bigger benefit.  A
> built-in solution would only be worthwhile if it was simple enough
> and lightweight enough not to be a burden on execution time or
> maintenance.  Maybe that's too big an if.
> 
> -Kevin
> 
> 

Re: [PERFORM] Need help in performance tuning.

2010-07-14 Thread Hannu Krosing
On Wed, 2010-07-14 at 08:58 -0500, Kevin Grittner wrote:
> Scott Marlowe  wrote:
> > Hannu Krosing  wrote:
> >> One example where you need a separate connection pool is pooling
> >> really large number of connections, which you may want to do on
> >> another host than the database itself is running.
> > 
> > Definitely.  Often it's best placed on the individual webservers
> > that are making requests, each running its own pool.
>  
> Each running its own pool?  You've just made a case for an
> admissions policy based on active database transactions or active
> queries (or both) on the server having a benefit when used with this
> pooling arrangement.  This collection of pools can't know when the
> CPUs have enough to keep them busy and adding more will degrade
> performance.

I guess this setup is for an OLTP load (read "lots of short transactions
with low timeout limits"), where you can just open 2-5 connections per
CPU for a mostly-in-memory database, maybe a little more when disk
accesses are involved. If you have more, then they just wait a few
milliseconds; if you have fewer, you don't have anything else to run
anyway.


-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training





Re: [PERFORM] Need help in performance tuning.

2010-07-13 Thread Hannu Krosing
On Fri, 2010-07-09 at 00:42 -0400, Tom Lane wrote:
> Samuel Gendler  writes:
> > On Thu, Jul 8, 2010 at 8:11 PM, Craig Ringer
> >  wrote:
> >> If you're not using a connection pool, start using one.
> 
> > I see this issue and subsequent advice cross this list awfully
> > frequently.  Is there in architectural reason why postgres itself
> > cannot pool incoming connections in order to eliminate the requirement
> > for an external pool?
> 
> Perhaps not, but there's no obvious benefit either.  Since there's
> More Than One Way To Do It, it seems more practical to keep that as a
> separate problem that can be solved by a choice of add-on packages.

One example where you need a separate connection pool is pooling a really
large number of connections, which you may want to do on another host
than the one the database itself is running on.

For example pgbouncer had to add an option to use incoming unix sockets,
because they ran into the IP socket port number limit (a little above
31k, or more exactly 63k/2).
And unix sockets can be used only on the local host.

-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training





Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.

2010-07-12 Thread Hannu Krosing
On Mon, 2010-07-12 at 18:58 +0800, Craig Ringer wrote:
> On 12/07/10 17:45, Matthew Wakeling wrote:
> > 
> > I'm surprised. Doesn't apache httpd do this? Does it have to do a whole
> > load of non-portable stuff? It seems to work on a whole load of platforms.
> 
> A lot of what Apache HTTPd does is handled via the Apache Portable
> Runtime (APR). It contains a lot of per-platform handlers for various
> functionality.
> 
> http://apr.apache.org/docs/apr/1.4/modules.html
> 
> I don't know if the socket passing is provided as part of APR or is part
> of Apache HTTPd its self, but I wouldn't be at all surprised if it was
> in APR.
> 
> Personally I'm now swayed by arguments presented here that trying to
> push pooling into core isn't really desirable, and that better
> packaging/bundling of existing solutions would be better.

"better packaging/bundling of existing solutions" is good in it's own
right,weather there will eventually be some support for pooling in core
or not.

> Perhaps documenting the pluses/minuses of the current pooling options
> and providing clear recommendations on which to use for different use
> cases would help, since half the trouble is users not knowing they need
> a pool or being confused as to which to select.
>
> This discussion reminds me a bit of Hibernate's built-in client-side
> connection pool. It has one, but it's a unloved stepchild that even the
> Hibernate devs suggest should be avoided in favour of a couple of
> external 3rd party options.

Yes, pooling _is_ often better handled as a (set of) separate options,
simply because here one size definitely does not fit all.

An efficient in-core pooler would probably look very much like pgbouncer
running in a separate thread spawned by postmaster anyway.

Let's hope there will be some support in core for having user-defined
helper processes soon(ish), so tweaking pgbouncer to run as one will be
reasonably easy :)

> A built-in pool seems like a great idea, but there are multiple existing
> ones because they solve different problems in different ways. Unless a
> built-in one could solve ALL those needs, or be so vastly simpler (due
> to code re-use, easier configuration, etc) that it's worth building one
> that won't fit everyone's needs, then it's best to stick to the existing
> external options.
> 
> So rather than asking "should core have a connection pool" perhaps
> what's needed is to ask "what can an in-core pool do that an external
> pool cannot do?"

Probably nothing. OTOH there are some things that an external pool can
do that a built-in one can't, like running on a separate host and
pooling more than 32000 client connections there.

Cascaded pooling also seems impossible with built-in pooling.

> Admission control / resource limit features would be great to have in
> core, and can't really be done fully in external modules ... but could
> be designed in ways that would allow external poolers to add
> functionality on top. Josh Berkus has made some good points on why this
> isn't as easy as it looks, though:
> 
> 
> http://it.toolbox.com/blogs/database-soup/admission-control-and-its-discontents-39895
> 
> --
> Craig Ringer
> 


-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training





Re: [PERFORM] Planner not using column limit specified for one column for another column equal to first

2010-04-20 Thread Hannu Krosing
On Fri, 2010-04-16 at 11:02 +0300, Віталій Тимчишин wrote:
> Hello.
> 
> 
> I have a query that performs very poor because there is a limit on
> join column that is not applied to other columns:
> 
> 
> select * from company this_ left outer join company_tag this_1_ on
> this_.id=this_1_.company_id left outer join company_measures
> companymea2_ on this_.id=companymea2_.company_id left outer join
> company_descr ces3_ on this_.id=ces3_.company_id where this_1_.tag_id
> = 7 and this_.id>5000 
> and this_1_.company_id>5000
> order by this_.id asc limit 1000;
> 
> 
> (plan1.txt)
> Total runtime: 7794.692 ms
> 
> 
> At the same time if I apply the limit (>5000) to other columns in
> query itself it works like a charm:
> 
> 
> select * from company this_ left outer join company_tag this_1_ on
> this_.id=this_1_.company_id left outer join company_measures
> companymea2_ on this_.id=companymea2_.company_id left outer join
> company_descr ces3_ on this_.id=ces3_.company_id where this_1_.tag_id
> = 7 and this_.id>5000 
> and this_1_.company_id>5000
> and companymea2_.company_id>5000 and ces3_.company_id>5000
> order by this_.id asc limit 1000;

The queries are not the same.

The 2nd variant will not return the rows where there are no matching rows
in this_1_, companymea2_ or ces3_.

A query equivalent to first one would be:


select * from company this_ 
  left outer join company_tag this_1_ 
   on (this_.id=this_1_.company_id 
   and this_1_.company_id>5000)
  left outer join company_measures companymea2_ 
   on (this_.id=companymea2_.company_id 
   and companymea2_.company_id>5000)
  left outer join company_descr ces3_ 
   on (this_.id=ces3_.company_id 
   and ces3_.company_id>5000)
 where this_1_.tag_id = 7 
   and this_.id>5000 
 order by this_.id asc 
 limit 1000;


I'm not sure that the planner considers the above form of plan rewrite,
nor that it would make much sense to do so unless there was a really small
number of rows where x_.company_id > 5000.


-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training





Re: [PERFORM] planer chooses very bad plan

2010-04-12 Thread Hannu Krosing
On Sun, 2010-04-11 at 23:12 +0200, Corin wrote:
> Hi,
> 
> I'm having a query where the planer chooses a very bad plan.
> 
> explain analyze SELECT * FROM "telegrams" WHERE ((recipient_id=508933 
> AND recipient_deleted=FALSE) OR (user_id=508933 AND user_deleted=FALSE)) 
> ORDER BY id DESC LIMIT 10 OFFSET 0
> 
> "Limit  (cost=0.00..1557.67 rows=10 width=78) (actual 
> time=0.096..2750.058 rows=5 loops=1)"
> "  ->  Index Scan Backward using telegrams_pkey on telegrams  
> (cost=0.00..156545.47 rows=1005 width=78) (actual time=0.093..2750.052 
> rows=5 loops=1)"
> "Filter: (((recipient_id = 508933) AND (NOT recipient_deleted)) 
> OR ((user_id = 508933) AND (NOT user_deleted)))"
> "Total runtime: 2750.124 ms"

You could check if creating special partial "deleted_x" indexes helps.

Do:

CREATE INDEX tgrm_deleted_recipent_index ON telegrams(recipient_id)
 WHERE recipient_deleted=FALSE;

CREATE INDEX tgrm_deleted_user_index ON telegrams(user_id) 
 WHERE user_deleted=FALSE;

(if on live system, use "CREATE INDEX CONCURRENTLY ...")

-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training





Re: [PERFORM] How to fast the REINDEX

2010-04-06 Thread Hannu Krosing
On Thu, 2010-04-01 at 04:27 +0530, raghavendra t wrote:
> I'm sorry I couldn't come up with more, but what you've
> provided so
> far is roughly equivalent to me telling you that it takes over
> four
> hours to travel to see my Uncle Jim, and then asking you how I
> can
> find out how he's doing in less time than that.  There's just
> not
> much to go on.  :-(
> 
> If you proceed with the course suggested in the URL I
> referenced,
> people on the list have a chance to be more helpful to you.
> Instead of looking into the priority of the question or where it has
> to be posted, it would be appreciated to keep the discussion to the
> point mentioned.  Truly this question belongs to some other place as
> you have mentioned in the URL. But an answer for Q1 might be expected
> at least. 

Ok, here is my answer to your Q1:

Q1. What parameters have an effect when issuing the REINDEX command?

A: Assuming you meant what parameters affect the performance of the
REINDEX command:

Most parameters that affect general performance also affect the REINDEX
command.

Some that affect it more are:

* amount of RAM in your server - the most important thing

* speed of the disk subsystem - the next most important, in case not all
of the active data fits in memory 

Tunables:

* maintenance_work_mem - affects how much of the sorting can be done in
memory; if you can afford to have maintenance_work_mem > largest index
size then the sorting for index creation can be done in RAM only, which is
significantly faster than doing a tape sort with intermediate files on
disk (see the sketch below).

* wal_buffers - the bigger the better here, but it competes with how big
you can make maintenance_work_mem. If more of the heap and the created
indexes can be kept in shared memory, everything runs faster.

* checkpoint_segments - affects how often the whole of wal_buffers is
synced to disk; if done too often it wastes a lot of disk bandwidth for no
good reason.

* other checkpoint_* settings - tune to avoid excessive checkpointing.
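
A minimal sketch for the maintenance_work_mem point above -- the table
name is a placeholder and the value assumes you actually have the RAM to
spare:

SET maintenance_work_mem = '1GB';
REINDEX TABLE my_big_table;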

> Hope i could get the information from the other Thread in other
> catagory.

Nah, actually [PERFORM] is the right place to ask. 

It's just that most people got the impression that you may be doing
unnecessary REINDEXing, and the best way to speed up unneeded work is
not to do it ;)

> Thank you
>  
> Regards
> Raghavendra



-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training





Re: [PERFORM] Does the psql executable support a "fetch many" approach when dumping large queries to stdout?

2010-04-06 Thread Hannu Krosing
On Fri, 2010-04-02 at 16:28 -0400, Beaver, John E wrote:
...

> I know that the query used here could have been a COPY statement, which I 
> assume would
>  be better-behaved, but I'm more concerned about the case in which the query 
> is more complex.

COPY can copy out the results of a SELECT query as well.
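
A minimal sketch (table and columns are hypothetical), streaming the
result out through psql:

-- psql -c "COPY (SELECT id, name FROM big_table) TO STDOUT" > out.tsv
COPY (SELECT id, name FROM big_table) TO STDOUT;

The COPY (SELECT ...) form has been available since 8.2, so it works for
arbitrary queries, not just whole tables.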

-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training





Re: [PERFORM] mysql to postgresql, performance questions

2010-03-25 Thread Hannu Krosing
On Wed, 2010-03-24 at 09:55 +0100, Yeb Havinga wrote:
> Greg Smith wrote:
> > Tom Lane wrote:
> >> So has anyone looked at porting MythTV to PG?
> >>   
> >
> > Periodically someone hacks together something that works, last big 
> > effort I'm aware of was in 2006, and then it bit rots away.  I'm sure 
> > we'd get some user uptake on the result--MySQL corruption is one of 
> > the top ten cause of a MythTV system crashing.
> It would be the same with PG, unless the pg cluster configuration with 
> mythtv would come with a properly configured WAL - I had corrupted 
> tables (and a personal wiki entry  (the other mysql database in my 
> house) *only* when I sometimes took the risk of not shutting down the 
> machine properly when e.g. the remote was missing).

Pulling the plug should not corrupt a postgreSQL database, unless it was
using disks which lie about write caching.

No need for a WAL replica for that.
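
A quick sanity check that crash safety has not been traded away in the
config -- just a sketch:

SHOW fsync;              -- should be on
SHOW full_page_writes;   -- should be on

Beyond that it comes down to whether the drives (or their controller)
actually honour cache-flush requests.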

> regards,
> Yeb Havinga
> 


-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training





Re: [PERFORM] Building multiple indexes concurrently

2010-03-22 Thread Hannu Krosing
On Thu, 2010-03-18 at 16:12 -0400, Justin Pitts wrote:
> It seems to me that a separate partition / tablespace would be a much simpler 
> approach.

Do you mean a separate partition/ tablespace for _each_ index built
concurrently ?

> On Mar 17, 2010, at 5:18 PM, Hannu Krosing wrote:
> 
> > On Wed, 2010-03-17 at 16:49 -0400, Greg Smith wrote:
> >> Alvaro Herrera wrote:
> >>> Andres Freund escribió:
> >>> 
> >>> 
> >>>> I find it way much easier to believe such issues exist on a tables in 
> >>>> constrast to indexes. The likelihood to get sequential accesses on an 
> >>>> index is 
> >>>> small enough on a big table to make it unlikely to matter much.
> >>>> 
> >>> 
> >>> Vacuum walks indexes sequentially, for one.
> >>> 
> >> 
> >> That and index-based range scans were the main two use-cases I was 
> >> concerned would be degraded by interleaving index builds, compared with 
> >> doing them in succession. 
> > 
> > I guess that tweaking file systems to allocate in bigger chunks help
> > here ? I know that xfs can be tuned in that regard, but how about other
> > common file systems like ext3 ?
> > 
> > - 
> > Hannu Krosing   http://www.2ndQuadrant.com
> > PostgreSQL Scalability and Availability 
> >   Services, Consulting and Training
> > 
> > 
> > 
> 


-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training





Re: [PERFORM] Building multiple indexes concurrently

2010-03-18 Thread Hannu Krosing
On Wed, 2010-03-17 at 16:49 -0400, Greg Smith wrote:
> Alvaro Herrera wrote:
> > Andres Freund escribió:
> >
> >   
> >> I find it way much easier to believe such issues exist on a tables in 
> >> constrast to indexes. The likelihood to get sequential accesses on an 
> >> index is 
> >> small enough on a big table to make it unlikely to matter much.
> >> 
> >
> > Vacuum walks indexes sequentially, for one.
> >   
> 
> That and index-based range scans were the main two use-cases I was 
> concerned would be degraded by interleaving index builds, compared with 
> doing them in succession. 

I guess that tweaking file systems to allocate in bigger chunks would help
here ? I know that xfs can be tuned in that regard, but how about other
common file systems like ext3 ?

- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training





Re: [PERFORM] Is DBLINK transactional

2010-03-15 Thread Hannu Krosing
On Sat, 2010-03-13 at 20:10 +0800, Craig Ringer wrote:
> On 13/03/2010 5:54 AM, Jeff Davis wrote:
> > On Fri, 2010-03-12 at 12:07 -0500, Merlin Moncure wrote:
> >> of course.  You can always explicitly open a transaction on the remote
> >> side over dblink, do work, and commit it at the last possible moment.
> >> Your transactions aren't perfectly synchronized...if you crash in the
> >> precise moment between committing the remote and the local you can get
> >> in trouble.  The chances of this are extremely remote though.
> >
> > If you want a better guarantee than that, consider using 2PC.
> 
> Translation in case you don't know: 2PC = two phase commit.
> 
> Note that you have to monitor "lost" transactions that were prepared for 
> commit then abandoned by the controlling app and periodically get rid of 
> them or you'll start having issues.

And you still have the problem of committing one 2PC transaction and
then crashing before committing the other and then crashing the
transaction monitor before being able to record what crashed :P, though
this possibility is even more remote than just crashing between the 2
original commits (dblink and local).

To get around this fundamental problem, you can instead use async queues
and remember what got replayed on the remote side, so that if you have
crashes on either side, you can simply replay again.
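
A minimal sketch of that queue idea, with made-up table and column names:

-- Local side keeps a numbered outgoing queue:
CREATE TABLE outgoing_queue (
    id      bigserial PRIMARY KEY,
    payload text NOT NULL
);

-- Remote side records the last queue id it has applied:
CREATE TABLE applied_progress (
    source_name  text PRIMARY KEY,
    last_applied bigint NOT NULL
);

-- The replayer ships rows with id > last_applied and advances
-- last_applied in the same remote transaction, so after a crash on
-- either side it just re-reads last_applied and replays from there.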

> > The problem with things that are "extremely remote" possibilities are
> > that they tend to be less remote than we expect ;)
> 
> ... and they know just when they can happen despite all the odds to 
> maximise the pain and chaos caused.
> 
> --
> Craig Ringer
> 


-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PG 8.3 and large shared buffer settings

2009-10-03 Thread Hannu Krosing
On Mon, 2009-09-28 at 10:36 -0700, Josh Berkus wrote:
> On 9/26/09 8:19 AM, Greg Smith wrote:
> > This means that the question you want an answer to is "if the OS cache
> > isn't really available, where does giving memory to shared_buffers
> > becomes less efficient than not caching things at all?"  My guess is
> > that this number is much larger than 10GB, but I don't think anyone has
> > done any tests to try to quantify exactly where it is.  Typically when
> > people are talking about systems as large as yours, they're dedicated
> > database servers at that point, so the OS cache gets considered at the
> > same time. If it's effectively out of the picture, the spot where
> > caching still helps even when it's somewhat inefficient due to buffer
> > contention isn't well explored.
> 
> It also depends on the filesystem.  In testing at Sun and on this list,
> people have found that very large s_b (60% of RAM) plus directIO was
> actually a win on Solaris UFS, partly because UFS isn't very agressive
> or smart about readahead and caching.  On Linux/Ext3, however, it was
> never a win.

Again, it depends. 

In my recent testing of a simple seqscan on a 1-int-column table that is
entirely in cache (either syscache or shared buffers), the
shared-buffers-only scan was 6% to 10% faster than when the relation was
entirely in the system cache and each page had to be switched in via a
syscall / context switch.

This was on Linux/Ext3, but I suspect this to be mostly independent of
the file system.

Also, in ancient times, when I used Slony (an early version of Slony at
that, which did not drop and recreate indexes around the initial copy),
the copy time could be 2 to 3 _times_ slower for large tables with lots
of indexes when the indexes were in the system cache vs. when they were in
shared buffers (if I remember correctly, it was 1G shared buffers vs. 3G
on a 4G machine). It was probably due to all kinds of index page splits
etc. which shuffled index pages back and forth a lot between userspace
and syscache. So this is not an entirely read-only thing either.
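
A rough version of that seqscan test, with made-up sizes (run it once
with shared_buffers well above the table size and once well below it,
restarting the server in between):

CREATE TABLE sbuf_test AS
    SELECT generate_series(1, 10000000) AS f1;  -- a single-int-column table
VACUUM ANALYZE sbuf_test;
\timing
SELECT count(*) FROM sbuf_test;   -- repeat a few times so the data is warm
SELECT count(*) FROM sbuf_test;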

-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-04 Thread Hannu Krosing
On R, 2005-09-30 at 13:38 -0700, Luke Lonergan wrote:

> 
> Bulk loading speed is irrelevant here - that is dominated by parsing, which
> we have covered copiously (har har) previously and have sped up by 500%,
> which still makes Postgres < 1/2 the loading speed of MySQL.

Is this < 1/2 of MySQL with WAL on different spindle and/or WAL
disabled ?

-- 
Hannu Krosing <[EMAIL PROTECTED]>


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-04 Thread Hannu Krosing
On E, 2005-10-03 at 14:16 -0700, Josh Berkus wrote:
> Jeff,
> 
> > > Nope, LOTS of testing, at OSDL, GreenPlum and Sun.   For comparison, A
> > > Big-Name Proprietary Database doesn't get much more than that either.
> >
> > I find this claim very suspicious.  I get single-threaded reads in
> > excess of 1GB/sec with XFS and > 250MB/sec with ext3.
> 
> Database reads?  Or raw FS reads?  It's not the same thing.

Just FYI, I ran a count(*) on a 15.6GB table on a lightly loaded db and
it ran in 163 sec (dual Opteron 2.6GHz, 6GB RAM, 6 x 74GB 15k disks in
RAID10, reiserfs). A little less than 100MB/sec.

After this I ran count(*) over a 2.4GB file from another tablespace on
another device (4x142GB 10k disks in RAID10) and it ran in 22.5 sec on
the first run and 12.5 on the second.

db=# show shared_buffers ;
 shared_buffers
----------------
 196608
(1 row)

db=# select version();
                                 version
--------------------------------------------------------------------------
 PostgreSQL 8.0.3 on x86_64-pc-linux-gnu, compiled by GCC cc (GCC) 3.3.6
(Debian 1:3.3.6-7)
(1 row)


-- 
Hannu Krosing <[EMAIL PROTECTED]>


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] What needs to be done for real Partitioning?

2005-03-22 Thread Hannu Krosing
On T, 2005-03-22 at 09:10 -0400, Alvaro Herrera wrote:
> On Mon, Mar 21, 2005 at 08:26:24PM +0200, Hannu Krosing wrote:
> > On P, 2005-03-20 at 00:52 +0100, PFC wrote:
> 
> > >   Also note the possibility to mark a partition READ ONLY. Or even a 
> > > table.
> 
> > Would we still need regular VACUUMing of read-only table to avoid 
> > OID-wraparound ?
> 
> You could VACUUM FREEZE the table or partition, so you wouldn't need to
> vacuum it again.

But when I do just VACUUM, will it know to avoid vacuuming VACUUM
FREEZE'd partitions ?

Or could this somehow be linked to a READ ONLY + VACUUM FREEZE state ?

-- 
Hannu Krosing <[EMAIL PROTECTED]>

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] What needs to be done for real Partitioning?

2005-03-22 Thread Hannu Krosing
On L, 2005-03-19 at 23:47 -0500, Tom Lane wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > Hmm.  You are right, but without that we won't be able to enforce
> > uniqueness on the partitioned table (we could only enforce it on each
> > partition, which would mean we can't partition on anything else than
> > primary keys if the tables have one).  IMHO this is something to
> > consider.
> 
> Well, partitioning on the primary key would be Good Enough for 95% or
> 99% of the real problems out there.  I'm not excited about adding a
> large chunk of complexity to cover another few percent.

That automatically means that the partitioning expression has to be a range
over the PK. (you don't want to have every tuple in a separate table :)

And it also means that you have to automatically create new partitions.

Are you sure that partitioning on anything other than the PK would be
significantly harder ?

I have a case where I do manual partitioning over start_time
(timestamp), but the PK is an id from a sequence. They are almost, but
not exactly in the same order. And I don't think that moving the PK to
be (start_time, id) just because of "partitioning on PK only" would be a
good design in any way.

So please don't design the system to partition on PK only.
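
For what it's worth, a minimal sketch of the kind of manual partitioning
I mean, using inheritance plus CHECK constraints (names are made up):

CREATE TABLE events (
    id          serial PRIMARY KEY,
    start_time  timestamp NOT NULL,
    payload     text
);

CREATE TABLE events_2005_03 (
    CHECK (start_time >= '2005-03-01' AND start_time < '2005-04-01')
) INHERITS (events);

CREATE INDEX events_2005_03_start_idx ON events_2005_03 (start_time);
-- note: the PRIMARY KEY is not inherited, so uniqueness of id is only
-- enforced per partition - exactly the global-index problem above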

-- 
Hannu Krosing <[EMAIL PROTECTED]>

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] What needs to be done for real Partitioning?

2005-03-22 Thread Hannu Krosing
On L, 2005-03-19 at 19:03 -0500, Tom Lane wrote:
> Josh Berkus  writes:
> >>> -- INSERT INTO should automatically create new partitions where necessary
> >>> -- DELETE FROM should automatically drop empty partitions
> >> 
> >> I am not sure I agree with either of those, and the reason is that they
> >> would turn low-lock operations into high-lock operations.  
> 
> > For INSERT, I think that's a problem we need to work through.
> 
> Possibly, but I'm concerned about locking and deadlock issues.  The
> reason that this is iffy is you would start the operation with only
> an INSERT-grade lock, and then discover that you needed to add a
> partition, which is surely something that needs an exclusive-grade
> lock (consider two sessions trying to add the same partition at the
> same time).  So I don't see how to do it without lock upgrading,
> and lock upgrading is always a recipe for deadlocks.
> 
> The DELETE case is even worse because you can't physically release
> storage until you're sure nothing in it is needed anymore by any open
> transaction --- that introduces VACUUM-like issues as well as the
> deadlock problem.
> 

If we go with my proposal (other post in this thread) of doing most of
the partitioning at the level between the logical file and the physical
1Gb storage files, then adding a partition should be nearly the same as
crossing the 1G boundary is now.

Removing a partition would be just plain vacuum (if we can make pg
shrink each 1G subfile independently).

> ---(end of broadcast)---
> TIP 8: explain analyze is your friend
-- 
Hannu Krosing <[EMAIL PROTECTED]>

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] What needs to be done for real Partitioning?

2005-03-22 Thread Hannu Krosing
On L, 2005-03-19 at 12:02 -0800, Josh Berkus wrote:
> Folks,
> 
> I may (or may not) soon have funding for implementing full table partitioning 
> in PostgreSQL. 

If you don't get it, contact me as there is a small possibility that I
know a company interested enough to fund (some) of it :)

> I thought it would be a good idea to discuss with people here 
> who are already using pseudo-partitioning what things need to be added to 
> Postgresql in order to make full paritioning a reality; that is, what do 
> other databases do that we don't?

As these are already discussed in this thread, I'll try to outline a
method of providing a global index (unique or not) in a way that will
still make it possible to quickly remove (and not-quite-so-quickly add)
a partition.

The structure is inspired by the current way of handling >1Gb tables.

As each tid contains a 32 bit page pointer, we have a pointer space of
35184372088832 bytes per index (4G of 8k pages). Currently this is
mapped directly to 1Gbyte / 128k-page files, but we could, with minimal
changes to indexes, put a lookup table between the index and the page
lookup.

In case of a global index over partitions this table could point to 1G
subtables from different partition tables.

Dropping a partition table can also be fast - just record its pages in
the lookup table as deleted - that means one change per 1G of dropped
table. The next vacuum should free the pointers to deleted subfiles.

Adding partitions is trickier -

If the added table forms part of the partitioning index (say names from
C to E), and there is a matching index on the subtable,

then that part of the btree can probably be copied into the main btree
index as a tree branch, which should be relatively fast (compared to
building it one tid at a time).

Otherwise, adding to the index could probably also be sped up by some
kind of index merge - faster than building from scratch but slower than
the above.


To repeat - the global index over a partitioned table would have the same
structure as our current b-tree index, only with an added map of 128k
index partitions to 1G subfiles of (possibly different) tables. This map
would be quite small - for 1TB of data it would be only 1k entries - it
will fit in cache on all modern processors and thus should add only a
tiny slowdown compared to the current direct tid.page/128k method.

-- 
Hannu Krosing <[EMAIL PROTECTED]>

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] What needs to be done for real Partitioning?

2005-03-22 Thread Hannu Krosing
On P, 2005-03-20 at 00:52 +0100, PFC wrote:
> > tableoid would accomplish that already, assuming that the "partitioned
> > table" is effectively a view on separate physical tables.
> >
> > regards, tom lane
> 
>   Very good.
> 
>   Also note the possibility to mark a partition READ ONLY. Or even a 
> table.
>   It does not seem very useful but just think that for instance the 
> "1999",  
> "2000" ... "2004" partitions of a big archive probably never change.  
> READLONY means we're sure they never change, thus no need to backup them  
> every time. Keeping the example of some DB arranged by years / current  
> year / current month, Just backup the "current month" part every day and  
> the "current year" every month when you switch partitions.
>   This could be achieved also by storing the time of last modification of 
> a  
> table somewhere.

Would we still need regular VACUUMing of read-only table to avoid 
OID-wraparound ?

-- 
Hannu Krosing <[EMAIL PROTECTED]>

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] What needs to be done for real Partitioning?

2005-03-22 Thread Hannu Krosing
On E, 2005-03-21 at 09:55 -0800, Josh Berkus wrote:
> Stacy,
> 
> > Luckily they that had the chance to work with a truly fantastic DBA (the
> > author of an Oracle Press performance tuning book even) before they could
> > switch back.  He convinced them to make some of their indexes global.
> > Performance dramatically improved (compared with both the unpartitioned
> > schema, and the partitioned-and-locally-indexed schema), and they've since
> > stayed with partitioned tables and a mix of local and global indexes.
> 
> Hmmm.  Wouldn't Greg's suggestion of a bitmap index which holds information 
> on 
> what values are found in what partition also solve this?Without 1/2 of 
> the overhead imposed by global indexes?
>
> I can actually see such a bitmap as being universally useful to the 
> partitioning concept ... for one, it would resolve the whole "partition on 
> {value}" issue.

I once (maybe about a year ago) tried to elaborate on using bitmap
index(es) with page granularity as a tool for simultaneous clustering
and lookup for data warehousing with postgres. The main idea was to
determine the storage location from an AND of all "clustered" bitmap
indexes, giving correspondingly fast and clustered lookups.

This could/should/maybe :) possibly be combined with clustering as well.

-- 
Hannu Krosing <[EMAIL PROTECTED]>

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] One tuple per transaction

2005-03-18 Thread Hannu Krosing
On L, 2005-03-12 at 14:05 -0800, Josh Berkus wrote:
> Tambet,
> 
> > In one of our applications we have a database function, which
> > recalculates COGS (cost of good sold) for certain period. This involves
> > deleting bunch of rows from one table, inserting them again in correct
> > order and updating them one-by-one (sometimes one row twice) to reflect
> > current state. The problem is, that this generates an enormous amount of
> > tuples in that table.
> 
> Sounds like you have an application design problem ...  how about re-writing 
> your function so it's a little more sensible?

Also, you could at least use a temp table for intermediate steps. This
will at least save WAL traffic.
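
Something along these lines (table and column names are made up):

BEGIN;
CREATE TEMP TABLE cogs_work AS
    SELECT * FROM cogs WHERE period = '2005-02';
-- do the delete / re-insert / repeated one-by-one updates on cogs_work,
-- which generates no WAL and no dead tuples in the real table
DELETE FROM cogs WHERE period = '2005-02';
INSERT INTO cogs SELECT * FROM cogs_work;
COMMIT;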

-- 
Hannu Krosing <[EMAIL PROTECTED]>

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-26 Thread Hannu Krosing
On Tue, 2005-01-25 at 10:41 -0500, Tom Lane wrote:
> Hannu Krosing <[EMAIL PROTECTED]> writes:
> > Why is removing index entries essential ?
> 
> Because once you re-use the tuple slot, any leftover index entries would
> be pointing to the wrong rows.

That much I understood ;)

But can't cleaning up the index be left for "later" ?

An indexscan has to check the data tuple anyway, at least for visibility.
Would adding a check for field sameness between index and data tuples be
too big a performance hit ?

>   regards, tom lane
> 
> ---(end of broadcast)---
> TIP 5: Have you checked our extensive FAQ?
> 
>        http://www.postgresql.org/docs/faq
-- 
Hannu Krosing <[EMAIL PROTECTED]>

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-25 Thread Hannu Krosing
On Thu, 2005-01-20 at 11:02 -0500, Rod Taylor wrote:


> Slony has some other issues with databases > 200GB in size as well
> (well, it hates long running transactions -- and pg_dump is a regular
> long running transaction)

IIRC it hates pg_dump mainly on the master. If you are able to run pg_dump
from a slave, it should be ok.

-- 
Hannu Krosing <[EMAIL PROTECTED]>

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-25 Thread Hannu Krosing
On Thu, 2005-01-20 at 16:00 +0100, Hervé Piedvache wrote:

> > Will both do what you want. Replicator is easier to setup but
> > Slony is free.
> 
> No ... as I have said ... how I'll manage a database getting a table of may 
> be 
> 250 000 000 records ? I'll need incredible servers ... to get quick access or 
> index reading ... no ?
> 
> So what we would like to get is a pool of small servers able to make one 
> virtual server ... for that is called a Cluster ... no ?
> 
> I know they are not using PostgreSQL ... but how a company like Google do to 
> get an incredible database in size and so quick access ?

They use lots of boxes and lots of custom software to implement a very
specific kind of cluster.

> regards,
-- 
Hannu Krosing <[EMAIL PROTECTED]>

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-25 Thread Hannu Krosing
On Sun, 2005-01-23 at 15:40 -0500, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > Changing the idea slightly might be better: if a row update would cause
> > a block split, then if there is more than one row version then we vacuum
> > the whole block first, then re-attempt the update.
> 
> "Block split"?  I think you are confusing tables with indexes.
> 
> Chasing down prior versions of the same row is not very practical
> anyway, since there is no direct way to find them.
> 
> One possibility is, if you tried to insert a row on a given page but
> there's not room, to look through the other rows on the same page to see
> if any are deletable (xmax below the GlobalXmin event horizon).  This
> strikes me as a fairly expensive operation though, especially when you
> take into account the need to get rid of their index entries first.

Why is removing index entries essential ?

In pg you always have to visit the data page, so finding the wrong tuple
there could just produce the same result as a deleted tuple (which in this
case it actually is). The cleaning of index entries could be left to the
real vacuum.

-- 
Hannu Krosing <[EMAIL PROTECTED]>

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-25 Thread Hannu Krosing
On Mon, 2005-01-24 at 11:52 +0900, Tatsuo Ishii wrote:
> > Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> > > Probably VACUUM works well for small to medium size tables, but not
> > > for huge ones. I'm considering about to implement "on the spot
> > > salvaging dead tuples".
> > 
> > That's impossible on its face, except for the special case where the
> > same transaction inserts and deletes a tuple.  In all other cases, the
> > transaction deleting a tuple cannot know whether it will commit.
> 
> Of course. We need to keep a list of such that tuples until commit or
> abort.

What about other transactions, which may have started before the current
one and still be running when the current one commits ?


I once proposed an extra parameter for VACUUM FULL which determines how
much free space to leave in each vacuumed page. If there were room, the
new tuple could be placed near the old one in most cases and thus avoid
lots of disk head movement when updating huge tables in one go.



Hannu Krosing <[EMAIL PROTECTED]>

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] pg_restore taking 4 hours!

2004-12-07 Thread Hannu Krosing
On P, 2004-12-05 at 21:43, Rodrigo Carvalhaes wrote:
> Hi !
> 
> Thanks for the lots of tips that I received on this matter.
> 
...
> There is something more that I can try to improve this performance?

Check the speed of your IDE drive; maybe tweak some params with
/sbin/hdparm. Sometimes the defaults result in 2MB/sec r/w speeds
(instead of 30-70 MB/sec).


Hannu


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] inconsistent/weird index usage

2004-10-02 Thread Hannu Krosing
On R, 2004-10-01 at 19:34, Tom Lane wrote:
> Josh Berkus <[EMAIL PROTECTED]> writes:
> >> Most of the problem here comes from the fact that "current_date - 7"
> >> isn't reducible to a constant and so the planner is making bad guesses
> >> about how much of each table will be scanned.  
> 
> > I thought this was fixed in 7.4.  No?
> 
> No.  It's not fixed as of CVS tip either, although there was some talk
> of doing something in time for 8.0.

That's weird - my 7.4.2 databases did not consider (now()-'15
min'::interval) to be a constant whereas 7.4.5 does (i.e. it does use an
index scan on an index on a datetime column).

Is this somehow different for date types ?
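
One way to see the difference is to compare the plans for the computed
expression against an explicit literal (table and column names here are
hypothetical):

EXPLAIN SELECT * FROM log WHERE ts > current_date - 7;    -- planner must guess selectivity
EXPLAIN SELECT * FROM log WHERE ts > '2004-09-24'::date;  -- constant, so statistics are used
EXPLAIN SELECT * FROM log WHERE ts > now() - '15 min'::interval;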

--
Hannu

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Slow response of PostgreSQL

2004-02-19 Thread Hannu Krosing
On Thu, 2004-02-19 at 11:01, Saleem Burhani Baloch wrote:
> Hi,
> 
> Thanks every one for helping me. I have upgraded to 7.4.1 on 
> redhat 8 ( rh 9 require a lot of lib's) and set the configuration 
> sent by Chris. Now the query results in 6.3 sec waooo. I m thinking 
> that why the 7.1 process aggregate slowly. Anyway.
> 
> I still have to go for 2 sec result 

What is the plan now ?


Hannu


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] 7.3 vs 7.4 performance

2004-02-06 Thread Hannu Krosing
On Thu, 2004-02-05 at 07:32, Christopher Browne wrote:
> Oops! [EMAIL PROTECTED] (Orion Henry) was seen spray-painting on a wall:
> > Oh... as a side note I'm happy to announce that the 2.6 Linux kernel
> > has more than DOUBLED the speed of all my Postgres queries over the
> > 2.4. =)
> 
> I did some heavy-transaction-oriented tests recently on somewhat
> heftier quad-Xeon hardware, and found little difference between 2.4
> and 2.6, and a small-but-quite-repeatable advantage with FreeBSD 4.9.
> Now, I'm quite sure my load was rather different from yours, but I
> find the claim of doubling of speed rather surprising.

perhaps you were just IO-bound while he was not ?

or starving on some locks ?

-
Hannu


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] 7.3 vs 7.4 performance

2004-02-06 Thread Hannu Krosing
On Thu, 2004-02-05 at 07:16, Orion Henry wrote:
> I've done some testing of 7.3.4 vs 7.4.1 and found 7.4.1 to be 20%-30%
> slower than 7.3.4.  Is this common knowledge or am I just unlucky with
> my query/data selection?
> 
> Things of note that might matter: the machine is a dual Opteron 1.4GHz
> running Fedora Core 1 Test 1 for X86_64.  The 7.3.4 was from the Fedora
> distro and the 7.4.1 was the PGDG package.

Are you sure that it is not the case that 7.3.4 is 64 bit and the PGDG
package is 32 ?

> The database is 3.5 Gigs with 10 millions rows and the machine had 1 Gig or ram.
> 
> Oh... as a side note I'm happy to announce that the 2.6 Linux kernel has
> more than DOUBLED the speed of all my Postgres queries over the 2.4. =)

Is this on this same hardware ?

-
Hannu


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] database performance and query performance question

2004-01-22 Thread Hannu Krosing
On Thu, 2004-01-22 at 23:32, Shea,Dan [CIS] wrote:
> The end date in the previous example was actually invalid  between
> '2004-01-12'::date and '2003-01-12'::date;
> There have been multiple inserts since I recreated the index but it took
> quite some time to complete the following
> PWFPM_DEV=# explain analyze select * from forecastelement where valid_time
> between '2004-01-12'::date and '2004-01-13'::date;

You could try ORDER BY to bias the optimiser towards using an index:

explain analyze
 select *
  from forecastelement
 where valid_time > '2004-01-12'::date
 order by valid_time
 limit 10;

This also may be more close to what you are expecting :)

--
Hannu


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] database performance and query performance question

2004-01-22 Thread Hannu Krosing
On Thu, 2004-01-22 at 22:46, Hannu Krosing wrote:
> On Thu, 2004-01-22 at 22:35, Shea,Dan [CIS] wrote:
> > Something that I do not understand is why if you use a valid_time =
> > '2004-01-22 00:00:00' the query will use the index but if you do a
> > valid_time >  '2004-01-22 00:00:00' it does not use the index?
> 
> It probably can't tell if > is selective enough to justify using index.
> 
> Together with "limit 10" it may be.
> 
> You could try 
> 
> explain analyze select * from forecastelement where valid_time between 
> '2004-01-22'::date and '2004-01-22'::date limit 10;

Sorry, that should have been:

between '2004-01-22'::date and '2004-01-23'::date


> to see if this is considered good enough.
> 
> --
> Hannu
> 
> 
> ---(end of broadcast)---
> TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] database performance and query performance question

2004-01-22 Thread Hannu Krosing
On Thu, 2004-01-22 at 22:35, Shea,Dan [CIS] wrote:
> Something that I do not understand is why if you use a valid_time =
> '2004-01-22 00:00:00' the query will use the index but if you do a
> valid_time >  '2004-01-22 00:00:00' it does not use the index?

It probably can't tell if > is selective enough to justify using index.

Together with "limit 10" it may be.

You could try 

explain analyze select * from forecastelement where valid_time between 
'2004-01-22'::date and '2004-01-22'::date limit 10;

to see if this is considered good enough.

--
Hannu


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] failures on machines using jfs

2004-01-13 Thread Hannu Krosing
On Mon, 2004-01-12 at 19:03, Greg Spiegelberg wrote:
> Hannu Krosing wrote:
> > On Sun, 2004-01-11 at 18:21, Spiegelberg, Greg wrote:
> > 
> >>It would seem we're experiencing somthing similiar with our scratch
> >>volume (JFS mounted with noatime).
> > 
> > 
> > Which files/directories do you keep on "scratch" volume ?
> > 
> > All postgres files or just some (WAL, tmp) ?
> 
> No Postgres files are kept in scratch only the files being loaded
> into the database via COPY or lo_import.

Then the speedup does not make any sense !

Is reading from a jfs filesystem also 5 times faster than reading from
ext3 ?

The only explanation I can give for filling the database from a jfs
volume being so much faster would be some strange filesystem cache
interaction.


Hannu



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] failures on machines using jfs

2004-01-12 Thread Hannu Krosing
On Sun, 2004-01-11 at 18:21, Spiegelberg, Greg wrote:
> It would seem we're experiencing somthing similiar with our scratch
> volume (JFS mounted with noatime).

Which files/directories do you keep on "scratch" volume ?

All postgres files or just some (WAL, tmp) ?

-
Hannu


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Excessive rows/tuples seriously degrading query

2003-12-16 Thread Hannu Krosing
On Sat, 2003-12-13 at 00:40, Chadwick, Russell wrote:
>  
> Hello everyone.
> Can anyone explain why this table which has never had more than a
> couple rows in it shows > 500k in the query planner even after running
> vacuum full.  Its terribly slow to return 2 rows of data.  The 2 rows
> in it are being updated a lot but I couldn't find any explanation for
> this behavior. 

It can be that there is an idle transaction somewhere that has locked a
lot of rows (i.e. all your updates have been running inside the same
transaction for hours or days).

try: 
$ ps ax| grep post

on my linux box this gives

 1683 ?        S      0:00 /usr/bin/postmaster -p 5432
 1704 ?        S      0:00 postgres: stats buffer process
 1705 ?        S      0:00 postgres: stats collector process
 5520 ?        S      0:00 postgres: hu hannu [local] idle in transaction
 5524 pts/2    S      0:00 grep post

where backend 5520 seems to be the culprit.
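
The same thing can be seen from inside the database, assuming
stats_command_string is enabled (column names below are the pre-9.2 ones;
newer releases use pid/state/query instead):

SELECT procpid, usename, current_query
  FROM pg_stat_activity
 WHERE current_query LIKE '<IDLE> in transaction%';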

>  Anything I could try besides droping db and recreating?  

make sure that no other backend is connected to the db and do your
   vacuum full; analyze;


or if there seems to be something unidentifiable making your table
unusable, then just recreate that table:

begin;
create table stock_log_positions_tmp 
as select * from stock_log_positions;
drop table stock_log_positions;
alter table stock_log_positions_tmp
 rename to stock_log_positions;
-- if you have any constraints, indexes or foreign keys
-- then recreate them here as well
commit;

> Thanks - Russ
>  
---
hannu


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] cross table indexes or something?

2003-11-26 Thread Hannu Krosing
On Wed, 2003-11-26 at 22:14, Jeremiah Jahn wrote:
> I was wondering if there is something I can do that would act similar to
> a index over more than one table. 
> 
> I have about 3 million people in my DB at the moment, they all have
> roles, and many of them have more than one name. 
> 
> for example, a Judge will only have one name, but a Litigant could have
> multiple aliases. Things go far to slow when I do a query on a judge
> named smith.

If you don't need all the judges named smith you could try to use LIMIT.

Have you run ANALYZE ? Why does the DB think that there is only one judge
with a name like SMITH% ?

-
Hannu

P.S. 
Always send EXPLAIN ANALYZE output if asking for advice on [PERFORM]

-
Hannu

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] why index scan not working when using 'like'?

2003-11-26 Thread Hannu Krosing
On Tue, 2003-11-25 at 23:29, Tom Lane wrote:
> Josh Berkus <[EMAIL PROTECTED]> writes:
> > In regular text fields containing words, your problem is solvable with full 
> > text indexing (FTI).   Unfortunately, FTI is not designed for arbitrary 
> > non-language strings.  It could be adapted, but would require a lot of 
> > hacking.
> 
> I'm not sure why you say that FTI isn't a usable solution.  As long as
> the gene symbols are separated by whitespace or some other non-letters
> (eg, "foo mif bar" not "foomifbar"), I'd think FTI would work.

If he wants to search on an arbitrary substring, he could change the
tokeniser in FTI to produce trigrams, so that "foomifbar" would be
indexed as if it were the text "foo oom omi mif ifb fba bar". A search
for something like %mifb% would then first do an FTI search for "mif"
AND "ifb" and then a simple LIKE %mifb% to weed out false matches like
"mififb".

There are ways to use trigrams for 1 and 2 letter matches as well.
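
A quick sketch of such a tokeniser as a plpgsql set-returning function
(written with 8.0-style dollar quoting for readability; something similar
is packaged as the pg_trgm contrib module):

CREATE OR REPLACE FUNCTION trigrams(txt text) RETURNS setof text AS $$
DECLARE
    i int;
BEGIN
    IF length(txt) < 3 THEN
        RETURN NEXT txt;              -- too short: index the whole string
        RETURN;
    END IF;
    FOR i IN 1 .. length(txt) - 2 LOOP
        RETURN NEXT substr(txt, i, 3);
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

SELECT * FROM trigrams('foomifbar');  -- foo, oom, omi, mif, ifb, fba, bar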

-
Hannu


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Help with count(*)

2003-11-14 Thread Hannu Krosing
On Fri, 2003-11-14 at 16:13, Christopher Browne wrote:
> Martha Stewart called it a Good Thing when [EMAIL PROTECTED] (Rajesh Kumar Mallah) 
> wrote:
> > INFO:  "profiles": found 0 removable, 369195 nonremovable row versions in 43423 
> > pages
> > DETAIL:  246130 dead row versions cannot be removed yet.
> > Nonremovable row versions range from 136 to 2036 bytes long.
> 
> It seems as though you have a transaction open that is holding onto a
> whole lot of old rows.
> 
> I have seen this happen somewhat-invisibly when a JDBC connection
> manager opens transactions for each connection, and then no processing
> happens to use those connections for a long time.  The open
> transactions prevent vacuums from doing any good...

Can't the backend be made to delay the "real" start of transaction until
the first query gets executed ?


Hannu


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Interbase/Firebird - any users out there - what's

2003-11-06 Thread Hannu Krosing
On Thu, 2003-11-06 at 20:23, Christopher Browne wrote:
> "Private" <[EMAIL PROTECTED]> writes:
> > Try this benchmark on PostgreSQL, MySQL, FireBird, Oracle:
> >
> > http://go.jitbot.com/dbbench-pg-fb-mys-orcl
> 
> It looks like a good candidate for adding in a plpgsql stored
> procedure to get similar speedups to what was gotten with the Oracle
> benchmark.

It would also be interesting to see the same test run on Postgresql on
Linux/UNIX. PgSQL on win2000 (most likely using cygwin) is probably not
the best you can get out of that hardware.

---
Hannu


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Slow performance with no apparent reason

2003-10-26 Thread Hannu Krosing
On Sun, 2003-10-26 at 00:25, Yonatan Goraly wrote:
> I am in the process of  adding PostgreSQL support for an application,
> in addition to Oracle and MS SQL.
> I am using PostgreSQL version 7.3.2, Red Hat 9.0 on Intel Pentium III
> board.
> 
> I have a query that generally looks like this:
> 
> SELECT t1.col1, t2.col1 FROM t1, t2 WHERE t1.x=t2.y AND t2.p='string'
> AND t2.q=1
> 
> This query is strikingly slow (about 100 sec when both t1 and t2 has
> about 1,200 records, compare with less than 4 sec with MS SQL and
> Oracle)

Always send the results of EXPLAIN ANALYZE if you ask for help on [PERFORM].

Knowing which indexes you have would also help.

And you should have run ANALYZE too.

-
Hannu


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Performance Concern

2003-10-25 Thread Hannu Krosing
On Sat, 2003-10-25 at 12:56, John Pagakis wrote:

> I wrote a JAVA simulation of the above that did 1000 updates in 37 seconds.
> That left me scratching my head because in psql when I did the
> semi-equivalent:
> 
> UPDATE baz SET customer_id = '1234' WHERE baz_key IN( SELECT baz_key FROM
> baz WHERE customer_id IS NULL LIMIT 1000 );

try it this way, maybe it will start using an index :

UPDATE baz
   SET customer_id = '1234'
 WHERE baz_key IN (
SELECT baz_key
  FROM baz innerbaz
 WHERE customer_id IS NULL
   and innerbaz.baz_key = baz.baz_key
 LIMIT 1000 );

you may also try to add a conditional index to baz:

CREATE INDEX baz_key_with_null_custid_nxd
    ON baz (baz_key)
 WHERE customer_id IS NULL;

to make the index access more efficient.


Hannu


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Performance Concern

2003-10-25 Thread Hannu Krosing
On Sat, 2003-10-25 at 10:16, John Pagakis wrote:
> Christopher -
> Thanks.
> 
> Answer 1:
> I believe auto commit was off (but I'm not at my dev box right now).  I'll
> double-check that and the commit interval.
> 
> Answer 2:
> Ah ha!!  No indexes on FKs.  I'll try that.
> 
> Yes, each baz is a uniquely identifiable.  I had started a SP to create gen
> the key but scrapped it when I saw no rand() function in pgpsql.  Did I miss
> something?

hannu=# select random();
      random
------------------
 0.59924242859671
(1 row)


\df lists all available functions in psql

to generate string keys you could use something like:

hannu=# select 'key' || to_hex(cast(random()*1000000000 as int));
   ?column?
-------------
 key1e22d8ea
(1 row)

-
Hannu


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Performance Concern

2003-10-24 Thread Hannu Krosing
On Fri, 2003-10-24 at 22:10, Christopher Browne wrote:

> That might be something of an improvement, but it oughtn't be
> cripplingly different to use a text field rather than an integer.

I suspect his slowness comes from not running analyze when it would be
time to start using indexes for fk checks - if you run analyze on an
empty table and then do 10000 inserts, then all these will run their
checks using seqscan, as this is the fastest way to do it on an empty
table ;)
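
A sketch of a load order that avoids that trap (table names are made up
here; IIRC the fk-check plans are cached per backend, so reconnecting
after the ANALYZE helps too):

COPY customers FROM '/tmp/customers.copy';
ANALYZE customers;            -- now the referenced table has realistic stats
-- reconnect here if this session already ran fk checks against customers
COPY orders FROM '/tmp/orders.copy';
ANALYZE orders;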

> What's crippling is submitting 100,000 queries in 100,000
> transactions.  Cut THAT down to size and you'll see performance return
> to being reasonable.

even this should not be too crippling.

I once did some testing for insert performance and got about 9000
inserts/sec on 4 CPU Xeon with 2GB ram and RAID-5 (likely with battery
backed cache).

This 9000 dropped to ~250 when I added a primary key index (to a
60.000.000 record table, so that the pk index fit only partly in
memory), all this with separate transactions, but with many clients
running concurrently. (btw., the clients were not java/JDBC but
Python/psycopg)


With just one client you are usually stuck to 1 trx/disk revolution, at
least with no battery-backed write cache.

Even 250/sec should insert 10000 in 40 sec.

--
Hannu


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Low Insert/Update Performance

2003-10-20 Thread Hannu Krosing
On Mon, 2003-10-20 at 17:13, Rhaoni Chiu Pereira wrote:
> Hi List,
> 
>I got a P4 1.7Ghz , 512MB RAM , HD 7200 RPM, on RED HAT 9 running PostgreSQL 
> 7.3.2-3 Database.
>I have a Delphi aplication that updates the Oracle database using .dbf 
> file's information  ( converting the data from the old clipper aplication ) and 
> it takes about 3min and 45 seconds to update Jan/2003 .

Have you tried contrib/dbase to do the same ?

How fast does this run ?

> My problem is that I must substitute this Oracle for  a PostgreSQL database 
> and this same Delphi aplication takes 45 min to update Jan/2003.
> All delphi routines are converted and optmized to work with PgSQL.

Could it be that you try to run each insert in a separate transaction in
the PgSQL version ?

Another possibility is that there is a primary key index created on
empty tables which is not used in subsequent UNIQUE tests once the tables
start to fill and using the index would be useful. An ANALYZE in a
parallel backend could help here. The same can be true for foreign keys
and unique constraints.
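
A minimal sketch of what batching the converted .dbf rows could look like
(table and column names are made up):

BEGIN;
INSERT INTO movements (id, amount) VALUES (1, 10.00);
INSERT INTO movements (id, amount) VALUES (2, 12.50);
-- ... the rest of the month's rows ...
COMMIT;
ANALYZE movements;   -- so later UNIQUE/FK checks can switch to index scans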

---
Hannu


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] PostgreSQL data on a NAS device ?

2003-10-20 Thread Hannu Krosing
On Mon, 2003-10-20 at 16:04, Alexander Priem wrote:
> Even better than the four-disk NAS I mentioned earlier is the following:
> 
> Promise UltraTrak RM8000. This is a so-called SCSI-to-IDE RAID system.

While you are at it, you could also check out http://www.3ware.com/

I guess one of these with 10000 rpm 36GB SATA drives would be pretty
fast and possibly cheaper than SCSI raid.

--
Hannu

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] PostgreSQL data on a NAS device ?

2003-10-20 Thread Hannu Krosing
On Mon, 2003-10-20 at 15:29, Alexander Priem wrote:
> Thanks for your reply, Jeff.
> 
> If we are going to use a NAS device for storage, then it will be attached
> through a gigabit ethernet connection. Fiber will not be an option, since
> that would negate the savings we can make by using an IDE NAS device instead
> of SCSI-RAID, fiber's pretty expensive, right?
> 
> Using a NAS device (that is used only by PostgreSQL, so it's dedicated) with
> 3Gb of RAM and four 7200 rpm IDE harddisks, connected using a gigabit
> ethernet connection to the PostgreSQL server, do you think it will be a
> match for a SCSI-RAID config using 4 or 6 15000rpm disks (RAID-10) through a
> SCSI-RAID controller having 128mb of writeback cache (battery-backed)?

I sincerely don't know.

But if NAS is something that involves TCP (like iSCSI) then you should
take a look at a network card and TCP/IP stack that offloads the
protocol processing to a coprocessor on the network card (or just have
some extra processors free to do the protocol processing).

---
Hannu


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Index/Foreign Key Question

2003-10-12 Thread Hannu Krosing
On Sun, 2003-10-12 at 22:28, Andrew Sullivan wrote:
> On Fri, Oct 10, 2003 at 09:01:12PM -0500, Ron Johnson wrote:
> > 
> > > Does PostgreSQL only pick one index per table on the select statements?
> > 
> > That's it's preference.
> 
> As far as I know, that's all it can do.  Do you know something
> different?

Tom has mentioned the possibility of using bitmaps as an intermediate
step; this would make star joins much faster as we could AND all the
index info and actually examine only tuples that match all indexes.

None of it is done yet, AFAIK.

---
Hannu


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Compare rows

2003-10-09 Thread Hannu Krosing
On Thu, 2003-10-09 at 08:36, Josh Berkus wrote:
> Chris,

> > The need to do a lot of joins would likely hurt performance somewhat,
> > as well as the way that it greatly increases the number of rows.
> > Although you could always split it into several tables, one for each
> > "value_type", and UNION them into a view...
> 
> It increases the number of rows, yes, but *decreases* the storage size of data 
> by eliminating thousands ... or millions ... of NULL fields. 

I'm not sure I buy that.

Null fields take exactly 1 *bit* to store (or more exactly, if you have
any null fields in a tuple then one 32bit int per each 32 fields is used
for the NULL bitmap), whereas the same fields in a "vertical" table take
4 bytes for the primary key and 1-4 bytes for the category key + a tuple
header per value + the necessary indexes. So if you have more than one
non-null field per tuple you will certainly lose in storage.
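
One way to check the claim on a concrete schema is to compare relpages
after a VACUUM (table names here are hypothetical):

VACUUM ANALYZE wide_table;       -- 642 mostly-NULL columns, 100,000 rows
VACUUM ANALYZE vertical_table;   -- (id, value_type, value), 8 million rows
SELECT relname, relpages, reltuples
  FROM pg_class
 WHERE relname IN ('wide_table', 'vertical_table');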

> How would splitting the vertical values into dozens of seperate tables help things?

If you put each category in a separate table you save 1-4 bytes for the
category per value, but still store the primary key and tuple header *per
value*.

You may still get better performance for single-column comparisons as
fewer pages must be touched.

> Personally, I'd rather have a table with 3 columns and 8 million rows than a 
> table with 642 columns and 100,000 rows.  Much easier to deal with.

Same here ;)

--
Hannu


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] COUNT(*) again (was Re: [HACKERS] Index/Function

2003-10-04 Thread Hannu Krosing
On Sat, 2003-10-04 at 19:07, Tom Lane wrote:
> Hannu Krosing <[EMAIL PROTECTED]> writes:
> > On Fri, 2003-10-03 at 00:57, Christopher Browne wrote:
> >> A while back I outlined how this would have to be done, and for it to
> >> be done efficiently, it would be anything BUT simple.  
> 
> > Could this be made a TODO item, perhaps with your attack plan. 
> 
> If I recall that discussion correctly, no one including Christopher
> thought the attack plan was actually reasonable.
> 
> What this keeps coming down to is that an optimization that helps only
> COUNT(*)-of-one-table-with-no-WHERE-clause would be too expensive in
> development and maintenance effort to justify its existence.

Please read further in my email ;)

The point I was trying to make was that a faster count(*) is just a side
effect. If we could (conditionally) keep visibility info in indexes,
then this would also solve the much trickier question of
index-structured tables.

Count(*) is *not* the only query that could benefit from not needing to
go to the actual data table for visibility info. The much more needed
case would be the "inverse time series" type of queries, which would
otherwise trash cache pages badly.


Hannu


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] [HACKERS] Index/Function organized table layout (from Re:

2003-10-04 Thread Hannu Krosing
On Fri, 2003-10-03 at 00:57, Christopher Browne wrote:
> [EMAIL PROTECTED] (Jean-Luc Lachance) writes:
> > That's one of the draw back of MVCC.  
> > I once suggested that the transaction number and other house keeping
> > info be included in the index, but was told to forget it...
> > It would solve once and for all the issue of seq_scan vs index_scan.
> > It would simplify the aggregate problem.
> 
> It would only simplify _one_ case, namely the case where someone cares
> about the cardinality of a relation, and it would do that at
> _considerable_ cost.
> 
> A while back I outlined how this would have to be done, and for it to
> be done efficiently, it would be anything BUT simple.  

Could this be made a TODO item, perhaps with your attack plan?
Of course as a strictly optional feature, useful only for special
situations (see below).

I cross-post this to [HACKERS] as it seems relevant to a problem recently
discussed there.

> It would be very hairy to implement it correctly, and all this would
> cover is the single case of "SELECT COUNT(*) FROM SOME_TABLE;"

Not really. Just yesterday there was a discussion on [HACKERS] about
implementing btree-organized tables, which would be much less needed if
the visibility info were kept in indexes. 

> If you had a single WHERE clause attached, you would have to revert to
> walking through the tuples looking for the ones that are live and
> committed, which is true for any DBMS.

If the WHERE clause could use the same index (or any index with
visibility info) then there would be no need for "walking through the
tuples" in data relation.

The typical use case cited on [HACKERS] was time series data, where
inserts are roughly in (timestamp,id) order but queries are in
(id,timestamp) order. Now if the index included all relevant fields
(id,timestamp,data1,data2,...,dataN) then the query could run on the
index only, touching just a few pages and thus vastly improving
performance. I agree that this is not something everybody needs, but
when it is needed it is needed badly.

> And it still begs the same question, of why the result of this query
> would be particularly meaningful to anyone.  I don't see the
> usefulness; I don't see the value of going to the considerable effort
> of "fixing" this purported problem.

Being able to do fast count(*) is just a side benefit.


Hannu


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] inferior SCSI performance

2003-10-02 Thread Hannu Krosing
On Wed, 2003-10-01 at 19:21, Christopher Browne wrote:

> 
> The FS-related result appeared surprising, as the "stories" I had
> heard suggested that JFS hadn't been particularly heavily tuned on
> Linux, whereas XFS was supposed to be the "speed demon."

Gentoo linux recommends XFS only for SAN+fibre channel + good ups for
anything but database use ;)

> It is entirely possible that the result I saw was one that would
> reverse partially or even totally on a system LACKING that cache.  XFS
> might "play better" when we're cacheless; the (perhaps only fabled)
> demerits of JFS being more than totally hidden if we add the cache.
> 
> What I find disappointing is that it isn't possible to get SSD cards
> that are relatively inexpensive.  A similarly fabulous performance
> increase _ought_ to be attainable if you could stick pg_xlog and
> pg_clog on a 256MB (or bigger!) battery-backed SSD, ideally one that
> plugs into a PCI slot.

For really cheap and for small-size transactions you could experiment
with USB2 memory sticks (some of them claim 34MB/s write speed), perhaps
in striped/mirrored configuration. You would just need something
counting writes in the driver layer to alert you when you are reaching
the x00k "writes" limit and have to plug in new sticks :)

OTOH, articles I found through quick googling suggest only 2.4MB/s write
and 7MB/s read speeds for USB 2.0 memory sticks, so the 34MB/s is probably
just a sales pitch and refers to bus speed, not actual write speed ;(

> Unfortunately, while there are companies hawking SSDs, they are in the
> "you'll have to talk to our salescritter for pricing" category, which
> means that they must be ferociously expensive.  :-(.

The cheapest I found, one with external backup power, was ~1.8k$
for a 2GB PCI device:

http://www.cdw.com/shop/search/Results.aspx?key=platypus&x=0&y=0

An external 16GB one with battery backup and
write-to-small-ide-drives-on-power-failure was ~25k$.

-
Hannu

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Need advice about triggers

2003-09-10 Thread Hannu Krosing
On Wed, 2003-09-10 at 13:21, Mindaugas Riauba wrote:

> 
> router_db=# explain analyze update ifdata set ifspeed=256000,
> ifreason='12121', iflastupdate=CURRENT_TIMESTAMP WHERE clientid='#0003904#';
>  QUERY PLAN
> 
> 
>  Index Scan using ifdata_clientid_key on ifdata  (cost=0.00..5.64 rows=1
> width=116) (actual time=0.17..0.36 rows=1 loops=1)
>Index Cond: (clientid = '#0003904#'::character varying)
>  Total runtime: 1.70 msec
> (3 rows)

Could you try the same query on a similar table where clientid is int4 ?

is it faster ?

does the performance degrade at a slower rate?

---
Hannu


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Need advice about triggers

2003-09-09 Thread Hannu Krosing
On Tue, 2003-09-09 at 15:40, Mindaugas Riauba wrote:
>   Hello,
> 
>   I have small table (up to 10000 rows) and every row will be updated
> once per minute. Table also has "before update on each row" trigger
> written in plpgsql. But trigger 99.99% of the time will do nothing
> to the database. It will just compare old and new values in the row
> and those values almost always will be identical.
> 
>   Now I tried simple test and was able to do 10000 updates on 1000
> rows table in ~30s. That's practically enough but I'd like to have
> more room to slow down.

Is it 10000 *rows* or 10000*1000 = 10 000 000 *rows* updated ?

When I ran a simple update 10 times on 1000 rows (with no trigger, which
you claim takes about the same time) it took 0.25 sec.

> Also best result I achieved by doing commit+vacuum every ~500
> updates.

It seems like you are updating more than one row at each update ?
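
For reference, a minimal sketch of such a "compare and skip" trigger, with
made-up column names (returning NULL from a BEFORE UPDATE row trigger
cancels that row's update, so an unchanged row does not even produce a
new row version; written with 8.0-style dollar quoting for readability):

CREATE OR REPLACE FUNCTION skip_noop_update() RETURNS trigger AS $$
BEGIN
    IF NEW.value = OLD.value AND NEW.status = OLD.status THEN
        RETURN NULL;    -- nothing changed: cancel this row's UPDATE
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t_skip_noop
BEFORE UPDATE ON small_table
FOR EACH ROW EXECUTE PROCEDURE skip_noop_update();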

---
Hannu


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Serious issues with CPU usage

2003-09-06 Thread Hannu Krosing
On Sat, 2003-09-06 at 00:58, [EMAIL PROTECTED] wrote:
> Hi,
> 
> i'm having _serious_ issues of postgres hogging up the CPU over time. A graph
> showing this can be seen at http://andri.estpak.ee/cpu0.png .
> 
> The database is running on Redhat 9 (stock 2.4.20-8 kernel), on a reiserfs
> partition (~8% usage - no problem there), and this problem has been with
> PostgreSQL 7.3.2 (both package provided by Redhat and self-rebuilt package)
> and 7.3.4 (i used the 7.3.4 SRPM available at postgres ftp site).
> 
> A VACUUM FULL is a remedy to this problem, but a simple VACUUM isn't. 

Could it be that the FSM is too small for your vacuum interval ?

Also, you could try running REINDEX (instead of or in addition to plain
VACUUM) and see if this is an index issue.
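
For example (table name made up; a database-wide VACUUM VERBOSE prints
per-table page/tuple details and, depending on the version, a free space
map summary at the end that you can compare against max_fsm_pages):

VACUUM VERBOSE;
SHOW max_fsm_pages;
REINDEX TABLE some_table;   -- rebuilds that table's indexes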

> This can be reproduced, I think, by a simple UPDATE command:
> 
> database=# EXPLAIN ANALYZE UPDATE table SET random_int_field = 1, last_updated
> = NOW() WHERE primary_key = 3772;
>QUERY PLAN
> -
>  Index Scan using table_pkey on table  (cost=0.00..6.81 rows=1 width=83)
> (actual time=0.09..0.10 rows=1 loops=1)
>Index Cond: (primary_key = 3772)
>  Total runtime: 0.37 msec
> 
> When I repeat this command using simple , I can see the "Total
> runtime" time grow ever so slightly - creeping from 0.37 to 0.38, then 0.39
> etc.   Would probably get higher if I had the patience. :)
> 
> The table "table" used in this example has 2721 rows, so size isn't an issue here.

Due to MVCC the raw table size (file size) can be much bigger if you
don't VACUUM often enough.

> Any comments or suggestions are welcome. If more information is needed, let me
> know and I'll post the needed details.

1. What types of queries do you run, and how often ?

2. How is your database tuned (postgresql.conf settings) ?

3. How much memory does your machine have ?

BTW, are you sure that it is postgres that is using up the CPU ?
I've read that reiserfs is a CPU hog, so this may be something that does
intensive disk access; some IO stats would be useful, as well as real
data and index file sizes.

You could also set up logging and then check if there are some
pathological queries that run for several hours doing nested seqscans ;)

---
Hannu





---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] SELECT's take a long time compared to other DBMS

2003-09-04 Thread Hannu Krosing
On Thu, 2003-09-04 at 17:35, Relaxin wrote:
> So after you did that, where able to position to ANY record within the
> resultset?
> 
> Ex. Position 100,000; then  to Position 5; then to position 50,000, etc...

Not in the case of:
   time psql test100k -c 'select * from test' > /dev/null
as the whole result would be written to /dev/null (i.e. discarded).

Yes in case of python: after doing

res = con.query('select * from test') # 3 sec - perform query
list = res.getresult()# 1 sec - construct list of tuples

the whole 128k records are in a python list,
so that I can immediately access any record by python list syntax,
i.e. list[5], list[50000], etc.

> If you are able to do that and have your positioned row available to you
> immediately, then I'll believe that it's the ODBC driver.

It can also be the Cygwin port, which is known to have several problems,
and if you run both your client and server on the same machine, then it
can also be an interaction of the two processes (cygwin/pgsql server and
native win32 ODBC client) not playing together very well.

> "Hannu Krosing" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]
> > On Thu, 2003-09-04 at 03:28, Relaxin wrote:
> > > I have a table with 102,384 records in it, each record is 934 bytes.
> >
> > I created a test database on my Linux (RH9) laptop with 30GB/4200RPM ide
> > drive and P3-1133Mhz, 768MB, populated it with 128000 rows of 930 bytes
> > each and did
> >
> > [EMAIL PROTECTED] hannu]$ time psql test100k -c 'select * from test' >
> > /dev/null
> >
> > real0m3.970s
> > user0m0.980s
> > sys 0m0.570s
> >
> > so it seems definitely not a problem with postgres as such, but perhaps
> > with Cygwin and/or ODBC driver
> >
> > I also ran the same query  using the "standard" pg adapter:
> >
> > >>> import pg, time
> > >>>
> > >>> con = pg.connect('test100k')
> > >>>
> > >>> def getall():
> > ... t1 = time.time()
> > ... res = con.query('select * from test')
> > ... t2 = time.time()
> > ... list = res.getresult()
> > ... t3 = time.time()
> > ... print t2 - t1, t3-t2
> > ...
> > >>> getall()
> > 3.27637195587 1.10105705261
> > >>> getall()
> > 3.07413101196 0.996125936508
> > >>> getall()
> > 3.03377199173 1.07322502136
> >
> > which gave similar results
---
Hannu


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] SELECT's take a long time compared to other DBMS

2003-09-04 Thread Hannu Krosing
Relaxin wrote on Thu, 04.09.2003 at 03:28:
> I have a table with 102,384 records in it, each record is 934 bytes.

I created a test database on my Linux (RH9) laptop with a 30GB/4200RPM IDE
drive and a P3-1133MHz, 768MB RAM, populated it with 128000 rows of 930 bytes
each and did 

[EMAIL PROTECTED] hannu]$ time psql test100k -c 'select * from test' >
/dev/null
 
real    0m3.970s
user    0m0.980s
sys     0m0.570s

so it seems definitely not a problem with postgres as such, but perhaps
with Cygwin and/or ODBC driver

I also ran the same query  using the "standard" pg adapter:

>>> import pg, time
>>>
>>> con = pg.connect('test100k')
>>>
>>> def getall():
... t1 = time.time()
... res = con.query('select * from test')
... t2 = time.time()
... list = res.getresult()
... t3 = time.time()
... print t2 - t1, t3-t2
...
>>> getall()
3.27637195587 1.10105705261
>>> getall()
3.07413101196 0.996125936508
>>> getall()
3.03377199173 1.07322502136

which gave similar results

--
Hannu 




---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] PostgreSQL Reliability when fsync = false on

2003-09-03 Thread Hannu Krosing
Rod Taylor wrote on Thu, 04.09.2003 at 06:36:
> Another alternative is
> to buy a small 15krpm disk dedicated for WAL. In theory you can achieve
> one commit per rotation.

One commit per rotation would still be only 15000/60 = 250 tps, but
fortunately you can get better results if you use multiple concurrent
backends: then, in the best case, you get one commit per backend per
rotation.
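
For example (purely back-of-the-envelope, assuming the ideal case of one
commit per backend per rotation and no other bottleneck), 10 concurrent
backends on the same 15krpm disk could reach roughly 10 * 250 = 2500 tps.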

-
Hannu


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Insert performance

2003-08-18 Thread Hannu Krosing
Shridhar Daithankar wrote on Mon, 18.08.2003 at 19:02:
> On 18 Aug 2003 at 18:52, Hannu Krosing wrote:
> > My own experimentation also got numbers in 9k/sec range (on a quad
> > 1.3GHz Xeons, 2GM mem, 50MB/sec raid) when doing 10-20 parallel runs of
> > ~1000 inserts/transaction.
> > 
> > Performance dropped to ~300/sec (at about 60M rows) when I added an
> > index (primary key) - as I did random inserts, the hit rates for index
> > pages were probably low.
> 
> I was loading a geographic data couple of months back.. It was 3GB data when 
> loaded in postgresql.

With or without indexes ?

> I tried loading data first and creating index later. It ran out of available 
> 9GB space. So I created index on an empty table and started loading it. It was 
> slow but at least finished after 3 hours... Co-incidentally oracle had same 
> problems as well. So creating index beforehand remains only option at times, it 
> seems. Tom remarked that it shouldn't have made difference but apparently it 
> does..

Tom just fixed some memory leaks on array indexing the other day. Could
there be something like that on geographic types ?

> You mentioned parallel runs and still getting 9K/sec. Was that overall 9K or 
> per connection?

Overall. But notice that my setup was (a little) slower per processor.

>  If it is former, probably WAL is hit too hard. You could do 
> some additional testing by having WAL on its own disk.

I guess that today's IDE disks are about the same speed (~50MB/sec) as my
test RAID was.

I ran multiple parallel runs to have a chance to use all 4 processors
(though IIRC it was heavily IO-bound), as well as to make better use of
write time on the WAL platters (so as not to wait for a full rotation each
time).

--
Hannu





---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Insert performance

2003-08-18 Thread Hannu Krosing
Shridhar Daithankar wrote on Mon, 18.08.2003 at 09:21:
> On 16 Aug 2003 at 11:40, Josh Berkus wrote:
> 
> > Shridhar,
> > 
> > > Unfortunately he can not use copy due to some constraints. 
> > 
> > Why not use COPY to load the table, and then apply the constraints by query 
> > afterwords?  It might not be faster, but then again it might.
> 
> Lol.. The constraints I mentioned weren't database constraints.. 

Can't you still apply them later ;)

My own experimentation also got numbers in 9k/sec range (on a quad
1.3GHz Xeons, 2GM mem, 50MB/sec raid) when doing 10-20 parallel runs of
~1000 inserts/transaction.

Performance dropped to ~300/sec (at about 60M rows) when I added an
index (primary key) - as I did random inserts, the hit rates for index
pages were probably low.

--
Hannu


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] How Many Inserts Per Transactions

2003-08-14 Thread Hannu Krosing
Trevor Astrope wrote on Tue, 05.08.2003 at 18:59:
> I was wondering if anyone found a sweet spot regarding how many inserts to 
> do in a single transaction to get the best performance? Is there an 
> approximate number where there isn't any more performance to be had or 
> performance may drop off?
> 
> It's just a general question...I don't have any specific scenario, other
> than there are multiple backends doing many inserts.

I did a test on a huge (up to 60 million rows) simple table (5 fields with
a primary key) and found that at that size, many inserts per transaction
was actually a little slower than single inserts. It probably had to do
with inserting/checking new index entries and moving index pages from/to
disk.

With small sizes or no index, ~100 inserts/transaction was significantly
faster though.
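
For reference, "N inserts per transaction" here just means wrapping the
inserts in an explicit transaction block, roughly like this (a sketch;
table and values are made up):

begin;
insert into t (id, val) values (1, 'a');
insert into t (id, val) values (2, 'b');
-- ... on the order of 100 inserts ...
commit;

Without the explicit begin/commit each INSERT is its own transaction,
which is exactly the overhead the batching avoids.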

I ran several (10-30) backends in parallel.

The computer was quad Xeon with 2GB RAM and ~50 MB/sec RAID.

--
Hannu


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Hardware performance

2003-07-17 Thread Hannu Krosing
Joe Conway wrote on Thu, 17.07.2003 at 07:52:
> To an extent it depends on how big the drives are and how large you 
> expect the database to get. For maximal performance you want RAID 1+0 
> for data and WAL; and you want OS, data, and WAL each on their own 
> drives. So with 5 drives one possible configuration is:
> 
> 1 drive OS: OS on it's own drive makes it easy to upgrade, or restore 
> the OS from CD if needed
> 2 drives, RAID 1+0: WAL
> 2 drives, RAID 1+0: data

How do you do RAID 1+0 with just two drives ?

--
Hannu



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Improving a simple query?

2003-07-13 Thread Hannu Krosing
Steve Wampler wrote on Sun, 13.07.2003 at 23:46:
> On Sun, Jul 13, 2003 at 08:09:17PM +0100, Richard Huxton wrote:
> > > I'm not an SQL or PostgreSQL expert.
> > >
> > > I'm getting abysmal performance on a nested query and
> > > need some help on finding ways to improve the performance:
> > [snip]
> > >  select * from attributes_table where id in (select id from
> > >   attributes where (name='obsid') and (value='oid00066'));
> > 
> > This is the classic IN problem (much improved in 7.4 dev I believe). The
> > recommended approach is to rewrite the query as an EXISTS form if
> > possible. See the mailing list archives for plenty of examples.
> > 
> > Could you not rewrite this as a simple join though?
> 
> Hmmm, I don't see how.  Then again, I'm pretty much the village
> idiot w.r.t. SQL...
> 
> The inner select is locating a set of (2049) ids (actually from
> the same table, since 'attributes' is just a view into
> 'attributes_table').  The outer select is then locating all
> records (~30-40K) that have any of those ids.  Is that really
> something a JOIN could be used for?

There may be some subtle differences, but most likely the 'join' form
is like this:

select at.*
  from attributes_table at,
   attributes a
 where at.id = a.id
   and a.name='obsid'
   and a.value='oid00066'
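
For completeness, the EXISTS form suggested earlier in the thread would
look roughly like this (a sketch, untested against your schema):

select at.*
  from attributes_table at
 where exists (select 1
                 from attributes a
                where a.id = at.id
                  and a.name='obsid'
                  and a.value='oid00066')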

--
Hannu


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] PostgreSQL vs. MySQL

2003-07-05 Thread Hannu Krosing
Brian Tarbox wrote on Fri, 04.07.2003 at 15:27:
> I recently took a system from MySQL to Postgres.  Same HW, SW, same data.
> The major operations where moderately complex queries (joins on 8 tables).
> The results we got was that Postgres was fully 3 times slower than MySql.

For each and every query ??

> We were on this  list a fair bit looking for answers and tried all the
> standard answers. 

Could you post the list of "standard answers" you tried ?

>  It was still much  much much slower.

Was this with InnoDB ?

What kind of joins were they (i.e.
"FROM a JOIN b ON a.i = b.i"
or "FROM a, b WHERE a.i = b.i")?

What was the ratio of planning time to actual execution time in pgsql?

Were the queries originally optimized for MySQL?


Hannu

---(end of broadcast)---
TIP 8: explain analyze is your friend