Re: [PERFORM] 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?

2010-02-13 Thread Pierre Frédéric Caillaud


Your Query :

SELECT contexts.context_key
FROM contexts
JOIN articles ON (articles.context_key = contexts.context_key)
JOIN matview_82034 ON (contexts.context_key = matview_82034.context_key)
WHERE contexts.context_key IN
      (SELECT context_key FROM article_words JOIN words USING (word_key)
       WHERE word = 'insider'
       INTERSECT
       SELECT context_key FROM article_words JOIN words USING (word_key)
       WHERE word = 'trading')
  AND contexts.context_key IN
      (SELECT a.context_key
       FROM virtual_ancestors a
       JOIN bp_categories ON (a.ancestor_key = bp_categories.context_key)
       WHERE lower(bp_categories.category) = 'law')
  AND articles.indexed;


I guess this is some form of keyword search, like :
- search for articles
- with the keywords "insider" and "trading"
- that belong to a subcategory of "law"

The way you do it is exactly how the phpBB forum implements it when the
underlying database doesn't support full text search. It is a fallback
mechanism meant only for small forums on old versions of MySQL, because it
is extremely slow.


Even your faster timing (7500 ms) is extremely slow.

Option 1 :

a) Instead of building your own keywords table, use Postgres' fulltext  
search, which is a lot smarter about combining keywords than using  
INTERSECT.
You can either index the entire article, or use a separate keyword field,  
or both.
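
For example, a minimal sketch (assuming a text column "body" on articles;
the tsvector column and index names are illustrative, not from your schema):

ALTER TABLE articles ADD COLUMN body_tsv tsvector;
UPDATE articles SET body_tsv = to_tsvector('english', coalesce(body, ''));
CREATE INDEX articles_body_tsv_gin ON articles USING gin (body_tsv);

-- "insider" AND "trading" in one indexable condition, no INTERSECT needed :
SELECT context_key
FROM articles
WHERE body_tsv @@ to_tsquery('english', 'insider & trading')
  AND indexed;

(You'd also want a trigger or application code to keep body_tsv up to date.)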


b) If an article belongs to only one category, use an integer field. If,
as is most often the case, an article can belong to several categories,
use an integer array with a GiST index : when an article belongs to
categories 1, 2 and 3, set a column article_categories to the integer
array '{1,2,3}'::INTEGER[], then create a GiST index on that column.


You can then do a SELECT from articles (only one table) that combines a
test of article_categories against the array of required categories
(array overlap/containment) with Postgres' full text search on the
keywords.


This will most likely result in a Bitmap Scan, which will do the ANDing  
much faster than any other solution.
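
A sketch of such a query (assuming the contrib intarray module is installed
for the gist__int_ops operator class, and reusing the illustrative names
from the sketch above) :

CREATE INDEX articles_categories_gist
    ON articles USING gist (article_categories gist__int_ops);

SELECT context_key
FROM articles
WHERE article_categories && ARRAY[12, 34, 56]   -- any of the "law" subcategory keys (illustrative values)
  AND body_tsv @@ to_tsquery('english', 'insider & trading')
  AND indexed;

Both conditions are indexable, so the planner can combine them with a BitmapAnd.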


Alternatively, you can also use keywords like category_1234, stuff
everything into your keywords column, and use only full text search.


You should try this solution first; it works really well. When the data set
becomes quite a bit larger than your RAM, it can get slow, though.


Option 2 :

Postgres' full text search is perfectly integrated and has benefits :  
fast, high write concurrency, etc. However full text search can be made  
much faster with some compromises.


For instance, I have tried Xapian : it is a lot faster than Postgres for
full text search (and more powerful too), but the price you pay is :

- a bit of work to integrate it
	- I suggest using triggers plus a Python indexer script running in the
background to update the index

- you can't query it with SQL, so you need some interfacing
- updates are not concurrent (single writer).

So, if you don't make lots of updates, Xapian may work for you. Its  
performance is unbelievable, even on huge datasets.




Re: [PERFORM] some problems when i use postgresql 8.4.2 in my projects .

2010-02-03 Thread Pierre Frédéric Caillaud


When concurrent inserts violate a unique constraint, they block each
other; I tested this in Oracle 10g, which has the same behaviour. I think
this may be reasonable, because the unique check must be a serializable
check.
To work around this I do the unique check in the application where
possible, but in a highly concurrent environment this is not a good way.


You probably can't do that in the application.

About unique constraints :

Transaction A : begin
Transaction A : insert value X
Transaction A : do some work, or just wait for client
...

Meanwhile :

Transaction B : begin
Transaction B : insert same value X
Transaction B : blocked, because A hasn't committed yet, so the unique
constraint can't be resolved


Transaction A : commit or rollback
Transaction B : lock is released; the constraint is either OK or violated,
depending on whether txn A committed or rolled back.


	As you can see, the longer the transactions are, the more problems you  
get.
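
	A minimal way to see this (a sketch, using a throwaway table; run the two
sessions in separate psql windows) :

-- Session A                            -- Session B
CREATE TABLE t (id int UNIQUE);
BEGIN;
INSERT INTO t VALUES (1);
                                        BEGIN;
                                        INSERT INTO t VALUES (1);  -- blocks here
COMMIT;                                 -- (or ROLLBACK)
                                        -- now raises unique_violation,
                                        -- or succeeds if A rolled back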


Solution 1 : change design.

- Why do you need this unique constraint ?
- How are the unique ids generated ?
- What kind of data do those ids represent ?
- Can you sidestep it by using a sequence or something ?
- Without knowing anything about your application, impossible to answer.

Solution 2 : reduce the transaction time.

- Optimize your queries (post here)
- Commit as soon as possible
- Long transactions (waiting for user input) are generally not such a good  
idea
- Anything that makes the txn holding the locks wait more is bad  
(saturated network, slow app server, etc)

- Optimize your xlog to make writes & commits faster

Solution 3 : reduce the lock time

Instead of doing :
BEGIN
INSERT X
... do some stuff ...
COMMIT;

do :

BEGIN
... do some stuff that doesn't depend on X...
INSERT X
... do less stuff while holding lock ...
COMMIT;

Solution 4 :

If you have really no control over value "X" and you need a quick reply  
"is X already there ?", you can use 2 transactions.

One transaction will "reserve" the value of X :

- SELECT WHERE col = X
	(ensures the row and index are in cache whilst taking no locks)

- Set autocommit to 1
- INSERT X;
	inserts X and commits immediately, or raises an error if X is already
there. The lock will not be held for long, since autocommit means it
commits ASAP.


- Perform the rest of your (long) operations in another transaction.

This is a bit less safe since, if the second transaction fails, insert of  
X is not rolled back.
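
A sketch of this pattern (table and column names are illustrative;
reservations.value is assumed to carry the UNIQUE constraint, and
"autocommit" simply means each statement is sent as its own transaction) :

-- Transaction 1 : reserve X, commit (or fail) immediately
SELECT 1 FROM reservations WHERE value = 42;     -- warms cache, takes no lock
INSERT INTO reservations (value) VALUES (42);    -- auto-committed; fails fast if X is taken

-- Transaction 2 : the long work
BEGIN;
-- ... long operations using the reserved value ...
COMMIT;
-- (as noted above, if this part fails, the reservation row stays)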





Re: [PERFORM] ext4 finally doing the right thing

2010-01-21 Thread Pierre Frédéric Caillaud



Now, with ext4 moving to full barrier/fsync support, we could get to the
point where WAL in the main data FS can mimic the state where WAL is
separate, namely that WAL writes can "jump the queue" and be written
without waiting for the data pages to be flushed down to disk, but also
that you'll get the big backlog of data pages to flush when
the first fsyncs on big data files start coming from checkpoints...


	Does postgres write something to the logfile whenever an fsync() takes a
suspiciously long time ?




Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-16 Thread Pierre Frédéric Caillaud



I've changed the setting a bit:

(1) Replaced the 7,200 rpm disk with a 10,000 rpm one, still SATA though.

(2) Inserting only 10 rows (instead of 100),
but 80 MB each, so the same amount of 800 MB in total.

(3) Changed the WAL path to the system disk (by the
great 'junction' trick mentioned in the other posting), so
actually splitting the write access to the "system" disk and
the fast "data" disk.



And here is the frustrating result:

1. None of the 4 CPUs was ever more busy than 30% (never
less idle than 70%),

2. while both disks kept being far below their average write
performance: the "data" disk had 18 peaks of approx. 40 MB,
but in total the average throughput was 16-18 MB/s.


BTW:

* Disabling noatime and similar for ntfs did not change
things much (thanks though!).

* A short cross check copying an 800 MB random data file from
the "system" to the "data" disk showed a constant performance
of 75 MB/s.


So, I have no idea what remains as the bottleneck.

 Felix


Try this :

CREATE TABLE test AS SELECT * FROM yourtable;

This will test write speed, and TOAST compression speed.
Then try this:

CREATE TABLE test (LIKE yourtable);
COMMIT;
INSERT INTO test SELECT * FROM yourtable;

This does the same thing but also writes WAL.
I wonder what results you'll get.



Re: [PERFORM] new server I/O setup

2010-01-15 Thread Pierre Frédéric Caillaud


No-one has mentioned SSDs yet ?...



Re: [PERFORM] Re: New server to improve performance on our large and busy DB - advice? (v2)

2010-01-15 Thread Pierre Frédéric Caillaud



 > 2) Which Windows OS would you recommend? (currently 2008 x64 Server)

Would not recommend Windows OS.


	BTW, I'd be interested to know the NTFS fragmentation stats of your  
database file.




Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-15 Thread Pierre Frédéric Caillaud



http://www.hagander.net/talks/Advanced%20PostgreSQL%20on%20Windows.pdf


Great doc ! I'm keeping that ;)





Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-14 Thread Pierre Frédéric Caillaud
On Thu, 14 Jan 2010 22:28:07 +0100, fka...@googlemail.com wrote:



Pierre Frédéric Caillaud:


> 4 Core CPU 3 Ghz, WinXP, 1 TB SATA disk.

Big CPU and slow disk...

You should add another disk just for the WAL -- disks are pretty cheap
these days.
Writing the WAL on a second disk is the first thing to do on a
configuration like yours, if you are limited by writes.
It also reduces the fsync lag a lot since the disk is only doing WAL.


Good idea -- where can I set the path to WAL?


At install time, or use a symlink (they exist on Windows too !)

http://stackoverflow.com/questions/1901405/postgresql-wal-on-windows

	I've no idea about the other NTFS tweaks that might be needed, e.g. whether
there is an equivalent of noatime/nodiratime ?...






Re: [PERFORM] a heavy duty operation on an "unused" table kills my server

2010-01-14 Thread Pierre Frédéric Caillaud



"high CPU usage" It might very well be "high IO usage".


Try this :

Copy (using explorer, the shell, whatever) a huge file.
This will create load similar to ALTER TABLE.
Measure throughput, how much is it ?

	If your server blows up just like it did on ALTER TABLE, you have an IO
system problem.

If everything is smooth, you can look into other things.

	How's your fragmentation ? Did the disk ever get full ? What does the
task manager say (swap in/out, disk queue length, etc.) ?


PS : try a separate tablespace on another disk.



Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-14 Thread Pierre Frédéric Caillaud



However the harddisk (sata) could write 43 MB/s in the worst
case! Why is write performance limited to 16 MB/s?

Some more hints what I do:

I use PQexecParams() and the INSERT ... $1 notation so as NOT to
additionally create a real escaped string from the data, but to
pass a pointer to the 8 MB data buffer instead.

I altered the binary column to STORAGE EXTERNAL.

Some experiments with postgresql.conf (fsync off,
shared_buffers=1000MB, checkpoint_segments=256) did not
change the ~50 s much (sometimes 60 s, sometimes a little less).

4 Core CPU 3 Ghz, WinXP, 1 TB SATA disk.


Big CPU and slow disk...

	You should add another disk just for the WAL -- disks are pretty cheap  
these days.
	Writing the WAL on a second disk is the first thing to do on a  
configuration like yours, if you are limited by writes.

It also reduces the fsync lag a lot since the disk is only doing WAL.



Re: [PERFORM] performance config help

2010-01-14 Thread Pierre Frédéric Caillaud



So, pgBouncer is pretty good.  It doesn't appear to be as good as
limiting TCON and using pconnect, but since we can't limit TCON in a
production environment, we may not have a choice.


Actually, you can : use lighttpd and php/fastcgi.

	Lighttpd handles the network stuff, and funnels/queues any number of  
client connections into a limited number of PHP fastcgi processes. You can  
configure this process pool to your tastes.


	Rather than instantiating 1 PHP interpreter (and 1 postgres) per client
connection, you can set it up for a max of N PHP procs. If PHP waits a lot
on IO (you use url fopen, that kind of thing) you can set N=5..10 per
core, but if you don't, N=2-3 per core is good. It needs to be
tuned to your application's needs.


	The idea is that if you have enough processes to keep your CPU busy,
adding more will just fill your RAM, trash your CPU cache, add more
context switches, and generally lower your total throughput. The same is
true for Postgres, too.


	I've switched from apache to lighttpd on a rather busy community site and
the difference in performance and memory usage was quite noticeable.
Also, this site used MySQL (argh) so the occasional locking on some MyISAM
tables would become really itchy unless the number of concurrent processes
was kept to a manageable level.


	When you bring down your number of postgres processes to some manageable  
level (plot a curve of throughput versus processes and select the  
maximum), if postgres still spends idle time waiting for locks, you'll  
need to do some exploration :


- use the lock view facility in postgres (pg_locks; see the query sketch after this list)
	- check your triggers : are you using some trigger that updates a count  
as rows are modified ? This can be a point of contention.

- check your FKs too.
- try fsync=off
- try to put the WAL and tables on a ramdisk.
	If you have even a few % iowait, maybe that hides the fact that 1  
postmaster is fsyncing and perhaps 10 others are waiting on it to finish,  
which doesn't count as iowait...


- recompile postgres and enable lwlock timing
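
	The lock query mentioned above could look like this (a sketch against the
8.x catalogs, where pg_stat_activity exposes procpid and current_query);
it lists backends currently waiting on a lock :

SELECT a.procpid,
       l.locktype,
       l.relation::regclass AS relation,
       l.mode,
       a.current_query
FROM pg_locks l
JOIN pg_stat_activity a ON a.procpid = l.pid
WHERE NOT l.granted
ORDER BY a.procpid;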





Re: [PERFORM] performance config help

2010-01-11 Thread Pierre Frédéric Caillaud


Each of the 256 requests was being processed by a php process. So, it
could certainly be faster. But, the fact that we're seeing the db
performance degrade would seem to indicate that our application is fast
enough to punish the db. Isn't that true?


	Not necessarily. Your DB still has lots of idle CPU, so perhaps it's your  
client which is getting over the top. Or you have locking problems in your  
DB.

Things to test :

- vmstat on the benchmark client
- iptraf on the network link
- monitor ping times between client and server during load test

	Some time ago, I made a benchmark simulating a forum. Postgres was  
saturating the gigabit ethernet between server and client...


	If those PHP processes run inside Apache, I'd suggest switching to  
lighttpd/fastcgi, which has higher performance, and uses a limited,  
controllable set of PHP processes (and therefore DB connections), which in  
turn uses much less memory.


PS : try these settings :

wal_sync_method = fdatasync
wal_buffers = 64MB
wal_writer_delay = 2ms
asynchronous commits (synchronous_commit = off), accepting ~1 s of delay



Re: [PERFORM] PG optimization question

2010-01-11 Thread Pierre Frédéric Caillaud
On Sun, 10 Jan 2010 19:45:32 +0100, Robert Haas wrote:



2010/1/10 Pierre Frédéric Caillaud :



If you transfer (delete from staging, insert into archive) in one
transaction, then it will always be visible in exactly one of them,
and exactly once in a view over both staging and archive(s).


       Does the latest version implement this :

INSERT INTO archive (...) DELETE FROM staging WHERE ... RETURNING ...


No.  There are no plans to support that, though there are proposals to  
support:


WITH x AS (DELETE FROM staging WHERE ... RETURNING ...) INSERT INTO
archive (...) SELECT ... FROM x

I'm not sure how much that will help though since, in the designs so
far discussed, the tuples won't be pipelined.

...Robert



	Yeah, but it's a lot more user-friendly than SELECT FOR UPDATE, INSERT  
SELECT, DELETE...






Re: [PERFORM] Choice of bitmap scan over index scan

2010-01-11 Thread Pierre Frédéric Caillaud
Postgres is being conservative. The plan it uses (bitmap index scan)  
will perform much better than an index scan when the data is not in the  
cache, by maybe an order of magnitude, depending on your hardware setup.


The index scan may perform better at the moment, but the bitmap index  
scan is safer.


	Suppose you make a query that will need to retrieve 5% of the rows in a  
table...


	If the table is nicely clustered (ie you want the latest rows in a table  
where they are always appended at the end with no holes, for instance),  
bitmap index scan will mark 5% of the pages for reading, and read them  
sequentially (fast). Plain index scan will also scan the rows more or less  
sequentially, so it's going to be quite fast too.


	Now if your table is not clustered at all, or clustered on something
which has no correlation to your current query, you may hit the worst case
: reading a random sampling of 5% of the pages. Bitmap index scan will
sort these prior to reading, so the HDD/OS will do smart things. Plain
index scan won't.


	- worst case for bitmap index scan is a seq scan... slow, but if you have  
no other choice, it's OK.
	- worst case for plain index scan is a lot worse since it's a random  
seekfest.


	If everything is cached in RAM, there is not much difference (plain index  
scan can be faster if the bitmap "recheck cond" is slow).
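
	If you want to compare the two plans on your own data, one quick (if
rough) way is to time the query with bitmap scans disabled; the query below
is only a placeholder for yours :

EXPLAIN ANALYZE SELECT * FROM yourtable WHERE somecol BETWEEN 1 AND 1000;
SET enable_bitmapscan = off;
EXPLAIN ANALYZE SELECT * FROM yourtable WHERE somecol BETWEEN 1 AND 1000;
RESET enable_bitmapscan;

Run it both with a cold cache and a warm one; the ranking of the two plans can flip.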




Re: [PERFORM] PG optimization question

2010-01-10 Thread Pierre Frédéric Caillaud



If you transfer (delete from staging, insert into archive) in one
transaction, then it will always be visible in exactly one of them,
and exactly once in a view over both staging and archive(s).


Does the latest version implement this :

INSERT INTO archive (...) DELETE FROM staging WHERE ... RETURNING ...




Re: [PERFORM] PG optimization question

2010-01-09 Thread Pierre Frédéric Caillaud


That may help with query speed (not a problem now), but we'll then
have to add a UNION with the daily staging table for the other 5% of
requests, right? And there would be a moment when a daily message is in
the archive table AND in the daily table (while transferring from the
daily table to the archive).
Our main problem is blocking when doing DELETE (the app sometimes freezes
for a long time), and also that we have to VACUUM the live table, which is
not acceptable in our app.


Thanks for your reply, I was kinda worried about the number of partitions
and how this would affect PG query execution speed.


Kenneth Marshall wrote:
Oh, btw, 95% of queries are searching rows for current date (last 24  
hours).




You may want to use a daily staging table and then flush to the monthly  
archive tables at the end of the day.


	If the rows in the archive tables are never updated, this strategy means  
you never need to vacuum the big archive tables (and indexes), which is  
good. Also you can insert the rows into the archive table in the order of  
your choice, the timestamp for example, which makes it nicely clustered,  
without needing to ever run CLUSTER.


	And with partitioning you can have lots of indexes on the staging table
(and the current month's partition) to speed up your most common queries,
which are likely to be more OLTP, while using fewer indexes on the older
partitions (saves disk space) if queries on old partitions are likely to
be reporting queries which are going to grind through a large part of the
table anyway.
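
	A sketch of the end-of-day flush (table and column names are illustrative,
not from the original post; it assumes new inserts have already switched to
the next day's staging table, so nothing arrives between the INSERT and the
TRUNCATE) :

BEGIN;
INSERT INTO messages_2010_01
SELECT * FROM messages_staging
ORDER BY created_at;           -- keeps the archive partition naturally clustered on time
TRUNCATE messages_staging;     -- cheap, and no VACUUM needed on the hot table
COMMIT;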




Re: [PERFORM] Massive table (500M rows) update nightmare

2010-01-09 Thread Pierre Frédéric Caillaud



crank it up more and delay the checkpoints as much as possible during
these updates.  64 segments is already 1024M.


We have 425M rows, total table size is 78GB, so we can imagine a worst  
case UPDATE write is less than 200 bytes * number of rows specified in  
the update (is that logic correct?).


There is also the WAL : all these updates need to be logged, which roughly
doubles the UPDATE write volume. Perhaps you're WAL-bound (every 16 MB
segment needs fsyncing), and tuning the fsync and wal_buffers settings, or
a faster WAL disk, could help ? (I don't remember your config.)


Interestingly, the total index size is 148GB, twice that of the table,
which may be an indication of where the performance bottleneck is.


Index updates can create random I/O (suppose you have a btree on a rather  
random column)...





Re: [PERFORM] performance while importing a very large data set in to database

2009-12-06 Thread Pierre Frédéric Caillaud


I have a very big database, around 15 million in size, and the dump file
is around 12 GB.

While importing this dump into the database I have noticed that initially
query response time is very slow, but it does improve with time.

Any suggestions to improve performance after the dump is imported into the
database will be highly appreciated!


This is pretty normal.  When the db first starts up or right after a
load it has nothing in its buffers or the kernel cache.  As you access
more and more data the db and OS learned what is most commonly
accessed and start holding onto those data and throw the less used
stuff away to make room for it.  Our production dbs run at a load
factor of about 4 to 6, but when first started and put in the loop
they'll hit 25 or 30 and have slow queries for a minute or so.

Having a fast IO subsystem will help offset some of this, and
sometimes "select * from bigtable" might too.



Maybe it's the updating of the hint bits ?...




Re: [PERFORM] PG 8.3 and large shared buffer settings

2009-09-26 Thread Pierre Frédéric Caillaud


Is there any practical limit to the number of shared buffers PG 8.3.7  
can handle before more becomes counter-productive?


It is more efficient to have the page in shared buffers, rather than doing  
a context switch to the OS, copying the entire page from the OS's cache  
into shared buffers, and coming back to postgres. Shared buffers use less  
CPU. However, this is totally negligible versus the disk wait time of an  
uncached IO.


The same page may be cached once in shared_buffers, and once in the OS
cache, so if your shared buffers are half your RAM and the other half is
disk cache, perhaps it won't be optimal: if stuff is cached twice, you can
cache only half as much stuff.


If your entire database can fit in shared buffers, good for you though.  
But then a checkpoint comes, and postgres will write all dirty buffers to  
disk in the order it finds them in Shared Buffers, which might be totally  
different from the on-disk order. If you have enough OS cache left to  
absorb these writes, the OS will reorder them. If not, lots of random  
writes are going to occur. On a RAID5 this can be a lot of fun.




Re: [PERFORM] Using Gprof with Postgresql

2009-09-08 Thread Pierre Frédéric Caillaud
I just compiled it with gcc and it produces a gmon.out file for every
process; by the way, I am running the script below in order to produce
readable .out files


  gprof .../pgsql/bin/postgres gmon.out > createtable2.out

is postgres the right executable?

regards
reydan


Off topic, but have you tried oprofile ? It's excellent...



Re: [PERFORM] moving data between tables causes the db to overwhelm the system

2009-09-01 Thread Pierre Frédéric Caillaud



Indexes are on the partitions, my bad.


If you need to insert lots of data, it is faster to create the indexes  
afterwards (and then you can also create them in parallel, since you have  
lots of RAM and cores).



The explain plan looks like this:
explain SELECT * from bigTable
where
"time" >= extract ('epoch' from timestamp '2009-08-31 00:00:00')::int4
and "time" <= extract ('epoch' from timestamp '2009-08-31 23:59:59')::int
;

   QUERY PLAN

 Index Scan using bigTable_time_index on bigTable  (cost=0.00..184.04 rows=1 width=129)
   Index Cond: (("time" >= 1251676800) AND ("time" <= 1251763199))
(2 rows)


What is slow, then, is it the insert or is it the select ?
Can you EXPLAIN ANALYZE the SELECT ?

If "bigTable" is not clustered on "time" you'll get lots of random  
accesses, it'll be slow.


If you want to partition your huge data set by "time", and the data isn't  
already ordered by "time" on disk, you could do this :


SET work_mem TO <something very large, like 10GB, since you have 32GB RAM;
	check your shared_buffers etc. first>;
CREATE TABLE tmp AS SELECT * FROM bigTable ORDER BY "time";   -- huge sort, will take some time

SET maintenance_work_mem TO <something very large>;
CREATE INDEX tmp_time ON tmp( "time" );

CREATE TABLE partition1 AS
	SELECT * FROM tmp WHERE "time" BETWEEN <beginning> AND <end>;

(repeat for each partition...)

Since tmp is clustered on "time" you'll get a nice fast bitmap-scan, and  
you won't need to seq-scan N times (or randomly index-scan) bigTable.




Re: [PERFORM] moving data between tables causes the db to overwhelm the system

2009-09-01 Thread Pierre Frédéric Caillaud



We have a table that's > 2billion rows big and growing fast.  We've setup
monthly partitions for it. Upon running the first of many select * from
bigTable insert into partition statements (330million rows per month) the
entire box eventually goes out to lunch.

Any thoughts/suggestions?

Thanks in advance


	Did you create the indexes on the partition before or after inserting the  
330M rows into it ?

What is your hardware config, where is xlog ?




Re: [PERFORM] What exactly is postgres doing during INSERT/UPDATE ?

2009-08-28 Thread Pierre Frédéric Caillaud


top shows the cpu usage of the pg process ranges from zero to never more  
than ten percent of a cpu, and that one cpu is always ninety some odd  
percent in iowait.  So what is postgres doing (with fsync off) that  
causes the cpu to spend so much time in iowait?


Updating indexes ?



Re: [SQL] [PERFORM] SQL Query Performance - what gives?

2009-08-18 Thread Pierre Frédéric Caillaud



The bitmask allows the setting of multiple permissions but the table
definition doesn't have to change (well, so long as the bits fit into a
word!)  Finally, this is a message forum - the actual code itself is
template-driven and the bitmask permission structure is ALL OVER the
templates; getting that out of there would be a really nasty rewrite,
not to mention breaking the user (non-developer, but owner)
extensibility of the current structure.

Is there a way to TELL the planner how to deal with this, even if it
makes the SQL non-portable or is a hack on the source mandatory?


	You could use an integer array instead of a bit mask, make a GiST index
on it, and instead of doing "mask & xxx" do "array contains xxx", which is
indexable with GiST. The idea is that it can get much better row
estimation. Instead of 1,2,3 you can use 1,2,4,8, etc. if you like. You'd
probably need a function to convert a bitmask into ints and another to do
the conversion back, so the rest of your app still gets the expected
bitmasks. Or add a bitmask type to postgres with proper statistics...
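
	A sketch of the array version (assuming the contrib intarray module is
installed, which provides the gist__int_ops operator class; table and
column names are illustrative) :

ALTER TABLE posts ADD COLUMN perm_ids integer[];
CREATE INDEX posts_perm_gist ON posts USING gist (perm_ids gist__int_ops);

-- instead of  WHERE permission_mask & 4 <> 0 :
SELECT * FROM posts WHERE perm_ids @> ARRAY[4];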




Re: [PERFORM] Getting time of a postgresql-request

2009-08-18 Thread Pierre Frédéric Caillaud
On Tue, 18 Aug 2009 06:25:57 +0200, Russell Smith wrote:



Kai Behncke wrote:


But I would like to get it in a php-script, like

$timerequest_result=pg_result($timerequest,0);

(well, that does not work).

I wonder: Is there another way to get the time a request needs?
How do you handle this?


$time = microtime(true);
$result = pg_result($query);
echo "Time to run query and return result to PHP: ".(microtime(true) - $time);


Something like that.

Regards

Russell



I use the following functions which protect against SQL injection, make
using the db a lot easier, and log query times so they can be displayed at
the bottom of the page.
It is much less cumbersome than PEAR::DB or PDO, which force you to use
prepared statements (slower if you throw them away after using them just
once).


db_query( "SELECT * FROM stuff WHERE a=%s AND b=%s", array( $a, $b ))

db_query( "SELECT * FROM stuff WHERE id IN (%s) AND b=%s", array(  
$list_of_ints, $b ))




function db_quote_query( $sql, $params=false )
{
	// if no params, send query raw
	if( $params === false ) return $sql;
	if( !is_array( $params )) $params = array( $params );

	// quote params
	foreach( $params as $key => $val )
	{
		if( is_array( $val ))
			$params[$key] = implode( ', ', array_map( 'intval', $val ));
		else
			$params[$key] = is_null( $val ) ? 'NULL' : ("'".pg_escape_string( $val )."'");
	}
	return vsprintf( $sql, $params );
}

function db_query( $sql, $params=false )
{
	// it's already a query
	if( is_resource( $sql ))
		return $sql;

	$sql = db_quote_query( $sql, $params );

	$t = getmicrotime( true );
	if( DEBUG > 1 )  xdump( $sql );
	$r = pg_query( $sql );
	if( !$r )
	{
		if( DEBUG > 1 )
		{
			echo "PostgreSQL error :<br/>".htmlspecialchars( pg_last_error() )
				."<br/>Query :<br/>".$sql."<br/>Traceback :<pre>";
			foreach( debug_backtrace() as $t ) xdump( $t );
			echo "</pre>";
		}
		die();
	}
	if( DEBUG > 1 )  xdump( $r );
	global $_global_queries_log, $_mark_query_time;
	$_mark_query_time = getmicrotime( true );
	$_global_queries_log[] = array( $_mark_query_time - $t, $sql );
	return $r;
}



Re: [PERFORM] transaction delays to apply

2009-08-12 Thread Pierre Frédéric Caillaud


Does anybody know any way to solve this? I did monitor the system  
running at full load (~20 messages per second) - postmaster's processes  
didn't eat more than 10-20% of CPU and memory. Neither did any of my  
application's processes.


now(), like current_timestamp, is the time of transaction start. If your
client BEGINs, then idles for 30 seconds, then INSERTs, the timestamp in
the insert will be from 30 seconds ago. Try statement_timestamp() or
clock_timestamp().
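
A quick way to see the difference (pg_sleep() is available since 8.2) :

BEGIN;
SELECT now(), clock_timestamp();   -- both are close to the transaction start
SELECT pg_sleep(2);
SELECT now(), clock_timestamp();   -- now() is unchanged, clock_timestamp() has moved on ~2 s
COMMIT;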




Re: [PERFORM] PG-related ACM Article: "The Pathologies of Big Data"

2009-08-08 Thread Pierre Frédéric Caillaud


I don't see how on any recent hardware, random access to RAM is slower  
than
sequential from disk.  RAM access, random or not, is measured in  
GB/sec...


I don't think anybody's arguing that.


http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2795&p=5

These guys mention about 50 ns memory latency ; this would translate into  
20 million memory "seeks" per second, which is in the same ballpark as the  
numbers given by the article...


If you count 10GB/s bandwidth, 50 ns is the time to fetch 500 bytes.
