Re: [PERFORM] query plan not optimal

2014-01-06 Thread Marc Cousin
On 29/12/2013 19:51, Jeff Janes wrote:
 On Thursday, December 19, 2013, Marc Cousin wrote:
 
 
 
 Yeah, I had forgotten to set it up correctly on this test environment
 (its value is correctly set in production environments). Putting it to a
 few gigabytes here gives me this cost:
 
 bacula=# explain select pathid, filename from batch join path using
 (path);
                               QUERY PLAN
 ----------------------------------------------------------------------------
  Nested Loop  (cost=0.56..2083904.10 rows=479020 width=26)
    ->  Seq Scan on batch  (cost=0.00..11727.20 rows=479020 width=85)
    ->  Index Scan using idx_path on path  (cost=0.56..4.32 rows=1 width=16)
          Index Cond: (path = batch.path)
 (4 lignes)
 
 It still chooses the hash join though, but by a smaller margin.
 
 
 This is still a tangent from your original point, but if I create index
 on path (path, pathid), then I can get an index only scan.  This
 actually is not much faster when everything is already cached, but the
 planner thinks it will be about 2x faster because it assumes the vm
 block accesses are free.  So this might be enough to tip it over for you.

Yeah, still a tangent :)

Many Bacula users don't have index-only scans (the installation I was having
trouble with, for example), as they are still running a PostgreSQL version
older than 9.2.
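For those who are on 9.2 or later, here is a minimal sketch of the covering
index Jeff describes (the index name is made up):

  create index idx_path_covering on path (path, pathid);
  vacuum analyze path;  -- keep the visibility map fresh so the index-only scan pays off
  explain select pathid, filename from batch join path using (path);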

 
 
 And it will still only access a very small part of path (always the same
 5000 records) during the query, which isn't accounted for in the cost, if
 I understand correctly?
 
 
 I think you are correct, that it doesn't take into account ndistinct being
 10 to 100 fold less than ntuples on the outer loop, which theoretically
 could propagate down to the table size used in connection with
 effective_cache_size.
 
 It seems to me the cost of the hash join is being greatly
 underestimated, which I think is more important than the nested loop
 being overestimated.  (And in my hands, the merge join is actually the
 winner both in the planner and in reality, but I suspect this is because
 all of your fake paths are lexically greater than all of the real paths)
Yes, probably.

 
 Also, you talked earlier about cheating the planner by lowering
 random_page_cost.  But why is that cheating?  If caching means the cost
 is truly lower... 
It feels like cheating because I'm compensating for what looks like a
bad estimate of the cost: the nested loop is very fast, even when
nothing is cached at the beginning. We could hardcode low *_page_cost
values in bacula's code for this query, but it is not
that good to put them in postgresql.conf as we currently do, as some other
queries suffer from those very low costs. Anyway, it would be
even better if it wasn't needed at all, hence this post :)
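To be concrete, the "hardcoded in bacula's code" variant would look something
like this sketch, which at least scopes the low costs to a single transaction
instead of the whole cluster (the values are purely illustrative):

  begin;
  set local random_page_cost = 0.1;   -- illustrative, not a recommendation
  set local seq_page_cost = 0.1;
  insert into file (pathid, filename)
    select pathid, filename from batch join path using (path);
  commit;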


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] query plan not optimal

2013-12-19 Thread Marc Cousin
Hi,

I'm running into what I feel is a bit of a limitation of the optimizer (or 
something I don't understand :) ).

Sorry, this is a rather long mail.

I have a workaround for the problem below, but I don't like cheating the 
optimizer for no good reason.

First a little bit of context, because the example below may feel dumb if you 
read it with no explanation:

I'm working on the bacula project, trying to further optimize data insertion 
into the database (we've been working on this subject for a few years now, as 
backup speed is a core functionality of any backup software).

Bacula is backup software: we store all of the backups' metadata in 
the database, which means every backed-up file…

The two very large tables are file and path:

  * file contains metadata about ALL files that have been backed up and are 
still available in the database. It sometimes contains several billion records.

  * path contains the full path of a file. It is stored outside of file to save 
(lots of) space, as the same path is shared by many rows in file. It still usually 
contains 20 to 50 million records, and can be as big as 20GB.

So we have something like this:

  create table file (fileid bigserial, filename text, pathid bigint);
  alter table file add primary key (fileid);

  create table path (pathid bigserial, path text);
  alter table path add primary key (pathid);
  create unique index idx_path on path(path);

I removed some columns from the file table: we store stat(), a checksum, etc. 
They're not needed for this example.

In order to insert data efficiently, backups first store data into temp tables 
which look like this:

  create table batch (path text, filename text);

(once again I removed all metadata columns)

If you want to create a batch table with useful data to replicate what I'm 
going to show, you can try something like this:

  find /home -printf '%h\t%f\n' | psql -c "COPY batch FROM STDIN" bacula

We analyze the batch table: in the real code, it is a temp table, so it is 
compulsory:

  analyze batch;

Then we insert the missing paths. This is one of the queries whose plan goes 
wrong, but we'll focus on the second one: as we are starting from an empty path 
table here, this query wouldn't be realistic.

  insert into path (path) select path from batch where not exists (select 1 
from path where path.path=batch.path) group by path;

We analyze:

  analyze path;


So now we insert into the file table.

insert into file (pathid,filename) select pathid, filename from batch join path 
using (path);

Here is the plan:

bacula=# explain select pathid, filename from batch join path using (path);
                              QUERY PLAN
--------------------------------------------------------------------------
 Hash Join  (cost=822.25..22129.85 rows=479020 width=26)
   Hash Cond: (batch.path = path.path)
   ->  Seq Scan on batch  (cost=0.00..11727.20 rows=479020 width=85)
   ->  Hash  (cost=539.89..539.89 rows=22589 width=81)
         ->  Seq Scan on path  (cost=0.00..539.89 rows=22589 width=81)
(5 lignes)


I get this plan almost all the time. Let's add a bunch of useless records to 
path to be more realistic and make things worse: usually, bacula inserts data 
in batches of 500,000 records, and path is very large (millions of records) and 
bigger than work_mem (it would have to be 20GB in the most extreme cases), so it 
may thrash disks heavily (many servers can be inserting at the same time).

Let's simulate this (remove the indexes before, put them back afterwards :) ):

  insert into path (path) select generate_series(1000000,50000000); -- create a realistic path table
  analyze path;

Here is the plan:

bacula=# explain (analyze,buffers,verbose) select pathid, filename from batch join path using (path);
                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1646119.66..1945643.59 rows=479020 width=26) (actual time=27275.240..36745.904 rows=479020 loops=1)
   Output: path.pathid, batch.filename
   Hash Cond: (batch.path = path.path)
   Buffers: shared hit=130760 read=179917 written=1823, temp read=224130 written=223876
   ->  Seq Scan on public.batch  (cost=0.00..11727.20 rows=479020 width=85) (actual time=0.259..176.031 rows=479020 loops=1)
         Output: batch.filename, batch.path
         Buffers: shared read=6937 written=1823
   ->  Hash  (cost=793966.96..793966.96 rows=49022696 width=16) (actual time=27274.725..27274.725 rows=49022590 loops=1)
         Output: path.pathid, path.path
         Buckets: 131072  Batches: 128  Memory Usage: 18329kB
         Buffers: shared hit=130760 read=172980, temp written=218711
         ->  Seq Scan on public.path  (cost=0.00..793966.96 rows=49022696 width=16) (actual time=0.231..9650.452 rows=49022590 loops=1)
               Output: path.pathid, 

Re: [PERFORM] query plan not optimal

2013-12-19 Thread Marc Cousin


On 19/12/2013 19:33, Jeff Janes wrote:
                                                             QUERY PLAN
 -----------------------------------------------------------------------------------------------------------------------------
  Nested Loop  (cost=0.56..4001768.10 rows=479020 width=26) (actual time=2.303..15371.237 rows=479020 loops=1)
    Output: path.pathid, batch.filename
    Buffers: shared hit=2403958 read=7539
    ->  Seq Scan on public.batch  (cost=0.00..11727.20 rows=479020 width=85) (actual time=0.340..160.142 rows=479020 loops=1)
          Output: batch.path, batch.filename
          Buffers: shared read=6937
    ->  Index Scan using idx_path on public.path  (cost=0.56..8.32 rows=1 width=16) (actual time=0.030..0.031 rows=1 loops=479020)
          Output: path.pathid, path.path
          Index Cond: (path.path = batch.path)
          Buffers: shared hit=2403958 read=602
  Total runtime: 15439.043 ms
 
 
  As you can see, it is more than twice as fast, with a very high hit ratio
  on the path table, even though we start from a cold cache (I did, here,
  for both PostgreSQL and the OS). We get an excellent hit ratio because the
  batch table contains few distinct paths (several files per
  directory), and is already quite clustered, as it comes from a
  backup, which is of course performed directory by directory.
 
 
 What is your effective_cache_size set to?
 
 Cheers,
 
 Jeff
Yeah, I had forgotten to set it up correctly on this test environment
(its value is correctly set in production environments). Putting it to a
few gigabytes here gives me this cost:
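(Concretely, something like the following; the exact value is just roughly what
this test box can cache:)

  set effective_cache_size = '4GB';   -- illustrative value for this test environment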

bacula=# explain select pathid, filename from batch join path using (path);
                              QUERY PLAN
----------------------------------------------------------------------------
 Nested Loop  (cost=0.56..2083904.10 rows=479020 width=26)
   ->  Seq Scan on batch  (cost=0.00..11727.20 rows=479020 width=85)
   ->  Index Scan using idx_path on path  (cost=0.56..4.32 rows=1 width=16)
         Index Cond: (path = batch.path)
(4 lignes)

It still chooses the hash join though, but by a smaller margin.

And it will still only access a very small part of path (always the same
5000 records) during the query, which isn't accounted for in the cost, if
I understand correctly?


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] query plan not optimal

2013-12-19 Thread Marc Cousin

On 19/12/2013 21:36, Kevin Grittner wrote:

Marc Cousin cousinm...@gmail.com wrote:


Then we insert missing paths. This is one of the plans that fail
insert into path (path)
select path from batch
  where not exists
(select 1 from path where path.path=batch.path)
  group by path;

I know you said you wanted to focus on a different query, but this
one can easily be optimized.  Right now it is checking for an
existing row in path for each row in batch; and it only needs to
check once for each path.  One way to write it would be:

insert into path (path)
   select path from (select distinct path from batch) b
 where not exists
   (select 1 from path p where p.path = b.path);
Yeah, I know, that's why I said I didn't want to focus on this one… we 
already do this optimization :)


Thanks anyway :)


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] overzealous sorting?

2011-09-27 Thread Marc Cousin
On Tue, 27 Sep 2011 12:45:00 +1000,
anthony.ship...@symstream.com wrote:

 On Monday 26 September 2011 19:39, Marc Cousin wrote:
  Because Index Scans return rows in sorted order, while Bitmap Index Scans
  do not: they build a list of pages to visit, which are then visited by the
  Bitmap Heap Scan step.
 
  Marc.
 
 Where does this bitmap index scan come from? It seems to negate the
 advantages of b-tree indexes described in the section Indexes and
 ORDER BY of the manual. If I do set enable_bitmapscan = off; the
 query runs a bit faster although with a larger time range it reverts
 to a sequential scan.
 

A Bitmap Index Scan is just another way to use a btree index. It is often
used when a bigger part of a table is required: it costs more than a
plain index scan to retrieve a few records, but less when a lot of
records are needed.

Your tests show that index scans are a bit faster for this query. But that
is probably true only when most of the needed data is cached, which is probably
your case, as you are running the same query all the time in your tests.
The bitmap index scan is probably cheaper when the data isn't in cache. You
could also see the bitmap index scan as safer, as it won't perform as
badly when data is not cached (less random IO) :)

The thing is, the optimizer doesn't know if your data will be in cache
when you run your query… if you are sure most of your data is in
the cache most of the time, you could try to lower random_page_cost
to reflect that the data is cached. But if the win is small on
this query, it may not be worth it.
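If you want to experiment without touching postgresql.conf, here is a sketch of
a purely session-level test (the value is only an example):

  set random_page_cost = 1.5;   -- closer to seq_page_cost when most data is cached
  -- re-run the query under test here and compare the plans
  reset random_page_cost;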

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] overzealous sorting?

2011-09-27 Thread Marc Cousin
On Tue, 27 Sep 2011 19:05:09 +1000,
anthony.ship...@symstream.com wrote:

 On Tuesday 27 September 2011 18:54, Marc Cousin wrote:
   The thing is, the optimizer doesn't know if your data will be in
   cache when you run your query… if you are sure most of your
   data is in the cache most of the time, you could try to lower
   random_page_cost to reflect that the data is cached. But if
   the win is small on this query, it may not be worth it.
 
 What I really want is to just read a sequence of records in timestamp
 order between two timestamps. The number of records to be read may be
 in the millions totalling more than 1GB of data so I'm trying to read
 them a slice at a time but I can't get PG to do just this.
 
 If I use offset and limit to grab a slice of the records from a large 
 timestamp range then PG will grab all of the records in the range,
 sort them on disk and return just the slice I want. This is absurdly
 slow. 
 
 The query that I've shown is one of a sequence of queries with the
 timestamp range progressing in steps of 1 hour through the timestamp
 range. All I want PG to do is find the range in the index, find the
 matching records in the table and return them. All of the planner's
 cleverness just seems to get in the way.
 

Maybe you should try using a cursor, if you don't know where you'll
stop. Combined with a very low cursor_tuple_fraction, this will
probably give you what you want (a fast-start plan).
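A minimal sketch of what that could look like (table and column names are made up):

  begin;
  set local cursor_tuple_fraction = 0.01;
  declare c no scroll cursor for
      select * from samples
       where ts >= '2011-09-26 00:00' and ts < '2011-09-27 00:00'
       order by ts;
  fetch 10000 from c;   -- repeat until you have read enough
  close c;
  commit;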

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] inheritance: planning time vs children number vs column number

2011-03-01 Thread Marc Cousin
The Tuesday 01 March 2011 16:33:51, Tom Lane wrote :
 Marc Cousin cousinm...@gmail.com writes:
  On Tuesday 01 March 2011 07:20:19, Tom Lane wrote:
  It's worth pointing out that the only reason this effect is dominating
  the runtime is that you don't have any statistics for these toy test
  tables.  If you did, the cycles spent using those entries would dwarf
  the lookup costs, I think.  So it's hard to get excited about doing
  anything based on this test case --- it's likely the bottleneck would be
  somewhere else entirely if you'd bothered to load up some data.
  
  Yes, for the same test case, with a bit of data in every partition and
  statistics up to date, planning time goes from 20 seconds to 125ms for
  the 600 children/1000 columns case. Which is of course more than
  acceptable.
 
 [ scratches head ... ]  Actually, I was expecting the runtime to go up
 not down.  Maybe there's something else strange going on here.
 
   regards, tom lane

Then, what can I do to help ?


Re: [PERFORM] inheritance: planning time vs children number vs column number

2011-02-28 Thread Marc Cousin
The Monday 28 February 2011 13:57:45, Heikki Linnakangas wrote :
 On 28.02.2011 11:38, Marc Cousin wrote:
  I've been facing a very large (more than 15 seconds) planning time in a
  partitioned configuration. The amount of partitions wasn't completely
  crazy, around 500, not in the thousands. The problem was that there were
  nearly 1000 columns in the parent table (very special use case, there is
  a reason for this application for having these many columns). The check
  constraint was extremely simple (for each child, 1 column = 1 constant,
  always the same column).
  
  As I was surprised by this very large planning time, I have been trying
  to study the variation of planning time against several parameters: -
  number of columns
  - number of children tables
  - constraint exclusion's value (partition or off)
  
  What (I think) I measured is that the planning time seems to be O(n^2)
  for the number of columns, and O(n^2) for the number of children tables.
  
  Constraint exclusion had a limited impact on planning time (it added
  between 20% and 100% planning time when there were many columns).
 
 Testing here with a table with 1000 columns and 100 partitions, about
 80% of the planning time is looking up the statistics on attribute
 width, to calculate average tuple width. I don't see O(n^2) behavior,
 though, it seems linear.

It is only based on experimentation on my part, of course…

If you measure the planning time while varying either the number of columns or the 
number of partitions, the square root of the planning time is almost perfectly 
proportional to the parameter you're varying.


Re: [PERFORM] inheritance: planning time vs children number vs column number

2011-02-28 Thread Marc Cousin
The Monday 28 February 2011 16:35:37, Tom Lane wrote :
 Marc Cousin cousinm...@gmail.com writes:
  The Monday 28 February 2011 13:57:45, Heikki Linnakangas wrote :
  Testing here with a table with 1000 columns and 100 partitions, about
  80% of the planning time is looking up the statistics on attribute
  width, to calculate average tuple width. I don't see O(n^2) behavior,
  though, it seems linear.
  
  It is only based on experimentation on my part, of course…

  If you measure the planning time while varying either the number of columns
  or the number of partitions, the square root of the planning time is almost
  perfectly proportional to the parameter you're varying.
 
 Could we see a concrete example demonstrating that?  I agree with Heikki
 that it's not obvious what you are testing that would have such behavior.
 I can think of places that would have O(N^2) behavior in the length of
 the targetlist, but it seems unlikely that they'd come to dominate
 runtime at a mere 1000 columns.
 
   regards, tom lane

I feel a little silly not having provided a test case from the start…

A script doing a complete test is attached to this email.

It's doing a simple 

CREATE TABLE test_father (col0 int,col1 int,col2 int,col3 int,col4 int,col5 int,col6 int,col7 int,col8 int,col9 int,col10 int,col11 int,col12 int,col13 int,col14 int,col15 int,col16 int,col17 int,col18 int,col19 int,col20 int,col21 int,col22 int,col23 int,…)

Followed by 600 
CREATE TABLE test_child_0 (CHECK (col0=0)) INHERITS (test_father);

And a single 

SELECT col0 FROM test_father WHERE col0=0;
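For anyone wanting to reproduce this without the attached script, here is a
sketch of how the children can be generated (assuming a server recent enough
for DO blocks, i.e. 9.0+):

  do $$
  begin
    for i in 0..599 loop
      -- create test_child_0 .. test_child_599, each with CHECK (col0 = <i>)
      execute 'create table test_child_' || i
           || ' (check (col0 = ' || i || ')) inherits (test_father)';
    end loop;
  end
  $$;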


Here are my results (from the same machine). I've done it with 600 partitions, 
to have big planning times. If you need a smaller one (this one takes nearly 
ten minutes to run) tell me.

COLS:100 PARTITIONS:600
Time : 513,764 ms (sqrt : 22.6)
COLS:200 PARTITIONS:600
Time : 906,214 ms (sqrt : 30.1)
COLS:300 PARTITIONS:600
Time : 2255,390 ms (sqrt : 47.48)
COLS:400 PARTITIONS:600
Time : 4816,820 ms (sqrt : 69.4)
COLS:500 PARTITIONS:600
Time : 5736,602 ms (sqrt : 75.73)
COLS:600 PARTITIONS:600
Time : 7659,617 ms (sqrt : 87.51)
COLS:700 PARTITIONS:600
Time : 9313,260 ms (sqrt : 96.5)
COLS:800 PARTITIONS:600
Time : 13700,353 ms (sqrt : 117.04)
COLS:900 PARTITIONS:600
Time : 13914,765 ms (sqrt : 117.95)
COLS:1000 PARTITIONS:600
Time : 20335,750 ms (sqrt : 142.6)
COLS:1100 PARTITIONS:600
Time : 21048,958 ms (sqrt : 145.08)
COLS:1200 PARTITIONS:600
Time : 27619,559 ms (sqrt : 166.18)
COLS:1300 PARTITIONS:600
Time : 31357,353 ms (sqrt : 177.08)
COLS:1400 PARTITIONS:600
Time : 34435,711 ms (sqrt : 185.57)
COLS:1500 PARTITIONS:600
Time : 38954,676 ms (sqrt : 197.37)


As for my previous results, these were taken on a machine doing a bit of other 
work, so some values may be a bit off, and it's only one measurement each time 
anyway.

The CSV file I sent with the first email was obtained by running the exact same 
commands, but varying both the number of columns and the number of partitions, 
and averaging over 3 measurements.

Regards.


script.sql.bz2
Description: application/bzip

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] inheritance: planning time vs children number vs column number

2011-02-28 Thread Marc Cousin
On Tuesday 01 March 2011 07:20:19, Tom Lane wrote:
 Marc Cousin cousinm...@gmail.com writes:
  The Monday 28 February 2011 16:35:37, Tom Lane wrote :
  Could we see a concrete example demonstrating that?  I agree with Heikki
  that it's not obvious what you are testing that would have such
  behavior. I can think of places that would have O(N^2) behavior in the
  length of the targetlist, but it seems unlikely that they'd come to
  dominate runtime at a mere 1000 columns.
  
  I feel a little silly not having provided a test case from the start…
  
  A script doing a complete test is attached to this email.
 
 I did some oprofile analysis of this test case.  It's spending
 essentially all its time in SearchCatCache, on failed searches of
 pg_statistic.  The cache accumulates negative entries for each probed
 column, and then the searches take time proportional to the number of
 entries, so indeed there is an O(N^2) behavior --- but N is the number
 of columns times number of tables in your test case, not just the number
 of columns.
 
 The cache is a hash table, so ideally the search time would be more or
 less constant as the table grows, but to make that happen we'd need to
 reallocate with more buckets as the table grows, and catcache.c doesn't
 do that yet.  We've seen a few cases that make that look worth doing,
 but they tend to be pretty extreme, like this one.
 
 It's worth pointing out that the only reason this effect is dominating
 the runtime is that you don't have any statistics for these toy test
 tables.  If you did, the cycles spent using those entries would dwarf
 the lookup costs, I think.  So it's hard to get excited about doing
 anything based on this test case --- it's likely the bottleneck would be
 somewhere else entirely if you'd bothered to load up some data.
 
   regards, tom lane

Yes, for the same test case, with a bit of data in every partition and 
statistics up to date, planning time goes from 20 seconds to 125ms for the 600 
children/1000 columns case. Which is of course more than acceptable.

Now I've got to check it's the same problem on the real environment. I think 
it has quite a few empty partitions, so no statistics for them…

Thanks a lot.

Marc

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] performance regression with Linux 2.6.33 and glibc 2.12

2010-06-04 Thread Marc Cousin
Hi.

I hope I'm not going to expose an already known problem, but I couldn't find 
it in the mailing list archives (I only found http://archives.postgresql.org/pgsql-
hackers/2009-12/msg01543.php).

On one of my (non-production) machines, I've just seen a very big performance 
regression (I was doing a very simple insert test). I had an 'old' 8.4 
postgresql compiled a few months ago, performing very well, and my 'bleeding 
edge' 9.0 doing the same insert very slowly.

I managed to find the cause of the regression: with Linux 2.6.33, O_DSYNC is 
now available. With glibc 2.12, O_DSYNC is available in userspace. Having both 
(they are both very new; 2.12 isn't even official on the glibc website), my new 
build defaulted to open_datasync. The problem is that it is much slower. I 
tested it on 2 small machines (no big raid, just basic machines, with SATA or 
software RAID).

Here is the trivial test :
The configuration is the default configuration, just after initdb

CREATE TABLE test (a int);
CREATE INDEX idxtest on test (a);



with wal_sync_method = open_datasync (new default)

marc=# INSERT INTO test SELECT generate_series(1,100000);
INSERT 0 100000
Time: 16083,912 ms

with wal_sync_method = fdatasync (old default)

marc=# INSERT INTO test SELECT generate_series(1,100000);
INSERT 0 100000
Time: 954,000 ms

Doing synthetic benchmarks with test_fsync:

open_datasync performance, glibc 2.12, 2.6.34, 1 SATA drive

Simple 8k write timing:
write   0.037511

Compare file sync methods using one 8k write:
        open_datasync write            56.998797
        open_sync write               168.653995
        write, fdatasync               55.359279
        write, fsync                  166.854911

Compare file sync methods using two 8k writes:
        open_datasync write, write    113.342738
        open_sync write, write        339.066883
        write, write, fdatasync        57.336820
        write, write, fsync           166.847923

Compare open_sync sizes:
        16k open_sync write           169.423723
        2 8k open_sync writes         336.457119
Compare fsync times on write() and new file descriptors (if the times
are similar, fsync() can sync data written on a different descriptor):
write, fsync, close   166.264048
write, close, fsync   168.702035

This is it, I just wanted to raise an alert on this: the degradation was 16-
fold with this test. We won't see Linux 2.6.33 + glibc 2.12 in production 
for months (I hope), but shouldn't PostgreSQL use fdatasync by default on 
Linux, given these results?

By the way, I re-did my tests with both 2.6.33, 2.6.34 and 2.6.35-rc1 and got 
the exact same result (O_DSYNC there, obviously, but also the performance 
degradation).

Cheers

Marc

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] performance regression with Linux 2.6.33 and glibc 2.12

2010-06-04 Thread Marc Cousin
The Friday 04 June 2010 15:59:05, Tom Lane wrote :
 Marc Cousin cousinm...@gmail.com writes:
  I hope I'm not going to expose an already known problem, but I couldn't
  find it mailing list archives (I only found
  http://archives.postgresql.org/pgsql- hackers/2009-12/msg01543.php).
 
 You sure this isn't the well-known "ext4 actually implements fsync
 where ext3 didn't" issue?
 
   regards, tom lane

Everything is ext4. So I should have fsync working with write barriers in all 
the tests.

I don't think this problem is of the same kind: I think it is really because 
of O_DSYNC appearing in 2.6.33, and PostgreSQL using it by default now. If my 
filesystem were lying to me about barriers, I shouldn't take a bigger performance 
hit with open_datasync than with fdatasync, should I?

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why is PostgreSQL so slow on Windows ( Postgres 8.3.7) version

2009-08-03 Thread Marc Cousin
The few 'obvious' things I see:

ID and POLLID aren't of the same type (numeric vs bigint).

TTIME isn't indexed.

And as a general matter, you should stick to native datatypes if you don't 
need numeric.

But as said in the other answer, maybe you should redo this schema with 
more consistent datatypes.

Anyway, from what I remember, it's not advised to set shared buffers that 
high on Windows (I don't do that much Windows myself, so maybe someone better 
informed will chime in).

Anyway you can start by correcting the schema…
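To illustrate, here is a sketch of the kind of change suggested above, using the
names from the schema quoted below (bigint everywhere, and TTIME indexed as well;
index names are made up):

  create table STATSDATA8_21_2009 (
    POLLID   bigint,
    INSTANCE varchar(100),
    TTIME    bigint,
    VAL      bigint);
  create index STATSDATA8_21_2009_pollid_ndx on STATSDATA8_21_2009 (POLLID);
  create index STATSDATA8_21_2009_ttime_ndx  on STATSDATA8_21_2009 (TTIME);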

On Friday 31 July 2009 07:45:55 pari krishnan wrote:
 Dear All,


 We are
 using Postgres 8.3.7 in our java application. We are doing performances
 tuning and load testing in our setup. we have noticed that ,some of our
 queries to the database taking long time to return the results.Please
 find our setup details belows.

 We observed that postgres is running in windows is slower than the linux .

 Machine  Database Details :

 Windows configuration:
 4 GB RAM
 4*1.6 GHZ
 windows 2008 server standard edition

 Postgresql configuration:

 shared_buffers: 1 GB
 Effective_cache_size: 2GB
 fsync: off  (even when we tested with this parameter on, we observed the same 
 slowness)


 Database Details :

 Postgres  Database   : PostgreSQL 8.3.7.1
 Driver Version: PostgreSQL 8.3 JDBC4 with SSL (build 604)
 We are using 40 database connections.


 We have a few tables which hold a larger amount of data. While running
 our application, a STATSDATA table is created daily, with the date in the
 table name, like STATSDATA8_21_2009.

 Schema for STATSDATA table

 create table STATSDATA8_21_2009(
 POLLID Numeric(19),
 INSTANCE varchar(100),
 TTIME Numeric(19),
 VAL Numeric(13)) ;CREATE INDEX POLLID%_ndx on STATSDATA%(POLLID)

 Schema for PolledData

 create table PolledData(
 NAME varchar(50) NOT NULL ,
 ID BIGINT NOT NULL ,
 AGENT varchar(50) NOT NULL ,
 COMMUNITY varchar(100) NOT NULL ,
 PERIOD INTEGER NOT NULL,
 ACTIVE varchar(10),
 OID varchar(200) NOT NULL,
 LOGDIRECTLY varchar(10),
 LOGFILE varchar(100),
 SSAVE varchar(10),
 THRESHOLD varchar(10),
 ISMULTIPLEPOLLEDDATA varchar(10),
 PREVIOUSSEVERITY INTEGER,
 NUMERICTYPE INTEGER,
 SAVEABSOLUTES varchar(10),
 TIMEAVG varchar(10),
 PORT INTEGER,
 WEBNMS varchar(100),
 GROUPNAME varchar(100),
 LASTCOUNTERVALUE BIGINT ,
 LASTTIMEVALUE BIGINT ,
 TIMEVAL BIGINT NOT NULL ,
 POLICYNAME varchar(100),
 THRESHOLDLIST varchar(200),
 DNSNAME varchar(100),
 SUFFIX varchar(20),
 STATSDATATABLENAME varchar(100),
 POLLERNAME varchar(200),
 FAILURECOUNT INTEGER,
 FAILURETHRESHOLD INTEGER,
 PARENTOBJ varchar(100),
 PROTOCOL varchar(50),
 SAVEPOLLCOUNT INTEGER,
 CURRENTSAVECOUNT INTEGER,
 SAVEONTHRESHOLD varchar(10),
 SNMPVERSION varchar(10),
 USERNAME varchar(30),
 CONTEXTNAME varchar(30),
 PRIMARY KEY (ID,NAME,AGENT,OID),
 index PolledData0_ndx ( NAME),
 index PolledData1_ndx ( AGENT),
 index PolledData2_ndx ( OID),
 index PolledData3_ndx ( ID),
 index PolledData4_ndx ( PARENTOBJ),
 )


 We have 300k rows in the PolledData table. In each STATSDATA table, we have
 almost 12 to 13 million rows. Every one-minute interval, we insert data
 into the STATSDATA table. In our application, we use insert and select
 queries against the STATSDATA table at regular intervals. Please let us know
 why the query below takes so long to return the results. Is there anything
 we need to do to tune the postgres database?




 Please find explain analyze output.


 First Query :

 postgres=# explain analyze SELECT COUNT(*) FROM ( SELECT ID, PolledData.AGENT, NAME, INSTANCE, TTIME, VAL FROM PolledData, STATSDATA8_21_2009 WHERE ( ( PolledData.ID=STATSDATA8_21_2009.POLLID) AND ( ( TTIME >= 1250838027454) AND ( TTIME <= 1250838079654) ) ) ) t1;
                                                              QUERY PLAN
 -------------------------------------------------------------------------------------------------------------------------------
  Aggregate  (cost=773897.12..773897.13 rows=1 width=0) (actual time=17818.410..17818.412 rows=1 loops=1)
    ->  Merge Join  (cost=717526.23..767505.06 rows=2556821 width=0) (actual time=17560.469..17801.790 rows=13721 loops=1)
          Merge Cond: (statsdata8_21_2009.pollid = ((polleddata.id)::numeric))
          ->  Sort  (cost=69708.44..69742.49 rows=13619 width=8) (actual time=2392.659..2416.093 rows=13721 loops=1)
                Sort Key: statsdata8_21_2009.pollid
                Sort Method:  quicksort  Memory: 792kB
                ->  Seq Scan on statsdata8_21_2009  (cost=0.00..68773.27 rows=13619 width=8) (actual time=0.077..2333.132 rows=13721 loops=1)
                      Filter: ((ttime >= 1250838027454::numeric) AND (ttime <= 1250838079654::numeric))
          ->  Materialize  (cost=647817.78..688331.92 rows=3241131 width=8) (actual time=15167.767..15282.232 rows=21582 loops=1)
                ->  Sort  (cost=647817.78..655920.61 rows=3241131 width=8) (actual time=15167.756..15218.645 rows=21574 loops=1)

Re: [PERFORM] Very big insert/join performance problem (bacula)

2009-07-23 Thread Marc Cousin
 It really has very little impact.  It only affects index scans, and
 even then only if effective_cache_size is less than the size of the
 table.

 Essentially, when this kicks in, it models the effect that if you are
 index scanning a table much larger than the size of your cache, you
 might have to reread some blocks that you previously read in during
 *that same index scan*.


Ok, thanks for clearing that up for me. Still, I think the doc could be 
improved on this point (sorry to be a bit obsessed with that, but I'm one of 
the French translators, so I like the doc to be perfect :) )

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Very big insert/join performance problem (bacula)

2009-07-16 Thread Marc Cousin
On Thursday 16 July 2009 07:20:18 Marc Cousin wrote:
 On Thursday 16 July 2009 01:56:37, Devin Ben-Hur wrote:
  Marc Cousin wrote:
   This mail contains the asked plans :
   Plan 1
   around 1 million records to insert, seq_page_cost 1, random_page_cost 4
  
   ->  Hash  (cost=425486.72..425486.72 rows=16746972 width=92) (actual time=23184.196..23184.196 rows=16732049 loops=1)
         ->  Seq Scan on path  (cost=0.00..425486.72 rows=16746972 width=92) (actual time=0.004..7318.850 rows=16732049 loops=1)

   ->  Hash  (cost=1436976.15..1436976.15 rows=79104615 width=35) (actual time=210831.840..210831.840 rows=79094418 loops=1)
         ->  Seq Scan on filename  (cost=0.00..1436976.15 rows=79104615 width=35) (actual time=46.324..148887.662 rows=79094418 loops=1)
 
  This doesn't address the cost driving plan question, but I think it's a
  bit puzzling that a seq scan of 17M 92-byte rows completes in 7 secs,
  while a seqscan of 79M 35-byte rows takes 149secs.  It's about 4:1 row
  ratio, less than 2:1 byte ratio, but a 20:1 time ratio.  Perhaps there's
  some terrible bloat on filename that's not present on path?  If that seq
  scan time on filename were proportionate to path this plan would
  complete about two minutes faster (making it only 6 times slower instead
  of 9 :).

 Much simpler than that, I think: a bigger percentage of path than of filename
 is used all the time. The database used is the production
 database, so there were other insert queries running a few minutes before I
 got this plan.

 But I'll give it a look today and come back with bloat and cache
 information on these 2 tables.

Here are the stats for filename :

SELECT * from pgstattuple('public.filename');
 table_len  | tuple_count | tuple_len  | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
------------+-------------+------------+---------------+------------------+----------------+--------------------+------------+--------------
 5308719104 |    79338344 | 4717466438 |         88.86 |                0 |              0 |                  0 |   11883396 |         0.22

So I guess it's not bloated.

I checked what was in the cache: the times displayed before were with path in the 
cache. filename couldn't stay in the cache, as it's too big.
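(In case it helps someone reproduce this, here is a sketch of how the
shared_buffers part of that can be checked — it doesn't show the OS cache, and
it assumes the pg_buffercache contrib module is installed in this database:)

  select c.relname,
         count(*)            as buffers,
         count(*) * 8 / 1024 as cached_mb   -- 8 kB pages
    from pg_buffercache b
    join pg_class c on c.relfilenode = b.relfilenode
   where c.relname in ('path', 'filename')
   group by c.relname;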

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Very big insert/join performance problem (bacula)

2009-07-16 Thread Marc Cousin
On Thursday 16 July 2009 22:07:25, Kevin Grittner wrote:
 Marc Cousin cousinm...@gmail.com wrote:
  the hot parts of these 2 tables are extremely likely to be in the
  database or linux cache (buffer hit rate was 97% in the example
  provided). Moreover, the first two queries of the insert procedure
  fill the cache for us...

 This would be why the optimizer does the best job estimating the
 relative costs of various plans when you set the random_page_cost and
 seq_page_cost very low.

 -Kevin


Ok, so to sum it up, should I keep these values (I hate doing this :) )? 
Would there be a way to approximately evaluate them with regard to the expected 
buffer hit ratio of the query?


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Very big insert/join performance problem (bacula)

2009-07-16 Thread Marc Cousin
On Thursday 16 July 2009 23:54:54, Kevin Grittner wrote:
 Marc Cousin cousinm...@gmail.com wrote:
  to sum it up, should I keep these values (I hate doing this :) ) ?

 Many people need to set the random_page_cost and/or seq_page_cost to
 reflect the overall affect of caching on the active portion of the
 data.  We set our fully-cached databases to 0.1 for both.  Databases
 with less caching usually wind up at 2 and 1.  We have one database
 which does best at 0.5 and 0.3.  My advice is to experiment and try to
 find a pair of settings which works well for most or all of your
 queries.  If you have a few which need a different setting, you can
 set a special value right before running the query, but I've always
 been able to avoid that (thankfully).

  Would there be a way to approximately evaluate them regarding to
  the expected buffer hit ratio of the query ?

 Nothing query-specific except setting them on the connection right
 before the query (and setting them back or discarding the connection
 afterward).  Well, that and making sure that effective_cache_size
 reflects reality.

 -Kevin


OK, thanks a lot.

A last thing :

As mentioned in another mail from the thread (from Richard Huxton), I find 
this sentence in the documentation a bit misleading:

effective_cache_size (integer)
 Sets the planner's assumption about the effective size of the disk cache that 
is available to a single query

I don't really know what 'a single query' means here. I interpreted it as 
'divide it by the number of queries typically running in parallel on the 
database'. Maybe it should be rephrased? (I may not be the one 
misunderstanding it.)

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Very big insert/join performance problem (bacula)

2009-07-15 Thread Marc Cousin
This mail contains the asked plans :

I've done them with the different configurations, as I had done the effort of 
setting up the whole thing :)
Stats were updated between all runs. Each time is the first run of the query 
(that's what we have in production with bacula)
And I added the executor stats, in case ...

By the way, I think I must mention it, the whole thing runs over DRBD, but with 
2 gigabyte links between the master and the slave.
And I tried deactivating replication when things got really slow (despooling in 
24 hours); it changed nothing (sorts were a bit faster, 
around 20%). The server has 12 GB of RAM and 1 quad-core Xeon E5335.

PostgreSQL starts to hash filename a bit later than what I said in the first 
mail, because it's become bigger (it was around 30-40 million last time I did 
the tests).

This is the query (temp_mc is the table I've created to do my tests ...):

explain ANALYZE SELECT batch.FileIndex,
  batch.JobId,
  Path.PathId,
  Filename.FilenameId,
  batch.LStat,
  batch.MD5
  FROM temp_mc AS batch
  JOIN Path ON (batch.Path = Path.Path)
  JOIN Filename ON (batch.Name = Filename.Name);


Plan 1
around 1 million records to insert, seq_page_cost 1, random_page_cost 4

LOG:  EXECUTOR STATISTICS
DETAIL:  ! system usage stats:
!   380.143452 elapsed 79.000938 user 44.386774 system sec
!   [415.785985 user 155.733732 sys total]
!   15848728/12934936 [24352752/50913184] filesystem blocks in/out
!   0/44188 [86/987512] page faults/reclaims, 0 [0] swaps
!   0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
!   93812/40706 [405069/184511] voluntary/involuntary context switches
! buffer usage stats:
!   Shared blocks: 877336 read,      0 written, buffer hit rate = 6.75%
!   Local  blocks:      0 read,      0 written, buffer hit rate = 0.00%
!   Direct blocks:      0 read,      0 written

 Hash Join  (cost=3923929.71..5131377.91 rows=1286440 width=91) (actual time=234021.194..380018.709 rows=1286440 loops=1)
   Hash Cond: (batch.name = filename.name)
   ->  Hash Join  (cost=880140.87..1286265.62 rows=1286440 width=102) (actual time=23184.959..102400.782 rows=1286440 loops=1)
         Hash Cond: (batch.path = path.path)
         ->  Seq Scan on temp_mc batch  (cost=0.00..49550.40 rows=1286440 width=189) (actual time=0.007..342.396 rows=1286440 loops=1)
         ->  Hash  (cost=425486.72..425486.72 rows=16746972 width=92) (actual time=23184.196..23184.196 rows=16732049 loops=1)
               ->  Seq Scan on path  (cost=0.00..425486.72 rows=16746972 width=92) (actual time=0.004..7318.850 rows=16732049 loops=1)
   ->  Hash  (cost=1436976.15..1436976.15 rows=79104615 width=35) (actual time=210831.840..210831.840 rows=79094418 loops=1)
         ->  Seq Scan on filename  (cost=0.00..1436976.15 rows=79104615 width=35) (actual time=46.324..148887.662 rows=79094418 loops=1)
 Total runtime: 380136.601 ms


Plan 2
the same insert, with seq_page_cost to 0.01 and random_page_cost to 0.02

DETAIL:  ! system usage stats:
!   42.378039 elapsed 28.277767 user 12.192762 system sec
!   [471.865489 user 180.499280 sys total]
!   0/4072368 [24792848/59059032] filesystem blocks in/out
!   0/0 [86/989858] page faults/reclaims, 0 [0] swaps
!   0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
!   1061/9131 [429738/200320] voluntary/involuntary context switches
! buffer usage stats:
!   Shared blocks: 251574 read,      0 written, buffer hit rate = 96.27%
!   Local  blocks:      0 read,      0 written, buffer hit rate = 0.00%
!   Direct blocks:      0 read,      0 written
LOG:  duration: 42378.373 ms  statement: 
                                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=381840.21..1012047.92 rows=1286440 width=91) (actual time=20284.387..42242.955 rows=1286440 loops=1)
   Hash Cond: (batch.path = path.path)
   ->  Nested Loop  (cost=0.00..583683.91 rows=1286440 width=178) (actual time=0.026..10333.636 rows=1286440 loops=1)
         ->  Seq Scan on temp_mc batch  (cost=0.00..13231.26 rows=1286440 width=189) (actual time=0.008..380.361 rows=1286440 loops=1)
         ->  Index Scan using filename_name_idx on filename  (cost=0.00..0.43 rows=1 width=35) (actual time=0.006..0.007 rows=1 loops=1286440)
               Index Cond: (filename.name = batch.name)
   ->  Hash  (cost=170049.89..170049.89 rows=16746972 width=92) (actual time=20280.729..20280.729 rows=16732049 loops=1)
         ->  Seq Scan on path  (cost=0.00..170049.89 rows=16746972 width=92) (actual time=0.005..4560.872 rows=16732049 loops=1)
 Total runtime: 42371.362 ms


The thing is that this query is ten 

Re: [PERFORM] Very big insert/join performance problem (bacula)

2009-07-15 Thread Marc Cousin
On Wednesday 15 July 2009 15:45:01, Alvaro Herrera wrote:
 Marc Cousin wrote:
  There are other things I am thinking of : maybe it would be better to
  have sort space on another (and not DBRD'ded) raid set ? we have a quite
  cheap setup right now for the database, and I think maybe this would help
  scale better. I can get a filesystem in another volume group, which is
  not used that much for now.

 You know, that's the first thing it came to me when I read you're using
 DRDB.  Have you tried setting temp_tablespace to a non-replicated disk?

I wish I could, easily. I'm not entitled to tune the database, only to give 
directives. I've given this one, but I don't know when it will be done. I'll 
keep you informed on this one, but I don't have my hopes too high.

As mentioned before, I tried deactivating DRBD (still using the DRBD device, 
but not connected to the other node, so it has almost no effect). It didn't 
change much (performance was a bit better, around 20%).

Anyway, the thing is that:
- big sorts kill my machine when there are more than 5 of them. I think it is 
a system problem (raid, filesystem, linux tuning, I don't really know; I'll have 
to dig into this, but it will be complicated, for human reasons :) )
- the plan through nested loops is faster anyway, and I think it's because 
only a small fraction of filename and path is used (most files 
backed up have the same name or path, as we save 600 machines with mostly 2 
OSes, linux and windows), so the hot parts of these 2 tables are extremely 
likely to be in the database or linux cache (buffer hit rate was 97% in the 
example provided). Moreover, the first two queries of the insert procedure fill 
the cache for us...




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Very big insert/join performance problem (bacula)

2009-07-15 Thread Marc Cousin
On Thursday 16 July 2009 01:56:37, Devin Ben-Hur wrote:
 Marc Cousin wrote:
  This mail contains the asked plans :
  Plan 1
  around 1 million records to insert, seq_page_cost 1, random_page_cost 4
 
   ->  Hash  (cost=425486.72..425486.72 rows=16746972 width=92) (actual time=23184.196..23184.196 rows=16732049 loops=1)
         ->  Seq Scan on path  (cost=0.00..425486.72 rows=16746972 width=92) (actual time=0.004..7318.850 rows=16732049 loops=1)

   ->  Hash  (cost=1436976.15..1436976.15 rows=79104615 width=35) (actual time=210831.840..210831.840 rows=79094418 loops=1)
         ->  Seq Scan on filename  (cost=0.00..1436976.15 rows=79104615 width=35) (actual time=46.324..148887.662 rows=79094418 loops=1)

 This doesn't address the cost driving plan question, but I think it's a
 bit puzzling that a seq scan of 17M 92-byte rows completes in 7 secs,
 while a seqscan of 79M 35-byte rows takes 149secs.  It's about 4:1 row
 ratio, less than 2:1 byte ratio, but a 20:1 time ratio.  Perhaps there's
 some terrible bloat on filename that's not present on path?  If that seq
 scan time on filename were proportionate to path this plan would
 complete about two minutes faster (making it only 6 times slower instead
 of 9 :).
Much simpler than that, I think: a bigger percentage of path than of filename is 
used all the time. The database used is the production 
database, so there were other insert queries running a few minutes before I 
got this plan.

But I'll give it a look today and come back with bloat and cache information 
on these 2 tables.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Very big insert/join performance problem (bacula)

2009-07-14 Thread Marc Cousin
On Tuesday 14 July 2009 10:15:21, you wrote:
 Marc Cousin wrote:
  Your effective_cache_size is really small for the system you seem to
  have - its the size of IO caching your os is doing and uses no resources
  itself. And 800MB of that on a system with that amount of data seems a
  bit unlikely ;-)
 
  Using `free` you can see the amount of io caching your OS is doing atm.
  in the 'cached' column.
 
  That possibly might tip some plans in a direction you prefer.
 
  What kind of machine are you running this on?
 
  I played with this parameter too, and it didn't influence the plan.
  Anyway, the doc says it's the OS cache available for one query,

 No they don't. I'm guessing you're getting mixed up with work_mem.

I'm not (from the docs) :
effective_cache_size (integer)
 Sets the planner's assumption about the effective size of the disk cache that 
is available to a single query

I trust you, of course, but then I think maybe this should be rephrased in the 
doc, because I understand it the way I said ... I always had a doubt about 
this sentence, and that's why I tried both 800MB and 8GB for this parameter.


   and there may be a lot of
 
  insert queries at the same time, so I chose to be conservative with this
  value. I tried it with 8GB too, the plans were the same.
 
  The OS cache is around 8-10GB by the way.

 That's what you need to set effective_cache_size to then.
Ok but that doesn't change a thing for this query (I had a doubt on this 
parameter and tried with both 800MB and 8GB)

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Very big insert/join performance problem (bacula)

2009-07-14 Thread Marc Cousin
On Tuesday 14 July 2009 10:23:25, Richard Huxton wrote:
 Marc Cousin wrote:
  Temporarily I moved the problem at a bit higher sizes of batch by
  changing random_page_cost to 0.02 and seq_page_cost to 0.01, but I feel
  like an apprentice sorcerer with this, as I told postgreSQL that fetching
  rows from disk are much cheaper than they are. These values are, I think,
  completely abnormal.

 They certainly don't have anything to do with reality. Try putting them
 back to (say) seq_page_cost=1 and random_page_cost=2.

That's the first thing I tried (it seemed more sensible), and it didn't work. I 
can't put them back to these values for more than one test query; the server 
was really dying before I changed the settings.


  So, finally, to my questions :
  - Is it normal that PostgreSQL is this off base on these queries (sorry I
  don't have the plans, if they are required I'll do my best to get some,
  but they really are the two obvious plans for this kind of query). What
  could make it choose the hash join for too small batch tables ?

 No point in speculating without plans.

Ok, I'll try to have them tomorrow.


  - Is changing the 2 costs the way to go ?

 Not the way you have.
That's what I thought, and the reason I posted :)


  - Is there a way to tell postgreSQL that it's more costly to sort than it
  thinks ? (instead of telling it that fetching data from disk doesn't cost
  anything).

 That's what the configuration settings do. But if you put a couple way
 off from reality it'll be pure chance if it gets any estimates right.

  Here are the other non-default values from my configuration :
 
  shared_buffers = 2GB
  work_mem = 64MB

 Set this *much* higher when you are running your bulk imports. You can
 do it per-connection. Try 256MB, 512MB, 1GB (but keep an eye on total
 memory used).

I'll try that. But anyhow, I've got much better performance when not doing the 
hash join. I'll get back with the plans as soon as possible.
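(For reference, the per-connection override Richard mentions is just a sketch
like this, run by the session doing the bulk import:)

  set work_mem = '512MB';   -- illustrative value, watch total memory with many sessions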


  maintenance_work_mem = 256MB
  max_fsm_pages = 1500 # There are quite big deletes with bacula ...
  effective_cache_size = 800MB

 See other emails on this one.

  default_statistics_target = 1000

 Probably don't need this for all columns, but it won't cause problems
 with these queries.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Very big insert/join performance problem (bacula)

2009-07-13 Thread Marc Cousin
 to underestimate the cost 
of sorting here (the row estimates were good, no problem with that).
PostgreSQL seems to think that at around 1 million records in file it should 
go with a hash join on filename and path, so we go on hashing the 17 million 
records of path and the 80 million of filename, then joining and inserting into 
file (we're talking about sorting around 15 GB for each of these despools in 
parallel).

Temporarily I pushed the problem back to somewhat larger batch sizes by changing 
random_page_cost to 0.02 and seq_page_cost to 0.01, but I feel like a 
sorcerer's apprentice with this, as I told PostgreSQL that fetching rows from 
disk is much cheaper than it really is. These values are, I think, completely 
abnormal. Doing this, I got the change of plan at around 8 million. And I had 2 
of them at 9 million at the same time this weekend, and both took 24 
hours, while the nested loops before the join (for the inserts into path and 
filename) did their work in minutes...

So, finally, to my questions :
- Is it normal that PostgreSQL is this off base on these queries (sorry I 
don't have the plans, if they are required I'll do my best to get some, but 
they really are the two obvious plans for this kind of query). What could 
make it choose the hash join for too small batch tables ?
- Is changing the 2 costs the way to go ?
- Is there a way to tell postgreSQL that it's more costly to sort than it 
thinks ? (instead of telling it that fetching data from disk doesn't cost 
anything).




Here are the other non-default values from my configuration :

shared_buffers = 2GB
work_mem = 64MB
maintenance_work_mem = 256MB
max_fsm_pages = 1500 # There are quite big deletes with bacula ...
effective_cache_size = 800MB
default_statistics_target = 1000

PostgreSQL is 8.3.5 on Debian Lenny



I'm sorry for this very long email, I tried to be as precise as I could, but 
don't hesitate to ask me more.

Thanks for helping.

Marc Cousin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Very big insert/join performance problem (bacula)

2009-07-13 Thread Marc Cousin
We regularly run all of dbcheck. This is our real configuration; there really 
are lots of servers and lots of files (500 million files backed up every 
month).

But thanks for mentioning that.

The thing is, we're trying to improve bacula with postgresql in order to make 
it able to cope with this kind of volume. So we are looking for things to 
improve in bacula and in postgresql tuning to make it cope with the queries 
mentioned (or to rewrite the queries or the way inserts are done; that may not 
be a problem either).

On Monday 13 July 2009 16:37:06 SystemManagement wrote:
 Hi,
 just a remark, as the number of entries seems to be very high:
 Did you ever activate bacula's program dbcheck Option 16?

 Regards

 Reiner

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Very big insert/join performance problem (bacula)

2009-07-13 Thread Marc Cousin

 While this is not your questions, I still noticed you seem to be on 8.3 -
 it might be a bit faster to use GROUP BY instead of DISTINCT.
It didn't make a big difference; I had already tried that before for this query. 
Anyway, as you said, it's not the query having problems :)


 Your effective_cache_size is really small for the system you seem to have -
 its the size of IO caching your os is doing and uses no resources itself.
 And 800MB of that on a system with that amount of data seems a bit unlikely
 ;-)

 Using `free` you can see the amount of io caching your OS is doing atm. in
 the 'cached' column.

 That possibly might tip some plans in a direction you prefer.

 What kind of machine are you running this on?

I played with this parameter too, and it didn't influence the plan. Anyway, the 
doc says it's the OS cache available for one query, and there may be a lot of 
insert queries at the same time, so I chose to be conservative with this 
value. I tried it with 8GB too, the plans were the same.

The OS cache is around 8-10GB by the way.

The machine is a dell PE2900, with 6 disks dedicated to this database (raid 10 
config)

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Scalability in postgres

2009-06-04 Thread Marc Cousin
It's not that trivial with Oracle either. I guess you had to use shared 
servers to get to that number of sessions. They're usually not 
activated by default (dispatchers is at 0).

Granted, they are part of the 'main' product, so you just have to set up 
dispatchers, shared servers, circuits, etc., but there is still setup to 
do: dispatchers are (if I recall correctly) a completely manual parameter 
(and every dispatcher can only drive a certain number of sessions, dependent 
on the operating system), whereas shared servers are a bit more dynamic, but 
still use processes (so you may have to tweak max processes as well).

What I mean to say is that Oracle does something quite like PostgreSQL + a 
connection pooler, even if it's more advanced (it's a shared memory structure 
that is used to send messages between dispatchers and shared servers).

Or did you mean that you had thousands of sessions in dedicated mode?



On Wednesday 03 June 2009 20:13:39 Dimitri wrote:
 Just to say you don't need a mega server to keep thousands connections
 with Oracle, it's just trivial, nor CPU affinity and other stuff you
 may or may not need with Sybase :-)

 Regarding PostgreSQL, I think it'll only benefit to have an integrated
 connection pooler as it'll make happy all populations anyway:
   - those who don't like the idea may always disable it :-)
   - those who have a lot but mostly inactive sessions will be happy to
 simplify session pooling
   - those who really seeking for the most optimal workload on their
 servers will be happy twice: if there are any PG scalability limits,
 integrated pooler will be in most cases more performant than external;
 if there are no PG scalability limits - it'll still help to size PG
 most optimally according a HW or OS capacities..

 Rgds,
 -Dimitri

 On 6/3/09, Kevin Grittner kevin.gritt...@wicourts.gov wrote:
  Dimitri dimitrik...@gmail.com wrote:
  Few weeks ago tested a customer application on 16 cores with Oracle:
- 20,000 sessions in total
- 70,000 queries/sec
 
  without any problem on a mid-range Sun box + Solaris 10..
 
  I'm not sure what point you are trying to make.  Could you elaborate?
 
  (If it's that Oracle doesn't need an external connection pool, then
  are you advocating that PostgreSQL include that in the base product?)
 
  -Kevin





Re: [PERFORM] Query optimization

2008-11-30 Thread Marc Cousin
On Sunday 30 November 2008 19:45:11, tmp wrote:
 I am struggeling with the following query which fetches a random subset
 of 200 questions that matches certain tags within certain languages.
 However, the query takes forever to evaluate, even though I have a
 limit 200 appended. Any ideas on how to optimize it?

 QUERY: 

 SELECT distinct q.question_id
FROM question_tags qt, questions q
WHERE q.question_id = qt.question_id
  AND q.STATUS = 1
  AND not q.author_id = 105
  AND ((qt.language_id = 5 and qt.tag_id in
 (1101,917,734,550,367,183)) or (qt.language_id = 4 and qt.tag_id in
 (856,428)) or (qt.language_id =
   3 and qt.tag_id in (1156,1028,899,771,642,514,385,257,128)) or
 (qt.language_id = 2 and qt.tag_id in
 (1193,1101,1009,917,826,734,642,550,458,367,275,183,91)))
   and q.question_id not in (413)
LIMIT 200

 EXPLAIN ANALYZE: =

   Limit  (cost=1.50..1267.27 rows=200 width=4) (actual
 time=278.169..880.934 rows=200 loops=1)
 -  Unique  (cost=1.50..317614.50 rows=50185 width=4) (actual
 time=278.165..880.843 rows=200 loops=1)
   -  Merge Join  (cost=1.50..317489.04 rows=50185 width=4)
 (actual time=278.162..880.579 rows=441 loops=1)
 Merge Cond: (qt.question_id = q.question_id)
 -  Index Scan using question_tags_question_id on
 question_tags qt  (cost=0.00..301256.96 rows=82051 width=4) (actual
 time=24.171..146.811 rows=6067 loops=1)
   Filter: (((language_id = 5) AND (tag_id = ANY
 ('{1101,917,734,550,367,183}'::integer[]))) OR ((language_id = 4) AND
 (tag_id = ANY ('{856,428}'::integer[]))) OR ((language_id = 3) AND
 (tag_id = ANY ('{1156,1028,899,771,642,514,385,257,128}'::integer[])))
 OR ((language_id = 2) AND (tag_id = ANY
 ('{1193,1101,1009,917,826,734,642,550,458,367,275,183,91}'::integer[]
 -  Index Scan using questions_pkey on questions q
 (cost=0.00..15464.12 rows=83488 width=4) (actual time=222.956..731.737
 rows=1000 loops=1)
    Filter: ((q.author_id <> 105) AND (q.question_id <> 413) AND (q.status = 1))
   Total runtime: 881.152 ms
 (9 rows)

First, because of the distinct, the limit 200 won't reduce the work to be done
much: it will still have to sort for the unique. Only when everything is
sorted will it take just the first 200 records. And anyhow it seems there are
only 441 rows before doing the distinct, so, at least for this query, it
won't change the timings a lot.

Then it seems to me that you could try to create composite indexes, because there
is a lot of filtering after the index scans (that is, if you want the query to
be faster).
Maybe (q.author_id,q.status).

For qt you may try (qt.language_id,qt.tag_id)...
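
Something along these lines, for example (index names are made up; this is only
a sketch of the suggestion above):

CREATE INDEX questions_author_status_idx ON questions (author_id, status);
CREATE INDEX question_tags_lang_tag_idx ON question_tags (language_id, tag_id);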

Hope it helps

Cheers



Re: [PERFORM] Weird row estimate

2007-07-12 Thread Marc Cousin
On Wednesday 11 July 2007 22:35:31, Tom Lane wrote:
 Marc Cousin [EMAIL PROTECTED] writes:
  Nevertheless, shouldn't the third estimate be smaller or equal to the sum
  of the two others ?

 The planner's estimation for subplan conditions is pretty primitive
 compared to joinable conditions.  When you add the OR, it's no longer
 possible to treat the IN like a join, and everything gets an order of
 magnitude dumber :-(

 It might be worth trying this as a UNION of the two simple queries.

Yes, it's much better on this query with a UNION.
The problem is that this is only a small part of the full query, and there are
several nested INs with an OR condition... But at least now I understand where
it comes from.
Thanks a lot.
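
For the archives, the UNION rewrite is roughly the following sketch, built from
the two simple queries in the original post below, with the predicates copied as
they appear there (note that UNION also removes duplicate CMD_IDs, which the OR
form did not):

SELECT stc.CMD_ID
  FROM STOL_STC stc
 WHERE stc.STC_DATE='2007-07-05' AND stc.STC_DATEPLAN='2007-07-05'
UNION
SELECT stc.CMD_ID
  FROM STOL_STC stc
 WHERE stc.STC_ID IN
       (SELECT STC_ID FROM STOL_TRJ
         WHERE TRJ_DATEARRT='2007-07-05'
           AND TRJ_DATEDEPT=TRJ_DATEARRT
           AND (TRJ_DATEDEPT='2007-07-05' OR TRJ_DATECREAT='2007-07-05'));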



[PERFORM] Weird row estimate

2007-07-11 Thread Marc Cousin
Hi,

I'm having a weird problem with a query:
I've simplified it to get the significant part (see end of message).
The point is I've got a simple
SELECT field FROM table WHERE 'condition1'
Estimated returned rows : 5453
Then
SELECT field FROM table WHERE 'condition2'
Estimated returned rows : 705
Then
SELECT field FROM table WHERE 'condition1' OR 'condition2'
Estimated returned rows : 143998

Condition2 is a bit complicated (it's a subquery).
Nevertheless, shouldn't the third estimate be smaller than or equal to the sum of
the two others?


PostgreSQL is 8.2.4 on Linux, stats are up to date:
show default_statistics_target;
 default_statistics_target
---
 1000



Any ideas ?



explain analyze 
SELECT stc.CMD_ID
 FROM STOL_STC stc
 WHERE  (stc.STC_DATE='2007-07-05' AND stc.STC_DATEPLAN='2007-07-05');

QUERY PLAN
--
 Seq Scan on stol_stc stc  (cost=0.00..24265.15 rows=5453 width=8) (actual 
time=17.186..100.941 rows=721 loops=1)
   Filter: ((stc_date = '2007-07-05'::date) AND (stc_dateplan = 
'2007-07-05'::date))
 Total runtime: 101.656 ms
(3 rows)


explain analyze 
SELECT stc.CMD_ID
 FROM STOL_STC stc
 WHERE stc.STC_ID IN 
(SELECT STC_ID FROM STOL_TRJ 
WHERE TRJ_DATEARRT='2007-07-05' 
AND TRJ_DATEDEPT=TRJ_DATEARRT  
AND (TRJ_DATEDEPT='2007-07-05' 
OR TRJ_DATECREAT='2007-07-05') );


 QUERY PLAN 
  

 Nested Loop  (cost=4649.62..10079.52 rows=705 width=8) (actual 
time=6.266..13.037 rows=640 loops=1)
   -  HashAggregate  (cost=4649.62..4657.13 rows=751 width=8) (actual 
time=6.242..6.975 rows=648 loops=1)
 -  Index Scan using stol_trj_fk5 on stol_trj  (cost=0.00..4647.61 
rows=803 width=8) (actual time=0.055..4.901 rows=688 loops=1)
   Index Cond: (trj_datearrt = '2007-07-05'::date)
   Filter: ((trj_datedept = trj_datearrt) AND ((trj_datedept = 
'2007-07-05'::date) OR (trj_datecreat = '2007-07-05'::date)))
   -  Index Scan using stol_stc_pk on stol_stc stc  (cost=0.00..7.21 rows=1 
width=16) (actual time=0.004..0.005 rows=1 loops=648)
 Index Cond: (stc.stc_id = stol_trj.stc_id)
 Total runtime: 13.765 ms
(8 rows)

explain analyze
SELECT stc.CMD_ID
 FROM STOL_STC stc
 WHERE  (stc.STC_DATE='2007-07-05' AND stc.STC_DATEPLAN='2007-07-05')
 OR
(stc.STC_ID IN 
(SELECT STC_ID FROM STOL_TRJ 
WHERE TRJ_DATEARRT='2007-07-05' 
AND TRJ_DATEDEPT=TRJ_DATEARRT  
AND (TRJ_DATEDEPT='2007-07-05' 
OR TRJ_DATECREAT='2007-07-05') ));


   QUERY PLAN   
  

 Seq Scan on stol_stc stc  (cost=4649.62..29621.12 rows=143998 width=8) (actual 
time=21.564..146.365 rows=1048 loops=1)
   Filter: (((stc_date = '2007-07-05'::date) AND (stc_dateplan = 
'2007-07-05'::date)) OR (hashed subplan))
   SubPlan
 -  Index Scan using stol_trj_fk5 on stol_trj  (cost=0.00..4647.61 
rows=803 width=8) (actual time=0.054..4.941 rows=688 loops=1)
   Index Cond: (trj_datearrt = '2007-07-05'::date)
   Filter: ((trj_datedept = trj_datearrt) AND ((trj_datedept = 
'2007-07-05'::date) OR (trj_datecreat = '2007-07-05'::date)))
 Total runtime: 147.407 ms


SELECT count(*) from stol_stc ;
 count

 140960
(1 row)


Re: [PERFORM] Optimicing Postgres for SunSolaris10 on V240

2006-11-18 Thread Marc Cousin
Hi...

Bacula does not use transactions right now, so every insert is done separately
with autocommit.
Moreover, the insert loop for the main table is done by several individual
queries to insert data into several tables (filename, dir, then file), so this
is slow.
There's work underway to speed that up, using a big COPY to a temp table,
then queries to dispatch the records to the right places as fast as
possible. The patch has been made, but as it is a noticeable change in the
core, it will take some time to be integrated... See the thread about that in
the bacula devel list a few weeks ago... Anyhow, our benchmarks for now show
a 10-20 times speedup with PostgreSQL, fsync stays on, and it becomes faster
than MySQL, and scales with the number of CPUs... I cannot tell when/if it
will be included, but there's work on this.
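
A very rough sketch of that pattern, just to make the idea concrete (the table
and column names here are illustrative, not Bacula's actual schema):

BEGIN;
CREATE TEMP TABLE batch (path text, name text, lstat text, md5 text)
  ON COMMIT DROP;
COPY batch (path, name, lstat, md5) FROM STDIN;  -- one big bulk load of all metadata
\.
-- dispatch: insert only the filenames we don't already know, same idea for path/file
INSERT INTO filename (name)
  SELECT DISTINCT b.name
    FROM batch b
   WHERE NOT EXISTS (SELECT 1 FROM filename f WHERE f.name = b.name);
COMMIT;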

For now, the only things you can do are fsync=off, knowing you're taking a
chance with the data (but it's not that big a problem, as it's only bacula's
catalog database, and it can be rebuilt from the tapes or from a dump...), or a
write-back disk controller.



On Friday 17 November 2006 19:44, Josh Berkus wrote:
 Berner,

 First, I've corrected you e-mail so that it goes to the list, and not to
 me directly.

  I use my PostgreSQL 8.0.4 as Catalogue-Database for Bacula.
  Bacula is a Backupsoftware.

 Yes.  The lead contributor to Bacula is a active PostgreSQL project
 participant; I'll see if he'll look into your issue.

  When I backing up System (lot of very small Files) then PostgreSQL seams
  to by the bottleneck by inserting Catalogueinformation of every single
  File. The System on which Postgres runs is a Sun Solaris 10 Server on a
  Sun Fire V240 with 1GB RAM, 1CPU (SUNW,UltraSPARC-IIIi at 1.3GHz), 2
  Ultra SCSI-3 Disks 73GB at 10k RPM which are in Raid1 (Solaris Softraid).
 
  Can someone gif me a hint for compiling PostgreSQL or configuring the
  Database.
 
  fsync is already disabled..

 This is a bad idea if you care about your database.

 So, PostgreSQL 8.1 is now official supported by Sun and ships with
 Solaris 10 update 2 or later.  It is recommended that you use that
 rather and an out-of-date version.   Second, see
 www.powerpostgresql.com/PerfList

 --Josh Berkus




Re: [PERFORM] Inserts optimization?

2006-04-14 Thread Marc Cousin
I hope I'm not going to say stupid things, but here's what I know (or think
I know :) ) about Bacula + PostgreSQL.

If I remember correctly (I already discussed this with Kern Sibbald a while
ago), Bacula does each insert in its own transaction: that's how the program
is written, and of course it works OK with MySQL and MyISAM tables, as MySQL
doesn't have transactions with MyISAM...
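
(Just to illustrate what that costs -- not what Bacula actually emits, and the
table and columns here are made up: with autocommit every statement is its own
transaction and pays its own synchronous WAL flush, whereas a batched version
pays it once at COMMIT.)

-- autocommit: one transaction, and one WAL flush, per INSERT
INSERT INTO file_metadata (job_id, file_name) VALUES (42, '/etc/passwd');
INSERT INTO file_metadata (job_id, file_name) VALUES (42, '/etc/hosts');

-- batched: many INSERTs, one COMMIT, one synchronous WAL flush
BEGIN;
INSERT INTO file_metadata (job_id, file_name) VALUES (42, '/etc/passwd');
INSERT INTO file_metadata (job_id, file_name) VALUES (42, '/etc/hosts');
COMMIT;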

So, you'll probably end up being slowed down by WAL fsyncs ... and you won't 
have a lot of solutions. Maybe you should start with trying to set fsync=no 
as a test to confirm that (you should have a lot of iowaits right now if you 
haven't disabled fsync).

For now, I only could get good performance with bacula and postgresql when 
disabling fsync...


On Thursday 13 April 2006 20:45, Francisco Reyes wrote:
 Chris writes:
  If you can, use copy instead:
  http://www.postgresql.org/docs/8.1/interactive/sql-copy.html

 I am familiar with copy.
 Can't use it in this scenario.

 The data is coming from a program called Bacula (Backup server).
 It is not static data.



[PERFORM] partitioning

2005-12-13 Thread Marc Cousin
Hi,

I've been working on trying to partition a big table (I've never partitioned a 
table in any other database till now).
Everything went OK, except for one query that didn't perform well afterwards.

I've put the partition description, indexes, etc ..., and the explain plan 
attached.

The query is extremely fast without partitioning (index scan backwards on the
primary key).

The query is : select * from logs order by id desc limit 100;
id is the primary key.

It is indexed on all partitions.

But the explain plan does full table scan on all partitions.

While I think I understand why it is doing this plan right now, is there
something that could be done to optimize this case? Or could a warning be put in
the docs about this kind of behaviour? I guess normally someone would partition
to get faster queries :)

Anyway, I thought I should mention this, as it has been quite a surprise.
CREATE TABLE logs_150 (CHECK ( id_machine = 150)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_163 (CHECK ( id_machine = 163)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_289 (CHECK ( id_machine = 289)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_319 (CHECK ( id_machine = 319)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_238 (CHECK ( id_machine = 238)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_148 (CHECK ( id_machine = 148)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_176 (CHECK ( id_machine = 176)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_164 (CHECK ( id_machine = 164)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_316 (CHECK ( id_machine = 316)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_313 (CHECK ( id_machine = 313)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_217 (CHECK ( id_machine = 217)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_167 (CHECK ( id_machine = 167)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_287 (CHECK ( id_machine = 287)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_279 (CHECK ( id_machine = 279)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_248 (CHECK ( id_machine = 248)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_280 (CHECK ( id_machine = 280)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_165 (CHECK ( id_machine = 165)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_330 (CHECK ( id_machine = 330)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_149 (CHECK ( id_machine = 149)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_157 (CHECK ( id_machine = 157)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_272 (CHECK ( id_machine = 272)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_311 (CHECK ( id_machine = 311)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_309 (CHECK ( id_machine = 309)) INHERITS (logs) TABLESPACE 
data_logs;
CREATE TABLE logs_318 (CHECK ( id_machine = 318)) INHERITS (logs) TABLESPACE 
data_logs;


CREATE UNIQUE INDEX logs_150_pkey ON logs_150 (id) TABLESPACE index_logs;
CREATE INDEX date_150 ON logs_150 (date) TABLESPACE index_logs;
CREATE INDEX event_150 ON logs_150 (evenement) TABLESPACE index_logs;


.


logs= explain select * from logs order by id desc limit 100;
 QUERY PLAN
-
 Limit  (cost=16524647.29..16524647.54 rows=100 width=295)
   -  Sort  (cost=16524647.29..16568367.11 rows=17487927 width=295)
 Sort Key: public.logs.id
 -  Result  (cost=0.00..827622.27 rows=17487927 width=295)
   -  Append  (cost=0.00..827622.27 rows=17487927 width=295)
 -  Seq Scan on logs  (cost=0.00..826232.78 rows=17451978 
width=165)
 -  Seq Scan on logs_150 logs  (cost=0.00..199.04 
rows=6104 width=144)
 -  Seq Scan on logs_163 logs  (cost=0.00..261.79 
rows=7079 width=169)
 -  Seq Scan on logs_289 logs  (cost=0.00..428.93 
rows=10693 width=200)
 -  Seq Scan on logs_319 logs  (cost=0.00..31.92 rows=992 
width=129)
 -  Seq Scan on logs_238 logs  (cost=0.00..28.01 rows=701 
width=199)
 -  Seq Scan on logs_148 logs  (cost=0.00..80.15 rows=2015 
width=195)
 -  Seq Scan on logs_176 logs  (cost=0.00..12.40 rows=240 
width=295)
 -  Seq Scan on logs_164 logs  (cost=0.00..12.40 rows=240 
width=295)
 -  Seq Scan on logs_316 logs  (cost=0.00..12.40 rows=240 
width=295)
 -  Seq Scan on logs_313 logs  (cost=0.00..12.40 rows=240 
width=295)
 -  Seq Scan on logs_217 logs  (cost=0.00..12.40 rows=240 
width=295)
 -  Seq Scan on logs_167 logs  (cost=0.00..57.36 rows=1536 
width=170)
 -  Seq Scan on logs_287 logs  (cost=0.00..12.40 

Re: [PERFORM] partitioning

2005-12-13 Thread Marc Cousin
Yes, that's how I solved it... and I totally agree that it's hard for the
planner to guess what to do on the partitions. But maybe there should be
something in the docs explaining the limitations...

I'm only asking for the 100 biggest ids from the table, so I thought maybe the
planner would take the 100 biggest from each partition, or something like that,
and return me the 100 biggest from those results. It didn't, and that's quite
logical.

What I meant is that I understand why the planner chooses this plan, but maybe
it should be written somewhere in the docs that some plans will be worse
after partitioning.
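
For the archives, the workaround amounts to making the query mention the
partition key, with constraint exclusion enabled -- roughly the following
sketch (150 is just one of the partition keys from the CHECK constraints above):

SET constraint_exclusion = on;   -- or set it in postgresql.conf
SELECT *
  FROM logs
 WHERE id_machine = 150          -- matches the CHECK constraint of one partition
 ORDER BY id DESC
 LIMIT 100;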

On Tuesday 13 December 2005 12:50, you wrote:
 I just saw that there is no where clause in the query, that you had
 fed to explain plan.
 you need to include a where clause based on id_machine column to see the
 effect.

 On 12/13/05, Pandurangan R S [EMAIL PROTECTED] wrote:
  Did you set constraint_exclusion = true in postgresql.conf file?
 
  On 12/13/05, Marc Cousin [EMAIL PROTECTED] wrote:
   Hi,
  
   I've been working on trying to partition a big table (I've never
   partitioned a table in any other database till now).
   Everything went ok, except one query that didn't work afterwards.
  
   I've put the partition description, indexes, etc ..., and the explain
   plan attached.
  
   The query is extremely fast without partition (index scan backards on
   the primary key)
  
   The query is : select * from logs order by id desc limit 100;
   id is the primary key.
  
   It is indexed on all partitions.
  
   But the explain plan does full table scan on all partitions.
  
   While I think I understand why it is doing this plan right now, is
   there something that could be done to optimize this case ? Or put a
   warning in the docs about this kind of behaviour. I guess normally
   someone would partition to get faster queries :)
  
   Anyway, I thought I should mention this, as it has been quite a
   surprise.
  
  
  
 
  --
  Regards
  Pandu



Re: [PERFORM] insert performance for win32

2005-11-02 Thread Marc Cousin
On Wednesday 02 November 2005 14:54, Magnus Hagander wrote:
    I've done the tests with rc1. This is still as slow on windows ... about
    6-10 times slower than linux (via IP socket), depending on using prepared
    queries, etc...
  
    By the way, we've tried to insert into the windows database from a linux
    psql client, via the network. In this configuration, inserting is only
    about 2 times slower than inserting locally (the linux client had a
    slower CPU, 1700MHz against 3000).
    Could it be related to a problem in the windows psql client ?
 
  [OK, I'm bringing this back on-list, and bringing it to
  QingQing's attention, who I secretly hope is the right person
  to be looking at this problem :)]
 
  Just to recap Marc and I have been looking at the performance
  disparity between windows and linux for a single transaction
  statement by statement insert on a very narrow table with no
  keys from a remote client.  Marc's observations showed (and I
  verified) that windows is much slower in this case than it
  should be.  I gprof'ed both the psql client and the server
  during the insert and didn't see anything seriously out of
  order...unfortunately QQ's latest win32 performance tweaks
  haven't helped.
 
  Marc's observation that by switching to a linux client drops
  time down drastically is really intersing!

 Could this be a case of the network being slow, as we've seen a couple
 of times before? And if you run psql on the local box, you get it
 double.

 Do you get a speed difference between the local windows box and a remote
 wnidows box?

 //Magnus
The Windows-Windows test is local (via the loopback interface).
The Linux (client) - Windows (server) test is via the network (100Mbit).

I can't test with 2 Windows boxes ... I don't have that many (all machines are
Linux, except the test one...).



Re: [PERFORM] insert performance for win32

2005-09-07 Thread Marc Cousin
On Tuesday 06 September 2005 19:11, Merlin Moncure wrote:
  This makes me wonder if we are looking in the wrong place.  Maybe the
  problem is coming from psql?  More results to follow.

 problem is not coming from psql.

 One thing I did notice that in a 250k insert transaction the insert time
 grows with #recs inserted.  Time to insert first 50k recs is about 27
 sec and last 50 k recs is 77 sec.  I also confimed that size of table is
 not playing a role here.

 Marc, can you do select timeofday() every 50k recs from linux?  Also a
 gprof trace from linux would be helpful.


Here's the timeofday output... I'll do the gprof as soon as I can.
Every 50,000 rows...

Wed Sep 07 13:58:13.860378 2005 CEST
Wed Sep 07 13:58:20.926983 2005 CEST
Wed Sep 07 13:58:27.928385 2005 CEST
Wed Sep 07 13:58:35.472813 2005 CEST
Wed Sep 07 13:58:42.825709 2005 CEST
Wed Sep 07 13:58:50.789486 2005 CEST
Wed Sep 07 13:58:57.553869 2005 CEST
Wed Sep 07 13:59:04.298136 2005 CEST
Wed Sep 07 13:59:11.066059 2005 CEST
Wed Sep 07 13:59:19.368694 2005 CEST





 Merlin



[PERFORM] insert performance for win32

2005-09-06 Thread Marc Cousin
Hi, 

I usually use PostgreSQL coupled with Linux, but I have to use Windows for a
particular project.

So I wanted to do some tests to know if the performance will be acceptable (I 
don't need PostgreSQL to be as fast with windows as with linux, but it has to 
be usable...).

I started with trying to do lots of inserts, and I'm quite astonished by the
catastrophic results...

The test :
The computer was the same (my workstation, a PIV Dell with SATA disk), dual 
boot

The windows OS is XP.

Both OSes run PostgreSQL 8.0.3

Both PostgreSQL clusters (windows and linux) have the same tuning 
(shared_buffers=2, wal_buffers=128, checkpoint_segments=10)

Before each test, the clusters are vacuum analyzed, and the test database is 
recreated.

The script is quite dumb :
BEGIN;
CREATE TABLE test (col1 serial, col2 text);
INSERT INTO test (col2) values ('test');
INSERT INTO test (col2) values ('test');
INSERT INTO test (col2) values ('test');
INSERT INTO test (col2) values ('test');
INSERT INTO test (col2) values ('test');
.. 500,000 times
Then COMMIT.

I know it isn't realistic, but I needed to start with something :)

The results are as follows :
Linux : 1'9''
Windows : 9'38''

What I tried in order to solve this, and which didn't work:

- Deactivate antivirus on windows
- fsync=no
- raise the checkpoint_segments value (32)
- remove hyperthreading (who knows...)

I don't know what could cause this (I'm not a windows admin...at all). All I 
see is a very high kernel load during the execution of this script, but I 
can't determine where it comes from.


I'd like to know if this is a known problem, if there is something I can do,
etc...

Thanks a lot.



Re: [PERFORM] insert performance for win32

2005-09-06 Thread Marc Cousin

 In my experience win32 is par with linux generally with a few gotchas on
 either side.  Are your times with fsync=no? It's much harder to give
 apples-apples comparison with fsync=on for various reasons.
It is with fsync=off on windows, fsync=on on linux


 Are you running stats_command_string=on?  Try disabling and compare
 results.
Deactivated on windows, activated on linux

 Is your loading app running locally or on the server?
Yes

 I am very interesting in discovering sources of high cpu load problems
 on win32.  If you are still having problems could you get a gprof
 profile together?  There is a recent thread on win32-hackers discussing
 how to do this.
I'll give it a look

 Merlin



[PERFORM] Problem with large query

2004-09-08 Thread Marc Cousin
Hi. I hope I'm not asking too trivial a question here...

I'm having trouble with a (quite big) query, and can't find a way to make it 
faster.

Here is the information :

Tables :

sces_vte - 2753539 rows
sces_art - 602327
sces_fsf - 8126
sces_frc - 7763
sces_tps - 38
sces_gtr - 35


Query :
===
SELECT
sces_gtr_art.gtr_cod,
sces_gtr_art.gtr_lib,
sces_frc_art.fou_cod,
sces_frc_art.fou_lib,
sces_tps.tps_annee_mois,
TO_NUMBER('200401','99'),
TO_NUMBER('200405','99'),
sces_tps.tps_libc,
sum(sces_vte.vte_mnt),
sum(sces_vte.vte_qte),
sum(sces_vte.vte_ton),
sces_famille.fsf_codfam,
sces_famille.fsf_lib,
sces_s_famille.fsf_codsfm,
sces_s_famille.fsf_lib
FROM
sces_vte,
sces_art,
sces_fsf  sces_famille,
sces_fsf  sces_s_famille,
sces_frc  sces_frc_art,
sces_tps,
sces_gtr  sces_gtr_art
WHERE
( sces_famille.fsf_codfam=sces_s_famille.fsf_codfam  )
AND  ( sces_famille.fsf_codseg= 0 and sces_famille.fsf_codsfm = 0  )
AND  ( sces_vte.tps_annee_mois=sces_tps.tps_annee_mois  )
AND  ( sces_vte.art_cod=sces_art.art_cod and 
sces_vte.dos_cod=sces_art.dos_cod  )
AND  ( sces_gtr_art.gtr_cod=sces_frc_art.gtr_cod  )
AND  ( sces_frc_art.gtr_cod=sces_art.gtr_cod and 
sces_frc_art.fou_cod=sces_art.fou_cod )
AND  ( sces_s_famille.fsf_codfam=sces_art.fsf_codfam and 
sces_s_famille.fsf_codsfm=sces_art.fsf_codsfm  )
AND  ( sces_s_famille.fsf_codseg = 0  )
AND  (
( ( ( sces_tps.tps_annee_mois ) = ( TO_NUMBER('200401','99') ) and 
( sces_tps.tps_annee_mois ) = (
TO_NUMBER('200405','99') )
)
OR
(
( sces_tps.tps_annee_mois ) = ( TO_NUMBER('200401','99') )-100 and 
( sces_tps.tps_annee_mois ) = (
TO_NUMBER('200405','99') )-100
)  )
AND  ( sces_gtr_art.gtr_cod  in  (2))
)
GROUP BY
sces_gtr_art.gtr_cod,
sces_gtr_art.gtr_lib,
sces_frc_art.fou_cod,
sces_frc_art.fou_lib,
sces_tps.tps_annee_mois,
TO_NUMBER('200401','99'),
TO_NUMBER('200405','99'),
sces_tps.tps_libc,
sces_famille.fsf_codfam,
sces_famille.fsf_lib,
sces_s_famille.fsf_codsfm,
sces_s_famille.fsf_lib

Explain Analyze Plan : 

 GroupAggregate  (cost=27161.91..27938.72 rows=16354 width=280) (actual 
time=484509.210..544436.148 rows=4115 loops=1)
   -  Sort  (cost=27161.91..27202.79 rows=16354 width=280) (actual 
time=484496.188..485334.151 rows=799758 loops=1)
 Sort Key: sces_gtr_art.gtr_cod, sces_gtr_art.gtr_lib, sces_frc_art.fou_cod, 
sces_frc_art.fou_lib, sces_tps.tps_annee_mois, 200401::numeric, 200405::numeric, 
sces_tps.tps_libc, sces_famille.fsf_codfam, sces_famille.fsf_lib, 
sces_s_famille.fsf_codsfm, sces_s_famille.fsf_lib
 -  Merge Join  (cost=25727.79..26017.34 rows=16354 width=280) (actual 
time=58945.821..69321.146 rows=799758 loops=1)
   Merge Cond: ((outer.fsf_codfam = inner.fsf_codfam) AND 
(outer.fsf_codsfm = inner.fsf_codsfm))
   -  Sort  (cost=301.36..304.60 rows=1298 width=83) (actual 
time=27.926..28.256 rows=332 loops=1)
 Sort Key: sces_s_famille.fsf_codfam, sces_s_famille.fsf_codsfm
 -  Seq Scan on sces_fsf sces_s_famille  (cost=0.00..234.24 
rows=1298 width=83) (actual time=0.042..19.124 rows=1341 loops=1)
   Filter: (fsf_codseg = 0::numeric)
   -  Sort  (cost=25426.43..25448.05 rows=8646 width=225) (actual 
time=58917.106..59693.810 rows=799758 loops=1)
 Sort Key: sces_art.fsf_codfam, sces_art.fsf_codsfm
 -  Merge Join  (cost=24726.32..24861.08 rows=8646 width=225) 
(actual time=19036.709..29404.943 rows=799758 loops=1)
   Merge Cond: (outer.tps_annee_mois = 
inner.tps_annee_mois)
   -  Sort  (cost=2.49..2.53 rows=17 width=23) (actual 
time=0.401..0.428 rows=20 loops=1)
 Sort Key: sces_tps.tps_annee_mois
 -  Seq Scan on sces_tps  (cost=0.00..2.14 rows=17 
width=23) (actual time=0.068..0.333 rows=20 loops=1)
   Filter: (((tps_annee_mois = 200301::numeric) 
OR (tps_annee_mois = 200401::numeric)) AND ((tps_annee_mois = 200305::numeric) OR 
(tps_annee_mois = 200401::numeric)) AND ((tps_annee_mois = 200301::numeric) OR 
(tps_annee_mois = 200405::numeric)) AND ((tps_annee_mois = 200305::numeric) OR 
(tps_annee_mois = 200405::numeric)))
   -  Sort  (cost=24723.83..24747.97 rows=9656 width=214) 
(actual time=19036.223..19917.214 rows=799757 loops=1)
 Sort Key: sces_vte.tps_annee_mois
 -  Nested Loop  (cost=21825.09..24084.74 rows=9656 
width=214) (actual time=417.603..8644.294 rows=399879 loops=1)
   -  Nested Loop  (cost=21825.09..21837.50 
rows=373 width=195) (actual time=417.444..672.741 rows=14158 loops=1)
 -  Seq Scan on sces_gtr sces_gtr_art  
(cost=0.00..1.44 rows=1 width=40) (actual 

Re: [PERFORM] Problem with large query

2004-09-08 Thread Marc Cousin
The query has been generated by Business Objects ... I'll try to suggest to the
developers that they remove this constant (if they can)...
The fields used by the sort are of type numeric(6,0) or (10,0)...
Could it be better if the fields were integer or anything else?


On Wednesday 08 September 2004 16:40, you wrote:
 Marc Cousin [EMAIL PROTECTED] writes:
  I'm having trouble with a (quite big) query, and can't find a way to make it 
  faster.
 
 Seems like it might help if the thing could use a HashAggregate instead
 of sort/group.  Numeric is not hashable, so having those TO_NUMBER
 constants in GROUP BY destroys this option instantly ... but why in the
 world are you grouping by constants anyway?  You didn't say what the
 datatypes of the other columns were...
 
regards, tom lane
 



Re: [PERFORM] Problem with large query

2004-09-08 Thread Marc Cousin
On Wednesday 08 September 2004 16:56, you wrote:
 Marc Cousin [EMAIL PROTECTED] writes:
  The query has been generated by business objects ... i'ill try to suggest to the 
  developpers to remove this constant (if they can)...
  The fields used by the sort are of type numeric(6,0) or (10,0) ...
  Could it be better if the fields were integer or anything else ?
 
 integer or bigint would be a WHOLE lot faster.  I'd venture that
 comparing two numerics is order of a hundred times slower than
 comparing two integers.
 
 Even if you don't want to change the fields on-disk, you might think
 about casting them all to int/bigint in the query.
 
 Another thing that might or might not be easy is to change the order of
 the GROUP BY items so that the fields with the largest number of
 distinct values are listed first.  If two rows are distinct at the first
 column, the sorting comparison doesn't even have to look at the
 remaining columns ...
 
regards, tom lane
 
Thanks. I've just had confirmation that they can remove the two constants
(already won 100 seconds thanks to that).
I've tried the cast, and got down to 72 seconds.
So now we're going to try to convert the fields to int or bigint.
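
(For anyone finding this in the archives, the two changes look roughly like the
sketch below: the GROUP BY fragment reuses the column names from the query
earlier in the thread, and ALTER COLUMN ... TYPE needs PostgreSQL 8.0 or later.)

-- casting in the query (fragment of the GROUP BY shown above):
GROUP BY (sces_famille.fsf_codfam)::int,
         sces_famille.fsf_lib,
         (sces_s_famille.fsf_codsfm)::int,
         sces_s_famille.fsf_lib,
         ...

-- converting the columns on disk (both sides of each join need it):
ALTER TABLE sces_fsf ALTER COLUMN fsf_codfam TYPE integer;
ALTER TABLE sces_fsf ALTER COLUMN fsf_codsfm TYPE integer;
ALTER TABLE sces_art ALTER COLUMN fsf_codfam TYPE integer;
ALTER TABLE sces_art ALTER COLUMN fsf_codsfm TYPE integer;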

Thanks a lot for your help and time.
