Re: [PERFORM] function execute on v.9.2 slow down

2013-09-16 Thread Robert Haas
On Mon, Aug 12, 2013 at 8:21 AM, Александр Белинский  wrote:
> Hi!
> I can't explain why the function slows down on the same data.
> postgresql.conf is the same, and the hardware is more powerful.
> The only difference is the PostgreSQL version.

Hmm.  PostgreSQL 9.2 will sometimes replan queries a number of times,
looking to see whether the choice of bind variables affects the optimal
plan choice, whereas older versions would create a generic plan on first
execution and use it forever.  I'm not sure whether that behavior applies
in this situation, though.  If you run it, say, 15 times, does it
eventually start running faster?
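
As an illustration of what that replanning looks like (a sketch with a
hypothetical table t, not taken from the original report), you can watch the
plan choice by re-running a prepared statement:

  PREPARE q(int) AS SELECT count(*) FROM t WHERE category = $1;
  EXPLAIN EXECUTE q(1);  -- early executions are planned with the actual value
  -- after several executions, 9.2 may switch to a cached generic plan
  EXPLAIN EXECUTE q(1);

If the function's queries behave the same way, the timing should change once
the generic plan kicks in, which is what running it ~15 times would reveal.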

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PERFORM] Effect of the WindowAgg on the Nested Loop

2013-05-15 Thread Robert Haas
On Tue, Jan 22, 2013 at 3:57 PM, Виктор Егоров  wrote:
> Greetings.
>
> I've been playing with a small query that I've been asked to optimize
> and noticed a strange (for me) effect.
> Query uses this table:
>
>                                   Table "clc06_tiles"
>    Column   |         Type          |                         Modifiers
> ------------+-----------------------+------------------------------------------------------------
>  geometry   | geometry              |
>  code_06    | character varying(3)  |
>  gid        | bigint                | not null default nextval('clc06_tiles_gid_seq'::regclass)
> Indexes:
>     "clc06_tiles_pkey" PRIMARY KEY, btree (gid)
>     "i_clc06_tiles_geometry" gist (geometry)
> Check constraints:
>     "enforce_dims_geom" CHECK (st_ndims(geometry) = 2)
>     "enforce_geotype_geom" CHECK (geometrytype(geometry) = 'MULTIPOLYGON'::text OR geometrytype(geometry) = 'POLYGON'::text OR geometry IS NULL)
>     "enforce_srid_geom" CHECK (st_srid(geometry) = 3035)
>
> and this function:
> CREATE OR REPLACE FUNCTION my_trans(x1 float8, y1 float8, x2 float8,
> y2 float8) RETURNS geometry AS $my_trans$
> SELECT st_Transform(
> st_GeomFromText('LINESTRING('||x1::text||' '||y1::text||
>  ', '||x2::text||' '||y2::text||')',4326),3035);
> $my_trans$ LANGUAGE sql IMMUTABLE STRICT;
>
> and these constants:
> \set x1 4.56
> \set y1 52.54
> \set x2 5.08
> \set y2 53.34
>
>
> Original query looks like this ( http://explain.depesz.com/s/pzv ):
>
> SELECT n, i.*, st_NumGeometries(i.geom)
>   FROM (
> SELECT a.code_06 as code_06,
>st_Multi(st_Intersection(a.geometry,
> my_trans(:x1,:y1,:x2,:y2))) as geom
>   FROM clc06_tiles a
>  WHERE st_Intersects(a.geometry, my_trans(:x1,:y1,:x2,:y2))) i
>   JOIN generate_series(1,10) n ON n <= st_NumGeometries(i.geom);
>
>
> After a while I added row_number() to the inner part (
> http://explain.depesz.com/s/hfs ):
>
> SELECT n, i.*, st_NumGeometries(i.geom)
>   FROM (
> SELECT row_number() OVER () AS rn, a.code_06 as code_06,
>st_Multi(st_Intersection(a.geometry,
> my_trans(:x1,:y1,:x2,:y2))) as geom
>   FROM clc06_tiles a
>  WHERE st_Intersects(a.geometry, my_trans(:x1,:y1,:x2,:y2))) i
>   JOIN generate_series(1,10) n ON n <= st_NumGeometries(i.geom);
>
>
> It was really surprising to see a "side" effect of an 8x performance boost.
> The only difference I can see is an extra WindowAgg step in the second 
> variant.
>
> Could you kindly explain how WindowAgg node affects the overall
> performance, please?

Apologies for resurrecting an old thread, but I just came across this
post while doing some research and I don't see any responses.

This seems like a mighty interesting example.  I'm not sure what's
going on here, but let me guess.  I think that the WindowAgg is
forcing some operation - detoasting, maybe? - to happen under the
materialize node.  As a result, it only gets done once.  But in the
other plan, the detoast happens at the nested loop level, above the
materialize node, and therefore it happens 10x instead of 1x.
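
If that guess is right, the same trick can be applied deliberately.  A sketch
of the pattern (expensive_geometry_expr is a stand-in for the real
expression): a do-nothing window function forces the subquery's output,
including detoasted or computed values, to be materialized once rather than
recomputed for every outer row:

  SELECT n, i.*
    FROM (SELECT row_number() OVER () AS rn,   -- forces a WindowAgg node
                 expensive_geometry_expr(a.geometry) AS geom
            FROM clc06_tiles a) i
    JOIN generate_series(1,10) n ON n <= st_NumGeometries(i.geom);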

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PERFORM] Lock and pg_stat

2013-05-14 Thread Robert Haas
On Mon, May 13, 2013 at 9:05 AM, Desbiens, Eric  wrote:
> I tried also:
>
>   select * from pg_class where oid=30352481;
>
> but didn't got anything

You probably want where relfilenode=30352481.
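
That is, something like:

  SELECT oid::regclass, relname FROM pg_class WHERE relfilenode = 30352481;

A table's relfilenode usually starts out equal to its OID, but the two diverge
after relation-rewriting operations such as TRUNCATE, CLUSTER, or VACUUM FULL,
which is why matching on oid alone can come up empty.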

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PERFORM] RT3.4 query needed a lot more tuning with 9.2 than it did with 8.1

2013-05-14 Thread Robert Haas
On Mon, May 13, 2013 at 4:33 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Mon, May 13, 2013 at 4:14 PM, Tom Lane  wrote:
>>> You know, of course, that the join size estimate isn't arrived at that
>>> way.  Still, this point does make it seem more like a planner bug and
>>> less like bad input stats.  It would be nice to see a self-contained
>>> example ...
>
>> Yeah, I remember there have been examples like this that have come up
>> before.  Unfortunately, I haven't fully grokked what's actually going
>> on here that allows this kind of thing to happen.  Refresh my memory
>> on where the relevant code is?
>
> The point is that we estimate the size of a joinrel independently of
> any particular input paths for it, and indeed before we've built any
> such paths.  So this seems like a bug somewhere in selectivity
> estimation, but I'm not prepared to speculate as to just where.

Hmm.  I went looking for the relevant code and found
calc_joinrel_size_estimate().  If that's actually the right place to
be looking, it's hard to escape the conclusion that pselec > 1.
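
To spell out the arithmetic (a simplified paraphrase, not the exact code):
calc_joinrel_size_estimate() starts from the Cartesian product of the input
sizes and scales it by clause selectivities, roughly

  nrows = outer_rows * inner_rows * jselec   -- join clause selectivity
  nrows = nrows * pselec                     -- pushed-down quals, outer joins

Selectivities are supposed to fall in [0, 1], so with 33 * 1 = 33 input rows
an estimate of 62335 implies an effective selectivity of about 1889, which is
only possible if one of those factors has somehow exceeded 1.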

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PERFORM] RT3.4 query needed a lot more tuning with 9.2 than it did with 8.1

2013-05-13 Thread Robert Haas
On Mon, May 13, 2013 at 4:14 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> The planner is estimating that the outer side of this nested loop will
>> produce 33 rows and that the inner side will produce 1.  One would
>> assume that the row estimate for the join product couldn't be more
>> than 33 * 1 = 33 rows, but the planner is estimating 62335 rows, which
>> seems like nonsense.
>
> You know, of course, that the join size estimate isn't arrived at that
> way.  Still, this point does make it seem more like a planner bug and
> less like bad input stats.  It would be nice to see a self-contained
> example ...

Yeah, I remember there have been examples like this that have come up
before.  Unfortunately, I haven't fully grokked what's actually going
on here that allows this kind of thing to happen.  Refresh my memory
on where the relevant code is?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PERFORM] RT3.4 query needed a lot more tuning with 9.2 than it did with 8.1

2013-05-13 Thread Robert Haas
On Tue, Apr 30, 2013 at 7:20 AM, Christoph Berg
 wrote:
>    ->  Nested Loop  (cost=24.57..844.83 rows=62335 width=4) (actual time=0.109..0.633 rows=23 loops=1)
>          ->  Bitmap Heap Scan on acl acl_2  (cost=8.90..61.36 rows=33 width=10) (actual time=0.070..0.112 rows=22 loops=1)
>                Recheck Cond: ((((rightname)::text = 'OwnTicket'::text) AND ((objecttype)::text = 'RT::System'::text)) OR (((rightname)::text = 'OwnTicket'::text) AND ((objecttype)::text = 'RT::Queue'::text) AND (objectid = 10)))
>                ->  BitmapOr  (cost=8.90..8.90 rows=35 width=0) (actual time=0.064..0.064 rows=0 loops=1)
>                      ->  Bitmap Index Scan on acl1  (cost=0.00..4.47 rows=22 width=0) (actual time=0.036..0.036 rows=8 loops=1)
>                            Index Cond: (((rightname)::text = 'OwnTicket'::text) AND ((objecttype)::text = 'RT::System'::text))
>                      ->  Bitmap Index Scan on acl1  (cost=0.00..4.41 rows=13 width=0) (actual time=0.026..0.026 rows=14 loops=1)
>                            Index Cond: (((rightname)::text = 'OwnTicket'::text) AND ((objecttype)::text = 'RT::Queue'::text) AND (objectid = 10))
>          ->  Bitmap Heap Scan on groups groups_3  (cost=15.67..23.73 rows=1 width=30) (actual time=0.022..0.023 rows=1 loops=22)
>                Recheck Cond: ((acl_2.principalid = id) OR ((((type)::text = (acl_2.principaltype)::text) AND (instance = 10) AND ((domain)::text = 'RT::Queue-Role'::text)) OR (((type)::text = (acl_2.principaltype)::text) AND (instance = 999028) AND ((domain)::text = 'RT::Ticket-Role'::text))))
>                Filter: ((((domain)::text = 'SystemInternal'::text) OR ((domain)::text = 'UserDefined'::text) OR ((domain)::text = 'ACLEquivalence'::text) OR (((domain)::text = 'RT::Queue-Role'::text) AND (instance = 10)) OR (((domain)::text = 'RT::Ticket-Role'::text) AND (instance = 999028))) AND (((acl_2.principalid = id) AND ((acl_2.principaltype)::text = 'Group'::text) AND (((domain)::text = 'SystemInternal'::text) OR ((domain)::text = 'UserDefined'::text) OR ((domain)::text = 'ACLEquivalence'::text))) OR (((((domain)::text = 'RT::Queue-Role'::text) AND (instance = 10)) OR (((domain)::text = 'RT::Ticket-Role'::text) AND (instance = 999028))) AND ((type)::text = (acl_2.principaltype)::text))))
>                ->  BitmapOr  (cost=15.67..15.67 rows=2 width=0) (actual time=0.019..0.019 rows=0 loops=22)
>                      ->  Bitmap Index Scan on groups_pkey  (cost=0.00..4.76 rows=1 width=0) (actual time=0.005..0.005 rows=1 loops=22)
>                            Index Cond: (acl_2.principalid = id)
>                      ->  BitmapOr  (cost=10.66..10.66 rows=1 width=0) (actual time=0.013..0.013 rows=0 loops=22)
>                            ->  Bitmap Index Scan on groups2  (cost=0.00..5.33 rows=1 width=0) (actual time=0.007..0.007 rows=0 loops=22)
>                                  Index Cond: (((type)::text = (acl_2.principaltype)::text) AND (instance = 10) AND ((domain)::text = 'RT::Queue-Role'::text))
>                            ->  Bitmap Index Scan on groups2  (cost=0.00..5.33 rows=1 width=0) (actual time=0.006..0.006 rows=0 loops=22)
>                                  Index Cond: (((type)::text = (acl_2.principaltype)::text) AND (instance = 999028) AND ((domain)::text = 'RT::Ticket-Role'::text))

The planner is estimating that the outer side of this nested loop will
produce 33 rows and that the inner side will produce 1.  One would
assume that the row estimate for the join product couldn't be more
than 33 * 1 = 33 rows, but the planner is estimating 62335 rows, which
seems like nonsense.  The actual result cardinality is 23.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PERFORM] Setting vacuum_freeze_min_age really low

2013-05-13 Thread Robert Haas
On Sun, May 12, 2013 at 8:50 AM, Andres Freund  wrote:
> [ a response that I entirely agree with ]

+1 to all that.

It's maybe worth noting that it's probably fairly uncommon for vacuum
to read a page and not dirty it, because if the page is all-visible,
we won't read it.  And if it's not all-visible, and there's nothing
else interesting to do with it, we'll probably make it all-visible,
which will dirty it.  It can happen, if for example we vacuum a page
with no dead tuples while the inserting transaction is still running,
or committed but not yet all-visible.  Of course, in those cases we
won't be able to freeze, either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PERFORM] PostgreSQL planner

2013-05-10 Thread Robert Haas
On Sat, Mar 23, 2013 at 8:12 PM, Misa Simic  wrote:
> but problem is - we don't know the thing id - we know calc_id:
>
> EXPLAIN ANALYZE
> SELECT * FROM t2_left_t3_volatile v INNER JOIN t1 USING (thing_id)
> WHERE calc_id = 20

With this query you've got to scan all three tables.  The calc_id qual
can only be pushed down into the scan on t1, so you need the whole
t2/t3 join product.

> EXPLAIN ANALYZE
> SELECT v.no_index FROM t2_left_t3 v INNER JOIN t1 USING (thing_id)
> WHERE calc_id = 20

With this query you only need to scan 2 tables.  The join between t2
and t3 is eliminated by the join removal code in favor of scanning
only t2, as shown in the plan you included:

> "Nested Loop  (cost=437.49..13047.74 rows=12111 width=4) (actual
> time=6.360..71.818 rows=12038 loops=1)"
> "  ->  Seq Scan on t1  (cost=0.00..2.25 rows=1 width=4) (actual
> time=0.016..0.024 rows=1 loops=1)"
> "Filter: (calc_id = 20)"
> "Rows Removed by Filter: 99"
> "  ->  Bitmap Heap Scan on t2  (cost=437.49..12924.38 rows=12111 width=12)
> (actual time=6.330..69.063 rows=12038 loops=1)"
> "Recheck Cond: (thing_id = t1.thing_id)"
> "->  Bitmap Index Scan on t5_c2_idx  (cost=0.00..434.46 rows=12111
> width=0) (actual time=4.372..4.372 rows=12038 loops=1)"
> "  Index Cond: (thing_id = t1.thing_id)"
> "Total runtime: 72.461 ms"

The difference is that this query has only one column in its target list, not *.
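
A minimal illustration of the join-removal rule (hypothetical tables, assuming
t2_left_t3 is a view over a LEFT JOIN): a left join can be dropped when the
inner side is provably unique on the join column and none of its columns are
referenced in the output:

  CREATE TABLE t2 (thing_id int PRIMARY KEY, no_index int);
  CREATE TABLE t3 (thing_id int PRIMARY KEY, payload text);

  -- t3 contributes no output columns and t3.thing_id is unique,
  -- so the planner can scan only t2:
  EXPLAIN SELECT t2.no_index FROM t2 LEFT JOIN t3 USING (thing_id);

  -- selecting * (or any t3 column) forces the join to be kept:
  EXPLAIN SELECT * FROM t2 LEFT JOIN t3 USING (thing_id);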

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PERFORM] Setting vacuum_freeze_min_age really low

2013-05-09 Thread Robert Haas
On Mon, Mar 25, 2013 at 4:31 PM, Josh Berkus  wrote:
> In the past, setting vacuum_freeze_min_age (vfma) really low (say to
> 1 or 5) would have caused lots of extra writing work due to
> dirtying extra pages for freezing.  This has been our stated reason to
> keep vfma high, despite the obvious advantage of freezing tuples while
> they're still in the cache.

That, and Tom's concern about forensics, which I understand to be the
larger sticking point.

> With the visibility map, though, vfma should only be dirtying pages
> which vacuum is already visiting because there's dirty tuples on the
> page.  That is, pages which vacuum will probably dirty anyway, freezing
> or not.  (This is assuming one has applied the 9.2.3 update.)

I think this is probably not accurate, although I'll defer to someone
with more real-world experience.  I'd guess that it's uncommon for
actively updated data and very-rarely-updated data to be mixed
together on the same pages with any real regularity.  IOW, the dirty
pages probably don't have anything on them that can be frozen anyway.

So, if the table's age is less than vacuum_freeze_table_age, we'll
only scan pages not already marked all-visible.  Regardless of vfma,
we probably won't freeze much.

On the other hand, if the table's age is at least
vacuum_freeze_table_age, we'll scan the whole table and freeze a lotta
stuff all at once.  Again, whether vfma is high or low won't matter
much: it's definitely less than vacuum_freeze_table_age.

Basically, I would guess that both the costs and the benefits of
changing this are pretty small.  It would be nice to hear from someone
who has tried it, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PERFORM] Setting Statistics on Functional Indexes

2012-11-14 Thread Robert Haas
On Fri, Oct 26, 2012 at 5:08 PM, Tom Lane  wrote:
> So the bottom line is that this is a case where you need a lot of
> resolution in the histogram.  I'm not sure there's anything good
> we can do to avoid that.   I spent a bit of time thinking about whether
> we could use n_distinct to get some idea of how many duplicates there
> might be for the endpoint value, but n_distinct is unreliable enough
> that I can't develop a lot of faith in such a thing.  Or we could just
> arbitrarily assume some fraction-of-a-histogram-bin's worth of
> duplicates, but that would make the results worse for some people.

I looked at this a bit.  It seems to me that the root of this issue is
that we aren't distinguishing (at least, not as far as I can see)
between > and >=.  ISTM that if the operator is >, we're doing exactly
the right thing, but if it's >=, we're giving exactly the same
estimate that we would give for >.  That doesn't seem right.

Worse, I suspect that in this case we're actually giving a smaller
estimate for >= than we would for =, because = would estimate as if we
were searching for an arbitrary non-MCV, while >= acts like > and
says, hey, there's nothing beyond the end.

Shouldn't there be a separate estimator for scalarlesel?  Or should
the existing estimator be adjusted to handle the two cases
differently?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PERFORM] out of memory

2012-11-05 Thread Robert Haas
On Tue, Oct 30, 2012 at 6:08 AM, Tatsuo Ishii  wrote:
>> I have an SQL file (its size is 1GB).
>> When I execute it, the error "String is 987098801 bytes too long for
>> encoding conversion" occurs.
>> Please give me a solution.
>
> You hit the upper limit of internal memory allocation limit in
> PostgreSQL. IMO, there's no way to avoid the error except you use
> client encoding identical to backend.

We recently had a customer who suffered a failure in pg_dump because
the quadruple-allocation required by COPY OUT for an encoding
conversion exceeded allocatable memory.  I wonder whether it would be
possible to rearrange things so that we can do a "streaming" encoding
conversion.  That is, if we have a large datum that we're trying to
send back to the client, could we perhaps chop off the first 50MB or
so, do the encoding on that amount of data, send the data to the
client, lather, rinse, repeat?

Your recent work to increase the maximum possible size of large
objects (for which I thank you) seems like it could make these sorts
of issues more common.  As objects get larger, I don't think we can go
on assuming that it's OK for peak memory utilization to keep hitting
5x or more.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-09-02 Thread Robert Haas
On Sun, Sep 2, 2012 at 5:39 PM, Jeff Janes  wrote:
> On Thu, Aug 30, 2012 at 8:17 PM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> On Thu, Aug 30, 2012 at 4:51 PM, Tom Lane  wrote:
>>>> Bruce Momjian  writes:
>>>>> On Thu, May 31, 2012 at 09:20:43AM +0900, Tatsuo Ishii wrote:
>>>>>> Ok, I modified the part of pg_dump where a tremendous number of LOCK
>>>>>> TABLE statements are issued. I replaced them with a single LOCK TABLE
>>>>>> with multiple tables. With 100k tables the LOCK statements took 13
>>>>>> minutes in total; now it only takes 3 seconds. Comments?
>>
>>>>> Was this applied?
>>
>>>> No, we fixed the server side instead.
>>
>>> But only for 9.2, right?  So people running back branches are still screwed.
>>
>> Yeah, but they're screwed anyway, because there are a bunch of O(N^2)
>> behaviors involved here, not all of which are masked by what Tatsuo-san
>> suggested.
>
> All of the other ones that I know of were associated with pg_dump
> itself, and since it is recommended to run the newer version of
> pg_dump against the older version of the server, no back patching
> would be necessary to get the benefits of those particular fixes.
>
>> Six months or a year from now, we might have enough confidence in that
>> batch of 9.2 fixes to back-port them en masse.  Don't want to do it
>> today though.
>
>
> What would be the recommendation for people trying to upgrade, but who
> can't get their data out in a reasonable window?
>
> Putting Tatsuo-san's change into a future pg_dump might be more
> conservative than back-porting the server's Lock Table change to the
> server version they are trying to get rid of.

What he said.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-08-30 Thread Robert Haas
On Thu, Aug 30, 2012 at 4:51 PM, Tom Lane  wrote:
> Bruce Momjian  writes:
>> On Thu, May 31, 2012 at 09:20:43AM +0900, Tatsuo Ishii wrote:
>>>> Ok, I modified the part of pg_dump where a tremendous number of LOCK
>>>> TABLE statements are issued. I replaced them with a single LOCK TABLE
>>>> with multiple tables. With 100k tables the LOCK statements took 13
>>>> minutes in total; now it only takes 3 seconds. Comments?
>
>>> Shall I commit to master and all supported branches?
>
>> Was this applied?
>
> No, we fixed the server side instead.

But only for 9.2, right?  So people running back branches are still screwed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PERFORM] High CPU Usage

2012-07-23 Thread Robert Haas
On Thu, Jun 14, 2012 at 11:15 AM, Siddharth Shah
 wrote:
> I have around 1000 schemas in the database, each schema having a similar data
> structure with different data.
> Each schema has a few tables which never update (read-only tables) and other
> tables that are rewritten almost every day, so I prefer to TRUNCATE those
> tables and restore them with new data.
>
> Now I am facing high CPU & IO on the database, primarily from the stats
> collector & vacuuming; the size of the stats file is almost 28MB

How many tables do you have across all the schemas?

> and when I manually VACUUM
> ANALYZE the complete database it takes almost 90 minutes, though autovacuum
> is configured

There's no real reason to run vacuum analyze manually if you have
autovacuum configured.

> Restoring the dump on each schema causes only minor data variations.
> Few SQL statements are executed on a schema, affecting less than 50 tuples /
> day.
>
> My questions:
>
> Does increasing maintenance_work_mem improve auto / manual vacuum performance?

It can, but mostly if there are a lot of updates or deletes.  If the
tables aren't changing much it isn't going to do anything.

> If it improves things, will it require more IO / CPU resources?
> If I stop the stats collector process & autovacuuming and execute a manual
> VACUUM based on schema restoration with major changes, what performance
> parameters do I need to consider? (The restored data has very few changes.)
> Is vacuuming & stats collection required here on the metadata to improve
> performance? (Table structures remain the same.)
>
> Anything more on this which can help reduce IO without majorly affecting
> performance?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] postgres clustering interactions with pg_dump

2012-07-23 Thread Robert Haas
On Mon, Jun 11, 2012 at 9:55 AM, Fitch, Britt  wrote:
> Hi, I have a table that I am clustering on an index.
>
> I am then dumping that table via pg_dump -Fc and loading it into another
> database via pg_restore.
>
> It is unclear to me though if the clustering I did in the original database
> is preserved during the dump & restore or if I would still need to perform a
> CLUSTER again once the data was loaded into the new database.
>
> Can anyone confirm this?

The rows will end up in the new table in the same physical order that
they were stored in the dump file.

You might want to look at pg_stats.correlation for the clustered
column - that's often a good way to know whether things are ordered
the way you expect, and it's updated every time the table is analyzed.
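
For example (a sketch; substitute the real table and column names):

  SELECT attname, correlation
    FROM pg_stats
   WHERE tablename = 'mytable' AND attname = 'clustered_col';

A correlation close to 1.0 (or -1.0) means the heap is still close to the
index order; values near 0 suggest it is time to CLUSTER again.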

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] Sequential scan in a JOIN

2012-07-18 Thread Robert Haas
On Tue, Jun 5, 2012 at 8:48 AM, Andrew Jaimes  wrote:
> Hi everyone,
>
> I am trying to run the following query:
>
> SELECT  count(1) --DISTINCT l_userqueue.queueid
>   FROM e_usersessions
>   JOIN l_userqueue
> ON l_userqueue.userid = e_usersessions.entityid
>   JOIN a_activity
> ON a_activity.activequeueid  = l_userqueue.queueid
>AND a_activity.vstatus= 1
>AND a_activity.ventrydate > 0
>AND a_activity.sbuid  = e_usersessions.sbuid
>AND a_activity.assignedtoid   = 0
>AND a_activity.status<> '0'
>   WHERE e_usersessions.sessionkeepalivedatetime > 20120605082131943
>
> Explain analyze:
> 'Aggregate  (cost=100402.10..100402.11 rows=1 width=0) (actual time=2249.051..2249.051 rows=1 loops=1)'
> '  ->  Hash Join  (cost=10.93..99795.09 rows=242803 width=0) (actual time=0.541..2249.027 rows=33 loops=1)'
> '        Hash Cond: ((a_activity.activequeueid = l_userqueue.queueid) AND (a_activity.sbuid = e_usersessions.sbuid))'
> '        ->  Seq Scan on a_activity  (cost=0.00..88462.52 rows=1208167 width=22) (actual time=0.010..1662.142 rows=1207855 loops=1)'
> '              Filter: ((ventrydate > 0) AND ((status)::text <> '0'::text) AND (vstatus = 1) AND (assignedtoid = 0::numeric))'
> '        ->  Hash  (cost=10.86..10.86 rows=5 width=22) (actual time=0.053..0.053 rows=4 loops=1)'
> '              ->  Hash Join  (cost=9.38..10.86 rows=5 width=22) (actual time=0.033..0.048 rows=4 loops=1)'
> '                    Hash Cond: (l_userqueue.userid = e_usersessions.entityid)'
> '                    ->  Seq Scan on l_userqueue  (cost=0.00..1.23 rows=23 width=27) (actual time=0.003..0.009 rows=23 loops=1)'
> '                    ->  Hash  (cost=9.31..9.31 rows=5 width=21) (actual time=0.018..0.018 rows=2 loops=1)'
> '                          ->  Index Scan using i06_e_usersessions on e_usersessions  (cost=0.00..9.31 rows=5 width=21) (actual time=0.009..0.012 rows=2 loops=1)'
> '                                Index Cond: (sessionkeepalivedatetime > 20120605082131943::bigint)'
> 'Total runtime: 2249.146 ms'
>
> I am trying to understand the reason why a sequential scan is used on
> a_activity instead of using the index by activequeueid (i08_a_activity).

I'm chiming in a bit late here, but it seems like you're hoping that
the query planner will perform the join as a nested loop, with the
inner and outer sides swapped, so that the results of the join between
l_userqueue and e_usersessions are used to drive a series of index
scans on a_activity that avoid scanning the whole table.  PostgreSQL
9.2 will be the first release that has the ability to generate that
kind of plan, so it would be interesting to see what happens if you
try this on 9.2beta.

Older releases should be able to consider a nested loop join with
l_userqueue as the inner rel, driving an index scan over a_activity,
and then performing the join to e_usersessions afterwards.  But that
plan might not be nearly as good, since then we'd have to do 23
index-scans on a_activity rather than just 4.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] Array fundamentals

2012-07-18 Thread Robert Haas
On Sat, Jun 2, 2012 at 1:05 PM, idc danny  wrote:
> Hi all,
> I got 2 complementary functions, which will do opposite things.
> 1 - CombineStrings(stringarray character varying[])  RETURNS character
> varying
> This one will take as parameter an array of strings and will return a string
> with some formatted information inside
> 2-  SplitString2Array(stringtosplit character varying)  RETURNS character
> varying[]
> This one will take as parameter a formatted string and will return an array
> of string
>
> The following is true, both work just fine:
> select SplitString2Array(CombineStrings(ARRAY['abba', 'queen']))
> will return {'abba', 'queen'}
>
> Now, if I want to do the following:
> select CombineStrings(ARRAY[SplitString2Array("SomeTextColumn"), 'New string
> to add']) from "SomeTable"
> i get the following error:
> array value must start with "{" or dimension information
>
> What am I doing wrong? I feel I still don't get the array
> fundamentals. My goal is to add my new string 'New string to add' into the
> formatted information in the column "SomeTextColumn", in the same manner as
> if I had used the following:
> Insert into "SomeTable"("SomeTextColumn") values
> (CombineString(ARRAY['abba', 'queen', 'New string to add']))

It sounds like one or both of your functions have a bug in them, but
without knowing what they're supposed to do or seeing the source code,
it's pretty hard to guess what it might be.
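
For what it's worth, one pattern that does produce exactly that error message
is mixing an array and an untyped string literal inside an ARRAY[...]
constructor: the literal gets coerced to character varying[] and the cast
fails.  If that is what is happening, appending with the || operator (or
array_append) avoids it.  A sketch, assuming the functions otherwise work as
described:

  SELECT CombineStrings(SplitString2Array("SomeTextColumn")
                        || 'New string to add'::character varying)
    FROM "SomeTable";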

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Robert Haas
On Thu, May 31, 2012 at 10:50 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Thu, May 31, 2012 at 10:31 AM, Tom Lane  wrote:
>>> I'm not; Jeff Janes is.  But you shouldn't be holding your breath
>>> anyway, since it's 9.3 material at this point.
>
>> I agree we can't back-patch that change, but then I think we ought to
>> consider back-patching some variant of Tatsuo's patch.  Maybe it's not
>> reasonable to thunk an arbitrary number of relation names in there on
>> one line, but how about 1000 relations per LOCK statement or so?  I
>> guess we'd need to see how much that erodes the benefit, but we've
>> certainly done back-branch rearrangements in pg_dump in the past to
>> fix various kinds of issues, and this is pretty non-invasive.
>
> I am not convinced either that this patch will still be useful after
> Jeff's fix goes in, ...

But people on older branches are not going to GET Jeff's fix.

> or that it provides any meaningful savings when
> you consider a complete pg_dump run.  Yeah, it will make the lock
> acquisition phase faster, but that's not a big part of the runtime
> except in very limited scenarios (--schema-only, perhaps).

That is not a borderline scenario, as others have also pointed out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Robert Haas
On Thu, May 31, 2012 at 10:31 AM, Tom Lane  wrote:
> Claudio Freire  writes:
>> It's not clear whether Tom is already working on that O(N^2) fix in locking.
>
> I'm not; Jeff Janes is.  But you shouldn't be holding your breath
> anyway, since it's 9.3 material at this point.

I agree we can't back-patch that change, but then I think we ought to
consider back-patching some variant of Tatsuo's patch.  Maybe it's not
reasonable to thunk an arbitrary number of relation names in there on
one line, but how about 1000 relations per LOCK statement or so?  I
guess we'd need to see how much that erodes the benefit, but we've
certainly done back-branch rearrangements in pg_dump in the past to
fix various kinds of issues, and this is pretty non-invasive.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] heavy load system spec

2012-05-24 Thread Robert Haas
On Thu, Apr 5, 2012 at 11:39 AM, Gregg Jaskiewicz  wrote:
> I know this is a very general question. But if you guys had to specify
> system (could be one server or cluster), with sustainable transaction
> rate of 1.5M tps running postgresql, what configuration and hardware
> would you be looking for ?
> The transaction distribution there is 90% writes/updates and 10% reads.
> We're talking a 64-bit Linux, Intel/IBM system.
>
> I'm trying to see how that compares with Oracle system.

1.5 million is a lot of tps, especially if some of them are write
transactions.  On trivial read-only transactions (primary key lookup
on fully cached table), using a 16-core, 64-thread IBM POWER7 box,
pgbench -M prepared -S -n -T 60 -c 64 -j 64:

tps = 455903.743918 (including connections establishing)
tps = 456012.871764 (excluding connections establishing)

That box isn't quite the fastest one I've seen, but it's close.

What hardware is Oracle running on?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] Sudden Query slowdown on our Postgresql Server

2012-05-08 Thread Robert Haas
On Fri, Mar 23, 2012 at 3:52 AM, Sebastian Melchior  wrote:
> unfortunately we cannot directly control the TRIM (I am not sure it even
> occurs) because the SSDs are behind an LSI MegaSAS 9260 controller which does
> not allow access via SMART. Even if some kind of TRIM command is the problem,
> shouldn't the iowait go up in this case?

Based on my recent benchmarking experiences, maybe not.  Suppose
backend A takes a lock and then blocks on an I/O.  Then, all the other
backends block waiting on the lock.  So maybe one backend is stuck in
I/O-wait, but on a multi-processor system the percentages are averaged
across all CPUs, so it doesn't really look like there's much I/O-wait.
 If you have 'perf' available, I've found the following quite helpful:

perf record -e cs -g -a sleep 30
perf report -g

Then you can look at the report and find out what's causing PostgreSQL
to context-switch out - i.e. block - and therefore find out what lock
and call path is contended.  LWLocks don't show up in pg_locks, so you
can't troubleshoot this sort of contention that way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] "...WHERE TRUE" condition in union results in bad query plan

2012-04-03 Thread Robert Haas
On Sat, Mar 3, 2012 at 10:03 PM, Tom Lane  wrote:
> Claus Stadler  writes:
>> Query optimizer glitch: "...WHERE TRUE" condition in union results in
>> bad query plan ...
>
> Yeah, this is because a nonempty WHERE clause defeats simplifying the
> UNION ALL into a simple "append relation" (cf is_safe_append_member()).
> The planner will eventually figure out that WHERE TRUE is a no-op,
> but that doesn't happen till later (and there are good reasons to do
> things in that order).
>
> Sooner or later I'd like to relax the restriction that appendrel members
> can't have extra WHERE clauses, but don't hold your breath waiting...

Does this comment need updating?

 * Note: the data structure assumes that append-rel members are single
 * baserels.  This is OK for inheritance, but it prevents us from pulling
 * up a UNION ALL member subquery if it contains a join.  While that could
 * be fixed with a more complex data structure, at present there's not much
 * point because no improvement in the plan could result.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] Performance of SQL Function versus View

2012-04-03 Thread Robert Haas
On Wed, Feb 29, 2012 at 3:37 PM, Igor Schtein  wrote:
> Do you see any performance difference between the following approaches? The
> assumption is that most of the rows in the query will be requested often
> enough.
>
> 1.   SQL function.
>
> CREATE OR REPLACE FUNCTION X(IN a_id uuid, IN b_id uuid)
>   RETURNS int
>   STABLE
> AS $$
>   SELECT count(1)
>     FROM A, B
>    WHERE a_join_id = b_join_id
>      AND A.a_id = a_id
>      AND B.b_id = b_id;
> $$ LANGUAGE SQL;
>
> SELECT X(a_id, b_id);
>
> 2.   View.
>
> CREATE OR REPLACE VIEW X AS
>   SELECT a_id, b_id, count(1) cnt
>     FROM A, B
>    WHERE a_join_id = b_join_id
> GROUP BY (a_id, b_id)
>
> SELECT cnt FROM X WHERE X.a_id = a_id and X.B_id = b_id;

You should probably test this in your environment, but I'd expect the
view to be better.  Needlessly wrapping logic inside PL/pgsql functions
rarely turns out to be a win.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] set autovacuum=off

2012-03-21 Thread Robert Haas
On Thu, Feb 23, 2012 at 3:28 PM, Alessandro Gagliardi
 wrote:
> I'm unable to make sense of pg_locks. The vast majority are
> locktype='transactionid', mode='ExclusiveLock', granted=t. There are some
> 'relation' locks with mode='RowExclusiveLock' and fewer with
> 'AccessShareLock'. I have no idea what I should be looking for here.

If you have lock contention, you'll see locks with granted='f', at
least from time to time.  Those are the ones you want to worry about.
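
For example, a quick way to see who is actually waiting (a sketch):

  SELECT locktype, relation::regclass, mode, pid
    FROM pg_locks
   WHERE NOT granted;

An empty result while the slowdown is happening means heavyweight-lock
contention probably isn't the problem.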

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] Vacuuming problems on TOAST table

2012-02-29 Thread Robert Haas
On Wed, Feb 8, 2012 at 2:59 PM, Ofer Israeli  wrote:
> The settings we used were not in the postgresql.conf file, but rather an 
> update of the pg_autovacuum table where we set the vac_cost_limit to 2000.  
> The reason for this being that we wanted this definition only for the big 
> (TOASTed) table I was referring to.
>
> The logged settings in the ~400 second case were:
> autovac_balance_cost(pid=6224 db=16385, rel=17881, cost_limit=10, 
> cost_delay=1)
>
> Which comes as quite a surprise as it seems that the cost_limit is not set or 
> am I missing something?

That doesn't look right, but without step-by-step directions it will
be hard for anyone to reproduce this.  Also, what version are you
testing on?  pg_autovacuum was removed in PostgreSQL 8.4, so you must
be using PostgreSQL 8.3 or earlier.

You might at least want to make sure you're running a late enough
minor version to have this fix:

Author: Tom Lane 
Branch: master Release: REL9_1_BR [b58c25055] 2010-11-19 22:29:44 -0500
Branch: REL9_0_STABLE Release: REL9_0_2 [b5efc0940] 2010-11-19 22:28:25 -0500
Branch: REL8_4_STABLE Release: REL8_4_6 [fab2af30d] 2010-11-19 22:28:30 -0500
Branch: REL8_3_STABLE Release: REL8_3_13 [6cb9d5113] 2010-11-19 22:28:35 -0500

Fix leakage of cost_limit when multiple autovacuum workers are active.

When using default autovacuum_vac_cost_limit, autovac_balance_cost relied
on VacuumCostLimit to contain the correct global value ... but after the
first time through in a particular worker process, it didn't, because we'd
trashed it in previous iterations.  Depending on the state of other autovac
workers, this could result in a steady reduction of the effective
cost_limit setting as a particular worker processed more and more tables,
causing it to go slower and slower.  Spotted by Simon Poole (bug #5759).
Fix by saving and restoring the GUC variables in the loop in do_autovacuum.

In passing, improve a few comments.

Back-patch to 8.3 ... the cost rebalancing code has been buggy since it was
put in.

Also:

> And one more thing that seems a bit strange - after a 1-minute run, we would
> expect to see 1700 Tuples Updated (100*17), but instead we see 1700 Tuples
> Inserted (and no deletes).

I don't think TOAST ever updates chunks in place.  It just inserts and
deletes; or at least I think that's what it does.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] How to improve insert speed with index on text column

2012-02-29 Thread Robert Haas
On Sun, Feb 5, 2012 at 12:29 PM, Saurabh  wrote:
> My intention to keep autovacuum as off is bulk loading only. I was
> thinking after bullk load I will change it.
>
> I changed wal_buffer from 5MB to 16MB but I got same performance that
> I got with 5MB (even less).

Does it help if you create the index using COLLATE "C"?  Assuming
you're on 9.1.x...
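
Something along these lines (a sketch; the table and column names are made up):

  CREATE INDEX my_text_idx ON my_table (text_col COLLATE "C");

With the "C" collation the index can compare strings byte-wise instead of
using the (much slower) locale-aware rules, which can make a large difference
when bulk-loading into an indexed text column.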

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] text search: tablescan cost for a tsvector

2012-02-29 Thread Robert Haas
On Mon, Feb 6, 2012 at 6:05 AM, Marc Mamin  wrote:
> without analyze: http://explain.depesz.com/s/6At
> with analyze:    http://explain.depesz.com/s/r3B

I think this is the same issue complained about here:

http://archives.postgresql.org/message-id/4ed68eec.9030...@krogh.cc

And here:

http://archives.postgresql.org/message-id/CANxtv6XiuiqEkXRJU2vk=xkafxrlep7uvhgr-xmcyjgqz29...@mail.gmail.com

The problem seems to be that the cost estimator doesn't know that
detoasting is expensive.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] pl/pgsql functions outperforming sql ones?

2012-02-08 Thread Robert Haas
On Sat, Jan 28, 2012 at 11:20 PM, Carlo Stonebanks
 wrote:
> Update: The main stored function in question and all of its sub
> sub-functions were recoded to new pure sql functions.
>
> I then stub tested the sub functions sql vs. plpgsql.
>
> Here were the results for new sql vs old plpgsql:
>
> Individual sub functions tested 20-30% faster
>
> But the main function calling new sql sub functions ran 100% slower
>
> So I tried this:
>
> I modified the old plpgsql function to call the new sql sub functions.
>
> THAT ran 20-30% faster then the unmodified version.
>
> That modified function is listed below. All the functions ending in 2 are
> the new SQL versions.

One advantage of PL/pgsql for code like this is that you can compute
values once and save them in variables.  SQL doesn't have variables,
so you can end up repeating the same SQL in multiple places (causing
multiple evaluation), or even if you manage to avoid that, the system
can inline things in multiple places and produce the same effect.
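
A minimal illustration of the difference (expensive_fn is hypothetical):

  CREATE FUNCTION f_plpgsql(x int) RETURNS int AS $$
  DECLARE
    v int := expensive_fn(x);   -- computed once, held in a variable
  BEGIN
    RETURN v + v;
  END;
  $$ LANGUAGE plpgsql;

  CREATE FUNCTION f_sql(x int) RETURNS int AS $$
    SELECT expensive_fn(x) + expensive_fn(x);   -- may be evaluated twice
  $$ LANGUAGE sql;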

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] How to remove a table statistics ?

2012-02-03 Thread Robert Haas
On Tue, Jan 31, 2012 at 2:36 PM, Marc Mamin  wrote:
> But this raises an interesting question: how/where does Postgres store
> statistics on functional indexes?
> In pg_statistic there is information on the column contents, but I couldn't
> find stats on the function result, which is fully computed only during
> index creation.

Look for rows where starelid is equal to the OID of the index.
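
For example (a sketch):

  SELECT * FROM pg_statistic WHERE starelid = 'my_functional_index'::regclass;

The key point is that the stats for the function result hang off the index's
own OID, not the table's, which is why they don't show up when you look the
table up in pg_statistic.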

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] wal_level=archive gives better performance than minimal - why?

2012-02-03 Thread Robert Haas
2012/1/22 Tomas Vondra :
> That's suspiciously similar to the checkpoint timeout (which was set to
> 4 minutes), but why should this matter for minimal WAL level and not for
> archive?

I went through and looked at all the places where we invoke
XLogIsNeeded().  When XLogIsNeeded(), we:

1. WAL log creation of the _init fork of an unlogged table or an index
on an unlogged table (otherwise, an fsync is enough)
2. WAL log index builds
3. WAL log changes to max_connections, max_prepared_xacts,
max_locks_per_xact, and/or wal_level
4. skip calling posix_fadvise(POSIX_FADV_DONTNEED) when closing a WAL file
5. skip supplying O_DIRECT when writing WAL, if wal_sync_method is
open_sync or open_datasync
6. refuse to create named restore points
7. WAL log CLUSTER
8. WAL log COPY FROM into a newly created/truncated relation
9. WAL log ALTER TABLE .. SET TABLESPACE
10. WAL log cleanup info before doing an index vacuum (this one should
probably be changed to happen only in HS mode)
11. WAL log SELECT INTO

It's hard to see how generating more WAL could cause a performance
improvement, unless there's something about full page flushes being
more efficient than partial page flushes or something like that.  But
none of the stuff above looks likely to happen very often anyway.  But
items #4 and #5 on that list look like things that could potentially be
causing a problem - if WAL files are being reused regularly, then
calling POSIX_FADV_DONTNEED on them could represent a regression.  It
might be worth compiling with POSIX_FADV_DONTNEED undefined and seeing
whether that changes anything.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] Slow nested loop execution on larger server

2012-02-03 Thread Robert Haas
s 12634 detached
>
> pg_locks for this 12634 shows all granted ones, nothing exciting there.  I
> asked how well this executes with enable_nestloop turned off, hoping to see
> that next.
>
> This all seems odd, and I get interested and concerned when that starts
> showing up specifically on newer hardware.

Ridiculously late response here, but, IME, semop() calls typically
indicate LWLock contention, but with a stock build it's pretty well
impossible to figure out which LWLock is being contended; compiling
with LWLOCK_STATS could tell you that.

Shooting from the hip, the first thing that comes to mind is that the
index isn't fully cached in shared_buffers, and every time you hit a
page that isn't there you have to acquire BufFreelistLock to run the
clock sweep.  If the lock is uncontended then you wouldn't get system
calls, but if there's other activity on the system you might get
something like this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] partitioned table: differents plans, slow on some situations

2012-01-10 Thread Robert Haas
                      WHERE sms.id = dlr.id_sms_messaggio
>                        AND sms.timestamp_todeliver >= '1/3/2010'::timestamp
>                        AND sms.timestamp_todeliver < '30/4/2010'::timestamp
>                        AND dlr.timestamp_todeliver >= '1/3/2010'::timestamp
>                        AND dlr.timestamp_todeliver < '30/4/2010'::timestamp
>                         AND sms.id_cliente = '7'
>                      ORDER BY dlr.timestamp_todeliver DESC LIMIT 50;
>                                              QUERY PLAN
> ------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.02..78345.66 rows=50 width=16) (actual time=183.547..257383.459 rows=50 loops=1)
>    ->  Nested Loop  (cost=0.02..58256245.44 rows=37179 width=16) (actual time=183.544..257383.379 rows=50 loops=1)
>          Join Filter: (sms.id = dlr.id_sms_messaggio)
>          ->  Merge Append  (cost=0.02..20289460.70 rows=5687651 width=16) (actual time=0.047..4040.930 rows=1490783 loops=1)
>                Sort Key: dlr.timestamp_todeliver
>                ->  Index Scan Backward using sms_messaggio_dlr_todeliver on sms_messaggio_dlr dlr  (cost=0.00..8.27 rows=1 width=16) (actual time=0.005..0.005 rows=0 loops=1)
>                      Index Cond: ((timestamp_todeliver >= '2010-03-01 00:00:00'::timestamp without time zone) AND (timestamp_todeliver < '2010-04-30 00:00:00'::timestamp without time zone))
>                ->  Index Scan Backward using sms_messaggio_dlr_timestamp_todeliver_201003 on sms_messaggio_dlr_201003 dlr  (cost=0.00..12428664.98 rows=3502530 width=16) (actual time=0.018..0.018 rows=1 loops=1)
>                      Index Cond: ((timestamp_todeliver >= '2010-03-01 00:00:00'::timestamp without time zone) AND (timestamp_todeliver < '2010-04-30 00:00:00'::timestamp without time zone))
>                ->  Index Scan Backward using sms_messaggio_dlr_timestamp_todeliver_201004 on sms_messaggio_dlr_201004 dlr  (cost=0.00..7756421.17 rows=2185120 width=16) (actual time=0.022..2511.283 rows=1490783 loops=1)
>                      Index Cond: ((timestamp_todeliver >= '2010-03-01 00:00:00'::timestamp without time zone) AND (timestamp_todeliver < '2010-04-30 00:00:00'::timestamp without time zone))
>          ->  Materialize  (cost=0.00..1715.42 rows=445 width=8) (actual time=0.001..0.081 rows=161 loops=1490783)
>                ->  Append  (cost=0.00..1713.20 rows=445 width=8) (actual time=0.111..0.502 rows=161 loops=1)
>                      ->  Seq Scan on sms_messaggio sms  (cost=0.00..0.00 rows=1 width=8) (actual time=0.001..0.001 rows=0 loops=1)
>                            Filter: ((timestamp_todeliver >= '2010-03-01 00:00:00'::timestamp without time zone) AND (timestamp_todeliver < '2010-04-30 00:00:00'::timestamp without time zone) AND (id_cliente = 7))
>                      ->  Bitmap Heap Scan on sms_messaggio_201003 sms  (cost=6.85..1199.49 rows=313 width=8) (actual time=0.108..0.245 rows=94 loops=1)
>                            Recheck Cond: (id_cliente = 7)
>                            Filter: ((timestamp_todeliver >= '2010-03-01 00:00:00'::timestamp without time zone) AND (timestamp_todeliver < '2010-04-30 00:00:00'::timestamp without time zone))
>                            ->  Bitmap Index Scan on sms_messaggio_id_cliente_201003  (cost=0.00..6.78 rows=313 width=0) (actual time=0.083..0.083 rows=94 loops=1)
>                                  Index Cond: (id_cliente = 7)
>                      ->  Index Scan using sms_messaggio_id_cliente_timestamp_201004 on sms_messaggio_201004 sms  (cost=0.00..513.71 rows=131 width=8) (actual time=0.059..0.113 rows=67 loops=1)
>                            Index Cond: ((id_cliente = 7) AND (timestamp_todeliver >= '2010-03-01 00:00:00'::timestamp without time zone) AND (timestamp_todeliver < '2010-04-30 00:00:00'::timestamp without time zone))
>  Total runtime: 257383.922 ms

Hmm.  In the first (good) plan, the planner is using a parameterized
nestloop.  So for each row it finds in dlr, it looks up
dlr.id_sms_messaggio and passes that down to the index scans, which
then pull out just the rows where sms.id takes that specific value.
In the second (bad) plan, the planner is using an unparameterized
nestloop: it's fetching all 445 rows that match the remaining criteria
on sms_messagio (i.e. date and id_cliente) and then repeatedly
rescanning the output of that calculation.  My guess is that the
planner figures that repeated index scans are going to cause too much
I/O, and that caching the results is better; you might want to check
your values for random_page_cost, seq_page_cost, and
effective_cache_size.

That having been said, if the planner doesn't like the idea of
repeatedly index-scanning, why not use a hash join instead of a nested
loop?  That seems likely to be a whole lot faster for the 445 rows the
planner is estimating.  Can you show us all of your non-default
configuration settings?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] Subquery flattening causing sequential scan

2012-01-10 Thread Robert Haas
On Tue, Dec 27, 2011 at 12:29 PM, Jim Crate  wrote:
> My question is why does it do a seq scan when it flattens this subquery into 
> a JOIN?  Is it because the emsg_messages table is around 1M rows?  Are there 
> some guidelines to when the planner will prefer not to use an available 
> index?  I just had a look through postgresql.conf and noticed that I forgot 
> to set effective_cache_size to something reasonable for a machine with 16GB 
> of memory.  Would the default setting of 128MB cause this behavior?  I can't 
> bounce the production server midday to test that change.

You wouldn't need to bounce the production server to test that.  You
could just use SET in the session you were testing from.
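
For example (a sketch), in a psql session on the production server:

  SET effective_cache_size = '12GB';   -- session-local; the value here is made up
  EXPLAIN ANALYZE <the query in question>;
  RESET effective_cache_size;

The setting only affects planning in your own session, so nothing has to be
restarted.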

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] pg_upgrade failure "contrib" issue?

2012-01-10 Thread Robert Haas
On Wed, Dec 7, 2011 at 6:53 PM, Tory M Blue  wrote:
> Well thought it was maybe just going from 8.4.4 to 9.1.1 so upgraded
> to 8.4.9 and tried pg_upgrade again (this is 64bit) and it's failing
>
> -bash-4.0$ /tmp/pg_upgrade --check --old-datadir "/data/db"
> --new-datadir "/data1/db" --old-bindir "/ipix/pgsql/bin" --new-bindir
> "/ipix/pgsql9/bin"
> Performing Consistency Checks
> -
> Checking current, bin, and data directories                 ok
> Checking cluster versions                                   ok
> Checking database user is a superuser                       ok
> Checking for prepared transactions                          ok
> Checking for reg* system oid user data types                ok
> Checking for contrib/isn with bigint-passing mismatch       ok
> Checking for large objects                                  ok
>
> There were problems executing "/ipix/pgsql/bin/pg_ctl" -w -l
> "/dev/null" -D "/data/db"  stop >> "/dev/null" 2>&1
> Failure, exiting
>
>
> I've read some re pg_migrator and issues with contribs, but wondered
> if there is something "Else" I need to know here

I'm not sure that this is on-topic for pgsql-performance, and my reply
here is horribly behind-the-times anyway, but my experience with
pg_upgrade is that it's entirely willing to send all the critically
important information you need to solve the problem to the bit bucket,
as in your example.  If you knew WHY it was having trouble running
pg_ctl, you would probably be able to fix it easily, but since
everything's been redirected to /dev/null, you can't.  I believe that
this gets considerably better if you run pg_upgrade with the "-l
logfile" option, and then check the log file.
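
That is, something like (the paths are the ones from above; the log file name
is arbitrary):

  /tmp/pg_upgrade --check --old-datadir "/data/db" --new-datadir "/data1/db" \
      --old-bindir "/ipix/pgsql/bin" --new-bindir "/ipix/pgsql9/bin" \
      -l /tmp/pg_upgrade.log

and then look in /tmp/pg_upgrade.log for the actual pg_ctl error.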

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] Autovacuum Issue

2011-12-01 Thread Robert Haas
On Wed, Nov 23, 2011 at 12:55 AM, J Ramesh Kumar  wrote:
> Why is autovacuum running even though I disabled it?  Am I missing anything?

As Raghavendra says, anti-wraparound vacuum will always kick in to
prevent a database shutdown.

> And also, please share your views on my decision to disable autovacuum for
> my application. I am planning to run a VACUUM command daily on that small
> table which has frequent updates.

Sounds like a bad plan.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Problems with FTS

2011-11-30 Thread Robert Haas
On Mon, Nov 21, 2011 at 12:53 AM, Rauan Maemirov  wrote:
> The problem has returned, and here are the results; as you said, it's
> faster now:
>
> SET enable_seqscan=off;
> EXPLAIN ANALYZE SELECT "v"."id", "v"."title" FROM "video" AS "v"
> WHERE (v.active) AND (v.fts @@
> 'dexter:A|season:A|seri:A|декстер:A|качество:A|сезон:A|серия:A'::tsquery and
> v.id <> 500563 )
> ORDER BY COALESCE(ts_rank_cd( '{0.1, 0.2, 0.7, 1.0}', v.fts,
> 'dexter:A|season:A|seri:A|декстер:A|качество:A|сезон:A|серия:A'::tsquery),
> 1) DESC, v.views DESC
> LIMIT 6
>
> Limit  (cost=219631.83..219631.85 rows=6 width=287) (actual
> time=1850.567..1850.570 rows=6 loops=1)
>   ->  Sort  (cost=219631.83..220059.05 rows=170886 width=287) (actual
> time=1850.565..1850.566 rows=6 loops=1)
>         Sort Key: (COALESCE(ts_rank_cd('{0.1,0.2,0.7,1}'::real[], fts, '( (
> ( ( ( ''dexter'':A | ''season'':A ) | ''seri'':A ) | ''декстер'':A ) |
> ''качество'':A ) | ''сезон'':A ) | ''серия'':A'::tsquery), 1::real)), views
>         Sort Method:  top-N heapsort  Memory: 26kB
>         ->  Bitmap Heap Scan on video v  (cost=41180.92..216568.73
> rows=170886 width=287) (actual time=214.842..1778.830 rows=103087 loops=1)
>               Recheck Cond: (fts @@ '( ( ( ( ( ''dexter'':A | ''season'':A )
> | ''seri'':A ) | ''декстер'':A ) | ''качество'':A ) | ''сезон'':A ) |
> ''серия'':A'::tsquery)
>               Filter: (active AND (id <> 500563))
>               ->  Bitmap Index Scan on idx_video_fts  (cost=0.00..41138.20
> rows=218543 width=0) (actual time=170.206..170.206 rows=171945 loops=1)
>                     Index Cond: (fts @@ '( ( ( ( ( ''dexter'':A |
> ''season'':A ) | ''seri'':A ) | ''декстер'':A ) | ''качество'':A ) |
> ''сезон'':A ) | ''серия'':A'::tsquery)
> Total runtime: 1850.632 ms
>
>
> Should I use this instead?

Can you also provide EXPLAIN ANALYZE output for the query with
enable_seqscan=on?

The row-count estimates look reasonably accurate, so there's some
other problem here.  What do you have random_page_cost, seq_page_cost,
and effective_cache_size set to?  You might try "SET
random_page_cost=2" or even "SET random_page_cost=0.5; SET
seq_page_cost=0.3" and see if those settings help.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query planner suggestion, for indexes with similar but not exact ordering.

2011-11-30 Thread Robert Haas
On Mon, Nov 14, 2011 at 5:22 PM, Andrew Barnham
 wrote:
> I wonder if it is possible and worthwhile to set up the query planner to
> recognize that, because of the stats I indicate above, a sort by partnum is
> almost exactly the same as a sort by partnum+franchise, and that doing an
> index scan on the partnum index and sorting the results in memory will be
> dramatically faster.  The sort buffer only needs to be very small; it will
> grow to at most 8 records in my example above.  The buffer will scan the
> partnum index and, as long as partnum stays the same, sort that small
> segment; as soon as partnum increments while walking the index, the buffer
> zeros out again for the next sort group.

This has come up before and seems worthwhile, but nobody's implemented it yet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Blocking excessively in FOR UPDATE

2011-11-04 Thread Robert Haas
On Fri, Nov 4, 2011 at 2:45 PM, Claudio Freire  wrote:
> I don't think 1 second can be such a big difference for the bgwriter,
> but I might be wrong.

Well, the default value is 200 ms.   And I've never before heard of
anyone tuning it up, except maybe to save on power consumption on a
system with very low utilization.  Nearly always you want to reduce
it.

> The wal_writer makes me doubt, though. If logged activity was higher
> than 8MB/s, then that setting would block it all.
> I guess I really should lower it.

Here again, you've set it to ten times the default value.  That
doesn't seem like a good idea.  I would start with the default and
tune down.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Strange query plan

2011-11-04 Thread Robert Haas
On Fri, Nov 4, 2011 at 12:14 PM, Sorbara, Giorgio (CIOK)
 wrote:
>> How fast do you expect this to run?  It's aggregating 125 million
>> rows, so that's going to take some time no matter how you slice it.
>> Unless I'm misreading this, it's actually taking only about 4
>> microseconds per row, which does not obviously suck.
>
> Well, the problem is not how long it takes to process one row, but rather
> whether I get the best query plan. I don't mean the planner is wrong, but I
> was expecting a feature that is not there (yet).
> We don't have pure index scans. Fair enough, so I have approached the
> problem in a different way: getting rid of the degenerate dimensions and
> exploiting a "useless" dimension table.
> It's a workaround, but it actually seems to work :) Now I have a ~350
> million row fact table and no partitions, and I am happy to get the data I
> want in 1 second or less.

Am I misreading the EXPLAIN ANALYZE output?  I'm reading that to say
that there were 125 million rows in the table that matched your filter
condition.  If that's correct, I don't think index-only scans (which
will be in 9.2) are going to help you much - it might be faster, but
it's definitely not going to be anything like instantaneous.

On the flip side, if I *am* misreading the output and the number of
rows needed to compute the aggregate is actually some very small
number, then you ought to be getting an index scan even in older
versions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Blocking excessively in FOR UPDATE

2011-11-04 Thread Robert Haas
On Fri, Nov 4, 2011 at 12:07 PM, Claudio Freire  wrote:
> What are those writes about? HOT vacuuming perhaps?

Every tuple lock requires dirtying the page.  Those writes are all
those dirty pages getting flushed out to disk.  It's possible that the
OS is allowing the writes to happen asynchronously for a while, but
then when you get too much dirty data in the cache, it starts
blocking.

The only thing I'm fuzzy about is why it's locking so many rows, given
that the output says rows=1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Strange query plan

2011-11-04 Thread Robert Haas
On Mon, Oct 31, 2011 at 9:52 AM, Sorbara, Giorgio (CIOK)
 wrote:
> Group  (cost=0.00..4674965.80 rows=200 width=17) (actual 
> time=13.375..550943.592 rows=1 loops=1)
>   ->  Append  (cost=0.00..4360975.94 rows=125595945 width=17) (actual 
> time=13.373..524324.817 rows=125595932 loops=1)
>         ->  Index Scan using f_suipy_pkey on f_suipy  (cost=0.00..5.64 rows=1 
> width=58) (actual time=0.019..0.019 rows=0 loops=1)
>               Index Cond: ((fk_theme)::text = 'main_py_six_scxc'::text)
>         ->  Seq Scan on f_suipy_main_py_six_scxc f_suipy  
> (cost=0.00..4360970.30 rows=125595944 width=17) (actual 
> time=13.352..495259.117 rows=125595932 loops=1)
>               Filter: ((fk_theme)::text = 'main_py_six_scxc'::text)
>  Total runtime: 550943.699 ms

How fast do you expect this to run?  It's aggregating 125 million
rows, so that's going to take some time no matter how you slice it.
Unless I'm misreading this, it's actually taking only about 4
microseconds per row, which does not obviously suck.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] function slower than the same code in an sql file

2011-11-03 Thread Robert Haas
On Thu, Nov 3, 2011 at 11:31 AM, Rodrigo Gonzalez
 wrote:
> On 03/11/11 11:42, Robert Haas wrote:
>
> On Fri, Oct 28, 2011 at 9:39 AM, CS DBA  wrote:
>
> No parameters,  one of them looks like this:
>
> [ code snippet ]
>
> It's hard to believe this is the real code, because SELECT without
> INTO will bomb out inside a PL/pgsql function, won't it?
>
> But he's using CREATE TABLE xyz_view_m AS
>
> So it seems correct to me

Oh, right, I missed that.

That seems pretty mysterious then.  But is it possible the function is
getting called more times than it should?  I notice that it's set up
as a trigger; is it FOR EACH ROW when it should be a statement-level
trigger or something like that?  Maybe run EXPLAIN ANALYZE on the
query that's invoking the trigger to get some more detail on what's
going on?
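
(For reference, and with made-up table and function names rather than ones
from this thread, a statement-level version would be declared roughly like
this, so it fires once per statement instead of once per affected row:

CREATE TRIGGER refresh_xyz_view_m
    AFTER INSERT OR UPDATE OR DELETE ON some_table
    FOR EACH STATEMENT
    EXECUTE PROCEDURE rebuild_xyz_view_m();
)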

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query running a lot faster with enable_nestloop=false

2011-11-03 Thread Robert Haas
On Tue, Oct 25, 2011 at 5:09 AM, Mohanaraj Gopala Krishnan
 wrote:
> I have a query that runs a lot slower (~5 minutes) with the default
> enable_nestloop=true than with enable_nestloop=false (~10 secs).
> The actual query is available here http://pastie.org/2754424 . It is a
> reporting query with many joins as the database is mainly used for
> transaction processing.
>
> Explain analyse result for both cases:
>
> Machine A nestloop=true - http://explain.depesz.com/s/nkj0 (~5 minutes)
> Machine A nestloop=false - http://explain.depesz.com/s/wBM (~10 secs)

A good start might be to refactor this:

Seq Scan on retailer_categories retailer_category_leaf_nodes
(cost=0.00..18.52 rows=1 width=16) (actual time=0.016..0.392 rows=194
loops=1)
Filter: ((tree_right - tree_left) = 1)

And this:

Seq Scan on product_categories product_category_leaf_nodes
(cost=0.00..148.22 rows=2 width=32) (actual time=0.031..1.109 rows=383
loops=1)
Filter: ((tree_right - tree_left) = 1)

The query planner really has no idea what selectivity to assign to
that condition, and so it's basically guessing, and it's way off.  You
could probably improve the estimate a lot by adding a column that
stores the values of tree_right - tree_left and is updated manually or
by triggers as you insert and update data.  Then you could just check
tree_left_right_difference = 1, which should get a much more accurate
estimate, and hopefully therefore a better plan.
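
A rough sketch of that idea (the column, function, and trigger names below
are invented for illustration, not taken from your schema):

ALTER TABLE product_categories ADD COLUMN tree_width integer;
UPDATE product_categories SET tree_width = tree_right - tree_left;

CREATE OR REPLACE FUNCTION product_categories_set_tree_width()
RETURNS trigger AS $$
BEGIN
    NEW.tree_width := NEW.tree_right - NEW.tree_left;  -- keep the derived column in sync
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER product_categories_tree_width
    BEFORE INSERT OR UPDATE ON product_categories
    FOR EACH ROW EXECUTE PROCEDURE product_categories_set_tree_width();

ANALYZE product_categories;
-- then filter on tree_width = 1 instead of (tree_right - tree_left) = 1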

You've also got a fairly large estimation error here:

Index Scan using invoices_create_date_idx on invoices (cost=0.00..8.28
rows=1 width=4) (actual time=0.055..0.305 rows=109 loops=1)
Index Cond: ((create_date >= '2011-09-15'::date) AND (create_date
<= '2011-09-15'::date))
Filter: (status = 128)

Apparently, status 128 is considerably more common among rows in that
date range than it is overall.  Unfortunately, it's not so easy to fix
this kind of estimation error, unless you can refactor your schema to
avoid needing to filter on both create_date and status at the same
time.

It might be worth using temporary tables here - factor out sections of
the query that are referenced multiple times, like the join between
sales_order_items and invoices, and create a temporary table.  ANALYZE
it, and then use it to run the main query.
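
Something along these lines, for instance (the join condition here is a
guess, since the full schema isn't shown):

CREATE TEMP TABLE recent_invoice_items AS
SELECT soi.*, i.create_date
  FROM sales_order_items soi
  JOIN invoices i ON i.id = soi.invoice_id
 WHERE i.create_date BETWEEN '2011-09-15' AND '2011-09-15'
   AND i.status = 128;

ANALYZE recent_invoice_items;
-- then reference recent_invoice_items from the main reporting query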

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] function slower than the same code in an sql file

2011-11-03 Thread Robert Haas
On Fri, Oct 28, 2011 at 9:39 AM, CS DBA  wrote:
> No parameters,  one of them looks like this:
>
> [ code snippet ]

It's hard to believe this is the real code, because SELECT without
INTO will bomb out inside a PL/pgsql function, won't it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Guide to PG's capabilities for inlining, predicate hoisting, flattening, etc?

2011-11-02 Thread Robert Haas
On Wed, Nov 2, 2011 at 10:38 AM, Tom Lane  wrote:
> Jay Levitt  writes:
>> So you can see where I'm going.  I know if I break everything into
>> elegant, composable functions, it'll continue to perform poorly.  If I
>> write one big hairy, it'll perform great but it will be difficult to
>> maintain, and it will be inelegant and a kitten will die.  My tools
>> are CTEs, subqueries, aliases, SQL functions, PL/pgSQL functions, and
>> views (and other tools?)  What optimizations do each of those prevent?
>
> plpgsql functions are black boxes to the optimizer.  If you can express
> your functions as single SQL commands, using SQL-language functions is
> usually a better bet than plpgsql.
>
> CTEs are also treated as optimization fences; this is not so much an
> optimizer limitation as to keep the semantics sane when the CTE contains
> a writable query.

I wonder if we need to rethink, though.  We've gotten a number of
reports of problems that were caused by single-use CTEs not being
equivalent - in terms of performance - to a non-CTE formulation of the
same idea.  It seems necessary for CTEs to behave this way when the
subquery modifies data, and there are certainly situations where it
could be desirable otherwise, but I'm starting to think that we
shouldn't do it that way by default.  Perhaps we could let people say
something like WITH x AS FENCE (...) when they want the fencing
behavior, and otherwise assume they don't (but give it to them anyway
if there's a data-modifying operation in there).

Whenever I give a talk on the query optimizer, I'm constantly telling
people to take logic out of functions and inline it, avoid CTEs, and
generally merge everything into one big query.  But as the OP says,
that is decidedly less than ideal from a code-beauty-and-maintenance
point of view: people WANT to be able to use syntactic sugar and still
get good performance.  Allowing for the insertion of optimization
fences is good and important but it needs to be user-controllable
behavior.
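
To make the current behavior concrete, here's a contrived example (table and
column names invented).  On today's releases the first form can't push the
outer filter down into the CTE, while the second form can:

WITH recent AS (
    SELECT * FROM orders WHERE created_at > now() - interval '1 day'
)
SELECT count(*) FROM recent WHERE customer_id = 42;   -- fence: filter applied afterwards

SELECT count(*)
  FROM (SELECT * FROM orders
         WHERE created_at > now() - interval '1 day') recent
 WHERE customer_id = 42;                              -- plain subquery: filter can be pushed down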

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Composite keys

2011-10-31 Thread Robert Haas
On Mon, Oct 31, 2011 at 2:34 PM, Claudio Freire  wrote:
> On Mon, Oct 31, 2011 at 3:24 PM, Robert Haas  wrote:
>> Sure it does:
>>
>> rhaas=# create table baz (a bool, b int, c text, primary key (a, b));
>> NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
>> "baz_pkey" for table "baz"
>> CREATE TABLE
>> rhaas=# insert into baz select true, g,
>> random()::text||random()::text||random()::text||random()::text from
>> generate_series(1,40) g;
>
> Ok, that's artificially skewed, since the index has only one value in
> the first column.
>
> But it does prove PG considers the case, and takes into account the
> number of values it has to iterate over on the first column, which is
> very very interesting and cool.

Yes.  As your experience indicates, it's rare for this to be the best
plan.  But it is considered.  So there you have it.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Composite keys

2011-10-31 Thread Robert Haas
On Mon, Oct 31, 2011 at 1:52 PM, Claudio Freire  wrote:
> On Mon, Oct 31, 2011 at 2:08 PM, Robert Haas  wrote:
>>> Multicolumn indices on (c1, c2, ..., cn) can only be used on where
>>> clauses involving c1..ck with k<=n.
>> I don't think that's true.  I believe it can be used for a query that
>> only touches, say, c2.  It's just extremely inefficient.
>
> Does postgres generate those kinds of plans?
> I do not think so. I've never seen it happening.

Sure it does:

rhaas=# create table baz (a bool, b int, c text, primary key (a, b));
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"baz_pkey" for table "baz"
CREATE TABLE
rhaas=# insert into baz select true, g,
random()::text||random()::text||random()::text||random()::text from
generate_series(1,40) g;
INSERT 0 40
rhaas=# analyze baz;
ANALYZE
rhaas=# explain analyze select * from baz where b = 1;
QUERY PLAN
---
 Index Scan using baz_pkey on baz  (cost=0.00..7400.30 rows=1
width=74) (actual time=0.104..20.691 rows=1 loops=1)
   Index Cond: (b = 1)
 Total runtime: 20.742 ms
(3 rows)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] does update of column with no relation imply a relation check of other column?

2011-10-31 Thread Robert Haas
On Wed, Oct 19, 2011 at 12:42 PM, Greg Jaskiewicz  wrote:
> For example:
> Table A
> -id (PK)
> -name
>
> Table B
> -table_a_id (PK, FK)
> -address
>
> When I do an insert on table B, the database checks whether the value for
> column “table_a_id” exists in table A.
> But if I do an update of column “address” of table B, does the database
> check again?
>
> My question is due to the nature of an update in Postgres, which is
> basically an insert of a new row version.
>
> In short - I believe it does. No reason for it not to.

I just tested this, and it seems not.

rhaas=# create table a (id serial primary key);
NOTICE:  CREATE TABLE will create implicit sequence "a_id_seq" for
serial column "a.id"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"a_pkey" for table "a"
CREATE TABLE
rhaas=# create table b (table_a_id integer primary key references a
(id), address text);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"b_pkey" for table "b"
CREATE TABLE
rhaas=# insert into a DEFAULT VALUES ;
INSERT 0 1
rhaas=# insert into b values (1);
INSERT 0 1

Then, in another session:

rhaas=# begin;
BEGIN
rhaas=# lock a;
LOCK TABLE

Back to the first session:

rhaas=# update b set address = 'cow';
UPDATE 1
rhaas=# select * from b;
 table_a_id | address
+-
  1 | cow
(1 row)

rhaas=# update b set table_a_id = table_a_id + 1;


So it seems that, when the fk field was unchanged, nothing was done
that required accessing table a; otherwise, the access exclusive lock
held by the other session would have blocked it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Composite keys

2011-10-31 Thread Robert Haas
On Tue, Oct 11, 2011 at 8:52 PM, Claudio Freire  wrote:
> On Tue, Oct 11, 2011 at 5:16 PM, Carlo Stonebanks
>  wrote:
>> Question 2) Regardless of the answer to Question 1 - if another_id is not
>> guaranteed to be unique, whereas pkey_id is – there any value to changing
>> the order of declaration (more generally, is there a performance impact for
>> column ordering in btree composite keys?)
>
> Multicolumn indices on (c1, c2, ..., cn) can only be used on where
> clauses involving c1..ck with k<=n.

I don't think that's true.  I believe it can be used for a query that
only touches, say, c2.  It's just extremely inefficient.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] should i expected performance degradation over time

2011-10-28 Thread Robert Haas
On Tue, Oct 11, 2011 at 2:20 PM, Anibal David Acosta  wrote:
> Do you know if check_postgresql.pl can run on windows (with perl installed)?
>
> Because our postgres installation is running on a Windows 2008 R2 server but
> can't find any tool like this for windows :(
>
> Thanks!

It's written in Perl, so I would think you could get it to work.  But
if not, you can always extract the big ol' query that it runs from the
script and run it some other way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance problem with a table with 38928077 record

2011-10-27 Thread Robert Haas
What is a bit strange about this is that you can do this:

On Fri, Oct 7, 2011 at 6:04 AM, Giovanni Mancuso  wrote:

>  select count(*) from dm_object_perm;
>   count
> --
>  38928077
> (1 row)
>

But not this:

> If I run "explain analyze select * from dm_object_perm;" it goes on for many
> hours.
>

If I had to guess, I'd bet that the second one is trying to spool the
resultset in memory someplace and that's driving the machine into swap.  But
that's just a shot in the dark.   You might want to use tools like top,
vmstat, iostat, free, etc. to see what the system is actually doing while
this is running.  I'd start the query up, let it run for 10 minutes or so,
and then see whether the machine is CPU-bound or I/O-bound, and whether the
amount of swap in use is growing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [PERFORM] Shortcutting too-large offsets?

2011-10-27 Thread Robert Haas
On Fri, Sep 30, 2011 at 2:56 PM, Tom Lane  wrote:
> Josh Berkus  writes:
>>> In principle, yeah, we could make it do that, but it seems like a likely
>>> source of maintenance headaches.  This example is not exactly compelling
>>> enough to make me want to do it.  Large OFFSETs are always going to be
>>> problematic from a performance standpoint, and the fact that we could
>>> short-circuit this one corner case isn't really going to make them much
>>> more usable.
>
>> It's not that uncommon of a corner case though; it's one which happens
>> all the time in webapps which paginate.  People just have to ask for a
>> page after the end.  It's really a question of how simple the code to
>> make the optimization would be; if it would be a 5-line patch, then it's
>> worth it; if it would be a 110-line refactor, no.
>
> No, it's really a question of whether it's worth any lines at all,
> and I remain unconvinced.  If someone asks for the last page, or any
> page near the end, it'll take just about the same amount of time as
> asking for a page past the end.  So any app like this is going to have
> sucky performance, and kluging the corner case isn't going to help much.

It looks to me like it took 22.3 seconds to do the nested loop and
then 22.4 seconds to do the nested loop plus the sort.  So the sort
itself only took 100 ms, which is hardly worth getting excited about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] : PG9.0 - Checkpoint tuning and pg_stat_bgwriter

2011-10-24 Thread Robert Haas
On Oct 24, 2011, at 8:16 AM, Venkat Balaji  wrote:
> Thanks Greg !
> 
> Sorry for delayed response.
> 
> We are actually waiting to change the checkpoint_segments in our production 
> systems (waiting for the downtime).

That setting can be changed without downtime.

...Robert
-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Tsearch2 - bad performance with concatenated ts-vectors

2011-10-24 Thread Robert Haas
On Tue, Aug 2, 2011 at 2:22 AM, Jan Wielgus  wrote:
> So, there is apparently a problem with vector concatenation - the indexes
> don't get used then. I tried to use the vectors separately and to make an
> 'OR' comparison between single vector @@ ts_query checks,
> but it didn't help very much (performance was better, but still over 20 sec):
> ...
> (participant.participant_tsv @@ 
> to_tsquery('simple',to_tsquerystring('Abigail'))) OR (person.person_tsv @@ 
> to_tsquery('simple',to_tsquerystring('Abigail')))
> ...
>
> Is there a way to make this work with better performance? Or is it necessary
> to create a single vector that contains data from multiple tables and then
> add an index on it? That would be problematic for us,
> because we are using multiple complex queries with a variable number of
> selected columns. I know that another solution might be a union among
> multiple queries, each of which uses a single vector,
> but that solution is inconvenient too.

Only something of the form 'indexed-column indexable-operator value'
is going to be indexable.  So when you concatenate two columns from
different tables - as you say - not indexable.

In general, OR-based conditions that cross table boundaries tend to be
expensive, because they have to be applied only after performing the
join.  You can't know for sure looking only at a row from one table
whether or not it will be needed, so you have to join them all and
then filter the results.
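
One way people often work around that - essentially the UNION you mentioned -
is to let each branch use its own index and combine the results afterwards.
Very roughly (the id and participant_id column names are assumptions, not
taken from your schema):

SELECT p.id
  FROM participant p
 WHERE p.participant_tsv @@ to_tsquery('simple', 'Abigail')
UNION
SELECT pr.participant_id
  FROM person pr
 WHERE pr.person_tsv @@ to_tsquery('simple', 'Abigail');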

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] issue related to logging facility of postgres

2011-08-31 Thread Robert Haas
On Wed, Jul 27, 2011 at 5:11 AM, shailesh singh  wrote:
> I want to configure logging in Postgres in such a way that messages of
> different severity are logged in different log files, e.g. all ERROR
> messages should be written to an error-msg.log file while all NOTICE
> messages should be written to a notice-msg.log file.
>
> In order to do that, what changes do I need to make in the configuration
> file? Could you please suggest a solution.

There's no such facility built-in.  You might want to do something
like "log everything in CSV format, and then run a  Perl script over
it afterwards to split it up".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Re: GROUP BY with reasonable timings in PLAN but unreasonable execution time

2011-08-03 Thread Robert Haas
On Fri, Jul 8, 2011 at 9:33 PM, Clem Dickey  wrote:
> a. The Join cost estimators could have been given more information
>
> The functions which estimate JOIN selectivity (e.g. the chance that tuples
> will match in an equijoin, for instance) use data produced by ANALYZE. But
> the SELECT .. GROUP BY does not propagate ANALYZE data from the columns of
> its input relation to its output relation. That is too bad, because the
> column value statistics (number of unique values) would have improved
> selectivity estimates for all three join plans (merge join, nested loop, and
> hash join).

Yeah, I've had this same thought.  In fact, I think that it would
probably be an improvement to pass through not just the number of
unique values but the MCVs and frequencies of the non-GROUP-BY
columns.  Of course, for the grouping columns, we ought to let
n_distinct = -1 pop out.  Granted, the GROUP BY might totally change
the data distribution, so relying on the input column statistics to be
meaningful could be totally wrong, but on average it seems more likely
to give a useful answer than a blind stab in the dark.  I haven't
gotten around to doing anything about this, but it seems like a good
idea.

> b. the Merge Join cost estimator did a poor job with the data it was given:
>
> In function eqjoinsel_inner there are two cases (1) ANALYZE data is
> available for both sides of the join and (2) ANALYZE data is missing for one
> or both sides. Due to the GROUP BY processing described above, ANALYZE data
> was available for "t" but not for "SELECT * FROM t GROUP BY ...".
>
> The logic in that case is "use the column with the most distinct values" to
> estimate selectivity. The default number of distinct values for a column
> with no data (DEFAULT_NUM_DISTINCT) is 200. In my join the number of values
> was:
>
> col  in GROUP BY   in table t
> j      200            1
> k      200            1
> x      200           10
> y      200         1000
> z      200           30
>
> In 4 of the 5 columns the default value had more distinct values, and the
> combined selectivity (chance that two arbitrary rows would have a join
> match) was (1/200)^4 * 1/1000. Very small. The error is, IMO, that the code
> does not distinguish known numbers from default numbers. A comment in the
> code acknowledges this:
>
> "XXX Can we be smarter if we have an MCV list for just one side?"
>
> But it concludes
>
> "It seems that if we assume equal distribution for the other side, we end up
> with the same answer anyway."
>
> I don't think that is the case. Preferring a known value, where one exists,
> would provide a better estimate of the actual range of the data. Indeed, the
> var_eq_non_const in the same file (used by the nested loop join estimator)
> does essentially that.

I'm not sure I understand what you're getting at here, unless the idea
is to make get_variable_numdistinct() somehow indicate to the caller
whether it had to punt.  That might be worth doing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] very large record sizes and ressource usage

2011-07-28 Thread Robert Haas
On Thu, Jul 7, 2011 at 10:33 AM,   wrote:
> Are there any guidelines for sizing work_mem, shared_buffers, and other
> configuration parameters with regard to very large records?  I
> have a table that has a bytea column and I am told that some of these
> columns contain over 400MB of data.  I am having a problem on several
> servers reading, and more specifically dumping, these records (table)
> using pg_dump.

work_mem shouldn't make any difference to how well that performs;
shared_buffers might, but there's no special advice for tuning it for
large records vs. anything else.  Large records just get broken up
into small records, under the hood.  At any rate, your email is a
little vague about exactly what the problem is.  If you provide some
more detail you might get more help.
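
If it helps, you can see how much of a table's bulk lives in that broken-up,
out-of-line storage (plus indexes) with something like the following, where
"my_table" is just a placeholder:

SELECT pg_size_pretty(pg_relation_size('my_table'))       AS main_heap,
       pg_size_pretty(pg_total_relation_size('my_table')) AS total_with_toast_and_indexes;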

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] hstore - Implementation and performance issues around its operators

2011-07-22 Thread Robert Haas
On Tue, Jul 19, 2011 at 5:06 PM, Stefan Keller  wrote:
> 2011/7/19 Robert Haas :
>> Putting the elements in order wouldn't really help, would it?  I mean,
>> you'd need some kind of an index inside the hstore... which there
>> isn't.
>
> Sorry for my imprecise question. In fact, elements of an hstore are
> stored in order of (keylength,key) with the key comparison done
> bytewise (not locale-dependent). See e.g. function hstoreUniquePairs
> in http://doxygen.postgresql.org/ . This ordered property is being
> used by some hstore functions but not all - and I'm still wondering
> why.

Not sure, honestly.  Is there some place where it would permit an
optimization we're not currently doing?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Large rows number, and large objects

2011-07-20 Thread Robert Haas
On Wed, Jul 20, 2011 at 11:57 AM, Jose Ildefonso Camargo Tolosa
 wrote:
> On Tue, Jul 19, 2011 at 3:57 PM, Robert Haas  wrote:
>> On Sun, Jun 19, 2011 at 10:19 PM, Jose Ildefonso Camargo Tolosa
>>  wrote:
>> > So, the question is, if I were to store 8TB worth of data into large
>> > objects system, it would actually make the pg_largeobject table slow,
>> > unless it was automatically partitioned.
>>
>> I think it's a bit of an oversimplification to say that large,
>> unpartitioned tables are automatically going to be slow.  Suppose you
>> had 100 tables that were each 80GB instead of one table that is 8TB.
>> The index lookups would be a bit faster on the smaller tables, but it
>> would take you some non-zero amount of time to figure out which index
>> to read in the first place.  It's not clear that you are really
>> gaining all that much.
>
> Certainly, but it is still very blurry to me *when* it is better to
> partition than not.

I think that figuring that out is as much an art as it is a science.
It's better to partition when most of your queries are going to touch
only a single partition; when you are likely to want to remove
partitions in their entirety; when VACUUM starts to have trouble
keeping up... but the reality is that in some cases you probably have
to try it both ways and see which one works better.

>> Many of the advantages of partitioning have to do with maintenance
>> tasks.  For example, if you gather data on a daily basis, it's faster
>> to drop the partition that contains Thursday's data than it is to do a
>> DELETE that finds the rows and deletes them one at a time.  And VACUUM
>> can be a problem on very large tables as well, because only one VACUUM
>> can run on a table at any given time.  If the frequency with which the
>> table needs to be vacuumed is less than the time it takes for VACUUM
>> to complete, then you've got a problem.
>
> And the pg_largeobject table doesn't get vacuumed? I mean, isn't that table
> just like any other table?

Yes, it is.  So, I agree: putting 8TB of data in there is probably
going to hurt.

>> But I think that if we want to optimize pg_largeobject, we'd probably
>> gain a lot more by switching to a different storage format than we
>> could ever gain by partitioning the table.  For example, we might
>> decide that any object larger than 16MB should be stored in its own
>> file.  Even somewhat smaller objects would likely benefit from being
>> stored in larger chunks - say, a bunch of 64kB chunks, with any
>> overage stored in the 2kB chunks we use now.  While this might be an
>> interesting project, it's probably not going to be anyone's top
>> priority, because it would be a lot of work for the amount of benefit
>> you'd get.  There's an easy workaround: store the files in the
>> filesystem, and a path to those files in the database.
>
> OK, one reason for storing a file *in* the DB is to be able to do PITR of a
> wrongly deleted file (or an overwritten one, and that kind of stuff); at the
> filesystem level you would need a versioning filesystem (and I don't yet
> know of any that is stable in the Linux world).
>
> Also, if you use streaming replication, then at the same time you stream
> your data your files are also streamed to a secondary server (yes, at the
> FS level you could use drbd or similar).

Well, those are good arguments for putting the functionality in the
database and making it all play nicely with write-ahead logging.  But
nobody's felt motivated to write the code yet, so...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] bad plan: 8.4.8, hashagg, work_mem=1MB.

2011-07-19 Thread Robert Haas
On Mon, Jun 20, 2011 at 3:31 PM, Jon Nelson  wrote:
> On Mon, Jun 20, 2011 at 11:08 AM, Tom Lane  wrote:
>> Jon Nelson  writes:
>>> I ran a query recently where the result was very large. The outer-most
>>> part of the query looked like this:
>>
>>>  HashAggregate  (cost=56886512.96..56886514.96 rows=200 width=30)
>>>    ->  Result  (cost=0.00..50842760.97 rows=2417500797 width=30)
>>
>>> The row count for 'Result' is in the right ballpark, but why does
>>> HashAggregate think that it can turn 2 *billion* rows of strings (an
>>> average of 30 bytes long) into only 200?
>>
>> 200 is the default assumption about number of groups when it's unable to
>> make any statistics-based estimate.  You haven't shown us any details so
>> it's hard to say more than that.
>
> What sorts of details would you like? The row count for the Result
> line is approximately correct -- the stats for all tables are up to
> date (the tables never change after import).  The statistics target is
> currently set to 100.

The query and the full EXPLAIN output (attached as text files) would
be a good place to start.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Large rows number, and large objects

2011-07-19 Thread Robert Haas
On Sun, Jun 19, 2011 at 10:19 PM, Jose Ildefonso Camargo Tolosa
 wrote:
> So, the question is, if I were to store 8TB worth of data into large
> objects system, it would actually make the pg_largeobject table slow,
> unless it was automatically partitioned.

I think it's a bit of an oversimplification to say that large,
unpartitioned tables are automatically going to be slow.  Suppose you
had 100 tables that were each 80GB instead of one table that is 8TB.
The index lookups would be a bit faster on the smaller tables, but it
would take you some non-zero amount of time to figure out which index
to read in the first place.  It's not clear that you are really
gaining all that much.

Many of the advantages of partitioning have to do with maintenance
tasks.  For example, if you gather data on a daily basis, it's faster
to drop the partition that contains Thursday's data than it is to do a
DELETE that finds the rows and deletes them one at a time.  And VACUUM
can be a problem on very large tables as well, because only one VACUUM
can run on a table at any given time.  If the frequency with which the
table needs to be vacuumed is less than the time it takes for VACUUM
to complete, then you've got a problem.
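
For example, with one child table per day (table names here are invented),
the two approaches look like this:

DROP TABLE measurements_2011_07_14;                           -- drops the whole day at once
-- versus
DELETE FROM measurements WHERE logged_on = DATE '2011-07-14'; -- row by row, leaving dead tuples

The DROP is effectively instantaneous and gives the space back immediately,
while the DELETE has to visit every matching row and still needs a VACUUM
afterwards.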

But I think that if we want to optimize pg_largeobject, we'd probably
gain a lot more by switching to a different storage format than we
could ever gain by partitioning the table.  For example, we might
decide that any object larger than 16MB should be stored in its own
file.  Even somewhat smaller objects would likely benefit from being
stored in larger chunks - say, a bunch of 64kB chunks, with any
overage stored in the 2kB chunks we use now.  While this might be an
interesting project, it's probably not going to be anyone's top
priority, because it would be a lot of work for the amount of benefit
you'd get.  There's an easy workaround: store the files in the
filesystem, and a path to those files in the database.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] hstore - Implementation and performance issues around its operators

2011-07-19 Thread Robert Haas
On Sun, Jun 19, 2011 at 2:59 PM, Stefan Keller  wrote:
> 1. Obviously the '@>' has to be used in order to let use the GiST index.
> Why is the '->' operator not supported by GiST ('->' is actually
> mentioned in all examples of the doc.)?

Because it's not a comparison operator.

> 2. Currently the hstore elements are stored in order as they are
> coming from the insert statement / constructor.
> Why are the elements not ordered i.e. why is the hstore not cached in
> all hstore functions (like hstore_fetchval etc.)?

Putting the elements in order wouldn't really help, would it?  I mean,
you'd need some kind of an index inside the hstore... which there
isn't.

> 3. In the source code 'hstore_io.c' one finds the following enigmatic
> note: "... very large hstore values can't be output. this could be
> fixed, but many other data types probably have the same issue."
> What is the max. length of a hstore (i.e. the max. length of the sum
> of all elements in text representation)?

I think that anything of half a gigabyte or more is at risk of falling
down there.  But probably it's not smart to use such big hstores
anyway.

> 4. Last, I don't fully understand the following note in the hstore
> doc. (http://www.postgresql.org/docs/current/interactive/hstore.html
> ):
>> Notice that the old names are reversed from the convention
>> formerly followed by the core geometric data types!
>
> Why names? Why not rather 'operators' or 'functions'?

It's referring to the operator names.

> What does this "reversed from the convention" mean concretely?

That comment could be a little more clear, but I think what it's
saying is that hstore's old @ is like the core geometric types' old ~,
and vice versa.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Poor performance when joining against inherited tables

2011-06-30 Thread Robert Haas
On Wed, May 11, 2011 at 4:47 PM, Lucas Madar  wrote:
> On 05/11/2011 09:38 AM, Robert Haas wrote:
>>>
>>> However, if I disable seqscan (set enable_seqscan=false), I get the
>>> following plan:
>>>
>>>  QUERY PLAN
>>> 
>>>  Hash Join  (cost=10001298843.53..290002337961.71 rows=8643757 width=506)
>>>   Hash Cond: (f.id = objects.id)
>>>   ->    Append  (cost=100.00..29536334.43 rows=8643757
>>> width=20)
>>>         ->    Seq Scan on item f  (cost=100.00..126.30
>>> rows=1630 width=20)
>>>         ->    Index Scan using xxx_pkey on itemXX f  (cost=0.00..10.60
>>> rows=90
>>> width=20)
>>>         ->    Index Scan using yyy_pkey on itemYY f  (cost=0.00..25.24
>>> rows=266 width=20)
>>>         ->    Index Scan using zzz_pkey on itemZZ f  (cost=0.00..9.28
>>> rows=2
>>> width=20)
>>>         ...
>>>   ->    Hash  (cost=999347.17..999347.17 rows=3941949 width=490)
>>>         ->    Index Scan using objects_pkey on objects
>>> (cost=0.00..999347.17
>>> rows=3941949 width=490)
>>>
>>> This seems like a much more sensible query plan.
>>
>> I don't think so.  Scanning the index to extract all the rows in a
>> table is typically going to be a lot slower than a sequential scan.
>>
>
> Compared to the previous query plan (omitted in this e-mail, in which the
> planner was scanning all the item tables sequentially), the second query is
> much more desirable. It takes about 12 seconds to complete, versus the other
> query which I canceled after six hours. However, what you propose seems to
> make even more sense.

I was just looking at this email again, and had another thought:
perhaps the tables in question are badly bloated.  In your situation,
it seems that the plan didn't change much when you set
enable_seqscan=off: it just replaced full-table seq-scans with
full-table index-scans, which should be slower.  But if you have a
giant table that's mostly empty space, then following the index
pointers to the limited number of blocks that contain any useful data
might be faster than scanning all the empty space.  If you still have
these tables around somewhere, you could test this hypothesis by
running CLUSTER on all the tables and see whether the seq-scan gets
faster.
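
For instance, using the child table and index names that appear in the plan
above (adjust and repeat for each inherited table):

CLUSTER itemxx USING xxx_pkey;   -- rewrites the table in index order, discarding dead space
ANALYZE itemxx;                  -- refresh statistics after the rewrite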

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] change sample size for statistics

2011-06-28 Thread Robert Haas
On Mon, Jun 13, 2011 at 6:33 PM, Willy-Bas Loos  wrote:
> On Fri, Jun 10, 2011 at 9:58 PM, Josh Berkus  wrote:
>>
>> It's not 10%.  We use a fixed sample size, which is configurable on the
>> system, table, or column basis.
>
> It seems that you are referring to "alter column set statistics" and
> "default_statistics_target", which are the number of percentiles in the
> histogram  (and MCV's) .
> I mean the number of records that are scanned by analyze to come to the
> statistics for the planner, especially n_disctict.

In 9.0+ you can do ALTER TABLE .. ALTER COLUMN .. SET (n_distinct = ...);
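
For example (table and column names are placeholders):

ALTER TABLE my_table ALTER COLUMN my_col SET (n_distinct = 500000);
ANALYZE my_table;   -- the override takes effect at the next ANALYZE

A negative value, e.g. n_distinct = -0.1, means "10% of the row count" and
so scales as the table grows.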

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] poor performance when recreating constraints on large tables

2011-06-08 Thread Robert Haas
On Mon, Jun 6, 2011 at 6:10 PM, Mike Broers  wrote:
> Thanks for the suggestion, maintenance_work_mem is set to the default of
> 16MB on the host that was taking over an hour as well as on the host that
> was taking less than 10 minutes.  I tried setting it to 1GB on the faster
> test server and it reduced the time from around 6-7 minutes to about 3:30.
>  this is a good start, if there are any other suggestions please let me know
> - is there any query to check estimated time remaining on long running
> transactions?

Sadly, no.  I suspect that coming up with a good algorithm for that is
a suitable topic for a PhD thesis.  :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Understanding Hash Join performance

2011-06-02 Thread Robert Haas
On Thu, Jun 2, 2011 at 2:56 PM, Claudio Freire  wrote:
> On Thu, Jun 2, 2011 at 4:57 PM, Kevin Grittner
>  wrote:
>> And the
>> planner does take the size of work_mem and the expected data set
>> into consideration when estimating the cost of the hash join.
>
> And shouldn't it?
>
> In a gross mode, when hash joins go to disk, they perform very poorly.
> Maybe the planner should take that into account.

It does.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Understanding Hash Join performance

2011-06-02 Thread Robert Haas
On Wed, Jun 1, 2011 at 8:10 PM, Robert James  wrote:
> A query I have spends a long time on Hash Joins (and Hash Left Joins).
> I have a few questions:
>
> 1. When does Postgres decide to do a Hash Join, over another type of Join?
> 2. Do Hash Joins normally perform poorly?  What can I do to speed them up?
> 3. What can I do to enable Postgres to use a faster type of join?

IME, hash joins usually are much faster than any other type.  There's
not enough information in your email to speculate as to what might be
going wrong in your particular case, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hash Anti Join performance degradation

2011-05-31 Thread Robert Haas
On Thu, May 26, 2011 at 8:33 AM, panam  wrote:
> Any third party confirmation?

Yeah, it definitely looks like there is some kind of bug here.  Or if
not a bug, then a very surprising feature.  EXPLAIN ANALYZE outputs
from your proposed test attached.  Here's a unified diff of the two
outputs:


  QUERY PLAN
 
--
- Seq Scan on box b  (cost=0.00..3669095.76 rows=128 width=8) (actual
time=0.147..431517.693 rows=128 loops=1)
+ Seq Scan on box b  (cost=0.00..3669095.76 rows=128 width=8) (actual
time=0.047..6938.165 rows=128 loops=1)
SubPlan 1
- ->  Hash Anti Join  (cost=14742.77..28664.79 rows=19239 width=8)
(actual time=2960.176..3370.425 rows=1 loops=128)
+ ->  Hash Anti Join  (cost=14742.77..28664.79 rows=19239 width=8)
(actual time=48.385..53.361 rows=1 loops=128)
Hash Cond: (m1.box_id = m2.box_id)
Join Filter: (m1.id < m2.id)
-   ->  Bitmap Heap Scan on message m1  (cost=544.16..13696.88
rows=28858 width=16) (actual time=2.320..6.204 rows=18487 loops=128)
+   ->  Bitmap Heap Scan on message m1  (cost=544.16..13696.88
rows=28858 width=16) (actual time=1.928..5.502 rows=17875 loops=128)
  Recheck Cond: (box_id = b.id)
- ->  Bitmap Index Scan on "message_box_Idx"
(cost=0.00..536.94 rows=28858 width=0) (actual time=2.251..2.251
rows=18487 loops=128)
+ ->  Bitmap Index Scan on "message_box_Idx"
(cost=0.00..536.94 rows=28858 width=0) (actual time=1.797..1.797
rows=18487 loops=128)
Index Cond: (box_id = b.id)
-   ->  Hash  (cost=13696.88..13696.88 rows=28858 width=16)
(actual time=12.632..12.632 rows=19720 loops=120)
- Buckets: 4096  Batches: 4 (originally 2)  Memory Usage: 1787kB
- ->  Bitmap Heap Scan on message m2
(cost=544.16..13696.88 rows=28858 width=16) (actual time=1.668..6.619
rows=19720 loops=120)
+   ->  Hash  (cost=13696.88..13696.88 rows=28858 width=16)
(actual time=11.603..11.603 rows=20248 loops=113)
+ Buckets: 4096  Batches: 4 (originally 2)  Memory Usage: 1423kB
+ ->  Bitmap Heap Scan on message m2
(cost=544.16..13696.88 rows=28858 width=16) (actual time=1.838..6.886
rows=20248 loops=113)
Recheck Cond: (box_id = b.id)
-   ->  Bitmap Index Scan on "message_box_Idx"
(cost=0.00..536.94 rows=28858 width=0) (actual time=1.602..1.602
rows=19720 loops=120)
+   ->  Bitmap Index Scan on "message_box_Idx"
(cost=0.00..536.94 rows=28858 width=0) (actual time=1.743..1.743
rows=20903 loops=113)
  Index Cond: (box_id = b.id)
- Total runtime: 431520.186 ms
+ Total runtime: 6940.369 ms

That's pretty odd.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
QUERY 
PLAN
--
 Seq Scan on box b  (cost=0.00..3669095.76 rows=128 width=8) (actual 
time=0.147..431517.693 rows=128 loops=1)
   SubPlan 1
 ->  Hash Anti Join  (cost=14742.77..28664.79 rows=19239 width=8) (actual 
time=2960.176..3370.425 rows=1 loops=128)
   Hash Cond: (m1.box_id = m2.box_id)
   Join Filter: (m1.id < m2.id)
   ->  Bitmap Heap Scan on message m1  (cost=544.16..13696.88 
rows=28858 width=16) (actual time=2.320..6.204 rows=18487 loops=128)
 Recheck Cond: (box_id = b.id)
 ->  Bitmap Index Scan on "message_box_Idx"  (cost=0.00..536.94 
rows=28858 width=0) (actual time=2.251..2.251 rows=18487 loops=128)
   Index Cond: (box_id = b.id)
   ->  Hash  (cost=13696.88..13696.88 rows=28858 width=16) (actual 
time=12.632..12.632 rows=19720 loops=120)
 Buckets: 4096  Batches: 4 (originally 2)  Memory Usage: 1787kB
 ->  Bitmap Heap Scan on message m2  (cost=544.16..13696.88 
rows=28858 width=16) (actual time=1.668..6.619 rows=19720 loops=120)
   Recheck Cond: (box_id = b.id)
   ->  Bitmap Index Scan on "message_box_Idx"  
(cost=0.00..536.94 rows=28858 width=0) (actual time=1.602..1.602 rows=19720 
loops=120)
 Index Cond: (box_id = b.id)
 Total runtime: 431520.186 ms

[PERFORM] picking a filesystem

2011-05-31 Thread Robert Haas
On Wed, May 25, 2011 at 4:41 PM, Greg Smith  wrote:
> On 05/23/2011 06:16 PM, John Rouillard wrote:
>>
>> OS: centos 5.5
>> Filesystem: data - ext4 (note 4 not 3); 6.6T formatted
>>             wal  - ext4; 1.5T formatted
>> Raid: data - level 10, 8 disk wd2003; controller LSI MegaRAID SAS 9260-4i
>>       wal  - level 1,  2 disk wd2003; controller LSI MegaRAID SAS 9260-4i
>>
>> Could it be an ext4 issue? It seems that ext4 may still be at the
>> bleeding edge for postgres use.
>>
>
> I would not trust ext4 on CentOS 5.5 at all.  ext4 support in 5.5 is labeled
> by RedHat as being in "Technology Preview" state.  I believe that if you had
> a real RedHat system instead of CentOS kernel, you'd discover it's hard to
> even get it installed--you need to basically say "yes, I know it's not for
> production, I want it anyway" to get preview packages.  It's not really
> intended for production use.
>
> What I'm hearing from people is that they run into the occasional ext4 bug
> with PostgreSQL, but the serious ones aren't happening very often now, on
> systems running RHEL6 or Debian Squeeze.  Those kernels are way, way ahead
> of the ext4 backport in RHEL5 based systems, and they're just barely stable.

So if you're running a RHEL5.4 or RHEL5.5 system, are you basically
stuck with ext3?  I'm not sure if I'm remembering correctly, but ISTM
that you've been uncomfortable with BOTH ext4 and XFS prior to RHEL6;
but OK with both beginning with RHEL6.

Also, any tips on mount options for XFS/ext4/ext3?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The shared buffers challenge

2011-05-27 Thread Robert Haas
On Fri, May 27, 2011 at 2:47 PM, Greg Smith  wrote:
> Any attempt to make a serious change to the documentation around performance
> turns into a bikeshedding epic, where the burden of proof to make a change
> is too large to be worth the trouble to me anymore.  I first started
> publishing tuning papers outside of the main docs because it was the path of
> least resistance to actually getting something useful in front of people.
>  After failing to get even basic good recommendations for
> checkpoint_segments into the docs, I completely gave up on focusing there as
> my primary way to spread this sort of information.

Hmm.  That's rather unfortunate.  +1 for revisiting that topic, if you
have the energy for it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-23 Thread Robert Haas
On Mon, May 23, 2011 at 3:08 PM, Josh Berkus  wrote:
>
>> Well, all of that stuff sounds impractically expensive to me... but I
>> just work here.
>
> I'll point out that the simple version, which just checks for hot tables
> and indexes, would improve estimates greatly and be a LOT less
> complicated than these proposals.

I realize I'm sounding like a broken record here, but as far as I can
tell there is absolutely zero evidence that that would be better.  I'm
sure you're in good company thinking so, but the list of things that
could skew (or should I say, screw) the estimates is long and painful;
and if those estimates are wrong, you'll end up with something that is
both worse and less predictable than the status quo.  First, I haven't
seen a shred of hard evidence that the contents of the buffer cache or
OS cache are stable enough to be relied upon, and we've repeatedly
discussed workloads where that might not be true.  Has anyone done a
systematic study of this on a variety of real production systems?  If so,
the results haven't been posted here, at least not that I can recall.
Second, even if we were willing to accept that we could obtain
relatively stable and accurate measurements of this data, who is to
say that basing plans on it would actually result in an improvement in
plan quality?  That may seem obvious, but I don't think it is.  The
proposed method is a bit like trying to determine the altitude of a
hot air balloon by throwing the ballast over the side and timing how
long it takes to hit the ground.  Executing plans that are based on
the contents of the cache will change the contents of the cache, which
will in turn change the plans.  The idea that we can know, without any
experimentation, how that's going to shake out, seems to me to be an
exercise in unjustified optimism of the first order.

Sorry to sound grumpy and pessimistic, but I really think we're
letting our enthusiasm get way, way ahead of the evidence.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance degradation of inserts when database size grows

2011-05-23 Thread Robert Haas
On Tue, May 17, 2011 at 8:45 AM, Andrey Vorobiev
 wrote:
> 1. How does database size affect insert performance?

Well, if your database gets bigger, then your indexes will become
deeper, requiring more time to update.  But I'm not sure that's your
problem here.

> 2. Why does number of written buffers increase when database size grows?

It normally doesn't.

> 3. How can I further analyze this problem?

Are you actually inserting more user data into these tables, so that
they have more and more rows as time goes by, or are the data files
getting larger out of proportion to the amount of useful data in them?
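
One quick way to check is to watch the live/dead tuple counts and the
on-disk size over time, something like this (substitute your actual table
name):

SELECT relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
       n_live_tup,
       n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'your_table';

If n_dead_tup keeps climbing, or the size grows much faster than n_live_tup,
that points at bloat rather than simple data growth.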

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance degradation of inserts when database size grows

2011-05-23 Thread Robert Haas
On Mon, May 23, 2011 at 2:46 PM, Tomas Vondra  wrote:
> Really? He already has 64 checkpoint segments, which is about 1GB of
> xlog data. The real problem is that the amount of buffers to write is
> constantly growing. At the beginning there's 62861 buffers (500MB) and
> at the end there's 137657 buffers (1GB).
>
> IMHO increasing the number of checkpoint segments would make this
> disruption even worse.

Maybe - but it would also make the checkpoints less frequent, which
might be a good thing.

> In 9.1 there's a feature that spreads checkpoint writes, but with 8.4
> that's not possible.

What feature are you referring to here?  Checkpoint spreading was
added in 8.3, IIRC.
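
So on 8.4 the spreading is already available; something like this in
postgresql.conf (values purely illustrative, not a recommendation for your
hardware):

checkpoint_segments = 128            # fewer, larger checkpoints
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9   # spread each checkpoint's writes out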

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SORT performance - slow?

2011-05-23 Thread Robert Haas
On Mon, May 23, 2011 at 1:01 PM, Maciek Sakrejda  wrote:
>> You're probably reading it wrong. The sort itself takes about 1 ms (just
>> subtract the numbers in "actual=").
>
> I thought it was cost=startup_cost..total_cost. That is not quite the
> same thing, since startup_cost is effectively "cost to produce first
> row", and Sort can't really operate in a "streaming" fashion (well,
> theoretically, something like selection sort could, but that's beside
> the point) so it needs to do all the work up front. I'm no explain
> expert, so someone please correct me if I'm wrong.

You are right.
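
To make that concrete with a made-up plan fragment (numbers invented purely
for illustration):

 Sort  (cost=10000.00..10250.00 rows=100000 width=20)
       (actual time=1200.500..1250.300 rows=100000 loops=1)
   ->  Seq Scan on t  (cost=0.00..1500.00 rows=100000 width=20)
         (actual time=0.020..180.400 rows=100000 loops=1)

The sort's own work is roughly 1200.5 - 180.4 ms, i.e. the gap between the
sort's startup time and its input's total time; the 50 ms between the sort's
own two "actual" numbers is just the time spent handing back the already
sorted rows.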

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Pushing LIMIT into sub-queries of a UNION ALL

2011-05-23 Thread Robert Haas
On Mon, May 16, 2011 at 3:38 PM, Dave Johansen  wrote:
> I am using Postgres 8.3 and I have an issue very closely related to the one
> described here:
> http://archives.postgresql.org/pgsql-general/2005-06/msg00488.php
>
> Basically, I have a VIEW which is a UNION ALL of two tables but when I do a
> select on the view using a LIMIT, it scans the entire tables and takes
> significantly longer than writing out the query with the LIMITs in the
> sub-queries themselves. Is there a solution to get the view to perform like
> the sub-query version?

I believe this is fixed by MergeAppend in 9.1.  You might want to try
9.1beta1 and see if that works better for you.
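
In the meantime, on 8.3 the rewrite you describe - pushing the LIMIT (and
any ORDER BY) into each arm of the UNION ALL by hand - is the practical
workaround.  Roughly like this, with table_a/table_b/id standing in for the
view's member tables and sort column:

SELECT *
FROM (
    (SELECT * FROM table_a ORDER BY id DESC LIMIT 10)
    UNION ALL
    (SELECT * FROM table_b ORDER BY id DESC LIMIT 10)
) AS u
ORDER BY id DESC
LIMIT 10;

With an index on id in each member table, each arm can stop after ten rows
instead of scanning the whole table.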

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: FW: [PERFORM] KVP table vs. hstore - hstore performance (Was: Postgres NoSQL emulation)

2011-05-23 Thread Robert Haas
On Tue, May 17, 2011 at 11:10 AM,   wrote:
> For Hstore I'm using a GIST index.

I would have thought that GIN would be a better choice for this workload.
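
Assuming the hstore column is called data and the table kvp (adjust to your
schema), that would be something like:

CREATE INDEX kvp_data_gin_idx ON kvp USING gin (data);

It would be interesting to see the same benchmark repeated with that index.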

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Link error when use Pgtypes function in windows

2011-05-19 Thread Robert Haas
On Fri, May 13, 2011 at 12:22 PM, Maciek Sakrejda  wrote:
>> Does someone can help me?
>
> You may want to try pgsql-general instead of this list.

Yeah, this isn't a performance question.

But I wonder if the problem might be that the OP needs to link with
the ecpg library, not just libpq.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-19 Thread Robert Haas
On Thu, May 19, 2011 at 2:39 PM, Jim Nasby  wrote:
> On May 19, 2011, at 9:53 AM, Robert Haas wrote:
>> On Wed, May 18, 2011 at 11:00 PM, Greg Smith  wrote:
>>> Jim Nasby wrote:
>>>> I think the challenge there would be how to define the scope of the
>>>> hot-spot. Is it the last X pages? Last X serial values? Something like
>>>> correlation?
>>>>
>>>> Hmm... it would be interesting if we had average relation access times for
>>>> each stats bucket on a per-column basis; that would give the planner a
>>>> better idea of how much IO overhead there would be for a given WHERE clause
>>>
>>> You've already given one reasonable first answer to your question here.  If
>>> you defined a usage counter for each histogram bucket, and incremented that
>>> each time something from it was touched, that could lead to a very rough way
>>> to determine access distribution.  Compute a ratio of the counts in those
>>> buckets, then have an estimate of the total cached percentage; multiplying
>>> the two will give you an idea how much of that specific bucket might be in
>>> memory.  It's not perfect, and you need to incorporate some sort of aging
>>> method to it (probably weighted average based), but the basic idea could
>>> work.
>>
>> Maybe I'm missing something here, but it seems like that would be
>> nightmarishly slow.  Every time you read a tuple, you'd have to look
>> at every column of the tuple and determine which histogram bucket it
>> was in (or, presumably, which MCV it is, since those aren't included
>> in working out the histogram buckets).  That seems like it would slow
>> down a sequential scan by at least 10x.
>
> You definitely couldn't do it real-time. But you might be able to copy the 
> tuple somewhere and have a background process do the analysis.
>
> That said, it might be more productive to know what blocks are available in 
> memory and use correlation to guesstimate whether a particular query will 
> need hot or cold blocks. Or perhaps we create a different structure that lets 
> you track the distribution of each column linearly through the table; 
> something more sophisticated than just using correlation perhaps 
> something like indicating which stats bucket was most prevalent in each 
> block/range of blocks in a table. That information would allow you to 
> estimate exactly what blocks in the table you're likely to need...

Well, all of that stuff sounds impractically expensive to me... but I
just work here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-19 Thread Robert Haas
On Wed, May 18, 2011 at 11:00 PM, Greg Smith  wrote:
> Jim Nasby wrote:
>> I think the challenge there would be how to define the scope of the
>> hot-spot. Is it the last X pages? Last X serial values? Something like
>> correlation?
>>
>> Hmm... it would be interesting if we had average relation access times for
>> each stats bucket on a per-column basis; that would give the planner a
>> better idea of how much IO overhead there would be for a given WHERE clause
>
> You've already given one reasonable first answer to your question here.  If
> you defined a usage counter for each histogram bucket, and incremented that
> each time something from it was touched, that could lead to a very rough way
> to determine access distribution.  Compute a ratio of the counts in those
> buckets, then have an estimate of the total cached percentage; multiplying
> the two will give you an idea how much of that specific bucket might be in
> memory.  It's not perfect, and you need to incorporate some sort of aging
> method to it (probably weighted average based), but the basic idea could
> work.

Maybe I'm missing something here, but it seems like that would be
nightmarishly slow.  Every time you read a tuple, you'd have to look
at every column of the tuple and determine which histogram bucket it
was in (or, presumably, which MCV it is, since those aren't included
in working out the histogram buckets).  That seems like it would slow
down a sequential scan by at least 10x.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-16 Thread Robert Haas
On Mon, May 16, 2011 at 12:49 AM, Jesper Krogh  wrote:
>> To me it seems like a robust and fairly trivial way to to get better
>> numbers. The
>> fear is that the OS-cache is too much in flux to get any stable numbers
>> out
>> of it.
>
> Ok, it may not work as well with indexes, since having 1% in cache may very
> well mean that 90% of all requested blocks are there... for tables it should
> be more trivial.

Tables can have hot spots, too.  Consider a table that holds calendar
reservations.  Reservations can be inserted, updated, deleted.  But
typically, the most recent data will be what is most actively
modified, and the older data will be relatively more (though not
completely) static, and less frequently accessed.  Such examples are
common in many real-world applications.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] [PERFORMANCE] expanding to SAN: which portion best to move

2011-05-16 Thread Robert Haas
On Mon, May 16, 2011 at 4:19 AM, Robert Klemme
 wrote:
>> - If the planner chooses a Bitmap Index Scan, it effectively scans the
>> index to figure out which table blocks to read, and then reads those
>> table blocks in block number order, so that the I/O is sequential,
>> with skips.
>
> Are these two separate phases (i.e. first scan index completely, then
> access table)?

Yes.

> Separating index and tables might not be a totally good idea
> generally.  Richard Foote has an excellent article about Oracle but I
> assume at least a few things do apply to PostgreSQL as well - it's at
> least worth as something to check PostgreSQL's access patterns
> against:
>
> http://richardfoote.wordpress.com/2008/04/16/separate-indexes-from-tables-some-thoughts-part-i-everything-in-its-right-place/
>
> I would probably rather try to separate data by the nature and
> frequency of accesses.  One reasonable separation would be to leave
> all frequently accessed tables *and* their indexes on local RAID and
> moving less frequently accessed data to the SAN.  This separation
> could be easily identified if you have separate tables for current and
> historic data.

Yeah, I think the idea of putting tables and indexes in separate
tablespaces is mostly to bring more I/O bandwidth to bear on the same
data.  But there are other reasonable things you might do also - e.g.
put the indexes on an SSD, and the tables on a spinning disk, figuring
that the SSD is less reliable but you can always rebuild the index if
you need to...

Also, a lot of people have reported big speedups from putting pg_xlog
on a dedicated RAID 1 pair, or moving the PostgreSQL logs off the data
partition.  So those sorts of divisions should be considered also.
Your idea of dividing things by access frequency is another good
thought.
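
For example (paths and object names here are only placeholders):

CREATE TABLESPACE fast_ssd LOCATION '/mnt/ssd/pg_tblspc';
CREATE INDEX orders_created_idx ON orders (created) TABLESPACE fast_ssd;
-- or move an existing index:
ALTER INDEX orders_created_idx SET TABLESPACE fast_ssd;

The directory has to exist, be empty, and be owned by the postgres user.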

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] DBT-5 & Postgres 9.0.3

2011-05-15 Thread Robert Haas
On Thu, May 12, 2011 at 3:18 AM, Sethu Prasad  wrote:
> http://sourceforge.net/mailarchive/forum.php?forum_name=osdldbt-general&max_rows=25&style=nested&viewmonth=201104

It's not very obvious from reading through that link what you still
need help with.

http://wiki.postgresql.org/wiki/Guide_to_reporting_problems

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres 9.0.4 + Hot Standby + FusionIO Drive + Performance => Query failed ERROR: catalog is missing 1 attribute(s) for relid 172226

2011-05-15 Thread Robert Haas
On Tue, May 10, 2011 at 3:23 AM, Sethu Prasad  wrote:
> Yes it has something to do with Hot Standby, if you omit some parts on the
> archive then the standby instance will not have the necessary stuff and
> complain like this..

If you omit some parts of the archive, it won't start at all.  To get
it to complain like this, you need something more than accidental
misconfiguration.

> I kept the FusionIO drive on my checklist while looking into this issue; we
> tried it for performance combined with a read-only hot standby, and I
> suspected that recovery was not completing safely on this drive, so I
> pointed to the Fio drive here.
>
> Straight to say, I missed the pg_clog directory on archive.
>
> seq_page_cost = 1.0
>
> random_page_cost = 1.0
>
> Is the above settings are fine when we deal with Fio and Performance, as I
> have the advice earlier stating that read and write are treated same with
> Fio drives.

I would think more like 0.1 than 1.0.

> Any suggestions on configuration changes to have read-only hot standby
> faster on READs.

effective_io_concurrency?

Adjust OS readahead?
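
As a starting point to experiment with - not a recommendation, you would want
to benchmark it on the actual device:

seq_page_cost = 0.1
random_page_cost = 0.1          # flash has essentially no seek penalty
effective_io_concurrency = 4    # worth experimenting with; device-dependent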

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-15 Thread Robert Haas
On Sun, May 15, 2011 at 2:08 PM, Josh Berkus  wrote:
>> All true.  I suspect that in practice the difference between random and
>> sequential memory page costs is small enough to be ignorable, although
>> of course I might be wrong.
>
> This hasn't been my experience, although I have not carefully measured
> it.  In fact, there's good reason to suppose that, if you were selecting
> 50% of more of a table, sequential access would still be faster even for
> an entirely in-memory table.
>
> As a parallel to our development, Redis used to store all data as linked
> lists, making every object lookup effectively a random lookup.  They
> found that even with a database which is pinned in memory, creating a
> data page structure (they call it "ziplists") and supporting sequential
> scans was up to 10X faster for large lists.
>
> So I would assume that there is still a coefficient difference between
> seeks and scans in memory until proven otherwise.

Well, anything's possible.  But I wonder whether the effects you are
describing might result from a reduction in the *number* of pages
accessed rather than a change in the access pattern.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query improvement

2011-05-13 Thread Robert Haas
On Mon, May 2, 2011 at 3:58 AM, Claudio Freire  wrote:
> Hash joins are very inefficient if they require big temporary files.

Hmm, that's not been my experience.  What have you seen?

I've seen a 64-batch hash join beat out a
nested-loop-with-inner-indexscan, which I never woulda believed,
but...
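
One easy way to check whether the temporary files are hurting you is to bump
work_mem for a single session and re-run the query; when the hash fits in
memory the Hash node reports a single batch in EXPLAIN ANALYZE.  A rough
sketch, with placeholder names:

SET work_mem = '256MB';    -- session-local; pick something your RAM can afford
EXPLAIN ANALYZE
SELECT *
FROM big_table b
JOIN other_table o ON o.id = b.other_id;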

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-13 Thread Robert Haas
On Fri, May 13, 2011 at 4:13 PM, Josh Berkus  wrote:
> Instead, we should be fixing the formulas these are based on and leaving
> RPC alone.
>
> For any data page, there are actually four costs associated with each
> tuple lookup, per:

All true.  I suspect that in practice the difference between random and
sequential memory page costs is small enough to be ignorable, although
of course I might be wrong.  I've never seen a database that was fully
cached in memory where it was necessary to set
random_page_cost>seq_page_cost to get good plans -- no doubt partly
because even if the pages were consecutive on disk, there's no reason
to suppose they would be so in memory, and we certainly wouldn't know
one way or the other at planning time.   But I agree we should add a
cached_page_cost as part of all this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-13 Thread Robert Haas
On Fri, May 13, 2011 at 3:20 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Tue, Apr 26, 2011 at 9:04 PM, Merlin Moncure  wrote:
>>> The very first thing to check is effective_cache_size and to set it to
>>> a reasonable value.
>
>> Actually, effective_cache_size has no impact on costing except when
>> planning a nested loop with inner index scan.  So, a query against a
>> single table can never benefit from changing that setting.
>
> That's flat out wrong.  It does affect the cost estimate for plain
> indexscan (and bitmap indexscan) plans.



OK, I agree.  I obviously misinterpreted this code the last time I read it.

I guess maybe the reason why it didn't matter for the OP is that - if
the size of the index in pages is smaller than the pro-rated
fraction of effective_cache_size allowed to the index - then the exact
value doesn't affect the answer.

I apparently need to study this code more.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] [PERFORMANCE] expanding to SAN: which portion best to move

2011-05-13 Thread Robert Haas
On Wed, May 4, 2011 at 6:31 AM, Willy-Bas Loos  wrote:
> I'm asking them for (real) benchmarks, thanks for the advice. (fio is not
> available for us now to do it myself, grmbl)
> It just occurred to me that it is not necessarily the case that reading the
> indexes causes a lot of random I/O (on the index itself).
> I mean, maybe the index is generally read sequentially and then, when
> retrieving the data, there is a lot of random I/O.
> if it's a long story, any tips for info about this (book or web site)?

If you don't do anything special, and if the query plan says "Index
Scan" rather than "Bitmap Index Scan", then both the index I/O and the
table I/O are likely to be fairly random.  However there are a number
of cases in which you can expect the table I/O to be sequential:

- In some cases, you may happen to insert rows with an ordering that
matches the index.  For example, if you have a table with not too many
updates and deletes, and an index on a serial column, then new rows
will have a higher value in that column than old rows, and will also
typically be physically after older rows in the file.  Or you might be
inserting timestamped data from oldest to newest.
- If the planner chooses a Bitmap Index Scan, it effectively scans the
index to figure out which table blocks to read, and then reads those
table blocks in block number order, so that the I/O is sequential,
with skips.
- If you CLUSTER the table on a particular index, it will be
physically ordered to match the index's key ordering.  As the table is
further modified the degree of clustering will gradually decline;
eventually you may wish to re-CLUSTER.

It's also worth keeping in mind that the index itself won't
necessarily be accessed in physically sequential order.  The point of
the index is to emit the rows in key order, but if the table is
heavily updated, it won't necessarily be the case that a page
containing lower-valued keys physically precedes a page containing
higher-valued keys.  I'm actually somewhat fuzzy on how this works,
and to what extent it's a problem in practice, but I am fairly sure it
can happen.
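
If you do want the heap to follow a particular index's ordering, CLUSTER is
the usual tool, and pg_stats.correlation gives a rough idea of how well the
physical order currently tracks a column.  For example (names are
placeholders; the CLUSTER ... USING syntax needs 8.3 or later):

CLUSTER reservations USING reservations_created_idx;

SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'reservations' AND attname = 'created';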

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-13 Thread Robert Haas
On Tue, Apr 26, 2011 at 9:04 PM, Merlin Moncure  wrote:
> On Tue, Apr 26, 2011 at 4:37 PM, Kevin Grittner
>  wrote:
>> Sok Ann Yap  wrote:
>>
>>> So, index scan wins by a very small margin over sequential scan
>>> after the tuning. I am a bit puzzled because index scan is more
>>> than 3000 times faster in this case, but the estimated costs are
>>> about the same. Did I do something wrong?
>>
>> Tuning is generally needed to get best performance from PostgreSQL.
>> Needing to reduce random_page_cost is not unusual in situations
>> where a good portion of the active data is in cache (between
>> shared_buffers and the OS cache).  Please show us your overall
>> configuration and give a description of the hardware (how many of
>> what kind of cores, how much RAM, what sort of storage system).  The
>> configuration part can be obtained by running the query on this page
>> and pasting the result into your next post:
>>
>> http://wiki.postgresql.org/wiki/Server_Configuration
>>
>> There are probably some other configuration adjustments you could do
>> to ensure that good plans are chosen.
>
> The very first thing to check is effective_cache_size and to set it to
> a reasonable value.

Actually, effective_cache_size has no impact on costing except when
planning a nested loop with inner index scan.  So, a query against a
single table can never benefit from changing that setting.  Kevin's
suggestion of adjusting seq_page_cost and random_page_cost is the way
to go.

We've talked in the past (and I still think it's a good idea, but
haven't gotten around to doing anything about it) about adjusting the
planner to attribute to each relation the percentage of its pages
which we believe we'll find in cache.  Although many complicated ideas
for determining that percentage have been proposed, my favorite one is
fairly simple: assume that small relations will be mostly or entirely
cached, and that big ones won't be.   Allow the administrator to
override the result on a per-relation basis.  It's difficult to
imagine a situation where the planner should assume that a relation
with only handful of pages isn't going to be cached.  Even if it
isn't, as soon as someone begins accessing it, it will be.
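
In the meantime the practical knob is the page costs, which you can try
per-session before touching postgresql.conf (values and names here are purely
illustrative):

SET seq_page_cost = 1.0;
SET random_page_cost = 2.0;    -- try values between 1 and 4
EXPLAIN ANALYZE SELECT * FROM some_table WHERE some_indexed_col = 42;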

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] tuning on ec2

2011-05-11 Thread Robert Haas
On Tue, Apr 26, 2011 at 11:15 AM, Joel Reymont  wrote:
> I'm running pgsql on an m1.large EC2 instance with 7.5gb available memory.
>
> The free command shows 7gb of free+cached. My understanding from the docs is 
> that I should dedicate 1.75gb to shared_buffers (25%) and set 
> effective_cache_size to 7gb.

Sounds like a reasonable starting point.  You could certainly fiddle
around a bit - especially with shared_buffers - to see if some other
setting works better, but that should be in the ballpark.
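
In postgresql.conf terms, roughly:

shared_buffers = 1792MB         # about 25% of the ~7GB of usable RAM
effective_cache_size = 7GB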

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Checkpoint execution overrun impact?

2011-05-11 Thread Robert Haas
On Fri, Apr 22, 2011 at 5:21 AM, drvillo  wrote:
> -given the configuration attached (which is basically a vanilla one) and the
> number of buffers written at each execution, are these execution times
> normal or above average?

They seem fine.  Remember that the write is deliberately spread out;
it's not as if the system couldn't write out 130-160 8k blocks in less
than 30 s.

> -in the case of the execution that overruns past the timeout, what are the
> implications wrt the client application?

Not sure what you are referring to here.

> -AFAIU client connections are basically stalled during checkpoints. Is it
> reasonable to infer that the fact that the application blocking on a
> getConnection() might be related to checkpoints being executed?
> -considering some tuning on the PG side, should I try increasing
> checkpoint_timeout and rising checkpoint_completion_target to lessen the
> impact of IO on the client or should I shorten the period so there's less
> stuff to write? from the number of buffers written on average I'd assume the
> first option is the one to go for but I might miss some bit of reasoning
> here...

I'm a bit puzzled by all of this because the logs you posted seem to
reflect a system under very light load.  Each checkpoint is writing no
more than 4% of shared_buffers and the sync phases are generally
completing in less than one second.  I don't see why that would be
causing stalls.
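
It might be worth confirming whether the stalls actually line up with
checkpoints, e.g. with something like:

log_checkpoints = on
log_min_duration_statement = 1000    # log statements taking longer than 1s

and then comparing the timestamps.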

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] DBT-5 & Postgres 9.0.3

2011-05-11 Thread Robert Haas
On Tue, Apr 12, 2011 at 3:51 AM, Sethu Prasad  wrote:
> Anyone lucky to have dbt5 run for PostgreSQL 9.0.3?!
>
> I am trying on Novell SuSE Linux Enterprise Server 11 SP1 x86_64 in a
> virtual machine, and it has been hard going with no successful run yet. If
> you can point me to any docs, that would be a big help.

What's going wrong for you?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?

2011-05-11 Thread Robert Haas
On Wed, Apr 13, 2011 at 1:22 PM, Scott Carey  wrote:
> A pathological skew case (all relations with the same key), should be
> _cheaper_ to probe.   There should be only _one_ entry in the hash (for
> the one key), and that entry will be a list of all relations matching the
> key.  Therefore, hash probes will either instantly fail to match on an
> empty bucket, fail to match the one key with one compare, or match the one
> key and join on the matching list.
>
> In particular for anti-join, high skew should be the best case scenario.

I think this argument may hold some water for an anti-join, and maybe
for a semi-join, but it sure doesn't seem right for any kind of join
that has to iterate over all matches (rather than just the first one);
that is, inner, left, right, or full.

> A hash structure that allows multiple entries per key is inappropriate for
> skewed data, because it is not O(n).  One that has one entry per key
> remains O(n) for all skew.  Furthermore, the hash buckets and # of entries
> is proportional to n_distinct in this case, and smaller and more cache and
> memory friendly to probe.

I don't think this argument is right.  The hash table is sized for a
load factor significantly less than one, so if there are multiple
entries in a bucket, it is fairly likely that they are all for the
same key.  Granted, we have to double-check the keys to figure that
out; but I believe that the data structure you are proposing would
require similar comparisons.  The only difference is that they'd be
required when building the hash table, rather than when probing it.

> You can put either relation on the outside with an anti-join, but would
> need a different algorithm and cost estimator if done the other way
> around.  Construct a hash on the join key, that keeps a list of relations
> per key, iterate over the other relation, and remove the key and
> corresponding list from the hash when there is a match, when complete the
> remaining items in the hash are the result of the join (also already
> grouped by the key).  It could be terminated early if all entries are
> removed.
> This would be useful if the hash was small, the other side of the hash too
> large to fit in memory, and alternative was a massive sort on the other
> relation.

This would be a nice extension of commit
f4e4b3274317d9ce30de7e7e5b04dece7c4e1791.

> Does the hash cost estimator bias towards smaller hashes due to hash probe
> cost increasing with hash size due to processor caching effects?  Its not
> quite O(n) due to caching effects.

I don't think we account for that (and I'm not convinced we need to).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Poor performance when joining against inherited tables

2011-05-11 Thread Robert Haas
On Mon, Apr 11, 2011 at 4:11 PM, Lucas Madar  wrote:
> I have a database that contains many tables, each with some common
> characteristics. For legacy reasons, they have to be implemented in a way so
> that they are *all* searchable by an older identifier to find the newer
> identifier. To do this, we've used table inheritance.
>
> Each entry has an id, as well as a legacyid1 and legacyid2. There's a master
> table that the application uses, containing a base representation and common
> characteristics:
>
> objects ( id, ... )
> item ( id, legacyid1, legacyid2 )
>  | - itemXX
>  | - itemYY
>
> There is nothing at all in the item table, it's just used for inheritance.
> However, weird things happen when this table is joined:
>
> EXPLAIN ANALYZE SELECT * FROM objects INNER JOIN item f USING ( id );
>
>  QUERY PLAN
> 
>  Hash Join  (cost=457943.85..1185186.17 rows=8643757 width=506)
>   Hash Cond: (f.id = objects.id)
>   ->  Append  (cost=0.00..224458.57 rows=8643757 width=20)
>         ->  Seq Scan on item f  (cost=0.00..26.30 rows=1630 width=20)
>         ->  Seq Scan on itemXX f  (cost=0.00..1.90 rows=90 width=20)
>         ->  Seq Scan on itemYY f  (cost=0.00..7.66 rows=266 width=20)
>         ->  Seq Scan on itemZZ f  (cost=0.00..1.02 rows=2 width=20)
>         ...
>   ->  Hash  (cost=158447.49..158447.49 rows=3941949 width=490)
>         ->  Seq Scan on objects  (cost=0.00..158447.49 rows=3941949
> width=490)
>
> This scans everything over everything, and obviously takes forever (there
> are millions of rows in the objects table, and tens of thousands in each
> itemXX table).
>
> However, if I disable seqscan (set enable_seqscan=false), I get the
> following plan:
>
>  QUERY PLAN
> 
>  Hash Join  (cost=10001298843.53..290002337961.71 rows=8643757 width=506)
>   Hash Cond: (f.id = objects.id)
>   ->  Append  (cost=100.00..29536334.43 rows=8643757 width=20)
>         ->  Seq Scan on item f  (cost=100.00..126.30
> rows=1630 width=20)
>         ->  Index Scan using xxx_pkey on itemXX f  (cost=0.00..10.60 rows=90
> width=20)
>         ->  Index Scan using yyy_pkey on itemYY f  (cost=0.00..25.24
> rows=266 width=20)
>         ->  Index Scan using zzz_pkey on itemZZ f  (cost=0.00..9.28 rows=2
> width=20)
>         ...
>   ->  Hash  (cost=999347.17..999347.17 rows=3941949 width=490)
>         ->  Index Scan using objects_pkey on objects (cost=0.00..999347.17
> rows=3941949 width=490)
>
> This seems like a much more sensible query plan.

I don't think so.  Scanning the index to extract all the rows in a
table is typically going to be a lot slower than a sequential scan.

A more interesting question is why you're not getting a plan like this:

Nested Loop
-> Seq Scan on objects
-> Append
   ->  Index Scan using xxx_pkey on itemXX
   ->  Index Scan using yyy_pkey on itemYY
   ->  Index Scan using zzz_pkey on itemZZ

> But it seems to think doing
> a sequential scan on the *empty* item table is excessively expensive in this
> case.
>
> Aside from enable_seqscan=false, is there any way I can make the query
> planner not balk over doing a seqscan on an empty table?

Why would you care?  A sequential scan of an empty table is very fast.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] help speeding up a query in postgres 8.4.5

2011-05-11 Thread Robert Haas
On Tue, May 10, 2011 at 2:20 PM, Maria L. Wilson
 wrote:
> haven't tested a composite index
>
> invsensor is 2,003,980 rows and 219MB
> granver is 5,138,730 rows and 556MB
> the machine has 32G memory
> seq_page_cost, random_page_costs & effective_cache_size are set to the
> defaults (1,4, and 128MB) - looks like they could be bumped up.
> Got any recommendations?

Yeah, I'd try setting effective_cache_size=24GB, seq_page_cost=0.1,
random_page_cost=0.1 and see if you get a better plan.  If possible,
can you post the EXPLAIN ANALYZE output with those settings for us?

If that doesn't cause the planner to use the indexes, then I'd be
suspicious that there is something wrong with those indexes that makes
the planner think it *can't* use them.  It would be helpful to see the
EXPLAIN output after SET enable_seqscan=off.
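
Concretely, in a psql session (this is just your original query with the
settings applied for that session):

SET effective_cache_size = '24GB';
SET seq_page_cost = 0.1;
SET random_page_cost = 0.1;
EXPLAIN ANALYZE
select MIN(IV.STRTDATE), MAX(IV.ENDDATE)
from GRAN_VER GV
left outer join INVENTORY IV on GV.GRANULE_ID = IV.GRANULE_ID, INVSENSOR INVS
where IV.INV_ID='65' and GV.GRANULE_ID = INVS.granule_id and
INVS.sensor_id='13';

and then the same thing once more after SET enable_seqscan = off;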

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] help speeding up a query in postgres 8.4.5

2011-05-10 Thread Robert Haas
[ woops, accidentally replied off-list, trying again ]

On Tue, May 10, 2011 at 1:47 PM, Maria L. Wilson
 wrote:
> thanks for taking a look at this  and it's never too late!!
>
> I've tried bumping up work_mem and did not see any improvements -
> All the indexes do exist that you asked see below
> Any other ideas?
>
> CREATE INDEX invsnsr_idx1
>  ON invsensor
>  USING btree
>  (granule_id);
>
> CREATE INDEX invsnsr_idx2
>  ON invsensor
>  USING btree
>  (sensor_id);

What about a composite index on both columns?

> CREATE UNIQUE INDEX granver_idx1
>  ON gran_ver
>  USING btree
>  (granule_id);

It's a bit surprising to me that this isn't getting used.  How big are
these tables, and how much memory do you have, and what values are you
using for seq_page_cost/random_page_cost/effective_cache_size?

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] help speeding up a query in postgres 8.4.5

2011-05-10 Thread Robert Haas
On Tue, Apr 5, 2011 at 3:25 PM, Maria L. Wilson
 wrote:
> Would really appreciate someone taking a look at the query below  Thanks
> in advance!
>
>
> this is on a linux box...
> Linux dsrvr201.larc.nasa.gov 2.6.18-164.9.1.el5 #1 SMP Wed Dec 9 03:27:37
> EST 2009 x86_64 x86_64 x86_64 GNU/Linux
>
> explain analyze
> select MIN(IV.STRTDATE), MAX(IV.ENDDATE)
> from GRAN_VER GV
> left outer join INVENTORY IV on GV.GRANULE_ID = IV.GRANULE_ID, INVSENSOR
> INVS
> where IV.INV_ID='65' and GV.GRANULE_ID = INVS.granule_id and
> INVS.sensor_id='13'
>
>
> "Aggregate  (cost=736364.52..736364.53 rows=1 width=8) (actual
> time=17532.930..17532.930 rows=1 loops=1)"
> "  ->  Hash Join  (cost=690287.33..734679.77 rows=336949 width=8) (actual
> time=13791.593..17323.080 rows=924675 loops=1)"
> "    Hash Cond: (invs.granule_id = gv.granule_id)"
> "    ->  Seq Scan on invsensor invs  (cost=0.00..36189.41 rows=1288943
> width=4) (actual time=0.297..735.375 rows=1277121 loops=1)"
> "  Filter: (sensor_id = 13)"
> "    ->  Hash  (cost=674401.52..674401.52 rows=1270865 width=16) (actual
> time=13787.698..13787.698 rows=1270750 loops=1)"
> "  ->  Hash Join  (cost=513545.62..674401.52 rows=1270865
> width=16) (actual time=1998.702..13105.578 rows=1270750 loops=1)"
> "    Hash Cond: (gv.granule_id = iv.granule_id)"
> "    ->  Seq Scan on gran_ver gv  (cost=0.00..75224.90
> rows=4861490 width=4) (actual time=0.008..1034.885 rows=4867542 loops=1)"
> "    ->  Hash  (cost=497659.81..497659.81 rows=1270865
> width=12) (actual time=1968.918..1968.918 rows=1270750 loops=1)"
> "  ->  Bitmap Heap Scan on inventory iv
> (cost=24050.00..497659.81 rows=1270865 width=12) (actual
> time=253.542..1387.957 rows=1270750 loops=1)"
> "    Recheck Cond: (inv_id = 65)"
> "    ->  Bitmap Index Scan on inven_idx1
> (cost=0.00..23732.28 rows=1270865 width=0) (actual time=214.364..214.364
> rows=1270977 loops=1)"
> "  Index Cond: (inv_id = 65)"
> "Total runtime: 17533.100 ms"
>
> some additional info.
> the table inventory is about 4481 MB and also has postgis types.
> the table gran_ver is about 523 MB
> the table INVSENSOR is about 217 MB
>
> the server itself has 32G RAM with the following set in the postgres conf
> shared_buffers = 3GB
> work_mem = 64MB
> maintenance_work_mem = 512MB
> wal_buffers = 6MB
>
> let me know if I've forgotten anything!  thanks a bunch!!

Late response here, but...

Is there an index on invsensor (sensor_id, granule_id)?  If not, that
might be something to try.  If so, you might want to try to figure out
why it's not being used.

Likewise, is there an index on gran_ver (granule_id)?
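
If the composite index doesn't exist, it would look something like this (the
index name is just a suggestion):

CREATE INDEX invsensor_sensor_granule_idx
  ON invsensor (sensor_id, granule_id);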

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] VX_CONCURRENT flag on vxfs( 5.1 or later) for performance for postgresql?

2011-05-05 Thread Robert Haas
On Sat, Apr 30, 2011 at 4:51 AM, Hsien-Wen Chu  wrote:
> since the default block size is 8k and each block holds many tuples, as I
> understand it, if any tuple is changed (by an update, insert, or delete)
> the block will be marked as a dirty block, and then it will be flushed to
> disk by the bgwriter.

True...

> so my question is whether the data block (8k) is aligned with the file
> system block. If it is aligned with the file system block, what is the
> potential issue that makes direct I/O unsafe?  (Please assume vxfs, vxvm
> and the disk sectors are aligned.)  Please correct me if any of this is
> incorrect.

It's not about safety - it's about performance.  On a machine with
64GB of RAM, a typical setting for shared_buffers set to 8GB.  If you
start reading blocks into the PostgreSQL cache - or writing them out
of the cache - in a way that bypasses the filesystem cache, you're
going to have only 8GB of cache, instead of some much larger amount.
More cache = better performance.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance

2011-04-29 Thread Robert Haas
On Apr 27, 2011, at 11:01 PM, Claudio Freire  wrote:
> The patch may be simple, the testing not so much. I know that.
> 
> What tools do we have to do that testing? There are lots, and all
> imply a lot of work. Is that work worth the trouble? Because if it
> is... why not work?
> 
> I would propose a step in the right direction: a patch to compute and
> log periodical estimations of the main I/O tunables: random_page_cost,
> sequential_page_cost and effective_cache_size. Maybe per-tablespace.
> Evaluate the performance impact, and work from there.
> 
> Because, probably just using those values as input to the optimizer
> won't work, because dbas will want a way to tune the optimizer,
> because the system may not be stable enough, even because even with
> accurate estimates for those values, the optimizer may not perform as
> expected. I mean, right now those values are tunables, not real
> metrics, so perhaps the optimizer won't respond well to real values.
> 
> But having the ability to measure them without a serious performance
> impact is a step in the right direction, right?

Sure. It's not a real easy problem, but don't let that discourage you from 
working on it. Getting more eyeballs on these issues can only be a good thing.

...Robert
-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance

2011-04-29 Thread Robert Haas
On Apr 29, 2011, at 10:25 AM, James Mansion  
wrote:
> Robert Haas wrote:
>> The server can and does measure hit rates for the PG buffer pool, but to my 
>> knowledge there is no clear-cut way for PG to know whether read() is 
>> satisfied from the OS cache or a drive cache or the platter.
>> 
>>  
> Does the server know which IO it thinks is sequential, and which it thinks is 
> random? 

No. It models this in the optimizer, but the executor has no clue.  And 
sometimes we model I/O as partly random, partly sequential, as in the case of 
heap fetches on a clustered index.  So the answer isn't even a Boolean.

...Robert
-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

