Re: [PERFORM] Query optimization using order by and limit

2011-09-21 Thread Tom Lane
Michael Viscuso  writes:
> Greg/Tom, you are correct, these columns should be modified to whatever
> is easiest for Postgres to recognize 64-bit unsigned integers.  Would
> you still recommend bigint for unsigned integers?  I likely read the
> wrong documentation that suggested bigint for signed 64-bit integers and
> numeric(20) for unsigned 64-bit integers.

Unsigned?  Oh, hm, that's a bit of a problem because we don't have any
unsigned types.  If you really need to go to 2^64 and not 2^63 then
you're stuck with numeric ... but that last bit is costing ya a lot.
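
A quick illustration of that boundary (bigint being a signed 64-bit type,
2^63 - 1 is the largest value it can hold):

select 9223372036854775807::bigint;   -- 2^63 - 1, the largest bigint
select 9223372036854775808::numeric;  -- 2^63 already needs numeric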

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



Re: [PERFORM] Query optimization using order by and limit

2011-09-21 Thread Michael Viscuso
Thanks guys,

First of all, I should have included my postgres.conf file with the
original submission.  Sorry about that.  It is now attached.

Based on a recommendation, I also should have shown the parent child
relationship between osmoduleloads and its daily partitioned tables.  To
reduce clutter, it is at the end of this message.

Taking this one step at a time and taking Greg's second suggestion
first, issuing

select * from osmoduleloads WHERE osmoduleloads.firstloadtime >=
1296043200 AND osmoduleloads.firstloadtime < 12961036799000;

appears to only query the appropriate daily tables (2011_09_13 through
2011_09_20 - http://explain.depesz.com/s/QCG).  So it appears that
constraint_exclusion is working properly.  Putting a limit on the query
like:

select * from osmoduleloads WHERE osmoduleloads.firstloadtime >=
1296043200 AND osmoduleloads.firstloadtime < 12961036799000
limit 251;

has the result that I'd expect to see http://explain.depesz.com/s/O7fZ.
Ordering by firstloadtime AND limiting like:

select * from osmoduleloads WHERE osmoduleloads.firstloadtime >=
1296043200 AND osmoduleloads.firstloadtime < 12961036799000
order by firstloadtime desc limit 251;

also has the result that I'd expect to see
http://explain.depesz.com/s/RDh.  

Adding the hosts join condition to the mix was still OK
http://explain.depesz.com/s/2Ns. 

Adding the hosts.enabled condition was still OK
http://explain.depesz.com/s/UYN. 

Adding the hosts.user_id = 111 started the descent but it appears to
still be obeying the proper constraint_exclusion that I'd expect, just
with a ton of rows returned from the most recent daily tables
http://explain.depesz.com/s/4WE. 

Adding the final condition hosts_guid = '2007075705813916178' is what
ultimately kills it http://explain.depesz.com/s/8zy.  By adding the
host_guid, it spends considerably more time in the older tables than
without this condition and I'm not sure why. 

Thanks Greg for the recommendation to step through it like that -
hopefully this helps get us closer to a resolution.

Greg/Tom, you are correct, these columns should be modified to whatever
is easiest for Postgres to recognize 64-bit unsigned integers.  Would
you still recommend bigint for unsigned integers?  I likely read the
wrong documentation that suggested bigint for signed 64-bit integers and
numeric(20) for unsigned 64-bit integers.

Thanks again for all your help!  Perhaps 15 hours of poring over
explain logs will finally pan out!

Mike

cb=# \d+ osmoduleloads;
 Table "public.osmoduleloads"
Column |Type |  
Modifiers   | Storage  | Description
---+-+---+--+-
 guid  | numeric(20,0)   | not
null  | main |
 osprocess_guid| numeric(20,0)   | not
null  | main |
 filepath_guid | numeric(20,0)   | not
null  | main |
 firstloadtime | numeric(20,0)   | not
null  | main |
 md5hash   | bytea   | not
null  | extended |
 host_guid | numeric(20,0)   | default
NULL::numeric | main |
 process_create_time   | numeric(20,0)   | default
NULL::numeric | main |
 process_filepath_guid | numeric(20,0)   | default
NULL::numeric | main |
 event_time| timestamp without time zone
|   | plain|
Indexes:
"osmoduleloads_pkey" PRIMARY KEY, btree (guid)
Child tables: osmoduleloads_2001_12_31,
  osmoduleloads_2010_10_11,
  osmoduleloads_2010_10_12,
  osmoduleloads_2010_10_13,
  osmoduleloads_2011_07_27,
  osmoduleloads_2011_08_04,
  osmoduleloads_2011_08_05,
  osmoduleloads_2011_08_06,
  osmoduleloads_2011_08_07,
  osmoduleloads_2011_08_08,
  osmoduleloads_2011_08_09,
  osmoduleloads_2011_08_10,
  osmoduleloads_2011_08_11,
  osmoduleloads_2011_08_12,
  osmoduleloads_2011_08_13,
  osmoduleloads_2011_08_14,
  osmoduleloads_2011_08_15,
  osmoduleloads_2011_08_16,
  osmoduleloads_2011_08_17,
  osmoduleloads_2011_08_18,
  osmoduleloads_2011_08_19,
  osmoduleloads_2011_08_20,
  osmoduleloads_2011_08_21,
  osmoduleloads_2011_08_22,
  osmoduleloads_2011_08_23,
  osmoduleloads_2011_08_24,
  osmoduleloads_2011_08_25,
  osmoduleloads_2011_08_26,
  osmoduleloads_2011_08_27,
  osmoduleloads_2011_08_28,
  osmoduleloads_2011_08_29,
  osmoduleloads_2011_08_30,
  osmoduleloads_2011_08_3

Re: [PERFORM] Query optimization using order by and limit

2011-09-21 Thread Tom Lane
Greg Smith  writes:
> That weird casting can't be helping.  I'm not sure if it's your problem 
> here, but the constraint exclusion code is pretty picky about matching 
> the thing you're looking for against the CHECK constraint, and this is a 
> messy one.  The bigint conversion in the middle there isn't doing 
> anything useful for you anyway; you really should simplify this to just 
> look like this:
> firstloadtime >= 1296044640::numeric

I have a more aggressive suggestion: change all the numeric(20,0) fields
to bigint.  Unless the OP actually needs values wider than 64 bits,
the choice to use numeric is a significant performance penalty for
nothing.
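
A sketch of that conversion for one column (names are taken from the
thread; the ALTER recurses to the child partitions, and the partition
CHECK constraints may also need recreating with plain bigint literals so
constraint exclusion can still match them):

ALTER TABLE osmoduleloads
    ALTER COLUMN firstloadtime TYPE bigint
    USING firstloadtime::bigint;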

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query optimization using order by and limit

2011-09-21 Thread Greg Smith

On 09/21/2011 07:14 PM, Michael Viscuso wrote:

Check constraints:
"osmoduleloads_2011_09_14_event_time_check" CHECK (event_time = 
'2011-09-14 00:00:00'::timestamp without time zone)
"osmoduleloads_2011_09_14_firstloadtime_check" CHECK 
(firstloadtime >= 1296044640::bigint::numeric AND 
firstloadtime < 1296053280::bigint::numeric)

Inherits: osmoduleloads


That weird casting can't be helping.  I'm not sure if it's your problem 
here, but the constraint exclusion code is pretty picky about matching 
the thing you're looking for against the CHECK constraint, and this is a 
messy one.  The bigint conversion in the middle there isn't doing 
anything useful for you anyway; you really should simplify this to just 
look like this:


firstloadtime >= 1296044640::numeric
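
A sketch of applying that simplification to the partition shown above,
assuming it is acceptable to drop and re-create the constraint (names and
values are from the \d output):

ALTER TABLE osmoduleloads_2011_09_14
    DROP CONSTRAINT osmoduleloads_2011_09_14_firstloadtime_check;
ALTER TABLE osmoduleloads_2011_09_14
    ADD CONSTRAINT osmoduleloads_2011_09_14_firstloadtime_check
    CHECK (firstloadtime >= 1296044640::numeric
       AND firstloadtime < 1296053280::numeric);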

SELECT osm_1.*, storefiles_1.*, filepaths_1.*, filepaths_2.* FROM 
(SELECT * FROM osmoduleloads JOIN hosts ON hosts.guid = 
osmoduleloads.host_guid WHERE hosts.guid = '2007075705813916178' AND 
osmoduleloads.firstloadtime >= 1296043200 AND 
osmoduleloads.firstloadtime < 12961036799000 AND hosts.enabled = 
true AND hosts.user_id = 111 ORDER BY osmoduleloads.firstloadtime DESC 
LIMIT 251) AS osm_1 LEFT OUTER JOIN storefiles AS storefiles_1 ON 
osm_1.md5hash = storefiles_1.md5hash LEFT OUTER JOIN filepaths AS 
filepaths_1 ON osm_1.process_filepath_guid = filepaths_1.guid AND 
osm_1.event_time = filepaths_1.event_time LEFT OUTER JOIN filepaths AS 
filepaths_2 ON osm_1.filepath_guid = filepaths_2.guid AND 
osm_1.event_time= filepaths_2.event_time ORDER BY osm_1.firstloadtime 
DESC;




What you should start with here is confirming whether or not a simpler 
query touches all of the partitions or just the ones you expect it to.  
A simpler one like this:


SELECT * FROM osmoduleloads WHERE osmoduleloads.firstloadtime >= 
1296043200 AND osmoduleloads.firstloadtime < 12961036799000;


Would be the place to begin.  Once you've got that working, you can 
build up more pieces, and see if one of them results in the query no 
longer excluding partitions.  I can't figure out if you're running 
into a basic error here, where constraint exclusion just isn't working 
at all, or if you are only having this problem because the query is 
too complicated.  Figuring that out will narrow the potential solutions.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



[PERFORM] Query optimization using order by and limit

2011-09-21 Thread Michael Viscuso
First of all, thank you for taking the time to review my question.  After
attending the PostgresOpen conference in Chicago last week, I've been
poring over explain logs for hours on end and although my system is MUCH
better, I still can't resolve a few issues.  Luckily my data is pretty well
structured, so solving one issue will likely solve many more; I'll start
with this one.

Version: PostgreSQL 9.1rc1, compiled by Visual C++ build 1500, 64-bit
OS: Windows 7 64-bit
ORM: SQLAlchemy
Postgres table structure: I have daily partitioned tables for each of 4
"core tables" (the tables with the majority of my application's data).  Each
daily table inherits from its parent.  I do not explicitly define a
REFERENCE between these tables because I cannot guarantee the order in which
the events are inserted into the database, but where there are references,
the referenced row should exist in the other's daily table.  The reason I
partitioned the data in this manner is to increase query speed and make it
easy to archive old data. (I'm new to high-end Postgres performance so
there are likely several fundamental flaws in my assumptions.  I won't turn
down any recommendation.)

An example of a daily partitioned table follows:

cb=# \d osmoduleloads_2011_09_14;
 Table "public.osmoduleloads_2011_09_14"
        Column         |            Type             |                          Modifiers
-----------------------+-----------------------------+-------------------------------------------------------------
 guid                  | numeric(20,0)               | not null
 osprocess_guid        | numeric(20,0)               | not null
 filepath_guid         | numeric(20,0)               | not null
 firstloadtime         | numeric(20,0)               | not null
 md5hash               | bytea                       | not null
 host_guid             | numeric(20,0)               | default NULL::numeric
 process_create_time   | numeric(20,0)               | default NULL::numeric
 process_filepath_guid | numeric(20,0)               | default NULL::numeric
 event_time            | timestamp without time zone | default '2011-09-14 00:00:00'::timestamp without time zone
Indexes:
"osmoduleloads_2011_09_14_pkey" PRIMARY KEY, btree (guid)
"idx_osmoduleloads_2011_09_14_filepath_guid" btree (filepath_guid)
"idx_osmoduleloads_2011_09_14_firstload_time" btree (firstloadtime)
"idx_osmoduleloads_2011_09_14_host_guid" btree (host_guid)
"idx_osmoduleloads_2011_09_14_md5hash" btree (md5hash)
"idx_osmoduleloads_2011_09_14_osprocess_guid" btree (osprocess_guid)
Check constraints:
"osmoduleloads_2011_09_14_event_time_check" CHECK (event_time =
'2011-09-14 00:00:00'::timestamp without time zone)
"osmoduleloads_2011_09_14_firstloadtime_check" CHECK (firstloadtime >=
1296044640::bigint::numeric AND firstloadtime <
1296053280::bigint::numeric)
Inherits: osmoduleloads

Objective:  The firstloadtime check constraint ensures that the record is
applicable to that daily table. (In case you were wondering, the large
numerics correspond to Windows 100-nanosecond intervals since the epoch.)
I'm inserting millions of records into each daily table, so "query
slowness" is quite easy to spot.  Given that there is so much data per
daily table, I was hoping to use the order by and limit clauses to "stop
out" a query once it satisfied the limit clause, rather than being forced
to visit each daily table.  However, the query spends far more time in the
older tables than I'd like, which leads me to believe that I'm doing
something wrong.  For ease of viewing, my explain analyze can be found at
http://explain.depesz.com/s/tot

I'm still very new to this so I'm not sure if explain.depesz.com saves the
original query.  It wasn't readily apparent that it did, so here is the
original query:

SELECT osm_1.*, storefiles_1.*, filepaths_1.*, filepaths_2.* FROM (SELECT *
FROM osmoduleloads JOIN hosts ON hosts.guid = osmoduleloads.host_guid WHERE
hosts.guid = '2007075705813916178' AND osmoduleloads.firstloadtime >=
1296043200 AND osmoduleloads.firstloadtime < 12961036799000 AND
hosts.enabled = true AND hosts.user_id = 111 ORDER BY
osmoduleloads.firstloadtime DESC LIMIT 251) AS osm_1 LEFT OUTER JOIN
storefiles AS storefiles_1 ON osm_1.md5hash = storefiles_1.md5hash LEFT
OUTER JOIN filepaths AS filepaths_1 ON osm_1.process_filepath_guid =
filepaths_1.guid AND osm_1.event_time = filepaths_1.event_time LEFT OUTER
JOIN filepaths AS filepaths_2 ON osm_1.filepath_guid = filepaths_2.guid AND
osm_1.event_time= filepaths_2.event_time ORDER BY osm_1.firstloadtime DESC;

Hopefully my assumptions about order by and limit are correct and this query
can be optimized.

Again, appreciate any help you can lend.  Thanks in advance.

Mike


Re: [PERFORM] Prepared statements and suboptimal plans

2011-09-21 Thread Royce Ausburn

Sorry all - this was a duplicate from another of my addresses =(  Thanks to all 
that have helped out on both threads.





On 21/09/2011, at 8:44 AM, Royce Ausburn wrote:

> Hi all,
> 
> It looks like I've been hit with this well known issue.  I have a complicated 
> query that is intended to run every few minutes, I'm using JDBC's 
> Connection.prepareStatement() mostly for nice parameterisation, but postgres 
> produces a suboptimal plan due to its lack of information when the statement 
> is prepared.
> 
> I've been following the mailing list for a few years and I've seen this topic 
> come up a bit.  I've just done a quick google and I'm not quite sure how to 
> fix this short of manually substituting my query parameters in to a query 
> string -- avoiding prepared statements… An alternative might be to re-write 
> the query and hope that the planner's general plan is a bit closer to 
> optimal… but are these my only options?  
> 
> I notice that the non-prepared-statement (both below my sig) plan estimates 
> 5500 rows output.  I think that's out by a factor of up to 100, suggesting 
> that I might want to increase my statistics and re-analyse… but as I 
> understand the prepared-statement problem, this probably won't help here.  
> Correct?
> 
> We've been worst hit by this query on an 8.3 site.  Another site is running 
> 8.4.  Have there been improvements in this area recently?  Upgrading to 9.0 
> might be viable for us.
> 
> Any tips would be appreciated,
> 
> --Royce
> 
> 
> test=# PREPARE test (integer) as 
>   select 
> sid, 
> role, 
> starttime::date, 
> nasid, importer, 
> max(eventbinding.biid) as biid, 
> sum(bytesin) as bytesin, 
> sum(bytesout) as bytesout,  
> sum(seconds) as seconds, 
> sum(coalesce(pages, 0)) as pages, 
> sum(coalesce(count, 0)) as count,  
> sum(coalesce(rate, 0.0)) as rate, 
> sum(coalesce(bytesSentRate, 0.0)) as bytesSentRate,  
> sum(coalesce(bytesReceivedRate, 0.0)) as bytesReceivedRate, 
> count(*) as entries  
>   from billingItem, eventBinding , fqun  
>   where eventBinding.biid > $1 and eventBinding.biid = billingItem.biid and 
> fqun.uid = eventBinding.uid  
>   group by sid, starttime::date, nasid, importer, role;
> PREPARE
> test=# explain EXECUTE test(57205899);
>   QUERY PLAN  
> 
> ---
>  GroupAggregate  (cost=12338998.78..13770049.38 rows=18465169 width=148)
>->  Sort  (cost=12338998.78..12385161.70 rows=18465169 width=148)
>  Sort Key: fqun.sid, ((billingitem.starttime)::date), 
> billingitem.nasid, billingitem.importer, eventbinding.role
>  ->  Hash Join  (cost=1498473.48..7333418.55 rows=18465169 width=148)
>Hash Cond: (eventbinding.uid = fqun.uid)
>->  Hash Join  (cost=1496916.06..6916394.83 rows=18465169 
> width=148)
>  Hash Cond: (billingitem.biid = eventbinding.biid)
>  ->  Seq Scan on billingitem  (cost=0.00..1433087.88 
> rows=56222688 width=142)
>  ->  Hash  (cost=1175939.45..1175939.45 rows=18465169 
> width=10)
>->  Bitmap Heap Scan on eventbinding  
> (cost=427409.84..1175939.45 rows=18465169 width=10)
>  Recheck Cond: (biid > $1)
>  ->  Bitmap Index Scan on 
> eventbinding_biid_uid_role_idx  (cost=0.00..422793.55 rows=18465169 width=0)
>Index Cond: (biid > $1)
>->  Hash  (cost=943.85..943.85 rows=49085 width=8)
>  ->  Seq Scan on fqun  (cost=0.00..943.85 rows=49085 
> width=8)
> (15 rows)
> 
> 
> 
> 
> As a query on the psql command line:
> 
> test=# explain 
>   select 
> sid, 
> role, 
> starttime::date, 
> nasid, 
> importer, 
> max(eventbinding.biid) as biid, 
> sum(bytesin) as bytesin, 
> sum(bytesout) as bytesout,  
> sum(seconds) as seconds, 
> sum(coalesce(pages, 0)) as pages, 
> sum(coalesce(count, 0)) as count,  
> sum(coalesce(rate, 0.0)) as rate, 
> sum(coalesce(bytesSentRate, 0.0)) as bytesSentRate,  
> sum(coalesce(bytesReceivedRate, 0.0)) as bytesReceivedRate, 
> count(*) as entries  
>   from billingItem, eventBinding , fqun  
>   where eventBinding.biid > 57205899 and eventBinding.biid = billingItem.biid 
> and fqun.uid = eventBinding.uid  
>   group by sid, starttime::date, nasid, importer, role;
>                            QUERY PLAN
> ------------------------------------------------------------------
>  HashAggregate  (cost=102496.80..102704.55 rows=5540 width=148)

Re: [PERFORM] : Performance Improvement Strategy

2011-09-21 Thread Kevin Grittner
Venkat Balaji  wrote:
 
> If i got it correct, CLUSTER would do the same what VACUUM FULL
> does (except being fast)
 
CLUSTER copies the table (in the sequence of the specified index) to
a new set of files, builds fresh indexes, and then replaces the
original set of files with the new ones.  So you do need room on
disk for a second copy of the table, but it tends to be much faster
then VACUUM FULL in PostgreSQL versions before 9.0.  (Starting in
9.0, VACUUM FULL does the same thing as CLUSTER except that it scans
the table data rather than using an index.)  REINDEX is not needed
when using CLUSTER or 9.x VACUUM FULL.  Older versions of VACUUM
FULL would tend to bloat indexes, so a REINDEX after VACUUM FULL was
generally a good idea.
 
When choosing an index for CLUSTER, pick one on which you often
search for a *range* of rows, if possible.  Like a name column if
you do a lot of name searches.
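
For example, a hedged sketch (table and index names here are
hypothetical):

CLUSTER customers USING customers_name_idx;  -- rewrite the table in index order
ANALYZE customers;                           -- refresh planner statistics afterwards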
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] : Performance Improvement Strategy

2011-09-21 Thread Venkat Balaji
Thanks Greg !

If I got it correctly, CLUSTER would do the same thing VACUUM FULL does
(except faster).

Is CLUSTER recommended only because it is faster?  As per the link, the
table would be unavailable (for a shorter period compared to VACUUM FULL)
when CLUSTER is executed as well.  Hope I got it correct!

Thanks
Venkat

On Wed, Sep 21, 2011 at 11:27 PM, Greg Smith  wrote:

> On 09/21/2011 12:13 PM, Venkat Balaji wrote:
>
>> I as a DBA, suggested to perform VACUUM FULL and RE-INDEXING + ANALYZE to
>> ensure that IO performance and Indexing performance would be good
>>
>
>
> Read http://wiki.postgresql.org/wiki/VACUUM_FULL before
>  you run VACUUM FULL.  You probably don't want to do that.  A
> multi-gigabyte table can easily be unavailable for several hours if you
> execute VACUUM FULL against it.  CLUSTER is almost always faster.
>
> --
> Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
> PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


Re: [PERFORM] : Performance Improvement Strategy

2011-09-21 Thread Greg Smith

On 09/21/2011 12:13 PM, Venkat Balaji wrote:
I as a DBA, suggested to perform VACUUM FULL and RE-INDEXING + ANALYZE 
to ensure that IO performance and Indexing performance would be good



Read http://wiki.postgresql.org/wiki/VACUUM_FULL before you run VACUUM 
FULL.  You probably don't want to do that.  A multi-gigabyte table can 
easily be unavailable for several hours if you execute VACUUM FULL 
against it.  CLUSTER is almost always faster.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Constraint exclusion on UNION ALL subqueries with WHERE conditions

2011-09-21 Thread Tom Lane
Gunnlaugur Þór Briem  writes:
> On Monday, September 19, 2011 3:59:30 AM UTC, Tom Lane wrote:
>> Works for me in 8.4.8.  Do you have constraint_exclusion set to ON?

> I did try with constraint_exclusion set to on, though the docs suggest 
> partition should be enough ("examine constraints only for ... UNION ALL 
> subqueries")

> Here's a minimal test case (which I should have supplied in the original 
> post, sorry), tried just now in 8.4.8:

> CREATE OR REPLACE VIEW v_heavy_view
> AS SELECT (random()*1e5)::integer col
> FROM generate_series(1, 1e6::integer);

> CREATE OR REPLACE VIEW v_test_constraint_exclusion AS
> SELECT col FROM v_heavy_view WHERE col < 3
> UNION ALL SELECT col FROM v_heavy_view WHERE col >= 3;

> EXPLAIN SELECT * FROM v_test_constraint_exclusion WHERE col=2;

Hmm.  The reason this particular case doesn't work is that we don't
apply relation_excluded_by_constraints() to functions-in-FROM.
It's only used for plain-table RTEs, not subqueries, functions,
etc.  I suspect the complainant's real case involved an unflattenable
subquery.

Probably the rationale for that coding was that only plain tables
could have CHECK constraints; but the portion of the logic that looks
for mutually contradictory scan constraints could apply to non-table
relations.

Should we change the code to make such checks in these cases?
The default behavior (with constraint_exclusion = partition) would
still be to do nothing extra, but it would add planning expense when
constraint_exclusion = on.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Prepared statements and suboptimal plans

2011-09-21 Thread Merlin Moncure
On Tue, Sep 20, 2011 at 5:44 PM, Royce Ausburn  wrote:
> Hi all,
> It looks like I've been hit with this well known issue.  I have
> a complicated query that is intended to run every few minutes, I'm using
> JDBC's Connection.prepareStatement() mostly for nice parameterisation, but
> postgres produces a suboptimal plan due to its lack of information when the
> statement is prepared.

Postgres has gotten incrementally smarter about this, but at the end
of the day it's just doing what the JDBC driver tells it to do.  One
thing you can do is disable server-side prepared statements with the
prepareThreshold=0 decoration on the JDBC URL; give that a whirl and
see how it turns out.
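
For example, a URL of the form
jdbc:postgresql://localhost:5432/test?prepareThreshold=0 (host and
database are placeholders) applies that setting to the whole connection.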

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] REINDEX not working for wastedspace

2011-09-21 Thread Shaun Thomas

On 09/21/2011 11:20 AM, Venkat Balaji wrote:


It is very important to remove it from the WIKI page.


Removing it is a little premature. :) Definitely going to add a warning 
about relying on its output, though. The query itself was created and 
integrated into the check_postgres.pl nagios plugin as a very, very 
gross estimate of bloated tables.


It wasn't the most accurate thing in the world, but considering what it 
had to work with, it did a pretty good job. Generally CLUSTER or VACUUM 
FULL would remove a table from the query output, but not always. It's 
those edge cases that cause problems.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com

__

See http://www.peak6.com/email-disclaimer/ for terms and conditions related to 
this email

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] REINDEX not working for wastedspace

2011-09-21 Thread Venkat Balaji
It is very important to remove it from the WIKI page.

I ran it on production PG 9.0; it does not error out, and it displays
numeric output.

I noticed that it is stated to work only up to PG 8.2 (as per the note on
the page).

Venkat

On Wed, Sep 21, 2011 at 8:25 PM, Shaun Thomas  wrote:

> On 09/21/2011 09:12 AM, Tom Lane wrote:
>
>  The PG wiki is editable by anyone who signs up for an account.  Feel
>> free to put in an appropriate disclaimer, or improve the sample
>> query.
>>
>
> Ah, well then. I do have an account, but thought there were more granular
> page restrictions than that. I may have to start wading into them when I see
> stuff like this. :)
>
>
> --
> Shaun Thomas
> OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
> 312-676-8870
> stho...@peak6.com
>
> __
>
> See http://www.peak6.com/email-disclaimer/ for terms and conditions related
> to this email
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


Re: [PERFORM] : Performance Improvement Strategy

2011-09-21 Thread Venkat Balaji
Thank you very much for your detailed explanation !

I will be working on our existing "auto-vacuuming" strategy to see
if that's optimal. But, we do have VACUUM VERBOSE ANALYZE running at the
cluster level every day and auto-vacuum is aggressive for highly active
tables.

Today, we have vacuumed a 10GB table and the table size decreased to 5 GB.

I understand that it would be very expensive for the table to reclaim the
space back from the filesystem. We have decided to do the maintenance
(with downtime) after a thorough analysis; our databases have not been
subjected to any kind of maintenance activity for 2 years.

As a DBA, I suggested performing VACUUM FULL and RE-INDEXING + ANALYZE to
ensure that IO performance and indexing performance would be good and the
PG optimizer would pick the optimal plan. As said earlier, our databases
have not had any re-organization in 2 years and are highly transactional
databases. I believe that performing VACUUM FULL and RE-INDEXING would
leave the rows tightly packed in every page and ensure good IO.

I might have not put across the explanation in an understandable manner.

Please help me know the following -

1. When is pg_stat_user_tables updated, and what does the information
show?
2. Will the information about dead rows and live rows vanish after VACUUM,
ANALYZE, or VACUUM FULL?

I am just preparing a monitoring system which would help us know the rate
of bloat and data generation on a daily basis.

Sorry for the long email !

Looking forward to your help!

Thanks
Venkat




On Wed, Sep 21, 2011 at 7:27 PM, Kevin Grittner  wrote:

> Shaun Thomas  wrote:
> > Venkat Balaji wrote:
>
> >> I see lot of free spaces or free pages in Tables and Indexes.
> >> But, I need to give an exact calculation on how much space will
> >> be reclaimed after VACUUM FULL and RE-INDEXING.
> >
> > Why?
>
> I've been wondering that, too.  And talking about the space being
> "reclaimed" seems to be at odds with your subject line.  The space
> is given up by the database engine to the file system free space,
> where reuse by the database will be much more expensive.  For good
> performance you want some free space in the tables and indexes,
> where it can be allocated to new tuples without going out through OS
> calls to the file system.
>
> Clearly, if free space gets higher than necessary to support
> creation of new tuples, it can start to harm performance, and you
> may need to take aggressive action (such as CLUSTER) to reclaim it;
> but any time you find it necessary to do *that* you should be
> investigating what went wrong to put you in such a spot.  Either
> your autovacuum is (as Shaun suggested) not aggressive enough, or
> you have some long running transaction (possibly "idle in
> transaction") which is preventing vacuums from doing their work
> effectively.  Investigating that is going to help more than
> calculating just how much space the database is going to give up to
> file system free space.
>
> -Kevin
>


Re: Show_database_bloat reliability? [was: Re: [PERFORM] REINDEX not working for wastedspace]

2011-09-21 Thread Greg Smith

On 09/21/2011 08:43 AM, Guillaume Cottenceau wrote:

AI Rumman  writes:

> Using the query http://wiki.postgresql.org/wiki/Show_database_bloat, I got the

Is this stuff to show database bloat reliable?


Only in that increases and decreases of the number reported can be useful 
for determining whether bloat is likely increasing or decreasing--which is 
the purpose of that query.  The value returned is a rough estimate, and 
should not be considered useful as any sort of absolute measurement.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow query with self-join, group by, 100m rows

2011-09-21 Thread Tom Lane
Thomas Kappler  writes:
> The query we want to do is (with example values):

> select t.cid, count(distinct t1.subject_id)
> from termindex t1, termindex t2
> where t1.cid=20642 and t1.indextype=636 and t2.indextype=636 and
> t2.subject_id=t1.subject_id
> group by t2.cid;

The EXPLAIN output you provided doesn't appear to match this query (in
particular, I don't see the indextype restrictions being checked
anyplace in the plan).

One quick-and-dirty thing that might help is to raise work_mem enough
so that (1) you get a hash aggregation not a sort/group one, and (2)
if there are still any sorts being done, they don't spill to disk.
That will probably be a higher number than would be prudent to install
as a global setting, but you could SET it locally in the current
session before issuing the expensive query.
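
For example (the value is purely illustrative):

SET work_mem = '256MB';   -- affects only the current session
-- run the expensive query here, then:
RESET work_mem;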

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] REINDEX not working for wastedspace

2011-09-21 Thread Shaun Thomas

On 09/21/2011 09:12 AM, Tom Lane wrote:


The PG wiki is editable by anyone who signs up for an account.  Feel
free to put in an appropriate disclaimer, or improve the sample
query.


Ah, well then. I do have an account, but thought there were more 
granular page restrictions than that. I may have to start wading into 
them when I see stuff like this. :)


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com

__

See http://www.peak6.com/email-disclaimer/ for terms and conditions related to 
this email

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow query with self-join, group by, 100m rows

2011-09-21 Thread Robert Klemme
On Tue, Sep 20, 2011 at 7:43 PM, Thomas Kappler  wrote:
> [please CC, I'm not on the list]
>
> Hi all,
>
> we have one table that basically uses Postgres as a key-value store.
>
>     Table "public.termindex"
>    Column   |  Type   | Modifiers
> ------------+---------+-----------
>  subject_id | integer |
>  indextype  | integer |
>  cid        | integer |
>
> This is with Postgres 9.0.
>
> The table has 96 million rows and an index on each column. It contains
> no NULLs and has no triggers.
>
> subject_id has about 2m distinct values, cid about 200k, and indextype only 
> six.
>
> The table is *read-only* after the initial load.
>
> The query we want to do is (with example values):
>
> select t.cid, count(distinct t1.subject_id)
> from termindex t1, termindex t2
> where t1.cid=20642 and t1.indextype=636 and t2.indextype=636 and
> t2.subject_id=t1.subject_id
> group by t2.cid;

Do you have any multi-column indexes?  From the text of your query it
seems it could benefit from these two indexes:

(cid, indextype)
(subject_id, indextype)

I do not know whether PostgreSQL can avoid touching the table if you make
the first index
(cid, indextype, subject_id)
in other words, append all the columns needed for the join.  In theory
the query could then be satisfied from the indexes alone.
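
A sketch of those indexes (the index names are hypothetical):

CREATE INDEX termindex_cid_type_subj_idx
    ON termindex (cid, indextype, subject_id);
CREATE INDEX termindex_subj_type_idx
    ON termindex (subject_id, indextype);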

> Pasting the explain analyze output into
> http://explain.depesz.com/s/Yr4 we see that Postgres is doing an
> external sort using about 150MB of data.
>
> Now, we're not Postgres experts, or even great at relational design.
> Are there better ways of doing that query, or designing the table? For
> the latter we do have a number of constraints, though, that I don't
> want to outline now because this mail is already long enough.

Those are probably important to know.

Kind regards

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Prepared statements and suboptimal plans

2011-09-21 Thread Grzegorz Jaśkiewicz
One thing: in SUM() you don't have to coalesce, since SUM ignores NULLs.
Consider the following example:

foo=# create table bar(id serial primary key, a float);
NOTICE:  CREATE TABLE will create implicit sequence "bar_id_seq" for
serial column "bar.id"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"bar_pkey" for table "bar"
CREATE TABLE
Time: 666.094 ms
foo=# insert into bar(a) select random()*random()*random() from
generate_series(1, 1000) x;
INSERT 0 1000
Time: 496.451 ms
foo=# update bar set a = NULL where random() < 0.1;
UPDATE 97
Time: 150.599 ms
foo=# select sum(a) from bar;
       sum
------------------
 108.757220804033
(1 row)

Time: 277.227 ms
foo=# select sum(coalesce(a, 0.0)) from bar;
       sum
------------------
 108.757220804033
(1 row)

Time: 0.709 ms


But that obviously isn't going to improve it a lot.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Constraint exclusion on UNION ALL subqueries with WHERE conditions

2011-09-21 Thread Gunnlaugur Þór Briem
On Monday, September 19, 2011 3:59:30 AM UTC, Tom Lane wrote:
> Works for me in 8.4.8.  Do you have constraint_exclusion set to ON?

I did try with constraint_exclusion set to on, though the docs suggest 
partition should be enough ("examine constraints only for ... UNION ALL 
subqueries")

Here's a minimal test case (which I should have supplied in the original post, 
sorry), tried just now in 8.4.8:

CREATE OR REPLACE VIEW v_heavy_view
AS SELECT (random()*1e5)::integer col
FROM generate_series(1, 1e6::integer);

CREATE OR REPLACE VIEW v_test_constraint_exclusion AS
SELECT col FROM v_heavy_view WHERE col < 3
UNION ALL SELECT col FROM v_heavy_view WHERE col >= 3;

EXPLAIN SELECT * FROM v_test_constraint_exclusion WHERE col=2;

                                QUERY PLAN
--------------------------------------------------------------------------
 Result  (cost=0.00..70.04 rows=4 width=4)
   ->  Append  (cost=0.00..70.04 rows=4 width=4)
 ->  Subquery Scan v_heavy_view  (cost=0.00..35.00 rows=2 width=4)
   Filter: ((v_heavy_view.col < 3) AND (v_heavy_view.col = 2))
   ->  Function Scan on generate_series  (cost=0.00..20.00 
rows=1000 width=0)
 ->  Subquery Scan v_heavy_view  (cost=0.00..35.00 rows=2 width=4)
   Filter: ((v_heavy_view.col >= 3) AND (v_heavy_view.col = 2))
   ->  Function Scan on generate_series  (cost=0.00..20.00 
rows=1000 width=0)

I want the planner to notice that (v_heavy_view.col >= 3) AND (v_heavy_view.col 
= 2) can never be satisfied, and skip that subquery.

Regards,

- Gulli

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Slow query with self-join, group by, 100m rows

2011-09-21 Thread Thomas Kappler
[please CC, I'm not on the list]

Hi all,

we have one table that basically uses Postgres as a key-value store.

 Table "public.termindex"
Column   |  Type   | Modifiers
-+-+---
 subject_id | integer |
 indextype  | integer |
 cid| integer |

This is with Postgres 9.0.

The table has 96 million rows and an index on each column. It contains
no NULLs and has no triggers.

subject_id has about 2m distinct values, cid about 200k, and indextype only six.

The table is *read-only* after the initial load.

The query we want to do is (with example values):

select t.cid, count(distinct t1.subject_id)
from termindex t1, termindex t2
where t1.cid=20642 and t1.indextype=636 and t2.indextype=636 and
t2.subject_id=t1.subject_id
group by t2.cid;

The goal is to select all subjects matching certain criteria, and for
all cids (concept ids) that they have, determine the distribution over
the whole population, for a given indextype.

For instance, if the criteria are "cid=1 and indextype=636", let's say
subjects 1,2,3 match. The indextype we want the distribution for is
999. So we get all cids where subject_id in (1,2,3) and indextype=999,
group these by cid with count(subject_id) per cid. The result would
look like

 cid | count
-----+-------
  12 |     1
  13 |    28
...

Another way of asking this query is with an inner select:

select cid, count(subject_id) from termindex where subject_id in
(select subject_id from termindex where cid=18869 and indextype=636)
and indextype=636 group by cid;

On this 96m rows table, the join query takes about 50 seconds. EXPLAIN
ANALYZE output below. The inner select query takes much longer.

Pasting the explain analyze output into
http://explain.depesz.com/s/Yr4 we see that Postgres is doing an
external sort using about 150MB of data.

Now, we're not Postgres experts, or even great at relational design.
Are there better ways of doing that query, or designing the table? For
the latter we do have a number of constraints, though, that I don't
want to outline now because this mail is already long enough.

Thanks in advance!
Thomas


                                QUERY PLAN
---------------------------------------------------------------------------
 GroupAggregate  (cost=446092576.21..459395622.23 rows=200 width=8)
(actual time=18927.047..19001.072 rows=2562 loops=1)
   ->  Sort  (cost=446092576.21..450526924.05 rows=1773739136 width=8)
(actual time=18927.025..18952.726 rows=119429 loops=1)
 Sort Key: t.cid
 Sort Method:  external merge  Disk: 2088kB
 ->  Merge Join  (cost=1480064.68..28107420.08 rows=1773739136
width=8) (actual time=14300.547..18836.386 rows=119429 loops=1)
   Merge Cond: (t1.subject_id = t.subject_id)
   ->  Sort  (cost=44173.64..44278.93 rows=42116 width=4)
(actual time=30.148..33.965 rows=14466 loops=1)
 Sort Key: t1.subject_id
 Sort Method:  external merge  Disk: 200kB
 ->  Bitmap Heap Scan on mcindex t1
(cost=791.57..40361.19 rows=42116 width=4) (actual time=3.901..18.655
rows=14466 loops=1)
   Recheck Cond: (cid = 20642)
   ->  Bitmap Index Scan on mc2
(cost=0.00..781.04 rows=42116 width=0) (actual time=3.319..3.319
rows=14466 loops=1)
 Index Cond: (cid = 20642)
   ->  Materialize  (cost=1435891.04..1478006.60
rows=8423113 width=8) (actual time=14270.211..17554.299 rows=8423170
loops=1)
 ->  Sort  (cost=1435891.04..1456948.82
rows=8423113 width=8) (actual time=14270.202..16346.835 rows=8423094
loops=1)
   Sort Key: t.subject_id
   Sort Method:  external merge  Disk: 148232kB
   ->  Seq Scan on mcindex t
(cost=0.00..121502.13 rows=8423113 width=8) (actual
time=0.012..1381.282 rows=8423113 loops=1)
 Total runtime: 22095.280 ms
(19 rows)

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Prepared statements and suboptimal plans

2011-09-21 Thread Royce Ausburn
Hi all,

It looks like I've been hit with this well known issue.  I have a complicated 
query that is intended to run every few minutes, I'm using JDBC's 
Connection.prepareStatement() mostly for nice parameterisation, but postgres 
produces a suboptimal plan due to its lack of information when the statement is 
prepared.

I've been following the mailing list for a few years and I've seen this topic 
come up a bit.  I've just done a quick google and I'm not quite sure how to fix 
this short of manually substituting my query parameters in to a query string -- 
avoiding prepared statements… An alternative might be to re-write the query and 
hope that the planner's general plan is a bit closer to optimal… but are these 
my only options?  

I notice that the non-prepared-statement (both below my sig) plan estimates 
5500 rows output.  I think that's out by a factor of up to 100, suggesting that 
I might want to increase my statistics and re-analyse… but as I understand the 
prepared-statement problem, this probably won't help here.  Correct?

We've been worst hit by this query on an 8.3 site.  Another site is running 
8.4.  Have there been improvements in this area recently?  Upgrading to 9.0 
might be viable for us.

Any tips would be appreciated,

--Royce


test=# PREPARE test (integer) as 
  select 
sid, 
role, 
starttime::date, 
nasid, importer, 
max(eventbinding.biid) as biid, 
sum(bytesin) as bytesin, 
sum(bytesout) as bytesout,  
sum(seconds) as seconds, 
sum(coalesce(pages, 0)) as pages, 
sum(coalesce(count, 0)) as count,  
sum(coalesce(rate, 0.0)) as rate, 
sum(coalesce(bytesSentRate, 0.0)) as bytesSentRate,  
sum(coalesce(bytesReceivedRate, 0.0)) as bytesReceivedRate, 
count(*) as entries  
  from billingItem, eventBinding , fqun  
  where eventBinding.biid > $1 and eventBinding.biid = billingItem.biid and 
fqun.uid = eventBinding.uid  
  group by sid, starttime::date, nasid, importer, role;
PREPARE
test=# explain EXECUTE test(57205899);
  QUERY PLAN
  
---
 GroupAggregate  (cost=12338998.78..13770049.38 rows=18465169 width=148)
   ->  Sort  (cost=12338998.78..12385161.70 rows=18465169 width=148)
 Sort Key: fqun.sid, ((billingitem.starttime)::date), 
billingitem.nasid, billingitem.importer, eventbinding.role
 ->  Hash Join  (cost=1498473.48..7333418.55 rows=18465169 width=148)
   Hash Cond: (eventbinding.uid = fqun.uid)
   ->  Hash Join  (cost=1496916.06..6916394.83 rows=18465169 
width=148)
 Hash Cond: (billingitem.biid = eventbinding.biid)
 ->  Seq Scan on billingitem  (cost=0.00..1433087.88 
rows=56222688 width=142)
 ->  Hash  (cost=1175939.45..1175939.45 rows=18465169 
width=10)
   ->  Bitmap Heap Scan on eventbinding  
(cost=427409.84..1175939.45 rows=18465169 width=10)
 Recheck Cond: (biid > $1)
 ->  Bitmap Index Scan on 
eventbinding_biid_uid_role_idx  (cost=0.00..422793.55 rows=18465169 width=0)
   Index Cond: (biid > $1)
   ->  Hash  (cost=943.85..943.85 rows=49085 width=8)
 ->  Seq Scan on fqun  (cost=0.00..943.85 rows=49085 
width=8)
(15 rows)




As a query on the psql command line:

test=# explain 
  select 
sid, 
role, 
starttime::date, 
nasid, 
importer, 
max(eventbinding.biid) as biid, 
sum(bytesin) as bytesin, 
sum(bytesout) as bytesout,  
sum(seconds) as seconds, 
sum(coalesce(pages, 0)) as pages, 
sum(coalesce(count, 0)) as count,  
sum(coalesce(rate, 0.0)) as rate, 
sum(coalesce(bytesSentRate, 0.0)) as bytesSentRate,  
sum(coalesce(bytesReceivedRate, 0.0)) as bytesReceivedRate, 
count(*) as entries  
  from billingItem, eventBinding , fqun  
  where eventBinding.biid > 57205899 and eventBinding.biid = billingItem.biid 
and fqun.uid = eventBinding.uid  
  group by sid, starttime::date, nasid, importer, role;
                            QUERY PLAN
------------------------------------------------------------------
 HashAggregate  (cost=102496.80..102704.55 rows=5540 width=148)
   ->  Hash Join  (cost=1697.13..102289.05 rows=5540 width=148)
 Hash Cond: (eventbinding.uid = fqun.uid)
 ->  Nested Loop  (cost=139.71..100606.99 rows=5540 width=148)
   ->  Bitmap Heap Scan on eventbinding  (cost=139.71..20547.20 
rows=5540 width=10)
 Recheck Cond: (biid > 57205899)
 ->  Bitmap Index Sc

[PERFORM] Re: [PERFORM] Postgres and Linux parameters on virtual machines

2011-09-21 Thread Claudio Freire
2011/9/21 Hellmuth Vargas :
> SAN
> VMWare
> CentOS 6-64bits
> PostgreSQL 9-64bits
> I have virtualized the machine to serve other services as well, so my
> question is: are the parameters and considerations laid out in the
> articles applicable to the virtualized setup, especially with regard
> to the Linux configuration? I thank you in advance for your opinion.

There isn't enough detail to say for sure, but the SAN almost certainly
changes things. As far as memory is concerned, nothing should change just
because it is a virtual machine. But everything disk-related does.

By the way, this list is in English.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] REINDEX not working for wastedspace

2011-09-21 Thread Andy Lester

On Sep 21, 2011, at 8:30 AM, Shaun Thomas wrote:

> I wish they would erase that Wiki page, or at least add a disclaimer.


The "they" that you refer to includes you.  It's a wiki.  You can write your 
own disclaimer.

xoa

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



Re: [PERFORM] REINDEX not working for wastedspace

2011-09-21 Thread Tom Lane
Shaun Thomas  writes:
> On 09/21/2011 02:01 AM, AI Rumman wrote:
>> Using the query http://wiki.postgresql.org/wiki/Show_database_bloat,
>> I got the following result for a table:

> God. I wish they would erase that Wiki page, or at least add a 
> disclaimer. That query is no better than a loose estimate. Never, ever, 
> ever depend on it for critical information about your tables. Ever.

The PG wiki is editable by anyone who signs up for an account.  Feel
free to put in an appropriate disclaimer, or improve the sample query.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] : Performance Improvement Strategy

2011-09-21 Thread Kevin Grittner
Shaun Thomas  wrote:
> Venkat Balaji wrote:
 
>> I see lot of free spaces or free pages in Tables and Indexes. 
>> But, I need to give an exact calculation on how much space will
>> be reclaimed after VACUUM FULL and RE-INDEXING.
> 
> Why?
 
I've been wondering that, too.  And talking about the space being
"reclaimed" seems to be at odds with your subject line.  The space
is given up by the database engine to the file system free space,
where reuse by the database will be much more expensive.  For good
performance you want some free space in the tables and indexes,
where it can be allocated to new tuples without going out through OS
calls to the file system.
 
Clearly, if free space gets higher than necessary to support
creation of new tuples, it can start to harm performance, and you
may need to take aggressive action (such as CLUSTER) to reclaim it;
but any time you find it necessary to do *that* you should be
investigating what went wrong to put you in such a spot.  Either
your autovacuum is (as Shaun suggested) not aggressive enough, or
you have some long running transaction (possibly "idle in
transaction") which is preventing vacuums from doing their work
effectively.  Investigating that is going to help more than
calculating just how much space the database is going to give up to
file system free space.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Postgres and Linux parameters on virtual machines

2011-09-21 Thread Hellmuth Vargas
Good day,

Based on the articles published in the first edition of PostgreSQL
Magazine (http://pgmag.org/00/read) ("Performance Tuning PostgreSQL" and
"Tuning Linux for PostgreSQL"), I set out to implement them on my
Postgres server, but I have the following architecture:

SAN
VMWare
CentOS 6-64bits
PostgreSQL 9-64bits

I have virtualized the machine to serve other services as well, so my
question is: are the parameters and considerations laid out in the
articles applicable to the virtualized setup, especially with regard to
the Linux configuration? I thank you in advance for your opinion.

-- 
Cordially,

Ing. Hellmuth I. Vargas S.
Bogotá, Colombia


Re: [PERFORM] REINDEX not working for wastedspace

2011-09-21 Thread Shaun Thomas

On 09/21/2011 02:01 AM, AI Rumman wrote:


Using the query http://wiki.postgresql.org/wiki/Show_database_bloat,
I got the following result for a table:


God. I wish they would erase that Wiki page, or at least add a 
disclaimer. That query is no better than a loose estimate. Never, ever, 
ever depend on it for critical information about your tables. Ever.


Ever.

With that said, there are a lot of ways to get the information you want. 
One is the pgstattuple contrib module, the other is the pg_freespacemap 
contrib module (though that didn't get really useful until 8.4 and above).


Check out those documentation pages for usage info. More importantly, 
ignore the results of that query. It's wrong. If you've just reindexed, 
those indexes are about as small as they're ever going to be.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com

__

See http://www.peak6.com/email-disclaimer/ for terms and conditions related to 
this email

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] : Performance Improvement Strategy

2011-09-21 Thread Shaun Thomas

On 09/20/2011 11:22 AM, Venkat Balaji wrote:


Please help me understand how to calculate free space in Tables and
Indexes even after vacuuming and analyzing is performed.


Besides the query Mark gave you using freespacemap, there's also the 
pgstattuple contrib module. You'd use it like this:


SELECT pg_size_pretty(free_space) AS mb_free
  FROM pgstattuple('some_table');

Query must be run as a super-user, and I wouldn't recommend running it 
on huge tables, since it scans the actual data files to get its 
information. There's a lot of other useful information in that function, 
such as the number of dead rows.
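
For example, a sketch pulling a few of those columns (same superuser
caveat as above):

SELECT table_len, dead_tuple_count, dead_tuple_percent, free_space
  FROM pgstattuple('some_table');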



What i understand is that, even if we perform VACUUM ANALYZE
regularly, the free space generated is not filled up.


VACUUM does not actually generate free space. It locates and marks 
reusable tuples. Any future updates or inserts on that table will be put 
in those newly reclaimed spots, instead of being bolted onto the end of 
the table.



I see lot of free spaces or free pages in Tables and Indexes. But, I
need to give an exact calculation on how much space will be reclaimed
after VACUUM FULL and RE-INDEXING.


Why? If your database is so desperate for space, VACUUM and REINDEX 
won't really help you. A properly maintained database will still have a 
certain amount of "bloat" equal to the number of rows that change 
between maintenance intervals. One way or another, that space is going 
to be used by *something*.


It sounds more like you need to tweak your autovacuum settings to be 
more aggressive if you're seeing significant enough turnover that your 
tables are bloating significantly. One of our tables, for instance, gets 
vacuumed more than once per hour because it experiences 1,000% turnover 
daily.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com

__

See http://www.peak6.com/email-disclaimer/ for terms and conditions related to 
this email

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PG 9 adminstrations

2011-09-21 Thread Shaun Thomas

On 09/20/2011 11:29 PM, Hany ABOU-GHOURY wrote:


Thanks, but... that did not work; different error though:

ERROR: relation "history" already exists
ERROR: relation "idx_history_pagegroupid" already exists
ERROR: constraint "cdocumentid" for relation "history" already exists


Clearly the history table already exists on your second host. This may 
or may not matter, if the table definitions are the same. If the table 
on host2 was empty, it'll be filled with the data from host1. If not, 
you'll need to truncate the table on host2 first. If you don't want the 
complaints about tables or indexes or constraints that already exist, 
use the -a option for data-only dumps.


Also, might I suggest using the pgsql-novice list? They're more likely 
to help with general issues like this.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com

__

See http://www.peak6.com/email-disclaimer/ for terms and conditions related to 
this email

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Show_database_bloat reliability? [was: Re: [PERFORM] REINDEX not working for wastedspace]

2011-09-21 Thread Guillaume Cottenceau
AI Rumman  writes:

> Using the query http://wiki.postgresql.org/wiki/Show_database_bloat, I got the

Is this stuff to show database bloat reliable? After a VACUUM
FULL of the table reported as the top responsible of bloat,
performing the same request again still gives the same result
(still that table is the top responsible of bloat):

 current_database | schemaname | tablename | tbloat | wastedbytes |     iname     | ibloat | wastedibytes
------------------+------------+-----------+--------+-------------+---------------+--------+--------------
 test             | public     | requests  |    1.1 |    14565376 | requests_pkey |    0.4 |            0
 test             | public     | requests  |    1.1 |    14565376 | idx_whatever  |    0.8 |            0
 test             | public     | requests  |    1.1 |    14565376 | idx_whatever2 |    0.6 |            0
...

A few investigations show that when tbloat is close to 1.0 it seems
unreliable; otherwise it seems useful.

pg 8.4.7

-- 
Guillaume Cottenceau

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] REINDEX not working for wastedspace

2011-09-21 Thread Reid Thompson
On Wed, 2011-09-21 at 13:01 +0600, AI Rumman wrote:
> I am using Postgresql 9.0.1.
> 

> I REINDEXED  both the indexes and table, but I did not find any change
> in wastedspace or wastedispace.
> Could you please tell me why?

you need to 

1) either vacuum full or cluster the table
2) analyze the table
3) check bloat again
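
For example, a sketch using the _attachments table from this thread:

CLUSTER _attachments USING _attachments_pkey;  -- or: VACUUM FULL _attachments;
ANALYZE _attachments;
-- then re-run the bloat query and compare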


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] REINDEX not working for wastedspace

2011-09-21 Thread AI Rumman
Yes, I ANALYZEd the table, but there was no change in wastedispace.

On Wed, Sep 21, 2011 at 1:06 PM, Guillaume Lelarge
wrote:

> On Wed, 2011-09-21 at 13:01 +0600, AI Rumman wrote:
> > I am using Postgresql 9.0.1.
> >
> > Using the query http://wiki.postgresql.org/wiki/Show_database_bloat, I
> got
> > the following result for a table:
> >
> > -[ RECORD 1 ]+---
> > current_database | crm
> > schemaname   | public
> > tablename| _attachments
> > tbloat   | 0.9
> > wastedbytes  | 0
> > iname| attachments_description_type_attachmentsid_idx
> > ibloat   | 2.3
> > wastedibytes | 5439488
> > -[ RECORD 2 ]+---
> > current_database | crm
> > schemaname   | public
> > tablename| _attachments
> > tbloat   | 0.9
> > wastedbytes  | 0
> > iname| attachments_attachmentsid_idx
> > ibloat   | 0.2
> > wastedibytes | 0
> > -[ RECORD 3 ]+---
> > current_database | crm
> > schemaname   | public
> > tablename| _attachments
> > tbloat   | 0.9
> > wastedbytes  | 0
> > iname| _attachments_pkey
> > ibloat   | 0.2
> > wastedibytes | 0
> >
> > I REINDEXED  both the indexes and table, but I did not find any change in
> > wastedspace or wastedispace.
> > Could you please tell me why?
>
> REINDEX only rebuilds indexes. And you'll obviously have a bit of "lost
> space" because of the FILLFACTOR value (90% on indexes IIRC).
>
>
> --
> Guillaume
>  http://blog.guillaume.lelarge.info
>  http://www.dalibo.com
>
>


Re: [PERFORM] REINDEX not working for wastedspace

2011-09-21 Thread Venkat Balaji
Could you please let us know if you have analyzed after the re-indexing
is done?

That should show differences only for the indexes, not the tables.

For tables, you need to do VACUUM FULL to see the difference.

Thanks
Venkat

On Wed, Sep 21, 2011 at 12:31 PM, AI Rumman  wrote:

> I am using Postgresql 9.0.1.
>
> Using the query http://wiki.postgresql.org/wiki/Show_database_bloat, I got
> the following result for a table:
>
> -[ RECORD 1 ]+---
> current_database | crm
> schemaname   | public
> tablename| _attachments
> tbloat   | 0.9
> wastedbytes  | 0
> iname| attachments_description_type_attachmentsid_idx
> ibloat   | 2.3
> wastedibytes | 5439488
> -[ RECORD 2 ]+---
> current_database | crm
> schemaname   | public
> tablename| _attachments
> tbloat   | 0.9
> wastedbytes  | 0
> iname| attachments_attachmentsid_idx
> ibloat   | 0.2
> wastedibytes | 0
> -[ RECORD 3 ]+---
> current_database | crm
> schemaname   | public
> tablename| _attachments
> tbloat   | 0.9
> wastedbytes  | 0
> iname| _attachments_pkey
> ibloat   | 0.2
> wastedibytes | 0
>
> I REINDEXED  both the indexes and table, but I did not find any change in
> wastedspace or wastedispace.
> Could you please tell me why?
>


Re: [PERFORM] REINDEX not working for wastedspace

2011-09-21 Thread Guillaume Lelarge
On Wed, 2011-09-21 at 13:01 +0600, AI Rumman wrote:
> I am using Postgresql 9.0.1.
> 
> Using the query http://wiki.postgresql.org/wiki/Show_database_bloat, I got
> the following result for a table:
> 
> -[ RECORD 1 ]+---
> current_database | crm
> schemaname   | public
> tablename| _attachments
> tbloat   | 0.9
> wastedbytes  | 0
> iname| attachments_description_type_attachmentsid_idx
> ibloat   | 2.3
> wastedibytes | 5439488
> -[ RECORD 2 ]+---
> current_database | crm
> schemaname   | public
> tablename| _attachments
> tbloat   | 0.9
> wastedbytes  | 0
> iname| attachments_attachmentsid_idx
> ibloat   | 0.2
> wastedibytes | 0
> -[ RECORD 3 ]+---
> current_database | crm
> schemaname   | public
> tablename| _attachments
> tbloat   | 0.9
> wastedbytes  | 0
> iname| _attachments_pkey
> ibloat   | 0.2
> wastedibytes | 0
> 
> I REINDEXED  both the indexes and table, but I did not find any change in
> wastedspace or wastedispace.
> Could you please tell me why?

REINDEX only rebuilds indexes. And you'll obviously have a bit of "lost
space" because of the FILLFACTOR value (90% on indexes IIRC).


-- 
Guillaume
  http://blog.guillaume.lelarge.info
  http://www.dalibo.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] REINDEX not working for wastedspace

2011-09-21 Thread AI Rumman
I am using Postgresql 9.0.1.

Using the query http://wiki.postgresql.org/wiki/Show_database_bloat, I got
the following result for a table:

-[ RECORD 1 ]+---
current_database | crm
schemaname   | public
tablename| _attachments
tbloat   | 0.9
wastedbytes  | 0
iname| attachments_description_type_attachmentsid_idx
ibloat   | 2.3
wastedibytes | 5439488
-[ RECORD 2 ]+---
current_database | crm
schemaname   | public
tablename| _attachments
tbloat   | 0.9
wastedbytes  | 0
iname| attachments_attachmentsid_idx
ibloat   | 0.2
wastedibytes | 0
-[ RECORD 3 ]+---
current_database | crm
schemaname   | public
tablename| _attachments
tbloat   | 0.9
wastedbytes  | 0
iname| _attachments_pkey
ibloat   | 0.2
wastedibytes | 0

I REINDEXED  both the indexes and table, but I did not find any change in
wastedspace or wastedispace.
Could you please tell me why?