Re: [PERFORM] 7k records into Sort node, 4.5m out?

2012-08-16 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 SELECT COUNT(*)
 FROM user
 INNER JOIN house
   ON (user.house_id = house.id)
 LEFT OUTER JOIN district
   ON (house.district_id = district.id)
 WHERE (user.status = 0
   AND (district.update_status = 2
        OR district.update_status = 3)
   AND (user.valid = 1
        OR user.valid = 3)
   AND district.is_test = false);

 However, since the anonymization above doesn't quite match that used in
 the EXPLAIN plan, I'm not sure what you'll get out of it.  And yes, we
 know that the outer join is being invalidated.

Ah, I see where I was confused: in the original query plan I'd been
imagining that charlie.sierra was a unique column, but your gloss on
that as being house.district_id implies that it's highly non-unique.
And looking at the rowcounts in the original plan backs that up:
there are about 600 house rows per district row.  So my thought of
having district as the outer side of a nestloop scanning the index
on house.district_id would not really work very well --- maybe it
would end up cheaper than the mergejoin plan, but it's far from a
clear-cut win.

On the whole I'm thinking the code is operating as designed here.

regards, tom lane




Re: [PERFORM] High Disk write and space taken by PostgreSQL

2012-08-16 Thread Bruce Momjian
On Thu, Aug 16, 2012 at 03:48:57PM +1000, Ondrej Ivanič wrote:
 Hi,
 
 On 16 August 2012 15:40, J Ramesh Kumar rameshj1...@gmail.com wrote:
  As you said, MySQL with MyISAM is a better choice for my app, because I
  don't need transactions/backups. Maybe I'll try InnoDB and compare the
  disk write/space difference. Is there any similar method available in
  PostgreSQL, like the MyISAM engine?
 
 You can try unlogged tables:
 http://www.postgresql.org/docs/9.1/static/sql-createtable.html
 
 If specified, the table is created as an unlogged table. Data written
 to unlogged tables is not written to the write-ahead log (see Chapter
 29), which makes them considerably faster than ordinary tables.
 However, they are not crash-safe: an unlogged table is automatically
 truncated after a crash or unclean shutdown. The contents of an
 unlogged table are also not replicated to standby servers. Any indexes
 created on an unlogged table are automatically unlogged as well;
 however, unlogged GiST indexes are currently not supported and cannot
 be created on an unlogged table.

I would set full_page_writes = off too.
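
For illustration, a minimal sketch of the unlogged-table approach from the
quoted documentation (table and column names are hypothetical):

  -- Unlogged tables skip WAL entirely, at the cost of being truncated
  -- after a crash or unclean shutdown.
  CREATE UNLOGGED TABLE sensor_log (
      device_id  int         NOT NULL,
      reading    smallint    NOT NULL,
      logged_at  timestamptz NOT NULL DEFAULT now()
  );

Everything else about the table behaves like ordinary SQL; only durability
and replication are given up.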

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB   http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [PERFORM] High Disk write and space taken by PostgreSQL

2012-08-16 Thread Bruce Momjian
On Thu, Aug 16, 2012 at 10:53:21AM -0400, Bruce Momjian wrote:
 On Thu, Aug 16, 2012 at 03:48:57PM +1000, Ondrej Ivanič wrote:
  Hi,
  
  On 16 August 2012 15:40, J Ramesh Kumar rameshj1...@gmail.com wrote:
   As you said, MySQL with MyISAM is a better choice for my app, because I
   don't need transactions/backups. Maybe I'll try InnoDB and compare the
   disk write/space difference. Is there any similar method available in
   PostgreSQL, like the MyISAM engine?
  
  You can try unlogged tables:
  http://www.postgresql.org/docs/9.1/static/sql-createtable.html
  
  If specified, the table is created as an unlogged table. Data written
  to unlogged tables is not written to the write-ahead log (see Chapter
  29), which makes them considerably faster than ordinary tables.
  However, they are not crash-safe: an unlogged table is automatically
  truncated after a crash or unclean shutdown. The contents of an
  unlogged table are also not replicated to standby servers. Any indexes
  created on an unlogged table are automatically unlogged as well;
  however, unlogged GiST indexes are currently not supported and cannot
  be created on an unlogged table.
 
 I would set full_page_writes = off too.

Better yet, read our documentation about non-durable settings:

http://www.postgresql.org/docs/9.1/static/non-durability.html

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB   http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [PERFORM] High Disk write and space taken by PostgreSQL

2012-08-16 Thread Scott Marlowe
On Wed, Aug 15, 2012 at 11:40 PM, J Ramesh Kumar rameshj1...@gmail.com wrote:
 Dear Scott Marlowe,

 Thanks for the details.

 As you said, MySQL with MyISAM is a better choice for my app, because I
 don't need transactions/backups.

That's not exactly what I said.  Remember that if you need to run
complex queries, PostgreSQL is still likely the better candidate.

 Maybe I'll try InnoDB and compare the disk write/space difference. Is
 there any similar method available in PostgreSQL, like the MyISAM engine?

Unlogged tables, as mentioned by others.




Re: [PERFORM] cluster on conditional index?

2012-08-16 Thread Doug Hunley
On Wed, Aug 15, 2012 at 5:19 PM, Bosco Rama postg...@boscorama.com wrote:
 On 08/15/12 14:05, Josh Berkus wrote:

 That actually makes sense to me. Cluster the rows covered by that
 index, let the rest fall where they may. I'm typically only accessing
 the rows covered by that index, so I'd get the benefit of the cluster
 command but wouldn't have to spend cycles doing the cluster for rows I
 don't care about.

 Sure, that's a feature request though.  And thinking about it, I'm
 willing to bet that it's far harder to implement than it sounds.

How/where does one file feature requests?


 In the meantime, you could ad-hoc this by splitting the table into two
 partitions and clustering one of the two partitions.

 Wouldn't creating a second index on the boolean itself and then clustering
 on that be much easier?

that's what I was looking into doing, actually

-- 
Douglas J Hunley (doug.hun...@gmail.com)
Twitter: @hunleyd   Web: douglasjhunley.com
G+: http://goo.gl/sajR3




Re: [PERFORM] High Disk write and space taken by PostgreSQL

2012-08-16 Thread anara...@anarazel.de


Bruce Momjian br...@momjian.us wrote:

On Thu, Aug 16, 2012 at 03:48:57PM +1000, Ondrej Ivanič wrote:
 Hi,
 
 On 16 August 2012 15:40, J Ramesh Kumar rameshj1...@gmail.com wrote:
  As you said, MySQL with MyISAM is a better choice for my app, because I
  don't need transactions/backups. Maybe I'll try InnoDB and compare the
  disk write/space difference. Is there any similar method available in
  PostgreSQL, like the MyISAM engine?
 
 You can try unlogged tables:
 http://www.postgresql.org/docs/9.1/static/sql-createtable.html
 
 If specified, the table is created as an unlogged table. Data written
 to unlogged tables is not written to the write-ahead log (see Chapter
 29), which makes them considerably faster than ordinary tables.
 However, they are not crash-safe: an unlogged table is automatically
 truncated after a crash or unclean shutdown. The contents of an
 unlogged table are also not replicated to standby servers. Any indexes
 created on an unlogged table are automatically unlogged as well;
 however, unlogged GiST indexes are currently not supported and cannot
 be created on an unlogged table.

I would set full_page_writes = off too.
Why? There shouldn't be any such writes on unlogged tables.

Andres

Please excuse the brevity and formatting - I am writing this on my mobile phone.




Re: [PERFORM] cluster on conditional index?

2012-08-16 Thread Jeff Janes
On Wed, Aug 15, 2012 at 2:19 PM, Bosco Rama postg...@boscorama.com wrote:
 On 08/15/12 14:05, Josh Berkus wrote:

 That actually makes sense to me. Cluster the rows covered by that
 index, let the rest fall where they may. I'm typically only accessing
 the rows covered by that index, so I'd get the benefit of the cluster
 command but wouldn't have to spend cycles doing the cluster for rows I
 don't care about.

 Sure, that's a feature request though.  And thinking about it, I'm
 willing to bet that it's far harder to implement than it sounds.

 In the meantime, you could ad-hoc this by splitting the table into two
 partitions and clustering one of the two partitions.

 Wouldn't creating a second index on the boolean itself and then clustering
 on that be much easier?

I would take an existing useful index, and build a new one on the same
columns but with is_deleted prepended.

That way, since you are going through the effort to rewrite the whole
table anyway, the ties are broken in a way that might be of further
use.

Once the CLUSTER is done, the index might even be useful enough to
keep around for use with queries, or even replace the original index
altogether.
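
A hedged sketch of that suggestion (index, table, and column names are
hypothetical):

  -- Prepend the partial-index predicate column to an existing index,
  -- then cluster on the new index so ties are broken in a useful order.
  CREATE INDEX foo_deleted_created_idx ON foo (is_deleted, created_at);
  CLUSTER foo USING foo_deleted_created_idx;
  ANALYZE foo;  -- refresh statistics after the table rewrite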

Cheers,

Jeff




Re: [PERFORM] High Disk write and space taken by PostgreSQL

2012-08-16 Thread Bruce Momjian
On Thu, Aug 16, 2012 at 06:07:26PM +0200, anara...@anarazel.de wrote:
 
 
 Bruce Momjian br...@momjian.us schrieb:
 
 On Thu, Aug 16, 2012 at 03:48:57PM +1000, Ondrej Ivanič wrote:
  Hi,
  
  On 16 August 2012 15:40, J Ramesh Kumar rameshj1...@gmail.com wrote:
   As you said, MySQL with MyISAM is a better choice for my app, because I
   don't need transactions/backups. Maybe I'll try InnoDB and compare the
   disk write/space difference. Is there any similar method available in
   PostgreSQL, like the MyISAM engine?
  
  You can try unlogged tables:
  http://www.postgresql.org/docs/9.1/static/sql-createtable.html
  
  If specified, the table is created as an unlogged table. Data written
  to unlogged tables is not written to the write-ahead log (see Chapter
  29), which makes them considerably faster than ordinary tables.
  However, they are not crash-safe: an unlogged table is automatically
  truncated after a crash or unclean shutdown. The contents of an
  unlogged table are also not replicated to standby servers. Any indexes
  created on an unlogged table are automatically unlogged as well;
  however, unlogged GiST indexes are currently not supported and cannot
  be created on an unlogged table.
 
 I would set full_page_writes = off too.
 Why? There shouldn't be any such writes on unlogged tables.

True.  I was thinking more of the logged tables and the system tables.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB   http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [PERFORM] Increasing WAL usage followed by sudden drop

2012-08-16 Thread delongboy
We are not doing anything to Postgres that would cause the rise and drop.
Database activity is pretty consistent, nor are we doing any kind of
purge.  This week the drop occurred after 6 days.  We think it must be
some kind of internal Postgres activity, but we can't track it down.







Re: [PERFORM] Increasing WAL usage followed by sudden drop

2012-08-16 Thread Kevin Grittner
delongboy sdel...@saucontech.com wrote:
 We are not doing anything to Postgres that would cause the rise
 and drop.  Database activity is pretty consistent, nor are we
 doing any kind of purge.  This week the drop occurred after 6
 days.  We think it must be some kind of internal Postgres
 activity, but we can't track it down.
 
Maybe autovacuum freezing tuples (which is a WAL-logged operation)
as tables periodically hit the autovacuum_freeze_max_age limit?
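
One way to check that hypothesis (a diagnostic sketch; 200 million is the
default autovacuum_freeze_max_age):

  -- Tables whose oldest unfrozen transaction ID approaches the limit
  -- will be picked up by anti-wraparound autovacuum, generating WAL.
  SELECT relname, age(relfrozenxid) AS xid_age
  FROM pg_class
  WHERE relkind = 'r'
  ORDER BY age(relfrozenxid) DESC
  LIMIT 10;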
 
-Kevin




Re: [PERFORM] cluster on conditional index?

2012-08-16 Thread Jeff Janes
On Wed, Aug 15, 2012 at 6:43 AM, Doug Hunley doug.hun...@gmail.com wrote:
 On Tue, Aug 14, 2012 at 1:29 PM, k...@rice.edu k...@rice.edu wrote:

 It probably has to do with the fact that a conditional index does
 not include every possible row in the table.  Although clustering
 the matching rows and then leaving the rest in place should work.
 How is that for hand-waving?  :)


 That actually makes sense to me. Cluster the rows covered by that
 index, let the rest fall where they may. I'm typically only accessing
 the rows covered by that index, so I'd get the benefit of the cluster
 command but wouldn't have to spend cycles doing the cluster for rows I
 don't care about.

IIRC, there isn't currently an in-place version of CLUSTER; it always
rewrites the entire table.  So it would still have to do something
with those rows so that they show up in the new table.  But it could
just treat them all as equal to one another and leave them in
whatever order they happen to fall in.

Cheers,

Jeff




Re: [PERFORM] Index Bloat Problem

2012-08-16 Thread Strahinja Kustudić
Thanks for the help, everyone, and sorry for not replying sooner; I was
on a business trip.

@Hubert pg_reorg looks really interesting, and from a first read it looks
to be a very good solution for maintenance, but for now I would rather try
to slow down or remove this bloat, so that I have to do as little
maintenance as possible.

@Mark So basically I should decrease the autovacuum nap time from 60s to
10s and reduce the scale factor from 0.2 to 0.1. log_autovacuum_min_duration
is already set to 0, which means everything is logged.
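
For reference, a sketch of the per-table equivalents of those scale-factor
settings ("events" is a hypothetical table name; autovacuum_naptime itself
is server-wide and must be set in postgresql.conf):

  -- Make autovacuum react to this table after ~10% churn instead of ~20%.
  ALTER TABLE events SET (autovacuum_vacuum_scale_factor = 0.1);
  ALTER TABLE events SET (autovacuum_analyze_scale_factor = 0.1);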

@Jeff I'm not sure I understand what you mean. I know that we never
reuse key ranges. Could you be more clear, or give an example, please?

Thanks in advance,
Strahinja



On Tue, Aug 14, 2012 at 6:14 AM, Jeff Janes jeff.ja...@gmail.com wrote:

 On Fri, Aug 10, 2012 at 3:15 PM, Strahinja Kustudić
 strahin...@nordeus.com wrote:
 
  For example, yesterday when I checked the database size on the production
  server it was 30GB, and the restored dump of that database was only 17GB.
  The most interesting thing is that the data wasn't bloated that much, but
  the indices were; some of them were a few times bigger than they should be.
  For example, an index on the production db is 440MB, while that same index
  after dump/restore is 17MB, and there are many indices with that large a
  difference.

 Could your pattern of deletions be leaving sparsely populated, but not
 completely empty, index pages; which your insertions will then never
 reuse because they never again insert values in that key range?

 Cheers,

 Jeff
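
A diagnostic sketch for the sparse-page hypothesis above (the index name is
hypothetical); note that REINDEX rebuilds an index compactly but takes an
exclusive lock, unlike pg_reorg:

  -- List indexes by on-disk size to spot the bloated ones.
  SELECT indexrelname,
         pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
  FROM pg_stat_user_indexes
  ORDER BY pg_relation_size(indexrelid) DESC;

  REINDEX INDEX some_bloated_index;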



[PERFORM] best practice to avoid table bloat?

2012-08-16 Thread Anibal David Acosta
Hi, 

if I have a table from which about 8 million rows (of maybe 9 million) are
deleted nightly, is it recommended to do a VACUUM ANALYZE after the delete
completes, or can I leave this job to autovacuum?

 

This table is very active during the day but less active at night.

 

I think the only thing where Postgres is weak is in this area (table and
index bloat).

 

For some reason, for the same amount of data, Postgres consumes a little
more every day.

 

Thanks!



Re: [PERFORM] best practice to avoid table bloat?

2012-08-16 Thread Andrew Dunstan


On 08/16/2012 04:33 PM, Anibal David Acosta wrote:


Hi,

if I have a table from which about 8 million rows (of maybe 9 million)
are deleted nightly, is it recommended to do a VACUUM ANALYZE after the
delete completes, or can I leave this job to autovacuum?

This table is very active during the day but less active at night.

I think the only thing where Postgres is weak is in this area (table
and index bloat).

For some reason, for the same amount of data, Postgres consumes a
little more every day.

Check out pg_reorg.

cheers

andrew




Re: [PERFORM] best practice to avoid table bloat?

2012-08-16 Thread Kevin Grittner
Anibal David Acosta a...@devshock.com wrote:
 
 if I have a table from which about 8 million rows (of maybe 9
 million) are deleted nightly, is it recommended to do a VACUUM
 ANALYZE after the delete completes, or can I leave this job to
 autovacuum?
 
Deleting a high percentage of the rows should cause autovacuum to
deal with the table the next time it wakes up, so an explicit VACUUM
ANALYZE shouldn't be needed.
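
As a rough sketch of the trigger math: autovacuum vacuums a table once
its dead rows exceed about autovacuum_vacuum_threshold +
autovacuum_vacuum_scale_factor * reltuples (50 + 0.2 * n_live_tup with
default settings), and deleting 8 million of 9 million rows is far past
that.  To check dead-row counts:

  SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
  FROM pg_stat_user_tables
  ORDER BY n_dead_tup DESC;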
 
 For some reason, for the same amount of data, Postgres consumes a
 little more every day.
 
How are you measuring the data and how are you measuring the space? 
And what version of PostgreSQL is this?
 
-Kevin




Re: [PERFORM] best practice to avoid table bloat?

2012-08-16 Thread Kevin Grittner
[please don't top-post]

Anibal David Acosta a...@devshock.com wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 Anibal David Acosta a...@devshock.com wrote:
  
 if I have a table from which about 8 million rows (of maybe 9
 million) are deleted nightly, is it recommended to do a VACUUM
 ANALYZE after the delete completes, or can I leave this job to
 autovacuum?
  
 Deleting a high percentage of the rows should cause autovacuum to
 deal with the table the next time it wakes up, so an explicit
 VACUUM ANALYZE shouldn't be needed.
 
 Every day the table gets about 7 million new rows.
 The table holds the data for 60 days, so the total must be
 around 420 million rows.
 Every night a delete process runs and removes rows older than
 60 days.
 
Oh, I thought you were saying the table grew to 9 million rows each
day and you deleted 8 million of them each night.  That would
definitely trigger autovacuum.  Deleting 7 million rows from a table
of 420 million rows would not, so an explicit VACUUM ANALYZE after
the delete might be helpful.  Even better, with a pattern like that,
you might want to consider partitioning the table:
 
http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html
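
A hedged sketch of the inheritance-based partitioning that page describes
(table and column names are hypothetical); dropping an old child table
replaces millions of DELETEs, and the bloat they leave behind, with an
instant DROP:

  CREATE TABLE events (event_time timestamptz NOT NULL, payload text);
  CREATE TABLE events_2012_08_16 (
      CHECK (event_time >= '2012-08-16' AND event_time < '2012-08-17')
  ) INHERITS (events);
  -- (an INSERT trigger on "events" routes new rows to the right child)
  -- nightly, instead of DELETE ... WHERE event_time < now() - interval '60 days':
  DROP TABLE events_2012_06_16;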
 
 For some reason, for the same amount of data, Postgres consumes a
 little more every day.
  
 How are you measuring the data and how are you measuring the
 space?
 
 [no answer]
 
Without knowing what is increasing, it's hard to say why it is
increasing.  For all we know you are logging all statements and
never deleting log files.  The solution for that would be entirely
different from the solution for some other problem.
 
 So, the space used by Postgres should not increase drastically,
 because 7 million rows arrive every day but the same quantity is
 deleted; yet my disk runs out of space every 4 months.
 
What is getting bigger over time?
 
 I must copy the tables outside the server, delete the local table,
 and create it again; after this process I again have space for
 about 4 months.
 
How do you do that?  pg_dump, DROP TABLE, restore the dump?  Have
you captured sizes of heap, toast, indexes, etc. before and after
this aggressive maintenance?  Is the size going up by orders of
magnitude, or are you running really tight and getting killed by a
10% increase?  We don't know unless you tell us.
 
 Maybe it is a wrong autovacuum config, but it is really complicated
 to understand which values are correct to avoid a performance
 penalty but keep the table in good shape.
 
Please show us the entire result from running this query:
 
http://wiki.postgresql.org/wiki/Server_Configuration
 
-Kevin




Re: [PERFORM] best practice to avoid table bloat?

2012-08-16 Thread Anibal David Acosta
Thanks Kevin.
Postgres version is 9.1.4 (latest).

Every day the table gets about 7 million new rows.
The table holds the data for 60 days, so the total must be around
420 million rows.
Every night a delete process runs and removes rows older than 60 days.

So, the space used by Postgres should not increase drastically, because
7 million rows arrive every day but the same quantity is deleted; yet my
disk runs out of space every 4 months.
I must copy the tables outside the server, delete the local table, and
create it again; after this process I again have space for about 4 months.

Maybe it is a wrong autovacuum config, but it is really complicated to
understand which values are correct to avoid a performance penalty but
keep the table in good shape.

I think the autovacuum configuration should have something like an
auto-config that recalculates every day what the best configuration
for the server's condition is.

Thanks!


-----Original Message-----
From: Kevin Grittner [mailto:kevin.gritt...@wicourts.gov]
Sent: Thursday, August 16, 2012 04:52 PM
To: Anibal David Acosta; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] best practice to avoid table bloat?

Anibal David Acosta a...@devshock.com wrote:
 
 if I have a table from which about 8 million rows (of maybe 9 million)
 are deleted nightly, is it recommended to do a VACUUM ANALYZE after the
 delete completes, or can I leave this job to autovacuum?
 
Deleting a high percentage of the rows should cause autovacuum to deal with
the table the next time it wakes up, so an explicit VACUUM ANALYZE shouldn't
be needed.
 
 For some reason, for the same amount of data, Postgres consumes a
 little more every day.
 
How are you measuring the data and how are you measuring the space? 
And what version of PostgreSQL is this?
 
-Kevin





Re: [PERFORM] High Disk write and space taken by PostgreSQL

2012-08-16 Thread Merlin Moncure
On Wed, Aug 15, 2012 at 11:30 PM, J Ramesh Kumar rameshj1...@gmail.com wrote:

 Hi David Barton,

 Please find the information below.

 Are you able to provide a table schema?


 There are 109 different types of tables. Some tables are daily tables
 and some are ID based. In total we have created around 350 tables and
 dropped around 350 tables. I drop the old tables and don't delete any
 records. I maintain only the last 30 days' tables and drop tables which
 are older than 30 days. All the tables have only basic data types like
 int, smallint, bigint, varchar.



 Were you using MyISAM or InnoDB on MySQL?


 I am using MyISAM tables in MySQL.

You can't compare a non-MVCC system such as MyISAM with an MVCC one.
MVCC systems have to store extra accounting information in order to
manage transactions and multiple versions of the same record for SQL
updates.  MVCC isn't all bad: for example, you get much better
performance in the face of highly concurrent activity.  MyISAM does
full table locks, which are not scalable at all.  The penalty for MVCC
storage may in some cases seem quite high if your tables have very
narrow records.

BTW, I doubt that your claim that you 'don't need' transactions is
correct, especially in the long term.

Anyway, there are several techniques to mitigate data growth in
Postgres -- arrays, for example.
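
A hedged sketch of the array technique (table and column names are
hypothetical): packing many narrow readings into one row amortizes the
roughly 24-byte per-row MVCC header:

  CREATE TABLE readings_packed (
      device_id  int         NOT NULL,
      hour       timestamptz NOT NULL,
      readings   smallint[]  NOT NULL,  -- e.g. one element per second
      PRIMARY KEY (device_id, hour)
  );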

merlin

