Re: [PERFORM] performance drop after upgrade (9.6 > 10)

2017-10-24 Thread Alvaro Herrera
johannes graën wrote:
> Hi Pavel, *,
> 
> you were right with ANALYZing the DB first. However, even after doing
> so, I frequently see Seq Scans where an index was used before. This
> usually co-occurs with parallelization and looked different before
> upgrading to 10. I can provide an example for 10 [1], but I cannot
> generate a query plan for 9.6 anymore.

https://wiki.postgresql.org/wiki/Slow_Query_Questions

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM]

2017-06-29 Thread Alvaro Herrera
Pavel Stehule wrote:
> 2017-06-29 7:17 GMT+02:00 Yevhenii Kurtov :
> 

> > I just tried UNION queries and got the following error:
> >
> > ERROR:  FOR UPDATE is not allowed with UNION/INTERSECT/EXCEPT
> 
> it is sad :(

I think we could lift this restriction for UNION ALL, but UNION sounds
difficult.


BTW I wonder how much of the original problem is caused by using a
prepared query.  I understand the desire to avoid repeated planning
work, but I think in this case it may be working against you.
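
A quick way to watch what a prepared query is doing, with made-up names:

PREPARE q (int) AS SELECT * FROM jobs WHERE queue_id = $1;
EXPLAIN EXECUTE q(1);  -- repeat a few times: after about five
                       -- executions the plan can switch to a generic
                       -- one, with very different row estimates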

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] Improving PostgreSQL insert performance

2017-06-11 Thread Alvaro Herrera
Vladimir Sitnikov wrote:
> Alvaro> Something like
> Alvaro> INSERT INTO .. VALUES ('col1', 'col2'), ('col1', 'col2'), ('col1', 'col2')
> Frits> I did not try that, to be honest.
> 
> pgjdbc does automatically rewrite insert values(); into insert ...
> values(),(),(),() when reWriteBatchedInserts=true. I don't expect manual
> multivalues to be noticeably faster there.

Ahh, so that's what that option does :-)  Nice to know -- great feature.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] Improving PostgreSQL insert performance

2017-06-10 Thread Alvaro Herrera
Frits Jalvingh wrote:

> So, I am still very interested in getting normal inserts faster, because
> that will gain speed for all work. If Oracle can do it, and Postgres is
> able to insert fast with COPY, where does the bottleneck with the INSERT
> command lie? There seems to be quite a performance hit with the JDBC driver
> itself (as the stored procedure is a lot faster), so I can look into that.
> But even after that there is quite a gap.

Did you try inserting multiple tuples in one command?  Something like
INSERT INTO .. VALUES ('col1', 'col2'), ('col1', 'col2'), ('col1', 'col2')
It's supposed to be faster than single-row inserts, though I don't
know by how much.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] Number of characters in column preventing index usage

2017-02-17 Thread Alvaro Herrera
Hustler DBA wrote:
> I am seeing this strange behavior, and I don't know if this is by design
> in Postgres.
> 
> I have an index on a column which is defined as "character varying(255)".
> When the value I am searching for is of a certain length, the optimizer
> uses the index but when the value is long, the optimizer doesn't use the
> index but does a seq scan on the table. Is this by design? How can I make
> the optimizer use the index no matter what the size/length of the value
> being searched for?

As I recall, selectivity for strings is estimated based on the length of
the string.  Since your sample string looks suspiciously like a UUID,
perhaps you'd be better served by using a UUID column for it, which may
give better results.  This would prevent you from using the shortened
version for searches (which I suppose you can do with LIKE on the
varchar type), but you could replace it with something like this:

select *
from tab
where ID between '01625cfa-2bf8-45cf-0000-000000000000' and
  '01625cfa-2bf8-45cf-ffff-ffffffffffff';

Storage (both the table and indexes) is going to be more efficient this
way too.
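
If they really are UUIDs, the conversion can be as simple as this
(untested sketch; it assumes every stored value parses as a UUID, and it
rewrites the table and its indexes):

ALTER TABLE tab ALTER COLUMN id TYPE uuid USING id::uuid;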

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] DELETE takes too much memory

2016-07-04 Thread Alvaro Herrera
Kouber Saparev wrote:
> I tried to DELETE about 7 million rows at once, and the query went up to
> 15% of the RAM (120 GB in total), which pushed some indexes out and the
> server load went up to 250, so I had to kill the query.
> 
> The involved table does not have neither foreign keys referring to other
> tables, nor other tables refer to it. The size of the table itself is 19 GB
> (15% of 120 GB). So why the DELETE tried to put the entire table in memory,
> or what did it do to take so much memory?

Are there triggers in the table?  Deferred triggers in particular can
use memory.
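
Something like this shows them, if any ('some_table' is a placeholder;
the NOT tgisinternal clause skips the triggers that implement foreign
keys):

SELECT tgname, tgdeferrable, tginitdeferred
FROM pg_trigger
WHERE tgrelid = 'some_table'::regclass AND NOT tgisinternal;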

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] Big number of connections

2016-03-31 Thread Alvaro Herrera
Andrew Dunstan wrote:

> On 03/31/2016 03:12 PM, Igor Neyman wrote:

> > > We are going to build system based on PostgreSQL database for huge
> > > number of individual users (few thousands). Each user will have his own
> > > account, for authorization we will use Kerberos (MIT or Windows).
> > > Most of users will have low activity, but for various reasons, connection
> > > should be open all the time.
> > > I'd like to know what potential problems and limitations we can expect
> > > with such deployment.
> > > During preliminary testing we have found that for each connection we
> > > need ~1MB RAM. Is there any way to decrease this? Is there any risk,
> > > that such number of users will degrade performance?
> > > I'll be happy to hear any remarks and suggestions related to design,
> > > administration and handling of such installation.

> > Take a look at PgBouncer.
> > It should solve your problems.
> 
> If they are going to keep the client connections open, they would need to
> run pgbouncer in statement or transaction mode.

As I understand it, in pgbouncer you cannot have pooled connections
shared across different users.  If each individual requires their own
database-level user, pgbouncer would not help at all.

I would look seriously into getting rid of the always-open requirement
for connections.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [SPAM] Re: [PERFORM] autovacuum disk IO

2016-03-02 Thread Alvaro Herrera
Scott Marlowe wrote:
> On Wed, Mar 2, 2016 at 9:11 AM, Moreno Andreo wrote:

> > ... or maybe add some more RAM to have more disk caching (if you're on
> > *nix) this worked for me in the past... even if IMHO it's more a
> > temporary "patch" while upgrading (if it can't be done in a hurry) than a
> > real solution...
> 
> Oh yeah, definitely worth looking at. But RAM can't speed up writes,
> just reads, so it's very workload dependent. If your I/O subsystem is
> maxing out on writes, faster drives / IO. If it's maxing out on reads,
> more memory. But if your dataset is much bigger than memory (say 64GB
> RAM and a 1TB data store) then more RAM isn't going to be the answer.

In the particular case of autovacuum, it may be helpful to create a
"ramdisk" and put the stats temp file in it.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] checkpoints, proper config

2015-12-10 Thread Alvaro Herrera
Tomas Vondra wrote:

> Also, I don't think it makes much sense to set
> 
>(checkpoint_warning > checkpoint_timeout)
> 
> as it kinda defeats the whole purpose of the warning.

I agree, but actually, what is the sense of checkpoint_warning?  I think
it was useful back when we didn't have log_checkpoints, but now that we
have detailed checkpoint logging I think it's pretty much useless noise.
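
(For anyone following along: log_checkpoints can be enabled without a
restart, e.g. on 9.4 and later:

ALTER SYSTEM SET log_checkpoints = on;
SELECT pg_reload_conf();

after which each checkpoint logs its timings and buffer counts.)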

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] unlogged tables

2015-04-13 Thread Alvaro Herrera
Jim Nasby wrote:

> Yeah, this is not something that would be very easy to accomplish, because a
> buffer can get evicted and written to disk at any point. It wouldn't be too
> hard to read every unlogged table during recovery and see if there are any
> pages that were written after the last checkpoint, but that obviously won't
> be very fast.

If you consider only tables, then yeah perhaps this is easy to
accomplish (not really convinced myself).  But if you consider indexes,
things are not so easy anymore.


In the thread from 2011 (which this started as a reply to) the OP was
doing frequent UPDATEs to keep track of counts of something.  I think
that would be better served by using INSERTs of deltas and periodic
accumulation of grouped values, as suggested in
http://www.postgresql.org/message-id/20150305211601.gw3...@alvh.no-ip.org
This has actually been suggested many times over the years.
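
The idea, in its simplest form (table and key names invented):

-- instead of UPDATE counters SET value = value + 1 WHERE key = 'foo',
-- insert a delta row, which is cheap append-only work:
INSERT INTO counters (key, delta) VALUES ('foo', 1);

-- reads aggregate the deltas:
SELECT sum(delta) FROM counters WHERE key = 'foo';

-- and a periodic job collapses the accumulated rows into one:
WITH old AS (DELETE FROM counters WHERE key = 'foo' RETURNING delta)
INSERT INTO counters (key, delta) SELECT 'foo', sum(delta) FROM old;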

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] Avoiding Refreezing XIDs Repeatedly

2015-02-09 Thread Alvaro Herrera
bkrug wrote:
> The problem I'm facing is that I have many large (several GB) tables that are
> not being changed (they are several days old) but auto-vacuum keeps scanning
> and updating them every time the xid wraps around and thus my rsync back-up
> process sees that the disk files have changed and must copy them.

We have considered changing this, but it needs a concerted effort.  It's
not a simple problem.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] Avoiding Refreezing XIDs Repeatedly

2015-02-09 Thread Alvaro Herrera
bkrug wrote:
> Matheus de Oliveira wrote
> > It changed in recent versions (9.3 or 9.4, I don't recall exactly which)
> > and moved to tuple header, but what you described is exactly what was
> > done,
> > the xid was 2.
> 
> Should the relfrozenxid of pg_class then equal 2 for very old and already
> vacuumed tables? Because that is not what I am seeing.

No.  The problem is that it's not easy to change the relfrozenxid when
an INSERT/UPDATE command creates a tuple with a non-frozen XID.
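
For what it's worth, you can see how far each table's relfrozenxid lags
with:

SELECT relname, relfrozenxid, age(relfrozenxid)
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC;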

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] Avoiding Refreezing XIDs Repeatedly

2015-02-09 Thread Alvaro Herrera
Matheus de Oliveira wrote:
> On Mon, Feb 9, 2015 at 1:58 PM, bkrug  wrote:
> 
> > Couldn't postgres reserve a special XID that is never available for normal
> > transactions but that indicates that any transaction can see it because it
> > is so old? Then instead of constantly having to freeze old XIDs each time
> > the XID is going to wrap, vacuum can just set it to the special XID and
> > never touch it again unless something really changes.
> >
> 
> 
> It changed in recent versions (9.3 or 9.4, I don't recall exactly which)
> and moved to tuple header, but what you described is exactly what was done,
> the xid was 2.

Actually, it's been done this way for ages -- it was introduced in 2001
(release 7.2) by these commits:

Author: Tom Lane 
Branch: master Release: REL7_2 [2589735da] 2001-08-25 18:52:43 +0000

Replace implementation of pg_log as a relation accessed through the
buffer manager with 'pg_clog', a specialized access method modeled
on pg_xlog.  This simplifies startup (don't need to play games to
open pg_log; among other things, OverrideTransactionSystem goes away),
should improve performance a little, and opens the door to recycling
commit log space by removing no-longer-needed segments of the commit
log.  Actual recycling is not there yet, but I felt I should commit
this part separately since it'd still be useful if we chose not to
do transaction ID wraparound.


Author: Tom Lane 
Branch: master Release: REL7_2 [bc7d37a52] 2001-08-26 16:56:03 +0000

Transaction IDs wrap around, per my proposal of 13-Aug-01.  More
documentation to come, but the code is all here.  initdb forced.


-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] Lock pileup causes server to stall

2014-11-12 Thread Alvaro Herrera
Jesper Krogh wrote:
> 
> > On 10/11/2014, at 22.40, Alvaro Herrera  wrote:
> > 
> > Josh Berkus wrote:
> >> All,
> >> 
> >> pg version: 9.3.5
> >> RHEL 6.5
> >> 128GB/32 cores
> >> Configured with shared_buffers=16GB
> >> Java/Tomcat/JDBC application
> >> 
> >> Server has an issue that whenever we get lock waits (transaction lock
> >> waits, usually on an FK dependency) lasting over a minute or more than
> >> 10 at once, *all* queries on the server slow to a crawl, taking 100X to
> >> 400X normal execution times.
> > 
> > Current FK checking makes you wait if the referenced tuple is modified
> > on any indexed column, not just those that are actually used in
> > foreign keys.  Maybe this case would be sped up if we optimized that.
> 
> Even if it is a GIN index that is being modified?  Seems like a harsh
> limitation to me.

Well, as I recall it's only unique indexes, so it's not *that* harsh.

Anyway, the fklocks patch was stupidly complex (and still got much stuff
wrong).  I didn't want to add more ground to objections by additionally
breaking the abstraction between heapam and the concept of "columns
referenced by a foreign key constraint".  So it was discussed and
decided we'd leave that for future improvement.  Patches are welcome,
particularly if they come from the future.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Lock pileup causes server to stall

2014-11-10 Thread Alvaro Herrera
Josh Berkus wrote:
> All,
> 
> pg version: 9.3.5
> RHEL 6.5
> 128GB/32 cores
> Configured with shared_buffers=16GB
> Java/Tomcat/JDBC application
> 
> Server has an issue that whenever we get lock waits (transaction lock
> waits, usually on an FK dependency) lasting over a minute or more than
> 10 at once, *all* queries on the server slow to a crawl, taking 100X to
> 400X normal execution times.

Current FK checking makes you wait if the referenced tuple is modified
on any indexed column, not just those that are actually used in
foreign keys.  Maybe this case would be sped up if we optimized that.

> * This applies even to queries which are against other databases, so
> it's not purely a lock blocking issue.

Oh.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] 60 core performance with 9.3

2014-07-31 Thread Alvaro Herrera
Matt Clarkson wrote:

> The LWLOCK_STATS below suggest that ProcArrayLock might be the main
> source of locking that's causing throughput to take a dive as the client
> count increases beyond the core count.

> Any thoughts or comments on these results are welcome!

Do these results change if you use Heikki's patch for CSN-based
snapshots?  See
http://www.postgresql.org/message-id/539ad153.9000...@vmware.com for the
patch (but note that you need to apply it on top of commit 89cf2d52030 in
the master branch -- maybe it also applies to the 9.4 branch, but I
didn't try).

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] pg_repack solves alter table set tablespace lock

2014-01-27 Thread Alvaro Herrera
Ying He wrote:
> Thank you Josh. Won't double post again. Just thought the reorg mailing
> list is quite inactive.

Well, that tells you something about its maintenance state and what sort
of help you can expect if you find yourself in trouble with it.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Recommendations for partitioning?

2013-12-30 Thread Alvaro Herrera
Sergey Konoplev wrote:
> On Fri, Dec 20, 2013 at 7:59 AM, Alvaro Herrera wrote:

> > Eh.  Why can't you just do something like
> >
> > WITH moved AS (
> > DELETE FROM src WHERE ..
> > RETURNING *
> > ) INSERT INTO dst SELECT * FROM moved;
> 
> Alvaro, I think it could be cheaper to do this like it is shown below, correct?
> 
> psql dbname -c 'copy src to stdout' | \
> psql dbname -c 'copy dst from stdin; truncate src;'

Yes, if you can get rid of the old records by removing or emptying a
partition (or de-inheriting it, as suggested elsewhere in the thread),
that's better than DELETE because that way you don't create dead rows to
vacuum later.
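
With a partition per period, that amounts to something like (names
invented):

ALTER TABLE src_2013_11 NO INHERIT src;  -- detach the old partition
DROP TABLE src_2013_11;                  -- or TRUNCATE it to keep it around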

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Recommendations for partitioning?

2013-12-20 Thread Alvaro Herrera
Dave Johansen wrote:
> On Thu, Dec 19, 2013 at 10:27 AM, Scott Marlowe wrote:

> > That's pretty much it. What I did was to create the new month table
> > and day tables, alter my triggers to reflect this, then move the data
> > with insert into / select from query for each old day partition. Then
> > once their data is moved you can just drop them. Since you changed the
> > triggers first those tables are no longer taking input so it's usually
> > safe to drop them now.
> 
> It would be nice if there was just a "move command", but that seems like
> the type of model that we want and we'll probably move to that.

Eh.  Why can't you just do something like

WITH moved AS (
DELETE FROM src WHERE ..
RETURNING *
) INSERT INTO dst SELECT * FROM moved;

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Parallel Select query performance and shared buffers

2013-12-03 Thread Alvaro Herrera
Metin Doslu wrote:

> When we send concurrent Select queries to these tables, query performance
> doesn't scale out with the number of CPU cores. We find that complex Select
> queries scale out better than simpler ones. We also find that increasing
> the block size from 8 KB to 32 KB, or increasing shared_buffers to include
> the working set mitigates the problem to some extent.

Maybe you could help test this patch:
http://www.postgresql.org/message-id/20131115194725.gg5...@awork2.anarazel.de

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Efficiently query for the most recent record for a given user

2013-08-07 Thread Alvaro Herrera
Claudio Freire wrote:
> On Wed, Aug 7, 2013 at 3:34 PM, Tom Lane  wrote:

> > Note that there's no particular need to specify "desc" in the index
> > definition.  This same index can support searches in either direction
> > on the "called" column.
> 
> Yeah, but it's faster if it's in the same direction, because the
> kernel read-ahead code detects sequential reads, whereas it doesn't
> when it goes backwards. The difference can be up to a factor of 10 for
> long index scans.

That might be true when an index is new, but as it grows, the leaf pages
are not going to be sequential anymore.  And this doesn't much apply for
an equality lookup anyway, does it?
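
(For the archives, the case under discussion is roughly this, with
invented names:

CREATE INDEX calls_user_called_idx ON calls (user_id, called);

SELECT * FROM calls WHERE user_id = 42 ORDER BY called DESC LIMIT 1;

The same two-column index can be scanned in either direction to satisfy
the ORDER BY.)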

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-02 Thread Alvaro Herrera
Tom Lane wrote:
> Jeff Janes  writes:
> > On Thu, Aug 1, 2013 at 5:16 PM, Sergey Burladyan wrote:
> >> If I'm not mistaken, there may be two code paths like this here:
> >> (1) mergejoinscansel -> scalarineqsel-> ineq_histogram_selectivity -> 
> >> get_actual_variable_range -> index_getnext
> >> (2) scalargtsel -> scalarineqsel -> ineq_histogram_selectivity -> 
> >> get_actual_variable_range -> index_getnext
> 
> > Yeah, I think you are correct.
> 
> mergejoinscansel does *not* call scalarineqsel, nor get_actual_variable_range.
> It calls get_variable_range, which only looks at the pg_statistic entries.

Uh?  It's right there in line 2976 in HEAD.


-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-01 Thread Alvaro Herrera
Sergey Burladyan wrote:
> I also found this trace for another query:
> explain select * from xview.user_items_v v where ( v.item_id = 132358330 );
> 
> #0  0x7ff766967620 in read () from /lib/libc.so.6
> #1  0x7ff7689cfc25 in FileRead ()
> #2  0x7ff7689ea2f6 in mdread ()
> #3  0x7ff7689cc473 in ?? ()
> #4  0x7ff7689ccf54 in ReadBufferExtended ()
> #5  0x7ff7688050ca in index_fetch_heap ()
> #6  0x7ff76880523e in index_getnext ()
> #7  0x7ff768a63306 in ?? ()
> #8  0x7ff768a67624 in ?? ()
> #9  0x7ff768a67d9c in ?? ()
> #10 0x7ff768a688fc in scalargtsel ()

It'd be useful to see what's in frames 7-9, but this might be related to
get_actual_variable_range().  I don't see anything else nearby that
would try to read portions of the table.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] autovacuum fringe case?

2013-01-23 Thread Alvaro Herrera
AJ Weber wrote:

> On 1/23/2013 2:13 PM, Jeff Janes wrote:

> > Scheduling a manual vacuum should be fine (but keep in mind that
> > vacuum has very different default cost_delay settings than autovacuum
> > does.  If the server is completely idle that shouldn't matter, but if
> > it is only mostly idle, you might want to throttle the IO a bit).  But
> > I certainly would not disable autovacuum without further evidence.  If
> > a table only needs to be vacuumed once a day and you preemptively do
> > it at 3 a.m., then autovac won't bother to do it itself during the day.
> > So there is no point, but much risk, in also turning autovac off.
> If I set autovacuum_max_workers = 1, will that effectively
> single-thread it so I don't have two running at once?  Maybe that'll
> mitigate disk contention a little at least?

If you have a single worker, it will go three times as fast, because the
I/O cost budget is divided among the active workers.  If you want to
make the whole thing go slower (i.e. cause less impact on your I/O
system when running), crank up autovacuum_vacuum_cost_delay.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Query completed in < 1s in PG 9.1 and ~ 700s in PG 9.2

2012-11-08 Thread Alvaro Herrera
Rodrigo Rosenfeld Rosas wrote:
> On 07-11-2012 22:58, Tom Lane wrote:
> > Rodrigo Rosenfeld Rosas writes:
> > > Ok, I could finally strip part of my database schema that will allow you
> > > to run the explain query and reproduce the issue.
> > > There is a simple SQL dump in plain format that you can restore both on
> > > 9.1 and 9.2 and an example EXPLAIN query so that you can see the
> > > difference between both versions.
> > > Please keep me up to date with regards to any progress. Let me know if
> > > the commit above fixed this issue.
> > AFAICT, HEAD and 9.2 branch tip plan this query a bit faster than 9.1
> > does.
> 
> Great! What is the estimate for the 9.2.2 release?

Hasn't been announced, but you can grab a snapshot right now from
ftp.postgresql.org if you want.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Replaying 48 WAL files takes 80 minutes

2012-10-29 Thread Alvaro Herrera
Albe Laurenz wrote:
> I am configuring streaming replication with hot standby
> with PostgreSQL 9.1.3 on RHEL 6 (kernel 2.6.32-220.el6.x86_64).
> PostgreSQL was compiled from source.
> 
> It works fine, except that starting the standby took forever:
> it took the system more than 80 minutes to replay 48 WAL files
> and connect to the primary.
> 
> Can anybody think of an explanation why it takes that long?

Can you do a quick xlogdump of those files?  Maybe there is something
unusual (say particular types of GIN/GiST index updates) on the files
that take longer.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] autovacuum blocks the operations of other manual vacuum

2010-11-21 Thread Alvaro Herrera
Excerpts from kuopo's message of Sun Nov 21 11:15:52 -0300 2010:
> In my experiment, I need about 1~3 min to finish the analyze operation
> on the big table (which depends on the value of vacuum_cost_delay). I
> am not surprised because this table is a really big one (now, it has
> over 200M records).

Okay.  You may want to consider lowering the statistics target for all
the columns in that table; that would reduce analyze time, at the cost of
possibly worsening the plans for that table, depending on how irregular
the distribution is.  See ALTER TABLE / SET STATISTICS in the
documentation, and the default_statistics_target parameter in
postgresql.conf.
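
For example, to lower the target for one column of the big table (names
invented):

ALTER TABLE big_log_table ALTER COLUMN some_column SET STATISTICS 10;
ANALYZE big_log_table;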

> However, most of my concern is about the behavior of analyze/vacuum.
> You mentioned that the analyze-only operation cannot be optimized as
> the same way on optimizing vacuum. Does that mean the analyze
> operation on a table would unavoidably affect the vacuum proceeded on
> another one?

That's correct.  I think you can run VACUUM ANALYZE, and it would do
both things at once; AFAIK this is also optimized like VACUUM is, but I
admit I'm not 100% sure (and I can't check right now).

> If this is a normal reaction for an analyze operation,
> maybe I should try to lower vacuum_cost_delay or use more powerful
> hardware to minimize the interfered period. So, the pages for the
> small table would not increase quickly.

I think it would make sense to have as low a cost_delay as possible for
this ANALYZE.  (Note you can change it locally with a SET command; no
need to touch postgresql.conf.  So you can change it when you analyze
just this large table).
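
That is, something like:

SET vacuum_cost_delay = 0;  -- disable throttling in this session only
ANALYZE big_log_table;      -- table name invented
RESET vacuum_cost_delay;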

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] autovacuum blocks the operations of other manual vacuum

2010-11-19 Thread Alvaro Herrera
Excerpts from kuopo's message of Thu Nov 18 04:10:36 -0300 2010:
> Hi,
> 
> Thanks for your response. I've checked it again and found that the
> main cause is the execution of ANALYZE. As I have mentioned, I have
> two tables: table A is a big one (around 10M~100M records) for log
> data and table B is a small one (around 1k records) for keeping some
> current status. There are a lot of update operations and some search
> operations on the table B. For the performance issue, I would like to
> keep table B as compact as possible. According to your suggestion, I try
> to invoke standard vacuum (not full) more frequently (e.g., once per
> min).
> 
> However, when I analyze the table A, the autovacuum or vacuum on the
> table B cannot find any removable row version (the number of
> nonremoveable row versions and pages keeps increasing). After the
> analysis finishes, the search operations on table B are still
> inefficient. If I call full vacuum right now, then I can have quick
> response time of the search operations on the table B again.

Hmm, I don't think we can optimize the analyze-only operation the same
way we optimize vacuum (i.e. allow vacuum to proceed while it's in
progress).  Normally analyze shouldn't take all that long anyway -- why
is it that slow?  Are you calling it in a transaction that also does
other stuff?  Are you analyzing more than one table in a single
transaction, perhaps even the whole database?

Perhaps you could speed it up by lowering vacuum_cost_delay, if it's set
to a nonzero value.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] autovacuum blocks the operations of other manual vacuum

2010-11-16 Thread Alvaro Herrera
Excerpts from kuopo's message of Fri Nov 12 05:01:24 -0300 2010:
> Hi,
> 
> I have a question about the behavior of autovacuum. When I have a big
> table A which is being processed by autovacuum, I also manually use
> (full) vacuum to clean another table B. Then I found that I always got
> something like “found 0 removable, 14283 nonremovable row”. However,
> if I stop the autovacuum functionality and use vacuum on that big
> table A manually, I can clean table B (ex. found 22615 removable, 2049
> nonremovable row).
> 
> Is this correct? Why do vacuum and autovacuum have different actions?

Vacuum full does not assume that it can clean up tuples while other
transactions are running, and that includes the (non full, or "lazy")
vacuum that autovacuum is running.  Autovacuum only runs lazy vacuum;
and that one is aware that other concurrent vacuums can be ignored.

Just don't use vacuum full unless strictly necessary.  It has other
drawbacks.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] BBU Cache vs. spindles

2010-10-29 Thread Alvaro Herrera
Excerpts from Greg Smith's message of Thu Oct 21 14:04:17 -0300 2010:

> What I would like to do is beef up the documentation with some concrete 
> examples of how to figure out if your cache and associated write path 
> are working reliably or not.  It should be possible to include "does 
> this handle full page writes correctly?" in that test suite.  Until we 
> have something like that, I'm concerned that bugs in filesystem or 
> controller handling may make full_page_writes unsafe even with a BBU, 
> and we'd have no way for people to tell if that's true or not.

I think if you assume that there are bugs in the filesystem which you
need to protect against, you are already hosed.  I imagine there must be
some filesystem bug that makes it safe to have full_page_writes=on, but
unsafe to have full_page_writes=off; but I'd probably discard those as a
rare minority and thus not worth worrying about.

I agree it would be worth testing though.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] No hash join across partitioned tables?

2010-10-18 Thread Alvaro Herrera
Excerpts from Samuel Gendler's message of Mon Oct 18 03:13:01 -0300 2010:
> On Sat, Oct 16, 2010 at 8:29 AM, Alvaro Herrera wrote:
> 
> > Excerpts from Samuel Gendler's message of Sat Oct 16 02:35:46 -0300 2010:
> >
> > > An issue with automatically analyzing the entire hierarchy is
> > > 'abstract' table definitions.  I've got a set of tables for
> > > storing the same data at different granularities of aggregation.
> > > Within each granularity, I've got partitions, but because the set
> > > of columns is identical for each granularity, I've got an abstract
> > > table definition that is inherited by everything.  I don't need or
> > > want statistics kept on that table because I never query across
> > > the abstract table, only the parent table of each aggregation
> > > granularity
> >
> > Hmm, I think you'd be better served by using LIKE instead of regular
> > inheritance.
>
> Yep.  I inherited the architecture, though, and changing it hasn't been a
> high priority.

I understand that; my point is merely that maybe we shouldn't jump
through many hoops to solve this particular facet of the problem,
because it seems to be pilot error.  (If you really needed to avoid the
extra I/O that would be caused by unnecessary analyzes, you could turn
autovac off for the abstract tables).
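
(That's a per-table storage parameter, e.g.:

ALTER TABLE abstract_parent SET (autovacuum_enabled = false);

with "abstract_parent" standing in for each abstract table.)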

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] No hash join across partitioned tables?

2010-10-16 Thread Alvaro Herrera
Excerpts from Samuel Gendler's message of Sat Oct 16 02:35:46 -0300 2010:

> An issue with automatically analyzing the entire hierarchy is 'abstract'
> table definitions.  I've got a set of tables for storing the same data at
> different granularities of aggregation.  Within each granularity, I've got
> partitions, but because the set of columns is identical for each
> granularity, I've got an abstract table definition that is inherited by
> everything.  I don't need or want statistics kept on that table because I
> never query across the abstract table, only the parent table of each
> aggregation granularity

Hmm, I think you'd be better served by using LIKE instead of regular
inheritance.
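
That is, copy the column definitions without creating an inheritance
link, roughly (names invented; INCLUDING options to taste):

CREATE TABLE agg_daily
  (LIKE measurements_template INCLUDING DEFAULTS INCLUDING INDEXES);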

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] No hash join across partitioned tables?

2010-10-15 Thread Alvaro Herrera
Excerpts from Robert Haas's message of Wed Jun 09 15:47:55 -0400 2010:

> In going back through emails I had marked as possibly needing another
> look before 9.0 is released, I came across this issue again.  As I
> understand it, analyze (or analyse) now collects statistics for both
> the parent individually, and for the parent and its children together.
>  However, as I further understand it, autovacuum won't actually fire
> off an analyze unless there's enough activity on the parent table
> considered individually to warrant it.  So if you have an empty parent
> and a bunch of children with data in it, your stats will still stink,
> unless you analyze by hand.

So, is there something we could now do about this, while there's still
time before 9.1?

I haven't followed this issue very closely, but it seems to me that what
we want is for an ANALYZE of a child table to be mutated into an analyze
of its parent table, if the conditions are right; and for an ANALYZE of
a parent to remove the child tables from being analyzed on the same run.

If we analyze the parent, do we also update the children stats, or is it
just that we keep two stats for the parent, one with children and one
without, both being updated when the parent is analyzed?

If the latter's the case, maybe we should modify ANALYZE a bit more, so
that we can analyze the whole hierarchy in one go, and store the lot of
stats with a single pass (each child alone, the parent alone, the parent
plus children).  However it's not really clear how this would work with
multiple inheritance levels.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] Where does data in pg_stat_user_tables come from?

2010-09-13 Thread Alvaro Herrera
Excerpts from Josh Berkus's message of Mon Sep 13 20:53:51 -0400 2010:
> 
> > If you select from pg_stat_user_tables, the counters should be
> > reasonably close unless your default_statistics_target is way off and
> > then pg_class.reltuples would be wrong.
> 
> At least in 8.3, running ANALYZE does not update pg_stat_user_tables in
> any way.  Does it in later versions?

It's been pure nonsense in this thread.  Please show an example of
what's not working.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] pgbench could not send data to client: Broken pipe

2010-09-09 Thread Alvaro Herrera
Excerpts from David Kerr's message of Wed Sep 08 18:29:59 -0400 2010:

> Thanks for the insight. We're currently in performance testing of the
> app. Currently the JVM is the bottleneck; once we get past that
> I'm sure it will be the database, at which point I'll have the kind
> of data you're talking about.

Hopefully you're not running the JVM stuff on the same machine.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] PARSE WAITING

2010-08-23 Thread Alvaro Herrera
Excerpts from David Kerr's message of Mon Aug 23 18:47:02 -0400 2010:

> unlink("base/pgsql_tmp/pgsql_tmp28335.12593") = 0
> unlink("base/pgsql_tmp/pgsql_tmp28335.6041") = 0
> unlink("base/pgsql_tmp/pgsql_tmp28335.3030") = 0
> unlink("base/pgsql_tmp/pgsql_tmp28335.14737") = 0
> 
> which isn't the fastest operation... just for my info, can anyone tell me
> what pgsql_tmp is, and why the engine is whacking each file individually?

These are temp files, which you can remove without concern if the server
is down.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] PARSE WAITING

2010-08-23 Thread Alvaro Herrera
Excerpts from David Kerr's message of Mon Aug 23 18:15:56 -0400 2010:
> Howdy all,
> 
> We're doing some performance testing, and when we scaled our app up to
> about 250 concurrent users
> we started seeing a bunch of processes sitting in "PARSE WAITING" state.
> 
> Can anyone give me insight on what this means? What's the parse waiting for?

It means the parse phase is waiting for a lock.  You can see exactly
what it's waiting for by looking at pg_locks "WHERE NOT GRANTED".

Have you got lots of partitions, or something?
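
Something like this shows the ungranted lock requests:

SELECT locktype, relation::regclass, mode, pid
FROM pg_locks
WHERE NOT granted;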

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] Inefficient query plan

2010-08-23 Thread Alvaro Herrera
Excerpts from Jann Röder's message of Mon Aug 23 00:23:38 -0400 2010:

> "Hash Join  (cost=516.66..17710110.47 rows=8358225 width=16)"
> "  Hash Cond: ((b.itemid)::bpchar = a.itemid)"
> "  ->  Seq Scan on b  (cost=0.00..15110856.68 rows=670707968 width=16)"
> "  ->  Hash  (cost=504.12..504.12 rows=1003 width=16)"
> "->  Index Scan using idx_issueid on a  (cost=0.00..504.12
> rows=1003 width=16)"
> "  Index Cond: (issueid = 'A1983PW823'::bpchar)"

Hmm, I'm placing bets on the bpchar weirdness.  I'd try getting rid of
that and using plain varchar for all the columns.
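
Something like this, untested against your schema (and note each one
rewrites its table):

ALTER TABLE a ALTER COLUMN itemid TYPE varchar;
ALTER TABLE b ALTER COLUMN itemid TYPE varchar;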

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] Quesion on the use of indexes

2010-08-17 Thread Alvaro Herrera
Excerpts from Tom Lane's message of Mon Aug 16 23:33:29 -0400 2010:
> "Benjamin Krajmalnik"  writes:
> > A little background - I have various multi-column indexes whenever I
> > have queries which restrict the output based on the values of the 2
> > fields (for example, a client code and the date of a transaction).
> 
> > Is there a performance gain using this approach as opposed to using 2
> > separate indexes, one on the first column and one on the second column?
> 
> Maybe, maybe not ... it's going to depend on a bunch of factors, one of
> which is what your update load is like compared to the queries that read
> the indexes.  There's a bit of coverage of this in the fine manual: see
> http://www.postgresql.org/docs/8.4/static/indexes-multicolumn.html
> and the next few pages.

Another important factor is how selective each clause is in isolation,
compared to how selective they are together.  We have found that doing a
BitmapAnd of two bitmap-scanned indexes is sometimes much too slow
compared to a two-column index.  (I have yet to see a case where indexes
beyond two columns are useful; at this point, combined bitmap index scans
are enough.)
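
In the client-code-plus-date example, that two-column index would be
something like (names invented):

CREATE INDEX tx_client_date_idx ON transactions (client_code, tx_date);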

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] Sorted group by

2010-08-11 Thread Alvaro Herrera
Excerpts from Matthew Wakeling's message of Tue Aug 10 11:40:16 -0400 2010:

> I am trying to retrieve, for many sets of rows grouped on a couple of 
> fields, the value of an ungrouped field where the row has the highest 
> value in another ungrouped field.

I think this does what you want (schema is from the tenk1 table in the
regression database):

select string4 as group,
   (array_agg(stringu1 order by unique1 desc))[1] as value
from tenk1
group by 1 ;

Please let me know how it performs with your data.  The plan is rather simple:

regression=# explain analyze select string4 as group, (array_agg(stringu1 order by unique1 desc))[1] as value from tenk1 group by 1 ;
                                                      QUERY PLAN
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 GroupAggregate  (cost=0.00..1685.16 rows=4 width=132) (actual time=22.825..88.922 rows=4 loops=1)
   ->  Index Scan using ts4 on tenk1  (cost=0.00..1635.11 rows=10000 width=132) (actual time=0.135..33.188 rows=10000 loops=1)
 Total runtime: 89.348 ms
(3 rows)


-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-07-27 Thread Alvaro Herrera
Excerpts from Tom Lane's message of Tue Jul 27 20:05:02 -0400 2010:
> Peter Hussey  writes:

> > 2) How is work_mem used by a query execution?
> 
> Well, the issue you're hitting is that the executor is dividing the
> query into batches to keep the size of the in-memory hash table below
> work_mem.  The planner should expect that and estimate the cost of
> the hash technique appropriately, but seemingly it's failing to do so.
> Since you didn't provide EXPLAIN ANALYZE output, though, it's hard
> to be sure.

Hmm, I wasn't aware that hash joins worked this way wrt work_mem.  Is
this visible in the explain output?  If it's something subtle (like an
increased total cost), may I suggest that it'd be a good idea to make it
explicit somehow in the machine-readable outputs?



Re: [PERFORM] Identical query slower on 8.4 vs 8.3

2010-07-15 Thread Alvaro Herrera
Excerpts from Patrick Donlin's message of Thu Jul 15 11:12:53 -0400 2010:
> I'll read over that wiki entry, but for now here is the EXPLAIN ANALYZE 
> output, assuming I did it correctly. I have run vacuumdb --full --analyze;
> it actually runs as a nightly cron job.

These plans seem identical (though the fact that the leading whitespace
was trimmed means it's untrustworthy -- please in the future send them
as text attachments instead so that your mailer doesn't interfere with
formatting).  The 8.4 plan is even a full second faster, according to
the "total runtime" line.

The slowness could've been caused by caching effects ...



Re: [PERFORM] Highly Efficient Custom Sorting

2010-07-03 Thread Alvaro Herrera
Excerpts from Merlin Moncure's message of Sat Jul 03 18:53:46 -0400 2010:

> What about my suggestion doesn't work for your requirements?  (btw,
> let me disagree with my peers and state pl/perl is lousy for this type
> of job, only sql and pl/pgsql can interact with postgresql variables
> natively for the most part).

IIRC the other reason pl/perl sucks for this kind of thing is that it
forces a subtransaction to be created before the function call, which is
expensive.  (I might be misremembering and what actually causes a
subtransaction is a SPI call inside a PL/Perl function, which wouldn't
apply here.)



Re: [PERFORM] System tables screwed up? (WAS requested shared memory size overflows size_t)

2010-06-24 Thread Alvaro Herrera
Excerpts from Craig James's message of Thu Jun 24 19:24:44 -0400 2010:
> On 6/24/10 4:19 PM, Alvaro Herrera wrote:
> > Excerpts from Craig James's message of Thu Jun 24 19:03:00 -0400 2010:
> >
> >> select relname, pg_relation_size(relname) from pg_class
> >>   where pg_get_userbyid(relowner) = 'emol_warehouse_1'
> >>   and relname not like 'pg_%'
> >>   order by pg_relation_size(relname) desc;
> >> ERROR:  relation "rownum_temp" does not exist
> >>
> >> emol_warehouse_1=> select relname from pg_class where relname = 'rownum_temp';
> >>    relname
> >> -------------
> >>  rownum_temp
> >> (1 row)
> >
> > What's the full row?  I'd just add a "WHERE relkind = 'r'" to the above
> > query anyway.
> 
> Thanks, in fact that works.  But my concern is that these are system tables 
> and system functions and yet they seem to be confused.  I've used this query 
> dozens of times and never seen this behavior before.  It makes me really 
> nervous...

I think you're being bitten by lack of schema qualification.  Perhaps
you ought to pass pg_class.oid to pg_relation_size instead of relname.
What did you do to make pg_relation_size work on type name?

Why is this a -performance question anyway?

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] requested shared memory size overflows size_t

2010-06-24 Thread Alvaro Herrera
Excerpts from Craig James's message of Thu Jun 24 19:03:00 -0400 2010:

> select relname, pg_relation_size(relname) from pg_class
>  where pg_get_userbyid(relowner) = 'emol_warehouse_1'
>  and relname not like 'pg_%'
>  order by pg_relation_size(relname) desc;
> ERROR:  relation "rownum_temp" does not exist
> 
> emol_warehouse_1=> select relname from pg_class where relname = 'rownum_temp';
>    relname
> -------------
>  rownum_temp
> (1 row)

What's the full row?  I'd just add a "WHERE relkind = 'r'" to the above
query anyway.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] cpu bound postgresql setup.

2010-06-24 Thread Alvaro Herrera
Excerpts from Rajesh Kumar Mallah's message of Thu Jun 24 13:25:32 -0400 2010:

> What prompted me to post to the list is that the server transitioned from
> being IO bound to CPU bound and 90% of syscalls being
> lseek(XXX, 0, SEEK_END) = YYY

It could be useful to find out what file is being seeked.  Correlate the
XXX with files in /proc/<pid>/fd (at least on Linux) to find out more.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] slow index lookup

2010-06-22 Thread Alvaro Herrera
Excerpts from Anj Adu's message of Tue Jun 22 17:44:39 -0400 2010:
> This query seems unreasonably slow on a well-indexed table (13 million
> rows). Separate indexes are present on the guardid_id, from_num and
> targetprt columns.

Maybe you need to vacuum or reindex?

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] Aggressive autovacuuming ?

2010-06-21 Thread Alvaro Herrera
Excerpts from Scott Marlowe's message of Sun Jun 20 16:13:15 -0400 2010:
> On Sun, Jun 20, 2010 at 11:44 AM, Jesper Krogh  wrote:
> > Hi.
> >
> > I have been wondering if anyone has been experimenting with "really
> > aggressive" autovacuuming.
> 
> I have been using moderately aggressive autovac, with 6 or more
> threads running with 1ms sleep, then keeping track of them to see if
> they're being too aggressive.  Basically as long as I/O utilization
> doesn't hit 100% it doesn't seem to have any negative or even
> noticeable effect.

Keep in mind that autovacuum scales down the cost limit the more workers
there are.  So if you have 10ms sleeps and 1 worker, it should use
roughly the same amount of I/O as if you have 10ms sleeps and 10 workers
(each worker would sleep 10 times more frequently).

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] requested shared memory size overflows size_t

2010-06-16 Thread Alvaro Herrera
Excerpts from Tom Lane's message of Mon Jun 14 23:57:11 -0400 2010:
> Scott Carey  writes:
> > Great points.  There is one other option that is decent for the WAL:
> > If splitting out a volume is not acceptable for the OS and WAL -- 
> > absolutely split those two out into their own partitions.  It is most 
> > important to make sure that WAL and data are not on the same filesystem, 
> > especially if ext3 is involved.
> 
> Uh, no, WAL really needs to be on its own *spindle*.  The whole point
> here is to have one disk head sitting on the WAL and not doing anything
> else except writing to that file.

However, there's another point here -- probably what Scott is on about:
on Linux (at least ext3), an fsync of any file is not limited to
flushing that file's blocks -- it flushes *ALL* blocks on *ALL* files in
the filesystem.  This is particularly problematic if you have pgsql_tmp
in the same filesystem and do lots of disk-based sorts.

So if you have it in the same spindle but on a different filesystem, at
least you'll avoid that extra fsync work, even if you have to live with
the extra seeking.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] PostgreSQL as a local in-memory cache

2010-06-16 Thread Alvaro Herrera
Excerpts from jgard...@jonathangardner.net's message of Wed Jun 16 02:30:30 -0400 2010:

> NOTE: If I do one giant commit instead of lots of littler ones, I get
> much better speeds for the slower cases, but I never exceed 5,500
> which appears to be some kind of wall I can't break through.
> 
> If there's anything else I should tinker with, I'm all ears.

increase wal_buffers?
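
Something like this in postgresql.conf (an illustrative value -- the
default on 8.4 is only 64kB, and more than one WAL segment's worth,
16MB, rarely helps):

    wal_buffers = 16MB

plus a restart, since wal_buffers can only be changed at server start.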

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [PERFORM] Large (almost 50%!) performance drop after upgrading to 8.4.4?

2010-06-10 Thread Alvaro Herrera
Excerpts from Tom Lane's message of jue jun 10 11:46:25 -0400 2010:

> Yes, the folks at commandprompt need to be told about this.  Loudly.
> It's a serious packaging error.

Just notified Lacey, the packager (not so loudly, though); she's working
on new packages, and apologizes for the inconvenience.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-06-01 Thread Alvaro Herrera
Excerpts from David Jarvis's message of mar jun 01 14:01:22 -0400 2010:
> Sorry, Alvaro.
> 
> I was contemplating using a GIN or GiST index as a way of optimizing the
> query.

My fault -- I didn't read the whole thread.

> Instead, I found that re-inserting the data in order of station ID (the
> primary look-up column) and then CLUSTER'ing on the station ID, taken date,
> and category index increased the speed by an order of magnitude.

Hmm, that's nice, though I cannot but wonder whether the exclusive lock
required by CLUSTER is going to be a problem in the long run.

> I might be able to drop the station/taken/category index in favour of the
> simple station index and CLUSTER on that, instead (saving plenty of disk
> space). Either way, it's fast right now so I'm not keen to try and make it
> much faster.

Hm, keep in mind that if the station clause alone is not selective
enough, scanning it may be too expensive.  The current three column
index is probably a lot faster to search (though of course it's causing
more work to be kept up to date on insertions).
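
For the record, that would be something like (index and table names
invented here):

    CREATE INDEX measurement_station_idx ON measurement (station_id);
    CLUSTER measurement USING measurement_station_idx;

keeping in mind, again, that CLUSTER holds an ACCESS EXCLUSIVE lock on
the table while it runs.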

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-06-01 Thread Alvaro Herrera
Excerpts from Matthew Wakeling's message of mar jun 01 05:55:35 -0400 2010:
> On Sun, 23 May 2010, David Jarvis wrote:
> > The measurement table indexes (on date and weather station) were not being
> > used because the only given date ranges (e.g., 1900 - 2009) were causing the
> > planner to do a full table scan, which is correct.
> 
> I wonder if you might see some benefit from CLUSTERing the tables on the 
> index.

Eh, isn't this a GIN or GiST index?  I don't think you can cluster on
those, can you?

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [PERFORM] autovacuum strategy / parameters

2010-04-30 Thread Alvaro Herrera
Josh Berkus escribió:

> #autovacuum_vacuum_scale_factor = 0.2
> 
> This is set because in my experience, 20% bloat is about the level at
> which bloat starts affecting performance; thus, we want to vacuum at
> that level but not sooner.  This does mean that very large tables which
> never have more than 10% updates/deletes don't get vacuumed at all until
> freeze_age; this is a *good thing*. VACUUM on large tables is expensive;
> you don't *want* to vacuum a billion-row table which has only 100,000
> updates.

Hmm, now that we have partial vacuum, perhaps we should revisit this.


> It would be worth doing a DBT2/DBT5 test run with different autovac
> settings post-8.4 so see if we should specifically change the vacuum
> threshold.

Right.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [PERFORM] autovacuum strategy / parameters

2010-04-26 Thread Alvaro Herrera
Rick wrote:

> So, in a large table, the scale_factor is the dominant term. In a
> small table, the threshold is the dominant term. But both are taken
> into account.

Correct.

> The default values are set for small tables; it is not being run for
> large tables.

So decrease the scale factor and leave threshold alone.
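
For reference, autovacuum decides to vacuum a table when

    dead tuples > autovacuum_vacuum_threshold
                  + autovacuum_vacuum_scale_factor * reltuples

so with the defaults (50 + 0.2 * n), a 10-million-row table only gets
vacuumed after about 2,000,050 dead rows.  On 8.4 you can override the
settings per table, e.g. (table name and values are just an example):

    ALTER TABLE big_table
      SET (autovacuum_vacuum_scale_factor = 0.02,
           autovacuum_vacuum_threshold = 1000);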

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [PERFORM] Autovaccum with cost_delay does not complete on one solaris 5.10 machine

2010-04-16 Thread Alvaro Herrera
Josh Berkus wrote:
> Tom,
> 
> Neither database has any per-table autovacuum settings.
> 
> However, since this is a production database, I had to try
> something, and set vacuum_cost_limit up to 1000.  The issue with
> vacuuming one page at a time went away, or at least I have not seen
> it repeat in the last 16 hours.

How many autovac workers are there?

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [PERFORM] Autovaccum with cost_delay does not complete on one solaris 5.10 machine

2010-04-15 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:
> > We don't call pollsys anywhere.  Something in Solaris must be doing it
> > under the hood.
> 
> pg_usleep calls select(), and some googling indicates that select() is
> implemented as pollsys() on recent Solaris versions.  So Josh's
> assumption that those are delay calls seems plausible.  But it shouldn't
> be sleeping after each page with normal cost_delay parameters, should it?

Certainly not ... The only explanation would be that the cost balance
gets over the limit very frequently.  So one of the params would have to
be abnormally high (vacuum_cost_page_hit, vacuum_cost_page_miss,
vacuum_cost_page_dirty).

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [PERFORM] Autovaccum with cost_delay does not complete on one solaris 5.10 machine

2010-04-15 Thread Alvaro Herrera
Josh Berkus wrote:

> Basically, vacuuming of a table which normally takes about 20 minutes
> interactively with vacuum_cost_delay set to 20 had not completed after
> 14 hours.  When I trussed it, I saw activity which indicated to me that
> autovacuum was doing a pollsys, presumably for cost_limit, every data page.
> 
> Autovacuum was running with vacuum_cost_limit = 200 and
> autovacuum_vacuum_cost_delay = 20, which I believe is the default for 8.3.
> 
> Truss output:
> 
> pollsys(0xFD7FFFDF83E0, 0, 0xFD7FFFDF8470, 0x) = 0

So what is it polling?  Please try "truss -v pollsys"; is there a way in
Solaris to report what each file descriptor is pointing to?  (In Linux
I'd look at /proc/<pid>/fd)

We don't call pollsys anywhere.  Something in Solaris must be doing it
under the hood.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [PERFORM] stats collector suddenly causing lots of IO

2010-04-13 Thread Alvaro Herrera
Chris wrote:
> I have a lot of centos servers which are running postgres.  Postgres
> isn't used that heavily on any of them, but lately, the stats collector
> process keeps causing tons of IO load.  It seems to happen only on
> servers with centos 5.

Does this correlate to an increase in size of the pgstat.stat file?
Maybe you could try resetting stats, so that the file goes back to an
initial size and is slowly repopulated.  I'd suggest monitoring the size
of the stats file, just in case there's something abnormal with it.
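
For instance (pg_stat_reset() only clears the counters for the current
database):

    SELECT pg_stat_reset();

and then watch the file size -- global/pgstat.stat under the data
directory on 8.3 and earlier, pg_stat_tmp/pgstat.stat on 8.4.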

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [PERFORM] Database size growing over time and leads to performance impact

2010-03-31 Thread Alvaro Herrera
Scott Carey wrote:
> 
> On Mar 27, 2010, at 6:35 AM, Andy Colson wrote:
> > 
> > Don't "VACUUM FULL"; it's not helping you, and it's being removed in
> > newer versions.
> > 
> 
> Off topic:  How is that going to work?  CLUSTER doesn't work on tables
> without an index.  I would love to be able to CLUSTER on some column
> set that doesn't necessarily have an index.

VACUUM FULL has been rewritten in 9.0 so that it uses the CLUSTER logic,
except that it doesn't require an index.

If you want to do it in earlier versions, you can use a no-op SET TYPE
command, like so:

ALTER TABLE foo ALTER COLUMN bar TYPE baz;

assuming that table foo has a column bar which is already of type baz.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [PERFORM] Building multiple indexes concurrently

2010-03-17 Thread Alvaro Herrera
Andres Freund escribió:

> I find it much easier to believe such issues exist on tables in
> contrast to indexes. The likelihood of getting sequential accesses on an
> index is small enough on a big table to make it unlikely to matter much.

Vacuum walks indexes sequentially, for one.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [PERFORM] Block at a time ...

2010-03-16 Thread Alvaro Herrera
Dave Crooke escribió:

> An awesomely simple alternative is to just specify the extension as e.g. 5%
> of the existing table size  it starts by adding one block at a time for
> tiny tables, and once your table is over 20GB, it ends up adding a whole 1GB
> file and pre-allocating it. Very little wastage.

I was thinking in something like that, except that the factor I'd use
would be something like 50% or 100% of current size, capped at (say) 1 GB.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [PERFORM] shared_buffers advice

2010-03-16 Thread Alvaro Herrera
Tom Lane escribió:
> Alvaro Herrera  writes:
> > Tom Lane escribió:
> >> That's not going to do anything towards reducing the actual I/O volume.
> >> Although I suppose it might be useful if it just cuts the number of
> >> seeks.
> 
> > Oh, they had no problems with I/O volume.  It was relation extension
> > lock that was heavily contended for them.
> 
> Really?  I guess that serialized all the I/O ... I'll bet if we got rid
> of that locking somehow, they *would* have a problem with I/O volume.

Well, that would solve the problem as far as I'm concerned and they'd
have to start talking to their storage provider ;-)

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [PERFORM] shared_buffers advice

2010-03-16 Thread Alvaro Herrera
Tom Lane escribió:
> Alvaro Herrera  writes:
> > Tom Lane escribió:
> >> Reorder to what, though?  You still have the problem that we don't know
> >> much about the physical layout on-disk.
> 
> > Well, to block numbers as a first step.
> 
> fsync is a file-based operation, and we know exactly zip about the
> relative positions of different files on the disk.

Doh, right, I was thinking of the sync_file_range() kind of API.


> > We had a customer that had a
> > performance problem because they were inserting lots of data to TOAST
> > tables, causing very frequent extensions.  I kept wondering whether an
> > allocation policy that allocated several new blocks at a time could be
> > useful (but I didn't try it).  This would also alleviate fragmentation,
> > thus helping the physical layout be more similar to logical block
> > numbers.
> 
> That's not going to do anything towards reducing the actual I/O volume.
> Although I suppose it might be useful if it just cuts the number of
> seeks.

Oh, they had no problems with I/O volume.  It was relation extension
lock that was heavily contended for them.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [PERFORM] shared_buffers advice

2010-03-16 Thread Alvaro Herrera
Tom Lane escribió:
> Alvaro Herrera  writes:
> > Maybe it would make more sense to try to reorder the fsync calls
> > instead.
> 
> Reorder to what, though?  You still have the problem that we don't know
> much about the physical layout on-disk.

Well, to block numbers as a first step.

However, this reminds me that sometimes we take the block-at-a-time
extension policy too seriously.  We had a customer that had a
performance problem because they were inserting lots of data to TOAST
tables, causing very frequent extensions.  I kept wondering whether an
allocation policy that allocated several new blocks at a time could be
useful (but I didn't try it).  This would also alleviate fragmentation,
thus helping the physical layout be more similar to logical block
numbers.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [PERFORM] shared_buffers advice

2010-03-16 Thread Alvaro Herrera
Greg Stark escribió:
> On Tue, Mar 16, 2010 at 2:30 PM, Tom Lane  wrote:
> > "Pierre C"  writes:
> >> Does PG issue checkpoint writes in "sorted" order ?
> >
> > No.  IIRC, a patch for that was submitted, and rejected because no
> > significant performance improvement could be demonstrated.  We don't
> > have enough information about the actual on-disk layout to be very
> > intelligent about this, so it's better to just issue the writes and
> > let the OS sort them.
> 
> Keep in mind that postgres is issuing writes to the OS buffer cache.
> It defers fsyncing the files as late as it can in the hopes that most
> of those buffers will be written out by the OS before then. That gives
> the OS a long time window in which to flush them out in whatever order
> and whatever schedule is most convenient.

Maybe it would make more sense to try to reorder the fsync calls
instead.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [PERFORM] Internal operations when the planner makes a hash join.

2010-02-23 Thread Alvaro Herrera
negora wrote:

> According to how I understood the process, the engine would get the
> name from the student with ID 1 and would look for the name of the
> father with ID 1 in the hashed table. It'd do exactly the same with the
> student #2 and father #2. But my big doubt is about the 3rd one
> (Anthony). Would the engine "know" that it already had retrieved the
> father's name for the student 1 and would avoid searching for it into
> the hashed table (using some kind of internal mechanism which allows to
> "re-utilize" the name)? Or would it search into the hashed table again?

The hash table is searched again.  But that's fast, because it's a hash
table.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [PERFORM] AutoVacuum_NapTime

2010-02-22 Thread Alvaro Herrera
George Sexton wrote:

> If I'm cold starting the system, would it vacuum all 330 databases and then
> wait 720 minutes and then do them all again, or would it distribute the
> databases more or less evenly over the time period?

the latter
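
(That is, assuming the 720 minutes you mention is your
autovacuum_naptime: each database gets visited about once per naptime,
so with 330 databases a worker is launched roughly every 720/330, or
about 2.2 minutes.)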

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [PERFORM] moving pg_xlog -- yeah, it's worth it!

2010-02-12 Thread Alvaro Herrera
Kevin Grittner wrote:
> Alvaro Herrera  wrote:

> > Actually, a transaction that performed no writes doesn't get a
> > commit WAL record written, so it shouldn't make any difference at
> > all.
>  
> Well, concurrent to the web application is the replication.  Would
> asynchronous commit of that potentially alter the pattern of writes
> such that it had less impact on the reads?

Well, certainly async commit would completely change the pattern of
writes: it would give the controller an opportunity to reorder them
according to some scheduler.  Otherwise they are strictly serialized.

> I'm thinking, again, of
> why the placement of the pg_xlog on a separate file system made such
> a dramatic difference to the read-only response time -- might it
> make less difference if the replication was using asynchronous
> commit?

Yeah, I think it would have been less noticeable, but this is all
theoretical.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [PERFORM] moving pg_xlog -- yeah, it's worth it!

2010-02-12 Thread Alvaro Herrera
Alvaro Herrera wrote:
> Kevin Grittner wrote:

> > Anyway, given that these are replication
> > targets, and aren't the "database of origin" for any data of their
> > own, I guess there's no reason not to try asynchronous commit. 
> 
> Yeah; since the transactions only ever write commit records to WAL, it
> wouldn't matter a bit that they are lost on crash.  And you should see
> an improvement, because they wouldn't have to flush at all.

Actually, a transaction that performed no writes doesn't get a commit
WAL record written, so it shouldn't make any difference at all.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [PERFORM] moving pg_xlog -- yeah, it's worth it!

2010-02-12 Thread Alvaro Herrera
Kevin Grittner wrote:
> Hannu Krosing  wrote:
>  
> > Can it be, that each request does at least 1 write op (update
> > session or log something) ?
>  
> Well, the web application connects through a login which only has
> SELECT rights; but, in discussing a previous issue we've pretty well
> established that it's not unusual for a read to force a dirty buffer
> to write to the OS.  Perhaps this is the issue here again.  Nothing
> is logged on the database server for every request.

I don't think it explains it, because dirty buffers are obviously
written to the data area, not pg_xlog.

> I wonder if it might also pay to make the background writer even more
> aggressive than we have, so that SELECT-only queries don't spend so
> much time writing pages.

That's worth trying.

> Anyway, given that these are replication
> targets, and aren't the "database of origin" for any data of their
> own, I guess there's no reason not to try asynchronous commit. 

Yeah; since the transactions only ever write commit records to WAL, it
wouldn't matter a bit that they are lost on crash.  And you should see
an improvement, because they wouldn't have to flush at all.
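
That's just (on 8.3 and up):

    SET synchronous_commit = off;

per session, or synchronous_commit = off in postgresql.conf to apply it
to every connection on the replication targets.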

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [PERFORM] moving pg_xlog -- yeah, it's worth it!

2010-02-11 Thread Alvaro Herrera
Kevin Grittner wrote:

> Another example of why I shouldn't trust my memory.  Per the
> hardware tech:
>  
>  
> OS:  /dev/sda   is RAID1  -  2  x  2.5" 15k SAS disk
> pg_xlog: /dev/sdb   is RAID1  -  2  x  2.5" 15k SAS disk
>  
> These reside on a ServeRAID-MR10k controller with 256MB BB cache.
>  
>  
> data:/dev/sdc   is RAID5  -  30 x 3.5" 15k SAS disk
>  
> These reside on the DS3200 disk subsystem with 512MB BB cache per
> controller and redundant drive loops.

Hmm, so maybe the performance benefit is not from it being on a separate
array, but from it being RAID1 instead of RAID5?

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


[PERFORM] Re: [HACKERS] Re: Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)

2010-02-07 Thread Alvaro Herrera
Andres Freund escribió:

> I personally think the fsync on the directory should be added to the stable 
> branches - other opinions?
> If wanted I can prepare patches for that.

Yeah, it seems there are two patches here -- one is the addition of
fsync_fname() and the other is the fsync_prepare stuff.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [PERFORM] bigint integers up to 19 digits.

2010-02-04 Thread Alvaro Herrera
Jochen Erwied escribió:

> Maybe using 'numeric(19)' instead of bigint is an alternative. I actually
> don't know how these numbers are stored internally (some kind of BCD, or as
> base-100?), but IMHO they should be faster than strings, although not as
> fast as 'native' types.

base 10000 in the current implementation

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [PERFORM] bigint integers up to 19 digits.

2010-02-04 Thread Alvaro Herrera
Tory M Blue escribió:

> I looked into the numeric data type, but the docs say that it can be slow.

It is slower than values that fit in a single CPU register, sure.  Is it
slow enough that you can't use it?  That's a different question.  I'd
give it a try -- maybe it's not all that slow.
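
A quick way to get a feel for the difference on your own hardware, as a
scratch query:

    EXPLAIN ANALYZE SELECT sum(x::bigint)  FROM generate_series(1, 1000000) x;
    EXPLAIN ANALYZE SELECT sum(x::numeric) FROM generate_series(1, 1000000) x;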

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [PERFORM] the jokes for pg concurrency write performance

2010-02-02 Thread Alvaro Herrera
Scott Marlowe escribió:
> 2010/2/1  :

> Let's rewrite this assertion:
> > * joke 1: insert operation would use a excluse lock on reference row by the
> > foreign key . a big big big performance killer , i think this is a stupid
> > design .
> 
> "problem #1: insert operation would use a excluse lock on reference row by the
> foreign key . a big big big performance killer.  "

Yeah, if it had been written this way I could have told him that this
is not the case since 8.1, but since he didn't, I simply skipped his
emails.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [PERFORM] New server to improve performance on our large and busy DB - advice?

2010-01-21 Thread Alvaro Herrera
Scott Marlowe escribió:
> On Thu, Jan 21, 2010 at 8:51 AM, Alvaro Herrera wrote:
> > Scott Marlowe escribió:
> >> On Thu, Jan 14, 2010 at 12:17 PM, Carlo Stonebanks wrote:
> >
> >> > 4) Is this the right PG version for our needs?
> >>
> >> 8.3 is very stable.  Update to the latest.  8.4 seems good, but I've
> >> had, and still am having, problems with it crashing in production.
> >> Not often, maybe once every couple of months, but just enough that I'm
> >> not ready to try and use it there yet.  And I can't force the same
> >> failure in testing, at least not yet.
> >
> > uh.  Is there a report of the crash somewhere with details, say stack
> > traces and such?
> 
> No, the only server that does this is in production as our stats db
> and when it happens it usually gets restarted immediately.  It does
> this about once every two months.  Do the PGDG releases have debugging
> symbols and what not?  I'll see about having a stack trace ready to
> run for the next time it does this.

You mean the RPMs?  Yes, I think Devrim publishes debuginfo packages
which you need to install separately.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [PERFORM] New server to improve performance on our large and busy DB - advice?

2010-01-21 Thread Alvaro Herrera
Scott Marlowe escribió:
> On Thu, Jan 14, 2010 at 12:17 PM, Carlo Stonebanks wrote:

> > 4) Is this the right PG version for our needs?
> 
> 8.3 is very stable.  Update to the latest.  8.4 seems good, but I've
> had, and still am having, problems with it crashing in production.
> Not often, maybe once every couple of months, but just enough that I'm
> not ready to try and use it there yet.  And I can't force the same
> failure in testing, at least not yet.

uh.  Is there a report of the crash somewhere with details, say stack
traces and such?

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Alvaro Herrera
Alvaro Herrera escribió:

> No amount of tinkering is going to change the fact that a seqscan is the
> fastest way to execute these queries.  Even if you got it to be all in
> memory, it would still be much slower than the other systems which, I
> gather, are using columnar storage and thus are perfectly suited to this
> problem (unlike Postgres).  The talk about "compression ratios" caught
> me by surprise until I realized it was columnar stuff.  There's no way
> you can get such high ratios on a regular, row-oriented storage.

FWIW if you want a fair comparison, get InnoDB numbers.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Alvaro Herrera
Lefteris escribió:
> Yes, I am reading the plan wrong! I thought that each row from the
> plan reported the total time for the operation but it actually reports
> the starting and ending point.
> 
> So we all agree that the problem is on the scans:)
> 
> So the next question is why changing shared memory buffers will fix
> that? i only have one session with one connection, do I have like many
> reader workers or something?

No amount of tinkering is going to change the fact that a seqscan is the
fastest way to execute these queries.  Even if you got it to be all in
memory, it would still be much slower than the other systems which, I
gather, are using columnar storage and thus are perfectly suited to this
problem (unlike Postgres).  The talk about "compression ratios" caught
me by surprise until I realized it was columnar stuff.  There's no way
you can get such high ratios on a regular, row-oriented storage.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [PERFORM] FSM - per database or per installation?

2009-12-23 Thread Alvaro Herrera
Craig James wrote:
> Heikki Linnakangas wrote:

> >The parameter is gone in 8.4, BTW.
> 
> Both max_fsm_relations and max_fsm_pages?

Yes, both are gone.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [PERFORM] Is Diskeeper Automatic Mode safe?

2009-11-16 Thread Alvaro Herrera
Craig James escribió:

> Do it more than once.  This is a highly erratic test that can catch
> your system at a wide variety of points, some of which cause no
> problems, and some of which can be catastrophic.  If you test and it
> fails, you know you have a problem.  If you test and it doesn't fail,
> you don't know much.  It's only when you've tested a number of times
> without failure that you've gained any real knowledge.

Of course, you're only truly safe when you've tested infinite times,
which may take a bit longer than management expects.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [PERFORM] UUID as primary key

2009-10-16 Thread Alvaro Herrera
decibel escribió:

> >If you want it to be seamless and fully optimal, you would
> >introduce a new int256 type (or whatever the name of the type you
> >are trying to represent). Adding new types to PostgreSQL is not
> >that hard. This would allow queries (=, <>, <, >) as well.
> 
> If you want an example of that, we had Command Prompt create a full
> set of hash datatypes (SHA*, and I think md5). That stuff should be
> on pgFoundry; if it's not drop me a note at jna...@cashnetusa.com
> and I'll get it added.

It's at project "shatypes".

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [PERFORM] How to post Performance Questions

2009-09-21 Thread Alvaro Herrera
Kevin Grittner wrote:
> Michael Glaesemann  wrote:
> > On Sep 14, 2009, at 16:55 , Josh Berkus wrote:
>  
> >> Please read the following two documents before posting your  
> >> performance query here:
> >>
> >> http://wiki.postgresql.org/wiki/Guide_to_reporting_problems
> >> http://wiki.postgresql.org/wiki/SlowQueryQuestions
> >>
> >> This will help other users to troubleshoot your problems far
> >> more rapidly.
> > 
> > Can something similar be added to the footer of (at least) the  
> > performance list?
>  
> Perhaps on this page?:
>  
> http://www.postgresql.org/community/lists/

Done this part.  (It'll take some time to propagate.)

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [PERFORM] Planner question - "bit" data types

2009-09-07 Thread Alvaro Herrera
Karl Denninger escribió:

> The individual boolean fields don't kill me and in terms of some of the
> application issues they're actually rather easy to code for.
> 
> The problem with re-coding for them is extensibility (by those who
> install and administer the package); a mask leaves open lots of extra
> bits for "site-specific" use, where hard-coding booleans does not, and
> since the executable is a binary it instantly becomes a huge problem for
> everyone but me.

Did you try hiding the bitmask operations inside a function as Tom
suggested?

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [PERFORM] Planner question - "bit" data types

2009-09-05 Thread Alvaro Herrera
Karl Denninger escribió:
> Tom Lane wrote:

> > You never showed us any EXPLAIN results,
> Yes I did.  Go back and look at the archives.  I provided full EXPLAIN
> and EXPLAIN ANALYZE results for the original query.  Sheesh.

You did?  Where?  This is your first message in this thread:
http://archives.postgresql.org/pgsql-performance/2009-09/msg00059.php
No EXPLAINs anywhere to be seen.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [PERFORM] number of rows estimation for bit-AND operation

2009-08-21 Thread Alvaro Herrera
Robert Haas escribió:

> Scott, did you check whether a toast table got created here and what
> the size of it was?

A table with only bool columns (and, say, one int8 column) would not
have a toast table.  Only varlena columns produce toast tables.
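
Easy enough to check, e.g. (substitute your table name):

    SELECT reltoastrelid::regclass
      FROM pg_class WHERE relname = 'your_table';

which shows "-" when the table has no TOAST table at all.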

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [PERFORM] Number of tables

2009-08-20 Thread Alvaro Herrera
Greg Stark wrote:

> It would be nice to have a solution to that where you could create
> lightweight temporary objects which belong to an "application session"
> which can be picked up by a different database connection each go
> around.

It would be useful:

CREATE SCHEMA session1234 UNLOGGED
  CREATE TABLE hitlist ( ... );

Each table in the "session1234" schema would not be WAL-logged, and
would be automatically dropped on crash recovery (actually the whole
schema would be).  But while the server is live it behaves like a
regular schema/table and can be seen by all backends (i.e. not temp)

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [PERFORM] Per-database warm standby?

2009-08-14 Thread Alvaro Herrera
Tom Lane wrote:
> Craig James  writes:
> > 8.4 has vastly improved the warm-standby features, but it looks to me like 
> > this is still an installation-wide backup, not a per-database backup.  That 
> > is, if you have (say) a couple hundred databases, and you only want 
> > warm-backup on one of them, you can't do it (except using other solutions 
> > like Slony).  Is that right?
> 
> Correct, and that's always going to be true of any WAL-based solution.

Except that we could create a "WAL filter" to restore only relevant
stuff to particular databases ...  Would that work?  Of course, it would
have to ensure that global objects are also recovered, but we could
simply ignore commands for other databases.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


[PERFORM] Re: [HACKERS] freezing tuples ( was: Why is vacuum_freeze_min_age 100m? )

2009-08-13 Thread Alvaro Herrera
Jeff Davis wrote:

> Why aren't we more opportunistic about freezing tuples? For instance, if
> we already have a dirty buffer in cache, we should be more aggressive
> about freezing those tuples than freezing tuples on disk.

The most widely cited reason is that you lose forensics data.  Although
they are increasingly rare, there are still situations in which the heap
tuple machinery messes up and the xmin/xmax/etc fields of the tuple are
the best/only way to find out what happened and thus fix the bug.  If
you freeze early, there's just no way to know.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [PERFORM] Memory usage of writer process

2009-08-13 Thread Alvaro Herrera
Alex wrote:
> The writer process seems to be using inordinate amounts of memory:
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 11088 postgres  13  -2 3217m 2.9g 2.9g S    0 38.7  0:10.46  postgres: writer process
> 20190 postgres  13  -2 3219m  71m  68m S    0  0.9  0:52.48  postgres: cribq cribq [local] idle
> 
> I am writing moderately large (~3k) records to my database a few times
> a second.  Even when I stop doing that, the process continues to take
> up all of that memory.
> 
> Am I reading this right?  Why is it using so much memory?

shared_buffers?
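
(That is: top counts shared memory that a process has touched as part
of its RES, so the writer's RES will tend toward your shared_buffers
setting over time -- compare the 2.9g above with the output of

    SHOW shared_buffers;

It is not per-process memory being leaked.)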

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [PERFORM] [BUGS] BUG #4919: CREATE USER command slows down system performance

2009-08-08 Thread Alvaro Herrera
Tom Lane wrote:

> In some sense this is a bootstrap problem: what does it take to get to
> the point of being able to read pg_database and its indexes?  That is
> necessarily not dependent on the particular database we want to join.
> Maybe we could solve it by having the relcache write a "global" cache
> file containing only entries for the global tables, and load that before
> we have identified the database we want to join (after which, we'll load
> another cache file for the local entries).

This sounds good, because autovacuum could probably use this too.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [PERFORM] [BUGS] BUG #4919: CREATE USER command slows down system performance

2009-08-08 Thread Alvaro Herrera
Tom Lane wrote:

> Actually, I had forgotten that we were using the pg_database flatfile
> for purposes other than authentication checks.  In particular, we need
> it during backend startup to map from database name to database OID,
> without which it's impossible to locate the system catalogs for the
> target database.  It's pretty hard to see a way around that one.
> We could grovel through pg_database itself, as indeed is done to rebuild
> the flatfile during system start.  But that's certainly not going to be
> fast in cases where there are enough DBs to make the flatfile slow.

Also, IIRC flatfiles were introduced precisely to avoid having to read
the catalogs manually.

> So on third thought, Alvaro's right: the only real solution here is to
> adopt a more efficient representation of the flat files.  Maybe some
> sort of simple hashtable arrangement would work.  (Rendering them not so
> flat anymore...)

As long as there's a simple API, there should be no problem.

(Except that it would be nice to be able to build the file incrementally
...  If we have to write out a million lines every time a user is added
to a million-user system, there will still be a bottleneck at CREATE
USER time.)

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [PERFORM] [BUGS] BUG #4919: CREATE USER command slows down system performance

2009-08-08 Thread Alvaro Herrera
Tom Lane wrote:

> Well, it's a TO-THINK-ABOUT anyway.  I think the appropriate next step
> would not be to write code, but to do a detailed investigation of what
> would be gained or lost.  I don't remember exactly what we do with the
> flat-file contents.

Maybe what we need is not to get rid of the flat files, but to speed
them up.  If we're worried about speed in the pg_authid flatfile, and
come up with a solution to that problem, what will we do with the
pg_database flatfile when it grows too large?  We can't just get rid of
it, because autovacuum needs to access it.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [PERFORM] postgresql and syslog

2009-08-07 Thread Alvaro Herrera
Michael Nacos escribió:

> I would be very surprised if logging had a significant overhead with
> any method you choose. There's probably something very wrong with your
> setup if this is the case.

Either something very wrong, or the load is extremely high.  In the
latter case perhaps it would make sense to ship syslog to a remote
machine.  Since it uses UDP sockets, it wouldn't block when overloaded
but rather lose messages (besides, it means it has low overhead).
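
For the record, the pieces involved would look something like this
(facility and hostname are only illustrative):

    # postgresql.conf
    log_destination = 'syslog'
    syslog_facility = 'LOCAL0'

    # /etc/syslog.conf, forwarding to the remote machine
    local0.*    @loghost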

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-08-04 Thread Alvaro Herrera
Merlin Moncure escribió:
> On Mon, Aug 3, 2009 at 5:30 PM, PFC wrote:
> >
> >> lzo is much, much, (much) faster than zlib.  Note, I've tried several
> >
> > decompression speed is even more awesome...
> >
> >> times to contact the author to get clarification on licensing terms
> >> and have been unable to get a response.
> >
> > lzop and the LZO library are distributed under the terms of the GNU General
> > Public License (GPL).
> > source : http://www.lzop.org/lzop_man.php
> 
> yeah...I have another project I'm working on that is closed source,
> plus I was curious if something could be worked out for pg...lzo seems
> ideal for database usage.

I think this was already discussed here.  It turns out that a specific
exception for PG wouldn't be acceptable because of the multiple
commercial derivatives.  LZO would have to become BSD, which presumably
the author just doesn't want to do.

Maybe we could have a --enable-lzo switch similar to what we do with
readline.  Of course, a non-LZO-enabled build would not be able to read
a dump from such a build.  (We could also consider LZO for TOAST
compression).

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [PERFORM] [BUGS] Postgres user authentification or LDAP authentification

2009-07-26 Thread Alvaro Herrera
Lauris Ulmanis wrote:
> Hello!
> 
> I posted a message about the slowness of creating users when there are
> more than 500 000 of them (#4919). It seems there is no workaround for
> this problem because of the use of the pg_auth flat file.
> 
> To work around this problem, is it possible to use the LDAP
> authentication method to identify each user and speed up the system?

No.  The users still need to exist in the PG auth system.

I'm sure this is just some missing optimization.  Feel free to work on
the code to improve performance for these cases.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


[PERFORM] Re: [BUGS] BUG #4919: CREATE USER command slows down system performance

2009-07-15 Thread Alvaro Herrera
toruvinn wrote:
On Wed, 15 Jul 2009 16:02:09 +0200, Alvaro Herrera wrote:
>> My bet is on the pg_auth flat file.  I doubt we have ever tested the
>> behavior of that code with 1 billion users ...

> I was always wondering, though, why PostgreSQL uses this approach and not 
> its catalogs.

It does use the catalog for most things.  The flatfile is used for the
situations where the catalogs are not yet ready to be read.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


[PERFORM] Re: [BUGS] BUG #4919: CREATE USER command slows down system performance

2009-07-15 Thread Alvaro Herrera
Lauris Ulmanis wrote:
> Hello again!
> 
> I did a test on my local test server.
> 
> I created 500 000 users in a function loop very quickly - within 48
> seconds. I ran the script again, reaching up to 1 billion users - the
> result was the same - 48 seconds. It is very quick.
> 
> But the problem seems to be with transaction preparation, because if
> there are 1 billion users in the database and I want to create 1 new
> one - it takes 4 seconds!
> 
> After that I generated up to 2 billion users on this server (the
> generation process took just 1.44 minutes - again quick).
> 
> And I did 1 user creation again - now it took 9 seconds!
> 
> What is the reason for this slowness? Is there a workaround or solution
> to avoid it?

My bet is on the pg_auth flat file.  I doubt we have ever tested the
behavior of that code with 1 billion users ...

Do you really need 1 billion users?  Are you planning on giving accounts
to every human being in the planet or what?  I mean, what's the point of
this test?

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [PERFORM] Very big insert/join performance problem (bacula)

2009-07-15 Thread Alvaro Herrera
Marc Cousin escribió:

> There are other things I am thinking of: maybe it would be better to
> have sort space on another (and not DRBD'ed) RAID set? We have a quite
> cheap setup right now for the database, and I think maybe this would
> help it scale better. I can get a filesystem in another volume group,
> which is not used that much for now.

You know, that's the first thing that came to mind when I read you're
using DRBD.  Have you tried setting temp_tablespaces to a non-replicated
disk?
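
Roughly (path and tablespace name invented for the example):

    CREATE TABLESPACE scratch LOCATION '/mnt/local_disk/pg_scratch';
    SET temp_tablespaces = 'scratch';  -- or set it in postgresql.conf

so sorts and temp files stay off the DRBD device.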

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
