Re: [PERFORM] How to get higher tps

2006-08-21 Thread Jeff Davis
On Mon, 2006-08-21 at 16:45 -0400, Marty Jia wrote:
> I'm exhausted to try all performance tuning ideas, like following
> parameters
> 
> shared_buffers 
> fsync 

By "tuning" fsync, what do you mean? Did you turn it off?

If you turned fsync off, that could compromise your data in case of any
kind of crash or power failure. However, if you turn fsync off you
should see much higher TPS on pgbench than you're getting.


> Dual Intel Xeon 2.8GHz
> 6GB RAM
> Linux 2.4 kernel
> RedHat Enterprise Linux AS 3
> 200GB for PGDATA on 3Par, ext3
> 50GB for WAL on 3Par, ext3

Does your disk controller have battery-backed writeback cache? How much?

> With PostgreSql 8.1.4
> 
> We don't have i/o bottle neck. 
> 

Well, chances are PostgreSQL is waiting for fsync, which means you do
have an I/O bottleneck (however, you're not using all of your I/O
bandwidth, most likely).

Regards,
Jeff Davis


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[PERFORM] Storage Options

2006-08-21 Thread Jeff Davis
I am trying to decide what kind of storage options to use for a pair of
good database servers, a primary and a read-only that can be a failover.
Here is what I'm thinking so far:

(1) We have a nice NetApp that can do iSCSI. It has a large (multi-GB)
battery-backed cache so it could potentially perform the transactions at
a very high rate. However there are many other applications accessing
the NetApp over NFS, so I am not sure what performance to expect. Any
suggestions about using network storage like this for the database? Will
the database place heavy demands on the NetApp and force my department to
spend a lot on new NetApp hardware?

(2) I read with interest this thread:
http://archives.postgresql.org/pgsql-performance/2006-08/msg00164.php

Is there any consensus on whether to do WAL on a RAID-1 and PGDATA on a
RAID-10 versus everything on a RAID-10? How does the number of disks I
have affect this decision (I will probably have 4-8 disks per server)?

The applications I initially need to support will mostly generate a high
volume of simple transactions without many table scans, if that helps.
However, I expect that these servers will eventually need to serve many
purposes.

Any other suggestions are appreciated. Is there a common place to look
for hardware suggestions (like a postgresql hardware FAQ)?

Regards,
    Jeff Davis


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] PowerEdge 2950 questions

2006-08-22 Thread Jeff Davis
This question is related to the thread:
http://archives.postgresql.org/pgsql-performance/2006-08/msg00152.php
but I had some questions.

I am looking at setting up two general-purpose database servers,
replicated with Slony. Each server I'm looking at has the following
specs:

Dell PowerEdge 2950
- 2 x Dual Core Intel® Xeon® 5130, 4MB Cache, 2.00GHz, 1333MHZ FSB
- 4GB RAM
- PERC 5/i, x6 Backplane, Integrated Controller Card (256MB battery-
backed cache)
- 6 x 73GB, SAS, 3.5-inch, 15K RPM Hard Drive arranged in RAID 10

These servers are reasonably priced for what they offer, and the above
thread indicated good performance.
However, I want to make sure that putting WAL in with PGDATA on the
RAID-10 is wise. And if there are any other suggestions that would be
great. Is the RAID controller good? Are the processors good for database
work or are Opterons significantly better?

I may go for more storage as well (i.e. getting 300GB disks), but I am
still determining the potential need for storage. I can get more RAM at
a later date if necessary also.

Regards,
    Jeff Davis


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] VACUUM FULL needed sometimes to prevent transaction

2006-08-22 Thread Jeff Davis
On Tue, 2006-08-22 at 20:10 +0200, Marinos Yannikos wrote:
> Hello,
> 
> we're looking into the reason why we are getting warnings about 
> transaction ID wraparound despite a daily "vaccumdb -qaz". Someone is 
> claiming that VACUUM without FULL cannot reassign XIDs properly when 
> max_fsm_pages was set too low (it says so here too, but this is rather 
> old: http://www.varlena.com/GeneralBits/Tidbits/perf.html#maxfsmp). Is 
> this true, or do we have a different issue here? We're using 8.1.3 with 
> a database generated on 8.1.3 (i.e. not migrated from 7.x or anything 
> like that).

Usually this is caused by either:
(1) You're not vacuuming as a superuser, so it's not able to vacuum
everything.
(2) You have a long-running transaction that never completed for some
strange reason.

Hope this helps,
Jeff Davis


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] PowerEdge 2950 questions

2006-08-22 Thread Jeff Davis
On Tue, 2006-08-22 at 17:56 -0400, Bucky Jordan wrote:
> Hi Jeff,
> 
> My experience with the 2950 seemed to indicate that RAID10x6 disks did
> not perform as well as RAID5x6. I believe I posted some numbers to
> illustrate this in the post you mentioned. 
> 

Very interesting. I always hear that people avoid RAID 5 on database
servers, but I suppose it always depends. Is the parity calculation
something that may increase commit latency vs. a RAID 10? That's
normally the explanation that I get.

> If I remember correctly, the numbers were pretty close, but I was
> expecting RAID10 to significantly beat RAID5. However, with 6 disks,
> RAID5 starts performing a little better, and it also has good storage
> utilization (i.e. you're only loosing 1 disk's worth of storage, so with
> 6 drives, you still have 83% - 5/6 - of your storage available, as
> opposed to 50% with RAID10). 

Right, RAID 5 is certainly tempting since I get so much more storage.

> Keep in mind that with 6 disks, theoretically (your mileage may vary by
> raid controller implementation) you have more fault tolerance with
> RAID10 than with RAID5.

I'll also have the Slony system, so I think my degree of safety is still
quite high with RAID-5.

> Also, I don't think there's a lot of performance gain to going with the
> 15k drives over the 10k. Even dell only says a 10% boost. I've
> benchmarked a single drive configuration, 10k vs 15k rpm, and yes, the
> 15k had substantially better seek times, but raw io isn't much
> different, so again, it depends on your application's needs.

Do you think the seek time may affect transaction commit time though,
rather than just throughput? Or does it not make much difference since
we have writeback?

> Lastly, re your question on putting the WAL on the RAID10- I currently
> have the box setup as RAID5x6 with the WAL and PGDATA all on the same
> raidset. I haven't had the chance to do extensive tests, but from
> previous readings, I gather that if you have write-back enabled on the
> RAID, it should be ok (which it is in my case).

Ok, I won't worry about that then.

> As to how this compares with an Opteron system, if someone has some
> pgbench (or other test) suggestions and a box to compare with, I'd be
> happy to run the same on the 2950. (The 2950 is a 2-cpu dual core 3.0
> ghz box, 8GB ram with 6 disks, running FreeBSD 6.1 amd64 RELEASE if
> you're interested in picking a "fair" opteron equivalent ;)
> 

Based on your results, I think the Intels should be fine. Does each of
the cores have independent access to memory (therefore making memory
access more parallel)?

Thanks very much for the information!

Regards,
Jeff Davis


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[PERFORM] Which benchmark to use for testing FS?

2006-08-23 Thread Jeff Davis
I am planning to test various filesystems on some new hardware I'm
getting. Is pgbench a good way to try out the filesystem?

I'm currently planning to test some or all of:
Linux: ext2, ext3, XFS, JFS, reiser3, reiser4
FreeBSD: UFS, UFS+SU

So, I'm looking for a good way to test just the filesystem performance
through PostgreSQL (since database access is different than normal FS
activity). Would pgbench give me a good approximation?

Also, do ext2 or UFS without soft updates run the risk of losing or
corrupting my data?

I saw Chris Browne did some benchmarks back in 2003 and determined that
JFS was a good choice. However, I assume things have changed somewhat
since then. Does anyone have a pointer to some newer results?

Regards,
    Jeff Davis


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Which benchmark to use for testing FS?

2006-08-24 Thread Jeff Davis
On Wed, 2006-08-23 at 21:50 -0400, Michael Stone wrote:
> On Wed, Aug 23, 2006 at 03:23:03PM -0700, Jeff Davis wrote:
> >Also, do ext2 or UFS without soft updates run the risk of losing or
> >corrupting my data?
> 
> I suggest you check the list archives; there's a lot of stuff about 
> filesystems and disk configuration in there.
> 

I spent a while looking in the list archives, but the list archives have
been misbehaving lately (you click on a search result and the message
that appears doesn't have the same subject as the one you clicked on).
They may have fixed that (they are aware of the problem, according to
pgsql-www). Also, the messages I was able to find were mostly from a
long time ago.

If you have a pointer to a particularly useful thread please let me
know.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] PowerEdge 2950 questions

2006-08-24 Thread Jeff Davis
On Thu, 2006-08-24 at 09:21 -0400, Merlin Moncure wrote:
> On 8/22/06, Jeff Davis <[EMAIL PROTECTED]> wrote:
> > On Tue, 2006-08-22 at 17:56 -0400, Bucky Jordan wrote:
> > Very interesting. I always hear that people avoid RAID 5 on database
> > servers, but I suppose it always depends. Is the parity calculation
> > something that may increase commit latency vs. a RAID 10? That's
> > normally the explanation that I get.
> 
> it's not the parity, it's the seeking.  Raid 5 gives you great
> sequential i/o but random is often not much better than a single
> drive.  Actually it's the '1' in raid 10 that plays the biggest role
> in optimizing seeks on an ideal raid controller.  Calculating parity
> was boring 20 years ago as it inolves one of the fastest operations in
> computing, namely xor. :)
> 

Here's the explanation I got: if you do a write on RAID 5 to something
that is not in the RAID controller's cache, it needs to do a read first
in order to properly recalculate the parity for the write.

However, I'm sure they try to avoid this by leaving the write in the
battery-backed cache until it's more convenient to do the read, or maybe
until the rest of the stripe is written, in which case it doesn't need to
do the read at all. I am not sure of the actual end effect.

> > > Lastly, re your question on putting the WAL on the RAID10- I currently
> > > have the box setup as RAID5x6 with the WAL and PGDATA all on the same
> > > raidset. I haven't had the chance to do extensive tests, but from
> > > previous readings, I gather that if you have write-back enabled on the
> > > RAID, it should be ok (which it is in my case).
> 
> with 6 relatively small disks I think single raid 10 volume is the
> best bet.  however above 6 dedicated wal is usually worth considering.
>  since wal storage requirements are so small, it's becoming affordable
> to look at solid state for the wal.
> 

I've often wondered about that. To a certain degree, that's the same
effect as just having a bigger battery-backed cache, right?

Regards,
Jeff Davis


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Related to Inserting into the database from XML file

2006-08-25 Thread Jeff Davis
On Fri, 2006-08-25 at 21:23 +0530, soni de wrote:
> Hello,
>  
> I want to ask, Is there any way to insert records from XML file to the
> postgres database?

Try the contrib/xml2 module.

>  
> Please provide me some help regarding above query.
>  
> Postgres version which we are using is 7.2.4
>  

I highly recommend upgrading if at all possible. That's quite an old
version.

Hope this helps,
Jeff Davis



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] 64bit vs 32bit build on amd64

2006-09-07 Thread Jeff Davis
On Tue, 2006-09-05 at 00:42 +0400, Roman Krylov wrote:
> Hi.
> My config:
> gentoo linux "2005.1" on amd64x2 in 64-bit mode,
> kernel 2.6.16.12
> glibc 3.3.5(NPTL),
> gcc 3.4.3.
> I had not used portage for building.
> I built two versions of postgres from sources:
> postgresql-8.1.4 native(64bit)
> and 32-bit with CFLAGS=... -m32, and "LD =
> /usr/x86_64-pc-linux-gnu/bin/ld -melf_i386" in src/Makefile.global.
> 32-bit build runs much faster than 64 apparently.
> What benchmark utility should I run to provide more concrete info (numbers)?
> What could be the reason of that difference in performance?
> 

I am also interested in 32-bit versus 64-bit performance. If I only have
4GB of RAM, does it make sense to compile postgresql as a 64-bit
executable? I assume there's no reason for PostgreSQL's shared buffers,
etc., to add up to more than 2GB on a system with 4GB of RAM.

Is there a general consensus on the matter, or is it highly application-
dependent? I am not doing any huge amount of 64-bit arithmetic.

I am using Woodcrest, not Opteron.

Regards,
Jeff Davis



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Performance in a 7 TB database.

2006-09-08 Thread Jeff Davis
On Fri, 2006-09-08 at 10:30 +0100, Nuno Alexandre Alves wrote:
> Hi,
> 
> I have a customer who wants a database solution for a 7 TB database.
> Insert will be the main action in the database.
> 
> There are some case studies with detail information about performance 
> and hardware solution on this database size?
> What are the minimum hardware requirements for this kind of database?
> 

This is a good place to start:
http://www.postgresql.org/about/users

I would expect that the case studies for databases greater than 7TB are
few and far between (for any database software). If you decide
PostgreSQL is right, I'm sure the advocacy mailing list would like to
see your case study when you are finished.

Your hardware requirements mostly depend on how you're going to use the
data. If you expect that most of the data will never be read, and that
the database will be more of an archive, the requirements might be quite
reasonable. However, if or when you do need to search through that data,
expect it to take a long time.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Performance problem with Sarge compared with Woody

2006-09-11 Thread Jeff Davis
On Mon, 2006-09-11 at 20:14 +0200, Piñeiro wrote:
> Hi,
> 
> a week ago we migrate a Woody(postgre 7.2.1) server to Sarge(postgre
> 7.4.7). To migrate the database we use a dump, using pg_dump with this
> options:
> pg_dump -U  -c -F p -O -v -f  
> 
> We have a search, that using woody take about 1-2 minutes, but with
> sarge it is executing about 2 hours, and at least it crashes, with a
> message about a temporal file and no more disk space ( i have more than
> a GB of free disk space).
> 

It sounds to me like it's choosing a bad sort plan and is unable to
write the temporary disk files it needs.

A likely cause is that you did not "vacuum analyze" after you loaded the
data. Try running that command and see if it helps. If not, can you
provide the output of "explain" and "explain analyze" on both the old
database and the new?

Also, I suggest that you upgrade to 8.1. 7.4 is quite old, and many
improvements have been made since then.

Regards,
Jeff Davis





---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Performance With Joins on Large Tables

2006-09-13 Thread Jeff Davis
On Wed, 2006-09-13 at 08:49 -0600, Joshua Marsh wrote:
> That seems to have done it.  Are there any side effects to this
> change?  I read about random_page_cost in the documentation and it
> seems like this is strictly for planning.  All the tables on this
> database will be indexed and of a size similar to these two, so I
> don't see it  causing any other problems.  Though I would check though
> :)
> 

Right, it's just used for planning. Avoid setting it too low; if it's
below about 2.0 you would most likely see some very strange plans.
It certainly doesn't make sense to set it below 1.0, since that would be
saying it's cheaper to get a random page than a sequential one.

What was your original random_page_cost, and what is the new value you
set it to?

Regards,
Jeff Davis




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Performance With Joins on Large Tables

2006-09-13 Thread Jeff Davis
On Wed, 2006-09-13 at 10:19 -0600, Joshua Marsh wrote:
> > Right, it's just used for planning. Avoid setting it too low, if it's
> > below about 2.0 you would most likely see some very strange plans.
> > Certainly it doesn't make sense at all to set it below 1.0, since that
> > is saying it's cheaper to get a random page than a sequential one.
> >
> > What was your original random_page_cost, and what is the new value you
> > set it to?
> >
> > Regards,
> >Jeff Davis
> >
> >
> >
> >
> 
> I tried it at several levels.  It was initially at 4 (the default).  I
> tried 3 and 2 with no changes.  When I set it to 1, it used and index
> on view_505 but no r3s169:
> 
> data=# EXPLAIN SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
> v.custdate), EXTRACT (YEAR FROM s.custdate) || '-' || EXTRACT (MONTH
> FROM s.custdate) FROM view_505 v INNER JOIN r3s169 s ON v.dsiacctno =
> s.dsiacctno;
>  QUERY PLAN
> 
>  Merge Join  (cost=154730044.01..278318711.49 rows=285230272 width=11)
>Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
>->  Index Scan using view_505_dsiacctno on view_505 v
> (cost=0.00..111923570.63 rows=112393848 width=20)
>->  Sort  (cost=154730044.01..155443119.69 rows=285230272 width=17)
>  Sort Key: s.dsiacctno
>  ->  Seq Scan on r3s169 s  (cost=1.00..106873675.72
> rows=285230272 width=17)
> 
> 
> Setting to 0.1 finally gave me the result I was looking for. I know
> that the index scan is faster though.  The seq scan never finished (i
> killed it after 24+ hours) and I'm running the query now with indexes
> and it's progressing nicely (will probably take 4 hours).

Hmm... that sounds bad. I'm sure your system will always choose indexes
with that value.

Is it overestimating the cost of using indexes or underestimating the
cost of a seq scan, or both? Maybe an EXPLAIN with the 0.1 setting will
help?

Regards,
Jeff Davis




---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Vacuums on large busy databases

2006-09-14 Thread Jeff Davis
On Thu, 2006-09-14 at 11:23 -0400, Francisco Reyes wrote:
> My setup:
> Freebsd 6.1
> Postgresql 8.1.4
> Memory: 8GB
> SATA Disks 
> 
> Raid 1 10 spindles (2 as hot spares)
> 500GB disks (16MB buffer), 7200 rpm
> Raid 10
> 
> Raid 2 4 spindles
> 150GB 10K rpm disks
> Raid 10
> 
> shared_buffers = 1

Why so low? You have a lot of memory, and shared_buffers is an
important performance setting. I have a machine with 4GB of RAM, and I
found my best performance was around 150000 shared buffers, which is a
little more than 1GB.

The default value of 1000 was chosen so that people who use PostgreSQL
only incidentally among many other programs do not notice an impact on
their system. It should be drastically increased when using PostgreSQL
on a dedicated system, particularly with versions 8.1 and later.

Also, a VACUUM helps a table that gets UPDATEs and DELETEs. If you're
doing mostly inserts on a big table, there may be no need to VACUUM it 3
times per day. VACUUM the tables that get frequent UPDATEs and DELETEs
more often, and VACUUM tables with few UPDATEs/DELETEs only
occasionally. You can run ANALYZE more frequently on all the
tables, because it does not have to read the entire table and doesn't
interfere with the rest of the operations.
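
For example, a schedule along these lines (the table names are made up,
of course) is usually a better use of I/O than three full-database
VACUUMs per day:

VACUUM ANALYZE small_summary_table;   -- gets many UPDATEs; run often
VACUUM ANALYZE busy_updated_table;    -- a few times per day
ANALYZE;                              -- all tables; cheap, run frequently
VACUUM;                               -- whole database; only occasionally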

Regards,
Jeff Davis


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Vacuums on large busy databases

2006-09-14 Thread Jeff Davis
On Thu, 2006-09-14 at 19:30 -0400, Francisco Reyes wrote:
> Will have to talk to the developers. In particular for every insert there 
> are updates. I know they have at least one table that gets udpated to have 
> summarized totals.
> 

If the table being updated is small, you have no problems at all. VACUUM
that table frequently, and the big tables rarely. If the big tables only
get INSERTs and SELECTs, the only reason to VACUUM them is to avoid xid
wraparound. See:

<http://www.postgresql.org/docs/8.1/static/maintenance.html>

See which tables need VACUUM, and how often. Use the statistics to see
if VACUUMing will gain you anything before you do it.

> One of the reasons I was doing the vacuumdb of the entire DB was to get the 
> number of shared-buffers. Now that I have an idea of how much I need I will 
> likely do something along the lines of what you suggest. One full for 
> everything at night and during the days perhaps do the tables that get more 
> updated. I also set more aggresive values on autovacuum so that should help 
> some too.

Why VACUUM FULL? That is generally not needed. Re-evaluate whether
you're gaining things with all these VACUUMs.

> > You can run ANALYZE more frequently on all the
> > tables, because it does not have to read the entire table and doesn't
> > interfere with the rest of the operations.
> 
> On a related question. Right now I have my autovacuums set as:
> autovacuum_vacuum_threshold = 5   
> autovacuum_analyze_threshold = 10
> autovacuum_vacuum_scale_factor = 0.05
> autovacuum_analyze_scale_factor = 0.1
> 
> Based on what you described above then I could set my analyze values to the 
> same as the vacuum to have something like
> autovacuum_vacuum_threshold = 5
> autovacuum_analyze_threshold = 5
> autovacuum_vacuum_scale_factor = 0.05
> autovacuum_analyze_scale_factor = 0.05
> 
> For DBs with hundreds of GBs would it be better to get 
> autovacuum_analyze_scale_factor to even 0.01? The permanent DB is over 200GB 
> and growing.. the 100GB ones are staging.. By the time we have finished 
> migrating all the data from the old system it will be at least 300GB. 0.01 
> is still 3GB.. pretty sizable.

Just test how long an ANALYZE takes, and compare that to how quickly
your statistics get out of date. As long as postgres is choosing correct
plans, you are ANALYZE-ing often enough.

ANALYZE takes statistical samples to avoid reading the whole table, so
it's really not a major influence on performance in my experience.

> Do the thresholds tabke presedence over the scale factors? Is it basically 
> if either one of them gets hit that the action will take place?

u = number of tuples UPDATE-ed or DELETE-ed (i.e. dead tuples)
r = the (estimated) number of total live tuples in the relation

In a loop, autovacuum checks to see if u >
(r*autovacuum_vacuum_scale_factor + autovacuum_vacuum_threshold), and if
so, it runs VACUUM. If not, it sleeps. It works the same way for
ANALYZE.

So, in a large table, the scale_factor is the dominant term. In a small
table, the threshold is the dominant term. But both are taken into
account.
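
To make that concrete with made-up numbers: with
autovacuum_vacuum_scale_factor = 0.05 and autovacuum_vacuum_threshold =
1000, a table with 1,000,000 live tuples is VACUUMed once more than

1,000,000 * 0.05 + 1,000 = 51,000

tuples are dead, while a 100-row table is VACUUMed after about 1,005 dead
tuples -- so the scale factor dominates for the big table and the
threshold dominates for the small one.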

Regards,
Jeff Davis


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Vacuums on large busy databases

2006-09-14 Thread Jeff Davis
On Thu, 2006-09-14 at 20:04 -0400, Francisco Reyes wrote:
> Michael Stone writes:
> 
> > On Thu, Sep 14, 2006 at 04:30:46PM -0400, Francisco Reyes wrote:
> >>Right now adding up from ps the memory I have about 2GB.
> > 
> > That's not how you find out how much memory you have. Try "free" or 
> > somesuch.
> 
> Wasn't trying to get an accurate value, just a ballpark figure.
> 
> When you say "free" are you refering to the free value from top? or some 
> program called free?
> 

Any long-running system will have very little "free" memory. Free memory
is wasted memory, so the OS finds some use for it.

The VM subsystem of an OS uses many tricks, including the sharing of
memory among processes and the disk buffer cache (which is shared also).
It's hard to put a number on the memory demands of a given process, and
it's also hard to put a number on the ability of a system to accommodate
a new process with new memory demands.

You have 8GB total, which sounds like plenty to me. Keep in mind that if
you have the shared memory all allocated in physical memory (i.e.
"kern.ipc.shm_use_phys: 1" on FreeBSD), then that amount of physical
memory will never be available to processes other than postgres. At 2GB,
that still leaves 6GB for other processes, so you should be fine.

Regards,
Jeff Davis



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Vacuums on large busy databases

2006-09-14 Thread Jeff Davis
On Thu, 2006-09-14 at 20:07 -0400, Dave Cramer wrote:
> On 14-Sep-06, at 7:50 PM, Francisco Reyes wrote:
> 
> > Dave Cramer writes:
> >
> >> personally, I'd set this to about 6G. This doesn't actually  
> >> consume  memory it is just a setting to tell postgresql how much  
> >> memory is  being used for cache and kernel buffers
> >
> > Gotcha. Will increase further.
> >
> >> regarding shared buffers I'd make this much bigger, like 2GB or more
> >
> > Will do 2GB on the weekend. From what I read this requires shared  
> > memory so have to restart my machine (FreeBSD).
> >
> > if I plan to give shared buffers 2GB, how much more over that  
> > should I give the total shared memory kern.ipc.shmmax? 2.5GB?
> 
> I generally make it slightly bigger. is shmmax the size of the  
> maximum chunk allowed or the total ?

That's the total on FreeBSD, per process. I think to allow more than 2GB
there you may need a special compile option in the kernel.

> > Also will shared buffers impact inserts/updates at all?
> > I wish the postgresql.org site docs would mention what will be  
> > impacted.
> Yes, it will, however not as dramatically as what you are seeing with  
> effective_cache
> >
> > Comments like: This setting must be at least 16, as well as at  
> > least twice the value of max_connections; however, settings  
> > significantly higher than the minimum are usually needed for good  
> > performance.
> >
> > Are usefull, but could use some improvement.. increase on what? All  
> > performance? inserts? updates? selects?
> >
> > For instance, increasing effective_cache_size has made a noticeable  
> > difference in selects. However as I talk to the developers we are  
> > still doing marginally in the inserts. About 150/min.
> The reason is that with effective_cache the select plans changed (for  
> the better) ; it's unlikely that the insert plans will change.

There aren't multiple INSERT plans (however, there could be a subselect
or something, which would be planned separately). INSERT is INSERT. That
means effective_cache_size will have zero effect on INSERT.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Vacuums on large busy databases

2006-09-14 Thread Jeff Davis
On Thu, 2006-09-14 at 19:50 -0400, Francisco Reyes wrote:
> > regarding shared buffers I'd make this much bigger, like 2GB or more
> 
> Will do 2GB on the weekend. From what I read this requires shared memory so 
> have to restart my machine (FreeBSD).
> 

You should be able to do:
# sysctl -w kern.ipc.shmmax=2147483647

> if I plan to give shared buffers 2GB, how much more over that should I give 
> the total shared memory kern.ipc.shmmax? 2.5GB?
> 

To get it higher than 2GB, you may need to recompile the kernel, but you
should be able to get 2GB without a restart.

> Also will shared buffers impact inserts/updates at all?
> I wish the postgresql.org site docs would mention what will be impacted.
> 

They will not have a real impact on INSERTs, because an INSERT still has
to be logged in the WAL before commit. Technically, it may make a
difference, but I would not expect much.

shared_buffers has a big impact on UPDATEs, because an UPDATE needs to
find the record to UPDATE first. An UPDATE is basically a DELETE and an
INSERT in one transaction.

> Comments like: This setting must be at least 16, as well as at least twice 
> the value of max_connections; however, settings significantly higher than 
> the minimum are usually needed for good performance.
> 
> Are usefull, but could use some improvement.. increase on what? All 
> performance? inserts? updates? selects?

More shared_buffers means fewer reads from disk. If you have 10MB worth
of tables, having 100MB worth of shared buffers is useless because they
will be mostly empty. However, if you have 100MB of shared buffers and
you access records randomly from a 100 petabyte database, increasing
shared_buffers to 200MB doesn't help much, because the chances that the
record you need is in a shared buffer already are almost zero.

Shared buffers are a cache, pure and simple. When you have "locality of
reference", caches are helpful. Sometimes that's temporal locality (if
you are likely to access data that you recently accessed), and sometimes
that's spatial locality (if you access block 10, you're likely to access
block 11). If you have "locality of reference" -- and almost every
database does -- shared_buffers help.
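
A rough way to see whether the cache is earning its keep (assuming you
have block-level stats enabled, i.e. stats_block_level = on) is something
like:

SELECT sum(heap_blks_hit)  AS from_buffers,
       sum(heap_blks_read) AS from_disk
FROM pg_statio_user_tables;

If the hit count dwarfs the read count, adding even more shared buffers
probably won't buy you much.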

> For instance, increasing effective_cache_size has made a noticeable 
> difference in selects. However as I talk to the developers we are still 
> doing marginally in the inserts. About 150/min.

effective_cache_size affects only the plan generated. INSERTs aren't
planned because, well, it's an INSERT and there's only one thing to do
and only one way to do it.

> There is spare CPU cycles, both raid cards are doing considerably less they 
> can do.. so next I am going to try and research what parameters I need to 
> bump to increase inserts. Today I increased checkpoint_segments from the 
> default to 64. Now looking at wall_buffers.

You won't see any amazing increases from those. You can improve INSERTs
a lot if you have a battery-backed cache on your RAID card and set it to
WriteBack mode (make sure to disable the on-disk write caches, though;
those aren't battery backed and you could lose data). If you do this, you
should be able to do thousands of inserts per second.

Another thing to look at is "commit_delay". If you are trying to commit
many INSERTs at once, normally they will be fsync()d individually, which
is slow. However, by adding a commit delay, postgres can batch a few
inserts into one fsync() call, which can help a lot.
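
For example, in postgresql.conf (these values are only a starting point
to experiment with, not a recommendation):

commit_delay = 100    # microseconds to wait before flushing the WAL
commit_siblings = 5   # only delay if at least this many other transactions are open

Whether it helps depends entirely on how many concurrent committers you
have.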

> It would be most helpfull to have something on the docs to specify what each 
> setting affects most such as reads, writes, updates, inserts, etc..

I agree that they could be improved. It gets complicated quickly though,
and it's hard to generalize the effect that a performance setting will
have. They are all very interdependent.

Regards,
Jeff Davis



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Vacuums on large busy databases

2006-09-14 Thread Jeff Davis
On Thu, 2006-09-14 at 21:04 -0400, Michael Stone wrote:
> On Thu, Sep 14, 2006 at 05:52:02PM -0700, Jeff Davis wrote:
> >Any long-running system will have very little "free" memory. Free memory
> >is wasted memory, so the OS finds some use for it.
> 
> The important part of the output of "free" in this context isn't how 
> much is free, it's how much is cache vs how much is allocated to 
> programs. Other os's have other ways of telling the same thing. Neither 
> of those numbers generally has much to do with how much shows up in ps 
> when large amounts of shared memory are in use.

Right, ps doesn't give you much help. But he didn't tell us about the
process. If a process is using all the buffer cache, and you take away
that memory, it could turn all the reads that previously came from the
buffer cache into disk reads, leading to major slowdown and interference
with the database.

Conversely, if you have a large program running, it may not use much of
its own memory, and perhaps some rarely-accessed pages could be paged
out in favor of more buffer cache. So even if all your memory is taken
with resident programs, your computer may easily accommodate more
processes by paging out rarely-used process memory.

If he knows a little more about the process, then he can make a better
determination. But I don't think it will be much of a problem with 8GB
of physical memory.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Pipelined functions in Postgres

2006-09-19 Thread Jeff Davis
On Tue, 2006-09-19 at 23:22 +0200, Milen Kulev wrote:
> Hello Shoaib,
> I know the SETOF funcitons. I want to  simulate (somehow)
> producer/consumer relationship with SETOF(pipelined) functions.  The
> first  (producer )function generates records (just like your test_pipe
> function), and the second function consumers the records , produced by
> the first function. The second function can be rows/records producer
> for another consumer functions e.g. it should looks like(or similar)
> select * from consumer_function(  producer_function(param1,
> param2, ...));
>  
> What I want to achieve is to impelement some ETL logic
> in consumer_functions (they could be chained, of course).
> The main idea is to read source  DWH tables once (in
>  producer_function, for example), and to process the rowsets 
> in the consumer functions. I want to avoid writing to intermediate
> tables while performing ETL processing .
> Is this possible with SETOF functions ? 
>  

Functions cannot take a relation as a parameter.

Why not create a single function that does what you need it to do? You
can write such a function in the language of your choice, including C,
perl, PL/pgSQL, among others. That gives you a lot of power to do what
you need to do in a single pass, without passing the results on to other
functions.
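
Just as a sketch of what I mean (the type, table, and logic here are
entirely made up):

CREATE TYPE etl_row AS (cust_id integer, total numeric);

CREATE FUNCTION etl_transform() RETURNS SETOF etl_row AS $$
DECLARE
    r etl_row;
BEGIN
    -- read the source table once...
    FOR r IN SELECT cust_id, sum(amount) FROM dwh_source GROUP BY cust_id LOOP
        -- ...apply whatever per-row ETL logic you need here...
        RETURN NEXT r;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;

SELECT * FROM etl_transform();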

If you provide an example of what you need to be able to do maybe
someone on this list knows a way to do it with one function call.

Also, I'll point out that what you want to do is very similar to using
typical relational constructs. Consider whether sub-selects or
aggregates in conjunction with set-returning functions can achieve what
you want. PostgreSQL is smart enough to only read the big table once if
possible.

Regards,
Jeff Davis





---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] PostgreSQL and sql-bench

2006-09-21 Thread Jeff Davis
On Thu, 2006-09-21 at 07:52 -0700, yoav x wrote:
> Hi
> 
> After upgrading DBI and DBD::Pg, this benchmark still picks MySQL as the 
> winner (at least on Linux
> RH3 on a Dell 1875 server with 2 hyperthreaded 3.6GHz CPUs and 4GB RAM).
> I've applied the following parameters to postgres.conf:
> 
> max_connections = 500
> shared_buffers = 3000

That's a low setting. 3000*8192 = 24MB. This should probably be closer
to 25% of total memory, or 1GB, or 131072 shared buffers (however, that's
just a rule of thumb; there may be a better setting).

> work_mem = 10
> effective_cache_size = 30

That is a very high setting. effective_cache_size is measured in disk
pages, so if you want 3GB the correct setting is 393216.
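
In other words, on a dedicated box with 4GB of RAM, a more reasonable
starting point would be something like:

shared_buffers = 131072         # roughly 1GB
effective_cache_size = 393216   # roughly 3GB, in 8kB pages

and then adjust from there based on testing.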

> 
> Most queries still perform slower than with MySQL. 
> Is there anything else that can be tweaked or is this a limitation of PG or 
> the benchmark?
> 

As others have pointed out, sql-bench may not be a realistic benchmark.
The best way to examine performance is against real work.

Also, consider that relational databases were not developed to increase
performance. Things like filesystems are inherently "faster" because
they do less. However, relational databases make it easier to develop
systems shared by many applications, and also make it easier to build a
well-performing application. If the benchmark isn't testing anything
that a filesystem can't do, then either:
(a) Your application could probably make better use of a relational
database; or
(b) The benchmark doesn't represent your application's needs.

Regards,
Jeff Davis




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Hints proposal

2006-10-12 Thread Jeff Davis
On Thu, 2006-10-12 at 10:14 -0500, Jim C. Nasby wrote:
> The syntax these hints is something arbitrary. I'm borrowing Oracle's
> idea of embedding hints in comments, but we can use some other method if
> desired. Right now I'm more concerned with getting the general idea
> across.
> 

Is there any advantage to having the hints in the queries? To me that's
asking for trouble with no benefit at all. It would seem to me to be
better to have a system catalog that defined hints as something like:

"If user A executes a query matching regex R, then coerce (or force) the
planner in this way."

I'm not suggesting that we do that, but it seems better then embedding
the hints in the queries themselves.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Hints proposal

2006-10-12 Thread Jeff Davis
On Thu, 2006-10-12 at 19:15 +0200, Csaba Nagy wrote:
> > I'm not suggesting that we do that, but it seems better then embedding
> > the hints in the queries themselves.
> 
> OK, what about this: if I execute the same query from a web client, I
> want the not-so-optimal-but-safe plan, if I execute it asynchronously, I
> let the planner choose the
> best-overall-performance-but-sometimes-may-be-slow plan ?
> 

Connect as a different user to control whether the hint matches or not.
If this doesn't work for you, read below.

> What kind of statistics/table level hinting will get you this ?
> 

It's based not just on the table, but on environment as well, such as
the user/role.

> I would say only query level hinting will buy you query level control.
> And that's perfectly good in some situations.

My particular proposal allows arbitrary regexes on the raw query. You
could add a comment with a "query id" in it. 

My proposal has these advantages over query comments:
(1) Most people's needs would be solved by just matching the query
form. 
(2) If the DBA really wanted to separate out queries individually (not
based on the query form), he could do it, but it would have an extra
step that might encourage him to reconsider the necessity
(3) If someone went to all that work to shoot themselves in the foot
with unmanageable hints that are way too specific, the postgres
developers are unlikely to be blamed
(4) No backwards-compatibility issues that I can see, aside from people
making their own hints unmanageable. If someone started getting bad
plans, they could just remove all the hints from the system catalogs and
it would be just as if they had never used hints. If they added ugly
comments to their queries it wouldn't really have a bad effect.

To formalize the proposal a litte, you could have syntax like:

CREATE HINT [FOR USER username] MATCHES regex APPLY HINT some_hint;

Where "some_hint" would be a hinting language perhaps like Jim's, except
not guaranteed to be compatible between versions of PostgreSQL. The
developers could change the hinting language at every release and people
can just re-write the hints without changing their application.
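
Purely to illustrate the idea (none of this syntax exists, and the hint
text itself is hand-waved):

CREATE HINT FOR USER webapp
    MATCHES 'SELECT .* FROM orders .* JOIN customers .*'
    APPLY HINT 'disable nested loops';

The point is only that the hint lives in a system catalog, keyed on the
user and the query form, rather than in the application's SQL.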

> I really can't see why a query-level hinting mechanism is so evil, why
> it couldn't be kept forever, and augmented with the possibility of
> correlation hinting, or table level hinting. 

Well, I wouldn't say "evil". Query hints are certainly against the
principles of a relational database, which separate the logical query
from the physical storage.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Hints proposal

2006-10-12 Thread Jeff Davis
On Thu, 2006-10-12 at 14:34 -0500, Jim C. Nasby wrote:
> On Thu, Oct 12, 2006 at 09:42:55AM -0700, Jeff Davis wrote:
> > On Thu, 2006-10-12 at 10:14 -0500, Jim C. Nasby wrote:
> > > The syntax these hints is something arbitrary. I'm borrowing Oracle's
> > > idea of embedding hints in comments, but we can use some other method if
> > > desired. Right now I'm more concerned with getting the general idea
> > > across.
> > > 
> > 
> > Is there any advantage to having the hints in the queries? To me that's
> > asking for trouble with no benefit at all. It would seem to me to be
> > better to have a system catalog that defined hints as something like:
> > 
> > "If user A executes a query matching regex R, then coerce (or force) the
> > planner in this way."
> > 
> > I'm not suggesting that we do that, but it seems better then embedding
> > the hints in the queries themselves.
> 
> My experience is that on the occasions when I want to beat the planner
> into submission, it's usually a pretty complex query that's the issue,
> and that it's unlikely to have more than a handful of them in the
> application. That makes me think a regex facility would just get in the
> way, but perhaps others have much more extensive need of hinting.
> 
> I also suspect that writing that regex could become a real bear.
> 

Well, the regex is just the matching criterion for applying the hint. If
you really need a quick fix, you can just write a comment with a query
id number in the query. The benefit there is that when the hint is
obsolete later (as the planner improves, or data changes
characteristics) you drop the hint and the query is planned without
interference. No application changes required.

Also, and perhaps more importantly, let's say you are trying to improve
the performance of an existing application where it's impractical to
change the query text (24/7 app, closed source, etc.). You can still
apply a hint if you're willing to write the regex. Just enable query
logging or some such to capture the query, and copy it verbatim except
for a few parameters which are unknown. Instant regex. If you have to
change the query text to apply the hint, it would be impossible in this
case.

> Having said that... I see no reason why it couldn't work... but the real
> challenge is defining the hints.

Right. The only thing I was trying to solve was the problem of the hint
itself being embedded in the client code. I view that as a problem that
doesn't need to exist.

I'll leave it to smarter people to either improve the planner or develop
a hinting language. I don't even need hints myself, just offering a
suggestion.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Hints proposal

2006-10-12 Thread Jeff Davis
On Thu, 2006-10-12 at 17:28 -0400, Tom Lane wrote:
> [ trying once again to push this thread over to -hackers where it belongs ]
> 
> Arjen van der Meijden <[EMAIL PROTECTED]> writes:
> > On 12-10-2006 21:07 Jeff Davis wrote:
> >> On Thu, 2006-10-12 at 19:15 +0200, Csaba Nagy wrote:
> >> To formalize the proposal a litte, you could have syntax like:
> >> CREATE HINT [FOR USER username] MATCHES regex APPLY HINT some_hint;
> >> 
> >> Where "some_hint" would be a hinting language perhaps like Jim's, except
> >> not guaranteed to be compatible between versions of PostgreSQL. The
> >> developers could change the hinting language at every release and people
> >> can just re-write the hints without changing their application.
> 
> Do you have any idea how much push-back there would be to that?  In
> practice we'd be bound by backwards-compatibility concerns for the hints
> too.
> 

No, I don't have any idea, except that it would be less push-back than
changing a language that's embedded in client code. Also, I see no
reason to think that a hint would not be obsolete upon a new release
anyway.

> The problems that you are seeing all come from the insistence that a
> hint should be textually associated with a query.  Using a regex is a
> little better than putting it right into the query, but the only thing

"Little better" is all I was going for. I was just making the
observation that we can separate two concepts:
(1) Embedding code in the client's queries, which I see as very
undesirable and unnecessary
(2) Providing very specific hints

which at least gives us a place to talk about the debate more
reasonably.

> that really fixes is not having the hints directly embedded into
> client-side code.  It's still wrong at the conceptual level.
> 

I won't disagree with that. I will just say it's no more wrong than
applying the same concept in addition to embedding the hints in client
queries.

> The right way to think about it is to ask why is the planner not picking
> the right plan to start with --- is it missing a statistical
> correlation, or are its cost parameters wrong for a specific case, or
> is it perhaps unable to generate the desired plan at all?  (If the
> latter, no amount of hinting is going to help.)  If it's a statistics or
> costing problem, I think the right thing is to try to fix it with hints
> at that level.  You're much more likely to fix the behavior across a
> class of queries than you will be with a hint textually matched to a
> specific query.
> 

Agreed.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Postgresql 8.1.4 - performance issues for select on

2006-10-18 Thread Jeff Davis
On Wed, 2006-10-18 at 15:51 -0400, Ioana Danes wrote:
> Hi everyone,
> Testing some selects I know we have in the application
> I got into a scenario where my plan does not work
> without doing code change. This scenario is:
> 
> select max(transid) from alltransaction;
> 
> because the planner does not use the existent indexes
> on the 2 new tables: public.transaction and
> archive.transaction
> 

First, the query is expanded into something like (I'm being inexact
here):

SELECT max(transid) FROM (SELECT * FROM public.transaction UNION SELECT
* FROM archive.transaction);

PostgreSQL added a hack to the max() aggregate so that, in the simple
case, it can recognize that what it really wants to do is use the index.
Using the index for an aggregate only works in special cases, like min()
and max(). What PostgreSQL actually does is to transform a query from:

SELECT max(value) FROM sometable;

Into:

SELECT value FROM sometable ORDER BY value DESC LIMIT 1;

In your case, it would need to transform the query into something more
like:

SELECT max(transid) FROM (
  SELECT transid FROM (
SELECT transid FROM public.transaction ORDER BY transid DESC
  LIMIT 1
  ) t1 
  UNION 
  SELECT transid FROM (
SELECT transid FROM archive.transaction ORDER BY transid DESC
  LIMIT 1
  ) t2 
) t;

The reason is that PostgreSQL (apparently) isn't smart enough to do a
merge sort on the two indexes to sort the result of the UNION. At least,
I can't get PostgreSQL to sort over two UNIONed tables using an index;
perhaps I'm missing it.

Regards,
Jeff Davis



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Postgresql 8.1.4 - performance issues for select on

2006-10-18 Thread Jeff Davis
On Wed, 2006-10-18 at 23:19 +0200, Dimitri Fontaine wrote:
> Le mercredi 18 octobre 2006 23:02, Ioana Danes a écrit :
> > I tried the partitioning scenario but I've got into
> > the same problem. The max function is not using the
> > indexes on the two partitioned tables...
> >
> > Any other thoughts?
> 
> Did you make sure your test included table inheritance?
> I'm not sure the planner benefits from constraint_exclusion without selecting 
> the empty parent table (instead of your own union based view).
> 

Constraint exclusion and inheritance won't help him.

The problem is that he has two indexes, and he needs to find the max
between both of them. PostgreSQL isn't smart enough to recognize that it
can use two indexes, find the max in each one, and find the max of those
two values.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Postgresql 8.1.4 - performance issues for select on

2006-10-18 Thread Jeff Davis
On Wed, 2006-10-18 at 17:10 -0500, Jim C. Nasby wrote:
> Sorry, don't have the earlier part of this thread, but what about...
> 
> SELECT greatest(max(a), max(b)) ...
> 
> ?

To fill you in, we're trying to get the max of a union (a view across
two physical tables).

It can be done if you're creative with the query; I suggested a query
that selected the max of the max()es of the individual tables. Your
query could work too. However, the trick would be getting postgresql to
recognize that it can transform "SELECT max(x) FROM foo" into that,
where foo is a view of a union.
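
Spelled out against the two underlying tables, that would be something
like (untested):

SELECT greatest(
    (SELECT max(transid) FROM public.transaction),
    (SELECT max(transid) FROM archive.transaction)
);

Each scalar subquery can then use the min/max index trick on its own
table.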

If PostgreSQL could sort the result of a union by merging the results of
two index scans, I think the problem would be solved. Is there something
preventing this, or is it just something that needs to be added to the
planner?

Regards,
Jeff Davis


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Postgresql 8.1.4 - performance issues for select on

2006-10-18 Thread Jeff Davis
On Wed, 2006-10-18 at 17:35 -0500, Jim C. Nasby wrote:
> On Wed, Oct 18, 2006 at 03:32:15PM -0700, Jeff Davis wrote:
> > On Wed, 2006-10-18 at 17:10 -0500, Jim C. Nasby wrote:
> > > Sorry, don't have the earlier part of this thread, but what about...
> > > 
> > > SELECT greatest(max(a), max(b)) ...
> > > 
> > > ?
> > 
> > To fill you in, we're trying to get the max of a union (a view across
> > two physical tables).
> 
> UNION or UNION ALL? You definitely don't want to do a plain UNION if you
> can possibly avoid it.

Oops, of course he must be doing UNION ALL, but for some reason I ran my
test queries with plain UNION (thanks for reminding me). However, it
didn't make a difference; see below.

> > It can be done if you're creative with the query; I suggested a query
> > that selected the max of the max()es of the individual tables. Your
> > query could work too. However, the trick would be getting postgresql to
> > recognize that it can transform "SELECT max(x) FROM foo" into that,
> > where foo is a view of a union.
> > 
> > If PostgreSQL could sort the result of a union by merging the results of
> > two index scans, I think the problem would be solved. Is there something
> > preventing this, or is it just something that needs to be added to the
> > planner?
> 
> Hrm... it'd be worth trying the old ORDER BY ... LIMIT 1 trick just to
> see if that worked in this case, but I don't have much hope for that.

Yeah, that's the solution. Here's the problem:

=> set enable_seqscan = false;
SET
=> EXPLAIN SELECT i FROM (SELECT i FROM t10 UNION ALL SELECT i FROM t11)
t ORDER BY i DESC;
 QUERY PLAN

 Sort  (cost=200026772.96..200027272.96 rows=20 width=4)
   Sort Key: t.i
   ->  Append  (cost=1.00..24882.00 rows=20 width=4)
 ->  Seq Scan on t10  (cost=1.00..11441.00
rows=10 width=4)
 ->  Seq Scan on t11  (cost=1.00..11441.00
rows=10 width=4)
(5 rows)

=> EXPLAIN SELECT i FROM (SELECT i FROM t10) t ORDER BY i DESC;
 QUERY PLAN

 Index Scan Backward using t10_idx on t10  (cost=0.00..1762.00
rows=10 width=4)
(1 row)

=> EXPLAIN SELECT i FROM (SELECT i FROM t11) t ORDER BY i DESC;
 QUERY PLAN

 Index Scan Backward using t11_idx on t11  (cost=0.00..1762.00
rows=10 width=4)
(1 row)

=>

But if PostgreSQL could just merge the index scan results, it could
"ORDER BY i" the result of a UNION ALL without a problem. But it can't
do that, so the syntactical trick introduced for min/max won't work in
his case :(

He'll probably have to change his application to make that query perform
decently if the tables are split.

Ideas?

Regards,
Jeff Davis


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Defining performance.

2006-11-30 Thread Jeff Davis
On Fri, 2006-12-01 at 01:05 +0100, Tobias Brox wrote:
> > However, we still are suffering a gradual decrease in performance over
> > time - or so the application engineers claim. The DBA and I have been
> > banging our heads against this for a month.
> 
> We're having the same issues, so we do the dumping and restoring every
> now and then to be sure everything is properly cleaned up.  With 8.1.
> 

What's causing that? Is it index bloat?

I would think a REINDEX would avoid having to dump/restore, right? A
CLUSTER might also be necessary, depending on what kind of performance
degradation you're experiencing.

Am I missing something?

Regards,
Jeff Davis


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Configuration settings for 32GB RAM server

2006-12-04 Thread Jeff Davis
On Mon, 2006-12-04 at 12:10 -0500, Mark Lonsdale wrote:

> - 4 physical CPUs (hyperthreaded to 8)
> 
> - 32 GB RAM
> 
> - x86_64 architecture
> 
> - RedHat AS 4
> 
> - postgres 8.1.5
> 
>  
> 
> Ive been taking a look at the various postgres tuning parameters, and
> have come up with the following settings.   
> 
>  
> 
> shared_buffers – 50,000 - From what I'd read, increasing this
> number higher than this won't have any advantages?
> 

Where did you read that? You should do some tests. Generally 25% of
physical memory on a dedicated box is a good point of reference (on 8.1,
anyway). I've heard as high as 50% can give you a benefit, but I haven't
seen that myself.


> fsm_pages = 200,000 – Based this on some statistics about the number
> of pages freed from a vacuum on older server.   Not sure if its fair
> to calculate this based on vacuum stats of 7.3.4 server?
> 

Might as well make it a higher number because you have a lot of RAM
anyway. It's better than running out of space in the FSM, because to
increase that setting you need to restart the daemon. Increasing this by
1 only uses 6 bytes. That means you could set it to 10 times the number
you currently have, and it would still be insignificant.
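
To put numbers on it: 200,000 pages * 6 bytes is only about 1.2MB of
shared memory, and even 2,000,000 pages would be about 12MB -- nothing
next to 32GB of RAM.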

Regards,
Jeff Davis


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-02 Thread Jeff Davis
On Fri, 2006-12-29 at 07:52 -0500, Ron wrote:
> A= go through each query and see what work_mem needs to be for that 
> query to be as RAM resident as possible.  If you have enough RAM, set 
> work_mem for that query that large.  Remember that work_mem is =per 
> query=, so queries running in parallel eat the sum of each of their 
> work_mem's.

Just to clarify, from the docs on work_mem at
http://www.postgresql.org/docs/current/static/runtime-config-resource.html :

"Specifies the amount of memory to be used by internal sort operations
and hash tables before switching to temporary disk files. The value is
specified in kilobytes, and defaults to 1024 kilobytes (1 MB). Note that
for a complex query, several sort or hash operations might be running in
parallel; each one will be allowed to use as much memory as this value
specifies before it starts to put data into temporary files. Also,
several running sessions could be doing such operations concurrently. So
the total memory used could be many times the value of work_mem; it is
necessary to keep this fact in mind when choosing the value. Sort
operations are used for ORDER BY, DISTINCT, and merge joins. Hash tables
are used in hash joins, hash-based aggregation, and hash-based
processing of IN subqueries."
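
In other words (made-up numbers): with work_mem set to 64MB, a query
whose plan contains three sorts can use about 192MB, and ten such queries
running at once can approach 2GB. If only a few specific queries need big
sorts, it is often safer to raise it per session:

SET work_mem = 65536;   -- in kB, so 64MB, for this session only
-- run the big query here
RESET work_mem;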

Regards,
Jeff Davis


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] How to debug performance problems

2007-02-19 Thread Jeff Davis
On Mon, 2007-02-19 at 11:50 +0100, Andreas Tille wrote:
> Hi,
> 
> I'm running a web application using Zope that obtains all data
> from a PostgreSQL 7.4 database (Debian Sarge system with package
> 7.4.7-6sarge4 on an "older" Sparc machine, equipped with 2GB

Upgrade to 8.2.3 if possible, or at least to 7.4.16.

This is a basic question, but do you VACUUM ANALYZE regularly? 7.4 is
before autovacuum was integrated in the core. If you don't do this you
could have a lot of wasted space in your tables causing unneeded I/O,
and the planner might be making bad plans.

> memory and two processors E250 server).  Once I did some performance
> tuning and found out that
> 
>   max_connections = 256
>   shared_buffers = 131072
>   sort_mem = 65536
> 

You're allocating 50% of the physical memory to shared buffers. That's
not necessarily too much, but that's on the high side of the normal
range. 

Does the total size of all of your tables and indexes add up to enough
to exhaust your physical memory? Check to see if you have any
exceptionally large tables or indexes. You can do that easily with
pg_relation_size('a_table_or_index') and pg_total_relation_size
('a_table').
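
For example (the table and index names are placeholders, and these
functions assume a reasonably recent version, 8.1 or later):

  SELECT pg_size_pretty(pg_total_relation_size('some_big_table'));
  SELECT pg_size_pretty(pg_relation_size('some_big_index'));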

> Since about two weeks the application became *drastically* slower
> and I urgently have to bring back the old performance.  As I said
> I'm talking about functions accessing tables that did not increased
> over several years and should behave more or less the same.
> 
> I wonder whether adding tables and functions could have an influence
> on other untouched parts and how to find out what makes the things
> slow that worked for years reliable and satisfying.  My first try

You need to provide queries, and also define "slower". Set
log_min_duration_statement to some positive value (I often start with
1000) to try to catch the slow statements in the logs. Once you have
found the slow statements, do an EXPLAIN and an EXPLAIN ANALYZE on those
statements. That will tell you exactly what you need to know.
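
A minimal sketch in postgresql.conf (the threshold is in milliseconds, so
adjust to taste):

  log_min_duration_statement = 1000   # log statements taking over 1 second
                                      # 0 logs everything, -1 disables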

Regards,
Jeff Davis



---(end of broadcast)---
TIP 6: explain analyze is your friend


[PERFORM] long checkpoint_timeout

2007-02-23 Thread Jeff Davis
The postgresql.conf says that the maximum checkpoint_timeout is 1 hour.
However, the following messages seem to suggest that it may be useful to
set the value significantly higher to reduce unnecessary WAL volume:

http://archives.postgresql.org/pgsql-hackers/2006-10/msg00527.php
http://archives.postgresql.org/pgsql-hackers/2006-08/msg01190.php

Is there a reason for the hour-long limit on checkpoint_timeout? Is
there a cost to doing so, aside from potentially longer recovery time?

As I understand it, the background writer keeps the I/O more balanced
anyway, avoiding I/O spikes at checkpoint. 

I don't need checkpoint_timeout to be higher than 1 hour, but I'm
trying to understand the reasoning behind the limit and the implications
of a longer checkpoint_timeout.

The docs here:

http://www.postgresql.org/docs/current/static/wal-configuration.html

say that checkpoints cause extra disk I/O. Is there a good way to
measure how much extra I/O (and WAL volume) is caused by the
checkpoints? Also, it would be good to know how much total I/O is caused
by a checkpoint so that I know if the bgwriter is doing its job.

Regards,
    Jeff Davis


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] long checkpoint_timeout

2007-02-23 Thread Jeff Davis
On Fri, 2007-02-23 at 14:02 -0600, Jim C. Nasby wrote:
> > say that checkpoints cause extra disk I/O. Is there a good way to
> > measure how much extra I/O (and WAL volume) is caused by the
> > checkpoints? Also, it would be good to know how much total I/O is caused
> > by a checkpoint so that I know if bgwriter is doing it's job.
> 
> There's a patch someone just came up with that provides additional debug
> info about both bgwriter operation and checkpoints. I know it will at
> least tell you how much was written out by a checkpoint.

Excellent, that would answer a lot of my questions. I did some brief
searching and nothing turned up. Do you have a link to the discussion or
the patch?

Regards,
Jeff Davis


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Writting a "search engine" for a pgsql DB

2007-02-26 Thread Jeff Davis
On Mon, 2007-02-26 at 11:29 -0500, Madison Kelly wrote:
>I am looking at writing a search engine of sorts for my database. I 
> have only ever written very simple search engines before which amounted 
> to not much more that the query string being used with ILIKE on a pile 
> of columns. This was pretty rudimentary and didn't offer anything like 
> relevance sorting and such (I'd sort by result name, age or whatnot).

Look at Tsearch2:

http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/

It has a lot of features for searching, and can make use of powerful
indexes to return search results very quickly. As someone already
mentioned, it also has ranking features.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Two hard drives --- what to do with them?

2007-02-26 Thread Jeff Davis
On Sun, 2007-02-25 at 23:11 +0100, Peter Kovacs wrote:
> A related question:
> Is it sufficient to disable write cache only on the disk where pg_xlog
> is located? Or should write cache be disabled on both disks?
> 

When PostgreSQL does a checkpoint, it thinks the data pages before the
checkpoint have successfully made it to disk. 

If the write cache holds those data pages, and then loses them, there's
no way for PostgreSQL to recover. So use a battery backed cache or turn
off the write cache.

Regards,
    Jeff Davis


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Two hard drives --- what to do with them?

2007-02-26 Thread Jeff Davis
On Tue, 2007-02-27 at 01:11 +0100, Peter Kovacs wrote:
> On 2/26/07, Jeff Davis <[EMAIL PROTECTED]> wrote:
> > On Sun, 2007-02-25 at 23:11 +0100, Peter Kovacs wrote:
> > > A related question:
> > > Is it sufficient to disable write cache only on the disk where pg_xlog
> > > is located? Or should write cache be disabled on both disks?
> > >
> >
> > When PostgreSQL does a checkpoint, it thinks the data pages before the
> > checkpoint have successfully made it to disk.
> >
> > If the write cache holds those data pages, and then loses them, there's
> > no way for PostgreSQL to recover. So use a battery backed cache or turn
> > off the write cache.
> 
> Sorry for for not being familar with storage techonologies... Does
> "battery" here mean battery in the common sense of the word - some
> kind of independent power supply? Shouldn't the disk itself be backed
> by a battery? As should the entire storage subsystem?
> 

Yes, a battery that can hold power to keep data alive in the write cache
in case of power failure, etc., for a long enough time to recover and
commit the data to disk.

So, a write cache is OK (even for pg_xlog) if it is durable (i.e. on
permanent storage or backed by enough power to make sure it gets there).
However, if PostgreSQL has no way to know whether a write is durable or
not, it can't guarantee the data is safe.

The reason this becomes an issue is that many consumer-grade disks have
write cache enabled by default and no way to make sure the cached data
actually gets written. So, essentially, these disks "lie" and say they
wrote the data, when in reality, it's in volatile memory. It's
recommended that you disable write cache on such a device.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Two hard drives --- what to do with them?

2007-02-27 Thread Jeff Davis
On Tue, 2007-02-27 at 09:27 +0100, Peter Kovacs wrote:
> I wonder if running a machine on a UPS + 1 hot standby internal PS is
> equivalent, in terms of data integrity, to using battery backed write
> cache. Instinctively, I'd think that UPS + 1 hot standby internal PS
> is better, since this setup also provides for the disk to actually
> write out the content of the cache -- as you pointed out.

It's all about the degree of safety. A battery-backed cache on a RAID
controller sits below all of these points of failure:

* External power
* Power supply
* Operating system

and with proper system administration, can recover from any transient
errors in the above. Keep in mind that it can only recover from
transient failures: if you have a long blackout that outlasts your UPS
and cache battery, you can still have data loss. Also, you need a very
responsive system administrator that can make sure that data gets to
disk in case of failure.

Let's say you have a RAID system but you rely on the UPS to make sure
the data hits disk. Well, now if you have an OS crash (caused by another
piece of hardware failing, perhaps), you've lost your data.

If you can afford it (in terms of dollars or performance hit) go with
the safe solution.

Also, put things in context. The chances of failure due to these kinds
of things are fairly low. If it's more likely that someone spills coffee
on your server than the UPS fails, it doesn't make sense to spend huge
amounts of money on NVRAM (or something) to store your data. So identify
the highest-risk scenarios and prevent those first.

Also keep in mind what the cost of failure is: a few hundred bucks more
on a better RAID controller is probably a good value if it prevents a
day of chaos and unhappy customers.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Array indexes, GIN?

2007-03-02 Thread Jeff Davis
On Thu, 2007-03-01 at 19:59 -0800, Adam L Beberg wrote:
> On the surface, looks like a job for GIN, but GIN seems undocumented, 
> specifically mentions it doesn't support the deletes we'll have many of 
> since it's designed for word searching apparently, the performance 

It can delete an entry for one of the keys of an index, it just can't
delete the key itself when the number of entries goes down to zero.
Because you only have O(100K) possible keys, that shouldn't be a
problem. The GIN indexes can reclaim space. If they couldn't, they
wouldn't be nearly as useful. 

The time when you run into problems is when you have a huge, sparsely
populated keyspace, with a huge number of keys contained by no tuples in
the table.

However, for your application, GIN still might not be the right answer.
GIN can only return the tuples that do contain some matching keys; it
won't return the number of matching keys in each tuple (that's not the
job of an index). 

Let's run some numbers:

 * percentage of tuples returned = 100K rows out of the 10M = 1%
 * tuples per page = 8192 bytes / (32 (tuple header) + 8 (bigint) + 80
(10 bigints)) = ~70. Let's say it's 50 due to some free space.

Based on those numbers, the GIN index is basically going to say "get
every other page". PostgreSQL will optimize that into a sequential scan
because it makes no sense to do a random fetch for every other page.

So, the fastest way you can do this (that I can see) is just fetch every
tuple and count the number of matches in each array. You know your data
better than I do, so replace those numbers with real ones and see if it
still makes sense.

The basic rule is that an index scan is useful only if it reduces the
number of disk pages you need to fetch enough to make up for the extra
cost of random I/O.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Please humor me ...

2007-04-09 Thread Jeff Davis
On Mon, 2007-04-09 at 16:05 -0400, Carlos Moreno wrote:
> And by the subject, I mean:  please provide a "factual" answer, as opposed
> to the more or less obvious answer which would be "no one in their sane
> mind would even consider doing such thing"  :-)
> 
> 1) Would it be possible to entirely disable WAL?  (something like setting a
> symlink so that pg_xlog points to /dev/null, perhaps?)

You can't disable WAL, but you can disable fsync.

> 2) What would be the real implications of doing that?
> 

A good chance that you lose your entire database cluster if the power
fails.

It's not just your tables that require WAL, it's also the system
catalogs. If you were to disable it, and a system catalog became
corrupt, the database would not know what to do.

There's always a chance you could recover some of that data by a manual
process (i.e. open up the database files in a hex editor and look for
your data), but there would be no guarantee.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Basic Q on superfluous primary keys

2007-04-18 Thread Jeff Davis
On Tue, 2007-04-17 at 21:06 -0700, Craig A. James wrote:
> Merlin Moncure wrote:
> > In the context of this debate, I see this argument all the time, with
> > the implied suffix: 'If only we used integer keys we would not have
> > had this problem...'.  Either the customer identifies parts with a
> > part number or they don't...and if they do identify parts with a
> > number and recycle the numbers, you have a problem...period.
> 
> On the contrary.  You create a new record with the same part number.  You 
> mark the old part number "obsolete".  Everything else (the part's 
> description, and all the relationships that it's in, such as order history, 
> catalog inclusion, revision history, etc.) is unaffected.  New orders are 
> placed against the new part number's DB record; for safety the old part 
> number can have a trigger that prevent new orders from being placed.
> 
> Since the part number is NOT the primary key, duplicate part numbers are not 
> a problem.  If you had instead used the part number as the primary key, you'd 
> be dead in the water.
> 
> You can argue that the customer is making a dumb decision by reusing catalog 
> numbers, and I'd agree.  But they do it, and as database designers we have to 
> handle it.  In my particular system, we aggregate information from several 
> hundred companies, and this exact scenario happens frequently.  Since we're 
> only aggregating information, we have no control over the data that these 
> companies provide.  If we'd used catalog numbers for primary keys, we'd have 
> big problems.
> 

Storing data is easy.

The difficulty lies in storing data in such a way that your assumptions
about the data remain valid and your queries still answer the questions
that you think they answer.

Because an internal ID field has no meaning outside of the database
(some auto-generated keys do have meaning outside the database, but I'm
not talking about those), you can't effectively query based on an
internal id any more than you can query by the ctid. So what do you
query by then? You query by natural keys anyway. Internal id fields are
an implementation detail related to performance (the real topic of this
discussion).

If you have two parts with the same part id, what does that mean? Sure,
you can store the data, but then the queries that assumed that data was
unique no longer hold. Sometimes you need two parts with the same part
id, but you have to know the meaning in order to query based on that
data.

Let me ask these questions:
 - Do you think that all of your relations have an internal id? 
 - Do you think that all the internal ids you use are unique in the
relations in which they appear?

If you answer "yes" to either question, consider that every query on
that data is also a relation and so are subselects and intermediate
results. Do those all have an id? If not, why not? How do you join a
virtual relation to a physical relation if the virtual relation has no
internal id? Is the id field still unique in the result of a join or
Cartesian product?

Regards,
Jeff Davis



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] index structure for 114-dimension vector

2007-04-20 Thread Jeff Davis
On Fri, 2007-04-20 at 12:07 -0700, Andrew Lazarus wrote:
> I have a table with 2.5 million real[] arrays. (They are points in a
> time series.) Given a new array X, I'd like to find, say, the 25
> closest to X in some sense--for simplification, let's just say in the
> usual vector norm. Speed is critical here, and everything I have tried
> has been too slow.
> 
> I imported the cube contrib package, and I tried creating an index on
> a cube of the last 6 elements, which are the most important. Then I

Have you tried just making normal indexes on each of the last 6 elements
to see if PostgreSQL will use a BitmapAnd to combine them? 
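
Something along these lines (the table and column names are made up, and
it assumes the interesting elements are at fixed positions):

  CREATE INDEX ts_el109_idx ON timeseries ((vec[109]));
  CREATE INDEX ts_el110_idx ON timeseries ((vec[110]));
  -- ...one index per element of interest, then check the plan:
  EXPLAIN SELECT * FROM timeseries
   WHERE vec[109] BETWEEN 0.5 AND 0.6
     AND vec[110] BETWEEN 0.1 AND 0.2;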

Regards,
Jeff Davis




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] pg_stat_* collection

2007-05-03 Thread Jeff Davis
On Thu, 2007-05-03 at 10:45 -0400, Greg Smith wrote:
> Today's survey is:  just what are *you* doing to collect up the 
> information about your system made available by the various pg_stat views? 
> I have this hacked together script that dumps them into a file, imports 
> them into another database, and then queries against some of the more 
> interesting data.  You would thing there would be an organized project 
> addressing this need around to keep everyone from reinventing that wheel, 
> but I'm not aware of one.
> 

Is anyone out there collecting their own statistics? What's the easiest
way to take statistical samples of the data in a table without reading
the entire thing?

Regards,
Jeff Davis


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Performance Woes

2007-05-09 Thread Jeff Davis
On Wed, 2007-05-09 at 17:29 -0700, Joshua D. Drake wrote:
> > 2007-05-09 03:07:50.083 GMT 1146975740: LOCATION:  BasicOpenFile,
> > fd.c:471
> > 
> > 2007-05-09 03:07:50.091 GMT 0: LOG:  0: duration: 12.362 ms
> > 
> > 2007-05-09 03:07:50.091 GMT 0: LOCATION:  exec_simple_query,
> > postgres.c:1090
> > 
> >  
> > 
> > So  we decreased the max_files_per_process to 800.  This took care
> > of the error **BUT** about quadrupled  the IO wait that is happening
> > on the machine. It went from a peek of about 50% to peeks of over
> > 200% (4 processor machines, 4 gigs ram, raid).  The load on the
> > machine remained constant.
> > 
> 
> Sounds to me like you just need to up the total amount of open files 
> allowed by the operating system.

It looks more like the opposite; here's the documentation for
max_files_per_process:

"Sets the maximum number of simultaneously open files allowed to each
server subprocess. The default is one thousand files. If the kernel is
enforcing a safe per-process limit, you don't need to worry about this
setting. But on some platforms (notably, most BSD systems), the kernel
will allow individual processes to open many more files than the system
can really support when a large number of processes all try to open that
many files. If you find yourself seeing "Too many open files" failures,
try reducing this setting. This parameter can only be set at server
start."

To me, that means that his machine is allowing the new FD to be created,
but then can't really support that many so it gives an error.

Ralph, how many connections do you have open at once? It seems like the
machine perhaps just can't handle that many FDs in all of those
processes at once.

That is a lot of tables. Maybe a different OS will handle it better?
Maybe there's some way that you can use fewer connections and then the
OS could still handle it?

Regards,
Jeff Davis


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] performance question on VACUUM FULL (Postgres 8.4.2)

2010-01-19 Thread Jeff Davis
On Tue, 2010-01-19 at 12:19 -0800, PG User 2010 wrote:
> Hello,
> 
> We are running into some performance issues with running VACUUM FULL
> on the pg_largeobject table in Postgres (8.4.2 under Linux), and I'm
> wondering if anybody here might be able to suggest anything to help
> address the issue.

Are you running VACUUM (without FULL) regularly? And if so, is that
insufficient?

> Our pg_largeobject table is about 200 gigabytes, and I suspect that
> about 30-40% of the table are dead rows (after having run vacuumlo and
> deleting large numbers of large objects).

You can always expect some degree of bloat. Can you give an exact number
before and after the VACUUM FULL? Or is this a one-shot attempt that
never finished?

If large objects are being added/removed regularly, it might be better
just to wait (and do regular VACUUMs), and the table will naturally
compact after the rows at the end are removed.

Regards,
Jeff Davis


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] ext4 finally doing the right thing

2010-01-19 Thread Jeff Davis
On Fri, 2010-01-15 at 22:05 -0500, Greg Smith wrote:
> A few months ago the worst of the bugs in the ext4 fsync code started 
> clearing up, with 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5f3481e9a80c240f169b36ea886e2325b9aeb745
>  
> as a particularly painful one.

Wow, thanks for the heads-up!

> On one side, we might finally be 
> able to use regular drives with their caches turned on safely, taking 
> advantage of the cache for other writes while doing the right thing with 
> the database writes.

That could be good news. What's your opinion on the practical
performance impact? If it doesn't need to be fsync'd, the kernel
probably shouldn't have written it to the disk yet anyway, right (I'm
assuming here that the OS buffer cache is much larger than the disk
write cache)?

Regards,
Jeff Davis


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Linux I/O tuning: CFQ vs. deadline

2010-02-16 Thread Jeff Davis
On Mon, 2010-02-08 at 09:49 -0800, Josh Berkus wrote:
> FWIW, back when deadline was first introduced Mark Wong did some tests
> and found Deadline to be the fastest of 4 on DBT2 ... but only by about
> 5%.  If the read vs. checkpoint analysis is correct, what was happening
> is the penalty for checkpoints on deadline was almost wiping out the
> advantage for reads, but not quite.

I also did some tests when I was putting together my Synchronized Scan
benchmarks:

http://j-davis.com/postgresql/83v82_scans.html

CFQ was so slow that I didn't include it in the results at all.

The tests weren't intended to compare schedulers, so I did most of the
tests with anticipatory (at least the ones on linux; I also tested
freebsd). However, I have some raw data from the tests I did run with
CFQ:

http://j-davis.com/postgresql/results/

They will take some interpretation (again, not intended as scheduler
benchmarks). The server was modified to record a log message every N
page accesses.

Regards,
Jeff Davis


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Is DBLINK transactional

2010-03-12 Thread Jeff Davis
On Fri, 2010-03-12 at 12:07 -0500, Merlin Moncure wrote:
> of course.  You can always explicitly open a transaction on the remote
> side over dblink, do work, and commit it at the last possible moment.
> Your transactions aren't perfectly synchronized...if you crash in the
> precise moment between committing the remote and the local you can get
> in trouble.  The chances of this are extremely remote though.

If you want a better guarantee than that, consider using 2PC.

The problem with things that are "extremely remote" possibilities are
that they tend to be less remote than we expect ;)
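
A rough sketch of the 2PC idea (the connection name, table and GID are
made up; it assumes the dblink connection is already open and that
max_prepared_transactions > 0 on the remote side):

  BEGIN;
  -- ... local work ...
  SELECT dblink_exec('myconn', 'BEGIN');
  SELECT dblink_exec('myconn', 'UPDATE remote_tbl SET val = 1 WHERE id = 42');
  SELECT dblink_exec('myconn', 'PREPARE TRANSACTION ''remote_tx_42''');
  COMMIT;  -- commit locally first
  SELECT dblink_exec('myconn', 'COMMIT PREPARED ''remote_tx_42''');

If the local commit fails, the prepared remote transaction can be rolled
back with ROLLBACK PREPARED; if you crash after the local commit, it
survives on the remote side and can still be committed later.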

Regards,
Jeff Davis


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

2010-08-09 Thread Jeff Davis
On Mon, 2010-08-09 at 09:49 -0700, Scott Carey wrote:
> Also, the amount of data at risk in a power loss varies between
> drives.  For Intel's drives, its a small chunk of data ( < 256K).  For
> some other drives, the cache can be over 30MB of outstanding writes.
> For some workloads this is acceptable -- not every application is
> doing financial transactions.   Not every part of the system needs to
> be on an SSD either -- the WAL, and various table spaces can all have
> different data integrity and performance requirements.

I don't think it makes sense to speak about the data integrity of a
drive in terms of the amount of data at risk, especially with a DBMS.
Depending on which 256K you lose, you might as well lose your entire
database.

That may be an exaggeration, but the point is that it's not as simple as
"this drive is only risking 256K data loss per outage".

Regards,
Jeff Davis



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] GPU Accelerated Sorting

2010-08-30 Thread Jeff Davis
On Mon, 2010-08-30 at 09:51 -0400, Eliot Gable wrote:
> Not sure if anyone else saw this, but it struck me as an interesting
> idea if it could be added to PostgreSQL. GPU accelerated database
> operations could be very... interesting. Of course, this could be
> difficult to do in a way that usefully increases performance of
> PostgreSQL, but I'll leave that up to you guys to figure out.
> 
> http://code.google.com/p/back40computing/wiki/RadixSorting
> 

Radix sort is not a comparison sort. Comparison sorts work for any data
type for which you define a total order; and any total order is allowed.
Radix sort only works for some data types and some total orders.

However, it would be very nice to use radix sorting where it does work.
That would require some extensions to the type system, but it could be
done.

The GPU issue is orthogonal.

Regards,
Jeff Davis


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Error message in wal_log for Streaming replication

2010-10-05 Thread Jeff Davis
On Tue, 2010-10-05 at 14:41 +0530, Nimesh Satam wrote:
> We have been trying to implement the streaming replication on one of
> our test servers. We have set archive_xlogs to on. But on the standby
> server,  when we start the database we get error the following errors:
> 
> LOG:  shutting down
> LOG:  database system is shut down
> LOG:  database system was shut down in recovery at 2010-10-05 05:05:27
> EDT
> LOG:  entering standby mode
> cp: cannot stat
> `/home/postgres/archive_xlogs/0001000D0036': No such file
> or directory
> LOG:  consistent recovery state reached at D/3678
> LOG:  database system is ready to accept read only connections
> LOG:  invalid record length at D/3678
> cp: cannot stat
> `/home/postgres/archive_xlogs/0001000D0036': No such file
> or directory
> LOG:  streaming replication successfully connected to primary
> 
> The file "0001000D0036" is seen in pg_xlog folder of the
> primary database, but is not yet pushed to the archive location.
> 
> Can you let us know what the error means and if we are doing anything
> wrong?

[ this question is more appropriate on pgsql-general ]

Those aren't postgres error messages, those are error messages generated
by "cp".

See:
http://www.postgresql.org/docs/9.0/static/archive-recovery-settings.html

"The command will be asked for file names that are not present in the
archive; it must return nonzero when so asked."

So, it is safe to ignore those errors.

Personally, I would use a restore_command that is silent when the file
doesn't exist so that it doesn't pollute your logs. I'm not sure why the
documentation suggests "cp".
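
Something like this is one way to keep it quiet (the path is just an
example; "test" returns nonzero when the file is missing, so the server
gets the failure it expects without anything being printed):

  restore_command = 'test -f /home/postgres/archive_xlogs/%f && cp /home/postgres/archive_xlogs/%f "%p"'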

Regards,
Jeff Davis


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] What is postmaster doing?

2010-10-20 Thread Jeff Davis
On Wed, 2010-10-20 at 14:44 -0400, Dimi Paun wrote:
> 23425 postgres  20   0 22008  10m  10m R 99.9  0.5  21:45.87 postmaster
> 
> I'd like to figure out what it is doing. How can I figure out what
> statement causes the problem? 
> 

It seems strange that the postmaster is eating 99% cpu. Is there a
chance that it's flooded with connection attempts?

Usually the work is done by backend processes, not the postmaster. The
postmaster just does some management like accepting connections and
starting new processes.

Regards,
Jeff Davis


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Jeff Davis
On Thu, 2011-02-03 at 16:50 -0500, Mladen Gogala wrote:
> Chris Browne wrote:
> > Well, the community declines to add hints until there is actual
> > consensus on a good way to add hints.
> >   
> OK. That's another matter entirely.   Who should make that decision? Is 
> there a committee or a person who would be capable of making that decision?

Admittedly I haven't read this whole discussion, but it seems like
"hints" might be too poorly defined right now.

If by "hints" we mean some mechanism to influence the planner in a more
fine-grained way, I could imagine that some proposal along those lines
might gain significant support.

But, as always, it depends on the content and quality of the proposal
more than the title. If someone has thoughtful proposal that tries to
balance things like:
* DBA control versus query changes/comments
* compatibility across versions versus finer plan control
* allowing the existing optimizer to optimize portions of the
  query while controlling other portions
* indicating costs and cardinalities versus plans directly

I am confident that such a proposal will gain traction among the
community as a whole.

However, a series of proposals for individual hacks for specific purposes
will probably be rejected. I am in no way implying that you are
approaching it this way -- I am just trying to characterize an approach
that won't make progress.

Regards,
Jeff Davis


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How PostgreSQL handles multiple DDBB instances?

2007-05-29 Thread Jeff Davis
On Fri, 2007-05-25 at 20:16 +0200, Arnau wrote:
>The point I'm worried is performance. Do you think the performance 
> would be better executing exactly the same queries only adding an extra 
> column to all the tables e.g. customer_id, than open a connection to the 
> only one customers DB and execute the query there?

Have you already considered using views with specific privileges to
separate your customers?
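
Roughly what I mean (the role, table and column names are invented for
the example):

  CREATE VIEW acme_orders AS
      SELECT * FROM orders WHERE customer_id = 42;

  REVOKE ALL ON orders FROM acme_role;
  GRANT SELECT ON acme_orders TO acme_role;

Each customer's role sees only its own view, while the schema and queries
stay shared.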

Regards,
    Jeff Davis


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Is this true?

2007-06-06 Thread Jeff Davis
On Wed, 2007-06-06 at 16:58 -0400, Chris Hoover wrote:
> Question,
> 
> Does (pg_stat_get_db_blocks_fetched(oid)-pg_stat_get_db_blocks_hit
> (oid)*8) = number of KB read from disk for the listed database since
> the last server startup?

That will give you the number of blocks requested from the OS. The OS
does its own caching, and so many of those reads might come from the OS
buffer cache, and not the disk itself.

Also, if you're concerned with the number since the last server restart,
make sure you have stats_reset_on_server_start set appropriately.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] Replication

2007-06-19 Thread Jeff Davis
On Thu, 2007-06-14 at 16:14 -0700, Craig James wrote:
> Looking for replication solutions, I find:
> 
> Slony-I
>  Seems good, single master only, master is a single point of failure,
>  no good failover system for electing a new master or having a failed
>  master rejoin the cluster.  Slave databases are mostly for safety or
>  for parallelizing queries for performance.  Suffers from O(N^2) 
>  communications (N = cluster size).
> 

There's MOVE SET which transfers the origin (master) from one node to
another without losing any committed transactions.

There's also FAILOVER, which can set a new origin even if the old origin
is completely gone, however you will lose the transactions that haven't
been replicated yet.

To have a new node join the cluster, you SUBSCRIBE SET, and you can MOVE
SET to it later if you want that to be the master.

Regards,
Jeff Davis



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] performance of postgresql in replication using slony

2007-07-26 Thread Jeff Davis
On Thu, 2007-07-26 at 01:44 -0700, angga erwina wrote:
> Hi all,
> whats the benefits of replication by using slony in
> postgresql??
> My office is separate in several difference place..its
> about hundreds branch office in the difference
> place..so any one can help me to replicate our dbase
> by using slony?? and why slony??
> 

This question should be asked on the slony1-general list, you'll get
more responses there.

The benefit of using slony is that you can read from many servers rather
than just one.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] random_page_costs - are defaults of 4.0 realistic for SCSI RAID 1

2007-09-10 Thread Jeff Davis
On Mon, 2007-09-10 at 22:44 +0100, Gregory Stark wrote:
> What I don't understand is the bit about "until Postgres gets AIO + the
> ability to post multiple concurrent IOs on index probes". Even with AIO your
> seek times are not going to be improved by wide raid stripes. And you can't
> possibly find the page at level n+1 before you've looked at the page at level
> n. Do you mean to be able to probe multiple index keys simultaneously? How
> does that work out?
> 

I think he's referring to mirrors, in which there are multiple spindles
that can return a requested block. That could mitigate random I/O, if
the I/O is asynchronous and something intelligent (OS or controller) can
schedule it.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Seqscan

2007-10-22 Thread Jeff Davis
On Mon, 2007-10-22 at 19:24 -0700, Adrian Demaestri wrote:
> Hi, 
> I think planner should use other plans than seqscan to solve querys
> like select * from hugetable limit 1, especially when the talbe is
> very large. Is it solved in newer versions or is there some open
> issues about it?. 
> thanks
> I'm working with postgres 8.0.1, 

For the query in question, what would be faster than a seqscan? It
doesn't read the whole table, it only reads until it satisfies the limit
clause. 

Regards,
Jeff Davis


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Speed difference between select ... union select ... and select from partitioned_table

2007-10-26 Thread Jeff Davis
On Fri, 2007-10-26 at 16:37 -0400, Pablo Alcaraz wrote:
> Hi List!
> 
> I executed 2 equivalents queries. The first one uses a union structure. 
> The second uses a partitioned table. The tables are the same with 30 
> millions of rows each one and the returned rows are the same.
> 
> But the union query perform faster than the partitioned query.
> 

I think you mean to use UNION ALL here. UNION forces a DISTINCT, which
results in a sort operation. What surprises me is that the UNION is
actually faster than the partitioning using inheritance.

I suspect it has something to do with the GROUP BYs, but we won't know
until you post EXPLAIN ANALYZE results.

    Regards,
Jeff Davis


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Clustered/covering indexes (or lack thereof :-)

2007-11-16 Thread Jeff Davis
On Sun, 2007-11-11 at 22:59 -0800, adrobj wrote:
> This is probably a FAQ, but I can't find a good answer...
> 
> So - are there common techniques to compensate for the lack of
> clustered/covering indexes in PostgreSQL? To be more specific - here is my
> table (simplified):
> 
> topic_id int
> post_id int
> post_text varchar(1024)
> 
> The most used query is: SELECT post_id, post_text FROM Posts WHERE
> topic_id=XXX. Normally I would have created a clustered index on topic_id,
> and the whole query would take ~1 disk seek.
> 
> What would be the common way to handle this in PostgreSQL, provided that I
> can't afford 1 disk seek per record returned?
> 

Periodically CLUSTER the table on the topic_id index. The table will not
be perfectly clustered at all times, but it will be close enough that it
won't make much difference.

There's still the hit of performing a CLUSTER, however.
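
For example (the index name is made up, and this is the pre-8.3 syntax;
newer releases spell it CLUSTER posts USING posts_topic_id_idx):

  CREATE INDEX posts_topic_id_idx ON posts (topic_id);
  CLUSTER posts_topic_id_idx ON posts;
  ANALYZE posts;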

Another option, if you have a relatively small number of topic_ids, is
to break it into separate tables, one for each topic_id.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] explanation for seeks in VACUUM (8.2.4)

2007-12-14 Thread Jeff Davis
On Fri, 2007-12-14 at 11:29 -0800, Jeff Davis wrote:
> "bigtable" has about 60M records, about 2M of which are dead at the time
> of VACUUM. Shared_buffers are about 1GB, and the machine has 4GB of
> memory.

Forgot to mention: version 8.2.4

Regards,
Jeff Davis


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[PERFORM] explanation for seeks in VACUUM

2007-12-14 Thread Jeff Davis
"bigtable" has about 60M records, about 2M of which are dead at the time
of VACUUM. Shared_buffers are about 1GB, and the machine has 4GB of
memory.

If I run a "SELECT COUNT(*) FROM bigtable", and I ktrace that (FreeBSD)
for 10 seconds, I see only a handful of lseek calls (33), which is no
surprise since I am asking for sequential I/O. I assume those lseeks are
just to skip over pages that already happen to be in shared_buffers.

However, If I have several indexes on that table, and I run a VACUUM, I
observe a lot of seeking. In a 10 second interval, I saw about 5000
lseek calls in the ktrace to the same file descriptor (which is an
index). That's about one every 2ms, so I'm sure a large portion of the
file must have been in the OS buffer cache.

I just don't quite understand what's causing the lseeks.

My understanding is that vacuum uses maintenance_work_mem to hold the
list of dead tuples. In my case that's 2M row versions, times about 6
bytes per entry (in the list of dead tuples) equals about 12MB, which is
much less than 128MB maintenance_work_mem. So it doesn't appear that
maintenance_work_mem is too small.

Even if maintenance_work_mem was the limiting factor, wouldn't the
VACUUM still be operating mostly sequentially, even if it takes multiple
passes?

The only seeking that it seems like VACUUM would need to do in an index
file is when an index page completely empties out, but that wouldn't
account for 5000 lseeks in 10 seconds, would it? 

Where am I going wrong? Are many of these lseeks no-ops or something?

Regards,
Jeff Davis



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] explanation for seeks in VACUUM

2007-12-14 Thread Jeff Davis
On Fri, 2007-12-14 at 19:04 -0500, Tom Lane wrote:
> Jeff Davis <[EMAIL PROTECTED]> writes:
> > Where am I going wrong? Are many of these lseeks no-ops or something?
> 
> They're not supposed to be, but if you only tracked seeks and not
> reads or writes, it's hard to be sure what's going on.

The seeks were comparable to reads and writes, which is what surprised
me:

$ ktrace -p 42440; sleep 5; ktrace -C
$ kdump -f ktrace.out | grep "GIO   fd 38 read" | wc -l
   11479
$ kdump -f ktrace.out | grep "GIO   fd 38 wrote" | wc -l
   11480
$ kdump -f ktrace.out | grep "lseek.0x26" | wc -l
   22960

> 8.2's VACUUM should process a btree index (this is a btree index no?)
> in physical order, so I'd expect lseeks only when a page is already in
> buffers --- at least on the read side.  On the write side things might
> be a great deal less predictable.  You're cleaning out about one tuple
> in 30, so the odds are that nearly every index page is getting dirtied,
> and they're going to need to be written sometime.

Actually, the table I was VACUUMing had both btree and GIN indexes, and
I'm not entirely sure which one was happening at the time. I will
investigate more.

My intuition tells me that, if the pages are being read and dirtied
sequentially, it would be able to write mostly sequentially (at least in
theory).

In the box that was causing the problem (which had more constrained
memory than my reproduced case), it seemed to be swamped by random I/O
-- low CPU usage, and low disk usage (about 1-2 MB/s write), yet VACUUM
would take forever and the box would appear very sluggish.

Regards,
Jeff Davis




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] 8.3 synchronous_commit

2008-01-21 Thread Jeff Davis
On Mon, 2008-01-21 at 22:55 +0100, Hannes Dorbath wrote:
> I might completely misunderstand this feature. Shouldn't 
> "synchronous_commit = off" improve performance?
> 
> Whatever I do, I find "synchronous_commit = off" to degrade performance. 
> 
> Especially it doesn't like the CFQ I/O scheduler, it's not so bad with 
> deadline. Synthetic load like
> 

The CFQ scheduler is bad for performance in the tests that I have run.
When I have a chance I'll put together some tests to try to demonstrate
that.

The reason it may be bad in your case is if you have many backends
commit many transactions asynchronously, and then the WAL writer tries
to make those transactions durable, CFQ might think that the WAL writer
is "unfairly" using a lot of I/O. This is just speculation though.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] 8.3 synchronous_commit

2008-01-21 Thread Jeff Davis
On Mon, 2008-01-21 at 18:31 -0500, Greg Smith wrote:
> pgbench doesn't handle 100 clients at once very well on the same box as 
> the server, unless you have a pretty serious system.  The pgbench program 
> itself has a single process model that doesn't handle the CFQ round-robin 
> very well at all.  On top of that, the database scale should be bigger 

He was referring to the CFQ I/O scheduler. I don't think that will
affect pgbench itself, because it doesn't read/write to disk, right?

Regards,
Jeff Davis


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] 7 hrs for a pg_restore?

2008-02-19 Thread Jeff Davis
On Tue, 2008-02-19 at 13:03 -0500, Douglas J Hunley wrote:
> I spent a whopping seven hours restoring a database late Fri nite for a 
> client. We stopped the application, ran pg_dump -v -Ft -b -o $db > 
> ~/pre_8.3.tar on the 8.2.x db, and then upgrading the software to 8.3. I then 
> did a pg_restore -v -d $db ./pre_8.3.tar and watched it positively crawl.
> I'll grant you that it's a 5.1G tar file, but 7 hours seems excessive. 
> 

Are there lots of indexes on localized text attributes? If you have a
big table with localized text (e.g. en_US.UTF-8), it can take a long
time to build the indexes. If the file is 5GB compressed, I wouldn't be
surprised if it took a long time to restore.

Keep in mind, if you have several GB worth of indexes, they take up
basically no space in the logical dump (just the "CREATE INDEX" command,
and that's it). But they can take a lot of processor time to build up
again, especially with localized text.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] 7 hrs for a pg_restore?

2008-02-19 Thread Jeff Davis
On Tue, 2008-02-19 at 14:20 -0500, Douglas J Hunley wrote:
> > Keep in mind, if you have several GB worth of indexes, they take up
> > basically no space in the logical dump (just the "CREATE INDEX" command,
> > and that's it). But they can take a lot of processor time to build up
> > again, especially with localized text.
> >
> 
> that could be a factor here. It is a UNICODE db, and we do a lot of 
> text-based 
> indexing for the application

I assume you're _not_ talking about full text indexes here.

These factors:
* unicode (i.e. non-C locale)
* low I/O utilization
* indexes taking up most of the 7 hours

mean that we've probably found the problem.

Localized text uses sorting rules that are not the same as binary sort
order, so the comparisons take much more CPU power, and sorts are
already processor-intensive operations to begin with.

Unfortunately postgresql does not parallelize this sorting/indexing at
all, so you're only using one core.

I'd recommend restoring everything except the indexes, and then you can
restore the indexes concurrently in several different sessions so that
it uses all of your cores. Build your primary key/unique indexes first,
and then after those are built you can start using the database while
the rest of the indexes are building (use "CREATE INDEX CONCURRENTLY"). 
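
A sketch of one way to split that up (untested, and the grep filter is
only illustrative):

  pg_restore -l pre_8.3.tar > full.toc
  grep -v ' INDEX ' full.toc > no_indexes.toc
  pg_restore -d $db -L no_indexes.toc pre_8.3.tar
  # then run the remaining CREATE INDEX statements yourself,
  # split across several parallel psql sessions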

Regards,
Jeff Davis


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] 7 hrs for a pg_restore?

2008-02-19 Thread Jeff Davis
On Tue, 2008-02-19 at 14:28 -0500, Dave Cramer wrote:
> shared buffers is *way* too small as is effective cache
> set them to 2G/6G respectively.

They are way too small, but I don't think that explains the index
creation time.

Effective_cache_size is only used by the planner, and this problem is
not caused by a poorly chosen plan.

It's important to set shared_buffers higher as well, but he has so much
RAM compared with his dataset that he's certainly not going to disk. I
don't think this explains it either. 

I think it's just the result of building a lot of indexes on localized
text using only one core at a time.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] 7 hrs for a pg_restore?

2008-02-20 Thread Jeff Davis
On Wed, 2008-02-20 at 14:31 +0530, Pavan Deolasee wrote:
> I think it would be interesting if we can build these indexes in parallel.
> Each index build requires a seq scan on the table. If the table does
> not fit in shared buffers, each index build would most likely result
> in lots of IO.

He's already said that his I/O usage was not the problem. For one thing,
he has 8GB of memory for a 5GB dataset.

Even when the table is much larger than memory, what percentage of the
time is spent on the table scan? A table scan is O(N), whereas an index
build is O(N log N). If you combine that with expensive comparisons, e.g.
for localized text, then I would guess that the index building itself
was much more expensive than the scans themselves.

However, building indexes in parallel would allow better CPU
utilization.

> One option would be to add this facility to the backend so that multiple
> indexes can be built with a single seq scan of the table. In theory, it
> should be possible, but might be tricky given the way index build works
> (it calls respective ambuild method to build the index which internally
> does the seq scan).

I don't think that this would be necessary, because (as you say below)
the synchronized scan facility should already handle this.

> Other option is to make pg_restore multi-threaded/processed. The
> synchronized_scans facility would then synchronize the multiple heap
> scans. ISTM that if we can make pg_restore mult-processed, then
> we can possibly add more parallelism to the restore process.

I like this approach more. I think that pg_restore is the right place to
do this, if we can make the options reasonably simple enough to use.

See:

http://archives.postgresql.org/pgsql-hackers/2008-02/msg00699.php

Regards,
Jeff Davis


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] 7 hrs for a pg_restore?

2008-02-20 Thread Jeff Davis
On Wed, 2008-02-20 at 18:18 +, Matthew wrote:
> On Wed, 20 Feb 2008, Jeff Davis wrote:
> > However, building indexes in parallel would allow better CPU
> > utilization.
> 
> We have a process here that dumps a large quantity of data into an empty 
> database, much like pg_restore, and then creates all the indexes at the 
> end. In order to speed up that bit, I initially made it spawn off several 
> threads, and make each thread run a CREATE INDEX operation in parallel. 
> However, this resulted in random errors from Postgres - something to do 
> with locked tables. So I changed it so that no two threads create indexes 
> for the same table at once, and that solved it.

What was the specific problem? Were they UNIQUE indexes? Were you trying
to write to the tables while indexing? Did you use "CONCURRENTLY"?

Regards,
Jeff Davis


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] disabling an index without deleting it?

2008-02-27 Thread Jeff Davis
On Tue, 2008-02-26 at 17:22 -0500, Tom Lane wrote:
> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
> > "Scott Marlowe" <[EMAIL PROTECTED]> wrote:
> >> begin;
> >> drop index abc_dx;
> >> select 
> >> rollback;
> >> 
> >> and viola, your index is still there.  note that there are likely some
> >> locking issues with this, so be careful with it in production.  But on
> >> a test box it's a very easy way to test various indexes.
> 
> > Wouldn't you also bloat the index?
> 
> No, what makes you think that?  The index won't change at all in the
> above example.  The major problem is, as Scott says, that DROP INDEX
> takes exclusive lock on the table so any other sessions will be locked
> out of it for the duration of your test query.

It may cause catalog bloat though, right?

Regards,
Jeff Davis


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] Understanding histograms

2008-04-30 Thread Jeff Davis
On Wed, 2008-04-30 at 10:43 -0400, Tom Lane wrote:
> > Instead I would expect an estimate of "rows=0" for values of const
> > that are not in the MCV list and not in the histogram.
> 
> Surely that's not very sane?  The MCV list plus histogram generally
> don't include every value in the table.  IIRC the estimate for values
> not present in the MCV list is (1 - sum(MCV frequencies)) divided by
> (n_distinct - number of MCV entries), which amounts to assuming that
> all values not present in the MCV list occur equally often.  The weak
> spot of course is that the n_distinct estimate may be pretty inaccurate.

My understanding of Len's question is that, although the MCV list plus
the histogram don't include every distinct value in the general case,
they do include every value in the specific case where the histogram is
not full.

Essentially, this seems like using the histogram to extend the MCV list
such that, together, they represent all distinct values. This idea only
seems to help when the number of distinct values is greater than the
max size of MCVs, but less than the max size of MCVs plus histogram
bounds.

I'm not sure how much of a gain this is, because right now that could
be accomplished by increasing the statistics for that column (and
therefore all of your distinct values would fit in the MCV list). Also
the statistics aren't guaranteed to be perfectly up-to-date, so an
estimate of zero might be risky.

Regards,
Jeff Davis


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The shared buffers challenge

2011-05-27 Thread Jeff Davis
On Thu, 2011-05-26 at 09:31 -0500, Merlin Moncure wrote:
> Where they are most helpful is for masking of i/o if
> a page gets dirtied >1 times before it's written out to the heap

Another possible benefit of higher shared_buffers is that it may reduce
WAL flushes. A page cannot be evicted from shared_buffers until the WAL
has been flushed up to the page's LSN ("WAL before data"); so if there
is memory pressure to evict dirty buffers, it may cause extra WAL
flushes.

I'm not sure what the practical effects of this are, however, but it
might be an interesting thing to test.

Regards,
Jeff Davis


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Strange behavior of child table.

2011-06-01 Thread Jeff Davis
On Tue, 2011-05-31 at 10:20 +0300, Jenish wrote:
> Hi All,  
> 
> I have created partition on table Round_Action , which has 3 child
> partition tables.
> 
> 
> When I am firing a simple select query with limit on parent table it
> is taking huge time to execute. But when I am firing this query
> directly on child table it is taking few milliseconds.
> 
> 
> EXP.
> select * from Round_Action where action_id =5 limit 100 →
> execution time 80 sec
> select * from Round_Action_CH1 action_id =5 limit 100 → execution
> time 0.1 sec
> 
> Round_Action is the parent table and has no record in the tables, all
> the records are lying in child tables.

Run EXPLAIN ANALYZE on each of those queries, and post the results.

See http://wiki.postgresql.org/wiki/SlowQueryQuestions for a guide on
how to give the necessary information for others to help.

Regards,
Jeff Davis


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] execution time for first INSERT

2011-07-08 Thread Jeff Davis
On Fri, 2011-07-08 at 04:23 -0700, Sergio Mayoral wrote:
> this must be something with the parser stage and since i am doing
> every time the same queries, I would like to know if there is a way to
> cache these queries in order to speed up the first INSERT.

I doubt it's the parser.

Seeing as it's around a couple ms at minimum, it's probably some kind of
IO latency. You could see that by wrapping the statements in a big
transaction (BEGIN/END block) -- I bet the inserts go very quickly and
the final commit takes longer.
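
A quick way to test that theory (the table and values are placeholders):

  BEGIN;
  INSERT INTO mytable (id, val) VALUES (1, 'a');
  INSERT INTO mytable (id, val) VALUES (2, 'b');
  -- ... many more inserts ...
  COMMIT;  -- the synchronous WAL flush happens once, here

If the per-INSERT time drops to well under a millisecond inside the
transaction, it's commit latency, not parsing.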

Regards,
Jeff Davis


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] IN or EXISTS

2011-09-22 Thread Jeff Davis
On Wed, 2011-08-31 at 09:33 +0800, Craig Ringer wrote:
> On the other hand, the `IN' subquery is uncorrelated needs only run 
> once, where the `EXISTS' subquery is correlated and has to run once for 
> every outer record.

If the EXISTS looks semantically similar to an IN (aside from NULL
semantics), then it can be made into a semijoin. It doesn't require
re-executing any part of the plan.

I don't think there are any cases where [NOT] IN is an improvement, am I
mistaken?

> Another complication is the possible presence of NULL in an IN list. 
> Getting NULLs in `IN' lists is a common source of questions on this 
> list, because people are quite surprised by how it works. EXISTS avoids 
> the NULL handling issue (and in the process demonstrates how woefully 
> inconsistent SQL's handling of NULL really is).

Absolutely. The NULL behavior of IN is what makes it hard to optimize,
and therefore you should use EXISTS instead if the semantics are
suitable.

> Theoretically the query planner could transform:
> 
> SELECT * from y WHERE y.id IN (SELECT DISTINCT z.y_id FROM z WHERE 
> z.y_id IS NOT NULL);
> 
> into:
> 
> SELECT * FROM y WHERE EXISTS (SELECT 1 FROM z WHERE z.y_id = y.id)
> 
> ... or vice versa depending on which it thought would be faster.

Although those two queries are semantically the same (I think), a lot of
very similar pairs of queries are not equivalent. For instance, if it
was a NOT IN you couldn't change that to a NOT EXISTS.
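
To make that concrete with the y/z tables above (a sketch): if z.y_id
can be NULL, these two queries are not equivalent, because NOT IN
returns no rows at all once the subquery produces a NULL:

    -- returns nothing if any z.y_id is NULL
    SELECT * FROM y WHERE y.id NOT IN (SELECT z.y_id FROM z);

    -- anti-join semantics; NULLs in z.y_id simply never match
    SELECT * FROM y WHERE NOT EXISTS
      (SELECT 1 FROM z WHERE z.y_id = y.id);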

Regards,
Jeff Davis


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] delete/recreate indexes

2011-10-19 Thread Jeff Davis
On Wed, 2011-10-19 at 08:03 -0700, alan wrote:
> So I thought I’d just run this once (via cron) every morning.
> BEGIN;
> DROP INDEX data_unique;
> UPDATE data SET datum = (data.datum + interval '24 hours');
> CREATE UNIQUE INDEX data_unique ON public.data USING BTREE
> (device, group, datum);
> COMMIT;
> 
> But
> 1.it’s taking forever and
> 2.I’m seeing that my disk is filling up real fast.

An unrestricted update will end up rewriting the whole table. It's
advisable to run VACUUM afterward, so that the wasted space can be
reclaimed. What version are you on? Do you have autovacuum enabled?

Also, to take a step back, why are you trying to keep the timestamps
changing like that? Why not store the information you need in the record
(e.g. the insert time as well as the datum) and then compute the result
you need using a SELECT (or make it a view for convenience)?
Fundamentally, these records aren't changing; you are just trying to
interpret them in the context of the current day. That should be done
with a SELECT, not an UPDATE.
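
As a sketch of that approach (the "inserted_on" date column is
hypothetical, and I'm guessing the other column names from your
snippet):

    CREATE VIEW data_current AS
    SELECT device, "group",
           datum + (current_date - inserted_on) * interval '1 day'
             AS datum
      FROM data;

Then the daily UPDATE (and the index drop/rebuild) goes away entirely.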

Regards,
Jeff Davis



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Usage of pg_stat_database

2011-10-27 Thread Jeff Davis
On Mon, 2011-10-24 at 19:34 +0530, Amitabh Kant wrote:
> If I read the xact_commit field returned by "Select * from
> pg_stat_database" multiple times, and then average the difference
> between consecutive values, would this give an approx idea about the
> transactions per second in my database? 

Yes, approximately. It relies on the stats messages.

> Does this figure include the number of select statements being
> executed in the db? 
> 
Yes, a quick test shows that select statements count as well. However,
it seems that very simple selects, like "SELECT 1", might not send the
stats messages right away, and they might end up getting lost.
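
A minimal way to do the sampling (a sketch; the interval is up to you):

    SELECT xact_commit FROM pg_stat_database
     WHERE datname = current_database();
    -- wait N seconds (e.g. from cron or a script), run it again, then:
    --   (second_sample - first_sample) / N  ~=  transactions per second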

Regards,
Jeff Davis



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Recover rows deleted

2012-05-29 Thread Jeff Davis
On Mon, 2012-05-28 at 19:24 +0100, Alejandro Carrillo wrote:
> Hi,
> 
> 
> ¿How I can recover a row delete of a table that wasn't vacuummed?
> I have PostgreSQL 9.1 in Windows XP/7.

The first thing to do is shut down PostgreSQL and take a full backup of
the data directory, including the current WAL (the files in pg_xlog) and
any archived WAL you might have. Make sure this is done first.

Next, do you have any backups? If you have a base backup from before the
delete, and all the WAL files from the time of the base backup until
now, then you can try point-in-time recovery to the point right before
the data loss:

http://www.postgresql.org/docs/9.1/static/continuous-archiving.html
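
A minimal recovery.conf for that would look something like this (a
sketch; the archive path and target time are assumptions -- use
wherever your archive_command put the WAL, and a time just before the
DELETE ran):

    restore_command = 'copy "C:\\archive\\%f" "%p"'
    recovery_target_time = '2012-05-28 12:00:00'
    recovery_target_inclusive = false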

If not, are we talking about a single row, or many rows? If it's a
single row you might be able to do some manual steps, like examining the
pages to recover the data.

Another option is to try pg_resetxlog (make sure you have a safe backup
first!):

http://www.postgresql.org/docs/9.1/static/app-pgresetxlog.html

And try setting the current transaction ID to just before the delete
ran. Then you may be able to use pg_dump or otherwise export the deleted
rows.

Regards,
Jeff Davis


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance of CLUSTER

2012-06-11 Thread Jeff Davis
On Mon, 2012-06-11 at 08:42 -0500, Shaun Thomas wrote:
> On 06/10/2012 03:20 AM, Mark Thornton wrote:
> 
> > 4. 3980922 rows, 1167MB, 276s
> > 5. 31843368 rows, 9709MB, ~ 10 hours
> 
> Just judging based on the difference between these two, it would appear 
> to be from a lot of temp space thrashing. An order of magnitude more 
> rows shouldn't take over 100x longer to cluster, even with GIST. What's 
> your maintenance_work_mem setting?

GiST index build times can vary widely depending on all kinds of
factors, including data distribution.

9.2 contains some important improvements in GiST build times for cases
where the index doesn't fit in memory. Mark, can you please try your
experiments on the 9.2beta and tell us whether that helps you?

Regards,
Jeff Davis


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Deferred constraints performance impact ?

2012-08-12 Thread Jeff Davis
On Thu, 2012-07-19 at 20:27 -0600, mark wrote:
> I understand a lot more may have to be tracked through a transaction and
> there could be some impact from that. Similar to an after update trigger? Or
> are the two not comparable in terms of impact from what is tracked and then
> checked. 

They should be very comparable to AFTER triggers. It's actually a little
better, because there are optimizations that avoid queuing a constraint
check when we know it will pass.

I would recommend testing a few degenerate cases to see how big the
impact is, and try to see if that is reasonable for your application.
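
For example, something like this (a sketch; the tables are made up)
lets you compare timings with the checks immediate vs. deferred:

    CREATE TABLE parent (id int PRIMARY KEY);
    CREATE TABLE child  (parent_id int REFERENCES parent (id)
                         DEFERRABLE INITIALLY IMMEDIATE);

    BEGIN;
    SET CONSTRAINTS ALL DEFERRED;   -- checks queue up until COMMIT
    -- ... bulk inserts/updates on child ...
    COMMIT;                         -- queued checks run here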

Regards,
    Jeff Davis



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] select on 22 GB table causes "An I/O error occured while sending to the backend." exception

2008-08-28 Thread Jeff Davis
On Thu, 2008-08-28 at 00:56 -0400, Tom Lane wrote:
> Actually, the problem with Linux' OOM killer is that it
> *disproportionately targets the PG postmaster*, on the basis not of
> memory that the postmaster is using but of memory its child processes
> are using.  This was discussed in the PG archives a few months ago;
> I'm too lazy to search for the link right now, but the details and links
> to confirming kernel documentation are in our archives.
> 

http://archives.postgresql.org/pgsql-hackers/2008-02/msg00101.php

It's not so much that the OOM Killer targets the parent process for a
fraction of the memory consumed by the child. It may not be a great
design, but it's not what's causing the problem for the postmaster.

The problem for the postmaster is that the OOM killer counts the
children's total vmsize -- including *shared* memory -- against the
parent, which is such a bad idea I don't know where to start. If you
have shared_buffers set to 1GB and 25 connections, the postmaster will
be penalized as though it were using 13.5 GB of memory (half of each
child's roughly 1GB vmsize is charged to the parent, on top of the
postmaster's own 1GB), even though all the processes together are only
using about 1GB! 

Not only that, killing a process doesn't free shared memory, so it's
just flat out broken.

Regards,
Jeff Davis


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] select on 22 GB table causes "An I/O error occured while sending to the backend." exception

2008-08-28 Thread Jeff Davis
On Wed, 2008-08-27 at 23:23 -0700, [EMAIL PROTECTED] wrote:
> there are periodic flamefests on the kernel mailing list over the OOM 
> killer, if you can propose a better algorithm for it to use than the 
> current one that doesn't end up being just as bad for some other workload 
> the kernel policy can be changed.
> 

Tried that: http://lkml.org/lkml/2007/2/9/275

All they have to do is *not* count shared memory against the process (or
at least not count it against the parent of the process), and the system
may approximate sanity.

> IIRC the reason why it targets the parent process is to deal with a 
> fork-bomb type of failure where a program doesn't use much memory itself, 
> but forks off memory hogs as quickly as it can. if the OOM killer only 
> kills the children the problem never gets solved.

But killing a process won't free shared memory. And there is already a
system-wide limit on shared memory. So what's the point of such a bad
design?

Regards,
Jeff Davis


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] why does this use the wrong index?

2008-09-19 Thread Jeff Davis
> So, What can I do to encourage Postgres to use the first index even when the
> date range is smaller.
> 

It looks like PostgreSQL is estimating the selectivity of your date
ranges poorly. In the second (bad) plan it estimates that the index scan
with the filter will return 1 row (and that's probably because it
estimates that the date range you specify will match only one row).

This leads PostgreSQL to choose the narrower index because, if the index
scan is only going to return one row anyway, it might as well scan the
smaller index.

What's the n_distinct for start_time?

=> select n_distinct from pg_stats where tablename='ad_log' and
attname='start_time';

If n_distinct is near -1, that would explain why it thinks that it will
only get one result.

Based on the difference between the good index scan (takes 0.056ms per
loop) and the bad index scan with the filter (311ms per loop), the
"player" condition must be very selective, but PostgreSQL doesn't care
because it already thinks that the date range is selective.

Regards,
Jeff Davis


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] why does this use the wrong index?

2008-09-19 Thread Jeff Davis
On Fri, 2008-09-19 at 11:25 -0700, Jeff Davis wrote:
> What's the n_distinct for start_time?

Actually, I take that back. Apparently, PostgreSQL can't change "x
BETWEEN y AND y" into "x=y", so PostgreSQL can't use n_distinct at all.

That's your problem. If it's one day only, change it to equality and it
should be fine.
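
In other words (a sketch based on the query shape described, assuming
the range really covers just a single start_time value):

    -- instead of this, which the planner can't fold into an equality:
    SELECT * FROM ad_log
     WHERE start_time BETWEEN '2008-09-18' AND '2008-09-18';

    -- write this, so n_distinct can be used for the estimate:
    SELECT * FROM ad_log
     WHERE start_time = '2008-09-18';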

Regards,
Jeff Davis


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL OR performance

2008-11-05 Thread Jeff Davis
On Wed, 2008-11-05 at 13:12 +0200, Віталій Тимчишин wrote:
> For a long time already I can see very poor OR performance in
> postgres. 
> If one have query like "select something from table where condition1
> or condition2" it may take ages to execute while
> "select something from table where condition1" and "select something
> from table where condition2" are executed very fast and
> "select something from table where condition1 and not condition2 union
> all select something from table where condition2" gives required
> results fast
> 

What version are you using?

Have you run "VACUUM ANALYZE"?

Next, do:

EXPLAIN ANALYZE select something from table where condition1 or
condition2;

for each of the queries, unless that query takes so long you don't want
to wait for the result. In that case, omit the "ANALYZE" and just do
"EXPLAIN ...".

Then post those results to the list. These tell us what plans PostgreSQL
is choosing and what it estimates the costs to be. If it's the output of
EXPLAIN ANALYZE, it also runs the query and tells us what the costs
really are.

From that, we can see where the planner is going wrong, and what you
might do to change it.

Regards,
Jeff Davis


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] freezing tuples ( was: Why is vacuum_freeze_min_age 100m? )

2009-08-13 Thread Jeff Davis
[ moving to -hackers ]

If this topic has been discussed previously, please point me to the
earlier threads.

Why aren't we more opportunistic about freezing tuples? For instance, if
we already have a dirty buffer in cache, we should be more aggressive
about freezing those tuples than freezing tuples on disk.

I looked at the code, and it looks like if we freeze one tuple on the
page during VACUUM, we mark it dirty. Wouldn't that be a good
opportunity to freeze all the other tuples on the page that we can?

Or, perhaps when the bgwriter is flushing dirty buffers, it can look for
opportunities to set hint bits or freeze tuples.
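
For context, a rough way to see how much freezing work is outstanding
(just for measurement, not part of the proposal):

    SELECT relname, age(relfrozenxid) AS xid_age
      FROM pg_class
     WHERE relkind = 'r'
     ORDER BY age(relfrozenxid) DESC
     LIMIT 10;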

Regards,
    Jeff Davis


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] maintain_cluster_order_v5.patch

2009-10-19 Thread Jeff Davis
On Mon, 2009-10-19 at 21:32 +0200, ph...@apra.asso.fr wrote:
> Hi all,
> 
> The current discussion about "Indexes on low cardinality columns" let
> me discover this 
> "grouped index tuples" patch (http://community.enterprisedb.com/git/)
> and its associated 
> "maintain cluster order" patch
> (http://community.enterprisedb.com/git/maintain_cluster_order_v5.patch)
> 
> This last patch seems to cover the TODO item named "Automatically
> maintain clustering on a table".

The TODO item isn't clear about whether the order should be strictly
maintained, or whether it should just make an effort to keep the table
mostly clustered. The patch mentioned above makes an effort, but does
not guarantee cluster order.

> As this patch is not so new (2007), I would like to know why it has
> not been yet integrated in a standart version of PG (not well
> finalized ? not totaly sure ? not corresponding to the way the core
> team would like to address this item ?) and if there are good chance
> to see it committed in a near future.

Search the archives on -hackers for discussion. I don't think either of
these features was rejected, but some of the work and benchmarking have
not been completed.

If you can help (either benchmark work or C coding), try reviving the
features by testing them and merging them with the current tree. I
recommend reading the discussion first, to see if there are any major
problems.

Personally, I'd like to see the GIT feature finished as well. When I
have time, I was planning to take a look into it.

Regards,
Jeff Davis


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Queryplan within FTS/GIN index -search.

2009-10-22 Thread Jeff Davis
On Thu, 2009-10-22 at 18:28 +0200, Jesper Krogh wrote:
> I somehow would expect the index-search to take advantage of the MCV's
> informations in the statistics that sort of translate it into a search
> and post-filtering (as PG's queryplanner usually does at the SQL-level).

MCVs are full values that are found in columns or indexes -- you aren't
likely to have two entire documents that are exactly equal, so MCVs are
useless in your example.

I believe that stop words are a more common way of accomplishing what
you want to do, but they are slightly more limited: they won't be
checked at any level, and so are best used for truly common words like
"and". From your example, I assume that you still want the word checked,
but it's not selective enough to be usefully checked by the index.

In effect, what you want are words that aren't searched (or stored) in
the index, but are included in the tsvector (so the RECHECK still
works). That sounds like it would solve your problem and it would reduce
index size, improve update performance, etc. I don't know how difficult
it would be to implement, but it sounds reasonable to me.

The only disadvantage is that it's more metadata to manage -- all of the
existing data like dictionaries and stop words, plus this new "common
words". Also, it would mean that almost every match requires RECHECK. It
would be interesting to know how common a word needs to be before it's
better to leave it out of the index.
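
To get a feel for how common a word has to be before that's a win,
ts_stat reports per-word document counts (a sketch; I'm guessing the
table name from the column used later in this thread):

    SELECT word, ndoc
      FROM ts_stat('SELECT ftsbody_body_fts FROM ftsbody')
     ORDER BY ndoc DESC
     LIMIT 20;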

Regards,
Jeff Davis


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Queryplan within FTS/GIN index -search.

2009-10-22 Thread Jeff Davis
On Fri, 2009-10-23 at 07:18 +0200, Jesper Krogh wrote:
> This is indeed information on individual terms from the statistics that
> enable this.

My mistake, I didn't know it was that smart about it.

> > In effect, what you want are words that aren't searched (or stored) in
> > the index, but are included in the tsvector (so the RECHECK still
> > works). That sounds like it would solve your problem and it would reduce
> > index size, improve update performance, etc. I don't know how difficult
> > it would be to implement, but it sounds reasonable to me.


> That sounds like it could require an index rebuild if the distribution
> changes?

My thought was that the common words could be declared to be common the
same way stop words are. As long as words are only added to this list,
it should be OK.

> That would be another plan to pursue, but the MCV is allready there

The problem with MCVs is that the index search can never eliminate a
document on the grounds that it doesn't contain a match, because the
document might contain a match for a word that was previously an MCV
but is no longer one.

Also, MCVs are relatively few -- you only get ~1000 or so. There might
be a lot of common words you'd like to track.

Perhaps ANALYZE can automatically add the common words above some
frequency threshold to the list?

Regards,
Jeff Davis


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Queryplan within FTS/GIN index -search.

2009-10-23 Thread Jeff Davis
On Fri, 2009-10-23 at 09:26 +0100, Richard Huxton wrote:
> That structure isn't exposed to the planner though, so it doesn't
> benefit from any re-ordering the planner would normally do for normal
> (exposed) AND/OR clauses.

I don't think that explains it, because in the second plan you only see
a single index scan with two quals:

   Index Cond: ((ftsbody_body_fts @@
 to_tsquery('commonterm'::text)) AND (ftsbody_body_fts @@
 to_tsquery('spellerror'::text)))

So it's entirely up to GIN how to execute that.

Regards,
Jeff Davis


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Queryplan within FTS/GIN index -search.

2009-10-23 Thread Jeff Davis
On Fri, 2009-10-23 at 09:45 +0200, jes...@krogh.cc wrote:
> No, it definately has to go visit the index/table to confirm findings, but
> that why I wrote Queryplan in the subject line, because this os only about
> the strategy to pursue to obtain the results. And a strategy about
> limiting the amout of results as early as possible (as PG usually does)
> would be what I'd expect and MCV can help it guess on that.

I see what you're saying: you could still index the common terms like
normal, but just not look for anything in the index if it's an MCV. That
sounds reasonable, based on the numbers you provided.

>Index Cond: (ftsbody_body_fts @@ to_tsquery('commonterm &
> spellerror'::text))
>  Total runtime: 862.771 ms
> (6 rows)

...

>Index Cond: ((ftsbody_body_fts @@
> to_tsquery('commonterm'::text)) AND (ftsbody_body_fts @@
> to_tsquery('spellerror'::text)))
>  Total runtime: 8.724 ms
> (6 rows)
> 

Something seems strange here. Both are a single index scan, but having a
single complex search key is worse than having two simple search keys. 

Perhaps the real problem is that there's a difference between these
cases at all? I don't see any reason why the first should be more
expensive than the second.

Regards,
Jeff Davis


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Queryplan within FTS/GIN index -search.

2009-10-23 Thread Jeff Davis
On Fri, 2009-10-23 at 17:27 +0100, Richard Huxton wrote:
> Returns an array of keys given a value to be queried; that is, query is
> the value on the right-hand side of an indexable operator whose
> left-hand side is the indexed column
> 
> So - that is presumably two separate arrays of keys being matched
> against, and the AND means if the first fails it'll never check the second.

My point was that if it's only one index scan in both cases, then GIN
should have the same information in both cases, right? So why are they
being treated differently?

I must be missing something.

Regards,
Jeff Davis


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Savepoints in transactions for speed?

2012-11-28 Thread Jeff Davis
On Tue, 2012-11-27 at 22:16 -0300, Claudio Freire wrote:
> Updates, are faster if batched, if your business logic allows it,
> because it creates less bloat and creates more opportunities for with
> HOT updates. I don't think it applies to inserts, though, and I
> haven't heard it either.

Huge updates (e.g. UPDATE with no WHERE clause) are less likely to
benefit from HOT. HOT has two main optimizations:

1. Remove dead tuples faster without waiting for VACUUM -- this only
works if the transaction that updated/deleted the tuple actually
finished (otherwise the tuple can't be removed yet), so it only benefits
the *next* update to come along. But if it's one big update, then VACUUM
is probably just as good at cleaning up the space.

2. Doesn't make new index entries for the new tuple; reuses the old
index entries -- this only works if the new version goes on the same
page, but large updates tend to fill pages up (because of the buildup of
dead tuples) and force the new tuples onto new pages.

HOT is essentially designed for lots of small updates, which didn't
perform well before PG 8.3.
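
One way to see how much a given workload actually benefits from HOT is
to compare total vs. HOT updates in the stats (a sketch):

    SELECT relname, n_tup_upd, n_tup_hot_upd,
           round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 1)
             AS hot_pct
      FROM pg_stat_user_tables
     ORDER BY n_tup_upd DESC;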

Batching of inserts/updates/deletes has a big benefit over separate
transactions, but only up to a point, after which it levels off. I'm not
sure exactly when that point is, but after that, the downsides of
keeping a transaction open (like inability to remove the previous
version of an updated tuple) take over.

Regards,
Jeff Davis



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Savepoints in transactions for speed?

2012-11-28 Thread Jeff Davis
On Tue, 2012-11-27 at 16:04 -0600, Mike Blackwell wrote:
> I need to delete about 1.5 million records from a table and reload it
> in one transaction.  The usual advice when loading with inserts seems
> to be group them into transactions of around 1k records.  Committing
> at that point would leave the table in an inconsistent state.  Would
> issuing a savepoint every 1k or so records negate whatever downside
> there is to keeping a transaction open for all 1.5 million records, or
> just add more overhead?

A large transaction isn't really a big problem for postgres, and 1.5M
records should be processed quickly anyway.

The main problem with a long-running delete or update transaction is
that the dead tuples (deleted tuples or the old version of an updated
tuple) can't be removed until the transaction finishes. That can cause
temporary "bloat", but 1.5M records shouldn't be noticeable. 

Adding subtransactions into the mix won't help, but probably won't hurt,
either. The transaction will still run just as long, and you still can't
clean up the dead tuples ahead of time (unless you abort a
subtransaction). If you *do* use subtransactions, make sure to release
them as quickly as you create them (don't just use ROLLBACK TO; that
still leaves the savepoint in place); having 1500 open subtransactions
might cause performance problems elsewhere.
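
Something like this is what I mean (a sketch):

    BEGIN;
    -- repeat per batch of ~1000 rows:
    SAVEPOINT batch;
    -- ... INSERTs for this batch ...
    RELEASE SAVEPOINT batch;   -- don't leave the savepoint open
    -- (on error: ROLLBACK TO SAVEPOINT batch, then RELEASE it)
    COMMIT;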

Regards,
Jeff Davis



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Savepoints in transactions for speed?

2012-11-29 Thread Jeff Davis
On Thu, 2012-11-29 at 00:48 -0300, Claudio Freire wrote:
> Not really that fast if you have indices (and who doesn't have a PK or two).
> 
> I've never been able to update (update) 2M rows in one transaction in
> reasonable times (read: less than several hours) without dropping
> indices. Doing it in batches is way faster if you can't drop the
> indices, and if you can leverage HOT updates.

I tried a quick test with 2M tuples and 3 indexes over int8, numeric,
and text (generated data). There was also an unindexed bytea column.
Using my laptop, a full update of the int8 column (which is indexed,
forcing cold updates) took less than 4 minutes.

I'm sure there are other issues with real-world workloads, and I know
that it's wasteful compared to something that can make use of HOT
updates. But unless there is something I'm missing, it's not really
worth the effort to batch if that is the size of the update.
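
The test was roughly along these lines (a reconstruction, not the exact
script):

    CREATE TABLE t AS
    SELECT g::int8 AS a, g::numeric AS b, g::text AS c,
           decode(md5(g::text), 'hex') AS d   -- unindexed bytea
      FROM generate_series(1, 2000000) g;
    CREATE INDEX ON t (a);
    CREATE INDEX ON t (b);
    CREATE INDEX ON t (c);

    UPDATE t SET a = a + 1;   -- full update of the indexed int8 column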

Regards,
Jeff Davis




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] [BUGS] BUG #8130: Hashjoin still gives issues

2013-05-01 Thread Jeff Davis
On Wed, 2013-05-01 at 17:44 +0200, Stefan de Konink wrote:
> Combined with the recent bugfix regarding hash 
> estimation, it gives me a good indication that there might be a bug.

To which recent bugfix are you referring?

The best venue for fixing an issue like this is pgsql-performance -- it
doesn't make too much difference whether it's a "bug" or not.
Performance problems sometimes end up as bugs and sometimes end up being
treated more like an enhancement; but most of the progress is made on
pgsql-performance regardless.

Regards,
Jeff Davis





-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

