Re: RES: [PERFORM] select on 1milion register = 6s

2007-07-29 Thread Decibel!
On Sat, Jul 28, 2007 at 10:36:16PM +0000, Ragnar wrote:
> On lau, 2007-07-28 at 17:12 -0300, Bruno Rodrigues Siqueira wrote:
> 
> > where
> > 
> > to_char( data_encerramento ,'yyyy-mm') 
> > between   '2006-12' and  '2007-01'
> 
> assuming data_encerramento is a date column, try:
> WHERE data_encerramento between   '2006-12-01' and  '2007-01-31'

IMO, much better would be:

WHERE data_encerramento >= '2006-12-01' AND data_encerramento <
'2007-02-01'

This means you don't have to worry about last day of the month or
timestamp precision. In fact, since the field is actually a timestamp,
the between posted above won't work correctly.
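
To make the timestamp pitfall concrete, here's a minimal illustration
(not from the original thread; the literal values are made up, and
data_encerramento is assumed to be a plain timestamp):

-- A row stamped later on the last day of January is silently dropped
-- by the BETWEEN version, but kept by the half-open range:
SELECT ts BETWEEN '2006-12-01' AND '2007-01-31' AS between_matches,  -- false
       ts >= '2006-12-01' AND ts < '2007-02-01' AS range_matches     -- true
  FROM (SELECT '2007-01-31 14:30:00'::timestamp AS ts) x;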
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: RES: RES: [PERFORM] select on 1milion register = 6s

2007-07-29 Thread Decibel!
On Sun, Jul 29, 2007 at 01:44:23PM -0300, Bruno Rodrigues Siqueira wrote:
> EXPLAIN
>  ANALYZE
> select 
>to_char(data_encerramento,'mm/yyyy')  as opcoes_mes, 
>to_char(data_encerramento,'yyyy-mm') as ordem from detalhamento_bas 
> where
> 
> data_encerramento >=  '01/12/2006'  and
> data_encerramento < '01/02/2007' 
> 
> GROUP BY opcoes_mes, ordem
> ORDER BY ordem DESC
> 
> 
> 
> QUERY PLAN
> Sort  (cost=219113.10..219113.10 rows=4 width=8) (actual
> time=10079.212..10079.213 rows=2 loops=1)
>   Sort Key: to_char(data_encerramento, 'yyyy-mm'::text)
>   ->  HashAggregate  (cost=219113.09..219113.09 rows=4 width=8) (actual
> time=10079.193..10079.195 rows=2 loops=1)
> ->  Seq Scan on detalhamento_bas  (cost=0.00..217945.41 rows=2335358
> width=8) (actual time=0.041..8535.792 rows=2335819 loops=1)
>   Filter: ((data_encerramento >= '2006-12-01
> 00:00:00'::timestamp without time zone) AND (data_encerramento < '2007-02-01
> 00:00:00'::timestamp without time zone))
> Total runtime: 10079.256 ms
> 
> Strange!!! Why does the index not work?

It's unlikely that it's going to be faster to index scan 2.3M rows than
to sequentially scan them. Try setting enable_seqscan=false and see
whether it actually is.
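
Something like this (a sketch, using the same query with the format
strings spelled out) disables seqscans for the current session only so
you can compare the two plans:

SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT to_char(data_encerramento, 'mm/yyyy') AS opcoes_mes,
       to_char(data_encerramento, 'yyyy-mm') AS ordem
  FROM detalhamento_bas
 WHERE data_encerramento >= '2006-12-01'
   AND data_encerramento <  '2007-02-01'
 GROUP BY opcoes_mes, ordem
 ORDER BY ordem DESC;
RESET enable_seqscan;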
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: RES: RES: [PERFORM] select on 1milion register = 6s

2007-07-29 Thread Decibel!
Please reply-all so others can learn and contribute.

On Sun, Jul 29, 2007 at 09:38:12PM -0700, Craig James wrote:
> Decibel! wrote:
> >It's unlikely that it's going to be faster to index scan 2.3M rows than
> >to sequential scan them. Try setting enable_seqscan=false and see if it
> >is or not.
> 
> Out of curiosity ... Doesn't that depend on the table?  Are all of the data 
> for one row stored contiguously, or are the data stored column-wise?  If 
> it's the former, and the table has hundreds of columns, or a few columns 
> with large text strings, then wouldn't the time for a sequential scan 
> depend not on the number of rows, but rather the total amount of data?

Yes, the time for a seqscan is mostly dependent on table size and not
the number of rows. But the number of rows plays a very large role in
the cost of an indexscan.
-- 
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828




Re: [PERFORM] Vacuum looping?

2007-07-30 Thread Decibel!

On Jul 30, 2007, at 9:04 AM, Steven Flatt wrote:
> On 7/28/07, Jim C. Nasby <[EMAIL PROTECTED]> wrote:
> > What are your vacuum_cost_* settings? If you set those too aggressively
> > you'll be in big trouble.
> 
> autovacuum_vacuum_cost_delay = 100

Wow, that's *really* high. I don't think I've ever set it higher than
25. I'd cut it way back.

> autovacuum_vacuum_cost_limit = 200
> 
> These are generally fine, autovacuum keeps up, and there is minimal
> impact on the system.
> 
> vacuum_cost_delay = 100
> vacuum_cost_limit = 1000
> 
> We set this cost_limit a little higher so that, in the few cases
> where we have to intervene manually, vacuum runs faster.

IIRC, when the cost delay was initially introduced (8.0), someone did
testing and decided that the cost limit of 200 was optimal, so I
wouldn't go changing it like that without good reason.

Normally, I'll use a delay of 10ms on good disk hardware, and 20ms on
slower hardware.
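
For reference, a sketch of the kind of settings described above, using
the 8.1/8.2 parameter names; the numbers are illustrative, not a
recommendation for any particular box:

# postgresql.conf (sketch; example values only)
autovacuum_vacuum_cost_delay = 10    # ~10ms on good disks, ~20ms on slower ones
autovacuum_vacuum_cost_limit = -1    # -1 = fall back to vacuum_cost_limit
vacuum_cost_limit = 200              # the default; see the note above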

--
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)





Re: [PERFORM] cpu throttling

2007-08-03 Thread Decibel!
On Thu, Aug 02, 2007 at 09:14:37AM -0700, Alan Hodgson wrote:
> On Thursday 02 August 2007 09:02, "Bryan Murphy" <[EMAIL PROTECTED]> 
> wrote:
> > My question:  Is there a way I can decrease the priority of a specific
> > query, or determine the PID of the process it is running in?  I'd like
> > to throw together a quick shell script if at all possible, as right
> > now I have to monitor the process manually and we'll have fixed the
> > problem long before we have the chance to implement proper database
> > clustering.
> 
> select procpid from pg_stat_activity  where current_query   
>like '%stored_proc%' and current_query not like '%pg_stat_activity%';
> 
> requires stats_command_string to be enabled
> 
> I'm surprised your operating system doesn't automatically lower the priority 
> of the process, though ..

The OS will only lower it to a certain extent.

Also, make sure you understand the concept of priority inversion before
going into production with this solution.
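
For the "quick shell script" part, a rough sketch (hypothetical, not
from the thread; it assumes stats_command_string is on and that
lowering the priority is acceptable given the inversion caveat above):

#!/bin/sh
# Renice every backend currently running the stored procedure.
PIDS=$(psql -At -c "SELECT procpid FROM pg_stat_activity
                    WHERE current_query LIKE '%stored_proc%'
                      AND current_query NOT LIKE '%pg_stat_activity%'")
for pid in $PIDS; do
    renice -n 10 -p "$pid"
done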
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] TRUNCATE TABLE

2007-08-05 Thread Decibel!
On Sat, Aug 04, 2007 at 11:39:31PM +0200, Adriaan van Os wrote:
> Kevin Grittner wrote:
> >>>>On Mon, Jul 16, 2007 at  7:18 PM, in message 
> >>>><[EMAIL PROTECTED]>,
> >Tom Lane <[EMAIL PROTECTED]> wrote: 
> >>Somehow, autovac is doing something that makes the filesystem go nuts
> >>every so often, and take an astonishingly long time to create an empty
> >>file.  But autovac itself doesn't create or delete any files, so what's
> >>up here?
> > 
> >Have you ruled out checkpoints as the culprit?
> 
> That's a good question. I will do some more tests, but I also suspect fsync 
> "cascading"
> <http://www.uwsg.iu.edu/hypermail/linux/kernel/0708.0/1435.html>.

Interesting. I'm guessing that ext3 has to sync out the entire journal
up to the point in time that fsync() is called, regardless of what
files/information the journal contains. Fortunately I think it's common
knowledge to mount PostgreSQL filesystems with data=writeback, which
hopefully eliminates much of that bottleneck... but if you don't do
noatime you're probably still spewing a lot out to the drive.
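
For reference, a hypothetical /etc/fstab entry showing those mount
options (device and mount point are placeholders):

/dev/sdb1   /var/lib/pgsql   ext3   noatime,data=writeback   0 0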
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] Update table performance

2007-08-07 Thread Decibel!
On Tue, Aug 07, 2007 at 02:33:19PM +0100, Richard Huxton wrote:
> Mark Makarowsky wrote:
> >I have a table with 4,889,820 records in it.  The
> >table also has 47 fields.  I'm having problems with
> >update performance.  Just as a test, I issued the
> >following update:
> >
> >update valley set test='this is a test'
> >
> >This took 905641 ms.  Isn't that kind of slow?
> 
> The limiting factor here will be how fast you can write to your disk. 

Well, very possibly how fast you can read, too. Using your assumption of
1k per row, 5M rows means 5G of data, which might well not fit in
memory. And if the entire table's been updated just once before, even
with vacuuming you're now at 10G of data.
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] Update table performance

2007-08-07 Thread Decibel!
On Tue, Aug 07, 2007 at 02:36:18PM -0500, Scott Marlowe wrote:
> On 8/7/07, Decibel! <[EMAIL PROTECTED]> wrote:
> > On Tue, Aug 07, 2007 at 02:33:19PM +0100, Richard Huxton wrote:
> > > Mark Makarowsky wrote:
> > > >I have a table with 4,889,820 records in it.  The
> > > >table also has 47 fields.  I'm having problems with
> > > >update performance.  Just as a test, I issued the
> > > >following update:
> > > >
> > > >update valley set test='this is a test'
> > > >
> > > >This took 905641 ms.  Isn't that kind of slow?
> > >
> > > The limiting factor here will be how fast you can write to your disk.
> >
> > Well, very possibly how fast you can read, too. Using your assumption of
> > 1k per row, 5M rows means 5G of data, which might well not fit in
> > memory. And if the entire table's been updated just once before, even
> > with vacuuming you're now at 10G of data.
> 
> Where one might have to update just one column of a wide table often,
> it's often a good idea to move that column into its own dependent
> table.

Yeah, I've used "vertical partitioning" very successfully in the past,
though I've never done it for just a single field. I'll typically leave
the few most common fields in the "main" table and pull everything else
into a second table.

> Or just don't update one column of every row in  table...

Yeah, that too. :) Though sometimes you can't avoid it.

I should mention that if you can handle splitting the update into
multiple transactions, that will help a lot since it means you won't be
doubling the size of the table.
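
A sketch of what that can look like (the id column and batch boundaries
are hypothetical); vacuuming between batches lets the space freed by
one batch's dead rows be reused by the next:

BEGIN;
UPDATE valley SET test = 'this is a test' WHERE id BETWEEN 1 AND 500000;
COMMIT;
VACUUM valley;

BEGIN;
UPDATE valley SET test = 'this is a test' WHERE id BETWEEN 500001 AND 1000000;
COMMIT;
VACUUM valley;
-- ...and so on for the remaining ranges.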
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] Update table performance

2007-08-08 Thread Decibel!
On Tue, Aug 07, 2007 at 08:46:20PM -0500, Erik Jones wrote:
> Vertical partitioning is where you split up your table on disk by  
> columns, i.e on the vertical lines.  He quoted it because Postgres  
> doesn't actually support it transparently but you can always fake it  
> by splitting up your table.  For example, given the following table  
> wherein column bar gets updated a lot but the others don't:
> 
> create table foo (
> idint not null,
> bar   int,
> baz   int,
> 
> primary key (id)
> );
> 
> You could split it up like so:
> 
> create table foo_a (
> idint,
> baz   int,
> 
> primary key (id)
> );
> 
> create table foo_b (
> foo_idint,
> bar   int,
> 
> foreign key foo_a_id (foo_id) references foo_a (id)
> );

FWIW, the cases where I've actually used this have been on much wider
tables, and a number of the attributes are in-frequently accessed. An
example would be if you keep snail-mail address info for users; you
probably don't use those fields very often, so they would be good
candidates for going into a second table.

When does it actually make sense to use this? When you do a *lot* with a
small number of fields in the table. In this example, perhaps you very
frequently need to look up either user_name or user_id, probably via
joins. Having a table with just name, id, perhaps password and a few
other fields might add up to 50 bytes per row (with overhead), while
address information by itself could easily be 50 bytes. So by pushing
that out to another table, you cut the size of the main table in half.
That means more efficient use of cache, faster seqscans, etc.

The case Erik is describing is more unique to PostgreSQL and how it
handles MVCC. In some cases, splitting a frequently updated row out to a
separate table might not gain as much once we get HOT, but it's still a
good tool to consider. Depending on what you're doing another useful
technique is to not update the field as often by logging updates to be
performed into a separate table and periodically processing that
information into the main table.
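
Continuing Erik's foo_a/foo_b sketch, a view can hide the split from
the application, so the occasional query that does need the
rarely-used columns still sees one logical table:

CREATE VIEW foo_combined AS
SELECT a.id, a.baz, b.bar
  FROM foo_a a
  LEFT JOIN foo_b b ON b.foo_id = a.id;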
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] When/if to Reindex

2007-08-08 Thread Decibel!
On Wed, Aug 08, 2007 at 03:27:57PM -0400, Bill Moran wrote:
> I've had similar experience.  One thing you didn't mention that I've noticed
> is that VACUUM FULL often bloats indexes.  I've made it SOP that
> after application upgrades (which usually includes lots of ALTER TABLES and
> other massive schema and data changes) I VACUUM FULL and REINDEX (in that
> order).

You'd be better off with a CLUSTER in that case. It'll be faster, and
you'll ensure that the table has optimal ordering.
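
A minimal sketch of that approach (table and index names are
placeholders; this is the pre-8.3 CLUSTER syntax):

CLUSTER some_index ON some_table;   -- rewrites the heap in index order and rebuilds the indexes
ANALYZE some_table;                 -- CLUSTER doesn't refresh statistics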

> Lots of ALTER TABLEs seem to bloat the database size considerably, beyond
> what normal VACUUM seems to fix.  A FULL seems to fix that, but it appears
> to bloat the indexes, thus a REINDEX helps.

Hrm, are you sure that's still true? I just did an ALTER TABLE ... TYPE
and it created a new file, meaning no bloating.

> I would expect that setting fillfactor to 100 will encourage indexs to bloat
> faster, and would only be recommended if you didn't expect the index contents
> to change?

Yes.
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] Update table performance

2007-08-09 Thread Decibel!
On Thu, Aug 09, 2007 at 06:04:09PM +0530, Merlin Moncure wrote:
> That said, 'update' is the slowest operation for postgresql relative
> to other databases that are not MVCC.

Actually, it depends on how you do MVCC. In Oracle, DELETE is actually
the most expensive operation, because they have to not only remove the
row from the heap, they have to copy it to the undo log. And they need
to do something with indexes as well. Whereas we just update 4 bytes in
the heap and that's it.

An UPDATE in Oracle OTOH just needs to store whatever fields have
changed in the undo log. If you haven't messed with indexed fields, it
doesn't have to touch those either.
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] Dell Hardware Recommendations

2007-08-09 Thread Decibel!
On Thu, Aug 09, 2007 at 03:47:09PM -0400, Joe Uhl wrote:
> We have a 30 GB database (according to pg_database_size) running nicely
> on a single Dell PowerEdge 2850 right now.  This represents data
> specific to 1 US state.  We are in the process of planning a deployment
> that will service all 50 US states.
> 
> If 30 GB is an accurate number per state that means the database size is
> about to explode to 1.5 TB.  About 1 TB of this amount would be OLAP
> data that is heavy-read but only updated or inserted in batch.  It is
> also largely isolated to a single table partitioned on state.  This
> portion of the data will grow very slowly after the initial loading. 
> 
> The remaining 500 GB has frequent individual writes performed against
> it.  500 GB is a high estimate and it will probably start out closer to
> 100 GB and grow steadily up to and past 500 GB.

What kind of transaction rate are you looking at?

> I am trying to figure out an appropriate hardware configuration for such
> a database.  Currently I am considering the following:
> 
> PowerEdge 1950 paired with a PowerVault MD1000
> 2 x Quad Core Xeon E5310
> 16 GB 667MHz RAM (4 x 4GB leaving room to expand if we need to)

16GB for 500GB of active data is probably a bit light.

> PERC 5/E Raid Adapter
> 2 x 146 GB SAS in Raid 1 for OS + logs.
> A bunch of disks in the MD1000 configured in Raid 10 for Postgres data.
> 
> The MD1000 holds 15 disks, so 14 disks + a hot spare is the max.  With
> 12 250GB SATA drives to cover the 1.5TB we would be able add another
> 250GB of usable space for future growth before needing to get a bigger
> set of disks.  500GB drives would leave alot more room and could allow
> us to run the MD1000 in split mode and use its remaining disks for other
> purposes in the mean time.  I would greatly appreciate any feedback with
> respect to drive count vs. drive size and SATA vs. SCSI/SAS.  The price
> difference makes SATA awfully appealing.

Well, how does this compare with what you have right now? And do you
expect your query rate to be 50x what it is now, or higher?

> We plan to involve outside help in getting this database tuned and
> configured, but want to get some hardware ballparks in order to get
> quotes and potentially request a trial unit.

You're doing a very wise thing by asking for information before
purchasing (unfortunately, many people put that cart before the horse).
This list is a great resource for information, but there's no real
substitute for working directly with someone and being able to discuss
your actual system in detail, so I'd suggest getting outside help
involved before actually purchasing or even evaluating hardware. There's
a lot to think about beyond just drives and memory with the kind of
expansion you're looking at. For example, what ability do you have to
scale past one machine? Do you have a way to control your growth rate?
How well will the existing design scale out? (Often times what is a good
design for a smaller set of data is sub-optimal for a large set of
data.)

Something else that might be worth looking at is having your existing
workload modeled; that allows building a pretty accurate estimate of
what kind of hardware would be required to hit a different workload.
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] Dell Hardware Recommendations

2007-08-09 Thread Decibel!
On Thu, Aug 09, 2007 at 05:50:10PM -0400, Merlin Moncure wrote:
> Raid 10 is usually better for databases but in my experience it's a
> roll of the dice.  If you factor cost into the matrix a SAS raid 05
> might outperform a SATA raid 10 because you are getting better storage
> utilization out of the drives (n - 2 vs. n / 2).  Then again, you
> might not.

It's going to depend heavily on the controller and the workload.
Theoretically, if most of your writes are to stripes that the controller
already has cached then you could actually out-perform RAID10. But
that's a really, really big IF, because if the strip isn't in cache you
have to read the entire thing in before you can do the write... and that
costs *a lot*.

Also, a good RAID controller can spread reads out across both drives in
each mirror on a RAID10. Though, there is an argument for not doing
that... it makes it much less likely that both drives in a mirror will
fail close enough to each other that you'd lose that chunk of data.

Speaking of failures, keep in mind that a normal RAID5 puts you only 2
drive failures away from data loss, while with RAID10 you can
potentially lose half the array without losing any data. If you do RAID5
with multiple parity copies that does change things; I'm not sure which
is better at that point (I suspect it matters how many drives are
involved).

The comment about the extra controller isn't a bad idea, although I
would hope that you'll have some kind of backup server available, which
makes an extra controller much less useful.
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] Dell Hardware Recommendations

2007-08-09 Thread Decibel!
On Thu, Aug 09, 2007 at 08:58:19PM -0500, Scott Marlowe wrote:
> > On Thu, Aug 09, 2007 at 05:29:18PM -0500, Scott Marlowe wrote:
> > > On 8/9/07, Decibel! <[EMAIL PROTECTED]> wrote:
> > >
> > > > Also, a good RAID controller can spread reads out across both drives in
> > > > each mirror on a RAID10. Though, there is an argument for not doing
> > > > that... it makes it much less likely that both drives in a mirror will
> > > > fail close enough to each other that you'd lose that chunk of data.
> > >
> > > I'd think that kind of failure mode is pretty uncommon, unless you're
> > > in an environment where physical shocks are common.  which is not a
> > > typical database environment.  (tell that to the guys writing a db for
> > > a modern tank fire control system though :) )

You'd be surprised. I've seen more than one case of a bunch of drives
failing within a month, because they were all bought at the same time.

> > > > while with RAID10 you can
> > > > potentially lose half the array without losing any data.
> > >
> > > Yes, but the RIGHT two drives can kill EITHER RAID 5 or RAID10.

Sure, but with RAID5 *any* two drive failures kill the array, while
with RAID10 only losing both drives of the same mirror does, so the
odds are much lower.

> > > While I agree with Merlin that for OLTP a faster drive is a must, for
> > > OLAP, more drives is often the real key.  The high aggregate bandwidth
> > > of a large array of SATA drives is an amazing thing to watch when
> > > running a reporting server with otherwise unimpressive specs.

True. In this case, the OP will probably want to have one array for the
OLTP stuff and one for the OLAP stuff.
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] RAID 10 or RAID 10 + RAID 1

2007-08-14 Thread Decibel!
On Tue, Aug 14, 2007 at 01:14:39PM +0300, Michael Ben-Nes wrote:
> Hello,
> 
> I need to setup a web server with PostgreSQL. The heaviest application will
> be product comparisons engine that will analyze 5-10 Million products.
> 
> I have 6 SAS 15K disks, how should I set them up:
> 
> RAID 10 on all 6 disks ( OS + Apache + PostgreSQL )
> RAID 1 on 2 disks ( OS + Apache ) + RAID 10 on 4 disks ( PostgreSQL )
 
If the controller is any good and has a battery-backed write cache,
you'll probably do better with a 6 disk RAID10. If not, I'd put all the
data on a 4 disk RAID10, and everything else (including pg_xlog!) on a
mirror.

> What is the recommended stripe size ( The computer is Dell PowerEdge 2950 )

If you search through the archives I think you'll find some stuff about
stripe size and performance.

As always, your best bet is to benchmark both approaches with your
actual application.
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] Integrated perc 5/i

2007-08-16 Thread Decibel!
On Thu, Aug 16, 2007 at 07:59:15PM +0200, Steinar H. Gunderson wrote:
> On Thu, Aug 16, 2007 at 10:53:00AM -0700, Luke Lonergan wrote:
> > They have a setting that sounds like RAID10, but it actually
> > implements spanning of mirrors.
> 
> That's interesting. I'm pretty sure it actually says "RAID10" in the BIOS,
> but is this a lie?

Unless they use the "plus notation" (ie: RAID 1+0 or RAID 0+1), you
never truly know what you're getting.

BTW, there's other reasons that RAID 0+1 stinks, beyond just
performance.
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] Integrated perc 5/i

2007-08-16 Thread Decibel!
On Thu, Aug 16, 2007 at 01:30:11PM -0400, [EMAIL PROTECTED] wrote:
> Hi Miki,
> by 40%.  I have not tried RAID 10 so I cannot help you there.  My
> suggestion is test both RAID 5 and RAID 10, and report back to us what
> you find.

Unless you're running something like a data warehouse, I'd put a real
low priority on testing RAID5... it's rarely a good idea for a database.
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] Raid Configurations

2007-08-23 Thread Decibel!

On Aug 17, 2007, at 6:55 PM, Merlin Moncure wrote:

> So, I'd be looking at a large raid 10 and 1-2 drives for the WAL...on
> a raid 1.  If your system supports two controllers (in either
> active/active or active/passive), you should look at second controller
> as well.


If you only have one controller, and it can cache writes (it has a  
BBU), I'd actually lean towards putting all 12 drives into one raid  
10. A good controller will be able to handle WAL fsyncs plenty fast  
enough, so having a separate WAL mirror would likely hurt more than  
help.

--
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)





Re: [PERFORM] asynchronous commit feature

2007-08-27 Thread Decibel!
On Thu, Aug 23, 2007 at 09:09:00AM -0400, Merlin Moncure wrote:
> I'm testing the new asynch commit feature on various raid
> configurations and my early findings is that it reduces the impact of
> keeping wal and data on the same volume.  I have 10 disks to play
> with, and am finding that it's faster to do a 10 drive raid 10 rather
> than 8 drive raid 10 + two drive wal.
> 
> anybody curious about the results, feel free to drop a line.  I think
> this will be a popular feature.

With or without a write cache on the RAID controller? I suspect that for
many systems, a write-caching controller will be very similar in
performance to async commit.
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] significant vacuum issues - looking for suggestions

2007-08-27 Thread Decibel!
On Fri, Aug 24, 2007 at 04:41:44PM -0400, Bill Moran wrote:
> In response to Kevin Kempter <[EMAIL PROTECTED]>:
> 
> > Hi List;
> > 
> > I've just started working with a client that has been running Postgres 
> > (with 
> > no DBA) for a few years. They're running  version 8.1.4 on 4-way dell boxes 
> > with 4Gig of memory on each box attached to RAID-10 disk arrays. 
> > 
> > Some of their key config settings are here:
> > shared_buffers = 20480
> > work_mem = 16384
> > maintenance_work_mem = 32758
> 
> Before you do any of those other things, bump shared_buffers to about
> 120000 and maintenance_work_mem to 250000 or so -- unless this box
> has other applications on it using significant amounts of those 4G of
> RAM.  You may find that these changes alone are enough to get vacuum
> to complete.  You'll need to restart the server for the shared_buffers
> setting to take effect.

For the really bloated table, you might need to go even higher than
250000 for maint_work_mem. IIRC vacuum needs 6 bytes per dead tuple, so
that means 43M rows... with 5M dead pages, that means less than 10 rows
per page, which is unlikely. Keep in mind that if you do a vacuum
verbose, you'll be able to see if vacuum runs out of
maintenance_work_mem, because you'll see multiple passes through all the
indexes.

You could also potentially use this to your benefit. Set maint_work_mem
low enough so that vacuum will have to start its cleaning pass after
only an hour or so... depending on how big/bloated the indexes are on
the table, it might take another 2-3 hours to clean everything. I
believe that as soon as you see it start on the indexes a second time
you can kill it... you'll have wasted some work, but more importantly
you'll have actually vacuumed part of the table.

But all of that's a moot point if they're running the default free space
map settings, which are way, way, way too conservative in 8.1. If you've
got one table with 5M dead pages, you probably want to set fsm_pages to
at least 50000000 as a rough guess, at least until this is under
control. Keep in mind that does equate to 286M of memory, though.

As for your pg_dump idea... why not just do a CREATE TABLE AS SELECT *
FROM bloated_table? That would likely be much faster than messing around
with pg_dump.
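
For what it's worth, a sketch of that approach (names are
placeholders): it needs room for a second copy of the table plus a
window where writes can be locked out, and indexes, constraints and
permissions have to be recreated afterwards:

BEGIN;
LOCK TABLE bloated_table IN ACCESS EXCLUSIVE MODE;
CREATE TABLE bloated_table_new AS SELECT * FROM bloated_table;
ALTER TABLE bloated_table RENAME TO bloated_table_old;
ALTER TABLE bloated_table_new RENAME TO bloated_table;
COMMIT;
-- then recreate indexes/constraints and eventually drop bloated_table_old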

What kind of disk hardware is this running on? A good raid 10 array with
write caching should be able to handle a 200G database fairly well; at
least better than it is from what I'm hearing.
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] significant vacuum issues - looking for suggestions

2007-08-27 Thread Decibel!
On Mon, Aug 27, 2007 at 04:56:33PM -0500, Kevin Grittner wrote:
> >>> Decibel! <[EMAIL PROTECTED]> 08/27/07 4:00 PM >>> 
> > > > They're running  version 8.1.4
> > 
> > As for your pg_dump idea... why not just do a CREATE TABLE AS SELECT *
> > FROM bloated_table? That would likely be much faster than messing around
> > with pg_dump.
>  
> He wanted to upgrade to 8.2.4.  CREATE TABLE AS won't get him there.
>  
> > > > They're running  version 8.1.4 on 4-way dell boxes 
> > > > with 4Gig of memory on each box attached to RAID-10 disk arrays. 
> > 
> > What kind of disk hardware is this running on? A good raid 10 array with
> > write caching should be able to handle a 200G database fairly well
>  
> What other details were you looking for?

How many drives? Write caching? 200G isn't *that* big for good drive
hardware, *IF* it's performing the way it should. You'd be surprised how
many arrays fall on their face even from a simple dd test.
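
As a rough example of the kind of dd sanity check meant here (paths
are placeholders; use a file well beyond RAM size so the cache can't
hide the disks):

dd if=/dev/zero of=/pgdata/ddtest bs=8k count=1000000   # write ~8GB sequentially
dd if=/pgdata/ddtest of=/dev/null bs=8k                 # read it back, watch the MB/s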
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] io storm on checkpoints, postgresql 8.2.4, linux

2007-08-28 Thread Decibel!
On Tue, Aug 28, 2007 at 10:00:57AM -0500, Erik Jones wrote:
> >>It seemed strange to me that our 70%-read db generates so much  
> >>dirty
> >>pages that writing them out takes 4-8 seconds and grabs the full  
> >>bandwidth.
> >>First, I started to tune bgwriter to a more aggressive settings,  
> >>but this
> >>was of no help, nearly no performance changes at all. Digging into  
> >>the issue
> >>further, I discovered that linux page cache was the reason. "Dirty"
> >>parameter in /proc/meminfo (which shows the amount of ready-to- 
> >>write "dirty"
> >>data currently sitting in page cache) grows between checkpoints  
> >>from 0 to
> >>about 100Mb. When checkpoint comes, all the 100mb got flushed out  
> >>to disk,
> >>effectively causing a IO storm.
> >>
> >>I found this (http://www.westnet.com/~gsmith/content/linux- 
> >>pdflush.htm
> >><http://www.westnet.com/%7Egsmith/content/linux-pdflush.htm>)  
> >>document and
> >>peeked into mm/page-writeback.c in linux kernel source tree. I'm  
> >>not sure
> >>that I understand pdflush writeout semantics correctly, but looks  
> >>like when
> >>the amount of "dirty" data is less than dirty_background_ratio*RAM/ 
> >>100,
> >>pdflush only writes pages in background, waking up every
> >>dirty_writeback_centisecs and writing no more than 1024 pages
> >>(MAX_WRITEBACK_PAGES constant). When we hit  
> >>dirty_background_ratio, pdflush
> >>starts to write out more agressively.
> >>
> >>So, looks like the following scenario takes place: postgresql  
> >>constantly
> >>writes something to database and xlog files, dirty data gets to  
> >>the page
> >>cache, and then slowly written out by pdflush. When postgres  
> >>generates more
> >>dirty pages than pdflush writes out, the amount of dirty data in the
> >>pagecache is growing. When we're at checkpoint, postgres does fsync 
> >>() on the
> >>database files, and sleeps until the whole page cache is written out.
> >>
> >>By default, dirty_background_ratio is 2%, which is about 328Mb  
> >>of 16Gb
> >>total. Following the curring pdflush logic, nearly this amount of  
> >>data we
> >>face to write out on checkpoint effective stalling everything  
> >>else, so even
> >>1% of 16Gb is too much. My setup experience 4-8 sec pause in  
> >>operation even
> >>on ~100Mb dirty pagecache...
> >>
> >> I temporaly solved this problem by setting  
> >>dirty_background_ratio to
> >>0%. This causes the dirty data to be written out immediately. It  
> >>is ok for
> >>our setup (mostly because of large controller cache), but it  
> >>doesn't looks
> >>to me as an elegant solution. Is there some other way to fix this  
> >>issue
> >>without disabling pagecache and the IO smoothing it was designed  
> >>to perform?
> >
> >You are working at the correct level. The bgwriter performs the I/O  
> >smoothing
> >function at the database level. Obviously, the OS level smoothing  
> >function
> >needed to be tuned and you have done that within the parameters of  
> >the OS.
> >You may want to bring this up on the Linux kernel lists and see if  
> >they have
> >any ideas.
> >
> >Good luck,
> >
> >Ken
> 
> Have you tried decreasing you checkpoint interval?  That would at  
> least help to reduce the amount of data that needs to be flushed when  
> Postgres fsyncs.

The downside to that is it will result in writing a lot more data to WAL
as long as full page writes are on.

Isn't there some kind of a timeout parameter for how long dirty data
will sit in the cache? It seems pretty broken to me to allow stuff to
sit in a dirty state indefinitely.
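
For what it's worth, the standard Linux knobs for this live under
/proc/sys/vm; a sketch (values illustrative, defaults vary by kernel):

sysctl vm.dirty_expire_centisecs            # how long dirty pages may age before pdflush must write them
sysctl -w vm.dirty_expire_centisecs=1000    # expire dirty data after 10 seconds
sysctl -w vm.dirty_background_ratio=1       # start background writeback sooner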
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] [Solved] Postgres performance problem

2007-08-30 Thread Decibel!
On Thu, Aug 30, 2007 at 11:50:04AM +0200, Ruben Rubio wrote:
> As you may know, I do a vacuum full and a reindex database each day. I
> have logs that confirm that its done and I can check that everything was
>  fine.
> 
> So, this morning, I stopped the website, I stopped database, started it
> again. (I was around 200  days without restarting), then I vacuum
> database and reindex it (Same command as everyday) . Restart again, and
> run again the website.
> 
> Now seems its working fine. But I really does not know where is the
> problem. Seems vacuum its not working fine?  Maybe database should need
> a restart? I really don't know.

No, it sounds to me like you just weren't vacuuming aggressively enough
to keep up with demand.
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] [GENERAL] Performance issue with nested loop

2007-09-01 Thread Decibel!
10) AND
(overview = 1) AND (zweitstudium <> 2) AND (status > 0))

  ->  Index Scan using stud_pkey on stud
(cost=0.00..4.67 rows=1 width=4) (actual time=0.024..0.026 rows=1  
loops=420)


Index Cond: (stud.sid = "outer".sid)

Filter: ((status > 0) AND
(length((vname)::text) > 1) AND (length((nname)::text) > 1))

->  Unique  (cost=5383.29..5384.98 rows=337 width=4) (actual
time=1.520..2.686 rows=511 loops=208)

  ->  Sort  (cost=5383.29..5384.13 rows=337 width=4)  
(actual

time=1.519..1.871 rows=511 loops=208)

Sort Key: public.ausb.sid

->  Hash Join  (cost=17.61..5369.14 rows=337  
width=4)

(actual time=1.133..314.584 rows=511 loops=1)

  Hash Cond: ("outer".uniid = "inner".uniid)

  ->  Seq Scan on ausb  (cost=0.00..4827.30
rows=104174 width=8) (actual time=0.030..226.532 rows=103593 loops=1)

Filter: ((overview = 1) AND  
(zweitstudium <>

2) AND (status > 0))

  ->  Hash  (cost=17.60..17.60 rows=2 width=4)
(actual time=0.392..0.392 rows=2 loops=1)

->  Seq Scan on uni  (cost=0.00..17.60
rows=2 width=4) (actual time=0.369..0.381 rows=2 loops=1)

  Filter: ((uniort)::text =
'Augsburg'::text)



The estimated row numbers are not bad as long as one table is  
affected.
They're much worse as soon as two or more tables are joined. Though  
the
query plans are slightly different, the number of merged rows at  
different
stages seems to be rather the same for both plans. The big  
difference in my
eyes seems the cost for the first nested loop. This seems to be the  
point,

where the long running query consumes most time. I've then set
enable_nestloop to off, and actually the problem disappears.


Other maybe relevant parameters:
default_statistics_target = 100
work_mem = 4096
max_fsm_pages = 10

My questions:

What could be the problem behind high amount of actually used time  
for the

nested loop in the first query?

If we decided to constantly turn off nested loops, what side  
effects would

we have to expect?

Are there more granular ways to tell the query planner when to use  
nested

loops?

Or just other ideas what to do? We'd be grateful for any hint!


Here's what's killing you:

  ->  Nested Loop  (cost=65462.58..78785.78 rows=1 width=4) (actual
time=6743.856..698776.957 rows=250 loops=1)

Join Filter: ("outer".sid = "inner".sid)

->  Merge Join  (cost=11031.79..11883.12 rows=1 width=12)  
(actual

time=387.837..433.612 rows=494 loops=1)

That merge thinks it's olny going to see 1 row, but it ends up with  
494, which results in:


->  Unique  (cost=54430.79..4.18 rows=10599 width=4)  
(actual

time=6.851..1374.135 rows=40230 loops=494)

The misestimation is actually coming from lower in the query... I  
see there's one place where it expects 180 rows and gets 511, which  
is part of the problem. Try increasing the stats on ausb.sid.
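
Something along these lines (a sketch; 1000 is the maximum statistics
target in this era of PostgreSQL):

ALTER TABLE ausb ALTER COLUMN sid SET STATISTICS 1000;
ANALYZE ausb;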


Oh, and please don't line-wrap explain output.
--
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)





Re: [PERFORM] Performance on 8CPU's and 32GB of RAM

2007-09-11 Thread Decibel!
On Wed, Sep 05, 2007 at 11:06:03AM -0400, Carlo Stonebanks wrote:
> Unfortunately, LINUX is not an option at this time. We looked into it; there
> is no *NIX expertise in the enterprise. However, I have raised this issue in
> various forums before, and when pressed no one was willing to say that "*NIX
> *DEFINITELY* outperforms Windows" for what my client is doing (or if it did
> outperform Windows, that it would outperform so significantly that it
> merited the move).
> 
> Was this incorrect? Can my client DEFINITELY expect a significant
> improvement in performance for what he is doing?

Since we don't know your actual workload, there's no way to predict
this. That's what benchmarking is for. If you haven't already bought the
hardware, I'd strongly recommend benchmarking this before buying
anything, so that you have a better idea of what your workload looks
like. Is it I/O-bound? CPU-bound? Memory?

One of the fastest ways to non-performance in PostgreSQL is not
vacuuming frequently enough. Vacuum more, not less, and control IO
impact via vacuum_cost_delay. Make sure the FSM is big enough, too.

Unless your database is small enough to fit in-memory, your IO subsystem
is almost certainly going to kill you. Even if it does fit in memory, if
you're doing much writing at all you're going to be in big trouble.
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] Hardware spec

2007-09-11 Thread Decibel!
On Thu, Sep 06, 2007 at 11:26:46AM +0200, Willo van der Merwe wrote:
> Richard Huxton wrote:
> >Willo van der Merwe wrote:
> >>Hi guys,
> >>
> >>I'm have the rare opportunity to spec the hardware for a new database
> >>server. It's going to replace an older one, driving a social networking
> >>web application. The current server (a quad opteron with 4Gb of RAM and
> >>80Gb fast SCSI RAID10) is coping with an average load of ranging between
> >>1.5 and 3.5.
> >>
> >>The new machine spec I have so far:
> >What's the limiting factor on your current machine - disk, memory, cpup?
> I'm a bit embarrassed to admit that I'm not sure. The reason we're 
> changing machines is that we might be changing ISPs and we're renting / 
> leasing the machines from the ISP.

Get yourself the ability to benchmark your application. This is
invaluable^W a requirement for any kind of performance tuning.
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] SAN vs Internal Disks

2007-09-11 Thread Decibel!
On Fri, Sep 07, 2007 at 02:10:32PM -0700, [EMAIL PROTECTED] wrote:
> >3. Easy to set up "warm standby" functionality.  (Then again, if the
> >postgres server fails miserably, it's likely to be due to a disk
> >crash).
> 
> and if postgres dies for some other reason the image on disk needs repair, 
> unless you script stopping postgres when the SAN does it's snapshots, 
> those snapshots are not going to be that good. the problems are useually 
> repairable, but that makes starting your warm spare harder.

Uh, the "image" you get from a PITR backup "needs repair" too. There's
absolutely nothing wrong with using a SAN or filesystem snapshot as a
backup mechanism, as long as it's a true snapshot, and it includes *all*
PostgreSQL data (everything under $PGDATA as well as all tablespaces).

Also, to reply to someone else's email... there is one big reason to use
a SAN over direct storage: you can do HA that results in 0 data loss.
Good SANs are engineered to be highly redundant, with multiple
controllers, PSUs, etc, so that the odds of losing the SAN itself are
very, very low. The same isn't true with DAS.

But unless you need that kind of HA recovery, I'd tend to stay away from
SANs.

BTW, if you need *serious* bandwidth, look at things like Sun's
"thumper" (I know there's at least one other company that makes
something similar). 40-48 drives in a single 4U chassis == lots of
throughput.

Finally, if you do get a SAN, make sure and benchmark it. I've seen more
than one case of a SAN that wasn't getting anywhere near the performance
it should be, even with a simple dd test.
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] DRBD and Postgres: how to improve the perfomance?

2007-09-11 Thread Decibel!
On Mon, Sep 10, 2007 at 12:06:40AM -0400, Greg Smith wrote:
> On Sat, 8 Sep 2007, Joshua D. Drake wrote:
> 
> >You would have to have lightning handed by God to your server to have a 
> >total power failure without proper shutdown in the above scenario.
> 
> Do you live somewhere without thunderstorms?  This is a regular event in 

Actually, he does. :) Or at least I don't think Portland gets a lot of
t-storms, just rain by the bucketful.
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] DRBD and Postgres: how to improve the perfomance?

2007-09-11 Thread Decibel!
On Mon, Sep 10, 2007 at 12:54:37AM -0400, Tom Lane wrote:
> Greg Smith <[EMAIL PROTECTED]> writes:
> > On Sat, 8 Sep 2007, Joshua D. Drake wrote:
> >> You would have to have lightning handed by God to your server to have a 
> >> total power failure without proper shutdown in the above scenario.
> 
> > Do you live somewhere without thunderstorms?  This is a regular event in 
> > this part of the world during the summer.  It happened to me once this 
> > year and once last; lost count for previous ones.  In both of the recent 
> > cases it's believed the servers were burned from the Ethernet side because 
> > somewhere in the network was a poor switch that wasn't isolated well 
> > enough from the grid when the building was hit.  Lightning is tricky that 
> > way; cable TV and satellite wiring are also weak links that way.
> 
> Yeah.  I've lost half a dozen modems of varying generations, a server
> motherboard, a TiVo, a couple of VCRs, and miscellaneous other equipment
> from strikes near my house --- none closer than a couple blocks away.
> I don't really care to think about what would still work after a direct
> hit, despite the whole-house surge suppressor at the meter and the local
> suppressor on each circuit and the allegedly surge-proof UPSes powering
> all the valuable stuff.  I've also moved heavily into wireless local


Pretty much every surge suppressor out there is a POS... 99.9% of them
just wire a varistor across the line; like a $0.02 part is going to stop
a 10,000+ amp discharge.

The only use I have for those things is if they come with an equipment
guarantee, though I have to wonder how much those are still honored,
since as you mention it's very easy for equipment to be fried via other
means (ethernet, monitor, etc).

> net to eliminate any direct electrical connections between machines that
> are not on the same power circuit (the aforesaid burned motherboard
> taught me that particular lesson).  And yet I still fear every time a
> thunderstorm passes over.

Wired is safe as long as everything's on the same circuit. My house is
wired for ethernet with a single switch running what's going to every
room, but in each room I have a second switch on the same power as
whatever's in that room; so if there is a strike it's far more likely
that I'll lose switches and not hardware.

> Then of course there are the *other* risks, such as the place burning to
> the ground, or getting drowned by a break in the city reservoir that's
> a couple hundred yards up the hill (but at least I needn't worry about

Invest in sponges. Lots of them. :)
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] DRBD and Postgres: how to improve the perfomance?

2007-09-11 Thread Decibel!
On Tue, Sep 11, 2007 at 04:57:24PM +0200, Steinar H. Gunderson wrote:
> On Tue, Sep 11, 2007 at 04:47:40PM +0200, Maila Fatticcioni wrote:
> > The servers are cross connected with a common 100 Mbit/sec Ethernet so I
> > think they have a bandwidth around 80 Mbit/sec (even if I haven't yet
> > done any test on it). A rate of 70Mb seems reasonable to me.
> 
> Umm, seriously? Unless that was a typo, you should consider very seriously to
> go to gigabit; it's cheap these days, and should provide you with a very
> decent speed boost if the network bandwidth is the bottleneck.

Actually, in this case, I suspect that latency will be far more critical
than overall bandwidth. I don't know if it's inherent to Gig-E, but my
limited experience has been that Gig-E has higher latency than 100mb.
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] random_page_costs - are defaults of 4.0 realistic for SCSI RAID 1

2007-09-11 Thread Decibel!
On Mon, Sep 10, 2007 at 06:22:06PM -0400, Mark Mielke wrote:
> In my case, I set effective_cache_size to 25% of the RAM available to 
> the system (256 Mbytes), for a database that was about 100 Mbytes or 
> less. I found performance to increase when reducing random_page_cost 
> from 4.0 to 3.0.

Just for the record, effective_cache_size of 25% is *way* too low in
most cases, though if you only have 1GB setting it to 500MB probably
isn't too far off.

Generally, I'll set this to however much memory is in the server, minus
1G for the OS, unless there's less than 4G of total memory in which case
I subtract less.
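
As an illustration of that rule of thumb (hypothetical dedicated 16GB
box, 8.2-style units; on 8.1 and earlier this is given in 8kB pages
instead):

effective_cache_size = 15GB     # total RAM minus ~1GB for the OS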
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] random_page_costs - are defaults of 4.0 realistic for SCSI RAID 1

2007-09-11 Thread Decibel!
On Tue, Sep 11, 2007 at 02:49:28PM -0400, Jean-David Beyer wrote:
> It is tricky for me to find a big enough file to test. I tried one of the

dd if=/dev/zero of=bigfile bs=8192 count=100
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] [Again] Postgres performance problem

2007-09-11 Thread Decibel!
On Tue, Sep 11, 2007 at 09:49:37AM +0200, Ruben Rubio wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> [EMAIL PROTECTED] escribió:
> >> Last time I had this problem i solved it stopping website,  restarting
> >> database, vacuumm it, run again website. But I guess this is going to
> >> happen again.
> >>
> >> I would like to detect and solve the problem. Any ideas to detect it?
> > 
> > Do you have very long transactions? Maybe some client that is connected
> > all the time that is idle in transaction?
> 
> There should not be long transactions. I ll keep an eye on Idle transactions
> 
> I m detecting it using:
> 
> echo 'SELECT current_query  FROM pg_stat_activity;' |
> /usr/local/pgsql/bin/psql vacadb  | grep IDLE | wc -l

If you're using VACUUM FULL, you're doing something wrong. :) Run lazy
vacuum frequently enough (better yet, autovacuum, but cut all of 8.1's
autovac parameters in half), and make sure your FSM is big enough
(periodic vacuumdb -av | tail is an easy way to check that).

Try a REINDEX. VACUUM FULL is especially hard on the indexes, and it's
easy for them to seriously bloat.
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] Barcelona vs Tigerton

2007-09-11 Thread Decibel!
On Tue, Sep 11, 2007 at 10:57:43AM +0300, Mindaugas wrote:
>   Hello,
> 
>   Now that both 4x4 out it's time for us to decide which one should be better 
> for our PostgreSQL and Oracle. And especially for Oracle we really need such 
> server to squeeze everything from Oracle licenses. Both of the databases 
> handle OLTP type of the load.

You might take a look at replacing Oracle with EnterpriseDB... but I'm a
bit biased. ;)

>   Since we plan to buy 4U HP DL580 or 585 and only very few of them so power 
> ratings are not very critical in this our case.
> 
>   First benchmarks (http://www.anandtech.com/IT/showdoc.aspx?i=3091) show 
> that Intel still has more raw CPU power but Barcelona scales much better and 
> also has better memory bandwidth which I believe is quite critical with 16 
> cores and DB usage pattern.
>   On the other hand Intel's X7350 (2.93GHz) has almost 50% advantage in CPU 
> frequency against 2GHz Barcelona.

Databases are all about bandwidth and latency. Compute horsepower almost
never matters.

The only reason I'd look at the clock rate is if it substantially
affects memory IO capability; but from what I've seen, memory seems to
be fairly independent of CPU frequency now-a-days, so I don't think
there's a huge difference there.
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] SAN vs Internal Disks

2007-09-11 Thread Decibel!
On Tue, Sep 11, 2007 at 05:09:00PM -0400, Michael Stone wrote:
> On Tue, Sep 11, 2007 at 03:55:51PM -0500, Decibel! wrote:
> >Also, to reply to someone else's email... there is one big reason to use
> >a SAN over direct storage: you can do HA that results in 0 data loss.
> >Good SANs are engineered to be highly redundant, with multiple
> >controllers, PSUs, etc, so that the odds of losing the SAN itself are
> >very, very low. The same isn't true with DAS.
> 
> You can get DAS arrays with multiple controllers, PSUs, etc.  DAS != 
> single disk.

It's still in the same chassis, though, which means if you lose memory
or mobo you're still screwed. In a SAN setup for redundancy, there's
very little in the way of a single point of failure; generally only the
backplane, and because there's very little that's on there it's
extremely rare for one to fail.
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] More Vacuum questions...

2007-09-11 Thread Decibel!
On Tue, Sep 11, 2007 at 10:24:58AM -0600, Kevin Kempter wrote:
> I let the cron vacuum run for 14 days. in that 14 days the time it takes to 
> vacuum the table grew from 1.2hours directly after the rebuild to > 8hours 
> last nite.

Sounds to me like daily isn't enough, and that your FSM is too small.
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] random_page_costs - are defaults of 4.0 realistic for SCSI RAID 1

2007-09-11 Thread Decibel!
On Wed, Sep 12, 2007 at 12:02:46AM +0100, Gregory Stark wrote:
> "Decibel!" <[EMAIL PROTECTED]> writes:
> 
> > On Tue, Sep 11, 2007 at 02:49:28PM -0400, Jean-David Beyer wrote:
> >> It is tricky for me to find a big enough file to test. I tried one of the
> >
> > dd if=/dev/zero of=bigfile bs=8192 count=100
> 
> On linux another useful trick is:
> 
> echo 1 > /proc/sys/vm/drop_caches

The following C code should have similar effect...


/*
 * $Id: clearmem.c,v 1.1 2003/06/29 20:41:33 decibel Exp $
 *
 * Utility to clear out a chunk of memory and zero it. Useful for
 * flushing disk buffers.
 */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    /* Request argv[1] megabytes of zeroed memory; the point is simply
     * to claim a large chunk of RAM so cached data gets pushed out. */
    if (!calloc(atoi(argv[1]), 1024*1024)) {
        printf("Error allocating memory.\n");
    }
    return 0;
}
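
Hypothetical usage (the argument is the number of megabytes to grab;
whether this actually evicts cached data depends on the kernel's
overcommit behavior):

cc -o clearmem clearmem.c
./clearmem 28000    # ask for ~28GB of zeroed memory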
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] [Again] Postgres performance problem

2007-09-17 Thread Decibel!
On Wed, Sep 12, 2007 at 03:01:12PM -0500, Erik Jones wrote:
> 
> On Sep 12, 2007, at 2:19 PM, Frank Schoep wrote:
> 
> >On Sep 12, 2007, at 9:07 PM, Scott Marlowe wrote:
> >>On 9/12/07, Mikko Partio <[EMAIL PROTECTED]> wrote:
> >>>?
> >>>Aren't you mixing up REINDEX and CLUSTER?
> >>
> >>?
> >>Either one does what a vacuum full did / does, but generally does  
> >>it better.
> >
> >On topic of REINDEX / VACUUM FULL versus a CLUSTER / VACUUM ANALYZE  
> >I'd like to ask if CLUSTER is safe to run on a table that is in  
> >active use.
> >
> >After updating my maintenance scripts from a VACUUM FULL (add me to  
> >the list) to CLUSTER (which improves performance a lot) I noticed I  
> >was getting "could not open relation ?" errors in the log while the  
> >scripts ran so I reverted the change. This was on 8.1.9.
> 
> You'd probably see the same behavior on 8.2.x.  CLUSTER is not  
> transactionally safe so you don't want to run CLUSTER on tables that  
> are actively being used.  I believe that's been fixed for 8.3.

Actually, that's a bit over-conservative... what happens prior to 8.3 is
that CLUSTER rewrites the table using it's XID for everything. That can
break semantics for any transactions that are running in serializable
mode; if you're just using the default isolation level of read
committed, you're fine with CLUSTER.
-- 
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828




Re: [PERFORM] [Again] Postgres performance problem

2007-09-17 Thread Decibel!
On Thu, Sep 13, 2007 at 01:58:10AM -0400, Greg Smith wrote:
> On Wed, 12 Sep 2007, Scott Marlowe wrote:
> 
> >I'm getting more and more motivated to rewrite the vacuum docs.  I think 
> >a rewrite from the ground up might be best...  I keep seeing people 
> >doing vacuum full on this list and I'm thinking it's as much because of 
> >the way the docs represent vacuum full as anything.
> 
> I agree you shouldn't start thinking in terms of how to fix the existing 
> documentation.  I'd suggest instead writing a tutorial leading someone 
> through what they need to know about their tables first and then going 
> into how vacuum works based on that data.

Take a look at the stuff at http://decibel.org/~decibel/pervasive/, it'd
hopefully provide a useful starting point.
-- 
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828




[PERFORM] Linux mis-reporting memory

2007-09-20 Thread Decibel!
Sorry, I know this is probably more a linux question, but I'm guessing
that others have run into this...

I'm finding this rather interesting report from top on a Debian box...

Mem:  32945280k total, 32871832k used,    73448k free,   247432k buffers
Swap:  1951888k total,    42308k used,  1909580k free, 30294300k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
12492 postgres  15   0 8469m 8.0g 8.0g S0 25.6   3:52.03 postmaster
 7820 postgres  16   0 8474m 4.7g 4.7g S0 15.1   1:23.72 postmaster
21863 postgres  15   0 8472m 3.9g 3.9g S0 12.4   0:30.61 postmaster
19893 postgres  15   0 8471m 2.4g 2.4g S0  7.6   0:07.54 postmaster
20423 postgres  17   0 8472m 1.4g 1.4g S0  4.4   0:04.61 postmaster
26395 postgres  15   0 8474m 1.1g 1.0g S1  3.4   0:02.12 postmaster
12985 postgres  15   0 8472m 937m 930m S0  2.9   0:05.50 postmaster
26806 postgres  15   0 8474m 787m 779m D4  2.4   0:01.56 postmaster

This is a machine that's been up some time and the database is 400G, so
I'm pretty confident that shared_buffers (set to 8G) should be
completely full, and that's what that top process is indicating.

So how is it that linux thinks that 30G is cached?
-- 
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828




Re: [PERFORM] Linux mis-reporting memory

2007-10-02 Thread Decibel!

On Sep 21, 2007, at 4:43 AM, Gregory Stark wrote:

"Csaba Nagy" <[EMAIL PROTECTED]> writes:


On Fri, 2007-09-21 at 09:03 +0100, Gregory Stark wrote:
Mem:  32945280k total, 32871832k used,    73448k free,   247432k buffers
Swap:  1951888k total,    42308k used,  1909580k free, 30294300k cached


It seems to imply Linux is paging out sysV shared memory. In fact some
of Heikki's tests here showed that Linux would do precisely that.


But then why is it not reporting that in the "Swap: used" section? It
only reports 42308k used swap.


Hm, good point.

The other possibility is that Postgres just hasn't even touched a
large part of its shared buffers.


Sorry for the late reply...

No, this is on a very active database server; the working set is  
almost certainly larger than memory (probably by a fair margin :( ),  
and all of the shared buffers should be in use.


I'm leaning towards "top on linux == dumb".
--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828





Re: [PERFORM] Linux mis-reporting memory

2007-10-02 Thread Decibel!

On Oct 2, 2007, at 1:37 PM, Adam Tauno Williams wrote:

I'm leaning towards "top on linux == dumb".


I disagree, it just isn't the appropriate tool for the job.  What top
tells you is lots of correct information,  it just isn't the right
information.


If it is in fact including shared memory as 'cached', then no, the  
information it's providing is not correct.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828





Re: [PERFORM] autovacuum: recommended?

2007-11-19 Thread Decibel!

On Nov 18, 2007, at 1:26 PM, gabor wrote:

hubert depesz lubaczewski wrote:

On Fri, Nov 16, 2007 at 10:40:43AM +0100, Gábor Farkas wrote:

we are moving one database from postgresql-7.4 to postgresql-8.2.4.

any particular reason why not 8.2.5?


the distribution i use only has 8.2.4 currently.


Then I think you need to consider abandoning your distribution's  
packages or find a better distribution. IIRC, 8.2.5 is over 2-3  
months old now; there's no reason a distribution shouldn't have it at  
this point. (Unless of course you haven't kept your distribution up- 
to-date... ;)

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [PERFORM] autovacuum: recommended?

2007-11-19 Thread Decibel!

On Nov 16, 2007, at 7:38 AM, [EMAIL PROTECTED] wrote:

The table was quite huge (say 20k of products along with detailed
descriptions etc.) and was completely updated and about 12x each  
day, i.e.
it qrew to about 12x the original size (and 11/12 of the rows were  
dead).

This caused a serious slowdown of the application each day, as the
database had to scan 12x more data.


FWIW, 20k rows isn't all that big, so I'm assuming that the  
descriptions make the table very wide. Unless those descriptions are  
what's being updated frequently, I suggest you put those in a  
separate table (vertical partitioning). That will make the main table  
much easier to vacuum, as well as reducing the impact of the high  
churn rate.
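
Roughly, the split I have in mind looks like this (all names and types
here are invented, just to show the shape):

CREATE TABLE product (
    product_id serial PRIMARY KEY,
    name       text NOT NULL,
    price      numeric(10,2)      -- the small, frequently updated columns stay here
);

CREATE TABLE product_description (
    product_id  int PRIMARY KEY REFERENCES product,
    description text              -- the wide, rarely updated text lives here
);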

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [PERFORM] autovacuum: recommended?

2007-11-19 Thread Decibel!

On Nov 16, 2007, at 5:56 AM, Csaba Nagy wrote:
We are doing that here, i.e. set up autovacuum not to touch big
tables, and cover those with nightly vacuums if there is still some
activity on them, and one weekly complete vacuum of the whole DB
("vacuum" without other params, preferably as the postgres user to
cover system tables too).


IIRC, since 8.2 autovacuum will take note of manual vacuums so as not  
to needlessly vacuum something that's been recently vacuumed  
manually. In other words, you shouldn't need to disable autovac for  
large tables if you vacuum them every night and their churn rate is  
low enough to not trigger autovacuum during the day.



In fact we also have a few very frequently updated small tables, those
are also covered by very frequent crontab vacuums because in 8.2
autovacuum can spend quite some time vacuuming some medium sized  
tables
and in that interval the small but frequently updated ones get  
bloated.

This should be better with 8.3 and multiple autovacuum workers.


+1. For tables that should always remain relatively small (ie: a web  
session table), I usually recommend setting up a manual vacuum that  
runs every 1-5 minutes.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [PERFORM] autovacuum: recommended?

2007-12-05 Thread Decibel!

On Nov 19, 2007, at 9:23 AM, Tom Lane wrote:

Decibel! <[EMAIL PROTECTED]> writes:

FWIW, 20k rows isn't all that big, so I'm assuming that the
descriptions make the table very wide. Unless those descriptions are
what's being updated frequently, I suggest you put those in a
separate table (vertical partitioning). That will make the main table
much easier to vacuum, as well as reducing the impact of the high
churn rate.


Uh, you do realize that the TOAST mechanism does that pretty much
automatically?



Only if the row exceeds 2k, which for a lot of applications is huge.  
This is exactly why I wish toast limits were configurable on a per- 
table basis (I know there were changes here for 8.3, but IIRC it was  
only for toast chunk size).

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [PERFORM] TB-sized databases

2007-12-05 Thread Decibel!

On Nov 28, 2007, at 7:27 AM, Bill Moran wrote:

Is there something wrong with:
set enable_seqscan = off



Note that in cases of very heavy skew, that won't work. It only adds  
10M to the cost estimate for a seqscan, and it's definitely possible  
to have an index scan that looks even more expensive.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [PERFORM] Measuring table and index bloat

2007-12-19 Thread Decibel!

On Dec 8, 2007, at 1:06 AM, Greg Smith wrote:
One of those things that comes up regularly on this list in  
particular are people whose performance issues relate to "bloated"  
tables or indexes. What I've always found curious is that I've  
never seen a good way suggested to actually measure said bloat in  
any useful numeric terms--until today.


Greg Sabino Mullane just released a Nagios plug-in for PostgreSQL  
that you can grab at http://bucardo.org/nagios_postgres/ , and  
while that is itself nice the thing I found most remarkable is the  
bloat check.  The majority of that code is an impressive bit of SQL  
that anyone could use even if you have no interest in Nagios, which  
is why I point it out for broader attention.  Look in  
check_postgres.pl for the "check_bloat" routine and the big  
statement starting at the aptly labled "This was fun to write"  
section.  If you pull that out of there and replace $MINPAGES and  
$MINIPAGES near the end with real values, you can pop that into a  
standalone query and execute it directly.  Results look something  
like this (reformatting for e-mail):


schemaname | tablename | reltuples | relpages | otta  | tbloat |  
public | accounts  |   250 |41667 | 40382 |1.0 |


wastedpages | wastedbytes | wastedsize |  iname  | ituples |
   1285 |10526720 | 10 MB  | accounts_pkey   | 250 |

ipages | iotta | ibloat | wastedipages | wastedibytes | wastedisize
  5594 | 35488 |0.2 |0 |0 | 0 bytes

I'd be curious to hear from those of you who have struggled with  
this class of problem in the past as to whether you feel this  
quantifies the issue usefully.


I don't think he's handling alignment correctly...

  CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma

AFAIK that should also be 8 on 64 bit CPUs.

A somewhat more minor nit... the calculation of the null header  
should be based on what columns in a table are nullable, not whether  
a column actually is null. Oh, and otta should be oughta. :) Though  
I'd probably just call it ideal.


Having said all that, this looks highly useful!
--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [PERFORM] Dual core Opterons beating quad core Xeons?

2007-12-19 Thread Decibel!

On Dec 19, 2007, at 4:54 PM, Gregory Stark wrote:

"Joshua D. Drake" <[EMAIL PROTECTED]> writes:


It is pretty much common knowledge that


I think we have too much "common knowledge".



Yeah. For a lot of folks it's still common knowledge that you should  
only set shared_buffers to 10% of memory...

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [PERFORM] Minimizing dead tuples caused by update triggers

2007-12-19 Thread Decibel!

On Dec 19, 2007, at 6:39 PM, Tom Lane wrote:
The thing that concerns me is dead tuples on the table_stats  
table.  I

believe that every insert of new data in one of the monitored tables
will result in an UPDATE of the table_stats table.  When thousands
( or millions ) of rows are inserted, the select performance ( even
trying with an index ) on table_stats slows down in a hurry.


Yup.  FWIW, 8.3's "HOT" tuple updates might help this quite a lot.
Not a lot to be done about it in 8.0.x though :-(



A work-around others have used is to have the trigger just insert  
into a 'staging' table and then periodically take the records from  
that table and summarize them somewhere else. You still have a vacuum  
concern on the staging table, but the advantage is that you trigger  
path is a simple insert instead of an update, which is effectively a  
delete and an insert.


This is a case where a cron'd vacuum that runs once a minute is  
probably a wise idea.
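
A very rough sketch of that staging approach, with every name below
invented for illustration:

CREATE TABLE monitored_table (id int PRIMARY KEY, payload text);

CREATE TABLE stats_staging (
    table_name text NOT NULL,
    rows_added int  NOT NULL
);

CREATE OR REPLACE FUNCTION log_insert() RETURNS trigger AS $$
BEGIN
    -- a plain INSERT, so no dead row per call the way an UPDATE of table_stats creates
    INSERT INTO stats_staging (table_name, rows_added)
        VALUES ('monitored_table', 1);
    RETURN NULL;   -- return value is ignored for an AFTER trigger
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER monitored_table_stats
    AFTER INSERT ON monitored_table
    FOR EACH ROW EXECUTE PROCEDURE log_insert();

A periodic job then rolls stats_staging up into table_stats and deletes
the rows it consumed, and the frequent vacuum keeps the staging table
from growing.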

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [PERFORM] strange pauses

2008-01-21 Thread Decibel!

On Jan 21, 2008, at 1:58 AM, Adrian Moisey wrote:
Perhaps, if you want to avoid I/O caused by temp tables (but it's
not at checkpoint time, so perhaps this has nothing to do with your
problem), you could try raising temp_buffers.


How can I find out if temp_buffers is being exceeded ?


You could monitor the pgsql_tmp directory under the appropriate  
database directory ($PGDATA/base/oid_number_of_the_database). You  
could also see how many pages temporary objects in that connection  
are using; you'd have to find the temp schema that your session is  
using (\dn pg_temp* from psql), and then


SELECT sum(relpages) FROM pg_class c JOIN pg_namespace n ON  
(c.relnamespace=n.oid) AND n.nspname='pg_temp_blah';

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [PERFORM] Vacuum and FSM page size

2008-01-28 Thread Decibel!
On Wed, Jan 23, 2008 at 07:29:16PM +0100, Thomas Lozza wrote:
> hi
> 
> We have an installation of Postgres 8.1.2 (32bit on Solaris 9) with a DB
> size of about 250GB on disk. The DB is subject to fair amount of
> inserts, deletes and updates per day. 
> 
> Running VACUUM VERBOSE tells me that I should allocate around 20M pages
> to FSM (max_fsm_pages)! This looks like a really large amount to me. 
> 
> Has anyone gone ever that high with max_fsm_pages?

No, that's telling me that you have a lot of bloat. A 250G database is
about 31M pages. If you have 20M pages with free space then you've got a
lot of bloat. Ideally, with an autovacuum_vacuum_scale_factor of .25 you
should only need 4M FSM pages. At most you should only need 8M.

> The other question is why such a large number is required in the first
> place. 
> Auto vacuum is enabled. Here are the settings:
> 
> autovacuum = true 
> autovacuum_naptime = 900
Why'd you change that? That's pretty high.

> autovacuum_vacuum_threshold = 2000
> autovacuum_analyze_threshold = 1000

Both of those seem high...

> autovacuum_vacuum_scale_factor = 0.25
That means that 12.5% of your database (on average) will be dead
space... I'd probably cut that back to 0.2.

> autovacuum_analyze_scale_factor = 0.18
This also seems pretty high.

> autovacuum_vacuum_cost_delay = 150

Woah, that's *really* high. That means at most you'll get 6 vacuum
rounds in per second; with default cost settings that means you'd be
able to actually vacuum about 50 dirty pages per second, tops. Of course
not all pages will be dirty, but still...

I normally use between 10 and 20 for cost_delay (lower values for faster
drive arrays).

> autovacuum_vacuum_cost_limit = 120
Why'd you reduce this? I'd put it back to 200...
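
Roughly, I'd start from something like this and tune from there (the
cost_delay in particular is a judgment call that depends on how fast
your drive arrays are):

autovacuum = true
autovacuum_vacuum_scale_factor = 0.2
autovacuum_vacuum_cost_delay = 20     # somewhere in 10-20; lower for faster arrays
autovacuum_vacuum_cost_limit = 200
# and consider moving autovacuum_naptime and the two *_threshold
# settings back toward their defaults
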
-- 
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828




Re: [PERFORM] Storage space usage

2008-02-01 Thread Decibel!
On Fri, Feb 01, 2008 at 02:14:18PM +0530, Gangadharan S.A. wrote:
> Hi,
> I have 2 questions regarding the storage optimization done by Postgres:
> 1) Is a NULL value optimized for storage. If I have a timestamp (or some
> such) field that I set to default NULL, will it still use up the full space
> for the data type.

Null values are indicated via a NULL bitmap. A null field is not stored,
it is just indicated in the bitmap.

> 2) Similarly, if I have a text array, is an empty array optimized for
> storage?

Arrays are stored as varlenas. I'm pretty sure that an empty array is
considered to be NULL; as such the comments above would apply.
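
A quick way to poke at this on your own install, if your version has
pg_column_size() (I'm not asserting the results here, just how to look):

SELECT '{}'::text[] IS NULL             AS empty_array_is_null,
       pg_column_size('{}'::text[])     AS empty_array_bytes,
       pg_column_size('{abc}'::text[])  AS one_element_bytes,
       pg_column_size(now())            AS timestamptz_bytes;
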
-- 
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828




Re: [PERFORM] Performance Implications of Using Exceptions

2008-04-09 Thread Decibel!

On Mar 31, 2008, at 8:23 PM, Ravi Chemudugunta wrote:

 In general I would recommend that you benchmark them using
 as-close-to-real load as possible against as-real-as-possible data.


I am running a benchmark with around 900,000 odd records (real-load on
the live machine :o ) ... should show hopefully some good benchmarking
results for the two methods.



Please do, and please share. I know the docs say that exception  
blocks make things "significantly" more expensive, but I think that  
the community also sometimes loses the forest for the tree. Setting  
up a savepoint (AFAIK that's the actual expense in the exception  
block) is fairly CPU-intensive, but it's not common for a database  
server to be CPU-bound, even for OLTP. You're usually still waiting  
on disk.
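
I don't know exactly what your two methods look like, but for anyone
following along, the comparison is typically between shapes like these
(table and function names made up):

CREATE TABLE t (id int PRIMARY KEY, val text);

-- version with an exception block: each call has to set up a savepoint
CREATE OR REPLACE FUNCTION upsert_with_exception(k int, v text) RETURNS void AS $$
BEGIN
    BEGIN
        INSERT INTO t (id, val) VALUES (k, v);
    EXCEPTION WHEN unique_violation THEN
        UPDATE t SET val = v WHERE id = k;
    END;
END;
$$ LANGUAGE plpgsql;

-- version without one: no savepoint, but no protection from a concurrent insert either
CREATE OR REPLACE FUNCTION upsert_without_exception(k int, v text) RETURNS void AS $$
BEGIN
    UPDATE t SET val = v WHERE id = k;
    IF NOT FOUND THEN
        INSERT INTO t (id, val) VALUES (k, v);
    END IF;
END;
$$ LANGUAGE plpgsql;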

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [PERFORM] Performance with temporary table

2008-04-09 Thread Decibel!

On Apr 8, 2008, at 2:43 PM, Alvaro Herrera wrote:

samantha mahindrakar escribió:

Well instead of creating a temp table everytime i just created a
permanant table and insert the data into it everytime and truncate  
it.

I created indexes on this permanent table too. This did improve the
performance to some extent.

Does using permanant tables also bloat the catalog or hinder the  
performance?


In terms of catalog usage, permanent tables behave exactly the same as
temp tables.


True, but the point is that you're not bloating the catalogs with  
thousands of temp table entries.


I agree with others though: it certainly doesn't sound like there's  
any reason to be using temp tables here at all. This sounds like a  
case of trying to apply procedural programming techniques to a  
database instead of using set theory (which generally doesn't work  
well).

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [PERFORM] Creating large database of MD5 hash values

2008-05-24 Thread Decibel!

On Apr 11, 2008, at 10:25 AM, Alvaro Herrera wrote:

Sorry, yes, I'm behind on email... :(


If MD5 values will be your primary data and you'll be storing millions
of them, it would be wise to create your own datatype and operators  
with

the most compact and efficient representation possible.



If you do this *please* post it. I really think it would be worth  
while for us to have fixed-size data types for common forms of binary  
data; MD5, SHA1 and SHA256 come to mind.
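
In the meantime, a compact stopgap is to store the raw 16 bytes rather
than the 32-character hex text (the table name here is made up):

CREATE TABLE file_hashes (
    md5 bytea PRIMARY KEY CHECK (octet_length(md5) = 16)
);

INSERT INTO file_hashes VALUES (decode(md5('some document'), 'hex'));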

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [PERFORM] shared_buffers performance

2008-05-24 Thread Decibel!

On Apr 14, 2008, at 3:31 PM, Tom Lane wrote:

Gregory Stark <[EMAIL PROTECTED]> writes:
The transition domain where performance drops dramatically as the  
database
starts to not fit in shared buffers but does still fit in  
filesystem cache.


It looks to me like the knee comes where the DB no longer fits in
filesystem cache.  What's interesting is that there seems to be no
synergy at all between shared_buffers and the filesystem cache.
Ideally, very hot pages would stay in shared buffers and drop out  
of the

kernel cache, allowing you to use a database approximating all-of-RAM
before you hit the performance wall.  It's clear that in this example
that's not happening, or at least that only a small part of shared
buffers isn't getting duplicated in filesystem cache.


I suspect that we're getting double-buffering on everything because  
every time we dirty a buffer and write it out the OS is considering  
that as access, and keeping that data in it's cache. It would be  
interesting to try an overcome that and see how it impacts things.  
With our improvement in checkpoint handling, we might be able to just  
write via DIO... if not maybe there's some way to tell the OS to  
buffer the write for us, but target that data for removal from cache  
as soon as it's written.



Of course, that's because pgbench reads a randomly-chosen row of
"accounts" in each transaction, so that there's exactly zero locality
of access.  A more realistic workload would probably have a Zipfian
distribution of account number touches, and might look a little better
on this type of test.


--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [PERFORM] Posible planner improvement?

2008-05-24 Thread Decibel!

Moving to -hackers...

On May 21, 2008, at 9:09 AM, Richard Huxton wrote:

Luke Lonergan wrote:

The problem is that the implied join predicate is not being
propagated.  This is definitely a planner deficiency.


IIRC only equality conditions are propagated and gt, lt, between  
aren't.  I seem to remember that the argument given was that the  
cost of checking for the ability to propagate was too high for the  
frequency when it occurred.


Of course, what was true for code and machines of 5 years ago might  
not be so today.


Definitely...

How hard would it be to propagate all conditions (except maybe  
functions, though perhaps the new function cost estimates make that  
more practical) in cases of equality?


For reference, the original query as posted to -performance:

select * from t1, t2 where t1.id > 158507 and t1.id = t2.id;

That took > 84 minutes (the query was a bit longer but this is the  
part that made the difference) after a little change the query took  
~1 second:


select * from t1, t2 where t1.id > 158507 and t2.id > 158507 and  
t1.id = t2.id;


--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [PERFORM] I/O on select count(*)

2008-05-24 Thread Decibel!

On May 18, 2008, at 1:28 AM, Greg Smith wrote:
I just collected all the good internals information included in  
this thread and popped it onto http://wiki.postgresql.org/wiki/ 
Hint_Bits where I'll continue to hack away at the text until it's  
readable.  Thanks to everyone who answered my questions here,  
that's good progress toward clearing up a very underdocumented area.


I note a couple of potential TODO items not on the official list  
yet that came up during this discussion:


-Smooth latency spikes when switching commit log pages by  
preallocating cleared pages before they are needed


-Improve bulk loading by setting "frozen" hint bits for tuple  
inserts which occur within the same database transaction as the  
creation of the table into which they're being inserted


Did I miss anything?  I think everything brought up falls either  
into one of those two or the existing "Consider having the  
background writer update the transaction status hint bits..." TODO.


-Evaluate impact of improved caching of CLOG per Greenplum:

Per Luke Longergan:
I'll find out if we can extract our code that did the work. It was  
simple but scattered in a few routines. In concept it worked like this:


1 - Ignore if hint bits are unset, use them if set.  This affects  
heapam and vacuum I think.
2 - implement a cache for clog lookups based on the optimistic  
assumption that the data was inserted in bulk.  Put the cache one  
call away from heapgetnext()


I forget the details of (2).  As I recall, if we fall off of the  
assumption, the penalty for long scans get large-ish (maybe 2X), but  
since when do people full table scan when they're updates/inserts are  
so scattered across TIDs?  It's an obvious big win for DW work.


--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [PERFORM] I/O on select count(*)

2008-05-24 Thread Decibel!

On May 18, 2008, at 1:28 AM, Greg Smith wrote:
I just collected all the good internals information included in  
this thread and popped it onto http://wiki.postgresql.org/wiki/Hint_Bits
where I'll continue to hack away at the text until it's
readable.  Thanks to everyone who answered my questions here,  
that's good progress toward clearing up a very underdocumented area.


I note a couple of potential TODO items not on the official list  
yet that came up during this discussion:


-Smooth latency spikes when switching commit log pages by  
preallocating cleared pages before they are needed


-Improve bulk loading by setting "frozen" hint bits for tuple  
inserts which occur within the same database transaction as the  
creation of the table into which they're being inserted


Did I miss anything?  I think everything brought up falls either  
into one of those two or the existing "Consider having the  
background writer update the transaction status hint bits..." TODO.


Blah, sorry for the double-post, but I just remembered a few things...

Did we completely kill the idea of the bg_writer *or some other  
background process* being responsible for setting all hint-bits on  
dirty pages before they're written out?


Also, Simon and Tom had an idea at PGCon: Don't set hint-bits in the  
back-end if the page isn't already dirty. We'd likely need some  
heuristics on this... based on Luke's comments about improved CLOG  
caching maybe we want to set the bits anyway if the tuples without  
them set are from old transactions (idea being that pulling those  
CLOG pages would be pretty expensive). Or better yet; if we have to  
read a CLOG page off disk, set the bits.


This could still potentially be a big disadvantage for data  
warehouses; though perhaps the way to fix that is recommend a  
backgrounded vacuum after data load.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [PERFORM] query plan, index scan cost

2008-08-13 Thread Decibel!

On Jul 18, 2008, at 5:28 AM, Stefan Zweig wrote:

CREATE TABLE nw_tla_2008_4_deu
(
"ID" bigint NOT NULL,
"NET2CLASS" smallint,
"FOW" smallint,
CONSTRAINT nw_tla_2008_4_deu_pkey PRIMARY KEY ("ID")
)
WITHOUT OIDS;


You might want to give up on the double-quotes... you'll have to use  
them everywhere. It'd drive me nuts... :)



EXPLAIN
ANALYZE

SELECT
nw."ID" AS id

FROM
nw_tla_2008_4_deu AS nw

WHERE
expand(st_pointfromtext('POINT(13.7328934 51.049476)',4326),  
0.2476961598054) && nw.the_geom

AND nw."FOW" IN (1,2,3,4,10,17)
AND nw."NET2CLASS" IN (0,1,2,3)



Total runtime: *13.332* ms


running the next query which is only slightly different and has one  
instead of two and conditions leads to the following result


EXPLAIN
ANALYZE

SELECT
nw."ID" AS id

FROM
nw_tla_2008_4_deu AS nw

WHERE
expand(st_pointfromtext('POINT(13.7328934 51.049476)',4326),  
0.2476961598054) && nw.the_geom

AND nw."FOW" IN (1,2,3,4,10,17)



Total runtime: *109*ms


So in both queries there are AND conditions: two in the first query
and one in the second. Unfortunately I am not an expert in reading the
Postgres query plan. Basically I am wondering why a second index scan
is done in the first query whereas it is not done in the second. The
second query runs a hundred times faster than the first one, which is
surprising to me.


The second index scan wasn't done in the second query because you  
don't have the second IN clause. And it's actually the 1st query that  
was faster, because it returned fewer rows (15k instead of 45k).

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-13 Thread Decibel!

On Aug 11, 2008, at 9:01 AM, Jeff wrote:

On Aug 11, 2008, at 5:17 AM, Henrik wrote:

OK, changed the SAS RAID 10 to RAID 5 and now my random writes are  
handing 112 MB/ sek. So it is almsot twice as fast as the RAID10  
with the same disks. Any ideas why?


Is the iozone tests faulty?



does IOzone disable the os caches?
If not you need to use a size of 2xRAM for true results.

regardless - the test only took 10 seconds of wall time - which  
isn't very long at all. You'd probably want to run it longer anyway.



Additionally, you need to be careful of what size writes you're  
using. If you're doing random writes that perfectly align with the  
raid stripe size, you'll see virtually no RAID5 overhead, and you'll  
get the performance of N-1 drives, as opposed to RAID10 giving you N/2.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [PERFORM] Incorrect estimates on correlated filters

2008-08-13 Thread Decibel!

On Aug 12, 2008, at 4:59 PM, Chris Kratz wrote:
Ran into a re-occuring performance problem with some report queries  
again today.  In a nutshell, we have filters on either multiple  
joined tables, or multiple columns on a single table that are  
highly correlated.  So, the estimates come out grossly incorrect  
(the planner has no way to know they are correlated).  2000:1 for  
one I'm looking at right now.  Generally this doesn't matter,  
except in complex reporting queries like these when this is the  
first join of 40 other joins.  Because the estimate is wrong at the  
lowest level, it snowballs up through the rest of the joins causing  
the query to run very, very slowly.   In many of these cases,  
forcing nested loops off for the duration of the query fixes the  
problem.  But I have a couple that still are painfully slow and  
shouldn't be.


I've been reading through the archives with others having similar  
problems (including myself a year ago).  Am I right in assuming  
that at this point there is still little we can do in postgres to  
speed up this kind of query?  Right now the planner has no way to  
know the correlation between different columns in the same table,  
let alone columns in different tables.  So, it just assumes no  
correlation and returns incorrectly low estimates in cases like these.


The only solution I've come up with so far is to materialize  
portions of the larger query into subqueries with these correlated  
filters which are indexed and analyzed before joining into the  
larger query.  This would keep the incorrect estimates from  
snowballing up through the chain of joins.


Are there any other solutions to this problem?



Well... you could try and convince certain members of the community  
that we actually do need some kind of a query hint mechanism... ;)


I did make a suggestion a few months ago that involved sorting a  
table on different columns and recording the correlation of other  
columns. The scheme isn't perfect, but it would help detect cases  
like a field populated by a sequence and another field that's insert  
timestamp; those two fields would correlate highly, and you should  
even be able to correlate the two histograms; that would allow you to  
infer that most of the insert times for _id's between 100 and 200  
will be between 2008-01-01 00:10 and 2008-01-01 00:20, for example.
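
For what it's worth, the materializing trick you describe usually ends
up looking something like this (every name below is invented):

CREATE TEMP TABLE filtered_orders AS
SELECT o.*
  FROM orders o
  JOIN customers c ON c.id = o.customer_id
 WHERE o.region = 'EU' AND c.region = 'EU';   -- the highly correlated filters

CREATE INDEX filtered_orders_id ON filtered_orders (id);
ANALYZE filtered_orders;                      -- give the planner real row counts

SELECT f.id, li.amount
  FROM filtered_orders f
  JOIN line_items li ON li.order_id = f.id;   -- ...plus the rest of the report's joins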

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [PERFORM] Incorrect estimates on correlated filters

2008-08-16 Thread Decibel!

On Aug 13, 2008, at 1:45 PM, Chris Kratz wrote:
Yes, I know hints are frowned upon around here.  Though, I'd love  
to have them or something equivalent on this particular query just  
so the customer can run their important reports.  As it is, it's  
unrunnable.



Actually, now that I think about it the last time this was brought up  
there was discussion about something that doesn't force a particular  
execution method, but instead provides improved information to the  
planner. It might be worth pursuing that, as I think there was less  
opposition to it.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-16 Thread Decibel!

On Aug 13, 2008, at 2:54 PM, Henrik wrote:
Additionally, you need to be careful of what size writes you're  
using. If you're doing random writes that perfectly align with the  
raid stripe size, you'll see virtually no RAID5 overhead, and  
you'll get the performance of N-1 drives, as opposed to RAID10  
giving you N/2.
But it still needs to do 2 reads and 2 writes for every write,  
correct?



If you are completely over-writing an entire stripe, there's no  
reason to read the existing data; you would just calculate the parity  
information from the new data. Any good controller should take that  
approach.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [PERFORM] Experiences storing binary in Postgres

2008-08-16 Thread Decibel!

On Aug 14, 2008, at 1:00 PM, [EMAIL PROTECTED] wrote:

We're developing a project which uses PostgreSQL to store binary
documents. Since our system is likely to grow up to some terabytes  
in two

years, I'd like to ask if some of you have had some experience with
storing a huge amount of blob files in postgres. How does it scale in
performance?


It depends on your access patterns. If this is an OLTP database, you  
need to think really hard about putting that stuff in the database,  
because it will seriously hurt your caching ability. If we had the  
ability to define buffersize limits per-tablespace, you could handle  
it that way, but...


Another consideration is why you want to put this data in a database  
in the first place? It may be convenient, but if that's the only  
reason you could be hurting yourself in the long run.


BTW, after seeing the SkyTools presentation at pgCon this year I  
realized there's a pretty attractive middle-ground between storing  
this data in your production database and storing it in the  
filesystem. Using plproxy and pgBouncer, it wouldn't be hard to store  
the data in an external database. That gives you the ease-of- 
management of a database, but keeps the data away from your  
production data.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [PERFORM] Optimizing a VIEW

2008-08-16 Thread Decibel!

On Aug 15, 2008, at 1:36 PM, Madison Kelly wrote:
The 'cust_id' references the customer that the given data belongs  
to. The reason for this "data bucket" (does this structure have a  
proper name?) is that the data I need to store on a give customer  
is quite variable and outside of my control. As it is, there is  
about 400 different variable/value pairs I need to store per customer.



It's called Entity-Attribute-Value, and its performance is pretty  
much guaranteed to suck for any kind of a large dataset. The problem  
is that you're storing a MASSIVE amount of extra information for  
every single value. Consider:


If each data point was just a field in a table, then even if we left  
cd_value as text, each data point would consume 4 bytes* + 1 byte per  
character (I'm assuming you don't need extra UTF8 chars or anything).  
Of course if you know you're only storing numbers or the like then  
you can make that even more efficient.


* In 8.3, the text field overhead could be as low as 1 byte if the  
field is small enough.


OTOH, your table is going to use 32+24 bytes per row just for the per-row  
overhead, ints and timestamps. Each text field will have 1 or 4 bytes  
in overhead, then you have to store the actual data. Realistically,  
you're looking at 60+ bytes per data point, as opposed to maybe 15,  
or even down to 4 if you know you're storing an int.


Now figure out what that turns into if you have 100 data points per  
minute. It doesn't take very long until you have a huge pile of data  
you're trying to deal with. (As an aside, I once consulted with a  
company that wanted to do this... they wanted to store about 400 data  
points from about 1000 devices on a 5 minute interval. That worked  
out to something like 5GB per day, just for the EAV table. Just  
wasn't going to scale...)


So, back to your situation... there's several things you can do that  
will greatly improve things.


Identify data points that are very common and don't use EAV to store  
them. Instead, store them as regular fields in a table (and don't use  
text if at all possible).


You need to trim down your EAV table. Throw out the added/modified  
info; there's almost certainly no reason to store that *per data  
point*. Get rid of cd_id; there should be a natural PK you can use,  
and you certainly don't want anything else referring to this table  
(which is a big reason to use a surrogate key).


cd_variable and cd_tag need to be ints that point at other tables.  
For that matter, do you really need to tag each *data point*?  
Probably not...


Finally, if you have a defined set of points that you need to report  
on, create a materialized view that has that information.


BTW, it would probably be better to store data either in the main  
table, or the history table, but not both places.
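
To make the size argument concrete, compare the two shapes; the wide
table's column names and types below are invented (yours will differ),
but the difference in per-data-point overhead is the point:

-- EAV: one row, with all its per-row overhead, per data point
CREATE TABLE customer_data (
    cd_id       serial PRIMARY KEY,
    cust_id     int  NOT NULL,
    cd_variable text NOT NULL,
    cd_value    text
);

-- the common, known points as plain columns: one row per customer per sample
CREATE TABLE customer_sample (
    cust_id   int         NOT NULL,
    sample_ts timestamptz NOT NULL,
    cpu_pct   real,
    disk_used bigint,
    PRIMARY KEY (cust_id, sample_ts)
);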

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [PERFORM] Optimizing a VIEW

2008-08-20 Thread Decibel!

On Aug 16, 2008, at 9:19 PM, Gurjeet Singh wrote:

For you very specific case, I recommend you check out contrib/hstore:
http://www.postgresql.org/docs/current/static/hstore.html


Awesome Any comments on the performance of hstore?


I've looked at it but haven't actually used it. One thing I wish it  
did was to keep a catalog somewhere of the "names" that it's seen so  
that it wasn't storing them as in-line text. If you have even  
moderate-length names and are storing small values you quickly end up  
wasting a ton of space.


BTW, now that you can build arrays of composite types, that might be  
an easy way to deal with this stuff. Create a composite type of  
(name_id, value) and store that in an array.
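
Something like this, for example (all names invented):

CREATE TABLE attr_names (
    name_id serial PRIMARY KEY,
    name    text UNIQUE NOT NULL
);

CREATE TYPE attr AS (name_id int, value text);

CREATE TABLE product (
    product_id serial PRIMARY KEY,
    attrs      attr[]    -- e.g. ARRAY[ROW(1, 'red')::attr, ROW(2, 'large')::attr]
);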

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [PERFORM] Optimizing a VIEW

2008-08-20 Thread Decibel!

On Aug 17, 2008, at 10:21 AM, Madison Kelly wrote:
Truth be told, I sort of expected this would be what I had to do. I  
think I asked this more in hoping that there might be some "magic"  
I didn't know about, but I see now that's not the case. :)


As my data points grow to 500,000+, the time it took to return  
these results grew to well over 10 minutes on a decent server and  
the DB size was growing rapidly, as you spoke of.


So I did just as you suggested and took the variable names I knew  
about specifically and created a table for them. These are the ones  
that are being most often updated (hourly per customer) and made  
each column an 'int' or 'real' where possible and ditched the  
tracking of the adding/modifying user and time stamp. I added those  
out of habit, more than anything. This data will always come from a  
system app though, so...


Given that my DB is in development and how very long and intensive  
it would have been to pull out the existing data, I have started  
over and am now gathering new data. In a week or so I should have  
the same amount of data as I had before and I will be able to do a  
closer comparison test.


However, I already suspect the growth of the database will be  
substantially slower and the queries will return substantially faster.



I strongly recommend you also re-think using EAV at all for this. It  
plain and simple does not scale well. I won't go so far as to say it  
can never be used (we're actually working on one right now, but it  
will only be used to occasionally pull up single entities), but you  
have to be really careful with it. I don't see it working very well  
for what it sounds like you're trying to do.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [PERFORM] Optimizing a VIEW

2008-08-22 Thread Decibel!

On Aug 20, 2008, at 1:18 PM, Tom Lane wrote:

If you're worried about storage space, I wouldn't go for arrays of
composite :-(.  The tuple header overhead is horrendous, almost
certainly a lot worse than hstore.



Oh holy cow, I didn't realize we had a big header in there. Is that  
to allow for changing the definition of the composite type?

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [PERFORM] Partial index usage

2009-02-21 Thread decibel

On Feb 16, 2009, at 9:07 AM, Craig Ringer wrote:

CREATE INDEX uidx_product_partno_producer_id
 ON product
 USING btree
 (partno, producer_id);


CREATE INDEX idx_product_partno
 ON product
 USING btree
 (partno);

Can I safely delete the second one?


You can safely delete BOTH in that it won't hurt your data, only
potentially hurt performance.

Deleting the index on (partno) should somewhat improve insert
performance and performance on updates that can't be done via HOT.

However, the index on (partno, producer_id) is requires more  
storage and

memory than the index on just (partno). AFAIK it's considerably slower
to scan.



Actually, that's not necessarily true. If both partno and producer_id  
are ints and you're on a 64bit platform, there won't be any change in  
index size, due to alignment issues.
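
It's easy enough to check on your own data before dropping anything
(assuming a reasonably recent version):

SELECT pg_size_pretty(pg_relation_size('uidx_product_partno_producer_id')) AS two_column_index,
       pg_size_pretty(pg_relation_size('idx_product_partno'))              AS one_column_index;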

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





Re: [PERFORM] Query much slower when run from postgres function

2009-03-14 Thread decibel

On Mar 9, 2009, at 8:36 AM, Mario Splivalo wrote:
Now, as I was explained on pg-jdbc mailinglist, that 'SET  
enable_seqscan TO false' affects all queries on that persistent  
connection from tomcat, and It's not good solution. So I wanted to  
post here to ask what other options do I have.



FWIW, you can avoid that with SET LOCAL (though it would still affect  
the rest of the transaction).


You could also store whatever enable_seqscan was set to in a variable  
before setting it to false and then set it back when you're done.
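
For example (some_table/some_col are just placeholders):

BEGIN;
SET LOCAL enable_seqscan = off;   -- reverts automatically at COMMIT/ROLLBACK
SELECT count(*) FROM some_table WHERE some_col = 42;
COMMIT;

-- or remember the old value and put it back explicitly:
SELECT current_setting('enable_seqscan');    -- save this
SET enable_seqscan = off;
SELECT count(*) FROM some_table WHERE some_col = 42;
SET enable_seqscan = on;                     -- restore whatever was saved above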

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





Re: [PERFORM] Query much slower when run from postgres function

2009-03-14 Thread decibel

On Mar 10, 2009, at 12:20 PM, Tom Lane wrote:

f...@redhat.com (Frank Ch. Eigler) writes:

For a prepared statement, could the planner produce *several* plans,
if it guesses great sensitivity to the parameter values?  Then it
could choose amongst them at run time.


We've discussed that in the past.  "Choose at runtime" is a bit more
easily said than done though --- you can't readily flip between plan
choices part way through, if you've already emitted some result rows.


True, but what if we planned for both high and low cardinality cases,  
assuming that pg_stats indicated both were a possibility? We would  
have to store multiple plans for one prepared statement, which  
wouldn't work well for more complex queries (if you did high and low  
cardinality estimates for each table you'd end up with 2^r plans,  
where r is the number of relations), so we'd need a way to cap it  
somehow. Of course, whether that's easier than having the ability to  
throw out a current result set and start over with a different plan  
is up for debate...


On a related note, I wish there was a way to tell plpgsql not to pre- 
plan a query. Sure, you can use EXECUTE, but building the query plan  
is a serious pain in the rear.
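
For anyone following along, the EXECUTE workaround looks roughly like
this (names made up); the statement text is planned on each call
instead of being planned once with generic parameters:

CREATE OR REPLACE FUNCTION count_matching(p_val int) RETURNS bigint AS $$
DECLARE
    result bigint;
BEGIN
    EXECUTE 'SELECT count(*) FROM some_table WHERE some_col = ' || p_val
       INTO result;
    RETURN result;
END;
$$ LANGUAGE plpgsql;

(If the interpolated value were text rather than an int, you'd want
quote_literal() around it.)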

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





Re: [PERFORM] Query performance over a large proportion of data

2009-03-14 Thread decibel

On Mar 10, 2009, at 4:12 PM, Steve McLellan wrote:
The server itself is a dual-core 3.7GHz Xeon Dell (each core  
reporting 2
logical CPUs) running an amd64 build of FreeBSD 6.2, and postgres  
8.3.5 built

from source.



Uh, you're running an amd64 build on top of an Intel CPU? I didn't  
think FBSD would allow that, but if it does it wouldn't surprise me  
if kernel/OS performance stunk. If Postgres then used the same  
settings it would make matters even worse (IIRC there is some code  
that's different in an AMD vs Intel build).

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-14 Thread decibel

On Mar 11, 2009, at 10:48 PM, Jignesh K. Shah wrote:
Fair enough..  Well I am now appealing  to all  who has a fairly  
decent sized hardware want to try it out  and see whether there are  
"gains", "no-changes" or "regressions" based on your workload. Also  
it will help if you report number of cpus when you respond back to  
help collect feedback.



Do you have a self-contained test case? I have several boxes with 16- 
cores worth of Xeon with 96GB I could try it on (though you might not  
care about having "only" 16 cores :P)

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-14 Thread decibel

On Mar 12, 2009, at 2:22 PM, Jignesh K. Shah wrote:
Something that might be useful for him to report is the avg number  
of active backends for each data point ...
short of doing select * from pg_stat_activity and removing the IDLE  
entries, any other clean way to get that information.



Uh, isn't there a DTrace probe that would provide that info? It  
certainly seems like something you'd want to know...

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-14 Thread decibel

On Mar 13, 2009, at 8:05 AM, Gregory Stark wrote:

"Jignesh K. Shah"  writes:


Scott Carey wrote:

On 3/12/09 11:37 AM, "Jignesh K. Shah"  wrote:

In general, I suggest that it is useful to run tests with a few  
different

types of pacing. Zero delay pacing will not have realistic number of
connections, but will expose bottlenecks that are universal, and  
less

controversial


I think I have done that before so I can do that again by running  
the users at
0 think time which will represent a "Connection pool" which is  
highly utilized"
and test how big the connection pool can be before the throughput  
tanks.. This
can be useful for App Servers which sets up connections pools of  
their own

talking with PostgreSQL.


Keep in mind when you do this that it's not interesting to test a  
number of
connections much larger than the number of processors you have.  
Once the
system reaches 100% cpu usage it would be a misconfigured  
connection pooler

that kept more than that number of connections open.



How certain are you of that? I believe that assertion would only be  
true if a backend could never block on *anything*, which simply isn't  
the case. Of course in most systems you'll usually be blocking on IO,  
but even in a ramdisk scenario there's other things you can end up  
blocking on. That means having more threads than cores isn't  
unreasonable.


If you want to see this in action in an easy to repeat test, try  
compiling a complex system (such as FreeBSD) with different levels of  
-j handed to make (of course you'll need to wait until everything is  
in cache, and I'm assuming you have enough memory so that everything  
would fit in cache).

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-14 Thread decibel

On Mar 13, 2009, at 3:02 PM, Jignesh K. Shah wrote:

vmstat seems similar to wakeup some
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 sd   in   sy   cs us sy id
63 0 0 45535728 38689856 0 14 0 0 0  0  0  0  0  0  0 163318 334225 360179 47 17 36
85 0 0 45436736 38690760 0 6 0 0  0  0  0  0  0  0  0 165536 347462 365987 47 17 36
59 0 0 45405184 38681752 0 11 0 0 0  0  0  0  0  0  0 155153 326182 345527 47 16 37
53 0 0 45393816 38673344 0 6 0 0  0  0  0  0  0  0  0 152752 317851 340737 47 16 37
66 0 0 45378312 38651920 0 11 0 0 0  0  0  0  0  0  0 150979 304350 336915 47 16 38
67 0 0 45489520 38639664 0 5 0 0  0  0  0  0  0  0  0 157188 318958 351905 47 16 37
82 0 0 45483600 38633344 0 10 0 0 0  0  0  0  0  0  0 168797 348619 375827 47 17 36
68 0 0 45463008 38614432 0 9 0 0  0  0  0  0  0  0  0 173020 376594 385370 47 18 35
54 0 0 45451376 38603792 0 13 0 0 0  0  0  0  0  0  0 161891 342522 364286 48 17 35
41 0 0 45356544 38605976 0 5 0 0  0  0  0  0  0  0  0 167250 358320 372469 47 17 36
27 0 0 45323472 38596952 0 11 0 0 0  0  0  0  0  0  0 165099 344695 364256 48 17 35



The good news is there's now at least enough runnable procs. What I
find *extremely* odd is that the CPU usage is almost dead constant...

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





Re: [PERFORM] [PERFORMANCE] work_mem vs temp files issue

2009-09-13 Thread decibel

On Aug 19, 2009, at 7:45 PM, Jaime Casanova wrote:

AFAIUI, work_mem is used for some operations (sort, hash, etc) for
avoiding the use of temp files on disk...

In a client server i'm monitoring (pg 8.3.7, 32GB of ram) work_mem is
set to 8MB, however i'm seeing a lot of temp files (>3 in 4 hours)
with small sizes (ie: 2021520 obviously lower than 8MB). so, why?
maybe we use work_mem until we find isn't enough and we send just the
difference to a temp file?

i'm not thinking in raising work_mem until i understand this well,
what's the point if we still create temp files that could fit in
work_mem...



Are you using temp tables? Those end up in pgsql_tmp as well.
--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





Re: [PERFORM] UUID as primary key

2009-10-16 Thread decibel

On Oct 10, 2009, at 10:40 AM, Mark Mielke wrote:

On 10/10/2009 01:14 AM, tsuraan wrote:

The most significant impact is that it takes up twice as much space,
including the primary key index. This means fewer entries per block,
which means slower scans and/or more blocks to navigate through.  
Still,
compared to the rest of the overhead of an index row or a table  
row, it
is low - I think it's more important to understand whether you  
can get
away with using a sequential integer, in which case UUID is  
unnecessary
overhead - or whether you are going to need UUID anyways. If you  
need

UUID anyways - having two primary keys is probably not worth it.


Ok, that's what I was hoping.  Out of curiosity, is there a preferred
way to store 256-bit ints in postgres?  At that point, is a bytea the
most reasonable choice, or is there a better way to do it?



Do you need to be able to do queries on it? Numeric should be able  
to store 256-bit integers.


If you don't need to do queries on it, an option I've considered in  
the past is to break it up into 4 x int64. Before UUID was  
supported, I had seriously considered storing UUID as 2 x int64.  
Now that UUID is supported, you might also abuse UUID where 1 x 256- 
bit = 2 x UUID.


If you want it to be seamless and fully optimal, you would  
introduce a new int256 type (or whatever the name of the type you  
are trying to represent). Adding new types to PostgreSQL is not  
that hard. This would allow queries (=, <>, <, >) as well.



If you want an example of that, we had Command Prompt create a full  
set of hash datatypes (SHA*, and I think md5). That stuff should be  
on pgFoundry; if it's not drop me a note at jna...@cashnetusa.com and  
I'll get it added.
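
If you go the bytea route in the meantime, a minimal sketch (made-up
names):

CREATE TABLE digests (
    digest bytea PRIMARY KEY CHECK (octet_length(digest) = 32)   -- exactly 256 bits
);

Equality and btree ordering already work for bytea, so the primary key
index behaves normally; a dedicated fixed-length type would mostly just
shave off the varlena overhead.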

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828







Re: [PERFORM] Domain vs table

2009-10-25 Thread decibel

On Oct 20, 2009, at 6:55 AM, Merlin Moncure wrote:
On Sun, Oct 11, 2009 at 11:31 AM, Michal Szymanski  
 wrote:

I think I've found answer to my question
http://www.commandprompt.com/blogs/joshua_drake/2009/01/ 
fk_check_enum_or_domain_that_is_the_question/




I mostly agree with the comments on the blog but let me throw a couple
more points out there:

*) It is possible (although not necessarily advised) to manipulate
enums via direct manipulation of pg_enum
*) enums are the best solution if you need natural ordering properties
for indexing purposes
*) domains can't be used in arrays
*) foreign key is obviously preferred if you need store more related
properties than the value itself
*) if the constraint is complicated (not just a list of values), maybe
domain/check constraint is preferred, possibly hooked to immutable
function



Also, if the base table will have a very large number of rows  
(probably at least 10M), the overhead of a text datatype over a  
smallint or int/oid gets to be very large.
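
To make the trade-offs concrete, here's a rough sketch of the three
approaches (all of the names below are invented for illustration):

-- enum: ordered by declaration order, indexable, 4 bytes per row
CREATE TYPE order_status AS ENUM ('new', 'paid', 'shipped');

-- domain over text with a check constraint: flexible, but the full
-- string is stored in every row of the big table
CREATE DOMAIN order_status_d AS text
    CHECK (VALUE IN ('new', 'paid', 'shipped'));

-- lookup table + foreign key: keeps the 10M+ row table narrow and lets
-- you hang extra attributes off each status later
CREATE TABLE order_status_lu (
    status_id   smallint PRIMARY KEY,
    status_name text NOT NULL UNIQUE
);
CREATE TABLE orders (
    order_id  bigint PRIMARY KEY,
    status_id smallint NOT NULL REFERENCES order_status_lu
);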

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





[PERFORM] Oddity with view

2008-11-10 Thread Jim 'Decibel!' Nasby

Why is this view 9x slower than the base table?

[EMAIL PROTECTED] explain analyze select count(*) from  
loan_tasks_committed;
 
QUERY PLAN
 
--
 Aggregate  (cost=994625.69..994625.70 rows=1 width=0) (actual  
time=7432.306..7432.306 rows=1 loops=1)
   ->  Seq Scan on loan_tasks_committed  (cost=0.00..929345.35  
rows=26112135 width=0) (actual time=0.012..5116.776 rows=26115689  
loops=1)

 Total runtime: 7432.360 ms
(3 rows)

Time: 7432.858 ms

loan_tasks effectively does SELECT * FROM loan_tasks_committed UNION  
ALL SELECT * FROM loan_tasks_pending;. There are some lookup tables for
_pending, but as this explain shows, there's no actual data there
right now.


[EMAIL PROTECTED] explain analyze select count(*) from  
loan_tasks;

QUERY PLAN
 
 
-
 Aggregate  (cost=1516929.75..1516929.76 rows=1 width=0) (actual  
time=60396.081..60396.082 rows=1 loops=1)
   ->  Append  (cost=0.00..1190523.94 rows=26112465 width=240)  
(actual time=0.023..57902.470 rows=26115689 loops=1)
 ->  Subquery Scan "*SELECT* 1"  (cost=0.00..1190466.70  
rows=26112135 width=162) (actual time=0.023..54776.335 rows=26115689  
loops=1)
   ->  Seq Scan on loan_tasks_committed   
(cost=0.00..929345.35 rows=26112135 width=162) (actual  
time=0.014..22531.902 rows=26115689 loops=1)
 ->  Subquery Scan "*SELECT* 2"  (cost=36.10..57.24 rows=330  
width=240) (actual time=0.003..0.003 rows=0 loops=1)
   ->  Hash Join  (cost=36.10..53.94 rows=330 width=240)  
(actual time=0.002..0.002 rows=0 loops=1)

 Hash Cond: (ltp.loan_task_code_id = ltc.id)
 ->  Seq Scan on loan_tasks_pending ltp   
(cost=0.00..13.30 rows=330 width=208) (actual time=0.001..0.001  
rows=0 loops=1)
 ->  Hash  (cost=21.60..21.60 rows=1160  
width=36) (never executed)
   ->  Seq Scan on loan_task_codes ltc   
(cost=0.00..21.60 rows=1160 width=36) (never executed)

 Total runtime: 60396.174 ms
(11 rows)

Time: 60397.046 ms

 SELECT true AS "committed", loan_tasks_committed.id, ...,  
loan_tasks_committed.task_amount

   FROM loan_tasks_committed
UNION ALL
 SELECT false AS "committed", ltp.id, ..., NULL::"unknown" AS  
task_amount

   FROM loan_tasks_pending ltp
   JOIN loan_task_codes ltc ON ltp.loan_task_code_id = ltc.id;

The stuff I omitted is just some fields and a few other NULLs. This  
is 8.2.9.

--
Decibel! [EMAIL PROTECTED] (512) 569-9461






Re: [PERFORM] Oddity with view

2008-11-10 Thread Jim 'Decibel!' Nasby

On Nov 10, 2008, at 7:06 AM, Tom Lane wrote:

"Jim 'Decibel!' Nasby" <[EMAIL PROTECTED]> writes:

loan_tasks effectively does SELECT * FROM loan_tasks_committed UNION
ALL SELECT * FROM loan_tasks_pending;.


You seem to have neglected to mention a join or two.



Yeah, though I did show them at the end of the message...

 SELECT true AS "committed", loan_tasks_committed.id, ...,  
loan_tasks_committed.task_amount

   FROM loan_tasks_committed
UNION ALL
 SELECT false AS "committed", ltp.id, ..., NULL::"unknown" AS  
task_amount

   FROM loan_tasks_pending ltp
   JOIN loan_task_codes ltc ON ltp.loan_task_code_id = ltc.id;

Thing is, there's no data to be had on that side. All of the time is  
going into the seqscan of loan_tasks_committed. But here's what's  
really disturbing...


 Aggregate  (cost=994625.69..994625.70 rows=1 width=0) (actual  
time=7432.306..7432.306 rows=1 loops=1)
   ->  Seq Scan on loan_tasks_committed  (cost=0.00..929345.35  
rows=26112135 width=0) (actual time=0.012..5116.776 rows=26115689  
loops=1)


vs

 Aggregate  (cost=1516929.75..1516929.76 rows=1 width=0) (actual  
time=60396.081..60396.082 rows=1 loops=1)
   ->  Append  (cost=0.00..1190523.94 rows=26112465 width=240)  
(actual time=0.023..57902.470 rows=26115689 loops=1)
 ->  Subquery Scan "*SELECT* 1"  (cost=0.00..1190466.70  
rows=26112135 width=162) (actual time=0.023..54776.335 rows=26115689  
loops=1)
   ->  Seq Scan on loan_tasks_committed   
(cost=0.00..929345.35 rows=26112135 width=162) (actual  
time=0.014..22531.902 rows=26115689 loops=1)
 ->  Subquery Scan "*SELECT* 2"  (cost=36.10..57.24 rows=330  
width=240) (actual time=0.003..0.003 rows=0 loops=1)


How on earth did the seqscan suddenly take 4x longer? And why is the  
subquery scan then doubling the amount of time again?

--
Decibel! [EMAIL PROTECTED] (512) 569-9461






Re: [PERFORM] Oddity with view (now with test case)

2008-11-10 Thread Jim 'Decibel!' Nasby

On Nov 10, 2008, at 12:21 PM, Richard Huxton wrote:

Jim 'Decibel!' Nasby wrote:

On Nov 10, 2008, at 7:06 AM, Tom Lane wrote:

"Jim 'Decibel!' Nasby" <[EMAIL PROTECTED]> writes:
loan_tasks effectively does SELECT * FROM loan_tasks_committed  
UNION

ALL SELECT * FROM loan_tasks_pending;.


You seem to have neglected to mention a join or two.



Yeah, though I did show them at the end of the message...

 SELECT true AS "committed", loan_tasks_committed.id, ...,
loan_tasks_committed.task_amount
   FROM loan_tasks_committed
UNION ALL
 SELECT false AS "committed", ltp.id, ..., NULL::"unknown" AS  
task_amount

   FROM loan_tasks_pending ltp
   JOIN loan_task_codes ltc ON ltp.loan_task_code_id = ltc.id;

Thing is, there's no data to be had on that side. All of the time is
going into the seqscan of loan_tasks_committed. But here's what's  
really

disturbing...



   ->  Seq Scan on loan_tasks_committed  (cost=0.00..929345.35
rows=26112135 width=0) (actual time=0.012..5116.776 rows=26115689  
loops=1)



   ->  Seq Scan on loan_tasks_committed
(cost=0.00..929345.35 rows=26112135 width=162) (actual
time=0.014..22531.902 rows=26115689 loops=1)


It's the width - the view is fetching all the rows. Is the "true as
committed" bit confusing it?


Turns out, no. I was just writing up a stand-alone test case and  
forgot to include that, but there's still a big difference (note what  
I'm pasting is now from HEAD as of a bit ago, but I see the effect on  
8.2 as well):


[EMAIL PROTECTED] explain analyze select count(*) from a;
 QUERY PLAN
 
-
 Aggregate  (cost=137164.57..137164.58 rows=1 width=0) (actual  
time=4320.986..4320.986 rows=1 loops=1)
   ->  Seq Scan on a  (cost=0.00..120542.65 rows=6648765 width=0)  
(actual time=0.188..2707.433 rows=999 loops=1)

 Total runtime: 4321.039 ms
(3 rows)

Time: 4344.158 ms
[EMAIL PROTECTED] explain analyze select count(*) from v;
 
QUERY PLAN
 
--
 Aggregate  (cost=270286.52..270286.53 rows=1 width=0) (actual  
time=14766.630..14766.630 rows=1 loops=1)
   ->  Append  (cost=0.00..187150.20 rows=6650905 width=36) (actual  
time=0.039..12810.073 rows=999 loops=1)
 ->  Subquery Scan "*SELECT* 1"  (cost=0.00..187030.30  
rows=6648765 width=36) (actual time=0.039..10581.367 rows=999  
loops=1)
   ->  Seq Scan on a  (cost=0.00..120542.65 rows=6648765  
width=36) (actual time=0.038..5731.748 rows=999 loops=1)
 ->  Subquery Scan "*SELECT* 2"  (cost=37.67..119.90  
rows=2140 width=40) (actual time=0.002..0.002 rows=0 loops=1)
   ->  Hash Join  (cost=37.67..98.50 rows=2140 width=40)  
(actual time=0.002..0.002 rows=0 loops=1)

 Hash Cond: (b.c_id = c.c_id)
 ->  Seq Scan on b  (cost=0.00..31.40 rows=2140  
width=8) (actual time=0.000..0.000 rows=0 loops=1)
 ->  Hash  (cost=22.30..22.30 rows=1230  
width=36) (never executed)
   ->  Seq Scan on c  (cost=0.00..22.30  
rows=1230 width=36) (never executed)

 Total runtime: 14766.784 ms
(11 rows)

Time: 14767.550 ms

In 8.2, it took 20 seconds to go through the view:
 
QUERY PLAN
 
--
 Aggregate  (cost=303960.98..303960.99 rows=1 width=0) (actual  
time=20268.877..20268.877 rows=1 loops=1)
   ->  Append  (cost=0.00..211578.98 rows=7390560 width=40) (actual  
time=0.038..17112.190 rows=999 loops=1)
 ->  Subquery Scan "*SELECT* 1"  (cost=0.00..211467.40  
rows=7388620 width=36) (actual time=0.038..13973.782 rows=999  
loops=1)
   ->  Seq Scan on a  (cost=0.00..137581.20 rows=7388620  
width=36) (actual time=0.037..8280.204 rows=999 loops=1)
 ->  Subquery Scan "*SELECT* 2"  (cost=36.10..111.58  
rows=1940 width=40) (actual time=0.003..0.003 rows=0 loops=1)
   ->  Hash Join  (cost=36.10..92.18 rows=1940 width=40)  
(actual time=0.002..0.002 rows=0 loops=1)

 Hash Cond: (b.c_id = c.c_id)
 ->  Seq Scan on b  (cost=0.00..29.40 rows=1940  
width=8) (actual time=0.000..0.000 rows=0 loops=1)
 ->  Hash  (cost=21.60..21.60 rows=1160  
width=36) (never executed)
   ->  Seq Scan on c  (cost=0.00..21.60  
rows=1160 wi

Re: [PERFORM] Oddity with view (now with test case)

2008-11-10 Thread Jim 'Decibel!' Nasby

On Nov 10, 2008, at 1:31 PM, Tom Lane wrote:

"Jim 'Decibel!' Nasby" <[EMAIL PROTECTED]> writes:

Here's the commands to generate the test case:



create table a(a int, b text default 'test text');
create table c(c_id serial primary key, c_text text);
insert into c(c_text) values('a'),('b'),('c');
create table b(a int, c_id int references c(c_id));
create view v as select a, b, null as c_id, null as c_text from a
union all select a, null, b.c_id, c_text from b join c on (b.c_id=
c.c_id);
\timing
insert into a(a) select generate_series(1,999);
select count(*) from a;
select count(*) from v;
explain analyze select count(*) from a;
explain analyze select count(*) from v;


I think what you're looking at is projection overhead and per-plan- 
node

overhead (the EXPLAIN ANALYZE in itself contributes quite a lot of the
latter).


True... under HEAD explain took 13 seconds while a plain count took  
10. Still not very good considering the count from the raw table took  
about 4 seconds (with or without explain).
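
(Just to separate out the instrumentation cost, a quick sketch using
the test view v defined above: compare the timing of the bare count
against EXPLAIN ANALYZE of the same statement.)

\timing
SELECT count(*) FROM v;                   -- plain execution
EXPLAIN ANALYZE SELECT count(*) FROM v;   -- same plan, plus per-node timing calls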



  One thing you could do is be more careful about making the
union input types match up so that no subquery scan nodes are  
required:


create view v2 as select a, b, null::int as c_id, null::text as  
c_text from a
union all select a, null::text, b.c_id, c_text from b join c on  
(b.c_id=c.c_id);


On my machine this runs about twice as fast as the original view.


Am I missing some magic? I'm still getting the subquery scan.

[EMAIL PROTECTED] explain select count(*) from v2;
  QUERY PLAN
 
--

 Aggregate  (cost=279184.19..279184.20 rows=1 width=0)
   ->  Append  (cost=0.00..254178.40 rows=10002315 width=0)
 ->  Subquery Scan "*SELECT* 1"  (cost=0.00..254058.50  
rows=1175 width=0)
   ->  Seq Scan on a  (cost=0.00..154056.75  
rows=1175 width=14)
 ->  Subquery Scan "*SELECT* 2"  (cost=37.67..119.90  
rows=2140 width=0)

   ->  Hash Join  (cost=37.67..98.50 rows=2140 width=40)
 Hash Cond: (b.c_id = c.c_id)
 ->  Seq Scan on b  (cost=0.00..31.40 rows=2140  
width=8)

 ->  Hash  (cost=22.30..22.30 rows=1230 width=36)
   ->  Seq Scan on c  (cost=0.00..22.30  
rows=1230 width=36)

(10 rows)

Time: 0.735 ms
[EMAIL PROTECTED] \d v2
   View "public.v2"
 Column |  Type   | Modifiers
+-+---
 a  | integer |
 b  | text|
 c_id   | integer |
 c_text | text|
View definition:
 SELECT a.a, a.b, NULL::integer AS c_id, NULL::text AS c_text
   FROM a
UNION ALL
     SELECT b.a, NULL::text AS b, b.c_id, c.c_text
   FROM b
  JOIN c ON b.c_id = c.c_id;

That's on HEAD, btw.
--
Decibel! [EMAIL PROTECTED] (512) 569-9461






Re: [PERFORM] Oddity with view (now with test case)

2008-11-11 Thread Jim 'Decibel!' Nasby

On Nov 10, 2008, at 9:20 PM, Tom Lane wrote:

"Jim 'Decibel!' Nasby" <[EMAIL PROTECTED]> writes:

On Nov 10, 2008, at 1:31 PM, Tom Lane wrote:

On my machine this runs about twice as fast as the original view.



Am I missing some magic? I'm still getting the subquery scan.


Hmm, I'm getting a core dump :-( ... this seems to be busted in HEAD.
8.3 gets it right though.


Doesn't seem to for me... :/

[EMAIL PROTECTED] select version();
  
version
 
-
 PostgreSQL 8.3.5 on i386-apple-darwin8.11.1, compiled by GCC i686- 
apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build 5370)

(1 row)

Time: 0.250 ms
[EMAIL PROTECTED] explain select count(*) from v2;
  QUERY PLAN
 
--

 Aggregate  (cost=279184.19..279184.20 rows=1 width=0)
   ->  Append  (cost=0.00..254178.40 rows=10002315 width=0)
 ->  Subquery Scan "*SELECT* 1"  (cost=0.00..254058.50  
rows=1175 width=0)
   ->  Seq Scan on a  (cost=0.00..154056.75  
rows=1175 width=14)
 ->  Subquery Scan "*SELECT* 2"  (cost=37.67..119.90  
rows=2140 width=0)

   ->  Hash Join  (cost=37.67..98.50 rows=2140 width=40)
 Hash Cond: (b.c_id = c.c_id)
 ->  Seq Scan on b  (cost=0.00..31.40 rows=2140  
width=8)

 ->  Hash  (cost=22.30..22.30 rows=1230 width=36)
   ->  Seq Scan on c  (cost=0.00..22.30  
rows=1230 width=36)

(10 rows)

Time: 0.923 ms
[EMAIL PROTECTED] \d v2
   View "public.v2"
 Column |  Type   | Modifiers
+-+---
 a  | integer |
 b  | text|
 c_id   | integer |
 c_text | text|
View definition:
 SELECT a.a, a.b, NULL::integer AS c_id, NULL::text AS c_text
   FROM a
UNION ALL
 SELECT b.a, NULL::text AS b, b.c_id, c.c_text
   FROM b
   JOIN c ON b.c_id = c.c_id;

[EMAIL PROTECTED]
--
Decibel! [EMAIL PROTECTED] (512) 569-9461






Re: [PERFORM] Oddity with view (now with test case)

2008-11-11 Thread Jim 'Decibel!' Nasby

On Nov 11, 2008, at 1:15 PM, Tom Lane wrote:

"Jim 'Decibel!' Nasby" <[EMAIL PROTECTED]> writes:

On Nov 10, 2008, at 9:20 PM, Tom Lane wrote:

8.3 gets it right though.



Doesn't seem to for me... :/


Oh, I was looking at "select * from v2" not "select count(*) from v2".
HEAD is a bit smarter about the latter than 8.3 is.


So here's something odd... in both 8.3 and HEAD from a while ago it  
gives a better plan for SELECT * than for SELECT count(*):


[EMAIL PROTECTED] explain analyze select * from v2;
 QUERY PLAN
 
-
 Result  (cost=0.00..254178.40 rows=10002315 width=72) (actual  
time=0.049..8452.152 rows=999 loops=1)
   ->  Append  (cost=0.00..254178.40 rows=10002315 width=72) (actual  
time=0.048..5887.025 rows=999 loops=1)
 ->  Seq Scan on a  (cost=0.00..154056.75 rows=1175  
width=14) (actual time=0.048..4207.482 rows=999 loops=1)
 ->  Hash Join  (cost=37.67..98.50 rows=2140 width=40)  
(actual time=0.002..0.002 rows=0 loops=1)

   Hash Cond: (b.c_id = c.c_id)
   ->  Seq Scan on b  (cost=0.00..31.40 rows=2140  
width=8) (actual time=0.000..0.000 rows=0 loops=1)
   ->  Hash  (cost=22.30..22.30 rows=1230 width=36)  
(never executed)
 ->  Seq Scan on c  (cost=0.00..22.30 rows=1230  
width=36) (never executed)

 Total runtime: 9494.162 ms
(9 rows)

[EMAIL PROTECTED] explain analyze select count(*) from v2;
   QUERY  
PLAN
 
-
 Aggregate  (cost=279184.19..279184.20 rows=1 width=0) (actual  
time=13155.524..13155.524 rows=1 loops=1)
   ->  Append  (cost=0.00..254178.40 rows=10002315 width=0) (actual  
time=0.045..11042.562 rows=999 loops=1)
 ->  Subquery Scan "*SELECT* 1"  (cost=0.00..254058.50  
rows=1175 width=0) (actual time=0.045..8976.352 rows=999  
loops=1)
   ->  Seq Scan on a  (cost=0.00..154056.75  
rows=1175 width=14) (actual time=0.045..5936.930 rows=999  
loops=1)
 ->  Subquery Scan "*SELECT* 2"  (cost=37.67..119.90  
rows=2140 width=0) (actual time=0.002..0.002 rows=0 loops=1)
   ->  Hash Join  (cost=37.67..98.50 rows=2140 width=40)  
(actual time=0.002..0.002 rows=0 loops=1)

 Hash Cond: (b.c_id = c.c_id)
 ->  Seq Scan on b  (cost=0.00..31.40 rows=2140  
width=8) (actual time=0.001..0.001 rows=0 loops=1)
 ->  Hash  (cost=22.30..22.30 rows=1230  
width=36) (never executed)
   ->  Seq Scan on c  (cost=0.00..22.30  
rows=1230 width=36) (never executed)

 Total runtime: 13155.642 ms
(11 rows)

[EMAIL PROTECTED] explain analyze select count(*) from (select  
* from v2 offset 0) a;
   QUERY  
PLAN
 
-
 Aggregate  (cost=379207.34..379207.35 rows=1 width=0) (actual  
time=12592.273..12592.274 rows=1 loops=1)
   ->  Limit  (cost=0.00..254178.40 rows=10002315 width=72) (actual  
time=0.173..11057.717 rows=999 loops=1)
 ->  Result  (cost=0.00..254178.40 rows=10002315 width=72)  
(actual time=0.172..9213.524 rows=999 loops=1)
   ->  Append  (cost=0.00..254178.40 rows=10002315  
width=72) (actual time=0.172..6608.656 rows=999 loops=1)
 ->  Seq Scan on a  (cost=0.00..154056.75  
rows=1175 width=14) (actual time=0.171..4793.116 rows=999  
loops=1)
 ->  Hash Join  (cost=37.67..98.50 rows=2140  
width=40) (actual time=0.002..0.002 rows=0 loops=1)

   Hash Cond: (b.c_id = c.c_id)
   ->  Seq Scan on b  (cost=0.00..31.40  
rows=2140 width=8) (actual time=0.001..0.001 rows=0 loops=1)
   ->  Hash  (cost=22.30..22.30 rows=1230  
width=36) (never executed)
 ->  Seq Scan on c   
(cost=0.00..22.30 rows=1230 width=36) (never executed)

 Total runtime: 12592.442 ms
(11 rows)

And yes, explain overhead is huge...

[EMAIL PROTECTED] \timing
Timing is on.
[EMAIL PROTECTED] select count(*) from v2;
  count
-
 999
(1 row)

Time: 6217.624 ms
[EMAIL PROTECTED]

--
Decibel! [EMAIL PROTECTED] (512) 569-9461



