Re: [PERFORM] Test...

2003-09-29 Thread Tom Lane
David Griffiths <[EMAIL PROTECTED]> writes:
> Is there a size limit to an email

IIRC, the standard policy on the pgsql lists is that messages over 40K
or so will be delayed for moderator approval.  However, you should have
gotten immediate replies from the majordomo 'bot telling you so.  If you
got nothing, there's a configuration problem with the pg-perform mail
list or your subscription or something.  Talk to Marc (scrappy at
hub.org) about identifying and fixing the issue.

regards, tom lane



[PERFORM] Test...

2003-09-29 Thread David Griffiths



I've posted several emails, and have yet to see one show up (this one might not either).

Is there a size limit to an email (it had a big analyze, and schema information)??
David


Re: [PERFORM] advice on raid controller

2003-09-29 Thread Bruce Momjian
Palle Girgensohn wrote:
> Come to think of it, I guess a battery-backed cache will make fsync as fast 
> as no fsync, right? So, the q was kinda stoopid... :-/

With fsync off, the data might never get to the battery-backed RAM.  :-(

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [PERFORM] advice on raid controller

2003-09-29 Thread Matt Clark
Not in general.  Besides, with a write-back cache an fsync() is very nearly
'free', as the controller will report the write as completed as soon as it's
written to cache.

I keep meaning to benchmark the difference, but I only have the facility on
a production box, so caution gets the better of me every time :-)

AFAIK the fsync calls are used to guarantee the _ordering_ of writes to
permanent storage (i.e. fsync() is called before doing something, rather
than after doing something, so PG can be sure that before it does B, A has
definitely been written to disk).

But I could well be wrong.  And there could well be strategies exploitable
with the knowledge that a write-back cache exists that aren't currently
implemented - though intuitively I doubt it.

M




> -Original Message-
> From: Palle Girgensohn [mailto:[EMAIL PROTECTED]
> Sent: 29 September 2003 22:32
> To: Matt Clark; [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: Re: [PERFORM] advice on raid controller
>
>
> Stupid question, perhaps, but would a battery-backed cache make
> it safe to
> set fsync=false in postgresql.conf?
>
> /Palle
>
> --On Sunday, September 28, 2003 13.07.57 +0100 Matt Clark
> <[EMAIL PROTECTED]>
> wrote:
>
> > As others have mentioned, you really ought to get
> battery-backed cache if
> > you're doing any volume of writes.  The ability to do safe write-back
> > caching makes an *insane* difference to write performance.
>
>
>
>
>




Re: [PERFORM] advice on raid controller

2003-09-29 Thread Palle Girgensohn
Come to think of it, I guess a battery-backed cache will make fsync as fast 
as no fsync, right? So, the q was kinda stoopid... :-/

/Palle

--On Monday, September 29, 2003 23.31.54 +0200 Palle Girgensohn 
<[EMAIL PROTECTED]> wrote:

Stupid question, perhaps, but would a battery-backed cache make it safe
to set fsync=false in postgresql.conf?
/Palle

--On Sunday, September 28, 2003 13.07.57 +0100 Matt Clark
<[EMAIL PROTECTED]> wrote:
As others have mentioned, you really ought to get battery-backed cache if
you're doing any volume of writes.  The ability to do safe write-back
caching makes an *insane* difference to write performance.










Re: [PERFORM] advice on raid controller

2003-09-29 Thread Palle Girgensohn
Stupid question, perhaps, but would a battery-backed cache make it safe to 
set fsync=false in postgresql.conf?

/Palle

--On Sunday, September 28, 2003 13.07.57 +0100 Matt Clark <[EMAIL PROTECTED]> 
wrote:

As others have mentioned, you really ought to get battery-backed cache if
you're doing any volume of writes.  The ability to do safe write-back
caching makes an *insane* difference to write performance.






Re: [PERFORM] avoiding seqscan?

2003-09-29 Thread Palle Girgensohn


--On Monday, September 29, 2003 11.12.55 -0400 Christopher Browne 
<[EMAIL PROTECTED]> wrote:

> [EMAIL PROTECTED] (Palle Girgensohn) writes:
>> Will that make a difference? From what I've seen, it does not make
>> much difference, but I have seen queries speed up when rewritten with
>> explicit joins. I guess it depends on other things, but is it really
>> so that the explicit joins are bad somehow? Do you have any pointers
>> to documentation about it, if so?
>
> The problem is that if you expressly specify the joins, the query
> optimizer can't choose its own paths.  And while that may not be
> better at the moment, it is quite possible that when you upgrade to a
> newer version, those queries, if "not join-specified," could
> immediately get faster.

You've got a point here. Still, with some queries, since the data is pretty 
static and we know much about its distribution over the tables, we had to 
explicitly tell postgresql how to optimize the queries to get them fast 
enough. We cannot afford any queries to take more than fractions of a 
second, really.

> I would expect that the query that uses implicit joins will be clearer
> to read, which adds a little further merit to that direction.

Depends; I actually don't agree on this, but I guess it depends on which 
syntax you're used to.

> That goes along with the usual way that it is preferable to optimize
> things, namely that you should start by solving the problem as simply
> as you can, and only proceed to further optimization if that actually
> proves necessary.  Optimization efforts commonly add complexity and
> make code more difficult to maintain; that's not the place to start if
> you don't even know the effort is necessary.

Oh, but of course. For the queries I refer to, optimization actually proved 
necessary, believe me :-)

Cheers,
Palle


Re: [PERFORM] advice on raid controller

2003-09-29 Thread scott.marlowe
On 29 Sep 2003, Will LaShell wrote:

> On Mon, 2003-09-29 at 06:48, scott.marlowe wrote:
> > I've used the megaraid / LSI cards in the past and they were pretty good 
> > in terms of reliability, but the last one I used was the 328 model, from 4 
> > years ago or so.  that one had a battery backup option for the cache, and 
> > could go to 128 Meg.  We tested it with 4/16 and 128 meg ram, and it was 
> > about the same with each, but we didn't do heavy parallel testing either.
> > 
> > Here's the page on the megaraid cards at lsilogic.com:
> > 
> > http://www.lsilogic.com/products/stor_prod/raid/ultra320products.html
> > 
> > On Sun, 28 Sep 2003, Matt Clark wrote:
> > 
> > > As others have mentioned, you really ought to get battery-backed cache if
> > > you're doing any volume of writes.  The ability to do safe write-back
> > > caching makes an *insane* difference to write performance.
> > > 
> > > The site you link to also has that for only 15% more money:
> > > http://uk.azzurri.com/product/product.cgi?productId=80
> > > 
> > > No experience with the card(s) I'm afraid.
> > > 
> > > In general though, U320 will only be faster than U160 for large sequential
> > > reads, or when you have silly numbers of disks on a channel (i.e. more than
> > > 4/channel).  If you have silly numbers of disks, then RAID5 will probably be
> > > better, if you have 4 disks total then RAID1+0 will probably be better.  In
> > > between it depends on all sorts of other factors.  Bear in mind though that
> > > if you *do* have silly numbers of disks then more channels and more cache
> > > will count for more than anything else, so spend the money on that rather
> > > than latest-and-greatest performance for a single channel.
> 
> Just to add my thoughts,  we use the MegaRaid Elite 1650 in 3 servers
> here that drive our core databases.  We paired up the controllers with
> the Seagate Cheetah 10k drives,  we could have purchased the X15's which
> are Seagate's 15k version, but due to budget constraints and lack of
> true performance increase from the 10k to the 15k rpm drives we didn't
> opt for them.
> 
> I have to say that I've been 100% pleased with the performance and
> reliability of the Megaraid controllers. They do everything a good
> controller should and they are very easy to manage. The driver is
> actively maintained by the guys at LSI and their tech support personnel
> are very good as well.
> 
> If you want any specific information or have any questions about our
> experience or configuration please feel free to contact me.

To add one more feature the LSI/MegaRAIDs have that I find interesting, 
you can put two in a machine, build a RAID0 or 5 on each card, then mirror 
the two cards together, and should one card / RAID0 or 5 chain die, the 
other card will keep working.  I.e., they work like one big card with 
failover.




Re: [PERFORM] TPC-R benchmarks

2003-09-29 Thread Oleg Lebedev
Yes Josh,
L_partkey is a part of the foreign key on the Lineitem table, and it was
ok to create an index on it according to the TPC-R specs. I just created
indices on the rest of the FK columns in the TPC-R database and will
continue my evaluations.
Thanks.

Oleg 

-Original Message-
From: Josh Berkus [mailto:[EMAIL PROTECTED] 
Sent: Monday, September 29, 2003 12:11 PM
To: Oleg Lebedev; Mary Edie Meredith
Cc: Jenny Zhang; pgsql-performance
Subject: Re: [PERFORM] TPC-R benchmarks


Oleg,

> I just checked the restrictions on the TPC-R and TPC-H schemas and it 
> seems that all indexes are allowed in TPC-R and only those that index 
> parts of primary or foreign keys are allowed in TPC-H.

That would be appropriate for this case though, yes?  That column is part of
a foreign key, unless I've totally lost track.

As I remarked before, Postgres does *not* automatically create indexes for
FKs.  Many, but not all, other database products do, so comparing PostgreSQL
against those products without the index is unfair.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] TPC-R benchmarks

2003-09-29 Thread Josh Berkus
Oleg,

> I just checked the restrictions on the TPC-R and TPC-H schemas and it
> seems that all indexes are allowed in TPC-R and only those that index
> parts of primary or foreign keys are allowed in TPC-H.

That would be appropriate for this case though, yes?  That column is part of 
a foreign key, unless I've totally lost track.

As I remarked before, Postgres does *not* automatically create indexes for 
FKs.   Many, but not all, other database products do, so comparing PostgreSQL 
against those products without the index is unfair.
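
To make that concrete, a minimal sketch (table definitions simplified from
the TPC schema used in this thread):

    -- PostgreSQL indexes the PK automatically, but NOT the FK column:
    CREATE TABLE part (
        p_partkey integer PRIMARY KEY
    );
    CREATE TABLE lineitem (
        l_partkey integer REFERENCES part (p_partkey)
    );
    -- The index on the referencing column must be added by hand:
    CREATE INDEX i_l_partkey ON lineitem (l_partkey);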

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] TPC-R benchmarks

2003-09-29 Thread Andrew Sullivan
On Mon, Sep 29, 2003 at 05:43:26PM -, [EMAIL PROTECTED] wrote:
> 
> Anyone have a rough idea of the costs involved?

I did a back-of-an-envelope calculation one day and stopped when I
got to $10,000.

A

-- 

Andrew Sullivan 204-4141 Yonge Street
Afilias CanadaToronto, Ontario Canada
<[EMAIL PROTECTED]>  M2P 2A8
 +1 416 646 3304 x110




Re: [PERFORM] advice on raid controller

2003-09-29 Thread Will LaShell
On Mon, 2003-09-29 at 06:48, scott.marlowe wrote:
> I've used the megaraid / LSI cards in the past and they were pretty good 
> in terms of reliability, but the last one I used was the 328 model, from 4 
> years ago or so.  that one had a battery backup option for the cache, and 
> could go to 128 Meg.  We tested it with 4/16 and 128 meg ram, and it was 
> about the same with each, but we didn't do heavy parallel testing either.
> 
> Here's the page on the megaraid cards at lsilogic.com:
> 
> http://www.lsilogic.com/products/stor_prod/raid/ultra320products.html
> 
> On Sun, 28 Sep 2003, Matt Clark wrote:
> 
> > As others have mentioned, you really ought to get battery-backed cache if
> > you're doing any volume of writes.  The ability to do safe write-back
> > caching makes an *insane* difference to write performance.
> > 
> > The site you link to also has that for only 15% more money:
> > http://uk.azzurri.com/product/product.cgi?productId=80
> > 
> > No experience with the card(s) I'm afraid.
> > 
> > In general though, U320 will only be faster than U160 for large sequential
> > reads, or when you have silly numbers of disks on a channel (i.e. more than
> > 4/channel).  If you have silly numbers of disks, then RAID5 will probably be
> > better, if you have 4 disks total then RAID1+0 will probably be better.  In
> > between it depends on all sorts of other factors.  Bear in mind though that
> > if you *do* have silly numbers of disks then more channels and more cache
> > will count for more than anything else, so spend the money on that rather
> > than latest-and-greatest performance for a single channel.

Just to add my thoughts: we use the MegaRaid Elite 1650 in 3 servers
here that drive our core databases.  We paired up the controllers with
the Seagate Cheetah 10k drives.  We could have purchased the X15's, which
are Seagate's 15k version, but due to budget constraints and the lack of
a true performance increase from the 10k to the 15k rpm drives we didn't
opt for them.

I have to say that I've been 100% pleased with the performance and
reliability of the Megaraid controllers. They do everything a good
controller should and they are very easy to manage. The driver is
actively maintained by the guys at LSI and their tech support personnel
are very good as well.

If you want any specific information or have any questions about our
experience or configuration please feel free to contact me.

Sincerely,

Will LaShell



> > HTH
> > 
> > Matt
> > 
> > > -Original Message-
> > > From: [EMAIL PROTECTED]
> > > [mailto:[EMAIL PROTECTED] Behalf Of Richard
> > > Jones
> > > Sent: 27 September 2003 18:25
> > > To: [EMAIL PROTECTED]
> > > Subject: [PERFORM] advice on raid controller
> > >
> > >
> > > Hi, i'm on the verge of buying a "MegaRAID SCSI 320-2" raid controller.
> > > I need it to build a db server using 4x ultra320 scsi disks
> > > i'm thinking raid 1+0 but will try with raid5 too and compare
> > >
> > > Does anyone have any experience with this model, good or bad i'd like to
> > > know.. thanks :)
> > >
> > > as seen:
> > > http://uk.azzurri.com/product/product.cgi?productId=188
> > >
> > > Regards,
> > > Richard.
> > >
> > > PS: whoever mentioned starting a site with raid controller
> > > reviews, excellent
> > > idea - its hard to find decent info on which card to buy.





Re: [PERFORM] Performance: BigInt vs Decimal(19,0)

2003-09-29 Thread Josh Berkus
Franco,

> Wouldn't it be the most portable solution to work with a domain?
> CREATE DOMAIN BIG_NUMBER AS BIGINT;
>
> If I use BIG_NUMBER everywhere I need it in my database, porting it to
> other database products should be easy... any SQL 92 compliant dbms
> should support domains.

This is a good idea, on general principles.  Abstracted design is a good 
thing.  

Regrettably, though, a lot of commercial databases do not support DOMAIN.  
You'll need to check which databases you are thinking of porting to first.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] TPC-R benchmarks

2003-09-29 Thread greg



> I'm pretty certain that there are no TPC-certified test results for
> Postgres, because to date no organization has cared to spend the money
> needed to perform a certifiable test.

Anyone have a rough idea of the costs involved?


--
Greg Sabino Mullane [EMAIL PROTECTED]
PGP Key: 0x14964AC8 200309291344





Re: [PERFORM] TPC-R benchmarks

2003-09-29 Thread Oleg Lebedev

Oops, my previous message got cut off.
Here is the end of it:
I just checked the restrictions on the TPC-R and TPC-H schemas and it
seems that all indexes are allowed in TPC-R and only those that index
parts of primary or foreign keys are allowed in TPC-H.
Thanks.

Oleg 

-Original Message-
From: Oleg Lebedev 
Sent: Monday, September 29, 2003 11:23 AM
To: Mary Edie Meredith
Cc: Jenny Zhang; pgsql-performance
Subject: Re: [PERFORM] TPC-R benchmarks
Importance: Low



It took 10 hours to compute the query without the index on
lineitem.l_partkey. Once I created the index on lineitem.l_partkey, it
took only 32 secs to run the same query.  
After VACUUM ANALYZE it took 72 secs to run the query.
All the subsequent runs took under 3 seconds!

That's quite amazing!
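
(For reference, the index and the follow-up stats run amount to something
like this; the index name is illustrative:)

    CREATE INDEX i_l_partkey ON lineitem (l_partkey);
    VACUUM ANALYZE lineitem;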

I just checked  

-Original Message-
From: Mary Edie Meredith [mailto:[EMAIL PROTECTED] 
Sent: Monday, September 29, 2003 10:04 AM
To: Oleg Lebedev
Cc: Tom Lane; Jenny Zhang; pgsql-performance
Subject: RE: [PERFORM] TPC-R benchmarks


On Mon, 2003-09-29 at 07:35, Oleg Lebedev wrote:
> I left my TPC-R query #17 working over the weekend and it took 3988
> mins ~ 10 hours to complete. And this is considering that I am using a
> TPC-R database created with a scale factor of 1, which corresponds to
> ~1 GB of data. I am running RedHat 8.0 on a dual 1 GHz processor, 512 
> MB RAM.

Was this run with or without the l_partkey index that Jenny suggested? 

> 
> Here is an excerpt from my postgresql.conf file (the rest of the
> settings are commented out):
> 
> #
> # Shared Memory Size
> #
> shared_buffers = 16384        # 2*max_connections, min 16, typically 8KB each
> 
> #
> # Non-shared Memory Sizes
> #
> sort_mem = 32768
> 
> #
> # Optimizer Parameters
> #
> effective_cache_size = 32000  # typically 8KB each
> 
> Any suggestions on how to optimize these settings?
> 
> I agree with Jenny that declaring additional indexes on the TPC-R
> tables may alter the validity of the benchmarks. Are there any 
> official TPC benchmarks submitted by PostgreSQL?

Actually, for the TPC-R you _are_ allowed to declare additional indexes. 
With TPC-H you are restricted to a specific set listed in the spec (an
index on l_partkey is allowed for both).

What you cannot do for either TPC-R or TPC-H is rewrite the SQL of the
query for the purposes of making the query run faster.

Sorry if I was unclear.

Valid TPC-R benchmark results are on the TPC web site:
http://www.tpc.org/tpcr/default.asp  

I do not see one for PostgreSQL.


Regards,

Mary 

-- 
Mary Edie Meredith <[EMAIL PROTECTED]>
Open Source Development Lab

> 
> Thanks.
> 
> Oleg
> 
> -Original Message-
> From: Mary Edie Meredith [mailto:[EMAIL PROTECTED]
> Sent: Friday, September 26, 2003 10:12 AM
> To: Tom Lane
> Cc: Oleg Lebedev; Jenny Zhang; pgsql-performance
> Subject: Re: [PERFORM] TPC-R benchmarks
> 
> 
> The TPC-H/R rules allow only minor changes to the SQL that are
> necessary due to SQL implementation differences. They do not allow 
> changes made to improve performance.  It is their way to test 
> optimizer's ability to recognize an inefficient SQL statement and do 
> the rewrite.
> 
> The rule makes sense for the TPC-H, which is supposed to represent
> ad-Hoc query.  One might argue that for TPC-R, which is suppose to 
> represent "Reporting" with pre-knowledge of the query, that re-write 
> should be allowed. However, that is currently not the case. Since the 
> RDBMS's represented on the TPC council are competing with TPC-H, their
> optimizers already do the re-write, so (IMHO) there is no motivation
> to relax the rules for the TPC-R.
> 
> 
> On Thu, 2003-09-25 at 21:28, Tom Lane wrote:
> > Oleg Lebedev <[EMAIL PROTECTED]> writes:
> > > Seems like in your case postgres uses an i_l_partkey index on 
> > > lineitem table. I have a foreign key constraint defined between the 
> > > lineitem and part table, but didn't create an special indexes. Here 
> > > is my query plan:
> > 
> > The planner is obviously unhappy with this plan (note the large cost
> > numbers), but it can't find a way to do better.  An index on 
> > lineitem.l_partkey would help, I think.
> > 
> > The whole query seems like it's written in a very inefficient fashion;
> > couldn't the estimation of '0.2 * avg(l_quantity)' be amortized across
> > multiple join rows?  But I dunno whether the TPC rules allow for 
> > significant manual rewriting of the given query.
> > 
> > regards, tom lane
> > 


Re: [PERFORM] TPC-R benchmarks

2003-09-29 Thread Tom Lane
Shridhar Daithankar <[EMAIL PROTECTED]> writes:
> Also if you have fast disk drives, you can reduce random page cost to 2 or 1.5.

Note however that most of the people who have found smaller
random_page_cost to be helpful are in situations where most of their
data fits in RAM.  Reducing the cost towards 1 simply reflects the fact
that there's no sequential-fetch advantage when grabbing data that's
already in RAM.

When benchmarking with data sets considerably larger than available
buffer cache, I rather doubt that small random_page_cost would be a good
idea.  Still, you might as well experiment to see.
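
(Per-session experimentation is easy; the probe query below is just a
made-up placeholder:)

    SET random_page_cost = 1.5;
    EXPLAIN ANALYZE SELECT * FROM lineitem WHERE l_partkey = 12345;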

regards, tom lane



Re: [PERFORM] TPC-R benchmarks

2003-09-29 Thread Oleg Lebedev

It took 10 hours to compute the query without the index on
lineitem.l_partkey.
Once I created the index on lineitem.l_partkey, it took only 32 secs to
run the same query.  
After VACUUM ANALYZE it took 72 secs to run the query.
All the subsequent runs took under 3 seconds!

That's quite amazing!

I just checked  

-Original Message-
From: Mary Edie Meredith [mailto:[EMAIL PROTECTED] 
Sent: Monday, September 29, 2003 10:04 AM
To: Oleg Lebedev
Cc: Tom Lane; Jenny Zhang; pgsql-performance
Subject: RE: [PERFORM] TPC-R benchmarks


On Mon, 2003-09-29 at 07:35, Oleg Lebedev wrote:
> I left my TPC-R query #17 working over the weekend and it took 3988 
> mins ~ 10 hours to complete. And this is considering that I am using a
> TPC-R database created with a scale factor of 1, which corresponds to 
> ~1 GB of data. I am running RedHat 8.0 on a dual 1 GHz processor, 512 
> MB RAM.

Was this run with or without the l_partkey index that Jenny suggested? 

> 
> Here is an excerpt from my postgresql.conf file (the rest of the 
> settings are commented out):
> 
> #
> # Shared Memory Size
> #
> shared_buffers = 16384        # 2*max_connections, min 16, typically 8KB each
> 
> #
> # Non-shared Memory Sizes
> #
> sort_mem = 32768
> 
> #
> # Optimizer Parameters
> #
> effective_cache_size = 32000  # typically 8KB each
> 
> Any suggestions on how to optimize these settings?
> 
> I agree with Jenny that declaring additional indexes on the TPC-R 
> tables may alter the validity of the benchmarks. Are there any 
> official TPC benchmarks submitted by PostgreSQL?

Actually, for the TPC-R you _are_ allowed to declare additional indexes. 
With TPC-H you are restricted to a specific set listed in the spec (an
index on l_partkey is allowed for both).

What you cannot do for either TPC-R or TPC-H is rewrite the SQL of the
query for the purposes of making the query run faster.

Sorry if I was unclear.

Valid TPC-R benchmark results are on the TPC web site:
http://www.tpc.org/tpcr/default.asp  

I do not see one for PostgreSQL.


Regards,

Mary 

-- 
Mary Edie Meredith <[EMAIL PROTECTED]>
Open Source Development Lab

> 
> Thanks.
> 
> Oleg
> 
> -Original Message-
> From: Mary Edie Meredith [mailto:[EMAIL PROTECTED]
> Sent: Friday, September 26, 2003 10:12 AM
> To: Tom Lane
> Cc: Oleg Lebedev; Jenny Zhang; pgsql-performance
> Subject: Re: [PERFORM] TPC-R benchmarks
> 
> 
> The TPC-H/R rules allow only minor changes to the SQL that are 
> necessary due to SQL implementation differences. They do not allow 
> changes made to improve performance.  It is their way to test 
> optimizer's ability to recognize an inefficient SQL statement and do 
> the rewrite.
> 
> The rule makes sense for the TPC-H, which is supposed to represent 
> ad-Hoc query.  One might argue that for TPC-R, which is suppose to 
> represent "Reporting" with pre-knowledge of the query, that re-write 
> should be allowed. However, that is currently not the case. Since the 
> RDBMS's represented on the TPC council are competing with TPC-H, their
> optimizers already do the re-write, so (IMHO) there is no motivation 
> to relax the rules for the TPC-R.
> 
> 
> On Thu, 2003-09-25 at 21:28, Tom Lane wrote:
> > Oleg Lebedev <[EMAIL PROTECTED]> writes:
> > > Seems like in your case postgres uses an i_l_partkey index on
> > > lineitem table. I have a foreign key constraint defined between the 
> > > lineitem and part table, but didn't create an special indexes. Here 
> > > is my query plan:
> > 
> > The planner is obviously unhappy with this plan (note the large cost
> > numbers), but it can't find a way to do better.  An index on 
> > 
> > The whole query seems like it's written in a very inefficient fashion;
> > couldn't the estimation of '0.2 * avg(l_quantity)' be amortized across
> > multiple join rows?  But I dunno whether the TPC rules allow for
> > significant manual rewriting of the given query.
> > 
> > regards, tom lane
> > 




Re: [PERFORM] Performance: BigInt vs Decimal(19,0)

2003-09-29 Thread Andrew Rawnsley
On Saturday, September 27, 2003, at 10:39 PM, Yusuf W. wrote:

Now, I've got to convince my project's software architect that a bigint
would be better than a decimal.

Does anyone know where I could get some documentation on how int and
decimal are implemented, so I could prove to him that ints are better?
Can people suggest good points to make in order to prove it?
Print out Tom's reply and give it to him. Saying 'one of the people who 
develops the thing says so' ought to carry some weight. I would hope...
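
(If it helps the argument, here is a minimal sketch of the comparison; the
table names are made up:)

    -- bigint is a fixed 8-byte integer compared with native CPU arithmetic;
    -- numeric/decimal is variable-length, compared digit group by digit group.
    CREATE TABLE orders_big (id bigint PRIMARY KEY);
    CREATE TABLE orders_dec (id numeric(19,0) PRIMARY KEY);
    -- Inserts, index lookups, joins, and sorts on orders_big.id should be
    -- noticeably cheaper than on orders_dec.id, with identical semantics.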


Thanks in advance.

--- Tom Lane <[EMAIL PROTECTED]> wrote:
"Yusuf W." <[EMAIL PROTECTED]> writes:
> For the application that I'm working on, we want to use data types that
> are database independent. (most databases has decimal, but not big int).

Most databases have bigint, I think.

> Anyhow, we are planning on using decimal(19,0) for our primary keys
> instead of a big int, would there be a performance difference in using a
> bigint over using decimals?

You'll be taking a very large performance hit, for very little benefit
that I can see.  How hard could it be to change the column declarations
if you ever move to a database without bigint?  There's not normally
much need for apps to be explicitly aware of the column type names.

			regards, tom lane




Andrew Rawnsley
President
The Ravensfield Digital Resource Group, Ltd.
(740) 587-0114
www.ravensfield.com


Re: [PERFORM] avoiding seqscan?

2003-09-29 Thread Gaetano Mendola
Palle Girgensohn wrote:
Will that make a difference? From what I've seen, it does not make much 
difference, but I have seen queries speed up when rewritten with explicit 
joins. I guess it depends on other things, but is it really so that the 
explicit joins are bad somehow? Do you have any pointers to documentation 
about it, if so?

Thanks,
Palle


They are not absolutely bad, but sometimes the path you choose is not
optimal; in PostgreSQL 7.4 things will be better.
Regards
Gaetano Mendola


Re: [PERFORM] avoiding seqscan?

2003-09-29 Thread Gaetano Mendola
Palle Girgensohn wrote:
uu=# explain analyze
uu-#  select lower(substr(p.last_name,1,1)) as letter, count(*)
uu-#  FROM course c join group_data gd on (c.active_group_id = gd.this_group_id)
uu-#   join person p on (gd.item_text = p.userid)
uu-#   join dyn_field_person dfp on (dfp.extern_item_id = 10 and dfp.giver=c.giver)
uu-#   join dyn_field_content_person dfcp on (dfp.id = dfcp.dyn_field_id and dfcp.userid=p.userid)
uu-#   left outer join participant pt on (pt.userid = p.userid and pt.course_id = 707)
uu-#  WHERE c.id = 707
uu-#  group by 1
uu-# ;
Why are you using this form of join?  When it is not necessary, use the
implicit form.

Regards
Gaetano Mendola




Re: [PERFORM] avoiding seqscan?

2003-09-29 Thread Christopher Browne
[EMAIL PROTECTED] (Palle Girgensohn) writes:
> Will that make a difference? From what I've seen, it does not make
> much difference, but I have seen queries speed up when rewritten
> explicit joins. I guess it depends on other things, but is it really
> so that the explicit joins are bad somehow? Do you have any pointers
> to documentation about it, if so?

The problem is that if you expressly specify the joins, the query
optimizer can't choose its own paths.  And while that may not be
better at the moment, it is quite possible that when you upgrade to a
newer version, those queries, if "not join-specified," could
immediately get faster.

I would expect that the query that uses implicit joins will be clearer
to read, which adds a little further merit to that direction.

That goes along with the usual way that it is preferable to optimize
things, namely that you should start by solving the problem as simply
as you can, and only proceed to further optimization if that actually
proves necessary.  Optimization efforts commonly add complexity and
make code more difficult to maintain; that's not the place to start if
you don't even know the effort is necessary.
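
(To make the two styles concrete, a generic sketch with made-up tables:)

    -- Implicit join: the planner is free to pick the join order.
    SELECT * FROM a, b, c
    WHERE a.id = b.a_id AND b.id = c.b_id;

    -- Explicit join: in 7.3 and earlier the planner joins in exactly
    -- the order written.
    SELECT * FROM a
    JOIN b ON a.id = b.a_id
    JOIN c ON b.id = c.b_id;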
-- 
(format nil "[EMAIL PROTECTED]" "cbbrowne" "libertyrms.info")

Christopher Browne
(416) 646 3304 x124 (land)



Re: [PERFORM] TPC-R benchmarks

2003-09-29 Thread Tom Lane
Mary Edie Meredith <[EMAIL PROTECTED]> writes:
> Valid TPC-R benchmark results are on the TPC web site:
> http://www.tpc.org/tpcr/default.asp  
> I do not see one for PostgreSQL.

I'm pretty certain that there are no TPC-certified test results for
Postgres, because to date no organization has cared to spend the money
needed to perform a certifiable test.  From what I understand you need
a pretty significant commitment of people and hardware to jump through
all the hoops involved...

regards, tom lane



Re: [PERFORM] TPC-R benchmarks

2003-09-29 Thread Mary Edie Meredith
On Mon, 2003-09-29 at 07:35, Oleg Lebedev wrote:
> I left my TPC-R query #17 working over the weekend and it took 3988 mins
> ~ 10 hours to complete. And this is considering that I am using a TPC-R
> database created with a scale factor of 1, which corresponds to ~1 GB of
> data. I am running RedHat 8.0 on a dual 1 GHz processor, 512 MB RAM.

Was this run with or without the l_partkey index that Jenny suggested? 

> 
> Here is an excerpt from my postgresql.conf file (the rest of the
> settings are commented out):
> 
> #
> # Shared Memory Size
> #
> shared_buffers = 16384        # 2*max_connections, min 16, typically 8KB each
> 
> #
> # Non-shared Memory Sizes
> #
> sort_mem = 32768
> 
> #
> # Optimizer Parameters
> #
> effective_cache_size = 32000  # typically 8KB each
> 
> Any suggestions on how to optimize these settings?
> 
> I agree with Jenny that declaring additional indexes on the TPC-R tables
> may alter the validity of the benchmarks. Are there any official TPC
> benchmarks submitted by PostgreSQL? 

Actually, for the TPC-R you _are_ allowed to declare additional indexes. 
With TPC-H you are restricted to a specific set listed in the spec (an
index on l_partkey is allowed for both).

What you cannot do for either TPC-R or TPC-H is rewrite the SQL of the
query for the purposes of making the query run faster.

Sorry if I was unclear.

Valid TPC-R benchmark results are on the TPC web site:
http://www.tpc.org/tpcr/default.asp  

I do not see one for PostgreSQL.


Regards,

Mary 

-- 
Mary Edie Meredith <[EMAIL PROTECTED]>
Open Source Development Lab

> 
> Thanks.
> 
> Oleg
> 
> -Original Message-
> From: Mary Edie Meredith [mailto:[EMAIL PROTECTED] 
> Sent: Friday, September 26, 2003 10:12 AM
> To: Tom Lane
> Cc: Oleg Lebedev; Jenny Zhang; pgsql-performance
> Subject: Re: [PERFORM] TPC-R benchmarks
> 
> 
> The TPC-H/R rules allow only minor changes to the SQL that are necessary
> due to SQL implementation differences. They do not allow changes made to
> improve performance.  It is their way to test optimizer's ability to
> recognize an inefficient SQL statement and do the rewrite.
> 
> The rule makes sense for the TPC-H, which is supposed to represent
> ad-Hoc query.  One might argue that for TPC-R, which is suppose to
> represent "Reporting" with pre-knowledge of the query, that re-write
> should be allowed. However, that is currently not the case. Since the
> RDBMS's represented on the TPC council are competing with TPC-H, their
> optimizers already do the re-write, so (IMHO) there is no motivation to
> relax the rules for the TPC-R.
> 
> 
> On Thu, 2003-09-25 at 21:28, Tom Lane wrote:
> > Oleg Lebedev <[EMAIL PROTECTED]> writes:
> > > Seems like in your case postgres uses an i_l_partkey index on 
> > > lineitem table. I have a foreign key constraint defined between the 
> > > lineitem and part table, but didn't create an special indexes. Here 
> > > is my query plan:
> > 
> > The planner is obviously unhappy with this plan (note the large cost 
> > numbers), but it can't find a way to do better.  An index on 
> > lineitem.l_partkey would help, I think.
> > 
> > The whole query seems like it's written in a very inefficient fashion;
> > couldn't the estimation of '0.2 * avg(l_quantity)' be amortized across
> > multiple join rows?  But I dunno whether the TPC rules allow for 
> > significant manual rewriting of the given query.
> > 
> > regards, tom lane
> > 




Re: [PERFORM] Plan-Reading

2003-09-29 Thread greg




> Where can I find a plan-reading tutorial?

This covers explain plans in depth:

http://www.gtsm.com/oscon2003/explain.html
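
(The one-line version: prefix the query with EXPLAIN ANALYZE and read the
plan from the innermost node outward; the query here is just a placeholder:)

    EXPLAIN ANALYZE SELECT * FROM some_table WHERE some_column = 42;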

--
Greg Sabino Mullane [EMAIL PROTECTED]
PGP Key: 0x14964AC8 200309291123






Re: [PERFORM] Performance: BigInt vs Decimal(19,0)

2003-09-29 Thread Franco Bruno Borghesi




Wouldn't it be the most portable solution to work with a domain?
CREATE DOMAIN BIG_NUMBER AS BIGINT;

If I use BIG_NUMBER everywhere I need it in my database, porting it to other database products should be easy... any SQL 92 compliant dbms should support domains.
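
(For instance, a quick sketch of the domain in use; the table is made up:)

    CREATE DOMAIN BIG_NUMBER AS BIGINT;

    CREATE TABLE account (
        id      BIG_NUMBER PRIMARY KEY,
        balance BIG_NUMBER NOT NULL
    );

    -- Porting to a dbms without bigint then means changing one line:
    -- CREATE DOMAIN BIG_NUMBER AS DECIMAL(19,0);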

On Sun, 2003-09-28 at 00:06, Josh Berkus wrote:

Yusuf,

> Does anyone know where I could get some documentation
> on how the int and decimal are implemented so I could
> prove to him that ints are better?  Can people suggest
> good points to make in order to prove it?

RTFM:
http://www.postgresql.org/docs/7.3/interactive/datatype.html#DATATYPE-NUMERIC







Re: [PERFORM] TPC-R benchmarks

2003-09-29 Thread Shridhar Daithankar
Oleg Lebedev wrote:
effective_cache_size = 32000	# typically 8KB each
That is 256MB. You can raise it to 350+MB if nothing else is running on the box. 
Also if you have fast disk drives, you can reduce random page cost to 2 or 1.5.

I don't know how much this will make any difference to benchmark results but 
usually this helps when queries are slow.
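
(Concretely, the setting is counted in 8KB pages, so ~350 MB works out to
roughly this:)

    -- 350 MB / 8 KB per page = 44800 pages
    SET effective_cache_size = 44800;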

 HTH

 Shridhar



Re: [PERFORM] TPC-R benchmarks

2003-09-29 Thread Oleg Lebedev
I left my TPC-R query #17 working over the weekend and it took 3988 mins
~ 10 hours to complete. And this is considering that I am using a TPC-R
database created with a scale factor of 1, which corresponds to ~1 GB of
data. I am running RedHat 8.0 on a dual 1 GHz processor, 512 MB RAM.

Here is an excerpt from my postgresql.conf file (the rest of the
settings are commented out):

#
#   Shared Memory Size
#
shared_buffers = 16384  # 2*max_connections, min 16, typically 8KB each

#
#   Non-shared Memory Sizes
#
sort_mem = 32768

#
#   Optimizer Parameters
#
effective_cache_size = 32000    # typically 8KB each

Any suggestions on how to optimize these settings?

I agree with Jenny that declaring additional indexes on the TPC-R tables
may alter the validity of the benchmarks. Are there any official TPC
benchmarks submitted by PostgreSQL? 

Thanks.

Oleg

-Original Message-
From: Mary Edie Meredith [mailto:[EMAIL PROTECTED] 
Sent: Friday, September 26, 2003 10:12 AM
To: Tom Lane
Cc: Oleg Lebedev; Jenny Zhang; pgsql-performance
Subject: Re: [PERFORM] TPC-R benchmarks


The TPC-H/R rules allow only minor changes to the SQL that are necessary
due to SQL implementation differences. They do not allow changes made to
improve performance.  It is their way to test the optimizer's ability to
recognize an inefficient SQL statement and do the rewrite.

The rule makes sense for the TPC-H, which is supposed to represent
ad-Hoc query.  One might argue that for TPC-R, which is supposed to
represent "Reporting" with pre-knowledge of the query, that re-write
should be allowed. However, that is currently not the case. Since the
RDBMS's represented on the TPC council are competing with TPC-H, their
optimizers already do the re-write, so (IMHO) there is no motivation to
relax the rules for the TPC-R.


On Thu, 2003-09-25 at 21:28, Tom Lane wrote:
> Oleg Lebedev <[EMAIL PROTECTED]> writes:
> > Seems like in your case postgres uses an i_l_partkey index on 
> > lineitem table. I have a foreign key constraint defined between the 
> > lineitem and part table, but didn't create an special indexes. Here 
> > is my query plan:
> 
> The planner is obviously unhappy with this plan (note the large cost 
> numbers), but it can't find a way to do better.  An index on 
> lineitem.l_partkey would help, I think.
> 
> The whole query seems like it's written in a very inefficient fashion;
> couldn't the estimation of '0.2 * avg(l_quantity)' be amortized across
> multiple join rows?  But I dunno whether the TPC rules allow for 
> significant manual rewriting of the given query.
> 
>   regards, tom lane
> 
-- 
Mary Edie Meredith <[EMAIL PROTECTED]>
Open Source Development Lab





Re: [PERFORM] avoiding seqscan?

2003-09-29 Thread Gaetano Mendola
Palle Girgensohn wrote:
Will that make a difference? From what I've seen, it does not make much 
difference, but I have seen queries speed up when rewritten with explicit 
joins. I guess it depends on other things, but is it really so that the 
explicit joins are bad somehow? Do you have any pointers to documentation 
about it, if so?

Thanks,
Palle


They are not absolutely bad, but sometimes the path you choose is not
optimal; in PostgreSQL 7.4, using explicit joins will be less limiting
for the planner.
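
(A sketch of the 7.4 behaviour, assuming the new join_collapse_limit
parameter:)

    SET join_collapse_limit = 8;   -- planner may reorder explicit JOINs
    SET join_collapse_limit = 1;   -- planner follows the JOIN order as written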

Regards
Gaetano Mendola


Re: [PERFORM] advice on raid controller

2003-09-29 Thread scott.marlowe
I've used the megaraid / LSI cards in the past and they were pretty good 
in terms of reliability, but the last one I used was the 328 model, from 4 
years ago or so.  That one had a battery backup option for the cache, and 
could go to 128 Meg.  We tested it with 4/16 and 128 meg ram, and it was 
about the same with each, but we didn't do heavy parallel testing either.

Here's the page on the megaraid cards at lsilogic.com:

http://www.lsilogic.com/products/stor_prod/raid/ultra320products.html

On Sun, 28 Sep 2003, Matt Clark wrote:

> As others have mentioned, you really ought to get battery-backed cache if
> you're doing any volume of writes.  The ability to do safe write-back
> caching makes an *insane* difference to write performance.
> 
> The site you link to also has that for only 15% more money:
> http://uk.azzurri.com/product/product.cgi?productId=80
> 
> No experience with the card(s) I'm afraid.
> 
> In general though, U320 will only be faster than U160 for large sequential
> reads, or when you have silly numbers of disks on a channel (i.e. more than
> 4/channel).  If you have silly numbers of disks, then RAID5 will probably be
> better, if you have 4 disks total then RAID1+0 will probably be better.  In
> between it depends on all sorts of other factors.  Bear in mind though that
> if you *do* have silly numbers of disks then more channels and more cache
> will count for more than anything else, so spend the money on that rather
> than latest-and-greatest performance for a single channel.
> 
> HTH
> 
> Matt
> 
> > -Original Message-
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] Behalf Of Richard
> > Jones
> > Sent: 27 September 2003 18:25
> > To: [EMAIL PROTECTED]
> > Subject: [PERFORM] advice on raid controller
> >
> >
> > Hi, i'm on the verge of buying a "MegaRAID SCSI 320-2" raid controller.
> > I need it to build a db server using 4x ultra320 scsi disks
> > i'm thinking raid 1+0 but will try with raid5 too and compare
> >
> > Does anyone have any experience with this model, good or bad i'd like to
> > know.. thanks :)
> >
> > as seen:
> > http://uk.azzurri.com/product/product.cgi?productId=188
> >
> > Regards,
> > Richard.
> >
> > PS: whoever mentioned starting a site with raid controller
> > reviews, excellent
> > idea - its hard to find decent info on which card to buy.
> >
> >
> 
> 
> 




Re: [PERFORM] avoiding seqscan?

2003-09-29 Thread Palle Girgensohn
--On Monday, September 29, 2003 15.32.31 +0200 Gaetano Mendola 
<[EMAIL PROTECTED]> wrote:

They are not absolutely bad, but sometimes the path you choose is not
optimal; in PostgreSQL 7.4, using explicit joins will be less limiting
for the planner.
Regards
Gaetano Mendola
Ah, OK. True! In this case, though, the SQL queries are crafted with great 
care, since we have a lot of data in a few of the tables while others are 
almost empty, so we try to limit the amount of data as early as possible. 
Our experience says that we often do a better job than the planner, since 
we know which tables are "fat". Hence, we have actually moved to explicit 
joins in queries and sometimes gained speed.

But, in the general case, implicit might be better, I guess.

Regards,
Palle




Re: [PERFORM] avoiding seqscan?

2003-09-29 Thread Palle Girgensohn
Will that make a difference? From what I've seen, it does not make much 
difference, but I have seen queries speed up when rewritten with explicit 
joins. I guess it depends on other things, but is it really so that the 
explicit joins are bad somehow? Do you have any pointers to documentation 
about it, if so?

Thanks,
Palle
--On Monday, September 29, 2003 00.54.43 +0200 Gaetano Mendola 
<[EMAIL PROTECTED]> wrote:

Palle Girgensohn wrote:
uu=# explain analyze
uu-#  select lower(substr(p.last_name,1,1)) as letter, count(*)
uu-#  FROM course c join group_data gd on (c.active_group_id = gd.this_group_id)
uu-#   join person p on (gd.item_text = p.userid)
uu-#   join dyn_field_person dfp on (dfp.extern_item_id = 10 and dfp.giver=c.giver)
uu-#   join dyn_field_content_person dfcp on (dfp.id = dfcp.dyn_field_id and dfcp.userid=p.userid)
uu-#   left outer join participant pt on (pt.userid = p.userid and pt.course_id = 707)
uu-#  WHERE c.id = 707
uu-#  group by 1
uu-# ;
Why are you using this form of join?  When it is not necessary, use the
implicit form.
Regards
Gaetano Mendola





