Re: [PERFORM] bad performance on Solaris 10

2006-04-12 Thread Tom Lane
"Jignesh K. Shah" <[EMAIL PROTECTED]> writes:
> Many things in PostgreSQL are based on the assumption that it will
> run on Linux, and it is left to Solaris to emulate that behavior.

Au contraire --- PG tries its best to be OS-agnostic.  I've personally
resisted people trying to optimize it by putting in Linux-specific
behavior.  The above sounds to me like making excuses for a poor OS.

(And yes, I will equally much resist any requests to put in Solaris-
specific behavior...)

regards, tom lane



Re: [PERFORM] pg 7.4.x - pg_restore impossibly slow

2006-04-12 Thread Tom Lane
"patrick keshishian" <[EMAIL PROTECTED]> writes:
> My dev box is much slower hardware than the customer's
> server.  Even with that difference I expected to be able to
> pg_restore the database within one day.  But no.

Seems a bit odd.  Can you narrow down more closely which step of the
restore is taking the time?  (Try enabling log_statement.)

One thought is that kicking up work_mem and vacuum_mem is likely to
help for some steps (esp. CREATE INDEX and foreign-key checking).
And be sure you've done the usual tuning for write-intensive activity,
such as bumping up checkpoint_segments.  Turning off fsync wouldn't
be a bad idea either.
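
For instance, a minimal sketch of postgresql.conf settings to bump just for
the duration of the restore (values are only illustrative, and note that on
7.4 the memory knob is named sort_mem rather than work_mem):

    sort_mem = 65536              # in KB (the 7.4 equivalent of work_mem)
    vacuum_mem = 65536            # in KB
    checkpoint_segments = 30      # fewer, larger checkpoints during the bulk load
    fsync = false                 # only while restoring; re-enable afterwards

Remember to put the original values back (and re-enable fsync) once the
restore finishes.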

regards, tom lane



Re: [PERFORM] Inserts optimization?

2006-04-12 Thread Tom Lane
Chris <[EMAIL PROTECTED]> writes:
> Francisco Reyes wrote:
>> Doing my first write-heavy database.
>> What settings will help improve inserts?
>> Only a handful of connections, but each doing up to 30 inserts/second.

> If you can, use copy instead:
> http://www.postgresql.org/docs/8.1/interactive/sql-copy.html

Or at least try to do multiple inserts per transaction.

Also, increasing checkpoint_segments and possibly wal_buffers helps a
lot for write-intensive loads.  Try to get the WAL onto a separate disk
spindle if you can.  (These things don't matter for SELECTs, but they
do matter for writes.)
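
A minimal sketch of the batching idea (table and column names are made up
here):

    BEGIN;
    INSERT INTO readings (sensor_id, val) VALUES (1, 10.5);
    INSERT INTO readings (sensor_id, val) VALUES (1, 10.7);
    -- ... a few hundred rows per batch ...
    COMMIT;

Each COMMIT forces a WAL flush, so grouping many inserts into one transaction
pays that cost once per batch instead of once per row.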

regards, tom lane



Re: [PERFORM] Sun Fire T2000 and PostgreSQL 8.1.3

2006-04-12 Thread Bruce Momjian

I am thinking the most flexible solution would be to get a dual Opteron
machine, and initially do both data loading and queries on the same
machine.  When the load gets too high, buy a second machine and set it
up as a Slony slave and run your queries on that, and do the data loads
on the original machine as master.

---

Juan Casero (FL FLC) wrote:
> Because I plan to develop a rather large (for us anyway) data warehouse
> with PostgreSQL.  I am looking for the right hardware that can handle
> queries on a database that might grow to over 100 gigabytes.  Right
> now our decision support system based on postgresql 8.1.3 stores retail
> sales information for about four years back *but* only as weekly
> summaries.  I want to build the system so it can handle daily sales
> transactions also.  You can imagine how many more records this will
> involve so I am looking for hardware that can give me the performance I
> need to make this project usable.  In other words, parsing and loading
> the daily transaction logs for our stores is likely to take huge amounts
> of effort.  I need a machine that can complete the task in a reasonable
> amount of time.  As people start to query the database to find sales
> related reports and information I need to make sure the queries will run
> reasonably fast for them.  I have already hand optimized all of my
> queries on the current system.  But currently I only have weekly sales
> summaries.  Other divisions in our company have done a similar project
> using MS SQL Server on SMP hardware far outclassing the database server
> I currently use and they report heavy loads on the server with less than
> ideal query run times.  I am sure I can do my part to optimize the
> queries once I start this project but there is only so much you can do.
> At some point you just need more powerful hardware.  This is where I am
> at right now.  Apart from that since I will only get this one chance to
> buy a new server for data processing I need to make sure that I buy
> something that can grow over time as our needs change.  I don't want to
> buy a server only to find out later that it cannot meet our needs with
> future database projects.  I have to balance a limited budget, room for
> future performance growth, and current system requirements.  Trust me it
> isn't easy.  
> 
> 
> Juan
> 
> -Original Message-
> From: Josh Berkus [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, April 06, 2006 2:57 AM
> To: pgsql-performance@postgresql.org
> Cc: Juan Casero (FL FLC); Luke Lonergan
> Subject: Re: [PERFORM] Sun Fire T2000 and PostgreSQL 8.1.3
> 
> Juan,
> 
> > Ok that is beginning to become clear to me.  Now I need to determine 
> > if this server is worth the investment for us.  Maybe it is not a 
> > speed demon but to be honest the licensing costs of an SMP-aware 
> > RDBMS are outside our budget.
> 
> You still haven't explained why you want multi-threaded queries.  This
> is sounding like keeping up with the Joneses.
> 
> --
> Josh Berkus
> Aglio Database Solutions
> San Francisco
> 

-- 
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [PERFORM] bad performance on Solaris 10

2006-04-12 Thread Jignesh K. Shah


Bruce,

Hard to answer that... People like me who know and love PostgreSQL and
Solaris find this an opportunity to make their favorite database work
best on their favorite operating system.


Many things in PostgreSQL are based on the assumption that it will run
on Linux, and it is left to Solaris to emulate that behavior.  That
said, there are ways to improve performance even on UFS on Solaris; it
just requires more tweaks.


Hopefully this will lead to a few Solaris-friendly default values like
fsync/odatasync :-)


Regards,
Jignesh


Bruce Momjian wrote:



It is hard to imagine why people spend so much time modifying Sun
machines to run with acceptable performance when non-Sun operating
systems work fine without such hurdles.
 





Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE

2006-04-12 Thread Craig A. James

Jim C. Nasby wrote:
>> No, I don't agree with this.  Too many people waste time designing for
>> "what if..." scenarios that never happen.  You don't want to be dumb and
>> design something that locks out a foreseeable and likely future need, but
>> referential integrity doesn't meet this criterion.  There's nothing to keep
>> you from changing from app-managed to database-managed referential
>> integrity if your needs change.
>
> In this case your argument makes no sense, because you will spend far
> more time re-creating RI capability inside an application than if you
> just use what the database offers natively.

But one of the specific conditions in my original response was, "You have
application-specific knowledge about when you can skip referential integrity
and thereby greatly improve performance."  If you can't do that, I agree with
you.

Anyway, this discussion is probably going on too long, and I'm partly to blame. 
 I think we all agree that in almost all situations, using the database to do 
referential integrity is the right choice, and that you should only violate 
this rule if you have a really, really good reason, and you've thought out the 
implications carefully, and you know you may have to pay a steep price later if 
your requirements change.

Craig



[PERFORM] pg 7.4.x - pg_restore impossibly slow

2006-04-12 Thread patrick keshishian
Greetings,

I have a 395MB pg_dump from a PostgreSQL 7.4.2 database.
This dump is from one of our customer's servers.  There is
a web-based administration UI which has been reported to
be extremely slow and unusable.

To see what's going on with their data I have grabbed a
copy of their nightly pg_dump output and am attempting to
restore it on my development box, running PostgreSQL
7.4.12.

My dev box is much slower hardware than the customer's
server.  Even with that difference I expected to be able to
pg_restore the database within one day.  But no. After
leaving pg_restore running for about 2 days, I ctrl-C'ed
out of it (see copy/paste below along with other info).

I must say that data was being restored, as I could do
select count(*) on tables which had their data restored and
I would get valid counts back.

The database contains 34 tables. The pg_restore seems to
restore the first 13 tables pretty quickly, but they do not have
many records. The largest among them has ~17,000 rows.

Then restore gets stuck on a table with 2,175,050 rows.
Following this table another table exists with 2,160,616
rows.

One thing worth mentioning is that the PostgreSQL package
that got deployed lacked compression, as in:

$ pg_dump -Fc dbname > dbname.DUMP
pg_dump: [archiver] WARNING: requested compression not available in
this installation -- archive will be uncompressed


Any suggestions as to what may be the problem here?
I doubt that the minor version mismatch is what's causing
this problem. (I am trying this test on another machine with the
same version of PostgreSQL installed on it, and right now,
it is stuck on the first of the two huge tables, and it has
already been going for more than 2 hrs).

I'm open to any ideas and/or suggestions (within reason) :)

Best regards,
--patrick


[EMAIL PROTECTED]:/tmp$ date
Mon Apr 10 15:13:19 PDT 2006
[EMAIL PROTECTED]:/tmp$ pg_restore -ad dbname customer_db.DUMP ; date
^C
[EMAIL PROTECTED]:/tmp$ date
Wed Apr 12 10:40:19 PDT 2006

[EMAIL PROTECTED]:/tmp$ uname -a
Linux devbox 2.4.31 #6 Sun Jun 5 19:04:47 PDT 2005 i686 unknown
unknown GNU/Linux
[EMAIL PROTECTED]:/tmp$ cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 8
model name  : Pentium III (Coppermine)
stepping: 6
cpu MHz : 731.477
cache size  : 256 KB
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 2
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 mmx fxsr sse
bogomips: 1461.45

[EMAIL PROTECTED]:/tmp/$ cat /proc/meminfo
total:used:free:  shared: buffers:  cached:
Mem:  527499264 523030528  44687360 10301440 384454656
Swap: 1579204608   552960 1578651648
MemTotal:   515136 kB
MemFree:  4364 kB
MemShared:   0 kB
Buffers: 10060 kB
Cached: 374984 kB
SwapCached:460 kB
Active:  79004 kB
Inactive:   306560 kB
HighTotal:   0 kB
HighFree:0 kB
LowTotal:   515136 kB
LowFree:  4364 kB
SwapTotal: 1542192 kB
SwapFree:  1541652 kB


postgresql.conf changes on devbox:
checkpoint_segments = 10
log_pid = true
log_timestamp = true

checkpoint_segments was changed to 10 after seeing many
"HINT"s about it in the PostgreSQL log file.  This doesn't seem
to have affected pg_restore performance.



Re: [PERFORM] Inserts optimization?

2006-04-12 Thread Chris

Francisco Reyes wrote:

Doing my first write-heavy database.
What settings will help improve inserts?
Only a handful of connections, but each doing up to 30 inserts/second.
Plan to have 2 to 3 clients which most of the time will not run at the 
same time, but occasionally it's possible two of them may bump into each 
other.


If you can, use copy instead:

http://www.postgresql.org/docs/8.1/interactive/sql-copy.html

MUCH quicker (and don't worry about using multiple clients).
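
For reference, a rough sketch of both forms (table, column, and file names are
invented for the example; the STDIN data rows are tab-separated):

    -- client-side: stream rows over the existing connection
    COPY readings (sensor_id, val) FROM STDIN;
    1	10.5
    2	10.7
    \.

    -- server-side: read a tab-separated file accessible to the backend
    COPY readings (sensor_id, val) FROM '/tmp/readings.tsv';

A single COPY moves all the rows in one command and one transaction, which is
where most of the speedup over per-row INSERTs comes from.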

--
Postgresql & php tutorials
http://www.designmagick.com/



[PERFORM] Inserts optimization?

2006-04-12 Thread Francisco Reyes

Doing my first write-heavy database.
What settings will help improve inserts?
Only a handful of connections, but each doing up to 30 inserts/second.
Plan to have 2 to 3 clients which most of the time will not run at the 
same time, but occasionally it's possible two of them may bump into each 
other.



If anyone recalls a previous thread like this please suggest keywords to 
search on. My search on this topic came back pretty empty.




Re: [PERFORM] multi column query

2006-04-12 Thread Jim C. Nasby
On Wed, Apr 12, 2006 at 05:32:32PM -0700, Sriram Dandapani wrote:
> Hi
> 
> When I update a table that has 20 columns and the where clause includes
> 16 of the columns (this is a data warehousing type update on aggregate
> fields),
> 
> The bitmap scan is not used by the optimizer. The table is indexed on 3
> of the 20 fields. The update takes really long to finish (on a 6 million
> row table)
> 
> Do I need to do some "magic" with configuration to turn on bitmap scans?

No. What does explain analyze of the query show? What's it doing now? A
seqscan? You might try set enable_seqscan=off and see what that does.
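
Something along these lines, for example (table and column names are
placeholders; note that EXPLAIN ANALYZE really executes the update, hence the
ROLLBACK):

    BEGIN;
    EXPLAIN ANALYZE
        UPDATE agg_table SET total = total + 1
         WHERE col01 = 1 AND col02 = 2;  -- ... and the other referenced columns
    ROLLBACK;

    SET enable_seqscan = off;            -- session-local, for testing only
    BEGIN;
    EXPLAIN ANALYZE
        UPDATE agg_table SET total = total + 1
         WHERE col01 = 1 AND col02 = 2;
    ROLLBACK;

Comparing the two plans (and their actual times) shows whether the planner is
skipping a usable index/bitmap plan or whether the seqscan really is cheaper.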
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.com   work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [PERFORM] pgmemcache

2006-04-12 Thread Tom Lane
"Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> Why are AFTER COMMIT triggers impossible?

What happens if such a trigger gets an error?  You can't un-commit.

regards, tom lane



[PERFORM] multi column query

2006-04-12 Thread Sriram Dandapani
Hi

When I update a table that has 20 columns and the where clause includes 16 of
the columns (this is a data warehousing type update on aggregate fields), the
bitmap scan is not used by the optimizer.  The table is indexed on 3 of the 20
fields.  The update takes really long to finish (on a 6 million row table).

Do I need to do some "magic" with configuration to turn on bitmap scans?

Re: [PERFORM] pgmemcache

2006-04-12 Thread Jim C. Nasby
On Wed, Apr 12, 2006 at 04:03:43PM -0700, Josh Berkus wrote:
> PFC,
> 
> > It would be nice to have ON COMMIT triggers for this use.
> >
> > However you can emulate ON COMMIT triggers with a modification of the
> > memcache update process :
> 
> Well, I'm back in touch with the GORDA project so possibly we can have 
> BEFORE COMMIT triggers after all.
> 
> BTW, it's important to note that AFTER COMMIT triggers are logically 
> impossible, so please use BEFORE COMMIT so that it's clear what we're 
> talking about.

Why are AFTER COMMIT triggers impossible? ISTM they would be useful as a
means to indicate to some external process if a transaction succeeded or
not. And for some things you might only want to fire the trigger after
you knew that the transaction had committed.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.com   work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [PERFORM] pgmemcache

2006-04-12 Thread Josh Berkus
PFC,

>   It would be nice to have ON COMMIT triggers for this use.
>
>   However you can emulate ON COMMIT triggers with a modification of the
> memcache update process :

Well, I'm back in touch with the GORDA project so possibly we can have 
BEFORE COMMIT triggers after all.

BTW, it's important to note that AFTER COMMIT triggers are logically 
impossible, so please use BEFORE COMMIT so that it's clear what we're 
talking about.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [PERFORM] Better index strategy for many fields with few values

2006-04-12 Thread Jim Nasby



Adding -performance back in.

> -Original Message-
> From: Oscar Picasso [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, April 12, 2006 5:51 PM
> To: Jim Nasby
> Subject: Re: [PERFORM] Better index strategy for many fields with few values
>
> I would like to try it.  However, in another post I added that, contrary to
> what I stated initially, not all the paramXX columns are mandatory in the
> query.  So it seems that requirement makes the problem more complex.
> Doesn't this new requirement rule out this solution?

No, just group the columns logically.

> By the way, I have tested indexing each column individually and checked what
> happens in relation to bitmap scans.  My test table is 1 million rows.  The
> explain analyze command shows that a bitmap scan is sometimes used, but I
> still end up with queries that can take up to 10s, which is way too much.
>
> "Jim C. Nasby" <[EMAIL PROTECTED]> wrote:
> [Jim's earlier reply to Markus Schaber quoted here; it appears in full
> elsewhere in this digest.]


Re: [PERFORM] bad performance on Solaris 10

2006-04-12 Thread Josh Berkus
People,

> Lately I find people are not so receptive to VxFS, and Sun is promoting
> ZFS, and we don't have a reasonable near term option for Raw IO in
> Postgres, so we need to work to find a reasonable path for Solaris users
> IMO.  The long delays in ZFS production haven't helped us there, as the
> problems with UFS are severe.

FWIW, I'm testing on ZFS now.  But it's not stable yet.  People are welcome 
to join the Solaris 11 beta program.

In the near term, there are fixes to be made both in PostgreSQL 
configuration and in Solaris configuration.  Also, some of the work being 
done for 8.2 ... the external sort work done by Simon and sponsored by 
GreenPlum, and the internal sort work which Jonah and others are doing ... 
will improve things on Solaris as our sort issues hit Solaris harder than 
other OSes.

Expect lots more info on performance config for Solaris from me & Robert in 
the next few weeks. 

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [PERFORM] Better index strategy for many fields with few values

2006-04-12 Thread Luke Lonergan
Oscar,

On 4/10/06 9:58 AM, "Oscar Picasso" <[EMAIL PROTECTED]> wrote:

> - My items table:
> code int  -- can take one of 100 values
> property varchar(250) -- can take one of 5000 values
> param01 char(10)  -- can take one of 10 values
> param02 char(10)  -- can take one of 10 values
> ...
> [ 20 similar columns ]
> ...
> param20 char(10) -- can take one of 10 values
> 
> - The kind of query I want to optimize:
> select * from items
> where code between 5 and 22
> and param01 = 'P'
> and param02 = 'G'
> ...
> [ all the 20 paramXX columns are used in the query ]
> ...
> and param20 = 'C';

Bizgres 0.9 has an on-disk bitmap index which will likely improve this query
speed by a very large amount over normal Postgres 8.1.

- Luke





Re: [PERFORM] bad performance on Solaris 10

2006-04-12 Thread Luke Lonergan
Bruce,

On 4/12/06 12:56 PM, "Bruce Momjian"  wrote:

> It is hard to imagine why people spend so much time modifying Sun
> machines to run with acceptable performance when non-Sun operating
> systems work fine without such hurdles.

There are a lot of Solaris customers that we support and that we'd like to
support.  To many of them, Solaris has many advantages other than speed,
though they expect a reasonably comparable performance, perhaps within a
factor of 2 of other options.

Oracle has spent a great deal of time (a decade!) optimizing their software
for Solaris, and it shows.  There are also some typical strategies that
Solaris people used to use to make Solaris perform better, like using VxFS
(Veritas Filesystem), or Oracle Raw IO to make their systems perform better.

Lately I find people are not so receptive to VxFS, and Sun is promoting ZFS,
and we don't have a reasonable near term option for Raw IO in Postgres, so
we need to work to find a reasonable path for Solaris users IMO.  The long
delays in ZFS production haven't helped us there, as the problems with UFS
are severe.

We at Greenplum have worked hard over the last year to find options for
Postgres on Solaris and have the best configuration setup that we think is
possible now on UFS, and our customers benefit from that.  However, Linux on
XFS or even ext3 is definitely the performance leader.

- Luke 





Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE

2006-04-12 Thread Jim C. Nasby
On Wed, Apr 12, 2006 at 10:36:28AM -0700, Craig A. James wrote:
> Jim C. Nasby wrote:
> >>1. You have only one application that modifies the data.  (Otherwise, you 
> >>have to duplicate the rules across many applications, leading to a 
> >>code-maintenance nightmare).
> >
> >You forgot something:
> >
> >1a: You know that there will never, ever, ever, ever, be any other
> >application that wants to talk to the database.
> >
> >I know tons of people that get burned because they go with something
> >that's "good enough for now", and then regret that decision for years to
> >come.
> 
> No, I don't agree with this.  Too many people waste time designing for 
> "what if..." scenarios that never happen.  You don't want to be dumb and 
> design something that locks out a foreseeable and likely future need, but 
> referential integrity doesn't meet this criterion.  There's nothing to keep 
> you from changing from app-managed to database-managed referential 
> integrity if your needs change.

In this case your argument makes no sense, because you will spend far
more time re-creating RI capability inside an application than if you
just use what the database offers natively.

It's certainly true that you don't want to over-engineer for no reason,
but many times choices are made to save a very small amount of time or
hassle up-front, and those choices become extremely painful later.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.com   work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [PERFORM] Better index strategy for many fields with few values

2006-04-12 Thread Jim C. Nasby
On Wed, Apr 12, 2006 at 02:59:32PM +0200, Markus Schaber wrote:
> > I was thinking about using a multicolumn index, but I have read that
> > we should limit multicolumn indices to at most 2 or 3 columns.
> 
> Yes, that's true, the index overhead gets too high.
> 
> > I was also thinking about using a functional index.
> 
> If there's a logical relation between those values that they can easily
> combined, that may be a good alternative.
 
How would that be any better than just doing a multi-column index?
 
> I just had another weird idea:
> 
> As your paramXX values can have only 10 parameters, it also might be
> feasible to use a bunch of 10 conditional indices, like:
> 
> CREATE INDEX foo1 ON table (param1, param2) WHERE param0='1st value';
> CREATE INDEX foo2 ON table (param1, param2) WHERE param0='2nd value';
> CREATE INDEX foo3 ON table (param1, param2) WHERE param0='3rd value';
> [...]

Not all that weird; it's known as index partitioning.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.com   work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [PERFORM] bad performance on Solaris 10

2006-04-12 Thread Bruce Momjian
Luke Lonergan wrote:
> Alvaro,
> 
> On 4/5/06 2:48 PM, "Alvaro Herrera" <[EMAIL PROTECTED]> wrote:
> 
> > This essentially means stopping all bgwriter activity, thereby deferring
> > all I/O until checkpoint.  Was this considered?  With
> > checkpoint_segments to 128, it wouldn't surprise me that there wasn't
> > any checkpoint executed at all during the whole test ...
> 
> Yes, many things about the Solaris UFS filesystem caused a great deal of
> pain over the 10 months of experiments we ran with Sun MDE.  Ultimately, the
> conclusion was that ZFS is going to make all of the pain go away.
> 
> In the meantime, all you can do is tweak up UFS and avoid I/O as much as
> possible.

It is hard to imagine why people spend so much time modifying Sun
machines to run with acceptable performance when non-Sun operating
systems work fine without such hurdles.

-- 
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE

2006-04-12 Thread Craig A. James

Jim C. Nasby wrote:
>> 1. You have only one application that modifies the data.  (Otherwise, you
>> have to duplicate the rules across many applications, leading to a
>> code-maintenance nightmare).
>
> You forgot something:
>
> 1a: You know that there will never, ever, ever, ever, be any other
> application that wants to talk to the database.
>
> I know tons of people that get burned because they go with something
> that's "good enough for now", and then regret that decision for years to
> come.

No, I don't agree with this.  Too many people waste time designing for "what
if..." scenarios that never happen.  You don't want to be dumb and design
something that locks out a foreseeable and likely future need, but referential
integrity doesn't meet this criterion.  There's nothing to keep you from
changing from app-managed to database-managed referential integrity if your
needs change.

Design for your current requirements.


Let us be of good cheer, remembering that the misfortunes hardest to bear are 
those which never happen.		- James Russell Lowell (1819-1891)


Therefore do not be anxious about tomorrow, for tomorrow will be anxious for 
itself.  Let the day's own trouble be sufficient for the day.

- Matthew 6:34

Craig



Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE

2006-04-12 Thread Jim C. Nasby
On Wed, Apr 12, 2006 at 07:45:17AM -0700, Craig A. James wrote:
> All good advice, but... there are no absolutes in this world.  
> Application-enforced referential integrity makes sense if (and probably 
> ONLY if):
> 
> 1. You have only one application that modifies the data.  (Otherwise, you 
> have to duplicate the rules across many applications, leading to a 
> code-maintenance nightmare).

You forgot something:

1a: You know that there will never, ever, ever, ever, be any other
application that wants to talk to the database.

I know tons of people that get burned because they go with something
that's "good enough for now", and then regret that decision for years to
come.

> 2. If your application crashes and leaves a mess, it's not a catastrophe, 
> and you have a good way to clean it up.  For example, a bank shouldn't do 
> this, but it might be OK for a computer-aided-design application, or the 
> backend of a news web site.
> 
> 3. You have application-specific knowledge about when you can skip 
> referential integrity and thereby greatly improve performance.  For 
> example, you may have batch operations where large numbers of rows are 
> temporarily inconsistent.
> 
> If your application doesn't meet ALL of these criteria, you probably should 
> use the database for referential integrity.
> 
> Craig
> 

-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.com   work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE

2006-04-12 Thread Jim C. Nasby
On Wed, Apr 12, 2006 at 09:22:52AM +0200, PFC wrote:
> 
> >>  I think this is an old question, but I want to know if it really is
> >> worth it to not create some foreign keys and deal with the referential
> >> integrity at application level?
> 
>   Trust me: do it in the application and you'll enter a world of
> hurt. I'm doing it with some mysql apps, and it's a nightmare; doing
> cascaded deletes by hand, etc., you always forget something, you have to
> modify a million places in your code every time you add a new table, your
> ORM bloats, you get to write cleanup cron scripts which take forever to
> run, your website crashes, etc.

Well, yeah, that's typical for MySQL sites, but what does that have to do
with RI?

Sorry, couldn't resist. :P



Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE

2006-04-12 Thread Scott Marlowe
On Wed, 2006-04-12 at 09:49, Rodrigo Sakai wrote:
>   Thanks for all help!! But my problem is with performance, I agree with all
> of you, the RI must be maintained by the database, because a bunch of
> reasons that everyone knows!
>   But, I'm dealing with a very huge database that serves more than 200
> clients at the same time, and because of it, each manipulation (delete,
> insert, update, select) on the database has poor performance. So, if we
> deal with RI in each client station, we take this work off the database!
>   The application is an ERP developed with DELPHI + (postgresql or oracle or
> sql server)!!

These are separate issues.

One is performance of PostgreSQL handling FK->PK relationships.
PostgreSQL, in my experience, is quite fast at this.  However, there are
ways you can set up FK->PK relationships that are non-optimal and will
result in poor performance.  FK->PK relationships are generally fastest
when they are 1-1 and based on integer types.  If there's a type
mismatch, or you use slower types, like large text fields, or numerics,
you may have poor performance.  Give us a sample of your schema where
you're having problems, let us help you troubleshoot your performance.

High parallel load is another issue.  No matter where you put your
FK->PK relationship handling, having 200+ users connected at the same
time and manipulating your database is a heavy load.

Handling FK->PK relationships in software often is vulnerable to race
conditions.  Like so (T1 and T2 are different "threads"):

T1: select id from mastertable where id=99; -- check for row
T2: delete from mastertable where id=99; -- delete a row
T1: insert into slavetable values (); -- whoops!  No master

If we change the T1 to select for update, we now have the overhead that
most FK->PK relationships have.
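
A hedged sketch of that locking variant of T1, reusing the table names from
the example above (the master_id column is invented):

    BEGIN;
    -- lock the parent row so a concurrent delete has to wait
    SELECT id FROM mastertable WHERE id = 99 FOR UPDATE;
    INSERT INTO slavetable (master_id) VALUES (99);
    COMMIT;

which is essentially the overhead Scott describes for a database-enforced
FK->PK check anyway.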

What version of PostgreSQL are you running?  Older versions had much
poorer performance than newer versions when updating FK->PK
relationships.

Don't assume that application level FK->PK relationships will be faster
AND as good as the ones at database level.  It's quite possible that
they're faster for you because you're cutting corners, referentially
speaking, and your data will wind up incoherent over time.

Also, you may be dealing with a database that is IO bound, and moving
the FK checks to software is only a short stop gap, and as the machine
hits the IO performance ceiling, you'll have the same problem again,
need a bigger machine, and have incoherent data.  I.e. the same problem,
plus a few more, and have spent a lot of time spinning your wheels going
a short distance.



Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE

2006-04-12 Thread PFC


What kinds of operations are made slow by foreign key checks ? Is it :
- Simple checks on INSERT ?
- Simple checks on UPDATE ?
- Cascaded deletes ?
- Cascaded updates ?
- Locks ?
- Something else ?

	Foreign keys are to ensure that the value in a column is always part of a  
specific set (the referenced table). If this set changes very rarely, like  
the list of countries in the world, or the list of states in a country, or  
the various possible states an order can be in (received, processed,  
shipped...) then it's alright to skip the check if you're sure your  
application will insert a valid value. For some other situations, doing  
the check in the application will need some queries and could be slower  
(and certainly will be more complicated...)


	Are you sure you're not missing a few indexes, which would then force  
fkey checks to use sequential scans ?
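
For example, the classic case is a missing index on the referencing column; a
hedged sketch (all names invented):

    CREATE TABLE customers (
        customer_id int PRIMARY KEY,
        name        text
    );
    CREATE TABLE orders (
        order_id    int PRIMARY KEY,
        customer_id int NOT NULL REFERENCES customers (customer_id)
    );
    -- The FK above creates no index on orders.customer_id; without this one,
    -- every DELETE or key UPDATE on customers has to scan all of orders:
    CREATE INDEX orders_customer_id_idx ON orders (customer_id);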


  Thanks for all the help!! But my problem is with performance, I agree with
all of you, the RI must be maintained by the database, because of a bunch of
reasons that everyone knows!
  But, I'm dealing with a very huge database that serves more than 200
clients at the same time, and because of it, each manipulation (delete,
insert, update, select) on the database has poor performance. So, if we
deal with RI in each client station, we take this work off the database!
  The application is an ERP developed with DELPHI + (postgresql or oracle
or sql server)!!




Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE

2006-04-12 Thread Michael Glaesemann


On Apr 12, 2006, at 23:49 , Rodrigo Sakai wrote:

  Thanks for all the help!! But my problem is with performance, I agree with
all of you, the RI must be maintained by the database, because of a bunch of
reasons that everyone knows!


You've gotten a variety of good advice from a number of people. For  
more specific advice (e.g., for your particular situation), it would  
be very helpful if you could provide examples of queries that aren't  
performing well for you (including table schema and explain analyze  
output).


Michael Glaesemann
grzm myrealbox com






Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE

2006-04-12 Thread Merlin Moncure
On 4/11/06, Rodrigo Sakai <[EMAIL PROTECTED]> wrote:
>
>   Hi,
>
>   I think this is an old question, but I want to know if it really is
> worth it to not create some foreign keys and deal with the referential
> integrity at application level?
>   Specifically, the system we are developing is a server/cliente
> architecture that the server is the database and the fat client is an
> application developed in DELPHI!!!
>
>   Thanks in advance!!

Delphi IMO is the best RAD win32 IDE ever invented (especially when
paired with the very excellent Zeos connection objects).  However, for
purposes of data management, imperative languages, Delphi included,
simply suck.  Great form editor though.

merlin



Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE

2006-04-12 Thread Rodrigo Sakai
  Thanks for all the help!! But my problem is with performance, I agree with all
of you, the RI must be maintained by the database, because of a bunch of
reasons that everyone knows!
  But, I'm dealing with a very huge database that serves more than 200
clients at the same time, and because of it, each manipulation (delete,
insert, update, select) on the database has poor performance. So, if we
deal with RI in each client station, we take this work off the database!
  The application is an ERP developed with DELPHI + (postgresql or oracle or
sql server)!!

  Thanks again!!

- Original Message - 
From: "Markus Schaber" <[EMAIL PROTECTED]>
To: 
Cc: "Rodrigo Sakai" <[EMAIL PROTECTED]>
Sent: Wednesday, April 12, 2006 10:18 AM
Subject: Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE


> Hi, Michael,
> Hi, Rodrigo,
>
> Michael Glaesemann wrote:
>
> > If I had to choose between one or the other, I'd leave all  referential
> > integrity in the database and deal with the errors thrown  when
> > referential integrity is violated in the application. PostgreSQL  is
> > designed to handle these kinds of issues. Anything you code in  your
> > application is more likely to contain bugs or miss corner cases  that
> > would allow referential integrity to be violated. PostgreSQL has  been
> > pounded on for years by a great many users and developers,  making the
> > likelihood of bugs still remaining much smaller.
>
> I strictly agree with Michael here.
>
> > Of course, you can add some referential integrity checks in your
> > application code, but those should be in addition to your database-
> > level checks.
>
> Agree. It does make sense to have reference checks in the UI or
> application level for the sake of better error handling, but the
> database should be the mandatory judge.
>
> There's another advantage of database-based checking: should there ever
> be the need for a different application working on the same database
> (e.g. an "expert level UI", or some connector that connects / synchronizes
> to another piece of software, or a data import tool), database-based
> constraints cannot be broken, as opposed to application-based ones.
>
> HTH,
> Markus
> -- 
> Markus Schaber | Logical Tracking&Tracing International AG
> Dipl. Inf. | Software Development GIS
>
> Fight against software patents in EU! www.ffii.org
www.nosoftwarepatents.org
>




Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE

2006-04-12 Thread Craig A. James
  I think this is an old question, but I want to know if it really is
worth it to not create some foreign keys and deal with the referential
integrity at application level?



Trust me: do it in the application and you'll enter a world of
hurt. I'm doing it with some mysql apps, and it's a nightmare; doing
cascaded deletes by hand, etc., you always forget something, you have
to modify a million places in your code every time you add a new table,
your ORM bloats, you get to write cleanup cron scripts which take
forever to run, your website crashes, etc.


All good advice, but... there are no absolutes in this world.  
Application-enforced referential integrity makes sense if (and probably ONLY 
if):

1. You have only one application that modifies the data.  (Otherwise, you have 
to duplicate the rules across many applications, leading to a code-maintenance 
nightmare).

2. If your application crashes and leaves a mess, it's not a catastrophe, and 
you have a good way to clean it up.  For example, a bank shouldn't do this, but 
it might be OK for a computer-aided-design application, or the backend of a 
news web site.

3. You have application-specific knowledge about when you can skip referential 
integrity and thereby greatly improve performance.  For example, you may have 
batch operations where large numbers of rows are temporarily inconsistent.

If your application doesn't meet ALL of these criteria, you probably should use 
the database for referential integrity.

Craig



Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE

2006-04-12 Thread Markus Schaber
Hi, Michael,
Hi, Rodrigo,

Michael Glaesemann wrote:

> If I had to choose between one or the other, I'd leave all  referential
> integrity in the database and deal with the errors thrown  when
> referential integrity is violated in the application. PostgreSQL  is
> designed to handle these kinds of issues. Anything you code in  your
> application is more likely to contain bugs or miss corner cases  that
> would allow referential integrity to be violated. PostgreSQL has  been
> pounded on for years by a great many users and developers,  making the
> likelihood of bugs still remaining much smaller.

I strictly agree with Michael here.

> Of course, you can add some referential integrity checks in your
> application code, but those should be in addition to your database-
> level checks.

Agree. It does make sense to have reference checks in the UI or
application level for the sake of better error handling, but the
database should be the mandatory judge.

There's another advantage of database-based checking: should there ever
be the need for a different application working on the same database
(e.g. an "expert level UI", or some connector that connects / synchronizes
to another piece of software, or a data import tool), database-based
constraints cannot be broken, as opposed to application-based ones.

HTH,
Markus
-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org



Re: [PERFORM] Sequencial scan instead of using index

2006-04-12 Thread Harry Hehl
Thanks Mark, 

> Hmm - that first query needs to do a sort, so you might want to
> experiment with the sort_mem parameter. Could you show us output from
> explain analyze for both the above queries?

I'm not too concerned about the sort, more about the query performance
with a seq scan as the table size increases.

> At face value, selecting 20 rows (assuming the estimates are
> accurate) may mean that a seqscan is the best plan! But we'll know more
> after seeing the explain analyze...

20 rows is about right.  

I saw Tom's response on the planner improvement in 8.2 but I was still
going to send the explain analyze output.
However I can't show you explain analyze. The postmaster goes to 99% cpu
and stays there. The explain analyze command hangs...

It is starting to look like inheritance does help in modeling the data,
but for searches, parallel flat tables that don't use inheritance are
required to get optimum query performance.

Has anyone else come to this conclusion?

Thanks






Re: [PERFORM] Better index strategy for many fields with few values

2006-04-12 Thread Markus Schaber
Hi, Oscar,

Oscar Picasso wrote:

> [ all the 20 paramXX columns are used in the query ]

> How can I optimize this kind of query?

PostgreSQL 8.1 has so-called bitmap index scans, which can combine
several index scans before actually accessing the data.

So I think it's best to create an index on each of the paramXX columns,
and see with EXPLAIN ANALYZE what it is doing.
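
For example (index names invented; the table and columns follow the earlier
post in this thread):

    CREATE INDEX items_param01_idx ON items (param01);
    CREATE INDEX items_param02_idx ON items (param02);
    -- ... one index per paramXX column that actually appears in searches ...

    EXPLAIN ANALYZE
        SELECT * FROM items
         WHERE code BETWEEN 5 AND 22
           AND param01 = 'P'
           AND param02 = 'G';   -- ... and so on

The plan should show the per-column index scans being combined with a
BitmapAnd before the heap is visited.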

> I was thinking about using a multicolumn index, but I have read that
> we should limit multicolumn indices to at most 2 or 3 columns.

Yes, that's true, the index overhead gets too high.

> If that's true then 22 columns for a multicolumn index seems way too
> much. Or maybe it is workable as every column uses only a very limited
> set of values?

Yes, I think that a 22 column index is way too much, especially with the
new bitmap index scans available.

> I was also thinking about using a functional index.

If there's a logical relation between those values that they can easily
combined, that may be a good alternative.


I just had another weird idea:

As each of your paramXX columns can take only 10 values, it also might be
feasible to use a bunch of 10 conditional indices, like:

CREATE INDEX foo1 ON table (param1, param2) WHERE param0='1st value';
CREATE INDEX foo2 ON table (param1, param2) WHERE param0='2nd value';
CREATE INDEX foo3 ON table (param1, param2) WHERE param0='3rd value';
[...]

This way, you don't have the index bloat of a 3-column index, but 10
2-column indices that cover 1/10th of the table each.

For 22 columns, you'd need a bunch of seven such indices plus a
single-column one, or you can use some 3+1 and some 2+1 column indexes.

I'd like to see the query plans from explain analyze.

Btw, I expect query planning time to get rather significant for so many
columns, so geqo tuning, tuning work_mem (for the bitmap scans) and
prepared statements will pay off.
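
For instance, a rough sketch of the prepared-statement side (names are
placeholders; only two of the parameters are shown):

    PREPARE item_search (char(10), char(10)) AS
        SELECT * FROM items
         WHERE code BETWEEN 5 AND 22
           AND param01 = $1
           AND param02 = $2;    -- ... one $n per paramXX actually used

    EXECUTE item_search('P', 'G');

    SET work_mem = 32768;       -- value in KB on 8.1; more room for bitmap scans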

HTH,
Markus
-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org



Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE

2006-04-12 Thread PFC


  I think this is an old question, but I want to know if it really is
worth it to not create some foreign keys and deal with the referential
integrity at application level?


	Trust me: do it in the application and you'll enter a world of hurt. I'm
doing it with some mysql apps, and it's a nightmare; doing cascaded
deletes by hand, etc., you always forget something, you have to modify a
million places in your code every time you add a new table, your ORM
bloats, you get to write cleanup cron scripts which take forever to run,
your website crashes, etc.





Re: [PERFORM] pgmemcache

2006-04-12 Thread PFC


It would be nice to have ON COMMIT triggers for this use.

	However you can emulate ON COMMIT triggers with a modification of the  
memcache update process :


- A standard trigger sends the data to update to memcache
- The trigger also sends the PID
- Instead of being used immediately, this data is kept in a buffer
- Notify is issued

On commit :
- postgres sends the NOTIFY signal
	- the memcache updater reads the NOTIFY (which embeds the PID I believe)  
; it finds the buffered data sent above and uses it to update memcached


On rollback :
- Interesting problem ;)))

OK, it's a kludge. When can we get ON COMMIT triggers ?
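
For what it's worth, a hedged sketch of the trigger side of such a scheme, as
a variant that keeps the buffer in an ordinary table inside the database
rather than in the updater process (table, queue, and channel names are all
invented; the listener is assumed to LISTEN on memcache_updates and to read
the queue table keyed by the notifying backend's PID):

    CREATE TABLE memcache_queue (
        backend_pid int  NOT NULL DEFAULT pg_backend_pid(),
        cache_key   text NOT NULL,
        cache_value text
    );

    CREATE OR REPLACE FUNCTION queue_memcache_update() RETURNS trigger AS $$
    BEGIN
        -- buffer the change instead of touching memcached directly
        INSERT INTO memcache_queue (cache_key, cache_value)
            VALUES ('product:' || NEW.id, NEW.name);
        -- only delivered if the surrounding transaction commits
        NOTIFY memcache_updates;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER products_memcache_update
        AFTER INSERT OR UPDATE ON products
        FOR EACH ROW EXECUTE PROCEDURE queue_memcache_update();

Keeping the buffer in a transactional table sidesteps the rollback case: an
aborted transaction's queue rows and its NOTIFY simply never become visible.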





On Tue, Apr 04, 2006 at 12:24:42AM -0700, C Storm wrote:

I was wondering if anyone on the list has a successful installation of
pgmemcache running
that uses LISTEN/NOTIFY to signal a successfully completed transaction,
i.e., to get around the fact
that TRIGGERS are transaction unaware.  Or perhaps any other
information regarding a successful
deployment of pgmemcache.


The problem with attempting that is that you'd have a window between
transaction commit and when the cache was invalidated. If that's
acceptable then it shouldn't be too difficult to set something up using
LISTEN/NOTIFY like you describe.



