Re: [PERFORM] bad performance on Solaris 10
"Jignesh K. Shah" <[EMAIL PROTECTED]> writes: > Many times PostgreSQL has many things based on assumption that it will > run on Linux and it is left to Solaris to emulate that behavior. Au contraire --- PG tries its best to be OS-agnostic. I've personally resisted people trying to optimize it by putting in Linux-specific behavior. The above sounds to me like making excuses for a poor OS. (And yes, I will equally much resist any requests to put in Solaris- specific behavior...) regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] pg 7.4.x - pg_restore impossibly slow
"patrick keshishian" <[EMAIL PROTECTED]> writes: > My dev box is much slower hardware than the customer's > server. Even with that difference I expected to be able to > pg_restore the database within one day. But no. Seems a bit odd. Can you narrow down more closely which step of the restore is taking the time? (Try enabling log_statements.) One thought is that kicking up work_mem and vacuum_mem is likely to help for some steps (esp. CREATE INDEX and foreign-key checking). And be sure you've done the usual tuning for write-intensive activity, such as bumping up checkpoint_segments. Turning off fsync wouldn't be a bad idea either. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Inserts optimization?
Chris <[EMAIL PROTECTED]> writes:
> Francisco Reyes wrote:
>> Doing my first write heavy database.
>> What settings will help improve inserts?
>> Only a handful of connections, but each doing up to 30 inserts/second.
> If you can, use copy instead:
> http://www.postgresql.org/docs/8.1/interactive/sql-copy.html

Or at least try to do multiple inserts per transaction.

Also, increasing checkpoint_segments and possibly wal_buffers helps a
lot for write-intensive loads. Try to get the WAL onto a separate disk
spindle if you can. (These things don't matter for SELECTs, but they do
matter for writes.)

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq
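To make the batching point concrete: every standalone INSERT is its own
transaction and pays a WAL flush at commit, so grouping rows into an
explicit transaction amortizes that cost. A minimal sketch, with an
invented table:

    -- one transaction (and one WAL flush) per statement: slow
    INSERT INTO readings (sensor_id, val) VALUES (1, 42.0);

    -- one WAL flush for the whole batch: much faster for bulk loading
    BEGIN;
    INSERT INTO readings (sensor_id, val) VALUES (1, 42.0);
    INSERT INTO readings (sensor_id, val) VALUES (1, 42.5);
    -- ... a few hundred more rows ...
    COMMIT;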
Re: [PERFORM] Sun Fire T2000 and PostgreSQL 8.1.3
I am thinking the most flexible solution would be to get a dual Opteron
machine, and initially do both data loading and queries on the same
machine. When the load gets too high, buy a second machine and set it up
as a Slony slave and run your queries on that, and do the data loads on
the original machine as master.

---------------------------------------------------------------------------

Juan Casero (FL FLC) wrote:
> Because I plan to develop a rather large (for us anyway) data warehouse
> with PostgreSQL. I am looking for the right hardware that can handle
> queries on a database that might grow to over 100 gigabytes. Right now
> our decision support system based on postgresql 8.1.3 stores retail
> sales information for about four years back *but* only as weekly
> summaries. I want to build the system so it can handle daily sales
> transactions also. You can imagine how many more records this will
> involve, so I am looking for hardware that can give me the performance
> I need to make this project useable. In other words, parsing and
> loading the daily transaction logs for our stores is likely to take
> huge amounts of effort. I need a machine that can complete the task in
> a reasonable amount of time. As people start to query the database to
> find sales related reports and information I need to make sure the
> queries will run reasonably fast for them. I have already hand
> optimized all of my queries on the current system. But currently I only
> have weekly sales summaries. Other divisions in our company have done a
> similar project using MS SQL Server on SMP hardware far outclassing the
> database server I currently use, and they report heavy loads on the
> server with less than ideal query run times. I am sure I can do my part
> to optimize the queries once I start this project, but there is only so
> much you can do. At some point you just need more powerful hardware.
> This is where I am at right now. Apart from that, since I will only get
> this one chance to buy a new server for data processing, I need to make
> sure that I buy something that can grow over time as our needs change.
> I don't want to buy a server only to find out later that it cannot meet
> our needs with future database projects. I have to balance a limited
> budget, room for future performance growth, and current system
> requirements. Trust me, it isn't easy.
>
> Juan
>
> -----Original Message-----
> From: Josh Berkus [mailto:[EMAIL PROTECTED]
> Sent: Thursday, April 06, 2006 2:57 AM
> To: pgsql-performance@postgresql.org
> Cc: Juan Casero (FL FLC); Luke Lonergan
> Subject: Re: [PERFORM] Sun Fire T2000 and PostgreSQL 8.1.3
>
> Juan,
>
> > Ok that is beginning to become clear to me. Now I need to determine
> > if this server is worth the investment for us. Maybe it is not a
> > speed demon, but to be honest the licensing costs of an SMP aware
> > RDBMS are outside our budget.
>
> You still haven't explained why you want multi-threaded queries. This
> is sounding like keeping up with the Joneses.
>
> --
> Josh Berkus
> Aglio Database Solutions
> San Francisco
>
> ---(end of broadcast)---
> TIP 5: don't forget to increase your free space map settings

--
Bruce Momjian   http://candle.pha.pa.us
EnterpriseDB    http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] bad performance on Solaris 10
Bruce,

Hard to answer that... People like me who know and love PostgreSQL and
Solaris find this an opportunity to make their favorite database work
best on their favorite operating system.

Many times PostgreSQL has many things based on the assumption that it
will run on Linux, and it is left to Solaris to emulate that behavior.
That said, there are ways to improve performance even on UFS on Solaris;
it just requires more tweaks. Hopefully this will lead to a few
Solaris-friendly default values, like fsync/open_datasync :-)

Regards,
Jignesh

Bruce Momjian wrote:
> It is hard to imagine why people spend so much time modifying Sun
> machines to run with acceptable performance when non-Sun operating
> systems work fine without such hurdles.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq
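For readers wondering what such a Solaris-friendly setting might look
like: the WAL sync method is the usual suspect. A hedged illustration,
not an official recommendation -- wal_sync_method behavior is highly
platform-dependent and should be benchmarked before changing:

    # postgresql.conf -- candidate setting often tested on Solaris UFS
    wal_sync_method = open_datasync   # open the WAL with O_DSYNC instead
                                      # of the write-then-fsync default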
Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE
Jim C. Nasby wrote:
> > No, I don't agree with this. Too many people waste time designing for
> > "what if..." scenarios that never happen. You don't want to be dumb
> > and design something that locks out a foreseeable and likely future
> > need, but referential integrity doesn't meet this criterion. There's
> > nothing to keep you from changing from app-managed to
> > database-managed referential integrity if your needs change.
>
> In this case your argument makes no sense, because you will spend far
> more time re-creating RI capability inside an application than if you
> just use what the database offers natively.

But one of the specific conditions in my original response was, "You
have application-specific knowledge about when you can skip referential
integrity and thereby greatly improve performance." If you can't do
that, I agree with you.

Anyway, this discussion is probably going on too long, and I'm partly to
blame. I think we all agree that in almost all situations, using the
database to do referential integrity is the right choice, and that you
should only violate this rule if you have a really, really good reason,
and you've thought out the implications carefully, and you know you may
have to pay a steep price later if your requirements change.

Craig

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not match
[PERFORM] pg 7.4.x - pg_restore impossibly slow
Greetings,

I have a 395M pg_dump from a PostgreSQL 7.4.2 database. This dump is
from one of our customer's servers. There is a web-based administration
UI which has been reported to be extremely slow and unusable. To see
what's going on with their data I have grabbed a copy of their nightly
pg_dump output and am attempting to restore it on my development box,
running PostgreSQL 7.4.12.

My dev box is much slower hardware than the customer's server. Even
with that difference I expected to be able to pg_restore the database
within one day. But no. After leaving pg_restore running for about 2
days, I ctrl-C'ed out of it (see copy/paste below along with other
info). I must say that data was being restored, as I could do
select count(*) on tables which had their data restored and I would get
valid counts back.

The database contains 34 tables. The pg_restore seems to restore the
first 13 tables pretty quickly, but they do not have many records, the
largest among them having ~17,000 rows. Then the restore gets stuck on
a table with 2,175,050 rows. Following this table another table exists
with 2,160,616 rows.

One thing worth mentioning is that the PostgreSQL package that got
deployed lacked compression, as in:

$ pg_dump -Fc dbname > dbname.DUMP
pg_dump: [archiver] WARNING: requested compression not available in
this installation -- archive will be uncompressed

Any suggestions as to what may be the problem here? I doubt that the
minor version mismatch is what's causing this problem. (I am trying
this test on another machine with the same version of PostgreSQL
installed on it, and right now it is stuck on the first of the two huge
tables, and it has already been going for more than 2 hrs.)

I'm open to any ideas and/or suggestions (within reason) :)

Best regards,
--patrick

[EMAIL PROTECTED]:/tmp$ date
Mon Apr 10 15:13:19 PDT 2006
[EMAIL PROTECTED]:/tmp$ pg_restore -ad dbname customer_db.DUMP ; date
^C
[EMAIL PROTECTED]:/tmp$ date
Wed Apr 12 10:40:19 PDT 2006

[EMAIL PROTECTED]:/tmp$ uname -a
Linux devbox 2.4.31 #6 Sun Jun 5 19:04:47 PDT 2005 i686 unknown unknown
GNU/Linux

[EMAIL PROTECTED]:/tmp$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 8
model name      : Pentium III (Coppermine)
stepping        : 6
cpu MHz         : 731.477
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 mmx fxsr sse
bogomips        : 1461.45

[EMAIL PROTECTED]:/tmp$ cat /proc/meminfo
        total:    used:    free:  shared: buffers:   cached:
Mem:  527499264 523030528  4468736       0 10301440 384454656
Swap: 1579204608   552960 1578651648
MemTotal:       515136 kB
MemFree:          4364 kB
MemShared:           0 kB
Buffers:         10060 kB
Cached:         374984 kB
SwapCached:        460 kB
Active:          79004 kB
Inactive:       306560 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       515136 kB
LowFree:          4364 kB
SwapTotal:     1542192 kB
SwapFree:      1541652 kB

postgresql.conf changes on devbox:
checkpoint_segments = 10
log_pid = true
log_timestamp = true

The checkpoint_segments was changed to 10 after seeing many "HINT"s in
the PostgreSQL log file about it. Doesn't seem to have affected
pg_restore performance.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq
Re: [PERFORM] Inserts optimization?
Francisco Reyes wrote:
> Doing my first write heavy database.
> What settings will help improve inserts?
> Only a handful of connections, but each doing up to 30 inserts/second.
> Plan to have 2 to 3 clients which most of the time will not run at the
> same time, but occasionally it's possible two of them may bump into
> each other.

If you can, use copy instead:
http://www.postgresql.org/docs/8.1/interactive/sql-copy.html

MUCH quicker (and don't worry about using multiple clients).

--
Postgresql & php tutorials
http://www.designmagick.com/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

http://archives.postgresql.org
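A minimal sketch of the COPY approach, with invented table and file
names (server-side COPY reads a file visible to the backend; from a
client, psql's \copy or the driver's COPY-from-STDIN support streams
the same format):

    -- load tab-separated rows in one command, one transaction
    COPY readings (sensor_id, val) FROM '/tmp/readings.tsv';

    -- client-side equivalent from psql:
    --   \copy readings (sensor_id, val) from 'readings.tsv'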
[PERFORM] Inserts optimization?
Doing my first write heavy database.
What settings will help improve inserts?
Only a handful of connections, but each doing up to 30 inserts/second.
Plan to have 2 to 3 clients which most of the time will not run at the
same time, but occasionally it's possible two of them may bump into
each other.

If anyone recalls a previous thread like this, please suggest keywords
to search on. My search on this topic came back pretty empty.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] multi column query
On Wed, Apr 12, 2006 at 05:32:32PM -0700, Sriram Dandapani wrote:
> Hi
>
> When I update a table that has 20 columns and the where clause includes
> 16 of the columns (this is a data warehousing type update on aggregate
> fields),
>
> The bitmap scan is not used by the optimizer. The table is indexed on 3
> of the 20 fields. The update takes really long to finish (on a 6
> million row table)
>
> Do I need to do some "magic" with configuration to turn on bitmap scans?

No. What's explain analyze of the query show? What's it doing now?
Seqscan? You might try set enable_seqscan=off and see what that does.

--
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com      work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf     cell: 512-569-9461

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq
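A sketch of that experiment, with a hypothetical table standing in for
the one in the question (enable_seqscan=off is a session-local
diagnostic, not a production setting):

    EXPLAIN ANALYZE
    UPDATE daily_agg SET total = total + 1
     WHERE region = 4 AND product = 17;   -- plus the other WHERE columns

    SET enable_seqscan = off;             -- steer the planner off seqscans
    EXPLAIN ANALYZE
    UPDATE daily_agg SET total = total + 1
     WHERE region = 4 AND product = 17;   -- did an index/bitmap plan appear?
    SET enable_seqscan = on;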
Re: [PERFORM] pgmemcache
"Jim C. Nasby" <[EMAIL PROTECTED]> writes: > Why are AFTER COMMIT triggers impossible? What happens if such a trigger gets an error? You can't un-commit. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[PERFORM] multi column query
Hi

When I update a table that has 20 columns and the where clause includes
16 of the columns (this is a data warehousing type update on aggregate
fields), the bitmap scan is not used by the optimizer. The table is
indexed on 3 of the 20 fields. The update takes really long to finish
(on a 6 million row table).

Do I need to do some "magic" with configuration to turn on bitmap scans?
Re: [PERFORM] pgmemcache
On Wed, Apr 12, 2006 at 04:03:43PM -0700, Josh Berkus wrote:
> PFC,
>
> > It would be nice to have ON COMMIT triggers for this use.
> >
> > However you can emulate ON COMMIT triggers with a modification of the
> > memcache update process :
>
> Well, I'm back in touch with the GORDA project so possibly we can have
> BEFORE COMMIT triggers after all.
>
> BTW, it's important to note that AFTER COMMIT triggers are logically
> impossible, so please use BEFORE COMMIT so that it's clear what we're
> talking about.

Why are AFTER COMMIT triggers impossible? ISTM they would be useful as
a means to indicate to some external process if a transaction succeeded
or not. And for some things you might only want to fire the trigger
after you knew that the transaction had committed.

--
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com      work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf     cell: 512-569-9461

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq
Re: [PERFORM] pgmemcache
PFC,

> It would be nice to have ON COMMIT triggers for this use.
>
> However you can emulate ON COMMIT triggers with a modification of the
> memcache update process :

Well, I'm back in touch with the GORDA project so possibly we can have
BEFORE COMMIT triggers after all.

BTW, it's important to note that AFTER COMMIT triggers are logically
impossible, so please use BEFORE COMMIT so that it's clear what we're
talking about.

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [PERFORM] Better index strategy for many fields with few values
Adding -performance back in

-----Original Message-----
From: Oscar Picasso [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 12, 2006 5:51 PM
To: Jim Nasby
Subject: Re: [PERFORM] Better index strategy for many fields with few values

> I would like to try it. However, in another post I added that, contrary
> to what I stated initially, all the paramXX columns are not mandatory
> in the query. So it seems that requirement makes the problem more
> complex. Doesn't this new requirement rule out this solution?

No, just group the columns logically.

> By the way, I have tested indexing each column individually and checked
> what happens in relation to bitmap scans. My test table is 1 million
> rows. The explain analyze command shows that a bitmap scan is sometimes
> used, but I still end up with queries that can take up to 10s, which is
> way too much.
>
> "Jim C. Nasby" <[EMAIL PROTECTED]> wrote:
> > On Wed, Apr 12, 2006 at 02:59:32PM +0200, Markus Schaber wrote:
> > > > I was thinking about using a multicolumns index, but I have read
> > > > that we should limit multicolumn indices to at most 2 or 3
> > > > columns.
> > >
> > > Yes, that's true, the index overhead gets too high.
> > >
> > > > I was also thinking about using a functional index.
> > >
> > > If there's a logical relation between those values such that they
> > > can easily be combined, that may be a good alternative.
> >
> > How would that be any better than just doing a multi-column index?
> >
> > > I just had another weird idea:
> > >
> > > As your paramXX columns can take only 10 distinct values, it also
> > > might be feasible to use a bunch of 10 conditional indices, like:
> > >
> > > CREATE INDEX foo1 ON items (param1, param2) WHERE param0='1st value';
> > > CREATE INDEX foo2 ON items (param1, param2) WHERE param0='2nd value';
> > > CREATE INDEX foo3 ON items (param1, param2) WHERE param0='3rd value';
> > > [...]
> >
> > Not all that weird; it's known as index partitioning.
> > --
> > Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
> > Pervasive Software      http://pervasive.com      work: 512-231-6117
> > vcard: http://jim.nasby.net/pervasive.vcf     cell: 512-569-9461
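To illustrate "group the columns logically" in a hedged way: instead of
one wide index, several small multi-column indexes over parameters that
tend to be filtered together, which the 8.1 planner can combine with a
bitmap AND. The column groupings here are invented; the items schema is
from earlier in the thread:

    CREATE INDEX items_code_prop ON items (code, property);
    CREATE INDEX items_p01_p02   ON items (param01, param02);
    CREATE INDEX items_p03_p04   ON items (param03, param04);
    -- ...continue for the remaining paramXX columns; the planner
    -- BitmapAnds whichever indexes match the query's WHERE clause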
Re: [PERFORM] bad performance on Solaris 10
People,

> Lately I find people are not so receptive to VxFS, and Sun is promoting
> ZFS, and we don't have a reasonable near term option for Raw IO in
> Postgres, so we need to work to find a reasonable path for Solaris
> users IMO. The long delays in ZFS production haven't helped us there,
> as the problems with UFS are severe.

FWIW, I'm testing on ZFS now. But it's not stable yet. People are
welcome to join the Solaris 11 beta program.

In the near term, there are fixes to be made both in PostgreSQL
configuration and in Solaris configuration. Also, some of the work being
done for 8.2 ... the external sort work done by Simon and sponsored by
GreenPlum, and the internal sort work which Jonah and others are
doing ... will improve things on Solaris, as our sort issues hit Solaris
harder than other OSes.

Expect lots more info on performance config for Solaris from me & Robert
in the next few weeks.

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [PERFORM] Better index strategy for many fields with few
Oscar,

On 4/10/06 9:58 AM, "Oscar Picasso" <[EMAIL PROTECTED]> wrote:

> - My items table:
> code int -- can take one of 100 values
> property varchar(250) -- can take one of 5000 values
> param01 char(10) -- can take one of 10 values
> param02 char(10) -- can take one of 10 values
> ...
> [ 20 similar columns ]
> ...
> param20 char(10) -- can take one of 10 values
>
> - The kind of query I want to optimize:
> select * from items
> where code between 5 and 22
> and param01 = 'P'
> and param02 = 'G'
> ...
> [ all the 20 paramXX columns are used in the query ]
> ...
> and param20 = 'C';

Bizgres 0.9 has an on-disk bitmap index which will likely improve this
query speed by a very large amount over normal Postgres 8.1.

- Luke

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] bad performance on Solaris 10
Bruce,

On 4/12/06 12:56 PM, "Bruce Momjian" wrote:

> It is hard to imagine why people spend so much time modifying Sun
> machines to run with acceptable performance when non-Sun operating
> systems work fine without such hurdles.

There are a lot of Solaris customers that we support and that we'd like
to support. To many of them, Solaris has many advantages other than
speed, though they expect a reasonably comparable performance, perhaps
within a factor of 2 of other options. Oracle has spent a great deal of
time (a decade!) optimizing their software for Solaris, and it shows.

There are also some typical strategies that Solaris people used to use
to make Solaris perform better, like using VxFS (Veritas Filesystem) or
Oracle Raw IO to make their systems perform better. Lately I find people
are not so receptive to VxFS, and Sun is promoting ZFS, and we don't
have a reasonable near term option for Raw IO in Postgres, so we need to
work to find a reasonable path for Solaris users IMO. The long delays in
ZFS production haven't helped us there, as the problems with UFS are
severe.

We at Greenplum have worked hard over the last year to find options for
Postgres on Solaris and have the best configuration setup that we think
is possible now on UFS, and our customers benefit from that. However,
Linux on XFS or even ext3 is definitely the performance leader.

- Luke

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE
On Wed, Apr 12, 2006 at 10:36:28AM -0700, Craig A. James wrote:
> Jim C. Nasby wrote:
> >> 1. You have only one application that modifies the data. (Otherwise,
> >> you have to duplicate the rules across many applications, leading to
> >> a code-maintenance nightmare).
> >
> > You forgot something:
> >
> > 1a: You know that there will never, ever, ever, ever, be any other
> > application that wants to talk to the database.
> >
> > I know tons of people that get burned because they go with something
> > that's "good enough for now", and then regret that decision for years
> > to come.
>
> No, I don't agree with this. Too many people waste time designing for
> "what if..." scenarios that never happen. You don't want to be dumb and
> design something that locks out a foreseeable and likely future need,
> but referential integrity doesn't meet this criterion. There's nothing
> to keep you from changing from app-managed to database-managed
> referential integrity if your needs change.

In this case your argument makes no sense, because you will spend far
more time re-creating RI capability inside an application than if you
just use what the database offers natively.

It's certainly true that you don't want to over-engineer for no reason,
but many times choices are made to save a very small amount of time or
hassle up-front, and those choices become extremely painful later.

--
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com      work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf     cell: 512-569-9461

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly
Re: [PERFORM] Better index strategy for many fields with few values
On Wed, Apr 12, 2006 at 02:59:32PM +0200, Markus Schaber wrote:
> > I was thinking about using a multicolumns index, but I have read that
> > we should limit multicolumn indices to at most 2 or 3 columns.
>
> Yes, that's true, the index overhead gets too high.
>
> > I was also thinking about using a functional index.
>
> If there's a logical relation between those values such that they can
> easily be combined, that may be a good alternative.

How would that be any better than just doing a multi-column index?

> I just had another weird idea:
>
> As your paramXX columns can take only 10 distinct values, it also might
> be feasible to use a bunch of 10 conditional indices, like:
>
> CREATE INDEX foo1 ON items (param1, param2) WHERE param0='1st value';
> CREATE INDEX foo2 ON items (param1, param2) WHERE param0='2nd value';
> CREATE INDEX foo3 ON items (param1, param2) WHERE param0='3rd value';
> [...]

Not all that weird; it's known as index partitioning.

--
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com      work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf     cell: 512-569-9461

---(end of broadcast)---
TIP 4: Have you searched our list archives?

http://archives.postgresql.org
Re: [PERFORM] bad performance on Solaris 10
Luke Lonergan wrote:
> Alvaro,
>
> On 4/5/06 2:48 PM, "Alvaro Herrera" <[EMAIL PROTECTED]> wrote:
>
> > This essentially means stopping all bgwriter activity, thereby
> > deferring all I/O until checkpoint. Was this considered? With
> > checkpoint_segments at 128, it wouldn't surprise me if there wasn't
> > any checkpoint executed at all during the whole test ...
>
> Yes, many things about the Solaris UFS filesystem caused a great deal
> of pain over the 10 months of experiments we ran with Sun MDE.
> Ultimately, the conclusion was that ZFS is going to make all of the
> pain go away.
>
> In the meantime, all you can do is tweak up UFS and avoid I/O as much
> as possible.

It is hard to imagine why people spend so much time modifying Sun
machines to run with acceptable performance when non-Sun operating
systems work fine without such hurdles.

--
Bruce Momjian   http://candle.pha.pa.us
EnterpriseDB    http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE
Jim C. Nasby wrote:
>> 1. You have only one application that modifies the data. (Otherwise,
>> you have to duplicate the rules across many applications, leading to a
>> code-maintenance nightmare).
>
> You forgot something:
>
> 1a: You know that there will never, ever, ever, ever, be any other
> application that wants to talk to the database.
>
> I know tons of people that get burned because they go with something
> that's "good enough for now", and then regret that decision for years
> to come.

No, I don't agree with this. Too many people waste time designing for
"what if..." scenarios that never happen. You don't want to be dumb and
design something that locks out a foreseeable and likely future need,
but referential integrity doesn't meet this criterion. There's nothing
to keep you from changing from app-managed to database-managed
referential integrity if your needs change. Design for your current
requirements.

Let us be of good cheer, remembering that the misfortunes hardest to
bear are those which never happen.
   - James Russell Lowell (1819-1891)

Therefore do not be anxious about tomorrow, for tomorrow will be anxious
for itself. Let the day's own trouble be sufficient for the day.
   - Matthew 6:34

Craig

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE
On Wed, Apr 12, 2006 at 07:45:17AM -0700, Craig A. James wrote:
> All good advice, but... there are no absolutes in this world.
> Application-enforced referential integrity makes sense if (and probably
> ONLY if):
>
> 1. You have only one application that modifies the data. (Otherwise,
> you have to duplicate the rules across many applications, leading to a
> code-maintenance nightmare).

You forgot something:

1a: You know that there will never, ever, ever, ever, be any other
application that wants to talk to the database.

I know tons of people that get burned because they go with something
that's "good enough for now", and then regret that decision for years
to come.

> 2. If your application crashes and leaves a mess, it's not a
> catastrophe, and you have a good way to clean it up. For example, a
> bank shouldn't do this, but it might be OK for a computer-aided-design
> application, or the backend of a news web site.
>
> 3. You have application-specific knowledge about when you can skip
> referential integrity and thereby greatly improve performance. For
> example, you may have batch operations where large numbers of rows are
> temporarily inconsistent.
>
> If your application doesn't meet ALL of these criteria, you probably
> should use the database for referential integrity.
>
> Craig
>
> ---(end of broadcast)---
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq

--
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com      work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf     cell: 512-569-9461

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly
Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE
On Wed, Apr 12, 2006 at 09:22:52AM +0200, PFC wrote:
> >> I think this is an old question, but I want to know if it really is
> >> well worth it to not create some foreign keys and deal with the
> >> referential integrity at application-level?
>
> Trust me: do it in the application and you'll enter a world of hurt.
> I'm doing it with some mysql apps, and it's a nightmare; doing cascaded
> deletes by hand, etc, you always forget something, you have to modify a
> million places in your code every time you add a new table, your ORM
> bloats, you get to write cleanup cron scripts which take forever to
> run, your website crashes, etc.

Well, yeah, that's typical for MySQL sites, but what's that have to do
with RI?

Sorry, couldn't resist. :P

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE
On Wed, 2006-04-12 at 09:49, Rodrigo Sakai wrote:
> Thanks for all help!! But my problem is with performance, I agree with
> all of you, the RI must be maintained by the database, because of a
> bunch of reasons that everyone knows!
> But I'm dealing with a very huge database that serves more than 200
> clients at the same time, and because of it, each manipulation (delete,
> insert, update, select) on the database has poor performance. So, if we
> deal with RI in each client station, we take this work off the
> database!
> The application is an ERP developed with DELPHI + (postgresql or oracle
> or sql server)!!

These are separate issues.

One is performance of PostgreSQL handling FK->PK relationships.
PostgreSQL, in my experience, is quite fast at this. However, there are
ways you can set up FK->PK relationships that are non-optimal and will
result in poor performance. FK->PK relationships are generally fastest
when they are 1-1 and based on integer types. If there's a type
mismatch, or you use slower types, like large text fields or numerics,
you may have poor performance. Give us a sample of your schema where
you're having problems; let us help you troubleshoot your performance.

High parallel load is another issue. No matter where you put your FK->PK
relationship handling, having 200+ users connected at the same time and
manipulating your database is a heavy load.

Handling FK->PK relationships in software often is vulnerable to race
conditions, like so (T1 and T2 are different "threads"):

T1: select id from mastertable where id=99; -- check for row
T2: delete from mastertable where id=99;    -- delete a row
T1: insert into slavetable values (...);    -- whoops! No master

If we change T1 to select for update, we now have the overhead that most
FK->PK relationships have.

What version of PostgreSQL are you running? Older versions had much
poorer performance than newer versions when updating FK->PK
relationships.

Don't assume that application level FK->PK relationships will be faster
AND as good as the ones at database level. It's quite possible that
they're faster for you because you're cutting corners, referentially
speaking, and your data will wind up incoherent over time.

Also, you may be dealing with a database that is IO bound, and moving
the FK checks to software is only a short stop gap; as the machine hits
the IO performance ceiling, you'll have the same problem again, need a
bigger machine, and have incoherent data. I.e. the same problem, plus a
few more, having spent a lot of time spinning your wheels going a short
distance.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not match
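A sketch of the select-for-update variant, using the table names from
the race-condition example above (illustrative only -- the column list
is invented, and real FK enforcement also has to cover UPDATEs,
cascades, and so on):

    BEGIN;
    -- lock the parent row; a concurrent DELETE now blocks until we finish
    SELECT id FROM mastertable WHERE id = 99 FOR UPDATE;
    -- the parent is guaranteed to still exist, so this insert is safe
    INSERT INTO slavetable (master_id, payload) VALUES (99, 'detail row');
    COMMIT;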
Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE
What kinds of operations are made slow by foreign key checks? Is it:
- Simple checks on INSERT?
- Simple checks on UPDATE?
- Cascaded deletes?
- Cascaded updates?
- Locks?
- Something else?

Foreign keys are there to ensure that the value in a column is always
part of a specific set (the referenced table). If this set changes very
rarely, like the list of countries in the world, or the list of states
in a country, or the various possible states an order can be in
(received, processed, shipped...), then it's alright to skip the check
if you're sure your application will insert a valid value. For some
other situations, doing the check in the application will need some
queries and could be slower (and certainly will be more complicated...).

Are you sure you're not missing a few indexes, which would then force
fkey checks to use sequential scans? (See the indexing sketch below the
quoted message.)

> Thanks for all help!! But my problem is with performance, I agree with
> all of you, the RI must be maintained by the database, because of a
> bunch of reasons that everyone knows!
> But I'm dealing with a very huge database that serves more than 200
> clients at the same time, and because of it, each manipulation (delete,
> insert, update, select) on the database has poor performance. So, if we
> deal with RI in each client station, we take this work off the
> database!
> The application is an ERP developed with DELPHI + (postgresql or oracle
> or sql server)!!

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not match
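On that last point: PostgreSQL indexes the referenced primary key
automatically (via the PK constraint) but not the referencing column,
so cascaded deletes and updates on the parent scan the child table
unless you index the FK column yourself. A hedged illustration with
invented tables:

    CREATE TABLE orders      (id int PRIMARY KEY);
    CREATE TABLE order_lines (
        order_id int REFERENCES orders (id) ON DELETE CASCADE,
        qty      int
    );

    -- without this, each DELETE FROM orders seqscans order_lines
    CREATE INDEX order_lines_order_id ON order_lines (order_id);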
Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE
On Apr 12, 2006, at 23:49, Rodrigo Sakai wrote:

> Thanks for all help!! But my problem is with performance, I agree with
> all of you, the RI must be maintained by the database, because of a
> bunch of reasons that everyone knows!

You've gotten a variety of good advice from a number of people. For more
specific advice (e.g., for your particular situation), it would be very
helpful if you could provide examples of queries that aren't performing
well for you (including table schema and explain analyze output).

Michael Glaesemann
grzm myrealbox com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

http://archives.postgresql.org
Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE
On 4/11/06, Rodrigo Sakai <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I think this is an old question, but I want to know if it really is
> well worth it to not create some foreign keys and deal with the
> referential integrity at application-level?
> Specifically, the system we are developing is a server/client
> architecture where the server is the database and the fat client is an
> application developed in DELPHI!!!
>
> Thanks in advance!!

Delphi IMO is the best RAD win32 IDE ever invented (especially when
paired with the very excellent Zeos connection objects). However, for
purposes of data management, imperative languages, Delphi included,
simply suck.

Great form editor though.

merlin

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE
Thanks for all help!! But my problem is with performance, I agree with
all of you, the RI must be maintained by the database, because of a
bunch of reasons that everyone knows!

But I'm dealing with a very huge database that serves more than 200
clients at the same time, and because of it, each manipulation (delete,
insert, update, select) on the database has poor performance. So, if we
deal with RI in each client station, we take this work off the database!
The application is an ERP developed with DELPHI + (postgresql or oracle
or sql server)!!

Thanks again!!

----- Original Message -----
From: "Markus Schaber" <[EMAIL PROTECTED]>
To:
Cc: "Rodrigo Sakai" <[EMAIL PROTECTED]>
Sent: Wednesday, April 12, 2006 10:18 AM
Subject: Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE

> Hi, Michael,
> Hi, Rodrigo,
>
> Michael Glaesemann wrote:
>
> > If I had to choose between one or the other, I'd leave all
> > referential integrity in the database and deal with the errors thrown
> > when referential integrity is violated in the application. PostgreSQL
> > is designed to handle these kinds of issues. Anything you code in
> > your application is more likely to contain bugs or miss corner cases
> > that would allow referential integrity to be violated. PostgreSQL has
> > been pounded on for years by a great many users and developers,
> > making the likelihood of bugs still remaining much smaller.
>
> I strictly agree with Michael here.
>
> > Of course, you can add some referential integrity checks in your
> > application code, but those should be in addition to your database-
> > level checks.
>
> Agree. It does make sense to have reference checks in the UI or
> application level for the sake of better error handling, but the
> database should be the mandatory judge.
>
> There's another advantage of database-based checking: Should there ever
> be the need for a different application working on the same database
> (e.g. an "expert-level UI", or some connector that connects /
> synchronizes to another software, or a data import tool),
> database-based constraints cannot be broken, as opposed to
> application-based ones.
>
> HTH,
> Markus
> --
> Markus Schaber | Logical Tracking&Tracing International AG
> Dipl. Inf.     | Software Development GIS
>
> Fight against software patents in EU! www.ffii.org
> www.nosoftwarepatents.org

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE
>> I think this is an old question, but I want to know if it really is
>> well worth it to not create some foreign keys and deal with the
>> referential integrity at application-level?
>
> Trust me: do it in the application and you'll enter a world of hurt.
> I'm doing it with some mysql apps, and it's a nightmare; doing cascaded
> deletes by hand, etc, you always forget something, you have to modify a
> million places in your code every time you add a new table, your ORM
> bloats, you get to write cleanup cron scripts which take forever to
> run, your website crashes, etc.

All good advice, but... there are no absolutes in this world.
Application-enforced referential integrity makes sense if (and probably
ONLY if):

1. You have only one application that modifies the data. (Otherwise, you
have to duplicate the rules across many applications, leading to a
code-maintenance nightmare).

2. If your application crashes and leaves a mess, it's not a
catastrophe, and you have a good way to clean it up. For example, a bank
shouldn't do this, but it might be OK for a computer-aided-design
application, or the backend of a news web site.

3. You have application-specific knowledge about when you can skip
referential integrity and thereby greatly improve performance. For
example, you may have batch operations where large numbers of rows are
temporarily inconsistent.

If your application doesn't meet ALL of these criteria, you probably
should use the database for referential integrity.

Craig

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq
Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE
Hi, Michael,
Hi, Rodrigo,

Michael Glaesemann wrote:

> If I had to choose between one or the other, I'd leave all referential
> integrity in the database and deal with the errors thrown when
> referential integrity is violated in the application. PostgreSQL is
> designed to handle these kinds of issues. Anything you code in your
> application is more likely to contain bugs or miss corner cases that
> would allow referential integrity to be violated. PostgreSQL has been
> pounded on for years by a great many users and developers, making the
> likelihood of bugs still remaining much smaller.

I strictly agree with Michael here.

> Of course, you can add some referential integrity checks in your
> application code, but those should be in addition to your database-
> level checks.

Agree. It does make sense to have reference checks in the UI or
application level for the sake of better error handling, but the
database should be the mandatory judge.

There's another advantage of database-based checking: Should there ever
be the need for a different application working on the same database
(e.g. an "expert-level UI", or some connector that connects /
synchronizes to another software, or a data import tool), database-based
constraints cannot be broken, as opposed to application-based ones.

HTH,
Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org
www.nosoftwarepatents.org

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Sequential scan instead of using index
Thanks Mark,

> Hmm - that first query needs to do a sort, so you might want to
> experiment with the sort_mem parameter. Could you show us output from
> explain analyze for both the above queries?

Not too concerned about the sort; more concerned about the query
performance with the seq scan as the table's size increases.

> At face value, selecting 20 rows (assuming the estimates are accurate)
> may mean that a seqscan is the best plan! But we'll know more after
> seeing the explain analyze...

20 rows is about right. I saw Tom's response on the planner improvement
in 8.2, but I was still going to send the explain analyze output.
However, I can't show you explain analyze. The postmaster goes to 99%
cpu and stays there. The explain analyze command hangs...

It is starting to look like inheritance does help in modeling the data,
but for searches, parallel flat tables that don't use inheritance are
required to get optimum query performance. Has anyone else come to this
conclusion?

Thanks

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Better index strategy for many fields with few values
Hi, Oscar,

Oscar Picasso wrote:
> [ all the 20 paramXX columns are used in the query ]
> How can I optimize this kind of query?

PostgreSQL 8.1 has so-called bitmap index scans, which can combine
several index scans before actually accessing the data. So I think it's
best to create an index on each of the paramXX columns, and see with
EXPLAIN ANALYZE what it is doing.

> I was thinking about using a multicolumns index, but I have read that
> we should limit multicolumn indices to at most 2 or 3 columns.

Yes, that's true, the index overhead gets too high.

> If that's true then 22 columns for a multicolumn index seems way too
> much. Or maybe it is workable as every column uses only a very limited
> set of values?

Yes, I think that a 22 column index is way too much, especially with the
new bitmap index scans available.

> I was also thinking about using a functional index.

If there's a logical relation between those values such that they can
easily be combined, that may be a good alternative.

I just had another weird idea:

As each of your paramXX columns can take only 10 distinct values, it
also might be feasible to use a bunch of 10 conditional indices, like:

CREATE INDEX foo1 ON items (param1, param2) WHERE param0='1st value';
CREATE INDEX foo2 ON items (param1, param2) WHERE param0='2nd value';
CREATE INDEX foo3 ON items (param1, param2) WHERE param0='3rd value';
[...]

This way, you don't have the index bloat of a 3-column index, but 10
2-column indices that cover 1/10th of the table each.

For 22 columns, you'd need a bunch of seven such indices plus a
single-column one, or you can use some 3+1 and some 2+1 column indices.

I'd like to see the query plans from explain analyze.

Btw, I expect query planning time to get rather significant for so many
columns, so geqo tuning, tuning work_mem (for the bitmap scans) and
prepared statements will pay off.

HTH,
Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org
www.nosoftwarepatents.org

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
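A hedged sketch of how to confirm the planner actually picks one of
those conditional indices (table and values follow the examples above;
the query's predicate must match the index's WHERE clause for the index
to qualify):

    EXPLAIN ANALYZE
    SELECT * FROM items
     WHERE param0 = '1st value'   -- matches foo1's predicate
       AND param1 = 'P'
       AND param2 = 'G';
    -- expect an Index Scan or Bitmap Index Scan on foo1, not a Seq Scan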
Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE
> I think this is an old question, but I want to know if it really is
> well worth it to not create some foreign keys and deal with the
> referential integrity at application-level?

Trust me: do it in the application and you'll enter a world of hurt.
I'm doing it with some mysql apps, and it's a nightmare; doing cascaded
deletes by hand, etc, you always forget something, you have to modify a
million places in your code every time you add a new table, your ORM
bloats, you get to write cleanup cron scripts which take forever to run,
your website crashes, etc.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] pgmemcache
It would be nice to have ON COMMIT triggers for this use.

However, you can emulate ON COMMIT triggers with a modification of the
memcache update process:
- A standard trigger sends the data to update to memcache
- The trigger also sends the PID
- Instead of being used immediately, this data is kept in a buffer
- NOTIFY is issued

On commit:
- postgres sends the NOTIFY signal
- the memcache updater reads the NOTIFY (which embeds the PID, I
  believe); it finds the buffered data sent above and uses it to update
  memcached

On rollback:
- Interesting problem ;)))

OK, it's a kludge. When can we get ON COMMIT triggers? (A sketch of the
trigger side of this scheme follows at the end of this message.)

> On Tue, Apr 04, 2006 at 12:24:42AM -0700, C Storm wrote:
> > I was wondering if anyone on the list has a successful installation
> > of pgmemcache running that uses LISTEN/NOTIFY to signal a
> > successfully completed transaction, i.e., to get around the fact that
> > TRIGGERS are transaction unaware. Or perhaps any other information
> > regarding a successful deployment of pgmemcache.
>
> The problem with attempting that is that you'd have a window between
> transaction commit and when the cache was invalidated. If that's
> acceptable then it shouldn't be too difficult to set something up using
> LISTEN/NOTIFY like you describe.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly
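A minimal sketch of the trigger side of that scheme, using a table-backed
buffer instead of the in-daemon buffer described above (all names are
invented, and an items table with id and val columns is assumed). The key
point is that both the queued rows and the NOTIFY are transactional in
PostgreSQL: nothing reaches the memcache updater unless the transaction
actually commits, which also sidesteps the rollback problem:

    -- staging buffer polled by the external memcache-updater daemon
    CREATE TABLE cache_queue (
        pid       int,
        cache_key text,
        payload   text
    );

    CREATE FUNCTION queue_cache_update() RETURNS trigger AS $$
    BEGIN
        -- buffer the update instead of touching memcached directly
        INSERT INTO cache_queue
            VALUES (pg_backend_pid(), 'item:' || NEW.id, NEW.val);
        NOTIFY cache_dirty;   -- delivered to listeners only on COMMIT
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER items_cache AFTER INSERT OR UPDATE ON items
        FOR EACH ROW EXECUTE PROCEDURE queue_cache_update();

    -- The daemon runs LISTEN cache_dirty; when woken, it selects and
    -- deletes cache_queue rows, pushes them into memcached, and commits.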