Re: [GENERAL] Moving to postgresql and some ignorant questions
On 8/14/07, Phoenix Kiula <[EMAIL PROTECTED]> wrote:

> Thank you for this detailed explanation Alban. But I want to include
> FK constraints on a table2 on a column in the referenced table1 where
> column values are not unique.
>
> I just want row data to be consistent for the same ID. Yes, this is
> repetitive and enough to rile DB purists, but it has its uses (for
> performance in certain reporting queries).

Then you need to define a lookup table, and have both of your tables reference it by foreign key. You can create an update trigger on one of the child tables to put a row into the lookup table if it doesn't exist. If I'm not clear, let me know.

> 1. Should I explore views for this? I am very skeptical about them
> coming from MySQL as the performance of MySQL views is horrendous.
> Besides, if they are updated every time, there's little use for a view
> in the first place; I may as well simply query the table -- or is this
> wrong? The UPDATE only locks and commits to the table, and then the
> view gets auto updated?

In pgsql, views are actually empty tables that are defined by a SQL statement and fired by rules when you select from the view. I.e. "create view abc as select * from xyz" becomes an empty table abc which has a rule for selects that runs "select * from xyz" when you access it. The performance of "select * from abc" will be almost exactly the same as "select * from xyz", except for some very small overhead from the rules engine.

The real use for views is to allow you to reduce query complexity in the client. Suppose you have a query that joins and/or unions a dozen tables with really complex join logic. You can just wrap it in a view, and when you select from the view, PostgreSQL will execute the real query behind it as though you had passed it in yourself.

> 2. Or, I could do this with triggers, and now I realize also with
> "rules" (CREATE RULE). Which are faster, rules or triggers; are they
> similar in speed? Basically I want the rule/trigger to cascade the
> update to table1.col1 and table1.col2 to similar columns in table2.

I think you're talking about updatable views, which you can build with PostgreSQL. Rules can let you do this pretty easily.

> I just wanted to know what the usual thoughts on this are. Are views
> updated as soon as its underlying table(s) are updated?

Yep, because views are just enclosed queries.

Note that you CAN do materialized views with pgsql. Once you've gotten familiar with regular PostgreSQL stuff, look up materialized views for PostgreSQL again on Google. They're not that hard really, but most of the time you really don't need them.

> Would appreciate any thoughts on performance of views. PGSQL seems to
> treat views just like tables, so I wonder if there's any performance
> gain!

Nope, but no great loss either.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
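Scott's lookup-table suggestion above might be sketched like this. This is a hypothetical illustration: all table and column names (`lookup`, `table1`, `table2`, `col1`, `col2`) are invented for the example.

```sql
-- Shared attributes live once, in a lookup table with a proper key.
CREATE TABLE lookup (
    id   integer PRIMARY KEY,
    col1 text NOT NULL,
    col2 text NOT NULL
);

-- Both tables reference the lookup by foreign key, so rows for the
-- same id can never disagree about col1/col2.
CREATE TABLE table1 (
    id      integer NOT NULL REFERENCES lookup (id) ON UPDATE CASCADE,
    payload text
);

CREATE TABLE table2 (
    id     integer NOT NULL REFERENCES lookup (id) ON UPDATE CASCADE,
    report text
);
```

Reporting queries would then JOIN against `lookup` instead of carrying duplicated columns; alternatively, duplicates can be kept in sync via the update trigger Scott describes.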
Re: [GENERAL] Moving to postgresql and some ignorant questions
On 8/14/07, Phoenix Kiula <[EMAIL PROTECTED]> wrote:

> > *And* you can define compound foreign key constraints,
>
> Thank you for this detailed explanation Alban. But I want to include
> FK constraints on a table2 on a column in the referenced table1 where
> column values are not unique.
>
> I just want row data to be consistent for the same ID. Yes, this is
> repetitive and enough to rile DB purists, but it has its uses (for
> performance in certain reporting queries).

I'm not sure I understand this. If the set of values you want to reference is not unique, what are the semantics of this reference? What should happen if one of those sets gets deleted, for instance?

Perhaps you mean table1 has columns "id" and "col2", and the "col2" values are not unique, but "id" and "col2" together are? In that case you can simply put a UNIQUE constraint on them together, and use both columns in a single foreign key constraint from table2.

> 1. Should I explore views for this? I am very skeptical about them
> coming from MySQL as the performance of MySQL views is horrendous.
> Besides, if they are updated every time, there's little use for a view
> in the first place; I may as well simply query the table -- or is this
> wrong? The UPDATE only locks and commits to the table, and then the
> view gets auto updated?

A view is simply an alternate presentation of data in one or more tables. It's a persistent query, and the performance is effectively the same as the query itself. (More on "materialized views" below.)

In regard to locking, it sounds like you're thinking in MyISAM terms; PostgreSQL uses MVCC instead of table locking. Two UPDATEs attempted at the same time may cause one to wait for the other (if both touch the same rows), but during this time all readers (e.g. SELECT) will continue to run without waiting. If you were running into concurrent performance issues with MyISAM due to locking, PostgreSQL might surprise you.

> 2. Or, I could do this with triggers, and now I realize also with
> "rules" (CREATE RULE). Which are faster, rules or triggers; are they
> similar in speed? Basically I want the rule/trigger to cascade the
> update to table1.col1 and table1.col2 to similar columns in table2.

They are simply different mechanisms. Rules rewrite a query during the parsing phase, and then run the eventual query tree as if you'd entered it yourself. Triggers are procedural actions fired in response to events. Normally you pick one based on the semantics of what you want to do. If you can use foreign keys as above, ON UPDATE CASCADE will probably do what you want without your having to do anything else.

> I will surely be exploring views, and reading more of this:
> http://www.postgresql.org/docs/8.2/interactive/rules-views.html , but
> I just wanted to know what the usual thoughts on this are. Are views
> updated as soon as its underlying table(s) are updated? Can I control
> the duration or timing of their update? I searched for "materialized
> views", after having seen that word on the performance list, but most
> of the search results and the discussions on that forum are beyond my
> comprehension!!

Yeah, you're confusing standard views with "materialized views". A standard view is basically a persistent query, and does not store any data itself. PostgreSQL implements them with rules. When you SELECT against a view, the query rewriter combines your conditions with the stored SELECT used to create the view, and runs the final query against the table(s) you created the view from, just as if you'd entered it yourself. If the original query against the table(s) is complex, a view makes a very nice way to simplify things for applications.

If you want to create an "updatable view", where applications can INSERT/UPDATE/DELETE against the view itself, you need to add rules to it that translate operations on the presented view columns into operations on the table(s) behind them.

A "materialized view" is basically a view that stores copies of its data. This is useful in situations where you need some kind of cache, possibly because the query behind the view takes a long time to complete. PostgreSQL does not have built-in support for materialized views, but rules and triggers can be used to create them. The discussions you encountered revolve around the finer points of doing that.

> Would appreciate any thoughts on performance of views. PGSQL seems to
> treat views just like tables, so I wonder if there's any performance
> gain!

In general, a view is performance neutral: it's just a mechanism for simplifying presentation of data. It also has uses for security, since you can create a view and grant roles access to it while still denying them access to the underlying tables. This could be used to hide a particular table column, for instance. It doesn't sound like this is of any use for your application, though.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Moving to postgresql and some ignorant questions
"Phoenix Kiula" <[EMAIL PROTECTED]> writes: > There are some cases where I would like to bunch queries into a > transaction purely for speed purposes, but they're not interdependent > for integrity. E.g., How do you know you need to do this for speed if you haven't run it yet? I would suggest you build your application around the application needs first, then later look at how to optimize it. Remember the two rules of optimization: 1) Don't 2) (for experts only) Don't yet The only case where you should consider batching together transactions like that is if you're processing a batch data load of some kind. In that case you have a large volume of updates and they're all single-threaded. But usually in that case you want to abort the whole load if you have a problem. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Moving to postgresql and some ignorant questions
> You could do this with savepoints, which are a kind of sub-transaction inside a
> "bigger" transaction, e.g.:
>
> BEGIN TRANSACTION;
>
> SAVEPOINT sp1;
> UPDATE1;
> IF (failed) ROLLBACK TO SAVEPOINT sp1;
>
> SAVEPOINT sp2;
> UPDATE2;
> IF (failed) ROLLBACK TO SAVEPOINT sp2;

Thanks Thomas, this is a great feature. Even if I am not looking for it right now, I bet I can use it at some point!

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Moving to postgresql and some ignorant questions
On 8/14/07, Bill Moran <[EMAIL PROTECTED]> wrote:

> But the rule is, if any query within the transaction errors, then all queries
> within the transaction are rolled back.

This is the default behaviour, but with psql's ON_ERROR_ROLLBACK variable the behaviour can be changed. See http://www.postgresql.org/docs/8.2/interactive/app-psql.html

Regards
MP
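With ON_ERROR_ROLLBACK, psql silently sets a savepoint before each statement, so one failed statement no longer poisons the whole transaction. A hedged sketch of an interactive psql session, assuming a hypothetical table t with an integer column val:

```sql
-- In psql (this is a psql variable, not server-side SQL):
\set ON_ERROR_ROLLBACK interactive

BEGIN;
UPDATE t SET val = 1 WHERE id = 1;   -- succeeds
UPDATE t SET val = 'oops';           -- fails (bad integer), rolled back alone
UPDATE t SET val = 3 WHERE id = 3;   -- still runs
COMMIT;                              -- commits the two good updates
```

This only changes psql's behaviour; applications using other drivers would need to issue their own SAVEPOINTs to get the same effect.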
Re: [GENERAL] Moving to postgresql and some ignorant questions
Phoenix Kiula wrote on 14.08.2007 19:46:

> There are some cases where I would like to bunch queries into a
> transaction purely for speed purposes, but they're not interdependent
> for integrity. E.g.,
>
> BEGIN TRANSACTION;
> UPDATE1;
> UPDATE2;
> UPDATE3;
> COMMIT;
>
> If UPDATE2 fails because it, say, violates a foreign key constraint,
> then so be it. I want UPDATE3 to go ahead. Is this not possible, or is
> there an option I can use to do these kind of independent-query
> transactions?

You could do this with savepoints, which are a kind of sub-transaction inside a "bigger" transaction, e.g.:

BEGIN TRANSACTION;

SAVEPOINT sp1;
UPDATE1;
IF (failed) ROLLBACK TO SAVEPOINT sp1;

SAVEPOINT sp2;
UPDATE2;
IF (failed) ROLLBACK TO SAVEPOINT sp2;

COMMIT;

Details here: http://www.postgresql.org/docs/8.2/static/sql-savepoint.html

But I doubt that this would be faster than doing a transaction per update.

Thomas

---(end of broadcast)---
TIP 4: Have you searched our list archives? http://archives.postgresql.org/
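Thomas's pseudocode, made concrete as real SQL. The "IF (failed)" step is something the client does after seeing an error, not SQL syntax; table names and values here are hypothetical:

```sql
BEGIN;

SAVEPOINT sp1;
UPDATE t1 SET hits = hits + 1 WHERE id = 42;
-- If this UPDATE errored, the client issues:
--     ROLLBACK TO SAVEPOINT sp1;
-- and the transaction remains usable for the next statement.

SAVEPOINT sp2;
UPDATE t2 SET parent_id = 99 WHERE id = 7;  -- might violate an FK
-- On error here:
--     ROLLBACK TO SAVEPOINT sp2;

COMMIT;  -- commits everything that wasn't rolled back to a savepoint
```

Without the ROLLBACK TO SAVEPOINT, any error would put the whole transaction into an aborted state, and the final COMMIT would behave like a ROLLBACK.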
Re: [GENERAL] Moving to postgresql and some ignorant questions
In response to "Phoenix Kiula" <[EMAIL PROTECTED]>: > > If you issue a BEGIN then nothing gets committed until you issue a COMMIT. > > If > > anything happens in the meantime then everything you've done since the BEGIN > > disappears. > > There are some cases where I would like to bunch queries into a > transaction purely for speed purposes, but they're not interdependent > for integrity. E.g., > > BEGIN TRANSACTION; > UPDATE1; > UPDATE2; > UPDATE3; > COMMIT; > > If UPDATE2 fails because it, say, violates a foreign key constraint, > then so be it. I want UPDATE3 to go ahead. Is this not possible, or is > there an option I can use to do these kind of independent-query > transactions? That's not possible, by design. However, your application could keep track of which queries have run, and if one fails, start the transaction over without the failing query. But the rule is, if any query within the transaction errors, then all queries within the transaction are rolled back. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Moving to postgresql and some ignorant questions
> If you issue a BEGIN then nothing gets committed until you issue a COMMIT. If
> anything happens in the meantime then everything you've done since the BEGIN
> disappears.

There are some cases where I would like to bunch queries into a transaction purely for speed purposes, but they're not interdependent for integrity. E.g.,

BEGIN TRANSACTION;
UPDATE1;
UPDATE2;
UPDATE3;
COMMIT;

If UPDATE2 fails because it, say, violates a foreign key constraint, then so be it. I want UPDATE3 to go ahead. Is this not possible, or is there an option I can use to do these kind of independent-query transactions?

---(end of broadcast)---
TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] Moving to postgresql and some ignorant questions
> You're confusing CHECK constraints and FOREIGN KEY constraints. They're
> different things ;)
>
> CHECK constraints verify that data in a certain column matches a certain
> condition. I'm not sure they can reference columns in other tables,
> unless you wrap those checks in stored procedures maybe...
>
> For example:
>
> CREATE TABLE test (
>     age int NOT NULL CHECK (age > 0)
> );
>
> Next to that, you can define DOMAINs - basically your own customized
> data types that can follow _your_ rules. Admittedly I have never done
> that yet, but it's supposed to be one of the key features of the
> relational model (I've seen claims that you're actually not supposed to
> use the base types, but define domains for all your data types).
>
> *And* you can define compound foreign key constraints, for example:
>
> CREATE TABLE employee (
>     employee_id serial NOT NULL,
>     company_id integer NOT NULL,
>     name text NOT NULL,
>     CONSTRAINT employee_pk
>         PRIMARY KEY (employee_id, company_id)
> );
>
> CREATE TABLE division (
>     employee_id integer NOT NULL,
>     company_id integer NOT NULL,
>     name text NOT NULL,
>     CONSTRAINT division_fk
>         FOREIGN KEY (employee_id, company_id)
>         REFERENCES employee
>         ON DELETE SET NULL
>         ON UPDATE CASCADE
> );
>
> Also a nice trick: when performing DDL statements (CREATE TABLE and
> friends), you can wrap them in a transaction and commit (or rollback) if
> you like the end result (or not). I believe the only exception to that
> rule is CREATE DATABASE.

Thank you for this detailed explanation Alban. But I want to include FK constraints on a table2 on a column in the referenced table1 where column values are not unique.

I just want row data to be consistent for the same ID. Yes, this is repetitive and enough to rile DB purists, but it has its uses (for performance in certain reporting queries).

Related questions:

1. Should I explore views for this? I am very skeptical about them coming from MySQL, as the performance of MySQL views is horrendous. Besides, if they are updated every time, there's little use for a view in the first place; I may as well simply query the table -- or is this wrong? The UPDATE only locks and commits to the table, and then the view gets auto updated?

2. Or, I could do this with triggers, and now I realize also with "rules" (CREATE RULE). Which are faster, rules or triggers; are they similar in speed? Basically I want the rule/trigger to cascade the update to table1.col1 and table1.col2 to similar columns in table2.

I will surely be exploring views, and reading more of this: http://www.postgresql.org/docs/8.2/interactive/rules-views.html , but I just wanted to know what the usual thoughts on this are. Are views updated as soon as their underlying table(s) are updated? Can I control the duration or timing of their update? I searched for "materialized views", after having seen that word on the performance list, but most of the search results and the discussions on that forum are beyond my comprehension!!

Would appreciate any thoughts on performance of views. PGSQL seems to treat views just like tables, so I wonder if there's any performance gain!

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Moving to postgresql and some ignorant questions
On 8/14/07, Alban Hertroys <[EMAIL PROTECTED]> wrote:

> *And* you can define compound foreign key constraints, for example:
>
> CREATE TABLE employee (
>     employee_id serial NOT NULL,
>     company_id integer NOT NULL,
>     name text NOT NULL,
>     CONSTRAINT employee_pk
>         PRIMARY KEY (employee_id, company_id)
> );
>
> CREATE TABLE division (
>     employee_id integer NOT NULL,
>     company_id integer NOT NULL,
>     name text NOT NULL,
>     CONSTRAINT division_fk
>         FOREIGN KEY (employee_id, company_id)
>         REFERENCES employee
>         ON DELETE SET NULL
>         ON UPDATE CASCADE
> );

You can also have multiple foreign keys to different tables, and to non-primary keys, as long as they point to columns with a unique constraint on them.

> Also a nice trick, when performing DDL statements (CREATE TABLE and
> friends), you can wrap them in a transaction and commit (or rollback) if
> you like the end result (or not). I believe the only exception to that
> rule is CREATE DATABASE.

One of my all-time favorite features of pgsql. CREATE TABLESPACE is also non-transactional.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
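Scott's point about foreign keys targeting non-primary-key columns works because any UNIQUE column is a valid referent. A hypothetical sketch (table and column names invented):

```sql
CREATE TABLE accounts (
    id    serial PRIMARY KEY,
    email text NOT NULL UNIQUE   -- not the PK, but unique, so referenceable
);

-- A foreign key may point at the UNIQUE column instead of the PK.
CREATE TABLE invites (
    invitee_email text NOT NULL REFERENCES accounts (email)
);
```

Attempting to reference a column without a unique constraint would be rejected by PostgreSQL at CREATE TABLE time, which is the restriction being described above.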
Re: [GENERAL] Moving to postgresql and some ignorant questions
Phoenix Kiula wrote:

> Agree. That's what I am trying to do. Include as many UPDATEs etc. into
> the same TRANSACTION block, but my worry is when I read about
> autocommit and how it is enabled by default in postgresql 8.
> Specifying explicit BEGIN and COMMIT blocks should only commit when I
> want the DB to, or will each UPDATE in the middle of this block get
> executed?

The database doesn't have autocommit, AFAIK. Some of the clients have, but it only applies if you don't put your SQL statements between BEGIN; and COMMIT; (or ROLLBACK;) statements. I never really use anything but psql for a client, so I can't really say how other clients (pgadmin, for example) handle this.

>> Incidentally, most data integrity checks are handled with CHECK constraints
>> and FOREIGN KEY constraints rather than manual triggers. They're both easier
>> and cheaper.
>
> The problem with simple CHECK constraints is that they can only
> reference the primary key in another table. What if I want more than
> one column to be the same as the referenced table, but do not want to
> have a compound primary key in the referenced table? From reading and
> re-reading the manual, I don't think FKs allow for this. Only primary
> key references are supported.

You're confusing CHECK constraints and FOREIGN KEY constraints. They're different things ;)

CHECK constraints verify that data in a certain column matches a certain condition. I'm not sure they can reference columns in other tables, unless you wrap those checks in stored procedures maybe...

For example:

CREATE TABLE test (
    age int NOT NULL CHECK (age > 0)
);

Next to that, you can define DOMAINs - basically your own customized data types that can follow _your_ rules. Admittedly I have never done that yet, but it's supposed to be one of the key features of the relational model (I've seen claims that you're actually not supposed to use the base types, but define domains for all your data types).

*And* you can define compound foreign key constraints, for example:

CREATE TABLE employee (
    employee_id serial NOT NULL,
    company_id integer NOT NULL,
    name text NOT NULL,
    CONSTRAINT employee_pk
        PRIMARY KEY (employee_id, company_id)
);

CREATE TABLE division (
    employee_id integer NOT NULL,
    company_id integer NOT NULL,
    name text NOT NULL,
    CONSTRAINT division_fk
        FOREIGN KEY (employee_id, company_id)
        REFERENCES employee
        ON DELETE SET NULL
        ON UPDATE CASCADE
);

Also a nice trick: when performing DDL statements (CREATE TABLE and friends), you can wrap them in a transaction and commit (or rollback) if you like the end result (or not). I believe the only exception to that rule is CREATE DATABASE.

--
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.
T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416 7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
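Alban mentions DOMAINs without showing one; a hypothetical example (the domain name is invented) might look like this:

```sql
-- A reusable type that carries its own CHECK rule.
CREATE DOMAIN positive_int AS integer
    CHECK (VALUE > 0);

-- Every column declared with the domain enforces the rule,
-- so the CHECK no longer needs repeating per table.
CREATE TABLE test (
    age positive_int NOT NULL
);
```

Inserting a zero or negative age would then fail with a domain constraint violation, exactly as the inline CHECK in Alban's example would.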
Re: [GENERAL] Moving to postgresql and some ignorant questions
"Phoenix Kiula" <[EMAIL PROTECTED]> writes: > If I am reading this right, does this mean it is probably better to > leave fsync as "fsync=off" on production machines? Only if you can reconstruct your data from other sources in the case of a server crash or power failure. I wouldn't recommend it. > Also, is COMMIT automatic for my queries? In some minor testing I did (not > scientific I did at all) some queries through Perl did not update the > database at all. I had "fsync=off" in my conf file, and there was no COMMIT > etc in my SQL, just plain SQL. So I am a bit confused. What's a good > starting point? psql runs in autocommit mode by default. If you want multiple queries in a transaction you have to issue a BEGIN statement. Drivers may do various things by default. >> Grouping more work into a single transaction makes the delay for the fsync at >> COMMIT time less of a problem. > > > Agree. That's what I am trying to do. Include as many UPDATEs etc into > the same TRANSACTION block, but my worry is when I read about > autocommit and how it is enabled by default in postgresql 8. > Specifying explicit BEGIN and COMMIT blocks should only commit when I > want the DB to, or will each UPDATE in the middle of this block get > executed? > > Sorry if this is a naive question. I am reading up as much as I can. If you issue a BEGIN then nothing gets committed until you issue a COMMIT. If anything happens in the meantime then everything you've done since the BEGIN disappears. For batch work like loading then it makes sense to handle a 100-1000 records per transaction. But for most purposes you want to group things together based on what you want to happen if an error occurs. Group together into a single transaction precisely the changes that you want to be committed together or rolled back together. Don't structure your program around the performance issues. For the remaining questions I would say you need to experiment. Perhaps others will have more ideas. 
-- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
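Batching a load into a few hundred rows per transaction, as suggested above, might be sketched like this (table and values are hypothetical):

```sql
-- One COMMIT (hence one fsync) per batch of rows,
-- instead of one per row under autocommit.
BEGIN;
INSERT INTO links (id, url) VALUES (1, 'http://example.com/a');
INSERT INTO links (id, url) VALUES (2, 'http://example.com/b');
-- ... continue up to a few hundred rows per batch ...
COMMIT;
```

The trade-off Gregory notes still applies: if any statement in a batch errors, that whole batch rolls back, so batch size should be chosen by what you're willing to redo on failure, not just by speed.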
Re: [GENERAL] Moving to postgresql and some ignorant questions
On 08/14/07 10:09, Phoenix Kiula wrote:
> On 14/08/07, Gregory Stark <[EMAIL PROTECTED]> wrote:
>> "Phoenix Kiula" <[EMAIL PROTECTED]> writes:
[snip]
>>> QUESTION1:
>>> Is this it? Or am I missing something in terms of execution? We use
>>> Perl on our site but may be gradually switching to PHP at some point.
>>> Will the above plan of execution be ok?
>>
>> A transaction is a bunch of queries which you want to all get committed or
>> aborted together. The expensive step in Postgres is the COMMIT. Postgres does
>> an fsync which forces the data to go to disk (if you're using good hardware,
>> and don't have fsync=off). That takes from 5-15ms depending on how much data
>> and how fast your drives are.
>
> If I am reading this right, does this mean it is probably better to
> leave fsync as "fsync=off" on production machines? Also, is COMMIT
> automatic for my queries? In some minor testing I did (not scientific
> at all) some queries through Perl did not update the database at
> all. I had "fsync=off" in my conf file, and there was no COMMIT etc. in
> my SQL, just plain SQL. So I am a bit confused. What's a good starting
> point?
[snip]
> So, again, in the conf file, is this what you recommend:
>
> fsync=off

I seriously doubt that's what he means. "fsync=off" *is* faster than "fsync=on", but leaves your data at risk in case of a hardware crash.

Turning it off during initial data load is quite common, though. Just remember to turn it back on!!!

--
Ron Johnson, Jr.
Jefferson LA USA

Give a man a fish, and he eats for a day. Hit him with a fish, and he goes away for good!

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Moving to postgresql and some ignorant questions
Thank you AM. Very useful note; much appreciate the info you shared. About COPY, I have two simple questions:

1. Is there anything like an ALTER DATABASE command? I would like to change the character set without having to recreate the DATABASE again!

2. Also, when I do a mysqldump I seem to be ending up with "\r" in my lines, or so the COPY command tells me:

ERROR: literal carriage return found in data
HINT: Use "\r" to represent carriage return.
CONTEXT: COPY links, line 962974

But afaik, there is nothing in my data in MySQL that contains newline characters! I have checked and rechecked in the DB for such characters. I cannot open that text file to check because it contains a lot of data (about 7 million records). So I tried to do a "grep '\r' FILENAME", but that doesn't help either, because of course there is a newline character -- that is what is used to separate lines!

Is there anything else I can do to make sure mysqldump data comes through? Perhaps separate fields by TABS and lines by a specific character such as "`"?

TIA for any thoughts.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
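One hedged workaround for the carriage-return error, assuming the dump can be regenerated as quoted CSV: COPY's CSV mode tolerates carriage returns and newlines inside quoted fields, unlike the default text format that produced the error above. The file path, delimiter, and quote character here are illustrative:

```sql
-- CSV mode allows embedded \r / \n inside quoted values,
-- so rows with stray carriage returns load instead of erroring.
COPY links FROM '/tmp/links.csv'
    WITH CSV DELIMITER ',' QUOTE '"';
```

If the carriage returns are unwanted data rather than legitimate content, they can then be stripped after loading, e.g. with replace(col, E'\r', '') in an UPDATE on the affected text columns.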
Re: [GENERAL] Moving to postgresql and some ignorant questions
On Tue, Aug 14, 2007 at 11:09:36PM +0800, Phoenix Kiula wrote:

> If I am reading this right, does this mean it is probably better to
> leave fsync as "fsync=off" on production machines?

No, you want "fsync=on" on any machine which holds data you care about. And you want hardware which doesn't lie to you, so that "fsync is finished" really means the data is on disk. Otherwise PostgreSQL cannot ensure ACID compliance.

> Specifying explicit BEGIN and COMMIT blocks should only commit when I
> want the DB to,

Yes.

> or will each UPDATE in the middle of this block get
> executed?

It will get executed, but the effects will only become publicly visible after COMMIT (assuming no errors in the queries -- in which case you could issue COMMITs all day long and still see no change in the database from the transaction in which the error occurred).

> So, again, in the conf file, is this what you recommend:
>
> fsync=off

No.

> max_connections=100

Yes, depending on your usage patterns.

> The problem with simple CHECK constraints is that they can only
> reference the primary key in another table.

Not so. Or you need to explain what you mean by "simple CHECK constraints".

> I am happy to do this, but I don't see an ALTER DATABASE command.

ALTER DATABASE is there, of course, but it doesn't help you.

> I would really like not to have to execute the CREATE DATABASE command
> again!

You'll have to, unfortunately, I fear. Once you go about it, take the opportunity to make sure the locale and encoding settings of initdb are compatible with a UTF8 database.

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
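Since the encoding cannot be changed in place, the recreate step Karsten describes might look like this (the database name is hypothetical):

```sql
-- Encoding is fixed at CREATE DATABASE time; there is no ALTER for it.
-- Basing the new database on template0 avoids inheriting the template's
-- encoding when it differs from the one requested.
CREATE DATABASE mydb
    WITH ENCODING 'UTF8'
    TEMPLATE template0;
```

The usual route is: dump the old database, create the new one as above, and restore the dump into it with the client encoding set appropriately.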
Re: [GENERAL] Moving to postgresql and some ignorant questions
On 14/08/07, Gregory Stark <[EMAIL PROTECTED]> wrote:
> "Phoenix Kiula" <[EMAIL PROTECTED]> writes:
>
> > Though other threads I have learned that multiple inserts or updates
> > can be sped up with:
> >
> > [QUOTE]
> > - BEGIN TRANSACTION;
> > - INSERT OR UPDATE queries, ideally PREPAREd;
> > - COMMIT;
> > [/QUOTE]
> >
> > QUESTION1:
> > Is this it? Or am I missing something in terms of execution? We use
> > Perl on our site but may be gradually switching to PHP at some point.
> > Will the above plan of execution be ok?
>
> A transaction is a bunch of queries which you want to all get committed or
> aborted together. The expensive step in Postgres is the COMMIT. Postgres does
> an fsync which forces the data to go to disk (if you're using good hardware,
> and don't have fsync=off). That takes from 5-15ms depending on how much data
> and how fast your drives are.

If I am reading this right, does this mean it is probably better to leave fsync as "fsync=off" on production machines? Also, is COMMIT automatic for my queries? In some minor testing I did (not scientific at all), some queries through Perl did not update the database at all. I had "fsync=off" in my conf file, and there was no COMMIT etc. in my SQL, just plain SQL. So I am a bit confused. What's a good starting point?

> Grouping more work into a single transaction makes the delay for the fsync at
> COMMIT time less of a problem.

Agree. That's what I am trying to do: include as many UPDATEs etc. into the same TRANSACTION block. But my worry is what I read about autocommit and how it is enabled by default in postgresql 8. Specifying explicit BEGIN and COMMIT blocks should only commit when I want the DB to -- or will each UPDATE in the middle of this block get executed?

Sorry if this is a naive question. I am reading up as much as I can.

> Also having more connections (but not too many,
> more than a few per processor is probably not helping, more than 100 and it's
> probably slowing you down) also means it's less important since another
> process can do some of its work while you're waiting for the fsync.

So, again, in the conf file, is this what you recommend:

fsync=off
max_connections=100

?

> Yes, it's a pain. Running vacuum frequently will be necessary. You may also
> have to raise your fsm settings to allow Postgres to remember more free space
> between vacuums.

Thank you for your comments about autovacuum. I have these FSM and memory-type settings in my conf (picked off the internet :)) --

max_fsm_relations = 1500
max_fsm_pages = 8
shared_buffers = 21000
effective_cache_size = 21000
sort_mem = 16348
work_mem = 16348
vacuum_mem = 16348
temp_buffers = 4096
authentication_timeout = 10s
ssl = off

Do these sound right?

> Incidentally, most data integrity checks are handled with CHECK constraints
> and FOREIGN KEY constraints rather than manual triggers. They're both easier
> and cheaper.

The problem with simple CHECK constraints is that they can only reference the primary key in another table. What if I want more than one column to be the same as the referenced table, but do not want to have a compound primary key in the referenced table? From reading and re-reading the manual, I don't think FKs allow for this. Only primary key references are supported.

> Sorry, this is one of the main deficiencies in Postgres. You will probably
> have to convert your data to utf8 across the board and hopefully you'll find a
> collation which satisfies all your needs.

I am happy to do this, but I don't see an ALTER DATABASE command. I would really like not to have to execute the CREATE DATABASE command again! Can I make the entire DB utf8 using some command now? I have not been able to find it. All the manual and Google material seems to point to the CREATE DATABASE command only.

Many thanks!

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [GENERAL] Moving to postgresql and some ignorant questions
"Phoenix Kiula" <[EMAIL PROTECTED]> writes: > Though other threads I have learned that multiple inserts or updates > can be sped up with: > > [QUOTE] > - BEGIN TRANSACTION; > - INSERT OR UPDATE queries, ideally PREPAREd; > - COMMIT; > [/QUOTE] > > QUESTION1: > Is this it? Or am I missing something in terms of execution? We use > Perl on our site but may be gradually switching to PHP at some point. > Will the above plan of execution be ok? A transaction is a bunch of queries which you want to all get committed or aborted together. The expensive step in Postgres is the COMMIT. Postgres does an fsync which forces the data to go to disk (if you're using good hardware, and don't have fsync=off). That takes from 5-15ms depending on how much data and how fast your drives are. Grouping more work into a single transaction makes the delay for the fsync at COMMIT time less of a problem. Also having more connections (but not too many, more than a few per processor is probably not helping, more than 100 and it's probably slowing you down) also means it's less important since another process can do some of its work while you're waiting for the fsync. > My queries are all optimized and indexed well. But the defragmentation > resulting from UPDATEs can be a pain. I wonder if PGSQL's autovacuum > will help. As for autovacuum we have every intention of leaving it on. > Will the following settings be alright? Yes, it's a pain. Running vacuum frequently will be necessary. You may also have to raise your fsm settings to allow Postgres to remember more free space between vacuums. > [QUOTE] > autovacuum = on > vacuum_cost_delay = 30 > stats_start_collector = on > stats_row_level = on > autovacuum_vacuum_threshold = 100 > autovacuum_analyze_threshold = 50 > [/QUOTE] > > I am hoping that the frequent vacuum thresholds will help, but: > > QUESTION 2: > Are these settings too aggressive? While an autovacuum is running in > the background, will it lock tables or anything? 
> Can the tables still be operational, and the autovacuum will
> automatically resume from whatever point it was at? I am worried about
> how autovacuum will perform when

VACUUM doesn't lock tables. It's designed to operate without interfering. It
does still take up I/O bandwidth, which affects performance. The
vacuum_cost_delay above tells it to wait 30ms every few pages to try to avoid
slowing down production. You'll have to judge from experience whether it's
taking too long with that value; you may be better off starting with 10ms or
20ms instead.

I don't think the threshold parameters will be relevant to you. You should
look at autovacuum_vacuum_scale_factor and autovacuum_analyze_scale_factor. I
may be wrong though; someone more familiar with autovacuum in 8.2 might have
to speak up.

> QUESTION 3.
> Data integrity checks in MySQL world were very patchy, relying on CGI
> stuff to make sure, which does not always work. We are trying our best
> to get rid of them. With postgresql, I realize we can have triggers as
> well as foreign keys. But searching through old threads on this group
> suggests that triggers also present performance problems. On tables
> that are infrequently updated, can I write triggers without worrying
> about performance? Or, how can I make sure their performance is as
> good as it can be, i.e., which of the config vars is responsible for
> that?

Triggers are quite efficient in Postgres, but they still cost something.
Nobody will be able to give you a blanket statement that you can do anything
without testing it. You're probably better off having them, and then
considering removing them later if you have a problem.

Incidentally, most data integrity checks are handled with CHECK constraints
and FOREIGN KEY constraints rather than manual triggers. They're both easier
and cheaper.

> QUESTION 4:
> Character sets: In MySQL we had utf-8 as our default character set.
> Yet, on some tables we had the collation of latin1_general_ci, and
> only on specific columns we had "utf8_general_ci" (where non-English
> text needed to be stored). How should I do this in pgsql? When I do a
> mysqldump of these tables, and then COPY them back into pgsql, I
> always see the error "ERROR: invalid byte sequence for encoding
> "UTF8": 0xa7". So I do a
>
> \encoding latin1
>
> And then my COPY import works. But this is not what I want. How can I
> set up one of my columns in this table to be utf-8, and the rest to be
> latin? Then I would like to import with "\encoding utf8". Can this be
> somehow done?

Sorry, this is one of the main deficiencies in Postgres. You will probably
have to convert your data to UTF-8 across the board, and hopefully you'll
find a collation which satisfies all your needs. You can't switch encoding or
collation on the fly.

You could look at the CONVERT function, which might help, but I'm not sure
exactly what you would have to do to solve your immediate problem. If you
really need multiple collations in a single database t
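[Editor's note: the single-transaction batching discussed at the top of this
message can be sketched as follows. This is an illustration using SQLite from
Python's standard library, since no live PostgreSQL server can be assumed
here; the BEGIN/COMMIT pattern itself carries over to Postgres unchanged,
only the driver (e.g. DBD::Pg from Perl) differs.]

```python
import sqlite3

# Illustrative schema, not from the thread: a small counter table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (id INTEGER, hits INTEGER)")

# One transaction around many statements: a single COMMIT (one fsync on a
# real disk) covers all of them, instead of one commit per statement.
# "with conn:" issues BEGIN on entry and COMMIT on normal exit.
with conn:
    conn.executemany(
        "INSERT INTO clicks (id, hits) VALUES (?, ?)",  # parameterized, reused plan
        [(i, 0) for i in range(1000)],
    )

print(conn.execute("SELECT count(*) FROM clicks").fetchone()[0])
```

With autocommit, each of the 1000 inserts would pay the commit cost
separately; grouped as above, the wait happens once.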
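[Editor's note: the CHECK and FOREIGN KEY approach recommended above for
QUESTION 3 can be sketched like this. SQLite via the Python stdlib stands in
for a live server; the DDL is near-identical in PostgreSQL, and the table and
column names are invented for illustration.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite opt-in; Postgres always enforces FKs

conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE clicks (
        user_id INTEGER NOT NULL REFERENCES users (id),  -- FOREIGN KEY constraint
        hits    INTEGER NOT NULL CHECK (hits >= 0)       -- CHECK constraint
    )
""")

conn.execute("INSERT INTO users (id, email) VALUES (1, 'a@example.com')")
conn.execute("INSERT INTO clicks VALUES (1, 5)")   # satisfies both constraints

# A negative hit count violates the CHECK constraint:
try:
    conn.execute("INSERT INTO clicks VALUES (1, -1)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)

# An unknown user_id violates the FOREIGN KEY:
try:
    conn.execute("INSERT INTO clicks VALUES (99, 0)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Both bad rows are rejected by the database itself, with no trigger code to
write or maintain.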
[GENERAL] Moving to postgresql and some ignorant questions
I have been a long-time user of MySQL. I am switching to PostgreSQL because
the features included in 5.1 (as of this moment) are nothing to write home
about. The InnoDB stuff is highly advocated but has its own set of issues,
and when one looks at things like backup/restore etc., it is clearly targeted
at expensive installs with full DBAs and such, which we cannot afford.

So I have been reading up feverishly on PGSQL, and it surely is a more
serious DB, which is good, but also a bit steep in its learning curve. I have
some pretty silly questions to ask below. Just to confirm that I manage my
switch as smoothly as possible!

By way of information, we have new dual AMD Opterons with 3GB of RAM. The
postgres that has been installed is 8.2.3. But our DB is not as large as some
of the discussions on pgsql-general. No table is more than 10 million records
or likely to exceed that anytime soon. But I have some heavy simultaneous
user connections, much like any web application for a busy website.

In particular I have a table that needs very high availability: it has about
10,000 INSERTs a day, about 500,000 SELECTs a day (with or without joins),
but most importantly about 1 million UPDATEs. (It is the UPDATE that is
bothering the MySQL engine of "MYISAM" type with frequent data corruption.)

Through other threads I have learned that multiple inserts or updates can be
sped up with:

[QUOTE]
- BEGIN TRANSACTION;
- INSERT OR UPDATE queries, ideally PREPAREd;
- COMMIT;
[/QUOTE]

QUESTION 1:
Is this it? Or am I missing something in terms of execution? We use Perl on
our site but may be gradually switching to PHP at some point. Will the above
plan of execution be ok?

My queries are all optimized and indexed well. But the defragmentation
resulting from UPDATEs can be a pain. I wonder if PGSQL's autovacuum will
help. As for autovacuum, we have every intention of leaving it on. Will the
following settings be alright?
[QUOTE]
autovacuum = on
vacuum_cost_delay = 30
stats_start_collector = on
stats_row_level = on
autovacuum_vacuum_threshold = 100
autovacuum_analyze_threshold = 50
[/QUOTE]

I am hoping that the frequent vacuum thresholds will help, but:

QUESTION 2:
Are these settings too aggressive? While an autovacuum is running in the
background, will it lock tables or anything? Can the tables still be
operational, and the autovacuum will automatically resume from whatever point
it was at? I am worried about how autovacuum will perform when

QUESTION 3.
Data integrity checks in MySQL world were very patchy, relying on CGI stuff
to make sure, which does not always work. We are trying our best to get rid
of them. With PostgreSQL, I realize we can have triggers as well as foreign
keys. But searching through old threads on this group suggests that triggers
also present performance problems. On tables that are infrequently updated,
can I write triggers without worrying about performance? Or, how can I make
sure their performance is as good as it can be, i.e., which of the config
vars is responsible for that?

QUESTION 4:
Character sets: In MySQL we had utf-8 as our default character set. Yet, on
some tables we had the collation of latin1_general_ci, and only on specific
columns we had "utf8_general_ci" (where non-English text needed to be
stored). How should I do this in pgsql? When I do a mysqldump of these
tables, and then COPY them back into pgsql, I always see the error "ERROR:
invalid byte sequence for encoding "UTF8": 0xa7". So I do a

\encoding latin1

And then my COPY import works. But this is not what I want. How can I set up
one of my columns in this table to be utf-8, and the rest to be latin? Then I
would like to import with "\encoding utf8". Can this be somehow done?

Sorry for this long post, but as exciting as this switch is, it is also
daunting, because I feel like I am moving into serious-databases territory
and I don't want to goof up.
I have read up a lot and am continuing to, but it would be great if someone
could shed some light on the above to begin with. TIA!
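[Editor's note: the 0xa7 byte in the QUESTION 4 error message is easy to
examine. A small illustration in Python, purely for demonstration and not
part of the thread:]

```python
# 0xa7 is SECTION SIGN in Latin-1, but a lone 0xa7 byte is not valid UTF-8
# (UTF-8 encodes that character as two bytes), which is why COPY into a
# UTF-8 database fails until the client encoding is declared as latin1.
raw = b"\xa7"

print(raw.decode("latin-1"))  # the character the dump actually contains

try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    print("invalid byte sequence for encoding UTF8")  # analogous to psql's error

# Recoding the dump file once, before COPY, is an alternative to issuing
# \encoding latin1 in every session:
utf8_bytes = raw.decode("latin-1").encode("utf-8")
print(utf8_bytes)
```

The same one-time recode can be done on the whole mysqldump file (e.g. with
iconv) so the import can run under UTF-8 throughout.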