[HACKERS] Performance while loading data and indexing
Hello all, Some time back I posted a query to build a site with 150GB of database. In last couple of weeks, lots of things were tested at my place and there are some results and again some concerns. This is a long post. Please be patient and read thr. If we win this, I guess we have a good marketing/advocacy case here..;-) First the problems (For those who do not read beyond first page) 1) Database load time from flat file using copy is very high 2) Creating index takes huge amount of time. 3) Any suggsestions for runtime as data load and query will be going in parallel. Now the details. Note that this is a test run only.. Platform:- 4x Xeon2.4GHz/4GB RAM/4x48 SCSI RAID5/72 GB SCSI RedHat7.2/PostgreSQL7.1.3 Database in flat file: 125,000,000 records of around 100 bytes each. Flat file size 12GB Load time: 14581 sec/~8600 rows persec/~ an MB of data per sec. Create unique composite index on 2 char and a timestamp field: 25226 sec. Database size on disk: 26GB Select query: 1.5 sec. for approx. 150 rows. Important postgresql.conf settings sort_mem = 12000 shared_buffers = 24000 fsync=true (Sad but true. Left untouched.. Will that make a difference on SCSI?) wal_buffers = 65536 wal_files = 64 Now the requirements Initial flat data load: 250GB of data. This has gone up since last query. It was 150GB earlier.. Ongoing inserts: 5000/sec. Number of queries: 4800 queries/hour Query response time: 10 sec. Now questions. 1) Instead of copying from a single 12GB data file, will a parallel copy from say 5 files will speed up the things? Couple MB of data per sec. to disk is just not saturating it. It's a RAID 5 setup.. 2) Sort mem.=12K i.e. 94MB, sounds good enough to me. Does this need further addition to improve create index performance? 3) 5K concurrent inserts with an index on, will this need a additional CPU power? Like deploying it on dual RISC CPUs etc? 4) Query performance is not a problem. Though 4.8K queries per sec. expected response time from each query is 10 sec. But my guess is some serius CPU power will be chewed there too.. 5)Will upgrading to 7.2.2/7.3 beta help? All in all, in the test, we didn't see the performance where hardware is saturated to it's limits. So effectively we are not able to get postgresql making use of it. Just pushing WAL and shared buffers does not seem to be the solution. If you guys have any suggestions. let me know. I need them all.. Mysql is almost out because it's creating index for last 17 hours. I don't think it will keep up with 5K inserts per sec. with index. SAP DB is under evaluation too. But postgresql is most favourite as of now because it works. So I need to come up with solutions to problems that will occur in near future.. ;-) TIA.. Bye Shridhar -- Law of Procrastination: Procrastination avoids boredom; one never has the feeling that there is nothing important to do. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Performance while loading data and indexing
On 26 Sep 2002 at 14:05, Shridhar Daithankar wrote: Some time back I posted a query to build a site with 150GB of database. In last couple of weeks, lots of things were tested at my place and there are some results and again some concerns. 2) Creating index takes huge amount of time. Load time: 14581 sec/~8600 rows persec/~ an MB of data per sec. Create unique composite index on 2 char and a timestamp field: 25226 sec. Database size on disk: 26GB Select query: 1.5 sec. for approx. 150 rows. 2) Sort mem.=12K i.e. 94MB, sounds good enough to me. Does this need further addition to improve create index performance? Just a thought. If I sort the table before making an index, would it be faster than creating index on raw table? And/or if at all, how do I sort the table without duplicating it? Just a wild thought.. Bye Shridhar -- linux: the choice of a GNU generation([EMAIL PROTECTED] put this on Tshirts in '93) ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
I'll preface this by saying that while I have a large database, it doesn't require quite the performace you're talking about here. On Thu, Sep 26, 2002 at 02:05:44PM +0530, Shridhar Daithankar wrote: 1) Database load time from flat file using copy is very high 2) Creating index takes huge amount of time. 3) Any suggsestions for runtime as data load and query will be going in parallel. You're loading all the data in one copy. I find that INSERTs are mostly limited by indexes. While index lookups are cheap, they are not free and each index needs to be updated for each row. I fond using partial indexes to only index the rows you actually use can help with the loading. It's a bit obscure though. As for parallel loading, you'll be limited mostly by your I/O bandwidth. Have you measured it to take sure it's up to speed? Now the details. Note that this is a test run only.. Platform:- 4x Xeon2.4GHz/4GB RAM/4x48 SCSI RAID5/72 GB SCSI RedHat7.2/PostgreSQL7.1.3 Database in flat file: 125,000,000 records of around 100 bytes each. Flat file size 12GB Load time: 14581 sec/~8600 rows persec/~ an MB of data per sec. Create unique composite index on 2 char and a timestamp field: 25226 sec. Database size on disk: 26GB Select query: 1.5 sec. for approx. 150 rows. So you're loading at a rate of 860KB per sec. That's not too fast. How many indexes are active at that time? Triggers and foreign keys also take their toll. Important postgresql.conf settings sort_mem = 12000 shared_buffers = 24000 fsync=true (Sad but true. Left untouched.. Will that make a difference on SCSI?) wal_buffers = 65536 wal_files = 64 fsync IIRC only affects the WAL buffers now but it may be quite expensive, especially considering it's running on every transaction commit. Oh, your WAL files are on a seperate disk from the data? Initial flat data load: 250GB of data. This has gone up since last query. It was 150GB earlier.. Ongoing inserts: 5000/sec. Number of queries: 4800 queries/hour Query response time: 10 sec. That looks quite acheivable. 1) Instead of copying from a single 12GB data file, will a parallel copy from say 5 files will speed up the things? Limited by I/O bandwidth. On linux vmstat can tell you how many blocks are being loaded and stored per second. Try it. As long as sync() doesn't get done too often, it should be help. Couple MB of data per sec. to disk is just not saturating it. It's a RAID 5 setup.. No, it's not. You should be able to do better. 2) Sort mem.=12K i.e. 94MB, sounds good enough to me. Does this need further addition to improve create index performance? Should be fine. Admittedly your indexes are taking rather long to build. 3) 5K concurrent inserts with an index on, will this need a additional CPU power? Like deploying it on dual RISC CPUs etc? It shouldn't. Do you have an idea of what your CPU usage is? ps aux should give you a decent idea. 4) Query performance is not a problem. Though 4.8K queries per sec. expected response time from each query is 10 sec. But my guess is some serius CPU power will be chewed there too.. Should be fine. 5)Will upgrading to 7.2.2/7.3 beta help? Possibly, though it may be wirth it just for the features/bugfixes. All in all, in the test, we didn't see the performance where hardware is saturated to it's limits. So effectively we are not able to get postgresql making use of it. Just pushing WAL and shared buffers does not seem to be the solution. If you guys have any suggestions. let me know. I need them all.. Find the bottleneck: CPU, I/O or memory? Mysql is almost out because it's creating index for last 17 hours. I don't think it will keep up with 5K inserts per sec. with index. SAP DB is under evaluation too. But postgresql is most favourite as of now because it works. So I need to come up with solutions to problems that will occur in near future.. ;-) 17 hours! Ouch. Either way, you should be able to do much better. Hope this helps, -- Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/ There are 10 kinds of people in the world, those that can do binary arithmetic and those that can't. ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Performance while loading data and indexing
On 26 Sep 2002 at 10:51, [EMAIL PROTECTED] wrote: Hi, it seems you have to cluster it, I don't think you have another choise. Hmm.. That didn't occur to me...I guess some real time clustering like usogres would do. Unless it turns out to be a performance hog.. But this is just insert and select. No updates no deletes(Unless customer makes a 180 degree turn) So I doubt if clustering will help. At the most I can replicate data across machines and spread queries on them. Replication overhead as a down side and low query load on each machine as upside.. I'm retrieving the configuration of our postgres servers (I'm out of office now), so I can send it to you. I was quite disperate about performance, and I was thinking to migrate the data on an oracle database. Then I found this configuration on the net, and I had a succesfull increase of performance. In this case, we are upto postgresql because we/our customer wants to keep the costs down..:-) Even they are asking now if it's possible to keep hardware costs down as well. That's getting some funny responses here but I digress.. Maybe this can help you. Why you use copy to insert records? I usually use perl scripts, and they work well . Performance reasons. As I said in one of my posts earlier, putting upto 100K records in one transaction in steps of 10K did not reach performance of copy. As Tom said rightly, it was a 4-1 ratio despite using transactions.. Thanks once again.. Bye Shridhar -- Secretary's Revenge:Filing almost everything under the. ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] 7.3b1 : DROP DOMAIN CASCADE CAN LEAVE A TABLE WITH NO COLUMNS
Hello, I've mailed this to the bugs list but that seems to have stopped working on the 24th so I thought I'd better email it through on here. I have found it is possible for a user with create table permission to crash the 7.3b1 backend. The crash occurs because it is possible to have a table with no columns after a DROP DOMAIN CASCADE. Create a table with one column (with that columns type specified as a domain) then issue the command to DROP DOMAIN ... CASCADE. The column will be dropped from the table, leaving the table with no columns. It is then possible (not surprisngly) to crash the backend by querying that table using a wildcard. Running the SQL listed at the bottom twice will cause a crash with the following log enteries: WARNING: ShmemInitStruct: ShmemIndex entry size is wrong FATAL: LockMethodTableInit: couldn't initialize LockTable Upon restarting the server the following message appears in the log, each time with a different offset: LOG: ReadRecord: unexpected pageaddr 0/BA36A000 in log file 0, segment 191, offset 3579904 I am assuming this is a consequence of the abnormal termination but I thought it worth mentioning for completeness. It also only appears if the SQL below is wrapped up in a transaction. To recreate the problem enter the following SQL in psql:- BEGIN; CREATE DOMAIN d1 int; CREATE TABLE t1 (col_a d1); -- IF YOU DROP DOMAIN d1 CASCADE then col_a WILL BE DROPPED AND THE TABLE t1 WILL HAVE NO COLUMNS DROP DOMAIN d1 CASCADE; -- TABLE t1 NOW HAS NO COLUMNS -- THIS PROBLEM CAN ALSO BE CREATED BY DROP SCHEMA .. CASCADE AS WELL (AS LONG AS THE TABLE IS NOT IN THE SCHEMA BEING DROPPED AND THEREFORE NOT DROPPED AS PART OF THE CASCADE). -- THE FOLLOWING SELECT WILL CRASH THE BACKEND SELECT t1.* FROM t1 ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] Insert Performance
Tom Lane wrote: Michael Paesold [EMAIL PROTECTED] writes: Only vacuum will reset the insert times to the lowest possible! What does the vacuum code do?? :-] It removes dead tuples. Dead tuples can only arise from update or delete operations ... so you have not been telling us the whole truth. An insert-only test would not have this sort of behavior. regards, tom lane Sleeping is good. When I woke up this morning I had an idea of what is causing these problems; and you are right. I had used a self-written sequence system for the invoice_ids -- I can't use a sequence because sequence values can skip. So inserting an invoice would also do an update on a single row of the cs_sequence table, which cause the problems. Now, with a normal sequence, it works like a charm. 17 sec. for 1 rows and 2-3 sec. for commit. But why is performance so much degrading? After 1 updates on a row, the row seems to be unusable without vacuum! I hope the currently discussed autovacuum daemon will help in such a situation. So I think I will have to look for another solution. It would be nice if one could lock a sequence! That would solve all my troubles,... dreaming BEGIN; LOCK SEQUENCE invoice_id_seq; -- now only this connection can get nextval(), all others will block INSERT INTO invoice VALUES (nextval('invoice_id_seq'), ...); INSERT INTO invoice VALUES (nextval('invoice_id_seq'), ...); ... COMMIT; -- now this only helps if sequences could be rolled back -- wake up! /dreaming What could you recommend? Locking the table and selecting max(invoice_id) wouldn't really be much faster, with max(invoice_id) not using an index... Best Regards, Michael Paesold ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Insert Performance
On 26 Sep 2002 at 12:28, Michael Paesold wrote: But why is performance so much degrading? After 1 updates on a row, the row seems to be unusable without vacuum! I hope the currently discussed autovacuum daemon will help in such a situation. Let mw know if it works. Use CVS BTW.. I am eager to know any bug reports.. Didn't have a chance to test it the way I would have liked. May be this weekend.. Bye Shridhar -- QOTD: The forest may be quiet, but that doesn't mean the snakes have gone away. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
Hi Shridhar, Shridhar Daithankar wrote: snip 3) Any suggsestions for runtime as data load and query will be going in parallel. That sounds unusual. From reading this, it *sounds* like you'll be running queries against an incomplete dataset, or maybe just running the queries that affect the tables loaded thus far (during the initial load). snip fsync=true (Sad but true. Left untouched.. Will that make a difference on SCSI?) Definitely. Have directly measured a ~ 2x tps throughput increase on FreeBSD when leaving fsync off whilst performance measuring stuff recently (PG 7.2.2). Like anything it'll depend on workload, phase of moon, etc, but it's a decent indicator. snip Now questions. 1) Instead of copying from a single 12GB data file, will a parallel copy from say 5 files will speed up the things? Not sure yet. Haven't get done enough performance testing (on the cards very soon though). Couple MB of data per sec. to disk is just not saturating it. It's a RAID 5 setup.. fsync = off would help during the data load, but not a good idea if you're going to be running queries against it at the same time. Am still getting the hang of performance tuning stuff. Have a bunch of Ultra160 hardware for the Intel platform, and am testing against it as time permits. Not as high end as I'd like, but it's a start. :-) Regards and best wishes, Justin Clift snip Bye Shridhar -- My grandfather once told me that there are two kinds of people: those who work and those who take the credit. He told me to try to be in the first group; there was less competition there. - Indira Gandhi ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Performance while loading data and indexing
On 26 Sep 2002 at 14:05, Shridhar Daithankar wrote: Some time back I posted a query to build a site with 150GB of database. In last couple of weeks, lots of things were tested at my place and there are some results and again some concerns. 2) Creating index takes huge amount of time. Load time: 14581 sec/~8600 rows persec/~ an MB of data per sec. Create unique composite index on 2 char and a timestamp field: 25226 sec. Database size on disk: 26GB Select query: 1.5 sec. for approx. 150 rows. I never tried 150GB of data, but 10GB of data, and this worked fine for me. Maybe it will help if you post your table schema, including which indexes you use, and the average size of one tuple. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Performance while loading data and indexing
On 26 Sep 2002 at 11:17, Mario Weilguni wrote: On 26 Sep 2002 at 14:05, Shridhar Daithankar wrote: Some time back I posted a query to build a site with 150GB of database. In last couple of weeks, lots of things were tested at my place and there are some results and again some concerns. 2) Creating index takes huge amount of time. Load time: 14581 sec/~8600 rows persec/~ an MB of data per sec. Create unique composite index on 2 char and a timestamp field: 25226 sec. Database size on disk: 26GB Select query: 1.5 sec. for approx. 150 rows. I never tried 150GB of data, but 10GB of data, and this worked fine for me. Maybe it will help if you post your table schema, including which indexes you use, and the average size of one tuple. Well the test runs were for 10GB of data. Schema is attached. Read in fixed fonts..Last nullable fields are dummies but may be used in fututre and varchars are not acceptable(Not my requirement). Tuple size is around 100 bytes.. The index creation query was CREATE INDEX index1 ON tablename (esn,min,datetime); What if I put datetime ahead? It's likely the the datetime field will have high degree of locality being log data.. Bye Shridhar -- brain, v: [as in to brain]To rebuke bluntly, but not pointedly; to dispel a source of error in an opponent.-- Ambrose Bierce, The Devil's Dictionary Field Name Field Type NullableIndexed typeint no no esn char (10) no yes min char (10) no yes datetimetimestamp no yes opc0char (3)no no opc1char (3)no no opc2char (3)no no dpc0char (3)no no dpc1char (3)no no dpc2char (3)no no npa char (3)no no nxx char (3)no no restchar (4)no no field0 int yes no field1 char (4)yes no field2 int yes no field3 char (4)yes no field4 int yes no field5 char (4)yes no field6 int yes no field7 char (4)yes no field8 int yes no field9 char (4)yes no ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
On 26 Sep 2002 at 19:17, Justin Clift wrote: Shridhar Daithankar wrote: snip 3) Any suggsestions for runtime as data load and query will be going in parallel. That sounds unusual. From reading this, it *sounds* like you'll be running queries against an incomplete dataset, or maybe just running the queries that affect the tables loaded thus far (during the initial load). That's correct. Load the data so far and keep inserting data as and when it generates. They don't mind running against data so far. It's not very accurate stuff IMO... fsync=true (Sad but true. Left untouched.. Will that make a difference on SCSI?) Definitely. Have directly measured a ~ 2x tps throughput increase on FreeBSD when leaving fsync off whilst performance measuring stuff recently (PG 7.2.2). Like anything it'll depend on workload, phase of moon, etc, but it's a decent indicator. I didn't know even that matters with SCSI..Will check out.. fsync = off would help during the data load, but not a good idea if you're going to be running queries against it at the same time. That's OK for the reasons mentioned above. It wouldn't be out of place to expect a UPS to such an installation... Bye Shridhar -- Hoare's Law of Large Problems: Inside every large problem is a small problem struggling to get out. ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
On 26 Sep 2002 at 19:05, Martijn van Oosterhout wrote: On Thu, Sep 26, 2002 at 02:05:44PM +0530, Shridhar Daithankar wrote: 1) Database load time from flat file using copy is very high 2) Creating index takes huge amount of time. 3) Any suggsestions for runtime as data load and query will be going in parallel. You're loading all the data in one copy. I find that INSERTs are mostly limited by indexes. While index lookups are cheap, they are not free and each index needs to be updated for each row. I fond using partial indexes to only index the rows you actually use can help with the loading. It's a bit obscure though. As for parallel loading, you'll be limited mostly by your I/O bandwidth. Have you measured it to take sure it's up to speed? Well. It's like this, as of now.. CreateDB-create table-create index-Select. So loading is not slowed by index. As of your hint of vmstat, will check it out. So you're loading at a rate of 860KB per sec. That's not too fast. How many indexes are active at that time? Triggers and foreign keys also take their toll. Nothing except the table where data os loaded.. fsync IIRC only affects the WAL buffers now but it may be quite expensive, especially considering it's running on every transaction commit. Oh, your WAL files are on a seperate disk from the data? No. Same RAID 5 disks.. It shouldn't. Do you have an idea of what your CPU usage is? ps aux should give you a decent idea. I guess we forgot to monitor system parameters. Next on my list is running vmstat, top and tuning bdflush. Find the bottleneck: CPU, I/O or memory? Understood.. Mysql is almost out because it's creating index for last 17 hours. I don't think it will keep up with 5K inserts per sec. with index. SAP DB is under evaluation too. But postgresql is most favourite as of now because it works. So I need to come up with solutions to problems that will occur in near future.. ;-) 17 hours! Ouch. Either way, you should be able to do much better. Hope this helps, Heh.. no wonder this evaluation is taking more than 2 weeks.. Mysql was running out of disk space while creating index and crashin. An upgrade to mysql helped there but no numbers as yet.. Thanks once again... Bye Shridhar -- Boren's Laws: (1) When in charge, ponder. (2) When in trouble, delegate. (3) When in doubt, mumble. ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
Shridhar Daithankar wrote: snip fsync=true (Sad but true. Left untouched.. Will that make a difference on SCSI?) Definitely. Have directly measured a ~ 2x tps throughput increase on FreeBSD when leaving fsync off whilst performance measuring stuff recently (PG 7.2.2). Like anything it'll depend on workload, phase of moon, etc, but it's a decent indicator. I didn't know even that matters with SCSI..Will check out.. Cool. When testing it had FreeBSD 4.6.2 installed on one drive along with the PostgreSQL 7.2.2 binaries, it had the data on a second drive (mounted as /pgdata), and it had the pg_xlog directory mounted on a third drive. Swap had it's own drive as well. Everything is UltraSCSI, etc. Haven't yet tested for a performance difference through moving the indexes to another drive after creation though. That apparently has the potential to help as well. :-) Regards and best wishes, Justin Clift -- My grandfather once told me that there are two kinds of people: those who work and those who take the credit. He told me to try to be in the first group; there was less competition there. - Indira Gandhi ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Performance while loading data and indexing
Well the test runs were for 10GB of data. Schema is attached. Read in fixed fonts..Last nullable fields are dummies but may be used in fututre and varchars are not acceptable(Not my requirement). Tuple size is around 100 bytes.. The index creation query was CREATE INDEX index1 ON tablename (esn,min,datetime); What if I put datetime ahead? It's likely the the datetime field will have high degree of locality being log data.. Just an idea, I noticed you use char(10) for esn and min, and use this as index. Are these really fixed len fields all having 10 bytes? Otherwise varchar(10) would be better, because your tables, and especially the indices will be probably much smaller. what average length do you have for min and esn? ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [GENERAL] [HACKERS] Performance while loading data and indexing
On Thu, Sep 26, 2002 at 03:01:35PM +0530, Shridhar Daithankar wrote: Content-Description: Mail message body The index creation query was CREATE INDEX index1 ON tablename (esn,min,datetime); What if I put datetime ahead? It's likely the the datetime field will have high degree of locality being log data.. The order of fields depends on what you're using it for. For example, you can use the above index for a query using the conditions: esn = 'aaa' esn = 'bbb' and min = 'xxx' but not for queries with only datetime = '2002-09-26' min = 'ddd' and datetime = '2002-10-02' The fields can only be used left to right. This is where a single multicolumn index differs from multiple indexes of different columns. Have you used EXPLAIN ANALYSE to determine whether your indexes are being used optimally? -- Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/ There are 10 kinds of people in the world, those that can do binary arithmetic and those that can't. ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
Shridhar Daithankar wrote: On 26 Sep 2002 at 19:05, Martijn van Oosterhout wrote: snip fsync IIRC only affects the WAL buffers now but it may be quite expensive, especially considering it's running on every transaction commit. Oh, your WAL files are on a seperate disk from the data? No. Same RAID 5 disks.. Not sure if this is a good idea. Would have to think deeply about the controller and drive optimisation/load characteristics. If it's any help, when I was testing recently with WAL on a separate drive, the WAL logs were doing more readwrites per second than the main data drive. This would of course be affected by the queries you are running against the database. I was just running Tatsuo's TPC-B stuff, and the OSDB AS3AP tests. I guess we forgot to monitor system parameters. Next on my list is running vmstat, top and tuning bdflush. That'll just be the start of it for serious performance tuning and learning how PostgreSQL works. :) snip Thanks once again... Bye Shridhar -- My grandfather once told me that there are two kinds of people: those who work and those who take the credit. He told me to try to be in the first group; there was less competition there. - Indira Gandhi ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Insert Performance
What could you recommend? Locking the table and selecting max(invoice_id) wouldn't really be much faster, with max(invoice_id) not using an index... select invoice_id from table order by invoice_id desc limit 1; should get you the maximum fast if you have a unique index on invoice_id. Andreas ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] PGXLOG variable worthwhile?
Curt Sampson wrote: On Wed, 25 Sep 2002, Jan Wieck wrote: With the number of screws our product has, there are so many possible combinations that don't work, why worry about one more or less? That's just silly, so I won't even bother replying. Curt, it might sound silly on first sight and isolated. But it was in reply to: But still, why set up a situation where your database might not start? Why not set it up so that if you get just *one* environment or command-line variable right, you can't set another inconsistently and screw up your start anyway? Why store configuration information outside of the database data directory in a form that's not easily backed up, and not easily found by other utilities? Apply that argumentation to all of our commandline switches and config options and we end up with something that behaves like Microsoft products ... they know everything better, you cannot tune them, they work ... and you needed a bigger machine anyway. I am absolutely not in favour of the PGXLOG environment variable. But if someone else wants it, it doesn't bother me because I wouldn't use it and it cannot hurt me. I am simply against this I think it's wrong so you have to change your behaviour attitude. Jan -- #==# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #== [EMAIL PROTECTED] # ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Insert Performance
Michael Paesold [EMAIL PROTECTED] writes: So inserting an invoice would also do an update on a single row of the cs_sequence table, which cause the problems. Now, with a normal sequence, it works like a charm. 17 sec. for 1 rows and 2-3 sec. for commit. But why is performance so much degrading? After 1 updates on a row, the row seems to be unusable without vacuum! Probably, because the table contains 1 dead tuples and one live one. The system is scanning all 10001 tuples looking for the one to UPDATE. In 7.3 it might help a little to create an index on the table. But really this is one of the reasons that SEQUENCEs were invented --- you have no alternative but to do frequent vacuums, if you repeatedly update the same row of a table. You might consider issuing a selective VACUUM cs_sequence command every so often (ideally every few hundred updates). I hope the currently discussed autovacuum daemon will help in such a situation. Probably, if we can teach it to recognize that such frequent vacuums are needed. In the meantime, cron is your friend ... regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Insert Performance
Zeugswetter Andreas SB SD [EMAIL PROTECTED] wrote: What could you recommend? Locking the table and selecting max(invoice_id) wouldn't really be much faster, with max(invoice_id) not using an index... select invoice_id from table order by invoice_id desc limit 1; should get you the maximum fast if you have a unique index on invoice_id. Andreas I've figured that out after reading the TODO about max()/min() using indexes. Thank you anyway! The second problem I had was that I have invoices here that have not been sent into accounting. An actual invoice_id is something like 210309 at the moment. So I used invoice_ids 3000 for pre invoice_ids. Having much of those pre invoices makes select ... desc limit 1 too slow. I figured out that I can use a partial index as a solution: CREATE INDEX idx_real_invoice_id ON invoice (invoice_id) WHERE invoice_id 3; Now it works great. I have a function getNextInvoiceID(): CREATE OR REPLACE FUNCTION getNextInvoiceId() RETURNS bigint AS' DECLARE ret bigint; BEGIN LOCK TABLE invoice IN SHARE ROW EXCLUSIVE MODE; SELECT INTO ret invoice_id FROM invoice WHERE invoice_id \'30\' ORDER BY invoice_id DESC limit 1; RETURN ret + 1; END; ' LANGUAGE 'plpgsql'; Using that is nearly as fast as a regular sequence. Thanks to all of you for your help. Best Regards, Michael Paesold ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] PGXLOG variable worthwhile?
On Thu, 26 Sep 2002, Jan Wieck wrote: But still, why set up a situation where your database might not start? Why not set it up so that if you get just *one* environment or command-line variable right, you can't set another inconsistently and screw up your start anyway? Why store configuration information outside of the database data directory in a form that's not easily backed up, and not easily found by other utilities? Apply that argumentation to all of our commandline switches and config options and we end up with something that behaves like Microsoft products ... they know everything better, you cannot tune them, they work ... and you needed a bigger machine anyway. Talk about a straw man! I have repeatedly said: I WANT THE FEATURE THAT LETS YOU TUNE THE LOCATION OF THE LOG FILE! Read it again, and again, until you understand that we both want that feature. Then realize, I just want it implemented in a way that makes it less likely that people will find themselves in a situation where the server doesn't start. I am absolutely not in favour of the PGXLOG environment variable. But if someone else wants it, it doesn't bother me because I wouldn't use it and it cannot hurt me. Responsible programmers, when confronted with a more accident-prone and less accident-prone way of doing something, chose the less accident-prone way of doing things. That way people who are naive, or tired, or just having a bad day are less likely to come to harm. Using the config file is not only safer, it's actually more convenient. And since we're going to have the config file option anyway, removing the environment variable option means that others have less documentation to read, and will spend less time wondering why there's two different ways to do the same thing. And naive people won't chose the wrong way because they don't know any better. cjs -- Curt Sampson [EMAIL PROTECTED] +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're all light. --XTC ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Performance while loading data and indexing
Shridhar Daithankar [EMAIL PROTECTED] writes: On 26 Sep 2002 at 11:50, Mario Weilguni wrote: Just an idea, I noticed you use char(10) for esn and min, and use this as index. Are these really fixed len fields all having 10 bytes? 10 bytes. Those are id numbers.. like phone numbers always have all the digits filled in.. If they are numbers, can you store them as bigints instead of char(N)? regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[HACKERS] About connectby() again
On Sat, 07 Sep 2002 10:21:21 -0700 Joe Conway [EMAIL PROTECTED] wrote: I just sent in a patch using the ancestor check method. It turned out that the performance hit was pretty small on a moderate sized tree. My test case was a 22 record bill-of-material table. The tree built was 9 levels deep with about 3800 nodes. The performance hit was only about 1%. The previous patch fixed an infinite recursion bug in contrib/tablefunc/tablefunc.c:connectby. But, other unmanageable error seems to occur even if a table has commonplace tree data(see below). I would think the patch, ancestor check, should be if (strstr(branch_delim || branchstr-data || branch_delim, branch_delim || current_key || branch_delim)) This is my image, not a real code. However, if branchstr-data includes branch_delim, my image will not be perfect. -- test connectby with int based hierarchy DROP TABLE connectby_tree; CREATE TABLE connectby_tree(keyid int, parent_keyid int); INSERT INTO connectby_tree VALUES(11,NULL); INSERT INTO connectby_tree VALUES(10,11); INSERT INTO connectby_tree VALUES(111,11); INSERT INTO connectby_tree VALUES(1,111); SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', '11', 0, '-') AS t(keyid int, parent_keyid int, level int, branch text) ERROR: infinite recursion detected Regards, Masaru Sugawara ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] About connectby() again
On Fri, 27 Sep 2002 02:02:49 +0900 I wrote [EMAIL PROTECTED] wrote: On Sat, 07 Sep 2002 10:21:21 -0700 Joe Conway [EMAIL PROTECTED] wrote: I just sent in a patch using the ancestor check method. It turned out that the performance hit was pretty small on a moderate sized tree. My test case was a 22 record bill-of-material table. The tree built was 9 levels deep with about 3800 nodes. The performance hit was only about 1%. The previous patch fixed an infinite recursion bug in contrib/tablefunc/tablefunc.c:connectby. But, other unmanageable error seems to occur even if a table has commonplace tree data(see below). I would think the patch, ancestor check, should be if (strstr(branch_delim || branchstr-data || branch_delim, branch_delim || current_key || branch_delim)) This is my image, not a real code. However, if branchstr-data includes ^ keyid or parent_keyid branch_delim, my image will not be perfect. -- test connectby with int based hierarchy DROP TABLE connectby_tree; CREATE TABLE connectby_tree(keyid int, parent_keyid int); INSERT INTO connectby_tree VALUES(11,NULL); INSERT INTO connectby_tree VALUES(10,11); INSERT INTO connectby_tree VALUES(111,11); INSERT INTO connectby_tree VALUES(1,111); SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', '11', 0, '-') AS t(keyid int, parent_keyid int, level int, branch text) ERROR: infinite recursion detected Regards, Masaru Sugawara ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly Regards, Masaru Sugawara ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
[HACKERS] ignore
fixing a problem with the aliases, just want to make sure it goes through propelry ... ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
[HACKERS] WAL shortcoming causes missing-pg_clog-segment problem
I think I've identified a primary cause for the no such pg_clog file problem that we've seen reported several times. A look at htup.h shows that the WAL only stores the low 8 bits of a tuple's t_infomask (see xl_heap_header struct). There is some fooling around in heapam.c's WAL redo routines to try to reconstitute some of the high-order bits, for example this: htup-t_infomask = HEAP_XMAX_INVALID | xlhdr.mask; But this is implicitly assuming that we can reconstruct the XMIN_COMMITTED bit at will. That was true when the WAL code was written, but with 7.2's ability to recycle allegedly-no-longer-needed pg_clog data, we cannot simply drop commit status bits. The only scenario I've been able to identify in which this actually causes a failure is when VACUUM FULL moves an old tuple and then shortly afterwards (before the next checkpoint) there is a crash. Post-crash, the tuple move will be redone from WAL, and the moved tuple will be inserted with zeroed-out commit status bits. When we next examine the tuple, we have to try to retrieve its commit status from pg_clog ... but it's not there anymore. As far as I can see, the only realistic solution is to store the full 16 bits of t_infomask in the WAL. We could do this without increasing WAL size by dropping the t_hoff field from xl_heap_header --- t_hoff is computable given the number of attributes and the HASNULL/HASOID bits, both of which are available. (Actually, we could save some space now by getting rid of t_oid in xl_heap_header; it's not necessary given that OID isn't in the fixed tuple header anymore.) This will require a WAL format change of course. Fortunately we can do that without forcing a complete initdb (people will have to run pg_resetxlog if they want to update a 7.3beta2 database without initdb). I see no way to fix the problem in the context of 7.2. Perhaps we should put out a bulletin warning people to avoid VACUUM FULL in 7.2, or at least to do CHECKPOINT as soon as possible after one. regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
I have seen various benchmarks where XFS seems to perform best when it comes to huge amounts of data and many files (due to balanced internal b+ trees). also, XFS seems to be VERY mature and very stable. ext2/3 don't seem to be that fast in most of the benchmarks. i did some testing with reiser some time ago. the problem is that it seems to restore a very historic consistent snapshot of the data. XFS seems to be much better in this respect. i have not tested JFS yet (but on this damn AIX beside me) from my point of view i strongly recommend XFS (maybe somebody from RedHat should think about it). Hans Neil Conway wrote: Bruce Momjian [EMAIL PROTECTED] writes: The paper does recommend ext3, but the differences between file systems are very small. Well, I only did a very rough benchmark (a few runs of pgbench), but the results I found were drastically different: ext2 was significantly faster (~50%) than ext3-writeback, which was in turn significantly faster (~25%) than ext3-ordered. Also, though ext3 is slower, turning fsync off should make ext3 function similar to ext2. Why would that be? Cheers, Neil -- *Cybertec Geschwinde u Schoenig* Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria Tel: +43/1/913 68 09; +43/664/233 90 75 www.postgresql.at http://www.postgresql.at, cluster.postgresql.at http://cluster.postgresql.at, www.cybertec.at http://www.cybertec.at, kernel.cybertec.at http://kernel.cybertec.at ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
[HACKERS] Reconstructing FKs in pg_dump
Patrick Welche's recent problems (see pgsql-general) point out that the old CREATE CONSTRAINT TRIGGER syntax that optionally omits a FROM table clause doesn't work anymore --- the system *needs* tgconstrrelid to be set in an RI constraint trigger record, because the RI triggers now use that OID to find the referenced table. (The table name in the tgargs field isn't used anymore, mainly because it's not schema-aware.) This means that RI trigger definitions dating back to 7.0 (or whenever it was that we fixed the pg_dump bug about not dumping tgconstrrelid) don't work anymore. There are a couple things I think we should do. One: modify the CREATE CONSTRAINT TRIGGER code to try to extract a foreign relation name from the tgargs if FROM is missing. Without this, we have no hope of loading working FK trigger definitions from old dumps. Two: modify pg_dump to extract a name from the tgargs in the same fashion. I'd rather have pg_dump do this than the backend, and this will at least make things better in the case where you're using a 7.3 pg_dump against an older database. However, if we are going to put that kind of knowledge into pg_dump, it would only be a small further step to have it dump these triggers as ALTER TABLE ADD CONSTRAINT commands instead. Which would be a lot better for forward compatibility than dumping the raw triggers. Thoughts? regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
I tend to agree with this though I have nothing to back up it with. My impression is that XFS does very well for large files. Accepting that as fact?, my impression is that XFS historically does well for database's. Again, I have nothing to back that up other than hear-say and conjecture. Greg On Thu, 2002-09-26 at 15:55, Hans-Jürgen Schönig wrote: I have seen various benchmarks where XFS seems to perform best when it comes to huge amounts of data and many files (due to balanced internal b+ trees). also, XFS seems to be VERY mature and very stable. ext2/3 don't seem to be that fast in most of the benchmarks. i did some testing with reiser some time ago. the problem is that it seems to restore a very historic consistent snapshot of the data. XFS seems to be much better in this respect. i have not tested JFS yet (but on this damn AIX beside me) from my point of view i strongly recommend XFS (maybe somebody from RedHat should think about it). Hans Neil Conway wrote: Bruce Momjian [EMAIL PROTECTED] writes: The paper does recommend ext3, but the differences between file systems are very small. Well, I only did a very rough benchmark (a few runs of pgbench), but the results I found were drastically different: ext2 was significantly faster (~50%) than ext3-writeback, which was in turn significantly faster (~25%) than ext3-ordered. Also, though ext3 is slower, turning fsync off should make ext3 function similar to ext2. Why would that be? Cheers, Neil -- *Cybertec Geschwinde u Schoenig* Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria Tel: +43/1/913 68 09; +43/664/233 90 75 www.postgresql.at http://www.postgresql.at, cluster.postgresql.at http://cluster.postgresql.at, www.cybertec.at http://www.cybertec.at, kernel.cybertec.at http://kernel.cybertec.at ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED] signature.asc Description: This is a digitally signed message part
Re: [HACKERS] Reconstructing FKs in pg_dump
On Thu, 2002-09-26 at 16:57, Tom Lane wrote: This means that RI trigger definitions dating back to 7.0 (or whenever it was that we fixed the pg_dump bug about not dumping tgconstrrelid) don't work anymore. I thought 7.0 introduced foreign keys in the first place, so perhaps 7.1? However, if they're coming from 7.0 or earlier would it be appropriate to have them bounce through 7.2 / 7.1 first? Pain in the ass to dump and reload twice to get to the latest, but since they only upgrade once every 2 to 3 years... Is this the only problem that 7.0 people are going to experience (server side, SQL changes are abundant)? However, if we are going to put that kind of knowledge into pg_dump, it would only be a small further step to have it dump these triggers as ALTER TABLE ADD CONSTRAINT commands instead. Which would be a lot better for forward compatibility than dumping the raw triggers. If this type of stuff has to be done, then this is probably the best way to go. -- Rod Taylor ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Reconstructing FKs in pg_dump
On Thu, 2002-09-26 at 16:57, Tom Lane wrote: snip There are a couple things I think we should do. One: modify the CREATE CONSTRAINT TRIGGER code to try to extract a foreign relation name from the tgargs if FROM is missing. Without this, we have no hope of loading working FK trigger definitions from old dumps. Two: modify pg_dump to extract a name from the tgargs in the same fashion. I'd rather have pg_dump do this than the backend, and this will at least make things better in the case where you're using a 7.3 pg_dump against an older database. snip Thoughts? I'm trying to think of the cases where this extraction might fail, but maybe more important is what happens if it does fail? Robert Treat ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] Reconstructing FKs in pg_dump
Rod Taylor [EMAIL PROTECTED] writes: However, if they're coming from 7.0 or earlier would it be appropriate to have them bounce through 7.2 / 7.1 first? Won't help. 7.2 will dump 'em out without a FROM clause, just like they were loaded. Is this the only problem that 7.0 people are going to experience (server side, SQL changes are abundant)? You're missing the point. Welche was upgrading *from 7.2*. But his trigger definitions had a dump/reload history going back to 7.0. regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Reconstructing FKs in pg_dump
Robert Treat [EMAIL PROTECTED] writes: I'm trying to think of the cases where this extraction might fail, but maybe more important is what happens if it does fail? Then you have broken RI triggers ... which is the problem now. regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Reconstructing FKs in pg_dump
Is this the only problem that 7.0 people are going to experience (server side, SQL changes are abundant)? You're missing the point. Welche was upgrading *from 7.2*. But his trigger definitions had a dump/reload history going back to 7.0. Oh.. I certainly did. -- Rod Taylor ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
[HACKERS] fix for client utils compilation under win32
Tom Lane wrote: It might work to measure time since the start of the whole process, or until the timeout target, rather than accumulating adjustments to the remains count each time through. In other words something like at start: targettime = time() + specified-timeout each time we are about to wait: set select timeout to targettime - time(). This bounds the error at 1 second which is probably good enough (you might want to add 1 to targettime to ensure the error is in the conservative direction of not timing out too soon). The attached patch fixes a number of issues related to compiling the client utilities (libpq.dll and psql.exe) for win32 (missing defines, adjustments to includes, pedantic casting, non-existent functions) per: http://developer.postgresql.org/docs/postgres/install-win32.html. It compiles cleanly under Windows 2000 using Visual Studio .net. Also compiles clean and passes all regression tests (regular and contrib) under Linux. In addition to a review by the usual suspects, it would be very desirable for someone well versed in the peculiarities of win32 to take a look. If there are no objections, please commit. Thanks, Joe Index: src/backend/libpq/md5.c === RCS file: /opt/src/cvs/pgsql-server/src/backend/libpq/md5.c,v retrieving revision 1.18 diff -c -r1.18 md5.c *** src/backend/libpq/md5.c 4 Sep 2002 20:31:19 - 1.18 --- src/backend/libpq/md5.c 26 Sep 2002 17:56:11 - *** *** 26,35 *can be compiled stand-alone. */ ! #ifndef MD5_ODBC #include postgres.h #include libpq/crypt.h ! #else #include md5.h #endif --- 26,44 *can be compiled stand-alone. */ ! #if ! defined(MD5_ODBC) ! defined(FRONTEND) #include postgres.h #include libpq/crypt.h ! #endif ! ! #ifdef FRONTEND ! #include postgres_fe.h ! #ifndef WIN32 ! #include libpq/crypt.h ! #endif /* WIN32 */ ! #endif /* FRONTEND */ ! ! #ifdef MD5_ODBC #include md5.h #endif Index: src/bin/psql/command.c === RCS file: /opt/src/cvs/pgsql-server/src/bin/psql/command.c,v retrieving revision 1.81 diff -c -r1.81 command.c *** src/bin/psql/command.c 22 Sep 2002 20:57:21 - 1.81 --- src/bin/psql/command.c 26 Sep 2002 18:18:17 - *** *** 23,28 --- 23,29 #include win32.h #include io.h #include fcntl.h + #include direct.h #endif #include libpq-fe.h *** *** 1163,1169 return NULL; } ! if (i token_len - 1) return_val[i + 1] = '\0'; } --- 1164,1170 return NULL; } ! if (i (int) token_len - 1) return_val[i + 1] = '\0'; } *** *** 1240,1246 exit(EXIT_FAILURE); } ! for (p = source; p - source len *p; p += PQmblen(p, pset.encoding)) { if (esc) { --- 1241,1247 exit(EXIT_FAILURE); } ! for (p = source; p - source (int) len *p; p += PQmblen(p, pset.encoding)) { if (esc) { *** *** 1278,1284 char *end; l = strtol(p, end, 0); ! c = l; p = end - 1; break; } --- 1279,1285 char *end; l = strtol(p, end, 0); ! c = (char) l; p = end - 1; break; } Index: src/bin/psql/common.c === RCS file: /opt/src/cvs/pgsql-server/src/bin/psql/common.c,v retrieving revision 1.45 diff -c -r1.45 common.c *** src/bin/psql/common.c 14 Sep 2002 19:46:01 - 1.45 --- src/bin/psql/common.c 26 Sep 2002 18:43:31 - *** *** 11,27 #include errno.h #include stdarg.h - #include sys/time.h #ifndef HAVE_STRDUP #include strdup.h #endif #include signal.h #ifndef WIN32 #include unistd.h /* for write() */ #include setjmp.h #else #include io.h
Re: [HACKERS] Reconstructing FKs in pg_dump
On Thu, 2002-09-26 at 17:22, Tom Lane wrote: Robert Treat [EMAIL PROTECTED] writes: I'm trying to think of the cases where this extraction might fail, but maybe more important is what happens if it does fail? Then you have broken RI triggers ... which is the problem now. Uh...yeah, I got that part. I meant what will be done if/when it fails? Throw a WARNING and keep going? Throw an ERROR and die? Robert Treat ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Reconstructing FKs in pg_dump
On Thu, 26 Sep 2002, Tom Lane wrote: Patrick Welche's recent problems (see pgsql-general) point out that the old CREATE CONSTRAINT TRIGGER syntax that optionally omits a FROM table clause doesn't work anymore --- the system *needs* tgconstrrelid to be set in an RI constraint trigger record, because the RI triggers now use that OID to find the referenced table. (The table name in the tgargs field isn't used anymore, mainly because it's not schema-aware.) This means that RI trigger definitions dating back to 7.0 (or whenever it was that we fixed the pg_dump bug about not dumping tgconstrrelid) don't work anymore. There are a couple things I think we should do. One: modify the CREATE CONSTRAINT TRIGGER code to try to extract a foreign relation name from the tgargs if FROM is missing. Without this, we have no hope of loading working FK trigger definitions from old dumps. Two: modify pg_dump to extract a name from the tgargs in the same fashion. I'd rather have pg_dump do this than the backend, and this will at least make things better in the case where you're using a 7.3 pg_dump against an older database. I'd worry about doing things only to pg_dump since that'd still leave people that did use the old dump in the dark and there'd be nothing even indicating a problem until they did something that used the constraint. Even a notice for a missing FROM would be better (although at that point how far is it to just fixing the problem). I can look at it this weekend (since it probably was my bug in the first place) unless you'd rather do it. However, if we are going to put that kind of knowledge into pg_dump, it would only be a small further step to have it dump these triggers as ALTER TABLE ADD CONSTRAINT commands instead. Which would be a lot better for forward compatibility than dumping the raw triggers. Wasn't there still some question about the fact that ATAC causes a check of the constraint which for large tables is not insignificant. I don't remember if there was any consensus on how to deal with that. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Reconstructing FKs in pg_dump
Robert Treat [EMAIL PROTECTED] writes: On Thu, 2002-09-26 at 17:22, Tom Lane wrote: Robert Treat [EMAIL PROTECTED] writes: I'm trying to think of the cases where this extraction might fail, but maybe more important is what happens if it does fail? Then you have broken RI triggers ... which is the problem now. Uh...yeah, I got that part. I meant what will be done if/when it fails? Throw a WARNING and keep going? Throw an ERROR and die? What I was thinking of was to do the following in CREATE CONSTRAINT TRIGGER: if (no FROM clause) { try to extract table name from given tgargs; try to look up table OID; if successful, insert table OID into tgconstrrelid; } If the lookup fails, you'd be left creating a constraint trigger with zero tgconstrrelid, which is what's happening now. That would error out upon use (if it's really an RI trigger), thus alerting you that you have a broken trigger. (We could add a couple of lines in the RI triggers to cause the error message to be more helpful than Relation 0 not found.) regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Reconstructing FKs in pg_dump
Stephan Szabo [EMAIL PROTECTED] writes: However, if we are going to put that kind of knowledge into pg_dump, it would only be a small further step to have it dump these triggers as ALTER TABLE ADD CONSTRAINT commands instead. Which would be a lot better for forward compatibility than dumping the raw triggers. Wasn't there still some question about the fact that ATAC causes a check of the constraint which for large tables is not insignificant. I don't remember if there was any consensus on how to deal with that. Hmm, good point. That's probably why we didn't go ahead and do it already... Maybe we should just put the lookup hack into the backend's CREATE CONSTRAINT TRIGGER code and leave it at that. regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Reconstructing FKs in pg_dump
On Thu, 26 Sep 2002, Tom Lane wrote: Stephan Szabo [EMAIL PROTECTED] writes: However, if we are going to put that kind of knowledge into pg_dump, it would only be a small further step to have it dump these triggers as ALTER TABLE ADD CONSTRAINT commands instead. Which would be a lot better for forward compatibility than dumping the raw triggers. Wasn't there still some question about the fact that ATAC causes a check of the constraint which for large tables is not insignificant. I don't remember if there was any consensus on how to deal with that. Hmm, good point. That's probably why we didn't go ahead and do it already... Maybe we should just put the lookup hack into the backend's CREATE CONSTRAINT TRIGGER code and leave it at that. That seems reasonable. And probably not too hard. There might still be cases where we can't get it, and I think we probably should at least throw a notice on the create in that case, the admin will *probably* ignore it, but if they want to fix the situation right away they can. ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] About connectby() again
Masaru Sugawara wrote: The previous patch fixed an infinite recursion bug in contrib/tablefunc/tablefunc.c:connectby. But, other unmanageable error seems to occur even if a table has commonplace tree data(see below). I would think the patch, ancestor check, should be if (strstr(branch_delim || branchstr-data || branch_delim, branch_delim || current_key || branch_delim)) This is my image, not a real code. However, if branchstr-data includes branch_delim, my image will not be perfect. Good point. Thank you Masaru for the suggested fix. Attached is a patch to fix the bug found by Masaru. His example now produces: regression=# SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', '11', 0, '-') AS t(keyid int, parent_keyid int, level int, branch text); keyid | parent_keyid | level | branch ---+--+---+-- 11 | | 0 | 11 10 | 11 | 1 | 11-10 111 | 11 | 1 | 11-111 1 | 111 | 2 | 11-111-1 (4 rows) While making the patch I also realized that the no show branch form of the function was not going to work very well for recursion detection. Therefore there is now a default branch delimiter ('~') that is used internally, for that case, to enable recursion detection to work. If you need a different delimiter for your specific data, you will have to use the show branch form of the function. If there are no objections, please apply. Thanks, Joe Index: contrib/tablefunc/README.tablefunc === RCS file: /opt/src/cvs/pgsql-server/contrib/tablefunc/README.tablefunc,v retrieving revision 1.3 diff -c -r1.3 README.tablefunc *** contrib/tablefunc/README.tablefunc 2 Sep 2002 05:44:04 - 1.3 --- contrib/tablefunc/README.tablefunc 26 Sep 2002 22:57:27 - *** *** 365,371 branch_delim ! if optional branch value is desired, this string is used as the delimiter Outputs --- 365,373 branch_delim ! If optional branch value is desired, this string is used as the delimiter. ! When not provided, a default value of '~' is used for internal ! recursion detection only, and no branch field is returned. Outputs *** *** 388,394 the level value output 3. If the branch field is not desired, omit both the branch_delim input ! parameter *and* the branch field in the query column definition 4. If the branch field is desired, it must be the forth column in the query column definition, and it must be type TEXT --- 390,399 the level value output 3. If the branch field is not desired, omit both the branch_delim input ! parameter *and* the branch field in the query column definition. Note ! that when branch_delim is not provided, a default value of '~' is used ! for branch_delim for internal recursion detection, even though the branch ! field is not returned. 4. If the branch field is desired, it must be the forth column in the query column definition, and it must be type TEXT Index: contrib/tablefunc/tablefunc.c === RCS file: /opt/src/cvs/pgsql-server/contrib/tablefunc/tablefunc.c,v retrieving revision 1.9 diff -c -r1.9 tablefunc.c *** contrib/tablefunc/tablefunc.c 14 Sep 2002 19:32:54 - 1.9 --- contrib/tablefunc/tablefunc.c 26 Sep 2002 23:09:27 - *** *** 652,657 --- 652,660 branch_delim = GET_STR(PG_GETARG_TEXT_P(5)); show_branch = true; } + else + /* default is no show, tilde for the delimiter */ + branch_delim = pstrdup(~); per_query_ctx = rsinfo-econtext-ecxt_per_query_memory; oldcontext = MemoryContextSwitchTo(per_query_ctx); *** *** 798,807 --- 801,816 char *current_branch; char **values; StringInfo branchstr = NULL; + StringInfo chk_branchstr = NULL; + StringInfo chk_current_key = NULL; /* start a new branch */ branchstr = makeStringInfo(); + /* need these to check for recursion */ + chk_branchstr = makeStringInfo(); + chk_current_key = makeStringInfo(); + if (show_branch) values = (char **) palloc(CONNECTBY_NCOLS * sizeof(char *)); else *** *** 854,875 { /* initialize branch for this pass */ appendStringInfo(branchstr, %s, branch); /* get the next sql result tuple */ spi_tuple = tuptable-vals[i]; /* get the current key
Re: [HACKERS] unicode
Tatsuo Ishii kirjutas N, 26.09.2002 kell 03:37: The actual checking is done in INSERT/UPDATE/COPY. However, the checking is currently very limited: every byte of a mutibyte character must be greater than 0x7f. Where can I read about basic tech details of Unicode / Charset Conversion / ... I't like to find answers to the following (for database created using UNICODE) 1. Where exactly are conversions between national charsets done 2. What is converyted (whole SQL statements or just data) 3. What format is used for processing in memory (UCS-2, UCS-4, UTF-8, UTF-16, UTF-32, ...) 4. What format is used when saving to disk (UCS-*, UTF-*, SCSU, ...) ? 5. Are LIKE/SIMILAR aware of locale stuff ? - Hannu ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: AIX compilation problems (was Re: [HACKERS] Proposal ...)
configure somehow thinks it needs to #define _LARGE_FILES though, which then clashes with pg_config.h's _LARGE_FILES. I think the test needs to #include unistd.h . _LARGE_FILES is defined because it's necessary to make off_t 64 bits. If you disagree, please post compiler output. Ah, if we want off_t to be 64 bits, then we need _LARGE_FILES. The problem is, that scan.c includes unistd.h before postgres.h and thus unistd.h defines _LARGE_FILE_API which is not allowed together with _LARGE_FILES. Do you know an answer ? Offhand I can only think of using -D_LARGE_FILES as a compiler flag :-( Do we really want a general 64 bit off_t or would it be sufficient in the two places that use fseeko ? Andreas ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] postmaster -d option (was Re: [GENERAL] Relation 0 does not exist)
Bruce Momjian [EMAIL PROTECTED] writes: diff -c -c -r1.294 postgres.c *** src/backend/tcop/postgres.c25 Sep 2002 20:31:40 -1.294 --- src/backend/tcop/postgres.c26 Sep 2002 05:15:41 - *** *** 1281,1288 * -d 0 allows user to prevent postmaster debug * from propagating to backend. */ ! SetConfigOption(server_min_messages, notice, ! ctx, gucsource); } break; --- 1281,1287 * -d 0 allows user to prevent postmaster debug * from propagating to backend. */ ! ResetPGVariable(server_min_messages); } break; If you want export PGOPTIONS=-d0 to do what the comment says, you'd also need to Reset all of the other GUC variables that -dN might have set. However, I'm not sure that I agree with the goal in the first place. If the admin has set debugging on the postmaster command line, should it really be possible for users to turn it off so easily? regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: AIX compilation problems (was Re: [HACKERS] Proposal ...)
shared libs on AIX need to be able to resolve all symbols at linkage time. Those two symbols are in backend/utils/SUBSYS.o but not in the postgres executable. My guess is, that they are eliminated by the linker ? Do they need an extern declaration ? Further research prooved, that the AIX linker eliminates functions on a per c file basis if none of them is referenced elsewhere (declared extern or not). Thus it eliminates the whole conv.c file from the postgres executable since those functions are only used by the conversion shared objects. Anybody have an idea what I can do ? Andreas ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
Shridhar Daithankar [EMAIL PROTECTED] writes: RedHat7.2/PostgreSQL7.1.3 I'd suggest a newer release of Postgres ... 7.1.3 is pretty old ... Create unique composite index on 2 char and a timestamp field: 25226 sec. What do you mean by char exactly? If it's really char(N), how much are you paying in padding space? There are very very few cases where I'd not say to use varchar(N), or text, instead. Also, does it have to be character data? If you could use an integer or float datatype instead the index operations should be faster (though I can't say by how much). Have you thought carefully about the order in which the composite index columns are listed? sort_mem = 12000 To create an index of this size, you want to push sort_mem as high as it can go without swapping. 12000 sounds fine for the global setting, but in the process that will create the index, try setting sort_mem to some hundreds of megs or even 1Gb. (But be careful: the calculation of space actually used by CREATE INDEX is off quite a bit in pre-7.3 releases :-(. You should probably expect the actual process size to grow to two or three times what you set sort_mem to. Don't let it get so big as to swap.) wal_buffers = 65536 The above is a complete waste of memory space, which would be better spent on letting the kernel expand its disk cache. There's no reason for wal_buffers to be more than a few dozen. regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
On 26 Sep 2002 at 10:33, Tom Lane wrote: Shridhar Daithankar [EMAIL PROTECTED] writes: RedHat7.2/PostgreSQL7.1.3 I'd suggest a newer release of Postgres ... 7.1.3 is pretty old ... I agree.. downloadind 7.2.2 right away.. Create unique composite index on 2 char and a timestamp field: 25226 sec. What do you mean by char exactly? If it's really char(N), how much are you paying in padding space? There are very very few cases where I'd not say to use varchar(N), or text, instead. Also, does it have to be character data? If you could use an integer or float datatype instead the index operations should be faster (though I can't say by how much). Have you thought carefully about the order in which the composite index columns are listed? I have forwarded the idea of putting things into number. If it causes speedup in index lookup/creation, it would do. Looks like bigint is the order of the day.. sort_mem = 12000 To create an index of this size, you want to push sort_mem as high as it can go without swapping. 12000 sounds fine for the global setting, but in the process that will create the index, try setting sort_mem to some hundreds of megs or even 1Gb. (But be careful: the calculation of space actually used by CREATE INDEX is off quite a bit in pre-7.3 releases :-(. You should probably expect the actual process size to grow to two or three times what you set sort_mem to. Don't let it get so big as to swap.) Great. I was skeptical to push it beyond 100MB. Now I can push it to corners.. wal_buffers = 65536 The above is a complete waste of memory space, which would be better spent on letting the kernel expand its disk cache. There's no reason for wal_buffers to be more than a few dozen. That was a rather desparate move. Nothing was improving performance and then we started pushing numbers.. WIll get it back.. Same goes for 64 WAL files.. A GB looks like waste to me.. I might have found the bottleneck, although by accident. Mysql was running out of space while creating index. So my friend shut down mysql and tried to move things by hand to create links. He noticed that even things like cp were terribly slow and it hit us.. May be the culprit is the file system. Ext3 in this case. My friend argues for ext2 to eliminate journalling overhead but I favour reiserfs personally having used it in pgbench with 10M rows on paltry 20GB IDE disk for 25 tps.. We will be attempting raiserfs and/or XFS if required. I know how much speed difference exists between resiserfs and ext2. Would not be surprised if everythng just starts screaming in one go.. Bye Shridhar -- Cropp's Law:The amount of work done varies inversly with the time spent in the office. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: AIX compilation problems (was Re: [HACKERS] Proposal ...)
Zeugswetter Andreas SB SD [EMAIL PROTECTED] writes: Further research prooved, that the AIX linker eliminates functions on a per c file basis if none of them is referenced elsewhere (declared extern or not). Thus it eliminates the whole conv.c file from the postgres executable since those functions are only used by the conversion shared objects. Yipes. Surely there is a linker switch to suppress that behavior? regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
On 26 Sep 2002 at 10:42, Tom Lane wrote: Justin Clift [EMAIL PROTECTED] writes: If it's any help, when I was testing recently with WAL on a separate drive, the WAL logs were doing more readwrites per second than the main data drive. ... but way fewer seeks. For anything involving lots of updating transactions (and certainly 5000 separate insertions per second would qualify; can those be batched??), it should be a win to put WAL on its own spindle, just to get locality of access to the WAL. Probably they will be a single transcation. If possible we will bunch more of them together.. like 5 seconds of data pushed down in a single transaction but not sure it's possible.. This is bit like replication but from live oracle machine to postgres, from information I have. So there should be some chance of tuning there.. Bye Shridhar -- Langsam's Laws: (1) Everything depends. (2) Nothing is always. (3) Everything is sometimes. ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
On Thursday 26 September 2002 21:52, Shridhar Daithankar wrote: I might have found the bottleneck, although by accident. Mysql was running out of space while creating index. So my friend shut down mysql and tried to move things by hand to create links. He noticed that even things like cp were terribly slow and it hit us.. May be the culprit is the file system. Ext3 in this case. My friend argues for ext2 to eliminate journalling overhead but I favour reiserfs personally having used it in pgbench with 10M rows on paltry 20GB IDE disk for 25 tps.. We will be attempting raiserfs and/or XFS if required. I know how much speed difference exists between resiserfs and ext2. Would not be surprised if everythng just starts screaming in one go.. As it was found by someone before any non-journaling FS is faster than journaling one. This due to double work done by FS and database. Try it on ext2 and compare. -- Denis ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
Shridhar Daithankar wrote: snip My friend argues for ext2 to eliminate journalling overhead but I favour reiserfs personally having used it in pgbench with 10M rows on paltry 20GB IDE disk for 25 tps.. If it's any help, the setup I mentioned before with differnt disks for the data and the WAL files was getting an average of about 72 tps with 200 concurrent users on pgbench. Haven't tuned it in a hard core way at all, and it only has 256MB DDR RAM in it at the moment (single CPU AthonXP 1600). These are figures made during the 2.5k+ test runs of pgbench done when developing pg_autotune recently. As a curiosity point, how predictable are the queries you're going to be running on your database? They sound very simple and very predicatable. The pg_autotune tool might be your friend here. It can deal with arbitrary SQL instead of using the pg_bench stuff of Tatsuos, and it can also deal with an already loaded database. You'd just have to tweak the names of the tables that it vacuums and the names of the indexes that it reindexes between each run, to get some idea of your overall server performance at different load points. Probably worth taking a good look at if you're not afraid of editing variables in C code. :) We will be attempting raiserfs and/or XFS if required. I know how much speed difference exists between resiserfs and ext2. Would not be surprised if everythng just starts screaming in one go.. We'd all probably be interested to hear this. Added the PostgreSQL Performance mailing list to this thread too, Just In Case. (wow that's a lot of cross posting now). Regards and best wishes, Justin Clift Bye Shridhar -- Cropp's Law:The amount of work done varies inversly with the time spent in the office. ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly -- My grandfather once told me that there are two kinds of people: those who work and those who take the credit. He told me to try to be in the first group; there was less competition there. - Indira Gandhi ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
On 27 Sep 2002 at 1:12, Justin Clift wrote: Shridhar Daithankar wrote: As a curiosity point, how predictable are the queries you're going to be running on your database? They sound very simple and very predicatable. Mostly predictable selects. Not a domain expert on telecom so not very sure. But in my guess prepare statement in 7.3 should come pretty handy. i.e. by the time we finish evaluation and test deployment, 7.3 will be out in next couple of months to say so. So I would recommend doing it 7.3 way only.. The pg_autotune tool might be your friend here. It can deal with arbitrary SQL instead of using the pg_bench stuff of Tatsuos, and it can also deal with an already loaded database. You'd just have to tweak the names of the tables that it vacuums and the names of the indexes that it reindexes between each run, to get some idea of your overall server performance at different load points. Probably worth taking a good look at if you're not afraid of editing variables in C code. :) Gladly. We started with altering pgbench here for testing and rapidly settled to perl generated random queries. Once postgresql wins the evaluation match and things come to implementation, pg_autotune would be a handy tool. Just that can't do it right now. Have to fight mysql and SAP DB before that.. BTW any performance figures on SAP DB? People here are as it frustrated with it with difficulties in setting it up. But still.. We will be attempting raiserfs and/or XFS if required. I know how much speed difference exists between resiserfs and ext2. Would not be surprised if everythng just starts screaming in one go.. We'd all probably be interested to hear this. Added the PostgreSQL Performance mailing list to this thread too, Just In Case. (wow that's a lot of cross posting now). I know..;-) Glad that PG list does not have strict policies like no non- subscriber posting or no attachments.. etc.. IMO reiserfs, though journalling one, is faster than ext2 etc. because the way it handles metadata. Personally I haven't come across ext2 being faster than reiserfs on few machine here for day to day use. I guess I should have a freeBSD CD handy too.. Just to give it a try. If it comes down to a better VM.. though using 2.4.19 here.. so souldn't matter much.. I will keep you guys posted on file system stuff... Glad that we have much flexibility with postgresql.. Bye Shridhar -- Bilbo's First Law: You cannot count friends that are all packed up in barrels. ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
On Thu, 2002-09-26 at 09:52, Shridhar Daithankar wrote: My friend argues for ext2 to eliminate journalling overhead but I favour reiserfs personally having used it in pgbench with 10M rows on paltry 20GB IDE disk for 25 tps.. We will be attempting raiserfs and/or XFS if required. I know how much speed difference exists between resiserfs and ext2. Would not be surprised if everythng just starts screaming in one go.. I'm not sure about reiserfs or ext3 but with XFS, you can create your log on another disk. Also worth noting is that you can also configure the size and number of log buffers. There are also some other performance type enhancements you can fiddle with if you don't mind risking time stamp consistency in the event of a crash. If your setup allows for it, you might want to consider using XFS in this configuration. While I have not personally tried moving XFS' log to another device, I've heard that performance gains can be truly stellar. Assuming memory allows, twiddling with the log buffering is said to allow for large strides in performance as well. If you do try this, I'd love to hear back about your results and impressions. Greg signature.asc Description: This is a digitally signed message part
Re: [HACKERS] Bug in PL/pgSQL GET DIAGNOSTICS?
Manfred Koizar wrote: On Wed, 25 Sep 2002 21:40:03 -0400 (EDT), Bruce Momjian [EMAIL PROTECTED] wrote: Item 3 is the most controversial. Some say sum all tuple counts, i.e. sum INSERT/UPDATE/DELETE. That just seems to messy to me. I think summing only the matching tags has the highest probability of returning a meaningful number. [Trying to keep it short this time] I still believe that there is more than one correct answer; it just depends on what the dba intends. So I proposed a syntax change for letting the dba explicitly mark the statements she/he wants to affect tuple count and oid. - http://archives.postgresql.org/pgsql-hackers/2002-09/msg00720.php Unfortunately I tried to summarize all other proposals and the mail got so long that nobody read it to the end :-( That is an interesting idea; some syntax in the rule that marks the items. The one downside to that is the fact the rule writer has to make adjustments. Perhaps we could implement the behavoir I described and add such tagging later. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
Shridhar Daithankar wrote: I might have found the bottleneck, although by accident. Mysql was running out of space while creating index. So my friend shut down mysql and tried to move things by hand to create links. He noticed that even things like cp were terribly slow and it hit us.. May be the culprit is the file system. Ext3 in this case. I just added a file system and multi-cpu section to my performance tuning paper: http://www.ca.postgresql.org/docs/momjian/hw_performance/ The paper does recommend ext3, but the differences between file systems are very small. If you are seeing 'cp' as slow, I wonder if it may be something more general, like poorly tuned hardware or something. You can use 'dd' to throw some data around the file system and see if that is showing slowness; compare those numbers to another machine that has different hardware/OS. Also, though ext3 is slower, turning fsync off should make ext3 function similar to ext2. That would be an interesting test if you suspect ext3. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
Greg Copeland [EMAIL PROTECTED] writes: I'm not sure about reiserfs or ext3 but with XFS, you can create your log on another disk. Also worth noting is that you can also configure the size and number of log buffers. There are also some other performance type enhancements you can fiddle with if you don't mind risking time stamp consistency in the event of a crash. If your setup allows for it, you might want to consider using XFS in this configuration. You can definitely put the ext3 log on a different disk with 2.4 kernels. Also, if you put the WAL logs on a different disk from the main database, and mount that partition with 'data=writeback' (ie metadata-only journaling) ext3 should be pretty fast, since WAL files are preallocated and there will therefore be almost no metadata updates. You should be able to mount the main database with data=ordered (the default) for good performance and reasonable safety. I think putting WAL on its own disk(s) is one of the keys here. -Doug ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
On Thu, 2002-09-26 at 11:41, Bruce Momjian wrote: Shridhar Daithankar wrote: I might have found the bottleneck, although by accident. Mysql was running out of space while creating index. So my friend shut down mysql and tried to move things by hand to create links. He noticed that even things like cp were terribly slow and it hit us.. May be the culprit is the file system. Ext3 in this case. I just added a file system and multi-cpu section to my performance tuning paper: http://www.ca.postgresql.org/docs/momjian/hw_performance/ The paper does recommend ext3, but the differences between file systems are very small. If you are seeing 'cp' as slow, I wonder if it may be something more general, like poorly tuned hardware or something. You can use 'dd' to throw some data around the file system and see if that is showing slowness; compare those numbers to another machine that has different hardware/OS. Also, though ext3 is slower, turning fsync off should make ext3 function similar to ext2. That would be an interesting test if you suspect ext3. I'm curious as to why you recommended ext3 versus some other (JFS, XFS). Do you have tests which validate that recommendation or was it a simple matter of getting the warm fuzzies from familiarity? Greg signature.asc Description: This is a digitally signed message part
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
On Thu, 2002-09-26 at 11:41, Bruce Momjian wrote: Shridhar Daithankar wrote: I might have found the bottleneck, although by accident. Mysql was running out of space while creating index. So my friend shut down mysql and tried to move things by hand to create links. He noticed that even things like cp were terribly slow and it hit us.. May be the culprit is the file system. Ext3 in this case. I just added a file system and multi-cpu section to my performance tuning paper: http://www.ca.postgresql.org/docs/momjian/hw_performance/ The paper does recommend ext3, but the differences between file systems are very small. If you are seeing 'cp' as slow, I wonder if it may be something more general, like poorly tuned hardware or something. You can use 'dd' to throw some data around the file system and see if that is showing slowness; compare those numbers to another machine that has different hardware/OS. That's a good point. Also, if you're using IDE, you do need to verify that you're using DMA and proper PIO mode if at possible. Also, big performance improvements can be seen by making sure your IDE bus speed has been properly configured. The drivetweak-gtk and hdparm utilities can make huge difference in performance. Just be sure you know what the heck your doing when you mess with those. Greg signature.asc Description: This is a digitally signed message part
Re: [GENERAL] [HACKERS] Performance while loading data and indexing
If you are seeing very slow performance on a drive set, check dmesg to see if you're getting SCSI bus errors or something similar. If your drives aren't properly terminated then the performance will suffer a great deal. ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: AIX compilation problems (was Re: [HACKERS] Proposal ...)
Further research prooved, that the AIX linker eliminates functions on a per c file basis if none of them is referenced elsewhere (declared extern or not). Thus it eliminates the whole conv.c file from the postgres executable since those functions are only used by the conversion shared objects. Yipes. Surely there is a linker switch to suppress that behavior? -brtl , but that does a lot more that we don't want and does not work :-( I think the best thing to do would be to do the following: link a postgres.so from all SUBSYS.o's create postgres.imp from postgres.so (since it is a lib it has all symbols) link postgres with postgres.imp and the SUBSYS.o's Currently it does link postgres create postgres.imp from postgres link postgres again using postgres.imp This is not so good anyways, since it would actually require a cyclic dependency. A remake currently requires to manually remove postgres and postgres.imp . Not sure how to do this in the Makefiles however :-( Should this be done in src/backend/Makefile with a if portname ? I don't like that. Can sombody help, please ? Andreas ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
Greg Copeland wrote: The paper does recommend ext3, but the differences between file systems are very small. If you are seeing 'cp' as slow, I wonder if it may be something more general, like poorly tuned hardware or something. You can use 'dd' to throw some data around the file system and see if that is showing slowness; compare those numbers to another machine that has different hardware/OS. Also, though ext3 is slower, turning fsync off should make ext3 function similar to ext2. That would be an interesting test if you suspect ext3. I'm curious as to why you recommended ext3 versus some other (JFS, XFS). Do you have tests which validate that recommendation or was it a simple matter of getting the warm fuzzies from familiarity? I used the attached email as a reference. I just changed the wording to be: File system choice is particularly difficult on Linux because there are so many file system choices, and none of them are optimal: ext2 is not entirely crash-safe, ext3 and xfs are journal-based, and Reiser is optimized for small files. Fortunately, the journaling file systems aren't significantly slower than ext2 so they are probably the best choice. so I don't specifically recommend ext3 anymore. As I remember, ext3 is good only in that it can read ext2 file systems. I think XFS may be the best bet. Can anyone clarify if data=writeback is safe for PostgreSQL. Specifically, are the data files recovered properly or is this option only for a filesystem containing WAL? -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 From [EMAIL PROTECTED] Sun Apr 14 10:26:49 2002 Return-path: [EMAIL PROTECTED] Received: from postgresql.org (postgresql.org [64.49.215.8]) by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g3EEQmS21621 for [EMAIL PROTECTED]; Sun, 14 Apr 2002 10:26:48 -0400 (EDT) Received: from postgresql.org (postgresql.org [64.49.215.8]) by postgresql.org (Postfix) with SMTP id 94C5B475C9B for [EMAIL PROTECTED]; Sun, 14 Apr 2002 10:26:43 -0400 (EDT) Received: from sphinx.mythic-beasts.com (sphinx.mythic-beasts.com [195.82.107.246]) by postgresql.org (Postfix) with ESMTP id 1E27547517F for [EMAIL PROTECTED]; Sun, 14 Apr 2002 10:25:31 -0400 (EDT) Received: from matthew (helo=localhost) by sphinx.mythic-beasts.com with local-esmtp (Exim 3.33 #2) id 16wkwR-p2-00; Sun, 14 Apr 2002 15:25:19 +0100 Date: Sun, 14 Apr 2002 15:25:19 +0100 (BST) From: Matthew Kirkwood [EMAIL PROTECTED] X-X-Sender: [EMAIL PROTECTED] To: Riebs, Andy [EMAIL PROTECTED] cc: [EMAIL PROTECTED] Subject: Re: [GENERAL] Performance Tuning Document? In-Reply-To: [EMAIL PROTECTED] Message-ID: [EMAIL PROTECTED] MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Precedence: bulk Sender: [EMAIL PROTECTED] Status: OR On Thu, 28 Mar 2002, Matthew Kirkwood wrote: [ oops, forgot to send this ages ago ] Under the crossSectionTests(Mixed IR) part of an OSDB run, a large number of shared_buffers causes severe slowdown on one of the tests -- it goes from a little over 200 seconds to nearly 2000. --postgresql=no_hash_index Ah, that would make a lot of sense. I'll do a run again with that option and see what turns up. That was right on the nose. The numbers are much better now. My initial interest was in benchmarking different filesystems on Linux. In case anyone is interested, here are today's numbers: tuning? single ir cs-ir oltpcs-oltp (sec) (tps) (sec) (tps) (sec) ext3 kn 841.28 61.52 203.33 407.58 159.72 ext3-wb kn 841.19 63.73 217.19 406.30 160.88 ext3-jd kn 839.96 58.96 203.02 307.85 159.89 jfs kn 840.53 62.74 205.90 348.33 177.70 minixkn 841.51 62.12 201.44 343.87 176.68 ext2 kn 840.72 65.02 205.40 338.20 182.22 ext3-wb is ext3 with the data=writeback mount option. ext3-jd is ext3 with data=journal and a 200Mb journal instead of the usual 32Mb one. All filesystems were mounted noatime. postgresql.conf for all these runs looks like: tcpip_socket = true shared_buffers = 10240 max_fsm_relations = 100 max_fsm_pages = 1 max_locks_per_transaction = 256 wal_buffers = 10240 sort_mem = 512 vacuum_mem = 81920 Without hash indexes, it looks like only OLTP loads can differentiate the filesystems. Sometime (once I have got a more recent kernel going) I'll try a dataset larger than memory. Matthew. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED]) ---(end
[HACKERS] HOLD ON BETA2
[ Subject changed.] Marc, please hold on announcing beta2 until we get this resolved. Thanks. --- Tom Lane wrote: I think I've identified a primary cause for the no such pg_clog file problem that we've seen reported several times. A look at htup.h shows that the WAL only stores the low 8 bits of a tuple's t_infomask (see xl_heap_header struct). There is some fooling around in heapam.c's WAL redo routines to try to reconstitute some of the high-order bits, for example this: htup-t_infomask = HEAP_XMAX_INVALID | xlhdr.mask; But this is implicitly assuming that we can reconstruct the XMIN_COMMITTED bit at will. That was true when the WAL code was written, but with 7.2's ability to recycle allegedly-no-longer-needed pg_clog data, we cannot simply drop commit status bits. The only scenario I've been able to identify in which this actually causes a failure is when VACUUM FULL moves an old tuple and then shortly afterwards (before the next checkpoint) there is a crash. Post-crash, the tuple move will be redone from WAL, and the moved tuple will be inserted with zeroed-out commit status bits. When we next examine the tuple, we have to try to retrieve its commit status from pg_clog ... but it's not there anymore. As far as I can see, the only realistic solution is to store the full 16 bits of t_infomask in the WAL. We could do this without increasing WAL size by dropping the t_hoff field from xl_heap_header --- t_hoff is computable given the number of attributes and the HASNULL/HASOID bits, both of which are available. (Actually, we could save some space now by getting rid of t_oid in xl_heap_header; it's not necessary given that OID isn't in the fixed tuple header anymore.) This will require a WAL format change of course. Fortunately we can do that without forcing a complete initdb (people will have to run pg_resetxlog if they want to update a 7.3beta2 database without initdb). I see no way to fix the problem in the context of 7.2. Perhaps we should put out a bulletin warning people to avoid VACUUM FULL in 7.2, or at least to do CHECKPOINT as soon as possible after one. regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED]) -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[HACKERS] HOLD ON BETA2
Oops, I see beta2's on the web site with yesterday's date. Have they been announced? -rw-r--r-- 1 70 70 1072573 Sep 25 10:15 postgresql-test-7.3b2.tar.gz --- Tom Lane wrote: I think I've identified a primary cause for the no such pg_clog file problem that we've seen reported several times. A look at htup.h shows that the WAL only stores the low 8 bits of a tuple's t_infomask (see xl_heap_header struct). There is some fooling around in heapam.c's WAL redo routines to try to reconstitute some of the high-order bits, for example this: htup-t_infomask = HEAP_XMAX_INVALID | xlhdr.mask; But this is implicitly assuming that we can reconstruct the XMIN_COMMITTED bit at will. That was true when the WAL code was written, but with 7.2's ability to recycle allegedly-no-longer-needed pg_clog data, we cannot simply drop commit status bits. The only scenario I've been able to identify in which this actually causes a failure is when VACUUM FULL moves an old tuple and then shortly afterwards (before the next checkpoint) there is a crash. Post-crash, the tuple move will be redone from WAL, and the moved tuple will be inserted with zeroed-out commit status bits. When we next examine the tuple, we have to try to retrieve its commit status from pg_clog ... but it's not there anymore. As far as I can see, the only realistic solution is to store the full 16 bits of t_infomask in the WAL. We could do this without increasing WAL size by dropping the t_hoff field from xl_heap_header --- t_hoff is computable given the number of attributes and the HASNULL/HASOID bits, both of which are available. (Actually, we could save some space now by getting rid of t_oid in xl_heap_header; it's not necessary given that OID isn't in the fixed tuple header anymore.) This will require a WAL format change of course. Fortunately we can do that without forcing a complete initdb (people will have to run pg_resetxlog if they want to update a 7.3beta2 database without initdb). I see no way to fix the problem in the context of 7.2. Perhaps we should put out a bulletin warning people to avoid VACUUM FULL in 7.2, or at least to do CHECKPOINT as soon as possible after one. regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED]) -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] HOLD ON BETA2
Bruce Momjian [EMAIL PROTECTED] writes: Oops, I see beta2's on the web site with yesterday's date. Have they been announced? -rw-r--r-- 1 70 70 1072573 Sep 25 10:15 postgresql-test-7.3b2.tar.gz No, but they've been out there for more than a day. I think it's too late to retract beta2. We could fix this problem and then do a quick beta3, though... regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
Neil Conway wrote: Bruce Momjian [EMAIL PROTECTED] writes: The paper does recommend ext3, but the differences between file systems are very small. Well, I only did a very rough benchmark (a few runs of pgbench), but the results I found were drastically different: ext2 was significantly faster (~50%) than ext3-writeback, which was in turn significantly faster (~25%) than ext3-ordered. Wow. That leaves no good Linux file system alternatives. PostgreSQL just wants an ordinary file system that has reliable recovery from a crash. Also, though ext3 is slower, turning fsync off should make ext3 function similar to ext2. Why would that be? I assumed it was the double fsync for the normal and journal that made the journalling file systems slog. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
[HACKERS] pg7.3b1
I am so glad that postgres now keeps track of relationships between rule, views, functions, tables, etc. I've had to re-do all my creation and drop scripts but this is definitely for the better. During my testing of my scripts, I have come across this message: psql:/u1/cvs73/DataBase/Config/Schema/logconfig.sql:142: WARNING: Relcache reference leak: relation positions has refcnt 1 instead of 0 What does this indicate? Thanks, -- Laurette Cisneros The Database Group (510) 420-3137 NextBus Information Systems, Inc. www.nextbus.com --- NextBus say: Riders prefer to arrive just minute before bus than just minute after. ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] HOLD ON BETA2
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Oops, I see beta2's on the web site with yesterday's date. Have they been announced? -rw-r--r-- 1 70 70 1072573 Sep 25 10:15 postgresql-test-7.3b2.tar.gz No, but they've been out there for more than a day. I think it's too late to retract beta2. We could fix this problem and then do a quick beta3, though... Yes, that's the only solution. Let's not announce beta2 and make things worse. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] pg7.3b1
On Thu, 2002-09-26 at 16:46, Laurette Cisneros wrote: I am so glad that postgres now keeps track of relationships between rule, views, functions, tables, etc. I've had to re-do all my creation and drop scripts but this is definitely for the better. During my testing of my scripts, I have come across this message: psql:/u1/cvs73/DataBase/Config/Schema/logconfig.sql:142: WARNING: Relcache reference leak: relation positions has refcnt 1 instead of 0 What does this indicate? Someone (probably me) made a mistake and forgot to release a cache handle. Do you happen to have a sequence of commands that can reproduce this? -- Rod Taylor ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] pg7.3b1
I'll see if I can pare down my scripts (they are long) to reproduce this easier. L. On 26 Sep 2002, Rod Taylor wrote: On Thu, 2002-09-26 at 16:46, Laurette Cisneros wrote: I am so glad that postgres now keeps track of relationships between rule, views, functions, tables, etc. I've had to re-do all my creation and drop scripts but this is definitely for the better. During my testing of my scripts, I have come across this message: psql:/u1/cvs73/DataBase/Config/Schema/logconfig.sql:142: WARNING: Relcache reference leak: relation positions has refcnt 1 instead of 0 What does this indicate? Someone (probably me) made a mistake and forgot to release a cache handle. Do you happen to have a sequence of commands that can reproduce this? -- Laurette Cisneros The Database Group (510) 420-3137 NextBus Information Systems, Inc. www.nextbus.com --- NextBus say: Riders prefer to arrive just minute before bus than just minute after. ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
Neil Conway wrote: Bruce Momjian [EMAIL PROTECTED] writes: The paper does recommend ext3, but the differences between file systems are very small. Well, I only did a very rough benchmark (a few runs of pgbench), but the results I found were drastically different: ext2 was significantly faster (~50%) than ext3-writeback, which was in turn significantly faster (~25%) than ext3-ordered. Also, though ext3 is slower, turning fsync off should make ext3 function similar to ext2. Why would that be? OK, I changed the text to: File system choice is particularly difficult on Linux because there are so many file system choices, and none of them are optimal: ext2 is not entirely crash-safe, ext3, xfs, and jfs are journal-based, and Reiser is optimized for small files and does journalling. The journalling file systems can be significantly slower than ext2 but when crash recovery is required, ext2 isn't an option. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
Bruce Momjian [EMAIL PROTECTED] writes: Wow. That leaves no good Linux file system alternatives. PostgreSQL just wants an ordinary file system that has reliable recovery from a crash. I'm not really familiar with the reasoning behind ext2's reputation as recovering poorly from crashes; if we fsync a WAL record to disk before we lose power, can't we recover reliably, even with ext2? Also, though ext3 is slower, turning fsync off should make ext3 function similar to ext2. Why would that be? I assumed it was the double fsync for the normal and journal that made the journalling file systems slog. Well, a journalling file system would need to write a journal entry and flush that to disk, even if fsync is disabled -- whereas without fsync enabled, ext2 doesn't have to flush anything to disk. ISTM that the performance advantage of ext2 over ext3 is should be even larger when fsync is not enabled. Cheers, Neil -- Neil Conway [EMAIL PROTECTED] || PGP Key ID: DB3C29FC ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
Neil Conway wrote: Bruce Momjian [EMAIL PROTECTED] writes: Wow. That leaves no good Linux file system alternatives. PostgreSQL just wants an ordinary file system that has reliable recovery from a crash. I'm not really familiar with the reasoning behind ext2's reputation as recovering poorly from crashes; if we fsync a WAL record to disk before we lose power, can't we recover reliably, even with ext2? Also, though ext3 is slower, turning fsync off should make ext3 function similar to ext2. Why would that be? I assumed it was the double fsync for the normal and journal that made the journalling file systems slog. Well, a journalling file system would need to write a journal entry and flush that to disk, even if fsync is disabled -- whereas without fsync enabled, ext2 doesn't have to flush anything to disk. ISTM that the performance advantage of ext2 over ext3 is should be even larger when fsync is not enabled. Yes, it is still double-writing. I just thought that if that wasn't happening while the db was waiting for a commit that it wouldn't be too bad. Is it just me or do all the Linux file systems seem like they are lacking something when PostgreSQL is concerned? We just want a UFS-like file system on Linux and no one has it. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
On Thu, 2002-09-26 at 16:03, Neil Conway wrote: Bruce Momjian [EMAIL PROTECTED] writes: Wow. That leaves no good Linux file system alternatives. PostgreSQL just wants an ordinary file system that has reliable recovery from a crash. I'm not really familiar with the reasoning behind ext2's reputation as recovering poorly from crashes; if we fsync a WAL record to disk before we lose power, can't we recover reliably, even with ext2? Well, I have experienced data loss from ext2 before. Also, recovery from crashes on large file systems take a very, very long time. I can't imagine anyone running a production database on an ext2 file system having 10's or even 100's of GB. Ouch. Recovery would take forever! Even recovery on small file systems (2-8G) can take extended periods of time. Especially so on IDE systems. Even then manual intervention is not uncommon. While I can't say that x, y or z is the best FS to use on Linux, I can say that ext2 is probably an exceptionally poor choice from a reliability and/or uptime perspective. Greg signature.asc Description: This is a digitally signed message part
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
Greg Copeland [EMAIL PROTECTED] writes: On Thu, 2002-09-26 at 16:03, Neil Conway wrote: I'm not really familiar with the reasoning behind ext2's reputation as recovering poorly from crashes; if we fsync a WAL record to disk before we lose power, can't we recover reliably, even with ext2? Well, I have experienced data loss from ext2 before. Also, recovery from crashes on large file systems take a very, very long time. Yes, but wouldn't you face exactly the same issues if you ran a UFS-like filesystem in asynchronous mode? Albeit it's not the default, but performance in synchronous mode is usually pretty poor. The fact that ext2 defaults to asynchronous mode and UFS (at least on the BSDs) defaults to synchronous mode seems like a total non-issue to me. Is there any more to the alleged difference in reliability? Cheers, Neil -- Neil Conway [EMAIL PROTECTED] || PGP Key ID: DB3C29FC ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
Bruce Momjian [EMAIL PROTECTED] writes: Can anyone clarify if data=writeback is safe for PostgreSQL. Specifically, are the data files recovered properly or is this option only for a filesystem containing WAL? data=writeback means that no data is journaled, just metadata (which is like XFS or Reiser). An fsync() call should still do what it normally does, commit the writes to disk before returning. data=journal journals all data and is the slowest and safest. data=ordered writes out data blocks before committing a journal transaction, which is faster than full data journaling (since data doesn't get written twice) and almost as safe. data=writeback is noted to keep obsolete data in the case of some crashes (since the data may not have been written yet) but a completed fsync() should ensure that the data is valid. So I guess I'd probably use data=ordered for an all-on-one-fs installation, and data=writeback for a WAL-only drive. Hope this helps... -Doug ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
Neil Conway [EMAIL PROTECTED] writes: I'm not really familiar with the reasoning behind ext2's reputation as recovering poorly from crashes; if we fsync a WAL record to disk before we lose power, can't we recover reliably, even with ext2? Up to a point. We do assume that the filesystem won't lose checkpointed (sync'd) writes to data files. To the extent that the filesystem is vulnerable to corruption of its own metadata for a file (indirect blocks or whatever ext2 uses), that's not a completely safe assumption. We'd be happiest with a filesystem that journals its own metadata and not the user data in the file(s). I dunno if there are any. Hmm, maybe this is why Oracle likes doing their own filesystem on a raw device... regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
Tom Lane [EMAIL PROTECTED] writes: We'd be happiest with a filesystem that journals its own metadata and not the user data in the file(s). I dunno if there are any. ext3 with data=writeback? (See my previous message to Bruce). -Doug ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
Neil Conway wrote: Greg Copeland [EMAIL PROTECTED] writes: On Thu, 2002-09-26 at 16:03, Neil Conway wrote: I'm not really familiar with the reasoning behind ext2's reputation as recovering poorly from crashes; if we fsync a WAL record to disk before we lose power, can't we recover reliably, even with ext2? Well, I have experienced data loss from ext2 before. Also, recovery from crashes on large file systems take a very, very long time. Yes, but wouldn't you face exactly the same issues if you ran a UFS-like filesystem in asynchronous mode? Albeit it's not the default, but performance in synchronous mode is usually pretty poor. Yes, before UFS had soft updates, the synchronous nature of UFS made it slower than ext2, but now with soft updates, that performance difference is gone so you have two files systems, ext2 and ufs, similar peformance, but one is crash-safe and the other is not. And, when comparing the journalling file systems, you have UFS vs. XFS/ext3/JFS/Reiser, and UFS is faster. The only thing the journalling file system give you is more rapid reboot, but frankly, if your OS goes down often enough so that is an issue, you have bigger problems than fsync time. The big problem is that Linux went from non-crash safe right to crash-safe and reboot quick. We need a middle ground, which is where UFS/soft updates is. The fact that ext2 defaults to asynchronous mode and UFS (at least on the BSDs) defaults to synchronous mode seems like a total non-issue to me. Is there any more to the alleged difference in reliability? The reliability problem isn't alleged. ext2 developers admits ext2 isn't 100% crash-safe. They will say it is usually crash-safe, but that isn't good enough for PostgreSQL. I wish I was wrong. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
Doug McNaught wrote: Tom Lane [EMAIL PROTECTED] writes: We'd be happiest with a filesystem that journals its own metadata and not the user data in the file(s). I dunno if there are any. ext3 with data=writeback? (See my previous message to Bruce). OK, so that makes ext3 crash safe without lots of overhead? -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
On Thu, 2002-09-26 at 17:39, Bruce Momjian wrote: Neil Conway wrote: Greg Copeland [EMAIL PROTECTED] writes: On Thu, 2002-09-26 at 16:03, Neil Conway wrote: I'm not really familiar with the reasoning behind ext2's reputation as recovering poorly from crashes; if we fsync a WAL record to disk before we lose power, can't we recover reliably, even with ext2? Well, I have experienced data loss from ext2 before. Also, recovery from crashes on large file systems take a very, very long time. Yes, but wouldn't you face exactly the same issues if you ran a UFS-like filesystem in asynchronous mode? Albeit it's not the default, but performance in synchronous mode is usually pretty poor. Yes, before UFS had soft updates, the synchronous nature of UFS made it slower than ext2, but now with soft updates, that performance difference is gone so you have two files systems, ext2 and ufs, similar peformance, but one is crash-safe and the other is not. Note entirely true. ufs is both crash-safe and quick-rebootable. You do need to fsck at some point, but not prior to mounting it. Any corrupt blocks are empty, and are easy to avoid. Someone just needs to implement a background fsck that will run on a mounted filesystem. -- Rod Taylor ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
Rod Taylor wrote: Yes, before UFS had soft updates, the synchronous nature of UFS made it slower than ext2, but now with soft updates, that performance difference is gone so you have two files systems, ext2 and ufs, similar peformance, but one is crash-safe and the other is not. Note entirely true. ufs is both crash-safe and quick-rebootable. You do need to fsck at some point, but not prior to mounting it. Any corrupt blocks are empty, and are easy to avoid. I am assuming you need to mount the drive as part of the reboot. Of course you can boot fast with any file system if you don't have to mount it. :-) -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
On Thu, 2002-09-26 at 17:47, Bruce Momjian wrote: Rod Taylor wrote: Yes, before UFS had soft updates, the synchronous nature of UFS made it slower than ext2, but now with soft updates, that performance difference is gone so you have two files systems, ext2 and ufs, similar peformance, but one is crash-safe and the other is not. Note entirely true. ufs is both crash-safe and quick-rebootable. You do need to fsck at some point, but not prior to mounting it. Any corrupt blocks are empty, and are easy to avoid. I am assuming you need to mount the drive as part of the reboot. Of course you can boot fast with any file system if you don't have to mount it. :-) Sorry, poor explanation. Background fsck (when implemented) would operate on a currently mounted (and active) file system. The only reason fsck is required prior to reboot now is because no-one had done the work. http://www.freebsd.org/cgi/man.cgi?query=fscksektion=8manpath=FreeBSD+5.0-current See the first paragraph of the above. -- Rod Taylor ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
Rod Taylor wrote: On Thu, 2002-09-26 at 17:47, Bruce Momjian wrote: Rod Taylor wrote: Yes, before UFS had soft updates, the synchronous nature of UFS made it slower than ext2, but now with soft updates, that performance difference is gone so you have two files systems, ext2 and ufs, similar peformance, but one is crash-safe and the other is not. Note entirely true. ufs is both crash-safe and quick-rebootable. You do need to fsck at some point, but not prior to mounting it. Any corrupt blocks are empty, and are easy to avoid. I am assuming you need to mount the drive as part of the reboot. Of course you can boot fast with any file system if you don't have to mount it. :-) Sorry, poor explanation. Background fsck (when implemented) would operate on a currently mounted (and active) file system. The only reason fsck is required prior to reboot now is because no-one had done the work. http://www.freebsd.org/cgi/man.cgi?query=fscksektion=8manpath=FreeBSD+5.0-current See the first paragraph of the above. Oh, yes, I have heard of that missing feature. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Bug in PL/pgSQL GET DIAGNOSTICS?
Bruce Momjian writes: To summarize, with non-INSTEAD, we get the tag, oid, and tuple count of the original query. Everyone agrees on that. For non-INSTEAD, we have: [I think this is the INSTEAD part.] 1) return original tag 2) return oid if all inserts in the rule insert only one row 3) return tuple count of all commands with the same tag I think proper encapsulation would require us to simulate the original command, hiding the fact that something else happened internally. I know it's really hard to determine the virtual count of an update or delete if the command had acted on a permament base table, but I'd rather maintain the encapsulation of updateable views and return unknown in that case. -- Peter Eisentraut [EMAIL PROTECTED] ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: AIX compilation problems (was Re: [HACKERS] Proposal ...)
Zeugswetter Andreas SB SD writes: The problem is, that scan.c includes unistd.h before postgres.h and thus unistd.h defines _LARGE_FILE_API which is not allowed together with _LARGE_FILES. Do you know an answer ? Offhand I can only think of using -D_LARGE_FILES as a compiler flag :-( That would be pretty tricky to arrange, since the macro that detects all this is bundled with Autoconf. Also, it would only fix one particular manifestation of the overall problem, namely that pg_config.h needs to come first, period. I can see two ways to fix this properly: 1. Change the flex call to something like (echo '#include postgres.h'; $(FLEX) -t -o$@) $@ This would throw off all #line references by one. 2. Direct the flex output to another file, say scan2.c, and create a new scan.c like this: #include postgres.h #include scan2.c and create the object file from that. We have half a dozen flex calls in our tree, so either fix would propagate a great deal of ugliness around. Do we really want a general 64 bit off_t or would it be sufficient in the two places that use fseeko ? It affects all the I/O functions. If we want to access large files we need to use the large-file capable function interface. -- Peter Eisentraut [EMAIL PROTECTED] ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Bug in PL/pgSQL GET DIAGNOSTICS?
Peter Eisentraut wrote: Bruce Momjian writes: To summarize, with non-INSTEAD, we get the tag, oid, and tuple count of the original query. Everyone agrees on that. For non-INSTEAD, we have: [I think this is the INSTEAD part.] Sorry, yes. 1) return original tag 2) return oid if all inserts in the rule insert only one row 3) return tuple count of all commands with the same tag I think proper encapsulation would require us to simulate the original command, hiding the fact that something else happened internally. I know it's really hard to determine the virtual count of an update or delete if the command had acted on a permament base table, but I'd rather maintain the encapsulation of updateable views and return unknown in that case. Well, let's look at the common case. For proper view rules, these would all return the right values because the UPDATE in the rule would be returned. Is that what you mean? -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
Bruce Momjian [EMAIL PROTECTED] writes: Doug McNaught wrote: Tom Lane [EMAIL PROTECTED] writes: We'd be happiest with a filesystem that journals its own metadata and not the user data in the file(s). I dunno if there are any. ext3 with data=writeback? (See my previous message to Bruce). OK, so that makes ext3 crash safe without lots of overhead? Metadata is journaled so you shouldn't lose data blocks or directory entries. Some data blocks (that haven't been fsync()'ed) may have old or wrong data in them, but I think that's the same as ufs, right? And WAL replay should take care of that. It'd be very interesting to do some tests of the various journaling modes. I have an old K6 that I might be able to turn into a hit-the-reset-switch-at-ramdom-times machine. What kind of tests should be run? -Doug ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
[HACKERS] hacker help: PHP-4.2.3 patch to allow restriction of database access
the following was sent to the php developer's list, and they came back with: Isn't it generally better (where better means more secure, efficient, and easily maintained) to handle database access control using PostgreSQL's native access mappings? I would think so, and IMHO, that's where pgsql access control belongs, with pgsql. as best i can understand, there is no way to get apach/php/pgsql configured (using PostgreSQL's native access mappings) that would disallow php code in one virtual host from connecting to any database on the system. i understand that on a user level, we can control which tables they have access to (disregarding that almost all access will be by the web user). can anyone make some valid arguments for/against the patch in php? or any suggestions on how to do it in pgsql? - Forwarded message from Jim Mercer [EMAIL PROTECTED] - Date: Thu, 26 Sep 2002 14:54:45 -0400 From: Jim Mercer [EMAIL PROTECTED] To: [EMAIL PROTECTED] Subject: PHP-4.2.3 patch to allow restriction of database access the patch is at: ftp://ftp.reptiles.org/pub/php/php-pgsql.patch this patch adds the config variable pgsql.allowed_dblist by default it has no value, meaning all databases are accessible it can contain a colon delimited list of databases that are accessible. if the database accessed is not in the list, and the list is not null, then an error is returned as if the database did not exist this patch is relative to php-4.2.3 this function would be very useful to apache/virtual hosting. i have tested with the following in my apache httpd.conf: Directory /home/www/htdocs/jim php_admin_value pgsql.allowed_dblistjim:billing /Directory although it can be accomplished by other means, setting the variable to a value of : effectively locks the code out of pgsql. also, a special tag of -all- will allow access to all databases. -- [ Jim Mercer[EMAIL PROTECTED] +1 416 410-5633 ] [ I want to live forever, or die trying.] - End forwarded message - -- [ Jim Mercer[EMAIL PROTECTED] +1 416 410-5633 ] [ I want to live forever, or die trying.] ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] [ANNOUNCE] PQNotify java listen / notify hack
... And maybe i should attach the code aswell :) I'm not subscribed to pgsql-jdbc or pgsql-announce, so please CC me if your responding... Magnus -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Programmer/Networker [|] Magnus Naeslund -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- PGNotify-0.1.tar.gz Description: GNU Zip compressed data ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] hacker help: PHP-4.2.3 patch to allow restriction of
On Thu, 26 Sep 2002, Jim Mercer wrote: the following was sent to the php developer's list, and they came back with: Isn't it generally better (where better means more secure, efficient, and easily maintained) to handle database access control using PostgreSQL's native access mappings? I would think so, and IMHO, that's where pgsql access control belongs, with pgsql. I totally disagree. It is a language level restriction, not a database level one, so why back it into Postgres? Just parse 'conninfo' when it is pg_(p)connect() and check it against the configuration setting. The patch seems fine. I am unsure as to how useful it is. system(/usr/local/pgsql/bin/psql -U jim -c \select 'i got in';\ template1); Gavin ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] hacker help: PHP-4.2.3 patch to allow restriction of database access
On Fri, Sep 27, 2002 at 11:15:35AM +1000, Gavin Sherry wrote: On Thu, 26 Sep 2002, Jim Mercer wrote: I would think so, and IMHO, that's where pgsql access control belongs, with pgsql. I totally disagree. It is a language level restriction, not a database level one, so why back it into Postgres? Just parse 'conninfo' when it is pg_(p)connect() and check it against the configuration setting. which is effectively what my code does, except i was lazy, and i let the connection proceed, then check if PQdb() is in the auth list, and fail if it isn't. (i figured that way if there was any silliness in the conninfo string, PQconnect would figure it out). The patch seems fine. I am unsure as to how useful it is. system(/usr/local/pgsql/bin/psql -U jim -c \select 'i got in';\ template1); that wouldn't work so well in safe_mode. which is the area i'm playing with. maybe not _totally_ secure, but much moreso than nothing. and retricting virtual hosts to their own data sets relieves me of worry about GRANT all ON blah TO public;. -- [ Jim Mercer[EMAIL PROTECTED] +1 416 410-5633 ] [ I want to live forever, or die trying.] ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] hacker help: PHP-4.2.3 patch to allow restriction of
On Thu, 26 Sep 2002, Jim Mercer wrote: On Fri, Sep 27, 2002 at 11:15:35AM +1000, Gavin Sherry wrote: On Thu, 26 Sep 2002, Jim Mercer wrote: I would think so, and IMHO, that's where pgsql access control belongs, with pgsql. I totally disagree. It is a language level restriction, not a database level one, so why back it into Postgres? Just parse 'conninfo' when it is pg_(p)connect() and check it against the configuration setting. which is effectively what my code does, except i was lazy, and i let the connection proceed, then check if PQdb() is in the auth list, and fail Ahh yes. I meant to say this. No point being lazy when it comes to security. maybe not _totally_ secure, but much moreso than nothing. I was basically just suggesting that its effect needs to be documented. This needs to be used in conjunction with other forms of security Gavin ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] hacker help: PHP-4.2.3 patch to allow restriction of database access
On Fri, Sep 27, 2002 at 12:06:43PM +1000, Gavin Sherry wrote: On Thu, 26 Sep 2002, Jim Mercer wrote: maybe not _totally_ secure, but much moreso than nothing. I was basically just suggesting that its effect needs to be documented. This needs to be used in conjunction with other forms of security documentation? what's that? 8^) -- [ Jim Mercer[EMAIL PROTECTED] +1 416 410-5633 ] [ I want to live forever, or die trying.] ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
Doug McNaught [EMAIL PROTECTED] writes: data=writeback means that no data is journaled, just metadata (which is like XFS or Reiser). An fsync() call should still do what it normally does, commit the writes to disk before returning. data=journal journals all data and is the slowest and safest. data=ordered writes out data blocks before committing a journal transaction, which is faster than full data journaling (since data doesn't get written twice) and almost as safe. data=writeback is noted to keep obsolete data in the case of some crashes (since the data may not have been written yet) but a completed fsync() should ensure that the data is valid. Thanks for the explanation. So I guess I'd probably use data=ordered for an all-on-one-fs installation, and data=writeback for a WAL-only drive. Actually I think the ideal thing for Postgres would be data=writeback for both data and WAL drives. We can handle loss of un-fsync'd data for ourselves in both cases. Of course, if you keep anything besides Postgres data files on a partition, you'd possibly want the more secure settings. regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] hacker help: PHP-4.2.3 patch to allow restriction of database access
Jim Mercer [EMAIL PROTECTED] writes: as best i can understand, there is no way to get apach/php/pgsql configured (using PostgreSQL's native access mappings) that would disallow php code in one virtual host from connecting to any database on the system. Betraying my ignorance of PHP here: what does a server supporting multiple virtual hosts look like from the database's end? Can we tell the difference at all between connections initiated on behalf of one virtual host from those initiated on behalf of another? If we can tell 'em apart (for instance, if they differ in apparent client IP address) then it'd make sense to put enforcement on the database side. If we can't tell 'em apart, then we need some help from the PHP interface code so that we can tell 'em apart. Proceeding on the assumption that we do need some help ... this patch adds the config variable pgsql.allowed_dblist by default it has no value, meaning all databases are accessible it can contain a colon delimited list of databases that are accessible. Seems like this hard-wires a rather narrow view of what sorts of protection restrictions you need. Might I suggest instead that an appropriate config variable would be a list of Postgres user ids that the virtual host is allowed to connect as? Then the database's usual protection mechanisms could be used to allow/disallow connection to particular databases, if that's what you want. But this does more: it lets different virtual hosts connect to the same database as different users, and then access within that DB can be controlled using the regular Postgres access-control mechanisms. Essentially, the idea here is to ensure that the DB can tell virtual hosts apart as different users. regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] postmaster -d option (was Re: [GENERAL] Relation 0 does
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: diff -c -c -r1.294 postgres.c *** src/backend/tcop/postgres.c25 Sep 2002 20:31:40 -1.294 --- src/backend/tcop/postgres.c26 Sep 2002 05:15:41 - *** *** 1281,1288 * -d 0 allows user to prevent postmaster debug * from propagating to backend. */ ! SetConfigOption(server_min_messages, notice, ! ctx, gucsource); } break; --- 1281,1287 * -d 0 allows user to prevent postmaster debug * from propagating to backend. */ ! ResetPGVariable(server_min_messages); } break; Turns out I had to revert this change. There isn't a username at this point in the code so the ResetPGVariable username test fails, and even then, I don't think there is any way to set a variable to the value before -d5 set it. If you want export PGOPTIONS=-d0 to do what the comment says, you'd also need to Reset all of the other GUC variables that -dN might have set. However, I'm not sure that I agree with the goal in the first place. If the admin has set debugging on the postmaster command line, should it really be possible for users to turn it off so easily? I see what you are saying, that you can pass -d0 from the client and undo the -d5. Yes, I was wondering about that because even on the command line, if you do -d5 -d0, you still get those extra options. OK, attached patch applied. The restriction that you can't lower the debug level with PGOPTIONS is a separate issue. What I clearly didn't want to do was to reset those other options for -d0, and I didn't have to do that for -d0 to work as advertised. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 Index: src/backend/tcop/postgres.c === RCS file: /cvsroot/pgsql-server/src/backend/tcop/postgres.c,v retrieving revision 1.296 diff -c -c -r1.296 postgres.c *** src/backend/tcop/postgres.c 27 Sep 2002 03:34:15 - 1.296 --- src/backend/tcop/postgres.c 27 Sep 2002 03:49:34 - *** *** 1136,1141 --- 1136,1142 const char *DBName = NULL; boolsecure; int errs = 0; + int debug_flag = 0; GucContext ctx; GucSource gucsource; char *tmp; *** *** 1250,1255 --- 1251,1257 case 'd': /* debug level */ { + debug_flag = atoi(optarg); /* Set server debugging level. */ if (atoi(optarg) != 0) { *** *** 1259,1283 SetConfigOption(server_min_messages, debugstr, ctx, gucsource); pfree(debugstr); - /* -* -d is not the same as setting -* client_min_messages because it enables other -* output options. -*/ - if (atoi(optarg) = 1) - SetConfigOption(log_connections, true, ctx, gucsource); - if (atoi(optarg) = 2) - SetConfigOption(log_statement, true, ctx, gucsource); - if (atoi(optarg) = 3) - SetConfigOption(debug_print_parse, true, ctx, gucsource); - if (atoi(optarg) = 4) - SetConfigOption(debug_print_plan, true, ctx, gucsource); - if (atoi(optarg) = 5) - SetConfigOption(debug_print_rewritten, true, ctx, gucsource); } else /* !* -d 0 allows user to prevent
Re: [HACKERS] unicode
Where can I read about basic tech details of Unicode / Charset Conversion / ... I't like to find answers to the following (for database created using UNICODE) 1. Where exactly are conversions between national charsets done No national charset is in PostgreSQL. I assume you want to know where frontend/backend encoding conversion happens. They are handled by pg_server_to_client(does conversion BE to FE) and pg_client_to_server(FE to BE). These functions are called by the communication sub system(backend/libpq) and COPY. In summary, in most cases the encoding conversion is done before the parser and after the executor produces the final result. 2. What is converyted (whole SQL statements or just data) Whole statement. 3. What format is used for processing in memory (UCS-2, UCS-4, UTF-8, UTF-16, UTF-32, ...) format? I assume you are talking about the encoding. It is exactly same as the database encoding. For UNICODE database, we use UTF-8. Not UCS-2 nor UCS-4. 4. What format is used when saving to disk (UCS-*, UTF-*, SCSU, ...) ? Ditto. 5. Are LIKE/SIMILAR aware of locale stuff ? I don't know about SIMILAR, but I believe LIKE is not locale aware and is correct from the standard's point of view... -- Tatsuo Ishii ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] hacker help: PHP-4.2.3 patch to allow restriction of database access
On Thu, Sep 26, 2002 at 11:42:44PM -0400, Tom Lane wrote: Jim Mercer [EMAIL PROTECTED] writes: as best i can understand, there is no way to get apach/php/pgsql configured (using PostgreSQL's native access mappings) that would disallow php code in one virtual host from connecting to any database on the system. Betraying my ignorance of PHP here: what does a server supporting multiple virtual hosts look like from the database's end? Can we tell the difference at all between connections initiated on behalf of one virtual host from those initiated on behalf of another? normally (in my experience) php is linked into apache, and pgsql is linked into php. apache runs as the same user (unless you use suexec and a cgi version of php). pgsql's knowledge of the php process is only what is passed on by the user. since there is no IP addr specific to the process, we can't easily use host-based authentication. for domain sockets, pgsql only gets the UID of the httpd process. since all of the virtual hosts are run by the same uid, there is no way to differentiate between the virtual hosts. one could attempt to use a specific username and hardcoded password, but that leaves the password in plain text in the php code. and that does not stop someone from writing code to browse the available databases for tables set with GRANT ALL ON blah to PUBLIC;. my patch is an attempt to put an immutable list of databases in the apache config (safe from modification by normal users). and to have PQconnect() check against that list before allowing access. the list would be specific to a virtual host (and/or the directort hierarchy of the pages). it is possible to pass such a list to pgsql through environment variables, but those can be overridden by users. the php-dev people are giving me a hard time saying that this level of security should be managed internally by pgsql. i'm trying to explain to them that it isn't, and that my patch allows this security to happen. if libpq had an additional facility where PQconnect checked against a list passed to it in some fashion, then we could probably just pass that through in the php modules, and they'd probably be more content with that as it is just part of the pgsql API. i'm thinking something like a wrapper function like: PGconn *PQconnect_restricted(char *conninfo, char *restricted_list) { // break out conninfo ... if (restricted_list != NULL) { // check to see if dbName is in the list if (not in list) { fail as if dbName did not exist } } return(PQconnect(conninfo); } (i'm sure someone more familiar with the code could come up with a more refined way of doing this) this patch adds the config variable pgsql.allowed_dblist by default it has no value, meaning all databases are accessible it can contain a colon delimited list of databases that are accessible. Seems like this hard-wires a rather narrow view of what sorts of protection restrictions you need. Might I suggest instead that an appropriate config variable would be a list of Postgres user ids that the virtual host is allowed to connect as? Then the database's usual protection mechanisms could be used to allow/disallow connection to particular databases, if that's what you want. But this does more: it lets different virtual hosts connect to the same database as different users, and then access within that DB can be controlled using the regular Postgres access-control mechanisms. ideally, i'd like to have users-per-database, as opposed to the global model we have now. i'm tired of maintaining seperate pgsql userlists and application userlists. probably a pipe dream, but, well, there you are. 8^) if we are willing to modify libpq to support a white-list, then what you are suggesting is quite possible. to satisfy the php-dev people, we just need to extend the API to require/allow such a white-list to be processed. passing the white-list from httpd.conf - php - libpq is an easy enough tweak. i suspect the php-dev people are unhappy with my patch because it is including logic (ie. parsing the white-list) which they don't think php should be responsible for. personally, i think such an attitude is too rigid, but i'm also thinking a white-list mechanism would be useful in other contexts as well. -- [ Jim Mercer[EMAIL PROTECTED] +1 416 410-5633 ] [ I want to live forever, or die trying.] ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [GENERAL] Performance while loading data and indexing
Hello! On Thu, 26 Sep 2002, Bruce Momjian wrote: I'm not really familiar with the reasoning behind ext2's reputation as recovering poorly from crashes; if we fsync a WAL record to disk On relatively big volumes ext2 recovery can end up in formatting the fs under certain cirrumstances.;-) I assumed it was the double fsync for the normal and journal that made the journalling file systems slog. Well, a journalling file system would need to write a journal entry and flush that to disk, even if fsync is disabled -- whereas without fsync enabled, ext2 doesn't have to flush anything to disk. ISTM that the performance advantage of ext2 over ext3 is should be even larger when fsync is not enabled. Yes, it is still double-writing. I just thought that if that wasn't happening while the db was waiting for a commit that it wouldn't be too bad. Is it just me or do all the Linux file systems seem like they are lacking something when PostgreSQL is concerned? We just want a UFS-like file system on Linux and no one has it. mount -o sync an ext2 volume on Linux - and you can get a UFS-like fs.:) mount -o async an FFS volume on FreeBSD - and you can get boost in fs performance. Personally me always mount ext2 fs where Pg is living with sync option. Fsync in pg is off (since 6.3), this way successfully pass thru a few serious crashes on various systems (mostly on power problems). If fsync is on in Pg, performance gets so-oh-oh-oh-oh slowly!=) I just have done upgrade from 2.2 kernel on ext2 to ext3 capable 2.4 one so I'm planning to do some benchmarking. Roughly saying w/o benchmarks, the performance have been degraded in 2/3 proportion. But better safe then sorry. -- WBR, Yury Bokhoncovich, Senior System Administrator, NOC of F1 Group. Phone: +7 (3832) 106228, ext.140, E-mail: [EMAIL PROTECTED] Unix is like a wigwam -- no Gates, no Windows, and an Apache inside. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])