Re: [GENERAL] Long term database archival
On Thu, 6 Jul 2006, Dann Corbit wrote:
> It's the data that contains all the value. The hardware becomes obsolete when it can no longer keep up with business needs.

... or can no longer be repaired. :)
[GENERAL] Postmaster is starting but shutting when trying to connect (Windows)
Hello,

I have a PostgreSQL (8.1) installation for testing purposes which has been running fine for several months now (Windows XP). I was working with it yesterday, and today, after booting my computer and restarting the service (I start the service manually because I don't need the server running all the time), postmaster comes up fine, no entries in the log file. But as soon as I try to connect to the server (no matter what type of client: psql, PgAdmin or JDBC) I get an error message in the log file. Here is the full log file:

2006-07-07 09:18:15 LOG: database system was shut down at 2006-07-07 09:14:00 Westeuropäische Sommerzeit
2006-07-07 09:18:15 LOG: checkpoint record is at 0/3C82F60
2006-07-07 09:18:15 LOG: redo record is at 0/3C82F60; undo record is at 0/0; shutdown TRUE
2006-07-07 09:18:15 LOG: next transaction ID: 100576; next OID: 19416
2006-07-07 09:18:15 LOG: next MultiXactId: 1; next MultiXactOffset: 0
2006-07-07 09:18:15 LOG: database system is ready
2006-07-07 09:18:15 LOG: transaction ID wrap limit is 2147484148, limited by database postgres
2006-07-07 08:43:54 LOG: incomplete startup packet
2006-07-07 08:44:18 LOG: could not receive data from client: An operation was attempted on something that is not a socket.
2006-07-07 08:44:18 LOG: incomplete startup packet
2006-07-07 08:45:27 LOG: received fast shutdown request
2006-07-07 08:45:27 LOG: shutting down
2006-07-07 08:45:27 LOG: database system is shut down
2006-07-07 08:45:28 LOG: logger shutting down

I already checked with netstat whether I have something running on port 5432, and I changed the port in postgresql.conf just to make sure the problem isn't caused by some other process locking port 5432 that I'm not aware of, but the result is still the same. There is no firewall active which would block that port either (and, as I said, before I shut down my computer yesterday evening it was working fine). There is also no postmaster.pid file left over when I shut down the service.

Any hints on what I could try to fix this?

Thanks in advance
Thomas
Re: [GENERAL] Modeling Tool
On 7/6/06, Rodrigo Sakai [EMAIL PROTECTED] wrote:
> Does anyone know a good tool for doing reverse engineering of a PostgreSQL database? I tried to use DBDesigner, but I couldn't get the relationships!

We used Case Studio 2. It worked quite well.

depesz
--
http://www.depesz.com/ - a new, better depesz
Re: [GENERAL] Long term database archival
On Thu, 2006-07-06 at 20:57, Karl O. Pinc wrote:
> Hi,
> What is the best pg_dump format for long-term database archival? That is, what format is most likely to be able to be restored into a future PostgreSQL cluster? Should we want to restore a 20-year-old backup, nobody's going to want to be messing around with decoding a custom-format dump if it does not just load all by itself.

Karl, I would say that if you really want data from 20 years ago, keep it in the custom format, along with a set of the sources of the postgres version which created the dump. Then in 20 years, when you need it, you'll compile the sources and load the data into the original postgres version... Of course you might need to also keep an image of the current OS and the hardware you're running on if you really want to be sure it will work in 20 years :-)

Cheers,
Csaba.
[GENERAL] Partition Rule Updating While Running?
Sorry if this is a duplicate; someone suggested posting to general as well.

I'm using PostgreSQL 8.1.4 in a Hibernate application and I am attempting to use partitioning via inherited tables. At first I was going to create a rule per sub-table based on a date range, but found out that with multiple rules postgres will only return the affected-row count of the last rule, which gives Hibernate problems. So now I'm thinking the way to do it is to have just one rule at a time, and when I want to start appending data to a new partition, change the rule on the parent table and also update the constraint on the last table to reflect the date ranges it contains so that constraint_exclusion will work. This should perform better also. For instance, starting off with:

Parent (Rule on insert: instead insert into Child2)
Child1 (Constraint: date <= somedate1)
Child2 (Constraint: date > somedate1)

Now I want to create another partition:

Create Table Child3
BEGIN
  Update Parent Rule (instead insert into Child3)
  somedate2 = max(date) from Child2
  Update Child2 Constraint (date > somedate1 AND date <= somedate2)
  Set Constraint Child3 (date > somedate2)
END

Which ends up with:

Parent (Rule on insert: instead insert into Child3)
Child1 (Constraint: date <= somedate1)
Child2 (Constraint: date > somedate1 AND date <= somedate2)
Child3 (Constraint: date > somedate2)

Has anyone else tried this, or would you expect it to work consistently (without stopping the db)? Is it possible that there could be a race condition between the insertion and the constraints, or will the transaction prevent that from occurring? I've done some testing and it seems to work, but I could just be getting lucky so far and not losing any data :)

Thanks for any help,
Gene
Re: [GENERAL] Partition Rule Updating While Running?
On Fri, Jul 07, 2006 at 04:39:53AM -0400, Gene wrote:
> I'm using PostgreSQL 8.1.4 in a Hibernate application and I am attempting to use partitioning via inherited tables. At first I was going to create a rule per sub-table based on a date range, but found out that with multiple rules postgres will only return the affected-row count of the last rule, which gives Hibernate problems. So now I'm thinking the way to do it is to have just one rule at a time, and when I want to start appending data to a new partition, change the rule on the parent table and also update the constraint on the last table to reflect the date ranges it contains so that constraint_exclusion will work. This should perform better also. For instance

Why not just decide somedate2 in advance and create the constraint as a range to start with? Then, a few days before somedate2, add a new constraint for between somedate2 and somedate3. Then you never have to update any constraints...

Have a nice day,
--
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
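To make that suggestion concrete, here is a minimal sketch (not from the original thread; the table, column and rule names are hypothetical) of pre-creating non-overlapping range constraints on 8.1-style inherited partitions, so that only the redirection rule ever needs to change:

CREATE TABLE events (event_date date NOT NULL, payload text);

CREATE TABLE events_2006q3 (
    CHECK (event_date >= DATE '2006-07-01' AND event_date < DATE '2006-10-01')
) INHERITS (events);

CREATE TABLE events_2006q4 (
    CHECK (event_date >= DATE '2006-10-01' AND event_date < DATE '2007-01-01')
) INHERITS (events);

-- Single redirection rule on the parent; repoint it (CREATE OR REPLACE RULE)
-- to the next child shortly before that child's date range begins.
CREATE RULE events_insert AS ON INSERT TO events
    DO INSTEAD INSERT INTO events_2006q3 VALUES (NEW.event_date, NEW.payload);

With the ranges fixed in advance, switching partitions only means replacing the rule; no existing CHECK constraint ever has to be altered, so constraint_exclusion never sees a constraint in flux.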
Re: [GENERAL] Postmaster is starting but shutting when trying to connect (Windows)
On 07.07.2006 09:20 Thomas Kellerer wrote:
> Hello, I have a PostgreSQL (8.1) installation for testing purposes which has been running fine for several months now (Windows XP). I was working with it yesterday, and today, after booting my computer and restarting the service (I start the service manually because I don't need the server running all the time), postmaster comes up fine, no entries in the log file. But as soon as I try to connect to the server (no matter what type of client: psql, PgAdmin or JDBC) I get an error message in the log file.

Follow-up: when I start Postgres manually, i.e. from a command line running as the postgres user, using pg_ctl, then everything works fine. When I use "net start" to start the Postgres service, I cannot connect.

Cheers
Thomas
Re: [GENERAL] Why my cursor construction is so slow?
# kleptog@svana.org / 2006-06-22 09:19:44 +0200:
> On Tue, Jun 20, 2006 at 02:06:19AM -0700, [EMAIL PROTECTED] wrote:
> > Such a construction is very slow, but when I modify the SQL to:
> > OPEN cursor1 FOR SELECT * FROM alias WHERE mask=alias_out ORDER BY mask LIMIT 100;
> > it works very fast. It is strange to me, because I had understood so far that when a cursor is opened the select is executed but Postgres does not select all rows - the cursor is only positioned on the first row, and when you execute FETCH the next row is read. But this example shows something different.
>
> PostgreSQL tries to optimise for overall query time. Without the LIMIT it tries to find a plan that will return the whole set as quickly as possible.

That looks like the wrong approach for a cursor.

> With the LIMIT it might take a different approach, which might be worse if you read the whole lot, but better for a limited set. A fast-start plan so to speak.

That looks like a better approach for a cursor.
Re: [GENERAL] Long term database archival
On 7/7/2006 17:49, Csaba Nagy [EMAIL PROTECTED] wrote:
> On Thu, 2006-07-06 at 20:57, Karl O. Pinc wrote:
> > Hi, What is the best pg_dump format for long-term database archival? That is, what format is most likely to be able to be restored into a future PostgreSQL cluster? Should we want to restore a 20-year-old backup, nobody's going to want to be messing around with decoding a custom-format dump if it does not just load all by itself.
>
> Karl, I would say that if you really want data from 20 years ago, keep it in the custom format, along with a set of the sources of the postgres version which created the dump. Then in 20 years, when you need it, you'll compile the sources and load the data into the original postgres version... Of course you might need to also keep an image of the current OS and the hardware you're running on if you really want to be sure it will work in 20 years :-)
>
> Cheers,
> Csaba.

Depending on the size of the data (if it isn't too large) you could consider creating a new database for archives, maybe even one for each year. This can be on an old server or backup server instead of the production one. Unless the data is too large, you can dump/restore the archive data to each new pg version as you upgrade, meaning the data will always be available and you won't have any format issues when you want to retrieve it.
[GENERAL] VACUUM and fsm_max_pages
Good morning List,

I have seen several posts on this concept but I don't find a complete response. I'm using BenchmarkSQL to evaluate PostgreSQL in transaction processing, and I work with PostgreSQL 8.1.3 on RHEL4-AS, Itanium-2 processor, 8GB RAM. The database, generated and used via BenchmarkSQL, is a 200-warehouse database and its size is about 20GB. The parameter "max_fsm_pages" is set to 20000 and "max_fsm_relations" to 1000.

Between two benchmarks, I launch a VACUUM, but at the end of it I see that PostgreSQL asks me to increase the "max_fsm_pages" parameter, and the value proposed grows with the number of VACUUMs launched... Could someone explain to me why? This is an example of the message I get:

Free space map contains 20576 pages in 17 relations
A total of 20000 page slots are in use (including overhead)
128512 page slots are required to track all free space
Current limits are: 20000 page slots, 1000 relations, using 223 KB
Number of page slots needed (128512) exceeds max_fsm_pages (20000)
HINT: Consider increasing the config parameter "max_fsm_pages" to a value over 128512.

In order not to have to launch a VACUUM FULL, I increase the value of "max_fsm_pages", but is that correct?

Thank you for your help.
Regards,
Alexandra DANTE
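For reference, the change is a postgresql.conf edit followed by a server restart; the values below are only an illustration (max_fsm_pages simply needs to be at or above the "page slots needed" figure that VACUUM reports):

# postgresql.conf (8.1), illustrative values only
max_fsm_pages = 150000        # above the 128512 page slots VACUUM reported
max_fsm_relations = 1000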
[GENERAL] WAL internals
Hello,

I am currently working on a heavily stressed system and I am having some difficulties making the background writer work properly. But before going any further, I would like to be sure that I understand how transaction processing and management are handled by PostgreSQL. Correct me if I'm wrong:

1. A transaction is sent from the PostgreSQL backend.

2. If the result of the transaction is within the shared buffers, we get our result instantly. Otherwise some searching is done within the database datafiles to get the result, which is copied to the shared buffers memory zone. The transaction is stored in a WAL buffer. Now, when the result is copied to the shared buffer, if the transaction was an update or a delete, the line is flagged to be updated/deleted in the datafiles. Transactions go on and on this way. At some point, the WAL buffers are written to the checkpoint segments; I don't know when, if you could just clarify this point.

3. Then periodically there are checkpoints; those will make the changes into the datafiles from the shared buffers (meaning shared buffers are flushed into the datafiles). The last written record in the datafiles is flagged into the checkpoint segments; that way REDOs are possible.

Now I tried to set bgwriter_lru_percent to 100% and bgwriter_lru_maxpages to 1000, and I did not spot any difference in disk activity, CPU usage or anything else from the default setup, which is 1% and 5. So I was wondering if committing after every transaction would prevent me from seeing any difference, or is there another explanation? By the way, I made sure all changes took effect: I restarted the postmaster process.

I first asked these questions on the novice mailing list. Thanks in advance for your help.

Regards,
Fabrice Franquenk
[GENERAL] Do checkpoints flush all data from shared buffers ?
Hello,

I was wondering if each checkpoint will flush all transactions from the shared buffers, or if there could be some left at the end of the checkpoint? Because I was trying to lower the I/O on the disks, I lowered the checkpoint timeout to 150 seconds so that I get twice the number of checkpoints. I was hoping it would reduce the number of I/Os on the disks because there would be less data to write to the datafiles...

Thanks in advance for your help.
Regards,
Fabrice Franquenk
Re: [GENERAL] Version/Change Management of functions?
# [EMAIL PROTECTED] / 2006-07-06 22:41:27 -0600:
> OK I know this is an odd question but I'm working on an app that will rely more and more on database driven functions, and while the app's source is in SVN, and I intend for the source of the SQL scripts to also be there, I was wondering...what are people doing for version control and change management on their custom (esp PL/pgSQL and say PL/Perl) functions?

Well, people use a version control system. Do you have any specific questions?
Re: [GENERAL] WAL internals
On Fri, Jul 07, 2006 at 11:49:13AM +0200, Fabrice Franquenk wrote:
> 2. If the result of the transaction is within the shared buffers, we get our result instantly. Otherwise some searching is done within the database datafiles to get the result, which is copied to the shared buffers memory zone. The transaction is stored in a WAL buffer. Now, when the result is copied to the shared buffer, if the transaction was an update or a delete, the line is flagged to be updated/deleted in the datafiles.

The shared buffers are merely images of what's on disk. Whenever a block is read from disk it goes into a shared buffer. This applies to index pages, data pages, anything. Tuples are inserted into the pages in the buffer, and also appended to the transaction log. The page is marked as dirty but not yet written out.

> Transactions go on and on this way.

When a transaction commits, the WAL is synced.

> At some point, the WAL buffers are written to the checkpoint segments; I don't know when, if you could just clarify this point.

Straight away, why delay?

> 3. Then periodically there are checkpoints; those will make the changes into the datafiles from the shared buffers (meaning shared buffers are flushed into the datafiles). The last written record in the datafiles is flagged into the checkpoint segments; that way REDOs are possible.

No, all checkpoints do is make sure all pages in the shared buffers match what's on disk. When that's the case, you don't need to keep the WAL anymore. REDO just replays the WAL, nothing more. You only read it on unclean shutdown.

> Now I tried to set bgwriter_lru_percent to 100% and bgwriter_lru_maxpages to 1000, and I did not spot any difference in disk activity, CPU usage or anything else from the default setup, which is 1% and 5. So I was wondering if committing after every transaction would prevent me from seeing any difference, or is there another explanation?

All the bgwriter does is write out dirty pages to disk so checkpoints don't take as long. How much data is there ever outstanding on your system? If it's not much, then the bgwriter probably isn't doing much...

BTW, check out the documentation on WAL, it's much clearer than what I've written.

Have a nice day,
--
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
Re: [GENERAL] Why my cursor construction is so slow?
On Fri, Jul 07, 2006 at 11:30:35AM +0000, Roman Neuhauser wrote:
> > With the LIMIT it might take a different approach, which might be worse if you read the whole lot, but better for a limited set. A fast-start plan so to speak.
>
> That looks like a better approach for a cursor.

For a cursor, postgres assumes you're going to ask for about 10% of the result, so it does aim for a reasonably fast-start plan. It probably depends on the specifics of the situation how well it works...

Have a nice day,
--
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
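For reference, the behaviour being discussed can be reproduced with an explicit SQL-level cursor (a sketch; the alias table and mask column come from the original poster's example, the rest is illustrative):

BEGIN;
-- The planner assumes only a fraction of the result set will actually be
-- fetched from a cursor, so it tends to pick a fast-start plan here.
DECLARE alias_cur CURSOR FOR
    SELECT * FROM alias ORDER BY mask;
FETCH 100 FROM alias_cur;
CLOSE alias_cur;
COMMIT;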
Re: [GENERAL] Long term database archival
Csaba Nagy wrote:
> ...
> Karl, I would say that if you really want data from 20 years ago, keep it in the custom format, along with a set of the sources of the postgres version which created the dump. Then in 20 years, when you need it, you'll compile the sources and load the data into the original postgres version... Of course you might need to also keep an image of the current OS and the hardware you're running on if you really want to be sure it will work in 20 years :-)

No need - you will just emulate the whole hardware in 20 years ;-)

Regards
Tino
[GENERAL] How to sample records
Dear friends,

Owing to a very poor webmail I'm compelled to use, I'm resending my messed-up message of yesterday.

In postgresql 8.0.7 I have the following table:

\d basedati
         Tabella "public.basedati"
 Colonna  |            Tipo             | Modificatori
----------+-----------------------------+--------------
 data_ora | timestamp without time zone |
 cod_wmo  | character(5)                |
 t_aria   | numeric(5,1)                |
 ur       | integer                     |

looking like:

      data_ora       | cod_wmo | t_aria | ur
---------------------+---------+--------+----
 2006-07-05 00:50:00 | 16235   |   22.0 | 74
 2006-07-05 02:50:00 | 16235   |   19.0 | 80
 2006-07-05 09:50:00 | 16235   |   28.0 | 45
 2006-07-05 12:00:00 | 16235   |   31.0 | 41
 2006-07-06 00:00:00 | 16235   |   20.1 | 93
 2006-07-06 00:50:00 | 16235   |   20.0 | 93
 2006-07-06 06:50:00 | 16235   |   25.0 | 65
 2006-07-06 11:50:00 | 16235   |   30.0 | 49
 2006-07-06 12:00:00 | 16235   |   29.5 | 51
 2006-07-05 03:00:00 | 16242   |   19.9 | 64
 2006-07-05 03:15:00 | 16242   |   20.0 | 69
 2006-07-05 10:15:00 | 16242   |   28.0 | 39
 2006-07-05 12:00:00 | 16242   |   28.6 | 39
 2006-07-06 00:00:00 | 16242   |   22.2 | 71
 2006-07-06 00:15:00 | 16242   |   22.0 | 74
 2006-07-06 00:45:00 | 16242   |   23.0 | 64
 2006-07-06 01:15:00 | 16242   |   21.0 | 74
 2006-07-06 10:15:00 | 16242   |   27.0 | 56
 2006-07-06 12:00:00 | 16242   |   25.6 | 72

What I would like to extract (and I'm unable to!!) is: for each cod_wmo and each day (that is, to_char(data_ora, 'YYYY-MM-DD')), the complete record corresponding to the maximum of t_aria. E.g.:

      data_ora       | cod_wmo | t_aria | ur
---------------------+---------+--------+----
 2006-07-05 12:00:00 | 16235   |   31.0 | 41
 2006-07-06 11:50:00 | 16235   |   30.0 | 49
 2006-07-05 12:00:00 | 16242   |   28.6 | 39
 2006-07-06 10:15:00 | 16242   |   27.0 | 56

Please help this absolute beginner.

Ciao
Vittorio
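One possible shape of an answer (not part of the original message, just a sketch against the basedati table above) is PostgreSQL's DISTINCT ON, which keeps the first row of each group in the given sort order:

SELECT DISTINCT ON (cod_wmo, data_ora::date)
       data_ora, cod_wmo, t_aria, ur
FROM   basedati
ORDER  BY cod_wmo, data_ora::date, t_aria DESC;

Because the ORDER BY places the highest t_aria first within each (cod_wmo, day) group, that is the row DISTINCT ON retains.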
Re: [GENERAL] Version/Change Management of functions?
Michael Loftis wrote:
> OK I know this is an odd question but I'm working on an app that will rely more and more on database driven functions, and while the app's source is in SVN, and I intend for the source of the SQL scripts to also be there, I was wondering...what are people doing for version control and change management on their custom (esp PL/pgSQL and say PL/Perl) functions?

We went for generating all server-side code out of a data dictionary. This makes for a significant change in the way change management is handled. In this scenario change management becomes the analysis of "before" and "after" data dictionaries. If the changes are all valid, build the code.

--
Genius might be described as a supreme capacity for getting its possessors into trouble of all kinds. -- Samuel Butler
Re: [GENERAL] Version/Change Management of functions?
Kenneth Downs [EMAIL PROTECTED] writes:
> We went for generating all server-side code out of a data dictionary. This makes for a significant change in the way change management is handled. In this scenario change management becomes the analysis of "before" and "after" data dictionaries. If the changes are all valid, build the code.

Ken, could you explain it a bit better? I think this is an interesting idea.

--
Jorge Godoy  [EMAIL PROTECTED]
[GENERAL] migration from Sybase to Postgres
What is the most appropriate way to migrate a database from Sybase (SQL Anywhere 5.504) to Postgres? I found some shareware or freeware migration tools on the net; which are the best? Is it faster to use any of them, or just simply do it manually?

Thanks
Tomas
Re: [GENERAL] migration from Sybase to Postgres
On 7 Jul 2006 12:50:22 -0000, [EMAIL PROTECTED] wrote:
> What is the most appropriate way to migrate a database from Sybase (SQL Anywhere 5.504) to Postgres? I found some shareware or freeware migration tools on the net; which are the best? Is it faster to use any of them, or just simply do it manually?

An excellent conversion tool is actually Microsoft SQL Server. It can convert via DTS transformation from any ODBC source to any other ODBC source, if you have a Windows machine to do the conversion and a copy of the database. Barring that, I would just dump Sybase into SQL statements (does it support that?), massage the text, and pipe it into psql.

merlin
[GENERAL] Need help with quote escaping in exim for postgresql
Hi,

I am the maintainer of Debian's packages for exim4, a powerful and versatile Mail Transfer Agent developed in Cambridge and in wide use throughout the Free Software Community (http://www.exim.org/). One of our daemon flavours has PostgreSQL support.

Our security guys have found a flaw in exim regarding quote escaping for PostgreSQL. The bug is filed in Debian's BTS as http://bugs.debian.org/369351 and was transferred to exim's Bugzilla installation as http://www.exim.org/bugzilla/show_bug.cgi?id=107.

Personally, I do not have any PostgreSQL experience (and do not have time and expertise to accumulate any); the PostgreSQL support code in exim was contributed some time ago, and Philip Hazel, exim's author, doesn't know too much about PostgreSQL either. From what I understand, the correct way would be to use PQescapeStringConn, but that function needs an established connection, and exim performs string escaping early, way before the actual connection is established.

I'd appreciate it if anybody familiar with PostgreSQL programming could take a look at the two bug reports and probably exim's program code and suggest a possible solution, preferably in the bugzilla issue log referenced above. I'll monitor this thread for possible solutions and help, though.

Any help would be greatly appreciated.

Greetings
Marc
--
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Mannheim, Germany  |  lose things."    Winona Ryder | Fon: *49 621 72739834
Nordisch by Nature | How to make an American Quilt  | Fax: *49 621 72739835
Re: [GENERAL] Version/Change Management of functions?
On 7/7/06, Michael Loftis [EMAIL PROTECTED] wrote:
> OK I know this is an odd question but I'm working on an app that will rely more and more on database driven functions, and while the app's source is in SVN, and I intend for the source of the SQL scripts to also be there, I was wondering...what are people doing for version control and change management on their custom (esp PL/pgSQL and say PL/Perl) functions?

Keeping your sql procs in cvs/svn is a great idea, and is (IMHO) the only way to really do it right if you have a large project. I have some suggestions that may help you get started.

1. Keep your database procedures in different schemas in the database. Do not put tables or views in these schemas. For example, if you are building an accounting application, make schemas called ar (accounts receivable), ap (payables), gl, etc. Put all sql code in the appropriate schemas. These should mirror your folder structure in your code repository. Since schemas can only go one level deep, try to structure your code base to go only one level deep.

2. For each schema/folder, maintain a sql build file or some type of make file which uploads the code to the database. You could get fancy with this, or just do a simple "cat *.sql | psql yadda" in a one-line shell script. The important thing is to have an automatic way of reconstructing your database.

3. Ban your developers from editing directly in the database. This means no pgadmin (for ddl), and no direct ddl in the shell, because this bypasses the source control. While it is fine for a development test database, all uploads to the production database should go through the build system. It is ok to copy/paste from .sql files into shell/pgadmin, however.

4. In application code and database code, make a habit of fully qualifying the function, e.g. select ar.update_invoices();

5. When you make updates to a production system, just include (\i) your .sql files that have been updated with the change. dml can be inlined, however. e.g.

-- yadda_1.1.sql
-- converts yadda from 1.0 to 1.1
\i ../../ar/update_invoices.sql
\i ../../ap/delete_customer.sql
update foo set bar = 1;

6. I would suggest, for extra safety purposes, doing a full schema-only dump on cron and inserting it into svn on a daily basis.

7. Views and other table-dependent objects (triggers, but not trigger functions) should be stored in the same schema as the table(s) they operate over. Unlike functions, they therefore cannot keep a 1-1 folder correspondence if you have multiple copies of the same table in different schemas.

Putting all this together, I would suggest a folder structure like:

yadda
  ar
    funcs
      update_invoices.sql
    views
      archived_invoices.sql
    build_ar.sql
  ap
    funcs
    views
    build_ap.sql
  updates
    yadda_1.0.sql
    yadda_1.1.sql

merlin
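As a concrete illustration of point 2 (a sketch only; the file names follow the hypothetical layout above, nothing here comes from an actual build system in the thread), a per-schema build file can simply be a psql script that includes each source-controlled file:

-- build_ar.sql: rebuild everything in the "ar" schema from the checked-in files
-- (assumes the schema itself already exists). Run with: psql -f build_ar.sql yadda
SET search_path = ar;
\i funcs/update_invoices.sql
\i views/archived_invoices.sql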
Re: [GENERAL] Long term database archival
Ben wrote:
> On Thu, 6 Jul 2006, Dann Corbit wrote:
> > It's the data that contains all the value. The hardware becomes obsolete when it can no longer keep up with business needs.
>
> ... or can no longer be repaired. :)

http://www.softresint.com/charon-vax/index.htm

--
Ron Johnson, Jr.
Jefferson LA USA

Is common sense really valid? For example, it is common sense to white-power racists that whites are superior to blacks, and that those with brown skins are "mud people". However, that common sense is obviously wrong.
[GENERAL] How to optimize query that concatenates strings?
Hi all,

I've got a database of URLs, and when inserting new data into it I want to make sure that there are no functionally equivalent URLs already present. For example, 'umist.ac.uk' is functionally the same as 'umist.ac.uk/'. I find that searching for the latter form, using string concatenation to append the trailing slash, is much slower than searching for a simple string - the index on URL name isn't used to speed up the search. Here's an illustration:

url=# explain select exists(select * from url where url = 'umist.ac.uk' or url || '/' = 'umist.ac.uk') as present;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Result  (cost=47664.01..47664.02 rows=1 width=0)
   InitPlan
     ->  Seq Scan on url  (cost=0.00..47664.01 rows=6532 width=38)
           Filter: ((url = 'umist.ac.uk'::text) OR ((url || '/'::text) = 'umist.ac.uk'::text))
(4 rows)

url=# explain select exists(select * from url where url = 'umist.ac.uk') as present;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Result  (cost=5.97..5.98 rows=1 width=0)
   InitPlan
     ->  Index Scan using url_idx on url  (cost=0.00..5.97 rows=1 width=38)
           Index Cond: (url = 'umist.ac.uk'::text)
(4 rows)

Is there any way I can force postgres to use the index when using the string concatenation in the query?

Thanks in advance,
BBB
Re: [GENERAL] OLEDB connection does not want to work. Help!!
> TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly

Is this for me? I am currently using Google to post.
Re: [GENERAL] OLEDB connection does not want to work. Help!!
Hi Merlin,

As I tried to explain, I do not need just to send some sql to postgres; I am working at a higher level of abstraction, where I need the information that must be provided by any OleDb provider.

> in my opinion the support for the npgsql driver is ok (not great). It is much better than the oledb provider however.

No doubt about that, but this is irrelevant for my purposes. In general, when you have an application that needs to speak to all possible DBMSs (and there are many, such as reporting tools or query builders, ...), and hence relies strictly on the OleDb information, OleDb is the only way to go (I believe).

> I have used basically every version of visual studio and have found that in most cases for most uses, you will be best off with the oledb provider for odbc drivers if you want to use such tools. Second choice (.net only) is to use npgsql driver and typed datasets.

I am NOT talking about the oledb provider for odbc drivers: I have never used it either. That's one of the reasons for the popularity OleDb providers are gaining among programmers.

> ODBC is a much more important standard than oledb.

Here we have different opinions. I could show you a lot of things you cannot do without the OleDb functionalities, at least if you want to talk to all dbms at the same time.

> True, but oledb is basically a microsoft-only protocol while odbc is essentially an open standard. Personally, I don't mind using specific providers. For example, at least 50% of my development is directly over the libpq library.

Nowadays every producer must have its OleDb provider. It's not an option. More or less like, for me, it is not an option to write a program that does not use web services or isn't web enabled. Of course I could disregard that part, but then I would lose about 90% of my potential users (or clients). It is important that the Postgres people realize that a good OleDb provider is crucial to spread their product. Just as an example, I am waiting for their fix to provide support to Postgres users in one project of mine: http://cam70.sta.uniroma1.it/Community/ and this will certainly contribute to increasing (a little) the popularity of Postgres. (But I cannot do anything to help them if they don't fix it.)

You know, Merlin, sometimes people's perspectives are just different because they come from different experiences and environments. It's not a matter of being right or wrong. In my experience, it is crucial to be able to rely on the high-level information provided by the OleDb protocol, because it allows you to have a single interface for any dbms in the world and a single way to do things (apart from the slight differences in sql dialects). This allows easy system integration. If you see, for instance, how DataTime works, retrieving dbms structure while talking with any dbms and easily moving data from one to another, it will be clear what I mean. If I had to write code that depends on the underlying DBMS, maintenance would just be impossible and programming a real hell (it's already a hell the way it is now :) !)

General standards have always prevailed. It's just a matter of time.

Warm regards,
Tommaso
Re: [GENERAL] Need help with quote escaping in exim for postgresql
On Fri, Jul 07, 2006 at 03:48:00PM +0200, Marc Haber wrote:
> Hi, I am the maintainer of Debian's packages for exim4, a powerful and versatile Mail Transfer Agent developed in Cambridge and in wide use throughout the Free Software Community (http://www.exim.org/). One of our daemon flavours has PostgreSQL support. Our security guys have found a flaw in exim regarding quote escaping for PostgreSQL. The bug is filed in Debian's BTS as http://bugs.debian.org/369351 and was transferred to exim's Bugzilla installation as http://www.exim.org/bugzilla/show_bug.cgi?id=107.

Whether or not the quick fix works for you depends entirely on the encoding used by the client to talk to the database. If the connection is encoded using UTF-8 or any of the Latin series, then it will be fine. The only time it does not work is if the encoding is one where the quote or backslash character can appear as the second character of a multibyte char. This doesn't happen with UTF-8 or any Latin encoding.

http://www.postgresql.org/docs/techdocs.50

This bit may be useful also (especially the second point):

"There are a number of mitigating factors that may keep particular applications from being subject to these security risks:
* If the application always sends untrusted strings as out-of-line parameters, instead of embedding them into SQL commands, it is not vulnerable.
* If client_encoding is a single-byte encoding (e.g., one of the LATINx family), there is no vulnerability.
* If the application cannot pass invalidly encoded data to the server, there is no vulnerability (this probably includes all Java applications, for example, because of Java's handling of Unicode strings)."

The easiest may be to simply always set the client encoding to something like UTF-8 and work the escaping rules so they work with that.

Hope this helps,
--
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
Re: [GENERAL] Version/Change Management of functions?
Jorge Godoy wrote:
> Kenneth Downs [EMAIL PROTECTED] writes:
> > We went for generating all server-side code out of a data dictionary. This makes for a significant change in the way change management is handled. In this scenario change management becomes the analysis of "before" and "after" data dictionaries. If the changes are all valid, build the code.
>
> Ken, could you explain it a bit better? I think this is an interesting idea.

Sure. To start off I'd say I'm one of those "biz rules belong in the server" guys. My guess is we are on the same page there, so we'll take that as a given.

So anyway, some years ago I joined an existing project and was eventually promoted to systems architect. Along the way I developed their change management system from scratch (we had more salary dollars than tools dollars). The "Aha!" moment came when I realized what may seem obvious to many, which was that you can never, nohow, noway, never prove ahead of time that any particular piece of code is not going to break something. You can't even prove it will do what anybody claims. I wanted a way to know by analysis, just by looking, that any particular change to a spec would work. That is, it would do what it was supposed to do, without stopping other things from doing what they were supposed to do.

It so happens you can have this if you generate your code out of a spec that is itself data. The spec has to be comprehensive; it can't just be columns and tables. You need to be able to specify security and derivations all in one place; that is the only way to specify all business rules in a single place.

There are two major things you can do to make sure a spec is workable before you start generating DDL and triggers. First, you look for mistakes in the spec itself, such as duplicate column names in tables, references to non-existent tables, and so forth. Second, you look for mistakes or impossibilities in the delta-spec, the changes to the spec. For instance, if column COL1 is char(7) and the new spec has it listed as INT, you can stop there and tell the person the change is not valid.

Furthermore, you can then do really cool things like generate a report of what *would* happen if you did an upgrade, such as the creation of new tables, changes in formulas for existing columns, new cascades, changes in definitions of keys (added a delete cascade, removed a delete cascade), and then give it to the customer to sign. Ha! I love that one :)

What falls out of all of this for free is that once you have that data dictionary you don't have to code maintenance forms anymore, because a library file can generate any maintenance form from the dictionary description of a particular table.

So anyway, that's the tip of the iceberg on that. Once you go to a dictionary-based generation system, it actually changes a lot of how you do things, not just change management.
[GENERAL] Addressing: ERROR: could not access status of transaction
PostgreSQL has been providing reliable service for our web hosting company since 1997. Thanks!

Last night we got the following error during a dump of an 8.0.6 database:

pg_dump: SQL command failed
pg_dump: Error message from server: ERROR: could not access status of transaction 245900066
DETAIL: could not open file /usr/local/pgsql/data/pg_clog/00EA: No such file or directory
pg_dump: The command was: SELECT tableoid, oid, oprname, oprnamespace, (select usename from pg_user where oprowner = usesysid) as usename, oprcode::oid as oprcode FROM pg_operator

Another dump run during the same time frame did not have this problem, and running the mentioned command in 'psql' produces no error. Research shows that this could be a corrupted tuple, but it's not clear to me what it means if it sometimes works.

This follows behavior in the past few days where we noticed PostgreSQL core dumping repeatedly, with this error:

PANIC: right sibling's left-link doesn't match

The core dumps stopped when we REINDEXed the table mentioned in the PANIC statement.

We were already planning to upgrade to 8.1.x Real Soon. Are there further insights about this new error? And is my expectation correct that a full dump/restore into 8.1.x would address it?

Thanks!
Mark

. . . . . . . . . . . . . . . . . . . . . . . . . . .
Mark Stosberg            Principal Developer
[EMAIL PROTECTED]        Summersault, LLC
765-939-9301 ext 202     database driven websites
. . . . . http://www.summersault.com/ . . . . . . . .
[GENERAL] Can Log filename include database name?
I am currently using a log with the file name format:

log_filename = 'postgresql-%Y-%m.log'

Is there any way to change the filename to start with the database name? For now I have just added the database name to each line, but it would be useful to have each DB written to its own file. Or even better, to be able to specify on a per-database basis whether to log or not.
Re: [GENERAL] Need help with quote escaping in exim for postgresql
On Fri, Jul 07, 2006 at 03:48:00PM +0200, Marc Haber wrote:
> From what I understand, the correct way would be to use PQescapeStringConn, but that function needs an established connection, and exim performs string escaping early, way before the actual connection is established.

I just downloaded the code and looked. The code never looks at or checks the encoding of the database. This is bad from a security point of view because it means you have no idea how your queries are going to be interpreted.

I'd suggest adding a PQsetClientEncoding(conn, "Latin1") right after you establish a connection. I'm not sure if Exim has any kind of declaration about what encoding strings have internally. You could use UTF-8, but then postgres would complain if you pass any strings that aren't valid UTF-8. That may or may not be desirable. SQL_ASCII may also be an option (assign no special meaning to characters at all), but I'm less sure of that.

Can email addresses contain multibyte characters? I didn't think so... What about the configuration file?

Hope this helps,
--
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
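For completeness, the same effect can also be obtained at the SQL level once the connection exists (a sketch only, not something proposed in the thread):

-- Run once per connection, before any query built from untrusted strings.
SET client_encoding TO 'LATIN1';
-- or, if the application's internal strings are known to be valid UTF-8:
-- SET client_encoding TO 'UTF8';
SHOW client_encoding;   -- verify what the server actually applied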
[GENERAL] VACUUM FULL versus CLUSTER ON
Postgresql 8.0.4 on FreeBSD 5.4.

I have a table consisting of some 300 million rows that, every couple of months, has 100 million rows deleted from it (and is immediately vacuumed afterward). Even though it gets routinely vacuumed (the only deletions/updates are just the quarterly ones), the free space map was not increased in size to keep up with the growing size of the other tables in the database, which do experience many updates, etc. I suspect that the table is suffering from bloat (not the indexes, though, as I drop them prior to the huge delete, then create them anew).

What would be the recommended method for reclaiming the disk space lost due to bloat?

- Dropping the 5 indexes on the table and doing a VACUUM FULL?
- Keeping the indexes and doing a VACUUM FULL (does FULL perform the same disk-moving operations on the indexes as it does on the actual table?)
- Dropping the indexes except the primary key and doing a CLUSTER on the primary key?
- Keeping the indexes and doing a CLUSTER on the primary key (again, does CLUSTER just operate on the table proper?)

What are the caveats of using one over the other? I imagine any of the options I listed above will involve a full table lock. Are there any differences in the amount of free disk space required for each method?

Thanks,
Sven
Re: [GENERAL] How to optimize query that concatenates strings?
badlydrawnbhoy wrote:
> Hi all, I've got a database of URLs, and when inserting new data into it I want to make sure that there are no functionally equivalent URLs already present. For example, 'umist.ac.uk' is functionally the same as 'umist.ac.uk/'. I find that searching for the latter form, using string concatenation to append the trailing slash, is much slower than searching for a simple string - the index on URL name isn't used to speed up the search. Here's an illustration:
>
> url=# explain select exists(select * from url where url = 'umist.ac.uk' or url || '/' = 'umist.ac.uk') as present;

Well, in that example, you should just remove the OR conditional - it just evaluates to false anyway.

> Is there any way I can force postgres to use the index when using the string concatenation in the query?

If you are always going to strcat with a '/', you could probably create a functional index, or add a new column for a normalized url (which is what I'd lean towards).
Re: [GENERAL] Addressing: ERROR: could not access status of transaction
Mark Stosberg [EMAIL PROTECTED] writes:
> Last night we got the following error during a dump of an 8.0.6 database:
> pg_dump: Error message from server: ERROR: could not access status of transaction 245900066
> Another dump run during the same time frame did not have this problem, and running the mentioned command in 'psql' produces no error. Research shows that this could be a corrupted tuple, but it's not clear to me what it means if it sometimes works.

Transient errors of this kind sound like flaky hardware to me. Might be time to run some memory and disk diagnostics ...

> This follows behavior in the past few days where we noticed PostgreSQL core dumping repeatedly, with this error:
> PANIC: right sibling's left-link doesn't match
> The core dumps stopped when we REINDEXed the table mentioned in the PANIC statement.

That too could have been from a hardware fault during a previous index update. (There's a known bug in early 8.1.x releases that had this symptom, but not in 8.0.x IIRC.)

regards, tom lane
Re: [GENERAL] Long term database archival
> of course you might need to also keep an image of the current OS and the hardware you're running on if you really want to be sure it will work in 20 years :-)

I think that in twenty years most of us will be more worried about our retirement than the long-term data concerns of the companies we will no longer be working for. :-D Of course, some of us who really enjoy what we do for work might prefer to die with our work boots on.

Regards,
Richard Broersma Jr.
Re: [GENERAL] duplicated values on primary key field on reindex
On Thu, 2006-07-06 at 17:30, Weerts, Jan wrote:
> Scott Marlowe wrote:
> > On Thu, 2006-07-06 at 16:36, Weerts, Jan wrote:
> > > Hi all! This was 8.1.3 and now is 8.1.4, running on Debian Sarge, locally compiled without any fancy options. While the first answer seems much more valid (the primarykey is an artificially created number), the second answer seems to be the one being presented for all further invocations of the above query. I noted that the second row does not fit the ORDER BY clause, so I tried a reindex of the db, but that led to a duplicate value error:
> > >
> > > # reindex index tw_blob_pkey;
> > > ERROR: could not create unique index
> > > DETAIL: Table contains duplicated values.
> > >
> > > Now that is something I don't understand at all. Since the backup for said server went postal too long ago unnoticed, I would prefer a repair solution. Any ideas?
> >
> > Can you get a set of fields in that row to uniquely identify it by? If so, see if you can update that column to something else and continue.
>
> The only way would be to update by primarykey. But since the select on the primarykey field shows this strange ordering, I wonder what effect an update would have. My guess would be that the correct pk value should be 1636695, but seeing only 216305 on subsequent calls makes me think. I even have executed
>
> # select * from tw_blob where primarykey = 216305;
>
> and receive a single row, which I don't really trust to be the same one producing the error.

If there are no other fields you can use to uniquely identify that row, then add a new column to the table and update it from a sequence...

create sequence deargodsaveme;
alter table brokentable add column emergencyint int;
update brokentable set emergencyint=nextval('deargodsaveme');

then use that new column, emergencyint, to select the row for an update.

Note that this kind of problem is generally a sign of faulty hardware, so you'll need to be looking at your machine's hardware (memory, CPU, etc.) and possible problems like faulty fsyncing, etc., to make sure it doesn't happen again.
Re: [GENERAL] VACUUM FULL versus CLUSTER ON
> [snip] as I drop them prior to the huge delete, then create them anew). What would be the recommended method for reclaiming the disk space lost due to bloat? Dropping the 5 indexes on the table and doing a VACUUM FULL; keeping the indexes and doing a VACUUM FULL (does FULL perform the same disk-moving operations on the indexes as it does on the actual table?); dropping the indexes except the primary key and doing a CLUSTER on the primary key; keeping the indexes and doing a CLUSTER on the primary key (again, does CLUSTER just operate on the table proper?)

I don't know for sure, but I guess the least downtime you would get by not dropping the indexes before the delete, but doing a REINDEX after it. Then cluster on the primary key...

My reasoning (correct me if I'm wrong): the deletion speed won't be affected by the indexes; I think deletions don't touch the indexes at all. The REINDEX command recreates all indexes at once; I think it needs only one full table scan. That needs the indexes in place, so you shouldn't drop them. CLUSTER is a lot faster than VACUUM FULL. The only problem could be that I think all these operations might take more disk space than individual indexing + VACUUM FULL.

Are my assumptions correct?

Cheers,
Csaba.
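A sketch of the sequence being described (the table, column and index names are hypothetical, not from the thread):

-- Quarterly maintenance: mass delete, rebuild the indexes, then rewrite the heap.
DELETE FROM big_table WHERE event_date < DATE '2006-04-01';
REINDEX TABLE big_table;                 -- rebuild all indexes after the delete
CLUSTER big_table_pkey ON big_table;     -- 8.0 syntax; rewrites the table in index order
ANALYZE big_table;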
Re: [GENERAL] VACUUM and fsm_max_pages
On Fri, 2006-07-07 at 01:57, DANTE Alexandra wrote:
> Good morning List, I have seen several posts on this concept but I don't find a complete response. I'm using BenchmarkSQL to evaluate PostgreSQL in transaction processing, and I work with PostgreSQL 8.1.3 on RHEL4-AS, Itanium-2 processor, 8GB RAM. The database, generated and used via BenchmarkSQL, is a 200-warehouse database and its size is about 20GB. The parameter "max_fsm_pages" is set to 20000 and "max_fsm_relations" to 1000. Between two benchmarks, I launch a VACUUM, but at the end of it I see that PostgreSQL asks me to increase the "max_fsm_pages" parameter, and the value proposed grows with the number of VACUUMs launched...

Oh, and if you can back up your database and import it into a test server, see how much smaller the new data/base directory is than the one on your production server. That'll give you an idea of how bloated your database is. 10 to 30% larger is fine. 100 to 1000% larger is bad. You get the idea.
Re: [GENERAL] How to optimize query that concatenates strings?
You could build a function-based index that contains the "simplified" version of each URL (in your case, the field with the trailing '/' stripped), then use the same function on the URL going in. In that case PostgreSQL will use the index that you created. Take a look at the PostgreSQL documentation for function-based indexes.

select ... from ... where simplify(url) = url_col;

In the example above, 'url_col' would have a function-based index based on 'simplify(url_col)'.

Chander Ganesan
Open Technology Group, Inc.
One Copley Parkway, Suite 210
Morrisville, NC 27560
Phone: 877-258-8987/919-463-0999

badlydrawnbhoy wrote:
> Hi all, I've got a database of URLs, and when inserting new data into it I want to make sure that there are no functionally equivalent URLs already present. For example, 'umist.ac.uk' is functionally the same as 'umist.ac.uk/'. I find that searching for the latter form, using string concatenation to append the trailing slash, is much slower than searching for a simple string - the index on URL name isn't used to speed up the search.
>
> Is there any way I can force postgres to use the index when using the string concatenation in the query?
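A concrete sketch of that suggestion, using the url table from the original post (the rtrim normalisation is only an illustration of one possible "simplify" function, not something given in the thread):

-- Index the normalised form of the URL (trailing slashes stripped).
CREATE INDEX url_norm_idx ON url ((rtrim(url, '/')));

-- A query that applies the same expression to both sides can use that index:
SELECT EXISTS (
    SELECT 1 FROM url
    WHERE rtrim(url, '/') = rtrim('umist.ac.uk/', '/')
) AS present;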
[GENERAL] How to insert .xls files into database
Hey,

I am using PostgreSQL 8.1.4 on Windows. I have a large amount of data stored in .xls files which I want to insert into my database. The columns in the .xls files are not exactly compatible with the database schema. For example, the event_id in every .xls file starts with 1, while in my database event_id is the primary key. Also, there is some information, like event_type and event_location, that is particular to each .xls file and thus is mentioned only once in the .xls file, but in the database there exists a separate column for it. For more clarification I am giving my database schema and attaching a sample .xls file.

My database schema is as follows:

{
event_id int4 NOT NULL,
buoy char(1) NOT NULL,
deployment varchar(40),
depth int4 NOT NULL,
event_type varchar(64),
model_info_id varchar(256),
start_date float8 NOT NULL,
start_date_sd float8,
end_date float8 NOT NULL,
end_date_sd float8,
mean float8,
variance float8,
max float8,
min float8,
event varchar(20) NOT NULL,
depth_type varchar(20) NOT NULL,
buoy_location geometry,
duration float8,
Amplitude_sd float8,
}

The .xls file is in the attachment. Now, as you can see, all the bold attributes are specified only once in the .xls files, and all the bold+italic ones have to be manipulated a bit before storing. Even event_id in every .xls file starts with 1, but as this is the primary key I have to manipulate it as well. I think if I can transform and manipulate each row into insert statements then I can insert the data into my database. Please guide me on how to do this, or whether there is another way of doing it. I am relatively new in this field, so please don't get offended if this problem is quite obvious.

Thanks
Parang Saraf
[EMAIL PROTECTED]
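One common approach (a sketch only, not from the thread; it assumes each spreadsheet is first exported to CSV, and every table, column and file name below is hypothetical): load the rows into a staging table with COPY, then rewrite them into the real table, filling in the once-per-file values and drawing fresh event_ids from a sequence:

-- Staging table mirrors only the per-row CSV columns, with no constraints.
CREATE TABLE events_staging (
    file_event_id int4,
    depth         int4,
    start_date    float8,
    end_date      float8,
    mean          float8,
    variance      float8,
    max           float8,
    min           float8
);

-- In psql: \copy events_staging from 'anomaly_negative.csv' with csv header

CREATE SEQUENCE event_id_seq;

-- Move the rows into the real table, overriding the per-file values.
INSERT INTO events (event_id, buoy, depth, event_type, start_date, end_date,
                    mean, variance, max, min, event, depth_type)
SELECT nextval('event_id_seq'), 'A', depth, 'density anomaly',
       start_date, end_date, mean, variance, max, min,
       'negative', 'multi-year'
FROM events_staging;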
Re: [GENERAL] VACUUM FULL versus CLUSTER ON
On Friday 07 July 2006 08:19, Sven Willenberger wrote:
> Postgresql 8.0.4 on FreeBSD 5.4. I have a table consisting of some 300 million rows that, every couple of months, has 100 million rows deleted from it (and is immediately vacuumed afterward). Even though it gets routinely vacuumed (the only deletions/updates are just the quarterly ones), the free space map was not increased in size to keep up with the growing size of the other tables in the database, which do experience many updates, etc.

Based on the size of the table, you may want to:

Backup the table
Drop the table
Restore the table

It is possible that this will be faster in this instance.

Secondly, this sounds like a perfect time for you to consider upgrading to 8.1 and making use of table partitioning. That way you can just truncate the child table containing the old data.

Sincerely,
Joshua D. Drake
Re: [GENERAL] VACUUM FULL versus CLUSTER ON
Csaba Nagy [EMAIL PROTECTED] writes: I won't know for sure, but I guess the least downtime you would get by not dropping the indexes before the delete, but do a reindex after it. Then cluster on the primary key... My reasoning (correct me if I'm wrong): the deletion speed won't be affected by the indexes, I think deletions don't touch the indexes at all. That's true, more or less. I think there's a small hit actually as queries set the hint bit and the pages have to be flushed. As long as you're just deleting and not inserting or updating The REINDEX command recreates all indexes at once, I think it needs only one full table scan. No, each index build has to do its own full scan. It wouldn't save much anyways not to, where would you store the tuples in the meantime? And why would this temporary storage place be any faster than scanning the original table? -- greg ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] VACUUM FULL versus CLUSTER ON
On Fri, 2006-07-07 at 09:55 -0700, Joshua D. Drake wrote: On Friday 07 July 2006 08:19, Sven Willenberger wrote: PostgreSQL 8.0.4 on FreeBSD 5.4. I have a table consisting of some 300 million rows that, every couple of months, has 100 million rows deleted from it (and is immediately vacuumed afterward). Even though it gets routinely vacuumed (the only deletions/updates are just the quarterly ones), the free space map was not increased in size to keep up with the growing size of the other tables in the database, which do experience many updates, etc. Based on the size of the table, you may want to: back up the table, drop the table, restore the table. It is possible that this will be faster in this instance. Secondly, this sounds like a perfect time for you to consider upgrading to 8.1 and making use of table partitioning. That way you can just truncate the child table containing the old data. Sincerely, Joshua D. Drake

Doing a quick check reveals that the relation in question currently consumes 186GB of space (which I highly suspect is largely bloat). The delete was just run this past weekend, as was the recreation of the indexes. I have 50GB of disk space left; if I vacuum full, it does not need to create a temporary copy of the relation and indexes like cluster does, does it? At this point, I think CLUSTER ON is out of the question due to the need to create the temporary table and indexes (I will run out of space during the operation). I do plan on migrating the whole mess to a new server which will run 8.1 (I had looked at inheritance for partitioning; I am glad to see that 8.1 took the concept and ran with it further :) ). This new server will use an external SAS array, so I should simply be able to add another array as the need arises and partition to it via tablespaces. Thanks to all who offered suggestions; it would appear that at this stage my only option to buy some time is to try a vacuum full. My final question: can I leave the indexes in place when I vacuum full? I assume this will only operate on the table itself? Sven ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
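For reference, the two commands being weighed here come down to roughly the following (the table name is a stand-in). VACUUM FULL compacts the existing heap file in place, so it needs no full temporary copy, but it tends to bloat the indexes as it shuffles tuples around, which is why a REINDEX afterwards is commonly recommended; REINDEX in turn needs scratch space for one index at a time, not for the whole table:

VACUUM FULL VERBOSE big_table;   -- compacts the heap in place, no full copy
REINDEX TABLE big_table;         -- rebuilds the now-bloated indexes, one at a time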
Re: [GENERAL] VACUUM FULL versus CLUSTER ON
Sincerely, Joshua D. Drake Doing a quick check reveals that the relation in question currently consumes 186GB of space (which I highly suspect is largely bloat). Good lord... 186 GB for a 300 million row table? Unless those are seriously large rows, you have a TON of bloat. Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] VACUUM FULL versus CLUSTER ON
How long does it take to do a database dump (with gzip -1 via a pipe), drop the database, recreate it, and restore it from the backup? That is my solution, but I don't know how long it would take to restore your database, and I don't have such large databases.

Secondly this sounds like a perfect time for you to consider upgrading to 8.1
I would not do this without a test (not only a dump/restore test, I would test it with your application too). Greetings, -Franz

-----Original Message----- From: Sven Willenberger [mailto:[EMAIL PROTECTED] Sent: Friday, 7 July 2006 19:26 To: Joshua D. Drake Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] VACUUM FULL versus CLUSTER ON

On Fri, 2006-07-07 at 09:55 -0700, Joshua D. Drake wrote: On Friday 07 July 2006 08:19, Sven Willenberger wrote: PostgreSQL 8.0.4 on FreeBSD 5.4. I have a table consisting of some 300 million rows that, every couple of months, has 100 million rows deleted from it (and is immediately vacuumed afterward). Even though it gets routinely vacuumed (the only deletions/updates are just the quarterly ones), the free space map was not increased in size to keep up with the growing size of the other tables in the database, which do experience many updates, etc. Based on the size of the table, you may want to: back up the table, drop the table, restore the table. It is possible that this will be faster in this instance. Secondly, this sounds like a perfect time for you to consider upgrading to 8.1 and making use of table partitioning. That way you can just truncate the child table containing the old data. Sincerely, Joshua D. Drake

Doing a quick check reveals that the relation in question currently consumes 186GB of space (which I highly suspect is largely bloat). The delete was just run this past weekend, as was the recreation of the indexes. I have 50GB of disk space left; if I vacuum full, it does not need to create a temporary copy of the relation and indexes like cluster does, does it? At this point, I think CLUSTER ON is out of the question due to the need to create the temporary table and indexes (I will run out of space during the operation). I do plan on migrating the whole mess to a new server which will run 8.1 (I had looked at inheritance for partitioning; I am glad to see that 8.1 took the concept and ran with it further :) ). This new server will use an external SAS array, so I should simply be able to add another array as the need arises and partition to it via tablespaces. Thanks to all who offered suggestions; it would appear that at this stage my only option to buy some time is to try a vacuum full. My final question: can I leave the indexes in place when I vacuum full? I assume this will only operate on the table itself? Sven ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] VACUUM FULL versus CLUSTER ON
On Fri, 2006-07-07 at 10:41 -0700, Joshua D. Drake wrote: Sincerely, Joshua D. Drake Doing a quick check reveals that the relation in question currently consumes 186GB of space (which I highly suspect is largely bloat). Good lord.. .186 gig for a 300 million row table? Unless those are seriously large rows, you have a TON of bloat. Joshua D. Drake Yes, that number came from the dbsize functions (in contrib) so I don't know if that includes the associated indexes as well. The rows are fairly large, yes, but not enough (IMO) to account for that size. It will be interesting to see the final size after the vacuum full (which is the method I have settled on to reclaim space this go round). Sven ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
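On 8.1 the size checks are built in, so the heap and the indexes can be measured separately; the check would look roughly like this (the function names below are the 8.1 built-ins, the contrib/dbsize names on 8.0 differ, and 'big_table' is a stand-in):

SELECT pg_size_pretty(pg_relation_size('big_table'));        -- heap only
SELECT pg_size_pretty(pg_total_relation_size('big_table'));  -- heap plus indexes and TOAST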
Re: [GENERAL] Long term database archival
On Jul 7, 2006, at 1:19 AM, Csaba Nagy wrote: On Thu, 2006-07-06 at 20:57, Karl O. Pinc wrote: Hi, What is the best pg_dump format for long-term database archival? That is, what format is most likely to be able to be restored into a future PostgreSQL cluster. Should we want to restore a 20 year old backup nobody's going to want to be messing around with decoding a custom format dump if it does not just load all by itself. Karl, I would say that if you really want data from 20 years ago, keep it in the custom format, along with a set of the sources of postgres which created the dump. then in 20 years when you'll need it, you'll compile the sources and load the data in the original postgres version... of course you might need to also keep an image of the current OS and the hardware you're running on if you really want to be sure it will work in 20 years :-) I've been burned by someone doing that, and then being unable to find a BCPL compiler. So don't do that. Store them in a nice, neutral ASCII format, along with all the documentation. If you can't imagine extracting the data with a small perl script and less than a days work today then your successor will likely curse your name in 20 years time. Cheers, Steve ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] How to insert .xls files into database
Parang Saraf wrote: Hey, I am using Postgresql 8.1.4 on windows. I have a large amount of data stored in .xls files which I want to insert into my database. The columns in .xls files are not exactly compatible with the database schema. For example the event_id in every .xls file starts with 1 while for my database event_id is the primary key. Also, there are some information like event_type, event_location that are particular to every .xls file and thus they have been mentioned only once in the .xls file but in the database there exists a separate column for them. For more clarification I am giving my database schema and attaching a sample .xls file. My database schema is as follows : { event_id int4 NOT NULL, buoy char(1) NOT NULL, deployment varchar(40), depth int4 NOT NULL, event_type varchar(64), model_info_id varchar(256), start_date float8 NOT NULL, start_date_sd float8, end_date float8 NOT NULL, end_date_sd float8, mean float8, variance float8, max float8, min float8, event varchar(20) NOT NULL, depth_type varchar(20) NOT NULL, buoy_location geometry, duration float8, Amplitude_sd float8, } .xls file is in the attachment. Now as you can see all the bold attributes are specified only once in the .xls files. And all the bold+italics one have to be manipulated a bit before storing. Even event_id in every .xls file starts with 1 but as this is a primary key I have to manipulate this also. I think if I can transform and manipulate each row into insert statements then I can insert the data into my database. Please guide me how to do this. Or if there is any another way of doing this. I am relatively new in this field so, please dont get offended if this problem is quite obvious. OpenOffice.org v2 can interface .xls files with PostgreSQL. -- Ron Johnson, Jr. Jefferson LA USA Is common sense really valid? For example, it is common sense to white-power racists that whites are superior to blacks, and that those with brown skins are mud people. However, that common sense is obviously wrong. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Version/Change Management of functions?
--On July 7, 2006 12:35:53 PM + Roman Neuhauser [EMAIL PROTECTED] wrote: # [EMAIL PROTECTED] / 2006-07-06 22:41:27 -0600: OK, I know this is an odd question, but I'm working on an app that will rely more and more on database-driven functions, and while the app's source is in SVN, and I intend for the source of the SQL scripts to also be there, I was wondering...what are people doing for version control and change management on their custom (esp. PL/pgSQL and, say, PL/Perl) functions? Well, people use a version control system. Do you have any specific questions? Yes: how exactly do you use it? Since there's no way to directly control what's in the DB via a VCS, how, furthermore, do you verify that what is in the DB is also in the VCS, etc.? (I'm intentionally asking a bit of a 'dumb' question because I really want to find out exactly what others are doing.) ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] Version/Change Management of functions?
On Friday 07 July 2006 13:08, Michael Loftis wrote: --On July 7, 2006 12:35:53 PM + Roman Neuhauser [EMAIL PROTECTED] wrote: # [EMAIL PROTECTED] / 2006-07-06 22:41:27 -0600: OK I know this is an odd question but I'm working on an app that will rely more and more on database driven functions, and while the app's source is in SVN, and I intend for the source of the SQL scripts to also be there, I was wondering...what are people doing for version control and change management on their custom (esp PL/pgSQL and say PL/Perl) functions? Well, people use a version control system. Do you have any specific questions? Yes, how exactly do you use it. Since there's no way to directly control whats in the DB via a VCS, further, how do you verify that what is in the DB is also in the VCS, etc? (I'm intentionally asking a bit of a 'dumb' question because I really want to find out exactly what others are doing). I take text based schema dumps using the same file name each time and commit them so that subversion can tell me the differences. Joshua D. Drake ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] How to insert .xls files into database
I guess the solution depends on what is a 'large amount of data'. The most time consuming part is going to be converting the single data elements at the top of each sheet into multiple elements. I would create columns for the data in the sheet. At the same time I would order the columns to match the database schema. Then it would a matter of cut and paste to fill the columns with the data. The event id's could be renumbered using Excel's series generator to create a non repeating set of id's. If the amount of data was very large it might pay to create some macros to do the work. Once the data was filled in you would have a couple of choices. One, as mentioned by Ron would be to use OpenOffice v2 to dump the data into the database. The other would be to save the data as CSV and use the psql \copy command to move the data into the table. On Friday 07 July 2006 09:40 am, Parang Saraf wrote: Hey, I am using Postgresql 8.1.4 on windows. I have a large amount of data stored in .xls files which I want to insert into my database. The columns in .xls files are not exactly compatible with the database schema. For example the event_id in every .xls file starts with 1 while for my database event_id is the primary key. Also, there are some information like event_type, event_location that are particular to every .xls file and thus they have been mentioned only once in the .xls file but in the database there exists a separate column for them. For more clarification I am giving my database schema and attaching a sample .xls file. My database schema is as follows : { event_id int4 NOT NULL, buoy char(1) NOT NULL, deployment varchar(40), depth int4 NOT NULL, event_type varchar(64), model_info_id varchar(256), start_date float8 NOT NULL, start_date_sd float8, end_date float8 NOT NULL, end_date_sd float8, mean float8, variance float8, max float8, min float8, event varchar(20) NOT NULL, depth_type varchar(20) NOT NULL, buoy_location geometry, duration float8, Amplitude_sd float8, } .xls file is in the attachment. Now as you can see all the bold attributes are specified only once in the .xls files. And all the bold+italics one have to be manipulated a bit before storing. Even event_id in every .xls file starts with 1 but as this is a primary key I have to manipulate this also. I think if I can transform and manipulate each row into insert statements then I can insert the data into my database. Please guide me how to do this. Or if there is any another way of doing this. I am relatively new in this field so, please dont get offended if this problem is quite obvious. Thanks Parang Saraf [EMAIL PROTECTED] -- Adrian Klaver [EMAIL PROTECTED] ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
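Once a sheet has been flattened out and saved as CSV, the psql side of this suggestion might look roughly like the following. The table name 'events', the file name and the column list are purely illustrative (the real target table and columns come from the schema in the original post), and the \copy meta-command has to be given on a single line:

\copy events (event_id, buoy, depth, event_type, start_date, end_date, event, depth_type) from 'A1_events.csv' with csv header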
Re: [GENERAL] How to optimize query that concatenates strings?
On Fri, Jul 07, 2006 at 04:29:51AM -0700, badlydrawnbhoy [EMAIL PROTECTED] wrote a message of 48 lines which said: I've got a database of URLs, and when inserting new data into it I want to make sure that there are no functionally equivalent URLs already present. For example, 'umist.ac.uk' is functionally the same as 'umist.ac.uk/'.

IMHO, your problem seems to be an instance of a very general class: data that needs canonicalization. For instance, I work for a domain name registry, and domain names are case-insensitive. Do we store the domain names blindly and then always use ILIKE or other case-insensitive operators? No, we canonicalize domain names by changing them to lower case. That way:
* we do not have to think of using case-insensitive operators,
* indexes do work.
This is what I recommend here: decide on a canonical form and canonicalize everything when it comes into the database (it is easy to do it from a trigger). If the original form is important, you can still store it in a column intended for display, not for searching. Here is a way to canonicalize, with a trigger. The function canonicalize is left as an exercise (you can write it in PL/pgSQL, C, Python, etc.):

CREATE OR REPLACE FUNCTION force_canonicalization() RETURNS TRIGGER AS '
BEGIN
    NEW.url = canonicalize(NEW.url);
    RETURN NEW;
END;' LANGUAGE PLPGSQL;

CREATE TRIGGER force_canonicalization BEFORE INSERT ON Things
    FOR EACH ROW EXECUTE PROCEDURE force_canonicalization();

---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
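To flesh out the "exercise" a little, a minimal canonicalize() for the URL case could be a tiny IMMUTABLE function along the lines below. This sketch only lower-cases the value and strips one trailing slash; a real URL canonicalizer would also have to deal with the scheme, default ports, percent-encoding and so on:

CREATE OR REPLACE FUNCTION canonicalize(input text) RETURNS text AS '
BEGIN
    -- lower-case the whole value and strip a trailing slash;
    -- good enough for host-only URLs such as ''umist.ac.uk/''
    RETURN rtrim(lower(input), ''/'');
END;' LANGUAGE plpgsql IMMUTABLE;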
[GENERAL] Bug? Changing where distinct occurs produces error?
OK I'm either insane or found a bug in 8.1.3 If you execute say: SELECT DISTINCT(ua.user_id),pa.poll_id FROM user_answers ua, poll_answers pa WHERE pa.poll_answer_id = ua.poll_answer_id AND ua.active='Y'; Everything is fine, however if you run SELECT pa.poll_id,DISTINCT(ua.user_id) FROM user_answers ua, poll_answers pa WHERE pa.poll_answer_id = ua.poll_answer_id AND ua.active='Y'; ERROR: syntax error at or near DISTINCT at character 19 LINE 1: SELECT pa.poll_id,DISTINCT(ua.user_id) FROM user_answers ua,... ^ Further if I wrap it in a SUM (with a group by pa.poll_id) I get no error. I'm a little confused as to what is wrong with my second example there? -- Genius might be described as a supreme capacity for getting its possessors into trouble of all kinds. -- Samuel Butler ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Bug? Changing where distinct occurs produces error?
On Fri, 7 Jul 2006, Michael Loftis wrote: OK I'm either insane or found a bug in 8.1.3 If you execute say: SELECT DISTINCT(ua.user_id),pa.poll_id FROM user_answers ua, poll_answers pa WHERE pa.poll_answer_id = ua.poll_answer_id AND ua.active='Y'; Everything is fine, however if you run SELECT pa.poll_id,DISTINCT(ua.user_id) FROM user_answers ua, poll_answers pa WHERE pa.poll_answer_id = ua.poll_answer_id AND ua.active='Y'; This statement is invalid. DISTINCT is a set quantifier and either comes before the select list or as the first thing in a set function specification. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] Bug? Changing where distinct occurs produces error?
SELECT DISTINCT(ua.user_id),pa.poll_id FROM user_answers ua, poll_answers pa WHERE pa.poll_answer_id = ua.poll_answer_id AND ua.active='Y'; Everything is fine, however if you run SELECT pa.poll_id,DISTINCT(ua.user_id) FROM user_answers ua, poll_answers pa WHERE pa.poll_answer_id = ua.poll_answer_id AND ua.active='Y'; ERROR: syntax error at or near DISTINCT at character 19 LINE 1: SELECT pa.poll_id,DISTINCT(ua.user_id) FROM user_answers ua,... ^ Notice: http://www.postgresql.org/docs/8.1/interactive/sql-select.html According to the SELECT syntax, DISTINCT (or DISTINCT ON) must come immediately after SELECT, ahead of the output column list; it cannot be attached to an individual column. So it is designed that way. Regards, Richard Broersma Jr. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
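In other words, the quantifier always leads the whole select list. Using the table and column names from the original query (whether this matches the intent depends on what the query was supposed to return), the two valid forms would be something like:

-- DISTINCT applies to the whole output row and must come first:
SELECT DISTINCT ua.user_id, pa.poll_id
FROM user_answers ua JOIN poll_answers pa ON pa.poll_answer_id = ua.poll_answer_id
WHERE ua.active = 'Y';

-- one row per poll_id, using PostgreSQL's DISTINCT ON extension:
SELECT DISTINCT ON (pa.poll_id) pa.poll_id, ua.user_id
FROM user_answers ua JOIN poll_answers pa ON pa.poll_answer_id = ua.poll_answer_id
WHERE ua.active = 'Y'
ORDER BY pa.poll_id, ua.user_id;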
re: [GENERAL] How to insert .xls files into database
[mailto:[EMAIL PROTECTED] On Behalf Of Adrian Klaver I guess the solution depends on what is a 'large amount of data'. The most time consuming part is going to be converting the single data elements at the top of each sheet into multiple elements. I would create columns for the data in the sheet. At the same time I would order the columns to match the database schema. Then it would a matter of cut and paste to fill the columns with the data. The event id's could be renumbered using Excel's series generator to create a non repeating set of id's. If the amount of data was very large it might pay to create some macros to do the work. Once the data was filled in you would have a couple of choices. One, as mentioned by Ron would be to use OpenOffice v2 to dump the data into the database. The other would be to save the data as CSV and use the psql \copy command to move the data into the table. On Friday 07 July 2006 09:40 am, Parang Saraf wrote: Everything you described is familiar to me, except the OpenOffice v2 dump - could you explain this in more detail, please? I have tried to do it many times, without success. Thank you, Tomas [del] -- Adrian Klaver [EMAIL PROTECTED] ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Bug? Changing where distinct occurs produces error?
--On July 7, 2006 3:22:01 PM -0700 Richard Broersma Jr [EMAIL PROTECTED] wrote: Notice: http://www.postgresql.org/docs/8.1/interactive/sql-select.html According to the SELECT syntax, DISTINCT (or DISTINCT ON) must come immediately after SELECT, ahead of the output column list. So it is designed that way. Ahhh, I see. I kept thinking of DISTINCT as an aggregate function. That's my mistake. Thanks, all! ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] How to insert .xls files into database
One option is to write a Python translator to create CSV files, or even an uploader to go directly from the Excel files to the database. There is at least one module to read Excel files, in all their complexity: http://cheeseshop.python.org/pypi/xlrd/0.5.2 and a number of Postgres modules: http://wiki.python.org/moin/PostgreSQL - John Burger MITRE ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] How to insert .xls files into database
So far I have only got this to work with the Postgres SDBC driver - http://dba.openoffice.org/drivers/postgresql/index.html

1) Open the Data Source (F4) window in the spreadsheet.
2) Make a connection to the database. I usually do this by opening a table. This is fairly important; otherwise, when you do the import, OO will try to establish the connection at the same time as it is importing the data and end up hanging.
3) Select the data in the spreadsheet you want to import. The column headings will become the field names. NOTE: You want the data to be as plain as possible. OpenOffice tends to auto-format the cell contents, e.g. changing 1/2 to a typeset fraction or changing quotes to typographic ("curly") quotes. This introduces format codes that don't translate well.
4) Drag the selected data to the Tables heading of the appropriate data source. At this point a dialog form will pop up.
5) Name the table. You will need to schema-qualify it, e.g. public.test_table. Make sure you select Create a Primary Key. Without it, the table will be created but the data will not be imported. This is why I use the SDBC driver; the JDBC driver does not seem to support this step.
6) Select Next. Here you can choose the columns you wish to import.
7) Select Next. Here you can modify the data types of the columns.
8) Select Create. The table should be created and the data inserted.

On Friday 07 July 2006 03:41 pm, [EMAIL PROTECTED] wrote: [mailto:[EMAIL PROTECTED] On Behalf Of Adrian Klaver I guess the solution depends on what is a 'large amount of data'. The most time consuming part is going to be converting the single data elements at the top of each sheet into multiple elements. I would create columns for the data in the sheet. At the same time I would order the columns to match the database schema. Then it would a matter of cut and paste to fill the columns with the data. The event id's could be renumbered using Excel's series generator to create a non repeating set of id's. If the amount of data was very large it might pay to create some macros to do the work. Once the data was filled in you would have a couple of choices. One, as mentioned by Ron would be to use OpenOffice v2 to dump the data into the database. The other would be to save the data as CSV and use the psql \copy command to move the data into the table. On Friday 07 July 2006 09:40 am, Parang Saraf wrote: Everything you described is familiar to me, except the OpenOffice v2 dump - could you explain this in more detail, please? I have tried to do it many times, without success. Thank you, Tomas [del] -- Adrian Klaver [EMAIL PROTECTED] ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster -- Adrian Klaver [EMAIL PROTECTED] ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] Postmaster is starting but shutting when trying to connect (Windows)
Hello, i have a PostgreSQL (8.1) installation for testing purposes which was running fine for several months now (Windows XP). I was working with it yesterday, and today after booting my computer and restarting the service (I'm starting the service manually, because I don't need the server running all the time) postmaster comes up fine, no entries in the log file. But as soon as I try to connect to the server (does not matter what type of client, psql, PgAdmin or JDBC) I get an error message in the log file. Here is the full log file:

This looks exactly like the issues we've seen with broken antivirus or personal firewall software. Make sure you don't have any such software installed (actually installed, not just enabled), and if you do, try to uninstall it. If you don't, but had some before, check for any remaining pieces. There should be posts in the archives with details about it. Again, note that just disabling things usually isn't enough; you need to actually uninstall. //Magnus ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly