Re: [GENERAL] GiST and Gin technical doc
GiST: http://gist.cs.berkeley.edu/ http://www.sai.msu.su/~megera/postgres/gist/papers/concurrency/ GIN: GIN is an ordinary inverted index, which is described in many documents. Look for some implementation details in my conference presentation: http://www.sigaev.ru/gin/Gin.pdf Yannick Warnier wrote: Hi there, I'm having trouble finding technical documentation about GiST, GIN and TSearch2. I am particularly interested in the internal data structure of a GiST-ed or GIN-ed index, and the availability of searches by proximity. Does anyone know of a good place to find such documentation, other than the source code (I'm not too much into C/C++)? Thank you, Yannick -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
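For illustration only (not part of Teodor's reply; the table, column and query terms are invented), the usual tsearch2-era pattern is a tsvector column indexed with GiST, and the same column can be indexed with GIN once that access method is available (GIN ships in core from 8.2):

  ALTER TABLE docs ADD COLUMN body_fts tsvector;
  UPDATE docs SET body_fts = to_tsvector(body);
  CREATE INDEX docs_fts_gist ON docs USING gist (body_fts);
  -- CREATE INDEX docs_fts_gin ON docs USING gin (body_fts);   -- 8.2 and later
  SELECT * FROM docs WHERE body_fts @@ to_tsquery('proximity & search');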
[GENERAL] PostgreSQL 8.1 - SSL:Required
Hi, I am using the binary install of PostgreSQL 8.1 Win32 on Windows XP. I know how to enable SSL when installing from source (--with-openssl), but I am installing onto Windows XP for the first time, so my question is: 1. How do I install the SSL module via the install wizard? Blessings, Andrew
Re: [GENERAL] Major Performance decrease after some hours
2006/10/2, Tom Lane [EMAIL PROTECTED]: Peter Bauer [EMAIL PROTECTED] writes: Attached you can find the postgresql logfiles and a logfile which contains all the SQL statements executed in the relevant time together with the exceptions thrown. I also attached a file with all used PL/pgSQL functions. Since we were not able to find a problem for so long, I think it makes no sense to filter the information because we are probably not looking at the right spots. I hope the pgsql-general moderators are not going to approve an 8MB message :-(. I'll hang onto my copy though, just in case it proves useful. Sorry, I put the files on http://dagobert.apus.co.at/pglogfiles/ so please use them. It might be worth attaching to a few of them with gdb to get stack traces. Should this be done when the problems are appearing, or during normal operation, or both to compare the results? When the problems are appearing. OK, I will try to reproduce it and post the stack traces. thx, Peter
Re: [GENERAL] PostgreSQL 8.1 - SSL:Required
Hi, I am using the binary install for postgresql 8.1 Win32 on Windows XP. I know how to install SSL if I was installing from source, --with-openssl, but I am installing onto Windows XP for the first time, so my question is: 1. How do I install the SSL module via the install wizard? SSL is built into all the binaries that are distributed. You just need to create and install your certificate, and then change the config files appropriately (same way as you would when installing from source). //Magnus
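For reference, a minimal sketch of the server-side settings Magnus refers to (not part of the original reply; it assumes the default file names server.crt and server.key placed in the data directory):

  # postgresql.conf
  ssl = on

  # pg_hba.conf -- illustrative entry requiring SSL for remote connections
  hostssl  all  all  0.0.0.0/0  md5

Switching ssl on requires a server restart, not just a reload.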
Re: [GENERAL] pg web hosting with tsearch2?
On 2006-09-29 13:45:34, Rick Schumeyer wrote: 3) All the web hosts I am aware of are still offering only pg 7.4. Does anybody offer pg 8.x? I am thinking about a root server for 39 Euro/month on which I can install what I want. Greetings, Michelle Konzack Systemadministrator Tamay Dogan Network Debian GNU/Linux Consultant -- Linux-User #280138 with the Linux Counter, http://counter.li.org/ # Debian GNU/Linux Consultant # Michelle Konzack, Apt. 917, 50, rue de Soultz, 67100 Strasbourg/France ICQ #328449886, MSM LinuxMichi, 0033/6/61925193, IRC #Debian (irc.icq.com)
Re: [GENERAL] Major Performance decrease after some hours
On Sun, Oct 01, 2006 at 12:55:51PM +0200, MaXX wrote: Pure speculation: are you sure you aren't vacuuming too aggressively? The DELETE waiting and SELECT waiting sound to me like they are waiting for a lock that another vacuum is holding. How would one determine the lock situation definitively? Is there an internal mechanism that can be queried?
[GENERAL] How to trace index to table?
I'm trying to build some queries to gather metrics on my PG database. When looking at pg_class and pulling a row that is an index, how do I use SQL to pull the table the index belongs to? Thanks, Chris
Re: [GENERAL] How to trace index to table?
If you look at pg_indexes you will find both the index and the table. - Original Message - From: Chris Hoover To: pgsql-general@postgresql.org Sent: Monday, October 02, 2006 9:19 AM Subject: [GENERAL] How to trace index to table? I'm trying to build some queries to gather metrics on my PG database. When looking at pg_class and pulling a row that is an index, how do I use SQL to pull the table the index belongs to? Thanks, Chris
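Two illustrative queries (not part of the original reply; the index name is an example). The first uses the pg_indexes view, the second joins the catalogs directly, which is closer to the pg_class-based approach the question describes:

  SELECT schemaname, tablename, indexname
  FROM pg_indexes
  WHERE indexname = 'my_index';

  SELECT i.relname AS index_name, t.relname AS table_name
  FROM pg_class i
  JOIN pg_index ix ON ix.indexrelid = i.oid
  JOIN pg_class t  ON t.oid = ix.indrelid
  WHERE i.relkind = 'i';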
Re: [GENERAL] Major Performance decrease after some hours
Ray Stell [EMAIL PROTECTED] writes: How would one determine the lock situation definitively? Is there an internal mechanism that can be queried? pg_locks view. regards, tom lane
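For illustration (not part of Tom's reply), a query along these lines lists lock requests that are currently blocked; the join to pg_class resolves relation OIDs to names:

  SELECT l.pid, c.relname, l.mode, l.granted
  FROM pg_locks l
  LEFT JOIN pg_class c ON c.oid = l.relation
  WHERE NOT l.granted;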
Re: [GENERAL] Normal vs Surrogate Primary Keys...
Stephan Szabo wrote: On Sun, 1 Oct 2006, rlee0001 wrote: I know, for example, that by default PostgreSQL assigns every record a small unique identifier called an OID. It seems reasonable then, that when the DBA creates a cascading foreign key to a record, that the DBMS could, instead of storing the record's entire natural key, store only a reference to the OID and abstract/hide this behavior from the environment just as PostgreSQL does with its OID feature now. Of course, this would require that the OID be guaranteed unique, which I don't beleave is the case in the current versions. This would completely eliminate concerns related to the performance of cascading updates because no actual cascade would take place, but rather the update would affect all referencing records implicitly via the abstraction. Well, that alone isn't enough I think. MATCH SIMPLE allows you to pass the constraint for a row if any of the columns in a multi-column foreign key are NULL, so there isn't always a matching row, but there's also meaningful information in the column values. MATCH PARTIAL (which we admittedly don't support yet) allows you to have a valid key if the non-NULL portions of the multi-column foreign key match to one or more rows in the referenced table, so there may be more than one matching row. The all NULL case is pretty easy to handle in general. In addition, AFAICT for cascades you would potentially be trading the cost at cascade time with the cost at select time, so that would itself not always be a win. Also, I don't see how you get away with not needing two indexes on the referenced table to do this well unless you're storing something like a ctid which has its own problems with updates. I think there may be some better options than what we've got, but there's tradeoffs as well. LOL, I hadn't even considered that. In my proposed solution, every select against a table containing a foreign key (which selects at least part of the foreign key) would require an implicit join to retrieve the actual foreign key values. Additionally, selects with explicit joins would need to be converted, which would require the DBMS to execute several additional queries (to retrieve the internal surrogate key for the given natural key values) before executing the original join query. Oh well, can't kill a guy for trying. Thanks for pointing out what I was too blind to see. :o) -Robert ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[GENERAL] pg_dump design problem (bug??)
Postgresql 8.1.3 Hi, I'm wondering if there's a problem with pg_dump --create, or if I'm just missing something. It does not seem to restore things like: ALTER DATABASE foo SET DateStyle TO European; Shouldn't the database that is re-created be like the database that is being dumped? For our purposes we do a pg_dumpall --globals-only and then pg_dumps of each of our databases. We like this because we can restore blobs this way, get a nice compressed and flexible --format=c, and are able to restore individual databases. But there is clearly a problem because we lose the database meta-information like datestyle, timezones and all that other per-db SET stuff. It seems the only way to get that is with a pg_dumpall, and then it's not per-database. What should we do to work around this problem? Should pg_dump/pg_restore have arguments like:

  --dbmeta (the default when --create is used): dumps the database's SET options.
  --no-dbmeta (the default when --create is not used): does not dump the database's SET options.
  --dbowner (the default when --create is used): dumps the database's owner.
  --no-dbowner (the default when --create is not used): does not dump the database's owner.

Hummm, for complete control consider the following. Or maybe pg_dump/pg_restore should augment/replace --data-only, --schema-only and --create with --content=ctype[, ...] where ctype is one of:

  db          - create the database
  dbowner     - set the database owner as in the dumped db
  meta        - set the database SETs as in the dumped db
  schema      - create the schema (not data definitions/table structure) as in the dumped db
  schemaowner - set the schema owner as in the dumped db
  table       - create the table(s) as in the dumped db
  tableowner  - set the table owners as in the dumped db
  data        - load the data as in the dumped db

I'd also want to add functions, triggers, views and the other sorts of things that go into databases to the above list, but that's enough for now. Thanks for listening. Karl [EMAIL PROTECTED] Free Software: You don't pay back, you pay forward. -- Robert A. Heinlein
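As an illustration (not part of Karl's mail), the per-database SET options that get lost can at least be inspected and reapplied by hand; on 8.1 they are stored in pg_database.datconfig:

  SELECT datname, datconfig FROM pg_database;
  ALTER DATABASE foo SET DateStyle TO 'European';   -- reapply after restore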
[GENERAL] using float4, I get incorrect value when select
Hello, I have a table with a float4 column. When I insert a new row and, for example, use the value 5000542,5 in the column of type float4, the insert works fine, but when I select (select myColumn from myTable) I get the value 5000540 and I don't know why. But if I cast to float8 (select cast(myColumn as float8) from myTable) I get the correct value (5000542,5). I don't know if I am doing something wrong. Please help me. Thanks. P.D.: I am using postgres 8.1 for Windows. Anibal
Re: [GENERAL] Normal vs Surrogate Primary Keys...
Tom Lane wrote: rlee0001 [EMAIL PROTECTED] writes: ... I know, for example, that by default PostgreSQL assigns every record a small unique identifier called an OID. Well, actually, that hasn't been the default for some time, and even if you turn it on it's not guaranteed unique without additional steps, and seeing that it's only 32 bits wide it's not usable as a unique ID for very large tables. unique and small are more or less contradictory in this context. ... it seems like this is the sort of thing that even a fully SQL-compliant DBMS could do internally to compensate for the performance issues with using large natural keys in relationships. What performance issues are those, exactly? I have seen no data that proves that large key size is a bottleneck for reasonable schema designs in Postgres. Maybe that just means that we have more fundamental problems to fix :-( ... but there's no point in investing a lot of effort on an aspect that isn't the next bottleneck. Well, from what I understand, there are basically two reasons to use surrogate primary keys: 1) no reliable natural candidate key exists, or 2) the natural candidate keys are perceived to be far too large/complex to use as the primary key. I have yet to hear anyone recommend the use of surrogate keys in modern databases for any other reason. Obviously, that some entities in practice have no reliable natural keys cannot be helped, and in those cases a surrogate key pretty much has to be used, unless the data modeler would find it acceptable to use every attribute in the entity as part of the primary key (in order to ensure uniqueness of each record overall). The second argument for the use of surrogate keys is simply that reliable natural candidate keys are often perceived to be too large to use as primary keys. The perception seems to be that large primary keys consume a considerable amount of storage space when used in foreign keys. For example, if I key employee by Last Name, First Name, Date of Hire and Department, I would need to store copies of all this data in any entity that relates to an employee (e.g. payroll, benefits and so on). In addition, if any of these fields change in value, that update would need to cascade to any related entities, which might be perceived as a performance issue if there are many related records. I'm not saying that PostgreSQL specifically has performance problems, but that using large natural keys can hamper performance (both in terms of storage space required and cascading update time) when a lot of relationships exist between entities with such large keys. Personally I hate using surrogate keys except in places where no reliable natural key exists, but find it necessary in order to improve the efficiency of foreign keys in the database. So my proposal was simply to have the DBMS internally create and use an invisible surrogate key to identify and relate records, but expose the natural key to the environment. The current OID implementation cannot be used for this, as you've already stated, but I think a similar implementation could work. I'm just wondering if anyone else would take advantage of the performance benefit, or perhaps sees a flaw in my logic. regards, tom lane
[GENERAL] Postgres backup
Guys, I have a Java web app with a postgres backend. Now I want to implement a database backup functionality from within my web app. I have got this so far:

  String[] args = {"\"C:\\Program Files\\PostgreSQL\\8.1\\bin\\pg_dump\"", "-i", "-h", "localhost", "-p", "5432", "-U", "postgres", "-F", "t", "-b", "-v", "-f", "\"C:\\Test.backup\"", "TESTDB"};
  String result = "";
  if (args.length > 0) {
      result = args[0]; // start with the first element
      for (int i = 1; i < args.length; i++) {
          result = result + args[i];
      }
  }
  Process p = Runtime.getRuntime().exec(result);

Now on executing this I am getting a file called Test.backup, but it's 0KB, completely empty. Can somebody tell me what I am doing wrong and how I can fix it? Somebody also told me that this is not the best way of doing this, and I was wondering if there is a better way of implementing this. Thanks.
[GENERAL]
Hi, I am using PostgreSQL 8.0. I am unable to do any operation with the database, like vacuumdb or dumping the database. The PostgreSQL log shows that it can't find the file /var/lib/pgsql/data/pg_clog/ . The database starts up and some tables can be queried. Any help is welcome. Dirk
[GENERAL] Advantages of postgresql
Hello everybody, So far I have only been working with MySQL. Today I was talking to a friend and he was suggesting I migrated to postgreSQL, as it is way better My question is why? I mean could someone pls tell me some advantages and disadvantages of working with postgresql? Thanks in advance, Iulian!
[GENERAL] FlushRelationBuffers Error
Last night during a vacuum full, one of our production 7.4.12 databases threw this warning:

  WARNING: FlushRelationBuffers(idxtype26, 6137): block 5752 is referenced (private 0, global 1)
  vacuumdb: vacuuming of database cvdb failed: ERROR: FlushRelationBuffers returned -2
  PANIC: cannot abort transaction 35650725, it was already committed
  server closed the connection unexpectedly
  This probably means the server terminated abnormally before or while processing the request.

We do use some external C functions that deal with creating indexes and locking tables, but we're pretty confident that they're locking correctly. Another bit of information: the database gets restarted before the vacuum process begins. Is this something that will resolve itself the next time the database gets vacuumed, or does it need some kind of immediate action? Does the index need to be reindexed? Thanks
Re: [GENERAL] Advantages of postgresql
Iulian Manea [EMAIL PROTECTED] schrieb: Hello everybody, So far I have only been working with MySQL. Today I was talking to a friend and he was suggesting I migrated to postgreSQL, as it is way better My question is why? I mean could someone pls tell me some advantages and disadvantages of working with postgresql? Read http://sql-info.de/mysql/gotchas.html versus http://sql-info.de/postgresql/postgres-gotchas.html HTH, Andreas -- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) If I was god, I would recompile penguin with --enable-fly.(unknow) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889° ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Dynamic access to record columns (in PL/pgsql)
Tom Lane wrote: Daniele Mazzini [EMAIL PROTECTED] writes: I have a trigger procedure which I want to be called after insert or update on different tables. In this procedure, I need to find the values of key fields for the affected table in the NEW record, but I can't find a way to access a field the name of which I only know at execution time. There is no way to do that in plpgsql. You could do it in the other PLs (eg plperl, pltcl) since they are not as strongly typed as plpgsql. regards, tom lane Well, I wouldn't go that far. A little more information might point to a solution here. Given any one table that this trigger might fire on, are the fields that you need to check defined? I.e., if the trigger fires on table A, will you always be looking at column A.b and on table B column B.a? If so, when you create the trigger, pass in some kind of variable that the trigger can use to know what field it's looking for. If that doesn't make enough sense, let me know and I can give you a more fleshed out example of what I'm talking about. -- erik jones [EMAIL PROTECTED] software development emma(r) ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[GENERAL] Weird disk write load caused by PostgreSQL?
I have a production PostgreSQL instance (8.1 on Linux 2.6.15) that seems to be writing data to disk at rates that I think are disproportional to the update load imposed on the database. I am looking for ways to determine the cause of this I/O. As an example, here is a typical graph produced by Munin: http://purefiction.net/paste/pg-iostat.png Running an hourly iostat produces this output:

  Device:  tps    MB_read/s  MB_wrtn/s  MB_read  MB_wrtn
  sda      43.50  0.21       0.04       427915    72736
  sda      43.62  0.21       0.04       428183    88904
  sda      43.74  0.21       0.05       428440   104877
  sda      43.90  0.21       0.06       428808   124681
  sda      44.06  0.21       0.07       429111   145447
  sda      44.27  0.21       0.08       429532   170317
  sda      44.46  0.21       0.09       429985   193594

In other words, it's reading about 400MB/hour and writing around 15-20GB/hour, or exactly 118GB during the last six hours. To determine how well this correlates to the actual inserts and updates being performed on the database, I ran a loop alongside iostat that executed select sum(n_tup_upd), sum(n_tup_ins) from pg_stat_all_tables against PostgreSQL every hour and output the difference. Here are a few samples:

  | delta_upd | delta_ins |
  +-----------+-----------+
  |      7111 |      2343 |
  |      7956 |      2302 |
  |      7876 |      2181 |
  |      9269 |      2477 |
  |      8553 |      2205 |

For the write numbers to match the tuple numbers, each updated/inserted tuple would have to average at least 1.5MB (15 GB divided by 10,000 tuples), which is not the case; the total size of the raw tuples updated/inserted during the above session probably does not exceed a couple of megabytes. Even considering overhead, page size, MVCC, etc., this does not compute. I have not narrowed this explicitly down to PostgreSQL, but since the stock Linux kernel we are running does not offer per-process I/O statistics, I cannot determine this for sure. However, except for the PostgreSQL database, everything else on the box should be identical to what we are running on other boxes, which are not exhibiting the same kind of load. Oh, and I have PostgreSQL logging turned off. Note that PostgreSQL's performance in itself seems fine, and according to top/ps it's only very rarely in iowait. Alexander.
[GENERAL] test
Mail from this ML doesn't seem to arrive at our office anymore (you haven't been silent for 4 days, have you?). Hence a small test. Sorry for the inconvenience. -- Alban Hertroys [EMAIL PROTECTED] magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World // ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] Weird disk write load caused by PostgreSQL?
Alexander Staubo [EMAIL PROTECTED] writes: I have a production PostgreSQL instance (8.1 on Linux 2.6.15) that seems to be writing data to disk at rates that I think are disproportional to the update load imposed on the database. I am looking for ways to determine the cause of this I/O. Are you sure that iostat is to be trusted? The read numbers in particular look suspiciously uniform ... it would be a strange query load that would create a read demand changing less than 1% from hour to hour, unless perhaps that represented the disk's saturation point, which is not the case if you're not seeing obvious performance problems. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Normal vs Surrogate Primary Keys...
On Sun, Oct 01, 2006 at 07:48:14PM -0700, rlee0001 wrote: snip For example, if I key employee by Last Name, First Name, Date of Hire and Department, I would need to store copies of all this data in any entity that relates to an employee (e.g. payroll, benefits and so on). In addition, if any of these fields change in value, that update would need to cascade to any related entities, which might be perceived as a performance issue if there are many related records. Err, those fields don't make a natural key since they have no guarantee of uniqueness. You've simply decided that the chance of collision is low enough that you don't care, but for me that's not really good enough for use as a key. Secondly, three of the four fields you suggest are subject to change, so that indeed makes them a bad choice. My definition of key includes unchanged for the lifetime of the tuple. In that situation your idea may work well, but that's just a surrogate key in disguise... Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ From each according to his ability. To each according to his ability to litigate.
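A contrived sketch of the two designs being debated (all table and column names invented for illustration, not taken from the thread):

  -- natural key: every referencing table must carry all four columns,
  -- and a change to any of them has to cascade
  CREATE TABLE employee_nat (
      last_name  text,
      first_name text,
      hire_date  date,
      department text,
      PRIMARY KEY (last_name, first_name, hire_date, department)
  );

  -- surrogate key: references carry a single integer; the natural
  -- attributes stay unique but can change without touching referencing rows
  CREATE TABLE employee_sur (
      employee_id serial PRIMARY KEY,
      last_name   text NOT NULL,
      first_name  text NOT NULL,
      hire_date   date NOT NULL,
      department  text NOT NULL,
      UNIQUE (last_name, first_name, hire_date, department)
  );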
Re: [GENERAL] using float4, I get incorrect value when select
On Sun, Oct 01, 2006 at 10:47:27PM -0400, Anibal David Acosta F. wrote: Hello, I have a table, with a float4 column, when I insert a new row and for example I use this value 5000542,5 in the column of type float4, the insert do OK, but when I select (select myColumn from myTable) I get the value 5000540 I don't know why. But if I cast to float8 (select cast(myColumn as float8) from myTable) I get the correct value (5000542,5). Do you understand the concept of floating point and the difference between float4 and float8? Floating point throws away digits that it doesn't have room to store, and with float4 that occurs after about 6 digits. If you want to remember exact numbers, use numeric. Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ From each according to his ability. To each according to his ability to litigate.
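A quick illustration of Martijn's point (not from the original reply; the table and column names are invented):

  CREATE TABLE float_demo (f4 float4, f8 float8, n numeric);
  INSERT INTO float_demo VALUES (5000542.5, 5000542.5, 5000542.5);
  SELECT * FROM float_demo;
  -- f4 is displayed with only about six significant digits (5.00054e+06,
  -- i.e. 5000540), while f8 and numeric come back as 5000542.5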
Re: [GENERAL] Postgres backup
On Sun, Oct 01, 2006 at 09:20:12PM -0700, pd wrote: guys, i have a java web app with a postgres backend. now i want to implement a db backup functionality from within my web app. now i have got this so far, String[] args = {"\"C:\\Program Files\\PostgreSQL\\8.1\\bin\\pg_dump\"", "-i", "-h", "localhost", "-p", "5432", "-U", "postgres", "-F", "t", "-b", "-v", "-f", "\"C:\\Test.backup\"", "TESTDB"}; Do you really have spaces in those strings? Because that's not going to work. Secondly, work out where your STDERR is going, because any error messages from pg_dump will have gone there... Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ From each according to his ability. To each according to his ability to litigate.
Re: [GENERAL] String handling function, substring vs. substr
Hi all, brian [EMAIL PROTECTED] writes: [EMAIL PROTECTED] wrote: Does this mean that substr calls substring internally?? Or is it the other way around?? Or are they independent of each other?? Looks like they're pretty evenly matched. Actually, a bit of poking into the contents of pg_proc will show you that they are both aliases for the same C function (text_substr_no_len). So they should be *exactly* the same speed. regards, tom lane Thanks Brian for your answer (although that wasn't what I was looking for, I was looking for Tom's answer). Thanks Tom for your answer. Archie ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Weird disk write load caused by PostgreSQL?
On Oct 2, 2006, at 17:50 , Tom Lane wrote: Alexander Staubo [EMAIL PROTECTED] writes: I have a production PostgreSQL instance (8.1 on Linux 2.6.15) that seems to be writing data to disk at rates that I think are disproportional to the update load imposed on the database. I am looking for ways to determine the cause of this I/O. Are you sure that iostat is to be trusted? No. :) But iostat reads directly from /dev/diskstats, which should be reliable. Of course, it still doesn't say anything about which process is doing the writing; for that I would need to install the atop kernel patches or similar. ... The read numbers in particular look suspiciously uniform ... it would be a strange query load that would create a read demand changing less than 1% from hour to hour, unless perhaps that represented the disk's saturation point, which is not the case if you're not seeing obvious performance problems. They are not uniform at all -- they correlate perfectly with the web traffic; it just so happens that the samples I quoted were from peak hours. Take a look at the Munin graph. (The spikes correspond to scheduled maintenance tasks like backups.) Alexander. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Advantages of postgresql
I've recently done the same thing. Basically, it boils down to philosophy. MySQL's primary goal is speed. Speed over features, and even speed over data integrity. PostgreSQL's (and most RDBMS systems') primary goal is to present the complete relational model and maintain ACID compliance. If you're using MySQL 4.x or earlier, you've got a terrible DBMS. Prior to MySQL 5, non-integer math was always inaccurate. There was no precise datatype. Additionally, MySQL 4 lacked a number of features like views, triggers, and stored procedures. MySQL 5 adds these features. Even then, however, the default engine for MySQL, MyISAM, is *not* a transactional engine, so updates are not atomic. MyISAM also doesn't support foreign key constraints, which, if your schema is even remotely complex, is nightmarish. You must use the InnoDB engine in MySQL to get the benefits of transactions. Essentially, it boils down to this: 1. If you have a very simple database of one or two unrelated tables for a shopping cart or a guest book, MySQL is fine. (But so is SQLite.) 2. If all you care about is speed and aren't terribly concerned if some of your records break or don't commit, MySQL is also fine. This is why some sites (Slashdot, Digg) use MySQL databases. It's no big deal if one of the forums loses some random guy's anti-MS rant. 3. If you plan to do all your data checking in your control code and not enforce referential integrity, MySQL is fine. This method is generally considered poor design, however. Part of the problem many DBAs have with MySQL is that the primary developer is a bit strange. Early versions of the MySQL documentation called foreign keys "tools for weak developers", and said that ACID compliance could be emulated in your application code so it wasn't necessary in the database. It should be pointed out that no credible DBA (and, I should hope, no credible app devs) would agree with these statements. Essentially, instead of properly citing limitations of the database, early MySQL docs simply said not only that every other DBMS in the world had it wrong, but that the relational model itself was essentially not useful. To DBAs, MySQL came to be seen as one step above the MBA who makes his department use a central Excel spreadsheet as a database. This reputation continues to stick with MySQL in spite of the strides it has made with MySQL 5. Another huge problem with MySQL is that it silently truncates data. If you have a DECIMAL(5) field and try to INSERT a value that is too large, instead of throwing an error MySQL silently inserts the maximum value that fits. That's just bad. An RDBMS should do exactly everything you tell it and complain *loudly* when it can't. If you're used to MySQL, the problems with PostgreSQL are basically that it's not quite as friendly as MySQL. The command line for Postgre, psql, is less user-friendly. The Windows GUI app, pgAdmin III, is also less user-friendly. Additionally, the default install for PostgreSQL on nearly every Linux system I've seen is configured to run at minimal requirements. So you'll have to edit the configuration file in order to get the database to perform correctly. Also, since PostgreSQL has so many more features than MySQL, it can be a bit daunting to get started. It's like you've worked with Notepad for years and years, and now you're starting to use Word or EMACS.
-- Brandon Aiken CS/IT Systems Engineer From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Iulian Manea Sent: Monday, October 02, 2006 3:38 AM To: pgsql-general@postgresql.org Subject: [GENERAL] Advantages of postgresql Hello everybody, So far I have only been working with MySQL. Today I was talking to a friend and he was suggesting I migrated to postgreSQL, as it is way better My question is why? I mean could someone pls tell me some advantages and disadvantages of working with postgresql? Thanks in advance, Iulian!
Re: [GENERAL] Weird disk write load caused by PostgreSQL?
I'm surprised that I have to reply to myself, since in hindsight this should be bloody obvious: It's the pgsql_tmp directory. I just monitored the file creation in that directory, and found PostgreSQL to be creating huge temporary, extremely short-lived files ranging from 1MB to 20MB in size. I increased work_mem to 25MB -- should be perfectly safe on a box with 4GB of RAM, I hope - and the iostat traffic seems to have dropped to near zero. I'm a bit baffled as to why PostgreSQL would ever be sorting 20MB of data in the first place, even with ~12 connections running queries concurrently, but I suppose I will have to look more closely at our query patterns. Alexander. On Oct 2, 2006, at 19:25 , Alexander Staubo wrote: On Oct 2, 2006, at 17:50 , Tom Lane wrote: Alexander Staubo [EMAIL PROTECTED] writes: I have a production PostgreSQL instance (8.1 on Linux 2.6.15) that seems to be writing data to disk at rates that I think are disproportional to the update load imposed on the database. I am looking for ways to determine the cause of this I/O. Are you sure that iostat is to be trusted? No. :) But iostat reads directly from /dev/diskstats, which should be reliable. Of course, it still doesn't say anything about which process is doing the writing; for that I would need to install the atop kernel patches or similar. ... The read numbers in particular look suspiciously uniform ... it would be a strange query load that would create a read demand changing less than 1% from hour to hour, unless perhaps that represented the disk's saturation point, which is not the case if you're not seeing obvious performance problems. They are not uniform at all -- they correlate perfectly with the web traffic; it just so happens that the samples I quoted were from peak hours. Take a look at the Munin graph. (The spikes correspond to scheduled maintenance tasks like backups.) Alexander. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match ---(end of broadcast)--- TIP 6: explain analyze is your friend
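For what it's worth, an illustrative way to experiment with this per session rather than editing postgresql.conf (on 8.1 the setting is an integer number of kilobytes; the '25MB'-style unit suffixes came later):

  SHOW work_mem;
  SET work_mem = 25600;   -- roughly 25MB, for this session only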
[GENERAL] Performance and Generic Config after install
As an aside to the [GENERAL] Advantages of PostgreSQL thread going on today, I have wondered why the initial on install config of PostgreSQL is (according to most posts) very conservative. I can see how this would be a plus when people may be getting PostgreSQL as part of an OS in the Linux world who may never/rarely use it. I know that in reality tuning and sizing all of the parameters is a very database specific thing, but it would seem that if some default choices would be available it would give those testing/evaluation and trying to get started a shot at quicker progress. Obviously they would still need to tune to your own application. Some dreadfully named, possibly pointless options? Unoptimized / low performance - Low load on Server (The current out of the box) Production Non Dedicated - PostgreSQL is one of the apps sharing server but is important. Production Dedicated Server - The only purpose of this box is to run PostgreSQL Perhaps this has already been suggested and or shot down? Oisin smime.p7s Description: S/MIME Cryptographic Signature
Re: [GENERAL] Performance and Generic Config after install
On Mon, Oct 02, 2006 at 02:40:03PM -0400, Oisin Glynn wrote: As an aside to the [GENERAL] Advantages of PostgreSQL thread going on today, I have wondered why the initial on install config of PostgreSQL is (according to most posts) very conservative. I can see how this Actually, that's a bit of a hangover in much the way the reputation of MySQL as having no support for ACID is: the last couple of releases of Postgres try to make at least some effort at estimating sane but safe basic configuration for the system when it's installed. That said, there is something of a problem in auto-configuring a cost-based planner and optimiser: some of the tuning is likely to be extremely sensitive to other things you're doing on the box, which means that you need to do a good, careful job for optimal performance. I often hear people complaining about this feature of Postgres in comparison to MySQL. But it's not a reasonable comparison, because MySQL basically uses a rule-based optimiser. And systems like DB2 and Oracle, that use a cost-based optimiser, are often far from perfect after a fresh install, too. A -- Andrew Sullivan | [EMAIL PROTECTED] The plural of anecdote is not data. --Roger Brinner ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Performance and Generic Config after install
On Mon, 2006-10-02 at 14:40 -0400, Oisin Glynn wrote: As an aside to the [GENERAL] Advantages of PostgreSQL thread going on today, I have wondered why the initial on install config of PostgreSQL is (according to most posts) very conservative. I can see how this would be a plus when people may be getting PostgreSQL as part of an OS in the Linux world who may never/rarely use it. I know that in reality tuning and sizing all of the parameters is a very database specific thing, but it would seem that if some default choices would be available it would give those testing/evaluation and trying to get started a shot at quicker progress. Obviously they would still need to tune to your own application. Some dreadfully named, possibly pointless options? Unoptimized / low performance - Low load on Server (The current out of the box) Keep in mind that PostgreSQL doesn't really restrict itself as a whole. If you set the settings too low, and throw costly queries at it, the load on the server will be very high. We don't want to imply that PostgreSQL's settings restrict it's cpu, memory, or disk usage as a whole. Production Non Dedicated - PostgreSQL is one of the apps sharing server but is important. Production Dedicated Server - The only purpose of this box is to run PostgreSQL In my opinion, this is more the job of distributions packaging it. Distributions have these advantages when they are choosing the settings: (1) They have more information about the target computer (2) They have more information about the intended use of the system as a whole (3) They have more ability to ask questions of the user PostgreSQL itself can't easily do those things in a portable way. If someone is compiling from source, it is more reasonable to expect them to know what settings to use. However, that said, I think that distributions certainly do take a cue from the default settings in the source distribution. That's why lately the default settings have been growing more aggressive with each release. Regards, Jeff Davis ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Postgres backup
On 10/3/06, Martijn van Oosterhout kleptog@svana.org wrote: On Sun, Oct 01, 2006 at 09:20:12PM -0700, pd wrote: guys, i have a java web app with a postgres backend. now i want to implement a db database functionality from within my web app. now i have got this so far, String[] args = {\C:\\Program Files\\PostgreSQL\\8.1\\bin\\pg_dump\, -i, -h, localhost, -p, 5432, -U, postgres, -F, t, -b, -v, -f, \C:\\Test.backup\, TESTDB}; Secondly, work out where your STDERR is going, because any error messages from pg_dump will have gone there... For some useful info on draining STDERR (and seeing error codes etc) from with java see http://www.javaworld.com/javaworld/jw-12-2000/jw-1229-traps.html -Damian ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[GENERAL] Determining size of a database before dumping
Hi all, I am (re)writing a backup program and I want to add a section for backing up pSQL DBs. In the planning steps (making sure a given destination has enough space) I try to calculate how much space will be needed by a 'pg_dump' run *before* actually dumping it. Is there a relatively easy way to do that? Moreover, is it possible to do this from an unprivileged account? If not, is there a way to add the permissions to a specific pg user to allow that user to perform this? Thanks in advance! Madison
Re: [GENERAL] Determining size of a database before dumping
Madison Kelly wrote: Hi all, I am (re)writing a backup program and I want to add a section for backing up pSQL DBs. In the planning steps (making sure a given destination has enough space) I try to calculate how much space will be needed by a 'pg_dump' run *before* actually dumping it. I suppose: pg_dump $PGD_OPTIONS | wc -c isn't efficient enough, right? Without knowing the options you plan to use with pg_dump (compression? dump just tables?, etc.) this is going to be hard to get a decent estimate from... -- Steve Wampler -- [EMAIL PROTECTED] The gods that smiled on your birth are now laughing out loud. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Determining size of a database before dumping
On Oct 2, 2006, at 22:17 , Madison Kelly wrote: I am (re)writing a backup program and I want to add a section for backing up pSQL DBs. In the planning steps (making sure a given destination has enough space) I try to calculate how much space will be needed by a 'pg_dump' run *before* actually dumping it. Is there a relatively easy way to do that? Moreover, is it possible to do this from an unprivileged account? If not, is there a way to add the permissions to a specific pg user to allow that user to perform this? You could dump the database to /dev/null, piping it through wc to catch the size, but that would of course be wasteful. You could count the disk space usage of the actual stored tuples, though this will necessarily be inexact: http://www.postgresql.org/docs/8.1/static/diskusage.html Or you could count the size of the physical database files (/var/lib/postgresql or wherever). While these would be estimates, you could at least guarantee that the dump would not *exceed* the estimate. Keep in mind that pg_dump can compress the dump and (iirc) will do so by default when you use the custom format (-Fc or --format=c). Alexander.
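As an illustration of the catalog-based estimate (not part of the original reply; the table name is an example), 8.1 ships the helper functions the linked page describes:

  SELECT pg_size_pretty(pg_database_size(current_database()));
  SELECT pg_size_pretty(pg_relation_size('my_table'));

Note these report on-disk size, not the size of the text dump pg_dump would produce.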
Re: [GENERAL] Determining size of a database before dumping
Steve Wampler wrote: Madison Kelly wrote: Hi all, I am (re)writing a backup program and I want to add a section for backing up pSQL DBs. In the planning steps (making sure a given destination has enough space) I try to calculate how much space will be needed by a 'pg_dump' run *before* actually dumping it. I suppose: pg_dump $PGD_OPTIONS | wc -c isn't efficient enough, right? Without knowing the options you plan to use with pg_dump (compression? dump just tables?, etc.) this is going to be hard to get a decent estimate from... For now, let's assume I am doing a raw dump (no compression) and no fancy switches. I would probably err on the side of caution and try dumping OIDs and all schema (plus whatever else is needed to ensure a full restore to a clean DB). I could try piping the dump into something like 'wc', but with very large DBs I'd be worried about the (tremendous) disk I/O that would cause. This is also why I am hoping Pg keeps this info somewhere. Madison
Re: [GENERAL] Performance and Generic Config after install
I think the problem would be partly mitigated by better or more obvious documentation that makes it clear a) that PostgreSQL is probably not configured optimally, and b) where exactly to go to get server optimization information. Even basic docs on postgresql.conf seem lacking. The fact that something like these exist: http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html http://www.varlena.com/GeneralBits/Tidbits/perf.html#conf should be a giant red flag that documentation is a tad sparse for the config file. Those docs would take hours of work, they're the only thing I've really found, *and they're still 3 years out of date*. It took me a lot of digging to find the docs on optimizing PostgreSQL from postgresql.org. It's in the documentation area, but it's not documented in the manual at all that I could find (which is highly counter-intuitive). Instead, it's listed under 'Technical Documentation' (making the manual casual documentation? I thought all Postgre docs were technical.) then under 'Community Guides and Docs', and finally under the subheading Optimizing (note that the above links are listed here): http://www.postgresql.org/docs/techdocs.2 Either the server installer or (preferably) the manual needs to make the existence of this documentation very clear. If nobody can find it nobody will use it, and it's very well hidden at the moment. The manual gets updated with every release, but more and more I realize that the manual isn't comprehensive. The manual explains the SQL syntax and how PostgreSQL interprets the relational model, but it has very little information on how to really *use* PostgreSQL as a server. The manual is all app dev and no sysadmin. For example, *what* compile time options are available? I know they exist, but I've never seen them listed. For another example, take a look at this so-called detailed guide to installing PG on Fedora, which is linked from the 'Technical Documentation' area of postgresql.org: http://dancameron.org/pages/how-to-install-and-setup-postgresql-for-fedoralinux/ Now, really, this 'guide' is little more than what yum command to run and which config lines to edit to limit remote TCP connections. Now take a look at the first comment: Thanks for the advice. For an Oracle DBA this really helped me in coming up to speed on Postgres administration. There should be an Administration Guide companion to the Manual. -- Brandon Aiken CS/IT Systems Engineer -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jeff Davis Sent: Monday, October 02, 2006 2:58 PM To: Oisin Glynn Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Performance and Generic Config after install On Mon, 2006-10-02 at 14:40 -0400, Oisin Glynn wrote: As an aside to the [GENERAL] Advantages of PostgreSQL thread going on today, I have wondered why the initial on install config of PostgreSQL is (according to most posts) very conservative. I can see how this would be a plus when people may be getting PostgreSQL as part of an OS in the Linux world who may never/rarely use it. I know that in reality tuning and sizing all of the parameters is a very database specific thing, but it would seem that if some default choices would be available it would give those testing/evaluation and trying to get started a shot at quicker progress. Obviously they would still need to tune to your own application. Some dreadfully named, possibly pointless options?
Unoptimized / low performance - Low load on Server (The current out of the box) Keep in mind that PostgreSQL doesn't really restrict itself as a whole. If you set the settings too low, and throw costly queries at it, the load on the server will be very high. We don't want to imply that PostgreSQL's settings restrict it's cpu, memory, or disk usage as a whole. Production Non Dedicated - PostgreSQL is one of the apps sharing server but is important. Production Dedicated Server - The only purpose of this box is to run PostgreSQL In my opinion, this is more the job of distributions packaging it. Distributions have these advantages when they are choosing the settings: (1) They have more information about the target computer (2) They have more information about the intended use of the system as a whole (3) They have more ability to ask questions of the user PostgreSQL itself can't easily do those things in a portable way. If someone is compiling from source, it is more reasonable to expect them to know what settings to use. However, that said, I think that distributions certainly do take a cue from the default settings in the source distribution. That's why lately the default settings have been growing more aggressive with each release. Regards, Jeff Davis ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster ---(end of
Re: [GENERAL] Performance and Generic Config after install
Patches welcome. :) BTW, -docs or -www might be a better place to discuss this. On Mon, Oct 02, 2006 at 05:11:20PM -0400, Brandon Aiken wrote: I think the problem would be partly mitigated be better or more obvious documentation that makes it clear that a) PostgreSQL is probably not configured optimally, and b) where exactly to go to get server optimization information. Even basic docs on postgresql.conf seem lacking. The fact that something like these exist: http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html http://www.varlena.com/GeneralBits/Tidbits/perf.html#conf Should be a giant red flag that documentation is a tad sparse for the config file. Those docs would take hours of work, they're the only thing I've really found, *and they're still 3 years out of date*. It took me a lot of digging to find the docs on optimizing PostgreSQL from postgresql.org. It's in the documentation area, but it's not documented in the manual at all that I could find (which is highly counter-intuitive). Instead, it's listed under 'Technical Documentation' (making the manual casual documentation? I thought all Postgre docs were technical.) then under 'Community Guides and Docs', and finally under the subheading Optimizing (note that the above links are listed here): http://www.postgresql.org/docs/techdocs.2 Either the server installer or the (preferably) the manual needs to make it very clear about this documentation. If nobody can find it nobody will use it, and it's very well hidden at the moment. The manual gets updated with every release, but more and more I realize that the manual isn't comprehensive. The manual explains the SQL syntax and how PostgreSQL interprets the relational model, but it has very little information on how to really *use* PostgreSQL as a server. The manual is all app dev and no sysadmin. For example, *what* compile time options are available? I know they exist, but I've never seen them listed. For another example, take a look at this so-called detailed guide to installing PG on Fedora, which is linked from the 'Technical Documentation' area of postgresql.org: http://dancameron.org/pages/how-to-install-and-setup-postgresql-for-fedo ralinux/ Now, really, this 'guide' is little more than what yum command to run and which config lines to edit to limit remote TCP connections. Now take a look at the first comment: Thanks for the advice. For an Oracle DBA this really helped me in comming up to speed on Postgres administration. There should be an Administration Guide companion to the Manual. -- Brandon Aiken CS/IT Systems Engineer -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jeff Davis Sent: Monday, October 02, 2006 2:58 PM To: Oisin Glynn Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Performance and Generic Config after install On Mon, 2006-10-02 at 14:40 -0400, Oisin Glynn wrote: As an aside to the [GENERAL] Advantages of PostgreSQL thread going on today, I have wondered why the initial on install config of PostgreSQL is (according to most posts) very conservative. I can see how this would be a plus when people may be getting PostgreSQL as part of an OS in the Linux world who may never/rarely use it. I know that in reality tuning and sizing all of the parameters is a very database specific thing, but it would seem that if some default choices would be available it would give those testing/evaluation and trying to get started a shot at quicker progress. Obviously they would still need to tune to your own application. 
Some dreadfully named, possibly pointless options? Unoptimized / low performance - Low load on Server (The current out of the box) Keep in mind that PostgreSQL doesn't really restrict itself as a whole. If you set the settings too low, and throw costly queries at it, the load on the server will be very high. We don't want to imply that PostgreSQL's settings restrict it's cpu, memory, or disk usage as a whole. Production Non Dedicated - PostgreSQL is one of the apps sharing server but is important. Production Dedicated Server - The only purpose of this box is to run PostgreSQL In my opinion, this is more the job of distributions packaging it. Distributions have these advantages when they are choosing the settings: (1) They have more information about the target computer (2) They have more information about the intended use of the system as a whole (3) They have more ability to ask questions of the user PostgreSQL itself can't easily do those things in a portable way. If someone is compiling from source, it is more reasonable to expect them to know what settings to use. However, that said, I think that distributions certainly do take a cue from the default settings in the source distribution. That's why lately the default
Re: [GENERAL] Determining size of a database before dumping
Alexander Staubo wrote: On Oct 2, 2006, at 22:17 , Madison Kelly wrote: I am (re)writing a backup program and I want to add a section for backing up pSQL DBs. In the planning steps (making sure a given destination has enough space) I try to calculate how much space will be needed by a 'pg_dump' run *before* actually dumping it. Is there a relatively easy way to do that? Moreso, if it possible to do this from an unpriviledged account? If not, is there a way to add the permissions to a specific pg user to allow that user to perform this? You could dump the database to /dev/null, piping it through wc to catch the size, but that would of course be wasteful. You could count the disk space usage of the actual stored tuples, though this will necessarily be inexact: http://www.postgresql.org/docs/8.1/static/diskusage.html Or you could count the size of the physical database files (/var/lib/postgresql or wherever). While these would be estimates, you could at least guarantee that the dump would not *exceed* the esimtate. Keep in mind that pg_dump can compress the dump and (iirc) will do so by default when you use the custom format (-Fc or --format=c). Alexander. Heh, that looks like just the article I would have wanted if I had properly RTFM. :D Many thanks!! Madi ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Determining size of a database before dumping
Alexander Staubo [EMAIL PROTECTED] writes: You could count the disk space usage of the actual stored tuples, though this will necessarily be inexact: http://www.postgresql.org/docs/8.1/static/diskusage.html Or you could count the size of the physical database files (/var/lib/ postgresql or wherever). While these would be estimates, you could at least guarantee that the dump would not *exceed* the esimtate. You could guarantee no such thing; consider compression of TOAST values. Even for uncompressed data, datatypes such as int and float can easily print as more bytes than they occupy on-disk. Given all the non-data overhead involved (eg for indexes), it's probably unlikely that a text dump would exceed the du size of the database, but it's far from guaranteed. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[GENERAL] Experiences with 3PAR
I'm curious if anyone on the list has any hands on performance experience with running PostgreSQL on 3PAR appliances (big and small). If you do, please contact me offlist - I'll be happy to summarize for the list archives. Thanks, Aaron ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Determining size of a database before dumping
On Oct 2, 2006, at 23:19 , Tom Lane wrote: Alexander Staubo [EMAIL PROTECTED] writes: You could count the disk space usage of the actual stored tuples, though this will necessarily be inexact: http://www.postgresql.org/docs/8.1/static/diskusage.html Or you could count the size of the physical database files (/var/lib/ postgresql or wherever). While these would be estimates, you could at least guarantee that the dump would not *exceed* the esimtate. You could guarantee no such thing; consider compression of TOAST values. Even for uncompressed data, datatypes such as int and float can easily print as more bytes than they occupy on-disk. Why does pg_dump serialize data less efficiently than PostgreSQL when using the custom format? (Pg_dump arguably has greater freedom in being able to apply space-saving optimizations to the output format. For example, one could use table statistics to selectively apply something like Rice coding for numeric data, or vertically decompose the tuples and emit sorted vectors using delta compression.) As for TOAST, should not pg_dump's compression compress just as well, or better? Alexander. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Determining size of a database before dumping
On 10/02/06 16:19, Tom Lane wrote: Alexander Staubo [EMAIL PROTECTED] writes: You could count the disk space usage of the actual stored tuples, though this will necessarily be inexact: http://www.postgresql.org/docs/8.1/static/diskusage.html Or you could count the size of the physical database files (/var/lib/postgresql or wherever). While these would be estimates, you could at least guarantee that the dump would not *exceed* the estimate. You could guarantee no such thing; consider compression of TOAST values. Even for uncompressed data, datatypes such as int and float can easily print as more bytes than they occupy on-disk. Given all the non-data overhead involved (eg for indexes), it's probably unlikely that a text dump would exceed the du size of the database, but it's far from guaranteed. It's my experience that when there are lots of numeric fields, fixed-width text records are approximately 2.5x larger than the original binary records. -- Ron Johnson, Jr. Jefferson LA USA ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Determining size of a database before dumping
On Tue, 2006-10-03 at 00:42 +0200, Alexander Staubo wrote: Why does pg_dump serialize data less efficiently than PostgreSQL when using the custom format? (Pg_dump arguably has greater freedom in being able to apply space-saving optimizations to the output format. For example, one could use table statistics to selectively apply something like Rice coding for numeric data, or vertically decompose the tuples and emit sorted vectors using delta compression.) As for TOAST, should not pg_dump's compression compress just as well, or better? It would be a strange set of data that had a larger representation as a compressed pg_dump than the data directory itself. However, one could imagine a contrived case where that might happen. Let's say you had a single table with 10,000 columns of type INT4 and 100M records, all with random numbers in the columns. I don't think standard gzip compression will compress random INT4s back down to 32 bits (4 bytes) each. Another example is NULLs. What if only a few of those records had non-NULL values? If I understand correctly, PostgreSQL represents those NULLs with just one bit each in the tuple's null bitmap. What you're saying is more theoretical: if pg_dump used specialized compression based on the data types of the columns, and everything was optimal, you're correct that there is no situation in which the dump *must* be bigger. However, since there is no practical demand for such compression, and it would be a lot of work, there is no *guarantee* that the data directory will be bigger than the dump; in practice, though, it usually is. Regards, Jeff Davis ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
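A quick way to see the effect, if anyone wants to try it; the table and database names here are made up, and the sizes are only rough expectations:

-- a million random int4 values; with per-row overhead this is very roughly 35-40 MB on disk
CREATE TABLE rand_test AS
    SELECT (random() * 2000000000)::int4 AS v
    FROM generate_series(1, 1000000) AS g;
SELECT pg_size_pretty(pg_total_relation_size('rand_test'));

Then, from the shell, compare against dumps of just that table:

pg_dump -t rand_test mydb | wc -c
pg_dump -t rand_test mydb | gzip | wc -c

The plain dump prints each value as up to 10 digits plus a delimiter; gzip on random digits won't get back to 4 bytes per value, though it should still come in well under the on-disk size once tuple headers are counted.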
Re: [GENERAL] Determining size of a database before dumping
Jeff Davis [EMAIL PROTECTED] writes: On Tue, 2006-10-03 at 00:42 +0200, Alexander Staubo wrote: Why does pg_dump serialize data less efficiently than PostgreSQL when using the custom format? What you're saying is more theoretical. If pg_dump used specialized compression based on the data type of the columns, and everything was optimal, you're correct. There's no situation in which the dump *must* be bigger. However, since there is no practical demand for such compression, and it would be a lot of work ... There are several reasons for not being overly tense about the pg_dump format: * We don't have infinite manpower * Cross-version and cross-platform portability of the dump files is critical * The more complicated it is, the more chance for bugs, which you'd possibly not notice until you *really needed* that dump. In practice, pushing the data through gzip gets most of the potential win, for a very small fraction of the effort it would take to have a smart custom compression mechanism. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
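Concretely, the gzip route is just either of these (the database name is a placeholder, and -Fc compression assumes zlib support was built in, which it normally is):

pg_dump -Fc mydb > mydb.dump        # custom format, compressed by default, restore with pg_restore
pg_dump mydb | gzip > mydb.sql.gz   # plain SQL, compressed externally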
[GENERAL] Foreign keys, arrays, and uniqueness
Let me start by saying I understand that PostgreSQL does not support the following: composite data types with individual components acting as foreign keys, arrays of composite data types, and arrays with elements acting as foreign keys. I will lay out my example using them anyway for clarity, even though they don't exist. OK, here is a rough example of my problem:

CREATE TABLE target_node1 ( id int4, value text );
CREATE TABLE target_node2 ( id int4, value text );
CREATE TYPE composite1 AS ( idx int4, reference int4 );
CREATE TYPE composite2 AS ( idx int4, reference1 int4, reference2 int4 );
CREATE TABLE example_table ( id int4, value text, type1 composite1[], type2 composite2[] );
ALTER TABLE example_table ADD FOREIGN KEY ((type1).reference) REFERENCES target_node1 (id);
ALTER TABLE example_table ADD FOREIGN KEY ((type2).reference1) REFERENCES target_node1 (id);
ALTER TABLE example_table ADD FOREIGN KEY ((type2).reference2) REFERENCES target_node2 (id);

In addition I want a UNIQUE check constraint on INSERT and UPDATE that first checks that the text in the value column is unique. If that fails, I need to check that the type1 and type2 arrays are unique and act accordingly. Yes, I understand all of that is completely wrong, but hopefully it clarifies my intentions. Essentially, I want lists of a custom type (hopefully a composite type) where individual data members of the type are foreign keys to other tables. I also need to use those custom types for uniqueness checks in combination with other information in the table. The solutions as I see them:

1. Parallel arrays for the data members of the types: an array for composite1.idx, an array for composite1.reference, and so on. Then I could take the array of interest and create triggers that would let the array act as a set of foreign keys. However, this is slow when updating or deleting from a table like target_node1, and it seems accident-prone given that the values are all separated from each other and hardly human-readable.

2. Normalize this and move the composite type into its own table that references example_table. It is a many-to-one relationship after all, and I am sure this is the route you will suggest (a sketch of such a layout follows below). However, if I do that, how do I maintain my uniqueness check on insert into example_table, given that half of the data I am checking is in another table and hasn't been inserted yet for the new row? I am totally stuck on this...

3. Something with truly custom data types (i.e., C functions loaded into the backend). To be honest, I don't even know how feasible that is or how much work it would take. I have defined my own custom types before, but never any indexing functions for them. What would I have to define beyond the type itself? How would the foreign keys work in this situation, if at all?

Help! This is seriously important to my project, but naturally the less complicated the solution, the better. Thanks in advance, Morgan ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
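Since option 2 is the normalization route, here is a minimal sketch of what that layout might look like. Everything beyond the names already used above is made up, and it assumes target_node1.id and target_node2.id are (or can be made) primary keys; the uniqueness requirement is only discussed afterwards, not implemented:

CREATE TABLE example_table (
    id    serial PRIMARY KEY,
    value text
);

-- one row per element of what was the composite1[] column
CREATE TABLE example_ref1 (
    example_id int4 NOT NULL REFERENCES example_table (id) ON DELETE CASCADE,
    idx        int4 NOT NULL,
    reference  int4 NOT NULL REFERENCES target_node1 (id),
    PRIMARY KEY (example_id, idx)
);

-- one row per element of what was the composite2[] column
CREATE TABLE example_ref2 (
    example_id int4 NOT NULL REFERENCES example_table (id) ON DELETE CASCADE,
    idx        int4 NOT NULL,
    reference1 int4 NOT NULL REFERENCES target_node1 (id),
    reference2 int4 NOT NULL REFERENCES target_node2 (id),
    PRIMARY KEY (example_id, idx)
);

The foreign keys and cascading deletes then come for free. For the uniqueness requirement, one possibility is to insert the example_table row and all of its example_ref1/example_ref2 rows in a single transaction and run the duplicate check at the end of that transaction (for instance from a deferred constraint trigger, or simply as the last statement of the inserting function), when all the rows it needs to compare are visible. That part is a suggestion only, not something tested here.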