Re: [GENERAL] pg_autovacuum not doing anything
I verified that it was not doing anything by erasing the pg_statistic table: it was not refilled within 24 hours. When I ran VACUUM ANALYZE manually, it was filled in. I am using PostgreSQL 8.0.3 on Gentoo, installed via portage, which set up pg_autovacuum as part of the installation. I will try raising the debug output to see if that gives me any more information.

Thank You
Sim

Thomas F. O'Connell [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
> How do you know that it is not vacuuming or analyzing anything? And which
> version of postgres did you build/acquire pg_autovacuum from? It seems
> that in post-8.0 versions, there is almost no output by default. You
> could try increasing the debug output using -d 1 or -d 2 on the command
> line to verify that any activity is occurring.
>
> On Sep 11, 2005, at 4:26 AM, Sim Zacks wrote:
>> I have pg_autovacuum running on my Gentoo server, the same server as
>> the PostgreSQL database. ps shows that it is running, and I have it
>> start automatically with the system. It is not vacuuming or analyzing
>> the tables, though. Am I doing something wrong? Is there a better way
>> to do this?
>>
>> Thank You
>> Sim
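For anyone who wants to repeat Sim's check by hand, a rough sketch (the catalog is named pg_statistic; run it in the affected database):

    -- Row counts in pg_statistic before and after a manual ANALYZE.
    SELECT count(*) FROM pg_statistic;   -- near zero if stats were wiped
    ANALYZE;                             -- the work pg_autovacuum should be doing
    SELECT count(*) FROM pg_statistic;   -- should now be repopulated

If pg_autovacuum were running and deciding to act, the first count would already have grown on its own within its sleep interval.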
Re: [GENERAL] SQL - planet redundant data
Thank you for your input! Individual replies follow below.

## Chris Travers wrote:

> Ok. Imagine two huge tables: file_data and additional_data
>
>     create table file_data (
>         file_id    serial primary key,
>         station_id text,
>         ...
>     );
>
>     create table additional_data (
>         data_id bigserial primary key,
>         file_id int references file_data (file_id),
>         temp    ...,
>         ...
>     );

So the 500 million entries in additional_data.file_id are now restricted to the group of values found in file_data.file_id. But I don't see how this gets rid of redundant data; I still have 500M entries. Consider the column year: my 500M records come from 13 years. Saving 500M values when only 13 are distinct is very redundant.

> Also you can normalize your way out of the redundant data problem.

I can tell how to do this for filename and station name; I save something by storing an integer rather than a text string. But all the rest of my columns are already smallints. Do I save anything by storing 5 rather than 1997?

> I would *highly* recommend waiting until 8.1 goes live to implement
> this in production.

It will be built/rebuilt after 8.1 is released.

## John D. Burger wrote:

> Why not effectively append all of your per-file tables into one huge
> table, with an additional column indicating which file the data comes
> from? Then you have only two tables, one with a row for each
> observation in your data, one with a row for each file. Some queries
> need to join the two tables, but that's not a big deal.

That big table with a row for each observation will have to include all columns, such as year and station_id. For such columns I store 500M values of which only ~15 are distinct in my case; this is the redundancy I'm trying to minimize. By splitting into smaller tables, many columns will contain only one distinct value. It *should* be possible to delete such columns and instead store their corresponding values in a table header. If I had one table per file, the table you suggest with one row per file could effectively serve as the collection of table headers.

> It also seems to me that you may be tying your schema design too
> closely to the current way that the data is represented. Do you really
> need to have the data file figure so prominently in your design?

The archive of data files is shared by a number of research groups around the world. There's a whole software package that people use for data analysis, accessing the data files as they are, so I expect a lot of file-specific queries.

> Hmm, in fact if the redundant values you're worried about come in long
> stretches (e.g., temperature is the same for many observations in a
> row), I suppose you could do the same thing - map a constant value to
> the range of observation IDs for which it holds. This gets back to
> having many tables, though.

This is in effect normalization? But if the observation ID takes just as much storage as the original value, have we gained anything? BTW, I'm not aiming at redundancy in the measurements - this is minimal compared to e.g. year and station ID. I do have an idea of how it *should* be possible to get rid of much of it - read the response to Tom Lane below.

## Tom Lane wrote:

> No, tableoid is sort of a virtual column ... it doesn't exist on disk.
> When you query it you get a value fetched from the internal data
> structure representing the table.

So virtual columns are possible - THIS is a way to clear redundant data! Is it possible for a user to create a virtual column? If not, this would make a big improvement. What I really need are partial virtual columns.
I'm imagining an alternative version of VACUUM ANALYZE that could do the following:

1) Order the rows in the table so that, for each column, identical values are placed next to each other for as far as possible (the row order that optimizes one column will probably not be optimal for the others).

2) For each column, identify the stretches that contain only one distinct value. Save that value together with the IDs of the start and end rows, and delete the stretch.

It is not obvious how to do a perfect optimization in step 1), at least not to me - I'm sure a skilled mathematician would know exactly how to do it. But here's a simple approach that would get us part of the way (see the sketch after this list):

1.1) Grab the column with the most redundancy (fewest distinct values) and sort it into groups according to the distinct values.

1.2) Within each of these groups, grab the column with the next most redundancy and sort into groups according to the distinct values. And so on; stop whenever groups become so small that there's nothing to gain.

Such an analysis would make it much less expensive to combine same-schema tables, and having everything in the same table is really convenient. It would obviously save a lot of storage space, and I imagine it would enable more efficient queries too.
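A minimal SQL sketch of steps 1.1-1.2 (table and column names are hypothetical; a real implementation would pick the ORDER BY list from measured distinct counts rather than hard-coding it):

    -- Sort rows so that the columns with the fewest distinct values form
    -- the longest runs: most redundant column first, then the next, etc.
    CREATE TABLE observations_sorted AS
    SELECT *
    FROM observations
    ORDER BY year,        -- ~13 distinct values
             station_id,  -- ~15 distinct values
             month;       -- 12 distinct values
    -- Runs of identical (year, station_id, month) are now contiguous,
    -- so each run could be replaced by one (value, start_row, end_row)
    -- entry as described in step 2.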
[GENERAL] help me learn
Hi, I'm new to PostgreSQL as well as to database concepts. Please tell me how I can learn - I mean the easiest and fastest way. Your help will be appreciated.
Re: [GENERAL] help me learn
On 12.09.2005 at 17:08:31 +0530, suresh ramasamy wrote the following:
> Hi, I'm new to PostgreSQL as well as to database concepts. Please tell
> me how I can learn - I mean the easiest and fastest way. Your help will
> be

Read a book. http://techdocs.postgresql.org/techdocs/bookreviews.php

Regards,
Andreas
Re: [GENERAL] help me learn
suresh ramasamy wrote:
> I'm new to PostgreSQL as well as to database concepts. Please tell me
> how I can learn - I mean the easiest and fastest way. Your help will be
> appreciated.

Make an appropriate posting to pgsql-jobs?
[GENERAL] Replication
Hi,

I currently have a PostgreSQL 8 system which I want to replicate (ideally in realtime) with a spare host in order to introduce some redundancy - e.g. if the master server dies then I've got a ready-to-go backup. Switchover does not have to be automated.

I've looked into commandprompt.com's Mammoth system, but it only supports up to 1000 tables (the documentation doesn't mention this!) - the database in question has more than 1000 tables, and adds new tables regularly. Slony-I and pgpool apparently don't support dynamic schemas, which I'd obviously need, so they're not quite up to the job either.

I'm currently looking at some sort of hack-job with the WAL archives (see http://www.issociate.de/board/index.php?t=msg&goto=443099), but this looks like a slightly flaky approach - have I missed the obvious solution? Is there any stable software available which can replicate a large and dynamic number of tables?

Cheers,
Barry
Re: [GENERAL] Replication
On Monday, 12 September 2005 at 13:52, [EMAIL PROTECTED] wrote:
> I currently have a PostgreSQL 8 system which I want to replicate

Look at DRBD.

-- Peter Eisentraut http://developer.postgresql.org/~petere/
[GENERAL] If an index is based on 3 columns will a query using two of the columns utilize the index?
Example: assume a table of 10 columns, three of which are fname, lname, and dob. If an index is created on (fname, lname, dob), will a query that utilizes two of the columns (select 'data' from table where fname = 'X' and lname = 'Y') utilize the index?

thanks,
reid
Re: [GENERAL] If an index is based on 3 columns will a query using two of the columns utilize the index?
On Mon, Sep 12, 2005 at 09:43:57AM -0400, Reid Thompson wrote:
> Example: assume a table of 10 columns, three of which are fname, lname,
> and dob. If an index is created on (fname, lname, dob), will a query
> that utilizes two of the columns (select 'data' from table where
> fname = 'X' and lname = 'Y') utilize the index?

Yes, if it is selective enough. (It _can_ use the index, which does not mean that it _will_ use it.) Note that if your example query used the columns (lname, dob), the answer would be no.

-- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
Re: [GENERAL] If an index is based on 3 columns will a query using two of the columns utilize the index?
Alvaro Herrera wrote:
> Yes, if it is selective enough. (It _can_ use the index, which does not
> mean that it _will_ use it.) Note that if your example query used the
> columns (lname, dob), the answer would be no.

Why is that? In order to use an index, does the query have to utilize the 'first' element of the index?

reid
Re: [GENERAL] If an index is based on 3 columns will a query using two of the columns utilize the index?
On Mon, Sep 12, 2005 at 09:43:57AM -0400, Reid Thompson wrote:
> assume a table of 10 columns, three of which are fname, lname, and dob.
> If an index is created on (fname, lname, dob), will a query that
> utilizes two of the columns (select 'data' from table where fname = 'X'
> and lname = 'Y') utilize the index?

See "Multicolumn Indexes" in the Indexes chapter of the documentation:

http://www.postgresql.org/docs/8.0/interactive/indexes-multicolumn.html

You can use EXPLAIN to see whether the planner will use an index for a particular query:

http://www.postgresql.org/docs/8.0/interactive/performance-tips.html#USING-EXPLAIN

Note, however, that the planner will ignore an index and use a sequential scan if it thinks the latter will be faster, so if you want to see whether the query *can* use an index (as opposed to *will* use it) then you could execute SET enable_seqscan TO off and then run EXPLAIN (don't forget to RESET enable_seqscan or SET it back to on when you're done testing).

-- Michael Fuhr
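Putting that advice together for the example in this thread (the table name person is hypothetical):

    -- See whether the planner picks the index for the two-column query:
    EXPLAIN SELECT 'data' FROM person WHERE fname = 'X' AND lname = 'Y';
    -- Force the issue to see whether the index *can* be used at all:
    SET enable_seqscan TO off;
    EXPLAIN SELECT 'data' FROM person WHERE fname = 'X' AND lname = 'Y';
    RESET enable_seqscan;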
Re: [GENERAL] If an index is based on 3 columns will a query using two of the columns utilize the index?
On Mon, Sep 12, 2005 at 10:05:36AM -0400, Reid Thompson wrote:
> Alvaro Herrera wrote:
>> Note that if your example query used the columns (lname, dob), the
>> answer would be no.
>
> Why is that? In order to use an index, does the query have to utilize
> the 'first' element of the index?

In released versions of PostgreSQL, yes. Version 8.1 will remove that restriction.

-- Michael Fuhr
Re: [GENERAL] If an index is based on 3 columns will a query using two of the columns utilize the index?
On Mon, Sep 12, 2005 at 10:05:36AM -0400, Reid Thompson wrote:
> Why is that? In order to use an index, does the query have to utilize
> the 'first' element of the index?

The leftmost part. There's no way to scan an index if you don't know the key. On a btree index, the key is ordered, and the columns at the left are more significant than those at the right. If you don't provide a value for the leftmost (first) column, there's no way to start scanning the index, because there's no starting point.

I don't think that was nearly clear enough, but OTOH I haven't had any coffee today yet.

-- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
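To make the point concrete with the index from the original question (the table name person is hypothetical; this describes pre-8.1 behavior, per the rest of the thread):

    CREATE INDEX person_name_dob_idx ON person (fname, lname, dob);
    -- Constrains the leftmost column, so the btree has a starting point:
    SELECT * FROM person WHERE fname = 'X' AND lname = 'Y';
    -- No constraint on fname: pre-8.1, this query cannot use the index.
    SELECT * FROM person WHERE lname = 'Y' AND dob = '1970-01-01';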
Re: [GENERAL] If an index is based on 3 columns will a query using two of the columns utilize the index?
Alvaro Herrera [EMAIL PROTECTED] writes:
> Yes, if it is selective enough. (It _can_ use the index, which does not
> mean that it _will_ use it.) Note that if your example query used the
> columns (lname, dob), the answer would be no.

Actually, that last point is not true anymore as of 8.1 --- see this thread:

http://archives.postgresql.org/pgsql-hackers/2005-05/msg00939.php

which led to this patch:

http://archives.postgresql.org/pgsql-committers/2005-06/msg00156.php

I missed the fact that the documentation said it wouldn't work, though. Will fix...

regards, tom lane
Re: [GENERAL] SQL - planet redundant data
>> Hmm, in fact if the redundant values you're worried about come in long
>> stretches (e.g., temperature is the same for many observations in a
>> row), I suppose you could do the same thing - map a constant value to
>> the range of observation IDs for which it holds. This gets back to
>> having many tables, though.
>
> This is in effect normalization? But if the observation ID takes just
> as much storage as the original value, have we gained anything? BTW,
> I'm not aiming at redundancy in the measurements - this is minimal
> compared to e.g. year and station ID.

I don't think this is exactly what people meant by normalization, but maybe. My basic thought was that since you have some redundancy in your data, you might want to use some form of compression. One of the simplest forms of compression is run-length encoding (http://en.wikipedia.org/wiki/Run_length_encoding). So you'd have most of your data in a main table:

    create table observations (
        obsID       integer primary key,  -- Maybe a BIGINT
        temperature float,
        etc.
    );

and some other compressed tables for those features that have long runs of repetitive values:

    create table obsYears (
        startObs integer primary key references observations (obsID),
        endObs   integer references observations (obsID),
        year     integer
    );

    create table obsStations (
        startObs  integer primary key references observations (obsID),
        endObs    integer references observations (obsID),
        stationID integer
    );

(Caution, I haven't checked these for syntax.) I've introduced an observation ID, and then I have compressed tables that map =ranges= of these IDs to values that are constant for long stretches. Each year occupies only one row, same with each station. (I think your reply to Tom may have been getting at something like this.)

Now you can do queries like this, say, for temperature statistics in a particular year:

    select avg(temperature), stddev(temperature)
    from observations, obsYears
    where obsID between startObs and endObs
      and year = 2001;

You could join in other compressed tables in the same way. In fact, you could glue them all together with a VIEW, and you'd be able to treat the whole thing like one giant table, with much of the redundancy removed. Note that if you define indexes on the startObs and endObs columns, PostgreSQL might avoid scanning through the compressed tables every time you do a query. You might also benefit from a composite index on (startObs, endObs). For features like year, which are massively repetitive, this might even be faster than storing the feature in the main table, since the compressed table will easily fit in memory.

So the basic idea is run-length encoding for repetitive values. I think I can use this in some of my own data - I don't know why I never thought of it before.

- John D. Burger
  MITRE
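A sketch of the VIEW John mentions, gluing the compressed tables back onto the main one (the view name is illustrative; untested, same caveat as the tables above):

    CREATE VIEW observations_full AS
    SELECT o.obsID, o.temperature, y.year, s.stationID
    FROM observations o
    JOIN obsYears    y ON o.obsID BETWEEN y.startObs AND y.endObs
    JOIN obsStations s ON o.obsID BETWEEN s.startObs AND s.endObs;

    -- Queries can now treat the view as one wide table:
    SELECT avg(temperature) FROM observations_full WHERE year = 2001;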
Re: [GENERAL] If an index is based on 3 columns will a query using two of the columns utilize the index?
Alvaro Herrera [EMAIL PROTECTED] writes:
> The leftmost part. There's no way to scan an index if you don't know
> the key. On a btree index, the key is ordered, and the columns at the
> left are more significant than those at the right. If you don't provide
> a value for the leftmost (first) column, there's no way to start
> scanning the index because there's no starting point.

Actually, btree doesn't have any particular problem with that --- it just starts the scan at the beginning of the index. However, the other index types do all require a constraint on the first index column; for instance hash has to be able to determine a hash value. Greg Stark suggests here:

http://archives.postgresql.org/pgsql-hackers/2005-05/msg00966.php

that GiST could also be fixed to work with any subset of the index columns, but it hasn't been done yet, unless Teodor and Oleg snuck something in during that last round of GiST work.

regards, tom lane
Re: [GENERAL] Replication
[EMAIL PROTECTED] wrote:
> I've looked into commandprompt.com's Mammoth system, but it only
> supports up to 1000 tables (the documentation doesn't mention this!) -
> the database in question has more than 1000 tables, and adds new tables
> regularly. Slony-I and pgpool apparently don't support dynamic schemas,
> which I'd obviously need, so they're not quite up to the job either.

Actually, the theoretical limit for the Mammoth system is 10,000 tables; we have only tested up to 1000. This is mentioned quite clearly on the website.

It should be noted that if Slony-I won't do what you need, then Mammoth probably would not either, although Mammoth can add new tables on the fly as long as those tables are empty. If they are not empty, they would cause a full sync to occur.

Sincerely,
Joshua D. Drake

-- Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
Re: [GENERAL] If an index is based on 3 columns will a query using two of the columns utilize the index?
> Greg Stark suggests here:
> http://archives.postgresql.org/pgsql-hackers/2005-05/msg00966.php
> that GiST could also be fixed to work with any subset of the index
> columns, but it hasn't been done yet, unless Teodor and Oleg snuck
> something in during that last round of GiST work.

GiST may work with any subset of index columns too. Even in the existing code I don't see any problem, except for NULL in the first column. GiST doesn't store tuples with a leading NULL value (gist.c lines 174, 326), so the index doesn't contain them. After our work on WAL-izing GiST, it can work with invalid tuples (possibly occurring after crash recovery), so it isn't a big deal to add support for NULL in the first column. But the freeze date is past... Should I add it, or leave it for 8.2?

-- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
Re: [GENERAL] If an index is based on 3 columns will a query using two of the columns utilize the index?
Tom Lane [EMAIL PROTECTED] writes:
> Actually, that last point is not true anymore as of 8.1 --- see this
> thread:
> http://archives.postgresql.org/pgsql-hackers/2005-05/msg00939.php
> which led to this patch:
> http://archives.postgresql.org/pgsql-committers/2005-06/msg00156.php

Did that patch actually implement skip scanning? The comment seems to describe only removing the restriction from the planner. That would make it theoretically possible, but presumably the cost estimator should ensure it essentially never gets chosen for btree indexes: the btree index would very rarely help, since it would require a complete index scan.

I guess I could see some corner cases where it would help. Very wide tables with an index on a few very selective, relatively narrow columns - the index could then be scanned in its entirety much faster than a full table scan. But the index would have to be *much* narrower than the table and quite selective to overcome the random access penalty. Skip scanning would make it much more likely to be helpful.

Also, I think Oracle has another scan method called a "fast index scan" that basically does a full sequential scan of the index, so the tuples come out unordered but the access pattern is sequential. Would that be a good TODO for Postgres? Is it feasible given the index disk structures in Postgres?

-- greg
Re: [GENERAL] ERROR: type temp_gc already exists
On Saturday, 10 September 2005 at 18:05, Tom Lane wrote:
> Janning Vygen [EMAIL PROTECTED] writes:
>> i guess the table was dropped but not the corresponding type. How can
>> things like this happen?
>
> Corrupted pg_depend table maybe? You might try REINDEXing pg_depend to
> be on the safe side. Also please look to see if there are any relevant
> entries in it (look for objid = the type's OID, or refobjid = 16562879,
> which we can see was the table's OID).
>
>> How can i fix it? Can i just drop the type from pg_type?
>
> If there's no pg_depend entry then DROP TYPE should work. Otherwise you
> might have to resort to manually DELETEing the pg_type row.

Thanks for your detailed answer. I don't want to do anything wrong, so to be sure, I have some more questions:

- There is no entry in pg_depend. Should I just drop the entry from pg_type, or should I REINDEX anyway?
- Can I REINDEX pg_depend in normal operation mode, or do I have to take the precautions mentioned in the docs? [1]
- How can things like this happen? Hardware failure? If yes, should I change my hard disk?

kind regards,
Janning

[1] It's not clear to me whether pg_depend is a shared system catalog, because the docs say "any of the shared system catalogs (pg_database, pg_group, pg_shadow, or pg_tablespace)". Maybe the enumeration is complete; maybe it shows only examples.
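A sketch of the checks Tom suggested, using the names from this thread (16562879 is the dropped table's OID from the earlier error message; the docs' caveats on reindexing system catalogs apply):

    -- Rebuild pg_depend's indexes first, in case a corrupt index is
    -- hiding rows:
    REINDEX TABLE pg_depend;
    -- Then look for dependency rows referencing the orphaned type or
    -- the dropped table:
    SELECT classid, objid, refclassid, refobjid, deptype
    FROM pg_depend
    WHERE objid = (SELECT oid FROM pg_type WHERE typname = 'temp_gc')
       OR refobjid = 16562879;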
Re: [GENERAL] If an index is based on 3 columns will a query using two of the columns utilize the index?
Teodor Sigaev [EMAIL PROTECTED] writes:
> GiST may work with any subset of index columns too. Even in the
> existing code I don't see any problem, except for NULL in the first
> column. GiST doesn't store tuples with a leading NULL value (gist.c
> lines 174, 326), so the index doesn't contain them.

Well, that's exactly the problem :-(. Or at least one of the problems; the other being what you'd use as a search key to find such tuples.

> After our work on WAL-izing GiST, it can work with invalid tuples
> (possibly occurring after crash recovery), so it isn't a big deal to
> add support for NULL in the first column. But the freeze date is
> past... Should I add it, or leave it for 8.2?

Too late for 8.1, I'd say --- this definitely sounds like a new feature rather than a bug fix.

regards, tom lane
Re: [GENERAL] If an index is based on 3 columns will a query using two of the columns utilize the index?
Greg Stark [EMAIL PROTECTED] writes:
> Did that patch actually implement skip scanning?

No, it just removed the planner's arbitrary assumption that the index methods wouldn't cope. Skip scanning is actually something rather different anyway.

> The comment seems to describe only removing the restriction from the
> planner. That would make it theoretically possible, but presumably the
> cost estimator should ensure it essentially never gets chosen for btree
> indexes.

btcostestimate does understand this now.

> I guess I could see some corner cases where it would help. Very wide
> tables with an index on a few very selective, relatively narrow columns
> - the index could then be scanned in its entirety much faster than a
> full table scan. But the index would have to be *much* narrower than
> the table and quite selective to overcome the random access penalty.

With a bitmap index scan the penalty wouldn't be so high.

> Also, I think Oracle has another scan method called a "fast index scan"
> that basically does a full sequential scan of the index, so the tuples
> come out unordered but the access pattern is sequential. Would that be
> a good TODO for Postgres? Is it feasible given the index disk
> structures in Postgres?

I think this would probably fail under concurrent update conditions: you couldn't guarantee not to miss or multiply return index entries. There is interlocking in an index-order scan that prevents such problems, but I don't see how it'd work for a physical-order scan. You could probably make it work if you were willing to lock out writers for the duration of the scan, but that'd severely restrict the usefulness, I would think. I'm also not sure how we'd express such a constraint within the system...

regards, tom lane
Re: [GENERAL] Replication
Barry,

You can use PITR to archive transaction logs to a second server that is kept in standby mode. This will cope with any number of tables and with dynamic changes to tables, and it is fairly straightforward and very low overhead.

Set archive_command to a program that transfers xlog files to the second server. Then set restore_command on the second server to a program that loops until the next file is available. Switchover time is low.

Best Regards,
Simon Riggs
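For reference, a sketch of the seeding step in Simon's scheme (these are the 8.0 backup functions; archive_command and restore_command themselves are configured in postgresql.conf and recovery.conf, not shown here):

    -- On the master, once archive_command is shipping WAL segments:
    SELECT pg_start_backup('standby-seed');
    -- ... copy the whole data directory to the standby host ...
    SELECT pg_stop_backup();
    -- The standby then replays archived WAL via its restore_command
    -- until it is promoted at switchover time.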
Re: [GENERAL] linux pg pointed to windows partition for db
On Sun, Sep 11, 2005 at 06:36:26PM +0100, mike dixon wrote:
> Tried a couple other places and aren't getting anywhere. A Windows XP
> program I use uses pgsql; I'd like to create a backup of the db, but
> from within Linux without XP running (I run XP in VMware, and the XP db
> backup will be written to an ext* partition). I have the same rev pgsql
> installed on XP and Linux (8.0.3 IIRC); the XP pgsql db is on an NTFS
> partition and I mount the disc read-only.

You'll be unable to start the database with the disk mounted read-only. If you want to create a backup of the DB you have three options that are likely to work:

1) Create a filesystem-level backup of the data directory.

2) Boot the virtual machine, start PostgreSQL and run a pg_dump backup under XP.

3) Boot the virtual machine, start PostgreSQL and then run a pg_dump backup under Linux, connecting to the virtual machine over the VMware virtual network.

If you remount the NTFS partition read-write (does that work reliably on Linux these days?) you may well be able to get a Linux postmaster to look at it, but I'd be concerned about trashing the data. Not something to try as a backup attempt.

Cheers,
Steve

> My problem is two-fold: 1. What parameters to start the Linux pgsql
> with for this circumstance, if not the defaults; 2. How to point the
> Linux pgsql/pg_dump to the XP pg's data to do the backup. Can I just
> start the Linux pgsql with its defaults? If so, how do I point Linux
> pg_dump to the pgsql db data in an XP dir (/mnt/Program
> Files/Postgresql/8.0/data)? If Linux pgsql defaults aren't good enough
> for this circumstance, what do I need to change? Also, if the XP pgsql
> db requires a pw when working with it under XP, and XP isn't running
> and I can use the Linux pg_dump for this, do I still need the XP pgsql
> pw? Is there any way to pass that to the Linux pg_dump on the cmd line
> so I can put this operation in a script?
Re: [GENERAL] ERROR: type temp_gc already exists
Janning Vygen [EMAIL PROTECTED] writes:
>> If there's no pg_depend entry then DROP TYPE should work. Otherwise
>> you might have to resort to manually DELETEing the pg_type row.
>
> Thanks for your detailed answer. I don't want to do anything wrong, so
> to be sure, I have some more questions:
>
> - There is no entry in pg_depend. Should I just drop the entry from
>   pg_type, or should I REINDEX anyway?

Well, what did you do to check that there was no entry? If the index is corrupt and you issued a query that used the index, it might have failed to find an entry that's actually there in the table (in fact, if we're assuming the DROP TYPE didn't happen because the system didn't find the dependency row while dropping the table, this is pretty much exactly what you'd expect). I'd REINDEX and then check again.

> - How can things like this happen? Hardware failure? If yes, should I
>   change my hard disk?

Insufficient information to say. It wouldn't be a bad idea to run some disk tests though.

> [1] It's not clear to me whether pg_depend is a shared system catalog,
> because the docs say "any of the shared system catalogs (pg_database,
> pg_group, pg_shadow, or pg_tablespace)". Maybe the enumeration is
> complete; maybe it shows only examples.

That's meant to be a complete list --- I've updated the documentation to make this clearer. But you could check for yourself:

    select relname from pg_class where relisshared;

regards, tom lane
Re: [GENERAL] SQL - planet redundant data
Thank you, John! I misunderstood you the first time, but I now see we have the same thing in mind.

> So you'd have most of your data in a main table:
>
>     create table observations (
>         obsID       integer primary key,  -- Maybe a BIGINT
>         temperature float,
>         etc.
>     );
>
> and some other compressed tables for those features that have long runs
> of repetitive values:
>
>     create table obsYears (
>         startObs integer primary key references observations (obsID),
>         endObs   integer references observations (obsID),
>         year     integer
>     );
>
>     create table obsStations (
>         startObs  integer primary key references observations (obsID),
>         endObs    integer references observations (obsID),
>         stationID integer
>     );
>
> (Caution, I haven't checked these for syntax.) I've introduced an
> observation ID, and then I have compressed tables that map =ranges= of
> these IDs to values that are constant for long stretches. Each year
> occupies only one row, same with each station.

One complication: applying the observation ID in effect orders the rows. If you order them chronologically, year is perfectly lined up, giving you one row per value in your compressed table, but e.g. month will be split up into n_years*12 stretches of obsIDs, and station_id may not have any continuous stretches of obsIDs at all. I don't see any solution to this, but better compression can be achieved by ordering rows optimally when applying the obsID. The reply to Tom Lane in my previous post suggested one way to do this - it may not always be optimal, but at least it's simple.

> Now you can do queries like this, say, for temperature statistics in a
> particular year:
>
>     select avg(temperature), stddev(temperature)
>     from observations, obsYears
>     where obsID between startObs and endObs
>       and year = 2001;

This works! I had not yet realized how to make this connection between two tables, so that was a major help - thank you.

> You could join in other compressed tables in the same way. In fact, you
> could glue them all together with a VIEW, and you'd be able to treat
> the whole thing like one giant table, with much of the redundancy
> removed.

That is exactly what I want, and now I finally see how to do it (I think!). However, it is a considerable amount of work to set this up manually, plus it has been a headache realizing how to get there at all. I'm hoping that one or more of the developers think it would be a good idea for PostgreSQL to perform an internal table optimization process using run-length encoding. Imagine you could just throw all your data into one table, run OPTIMIZE TABLE, and you'd be done. With SQL being all about tables, I'm surprised this idea (or something even better) hasn't been implemented already.

Poul Jensen
[GENERAL] PQtrace doesn't work
Hello,

I sent this message once before, on 29-1-2004, and still have no solution for this problem. :o

I have a little test program (see the end of the message). The program crashes when PQtrace is called (the instruction referenced memory at 0x0010; the memory could not be "written"). I use the library libpqdll.lib and PostgreSQL v8.0.1, but it also happens with 7.4.9. Running under W2000 SP4, VC++ 6 SP5. Compiled under Linux, there is no problem, so obviously something is wrong with its use under Windows. If I comment out the PQtrace call, everything works fine. Any ideas?

gr,
Willem.

    #include <stdio.h>
    #include <stdlib.h>
    #include <winsock.h>
    #include "libpq-fe.h"

    int main(void)
    {
        int       nFields;
        int       i, j;
        PGconn   *conn;
        PGresult *res;
        char     *pghost = "linux";
        char     *dbName = "some_db";
        FILE     *debug;
        WSADATA   wsadata;

        WSAStartup(0x0101, &wsadata);

        conn = PQsetdbLogin(pghost, NULL, NULL, NULL, dbName, "user", "");
        if (PQstatus(conn) == CONNECTION_BAD)
        {
            printf("Connection to database %s failed\n", dbName);
            printf("%s", PQerrorMessage(conn));
            PQfinish(conn);
            exit(1);
        }

        debug = fopen("trace.out", "w");
        PQtrace(conn, debug);    /* <-- this is the call that crashes */

        res = PQexec(conn, "BEGIN");
        if (!res || PQresultStatus(res) != PGRES_COMMAND_OK)
        {
            printf("BEGIN command failed\n");
            PQclear(res);
            PQfinish(conn);
            exit(1);
        }
        PQclear(res);

        res = PQexec(conn, "DECLARE mycursor CURSOR FOR select sum(id) from relaties");
        if (!res || PQresultStatus(res) != PGRES_COMMAND_OK)
        {
            printf("DECLARE CURSOR command failed\n");
            PQclear(res);
            PQfinish(conn);
            exit(1);
        }
        PQclear(res);

        res = PQexec(conn, "FETCH ALL in mycursor");
        if (!res || PQresultStatus(res) != PGRES_TUPLES_OK)
        {
            printf("FETCH ALL command didn't return tuples properly\n");
            PQclear(res);
            PQfinish(conn);
            exit(1);
        }

        /* print column headers, then each row of the result */
        nFields = PQnfields(res);
        for (i = 0; i < nFields; i++)
            printf("%-15s", PQfname(res, i));
        printf("\n\n");

        for (i = 0; i < PQntuples(res); i++)
        {
            for (j = 0; j < nFields; j++)
                printf("%-15s", PQgetvalue(res, i, j));
            printf("\n");
        }
        PQclear(res);

        res = PQexec(conn, "CLOSE mycursor");
        PQclear(res);
        res = PQexec(conn, "COMMIT");
        PQclear(res);

        PQfinish(conn);
        fclose(debug);
        WSACleanup();
        return 0;
    }
[GENERAL] Building postgres on SuSE
I'm trying to build postgres on SuSE with --with-pam, and it tells me:

    /usr/lib/gcc-lib/i586-suse-linux/3.3.5/../../../../i586-suse-linux/bin/ld: cannot find -lpam

I know it is actually installed, and dissecting the configure script and hand-compiling the test program works if I say /lib/libpam.so.0, but fails if I say -lpam. Very bizarre - has anyone encountered this? I realize this is probably a SuSE issue and not a postgres issue, but I hope someone here has seen this problem.

Thanks in advance!

-- cg
Re: [GENERAL] Replication
> I currently have a PostgreSQL 8 system which I want to replicate
> (ideally in realtime) with a spare host in order to introduce some
> redundancy - e.g. if the master server dies then I've got a ready-to-go
> backup. Switchover does not have to be automated.
>
> I've looked into commandprompt.com's Mammoth system, but it only
> supports up to 1000 tables (the documentation doesn't mention this!) -
> the database in question has more than 1000 tables, and adds new tables
> regularly. Slony-I and pgpool apparently don't support dynamic schemas,
> which I'd obviously need, so they're not quite up to the job either.

pgpool (without Slony-I) replicates schema changes. And PGCluster does too.

-- SRA OSS, Inc. Japan
Tatsuo Ishii
[GENERAL] Utility that creates table schema from csv data?
Probably wishful thinking, but who knows - maybe there's something in contrib! I have a bunch of CSV data with the field names specified on the first line of the various files. Is there any such utility that will create a table schema using the field names AND look through the data to determine what data types each field should be?

Thanks,
CSN
Re: [GENERAL] SQL - planet redundant data
> That is exactly what I want, and now I finally see how to do it (I
> think!). However, it is a considerable amount of work to set this up
> manually, plus it has been a headache realizing how to get there at
> all. I'm hoping that one or more of the developers think it would be a
> good idea for PostgreSQL to perform an internal table optimization
> process using run-length encoding. Imagine you could just throw all
> your data into one table, run OPTIMIZE TABLE, and you'd be done. With
> SQL being all about tables, I'm surprised this idea (or something even
> better) hasn't been implemented already.

There was a recent brief thread here on storing timeseries data, where the use of clustered indices for static tables was suggested. This might also be useful in your situation...

Cheers,
Brent Wood
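A sketch of that suggestion against the hypothetical observations table from earlier in the thread (index name is illustrative; this is the pre-8.3 CLUSTER syntax):

    -- Order the table on disk by the most redundant columns; for static
    -- data this only needs doing once, after loading.
    CREATE INDEX obs_year_station_idx ON observations (year, stationID);
    CLUSTER obs_year_station_idx ON observations;
    ANALYZE observations;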