Re: [GENERAL] WHERE IN (subselect) versus WHERE IN (1,2,3,)
Thanks for looking into it, Tom. We're using 9.0.4, so that might indeed be the problem. What additional data (if any) would you like to see? If you want to look into it further, I can give you the schema, though I hesitate to spam the whole list. I could also mock up some tables, see what's the smallest data set that shows the problem, and send you those in a dump. The fact that the behavior changes so radically when the limit on the joined table goes from 199 to 200 rows does make me suspect something's not behaving the way it should. On Tue, Mar 20, 2012 at 4:27 AM, Tom Lane t...@sss.pgh.pa.us wrote: I wrote: You've still got a nasty join-size estimation error: - Nested Loop (cost=6.18..1939.43 rows=411736 width=8) (actual time=0.203..3.487 rows=35 loops=1) It's not apparent why that's so far off ... What PG version is this, anyway? It strikes me that this estimation error might have something to do with the eqjoinsel bugs that we repaired in 9.0.5. I'm not having any luck reproducing such a bogus estimate with current code, either, though that may just mean you've omitted some critical info about how the tables are set up. regards, tom lane -- Kevin M. Goess Software Engineer Berkeley Electronic Press kgo...@bepress.com 510-665-1200 x179 www.bepress.com bepress: sustainable scholarly publishing
Re: [GENERAL] How to convert integer to string in functions
Hi, 2011/8/12 David Johnston pol...@yahoo.com: In my table, some of the columns are in text datatype. Some data will come down from the UI layer as integers. I want to convert that to string/text before saving it into the table. Please help me on this. SQL Standard: CAST( value AS text ) [or varchar] PostgreSQL short-hand: value::text In both formats replace value with whatever you want to convert. When writing a parameterized query (using ?) you can write ?::text ( or Cast(? AS type) ) to explicitly cast the unknown parameter. The text in the above can be any type name. David J. You often find this advice to do a cast. But this only works if the input is already a clean string of digit characters! Anything other than that will raise an error: postgres=# SELECT '10m'::int; After trying hard to cope with any possible input string I found this: postgres=# SELECT to_number('0'||mytextcolumn, '999.000')::int FROM mytable; You can try this here: show all peaks of Switzerland which are higher than 4000 meters above sea level. SELECT ST_AsText(way) AS geom, name||','||ele AS label FROM osm_point WHERE "natural" = 'peak' AND to_number('0'||ele, '999.000')::int >= 4000 Any better solutions are welcome. Yours, Stefan -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to convert integer to string in functions
On Tue, Mar 20, 2012 at 7:14 PM, Stefan Keller sfkel...@gmail.com wrote: But this only works if the input is already a clean string of digit characters! Anything other than that will raise an error: postgres=# SELECT '10m'::int; After trying hard to cope with any possible input string I found this: postgres=# SELECT to_number('0'||mytextcolumn, '999.000')::int FROM mytable; I came across the same issue, specifically wanting semantics like C's atoi function. Some discussion on this list turned up a few options. Beginning of thread, including one possibility: http://osdir.com/ml/postgresql-pgsql-general/2012-03/msg00090.html Another well-researched option, with slightly different semantics: http://osdir.com/ml/postgresql-pgsql-general/2012-03/msg00094.html Hope that helps! ChrisA -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to convert integer to string in functions
2012/3/20 Chris Angelico ros...@gmail.com: On Tue, Mar 20, 2012 at 7:14 PM, Stefan Keller sfkel...@gmail.com wrote: But this only works if the input is already a clean string of digit characters! Anything other than that will raise an error: postgres=# SELECT '10m'::int; After trying hard to cope with any possible input string I found this: postgres=# SELECT to_number('0'||mytextcolumn, '999.000')::int FROM mytable; I came across the same issue, specifically wanting semantics like C's atoi function. Some discussion on this list turned up a few options. Beginning of thread, including one possibility: http://osdir.com/ml/postgresql-pgsql-general/2012-03/msg00090.html Another well-researched option, with slightly different semantics: http://osdir.com/ml/postgresql-pgsql-general/2012-03/msg00094.html Hope that helps! ChrisA Referring to your last hint, this was Tom's answer: Can you use to_number() here? It sounds like something along the lines of cast(to_number('0' || field::varchar, '99999999999999999999.') as int) might give the behaviour you're after, and a quick test seems to indicate that it's about 4x faster than the original function: I'm actually flattered that I came across almost the same solution as Tom with my proposal: to_number('0'||mytextcolumn, '999.000')::int ... Hopefully there's a cleaner way of writing that without a long list of 9s in the format string, and if the field is nullable I'd guess you probably need a coalesce(..., 0) around that as well. Would be glad to find any cleaner way, but that's the silver bullet until then :-) -Stefan -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
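Putting the pieces of this thread together, a minimal sketch of the combined expression (the long run of 9s and the coalesce() are exactly what the quoted answer describes; mytextcolumn and mytable are the example names used above):

    SELECT coalesce(to_number('0' || mytextcolumn, '99999999999999999999.')::int, 0)
    FROM mytable;

If mytextcolumn is NULL, '0' || NULL yields NULL, so the coalesce() supplies the 0.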
Re: [GENERAL] current thinking on Amazon EC2?
On Mar 19, 2012, at 10:59 AM, Welty, Richard wrote: i just finished this thread from May of last year, and am wondering if this still represents consensus thinking about postgresql deployments in the EC2 cloud: http://postgresql.1045698.n5.nabble.com/amazon-ec2-td4368036.html Yes, I believe that still sums up the situation pretty well. In the past when forced onto EC2 we have had good success using a combination of Raid 1/0'ed ephemeral storage and WAL shipping to S3 (https://github.com/heroku/WAL-E). You have to design around the potential for the ephemeral disks to go away, but you get much more rational performance compared to EBS and also storage space isn't charged. S -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Conditionnal validation for transaction
Florent THOMAS wrote: 1 - Is there a way to have conditions for committing transactions, like in Oracle: http://www.scribd.com/doc/42831667/47/Validation-conditionnelle-de-transaction-62 PostgreSQL follows the SQL standard, which does not allow anything like that. Later versions do allow anonymous blocks, also known as DO statements, which let you execute some code to make decisions like that. So the Oracle example becomes very similar code in PostgreSQL, except that you can't issue ROLLBACK and COMMIT. Thanks. Could you clarify the sentence below: But then you don't need to because you can do a conditional error or drop through to a commit. How do you do that? I don't know what exactly Simon meant here, but I'd do it like that in PostgreSQL (example from your link): CREATE TABLE transtest(x smallint); INSERT INTO transtest VALUES (1), (2); CREATE FUNCTION dec_trans() RETURNS void LANGUAGE plpgsql AS $$DECLARE minx transtest.x%TYPE; BEGIN UPDATE transtest SET x=x-1; SELECT min(x) INTO minx FROM transtest; IF minx < 0 THEN RAISE EXCEPTION 'bad decrement'; END IF; END$$; SELECT * FROM transtest; x --- 1 2 (2 rows) DO LANGUAGE plpgsql $$BEGIN PERFORM dec_trans(); EXCEPTION WHEN OTHERS THEN NULL; END$$; SELECT * FROM transtest; x --- 0 1 (2 rows) DO LANGUAGE plpgsql $$BEGIN PERFORM dec_trans(); EXCEPTION WHEN OTHERS THEN NULL; END$$; SELECT * FROM transtest; x --- 0 1 (2 rows) Yours, Laurenz Albe -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] WHERE IN (subselect) versus WHERE IN (1,2,3,)
Kevin Goess kgo...@bepress.com writes: Thanks for looking into it, Tom. We're using 9.0.4, so that might indeed be the problem. What additional data (if any) would you like to see? Well, the first thing to do is update to 9.0.latest and see if the plan changes. There are plenty of good reasons to do that besides this issue; see the release notes. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] PASSWORD vs. md5('somepass')
Hi, I'm looking for a way to reproduce the encrypted string stored as a password by means other than using the CREATE ROLE command. When using CREATE ROLE ... PASSWORD 'somepass', the resulting string for rolpassword in pg_authid always starts with md5, suggesting it creates some md5 string. So I thought to use SELECT md5('somepass') to get the same. But the two strings differ. Is there a function that does that outside the create role context? Thanks Alex -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] PASSWORD vs. md5('somepass')
On Tue, Mar 20, 2012 at 8:28 AM, Alexander Reichstadt l...@mac.com wrote: Hi, I'm looking for a way to reproduce the encrypted string stored as a password by means other than using the CREATE ROLE command. When using CREATE ROLE ... PASSWORD 'somepass', the resulting string for rolpassword in pg_authid always starts with md5, suggesting it creates some md5 string. So I thought to use SELECT md5('somepass') to get the same. But the two strings differ. Is there a function that does that outside the create role context? See pg_authid's explanation of the rolpassword column: http://www.postgresql.org/docs/9.1/static/catalog-pg-authid.html which you can reproduce via: SELECT 'md5' || MD5(role_password_here || role_name_here); Josh -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
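As a worked example (the role name and password here are hypothetical), the stored value can be checked from psql:

    -- hash the password concatenated with the role name, per the pg_authid docs
    SELECT 'md5' || md5('somepass' || 'alex');
    -- compare against what the server stored:
    SELECT rolpassword FROM pg_authid WHERE rolname = 'alex';

Both queries should return the same md5-prefixed string.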
[GENERAL] Initialize the User-defined Aggregate in ECPG
Hi all, I'm using embedded SQL in C to create a user-defined aggregate (UDA). The command is: CREATE AGGREGATE aggname( sfunc=kmeans, stype=double precision[], finalfunc=kmeansfinal, INITCOND='{1,2,3}'); Since I need to do the aggregation in multiple rounds, I need to initialize the INITCOND using the result of the last execution (aggregation). Can anyone tell me how I can initialize the INITCOND using host variables instead of hard-coding it? The reason I need to do this is that I am implementing the K-Means clustering algorithm, which is a recursive algorithm. My current idea is to get the result from one round of aggregation, DROP the AGGREGATE, and CREATE a new AGGREGATE using the result of the last round. Jay
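Since DDL such as CREATE AGGREGATE cannot reference host variables directly, one possible sketch (untested; the buffer sizes and the initcond variable are assumptions) is to build the statement as a string and run it with ECPG's EXECUTE IMMEDIATE:

    #include <stdio.h>

    EXEC SQL BEGIN DECLARE SECTION;
    char stmt[512];
    char initcond[128];    /* filled from the previous round's result */
    EXEC SQL END DECLARE SECTION;

    /* after DROP AGGREGATE of the previous round, recreate it with the new state */
    snprintf(stmt, sizeof(stmt),
             "CREATE AGGREGATE aggname(sfunc=kmeans, stype=double precision[], "
             "finalfunc=kmeansfinal, INITCOND='%s')", initcond);
    EXEC SQL EXECUTE IMMEDIATE :stmt;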
[GENERAL] Is it even possible?
I have now tried at least 7 different install methods to get pg up and running on Lion. I fear that my system is now thoroughly inoculated and will never be able to run postgres/postgis. I started with the pg mac installer / stack builder. That worked to get pg installed, but I could not get postgis installed. I've now tried two different sets of instructions using MacPorts, two using homebrew, and two using some other macport-homebrew-like method. NONE of them worked for me. I can see postgres running from the last install: sameloyiv 39844 0.0 0.1 2455512 6496 ?? S 9:02AM 0:00.12 /usr/local/bin/postgres -D /usr/local/var/postgres -r /usr/local/var/postgres/server.log -c unix_socket_directory=/var/pgsql_socket -c unix_socket_group=_postgres -c unix_socket_permissions=0770 root 104 0.0 0.0 2467372 1140 ?? Ss 4:46PM 0:00.21 /opt/local/bin/daemondo --label=postgresql91-server --start-cmd /opt/local/etc/LaunchDaemons/org.macports.postgresql91-server/postgresql91-server.wrapper start ; --stop-cmd /opt/local/etc/LaunchDaemons/org.macports.postgresql91-server/postgresql91-server.wrapper stop ; --restart-cmd /opt/local/etc/LaunchDaemons/org.macports.postgresql91-server/postgresql91-server.wrapper restart ; --pid=none sameloyiv 40075 0.0 0.0 2434892 548 s000 S+ 9:17AM 0:00.00 grep post sameloyiv 39849 0.0 0.0 2441352 384 ?? Ss 9:02AM 0:00.02 postgres: stats collector process sameloyiv 39848 0.0 0.0 2455644 1564 ?? Ss 9:02AM 0:00.02 postgres: autovacuum launcher process sameloyiv 39847 0.0 0.0 2455512 512 ?? Ss 9:02AM 0:00.09 postgres: wal writer process sameloyiv 39846 0.0 0.0 2455512 604 ?? Ss 9:02AM 0:00.12 postgres: writer process But I continue to see this when using psql: psql: could not connect to server: No such file or directory Is the server running locally and accepting connections on Unix domain socket /tmp/.s.PGSQL.5432? Is there anyone who has ever successfully gotten postgres/postGIS running on Mac Lion? Really? How? Is there a way to purge my system of all of the corrupted/bad installs and start over? How? How do I get pgadmin if I use homebrew/macports etc.? Would love to believe this is possible, as I cannot stand having to stop whatever I'm doing every 3-4 days to repair Windows. I hate Windows. I loathe Windows, and would love to see the U.S. economy rebound from all of the productivity gained by everyone universally abandoning Windows… -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Is it even possible?
On 3/20/2012 9:22 AM, Sam Loy wrote: I have now tried at least 7 different install methods to get pg up and running on Lion. I fear that my system is now thoroughly inoculated and will never be able to run postgres/postgis. I started with the pg mac installer / stack builder. That worked to get pg installed, but I could not get postgis installed. I've now tried two different sets of instructions using MacPorts, two using homebrew, and two using some other macport-homebrew-like method. NONE of them worked for me. I can see postgres running from the last install: sameloyiv 39844 0.0 0.1 2455512 6496 ?? S 9:02AM 0:00.12 /usr/local/bin/postgres -D /usr/local/var/postgres -r /usr/local/var/postgres/server.log -c unix_socket_directory=/var/pgsql_socket -c unix_socket_group=_postgres -c unix_socket_permissions=0770 root 104 0.0 0.0 2467372 1140 ?? Ss 4:46PM 0:00.21 /opt/local/bin/daemondo --label=postgresql91-server --start-cmd /opt/local/etc/LaunchDaemons/org.macports.postgresql91-server/postgresql91-server.wrapper start ; --stop-cmd /opt/local/etc/LaunchDaemons/org.macports.postgresql91-server/postgresql91-server.wrapper stop ; --restart-cmd /opt/local/etc/LaunchDaemons/org.macports.postgresql91-server/postgresql91-server.wrapper restart ; --pid=none sameloyiv 40075 0.0 0.0 2434892 548 s000 S+ 9:17AM 0:00.00 grep post sameloyiv 39849 0.0 0.0 2441352 384 ?? Ss 9:02AM 0:00.02 postgres: stats collector process sameloyiv 39848 0.0 0.0 2455644 1564 ?? Ss 9:02AM 0:00.02 postgres: autovacuum launcher process sameloyiv 39847 0.0 0.0 2455512 512 ?? Ss 9:02AM 0:00.09 postgres: wal writer process sameloyiv 39846 0.0 0.0 2455512 604 ?? Ss 9:02AM 0:00.12 postgres: writer process But I continue to see this when using psql: psql: could not connect to server: No such file or directory Is the server running locally and accepting connections on Unix domain socket /tmp/.s.PGSQL.5432? Is there anyone who has ever successfully gotten postgres/postGIS running on Mac Lion? Really? How? Is there a way to purge my system of all of the corrupted/bad installs and start over? How? How do I get pgadmin if I use homebrew/macports etc.? Would love to believe this is possible, as I cannot stand having to stop whatever I'm doing every 3-4 days to repair Windows. I hate Windows. I loathe Windows, and would love to see the U.S. economy rebound from all of the productivity gained by everyone universally abandoning Windows… I'm not a mac user, but I'll try. The server is using a unix socket: -c unix_socket_directory=/var/pgsql_socket The client (psql) is looking at: connections on Unix domain socket /tmp/.s.PGSQL.5432? You can try using tcp/ip instead by passing -h localhost: psql -U postgres -h localhost someDbName -Andy -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Is it even possible?
Sam, I started with the pg mac installer / stack builder. That worked to get pg installed, but could not get postgis installed. I haven't installed PostGIS, but I have no problems running the database in Lion using the EnterpriseDB (EDB) installer as provided. Is there a way to purge my system of all of the corrupted/bad installs and start over? How? I can't speak for Homebrew, but running the included uninstaller safely removes the EDB pieces. I hate Windows. I loathe Windows, and would love to see the U.S. economy rebound from all of the productivity gained by everyone universally abandoning Windows… You are not alone -- and I thought I felt strongly about it. :) -- Gary Chambers -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [BUGS] [GENERAL] Altering a table with a rowtype column
On Wed, Mar 7, 2012 at 3:49 PM, Merlin Moncure mmonc...@gmail.com wrote: On Wed, Mar 7, 2012 at 2:31 PM, Tom Lane t...@sss.pgh.pa.us wrote: Merlin Moncure mmonc...@gmail.com writes: On Wed, Mar 7, 2012 at 11:45 AM, Mike Blackwell mike.blackw...@rrd.com wrote: alter table a add column even_more_stuff boolean not null default false; aha! that's not what you posted last time. you appended 'not null default false'; which inexplicably breaks the ALTER. try this: ALTER TABLE a ADD COLUMN even_more_stuff text not null; ALTER TABLE a ALTER even_more_stuff set default false; ALTER TABLE a DROP COLUMN even_more_stuff; ALTER TABLE a ADD COLUMN even_more_stuff boolean not null default false; (this really looks like a bug in postgres, cc-ing to bugs) It is not a bug. The ALTER ADD ... DEFAULT ... form implies rewriting every existing tuple of the rowtype to insert a non-null value in the added column, and we don't have support for doing that to rowtype columns, only to the target table and descendants. I'm not buying that..it implies no such thing. In particular, for table-as-rowtype columns, there's no way that I can see to have default values be generated. So why does it follow that the dependent table has to be rewritten? Column constraints are not enforced on the rowtype, so it follows that default shouldn't be either considering there's no way to get the default to fire. Composite type (or table based composite) defaults are applied to the composite as a whole, not to specific fields. I think Tom's correct about what the right behavior would be if composite types supported defaults, but they don't, never have, and maybe never will. I had a previous argument about this with Tom, and lost, though I am not sure that anyone other than Tom thinks that the current behavior is for the best. But see commits a06e41deebdf74b8b5109329dc75b2e9d9057962 and a40b1e0bf32b1da46c1baa9bc7da87f207cd37d8. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Is it even possible?
On Tue, 2012-03-20 at 09:22 -0500, Sam Loy wrote: I have now tried at least 7 different install methods to get pg up and running on Lion. I fear that my system is now thoroughly inoculated and will never be able to run postgres/postgis. I started with the pg mac installer / stack builder. That worked to get pg installed, but I could not get postgis installed. I've now tried two different sets of instructions using MacPorts, two using homebrew, and two using some other macport-homebrew-like method. NONE of them worked for me. I can see postgres running from the last install: sameloyiv 39844 0.0 0.1 2455512 6496 ?? S 9:02AM 0:00.12 /usr/local/bin/postgres -D /usr/local/var/postgres -r /usr/local/var/postgres/server.log -c unix_socket_directory=/var/pgsql_socket -c unix_socket_group=_postgres -c unix_socket_permissions=0770 root 104 0.0 0.0 2467372 1140 ?? Ss 4:46PM 0:00.21 /opt/local/bin/daemondo --label=postgresql91-server --start-cmd /opt/local/etc/LaunchDaemons/org.macports.postgresql91-server/postgresql91-server.wrapper start ; --stop-cmd /opt/local/etc/LaunchDaemons/org.macports.postgresql91-server/postgresql91-server.wrapper stop ; --restart-cmd /opt/local/etc/LaunchDaemons/org.macports.postgresql91-server/postgresql91-server.wrapper restart ; --pid=none sameloyiv 40075 0.0 0.0 2434892 548 s000 S+ 9:17AM 0:00.00 grep post sameloyiv 39849 0.0 0.0 2441352 384 ?? Ss 9:02AM 0:00.02 postgres: stats collector process sameloyiv 39848 0.0 0.0 2455644 1564 ?? Ss 9:02AM 0:00.02 postgres: autovacuum launcher process sameloyiv 39847 0.0 0.0 2455512 512 ?? Ss 9:02AM 0:00.09 postgres: wal writer process sameloyiv 39846 0.0 0.0 2455512 604 ?? Ss 9:02AM 0:00.12 postgres: writer process But I continue to see this when using psql: psql: could not connect to server: No such file or directory Is the server running locally and accepting connections on Unix domain socket /tmp/.s.PGSQL.5432? Is there anyone who has ever successfully gotten postgres/postGIS running on Mac Lion? Really? How? Is there a way to purge my system of all of the corrupted/bad installs and start over? How? How do I get pgadmin if I use homebrew/macports etc.? Would love to believe this is possible, as I cannot stand having to stop whatever I'm doing every 3-4 days to repair Windows. I hate Windows. I loathe Windows, and would love to see the U.S. economy rebound from all of the productivity gained by everyone universally abandoning Windows… unix_socket_directory=/var/pgsql_socket vs. /tmp/.s.PGSQL.5432: you need to tell psql where the unix socket is located, or you need to configure tcp to be available and use the tcp port to connect -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
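Concretely, given the unix_socket_directory shown in the ps output, something like the following should connect (the user and database names are guesses):

    psql -h /var/pgsql_socket -U sameloyiv postgres
    # or, if the server has TCP enabled:
    psql -h localhost -U sameloyiv postgres

When -h is given a path rather than a host name, psql treats it as the socket directory.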
Re: [GENERAL] pg_upgrade + streaming replication ?
On Mon, Mar 19, 2012 at 03:07:02PM -0700, Jeff Davis wrote: On Mon, 2012-03-19 at 15:30 -0400, Bruce Momjian wrote: On Thu, Mar 01, 2012 at 02:01:31PM -0800, Lonni J Friedman wrote: I've got a 3 node cluster (1 master/2 slaves) running 9.0.x with streaming replication. I'm in the planning stages of upgrading to 9.1.x, and am looking into the most efficient way to do the upgrade with the goal of minimizing downtime risk. After googling, the only discussion that I've found of using pg_upgrade with a streaming replication setup seems to be this (nearly) year old thread: http://web.archiveorange.com/archive/v/9FNVlDWGQtpyWVL54jlK In summary, there is no way to use both pg_upgrade and streaming replication simultaneously. I'd have to either use pg_upgrade and then effectively rebuild/redeploy the slaves, or not use pg_upgrade and reimport all of the data. Is that still the latest status, or are there other options? You can shut down all three servers, run pg_upgrade on all of them, then restart them as 9.1 servers. After running pg_upgrade on each server individually, they will have different system IDs, and potentially different on-disk representation of the catalogs, right? So how can you resume streaming without rebuilding the slaves? Oh, wow, I never thought of the fact that the system tables will be different. I guess you could assume the pg_dump restore is going to create things exactly the same on all the systems, but I never tested that. Do the system IDs have to match? That would be a problem, because you are initdb'ing on each server. OK, crazy idea, but I wonder if you could initdb on the master, then copy that to the slaves, then run pg_upgrade on each of them. Obviously this needs some testing. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [BUGS] [GENERAL] Altering a table with a rowtype column
Robert Haas robertmh...@gmail.com writes: On Wed, Mar 7, 2012 at 3:49 PM, Merlin Moncure mmonc...@gmail.com wrote: On Wed, Mar 7, 2012 at 2:31 PM, Tom Lane t...@sss.pgh.pa.us wrote: It is not a bug. The ALTER ADD ... DEFAULT ... form implies rewriting every existing tuple of the rowtype to insert a non-null value in the added column, and we don't have support for doing that to rowtype columns, only to the target table and descendants. I'm not buying that..it implies no such thing. In particular, for table-as-rowtype columns, there's no way that I can see to have default values be generated. So why does it follow that the dependent table has to be rewritten? Column constraints are not enforced on the rowtype, so it follows that default shouldn't be either considering there's no way to get the default to fire. Composite type (or table based composite) defaults are applied to the composite as a whole, not to specific fields. I think Tom's correct about what the right behavior would be if composite types supported defaults, but they don't, never have, and maybe never will. I had a previous argument about this with Tom, and lost, though I am not sure that anyone other than Tom thinks that the current behavior is for the best. Um, did I say I thought it was for the best? I thought I said we don't have support for doing better. If we are willing to legislate that column defaults are not and never will be applied to composite types, then I think Merlin might be right that we could just let an ALTER ADD with DEFAULT ignore the existence of composite columns. I'd always figured that we'd want to try to fix that omission eventually, though. But see commits a06e41deebdf74b8b5109329dc75b2e9d9057962 and a40b1e0bf32b1da46c1baa9bc7da87f207cd37d8. Note that the actual problem with the original commit was that it depended on a misreading of the SQL standard. Per spec, ALTER ADD with DEFAULT is *not* the same thing as ALTER ADD followed by ALTER SET DEFAULT; the contents of the table end up different. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] pg_upgrade + streaming replication ?
On Tue, Mar 20, 2012 at 11:46 AM, Bruce Momjian br...@momjian.us wrote: On Mon, Mar 19, 2012 at 03:07:02PM -0700, Jeff Davis wrote: On Mon, 2012-03-19 at 15:30 -0400, Bruce Momjian wrote: On Thu, Mar 01, 2012 at 02:01:31PM -0800, Lonni J Friedman wrote: I've got a 3 node cluster (1 master/2 slaves) running 9.0.x with streaming replication. I'm in the planning stages of upgrading to 9.1.x, and am looking into the most efficient way to do the upgrade with the goal of minimizing downtime risk. After googling, the only discussion that I've found of using pg_upgrade with a streaming replication setup seems to be this (nearly) year old thread: http://web.archiveorange.com/archive/v/9FNVlDWGQtpyWVL54jlK In summary, there is no way to use both pg_upgrade and streaming replication simultaneously. I'd have to either use pg_upgrade and then effectively rebuild/redeploy the slaves, or not use pg_upgrade and reimport all of the data. Is that still the latest status, or are there other options? You can shut down all three servers, run pg_upgrade on all of them, then restart them as 9.1 servers. After running pg_upgrade on each server individually, they will have different system IDs, and potentially different on-disk representation of the catalogs, right? So how can you resume streaming without rebuilding the slaves? Oh, wow, I never thought of the fact that the system tables will be different. I guess you could assume the pg_dump restore is going to create things exactly the same on all the systems, but I never tested that. Do the system IDs have to match? That would be a problem, because you are initdb'ing on each server. OK, crazy idea, but I wonder if you could initdb on the master, then copy that to the slaves, then run pg_upgrade on each of them. Obviously this needs some testing. Wouldn't it be easier to just pg_upgrade the master, then set up the slaves from scratch (with rsync, etc.)? It certainly wouldn't be any more work to do it that way (although still a lot more work than simply running pg_upgrade on all servers). -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] pg_upgrade + streaming replication ?
On Tue, Mar 20, 2012 at 11:56:29AM -0700, Lonni J Friedman wrote: So how can you resume streaming without rebuilding the slaves? Oh, wow, I never thought of the fact that the system tables will be different. I guess you could assume the pg_dump restore is going to create things exactly the same on all the systems, but I never tested that. Do the system IDs have to match? That would be a problem, because you are initdb'ing on each server. OK, crazy idea, but I wonder if you could initdb on the master, then copy that to the slaves, then run pg_upgrade on each of them. Obviously this needs some testing. Wouldn't it be easier to just pg_upgrade the master, then set up the slaves from scratch (with rsync, etc.)? It certainly wouldn't be any more work to do it that way (although still a lot more work than simply running pg_upgrade on all servers). Hey, wow, that is an excellent idea, because rsync is going to realize that all the user-data files are exactly the same and skip them --- that is the winner solution. I should probably add this to the pg_upgrade documentation. Thanks. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [BUGS] [GENERAL] Altering a table with a rowtype column
On Tue, Mar 20, 2012 at 12:16 PM, Robert Haas robertmh...@gmail.com wrote: I think Tom's correct about what the right behavior would be if composite types supported defaults, but they don't, never have, and maybe never will. I had a previous argument about this with Tom, and lost, though I am not sure that anyone other than Tom thinks that the current behavior is for the best. But see commits a06e41deebdf74b8b5109329dc75b2e9d9057962 and a40b1e0bf32b1da46c1baa9bc7da87f207cd37d8. I'll go further than that -- given the current infrastructure I'd say that composite type defaults are not very well defined or useful besides not being implemented. The way things work now: create type foo as(a int, b int); create table bar(f foo default row(1,2)); works perfectly ok. how would you proxy the default from one of those two columns? does it make sense to do so? defaults are applied to table columns, not to types (you could argue that domains violate that rule but IMO it's not the same thing). type constraints are another matter. this would be useful and valuable but may end up being impossible to add for a lot of reasons such as backwards compatibility and dealing with the standard's lack (implemented nowhere in postgres except for the very special case of IS NULL) of distinguishing between the type itself being null and its fields being null (making type constraints smack into plpgsql variable declarations). merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Authenticating from a web service call
Interesting idea. However, I think this is ssl between the client and the database. Given that the client would be the server hosting the web service, I don't think this would work for the web service's clients. On Fri, Mar 16, 2012 at 2:54 PM, Raymond O'Donnell r...@iol.ie wrote: On 16/03/2012 18:39, Bryan Montgomery wrote: Hello, We are looking at implementing a web service that basically makes calls to the database. I have been thinking about ways to secure the web service based on the database. I initially thought about just connecting to the database as the user with parameters passed through the web service - however I don't know how to do that other than with clear text passwords. Postgres supports connections over SSL - will this do the job? http://www.postgresql.org/docs/9.1/static/ssl-tcp.html Ray. -- Raymond O'Donnell :: Galway :: Ireland r...@iol.ie
Re: [GENERAL] Is it even possible?
On 20/03/12 15:22, Sam Loy wrote: I have now tried at least 7 different install methods to get pg up and running on Lion. I fear that my system is now thoroughly inoculated and will never be able to run postgres/postgis. I started with the pg mac installer / stack builder. That worked to get pg installed, but I could not get postgis installed. I've now tried two different sets of instructions using MacPorts, two using homebrew, and two using some other macport-homebrew-like method. NONE of them worked for me. I can see postgres running from the last install: sameloyiv 39844 0.0 0.1 2455512 6496 ?? S 9:02AM 0:00.12 /usr/local/bin/postgres -D /usr/local/var/postgres -r /usr/local/var/postgres/server.log -c unix_socket_directory=/var/pgsql_socket -c unix_socket_group=_postgres -c unix_socket_permissions=0770 root 104 0.0 0.0 2467372 1140 ?? Ss 4:46PM 0:00.21 /opt/local/bin/daemondo --label=postgresql91-server --start-cmd /opt/local/etc/LaunchDaemons/org.macports.postgresql91-server/postgresql91-server.wrapper start ; --stop-cmd /opt/local/etc/LaunchDaemons/org.macports.postgresql91-server/postgresql91-server.wrapper stop ; --restart-cmd /opt/local/etc/LaunchDaemons/org.macports.postgresql91-server/postgresql91-server.wrapper restart ; --pid=none sameloyiv 40075 0.0 0.0 2434892 548 s000 S+ 9:17AM 0:00.00 grep post sameloyiv 39849 0.0 0.0 2441352 384 ?? Ss 9:02AM 0:00.02 postgres: stats collector process sameloyiv 39848 0.0 0.0 2455644 1564 ?? Ss 9:02AM 0:00.02 postgres: autovacuum launcher process sameloyiv 39847 0.0 0.0 2455512 512 ?? Ss 9:02AM 0:00.09 postgres: wal writer process sameloyiv 39846 0.0 0.0 2455512 604 ?? Ss 9:02AM 0:00.12 postgres: writer process But I continue to see this when using psql: psql: could not connect to server: No such file or directory Is the server running locally and accepting connections on Unix domain socket /tmp/.s.PGSQL.5432? Huh... are you using the Apple psql? Lion comes with its own copy of psql. I have had many problems (access rights) with it. Try using the psql that comes with your postgres installation. Is there anyone who has ever successfully gotten postgres/postGIS running on Mac Lion? Really? How? I am using (compiling from source) Postgres 8.4 and postgis 1.5 (latest versions): 8.4 because I have 8.4 on my server. Compiling postgres is a no-problem operation. Compiling postgis is not so easy because it needs some libraries. Is there a way to purge my system of all of the corrupted/bad installs and start over? How? I think that depends on your installation; I have all data in /var/database, and by simply removing that I have a clean system. How do I get pgadmin if I use homebrew/macports etc.? Maybe I didn't understand; simply download it from http://www.pgadmin.org/download/macosx.php Edoardo -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
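For reference, the source build Edoardo describes is the standard dance (the prefix is illustrative; run initdb afterwards to create a cluster):

    ./configure --prefix=/usr/local/pgsql
    make
    sudo make install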
[GENERAL] pg-admin development snapshots
Hi, The link[1] for the development snapshots of pg-admin as advertised here [2] seems to be broken. Are these snapshots hosted somewhere else these days, or are they no longer produced? I have a colleague who's bravely switching from SQL Server to Postgresql and who'd really like to use the new scripting feature. [1]: http://www.pgadmin.org/snapshots [2]: http://www.pgadmin.org/download/snapshots.php -- Andy Chambers Software Engineer (e) achamb...@mcna.net (t) 954-682-0573 CONFIDENTIALITY NOTICE: This electronic mail may contain information that is privileged, confidential, and/or otherwise protected from disclosure to anyone other than its intended recipient(s). Any dissemination or use of this electronic mail or its contents by persons other than the intended recipient(s) is strictly prohibited. If you have received this communication in error, please notify the sender immediately by reply e-mail so that we may correct our internal records. Please then delete the original message. Thank you. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Is it even possible?
Is there anyone who has ever successfully gotten postgres/postGIS running on Mac Lion? Really? How? Hello Sam, I'm running Lion, and had the same trouble using the Enterprise Stack Builder to install PostGIS. I finally got it working by using Kyng Chaos' installers for both PostgreSQL and PostGIS: http://www.kyngchaos.com/software/postgres Bryan -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] pg_upgrade + streaming replication ?
On Tue, Mar 20, 2012 at 02:58:20PM -0400, Bruce Momjian wrote: On Tue, Mar 20, 2012 at 11:56:29AM -0700, Lonni J Friedman wrote: So how can you resume streaming without rebuilding the slaves? Oh, wow, I never thought of the fact that the system tables will be different. I guess you could assume the pg_dump restore is going to create things exactly the same on all the systems, but I never tested that. Do the system IDs have to match? That would be a problem, because you are initdb'ing on each server. OK, crazy idea, but I wonder if you could initdb on the master, then copy that to the slaves, then run pg_upgrade on each of them. Obviously this needs some testing. Wouldn't it be easier to just pg_upgrade the master, then set up the slaves from scratch (with rsync, etc.)? It certainly wouldn't be any more work to do it that way (although still a lot more work than simply running pg_upgrade on all servers). Hey, wow, that is an excellent idea, because rsync is going to realize that all the user-data files are exactly the same and skip them --- that is the winner solution. I should probably add this to the pg_upgrade documentation. Thanks. Actually, I am not sure how well rsync will work, because by default it only skips files with matching file timestamp and size, and I bet many of the files will have different times because of streaming replication lag and server time lag. I think we need this rsync option: -c, --checksum This changes the way rsync checks if the files have been changed and are in need of a transfer. Without this option, rsync uses a quick check that (by default) checks if each file's size and time of last modification match between the sender and receiver. This option changes this to compare a 128-bit checksum for each file that has a matching size. Generating the checksums means that both sides will expend a lot of disk I/O reading all the data in the files in the transfer (and this is prior to any reading that will be done to transfer changed files), so this can slow things down significantly. The sending side generates its checksums while it is doing the file-system scan that builds the list of the available files. The receiver generates its checksums when it is scanning for changed files, and will checksum any file that has the same size as the corresponding sender's file: files with either a changed size or a changed checksum are selected for transfer. and I suspect that will be slow. Probably better than nothing, but not super-fast either. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
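The invocation would look something like this (host and data-directory paths are illustrative; both clusters should be shut down while copying):

    rsync -a --checksum --delete master:/usr/local/pgsql/data/ /usr/local/pgsql/data/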
Re: [GENERAL] pg-admin development snapshots
Hi, On Tue, 2012-03-20 at 16:01 -0400, Andy Chambers wrote: [...] The link[1] for the development snapshots of pg-admin as advertised here [2] seems to be broken. Are these snapshots hosted somewhere else these days or are they no longer produced. They are no longer produced. I'll fix the website. Thanks for noticing. I have a colleague who's bravely switching from SQL Server to Postgresql who'd really like to use the new scripting feature. Which scripting feature are you talking about? -- Guillaume http://blog.guillaume.lelarge.info http://www.dalibo.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] pg_upgrade + streaming replication ?
On Tue, 2012-03-20 at 16:49 -0400, Bruce Momjian wrote: On Tue, Mar 20, 2012 at 02:58:20PM -0400, Bruce Momjian wrote: On Tue, Mar 20, 2012 at 11:56:29AM -0700, Lonni J Friedman wrote: So how can you resume streaming without rebuilding the slaves? Oh, wow, I never thought of the fact that the system tables will be different? I guess you could assume the pg_dump restore is going to create things exactly the same on all the systems, but I never tested that. Do the system id's have to match? That would be a problem because you are initdb'ing on each server. OK, crazy idea, but I wonder if you could initdb on the master, then copy that to the slaves, then run pg_upgrade on each of them. Obviously this needs some testing. This sounds promising. Fundamentally, the user data files aren't changing, and if you can upgrade the master you can upgrade the slaves. So there is no fundamental problem here, but there will be some careful bookkeeping. I think we need to look at this as a new feature that needs its own testing and documentation. It's important though, because as you point out downthread, rsync doesn't really solve the problem (still takes time proportional to the user data size). Regards, Jeff Davis -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] pg-admin development snapshots
On Tue, Mar 20, 2012 at 4:53 PM, Guillaume Lelarge guilla...@lelarge.info wrote: Hi, On Tue, 2012-03-20 at 16:01 -0400, Andy Chambers wrote: [...] The link[1] for the development snapshots of pg-admin as advertised here [2] seems to be broken. Are these snapshots hosted somewhere else these days or are they no longer produced. They are no longer produced. I'll fix the website. Thanks for noticing. I have a colleague who's bravely switching from SQL Server to Postgresql who'd really like to use the new scripting feature. Which scripting feature are you talking about? http://pgscript.projects.postgresql.org/INDEX.html Am I right in thinking this will be included in the next version of pg-admin? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Index on System Table
I've got a SaaS situation where I'm using 1000+ schemas in a single database (each schema contains the same tables, just different data per tenant). I used schemas so that the shared app servers could share a connection to the single database for all schemas. Things are working fine. However, when using psql, doing \d or trying to use tab complete takes FOREVER, because it's doing a sequence scan against pg_class (which has over a million rows), and relying on pg_table_is_visible to do search_path filtering. I've figured out that if I add nspname = ANY(current_schemas(true)) to the query psql is using, and an index to pg_class on relnamespace, the query optimizer is able to do an index scan, and the queries return in milliseconds instead of minutes. However, I can't actually create an index on pg_class because it is a system table (I was able to test by copying it to a temporary table and adding the index there). My question is if there is a way to create the index on the system table somehow for just my database, and if not how would the developer community react to the suggestion of adding an index to a system table in the default postgres distro. Thanks, Cody Cutrer -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] pg_upgrade + streaming replication ?
Actually rsync works fine at the file level and is good for manual syncing. It really checks the files with the stat command, so a change will trigger the copy. In practice you need to keep an eye on the completeness of the rsync action. Try to use it without compression for large data sets; it saves time, strangely. Met vriendelijke groet, with kind regards Henk Bronk On 20 mrt. 2012, at 21:49, Bruce Momjian br...@momjian.us wrote: On Tue, Mar 20, 2012 at 02:58:20PM -0400, Bruce Momjian wrote: On Tue, Mar 20, 2012 at 11:56:29AM -0700, Lonni J Friedman wrote: So how can you resume streaming without rebuilding the slaves? Oh, wow, I never thought of the fact that the system tables will be different. I guess you could assume the pg_dump restore is going to create things exactly the same on all the systems, but I never tested that. Do the system IDs have to match? That would be a problem, because you are initdb'ing on each server. OK, crazy idea, but I wonder if you could initdb on the master, then copy that to the slaves, then run pg_upgrade on each of them. Obviously this needs some testing. Wouldn't it be easier to just pg_upgrade the master, then set up the slaves from scratch (with rsync, etc.)? It certainly wouldn't be any more work to do it that way (although still a lot more work than simply running pg_upgrade on all servers). Hey, wow, that is an excellent idea, because rsync is going to realize that all the user-data files are exactly the same and skip them --- that is the winner solution. I should probably add this to the pg_upgrade documentation. Thanks. Actually, I am not sure how well rsync will work, because by default it only skips files with matching file timestamp and size, and I bet many of the files will have different times because of streaming replication lag and server time lag. I think we need this rsync option: -c, --checksum This changes the way rsync checks if the files have been changed and are in need of a transfer. Without this option, rsync uses a quick check that (by default) checks if each file's size and time of last modification match between the sender and receiver. This option changes this to compare a 128-bit checksum for each file that has a matching size. Generating the checksums means that both sides will expend a lot of disk I/O reading all the data in the files in the transfer (and this is prior to any reading that will be done to transfer changed files), so this can slow things down significantly. The sending side generates its checksums while it is doing the file-system scan that builds the list of the available files. The receiver generates its checksums when it is scanning for changed files, and will checksum any file that has the same size as the corresponding sender's file: files with either a changed size or a changed checksum are selected for transfer. and I suspect that will be slow. Probably better than nothing, but not super-fast either. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] pg-admin development snapshots
On Tue, 2012-03-20 at 17:17 -0400, Andy Chambers wrote: On Tue, Mar 20, 2012 at 4:53 PM, Guillaume Lelarge guilla...@lelarge.info wrote: Hi, On Tue, 2012-03-20 at 16:01 -0400, Andy Chambers wrote: [...] The link[1] for the development snapshots of pg-admin as advertised here [2] seems to be broken. Are these snapshots hosted somewhere else these days, or are they no longer produced? They are no longer produced. I'll fix the website. Thanks for noticing. I have a colleague who's bravely switching from SQL Server to Postgresql and who'd really like to use the new scripting feature. Which scripting feature are you talking about? http://pgscript.projects.postgresql.org/INDEX.html Am I right in thinking this will be included in the next version of pg-admin? This has been available in pgAdmin for quite some time. You'll have it with pgAdmin 1.14 for sure. -- Guillaume http://blog.guillaume.lelarge.info http://www.dalibo.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Authenticating from a web service call
Actually, through some experimentation, googling, and looking at a postgres book, I found out how to encrypt the password and compare that to pg_shadow. However, during my research I realized the need to double-encrypt, the way postgres clients do. So, another option is to use encryption on the web service xml using public/private keys, or to use ssl to pass the md5 hash of the client's password. The more elegant way seems to be using the encrypted web service, but the more universal method for clients would probably be ssl. On Tue, Mar 20, 2012 at 3:16 PM, Bryan Montgomery mo...@english.net wrote: Interesting idea. However, I think this is ssl between the client and the database. Given that the client would be the server hosting the web service, I don't think this would work for the web service's clients. On Fri, Mar 16, 2012 at 2:54 PM, Raymond O'Donnell r...@iol.ie wrote: On 16/03/2012 18:39, Bryan Montgomery wrote: Hello, We are looking at implementing a web service that basically makes calls to the database. I have been thinking about ways to secure the web service based on the database. I initially thought about just connecting to the database as the user with parameters passed through the web service - however I don't know how to do that other than with clear text passwords. Postgres supports connections over SSL - will this do the job? http://www.postgresql.org/docs/9.1/static/ssl-tcp.html Ray. -- Raymond O'Donnell :: Galway :: Ireland r...@iol.ie
[GENERAL] unaccent install howto ?
New to PostgreSQL, I'd like to install an unaccent.rules dictionary and need a howto. I have to install it on Ubuntu 11.10 and the latest Mac OS X. -- Yvon
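Assuming PostgreSQL 9.1 (which is what Ubuntu 11.10 ships), unaccent is a contrib module, so a plausible starting point is:

    -- on Ubuntu, first: sudo apt-get install postgresql-contrib-9.1
    CREATE EXTENSION unaccent;
    SELECT unaccent('Hôtel');   -- returns 'Hotel'

The extension installs a default unaccent.rules under the server's tsearch_data directory; a custom rules file can be placed there and referenced when creating an unaccent dictionary.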
[GENERAL] huge price database question..
Greetings list! I am pretty new to postgresql, coming from mysql. I did a fairly extensive search of the list and came up with a few good threads, but didn't find the exact situation I have now, so I am venturing to ask here. I have daily one-minute stock price data from 2005 on, each day with columns timestamp, open, high, low, close, volume and a few more. Each day's data is about 1.2 million rows. I want to import all the data to postgresql and analyze it using R with the help of RPostgreSQL. Right now I am having about 7000 tables for individual stocks and I use perl to do inserts; it's very slow. I would like to use copy or another bulk loading tool to load the daily raw gz data, but I need to split the file into per-stock files first before I do bulk loading. I consider this a bit messy. I would seek advice on the following idea: store everything in a big table, partition by month (this gives a reasonable number of partitions) and do bulk loading on the daily file. My queries would consist mostly of selects on a particular symbol on a particular day. Also, in the future I will import daily data to the db every day. My hardware is 16G Ram, 4x5400rpm raid10 with enough space. Thanks! Jim. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] huge price database question..
right now I am having about 7000 tables for individual stocks and I use perl to do inserts; it's very slow. I would like to use copy or another bulk loading tool to load the daily raw gz data, but I need to split the file into per-stock files first before I do bulk loading. I consider this a bit messy. Are you committing each insert separately or doing them in batches using 'begin transaction' and 'commit'? I have a database that I do inserts in from a text file. Doing a commit every 1000 transactions cut the time by over 90%. -- Mike Nolan
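What that batching amounts to in plain SQL, assuming a hypothetical table ticks(symbol, ts, open, high, low, close, volume):

    BEGIN;
    INSERT INTO ticks VALUES ('AAPL', '2012-03-02 09:30:00', 545.0, 545.5, 544.8, 545.2, 12000);
    INSERT INTO ticks VALUES ('AAPL', '2012-03-02 09:31:00', 545.2, 545.9, 545.1, 545.7, 9800);
    -- ...roughly a thousand rows per batch, then:
    COMMIT;

In Perl DBI the same effect comes from disabling AutoCommit and calling commit periodically.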
Re: [GENERAL] Index on System Table
Cody Cutrer c...@instructure.com writes: I've got a SaaS situation where I'm using 1000+ schemas in a single database (each schema contains the same tables, just different data per tenant). ... if I add nspname = ANY(current_schemas(true)) to the query psql is using, and an index to pg_class on relnamespace, the query optimizer is able to do an index scan, and the queries return in milliseconds instead of minutes. However, I can't actually create an index on pg_class because it is a system table (I was able to test by copying it to a temporary table and adding the index there). My question is if there is a way to create the index on the system table somehow for just my database, There's not really support for adding indexes to system catalogs on-the-fly. I think it would work (barring concurrency issues) for most catalogs, but pg_class has special limitations due to the relmapping infrastructure. It's not something I'd particularly care to try on a production database. and if not how would the developer community react to the suggestion of adding an index to a system table in the default postgres distro. In many (probably most) databases, an index on pg_class.relnamespace wouldn't be selective enough to justify its update costs. I'd want to see a lot more than one request for this before considering it. If you're correct that the main costs come from the pg_table_is_visible tests, it should be possible to dodge that without an extra index. I'd suggest making a function similar to current_schemas() except it returns an OID array instead of names (this should be cheaper anyway) and just putting the relnamespace = ANY(current_schema_oids()) condition in front of the visibility test. Or maybe you could dispense with the visibility test altogether, depending on your usage patterns. (BTW, I think that \d schemaname.* doesn't involve any visibility tests, in case that helps.) regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
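A minimal sketch of the helper Tom describes (the function name follows his suggestion; the body is one guess at an implementation):

    CREATE FUNCTION current_schema_oids() RETURNS oid[] AS $$
        SELECT array(SELECT oid FROM pg_namespace
                     WHERE nspname = ANY (current_schemas(true)))
    $$ LANGUAGE sql STABLE;

    -- put the cheap filter ahead of the visibility test:
    SELECT c.relname
    FROM pg_class c
    WHERE c.relnamespace = ANY (current_schema_oids())
      AND pg_table_is_visible(c.oid);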
Re: [GENERAL] huge price database question..
On 03/20/2012 04:27 PM, Jim Green wrote: Greetings list! I am pretty new to postgresql, coming from mysql Welcome. I have daily one-minute stock price data from 2005 on, each day with columns timestamp, open, high, low, close, volume and a few more. Each day's data is about 1.2 million rows. I want to import all the data to postgresql and analyze it using R with the help of RPostgreSQL. Right now I am having about 7000 tables for individual stocks and I use perl to do inserts; it's very slow. I'm not sure I understand - please expand and clarify. If you have 7000 stocks, 1.2 million rows/day is only 171 observations/stock/day or a little under 3 hours of data at 60 samples/hour. Are there holes in the data or am I completely missing the setup? For the NYSE you have 52 weeks/year, 5 trading-days/week (except for up to 8 trading holidays), 6.5 trading-hours/day, giving 100,000 rows/stock/year give or take depending on holiday schedule. Over your time range, that will be less than 1 million rows per stock. Without holes in the data, it seems you will be adding 2.7 million rows per trading day. Is perl doing individual record inserts? Short of reconnecting for each insert, this is the slowest option. Each insert is its own transaction and, unless you are using a raid card with writeback enabled (and hopefully with battery-backed cache if it is), you will hit some limitations imposed by your disk's rotational speed. If you can ensure the data is sorted by stock, you can start a transaction, write all the records for that stock, then commit the transaction, which should speed the operation. Only because you are bulk-adding historical data and can rebuild your database if it is destroyed, you could consider turning off fsync while you are importing historical data. Dropping indexes while you play catch-up can help as well. I would like to use copy or another bulk loading tool to load the daily raw gz data, but I need to split the file into per-stock files first before I do bulk loading. I consider this a bit messy. I would seek advice on the following idea: store everything in a big table, partition by month (this gives a reasonable number of partitions) and do bulk loading on the daily file. My queries would consist mostly of selects on a particular symbol on a particular day. If you already have daily files, you might want to work out the process of importing a day and apply that to your backlog. There are lots of options - you might try experimenting. Given the type of queries you plan, simple partitioning by time period as you described is a reasonable approach. You could import a day's data as a bulk copy then select from that table into the various individual stock tables. BTW, you have plenty of RAM, so that daily bulk file will be cached and queries on each individual stock will probably be quite fast. But you will need to wrap things in a transaction or otherwise ensure that you can recover if things fail part-way through that distribution of data. You could partition your data by stock symbol and use a trigger on the parent to put the data into the correct table on insert. I am unsure how fast this will run - try it and test. One benefit is that your daily import will fully succeed or fully fail. But you will need to update your partitioning and triggers to deal with new stocks. You can, of course, have the trigger choose which table to use based on the table name. This eliminates the need to alter the trigger code, but it is still recommended to create the child tables in advance.
my hardware is 16G RAM, 4x5400rpm raid10 with enough space.

Have you started with basic tuning? It is likely that whatever stock PostgreSQL configuration you have installed is suboptimal. (What PG version and OS are you using?)

Cheers, Steve
Re: [GENERAL] huge price database question..
On 20 March 2012 19:45, Michael Nolan htf...@gmail.com wrote:

right now I am having about 7000 tables for individual stocks and I use perl to do inserts, it's very slow. I would like to use copy or another bulk loading tool to load the daily raw gz data, but I need to split the file into per-stock files before I do bulk loading. I consider this a bit messy.

Are you committing each insert separately, or doing them in batches using 'begin transaction' and 'commit'? I have a database that I do inserts into from a text file. Doing a commit every 1000 transactions cut the time by over 90%. -- Mike Nolan

I use perl DBI and prepared statements. I also set:

shared_buffers = 4GB
work_mem = 1GB
synchronous_commit = off
effective_cache_size = 8GB
fsync = off
full_page_writes = off

when I do the insert. Thanks!
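To illustrate the batching point at the SQL level: each statement outside an explicit transaction commits (and, with fsync on, syncs) on its own, so wrapping many inserts in one transaction pays that cost once. A sketch, reusing the hypothetical ticks table from the earlier example:

  BEGIN;
  INSERT INTO ticks VALUES ('AAPL', '2012-03-20 09:30-04', 601.10, 601.50, 600.90, 601.25, 150000);
  INSERT INTO ticks VALUES ('AAPL', '2012-03-20 09:31-04', 601.25, 601.40, 601.00, 601.10, 98000);
  -- ... many more rows ...
  COMMIT;

In DBI the usual way to get this is setting $dbh->{AutoCommit} = 0 and calling $dbh->commit every N rows.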
Re: [GENERAL] huge price database question..
On 20 March 2012 20:19, Steve Crawford scrawf...@pinpointresearch.com wrote:

I'm not sure I understand - please expand and clarify. If you have 7000 stocks, 1.2 million rows/day is only 171 observations/stock/day, or a little under 3 hours of data at 60 samples/hour. Are there holes in the data, or am I completely missing the setup?

Hi: stocks are delisted and added, so I don't have 7000 stocks every day - 7000 is just a rough number. Also, lots of them are not liquid, so holes are possible..

For the NYSE you have 52 weeks/year, 5 trading days/week (except for up to 8 trading holidays) and 6.5 trading hours/day, giving roughly 100,000 rows/stock/year, give or take depending on the holiday schedule. Over your time range, that will be less than 1 million rows per stock. Without holes in the data, it seems you will be adding 2.7 million rows per trading day.

I spot-checked 03/02's data: it has 1.2 million rows, and other dates would be similar.

Is perl doing individual record inserts? Short of reconnecting for each insert, this is the slowest option. Each insert is its own transaction and, unless you are using a raid card with writeback enabled (and hopefully with battery-backed cache if it is), you will hit limitations imposed by your disk's rotational speed. If you can ensure the data is sorted by stock, you can start a transaction, write all the records for that stock, then commit the transaction, which should speed the operation.

I use perl DBI and a prepared statement to insert into the per-symbol tables.

Because you are bulk adding historical data and can rebuild your database if it is destroyed, you could consider turning off fsync while you are importing historical data. Dropping indexes while you play catch-up can help as well.

I already turned fsync off.. but autovacuum uses lots of IO and my 8-core CPU is really busy..

I would like to use copy or another bulk loading tool to load the daily raw gz data, but I need to split the file into per-stock files before bulk loading, which I consider a bit messy. I would seek advice on the following idea: store everything in one big table, partition by month (this gives a reasonable number of partitions) and bulk load the daily file. My queries would consist mostly of selects on a particular symbol on a particular day.

If you already have daily files, you might want to work out the process of importing a day and apply that to your backlog. There are lots of options - you might try experimenting. Given the type of queries you plan, simple partitioning by time period as you described is a reasonable approach. You could import a day's data as a bulk copy, then select from that table into the various individual stock tables. BTW, you have plenty of RAM, so that daily bulk file will be cached and queries on each individual stock will probably be quite fast. But you will need to wrap things in a transaction, or otherwise ensure that you can recover if things fail part-way through that distribution of data.
You could partition your data by stock symbol and use a trigger on the parent to put the data into the correct table on insert. I am unsure how fast this will run - try it and test. One benefit is that your daily import will fully succeed or fully fail. But you will need to update your partitioning and triggers to deal with new stocks. You can, of course, have the trigger derive the table name from the symbol; this eliminates the need to alter the trigger code, but it is still recommended to create the child tables in advance.

I've read that thousands of partitions might be too much..

my hardware is 16G RAM, 4x5400rpm raid10 with enough space. Have you started with basic tuning? It is likely that whatever stock PostgreSQL configuration you have installed is suboptimal. (What PG version and OS are you using?)

Yes I have - I posted the settings in my last reply. I run Debian Squeeze 64-bit and PG 9.1.3.

The alternatives look kind of complex to me. Right now my approach (perl DBI and prepared inserts) takes about 8-9 minutes to insert a day's data; I think I'll probably just stick with it and wait. The autovacuum processes do a lot of IO and make my PC unusable while I do the data inserts, and I tested turning autovacuum off without much success, because the workers are still launched due to the transaction ID wraparound issue. Thanks! Jim.

Cheers, Steve
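For what it's worth, the partition-by-month variant Jim described would look something like this with 9.1-era inheritance partitioning (again assuming the hypothetical ticks parent table from the earlier sketch):

  -- one child per month, with a CHECK constraint the planner can use
  CREATE TABLE ticks_2012_03 (
      CHECK (ts >= '2012-03-01' AND ts < '2012-04-01')
  ) INHERITS (ticks);

  CREATE INDEX ticks_2012_03_sym_ts ON ticks_2012_03 (symbol, ts);

  -- with constraint_exclusion = partition (the default), a query such as
  --   SELECT * FROM ticks
  --   WHERE symbol = 'AAPL' AND ts >= '2012-03-20' AND ts < '2012-03-21';
  -- should scan only ticks_2012_03.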
Re: [GENERAL] pg_upgrade + streaming replication ?
On Tue, 2012-03-20 at 22:21 +0100, Henk Bronk wrote:

actually rsync works fine on file level and is good for manual syncing. it really checks the files with the stat command, so a bit change will trigger the copy. in practice you need to keep an eye on the completeness of the rsync action.

Rsync still needs to examine the entire file. It has no information to know that the file is the same on master and slave. We could try to give it the appropriate information on which it can make that assumption -- e.g. keep the timestamps the same so that rsync assumes the contents are the same. But that seems fragile and I don't see a good way of doing it, anyway.

We need a way to take a base backup of just the catalogs, essentially, and leave the user data intact. Probably quite a few details to sort out though.

Regards, Jeff Davis
Re: [GENERAL] huge price database question..
On 03/20/2012 04:27 PM, Jim Green wrote:

Greetings list! I am pretty new to postgresql, coming from mysql, and did a fairly extensive search of the list; I came up with a few good threads but didn't find the exact same situation as I have now, so I am venturing to ask here. I have daily minute stock price data from 2005 on, each day with columns timestamp, open, high, low, close, volume and a few more. Each day's data is about 1.2 million rows. I want to import all the data into postgresql and analyze it using R with the help of RPostgreSQL. Right now I have about 7000 tables for individual stocks and I use perl to do inserts; it's very slow. I would like to use copy or another bulk loading tool to load the daily raw gz data, but I need to split the file into per-stock files before bulk loading, which I consider a bit messy. I would seek advice on the following idea: store everything in one big table, partition by month (this gives a reasonable number of partitions) and bulk load the daily file. My queries would consist mostly of selects on a particular symbol on a particular day. Also, in the future I will import daily data to the db every day. My hardware is 16G RAM, 4x5400rpm raid10 with enough space. Thanks! Jim.

Seems like you'd want to do this? http://search.cpan.org/~turnstep/DBD-Pg-2.19.2/Pg.pm#pg_putcopydata

COPY support: DBD::Pg allows for quick (bulk) reading and storing of data by using the COPY command. The basic process is to use $dbh->do to issue a COPY command, and then to either add rows using pg_putcopydata, or to read them by using pg_getcopydata.
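At the SQL level, what pg_putcopydata feeds is simply the data portion of a COPY ... FROM stdin exchange: one tab-separated row per call, terminated by \. For example (hypothetical table and values):

  COPY ticks (symbol, ts, open, high, low, close, volume) FROM stdin;
  AAPL	2012-03-20 09:30:00-04	601.10	601.50	600.90	601.25	150000
  MSFT	2012-03-20 09:30:00-04	31.99	32.05	31.95	32.01	220000
  \.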
Re: [GENERAL] pg_upgrade + streaming replication ?
On Tue, Mar 20, 2012 at 8:27 PM, Jeff Davis pg...@j-davis.com wrote:

On Tue, 2012-03-20 at 22:21 +0100, Henk Bronk wrote:

actually rsync works fine on file level and is good for manual syncing. it really checks the files with the stat command, so a bit change will trigger the copy. in practice you need to keep an eye on the completeness of the rsync action.

Rsync still needs to examine the entire file. It has no information to know that the file is the same on master and slave. We could try to give it the appropriate information on which it can make that assumption -- e.g. keep the timestamps the same so that rsync assumes the contents are the same. But that seems fragile and I don't see a good way of doing it, anyway. We need a way to take a base backup of just the catalogs, essentially, and leave the user data intact. Probably quite a few details to sort out though. Regards, Jeff Davis

rsync can take a file that contains a list of files to sync. Is there a convenient way to generate such a control file?
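One possible starting point for generating such a list - an untested sketch, and it only covers one database at a time - is to ask the catalogs for the on-disk paths of system objects, whose OIDs sit below 16384:

  SELECT relname, pg_relation_filepath(oid) AS path_in_data_dir
  FROM pg_class
  WHERE oid < 16384            -- system catalogs and their indexes
    AND relkind IN ('r', 'i', 't')
  ORDER BY 2;

pg_relation_filepath() (available since 9.0) returns paths relative to the data directory, which is the form rsync's --files-from wants.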
Re: [GENERAL] huge price database question..
On 20 March 2012 21:40, David Kerr d...@mr-paradox.net wrote:

On 03/20/2012 04:27 PM, Jim Green wrote:

Greetings list! I am pretty new to postgresql, coming from mysql, and did a fairly extensive search of the list; I came up with a few good threads but didn't find the exact same situation as I have now, so I am venturing to ask here. I have daily minute stock price data from 2005 on, each day with columns timestamp, open, high, low, close, volume and a few more. Each day's data is about 1.2 million rows. I want to import all the data into postgresql and analyze it using R with the help of RPostgreSQL. Right now I have about 7000 tables for individual stocks and I use perl to do inserts; it's very slow. I would like to use copy or another bulk loading tool to load the daily raw gz data, but I need to split the file into per-stock files before bulk loading, which I consider a bit messy. I would seek advice on the following idea: store everything in one big table, partition by month (this gives a reasonable number of partitions) and bulk load the daily file. My queries would consist mostly of selects on a particular symbol on a particular day. Also, in the future I will import daily data to the db every day. My hardware is 16G RAM, 4x5400rpm raid10 with enough space. Thanks! Jim.

Seems like you'd want to do this? http://search.cpan.org/~turnstep/DBD-Pg-2.19.2/Pg.pm#pg_putcopydata

COPY support: DBD::Pg allows for quick (bulk) reading and storing of data by using the COPY command. The basic process is to use $dbh->do to issue a COPY command, and then to either add rows using pg_putcopydata, or to read them by using pg_getcopydata.

Thanks! Would you comment on the table setup as well? Jim.
Re: [GENERAL] huge price database question..
On 20 March 2012 21:54, Brent Wood brent.w...@niwa.co.nz wrote:

Also look at a clustered index on timestamp.

Thanks, this looks very helpful. What do you think about the thousands-of-tables approach vs one table partitioned by month? I guess if I go with one table, the index would be too big to fit in RAM? Jim.
Re: [GENERAL] huge price database question..
On 03/20/2012 06:50 PM, Jim Green wrote:

On 20 March 2012 21:40, David Kerr d...@mr-paradox.net wrote:

Seems like you'd want to do this? http://search.cpan.org/~turnstep/DBD-Pg-2.19.2/Pg.pm#pg_putcopydata

COPY support: DBD::Pg allows for quick (bulk) reading and storing of data by using the COPY command. The basic process is to use $dbh->do to issue a COPY command, and then to either add rows using pg_putcopydata, or to read them by using pg_getcopydata.

Thanks! Would you comment on the table setup as well? Jim.

\copy on 1.2 million rows should only take a minute or two. You could make that table unlogged as well to speed it up more. If you truncate / drop / create / load / then index the table each time, you'll get the best throughput.

Dave
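Spelled out, that staging-table workflow might look like this (table and file names are assumptions; UNLOGGED requires 9.1):

  -- recreate the staging table for each daily file
  DROP TABLE IF EXISTS ticks_staging;
  CREATE UNLOGGED TABLE ticks_staging (
      symbol text, ts timestamptz,
      open numeric, high numeric, low numeric, close numeric,
      volume bigint
  );

  -- in psql: load the day's file from the client side
  -- \copy ticks_staging from '/data/ticks-2012-03-20.csv' csv

  -- index only after the load, so the 1.2M inserts skip index maintenance
  CREATE INDEX ticks_staging_symbol ON ticks_staging (symbol);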
Re: [GENERAL] huge price database question..
On 20 March 2012 22:03, David Kerr d...@mr-paradox.net wrote:

\copy on 1.2 million rows should only take a minute or two. You could make that table unlogged as well to speed it up more. If you truncate / drop / create / load / then index the table each time, you'll get the best throughput.

Thanks! Could you explain the "truncate / drop / create / load / then index the table each time" part, or point me to some docs?.. Jim
Re: [GENERAL] huge price database question..
On 20 March 2012 22:08, Jim Green student.northwest...@gmail.com wrote:

On 20 March 2012 22:03, David Kerr d...@mr-paradox.net wrote:

\copy on 1.2 million rows should only take a minute or two. You could make that table unlogged as well to speed it up more. If you truncate / drop / create / load / then index the table each time, you'll get the best throughput.

Thanks! Could you explain the "truncate / drop / create / load / then index the table each time" part, or point me to some docs?..

Also, if I use copy, I would be tempted to go the one-table route; otherwise I need to parse my raw daily file, separate it into individual symbol files and copy into an individual table for each symbol (this sounds not very efficient).. Jim
Re: [GENERAL] huge price database question..
On 03/20/2012 07:08 PM, Jim Green wrote:

On 20 March 2012 22:03, David Kerr d...@mr-paradox.net wrote:

\copy on 1.2 million rows should only take a minute or two. You could make that table unlogged as well to speed it up more. If you truncate / drop / create / load / then index the table each time, you'll get the best throughput.

Thanks! Could you explain the "truncate / drop / create / load / then index the table each time" part, or point me to some docs?.. Jim

I'm imagining that you're loading the raw file into a temporary table that you're going to use to process / slice new data into your 7000+ per-stock tables. So that table probably doesn't need to be around once you've processed the stocks through it - you could just truncate/drop it after you're done. When you create it, if you avoid indexes, the inserts will be faster (it doesn't have to update the index on every insert); then, once the table is loaded, you create the indexes (so the table is actually useful to query) and process the data into the various stock tables.

Dave
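In other words, something along these lines once the staging table is loaded and indexed (the child table names are assumptions, with the same layout as the staging table):

  BEGIN;
  INSERT INTO ticks_aapl SELECT * FROM ticks_staging WHERE symbol = 'AAPL';
  INSERT INTO ticks_msft SELECT * FROM ticks_staging WHERE symbol = 'MSFT';
  -- ... one INSERT ... SELECT per symbol, driven by
  --     SELECT DISTINCT symbol FROM ticks_staging;
  COMMIT;
  DROP TABLE ticks_staging;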
Re: [GENERAL] huge price database question..
On 03/20/12 7:12 PM, Jim Green wrote:

Also, if I use copy, I would be tempted to go the one-table route; otherwise I need to parse my raw daily file, separate it into individual symbol files and copy into an individual table for each symbol (this sounds not very efficient)..

your 7000 tables all contain the exact same kind of information, with the only difference being the stock ticker symbol, right? then really, the single table, perhaps partitioned by month or whatever, is the right way to go. Any schema that makes you have to build SQL statements from strings for each query is designed wrong.

-- john r pierce            N 37, W 122
   santa cruz ca            mid-left coast
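With a single table, the symbol is just a value in a WHERE clause rather than part of a table name, so queries stay ordinary parameterized SQL. For example, against the hypothetical ticks table sketched earlier:

  SELECT ts, open, high, low, close, volume
  FROM ticks
  WHERE symbol = 'AAPL'
    AND ts >= '2012-03-20' AND ts < '2012-03-21'
  ORDER BY ts;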
Re: [GENERAL] huge price database question..
On 03/20/2012 09:12 PM, Jim Green wrote:

On 20 March 2012 22:08, Jim Green student.northwest...@gmail.com wrote:

On 20 March 2012 22:03, David Kerr d...@mr-paradox.net wrote:

\copy on 1.2 million rows should only take a minute or two. You could make that table unlogged as well to speed it up more. If you truncate / drop / create / load / then index the table each time, you'll get the best throughput.

Thanks! Could you explain the "truncate / drop / create / load / then index the table each time" part, or point me to some docs?..

Also, if I use copy, I would be tempted to go the one-table route; otherwise I need to parse my raw daily file, separate it into individual symbol files and copy into an individual table for each symbol (this sounds not very efficient).. Jim

I think the decision between:
1) one big table
2) one big partitioned table
3) many little tables
would probably depend on how you want to read the data. Writing would be very similar. I tried to read through the thread but didn't see how you're going to read.

I have apache logs in a database: a single table, about 18 million rows. I have an index on hittime (it's a timestamp), and I can pull a few hundred records based on a time very fast. On the other hand, a count(*) on the entire table takes a while. If you are going to hit lots and lots of records, I think the multi-table approach (which includes partitioning) would be faster. If you can pull out records based on an index, and be very selective, then one big table works fine.

On the perl side, use copy. I have code in perl that uses it (and reads from .gz as well), and it's very fast. I can post some if you'd like.

-Andy
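Andy's apache-log contrast, translated to the hypothetical stock schema (index name and columns assumed): the selective lookup can be satisfied from an index, while the whole-table aggregate cannot.

  CREATE INDEX ticks_symbol_ts ON ticks (symbol, ts);

  -- selective: a few hundred rows via the index, fast
  SELECT count(*) FROM ticks
  WHERE symbol = 'AAPL'
    AND ts BETWEEN '2012-03-20 09:30-04' AND '2012-03-20 12:00-04';

  -- unselective: scans the whole table, slow on hundreds of millions of rows
  SELECT count(*) FROM ticks;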
Re: [GENERAL] huge price database question..
On 20 March 2012 22:21, David Kerr d...@mr-paradox.net wrote:

I'm imagining that you're loading the raw file into a temporary table that you're going to use to process / slice new data into your 7000+ per-stock tables.

Thanks! Would slicing new data into the 7000+ per-stock tables be a relatively quick operation?

So that table probably doesn't need to be around once you've processed the stocks through it - you could just truncate/drop it after you're done. When you create it, if you avoid indexes, the inserts will be faster (it doesn't have to update the index on every insert); then, once the table is loaded, you create the indexes (so the table is actually useful to query) and process the data into the various stock tables.

Dave
Re: [GENERAL] huge price database question..
On 20 March 2012 22:22, John R Pierce pie...@hogranch.com wrote:

your 7000 tables all contain the exact same kind of information, with the only difference being the stock ticker symbol, right? then really, the single table, perhaps partitioned by month or whatever, is the right way to go. Any schema that makes you have to build SQL statements from strings for each query is designed wrong.

Each table contains 1-minute price data for that symbol, so each table has the same schema (open, high, low, close, volume etc.) but not the same data.. Thanks.
Re: [GENERAL] huge price database question..
On 20 March 2012 22:25, Andy Colson a...@squeakycode.net wrote:

I think the decision between: 1) one big table, 2) one big partitioned table, 3) many little tables would probably depend on how you want to read the data. Writing would be very similar. I tried to read through the thread but didn't see how you're going to read.

I have apache logs in a database: a single table, about 18 million rows. I have an index on hittime (it's a timestamp), and I can pull a few hundred records based on a time very fast. On the other hand, a count(*) on the entire table takes a while. If you are going to hit lots and lots of records, I think the multi-table approach (which includes partitioning) would be faster. If you can pull out records based on an index, and be very selective, then one big table works fine.

On the perl side, use copy. I have code in perl that uses it (and reads from .gz as well), and it's very fast. I can post some if you'd like.

My queries would mostly be selects for one symbol on one particular day, or a few hours of one particular day; occasionally I would select multiple symbols over some timestamp range. Your code sample would be appreciated. Thanks! Jim.
Re: [GENERAL] huge price database question..
On 03/20/2012 09:35 PM, Jim Green wrote:

On 20 March 2012 22:25, Andy Colson a...@squeakycode.net wrote:

I think the decision between: 1) one big table, 2) one big partitioned table, 3) many little tables would probably depend on how you want to read the data. Writing would be very similar. I tried to read through the thread but didn't see how you're going to read. I have apache logs in a database: a single table, about 18 million rows. I have an index on hittime (it's a timestamp), and I can pull a few hundred records based on a time very fast. On the other hand, a count(*) on the entire table takes a while. If you are going to hit lots and lots of records, I think the multi-table approach (which includes partitioning) would be faster. If you can pull out records based on an index, and be very selective, then one big table works fine. On the perl side, use copy. I have code in perl that uses it (and reads from .gz as well), and it's very fast. I can post some if you'd like.

My queries would mostly be selects for one symbol on one particular day, or a few hours of one particular day; occasionally I would select multiple symbols over some timestamp range. Your code sample would be appreciated. Thanks! Jim.

Here are some copy/pasted parts:

my @list = glob('*.gz');
for my $fname (@list) {
    $db->do('copy access from stdin');
    open my $fh, '-|', "/usr/bin/zcat $fname" or die "$fname: $!";
    while (<$fh>) {
        # bunch of stuff to format snipped here; it sets $county, $ip, etc.
        # if you have comma-separated data or something, you might be able
        # to just feed each line straight in
        $db->pg_putcopydata("$county\t$ip\t$time\t$status\t$size\t$url\t$ua\n");
    }
    $db->pg_endcopy;
    $db->commit;
}

Do you ever plan on batch deleting a BUNCH of records? Do you ever want to read all of one symbol (like, select avg(high) from stocks where symbol = 'bob')?

-Andy
Re: [GENERAL] huge price database question..
On 20 March 2012 22:43, Andy Colson a...@squeakycode.net wrote:

Here are some copy/pasted parts:

my @list = glob('*.gz');
for my $fname (@list) {
    $db->do('copy access from stdin');
    open my $fh, '-|', "/usr/bin/zcat $fname" or die "$fname: $!";
    while (<$fh>) {
        # bunch of stuff to format snipped here
        $db->pg_putcopydata("$county\t$ip\t$time\t$status\t$size\t$url\t$ua\n");
    }
    $db->pg_endcopy;
    $db->commit;
}

Do you ever plan on batch deleting a BUNCH of records?

No - after the historical data is populated, I'll only add data daily. No deletes..

Do you ever want to read all of one symbol (like, select avg(high) from stocks where symbol = 'bob')?

Yes, it's possible, but I would more likely grab the data into R and compute the avg in R..

Thanks, Jim.
Re: [GENERAL] huge price database question..
On 03/20/12 7:49 PM, Jim Green wrote:

yes, it's possible, but I would more likely grab the data into R and compute the avg in R..

avg() in the database is going to be a lot faster than copying the data into memory for an application to process. Also, you know there's a PL/R for postgres that lets you embed R functions in the database server and invoke them in SQL statements??

-- john r pierce            N 37, W 122
   santa cruz ca            mid-left coast
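For instance, an aggregate like the one discussed above runs entirely server-side and ships only one row per symbol back to the client (using the hypothetical ticks table again):

  SELECT symbol, avg(high) AS avg_high
  FROM ticks
  GROUP BY symbol;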
Re: [GENERAL] huge price database question..
On 03/20/2012 09:49 PM, Jim Green wrote:

On 20 March 2012 22:43, Andy Colson a...@squeakycode.net wrote:

Do you ever plan on batch deleting a BUNCH of records?

No - after the historical data is populated, I'll only add data daily. No deletes..

Do you ever want to read all of one symbol (like, select avg(high) from stocks where symbol = 'bob')?

Yes, it's possible, but I would more likely grab the data into R and compute the avg in R.. Thanks, Jim.

Based on your answers:

"my queries would mostly be selects for one symbol on one particular day, or a few hours of one particular day; occasionally I would select multiple symbols over some timestamp range"

One big table would probably be about the same speed as multiple smaller tables; either way you'll hit an index first for the above usage.

"no, after historical data is populated, I'll only add data daily. no deletes.."

Truncating/dropping a table is much faster than a huge delete... but if you'll never delete, then it really doesn't matter.

"yes, it's possible but I would more likely grab the data into R and get the avg in R.."

But... to get the data into R you still have to step through the entire table. If you have a partition per symbol (which is the same as having a separate table per symbol), then I believe you can step through it faster (just a table scan) than if you had one big table (index lookups on symbol). So in this case, partitioned would be better (I think).

So the score is:
One big table = 1
Doesn't matter = 1
Partitioned = 1

Of course, there are probably other usage patterns I'm not aware of. And I am also assuming some things based on what I've heard -- not on actual experience. I'm not sure this was really helpful :-)

-Andy
Re: [GENERAL] huge price database question..
Also look at a clustered index on timestamp.

Brent Wood
GIS/DBA consultant
NIWA
+64 (4) 4 386-0300

From: pgsql-general-ow...@postgresql.org [pgsql-general-ow...@postgresql.org] on behalf of Jim Green [student.northwest...@gmail.com]
Sent: Wednesday, March 21, 2012 2:50 PM
To: David Kerr
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] huge price database question..

On 20 March 2012 21:40, David Kerr d...@mr-paradox.net wrote:

Seems like you'd want to do this? http://search.cpan.org/~turnstep/DBD-Pg-2.19.2/Pg.pm#pg_putcopydata

COPY support: DBD::Pg allows for quick (bulk) reading and storing of data by using the COPY command. The basic process is to use $dbh->do to issue a COPY command, and then to either add rows using pg_putcopydata, or to read them by using pg_getcopydata.

Thanks! Would you comment on the table setup as well? Jim.

NIWA is the trading name of the National Institute of Water & Atmospheric Research Ltd.
Re: [GENERAL] Is it even possible?
Looks promising. Does anyone know if you install postgres using the postgres EDB installer before using Kyng Chaos'? I'm not sure of the process… Thanks, Sam

On Mar 20, 2012, at 3:16 PM, Bryan Lee Nuse wrote:

Is there anyone who has ever successfully gotten postgres/postGIS running on Mac Lion? Really? How?

Hello Sam, I'm running Lion, and had the same trouble using the Enterprise Stack Builder to install PostGIS. I finally got it working by using Kyng Chaos' installers for both PostgreSQL and PostGIS: http://www.kyngchaos.com/software/postgres Bryan
Re: [GENERAL] huge price database question..
On 03/20/2012 08:54 PM, Brent Wood wrote:

Also look at a clustered index on timestamp.

Brent Wood
GIS/DBA consultant
NIWA
+64 (4) 4 386-0300

A clustered index is only clustered at the point in time you run the command. It won't remain that way, and with a really big table you don't wanna spend the time re-clustering it every time you import more data.

-Andy
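For reference, clustering in PostgreSQL is an explicit one-time command, not a maintained property (index and table names here are assumed):

  CREATE INDEX ticks_ts_idx ON ticks (ts);
  CLUSTER ticks USING ticks_ts_idx;  -- rewrites the table in ts order; takes an exclusive lock
  -- rows inserted afterwards land wherever there is room, so physical
  -- order degrades until CLUSTER is run again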
Re: [GENERAL] huge price database question..
On 03/20/2012 07:26 PM, Jim Green wrote:

On 20 March 2012 22:21, David Kerr d...@mr-paradox.net wrote:

I'm imagining that you're loading the raw file into a temporary table that you're going to use to process / slice new data into your 7000+ per-stock tables.

Thanks! Would slicing new data into the 7000+ per-stock tables be a relatively quick operation?

Well, it solves the problem of having to split up the raw file by stock symbol. From there you can run multiple jobs in parallel to load individual stocks into their individual tables, which is probably faster than what you've got going now. It would probably be faster to load the individual stocks directly from the file, but then, as you said, you have to split it up first, so that may take time.
Re: [GENERAL] huge price database question..
On 20 March 2012 22:57, John R Pierce pie...@hogranch.com wrote:

avg() in the database is going to be a lot faster than copying the data into memory for an application to process.

I see..

Also, you know there's a PL/R for postgres that lets you embed R functions in the database server and invoke them in SQL statements??

Thanks for this tip! Jim.
Re: [GENERAL] huge price database question..
On 20 March 2012 23:01, Andy Colson a...@squeakycode.net wrote:

Of course, there are probably other usage patterns I'm not aware of. And I am also assuming some things based on what I've heard -- not on actual experience.

I am not an expert in SQL, so what I get out of postgresql will probably mostly be selects, but as you hinted there may be faster ways to do things natively in SQL rather than selecting the data out and doing the work in R. I need to learn more SQL :)

I'm not sure this was really helpful :-)

It's very helpful, thanks! Jim.
Re: [GENERAL] Authenticating from a web service call
On Fri, Mar 16, 2012 at 11:39 AM, Bryan Montgomery mo...@english.net wrote:

Hello, We are looking at implementing a web service that basically makes calls to the database. I have been thinking about ways to secure the web service based on the database. I initially thought about just connecting to the database as the user with parameters passed through the web service - however I don't know how to do that other than with clear-text passwords.

It's a problem we have been looking at for some time in LedgerSMB, actually, so I have some thoughts on the topic. PostgreSQL is remarkably flexible here, and you have a bunch of options depending on your needs. The basic constraint is that you have to have re-usable credentials, so things like client cert auth or HTTP digest auth won't work. The clients have to pass the password to the web server in a form it can use to log in.

So, is it possible for clients to encrypt their password and pass that through the web service to the database?

SSL protecting both the link from the client to the web service and the link from the web service to the db is what we recommend with LedgerSMB. It's the most versatile approach, since it doesn't require any other infrastructure. Another approach would be to use Kerberos 5 auth on both sides and pass the forwardable ticket through. More secure, but the client has to be part of a KRB5 realm and configuration is a bit more complex.

I was looking at the way postgres stores the users' passwords, but first of all I'm not sure if that is something the client could do. Then, if they could, how would I go about connecting as a system user and verifying that the userid and password provided by the client are correct?

Ick... I don't like that. It requires too much internal knowledge and invites replay vulnerabilities across the whole process.

Best Wishes, Chris Travers
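As a sketch of the transport side (database name and address are made up): pg_hba.conf can require SSL between the web server and the database, with ordinary md5 password auth over that encrypted link. Records are matched first to last, so the trailing reject line blocks any non-SSL attempt.

  # TYPE    DATABASE  USER  ADDRESS        METHOD
  hostssl   appdb     all   10.0.0.5/32    md5
  host      appdb     all   0.0.0.0/0      reject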
[GENERAL] Binary compatibility
Hello. We have a FreeBSD/amd64 PostgreSQL 9.0 server and would like to move the data to a Linux/amd64 PostgreSQL 9.0 server. Are databases on these systems binary compatible? Can I just transfer the data files, or do I have to do a full export/import?

-- Best regards, Alexander Pyhalov, system administrator of Computer Center of Southern Federal University
Re: [GENERAL] Binary compatibility
Alexander Pyhalov a...@rsu.ru writes:

We have a FreeBSD/amd64 PostgreSQL 9.0 server and would like to move the data to a Linux/amd64 PostgreSQL 9.0 server. Are databases on these systems binary compatible? Can I just transfer the data files, or do I have to do a full export/import?

If built with the same configure options, I'd expect those two to be binary compatible. But the wise man will test it before trusting it, no?

regards, tom lane
[GENERAL] POSTGRESQL Newbie
Folks, I am a newbie to PostgreSQL. I am in the midst of deciding which database technology I should choose for our large web apps: MySQL or PostgreSQL? Could you share large sites that have implemented PostgreSQL successfully in terms of replication, clustering, scale, etc.? I heard PostgreSQL does not have great features for scaling, replication and clustering like MySQL has - is that true?
Re: [GENERAL] POSTGRESQL Newbie
On Tue, Mar 20, 2012 at 11:27 PM, Geek Matter geekmat...@yahoo.com wrote:

Folks, I am a newbie to PostgreSQL. I am in the midst of deciding which database technology I should choose for our large web apps: MySQL or PostgreSQL? Could you share large sites that have implemented PostgreSQL successfully in terms of replication, clustering, scale, etc.?

Skype. And pgsql has some great replication solutions that actually work.
Re: [GENERAL] POSTGRESQL Newbie
Any other large sites that use PostgreSQL? I need to make the right decision, because my decision will affect business that is tied to $.

From: Scott Marlowe scott.marl...@gmail.com
To: Geek Matter geekmat...@yahoo.com
Cc: pgsql-general@postgresql.org
Sent: Wednesday, March 21, 2012 1:32 PM
Subject: Re: [GENERAL] POSTGRESQL Newbie

On Tue, Mar 20, 2012 at 11:27 PM, Geek Matter geekmat...@yahoo.com wrote:

Folks, I am a newbie to PostgreSQL. I am in the midst of deciding which database technology I should choose for our large web apps: MySQL or PostgreSQL? Could you share large sites that have implemented PostgreSQL successfully in terms of replication, clustering, scale, etc.? I heard PostgreSQL does not have great features for scaling, replication and clustering like MySQL has - is that true?

Skype. And pgsql has some great replication solutions that actually work.