Re: [GENERAL] Extensions and privileges in public schema
On Sun, Dec 4, 2016 at 4:24 PM, Paul Ramsey wrote:
> When you create the student user, remove their create privs in public.
> Then create a scratch schema and grant them privs there. Finally, alter
> the student user so that the scratch schema appears FIRST in their search
> path. This will cause unqualified CREATE statements to create in the
> scratch schema.
>
> For full separation, give each student their own login and set the search
> path to
>
>     "$user", public
>
> That way each student gets their own private scratch area, and it is used
> by default for their creates.
>
> P

Paul,

I've been avoiding giving each student an individual login role, but it might be worth considering for a future term. I've followed your (and Charles') advice to:

    REVOKE CREATE ON SCHEMA public FROM public;
    ALTER ROLE gus_faculty SET search_path = scratch, public, tiger;

It also occurred to me that I don't want anyone changing data in spatial_ref_sys. I think I should revoke everything *except* SELECT and REFERENCES, and make this the default for new objects created in public schema:

    ALTER DEFAULT PRIVILEGES IN SCHEMA scratch
        REVOKE INSERT, UPDATE, DELETE, TRUNCATE, TRIGGER ON TABLES FROM public;

Please let me know if this is inadvisable or violates accepted practice.

Best,
--Lee
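For spatial_ref_sys specifically, there is also a more direct option: revoke privileges on the table itself rather than relying on default privileges. A sketch (this assumes the table lives in public, where PostGIS installs it; PUBLIC here is the pseudo-role, not the schema):

```SQL
-- Leave everyone able to read and reference spatial_ref_sys, but nothing else.
REVOKE ALL ON TABLE public.spatial_ref_sys FROM PUBLIC;
GRANT SELECT, REFERENCES ON TABLE public.spatial_ref_sys TO PUBLIC;
```

Since ALTER DEFAULT PRIVILEGES only affects objects created later, the explicit REVOKE/GRANT is what actually covers the already-existing PostGIS tables.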
[GENERAL] Extensions and privileges in public schema
This question is specifically motivated by my use of the PostGIS extension, but since other extensions create functions and other supporting objects in public schema, I believe it is more general.

I'm teaching a university-level class using PostGIS. I have created a scratch schema for students to create objects in. At the end of the term I can drop scratch and start fresh the following term. Students of course can also create objects in public schema, and often do unintentionally because they forget to schema-qualify their CREATE TABLE statements. This complicates things because I can't drop public schema without dropping various PostGIS (and other) tables and functions. Additionally, while I doubt the students would do something like drop a public function or supporting table (like spatial_ref_sys), it nonetheless seems like a poor idea for these database objects to be vulnerable.

What is considered best practice in this case?

* Should the PostGIS extension be kept in its own schema (as was suggested when I asked about this on GIS.SE)? If I do so, can I treat public schema the way I have been using scratch schema, i.e. could I drop and recreate a clean public schema at the end of the term?
* Should I leave extensions in public but limit the rights of the public role in that schema (so that students don't unintentionally create tables there, or accidentally delete other objects)?
* Or do Postgres DBAs just not worry about the objects in public schema, and rely upon applications and login roles to interact with the database intelligently?

To be clear, the primary goal is to keep student-created objects in one schema which can be dropped at the end of the term. But the question of preventing accidental creation/deletion of objects in public schema is possibly related, and the overall database organization might address both concerns.

Best,
--Lee

--
Lee Hachadoorian
Assistant Professor of Instruction, Geography and Urban Studies
Assistant Director, Professional Science Master's in GIS
Temple University
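For the first option, the setup suggested on GIS.SE might look like the following sketch. The schema and database names are illustrative, and note that the PostGIS extension is not relocatable, so the schema must be chosen at CREATE EXTENSION time; an existing installation cannot simply be moved with ALTER EXTENSION ... SET SCHEMA.

```SQL
-- Install PostGIS into its own schema instead of public (fresh install only):
CREATE SCHEMA postgis;
CREATE EXTENSION postgis SCHEMA postgis;

-- Make its functions and spatial_ref_sys resolvable without qualification
-- (database name is a placeholder):
ALTER DATABASE gisclass SET search_path = "$user", public, postgis;
```

With the extension out of public, dropping and recreating public at end of term becomes much less hazardous.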
Re: [GENERAL] Replication Recommendation
On Mon, Sep 12, 2016 at 5:12 PM, Adrian Klaver <adrian.kla...@aklaver.com> wrote:
> On 09/12/2016 12:46 PM, Lee Hachadoorian wrote:
>> There are a wide variety of Postgres replication solutions, and I
>> would like advice on which one would be appropriate to my use case.
>>
>> * Small (~half dozen) distributed workforce using a file sharing
>>   service, but without access to direct network connection over the
>>   internet
>> * Database is updated infrequently, when new government agency data
>>   releases replace old data
>> * Because database is updated infrequently, workforce can come
>>   together for LAN-based replication as needed
>> * Entire database is on the order of a few GB
>>
>> Given this, I am considering the super low-tech "replication" solution
>> of updating "master" and doing a full database drop and restore on the
>> "slaves". But I would like to know which of the other (real)
>> replication solutions might work for this use case.
>
> If I follow correctly the layout is?:
>
>     Main database <--- Govt. data
>           |
>           v
>       File share
>           |
>           v
>     DB     DB     DB     DB     DB     DB
>     User 1 User 2 User 3 User 4 User 5 User 6
>
> For your simple scenario you might want to look at:
>
> https://www.postgresql.org/docs/9.5/static/app-pgbasebackup.html

That diagram is what I am proposing.

pg_basebackup looks interesting. My initial impression is that the main gain would be for a multiple-database cluster. Are there other advantages to using this in preference to a full DB dump and restore if all of our data will be in a single database?

Best,
--Lee

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Replication Recommendation
There are a wide variety of Postgres replication solutions, and I would like advice on which one would be appropriate to my use case.

* Small (~half dozen) distributed workforce using a file sharing service, but without access to direct network connection over the internet
* Database is updated infrequently, when new government agency data releases replace old data
* Because database is updated infrequently, workforce can come together for LAN-based replication as needed
* Entire database is on the order of a few GB

Given this, I am considering the super low-tech "replication" solution of updating "master" and doing a full database drop and restore on the "slaves". But I would like to know which of the other (real) replication solutions might work for this use case.

Regards,
--Lee

--
Lee Hachadoorian
Assistant Professor of Instruction, Geography and Urban Studies
Assistant Director, Professional Science Master's in GIS
Temple University
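The low-tech approach described above can be scripted in a few lines. A sketch, in which the database name and dump file name are placeholders:

```shell
# On the machine holding the up-to-date "master" copy:
pg_dump -Fc -d censusdb -f censusdb.dump

# Distribute censusdb.dump through the file-sharing service, then on each
# "slave" workstation drop and rebuild the local copy:
dropdb --if-exists censusdb
createdb censusdb
pg_restore -d censusdb censusdb.dump
```

The custom format (-Fc) dump is compressed, which matters when the file has to travel through a file-sharing service rather than a direct connection.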
Re: [GENERAL] Postgres case insensitive searches
On 06/29/2013 09:02 AM, bhanu udaya wrote:
> Hello,
>
> I agree that it is just the search condition. But in a 2.5 million record
> table search, the upper function is not that fast. The expectation is to
> get the query retrieved in 100 ms, with all indexes used. I tried with
> upper, citext, but the result set took more than a second. The OS server
> we are using is Linux 64 bit.
>
> Thanks and Regards
> Radha Krishna
>
> Subject: Re: [pgadmin-support] [GENERAL] Postgres case insensitive searches
> From: haram...@gmail.com
> Date: Sat, 29 Jun 2013 09:37:51 +0200
> CC: laurenz.a...@wien.gv.at; pgsql-general@postgresql.org; pgadmin-supp...@postgresql.org
> To: udayabhanu1...@hotmail.com
>
>> On Jun 29, 2013, at 3:59, bhanu udaya <udayabhanu1...@hotmail.com> wrote:
>>> Thanks. But, I do not want to convert into upper and show the result.
>>
>> Why not? It won't modify your results, just the search condition:
>>
>>     SELECT id, val FROM t WHERE upper(val) LIKE 'AB%' ORDER BY val;
>>
>> Or:
>>
>>     SELECT id, val FROM t WHERE upper(val) LIKE 'AB%' ORDER BY upper(val), val;
>>
>>> Example, if I have records as below:
>>>
>>>     id  type
>>>     1.  abcd
>>>     2.  Abcdef
>>>     3.  ABcdefg
>>>     4.  aaadf
>>>
>>> The below query should report all the above:
>>>
>>>     select * from table where type like 'ab%';
>>>
>>> It should get all above 3 records. Is there a way the database itself
>>> can be made case-insensitive with UTF8 characterset. I tried with
>>> character type collation POSIX, but it did not really help.
>>
>> I was under the impression this would work, but ISTR that not every OS
>> has this capability (Postgres makes use of the OS collation mechanics).
>> So, what OS are you running the server on?

Duplicate the column with an upper- or lowercase version and run all queries against that.

    CREATE TABLE foo (
        id serial PRIMARY KEY,
        val text,
        val_lower text
    );

Index val_lower. Use triggers to keep val and val_lower in sync and discard all attempts to write directly to val_lower. Then all queries would be of the form:

    SELECT id, val FROM foo WHERE val_lower LIKE 'ab%';

You wouldn't want to write every table like this, but it's an option if (a) query speed trumps all other requirements and (b) functional index, CITEXT, etc. have all been rejected as not fast enough.

--Lee

--
Lee Hachadoorian
Assistant Professor in Geography, Dartmouth College
http://freecity.commons.gc.cuny.edu
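A minimal version of the trigger-maintained shadow column described above might look like the following sketch. The function and trigger names are illustrative; text_pattern_ops is used so the index can serve LIKE 'ab%' prefix searches even in a non-C locale.

```SQL
CREATE TABLE foo (
    id serial PRIMARY KEY,
    val text,
    val_lower text
);
CREATE INDEX foo_val_lower_idx ON foo (val_lower text_pattern_ops);

-- Keep val_lower in sync on every write; callers never set it directly.
CREATE FUNCTION foo_sync_val_lower() RETURNS trigger AS $$
BEGIN
    NEW.val_lower := lower(NEW.val);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER foo_val_lower_sync
    BEFORE INSERT OR UPDATE ON foo
    FOR EACH ROW EXECUTE PROCEDURE foo_sync_val_lower();

-- Queries then hit the indexed column:
SELECT id, val FROM foo WHERE val_lower LIKE 'ab%';
```

Because the trigger overwrites val_lower on every INSERT and UPDATE, any attempt to write to it directly is silently discarded, which is the behavior described above.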
[GENERAL] SQL keywords are suddenly case sensitive
List,

SQL seems to be behaving in a case-sensitive manner:

    universe=# select 1;
     ?column?
    ----------
            1
    (1 row)

    universe=# SELECT 1;
    ERROR:  syntax error at or near "SELECT 1"
    LINE 1: SELECT 1;
            ^

I cannot figure out how this happened, and Google is not helping because all I'm getting is information about case-sensitivity in identifiers or in string comparison. I didn't even think this was possible, as I've always switched between lower case and upper case keywords, usually using lower case while testing and upper case to prettify scripts that I will be saving for reuse.

version() = PostgreSQL 9.1.8 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit

Regards,
--Lee

--
Lee Hachadoorian
Assistant Professor in Geography, Dartmouth College
http://freecity.commons.gc.cuny.edu
Re: [GENERAL] SQL keywords are suddenly case sensitive
On 04/16/2013 07:31 PM, Adrian Klaver wrote:
> On 04/16/2013 02:46 PM, Lee Hachadoorian wrote:
>> List,
>>
>> SQL seems to be behaving in a case-sensitive manner:
>>
>>     universe=# select 1;
>>      ?column?
>>     ----------
>>             1
>>     (1 row)
>>
>>     universe=# SELECT 1;
>>     ERROR:  syntax error at or near "SELECT 1"
>>     LINE 1: SELECT 1;
>>             ^
>>
>> I cannot figure out how this happened, and Google is not helping because
>> all I'm getting is information about case-sensitivity in identifiers or
>> in string comparison. I didn't even think this was possible, as I've
>> always switched between lower case and upper case keywords, usually
>> using lower case while testing and upper case to prettify scripts that I
>> will be saving for reuse.
>
> Interesting. Does this behavior survive logging out and then back into a
> session?

It survives complete restart. (This is a laptop that I use for development and analysis, not a high-availability server, so the first thing I did when I realized my scripts started failing was reboot.)

> Do you have any other client using the database that exhibits this
> behavior?

Same behavior in both psql and pgAdmin.

Regards,
--Lee

--
Lee Hachadoorian
Assistant Professor in Geography, Dartmouth College
http://freecity.commons.gc.cuny.edu
Re: [GENERAL] SQL keywords are suddenly case sensitive
On 04/16/2013 07:34 PM, Adrian Klaver wrote:
> On 04/16/2013 02:46 PM, Lee Hachadoorian wrote:
>> List,
>>
>> SQL seems to be behaving in a case-sensitive manner:
>>
>>     universe=# select 1;
>>      ?column?
>>     ----------
>>             1
>>     (1 row)
>>
>>     universe=# SELECT 1;
>>     ERROR:  syntax error at or near "SELECT 1"
>>     LINE 1: SELECT 1;
>>             ^
>>
>> I cannot figure out how this happened, and Google is not helping because
>> all I'm getting is information about case-sensitivity in identifiers or
>> in string comparison. I didn't even think this was possible, as I've
>> always switched between lower case and upper case keywords, usually
>> using lower case while testing and upper case to prettify scripts that I
>> will be saving for reuse.
>
> Another question: Are the psql and Postgres versions the same?

Appears to both be 9.1.8.

    lee@tycho ~ $ psql -d universe
    psql (9.1.8)
    Type "help" for help.

    universe=# select version();
                                                version
    --------------------------------------------------------------------------------------------------------
     PostgreSQL 9.1.8 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit

Regards,
--Lee

--
Lee Hachadoorian
Assistant Professor in Geography, Dartmouth College
http://freecity.commons.gc.cuny.edu
Re: [GENERAL] SQL keywords are suddenly case sensitive
On 04/16/2013 08:23 PM, Tom Lane wrote:
> ... "SELECT 1" as all one token. Also, if you transcribed this
> accurately, it looks like the error cursor is pointing to the second
> character of the SELECT not the first (could you confirm that?). Which is
> even more bizarre.

No, that must have been an email formatting thing. In psql, the caret is under the S.

Looking at the other issues you raised, but just wanted to provide a quick answer to that.

Best,
--Lee

--
Lee Hachadoorian
Assistant Professor in Geography, Dartmouth College
http://freecity.commons.gc.cuny.edu
Re: [GENERAL] SQL keywords are suddenly case sensitive
On 04/16/2013 08:23 PM, Tom Lane wrote:
> Lee Hachadoorian <lee.hachadooria...@gmail.com> writes:
>> SQL seems to be behaving in a case-sensitive manner:
>>
>>     universe=# select 1;
>>      ?column?
>>     ----------
>>             1
>>     (1 row)
>>
>>     universe=# SELECT 1;
>>     ERROR:  syntax error at or near "SELECT 1"
>>     LINE 1: SELECT 1;
>>             ^
>
> That's really bizarre, but I don't think it's a case sensitivity problem
> as such. Watch what I get from a syntax error on a normally-functioning
> system:
>
>     $ psql
>     psql (9.1.9)
>     Type "help" for help.
>
>     regression=# select 1;
>      ?column?
>     ----------
>             1
>     (1 row)
>
>     regression=# SELECT 1;
>      ?column?
>     ----------
>             1
>     (1 row)
>
>     regression=# xELECT 1;
>     ERROR:  syntax error at or near "xELECT"
>     LINE 1: xELECT 1;
>             ^
>
> See the differences? The error message indicates that your parser saw
> "SELECT 1" as all one token. Also, if you transcribed this accurately, it
> looks like the error cursor is pointing to the second character of the
> SELECT not the first (could you confirm that?). Which is even more
> bizarre.
>
> I'm not sure what's going on, but I think it's more likely to be
> something to do with whitespace not being really whitespace than it is
> with case as such. Consider the possibility that you're somehow typing a
> non-breaking space or some such character. One thing that might be useful
> is to examine the error report in the postmaster log using an editor that
> will show you any non-printing characters.
>
> regards, tom lane

Tom,

Yes, nonbreaking spaces were the problem. Sorry for the red herring re: case sensitivity, and thanks for figuring it out. Adrian, thanks for your input as well.

Best,
--Lee

--
Lee Hachadoorian
Assistant Professor in Geography, Dartmouth College
http://freecity.commons.gc.cuny.edu
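For anyone who hits this later: a suspect statement can be checked for U+00A0 (non-breaking space) directly in psql by pasting it into a string literal. A sketch; strpos() is 1-based and returns 0 when the character is absent:

```SQL
-- The first literal deliberately contains a non-breaking space after SELECT:
SELECT strpos(E'SELECT\u00a01;', E'\u00a0');  -- NBSP found at position 7
SELECT strpos('SELECT 1;', E'\u00a0');        -- 0: plain space, no NBSP
```

Non-breaking spaces typically sneak in when SQL is pasted from a word processor, a web page, or a rich-text email, which is consistent with the symptom here: the lexer does not treat U+00A0 as whitespace, so "SELECT" and "1" fuse into a single token.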
Re: [GENERAL] Current Schema for Functions called within other Functions
Merlin,

Perfect. Thank you.

Best,
--Lee

On Mon, Apr 1, 2013 at 10:04 AM, Merlin Moncure <mmonc...@gmail.com> wrote:
> On Sun, Mar 31, 2013 at 5:32 PM, Lee Hachadoorian
> <lee.hachadooria...@gmail.com> wrote:
>> I'm working on some PL/pgSQL functions to generate dynamic SQL. The
>> functions live in the public schema, but the SQL generated will vary
>> depending on what schema they are called from. Something is going on
>> which I cannot figure out. I am defining variables by creating a getter
>> function within each schema. This getter function is then called by the
>> dynamic SQL function. But this works once, and then the value seems to
>> persist.
>>
>> [test script snipped; it appears in full in the original post]
>>
>> What am I missing?
>
> in plpgsql, all functions and tables that are not schema qualified become
> schema qualified when the function is invoked and planned the first time.
> The line:
>
>     sql := current_schema() || ',' || get_var();
>
> attaches a silent "var1." to get_var() so it will forever be stuck for
> that connection. The solution is to use EXECUTE.
>
> merlin

--
Lee Hachadoorian
Asst Professor of Geography, Dartmouth College
http://freecity.commons.gc.cuny.edu/
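For the archives, Merlin's fix sketched out: routing the expression through EXECUTE forces get_var() to be resolved against the search_path in effect at call time rather than at first-plan time. This assumes the get_var()/sql_dynamic() definitions from the original post.

```SQL
CREATE OR REPLACE FUNCTION sql_dynamic() RETURNS text AS $function$
DECLARE
    sql text;
BEGIN
    -- EXECUTE plans its statement on every call, so get_var() is looked up
    -- fresh instead of being frozen to the schema of the first invocation.
    EXECUTE 'SELECT current_schema() || '','' || get_var()' INTO sql;
    RETURN sql;
END;
$function$ LANGUAGE plpgsql;
```

With this version, switching search_path between var1 and var2 returns the matching schema/value pair each time.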
[GENERAL] Current Schema for Functions called within other Functions
I'm working on some PL/pgSQL functions to generate dynamic SQL. The functions live in the public schema, but the SQL generated will vary depending on what schema they are called from. Something is going on which I cannot figure out. I am defining variables by creating a getter function within each schema. This getter function is then called by the dynamic SQL function. But this works once, and then the value seems to persist.

```SQL
CREATE SCHEMA var1;
CREATE SCHEMA var2;

SET search_path = public;

/* This function generates dynamic SQL; here I have it just returning a
   string with the current schema and the value from the getter function. */
DROP FUNCTION IF EXISTS sql_dynamic();
CREATE FUNCTION sql_dynamic() RETURNS text AS $function$
DECLARE
    sql TEXT := '';
BEGIN
    sql := current_schema() || ',' || get_var();
    RETURN sql;
END;
$function$ LANGUAGE plpgsql;

SET search_path = var1, public;
SELECT get_var();     -- Fails
SELECT sql_dynamic(); -- Fails

DROP FUNCTION IF EXISTS get_var();
CREATE FUNCTION get_var() RETURNS text AS $get_var$
BEGIN
    RETURN 'var1';
END;
$get_var$ LANGUAGE plpgsql;

SELECT get_var();
SELECT sql_dynamic();

SET search_path = var2, public;
SELECT get_var();     -- Fails
SELECT sql_dynamic(); -- Returns current_schema and *old* get_var() value
                      -- from the wrong schema!

DROP FUNCTION IF EXISTS get_var();
CREATE FUNCTION get_var() RETURNS text AS $get_var$
BEGIN
    RETURN 'var2';
END;
$get_var$ LANGUAGE plpgsql;

SELECT get_var();     -- Succeeds
SELECT sql_dynamic(); -- Still returns *old* get_var() value from the wrong schema!
```

At this point, if I DROP/CREATE sql_dynamic() in public, get_var() run in the var2 schema works, but if I change the search_path back to var1, sql_dynamic() returns "var1,var2".

I also tried using a table to store the variable. I created a table var (with one field also named var) in each schema, then altered sql_dynamic() to return current_schema() and the value of var.var (unqualified, so that I expected when search_path includes var1 it would return var1.var.var), but I ran into the same persistence problem. Once initialized in one schema, changing search_path to the other schema returns the correct current_schema but the value from the table in the *other* schema (e.g. "var2,var1").

What am I missing?

Thanks,
--Lee

--
Lee Hachadoorian
Asst Professor of Geography, Dartmouth College
http://freecity.commons.gc.cuny.edu/
[GENERAL] Check table storage parameters
How can I read the current storage parameters for an existing table? Specifically interested in autovacuum_enabled. Sorry to ask such a basic question, but I can't find this in the docs, and every search I've tried ends up taking me to how to *set* the parameter with CREATE TABLE and ALTER TABLE.

Thanks,
--Lee

--
Lee Hachadoorian
Asst Professor of Geography, Dartmouth College
http://freecity.commons.gc.cuny.edu/
Re: [GENERAL] Check table storage parameters
On Fri, Nov 16, 2012 at 10:11 AM, Lee Hachadoorian <lee.hachadooria...@gmail.com> wrote:
> How can I read the current storage parameters for an existing table?
> Specifically interested in autovacuum_enabled.

On Fri, Nov 16, 2012 at 12:08 PM, Mike Blackwell <mike.blackw...@rrd.com> wrote:
> Try pg_class.reloptions?

That was it. Interestingly, if the table uses the server default, reloptions is NULL.

Best,
--Lee

--
Lee Hachadoorian
Asst Professor of Geography, Dartmouth College
http://freecity.commons.gc.cuny.edu/
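For reference, a query that answers the original question (the schema and table names are placeholders):

```SQL
-- NULL reloptions means every storage parameter is at its server default;
-- otherwise it is an array of "name=value" strings such as
-- {autovacuum_enabled=false}.
SELECT c.relname, c.reloptions
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'
  AND c.relname = 'my_table';
```

Joining pg_namespace avoids ambiguity when tables of the same name exist in more than one schema.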
Re: [GENERAL] pg_wrapper error
On Fri, Oct 26, 2012 at 9:37 AM, Kevin Grittner <kgri...@mail.com> wrote:
> José Pedro Santos wrote:
>> When I try to use the command line tool shp2pgsql in the shell I have
>> this error:
>>
>>     Error: pg_wrapper: invalid command name
>>
>> I already read some information in the Debian lists but I don't
>> understand the problem/relation with PostgreSQL.
>
> shp2pgsql converts ESRI Shape files into SQL suitable for insertion into
> a PostGIS/PostgreSQL database. You are not using PostgreSQL until you
> take the resulting file and run it against the database using some client
> application, like psql.
>
> pg_wrapper is an Ubuntu helper program that runs a PostgreSQL client
> program like psql after looking up which version to use for the requested
> database cluster.
>
> http://manpages.ubuntu.com/manpages/lucid/man1/shp2pgsql.1.html
> http://manpages.ubuntu.com/manpages/lucid/man1/pg_wrapper.1.html
>
> It appears that you don't have pg_wrapper installed, but are trying to
> use it (either directly or by running something which tries to use it).
>
> -Kevin

No, pg_wrapper is installed. If it weren't, you would get

    pg_wrapper: command not found

The issue is that it seems raster2pgsql and shp2pgsql are not installed. You cannot enable them by making them each a link to pg_wrapper. You might want to bring this back to the PostGIS list, because I think this is a problem of an incorrect PostGIS installation.

--Lee

--
Lee Hachadoorian
Asst Professor of Geography, Dartmouth College
http://freecity.commons.gc.cuny.edu/
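On Debian/Ubuntu the loader binaries ship with the PostGIS package rather than with PostgreSQL itself, so a likely fix is simply installing it. A sketch; the exact package name varies by distribution release:

```shell
sudo apt-get install postgis
# Verify the loaders are now on the PATH:
which shp2pgsql raster2pgsql
```

If `which` still comes up empty after installation, that would support the theory above that the PostGIS installation itself is broken.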
Re: [GENERAL] dblink.sql and Linux
On 05/14/2012 09:19 PM, Mike Christensen wrote:
> I just installed Ubuntu 12.04 today. Postgres was not listed in the
> Ubuntu Software Center, so I downloaded the apt installer from:
>
> http://www.openscg.com/se/oscg_home_download.jsp
>
> Mike
>
> On Mon, May 14, 2012 at 6:10 PM, Devrim GÜNDÜZ <dev...@gunduz.org> wrote:
>> Hi,
>>
>> On Mon, 2012-05-14 at 18:05 -0700, Mike Christensen wrote:
>>> I've become a big fan of DBLink lately, but I'm curious where it lives
>>> on Linux installations.
>>
>> Which Linux? Which package/installer? It mostly ships with the -contrib
>> package. Full list of Ubuntu packages is available at
>> packages.ubuntu.com.

Postgres 9.1 (default) and 8.4 are in the repository. The Ubuntu Software Center hides stuff from you. If you search for postgres, you won't find it unless you click a not-very-obvious "Show [x] technical items" link at the bottom of the window. I would recommend using apt (command line) or Synaptic (graphical) instead of Software Center.

    sudo apt-get install postgresql postgresql-contrib

should get you started.

--Lee

--
Lee Hachadoorian
PhD, Earth & Environmental Sciences (Geography)
Research Associate, CUNY Center for Urban Research
http://freecity.commons.gc.cuny.edu
[GENERAL] Multiple COPY statements
Does anyone have experience or advice on how to efficiently issue a large number of COPY statements? The data (US Census) comes in 100 segments (each will be copied to its own database tables) for each state (51), for a total of 5000 text files. I can generate the COPY statements with a script. The two specific questions I can think of (but I'm sure there are more that I'm not thinking of) are:

1) The docs say: "COPY is fastest when used within the same transaction as an earlier CREATE TABLE or TRUNCATE command. In such cases no WAL needs to be written, because in case of an error, the files containing the newly loaded data will be removed anyway." Would I be able to take advantage of this if I:

    BEGIN;
    TRUNCATE import_table;
    COPY import_table FROM 'file1';
    COPY import_table FROM 'file2';
    ...
    COPY import_table FROM 'file51';
    END;

2) Is there a performance hit to doing a COPY to more than one table in the same transaction?

Any other advice will be appreciated.

Regards,
--Lee

--
Lee Hachadoorian
PhD, Earth & Environmental Sciences (Geography)
Research Associate, CUNY Center for Urban Research
http://freecity.commons.gc.cuny.edu/
Re: [GENERAL] Multiple COPY statements
On Thu, May 10, 2012 at 2:42 PM, Andy Colson <a...@squeakycode.net> wrote:
> On 5/10/2012 1:10 PM, Lee Hachadoorian wrote:
>> 2) Is there a performance hit to doing a COPY to more than one table in
>> the same transaction?
>
> No, I don't think so. I assume you are the only user hitting the
> import_table, so holding one big transaction won't hurt anything.

Actually what I mean is that there are multiple import tables, import_table1 ... import_table100. But it is true that I would be the only user hitting the import tables.

>> Any other advice will be appreciated.
>
> To really speed it up, you'd need to run multiple concurrent connections
> each doing COPYs. Maybe up to the number of cores you have. (Of course
> you don't want each connection to fire off truncates, but concurrent
> should trump skip-WAL in terms of speed.) If import_table is just a temp
> holding slot you can look into temp and/or unlogged tables.

Yes, it is a staging table; data needs to be manipulated before shunting to its desired destination. I think unlogged tables will be helpful, and if I understand correctly then I wouldn't need to use the BEGIN; TRUNCATE; COPY ...; END; trick. And would unlogged + concurrent connections work together?

--Lee
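A sketch of the unlogged staging-table approach discussed above (the column list and file path are illustrative only):

```SQL
-- An unlogged table skips WAL entirely, so the BEGIN/TRUNCATE/COPY trick
-- becomes unnecessary. The trade-off: its contents are truncated after a
-- crash, which is acceptable for a reloadable staging table.
CREATE UNLOGGED TABLE import_table1 (
    logrecno integer,
    col1 numeric,
    col2 numeric
);

COPY import_table1 FROM '/data/state01_seg01.txt' WITH (FORMAT csv);
```

Because each segment has its own table, separate connections can each COPY into their own unlogged table concurrently, combining both of the speedups Andy describes.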
Re: [GENERAL] huge price database question..
On Tue, Mar 20, 2012 at 11:28 PM, Jim Green <student.northwest...@gmail.com> wrote:
> On 20 March 2012 22:57, John R Pierce <pie...@hogranch.com> wrote:
>> avg() in the database is going to be a lot faster than copying the data
>> into memory for an application to process.
>
> I see..

As an example, I ran average on a 700,000 row table with 231 census variables reported by state. Running average on all 231 columns grouping by state inside Postgres beat running it in R by a factor of 130, NOT COUNTING an additional minute or so to pull the table from Postgres to R. To be fair, these numbers are not strictly comparable, because it's running on different hardware. But the setup is not atypical: Postgres is running on a heavy-hitting server while R is running on my desktop.

    SELECT state, avg(col1), avg(col2), [...] avg(col231)
    FROM some_table GROUP BY state;

5741 ms

    aggregate(dfSomeTable, by = list(dfSomeTable$state), FUN = mean, na.rm = TRUE)

754746 ms

--Lee
Re: [GENERAL] huge price database question..
On Wed, Mar 21, 2012 at 12:57 PM, Andy Colson <a...@squeakycode.net> wrote:
> On 3/21/2012 11:45 AM, Lee Hachadoorian wrote:
>> [timing comparison snipped; it appears in full in the previous message]
>
> avg() might not be a good example though. If you just want average,
> great, but what about when you want to do lots of different stats'y
> functions that PG does not have? I'll bet R (not pl-R) can read a text
> file (a csv probably) pretty fast.

Don't get me wrong. I DO use R for all kinds of stats and graphics stuff. I just tested avg() right now to test the claim made above. I did some garbage collection and reran the aggregate() on half the data, and this time it only took 80 seconds. Much faster, though still an order of magnitude slower than running it inside Postgres. The 700,000 row data frame in an unclean R session swamped the 6 GB RAM on my desktop. Maybe the original test was unfair. On the other hand, it does directly speak to John's comment that avg() in the database is going to be a lot faster than copying the data into memory for an application to process.

> If you were doing a really complex analysis I bet dumping a dataset out
> to csv first and then running R scripts over it would be fast and useful.

Interesting:

* COPY statement (same table, 700,000 rows): 7761 ms
* SFTP file copy from Postgres server to local machine: ~21 seconds (didn't time it and my FTP client didn't report it)
* read.csv (R command): 62 seconds

Total ~ 91 seconds

R commands:

    library(RPostgreSQL)
    # declare connection object conn, then:
    dfSomeTable = dbGetQuery(conn, "SELECT * FROM some_table")

~ 85 seconds

So whether you export to CSV first or use RPostgreSQL, it's roughly equivalent, though I would tend to favor RPostgreSQL just because it's one step instead of three. On the other hand:

    sql = "SELECT state, avg(col1), avg(col2), [...] avg(col231)
           FROM some_table GROUP BY state;"
    dfGrouped = dbGetQuery(conn, sql)

~ 5 seconds

So aggregating inside Postgres, then pulling the smaller data set via RPostgreSQL, wins on two counts: speed of aggregation and speed of transfer.

>> aggregate(dfSomeTable, by = list(dfSomeTable$state), FUN = mean,
>> na.rm = TRUE)
>
> Is this a Postgres stored proc (pl-R)? Or is that R itself? If it's plR
> then I wonder if it's stepping through the recordset twice. Depending on
> how the function is written, I can see the function firing off a query,
> PG collects the entire recordset, and hands it back to the function,
> which then iterates thru it again. (vs. having the function called for
> each row as PG steps thru the recordset only once).

This is run in R. The data frame (dfSomeTable in this code snippet) is already in R, so no trips to Postgres are taking up time in this process. The real reason for the inflated time is pointed out above.

--Lee

--
Lee Hachadoorian
PhD, Earth & Environmental Sciences (Geography)
Research Associate, CUNY Center for Urban Research
http://freecity.commons.gc.cuny.edu/
Re: [GENERAL] DDL Triggers
On Wed, Feb 15, 2012 at 12:42 PM, Bob Pawley <rjpaw...@shaw.ca> wrote:
> I have a table that is generated through ogr2ogr. To get ogr2ogr working
> the way I want, I need to use the -overwrite function. If I use the
> append function information is lost. Something to do with the way the
> switches work. Overwrite drops the existing table and also the attached
> trigger. I am attempting to have the dropped table trigger a function as
> it is being populated by insert commands. DDL trigger was mentioned as a
> possibility on the GDAL list. Any suggestions will be welcome.
>
> Bob

Are you sure ogr2ogr is the right tool for what you want to accomplish? Reading between the lines, it *seems* like you intend to do an append (SQL INSERT), but ogr2ogr isn't working the way you want, forcing you to use overwrite (DROP/CREATE TABLE). Trying to use a DDL trigger seems like a roundabout way to get what you want when the problem is ogr2ogr. So I would back up and ask: what are you trying to do, and what information is being lost using -append?

Also, you mentioned asking this on the GDAL list; did you try the PostGIS list?

--Lee

--
Lee Hachadoorian
PhD, Earth & Environmental Sciences (Geography)
Research Associate, CUNY Center for Urban Research
http://freecity.commons.gc.cuny.edu/
Re: [GENERAL] JOIN column maximum
On Thu, Jan 5, 2012 at 10:19 PM, Darren Duncan dar...@darrenduncan.net wrote: This all being said, 23K values per row just sounds wrong, and I can't imagine any census forms having that many details. Do you, by chance, have multiple values of the same type that are in different fields, eg telephone_1, telephone_2 or child_1, child_2 etc? You should take any of those and collect them into array-typed fields, or separate tables with just telephone or child columns. Or do you say have a set of coordinates in separate fields? Or you may have other kinds of redundancy within single rows that are best normalized into separate rows. With 23K values, these probably have many mutual associations, and you could split that table into a bunch of other ones where columns that relate more closely together are collected. My question is already answered, so this is mostly for anyone curious about Census data, and if anyone wants to add to this, feel free. You're right that no census form has that many details. The American Community Survey replaced the old Census long form, so it does have more details than the form sent out for the 2010 Census, but not 23,000 questions. It might ask, e.g. income, sex, and how you travel to work. But the forms are private, so that data (the so-called microdata) is not released in its entirety. What I am working with is called the summary file, which presents the data in aggregate. That means you have an income table with 16 income classes, plus a total column. Then you have 9 more tables which show the same income classes by 9 racial and ethnic categories, for a total of 153 more columns. Then you also have a table which crosses 9 income classes by 5 mobility statuses (living in same house, moved within county, moved within state, etc.) for a total of 55 columns. Then you have income classes crossed with sex, income classes crossed with mode of transportation to work, sex crossed with mode of transportation to work, etc. 
When all is said and done, you have 23,000 columns. Believe me, we would all love to be working directly with the microdata. All of these different ways of slicing and dicing the categories are basically how the Census Bureau provides as much detail as possible without providing so much detail that privacy would be compromised (i.e., enough information is available that specific individuals could be identified). That inevitably leads to a great deal of redundancy in the data, since the same individuals are just being counted in different groups in different tables. Given all that, one could still take the data that came from Census and try to normalize it and organize it, but my main goal given the size of the dataset is to keep it as similar as possible to the source, so that a researcher familiar with the Census data can work with our database using the Census documentation without our having to produce a new set of documentation. The way I had done that was to store the sequences (which are merely a data dissemination convenience, and have no relationship to the logic of the data) in Postgres tables, and make the subject tables (which do represent logical groupings of data) into views. I'm thinking about making the sequences into array columns. The subject tables would still be views. --Lee -- Lee Hachadoorian PhD, Earth Environmental Sciences (Geography) Research Associate, CUNY Center for Urban Research http://freecity.commons.gc.cuny.edu/
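The sequences-as-array-columns plus subject-tables-as-views layout described here might look something like the following sketch; the table, view, and variable names are illustrative, not the actual schema:

```sql
-- One table per Census sequence, with all its variables in one array
-- column (PostgreSQL arrays are 1-based by default).
CREATE TABLE seq0017 (
    geoid varchar PRIMARY KEY,
    vals  bigint[]           -- the ~200 variables of sequence 17, in order
);

-- A subject table is then a view that unpacks the relevant elements,
-- keeping the Census's own table/variable naming for researchers.
CREATE VIEW b19001 AS        -- e.g. household income
SELECT geoid,
       vals[1] AS total,
       vals[2] AS income_lt_10k,
       vals[3] AS income_10k_15k
       -- ... one element per Census variable
FROM seq0017;
```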
[GENERAL] JOIN column maximum
How is the number of columns in a join determined? When I combine somewhere around 90 tables in a JOIN, the query returns: ERROR: joins can have at most 32767 columns SQL state: 54000 I'm sure most people will say "Why the hell are you joining 90 tables?" I've asked this list before for advice on how to work with the approximately 23,000 column American Community Survey dataset, and based on previous responses I am trying to combine 117 sequences (basically vertical partitions of the dataset) into one table using array columns. Of course, I can build this up by joining a few tables at a time, so the question is mostly curiosity, but I haven't been able to find this documented anywhere. Moreover, the 32767 limit doesn't map to any immediately intuitive transformation of 90, like squaring (which is much too low) or factorial (which is much too high). Any insight? Regards, --Lee -- Lee Hachadoorian PhD, Earth Environmental Sciences (Geography) Research Associate, CUNY Center for Urban Research http://freecity.commons.gc.cuny.edu/
Re: [GENERAL] JOIN column maximum
On 01/05/2012 06:18 PM, Tom Lane wrote: ERROR: joins can have at most 32767 columns It's the sum of the number of columns in the base tables. That makes sense. I totally misunderstood the message to be referring to the number of joined columns rather than table columns. I've asked this list before for advice on how to work with the approximately 23,000 column American Community Survey dataset, Are there really 23000 populated values in each row? I hesitate to suggest an EAV approach, but it kinda seems like you need to go in that direction. You're never going to get decent performance out of a schema that requires 100-way joins, even if you avoid bumping up against hard limits. Many of the smaller geographies, e.g. census tracts, do in fact have data for the vast majority of the columns. I am trying to combine it all into one table to avoid the slowness of multiple JOINs (even though in practice I'm never joining all the tables at once). EAV sounds correct in terms of normalization, but isn't it usually better performance-wise to store write-once/read-many data in a denormalized (i.e. flattened) fashion? One of these days I'll have to try to benchmark some different approaches, but for now planning on using array columns, with each sequence (in the Census sense, not the Postgres sense) of 200+ variables in its own array rather than its own table. --Lee -- Lee Hachadoorian PhD, Earth Environmental Sciences (Geography) Research Associate, CUNY Center for Urban Research http://freecity.commons.gc.cuny.edu -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
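The EAV (entity-attribute-value) approach Tom hesitantly suggests would look roughly like this sketch, with illustrative names; it trades the wide rows and 100-way joins for one tall, narrow table:

```sql
-- One row per (geography, variable) pair instead of one 23,000-column row.
CREATE TABLE acs_value (
    geoid    varchar,
    variable varchar,    -- Census variable code, e.g. 'B19001_002'
    estimate bigint,
    PRIMARY KEY (geoid, variable)
);

-- A "subject table" is then a filtered slice rather than a join:
SELECT geoid, variable, estimate
FROM acs_value
WHERE variable LIKE 'B19001%';
```

Whether this beats the array-column layout for write-once/read-many data is an empirical question, as the message above notes.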
[GENERAL] Significant Digits in Floating Point Datatype
I would like to store in a single array some data that is conceptually related, but some of the data is floating point, and some of it is integer. Obviously the integer data *can* be stored as double precision, but I need to know about potential loss of precision. Double precision has a precision of "at least 15 digits". I would assume that that would mean that for double precision, 15 digits of data would be faithfully preserved. But I found a question on the list where a value stored as 955.60 sometimes returns as 955.5998. (http://archives.postgresql.org/pgsql-general/2011-08/msg00144.php) If this is the case, what does a precision of "at least [x] digits" actually mean? And can I reliably retrieve the original integer by casting to int (or bigint) if the number of digits in the original integer is less than 15? Regards, --Lee -- Lee Hachadoorian PhD, Earth Environmental Sciences (Geography) Research Associate, CUNY Center for Urban Research http://freecity.commons.gc.cuny.edu
Re: [GENERAL] Significant Digits in Floating Point Datatype
On 11/20/2011 02:06 PM, Tom Lane wrote: Lee Hachadoorian <lee.hachadoor...@gmail.com> writes: And can I reliably retrieve the original integer by casting to int (or bigint) if the number of digits in the original integer is less than 15? On IEEE-floating-point machines, I'd expect float8 to store integers up to 2^52 (or maybe it's 2^53) exactly. With other floating-point formats the limit might be different, but it should still be exact for reasonable-size integers. This has nothing whatever to do with whether decimal fractions are reproduced exactly (in general, they aren't, no matter how many or few digits are involved). So integers are fine, bigints not so much. regards, tom lane Thank you, that clarification is extremely useful. --Lee -- Lee Hachadoorian PhD, Earth Environmental Sciences (Geography) Research Associate, CUNY Center for Urban Research http://freecity.commons.gc.cuny.edu
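Tom's point can be checked directly in psql. An IEEE 754 double has a 53-bit significand, so integers up to 2^53 round-trip through float8 exactly, while 2^53 + 1 is not representable and rounds:

```sql
-- 2^53 is exactly representable as float8; the round trip is lossless.
SELECT (9007199254740992::float8)::bigint;  -- 9007199254740992

-- 2^53 + 1 is not representable; it rounds to the nearest even double,
-- so the original bigint is NOT recovered.
SELECT (9007199254740993::float8)::bigint;  -- 9007199254740992
```

For the 15-or-fewer-digit integers asked about in the original question (all below 2^53 ≈ 9.0 × 10^15), casting back to int/bigint is therefore safe.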
Re: [GENERAL] Large Rows
On 10/26/2011 12:31 AM, David Johnston wrote: On Oct 25, 2011, at 22:17, Lee Hachadoorianlee.hachadoor...@gmail.com wrote: I need some advice on storing/retrieving data in large rows. Invariably someone points out that very long rows are probably poorly normalized, but I have to deal with how to store a dataset which cannot be changed, specifically the ~23,000 column US Census American Community Survey. The Census releases these data in 117 sequences of 256 columns (in order to be read by spreadsheet applications with a 256 column limit). I have previously stored each sequence in its own table, which is pretty straightforward. My problem is that some of the demographic researchers I work with want a one-table dump of the entire dataset. This would primarily be for data transfer. This is of limited actual use in analysis, but nonetheless, that's what we want to be able to do. Now, I can't join all the sequences in one SQL query for export because of the 1600 column limit. So based on previous list activity (Tom Lane: Perhaps you could collapse multiple similar columns into an array column? http://archives.postgresql.org/pgsql-admin/2008-05/msg00211.php), I decided to try to combine all the sequences into one table using array columns. (This would actually make querying easier since the users wouldn't have to constantly JOIN the sequences in their queries.) Next problem: I run into the 8k row size limit once about half the columns are populated. As far as I can understand, even though a row theoretically supports a 1.6TB (!) row size, this only works for TOASTable data types (primarily text?). The vast majority of the 23k columns I'm storing are bigint. Questions: 1) Is there any way to solve problem 1, which is to export the 23k columns from the database as it is, with 117 linked tables? 2) Is there any way to store the data all in one row? If numeric types are un-TOASTable, 23k columns will necessarily break the 8k limit even if they were all smallint, correct? 
Regards, --Lee -- Lee Hachadoorian PhD, Earth Environmental Sciences (Geography) Research Associate, CUNY Center for Urban Research http://freecity.commons.gc.cuny.edu You can brute-force a 23k column CSV output file using a programming language but if you need to keep it in a database the fact we are talking about being over the numeric column limit by a factor of twenty means you are basically SOL with PostgreSQL. Even if such a table were possible how it, in its entirety, would be useful is beyond me. It's not, as no one would ever analyze all the variables at once. Doing this with a programming language is probably the way to go. But am I correct that using arrays to reduce the number of columns won't work because numeric data types aren't TOASTable? There are few things that cannot be changed, and this requirement is unlikely to be one of those things. Your problems are more political than technical and those are hard to provide advice for in an e-mail. If you need technical solutions there may be another tool out there that can get you what you want but stock PostgreSQL isn't going to cut it. Not having any idea what those 23k columns are doesn't help either; the census questionnaire isn't that big... The vast majority of the columns represent population counts. Sometimes it might represent a dollar amount (income or contract rent, for example). While a sample of individual questionnaires is released (the microdata), this question concerns the summary files, where the individual answers are categorized/bucketed and aggregated by various geographies. So a cell might represent number of people in a county (row) who commuted to work by bicycle (column). The number of columns grows when various categories are crossed with each other.
Table B08519 - MEANS OF TRANSPORTATION TO WORK BY WORKERS' EARNINGS IN THE PAST 12 MONTHS contains 6 transportation modes crossed by 8 income classes, for 63 columns once subtotals are added. The complete list of variables is available at http://www2.census.gov/acs2009_5yr/summaryfile/Sequence_Number_and_Table_Number_Lookup.xls. Instead of giving them what they think they want, talk to them and then try to provide them what they actually need given the limitations of your current toolset, or resolve to find a more suitable tool if the needs are valid but cannot be met with the existing tools. David J. Regards, --Lee -- Lee Hachadoorian PhD, Earth Environmental Sciences (Geography) Research Associate, CUNY Center for Urban Research http://freecity.commons.gc.cuny.edu
Re: [GENERAL] Large Rows
On Wed, Oct 26, 2011 at 10:15 AM, Simon Riggs si...@2ndquadrant.com wrote: On Wed, Oct 26, 2011 at 2:57 PM, Lee Hachadoorian lee.hachadoor...@gmail.com wrote: Interesting. Although your example of one, 10-dimension array works, five hundred 2-dimension arrays does not work. I can do the SELECT, but the CREATE TABLE fails: ERROR: row is too big: size 9024, maximum size 8160 SQL state: 54000 David has already hit the nail on the head in terms of this being a political problem rather than a technology problem. I'm open to ideas, but I realize there might be no other answer than "No one in their right mind should do this." No, this is a technology problem. TOAST pointers are 20 bytes per column, so with 500 columns that is 10,000 bytes, which will not fit in one 8k block. If you wish to fit this in then you should use a 2-dimensional array, which will then be just 1 column and your data will fit. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services Very useful to know. Thank you. --Lee -- Lee Hachadoorian PhD, Earth Environmental Sciences (Geography) Research Associate, CUNY Center for Urban Research http://freecity.commons.gc.cuny.edu/
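Simon's suggestion, sketched with illustrative names: a single 2-dimensional array column means the row carries one TOAST pointer instead of 500, so it fits comfortably in an 8k block. Note that PostgreSQL multidimensional arrays must be rectangular, so shorter sequences would need padding (e.g. with NULLs):

```sql
-- One 2-D array column: vals[sequence][variable].
CREATE TABLE acs_wide (
    geoid varchar PRIMARY KEY,
    vals  bigint[][]   -- 117 sequences x up to ~256 variables each
);

-- e.g. the 5th variable of sequence 17 for one geography:
SELECT vals[17][5] FROM acs_wide WHERE geoid = '36061';
```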
[GENERAL] Large Rows
I need some advice on storing/retrieving data in large rows. Invariably someone points out that very long rows are probably poorly normalized, but I have to deal with how to store a dataset which cannot be changed, specifically the ~23,000 column US Census American Community Survey. The Census releases these data in 117 sequences of 256 columns (in order to be read by spreadsheet applications with a 256 column limit). I have previously stored each sequence in its own table, which is pretty straightforward. My problem is that some of the demographic researchers I work with want a one-table dump of the entire dataset. This would primarily be for data transfer. This is of limited actual use in analysis, but nonetheless, that's what we want to be able to do. Now, I can't join all the sequences in one SQL query for export because of the 1600 column limit. So based on previous list activity (Tom Lane: Perhaps you could collapse multiple similar columns into an array column? http://archives.postgresql.org/pgsql-admin/2008-05/msg00211.php), I decided to try to combine all the sequences into one table using array columns. (This would actually make querying easier since the users wouldn't have to constantly JOIN the sequences in their queries.) Next problem: I run into the 8k row size limit once about half the columns are populated. As far as I can understand, even though a row theoretically supports a 1.6TB (!) row size, this only works for TOASTable data types (primarily text?). The vast majority of the 23k columns I'm storing are bigint. Questions: 1) Is there any way to solve problem 1, which is to export the 23k columns from the database as it is, with 117 linked tables? 2) Is there any way to store the data all in one row? If numeric types are un-TOASTable, 23k columns will necessarily break the 8k limit even if they were all smallint, correct? 
Regards, --Lee -- Lee Hachadoorian PhD, Earth Environmental Sciences (Geography) Research Associate, CUNY Center for Urban Research http://freecity.commons.gc.cuny.edu -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] The first dedicated PostgreSQL forum
If anyone's interested, I've started accessing the postgres list through gmane.org (along with several other mailing lists I subscribe to). It gives you the choice of reading the list as a threaded archive, a blog, or through an NNTP newsreader or an RSS feed. Everyone chooses their preferred interface; the community is not fractured by interface preference. --Lee On 11/15/2010 03:13 PM, Joshua D. Drake wrote: On Mon, 2010-11-15 at 21:06 +0100, Magnus Hagander wrote: I do think that the PostgreSQL lists are available there. And other than that, they are on markmail.org, Nabble, etc. AFAIK several of those allow both reading and posting. Is there actually something about these interfaces that people find *missing*, or can we easily solve this whole thing by more clearly telling people that these options exist? Mainly I think it is the user interface and the fact that they are external. They don't look, act or feel like forums. Shrug. Further they aren't part of postgresql.org so nobody knows the level of real support they are going to get. JD -- Lee Hachadoorian PhD Student, Geography Program in Earth Environmental Sciences CUNY Graduate Center
Re: [GENERAL] Connection timeouts from pgAdmin
John, Just wanted to reply that this seems to have been the right track. Rather than change the firewall settings, our network administrator was able to set Postgres to send a keepalive to the client. Thanks, --Lee On Thu, Mar 4, 2010 at 5:26 PM, Lee Hachadoorian lee.hachadoor...@gmail.com wrote: On Thu, Mar 4, 2010 at 5:01 PM, John R Pierce pie...@hogranch.com wrote: are you running pgadmin and postgres server on the same computer, or on different computers? Different computers. if different computers, is there any sort of connection tracking in between, such as a NAT router/gateway? 15-20 mins sounds a lot like the typical NAT idle connection timeout... I will have to ask the network administrator and respond. Thanks, --Lee
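The server-side fix described here corresponds to the tcp_keepalives settings in postgresql.conf. A sketch; the specific values are illustrative, and the idle interval just needs to be shorter than the NAT device's timeout:

```
# postgresql.conf - send TCP keepalives before the NAT idle timeout fires
tcp_keepalives_idle = 300      # seconds of idle before the first keepalive
tcp_keepalives_interval = 30   # seconds between unacknowledged keepalives
tcp_keepalives_count = 3       # lost keepalives before the connection is dropped
```

These parameters are ignored on connections made via Unix-domain sockets, and on some platforms (notably older Windows) only a subset is supported.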
[GENERAL] Connection timeouts from pgAdmin
I use the SQL editor in pgAdmin for all of my database work. The problem I'm having is that after a period of inactivity, pgAdmin loses the connection to the server. I'm trying to figure out how to avoid this happening. Not being a network administrator, I'm wondering if this is related to the tcp_keepalives settings described at http://www.postgresql.org/docs/8.4/static/runtime-config-connection.html. Can someone confirm that I'm looking at the right settings, or is there something else that might cause clients to lose the connection? The connection seems to be lost after ~15-20 minutes of inactivity. Thanks, --Lee -- Lee Hachadoorian PhD Student, Geography Program in Earth Environmental Sciences CUNY Graduate Center
Re: [GENERAL] Connection timeouts from pgAdmin
On Thu, Mar 4, 2010 at 5:01 PM, John R Pierce pie...@hogranch.com wrote: are you running pgadmin and postgres server on the same computer, or on different computers? Different computers. if different computers, is there any sort of connection tracking in between, such as a NAT router/gateway? 15-20 mins sounds a lot like the typical NAT idle connection timeout... I will have to ask the network administrator and respond. Thanks, --Lee -- Lee Hachadoorian PhD Student, Geography Program in Earth Environmental Sciences CUNY Graduate Center -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Array columns vs normalized table
I work with state labor data which is reported to us in the form industry, year, quarter1, quarter2, quarter3, quarter4 where each quarter represents an employment count. Obviously, this can be normalized to industry, year, quarter, employment Can anyone comment on, or point me to an article or discussion regarding, why one would use an array column instead of normalizing the data? That is, would there be any benefit to storing it as industry int, year smallint, employment int[] where the last column would be a four-element array with data for the four quarters. Thanks, --Lee -- Lee Hachadoorian PhD Student, Geography Program in Earth Environmental Sciences CUNY Graduate Center
Re: [GENERAL] Array columns vs normalized table
Pavel, the idea of using arrays to store long time-series data sounds good, but I take your point that normalized tables might be better and are easier to query. I suppose the choice will be between normalizing or using the denormalized industry int, year smallint, emp_q1 int, emp_q2 int, emp_q3 int, emp_q4 int as suggested by Peter and Scott. We're actually mostly interested in annual numbers, but need to preserve the quarterly data for verification and for unusual analyses. So perhaps we'll store it denormalized with an additional emp_annual int field, and add a view that keeps the quarterly data out of sight. Thanks for your replies. Please feel free to comment if you think of anything else. Best, --Lee -- Lee Hachadoorian PhD Student, Geography Program in Earth Environmental Sciences CUNY Graduate Center
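The denormalized layout plus hide-the-quarters view being considered here might be sketched as follows (names are illustrative):

```sql
-- Denormalized quarterly employment, one row per industry-year.
CREATE TABLE employment (
    industry   int,
    year       smallint,
    emp_q1     int,
    emp_q2     int,
    emp_q3     int,
    emp_q4     int,
    emp_annual int,
    PRIMARY KEY (industry, year)
);

-- Most analyses only need the annual figure; the view keeps the
-- quarterly columns out of sight without losing them.
CREATE VIEW employment_annual AS
SELECT industry, year, emp_annual
FROM employment;
```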
[GENERAL] Client Encoding and Latin characters
My database is encoded UTF8. I recently was uploading (via COPY) some census data which included place names with ñ, é, ü, and other such characters. The upload choked on the Latin characters. Following the docs, I was able to fix this with: SET CLIENT_ENCODING TO 'LATIN1'; COPY table FROM 'filename'; After which I SET CLIENT_ENCODING TO 'UTF8'; I typically use COPY FROM to bulk load data. My question is, is there any disadvantage to setting the default client_encoding as LATIN1? I expect to never be dealing with Asian languages, or most of the other LATINx languages. If I ever try to COPY FROM data incompatible with LATIN1, the command will just choke, and I can pick an appropriate encoding and try again, right? Thanks, --Lee -- Lee Hachadoorian PhD Student, Geography Program in Earth Environmental Sciences CUNY Graduate Center -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
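As an aside for later readers: from PostgreSQL 9.1 the encoding can also be specified on the COPY command itself, so the session's client_encoding never changes. Table name and path here are illustrative:

```sql
-- Per-command encoding (9.1+): no SET CLIENT_ENCODING needed before/after.
COPY some_table FROM '/path/to/census_data.txt' WITH (ENCODING 'LATIN1');
```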
Re: [GENERAL] Client Encoding and Latin characters
Uh, no. You can pretty much assume that LATIN1 will take any random byte string; likewise for any other single-byte encoding. UTF8 as a default is a bit safer because it's significantly more likely that it will be able to detect non-UTF8 input. regards, tom lane

So, IIUC, the general approach is:
* Leave the default client_encoding = server_encoding (in this case UTF8)
* Rely on the client to change client_encoding on a session basis only

Thanks, --Lee -- Lee Hachadoorian PhD Student, Geography Program in Earth Environmental Sciences CUNY Graduate Center
[GENERAL] PostgreSQL won't start
Last week I set up Postgres 8.3 on a WindowsXP machine. Had it up and running and imported data. Now when I try to start the server (after a machine restart) I get the message: pg_ctl: another server might be running; trying to start server anyway PANIC: could not open control file global/pg_control: Permission denied This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information. server starting In the archives I've seen suggestions to make sure to start using the postgres account (I am), make sure postgres is configured as a service (it is), and make sure global/pg_control and the rest of the PostgreSQL directory has read/write access (it does). What else can I try to start the server? Thanks, Lee Hachadoorian PhD Student, Geography Program in Earth Environmental Sciences CUNY Graduate Center -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] PostgreSQL won't start
It was far stupider than that. I had been playing around with a couple of different data clusters before doing a complete reinstall of PostgreSQL. I just realized I was trying to start a cluster that I was no longer using and the postgres account didn't have appropriate permissions for. It's one of those things where once you ask the question, you realize that the answer is implicit in the question. But a useful followup question is, how do I make this start itself when Windows starts? The service is set up to run as postgres and execute C:\Program Files\PostgreSQL\8.3\bin\pg_ctl.exe runservice -w -N pgsql-8.3 -D C:\Program Files\PostgreSQL\8.3\data\ This *is* pointing to the right data cluster (which I'm able to start successfully from the command line), but it's not starting automatically (even though it's configured to) and when I try to start it manually within the Component Services Manager, it generates the following error: Error 1069: The service did not start due to a logon failure. Thanks, Lee Hachadoorian PhD Student, Geography Program in Earth Environmental Sciences CUNY Graduate Center On Wed, Mar 12, 2008 at 2:05 PM, Scott Marlowe [EMAIL PROTECTED] wrote: On Wed, Mar 12, 2008 at 10:49 AM, Lee Hachadoorian [EMAIL PROTECTED] wrote: Last week I set up Postgres 8.3 on a WindowsXP machine. Had it up and running and imported data. Now when I try to start the server (after a machine restart) I get the message: pg_ctl: another server might be running; trying to start server anyway PANIC: could not open control file global/pg_control: Permission denied Or it could just be that you do already have another postmaster up and running already. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general