Re: [GENERAL] FW: Constraint exclusion in partitions
Hi Daniel: Bill Moran already pointed out some things; I more or less agree with him. On Sun, May 24, 2015 at 12:16 AM, Daniel Begin jfd...@hotmail.com wrote: ... I use pgadmin_III sql window. I'll just speak from my experience. At work, more than three fourths of the problems I've had to help my pgadmin-using colleagues solve were due to their usage of pgadmin. Nearly every time they did some db administration task, they did it poorly and began to do it correctly when using plain postgresql. This may be anecdotal. ... I write the following query (I have changed the id to make sure it does not use previous results still in memory)... That is exactly the opposite of what you should do, unless you are able to thoroughly clean the server caches ( not as easy as it sounds ) between runs. You want warm caches first, to account for the times in planning, hashing, joining and so on. Once you've got that part ready, you can move on to real queries, a busy server, etc. for more fine tuning. ... Now I select explain query from the menu and I get the following result... ... Now, I select run and I get one record as a result and the following message in the history tab... ... As I never use pgadmin, I cannot tell you too much, but one thing I know is that NONE of MY users was able to tell me what pgadmin does when you hit run and/or explain query. OTOH, I know what psql does. And if you use EXPLAIN ANALYZE, as suggested in my previous message, you'll be better off ( as in this case you just get a single query result from the server and you do not have to care about what your tool of choice does in the middle ). ... lotta deletions, I really need to get much faster results with my queries on this large table and partitioning was my last option... Make sure you are not having an XY-problem. Also, from the type of queries you've shown us, I do not think partitioning is the way to go. 
It will only give you marginal improvements when querying for single ids or ranges of ids ( and small ones, if your ranges of ids are somehow clustered in your tables ). Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to skip duplicate records while copying from CSV to table in Postgresql using COPY
I think I need to ask in a more specific way. I have a table, say `table1`, where I feed data from different CSV files. Now suppose I have inserted N records into my table `table1` from csv file `c1`. This is ok; next time, when I am again importing from a different CSV file, say `c2`, into `table1`, I just don't want to re-insert any record from this new CSV file that table `table1` already has. How to do this? My SO link is not a solution to my problem, I see now. -- Regards, Arup Rakshit Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. --Brian Kernighan Assuming that these CSV files are coming from an external source (e.g. bank statement transactions for feeding into a bank rec module), then you need a program to read the file and handle it accordingly. If end users are running this, then they would probably appreciate a little report about what was loaded and what was discarded. On the other hand, if DBAs are running this, you could use ExecuteQuery (written in Java), which has a facility to load CSV files and will report the duplicates. However, you can ignore the duplicates and still commit the non-duplicated transactions to the table, if you so desire. The default for EQ is NOT to run in auto-commit mode, so you have to actually issue a commit to save your work. However, this option can be changed in your preferences. HTH, Robert
Re: [GENERAL] Enum in foreign table: error and correct way to handle.
Thank you for the message Tom; sounds great. I'll try that out, will check on the planner's resultant behavior and email back. Peter On Sat, May 23, 2015 at 12:35 PM, Tom Lane t...@sss.pgh.pa.us wrote: Peter Swartz peter.goodings.swa...@gmail.com writes: suppose the foreign database adds a value to the enum, and the foreign table now has rows with this new value, while the local definition of the enum remains unchanged. Obviously, the appropriate action on my part is to maintain consistency of enum definition between the foreign and local database, but I'm curious about what behavior would result if there was an error in this manual updating process. What I'd expect to happen is that you'd get errors when retrieving rows that had the values not known on the local side. One potential way around this is to declare the foreign table's columns as text rather than enums; you would lose some error checking on the local side, but the remote server would enforce validity whenever you stored something. (But I'm not sure whether this hack behaves desirably for WHERE conditions on the enum column; you'd need to test.) regards, tom lane
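For what it's worth, Tom's text-column workaround might look like this (a minimal sketch; the server, table, and column names here are hypothetical, not from the thread):

```sql
-- The "status" column is an enum on the foreign server; declaring it
-- text locally means new remote enum values never error out here, at
-- the cost of losing local enum type checking.
CREATE FOREIGN TABLE remote_orders (
    id     bigint,
    status text    -- enum on the remote side, plain text locally
) SERVER foreign_srv OPTIONS (table_name 'orders');

-- As Tom notes, test whether WHERE conditions on the column still
-- behave well (e.g. whether the condition is pushed to the remote):
EXPLAIN VERBOSE SELECT * FROM remote_orders WHERE status = 'shipped';
```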
Re: [GENERAL] How to skip duplicate records while copying from CSV to table in Postgresql using COPY
On Sunday, May 24, 2015 07:24:41 AM you wrote: On 05/24/2015 04:55 AM, Arup Rakshit wrote: ... I think I need to ask in a more specific way. I have a table, say `table1`, where I feed data from different CSV files. Now suppose I have inserted N records into my table `table1` from csv file `c1`. This is ok; next time, when I am again importing from a different CSV file, say `c2`, into `table1`, I just don't want to re-insert any record from this new CSV file that table `table1` already has. How to do this? As others have pointed out, this depends on what you are considering a duplicate. Is it if the entire row is duplicated? It is the entire row. Or if some portion of the row (a 'primary key') is duplicated? My SO link is not a solution to my problem, I see now. -- Regards, Arup Rakshit Debugging is twice as hard as writing the code in the first place. 
Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. --Brian Kernighan
Re: [GENERAL] PG and undo logging
Ravi Krishna sravikrish...@gmail.com writes: Thanks for the detailed explanation. The explanation makes me wonder that PG must do more work at commit time, right? No. Commit and abort are both O(1). Where we pay the piper is in having to run VACUUM to clean up no-longer-needed row versions. This is a better design in principle, because the necessary maintenance can be done in background processes rather than making clients wait for transactions to finish. In practice, it's still pretty annoying, just in different ways than Oracle's UNDO. regards, tom lane
Re: [GENERAL] How to skip duplicate records while copying from CSV to table in Postgresql using COPY
On Sun, 2015-05-24 at 16:56 +0630, Arup Rakshit wrote: Hi, I am copying the data from a CSV file to a table using the COPY command. But one thing I got stuck on is how to skip duplicate records while copying from CSV to tables. By looking at the documentation, it seems PostgreSQL doesn't have any inbuilt tool to handle this with the copy command. By Googling I got the idea below to use a temp table. http://stackoverflow.com/questions/13947327/to-ignore-duplicate-keys-during-copy-from-in-postgresql I am also thinking: what if I let the records get inserted, and then delete the duplicate records from the table, as this post suggested - http://www.postgresql.org/message-id/37013500.dff0a...@manhattanproject.com. Both of the solutions look like doing double work. But I am not sure which is the best solution here. Can anybody suggest which approach I should adopt? Or if you guys have any better ideas on this task, please share. Assuming you are using Unix, or can install Unix tools, run the input files through sort -u before passing them to COPY. Oliver Elphick
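Oliver's sort -u suggestion, as a minimal sketch (the file names are hypothetical; the header line is kept out of the sort so it isn't treated as data):

```shell
# Deduplicate rows within a CSV before handing it to COPY:
# keep the header line, sort -u the data rows.
head -n 1 input.csv > deduped.csv
tail -n +2 input.csv | sort -u >> deduped.csv
```

Note this only removes duplicates *within* the file; it does not check against rows already in the table.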
Re: [GENERAL] How to skip duplicate records while copying from CSV to table in Postgresql using COPY
On 05/24/2015 04:55 AM, Arup Rakshit wrote: On Sunday, May 24, 2015 02:52:47 PM you wrote: On Sun, 2015-05-24 at 16:56 +0630, Arup Rakshit wrote: ... Assuming you are using Unix, or can install Unix tools, run the input files through sort -u before passing them to COPY. Oliver Elphick I think I need to ask in a more specific way. I have a table, say `table1`, where I feed data from different CSV files. Now suppose I have inserted N records into my table `table1` from csv file `c1`. This is ok; next time, when I am again importing from a different CSV file, say `c2`, into `table1`, I just don't want to re-insert any record from this new CSV file that table `table1` already has. How to do this? As others have pointed out, this depends on what you are considering a duplicate. Is it if the entire row is duplicated? Or if some portion of the row (a 'primary key') is duplicated? My SO link is not a solution to my problem, I see now. 
-- Adrian Klaver adrian.kla...@aklaver.com
Re: [GENERAL] FW: Constraint exclusion in partitions
Hi Daniel: On Sat, May 23, 2015 at 8:37 PM, Daniel Begin jfd...@hotmail.com wrote: Following Francisco's suggestion, I was able to do some tests earlier this morning when the partitioning process completed and all the resulting tables were analyzed. Here is what I got on both the original table and its partitioned counterpart while running the same queries. I tested them only for a couple of values, but in summary... Using a constant id: All the queries I tried took longer on the partitioned table! I got similar results for multiple records using IN (id value1, id value2 ...). Using a range of ids: Surprisingly again, all the queries I tried took longer on the partitioned table! Using a list of ids from a select clause: More surprisingly, the queries I tried took less time on the partitioned table, at least when using the primary key. Using an indexed field took so long compared to the old table that I cancelled the execution for the new one! Guess what, I will get back to my old fat table unless someone tells me I missed something obvious! Note: Tables/indexes description, queries and execution plans are below. Tables/indexes description -- The original table has 387013 records. Primary key/index queries are ALTER TABLE oldtable ADD CONSTRAINT oldtable_idversion_pk PRIMARY KEY (id, version); CREATE INDEX oldtable_groupid_idx ON oldtable USING btree (group_id); The partitioned table has 387013 records distributed over 87 partitions. Primary key/index queries on each partition are ALTER TABLE newtable_xx ADD CONSTRAINT newtablexx_idversion_pk PRIMARY KEY (id, version); CREATE INDEX newtablexx_groupid_idx ON newtable_xx USING btree (group_id); Where xx is the partition's number suffix. It is missing here, but I suppose you did not forget to add the exclusion check constraint. 
constant id --- select * from oldtable where id=123456789; Index Scan using oldtable_idversion_pk on oldtable (cost=0.70..4437.15 rows=1682 width=66) Index Cond: (id = 123456789::bigint) --Total query runtime: 62 ms. 1 rows retrieved select * from newtable where id=123456789; Append (cost=0.00..20.19 rows=5 width=66) - Seq Scan on newtable (cost=0.00..0.00 rows=1 width=66) Filter: (id = 123456789::bigint) - Index Scan using newtable72_idversion_pk on newtable_72 (cost=0.56..20.19 rows=4 width=66) Index Cond: (id = 123456789::bigint) --Total query runtime: 156 ms. 1 rows retrieved This is not surprising; partitions never help much against that type of query. With 87 partitions the indexes are not going to be much shallower and the optimizer has a lot more work to do. Just a couple of points. When doing a lot of sequential queries on nearby ids ( on the same partition ) it will help ( better data locality, possibility of having more relevant pages in the cache ). Also, how did you do your timings? ( You should repeat each query 2 or 3 times, to see if the times go down a lot due to caching, and, if you can, time a couple of explains ( plain, not analyze ) to see how much time the planner takes. As an example, the explain I sent you in my previous message takes between 20 and 30 milliseconds, and I only have 17 live partitions ( I routinely un-inherit and move to another schema partitions older than two years, to help the planner, and re-inherit them if needed ). ) Also, your queries seem to be very, very slow for a single indexed fetch. If I do this on my tables ( a similar query to what I sent, modified to hit just 1 partition ): select count(*) from carrier_cdrs where setup between '20150107T123456' and '20150107T22'; it takes 40 ms on the first go, drops down to 27 after that, and I have ~15 ms RTT to the server ( which is more or less the time reported when I do a plain 'select 1' ). I mean, I suspect your measurements are not good, as they seem too slow. 
Also, when I use explain analyze on the previous query the server reports 13 ms, which is more or less one RTT less ( as the server starts measuring after receiving the query and ends before sending the reply ). Another thing: how are you getting the explain results? I would urge you to use explain analyze as, apart from seeming too slow, the estimates seem to be way off ( explain analyze sends you both the estimated and the actual results, so it is easy to see, and it executes everything on the server, so result transmission time, which is not useful for comparisons as it has to be the same for every method of calculating the same correct result, is not reported ). For my query I get this: QUERY PLAN Aggregate
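For reference, the exclusion check constraints Francisco asks about would look something like this on each partition (the id range shown here is hypothetical):

```sql
-- Each child table needs a CHECK constraint on id; without it the
-- planner cannot prune partitions and must scan all 87 of them.
ALTER TABLE newtable_72
    ADD CONSTRAINT newtable72_id_ck
    CHECK (id >= 100000000 AND id < 200000000);

-- constraint_exclusion must allow pruning (the default, 'partition',
-- applies it to inheritance children):
SET constraint_exclusion = partition;
EXPLAIN ANALYZE SELECT * FROM newtable WHERE id = 123456789;
```

With the constraints in place, the Append node in the plan should list only the single matching partition plus the (empty) parent.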
[GENERAL] How to skip duplicate records while copying from CSV to table in Postgresql using COPY
Hi, I am copying the data from a CSV file to a table using the COPY command. But one thing I got stuck on is how to skip duplicate records while copying from CSV to tables. By looking at the documentation, it seems PostgreSQL doesn't have any inbuilt tool to handle this with the copy command. By Googling I got the idea below to use a temp table. http://stackoverflow.com/questions/13947327/to-ignore-duplicate-keys-during-copy-from-in-postgresql I am also thinking: what if I let the records get inserted, and then delete the duplicate records from the table, as this post suggested - http://www.postgresql.org/message-id/37013500.dff0a...@manhattanproject.com. Both of the solutions look like doing double work. But I am not sure which is the best solution here. Can anybody suggest which approach I should adopt? Or if you guys have any better ideas on this task, please share. Thanks in advance! -- Regards, Arup Rakshit Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. --Brian Kernighan
Re: [GENERAL] How to skip duplicate records while copying from CSV to table in Postgresql using COPY
On Sun, 2015-05-24 at 18:25 +0630, Arup Rakshit wrote: Assuming you are using Unix, or can install Unix tools, run the input files through sort -u before passing them to COPY. Oliver Elphick I think I need to ask in a more specific way. I have a table, say `table1`, where I feed data from different CSV files. Now suppose I have inserted N records into my table `table1` from csv file `c1`. This is ok; next time, when I am again importing from a different CSV file, say `c2`, into `table1`, I just don't want to re-insert any record from this new CSV file that table `table1` already has. How to do this? Unix tools are still the easiest way to deal with it, I think. Ensure the total input is unique as above and stored in file1. Use COPY to output the existing table to another text file (file2) with a similar format to file1. Then cat file1 file2 | sort | uniq -d > file3 This will only output lines that exist in both file1 and file2. Then cat file1 file3 | sort | uniq -u > newinputfile This will eliminate from file1 the lines that are already in file2. It will only eliminate lines that are entirely identical; it won't stop duplicate primary keys. Oliver Elphick
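Put together, Oliver's pipeline looks like this (the file names are hypothetical, and file1 must itself already be deduplicated, e.g. via sort -u, with the same column order and formatting as the table dump):

```shell
# file1 = new deduplicated input, file2 = COPY TO dump of the table.
cat file1 file2 | sort | uniq -d > file3            # rows present in both
cat file1 file3 | sort | uniq -u > newinputfile     # file1 minus the overlap
```

newinputfile then contains only the rows of file1 that are not already in the table, ready for COPY.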
Re: [GENERAL] PG and undo logging
On Sat, May 23, 2015 at 10:12 PM, Scott Marlowe scott.marl...@gmail.com wrote: Ever run an insert with 1M rows, and roll it back in postgresql, and compare that to oracle? Time the rollback in both. That should give you an idea of how differently the two dbs operate. A rollback in postgres is immediate because it's already rolled back, so to speak. NONE of it is in the main data store yet; it's all in the transaction log waiting. Oracle spends its time and effort creating an undo log because it commits every so often, whether or not you've committed your transaction. PostgreSQL doesn't. The whole transaction exists in the transaction log (called the xlog dir in pg lingo). When you roll back a pg transaction it literally requires almost no work. Mark the transaction as aborted etc. and get on with life. Transaction logs get cleaned up as usual in the background and we go on our way. This means that Oracle uses space for rollback, while postgres uses space for roll forward (aka the transaction logs), so to speak. Thanks for the detailed explanation. The explanation makes me wonder that PG must do more work at commit time, right?
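Scott's experiment is easy to reproduce in psql (the table name is hypothetical; actual timings vary with hardware):

```sql
\timing on
CREATE TABLE rollback_test (n int);
BEGIN;
INSERT INTO rollback_test SELECT generate_series(1, 1000000);
ROLLBACK;  -- near-instant in PostgreSQL: the xact is just marked aborted
```

The INSERT takes seconds; the ROLLBACK returns in milliseconds, since the aborted row versions are simply left for VACUUM to reclaim later.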
Re: [GENERAL] How to skip duplicate records while copying from CSV to table in Postgresql using COPY
On 05/24/2015 06:24 AM, Arup Rakshit wrote: On Sunday, May 24, 2015 07:24:41 AM you wrote: ... As others have pointed out, this depends on what you are considering a duplicate. Is it if the entire row is duplicated? It is the entire row. So, Oliver's second solution. Or if some portion of the row (a 'primary key') is duplicated? My SO link is not a solution to my problem, I see now. 
-- Adrian Klaver adrian.kla...@aklaver.com
Re: [GENERAL] How to skip duplicate records while copying from CSV to table in Postgresql using COPY
On Sunday, May 24, 2015 02:52:47 PM you wrote: On Sun, 2015-05-24 at 16:56 +0630, Arup Rakshit wrote: ... Assuming you are using Unix, or can install Unix tools, run the input files through sort -u before passing them to COPY. Oliver Elphick I think I need to ask in a more specific way. I have a table, say `table1`, where I feed data from different CSV files. Now suppose I have inserted N records into my table `table1` from csv file `c1`. This is ok; next time, when I am again importing from a different CSV file, say `c2`, into `table1`, I just don't want to re-insert any record from this new CSV file that table `table1` already has. How to do this? My SO link is not a solution to my problem, I see now. -- Regards, Arup Rakshit Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. --Brian Kernighan
Re: [GENERAL] How to skip duplicate records while copying from CSV to table in Postgresql using COPY
Hi Arup On Sun, May 24, 2015 at 12:26 PM, Arup Rakshit arupraks...@rocketmail.com wrote: ... Both of the solutions look like doing double work. But I am not sure which is the best solution here. Can anybody suggest which approach I should adopt? Or if you guys have any better ideas on this task, please share. Without looking at your SO answers ( I suppose both consist of copying to a temp table, then either doing an 'insert into main table select * from temp where not exists...' or a delete from temp / insert into main, which is what I would suggest ) I would point out that copy is not the tool for this kind of job. Copy is normally used for bulk loading of correct data, and is great at this. If you need to pre-clean data, you are not doing double work by using one of those solutions. Many ways of doing it have been pointed out; the exact one depends on your data. You've pointed out there are duplicate rows; if this is true you can easily do it using text tools and dumps, or the temporary table. There is also the solution of just turning your CSV, with your favorite text handling tool, into a set of insert lines conditioned on the inexistence of an appropriately indexed set of fields in the table ( which I've done but cannot quote exactly; it was an experiment and got quite hairy ). 
From my experience, go for the temporary table plus insert-select/delete+insert route; it's easier, it is normally faster ( supposing you have appropriate indexes ) and it really is no double work. You have dirty data ( duplicates ), so you have to clean it before inserting. Or, you could turn the file into a series of plain inserts and feed it to psql in autocommit mode, if you have some unique condition on the table, so it errors out on every duplicate. It's not pretty but should work. I still recommend the temp table approach. Francisco Olarte.
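The temp-table route might be sketched like this (table, column, and file names here are hypothetical):

```sql
-- Load the new CSV into a staging table, then insert only the rows
-- the target does not already contain.
CREATE TEMP TABLE staging (LIKE table1);
COPY staging FROM '/path/to/c2.csv' CSV HEADER;

INSERT INTO table1 (col1, col2)
SELECT DISTINCT s.col1, s.col2
FROM staging s
WHERE NOT EXISTS (
    SELECT 1 FROM table1 t
    WHERE (t.col1, t.col2) IS NOT DISTINCT FROM (s.col1, s.col2)
);
DROP TABLE staging;
```

IS NOT DISTINCT FROM is used instead of = so that NULLs in matching columns still count as duplicates.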
Re: [GENERAL] How to skip duplicate records while copying from CSV to table in Postgresql using COPY
On Sunday, May 24, 2015 07:52:43 AM you wrote: Is it if the entire row is duplicated? It is the entire row. So, Oliver's second solution. I have done this:

columns_t1 = self.singleton_class.fields.map { |f| "t1.#{f}" }.join(",")
columns_t2 = self.singleton_class.fields.map { |f| "t2.#{f}" }.join(",")
ActiveRecord::Base.transaction do
  conn = ActiveRecord::Base.connection
  # temp table with the same structure (and current contents) as the target
  conn.execute "CREATE TEMP TABLE tmp_table AS SELECT * FROM #{table.strip};"
  conn.execute("COPY tmp_table ( #{self.singleton_class.fields.join(',')} ) FROM '#{source_file}' CSV HEADER DELIMITER '\t' QUOTE '|' ;")
  # insert only rows the target table does not already contain
  conn.execute "INSERT INTO #{table.strip} ( #{self.singleton_class.fields.join(',')} ) SELECT DISTINCT #{columns_t1} FROM tmp_table t1 WHERE NOT EXISTS ( SELECT 1 FROM #{table.strip} t2 WHERE (#{columns_t2}) IS NOT DISTINCT FROM (#{columns_t1}) );"
  conn.execute "DROP TABLE IF EXISTS tmp_table;"
end

The SQL is wrapped inside the ActiveRecord ORM as you see above, but I hope you got the idea. I am not sure, though, if it is the correct way to do it or how it will affect performance. The application can run on different OSes, so I can't rely on Unix commands. -- Regards, Arup Rakshit Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. --Brian Kernighan
Re: [GENERAL] Trying to solve a SocketError [Errno 13] Permission with PL/Python in PostgreSQL 9.3
On 05/24/2015 04:15 PM, Marcos Ortiz wrote: Sorry for the late response. Not sure if it applies but see here: http://neo4j.com/docs/stable/rest-api-security.html When Neo4j is first installed you can authenticate with the default user neo4j and the default password neo4j. However, the default password must be changed (see the section called “User status and password changing”) before access to resources will be permitted. ... Yes, I changed the password. It seems that the problem was with SELinux. I disabled it for a moment to make a simple test and everything worked. But, like Tom said, I don't want SELinux disabled on my systems, so I will find out which security label is blocking this in CentOS, and enable it again. When I find it, I will send the solution to the list. Glad you found the cause and thanks for following up. Nice to be able to close the loop on a problem. Best wishes and thanks again for your time. -- Marcos Ortiz http://about.me/marcosortiz, Sr. Product Manager (Data Infrastructure) at UCI @marcosluis2186 http://twitter.com/marcosluis2186 -- Adrian Klaver adrian.kla...@aklaver.com
Re: [GENERAL] Re: Server tries to read a different config file than it is supposed to
twoflower standa.ku...@gmail.com writes: I was doing this after I upgraded to 9.4.2, yes. As for the shut down: I suspect the server was rebooted without explicitly stopping Postgres. Not sure how this plays out in terms of cleanliness. This is everything relevant in the log file after I ran the start script: 2015-05-23 10:36:39.999 GMT [2102][0]: [1] LOG: database system was interrupted; last known up at 2015-05-23 08:59:41 GMT 2015-05-23 10:36:40.053 GMT [2102][0]: [2] FATAL: could not open file /storage/postgresql/9.4/data/postgresql.conf: Permission denied 2015-05-23 10:36:40.054 GMT [2100][0]: [3] LOG: startup process (PID 2102) exited with exit code 1 2015-05-23 10:36:40.054 GMT [2100][0]: [4] LOG: aborting startup due to startup process failure Yeah, so this is long after the real config file has been read. I think that that unwritable postgresql.conf file had probably been hanging around in your data directory for some time. It was not causing any particular problem until we decided we ought to fsync everything in the data directory after a crash. So this is indeed the same case Christoph was complaining about. But really you should remove that file, not just change its permissions; as is, it's just causing confusion. I also tried the same situation on two other Ubuntu servers with the same version of Postgres (also upgraded to 9.4.2) and the same directory layout - made *postgresql.conf* in the data directory inaccessible, even renamed it, and everything worked fine. The only difference is that these are streaming-replicated standby servers. They also had been restarted without explicitly terminating Postgres. Hm. I wonder why we aren't fsync'ing on crash restart on standby servers as well. regards, tom lane
Re: [GENERAL] How to skip duplicate records while copying from CSV to table in Postgresql using COPY
On Sun, May 24, 2015 at 4:26 AM, Arup Rakshit arupraks...@rocketmail.com wrote:

Hi, I am copying data from a CSV file to a table using the COPY command. But one thing I got stuck on is how to skip duplicate records while copying from CSV to tables. From the documentation, it seems PostgreSQL doesn't have any built-in tool to handle this with the COPY command. Searching around, I found the idea of using a temp table: http://stackoverflow.com/questions/13947327/to-ignore-duplicate-keys-during-copy-from-in-postgresql I am also thinking: what if I let the records get inserted and then delete the duplicates from the table, as this post suggested - http://www.postgresql.org/message-id/37013500.dff0a...@manhattanproject.com. Both solutions look like double work, and I am not sure which is the best one here. Can anybody suggest which approach I should adopt? Or if you have any better ideas for this task, please share.

Have you looked at pgloader? http://pgloader.io/index.html
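Besides the temp-table and delete-afterwards approaches, a third option is to deduplicate the CSV client-side before handing it to COPY at all. A minimal sketch, not from the thread; the key column names and file paths are whatever applies to your data:

```python
import csv

def dedupe_csv(in_path, out_path, key_fields):
    """Copy rows from in_path to out_path, keeping only the first row
    seen for each combination of values in key_fields."""
    seen = set()
    with open(in_path, newline="") as src, \
         open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            key = tuple(row[f] for f in key_fields)
            if key not in seen:
                seen.add(key)
                writer.writerow(row)
```

The cleaned file can then be loaded with COPY (or \copy from psql) without tripping over duplicate keys. For files too large to hold the key set in memory, the temp-table approach from the Stack Overflow link is the better fit.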
Re: [GENERAL] Trying to solve a SocketError [Errno 13] Permission with PL/Python in PostgreSQL 9.3
Sorry for the late response.

On 23/05/15 19:38, Adrian Klaver wrote:
On 05/23/2015 04:16 PM, Marcos Ortiz wrote:
On 23/05/15 19:09, Adrian Klaver wrote:
On 05/23/2015 03:51 PM, Marcos Ortiz wrote:
On 23/05/15 18:40, Adrian Klaver wrote:
On 05/23/2015 03:27 PM, Marcos Ortiz wrote:

Regards to all the list. First, all the info about the system:
O.S.: CentOS 7, 64 bits
PostgreSQL version: SELECT version(); PostgreSQL 9.2.7 on x86_64-redhat-linux-gnu, compiled by gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16), 64-bit (1 row)
Neo4j version: 2.1.M
Py2neo version: 2.0.8
Python version: Python 2.7.5 (default, Jun 17 2014, 18:11:42) [GCC 4.8.2 20140120 (Red Hat 4.8.2-16)] on linux2

Now the problem: We are working to integrate PostgreSQL with Neo4j through PL/Python, using the py2neo module, and when we send statements to Neo4j on port 7474, the executed code raises a SocketError: [Errno 13] Permission denied.

Well, first, in the code below, if I am following correctly, the socket is 37474.

Yes, Adrian. Sorry for that; the correct port is 7474. I was just testing with higher ports to see if the error persisted. I tested the same code in a normal Python script outside of PostgreSQL, and it works well; the problem is when I use the code inside PostgreSQL with PL/Python.

Second, the plpythonu code is running as the postgres user, so does that user have permissions on the socket?

You mean the socket created by Neo4j's server, right? For that reason, I created a group in the system named supervisor, of which the neo4j and postgres users are members. So, if I find the socket file for the Neo4j server, changing permissions could solve the problem. Right?

Not sure, but a quick search found that py2neo uses the Neo4j REST API, and that API has authorization parameters: http://neo4j.com/docs/stable/security-server.html Have you gone through the above? Yes, Adrian. 
py2neo installs a tool called neoauth, which can be used to create users with their respective passwords. For that reason, I create the graph this way:

graph = Graph("http://neo4j:neo4j@10.8.45.136:7474/db/data")

using the user neo4j and its password neo4j.

Not sure if it applies but see here: http://neo4j.com/docs/stable/rest-api-security.html When Neo4j is first installed you can authenticate with the default user neo4j and the default password neo4j. However, the default password must be changed (see the section called “User status and password changing”) before access to resources will be permitted. ...

Yes, I changed the password. It seems that the problem was with SELinux. I disabled it for a moment to make a simple test and everything worked. But, like Tom said, I don't want SELinux disabled in my systems, so I will find out which security label is blocking this in CentOS and enable SELinux again. When I find it, I will send the solution to the list.

Best wishes and thanks again for your time.

-- Marcos Ortiz http://about.me/marcosortiz, Sr. Product Manager (Data Infrastructure) at UCI @marcosluis2186 http://twitter.com/marcosluis2186
SOLVED: [GENERAL] Trying to solve a SocketError [Errno 13] Permission with PL/Python in PostgreSQL 9.3
Regards, Adrian, Tom and all the pgsql-general list. Like Tom said, the problem was with SELinux, and I found the policy that caused all the problems:

# grep denied audit.log | audit2allow

#============= postgresql_t =============
# This avc can be allowed using the boolean 'nis_enabled'
allow postgresql_t ephemeral_port_t:tcp_socket name_connect;

# This avc can be allowed using the boolean 'nis_enabled'
allow postgresql_t unreserved_port_t:tcp_socket name_connect;

I checked that boolean with:

# getsebool -a | grep nis_enabled
nis_enabled --> off

Then I changed it to on, and everything works well with SELinux enabled by default. Thanks again for the time and patience.

On 24/05/15 19:43, Adrian Klaver wrote:
On 05/24/2015 04:15 PM, Marcos Ortiz wrote:

Sorry for the late response.

Not sure if it applies but see here: http://neo4j.com/docs/stable/rest-api-security.html When Neo4j is first installed you can authenticate with the default user neo4j and the default password neo4j. However, the default password must be changed (see the section called “User status and password changing”) before access to resources will be permitted. ...

Yes, I changed the password. It seems that the problem was with SELinux. I disabled it for a moment to make a simple test and everything worked. But, like Tom said, I don't want SELinux disabled in my systems, so I will find out which security label is blocking this in CentOS and enable SELinux again. When I find it, I will send the solution to the list.

Glad you found the cause and thanks for following up. Nice to be able to close the loop on a problem.

Best wishes and thanks again for your time.

-- Marcos Ortiz http://about.me/marcosortiz, Sr. Product Manager (Data Infrastructure) at UCI @marcosluis2186 http://twitter.com/marcosluis2186

-- Marcos Ortiz http://about.me/marcosortiz, Sr. Product Manager (Data Infrastructure) at UCI @marcosluis2186 http://twitter.com/marcosluis2186
[GENERAL] Re: Server tries to read a different config file than it is supposed to
From root, presumably ... Yes

I thought of a different theory: maybe the server's complaint is not due to trying to read that file as a config file, but simply because there is an unreadable/unwritable file in the data directory. See Christoph Berg's complaint at http://www.postgresql.org/message-id/20150523172627.ga24...@... This would only apply if the OP was trying to use this week's releases, though. Also, I thought the fsync-everything code would only run if the server had been shut down uncleanly. Which maybe it was, but that bit of info wasn't provided either.

I was doing this after I upgraded to 9.4.2, yes. As for the shutdown: I suspect the server was rebooted without explicitly stopping Postgres. Not sure how this plays out in terms of cleanliness. This is everything relevant in the log file after I ran the start script:

2015-05-23 10:36:39.999 GMT [2102][0]: [1] LOG: database system was interrupted; last known up at 2015-05-23 08:59:41 GMT
2015-05-23 10:36:40.053 GMT [2102][0]: [2] FATAL: could not open file /storage/postgresql/9.4/data/postgresql.conf: Permission denied
2015-05-23 10:36:40.054 GMT [2100][0]: [3] LOG: startup process (PID 2102) exited with exit code 1
2015-05-23 10:36:40.054 GMT [2100][0]: [4] LOG: aborting startup due to startup process failure

I also tried the same situation on two other Ubuntu servers with the same version of Postgres (also upgraded to 9.4.2) and the same directory layout - made *postgresql.conf* in the data directory inaccessible, even renamed it, and everything worked fine. The only difference is that these are streaming-replicated standby servers. They also had been restarted without explicitly terminating Postgres.

-- View this message in context: http://postgresql.nabble.com/Server-tries-to-read-a-different-config-file-than-it-is-supposed-to-tp5850752p5850829.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.