Re: [GENERAL] FW: Constraint exclusion in partitions

2015-05-24 Thread Francisco Olarte
Hi Daniel:

Bill Moran already pointed out some things; I more or less agree with him.

On Sun, May 24, 2015 at 12:16 AM, Daniel Begin jfd...@hotmail.com wrote:
..
 I use pgadmin_III sql window.

I'll just speak from my experience. At work, more than three fourths
of the problems I've had to help my pgAdmin-using colleagues solve
were due to their use of pgAdmin. Nearly every time they did some
DB administration task, they did it poorly and only began to do it
correctly once they switched to plain psql. This may be anecdotal.


 I write the following query (I have changed
 the id to make sure it does not use previous results still in memory)...

That is exactly the opposite of what you should do, unless you are
able to thoroughly clean the server caches (not as easy as it sounds)
between runs. You want warm caches first, so the timings account for
planning, hashing, joining and so on. Once you've got that part ready,
you can move on to real queries, a busy server, etc., for finer
tuning.


 Now I select explain query from the menu and I get the following result...
...
 Now, I select run and I get one record as a result and the following
 message in history tab...
.

As I never use pgadmin, I cannot tell you too much, but one thing I
know is NONE of MY users was able to tell me what pgadmin does when
you hit run and/or explain query. OTOH, I know what psql does. And if
you use EXPLAIN ANALYZE, as suggested in my previous message, you'll
be better off (in that case you get a single query result from
the server and you do not have to care about what your tool of choice
does in the middle).
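
For illustration only, a minimal psql sketch of that route; the table name
and id value are placeholders, not taken from Daniel's setup:

  \timing on
  -- run it two or three times; the later, warm-cache timings are the ones to compare
  EXPLAIN ANALYZE SELECT * FROM oldtable WHERE id = 123456789;
  EXPLAIN ANALYZE SELECT * FROM oldtable WHERE id = 123456789;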

lotta deletions,

 I really need to get much faster results with my queries on this large table
 and partitioning was my last option...

Make sure you are not dealing with an XY problem. Also, from the type of
queries you've shown us, I do not think partitioning is the way to go.
It will only give you marginal improvements when querying for a single
id or a range of ids (and small ones if your id ranges are somehow
clustered in your tables).

Francisco Olarte.




Re: [GENERAL] How to skip duplicate records while copying from CSV to table in Postgresql using COPY

2015-05-24 Thread rob stone

 I think I need to ask in a more specific way. I have a table, say `table1`,
 which I feed with data from different CSV files. Now suppose I have inserted
 N records into `table1` from CSV file `c1`. That is fine; the next time I
 import from a different CSV file, say `c2`, into `table1`, I just don't want
 to re-insert any record from this new CSV file that `table1` already contains.
 
 How do I do this?
 
  I see now that my SO link is not a solution to my problem.
 
 -- 
 
 Regards,
 Arup Rakshit
 
 Debugging is twice as hard as writing the code in the first place. Therefore, 
 if you write the code as cleverly as possible, you are, by definition, not 
 smart enough to debug it.
 
 --Brian Kernighan
 
 

Assuming that these CSV files are coming from an external source (e.g.
bank statement transactions for feeding into a Bank Rec module), then you
need a program to read the file and handle it accordingly. If end users
are running this, they would probably appreciate a little report
about what was loaded and what was discarded.

On the other hand, if DBAs are running this, you could use ExecuteQuery
(written in Java), which has a facility to load CSV files and will
report the duplicates. You can ignore the duplicates and still
commit the non-duplicated transactions to the table, if you so desire.
The default for EQ is NOT to run in auto-commit mode, so you have to
actually issue a commit to save your work; this option can be
changed in your preferences.

HTH,
Robert





Re: [GENERAL] Enum in foreign table: error and correct way to handle.

2015-05-24 Thread Peter Swartz
Thank you for the message Tom; sounds great.  I'll try that out, will check
on the planner's resultant behavior and email back.

Peter

On Sat, May 23, 2015 at 12:35 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Peter Swartz peter.goodings.swa...@gmail.com writes:
  suppose the foreign database adds a value to the enum, and the foreign
  table now has rows with this new value, while the local definition of the
  enum remains unchanged.  Obviously, the appropriate action on my part is to
  maintain consistency of enum definition between the foreign and local
  database, but I'm curious about what behavior would result if there was an
  error in this manual updating process.

 What I'd expect to happen is that you'd get errors when retrieving rows
 that had the values not known on the local side.

 One potential way around this is to declare the foreign table's columns
 as text rather than enums; you would lose some error checking on the
 local side, but the remote server would enforce validity whenever you
 stored something.  (But I'm not sure whether this hack behaves desirably
 for WHERE conditions on the enum column; you'd need to test.)

 regards, tom lane
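
For illustration, a minimal sketch of the text-column workaround Tom describes,
assuming postgres_fdw and purely hypothetical object names:

  CREATE FOREIGN TABLE orders_ft (
      id     bigint,
      status text    -- declared as text locally; an enum on the remote side
  ) SERVER remote_srv OPTIONS (schema_name 'public', table_name 'orders');

  -- the remote side still validates writes; local error checking is weaker
  SELECT count(*) FROM orders_ft WHERE status = 'shipped';

Whether the WHERE condition on that column gets pushed down to the remote
server is exactly the part that needs testing, per Tom's caveat.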



Re: [GENERAL] How to skip duplicate records while copying from CSV to table in Postgresql using COPY

2015-05-24 Thread Arup Rakshit
On Sunday, May 24, 2015 07:24:41 AM you wrote:
 On 05/24/2015 04:55 AM, Arup Rakshit wrote:
  On Sunday, May 24, 2015 02:52:47 PM you wrote:
  On Sun, 2015-05-24 at 16:56 +0630, Arup Rakshit wrote:
  Hi,
 
  I am copying the data from a CSV file to a table using the COPY command.
  But one thing I got stuck on is how to skip duplicate records while
  copying from CSV to tables. Looking at the documentation, it seems
  PostgreSQL doesn't have any built-in tool to handle this with the COPY
  command. Googling, I found the idea below of using a temp table.
 
  http://stackoverflow.com/questions/13947327/to-ignore-duplicate-keys-during-copy-from-in-postgresql
 
  I am also thinking: what if I let the records get inserted, and then
  delete the duplicate records from the table, as this post suggested -
  http://www.postgresql.org/message-id/37013500.dff0a...@manhattanproject.com.
 
  Both of the solutions look like doing double work. But I am not sure
  which is the best solution here. Can anybody suggest which approach
  I should adopt? Or if you guys have any better ideas on this task,
  please share.
 
  Assuming you are using Unix, or can install Unix tools, run the input
  files through
 
 sort -u
 
  before passing them to COPY.
 
  Oliver Elphick
 
 
  I think I need to ask in a more specific way. I have a table, say `table1`,
  which I feed with data from different CSV files. Now suppose I have inserted
  N records into `table1` from CSV file `c1`. That is fine; the next time I
  import from a different CSV file, say `c2`, into `table1`, I just don't want
  to re-insert any record from this new CSV file that `table1` already contains.
 
  How do I do this?
 
 As others have pointed out this depends on what you are considering a 
 duplicate.
 
 Is it if the entire row is duplicated?

It is the entire row.

 Or if some portion of the row(a 'primary key') is duplicated?
 
 
I see now that my SO link is not a solution to my problem.
 
 
 
 

-- 

Regards,
Arup Rakshit

Debugging is twice as hard as writing the code in the first place. Therefore, 
if you write the code as cleverly as possible, you are, by definition, not 
smart enough to debug it.

--Brian Kernighan




Re: [GENERAL] PG and undo logging

2015-05-24 Thread Tom Lane
Ravi Krishna sravikrish...@gmail.com writes:
 Thanks for the detailed explanation. The explanation makes me wonder:
 PG must do more work at commit time, right?

No.  Commit and abort are both O(1).  Where we pay the piper is in
having to run VACUUM to clean up no-longer-needed row versions.

This is a better design in principle, because the necessary maintenance
can be done in background processes rather than making clients wait
for transactions to finish.  In practice, it's still pretty annoying,
just in different ways than Oracle's UNDO.

regards, tom lane
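
As a rough way to see this difference for yourself, a sketch only (the table
name and row count are made up, and exact timings will vary):

  \timing on
  CREATE TABLE undo_demo (id int, payload text);
  BEGIN;
  INSERT INTO undo_demo SELECT g, 'x' FROM generate_series(1, 1000000) AS g;
  ROLLBACK;  -- returns almost immediately: the abort itself is O(1);
             -- VACUUM later reclaims the dead row versions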




Re: [GENERAL] How to skip duplicate records while copying from CSV to table in Postgresql using COPY

2015-05-24 Thread Oliver Elphick
On Sun, 2015-05-24 at 16:56 +0630, Arup Rakshit wrote:
 Hi,
 
 I am copying the data from a CSV file to a table using the COPY command.
 But one thing I got stuck on is how to skip duplicate records while
 copying from CSV to tables. Looking at the documentation, it seems
 PostgreSQL doesn't have any built-in tool to handle this with the COPY
 command. Googling, I found the idea below of using a temp table.
 
 http://stackoverflow.com/questions/13947327/to-ignore-duplicate-keys-during-copy-from-in-postgresql
 
 I am also thinking: what if I let the records get inserted, and then
 delete the duplicate records from the table, as this post suggested -
 http://www.postgresql.org/message-id/37013500.dff0a...@manhattanproject.com.
 
 Both of the solutions look like doing double work. But I am not sure
 which is the best solution here. Can anybody suggest which approach
 I should adopt? Or if you guys have any better ideas on this task,
 please share.

Assuming you are using Unix, or can install Unix tools, run the input
files through

  sort -u

before passing them to COPY.

Oliver Elphick
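
If the file lives on the database server, one way to wire this suggestion
directly into the load is COPY ... FROM PROGRAM (available since 9.3,
superuser only). A sketch, assuming a hypothetical path and a file with no
header line:

  COPY table1 FROM PROGRAM 'sort -u /tmp/c2.csv' WITH (FORMAT csv);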





Re: [GENERAL] How to skip duplicate records while copying from CSV to table in Postgresql using COPY

2015-05-24 Thread Adrian Klaver

On 05/24/2015 04:55 AM, Arup Rakshit wrote:

On Sunday, May 24, 2015 02:52:47 PM you wrote:

On Sun, 2015-05-24 at 16:56 +0630, Arup Rakshit wrote:

Hi,

I am copying the data from a CSV file to a table using the COPY command.
But one thing I got stuck on is how to skip duplicate records while
copying from CSV to tables. Looking at the documentation, it seems
PostgreSQL doesn't have any built-in tool to handle this with the COPY
command. Googling, I found the idea below of using a temp table.

http://stackoverflow.com/questions/13947327/to-ignore-duplicate-keys-during-copy-from-in-postgresql

I am also thinking: what if I let the records get inserted, and then
delete the duplicate records from the table, as this post suggested -
http://www.postgresql.org/message-id/37013500.dff0a...@manhattanproject.com.

Both of the solutions look like doing double work. But I am not sure
which is the best solution here. Can anybody suggest which approach
I should adopt? Or if you guys have any better ideas on this task,
please share.


Assuming you are using Unix, or can install Unix tools, run the input
files through

   sort -u

before passing them to COPY.

Oliver Elphick



I think I need to ask in a more specific way. I have a table, say `table1`,
which I feed with data from different CSV files. Now suppose I have inserted
N records into `table1` from CSV file `c1`. That is fine; the next time I
import from a different CSV file, say `c2`, into `table1`, I just don't want
to re-insert any record from this new CSV file that `table1` already contains.

How do I do this?


As others have pointed out this depends on what you are considering a 
duplicate.


Is it if the entire row is duplicated?

Or if some portion of the row(a 'primary key') is duplicated?



  I see now that my SO link is not a solution to my problem.




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [GENERAL] FW: Constraint exclusion in partitions

2015-05-24 Thread Francisco Olarte
Hi Daniel:


On Sat, May 23, 2015 at 8:37 PM, Daniel Begin jfd...@hotmail.com wrote:
 Following Francisco's suggestion, I was able to do some tests earlier this
 morning, once the partitioning process completed and all the resulting tables
 were analyzed.

 Here is what I got on both the original table and its partitioned counterpart 
 while running the same queries. I tested them only for a couple of values but 
 in summary...
 Using a constant id:
 All the queries I tried took longer on the partitioned table! I got similar 
 results for multiple records using IN (id value1, id value2 ...)
 Using a range of ids:
 Surprisingly again, all the queries I tried took longer on the partitioned 
 table!
 Using a list of ids from a select clause:
 More surprisingly, the queries I tried took less time on the partitioned 
 table at least when using the primary key. Using an indexed field took so 
 long compared to the old table that I cancelled the execution for the new one!

 Guess what, I will get back to my old fat table unless someone tells me I 
 missed something obvious!

 Note: Tables/indexes description, queries and execution plans are below.






 Tables/indexes description 
 --
 The original table has 387013 records. Primary key/index creation
 queries are
 ALTER TABLE oldtable ADD CONSTRAINT oldtable_idversion_pk PRIMARY KEY (id, 
 version);
 CREATE INDEX oldtable_groupid_idx ON oldtable USING btree (group_id);

 The partitioned table has 387013 records distributed over 87 partitions. 
 Primary key/index on each partition queries are
 ALTER TABLE newtable_xx ADD CONSTRAINT newtablexx_idversion_pk PRIMARY KEY 
 (id, version);
 CREATE INDEX newtablexx_groupid_idx ON newtable_xx USING btree (group_id);
 Where xx is the partition's number suffix

It is missing here, but I suppose you did not forget to add the
exclusion check constraint.
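
For reference, a minimal sketch of such a constraint and of the setting that
lets the planner use it; the partition name and id bounds here are made up:

  ALTER TABLE newtable_72 ADD CONSTRAINT newtable72_id_range_ck
      CHECK (id >= 100000000 AND id < 200000000);

  SET constraint_exclusion = partition;  -- 'partition' is the default

  EXPLAIN SELECT * FROM newtable WHERE id = 123456789;
  -- with CHECK constraints in place, only the matching child should be scanned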


 constant id
 ---
 select * from oldtable where id=123456789;
 Index Scan using oldtable_idversion_pk on oldtable  (cost=0.70..4437.15 rows=1682 width=66)
   Index Cond: (id = 123456789::bigint)
 --Total query runtime: 62 ms. 1 rows retrieved
 select * from newtable where id=123456789;
 Append  (cost=0.00..20.19 rows=5 width=66)
   ->  Seq Scan on newtable  (cost=0.00..0.00 rows=1 width=66)
         Filter: (id = 123456789::bigint)
   ->  Index Scan using newtable72_idversion_pk on newtable_72  (cost=0.56..20.19 rows=4 width=66)
         Index Cond: (id = 123456789::bigint)
 --Total query runtime: 156 ms. 1 rows retrieved

This is not surprising; partitioning never helps much against that type
of query. With 87 partitions the indexes are not going to be much
shallower and the optimizer has a lot more work to do. Just a couple of
points. When doing a lot of sequential queries on nearby ids (on the
same partition) it will help (better data locality, and a better chance
of having the relevant pages in the cache). Also, how did you do your
timings? You should repeat each query 2 or 3 times, to see whether the
times go down a lot due to caching, and, if you can, time a couple of
explains (plain, not analyze) to see how much time the planner takes.
As an example, the explain I sent you in my previous message takes
between 20 and 30 milliseconds, and I only have 17 live partitions (I
routinely un-inherit partitions older than two years and move them to
another schema, to help the planner, and re-inherit them if needed).
Also, your queries seem very, very slow for a single indexed fetch. If
I do this on my tables (a query similar to what I sent, modified to hit
just 1 partition):
select count(*) from carrier_cdrs where setup between
'20150107T123456' and '20150107T22';
it takes 40 ms on the first go, drops down to 27 ms after that, and I
have ~15 ms RTT to the server (which is more or less the time reported
when I do a plain 'select 1'). I mean, I suspect your measurements are
not good, as they seem too slow. Also, when I use explain analyze on
the previous query the server reports 13 ms, which is more or less one
RTT less (as the server starts measuring after receiving the query and
ends before sending the reply).

Another thing: how are you getting the explain results? I would urge
you to use explain analyze because, apart from seeming too slow, the
estimates seem to be way off (explain analyze shows you both the
estimated and the actual results, so it is easy to see, and it executes
everything on the server, so result transmission time, which tells you
nothing since it is the same for every method of computing the same
correct result, is not included). For my query I get this:

QUERY PLAN

 Aggregate  

[GENERAL] How to skip duplicate records while copying from CSV to table in Postgresql using COPY

2015-05-24 Thread Arup Rakshit
Hi,

I am copying the data from a CSV file to a table using the COPY command. But one
thing I got stuck on is how to skip duplicate records while copying from CSV
to tables. Looking at the documentation, it seems PostgreSQL doesn't have any
built-in tool to handle this with the COPY command. Googling, I found the idea
below of using a temp table.

http://stackoverflow.com/questions/13947327/to-ignore-duplicate-keys-during-copy-from-in-postgresql

I am also thinking: what if I let the records get inserted, and then delete the
duplicate records from the table, as this post suggested -
http://www.postgresql.org/message-id/37013500.dff0a...@manhattanproject.com.

Both of the solutions look like doing double work. But I am not sure which is
the best solution here. Can anybody suggest which approach I should adopt? Or
if you guys have any better ideas on this task, please share.

Thanks in advance!

-- 

Regards,
Arup Rakshit

Debugging is twice as hard as writing the code in the first place. Therefore, 
if you write the code as cleverly as possible, you are, by definition, not 
smart enough to debug it.

--Brian Kernighan




Re: [GENERAL] How to skip duplicate records while copying from CSV to table in Postgresql using COPY

2015-05-24 Thread Oliver Elphick
On Sun, 2015-05-24 at 18:25 +0630, Arup Rakshit wrote:
  
  Assuming you are using Unix, or can install Unix tools, run the input
  files through
  
sort -u
  
  before passing them to COPY.
  
  Oliver Elphick
  
 
 I think I need to ask in a more specific way. I have a table, say `table1`,
 which I feed with data from different CSV files. Now suppose I have inserted
 N records into `table1` from CSV file `c1`. That is fine; the next time I
 import from a different CSV file, say `c2`, into `table1`, I just don't want
 to re-insert any record from this new CSV file that `table1` already contains.
 
 How do I do this?

Unix tools are still the easiest way to deal with it, I think.

Ensure the total input is unique as above and stored in file1.

Use COPY to output the existing table to another text file (file2) with
similar format to file1.  Then

  cat file1 file2 | sort | uniq -d > file3

This will only output lines that exist in both file1 and file2.  Then

  cat file1 file3 | sort | uniq -u > newinputfile

This will eliminate from file1 lines that are already in file2.

It will only eliminate lines that are entirely identical; it won't stop
duplicate primary keys.


Oliver Elphick





Re: [GENERAL] PG and undo logging

2015-05-24 Thread Ravi Krishna
On Sat, May 23, 2015 at 10:12 PM, Scott Marlowe scott.marl...@gmail.com wrote:

 Ever run an insert with 1M rows, and roll it back in postgresql and
 compare that to oracle. Time the rollback in both. That should give
 you an idea of how differently the two dbs operate.

 A rollback in postgres is immediate because it's already rolled back,
 so to speak. NONE of it is in the main data store yet; it's all in the
 transaction log waiting.

 Oracle spends its time and effort creating an undo log because it
 commits every so often, whether or not you've committed your
 transaction.

 PostgreSQL doesn't. The whole transaction exists in the transaction
 log (called the xlog dir in pg lingo).

 When you roll back a pg transaction it literally requires almost no
 work. Mark the transaction as aborted etc. and get on with life.
 Transaction logs get cleaned up as usual in the background and we go
 on our way.

 This means that Oracle uses space for rollback, while postgres uses
 space for roll-forward (aka the transaction logs), so to speak.

Thanks for the detailed explanation. The explanation makes me wonder:
PG must do more work at commit time, right?




Re: [GENERAL] How to skip duplicate records while copying from CSV to table in Postgresql using COPY

2015-05-24 Thread Adrian Klaver

On 05/24/2015 06:24 AM, Arup Rakshit wrote:

On Sunday, May 24, 2015 07:24:41 AM you wrote:

On 05/24/2015 04:55 AM, Arup Rakshit wrote:

On Sunday, May 24, 2015 02:52:47 PM you wrote:

On Sun, 2015-05-24 at 16:56 +0630, Arup Rakshit wrote:

Hi,

I am copying the data from a CSV file to a table using the COPY command.
But one thing I got stuck on is how to skip duplicate records while
copying from CSV to tables. Looking at the documentation, it seems
PostgreSQL doesn't have any built-in tool to handle this with the COPY
command. Googling, I found the idea below of using a temp table.

http://stackoverflow.com/questions/13947327/to-ignore-duplicate-keys-during-copy-from-in-postgresql

I am also thinking: what if I let the records get inserted, and then
delete the duplicate records from the table, as this post suggested -
http://www.postgresql.org/message-id/37013500.dff0a...@manhattanproject.com.

Both of the solutions look like doing double work. But I am not sure
which is the best solution here. Can anybody suggest which approach
I should adopt? Or if you guys have any better ideas on this task,
please share.


Assuming you are using Unix, or can install Unix tools, run the input
files through

sort -u

before passing them to COPY.

Oliver Elphick



I think I need to ask in a more specific way. I have a table, say `table1`,
which I feed with data from different CSV files. Now suppose I have inserted
N records into `table1` from CSV file `c1`. That is fine; the next time I
import from a different CSV file, say `c2`, into `table1`, I just don't want
to re-insert any record from this new CSV file that `table1` already contains.

How do I do this?


As others have pointed out this depends on what you are considering a
duplicate.

Is it if the entire row is duplicated?


It is the entire row.


So, Oliver's second solution.




Or if some portion of the row(a 'primary key') is duplicated?



   I see now that my SO link is not a solution to my problem.










--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [GENERAL] How to skip duplicate records while copying from CSV to table in Postgresql using COPY

2015-05-24 Thread Arup Rakshit
On Sunday, May 24, 2015 02:52:47 PM you wrote:
 On Sun, 2015-05-24 at 16:56 +0630, Arup Rakshit wrote:
  Hi,
  
  I am copying the data from a CSV file to a table using the COPY command.
  But one thing I got stuck on is how to skip duplicate records while
  copying from CSV to tables. Looking at the documentation, it seems
  PostgreSQL doesn't have any built-in tool to handle this with the COPY
  command. Googling, I found the idea below of using a temp table.
 
  http://stackoverflow.com/questions/13947327/to-ignore-duplicate-keys-during-copy-from-in-postgresql
 
  I am also thinking: what if I let the records get inserted, and then
  delete the duplicate records from the table, as this post suggested -
  http://www.postgresql.org/message-id/37013500.dff0a...@manhattanproject.com.
 
  Both of the solutions look like doing double work. But I am not sure
  which is the best solution here. Can anybody suggest which approach
  I should adopt? Or if you guys have any better ideas on this task,
  please share.
 
 Assuming you are using Unix, or can install Unix tools, run the input
 files through
 
   sort -u
 
 before passing them to COPY.
 
 Oliver Elphick
 

I think I need to ask in a more specific way. I have a table, say `table1`,
which I feed with data from different CSV files. Now suppose I have inserted
N records into `table1` from CSV file `c1`. That is fine; the next time I
import from a different CSV file, say `c2`, into `table1`, I just don't want
to re-insert any record from this new CSV file that `table1` already contains.

How do I do this?

I see now that my SO link is not a solution to my problem.

-- 

Regards,
Arup Rakshit

Debugging is twice as hard as writing the code in the first place. Therefore, 
if you write the code as cleverly as possible, you are, by definition, not 
smart enough to debug it.

--Brian Kernighan




Re: [GENERAL] How to skip duplicate records while copying from CSV to table in Postgresql using COPY

2015-05-24 Thread Francisco Olarte
Hi Arup

On Sun, May 24, 2015 at 12:26 PM, Arup Rakshit
arupraks...@rocketmail.com wrote:
 I am copying the data from a CSV file to a table using the COPY command. But
 one thing I got stuck on is how to skip duplicate records while copying
 from CSV to tables. Looking at the documentation, it seems PostgreSQL
 doesn't have any built-in tool to handle this with the COPY command.
 Googling, I found the idea below of using a temp table.
 http://stackoverflow.com/questions/13947327/to-ignore-duplicate-keys-during-copy-from-in-postgresql
 I am also thinking: what if I let the records get inserted, and then delete
 the duplicate records from the table, as this post suggested -
 http://www.postgresql.org/message-id/37013500.dff0a...@manhattanproject.com.
 Both of the solutions look like doing double work. But I am not sure which is
 the best solution here. Can anybody suggest which approach I should adopt?
 Or if you guys have any better ideas on this task, please share.

Without looking at your SO answers (I suppose both consist of copying
to a temp table, then either doing an 'insert into main table select *
from temp where not exists...' or a delete from temp / insert into
main, which is what I would suggest), I would point out that COPY is
not the tool for this kind of job. COPY is normally used for bulk
loading of correct data, and is great at that. If you need to pre-clean
data, you are not doing double work by using one of those solutions.
Many ways of doing it have been pointed out; the exact one depends on
your data. You've said there are duplicate rows; if that is true you
can easily handle it using text tools and dumps, or the temporary
table. There is also the option of turning your CSV, with your favorite
text-handling tool, into a set of INSERT statements conditioned on the
non-existence of an appropriately indexed set of fields in the table
(which I've done but cannot quote exactly; it was an experiment and got
quite hairy). From my experience, go for the temporary table plus
insert-select or delete-plus-insert route: it's easier, it is normally
faster (supposing you have appropriate indexes) and it really is no
double work. You have dirty data (duplicates); you have to clean it
before inserting.

Or, you could turn the file into a series of plain inserts and feed it
to psql in autocommit mode, if you have some unique condition on the
table, so it errors out on every duplicate. It's not pretty but should
work. I still recommend the temp table approach.

Francisco Olarte.
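
To make the temp-table route concrete, a minimal SQL sketch; the table and
column names and the file path are hypothetical, so adjust them to your data:

  CREATE TEMP TABLE tmp_load (LIKE table1 INCLUDING DEFAULTS);

  COPY tmp_load (col1, col2) FROM '/tmp/c2.csv' WITH (FORMAT csv, HEADER true);

  INSERT INTO table1 (col1, col2)
  SELECT DISTINCT t.col1, t.col2
  FROM tmp_load t
  WHERE NOT EXISTS (
      SELECT 1 FROM table1 m
      WHERE (m.col1, m.col2) IS NOT DISTINCT FROM (t.col1, t.col2)
  );

  DROP TABLE tmp_load;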




Re: [GENERAL] How to skip duplicate records while copying from CSV to table in Postgresql using COPY

2015-05-24 Thread Arup Rakshit
On Sunday, May 24, 2015 07:52:43 AM you wrote:
 
  Is it if the entire row is duplicated?
 
  It is entire row.
 
 So, Oliver's second solution.
 

I have done this:

columns_t1 = self.singleton_class.fields.map { |f| "t1.#{f}" }.join(",")
columns_t2 = self.singleton_class.fields.map { |f| "t2.#{f}" }.join(",")
ActiveRecord::Base.transaction do
  conn = ActiveRecord::Base.connection
  # stage the CSV in a temp table shaped like the target table
  conn.execute "CREATE TEMP TABLE tmp_table AS SELECT * FROM #{table.strip};"
  conn.execute("COPY tmp_table ( #{self.singleton_class.fields.join(',')} ) FROM '#{source_file}' CSV HEADER DELIMITER '\t' QUOTE '|' ;")
  # insert only rows that the target table does not already contain
  conn.execute "INSERT INTO #{table.strip} ( #{self.singleton_class.fields.join(',')} ) SELECT DISTINCT #{columns_t1} FROM tmp_table t1 WHERE NOT EXISTS ( SELECT 1 FROM #{table.strip} t2 WHERE (#{columns_t2}) IS NOT DISTINCT FROM (#{columns_t1}) );"
  conn.execute "DROP TABLE IF EXISTS tmp_table;"
end

The SQL is wrapped inside the ActiveRecord ORM, as you see above, but I hope
you get the idea. I am not sure whether it is the correct way to do it, or how
it will affect performance.

The application can run on different OSes, so I cannot rely on Unix commands.

-- 

Regards,
Arup Rakshit

Debugging is twice as hard as writing the code in the first place. Therefore, 
if you write the code as cleverly as possible, you are, by definition, not 
smart enough to debug it.

--Brian Kernighan




Re: [GENERAL] Trying to solve a SocketError [Errno 13] Permission with PL/Python in PostgreSQL 9.3

2015-05-24 Thread Adrian Klaver

On 05/24/2015 04:15 PM, Marcos Ortiz wrote:

Sorry for the late response.





Not sure if it applies but see here:

http://neo4j.com/docs/stable/rest-api-security.html
When Neo4j is first installed you can authenticate with the default
user neo4j and the default password neo4j. However, the default
password must be changed (see the section called “User status and
password changing”) before access to resources will be permitted. ...

Yes, I changed the password.
It seems that the problem was with SELinux.
I disabled it for a moment to make a simple test and everything worked.
But, like Tom said, I don't want SELinux disabled in my systems, so I
will find out the security label that is blocking this in CentOS, and
enable it again. When I find it, I will send the solution to the list.


Glad you found the cause and thanks for following up. Nice to be able to 
close the loop on a problem.



Best wishes and thanks again for your time.





--
Marcos Ortiz http://about.me/marcosortiz, Sr. Product Manager (Data
Infrastructure) at UCI
@marcosluis2186 http://twitter.com/marcosluis2186





--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [GENERAL] Re: Server tries to read a different config file than it is supposed to

2015-05-24 Thread Tom Lane
twoflower standa.ku...@gmail.com writes:
 I was doing this after I upgraded to 9.4.2, yes. As for the shut down: I
 suspect the server was rebooted without explicitly stopping Postgres. Not
 sure how this plays out in terms of cleanliness. This is everything relevant
 in the log file after I ran the start script:
 2015-05-23 10:36:39.999 GMT [2102][0]: [1] LOG: database system was
 interrupted; last known up at 2015-05-23 08:59:41 GMT
 2015-05-23 10:36:40.053 GMT [2102][0]: [2] FATAL: could not open file
 /storage/postgresql/9.4/data/postgresql.conf: Permission denied
 2015-05-23 10:36:40.054 GMT [2100][0]: [3] LOG: startup process (PID 2102)
 exited with exit code 1
 2015-05-23 10:36:40.054 GMT [2100][0]: [4] LOG: aborting startup due to
 startup process failure

Yeah, so this is long after the real config file has been read.

I think that that unwritable postgresql.conf file had probably been
hanging around in your data directory for some time.  It was not causing
any particular problem until we decided we ought to fsync everything in
the data directory after a crash.  So this is indeed the same case
Christoph was complaining about.  But really you should remove that file
not just change its permissions; as is it's just causing confusion.

 I also tried the same situation on two other Ubuntu servers with the same
 version of Postgres (also upgraded to 9.4.2) and the same directory layout -
 made *postgresql.conf* in the data directory unaccessible, even renamed it,
 and everything worked fine. The only difference is that these are
 streaming-replicated standby servers. They also had been restarted without
 explicitly terminating Postgres.

Hm.  I wonder why we aren't fsync'ing on crash restart on standby servers
as well.

regards, tom lane




Re: [GENERAL] How to skip duplicate records while copying from CSV to table in Postgresql using COPY

2015-05-24 Thread Scott Marlowe
On Sun, May 24, 2015 at 4:26 AM, Arup Rakshit
arupraks...@rocketmail.com wrote:
 Hi,

 I am copying the data from a CSV file to a table using the COPY command. But
 one thing I got stuck on is how to skip duplicate records while copying
 from CSV to tables. Looking at the documentation, it seems PostgreSQL
 doesn't have any built-in tool to handle this with the COPY command.
 Googling, I found the idea below of using a temp table.

 http://stackoverflow.com/questions/13947327/to-ignore-duplicate-keys-during-copy-from-in-postgresql

 I am also thinking: what if I let the records get inserted, and then delete
 the duplicate records from the table, as this post suggested -
 http://www.postgresql.org/message-id/37013500.dff0a...@manhattanproject.com.

 Both of the solutions look like doing double work. But I am not sure which is
 the best solution here. Can anybody suggest which approach I should adopt?
 Or if you guys have any better ideas on this task, please share.

Have you looked at pg_loader?
http://pgloader.io/index.html




Re: [GENERAL] Trying to solve a SocketError [Errno 13] Permission with PL/Python in PostgreSQL 9.3

2015-05-24 Thread Marcos Ortiz

Sorry for the late response.

On 23/05/15 19:38, Adrian Klaver wrote:

On 05/23/2015 04:16 PM, Marcos Ortiz wrote:



On 23/05/15 19:09, Adrian Klaver wrote:

On 05/23/2015 03:51 PM, Marcos Ortiz wrote:



On 23/05/15 18:40, Adrian Klaver wrote:

On 05/23/2015 03:27 PM, Marcos Ortiz wrote:

Regards to all the list.
First all the info about the system:

O.S: CentOS 7 64 bits
PostgreSQL version:

SELECT version();
                                                   version
---------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.2.7 on x86_64-redhat-linux-gnu, compiled by gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16), 64-bit
(1 row)

Neo4j version: 2.1.M
Py2neo version: 2.0.8
Python version:
python
Python 2.7.5 (default, Jun 17 2014, 18:11:42)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-16)] on linux2

Now the problem:

We are working here to integrate PostgreSQL with Neo4j through PL/Python,
using the py2neo module, and when we want to send statements to
Neo4j on port 7474, the executed code raises a SocketError
[Errno 13] Permission denied.


Well first in the code below, if I am following correctly, the socket
is 37474.

Yes, Adrian. Sorry for that, the correct port is 7474. I just was
testing with higher ports to
see if the error persisted.





I tested the same code in a normal python script outside of
PostgreSQL,
and it works well, but the problem is when I use the code inside
PostgreSQL with PL/Python.


Second, the plpythonu code is running as the postgres user, so does
that user have permissions on the socket?

You mean the socket created by the Neo4j server, right?
For that reason, I created a group in the system for this, named
supervisor, of which the neo4j and postgres users are members.
So, if I find the socket file for the Neo4j server, changing its
permissions could solve the problem. Right?


Not sure, but a quick search found that py2neo uses the neo4j REST API
and that API has authorization parameters:

http://neo4j.com/docs/stable/security-server.html

Have you gone through the above?

Yes, Adrian. py2neo installs a tool called neoauth, which can be used to
create users with their respective passwords.
For that reason, I create the graph this way:

graph = Graph("http://neo4j:neo4j@10.8.45.136:7474/db/data")
using the user neo4j and its password neo4j



Not sure if it applies but see here:

http://neo4j.com/docs/stable/rest-api-security.html
When Neo4j is first installed you can authenticate with the default 
user neo4j and the default password neo4j. However, the default 
password must be changed (see the section called “User status and 
password changing”) before access to resources will be permitted. ...

Yes, I changed the password.
It seems that the problem was with SELinux.
I disabled it for a moment to make a simple test and everything worked.
But, like Tom said, I don't want SELinux disabled in my systems, so I
will find out the security label that is blocking this in CentOS, and
enable it again. When I find it, I will send the solution to the list.
Best wishes and thanks again for your time.





--
Marcos Ortiz http://about.me/marcosortiz, Sr. Product Manager (Data 
Infrastructure) at UCI

@marcosluis2186 http://twitter.com/marcosluis2186



SOLVED: [GENERAL] Trying to solve a SocketError [Errno 13] Permission with PL/Python in PostgreSQL 9.3

2015-05-24 Thread Marcos Ortiz

Regards, Adrian, Tom and all the pgsql-general list.
Like Tom said, the problem was with SELinux, and I found the policy that
caused all the problems:

grep denied audit.log | audit2allow


#============= postgresql_t ==============

# This avc can be allowed using the boolean 'nis_enabled'
allow postgresql_t ephemeral_port_t:tcp_socket name_connect;

# This avc can be allowed using the boolean 'nis_enabled'
allow postgresql_t unreserved_port_t:tcp_socket name_connect;

I checked that boolean with:

# getsebool -a | grep nis_enabled
nis_enabled --> off

Then, I changed it to on, and everything works well with SELinux enabled 
by default.

Thanks again for the time and patience.


On 24/05/15 19:43, Adrian Klaver wrote:

On 05/24/2015 04:15 PM, Marcos Ortiz wrote:

Sorry for the late response.





Not sure if it applies but see here:

http://neo4j.com/docs/stable/rest-api-security.html
When Neo4j is first installed you can authenticate with the default
user neo4j and the default password neo4j. However, the default
password must be changed (see the section called “User status and
password changing”) before access to resources will be permitted. ...

Yes, I changed the password.
It seems that the problem was with SELinux.
I disabled it for a moment to make a simple test and everything worked.
But, like Tom said, I don't want SELinux disabled in my systems, so I
will find out the security label that is blocking this in CentOS, and
enable it again. When I find it, I will send the solution to the list.


Glad you found the cause and thanks for following up. Nice to be able 
to close the loop on a problem.





Best wishes and thanks again for your time.





--
Marcos Ortiz http://about.me/marcosortiz, Sr. Product Manager (Data
Infrastructure) at UCI
@marcosluis2186 http://twitter.com/marcosluis2186







--
Marcos Ortiz http://about.me/marcosortiz, Sr. Product Manager (Data 
Infrastructure) at UCI

@marcosluis2186 http://twitter.com/marcosluis2186



[GENERAL] Re: Server tries to read a different config file than it is supposed to

2015-05-24 Thread twoflower

 From root, presumably ... 

Yes 
 I thought of a different theory: maybe the server's complaint is not due
 to trying to read that file as a config file, but it's just because there
 is an unreadable/unwritable file in the data directory.  See Christoph
 Berg's complaint at
 http://www.postgresql.org/message-id/20150523172627.ga24...@... this would
 only apply if the OP was trying to use this week's releases though.  Also,
 I thought the fsync-everything code would only run if the server had been
 shut down uncleanly.  Which maybe it was, but that bit of info wasn't
 provided either.

I was doing this after I upgraded to 9.4.2, yes. As for the shut down: I
suspect the server was rebooted without explicitly stopping Postgres. Not
sure how this plays out in terms of cleanliness. This is everything relevant
in the log file after I ran the start script:
2015-05-23 10:36:39.999 GMT [2102][0]: [1] LOG: database system was
interrupted; last known up at 2015-05-23 08:59:41 GMT
2015-05-23 10:36:40.053 GMT [2102][0]: [2] FATAL: could not open file
/storage/postgresql/9.4/data/postgresql.conf: Permission denied
2015-05-23 10:36:40.054 GMT [2100][0]: [3] LOG: startup process (PID 2102)
exited with exit code 1
2015-05-23 10:36:40.054 GMT [2100][0]: [4] LOG: aborting startup due to
startup process failure
I also tried the same situation on two other Ubuntu servers with the same
version of Postgres (also upgraded to 9.4.2) and the same directory layout -
made *postgresql.conf* in the data directory inaccessible, even renamed it,
and everything worked fine. The only difference is that these are
streaming-replicated standby servers. They also had been restarted without
explicitly terminating Postgres.



