Re: [GENERAL] Bulk Load Ignore/Skip Feature

2007-11-17 Thread Ow Mun Heng

On Wed, 2007-11-14 at 00:02 -0800, Willem Buitendyk wrote:
> Perfect - that appears to be exactly what I was looking for.

> Reg Me Please wrote:
> > On Wednesday 14 November 2007 05:50:36 Willem Buitendyk wrote:
> >> Will Postgresql ever implement an ignore on error feature when bulk
> >> loading data?  Currently it is my understanding that any record that
> >> violates a unique constraint will cause the "copy from" command to halt
> >> execution instead of skipping over the violation and logging it - as is
> >> done in Oracle and DB2.
> >
> > pgloader
> >
> > http://pgfoundry.org/projects/pgloader/

I believe the last time I tried this, there were still some issues with
it. See the attached email (if it makes it to the list).

--- Begin Message ---
On Mon, 2007-08-27 at 11:27 +0200, Dimitri Fontaine wrote:

> We've just made some tests here with 2.2.1 and as this release contains the 
> missing files, it works fine without any installation.

Yep.. I can confirm that it works.. I am using the csv example.

Goal: functionality similar to mysql's mysqlimport --replace
(overwrite any rows that have duplicate primary keys)

$ psql pgloader < csv/csv.sql
$ ../pgloader.py -Tvc examples/pgloader.conf csv

pgloader=# alter table csv add primary key (a,b,c);
pgloader=# \d csv
Table "public.csv"
 Column |     Type     | Modifiers
--------+--------------+-----------
 a      | bigint       | not null
 b      | bigint       | not null
 c      | character(2) | not null
 d      | text         |
Indexes:
    "csv_pkey" PRIMARY KEY, btree (a, b, c)

pgloader=# select * from csv;
    a     |    b     | c  |       d
----------+----------+----+----------------
 33996344 | 33996351 | GB | United Kingdom
 50331648 | 68257567 | US | United States
 68257568 | 68257599 | CA | Canada
 68257600 | 68259583 | US | United States
 68259584 | 68259599 | CA | Canada

$ cat csv/csv.data
"2.6.190.56","2.6.190.63","33996344","33996351","GB","Error Kingdom"
"4.17.143.0","4.17.143.15","68259584","68259599","CA","new Country"
--- End Message ---

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Bulk Load Ignore/Skip Feature

2007-11-16 Thread Dimitri Fontaine
Hi all,

On Friday 16 November 2007 18:04:44, Willem Buitendyk wrote:
> Martijn van Oosterhout wrote:
> > On Thu, Nov 15, 2007 at 08:09:46PM -0800, Willem Buitendyk wrote:
> >> Damn - so the unique constraint is still an issue.  What gives?  Why is
> >> it so hard to implement this in Postgresql?  sigh - if only I had more
> >> time.
> >
> > Can you explain? The server of course still generates error messages in
> > the logs, there's no way around that. However it looks to me that the
> > data ended up in the database correctly? Or did I miss something?

pgloader will load the non-conflicting data, and produce both a reject log with 
the errors for the rows it could not INSERT (or COPY) and a reject data file with 
the corresponding input lines, ready to be processed again if the operator so 
chooses.
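Conceptually — and purely as an illustration, not pgloader's actual code (pgloader
is Python, and the table and function names below are made up) — the per-row
skip-and-log behaviour looks like this in PL/pgSQL (SQLERRM needs 8.1 or later):

```sql
-- Hypothetical illustration of skip-and-log loading: each row is tried
-- individually, and a unique-constraint violation sends the input line
-- to a rejects table instead of aborting the whole load.
CREATE TABLE rejects (line text, reason text);

CREATE OR REPLACE FUNCTION load_row(l bigint, r bigint, cc char(2), name text)
RETURNS void AS $$
BEGIN
    INSERT INTO csv (a, b, c, d) VALUES (l, r, cc, name);
EXCEPTION WHEN unique_violation THEN
    -- log the offending input and the error text instead of halting
    INSERT INTO rejects
    VALUES (l::text || ',' || r::text || ',' || cc || ',' || name, SQLERRM);
END;
$$ LANGUAGE plpgsql;
```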

But at the moment it does not provide any way to automate UPDATEs of the 
PK-conflicting rows. I'm really hesitant to code this option: what should happen 
in the case of a conflict on a unique constraint that is not the primary key?

dim=# create table unic(a integer unique);
dim=# insert into unic values (1);
INSERT 2312559 1
dim=# insert into unic values (1);
ERROR:  duplicate key violates unique constraint "unic_a_key"

dim=# create table pk(a integer primary key);
dim=# insert into pk values (1);
INSERT 2312565 1
dim=# insert into pk values (1);
ERROR:  duplicate key violates unique constraint "pk_pkey"

I'm thinking that in the first case you don't want existing values to be 
overwritten, but in the second case that's exactly what you want to happen. Should 
it be the user's responsibility to make the distinction --- by configuring pgloader 
properly --- or should the tool try hard to protect the user against himself?

And how should the tool act on a table with a surrogate primary key and a 
separate unique constraint, when you want to automatically update on the 
surrogate key but not on the unique data, or the other way around?
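For the record, the --replace behaviour can already be emulated by hand with an
update-then-insert pair in one transaction (a sketch against the example csv table
from the earlier test, using one of its data lines; note this is not safe against
concurrent writers without locking):

```sql
BEGIN;
-- overwrite the payload of any row whose key already exists ...
UPDATE csv SET d = 'Error Kingdom'
 WHERE a = 33996344 AND b = 33996351 AND c = 'GB';
-- ... and insert the row only if the key was not there yet
INSERT INTO csv (a, b, c, d)
SELECT 33996344, 33996351, 'GB', 'Error Kingdom'
 WHERE NOT EXISTS (SELECT 1 FROM csv
                    WHERE a = 33996344 AND b = 33996351 AND c = 'GB');
COMMIT;
```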

So I have two questions for the community:
 - should I provide a pgloader mailing list?
 - what do you think about adding the UPDATE-on-duplicate-key-error option?

> My apologies.  I misinterpreted that last post.  I have not been able to
> try pgloader as I am using the windows platform.

pgloader is a python "script" which depends on psycopg for handling the 
PostgreSQL connection, and otherwise only on standard python modules. The 
following link provides windows binaries for psycopg.
  http://www.stickpeople.com/projects/python/win-psycopg/

I've gotten reports of pgloader running on windows, even though I didn't make 
any specific effort for that to happen and I don't have any proprietary licensed 
OS to test pgloader on.

Hope this helps,
-- 
dim



Re: [GENERAL] Bulk Load Ignore/Skip Feature

2007-11-16 Thread Willem Buitendyk
My apologies.  I misinterpreted that last post.  I have not been able to 
try pgloader as I am using the windows platform. 


Martijn van Oosterhout wrote:
> On Thu, Nov 15, 2007 at 08:09:46PM -0800, Willem Buitendyk wrote:
>> Damn - so the unique constraint is still an issue.  What gives?  Why is
>> it so hard to implement this in Postgresql?  sigh - if only I had more time.
>
> Can you explain? The server of course still generates error messages in
> the logs, there's no way around that. However it looks to me that the
> data ended up in the database correctly? Or did I miss something?
>
> Have a nice day,





Re: [GENERAL] Bulk Load Ignore/Skip Feature

2007-11-16 Thread Martijn van Oosterhout
On Thu, Nov 15, 2007 at 08:09:46PM -0800, Willem Buitendyk wrote:
> Damn - so the unique constraint is still an issue.  What gives?  Why is 
> it so hard to implement this in Postgresql?  sigh - if only I had more time.

Can you explain? The server of course still generates error messages in
the logs, there's no way around that. However it looks to me that the
data ended up in the database correctly? Or did I miss something?

Have a nice day,
-- 
Martijn van Oosterhout   <[EMAIL PROTECTED]>   http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent revolution 
> inevitable.
>  -- John F Kennedy




Re: [GENERAL] Bulk Load Ignore/Skip Feature

2007-11-15 Thread Willem Buitendyk
Damn - so the unique constraint is still an issue.  What gives?  Why is 
it so hard to implement this in Postgresql?  sigh - if only I had more time.



Ow Mun Heng wrote:
> On Wed, 2007-11-14 at 00:02 -0800, Willem Buitendyk wrote:
>> Perfect - that appears to be exactly what I was looking for.
>
>> Reg Me Please wrote:
>>> On Wednesday 14 November 2007 05:50:36 Willem Buitendyk wrote:
>>>> Will Postgresql ever implement an ignore on error feature when bulk
>>>> loading data?  Currently it is my understanding that any record that
>>>> violates a unique constraint will cause the "copy from" command to halt
>>>> execution instead of skipping over the violation and logging it - as is
>>>> done in Oracle and DB2.
>>>
>>> pgloader
>>>
>>> http://pgfoundry.org/projects/pgloader/
>
> I believe the last time I tried this, there were still some issues with
> it. See the attached email (if it makes it to the list).

Subject: PgLoader unable to handle pkey dups Was [Re: {Spam} [GENERAL] pgloader 
- Can't find textreader/csvreader]
From: Ow Mun Heng <[EMAIL PROTECTED]>
Date: Mon, 27 Aug 2007 18:01:54 +0800
To: Dimitri Fontaine <[EMAIL PROTECTED]>
CC: pgsql-general@postgresql.org

On Mon, 2007-08-27 at 11:27 +0200, Dimitri Fontaine wrote:
> We've just made some tests here with 2.2.1 and as this release contains the
> missing files, it works fine without any installation.

Yep.. I can confirm that it works.. I am using the csv example.

Goal: functionality similar to mysql's mysqlimport --replace
(overwrite any rows that have duplicate primary keys)

$ psql pgloader < csv/csv.sql
$ ../pgloader.py -Tvc examples/pgloader.conf csv

pgloader=# alter table csv add primary key (a,b,c);
pgloader=# \d csv
Table "public.csv"
 Column |     Type     | Modifiers
--------+--------------+-----------
 a      | bigint       | not null
 b      | bigint       | not null
 c      | character(2) | not null
 d      | text         |
Indexes:
    "csv_pkey" PRIMARY KEY, btree (a, b, c)

pgloader=# select * from csv;
    a     |    b     | c  |       d
----------+----------+----+----------------
 33996344 | 33996351 | GB | United Kingdom
 50331648 | 68257567 | US | United States
 68257568 | 68257599 | CA | Canada
 68257600 | 68259583 | US | United States
 68259584 | 68259599 | CA | Canada

$ cat csv/csv.data
"2.6.190.56","2.6.190.63","33996344","33996351","GB","Error Kingdom"
"4.17.143.0","4.17.143.15","68259584","68259599","CA","new Country"





Re: [GENERAL] Bulk Load Ignore/Skip Feature

2007-11-14 Thread David Fetter
On Tue, Nov 13, 2007 at 08:50:36PM -0800, Willem Buitendyk wrote:
> Will Postgresql ever implement an ignore on error feature when bulk
> loading data?  Currently it is my understanding that any record that
> violates a unique constraint will cause the "copy from" command to
> halt execution instead of skipping over the violation and logging it
> - as is done in Oracle and DB2.
> 
> Are there alternative ways of dealing with this scenario that won't
> consume as much time?
> 
> Appreciate any help  - would love to migrate away from Oracle.

You might try pgloader. :)

http://pgfoundry.org/projects/pgloader/
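
Failing that, a common workaround that keeps most of COPY's speed is to COPY
into an unconstrained staging table and then move only the non-conflicting rows
across (a rough sketch; the table and column names here are made up, and the
file path is a placeholder — COPY FROM a server-side file needs superuser, so
you may want psql's \copy instead):

```sql
-- staging table with the same columns but no constraints,
-- so COPY cannot fail on duplicates
CREATE TABLE staging (LIKE target);
COPY staging FROM '/path/to/data.csv' WITH CSV;

-- move over only rows whose key is not already in the target,
-- de-duplicating within the batch as well
INSERT INTO target (id, payload)
SELECT DISTINCT ON (id) id, payload
  FROM staging s
 WHERE NOT EXISTS (SELECT 1 FROM target t WHERE t.id = s.id);
```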

Cheers,
David.
-- 
David Fetter <[EMAIL PROTECTED]> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: [EMAIL PROTECTED]

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [GENERAL] Bulk Load Ignore/Skip Feature

2007-11-14 Thread Willem Buitendyk

Perfect - that appears to be exactly what I was looking for.

Cheers

Reg Me Please wrote:
> On Wednesday 14 November 2007 05:50:36 Willem Buitendyk wrote:
>> Will Postgresql ever implement an ignore on error feature when bulk
>> loading data?  Currently it is my understanding that any record that
>> violates a unique constraint will cause the "copy from" command to halt
>> execution instead of skipping over the violation and logging it - as is
>> done in Oracle and DB2.
>>
>> Are there alternative ways of dealing with this scenario that won't
>> consume as much time?
>>
>> Appreciate any help  - would love to migrate away from Oracle.
>>
>> Cheers
>
> pgloader
>
> http://pgfoundry.org/projects/pgloader/





Re: [GENERAL] Bulk Load Ignore/Skip Feature

2007-11-13 Thread Reg Me Please
On Wednesday 14 November 2007 05:50:36 Willem Buitendyk wrote:
> Will Postgresql ever implement an ignore on error feature when bulk
> loading data?  Currently it is my understanding that any record that
> violates a unique constraint will cause the "copy from" command to halt
> execution instead of skipping over the violation and logging it - as is
> done in Oracle and DB2.
>
> Are there alternative ways of dealing with this scenario that won't
> consume as much time?
>
> Appreciate any help  - would love to migrate away from Oracle.
>
> Cheers

pgloader

http://pgfoundry.org/projects/pgloader/

-- 
Reg me Please



[GENERAL] Bulk Load Ignore/Skip Feature

2007-11-13 Thread Willem Buitendyk
Will Postgresql ever implement an ignore on error feature when bulk 
loading data?  Currently it is my understanding that any record that 
violates a unique constraint will cause the "copy from" command to halt 
execution instead of skipping over the violation and logging it - as is 
done in Oracle and DB2.


Are there alternative ways of dealing with this scenario that won't 
consume as much time?


Appreciate any help  - would love to migrate away from Oracle.

Cheers


