Re: [GENERAL] Need suggestion on how best to update 3 million rows

2007-09-06 Thread Alvaro Herrera
Alvaro Herrera wrote:
> Lincoln Yeoh wrote:
> > At 06:32 PM 9/6/2007, Richard Huxton wrote:
> >
> >> Two other tips for bulk-updates like this:
> >> 1. Do as many columns in one go as you can
> >> 2. Only update rows that need updating
> >>
> >> When you've finished, a CLUSTER/VACUUM FULL can be useful too.
> >
> > How about: make sure you have enough free space because the table will 
> > effectively double in size? Assuming it hasn't already been updated a few 
> > times without vacuuming :).
> 
> It is true for CLUSTER, but not for VACUUM FULL.

Doh, sorry, you were referring to double the space during the UPDATE.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Need suggestion on how best to update 3 million rows

2007-09-06 Thread Alvaro Herrera
Lincoln Yeoh wrote:
> At 06:32 PM 9/6/2007, Richard Huxton wrote:
>
>> Two other tips for bulk-updates like this:
>> 1. Do as many columns in one go as you can
>> 2. Only update rows that need updating
>>
>> When you've finished, a CLUSTER/VACUUM FULL can be useful too.
>
> How about: make sure you have enough free space because the table will 
> effectively double in size? Assuming it hasn't already been updated a few 
> times without vacuuming :).

It is true for CLUSTER, but not for VACUUM FULL.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Need suggestion on how best to update 3 million rows

2007-09-06 Thread Lincoln Yeoh

At 06:32 PM 9/6/2007, Richard Huxton wrote:


Two other tips for bulk-updates like this:
1. Do as many columns in one go as you can
2. Only update rows that need updating

When you've finished, a CLUSTER/VACUUM FULL can be useful too.


How about: make sure you have enough free space because the table 
will effectively double in size? Assuming it hasn't already been 
updated a few times without vacuuming :).


That's still true right?

It is safe to assume that postgresql will still handle the out of 
disk space scenario gracefully - no data corruption - the transaction 
fails and that's it?


Regards,
Link.


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


Re: [GENERAL] Need suggestion on how best to update 3 million rows

2007-09-06 Thread hubert depesz lubaczewski
On Thu, Sep 06, 2007 at 01:39:51PM +0200, Nis Jørgensen wrote:
> Rubbish. From the documentation:

hmm .. i'm sorry - i was *sure* about it because we were bitten by
something like this lately - apparently it was similiar but not the
same.

sorry again for misinformation.

depesz

-- 
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA.  here's my CV!" :)
http://www.depesz.com/ - blog dla ciebie (i moje CV)

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Need suggestion on how best to update 3 million rows

2007-09-06 Thread Nis Jørgensen
hubert depesz lubaczewski skrev:
> On Thu, Sep 06, 2007 at 11:08:02AM +0200, Alban Hertroys wrote:
>> create index tmp_idx on table(number) where number != trim(number);
>> analyze table;
>> update table set number = trim(number) where number != trim(number);
> 
> dont use !=. use <>. != does something different, and in fact it is
> not a real operator - it's just 2 operators bundled together.

Rubbish. From the documentation:

"The != operator is converted to <> in the parser stage. It is not
possible to implement != and <> operators that do different things."

Nis


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] Need suggestion on how best to update 3 million rows

2007-09-06 Thread hubert depesz lubaczewski
On Thu, Sep 06, 2007 at 11:08:02AM +0200, Alban Hertroys wrote:
> create index tmp_idx on table(number) where number != trim(number);
> analyze table;
> update table set number = trim(number) where number != trim(number);

dont use !=. use <>. != does something different, and in fact it is
not a real operator - it's just 2 operators bundled together.

depesz

-- 
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA.  here's my CV!" :)
http://www.depesz.com/ - blog dla ciebie (i moje CV)

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Need suggestion on how best to update 3 million rows

2007-09-06 Thread Richard Huxton

Ow Mun Heng wrote:

On Thu, 2007-09-06 at 04:47 -0500, Ron Johnson wrote:

Maybe there's an English language "issue", or maybe I'm just
excessively picky, but using "number" in this context is confusing.


My Bad.. hehe.. 


Then I agree with Alban:
  update table set number = trim(number);
or, if you need the leading spaces:
  update table set number = rtrim(number)

Then:
  update table set number = NULL where number = '';


Dag it.. I wish I didn't live on the other end of the TimeZone and
gotten this answer before I went the "hard way" 


oh well.. lesson learned and that's the positive spin on it right?

Thanks a bunch guys..


Two other tips for bulk-updates like this:
1. Do as many columns in one go as you can
2. Only update rows that need updating

When you've finished, a CLUSTER/VACUUM FULL can be useful too.

--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


Re: [GENERAL] Need suggestion on how best to update 3 million rows

2007-09-06 Thread Ow Mun Heng
On Thu, 2007-09-06 at 04:47 -0500, Ron Johnson wrote:
> Maybe there's an English language "issue", or maybe I'm just
> excessively picky, but using "number" in this context is confusing.

My Bad.. hehe.. 

> Then I agree with Alban:
>   update table set number = trim(number);
> or, if you need the leading spaces:
>   update table set number = rtrim(number)
> 
> Then:
>   update table set number = NULL where number = '';

Dag it.. I wish I didn't live on the other end of the TimeZone and
gotten this answer before I went the "hard way" 

oh well.. lesson learned and that's the positive spin on it right?

Thanks a bunch guys..


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Need suggestion on how best to update 3 million rows

2007-09-06 Thread Ron Johnson
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 09/06/07 04:20, Ow Mun Heng wrote:
> On Thu, 2007-09-06 at 04:07 -0500, Ron Johnson wrote:
> 
>> On 09/06/07 01:13, Ow Mun Heng wrote:
> 
>>> update org_column set number = foo.number where foo.unique_id =
>>> org_column=unique_id.
>> Number?  Where does "number" come from?  Unless you've got weird
>> field names, that doesn't sound like a very good name for a
>> VARCHAR(4) column.
> 
> "number" is just a fictitious column name. I use sane column names of
> course :-)

OK.

Maybe there's an English language "issue", or maybe I'm just
excessively picky, but using "number" in this context is confusing.

In any event, it didn't stop the Earth from spinning or the sun from
shining, so it can't be that bad of a problem...

>>> The update is taking a few hours and still hasn't ended.
>>>
> 
>> Is it only *some* tuples that have the "extra space" problem?
> 
> Actually, it's all of it 

Then I agree with Alban:
  update table set number = trim(number);
or, if you need the leading spaces:
  update table set number = rtrim(number)

Then:
  update table set number = NULL where number = '';

- --
Ron Johnson, Jr.
Jefferson LA  USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFG38y7S9HxQb37XmcRAgC8AKDue6TRz4oLcmavV5u6dw0yOiQC4gCfVmgt
pCuDuyjOKh7LM/dfACkw3lc=
=KCw6
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Need suggestion on how best to update 3 million rows

2007-09-06 Thread Ow Mun Heng
On Thu, 2007-09-06 at 11:08 +0200, Alban Hertroys wrote:
> Ow Mun Heng wrote:
> > I found 2 new ways to do this.
> > 
> > option 1
> > ---
> > 
> > create table foo as select unique_id, rtrim(number) as number from foo;
> > alter table add primary key...
> > create index...
> > drop org_table
> > alter table rename...
> > All this is ~10min
> 
> This only works if you don't have foreign key constraints involving that
> table. Otherwise you just lost your data integrity (although I expect an
> error to be thrown).

Got it.. Don't use FK's so.. I'm safe (for now)
> 
> > option 2
> > 
> > This I saw in the mysql archives (in my laptop).. when I say this I
> > went.. WTF? This is possible?? Dang IT! 
> > 
> > update a set number=replace(number,'ABC ', 'ABC') where reptest like '%
> > ABC%';
> 
> Ehrm... yes, nothing special about it. Basic SQL really ;)
Hmm.. I feel the salt..

> But shouldn't you be using trim() or rtrim() instead?:
> 
> update table set number = trim(number)

Hmm.. didn't think of that. Next time I guess. (in all honestly, I
didn't know you can update it on the same process/column/table. I was
dumping it to a separate table and updating it..

Now I know..
> 
> you could probably speed that up by only querying the records that need
> trimming, for example:
> 
> create index tmp_idx on table(number) where number != trim(number);
> analyze table;
> update table set number = trim(number) where number != trim(number);

all fields in that column is affected. I have " " (5 spaces) instead
of nulls

Thanks for the pointers..

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Need suggestion on how best to update 3 million rows

2007-09-06 Thread Ow Mun Heng
On Thu, 2007-09-06 at 04:07 -0500, Ron Johnson wrote:

> On 09/06/07 01:13, Ow Mun Heng wrote:

> > update org_column set number = foo.number where foo.unique_id =
> > org_column=unique_id.
> 
> Number?  Where does "number" come from?  Unless you've got weird
> field names, that doesn't sound like a very good name for a
> VARCHAR(4) column.

"number" is just a fictitious column name. I use sane column names of
course :-)

> 
> > The update is taking a few hours and still hasn't ended.
> > 

> Is it only *some* tuples that have the "extra space" problem?

Actually, it's all of it 

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] Need suggestion on how best to update 3 million rows

2007-09-06 Thread Ron Johnson
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 09/06/07 01:13, Ow Mun Heng wrote:
> I have a table in PG, Pulled from SQL Server using Perl DBI (w/o using
> chopblanks) and have ended up with a column where the "space" is being
> interpreted as a value.
> 
> eg: 
> 
> "ABC " when it should be "ABC"
> 
> this is being defined  as varchar(4)
> 
> I've already pull the relevent columns with 
> 
> create foo as select unique_id, rtrim(number) from org_column
> 
> I've tried to do the update using
> 
> update org_column set number = foo.number where foo.unique_id =
> org_column=unique_id.

Number?  Where does "number" come from?  Unless you've got weird
field names, that doesn't sound like a very good name for a
VARCHAR(4) column.

> The update is taking a few hours and still hasn't ended.
> 
> I've killed it already and rolled back the changes.
> 
> what's the easiest way to update these fields?

Is it only *some* tuples that have the "extra space" problem?

- --
Ron Johnson, Jr.
Jefferson LA  USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFG38NAS9HxQb37XmcRAlZhAKCMtXSMzvbZ04M3YAdlAhjN4p7rSQCfZTDp
Goyd+/FIFdwoc7IA87Mr3xM=
=hJfr
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Need suggestion on how best to update 3 million rows

2007-09-06 Thread Alban Hertroys
Ow Mun Heng wrote:
> I found 2 new ways to do this.
> 
> option 1
> ---
> 
> create table foo as select unique_id, rtrim(number) as number from foo;
> alter table add primary key...
> create index...
> drop org_table
> alter table rename...
> All this is ~10min

This only works if you don't have foreign key constraints involving that
table. Otherwise you just lost your data integrity (although I expect an
error to be thrown).

> option 2
> 
> This I saw in the mysql archives (in my laptop).. when I say this I
> went.. WTF? This is possible?? Dang IT! 
> 
> update a set number=replace(number,'ABC ', 'ABC') where reptest like '%
> ABC%';

Ehrm... yes, nothing special about it. Basic SQL really ;)
But shouldn't you be using trim() or rtrim() instead?:

update table set number = trim(number)

you could probably speed that up by only querying the records that need
trimming, for example:

create index tmp_idx on table(number) where number != trim(number);
analyze table;
update table set number = trim(number) where number != trim(number);

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Need suggestion on how best to update 3 million rows

2007-09-05 Thread Ow Mun Heng
On Thu, 2007-09-06 at 14:13 +0800, Ow Mun Heng wrote:
> I have a table in PG, Pulled from SQL Server using Perl DBI (w/o using
> chopblanks) and have ended up with a column where the "space" is being
> interpreted as a value.
> 
> eg: 
> 
> "ABC " when it should be "ABC"
> 
> this is being defined  as varchar(4)
> 
> I've already pull the relevent columns with 
> 
> create foo as select unique_id, rtrim(number) from org_column
> 
> I've tried to do the update using
> 
> update org_column set number = foo.number where foo.unique_id =
> org_column=unique_id.
> 
> The update is taking a few hours and still hasn't ended.
> 
> I've killed it already and rolled back the changes.
> 
> what's the easiest way to update these fields?
> 
> 
> Thanks..


Bad Form.. I know.. replying to my own post.. but..

I found 2 new ways to do this.

option 1
---

create table foo as select unique_id, rtrim(number) as number from foo;
alter table add primary key...
create index...
drop org_table
alter table rename...
All this is ~10min


option 2

This I saw in the mysql archives (in my laptop).. when I say this I
went.. WTF? This is possible?? Dang IT! 

update a set number=replace(number,'ABC ', 'ABC') where reptest like '%
ABC%';

of course this will need to parse it through like a couple of times,
unless I use some regex magic etc.. 

but anyway.. problem solved and using chopblanks => 1 now..

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[GENERAL] Need suggestion on how best to update 3 million rows

2007-09-05 Thread Ow Mun Heng
I have a table in PG, Pulled from SQL Server using Perl DBI (w/o using
chopblanks) and have ended up with a column where the "space" is being
interpreted as a value.

eg: 

"ABC " when it should be "ABC"

this is being defined  as varchar(4)

I've already pull the relevent columns with 

create foo as select unique_id, rtrim(number) from org_column

I've tried to do the update using

update org_column set number = foo.number where foo.unique_id =
org_column=unique_id.

The update is taking a few hours and still hasn't ended.

I've killed it already and rolled back the changes.

what's the easiest way to update these fields?


Thanks..

---(end of broadcast)---
TIP 6: explain analyze is your friend