Re: [GENERAL] SQL Diff ?

2007-08-27 Thread Alban Hertroys
Kevin Kempter wrote:
> Hi List;
> 
> I have a very large table (52million rows) - I'm creating a copy of it to rid 
> it of 35G worth of dead space, then I'll do a sync, drop the original table 
> and rename table2.
> 
> Once I have the table2 as a copy of table1 what's the best way to select all 
> rows that have been changed, modified in table1  since the initial laod from 
> table1 into table2?

I think you could get smart having a few rules for insert/update/delete
on 'table' that "keep track" of what happens during your work on table2.

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] SQL Diff ?

2007-08-26 Thread Erik Jones


On Aug 26, 2007, at 9:02 AM, Dawid Kuroczko wrote:


On 8/26/07, Kevin Kempter <[EMAIL PROTECTED]> wrote:

On Saturday 25 August 2007 21:10:19 Ron Johnson wrote:

On 08/25/07 21:51, Kevin Kempter wrote:

Hi List;

I have a very large table (52million rows) - I'm creating a copy  
of it to
rid it of 35G worth of dead space, then I'll do a sync, drop the  
original

table and rename table2.


What is your definition of "dead space"?

Bad rows, duplicate rows, old rows?  Something else?


deleted rows that should have been cleaned up with vacuum, problem  
is the
client let it go so long that now I cant get a vacuum to finish  
cause it
impacts the day2day operations too much.  Long story, see my  
recent questions

on the performance list for more info.


In your place I would do something like Slony-I does, when
it replicates the tables.  Create on insert/update/delete triggers
on table1 which will log operations on table1 to some table1_log
table.  Then copy table1 to table2.  Then replay table1_log on
table2, then BEGIN;LOCK tablel1;finish replaying the lock;DROP table1;
alter table rename...;commit;

Or perhaps actually use Slony-I for the above steps?  Should work
quite nicely... Or perhaps use SkyTools for it (I've never used it)?



Yeah, for trigger based replication it'd be simpler to just use Slony- 
I or Skytools.  However, if you're on 8.2, with row-wise comparisons,  
you could do something like:


begin;
lock table1;

insert into table2
select *
from table1
where id not in (select id from test2);

drop  table1;
alter table2 rename to table1;
commit;

Here id is your primary key.  Note that if your ids are generated by  
a sequence you'll need to use setval on the sequence to get it  
"caught up" before that commit or you'll get duplicate key errors  
immediately.


Erik Jones

Software Developer | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] SQL Diff ?

2007-08-26 Thread Kevin Kempter
On Saturday 25 August 2007 23:49:39 Ron Johnson wrote:
> On 08/25/07 22:21, Kevin Kempter wrote:
> > On Saturday 25 August 2007 21:10:19 Ron Johnson wrote:
> >> On 08/25/07 21:51, Kevin Kempter wrote:
> >>> Hi List;
> >>>
> >>> I have a very large table (52million rows) - I'm creating a copy of it
> >>> to rid it of 35G worth of dead space, then I'll do a sync, drop the
> >>> original table and rename table2.
> >>
> >> What is your definition of "dead space"?
> >>
> >> Bad rows, duplicate rows, old rows?  Something else?
> >
> > deleted rows that should have been cleaned up with vacuum, problem is the
> > client let it go so long that now I cant get a vacuum to finish cause it
> > impacts the day2day operations too much.  Long story, see my recent
> > questions on the performance list for more info.
>
> OK.
>
> >>> Once I have the table2 as a copy of table1 what's the best way to
> >>> select all rows that have been changed, modified in table1  since the
> >>> initial laod from table1 into table2?
>
> Is this a 24x7 database?

Yes. with little room for extra overhead


>
> >>> Also I'll need to delete any rows in table2 that no longer remain in
> >>> table1.
> >>>
> >>> There is no change date column
> >>> I could do something like select * from table1 where col1 || col2 ||
> >>> col3 etc not in (select col1 || col2 || col3 etc from table2)
> >>>
> >>> but this would be ineffecient & slow.
> >>>
> >>> Anyone have a suggestion to do this in an efficient manner?
> >>>
> >>> Thanks in advance



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] SQL Diff ?

2007-08-26 Thread Dawid Kuroczko
On 8/26/07, Kevin Kempter <[EMAIL PROTECTED]> wrote:
> On Saturday 25 August 2007 21:10:19 Ron Johnson wrote:
> > On 08/25/07 21:51, Kevin Kempter wrote:
> > > Hi List;
> > >
> > > I have a very large table (52million rows) - I'm creating a copy of it to
> > > rid it of 35G worth of dead space, then I'll do a sync, drop the original
> > > table and rename table2.
> >
> > What is your definition of "dead space"?
> >
> > Bad rows, duplicate rows, old rows?  Something else?
>
> deleted rows that should have been cleaned up with vacuum, problem is the
> client let it go so long that now I cant get a vacuum to finish cause it
> impacts the day2day operations too much.  Long story, see my recent questions
> on the performance list for more info.

In your place I would do something like Slony-I does, when
it replicates the tables.  Create on insert/update/delete triggers
on table1 which will log operations on table1 to some table1_log
table.  Then copy table1 to table2.  Then replay table1_log on
table2, then BEGIN;LOCK tablel1;finish replaying the lock;DROP table1;
alter table rename...;commit;

Or perhaps actually use Slony-I for the above steps?  Should work
quite nicely... Or perhaps use SkyTools for it (I've never used it)?

   Regarda,
   Dawid

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] SQL Diff ?

2007-08-25 Thread Ron Johnson
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 08/25/07 22:21, Kevin Kempter wrote:
> On Saturday 25 August 2007 21:10:19 Ron Johnson wrote:
>> On 08/25/07 21:51, Kevin Kempter wrote:
>>> Hi List;
>>>
>>> I have a very large table (52million rows) - I'm creating a copy of it to
>>> rid it of 35G worth of dead space, then I'll do a sync, drop the original
>>> table and rename table2.
>> What is your definition of "dead space"?
>>
>> Bad rows, duplicate rows, old rows?  Something else?
> 
> deleted rows that should have been cleaned up with vacuum, problem is the 
> client let it go so long that now I cant get a vacuum to finish cause it 
> impacts the day2day operations too much.  Long story, see my recent questions 
> on the performance list for more info.

OK.

> 
>>> Once I have the table2 as a copy of table1 what's the best way to select
>>> all rows that have been changed, modified in table1  since the initial
>>> laod from table1 into table2?

Is this a 24x7 database?

>>> Also I'll need to delete any rows in table2 that no longer remain in
>>> table1.
>>>
>>> There is no change date column
>>> I could do something like select * from table1 where col1 || col2 || col3
>>> etc not in (select col1 || col2 || col3 etc from table2)
>>>
>>> but this would be ineffecient & slow.
>>>
>>> Anyone have a suggestion to do this in an efficient manner?
>>>
>>> Thanks in advance

- --
Ron Johnson, Jr.
Jefferson LA  USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFG0RRzS9HxQb37XmcRAtuhAJ0TEBIHJwvcR58iU4MvjV2CYMvvfQCdFnkJ
ThDbGY8dXJ2MoqOaNHInqdU=
=GF05
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] SQL Diff ?

2007-08-25 Thread brian

Kevin Kempter wrote:

Hi List;

I have a very large table (52million rows) - I'm creating a copy of it to rid 
it of 35G worth of dead space, then I'll do a sync, drop the original table 
and rename table2.


Once I have the table2 as a copy of table1 what's the best way to select all 
rows that have been changed, modified in table1  since the initial laod from 
table1 into table2?


Also I'll need to delete any rows in table2 that no longer remain in table1.



Set log_statement to 'mod' (or 'ddl' if you expect any changes to the 
schema itself) in postgresql.conf. Then it's just a matter of parsing 
out the mods in the log.


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] SQL Diff ?

2007-08-25 Thread Kevin Kempter
On Saturday 25 August 2007 21:10:19 Ron Johnson wrote:
> On 08/25/07 21:51, Kevin Kempter wrote:
> > Hi List;
> >
> > I have a very large table (52million rows) - I'm creating a copy of it to
> > rid it of 35G worth of dead space, then I'll do a sync, drop the original
> > table and rename table2.
>
> What is your definition of "dead space"?
>
> Bad rows, duplicate rows, old rows?  Something else?

deleted rows that should have been cleaned up with vacuum, problem is the 
client let it go so long that now I cant get a vacuum to finish cause it 
impacts the day2day operations too much.  Long story, see my recent questions 
on the performance list for more info.


>
> > Once I have the table2 as a copy of table1 what's the best way to select
> > all rows that have been changed, modified in table1  since the initial
> > laod from table1 into table2?
> >
> > Also I'll need to delete any rows in table2 that no longer remain in
> > table1.
> >
> > There is no change date column
> > I could do something like select * from table1 where col1 || col2 || col3
> > etc not in (select col1 || col2 || col3 etc from table2)
> >
> > but this would be ineffecient & slow.
> >
> > Anyone have a suggestion to do this in an efficient manner?
> >
> > Thanks in advance



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] SQL Diff ?

2007-08-25 Thread Ron Johnson
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 08/25/07 21:51, Kevin Kempter wrote:
> Hi List;
> 
> I have a very large table (52million rows) - I'm creating a copy of it to rid 
> it of 35G worth of dead space, then I'll do a sync, drop the original table 
> and rename table2.

What is your definition of "dead space"?

Bad rows, duplicate rows, old rows?  Something else?

> Once I have the table2 as a copy of table1 what's the best way to select all 
> rows that have been changed, modified in table1  since the initial laod from 
> table1 into table2?
> 
> Also I'll need to delete any rows in table2 that no longer remain in table1.
> 
> There is no change date column
> I could do something like select * from table1 where col1 || col2 || col3 etc 
> not in (select col1 || col2 || col3 etc from table2)
> 
> but this would be ineffecient & slow.
> 
> Anyone have a suggestion to do this in an efficient manner?
> 
> Thanks in advance

- --
Ron Johnson, Jr.
Jefferson LA  USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFG0O8bS9HxQb37XmcRAsWdAJoDI/WvdaGSO7CUkUa74xHoRgycIwCguLXv
Msw0TdJyYI4keoq+ofu4j3c=
=Gi/f
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[GENERAL] SQL Diff ?

2007-08-25 Thread Kevin Kempter
Hi List;

I have a very large table (52million rows) - I'm creating a copy of it to rid 
it of 35G worth of dead space, then I'll do a sync, drop the original table 
and rename table2.

Once I have the table2 as a copy of table1 what's the best way to select all 
rows that have been changed, modified in table1  since the initial laod from 
table1 into table2?

Also I'll need to delete any rows in table2 that no longer remain in table1.

There is no change date column
I could do something like select * from table1 where col1 || col2 || col3 etc 
not in (select col1 || col2 || col3 etc from table2)

but this would be ineffecient & slow.

Anyone have a suggestion to do this in an efficient manner?

Thanks in advance

/Kevin


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly