Re: [PERFORM] Best practice to load a huge table from ORACLE to PG

2008-04-26 Thread Jonah H. Harris
On Sat, Apr 26, 2008 at 9:25 AM, Adonias Malosso <[EMAIL PROTECTED]> wrote:
> I´d like to know what´s the best practice to LOAD a 70 milion rows, 101
> columns table
> from ORACLE to PGSQL.

The fastest and easiest method would be to dump the data from Oracle
into CSV/delimited format using something like ociuldr
(http://www.anysql.net/en/ociuldr.html) and load it back into PG using
pg_bulkload (which is a helluva lot faster than COPY).  Of course, you
could try other things as well... such as setting up generic
connectivity to PG and inserting the data to a PG table over the
database link.

Similarly, while I hate to see shameless self-plugs in the community,
the *fastest* method you could use is dblink_ora_copy, contained in
EnterpriseDB's PG+ Advanced Server; it uses an optimized OCI
connection to COPY the data directly from Oracle into Postgres, which
also saves you the intermediate step of dumping the data.

-- 
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
499 Thornall Street, 2nd Floor | [EMAIL PROTECTED]
Edison, NJ 08837 | http://www.enterprisedb.com/

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Re: [HACKERS] [COMMITTERS] pgsql: Fix TransactionIdIsCurrentTransactionId() to use binary search

2008-04-26 Thread Tom Lane
Robert Treat <[EMAIL PROTECTED]> writes:
> On Saturday 26 April 2008 13:26, Tom Lane wrote:
>> Oh, you failed to state that the dtrace output was post-patch.  You need
>> to show *pre* patch dtrace output if you want us to think it relevant.

> Please read up-thread. 

Sorry, I'd forgotten your previous post.

I poked around for calls to TransactionIdIsCurrentTransactionId that
are in current code and weren't in 8.1.  I found these:

src/backend/commands/analyze.c: 965:if 
(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(targtuple.t_data)))
src/backend/commands/analyze.c: 984:if 
(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(targtuple.t_data)))
src/backend/commands/cluster.c: 803:if 
(!TransactionIdIsCurrentTransactionId(
src/backend/commands/cluster.c: 816:if 
(!TransactionIdIsCurrentTransactionId(
src/backend/storage/ipc/procarray.c: 374:   if 
(TransactionIdIsCurrentTransactionId(xid))
src/backend/utils/time/combocid.c: 108: 
Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tup)));
src/backend/utils/time/combocid.c: 123: 
Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tup)));
src/backend/utils/time/combocid.c: 156: 
TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tup)))

The ANALYZE and CLUSTER calls are not likely to be your issue, but the
one in HeapTupleHeaderAdjustCmax could get called a lot, and the one
in TransactionIdIsInProgress definitely will get called a lot.
Neither of those calls existed in 8.2.

So I think that explains why TransactionIdIsCurrentTransactionId has
become more performance-critical in 8.3 than it was before.  Will
apply the back-patch.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Best practice to load a huge table from ORACLE to PG

2008-04-26 Thread Craig Ringer

Joshua D. Drake wrote:

Potluri Srikanth wrote:

But do we link oracle trigger to postgres trigger ?

i mean :

oracle trigger will take a note of what has been changed .
but then how do we pass those changes to postgres trigger ?


I am assuming you can use the java trigger from oracle to load the 
postgresql jdbc driver, make a connection to postgresql and perform 
whatever statement needed to be done.


Note that this will be rather inefficient if you're obtaining a new 
connection every time. It looks like Oracle's Java stored procedures and 
triggers run in an appserver-like environment, though, so you should be 
able to use a connection pool, JNDI, or similar.


Some Java stored procedure examples:

http://www.oracle.com/technology/sample_code/tech/java/jsp/oracle9ijsp.html

You could also use a Java trigger to send simpler change message, with a 
serialized row if required, to an external app that's responsible for 
updating the PostgreSQL database. That might cause less load on the DB 
server.


The trouble with this approach, though, is that it might be hard to get 
right when transactions roll back. An alternative is to use an Oracle 
trigger that inserts records in a change tracking / audit table. You can 
then periodically read and clear the audit table, using that change 
history data to update the PostgreSQL database. This method has the 
advantage of being transaction safe, as data will never become visible 
in the audit table until the transaction making the changes has committed.


--
Craig Ringer

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Re: [HACKERS] [COMMITTERS] pgsql: Fix TransactionIdIsCurrentTransactionId() to use binary search

2008-04-26 Thread Robert Treat
On Saturday 26 April 2008 13:26, Tom Lane wrote:
> Robert Treat <[EMAIL PROTECTED]> writes:
> > On Friday 25 April 2008 17:32, Tom Lane wrote:
> >> Robert Treat <[EMAIL PROTECTED]> writes:
> >>> Oddly some dtrace profiling gave me this, which is pretty different,
> >>> but certainly doesn't have concerns about
> >>> TransactionIdIsCurrentTransactionId
> >>
> >>  which seems to pretty much destroy your thesis, no?
> >
> > How so? Before the patch we bog down for hours, spending 99% of our time
> > in TransactionIdIsCurrentTransactionId, after the patch everything
> > performs well (really better than before) and we spend so little time in
> > TransactionIdIsCurrentTransactionId it barely shows up on the radar.
>
> Oh, you failed to state that the dtrace output was post-patch.  You need
> to show *pre* patch dtrace output if you want us to think it relevant.
>

Please read up-thread. 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Re: [HACKERS] [COMMITTERS] pgsql: Fix TransactionIdIsCurrentTransactionId() to use binary search

2008-04-26 Thread Tom Lane
Robert Treat <[EMAIL PROTECTED]> writes:
> On Friday 25 April 2008 17:32, Tom Lane wrote:
>> Robert Treat <[EMAIL PROTECTED]> writes:
>>> Oddly some dtrace profiling gave me this, which is pretty different, but
>>> certainly doesn't have concerns about TransactionIdIsCurrentTransactionId
>> 
>>  which seems to pretty much destroy your thesis, no?

> How so? Before the patch we bog down for hours, spending 99% of our time in  
> TransactionIdIsCurrentTransactionId, after the patch everything performs well
> (really better than before) and we spend so little time in 
> TransactionIdIsCurrentTransactionId it barely shows up on the radar. 

Oh, you failed to state that the dtrace output was post-patch.  You need
to show *pre* patch dtrace output if you want us to think it relevant.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Best practice to load a huge table from ORACLE to PG

2008-04-26 Thread Luke Lonergan
Yep ­ just do something like this within sqlplus (from
http://www.dbforums.com/showthread.php?t=350614):

set termout off
set hea off
set pagesize 0

spool c:\whatever.csv

select a.a||','||a.b||','||a.c
from a
where a.a="whatever";

spool off

COPY is the fastest approach to get it into PG.

- Luke

On 4/26/08 6:25 AM, "Adonias Malosso" <[EMAIL PROTECTED]> wrote:

> Hi All,
> 
> I´d like to know what´s the best practice to LOAD a 70 milion rows, 101
> columns table 
> from ORACLE to PGSQL.
> 
> The current approach is to dump the data in CSV and than COPY it to
> Postgresql.
> 
> Anyone has a better idea.
> 
> 
> Regards
> Adonias Malosso
> 



Re: [PERFORM] Best practice to load a huge table from ORACLE to PG

2008-04-26 Thread Joshua D. Drake

Potluri Srikanth wrote:

But do we link oracle trigger to postgres trigger ?

i mean :

oracle trigger will take a note of what has been changed .
but then how do we pass those changes to postgres trigger ?


I am assuming you can use the java trigger from oracle to load the 
postgresql jdbc driver, make a connection to postgresql and perform 
whatever statement needed to be done.


Sincerely,

Joshua D. Drake

P.S. It is possible that Oracle can't do this (I don't know)

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Best practice to load a huge table from ORACLE to PG

2008-04-26 Thread Potluri Srikanth
 But do we link oracle trigger to postgres trigger ?
 i mean :
 oracle trigger will take a note of what has been changed .
 but then how do we pass those changes to postgres trigger ?
 can u suggest any logic or algorithm ?
 Regards, 
 Srikanth k Potluri 
 +63 9177444783(philippines) 
 On Sat 26/04/08  8:40 PM , "Joshua D. Drake" [EMAIL PROTECTED]
sent:
 Adonias Malosso wrote: 
 > Hi All, 
 >  
 > I�d like to know what�s the best practice to LOAD a 70 milion
rows, 101  
 > columns table 
 > from ORACLE to PGSQL. 
 >  
 > The current approach is to dump the data in CSV and than COPY it
to  
 > Postgresql. 
 >  
 > Anyone has a better idea. 
 Write a java trigger in Oracle that notes when a row has been  
 added/delete/updated and does the exact same thing in postgresql. 
 Joshua D. Drake 
 >  
 >  
 > Regards 
 > Adonias Malosso 
 --  
 Sent via pgsql-performance mailing list
(pgsql-performance@postgresql.org [1]) 
 To make changes to your subscription: 
 http://www.postgresql.org/mailpref/pgsql-performance 


Links:
--
[1]
http://sitemail7.hostway.com/javascript:top.opencompose(\'[EMAIL 
PROTECTED]',\'\',\'\',\'\')


Re: [PERFORM] Best practice to load a huge table from ORACLE to PG

2008-04-26 Thread Joshua D. Drake

Adonias Malosso wrote:

Hi All,

I´d like to know what´s the best practice to LOAD a 70 milion rows, 101 
columns table

from ORACLE to PGSQL.

The current approach is to dump the data in CSV and than COPY it to 
Postgresql.


Anyone has a better idea.


Write a java trigger in Oracle that notes when a row has been 
added/delete/updated and does the exact same thing in postgresql.


Joshua D. Drake





Regards
Adonias Malosso



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Oddly slow queries

2008-04-26 Thread Thomas Spreng

On 22.04.2008, at 17:25, Scott Marlowe wrote:
On Tue, Apr 22, 2008 at 7:42 AM, Thomas Spreng <[EMAIL PROTECTED]>  
wrote:


I think I'll upgrade PostgreSQL to the latest 8.3 version in the next
few days anyway, along with a memory upgrade (from 1.5GB to 4GB)  
and a
new 2x RAID-1 (instead of RAID-5) disk configuration. I hope that  
this

has already a noticeable impact on the performance.


Note that if you have a good RAID controller with battery backed cache
and write back enabled, then you're probably better or / at least as
well off using four disks in a RAID-10 than two separate RAID-1 sets
(one for xlog and one for data).


I just wanted to let you know that upgrading Postgres from 8.1 to 8.3,
RAM from 1.5GB to 4GB and changing from a 3 disk RAID5 to 2x RAID1 (OS &
WAL, Tablespace) led to a significant speed increase. What's especially
important is that those randomly slow queries seem to be gone (for now).

Thanks for all the help.

Cheers,

Tom

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Best practice to load a huge table from ORACLE to PG

2008-04-26 Thread Adonias Malosso
Hi All,

I´d like to know what´s the best practice to LOAD a 70 milion rows, 101
columns table
from ORACLE to PGSQL.

The current approach is to dump the data in CSV and than COPY it to
Postgresql.

Anyone has a better idea.


Regards
Adonias Malosso


Re: [PERFORM] Re: [HACKERS] [COMMITTERS] pgsql: Fix TransactionIdIsCurrentTransactionId() to use binary search

2008-04-26 Thread Robert Treat
On Friday 25 April 2008 17:32, Tom Lane wrote:
> Robert Treat <[EMAIL PROTECTED]> writes:
> > Oddly some dtrace profiling gave me this, which is pretty different, but
> > certainly doesn't have concerns about TransactionIdIsCurrentTransactionId
>
>  which seems to pretty much destroy your thesis, no?
>

How so? Before the patch we bog down for hours, spending 99% of our time in  
TransactionIdIsCurrentTransactionId, after the patch everything performs well 
(really better than before) and we spend so little time in 
TransactionIdIsCurrentTransactionId it barely shows up on the radar. 

Note I'm open to the idea that TransactionIdIsCurrentTransactionId itself is 
not the problem, but that something else changed between 8.1 and 8.3 that 
exposes TransactionIdIsCurrentTransactionId as a problem.  Changing to a 
binary search for TransactionIdIsCurrentTransactionId  makes that a non-issue 
though. 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance