Re: [PERFORM] Best practice to load a huge table from ORACLE to PG

2008-04-28 Thread Adonias Malosso
Jonah,

Thank you for the answer. Good to know about this EnterpriseDB feature.

I'll go ahead with pgloader.

Regards.

Adonias Malosso

On Sat, Apr 26, 2008 at 10:14 PM, Jonah H. Harris [EMAIL PROTECTED]
wrote:

 On Sat, Apr 26, 2008 at 9:25 AM, Adonias Malosso [EMAIL PROTECTED]
 wrote:
  I'd like to know what's the best practice to LOAD a 70 million row, 101
  column table from ORACLE to PGSQL.

 The fastest and easiest method would be to dump the data from Oracle
 into CSV/delimited format using something like ociuldr
 (http://www.anysql.net/en/ociuldr.html) and load it back into PG using
 pg_bulkload (which is a helluva lot faster than COPY).  Of course, you
 could try other things as well... such as setting up generic
 connectivity to PG and inserting the data into a PG table over the
 database link.
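
 For illustration only, a pg_bulkload run of this kind is driven by a small
 control file; the sketch below is indicative rather than exact (key names
 and the command line vary between pg_bulkload releases, and the table name,
 file path and delimiter are placeholders):

 # bigtable.ctl -- hypothetical control file, adjust to your pg_bulkload version
 OUTPUT = public.bigtable         # target table (placeholder)
 INPUT = /data/bigtable.csv       # CSV produced by ociuldr (placeholder path)
 TYPE = CSV                       # delimited text input
 DELIMITER = ";"                  # field separator used in the dump

 pg_bulkload -d mydb bigtable.ctl    # load the file described by the control file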

 Similarly, while I hate to see shameless self-plugs in the community,
 the *fastest* method you could use is dblink_ora_copy, contained in
 EnterpriseDB's PG+ Advanced Server; it uses an optimized OCI
 connection to COPY the data directly from Oracle into Postgres, which
 also saves you the intermediate step of dumping the data.

 --
 Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
 EnterpriseDB Corporation | fax: 732.331.1301
 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED]
 Edison, NJ 08837 | http://www.enterprisedb.com/



[PERFORM] Best practice to load a huge table from ORACLE to PG

2008-04-26 Thread Adonias Malosso
Hi All,

I'd like to know what's the best practice to LOAD a 70 million row, 101
column table from ORACLE to PGSQL.

The current approach is to dump the data in CSV and then COPY it into
PostgreSQL.
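
For illustration, the load step of that approach is a single COPY along these
lines (the table name, file path and delimiter are only placeholders):

-- placeholders: adjust the table, column list, path and delimiter to the dump
COPY mybigtable FROM '/data/oracle_dump.csv' WITH DELIMITER ';' CSV;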

Does anyone have a better idea?


Regards
Adonias Malosso


[PERFORM] [PERFORMANCE] Error loading 37G CSV file: invalid string enlargement request size 65536

2008-04-22 Thread Adonias Malosso
Hi all,

I'm running a COPY of a 37G CSV file and receiving the following error:

invalid string enlargement request size 65536

The file has about 70 million lines with 101 columns, all of them varchar.

When I run the command on the whole file I receive the error after loading
about 29 million lines, so I split the file into 10-million-line chunks with
split:

split --lines=10000000

And when running the COPY I receive the error on the 5th file:

psql:/srv/www/htdocs/import/script_q2.sql:122: ERROR:  invalid string enlargement request size 65536
CONTEXT:  COPY temp_q2, line 3509639: 9367276;4;DANIEL DO CARMO BARROS;31-Jan-1986;M;1;10;3162906;GILSON TEIXEIRA...

Any clues?
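
For reference, the failing record can be inspected using the line number from
the CONTEXT message; with split's default naming the 5th chunk would be xae
(the file name here is an assumption):

sed -n '3509639p' xae    # print only the line COPY complained about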

My PostgreSQL version is 8.2.4 and the server is running SUSE Linux with 1.5GB
of RAM. The relevant settings in postgresql.conf are:

shared_buffers = 512MB
temp_buffers = 256MB
checkpoint_segments = 60

I'd also like to know if there's any way to optimize huge data loads in
operations like these.
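
For what it's worth, the usual bulk-load tuning for this era of PostgreSQL is
sketched below; the values and the chunk file name are illustrative
placeholders, not recommendations for this specific server:

-- postgresql.conf (or per-session where allowed), illustrative values only:
--   maintenance_work_mem = 256MB   -- speeds up index (re)builds after the load
--   checkpoint_segments  = 64      -- fewer checkpoints while loading
--   autovacuum = off               -- re-enable once the load is done

-- drop or disable indexes, triggers and foreign keys on the target, then:
COPY temp_q2 FROM '/srv/www/htdocs/import/xaa' WITH DELIMITER ';' CSV;
-- recreate indexes and constraints, then update planner statistics:
ANALYZE temp_q2;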

Regards

Adonias Malosso


[PERFORM] 4s query want to run faster

2008-02-21 Thread Adonias Malosso
Hi all,

The following query takes about 4s to run on a server with 16GB of RAM. Any
ideas why it doesn't use the indexes on the primary keys in the join
conditions?

select i.inuid, count(*) as total
from cte.instrumentounidade i
inner join cte.pontuacao p on p.inuid = i.inuid
inner join cte.acaoindicador ai on ai.ptoid = p.ptoid
inner join cte.subacaoindicador si on si.aciid = ai.aciid
where i.itrid = 2 and p.ptostatus = 'A'
group by i.inuid
having count(*) > 0

HashAggregate  (cost=47905.87..47941.01 rows=2008 width=4)
  Filter: (count(*) > 0)
  ->  Hash Join  (cost=16307.79..46511.45 rows=185923 width=4)
        Hash Cond: (si.aciid = ai.aciid)
        ->  Seq Scan on subacaoindicador si  (cost=0.00..22812.17 rows=368817 width=4)
        ->  Hash  (cost=16211.40..16211.40 rows=38556 width=8)
              ->  Hash Join  (cost=9018.20..16211.40 rows=38556 width=8)
                    Hash Cond: (p.inuid = i.inuid)
                    ->  Hash Join  (cost=8908.41..15419.10 rows=39593 width=8)
                          Hash Cond: (ai.ptoid = p.ptoid)
                          ->  Seq Scan on acaoindicador ai  (cost=0.00..4200.84 rows=76484 width=8)
                          ->  Hash  (cost=8678.33..8678.33 rows=92034 width=8)
                                ->  Seq Scan on pontuacao p  (cost=0.00..8678.33 rows=92034 width=8)
                                      Filter: (ptostatus = 'A'::bpchar)
                    ->  Hash  (cost=104.78..104.78 rows=2008 width=4)
                          ->  Seq Scan on instrumentounidade i  (cost=0.00..104.78 rows=2008 width=4)
                                Filter: (itrid = 2)
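
One quick check is whether the planner's cost settings, rather than missing
indexes, are pushing it toward the sequential scans; for example (diagnostic
only, not something to leave enabled):

SET enable_seqscan = off;
-- re-run EXPLAIN ANALYZE on the query above and compare cost and runtime
SET enable_seqscan = on;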


Re: [PERFORM] 4s query want to run faster

2008-02-21 Thread Adonias Malosso
Setting random_page_cost = 2 solved the problem. Thanks.
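
For anyone hitting the same plan later, the setting can be tried per session
before touching postgresql.conf:

SET random_page_cost = 2;  -- session-level test; persist in postgresql.conf once confirmed
-- then re-run EXPLAIN ANALYZE and compare the plan and timing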

On Thu, Feb 21, 2008 at 6:16 PM, Claus Guttesen [EMAIL PROTECTED] wrote:

    why it doesn't use the indexes on the primary keys in the join
    conditions?

   Maybe random_page_cost is set too high? What version are you using?

  Postgresql v. 8.2.1

 You can try to lower this value. The default (in 8.3) is 4.

 --
 regards
 Claus

 When lenity and cruelty play for a kingdom,
 the gentlest gamester is the soonest winner.

 Shakespeare