Hi,
I am trying to import the TPC-H dataset (SF100) into the database, without
success. The import method is the same as in benchmark/tpch, except for
the number of expected records in the load script and the way the script
is executed (line by line in the console). The machine is a dual-core
AMD64 (64-bit OS) with 4 GB of RAM and an 8-disk RAID0 for storage.
The import process consumes nearly all memory and nearly all swap
(note that the number of expected records is specified after the COPY
command). In fact, I have to restart the server process after each table
import to avoid the "swap to death" state.
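For reference, the COPY statements in my load script look roughly like
this (the row count and path are illustrative, not the exact SF100
values, and the record terminator may need to be '|\n' depending on how
dbgen was run):

COPY 600000000 RECORDS INTO lineitem
  FROM '/data/tpch/lineitem.tbl'
  USING DELIMITERS '|', '\n';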
However, the lineitem table import seems to fail no matter what I do. I
have tried different clients (e.g. mjclient with -Xbatching mode) and a
sliced lineitem.tbl file, with the same result.
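For the sliced variant, I split lineitem.tbl with the standard split(1)
tool into pieces (lineitem.tbl.aa, lineitem.tbl.ab, ...) and issue one
COPY per piece, appending to the same table; sizes and paths here are
illustrative:

COPY 10000000 RECORDS INTO lineitem
  FROM '/data/tpch/lineitem.tbl.aa'
  USING DELIMITERS '|', '\n';
COPY 10000000 RECORDS INTO lineitem
  FROM '/data/tpch/lineitem.tbl.ab'
  USING DELIMITERS '|', '\n';
-- and so on for the remaining slices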
I have noticed that two or three hours after issuing the lineitem COPY
command, the mserver5 process no longer consumes any CPU. The attached
strace shows the following:
[pid 4843] select(6, [5], NULL, NULL, {0, 500}) = 0 (Timeout)
[pid 4843] select(6, [5], NULL, NULL, {0, 500}) = 0 (Timeout)
[pid 4843] select(6, [5], NULL, NULL, {0, 500}) = 0 (Timeout)
Any ideas? Is there a way to bulk import huge (TB-scale) datasets into
the database? PostgreSQL, for example, can import data without
write-ahead logging, at nearly disk speed. The dataset can be sliced up
per column, so a direct column copy would be possible.
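For comparison, this is a minimal sketch of the PostgreSQL trick I mean
(table name illustrative; it relies on the table being created or
truncated in the same transaction as the COPY, with WAL archiving off,
so that the COPY can skip most WAL writes):

BEGIN;
TRUNCATE lineitem;
COPY lineitem FROM '/data/tpch/lineitem.tbl' WITH DELIMITER '|';
COMMIT;

Is something comparable possible here?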
Regards,
J.