We have a long ETL process that takes about 72 hours to move 871 million records out of an Oracle server. Several tables are involved; we currently use the single-threaded, non-cached pipeline but run all the processes simultaneously. This is also for a data warehouse, and we haven't had any threading issues or bad data come out of it.
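A minimal sketch of that arrangement, written in Python rather than Rhino ETL's own C# API. The job function `run_etl`, the runner `run_all`, and the toy transform are hypothetical and only illustrate the pattern. Each job runs single-threaded and owns its own input and output file, so the only concurrency is whole jobs running side by side. The assumption that makes this safe is that the jobs share no mutable state (connections, caches, output files).

```python
import csv
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def run_etl(name, rows, output_path):
    """Hypothetical ETL job: single-threaded extract/transform/load.

    The job owns its input rows and its output file, so nothing is
    shared with other jobs running at the same time.
    """
    count = 0
    with open(output_path, "w", newline="") as f:
        writer = csv.writer(f)
        for row in rows:  # extract
            transformed = [str(v).strip().upper() for v in row]  # transform
            writer.writerow(transformed)  # load into this job's own file
            count += 1
    return name, count


def run_all(jobs, max_workers=4):
    """Run independent ETL jobs concurrently, one worker thread per job.

    f.result() re-raises any exception from a job, so a failed job
    surfaces here instead of being silently lost.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_etl, n, rows, path) for n, rows, path in jobs]
        return dict(f.result() for f in futures)


if __name__ == "__main__":
    # In practice each output path would sit on a different drive.
    out_dir = tempfile.mkdtemp()
    jobs = [
        ("customers", [[" alice ", 1], ["bob", 2]], os.path.join(out_dir, "customers.csv")),
        ("orders", [["a1"], ["a2"], ["a3"]], os.path.join(out_dir, "orders.csv")),
    ]
    print(run_all(jobs))  # {'customers': 2, 'orders': 3}
```

Running each job in its own thread (or its own process) keeps the per-job pipeline simple and single-threaded. The speedup comes from overlapping I/O across drives, not from parallelizing work inside a job.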
Nathan

On Tue, Jan 18, 2011 at 4:34 AM, Miles Waller wrote:
> Hi,
>
> I have a long-running process (about an hour per night) to load and transform data for a data warehouse. However, this is mostly because I run it completely single-threaded (to avoid memory issues), plus the entire run consists of several ETLs which I run one at a time rather than several at once. In this arrangement, the entire process appears to be IO-bound due to the drive speed when writing the final output files (text format), which are needed for audit purposes.
>
> The machine has several separate drives, so I thought I could get a significant boost by running several ETLs at once, on separate threads, and arranging for all the input/output files to be on different drives. So far it looks quite promising. However, am I likely to run into any threading issues with this configuration that might cause bad data to come out? I don't think so, but given how hard it might be to spot until it's too late, I wondered if anyone else had an opinion.
>
> Cheers,
>
> Miles
