I just finished tuning an application that is almost exactly the reverse
of Brandon's: not parsing a text file and then loading it, but extracting
the data, parsing/transforming it, and then generating an XML file.

The original application fetched and parsed records one by one, and the
parsing/transforming code was not implemented efficiently. It took around
40 minutes to pull the data from a highly active transaction database,
and the DBAs complained about it. I split out the extraction so that it
quickly dumps the data to a flat file (~8 minutes), then parses/transforms
the data and generates the XML file (~3 minutes). It could be sped up even
more by pulling the data into memory in parallel (it is not very big,
~2 GB) or by dumping the flat data file on the DB server side.
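
Roughly, the split looks like this (a minimal DBI sketch only; the DSN,
table, column and file names are made up for illustration, and real code
would XML-escape the values):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # Phase 1: dump the result set to a flat file as fast as possible, so
    # the connection to the busy transaction database stays short.
    my $dbh = DBI->connect('dbi:Oracle:prod', 'user', 'pass',
                           { RaiseError => 1 });
    my $sth = $dbh->prepare('SELECT id, name, amount FROM orders');
    $sth->execute;

    open my $flat, '>', 'orders.dat' or die $!;
    while (my $row = $sth->fetchrow_arrayref) {
        print {$flat} join("\t", map { defined $_ ? $_ : '' } @$row), "\n";
    }
    close $flat;
    $dbh->disconnect;

    # Phase 2: parse/transform the flat file and generate the XML, with no
    # database connection held open while the slow work runs.
    open my $in,  '<', 'orders.dat' or die $!;
    open my $xml, '>', 'orders.xml' or die $!;
    print {$xml} qq{<?xml version="1.0"?>\n<orders>\n};
    while (my $line = <$in>) {
        chomp $line;
        my ($id, $name, $amount) = split /\t/, $line;
        printf {$xml} qq{  <order id="%s"><name>%s</name>} .
                      qq{<amount>%s</amount></order>\n},
               $id, $name, $amount;
    }
    print {$xml} "</orders>\n";
    close $xml;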



________________________________
From: tiger peng <tigerpeng2...@yahoo.com>
To: Tim Bunce <tim.bu...@pobox.com>; John Scoles <byter...@hotmail.com>
Cc: "bphe...@gls.com" <bphe...@gls.com>; "dbi-users@perl.org" 
<dbi-users@perl.org>
Sent: Friday, September 16, 2011 10:13 AM
Subject: Re: Tail Module + DBI Module, can't keep up!


Tim,

I bet that at the time your MySQL did not support partitioning.


Now, with partition management available, the job should be easy.

Use a small partition, called a hot partition in the DW/ETL field, to
receive the new data; since it holds only a small chunk of data, the
insert/update DML should execute fast (keep the indexes local). When it
reaches a threshold, split the hot partition, use the fresh hot partition
to receive newer data, and merge the older data into the main partition.
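
On MySQL 5.1+ the rotation can be done with REORGANIZE PARTITION; here is
a sketch only (the table, partition names and threshold are made up for
illustration):

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:mysql:logdb', 'user', 'pass',
                           { RaiseError => 1 });

    # Assumed layout: old rows live in p_main, new rows land in p_hot, e.g.
    #   PARTITION BY RANGE (TO_DAYS(logged)) (
    #     PARTITION p_main VALUES LESS THAN (TO_DAYS('2011-09-01')),
    #     PARTITION p_hot  VALUES LESS THAN MAXVALUE )

    # Approximate row count of the hot partition.
    my ($rows) = $dbh->selectrow_array(q{
        SELECT TABLE_ROWS FROM information_schema.PARTITIONS
        WHERE TABLE_SCHEMA = 'logdb' AND TABLE_NAME = 'log'
          AND PARTITION_NAME = 'p_hot'
    });

    if ($rows > 1_000_000) {
        # Split: roll the filled rows into a dated partition, leaving a
        # fresh, empty p_hot to receive newer data.
        $dbh->do(q{
            ALTER TABLE log REORGANIZE PARTITION p_hot INTO (
                PARTITION p_201109 VALUES LESS THAN (TO_DAYS('2011-10-01')),
                PARTITION p_hot    VALUES LESS THAN MAXVALUE
            )
        });

        # Merge: later, fold the dated partition into the main partition.
        $dbh->do(q{
            ALTER TABLE log REORGANIZE PARTITION p_main, p_201109 INTO (
                PARTITION p_main VALUES LESS THAN (TO_DAYS('2011-10-01'))
            )
        });
    }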

Using a text file has its advantages: whenever the DB is down, the parsing
of the log files can still run by itself; and when the DB is back, a
bulk-load tool can be used to catch up on the load quickly.
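
The catch-up itself can be as simple as looping over the backlog of flat
files and bulk-loading each one, e.g. with MySQL's LOAD DATA (the paths
and table name below are invented; any vendor's bulk loader would do the
same job):

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:mysql:logdb;mysql_local_infile=1',
                           'user', 'pass', { RaiseError => 1 });

    for my $file (sort glob 'spool/*.txt') {
        my $quoted = $dbh->quote($file);
        $dbh->do("LOAD DATA LOCAL INFILE $quoted INTO TABLE log "
               . "FIELDS TERMINATED BY '\\t'");
        rename $file, "$file.loaded" or warn "rename $file: $!";
    }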


Tiger



________________________________
From: Tim Bunce <tim.bu...@pobox.com>
To: John Scoles <byter...@hotmail.com>
Cc: tigerpeng2...@yahoo.com; bphe...@gls.com; dbi-users@perl.org
Sent: Friday, September 16, 2011 4:20 AM
Subject: Re: Tail Module + DBI Module, can't keep up!

This is all fine advice for performance tuning a DBI app and worth doing.
But there's always a limit to what can be achieved on the client.
So it's worth exploring what can be done on the server side, beyond
standard tuning practices.

I wrote a high volume log insertion app using mysql many years ago.
I can't remember the performance figures, but it was fast.

A major cost of insertion is maintaining indices. So for maximum
insertion speed you want to have no indices. But to query the log
records you'll probably want indices.

The way I approached this was to have per-hour tables: log_YYYYMMDDHH
The loader streamed the records into the table as fast as it could.
(From memory I used a multi-row INSERT statement, no placeholders.)
Every time it switched to a new hour, it triggered the building of
indices on the previous hour's table. It also triggered the recreation
of the per-day views (log_YYYYMMDD) that abstracted the hourly tables.
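
In outline, such a loader might look like this (a sketch only; the table
and column names are invented for illustration):

    use strict;
    use warnings;
    use DBI;
    use POSIX qw(strftime);

    my $dbh = DBI->connect('dbi:mysql:logdb', 'user', 'pass',
                           { RaiseError => 1 });
    my $current_hour = '';

    # Return the current hourly table, creating it (and finishing off the
    # previous hour) when the hour rolls over.
    sub hour_table {
        my $hour = strftime('%Y%m%d%H', localtime);
        if ($hour ne $current_hour) {
            my $previous = $current_hour;
            $dbh->do("CREATE TABLE IF NOT EXISTS log_$hour
                      (logged DATETIME, host VARCHAR(64), msg TEXT)");
            $current_hour = $hour;
            finish_hour($previous) if $previous;
        }
        return "log_$current_hour";
    }

    # One multi-row INSERT per batch; values are quoted, not bound.
    sub flush_rows {
        return unless @_;
        my $table  = hour_table();
        my $values = join ',',
            map { '(' . join(',', map { $dbh->quote($_) } @$_) . ')' } @_;
        $dbh->do("INSERT INTO $table (logged, host, msg) VALUES $values");
    }

    # The previous hour's table is complete: index it, then recreate the
    # per-day view over all of that day's hourly tables.
    sub finish_hour {
        my ($hour) = @_;
        $dbh->do("CREATE INDEX logged_idx ON log_$hour (logged)");
        my $day    = substr $hour, 0, 8;
        my $tables = $dbh->selectcol_arrayref("SHOW TABLES LIKE 'log_${day}__'");
        my $union  = join ' UNION ALL ', map { "SELECT * FROM $_" } @$tables;
        $dbh->do("CREATE OR REPLACE VIEW log_$day AS $union");
    }

The tail loop would then call flush_rows() with small batches of parsed
[logged, host, msg] rows.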

The same technique could be applied at whatever time granularity meets
your needs, from table-per-day to table-per-minute.

If you can delay the loading of the log data by whatever period you're
using (e.g. hourly), then you have the option of preparing the log data
as hourly text files and then, when each is complete, using a proprietary
bulk-loading tool to load it.
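
For example (a sketch with made-up paths and table name): the tail loop
only ever appends to the current hour's spool file, and the completed
file is bulk-loaded when the hour rolls over.

    use strict;
    use warnings;
    use DBI;
    use POSIX qw(strftime);

    my $open_hour = '';
    my $spool_fh;

    # Called for every parsed log line.
    sub spool_line {
        my ($line) = @_;
        my $hour = strftime('%Y%m%d%H', localtime);
        if ($hour ne $open_hour) {
            close $spool_fh if $spool_fh;
            bulk_load("spool/log_$open_hour.txt") if $open_hour;
            open $spool_fh, '>>', "spool/log_$hour.txt" or die $!;
            $open_hour = $hour;
        }
        print {$spool_fh} $line, "\n";
    }

    # The completed hour's file gets one bulk load (LOAD DATA here; any
    # vendor-specific bulk loader would do the same job).
    sub bulk_load {
        my ($file) = @_;
        my $dbh = DBI->connect('dbi:mysql:logdb;mysql_local_infile=1',
                               'user', 'pass', { RaiseError => 1 });
        my $quoted = $dbh->quote($file);
        $dbh->do("LOAD DATA LOCAL INFILE $quoted INTO TABLE log "
               . "FIELDS TERMINATED BY '\\t'");
        $dbh->disconnect;
    }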

Tim.
