Re: Tail Module + DBI Module, can't keep up!
On Sep 21, 2011, at 8:55 AM, Curtis Leach wrote:

> Here's a way that might speed things up for you considerably by
> eliminating a DB hit.
>
>> 4. Perl script from (3) constantly reads in the /var/log files using
>> the Tail module. It does the following:
>>
>> a. When a connection is opened, it INSERTs into the
>> sonicwall_connections table
>>
>> b. When a connection is closed, it SELECTs from the
>> sonicwall_connection table the last record id that matches the
>> protocol, source_ip, source_port, and destination_port
>>
>> c. If a record exists matching this criteria, it UPDATEs that
>> record's close_dt column with the time the connection was
>> closed
>>
>> d. If a record does not exist, then in our case this means that
>> the connection was denied due to firewall rules, and we
>> instead INSERT into a different table, sonicwall_denied
[...]

If the slowdown is truly this step of inserting data into the DB, how about just inserting the unchecked log entry into the database and doing all the rule matching post-insert, or doing the matching as views into the raw table; manage all the above business rules in the database.

If you're just looking for open and close records, for example: the Perl script watches the log, and if a log entry matches either OPEN or CLOSE it dumps it to the raw log table, which is indexed on protocol, source_ip, source_port, and destination_port. Then, with the data in the database, you can do the queries needed to find the open and close of a connection, and easily find rows with a close but no open to identify the sonicwall_denied entries.

(This, in fact, was exactly how we used to manage the same type of information about a Cisco terminal server, looking for abnormally dropped connections and identifying who was connected when to what IP address.)

Basically you just need to process the log entries into row inserts without doing any other queries to the database, which will be about as fast as you can manage, especially if you do block commits.
Then, if it's still not fast enough to keep up, you need to look elsewhere for speed improvements. Quit making Perl do the work of MySQL, in other words.

--
Bruce Johnson
University of Arizona
College of Pharmacy
Information Technology Group

Institutions do not have opinions, merely customs
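Bruce's raw-table scheme could look something like this in MySQL. This is only a sketch: the raw_log table, its columns, and the event flag are illustrative names, not anything posted in the thread.

```sql
-- Hypothetical raw_log table: one row per OPEN/CLOSE log entry,
-- indexed on the connection 4-tuple for fast matching.
CREATE TABLE raw_log (
  id               BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  event            ENUM('OPEN','CLOSE') NOT NULL,
  logged_at        DATETIME NOT NULL,
  protocol         VARCHAR(8) NOT NULL,
  source_ip        INT UNSIGNED NOT NULL,
  source_port      SMALLINT UNSIGNED NOT NULL,
  destination_ip   INT UNSIGNED NOT NULL,
  destination_port SMALLINT UNSIGNED NOT NULL,
  KEY conn_tuple (protocol, source_ip, source_port, destination_port)
);

-- Post-insert matching: CLOSE entries with no matching OPEN entry
-- are the firewall-denied connections.
SELECT c.*
FROM raw_log c
LEFT JOIN raw_log o
  ON  o.event = 'OPEN'
  AND o.protocol = c.protocol
  AND o.source_ip = c.source_ip
  AND o.source_port = c.source_port
  AND o.destination_port = c.destination_port
WHERE c.event = 'CLOSE'
  AND o.id IS NULL;
```

The point of the design is that the tail loop only ever runs the INSERT; the join above (or a view wrapping it) runs later, inside the database, where the index on the 4-tuple does the matching work.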
RE: Tail Module + DBI Module, can't keep up!
Here's a way that might speed things up for you considerably by eliminating a DB hit.

I'm assuming that your query only returns 1 row and that you are not stepping through them all & just selecting the last one returned. If you are stepping through them, modify your select to only return 1 row. I'd assume you'd only have 1 record at a time with close_dt NULL.

Then combine steps (b) & (c) and just do the update. If it updates 0 records, do (d). This will eliminate a trip to the database. I believe the DBI can tell you how many records it updated, but I don't remember how to do this off hand.

Curtis

-----Original Message-----
From: Roode, Eric [mailto:ero...@barrack.com]
Sent: Wednesday, September 21, 2011 8:38 AM
To: dbi-users@perl.org
Cc: Brandon Phelps
Subject: RE: Tail Module + DBI Module, can't keep up!

On Wednesday, September 21, 2011 10:04 AM, Brandon Phelps [mailto:bphe...@gls.com] wrote:
> Subject: Re: Tail Module + DBI Module, can't keep up!
[...]
> 4. Perl script from (3) constantly reads in the /var/log files using
> the Tail module. It does the following:
>
>    a. When a connection is opened, it INSERTs into the
> sonicwall_connections table
>
>    b. When a connection is closed, it SELECTs from the
> sonicwall_connection table the last record id that matches the
> protocol, source_ip, source_port, and destination_port
>
>    c. If a record exists matching this criteria, it UPDATEs that
> record's close_dt column with the time the connection was
> closed
>
>    d. If a record does not exist, then in our case this means that
> the connection was denied due to firewall rules, and we
> instead INSERT into a different table, sonicwall_denied
[...]

Inserting into the table ought to be a very fast operation. Searching it might be fast or might be slow. Updating it ought to be fast, especially if you have the primary key.
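Curtis's combine-(b)-and-(c) idea can be sketched in DBI (connection details and the parsed field names here are placeholders). The detail he couldn't remember: execute() on a non-SELECT statement returns the affected-row count, using the string "0E0" (true, but numerically zero) when nothing matched.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Placeholder connection details.
my $dbh = DBI->connect('DBI:mysql:database=firewall', 'user', 'pass',
                       { RaiseError => 1 });

# Prepare both statements once, outside the tail loop.
my $close = $dbh->prepare(q{
    UPDATE sonicwall_connections
       SET close_dt = ?
     WHERE protocol = ? AND source_ip = ? AND source_port = ?
       AND destination_port = ? AND close_dt IS NULL
     ORDER BY open_dt DESC
     LIMIT 1
});
my $denied = $dbh->prepare(q{
    INSERT INTO sonicwall_denied
           (close_dt, protocol, source_ip, source_port, destination_port)
    VALUES (?, ?, ?, ?, ?)
});

# Called for each CLOSE log entry, with fields from your log parser.
sub handle_close {
    my ($close_dt, $proto, $src_ip, $src_port, $dst_port) = @_;

    # execute() on an UPDATE returns the number of rows affected
    # ("0E0" when zero), so no separate SELECT is needed.
    my $rows = $close->execute($close_dt, $proto, $src_ip,
                               $src_port, $dst_port);
    if ($rows == 0) {
        # No open record to close: the firewall denied the connection.
        $denied->execute($close_dt, $proto, $src_ip, $src_port, $dst_port);
    }
}
```

This collapses steps (b) and (c) into one statement per CLOSE entry, and only denied connections cost a second statement.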
Here is a shot in the dark:

* Create a small auxiliary table with columns for protocol, source_ip, source_port, destination, and whatever the primary key for the sonicwall_connections table is.
* Index it on your matching criteria (the protocol, source_ip, source_port, and destination_port).
* When you INSERT into sonicwall_connections, step 4a above, also insert into this small table.
* Then, when you encounter a connection-close (step 4b above), you won't have to search the whole sonicwall_connections table (which I presume is humongous).
* In step 4c, after the UPDATE, delete the row from this auxiliary table.
* In a separate program or process, periodically delete old rows from the auxiliary table (connections that never closed, for some reason). This will keep that table small and fast.

But this is all just a guess, based on the idea that the SELECT is slowing you down. You really ought to profile it. You may be surprised to find out that the slowness is not where you think it is.

-- Eric
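Eric's auxiliary-table idea might be sketched as follows. All names here (open_connections, conn_id, the 7-day purge window) are illustrative assumptions, not from the thread.

```sql
-- Hypothetical side table: one row per connection that has been
-- opened but not yet closed.
CREATE TABLE open_connections (
  conn_id          BIGINT UNSIGNED NOT NULL,   -- PK of sonicwall_connections
  protocol         VARCHAR(8) NOT NULL,
  source_ip        INT UNSIGNED NOT NULL,
  source_port      SMALLINT UNSIGNED NOT NULL,
  destination_port SMALLINT UNSIGNED NOT NULL,
  opened_at        DATETIME NOT NULL,
  PRIMARY KEY (conn_id),
  KEY conn_tuple (protocol, source_ip, source_port, destination_port)
);

-- Step 4b: find the record id in the small table ...
SELECT conn_id FROM open_connections
WHERE protocol = ? AND source_ip = ? AND source_port = ?
  AND destination_port = ?
ORDER BY opened_at DESC LIMIT 1;

-- Step 4c: ... update the big table by primary key, then drop the row.
UPDATE sonicwall_connections SET close_dt = ? WHERE id = ?;
DELETE FROM open_connections WHERE conn_id = ?;

-- Periodic housekeeping: purge connections that never closed.
DELETE FROM open_connections
WHERE opened_at < NOW() - INTERVAL 7 DAY;
```

The SELECT now scans a table holding only currently-open connections instead of the full partitioned history, and the UPDATE hits the big table by primary key.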
Re: Tail Module + DBI Module, can't keep up!
By my understanding, for processing one record you have multiple round-trips between the Perl process and the DB: get data from the DB, check whether the data match, and send a DML command to the DB. I am not familiar with MySQL anymore, but if I had to handle all this DML processing quickly on Oracle, I would use just one DML statement ("insert all" or "merge" on an Oracle database), so I only do one execute and the DB server does all the work. Just checking online: MySQL has "INSERT ... ON DUPLICATE KEY UPDATE", similar to Oracle's merge, but I could not find a MySQL statement similar to Oracle's "insert all". You can write a (stored) procedure to do all the DML, though, can't you?

From: Brandon Phelps
To: tiger peng
Cc: Tim Bunce ; John Scoles ; "dbi-users@perl.org"
Sent: Wednesday, September 21, 2011 9:49 AM
Subject: Re: Tail Module + DBI Module, can't keep up!

What do you mean about the condition check in a script? Not sure I follow.

On 09/21/2011 10:41 AM, tiger peng wrote:
> The first bottleneck is the open/close for each record, which are time-consuming operations. Why don't you just use one connection?
> The second one is doing the condition check in the script instead of doing it all within the database server.
>
> Try to fix these two issues. If it is still too slow, we can do more tuning.
>
> *From:* Brandon Phelps
> *To:* tiger peng
> *Cc:* Tim Bunce ; John Scoles ; "dbi-users@perl.org"
> *Sent:* Wednesday, September 21, 2011 9:04 AM
> *Subject:* Re: Tail Module + DBI Module, can't keep up!
>
> Thanks for the responses.
>
> I would love to be able to output the data to a flat file or something and do bulk inserts daily, however this isn't really feasible for my situation because the database operations depend on other records.
>
> Basically the system I am creating works like this currently:
>
> 1. Syslog-ng accepts syslog messages from a sonicwall firewall on the network, and logs to a file in /var/log
> 2. Database table called sonicwall_connections, partitioned by the open_dt column, 1 partition per day
> 3. Perl script runs as a daemon, extracting open_dt, close_dt, protocol, source_ip, destination_ip, source_port, destination_port, bytes_sent, and bytes_received
> 4. Perl script from (3) constantly reads in the /var/log files using the Tail module. It does the following:
>    a. When a connection is opened, it INSERTs into the sonicwall_connections table
>    b. When a connection is closed, it SELECTs from the sonicwall_connection table the last record id that matches the protocol, source_ip, source_port, and destination_port
>    c. If a record exists matching this criteria, it UPDATEs that record's close_dt column with the time the connection was closed
>    d. If a record does not exist, then in our case this means that the connection was denied due to firewall rules, and we instead INSERT into a different table, sonicwall_denied
>
> Previously I was doing the entire SELECT prepare and execute in my never-ending while loop that parsed the log. I have since changed this per the recommendations here. Now, I open my connection to the database using Perl::DBI, then I create my prepared statement (outside the while loop), and I simply call execute() within the while loop, so the prepared statement only has to be created once. This seems to have helped a small bit in that the script does not fall behind quite as fast now, however it does still fall behind. Currently with these modifications it is 10AM EST and the script is still working on log entries from yesterday night. Around 12 hours behind, compared to the 22-23 hours behind it was previously, so this change obviously did help a bit... still not perfect yet though.
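tiger's one-round-trip suggestion, pushing the (b)-(d) logic server-side, might be sketched as a MySQL stored procedure. All names here are illustrative; the thread does not include actual DDL.

```sql
-- Hypothetical stored procedure: one round-trip per CLOSE entry.
-- The UPDATE-then-INSERT decision from steps (b)-(d) runs server-side.
DELIMITER //
CREATE PROCEDURE close_connection(
    IN p_close_dt         DATETIME,
    IN p_protocol         VARCHAR(8),
    IN p_source_ip        INT UNSIGNED,
    IN p_source_port      SMALLINT UNSIGNED,
    IN p_destination_port SMALLINT UNSIGNED
)
BEGIN
    UPDATE sonicwall_connections
       SET close_dt = p_close_dt
     WHERE protocol = p_protocol
       AND source_ip = p_source_ip
       AND source_port = p_source_port
       AND destination_port = p_destination_port
       AND close_dt IS NULL
     ORDER BY open_dt DESC
     LIMIT 1;

    -- ROW_COUNT() = 0 means no open record matched:
    -- the connection was denied by the firewall.
    IF ROW_COUNT() = 0 THEN
        INSERT INTO sonicwall_denied
               (close_dt, protocol, source_ip, source_port, destination_port)
        VALUES (p_close_dt, p_protocol, p_source_ip, p_source_port,
                p_destination_port);
    END IF;
END //
DELIMITER ;
```

The Perl side then prepares a single `CALL close_connection(?, ?, ?, ?, ?)` and the SELECT/branch/INSERT round-trips disappear.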
RE: Tail Module + DBI Module, can't keep up!
On Wednesday, September 21, 2011 10:04 AM, Brandon Phelps [mailto:bphe...@gls.com] wrote:
> Subject: Re: Tail Module + DBI Module, can't keep up!
[...]
> 4. Perl script from (3) constantly reads in the /var/log files using
> the Tail module. It does the following:
>
>    a. When a connection is opened, it INSERTs into the
> sonicwall_connections table
>
>    b. When a connection is closed, it SELECTs from the
> sonicwall_connection table the last record id that matches the
> protocol, source_ip, source_port, and destination_port
>
>    c. If a record exists matching this criteria, it UPDATEs that
> record's close_dt column with the time the connection was
> closed
>
>    d. If a record does not exist, then in our case this means that
> the connection was denied due to firewall rules, and we
> instead INSERT into a different table, sonicwall_denied
[...]

Inserting into the table ought to be a very fast operation. Searching it might be fast or might be slow. Updating it ought to be fast, especially if you have the primary key.

Here is a shot in the dark:

* Create a small auxiliary table with columns for protocol, source_ip, source_port, destination, and whatever the primary key for the sonicwall_connections table is.
* Index it on your matching criteria (the protocol, source_ip, source_port, and destination_port).
* When you INSERT into sonicwall_connections, step 4a above, also insert into this small table.
* Then, when you encounter a connection-close (step 4b above), you won't have to search the whole sonicwall_connections table (which I presume is humongous).
* In step 4c, after the UPDATE, delete the row from this auxiliary table.
* In a separate program or process, periodically delete old rows from the auxiliary table (connections that never closed, for some reason). This will keep that table small and fast.

But this is all just a guess, based on the idea that the SELECT is slowing you down. You really ought to profile it. You may be surprised to find out that the slowness is not where you think it is.

-- Eric
Re: Tail Module + DBI Module, can't keep up!
What do you mean about the condition check in a script? Not sure I follow.

On 09/21/2011 10:41 AM, tiger peng wrote:

The first bottleneck is the open/close for each record, which are time-consuming operations. Why don't you just use one connection? The second one is doing the condition check in the script instead of doing it all within the database server.

Try to fix these two issues. If it is still too slow, we can do more tuning.

*From:* Brandon Phelps
*To:* tiger peng
*Cc:* Tim Bunce ; John Scoles ; "dbi-users@perl.org"
*Sent:* Wednesday, September 21, 2011 9:04 AM
*Subject:* Re: Tail Module + DBI Module, can't keep up!

Thanks for the responses.

I would love to be able to output the data to a flat file or something and do bulk inserts daily, however this isn't really feasible for my situation because the database operations depend on other records.

Basically the system I am creating works like this currently:

1. Syslog-ng accepts syslog messages from a sonicwall firewall on the network, and logs to a file in /var/log
2. Database table called sonicwall_connections, partitioned by the open_dt column, 1 partition per day
3. Perl script runs as a daemon, extracting open_dt, close_dt, protocol, source_ip, destination_ip, source_port, destination_port, bytes_sent, and bytes_received
4. Perl script from (3) constantly reads in the /var/log files using the Tail module. It does the following:
   a. When a connection is opened, it INSERTs into the sonicwall_connections table
   b. When a connection is closed, it SELECTs from the sonicwall_connection table the last record id that matches the protocol, source_ip, source_port, and destination_port
   c. If a record exists matching this criteria, it UPDATEs that record's close_dt column with the time the connection was closed
   d. If a record does not exist, then in our case this means that the connection was denied due to firewall rules, and we instead INSERT into a different table, sonicwall_denied

Previously I was doing the entire SELECT prepare and execute in my never-ending while loop that parsed the log. I have since changed this per the recommendations here. Now, I open my connection to the database using Perl::DBI, then I create my prepared statement (outside the while loop), and I simply call execute() within the while loop, so the prepared statement only has to be created once. This seems to have helped a small bit in that the script does not fall behind quite as fast now, however it does still fall behind. Currently with these modifications it is 10AM EST and the script is still working on log entries from yesterday night. Around 12 hours behind, compared to the 22-23 hours behind it was previously, so this change obviously did help a bit... still not perfect yet though.

As you can see from the procedure above, I cannot really work with a flat file or non-indexed hot table, because I constantly need to read back in records that were previously inserted. Sometimes the record I need was inserted just seconds before, while other times I need to retrieve records that have been there for days. For example, when a user loads a website in their browser, I get my initial open connection log entry and I do the initial INSERT, but only a second later (depending on the size of the site obviously) I am ready to UPDATE the record with the close_dt time (when the web request connection was closed). On the other hand, if a user downloads a 10GB file via FTP, or starts a remote desktop session, the connection could potentially be opened for hours if not days, so I also need to be able to read records from that long ago, in order to get the record ID to UPDATE when the connection is closed.

Another reason I cannot store records to a temporary table or flat file is because the actual web gui users most often need the data right away. For example, if a manager sees a huge spike in bandwidth on our fiber connection, they need to be able to instantly open the web page and get the exact details of that connection, such as the source ip address.
Re: Tail Module + DBI Module, can't keep up!
Tiger,

I didn't mean that I am opening and closing the database connection for each record. I only have 1 database connection, which is opened at the beginning of the perl script and closed when (if ever) the script is killed. The open and close I was referring to was for the actual log messages. The log messages themselves indicate when a firewall connection was opened or closed, so whenever I get a log message indicating that a connection was closed, I need to SELECT the proper record ID from the database first, in order to know which record to UPDATE with the new information.

Sorry for the confusion.

On 09/21/2011 10:41 AM, tiger peng wrote:

The first bottleneck is the open/close for each record, which are time-consuming operations. Why don't you just use one connection? The second one is doing the condition check in the script instead of doing it all within the database server.

Try to fix these two issues. If it is still too slow, we can do more tuning.

From: Brandon Phelps
To: tiger peng
Cc: Tim Bunce; John Scoles; "dbi-users@perl.org"
Sent: Wednesday, September 21, 2011 9:04 AM
Subject: Re: Tail Module + DBI Module, can't keep up!

Thanks for the responses.

I would love to be able to output the data to a flat file or something and do bulk inserts daily, however this isn't really feasible for my situation because the database operations depend on other records.

Basically the system I am creating works like this currently:

1. Syslog-ng accepts syslog messages from a sonicwall firewall on the network, and logs to a file in /var/log
2. Database table called sonicwall_connections, partitioned by the open_dt column, 1 partition per day
3. Perl script runs as a daemon, extracting open_dt, close_dt, protocol, source_ip, destination_ip, source_port, destination_port, bytes_sent, and bytes_received
4. Perl script from (3) constantly reads in the /var/log files using the Tail module. It does the following:
   a. When a connection is opened, it INSERTs into the sonicwall_connections table
   b. When a connection is closed, it SELECTs from the sonicwall_connection table the last record id that matches the protocol, source_ip, source_port, and destination_port
   c. If a record exists matching this criteria, it UPDATEs that record's close_dt column with the time the connection was closed
   d. If a record does not exist, then in our case this means that the connection was denied due to firewall rules, and we instead INSERT into a different table, sonicwall_denied

Previously I was doing the entire SELECT prepare and execute in my never-ending while loop that parsed the log. I have since changed this per the recommendations here. Now, I open my connection to the database using Perl::DBI, then I create my prepared statement (outside the while loop), and I simply call execute() within the while loop, so the prepared statement only has to be created once. This seems to have helped a small bit in that the script does not fall behind quite as fast now, however it does still fall behind. Currently with these modifications it is 10AM EST and the script is still working on log entries from yesterday night. Around 12 hours behind, compared to the 22-23 hours behind it was previously, so this change obviously did help a bit... still not perfect yet though.

As you can see from the procedure above, I cannot really work with a flat file or non-indexed hot table, because I constantly need to read back in records that were previously inserted. Sometimes the record I need was inserted just seconds before, while other times I need to retrieve records that have been there for days. For example, when a user loads a website in their browser, I get my initial open connection log entry and I do the initial INSERT, but only a second later (depending on the size of the site obviously) I am ready to UPDATE the record with the close_dt time (when the web request connection was closed). On the other hand, if a user downloads a 10GB file via FTP, or starts a remote desktop session, the connection could potentially be opened for hours if not days, so I also need to be able to read records from that long ago, in order to get the record ID to UPDATE when the connection is closed.

Another reason I cannot store records to a temporary table or flat file is because the actual web gui users most often need the data right away. For example, if a manager sees a huge spike in bandwidth on our fiber connection, they need to be able to instantly open the web page and get the exact details of that connection, such as the source ip address... this way they can determine what employee has initiated the download or whatever, and to what destination they are connected. That way they can say, "Oh this employee has a VNC connection opened to their house, that doesn't really seem work related", etc.

On 09/16/2011 12:42 PM, tiger peng wrote: [...]
Re: Tail Module + DBI Module, can't keep up!
The first bottleneck is the open/close for each record, which are time-consuming operations. Why don't you just use one connection? The second one is doing the condition check in the script instead of doing it all within the database server.

Try to fix these two issues. If it is still too slow, we can do more tuning.

From: Brandon Phelps
To: tiger peng
Cc: Tim Bunce ; John Scoles ; "dbi-users@perl.org"
Sent: Wednesday, September 21, 2011 9:04 AM
Subject: Re: Tail Module + DBI Module, can't keep up!

Thanks for the responses.

I would love to be able to output the data to a flat file or something and do bulk inserts daily, however this isn't really feasible for my situation because the database operations depend on other records.

Basically the system I am creating works like this currently:

1. Syslog-ng accepts syslog messages from a sonicwall firewall on the network, and logs to a file in /var/log
2. Database table called sonicwall_connections, partitioned by the open_dt column, 1 partition per day
3. Perl script runs as a daemon, extracting open_dt, close_dt, protocol, source_ip, destination_ip, source_port, destination_port, bytes_sent, and bytes_received
4. Perl script from (3) constantly reads in the /var/log files using the Tail module. It does the following:
   a. When a connection is opened, it INSERTs into the sonicwall_connections table
   b. When a connection is closed, it SELECTs from the sonicwall_connection table the last record id that matches the protocol, source_ip, source_port, and destination_port
   c. If a record exists matching this criteria, it UPDATEs that record's close_dt column with the time the connection was closed
   d. If a record does not exist, then in our case this means that the connection was denied due to firewall rules, and we instead INSERT into a different table, sonicwall_denied

Previously I was doing the entire SELECT prepare and execute in my never-ending while loop that parsed the log. I have since changed this per the recommendations here. Now, I open my connection to the database using Perl::DBI, then I create my prepared statement (outside the while loop), and I simply call execute() within the while loop, so the prepared statement only has to be created once. This seems to have helped a small bit in that the script does not fall behind quite as fast now, however it does still fall behind. Currently with these modifications it is 10AM EST and the script is still working on log entries from yesterday night. Around 12 hours behind, compared to the 22-23 hours behind it was previously, so this change obviously did help a bit... still not perfect yet though.

As you can see from the procedure above, I cannot really work with a flat file or non-indexed hot table, because I constantly need to read back in records that were previously inserted. Sometimes the record I need was inserted just seconds before, while other times I need to retrieve records that have been there for days. For example, when a user loads a website in their browser, I get my initial open connection log entry and I do the initial INSERT, but only a second later (depending on the size of the site obviously) I am ready to UPDATE the record with the close_dt time (when the web request connection was closed). On the other hand, if a user downloads a 10GB file via FTP, or starts a remote desktop session, the connection could potentially be opened for hours if not days, so I also need to be able to read records from that long ago, in order to get the record ID to UPDATE when the connection is closed.

Another reason I cannot store records to a temporary table or flat file is because the actual web gui users most often need the data right away. For example, if a manager sees a huge spike in bandwidth on our fiber connection, they need to be able to instantly open the web page and get the exact details of that connection, such as the source ip address... this way they can determine what employee has initiated the download or whatever, and to what destination they are connected. That way they can say, "Oh this employee has a VNC connection opened to their house, that doesn't really seem work related", etc.

On 09/16/2011 12:42 PM, tiger peng wrote:
> I just finished tuning an application that is almost exactly the reverse of Brandon's: not parsing a text file and then loading, but extracting, parsing/transforming the data, and then generating an XML file.
>
> The original application fetched and parsed records one-by-one, and the code for parsing/transforming was not implemented efficiently. It took around 40 minutes to pull data from a highly active transaction database, and the DBAs complained about it. I split out the extracting, letting it quickly dump the data to a flat file (~8 minutes), and then parsed/transformed the data and generated the XML file (~3 minutes).
Re: Tail Module + DBI Module, can't keep up!
[...] or dumping the flat data file on the DB server side.

*From:* tiger peng
*To:* Tim Bunce ; John Scoles
*Cc:* "bphe...@gls.com" ; "dbi-users@perl.org"
*Sent:* Friday, September 16, 2011 10:13 AM
*Subject:* Re: Tail Module + DBI Module, can't keep up!

Tim,

I bet at that time your MySQL did not support partitioning. Now, with partition management available, the job should be easy. Use a small partition, called a hot partition in the DW/ETL field, to receive the new data; as there is only a small chunk of data, the insert/update DML should execute fast (keeping the index local). When it reaches a threshold, split the hot partition, use the fresh hot partition to receive newer data, and merge the older data into the main partition.

Using a text file has its advantages: whenever the DB is down, the parsing of log files can still run by itself; and when the DB is back, a bulk-load tool can be used to catch up on the load quickly.

Tiger

*From:* Tim Bunce
*To:* John Scoles
*Cc:* tigerpeng2...@yahoo.com; bphe...@gls.com; dbi-users@perl.org
*Sent:* Friday, September 16, 2011 4:20 AM
*Subject:* Re: Tail Module + DBI Module, can't keep up!

This is all fine advice for performance-tuning a DBI app and worth doing. But there's always a limit to what can be achieved on the client. So it's worth exploring what can be done on the server side, beyond standard tuning practices.

I wrote a high-volume log insertion app using mysql many years ago. I can't remember the performance figures, but it was fast.

A major cost of insertion is maintaining indices. So for maximum insertion speed you want to have no indices. But to query the log records you'll probably want indices.

The way I approached this was to have per-hour tables: log_MMDDHH

The loader streams the records into the table as fast as it can. (From memory, I used a multi-row INSERT statement, no placeholders.) Every time it switches to a new hour it triggers the building of indices on the previous hour's table. It also triggers recreating the per-day views (log_MMDD) that abstract the hourly tables.

The same technique could be applied at whatever time granularity meets your needs, from table-per-day to table-per-minute.

If you can delay the loading of the log data by whatever period you're using (eg hourly) then you have the option of preparing the log data as hourly text files and then, when each is complete, using a proprietary bulk-loading tool to load it.

Tim.
Re: Tail Module + DBI Module, can't keep up!
I just finished tuning an application that is almost exactly the reverse of Brandon's: not parsing a text file and then loading, but extracting, parsing/transforming the data, and then generating an XML file.

The original application fetched and parsed records one-by-one, and the code for parsing/transforming was not implemented efficiently. It took around 40 minutes to pull data from a highly active transaction database, and the DBAs complained about it. I split out the extracting, letting it quickly dump the data to a flat file (~8 minutes), and then parsed/transformed the data and generated the XML file (~3 minutes). It is possible to speed it up even more by (parallel) pulling the data into memory (not very big, ~2GB) or dumping the flat data file on the DB server side.

From: tiger peng
To: Tim Bunce ; John Scoles
Cc: "bphe...@gls.com" ; "dbi-users@perl.org"
Sent: Friday, September 16, 2011 10:13 AM
Subject: Re: Tail Module + DBI Module, can't keep up!

Tim,

I bet at that time your MySQL did not support partitioning. Now, with partition management available, the job should be easy. Use a small partition, called a hot partition in the DW/ETL field, to receive the new data; as there is only a small chunk of data, the insert/update DML should execute fast (keeping the index local). When it reaches a threshold, split the hot partition, use the fresh hot partition to receive newer data, and merge the older data into the main partition.

Using a text file has its advantages: whenever the DB is down, the parsing of log files can still run by itself; and when the DB is back, a bulk-load tool can be used to catch up on the load quickly.

Tiger

From: Tim Bunce
To: John Scoles
Cc: tigerpeng2...@yahoo.com; bphe...@gls.com; dbi-users@perl.org
Sent: Friday, September 16, 2011 4:20 AM
Subject: Re: Tail Module + DBI Module, can't keep up!

This is all fine advice for performance-tuning a DBI app and worth doing. But there's always a limit to what can be achieved on the client. So it's worth exploring what can be done on the server side, beyond standard tuning practices.

I wrote a high-volume log insertion app using mysql many years ago. I can't remember the performance figures, but it was fast.

A major cost of insertion is maintaining indices. So for maximum insertion speed you want to have no indices. But to query the log records you'll probably want indices.

The way I approached this was to have per-hour tables: log_MMDDHH

The loader streams the records into the table as fast as it can. (From memory, I used a multi-row INSERT statement, no placeholders.) Every time it switches to a new hour it triggers the building of indices on the previous hour's table. It also triggers recreating the per-day views (log_MMDD) that abstract the hourly tables.

The same technique could be applied at whatever time granularity meets your needs, from table-per-day to table-per-minute.

If you can delay the loading of the log data by whatever period you're using (eg hourly) then you have the option of preparing the log data as hourly text files and then, when each is complete, using a proprietary bulk-loading tool to load it.

Tim.
Re: Tail Module + DBI Module, can't keep up!
Tim,

I bet at that time your MySQL did not support partitioning. Now, with partition management available, the job should be easy. Use a small partition, called a hot partition in the DW/ETL field, to receive the new data; as there is only a small chunk of data, the insert/update DML should execute fast (keeping the index local). When it reaches a threshold, split the hot partition, use the fresh hot partition to receive newer data, and merge the older data into the main partition.

Using a text file has its advantages: whenever the DB is down, the parsing of log files can still run by itself; and when the DB is back, a bulk-load tool can be used to catch up on the load quickly.

Tiger

From: Tim Bunce
To: John Scoles
Cc: tigerpeng2...@yahoo.com; bphe...@gls.com; dbi-users@perl.org
Sent: Friday, September 16, 2011 4:20 AM
Subject: Re: Tail Module + DBI Module, can't keep up!

This is all fine advice for performance-tuning a DBI app and worth doing. But there's always a limit to what can be achieved on the client. So it's worth exploring what can be done on the server side, beyond standard tuning practices.

I wrote a high-volume log insertion app using mysql many years ago. I can't remember the performance figures, but it was fast.

A major cost of insertion is maintaining indices. So for maximum insertion speed you want to have no indices. But to query the log records you'll probably want indices.

The way I approached this was to have per-hour tables: log_MMDDHH

The loader streams the records into the table as fast as it can. (From memory, I used a multi-row INSERT statement, no placeholders.) Every time it switches to a new hour it triggers the building of indices on the previous hour's table. It also triggers recreating the per-day views (log_MMDD) that abstract the hourly tables.

The same technique could be applied at whatever time granularity meets your needs, from table-per-day to table-per-minute.

If you can delay the loading of the log data by whatever period you're using (eg hourly) then you have the option of preparing the log data as hourly text files and then, when each is complete, using a proprietary bulk-loading tool to load it.

Tim.
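The hot-partition scheme tiger describes might look like this in MySQL partition DDL. The partition names, dates, and trimmed column list are illustrative; note MySQL requires the partitioning column in every unique key, hence the composite primary key.

```sql
-- Hypothetical daily-partitioned table with a small "hot" partition
-- at the end receiving new rows.
CREATE TABLE sonicwall_connections (
  id       BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  open_dt  DATETIME NOT NULL,
  close_dt DATETIME NULL,
  -- ... remaining connection columns ...
  PRIMARY KEY (id, open_dt)       -- partition column must be in the PK
)
PARTITION BY RANGE (TO_DAYS(open_dt)) (
  PARTITION p20110920 VALUES LESS THAN (TO_DAYS('2011-09-21')),
  PARTITION p20110921 VALUES LESS THAN (TO_DAYS('2011-09-22')),
  PARTITION p_hot     VALUES LESS THAN MAXVALUE    -- small hot partition
);

-- When the hot partition reaches its threshold, split it: yesterday's
-- rows become a sealed daily partition, and a fresh hot partition
-- takes the newer data.
ALTER TABLE sonicwall_connections
REORGANIZE PARTITION p_hot INTO (
  PARTITION p20110922 VALUES LESS THAN (TO_DAYS('2011-09-23')),
  PARTITION p_hot     VALUES LESS THAN MAXVALUE
);
```

Inserts and the close_dt updates then touch only the small hot partition and its local index, while queries over history prune to the sealed daily partitions.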
Re: Tail Module + DBI Module, can't keep up!
This is all fine advice for performance-tuning a DBI app and worth doing. But there's always a limit to what can be achieved on the client. So it's worth exploring what can be done on the server side, beyond standard tuning practices.

I wrote a high-volume log insertion app using mysql many years ago. I can't remember the performance figures, but it was fast.

A major cost of insertion is maintaining indices. So for maximum insertion speed you want to have no indices. But to query the log records you'll probably want indices.

The way I approached this was to have per-hour tables: log_MMDDHH

The loader streams the records into the table as fast as it can. (From memory, I used a multi-row INSERT statement, no placeholders.) Every time it switches to a new hour it triggers the building of indices on the previous hour's table. It also triggers recreating the per-day views (log_MMDD) that abstract the hourly tables.

The same technique could be applied at whatever time granularity meets your needs, from table-per-day to table-per-minute.

If you can delay the loading of the log data by whatever period you're using (eg hourly) then you have the option of preparing the log data as hourly text files and then, when each is complete, using a proprietary bulk-loading tool to load it.

Tim.
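A rough Perl sketch of the per-hour rotation described above. This is not Tim's actual code: the table template, column list, and index are hypothetical, and the per-day view rebuild is only indicated.

```perl
#!/usr/bin/perl
# Sketch: stream rows into an index-free per-hour table via multi-row
# INSERTs, then index the sealed table when the hour rolls over.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('DBI:mysql:database=logs', 'user', 'pass',
                       { RaiseError => 1 });

# Flush a buffered batch as one multi-row INSERT (no placeholders,
# as in the original app; quote() handles escaping).
sub flush_batch {
    my ($table, @rows) = @_;
    return unless @rows;
    my $values = join ',', map {
        '(' . join(',', map { $dbh->quote($_) } @$_) . ')'
    } @rows;
    $dbh->do("INSERT INTO $table (logged_at, protocol, source_ip) "
           . "VALUES $values");
}

# On each hour boundary: create the new hour's table from a template,
# then index the previous hour's table now that inserts have stopped.
sub hour_rollover {
    my ($new_hour, $old_hour) = @_;    # e.g. '09220900', '09220800'
    $dbh->do("CREATE TABLE IF NOT EXISTS log_$new_hour LIKE log_template");
    return unless $old_hour;
    $dbh->do("ALTER TABLE log_$old_hour "
           . "ADD KEY k_proto (protocol, source_ip)");
    # Recreating the per-day view would go here: a CREATE OR REPLACE
    # VIEW over a UNION ALL of that day's hourly tables.
}
```

The insert path stays index-free and therefore fast; the indexing cost is paid once per hour, on a table nothing is writing to anymore.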
RE: Tail Module + DBI Module, can't keep up!
> Date: Thu, 15 Sep 2011 09:26:41 -0700
> From: tigerpeng2...@yahoo.com
> Subject: Re: Tail Module + DBI Module, can't keep up!
> To: bphe...@gls.com
> CC: dbi-users@perl.org
>
> Separate the process into two steps and figure out which step should be fixed first.
>
> 1. Parse the log with Tail and dump the data to a plain file.
> 2. Load the plain file with DBI (are you using batch insert?)

This would be my strategy as well.

You could also give execute_array a try, but I do not think that will gain you much as you are using MySQL. At least you could use one script that:

1) Buffers up X records to insert in an array/arrays
2) At buffer X, does an insert with execute_array
3) Carries on buffering up the array

One other thing to look out for is to make sure you are not 'prepare'ing the same SQL over and over again. If you have only 1 query, just prepare it once and reuse that handle. Also make sure you are not trying to reconnect each time, as both of these actions are rather heavy on resources.

Cheers
John
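John's buffering scheme might be sketched with DBI's execute_array like this. The table, column list, and batch size are illustrative; execute_array here takes column-wise bind arrays, one array per placeholder.

```perl
#!/usr/bin/perl
# Sketch: buffer parsed records, then insert the whole batch with
# execute_array against a statement handle prepared exactly once.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('DBI:mysql:database=firewall', 'user', 'pass',
                       { RaiseError => 1, AutoCommit => 0 });

# Prepare once; reuse this handle for every batch.
my $sth = $dbh->prepare(q{
    INSERT INTO sonicwall_connections (open_dt, protocol, source_ip)
    VALUES (?, ?, ?)
});

my $BATCH = 500;                       # illustrative buffer size
my (@open_dt, @proto, @src_ip);        # one array per placeholder

sub maybe_flush {
    return if @open_dt < $BATCH;
    # One execute_array call sends the whole buffer; @tuple_status
    # records per-row success/failure.
    my @tuple_status;
    $sth->execute_array({ ArrayTupleStatus => \@tuple_status },
                        \@open_dt, \@proto, \@src_ip);
    $dbh->commit;                      # block commit, once per batch
    @open_dt = @proto = @src_ip = ();
}

# In the tail loop, per parsed OPEN record:
#   push @open_dt, $dt; push @proto, $p; push @src_ip, $ip;
#   maybe_flush();
```

As John notes, DBD::mysql emulates execute_array by looping internally, so the win is mostly from batched commits and avoided re-prepares rather than true bulk transfer.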
Re: Tail Module + DBI Module, can't keep up!
Separate the process into two steps and figure out which step should be fixed first.

1. Parse the log with Tail and dump the data to a plain file.
2. Load the plain file with DBI (are you using batch insert?)