Re: [fpc-pascal] Firebird: bulk insert performance: suggestions?
On 07/09/12 14:03, Reinier Olislagers wrote: For my Dutch postcode program https://bitbucket.org/reiniero/postcode with an embedded Firebird 2.5 database, I allow users to read in a CSV file with new or updated postcode data. I use sqldb, FPC x86. I'd like to get your suggestions on speed improvements. I try to get the data into a temporary table as quickly as possible. Later on, a stored procedure will normalize the data and insert to/update various tables (with postcode, city, street information, etc). Because I also allow querying information, I set up 2 connections+transactions: for reading and writing in my database class constructor, and destroy them in the destructor. However, (currently) my application controls the database and I know that querying and bulk inserts at the same time is impossible. The write transaction has this code: FWriteTransaction.Params.Add('isc_tpb_concurrency'); FWriteTransaction.Params.Add('isc_tpb_write'); FWriteTransaction.Params.Add('isc_tpb_no_auto_undo'); //disable transaction-level undo log, handy for getting max throughput when performing a batch update My code loads an ANSI CSV file into a csvdocument in memory (about 50meg), then goes through it, and calls an insert procedure for each record (converting the field contents to UTF8): FDBLayer.BulkInsertUpdateRecord( SysToUTF8(Postcodes.Cells[ProvinceField,LineNum]), SysToUTF8(Postcodes.Cells[CityField,LineNum]), SysToUTF8(Postcodes.Cells[PostcodeField,LineNum]), SysToUTF8(Postcodes.Cells[StreetField,LineNum]), StrToInt(Postcodes.Cells[NumberLowestField,LineNum]), StrToInt(Postcodes.Cells[NumberHighestField,LineNum]), Even, Latitude, Longitude); Relevant snippets from the insert procedure: QuerySQL='INSERT INTO BULKINSERTDATA '+ '(PROVINCENAME,CITYNAME,POSTCODE,STREETNAME,LOW,HIGH,EVEN,LATITUDE,LONGITUDE) '+ 'VALUES ( '+ ':PROVINCENAME,:CITYNAME,:POSTCODE,:STREETNAME,:LOW,:HIGH,:EVEN,:LATITUDE,:LONGITUDE)'; then the transaction is started (if it is inactive) and the query parameters are filled (using Query.Params.ParamByName, but I don't suppose that would be a big slowdown??); finally the SQL is executed. The transaction is left open. Currently, after every 100 records, the transaction is committed: if (linenum mod 100=0) then FDBLayer.BulkInsertCommit(false); IIRC, advice on the Firebird list is to play with this interval; any suggestions? Given the aggressive nature of the transaction parameters, I might even dispense with it. Finally, once done, the transaction is committed, and the xtored procedure that does subsequent updates is called. Thanks, Reinier ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-pascal - Turning Forced Writes off while doing the bulk inserts may help (at a higher risk). But make sure you turn it back on afterwards. - If my memory serves well, the Firebird SP can directly read from the CSV file instead of going through sqldb for each and every record. Stephano ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] Firebird: bulk insert performance: suggestions?
On 8-9-2012 11:04, patspiper wrote: On 07/09/12 14:03, Reinier Olislagers wrote: - Turning Forced Writes off while doing the bulk inserts may help (at a higher risk). But make sure you turn it back on afterwards. - If my memory serves well, the Firebird SP can directly read from the CSV file instead of going through sqldb for each and every record. Stephano Thanks, Stephano! 1. Yep, forced writes could indeed be a good one (surely for the first phase: putting everything in the temporary table). 2. It can't read from CSV - but can read from fixed width text files. The problem is that it gets very complicated, (AFAIU) impossible if you have UTF8 data in there. Of course, currently I'm reading ANSI and converting to UF8 myself; the db can probably do that too. I'll keep this option in mind - however I'd also like to support client/server systems in future so that's why I haven't looked at this further. Thanks, Reinier ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] Firebird: bulk insert performance: suggestions?
On 08/09/12 06:31, Martin Schreiber wrote: It depends on the reaction time of the network because AFAIK there is a roundtrip for every inserted record... I didn't commit the transaction after every record. I first tried in a batch of 1 records per transaction. Later I even bumped that up more, and still no problems. Anyway, that code isn't used too much in our product - it is only used for data conversion from our old system to our new system. So normally only needed once per setup. Regards, - Graeme - ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-pascal
[fpc-pascal] Firebird: bulk insert performance: suggestions?
For my Dutch postcode program https://bitbucket.org/reiniero/postcode with an embedded Firebird 2.5 database, I allow users to read in a CSV file with new or updated postcode data. I use sqldb, FPC x86. I'd like to get your suggestions on speed improvements. I try to get the data into a temporary table as quickly as possible. Later on, a stored procedure will normalize the data and insert to/update various tables (with postcode, city, street information, etc). Because I also allow querying information, I set up 2 connections+transactions: for reading and writing in my database class constructor, and destroy them in the destructor. However, (currently) my application controls the database and I know that querying and bulk inserts at the same time is impossible. The write transaction has this code: FWriteTransaction.Params.Add('isc_tpb_concurrency'); FWriteTransaction.Params.Add('isc_tpb_write'); FWriteTransaction.Params.Add('isc_tpb_no_auto_undo'); //disable transaction-level undo log, handy for getting max throughput when performing a batch update My code loads an ANSI CSV file into a csvdocument in memory (about 50meg), then goes through it, and calls an insert procedure for each record (converting the field contents to UTF8): FDBLayer.BulkInsertUpdateRecord( SysToUTF8(Postcodes.Cells[ProvinceField,LineNum]), SysToUTF8(Postcodes.Cells[CityField,LineNum]), SysToUTF8(Postcodes.Cells[PostcodeField,LineNum]), SysToUTF8(Postcodes.Cells[StreetField,LineNum]), StrToInt(Postcodes.Cells[NumberLowestField,LineNum]), StrToInt(Postcodes.Cells[NumberHighestField,LineNum]), Even, Latitude, Longitude); Relevant snippets from the insert procedure: QuerySQL='INSERT INTO BULKINSERTDATA '+ '(PROVINCENAME,CITYNAME,POSTCODE,STREETNAME,LOW,HIGH,EVEN,LATITUDE,LONGITUDE) '+ 'VALUES ( '+ ':PROVINCENAME,:CITYNAME,:POSTCODE,:STREETNAME,:LOW,:HIGH,:EVEN,:LATITUDE,:LONGITUDE)'; then the transaction is started (if it is inactive) and the query parameters are filled (using Query.Params.ParamByName, but I don't suppose that would be a big slowdown??); finally the SQL is executed. The transaction is left open. Currently, after every 100 records, the transaction is committed: if (linenum mod 100=0) then FDBLayer.BulkInsertCommit(false); IIRC, advice on the Firebird list is to play with this interval; any suggestions? Given the aggressive nature of the transaction parameters, I might even dispense with it. Finally, once done, the transaction is committed, and the xtored procedure that does subsequent updates is called. Thanks, Reinier ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] Firebird: bulk insert performance: suggestions?
On Fri, 7 Sep 2012, Reinier Olislagers wrote: For my Dutch postcode program https://bitbucket.org/reiniero/postcode with an embedded Firebird 2.5 database, I allow users to read in a CSV file with new or updated postcode data. I use sqldb, FPC x86. I'd like to get your suggestions on speed improvements. I try to get the data into a temporary table as quickly as possible. Later on, a stored procedure will normalize the data and insert to/update various tables (with postcode, city, street information, etc). Because I also allow querying information, I set up 2 connections+transactions: for reading and writing in my database class constructor, and destroy them in the destructor. However, (currently) my application controls the database and I know that querying and bulk inserts at the same time is impossible. The write transaction has this code: FWriteTransaction.Params.Add('isc_tpb_concurrency'); FWriteTransaction.Params.Add('isc_tpb_write'); FWriteTransaction.Params.Add('isc_tpb_no_auto_undo'); //disable transaction-level undo log, handy for getting max throughput when performing a batch update My code loads an ANSI CSV file into a csvdocument in memory (about 50meg), then goes through it, and calls an insert procedure for each record (converting the field contents to UTF8): FDBLayer.BulkInsertUpdateRecord( SysToUTF8(Postcodes.Cells[ProvinceField,LineNum]), SysToUTF8(Postcodes.Cells[CityField,LineNum]), SysToUTF8(Postcodes.Cells[PostcodeField,LineNum]), SysToUTF8(Postcodes.Cells[StreetField,LineNum]), StrToInt(Postcodes.Cells[NumberLowestField,LineNum]), StrToInt(Postcodes.Cells[NumberHighestField,LineNum]), Even, Latitude, Longitude); Relevant snippets from the insert procedure: QuerySQL='INSERT INTO BULKINSERTDATA '+ '(PROVINCENAME,CITYNAME,POSTCODE,STREETNAME,LOW,HIGH,EVEN,LATITUDE,LONGITUDE) '+ 'VALUES ( '+ ':PROVINCENAME,:CITYNAME,:POSTCODE,:STREETNAME,:LOW,:HIGH,:EVEN,:LATITUDE,:LONGITUDE)'; then the transaction is started (if it is inactive) and the query parameters are filled (using Query.Params.ParamByName, but I don't suppose that would be a big slowdown??); finally the SQL is executed. The transaction is left open. Do you prepare the query before you start the batch ? If not, it is prepared on every insert, which is inherently slower. Currently, after every 100 records, the transaction is committed: if (linenum mod 100=0) then FDBLayer.BulkInsertCommit(false); IIRC, advice on the Firebird list is to play with this interval; any suggestions? Given the aggressive nature of the transaction parameters, I might even dispense with it. I once did tests with that (600.000 records) and did not notice any influence of the transaction control. Michael. ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] Firebird: bulk insert performance: suggestions?
For my Dutch postcode program https://bitbucket.org/reiniero/postcode with an embedded Firebird 2.5 database, I allow users to read in a CSV file with new or updated postcode data. I use sqldb, FPC x86. I'd like to get your suggestions on speed improvements. Turn of indices when inserting and turn them on again when the inserting is done. Since you are the only user and concurrent access is not that important (I guess), I believe isc_tpb_concurrency is not the best choice. IIRC isc_tpb_read_committed + isc_tpb_no_rec_version has the less overhead. Ludo ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] Firebird: bulk insert performance: suggestions?
On 07/09/12 12:12, michael.vancann...@wisa.be wrote: I once did tests with that (600.000 records) and did not notice any influence of the transaction control. Same here... I've imported 100's of thousands of records with SqlDB+Firebird with no serious speed issues. Also from CSV files. Transactions are always used. Graeme. ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] Firebird: bulk insert performance: suggestions?
On Saturday 08 September 2012 01:05:28 Graeme Geldenhuys wrote: On 07/09/12 12:12, michael.vancann...@wisa.be wrote: I once did tests with that (600.000 records) and did not notice any influence of the transaction control. Same here... I've imported 100's of thousands of records with SqlDB+Firebird with no serious speed issues. Also from CSV files. Transactions are always used. It depends on the reaction time of the network because AFAIK there is a roundtrip for every inserted record... Martin ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-pascal