Re: [fpc-pascal] Firebird: bulk insert performance: suggestions?
On Saturday 08 September 2012 01:05:28 Graeme Geldenhuys wrote:
> On 07/09/12 12:12, michael.vancann...@wisa.be wrote:
> > I once did tests with that (600.000 records) and did not notice any
> > influence of the transaction control.
>
> Same here... I've imported 100's of thousands of records with
> SqlDB+Firebird with no serious speed issues. Also from CSV files.
> Transactions are always used.

It depends on the reaction time of the network because AFAIK there is a
roundtrip for every inserted record...

Martin
___
fpc-pascal maillist - fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-pascal

Re: [fpc-pascal] Firebird: bulk insert performance: suggestions?
On 07/09/12 12:12, michael.vancann...@wisa.be wrote:
> I once did tests with that (600.000 records) and did not notice any
> influence of the transaction control.

Same here... I've imported 100's of thousands of records with
SqlDB+Firebird with no serious speed issues. Also from CSV files.
Transactions are always used.

Graeme.

Re: [fpc-pascal] Re: Firebird: bulk insert performance: suggestions?
> If it prepares the statement automatically, it also
> unprepares it. (at least, it should :) )

It does for every change in query, connection, transaction, active state of
dataset, filter, etc., but not at the end of an execsql.

Ludo

Re: [fpc-pascal] Re: Firebird: bulk insert performance: suggestions?
> > Do you prepare the query before you start the batch ?
> > If not, it is prepared on every insert, which is inherently slower.
> I didn't do an explicit .Prepare, but I've added it, thanks.
> I thought sqldb would prepare automatically if you are using
> parameters though?

sqldb always uses a prepare. As long as you don't change the SQL statement
or close the dataset, the prepare will only be done once. A tight
"setparams execsql" loop will prepare once and execute many times.

Ludo

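The "prepare once, execute many" loop described above could look roughly like the unit below. This is only a sketch: the unit name, the InsertPostcodes helper and the shortened two-column INSERT are assumptions made for illustration, and the query is assumed to be linked to a connected database and a transaction already. It also looks the parameters up once before the loop, which sidesteps the question of whether ParamByName per record costs anything.

```pascal
unit bulkinsertsketch;

{$mode objfpc}{$H+}

interface

uses
  db, sqldb;

const
  // Column list shortened for the sketch; the real table has nine columns.
  InsertSQL =
    'INSERT INTO BULKINSERTDATA (CITYNAME, POSTCODE) ' +
    'VALUES (:CITYNAME, :POSTCODE)';

// Hypothetical helper. Query must already be linked to a connected
// database and a transaction; Cities and Postcodes must have equal length.
procedure InsertPostcodes(Query: TSQLQuery;
  const Cities, Postcodes: array of string);

implementation

procedure InsertPostcodes(Query: TSQLQuery;
  const Cities, Postcodes: array of string);
var
  CityParam, PostcodeParam: TParam;
  i: Integer;
begin
  Query.SQL.Text := InsertSQL;  // changing the SQL unprepares the statement
  Query.Prepare;                // explicit; sqldb would otherwise prepare on first ExecSQL

  // Look the parameters up once instead of calling ParamByName per record.
  CityParam := Query.Params.ParamByName('CITYNAME');
  PostcodeParam := Query.Params.ParamByName('POSTCODE');

  for i := Low(Postcodes) to High(Postcodes) do
  begin
    CityParam.AsString := Cities[i];
    PostcodeParam.AsString := Postcodes[i];
    Query.ExecSQL;              // reuses the prepared statement
  end;
end;

end.
```

The important part is that SQL.Text is assigned outside the loop; assigning it per record would force a new prepare each time.
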
[fpc-pascal] Re: Firebird: bulk insert performance: suggestions?
On 7-9-2012 13:22, Ludo Brands wrote:
>> For my Dutch postcode program https://bitbucket.org/reiniero/postcode
>> with an embedded Firebird 2.5 database, I allow users to read
>> in a CSV file with new or updated postcode data. I use sqldb,
>> FPC x86. I'd like to get your suggestions on speed improvements.
>>
>
> Turn off indices when inserting and turn them on again when the inserting is
> done.

No indices/constraints on that table; the stored procedure that processes
the records from that table will go through them line by line anyway:

CREATE TABLE BULKINSERTDATA
(
  PROVINCENAME VARCHAR(255),
  CITYNAME VARCHAR(64),
  POSTCODE VARCHAR(6),
  STREETNAME VARCHAR(255),
  LOW INTEGER,
  HIGH INTEGER,
  EVEN BIT DEFAULT NULL, --basically a SMALLINT
  LATITUDE DECIMAL(10,8),
  LONGITUDE DECIMAL(10,8)
);

> Since you are the only user and concurrent access is not that important (I
> guess), I believe isc_tpb_concurrency is not the best choice. IIRC
> isc_tpb_read_committed + isc_tpb_no_rec_version has the least overhead.

Mmmm, I remember having figured this out earlier. At least I investigated
enough to write
http://wiki.lazarus.freepascal.org/Firebird_in_action#Advanced_transactions
... but didn't document enough so that I can justify my choice ;)

I'll do some more digging and get back on this.

Thanks,
Reinier

Oh, if anybody has suggestions about improving the SP, I'd be grateful...
It's meant to either add new data or replace existing matching data.
City 1:N CityName
Province and Country are not used ATM (the Pascal code passes NULL values)
Realstreet has postcode details (e.g. the letters AB in 1012AB)
Postcode.FourPP has the postcode digits (e.g. 1012 in 1012AB)

SET TERM ^ ;
CREATE PROCEDURE BULKUPDATE
AS
DECLARE VARIABLE localPROVINCENAME VARCHAR(255);
DECLARE VARIABLE localCITYNAME VARCHAR(64);
DECLARE VARIABLE localPOSTCODE VARCHAR(6);
DECLARE VARIABLE localSTREETNAME VARCHAR(255);
DECLARE VARIABLE localLOW INTEGER;
DECLARE VARIABLE localHIGH INTEGER;
DECLARE VARIABLE localEVEN BIT;
DECLARE VARIABLE localLAT DECIMAL(10,8);
DECLARE VARIABLE localLNG DECIMAL(10,8);
DECLARE VARIABLE localCOUNTRYID INTEGER;
DECLARE VARIABLE localPROVINCEID INTEGER;
DECLARE VARIABLE localCITYNAMEID INTEGER;
DECLARE VARIABLE localCITYID INTEGER;
DECLARE VARIABLE localPOSTCODEID INTEGER;
DECLARE VARIABLE localSTREETNAMEID INTEGER;
DECLARE VARIABLE localFOURPP INTEGER;
DECLARE VARIABLE localPOSTCODECHARS POSTCODECHARS;
BEGIN
  FOR
    SELECT PROVINCENAME, CITYNAME, POSTCODE, STREETNAME, LOW, HIGH, EVEN,
           LATITUDE, LONGITUDE
    FROM BULKINSERTDATA
    INTO :localPROVINCENAME, :localCITYNAME, :localPOSTCODE, :localSTREETNAME,
         :localLOW, :localHIGH, :localEVEN, :localLAT, :localLNG
  DO
  BEGIN
    /* 1. Test for required input */
    IF (:localCITYNAME IS NULL) THEN
    BEGIN
      IN AUTONOMOUS TRANSACTION DO
      BEGIN
        INSERT INTO LOGS (LOGMESSAGE)
        VALUES ('CITYNAME is null. Exception will be called: DATAMAYNOTBENULL');
      END
      EXCEPTION DATAMAYNOTBENULL;
    END
    IF (:localPOSTCODE IS NULL) THEN
    BEGIN
      IN AUTONOMOUS TRANSACTION DO
      BEGIN
        INSERT INTO LOGS (LOGMESSAGE)
        VALUES ('POSTCODE is null. Exception will be called: DATAMAYNOTBENULL');
      END
      EXCEPTION DATAMAYNOTBENULL;
    END
    IF (:localSTREETNAME IS NULL) THEN
    BEGIN
      IN AUTONOMOUS TRANSACTION DO
      BEGIN
        INSERT INTO LOGS (LOGMESSAGE)
        VALUES ('STREETNAME is null. Exception will be called: DATAMAYNOTBENULL');
      END
      EXCEPTION DATAMAYNOTBENULL;
    END

    /* 2. Test for valid input, initialize variables*/
    localCOUNTRYID=NULL;
    localPROVINCEID=NULL;
    localCITYNAMEID=NULL;
    localCITYID=NULL;
    localPOSTCODEID=NULL;
    localSTREETNAMEID=NULL;
    localFOURPP=LEFT(localPOSTCODE, 4);
    localPOSTCODECHARS=RIGHT(localPOSTCODE, 2);

    /* Fill database */
    -- We use update or insert instead of merge because we can use the returning clause.
    UPDATE OR INSERT INTO COUNTRY(COUNTRYNAME)
    VALUES ('Nederland')
    MATCHING (COUNTRYNAME)
    RETURNING ID INTO :localCOUNTRYID;

    IF (:localPROVINCENAME IS NULL) THEN
    BEGIN
      localPROVINCEID=NULL;
    END
    ELSE
    BEGIN
      UPDATE OR INSERT INTO PROVINCE(PROVINCENAME, COUNTRY_ID)
      VALUES(:localPROVINCENAME, :localCOUNTRYID)
      MATCHING (PROVINCENAME)
      RETURNING ID INTO :localPROVINCEID;
    END

    -- City is special, only add something if we don't have a valid CITYNAME
    -- Also, we assume the city name given is the official cityname.
    SELECT ID FROM CITYNAME WHERE NAME=:localCITYNAME INTO :localCITYNAMEID;
    IF (:localCITYNAMEID IS NULL) THEN
    BEGIN
      -- Add a city record first, then a CITYNAME
      INSERT INTO CITY (PROVINCE_ID)
      VALUES (:localPROVINCEID)
      RETURNING ID INTO :localCITYID;
      INSERT INTO CITYNAME(NAME, CITY_ID, OFFICIAL)
      VALUES (:localCITYNAME, :localCITYID, 1)
      RETURNI

Re: [fpc-pascal] Re: Firebird: bulk insert performance: suggestions?
On Fri, 7 Sep 2012, Reinier Olislagers wrote:

> On 7-9-2012 13:12, michael.vancanneyt-0is9kj9s...@public.gmane.org wrote:
>> On Fri, 7 Sep 2012, Reinier Olislagers wrote:
>>> then the transaction is started (if it is inactive) and the query
>>> parameters are filled (using Query.Params.ParamByName, but I don't
>>> suppose that would be a big slowdown??); finally the SQL is executed.
>>> The transaction is left open.
>>
>> Do you prepare the query before you start the batch ?
>> If not, it is prepared on every insert, which is inherently slower.
> I didn't do an explicit .Prepare, but I've added it, thanks.
> I thought sqldb would prepare automatically if you are using
> parameters though?

If it prepares the statement automatically, it also unprepares it.
(at least, it should :) )

Michael.

Re: [fpc-pascal] Firebird: bulk insert performance: suggestions?
> For my Dutch postcode program https://bitbucket.org/reiniero/postcode
> with an embedded Firebird 2.5 database, I allow users to read
> in a CSV file with new or updated postcode data. I use sqldb,
> FPC x86. I'd like to get your suggestions on speed improvements.
>

Turn off indices when inserting and turn them on again when the inserting is
done.

Since you are the only user and concurrent access is not that important (I
guess), I believe isc_tpb_concurrency is not the best choice. IIRC
isc_tpb_read_committed + isc_tpb_no_rec_version has the least overhead.

Ludo

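For reference, swapping the isolation level as suggested above might look like the unit below. It is a sketch under assumptions: the unit and helper names are made up, the transaction is assumed to be inactive when the parameters are changed, and whether read committed is actually faster for this workload is something to measure rather than take for granted. The isc_tpb_write and isc_tpb_no_auto_undo entries are carried over from the original setup.

```pascal
unit writetranssketch;

{$mode objfpc}{$H+}

interface

uses
  sqldb;

// Hypothetical helper: Trans is assumed to be the (inactive) write transaction.
procedure UseReadCommitted(Trans: TSQLTransaction);

implementation

procedure UseReadCommitted(Trans: TSQLTransaction);
begin
  Trans.Params.Clear;
  Trans.Params.Add('isc_tpb_read_committed');  // instead of isc_tpb_concurrency
  Trans.Params.Add('isc_tpb_no_rec_version');
  Trans.Params.Add('isc_tpb_write');           // kept from the original setup
  Trans.Params.Add('isc_tpb_no_auto_undo');    // kept from the original setup
end;

end.
```
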
[fpc-pascal] Re: Firebird: bulk insert performance: suggestions?
On 7-9-2012 13:12, michael.vancanneyt-0is9kj9s...@public.gmane.org wrote:
> On Fri, 7 Sep 2012, Reinier Olislagers wrote:
>> then the transaction is started (if it is inactive) and the query
>> parameters are filled (using Query.Params.ParamByName, but I don't
>> suppose that would be a big slowdown??); finally the SQL is executed.
>> The transaction is left open.
>
> Do you prepare the query before you start the batch ?
> If not, it is prepared on every insert, which is inherently slower.

I didn't do an explicit .Prepare, but I've added it, thanks.
I thought sqldb would prepare automatically if you are using
parameters though?

>> Currently, after every 100 records, the transaction is committed:
>> if (linenum mod 100=0) then
>> FDBLayer.BulkInsertCommit(false);
>> IIRC, advice on the Firebird list is to play with this interval; any
>> suggestions? Given the aggressive nature of the transaction parameters,
>> I might even dispense with it.
>
> I once did tests with that (600.000 records) and did not notice any
> influence
> of the transaction control.

Ok, thanks. Time to add some timing output to the GUI ;)
(Though my stored procedure could probably be optimized as well, I
suppose... perhaps I'll try on the Firebird list)

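A small harness along the following lines could produce that timing output and make it easy to compare commit intervals. Everything in it is an assumption for illustration: RunBatch, FakeInsert and FakeCommit are hypothetical stand-ins, so the program runs without a database and only measures loop overhead. In the real application the two callbacks would wrap the parameter fill plus ExecSQL and the BulkInsertCommit call.

```pascal
program commitintervalsketch;

{$mode objfpc}{$H+}

uses
  SysUtils;

type
  // Hypothetical callbacks standing in for the real insert and commit calls.
  TRecordProc = procedure(LineNum: Integer);
  TCommitProc = procedure;

const
  Intervals: array[0..2] of Integer = (100, 1000, 10000);

var
  CommitCount: Integer = 0;

procedure FakeInsert(LineNum: Integer);
begin
  // A real implementation would fill the query parameters and call ExecSQL.
end;

procedure FakeCommit;
begin
  Inc(CommitCount);
end;

// Processes RecordCount records, committing after every CommitInterval
// records and once more at the end if a partial batch is left over.
// Returns the elapsed time in milliseconds.
function RunBatch(RecordCount, CommitInterval: Integer;
  InsertRec: TRecordProc; Commit: TCommitProc): QWord;
var
  LineNum: Integer;
  StartTick: QWord;
begin
  StartTick := GetTickCount64;
  for LineNum := 1 to RecordCount do
  begin
    InsertRec(LineNum);
    if LineNum mod CommitInterval = 0 then
      Commit();
  end;
  if RecordCount mod CommitInterval <> 0 then
    Commit();  // flush the final partial batch
  Result := GetTickCount64 - StartTick;
end;

var
  Interval: Integer;
  ElapsedMs: QWord;
begin
  for Interval in Intervals do
  begin
    CommitCount := 0;
    ElapsedMs := RunBatch(600000, Interval, @FakeInsert, @FakeCommit);
    WriteLn(Format('interval %d: %d commits, %d ms',
      [Interval, CommitCount, ElapsedMs]));
  end;
end.
```

Passing the insert and commit steps in as callbacks keeps the timing loop identical for every interval, so any difference in the numbers comes from the commit frequency and not from the harness.
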
Re: [fpc-pascal] Firebird: bulk insert performance: suggestions?
On Fri, 7 Sep 2012, Reinier Olislagers wrote:

> For my Dutch postcode program https://bitbucket.org/reiniero/postcode
> with an embedded Firebird 2.5 database, I allow users to read
> in a CSV file with new or updated postcode data. I use sqldb,
> FPC x86. I'd like to get your suggestions on speed improvements.
>
> I try to get the data into a temporary table as quickly as possible.
> Later on, a stored procedure will normalize the data and insert
> to/update various tables (with postcode, city, street information, etc).
>
> Because I also allow querying information, I set up 2
> connections+transactions: for reading and writing in my database class
> constructor, and destroy them in the destructor.
> However, (currently) my application controls the database and I know
> that querying and bulk inserts at the same time is impossible.
>
> The write transaction has this code:
> FWriteTransaction.Params.Add('isc_tpb_concurrency');
> FWriteTransaction.Params.Add('isc_tpb_write');
> FWriteTransaction.Params.Add('isc_tpb_no_auto_undo'); //disable
> transaction-level undo log, handy for getting max throughput when
> performing a batch update
>
> My code loads an ANSI CSV file into a csvdocument in memory (about
> 50meg), then goes through it, and calls an insert procedure for each
> record (converting the field contents to UTF8):
> FDBLayer.BulkInsertUpdateRecord(
>   SysToUTF8(Postcodes.Cells[ProvinceField,LineNum]),
>   SysToUTF8(Postcodes.Cells[CityField,LineNum]),
>   SysToUTF8(Postcodes.Cells[PostcodeField,LineNum]),
>   SysToUTF8(Postcodes.Cells[StreetField,LineNum]),
>   StrToInt(Postcodes.Cells[NumberLowestField,LineNum]),
>   StrToInt(Postcodes.Cells[NumberHighestField,LineNum]),
>   Even,
>   Latitude,
>   Longitude);
>
> Relevant snippets from the insert procedure:
> QuerySQL='INSERT INTO BULKINSERTDATA '+
>   '(PROVINCENAME,CITYNAME,POSTCODE,STREETNAME,LOW,HIGH,EVEN,LATITUDE,LONGITUDE) '+
>   'VALUES ( '+
>   ':PROVINCENAME,:CITYNAME,:POSTCODE,:STREETNAME,:LOW,:HIGH,:EVEN,:LATITUDE,:LONGITUDE)';
> then the transaction is started (if it is inactive) and the query
> parameters are filled (using Query.Params.ParamByName, but I don't
> suppose that would be a big slowdown??); finally the SQL is executed.
> The transaction is left open.

Do you prepare the query before you start the batch ?
If not, it is prepared on every insert, which is inherently slower.

> Currently, after every 100 records, the transaction is committed:
> if (linenum mod 100=0) then
>   FDBLayer.BulkInsertCommit(false);
> IIRC, advice on the Firebird list is to play with this interval; any
> suggestions? Given the aggressive nature of the transaction parameters,
> I might even dispense with it.

I once did tests with that (600.000 records) and did not notice any
influence of the transaction control.

Michael.

[fpc-pascal] Firebird: bulk insert performance: suggestions?
For my Dutch postcode program https://bitbucket.org/reiniero/postcode
with an embedded Firebird 2.5 database, I allow users to read
in a CSV file with new or updated postcode data. I use sqldb,
FPC x86. I'd like to get your suggestions on speed improvements.

I try to get the data into a temporary table as quickly as possible.
Later on, a stored procedure will normalize the data and insert
to/update various tables (with postcode, city, street information, etc).

Because I also allow querying information, I set up 2
connections+transactions: for reading and writing in my database class
constructor, and destroy them in the destructor.
However, (currently) my application controls the database and I know
that querying and bulk inserts at the same time is impossible.

The write transaction has this code:
FWriteTransaction.Params.Add('isc_tpb_concurrency');
FWriteTransaction.Params.Add('isc_tpb_write');
FWriteTransaction.Params.Add('isc_tpb_no_auto_undo'); //disable
transaction-level undo log, handy for getting max throughput when
performing a batch update

My code loads an ANSI CSV file into a csvdocument in memory (about
50meg), then goes through it, and calls an insert procedure for each
record (converting the field contents to UTF8):
FDBLayer.BulkInsertUpdateRecord(
  SysToUTF8(Postcodes.Cells[ProvinceField,LineNum]),
  SysToUTF8(Postcodes.Cells[CityField,LineNum]),
  SysToUTF8(Postcodes.Cells[PostcodeField,LineNum]),
  SysToUTF8(Postcodes.Cells[StreetField,LineNum]),
  StrToInt(Postcodes.Cells[NumberLowestField,LineNum]),
  StrToInt(Postcodes.Cells[NumberHighestField,LineNum]),
  Even,
  Latitude,
  Longitude);

Relevant snippets from the insert procedure:
QuerySQL='INSERT INTO BULKINSERTDATA '+
  '(PROVINCENAME,CITYNAME,POSTCODE,STREETNAME,LOW,HIGH,EVEN,LATITUDE,LONGITUDE) '+
  'VALUES ( '+
  ':PROVINCENAME,:CITYNAME,:POSTCODE,:STREETNAME,:LOW,:HIGH,:EVEN,:LATITUDE,:LONGITUDE)';
then the transaction is started (if it is inactive) and the query
parameters are filled (using Query.Params.ParamByName, but I don't
suppose that would be a big slowdown??); finally the SQL is executed.
The transaction is left open.

Currently, after every 100 records, the transaction is committed:
if (linenum mod 100=0) then
  FDBLayer.BulkInsertCommit(false);
IIRC, advice on the Firebird list is to play with this interval; any
suggestions? Given the aggressive nature of the transaction parameters,
I might even dispense with it.

Finally, once done, the transaction is committed, and the stored
procedure that does subsequent updates is called.

Thanks,
Reinier