[fpc-pascal] Firebird: bulk insert performance: suggestions?

2012-09-07 Thread Reinier Olislagers
For my Dutch postcode program https://bitbucket.org/reiniero/postcode
with an embedded Firebird 2.5 database, I allow users to read in a CSV
file with new or updated postcode data.
I use sqldb, FPC x86.
I'd like to get your suggestions on speed improvements.


I try to get the data into a temporary table as quickly as possible.
Later on, a stored procedure will normalize the data and insert
to/update various tables (with postcode, city, street information, etc).

Because I also allow querying information, I set up 2
connections+transactions: for reading and writing in my database class
constructor, and destroy them in the destructor.
However, (currently) my application controls the database and I know
that querying and bulk inserts cannot happen at the same time.

The write transaction has this code:
FWriteTransaction.Params.Add('isc_tpb_concurrency');
FWriteTransaction.Params.Add('isc_tpb_write');
// Disable the transaction-level undo log; handy for getting max
// throughput when performing a batch update.
FWriteTransaction.Params.Add('isc_tpb_no_auto_undo');

My code loads an ANSI CSV file into a csvdocument in memory (about
50meg), then goes through it, and calls an insert procedure for each
record (converting the field contents to UTF8):
FDBLayer.BulkInsertUpdateRecord(
  SysToUTF8(Postcodes.Cells[ProvinceField,LineNum]),
  SysToUTF8(Postcodes.Cells[CityField,LineNum]),
  SysToUTF8(Postcodes.Cells[PostcodeField,LineNum]),
  SysToUTF8(Postcodes.Cells[StreetField,LineNum]),
  StrToInt(Postcodes.Cells[NumberLowestField,LineNum]),
  StrToInt(Postcodes.Cells[NumberHighestField,LineNum]),
  Even,
  Latitude,
  Longitude);


Relevant snippets from the insert procedure:
  QuerySQL='INSERT INTO BULKINSERTDATA '+
    '(PROVINCENAME,CITYNAME,POSTCODE,STREETNAME,LOW,HIGH,EVEN,LATITUDE,LONGITUDE) '+
    'VALUES ( '+
    ':PROVINCENAME,:CITYNAME,:POSTCODE,:STREETNAME,:LOW,:HIGH,:EVEN,:LATITUDE,:LONGITUDE)';
then the transaction is started (if it is inactive) and the query
parameters are filled (using Query.Params.ParamByName, but I don't
suppose that would be a big slowdown??); finally the SQL is executed.
The transaction is left open.

Currently, after every 100 records, the transaction is committed:
if (linenum mod 100=0) then
  FDBLayer.BulkInsertCommit(false);
IIRC, advice on the Firebird list is to play with this interval; any
suggestions? Given the aggressive nature of the transaction parameters,
I might even dispense with it.

Finally, once done, the transaction is committed, and the stored
procedure that does subsequent updates is called.

Thanks,
Reinier
___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-pascal


Re: [fpc-pascal] Firebird: bulk insert performance: suggestions?

2012-09-07 Thread michael . vancanneyt



On Fri, 7 Sep 2012, Reinier Olislagers wrote:


> Relevant snippets from the insert procedure:
>  QuerySQL='INSERT INTO BULKINSERTDATA '+
>    '(PROVINCENAME,CITYNAME,POSTCODE,STREETNAME,LOW,HIGH,EVEN,LATITUDE,LONGITUDE) '+
>    'VALUES ( '+
>    ':PROVINCENAME,:CITYNAME,:POSTCODE,:STREETNAME,:LOW,:HIGH,:EVEN,:LATITUDE,:LONGITUDE)';
> then the transaction is started (if it is inactive) and the query
> parameters are filled (using Query.Params.ParamByName, but I don't
> suppose that would be a big slowdown??); finally the SQL is executed.
> The transaction is left open.


Do you prepare the query before you start the batch ?
If not, it is prepared on every insert, which is inherently slower.



> Currently, after every 100 records, the transaction is committed:
> if (linenum mod 100=0) then
>  FDBLayer.BulkInsertCommit(false);
> IIRC, advice on the Firebird list is to play with this interval; any
> suggestions? Given the aggressive nature of the transaction parameters,
> I might even dispense with it.


I once did tests with that (600.000 records) and did not notice any influence
of the transaction control.

Michael.


[fpc-pascal] Re: Firebird: bulk insert performance: suggestions?

2012-09-07 Thread Reinier Olislagers
On 7-9-2012 13:12, michael.vancanneyt-0is9kj9s...@public.gmane.org wrote:
> On Fri, 7 Sep 2012, Reinier Olislagers wrote:
>> then the transaction is started (if it is inactive) and the query
>> parameters are filled (using Query.Params.ParamByName, but I don't
>> suppose that would be a big slowdown??); finally the SQL is executed.
>> The transaction is left open.
>
> Do you prepare the query before you start the batch ?
> If not, it is prepared on every insert, which is inherently slower.
I didn't do an explicit .Prepare, but I've added it, thanks.
I thought sqldb would prepare automatically if you are using
parameters though?

>> Currently, after every 100 records, the transaction is committed:
>> if (linenum mod 100=0) then
>>  FDBLayer.BulkInsertCommit(false);
>> IIRC, advice on the Firebird list is to play with this interval; any
>> suggestions? Given the aggressive nature of the transaction parameters,
>> I might even dispense with it.
>
> I once did tests with that (600.000 records) and did not notice any
> influence of the transaction control.
Ok, thanks.

Time to add some timing output to the GUI ;)
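
A minimal sketch of such timing output, assuming a console program rather than the GUI; a GUI would update a label instead of calling WriteLn. The RowsPerSecond helper, RowCount and ReportEvery are hypothetical names, and the loop body is only a placeholder for the real per-row insert call.

```pascal
program timingsketch;
{$mode objfpc}{$H+}

uses
  SysUtils, DateUtils;

const
  RowCount = 100000;    // hypothetical number of CSV rows
  ReportEvery = 20000;  // hypothetical progress interval

// Hypothetical helper: whole rows per second, 0 if no time has elapsed yet.
function RowsPerSecond(Rows, ElapsedMs: Int64): Int64;
begin
  if ElapsedMs <= 0 then
    Result := 0
  else
    Result := (Rows * 1000) div ElapsedMs;
end;

var
  StartTime: TDateTime;
  ElapsedMs: Int64;
  LineNum: Integer;
begin
  StartTime := Now;
  for LineNum := 1 to RowCount do
  begin
    // The real per-row insert call would go here.
    if LineNum mod ReportEvery = 0 then
    begin
      ElapsedMs := MilliSecondsBetween(Now, StartTime);
      WriteLn(LineNum, ' rows in ', ElapsedMs, ' ms (',
        RowsPerSecond(LineNum, ElapsedMs), ' rows/s)');
    end;
  end;
end.
```

Timing the CSV load, the inserts and the stored procedure separately would show which of the three dominates.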
(Though my stored procedure could probably be optimized as well, I
suppose... perhaps I'll try on the Firebird list)



Re: [fpc-pascal] Firebird: bulk insert performance: suggestions?

2012-09-07 Thread Ludo Brands
> For my Dutch postcode program https://bitbucket.org/reiniero/postcode
> with an embedded Firebird 2.5 database, I allow users to read
> in a CSV file with new or updated postcode data. I use sqldb,
> FPC x86. I'd like to get your suggestions on speed improvements.
 

Turn off indices while inserting and turn them on again when the
inserting is done.
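
A minimal sketch of toggling an index from sqldb, assuming the target table has an ordinary (non-constraint) index; Firebird does not allow deactivating indices that back primary key, foreign key or unique constraints. The index name, database path, credentials and the SetIndexActive helper are hypothetical; ALTER INDEX ... INACTIVE/ACTIVE is standard Firebird DDL.

```pascal
program indextoggle;
{$mode objfpc}{$H+}

uses
  SysUtils, sqldb, ibconnection;

const
  IndexName = 'IDX_SOMETABLE_POSTCODE'; // hypothetical index name

// Hypothetical helper: switch an index on or off and commit the DDL.
procedure SetIndexActive(Conn: TSQLConnection; Tran: TSQLTransaction;
  const AIndexName: string; MakeActive: Boolean);
const
  StateSQL: array[Boolean] of string = ('INACTIVE', 'ACTIVE');
begin
  Conn.ExecuteDirect('ALTER INDEX ' + AIndexName + ' ' + StateSQL[MakeActive],
    Tran);
  Tran.Commit; // the DDL change takes effect at commit
end;

var
  Conn: TIBConnection;
  Tran: TSQLTransaction;
begin
  Conn := TIBConnection.Create(nil);
  Tran := TSQLTransaction.Create(nil);
  try
    Conn.DatabaseName := 'postcode.fdb'; // hypothetical path
    Conn.UserName := 'SYSDBA';           // assumed credentials
    Conn.Password := 'masterkey';
    Tran.DataBase := Conn;
    Conn.Open;
    SetIndexActive(Conn, Tran, IndexName, False);
    // ... run the bulk insert here ...
    SetIndexActive(Conn, Tran, IndexName, True); // rebuilds the index
  finally
    Tran.Free;
    Conn.Free;
  end;
end.
```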

Since you are the only user and concurrent access is not that important (I
guess), I believe isc_tpb_concurrency is not the best choice. IIRC
isc_tpb_read_committed + isc_tpb_no_rec_version has the least overhead.
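
A minimal sketch of that alternative parameter set, assuming the write and no_auto_undo settings from the original post are kept. The UseReadCommitted helper is a hypothetical name; whether this is actually faster than isc_tpb_concurrency for this workload would need measuring. The program only fills and prints the parameter list, so it runs without a database.

```pascal
program tpbsketch;
{$mode objfpc}{$H+}

uses
  SysUtils, sqldb;

// Hypothetical helper: replace the transaction parameters with a
// read committed / no record version set for a single-user batch write.
procedure UseReadCommitted(Tran: TSQLTransaction);
begin
  Tran.Params.Clear;
  Tran.Params.Add('isc_tpb_read_committed');
  Tran.Params.Add('isc_tpb_no_rec_version');
  Tran.Params.Add('isc_tpb_write');
  Tran.Params.Add('isc_tpb_no_auto_undo'); // kept from the original post
end;

var
  WriteTransaction: TSQLTransaction;
begin
  WriteTransaction := TSQLTransaction.Create(nil);
  try
    UseReadCommitted(WriteTransaction);
    // Show the parameters that would be used when the transaction starts.
    Write(WriteTransaction.Params.Text);
  finally
    WriteTransaction.Free;
  end;
end.
```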

Ludo



Re: [fpc-pascal] Re: Firebird: bulk insert performance: suggestions?

2012-09-07 Thread michael . vancanneyt



On Fri, 7 Sep 2012, Reinier Olislagers wrote:


> On 7-9-2012 13:12, michael.vancanneyt-0is9kj9s...@public.gmane.org wrote:
>> On Fri, 7 Sep 2012, Reinier Olislagers wrote:
>>> then the transaction is started (if it is inactive) and the query
>>> parameters are filled (using Query.Params.ParamByName, but I don't
>>> suppose that would be a big slowdown??); finally the SQL is executed.
>>> The transaction is left open.
>>
>> Do you prepare the query before you start the batch ?
>> If not, it is prepared on every insert, which is inherently slower.
>
> I didn't do an explicit .Prepare, but I've added it, thanks.
> I thought sqldb would prepare automatically if you are using
> parameters though?


If it prepares the statement automatically, it also unprepares it.
(at least, it should :) )

Michael.


[fpc-pascal] Re: Firebird: bulk insert performance: suggestions?

2012-09-07 Thread Reinier Olislagers
On 7-9-2012 13:22, Ludo Brands wrote:
>> For my Dutch postcode program https://bitbucket.org/reiniero/postcode
>> with an embedded Firebird 2.5 database, I allow users to read
>> in a CSV file with new or updated postcode data. I use sqldb,
>> FPC x86. I'd like to get your suggestions on speed improvements.
>
> Turn of indices when inserting and turn them on again when the inserting is
> done.
No indices/constraints on that table; the stored procedure that
processes the records from that table will go through them line by line
anyway:
CREATE TABLE BULKINSERTDATA
(
  PROVINCENAME VARCHAR(255),
  CITYNAME VARCHAR(64),
  POSTCODE VARCHAR(6),
  STREETNAME VARCHAR(255),
  LOW INTEGER,
  HIGH INTEGER,
  EVEN BIT DEFAULT NULL, --basically a SMALLINT
  LATITUDE DECIMAL(10,8),
  LONGITUDE DECIMAL(10,8)
);

> Since you are the only user and concurrent access is not that important (I
> guess), I believe isc_tpb_concurrency is not the best choice. IIRC
> isc_tpb_read_committed + isc_tpb_no_rec_version has the less overhead.
Mmmm, I remember having figured this out earlier. At least I
investigated enough to write
http://wiki.lazarus.freepascal.org/Firebird_in_action#Advanced_transactions
... but didn't document enough so that I can justify my choice ;)

I'll do some more digging and get back on this.

Thanks,
Reinier

Oh, if anybody has suggestions about improving the SP, I'd be grateful...
It's meant to either add new data or replace existing matching data.
City 1:N CityName
Province and Country are not used ATM (the Pascal code passes NULL values)
Realstreet has postcode details (e.g. the letters AB in 1012AB)
Postcode.FourPP has the postcode digits (e.g. 1012 in 1012AB)

SET TERM ^ ;
CREATE PROCEDURE BULKUPDATE
AS
DECLARE VARIABLE localPROVINCENAME VARCHAR(255);
DECLARE VARIABLE localCITYNAME VARCHAR(64);
DECLARE VARIABLE localPOSTCODE VARCHAR(6);
DECLARE VARIABLE localSTREETNAME VARCHAR(255);
DECLARE VARIABLE localLOW INTEGER;
DECLARE VARIABLE localHIGH INTEGER;
DECLARE VARIABLE localEVEN BIT;
DECLARE VARIABLE localLAT DECIMAL(10,8);
DECLARE VARIABLE localLNG DECIMAL(10,8);
DECLARE VARIABLE localCOUNTRYID INTEGER;
DECLARE VARIABLE localPROVINCEID INTEGER;
DECLARE VARIABLE localCITYNAMEID INTEGER;
DECLARE VARIABLE localCITYID INTEGER;
DECLARE VARIABLE localPOSTCODEID INTEGER;
DECLARE VARIABLE localSTREETNAMEID INTEGER;
DECLARE VARIABLE localFOURPP INTEGER;
DECLARE VARIABLE localPOSTCODECHARS POSTCODECHARS;
BEGIN

FOR SELECT
  PROVINCENAME,
  CITYNAME,
  POSTCODE,
  STREETNAME,
  LOW,
  HIGH,
  EVEN,
  LATITUDE,
  LONGITUDE
  FROM BULKINSERTDATA
  INTO
  :localPROVINCENAME,
  :localCITYNAME,
  :localPOSTCODE,
  :localSTREETNAME,
  :localLOW,
  :localHIGH,
  :localEVEN,
  :localLAT,
  :localLNG
DO
BEGIN
/* 1. Test for required input */
IF (:localCITYNAME IS NULL) THEN
BEGIN
IN AUTONOMOUS TRANSACTION
DO
BEGIN
  INSERT INTO LOGS (LOGMESSAGE) VALUES ('CITYNAME is null. Exception will be called: DATAMAYNOTBENULL');
END
EXCEPTION DATAMAYNOTBENULL;
END


IF (:localPOSTCODE IS NULL) THEN
BEGIN
IN AUTONOMOUS TRANSACTION
DO
BEGIN
  INSERT INTO LOGS (LOGMESSAGE) VALUES ('POSTCODE is null. Exception will be called: DATAMAYNOTBENULL');
END
EXCEPTION DATAMAYNOTBENULL;
END


IF (:localSTREETNAME IS NULL) THEN
BEGIN
IN AUTONOMOUS TRANSACTION
DO
BEGIN
  INSERT INTO LOGS (LOGMESSAGE) VALUES ('STREETNAME is null. Exception will be called: DATAMAYNOTBENULL');
END
EXCEPTION DATAMAYNOTBENULL;
END

/* 2. Test for valid input, initialize variables*/
localCOUNTRYID=NULL;
localPROVINCEID=NULL;
localCITYNAMEID=NULL;
localCITYID=NULL;
localPOSTCODEID=NULL;
localSTREETNAMEID=NULL;
localFOURPP=LEFT(localPOSTCODE, 4);
localPOSTCODECHARS=RIGHT(localPOSTCODE, 2);

/* Fill database */
-- We use update or insert instead of merge because we can use the
-- returning clause.
UPDATE OR INSERT INTO COUNTRY(COUNTRYNAME) VALUES ('Nederland')
MATCHING (COUNTRYNAME)
RETURNING ID INTO :localCOUNTRYID;

IF (:localPROVINCENAME IS NULL) THEN
BEGIN
localPROVINCEID=NULL;
END
ELSE
BEGIN
UPDATE OR INSERT INTO PROVINCE(PROVINCENAME, COUNTRY_ID)
VALUES(:localPROVINCENAME, :localCOUNTRYID)
MATCHING (PROVINCENAME)
RETURNING ID INTO :localPROVINCEID;
END


-- City is special, only add something if we don't have a valid CITYNAME
-- Also, we assume the city name given is the official cityname.
SELECT ID FROM CITYNAME WHERE NAME=:localCITYNAME INTO :localCITYNAMEID;
IF (:localCITYNAMEID IS NULL) THEN
BEGIN
-- Add a city record first, then a CITYNAME
INSERT INTO CITY (PROVINCE_ID) VALUES (:localPROVINCEID)
RETURNING ID INTO :localCITYID;
INSERT INTO CITYNAME(NAME, CITY_ID, OFFICIAL) VALUES
(:localCITYNAME, :localCITYID, 1) RETURNING ID INTO 

Re: [fpc-pascal] Re: Firebird: bulk insert performance: suggestions?

2012-09-07 Thread Ludo Brands
>> Do you prepare the query before you start the batch ?
>> If not, it is prepared on every insert, which is inherently slower.
> I didn't do an explicit .Prepare, but I've added it, thanks.
> I thought sqldb would prepare automatically if you are using
> parameters though?

sqldb always uses a prepare. As long as you don't change the sql statement
or close the dataset, the prepare will only be done once. A tight setparams
execsql loop will prepare once and execute many times.
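
A minimal sketch of such a loop, assuming the BULKINSERTDATA table from this thread already exists and using only two of its columns for brevity. The InsertRows helper, database path, credentials and sample rows are hypothetical; Prepare, ParamByName, ExecSQL and CommitRetaining are regular sqldb calls. The TParam references are looked up once outside the loop, and the explicit Prepare is harmless even if sqldb would prepare on first use anyway.

```pascal
program preparedinsert;
{$mode objfpc}{$H+}

uses
  SysUtils, db, sqldb, ibconnection;

const
  CommitInterval = 100; // interval from the original post; worth tuning

// Hypothetical helper; Postcodes and Streets must have the same length.
procedure InsertRows(Query: TSQLQuery; Tran: TSQLTransaction;
  const Postcodes, Streets: array of string);
var
  PostcodeParam, StreetParam: TParam;
  i: Integer;
begin
  Query.SQL.Text :=
    'INSERT INTO BULKINSERTDATA (POSTCODE, STREETNAME) ' +
    'VALUES (:POSTCODE, :STREETNAME)';
  if not Tran.Active then
    Tran.StartTransaction;
  Query.Prepare; // explicit prepare, once, before the loop
  // Look the parameters up once instead of calling ParamByName per row.
  PostcodeParam := Query.Params.ParamByName('POSTCODE');
  StreetParam := Query.Params.ParamByName('STREETNAME');
  for i := 0 to High(Postcodes) do
  begin
    PostcodeParam.AsString := Postcodes[i];
    StreetParam.AsString := Streets[i];
    Query.ExecSQL;
    if (i + 1) mod CommitInterval = 0 then
      Tran.CommitRetaining; // commits but keeps the transaction active
  end;
  Tran.Commit;
end;

var
  Conn: TIBConnection;
  Tran: TSQLTransaction;
  Query: TSQLQuery;
begin
  Conn := TIBConnection.Create(nil);
  Tran := TSQLTransaction.Create(nil);
  Query := TSQLQuery.Create(nil);
  try
    Conn.DatabaseName := 'postcode.fdb'; // hypothetical path
    Conn.UserName := 'SYSDBA';           // assumed credentials
    Conn.Password := 'masterkey';
    Tran.DataBase := Conn;
    Query.DataBase := Conn;
    Query.Transaction := Tran;
    Conn.Open;
    // Made-up sample rows, for illustration only.
    InsertRows(Query, Tran, ['1012AB', '1012AC'], ['Damrak', 'Nieuwendijk']);
  finally
    Query.Free;
    Tran.Free;
    Conn.Free;
  end;
end.
```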

Ludo



Re: [fpc-pascal] Re: Firebird: bulk insert performance: suggestions?

2012-09-07 Thread Ludo Brands
> If it prepares the statement automatically, it also
> unprepares it. (at least, it should :) )
 
It does for every change in query, connection, transaction, active state of
dataset, filter, etc., but not at the end of an execsql.

Ludo



Re: [fpc-pascal] Firebird: bulk insert performance: suggestions?

2012-09-07 Thread Graeme Geldenhuys

On 07/09/12 12:12, michael.vancann...@wisa.be wrote:

> I once did tests with that (600.000 records) and did not notice any
> influence of the transaction control.



Same here... I've imported 100's of thousands of records with 
SqlDB+Firebird with no serious speed issues. Also from CSV files. 
Transactions are always used.


  Graeme.



Re: [fpc-pascal] Firebird: bulk insert performance: suggestions?

2012-09-07 Thread Martin Schreiber
On Saturday 08 September 2012 01:05:28 Graeme Geldenhuys wrote:
> On 07/09/12 12:12, michael.vancann...@wisa.be wrote:
>> I once did tests with that (600.000 records) and did not notice any
>> influence of the transaction control.
>
> Same here... I've imported 100's of thousands of records with
> SqlDB+Firebird with no serious speed issues. Also from CSV files.
> Transactions are always used.

It depends on the reaction time of the network because AFAIK there is a 
roundtrip for every inserted record...

Martin