Re: [fpc-pascal] Firebird: bulk insert performance: suggestions?

2012-09-08 Thread patspiper

On 07/09/12 14:03, Reinier Olislagers wrote:

For my Dutch postcode program https://bitbucket.org/reiniero/postcode
with an embedded Firebird 2.5 database, I allow users to read in a CSV
file with new or updated postcode data.
I use sqldb, FPC x86.
I'd like to get your suggestions on speed improvements.


I try to get the data into a temporary table as quickly as possible.
Later on, a stored procedure will normalize the data and insert
into/update various tables (with postcode, city, street information, etc).

Because I also allow querying information, I set up 2
connections+transactions (one for reading, one for writing) in my database
class constructor, and destroy them in the destructor.
However, (currently) my application controls the database, and I know
that querying and bulk inserting at the same time is impossible.

The write transaction has this code:
FWriteTransaction.Params.Add('isc_tpb_concurrency');
FWriteTransaction.Params.Add('isc_tpb_write');
FWriteTransaction.Params.Add('isc_tpb_no_auto_undo');
// disable the transaction-level undo log, handy for getting max throughput
// when performing a batch update

My code loads an ANSI CSV file into a csvdocument in memory (about
50meg), then goes through it, and calls an insert procedure for each
record (converting the field contents to UTF8):
FDBLayer.BulkInsertUpdateRecord(
   SysToUTF8(Postcodes.Cells[ProvinceField,LineNum]),
   SysToUTF8(Postcodes.Cells[CityField,LineNum]),
   SysToUTF8(Postcodes.Cells[PostcodeField,LineNum]),
   SysToUTF8(Postcodes.Cells[StreetField,LineNum]),
   StrToInt(Postcodes.Cells[NumberLowestField,LineNum]),
   StrToInt(Postcodes.Cells[NumberHighestField,LineNum]),
   Even,
   Latitude,
   Longitude);


Relevant snippets from the insert procedure:
   QuerySQL = 'INSERT INTO BULKINSERTDATA ' +
     '(PROVINCENAME,CITYNAME,POSTCODE,STREETNAME,LOW,HIGH,EVEN,LATITUDE,LONGITUDE) ' +
     'VALUES (' +
     ':PROVINCENAME,:CITYNAME,:POSTCODE,:STREETNAME,:LOW,:HIGH,:EVEN,:LATITUDE,:LONGITUDE)';
then the transaction is started (if it is inactive) and the query
parameters are filled (using Query.Params.ParamByName, but I don't
suppose that would be a big slowdown??); finally the SQL is executed.
The transaction is left open.

Currently, after every 100 records, the transaction is committed:
if (linenum mod 100=0) then
   FDBLayer.BulkInsertCommit(false);
IIRC, advice on the Firebird list is to play with this interval; any
suggestions? Given the aggressive nature of the transaction parameters,
I might even dispense with it.

Finally, once done, the transaction is committed, and the stored
procedure that does subsequent updates is called.

Thanks,
Reinier

- Turning Forced Writes off while doing the bulk inserts may help (at a
higher risk). But make sure you turn it back on afterwards (a gfix sketch
follows below).


- If my memory serves well, the Firebird SP can directly read from the 
CSV file instead of going through sqldb for each and every record.
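
Regarding the first point: for illustration, forced writes can be toggled
with Firebird's gfix utility; the database path and credentials below are
placeholders, not taken from the program:

  gfix -write async /path/to/postcode.fdb -user SYSDBA -password masterkey
  (run the bulk insert)
  gfix -write sync /path/to/postcode.fdb -user SYSDBA -password masterkey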


Stephano


Re: [fpc-pascal] Firebird: bulk insert performance: suggestions?

2012-09-08 Thread Reinier Olislagers
On 8-9-2012 11:04, patspiper wrote:
 On 07/09/12 14:03, Reinier Olislagers wrote:
 - Turning Forced Writes off while doing the bulk inserts may help (at a
 higher risk). But make sure you turn it back on afterwards.
 
 - If my memory serves well, the Firebird SP can directly read from the
 CSV file instead of going through sqldb for each and every record.
 
 Stephano

Thanks, Stephano!
1. Yep, forced writes could indeed be a good one (surely for the first
phase: putting everything in the temporary table).
2. It can't read from CSV - but it can read from fixed-width text files.
The problem is that it gets very complicated, and (AFAIU) impossible if you
have UTF8 data in there.
Of course, currently I'm reading ANSI and converting to UTF8 myself; the
db can probably do that too.
I'll keep this option in mind - however, I'd also like to support
client/server systems in the future, which is why I haven't looked into
this further.
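
For reference, a rough sketch of the fixed-width external table route (the
file name, column widths and the INSERT..SELECT are assumptions, and the
server needs ExternalFileAccess enabled in firebird.conf):

  CREATE TABLE EXT_POSTCODES EXTERNAL FILE 'C:\data\postcodes.txt' (
    PROVINCENAME CHAR(30),
    CITYNAME     CHAR(30),
    POSTCODE     CHAR(6),
    STREETNAME   CHAR(40),
    CRLF         CHAR(2)  /* line-terminator column for a plain text file */
  );

  INSERT INTO BULKINSERTDATA (PROVINCENAME, CITYNAME, POSTCODE, STREETNAME)
    SELECT TRIM(PROVINCENAME), TRIM(CITYNAME), TRIM(POSTCODE), TRIM(STREETNAME)
    FROM EXT_POSTCODES;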

Thanks,
Reinier


Re: [fpc-pascal] Firebird: bulk insert performance: suggestions?

2012-09-08 Thread Graeme Geldenhuys

On 08/09/12 06:31, Martin Schreiber wrote:


It depends on the latency of the network because AFAIK there is a
roundtrip for every inserted record...


I didn't commit the transaction after every record. I first tried in a 
batch of 1 records per transaction. Later I even bumped that up 
more, and still no problems.


Anyway, that code isn't used much in our product - it is only used for
data conversion from our old system to our new system, so it is normally
only needed once per setup.


Regards,
  - Graeme -



[fpc-pascal] Firebird: bulk insert performance: suggestions?

2012-09-07 Thread Reinier Olislagers
For my Dutch postcode program https://bitbucket.org/reiniero/postcode
with an embedded Firebird 2.5 database, I allow users to read in a CSV
file with new or updated postcode data.
I use sqldb, FPC x86.
I'd like to get your suggestions on speed improvements.


I try to get the data into a temporary table as quickly as possible.
Later on, a stored procedure will normalize the data and insert
into/update various tables (with postcode, city, street information, etc).
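
A rough sketch of what one such normalization step could look like (the
CITIES table and CITYNAME column are made-up names, not the actual schema;
MERGE needs Firebird 2.1+):

  MERGE INTO CITIES c
    USING (SELECT DISTINCT CITYNAME FROM BULKINSERTDATA) b
    ON c.CITYNAME = b.CITYNAME
    WHEN NOT MATCHED THEN
      INSERT (CITYNAME) VALUES (b.CITYNAME);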

Because I also allow querying information, I set up 2
connections+transactions (one for reading, one for writing) in my database
class constructor, and destroy them in the destructor.
However, (currently) my application controls the database, and I know
that querying and bulk inserting at the same time is impossible.

The write transaction has this code:
FWriteTransaction.Params.Add('isc_tpb_concurrency');
FWriteTransaction.Params.Add('isc_tpb_write');
FWriteTransaction.Params.Add('isc_tpb_no_auto_undo');
// disable the transaction-level undo log, handy for getting max throughput
// when performing a batch update
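
A minimal sketch of how the write connection+transaction pair is wired up
in sqldb (object and file names are assumptions; only the Params values
come from the actual code):

  uses sqldb, IBConnection;
  ...
  FWriteConnection := TIBConnection.Create(nil);
  FWriteConnection.DatabaseName := 'postcode.fdb'; // embedded database file
  FWriteTransaction := TSQLTransaction.Create(nil);
  FWriteTransaction.DataBase := FWriteConnection;
  FWriteTransaction.Params.Add('isc_tpb_concurrency');
  FWriteTransaction.Params.Add('isc_tpb_write');
  FWriteTransaction.Params.Add('isc_tpb_no_auto_undo');
  // a second, identically wired pair serves the read-only queries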

My code loads an ANSI CSV file into a csvdocument in memory (about
50meg), then goes through it, and calls an insert procedure for each
record (converting the field contents to UTF8):
FDBLayer.BulkInsertUpdateRecord(
  SysToUTF8(Postcodes.Cells[ProvinceField,LineNum]),
  SysToUTF8(Postcodes.Cells[CityField,LineNum]),
  SysToUTF8(Postcodes.Cells[PostcodeField,LineNum]),
  SysToUTF8(Postcodes.Cells[StreetField,LineNum]),
  StrToInt(Postcodes.Cells[NumberLowestField,LineNum]),
  StrToInt(Postcodes.Cells[NumberHighestField,LineNum]),
  Even,
  Latitude,
  Longitude);
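
For context, a rough sketch of the surrounding load-and-loop code,
assuming the CSVDocument package (delimiter, file name and loop bounds are
guesses):

  Postcodes := TCSVDocument.Create;
  try
    Postcodes.Delimiter := ';';               // assumed
    Postcodes.LoadFromFile('postcodes.csv');  // assumed file name
    for LineNum := 0 to Postcodes.RowCount - 1 do
    begin
      // convert the fields and call FDBLayer.BulkInsertUpdateRecord as above
    end;
  finally
    Postcodes.Free;
  end;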


Relevant snippets from the insert procedure:
  QuerySQL = 'INSERT INTO BULKINSERTDATA ' +
    '(PROVINCENAME,CITYNAME,POSTCODE,STREETNAME,LOW,HIGH,EVEN,LATITUDE,LONGITUDE) ' +
    'VALUES (' +
    ':PROVINCENAME,:CITYNAME,:POSTCODE,:STREETNAME,:LOW,:HIGH,:EVEN,:LATITUDE,:LONGITUDE)';
then the transaction is started (if it is inactive) and the query
parameters are filled (using Query.Params.ParamByName, but I don't
suppose that would be a big slowdown??); finally the SQL is executed.
The transaction is left open.
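
Roughly, i.e. a sketch with assumed names (FBulkQuery) rather than the
actual code:

  if not FWriteTransaction.Active then
    FWriteTransaction.StartTransaction;
  FBulkQuery.Params.ParamByName('PROVINCENAME').AsString := ProvinceName;
  FBulkQuery.Params.ParamByName('CITYNAME').AsString := CityName;
  // ...the remaining seven parameters are assigned the same way...
  FBulkQuery.ExecSQL;
  // no Commit here; the transaction stays open across many inserts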

Currently, after every 100 records, the transaction is committed:
if (linenum mod 100=0) then
  FDBLayer.BulkInsertCommit(false);
IIRC, advice on the Firebird list is to play with this interval; any
suggestions? Given the aggressive nature of the transaction parameters,
I might even dispense with it.
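
Presumably BulkInsertCommit(false) boils down to something like the
following; CommitRetaining keeps the transaction context so the next
insert does not have to start a new transaction:

  if (LineNum mod 100) = 0 then
    FWriteTransaction.CommitRetaining;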

Finally, once done, the transaction is committed, and the stored
procedure that does subsequent updates is called.

Thanks,
Reinier


Re: [fpc-pascal] Firebird: bulk insert performance: suggestions?

2012-09-07 Thread michael . vancanneyt



On Fri, 7 Sep 2012, Reinier Olislagers wrote:


For my Dutch postcode program https://bitbucket.org/reiniero/postcode
with an embedded Firebird 2.5 database, I allow users to read in a CSV
file with new or updated postcode data.
I use sqldb, FPC x86.
I'd like to get your suggestions on speed improvements.


I try to get the data into a temporary table as quickly as possible.
Later on, a stored procedure will normalize the data and insert
into/update various tables (with postcode, city, street information, etc).

Because I also allow querying information, I set up 2
connections+transactions (one for reading, one for writing) in my database
class constructor, and destroy them in the destructor.
However, (currently) my application controls the database, and I know
that querying and bulk inserting at the same time is impossible.

The write transaction has this code:
FWriteTransaction.Params.Add('isc_tpb_concurrency');
FWriteTransaction.Params.Add('isc_tpb_write');
FWriteTransaction.Params.Add('isc_tpb_no_auto_undo');
// disable the transaction-level undo log, handy for getting max throughput
// when performing a batch update

My code loads an ANSI CSV file into a csvdocument in memory (about
50meg), then goes through it, and calls an insert procedure for each
record (converting the field contents to UTF8):
FDBLayer.BulkInsertUpdateRecord(
 SysToUTF8(Postcodes.Cells[ProvinceField,LineNum]),
 SysToUTF8(Postcodes.Cells[CityField,LineNum]),
 SysToUTF8(Postcodes.Cells[PostcodeField,LineNum]),
 SysToUTF8(Postcodes.Cells[StreetField,LineNum]),
 StrToInt(Postcodes.Cells[NumberLowestField,LineNum]),
 StrToInt(Postcodes.Cells[NumberHighestField,LineNum]),
 Even,
 Latitude,
 Longitude);


Relevant snippets from the insert procedure:
 QuerySQL = 'INSERT INTO BULKINSERTDATA ' +
   '(PROVINCENAME,CITYNAME,POSTCODE,STREETNAME,LOW,HIGH,EVEN,LATITUDE,LONGITUDE) ' +
   'VALUES (' +
   ':PROVINCENAME,:CITYNAME,:POSTCODE,:STREETNAME,:LOW,:HIGH,:EVEN,:LATITUDE,:LONGITUDE)';
then the transaction is started (if it is inactive) and the query
parameters are filled (using Query.Params.ParamByName, but I don't
suppose that would be a big slowdown??); finally the SQL is executed.
The transaction is left open.


Do you prepare the query before you start the batch ?
If not, it is prepared on every insert, which is inherently slower.
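
Sketched with assumed names: prepare once, then only bind parameters and
execute inside the loop:

  FBulkQuery.SQL.Text := QuerySQL;
  FBulkQuery.Prepare;   // parse and prepare the statement once
  for LineNum := 0 to Postcodes.RowCount - 1 do
  begin
    FBulkQuery.Params.ParamByName('POSTCODE').AsString :=
      SysToUTF8(Postcodes.Cells[PostcodeField, LineNum]);
    // ...remaining parameters as in the original call...
    FBulkQuery.ExecSQL;  // reuses the prepared statement
  end;
  FBulkQuery.UnPrepare;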



Currently, after every 100 records, the transaction is committed:
if (linenum mod 100=0) then
 FDBLayer.BulkInsertCommit(false);
IIRC, advice on the Firebird list is to play with this interval; any
suggestions? Given the aggressive nature of the transaction parameters,
I might even dispense with it.


I once did tests with that (600.000 records) and did not notice any influence
of the transaction control.

Michael.


Re: [fpc-pascal] Firebird: bulk insert performance: suggestions?

2012-09-07 Thread Ludo Brands
 For my Dutch postcode program https://bitbucket.org/reiniero/postcode
 with an embedded Firebird 2.5 database, I allow users to read 
 in a CSV file with new or updated postcode data. I use sqldb, 
 FPC x86. I'd like to get your suggestions on speed improvements.
 

Turn off indices when inserting and turn them back on when the inserting is
done.
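
For example, assuming an index named IDX_POSTCODE (indices that back
primary key or unique constraints cannot be deactivated this way):

  ALTER INDEX IDX_POSTCODE INACTIVE;
  /* ...bulk insert... */
  ALTER INDEX IDX_POSTCODE ACTIVE;  /* reactivating rebuilds the index in one pass */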

Since you are the only user and concurrent access is not that important (I
guess), I believe isc_tpb_concurrency is not the best choice. IIRC
isc_tpb_read_committed + isc_tpb_no_rec_version has the least overhead.
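
That would mean setting up the write transaction parameters roughly like
this (keeping the other flags from the original code):

  FWriteTransaction.Params.Clear;
  FWriteTransaction.Params.Add('isc_tpb_read_committed');
  FWriteTransaction.Params.Add('isc_tpb_no_rec_version');
  FWriteTransaction.Params.Add('isc_tpb_write');
  FWriteTransaction.Params.Add('isc_tpb_no_auto_undo');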

Ludo



Re: [fpc-pascal] Firebird: bulk insert performance: suggestions?

2012-09-07 Thread Graeme Geldenhuys

On 07/09/12 12:12, michael.vancann...@wisa.be wrote:

I once did tests with that (600.000 records) and did not notice any
influence of the transaction control.



Same here... I've imported hundreds of thousands of records with 
SqlDB+Firebird with no serious speed issues. Also from CSV files. 
Transactions are always used.


  Graeme.



Re: [fpc-pascal] Firebird: bulk insert performance: suggestions?

2012-09-07 Thread Martin Schreiber
On Saturday 08 September 2012 01:05:28 Graeme Geldenhuys wrote:
 On 07/09/12 12:12, michael.vancann...@wisa.be wrote:
  I once did tests with that (600.000 records) and did not notice any
  influence of the transaction control.

 Same here... I've imported hundreds of thousands of records with
 SqlDB+Firebird with no serious speed issues. Also from CSV files.
 Transactions are always used.

It depends on the latency of the network because AFAIK there is a
roundtrip for every inserted record...

Martin