[Lazarus] Writing >1000 TBufDataset records to file is extremely slow

2017-03-26 Thread Werner Pamler via Lazarus
Trying to extend the import/export example of fpspreadsheet from a dBase 
table to a TBufDataset, I came across this issue with TBufDataset: while 
data are posted to the dataset as quickly as usual, writing to file 
takes extremely long if there are more than a few thousand records.


Run the demo attached below. On my system, I measure these (non-linearly 
scaling) execution times for writing the TBufDataset table to file:


1000 records -- 0.9 seconds
2000 records -- 8.8 seconds
3000 records -- 31.1 seconds
etc.

Compared to that, writing the same data to a dbf file takes only the blink 
of an eye. Is there anything I am doing wrong, or should I report a bug?


--- snip ---

program project1;

{$mode objfpc}{$H+}

uses
  SysUtils, classes, db, bufdataset;

const
  // Parameters for generating database file contents
  NUM_LAST_NAMES = 8;
  NUM_FIRST_NAMES = 8;
  NUM_CITIES = 10;
  LAST_NAMES: array[0..NUM_LAST_NAMES-1] of string = (
    'Chaplin', 'Washington', 'Dylan', 'Springsteen', 'Brando',
    'Monroe', 'Dean', 'Lincoln');
  FIRST_NAMES: array[0..NUM_FIRST_NAMES-1] of string = (
    'Charley', 'George', 'Bob', 'Bruce', 'Marlon',
    'Marylin', 'James', 'Abraham');
  CITIES: array[0..NUM_CITIES-1] of string = (
    'New York', 'Los Angeles', 'San Francisco', 'Chicago', 'Miami',
    'New Orleans', 'Washington', 'Boston', 'Seattle', 'Las Vegas');

  TABLENAME = 'people';  // name of the database table; the file extension will be added
  DATADIR = 'data';      // subdirectory where the database file is stored
  DB_EXT: array[0..1] of string = ('.dbf', '.db');

const
  NUM_RECORDS = 1000;
  SECONDS_PER_DAY = 24 * 60 * 60;

var
  FExportDataset: TBufDataset;

procedure CreateDatabase;
var
  i: Integer;
  startDate: TDate;
  maxAge: Integer = 100 * 365;
  fn: String;
  stream: TMemoryStream;
  t: TDateTime;
begin
  ForceDirectories(DATADIR);
  startDate := Date();

  fn := DATADIR + DirectorySeparator + TABLENAME + DB_EXT[1];
  DeleteFile(fn);

  FExportDataset := TBufDataset.Create(nil);
//  FExportDataset.Filename := fn;

  FExportDataset.FieldDefs.Add('Last name', ftString, 15);
  FExportDataset.FieldDefs.Add('First name', ftString, 10);
  FExportDataset.FieldDefs.Add('City', ftString, 15);
  FExportDataset.FieldDefs.Add('Birthday', ftDate);
  FExportDataset.FieldDefs.Add('Salary', ftCurrency);
  FExportDataset.FieldDefs.Add('Work begin', ftDateTime);
  FExportDataset.FieldDefs.Add('Work end', ftDateTime);
  FExportDataset.FieldDefs.Add('Size', ftFloat);
  FExportDataset.CreateDataset;

  FExportDataset.Open;

  // Random data
  for i := 1 to NUM_RECORDS do begin
    if (i mod 100 = 0) then
      WriteLn(Format('Adding record %d...', [i]));
    FExportDataset.Insert;
    FExportDataset.FieldByName('Last name').AsString := LAST_NAMES[Random(NUM_LAST_NAMES)];
    FExportDataset.FieldByName('First name').AsString := FIRST_NAMES[Random(NUM_FIRST_NAMES)];
    FExportDataset.FieldByName('City').AsString := CITIES[Random(NUM_CITIES)];
    FExportDataset.FieldByName('Birthday').AsDateTime := startDate - Random(maxAge);

    FExportDataset.FieldByName('Salary').AsFloat := 1000 + Random(9000);
    FExportDataset.FieldByName('Size').AsFloat := (160 + Random(50)) / 100;
    FExportDataset.FieldByName('Work begin').AsDateTime :=
      4 + EncodeTime(6 + Random(4), Random(60), Random(60), 0);
    FExportDataset.FieldByName('Work end').AsDateTime :=
      EncodeTime(15 + Random(4), Random(60), Random(60), 0);

    FExportDataset.Post;
  end;

  WriteLn('Saving...');
  t := now;
  stream := TMemoryStream.Create;
  try
    FExportDataset.SaveToStream(stream);
    stream.Position := 0;
    WriteLn('Written to memory stream: ',
      FormatFloat('0.000 s', (now - t) * SECONDS_PER_DAY));

    stream.SaveToFile(fn);
  finally
    stream.Free;
  end;
  WriteLn('Done. Total time needed for saving: ',
    FormatFloat('0.000 s', (now - t) * SECONDS_PER_DAY));


  FExportDataset.Close;

  WriteLn(Format('Created file "%s" in folder "%s".', [
    ExtractFileName(fn), ExtractFileDir(fn)
  ]));
  FExportDataset.Free;
end;

begin
  CreateDatabase;

  WriteLn;
  WriteLn('Press ENTER to close.');
  ReadLn;
end.



Re: [Lazarus] Writing >1000 TBufDataset records to file is extremely slow

2017-03-26 Thread Howard Page-Clark via Lazarus

On 26/03/17 22:53, Werner Pamler via Lazarus wrote:
Trying to extend the import/export example of fpspreadsheet from a 
dBase table to a TBufDataset I came across this issue with 
TBufDataset: While data are posted to the database as quickly as usual 
writing to file takes extremely long if there are more than a few 
thousand records.


Run the demo attached below. On my system, I measure these 
(non-linearly scaling) execution times for writing the TBufDataset 
table to file:


1000 records -- 0.9 seconds
2000 records -- 8.8 seconds
3000 records -- 31.1 seconds
etc.

Compared to that, writing of the same data to a dbf file is a wink of 
an eye. Is there anything which I am doing wrong? Or should I report a 
bug?

I don't think you are doing anything wrong.
I can get small performance increases by
- avoiding the FieldByName() calls and using AppendRecord instead
- using SaveToFile and avoiding the intermediate memory stream
- increasing the value of PacketRecords
but the speedups are insignificant (rough sketch of these variations below).
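
Untested excerpt of what I mean, as a drop-in variation of the loop in your 
demo (the AppendRecord values have to follow the FieldDefs order; the 
PacketRecords value is chosen arbitrarily):

  for i := 1 to NUM_RECORDS do
    // one call per record instead of Insert + eight FieldByName lookups + Post
    FExportDataset.AppendRecord([
      LAST_NAMES[Random(NUM_LAST_NAMES)],                        // Last name
      FIRST_NAMES[Random(NUM_FIRST_NAMES)],                      // First name
      CITIES[Random(NUM_CITIES)],                                // City
      Date() - Random(100*365),                                  // Birthday
      1000 + Random(9000),                                       // Salary
      4 + EncodeTime(6 + Random(4), Random(60), Random(60), 0),  // Work begin
      EncodeTime(15 + Random(4), Random(60), Random(60), 0),     // Work end
      (160 + Random(50)) / 100                                   // Size
    ]);

  FExportDataset.PacketRecords := 10000;  // larger packets
  FExportDataset.SaveToFile(fn);          // write directly, no intermediate TMemoryStream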

Clearly either the insertion algorithm should be improved, or the 
buffering, or the way the buffered records are written to disk. Maybe 
all three areas of TBufDataset can be optimised for better performance.



Re: [Lazarus] Writing >1000 TBufDataset records to file is extremely slow

2017-03-27 Thread Marc Santhoff via Lazarus
On So, 2017-03-26 at 23:53 +0200, Werner Pamler via Lazarus wrote:
> Trying to extend the import/export example of fpspreadsheet from a dBase 
> table to a TBufDataset I came across this issue with TBufDataset: While 
> data are posted to the database as quickly as usual writing to file 
> takes extremely long if there are more than a few thousand records.
> 
> Run the demo attached below. On my system, I measure these (non-linearly 
> scaling) execution times for writing the TBufDataset table to file:
> 
> 1000 records -- 0.9 seconds
> 2000 records -- 8.8 seconds
> 3000 records -- 31.1 seconds
> etc.
> 
> Compared to that, writing of the same data to a dbf file is a wink of an 
> eye. Is there anything which I am doing wrong? Or should I report a bug?
> 

I didn't count, but you make extensive use of the Random() function.
Could that be the cause of slowness?

HTH,
Marc

[...]
>FExportDataset.Open;
> 
>// Random data
>for i:=1 to NUM_RECORDS do begin
>  if (i mod 100 = 0) then
>WriteLn(Format('Adding record %d...', [i]));
>  FExportDataset.Insert;
>  FExportDataset.FieldByName('Last name').AsString := 
> LAST_NAMES[Random(NUM_LAST_NAMES)];
>  FExportDataset.FieldByName('First name').AsString := 
> FIRST_NAMES[Random(NUM_FIRST_NAMES)];
>  FExportDataset.FieldByName('City').AsString := 
> CITIES[Random(NUM_CITIES)];
>  FExportDataset.FieldByName('Birthday').AsDateTime := startDate - 
> random(maxAge);
>  FExportDataset.FieldByName('Salary').AsFloat := 1000+Random(9000);
>  FExportDataset.FieldByName('Size').AsFloat := (160 + Random(50)) / 100;
>  FExportDataSet.FieldByName('Work begin').AsDateTime := 
> 4+EncodeTime(6+Random(4), Random(60), Random(60), 0);
>  FExportDataSet.FieldByName('Work end').AsDateTime := 
> EncodeTime(15+Random(4), Random(60), Random(60), 0);
>  FExportDataset.Post;
>end;




Re: [Lazarus] Writing >1000 TBufDataset records to file is extremely slow

2017-03-27 Thread Werner Pamler via Lazarus


On 27.03.2017 at 09:07, Marc Santhoff via Lazarus wrote:

I didn't count, but you make extensive use of the Random() function.
Could that be the cause of slowness?


No, Random() is called only while the records are populated - this step is 
completed without any noticeable delay. Time is measured afterwards, when 
the populated table is written to stream/file.



Re: [Lazarus] Writing >1000 TBufDataset records to file is extremely slow

2017-03-27 Thread Werner Pamler via Lazarus

On 27.03.2017 at 00:53, Howard Page-Clark via Lazarus wrote:

I can get small performance increases by
- avoiding FieldByName() calls and using AppendRecord
No, at least not for the issue I am referring to. As in my answer to 
Marc's comment: those calls happen while the table is populated, but the 
delay occurs when the populated table is written to stream/file.



- using SaveToFile and avoiding an intermediate memory stream
Has no noticeable effect. 1000 records really is not much, and flushing 
the memory stream to disk occurs without any delay (see my code, which has 
measurement points before and after writing the memory stream and after 
writing to file). In fact, when I first noticed this effect I did not have 
any explicit writing code at all: I noticed an excessive delay while the 
dataset was closed -- this is when the BufDataset is saved automatically.
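
I.e. the delay already showed up with nothing more than this (sketch; just 
activate the commented-out FileName line of my demo and drop the explicit 
saving code):

  FExportDataset := TBufDataset.Create(nil);
  FExportDataset.FileName := fn;  // with FileName set, the dataset writes itself back on Close
  ...
  // populate and post the records as in the demo
  ...
  FExportDataset.Close;           // <-- the excessive delay happens here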



- increasing the value of PacketRecords
Not knowing what this is, I increased the value in multiples of 10 from 1 
to 1E9 and do not see any effect beyond the usual scatter.


Clearly either the insertion algorithm should be improved, or the 
buffering, or the way the buffered records are written to disk. Maybe 
all three areas of TBufDataset can be optimised for better performance.
Thanks. When I have time I'll write a bug report. The current 
TBufDataset is usable only as a pure in-memory table which is never 
written to file. BTW, in the attached modified demo code the TBufDataset 
can be replaced by a TMemDataset (define "USE_MEM_DATASET"), and this 
one is written instantly.


--- snip ---

program project1;

{$mode objfpc}{$H+}

{$DEFINE USE_MEM_DATASET}

uses
  SysUtils, classes, db, memds, bufdataset;

const
  TABLENAME = 'people';  // name of the database table; the file extension will be added
  DATADIR = 'data';      // subdirectory where the database file is stored

const
  NUM_RECORDS = 5000;
  SECONDS_PER_DAY = 24 * 60 * 60;

var
  FExportDataset: TDataset;

procedure CreateDatabase;
var
  i: Integer;
  fn: String;
  stream: TMemoryStream;
  t: TDateTime;
begin
  ForceDirectories(DATADIR);

  fn := DATADIR + DirectorySeparator + TABLENAME + '.db';
  DeleteFile(fn);

  {$IFDEF USE_MEM_DATASET}
  FExportDataset := TMemDataset.Create(nil);
  {$ELSE}
  FExportDataset := TBufDataset.Create(nil);
  {$ENDIF}

  FExportDataset.FieldDefs.Add('Last name', ftString, 15);
  FExportDataset.FieldDefs.Add('First name', ftString, 10);
  FExportDataset.FieldDefs.Add('City', ftString, 15);
  FExportDataset.FieldDefs.Add('Birthday', ftDate);
  FExportDataset.FieldDefs.Add('Salary', ftCurrency);
  FExportDataset.FieldDefs.Add('Work begin', ftDateTime);
  FExportDataset.FieldDefs.Add('Work end', ftDateTime);
  FExportDataset.FieldDefs.Add('Size', ftFloat);
  {$IFNDEF USE_MEM_DATASET}
  TBufDataset(FExportDataset).CreateDataset;
  {$ENDIF}

  FExportDataset.Open;

  // Random data
  for i := 1 to NUM_RECORDS do begin
    if (i mod 100 = 0) then
      WriteLn(Format('Adding record %d...', [i]));
    FExportDataset.Insert;
    FExportDataset.FieldByName('Last name').AsString := 'A';
    FExportDataset.FieldByName('First name').AsString := 'B';
    FExportDataset.FieldByName('City').AsString := 'C';
    FExportDataset.FieldByName('Birthday').AsDateTime := 0;
    FExportDataset.FieldByName('Salary').AsFloat := 0;
    FExportDataset.FieldByName('Size').AsFloat := 0;
    FExportDataset.FieldByName('Work begin').AsDateTime := 0;
    FExportDataset.FieldByName('Work end').AsDateTime := 0;
    FExportDataset.Post;
  end;

  WriteLn('Saving...');
  t := now;
  stream := TMemoryStream.Create;
  try
    {$IFDEF USE_MEM_DATASET}
    TMemDataset(FExportDataset).SaveToStream(stream);
    {$ELSE}
    TBufDataset(FExportDataset).SaveToStream(stream);
    {$ENDIF}
    stream.Position := 0;
    WriteLn('Written to memory stream: ',
      FormatFloat('0.000 s', (now - t) * SECONDS_PER_DAY));

    stream.SaveToFile(fn);
  finally
    stream.Free;
  end;
  WriteLn('Done. Total time needed for saving: ',
    FormatFloat('0.000 s', (now - t) * SECONDS_PER_DAY));


  FExportDataset.Close;

  WriteLn(Format('Created file "%s" in folder "%s".', [
    ExtractFileName(fn), ExtractFileDir(fn)
  ]));
  FExportDataset.Free;
end;

begin
  CreateDatabase;

  WriteLn;
  WriteLn('Press ENTER to close.');
  ReadLn;
end.



Re: [Lazarus] Writing >1000 TBufDataset records to file is extremely slow

2017-03-27 Thread LacaK via Lazarus

Try calling FExportDataset.MergeChangeLog before:
  WriteLn('Saving...');

Does anything in your timing change?
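
I.e. in your demo something like this (untested sketch):

  // merge the pending change log into the record buffer before saving
  FExportDataset.MergeChangeLog;

  WriteLn('Saving...');
  t := now;
  ...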
-Laco.


Re: [Lazarus] Writing >1000 TBufDataset records to file is extremely slow

2017-03-27 Thread Werner Pamler via Lazarus

On 27.03.2017 at 10:13, LacaK via Lazarus wrote:

Try call FExportDataset.MergeChangeLog before:
  WriteLn('Saving...'); 

Does anything in your timing changed ?


Ah - that's it. TBufDataset saves the records instantly now. Probably 
this should go onto the official wiki page for TBufDataset.


But now another issue comes up: if I increase the number of records 
to 40,000, populating the records slows down after about 10,000 records, 
speeds up again, and comes to an apparent standstill at around 32,900 records. 
After waiting some time the record counter (which is incremented in 
steps of 100 in my demo) goes up to 33,000. Then I gave up.


Again, if I run the demo with TMemDataset, these effects do not show up. 
(For the current code, see my other post from today.)



Re: [Lazarus] Writing >1000 TBufDataset records to file is extremely slow

2017-03-27 Thread Martin Schreiber via Lazarus
On Sunday 26 March 2017 23:53:08 Werner Pamler via Lazarus wrote:
> Trying to extend the import/export example of fpspreadsheet from a dBase
> table to a TBufDataset I came across this issue with TBufDataset: While
> data are posted to the database as quickly as usual writing to file
> takes extremely long if there are more than a few thousand records.
>
> Run the demo attached below. On my system, I measure these (non-linearly
> scaling) execution times for writing the TBufDataset table to file:
>
> 1000 records -- 0.9 seconds
> 2000 records -- 8.8 seconds
> 3000 records -- 31.1 seconds
> etc.
>
> Compared to that, writing of the same data to a dbf file is a wink of an
> eye. Is there anything which I am doing wrong? Or should I report a bug?
>
Can you switch off the 'ApplyUpdates' functionality in TBufDataset? MSEgui's 
TLocalDataset (a fork of the FPC TBufDataset) writes 1'000'000 records in about 
0.4 seconds if the option bdo_noapply is set.
" 
100: 0.313s
100: 0.308s
100: 0.319s
100: 0.311s
100: 0.411s
100: 0.293s
100: 0.327s
100: 0.321s
3000: 0.001s
3000: 0.001s
3000: 0.001s
"
"
procedure tmainfo.recev(const sender: TObject);
var
  i1: int32;
  t1: tdatetime;
begin
  locds.active := false;
  locds.disablecontrols();
  try
    locds.active := true;
    for i1 := 1 to reccount.value do begin
      locds.appendrecord([i1, inttostrmse(i1)+'abcdefghiklmnop', 10*i1]);
    end;
    t1 := nowutc();
    locds.savetofile('test.db');
    t1 := nowutc() - t1;
    writeln(reccount.value, ': ', formatfloatmse(t1*60*60*24, '0.000s'));
    locds.active := false;
  finally
    locds.enablecontrols();
  end;
end;
"

Martin


Re: [Lazarus] Writing >1000 TBufDataset records to file is extremely slow

2017-03-27 Thread LacaK via Lazarus



Try call FExportDataset.MergeChangeLog before:
  WriteLn('Saving...'); 

Does anything in your timing changed ?


Ah - that's it. TBufDataset saves the records instantly now. Probably, 
this should go into the official wiki site for TBufDataset.


But now another issue is coming up: If I increase the number of 
records to 40,000 population of records slows down after about 10,000 
records, speeds up again, and comes to an apparant stand-still for 
32900 records. After waiting some time the record counter (which is 
incremented in steps of 100 in my demo) goes up to 33000. Then I gave up.

Try calling MergeChangeLog regularly, for example every 1000 rows (sketch below).
If that does not help, attach your test program so we can reproduce it ...
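
E.g. inside the loop of your attached demo, something like this (untested 
sketch; the cast is needed there because FExportDataset is declared as TDataset):

    FExportDataset.Post;
    {$IFNDEF USE_MEM_DATASET}
    if (i mod 1000 = 0) then
      // merge the change log periodically so it does not keep growing
      TBufDataset(FExportDataset).MergeChangeLog;
    {$ENDIF}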

L.



Re: [Lazarus] Writing >1000 TBufDataset records to file is extremely slow

2017-03-27 Thread Werner Pamler via Lazarus

On 27.03.2017 at 10:59, LacaK via Lazarus wrote:
But now another issue is coming up: If I increase the number of 
records to 40,000 population of records slows down after about 10,000 
records, speeds up again, and comes to an apparant stand-still for 
32900 records. After waiting some time the record counter (which is 
incremented in steps of 100 in my demo) goes up to 33000. Then I gave 
up.

Try call MergeChangeLog regulary on every 1000 rows for example.
If does not help, attach your test program, so we can reproduce ...


Yes, this is the solution. Thank you. MergeChangeLog definitely should 
be documented in a better way.




Re: [Lazarus] Writing >1000 TBufDataset records to file is extremely slow

2017-03-27 Thread LacaK via Lazarus


But now another issue is coming up: If I increase the number of 
records to 40,000 population of records slows down after about 
10,000 records, speeds up again, and comes to an apparant 
stand-still for 32900 records. After waiting some time the record 
counter (which is incremented in steps of 100 in my demo) goes up to 
33000. Then I gave up.

Try call MergeChangeLog regulary on every 1000 rows for example.
If does not help, attach your test program, so we can reproduce ...


Yes, this is the solution. Thank you. MergeChangeLog definitely should 
be documented in a better way.

You can file a bug report about it or add it to the wiki yourself:
 - there is http://wiki.freepascal.org/TBufDataset but it seems that 
there is only one line of text ;-)
 - in the FCL documentation 
http://www.freepascal.org/docs-html/current/fcl/db/index.html I cannot 
find TBufDataset at all

L.
