Is there a good article written for this? Preferably for D specifically...

While working on my challenge to make/update the symbol/id compressor, I've noticed that the GC may be getting in the way and skewing the results. That means a number of the values I've put up as benchmarks may be wildly off. So forgive my ignorance.

So first, compiler flags. What should be used? So far I'm using -inline -noboundscheck -O -release.

What about flags for a C program, if that comes into play? The only ones I can see as applicable are -o and -c (then link the object file into the D program).

How do I work around/with the GC? What code should I use for benchmarks?

Currently I'm trying to use the TickDuration returned by benchmark, but it isn't exactly an arbitrary unit of time. If benchmark is a bad choice, what's a good one?

As for the GC, since it might be running or pause threads in order to run, how do I ensure it's stopped before I do my benchmarks?

Here's what I have so far...

  import core.thread : thread_joinAll;
  import core.memory;
  import std.datetime : benchmark;

  //test functions
  //actual functions slower than lambdas??
  auto f1 = (){};
  auto f2 = (){};

  int rounds = 100_000;

  GC.collect();
  //GC.reserve(1024*1024*32); //no guarantee of reserves. So would this help?
  thread_joinAll();           //guarantees the GC is done?
  GC.disable();               //turned off

  auto test1 = benchmark!(f1)(rounds);

  GC.collect();               //collect between tests
  thread_joinAll();           //make sure GC is done?

  auto test2 = benchmark!(f2)(rounds);
  //collect, joinall
  ...
  //optional cleanup after the fact? Or leave the program to do it after exiting?
  //GC.enable();
  //GC.collect();
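For comparison, here is a minimal sketch of the same idea as a self-contained program. It assumes a recent Phobos, where benchmark lives in std.datetime.stopwatch and returns Duration values rather than TickDuration. The names work1, work2, sink and timed are made up for illustration. As far as I can tell GC.collect() only returns once the collection has finished, and GC.disable() only suppresses automatic collections (the runtime may still collect if it runs out of memory).

```d
import core.memory : GC;
import core.time : Duration;
import std.datetime.stopwatch : benchmark;
import std.stdio : writefln;

// Hypothetical workloads, stand-ins for the real functions under test.
int sink; // written to so the loops have an observable effect

void work1() { foreach (i; 0 .. 100) sink += i; }
void work2() { foreach (i; 0 .. 100) sink ^= i; }

// Runs one benchmark with automatic collections switched off.
Duration timed(alias fun)(uint rounds)
{
    GC.collect();               // start from a freshly collected heap
    GC.disable();               // no automatic collections while timing
    scope (exit) GC.enable();   // restore the GC even if fun throws
    return benchmark!fun(rounds)[0];
}

void main()
{
    enum rounds = 100_000;
    auto t1 = timed!work1(rounds);
    auto t2 = timed!work2(rounds);
    writefln("work1: %s usecs, work2: %s usecs",
             t1.total!"usecs", t2.total!"usecs");
}
```

Wrapping the collect/disable/enable steps in one helper keeps every test on the same footing, and total!"usecs" gives a concrete unit instead of raw ticks.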

Is it better to have a bunch of free memory and ignore leaks? Or to free memory as it's going through for cases that require it?

  //compress is C code compiled with DMC; it returns malloc'd memory.
  extern(C) char *compress(char *ptr, int size);

  auto f3 = (){
    compress(cast(char*) haystack.ptr, cast(int) haystack.length);            //this with leaks?
    //GC.free(compress(cast(char*) haystack.ptr, cast(int) haystack.length)); //or this?
  };
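A third option I can think of, assuming the C code and the D program link against the same C runtime, is to hand the pointer back to C's free from core.stdc.stdlib. A rough sketch, where fakeCompress is a made-up stand-in for the real compress (it just returns a malloc'd copy of its input):

```d
import core.stdc.stdlib : free, malloc;
import core.stdc.string : memcpy;

// Hypothetical stand-in for the C compress(): returns a malloc'd copy.
extern (C) char* fakeCompress(const(char)* src, int size)
{
    auto p = cast(char*) malloc(size);
    if (p !is null) memcpy(p, src, size);
    return p;
}

void main()
{
    string haystack = "some text to compress";

    auto f3 = () {
        char* result = fakeCompress(haystack.ptr, cast(int) haystack.length);
        scope (exit) free(result);  // C heap memory goes back through C's free
        assert(result !is null);
    };

    f3();
}
```

This way each round releases what it allocated, so the C heap does not keep growing over 100_000 rounds, at the cost of timing the free call along with the compression.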

Is memory allocated by DMC freed properly by GC.free if I end up using it this way? (For all I know GC.free ignores the pointer.) If I do separate allocations to match what the functions and calls did, can I subtract that time to get a cleaner set of statistics? Or is that line of thinking wrong?

  auto f3_mm = (){
    void *ptr = GC.malloc(1024);
    GC.free(ptr);
  };

  auto test3 = benchmark!(f3, f3_mm)(rounds); //f3-f3_mm = delta?
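To make the subtraction idea concrete, here is a small sketch, again assuming a recent Phobos with std.datetime.stopwatch. The functions withWork and allocOnly are hypothetical: the first allocates, touches the buffer and frees it, while the second only allocates and frees, serving as the baseline.

```d
import core.memory : GC;
import core.time : Duration;
import std.datetime.stopwatch : benchmark;
import std.stdio : writefln;

// Hypothetical test: allocate, fill the buffer, free.
void withWork()
{
    auto p = cast(ubyte*) GC.malloc(1024);
    foreach (i; 0 .. 1024) p[i] = cast(ubyte) i;
    GC.free(p);
}

// Baseline: the same allocation pattern without the work.
void allocOnly()
{
    auto p = GC.malloc(1024);
    GC.free(p);
}

void main()
{
    // benchmark returns one Duration per function, in argument order.
    auto r = benchmark!(withWork, allocOnly)(100_000);
    Duration delta = r[0] - r[1];  // may come out negative if noise dominates
    writefln("with work: %s, alloc only: %s, delta: %s", r[0], r[1], delta);
}
```

Whether the delta is meaningful presumably depends on the baseline allocating in exactly the same pattern as the real function, which is the part I'm unsure about.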

For the functions/lambdas passed to benchmark, is it better to provide all the information in the function and not have data stored elsewhere? Or store it all as a pure function? Does the overhead of the extra stack pointer make any difference?

Is it better to collect all the test results and output them all at once? Or is it okay, or even better, to output the statistics as each test finishes (between benchmarks and before the collect/thread_joinAll calls)?

What other things should I do/consider when writing basic benchmark code?
