On Thursday, 12 May 2016 at 20:15:45 UTC, Walter Bright wrote:
On 5/12/2016 9:29 AM, Andrei Alexandrescu wrote:
> I am as unclear about the problems of autodecoding as I am about the necessity
> to remove curl. Whenever I ask I hear some arguments that work well emotionally
> but are scant on reason and engineering. Maybe it's time to rehash them? I just
> did so about curl, no solid argument seemed to come together. I'd be curious of
> a crisp list of grievances about autodecoding. -- Andrei


Given the importance of performance in the auto-decoding topic, it seems reasonable to quantify it. I took a stab at this. It would of course be prudent to have others conduct similar analysis rather than rely on my numbers alone.

Measurements were done using an artificial scenario: counting lower-case ASCII letters. This has the effect of calling front/popFront many times on a long block of text. Each run was done twice, once treating the text as char[] and once as ubyte[], and the run times were compared. (char[] performs auto-decoding, ubyte[] does not.)
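
For readers who don't want to open the dpaste link, here is a minimal sketch of the kind of loop being measured. It is not the actual test program; the name countLowerAscii and the sample string are made up for illustration. It assumes Phobos' range primitives, where front on a char-based string decodes to dchar and front on a ubyte array returns the raw byte.

```d
import std.range.primitives : empty, front, popFront;
import std.stdio : writeln;
import std.string : representation;

// Hypothetical helper (not from the original test program): counts
// lower-case ASCII letters by walking the range with front/popFront.
// For a char-based string, front auto-decodes and yields a dchar;
// for a ubyte array, front yields the raw byte with no decoding.
size_t countLowerAscii(R)(R r)
{
    size_t n = 0;
    for (; !r.empty; r.popFront())
    {
        immutable c = r.front;
        if (c >= 'a' && c <= 'z')
            ++n;
    }
    return n;
}

void main()
{
    // Illustrative sample text containing two non-ASCII letters.
    string sample = "Gr\u00FC\u00DFe, world";

    auto decoded = countLowerAscii(sample);                // auto-decoding path
    auto raw = countLowerAscii(sample.representation);     // ubyte path, no decoding

    // Both paths agree: every byte of a multi-byte UTF-8 sequence is >= 0x80,
    // so it can never be mistaken for an ASCII letter.
    assert(decoded == raw);
    writeln(decoded, " ", raw);
}
```

The two calls do the same logical work, which is why the difference in run time can be attributed to decoding.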

Timings were done with DMD and LDC, and on two different data sets. One data set was a mix of languages written in Latin script (e.g. German, English, Finnish, etc.), the other a mix of languages written in non-Latin scripts (e.g. Japanese, Chinese, Greek, etc.). The goal was to distinguish between scenarios with high and low ASCII character content.

The result: with DMD, the auto-decoding (char[]) runs took 1.6x to 2.6x as long as the ubyte[] runs. With LDC, they took 12.2x to 12.9x as long.

Details:
- Test program: https://dpaste.dzfl.pl/67c7be11301f
- DMD 2.071.0. Options: -release -O -boundscheck=off -inline
- LDC 1.0.0-beta1 (based on DMD v2.070.2). Options: -release -O -boundscheck=off
- Machine: MacBook Pro (2.8 GHz Intel i7, 16 GB RAM)

Runs for each combination were done five times and the median times used. The median times and the char[] to ubyte[] ratio are below:
|          |           |    char[] |   ubyte[] |       |
| Compiler | Text type | time (ms) | time (ms) | ratio |
|----------+-----------+-----------+-----------+-------|
| DMD      | Latin     |      7261 |      4513 |   1.6 |
| DMD      | Non-latin |     10240 |      3928 |   2.6 |
| LDC      | Latin     |     11773 |       913 |  12.9 |
| LDC      | Non-latin |     10756 |       883 |  12.2 |
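
The ratio column is simply the char[] time divided by the ubyte[] time. A small sketch that recomputes it from the median times in the table (the Row struct and ratio function are just illustrative names):

```d
import std.stdio : writefln;

// Illustrative container for one table row; times are the medians in ms.
struct Row
{
    string label;
    double charMs;
    double ubyteMs;
}

// Slowdown of the char[] (auto-decoding) run relative to the ubyte[] run.
double ratio(double charMs, double ubyteMs)
{
    return charMs / ubyteMs;
}

void main()
{
    // Values copied from the table above.
    immutable rows = [
        Row("DMD Latin", 7261, 4513),
        Row("DMD Non-latin", 10240, 3928),
        Row("LDC Latin", 11773, 913),
        Row("LDC Non-latin", 10756, 883),
    ];

    foreach (row; rows)
        writefln("%-14s %5.1f", row.label, ratio(row.charMs, row.ubyteMs));
}
```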

Note: The numbers above don't provide enough info to derive a front/popFront rate. The program artificially loops over the data multiple times to increase the run times. (For these runs, the program's repeat-count was set to 20.)

Characteristics of the two data sets:
|           |         |         |             | Bytes per |           |
| Text type |   Bytes |  DChars | Ascii Chars |     DChar | Pct Ascii |
|-----------+---------+---------+-------------+-----------+-----------|
| Latin     | 4156697 | 4059016 |     3965585 |     1.024 |     97.7% |
| Non-latin | 4061554 | 1949290 |      348164 |     2.084 |     17.9% |
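
For anyone wanting to characterize their own data the same way, the sketch below shows one way to compute these columns. It is not the code used for the table; TextStats and textStats are made-up names. It assumes "Bytes" means UTF-8 code units, "DChars" means decoded code points, and "Pct Ascii" is ASCII chars divided by DChars, which matches the figures above.

```d
import std.algorithm.searching : count;
import std.range.primitives : walkLength;
import std.stdio : writefln;
import std.string : representation;

// Illustrative summary of a UTF-8 text (names are not from the original program).
struct TextStats
{
    size_t bytes;   // UTF-8 code units
    size_t dchars;  // decoded code points
    size_t ascii;   // code points below 0x80
}

TextStats textStats(string s)
{
    TextStats st;
    st.bytes = s.length;                                // length of a string is in code units
    st.dchars = s.walkLength;                           // walks the string with auto-decoding
    st.ascii = s.representation.count!(b => b < 0x80);  // ASCII is always a single byte < 0x80
    return st;
}

void main()
{
    // Hypothetical sample: three Latin letters, a space, three Greek letters.
    auto st = textStats("abc \u03B1\u03B2\u03B3");
    assert(st.bytes == 10 && st.dchars == 7 && st.ascii == 4);

    writefln("bytes per dchar: %.3f, pct ASCII: %.1f%%",
             cast(double) st.bytes / st.dchars,
             100.0 * st.ascii / st.dchars);
}
```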

Run-to-run variability - The run times recorded were quite stable. The largest delta between minimum and median time for any group was 17 milliseconds.
