Re: The Case Against Autodecode

Jon D via Digitalmars-d Sun, 15 May 2016 16:17:43 -0700

On Thursday, 12 May 2016 at 20:15:45 UTC, Walter Bright wrote:

On 5/12/2016 9:29 AM, Andrei Alexandrescu wrote:
> I am as unclear about the problems of autodecoding as I am about the necessity
> to remove curl. Whenever I ask I hear some arguments that work well emotionally
> but are scant on reason and engineering. Maybe it's time to rehash them? I just
> did so about curl, no solid argument seemed to come together. I'd be curious of
> a crisp list of grievances about autodecoding. -- Andrei

Given the importance of performance in the auto-decoding topic, it seems reasonable to quantify it. I took a stab at this. It would of course be prudent to have others conduct similar analysis rather than rely on my numbers alone.

Measurements were done using an artificial scenario: counting lower-case ASCII letters. This had the effect of calling front/popFront many times on a long block of text. Runs were done treating the text both as char[] and as ubyte[], and the run times were compared. (char[] performs auto-decoding, ubyte[] does not.)
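
The actual benchmark is the D program linked under Details. Purely as an illustration of what the two variants compute, here is a minimal Python sketch (the function names are hypothetical, and it says nothing about D's performance): one version decodes UTF-8 into code points before testing each one, roughly what iterating a char[] does, and the other scans raw bytes, as with ubyte[]. It also shows why the comparison is fair: every byte of a multi-byte UTF-8 sequence is 0x80 or above, so both versions return the same count and the byte version simply skips the decoding work.

```python
def count_lower_decoded(data: bytes) -> int:
    # Analogue of iterating char[]: decode UTF-8 into code points first,
    # then test each code point. bytes.decode raises on invalid UTF-8.
    return sum(1 for ch in data.decode("utf-8") if "a" <= ch <= "z")


def count_lower_bytes(data: bytes) -> int:
    # Analogue of iterating ubyte[]: test raw bytes, no decoding.
    # Correct for this task because bytes inside multi-byte UTF-8
    # sequences are >= 0x80 and can never fall in 0x61..0x7A ('a'..'z').
    return sum(1 for b in data if 0x61 <= b <= 0x7A)


sample = "Grüße aus Köln, こんにちは".encode("utf-8")
print(count_lower_decoded(sample), count_lower_bytes(sample))  # prints 7 7
```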

Timings were done with DMD and LDC, and on two different data sets. One data set was a mix of Latin-script languages (e.g. German, English, Finnish), the other a mix of non-Latin-script languages (e.g. Japanese, Chinese, Greek). The goal was to distinguish between scenarios with high and low ASCII character content.

The result: for DMD, auto-decoding showed a 1.6x to 2.6x cost; for LDC, a 12.2x to 12.9x cost (char[] run time relative to ubyte[] run time).


Details:
- Test program: https://dpaste.dzfl.pl/67c7be11301f
- DMD 2.071.0. Options: -release -O -boundscheck=off -inline
- LDC 1.0.0-beta1 (based on DMD v2.070.2). Options: -release -O -boundscheck=off
- Machine: MacBook Pro (2.8 GHz Intel Core i7, 16 GB RAM)

Runs for each combination were done five times and the median times used. The median times and the char[] to ubyte[] ratio are below:

|          |           |    char[] |   ubyte[] |
| Compiler | Text type | time (ms) | time (ms) | ratio |
|----------+-----------+-----------+-----------+-------|
| DMD      | Latin     |      7261 |      4513 |   1.6 |
| DMD      | Non-latin |     10240 |      3928 |   2.6 |
| LDC      | Latin     |     11773 |       913 |  12.9 |
| LDC      | Non-latin |     10756 |       883 |  12.2 |

Note: The numbers above don't provide enough information to derive a front/popFront rate. The program artificially loops over the text multiple times to increase the run times. (For these runs, the program's repeat count was set to 20.)
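
The ratio column is simply the char[] median time divided by the ubyte[] median time; worked out for the two Latin rows:

```latex
\text{ratio} = \frac{t_{\text{char[]}}}{t_{\text{ubyte[]}}}
\qquad\text{e.g.}\qquad
\frac{7261}{4513} \approx 1.61 \;(\text{DMD, Latin}),
\qquad
\frac{11773}{913} \approx 12.89 \;(\text{LDC, Latin})
```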


Characteristics of the two data sets:
|           |         |            |             | Bytes per |         |
| Text type |   Bytes | Characters | ASCII chars | character | % ASCII |
|-----------+---------+------------+-------------+-----------+---------|
| Latin     | 4156697 |    4059016 |     3965585 |     1.024 |   97.7% |
| Non-latin | 4061554 |    1949290 |      348164 |     2.084 |   17.9% |
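
The derived columns follow from the counts: bytes per character is bytes divided by characters, and the ASCII share is ASCII characters divided by characters (for the Latin set, 4156697 / 4059016 ≈ 1.024 and 3965585 / 4059016 ≈ 97.7%). A minimal Python sketch of how such statistics could be gathered for a UTF-8 file, assuming "characters" means Unicode code points; the function name is hypothetical:

```python
def utf8_stats(data: bytes) -> dict:
    """Summarize a UTF-8 buffer like the data-set table above.

    Assumptions: "characters" means Unicode code points, and the
    input is non-empty (otherwise the ratios divide by zero).
    """
    text = data.decode("utf-8")  # raises UnicodeDecodeError on invalid input
    n_bytes = len(data)
    n_chars = len(text)  # number of code points
    n_ascii = sum(1 for ch in text if ord(ch) < 0x80)
    return {
        "bytes": n_bytes,
        "characters": n_chars,
        "ascii_chars": n_ascii,
        "bytes_per_char": n_bytes / n_chars,
        "ascii_pct": 100.0 * n_ascii / n_chars,
    }


# "ö" takes two bytes in UTF-8, so "Köln" is 5 bytes but 4 code points.
print(utf8_stats("Köln".encode("utf-8")))
```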

Run-to-run variability: the run times recorded were quite stable. The largest delta between minimum and median time for any group was 17 milliseconds.
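
For reference, the two statistics used here (the median of the five runs, and the gap between the fastest run and that median) can be computed as in this small Python sketch; the function name and the sample timings are hypothetical and are not measurements from this post.

```python
from statistics import median


def summarize_runs(times_ms):
    """Return (median, median - minimum) for one group of timed runs."""
    med = median(times_ms)
    return med, med - min(times_ms)


# Illustrative numbers only, not measurements from this post.
print(summarize_runs([105, 98, 101, 110, 100]))  # prints (101, 3)
```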
