In order to be taken `super, suPER, SUPER` seriously, let me be hermetically sealed and scientifically rigorous in presenting this `problem`.
Base Hardware: System76 laptop with Intel i7-6700HQ CPU, 2.6–3.5 GHz clock, 4 cores / 8 threads, 16GB of memory, and 128GB SSD.

Base OS: Linux kernel 4.11.11, gcc 4.9.2 / clang 3.9.1, PCLinuxOS 64-bit. (This is a pre-Meltdown/Spectre-patch system.)

VB OSs: Linux kernels 4.15.x–4.16.x, gcc 7.3.0/1, clang 3.9.1–6.0. (These are post-Meltdown/Spectre kernel-patched systems.)

The results shown here are from my base system, but the `problem` is consistent with any combination of kernel and gcc or clang on VB-based systems for 64-bit Linux distros.

Here are the gists of the tested code. The difference between them is line 239 in `twins_sieve`.

`twinprimes_test.nim` [https://gist.github.com/jzakiya/e140e9f3d660059631b2bb09487220f9](https://gist.github.com/jzakiya/e140e9f3d660059631b2bb09487220f9)

`twinprimes_test1.nim` [https://gist.github.com/jzakiya/8f7768c8c9f6e925b200c5f463a2f95c](https://gist.github.com/jzakiya/8f7768c8c9f6e925b200c5f463a2f95c)

They were compiled with the flags shown below and run on a `quiet system` (rebooted, opened only a terminal, and ran the tests).

First compile with gcc:

    nim c --cc:gcc --d:release --threads:on twinprimes_test.nim
    nim c --cc:gcc --d:release --threads:on twinprimes_test1.nim

then run:

    echo 500_000_000_000 | ./twinprimes_test
    echo 500_000_000_000 | ./twinprimes_test1
    echo 1_000_000_000_000 | ./twinprimes_test
    echo 1_000_000_000_000 | ./twinprimes_test1

then compile with clang:

    nim c --cc:clang --d:release --threads:on twinprimes_test.nim
    nim c --cc:clang --d:release --threads:on twinprimes_test1.nim

and run:

    echo 500_000_000_000 | ./twinprimes_test
    echo 500_000_000_000 | ./twinprimes_test1
    echo 1_000_000_000_000 | ./twinprimes_test
    echo 1_000_000_000_000 | ./twinprimes_test1

Whether compiled with gcc or clang, as the input values become bigger, `twinprimes_test1`'s times become increasingly slower as a percentage of `twinprimes_test`'s, approaching on the order of 10% for the two data points shown. For bigger inputs the differences grow larger. (On a good note, I was pleasantly surprised to see clang produce faster times for this particular architecture, at least on my base system, as it had always been slower before.)

| Input Number | twinprimes_test, gcc 4.9.2 | twinprimes_test, clang 3.9.1 | twinprimes_test1, gcc 4.9.2 | twinprimes_test1, clang 3.9.1 |
|--------------|----------------------------|------------------------------|-----------------------------|-------------------------------|
| 5e11         | 28.926                     | 28.241                       | 31.759                      | 30.970                        |
| 1e12         | 63.285                     | 61.042                       | 67.842                      | 66.678                        |

Even though the devs have exhibited an extreme lack of _curiosity_ (`willful blindness`) about acknowledging this `problem`, you should pay attention to it even as just a user. I only `discovered` this behavioral phenomenon by serendipity. How many similar occurrences of just `this` code phenomenon are lurking in your (or Nim's) codebase, their potential performance hit unbeknownst to you? This is also, obviously, `a potential security vector`.

I've already identified the Nim source code difference that (re)produces the problem, and the C code output created from that Nim source. What is needed is a forensic analysis of the assembly code differences, which I don't know how to do, nor really have the inclination (or time) to do even if I did. It would also obviously be interesting (and rigorous) to see if/how this phenomenon exists on different hardware (AMD, ARM, PowerPC, etc.) and OS (BSD, Windows, macOS, iOS, Android, etc.) systems.
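One possible starting point for that forensic pass (just a sketch; the `cache_test`/`cache_test1` directory names are arbitrary choices of mine, and it assumes the gists are saved locally under the file names above) would be to build each variant into its own nimcache and then diff both the generated C and the disassembled binaries:

    # Build each variant into its own nimcache so the generated C can be compared.
    nim c --cc:gcc --d:release --threads:on --nimcache:cache_test  twinprimes_test.nim
    nim c --cc:gcc --d:release --threads:on --nimcache:cache_test1 twinprimes_test1.nim

    # Compare the generated C; the difference should stay localized around twins_sieve.
    diff -ru cache_test cache_test1 | less

    # Disassemble the two binaries and compare the emitted machine code.
    objdump -d ./twinprimes_test  > test.asm
    objdump -d ./twinprimes_test1 > test1.asm
    diff -u test.asm test1.asm | less

The part worth staring at should be the code emitted for `twins_sieve` in each binary; everything else ought to be essentially identical.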
If the devs don't have even a basic level of intellectual inquisitiveness (pride?) to understand why this phenomenon exists (and would then have to ultimately `fix` it), I don't know what more data, motivation, or incentive is needed.
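For anyone who wants to generate that data on other hardware or OSes, here is a minimal reproduction sketch; it assumes the two gists are saved locally under the file names used above and simply automates the exact build/run commands from this post:

    #!/bin/sh
    # Build both variants with each backend compiler, then run both inputs.
    # The timing numbers are whatever the programs themselves report.
    for cc in gcc clang; do
      nim c --cc:$cc --d:release --threads:on twinprimes_test.nim
      nim c --cc:$cc --d:release --threads:on twinprimes_test1.nim
      for n in 500_000_000_000 1_000_000_000_000; do
        echo "== backend: $cc  input: $n =="
        echo $n | ./twinprimes_test
        echo $n | ./twinprimes_test1
      done
    done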