In order to be taken `super, suPER, SUPER` seriously, let me be hermetically 
sealed and scientifically rigorous in presenting this `problem`.

Base Hardware:

System76 laptop with an Intel i7-6700HQ CPU, 2.6-3.5 GHz clock, 4 cores, 8 
threads, 16GB of memory, and a 128GB SSD.

Base OS:

Linux kernel 4.11.11, gcc 4.9.2|clang 3.9.1, PCLinuxOS 64-bit.

(This is a pre-Meltdown/Spectre-patch system.)

VB (VirtualBox) OSs: Linux kernels 4.15.x-4.16.x, gcc 7.3.0/7.3.1, clang 3.9.1-6.0

(These are post-Meltdown/Spectre kernel-patched systems.)

The results shown here are from my base system, but the `problem` shows up 
consistently with every combination of kernel and gcc or clang on the VB-based 
systems, for 64-bit Linux distros.

Here are the gists of the tested code:

The only difference between the two is line 239 in `twins_sieve`.

`twinprimes_test.nim`

[https://gist.github.com/jzakiya/e140e9f3d660059631b2bb09487220f9](https://gist.github.com/jzakiya/e140e9f3d660059631b2bb09487220f9)

`twinprimes_test1.nim`

[https://gist.github.com/jzakiya/8f7768c8c9f6e925b200c5f463a2f95c](https://gist.github.com/jzakiya/8f7768c8c9f6e925b200c5f463a2f95c)

They were compiled with the following flags, and run on a `quiet system`.

(Rebooted, opened only a terminal, and ran tests.)
    
    
    nim c --cc:gcc --d:release --threads:on twinprimes_test.nim
    
    nim c --cc:gcc --d:release --threads:on twinprimes_test1.nim
    
    then run
    
    echo 500_000_000_000 | ./twinprimes_test
    
    echo 500_000_000_000 | ./twinprimes_test1
    
    echo 1_000_000_000_000 | ./twinprimes_test
    
    echo 1_000_000_000_000 | ./twinprimes_test1
    
    
    then compile as
    
    
    nim c --cc:clang --d:release --threads:on twinprimes_test.nim
    
    nim c --cc:clang --d:release --threads:on twinprimes_test1.nim
    
    and run
    
    echo 500_000_000_000 | ./twinprimes_test
    
    echo 500_000_000_000 | ./twinprimes_test1
    
    echo 1_000_000_000_000 | ./twinprimes_test
    
    echo 1_000_000_000_000 | ./twinprimes_test1
    

Whether compiled with gcc or clang, as the input values become bigger, 
`twinprimes_test1`'s times become increasingly slower, as a percentage of 
`twinprimes_test`'s, approaching on the order of 10% for the two data points 
shown. For bigger inputs the differences grow larger.
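
The compile-and-run sequence above can be wrapped in a small timing script to 
make the comparison repeatable. This is only a sketch, assuming bash and GNU 
`/usr/bin/time` are available and the two binaries sit in the current directory:

```shell
#!/usr/bin/env bash
# Sketch: run both binaries on both inputs and record wall-clock times.
for n in 500_000_000_000 1_000_000_000_000; do
  for prog in ./twinprimes_test ./twinprimes_test1; do
    echo "== $prog  n=$n =="
    # GNU time's -f "%e" prints elapsed (wall-clock) seconds
    echo "$n" | /usr/bin/time -f "%e s" "$prog"
  done
done
```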

(On a good note, I was pleasantly surprised to see clang produce faster times 
for this particular architecture, at least on my base system, as it had always 
been slower before.)
    
    
    Input Number |     twinprimes_test     |     twinprimes_test1    |
                 | gcc 4.9.2 | clang 3.9.1 | gcc 4.9.2 | clang 3.9.1 |
    ------------------------------------------------------------------
    5e11         |  28.926   |    28.241   |  31.759   |    30.970   |
    1e12         |  63.285   |    61.042   |  67.842   |    66.678   |
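
As a sanity check on the "~10%" claim, the relative slowdowns implied by the 
table can be computed directly; a throwaway awk one-liner using the numbers 
above:

```shell
# Percentage slowdown of twinprimes_test1 relative to twinprimes_test,
# computed from the table entries above.
awk 'BEGIN {
  printf "gcc   5e11: %.1f%%\n", (31.759/28.926 - 1) * 100
  printf "clang 5e11: %.1f%%\n", (30.970/28.241 - 1) * 100
  printf "gcc   1e12: %.1f%%\n", (67.842/63.285 - 1) * 100
  printf "clang 1e12: %.1f%%\n", (66.678/61.042 - 1) * 100
}'
# → gcc   5e11: 9.8%
#   clang 5e11: 9.7%
#   gcc   1e12: 7.2%
#   clang 1e12: 9.2%
```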
    

Even though the devs have exhibited an extreme lack of _curiosity_ (`willful 
blindness`) in acknowledging this `problem`, you should pay attention to it, 
even as just a user.

I only `discovered` this behavioral phenomenon through serendipity. How many 
similar occurrences of just `this` code phenomenon are lurking in your (or 
Nim's) codebase, unbeknownst to you, with their potential performance hit?

This is also, obviously, `a potential security vector`.

I've already identified the Nim source code difference that (re)produces the 
problem, and the C output the compiler creates for that Nim source. What is 
needed is a forensic analysis of the assembly code differences, which I don't 
know how to do, nor really have the inclination (or time) to do if I did.
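
For anyone who does want to take up that forensic analysis, one plausible 
starting point is a sketch like the following, assuming a toolchain with 
`objdump` and using Nim's `--nimcache` flag to keep each version's generated 
C separate:

```shell
# Keep each version's generated C output in its own cache directory,
# so the C-level differences can be diffed too.
nim c --cc:gcc --d:release --threads:on --nimcache:cache0 twinprimes_test.nim
nim c --cc:gcc --d:release --threads:on --nimcache:cache1 twinprimes_test1.nim

# Disassemble both binaries and diff the results to localize
# the divergent machine code.
objdump -d ./twinprimes_test  > test.asm
objdump -d ./twinprimes_test1 > test1.asm
diff -u test.asm test1.asm | less
```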

It would also obviously be interesting (and rigorous) to see if and how this 
phenomenon exists on different hardware (AMD, ARM, PowerPC, etc.) and OS (BSD, 
Windows, macOS, iOS, Android, etc.) systems.

If the devs don't have even a basic level of intellectual inquisitiveness 
(pride?) to understand why this phenomenon exists (and would ultimately have to 
`fix` it), I don't know what more data, motivation, or incentive is needed.
