Re: Can I access arrays faster than this?

2019-03-04 Thread Varriount
Hm, when I compile the C++ code, I get a segfault.


Re: Can I access arrays faster than this?

2019-03-04 Thread ggibson
That's pretty weird! It works for me with Clang v5 and GCC 6.4 on 64-bit CentOS 7.


Re: Can I access arrays faster than this?

2019-03-04 Thread Varriount
No... I'm running it on a MacBook with 16 GB of RAM.


Re: Can I access arrays faster than this?

2019-03-04 Thread ggibson
ookay, I think it was a memory fragmentation error, so probably it would have 
gone away if you had rebooted. I restructured the example to grab heap memory 
instead. Should work now!


Re: Can I access arrays faster than this?

2019-03-04 Thread ggibson
Hats off to you, sir. Thank you for pointing that out. I was under the 
impression `--opt:speed` was the default - nope.


Re: Can I access arrays faster than this?

2019-03-04 Thread Varriount
Ok, so I was able to get the Nim and C++ examples to compile when using the 
heap (rather than the stack).

Comparing the code, I think the difference in performance is caused by two 
things:


  * The Nim code isn't in a main procedure
  * The C++ code is using `int`s (32 bits) while Nim is using `int`s (64 bits). 
Nim's `int` type is always the size of a pointer, so you can use it for 
indexing arrays. The difference in size between those types means that the Nim 
program is processing twice as much data (see the sketch below).
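A quick way to see that size difference (a minimal sketch, assuming a 64-bit 
target): 


echo sizeof(int)      # 8 on a 64-bit target: pointer-sized
echo sizeof(int32)    # always 4
echo sizeof(pointer)  # 8 on a 64-bit target, same as sizeof(int)


Run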



The time I got for the modified code was about the same: 


/tmp $>time ./temp
result: 1998

real    0m0.847s
user    0m0.792s
sys     0m0.048s

/tmp $>time ./temp_cpp
result: 1998

real    0m0.905s
user    0m0.851s
sys     0m0.051s


Run

And the code I used: 


## compiled with: nim -d:r c filename
## nim v 0.19.9

proc main =
  const N = 20_000_000
  var data = newSeqUninitialized[int](N)
  # custom init
  for i in 0 ..< N:
    data[i] = i
  # busy work
  for r in 1 .. 50:
    for i in 3 ..< N-1:
      data[i] = (data[i+1]-data[i])+data[i-1]
  echo "result: ",data[N-2]

main()


Run


// compiled with: c++ -O3 -o filename filename.cpp
#include <iostream>
const int N = 2000;
int main()
{
  size_t* data = new size_t[N];
  // custom init
  for (size_t i=0; i<N; i++)
    data[i] = i;
  // busy work
  for (size_t r=0; r<50; r++) {
    for (size_t i=3; i<N-1; i++)
      data[i] = (data[i+1]-data[i])+data[i-1];
  }
  std::cout << "result: " << data[N-2] << std::endl;
}


Run

Re: Can I access arrays faster than this?

2019-03-04 Thread Ward
try "nim c -d:release --opt:speed"


Re: Can I access arrays faster than this?

2019-03-05 Thread cblake
If you can't break your habit, you can always add a few lines near the top 
(before all the `release`-dependent switching) of your `nim.cfg`: 


@if r:   #Allow short alias -d:r to activate fast release mode
  define:release
@end


Run

Perhaps somewhere you picked up a `$HOME/.config/nim.cfg` that does exactly 
this, and then lost it somehow moving between accounts/machines, or maybe when 
converting `nim.cfg` to `.nims`? There's surely also some similar 
NimScript/`.nims` variant.
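For what it's worth, a sketch of how that NimScript variant might look in a 
`config.nims` (an untested assumption on my part, mirroring the `@if` block 
above): 


# hypothetical config.nims equivalent: let -d:r switch on release mode
when defined(r):
  switch("define", "release")


Run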


Re: Can I access arrays faster than this?

2019-03-05 Thread ggibson
That's a neat trick! Thanks for mentioning it. I'll have to learn about how the 
non-`.nim` files all actually work as I've been avoiding them.


Re: Can I access arrays faster than this?

2019-03-05 Thread ggibson
> * The Nim code isn't in a main procedure
> 
> * The C++ code is using `int`s (32 bits) while Nim is using `int`s (64 bits). 
> Nim's `int` type is always the size of a pointer, so you can use it for 
> indexing arrays. The difference in size between those types means that the 
> Nim program is processing twice as much data.

The 32-bit / 64-bit change made a noticeable difference - that was an excellent 
suggestion. I looked around for a nice way to enforce changes to all my 
literals and types, and found an example 
[here](https://forum.nim-lang.org/t/1267#19038) by @Araq that I've modified to 
be more generic. I love that Nim supports this! I may rework it and submit it 
as a module since it's so useful.


Re: Can I access arrays faster than this?

2019-03-05 Thread Stefan_Salewski
> Now that it's all working (me now understanding what's going on) I looked 
> around for a nice way to enforce changes to all my literals and types, and 
> found an example here by

While Araq's example macro is very interesting, I strongly suspect that the 
performance impact of float literal types is only minimal in most cases. For 
CPU computations, what generally makes the performance difference is that twice 
as many float32 values fit in the caches compared to float64. The math 
operations in the FPU itself make no real difference (on x86 CPUs), and I would 
even guess that the C backend can optimize them when the result is stored in a 
float32 var. For ARM CPUs or GPU code, literal types may indeed make a 
difference. Do you have an example of this case already?
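As a rough way to measure the cache/bandwidth effect being described, here is a 
minimal micro-benchmark sketch (not from the thread; the proc name, N, and the 
update formula are arbitrary) that streams over equal element counts of 
`float32` and `float64`: 


import times

proc busy[T](a: var seq[T]) =
  # streaming update with no loop-carried dependency, so it can vectorize
  for r in 1 .. 20:
    for i in 0 ..< a.len:
      a[i] = a[i] * T(1.0001) + T(1.0)

proc bench() =
  const N = 10_000_000
  var a32 = newSeq[float32](N)
  var a64 = newSeq[float64](N)
  for i in 0 ..< N:
    a32[i] = float32(i mod 100)
    a64[i] = float64(i mod 100)
  var t = cpuTime()
  busy(a32)
  echo "float32: ", cpuTime() - t, " s"
  t = cpuTime()
  busy(a64)
  echo "float64: ", cpuTime() - t, " s"

bench()


Run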


Re: Can I access arrays faster than this?

2019-03-06 Thread cblake
Either automatic or manual vectorization can also allow twice as many `float32` 
numbers to be handled per vector instruction vs `float64` on x86, just like the 
ARM or GPU cases. You may need `-march=native` or `-mavx` compiler flags (or 
manual intrinsics/assembly) to activate that feature, though, instead of 
targeting some lowest-common-denominator x86 CPU, and C compiler 
autovectorization can be finicky.
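For example (a sketch - `filename` is a placeholder, and `--passC` is just the 
standard switch for forwarding flags to the C backend): 


nim c -d:release --passC:-march=native filename.nim
c++ -O3 -march=native -o filename filename.cpp


Run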

It is true that many calculations are memory-bandwidth bound, and there you 
still get the 2x improvement simply from moving half as many bytes. However, 
many are not membw bound, or the data may sit in fast caches. For those, 
vectors holding twice as many values help. (Funny - caches used to be almost 
entirely about latency but have become about both latency & bandwidth in recent 
times.)

Obviously, getting the wrong answer faster is not helpful, but `float32` often 
is close to 2x faster, depending on how vectorizable your work is, the 
compiler, and compiler flags (and/or manual assembly). Excess precision is also 
not helpful if its cost is not minimal.


Re: Can I access arrays faster than this?

2019-03-06 Thread ggibson
> While Araqs example macro is very interesting, I strongly assume that 
> performance impact of float literal types is only minimal in most cases.

But that's for floats, perhaps. The only reason I went down that road is 
because I found a noticeable speedup by putting `'i32` and `int32` everywhere, 
but it was easy to miss decorating a literal, and it made the code visually 
very messy.

The example is the first-post code but with N = 100 million, and I run that exe 
10 times. On my machine the C++ (int32) takes about 7.5 seconds per run, and 
the Nim (int64) 8.5 seconds per run. If I liberally decorate with `'i32`, then 
the Nim version's runtime becomes identical to the C++ version's (a sketch of 
what that decoration looks like follows below).
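To give an idea of what "liberally decorate" means, here is a hypothetical 
rendition of the busy-work loop from earlier in the thread with everything 
forced to `int32` (the exact bounds mirror that code, not my real benchmark): 


const N = 100_000_000'i32
var data = newSeqUninitialized[int32](N)
# custom init: every literal and loop index needs its own 'i32
for i in 0'i32 ..< N:
  data[i] = i
# busy work
for r in 1'i32 .. 50'i32:
  for i in 3'i32 ..< N - 1'i32:
    data[i] = (data[i+1] - data[i]) + data[i-1]
echo "result: ", data[N-2]


Run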

Following @cblake's suggestion, I tried `-march=native` and/or `-mavx`, with no 
benefit for me.


Re: Can I access arrays faster than this?

2019-03-06 Thread ggibson
My current extension of @Araq's macro looks like this:


import macros, typetraits

proc replace(n: NimNode; typesrc,typedst: typed;
             kndsrc,knddst: NimNodeKind): NimNode =
  if n.kind == kndsrc:
    when not defined(release): echo "replacing ",n.repr," with ",knddst
    result = newNimNode(knddst)
    case kndsrc:
    of nnkFloatLit: result.floatVal = n.floatVal
    of nnkIntLit: result.intVal = n.intVal
    else: discard
  elif n.repr == typesrc.repr:
    when not defined(release): echo "replacing ",n.repr," with ",typedst.repr
    result = newIdentNode(typedst.repr)
  else:
    result = copyNimNode(n)
    # recurse into child nodes
    for i in 0 ..< n.len:
      result.add replace(n[i], typesrc, typedst, kndsrc, knddst)


Run

Re: Can I access arrays faster than this?

2019-03-06 Thread Stefan_Salewski
I really like that macro, because it is a nice example for explaining the power 
of Nim macros to new users. It is not too complicated, and the use case is easy 
to understand.

But note that using the int32 data type is not that hard without it, if really 
desired: at most 3 locations would need a fix:


proc doit =
  var a = [3i32, 3, 6]
  for i in 0.int32 ..< 3:
    echo a[i]
    doAssert i is int32
    doAssert a[i] is int32
  for i in low(a).int32 .. high(a):
    echo a[i]
    doAssert i is int32
    doAssert a[i] is int32

doit()



Run


Re: Can I access arrays faster than this?

2019-03-07 Thread mratsim
Also, `for r in 1 .. 50:` does more work than the C `for (size_t r=0; r<50; r++) {`


Re: Can I access arrays faster than this?

2019-03-07 Thread cblake
@mratsim is probably intending to refer to the `..` including `50` in Nim while 
the C `for` with a `<` excludes `50`, doing about 2% less work, but the 
terseness and style of his comment may leave the wrong impression. 

for i in 0..50: echo i


Run

indeed compiles down (in release mode) to simply: 


NI res = ((NI) 0);
while (1) {
  if (!(res <= ((NI) 50))) goto LA3;
  res += ((NI) 1);
}
LA3: ;


Run

plus some stuff only related to my choice of `echo` for the body (which I 
removed for clarity). Any decent optimizing C compiler should treat those two 
ways to spell the loop (@mratsim's `for` and the above `while`) the same.

TL;DR: any extra time comes from the bounds of the iteration, not from the 
language or iterator overhead.
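(For anyone who wants to check this themselves: point the cache at a known 
directory and read the generated C - `loops.nim` is just a placeholder name for 
a file holding the loop above.) 


nim c -d:release --nimcache:./nimcache loops.nim
# the translated .c files end up under ./nimcache


Run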


Re: Can I access arrays faster than this?

2019-03-07 Thread miran
> @mratsim is probably intending to refer to the .. including 50 in Nim while 
> the C for with a < excludes 50

I doubt that because he has written **1** .. 50 ;)


Re: Can I access arrays faster than this?

2019-03-07 Thread cblake
With such a brief comment it's hard to know, which is why I said "probably 
intending". Only one person knows. ;) Maybe he did think iterators cost more.

You are right, I did misread his 1..50 as 0..50 {after looking at the first 
version of the ggibson Nim code, not the 2nd, where he confusingly switched to 
1..50, no longer paralleling the C as well but correcting his amount-of-work 
mismatch}.


Re: Can I access arrays faster than this?

2019-03-07 Thread ggibson
@cblake True, true. I was simply enjoying that I could write "one through 
fifty, so fifty times" very simply and readably in Nim, whereas I just relied 
on trained C eyes to read the C code with its intended meaning of "zero up 
until 50, meaning 50 times". Perhaps Nim's `countup()` would have been even 
more appropriate.

@Stefan_Salewski You're making fun of my specific example! :) Yes, adding `i32` 
wasn't that cumbersome in THAT example, but I'm sure you can imagine scenarios 
where it would get more annoying - this was only an illustration of how it 
works. Also, you have to be sure you didn't miss one, not to mention you 
already have to annotate by hand any relevant var types in the signature, since 
the macro doesn't handle that part.


Re: Can I access arrays faster than this?

2019-03-10 Thread cdunn2001

nim -d:r c foo.nim
...
Hint: operation successful (12405 lines compiled; 0.251 sec total; 16.414MiB peakmem; Debug Build) [SuccessX]


Run

That's still a "Debug Build". You need `-d:release`.


Re: Can I access arrays faster than this?

2019-03-10 Thread cdunn2001

# nim: et
## compiled with: nim -d:release c filename
## nim v 0.19.9
proc main() =
  const N = 20_000_000;
  #var data {.noinit.}: array[N,int32]
  var data {.noinit.} = newSeq[int32](N)
  # custom init
  for i in 0'i32 ..< N:
    data[i] = i
  # busy work
  for r in 1 .. 49:
    for i in 3 ..< N-1:
      data[i] = (data[i-1]+data[i+1]) div 2
  echo "result: ",data[N-2]
when isMainModule:
  main()


Run


$ time ./speed-nim
result: 1998

real    0m1.576s
user    0m1.527s
sys     0m0.035s

$ time ./speed-cpp.exe
result: 1998

real    0m1.591s
user    0m1.543s
sys     0m0.035s


Run

Does anyone know whether `{.noInit.}` applies to `newSeq()`? Just curious.


Re: Can I access arrays faster than this?

2019-03-10 Thread miran
> Does anyone know whether `{.noInit.}` applies to `newSeq()`?

You could use 
[newSeqUninitialized](https://nim-lang.github.io/Nim/system.html#newSeqUninitialized%2CNatural).
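
For example, a minimal sketch (the length here is arbitrary): 


# allocates without zero-filling, so every element must be written before use
var data = newSeqUninitialized[int32](20_000_000)
for i in 0 ..< data.len:
  data[i] = int32(i)


Run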