Re: Execution speed Nim vs. Python

2017-09-27 Thread cblake
(I should perhaps qualify beyond -d:release using gcc or clang with high 
optimization levels on the back end since jil210 revealed in [another 
thread](https://forum.nim-lang.org/t/3198) the baseline 5X performance mystery 
arose from using tcc as a back end.)

Also, for the curious, the reason C++ STL will usually underperform Nim's hash 
tables in benchmarks like this is that STL iterator deletion semantics make the 
natural STL hash table collision resolution implementation choice be external 
chaining. Those extra linked list indirections add latency, especially when 
tables do not fit in on-CPU cache.


Re: Execution speed Nim vs. Python

2017-09-27 Thread cblake
Nim `Table` and `HashSet` should be caching the hash code values in their 
tables and also using that as a "comparison prefix" (comparing string values 
only if the integer hash codes already match). The string hash of the new 
inputs is ineliminable - or rather has to be done at least once anyway.

My guess is the lines are long-ish (greater than 10 chars, say). The default 
Nim hash string function could be faster, especially for longer strings. E.g., 
the current default does byte-at-a-time manipulations and usually 
8-byte/word-at-a-time can get about as good bit mixing much faster, especially 
for long strings like lines in files. Such a replacement might take that 60% 
hashing time in this particular benchmark down to 10-15% for a 2X-ish overall 
speed up.

I have benchmarked C++ for these kinds of tests and I suspect the current STL 
would not be faster than Nim with the -d:release in this case. (C++, too could 
benefit from faster default string hash).


Re: Execution speed Nim vs. Python

2017-09-27 Thread Varriount
Python caches hashing of strings. Nim does not (it would be a challenge, as Nim 
strings are mutable). I suggest using string references or Hash objects if you 
want to compare performance.

Has anyone benchmarked C++ for this kind of test?


Re: Execution speed Nim vs. Python

2017-09-26 Thread jil210
Please help speedup with the following nim code taking five times longer than 
python version: import os, sets, strutils

if paramCount() != 2:
echo "mydiff file1 file2" quit()

let file1 = open(paramStr(1)) let file2 = open(paramStr(2))

let old_lines = file1.readAll().splitLines() let new_lines = 
file2.readAll().splitLines() file1.close() file2.close()

let old_lines_set = toSet(old_lines) let new_lines_set = toSet(new_lines)

let old_added = old_lines_set - new_lines_set let old_removed = new_lines_set - 
old_lines_set

for line in old_lines:


if line in old_added:
echo "-", line.strip()
elif line in old_removed:
echo "+", line.strip()
for line in new_lines:


if line in old_added:
echo "-", line.strip()
elif line in old_removed:
echo "+", line.strip()


Re: Execution speed Nim vs. Python

2016-08-11 Thread euant
@didlybom: In C#, we call them IEnumerable which is quite a good name. 
Something like enumerable[int] or iterable[int] might make more sense than 
openarray does, though it's a fairly substantial change.


Re: Execution speed Nim vs. Python

2016-08-11 Thread didlybom
I think it was already discussed a while ago but I think this thread shows that 
"openarray" is not a very good name. IMHO it really does not convey what it 
really means. Something else such as "anyarray" or "arraylike" might be less 
accurate but easiert o inderstand (again, IMVHO). 


Re: Execution speed Nim vs. Python

2016-08-11 Thread Araq
Since seqs and strings have value semantics we can implement them with C++'s 
"Small vector optimization" and don't need more complexity in the language (but 
in the runtime).


Re: Execution speed Nim vs. Python

2016-08-11 Thread cblake
flyx wrote:

> And yes, something between array and seq would be nice. But currently, Nim's 
> type system does not allow a type with a runtime-calculated length parameter. 
> Allowing it would be a hack similar to C VLAs.

No hack involved here. The length parameter can be part of the object like seq 
rather than part of the type like array. openarray-accepting procs should not 
care since they can receive seqs anyway with run-time variable everything or 
arrays with compile-time fixed everything.

One might argue that I/others could just do their own concept which is true. 
The standard lib has lots of openarray-accepting routines that would work fine 
with post-init-fixed seq-like objects, though. So, not having such a type in 
the standard lib that's part of openarray feels unnecessarily inflexible, but 
on this I think we agree since you said it would be nice. It may just be easier 
than first imagined. :)


Re: Execution speed Nim vs. Python

2016-08-11 Thread wiffel
> Weirdly enough that code runs ~25% faster for me when changing res from 
> string to seq[char], 1.51 instead of 1.99. I have no clue why.

@Tarmean: Strange indeed ...

I did try that too, but the performance was the same.


Re: Execution speed Nim vs. Python

2016-08-11 Thread flyx
cblake: C11 made VLAs an optional feature though. I consider C VLAs a very poor 
solution; as you say, it's mostly syntactic sugar for alloca.

And yes, something between array and seq would be nice. But currently, Nim's 
type system does not allow a type with a runtime-calculated length parameter. 
Allowing it would be a hack similar to C VLAs.


Re: Execution speed Nim vs. Python

2016-08-11 Thread cblake
@flyx - c99 (and gcc long before it) have VLAs of the 
fixed-after-initialization variety...mostly syntactic sugar for alloca, of 
course. All that is surely more commonly used than Ada and required no fancy 
containers or any constraints about being a leaf function. Such an array style 
may be different enough from the post-init-resizable Nim seqs to warrant a 
different type, but it could still be an openarray.

I think Nim needs a kind of openarray between array and seq - one which is 
run-time-sized, but fixed after initialization and with backing memory not 
necessarily on the heap. For example, an array of char that is backed by a 
read-only memory mapped file and so cannot be zero-terminated, but being from a 
file its size cannot change after initialization and yet the size can also 
never be known at compile time. Then it could get bounds checking like other 
openarrays and otherwise behave similarly. Maybe that's not the best example, 
but I'm sure there are many more. Lots of times the sizes of things are only 
known at run-time, but don't change once known/computed. Also, many openarray 
accepting routines need not modify their inputs -- why, they cannot change 
lengths of arrays, for example. So, it would be nice if all openarray accepting 
routines could work with such a type, whatever it might be called. Maybe there 
is one and I am just unaware.


Re: Execution speed Nim vs. Python

2016-08-11 Thread mora
Just a quick idea: if you don't understand what happens under the hood, then 
you can peek at nimcache/*.c (or even better, you can diff two versions).

> Peter


Re: Execution speed Nim vs. Python

2016-08-11 Thread Tarmean
wiffel

Weirdly enough that code runs ~25% faster for me when changing res from string 
to seq[char], 1.51 instead of 1.99. I have no clue why.


Re: Execution speed Nim vs. Python

2016-08-10 Thread flyx
Stefan_Salewski:

> May it be possible to have one single Seq in each proc on the stack? Of 
> course it has to be the last element, and can only work when that proc do not 
> call other procs.

That may be possible, but far too complicated with far too few impact on 
performance. The compiler would need to properly place the seq, statically 
analyze that no other procs are called, and add special transformation code 
every time you copy this seq. If you have, somewhere in your proc, some code 
`a.b`, where `b` is some field in `a``s type, and you transform ``b` into a 
getter proc at some point in the future, you code suddenly wouldn't be able to 
compile anymore. And this is just one example why it is a really bad idea.

Ada is the one language I know which went a long way for arrays of dynamic 
length which can still be stored on the stack. The only restriction is that an 
array cannot change size after initialization. Therefore, it does not cover all 
use cases and it still needs dynamic container types in its standard library. 
Also, handling those arrays is a bit more difficult than in other programming 
languages. Everything comes at a price.


Re: Execution speed Nim vs. Python

2016-08-10 Thread Arrrrrrrrr
> Seqs in nim have value type semantics. Your inner loop ( for d in Directions) 
> actually assigns Directions[...] to d on every iteration which involves full 
> copy of the seq.

How strange that it actually makes a new copy when mitems doesn't. What's the 
reason of this?


Re: Execution speed Nim vs. Python

2016-08-10 Thread wiffel
@yglukhov, @def : If this is indeed a performance contest ... :)

This code is about 10x faster. (and I only cheated a tiny little bit :) )

A 100 million iterations runs in less than 2 seconds on my machine.


import times

const
  count = 100_000_000
  NE = [8, 2, 13, 4, 5, 7, 8, 3, 9, 12, 7, 8, 3, 9, 12]
  Directions = [NE, NE, NE, NE]

proc do_something(board: string, res: var string) =
  for b in board.low .. board.high:
for d in Directions.low .. Directions.high:
  res[b * Directions.len + d] = board[Directions[d][b]]

var
  board = "0p.pP.pP.p.pP.p"
  res = newStringOfCap(board.len * Directions.len)

let start = cpuTime()
for i in 1 .. count:
  do_something(board, res)

echo "** Nim "
echo "** Time   : ", $(cpuTime() - start), " seconds"
echo "** Counts : ", count
echo "** Result : ", res
echo "***"



Re: Execution speed Nim vs. Python

2016-08-10 Thread Stefan_Salewski
> that it is really the const seq which is causing most of the slowdown

That is very interesting! I strongly assume that const seg is identical to 
const array?

Have you tested? So this may indicate that we should avoid const arrays when 
speed is important.


Re: Execution speed Nim vs. Python

2016-08-10 Thread chemist69
The const array is fast, the const seq seems to be very slow, much worse than 
in the var array vs. var seq case. So, const seq definitely is not identical to 
const array.

To me, the takeaway message is: do not use const seq (probably does not make 
much sense to use anyway) and use const array instead.


Re: Execution speed Nim vs. Python

2016-08-10 Thread chemist69
I would like to point out, that it is really the **const seq** which is causing 
most of the slowdown in the original version (and a const seq does not make 
much sense, if you think about it and should maybe be complained about by the 
compiler). If you use a var seq instead, the difference to the var array is 
much less pronounced. This takes 0.9 sec for 1_000_000 iterations: 


var NE = @[8, 2, 13, 4, 5, 7, 8, 3, 9, 12, 7, 8, 3, 9, 12]
var Directions = [NE, NE, NE, NE]


while this takes 0.6 sec: 


var NE = [8, 2, 13, 4, 5, 7, 8, 3, 9, 12, 7, 8, 3, 9, 12]
var Directions = [NE, NE, NE, NE]


Compare this to 11 vs. 1 sec for the const case mentioned earlier.


Re: Execution speed Nim vs. Python

2016-08-10 Thread Stefan_Salewski
> flyx: Seqs always need to be allocated on the heap.

Sorry, do not now much about it...

May it be possible to have one single Seq in each proc on the stack? Of course 
it has to be the last element, and can only work when that proc do not call 
other procs. 


Re: Execution speed Nim vs. Python

2016-08-10 Thread flyx
> I do not understand the use of the openarray parameter in relation to 
> performance.

`openarray` has nothing to do with performance. It is just a feature to allow 
implementing procs that take both an `array` and a `seq` as parameter.

The actual performance gain stems from using arrays instead of seqs. Because an 
array's length is known at compile time, it can be allocated on the stack. Seqs 
always need to be allocated on the heap. Therefore, it is always advisable to 
use arrays when possible. Other performance optimizations are explained in the 
previous posts.

> If the length of the output array of a proc is not known in advance, are 
> there general rules to accomplish a good performance?

If you can calculate the target length of the array efficiently before actually 
filling it, initialize the result seq with as many elements as needed (as shown 
in the other posts) or use `newSeqOfCap` if you're on devel or once this has 
made it into a release.


Re: Execution speed Nim vs. Python

2016-08-10 Thread akalverboer
The answers were very instructive. Thanks. Two questions.

I do not understand the use of the openarray parameter in relation to 
performance. Is it better for the speed than using seq[char] ?

If the length of the output array of a proc is not known in advance, are there 
general rules to accomplish a good performance? For example using tables.


Re: Execution speed Nim vs. Python

2016-08-09 Thread chemist69
Hi, I am still very new to Nim and played around with the different solutions 
given here. To me, the biggest difference is made by changing the type of NE 
from seq to array: With this line: 


const NE = @[8, 2, 13, 4, 5, 7, 8, 3, 9, 12, 7, 8, 3, 9, 12]


and 1_000_000 iterations it takes around 11 sec on my Linux box. With this line 
instead: 


const NE = [8, 2, 13, 4, 5, 7, 8, 3, 9, 12, 7, 8, 3, 9, 12]


the time is reduced to ~ 1 sec.


Re: Execution speed Nim vs. Python

2016-08-09 Thread Sixte
With the original version I got 1.822 s runtime. With yglukhov's (first) 
version, I got 0.012 s runtime, replacing Directions.len with Directions.high.

Python runtime (for comparison) not available. 


Re: Execution speed Nim vs. Python

2016-08-09 Thread def
Optimization, my favorite Nim topic!

Runtime of your version for me: 0.766 s


res.add board[d[i]]


This line resizes the `res` seq, instead you should allocate it to have the 
correct size directly. Like this: 


proc do_something(board: seq[char]): seq[char] =
   var res = newSeq[char](board.len * Directions.len)
   for i, p in board:
  for j, d in Directions:
 res[i * Directions.len + j] = board[d[i]]
   return res


New runtime: 0.073 s

With the new devel branch of Nim you can also use `newSeqOfCap` if you don't 
want to manually calculate the index:


proc do_something(board: seq[char]): seq[char] =
   var res = newSeqOfCap[char](board.len * Directions.len)
   for i, p in board:
  for d in Directions:
 res.add board[d[i]]
   return res


As a workaround you can also do `var res = newSeq[char](board.len * 
Directions.len); res.setLen(0)` to achieve the same effect.


res = do_something(board)


You create a new `res` seq for every call to do_something. Instead you could 
reuse a single data structure and pass it to `do_something`. But that's 
probably not what you want to benchmark then.

Full code with some cleanup (use implicit result variable, no `$` for echo, 
arrays instead of seqs, more consts):


# Nim example to test execution speed against a Python version
import times, os

const
  board = ['0', 'p', '.', 'p', 'P', '.', 'p', 'P', '.', 'p', '.', 'p', 'P', 
'.', 'p']
  NE = [8, 2, 13, 4, 5, 7, 8, 3, 9, 12, 7, 8, 3, 9, 12]
  Directions = [NE, NE, NE, NE]  # for short: in real application the array 
elements differ

proc do_something(board: openarray[char]): seq[char] =
  result = newSeqOfCap[char](board.len * Directions.len)
  for i in 0 .. board.high:
for d in Directions:
  result.add board[d[i]]

let t0 = cpuTime()
var res: seq[char]
const count = 100_000
for i in 1..count:
  res = do_something(board)
let t1 =  cpuTime()
echo "* Time elapsed for Nim: ", t1 - t0, "  Counts: ", count
echo res.repr


New runtime: 0.065 s

For comparison, Python runtime: 0.783 s


Re: Execution speed Nim vs. Python

2016-08-09 Thread def
@yglukhov: Ah, you don't make copies of the entries of Directions. I think 
that's an optimization the compiler should be able to do somehow. Also, you can 
use `0 .. Directions.high` instead of `0 ..< Directions.len`. Using `mitems` 
instead achieves the same effect, but needs a `var`:


# Nim example to test execution speed against a Python version
import times, os

const
  board = ['0', 'p', '.', 'p', 'P', '.', 'p', 'P', '.', 'p', '.', 'p', 'P', 
'.', 'p']
  NE = [8, 2, 13, 4, 5, 7, 8, 3, 9, 12, 7, 8, 3, 9, 12]
var Directions = [NE, NE, NE, NE]  # for short: in real application the 
array elements differ

proc do_something(board: openarray[char]): seq[char] =
  result = newSeqOfCap[char](board.len * Directions.len)
  for i in 0 .. board.high:
for d in Directions.mitems:
  result.add board[d[i]]

let t0 = cpuTime()
var res: seq[char]
const count = 100_000
for i in 1..count:
  res = do_something(board)
let t1 =  cpuTime()
echo "* Time elapsed for Nim: ", t1 - t0, "  Counts: ", count
echo res.repr



Re: Execution speed Nim vs. Python

2016-08-09 Thread yglukhov
@def, your version is 0.08 for me. My version is 0.05 ;) 


import times, os

const
  board = ['0', 'p', '.', 'p', 'P', '.', 'p', 'P', '.', 'p', '.', 'p', 'P', 
'.', 'p']
  NE = [8, 2, 13, 4, 5, 7, 8, 3, 9, 12, 7, 8, 3, 9, 12]
  Directions = [NE, NE, NE, NE]  # for short: in real application the array 
elements differ

proc do_something(board: openarray[char]): seq[char] =
   result = newSeqOfCap[char](board.len * Directions.len)
   for i in 0 .. board.high:
  for d in 0 ..< Directions.len:
 result.add board[Directions[d][i]]

let t0 = cpuTime()
var res: seq[char]
const count = 10
for i in 1..count:
   res = do_something(board)
let t1 =  cpuTime()
echo "* Time elapsed for Nim: ", t1 - t0, "  Counts: ", count



Re: Execution speed Nim vs. Python

2016-08-09 Thread yglukhov
Seqs in nim have value type semantics. Your inner loop ( for d in Directions) 
actually assigns Directions[...] to d on every iteration which involves full 
copy of the seq. Here's the patched version. I have also preallocated result 
seq. 


import times, os

var board: seq[char] = @['0', 'p', '.', 'p', 'P', '.', 'p', 'P', '.', 'p', 
'.', 'p', 'P', '.', 'p']
const NE: seq[int] = @[8, 2, 13, 4, 5, 7, 8, 3, 9, 12, 7, 8, 3, 9, 12]
const Directions = [NE, NE, NE, NE]  # for short: in real application the 
array elements differ

proc do_something(board: openarray[char]): seq[char] =
  result = newSeq[char](board.len * Directions.len)
  var ir = 0
  for i, p in board:
 for d in 0 ..< Directions.len:
result[ir] = board[Directions[d][i]]
inc ir

let t0 = cpuTime()
var res: seq[char]
let count = 10
for i in 1..count:
  res = do_something(board)
let t1 =  cpuTime()
echo "* Time elapsed for Nim: ", $(t1 - t0), "  Counts: ", $count