Re: How do you check the presence of an annotation?

2017-11-12 Thread Udiknedormin
I'm not sure it's possible, actually. You can retrieve t's initialization value 
(!) using t.symbol.getImpl, which is useful when handling a const, but I can't 
see any way (or at least no obvious way) to extract variable pragmas from a 
variable symbol.
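
For reference, a minimal sketch of the const case, using the NimSym-based API mentioned above (the macro name is made up):


import macros

macro showInit(x: typed): untyped =
  # for a const symbol, getImpl returns its definition,
  # including the initialization value
  echo x.symbol.getImpl.treeRepr
  result = newEmptyNode()

const answer = 41 + 1
showInit(answer)   # prints the const's definition tree at compile time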


Re: Heterogeneous object pool with Timed Eviction

2017-11-12 Thread Udiknedormin
@mratsim Well, you mentioned game programming at first so "serious deep 
learning" didn't come to my mind. ^^"


Re: What's happening with destructors?

2017-11-12 Thread Udiknedormin
@mratsim Well, if you call GC_fullCollect manually then it's much worse than 
manual allocation and deallocation from a memory pool, I guess.

Could you explain further what getOccupiedMem changes in the case we were 
talking about (a ref to a ptr to memory external to the thread's heap)?


Re: atomics: Why is interlockedCompareExchange8 "safe"?

2017-11-12 Thread cdome
Actually, _InterlockedCompareExchange8 does exist. It was added silently in 
VS2012 and is not mentioned in the docs.
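
So a direct binding along these lines should work (a sketch, assuming VS2012 or later and that the intrinsic comes from `<intrin.h>`; the Nim proc name mirrors the one in atomics.nim):


proc interlockedCompareExchange8(p: pointer; exchange, comparand: byte): byte {.
  importc: "_InterlockedCompareExchange8", header: "<intrin.h>".}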


Re: Help wrapping a C++ module

2017-11-12 Thread jlp765
I submitted an example to the docs, but it is still [a 
PR](https://github.com/nim-lang/Nim/pull/6711), so have a look at the PR.

And try something like the following (and remove the push and pop pragmas):


const
  hdr = "nimfuzz/fts_fuzzy_match.h"

proc fuzzyMatchSimple*(pattern: cstring; str: cstring): bool {.
  importcpp: "fuzzy_match_simple", header: hdr.}


which will add the line


#include "nimfuzz/fts_fuzzy_match.h"


to the `nimcache/test.cpp` file and map the C++ function `fuzzy_match_simple` 
to the Nim proc `fuzzyMatchSimple()`.
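
A hypothetical call, assuming the header is on the include path and the file is compiled with `nim cpp`:


echo fuzzyMatchSimple("fzm", "fts_fuzzy_match")   # true: f, z, m appear in order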


Re: question about templates / namespaces / modules

2017-11-12 Thread Stefan_Salewski
Have you tried something like


when AVX2_Available():


Of course that can only work when your AVX2_Available() is known at compile 
time...
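
A minimal sketch of that idea, with a define-driven constant standing in for AVX2_Available() (the module names are hypothetical):


const AVX2_Available = defined(avx2)   # fixed at compile time, e.g. via -d:avx2

when AVX2_Available:
  import avx2_impl   # hypothetical AVX2 implementation module
else:
  import sse_impl    # hypothetical SSE fallback module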


question about templates / namespaces / modules

2017-11-12 Thread jackmott
Here is some pseudocode for what I would like to do:


template SIMD(actions: untyped) =
  if AVX2_Available():
    import AVX2   # use the AVX2 version of Add/Mul etc.
    actions
  else:
    import SSE    # use the SSE version of Add/Mul etc.
    actions

SIMD:
  let c = SIMD_Add(a, b)
  SIMD_Mul(b, c)


But I can't do that because imports have to be at the top level. I could use 
include instead of import, but that would lead to massive code size if someone 
used the template multiple times, as it would reproduce the same functions 
over and over.

Is there some way to accomplish this?


How do you check the presence of an annotation?

2017-11-12 Thread monster
I'd like to check that some "memory location" is annotated with {.volatile.}, 
to make sure my code only compiles if I use {.volatile.} in the right place. I 
searched the lib code but didn't really find much, except this (asyncmacro.nim):


proc asyncSingleProc(prc: NimNode): NimNode {.compileTime.} =
  # ...
  # LINE NO: 385
  # If proc has an explicit gcsafe pragma, we add it to iterator as well.
  if prc.pragma.findChild(it.kind in {nnkSym, nnkIdent} and
      $it == "gcsafe") != nil:
    closureIterator.addPragma(newIdentNode("gcsafe"))


I gave it a try but did not get very far, as I'm still "practicing" the 
language itself and haven't learned (or, more precisely, have already 
forgotten) how the metaprogramming works.

Here is what I tried: 


import macros, strutils

type
  VolatilePtr*[T] = distinct ptr T
  # Means it's a pointer to some volatile value

proc hasAnnotation(stuff: NimNode, annotation: static[string]): bool {.compileTime.} =
  (stuff.pragma.findChild(it.kind in {nnkSym, nnkIdent} and
    $it == annotation) != nil)

template toVolatilePtr*[T](t: untyped) =
  when hasAnnotation(t, "volatile"):
    VolatilePtr[T](addr t)
  else:
    {.error: "t is not volatile!".}

when isMainModule:
  var tua = 42'i32
  let p: VolatilePtr[int32] = toVolatilePtr[int32](tua)
  let tua2: int32 = 42'i32 # atomicLoadNSeqCST[int32](p)
  assert(tua2 == tua)



Re: Heterogeneous object pool with Timed Eviction

2017-11-12 Thread mratsim
Yes, an Nvidia Titan X is quite a common GPU for deep learning and comes with 
12 GB of VRAM; a GTX 1080 Ti has 11 GB, and the GTX 1070/1080 have 8 GB. 
Whatever the GPU, the goal is to saturate it with the biggest batch it can 
handle.

There is a lot of research going into more memory-efficient networks and into 
optimizing memory usage without impacting accuracy.

I had not come across this Rust allocators collection before, great find. All 
my CUDA allocator research is detailed 
[there](https://github.com/mratsim/Arraymancer/issues/112)


Re: What's happening with destructors?

2017-11-12 Thread mratsim
Right, it seems there are a bunch of `getOccupiedMem` calls in the GC file so 
that it can track the memory it manages, including a [collection 
threshold](https://github.com/nim-lang/Nim/blob/devel/lib/system/gc.nim#L838).

In my case the shared-memory array is only used to hold temporaries for 
computation (the main long-lived data is held in a seq), so I can trigger a 
manual GC_fullCollect after the computation is done, but for non-temporary 
usage I could run out of memory even though enough space could be collected.
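
For reference, a small sketch of the relevant system procs; they report the thread-local GC heap only, so shared-heap allocations stay invisible to them:


echo getOccupiedMem(), " bytes occupied on this thread's GC heap"
echo getFreeMem(), " bytes free on this thread's GC heap"
GC_fullCollect()   # manual full collection once the temporaries are dropped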


Re: Surprises with Generics

2017-11-12 Thread cblake
One other wrinkle more in line with this thread's topic is making things work 
for general object types but specific standard key types. In C one might handle 
that by taking an `offsetof` for where the key field sits within an object and 
a `sizeof` for the size of the objects in the array. In Nim one could do 
something similar, but it might be more "Nimonic" to dispatch more safely via 
some `sort template` instead of a `sort proc`, e.g.: 


template sort[T](inp: seq[T], keyField: untyped) =
  when T.`keyField` is byte: ## XXX Handle all the standard types
    ...                      ## XXX good & stable algos for each one

myArray.sort(myKey)


The implementation might look nicer with a family of overloaded calls rather 
than a big ``when T.`keyField` is`` dispatcher. I am not sure there is a clean 
way to do that in this case. Each `when` clause should be able to dispatch to a 
type-specialized proc, though (see the sketch below). So, it's not too bad.
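
A slightly fuller sketch of that `when` dispatch, assuming hypothetical type-specialized procs such as `countingSortByByteKey` and `radixSortByIntKey`:


template sortByKey[T](inp: var seq[T], keyField: untyped) =
  when type(inp[0].keyField) is byte:
    countingSortByByteKey(inp, keyField)   # hypothetical specialized proc
  elif type(inp[0].keyField) is SomeInteger:
    radixSortByIntKey(inp, keyField)       # hypothetical specialized proc
  else:
    {.error: "no specialized sort for this key type".}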


Re: Surprises with Generics

2017-11-12 Thread cblake
@Stefan_Salewski - I think this idea of having a family of overloaded procs to 
sort on standard types is very good. I have almost raised it myself several 
times.

It would be useful for `cmp`-less `sort` procs on such types to be _stable_ 
sorts (meaning elements with identical keys retain relative positions in the 
array). Then a user could implement a simple 2-level or 3-level sort on 
distinct embedded keys by just calling such sort procs 2 or 3 times. (A 
multi-level sort orders elements first by a primary key, then a secondary key 
for equal primary keys, and so on).

This arrangement has another performance bonus potentially much greater than 
the inlining effect mentioned so far in this thread. It allows one to tune 
which sorting algorithm is used based on the nature of the key, e.g. how long 
it is, integer or floating point or string, etc. For example, for single-byte 
integer keys it is hard to be faster than [counting 
sort](https://en.wikipedia.org/wiki/Counting_sort) which does two pretty linear 
sweeps through the input - the first to histogram byte values and the second to 
output the answer (with a keyspace-sized prefix sum in between to convert 
counts to output offsets). The simplest version of that does require an 
additional copy of the input array which may have some consequences on the proc 
signature/interface/specification.
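
To make the counting-sort shape concrete, here is a minimal stable sketch for single-byte keys, with a hypothetical `key` extractor passed in; the extra copy mentioned above is the `result` seq:


proc countingSortByByte*[T](inp: seq[T], key: proc (x: T): byte): seq[T] =
  var counts: array[256, int]
  for x in inp:                  # sweep 1: histogram the byte keys
    inc counts[int(key(x))]
  var offsets: array[256, int]   # keyspace-sized prefix sum turning
  var total = 0                  # counts into starting output offsets
  for k in 0 .. 255:
    offsets[k] = total
    total += counts[k]
  result = newSeq[T](inp.len)    # sweep 2: place elements; scanning inp
  for x in inp:                  # in order keeps the sort stable
    result[offsets[int(key(x))]] = x
    inc offsets[int(key(x))]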


atomics: Why is interlockedCompareExchange8 "safe"?

2017-11-12 Thread monster
Hi,

I'm trying to understand what goes on in "lib\system\atomics.nim". This is 
partly because I'm missing atomicLoadN/atomicStoreN on Windows, and I'm trying 
to work out how to implement them myself. I've just stumbled upon this 
declaration (atomics.nim, line #220):


proc interlockedCompareExchange8(p: pointer; exchange, comparand: byte): byte {.
  importc: "_InterlockedCompareExchange64", header: "<intrin.h>".}


At first I thought using `_InterlockedCompareExchange64` was a bug, but then I 
found out that there is no `_InterlockedCompareExchange8`.

So I guess `exchange` and `comparand` get cast to a 64-bit integer, and the 
return value just gets cast back to `byte`. So far so good.

But `_InterlockedCompareExchange64` assumes `p` points to a 64-bit value, and 
so will overwrite the 8 bytes at that location.

How can that not go horribly wrong?


Re: Surprises with Generics

2017-11-12 Thread Stefan_Salewski
Reading this blog post

[http://nibblestew.blogspot.de/2017/11/aiming-for-c-sorting-speed-with-plain-c.html](http://nibblestew.blogspot.de/2017/11/aiming-for-c-sorting-speed-with-plain-c.html)

I just remembered a discussion from some time ago.

S. Salewski wrote:

> A consequence may be that for algorithm.sort() we do not need a cmp() proc 
> parameter (which cannot be inlined and so is a bit slow). We may just define 
> a plain cmp() proc in our module for the types which we pass to compare?
> 
> Some months ago I tested a quicksort proc, and found out that giving a cmp() 
> proc for custom types decreases performance by about 25 percent compared to 
> using a default cmp proc for known standard types.

So maybe we should indeed provide a sort proc without a cmp() parameter.
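
A sketch of what such a proc could look like (it just delegates to algorithm.sort with system.cmp here; a real version would implement its own loop per key type so the comparison can be inlined):


import algorithm

proc sortPlain*[T: SomeNumber | string](a: var openArray[T]) =
  # no user-supplied cmp: the default comparison is fixed at
  # instantiation time for each standard element type
  a.sort(system.cmp[T])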


Re: Do we really like the ...It templates?

2017-11-12 Thread niofis
I do like the It templates; they make the code shorter.

Although I also found that nesting them was awkward when trying to multiply two 
sequences. That made me think of different ways to achieve the same thing and 
make it look even better than the initial implementation; it ended up looking 
like this:


let acc = neuron.weights.zip(input).mapIt(it.a * it.b).foldr(a + b)


What's not to like about the above code? Without the It templates, the map 
would end up being more verbose or, worse, would use for loops.
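
For anyone trying it out, a self-contained version with made-up numbers (it is just the dot product of the two sequences):


import sequtils

let weights = @[0.5, -1.0, 2.0]
let input = @[1.0, 2.0, 0.5]
# zip pairs the elements, mapIt multiplies each pair, foldr sums them
let acc = weights.zip(input).mapIt(it.a * it.b).foldr(a + b)
assert acc == -0.5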


Re: Do we really like the ...It templates?

2017-11-12 Thread Udiknedormin
@Araq Maybe you don't need the flexibility, but please imagine using an 
it-template inside of an it-template (e.g. an apply inside of an apply) --- see 
the sketch below. Just like @olwi said, => is better, and I think it should be 
brought into standard Nim.
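
To illustrate the nesting problem, a small sketch: the inner template's injected `it` shadows the outer one, so an expression that needs both elements cannot be written at all:


import sequtils

let rows = @[@[1, 2], @[3, 4]]
# the inner mapIt re-injects `it`, hiding the outer element;
# this works, but there is no way to refer to the outer `it` inside
let scaled = rows.mapIt(it.mapIt(it * 2))
assert scaled == @[@[2, 4], @[6, 8]]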

People only mention two reasons to use it-templates, correct me if I'm wrong:

  * "shorter code" --- OK, but also less flexible (apply-in-apply) and less 
readable (a matter of taste, I admit)
  * "closures are inefficient" --- why not make them efficient somehow, e.g. 
by inlining?



@doofenstein But it's optional (just like in, let's say, Scala). Why? Because 
of higher-order-function composition, which would make it (pun intended) 
ambiguous.


Re: case statement with exceptions

2017-11-12 Thread Udiknedormin
Call me a weirdo, but I think a converter which could fail should return an 
Option[T] instead of T itself. This way you can either unwrap the option 
(which works like an exception) or pass it somewhere first, e.g. into a 
container, so that you can unwrap it inside a higher-order routine.
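
A small sketch of that idea, using a fallible string-to-int conversion (the proc name is made up):


import options, strutils

proc toIntOpt(s: string): Option[int] =
  # fallible conversion: return none instead of raising
  try:
    result = some(parseInt(s))
  except ValueError:
    result = none(int)

assert toIntOpt("42").get == 42   # unwrap; raises if empty
assert toIntOpt("oops").isNone    # or inspect it / pass it along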


Re: Heterogeneous object pool with Timed Eviction

2017-11-12 Thread Udiknedormin
@cdome That's funny, actually: I've used a container that seems to work just 
like yours, even though the use case was totally different. It was for an 
evolutionary program.

@mratsim If I get it right, you need at least 93 MB * 128 images = 11 GB 904 
MB per batch?

By the way: did you look at this [Rust allocators 
collection](https://github.com/rphmeier/allocators)?


Re: What's happening with destructors?

2017-11-12 Thread Udiknedormin
Well, you said "today you can do" as general advice, so I assumed you were 
trying to show us "how simple it is to do it by hand", not "how easy it is to 
make something only for internal usage".

About this object being a ref one --- no, please read my comment again. Long 
story short: a ref object can be deepCopied; your example cannot.

Besides: no, that's not the same. Why? Because the GC finalizes a _pointer_. 
It will finalize it when it comes to the conclusion that it should free the 
memory for a _pointer_. It cares not about how huge a memory block on the 
shared heap that pointer corresponds to, nor whether the shared heap should be 
cleaned at all. So you could have 90% of the shared heap filled with unused 
arrays and the GC saying "Why free anything? I've still got plenty of space on 
my thread's heap!", because pointers themselves are very small. That's the 
same thing I told you about CUDA memory. Simply put: you don't have a GC-ed 
shared heap. You have an _illusion_ of a GC-ed shared heap.
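
For concreteness, a sketch of the pattern under discussion (names are made up): the GC accounts only for the tiny wrapper on its own heap, not for the big shared-heap block the finalizer eventually frees:


type
  SharedBuf = ref object
    data: pointer   # big block on the shared heap, invisible to the GC
    len: int

proc finalize(b: SharedBuf) =
  if b.data != nil:
    deallocShared(b.data)

proc newSharedBuf(bytes: int): SharedBuf =
  # the GC only sees sizeof(SharedBuf) on its thread-local heap, so
  # pressure on the shared heap never triggers a collection by itself
  new(result, finalize)
  result.len = bytes
  result.data = allocShared0(bytes)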


Re: Hiding sideeffect from compiler

2017-11-12 Thread Udiknedormin
I don't think it's something you should actually do, as the compiler can 
perform certain optimizations when it knows a routine has no side effects. 
Optimizations which might optimize away your side effects, I guess.


Re: Do we really like the ...It templates?

2017-11-12 Thread doofenstein
E.g. in Kotlin, when a function expecting a single-parameter lambda is called, 
the parameter name can be omitted in the lambda's definition and it is 
implicitly named `it`: 


strings.filter { it.length == 5 }.sortedBy { it }.map { it.toUpperCase() }


So there it's even part of the core language. In Nim it's only part of certain 
stdlib templates, but it shows that there are other places where such implicit 
things are used.
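
For comparison, a rough Nim equivalent of the Kotlin chain above, using those stdlib templates:


import sequtils, algorithm, strutils

let strings = @["apple", "kiwi", "mango", "peach"]
let res = strings.filterIt(it.len == 5).sorted(system.cmp).mapIt(it.toUpperAscii)
assert res == @["APPLE", "MANGO", "PEACH"]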

Also, the `it` is part of the template's name (`mapIt`, `applyIt`, ...).