Re: How do you check the presence of an annotation?
I'm not sure it's possible, actually. You can retrieve t's initialization value (!) using t.symbol.getImpl, which is useful when handling const, but I can't see any way (or at least an obvious one) to extract variable pragmas from a variable symbol.
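For reference, a minimal sketch of that `getImpl` approach on a const symbol (the `showImpl` macro and `answer` const are made up for illustration):

```nim
import macros

const answer = 42

macro showImpl(s: typed): untyped =
  # dumps the symbol's definition; for a const this includes its value
  echo s.symbol.getImpl.treeRepr
  result = newEmptyNode()

showImpl(answer)
```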
Re: Heterogeneous object pool with Timed Eviction
@mratsim Well, you mentioned game programming at first so "serious deep learning" didn't come to my mind. ^^"
Re: What's happening with destructors?
@mratsim Well, if you call GC_fullCollect manually then it's much worse than manual allocation and deallocation from a memory pool, I guess. Could you explain further what getOccupiedMem changes in the case we were talking about (a ref to a ptr to memory external to the thread heap)?
Re: atomics: Why is interlockedCompareExchange8 "safe"?
Actually `_InterlockedCompareExchange8` does exist. It was added silently in VS2012 and is not mentioned in the docs.
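So presumably the binding could import the 8-bit intrinsic directly. An untested sketch, assuming VS2012+ and that the intrinsic is declared in `<intrin.h>` like its 64-bit sibling:

```nim
# Untested sketch: bind the 8-bit CAS intrinsic directly instead of
# routing byte-sized CAS through _InterlockedCompareExchange64.
proc interlockedCompareExchange8(p: pointer; exchange, comparand: byte): byte
  {.importc: "_InterlockedCompareExchange8", header: "<intrin.h>".}
```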
Re: Help wrapping a C++ module
I submitted an example to the docs, but it is still [a PR](https://github.com/nim-lang/Nim/pull/6711), so have a look there. And try something like the following (and remove the push and pop pragmas):

```nim
const hdr = "nimfuzz/fts_fuzzy_match.h"

proc fuzzyMatchSimple*(pattern: cstring; str: cstring): bool
  {.importcpp: "fuzzy_match_simple", header: hdr.}
```

which will add the line `#include "nimfuzz/fts_fuzzy_match.h"` to the `nimcache/test.cpp` file, and map the C++ function `fuzzy_match_simple` to the Nim proc `fuzzyMatchSimple()`.
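A hedged usage sketch (the sample strings are made up; `fuzzy_match_simple` checks whether the pattern's characters appear in order in the string):

```nim
# compile with the C++ backend: nim cpp -r test.nim
echo fuzzyMatchSimple("fzm", "fts_fuzzy_match")  # expected: true
```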
Re: question about templates / namespaces / modules
Have you tried something like `when AVX2_Available():`? Of course that can only work when your `AVX2_Available()` is known at compile time...
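If it is known at compile time, the selection can happen at the top level, where imports are allowed. A minimal sketch, assuming a hypothetical `-d:avx2` define and hypothetical `avx2`/`sse` modules that expose the same procs:

```nim
# Top-level `when` can choose between imports, unlike a template body.
when defined(avx2):
  import avx2   # hypothetical module with the AVX2 versions of Add/Mul etc.
else:
  import sse    # hypothetical fallback module with the same interface
```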
question about templates / namespaces / modules
Here is some pseudocode for what I would like to do:

```nim
template SIMD(actions: untyped) =
  if AVX2_Available():
    import AVX2  # use the AVX2 version of Add/Mul etc.
    actions
  else:
    import SSE   # use the SSE version of Add/Mul etc.
    actions

SIMD:
  let c = SIMD_Add(a, b)
  SIMD_Mul(b, c)
```

But I can't do that because imports have to be top level. I could use include instead of import, but that would lead to massive code size if someone used the template multiple times in their code, as it would reproduce the same functions over and over. Is there some way to accomplish this?
How do you check the presence of an annotation?
I'd like to check that some "memory location" is annotated with {.volatile.}, to make sure my code only compiles if I use {.volatile.} in the right place. I searched the lib code but didn't really find much, except this (asyncmacro.nim):

```nim
proc asyncSingleProc(prc: NimNode): NimNode {.compileTime.} =
  # ...
  # LINE NO: 385
  # If proc has an explicit gcsafe pragma, we add it to iterator as well.
  if prc.pragma.findChild(it.kind in {nnkSym, nnkIdent} and
      $it == "gcsafe") != nil:
    closureIterator.addPragma(newIdentNode("gcsafe"))
```

I gave it a try, but did not get very far, as I'm still "practicing" the language itself, and haven't learned (or more precisely, have already forgotten) how the meta-programming works. Here is what I tried:

```nim
import macros, strutils

type
  VolatilePtr*[T] = distinct ptr T # means it's a pointer to some volatile value

proc hasAnnotation(stuff: NimNode, annotation: static[string]): bool {.compileTime.} =
  (stuff.pragma.findChild(it.kind in {nnkSym, nnkIdent} and $it == annotation) != nil)

template toVolatilePtr*[T](t: untyped) =
  when hasAnnotation(t, "volatile"):
    VolatilePtr[T](addr t)
  else:
    {.error: "t is not volatile!".}

when isMainModule:
  var tua = 42'i32
  let p: VolatilePtr[int32] = toVolatilePtr[int32](tua)
  let tua2: int32 = 42'i32 # atomicLoadNSeqCST[int32](p)
  assert(tua2 == tua)
```
Re: Heterogeneous object pool with Timed Eviction
Yes, an Nvidia Titan X is quite a common GPU for deep learning and comes with 12GB of VRAM; the GTX 1080 Ti has 11GB, and the GTX 1070/1080 8GB. Whatever the GPU, the goal is to saturate it with the biggest batch it can handle. There is a lot of research going into more memory-efficient networks, and into how to optimize memory usage without impacting accuracy. I had not come across this Rust allocators collection before, great. All my Cuda allocator research is detailed [there](https://github.com/mratsim/Arraymancer/issues/112)
Re: What's happening with destructors?
Right, it seems like there are a bunch of `getOccupiedMem` calls in the gc file so that it can track the memory it manages, including a [collection threshold](https://github.com/nim-lang/Nim/blob/devel/lib/system/gc.nim#L838). In my case the shared mem array is only used to hold temporaries for computation (the main long-lived data is held in a seq), so I can trigger a manual GC_fullCollect after computation is done, but for non-temporary usage I could run out of memory even though enough space could be collected.
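For what it's worth, a minimal sketch of that manual-collection pattern (the `computeStep` proc and sizes are made up):

```nim
proc computeStep() =
  var tmp = newSeq[float32](1_000_000)  # per-step temporaries on the GC heap
  # ... fill and use tmp ...
  discard tmp.len

for _ in 1 .. 10:
  computeStep()
  GC_fullCollect()        # reclaim the step's temporaries eagerly
  echo getOccupiedMem()   # occupied bytes on this thread's GC heap
```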
Re: Surprises with Generics
One other wrinkle, more in line with this thread topic, is making things work for general object types but specific standard key types. In C one might handle that by taking an `offsetof` for where the key field is within an object and a `sizeof` for the size of the objects in the array. In Nim one could do similar, but it might be more "Nimonic" to dispatch more safely via some `sort` template instead of a `sort` proc, e.g.:

```nim
template sort[T](inp: seq[T], keyField: untyped) =
  when T.`keyField` is byte:
    ## XXX Handle all the standard types ...
    ## XXX good & stable algos for each one

myArray.sort(myKey)
```

The implementation might look nicer with a family of overloaded calls rather than a big `when` dispatcher. I am not sure there is a clean way to do that in this case. Each `when` clause should be able to dispatch to a type-specialized case, though. So, it's not too bad.
Re: Surprises with Generics
@Stefan_Salewski - I think this idea of having a family of overloaded procs to sort on standard types is very good. I have almost raised it myself several times. It would be useful for `cmp`-less `sort` procs on such types to be _stable_ sorts (meaning elements with identical keys retain relative positions in the array). Then a user could implement a simple 2-level or 3-level sort on distinct embedded keys by just calling such sort procs 2 or 3 times. (A multi-level sort orders elements first by a primary key, then a secondary key for equal primary keys, and so on.) This arrangement has another performance bonus, potentially much greater than the inlining effect mentioned so far in this thread: it allows one to tune which sorting algorithm is used based on the nature of the key, e.g. how long it is, integer or floating point or string, etc. For example, for single-byte integer keys it is hard to be faster than [counting sort](https://en.wikipedia.org/wiki/Counting_sort), which does two pretty linear sweeps through the input - the first to histogram byte values and the second to output the answer (with a keyspace-sized prefix sum in between to convert counts to output offsets). The simplest version of that does require an additional copy of the input array, which may have some consequences for the proc signature/interface/specification.
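A minimal sketch of that counting sort for single-byte keys (the name and the seq-in/seq-out signature are my own choices):

```nim
proc countingSort(inp: seq[byte]): seq[byte] =
  var counts: array[256, int]
  for x in inp:                    # first linear sweep: histogram byte values
    inc counts[x.int]
  var offsets: array[256, int]     # keyspace-sized prefix sum:
  for i in 1 .. 255:               # convert counts to output offsets
    offsets[i] = offsets[i-1] + counts[i-1]
  result = newSeq[byte](inp.len)
  for x in inp:                    # second sweep: output the answer (stably)
    result[offsets[x.int]] = x
    inc offsets[x.int]

echo countingSort(@[3'u8, 1, 2, 1, 0])  # -> @[0, 1, 1, 2, 3]
```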
atomics: Why is interlockedCompareExchange8 "safe"?
Hi, I'm trying to understand what goes on in `lib\system\atomics.nim`. This is in part because I'm missing atomicLoadN/atomicStoreN on Windows, and I'm trying to work out how to implement that myself. I've just stumbled upon this declaration (atomics.nim, line 220):

```nim
proc interlockedCompareExchange8(p: pointer; exchange, comparand: byte): byte
  {.importc: "_InterlockedCompareExchange64", header: "<intrin.h>".}
```

At first I thought using `_InterlockedCompareExchange64` was a bug, but then I found out that there is no `_InterlockedCompareExchange8`. So I guess `exchange` and `comparand` get cast to `int64`, and the return value just gets cast back to `byte`. So far so good. But `_InterlockedCompareExchange64` assumes `p` points to an `int64` value, and so will overwrite the 8 bytes at that location. How can that not go horribly wrong?
Re: Surprises with Generics
Reading this blog post [http://nibblestew.blogspot.de/2017/11/aiming-for-c-sorting-speed-with-plain-c.html](http://nibblestew.blogspot.de/2017/11/aiming-for-c-sorting-speed-with-plain-c.html) I just remembered a discussion from some time ago. S. Salewski wrote:

> A consequence may be that for algorithm.sort() we do not need a cmp() proc parameter (which cannot be inlined and is therefore a bit slow). We may just define a plain cmp() proc in our module for the types which we pass to compare?
>
> Some months ago I tested a quicksort proc, and found out that giving a cmp() proc for custom types decreases performance by about 25 percent compared to using a default cmp proc for known standard types.

So maybe we should indeed provide a sort proc without a cmp() function.
Re: Do we really like the ...It templates?
I do like the It templates; they make the code shorter. Although I found that nesting them was awkward when trying to multiply two sequences, that made me think of different ways to achieve the same thing and make it look even better than the initial implementation. It ended up looking like this:

```nim
let acc = neuron.weights.zip(input).mapIt(it.a * it.b).foldr(a + b)
```

What's not to like about the above code? Without the It template, the map would end up being more verbose, or worse, using for loops.
Re: Do we really like the ...It templates?
@Araq Maybe you don't need flexibility, but please imagine using an it-template inside of an it-template (e.g. an apply inside of an apply). Just like @olwi said, `=>` is better, and I think it should be brought to standard Nim. People only mention two reasons to use it-templates, correct me if I'm wrong:

* "shorter code" --- ok, but also less flexible (apply-in-apply) and less readable (a matter of taste, I admit)
* "closures are inefficient" --- why not make them efficient somehow? Like inlining, let's say?

@doofenstein But it's optional (just like in, let's say, Scala). Why? Because of higher-order-function composition, which would make it (pun intended) ambiguous.
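For what it's worth, `=>` does already exist in the stdlib (in the `future` module, later renamed `sugar`). A small sketch with made-up data, showing why nesting stays unambiguous when each lambda names its own parameter:

```nim
import sequtils, sugar  # `=>` lived in the `future` module before the rename

let grid = @[@[1, 2], @[3, 4]]
# Nested lambdas: no ambiguity, since `row` and `x` are named explicitly.
echo grid.map((row: seq[int]) => row.map((x: int) => x * 10))
# -> @[@[10, 20], @[30, 40]]
```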
Re: case statement with exceptions
Call me a weirdo, but I think a converter which could fail should return an Option[T] instead of T itself. This way you can either unwrap the option (which works like an exception) or pass it somewhere first, e.g. into a container, so that you could unwrap it inside of a higher-order routine.
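A sketch of that idea as an explicit proc returning Option[int] (`toIntOpt` is a made-up name, not a stdlib routine):

```nim
import options, strutils

proc toIntOpt(s: string): Option[int] =
  try:
    result = some(parseInt(s))
  except ValueError:
    result = none(int)

let x = "42".toIntOpt
if x.isSome:
  echo x.get            # unwrapping; failure here would raise, like an exception
echo "oops".toIntOpt    # a None value can be passed around and handled later
```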
Re: Heterogeneous object pool with Timed Eviction
@cdome That's funny, actually: I've used a container that seems to work just like yours, even though the use case was totally different. It was for an evolutionary program. @mratsim If I get it, you need at least 93 MB * 128 images = 11 GB 904 MB per batch? By the way: did you look at this [Rust allocators collection](https://github.com/rphmeier/allocators)?
Re: What's happening with destructors?
Well, you said "today you can do" as a general advice so I assumed you try to show us "how simple it is to do it by hand", not "how easy it is to make sth only for internal usage". About this object being a ref one --- no, please read my comment again. Long story short: ref object can be deepCopied, your example cannot. Besides: no, that's not the same. Why? Because GC finalizes a _pointer_. So it will finalize it when it comes to the conclusion it should free the memory for a _pointer_. It cares not about how huge memory block on shared heap it corresponds to nor think whenever shared heap should be cleaned at all. So you could have 90% of the shared heap filled with unused arrays and GC saying "Why free anythin? I've still got plenty of space on my thread's heap!" because pointers themselves are very small. That's the same I've told you about CUDA memory. Simply put: you don't have a GC-ed shared heap. You have an _illusion_ of GC-ed shared heap.
Re: Hiding sideeffect from compiler
I don't think it's something you should actually do, as the compiler can apply certain optimizations when it knows a routine has no side effects --- optimizations which may optimize away your side effects, I guess.
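To illustrate what the compiler is promised here (a minimal sketch, with a made-up `double` proc):

```nim
# {.noSideEffect.} tells the compiler the routine is pure, so it is free
# to fold, reorder, or eliminate calls to it.
proc double(x: int): int {.noSideEffect.} =
  # echo x   # uncommenting fails to compile: echo is a side effect
  x * 2

echo double(21)  # free to be constant-folded to 42
```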
Re: Do we really like the ...It templates?
E.g. in Kotlin, when a lambda with only one parameter is expected, the parameter declaration can be omitted and the parameter is implicitly named `it`:

```kotlin
strings.filter { it.length == 5 }.sortedBy { it }.map { it.toUpperCase() }
```

So there it's even part of the core language. In Nim it's only a part of certain stdlib templates, but it shows that there are other places where such implicit things are used. Also, the `it` is part of the template's name (`mapIt`, `applyIt`, ...).
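For comparison, a rough Nim counterpart of that chain (sample data made up, and the sortedBy step dropped for brevity):

```nim
import sequtils, strutils

let strings = @["apple", "melon", "fig", "lemon"]
# filterIt/mapIt bind `it` implicitly, much like Kotlin's lambdas:
echo strings.filterIt(it.len == 5).mapIt(it.toUpperAscii)
# -> @["APPLE", "MELON", "LEMON"]
```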