Re: Arraymancer - An n-dimensional array / tensor library

2018-05-06 Thread qqtop
Congratulations!

I especially like the neural network examples and hope more will be forthcoming.


Re: Arraymancer - An n-dimensional array / tensor library

2018-05-05 Thread mratsim
The new version of Arraymancer, v0.4.0 "The Name of the Wind", is live today. Here is the changelog:

* * *

  * Core:
    * OpenCL tensors are now available! However, Arraymancer will naively select the first available backend, which may be a CPU or a GPU. OpenCL tensors support basic and broadcasted operations (addition, matrix multiplication, element-wise multiplication, ...).
    * Addition of `argmax` and `argmax_max` procs.
  * Datasets:
    * Loading the MNIST dataset from [http://yann.lecun.com/exdb/mnist](http://yann.lecun.com/exdb/mnist)
    * Reading from and writing to CSV files
  * Linear algebra:
* Least squares solver
    * Eigenvalue and eigenvector decomposition for symmetric matrices
  * Machine Learning
* Principal Component Analysis (PCA)
  * Statistics
* Computation of covariance matrices
  * Neural network
    * Introduction of a short, intuitive syntax to build neural networks! (A blend of Keras and PyTorch; see the sketch after this changelog.)
* Maxpool2D layer
* Mean Squared Error loss
* Tanh and softmax activation functions
  * Examples and tutorials
* Digit recognition using Convolutional Neural Net
* Teaching Fizzbuzz to a neural network
  * Tooling
* Plotting tensors through Python
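
For the curious, here is roughly what the new declarative syntax looks like, adapted from the digit-recognition example. Treat it as a sketch: the exact layer names and signatures may differ slightly from the released API.

```nim
import arraymancer

# An autograd context records operations for backpropagation
let ctx = newContext Tensor[float32]

# Declarative model description: the Keras-flavored part
network ctx, DemoNet:
  layers:
    x:          Input([1, 28, 28])
    cv1:        Conv2D(x.out_shape, 20, 5, 5)
    mp1:        MaxPool2D(cv1.out_shape, (2,2), (0,0), (2,2))
    fl:         Flatten(mp1.out_shape)
    hidden:     Linear(fl.out_shape, 500)
    classifier: Linear(500, 10)
  # The forward pass chains layers like PyTorch method calls
  forward x:
    x.cv1.relu.mp1.fl.hidden.relu.classifier

let model = ctx.init(DemoNet)
```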



There are also several updates tracking Nim's rapid development, plus assorted bugfixes.

* * *

Thanks:


  * Bluenote10 for the CSV writing proc and the tensor plotting tool
  * Miran for benchmarking
  * Manguluka for tanh
  * Vindaar for bugfixing
  * Everyone who participated in the RFCs
  * And you, the users of the library.




Re: Arraymancer - An n-dimensional array / tensor library

2017-12-14 Thread mratsim
Arraymancer v0.3.0 Dec. 14 2017

Finally, after much struggle, here is Arraymancer's new version, available now on Nimble. It comes with shiny new docs (thanks to @flyx and the NimYAML docs): [https://mratsim.github.io/Arraymancer](https://mratsim.github.io/Arraymancer)

Changes:

  * **Very** Breaking
    * Tensors now use reference semantics: `let a = b` will share data by default, and copies must be made explicitly (a minimal sketch follows below).
      * There is no need to use `unsafe` procs to avoid copies, especially for slices.
      * Unsafe procs are deprecated and will be removed, leading to a smaller and simpler codebase and API/documentation.
      * Tensors and CudaTensors now work the same way.
      * Use `clone` to make copies.
      * Arraymancer now works like Numpy and Julia, making it easier to port code.
      * Unfortunately, this makes unexpected data sharing harder to debug.
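
A minimal sketch of the new reference semantics (assuming the `toTensor` and `clone` procs behave as in the docs):

```nim
import arraymancer

var a = [1, 2, 3].toTensor
var b = a         # b shares a's data: reference semantics
b[0] = 10         # the change is visible through a as well
let c = a.clone   # clone makes an explicit, independent copy
```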


  * Breaking (?)
    * The maximum number of supported dimensions has been reduced from 8 to 7 to reduce cache misses. Note that in deep learning the maximum needed is 6 dimensions, for 3D videos: [batch, time, color/feature channels, depth, height, width].


  * Documentation
    * The documentation has been completely revamped and is available here: [https://mratsim.github.io/Arraymancer](https://mratsim.github.io/Arraymancer)


  * Huge performance improvements
    * Use of non-initialized seqs
    * Shape and strides are now stored on the stack
    * Optimization via inlining of all higher-order functions
      * `apply_inline`, `map_inline`, `fold_inline` and `reduce_inline` templates are available (see the sketch after this block)
    * All higher-order functions are parallelized through OpenMP
    * Integer matrix multiplication uses SIMD, loop unrolling, restrict and 64-bit alignment
    * False sharing/cache contention in OpenMP reductions is prevented
    * Temporary copies were removed in several procs
    * Runtime checks/exceptions are now behind `unlikely`
    * `A*B + C` and `C += A*B` are automatically fused into one operation
    * Result tensors are not initialized
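
A sketch of the inline templates (assuming the injected element variable is named `x`, as in the documentation):

```nim
import arraymancer

let t = [1.0, 2.0, 3.0].toTensor

# map_inline inlines the expression, avoiding closure overhead
let u = t.map_inline(x * x)

# apply_inline mutates in place, allocating no temporary tensor
var v = t.clone
v.apply_inline(x + 1.0)
```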


  * Neural network:
    * Added `linear`, `sigmoid_cross_entropy` and `softmax_cross_entropy` layers
    * Added a convolution layer


  * Shapeshifting:
    * Added `unsqueeze` and `stack` (see the sketch below)
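
For illustration, a quick sketch of the two newcomers (shapes in comments; assuming Numpy-like semantics for the new axis):

```nim
import arraymancer

let a = [1, 2, 3].toTensor   # shape [3]
echo a.unsqueeze(0).shape    # [1, 3]: insert a new dimension at axis 0
echo stack(a, a).shape       # [2, 3]: stack tensors along a new axis
```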


  * Math:
    * Added `min`, `max`, `abs`, `reciprocal`, `negate` and the in-place `mnegate` and `mreciprocal`


  * Statistics:
    * Added variance and standard deviation


  * Broadcasting
    * Added `.^` (broadcasted exponentiation)


  * Cuda:
    * Support for convolution primitives: forward and backward
    * Broadcasting ported to Cuda


  * Examples
    * Added a perceptron example that learns the `xor` function


  * Precision
    * Arraymancer uses the `ln1p` (`ln(1 + x)`) and `expm1` (`exp(x) - 1`) procs where appropriate to avoid catastrophic cancellation


  * Deprecated
    * Version 0.3.1, with ALL deprecated procs removed, will be released in a week. Due to issue [https://github.com/nim-lang/Nim/issues/6436](https://github.com/nim-lang/Nim/issues/6436), you will get a deprecation warning even when using non-deprecated procs like `zeros`, `ones` and `newTensor`.
    * The arguments of `newTensor`, `zeros` and `ones` have changed from `zeros([5, 5], int)` to `zeros[int]([5, 5])`.
    * All `unsafe` procs now describe the default behavior and are deprecated.



Re: Arraymancer - An n-dimensional array / tensor library

2017-11-20 Thread Udiknedormin
@mratsim Would you mind if I made a reference to your lib in my bachelor's thesis on optimization?


Re: Arraymancer - An n-dimensional array / tensor library

2017-09-24 Thread dataPulverizer
This looks like a very useful library for me. I shall certainly be checking it 
out. Nice one!


Re: Arraymancer - An n-dimensional array / tensor library

2017-09-24 Thread mratsim
I am very excited to announce the second release of Arraymancer, which includes numerous improvements `blablabla` ...

Without further ado:

  * Community
    * There is a Gitter room!
  * Breaking
    * `shallowCopy` is now `unsafeView` and accepts `let` arguments
    * Element-wise multiplication is now `.*` instead of `|*|`
    * Vector dot product is now `dot` instead of `.*`
  * Deprecated
    * All tensor initialization procs have their `Backend` parameter deprecated
    * `fmap` is now `map`
    * `agg` is now `fold`; `agg_in_place` is gone with no replacement (too bad!)
  * Initial support for Cuda!!!
    * All linear algebra operations are supported
    * Slicing (read-only) is supported
    * Transforming a slice into a new contiguous Tensor is supported
  * Tensors
    * Introduction of `unsafe` operations that work without copying: `unsafeTranspose`, `unsafeReshape`, `unsafeBroadcast`, `unsafeBroadcast2`, `unsafeContiguous`
    * Implicit broadcasting via `.+`, `.*`, `./`, `.-` and their in-place equivalents `.+=`, `.-=`, `.*=`, `./=` (see the sketch after this list)
    * Several shapeshifting operations: `squeeze`, `at` and their `unsafe` versions
    * New property: `size`
    * Exporting: `export_tensor` and `toRawSeq`
    * `reduce` and `reduce` over an axis
  * Ecosystem:
    * I express my deep thanks to @edubart for testing Arraymancer, contributing new functions, and improving its overall performance. He built [arraymancer-demos](https://github.com/edubart/arraymancer-demos) and [arraymancer-vision](https://github.com/edubart/arraymancer-vision); check those out, you can load images into Tensors and run logistic regression on them!
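
As mentioned in the Tensors section above, a small sketch of the implicit broadcasting operators (assuming Numpy-style broadcasting, where a dimension of size 1 is stretched to match):

```nim
import arraymancer

let a = [[1, 2],
         [3, 4]].toTensor      # shape [2, 2]
let row = [[10, 20]].toTensor  # shape [1, 2]

echo a .+ row   # row is broadcast over each row of a
echo a .* row   # element-wise multiplication, also broadcast
```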




Also thanks to the Nim community on IRC/Gitter, they are a tremendous help (yes Varriount, Yardanico, Zachary, Krux).  
I probably would have struggled a lot more without the guidance of Andrea's Cuda code in his [neo](https://github.com/unicredit/neo) and [nimcuda](https://github.com/unicredit/nimcuda) libraries.


And obviously Araq and Dom for Nim, which is an amazing language for performance, productivity, safety and metaprogramming.


Re: Arraymancer - An n-dimensional array / tensor library

2017-07-16 Thread mratsim
The only static parts of the Tensor type are the Backend (Cpu, CUDA, ...) and the internal type (int32, float32, object, ...).

The network topology will be dynamic, using dynamic graphs more akin to PyTorch/Chainer/DyNet than Theano/Tensorflow/Keras.

My next step is to build an autograd so people only need to implement the forward pass; backpropagation will be automatic. For this part I'm waiting for VTable.

PS: I think NimData is great too, Pandas seems like a much harder beast!


Re: Arraymancer - An n-dimensional array / tensor library

2017-07-15 Thread bluenote
A late reply, because I was hoping to dive into this a bit deeper before replying. But due to lack of time, high-level feedback must suffice: this looks awesome!

I completely agree with your observation that there is a gap between developing 
prototypes e.g. in Python and bringing them into production -- not only in deep 
learning, but data science in general. And I also think that Nim's feature set 
would be perfect to fill this gap.

A quick question on using statically-typed tensors: I assume that this implies 
that the topology of a network cannot be dynamic at all? I'm wondering if there
are good work-arounds to situations where dynamic network topologies are 
required, for instance when a model wants to choose its number of hidden layer 
nodes iteratively, picking the best model variant. Are dynamically typed 
tensors an option or would that defeat the design / performance?


Re: Arraymancer - An n-dimensional array / tensor library

2017-07-08 Thread mratsim
`get_data_ptr` is now public.

For now, I will add the neural network functionality directly in Arraymancer.

The directory structure will probably be:

  * src/arraymancer ==> core Tensor stuff
  * src/autograd ==> automatic gradient computation (i.e. 
[Nim-rmad](https://github.com/mratsim/nim-rmad) ported to tensors)
  * src/neuralnet ==> neural net layers



This mirrors [PyTorch's 
tree](https://github.com/pytorch/pytorch/tree/master/torch)

I made this choice for the following reasons:

  * It's easier for me to keep track of one repo, refactor code, document and test.
  * I'm focusing on deep learning.
  * It's much easier to communicate about one single package (and it attracts new people to Nim).
  * Data scientists are used to having deep learning in a single package (tensor + neural net interface): Tensorflow, Torch/PyTorch, Nervana Neon, MxNet ...
  * Nim's `DeadCodeElim` will ensure that unused code is not compiled.



If the tensor part (without the NN) gets even 0.1% of Numpy's popularity and people start using it in several packages, that means:

  * It's a rich man's problem!
  * We get new devs and input for scientific/numerical Nim.
  * We can reconsider splitting once we know the actual expectations.
  * We can even build a "scinim" community which drives all the key scientific Nim packages.



In the meantime, I think it's best if I do what is easiest for me and worry about how to scale later.


Re: Arraymancer - An n-dimensional array / tensor library

2017-07-08 Thread cmacmackin
I've been following this for a while on GitHub and I think it is a very impressive project. Nim would be a great language for scientific computing, but it needs numerical libraries, and this is an excellent first step in creating them.

A couple of questions. First, are you planning to add neural network functionality directly to Arraymancer? Surely that would be better suited to a separate, specialised library? A second, more general question is whether you'd consider making the [get_data_ptr](https://github.com/mratsim/Arraymancer/blob/master/src/arraymancer/data_structure.nim#L62) proc public. It would be nice to be able to integrate your tensors with wrappers for existing numerical software written in C, and we'd need access to the raw data for that.


Arraymancer - An n-dimensional array / tensor library

2017-07-05 Thread mratsim
As a data scientist, I feel that Nim has tremendous potential for data science, 
machine learning and deep learning.

In particular, it's currently non-trivial to bridge the gap between deep learning research (mostly Python, sometimes Lua) and production (C for embedded devices, JavaScript for web services, ...).

For the past 3 months I've been working on Arraymancer, a tensor library that currently provides a subset of Numpy's functionality in a fast and ergonomic package. It features (a short usage sketch follows the list):

  * Creating tensors from nested sequences and arrays (even 10 levels of nesting)
  * Pretty printing of tensors up to 4D (help generalizing this is welcome)
  * Slicing with Nim syntax
  * Mutable slices
  * Reshaping, broadcasting and concatenating tensors, as well as permuting their dimensions
  * Universal functions
  * Accelerated matrix and vector operations using BLAS
  * Iterators (on values, coordinates, axes)
  * Aggregates and statistics (sum, mean, and a generic aggregate higher-order function)
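
To give a flavor of the syntax, a small sketch (assuming the README's slicing conventions, where `_` spans a whole dimension):

```nim
import arraymancer

let t = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]].toTensor

echo t[1, _]      # second row
echo t[_, 0..1]   # first two columns
echo t.sum        # aggregate over the whole tensor
echo t.mean       # one of the provided statistics
```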



Next steps (in no particular order) include:

  * adding CUDA support using Andrea's nimcuda package
  * adding neural network / deep learning functions
  * improving the documentation and adding the library to Nimble



The library: 
[https://github.com/mratsim/Arraymancer](https://github.com/mratsim/Arraymancer)

I welcome your feedback and expected use cases. I would especially love to know the pain points people have with deep learning and with putting deep learning models into production.