And my less performant 2.7 GHz Intel Core i5 MBP with Apple accelerated VecLib is way faster than naive netlib BLAS (I guess it's multi-threaded):
| a b |
a := LapackDGEMatrix randNormal: #(1000 1000).
b := LapackDGEMatrix randNormal: #(1000 1000).
[a * b] timeToRun.
45

| a b |
a := LapackSGEMatrix randNormal: #(1000 1000).
b := LapackSGEMatrix randNormal: #(1000 1000).
[a * b] timeToRun
19

On Tue, May 21, 2019 at 10:05, Nicolas Cellier <nicolas.cellier.aka.n...@gmail.com> wrote:

> Hi Serge,
> this is good news, having TensorFlow bindings is also a must!
> I have this in Smallapack with pure CPU unaccelerated BLAS (no MKL, nor
> ATLAS, just plain and dumb netlib code)
>
> | a b |
> a := LapackDGEMatrix randNormal: #(1000 1000).
> b := LapackDGEMatrix randNormal: #(1000 1000).
> [a * b] timeToRun
> 783
>
> | a b |
> a := LapackSGEMatrix randNormal: #(1000 1000).
> b := LapackSGEMatrix randNormal: #(1000 1000).
> [a * b] timeToRun
> 448
>
> Intel(R) Xeon(R) CPU E3-1245 v3 @ 3.40GHz
> So I think that we can do much better with an accelerated library!
>
> On Tue, May 21, 2019 at 05:13, Serge Stinckwich <serge.stinckw...@gmail.com> wrote:
>
>> There is another solution with my TensorFlow Pharo binding:
>> https://github.com/PolyMathOrg/libtensorflow-pharo-bindings
>>
>> You can do a matrix multiplication like this:
>>
>> | graph t1 t2 c1 c2 mult session result |
>> graph := TF_Graph create.
>> t1 := TF_Tensor fromFloats: (1 to:1000000) asArray shape:#(1000 1000).
>> t2 := TF_Tensor fromFloats: (1 to:1000000) asArray shape:#(1000 1000).
>> c1 := graph const: 'c1' value: t1.
>> c2 := graph const: 'c2' value: t2.
>> mult := c1 * c2.
>> session := TF_Session on: graph.
>> result := session runOutput: (mult output: 0).
>> result asNumbers
>>
>> Here I'm multiplying two 1000x1000 matrices in 537 ms on my computer.
>>
>> All operations can be done in a graph of operations that is run outside
>> Pharo, so it could be very fast.
>> Operations can be done on CPU or GPU. 32-bit or 64-bit float operations
>> are possible.
>>
>> This is a work in progress but can already be used.
>> Regards,
>>
>> On Tue, May 21, 2019 at 6:54 AM Jimmie Houchin <jlhouc...@gmail.com>
>> wrote:
>>
>>> I wasn't worried about how to do sliding windows. My problem is that
>>> using LapackDGEMatrix in my example was 18x slower than FloatArray, which
>>> is slower than Numpy. It isn't what I was expecting.
>>>
>>> What I didn't know is whether I was doing something wrong to cause such a
>>> tremendous slowdown.
>>>
>>> Python and Numpy are not my favorite. But they aren't uncomfortable.
>>>
>>> So I gave up and went back to Numpy.
>>>
>>> Thanks.
>>>
>>> On 5/20/19 5:17 PM, Nicolas Cellier wrote:
>>>
>>> Hi Jimmie,
>>> indeed, I did not subscribe...
>>> Having efficient methods for a sliding window average is possible, here is
>>> how I would do it:
>>>
>>> "Create a vector with 100,000 rows filled with random values (uniform
>>> distribution in [0,1])"
>>> v := LapackDGEMatrix randUniform: #(100000 1).
>>>
>>> "extract values from rank 10001 to 20000"
>>> w1 := v atIntervalFrom: 10001 to: 20000 by: 1.
>>>
>>> "create a left multiplier matrix for performing average of w1"
>>> a := LapackDGEMatrix nrow: 1 ncol: w1 nrow withAll: 1.0 / w1 size.
>>>
>>> "get the average (this is a 1x1 matrix from which we take first element)"
>>> avg1 := (a * w1) at: 1.
>>>
>>> [ "select another slice of same size"
>>> w2 := v atIntervalFrom: 15001 to: 25000 by: 1.
>>>
>>> "get the average (we can recycle a)"
>>> avg2 := (a * w2) at: 1 ] bench.
>>>
>>> This gives:
>>> '16,500 per second. 60.7 microseconds per run.'
>>> versus:
>>> [w2 sum / w2 size] bench.
>>> '1,100 per second. 908 microseconds per run.'
>>>
>>> For max and min, it's harder. Lapack/Blas only provide max of absolute
>>> value as a primitive:
>>> [w2 absMax] bench.
>>> '19,400 per second. 51.5 microseconds per run.'
>>>
>>> Everything else will be slower, unless we write new primitives in C and
>>> connect them...
>>> [w2 maxOf: [:each | each]] bench.
>>> '984 per second. 1.02 milliseconds per run.'
>>>
>>> On Sun, May 19, 2019 at 14:58, Jimmie <jlhouc...@gmail.com> wrote:
>>>
>>>> On 5/16/19 1:26 PM, Nicolas Cellier wrote:
>>>> > Any feedback on this?
>>>> > Did someone try to use Smallapack in Pharo?
>>>> > Jimmie?
>>>> >
>>>>
>>>> I am going to guess that you are not on pharo-users. My bad.
>>>> I posted this in pharo-users as it wasn't a Pharo development question.
>>>>
>>>> I probably should have posted here or emailed you directly.
>>>>
>>>> All I really need is good performance with a simple array of floats. No
>>>> matrix math. Nothing complicated. Moving averages over a slice of the
>>>> array. A variety of different averages, weighted, etc. Max/min of the
>>>> array. But just a single simple array.
>>>>
>>>> Any help greatly appreciated.
>>>>
>>>> Thanks.
>>>>
>>>>
>>>> On 4/28/19 8:32 PM, Jimmie Houchin wrote:
>>>> Hello,
>>>>
>>>> I have installed Smallapack into Pharo 7.0.3. Thanks Nicolas.
>>>>
>>>> I am very unsure about my use of Smallapack. I am not a mathematician or
>>>> scientist. However, the only part of Smallapack I am trying to use at the
>>>> moment is something that would be 64-bit and comparable to FloatArray, so
>>>> that I can do some simple accessing, slicing, sum, and average on the
>>>> array.
>>>>
>>>> Here is some sample code I wrote just to play with in a playground.
>>>>
>>>> I have ExternalDoubleArray, LapackDGEMatrix, and FloatArray
>>>> samples. The ones not in use are commented out for any run.
>>>>
>>>> fp is a download from
>>>> http://ratedata.gaincapital.com/2018/12%20December/EUR_USD_Week1.zip
>>>> and unzipped to a directory.
>>>>
>>>> fp := '/home/jimmie/data/EUR_USD_Week1.csv'.
>>>> index := 0.
>>>> pricesSum := 0.
>>>> asum := 0.
>>>> ttr := [
>>>> lines := fp asFileReference contents lines allButFirst.
>>>> a := ExternalDoubleArray new: lines size.
>>>> "la := LapackDGEMatrix allocateNrow: lines size ncol: 1.
>>>> a := la columnAt: 1."
>>>> "a := FloatArray new: lines size."
>>>> lines do: [ :line || parts price |
>>>>     parts := ',' split: line.
>>>>     index := index + 1.
>>>>     price := Float readFrom: (parts last).
>>>>     a at: index put: price.
>>>>     pricesSum := pricesSum + price.
>>>>     (index rem: 100) = 0 ifTrue: [
>>>>         asum := a sum.
>>>>     ]]] timeToRun.
>>>> { index. pricesSum. asum. ttr }.
>>>> "ExternalDoubleArray an Array(337588 383662.5627699992
>>>> 383562.2956199993 0:00:01:59.885)"
>>>> "FloatArray an Array(337588 383662.5627699992 383562.2954441309
>>>> 0:00:00:06.555)"
>>>>
>>>> FloatArray is not the precision I need. But it is over 18x faster.
>>>>
>>>> I am afraid I must be doing something badly wrong. Python/Numpy is over
>>>> 4x faster than FloatArray for the above.
>>>>
>>>> If I am using Smallapack incorrectly please help.
>>>>
>>>> Any help greatly appreciated.
>>>>
>>>> Thanks.
>>>>
>>
>> --
>> Serge Stinckwich
>>
>> Int. Research Unit on Modelling/Simulation of Complex Systems (UMMISCO)
>> Sorbonne University (SU)
>> French National Research Institute for Sustainable Development (IRD)
>> University of Yaoundé I, Cameroon
>> "Programs must be written for people to read, and only incidentally for
>> machines to execute."
>> https://twitter.com/SergeStinckwich
>
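
Coming back to the moving averages in the original question quoted above: each `a sum` there walks the whole array, so repeating it for every window is likely a large part of the cost, whatever the array class. One way around it is to keep a running sum. This is only a minimal sketch in plain Pharo, using core collection messages and not Smallapack; the variable names and the six sample values are made up for illustration.

```smalltalk
"Sliding-window average with a running sum: each step adds the newest
 value and drops the oldest, so no window is ever re-summed."
| values window runningSum averages |
values := #(1.0 2.0 3.0 4.0 5.0 6.0).   "stand-in for the price column"
window := 3.
runningSum := 0.0.
averages := OrderedCollection new.
values withIndexDo: [ :each :i |
    runningSum := runningSum + each.
    "once past the first full window, remove the value that slid out"
    i > window ifTrue: [ runningSum := runningSum - (values at: i - window) ].
    "emit one average per complete window"
    i >= window ifTrue: [ averages add: runningSum / window ] ].
averages asArray
```

For the six sample values this gives the four averages 2.0, 3.0, 4.0 and 5.0. On long series of real prices the running sum slowly accumulates rounding error, so recomputing it from scratch every so often may be worth it.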