And my slower 2.7 GHz Intel Core i5 MBP with Apple's accelerated
vecLib is much faster than naive netlib BLAS (I guess vecLib is multi-threaded):

| a b |
a := LapackDGEMatrix randNormal: #(1000 1000).
b := LapackDGEMatrix randNormal: #(1000 1000).
[a * b] timeToRun.

| a b |
a := LapackSGEMatrix randNormal: #(1000 1000).
b := LapackSGEMatrix randNormal: #(1000 1000).
[a * b] timeToRun

On Tue, May 21, 2019 at 10:05, Nicolas Cellier <> wrote:

> Hi Serge,
> this is good news, having TensorFlow bindings is also a must!
> Here is what I get in Smallapack with a pure-CPU, unaccelerated BLAS (no MKL
> or ATLAS, just the plain netlib reference code):
> | a b |
> a := LapackDGEMatrix randNormal: #(1000 1000).
> b := LapackDGEMatrix randNormal: #(1000 1000).
> [a * b] timeToRun
>  783
> | a b |
> a := LapackSGEMatrix randNormal: #(1000 1000).
> b := LapackSGEMatrix randNormal: #(1000 1000).
> [a * b] timeToRun
>  448
> This is on an Intel(R) Xeon(R) CPU E3-1245 v3 @ 3.40GHz.
> So I think we can do much better with an accelerated library!
> On Tue, May 21, 2019 at 05:13, Serge Stinckwich <>
> wrote:
>> There is another solution with my TensorFlow Pharo binding.
>> You can do a matrix multiplication like this:
>> | graph t1 t2 c1 c2 mult session result |
>> graph := TF_Graph create.
>> t1 := TF_Tensor fromFloats: (1 to:1000000) asArray shape:#(1000 1000).
>> t2 := TF_Tensor fromFloats: (1 to:1000000) asArray shape:#(1000 1000).
>> c1 := graph const: 'c1' value: t1.
>> c2 := graph const: 'c2' value: t2.
>> mult := c1 * c2.
>> session := TF_Session on: graph.
>> result := session runOutput: (mult output: 0).
>> result asNumbers
>> Here I multiply two 1000x1000 matrices in 537 ms on my computer.
>> All operations are described as a graph of operations that runs outside
>> Pharo, so it could be very fast.
>> Operations can run on CPU or GPU, and both 32-bit and 64-bit float
>> operations are possible.
>> This is a work in progress but can already be used.
>> Regards,
>> On Tue, May 21, 2019 at 6:54 AM Jimmie Houchin <>
>> wrote:
>>> I wasn't worried about how to do sliding windows. My problem is that
>>> using LapackDGEMatrix in my example was 18x slower than FloatArray, which
>>> is itself slower than Numpy. It isn't what I was expecting.
>>> What I didn't know is whether I was doing something wrong to cause such a
>>> tremendous slowdown.
>>> Python and Numpy are not my favorite, but they aren't uncomfortable.
>>> So I gave up and went back to Numpy.
>>> Thanks.
>>> On 5/20/19 5:17 PM, Nicolas Cellier wrote:
>>> Hi Jimmie,
>>> indeed, I am not subscribed...
>>> Having efficient methods for sliding window average is possible, here is
>>> how I would do it:
>>> "Create a vector with 100,000 rows filled with random values (uniform
>>> distribution in [0,1])"
>>> v := LapackDGEMatrix randUniform: #(100000 1).
>>> "extract values from rank 10001 to 20000"
>>> w1 := v atIntervalFrom: 10001 to: 20000 by: 1.
>>> "create a left multiplier matrix for performing average of w1"
>>> a := LapackDGEMatrix nrow: 1 ncol: w1 nrow withAll: 1.0 / w1 size.
>>> "get the average (this is a 1x1 matrix from which we take first element)"
>>> avg1 := (a * w1) at: 1.
>>> [ "select another slice of same size"
>>> w2 := v atIntervalFrom: 15001 to: 25000 by: 1.
>>> "get the average (we can recycle a)"
>>> avg2 := (a * w2) at: 1 ] bench.
>>> This gives:
>>>  '16,500 per second. 60.7 microseconds per run.'
>>> versus:
>>> [w2 sum / w2 size] bench.
>>>  '1,100 per second. 908 microseconds per run.'
>>> For max and min, it's harder. LAPACK/BLAS only provide the maximum of the
>>> absolute values as a primitive:
>>> [w2 absMax] bench.
>>>  '19,400 per second. 51.5 microseconds per run.'
>>> Everything else will be slower, unless we write new primitives in C and
>>> connect them...
>>> [w2 maxOf: [:each | each]] bench.
>>>  '984 per second. 1.02 milliseconds per run.'
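>>> For plain moving averages there is also a route that needs no new
>>> primitive: keep a running sum and update it as the window slides. A minimal
>>> sketch with plain Pharo collections follows. It is a different technique
>>> from the matrix product above, and data and window are illustrative names
>>> and values, not taken from this thread.
>>>
>>> ```smalltalk
>>> "Sliding-window mean with a running sum: each step adds the entering value
>>>  and subtracts the leaving one, so the window is never re-summed."
>>> | data window sum means |
>>> data := (1 to: 10) collect: [ :i | i asFloat ].  "stand-in for the price column"
>>> window := 4.                                       "hypothetical window size"
>>> means := OrderedCollection new.
>>> sum := 0.0.
>>> 1 to: data size do: [ :i |
>>>     sum := sum + (data at: i).
>>>     i > window ifTrue: [ sum := sum - (data at: i - window) ].
>>>     i >= window ifTrue: [ means add: sum / window ] ].
>>> means  "an OrderedCollection(2.5 3.5 4.5 5.5 6.5 7.5 8.5)"
>>> ```
>>>
>>> Each step costs one addition and one subtraction regardless of the window
>>> size. Over very long series the running sum can accumulate rounding error,
>>> so re-summing the window from time to time may be worthwhile.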
>>> On Sun, May 19, 2019 at 14:58, Jimmie <> wrote:
>>>> On 5/16/19 1:26 PM, Nicolas Cellier wrote:
>>>>  > Any feedback on this?
>>>>  > Did someone try to use Smallapack in Pharo?
>>>>  > Jimmie?
>>>>  >
>>>> I am going to guess that you are not on pharo-users. My bad.
>>>> I posted this in pharo-users as it wasn't a Pharo development question.
>>>> I probably should have posted here or emailed you directly.
>>>> All I really need is good performance with a simple array of floats. No
>>>> matrix math. Nothing complicated. Moving Averages over a slice of the
>>>> array. A variety of different averages, weighted, etc. Max/min of the
>>>> array. But just a single simple array.
>>>> Any help greatly appreciated.
>>>> Thanks.
>>>> On 4/28/19 8:32 PM, Jimmie Houchin wrote:
>>>> Hello,
>>>> I have installed Smallapack into Pharo 7.0.3. Thanks Nicolas.
>>>> I am very unsure about my use of Smallapack. I am not a mathematician or
>>>> scientist. However, the only part of Smallapack I am trying to use at
>>>> the moment is something that would be 64-bit and comparable to FloatArray,
>>>> so that I can do some simple accessing, slicing, sum, and average on the
>>>> array.
>>>> Here is some sample code I wrote just to play in a playground.
>>>> I have an ExternalDoubleArray, LapackDGEMatrix, and a FloatArray
>>>> samples. The ones not in use are commented out for any run.
>>>> fp is a download from
>>>> and unzipped to a directory.
>>>> fp := '/home/jimmie/data/EUR_USD_Week1.csv'.
>>>> index := 0.
>>>> pricesSum := 0.
>>>> asum := 0.
>>>> ttr := [
>>>>      lines := fp asFileReference contents lines allButFirst.
>>>>      a := ExternalDoubleArray new: lines size.
>>>>      "la := LapackDGEMatrix allocateNrow: lines size ncol: 1.
>>>>      a := la columnAt: 1."
>>>>      "a := FloatArray new: lines size."
>>>>      lines do: [ :line || parts price |
>>>>          parts := ',' split: line.
>>>>          index := index + 1.
>>>>          price := Float readFrom: (parts last).
>>>>          a at: index put: price.
>>>>          pricesSum := pricesSum + price.
>>>>          (index rem: 100) = 0 ifTrue: [
>>>>              asum := a sum.
>>>>       ]]] timeToRun.
>>>> { index. pricesSum. asum. ttr }.
>>>>   "ExternalDoubleArray an Array(337588 383662.5627699992
>>>> 383562.2956199993 0:00:01:59.885)"
>>>>   "FloatArray  an Array(337588 383662.5627699992 383562.2954441309
>>>> 0:00:00:06.555)"
>>>> FloatArray does not have the precision I need, but it is over 18x faster.
>>>> I am afraid I must be doing something badly wrong. Python/Numpy is over
>>>> 4x faster than FloatArray for the above.
>>>> If I am using Smallapack incorrectly please help.
>>>> Any help greatly appreciated.
>>>> Thanks.
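>>>> A rough way to see where the time may be going: the loop above calls
>>>> a sum once every 100 lines, and each call walks the whole array. The
>>>> estimate below uses only the line count reported in the results. It is
>>>> arithmetic, not a measurement, and it assumes sum reads every element
>>>> one at a time.
>>>>
>>>> ```smalltalk
>>>> "Rough cost estimate for the benchmark loop above (not a measurement)."
>>>> | n calls |
>>>> n := 337588.         "number of data lines, from the reported index"
>>>> calls := n // 100.   "a sum runs once per 100 lines: 3375 times"
>>>> calls * n            "1139359500 element reads in total"
>>>> ```
>>>>
>>>> If an element read on an ExternalDoubleArray costs more than one on a
>>>> FloatArray (an assumption, not verified here), that per-element
>>>> difference is multiplied by roughly 1.1 billion reads.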
