[Numpy-discussion] size of arrays

2021-03-13 Thread klark--kent
Dear colleagues!

Size of np.float16(1) is 26
Size of np.float64(1) is 32
32 / 26 = 1.23

Since memory is limited I have a question after this code:

   import numpy as np
   import sys

   a1 = np.ones(1, dtype='float16')
   b1 = np.ones(1, dtype='float64')
   div_1 = sys.getsizeof(b1) / sys.getsizeof(a1)
   # div_1 = 1.06

   a2 = np.ones(10, dtype='float16')
   b2 = np.ones(10, dtype='float64')
   div_2 = sys.getsizeof(b2) / sys.getsizeof(a2)
   # div_2 = 1.51

   a3 = np.ones(100, dtype='float16')
   b3 = np.ones(100, dtype='float64')
   div_3 = sys.getsizeof(b3) / sys.getsizeof(a3)
   # div_3 = 3.0

Size of np.float64 numpy arrays is four times more than for np.float16.
Is it possible to minimize the difference close to 1.23?
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] size of arrays

2021-03-13 Thread Todd
Ideally float64 uses 64 bits for each number while float16 uses 16 bits.
64/16=4.  However, there is some additional overhead.  This overhead makes
up a large portion of small arrays, but becomes negligible as the array
gets bigger.
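
As a rough illustration of that fixed overhead (a sketch; the exact byte
counts depend on the platform and NumPy version), one can compare
`sys.getsizeof()`, which includes the ndarray wrapper, with `arr.nbytes`,
which counts only the element data:

    import sys
    import numpy as np

    for n in (1, 10, 100, 10_000):
        a = np.ones(n, dtype='float16')
        b = np.ones(n, dtype='float64')
        # getsizeof = fixed ndarray-object overhead + raw data (nbytes),
        # so the overhead term stays the same no matter how long the array is.
        print(n,
              sys.getsizeof(a) - a.nbytes,
              sys.getsizeof(b) - b.nbytes,
              sys.getsizeof(b) / sys.getsizeof(a))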

On Sat, Mar 13, 2021, 16:01  wrote:

> Dear colleagues!
>
> Size of np.float16(1) is 26
> Size of np.float64(1) is 32
> 32 / 26 = 1.23
>
> Since memory is limited I have a question after this code:
>
>import numpy as np
>import sys
>
>a1 = np.ones(1, dtype='float16')
>b1 = np.ones(1, dtype='float64')
>div_1 = sys.getsizeof(b1) / sys.getsizeof(a1)
># div_1 = 1.06
>
>a2 = np.ones(10, dtype='float16')
>b2 = np.ones(10, dtype='float64')
>div_2 = sys.getsizeof(b2) / sys.getsizeof(a2)
># div_2 = 1.51
>
>a3 = np.ones(100, dtype='float16')
>b3 = np.ones(100, dtype='float64')
>div_3 = sys.getsizeof(b3) / sys.getsizeof(a3)
># div_3 = 3.0
> Size of np.float64 numpy arrays is four times more than for np.float16.
> Is it possible to minimize the difference close to 1.23?
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] size of arrays

2021-03-13 Thread Robert Kern
On Sat, Mar 13, 2021 at 4:02 PM  wrote:

> Dear colleagues!
>
> Size of np.float16(1) is 26
> Size of np.float64(1) is 32
> 32 / 26 = 1.23
>

Note that `sys.getsizeof()` is returning the size of the given Python
object in bytes. `np.float16(1)` and `np.float64(1)` are so-called "numpy
scalar objects" that wrap up the raw `float16` (2 bytes) and `float64` (8
bytes) values with the necessary information to make them Python objects.
The extra 24 bytes for each is _not_ present for each value when you have
`float16` and `float64` arrays of larger lengths. There is still some
overhead to make the array of numbers into a Python object, but this does
not increase with the number of array elements. This is what you are seeing
below when you compute the sizes of the Python objects that are the arrays.
The fixed overhead does not increase when you increase the sizes of the
arrays. They eventually approach the ideal ratio of 4: `float64` values
take up 4 times as many bytes as `float16` values, as the names suggest.
The ratio of 1.23 that you get from comparing the scalar objects reflects
that the overhead for making a single value into a Python object takes up
significantly more memory than the actual single number itself.
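
For instance (a sketch; the exact sizes are platform- and version-dependent),
comparing the scalars' `itemsize` with `sys.getsizeof()` separates the raw
values from their Python-object wrappers:

    import sys
    import numpy as np

    x16 = np.float16(1)
    x64 = np.float64(1)

    # Raw storage: 2 bytes vs 8 bytes, i.e. the ideal 4x ratio.
    print(x16.itemsize, x64.itemsize)

    # Python-object sizes: the wrapper overhead dominates, so the ratio
    # collapses to roughly 32 / 26 = 1.23 as in the original question.
    print(sys.getsizeof(x16), sys.getsizeof(x64))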

-- 
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] size of arrays

2021-03-13 Thread klark--kent
So is it right that 100 arrays of one element is smaller than one array
with size of 100 elements?

14.03.2021, 00:06, "Todd" :

Ideally float64 uses 64 bits for each number while float16 uses 16 bits.
64/16=4.  However, there is some additional overhead.  This overhead makes
up a large portion of small arrays, but becomes negligible as the array
gets bigger.

On Sat, Mar 13, 2021, 16:01  wrote:

Dear colleagues!

Size of np.float16(1) is 26
Size of np.float64(1) is 32
32 / 26 = 1.23

Since memory is limited I have a question after this code:

   import numpy as np
   import sys

   a1 = np.ones(1, dtype='float16')
   b1 = np.ones(1, dtype='float64')
   div_1 = sys.getsizeof(b1) / sys.getsizeof(a1)
   # div_1 = 1.06

   a2 = np.ones(10, dtype='float16')
   b2 = np.ones(10, dtype='float64')
   div_2 = sys.getsizeof(b2) / sys.getsizeof(a2)
   # div_2 = 1.51

   a3 = np.ones(100, dtype='float16')
   b3 = np.ones(100, dtype='float64')
   div_3 = sys.getsizeof(b3) / sys.getsizeof(a3)
   # div_3 = 3.0

Size of np.float64 numpy arrays is four times more than for np.float16.
Is it possible to minimize the difference close to 1.23?
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] size of arrays

2021-03-13 Thread Robert Kern
On Sat, Mar 13, 2021 at 4:18 PM  wrote:

> So is it right that 100 arrays of one element is smaller than one array
> with size of 100 elements?
>

No, typically the opposite is true.

-- 
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] size of arrays

2021-03-13 Thread Todd
No, because the array of 100 elements will only have the overhead once,
while the 100 arrays will each have the overhead repeated.


Think of the overhead like a book cover on a book. It takes additional
space, but provides storage for the book, information to help you find it,
etc. Each book only needs one cover. So a single 100-page book only needs
one cover, while a hundred 1-page books need 100 covers. Also, as the book
gets more pages, the cover takes up a smaller portion of the total size of
the book.
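
A small sketch of that comparison (exact sizes vary by platform and NumPy
version):

    import sys
    import numpy as np

    one_big = np.ones(100, dtype='float64')
    many_small = [np.ones(1, dtype='float64') for _ in range(100)]

    # One array pays the fixed ndarray overhead once, plus 100 * 8 bytes of data.
    print(sys.getsizeof(one_big))

    # 100 one-element arrays pay that overhead 100 times (and the Python
    # list holding them adds still more on top of this sum).
    print(sum(sys.getsizeof(a) for a in many_small))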

On Sat, Mar 13, 2021, 16:17  wrote:

> So is it right that 100 arrays of one element is smaller than one array
> with size of 100 elements?
>
> 14.03.2021, 00:06, "Todd" :
>
> Ideally float64 uses 64 bits for each number while float16 uses 16 bits.
> 64/16=4.  However, there is some additional overhead.  This overhead makes
> up a large portion of small arrays, but becomes negligible as the array
> gets bigger.
>
> On Sat, Mar 13, 2021, 16:01  wrote:
>
> Dear colleagues!
>
> Size of np.float16(1) is 26
> Size of np.float64(1) is 32
> 32 / 26 = 1.23
>
> Since memory is limited I have a question after this code:
>
>import numpy as np
>import sys
>
>a1 = np.ones(1, dtype='float16')
>b1 = np.ones(1, dtype='float64')
>div_1 = sys.getsizeof(b1) / sys.getsizeof(a1)
># div_1 = 1.06
>
>a2 = np.ones(10, dtype='float16')
>b2 = np.ones(10, dtype='float64')
>div_2 = sys.getsizeof(b2) / sys.getsizeof(a2)
># div_2 = 1.51
>
>a3 = np.ones(100, dtype='float16')
>b3 = np.ones(100, dtype='float64')
>div_3 = sys.getsizeof(b3) / sys.getsizeof(a3)
># div_3 = 3.0
> Size of np.float64 numpy arrays is four times more than for np.float16.
> Is it possible to minimize the difference close to 1.23?
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Numpy 1.20.1 availability

2021-03-13 Thread dan_patterson
Any idea why the most recent version isn't available on the main anaconda
channel?  conda-forge and building from source are not options for a number of
reasons.  I posted a package request there, but double-digit days have gone by
and it has only received a thumbs up and a package-request tag:
https://github.com/ContinuumIO/anaconda-issues/issues/12309
I realize it could be the "times", or maybe no one is aware of its absence.



--
Sent from: http://numpy-discussion.10968.n7.nabble.com/
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Looking for a difference between Numpy 1.19.5 and 1.20 explaining a perf regression with Pythran

2021-03-13 Thread Juan Nunez-Iglesias
Hi Pierre,

If you’re able to compile NumPy locally and you have reliable benchmarks, you 
can write a script that tests the runtime of your benchmark and reports it as a 
test pass/fail. You can then use “git bisect run” to automatically find the 
commit that caused the issue. That will help narrow down the discussion before 
it gets completely derailed a second time. 😂

https://lwn.net/Articles/317154/

Juan. 
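
A minimal sketch of such a pass/fail script (hypothetical names:
`run_benchmark()` stands in for the timed loop in bench.py, and THRESHOLD is a
runtime chosen between the known-fast and known-slow builds). It would be
invoked as `git bisect run python check_perf.py`, with NumPy rebuilt at each
bisect step before the script runs:

    # check_perf.py -- exit 0 = good (fast), 1 = bad (slow), 125 = skip commit
    import sys
    import time

    THRESHOLD = 1.0  # seconds; pick a value between the fast and slow runtimes

    def run_benchmark():
        # Placeholder for the real benchmark, e.g. the timed loop from bench.py.
        import numpy as np
        a = np.ones((10_000, 3))
        for _ in range(500):
            np.ascontiguousarray(a).sum()

    try:
        t0 = time.perf_counter()
        run_benchmark()
        elapsed = time.perf_counter() - t0
    except Exception:
        sys.exit(125)  # broken build/import at this commit: tell bisect to skip it

    sys.exit(0 if elapsed < THRESHOLD else 1)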

> On 13 Mar 2021, at 10:34 am, PIERRE AUGIER 
>  wrote:
> 
> Hi,
> 
> I tried to compile Numpy with `pip install numpy==1.20.1 --no-binary numpy 
> --force-reinstall` and I can reproduce the regression.
> 
> Good news, I was able to reproduce the difference with only Numpy 1.20.1. 
> 
> Arrays prepared with (`df` is a Pandas dataframe)
> 
> arr = df.values.copy()
> 
> or 
> 
> arr = np.ascontiguousarray(df.values)
> 
> lead to "slow" execution while arrays prepared with
> 
> arr = np.copy(df.values)
> 
> lead to faster execution.
> 
> arr.copy() and np.copy(arr) do not give the same result, with arr obtained 
> from a Pandas dataframe with arr = df.values. It's strange because 
> type(df.values) gives <class 'numpy.ndarray'>, so I would expect arr.copy() 
> and np.copy(arr) to give exactly the same result.
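
One way to look for a difference between those arrays (a diagnostic sketch,
not from the original message; the DataFrame below is a stand-in for the real
data) is to compare contiguity, strides, and the alignment of the data
pointer:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.rand(1000, 4))  # stand-in for the real data

    candidates = {
        "df.values.copy()": df.values.copy(),
        "np.ascontiguousarray(df.values)": np.ascontiguousarray(df.values),
        "np.copy(df.values)": np.copy(df.values),
    }

    for name, arr in candidates.items():
        # Contiguity and strides usually match; the alignment of the data
        # buffer (address modulo 64) is one property that can still differ.
        print(name, arr.flags['C_CONTIGUOUS'], arr.strides,
              arr.ctypes.data % 64)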
> 
> Note that I think I'm doing quite serious and reproducible benchmarks. I also 
> checked that this regression is reproducible on another computer.
> 
> Cheers,
> 
> Pierre
> 
> - Original Mail -
>> From: "Sebastian Berg" 
>> To: "numpy-discussion" 
>> Sent: Friday, 12 March 2021 22:50:24
>> Subject: Re: [Numpy-discussion] Looking for a difference between Numpy 1.19.5 
>> and 1.20 explaining a perf regression with
>> Pythran
> 
>>> On Fri, 2021-03-12 at 21:36 +0100, PIERRE AUGIER wrote:
>>> Hi,
>>> 
>>> I'm looking for a difference between Numpy 1.19.5 and 1.20 which
>>> could explain a performance regression (~15%) with Pythran.
>>> 
>>> I observe this regression with the script
>>> https://github.com/paugier/nbabel/blob/master/py/bench.py
>>> 
>>> Pythran reimplements Numpy so it is not about Numpy code for
>>> computation. However, Pythran of course uses the native array
>>> contained in a Numpy array. I'm quite sure that something has changed
>>> between Numpy 1.19.5 and 1.20 (or between the corresponding wheels?)
>>> since I don't get the same performance with Numpy 1.20. I checked
>>> that the values in the arrays are the same and that the flags
>>> characterizing the arrays are also the same.
>>> 
>>> Good news, I'm now able to obtain the performance difference just
>>> with Numpy 1.19.5. In this code, I load the data with Pandas and need
>>> to prepare contiguous Numpy arrays to give them to Pythran. With
>>> Numpy 1.19.5, if I use np.copy I get better performance than with
>>> np.ascontiguousarray. With Numpy 1.20, both functions create arrays
>>> giving the same performance with Pythran (again, less good than with
>>> Numpy 1.19.5).
>>> 
>>> Note that this code is very efficient (more than 100 times faster
>>> than using Numpy), so I guess that things like alignment or memory
>>> location can lead to such difference.
>>> 
>>> More details in this issue
>>> https://github.com/serge-sans-paille/pythran/issues/1735
>>> 
>>> Any help to understand what has changed would be greatly appreciated!
>>> 
>> 
>> If you want to really dig into this, it would be good to do profiling
>> to find out at where the differences are.
>> 
>> Without that, I don't have much appetite to investigate personally. The
>> reason is that fluctuations of ~30% (or even much more) when running
>> the NumPy benchmarks are very common.
>> 
>> I am not aware of an immediate change in NumPy, especially since you
>> are talking about Pythran, and only the memory space or the interface code
>> should matter.
>> As to the interface code... I would expect it to be quite a bit faster,
>> not slower.
>> There was no change around data allocation, so at best what you are
>> seeing is a different pattern in how the "small array cache" ends up
>> being used.
>> 
>> 
>> Unfortunately, getting stable benchmarks that reflect code changes
>> exactly is tough...  Here is a nice blog post from Victor Stinner where
>> he had to go as far as using "profile guided compilation" to avoid
>> fluctuations:
>> 
>> https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html
>> 
>> I somewhat hope that this is also the reason for the huge fluctuations
>> we see in the NumPy benchmarks due to absolutely unrelated code
>> changes.
>> But I did not have the energy to try it (and a probably fixed bug in
>> gcc makes it a bit harder right now).
>> 
>> Cheers,
>> 
>> Sebastian
>> 
>> 
>> 
>> 
>>> Cheers,
>>> Pierre
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>> 
>> 
>> 
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion