Re: [Numpy-discussion] record data previous to Numpy use

2017-07-06 Thread paul . carrico
Dear All

First of all, thanks for the answers and the information (I'll dig
into it); let me try to add comments on what I want to do: 

* My ASCII file mainly contains data (float and int) in a single column
* (it is not always the case, but I can easily manage it - I also saw
I can use the 'split' instruction if necessary)
* Comments/text indicate the beginning of a block, immediately
followed by the number of sub-blocks

* So I need to read/record all the values in order to build a matrix
before working on it (using Numpy & vectorization) 

* The columns 2 and 3 have been added for further treatments
* The '0' values will be specifically treated afterward

Numpy won't be a problem I guess (I did some basic tests and I'm quite
confident about how to proceed), but I'm really blocked on recording the
data … I'm trying to find a way to efficiently read and record data in a
matrix: 

* avoiding dynamic memory allocation (here 'append' in the Python
sense, not np),
* dealing with huge ASCII files: the latest file I got contains more
than 60 MILLION lines

Please find in attachment an extract of the input format
('example_of_input'), and the matrix I'm trying to create and manage
with Numpy 

Thanks again for your time 

Paul 

### 

##BEGIN _-> line number x in the original file_ 

42   _-> indicates the number of sub-blocks_ 

1 _-> number of the 1st sub-block_ 

6 _-> gives how many values belong to the sub-block_ 

12 

47 

2 

46 

3 

51 

…. 

13  _ -> another type of sub-block with 25 values_ 

25 

15 

88 

21 

42 

22 

76 

19 

89 

0 

18 

80 

23 

38 

24 

73 

20 

81 

0 

90 

0 

41 

0 

39 

0 

77 

… 

42 _-> another type of sub-block with 2 values_ 

2 

115 

109 

 ### 

THE MATRIX RESULT 

1 0 0 6 12 47 2 46 3 51 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

2 0 0 6 3 50 11 70 12 51 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

3 0 0 8 11 50 3 49 4 54 5 57 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

4 0 0 8 12 70 11 66 9 65 10 68 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

5 0 0 8 2 47 12 68 10 44 1 43 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

6 0 0 8 5 56 6 58 7 61 11 57 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

7 0 0 8 11 61 7 60 8 63 9 66 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

8 0 0 19 12 47 2 46 3 51 0 13 97 14 92 15 96 0 72 0 48 0 52 0 0 0 0 0 0 

9 0 0 19 13 97 14 92 15 96 0 16 86 17 82 18 85 0 95 0 91 0 90 0 0 0 0 0 0 

10 0 0 19 3 50 11 70 12 51 0 15 89 19 94 13 96 0 52 0 71 0 72 0 0 0 0 0 0 

11 0 0 19 15 89 19 94 13 96 0 18 81 20 84 16 85 0 90 0 77 0 95 0 0 0 0 0 0 

12 0 0 25 3 49 4 54 5 57 11 50 0 15 88 21 42 22 76 19 89 0 52 0 53 0 55 0 71 

13 0 0 25 15 88 21 42 22 76 19 89 0 18 80 23 38 24 73 20 81 0 90 0 41 0 39 0 77 

14 0 0 25 11 66 9 65 10 68 12 70 0 19 78 25 99 26 98 13 94 0 71 0 67 0 69 0 72 

…. 

### 

AN EXAMPLE OF THE CODE I STARTED TO WRITE 

# -*- coding: utf-8 -*-

import time, sys, os, re
import itertools

import numpy as np

PATH = str(os.path.abspath(''))
input_file_name = '/example_of_input.txt'

## check if the file exists, then if it's empty or not
if os.path.isfile(PATH + input_file_name):
    if os.stat(PATH + input_file_name).st_size > 0:
        ## go through the file in order to find specific sentences
        ## specific blocks will be defined afterward
        Block_position = []
        j = 0
        with open(PATH + input_file_name, "r") as data:
            for line in data:
                if '##BEGIN' in line:
                    Block_position.append(j)
                j = j + 1

        ## just tests to get all the values
        #i = 0
        #data = np.zeros( (505), dtype=np.int )
        #with open(PATH + input_file_name, "r") as f:
        #    for i in range(0, 505):
        #        data[i] = int(f.read(Block_position[0]+1+i))
        #        print ("i = ", i)
        #    for line in itertools.islice(f, Block_position[0], 516):
        #        data[i] = f.read(0+i)
        #        i = i+1
    else:
        print "The file %s is empty : post-processing cannot be performed !!!\n" % input_file_name
else:
    print "Error : the file %s does not exist: post-processing stops !!!\n" % input_file_name
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] record data previous to Numpy use

2017-07-06 Thread Robert Kern
On Thu, Jul 6, 2017 at 1:49 AM,  wrote:
>
> Dear All
>
> First of all thanks for the answers and the information’s (I’ll ding into
it) and let me trying to add comments on what I want to :
>
> My asci file mainly contains data (float and int) in a single column
> (it is not always the case but I can easily manage it – as well I saw I
can use ‘spli’ instruction if necessary)
> Comments/texts indicates the beginning of a bloc immediately followed by
the number of sub-blocs
> So I need to read/record all the values in order to build a matrix before
working on it (using Numpy & vectorization)
>
> The columns 2 and 3 have been added for further treatments
> The ‘0’ values will be specifically treated afterward
>
>
> Numpy won’t be a problem I guess (I did some basic tests and I’m quite
confident) on how to proceed, but I’m really blocked on data records … I
trying to find a way to efficiently read and record data in a matrix:
>
> avoiding dynamic memory allocation (here using ‘append’ in python
meaning, not np),

Although you can avoid some list appending in your case (because the blocks
self-describe their length), I would caution you against prematurely
avoiding it. It's often the most natural way to write the code in Python,
so go ahead and write it that way first. Once you get it working correctly,
but it's too slow or memory intensive, then you can puzzle over how to
preallocate the numpy arrays later. But quite often, it's fine. In this
case, the reading and handling of the text data itself is probably the
bottleneck, not appending to the lists. As I said, Python lists are
cleverly implemented to make appending fast. Accumulating numbers in a list
then converting to an array afterwards is a well-accepted numpy idiom.
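That accumulate-then-convert idiom is only a few lines (a minimal sketch; the hard-coded list here stands in for lines read from a real file):

```python
import numpy as np

# Append parsed values to a plain Python list, then build the
# array once at the end with a single conversion.
lines = ["12", "47", "2", "46", "3", "51"]  # stand-in for file lines

values = []
for line in lines:
    values.append(float(line))   # list.append is amortized O(1)

arr = np.array(values)           # one conversion to a NumPy array
print(arr.sum())                 # 161.0
```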

> dealing with huge asci file: the latest file I get contains more than 60
million of lines
>
> Please find in attachment an extract of the input format
(‘example_of_input’), and the matrix I’m trying to create and manage with
Numpy
>
> Thanks again for your time

Try something like the attached. The function will return a list of blocks.
Each block will itself be a list of numpy arrays, which are the sub-blocks
themselves. I didn't bother adding the first three columns to the
sub-blocks or trying to assemble them all into a uniform-width matrix by
padding with trailing 0s. Since you say that the trailing 0s are going to
be "specially treated afterwards", I suspect that you can more easily work
with the lists of arrays instead. I assume floating-point data rather than
trying to figure out whether int or float from the data. The code can
handle multiple data values on one line (not especially well-tested, but it
ought to work), but it assumes that the number of sub-blocks, index of the
sub-block, and sub-block size are each on the own line. The code gets a
little more complicated if that's not the case.

--
Robert Kern
from __future__ import print_function

import numpy as np


def write_random_file(filename, n_blocks=42, n_elems=60*1000*1000):
    q, r = divmod(n_elems, n_blocks)
    block_lengths = [q] * n_blocks
    block_lengths[-1] += r
    with open(filename, 'w') as f:
        print('##BEGIN', file=f)
        print(n_blocks, file=f)
        for i, block_length in enumerate(block_lengths, 1):
            print(i, file=f)
            print(block_length, file=f)
            block = np.random.randint(0, 1000, size=block_length)
            for x in block:
                print(x, file=f)


def read_blocked_file(filename):
    blocks = []
    with open(filename, 'r') as f:
        # Loop over all blocks.
        while True:
            # Consume lines until the start of the next block.
            # Unfortunately, we can't use `for line in f:` because we need to
            # use `f.readline()` later.
            line = f.readline()
            found_block = True
            while '##BEGIN' not in line:
                line = f.readline()
                if line == '':
                    # We've reached the end of the file.
                    found_block = False
                    break
            if not found_block:
                # We iterated to the end of the file. Break out of the `while`
                # loop.
                break

            # Read the number of sub-blocks.
            # This assumes that it is on a line all by itself.
            n_subblocks = int(f.readline())
            subblocks = []
            for i_subblock in range(1, n_subblocks + 1):
                read_i_subblock = int(f.readline())
                # These ought to match.
                if read_i_subblock != i_subblock:
                    raise RuntimeError("Mismatched sub-block index")
                # Read the size of the sub-block.
                subblock_size = int(f.readline())
                # Allocate an array for the contents.
                subblock_data = np.empty(subblock_size, dtype=float)
                i = 0
                while True:
                    line = f.readline()
                    # Fill the preallocated array; there may be several
                    # values on one line.
                    values = [float(x) for x in line.split()]
                    subblock_data[i:i + len(values)] = values
                    i += len(values)
                    if i >= subblock_size:
                        break
                subblocks.append(subblock_data)
            blocks.append(subblocks)
    return blocks

Re: [Numpy-discussion] record data previous to Numpy use

2017-07-06 Thread paul . carrico
Thanks Robert for your effort - I'll have a look at it 

... the goal is to be guided in how to proceed (and to understand), not
to have a "ready-made solution" ... but I appreciate it, honestly :-) 

Paul 

Le 2017-07-06 11:51, Robert Kern a écrit :

> On Thu, Jul 6, 2017 at 1:49 AM,  wrote:
>> 
>> Dear All
>> 
>> First of all thanks for the answers and the information's (I'll ding into 
>> it) and let me trying to add comments on what I want to :
>> 
>> My asci file mainly contains data (float and int) in a single column
>> (it is not always the case but I can easily manage it - as well I saw I can 
>> use 'spli' instruction if necessary)
>> Comments/texts indicates the beginning of a bloc immediately followed by the 
>> number of sub-blocs
>> So I need to read/record all the values in order to build a matrix before 
>> working on it (using Numpy & vectorization)
>> 
>> The columns 2 and 3 have been added for further treatments
>> The '0' values will be specifically treated afterward
>> 
>> 
>> Numpy won't be a problem I guess (I did some basic tests and I'm quite 
>> confident) on how to proceed, but I'm really blocked on data records ... I 
>> trying to find a way to efficiently read and record data in a matrix:
>> 
>> avoiding dynamic memory allocation (here using 'append' in python meaning, 
>> not np), 
> 
> Although you can avoid some list appending in your case (because the blocks 
> self-describe their length), I would caution you against prematurely avoiding 
> it. It's often the most natural way to write the code in Python, so go ahead 
> and write it that way first. Once you get it working correctly, but it's too 
> slow or memory intensive, then you can puzzle over how to preallocate the 
> numpy arrays later. But quite often, it's fine. In this case, the reading and 
> handling of the text data itself is probably the bottleneck, not appending to 
> the lists. As I said, Python lists are cleverly implemented to make appending 
> fast. Accumulating numbers in a list then converting to an array afterwards 
> is a well-accepted numpy idiom. 
> 
>> dealing with huge asci file: the latest file I get contains more than 60 
>> million of lines
>> 
>> Please find in attachment an extract of the input format 
>> ('example_of_input'), and the matrix I'm trying to create and manage with 
>> Numpy
>> 
>> Thanks again for your time
> 
> Try something like the attached. The function will return a list of blocks. 
> Each block will itself be a list of numpy arrays, which are the sub-blocks 
> themselves. I didn't bother adding the first three columns to the sub-blocks 
> or trying to assemble them all into a uniform-width matrix by padding with 
> trailing 0s. Since you say that the trailing 0s are going to be "specially 
> treated afterwards", I suspect that you can more easily work with the lists 
> of arrays instead. I assume floating-point data rather than trying to figure 
> out whether int or float from the data. The code can handle multiple data 
> values on one line (not especially well-tested, but it ought to work), but it 
> assumes that the number of sub-blocks, index of the sub-block, and sub-block 
> size are each on the own line. The code gets a little more complicated if 
> that's not the case.
> 
> --
> Robert Kern 


Re: [Numpy-discussion] Scipy 2017 NumPy sprint

2017-07-06 Thread Ben Rowland
> On 5 Jul 2017, at 19:05, Stephan Hoyer  wrote:
> 
> On Wed, Jul 5, 2017 at 10:40 AM, Chris Barker  wrote:
> Along those lines, there was some discussion of having a set of utilities (or 
> maybe even an ABC?) that would make it easier to create a ndarray-like 
> object.
> 
> That is, the boilerplate needed for multi-dimensional indexing and slicing, 
> etc...
> 
> That could be a nice little sprint-able project.
> 
> Indeed. Let me highlight a few mixins that I wrote for xarray that might be 
> more broadly useful. The challenge here is that there are quite a few 
> different meanings to "ndarray-like", so mixins really need to be 
> mix-and-match-able. But at least defining a base list of methods to 
> implement/override would be useful.
> 
> In NumPy, this could go along with NDArrayOperatorsMixin in 
> numpy/lib/mixins.py
Slightly off topic, but as someone who has just spent a fair amount of time
implementing various subclasses of ndarray, I am interested (and a little
concerned) that the consensus is not to use them. Is there anything
available which explains why this is the case and what the alternatives
are?

Ben


[Numpy-discussion] Making a 1.13.2 release

2017-07-06 Thread Charles R Harris
Hi All,

I've delayed the NumPy 1.13.2 release hoping for Python 3.6.2 to show up
fixing #29943 so we can close #9272, but the Python release has
been delayed to July 11 (expected). The Python problem means that NumPy
compiled with Python 3.6.1 will not run in Python 3.6.0. However, I've also
been asked to have a bugfixed version of 1.13 available for Scipy 2017 next
week. At this point it looks like the best thing to do is release 1.13.1
compiled with Python 3.6.1 and ask folks to upgrade Python if they have a
problem, and then release 1.13.2 as soon as 3.6.2 is released.

Thoughts?

Chuck


Re: [Numpy-discussion] Making a 1.13.2 release

2017-07-06 Thread Matthew Brett
Hi,

On Thu, Jul 6, 2017 at 2:10 PM, Charles R Harris
 wrote:
> Hi All,
>
> I've delayed the NumPy 1.13.2 release hoping for Python 3.6.2 to show up
> fixing #29943  so we can close #9272, but the Python release has been
> delayed to July 11 (expected). The Python problem means that NumPy compiled
> with Python 3.6.1 will not run in Python 3.6.0. However, I've also been
> asked to have a bugfixed version of 1.13 available for Scipy 2017 next week.
> At this point it looks like the best thing to do is release 1.13.1 compiled
> with Python 3.6.1 and ask folks to upgrade Python if they have a problem,
> and then release 1.13.2 as soon as 3.6.2 is released.

I think this problem only applies to Windows.  We might be able to
downgrade the Appveyor Python 3.6.1 to 3.6.0 for that - I can look
into it today if it would help.

While I'm at it - how about switching to OpenBLAS wheels on Windows
for this release?

Cheers,

Matthew


Re: [Numpy-discussion] Making a 1.13.2 release

2017-07-06 Thread Charles R Harris
On Thu, Jul 6, 2017 at 7:15 AM, Matthew Brett 
wrote:

> Hi,
>
> On Thu, Jul 6, 2017 at 2:10 PM, Charles R Harris
>  wrote:
> > Hi All,
> >
> > I've delayed the NumPy 1.13.2 release hoping for Python 3.6.2 to show up
> > fixing #29943  so we can close #9272, but the Python release has been
> > delayed to July 11 (expected). The Python problem means that NumPy
> compiled
> > with Python 3.6.1 will not run in Python 3.6.0. However, I've also been
> > asked to have a bugfixed version of 1.13 available for Scipy 2017 next
> week.
> > At this point it looks like the best thing to do is release 1.13.1
> compiled
> > with Python 3.6.1 and ask folks to upgrade Python if they have a problem,
> > and then release 1.13.2 as soon as 3.6.2 is released.
>
> I think this problem only applies to Windows.  We might be able to
> downgrade the Appveyor Python 3.6.1 to 3.6.0 for that - I can look
> into it today if it would help.
>
> While I'm at it - how about switching to OpenBLAS wheels on Windows
> for this release?
>
> Cheers,
>
> Matthew
>

Haste makes waste ;) I'd rather put off the move to OpenBLAS to 1.14 to
allow more time for it to settle, and compiling against Python 3.6.0 seems
like more work than it is worth. It should be easy to upgrade to 3.6.1 for
those affected once they are aware of the problem, and it should not be too
long before Python 3.6.2 is out. I'll call it the Scipy2017 release.

Chuck



Re: [Numpy-discussion] Making a 1.13.2 release

2017-07-06 Thread Matthew Brett
On Thu, Jul 6, 2017 at 3:37 PM, Charles R Harris
 wrote:
>
>
> On Thu, Jul 6, 2017 at 7:15 AM, Matthew Brett 
> wrote:
>>
>> Hi,
>>
>> On Thu, Jul 6, 2017 at 2:10 PM, Charles R Harris
>>  wrote:
>> > Hi All,
>> >
>> > I've delayed the NumPy 1.13.2 release hoping for Python 3.6.2 to show up
>> > fixing #29943  so we can close #9272, but the Python release has been
>> > delayed to July 11 (expected). The Python problem means that NumPy
>> > compiled
>> > with Python 3.6.1 will not run in Python 3.6.0. However, I've also been
>> > asked to have a bugfixed version of 1.13 available for Scipy 2017 next
>> > week.
>> > At this point it looks like the best thing to do is release 1.13.1
>> > compiled
>> > with Python 3.6.1 and ask folks to upgrade Python if they have a
>> > problem,
>> > and then release 1.13.2 as soon as 3.6.2 is released.
>>
>> I think this problem only applies to Windows.  We might be able to
>> downgrade the Appveyor Python 3.6.1 to 3.6.0 for that - I can look
>> into it today if it would help.
>>
>> While I'm at it - how about switching to OpenBLAS wheels on Windows
>> for this release?
>>
>> Cheers,
>>
>> Matthew
>
>
> Haste makes waste ;) I'd rather put off the move to OpenBlas to 1.14 to
> allow more time for it to settle,

I'd only say that I don't know of any settling that is likely to
happen.  I suspect that not many people have tried the experimental
wheels.   I've automated the build process both for OpenBLAS and the
OpenBLAS wheels, and I believe those are solid now.

> and compiling against Python 3.6.0 seems
> like more work than it is worth,

Probably about two hours of futzing on Appveyor - your call - I'm
happy not to do it :)

Cheers,

Matthew


Re: [Numpy-discussion] record data previous to Numpy use

2017-07-06 Thread Chris Barker
OK, you have two performance "issues":

1) Memory use: if you need to read a file to build a numpy array, and don't
know how big it is when you start, you need to accumulate the values
first, and then make an array out of them. And numpy arrays are fixed size,
so they cannot efficiently accumulate values.

The usual way to handle this is to read the data into a list with .append()
or the like, and then make an array from it. This is quite fast -- lists
are fast and efficient for extending arrays. However, you are then storing
(at least) a pointer and a Python float object for each value, which is a
lot more memory than a single float value in a numpy array, and you need to
make the array from it, which means you have the full list and all its
Python floats AND the array in memory at once.

Frankly, computers have a lot of memory these days, so this is a non-issue
in most cases.

Nonetheless, a while back I wrote an extendable numpy array object to
address just this issue. You can find the code on gitHub here:

https://github.com/PythonCHB/NumpyExtras/blob/master/numpy_extras/accumulator.py

I have not tested it with recent numpy's, but I expect it still works fine.
It's also py2, but wouldn't take much to port.

In practice, it uses less memory than the "build a list, then make it into
an array" approach, but isn't any faster, unless you add (.extend) a bunch
of values at once rather than one at a time (if you do it one at a time,
the whole Python-float-to-numpy-float conversion and function call overhead
takes just as long).

But it will generally be as fast or faster than using a list, and use
less memory, so it's a fine basis for a big ASCII file reader.
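The core idea behind such an extendable array fits in a few lines (an illustrative sketch only, not the actual NumpyExtras accumulator): keep a preallocated buffer and grow it geometrically, so appends are amortized O(1) like a Python list's, while the data stays in a compact NumPy buffer.

```python
import numpy as np

class Accumulator:
    """Sketch of a growable NumPy-backed array (illustrative only)."""

    def __init__(self, dtype=float):
        self._buf = np.empty(16, dtype=dtype)  # preallocated storage
        self._n = 0                            # number of valid entries

    def extend(self, values):
        values = np.asarray(values, dtype=self._buf.dtype)
        need = self._n + len(values)
        if need > len(self._buf):
            # Geometric growth keeps the amortized cost per append O(1).
            new_buf = np.empty(max(need, 2 * len(self._buf)),
                               dtype=self._buf.dtype)
            new_buf[:self._n] = self._buf[:self._n]
            self._buf = new_buf
        self._buf[self._n:need] = values
        self._n = need

    def to_array(self):
        # Copy so later growth doesn't alias the returned array.
        return self._buf[:self._n].copy()

acc = Accumulator()
for chunk in ([1.0, 2.0, 3.0], [4.0, 5.0]):
    acc.extend(chunk)
print(acc.to_array().tolist())   # [1.0, 2.0, 3.0, 4.0, 5.0]
```

As the text notes, feeding it chunks via `extend` is where it wins; element-by-element appends pay the per-call overhead either way.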

However, it looks like while your files may be huge, they hold a number of
arrays, so each array may not be large enough to bother with any of this.

2) Parsing and converting overhead -- for the most part, python/numpy text
file reading code reads the text into a python string, converts it to python
number objects, then puts them in a list or converts them to native numbers
in an array. This whole process is a bit slow (though reading files is slow
anyway, so usually not worth worrying about, which is why the built-in file
reading methods do this). To improve this, you need to use code that reads
the file and parses it in C, and puts it straight into a numpy array
without passing through python. This is what the pandas (and I assume
astropy) text file readers do.

But if you don't want those dependencies, there is the "fromfile()"
function in numpy -- it is not very robust, but if your files are
well-formed, then it is quite fast. So your code would look something like:

with open(the_filename) as infile:
    while True:
        line = infile.readline()
        if not line:
            break
        # work with line to figure out the next block
        if ready_to_read_a_block:
            # sep specifies that you are reading text, not binary!
            arr = np.fromfile(infile, dtype=np.int32, count=num_values,
                              sep=' ')
            arr.shape = the_shape_it_should_be


But Robert is right -- get it to work with the "usual" methods -- i.e. put
numbers in a list, then make an array out of it -- first, and then worry
about making it faster.
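For reference, here is a self-contained demo of `np.fromfile` in text mode (the temporary file and its contents are made up for illustration): with `sep=' '`, whitespace -- including newlines -- separates items, so a well-formed column of numbers loads in one C-level call.

```python
import os
import tempfile
import numpy as np

# Write a small single-column file of integers, like the sub-block data.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("12\n47\n2\n46\n3\n51\n")
    path = f.name

# Parse the whole file of whitespace-separated text in one call.
arr = np.fromfile(path, dtype=np.int64, sep=' ')
os.remove(path)
print(arr.tolist())   # [12, 47, 2, 46, 3, 51]
```

One caveat worth knowing: mixing buffered `readline()` calls with `np.fromfile` on the same open file handle can leave the underlying file position ahead of where the text reader appears to be, so per-block reads like the sketch above need some care.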

-CHB


On Thu, Jul 6, 2017 at 1:49 AM,  wrote:

> Dear All
>
>
> First of all thanks for the answers and the information’s (I’ll ding into
> it) and let me trying to add comments on what I want to :
>
>1. My asci file mainly contains data (float and int) in a single column
>2. (it is not always the case but I can easily manage it – as well I
>saw I can use ‘spli’ instruction if necessary)
>3. Comments/texts indicates the beginning of a bloc immediately
>followed by the number of sub-blocs
>4. So I need to read/record all the values in order to build a matrix
>before working on it (using Numpy & vectorization)
>   - The columns 2 and 3 have been added for further treatments
>   - The ‘0’ values will be specifically treated afterward
>
>
> Numpy won’t be a problem I guess (I did some basic tests and I’m quite
> confident) on how to proceed, but I’m really blocked on data records … I
> trying to find a way to efficiently read and record data in a matrix:
>
>- avoiding dynamic memory allocation (here using ‘append’ in python
>meaning, not np),
>- dealing with huge asci file: the latest file I get contains more
>than *60 million of lines*
>
>
> Please find in attachment an extract of the input format
> (‘example_of_input’), and the matrix I’m trying to create and manage with
> Numpy
>
>
> Thanks again for your time
>
> Paul
>
>
> ###
>
> ##BEGIN *-> line number x in the original file*
>
> 42   *-> indicates the number of sub-blocs*
>
> 1 *-> number of the 1rst sub-bloc*
>
> 6 *-> gives how many value belong to the sub bloc*
>
> 12
>
> 47
>
> 2
>
> 46
>
> 3
>
> 51
>
> ….
>
> 13  * -> another type of sub-bloc with 25 values*
>
> 25
>
> 15
>
> 88

Re: [Numpy-discussion] Making a 1.13.2 release

2017-07-06 Thread Charles R Harris
On Thu, Jul 6, 2017 at 9:53 AM, Matthew Brett 
wrote:

> On Thu, Jul 6, 2017 at 3:37 PM, Charles R Harris
>  wrote:
> >
> >
> > On Thu, Jul 6, 2017 at 7:15 AM, Matthew Brett 
> > wrote:
> >>
> >> Hi,
> >>
> >> On Thu, Jul 6, 2017 at 2:10 PM, Charles R Harris
> >>  wrote:
> >> > Hi All,
> >> >
> >> > I've delayed the NumPy 1.13.2 release hoping for Python 3.6.2 to show
> up
> >> > fixing #29943  so we can close #9272, but the Python release has been
> >> > delayed to July 11 (expected). The Python problem means that NumPy
> >> > compiled
> >> > with Python 3.6.1 will not run in Python 3.6.0. However, I've also
> been
> >> > asked to have a bugfixed version of 1.13 available for Scipy 2017 next
> >> > week.
> >> > At this point it looks like the best thing to do is release 1.13.1
> >> > compiled
> >> > with Python 3.6.1 and ask folks to upgrade Python if they have a
> >> > problem,
> >> > and then release 1.13.2 as soon as 3.6.2 is released.
> >>
> >> I think this problem only applies to Windows.  We might be able to
> >> downgrade the Appveyor Python 3.6.1 to 3.6.0 for that - I can look
> >> into it today if it would help.
> >>
> >> While I'm at it - how about switching to OpenBLAS wheels on Windows
> >> for this release?
> >>
> >> Cheers,
> >>
> >> Matthew
> >
> >
> > Haste makes waste ;) I'd rather put off the move to OpenBlas to 1.14 to
> > allow more time for it to settle,
>
> I'd only say that I don't know of any settling that is likely to
> happen.  I suspect that not many people have tried the experimental
> wheels.   I've automated the build process both for OpenBLAS and the
> OpenBLAS wheels, and I believe those are solid now.
>

But it does add risk. We can deal with that in a regular release because of
the betas and release candidates, but I'm not counting on any of those for
1.13.1 (2).

Chuck


Re: [Numpy-discussion] Making a 1.13.2 release

2017-07-06 Thread Chris Barker
On Thu, Jul 6, 2017 at 6:10 AM, Charles R Harris 
wrote:

> I've delayed the NumPy 1.13.2 release hoping for Python 3.6.2 to show up
> fixing #29943 so we can close #9272, but the Python release has
> been delayed to July 11 (expected). The Python problem means that NumPy
> compiled with Python 3.6.1 will not run in Python 3.6.0.
>

If it's compiled against 3.6.0, will it work fine with 3.6.1, and probably
3.6.2 as well?

If so, it would be nice to do it that way, if Matthew doesn't mind :-)

But either way, it'll be good to get it out.

Thanks!

-CHB



> However, I've also been asked to have a bugfixed version of 1.13 available
> for Scipy 2017 next week. At this point it looks like the best thing to do
> is release 1.13.1 compiled with Python 3.6.1 and ask folks to upgrade
> Python if they have a problem, and then release 1.13.2 as soon as 3.6.2 is
> released.
>
> Thoughts?
>
> Chuck
>


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] Scipy 2017 NumPy sprint

2017-07-06 Thread Chris Barker
On Wed, Jul 5, 2017 at 11:05 AM, Stephan Hoyer  wrote:

> That is, the boilerplate needed for multi-dimensional indexing and
>> slicing, etc...
>>
>> That could be a nice little sprint-able project.
>>
>
> Indeed. Let me highlight a few mixins that I wrote for xarray that might
> be more broadly useful.
>

At a quick glance, that is exactly the kind of thing I had in mind.

The challenge here is that there are quite a few different meanings to
> "ndarray-like", so mixins really need to be mix-and-match-able.
>

exactly!


> But at least defining a base list of methods to implement/override would
> be useful.
>

With sample implementations, even... at least parts of it -- I'm thinking
things like parsing out the indexes/slices in __getitem__ -- that sort of
thing.



> In NumPy, this could go along with NDArrayOperatorsMixin in
> numpy/lib/mixins.py

Yes! I had no idea that existed.

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] Scipy 2017 NumPy sprint

2017-07-06 Thread Stephan Hoyer
On Thu, Jul 6, 2017 at 4:42 AM, Ben Rowland  wrote:

> Slightly off topic, but as someone who has just spent a fair amount of
> time implementing various
> subclasses of nd-array, I am interested (and a little concerned), that the
> consensus is not to use
> them. Is there anything available which explains why this is the case and
> what the alternatives
> are?
>

Writing such docs (especially to explain how to write array-like objects
that aren't subclasses) would be another good topic for the sprint ;).

But more seriously: numpy.ndarray subclasses are supported, but inherently
error prone, because we don't have a well defined subclassing API. As
Martin will attest, this means seemingly harmless internal refactoring in
NumPy has a tendency to break downstream subclasses, which often
unintentionally end up relying on untested implementation details.

This is particularly problematic when subclasses are implemented in a
different code-base, as is the case for user subclasses of numpy.ndarray.
Due to diligent testing efforts, we often (but not always) catch these
issues before making a release, but the process is inherently error prone.
Writing NumPy functionality in a manner that is robust to all possible
subclassing approaches turns out to be very difficult (nearly impossible).

This is actually a classic OOP problem, e.g., see
https://en.wikipedia.org/wiki/Composition_over_inheritance
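A minimal sketch of the composition approach (the `UnitArray` class and its unit handling are hypothetical, for illustration only): wrap an ndarray instead of subclassing it, and let `NDArrayOperatorsMixin` supply the arithmetic operators in terms of `__array_ufunc__`.

```python
import numpy as np
from numpy.lib.mixins import NDArrayOperatorsMixin

class UnitArray(NDArrayOperatorsMixin):
    """Hypothetical array-with-a-unit-label, composed rather than subclassed."""

    def __init__(self, data, unit):
        self.data = np.asarray(data)
        self.unit = unit

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Unwrap any UnitArray inputs, apply the ufunc, and re-wrap.
        arrays = [x.data if isinstance(x, UnitArray) else x for x in inputs]
        result = getattr(ufunc, method)(*arrays, **kwargs)
        return UnitArray(result, self.unit)

    def __repr__(self):
        return "UnitArray(%r, unit=%r)" % (self.data, self.unit)

a = UnitArray([1.0, 2.0, 3.0], "m")
b = a + a                      # dispatches through __array_ufunc__
print(b.data.tolist())         # [2.0, 4.0, 6.0]
```

Because the wrapper only exposes the behavior it explicitly defines, internal NumPy refactorings cannot reach in the way they can with an ndarray subclass.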


Re: [Numpy-discussion] Scipy 2017 NumPy sprint

2017-07-06 Thread Stephan Hoyer
On Thu, Jul 6, 2017 at 9:42 AM, Chris Barker  wrote:

>> In NumPy, this could go along with NDArrayOperatorsMixin in
>> numpy/lib/mixins.py
>
> Yes! I had no idea that existed.
>

It's brand new for NumPy 1.13 :). I wrote it to go along with
__array_ufunc__.


Re: [Numpy-discussion] Making a 1.13.2 release

2017-07-06 Thread Nathaniel Smith
It's also possible to work around the 3.6.1 problem with a small
preprocessor hack. On my phone but there's a link in the bug report
discussion.

On Jul 6, 2017 6:10 AM, "Charles R Harris" 
wrote:

> Hi All,
>
> I've delayed the NumPy 1.13.2 release hoping for Python 3.6.2 to show up
> fixing #29943   so we can close #9272
> , but the Python release has
> been delayed to July 11 (expected). The Python problem means that NumPy
> compiled with Python 3.6.1 will not run in Python 3.6.0. However, I've also
> been asked to have a bugfixed version of 1.13 available for Scipy 2017 next
> week. At this point it looks like the best thing to do is release 1.13.1
> compiled with Python 3.6.1 and ask folks to upgrade Python if they have a
> problem, and then release 1.13.2 as soon as 3.6.2 is released.
>
> Thoughts?
>
> Chuck
>


Re: [Numpy-discussion] record data previous to Numpy use

2017-07-06 Thread paul . carrico
Thanks all for your advice.

Well, there are many things to look into, but it's obvious now that I first
have to work on a better strategy than the one I had in mind previously
(i.e. loading all the files and results in one step). 

It is just a thought, but for huge files one solution might be to
split/write/build the array in a dedicated file first (two O(n) passes:
one to identify the block sizes, an additional one to read and write the
values), and then to load it into memory and work with numpy - at that
stage the dimensions are known, and some packages will be faster and
better suited (pandas or astropy, as suggested). 
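A rough sketch of that two-pass idea, simplified to one value per line (the real block-structured format would need the same pattern with more bookkeeping; `load_values` is a hypothetical name, not from any package):

```python
import numpy as np

def load_values(path):
    """Two-pass read, assuming one integer per non-blank line.

    Pass 1 counts rows so the array can be allocated once;
    pass 2 fills it in place -- no list append, no reallocation.
    """
    with open(path) as f:
        n = sum(1 for line in f if line.strip())
    data = np.empty(n, dtype=np.int64)
    with open(path) as f:
        i = 0
        for line in f:
            line = line.strip()
            if line:
                data[i] = int(line)
                i += 1
    return data
```

The price of avoiding dynamic allocation is reading the file twice; for a 60-million-line file that trade-off is worth measuring before committing to it.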

Thanks all for your time and help 

Paul


Re: [Numpy-discussion] record data previous to Numpy use

2017-07-06 Thread Robert Kern
On Thu, Jul 6, 2017 at 3:19 AM,  wrote:
>
> Thanks Robert for your effort - I'll have a look at it
>
> ... the goal is to be guided in how to proceed (and to understand), not
to have a "ready-made solution" ... but I honestly appreciate it :-)

Sometimes it's easier to just write the code than to try to explain in
prose what to do. :-)

--
Robert Kern


Re: [Numpy-discussion] record data previous to Numpy use

2017-07-06 Thread Chris Barker
On Thu, Jul 6, 2017 at 10:55 AM,  wrote:
>
> It's is just a reflexion, but for huge files one solution might be to
> split/write/build first the array in a dedicated file (2x o(n) iterations -
> one to identify the blocks size - additional one to get and write), and
> then to load it in memory and work with numpy -
>

I may have your use case confused, but if you have a huge file with
multiple "blocks" in it, there shouldn't be any problem with loading it in
one go -- start at the top of the file and load one block at a time
(accumulating in a list) -- then you only have the memory overhead for one
block at a time, which should be no problem.
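A sketch of that accumulate-one-block-at-a-time pattern, assuming the simplified layout from the example file (sub-bloc number, then a count, then that many values, one integer per line; `read_blocks` is a hypothetical helper, not library API):

```python
import numpy as np

def read_blocks(f):
    """Read sub-blocs one at a time from an open file.

    Assumed layout per bloc: bloc number, value count, then that many
    values, one integer per line.  Only one bloc's worth of Python
    objects is alive at any moment; each finished bloc becomes a
    small ndarray appended to the result list.
    """
    blocks = []
    lines = (ln.strip() for ln in f if ln.strip())
    for token in lines:
        bloc_id = int(token)
        count = int(next(lines))
        values = [int(next(lines)) for _ in range(count)]
        blocks.append((bloc_id, np.array(values)))
    return blocks
```

From the list of per-bloc arrays, the final padded matrix can then be assembled in a single allocation, since the maximum bloc length is known at that point.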

at this stage the dimension is known and some packages will be fast and
> more adapted (pandas or astropy as suggested).
>
pandas, at least, is designed to read variations of CSV files; I'm not sure
you could use the optimized parser to read an array starting at a
particular point in an open file.
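One way to use the fast parser anyway, once the block boundaries are known, is `pandas.read_csv` with `skiprows`/`nrows`, which lets the C parser start at an arbitrary line (a sketch assuming a strictly one-value-per-line file; `read_block` is a hypothetical helper):

```python
import pandas as pd

def read_block(path, start, count):
    """Read `count` single-column rows starting at 0-based line
    `start` with pandas' C parser, returning a 1-D integer array.

    Assumes the file really is one number per line; the cost of
    `skiprows` still grows with `start`, since skipped lines must
    be scanned.
    """
    df = pd.read_csv(path, header=None, skiprows=start, nrows=count)
    return df[0].values
```

Note the scan cost of `skiprows`: reading many blocks this way re-skips the file prefix each time, so reading sequentially from a single pass (or one big `read_csv` and slicing) may still win.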

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] Making a 1.13.2 release

2017-07-06 Thread Juan Nunez-Iglesias
Just chiming in with a +1 to releasing 1.13.1 before SciPy. It will certainly 
save the skimage tutorial a lot of headaches! Not that I’ll be there but I look 
out for my own. =P

On 7 Jul 2017, 3:54 AM +1000, Nathaniel Smith , wrote:
> It's also possible to work around the 3.6.1 problem with a small preprocessor 
> hack. On my phone but there's a link in the bug report discussion.


[Numpy-discussion] NumPy 1.13.1 released

2017-07-06 Thread Charles R Harris
Hi All,

On behalf of the NumPy team, I am pleased to announce the release of NumPy
1.13.1. This is a bugfix release for problems found in 1.13.0. The major
changes are:


   - fixes for the new memory overlap detection,
   - fixes for the new temporary elision capability,
   - reversion of the removal of the boolean binary ``-`` operator.


It is recommended that users of 1.13.0 upgrade to 1.13.1. Wheels can be
found on PyPI. Source tarballs, zipfiles, release notes, and the changelog
are available on GitHub.

Note that the wheels for Python 3.6 are built against 3.6.1, hence will not
work with 3.6.0 due to Python bug #29943. The plan is to release NumPy
1.13.2 shortly after Python 3.6.2, which fixes that problem, is out. If you
are using 3.6.0, the workaround is to upgrade to 3.6.1 or use an earlier
Python version.




*Pull requests merged*

A total of 19 pull requests were merged for this release.

* #9240 DOC: BLD: fix lots of Sphinx warnings/errors.
* #9255 Revert "DEP: Raise TypeError for subtract(bool_, bool_)."
* #9261 BUG: don't elide into readonly and updateifcopy temporaries for...
* #9262 BUG: fix missing keyword rename for common block in numpy.f2py
* #9263 BUG: handle resize of 0d array
* #9267 DOC: update f2py front page and some doc build metadata.
* #9299 BUG: Fix Intel compilation on Unix.
* #9317 BUG: fix wrong ndim used in empty where check
* #9319 BUG: Make extensions compilable with MinGW on Py2.7
* #9339 BUG: Prevent crash if ufunc doc string is null
* #9340 BUG: umath: un-break ufunc where= when no out= is given
* #9371 DOC: Add isnat/positive ufunc to documentation
* #9372 BUG: Fix error in fromstring function from numpy.core.records...
* #9373 BUG: ')' is printed at the end pointer of the buffer in numpy.f2py.
* #9374 DOC: Create NumPy 1.13.1 release notes.
* #9376 BUG: Prevent hang traversing ufunc userloop linked list
* #9377 DOC: Use x1 and x2 in the heaviside docstring.
* #9378 DOC: Add $PARAMS to the isnat docstring
* #9379 DOC: Update the 1.13.1 release notes



*Contributors*
A total of 12 people contributed to this release.  People with a "+" by
their
names contributed a patch for the first time.

* Andras Deak +
* Bob Eldering +
* Charles Harris
* Daniel Hrisca +
* Eric Wieser
* Joshua Leahy +
* Julian Taylor
* Michael Seifert
* Pauli Virtanen
* Ralf Gommers
* Roland Kaufmann
* Warren Weckesser