Re: [Numpy-discussion] record data previous to Numpy use

2017-07-05 Thread Robert McLeod
While I'd bet that the fastest way to build an ndarray from ASCII
is with an `io.BytesIO` stream, NumPy does have a function to load from text,
`numpy.loadtxt`, that works well enough for most purposes.

https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html
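
For a file that already fits in memory this is a one-liner. A minimal
sketch, assuming a whitespace-delimited file named 'data.txt' (the file
name and dtype are hypothetical):

import numpy as np

# parse whitespace-delimited ASCII text straight into a float array
arr = np.loadtxt('data.txt', dtype=np.float64)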

It's hard to tell from the original post whether the ASCII is being
continuously generated or not.  If it's being produced in an ongoing fashion
then a stream object is definitely the way to go, as the array chunks can be
produced by `numpy.frombuffer()`.

https://docs.python.org/3/library/io.html

https://docs.scipy.org/doc/numpy/reference/generated/numpy.frombuffer.html
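
To illustrate the stream idea: `numpy.frombuffer()` consumes raw bytes, so
this minimal sketch fakes a continuously produced binary stream with
`io.BytesIO` (the chunk size and dtype are assumptions):

import io
import numpy as np

# stand-in for a stream that keeps producing binary float64 values
stream = io.BytesIO(np.arange(10, dtype=np.float64).tobytes())

chunks = []
while True:
    buf = stream.read(8 * 4)  # up to 4 float64 values (8 bytes each) per chunk
    if not buf:
        break
    chunks.append(np.frombuffer(buf, dtype=np.float64))

arr = np.concatenate(chunks)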

Robert


On Wed, Jul 5, 2017 at 3:21 PM, Robert Kern  wrote:

> On Wed, Jul 5, 2017 at 5:41 AM,  wrote:
> >
> > Dear all
> >
> > I’m sorry if my question is too basic (not fully in relation to Numpy –
> while it is to build matrices and to work with Numpy afterward), but I’m
> spending a lot of time and effort to find a way to record data from an
> ASCII file and reassign it into a matrix/array … without success!
> >
> > The only way I found is to use the ‘append()’ instruction, which involves
> dynamic memory allocation. :-(
>
> Are you talking about appending to Python list objects? Or the np.append()
> function on numpy arrays?
>
> In my experience, it is usually fine to build a list with the `.append()`
> method while reading the file of unknown size and then converting it to an
> array afterwards, even for dozens of millions of lines. The list object is
> quite smart about reallocating memory so it is not that expensive. You
> should generally avoid the np.append() function, though; it is not smart.
>
> > From my current experience under Scilab (a Matlab-like scientific
> solver), the usual approach is well known:
> >
> > Step 1 : matrix initialization like ‘np.zeros(n,n)’
> > Step 2 : record the data
> > and write it in the matrix (step 3)
> >
> > I’m obviously influenced by my current experience, but I’m interested in
> moving to Python and its packages
> >
> > For huge ASCII files (involving dozens of millions of lines), my strategy
> is to work by ‘blocks’ as follows:
> >
> > Find the line index of the beginning and the end of one block (this
> implies that the file is read once)
> > Read the block
> > (process repeated on the different other blocks)
>
> Are the blocks intrinsic parts of the file? Or are you just trying to
> break up the file into fixed-size chunks?
>
> > I tried different codes such as the one below, but each time Python tells
> me I cannot mix iteration and record method
> >
> > #
> >
> > position = []; j = 0
> > with open(PATH + file_name, "r") as rough_data:
> >     for line in rough_data:
> >         if my_criteria in line:
> >             position.append(j)  ## huge blocks but limited in number
> >         j = j + 1
> >
> > i = 0
> > blockdata = np.zeros((size_block), dtype=np.float)
> > with open(PATH + file_name, "r") as f:
> >     for line in itertools.islice(f, 1, size_block):
> >         blockdata[i] = float(f.readline())
>
> For what it's worth, this is the line that is causing the error that you
> describe. When you iterate over the file with the `for line in
> itertools.islice(f, ...):` loop, you already have the line text. You don't
> (and can't) call `f.readline()` to get it again. It would mess up the
> iteration if you did and cause you to skip lines.
>
> By the way, it is useful to help us help you if you copy-paste the exact
> code that you are running as well as the full traceback instead of
> paraphrasing the error message.
>
> --
> Robert Kern
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
>


-- 
Robert McLeod, Ph.D.
robert.mcl...@unibas.ch
robert.mcl...@bsse.ethz.ch 
robbmcl...@gmail.com
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] record data previous to Numpy use

2017-07-05 Thread Derek Homeier
Hi Paul,

> ASCII is the input format (and the only one I can deal with)
> 
> HDF5 might be an export format (it's one of the options) in order to speed
> up the post-processing stage
> 
> 
> 
> Paul
> 
> 
> 
> 
> 
> On 2017-07-05 20:19, Thomas Caswell wrote:
> 
>> Are you tied to ASCII files?   HDF5 (via h5py or pytables) might be a better 
>> storage format for what you are describing.
>>  
>> Tom
>> 
>> On Wed, Jul 5, 2017 at 8:42 AM  wrote:
>> Dear all
>> 
>> 
>> 
>> I'm sorry if my question is too basic (not fully in relation to Numpy – 
>> while it is to build matrices and to work with Numpy afterward), but I'm 
>> spending a lot of time and effort to find a way to record data from an
>> ASCII file and reassign it into a matrix/array ... without success!
>> 
>> 
>> 
>> The only way I found is to use the 'append()' instruction, which involves
>> dynamic memory allocation. :-(
>> 
>> 
>> 
>> From my current experience under Scilab (a Matlab-like scientific solver),
>> the usual approach is well known:
>> 
>>  • Step 1 : matrix initialization like 'np.zeros(n,n)'
>>  • Step 2 : record the data
>>  • and write it in the matrix (step 3)
>> 
>> 
>> I'm obviously influenced by my current experience, but I'm interested in 
>> moving to Python and its packages
>> 
>> 
>> 
>> For huge ASCII files (involving dozens of millions of lines), my strategy is
>> to work by 'blocks' as follows:
>> 
>>  • Find the line index of the beginning and the end of one block (this
>> implies that the file is read once)
>>  • Read the block
>>  • (process repeated on the different other blocks)
>> 
>> 
>> I tried different codes such as the one below, but each time Python tells me
>> I cannot mix iteration and record method
>> 

if you are indeed tied to using ASCII input data, you will of course have to 
deal with significant
performance handicaps, but there are at least some gains to be had by using an 
input parser
that does not do all the conversions at the Python level, but with a compiled 
(C) reader - either
pandas as Tom already mentioned, or astropy - see e.g. 
https://github.com/dhomeier/astropy-notebooks/blob/master/io/ascii/ascii_read_bench.ipynb
for the almost one order of magnitude speed gains you may get.

In your example it is not clear what “record” method you were trying to use 
that raised the errors
you mention - we would certainly need a full traceback of the error to find out 
more.

In principle your approach of allocating the numpy matrix first and reading the 
data in chunks
makes sense, as it will avoid the much larger temporary lists created during 
read-in.
But it might be more convenient to just read in the block into a list of lines 
and pass that to a
higher-level reader like np.genfromtxt or the faster astropy.io.ascii.read or 
pandas.read_csv
to speed up the parsing of the numbers themselves.
That said, on most systems these readers should still be able to handle files 
up to a few 10^8
items (expect ~ 25-55 bytes of memory for each input number allocated for 
temporary lists),
so if saving memory is not an absolute priority, directly reading the entire 
file might still be the
best choice (and would also save the first pass reading).
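
A minimal sketch of that block-wise variant, reading one block of lines and
handing it to a higher-level parser in a single call (the block boundaries
and file name are hypothetical; astropy.io.ascii.read or pandas.read_csv
could stand in for np.genfromtxt):

import itertools
import numpy as np

start, stop = 1000, 2000  # hypothetical line indices delimiting one block
with open('data.txt', 'r') as f:
    block_lines = list(itertools.islice(f, start, stop))
# the whole block is parsed in one call instead of line by line
block = np.genfromtxt(block_lines)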

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] record data previous to Numpy use

2017-07-05 Thread Robert Kern
On Wed, Jul 5, 2017 at 5:41 AM,  wrote:
>
> Dear all
>
> I’m sorry if my question is too basic (not fully in relation to Numpy –
while it is to build matrices and to work with Numpy afterward), but I’m
spending a lot of time and effort to find a way to record data from an
ASCII file and reassign it into a matrix/array … without success!
>
> The only way I found is to use the ‘append()’ instruction, which involves
dynamic memory allocation. :-(

Are you talking about appending to Python list objects? Or the np.append()
function on numpy arrays?

In my experience, it is usually fine to build a list with the `.append()`
method while reading the file of unknown size and then converting it to an
array afterwards, even for dozens of millions of lines. The list object is
quite smart about reallocating memory so it is not that expensive. You
should generally avoid the np.append() function, though; it is not smart.
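
A minimal sketch of that pattern (the file name is hypothetical, and one
float per line is assumed, as in the original post):

import numpy as np

values = []
with open('data.txt', 'r') as f:
    for line in f:
        values.append(float(line))  # list.append() amortizes its reallocations
arr = np.array(values, dtype=np.float64)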

> From my current experience under Scilab (a Matlab-like scientific
solver), the usual approach is well known:
>
> Step 1 : matrix initialization like ‘np.zeros(n,n)’
> Step 2 : record the data
> and write it in the matrix (step 3)
>
> I’m obviously influenced by my current experience, but I’m interested in
moving to Python and its packages
>
> For huge ASCII files (involving dozens of millions of lines), my strategy
is to work by ‘blocks’ as follows:
>
> Find the line index of the beginning and the end of one block (this
implies that the file is read once)
> Read the block
> (process repeated on the different other blocks)

Are the blocks intrinsic parts of the file? Or are you just trying to break
up the file into fixed-size chunks?

> I tried different codes such as the one below, but each time Python tells
me I cannot mix iteration and record method
>
> #
>
> position = []; j = 0
> with open(PATH + file_name, "r") as rough_data:
>     for line in rough_data:
>         if my_criteria in line:
>             position.append(j)  ## huge blocks but limited in number
>         j = j + 1
>
> i = 0
> blockdata = np.zeros((size_block), dtype=np.float)
> with open(PATH + file_name, "r") as f:
>     for line in itertools.islice(f, 1, size_block):
>         blockdata[i] = float(f.readline())

For what it's worth, this is the line that is causing the error that you
describe. When you iterate over the file with the `for line in
itertools.islice(f, ...):` loop, you already have the line text. You don't
(and can't) call `f.readline()` to get it again. It would mess up the
iteration if you did and cause you to skip lines.
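
A corrected sketch of the loop, consuming the line the iterator already
yields (the file name is hypothetical; size_block follows the numbers in
the original post):

import itertools
import numpy as np

size_block = 65406  # taken from the arange example in the original post
blockdata = np.zeros(size_block, dtype=np.float64)
with open('data.txt', 'r') as f:
    for i, line in enumerate(itertools.islice(f, 1, size_block)):
        blockdata[i] = float(line)  # use `line`; don't call f.readline() again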

By the way, it is useful to help us help you if you copy-paste the exact
code that you are running as well as the full traceback instead of
paraphrasing the error message.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] record data previous to Numpy use

2017-07-05 Thread paul . carrico
Hi 

Thanks for the answer: 

ASCII is the input format (and the only one I can deal with) 

HDF5 might be an export format (it's one of the options) in order to
speed up the post-processing stage 

Paul 

On 2017-07-05 20:19, Thomas Caswell wrote:

> Are you tied to ASCII files?   HDF5 (via h5py or pytables) might be a better 
> storage format for what you are describing. 
> 
> Tom 
> 
> On Wed, Jul 5, 2017 at 8:42 AM  wrote: 
> 
>> Dear all 
>> 
>> I'm sorry if my question is too basic (not fully in relation to Numpy - 
>> while it is to build matrices and to work with Numpy afterward), but I'm 
>> spending a lot of time and effort to find a way to record data from an
>> ASCII file and reassign it into a matrix/array ... without success!
>> 
>> The only way I found is to use the _'append()'_ instruction, which involves
>> dynamic memory allocation. :-(
>> 
>> From my current experience under Scilab (a Matlab-like scientific solver),
>> the usual approach is well known:
>> 
>> * Step 1 : matrix initialization like _'np.zeros(n,n)'_
>> * Step 2 : record the data
>> * and write it in the matrix (step 3)
>> 
>> I'm obviously influenced by my current experience, but I'm interested in 
>> moving to Python and its packages 
>> 
>> For huge ASCII files (involving dozens of millions of lines), my strategy is
>> to work by 'blocks' as follows:
>> 
>> * Find the line index of the beginning and the end of one block (this
>> implies that the file is read once)
>> * Read the block
>> * (process repeated on the different other blocks)
>> 
>> I tried different codes such as the one below, but each time Python tells me
>> I CANNOT MIX ITERATION AND RECORD METHOD
>> 
>> # 
>> 
>> position = []; j = 0
>> with open(PATH + file_name, "r") as rough_data:
>>     for line in rough_data:
>>         if my_criteria in line:
>>             position.append(j)  ## huge blocks but limited in number
>>         j = j + 1
>>
>> i = 0
>> blockdata = np.zeros((size_block), dtype=np.float)
>> with open(PATH + file_name, "r") as f:
>>     for line in itertools.islice(f, 1, size_block):
>>         blockdata[i] = float(f.readline())
>>         i = i + 1
>> 
>> # 
>> 
>> Should I work on lists using f.readlines() (but this implies loading the
>> whole file into memory)?
>> 
>> Additional question: can I record with vectorization, using
>> 'i = np.arange(0,65406)' if I stay with the previous example?
>> 
>> Thanks for your time and comprehension 
>> 
>> (I'm obviously interested by doc references speaking about those specific 
>> tasks) 
>> 
>> Paul 
>> 
>> PS: for Chuck: I'll have a look at the pandas package, but at a
>> code-optimization step :-) (nearly 2000 doc pages)
>> 
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] record data previous to Numpy use

2017-07-05 Thread Thomas Caswell
Are you tied to ASCII files?   HDF5 (via h5py or pytables) might be a
better storage format for what you are describing.

Tom

On Wed, Jul 5, 2017 at 8:42 AM  wrote:

> Dear all
>
>
> I’m sorry if my question is too basic (not fully in relation to Numpy –
> while it is to build matrices and to work with Numpy afterward), but I’m
> spending a lot of time and effort to find a way to record data from an
> ASCII file and reassign it into a matrix/array … without success!
>
>
> The only way I found is to use the *‘append()’* instruction, which involves
> dynamic memory allocation. :-(
>
>
> From my current experience under Scilab (a Matlab-like scientific solver),
> the usual approach is well known:
>
>1. Step 1 : matrix initialization like *‘np.zeros(n,n)’*
>2. Step 2 : record the data
>3. and write it in the matrix (step 3)
>
>
> I’m obviously influenced by my current experience, but I’m interested in
> moving to Python and its packages
>
>
> For huge ASCII files (involving dozens of millions of lines), my strategy
> is to work by ‘blocks’ as follows:
>
>- Find the line index of the beginning and the end of one block (this
>implies that the file is read once)
>- Read the block
>- (process repeated on the different other blocks)
>
>
> I tried different codes such as the one below, but each time Python tells me
> *I cannot mix iteration and record method*
>
> #
>
> position = []; j = 0
> with open(PATH + file_name, "r") as rough_data:
>     for line in rough_data:
>         if my_criteria in line:
>             position.append(j)  ## huge blocks but limited in number
>         j = j + 1
>
> i = 0
> blockdata = np.zeros((size_block), dtype=np.float)
> with open(PATH + file_name, "r") as f:
>     for line in itertools.islice(f, 1, size_block):
>         blockdata[i] = float(f.readline())
>         i = i + 1
>
>  #
>
>
> Should I work on lists using f.readlines() (but this implies loading the
> whole file into memory)?
>
>
> *Additional question*: can I record with vectorization, using
> ‘i = np.arange(0,65406)’ if I stay with the previous example?
>
>
>
> Thanks for your time and comprehension
>
> (I’m obviously interested by doc references speaking about those specific
> tasks)
>
>
> Paul
>
>
> PS: for Chuck: I’ll have a look at the pandas package, but at a
> code-optimization step :-) (nearly 2000 doc pages)
>
>
>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Scipy 2017 NumPy sprint

2017-07-05 Thread Stephan Hoyer
On Wed, Jul 5, 2017 at 10:40 AM, Chris Barker  wrote:

> Along those lines, there was some discussion of having a set of utilities
> (or maybe even an ABC?) that would make it easier to create an ndarray-like
> object.
>
> That is, the boilerplate needed for multi-dimensional indexing and
> slicing, etc...
>
> That could be a nice little sprint-able project.
>

Indeed. Let me highlight a few mixins that I wrote for xarray that might be
more broadly useful. The challenge here is that there are quite a few
different meanings to "ndarray-like", so mixins really need to be
mix-and-match-able. But at least defining a base list of methods to
implement/override would be useful.

In NumPy, this could go along with NDArrayOperatorsMixin in
numpy/lib/mixins.py
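
For illustration, a minimal sketch of the kind of ndarray-like this enables;
the Wrapped class and its dispatch logic are hypothetical, not xarray's
actual code:

import numpy as np
from numpy.lib.mixins import NDArrayOperatorsMixin

class Wrapped(NDArrayOperatorsMixin):
    # the mixin derives +, -, *, etc. from __array_ufunc__ below
    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # unwrap Wrapped inputs, apply the ufunc, re-wrap the result
        arrays = [x.data if isinstance(x, Wrapped) else x for x in inputs]
        return Wrapped(getattr(ufunc, method)(*arrays, **kwargs))

# Wrapped([1, 2]) + 3 now works through the mixin's operator methods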

___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Scipy 2017 NumPy sprint

2017-07-05 Thread Chris Barker
On Mon, Jul 3, 2017 at 4:27 PM, Stephan Hoyer  wrote:

> If someone who does subclasses/array-likes or so (e.g. like Stefan
>> Hoyer ;)) and is interested, and also we do some
>> teleconferencing/chatting (and I have time) I might be interested
>> in discussing and possibly trying to develop the new indexer ideas,
>> which I feel are pretty far, but I got stuck on how to get subclasses
>> right.
>
>
> I am of course very happy to discuss this (online or via teleconference,
> sadly I won't be at scipy), but to be clear I use array likes, not
> subclasses. I think Marten van Kerkwijk is the last one who thinks that is
> still a good idea :).
>

Indeed -- I thought the community more or less had decided that duck-typing
was THE way to make something that could be plugged in where a numpy array
is expected.

Along those lines, there was some discussion of having a set of utilities
(or maybe even an ABC?) that would make it easier to create an ndarray-like
object.

That is, the boilerplate needed for multi-dimensional indexing and slicing,
etc...

That could be a nice little sprint-able project.

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R   (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Scipy 2017 NumPy sprint

2017-07-05 Thread Charles R Harris
Lots of good ideas here. It would help if issues were opened for them and
flagged with the sprint label. I'll be doing some myself, but I'm not as
intimately familiar with some of the topics as the proposers are.



Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Scipy 2017 NumPy sprint

2017-07-05 Thread David Cournapeau
On Wed, Jul 5, 2017 at 10:43 AM, Ralf Gommers 
wrote:

>
>
> On Mon, Jul 3, 2017 at 7:01 AM, Charles R Harris <
> charlesr.har...@gmail.com> wrote:
>
>>
>>
>> On Sun, Jul 2, 2017 at 9:33 AM, Sebastian Berg <
>> sebast...@sipsolutions.net> wrote:
>>
>>> On Sun, 2017-07-02 at 10:49 -0400, Allan Haldane wrote:
>>> > On 07/02/2017 10:03 AM, Charles R Harris wrote:
>>> > > Updated list below.
>>> > >
>>> > > On Sat, Jul 1, 2017 at 7:08 PM, Benjamin Root wrote:
>>> > >
>>> > > Just a heads-up. There is now a sphinx-gallery plugin.
>>> > > Matplotlib
>>> > > and a few other projects have migrated their docs over to use
>>> > > it.
>>> > >
>>> > > https://sphinx-gallery.readthedocs.io/en/latest/
>>> > > 
>>> > >
>>> > > Cheers!
>>> > > Ben Root
>>> > >
>>> > >
>>> > > On Sat, Jul 1, 2017 at 7:12 AM, Ralf Gommers wrote:
>>> > >
>>> > >
>>> > >
>>> > > On Fri, Jun 30, 2017 at 6:50 AM, Pauli Virtanen wrote:
>>> > >
>>> > > Charles R Harris kirjoitti 29.06.2017 klo 20:45:
>>> > > > Here's a random idea: how about building a NumPy
>>> > > gallery?
>>> > > > scikit-{image,learn} has it, and while those
>>> > > projects may have more
>>> > > > visual datasets, I can imagine something along
>>> > > the lines of Nicolas
>>> > > > Rougier's beautiful book:
>>> > > >
>>> > > > http://www.labri.fr/perso/nrougier/from-python-to-numpy/
>>> > > >
>>> > > >
>>> > > > So that would be added in the numpy/numpy.org repo?
>>> > >
>>> > > Or https://scipy-cookbook.readthedocs.io/ ?
>>> > > (maybe minus bitrot and images added :)
>>> > >
>>> > >
>>> > > I'd like the numpy.org one. numpy.org is now incredibly sparse and
>>> > > ugly, a gallery would make it look a lot better.
>>> > >
>>> > > Another idea, from the "deprecate np.matrix" discussion:
>>> > > add
>>> > > numpy documentation describing the preferred way to handle
>>> > > matrices, extolling the virtues of @, and move np.matrix
>>> > > documentation to a deprecated section.
>>> > >
>>> > >
>>> > >   Putting things together with a few new ideas,
>>> > >
>>> > >  1. add gallery to numpy.org,
>>> > >  2. add extended documentation of '@' operator,
>>> > >  3. make Numpy tests Pytest compatible,
>>> > >  4. add matrix multiplication ufunc.
>>> > >
>>> > >   Any more ideas?
>>> >
>>> > The new doctest runner suggested in the printing thread? This is to
>>> > ignore whitespace and precision in ndarray output.
>>> >
>>> > I can see an argument for distributing it in numpy if it is designed
>>> > to
>>> > be specially aware of ndarrays or numpy scalars (eg to test equality
>>> > between 'wants' and 'got')
>>> >
>>>
>>> I don't really feel it is very numpy specific or should be under the
>>> numpy umbrella (I mean if there is no other spot, I guess it could live
>>> on the numpy github page). Its about as numpy specific, as the gallery
>>> sphinx extension is probably matplotlib specific
>>>
>>> That doesn't mean that it might not be a good sprint, though :).
>>>
>>> The question to me is a bit what those who actually go there want from
>>> it or do a few people who know numpy/scipy already plan to come? Two
>>> years ago, we did not have much of a plan, so it was mostly giving
>>> three people or so a bit of a tutorial of how numpy worked internally
>>> leading to some bug fixes.
>>>
>>> One quick idea that might be nice and dives a bit into the C-layer
>>> (might be nice if there is no big topic with a few people working on):
>>>
>>> * Find places that should have the new memory overlap
>>>   detection and implement it there.
>>>
>>> If someone who does subclasses/array-likes or so (e.g. like Stefan
>>> Hoyer ;)) and is interested, and also we do some
>>> teleconferencing/chatting (and I have time) I might be interested
>>> in discussing and possibly trying to develop the new indexer ideas,
>>> which I feel are pretty far, but I got stuck on how to get subclasses
>>> right.
>>>
>>> - Sebastian

[Numpy-discussion] record data previous to Numpy use

2017-07-05 Thread paul . carrico
Dear all 

I'm sorry if my question is too basic (not fully in relation to Numpy -
while it is to build matrices and to work with Numpy afterward), but I'm
spending a lot of time and effort to find a way to record data from an
ASCII file and reassign it into a matrix/array … without success! 

The only way I found is to use the _'append()'_ instruction, which involves
dynamic memory allocation. :-( 

From my current experience under Scilab (a Matlab-like scientific
solver), the usual approach is well known: 

* Step 1 : matrix initialization like _'np.zeros(n,n)'_
* Step 2 : record the data
* and write it in the matrix (step 3)

I'm obviously influenced by my current experience, but I'm interested in
moving to Python and its packages 

For huge ASCII files (involving dozens of millions of lines), my strategy
is to work by 'blocks' as follows: 

* Find the line index of the beginning and the end of one block (this
implies that the file is read once)
* Read the block
* (process repeated on the different other blocks)

I tried different codes such as the one below, but each time Python tells
me I CANNOT MIX ITERATION AND RECORD METHOD 

# 

position = []; j = 0
with open(PATH + file_name, "r") as rough_data:
    for line in rough_data:
        if my_criteria in line:
            position.append(j)  ## huge blocks but limited in number
        j = j + 1

i = 0
blockdata = np.zeros((size_block), dtype=np.float)
with open(PATH + file_name, "r") as f:
    for line in itertools.islice(f, 1, size_block):
        blockdata[i] = float(f.readline())
        i = i + 1

 # 

Should I work on lists using f.readlines() (but this implies loading the
whole file into memory)? 

Additional question: can I record with vectorization, using
'i = np.arange(0,65406)' if I stay with the previous example? 

Thanks for your time and comprehension 

(I'm obviously interested by doc references speaking about those
specific tasks) 

Paul 

PS: for Chuck: I'll have a look at the pandas package, but at a
code-optimization step :-) (nearly 2000 doc pages)
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Scipy 2017 NumPy sprint

2017-07-05 Thread Matthew Brett
On Wed, Jul 5, 2017 at 11:31 AM, Peter Cock  wrote:
> On Wed, Jul 5, 2017 at 11:25 AM, Ralf Gommers  wrote:
>>
>>
>> On Wed, Jul 5, 2017 at 10:14 PM, Peter Cock 
>> wrote:
>>>
>>> Note that TravisCI does not yet have official Python support on Mac OS X,
>>>
>>> https://github.com/travis-ci/travis-ci/issues/2312
>>>
>>> I believe it is possible to do anyway by faking it under another setting
>>> (e.g. pretend to be a generic language build, and use the system Python
>>> or install your own specific version of Python as needed), so that may be
>>> worth trying during a sprint.
>>
>>
>> That approach has worked reliably for
>> https://github.com/MacPython/numpy-wheels for a while now, so should be
>> straightforward.
>>
>> Ralf
>
> Thanks for that link - I'm going off topic but the MacPython wiki page goes
> into more background about how they build wheels for PyPI which I'm
> very interested to read up on:
>
> https://github.com/MacPython/wiki/wiki/Spinning-wheels

Yes, you'll see that the multibuild framework that numpy and scipy
use includes utilities to download Python.org Python and build
against that, in Spinning-wheels fashion.

Cheers,

Matthew
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Scipy 2017 NumPy sprint

2017-07-05 Thread Matthew Brett
On Wed, Jul 5, 2017 at 11:25 AM, Ralf Gommers  wrote:
>
>
> On Wed, Jul 5, 2017 at 10:14 PM, Peter Cock 
> wrote:
>>
>> Note that TravisCI does not yet have official Python support on Mac OS X,
>>
>> https://github.com/travis-ci/travis-ci/issues/2312
>>
>> I believe it is possible to do anyway by faking it under another setting
>> (e.g. pretend to be a generic language build, and use the system Python
>> or install your own specific version of Python as needed), so that may be
>> worth trying during a sprint.
>
>
> That approach has worked reliably for
> https://github.com/MacPython/numpy-wheels for a while now, so should be
> straightforward.

And https://travis-ci.org/MacPython/scipy-wheels where we are testing
OSX, 64 and 32 bit manylinux builds daily.  That didn't catch the
recent ndimage error because I'd disabled the 32-bit builds there.

Numpy, scipy, and a fairly large number of other projects use
https://github.com/matthew-brett/multibuild to set up builds in this
way for manylinux, OSX and (with a bit more effort) Windows.

Cheers,

Matthew
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Scipy 2017 NumPy sprint

2017-07-05 Thread Peter Cock
On Wed, Jul 5, 2017 at 11:25 AM, Ralf Gommers  wrote:
>
>
> On Wed, Jul 5, 2017 at 10:14 PM, Peter Cock 
> wrote:
>>
>> Note that TravisCI does not yet have official Python support on Mac OS X,
>>
>> https://github.com/travis-ci/travis-ci/issues/2312
>>
>> I believe it is possible to do anyway by faking it under another setting
>> (e.g. pretend to be a generic language build, and use the system Python
>> or install your own specific version of Python as needed), so that may be
>> worth trying during a sprint.
>
>
> That approach has worked reliably for
> https://github.com/MacPython/numpy-wheels for a while now, so should be
> straightforward.
>
> Ralf

Thanks for that link - I'm going off topic but the MacPython wiki page goes
into more background about how they build wheels for PyPI which I'm
very interested to read up on:

https://github.com/MacPython/wiki/wiki/Spinning-wheels

Peter
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Scipy 2017 NumPy sprint

2017-07-05 Thread Ralf Gommers
On Wed, Jul 5, 2017 at 10:14 PM, Peter Cock 
wrote:

> Note that TravisCI does not yet have official Python support on Mac OS X,
>
> https://github.com/travis-ci/travis-ci/issues/2312
>
> I believe it is possible to do anyway by faking it under another setting
> (e.g. pretend to be a generic language build, and use the system Python
> or install your own specific version of Python as needed), so that may be
> worth trying during a sprint.
>

That approach has worked reliably for
https://github.com/MacPython/numpy-wheels for a while now, so should be
straightforward.

Ralf



> Peter
>
> On Wed, Jul 5, 2017 at 10:43 AM, Ralf Gommers 
> wrote:
> >
> > Better platform test coverage would be a useful topic if someone is
> willing
> > to work on that. NumPy needs OS X testing enabled on TravisCI, SciPy
> needs
> > OS X and a 32-bit test (steal from NumPy). And if someone really feels
> > ambitious: replace ATLAS by OpenBLAS in one of the test matrix entries.
> >
> > Ralf
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Scipy 2017 NumPy sprint

2017-07-05 Thread Peter Cock
Note that TravisCI does not yet have official Python support on Mac OS X,

https://github.com/travis-ci/travis-ci/issues/2312

I believe it is possible to do anyway by faking it under another setting
(e.g. pretend to be a generic language build, and use the system Python
or install your own specific version of Python as needed), so that may be
worth trying during a sprint.
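
A minimal sketch of that workaround as a .travis.yml fragment; the Python
version, download URL, and commands are assumptions rather than a tested
recipe:

language: generic
os: osx

before_install:
  # fetch an official Python.org installer instead of relying on Travis
  - curl -fsSL -o python.pkg https://www.python.org/ftp/python/3.6.1/python-3.6.1-macosx10.6.pkg
  - sudo installer -pkg python.pkg -target /
  - python3 -m pip install --upgrade pip

script:
  - python3 -m pip install numpy
  - python3 -c "import numpy; print(numpy.__version__)"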

Peter

On Wed, Jul 5, 2017 at 10:43 AM, Ralf Gommers  wrote:
>
> Better platform test coverage would be a useful topic if someone is willing
> to work on that. NumPy needs OS X testing enabled on TravisCI, SciPy needs
> OS X and a 32-bit test (steal from NumPy). And if someone really feels
> ambitious: replace ATLAS by OpenBLAS in one of the test matrix entries.
>
> Ralf
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Scipy 2017 NumPy sprint

2017-07-05 Thread Ralf Gommers
On Mon, Jul 3, 2017 at 7:01 AM, Charles R Harris 
wrote:

>
>
>> On Sun, Jul 2, 2017 at 9:33 AM, Sebastian Berg wrote:
>
>> On Sun, 2017-07-02 at 10:49 -0400, Allan Haldane wrote:
>> > On 07/02/2017 10:03 AM, Charles R Harris wrote:
>> > > Updated list below.
>> > >
>> > > On Sat, Jul 1, 2017 at 7:08 PM, Benjamin Root wrote:
>> > >
>> > > Just a heads-up. There is now a sphinx-gallery plugin.
>> > > Matplotlib
>> > > and a few other projects have migrated their docs over to use
>> > > it.
>> > >
>> > > https://sphinx-gallery.readthedocs.io/en/latest/
>> > > 
>> > >
>> > > Cheers!
>> > > Ben Root
>> > >
>> > >
>> > > On Sat, Jul 1, 2017 at 7:12 AM, Ralf Gommers wrote:
>> > >
>> > >
>> > >
>> > > On Fri, Jun 30, 2017 at 6:50 AM, Pauli Virtanen wrote:
>> > >
>> > > Charles R Harris kirjoitti 29.06.2017 klo 20:45:
>> > > > Here's a random idea: how about building a NumPy
>> > > gallery?
>> > > > scikit-{image,learn} has it, and while those
>> > > projects may have more
>> > > > visual datasets, I can imagine something along
>> > > the lines of Nicolas
>> > > > Rougier's beautiful book:
>> > > >
>> > > > http://www.labri.fr/perso/nrougier/from-python-to-numpy/
>> > > >
>> > > >
>> > > > So that would be added in the numpy/numpy.org repo?
>> > >
>> > > Or https://scipy-cookbook.readthedocs.io/ ?
>> > > (maybe minus bitrot and images added :)
>> > >
>> > >
>> > > I'd like the numpy.org one. numpy.org is now incredibly sparse and
>> > > ugly, a gallery would make it look a lot better.
>> > >
>> > > Another idea, from the "deprecate np.matrix" discussion:
>> > > add
>> > > numpy documentation describing the preferred way to handle
>> > > matrices, extolling the virtues of @, and move np.matrix
>> > > documentation to a deprecated section.
>> > >
>> > >
>> > >   Putting things together with a few new ideas,
>> > >
>> > >  1. add gallery to numpy.org,
>> > >  2. add extended documentation of '@' operator,
>> > >  3. make Numpy tests Pytest compatible,
>> > >  4. add matrix multiplication ufunc.
>> > >
>> > >   Any more ideas?
>> >
>> > The new doctest runner suggested in the printing thread? This is to
>> > ignore whitespace and precision in ndarray output.
>> >
>> > I can see an argument for distributing it in numpy if it is designed
>> > to
>> > be specially aware of ndarrays or numpy scalars (eg to test equality
>> > between 'wants' and 'got')
>> >
>>
>> I don't really feel it is very numpy specific or should be under the
>> numpy umbrella (I mean if there is no other spot, I guess it could live
>> on the numpy github page). Its about as numpy specific, as the gallery
>> sphinx extension is probably matplotlib specific
>>
>> That doesn't mean that it might not be a good sprint, though :).
>>
>> The question to me is a bit what those who actually go there want from
>> it or do a few people who know numpy/scipy already plan to come? Two
>> years ago, we did not have much of a plan, so it was mostly giving
>> three people or so a bit of a tutorial of how numpy worked internally
>> leading to some bug fixes.
>>
>> One quick idea that might be nice and dives a bit into the C-layer
>> (might be nice if there is no big topic with a few people working on):
>>
>> * Find places that should have the new memory overlap
>>   detection and implement it there.
>>
>> If someone who does subclasses/array-likes or so (e.g. like Stefan
>> Hoyer ;)) and is interested, and also we do some
>> teleconferencing/chatting (and I have time) I might be interested
>> in discussing and possibly trying to develop the new indexer ideas,
>> which I feel are pretty far, but I got stuck on how to get subclasses
>> right.
>>
>> - Sebastian
>>
>>
>>
> I've opened an issue for Pytests and given it a "Scipy2017
> Sprint" label. I'd be much obliged if the folks with suggestions here would
> open other issues and also label the