Are you tied to ASCII files? HDF5 (via h5py or PyTables) might be a
better storage format for what you are describing.

Tom

On Wed, Jul 5, 2017 at 8:42 AM <paul.carr...@free.fr> wrote:

> Dear all
>
>
> I’m sorry if my question is too basic (it is not strictly about NumPy,
> although the goal is to build matrices and work with NumPy afterward), but
> I’m spending a lot of time and effort trying to find a way to read data from
> an ASCII file and assign it to a matrix/array … without success!
>
>
> The only way I found is to use the *‘append()’* instruction, which involves
> dynamic memory allocation. :-(
>
>
> From my current experience with Scilab (a Matlab-like scientific solver),
> the usual approach is well known:
>
>    1. Step 1: initialize the matrix, e.g. *‘np.zeros((n, n))’*
>    2. Step 2: read the data
>    3. Step 3: write it into the matrix
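The three steps listed above carry over to NumPy fairly directly. A minimal sketch, assuming a small hypothetical size n and made-up values; note that the shape is passed as a single tuple, np.zeros((n, n)), not as two arguments:

```python
import numpy as np

n = 4  # hypothetical size, for illustration only

# Step 1: allocate the whole matrix once; the shape is a single tuple.
matrix = np.zeros((n, n), dtype=float)

# Steps 2 and 3: produce each row of data and write it in place.
for row in range(n):
    values = [row * n + col for col in range(n)]  # stand-in for parsed data
    matrix[row, :] = values
```

Because the array is allocated up front, no memory is reallocated inside the loop, which is the point of avoiding append().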
>
>
> I’m obviously influenced by my current experience, but I’m interested in
> moving to Python and its packages.
>
>
> For huge ASCII files (tens of millions of lines), my strategy
> is to work by ‘blocks’:
>
>    - Find the line index of the beginning and the end of one block (this
>    implies that the file is read once)
>    - Read the block
>    - Repeat the process for the other blocks
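The block strategy described above could be sketched as follows. This is only an illustration under stated assumptions: the file layout (a "BLOCK" marker line followed by one value per line), the marker string and the temporary file are all hypothetical, and the max_rows argument of np.loadtxt needs NumPy 1.16 or later.

```python
import os
import tempfile

import numpy as np

# Build a small hypothetical file: a "BLOCK" marker line, then three values.
rows = []
for b in range(3):
    rows.append("BLOCK %d" % b)
    rows.extend(str(10.0 * b + k) for k in range(3))
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("\n".join(rows) + "\n")
    path = tmp.name

marker = "BLOCK"  # hypothetical stand-in for the real criterion

# Pass 1: read the file once, recording the line index of each marker.
starts = []
n_lines = 0
with open(path, "r") as f:
    for n_lines, line in enumerate(f, start=1):
        if marker in line:
            starts.append(n_lines - 1)

# A block runs from just after its marker to the next marker (or the end).
stops = starts[1:] + [n_lines]

# Pass 2: load one block at a time and keep only a per-block result.
block_sums = []
for start, stop in zip(starts, stops):
    block = np.loadtxt(path, skiprows=start + 1, max_rows=stop - start - 1)
    block_sums.append(float(block.sum()))

os.remove(path)
```

Each np.loadtxt call re-reads the file from the top and skips lines until the block starts, so for very many blocks a single sequential pass may be preferable.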
>
>
> I tried different codes such as the one below, but each time Python tells me *I
> cannot mix iteration and record method*
>
> #############################################
>
> import itertools
> import numpy as np
>
> position = []; j = 0
>
> with open(PATH + file_name, "r") as rough_data:
>     for line in rough_data:
>         if my_criteria in line:
>             position.append(j)  ## huge blocks but limited in number
>         j = j + 1
>
> i = 0
> blockdata = np.zeros(size_block, dtype=float)
>
> with open(PATH + file_name, "r") as f:
>     for line in itertools.islice(f, 1, size_block):
>         blockdata[i] = float(f.readline())  # readline() inside iteration over f
>         i = i + 1
>
> #########################################
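The error most likely comes from calling f.readline() while a for loop is already iterating over f. One possible fix, sketched here with a hypothetical in-memory file and a hypothetical size_block, is to take each value from the loop variable instead. Note also that islice(f, 1, size_block) yields size_block - 1 lines, so the stop index below is 1 + size_block.

```python
import io
import itertools

import numpy as np

# Hypothetical in-memory "file": one header line, then one value per line.
f = io.StringIO("header\n1.5\n2.5\n3.5\n4.5\n")

size_block = 4  # hypothetical number of values to read
blockdata = np.zeros(size_block, dtype=float)

# islice(f, 1, 1 + size_block) skips line 0 and yields the next
# size_block lines; each value comes from `line`, never from f.readline().
for i, line in enumerate(itertools.islice(f, 1, 1 + size_block)):
    blockdata[i] = float(line)
```

enumerate() also replaces the manual i = i + 1 counter.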
>
>
> Should I work on lists using f.readlines()? (That implies loading the whole
> file into memory.)
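f.readlines() is not required: a file object yields its lines lazily, and np.fromiter can fill an array straight from a generator. A small sketch, in which the in-memory data and the start and count values are hypothetical stand-ins:

```python
import io
import itertools

import numpy as np

# Hypothetical stand-in for a large file with one value per line.
f = io.StringIO("".join("%d\n" % k for k in range(1000)))

start, count = 10, 5  # hypothetical block position and length

# The generator pulls one line at a time; no list of lines is ever built.
values = (float(line) for line in itertools.islice(f, start, start + count))

# Passing count lets NumPy allocate the result once, up front.
block = np.fromiter(values, dtype=float, count=count)
```

With a real file, only the lines of the current block are converted, and the lines before it are read and discarded one by one.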
>
>
> *Additional question*: can I vectorize the assignment, using ‘i =
> np.arange(0, 65406)’, if I stay with the previous example?
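Assigning through an index array does work once the values are already numeric. A minimal sketch, with a hypothetical small n standing in for 65406 and made-up values:

```python
import numpy as np

n = 8  # hypothetical block length (65406 in the question)
values = np.linspace(0.0, 1.0, n)  # stand-in for already-parsed data

blockdata = np.zeros(n, dtype=float)
i = np.arange(0, n)

# Index-array assignment: all n elements are written without a Python loop.
blockdata[i] = values

# For a contiguous range, a plain slice does the same thing.
same = np.zeros(n, dtype=float)
same[0:n] = values
```

The text still has to be parsed into numbers first; the vectorized step only covers the assignment.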
>
>
>
> Thanks for your time and understanding
>
> (I’m obviously interested in doc references covering those specific
> tasks)
>
>
> Paul
>
>
> PS: for Chuck: I’ll have a look at the pandas package, but at a later code
> optimization step :-) (nearly 2000 doc pages)
>
>
>
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
