Hi
Thanks for the answer:
The ASCII file is the input format (and the only one I can deal with);
HDF5 might be an export format (it's one of the options) in order to
speed up the post-processing stage.
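
For the record, a minimal h5py export sketch of what I have in mind (the
file names and the dataset name are invented):

#############################################

import h5py
import numpy as np

data = np.loadtxt("results.txt")      # hypothetical ASCII input
with h5py.File("results.h5", "w") as h5:
    h5.create_dataset("block_0", data=data, compression="gzip")

#############################################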
Paul
On 2017-07-05 20:19, Thomas Caswell wrote:
> Are you tied to ASCII files? HDF5 (via h5py or pytables) might be a better
> storage format for what you are describing.
>
> Tom
>
> On Wed, Jul 5, 2017 at 8:42 AM <paul.carr...@free.fr> wrote:
>
>> Dear all
>>
>> I'm sorry if my question is too basic (it's not fully about NumPy itself,
>> but it is about building matrices to work with NumPy afterward), but I'm
>> spending a lot of time and effort to find a way to record data from an ASCII
>> file and reassign it into a matrix/array ... unsuccessfully!
>>
>> The only way I found is to use the _'append()'_ instruction, which involves
>> dynamic memory allocation. :-(
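>>
>> (What I mean is something like this, with invented names:)
>>
>> import numpy as np
>>
>> values = []
>> for line in open("data.txt"):
>>     values.append(float(line))   # the list grows at every iteration
>> data = np.asarray(values)        # converted to an array only at the end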
>>
>> From my current experience under Scilab (a Matlab-like scientific solver),
>> the workflow is well known (see the sketch right after the list):
>>
>> * Step 1: initialize the matrix, e.g. _'np.zeros((n, n))'_
>> * Step 2: record the data
>> * Step 3: write it into the matrix
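>>
>> In NumPy terms I picture it like this (the size and values are invented):
>>
>> import numpy as np
>>
>> n = 3                                # invented size
>> A = np.zeros((n, n))                 # step 1: preallocate the matrix
>> for i in range(n):                   # step 2: loop over the data source
>>     A[i, :] = [i, i + 1.0, i + 2.0]  # step 3: write each row in place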
>>
>> I'm obviously influenced by my current experience, but I'm interested in
>> moving to Python and its packages.
>>
>> For huge ASCII files (tens of millions of lines), my strategy is
>> to work by 'blocks' (see the sketch after this list):
>>
>> * Find the line indices of the beginning and the end of each block (this
>> implies that the file is read once)
>> * Read the block
>> * (process repeated on the different other blocks)
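>>
>> For the first step, something like this is what I have in mind (the marker
>> string and file name are invented):
>>
>> positions = []
>> with open("results.txt") as f:         # single pass over the file
>>     for j, line in enumerate(f):
>>         if "BLOCK" in line:            # hypothetical block criterion
>>             positions.append(j)        # line index of each block start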
>>
>> I tried different codes such as the one below, but each time Python tells me
>> that I CANNOT MIX ITERATION AND READ METHODS
>>
>> #############################################
>>
>> import itertools
>> import numpy as np
>>
>> position = []
>> j = 0
>> with open(PATH + file_name, "r") as rough_data:
>>     for line in rough_data:
>>         if my_criteria in line:
>>             position.append(j)   ## huge blocks but limited in number
>>         j = j + 1
>>
>> i = 0
>> blockdata = np.zeros(size_block, dtype=np.float)
>> with open(PATH + file_name, "r") as f:
>>     for line in itertools.islice(f, 1, size_block):
>>         blockdata[i] = float(f.readline())   ## readline() inside the
>>                                              ## iteration triggers the error
>>         i = i + 1
>>
>> #########################################
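>>
>> (I guess the fix is to rely on the iterator only and never call readline()
>> inside the loop; a sketch with the same names as above:)
>>
>> i = 0
>> blockdata = np.zeros(size_block, dtype=np.float)
>> with open(PATH + file_name, "r") as f:
>>     for line in itertools.islice(f, 1, size_block + 1):
>>         blockdata[i] = float(line)   # use the iterated line directly
>>         i = i + 1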
>>
>> Should I work on lists using f.readlines() instead (but this implies loading
>> the whole file into memory)?
>>
>> Additional question: can I record with vectorization, e.g. with
>> 'i = np.arange(0, 65406)', if I stay with the previous example?
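>>
>> (I wonder whether something like np.genfromtxt with skip_header/max_rows
>> would already count as vectorized reading of one block; the numbers are
>> just the ones from my example:)
>>
>> blockdata = np.genfromtxt(PATH + file_name, skip_header=1, max_rows=65406)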
>>
>> Thanks for your time and understanding.
>>
>> (I'm obviously interested in doc references covering these specific
>> tasks)
>>
>> Paul
>>
>> PS for Chuck: I'll have a look at the pandas package, but at a code
>> optimization step :-) (nearly 2000 doc pages)
>>
>
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion