Hi
Thanks for the answer:
The ASCII file is the input format (and the only one I can deal with);
HDF5 might be an export format (it's one of the options) in order to
speed up the post-processing stage.
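
For the record, a minimal h5py export sketch of what I have in mind (the
file names and the dataset name are invented):

#############################################

import h5py
import numpy as np

data = np.loadtxt("results.txt")      # hypothetical ASCII input
with h5py.File("results.h5", "w") as h5:
    h5.create_dataset("block_0", data=data, compression="gzip")

#############################################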
Paul
On 2017-07-05 20:19, Thomas Caswell wrote:
> Are you tied to ASCII files? HDF5 (via h5py or pytables) might be a better
> storage format for what you are describing.
>
> Tom
>
> On Wed, Jul 5, 2017 at 8:42 AM <paul.carr...@free.fr> wrote:
>
>> Dear all
>>
>> I'm sorry if my question is too basic (it's not fully about NumPy itself,
>> but it is about building matrices to work with NumPy afterward), but I'm
>> spending a lot of time and effort to find a way to record data from an ASCII
>> file and reassign it into a matrix/array ... unsuccessfully!
>>
>> The only way I found is to use the _'append()'_ instruction, which involves
>> dynamic memory allocation. :-(
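>>
>> (What I mean is something like this, with invented names:)
>>
>> import numpy as np
>>
>> values = []
>> for line in open("data.txt"):
>>     values.append(float(line))   # the list grows at every iteration
>> data = np.asarray(values)        # converted to an array only at the end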
>>
>> From my current experience under Scilab (a Matlab-like scientific solver),
>> the workflow is well known (see the sketch right after the list):
>>
>> * Step 1: initialize the matrix, e.g. _'np.zeros((n, n))'_
>> * Step 2: record the data
>> * Step 3: write it into the matrix
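>>
>> In NumPy terms I picture it like this (the size and values are invented):
>>
>> import numpy as np
>>
>> n = 3                                # invented size
>> A = np.zeros((n, n))                 # step 1: preallocate the matrix
>> for i in range(n):                   # step 2: loop over the data source
>>     A[i, :] = [i, i + 1.0, i + 2.0]  # step 3: write each row in place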
>>
>> I'm obviously influenced by my current experience, but I'm interested in
>> moving to Python and its packages.
>>
>> For huge ASCII files (tens of millions of lines), my strategy is
>> to work by 'blocks' (see the sketch after this list):
>>
>> * Find the line indices of the beginning and the end of each block (this
>> implies that the file is read once)
>> * Read the block
>> * (process repeated on the different other blocks)
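>>
>> For the first step, something like this is what I have in mind (the marker
>> string and file name are invented):
>>
>> positions = []
>> with open("results.txt") as f:         # single pass over the file
>>     for j, line in enumerate(f):
>>         if "BLOCK" in line:            # hypothetical block criterion
>>             positions.append(j)        # line index of each block start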
>>
>> I tried different codes such as the one below, but each time Python tells me
>> that I CANNOT MIX ITERATION AND READ METHODS
>>
>> #############################################
>>
>> import itertools
>> import numpy as np
>>
>> position = []
>> j = 0
>> with open(PATH + file_name, "r") as rough_data:
>>     for line in rough_data:
>>         if my_criteria in line:
>>             position.append(j)   ## huge blocks but limited in number
>>         j = j + 1
>>
>> i = 0
>> blockdata = np.zeros(size_block, dtype=np.float)
>> with open(PATH + file_name, "r") as f:
>>     for line in itertools.islice(f, 1, size_block):
>>         blockdata[i] = float(f.readline())   ## readline() inside the
>>                                              ## iteration triggers the error
>>         i = i + 1
>>
>> #########################################
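>>
>> (I guess the fix is to rely on the iterator only and never call readline()
>> inside the loop; a sketch with the same names as above:)
>>
>> i = 0
>> blockdata = np.zeros(size_block, dtype=np.float)
>> with open(PATH + file_name, "r") as f:
>>     for line in itertools.islice(f, 1, size_block + 1):
>>         blockdata[i] = float(line)   # use the iterated line directly
>>         i = i + 1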
>>
>> Should I work on lists using f.readlines() instead (but this implies loading
>> the whole file into memory)?
>>
>> Additional question: can I record with vectorization, e.g. with
>> 'i = np.arange(0, 65406)', if I stay with the previous example?
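>>
>> (I wonder whether something like np.genfromtxt with skip_header/max_rows
>> would already count as vectorized reading of one block; the numbers are
>> just the ones from my example:)
>>
>> blockdata = np.genfromtxt(PATH + file_name, skip_header=1, max_rows=65406)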
>>
>> Thanks for your time and understanding.
>>
>> (I'm obviously interested in doc references covering these specific
>> tasks)
>>
>> Paul
>>
>> PS for Chuck: I'll have a look at the pandas package, but at a code
>> optimization step :-) (nearly 2000 doc pages)
>>
>
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion