Re: [Numpy-discussion] record data previous to Numpy use

paul . carrico Thu, 06 Jul 2017 03:20:07 -0700

Thanks Robert for your effort - I'll have a look at it 

...  the goal is to be guided in how to proceed (and to understand), and not
to have a "ready-made solution" ... but I honestly appreciate it :-)


Paul 

On 2017-07-06 11:51, Robert Kern wrote:

> On Thu, Jul 6, 2017 at 1:49 AM, <paul.carr...@free.fr> wrote:
>> 
>> Dear All
>> 
>> First of all, thanks for the answers and the information (I'll dig into 
>> it), and let me try to add comments on what I want to do:
>> 
>> My ASCII file mainly contains data (float and int) in a single column
>> (this is not always the case, but I can easily manage it; I also saw that I 
>> can use 'split' if necessary)
>> Comments/text indicate the beginning of a block, immediately followed by the 
>> number of sub-blocks
>> So I need to read/record all the values in order to build a matrix before 
>> working on it (using Numpy & vectorization)
>> 
>> Columns 2 and 3 have been added for further processing
>> The '0' values will be handled specifically afterward
>> 
>> 
>> Numpy won't be a problem I guess (I did some basic tests and I'm quite 
>> confident about how to proceed), but I'm really stuck on recording the data 
>> ... I'm trying to find a way to efficiently read and record data in a matrix:
>> 
>> avoiding dynamic memory allocation (here using 'append' in the Python sense, 
>> not np), 
> 
> Although you can avoid some list appending in your case (because the blocks 
> self-describe their length), I would caution you against prematurely avoiding 
> it. It's often the most natural way to write the code in Python, so go ahead 
> and write it that way first. Once you get it working correctly, but it's too 
> slow or memory intensive, then you can puzzle over how to preallocate the 
> numpy arrays later. But quite often, it's fine. In this case, the reading and 
> handling of the text data itself is probably the bottleneck, not appending to 
> the lists. As I said, Python lists are cleverly implemented to make appending 
> fast. Accumulating numbers in a list then converting to an array afterwards 
> is a well-accepted numpy idiom. 
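
A minimal sketch of the "accumulate in a list, then convert" idiom described above. The function name `read_values` and the sample lines are made up for illustration; it assumes that any token `float()` cannot parse is comment text to be skipped (note that `float()` also accepts strings such as "nan" and "inf").

```python
import numpy as np

def read_values(lines):
    """Collect every numeric token from an iterable of text lines."""
    values = []  # plain Python list: appending is amortized O(1)
    for line in lines:
        for token in line.split():
            try:
                values.append(float(token))
            except ValueError:
                # Assumption: non-numeric tokens are comments/text; skip them.
                continue
    # Convert once, after all values have been gathered.
    return np.array(values, dtype=float)

# Hypothetical sample; a real file object can be passed the same way.
sample = ["1.5", "2 3", "# comment", "4.0e1"]
data = read_values(sample)
```

Passing an open file (`with open(path) as f: read_values(f)`) reads it line by line, so the text of a 60-million-line file never has to sit in memory all at once; only the list of floats does.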
> 
>> dealing with huge ASCII files: the latest file I got contains more than 60 
>> million lines
>> 
>> Please find in attachment an extract of the input format 
>> ('example_of_input'), and the matrix I'm trying to create and manage with 
>> Numpy
>> 
>> Thanks again for your time
> 
> Try something like the attached. The function will return a list of blocks. 
> Each block will itself be a list of numpy arrays, which are the sub-blocks 
> themselves. I didn't bother adding the first three columns to the sub-blocks 
> or trying to assemble them all into a uniform-width matrix by padding with 
> trailing 0s. Since you say that the trailing 0s are going to be "specially 
> treated afterwards", I suspect that you can more easily work with the lists 
> of arrays instead. I assume floating-point data rather than trying to figure 
> out whether int or float from the data. The code can handle multiple data 
> values on one line (not especially well-tested, but it ought to work), but it 
> assumes that the number of sub-blocks, index of the sub-block, and sub-block 
> size are each on their own line. The code gets a little more complicated if 
> that's not the case.
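
The attachment is not reproduced in this message, so the following is only a sketch of a reader with the behaviour described above, not the attached code. The file layout is an assumption: a non-numeric text line opens a block, the next line gives the number of sub-blocks, and each sub-block consists of an index line, a size line, and then `size` values spread over one or more lines. The names `read_blocks` and `_is_number` are hypothetical.

```python
import numpy as np

def _is_number(token):
    """Return True if the token parses as a float."""
    try:
        float(token)
        return True
    except ValueError:
        return False

def read_blocks(lines):
    """Return a list of blocks; each block is a list of 1-D float arrays."""
    blocks = []
    it = iter(lines)
    for line in it:
        tokens = line.split()
        if not tokens or _is_number(tokens[0]):
            continue  # blank line, or stray numbers outside any block
        # Assumption: any text line marks the start of a block.
        n_sub = int(next(it))
        block = []
        for _ in range(n_sub):
            _index = int(next(it))  # sub-block index (read, not used here)
            size = int(next(it))    # number of values in this sub-block
            values = []
            while len(values) < size:
                values.extend(float(t) for t in next(it).split())
            if len(values) != size:
                raise ValueError("data line runs past the end of a sub-block")
            block.append(np.array(values))
        blocks.append(block)
    return blocks

# Hypothetical input following the assumed layout.
sample = ["# first block", "2", "1", "3", "0.1", "0.2 0.3",
          "2", "2", "1.0", "2.0",
          "# second block", "1", "1", "1", "5"]
blocks = read_blocks(sample)
```

Each sub-block is collected in a list and converted to an array once its declared size has been reached; a truncated file will surface as a `StopIteration` from `next(it)`.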
> 
> --
> Robert Kern 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
