Thanks Robert for your effort - I'll have a look at it
... the goal is to be guided in how to proceed (and to understand), and not
to have a "ready-made solution" ... but I honestly appreciate it :-)
Paul
On 2017-07-06 11:51, Robert Kern wrote:
> On Thu, Jul 6, 2017 at 1:49 AM, <paul.carr...@free.fr> wrote:
>>
>> Dear All
>>
>> First of all thanks for the answers and the information (I'll dig into
>> it), and let me try to add some comments on what I want to do:
>>
>> My ASCII file mainly contains data (float and int) in a single column
>> (it is not always the case but I can easily manage it - I also saw that I
>> can use the 'split' instruction if necessary)
>> Comment/text lines indicate the beginning of a block, immediately followed
>> by the number of sub-blocks
>> So I need to read/record all the values in order to build a matrix before
>> working on it (using NumPy & vectorization)
>>
>> Columns 2 and 3 have been added for further processing
>> The '0' values will be treated specifically afterwards
>>
>>
>> NumPy won't be a problem I guess (I did some basic tests and I'm quite
>> confident about how to proceed), but I'm really stuck on reading the data
>> ... I'm trying to find a way to efficiently read and store the data in a
>> matrix:
>>
>> avoiding dynamic memory allocation (here meaning 'append' on Python lists,
>> not NumPy's),
>
> Although you can avoid some list appending in your case (because the blocks
> self-describe their length), I would caution you against prematurely avoiding
> it. It's often the most natural way to write the code in Python, so go ahead
> and write it that way first. If, once it works correctly, it turns out to be
> too slow or memory intensive, then you can puzzle over how to preallocate the
> numpy arrays. But quite often, it's fine. In this case, the reading and
> handling of the text data itself is probably the bottleneck, not appending to
> the lists. As I said, Python lists are cleverly implemented to make appending
> fast (amortized constant time per append). Accumulating numbers in a list then
> converting to an array afterwards is a well-accepted numpy idiom.
>
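The append-then-convert idiom described above can be sketched as follows. This is only an illustration, not code from the thread: the function name `column_to_array` and the sample lines are hypothetical, and it assumes one numeric value per non-empty line.

```python
import numpy as np


def column_to_array(lines):
    """Accumulate one float per line in a list, then convert once.

    Assumption: every non-empty line holds exactly one numeric value.
    """
    values = []
    for line in lines:
        line = line.strip()
        if line:
            # Appending to a Python list is cheap; no numpy array is
            # resized inside the loop.
            values.append(float(line))
    # A single conversion at the end produces the numpy array.
    return np.array(values, dtype=float)


# Hypothetical stand-in for lines read from the real file.
arr = column_to_array(["1.5", "2", "", "3.25"])
```

The same function accepts an open file object, since iterating over a file yields its lines one at a time.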
>> dealing with huge ASCII files: the latest file I got contains more than 60
>> million lines
>>
>> Please find attached an extract of the input format
>> ('example_of_input'), and the matrix I'm trying to create and manage with
>> NumPy
>>
>> Thanks again for your time
>
> Try something like the attached. The function will return a list of blocks.
> Each block will itself be a list of numpy arrays, which are the sub-blocks
> themselves. I didn't bother adding the first three columns to the sub-blocks
> or trying to assemble them all into a uniform-width matrix by padding with
> trailing 0s. Since you say that the trailing 0s are going to be "specially
> treated afterwards", I suspect that you can more easily work with the lists
> of arrays instead. I assume floating-point data rather than trying to figure
> out whether int or float from the data. The code can handle multiple data
> values on one line (not especially well-tested, but it ought to work), but it
> assumes that the number of sub-blocks, index of the sub-block, and sub-block
> size are each on their own line. The code gets a little more complicated if
> that's not the case.
>
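The attachment itself is not reproduced in this message, so the following is only a rough sketch of a reader with the properties described above, not the attached code. The name `read_blocks`, the sample text, and the exact file layout are assumptions: a text line opens a block, the next line gives the number of sub-blocks, and each sub-block has its index and its size on their own lines, followed by the data values (one or more per line).

```python
import io

import numpy as np


def read_blocks(f):
    """Return a list of blocks; each block is a list of 1-D float arrays.

    Assumed layout (hypothetical, the real file may differ):
      header text / number of sub-blocks / then for each sub-block:
      index line, size line, and `size` values spread over one or more lines.
    """
    lines = (ln.strip() for ln in f)
    lines = (ln for ln in lines if ln)  # skip empty lines
    blocks = []
    for _header in lines:  # each remaining line at this level opens a block
        n_sub = int(next(lines))
        block = []
        for _ in range(n_sub):
            _index = int(next(lines))  # sub-block index, not used here
            size = int(next(lines))
            values = []
            while len(values) < size:
                # A data line may carry several whitespace-separated values.
                values.extend(float(tok) for tok in next(lines).split())
            if len(values) != size:
                raise ValueError("data line runs past the end of a sub-block")
            block.append(np.array(values, dtype=float))
        blocks.append(block)
    return blocks


# Hypothetical sample input, made up for illustration.
sample = "BLOCK A\n2\n1\n3\n1.0\n2.0\n3.0\n2\n2\n4.0 5.0\nBLOCK B\n1\n1\n1\n7.5\n"
blocks = read_blocks(io.StringIO(sample))
```

All data is read as floating point, and the sub-blocks are kept as separate arrays of different lengths rather than padded with trailing 0s into one matrix.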
> --
> Robert Kern
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion