While I'm going to bet that the fastest way to build an ndarray from ASCII
is with an `io.BytesIO` stream, NumPy does have a function to load from
text, `numpy.loadtxt`, that works well enough for most purposes.

https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html
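For instance, a minimal sketch (the file name, the single header line, and
whitespace-delimited numeric columns are all assumptions for illustration):

    import numpy as np

    # Parse a whitespace-delimited text file into a 2-D float array,
    # skipping one (assumed) header line.
    data = np.loadtxt("my_data.txt", dtype=np.float64, skiprows=1)
    print(data.shape)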
It's hard to tell from the original post whether the ASCII is being
continuously generated or not. If it's being produced in an ongoing
fashion, then a stream object is definitely the way to go, as the array
chunks can be produced by `numpy.frombuffer()`.

https://docs.python.org/3/library/io.html
https://docs.scipy.org/doc/numpy/reference/generated/numpy.frombuffer.html
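A rough sketch of that chunked approach (assuming, for simplicity, that the
stream carries raw little-endian float64 bytes; `numpy.frombuffer()`
interprets binary buffers, so ASCII text would need a parsing or encoding
step first):

    import io
    import numpy as np

    # Stand-in for a continuously fed stream; here it holds ten
    # little-endian float64 values as raw bytes.
    stream = io.BytesIO(np.arange(10, dtype="<f8").tobytes())

    # Pull one fixed-size chunk off the stream and view it as an array.
    chunk = stream.read(5 * 8)  # five float64 values, 8 bytes each
    arr = np.frombuffer(chunk, dtype="<f8")
    print(arr)  # [0. 1. 2. 3. 4.]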
Robert

On Wed, Jul 5, 2017 at 3:21 PM, Robert Kern <robert.k...@gmail.com> wrote:

> On Wed, Jul 5, 2017 at 5:41 AM, <paul.carr...@free.fr> wrote:
> >
> > Dear all
> >
> > I'm sorry if my question is too basic (not fully in relation to NumPy,
> > while it is to build matrices and to work with NumPy afterward), but
> > I'm spending a lot of time and effort to find a way to read data from
> > an ASCII file and reassign it into a matrix/array … unsuccessfully!
> >
> > The only way I found is to use the 'append()' instruction, involving
> > dynamic memory allocation. :-(
>
> Are you talking about appending to Python list objects? Or the
> np.append() function on numpy arrays?
>
> In my experience, it is usually fine to build a list with the `.append()`
> method while reading the file of unknown size and then converting it to
> an array afterwards, even for dozens of millions of lines. The list
> object is quite smart about reallocating memory, so it is not that
> expensive. You should generally avoid the np.append() function, though;
> it is not smart.
>
> > From my current experience under Scilab (a Matlab-like scientific
> > solver), it is well known:
> >
> > Step 1: matrix initialization, like 'np.zeros((n, n))'
> > Step 2: record the data
> > and write it in the matrix (step 3)
> >
> > I'm obviously influenced by my current experience, but I'm interested
> > in moving to Python and its packages.
> >
> > For huge ASCII files (involving dozens of millions of lines), my
> > strategy is to work by 'blocks', as follows:
> >
> > Find the line index of the beginning and the end of one block (this
> > implies that the file is read once)
> > Read the block
> > (process repeated on the different other blocks)
>
> Are the blocks intrinsic parts of the file? Or are you just trying to
> break up the file into fixed-size chunks?
>
> > I tried different codes, such as the one below, but each time Python
> > tells me I cannot mix iteration and the read method.
> >
> > #############################################
> >
> > position = []; j = 0
> > with open(PATH + file_name, "r") as rough_data:
> >     for line in rough_data:
> >         if my_criteria in line:
> >             position.append(j)  ## huge blocks but limited in number
> >         j = j + 1
> >
> > i = 0
> > blockdata = np.zeros((size_block), dtype=np.float)
> > with open(PATH + file_name, "r") as f:
> >     for line in itertools.islice(f, 1, size_block):
> >         blockdata[i] = float(f.readline())
>
> For what it's worth, this is the line that is causing the error that you
> describe. When you iterate over the file with the `for line in
> itertools.islice(f, ...):` loop, you already have the line text. You
> don't (and can't) call `f.readline()` to get it again. It would mess up
> the iteration if you did and cause you to skip lines.
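>
> That inner loop, rewritten with the list-append pattern described above,
> might look something like this (a sketch only, untested; it assumes one
> float per line and `size_block` data lines after a skipped first line):
>
>     import itertools
>     import numpy as np
>
>     block = []
>     with open(PATH + file_name, "r") as f:
>         # Use the line the iterator already yields; no f.readline() here.
>         for line in itertools.islice(f, 1, size_block + 1):
>             block.append(float(line))
>     blockdata = np.array(block, dtype=np.float64)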
>
> By the way, it is useful to help us help you if you copy-paste the exact
> code that you are running, as well as the full traceback, instead of
> paraphrasing the error message.
>
> --
> Robert Kern

--
Robert McLeod, Ph.D.
robert.mcl...@unibas.ch
robert.mcl...@bsse.ethz.ch <robert.mcl...@ethz.ch>
robbmcl...@gmail.com