On Sat, Nov 1, 2014 at 4:41 PM, Alexander Belopolsky <ndar...@mac.com> wrote:
> On Sat, Nov 1, 2014 at 3:15 PM, Warren Weckesser <warren.weckes...@gmail.com> wrote:
>
>> Is there wider interest in such an argument to `genfromtxt`? For my
>> use-cases, `max_rows` is sufficient. I can't recall ever needing the full
>> generality of a slice for pulling apart a text file. Does anyone have
>> compelling use-cases that are not handled by `max_rows`?
>
> It is occasionally useful to be able to skip rows after the header. Maybe
> we should de-deprecate skip_rows and give it the meaning different from
> skip_header in case of names = None? For example,
>
>     genfromtxt(fname, skip_header= 3, skip_rows = 1, max_rows = 100)
>
> would mean skip 3 lines, read column names from the 4-th, skip 5-th,
> process up to 100 more lines. This may be useful if the file contains some
> meta-data about the column below the header line. For example, it is
> common to put units of measurement below the column names.

Or you could just call genfromtxt() once with `max_rows=1` to skip a row.
(I'm assuming that the first argument to genfromtxt is the open file
object--or some other iterator--and not the filename.)

> Another application could be processing a large text file in chunks, which
> again can be covered nicely by skip_rows/max_rows.

You don't really need `skip_rows` for this. In a previous email (and in
https://github.com/numpy/numpy/pull/5103) I gave an example of using
`max_rows` for handling a file that doesn't have a header.
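
As an illustration of the headerless case, here is a minimal sketch. It is
not the example from the earlier email or from the pull request. It assumes
a NumPy version in which `genfromtxt` accepts `max_rows`, and the sample
data and the helper name `read_in_batches` are made up for the illustration.
An in-memory `io.StringIO` stands in for an open file.

```python
import io
import warnings

import numpy as np

# Hypothetical headerless data standing in for a large file on disk.
text = "1.0 2.0 -9.0\n3.0 4.0 -7.6\n5.0 6.0 -1.0\n7.0 8.0 -3.3\n9.0 0.0 -3.4\n"


def read_in_batches(f, batch_rows):
    """Yield 2-D float arrays of at most `batch_rows` rows from open file `f`."""
    while True:
        with warnings.catch_warnings():
            # genfromtxt warns when it is handed an exhausted file;
            # here that is just the signal to stop.
            warnings.simplefilter("ignore")
            batch = np.genfromtxt(f, max_rows=batch_rows)
        if batch.size == 0:
            break
        # A batch holding a single row comes back 1-D; normalize to 2-D.
        yield np.atleast_2d(batch)


batches = list(read_in_batches(io.StringIO(text), batch_rows=2))
for batch in batches:
    print(batch.shape)
```

Each call picks up where the previous one stopped because the same open
file object is passed every time, so no `skip_rows` bookkeeping is needed.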
If the file has a header, you could process the file in batches using
something like the following example, where the dtype determined in the
first batch is used when reading the subsequent batches:

    In [12]: !cat foo.dat
    a b c
    1.0 2.0 -9.0
    3.0 4.0 -7.6
    5.0 6.0 -1.0
    7.0 8.0 -3.3
    9.0 0.0 -3.4

    In [13]: f = open("foo.dat", "r")

    In [14]: batch1 = genfromtxt(f, dtype=None, names=True, max_rows=2)

    In [15]: batch1
    Out[15]:
    array([(1.0, 2.0, -9.0), (3.0, 4.0, -7.6)],
          dtype=[('a', '<f8'), ('b', '<f8'), ('c', '<f8')])

    In [16]: batch2 = genfromtxt(f, dtype=batch1.dtype, max_rows=2)

    In [17]: batch2
    Out[17]:
    array([(5.0, 6.0, -1.0), (7.0, 8.0, -3.3)],
          dtype=[('a', '<f8'), ('b', '<f8'), ('c', '<f8')])

    In [18]: batch3 = genfromtxt(f, dtype=batch1.dtype, max_rows=2)

    In [19]: batch3
    Out[19]:
    array((9.0, 0.0, -3.4),
          dtype=[('a', '<f8'), ('b', '<f8'), ('c', '<f8')])

Warren

> I cannot think of a situation where I would need more generality such as
> reading every 3rd row or rows with the given numbers. Such processing is
> normally done after the text data is loaded into an array.
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion