Re: [Numpy-discussion] Add `nrows` to `genfromtxt`

Warren Weckesser Sat, 01 Nov 2014 07:32:12 -0700

On 9/24/14, Alan G Isaac <alan.is...@gmail.com> wrote:
> On 9/24/2014 2:52 PM, Jaime Fernández del Río wrote:
>> There is a PR in github that adds a new keyword to the genfromtxt
>> function, to limit the number of rows that actually get read in:
>> https://github.com/numpy/numpy/pull/5103
>
> Sorry to come late to this party, but it seems to me that
> more versatile than an `nrows` keyword for the number of rows
> would be a "rows" keyword for a slice argument.
>
> fwiw,
> Alan Isaac
>


I've continued the PR for the addition of the `nrows` (now
`max_rows`) argument to `genfromtxt` here:
    https://github.com/numpy/numpy/pull/5253

Alan's suggestion to use a slice is interesting, but I'd like to
see a more concrete proposal for the API.  For example, how does
it interact with `skip_header` and `skip_footer`?  How would one
use it to read a file in batches?
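
For comparison, a row slice can already be approximated by wrapping the line iterator in `itertools.islice` before handing it to `genfromtxt`. The sketch below is only an illustration of that idea, not the API anyone has proposed: the helper name `genfromtxt_rows` and its `rows` parameter are made up here, and the slice counts physical lines (blank and comment lines included), which is one of the ambiguities a real proposal would have to settle.

```python
import io
from itertools import islice

import numpy as np


def genfromtxt_rows(lines, rows, **kwargs):
    """Hypothetical helper: parse only the physical lines selected by `rows`.

    `lines` is any iterable of text lines (an open file, a list of str).
    `rows` is a slice with non-negative start/stop/step, as islice requires.
    """
    selected = islice(lines, rows.start, rows.stop, rows.step)
    return np.genfromtxt(selected, **kwargs)


# Ten lines of the form "i 10+i", the same layout as a.csv in the post.
text = io.StringIO("".join(f"{i} {10 + i}\n" for i in range(10)))

# Lines 2, 4 and 6 of the input.
every_other = genfromtxt_rows(text, slice(2, 8, 2), dtype=int)
print(every_other)
```

Note that `islice` does not know about `skip_header` or `skip_footer`, so any such keywords passed through would apply to the already-sliced lines.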

The following are a couple of use cases for `max_rows` (originally
added as comments at https://github.com/numpy/numpy/pull/5103):


(1) Read a file in batches:

Suppose the file "a.csv" contains:

 0 10
 1 11
 2 12
 3 13
 4 14
 5 15
 6 16
 7 17
 8 18
 9 19

With `max_rows`, the file can be read in batches of, say, 4:

In [31]: f = open("a.csv", "r")

In [32]: genfromtxt(f, dtype=None, max_rows=4)
Out[32]:
array([[ 0, 10],
       [ 1, 11],
       [ 2, 12],
       [ 3, 13]])

In [33]: genfromtxt(f, dtype=None, max_rows=4)
Out[33]:
array([[ 4, 14],
       [ 5, 15],
       [ 6, 16],
       [ 7, 17]])

In [34]: genfromtxt(f, dtype=None, max_rows=4)
Out[34]:
array([[ 8, 18],
       [ 9, 19]])
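
The same session can be wrapped in a loop. The following is a minimal sketch, assuming a NumPy version in which `genfromtxt` accepts `max_rows` and a file with at least two columns; the generator name `read_in_batches` is made up for illustration. It relies on `genfromtxt` returning an empty array (with a warning) once the handle is exhausted.

```python
import os
import tempfile
import warnings

import numpy as np


def read_in_batches(path, batch_size, **kwargs):
    """Yield successive 2-D arrays of at most `batch_size` rows from `path`."""
    with open(path, "r") as f:
        while True:
            with warnings.catch_warnings():
                # An exhausted handle triggers an "Empty input file" warning.
                warnings.simplefilter("ignore")
                batch = np.genfromtxt(f, max_rows=batch_size, **kwargs)
            if batch.size == 0:
                break
            # A batch holding a single row comes back 1-D; restore 2-D.
            batch = np.atleast_2d(batch)
            yield batch
            if batch.shape[0] < batch_size:
                break  # short batch: nothing is left to read


# Recreate the ten-row a.csv from the example in a temporary directory.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "a.csv")
with open(path, "w") as out:
    for i in range(10):
        out.write(f" {i} {10 + i}\n")

batches = list(read_in_batches(path, 4, dtype=int))
for b in batches:
    print(b.shape)
```

Because the file handle stays open between calls, each call to `genfromtxt` resumes where the previous one stopped.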


(2) Multiple arrays in a single file:

I've seen a file format of the form

3 5
1.0 1.5 2.1 2.5 4.8
3.5 1.0 8.7 6.0 2.0
4.2 0.7 4.4 5.3 2.0
2 3
89.1 66.3 42.1
12.3 19.0 56.6

The file contains multiple arrays. Each array is
preceded by a line containing the number of rows
and columns in that array. The `max_rows` argument
would make it easy to read this file with genfromtxt:

In [7]: f = open("b.dat", "r")

In [8]: nrows, ncols = genfromtxt(f, dtype=None, max_rows=1)

In [9]: A = genfromtxt(f, max_rows=nrows)

In [10]: nrows, ncols = genfromtxt(f, dtype=None, max_rows=1)

In [11]: B = genfromtxt(f, max_rows=nrows)

In [12]: A
Out[12]:
array([[ 1. ,  1.5,  2.1,  2.5,  4.8],
       [ 3.5,  1. ,  8.7,  6. ,  2. ],
       [ 4.2,  0.7,  4.4,  5.3,  2. ]])

In [13]: B
Out[13]:
array([[ 89.1,  66.3,  42.1],
       [ 12.3,  19. ,  56.6]])
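
When the number of arrays in the file is not known in advance, the two-step pattern above can be repeated until the input runs out. This is a rough sketch under the same assumption that `max_rows` is available; `read_arrays` is a hypothetical helper name, and the sample text simply reproduces the file shown above.

```python
import io
import warnings

import numpy as np


def read_arrays(f):
    """Hypothetical helper: return every array stored in the open text file `f`."""
    arrays = []
    while True:
        with warnings.catch_warnings():
            # Reading past the last array triggers an "Empty input file" warning.
            warnings.simplefilter("ignore")
            header = np.genfromtxt(f, dtype=int, max_rows=1)
        if header.size == 0:
            break
        nrows, ncols = (int(n) for n in header)
        block = np.genfromtxt(f, max_rows=nrows)
        # A one-row or one-column block comes back 1-D; restore its shape.
        arrays.append(block.reshape(nrows, ncols))
    return arrays


sample = io.StringIO(
    "3 5\n"
    "1.0 1.5 2.1 2.5 4.8\n"
    "3.5 1.0 8.7 6.0 2.0\n"
    "4.2 0.7 4.4 5.3 2.0\n"
    "2 3\n"
    "89.1 66.3 42.1\n"
    "12.3 19.0 56.6\n"
)
A, B = read_arrays(sample)
print(A.shape, B.shape)
```

The explicit `reshape` uses the declared dimensions, so a mismatch between the header line and the data raises an error instead of passing silently.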


Warren


> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
