On 9/24/14, Alan G Isaac <alan.is...@gmail.com> wrote:
> On 9/24/2014 2:52 PM, Jaime Fernández del Río wrote:
>> There is a PR in github that adds a new keyword to the genfromtxt
>> function, to limit the number of rows that actually get read in:
>> https://github.com/numpy/numpy/pull/5103
>
> Sorry to come late to this party, but it seems to me that
> more versatile than an `nrows` keyword for the number of rows
> would be a "rows" keyword for a slice argument.
>
> fwiw,
> Alan Isaac
>

I've continued the PR for the addition of the `nrows` (now `max_rows`)
argument to `genfromtxt` here:

    https://github.com/numpy/numpy/pull/5253

Alan's suggestion to use a slice is interesting, but I'd like to see a
more concrete proposal for the API.  For example, how does it interact
with `skip_header` and `skip_footer`?  How would one use it to read a
file in batches?

The following are a couple of use cases for `max_rows` (originally added
as comments at https://github.com/numpy/numpy/pull/5103):

(1) Read a file in batches:

Suppose the file "a.csv" contains:

 0 10
 1 11
 2 12
 3 13
 4 14
 5 15
 6 16
 7 17
 8 18
 9 19

With `max_rows`, the file can be read in batches of, say, 4:

In [31]: f = open("a.csv", "r")

In [32]: genfromtxt(f, dtype=None, max_rows=4)
Out[32]:
array([[ 0, 10],
       [ 1, 11],
       [ 2, 12],
       [ 3, 13]])

In [33]: genfromtxt(f, dtype=None, max_rows=4)
Out[33]:
array([[ 4, 14],
       [ 5, 15],
       [ 6, 16],
       [ 7, 17]])

In [34]: genfromtxt(f, dtype=None, max_rows=4)
Out[34]:
array([[ 8, 18],
       [ 9, 19]])

(2) Multiple arrays in a single file:

I've seen a file format of the form

3 5
1.0 1.5 2.1 2.5 4.8
3.5 1.0 8.7 6.0 2.0
4.2 0.7 4.4 5.3 2.0
2 3
89.1 66.3 42.1
12.3 19.0 56.6

The file contains multiple arrays.  Each array is preceded by a line
containing the number of rows and columns in that array.  The `max_rows`
argument would make it easy to read this file with genfromtxt:

In [7]: f = open("b.dat", "r")

In [8]: nrows, ncols = genfromtxt(f, dtype=None, max_rows=1)

In [9]: A = genfromtxt(f, max_rows=nrows)

In [10]: nrows, ncols = genfromtxt(f, dtype=None, max_rows=1)

In [11]: B = genfromtxt(f, max_rows=nrows)

In [12]: A
Out[12]:
array([[ 1. ,  1.5,  2.1,  2.5,  4.8],
       [ 3.5,  1. ,  8.7,  6. ,  2. ],
       [ 4.2,  0.7,  4.4,  5.3,  2. ]])

In [13]: B
Out[13]:
array([[ 89.1,  66.3,  42.1],
       [ 12.3,  19. ,  56.6]])

Warren

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
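
A minimal sketch of use case (1), reading a file in batches, wrapped as a generator. It assumes a NumPy version that includes the `max_rows` argument. The helper name `read_in_batches` is hypothetical and not part of NumPy, and the sketch relies on `genfromtxt` returning an empty array (with a warning) once the file object is exhausted. An in-memory `io.StringIO` stands in for "a.csv" so the example is self-contained.

```python
import io
import warnings

import numpy as np


def read_in_batches(f, batch_size, **kwargs):
    """Yield arrays of at most `batch_size` rows from the open file `f`.

    Hypothetical helper: repeatedly calls np.genfromtxt on the same file
    object, so each call resumes where the previous one stopped.
    """
    while True:
        with warnings.catch_warnings():
            # At end of input genfromtxt warns about an empty file;
            # here that simply signals that there is nothing left to read.
            warnings.simplefilter("ignore")
            batch = np.genfromtxt(f, max_rows=batch_size, **kwargs)
        if batch.size == 0:
            break
        yield batch


# Same contents as "a.csv" in the message: rows "0 10" through "9 19".
data = "\n".join(f"{i} {i + 10}" for i in range(10)) + "\n"

batches = list(read_in_batches(io.StringIO(data), 4, dtype=int))
for b in batches:
    print(b.shape)  # three batches with 4, 4, and 2 rows
```

Note that a batch containing a single row comes back as a 1-D array, so callers that need a uniform shape may want to pass the result through `np.atleast_2d`.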