On 11/2/14, Alexander Belopolsky <ndar...@mac.com> wrote:
> On Sun, Nov 2, 2014 at 2:32 PM, Warren Weckesser <warren.weckes...@gmail.com> wrote:
>
>>> Still, the case of dtype=None, name=None is problematic. Suppose I want
>>> genfromtxt() to detect the column names from the 1-st row and data types
>>> from the 3-rd. How would you do that?
>>
>> This may sound like a cop out, but at some point, I stop trying to make
>> genfromtxt() handle every possible case, and instead I would write a custom
>> header reader to handle this.
>
> In the abstract, I would agree with you. It is often the case that 2-3
> lines of clear Python code is better than a terse function call with half a
> dozen non-obvious options. Specifically, I would be against the proposed
> slice_rows because it is either equivalent to genfromtxt(islice(..), ..)
> or hard to specify.
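A custom header reader of the kind mentioned in the quoted text might look like the following minimal sketch. It assumes a hypothetical layout in which row 1 holds the column names, row 2 holds something to be ignored (units here), and the data start on row 3. The sample text and the name `read_with_header` are made up for illustration and are not part of NumPy.

```python
import io
import numpy as np

# Hypothetical sample: names on row 1, a units row to ignore, data from row 3.
SAMPLE = "x y label\nm s -\n1 2.5 a\n2 3.5 b\n"

def read_with_header(f):
    """Take column names from the first line, skip the second line, and
    let genfromtxt infer the column dtypes from the remaining data rows."""
    names = f.readline().split()
    f.readline()  # discard the row between the names and the data
    return np.genfromtxt(f, dtype=None, names=names, encoding=None)

data = read_with_header(io.StringIO(SAMPLE))
print(data.dtype.names)
print(data["x"], data["y"], data["label"])
```

The header handling stays in plain Python, so `genfromtxt` only ever sees the data rows and needs no extra options.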
I don't have much more to add to the API discussion at the moment, but I want to make sure one aspect is clear. (Sorry for the noise if the following is obvious.)

In an earlier email, I gave my interpretation of the semantics of `slice_rows` (and `max_rows`), which is that `genfromtxt(f, ..., slice_rows=arg)` produces the same result as `genfromtxt(f, ...)[arg]`. (The difference is that it only consumes as many items from the input iterator f as `arg` requires.) This isn't the same as `genfromtxt(islice(f, <slice args>), ...)`, because `genfromtxt` skips comments and blank lines. (It also skips invalid lines if the argument `invalid_raise=False` is used.)

So if the input file were

-----
1 10
# A comment.
2 20

3 30
4 40
5 50
-----

then `genfromtxt(f, dtype=int, slice_rows=slice(4))` would produce `array([[1, 10], [2, 20], [3, 30], [4, 40]])`, while `genfromtxt(islice(f, 4), dtype=int)` would produce `array([[1, 10], [2, 20]])`, because two of the first four lines are a comment and a blank line.

That's my interpretation of how `max_rows` or `slice_rows` should work. If that is not what other folks expect, then that should also be part of the discussion.

Warren

> On the other hand, skip_rows is different for two reasons:
>
> 1. It is not a new option. It is currently a deprecated alias to
> skip_header, so a change is expected - either removal or redefinition.
> 2. The intended use-case - inferring column names and type information from
> a file where data is separated from the column names is hard to code
> explicitly. (Try it!)
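The difference described above, between slicing the input lines and slicing the parsed rows, can be checked with a short sketch. Since `slice_rows` is only a proposal, it is emulated here by parsing everything and then indexing the result. The helper names are hypothetical, and the sample text assumes the blank line falls right after `2 20`.

```python
import io
from itertools import islice
import numpy as np

# Assumed sample: one comment line and one blank line among the data rows.
TEXT = "1 10\n# A comment.\n2 20\n\n3 30\n4 40\n5 50\n"

def rows_after_parsing(text, rows):
    """Emulate the proposed slice_rows: parse all rows, then slice them."""
    return np.genfromtxt(io.StringIO(text), dtype=int)[rows]

def rows_from_first_lines(text, n):
    """Pass only the first n raw lines (comments and blanks included)."""
    return np.genfromtxt(islice(io.StringIO(text), n), dtype=int)

a = rows_after_parsing(TEXT, slice(4))
b = rows_from_first_lines(TEXT, 4)
print(a.tolist())  # [[1, 10], [2, 20], [3, 30], [4, 40]]
print(b.tolist())  # [[1, 10], [2, 20]]
```

The emulation reads the whole input, which is exactly the cost that a built-in `slice_rows` or `max_rows` would avoid.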