I've implemented this feature with skip_header=-1 as suggested by Pierre, and in doing so removed the regression. TravisBot seems to like it: https://github.com/numpy/numpy/pull/351
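For anyone skimming the thread, here is a rough sketch of the skipping rule discussed below: drop every leading line that is empty or whose first non-space character is the `comments` character, and stop at the first line that is neither. The helper name `skip_commented_header` and the sample lines are hypothetical and made up for illustration; this is not the code in the pull request, and it does not call `genfromtxt` at all.

```python
from itertools import dropwhile


def skip_commented_header(lines, comments="#"):
    """Drop the leading run of empty or commented lines (hypothetical helper).

    A line counts as part of the header if it is blank, or if its first
    non-space character begins the `comments` string.
    """
    def is_header(line):
        stripped = line.strip()
        return not stripped or stripped.startswith(comments)

    # dropwhile stops at the first non-header line, so comments that
    # appear later in the data are left untouched.
    return list(dropwhile(is_header, lines))


# Made-up sample input, only for illustration.
sample = [
    "# here is a general comment",
    "",
    "   # another comment, indented",
    "a b c",
    "1 2 3",
    "# a later comment is left alone",
    "4 5 6",
]
remaining = skip_commented_header(sample)
```

With this rule the header can be any length, which is the point of using a special value rather than a fixed line count.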

On Mon, 2012-07-16 at 16:12 +0200, Pierre GM wrote:
> > To be ultra clear (since I want to code this), you are suggesting that
> > 'first_commented_line' be a *new* accepted value for the kwarg
> > 'names', to invoke the behaviour you suggest?
>
> Nope, I was just referring to some hypothetical variable name. I meant
> that:
>
>     first_values = None
>     try:
>         while not first_values:
>             first_line = fhd.next()
>             if names is True:
>                 parsed = [m for m in first_line.split(comments) if m.strip()]
>                 if parsed:
>                     first_value = split_line(parsed[0])
>             else:
>                 ...
>
> (it's not tested, I'm writing it as it comes. And I didn't even use
> the `first_commented_line` name, sorry)
>
> > If this IS what you mean, I'd counter-propose something in the
> > same spirit, but a bit simpler…we let the kwarg 'skip_header'
> > take some additional value, say int(0), int(-1), str('auto'),
> > or True.
> >
> > In this case, instead of skipping a fixed number of lines, it
> > will skip any number of consecutive empty OR commented lines;
>
> I really like the idea of having `skip_header=-1` skip all the empty
> or commented lines (that is, lines whose first non-space character is
> the `comments` character). That'd be rather convenient.
>
> > The semantics of this are more intuitive, because this is what I am
> > really after: to *skip* a commented *header* of arbitrary
> > length. So my four examples below could be parsed with:
> >
> >   1. genfromtxt(..., names=True)
> >   2. genfromtxt(..., names=True, skip_header=True)
> >   3. genfromtxt(..., names=True)
> >   4. genfromtxt(..., names=True, skip_header=True)
> >
> > …crucially #1 avoids the regression.
> >
> > Does this seem good to everyone?
>
> Sounds good w/ `skip_header=-1`
>
> > But if this is NOT what you mean, then what you say does not
> > actually work with the simple use-case of my Example #2 below.
> > The first commented line is "# here is a..." with # as the
> > first non-space character, so the part after becomes the names
> > 'here', 'is', 'a' etc.
>
> In that case, you could always use `skip_header=2`
>
> > In short, the code can't resolve the ambiguity without some extra
> > information from the user.
>
> It's always best not to let the code guess too much anyway...
>
> Well, no regression, and you have a nice plan. I'm for it.
> Anybody else?
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

-- 
Paul Natsuo Kishimoto

SM candidate, Technology & Policy Program (2012)
Research assistant, http://globalchange.mit.edu
https://paul.kishimoto.name +1 617 302 6105