Tom, I agree that the documentation should be updated (both the docstring and the relevant parts of the user manual), and specific unit tests added. Paul, that's a direct nudge ;) (I'm sure you don't mind).
I was also considering the weird case

>>> first_line = "# A B C #1 #2 #3"

How many columns in that case? 6? 3? So, instead of using a `split`, maybe we should just check

>>> index = first_line.index(comment)

and take `first_line[:index]` (or `first_line[index+1:]`, depending on the case). But then again, it's a weird case.

--
Pierre GM

On Monday, July 16, 2012 at 22:00, Tom Aldcroft wrote:
> On Mon, Jul 16, 2012 at 3:06 PM, Paul Natsuo Kishimoto
> <m...@paul.kishimoto.name> wrote:
> > I've implemented this feature with skip_header=-1 as suggested by
> > Pierre, and in doing so removed the regression. TravisBot seems to like
> > it: https://github.com/numpy/numpy/pull/351
> >
> > On Mon, 2012-07-16 at 16:12 +0200, Pierre GM wrote:
> > > To be ultra clear (since I want to code this), you are
> > > suggesting that 'first_commented_line' be a *new* accepted
> > > value for the kwarg 'names', to invoke the behaviour you suggest?
> > >
> > > Nope, I was just referring to some hypothetical variable name. I meant
> > > that:
> > >
> > > first_values = None
> > > try:
> > >     while not first_values:
> > >         first_line = fhd.next()
> > >         if names is True:
> > >             parsed = [m for m in first_line.split(comments) if
> > >                       m.strip()]
> > >             if parsed:
> > >                 first_value = split_line(parsed[0])
> > >         else:
> > >             ...
> > >
> > > (it's not tested, I'm writing it as it comes. And I didn't even use
> > > the `first_commented_line` name, sorry)
> > >
> > > If this IS what you mean, I'd counter-propose something in the
> > > same spirit, but a bit simpler…we let the kwarg 'skip_header'
> > > take some additional value, say int(0), int(-1), str('auto'),
> > > or True.
> > >
> > > In this case, instead of skipping a fixed number of lines, it
> > > will skip any number of consecutive empty OR commented lines;
> > >
> > > I really like the idea of having `skip_header=-1` skip all the empty
> > > or commented lines (that is, lines whose first non-space character is
> > > the `comments` character). That'd be rather convenient.
> > >
> > > The semantics of this are more intuitive, because this is what
> > > I am really after: to *skip* a commented *header* of arbitrary
> > > length. So my four examples below could be parsed with:
> > >
> > > 1. genfromtxt(..., names=True)
> > > 2. genfromtxt(..., names=True, skip_header=True)
> > > 3. genfromtxt(..., names=True)
> > > 4. genfromtxt(..., names=True, skip_header=True)
> > >
> > > …crucially #1 avoids the regression.
> > >
> > > Does this seem good to everyone?
> > >
> > > Sounds good w/ `skip_header=-1`
> > >
> > > But if this is NOT what you mean, then what you say does not
> > > actually work with the simple use-case of my Example #2 below.
> > > The first commented line is "# here is a..." with # as the
> > > first non-space character, so the part after becomes the names
> > > 'here', 'is', 'a' etc.
> > >
> > > In that case, you could always use `skip_header=2`
> > >
> > > In short, the code can't resolve the ambiguity without some
> > > extra information from the user.
> > >
> > > It's always best not to let the code guess too much anyway...
> > >
> > > Well, no regression, and you have a nice plan. I'm for it.
> > > Anybody else?
> > >
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion@scipy.org
> > > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> > --
> > Paul Natsuo Kishimoto
> >
> > SM candidate, Technology & Policy Program (2012)
> > Research assistant, http://globalchange.mit.edu
> > https://paul.kishimoto.name +1 617 302 6105
>
> I think that the proposed solution is OK, but it does make it even
> trickier for the average user to predict the behavior of genfromtxt()
> for different situations. Perhaps as part of this pull request Paul
> should also update the documentation to include a section describing
> this behavior and usage with examples 1 to 4.
>
> - Tom
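
For reference, the `skip_header=-1` behaviour described in the thread (skip every leading line that is empty or whose first non-space character is the `comments` character) can be sketched in a few lines. This is only an illustration of the rule as worded above, not the code in the pull request; the helper name `skip_leading_comments` and the sample text are made up for the example.

```python
import io
from itertools import dropwhile


def skip_leading_comments(lines, comments="#"):
    """Drop the leading run of empty or commented lines; keep everything after.

    Hypothetical helper, not part of numpy.
    """

    def is_blank_or_comment(line):
        stripped = line.strip()
        # Empty line, or first non-space character(s) is the comment marker.
        return not stripped or stripped.startswith(comments)

    # dropwhile stops testing at the first data line, so comments that
    # appear later in the file are left untouched.
    return dropwhile(is_blank_or_comment, lines)


sample = io.StringIO(
    "# here is a general header\n"
    "\n"
    "   # another comment, indented\n"
    "a b c\n"
    "1 2 3\n"
    "# a later comment is not part of the header\n"
)
remaining = list(skip_leading_comments(sample))
print(remaining)
```

Only the leading block is consumed, which matches the "commented header of arbitrary length" wording; how the names line is then chosen when `names=True` is a separate question that this sketch does not address.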
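
To make the "6 or 3 columns?" question at the top of this message concrete, here is a small sketch comparing the `split`-based parsing from the quoted snippet with the `index`-based alternative. Both function names are hypothetical, plain `str.split()` stands in for `split_line`, and the rule used to pick between `first_line[:index]` and the text after the marker (use the part before it unless that part is blank) is an assumption about what "depending on the case" would mean.

```python
def names_via_split(first_line, comments="#"):
    """Split on every comment marker and keep the first non-empty piece."""
    parts = [m for m in first_line.split(comments) if m.strip()]
    return parts[0].split() if parts else []


def names_via_index(first_line, comments="#"):
    """Cut only at the first comment marker.

    Assumption: take the text before the marker if there is any,
    otherwise the text after it.
    """
    try:
        index = first_line.index(comments)
    except ValueError:
        # No comment marker at all: the whole line holds the names.
        return first_line.split()
    before = first_line[:index]
    after = first_line[index + len(comments):]
    chosen = before if before.strip() else after
    return chosen.split()


first_line = "# A B C #1 #2 #3"
print(names_via_split(first_line))  # ['A', 'B', 'C']
print(names_via_index(first_line))  # ['A', 'B', 'C', '#1', '#2', '#3']
```

So the two approaches give 3 and 6 names respectively for the weird case, while both return `['A', 'B', 'C']` for a line such as `"A B C # trailing comment"`.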