On Mon, Sep 14, 2009 at 10:55 PM, Skipper Seabold <jsseab...@gmail.com> wrote:
> On Mon, Sep 14, 2009 at 10:41 PM, Pierre GM <pgmdevl...@gmail.com> wrote:
>>
>> On Sep 14, 2009, at 10:31 PM, Skipper Seabold wrote:
>>>
>>> I actually figured out a workaround with converters, since my missing
>>> values are " "," "," " ie., irregular number of spaces and the
>>> values aren't stripped of white spaces. I just define {# : lambda s:
>>> float(s.strip() or 0)}, and I have a loop build all of the converters,
>>> but then I have to go through and drop the ones that are supposed to
>>> be strings or dates, which is still pretty tedious, since I have a
>>> number of datasets that are like this, but they all contain different
>>> data in different orders and there's no (computer) logical order to it
>>> that I've discovered yet.
>>
>> I understand your frustration... We could think about some kind of
>> global default for the missing values...
>
> I'm not too frustrated, I'd just like to do this as few times as
> humanly (or machine-ly, rather) possible in the future...
>
> The main thing I'd like right now I think is for whitespace to be
> stripped, but maybe there is a good reason for this. I didn't realize
> this was the source of my confusion at first. Also just being able to
> define missing as a number would be nice. I started a patch for this,
> but I reverted when I realized I could make the converters as I did.
>
> While we're on the subject, the other thing on my wishlist (unless I
> just don't know how to do this) is being able to define a "column map"
> for datasets that have no delimiters. At first each observation of my
> data was just one long string with no gaps or regular breaks but I
> knew which columns had what. Eg., the first variable was (not
> zero-indexed) columns 1-6, the second columns 11-15, the third column
> 16, etc. so I would just say delimiter = [1:6,11:15,16,...].
>
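For reference, here is a minimal sketch of the two ideas in the quoted message: a converter that treats an all-blank field as 0, and a "column map" of 1-based inclusive ranges turned into field widths. It uses the ranges 1-6, 7-10, 11-15, 16. The helper names (blank_to_zero, build_converters, colmap_to_widths, split_fixed) and the sample record are made up for illustration and are not part of NumPy. As far as I know, genfromtxt's delimiter argument also accepts a sequence of integer field widths, so a list like the one computed here could probably be passed to it directly, and a dict like the one returned by build_converters has the shape that converters expects.

```python
def blank_to_zero(field):
    """Convert a numeric field to float, treating an all-blank field as 0."""
    return float(field.strip() or 0)


def build_converters(ncols, skip=()):
    """Map every column index to blank_to_zero, except string/date columns in `skip`."""
    return {i: blank_to_zero for i in range(ncols) if i not in skip}


def colmap_to_widths(colmap):
    """Turn 1-based inclusive (start, stop) column ranges into field widths.

    Assumes the ranges are contiguous and begin at column 1.
    """
    widths = []
    expected_start = 1
    for start, stop in colmap:
        if start != expected_start:
            raise ValueError("column map has a gap or overlap at column %d" % start)
        widths.append(stop - start + 1)
        expected_start = stop + 1
    return widths


def split_fixed(line, widths):
    """Cut one undelimited record into fields of the given widths."""
    fields, pos = [], 0
    for width in widths:
        fields.append(line[pos:pos + width])
        pos += width
    return fields


# Hypothetical column map and record: columns 1-6, 7-10, 11-15, 16.
colmap = [(1, 6), (7, 10), (11, 15), (16, 16)]
widths = colmap_to_widths(colmap)

# One 16-character record; the second field is entirely blank (missing).
record = "  12.5" + "    " + " 3.25" + "7"
fields = split_fixed(record, widths)
values = [blank_to_zero(f) for f in fields]
print(widths, values)
```

Keeping the column map as explicit (start, stop) pairs makes it easy to check against a codebook, and the skip argument avoids having to delete string or date columns from the converter dict by hand afterwards.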
Err, 1-6, 7-10, 11-15, 16... I need some sleep.

>>> I tried another workaround for the dates with my converters defined
>>> as conv
>>>
>>> conv.update({date : lambda s : datetime(*map(int,
>>> s.strip().split('/')[-1:]+s.strip().split('/')[:2]))})
>>>
>>> Where `date` is the column that contains a date. The problem was that
>>> my dates are "mm/dd/yyyy" and datetime needs "yyyy,mm,dd," it worked
>>> for a test case if my dates were "dd/mm/yyyy" and I just use reversed,
>>> but gave an error about not finding the day in the third position,
>>> though that lambda function worked for a test case outside of
>>> genfromtxt.
>>
>> Check the archives of the mailing list, there's an example using
>> dateutil.parser that may be just what you need.
>>
>
> Ah ok. I looked for a bit, but I was sure I missed something. Thanks.
>
> Skipper
>
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
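
On the "mm/dd/yyyy" converter discussed above: a minimal sketch of a named date converter that could replace the split-and-reorder lambda. It uses datetime.strptime from the standard library rather than the dateutil.parser example suggested in the thread, only to keep the sketch dependency-free. The function name parse_mdy is hypothetical. The bytes check is an assumption: depending on the NumPy version and the encoding argument, a genfromtxt converter may receive bytes rather than str.

```python
from datetime import datetime


def parse_mdy(field):
    """Parse a 'mm/dd/yyyy' field, ignoring surrounding whitespace."""
    # Assumption: the field may arrive as bytes, so decode it first.
    if isinstance(field, bytes):
        field = field.decode("ascii")
    return datetime.strptime(field.strip(), "%m/%d/%Y")


# Usage on a padded field, as it might appear in an unstripped column.
parsed = parse_mdy(" 09/14/2009 ")
print(parsed.year, parsed.month, parsed.day)
```

A converter like this would be registered the same way as the lambda in the quoted message, e.g. conv.update({date: parse_mdy}), where conv and date are the names used there.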