Bruce Southey wrote:
>> <chris.bar...@noaa.gov> wrote:
> Using the numpy NaN or similar (noting R's approach to missing values
> which in turn allows it to have the above functionality) is just a
> very bad idea for missing values because you always have to check that
> which NaN is a missing value and which was due to some numerical
> calculation.
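The ambiguity described in the quote above can be seen in a few lines of plain Python. This is only an illustrative sketch: the names `missing` and `computed` are made up for the example, and nothing here is numpy-specific.

```python
import math

# A NaN standing in for a missing field, as a file reader might fill it in ...
missing = float("nan")
# ... and a NaN that came out of an actual numerical calculation.
computed = float("inf") - float("inf")

# Both test the same way, so where the NaN came from is no longer recoverable.
print(math.isnan(missing), math.isnan(computed))
```

Once both values sit in the same array, the only way to tell them apart is to track the missing positions separately, which is roughly what a mask does.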
Well, this is specific to reading files, so you know where it came from. And the principle of fromfile() is that it is fast and simple; if you want masked arrays, use slower but more full-featured methods.

However, in this case:

    In [9]: np.fromstring("3, 4, NaN, 5", sep=",")
    Out[9]: array([ 3., 4., NaN, 5.])

An actual NaN is read from the file, rather than a missing value. Perhaps the user does want the distinction, so maybe it should really only fill it in if the user asks for it, by specifying "missing_value=np.nan" or something.

> From what I can see is that you expect that fromfile() should only
> split at the supplied delimiters, optionally(?) strip any whitespace

Whitespace stripping is not optional.

> Your output from this string '1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12'
> actually assumes multiple delimiters because there is no comma between
> 4 and 5 and 8 and 9.

Yes, that's the point. I thought about allowing arbitrary multiple delimiters, but I think '\n' is a special case - for instance, a comma at the end of some numbers might mean missing data, but a '\n' would not.

And I couldn't really think of a useful use-case for arbitrary multiple delimiters.

> In Josef's last case how many 'missing values' should there be?
>> extra newlines at end of file
>> str = '1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12\n\n\n'

None -- exactly why I think \n is a special case.

What about:

>> extra newlines in the middle of the file
>> str = '1, 2, 3, 4\n\n5, 6, 7, 8\n9, 10, 11, 12\n'

I think they should be ignored, but I hope I'm not making something that is too specific to my personal needs.

Travis Oliphant wrote:
> +1 (ignoring new-lines transparently is a nice feature). You can also
> use sscanf with weave to read most files.

Right -- but that requires weave. In fact, MATLAB has a fscanf function that allows you to pass in a C format string and it vectorizes it to use the same one over and over again until it's done. It's actually quite powerful and flexible.
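To make the parsing rules discussed above concrete (newline as a special-case separator, blank lines ignored, whitespace always stripped, empty fields filled only on request), here is a minimal pure-Python sketch. It is not numpy's fromfile() or fromstring(): the function name `parse_delimited` and the `missing_value` parameter are hypothetical, chosen only to mirror the "missing_value=np.nan" idea floated above.

```python
def parse_delimited(text, sep=",", missing_value=None):
    """Hypothetical helper illustrating the proposed rules; not a numpy API."""
    values = []
    for line in text.split("\n"):
        if not line.strip():
            # Extra newlines (middle or end of file) never count as missing data.
            continue
        for field in line.split(sep):
            field = field.strip()  # whitespace stripping is not optional
            if field:
                values.append(float(field))
            elif missing_value is not None:
                # Empty field between (or after) delimiters: fill only if asked.
                values.append(missing_value)
            else:
                raise ValueError("empty field and no missing_value given")
    return values


rows = '1, 2, 3, 4\n\n5, 6, 7, 8\n9, 10, 11, 12\n\n\n'
print(parse_delimited(rows))
print(parse_delimited("3, 4, , 5", missing_value=0.0))
```

With these rules the blank lines in `rows` contribute nothing, while the empty field in the second call is filled with the requested value. A trailing comma on a line is treated as a missing field, which matches the distinction drawn above between a comma and a '\n'.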
I once started on an fscanf-like reader with that in mind, but didn't have the C chops to do it. I ended up with a tool that only did doubles (come to think of it, MATLAB only does doubles, anyway...)

I may some day write a whole new C (or, more likely, Cython) function that does something like that, but for now, I'm just trying to get fromfile to be useful for me.

> +1 (much preferable to insert NaN or other user value than raise
> ValueError in my opinion)

But raise an error for integer types?

I guess this is still up in the air -- no consensus yet.

Thanks,
-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion