On Fri, Jan 8, 2010 at 5:12 PM, Christopher Barker <chris.bar...@noaa.gov> wrote:
> Bruce Southey wrote:
>> Also a user has to check for missing values or numpy has to warn a user
>
> I think warnings are next to useless for all but interactive work -- so I don't want to rely on them.
>
>> that missing values are present immediately after reading the data so the appropriate action can be taken (like using functions that handle missing values appropriately). That is my second problem with using codes (NaN, -99999, etc.) for missing values.
>
> But I think you're right -- if someone writes code, tests it with good input, then later runs it on input with missing values, they are likely never to have bothered to test for missing values.
>
> So I think missing values should only be replaced by something if the user specifically asks for it.
>
>>> And the principle of fromfile() is that it is fast and simple; if you want masked arrays, use slower but more full-featured methods.
>>
>> So in that case it should fail with missing data.
>
> Well, I'm not so sure -- the point is performance, and there's no reason not to have high-performing code that handles missing data.
>
>> What about '\r' and '\n\r'?
>
> I have thought about that -- I'm hoping that Python's text file reading will just take care of it, but as we're working with C file handles here (I think), I guess not. '\n\r' is easy -- the '\r' is just extra whitespace. A lone '\r' is another case to handle.
>
>> My problem with this is that you are reading one huge 1-D array (that you can resize later) rather than a 2-D array with rows and columns (which is what I deal with).
>
> That's because fromfile() is not designed to be row-oriented at all, and the binary read certainly isn't. I'm just trying to make this easy -- though it's not turning out that way!
>
>> But I agree that you can have an option to say treat '\n' or '\r' as a delimiter, but I think it should be turned off by default.
>
> That's what I've done.
>
>> You should have a corresponding value for ints because raising an exception would be inconsistent with allowing floats to have a value.
>
> I'm not sure I care, really -- but I think having the user specify the fill value is the best option, anyway.
>
> josef.p...@gmail.com wrote:
>>>> none -- exactly why I think \n is a special case.
>>> What about '\r' and '\n\r'?
>>
>> Yes, I forgot about this, and it will be the most common case for Windows users like myself.
>>
>> I think \r should be stripped automatically, like in non-binary reading of files in Python.
>
> Except for folks like me who have old Mac files lying around... so I want something like "Universal newlines" support.
>
>> A warning would be good, but doing np.any(np.isnan(x)) or np.isnan(x).sum() on the result is always a good idea for a user when missing values are a possibility.
>
> Right, but the issue is that the user has to know that they are possible, and we all know how carefully we all read docs!
>
> Thanks for your input -- I think I know what I'd like to do, but it's proving less than trivial to do it, so we'll see.
>
> In short:
>
> 1) optionally allow newlines to serve as a delimiter, so large tables can be read;
>
> 2) raise an exception for missing values, unless:
> 3) the user specifies a fill value of their choice (compatible with the chosen data type).
>
> -Chris
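The three-point plan in Chris's summary can be illustrated with a small pure-Python sketch. The helper `read_delimited` and its parameters are hypothetical, invented only for illustration. It is not NumPy's `fromfile()`, and it makes no attempt to model C file handles or performance. It assumes the whole text is already in memory. It folds '\r\n' and lone '\r' into '\n', in the spirit of Python's universal newlines, optionally treats newlines as an extra delimiter, and raises on an empty field unless the caller supplies a fill value.

```python
import numpy as np


def read_delimited(text, sep=",", newline_as_sep=True, fill_value=None, dtype=float):
    """Hypothetical sketch of the proposed behavior (not numpy.fromfile's API).

    1) newlines optionally act as an extra delimiter, so whole tables can be read;
    2) an empty field raises ValueError, unless
    3) the caller supplies fill_value (meant to be compatible with dtype).
    """
    # "Universal newlines" style: fold Windows '\r\n' and old-Mac '\r' into '\n'.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    if newline_as_sep:
        # Each non-blank line is a record; blank lines (e.g. a trailing newline) are skipped.
        records = [line for line in text.split("\n") if line.strip()]
    else:
        # The whole text is one record, so "1,2\n3,4" fails on the field "2\n3".
        records = [text]
    values = []
    for record in records:
        for field in record.split(sep):
            field = field.strip()
            if field:
                values.append(dtype(field))
            elif fill_value is None:
                # Point 2: never substitute a missing value silently.
                raise ValueError("missing value found and no fill_value given")
            else:
                # Point 3: use the caller's chosen fill value.
                values.append(fill_value)
    # Conversion to dtype happens here: NaN with an integer dtype raises ValueError,
    # but a float fill such as 1.5 would be truncated, so stricter checks may be wanted.
    return np.array(values, dtype=dtype)


raw = "1.0,2.0\r\n3.0,\r\n"  # Windows line endings, last field missing
try:
    read_delimited(raw)
except ValueError as err:
    print("error:", err)  # error: missing value found and no fill_value given

table = read_delimited(raw, fill_value=np.nan).reshape(-1, 2)  # 1-D result -> 2 columns
print(np.isnan(table).sum())  # 1
```

Because the result is 1-D, as with fromfile(), a table with a known number of columns can be reshaped afterwards, and the np.isnan(...).sum() check still applies when NaN is the chosen fill value. Note that `sep` is treated as one literal delimiter here, so runs of spaces with `sep=" "` would be reported as missing fields.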
I fully agree with your approach! Thanks for considering my thoughts!

Bruce

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion