Pierre GM wrote:
> On Nov 26, 2008, at 5:55 PM, Ryan May wrote:
>
>> Manuel Metz wrote:
>>> Ryan May wrote:
>>>> 3) Better support for missing values. The docstring mentions a way of
>>>> handling missing values by passing in a converter. The problem with
>>>> this is that you have to pass in a converter for *every column* that
>>>> will contain missing values. If you have a text file with 50 columns,
>>>> writing this dictionary of converters seems like ugly and needless
>>>> boilerplate. I'm unsure of how best to pass in both what values
>>>> indicate missing values and what values to fill in their place. I'd
>>>> love suggestions
>>> Hi Ryan,
>>> this would be a great feature to have !!!
>
> About missing values:
>
> * I don't think missing values should be supported in np.loadtxt. That
> should go into a specific np.ma.io.loadtxt function, a preview of
> which I posted earlier. I'll modify it taking Ryan's new function into
> account, and Christopher's suggestion (defining a dictionary {column
> name : missing values}).
>
> * StringConverter already defines some default filling values for each
> dtype. In np.ma.io.loadtxt, these values can be overwritten. Note
> that you should also be able to define a filling value by specifying a
> converter (think float(x or 0) for example)
>
> * Missing values on space-separated fields are very tricky to handle:
> take a line like "a,,,d". With a comma as separator, it's clear that
> the 2nd and 3rd fields are missing.
> Now, imagine that commas are actually spaces ("a   d"): 'd' is now
> seen as the 2nd field of a 2-field record, not as the 4th field of a
> 4-field record with 2 missing values. I thought about it, and kicked in
> touch
>
> * That said, there should be a way to deal with fixed-length fields,
> probably by taking consecutive slices of the initial string. That way,
> we should be able to keep track of missing data...

Certainly, yes! Dealing with fixed-length fields would be necessary. The
case I had in mind had both -- a separator ("|") __and__ fixed-length
fields -- and is probably very special in that sense. But such data files
do exist out there...

mm
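
To make the delimiter problem and the slicing idea concrete, here is a rough plain-Python sketch. It is only an illustration, not the np.ma.io.loadtxt preview discussed above: split_fixed_width is a hypothetical helper name, and the field widths and sample lines are made up for the example.

```python
def split_fixed_width(line, widths, missing=None):
    """Cut `line` into consecutive slices of the given widths.

    A slice that is empty or whitespace-only is reported as `missing`.
    (Hypothetical helper, for illustration only.)
    """
    fields = []
    start = 0
    for width in widths:
        raw = line[start:start + width].strip()
        fields.append(raw if raw else missing)
        start += width
    return fields


# With a comma as separator the empty fields survive the split ...
comma_fields = "a,,,d".split(",")

# ... but whitespace splitting merges the run of blanks, so 'd' ends up
# as the 2nd field of a 2-field record.
space_fields = "a   d".split()

# Fixed-width layout (assumed widths 2, 2, 2, 1): slicing by position
# keeps track of the two blank columns.
fixed_line = "a " + "  " + "  " + "d"
fixed_fields = split_fixed_width(fixed_line, widths=[2, 2, 2, 1])

# Separator *and* fixed-length fields: split on "|" first, then treat
# blank cells as missing.
piped_fields = [cell.strip() or None for cell in "a |  |  |d ".split("|")]

# A converter that supplies its own filling value, as in float(x or 0).
def to_float(x):
    return float(x or 0)

filled = [to_float(x) for x in "1.5,,3".split(",")]

print(comma_fields)
print(space_fields)
print(fixed_fields)
print(piped_fields)
print(filled)
```

The slicing variant only works when the column widths are known in advance, which is exactly the extra information a whitespace-separated file does not carry.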