Hi folks,

I'm taking a look once again at fromfile() for reading text files. I often need to read a LOT of numbers from a text file, and it can actually be pretty darn slow to do it the normal Python way:

    for line in file:
        data = map(float, line.strip().split())

or various other versions that are similar. It really does take longer to read the text, split it up, convert each piece to a number, and then put that number into a numpy array than it does to simply read it straight into the array.

However, as it stands, fromfile() turns out to be next to useless for anything but whitespace-separated text. Full set of ideas here:

http://projects.scipy.org/numpy/ticket/909

However, for the moment, I'm digging into the code to address a particular problem -- reading files like this:

    123, 65.6, 789
    23, 3.2, 34
    ...

That is, comma (or whatever) separated text -- pretty common stuff.

The problem with the current code is that you can't read more than one line at a time with fromfile:

    a = np.fromfile(infile, sep=",")

will read until it doesn't find a comma, and thus only one line, as there is no comma after each line. As this is a really typical case, I think it should be supported.

Here is the question:

The work of finding the separator is done in:

    multiarray/ctors.c: fromfile_skip_separator()

It looks like it wouldn't be too hard to add some code in there to look for a newline, and consider that a valid separator. However, that would break backward compatibility. So maybe a flag could be passed in, saying you wanted to support newlines. The problem is that the flag would have to get passed all the way through to this function (and also for fromstring).

I also notice that it supports separators of arbitrary length, and I wonder how useful that is. But it also does odd things with spaces embedded in the separator:

    ", $ #"

matches all of:

    ",$#"
    ", $#"
    ",$ #"

Is it worth trying to fix that?

In the longer term, it would be really nice to support comments as well, though that would require more of a refactoring of the code, I think (though maybe not -- I suppose a call to fromfile_skip_separator() could look for a comment character, then if it found one, skip to where the comment ends). Hmmm.
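
To make the newline idea concrete, here is a rough pure-Python model of the separator-skipping logic. It is only a sketch, not the actual C code in multiarray/ctors.c: the names read_numbers() and skip_separator(), and the newline_is_sep and comment parameters, are all made up for illustration, and it does not try to reproduce the existing quirks with whitespace embedded in the separator.

```python
import re

# Plain decimal numbers with an optional exponent; enough for this sketch.
_NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def skip_separator(text, pos, sep, newline_is_sep=False, comment=None):
    """Hypothetical stand-in for fromfile_skip_separator().

    Returns the position just past the separator, or None if no valid
    separator starts at `pos` (which ends the read).
    """
    n = len(text)
    # Blanks before the separator are ignored.
    while pos < n and text[pos] in " \t":
        pos += 1
    # Optional comment support: jump to the end of the current line.
    if comment and text.startswith(comment, pos):
        end = text.find("\n", pos)
        pos = n if end == -1 else end
    if text.startswith(sep, pos):
        return pos + len(sep)
    # The proposed flag: accept a run of newlines as a separator too.
    if newline_is_sep and pos < n and text[pos] in "\r\n":
        while pos < n and text[pos] in "\r\n":
            pos += 1
        return pos
    return None


def read_numbers(text, sep=",", newline_is_sep=False, comment=None):
    """Read floats from `text` until no number or no separator is found."""
    values = []
    pos = 0
    while True:
        # Skip blanks in front of the next number.
        while pos < len(text) and text[pos] in " \t":
            pos += 1
        match = _NUMBER.match(text, pos)
        if match is None:
            break
        values.append(float(match.group()))
        pos = skip_separator(text, match.end(), sep, newline_is_sep, comment)
        if pos is None:
            break
    return values


sample = "123, 65.6, 789\n23, 3.2, 34\n"
print(read_numbers(sample))                       # [123.0, 65.6, 789.0]
print(read_numbers(sample, newline_is_sep=True))  # all six values
```

With the flag off, the model stops at the end of the first line, which is the behavior described above; with it on, the newline is simply treated as one more separator. Keeping the flag off by default is what would preserve backward compatibility.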

Thanks for any feedback,

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov