Christopher Barker, on 2010-01-04 17:05, wrote:
> Hi folks,
>
> I'm taking a look once again at fromfile() for reading text files. I
> often have the need to read a LOT of numbers from a text file, and it
> can actually be pretty darn slow to do it the normal Python way:
>
> for line in file:
>     data = map(float, line.strip().split())
>
> or various other versions that are similar. It really does take longer
> to read the text, split it up, convert to a number, then put that number
> into a numpy array, than it does to simply read it straight into the array.
>
> However, as it stands, fromfile() turns out to be next to useless for
> anything but whitespace-separated text. Full set of ideas here:
>
> http://projects.scipy.org/numpy/ticket/909
>
> However, for the moment, I'm digging into the code to address a
> particular problem -- reading files like this:
>
> 123, 65.6, 789
> 23, 3.2, 34
> ...
>
> That is comma (or whatever) separated text -- pretty common stuff.
>
> The problem with the current code is that you can't read more than one
> line at a time with fromfile:
>
> a = np.fromfile(infile, sep=",")
>
> will read until it doesn't find a comma, and thus only one line, as
> there is no comma after each line. As this is a really typical case, I
> think it should be supported.
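For reference, the line-by-line approach described in the quoted message, adapted to the comma-separated sample, might look like the minimal sketch below. The in-memory `infile` and the names `rows` and `data` are assumptions for illustration; only the two sample rows come from the message.

```python
import io

import numpy as np

# Stand-in for the comma-separated file from the quoted message;
# the two rows are the sample values shown there.
infile = io.StringIO("123, 65.6, 789\n23, 3.2, 34\n")

rows = []
for line in infile:
    line = line.strip()
    if not line:
        continue  # skip blank lines
    # float() tolerates surrounding whitespace, so " 65.6" parses fine
    rows.append([float(field) for field in line.split(",")])

# Convert the list of per-line lists into a 2-D float array
data = np.array(rows)
print(data.shape)
```

Every value passes through a Python-level split and float() call before it reaches the array, which is the overhead the message is complaining about.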
Just a potshot, but have you tried np.loadtxt? I find it pretty fast.

> Here is the question:
>
> The work of finding the separator is done in:
>
> multiarray/ctors.c: fromfile_skip_separator()
>
> It looks like it wouldn't be too hard to add some code in there to look
> for a newline, and consider that a valid separator. However, that would
> break backward compatibility. So maybe a flag could be passed in, saying
> you wanted to support newlines. The problem is that the flag would have to
> get passed all the way through to this function (and also for fromstring).
>
> I also notice that it supports separators of arbitrary length, and I
> wonder how useful that is. But it also does odd things with spaces
> embedded in the separator:
>
> ", $ #" matches all of: ",$#" ", $#" ",$ #"
>
> Is it worth trying to fix that?
>
> In the longer term, it would be really nice to support comments as well,
> though that would require more of a re-factoring of the code, I think
> (though maybe not -- I suppose a call to fromfile_skip_separator() could
> look for a comment character, then if it found one, skip to where the
> comment ends -- hmmm).
>
> thanks for any feedback,
>
> -Chris

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
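The np.loadtxt suggestion in the reply can be tried on the sample data from the quoted message with a minimal sketch like the one below. The in-memory `sample` buffer and the name `a` are assumptions for illustration; a real run would pass a filename or open file instead. No timing claim is made here.

```python
import io

import numpy as np

# In-memory stand-in for the comma-separated file in the quoted message
sample = io.StringIO("123, 65.6, 789\n23, 3.2, 34\n")

# delimiter="," splits each line on commas; every line becomes one row,
# so the missing comma at the end of a line is not a problem here
a = np.loadtxt(sample, delimiter=",")
print(a.shape)
```

Unlike fromfile with sep=",", loadtxt treats the newline as a row boundary, so the whole file is read and the result is two-dimensional.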