>On Mon, Jan 4, 2010 at 10:39 PM,  <a...@ajackson.org> wrote:
>>>Hi folks,
>>>
>>>I'm taking a look once again at fromfile() for reading text files. I
>>>often have the need to read a LOT of numbers from a text file, and it
>>>can actually be pretty darn slow doing it the normal python way:
>>>
.....................big snip
>>
>> I agree. I've tried using it, and usually find that it doesn't quite get
>> there.
>>
>> I rather like the R command(s) for reading text files - except then I have to
>> use R which is painful after using python and numpy. Although ggplot2 is
>> awfully nice too ... but that is a later post.
>>
>> read.table(file, header = FALSE, sep = "", quote = "\"'",
>>            dec = ".", row.names, col.names,
>>            as.is = !stringsAsFactors,
>>            na.strings = "NA", colClasses = NA, nrows = -1,
>>            skip = 0, check.names = TRUE, fill = !blank.lines.skip,
>>            strip.white = FALSE, blank.lines.skip = TRUE,
>>            comment.char = "#",
>>            allowEscapes = FALSE, flush = FALSE,
>>            stringsAsFactors = default.stringsAsFactors(),
>>            fileEncoding = "", encoding = "unknown")
>>
.......................big snip
>
>
>Aren't the newly improved
>
>numpy.genfromtxt(fname, dtype=<type 'float'>, comments='#',
>delimiter=None, skiprows=0, converters=None, missing='',
>missing_values=None, usecols=None, names=None, excludelist=None,
>deletechars=None, case_sensitive=True, unpack=None, usemask=False,
>loose=True)
>
>and friends intended to handle all this?
>
>Josef
>
Reopening an old thread...

genfromtxt is a big step forward. Something I'm fiddling with is working
through the book "Using R for Data Analysis and Graphics: Introduction,
Code and Commentary" by J. H. Maindonald (available online) in Python. I
am trying to see what it takes in Python/NumPy to work his examples and
problems, as a learning exercise for me.

So anyway, with that introduction, here is a case that I believe
genfromtxt fails on, because it doesn't support the reasonable (IMHO)
behavior of treating a quote-delimited string in the input file as a
single field. Below is the example from the book.

So we have two issues: the header for the first field is quote-blank-quote
(" "), and the values in the first field are one to three blank-delimited
words enclosed in quotes.

I'm putting something together to read it using shlex.split, since it
honors strings protected by quote pairs. I'm not an Excel person, but I
think Excel might export data in a format similar to what is shown below.

" " "distance" "climb" "time"
"Greenmantle" 2.5 650 16.083
"Carnethy" 6 2500 48.35
"Craig Dunain" 6 900 33.65
"Ben Rha" 7.5 800 45.6
"Ben Lomond" 8 3070 62.267
"Goatfell" 8 2866 73.217
"Bens of Jura" 16 7500 204.617
"Cairnpapple" 6 800 36.367
"Scolty" 5 800 29.75
"Traprain" 6 650 39.75
"Lairig Ghru" 28 2100 192.667

--
-----------------------------------------------------------------------
| Alan K. Jackson            | To see a World in a Grain of Sand      |
| a...@ajackson.org          | And a Heaven in a Wild Flower,         |
| www.ajackson.org           | Hold Infinity in the palm of your hand |
| Houston, Texas             | And Eternity in an hour. - Blake       |
-----------------------------------------------------------------------
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
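A minimal sketch of the shlex.split approach described above, run on the first few rows of the book's table. It assumes the first non-blank line is the header, the first column holds quoted text labels, and every remaining column is numeric. The function name read_quoted_table and the field name "name" (standing in for the blank " " header) are hypothetical choices, not part of NumPy or the book.

```python
import io
import shlex

import numpy as np


def read_quoted_table(lines):
    """Read whitespace-delimited text in which quoted strings are single fields.

    Assumes the first non-blank line is the header, the first column holds
    text row labels, and every other column is numeric.
    """
    # shlex.split keeps "Craig Dunain" together as one token and strips the quotes.
    rows = [shlex.split(line) for line in lines if line.strip()]
    header, body = rows[0], rows[1:]

    # The header's first entry is the blank label " ", so the label column
    # gets a hypothetical field name, "name"; the string width fits the longest label.
    label_width = max(len(row[0]) for row in body)
    dtype = [("name", f"U{label_width}")] + [(col, float) for col in header[1:]]

    # One tuple per record: the label string followed by the numeric fields.
    records = [(row[0], *map(float, row[1:])) for row in body]
    return np.array(records, dtype=dtype)


sample = """\
" " "distance" "climb" "time"
"Greenmantle" 2.5 650 16.083
"Craig Dunain" 6 900 33.65
"Bens of Jura" 16 7500 204.617
"""

hills = read_quoted_table(io.StringIO(sample))
print(hills["name"])         # ['Greenmantle' 'Craig Dunain' 'Bens of Jura']
print(hills["climb"].sum())  # 9050.0
```

Returning a structured array keeps the labels and the numeric columns together under their header names. One caveat: shlex.split works in POSIX mode by default, so backslashes in the input are treated as escape characters, which may matter for some files.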