On 06/12/2011 23:13, Wes McKinney wrote:
> I think R has two functions read.csv and read.csv2, where read.csv2 is
> capable of dealing with things like European decimal format.
>

I may be wrong, but from R's help I understand that read.csv, read.csv2, read.delim, ... are just calls to read.table with different default values (for the separator, the decimal sign, ...). The read.table function is indeed pretty flexible (see the signatures below).
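To illustrate the "same function, different defaults" idea in Python, here is a minimal sketch. The names read_table, read_csv and read_csv2 are hypothetical stand-ins written for this example (they are not NumPy or R APIs), and the parsing is deliberately naive: it only handles a separator and a decimal mark.

```python
import csv
import io
from functools import partial


def read_table(text, header=False, sep=None, dec="."):
    """Hypothetical, simplified analogue of R's read.table.

    Parses delimited text; fields that look numeric are converted to
    float, using `dec` as the decimal mark. Other fields stay strings.
    """
    if sep is None:
        # Mimic R's sep="": split on any run of whitespace
        rows = [line.split() for line in text.splitlines() if line.strip()]
    else:
        rows = [row for row in csv.reader(io.StringIO(text), delimiter=sep) if row]
    names = rows.pop(0) if header else None

    def convert(field):
        try:
            return float(field.replace(dec, "."))
        except ValueError:
            return field

    return names, [[convert(field) for field in row] for row in rows]


# The "csv" variants are just read_table with other default values
read_csv = partial(read_table, header=True, sep=",", dec=".")
read_csv2 = partial(read_table, header=True, sep=";", dec=",")

names, rows = read_csv2("x;y\n1,5;2\n")
```

With this layout, supporting the European format costs no extra parsing code: read_csv2 only changes two defaults of the general function.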
Having a dedicated fast function for properly formatted CSV tables may be a good idea. But how to define "properly formatted"? I've seen many tiny variations, so I'm not sure!

Now for my personal use, I was not so much frustrated by loading performance as by NA support, so I wrote my own loadCsv function to get a masked array. It was neither beautiful nor very efficient, but it does the job!

Best,
Pierre

read.table & co. signatures:

    read.table(file, header = FALSE, sep = "", quote = "\"'",
               dec = ".", row.names, col.names,
               as.is = !stringsAsFactors,
               na.strings = "NA", colClasses = NA, nrows = -1,
               skip = 0, check.names = TRUE, fill = !blank.lines.skip,
               strip.white = FALSE, blank.lines.skip = TRUE,
               comment.char = "#",
               allowEscapes = FALSE, flush = FALSE,
               stringsAsFactors = default.stringsAsFactors(),
               fileEncoding = "", encoding = "unknown", text)

    read.csv(file, header = TRUE, sep = ",", quote="\"", dec=".",
             fill = TRUE, comment.char="", ...)

    read.csv2(file, header = TRUE, sep = ";", quote="\"", dec=",",
              fill = TRUE, comment.char="", ...)

---------------------------------------------------------
Copy-paste from my own dirty "csv toolbox":

    import numpy as np

    # Sentinel that temporarily stands in for missing values
    NA = -9999.

    def _NA_conv(s):
        '''Convert a string representation of a number into a float,
        with a special behaviour for missing values: if s is "" or "NA",
        return the sentinel NA (set to -9999.).
        '''
        if isinstance(s, bytes):
            # some NumPy versions pass bytes to converters
            s = s.decode()
        s = s.strip()
        if s == '' or s == 'NA':
            return NA
        return float(s)

    def loadCsv(filename, delimiter=',', usecols=None, skiprows=1):
        '''Wrapper around numpy.loadtxt to load a properly R-formatted
        CSV file with NA values, whose first row should be a header row.

        Returns
        -------
        (headers, data, dataNAs)
        '''
        # 1) Read the header row
        with open(filename) as f:
            headers = f.readline().strip().split(delimiter)
        # Apply the NA converter to every column that is read
        cols = range(len(headers)) if usecols is None else usecols
        converters = dict((i, _NA_conv) for i in cols)
        if usecols is not None:
            headers = [headers[i] for i in usecols]

        # 2) Read the data
        data = np.loadtxt(filename, delimiter=delimiter, usecols=usecols,
                          skiprows=skiprows, converters=converters)
        # Note: genuine -9999. values cannot be told apart from NAs
        dataNAs = (data == NA)
        # Set NAs to zero
        data[dataNAs] = 0.
        # Turn the array into a masked array
        data = np.ma.masked_array(data, dataNAs)
        return (headers, data, dataNAs)

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
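
For comparison with the NA-to-masked-array approach of loadCsv above, here is a minimal sketch that uses NumPy's own np.genfromtxt instead of a hand-written converter. This is a swapped-in technique, not the toolbox code from the message, and the in-memory sample data is made up for the example.

```python
import io

import numpy as np

# Made-up sample: a header row, one "NA" field and one empty field
sample = io.StringIO("a,b,c\n1.0,NA,3.0\n4.0,5.0,\n")

# usemask=True asks genfromtxt for a masked array; fields equal to "NA"
# (and empty fields) are flagged as missing instead of raising an error
data = np.genfromtxt(sample, delimiter=",", skip_header=1,
                     missing_values="NA", usemask=True)

# Masked entries are ignored by reductions such as mean()
column_means = data.mean(axis=0)
```

Compared with the sentinel trick, this avoids reserving a value such as -9999. for missing data, at the cost of relying on genfromtxt rather than loadtxt.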