Re: [Numpy-discussion] numpy 1.7.0 release?

Pierre Haessig Wed, 07 Dec 2011 02:24:59 -0800

On 06/12/2011 23:13, Wes McKinney wrote:
> I think R has two functions read.csv and read.csv2, where read.csv2 is
> capable of dealing with things like European decimal format.
>
I may be wrong, but from R's help I understand that read.csv, read.csv2,
read.delim, ...
are just calls to read.table with different default values (for the
separator, the decimal sign, ...).
The read.table function is indeed pretty flexible (see signatures below).


Having a dedicated fast function for properly formatted CSV tables may be
a good idea.
But how should "properly formatted" be defined? I've seen many tiny
variations, so I'm not sure!

Now, for my personal use, I was not so much frustrated by loading
performance as by NA support, so I wrote my own loadCsv function to get a
masked array. It is neither beautiful nor very efficient, but it does
the job!

Best,
Pierre

read.table & co. signatures:

read.table(file, header = FALSE, sep = "", quote = "\"'",
                 dec = ".", row.names, col.names,
                 as.is = !stringsAsFactors,
                 na.strings = "NA", colClasses = NA, nrows = -1,
                 skip = 0, check.names = TRUE, fill = !blank.lines.skip,
                 strip.white = FALSE, blank.lines.skip = TRUE,
                 comment.char = "#",
                 allowEscapes = FALSE, flush = FALSE,
                 stringsAsFactors = default.stringsAsFactors(),
                 fileEncoding = "", encoding = "unknown", text)

read.csv(file, header = TRUE, sep = ",", quote="\"", dec=".",
               fill = TRUE, comment.char="", ...)

read.csv2(file, header = TRUE, sep = ";", quote="\"", dec=",",
                fill = TRUE, comment.char="", ...)
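
To illustrate the "same function, different defaults" pattern in Python,
here is a minimal sketch. The read_table function below is a hypothetical
toy reader written only for this example (it is not a NumPy or R function),
and it assumes every field is numeric:

```python
import csv
import io
from functools import partial


def read_table(text, sep=",", dec="."):
    """Hypothetical toy reader: parse delimited text into rows of floats."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter=sep):
        # Normalise the decimal sign to "." before calling float()
        rows.append([float(field.replace(dec, ".")) for field in fields])
    return rows


# The "variants" only differ by their default arguments
read_csv = partial(read_table, sep=",", dec=".")
read_csv2 = partial(read_table, sep=";", dec=",")

# European-style input: ";" as separator, "," as decimal sign
rows = read_csv2("1,5;2,25\n3,0;4,75\n")
print(rows)
```

Using functools.partial keeps a single parsing code path, so a fix in
read_table benefits every variant.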

---------------------------------------------------------
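Copy-pasted from my own dirty "csv toolbox":

import numpy as np

# Sentinel used internally to flag missing values
NA = -9999.
def _NA_conv(s):
    '''convert a string number representation into a float,
    with a special behaviour for "NA" values :
    if s=="" or "NA", it returns the key value NA (set to -9999.)
    '''
    if isinstance(s, bytes):
        # some NumPy versions pass bytes to converters
        s = s.decode()
    s = s.strip()
    if s == '' or s == 'NA':
        return NA
    else:
        return float(s)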

def loadCsv(filename, delimiter=',', usecols=None, skiprows=1):
    '''wrapper around numpy.loadtxt to load
    a properly R formatted CSV file with NA values
    of which the first row should be a header row

    Returns
    -------
    (headers, data, dataNAs)
    '''
    # 1) Read the header row
    with open(filename) as f:
        headers = f.readline().strip().split(delimiter)

    # Columns to load: all of them unless usecols says otherwise
    cols = list(usecols) if usecols is not None else list(range(len(headers)))
    headers = [headers[i] for i in cols]

    # 2) Read the data, converting NA / empty fields in every loaded column
    converters = dict((i, _NA_conv) for i in cols)
    data = np.loadtxt(filename,
                      delimiter=delimiter, usecols=usecols,
                      skiprows=skiprows,
                      converters=converters)

    # Caveat: a genuine -9999. in the data is also treated as NA
    dataNAs = (data == NA)
    # Set NAs to zero
    data[dataNAs] = 0.
    # Transform the array into a "masked array"
    data = np.ma.masked_array(data, dataNAs)

    return (headers, data, dataNAs)
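
For comparison, numpy.genfromtxt can produce a masked array directly
through its missing_values and usemask arguments. The following is only a
sketch on made-up inline data (the text variable and its contents are
assumptions for the example), not a drop-in replacement for loadCsv:

```python
import io

import numpy as np

# Made-up sample: a header row, one "NA" field and one empty field
text = "a,b\n1.0,NA\n,4.0\n"

# Read the header row separately, as loadCsv does
headers = text.splitlines()[0].split(",")

# usemask=True returns a masked array; "NA" and empty fields get masked
data = np.genfromtxt(io.StringIO(text), delimiter=",", skip_header=1,
                     missing_values="NA", usemask=True)

print(headers)
print(data)
print(data.mask)
```

This avoids the -9999. sentinel, so a genuine -9999 in the data would not be
mistaken for a missing value.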


_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
