Re: [R] CSV format issues

jim holtman Mon, 23 Jul 2012 07:43:52 -0700

Try this: it looks for numbers that contain a decimal comma and wraps them in
quotes, so that read.csv treats each one as a single field:


> x <- readLines(textConnection("Time,Value
+ 32,-7,183246E-02
+ 32,05,3,469364E-02"))
> # first pass: quote the numbers in scientific notation (e.g. -7,183246E-02)
> x.new1 <- gsub("(-?[0-9]+,[0-9]+E-?[0-9]+)", '"\\1"', x)
> x.new1
[1] "Time,Value"             "32,\"-7,183246E-02\""   "32,05,\"3,469364E-02\""
> # second pass: quote the remaining plain decimals (e.g. 32,05)
> x.new2 <- gsub("(-?[0-9]+,[0-9]+)(,|$)", '"\\1"\\2', x.new1)
> x.new2
[1] "Time,Value"                 "32,\"-7,183246E-02\""       "\"32,05\",\"3,469364E-02\""
> temp <- tempfile()
> writeLines(x.new2, temp)
> x.input <- read.csv(temp)
> x.input
   Time         Value
1    32 -7,183246E-02
2 32,05  3,469364E-02
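
Note that both columns of x.input are still non-numeric at this point
(character or factor, depending on the R version), because the quoted fields
keep their decimal commas. Below is a minimal sketch of one way to turn them
into numbers, assuming each value contains at most one comma. The name
to_numeric is a hypothetical helper, not part of base R, and x.input is rebuilt
by hand here so that the block runs on its own:

```r
# Hypothetical helper: swap the decimal comma for a point, then convert.
to_numeric <- function(v) {
  # as.character() guards against factor columns;
  # assumes at most one comma per value (the decimal mark)
  as.numeric(sub(",", ".", as.character(v), fixed = TRUE))
}

# Same values as the x.input printed above, rebuilt so the block is standalone
x.input <- data.frame(Time  = c("32", "32,05"),
                      Value = c("-7,183246E-02", "3,469364E-02"),
                      stringsAsFactors = FALSE)

# Apply the helper to every column and reassemble the data frame
x.num <- data.frame(lapply(x.input, to_numeric))
str(x.num)
```

After this, x.num$Time and x.num$Value can be used in arithmetic directly.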


On Mon, Jul 23, 2012 at 9:06 AM, Guillaume Meurice
<guillaume.meur...@igr.fr> wrote:
> Dear all,
>
> I have an encoding problem that I'm not familiar with.
> Here is the case:
> I'm reading data files that may have been generated on a computer with
> either French or English regional settings.
>
> Here is an example of each data file:
>
> * English output
> Time,Value
> 17,-0.0753953
> 17.05,-6.352454E-02
>
> * French output.
> Time,Value
> 32,-7,183246E-02
> 32,05,3,469364E-02
>
> In the first case, I can retrieve both columns by splitting each line
> on the comma separator.
> In the second case this is impossible, since in French the comma is also used
> as the decimal mark. Usually, the French CSV format adds quotes to
> distinguish the comma used as column separator from the comma used as
> decimal mark, as in the following:
>
> Time,Value
> 32,"-7,183246E-02"
> "32,05","3,469364E-02"
>
> Since I'm expecting 2 numbers, I can decide that if there are 3 commas, the
> first two numbers are to be joined, as are the two remaining ones.
> But with only two commas, which number is the floating-point one? (I know
> that it is the second one, but this is a great source of bugs ...)
>
> the unix tools "file" returns :
> ===
> $ file P23_RD06_High\ Sensitivity\ DNA\ Assay_DE04103664_2012-06-27_11-57-29_Sample1.csv
> $ P23_RD06_High Sensitivity DNA Assay_DE04103664_2012-06-27_11-57-29_Sample1.csv: ASCII text, with CRLF line terminators
> ===
>
>
> Unfortunately, the raw file doesn't contain the precious quotes. Sorry to
> bother you with this question, which is not totally related to R (which I'm
> using). Do you know if there are any facilities in R to get the data into
> the right format?
>
>
> Bests,
> --
> Guillaume Meurice - PhD
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

