[R] data file import - numbers and letters in a matrix(!)
Hello,

I have a problem importing a data file. It seems very tricky. I have a text file (see the end of this mail). Every file contains a different number of measurements, each of which starts with START OF HEIGHT DATA and ends with END OF HEIGHT DATA. I imported the file into a matrix, but the letters before the numbers (S= , S=, x=, y=) are my problem. Because of these letters, and the space after S= when S has a single digit, I get a different number of columns in my matrix, and with letters in the matrix I can't do any calculations.

My question: is it possible to import the file so that I get 3 columns containing only numbers, without prefixes like x= and y=?

Thanks a lot,
Felix

My R code:

# na.strings = "S="
Measure1 <- matrix(scan("data.dat", n = 5063 * 4, skip = 20, what = character()), 5063, 3, byrow = TRUE)
Measure2 <- matrix(scan("data.dat", n = 5063 * 4, skip = 5220, what = character()), 5063, 3, byrow = TRUE)

My data file:

FILEDATE:02.02.2007
...
START OF HEIGHT DATA
S= 0 y=0.0 x=0.
S= 0 y=0.1 x=0.00055643
...
S= 9 y=4.9 x=1.67278117
S= 9 y=5.0 x=1.74873257
S=10 y=0.0 x=0.
S=10 y=0.1 x=0.00075557
...
S=99 y=5.3 x=1.94719490
END OF HEIGHT DATA
...
START OF HEIGHT DATA
S= 0 y=0.0 x=0.
S= 0 y=0.1 x=0.00055643

The imported matrix:

      [,1]          [,2]          [,3]    [,4]
 [6,] S=            9             y=4.9   x=1.67278117
 [7,] S=            9             y=5.0   x=1.74873257
 [8,] S=10          y=0.0         x=0.    S=10
 [9,] y=0.1         x=0.00075557  S=10    y=0.2
[10,] x=0.00277444  S=10          y=0.3   x=0.00605958

______________________________________________
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
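A small sketch (not from the thread) of why the column count drifts: scan() splits on whitespace, so the space after "S=" for single-digit S values produces an extra token. Extracting only the numbers with a regular expression gives three values per line regardless. The two input lines are copied from the sample data; the regex assumes plain, non-negative decimal numbers.

```r
# Two lines copied from the sample data file.
x <- c("S= 9 y=4.9 x=1.67278117", "S=10 y=0.0 x=0.")

# Splitting on runs of whitespace (as scan() does by default) gives 4 tokens
# for the first line, because of the space after "S=", but 3 for the second.
tokens <- strsplit(x, "[[:space:]]+")
lengths(tokens)  # 4 3

# Extracting only the numeric parts gives exactly 3 values per line,
# which can then be stacked into a numeric matrix (one row per line).
nums <- regmatches(x, gregexpr("[0-9.]+", x))
m <- t(sapply(nums, as.numeric))
m
```

Because every line yields the same number of values, the rows no longer shift against each other the way they do in the imported matrix.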
Re: [R] data file import - numbers and letters in a matrix(!)
Try pasting this into an R session:

Lines.raw <- "FILEDATE:02.02.2007
...
START OF HEIGHT DATA
S= 0 y=0.0 x=0.
S= 0 y=0.1 x=0.00055643
...
S= 9 y=4.9 x=1.67278117
S= 9 y=5.0 x=1.74873257
S=10 y=0.0 x=0.
S=10 y=0.1 x=0.00075557
...
S=99 y=5.3 x=1.94719490
END OF HEIGHT DATA
...
START OF HEIGHT DATA
S= 0 y=0.0 x=0.
S= 0 y=0.1 x=0.00055643
"

# next line would be replaced by
# something like: Lines <- readLines("myfile.dat")
Lines <- readLines(textConnection(Lines.raw))

# extract those lines that contain an =
Lines <- grep("=", Lines, value = TRUE)

# get col names by removing all but letters and spaces from line 1
cn <- gsub("[^a-zA-Z ]", "", Lines[1])
cn <- scan(textConnection(cn), what = "")

# remove anything that is not a number, dot or space and read in
Lines <- gsub("[^ .0-9]", "", Lines)
DF <- read.table(textConnection(Lines), col.names = cn)
closeAllConnections()
DF

On 4/12/07, Felix Wave [EMAIL PROTECTED] wrote:
> Is it possible to import the file so that I get 3 columns containing only numbers, without prefixes like x= and y=?
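Felix's file holds several measurement blocks, while the approach above reads every data row into a single data frame. A minimal sketch, not part of the original replies, that keeps the blocks apart instead of relying on fixed skip/n values: it numbers each line by counting the START OF HEIGHT DATA markers seen so far. `read_height_blocks` is a hypothetical helper name, and it assumes every data row starts with "S=" and carries exactly three plain decimal numbers.

```r
# Hypothetical helper: one data frame per
# START OF HEIGHT DATA ... END OF HEIGHT DATA block.
# Assumes every data row starts with "S=" and holds exactly three numbers
# written as plain decimals (optional minus sign, no exponents).
read_height_blocks <- function(lines) {
  # Running count of START markers = measurement number of each line
  block <- cumsum(grepl("^START OF HEIGHT DATA", lines))
  is_data <- grepl("^S=", lines)
  data_lines <- lines[is_data]

  # Pull out the numeric fields and check there are three per row
  nums <- regmatches(data_lines, gregexpr("-?[0-9.]+", data_lines))
  stopifnot(all(lengths(nums) == 3))

  m <- matrix(as.numeric(unlist(nums)), ncol = 3, byrow = TRUE,
              dimnames = list(NULL, c("S", "y", "x")))
  # Split the rows by block number, keeping file order
  split(as.data.frame(m), block[is_data])
}
```

For the file in the question this would be called as `blocks <- read_height_blocks(readLines("data.dat"))`, after which `as.matrix(blocks[[1]])` plays the role of Measure1 and `as.matrix(blocks[[2]])` that of Measure2, whatever the number of rows in each block.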
Re: [R] data file import - numbers and letters in a matrix(!)
Here are the contents of my testdata.txt:

START OF HEIGHT DATA
S= 0 y=0.0 x=0.
S= 0 y=0.1 x=0.00055643
S= 9 y=4.9 x=1.67278117
S= 9 y=5.0 x=1.74873257
S=10 y=0.0 x=0.
S=10 y=0.1 x=0.00075557
S=99 y=5.3 x=1.94719490
END OF HEIGHT DATA

If you have access to a shell, you can preprocess the input file for read.delim with:

cat testdata.txt | grep -v "^START" | grep -v "^END" | sed 's/ //g' | sed 's/S=//' | sed 's/y=/\t/' | sed 's/x=/\t/'

Or here is my ugly fix in R:

my.read.file <- function(file){
  # assumes the file contains only START/END marker lines and S=/y=/x= data lines
  v1 <- readLines(con = file, n = -1)
  # drop the marker lines (grepl avoids v1[-integer(0)], which would return
  # nothing if no marker were present)
  v2 <- v1[!grepl("^START|^END", v1)]
  # collapse tabs and repeated spaces into a single space
  v3 <- gsub("[[:space:]]+", " ", v2)
  # strip the S=, y= and x= prefixes, then any leading space
  v4 <- gsub("S=|y=|x=", "", v3)
  v5 <- gsub("^ ", "", v4)
  # split each line on spaces and build a numeric matrix, one row per line
  m <- t(sapply(strsplit(v5, split = " "), as.numeric))
  colnames(m) <- c("S", "y", "x")
  return(m)
}
my.read.file("testdata.txt")

Regards,
Adai

Felix Wave wrote:
> Is it possible to import the file so that I get 3 columns containing only numbers, without prefixes like x= and y=?
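Both replies strip the prefixes first and then split on whitespace. An alternative sketch, assuming R >= 3.4.0 where `utils::strcapture()` is available, matches each data line against one regular expression with capture groups, so the space after "S=" (or a missing space before "y=" or "x=") does not matter and each column gets its own type. `parse_height_lines` is a hypothetical name, and the pattern assumes plain decimal numbers without exponents.

```r
# Hypothetical helper built on utils::strcapture() (R >= 3.4.0).
# Assumes data rows look like "S=<integer> y=<number> x=<number>", with
# optional whitespace between fields and plain decimals (no exponents).
parse_height_lines <- function(lines) {
  pattern <- paste0("^S=[[:space:]]*([0-9]+)",
                    "[[:space:]]*y=([-0-9.]+)",
                    "[[:space:]]*x=([-0-9.]+)[[:space:]]*$")
  # Keep only matching lines; START/END markers and headers are dropped
  data_lines <- lines[grepl(pattern, lines)]
  # proto supplies the column names and the type each capture is converted to
  proto <- data.frame(S = integer(), y = numeric(), x = numeric())
  utils::strcapture(pattern, data_lines, proto)
}
```

Used as `parse_height_lines(readLines("testdata.txt"))`, this returns a data frame with an integer S column and numeric y and x columns; lines that do not fit the pattern are skipped rather than turned into NA rows.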