Re: [R] Reading a file w/ two delimiters
... and yet another line I left out below! I apologize for this baloney!

On Fri, Nov 18, 2011 at 10:48 AM, Bert Gunter wrote:
> ... I failed to correctly paste the first line of an example:
>
> On Fri, Nov 18, 2011 at 10:44 AM, Bert Gunter wrote:
[...]
>> So what about:
--- also left this out ---
z <- "ab\tcd"
---
>>> cat(z)
>> ab      cd>
>>> cat(sub("\\\t","\n",z))
>> ab
>> cd>
[...]
Re: [R] Reading a file w/ two delimiters
... I failed to correctly paste the first line of an example:

On Fri, Nov 18, 2011 at 10:44 AM, Bert Gunter wrote:
> David:
>
> As you now realize "\t" etc. is a perfectly legal single tab character.
>
> Now consider:
- left this out --
> gsub("\\","a","\\")
---
> Error in gsub("\\", "a", "\\") :
>   invalid regular expression '\', reason 'Trailing backslash'
[...]
Re: [R] Reading a file w/ two delimiters
David:

As you now realize "\t" etc. is a perfectly legal single tab character.

Now consider:

Error in gsub("\\", "a", "\\") :
  invalid regular expression '\', reason 'Trailing backslash'

BUT

> gsub("\\\\","a","\\")
[1] "a"

???

The issue is there are two levels of escapes here -- the R parser's
and the reg expression's. The R parser recognizes "\\" as a single
backslash character in the third argument of gsub above. In the first
incorrect version, this single backslash is passed on to the reg
expression engine and it sees a single backslash, which is meaningless
to it. For example, a backreference would be something like "\\2" =
"backslash 2."

The second incantation's first argument is correct and is passed on to
the reg expression engine as "backslash backslash," which it
interprets as an escaped "\", which is a literal "\", per the
documentation.

So what about:

> cat(z)
ab      cd>
> cat(sub("\\\t","\n",z))
ab
cd>

R passes "backslash tab_character" to the regexp engine, which also
looks to me like an error. However, this may be one of those
"implementation dependent" details mentioned in the Help file. It
seems to me that the engine sees a meaningless escape sequence and
just throws away the escape to interpret the character literally. As
support for this, "\h" is not a meaningful escape sequence in R:

> gsub("\\h","a","\h")
Error: '\h' is an unrecognized escape in character string starting "\h"

and

> gsub("\\h","a","h")
[1] "a"

But I may be wrong, and I am hoping that this post will prompt someone
more knowledgeable than I to respond (if only just to confirm my
"explanation" if it's correct).
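A minimal sketch of the two escape levels described above (the variable names are illustrative, not from the thread). nchar() shows what the R parser leaves after reading the literal, and the gsub() calls show that a literal backslash needs four backslashes in a regex pattern but only two with fixed = TRUE, which skips the regex level entirely.

```r
# s holds ONE backslash character; "\\" is how R source spells it.
s <- "\\"
n_chars <- nchar(s)                            # 1

# Regex level: "\\\\" reaches the engine as two backslashes, which it
# reads as an escaped, i.e. literal, backslash.
via_regex <- gsub("\\\\", "a", s)              # "a"

# fixed = TRUE skips the regex level, so one parsed backslash is enough.
via_fixed <- gsub("\\", "a", s, fixed = TRUE)  # "a"

# A lone backslash is an incomplete regex, so gsub() stops with an error.
# R may also emit a warning first, so warnings are muffled here and the
# error is turned into FALSE.
compiles <- suppressWarnings(
  tryCatch({ gsub("\\", "a", s); TRUE }, error = function(e) FALSE)
)
compiles                                       # FALSE
```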
Cheers,
Bert
Re: [R] Reading a file w/ two delimiters
On Fri, Nov 18, 2011 at 10:26 AM, David Winsemius wrote:
>
> On Nov 18, 2011, at 9:28 AM, jim holtman wrote:
[...]
>> x2 <- gsub("\\\t", "\n", x)
>> x2
> [1] "sadf|asdf|asdf\nqwer|qwer|qwer\nzxcv|zxcv|zxfcgv"
>
> So I guess my question is (now) why the triple-slash technique even works?
>

There are two levels of parsing: first it's converted to a character
string by R, and in that parse "\\\t" gets converted to a backslash
character followed by a tab character (2 characters). Secondly, the
regular expression parser interprets those two characters as a tab.

For example, consider these:

> gsub("\\\t", "x", "\\\t,\t") # 1
[1] "\\x,x"
> gsub("\\\t", "x", "\\\t,\t", fixed = TRUE) # 2
[1] "x,\t"

The first arg in 1 is processed into backslash tab by R and then the
regular expression parser processes that into just tab; however, the
third argument in 1 is processed by R to backslash tab comma tab and is
not further processed since it's not regarded as a regular expression.
Thus the result follows.
In contrast, the first arg in 2 is processed into backslash tab by R as
before, but now it's not regarded as a regular expression, so the second
level of interpretation that occurred in 1 is not performed. Rather,
only occurrences of backslash tab get replaced instead of occurrences
of tab.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
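Gabor's two examples can be checked with a short sketch (the variable names are illustrative). utf8ToInt() exposes the character codes R's parser leaves behind, and the two gsub() calls reproduce his results 1 and 2.

```r
p <- "\\\t"                   # what R's parser hands to gsub()
codes <- utf8ToInt(p)         # 92 9: a backslash and a TAB, two characters

target <- "\\\t,\t"           # backslash, TAB, comma, TAB

# Regex: the engine reads backslash-TAB as an escaped TAB, so every TAB
# is replaced and the leading backslash survives.
as_regex <- gsub(p, "x", target)                # "\\x,x"

# fixed = TRUE: only the literal two-character sequence backslash-TAB
# is replaced; the lone TAB at the end is left alone.
as_fixed <- gsub(p, "x", target, fixed = TRUE)  # "x,\t"
```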
Re: [R] Reading a file w/ two delimiters
On Nov 18, 2011, at 9:28 AM, jim holtman wrote:

> It is pretty straightforward in R:
>
>> x <- readLines(textConnection("sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv"))
>> closeAllConnections()
>> # convert tabs to newlines
>> x <- gsub("\t", "\n", x)

Did the rules get liberalized for escaping patterns? Or have I been
unnecessarily expending backslashes all these years? I thought that one
needed 3 backslashes. This code does work and I am wondering if/when I
"didn't get the memo". (I do see that there is a line early in the ?regex
page that suggests I have been deluded all along.)

"The current implementation interprets \a as BEL, \e as ESC, \f as FF, \n as
LF, \r as CR and \t as TAB."

> x <- readLines(textConnection("sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv"))
> closeAllConnections()
> # convert tabs to newlines
> x2 <- gsub("\\\t", "\n", x)
> x2
[1] "sadf|asdf|asdf\nqwer|qwer|qwer\nzxcv|zxcv|zxfcgv"

So I guess my question is (now) why the triple-slash technique even works?

-- 
David.

David Winsemius, MD
West Hartford, CT
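The ?regex sentence David quotes is the key point: under R's default regex engine, a TAB can be written in a pattern in at least three ways. A minimal sketch comparing them on the sample string from this thread (the variable names are illustrative); all three should give the x2 result shown above.

```r
x <- "sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv"

# "\t": the parser produces a literal TAB, which the engine matches as-is.
one <- gsub("\t", "\n", x)
# "\\t": the engine receives backslash + "t", which ?regex documents as TAB.
two <- gsub("\\t", "\n", x)
# "\\\t": the engine receives backslash + TAB, i.e. an escaped TAB.
three <- gsub("\\\t", "\n", x)

all_same <- identical(one, two) && identical(two, three)
all_same   # TRUE
```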
Re: [R] Reading a file w/ two delimiters
The thing to watch out for is that if your file is large, 'textConnection'
is very slow at providing the data stream for something like read.table.
It is usually much faster to read in the file with 'readLines', preprocess
the data, write it out to a tempfile and then read it back in with
'read.table'.

On Fri, Nov 18, 2011 at 9:52 AM, David Winsemius wrote:
>
> On Nov 18, 2011, at 9:13 AM, Langston, Jim wrote:
[...]
>> tesfil <- "aa|bb|cc\tdd|ee|ff\t"
>
>> read.table(textConnection(gsub("\\\t", "\n", scan(
> textConnection(tesfil), # substitute your file here
> what="character")) ), sep="|")
> Read 2 items
>   V1 V2 V3
> 1 aa bb cc
> 2 dd ee ff
[...]

-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
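Jim's recommended route for large files (readLines, preprocess, write to a temp file, read.table) can be wrapped in a small function. This is a sketch, not code from the thread: read_tab_pipe_file is a hypothetical name, and it assumes records are separated only by TABs and fields only by "|".

```r
# Hypothetical helper: TAB-separated records, "|"-separated fields.
read_tab_pipe_file <- function(path, ...) {
  lines <- readLines(path, warn = FALSE)          # tolerate a missing final newline
  text <- gsub("\t", "\n", lines, fixed = TRUE)   # TAB -> newline, literal match
  tmp <- tempfile()
  on.exit(unlink(tmp), add = TRUE)                # delete the temp file on exit
  writeLines(text, con = tmp)
  read.table(tmp, sep = "|", ...)                 # extra arguments pass through
}

# Usage with a small sample file built from the thread's example data.
src <- tempfile()
writeLines("sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv", src)
df <- read_tab_pipe_file(src, stringsAsFactors = FALSE)
df
```

Because extra arguments are forwarded to read.table(), options such as quote = "" can be supplied per file without changing the helper.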
Re: [R] Reading a file w/ two delimiters
On Nov 18, 2011, at 9:13 AM, Langston, Jim wrote:

> Thanks Paul,
>
> That's the path I was marching down, I was hoping for something
> a little cleaner, I do the same with Perl or Java.

> tesfil <- "aa|bb|cc\tdd|ee|ff\t"
> read.table(textConnection(gsub("\\\t", "\n", scan(
    textConnection(tesfil),  # substitute your file here
    what="character")) ), sep="|")
Read 2 items
  V1 V2 V3
1 aa bb cc
2 dd ee ff

David Winsemius, MD
West Hartford, CT
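One detail in David's one-liner is worth knowing: scan() with its default sep = "" already splits on white space, TABs included, which is why it reports "Read 2 items". A sketch under that observation, using the text arguments of scan() and read.table() available in current R releases; note that any spaces inside a field would also be treated as separators this way.

```r
tesfil <- "aa|bb|cc\tdd|ee|ff\t"

# scan()'s default sep = "" splits on white space, TABs included, so the
# records come back already separated and gsub() has no TAB left to replace.
recs <- scan(text = tesfil, what = "character", quiet = TRUE)
recs                                   # "aa|bb|cc" "dd|ee|ff"

# The records can therefore go straight to read.table().
df <- read.table(text = recs, sep = "|")
df
```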
Re: [R] Reading a file w/ two delimiters
It is pretty straightforward in R:

> x <- readLines(textConnection("sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv"))
> closeAllConnections()
> # convert tabs to newlines
> x <- gsub("\t", "\n", x)
> # write out to a temp file and then read in as a data frame
> myFile <- tempfile()
> writeLines(x, con = myFile)
> x.df <- read.table(myFile, sep = "|")
> x.df
    V1   V2     V3
1 sadf asdf   asdf
2 qwer qwer   qwer
3 zxcv zxcv zxfcgv

On Fri, Nov 18, 2011 at 9:13 AM, Langston, Jim wrote:
> Thanks Paul,
>
> That's the path I was marching down, I was hoping for something
> a little cleaner, I do the same with Perl or Java.
[...]

-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
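For small inputs the temporary file can be skipped by passing the converted string to read.table() through its text argument, which current R releases provide (it may not exist in versions as old as this thread). A minimal sketch; read.table() reads text= input through a text connection, so Jim's caveat about textConnection() speed on large files still applies.

```r
x <- "sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv"

# Same TAB -> newline step as above; fixed = TRUE matches the TAB literally.
x <- gsub("\t", "\n", x, fixed = TRUE)

# read.table(text = ...) reads from the string instead of a file.
x.df <- read.table(text = x, sep = "|", stringsAsFactors = FALSE)
x.df
```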
Re: [R] Reading a file w/ two delimiters
Thanks Paul,

That's the path I was marching down, I was hoping for something
a little cleaner, I do the same with Perl or Java.

Jim

On 11/18/11 8:35 AM, "Paul Hiemstra" wrote:

>Hi Jim,
>
>You can read the text file using readLines. [...]

The contents of this e-mail are intended for the named addressee only. It
contains information that may be confidential. Unless you are the named
addressee or an authorized designee, you may not copy or use it, or disclose
it to anyone else. If you received it in error please notify us immediately
and then destroy it.
Re: [R] Reading a file w/ two delimiters
Hi Jim,

You can read the text file using readLines. This puts each line in the
file into an element of a character vector. Then you can go through the
lines manually (e.g. using grep, sub, strsplit) and create your data.frame.

cheers,
Paul

On 11/18/2011 12:37 PM, Langston, Jim wrote:
> Hi all,
>
> I've been scratching and poking, but basically, the file I need to read
> has two delimiters that I need to contend with. The first is that the
> file contains tabs (\t) instead of newlines (\n), and the second is that
> the fields have | for the separators. I can easily do a read if I first
> convert the \t to \n and then use read.table to get the file read with
> the | separator. But what I would really like to do is do this all
> within R. I have a lot of files to read and do analysis on.
>
> I can read the data into a table using the \t as delimiter, but can't
> figure out how to take that table data and use the | for separation.
> I've looked at string splits, etc. but haven't figured out how to split
> the whole table.
>
> Any thoughts? hints?
>
> Thanks,
>
> Jim

-- 
Paul Hiemstra, Ph.D.
Global Climate Division
Royal Netherlands Meteorological Institute (KNMI)
Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
P.O. Box 201 | 3730 AE | De Bilt
tel: +31 30 2206 494

http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770
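Paul's strsplit() suggestion also answers the original question of how to split the whole table without leaving R. A minimal sketch, assuming the whole input is one string (e.g. from readLines() on a file with no line breaks) and that every record has the same number of fields; the sample string is the one used elsewhere in this thread.

```r
raw <- "sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv"

# First level: TAB separates records (fixed = TRUE: literal match, no regex).
records <- strsplit(raw, "\t", fixed = TRUE)[[1]]

# Second level: "|" separates fields. Without fixed = TRUE, "|" would be
# the regex alternation operator and would need escaping as "\\|".
fields <- strsplit(records, "|", fixed = TRUE)

# Stack the per-record vectors into a character matrix, then a data frame;
# a matrix without column names becomes columns V1, V2, V3.
mat <- do.call(rbind, fields)
df <- as.data.frame(mat, stringsAsFactors = FALSE)
df
```

If records can have different field counts, rbind() would recycle the shorter vectors, so a check such as lengths(fields) is worth adding before stacking.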