Re: [R] Reading a file w/ two delimiters

2011-11-18 Thread Bert Gunter
... and yet another line I left out below!  I apologize for this baloney!

On Fri, Nov 18, 2011 at 10:44 AM, Bert Gunter  wrote:
>> So what about :
>>
---  also left this out ---
z <- "ab\tcd"
---
>>> cat(z)
>> ab      cd>
>>> cat(sub("\\\t","\n",z))
>> ab
>> cd>

Re: [R] Reading a file w/ two delimiters

2011-11-18 Thread Bert Gunter
... I failed to correctly paste the first line of an example:

On Fri, Nov 18, 2011 at 10:44 AM, Bert Gunter  wrote:
> David:
>
> As you now realize "\t" etc. is a perfectly legal single tab character.
>
> Now consider:
-  left this out --
> gsub("\\","a","\\")
---
> Error in gsub("\\", "a", "\\") :
>  invalid regular expression '\', reason 'Trailing backslash'

Re: [R] Reading a file w/ two delimiters

2011-11-18 Thread Bert Gunter
David:

As you now realize "\t" etc. is a perfectly legal single tab character.

Now consider:

> gsub("\\","a","\\")
Error in gsub("\\", "a", "\\") :
  invalid regular expression '\', reason 'Trailing backslash'

BUT

> gsub("\\\\","a","\\")
[1] "a"

???

The issue is there are two levels of escapes here -- the R parser's
and the reg expression's. The R parser recognizes "\\" as a single
backslash character in the third argument of gsub above. In the first
incorrect version, this single backslash is passed on to the reg
expression engine and it sees a single backslash, which is meaningless
to it. For example, a backreference would be something like "\\2"  =
"backslash 2."

The second incantation's first argument is correct and is passed onto
the reg expression engine as "backslash backslash," which it
interprets as an escaped "\" which is a literal "\" , per the
documentation.
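
A minimal sketch of the two escape levels described above, using only base R. The variable names are illustrative and not from the original post, and since the exact error text may differ between R versions, the sketch only records that an error occurs.

```r
# Level 1: the R parser. The literal "\\" in source is ONE character.
one_backslash <- "\\"
n_chars <- nchar(one_backslash)          # 1

# Level 2: the regex engine. A lone backslash is an incomplete escape,
# so using it as a pattern raises an error (caught here, not printed).
lone <- tryCatch(
  suppressWarnings(gsub("\\", "a", one_backslash)),
  error = function(e) "regex error"
)

# Four backslashes in source become two characters, which the regex
# engine reads as an escaped backslash, i.e. one literal backslash.
escaped <- gsub("\\\\", "a", one_backslash)

# fixed = TRUE skips the regex level, so the one-character pattern works.
literal <- gsub("\\", "a", one_backslash, fixed = TRUE)

print(c(n_chars = n_chars, lone = lone, escaped = escaped, literal = literal))
```

When the pattern is meant literally, fixed = TRUE is usually the simpler choice because only the R parser's escaping has to be considered.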

So what about :

> z <- "ab\tcd"
> cat(z)
ab      cd>
> cat(sub("\\\t","\n",z))
ab
cd>

R passes "backslash tab_character" to the regexp engine, which also
looks to me like an error. However, this may be one of those
"implementation dependent" details mentioned in the Help file. It
seems to me that the engine sees a meaningless escape sequence and
just throws away the escape to interpret the character literally. As
support for this, "\h" is not a meaningful escape sequence in R:

> gsub("\\h","a","\h")
Error: '\h' is an unrecognized escape in character string starting "\h"

and

> gsub("\\h","a","h")
[1] "a"

But I may be wrong, and I am hoping that this post will prompt someone
more knowledgeable than I to respond (if only just to confirm my
"explanation" if it's correct).

Cheers,
Bert





On Fri, Nov 18, 2011 at 7:26 AM, David Winsemius  wrote:
>
> On Nov 18, 2011, at 9:28 AM, jim holtman wrote:
>
>> It is pretty straightforward in R:
>>
>>> x <-
>>> readLines(textConnection("sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv"))
>>> closeAllConnections()
>>> # convert tabs to newlines
>>> x <- gsub("\t", "\n", x)
>
> Did the rules get liberalized for escaping patterns? Or have I been
> unnecessarily expending backslashes all these years? I thought that one
> needed 3 backslashes. This code does work and I am wondering if/when I
> "didn't get the memo". (I do see that there is a line early in the ?regex
> page that suggests I have been deluded all along.)
>
> "The current implementation interprets \a as BEL, \e as ESC, \f as FF, \n as
> LF, \r as CR and \t as TAB."
>
>> x <-
>> readLines(textConnection("sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv"))
>> closeAllConnections()
>> # convert tabs to newlines
>> x2 <- gsub("\\\t", "\n", x)
>> x2
> [1] "sadf|asdf|asdf\nqwer|qwer|qwer\nzxcv|zxcv|zxfcgv"
>
> So I guess my question is (now) why the triple-slash technique even works?

Re: [R] Reading a file w/ two delimiters

2011-11-18 Thread Gabor Grothendieck
On Fri, Nov 18, 2011 at 10:26 AM, David Winsemius
 wrote:
>
> On Nov 18, 2011, at 9:28 AM, jim holtman wrote:
>
>> It is pretty straightforward in R:
>>
>>> x <-
>>> readLines(textConnection("sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv"))
>>> closeAllConnections()
>>> # convert tabs to newlines
>>> x <- gsub("\t", "\n", x)
>
> Did the rules get liberalized for escaping patterns? Or have I been
> unnecessarily expending backslashes all these years? I thought that one
> needed 3 backslashes. This code does work and I am wondering if/when I
> "didn't get the memo". (I do see that there is a line early in the ?regex
> page that suggests I have been deluded all along.)
>
> "The current implementation interprets \a as BEL, \e as ESC, \f as FF, \n as
> LF, \r as CR and \t as TAB."
>
>> x <-
>> readLines(textConnection("sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv"))
>> closeAllConnections()
>> # convert tabs to newlines
>> x2 <- gsub("\\\t", "\n", x)
>> x2
> [1] "sadf|asdf|asdf\nqwer|qwer|qwer\nzxcv|zxcv|zxfcgv"
>
> So I guess my question is (now) why the triple-slash technique even works?
>

There are two levels of parsing:  first it's converted to a character
string by R, and in that parse "\\\t" gets converted to a backslash
character followed by a tab character (2 characters).  Secondly, the
regular expression parser interprets those two characters as a tab.
For example, consider these:

> gsub("\\\t", "x", "\\\t,\t") # 1
[1] "\\x,x"
> gsub("\\\t", "x", "\\\t,\t", fixed = TRUE) # 2
[1] "x,\t"

The first arg in 1 is processed into backslash tab by R and then the
regular expression parser processes that into just tab; however, the
third argument in 1 is processed by R to backslash tab comma tab and
is not further processed since it's not regarded as a regular
expression.  Thus the result follows.

In contrast, the first arg in 2 is processed into backslash tab by R as
before, but now it's not regarded as a regular expression, so the second
level of interpretation that occurred in 1 is not performed.  Rather,
only occurrences of backslash tab get replaced instead of occurrences
of tab.
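
The two cases above can be inspected step by step. This is only a sketch with illustrative variable names; it assumes the regex engine treats backslash-tab as a plain tab, as the output shown above indicates.

```r
pattern <- "\\\t"                 # source: three backslashes and a t
subject <- "\\\t,\t"              # backslash, tab, comma, tab

# What R's parser actually produced for each string.
pattern_codes <- utf8ToInt(pattern)   # 92 (backslash), 9 (tab)
subject_len   <- nchar(subject)       # 4

# Case 1: the pattern goes through the regex engine, which reads
# backslash-tab as a tab, so both tabs in the subject are replaced.
as_regex <- gsub(pattern, "x", subject)

# Case 2: fixed = TRUE matches the two characters literally, so only
# the backslash-tab pair at the start is replaced.
as_fixed <- gsub(pattern, "x", subject, fixed = TRUE)

print(as_regex)   # printed form: "\\x,x"
print(as_fixed)   # printed form: "x,\t"
```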

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading a file w/ two delimiters

2011-11-18 Thread David Winsemius


On Nov 18, 2011, at 9:28 AM, jim holtman wrote:

> It is pretty straightforward in R:
>
>> x <-
>> readLines(textConnection("sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv"))
>> closeAllConnections()
>> # convert tabs to newlines
>> x <- gsub("\t", "\n", x)


Did the rules get liberalized for escaping patterns? Or have I been
unnecessarily expending backslashes all these years? I thought that
one needed 3 backslashes. This code does work and I am wondering if/when
I "didn't get the memo". (I do see that there is a line early in
the ?regex page that suggests I have been deluded all along.)

"The current implementation interprets \a as BEL, \e as ESC, \f as FF,
\n as LF, \r as CR and \t as TAB."

> x <- readLines(textConnection("sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv"))
> closeAllConnections()
> # convert tabs to newlines
> x2 <- gsub("\\\t", "\n", x)
> x2
[1] "sadf|asdf|asdf\nqwer|qwer|qwer\nzxcv|zxcv|zxfcgv"

So I guess my question is (now) why the triple-slash technique even
works?


--
David.






David Winsemius, MD
West Hartford, CT



Re: [R] Reading a file w/ two delimiters

2011-11-18 Thread jim holtman
The thing to watch out for is that if your file is large, 'textConnection'
is very slow at providing the data stream for something like
read.table.  It is usually much faster to read in the file with
'readLines', preprocess the data, write it out to a tempfile and
then read it back in with 'read.table'.
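
The readLines / tempfile / read.table route described above could be wrapped in a small helper so it can be applied to many files. This is only a sketch: the function name read_tab_pipe and the demo file are hypothetical, and it assumes every record has the same number of |-separated fields.

```r
# Hypothetical helper (name and argument are not from the thread).
read_tab_pipe <- function(path) {
  # warn = FALSE: files like this often lack a final newline
  lines <- readLines(path, warn = FALSE)
  # Turn every tab into a record break; fixed = TRUE avoids regex escaping.
  lines <- gsub("\t", "\n", lines, fixed = TRUE)
  tmp <- tempfile()
  on.exit(unlink(tmp), add = TRUE)      # remove the temp file on exit
  writeLines(lines, con = tmp)
  read.table(tmp, sep = "|", stringsAsFactors = FALSE)
}

# Demo with the sample data used elsewhere in this thread.
demo_file <- tempfile()
writeLines("sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv", demo_file)
x.df <- read_tab_pipe(demo_file)
print(x.df)
unlink(demo_file)
```

For a directory of such files, something like lapply(list.files("data_dir", full.names = TRUE), read_tab_pipe) would return one data frame per file; "data_dir" is a placeholder.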

On Fri, Nov 18, 2011 at 9:52 AM, David Winsemius  wrote:
>
> On Nov 18, 2011, at 9:13 AM, Langston, Jim wrote:
>
>> Thanks Paul,
>>
>> That's the path I was marching down, I was hoping for something
>> a little cleaner, I do the same with Perl or Java.
>
>> tesfil <- "aa|bb|cc\tdd|ee|ff\t"
>
>> read.table(textConnection(gsub("\\\t", "\n", scan(
>               textConnection(tesfil), # substitute your file here
>               what="character")) ), sep="|")
> Read 2 items
>  V1 V2 V3
> 1 aa bb cc
> 2 dd ee ff
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



Re: [R] Reading a file w/ two delimiters

2011-11-18 Thread David Winsemius


On Nov 18, 2011, at 9:13 AM, Langston, Jim wrote:


> Thanks Paul,
>
> That's the path I was marching down, I was hoping for something
> a little cleaner, I do the same with Perl or Java.


> tesfil <- "aa|bb|cc\tdd|ee|ff\t"

> read.table(textConnection(gsub("\\\t", "\n", scan(
   textConnection(tesfil), # substitute your file here
   what="character")) ), sep="|")
Read 2 items
  V1 V2 V3
1 aa bb cc
2 dd ee ff
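
A variation on the same idea, sketched under the assumption that R 2.14.0 or later is in use, where read.table accepts a text argument. As far as I know that argument still goes through a text connection internally, so it is a convenience for small inputs and not a speed-up. The name tes.df is illustrative.

```r
tesfil <- "aa|bb|cc\tdd|ee|ff\t"

# Convert tabs to newlines literally (fixed = TRUE, so no regex escaping).
records <- gsub("\t", "\n", tesfil, fixed = TRUE)

# Parse the resulting text directly; blank lines are skipped by default.
tes.df <- read.table(text = records, sep = "|", stringsAsFactors = FALSE)
print(tes.df)
```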





David Winsemius, MD
West Hartford, CT



Re: [R] Reading a file w/ two delimiters

2011-11-18 Thread jim holtman
It is pretty straightforward in R:

> x <- 
> readLines(textConnection("sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv"))
> closeAllConnections()
> # convert tabs to newlines
> x <- gsub("\t", "\n", x)
> # write out to a temp file and then read in as a data frame
> myFile <- tempfile()
> writeLines(x, con = myFile)
> x.df <- read.table(myFile, sep = "|")
>
>
> x.df
    V1   V2     V3
1 sadf asdf   asdf
2 qwer qwer   qwer
3 zxcv zxcv zxfcgv
>

On Fri, Nov 18, 2011 at 9:13 AM, Langston, Jim
 wrote:
> Thanks Paul,
>
> That's the path I was marching down, I was hoping for something
> a little cleaner, I do the same with Perl or Java.
>
> Jim





Re: [R] Reading a file w/ two delimiters

2011-11-18 Thread Langston, Jim
Thanks Paul,

That's the path I was marching down; I was hoping for something
a little cleaner. I do the same with Perl or Java.

Jim

On 11/18/11 8:35 AM, "Paul Hiemstra"  wrote:

>Hi Jim,
>
>You can read the text file using readLines. This puts each line in the
>file into an element of a character vector. Then you can go through the
>lines manually (e.g. using grep, sub, strsplit) and create your data.frame.
>
>cheers,
>Paul
>
>On 11/18/2011 12:37 PM, Langston, Jim wrote:
>> Hi all,
>>
>> I've been scratching and poking, but basically, the file I need to read has
>> two delimiters that I need to contend with. The first is that the file
>> contains tabs (\t) instead of newlines (\n), and the second is that the
>> fields have | for the separators. I can easily do a read if I first convert
>> the \t to \n and then use read.table to get the file read with the |
>> separator. But what I would really like to do is do this all within R. I
>> have a lot of files to read and do analysis on.
>>
>> I can read the data into a table using the \t as delimiter, but can't
>> figure out how to take that table data and use the | for separation. I've
>> looked at string splits, etc. but haven't figured out how to split the
>> whole table.
>>
>> Any thoughts ? hints ?
>>
>> Thanks,
>>
>> Jim
>>
>>
>



Re: [R] Reading a file w/ two delimiters

2011-11-18 Thread Paul Hiemstra
Hi Jim,

You can read the text file using readLines. This puts each line in the
file into an element of a character vector. Then you can go through the
lines manually (e.g. using grep, sub, strsplit) and create your data.frame.
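
A minimal sketch of that approach, under some assumptions that are not in
the original question: the sample data, the column names (id, name, score)
and the presence of a header record are all made up for illustration. It
reads the file with readLines, splits on the tab character to recover the
records, and then lets read.table split the fields on "|":

```r
# Hypothetical sample file: records separated by tabs, fields by "|".
tmp <- tempfile(fileext = ".txt")
writeLines("id|name|score\t1|alice|3.5\t2|bob|4.25", tmp)

# readLines returns a character vector, one element per physical line.
raw <- readLines(tmp)

# Split each line on the literal tab character to get one string per record.
records <- unlist(strsplit(raw, "\t", fixed = TRUE))

# Parse the records with "|" as the field separator.
# header = TRUE is an assumption about the file layout.
dat <- read.table(text = records, sep = "|", header = TRUE,
                  stringsAsFactors = FALSE)
print(dat)
```

With many files, the same steps could be wrapped in a small function and
applied over the result of list.files() with lapply.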

cheers,
Paul

On 11/18/2011 12:37 PM, Langston, Jim wrote:
> Hi all,
>
> I've been scratching and poking, but basically, the file I need to read has
> two delimiters that I need to contend with. The first is that the file
> contains tabs (\t) instead of newlines (\n), and the second is that the
> fields have | for the separators. I can easily do a read if I first convert
> the \t to \n and then use read.table to get the file read with the |
> separator. But what I would really like to do is do this all within R. I
> have a lot of files to read and do analysis on.
>
> I can read the data into a table using the \t as delimiter, but can't
> figure out how to take that table data and use the | for separation. I've
> looked at string splits, etc. but haven't figured out how to split the
> whole table.
>
> Any thoughts ? hints ?
>
> Thanks,
>
> Jim
>
>


-- 
Paul Hiemstra, Ph.D.
Global Climate Division
Royal Netherlands Meteorological Institute (KNMI)
Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
P.O. Box 201 | 3730 AE | De Bilt
tel: +31 30 2206 494

http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770
