Hello,
My earlier code didn't foresee a point you've made clear in this post. Comments inline.
On 10-08-2012 19:05, Fred G wrote:
Thanks Arun. The only issue is that I need the code to be very
generalizable: the grep() really has to check whether the first string up
to the whitespace in a row (i.e., "New", "Boston", "Washington", "Detroit"
below) is the same as the first string up to the whitespace in the row
directly below it AND the IDs are different; if so, copy both rows. The
actual file has thousands of different IDs and names...
Does this mean that "New York" ---> "New" in one row shouldn't match
"Other New" in the next row, because "New" is not the first string up to
the whitespace there? If so, modify my earlier code to:
fun <- function(i, x){
    # Only compare rows i and i + 1 when their IDs differ
    if(x[i, "ID"] != x[i + 1, "ID"]){
        # keep first string (trimws() drops any leading blanks first)
        s1 <- unlist(strsplit(trimws(x[i, "NAME"]), "[[:space:]]+"))[1]
        s2 <- unlist(strsplit(trimws(x[i + 1, "NAME"]), "[[:space:]]+"))[1]
        # exact comparison; a regex test like grepl(s1, s2) would also accept "Newark"
        if(s1 == s2) return(TRUE)
    }
    FALSE
}
If that isn't the case, leave the earlier code as it is.
Rui Barradas
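One way to drive fun() over a whole data frame is to call it for every row index except the last and then keep both rows of each matching pair. This is a sketch under assumptions: the earlier code that fun() plugs into isn't quoted in this thread, the data frame simply mirrors the example table from the original post, and the names hits, keep and result are illustrative.

```r
# fun(): TRUE when rows i and i + 1 have different IDs but the same
# first string (up to the first whitespace) in NAME
fun <- function(i, x){
  if(x[i, "ID"] != x[i + 1, "ID"]){
    s1 <- unlist(strsplit(trimws(x[i, "NAME"]), "[[:space:]]+"))[1]
    s2 <- unlist(strsplit(trimws(x[i + 1, "NAME"]), "[[:space:]]+"))[1]
    if(s1 == s2) return(TRUE)
  }
  FALSE
}

# Example table from the original post
dat1 <- data.frame(
  ID = 1:5,
  NAME = c("New York Mets", "New York Yankees", "Boston Redsox",
           "Washington Nationals", "Detroit Tigers"),
  YEAR = c(1900, 1920, 1918, 2010, 1990),
  SOURCE = c("ESPN", "Cooperstown", "ESPN", "ESPN", "ESPN"),
  stringsAsFactors = FALSE
)

# Test each row against the row directly below it
hits <- which(vapply(seq_len(nrow(dat1) - 1), fun, logical(1), x = dat1))
# Keep both rows of every matching pair, each row only once
keep <- sort(unique(c(hits, hits + 1)))
result <- dat1[keep, ]
print(result)
```

Because the loop calls fun() once per row, it is easy to follow but slow on very large files; a vectorized comparison of adjacent rows avoids that.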
On Fri, Aug 10, 2012 at 2:01 PM, arun <smartpink...@yahoo.com> wrote:
Hi,
Try this:
dat1<-read.table(text="
ID, NAME, YEAR, SOURCE
1, New York Mets, 1900, ESPN
2, New York Yankees, 1920, Cooperstown
3, Boston Redsox, 1918, ESPN
4, Washington Nationals, 2010, ESPN
5, Detroit Tigers, 1990, ESPN
",sep=",",header=TRUE,stringsAsFactors=FALSE)
index<-grep("New York.*",dat1$NAME)
dat1[index,]
# ID NAME YEAR SOURCE
#1 1 New York Mets 1900 ESPN
#2 2 New York Yankees 1920 Cooperstown
A.K.
----- Original Message -----
From: Fred G <bayespoker...@gmail.com>
To: r-help@r-project.org
Cc:
Sent: Friday, August 10, 2012 1:41 PM
Subject: [R] Regular Expressions + Matrices
Hi all,
My code looks like the following:
inname = read.csv("ID_error_checker.csv", as.is=TRUE)
outname = read.csv("output.csv", as.is=TRUE)
# My algorithm is the following:
# for line in inname
#   if first string up to whitespace in row in inname$name = first string up to whitespace in row + 1 in inname$name
#   AND ID in inname$ID for the top row NOT EQUAL ID in inname$ID for the row below it
#     copy these two lines to a new file
In other words, if the name (up to the first whitespace) in the first row
equals the name in the second row (and so on for the whole file), and the
ID in the first row does not equal the ID in the second row, copy both of
these rows in full to a new file. The only caveat is that I want the regular
expression to compare not the full names but just the first string up to
the first whitespace in the inname$name column (i.e., if row 1 has the name
"New York Mets" and row 2 has the name "New York Yankees", I want both of
these rows copied in full, since "New" is the same in both).
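The "first string up to the first whitespace" can be extracted from every name at once with one regular expression, so no team name has to be hard-coded in the pattern. A minimal sketch, assuming R >= 3.2.0 for trimws(); first_word() is a hypothetical helper name, not a base R function.

```r
# Hypothetical helper: return the first string up to the first whitespace.
# trimws() drops leading blanks; sub() then deletes everything from the
# first whitespace character to the end of the string.
first_word <- function(name) {
  sub("[[:space:]].*$", "", trimws(name))
}

team_names <- c("New York Mets", "New York Yankees", "Boston Redsox",
                "Washington Nationals", "Detroit Tigers")
fw <- first_word(team_names)
print(fw)

# TRUE where a row's first string equals the one in the row directly below it
same_as_next <- fw[-length(fw)] == fw[-1]
print(same_as_next)
```

Comparing the extracted strings with == (rather than using one as a grep() pattern) keeps the match exact, so "New" does not match a name starting with "Newark".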
Here is some example data:
ID  NAME                  YEAR  SOURCE       NOTES
1   New York Mets         1900  ESPN
2   New York Yankees      1920  Cooperstown
3   Boston Redsox         1918  ESPN
4   Washington Nationals  2010  ESPN
5   Detroit Tigers        1990  ESPN
The desired output would be:
ID  NAME                  YEAR  SOURCE
1   New York Mets         1900  ESPN
2   New York Yankees      1920  Cooperstown
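The whole check can also be done without an explicit loop: compute the first strings once, compare each row with the row below it, and keep both rows of every pair whose IDs differ. This is a vectorized sketch, not code from the thread. It builds the example table inline (the empty NOTES column is omitted) and writes to a temporary file; with the real data one would instead use read.csv("ID_error_checker.csv", as.is=TRUE) and pass "output.csv" to write.csv().

```r
# Example data from the post
inname <- data.frame(
  ID = 1:5,
  NAME = c("New York Mets", "New York Yankees", "Boston Redsox",
           "Washington Nationals", "Detroit Tigers"),
  YEAR = c(1900, 1920, 1918, 2010, 1990),
  SOURCE = c("ESPN", "Cooperstown", "ESPN", "ESPN", "ESPN"),
  stringsAsFactors = FALSE
)

# First string up to the first whitespace in each NAME
fw <- sub("[[:space:]].*$", "", trimws(inname$NAME))

n <- nrow(inname)
# Row i pairs with row i + 1 when the first strings match AND the IDs differ
pair <- fw[-n] == fw[-1] & inname$ID[-n] != inname$ID[-1]

# Flag both rows of each pair; a row shared by two pairs is kept once
keep <- c(pair, FALSE) | c(FALSE, pair)
out <- inname[keep, ]

# Copy the selected rows in full; tempfile() stands in for "output.csv"
outfile <- tempfile(fileext = ".csv")
write.csv(out, outfile, row.names = FALSE)
print(read.csv(outfile, as.is = TRUE))
```

On the example table this keeps exactly the two "New York" rows, matching the desired output.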
Thanks so much!
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.